Calvin's Blog

Intro to Streaming JSON

∙ json∙ data-format
Introduction

While working on a disk space analyzer tool I was outputting the data as a hierarchical JSON object that resembles the file system being scanned (directories that contain files and other directories… etc).

My resulting JSON object was about 2 GB in size, and the web application I wrote to visualize the data was choking hard on such a large file… Turns out the browser was not meant to locally process multi GB files all at once…

The fact that my JSON was a single, highly nested object meant I would need to destructure it, find a way to read one object at a time from the input, and put the hierarchy back together once each object had been read from the file.

Streaming JSON was what I used for this.

What is Streaming JSON

Streaming JSON is just normal JSON objects that are either character-delimited or length-prefixed.

A file or stream can contain multiple JSON objects and a parser aware of Streaming JSON will be able to pull out and parse each object in the stream.

This can be very useful in case you have a live stream of objects to receive and parse, or in my case if you just want to do something really impractical and parse multi GB files locally in the browser.

Delimited Streaming JSON

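Delimited Streaming JSON is just a stream of JSON objects separated by a specific character that you choose. To avoid conflicts with content in the JSON objects, it's a good idea to use a standard separator character, such as the ASCII record separator, rather than a printable character that could show up inside a string value.

According to my reading, it's also a good idea to end each JSON value with a newline so that top-level values that are not objects, like numbers or booleans, can be parsed. Without the trailing newline, a parser cannot tell whether a number such as 12 is complete or was cut off.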

In this example, for clarity, my delimiting character is | and \n is the newline character:

|{"key":1}\n|{"key":"1"}\n|{"key":false}\n
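To make the format concrete, here is a minimal Python sketch of a generic delimited writer and reader. The function names `write_delimited` and `read_delimited` are made up for illustration and are not taken from my linked examples. The reader assumes the delimiter never appears inside the JSON text itself, which is not guaranteed for a printable character like |.

```python
import json
from typing import Any, Iterable, Iterator


def write_delimited(values: Iterable[Any], delimiter: str = "|") -> str:
    """Serialize each value as: delimiter + compact JSON + newline."""
    return "".join(
        delimiter + json.dumps(value, separators=(",", ":")) + "\n"
        for value in values
    )


def read_delimited(stream: str, delimiter: str = "|") -> Iterator[Any]:
    """Yield each JSON value found between delimiters.

    Assumption: the delimiter never occurs inside the JSON text.
    """
    for chunk in stream.split(delimiter):
        if chunk.strip():  # skip the empty piece before the first delimiter
            yield json.loads(chunk)  # json.loads ignores the trailing newline


encoded = write_delimited([{"key": 1}, {"key": "1"}, {"key": False}])
decoded = list(read_delimited(encoded))
print(repr(encoded))
print(decoded)
```

Splitting the whole input at once is fine for a small string, but for a multi GB file you would want to read and split incrementally instead.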

Record Separator Delimited JSON

This is a specific format of delimited Streaming JSON where the delimiter is the ASCII record separator character (0x1E) and each JSON object ends with a newline character.

This format of Streaming JSON is defined in RFC 7464 and even has its own MIME type, application/json-seq.

In this example, \036 is the octal escape code for the record separator character and \n is the newline character:

\036{"key":1}\n\036{"key":"1"}\n\036{"key":false}\n
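Since the point of a stream is that data arrives in pieces, here is a rough Python sketch of an incremental reader for this format. `JsonSeqReader` and `encode_json_seq` are hypothetical names, and the sketch assumes the writer emits compact JSON, so the only newline in a record is the trailing one. Records that are cut off before their newline are silently skipped here; a real parser might want to report them.

```python
import json
from typing import Any, Iterable, List

RS = b"\x1e"  # ASCII record separator, written \036 in octal


def encode_json_seq(values: Iterable[Any]) -> bytes:
    """Encode values as RS + compact JSON + newline, one after another."""
    return b"".join(
        RS + json.dumps(value, separators=(",", ":")).encode("utf-8") + b"\n"
        for value in values
    )


class JsonSeqReader:
    """Accepts arbitrary byte chunks and returns records as they complete."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> List[Any]:
        pieces = (self._buffer + chunk).split(RS)
        self._buffer = b""
        values = []
        for index, piece in enumerate(pieces):
            if piece.endswith(b"\n"):
                # Trailing newline seen, so the record is complete.
                values.append(json.loads(piece))
            elif index == len(pieces) - 1:
                # Unfinished tail: keep it until more data arrives.
                self._buffer = piece
            # Anything else is empty or truncated and is skipped.
        return values


data = encode_json_seq([{"key": 1}, {"key": "1"}, {"key": False}])
reader = JsonSeqReader()
records = []
for start in range(0, len(data), 7):  # pretend the data arrives 7 bytes at a time
    records.extend(reader.feed(data[start:start + 7]))
print(records)
```

Because JSON encoders escape control characters inside strings, the record separator should never appear inside a record, which is what makes splitting on it safe.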

Length Prefixed Streaming JSON

Finally, we have the length-prefixed Streaming JSON format, which in my testing was the most performant. It makes sense when you think about it.

Length-prefixed Streaming JSON is a series of JSON objects all back to back, but before each object the length of the object in bytes is prefixed to the object.

9{"key":1}11{"key":"1"}13{"key":false}

This can be very fast because you only need to check each character in the stream until you reach the object's opening {. After that, you just read the next N bytes from the stream, where N is the number you read just before the object, without inspecting them for delimiters.
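The read loop described above can be sketched in Python like this. The function names are hypothetical, and the reader assumes every value in the stream is a JSON object, so the first { after a record always marks the end of the decimal length prefix.

```python
import json
from typing import Any, Iterable, Iterator


def write_length_prefixed(values: Iterable[Any]) -> bytes:
    """Write each object as its UTF-8 byte length followed by the JSON."""
    out = bytearray()
    for value in values:
        body = json.dumps(value, separators=(",", ":")).encode("utf-8")
        out += str(len(body)).encode("ascii") + body
    return bytes(out)


def read_length_prefixed(data: bytes) -> Iterator[Any]:
    """Yield objects by reading a length, then slicing that many bytes.

    Assumption: every value is a JSON object and therefore starts with {.
    """
    position = 0
    while position < len(data):
        brace = data.index(b"{", position)  # the length prefix ends here
        length = int(data[position:brace])  # decimal digits before the {
        yield json.loads(data[brace:brace + length])
        position = brace + length  # jump straight to the next prefix


encoded = write_length_prefixed([{"key": 1}, {"key": "1"}, {"key": False}])
decoded = list(read_length_prefixed(encoded))
print(encoded)
print(decoded)
```

The only bytes scanned one at a time are the digits of each prefix; the object body is taken as a single slice, however large it is.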

Code Examples

FYI
My delimited reader and writer are generic and not specific to the record separator character.

I have working Read and Write examples for these Streaming JSON formats here.