Prometheus Blogs

Beyond Vanilla JSON

04/02/2016

JSON is a great way to transport information. When I first suggested using JSON for a data transfer standard around 2003, the stakeholders thought I was nuts. Now some standards (like HL7 FHIR) embrace JSON. Despite the apparently stalled [1] JSON Schema effort, JSON has become very popular. For AJAX applications, JSON has largely supplanted [2] XML. JSON is therefore well known to developers.

Off-the-shelf (ECMA-404) JSON nevertheless falls short of perfect, at least for graph transport in object-oriented systems. Without any extra conventions, all JSON can represent is a tree: every object is nested in exactly one place, so neither shared objects nor cycles can be expressed. Unfortunately, objects are often highly, and cyclically, interconnected. Another problem is that JSON does not specify object classes. That may be acceptable in a classless language like JavaScript, but it can be ambiguous in class-based [3] languages like Ruby or Java. For example, different classes may represent similar (but not identical) semantics, and even have identical structures (but probably different behaviors). When the structures are identical, it isn't clear which class to instantiate on the receiving end. My final gripe is that some pretty fundamental types are not represented in JSON. Dates, times, timestamps, and binary come to mind.

[{
	"title": "The Eyes of the Overworld",
	"isbn": "0671809040",
	"author": {
		"First Name": "Jack",
		"Last Name": "Vance"
	}
}, {
	"title": "Confederacy of Dunces",
	"isbn": "0802130208",
	"author": {
		"First Name": "John",
		"Last Name": "Toole"
	}
}, {
	"title": "The Languages of Pao",
	"isbn": "0879975415",
	"author": {
		"First Name": "Jack",
		"Last Name": "Vance"
	}
}]

Vanilla JSON has no way to reference another object. This can result in duplication.
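
The duplication is easy to reproduce. The following is a minimal sketch using Python's standard json module (my choice for illustration, not something the post depends on): one author object shared by two books is written twice, comes back as two separate objects, and a cycle is refused outright.

```python
import json

# One author object shared by two books.
author = {"First Name": "Jack", "Last Name": "Vance"}
books = [
    {"title": "The Eyes of the Overworld", "author": author},
    {"title": "The Languages of Pao", "author": author},
]

# The shared author is written out twice; nothing marks the copies as one object.
text = json.dumps(books)
duplicates = text.count('"Last Name": "Vance"')

# After a round trip the two authors are equal, but no longer the same object.
restored = json.loads(text)
same_before = books[0]["author"] is books[1]["author"]
same_after = restored[0]["author"] is restored[1]["author"]

# A cycle cannot be written at all: the standard encoder rejects it.
author["books"] = books
try:
    json.dumps(books)
    cycle_error = None
except ValueError as exc:
    cycle_error = str(exc)
```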

There are, of course, solutions to all these problems. For example, you could designate:

  • A JSON attribute (such as “@type”) to contain the class name. Better yet, use package paths [4] instead of class names, so that multiple namespaces can be used without conflict.
  • A JSON attribute (such as “@id”) to contain an object identifier for complex (non-primitive) objects.
  • A JSON attribute (such as “@ref”) that can be used to reference an object through its “@id”.

[{
	"@type": "Library.Book.Travelogue",
	"title": "The Eyes of the Overworld",
	"isbn": "0671809040",
	"author": {
		"@type": "Library.Author",
		"@id": 1,
		"First Name": "Jack",
		"Last Name": "Vance"
	}
}, {
	"@type": "Library.Book.Comedy",
	"title": "Confederacy of Dunces",
	"isbn": "0802130208",
	"author": {
		"@type": "Library.Author",
		"First Name": "John",
		"Last Name": "Toole"
	}
}, {
	"@type": "Library.Book.Philosophy",
	"title": "The Languages of Pao",
	"isbn": "0879975415",
	"author": {
		"@ref": 1
	}
}]

Depth-first serialized JSON with references. Classes are also included. In this case, the @id attribute was included only for objects that are referenced from another part of the stream.
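
A depth-first marshaller along these lines can be sketched in a few lines of Python. Everything here is an assumption made for illustration: plain dicts stand in for objects, `marshal_depth_first` is a made-up name (it is not json-io's API), and @type is simply carried along as an ordinary key. A first pass counts how often each object is reached so that, as in the figure, @id is written only for objects that are referenced more than once.

```python
def marshal_depth_first(root):
    """Return a JSON-ready copy of root, using @id/@ref for shared dicts."""
    # Pass 1: count how many times each dict is reached.
    counts = {}

    def count(node):
        if isinstance(node, dict):
            counts[id(node)] = counts.get(id(node), 0) + 1
            if counts[id(node)] > 1:
                return  # seen before; do not descend again (this also breaks cycles)
            for value in node.values():
                count(value)
        elif isinstance(node, list):
            for item in node:
                count(item)

    count(root)

    # Pass 2: write each dict in full the first time, then only a reference.
    ids = {}

    def write(node):
        if isinstance(node, list):
            return [write(item) for item in node]
        if not isinstance(node, dict):
            return node
        key = id(node)
        if key in ids:
            return {"@ref": ids[key]}
        out = {}
        if counts[key] > 1:
            # Register the id before descending so a cycle finds it.
            ids[key] = len(ids) + 1
            out["@id"] = ids[key]
        for name, value in node.items():
            out[name] = write(value)
        return out

    return write(root)


vance = {"@type": "Library.Author", "First Name": "Jack", "Last Name": "Vance"}
toole = {"@type": "Library.Author", "First Name": "John", "Last Name": "Toole"}
books = [
    {"title": "The Eyes of the Overworld", "author": vance},
    {"title": "Confederacy of Dunces", "author": toole},
    {"title": "The Languages of Pao", "author": vance},
]
marshalled = marshal_depth_first(books)
```

The result can be handed straight to a JSON encoder. The sketch tracks identity for dicts only: a list shared between two objects would still be duplicated, and a list that contains itself is not handled.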

For depth-first marshalling (serialization), when you encounter an object that’s already serialized, you specify only the @ref attribute. You can instead do breadth-first marshalling by writing objects in full (with associated objects represented only by an object that has only an @ref attribute), all at the same level [5]. Some of the @ref attributes would point to nothing until the traversal is finished. Personally, I think breadth first is pretty sensible, yet it’s hardly ever done. STEP files (defined by ISO 10303-21) are examples of breadth-first marshalling.

[{
	"@type": "Library.Book.Travelogue",
	"title": "The Eyes of the Overworld",
	"isbn": "0671809040",
	"author": {
		"@ref": 1
	}
}, {
	"@type": "Library.Book.Comedy",
	"title": "Confederacy of Dunces",
	"isbn": "0802130208",
	"author": {
		"@ref": 2
	}
}, {
	"@type": "Library.Book.Philosophy",
	"title": "The Languages of Pao",
	"isbn": "0879975415",
	"author": {
		"@ref": 1
	}
}, {
	"@type": "Library.Author",
	"@id": 1,
	"First Name": "Jack",
	"Last Name": "Vance"
}, {
	"@type": "Library.Author",
	"@id": 2,
	"First Name": "John",
	"Last Name": "Toole"
}]

Breadth-first marshalling. The second book is encountered before the first book’s author, so a reference is written.
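
Breadth-first marshalling can be sketched with a queue. As before, this is an illustration under assumptions: dicts stand in for objects and `marshal_breadth_first` is a hypothetical name. To keep it short, the sketch gives every object an @id, including the books, whereas the figure writes @id only for objects that something else refers to.

```python
from collections import deque


def marshal_breadth_first(roots):
    """Write every dict at the top level; nested dicts become {"@ref": n}."""
    ids = {}         # id(obj) -> assigned @id
    queue = deque()  # objects whose bodies still have to be written

    def ref(obj):
        # First sighting: assign an @id and schedule the body for later.
        if id(obj) not in ids:
            ids[id(obj)] = len(ids) + 1
            queue.append(obj)
        return {"@ref": ids[id(obj)]}

    def flatten(value):
        if isinstance(value, dict):
            return ref(value)
        if isinstance(value, list):
            return [flatten(item) for item in value]
        return value

    for root in roots:
        ref(root)

    out = []
    while queue:
        obj = queue.popleft()
        record = {"@id": ids[id(obj)]}
        for name, value in obj.items():
            record[name] = flatten(value)
        out.append(record)
    return out


vance = {"@type": "Library.Author", "First Name": "Jack", "Last Name": "Vance"}
toole = {"@type": "Library.Author", "First Name": "John", "Last Name": "Toole"}
flat = marshal_breadth_first([
    {"title": "The Eyes of the Overworld", "author": vance},
    {"title": "Confederacy of Dunces", "author": toole},
    {"title": "The Languages of Pao", "author": vance},
])
```

Here the three books receive ids 1 to 3 and the two authors follow as ids 4 and 5, so the first book's @ref points at an object that has not been written yet when it is emitted.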

The three attributes described above handily solve the first two problems, and that’s exactly what the json-io library does. This solution is syntactically valid plain-vanilla JSON. There is no shortage of alternatives to json-io, including jsog, Flexjson, and JSON Graph.

Reserving special attributes (like @type, @id, and @ref) does mean they are not available for other purposes. This might not be a problem, if you don’t have any attribute names that begin with ‘@’. If it is a problem, just add an unlikely-to-be-encountered prefix or suffix to the reserved attribute names.

The above @id and @ref solution may seem weird when it’s marshalled depth first, because a single graph may be represented by multiple, different-looking JSON strings. Which string you get depends on the order in which the graph is traversed during serialization, but every one of them gives you back the same graph during unmarshalling. If that troubles you, all you have to do is nail down the sequence in which complex object attributes are traversed. It’s hardly worth it though, since it has no effect on the unmarshalled object graph.
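
To illustrate that point, here is a small unmarshalling sketch (the name `unmarshal` and the two-pass approach are my assumptions, not any particular library's behavior). It first indexes every object that has an @id, then replaces each @ref stub with the real object, so references that point forward in the stream resolve just as well as ones that point backward.

```python
import json


def unmarshal(text):
    """Parse JSON, then replace every {"@ref": n} with the object whose @id is n."""
    data = json.loads(text)
    by_id = {}

    def is_ref(value):
        return isinstance(value, dict) and set(value) == {"@ref"}

    # Pass 1: index every object that carries an @id.
    def index(node):
        if isinstance(node, dict):
            if "@id" in node:
                by_id[node["@id"]] = node
            for value in node.values():
                index(value)
        elif isinstance(node, list):
            for item in node:
                index(item)

    # Pass 2: swap reference stubs for the real objects, in place.
    # Replaced values are not descended into, so cycles cannot loop forever.
    def resolve(node):
        if isinstance(node, dict):
            for key, value in list(node.items()):
                if is_ref(value):
                    node[key] = by_id[value["@ref"]]
                else:
                    resolve(value)
        elif isinstance(node, list):
            for i, item in enumerate(node):
                if is_ref(item):
                    node[i] = by_id[item["@ref"]]
                else:
                    resolve(item)

    index(data)
    resolve(data)
    return data


depth_first = """[
  {"title": "The Eyes of the Overworld", "author": {"@id": 1, "Last Name": "Vance"}},
  {"title": "The Languages of Pao", "author": {"@ref": 1}}
]"""
breadth_first = """[
  {"title": "The Eyes of the Overworld", "author": {"@ref": 1}},
  {"title": "The Languages of Pao", "author": {"@ref": 1}},
  {"@id": 1, "Last Name": "Vance"}
]"""
a = unmarshal(depth_first)
b = unmarshal(breadth_first)
```

In both results the two books end up sharing a single author object, even though the input strings look quite different. A @ref with no matching @id raises a KeyError in this sketch; a real implementation would want a clearer error.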

The third problem can be resolved by EJSON, which adds Date, Binary, and custom types. This too is valid plain-vanilla JSON.
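
The idea can be sketched with Python's json hooks. The wrapper keys used here, "$date" holding milliseconds since the Unix epoch and "$binary" holding base64 text, reflect my understanding of EJSON's encoding and should be checked against the EJSON documentation before relying on them; the helper names and the sample record are hypothetical.

```python
import base64
import json
from datetime import datetime, timezone


def to_ejson(value):
    """json.dumps 'default' hook: wrap types that plain JSON cannot express."""
    if isinstance(value, datetime):
        # Assumed encoding: milliseconds since the Unix epoch.
        return {"$date": round(value.timestamp() * 1000)}
    if isinstance(value, (bytes, bytearray)):
        # Assumed encoding: base64 text.
        return {"$binary": base64.b64encode(value).decode("ascii")}
    raise TypeError(f"cannot encode {type(value).__name__}")


def from_ejson(obj):
    """json.loads 'object_hook': unwrap the single-key wrappers again."""
    if set(obj) == {"$date"}:
        return datetime.fromtimestamp(obj["$date"] / 1000, tz=timezone.utc)
    if set(obj) == {"$binary"}:
        return base64.b64decode(obj["$binary"])
    return obj


record = {
    "published": datetime(2016, 4, 2, tzinfo=timezone.utc),
    "cover": b"\x89PNG",
}
text = json.dumps(record, default=to_ejson)
restored = json.loads(text, object_hook=from_ejson)
```

The string in `text` is ordinary JSON that any parser can read; only a receiver that knows the convention turns the wrappers back into a date and a byte string.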


  1. The IETF standard draft expired January 31, 2013. Work nevertheless proceeds.

  2. AJAJ just doesn’t have the same ring to it.

  3. I will refrain from saying “classy”, especially in the same breath as “Java”.

  4. For example, delimited by Java’s ‘.’ or UML/Ruby’s ‘::’

  5. Breadth first files are pretty flat. Not counting a root object or collection, they are only one or two levels deep. The first level is for objects, and the second level is for references.
