Comparison of JSON Like Serializations – JSON vs UBJSON vs MessagePack vs CBOR

Recently I’ve been working on some extensions to ASEXOR, adding direct support for messaging via WebSocket, and I use JSON for the small messages that travel between the client (browser or standalone) and the backend. The messages look like these:

messages = [
    {'call_id': 1, 'kwargs': {}, 'args': ['sleep', 0.1]},
    {'call_id': 1, 't': 'r', 'returned': 'd53b2823d35b471282ab5c8b6c2e4685'},
    {'call_id': 2, 'kwargs': {'utc': True}, 'args': ['date', '%d-%m-%Y %H:%M %Z']},
    {'call_id': 2, 't': 'r', 'returned': '77da239342e240a0a3078d50019a20a0'},
    {'call_id': 1, 'data': {'status': 'started', 'task_id': 'd53b2823d35b471282ab5c8b6c2e4685'}, 't': 'm'},
    {'call_id': 2, 'data': {'status': 'started', 'task_id': '77da239342e240a0a3078d50019a20a0'}, 't': 'm'},
    {'call_id': 1, 'data': {'status': 'success', 'task_id': 'd53b2823d35b471282ab5c8b6c2e4685', 'result': None, 'duration': 0.12562298774719238}, 't': 'm'},
    {'call_id': 2, 'data': {'status': 'success', 'task_id': '77da239342e240a0a3078d50019a20a0', 'result': '27-02-2017 11:46 UTC', 'duration': 0.04673957824707031}, 't': 'm'}
    
]

I wondered if choosing a different serialization format (similar to JSON, but binary) could make the application more efficient, considering both message size and encoding/decoding time. I ran small tests in Python 3.5 (CPython and PyPy) (see the tests here on gist) with a few established serializers that can be used as a quick replacement for JSON. The results are below (updated December 2nd, 2017, thanks to a comment below, as the situation changed a bit with new library versions):

Format	Total size of all messages (bytes)	10,000 × encode/decode of all messages, CPython 3.5	10,000 × encode/decode of all messages, PyPy 3
JSON (standard library)	798	789 ms	706 ms
JSON (ujson)	798	181 ms	3.14 s
MessagePack (official lib)	591	286 ms	314 ms
MessagePack (umsgpack)	585	435 ms	519 ms
CBOR	585	164 ms	313 ms
UBJSON	668	292 ms	406 ms
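The original test code lives in the gist and is not reproduced here. The following is only a minimal sketch of this kind of measurement, built on my own assumptions: "total size" is the sum of each message encoded separately, and "processing time" is `number` passes of encode plus decode over all messages, timed with `timeit`. Only the standard-library `json` is required. `cbor2` is added only if it happens to be installed, and other codecs can be registered the same way as `(encode, decode)` pairs.

```python
import json
import timeit

# A subset of the messages above; extend the list to measure the full set.
MESSAGES = [
    {'call_id': 1, 'kwargs': {}, 'args': ['sleep', 0.1]},
    {'call_id': 1, 't': 'r', 'returned': 'd53b2823d35b471282ab5c8b6c2e4685'},
    {'call_id': 1, 'data': {'status': 'started',
                            'task_id': 'd53b2823d35b471282ab5c8b6c2e4685'}, 't': 'm'},
]

def encoded_size(encode, messages):
    """Total size in bytes of all messages, each encoded separately."""
    total = 0
    for message in messages:
        data = encode(message)
        if isinstance(data, str):  # json.dumps returns str; binary codecs return bytes
            data = data.encode("utf-8")
        total += len(data)
    return total

def encode_decode_all(encode, decode, messages):
    """One pass: encode and decode every message once."""
    for message in messages:
        decode(encode(message))

def benchmark(codecs, messages, number=10000):
    """Return {name: (total_bytes, seconds)} for `number` passes over all messages."""
    results = {}
    for name, (encode, decode) in codecs.items():
        # Sanity check outside the timed loop: the codec must round-trip losslessly.
        for message in messages:
            if decode(encode(message)) != message:
                raise ValueError("%s changed message %r" % (name, message))
        seconds = timeit.timeit(
            lambda: encode_decode_all(encode, decode, messages), number=number)
        results[name] = (encoded_size(encode, messages), seconds)
    return results

CODECS = {"json": (json.dumps, json.loads)}
try:  # optional: only benchmarked if the cbor2 package is installed
    import cbor2
    CODECS["cbor2"] = (cbor2.dumps, cbor2.loads)
except ImportError:
    pass

if __name__ == "__main__":
    for name, (size, seconds) in benchmark(CODECS, MESSAGES).items():
        print("%-6s %4d bytes %8.1f ms" % (name, size, seconds * 1000))
```

Absolute numbers depend heavily on the machine, the interpreter, and the library versions. Also, `json.dumps` with its default separators produces larger output than compact settings, so sizes may not match the table exactly.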

Since messaging clients can run in a web browser, we can also look at the performance of some serializers in JavaScript on this page. As JSON serialization is part of the browser’s Web API, it is unsurprisingly the fastest there.

All alternative libraries are faster than the standard library JSON, and some (UBJSON and umsgpack) improved significantly since the previous tests. The standard library JSON serializer can easily be replaced by the better-performing ujson package. In the PyPy interpreter the standard library JSON does slightly better, but every other library performs worse, notably ujson (3.14 s versus 181 ms in CPython).
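One way to swap in ujson is a guarded import that falls back to the standard library. This is a sketch: `fast_json`, `encode` and `decode` are hypothetical names. Because ujson was by far the slowest option under PyPy in the table above, the sketch only prefers it on CPython, using `platform.python_implementation()`.

```python
import json
import platform

# Prefer ujson only on CPython, where it was fastest in the tests above;
# under PyPy the standard library json performed much better.
if platform.python_implementation() == "CPython":
    try:
        import ujson as fast_json
    except ImportError:  # ujson not installed: use the standard library
        fast_json = json
else:
    fast_json = json

def encode(message):
    """Serialize a message dict to a JSON string."""
    return fast_json.dumps(message)

def decode(text):
    """Parse a JSON string back into Python objects."""
    return fast_json.loads(text)

message = {'call_id': 1, 'kwargs': {}, 'args': ['sleep', 0.1]}
assert decode(encode(message)) == message
```

ujson is a separate implementation, so its options and edge-case behaviour are not identical to the `json` module. It is worth running your own messages through both before switching.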

Conclusions

JSON is really ubiquitous today, thanks to its ease of use and readability. It’s probably a good choice for many usage scenarios, and luckily JSON serializers show good performance. If message size is of some concern, CBOR looks like a great, almost drop-in replacement for JSON. It offers similar or better performance in Python and 27% smaller messages (585 versus 798 bytes). Its slower performance in the browser is not a big issue, as a browser typically processes only a few messages.
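To see where CBOR’s savings come from, here is a minimal encoder for just the JSON-like types used in these messages. It follows the CBOR specification (RFC 8949) but is only an illustration; a real application would use an existing CBOR library. Every item starts with a 3-bit major type plus a length or value, so small integers and short strings carry just one byte of overhead, with no quotes, colons, or commas. For simplicity, this sketch always writes floats as 64-bit and does not handle tags or integers of 2**64 and above.

```python
import struct

def _head(major, length):
    """CBOR initial byte(s): 3-bit major type plus a length or value."""
    if length < 24:
        return bytes([(major << 5) | length])
    if length < 0x100:
        return bytes([(major << 5) | 24, length])
    if length < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", length)
    if length < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", length)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", length)

def cbor_encode(obj):
    """Encode the JSON-like subset used in the messages (no tags, no big ints)."""
    if obj is None:
        return b"\xf6"
    if obj is True:  # check bools before ints: bool is a subclass of int
        return b"\xf5"
    if obj is False:
        return b"\xf4"
    if isinstance(obj, int):
        # major type 0 = unsigned int, 1 = negative int encoded as -1 - n
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, float):
        return b"\xfb" + struct.pack(">d", obj)  # always 64-bit in this sketch
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(cbor_encode(item) for item in obj)
    if isinstance(obj, dict):
        return _head(5, len(obj)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
    raise TypeError("unsupported type: %s" % type(obj).__name__)
```

For the first message above, `{'call_id': 1, 'kwargs': {}, 'args': ['sleep', 0.1]}`, this gives 39 bytes, versus 52 for `json.dumps` with default separators. The float 0.1 actually costs more in CBOR (9 bytes versus 3 characters), but the savings on keys and structure outweigh it.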

If message size is a major concern, a carefully designed binary protocol (built with Protocol Buffers, for instance) can provide much smaller messages, at an additional development cost.
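Protocol Buffers needs a schema and generated code, so this sketch uses Python’s `struct` module as a dependency-free stand-in to illustrate the idea of a fixed binary layout. The layout is hypothetical and covers only the "status" message: `call_id` as uint32, a one-byte status code (the `STATUS_CODES` mapping is invented here), `task_id` as 16 raw UUID bytes (the ids look like UUID hex strings), and `duration` as a float64.

```python
import struct
import uuid

# Hypothetical fixed layout (big-endian, no padding):
# call_id: uint32 | status: uint8 code | task_id: 16 raw UUID bytes | duration: float64
STATUS_MESSAGE = struct.Struct(">IB16sd")
STATUS_CODES = {"started": 0, "success": 1}  # hypothetical enum
STATUS_NAMES = {code: name for name, code in STATUS_CODES.items()}

def pack_status(call_id, status, task_id, duration):
    """Pack one status message into exactly STATUS_MESSAGE.size bytes."""
    return STATUS_MESSAGE.pack(call_id, STATUS_CODES[status],
                               uuid.UUID(task_id).bytes, duration)

def unpack_status(data):
    """Inverse of pack_status; returns task_id back in hex form."""
    call_id, code, raw_id, duration = STATUS_MESSAGE.unpack(data)
    return call_id, STATUS_NAMES[code], uuid.UUID(bytes=raw_id).hex, duration

packed = pack_status(1, "success", "d53b2823d35b471282ab5c8b6c2e4685",
                     0.12562298774719238)
print(len(packed))  # 29 bytes
```

The packed message is 29 bytes, compared to well over 100 bytes for the JSON form of the similar message above. The price is that both sides must agree on the exact layout, and the variable-length `result` field is simply left out. Handling optional fields and schema evolution is what a tool like Protocol Buffers adds on top.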

5 thoughts on “Comparison of JSON Like Serializations – JSON vs UBJSON vs MessagePack vs CBOR”

Thanks for the great blog post.

Another data format worth checking out is Smile.
For a similar example I get these numbers.

JSON: 744 bytes
Smile: 470 bytes
CBOR: 600 bytes
Msgpack: 586 bytes

The reason Smile is much smaller is its built-in back reference feature. Formats like JSON, CBOR and MessagePack have the problem that they must send the key name with every field. In your example, the JSON, CBOR and MessagePack output all contain the string ‘call_id’ 8 times. Smile, by contrast, writes this string only once and then adds a reference in all the other locations. When you send a lot of similar objects, this can save a lot of bandwidth.

Text from Wikipedia: https://en.wikipedia.org/wiki/Smile_(data_interchange_format)
Compared to JSON, Smile is both more compact and more efficient to process (both to read and write). Part of this is due to more efficient binary encoding (similar to BSON, CBOR and UBJSON), but an additional feature is optional use of back references for property names and values. Back referencing allows replacing of property names and/or short (64 bytes or less) String values with 1- or 2-byte reference ids.
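A rough way to estimate what key back references could save is to count how often each key repeats. This is a simplified model, not Smile’s actual encoding. It assumes each short key costs one header byte plus its UTF-8 length, as in CBOR or MessagePack, and that with back references every repeat of a key costs a single one-byte reference.

```python
from collections import Counter

def count_keys(obj, counter=None):
    """Count how often each dict key occurs, recursing into nested dicts/lists."""
    if counter is None:
        counter = Counter()
    if isinstance(obj, dict):
        for key, value in obj.items():
            counter[key] += 1
            count_keys(value, counter)
    elif isinstance(obj, list):
        for item in obj:
            count_keys(item, counter)
    return counter

def estimated_key_bytes(counter, back_references):
    """Rough byte cost of all keys under the simplified model described above."""
    total = 0
    for key, count in counter.items():
        full = 1 + len(key.encode("utf-8"))  # header byte + key text
        if back_references:
            total += full + (count - 1)      # written once, then 1-byte refs
        else:
            total += full * count            # written out every time
    return total
```

For the eight messages at the top of the post, `count_keys` finds ‘call_id’ 8 times. Under this model, the keys cost 234 bytes without back references and 97 bytes with them.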

admin says:

December 2, 2017 at 23:24

Frankly, at least in Python it does not seem to be a good alternative. I tried pysmile: it’s Python 2.7 only, its last commit was 2 years ago, its performance is very bad (17 s for the same tests), and the message size is 618 bytes.


Nice article. CBOR can really come into its own with numerical payloads, with much bigger gains versus JSON [1]. You are right that it’s almost a drop-in replacement.

[1] http://richardstartin.uk/concise-binary-object-representation/
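A small illustration of why numerical payloads favour CBOR: each 64-bit float costs a fixed 9 bytes (one marker byte plus 8 bytes), while JSON spells out up to 17 significant digits as text. This sketch encodes a list of floats as a CBOR array by hand, following RFC 8949. For simplicity it only handles arrays shorter than 65536 items.

```python
import json
import struct

def cbor_float_array(values):
    """Encode a list of floats as a CBOR array of 64-bit floats."""
    n = len(values)
    if n < 24:
        head = bytes([0x80 | n])               # array header with inline length
    elif n < 0x100:
        head = bytes([0x98, n])                # 1-byte length follows
    elif n < 0x10000:
        head = b"\x99" + struct.pack(">H", n)  # 2-byte length follows
    else:
        raise ValueError("sketch only handles arrays shorter than 65536 items")
    # 0xfb marks an IEEE 754 double, followed by 8 big-endian bytes
    return head + b"".join(b"\xfb" + struct.pack(">d", v) for v in values)

durations = [0.12562298774719238, 0.04673957824707031]
print(len(json.dumps(durations)), len(cbor_float_array(durations)))  # 42 19
```

The gain depends on the values. A short decimal like 0.1 is smaller as JSON text (3 bytes versus 9), so the advantage shows up mainly with full-precision floats.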

Your test data seems to contain some binary data in hexadecimal form (e.g. “returned” and “task_id”). If you stored these as bytes, they would need only half the space of their hex string representation, and any format that supports a binary type would produce a considerably smaller encoded size.

Also, it would appear you didn’t run your tests with the ubjson C-extension compiled. With it, performance should be comparable with (if not slightly better than) Python’s built-in json module (assuming version 0.10.0 or later).

admin says:

December 2, 2017 at 22:37

Thanks, I’ve updated the post with recent results. task_id and returned are basically UUIDs, so a string representation is easiest, but I agree that in a wire protocol they can be converted to bytes if one wants to save space. Anyhow, the messages are rather arbitrary; I just had them at hand when doing the tests. Any other messages can easily be tested in the provided IPython notebook.
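Converting a UUID hex string to bytes is a one-liner with the standard `uuid` module, and the conversion is lossless. The size effect in a format with a binary type can be worked out from the CBOR string header rules (RFC 8949). A 32-character text string needs a 2-byte header, while a 16-byte byte string fits in a 1-byte header. `cbor_string_size` is a hypothetical helper written for this illustration.

```python
import uuid

def cbor_string_size(n):
    """Bytes CBOR needs for a text or byte string of n bytes (header + payload)."""
    if n < 24:
        return 1 + n
    if n < 0x100:
        return 2 + n
    if n < 0x10000:
        return 3 + n
    return 5 + n  # sketch: assumes n < 2**32

task_id = "d53b2823d35b471282ab5c8b6c2e4685"
raw = uuid.UUID(task_id).bytes               # 16 raw bytes instead of 32 hex chars
assert uuid.UUID(bytes=raw).hex == task_id   # converts back to the same hex string
print(cbor_string_size(len(task_id)), cbor_string_size(len(raw)))  # 34 17
```

With six UUID strings in the sample messages, that comes to roughly 6 × 17 = 102 bytes saved in CBOR. Similar savings apply to other formats with a binary type, such as MessagePack. Plain JSON has no binary type, so there the bytes would have to go back into some text encoding.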


Ivanovo


My Digital Bits And Pieces