Estimating JSON size

I’ve been working on a system that makes heavy use of message queuing (RabbitMQ via MassTransit, specifically). Occasionally the system needs to deal with large object graphs that must be processed differently: either broken into smaller pieces of work, or serialized to external storage with a pointer put into the message instead.

The first idea was to serialize all messages to a MemoryStream, but unfortunately this has some limitations, specifically:

  • For smaller messages the entire stream is duplicated due to the way the MassTransit interface works
  • For larger messages a lot of memory is consumed albeit for a short time

For short-lived HTTP requests this is generally not a problem, but for long-running message-queue processing systems we want to be a bit more careful about GC pressure.
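For illustration, the straightforward approach looks something like this (a minimal sketch using System.Text.Json; the message object is a stand-in):

```csharp
using System;
using System.IO;
using System.Text.Json;

var message = new { Name = "Damien", Age = 21 }; // stand-in message

// The whole payload is buffered just to learn its size; ToArray()
// or a second write would copy it again. This is the memory cost
// the rest of this post tries to avoid.
using var buffer = new MemoryStream();
JsonSerializer.Serialize(buffer, message);
Console.WriteLine($"Serialized size: {buffer.Length} bytes");
```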

This led me to two possible solutions.

Introducing LengthOnlyStream

This method is 100% accurate, taking JSON attributes etc. into consideration, yet requires only a few bytes of memory.

Basically it involves a new Stream class that does not record what is written, merely the length.

Pros & cons

  • ✔️ 100% accurate
  • ✔️ Can work with non-JSON serialization too
  • ❌ Still goes through the whole serialization process

Source

internal class LengthOnlyStream : Stream {
    long length;

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => length;
    public override long Position { get; set; }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public void Reset() => length = 0;
    public override long Seek(long offset, SeekOrigin origin) => 0;
    public override void SetLength(long value) => length = value;
    // count is the number of bytes being written; the offset into the
    // caller's buffer is irrelevant to the length we are tracking.
    public override void Write(byte[] buffer, int offset, int count) => length += count;
}

Usage

You can now get the actual serialized size with:

using var countStream = new LengthOnlyStream();
JsonSerializer.Serialize(countStream, damien, typeof(Person), options);
var size = countStream.Length;
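The measured size can then drive the routing decision described at the start. A minimal sketch, assuming the LengthOnlyStream class above; the 256 KB threshold is an illustrative assumption, not a MassTransit or RabbitMQ limit:

```csharp
using System;
using System.Text.Json;

const long largeMessageThreshold = 256 * 1024; // illustrative assumption

var damien = new { Name = "Damien", Roles = new[] { "author", "editor" } };

// Measure the serialized size without buffering the payload.
using var countStream = new LengthOnlyStream();
JsonSerializer.Serialize(countStream, damien);

// Route based on the measured size: inline for small messages,
// offload-and-pointer for large ones.
Console.WriteLine(countStream.Length > largeMessageThreshold
    ? "Offload payload; publish a pointer message"
    : "Publish inline");
```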

Introducing JsonEstimator

In our particular case we don’t need to be 100% accurate and instead would like to minimize the amount of work done as the trade-off. This is where JsonEstimator comes into play:

Pros & cons

  • ❌ Not 100% accurate (ignores JSON attributes)
  • ✔️ Fast and efficient

Source

public static class JsonEstimator
{
    public static long Estimate(object obj, bool includeNulls) {
        if (obj is null) return 4;
        if (obj is Byte || obj is SByte) return 1;
        if (obj is Char) return 3;
        if (obj is Boolean b) return b ? 4 : 5;
        if (obj is Guid) return 38;
        if (obj is DateTime || obj is DateTimeOffset) return 35;
        if (obj is Int16 i16) return i16.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Int32 i32) return i32.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Int64 i64) return i64.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt16 u16) return u16.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt32 u32) return u32.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt64 u64) return u64.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is String s) return s.Length + 2;
        if (obj is Decimal dec) return dec.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Double dou) return dou.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Single sin) return sin.ToString(CultureInfo.InvariantCulture).Length;
        // Order matters: IDictionary is also IEnumerable, so it must be tested first
        if (obj is IDictionary dict) return EstimateDictionary(dict, includeNulls);
        if (obj is IEnumerable enumerable) return EstimateEnumerable(enumerable, includeNulls);

        return EstimateObject(obj, includeNulls);
    }

    static long EstimateEnumerable(IEnumerable enumerable, bool includeNulls) {
        long size = 0;
        foreach (var item in enumerable)
            size += Estimate(item, includeNulls) + 1; // ,
        return size > 0 ? size + 1 : 2;
    }

    static readonly BindingFlags publicInstance = BindingFlags.Instance | BindingFlags.Public;

    static long EstimateDictionary(IDictionary dictionary, bool includeNulls) {
        long size = 2; // { }
        bool wasFirst = true;
        foreach (var key in dictionary.Keys) {
            var value = dictionary[key];
            if (includeNulls || value != null) {
                if (!wasFirst)
                    size++;
                else
                    wasFirst = false;
                size += Estimate(key, includeNulls) + 1 + Estimate(value, includeNulls); // :,
            }
        }
        return size;
    }

    static long EstimateObject(object obj, bool includeNulls) {
        long size = 2;
        bool wasFirst = true;
        var type = obj.GetType();
        var properties = type.GetProperties(publicInstance);
        foreach (var property in properties) {
            if (property.CanRead && property.CanWrite) {
                var value = property.GetValue(obj);
                if (includeNulls || value != null) {
                    if (!wasFirst)
                        size++;
                    else
                        wasFirst = false;
                    size += property.Name.Length + 3 + Estimate(value, includeNulls);
                }
            }
        }

        var fields = type.GetFields(publicInstance);
        foreach (var field in fields) {
            var value = field.GetValue(obj);
            if (includeNulls || value != null) {
                if (!wasFirst)
                    size++;
                else
                    wasFirst = false;
                size += field.Name.Length + 3 + Estimate(value, includeNulls);
            }
        }
        return size;
    }
}

Usage

var size = JsonEstimator.Estimate(damien, true);
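For a simple flat object the estimate can even be exact. A self-contained sketch applying the estimator's rules by hand (the object is a stand-in) and comparing against the real serialized length:

```csharp
using System;
using System.Text.Json;

var person = new { Name = "Damien", Age = 21 };

// Hand-applied estimator rules: 2 for the braces, each property name
// plus 3 (two quotes and a colon), each string plus 2 (quotes),
// integers at their decimal-digit length, 1 for the comma between members.
long estimate = 2
    + ("Name".Length + 3) + ("Damien".Length + 2)
    + 1
    + ("Age".Length + 3) + "21".Length;

var actual = JsonSerializer.Serialize(person).Length;
Console.WriteLine($"estimate: {estimate}, actual: {actual}"); // estimate: 26, actual: 26
```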

Wrap-up

I also had the chance to play with BenchmarkDotNet and to look at various optimizations of the estimator (using a Stack<T> instead of recursion, reducing foreach allocations, and a simpler switch that did not involve pattern matching) but none of them yielded positive results on my test set; I likely need a bigger set of objects to see the real benefits.
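For reference, the shape of such a benchmark is roughly this; a sketch assuming the BenchmarkDotNet NuGet package plus the two classes above, with Person as a hypothetical test type:

```csharp
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Person { public string Name { get; set; } = ""; public int Age { get; set; } }

[MemoryDiagnoser] // report allocations as well as timings
public class SizeBenchmarks
{
    readonly Person damien = new() { Name = "Damien", Age = 21 };

    [Benchmark(Baseline = true)]
    public long ViaLengthOnlyStream()
    {
        using var stream = new LengthOnlyStream();
        JsonSerializer.Serialize(stream, damien);
        return stream.Length;
    }

    [Benchmark]
    public long ViaEstimator() => JsonEstimator.Estimate(damien, true);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<SizeBenchmarks>();
}
```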

Regards,

[)amien

2 responses to Estimating JSON size


  1. Marcin Bluszcz

    Hi, there is probably a bug in the Estimate(…) method. In my case I have nested dictionaries in objects, and using the provided code the calculations come out far too low (~600 bytes when the HTTP response with the JSON object is about ~700 KB). What I found, and what seems to have helped, was changing the order of these two lines:

    if (obj is IEnumerable enumerable) return EstimateEnumerable(enumerable, includeNulls);
    if (obj is IDictionary dict) return EstimateDictionary(dict, includeNulls);
    

    to

    if (obj is IDictionary dict) return EstimateDictionary(dict, includeNulls);
    if (obj is IEnumerable enumerable) return EstimateEnumerable(enumerable, includeNulls);
    

    The reason for this behaviour is obvious: IDictionary is IEnumerable, so the second “if” statement was never reached.

  2. rifat

    Very nice and useful piece of code, thanks. Just wanted to mention that the length calculation of the left-hand side of decimals is wrong.

    Instead of left = (long)Math.Floor(dec % 10), something like left = ((long)dec).ToString(CultureInfo.InvariantCulture).Length would be more precise.