Estimating JSON size

I've been working on a system that makes heavy use of message queuing (RabbitMQ via MassTransit, specifically). Occasionally the system needs to deal with large object graphs that must be processed differently - either broken into smaller pieces of work, or serialized to an external store with a pointer put into the message instead.

The first idea was to serialize all messages to a MemoryStream but unfortunately this has some limitations, specifically:

  • For smaller messages the entire stream is duplicated due to the way the MassTransit interface works
  • For larger messages we waste a lot of memory

For short-lived HTTP requests this is generally not a problem but for long-running message queue processing systems we want to be a bit more careful with GC pressure.
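For reference, the naive approach looks something like this (mirroring the usage snippet later in this post; damien, Person and options stand in for your own object, type and serializer options):

// Serialize into a MemoryStream just to measure it - the backing buffer
// grows and is copied even though only the final length matters.
using var buffer = new MemoryStream();
JsonSerializer.Serialize(buffer, damien, typeof(Person), options);
var size = buffer.Length; // size in bytes of the UTF-8 JSON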

Which led me to two possible solutions:

LengthOnlyStream

This method is 100% accurate - taking into consideration JSON attributes etc. - and yet requires only a few bytes of memory.

Basically it involves a new Stream class that does not record what is written, merely the length of it.

Pros & cons

  • + 100% accurate
  • + Can work with non-JSON serialization too
  • - Still goes through the whole serialization process

Source

using System;
using System.IO;

internal class LengthOnlyStream : Stream {
    long length;

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => length;
    public override long Position { get; set; }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public void Reset() => length = 0;
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => length = value;
    // offset is the index into buffer, so count alone is the number of bytes written
    public override void Write(byte[] buffer, int offset, int count) => length += count;
}

Usage

You can now get the actual serialized size with:

using var countStream = new LengthOnlyStream();
JsonSerializer.Serialize(countStream, damien, typeof(Person), options);
var size = countStream.Length;
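Because the stream holds no buffer, a single instance can be reused between messages via Reset(). A sketch of how that might look in a long-running consumer (the ShouldOffload helper and threshold are hypothetical, and the shared instance is not thread-safe):

static readonly LengthOnlyStream counter = new();

static bool ShouldOffload(object message, JsonSerializerOptions options, long threshold) {
    counter.Reset(); // zero the recorded length before each measurement
    JsonSerializer.Serialize(counter, message, message.GetType(), options);
    return counter.Length > threshold;
}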

JsonEstimator

In our particular case we don't need 100% accuracy and would rather minimize the amount of work done as the trade-off.

This is where JsonEstimator comes into play:

Pros & cons

  • - Not 100% accurate (ignores JSON attributes such as [JsonPropertyName])
  • + Fast and efficient

Source

using System;
using System.Collections;
using System.Globalization;
using System.Reflection;

public static class JsonEstimator
{
    public static long Estimate(object obj, bool includeNulls) {
        if (obj is null) return 4; // "null"
        if (obj is Byte by) return by.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is SByte sby) return sby.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Char) return 3; // one character plus quotes
        if (obj is Boolean b) return b ? 4 : 5; // "true" / "false"
        if (obj is Guid) return 38; // 36 characters plus quotes
        if (obj is DateTime || obj is DateTimeOffset) return 35; // quoted ISO 8601
        if (obj is Int16 i16) return i16.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Int32 i32) return i32.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Int64 i64) return i64.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt16 u16) return u16.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt32 u32) return u32.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt64 u64) return u64.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is String s) return s.Length + 2; // quotes; escaping not estimated
        if (obj is Decimal dec) {
            // Characters left of the decimal point, including any '-' sign
            var left = decimal.Truncate(dec).ToString(CultureInfo.InvariantCulture).Length;
            // The scale byte of the decimal's flags holds the digit count to the right
            var right = BitConverter.GetBytes(decimal.GetBits(dec)[3])[2];
            return right == 0 ? left : left + right + 1; // +1 for the '.'
        }
        if (obj is Double dou) return dou.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Single sin) return sin.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is IDictionary dict) return EstimateDictionary(dict, includeNulls);
        if (obj is IEnumerable enumerable) return EstimateEnumerable(enumerable, includeNulls);

        return EstimateObject(obj, includeNulls);
    }

    static long EstimateEnumerable(IEnumerable enumerable, bool includeNulls) {
        long size = 0;
        foreach (var item in enumerable)
            size += Estimate(item, includeNulls) + 1; // +1 for the ',' after each item
        // The surplus trailing comma stands in for one bracket; +1 adds the other
        return size > 0 ? size + 1 : 2; // "[]" when empty
    }

    static readonly BindingFlags publicInstance = BindingFlags.Instance | BindingFlags.Public;

    static long EstimateDictionary(IDictionary dictionary, bool includeNulls) {
        long size = 2; // { }
        bool wasFirst = true;
        foreach (var key in dictionary.Keys) {
            var value = dictionary[key];
            if (includeNulls || value != null) {
                if (!wasFirst)
                    size++;
                else
                    wasFirst = false;
                size += Estimate(key, includeNulls) + 1 + Estimate(value, includeNulls); // +1 for ':'
            }
        }
        return size;
    }

    static long EstimateObject(object obj, bool includeNulls) {
        long size = 2;
        bool wasFirst = true;
        var type = obj.GetType();
        var properties = type.GetProperties(publicInstance);
        foreach (var property in properties) {
            if (property.CanRead && property.CanWrite) {
                var value = property.GetValue(obj);
                if (includeNulls || value != null) {
                    if (!wasFirst)
                        size++;
                    else
                        wasFirst = false;
                    size += property.Name.Length + 3 + Estimate(value, includeNulls); // 3 = quotes around name plus ':'
                }
            }
        }

        var fields = type.GetFields(publicInstance);
        foreach (var field in fields) {
            var value = field.GetValue(obj);
            if (includeNulls || value != null) {
                if (!wasFirst)
                    size++;
                else
                    wasFirst = false;
                size += field.Name.Length + 3 + Estimate(value, includeNulls); // 3 = quotes around name plus ':'
            }
        }
        return size;
    }
}

Usage

var size = JsonEstimator.Estimate(damien, includeNulls: true);
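In the queuing scenario from the introduction, the estimate could feed the decision between publishing a payload inline and parking it externally. A rough sketch, where StoreExternallyAsync, PersonPointer and the size limit are all hypothetical and bus is a MassTransit IBus:

const long MaxInlineBytes = 256 * 1024; // assumed broker-friendly limit

if (JsonEstimator.Estimate(damien, includeNulls: true) > MaxInlineBytes) {
    var uri = await StoreExternallyAsync(damien); // hypothetical external store
    await bus.Publish(new PersonPointer(uri));    // message carries a pointer instead
}
else {
    await bus.Publish(damien);
}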

Wrap-up

I also had the chance to play with BenchmarkDotNet and to look at various optimizations of the estimator (using a Stack<T> instead of recursion, reducing foreach allocations, and a simpler switch that avoided pattern matching) but none of them yielded positive results on my test set - I likely need a bigger set of objects to see the real benefits.
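If you want to reproduce that comparison, a minimal BenchmarkDotNet harness might look something like this (the Person fixture and CreateRepresentativeGraph are assumed):

using System.Text.Json;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // report allocations alongside timings
public class SizeBenchmarks {
    readonly Person damien = CreateRepresentativeGraph(); // assumed test fixture
    readonly LengthOnlyStream stream = new();

    [Benchmark(Baseline = true)]
    public long Serialize() {
        stream.Reset();
        JsonSerializer.Serialize(stream, damien, typeof(Person));
        return stream.Length;
    }

    [Benchmark]
    public long Estimate() => JsonEstimator.Estimate(damien, includeNulls: true);
}

Run it from a Release build with BenchmarkRunner.Run<SizeBenchmarks>().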

Regards,

[)amien
