Estimating JSON size
I've been working on a system that makes heavy use of message queuing (RabbitMQ via MassTransit, specifically) and which occasionally has to deal with large object graphs that need to be processed differently - either broken into smaller pieces of work, or serialized to an external source with a pointer put into the message instead.
The first idea was to serialize all messages to a MemoryStream but unfortunately this has some limitations, specifically:
a. For smaller messages the entire stream is duplicated due to the way the MassTransit interface works
b. For larger messages we waste a lot of memory
For short-lived HTTP requests this is generally not a problem, but for long-running message queue processing systems we want to be a bit more careful with GC pressure.
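For illustration, the discarded approach looks roughly like this (a minimal sketch reusing the same damien/Person/options as the examples below) - accurate, but every measurement allocates a buffer as large as the full payload:
using var buffer = new MemoryStream();
JsonSerializer.Serialize(buffer, damien, typeof(Person), options);
var size = buffer.Length; // correct size, but the serialized bytes are now also held in memory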
Which led me to two possible solutions:
LengthOnlyStream
This method is 100% accurate - it takes JSON attributes etc. into consideration - and yet only requires a few bytes of memory.
Basically it involves a new Stream class that does not record what is written, merely the length.
Pros & cons
- + 100% accurate
- + Can work with non-JSON serialization too
- - Still goes through the whole serialization process
Source
using System;
using System.IO;

internal class LengthOnlyStream : Stream {
long length;
public override bool CanRead => false;
public override bool CanSeek => false;
public override bool CanWrite => true;
public override long Length => length;
public override long Position { get; set; }
public override void Flush() { }
public override int Read(byte[] buffer, int offset, int count) => throw new NotImplementedException();
public void Reset() => length = 0;
public override long Seek(long offset, SeekOrigin origin) => 0;
public override void SetLength(long value) => length = value;
public override void Write(byte[] buffer, int offset, int count) => length += count; // count bytes are written regardless of the buffer offset
}
Usage
You can now get the actual serialized size with:
using var countStream = new LengthOnlyStream();
JsonSerializer.Serialize(countStream, damien, typeof(Person), options);
var size = countStream.Length;
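Because the stream never stores any data, a single instance can also be reused between measurements via Reset() - for example (people and maxInlineSize here are hypothetical placeholders for whatever your pipeline provides):
var countStream = new LengthOnlyStream();
foreach (var person in people) {
    countStream.Reset();
    JsonSerializer.Serialize(countStream, person, typeof(Person), options);
    if (countStream.Length > maxInlineSize) {
        // too large to inline - break it up, or store it externally and send a pointer
    }
}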
JsonEstimator
In our particular case we don't need to be 100% accurate; instead we'd like to minimize the amount of work done as the trade-off.
This is where JsonEstimator comes into play:
Pros & cons
- - Not 100% accurate (ignores Json attributes)
- + Fast and efficient
Source
using System;
using System.Collections;
using System.Globalization;
using System.Reflection;

public static class JsonEstimator
{
public static long Estimate(object obj, bool includeNulls) {
if (obj is null) return 4;
if (obj is Byte || obj is SByte) return 1;
if (obj is Char) return 3;
if (obj is Boolean b) return b ? 4 : 5;
if (obj is Guid) return 38;
if (obj is DateTime || obj is DateTimeOffset) return 35;
if (obj is Int16 i16) return i16.ToString(CultureInfo.InvariantCulture).Length;
if (obj is Int32 i32) return i32.ToString(CultureInfo.InvariantCulture).Length;
if (obj is Int64 i64) return i64.ToString(CultureInfo.InvariantCulture).Length;
if (obj is UInt16 u16) return u16.ToString(CultureInfo.InvariantCulture).Length;
if (obj is UInt32 u32) return u32.ToString(CultureInfo.InvariantCulture).Length;
if (obj is UInt64 u64) return u64.ToString(CultureInfo.InvariantCulture).Length;
if (obj is String s) return s.Length + 2;
if (obj is Decimal dec) {
// digits to the left of the decimal point (a leading '-' is included by ToString)
var left = decimal.Truncate(dec).ToString(CultureInfo.InvariantCulture).Length;
// the scale byte from decimal.GetBits holds the number of digits to the right of the point
var right = BitConverter.GetBytes(decimal.GetBits(dec)[3])[2];
return right == 0 ? left : left + right + 1; // +1 for the '.'
}
if (obj is Double dou) return dou.ToString(CultureInfo.InvariantCulture).Length;
if (obj is Single sin) return sin.ToString(CultureInfo.InvariantCulture).Length;
if (obj is IDictionary dict) return EstimateDictionary(dict, includeNulls);
if (obj is IEnumerable enumerable) return EstimateEnumerable(enumerable, includeNulls);
return EstimateObject(obj, includeNulls);
}
static long EstimateEnumerable(IEnumerable enumerable, bool includeNulls) {
long size = 0;
foreach (var item in enumerable)
size += Estimate(item, includeNulls) + 1; // ,
return size > 0 ? size + 1 : 2;
}
static readonly BindingFlags publicInstance = BindingFlags.Instance | BindingFlags.Public;
static long EstimateDictionary(IDictionary dictionary, bool includeNulls) {
long size = 2; // { }
bool wasFirst = true;
foreach (var key in dictionary.Keys) {
var value = dictionary[key];
if (includeNulls || value != null) {
if (!wasFirst)
size++;
else
wasFirst = false;
size += Estimate(key, includeNulls) + 1 + Estimate(value, includeNulls); // :,
}
}
return size;
}
static long EstimateObject(object obj, bool includeNulls) {
long size = 2;
bool wasFirst = true;
var type = obj.GetType();
var properties = type.GetProperties(publicInstance);
foreach (var property in properties) {
if (property.CanRead && property.CanWrite) {
var value = property.GetValue(obj);
if (includeNulls || value != null) {
if (!wasFirst)
size++;
else
wasFirst = false;
size += property.Name.Length + 3 + Estimate(value, includeNulls); // 3 = two quotes around the name plus ':'
}
}
}
var fields = type.GetFields(publicInstance);
foreach (var field in fields) {
var value = field.GetValue(obj);
if (includeNulls || value != null) {
if (!wasFirst)
size++;
else
wasFirst = false;
size += field.Name.Length + 3 + Estimate(value, includeNulls);
}
}
return size;
}
}
Usage
var size = JsonEstimator.Estimate(damien, true);
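Either number then feeds the original decision - something along these lines (payload, maxInlineSize and the offload step are hypothetical placeholders, not MassTransit API):
var size = JsonEstimator.Estimate(payload, true);
if (size <= maxInlineSize) {
    // small enough - publish the message as-is
} else {
    // too large - break it into smaller pieces of work, or serialize it to
    // external storage and publish a pointer to it instead
}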
Wrap-up
I also had the chance to play with Benchmark.NET and to look at various optimizations of the estimator (using a Stack<T> instead of recursion, reducing foreach allocations, and a simpler switch that did not involve pattern matching) but none of them yielded positive results on my test set - I likely need a bigger set of objects to see the real benefits.
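For reference, a benchmark comparing the two approaches can be set up roughly like this (a sketch only - SamplePayload is a placeholder for whatever test objects you measure against, and it is run with BenchmarkRunner.Run<SizeBenchmarks>()):
using BenchmarkDotNet.Attributes;
using System.Text.Json;

[MemoryDiagnoser]
public class SizeBenchmarks {
    readonly SamplePayload payload = new();
    readonly LengthOnlyStream countStream = new();

    [Benchmark]
    public long SerializeToLengthOnlyStream() {
        countStream.Reset();
        JsonSerializer.Serialize(countStream, payload, typeof(SamplePayload));
        return countStream.Length;
    }

    [Benchmark]
    public long EstimateOnly() => JsonEstimator.Estimate(payload, true);
}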
Regards,
[)amien