Posts in category .net

Model binding form posts to immutable objects

I’ve been working on porting over my blog to a static site generator and fired up an Azure Function to handle the form-comment to PR process to enable user comments to still be part of the site without using a 3rd party commenting system - more on that in a future post - and found the ASP.NET model binding for form posts distinctly lacking.

It’s been great getting back into .NET and brushing up some skills making the code clear, short and reusable. What I wanted was a super-clear action on my controller that tried to collect, validate and sanitize the data then if all was well create the pull request or report errors.

Ideally it would look like this;

[FunctionName("PostComment")]
public static async Task<HttpResponseMessage> Run([HttpTrigger(AuthorizationLevel.Anonymous, "post")] HttpRequestMessage request) {
    var form = await request.Content.ReadAsFormDataAsync();
    if (TryCreateComment(form, out Comment comment, out var errors))
        await CreateCommentAsPullRequest(comment);
    return request.CreateResponse(errors.Any()
      ? HttpStatusCode.BadRequest : HttpStatusCode.OK, String.Join("\n", errors));
}

To do that however we need a function capable of creating the Comment class from the form post. Sure you can manually do it field by field but that’s not very reusable, repetitive and of course no fun. The Comment class is also - like all good little objects - immutable.

Creating a function to do this is simple with a little bit of reflection;

private static object ConvertParameter(string parameter, Type targetType) {
    return String.IsNullOrWhiteSpace(parameter)
           ? null : TypeDescriptor.GetConverter(targetType).ConvertFrom(parameter);
}

private static bool TryCreateCommentFromForm(NameValueCollection form, out Comment comment, out List<string> errors) {
    var constructor = typeof(Comment).GetConstructors()[0];
    var values = constructor.GetParameters()
                            .ToDictionary(p => p.Name, p => ConvertParameter(form[p.Name], p.ParameterType)
                                      ?? (p.HasDefaultValue ? p.DefaultValue : new MissingRequiredValue()));
    errors = values.Where(p => p.Value is MissingRequiredValue)
                   .Select(p => $"Form value missing for '{p.Key}'").ToList();
    comment = errors.Any() ? null : (Comment)constructor.Invoke(values.Values.ToArray());
    return !errors.Any();
}

What this does is grab the constructor for the Comment and try to find keys in the form that match the parameter name. Any that are missing are reported as errors unless they have a default value in which case that is used. MissingRequiredValue is just an empty object to act as a sentinel. The use of TypeDescriptor.GetConverter means it should be quite happy handling ints, decimals, urls etc.

The constructor for Comment specifies which fields are required and the parameter names must match the form field names by convention. Any value that is optional has a default value that the constructor will happily fill in a sensible default for.

public Comment(string post_id, string message, string author, string email, 
    DateTime? date = null, Uri url = null, int? id = null, string gravatar = null) {
    this.post_id = pathValidChars.Replace(post_id, "-");
    this.message = message;
    this.author = author;
    this.email = email;
    this.date = date ?? DateTime.UtcNow;
    this.url = url;
    this.id = id ?? new { this.post_id, this.author, this.message, this.date }.GetHashCode();
    this.gravatar = gravatar ?? EncodeGravatar(email);
}

I’ll post more of the form commenting system source soon once it’s a bit better tested and I look into anti-spam integration. Ideally I’ll also provide an AWS Lambda variant of the code so you can choose (or load balance) comment posting and almost certainly get what you need on the free tier. For now the Jekyll rendering templates and WordPress exporter are available.

[)amien

Table per hierarchy in Azure Table Storage

If you’re coming from an ORM background to Azure Table Storage you might be wondering how to map class hierarchies to tables.

Documentation on the topic is hard to find unless you know the magic class name EntityResolver which you can discover by digging into the Azure Client for .NET source code.

Let’s say we have a basic blog style system (minimal fields shown):

public class Content {
  public string Id { get; set; }
  public string Title { get; set }
}

public class BlogPost : Content {
  public List<string> Topics { get; set; }
}

public class Page : Content {
  public String Slug { get; set; }
}

The trick is to create an instance of EntityResolver where T is your base class, e.g. Content. Strangely EntityResolver’s signature requires T implement new() so you can’t make your base class abstract.

Firstly we need to add to our base class some kind of identifier for the type – in ORM terms this is referred to as a discriminator. Then we’d override that in the sub-types to ensure new instances get the correct type set on insertion.

public class Content {
  public string Id { get; set; }
  public string Title { get; set }
}

public class BlogPost : Content {
  public List<string> Topics { get; set; }
}

public class Page : Content {
  public String Slug { get; set; }
}

Let’s say we want to store all of these in a table called ‘content’. We would typically write a small helper class to handle the cloud table and storage, e.g.

public class Content {
  public string Id { get; set; }
  public string Title { get; set }
  public virtual string ContentType { get; set; }
}

public class BlogPost : Content {
  public List<string> Topics { get; set; }
  public override string ContentType {
    get { return "blog"; }
    set { }
  }
}

public class Page : Content {
  public String Slug { get; set; }
  public override string ContentType {
    get { return "page"; }
    set { }
  }
}

With just that change you can actually start inserting rows into Azure Table Storage but querying them back will always result in Content types and saving those back will result in data loss!

We can however help the CloudTable client materialize the correct results by creating an EntityResolver:

EntityResolver<Content> contentResolver(partitionKey, rowKey, timestamp, properties, etag) {
  var contentType = properties["ContentType"].StringValue;
  switch (contentType) {
    case "blog": return new BlogPost();
    case "page": return new Page();
    default: throw new NotSupportedException(String.Format("Unknown ContentType '{0}'", contentType));
  }
}

Which is then passed into operations that materialize results. Note that some signatures don’t accept a resolver so find one that does even if it means supplying a default OperationContent. For example:

var query = table.CreateQuery<Content>().Where(c => c.PartitionKey == yearMonth);
var results = query.ExecuteQuery(query.AsTableQuery(), contentResolver, myRequestOptions, myOperationContext);

Given that these entity resolvers are essential to correctly materializing your results without data loss it’s worth wrapping the CloudTable client with the necessary setup/table-creation/entity resolver.

[)amien

Sequence averages in Scala

I’ve been learning Scala and decided to put together a C# to Scala cheat sheet. All is going pretty well but then I got stuck on the equivalent of Average.

Enumerable.Average in .NET calculates a mean average from your sequence by summing up all the values and counting them in a single pass then returning the sum divided by the count in a floating point format (or decimal).

The problem

Given that Scala has nothing built-in there are more than a few suggestions online that boil down to:

val average = seq.sum / seq.length

This has a few problems:

  1. Visiting a sequence twice can be inefficient
  2. Sum can overflow as it is the same type as the sequence
  3. Applied to an integer without casting it returns an integer average

A solution

Scala provides a useful high-order function called foldLeft. Its job is to take an initial state and a function then keep applying the function with each value to the state. So one more efficient solution to the problem is:

val average = seq.foldLeft((0.0, 1)) ((acc, i) => ((acc._1 + (i - acc._1) / acc._2), acc._2 + 1))._1

How does this work?

What we do here is calculate an average as we go, adding the new weighted average each time.

It achieves this by setting up a tuple to contain our initial state with (0.0, 1). This specifies our starting average of 0.0 and our starting position of 1.

The next part specifies the function that takes that state as acc (for accumulator) and the next value in the sequence as i and calculates our rolling average for each value and increases the position as it goes along.

Finally at the end of our call we specify ._1 which tells the compiler we want the first value from the tuple – the average – as we no longer care about the position.

If you wanted to make this function more reusable you could do this:

def average(s: Seq[Int]): Double = s.foldLeft((0.0, 1)) ((acc, i) => ((acc._1 + (i - acc._1) / acc._2), acc._2 + 1))._1

Be aware you might need multiple overloads for each numeric sequence type you want to be able to average given the lack of a common numeric trait that allows for the subtraction and division.

Precision and rounding

There is some slight variance in results between this approach and the total / count due to rounding precision. If you wanted to preserve that you could always add and then divide at the end still in a single pass much like .NET does but with Scala’s foldLeft rather than a foreach.

def average(s: Seq[Int]): Double = { val t = s.foldLeft((0.0, 0)) ((acc, i) => (acc._1 + i, acc._2 + 1)); t._1 / t._2 }

[)amien

Optimizing Sum, Count, Min, Max and Average with LINQ

LINQ is a great tool for C# programmers letting you use familiar syntax with a variety of back-end systems without having to learn another language or paradigm for many query operations.

Ensuring that the queries still perform well can be a bit of a chore and one set that fails quite badly are the aggregate operations when you want more than one.

Multiple sequential queries (bad)

var count = db.Invoices.Count();
var total = db.Invoices.Sum(i => i.Paid);
var average = db.Invoices.Average(i => i.Paid);

Will issue three separate requests. There is nothing a LINQ provider can do to optimize that pattern as they are three discrete statements.

Background

If we wanted these values by country we could do this in LINQ:

var a = db.Invoices.GroupBy(i => i.Country)
      .Select(g => new { Country = g.Key,
           Count = g.Count(),
           Total = g.Sum(i => i.Paid),
           Average = g.Average(i => i.Paid) });

Which gets us everything in a single statement broken down by country. In SQL this is:

SELECT Country, Count(*), Sum(Paid), Average(Paid)
    FROM Invoices GROUP BY Country

Many data sources including SQL are happy to provide aggregate values without a group by so how do we generate that from LINQ?

In the absence of a Group method that doesn’t take a property we need to fake it and because of the way many LINQ providers optimize out parts of the tree we can:

Single optimized query (good)

Replacing the property in a GroupBy with a constant value gives us an optimized single query:

var a = db.Invoices.GroupBy(i => 1)
    .Select(g => new { Count = g.Count(),
               Total = g.Sum(i => i.Paid),
               Average = g.Average(i => i.Paid) });

Here are the providers I’ve tried:

  • LINQ to Objects (Works although constant is likely evaluated)
  • LINQ to SQL (Works although passes 1 parameter to SQL)
  • Entity Framework 6 (Works although query is a little obscure)
  • ElasticLINQ (Works and optimizes out totally)

Count+Where optimizations

If we are performing counts with a predicate or against a where we can also optimize these.

var high = db.Invoices.Count(i => i.Paid >= 1000);
var low = db.Invoices.Where(i => i.Paid < 1000).Count();
var sum = db.Invoices.Sum(i => i.Paid);

Then we can express this as:

var a = db.Invoices.GroupBy(g => 1)
    .Select(g => new { High = g.Count(i => i.Paid >= 1000),
                   Low = g.Count(i => i.Paid < 1000),
                   Sum = g.Sum(i => i.Paid) });

[)amien