Archive for .NET category

Sequence averages in Scala

I’ve been learning Scala and decided to put together a C# to Scala cheat sheet. All is going pretty well but then I got stuck on the equivalent of Average.

Enumerable.Average in .NET calculates a mean average from your sequence by summing up all the values and counting them in a single pass then returning the sum divided by the count in a floating point format (or decimal).

The problem

Given that Scala has nothing built-in there are more than a few suggestions online that boil down to:

val average = seq.sum / seq.length

This has a few problems:

  1. Visiting a sequence twice can be inefficient
  2. Sum can overflow as it is the same type as the sequence
  3. Applied to an integer without casting it returns an integer average

A solution

Scala provides a useful high-order function called foldLeft. Its job is to take an initial state and a function then keep applying the function with each value to the state. So one more efficient solution to the problem is:

val average = seq.foldLeft((0.0, 1)) ((acc, i) => ((acc._1 + (i - acc._1) / acc._2), acc._2 + 1))._1

How does this work?

What we do here is calculate an average as we go, adding the new weighted average each time.

It achieves this by setting up a tuple to contain our initial state with (0.0, 1). This specifies our starting average of 0.0 and our starting position of 1.

The next part specifies the function that takes that state as acc (for accumulator) and the next value in the sequence as i and calculates our rolling average for each value and increases the position as it goes along.

Finally at the end of our call we specify ._1 which tells the compiler we want the first value from the tuple – the average – as we no longer care about the position.

If you wanted to make this function more reusable you could do this:

def average(s: Seq[Int]): Double = s.foldLeft((0.0, 1)) ((acc, i) => ((acc._1 + (i - acc._1) / acc._2), acc._2 + 1))._1

Be aware you might need multiple overloads for each numeric sequence type you want to be able to average given the lack of a common numeric trait that allows for the subtraction and division.

Precision and rounding

There is some slight variance in results between this approach and the total / count due to rounding precision. If you wanted to preserve that you could always add and then divide at the end still in a single pass much like .NET does but with Scala’s foldLeft rather than a foreach.

def average(s: Seq[Int]): Double = { val t = s.foldLeft((0.0, 0)) ((acc, i) => (acc._1 + i, acc._2 + 1)); t._1 / t._2 }

[)amien

Optimizing Sum, Count, Min, Max and Average with LINQ

LINQ is a great tool for C# programmers letting you use familiar syntax with a variety of backend systems without having to learn another language or paradigm for many query operations.

Ensuring that the queries still perform well can be a bit of a chore and one set that fails quite badly are the aggregate operations when you want more than one.

Multiple sequential queries (bad)

var count = db.Invoices.Count();
var total = db.Invoices.Sum(i => i.Paid);
var average = db.Invoices.Average(i => i.Paid);

Will issue three separate requests. There is nothing a LINQ provider can do to optimize that pattern as they are three discrete statements.

Background

If we wanted these values by country we could do this in LINQ:

var a = db.Invoices.GroupBy(i => i.Country)
          .Select(g => new { Country = g.Key,
                             Count = g.Count(),
                             Total = g.Sum(i => i.Paid),
                             Average = g.Average(i => i.Paid) });

Which gets us everything in a single statement broken down by country. In SQL this is:

SELECT Country, Count(*), Sum(Paid), Average(Paid)
    FROM Invoices GROUP BY Country

Many data sources including SQL are happy to provide aggregate values without a group by so how do we generate that from LINQ?

In the absence of a Group method that doesn’t take a property we need to fake it and because of the way many LINQ providers optimize out parts of the tree we can:

Single optimized query (good)

Replacing the property in a GroupBy with a constant value gives us an optimized single query:

var a = db.Invoices.GroupBy(i => 1)
          .Select(g => new { Count = g.Count(),
                             Total = g.Sum(i => i.Paid),
                             Average = g.Average(i => i.Paid) });

Here are the providers I’ve tried:

  • LINQ to Objects (Works although constant is likely evaluated)
  • LINQ to SQL (Works although passes 1 parameter to SQL)
  • Entity Framework 6 (Works although query is a little obscure)
  • Elasticsearch (Works and optimizes out totally)

Count+Where optimizations

If we are performing counts with a predicate or against a where we can also optimize these.

var high = db.Invoices.Count(i => i.Paid >= 1000);
var low = db.Invoices.Where(i => i.Paid < 1000).Count();
var sum = db.Invoices.Sum(i => i.Paid);

Then we can express this as:

var a = db.Invoices.GroupBy(g => 1)
          .Select(g => new { High = g.Count(i => i.Paid >= 1000),
                             Low = g.Count(i => i.Paid < 1000),
                             Sum = g.Sum(i => i.Paid) });

[)amien

8 Visual Studio debugging tips &#8211; debug like a boss

There are so many useful debugging features built into Visual Studio that aren’t well known. Here are a few my favorites including some recent finds in VS 2013.

1. Breakpoint inside a lambda

If you click the left gutter to set breakpoints you could be easily mislead into thinking breakpoints happen at line level.

You can actually insert a breakpoint inside parts of the line such as inside a lambda in your LINQ expression. Just right-click the part of the code and choose Breakpoint > Insert breakpoint from the context menu.

2. Usable output window

The output window is useful for debugging where breakpoints would be too invasive or interrupt flow but it’s pretty noisy.

Just right click in the output window (make sure output is set to debug) and turn off the Module Load, Module Unload, Process Exit and Thread Exit to leave you with stuff you actually care about. Now Debug.WriteLine to your heart’s content.

Visual Studio output window filtering options

You can also press Ctrl-S in the output window to save the contents.

3. Attach debugger to client and server (VS 2012)

It’s useful to have both server and client projects in a single solution so you only need one copy of Visual Studio running and don’t get lost alt-tabbing back and forth especially if they share common code such as a data model project.

One disadvantage is that the start-up project is the only one to get a debugger attached. If you encounter an exception it will show in your client not your server project.

That’s easily solved now. Right-click on the solution, choose properties and choose Multiple startup projects then select the Start action for the projects you need to attach to.

Visual Studio Solution properties dialog

4. Create a repro project template

If you’re responsible for a SDK or API create a simple application that uses your stuff in a small self-contained way. Then use File > Export template… to save it.

Now you can create a new project from your template whenever you need it with a few clicks. Even better make it available to users and testers so they can send you minimal repros.

5. Use the DebuggerDisplay attribute

By default the debugger will use ToString() for watch and auto windows which normally outputs class name. Even if you overwrote ToString it’s probably not what somebody debugging wants to see at a glance.

Add DebuggerDisplay to your class with a simple expression to evaluate properties instead. e.g.:

[DebuggerDisplay("Order {ID,nq}")
class Order {
    public string ID { get { return id; } }
}

The “nq” prevents double-quotes from being emitted. You can also use methods here too but don’t do anything with subtle side-effects otherwise your observation of the subject will change its behavior and could cause weird issues.

6. Manage breakpoints

You set-up some interesting breakpoints and now you need to switch one off for as it’s getting hit too much but you’ll need it again in a minute. If you remove the breakpoint you’ll have to come back and find it again.

Enter the much-overlooked Breakpoints window (Ctrl-Alt-B). This will show all breakpoints you have set but crucially lets you disable them without unsetting them by simply removing the check-mark. Check it again to re-enable it.

Visual Studio breakpoints window

This window also provides the ability to quickly:

  • Condition when a breakpoint should occur
  • Hit count to see how often it is hit and to only break based on that count
  • Label a breakpoint to allow toggling on and off in batches
  • When Hit to put a message in the output window instead of actually breaking

7. Break on or output the caller information (.NET 4.5/Windows 8 Store)

There isn’t a global variable for the current method of the caller and getting the current stack can be a very slow operation.

One quick and simple trick is to add an extra optional string parameter to the method with the CallerMemberName attribute. e.g.

void MyFunction(string someValue, [CallerMemberName] string caller = null) {
    ...
}

Because it is an optional value you don’t need to modify any callers but you can now:

  1. Set a breakpoint condition inside DoSomething based on the caller variable
  2. Output the contents of caller to a log or output window

You can also use CallerLineNumber and CallerFilePath. Also remember that constructors, finalizers and operator overloads will display their underlying method names (.ctor, op_Equals etc).

8. See the value returned by a function (VS 2013, .NET 4.5.1/Windows 8.1 Store)

Sometimes you want to see what a function returned but you can’t easily because you didn’t store the value because it was the input to another function.

This was added in VS 2013 but is incredibly easy to miss as you have to be in the right place at the right time. The right place is the Autos window and the right time is exactly the step that returned you to where the function was called from. You won’t see this before you call the function or while in the function. It’s there for a single step and looks like this:

Visual Studio autos window

The arrow icon indicates it’s a return value and it lets you know the name of the function alongside it.

Wrap up

I also can’t stress enough how useful having logs are for troubleshooting once the software leaves your machine but that’s a much bigger discussion than this one.

Am I missing some great debugging tips? Feel free to let me know below :)

PS: Michael Parshin has some great tips on debugging too.

[)amien

5 simple steps to publishing a NuGet package

There is a fair amount of info on making and publishing NuGet packages but I couldn’t find a simplified guide for the simple case. Here it is and start by downloading nuget.exe and putting it in your path.

1. Multi-platform considerations (optional)

Supporting multiple platforms gives you a choice to make:

  1. Portable Class Library (PCL)
  2. One project with MSBuild magic
  3. Multiple projects

If you can go with PCL do it. For CSharpAnalytics we use platform-specific system info and hooks so it’s not an option – we went with multiple projects.

Multiple projects

Creating a separate .csproj for each platform and putting in the same folder means adding files isn’t too painful (show all files then include the ones you need) but you do need to take steps to make sure the build process for the projects don’t interfere with each other by separating the bin and obj paths:

  1. Set the output path in the Build tab of project properties to be unique per configuration to for the bin files, e.g. “bin\Net45\Release\”
  2. Edit the .csproj file adding a BaseIntermediateOutputPath tag for obj files, e.g. <BaseIntermediateOutputPath>obj\Net45</BaseIntermediateOutputPath>

2. Create your .nuspec definition

Now that you know which release dll files you need to include you can go ahead and create the nuspec file that tells nuget how to package your files up.

Open a PowerShell and type nuget spec to create you an XML file to edit in your text editor

Once you’ve entered your author details, a snappy description and links to your project page and license you can then add the files. Libraries will want to copy the .dlls into the lib folder with element like these:

<file src="..\bin\Net45\Release\MyLibrary.dll" target="lib\net45" />

Each platform will require a specific target and they should use platform name (e.g. net45, sl5, windows8) described in the NuSpec creating packages documentation. That page has a lot more detail on things such as content file types etc.

If you prefer a graphical UI then NuGet Package Explorer will make your life easier.

Remember to check your .nuspec file to source control (there is nothing private in it) and add it to your solution as a solution item so it doesn’t get missed.

3. Create your .nupkg package

The easiest part of the process. From PowerShell type:

nuget pack yourfile.nuspec

If all goes well it will create yourfile.nupkg.

4. Test your package

Just because your package was created doesn’t mean it works and you don’t want to publish to the world until you know it works especially given you can’t delete packages from NuGet:

  1. Create a folder to be your own private testing NuGet repository, e.g. c:\testnuget
  2. Publish to your test repository with nuget publish yourfile.nuspec -source c:\testnuget
  3. Configure Visual Studio to use your test repository by going to Tools > Library Package Manager > Package Manager Settings > Package Sources and then adding your test folder to the Available package sources test
  4. Create a new test application and then add a reference using Manage NuGet Packages to choose your new package from your test repository.
  5. Write a few lines of code to test you can actually use your package ok!

5. Publish to the world

Okay, you’re now ready to publish. If you haven’t yet signed up for an account over at Nuget.org you’ll need to do that first.

  1. Go to Your Account and copy your API key
  2. Run the PowerShell command nuget setApiKey followed by your API key, e.g. nuget setApiKey 99995594-38d2-42cd-a8b1-ddcd722bb7e7
  3. Run nuget publish yourfile.nuspec again this time without the -source option to publish to the default public repository

[)amien