URL parsing and manipulation in .NET

While .NET has a Uri class, it’s not great. It doesn’t expose the various parts as properties, and you can’t manipulate them either. Many projects I’ve seen (Subtext included) just try to manipulate URLs as strings, with varying degrees of success.

Here’s a C# .NET URL decoder built around a regular expression. I wrote it with performance in mind, based on a VBScript class I developed a while back. Comments and white space have been removed to keep it short.

There were some odd bugs in this class. It is currently being revised and unit tests are being written. It should turn up later this month (June 2011).
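Until the revised class turns up, here is a minimal sketch of the regex approach. It is not the original class: the `UrlParts` type and its property names are made up for illustration, and the pattern is the corrected one posted in the comments. It has not been checked against awkward inputs (an “@” inside a query string, for example, can confuse it).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration only; not the class this post originally listed.
public sealed class UrlParts
{
    // Pattern taken from the corrected version in the comments.
    // Groups: 1 scheme, 3 user name, 5 password, 6 host, 8 port,
    //         9 path, 11 query, 13 fragment.
    private static readonly Regex UrlDecodeRegex = new Regex(
        @"([^:]+)://(([^:@]+)(:([^@]+))?@)?([^:/?#]+)(:([\d]+))?([^?#]+)?(\?([^#]+))?(#(.*))?",
        RegexOptions.Compiled);

    public string Scheme { get; private set; }
    public string UserName { get; private set; }
    public string Password { get; private set; }
    public string Host { get; private set; }
    public string Port { get; private set; }
    public string Path { get; private set; }
    public string Query { get; private set; }
    public string Fragment { get; private set; }

    public static UrlParts Parse(string url)
    {
        Match m = UrlDecodeRegex.Match(url);
        if (!m.Success)
            throw new FormatException("Not a recognisable URL: " + url);

        // Optional groups that did not take part in the match yield "".
        return new UrlParts
        {
            Scheme = m.Groups[1].Value,
            UserName = m.Groups[3].Value,
            Password = m.Groups[5].Value,
            Host = m.Groups[6].Value,
            Port = m.Groups[8].Value,
            Path = m.Groups[9].Value,
            Query = m.Groups[11].Value,
            Fragment = m.Groups[13].Value
        };
    }
}
```

Calling `UrlParts.Parse(...)` on a full URL gives you each part as a plain string; parts that are absent come back as empty strings rather than null.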

[)amien

16 responses  

  1. I linked from DotNetKicks. This is cool. How would I use this to grab the URL without the QueryString?

    John Rummell – July 10th, 2006
  2. Yeah sorry there are no usage samples – I did write a few but my blog software started activating the anti-spam on me :D
    To get the current URL:

    URL myURL = new URL(Page.Request.Url.ToString());

    You could remove all the query string with:

    myURL.QueryString = "";

    Alternatively, if you just wanted to set one item, say a customerID, you would normally have to hope it didn’t exist and add it, taking care to append “&” or “?” depending on whether there are other query values. With this class you can just:

    myURL.Query["CustomerID"] = "1";

    Oh, and to write out your hyperlink you’d do something like:

    nextCustomerLink.Href = myURL.FullURL;

    [)amien

    Damien Guard – July 10th, 2006
  3. You should consider making it Serializable. That’s one issue I’ve had with the existing Uri class.

    Haacked – July 17th, 2006
  4. I just checked and Uri in .NET 2.0 is serializable.

    Haacked – July 17th, 2006
  5. The Uri class in .NET is pretty useless. It doesn’t include username/password properties, the query part is a string instead of a collection of name/value pairs etc.

    [)amien

    Damien Guard – July 18th, 2006
  6. Great job, thanks!

    Fabian – July 19th, 2006
  7. Mike – March 17th, 2008
  8. @Mike: Well you could, but if that’s all you want you may as well just grab the string up to the final / with urlString.Substring(0, urlString.LastIndexOf('/'))

    [)amien

    Damien Guard – April 11th, 2008
  9. This works well, but any attempts that I make to use an imported query string don’t seem to work. I can build a query string and export it OK, but importing one fails.

    Is there a download of this as a .cs class file?

    Russ – May 2nd, 2008
  10. I found the problem, it’s in the regular expression.. Here’s the new line.

    private const string urlDecodeRegex = @"([^:]+)://(([^:@]+)(:([^@]+))?@)?([^:/?#]+)(:([d]+))?([^?#]+)?(\?([^#]+))?(#(.*))?";

    Russ – May 2nd, 2008
  11. Thanks for spotting and figuring that out, Russ – I must have broken it during the reformatting exercise when I switched to this theme.

    [)amien

    Damien Guard – May 3rd, 2008
  12. It’s also remarkably intolerant of syntax problems. There’s also a problem with the Uri class (in .NET 2.0 Compact Framework at least) where creating it with a string that has leading spaces causes memory corruption. Lovely. We have our own Uri class to fix this (and other) problems. Backward slashes resulting in FormatExceptions is another one that comes to mind. Quite possibly all this is fixed in new shiny .NET versions.

    So, how long is it till V-day? :)

    MrToes – May 12th, 2008
  13. Nice, but it does not do one thing: splitting the host from the domain,
    so that if I enter reag123.afsda123.ukdomain.co.uk it returns ukdomain.co.uk.

    Does anyone have any idea how I can do it?

    Ziv Rozzbach – October 5th, 2008
  14. Hello, I think the code has a bug and I had to rewrite it like this:
    private const string urlDecodeRegex = @"([^:]+)://(([^:@]+)(:([^@]+))?@)?([^:/?#]+)(:([\d]+))?([^?#]+)?(\?([^#]+))?(#(.*))?";

    guitarpoet – May 17th, 2009
  15. I’ve been collecting different solutions that parse URLs and, unfortunately, this one is by far the worst performing. Uri, for all its weaknesses, can process a million encode/decodes in about 0.4 seconds in my test environment. The second-best solution, which breaks out more parts than yours does, is about 3 times slower than that; it’s versatile, but I’m still looking for something that performs better. The regex solution above takes about a minute to process a million encode/decodes, which works out to about 142 times slower. Sorry. Regex is a poor solution for parsing on the server side.

    Michael – April 10th, 2010
  16. It’s a good solution for general parsing operations even in a server environment where you want to do things like manipulate page parameters etc. (which is what it was originally designed for)

    Any regex-based parsing will perform worse than a hand-coded parser; this applies to any parsing, not just this one or URL parsing.

    [)amien

    Damien Guard – April 10th, 2010
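
For comparison with the `myURL.Query["CustomerID"] = "1";` usage in comment 2, here is a rough sketch of setting a single query value using only framework types (`UriBuilder` and `HttpUtility.ParseQueryString`). The `QuerySketch` class and `SetQueryValue` method are made-up names, and the sketch assumes a reference to System.Web is available; it is not part of the class discussed above.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

// Hypothetical helper for illustration; not part of the URL class in this post.
public static class QuerySketch
{
    // Adds or overwrites one query value without hand-placing "?" or "&".
    public static string SetQueryValue(string url, string name, string value)
    {
        var builder = new UriBuilder(url);

        // ParseQueryString tolerates the leading "?" that UriBuilder.Query returns.
        NameValueCollection query = HttpUtility.ParseQueryString(builder.Query);
        query[name] = value;

        // The collection returned by ParseQueryString renders itself as
        // an encoded "a=b&c=d" string.
        builder.Query = query.ToString();
        return builder.Uri.ToString();
    }
}
```

Something like `QuerySketch.SetQueryValue(Page.Request.Url.ToString(), "CustomerID", "1")` would then cover the “hope it didn’t exist and add it” case, at the cost of re-parsing the URL on every call.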
