URL parsing and manipulation in .NET

While .NET has a Uri class, it’s not great. It doesn’t expose the various parts as properties and you can’t manipulate them either. Many projects I’ve seen (Subtext included) just try to manipulate URLs as strings, with varying degrees of success.

Here’s a C# .NET URL decoder based on a VBScript class I developed a while back. It uses a regular expression I developed for performance. Comments and white space have been removed to keep it short.

There were some odd bugs in this class. It has been removed.
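
For readers who want to see the general idea, here is a minimal sketch of regex-based URL splitting. It is not the removed class: the `UrlParts` name and its properties are made up for illustration, and the pattern is the one guitarpoet posted in the responses below, so treat it as a starting point under those assumptions rather than a tested parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for illustration only; this is not the removed class.
public sealed class UrlParts
{
    // Pattern as posted by guitarpoet in the responses below.
    private static readonly Regex UrlDecodeRegex = new Regex(
        @"([^:]+)://(([^:@]+)(:([^@]+))?@)?([^:/?#]+)(:([\d]+))?([^?#]+)?(\?([^#]+))?(#(.*))?",
        RegexOptions.Compiled);

    public string Scheme { get; private set; }
    public string Username { get; private set; }
    public string Password { get; private set; }
    public string Host { get; private set; }
    public string Port { get; private set; }
    public string Path { get; private set; }
    public string Query { get; private set; }
    public string Fragment { get; private set; }

    public static UrlParts Parse(string url)
    {
        Match m = UrlDecodeRegex.Match(url);
        if (!m.Success)
            throw new FormatException("Not a recognisable URL: " + url);

        // Optional groups that took no part in the match come back as "".
        return new UrlParts
        {
            Scheme = m.Groups[1].Value,
            Username = m.Groups[3].Value,
            Password = m.Groups[5].Value,
            Host = m.Groups[6].Value,
            Port = m.Groups[8].Value,
            Path = m.Groups[9].Value,
            Query = m.Groups[11].Value,   // without the leading '?'
            Fragment = m.Groups[13].Value // without the leading '#'
        };
    }
}
```

With this sketch, `UrlParts.Parse("http://user:secret@example.com:8080/blog?id=1#top")` gives a `Host` of `example.com`, a `Port` of `8080` and a `Query` of `id=1`.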

[)amien

15 responses

  1. Gravatar for John Rummell

    I linked from DotNetKicks. This is cool. How would I use this to grab the URL without the QueryString?

    John Rummell – July 10th, 2006
  2. Gravatar for Damien Guard

    Yeah, sorry there are no usage samples. I did write a few but my blog software started activating the anti-spam on me :D

    To get the current URL:

    <pre>URL myURL = new URL(Page.Request.Url.ToString());</pre>

    You could remove the whole query string with:

    <pre>myURL.QueryString = "";</pre>

    Alternatively, if you just wanted to set one item, say a customerID, you would normally have to hope it didn’t exist and add it, taking note to append “&” or “?” depending on whether there are other query values. With this class you can just:

    <pre>myURL.Query["CustomerID"] = "1";</pre>

    Oh, and to write out your hyperlink you’d do something like:

    <pre>nextCustomerLink.Href = myURL.FullURL;</pre>

    Damien Guard – July 10th, 2006
  3. Gravatar for Haacked

    You should consider making it Serializable. That’s one issue I’ve had with the existing Uri class.

    Haacked – July 17th, 2006
  4. Gravatar for Haacked

    I just checked and Uri in .NET 2.0 is serializable.

    Haacked – July 17th, 2006
  5. Gravatar for Damien Guard

    The Uri class in .NET is pretty useless. It doesn’t include username/password properties, the query part is a string instead of a collection of name/value pairs etc.

    Damien Guard – July 18th, 2006
  6. Gravatar for Mike

    If I had this: https://damieng.com/blog/2006/07/07/URL_parsing_and_manipulation_in_NET How could I get just this part? https://damieng.com/blog/2006/07/07

    Mike – March 17th, 2008
  7. Gravatar for Damien Guard

    @Mike: Well, you could, but if that’s all you want you may as well just grab the string up to the final / with urlString.Substring(0, urlString.LastIndexOf('/'))

    Damien Guard – April 11th, 2008
  8. Gravatar for Russ

    This works well, but any attempts that I make to use an imported query string don’t seem to work. I can build a query string and export it OK, but importing one fails. Is there a download of this as a .cs class file?

    Russ – May 2nd, 2008
  9. Gravatar for Russ

    I found the problem, it’s in the regular expression. Here’s the new line: private const string urlDecodeRegex = @"([^:]+)://(([^:@]+)(:([^@]+))?@)?([^:/?#]+)(:([d]+))?([^?#]+)?(\?([^#]+))?(#(.*))?";

    Russ – May 2nd, 2008
  10. Gravatar for Damien Guard

    Thanks for spotting and figuring that out Russ - I must have broken it during the reformatting exercise when I switched to this theme.

    Damien Guard – May 3rd, 2008
  11. Gravatar for MrToes

    It’s also remarkably intolerant of syntax problems. There’s also a problem with the Uri class (in .NET 2.0 Compact Framework at least) where creating it with a string that has leading spaces causes memory corruption. Lovely. We have our own Uri class to fix this (and other) problems. Backward slashes resulting in FormatExceptions is another one that comes to mind. Quite possibly all this is fixed in new shiny .NET versions. So, how long is it till V-day? :)

    MrToes – May 12th, 2008
  12. Gravatar for Ziv Rozzbach

    Nice, but it does not do one thing: splitting the host from the domain, so that if I enter reag123.afsda123.ukdomain.co.uk it will return ukdomain.co.uk. Does anyone have any idea how I can do it?

    Ziv Rozzbach – October 5th, 2008
  13. Gravatar for guitarpoet

    Hello, I think the code has a bug, and I have rewritten it like this: private const string urlDecodeRegex = @"([^:]+)://(([^:@]+)(:([^@]+))?@)?([^:/?#]+)(:([\d]+))?([^?#]+)?(\?([^#]+))?(#(.*))?";

    guitarpoet – May 17th, 2009
  14. Gravatar for Michael

    I’ve been collecting different solutions that parse URLs and, unfortunately, this one is by far the worst performing. Uri, for all its weaknesses, can process a million encode/decodes in about 0.4 seconds in my test environment. The second-best solution, which breaks out more parts than yours, is about 3 times slower than that; it’s versatile, but I’m still looking for something better performing. The regex solution above takes about a minute to process a million encode/decodes, which works out to about 142 times slower. Sorry. Regex is a poor solution for parsing on the server side.

    Michael – April 10th, 2010
  15. Gravatar for Damien Guard

    It’s a good solution for general parsing operations, even in a server environment, where you want to do things like manipulate page parameters (which is what it was originally designed for). Any regex-based parsing will perform less well than a hand-coded parser. That applies to any parsing, not just this one or URL parsing.

    Damien Guard – April 10th, 2010
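
Since the class is no longer posted, here is a rough sketch of the query string edits described in response 2, using only the framework’s own `UriBuilder` and `HttpUtility.ParseQueryString`. The `QueryEdit` class and its method names are hypothetical, and it assumes a reference to `System.Web` is available. It is one way to avoid hand-appending “&” or “?”, not the approach the original class took.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

// Hypothetical helper for illustration; the names are not from the original post.
public static class QueryEdit
{
    // Sets (or replaces) one query value and leaves the others untouched.
    public static string SetValue(string url, string name, string value)
    {
        UriBuilder builder = new UriBuilder(url);
        NameValueCollection query = HttpUtility.ParseQueryString(builder.Query);
        query[name] = value;
        // The collection returned by ParseQueryString renders itself
        // as a URL-encoded "a=1&b=2" string from ToString().
        builder.Query = query.ToString();
        // Uri.AbsoluteUri leaves out the default port that UriBuilder.ToString() can add.
        return builder.Uri.AbsoluteUri;
    }

    // Drops the whole query string.
    public static string WithoutQuery(string url)
    {
        UriBuilder builder = new UriBuilder(url);
        builder.Query = "";
        return builder.Uri.AbsoluteUri;
    }
}
```

For example, `QueryEdit.SetValue("http://example.com/page?a=b", "CustomerID", "1")` returns `http://example.com/page?a=b&CustomerID=1`, and `QueryEdit.WithoutQuery` on the same URL returns `http://example.com/page`.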
