Calculating CRC-32 in C# and .NET
- 📝 269 words
- 🕙 2 minutes
- 📦 .NET
- 🏷️ C#, hashing
- 💬 16 responses
Just a few days ago I found myself needing to calculate a CRC-32 in .NET. With so many facilities available I was a little shocked that there was nothing built in to do it, so I knocked something up myself.
GitHub has the latest version of Crc32
Because unsigned ints aren’t CLS compliant it won’t play well with VB.NET, and implementing HashAlgorithm might lead people to believe it’s suitable for signing: it isn’t. CRC-32s are only good for checksums along the lines of WinZip, RAR etc. and certainly shouldn’t come anywhere near a password; consider SHA-512 or similar instead.
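A minimal sketch of using the framework’s built-in SHA-512 for that instead (the names here are the standard System.Security.Cryptography ones):

using System;
using System.Security.Cryptography;
using System.Text;

// The framework already ships SHA-512, so no custom code is needed
using (var sha512 = SHA512.Create())
{
    var digest = sha512.ComputeHash(Encoding.UTF8.GetBytes("some input"));
    Console.WriteLine(BitConverter.ToString(digest));
}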
As well as using it as a HashAlgorithm with block processing you can also call the static Compute method, although that has the overhead of rebuilding the table on every call. If neither option suits your needs, cut ’n splice it into something that does.
If using the Compute methods multiple times for the same hash you must complement (~) the current hash when passing it in as the seed parameter of the next call.
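A minimal sketch of that chaining, assuming a static Compute(seed, buffer) overload along the lines described above (check the actual signatures on GitHub):

// Hash two buffers as if they were one contiguous stream of data
var part1 = new byte[] { 0x31, 0x32, 0x33 };
var part2 = new byte[] { 0x34, 0x35, 0x36 };
uint crc = Crc32.Compute(part1);
// Complement (~) the running value before passing it back in as the seed
crc = Crc32.Compute(~crc, part2);
Console.WriteLine(crc.ToString("x8"));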
To compute the hash for a file simply:
using System;
using System.IO;

var crc32 = new Crc32();
var hash = String.Empty;
// ComputeHash reads the stream in blocks and returns the CRC-32 as a byte array
using (var fs = File.Open("c:\\myfile.txt", FileMode.Open))
    foreach (byte b in crc32.ComputeHash(fs))
        hash += b.ToString("x2"); // "x2" already formats as lowercase hex
Console.WriteLine("CRC-32 is {0}", hash);
[)amien
16 responses to Calculating CRC-32 in C# and .NET
If you want to ensure file integrity SHA256 is a great choice — but some people need to calculate a CRC32 for backwards compatibility with things like ZIP.
I may be missing something, but why not use System.Security.Cryptography.SHA256 instead if all you need is to verify file integrity?
It produces a 32-bit CRC (hence the name). If you display this as hexadecimal then each pair of hex digits represents 8 bits, and four of these pairs gives 32 bits.
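A quick illustration (the value here is the well-known CRC-32 check value for the ASCII string “123456789”, not output from the code above):

uint crc = 0xCBF43926;
// 32 bits rendered as lowercase hex is always 8 characters: "cbf43926"
Console.WriteLine(crc.ToString("x8"));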
Thank you for the code. I made some performance tests that could be interesting. For a 55MB text file: System.Security.Cryptography.SHA256 needs 779 ms, System.Security.Cryptography.MD5 needs 144 ms, and CRC32 needs 660 ms.
Is it correct that the CRC is just an 8 character string?
I tried to use the Create() method, putting this in your code:
Crc32.Create().ComputeHash(fs);
but then realized that this creates a SHA1Managed object.
You’ll see it inherits from the HashAlgorithm class which has a ComputeHash method that takes a Stream object, reads 4K blocks at a time and calls through to HashCore for each block.
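A rough sketch of that call pattern (the accumulator below is illustrative only; just the HashAlgorithm overrides match the real class):

using System;
using System.Security.Cryptography;

public class SketchHash : HashAlgorithm
{
    private uint running; // stand-in state, not the real Crc32 field

    // ComputeHash(Stream) calls this once before hashing begins
    public override void Initialize()
    {
        running = 0;
    }

    // ...then this once per block read from the stream
    protected override void HashCore(byte[] array, int ibStart, int cbSize)
    {
        for (var i = ibStart; i < ibStart + cbSize; i++)
            running += array[i];
    }

    // ...and this once after the final block
    protected override byte[] HashFinal()
    {
        return BitConverter.GetBytes(running);
    }
}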
In your example you have:
crc32.ComputeHash(fs)
However the only Compute methods I see are static and none of them take a FileStream object.
If you interpret size as being the size of the array, yes, but the array already has a length, why would you need to pass it in again?
When passing the array, an offset and a size, size refers to the size of the data; see the parameter definition for the HashCore function of HashAlgorithm:
cbSize: The number of bytes in the byte array to use as data
Cheers, Phooey.
That is intentional — start is an index into the buffer to start populating at. If you change the loop as you describe it will go out of bounds with a non-zero value of start.
Thanks for the code, there’s a little bug in the CalculateHash function:
needs to be:
else you won’t hash the whole buffer for non-zero values of start.
Cheers, Phooey.
You can’t really do that — the whole idea of a checksum is that if a single byte changes or is transferred incorrectly then the checksum fails — if you only checksum every 1 byte in 100 then it’s not going to catch 99% of the errors.
How can I speed up the calculation? On 20GB files it takes a long time… Do you think something like spot calculation could be done? Say 1 byte every 100 bytes?
Camillo
I think that returning it as a big-endian byte array is a mistake since 98% of .NET code runs on Windows-based systems (which use little-endian byte order).
Also, reading every byte from the result array, formatting it with “x2” and appending it to a string is expensive and causes you to construct 5 strings (1 empty, plus another for each byte whose formatted output you append).
I think that adding something like this would benefit everyone more (after returning little-endian byte arrays) :)
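Something along these lines, for instance (a sketch only, reusing the crc32 and fs names from the post’s example):

var bytes = crc32.ComputeHash(fs);
// Build the hex string in a single pass instead of concatenating immutable strings
var sb = new System.Text.StringBuilder(bytes.Length * 2);
foreach (var b in bytes)
    sb.Append(b.ToString("x2"));
var hash = sb.ToString();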
PS — thx for the kickass implementation :) PS #2 — it works fine on Vista ultimate x64
Hi, I tried to execute the program under Windows XP and the CRC hash calculated for a 25MB file wasn’t correct, because on Windows XP CalculateHash was called more than once (it works with a length of 4096 bytes at a time). I took the ‘~’ off “return ~crc” in CalculateHash and put it on “byte[] hashBuffer = UInt32ToBigEndianBytes(~hash)” in HashFinal, so that step was made only once at the end of the cycle (without modifying the intermediate values). In this way the algorithm worked correctly.
I hope I gave a hand, Alessio
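A sketch of the change being described, using the member names from the post’s Crc32 class (the updated code on GitHub may differ):

// Inside the Crc32 class: complement once, at the very end
protected override byte[] HashFinal()
{
    var hashBuffer = UInt32ToBigEndianBytes(~hash);
    HashValue = hashBuffer;
    return hashBuffer;
}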
You’re right, multiple calls to HashCore will carry on from the final complement instead of the actual current hash value.
Removing the ~ complement from the CalculateHash function means you can’t use that static method directly… I think complementing it in Initialize and inside HashCore might be a better approach.
Will update the code once I’ve had a chance to test it and check the other hashing algorithms I have up.
Thanks!
Nice code, but I found a bug if you compute values for streams larger than 4096 bytes (make the text file in the sample code larger than 4096 bytes). In this case the HashCore function is called more than once but the complement is taken in the CalculateHash function, so you don’t get the complement of the final value but of each intermediate value, which results in a wrong value. The complement should be taken in the HashFinal function.
Dude, thanks a lot. I found a lot of dumb code out there and this is the best code written for this purpose.
I wrote one myself but it was a conversion from Java to C# and I did not use some of the support which C# provides, and there was some other code which generated wrong checksums on a 64-bit processor. Although I have not tested your code on a 64-bit processor, it seems it should work fine as far as my knowledge is concerned.