Calculating MD5 Hash over multiple strings with C#

I just spent a fair number of hours calculating one MD5 hash code over multiple strings, so I thought I’d write this down. Please note that the same principle can be used for SHA1 and other algorithms provided by the .Net Framework. Also note that MD5 is not a strong hash: practical collision attacks against it exist. It is fine for detecting accidental changes, but if you need real security choose another algorithm such as SHA-256.
To build one hash code for multiple data blocks one needs the TransformBlock() and TransformFinalBlock() methods of .Net’s crypto APIs, such as the MD5 class.
Here is my main routine, which calculates one MD5 hash over a whole set of ITranslationEntry objects. Each ITranslationEntry contains some strings and a list of sub-objects (Texts) which contain strings too. I need one MD5 hash for the whole object hierarchy so I can detect changes in this structure, which then need to trigger some other logic.
    using (var md5 = MD5.Create())
    {
        void AddStringToHash(ICryptoTransform cryptoTransform, string textToHash)
        {
            var inputBuffer = Encoding.UTF8.GetBytes(textToHash);
            cryptoTransform.TransformBlock(inputBuffer, 0, inputBuffer.Length, inputBuffer, 0);
        }

        foreach (var trans in defaultEntries)
        {
            AddStringToHash(md5, trans.Category);
            AddStringToHash(md5, trans.Key);
            foreach (var transText in trans.Texts)
            {
                AddStringToHash(md5, transText.TwoLetterLanguageCode);
                AddStringToHash(md5, transText.Text);
            }
        }

        md5.TransformFinalBlock(new byte[0], 0, 0);
        return ConvertByteArrayToString(md5.Hash);
    }
The above code creates an instance of .Net’s MD5 class using its Create() factory method. Make sure you dispose this instance when it is no longer needed, as it uses operating-system resources. That’s why I put it in a using statement.
Next I have a local function called AddStringToHash() which does exactly what its name says by using the TransformBlock() method of the MD5 class. This method does two things. First it feeds the given block (the UTF-8 bytes of the string in our case) into the running MD5 computation. Then it copies the input data to the output buffer. I couldn’t figure out why it does this, but specifying the input buffer as the output buffer does the job here.
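To convince myself that feeding the strings in one at a time really produces one hash over all of them, here is a minimal sketch that compares the chunked result with a single ComputeHash() call over the concatenated text. The names ChunkedHashDemo, HashInChunks, HashAtOnce and ToHex are made up for this example. It also passes null as the output buffer, which as far as I can tell TransformBlock() accepts and which skips the copy; treat that as an assumption and keep the input buffer as output if it misbehaves on your runtime.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ChunkedHashDemo
{
    // Upper-case hex string without separators.
    public static string ToHex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", "");

    // Feeds every part into the same MD5 instance, then finalizes once.
    public static string HashInChunks(params string[] parts)
    {
        using (var md5 = MD5.Create())
        {
            foreach (var part in parts)
            {
                var buffer = Encoding.UTF8.GetBytes(part);
                // Assumption: a null output buffer means "do not copy the input".
                md5.TransformBlock(buffer, 0, buffer.Length, null, 0);
            }
            md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            return ToHex(md5.Hash);
        }
    }

    // Hashes the whole text with a single call, for comparison.
    public static string HashAtOnce(string text)
    {
        using (var md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(Encoding.UTF8.GetBytes(text)));
        }
    }

    public static void Main()
    {
        // Chunked and one-shot hashing agree.
        Console.WriteLine(HashInChunks("Hello, ", "World") == HashAtOnce("Hello, World")); // True
    }
}
```

One consequence worth knowing: the chunk boundaries do not influence the result, so HashInChunks("ab", "c") and HashInChunks("a", "bc") give the same hash. If moving text from one property to the neighbouring one must count as a change, hashing a separator or the string length between the values would be one way to handle it.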
Then we use this local function to add some of the property values when looping over our object hierarchy.
Finally we must call TransformFinalBlock(). Otherwise .Net will throw an exception when trying to access MD5.Hash.
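The following sketch reproduces that situation in isolation: it adds one block and then reads Hash without finalizing. FinalBlockDemo and HashThrowsBeforeFinalBlock are hypothetical names. I catch the general CryptographicException here rather than relying on the exact exception type, which I would not want to hard-code.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FinalBlockDemo
{
    // Returns true if reading Hash before TransformFinalBlock() throws.
    public static bool HashThrowsBeforeFinalBlock()
    {
        using (var md5 = MD5.Create())
        {
            var buffer = Encoding.UTF8.GetBytes("some text");
            md5.TransformBlock(buffer, 0, buffer.Length, buffer, 0);
            try
            {
                var unused = md5.Hash; // the hash has not been finalized yet
                return false;
            }
            catch (CryptographicException)
            {
                return true;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(HashThrowsBeforeFinalBlock());
    }
}
```

Calling TransformFinalBlock() with an empty array, as in the main routine, is enough to finalize the hash; it does not need any further data.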
The hash itself can be retrieved as a byte array from MD5.Hash. As I need a proper string representation of it I added a little conversion method. Here’s the code:
    private static string ConvertByteArrayToString(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (var b in bytes)
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
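As a quick sanity check of the conversion, this sketch hashes the well-known MD5 test input "abc" and formats the result the same way. HexDemo and ToHexString are hypothetical names for this example; the comparison with BitConverter.ToString() is only there to show a shorter alternative that should yield the same upper-case digits once the dashes are removed.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HexDemo
{
    // Same approach as the conversion method above: two upper-case hex digits per byte.
    public static string ToHexString(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length * 2);
        foreach (var b in bytes)
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        using (var md5 = MD5.Create())
        {
            var hash = md5.ComputeHash(Encoding.UTF8.GetBytes("abc"));
            Console.WriteLine(ToHexString(hash)); // 900150983CD24FB0D6963F7D28E17F72
            Console.WriteLine(ToHexString(hash) == BitConverter.ToString(hash).Replace("-", "")); // True
        }
    }
}
```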
Hope this saves you some time if you try to calculate one hash code for several data chunks.