Calculating MD5 Hash over multiple strings with C#

I just spent a fair number of hours calculating one MD5 hash code over multiple strings, so I thought I'd write this down. The same principle can be used for SHA1 and the other hash algorithms provided by the .Net Framework. Please note that MD5 is no longer considered secure, as practical collision attacks exist. It is fine for detecting changes, which is what I use it for here, but if you need strong security choose another algorithm such as SHA-256.

To build one hash code for multiple data blocks one needs the TransformBlock() and TransformFinalBlock() methods of .Net’s crypto APIs, such as the MD5 class.

Here is my main routine, which calculates one MD5 hash over a whole set of ITranslationEntry objects. Each ITranslationEntry contains some strings and a list of sub-objects (Texts) which contain strings too. I need one MD5 hash for the whole object hierarchy so I can detect changes in this structure, which then need to trigger some other logic.

            using (var md5 = MD5.Create())
            {
                // Feeds the UTF-8 bytes of one string into the running hash.
                void AddStringToHash(ICryptoTransform cryptoTransform, string textToHash)
                {
                    var inputBuffer = Encoding.UTF8.GetBytes(textToHash);
                    // Input and output buffer are the same, so nothing is copied.
                    cryptoTransform.TransformBlock(inputBuffer, 0, inputBuffer.Length, inputBuffer, 0);
                }

                foreach (var trans in defaultEntries)
                {
                    AddStringToHash(md5, trans.Category);
                    AddStringToHash(md5, trans.Key);
                    foreach (var transText in trans.Texts)
                    {
                        AddStringToHash(md5, transText.TwoLetterLanguageCode);
                        AddStringToHash(md5, transText.Text);
                    }
                }

                // Finalize the hash with an empty last block.
                md5.TransformFinalBlock(new byte[0], 0, 0);

                return ConvertByteArrayToString(md5.Hash);
            }

The above code creates an instance of .Net’s MD5 class using its Create() factory method. Make sure you dispose of this instance when it is no longer needed, as it may hold operating-system resources. That’s why I put it in a using statement.

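Next I have a local function called AddStringToHash() which does exactly what its name says by using the TransformBlock() method of the MD5 class. This method does two things. First it feeds the given block (the UTF-8 bytes of the string in our case) into the running MD5 computation. Then it copies the input data to the output buffer. The copy exists because TransformBlock() comes from the ICryptoTransform interface, which is designed for stream transforms such as encryption, where the output differs from the input; a hash algorithm simply passes the data through unchanged. Specifying the input buffer as the output buffer, with the same offset, skips the copy, and that does the job here.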
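If the pass-through output buffer feels awkward, the IncrementalHash class is an alternative that has no output buffer at all. This is a swapped-in technique, not what my routine above uses, and it assumes a runtime that ships IncrementalHash (.Net Framework 4.6.2 or later, or .Net Core). The class name IncrementalHashSketch and the method HashStrings are made up for this sketch.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper class, only for illustration.
public static class IncrementalHashSketch
{
    // Returns the MD5 hash over the UTF-8 bytes of all parts, in order.
    public static byte[] HashStrings(params string[] parts)
    {
        using (var hasher = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            foreach (var part in parts)
            {
                // AppendData only updates the hash state; nothing is copied out.
                hasher.AppendData(Encoding.UTF8.GetBytes(part));
            }

            // Finalizes the hash and resets the instance for reuse.
            return hasher.GetHashAndReset();
        }
    }
}
```

Calling IncrementalHashSketch.HashStrings("a", "bc") should give the same bytes as hashing "abc" in one go.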

Then we use this local function to add some of the property values when looping over our object hierarchy.
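One caveat with feeding the raw strings one after another: the hash only sees the concatenated bytes, so a Category of "ab" with a Key of "c" hashes the same as "a" with "bc". For change detection this might matter. A possible way around it, sketched here under my own assumptions, is to hash the byte length of each field before the field itself. The names DelimitedHashing, AddField and HashFields are hypothetical, and switching to this scheme changes all hash values compared to the routine above.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper class, only for illustration.
public static class DelimitedHashing
{
    // Feeds a 4-byte length followed by the UTF-8 bytes of the field.
    // Assumption: null is treated like an empty string.
    public static void AddField(HashAlgorithm hash, string text)
    {
        var data = Encoding.UTF8.GetBytes(text ?? string.Empty);
        // Byte order follows the machine's endianness, so hashes are only
        // comparable between machines with the same byte order.
        var prefix = BitConverter.GetBytes(data.Length);
        hash.TransformBlock(prefix, 0, prefix.Length, prefix, 0);
        hash.TransformBlock(data, 0, data.Length, data, 0);
    }

    // Returns one MD5 hash (as hex with dashes) over all fields.
    public static string HashFields(params string[] fields)
    {
        using (var md5 = MD5.Create())
        {
            foreach (var field in fields)
            {
                AddField(md5, field);
            }

            md5.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(md5.Hash);
        }
    }
}
```

With the length prefix, HashFields("ab", "c") and HashFields("a", "bc") no longer produce the same value.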

Finally we must call TransformFinalBlock(). Otherwise .Net will throw an exception when trying to access MD5.Hash.
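To convince yourself that the block-wise approach really yields one hash over all the data, you can compare it with ComputeHash() on the concatenated string. The following is a small sanity check, not part of my original routine; the class BlockwiseHashCheck and its methods are names I made up for it.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper class, only for illustration.
public static class BlockwiseHashCheck
{
    private static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // Hashes the parts block by block, then finalizes with an empty block.
    public static string HashInBlocks(params string[] parts)
    {
        using (var md5 = MD5.Create())
        {
            foreach (var part in parts)
            {
                var buffer = Encoding.UTF8.GetBytes(part);
                md5.TransformBlock(buffer, 0, buffer.Length, buffer, 0);
            }

            md5.TransformFinalBlock(new byte[0], 0, 0);
            return ToHex(md5.Hash);
        }
    }

    // Hashes the whole text in a single call.
    public static string HashAtOnce(string text)
    {
        using (var md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(Encoding.UTF8.GetBytes(text)));
        }
    }
}
```

HashInBlocks("a", "bc") and HashAtOnce("abc") both return 900150983CD24FB0D6963F7D28E17F72, which is the MD5 test vector for "abc" from RFC 1321.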

The hash itself can be retrieved as a byte-array from MD5.Hash. As I need a proper string representation of it I added a little conversion method. Here’s the code:

        private static string ConvertByteArrayToString(byte[] bytes)
        {
            var sb = new StringBuilder();
            foreach (var b in bytes)
            {
                sb.Append(b.ToString("X2"));
            }

            return sb.ToString();
        }
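If you prefer not to loop yourself, BitConverter.ToString() can do the conversion. It returns upper-case hex pairs separated by dashes, so removing the dashes should give the same string as the method above. This is just an alternative sketch; the class HexSketch and the method ToHex are hypothetical names. On .Net 5 or later, Convert.ToHexString() would be another option.

```csharp
using System;

// Hypothetical helper class, only for illustration.
public static class HexSketch
{
    // Converts bytes to an upper-case hex string without separators.
    public static string ToHex(byte[] bytes)
    {
        // BitConverter.ToString yields e.g. "0F-A0"; strip the dashes.
        return BitConverter.ToString(bytes).Replace("-", "");
    }
}
```

For example, HexSketch.ToHex(new byte[] { 0x0F, 0xA0 }) returns "0FA0".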

Hope this saves you some time if you try to calculate one hash code for several data chunks.
