The Luhn Algorithm
Checksums can be very helpful for validating that data is intact and free from simple mistakes. They are extremely fast to calculate, and the growth in computing power over the past few decades has made their cost all but negligible. Because of this, checksums often hide in plain sight. One of these is the Luhn Checksum, which is produced by the Luhn Algorithm.
The Luhn Algorithm was invented by Hans Peter Luhn, a scientist at IBM. Its patent was granted in 1960 but has since expired, so the algorithm is now in the public domain. The best place to find its handiwork? Your wallet. Virtually every credit and banking card number uses a Luhn Checksum to calculate its final digit. It can also be found in plenty of other places, such as library cards and other ID numbers.
The algorithm itself is quite simple, because it was designed to be implemented mechanically in calculators. The patent link above is for one such device. To try it out for yourself, pull a credit card out of your wallet and copy down its number.
- Ignore the last digit, because that's the checksum. If you have a 16-digit credit card number, only work with the leftmost 15 digits.
- Starting with the new rightmost digit (because we are ignoring the final one), and working from right to left, double this digit and every second digit after it. If a doubled value has two digits (e.g. 6×2 = 12), add those two digits together (1+2 = 3). Copy down the digits you skip over exactly as they are.
- Sum all these new digits you either calculated or copied.
- Multiply this sum by 9.
- Take this product modulo 10 (that is, the digit in the ones column).
- This result should equal the final digit of your credit card number, which we ignored in step 1.
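The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from the PowerShell notebook; the function names `luhn_check_digit` and `luhn_is_valid` are my own, and the input is assumed to be a string of ASCII digits with no spaces or dashes.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a digit string that excludes the checksum."""
    total = 0
    # Walk right to left; the rightmost payload digit (index 0) gets doubled,
    # then every second digit after it.
    for index, char in enumerate(reversed(payload)):
        digit = int(char)
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                # Two-digit result: add its digits together (e.g. 12 -> 1 + 2).
                digit = digit // 10 + digit % 10
        total += digit
    # Multiply the sum by 9 and keep the ones column.
    return (total * 9) % 10


def luhn_is_valid(number: str) -> bool:
    """Check that the final digit matches the checksum of the digits before it."""
    if len(number) < 2 or any(c not in "0123456789" for c in number):
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])


print(luhn_check_digit("7992739871"))     # 3
print(luhn_is_valid("4111111111111111"))  # True
print(luhn_is_valid("4111111111111112"))  # False
```

The first call uses the payload from the widely cited example number 79927398713, and 4111111111111111 is a commonly used test card number rather than a real account.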
Wikipedia has a more in-depth example of how to calculate a Luhn Checksum. For purposes of this post, I have implemented both a Luhn Checksum calculator and a validator in a PowerShell notebook:
If you have never tried PowerShell Notebooks in Azure Data Studio before, consider this an invitation to try them out. They are quite simple to set up, and wonderful to work with!
It's important to remember that since a Luhn Checksum is only a single digit, the chance of a "collision", where two different values produce the same checksum, is about 1 in 10. As I stated earlier, it is for the detection of simple mistakes, such as transposition of digits and errors in copying. It is not for security or cryptographic uses, nor was it ever designed to be.
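To make the 1-in-10 figure concrete, here is a rough sketch that computes the checksum for every possible four-digit value and counts how often each check digit comes up. The four-digit length is an arbitrary choice to keep the loop small, and `luhn_check_digit` is my own helper name, written compactly so the example stands alone.

```python
from collections import Counter
from itertools import product


def luhn_check_digit(payload: str) -> int:
    """Compact Luhn check digit for a digit string that excludes the checksum."""
    total = 0
    for index, char in enumerate(reversed(payload)):
        digit = int(char) * 2 if index % 2 == 0 else int(char)
        # Subtracting 9 from a two-digit value equals adding its two digits.
        total += digit - 9 if digit > 9 else digit
    return (total * 9) % 10


# Checksum of every value from 0000 through 9999.
counts = Counter(
    luhn_check_digit("".join(digits)) for digits in product("0123456789", repeat=4)
)

for check_digit in range(10):
    print(check_digit, counts[check_digit])  # each check digit appears 1000 times
```

Every check digit is shared by exactly one tenth of the values, so an arbitrary wrong number has about a 10% chance of carrying a checksum that still looks right.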
Certain digit transpositions also cannot be detected by this algorithm, such as 09 and 90. I have examples of this and others in the PowerShell notebook above.
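The blind spot is easy to reproduce. The sketch below (again with a helper name of my own, and made-up four-digit values purely for illustration) swaps two adjacent digits and compares checksums, then brute-forces every pair of distinct digits to see which swaps go unnoticed.

```python
def luhn_check_digit(payload: str) -> int:
    """Compact Luhn check digit for a digit string that excludes the checksum."""
    total = 0
    for index, char in enumerate(reversed(payload)):
        digit = int(char) * 2 if index % 2 == 0 else int(char)
        # Subtracting 9 from a two-digit value equals adding its two digits.
        total += digit - 9 if digit > 9 else digit
    return (total * 9) % 10


# Swapping 0 and 9 leaves the checksum unchanged, so the error goes unnoticed.
print(luhn_check_digit("5309"), luhn_check_digit("5390"))  # 0 0

# Swapping 1 and 2 changes the checksum, so the error is caught.
print(luhn_check_digit("5312"), luhn_check_digit("5321"))  # 4 5

# Try every pair of distinct adjacent digits and keep the undetected swaps.
undetected = [
    (a, b)
    for a in range(10)
    for b in range(a + 1, 10)
    if luhn_check_digit(f"{a}{b}") == luhn_check_digit(f"{b}{a}")
]
print(undetected)  # [(0, 9)]
```

The reason is that doubling 9 gives 18, which collapses back to 9, so both 0 and 9 contribute the same amount whether or not they land in a doubled position.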
Is It Worth It?
Since a single check digit means the Luhn Algorithm can only detect about 90% of arbitrary errors, is it really worth using? I would argue yes, and judging from how pervasive it is, I would say that others agree with me. It may not be strong enough for cryptographic use, but catching honest mistakes with ~90% accuracy is a big deal.
Consider an identifying number without any checksums: Social Security Numbers in the United States. They are literally just a number, with no internal checking capability whatsoever. If you screw up one digit when entering your SSN anywhere, you likely are now using someone else's number. And while SSNs are often copied for purposes of committing fraud, I've also seen enough issues stemming from honest mistakes that a checksum to detect this would make a huge difference. Unfortunately, the SSN predates the Luhn Algorithm by over 20 years, so it was not yet an option at that point.
For a humorous look at just how terrible Social Security Numbers are and how they got to be that way, check out this video:
And finally, I should mention that just before I was getting ready to post this, it was pointed out to me that Øyvind Kallstad wrote a blog post on this same topic, also with a PowerShell implementation. His code is definitely more elegant than mine, and worth your time if you want to learn more.