The Luhn mod N algorithm is an extension to the Luhn algorithm that allows it to work with sequences of non-numeric characters. This can be useful when a check digit is required to validate an identification string composed of letters, a combination of letters and digits or even any arbitrary set of characters.
Informal explanation
The Luhn mod N algorithm generates a check digit (more precisely, a check character) within the same range of valid characters as the input string. For example, if the algorithm is applied to a string of lower-case letters, the check character will also be a lower-case letter. Apart from this distinction, it closely resembles the original algorithm. The main idea behind the extension is that the full set of N valid input characters is mapped to a list of code-points, the integers 0 to N - 1. The algorithm processes the input string by converting each character to its associated code-point and then performing the computations in mod N. Finally, the resulting check code-point is mapped back to obtain its corresponding check character.
Mapping characters to code-points
First, a mapping between valid input characters and code-points must be created. For example, if the valid characters are the lower-case letters a to f, a suitable mapping would be:
Character    a  b  c  d  e  f
Code-point   0  1  2  3  4  5
Note that the order of the characters is irrelevant. This other mapping would also be acceptable:
Character    c  e  a  f  b  d
Code-point   0  1  2  3  4  5
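Because any ordering is acceptable, the two mappings above can be compared directly. The Java sketch below is illustrative only: the class name MappingOrderDemo and the method checkCharacter are hypothetical, and the mapping is passed as a string whose index positions are the code-points, so "abcdef" is the first mapping and "ceafbd" the second. Both mappings work, but they yield different check characters for the same input.

```java
// Sketch: generate a Luhn mod N check character with the mapping supplied as a
// string; a character's code-point is its index in that string (an assumption
// made for illustration).
class MappingOrderDemo {

    static char checkCharacter(String input, String alphabet) {
        int n = alphabet.length();
        int factor = 2; // the rightmost input character is doubled
        int sum = 0;
        for (int i = input.length() - 1; i >= 0; i--) {
            int codePoint = alphabet.indexOf(input.charAt(i));
            if (codePoint < 0) {
                throw new IllegalArgumentException("Invalid character: " + input.charAt(i));
            }
            int addend = factor * codePoint;
            sum += addend / n + addend % n; // sum of the base-n digits of addend
            factor = (factor == 2) ? 1 : 2;
        }
        // Smallest value that brings the sum up to a multiple of n.
        return alphabet.charAt((n - sum % n) % n);
    }

    public static void main(String[] args) {
        System.out.println(checkCharacter("abcdef", "abcdef")); // e
        System.out.println(checkCharacter("abcdef", "ceafbd")); // c
    }
}
```

The practical consequence is that whoever generates a check character and whoever validates it must agree on exactly the same mapping.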
It is also possible to intermix letters and digits. For example, this mapping would be appropriate for lower-case hexadecimal digits:
Character    0  1  2  3  4  5  6  7  8  9  a   b   c   d   e   f
Code-point   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
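One simple way to hold a mapping such as the hexadecimal one is a string of the valid characters in code-point order. The sketch below implements, on that assumption, the three helpers that the Java version of the algorithm relies on (codePointFromCharacter, characterFromCodePoint and numberOfValidInputCharacters); the class name HexMapping and the ALPHABET constant are hypothetical.

```java
// Sketch: mapping helpers for lower-case hexadecimal digits.
// A character's code-point is its index in ALPHABET (an illustrative constant).
class HexMapping {
    private static final String ALPHABET = "0123456789abcdef";

    static int codePointFromCharacter(char character) {
        int codePoint = ALPHABET.indexOf(character);
        if (codePoint < 0) {
            throw new IllegalArgumentException("Not a valid input character: " + character);
        }
        return codePoint;
    }

    static char characterFromCodePoint(int codePoint) {
        return ALPHABET.charAt(codePoint); // throws for code-points outside 0..15
    }

    static int numberOfValidInputCharacters() {
        return ALPHABET.length();
    }

    public static void main(String[] args) {
        System.out.println(codePointFromCharacter('a'));    // 10
        System.out.println(characterFromCodePoint(15));     // f
        System.out.println(numberOfValidInputCharacters()); // 16
    }
}
```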
Algorithm in C#
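Assuming the following functions are defined: int CodePointFromCharacter(char), char CharacterFromCodePoint(int) and int NumberOfValidInputCharacters().
The function to generate a check character is char GenerateCheckCharacter, which takes the input string and returns its check character.
The function to validate a string is bool ValidateCheckCharacter, which returns true when the string, including its trailing check character, is valid.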
Algorithm in Java
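Assuming the following functions are defined: int codePointFromCharacter(char), char characterFromCodePoint(int) and int numberOfValidInputCharacters().
The function to generate a check character is char generateCheckCharacter, which takes the input string and returns its check character.
The function to validate a string is boolean validateCheckCharacter, which returns true when the string, including its trailing check character, is valid.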
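A minimal Java sketch of these functions follows, assuming the valid input characters are a to f with code-points 0 to 5; the class name LuhnModN and the ALPHABET constant are hypothetical. Generation doubles the rightmost input character first, because the check character will be appended to its right; validation starts doubling one position further left, because the check character itself is not doubled. Since a doubled code-point is always less than 2N, its base-N digits are simply the quotient and remainder of division by N.

```java
// Minimal sketch of the Luhn mod N functions, assuming the valid input
// characters are a..f with code-points 0..5 (ALPHABET is hypothetical).
class LuhnModN {
    private static final String ALPHABET = "abcdef";

    static int codePointFromCharacter(char character) {
        int codePoint = ALPHABET.indexOf(character);
        if (codePoint < 0) {
            throw new IllegalArgumentException("Invalid character: " + character);
        }
        return codePoint;
    }

    static char characterFromCodePoint(int codePoint) {
        return ALPHABET.charAt(codePoint);
    }

    static int numberOfValidInputCharacters() {
        return ALPHABET.length();
    }

    static char generateCheckCharacter(String input) {
        // The check character will be appended on the right, so the
        // current rightmost character is the first one to be doubled.
        int factor = 2;
        int sum = 0;
        int n = numberOfValidInputCharacters();
        for (int i = input.length() - 1; i >= 0; i--) {
            int addend = factor * codePointFromCharacter(input.charAt(i));
            factor = (factor == 2) ? 1 : 2;
            // addend < 2n, so its base-n digits are addend / n and addend % n.
            sum += addend / n + addend % n;
        }
        int checkCodePoint = (n - sum % n) % n;
        return characterFromCodePoint(checkCodePoint);
    }

    static boolean validateCheckCharacter(String input) {
        // The rightmost character is the check character, which is not doubled.
        int factor = 1;
        int sum = 0;
        int n = numberOfValidInputCharacters();
        for (int i = input.length() - 1; i >= 0; i--) {
            int addend = factor * codePointFromCharacter(input.charAt(i));
            factor = (factor == 2) ? 1 : 2;
            sum += addend / n + addend % n;
        }
        return sum % n == 0;
    }

    public static void main(String[] args) {
        char check = generateCheckCharacter("abcdef");
        System.out.println(check);                                    // e
        System.out.println(validateCheckCharacter("abcdef" + check)); // true
        System.out.println(validateCheckCharacter("abcdefa"));        // false
    }
}
```

Swapping ALPHABET for another string of distinct characters changes the mapping without touching the two algorithm functions.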
Example
Generation
Consider the first mapping above, in which the valid input characters a to f have code-points 0 to 5, and the example input string abcdef. To generate the check character, start with the last character in the string and move left, doubling every other code-point, beginning with the last character itself. The "digits" of the resulting values, written in base 6, should then be summed up:
Character       a   b   c   d        e   f
Code-point      0   1   2   3        4   5
Double              2       6 (10)       10 (14)
Reduce          0   2   2   1 + 0    4   1 + 4
Sum of digits   0   2   2   1        4   5
(Doubled values are written in decimal, with their base-6 representation in parentheses.)
The total sum of digits is 14. The number that must be added to reach the next multiple of 6 (18) is 4. This is the resulting check code-point, and the associated check character is e.
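The table can be cross-checked with a short Java sketch that prints one line per character, working from right to left, and then derives the check character. The class name GenerationTrace and the method traceDigitSum are hypothetical, and the mapping a to f onto 0 to 5 is assumed; Integer.toString(value, radix) is used to show each value in base 6.

```java
// Sketch: reproduce the generation table for "abcdef" over the alphabet a..f.
class GenerationTrace {

    // Prints one line per character, right to left, and returns the total digit sum.
    static int traceDigitSum(String input, String alphabet, boolean doubleRightmost) {
        int n = alphabet.length();
        boolean doubled = doubleRightmost;
        int sum = 0;
        for (int i = input.length() - 1; i >= 0; i--) {
            char c = input.charAt(i);
            int codePoint = alphabet.indexOf(c);
            if (codePoint < 0) {
                throw new IllegalArgumentException("Invalid character: " + c);
            }
            int value = doubled ? 2 * codePoint : codePoint;
            int digitSum = value / n + value % n; // value < 2n, so at most two digits
            System.out.printf("%c: code-point %d, value %d (base %d: %s), digit sum %d%n",
                    c, codePoint, value, n, Integer.toString(value, n), digitSum);
            sum += digitSum;
            doubled = !doubled;
        }
        return sum;
    }

    public static void main(String[] args) {
        String alphabet = "abcdef";
        int n = alphabet.length();
        int sum = traceDigitSum("abcdef", alphabet, true);
        int check = (n - sum % n) % n;
        System.out.println("total " + sum + ", check code-point " + check
                + ", check character " + alphabet.charAt(check));
        // last line: total 14, check code-point 4, check character e
    }
}
```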
Validation
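The resulting string, abcdefe, can then be validated using a similar procedure. Because the check character itself is not doubled, the doubling now starts with the second character from the right: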
Character       a   b   c   d        e   f          e
Code-point      0   1   2   3        4   5          4
Double              2       6 (10)       10 (14)
Reduce          0   2   2   1 + 0    4   1 + 4      4
Sum of digits   0   2   2   1        4   5          4
(Doubled values are written in decimal, with their base-6 representation in parentheses.)
The total sum of digits is 18. Since it is divisible by 6, the check character is valid.
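The validation arithmetic can be checked the same way. This Java sketch (class ValidationCheck and its methods are hypothetical names; the mapping a to f onto 0 to 5 is assumed) sums the base-6 digits with doubling starting at the second character from the right. It confirms the total of 18 for abcdefe and shows that changing a single character, here the third one from c to d, leaves a sum that is no longer a multiple of 6.

```java
// Sketch: Luhn mod N validation over the alphabet a..f (code-point = index).
class ValidationCheck {

    static int digitSum(String input, String alphabet) {
        int n = alphabet.length();
        int factor = 1; // the rightmost (check) character is not doubled
        int sum = 0;
        for (int i = input.length() - 1; i >= 0; i--) {
            int codePoint = alphabet.indexOf(input.charAt(i));
            if (codePoint < 0) {
                throw new IllegalArgumentException("Invalid character: " + input.charAt(i));
            }
            int addend = factor * codePoint;
            sum += addend / n + addend % n; // sum of the base-n digits of addend
            factor = (factor == 2) ? 1 : 2;
        }
        return sum;
    }

    static boolean isValid(String input, String alphabet) {
        return digitSum(input, alphabet) % alphabet.length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(digitSum("abcdefe", "abcdef")); // 18
        System.out.println(isValid("abcdefe", "abcdef"));  // true
        System.out.println(digitSum("abddefe", "abcdef")); // 19
        System.out.println(isValid("abddefe", "abcdef"));  // false
    }
}
```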
Implementation
The mapping of characters to code-points and back can be implemented in a number of ways. The simplest approach is to use ASCII code arithmetic. For example, given the input set 0 to 9, the code-point can be calculated by subtracting the ASCII code for '0' from the ASCII code of the desired character; adding the ASCII code for '0' back to a code-point provides the reverse mapping. Additional ranges of characters can be handled with conditional statements. Non-sequential sets can be mapped both ways using a hard-coded switch/case statement. A more flexible approach is to use something similar to an associative array; a pair of arrays, one for each direction, is then required to provide the two-way mapping. Another possibility is to use an array of characters in which the array indexes are the code-points associated with each character. The mapping from character to code-point can then be performed with a linear search, or with a binary search if the characters are stored in sorted order. In this case, the reverse mapping is just a simple array lookup.
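Two of these approaches are sketched below in Java for lower-case hexadecimal digits. The class and method names are hypothetical. The first pair of methods uses character-code arithmetic with one conditional per range (Java char values coincide with ASCII codes for these characters). The second pair uses an array indexed by code-point; because '0' to '9' precede 'a' to 'f' in ASCII, the array happens to be sorted, so java.util.Arrays.binarySearch can perform the character-to-code-point direction.

```java
import java.util.Arrays;

class MappingApproaches {

    // Approach 1: character-code arithmetic, one conditional per character range.
    static int codePointByArithmetic(char character) {
        if (character >= '0' && character <= '9') {
            return character - '0';
        }
        if (character >= 'a' && character <= 'f') {
            return character - 'a' + 10;
        }
        throw new IllegalArgumentException("Not a lower-case hex digit: " + character);
    }

    static char characterByArithmetic(int codePoint) {
        if (codePoint >= 0 && codePoint <= 9) {
            return (char) ('0' + codePoint);
        }
        if (codePoint >= 10 && codePoint <= 15) {
            return (char) ('a' + (codePoint - 10));
        }
        throw new IllegalArgumentException("Code-point out of range: " + codePoint);
    }

    // Approach 2: an array whose indexes are the code-points. It is sorted by
    // character code, so a binary search finds the code-point of a character.
    private static final char[] CHARACTERS = "0123456789abcdef".toCharArray();

    static int codePointBySearch(char character) {
        int index = Arrays.binarySearch(CHARACTERS, character);
        if (index < 0) {
            throw new IllegalArgumentException("Not a lower-case hex digit: " + character);
        }
        return index;
    }

    static char characterByLookup(int codePoint) {
        return CHARACTERS[codePoint]; // reverse mapping is a plain array lookup
    }

    public static void main(String[] args) {
        System.out.println(codePointByArithmetic('c')); // 12
        System.out.println(codePointBySearch('c'));     // 12
        System.out.println(characterByArithmetic(14));  // e
        System.out.println(characterByLookup(14));      // e
    }
}
```

For a non-sequential or unsorted character set, the arithmetic version no longer applies and the search would have to be linear, which is where a switch/case or an associative array becomes more convenient.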
Weakness
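This extension shares the same weakness as the original algorithm, namely, it cannot detect the transposition of the first valid character followed by the last valid character (the characters with code-points 0 and N - 1) into the last followed by the first. This is equivalent to the transposition of 09 to 90 in the original algorithm. On a positive note, the larger the set of valid input characters, the smaller the impact of the weakness, since this is only one of the many possible pairs of characters.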
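The reason is arithmetic: doubling code-point 0 gives 0, and doubling code-point N - 1 gives 2N - 2, whose base-N digits 1 and N - 2 again sum to N - 1. Each of the two characters therefore contributes the same amount whether or not it is doubled, so swapping them leaves the total unchanged. The Java sketch below illustrates this for the alphabet a to f (N = 6, an assumed mapping; the class name is hypothetical): "af" and "fa" receive the same check character, while swapping b and c changes it.

```java
// Sketch: the undetected transposition of the first and last valid characters,
// assuming the alphabet a..f with code-points 0..5 (N = 6).
class TranspositionWeakness {
    private static final String ALPHABET = "abcdef";

    static char generateCheckCharacter(String input) {
        int n = ALPHABET.length();
        int factor = 2; // the rightmost input character is doubled
        int sum = 0;
        for (int i = input.length() - 1; i >= 0; i--) {
            int codePoint = ALPHABET.indexOf(input.charAt(i));
            if (codePoint < 0) {
                throw new IllegalArgumentException("Invalid character: " + input.charAt(i));
            }
            int addend = factor * codePoint;
            sum += addend / n + addend % n; // sum of the base-n digits of addend
            factor = (factor == 2) ? 1 : 2;
        }
        return ALPHABET.charAt((n - sum % n) % n);
    }

    public static void main(String[] args) {
        // Swapping a (code-point 0) and f (code-point N - 1) goes undetected.
        System.out.println(generateCheckCharacter("af")); // b
        System.out.println(generateCheckCharacter("fa")); // b
        // Swapping other characters, such as b and c, is detected.
        System.out.println(generateCheckCharacter("bc")); // b
        System.out.println(generateCheckCharacter("cb")); // c
    }
}
```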