Lyra2


Lyra2 is a password hashing scheme that can also work as a key derivation function. It received a special recognition during the Password Hashing Competition in July 2015., which was won by Argon2. Besides being used for its original purposes, it is also in the core of proof-of-work algorithms such as Lyra2REv2, adopted by Vertcoin, MonaCoin, among other cryptocurrencies
Lyra2 was designed by Marcos A. Simplicio Jr., Leonardo C. Almeida, Ewerton R. Andrade, Paulo C. F. dos Santos, and Paulo S. L. M. Barreto from Escola Politécnica da Universidade de São Paulo.. It is an improvement over Lyra, previously proposed by the same authors. Lyra2 preserves the security, efficiency and flexibility of its predecessor, including: the ability to configure the desired amount of memory, processing time and parallelism to be used by the algorithm; and the capacity of providing a high memory usage with a processing time similar to that obtained with scrypt. In addition, it brings the following improvements when compared to its predecessor:
This algorithm enables parameterization in terms of:
The main strengths of the algorithm are:
As any PHS, Lyra2 takes as input a salt and a password, creating a pseudorandom output that can then be used as key material for cryptographic algorithms or as an authentication string.
Internally, the scheme's memory is organized as a matrix that is expected to remain in memory during the whole password hashing process: since its cells are iteratively read and written, discarding a cell for saving memory leads to the need of recomputing it whenever it is accessed once again, until the point it was last modified.
The construction and visitation of the matrix is done using a stateful combination of the absorbing, squeezing and duplexing operations of the underlying sponge, ensuring the sequential nature of the whole process.
Also, the number of times the matrix's cells are revisited after initialization is defined by the user, allowing Lyra2's execution time to be fine-tuned according to the target platform's resources.
  1. *** Functions/symbols ***
  2. || Concatenate two strings
  3. ^ Bitwise XOR
  4. Wordwise add operation
  5. % Modulus
  6. W The target machine's word size
  7. omega Number of bits to be used in rotations
  8. >>> Right rotation
  9. rho Number of rounds for reduced squeeze or duplexing operations
  10. blen Sponge's block size in bytes
  11. H or H_i Sponge with block size blen and underlying permutation f
  12. H.absorb Sponge's absorb operation on input
  13. H.squeeze Sponge's squeeze operation of len bytes
  14. H.squeeze_ Sponge's squeeze operation of len bytes using rho rounds of f
  15. H.duplexing Sponge's duplexing operation on input, producing len bytes
  16. H.duplexing_ Sponge's duplexing operation on input, using rho rounds of f, producing len bytes
  17. pad Pads a string to a multiple of blen bytes
  18. lsw The least significant word of input
  19. len Length of a string, in bytes
  20. syncThreads Synchronize parallel threads
  21. swap Swap the value of two inputs
  22. C Number of columns on the memory matrix
  23. P Degree of parallelism
  24. *** Inputs ***
  25. password
  26. salt
  27. t_cost
  28. m_cost
  29. outlen
  30. *** Algorithm without parallelism ***
  31. ** Bootstrapping phase: Initializes the sponge's state and local variables
  32. Byte representation of input parameters
params = outlen || len || len || t_cost || m_cost || C
  1. Initializes the sponge's state
H.absorb
  1. Initializes visitation step, window and first rows that will feed
gap = 1
stp = 1
wnd = 2
sqrt = 2
prev0 = 2
row1 = 1
prev1 = 0
  1. ** Setup phase: Initializes a memory matrix, its cells having blen-byte cells
  2. Initializes M, M and M
for col = 0 to C-1
M = H.squeeze_
for col = 0 to C-1
M = H.duplexing_
for col = 0 to C-1
M = H.duplexing_
  1. Filling Loop: initializes remainder rows
for row0 = 3 to m_cost-1
# Columns Loop: M is initialized and M is updated
for col = 0 to C-1
rand = H.duplexing_
M = M ^ rand
M = M ^
# Rows to be revisited in next loop
prev0 = row0
prev1 = row1
row1 = % wnd
# Window fully revisited
if
# Doubles window and adjusts step
wnd = 2 * wnd
stp = sqrt + gap
gap = -gap

# Doubles sqrt every other iteration
if
sqrt = 2 * sqrt
  1. ** Wandering phase: Iteratively overwrites pseudorandom cells of the memory matrix
  2. Visitation Loop: rows revisited in pseudorandom fashion
for wCount = 0 to
# Picks pseudorandom rows
row0 = lsw % m_cost
row1 = lsw % m_cost
# Columns Loop: updates both M and M
for col = 0 to C-1
# Picks pseudorandom columns
col0 = lsw % C
col1 = lsw % C
rand = H.duplexing_
M = M ^ rand
M = M ^
# Next iteration revisits most recently updated rows
prev0 = row0
prev1 = row1
  1. ** Wrap-up phase: output computation
  2. Absorbs a final column with a full-round sponge
H.absorb
  1. Squeezes outlen bits with a full-round sponge
output = H.squeeze
  1. Provides outlen-long bitstring as output
return output
  1. *** Algorithm with parallelism ***
for each i in , M_i and M_i
for col = 0 to C-1
M_i = H_i.squeeze_
for col = 0 to C-1
M_i = H_i.duplexing_
for col = 0 to C-1
M_i = H_i.duplexing_
# Filling Loop: initializes remainder rows
for row0 = 3 to
# Columns Loop: M_i is initialized and M_j is updated
for col = 0 to C-1
rand = H_i.duplexing_
M_i = M_i ^ rand
M_j = M_j ^
# Rows to be revisited in next loop
prev0 = row0
prevP = rowP
rowP = % wnd
# Window fully revisited
if
# Doubles window and adjusts step
wnd = 2 * wnd
stp = sqrt + gap
gap = -gap

# Doubles sqrt every other iteration
if
sqrt = 2 * sqrt

# Synchronize point
if
sync = sync +
j = % P
syncThreads
syncThreads
# ** Wandering phase: Iteratively overwrites pseudorandom cells of the memory matrix
wnd = m_cost /
sync = sqrt
off0 = 0
offP = wnd
# Visitation Loop: rows revisited in pseudorandom fashion
for wCount = 0 to
# Picks pseudorandom rows and slices
row0 = off0 +
rowP = offP +
j = lsw % P
# Columns Loop: update M_i
for col = 0 to C-1
# Picks pseudorandom column
col0 = lsw % C
rand = H_i.duplexing_
M_i = M_i ^ rand
# Next iteration revisits most recently updated rows
prev0 = row0

# Synchronize point
if
sync = sync + sqrt
swap
syncThreads
syncThreads
# ** Wrap-up phase: output computation
# Absorbs a final column with a full-round sponge
H_i.absorb
# Squeezes outlen bits with a full-round sponge
output_i = H_i.squeeze
  1. Provides outlen-long bitstring as output
return output_0 ^... ^ output_

Security analysis

Against Lyra2, the processing cost of attacks using of the amount of memory employed by a legitimate user is expected to be between and, the latter being a better estimate for, instead of the achieved when the amount of memory is, where is a user-defined parameter to define a processing time.
This compares well to scrypt, which displays a cost of when the memory usage is, and with other solutions in the literature, for which the results are usually.
Nonetheless, in practice these solutions usually involve a value of lower than those attained with the Lyra2 for the same processing time.

Performance

The processing time obtained with a SSE single-core implementation of Lyra2 are illustrated in the hereby shown figure. This figure was extracted from, and is very similar of third-party benchmarks performed during the PHC context.
The results depicted correspond to the average execution time of Lyra2 configured with,, bits, and different and settings, giving an overall idea of possible combinations of parameters and the corresponding usage of resources.
As shown in this figure, Lyra2 is able to execute in: less than 1 s while using up to 400 MB or up to 1 GB of memory ; or in less than 5 s with 1.6 GB.
All tests were performed on an Intel Xeon E5-2430 equipped with 48 GB of DRAM, running Ubuntu 14.04 LTS 64 bits, and the source code was compiled using gcc 4.9.2.