Most systems today need to handle the user authentication. That means, the password entered during user registration must be stored in the system for later comparison.

It is obvious that the passwords must not be stored in plain-text form. In that case, if an attacker succeeded in getting access to the database, where these passwords are stored (e.g. using SQL Injection), he would obtain the whole list of user names with their corresponding passwords. Then it is very simple for him to impersonate a valid user.

Hashing

However, to check, if the password entered by the user is correct, we do not need the original password. It is enough to have a suitable information, which uniquely identifies it and can be easily computed from each password entering the system.

Such information is the password hash. Hash algorithm is a one-way function, generating a fixed-length string from the inputs (in this case from the given password) with no possibility to derive these inputs back from the computed string. Another property of a cryptographic hash function is that change of one input bit leads to change of many bits in the resulting hash. When the hash function is collision-free, we can assume that the identical hashes imply the identical inputs, from which these hashes are computed.

So instead of the password itself, only its hash will be stored in the system. Every time a user tries to login to the system, hash of the password entered is computed and compared to the stored one.

Slow hashing

However, cryptographic hash functions such as MD5 or SHA are not appropriate. The purpose of these functions is calculation of digest of large amount of data to ensure its integrity. This digest needs to be computed in as short time as possible, and thus these hash functions are designed to be fast. This property is, however, not desirable for password hashing.

As an example take the MD5 function. One 2.13GHz core is able to compute cca 6 million MD5 hashes per second using Cain & Abel tool. Trying every single possible 8 character long lowercase alphanumeric password then takes approximately 130 hours. And that is only one core. Modern computers use more of them, for example with six such cores a password can be cracked in less than a day. Furthermore, we can definitely assume that an attacker has much better equipment.

In order to prevent an attacker from trying millions of hashes per second, we need to use a slow cryptographic hash function for password hashing. Several hash functions were specifically designed for this purpose. These functions include: PBKDF2, bcrypt, scrypt.

Work factor parameters

These hash functions are not only slow, they also come up with work factor parameters defining how expensive the hash computation will be. Although the scrypt function is the youngest one (designed in 2009), it has an advantage over the older ones – it not only defines the CPU cost, but also the memory requirements. That is why scrypt is recommended function for password storage and this article talks mainly about it.

Scrypt uses following work factor parameters:

  • N – number of iterations, related to both memory and CPU cost
  • r – size of the RAM block needed, related to memory cost
  • p – parallelization, defines maximum number of threads, related to CPU cost

These parameters allow to set the memory needed and time it takes to compute one hash. The approximate memory usage for a single hash generation can be computed from the parameters using the following formula:

memory  =  N  ·  2  ·  r  ·  64

The time, on the other hand, is platform-dependent. The graph below shows dependency of time needed for single hash computation on the work factor parameters N and r. The parallelization parameter is set to 1 in all cases. The values in the graph were measured using CryptSharp, the C# implementation of scrypt function, on Windows Server 2012 with four 2.2GHz cores.

csharp_Server12_scrypt

It is needed to specify the computation time as a compromise between the usability and security provided. For example, if we have a system with only one login at a time and high security is needed, we set the parameters to make computation take cca one second. However, in case of many parallel logins this time needs to be set to only few milliseconds.

We can take the above example of password hash cracking. Using scrypt function (CryptSharp implementation) with parameters N=210, r=4 and p=1, hashing of one password takes approximately 10ms, i.e. this 2.2GHz core is able to compute 100 hashes per second. Then computation of all possible 8 character long lowercase alphanumeric passwords takes 895 years.

Attacker goals

Imagine an attacker, who obtained the list of user names and corresponding password hashes. There are now three goals he can have:

  • Crack a password of one specific user (e.g. admin)
  • Crack a password of any user
  • Crack passwords of a longer list of users

Attacks

In the first option the attacker has a password hash and wants to find the corresponding password it was computed from. He can use brute force or dictionary attack, i.e. try many possible inputs to the hashing function and compare the results with the obtained password hash.
An effective method for trying so many hashes is usage of lookup tables. The general idea is to pre-compute hashes of possible passwords and store them in a lookup table data structure (or Rainbow tables for lower memory requirements). Comparison of these pre-computed values with given hash is much faster than hash computation.

The second option is simpler. The only thing needed is to compute hashes of possible inputs and compare each result with all password hashes in the obtained list. Sooner or later the attacker will hit some match.

For cracking a longer list of hashes the attacker does not need to crack one password at a time, he will instead compare each computed hash with all hashes from the list. This way cracking of a hashes list takes approximately the same time as cracking only one specific password.

Salt

The above attacks work because each password is hashed the same way, the same password always results in the same hash. The simplest way of preventing against this is salting. That means, a random string (salt) is generated for each password and used together with it to create a hash.
It is needed to ensure uniqueness of the salts, thus they really need to be randomly generated. Any random number generator can be used, however, cryptographically secure RNGs, such as RNGCryptoServiceProvider in C# or SecureRandom in Java, are recommended.

The salt is a non-secret value, it needs to be stored together with the password hash to ensure its availability to the hash function. Thus, if someone gets access to the hashes, he automatically gets also all the salts. However, the salt power is not in its secrecy, but in randomness.

With different salt, same passwords result in different hashes. Pre-computed hash attack is infeasible due to a large additional memory requirements – an attacker needs to store pre-computed hashes for each possible salt.

Cracking password of any user is reduced to cracking password of a specific one, since the salt for each user password is different.

Also cracking of a larger list of hashes is more complicated with different salt for each password, the attacker has no other choice than cracking one password at a time.

Pepper

In order to increase security even more, we can use another randomly generated string – pepper. In comparison to the salt, pepper needs to be kept secret as it is used as an HMAC key. HMAC is a one-way algorithm based on hash function generating fixed-length string from the input message and a secret key, which in our case is generated pepper.

Since pepper is a secret key, it needs to be generated using a cryptographically secure random number generator, such as RNGCryptoServiceProvider.

public static byte[] GeneratePepper()
{
    byte[] pepper = new byte[32];
    RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
    rng.GetBytes(pepper);
    return pepper;
}

When generated, the pepper must be stored separately in a configuration file with restricted access.

Although an attacker had enough resources to be able to crack the hash function, he would still need this secret value for obtaining the user password. And with pepper randomly generated for each system instance, if one instance is compromised, other remain secure.

Overall scheme

The overall hashing of the password with both salt and pepper looks as follows:

scrypt ( Base64 ( HMAC ( ‘SHA256’, password, pepper ) ), salt, workFactors )

And the C# implementation of this scheme using CryptSharp library:

public byte[] HashPassword(String password, byte[] pepper, byte[] salt)
{
    if (salt == null)
    {
        Console.WriteLine("Password hash not created - salt is null.");
        return null;
    }

    String encodedHmac = HmacBase64(password, pepper);
    return CryptSharp.Utility.SCrypt.ComputeDerivedKey(Encoding.UTF8.
         GetBytes(encodedHmac), salt, n, r, p, null, HASH_LENGTH);
}

private static string HmacBase64(string password, byte[] pepper)
{
    if (pepper == null)
    {
        Console.WriteLine("Password hash not created - pepper is null.");
        return null;
    }
    HMACSHA256 hmac = new HMACSHA256(pepper);
    hmac.Initialize();
    byte[] buffer = Encoding.UTF8.GetBytes(password);
    byte[] rawHmac = hmac.ComputeHash(buffer);
    return System.Convert.ToBase64String(rawHmac);
}

Conclusion

User passwords must never be stored as plain text, always compute its hash using a slow cryptographic hash function. To each password generate random salt and use this value together with the password for hash computation. For higher level of security generate random secret pepper for each system instance.

Of course, security of the user password depends on the password itself. An attacker could still try frequently used passwords such as “123456”, however, with secure storage we can protect him from trying too many of them and from obtaining the strong ones.

Comments