Data leaks – incidents where unauthorized people have gained access to data collections – happen again and again. To prevent user passwords from being compromised in such a case, it is important that they are not simply stored in plain text. Instead, they should always be stored only “hashed”. This article explains which hash functions are suitable for this purpose.
Introduction
Almost every week, there is news of another data leak in which attackers have broken into someone else’s database and read out the data stored in it. This data often includes user information such as email addresses and passwords. The website Have I Been Pwned? [1] collects and analyses data leaks. Internet users can check on this website whether their personal data has been leaked.
This is precisely why passwords should not be stored in plain text. Instead, each password should be hashed, and only its hash value should be stored. Hash functions [2] are so-called one-way functions: they are easy to compute but difficult to reverse. For a password, this means that its hash can be calculated easily, but recovering the password from the hash alone is extremely difficult. An attacker’s only option is to compute the hashes of candidate passwords and compare each one with the given hash.
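The guessing approach described above can be sketched in a few lines of Python. This is only an illustration: the word list and the helper names `sha256_hex` and `guess_password` are made up for this example, and plain unsalted SHA-256 is used here precisely because it is the kind of fast hash that makes guessing cheap, not because it is suitable for storing passwords.

```python
import hashlib
from typing import List, Optional


def sha256_hex(password: str) -> str:
    """Return the SHA-256 digest of a password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def guess_password(target_hash: str, candidates: List[str]) -> Optional[str]:
    """Hash each candidate and return the one that matches target_hash, if any."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# A leaked hash; the attacker does not know the password behind it.
leaked_hash = sha256_hex("sunshine")

# Hypothetical word list; real attacks use lists with millions of entries.
wordlist = ["123456", "password", "qwerty", "sunshine", "letmein"]

print(guess_password(leaked_hash, wordlist))
```

The hash is never reversed; the attacker simply recomputes it for every guess. The cheaper a single hash computation is, the more guesses fit into the same time.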
Unfortunately, users have no control over how their passwords are stored. Most of the time they don’t even know how their passwords are stored. If a user forgets their password, uses the “forgot password” function of a website and receives their old password in plain text via email, they know that the password is stored either in plain text or in reversible form. Both cases are problematic, and the user should make sure not to use the same password for other web services.
How Passwords Are Stored
There are only a few studies about how passwords are stored. This is not surprising, since this information is usually not publicly available, and if it is, it is usually only said that the passwords are stored securely.
In 2017 [3] and 2018 [4], Naiakshina et al. investigated whether students securely store passwords when developing an application that uses passwords to authenticate users. As part of the study, students were tasked with programming the registration for a social networking platform. Part of the assignment was to securely store passwords. However, only half of the students were told about this. The other half were not explicitly informed that the security of the registration was also a requirement of the project.
The evaluation found that none of the students who were not told to store passwords securely did so. The students did not feel responsible for the security of their implementation because it was not explicitly required of them.
Following the programming portion, the students were asked about their knowledge of security. The majority of the students knew what hashing was. However, it was shown that knowledge about secure software did not automatically lead to secure software. The functionality of the implementation was the main focus. Security was at most an afterthought.
Several students stated in the subsequent interview that they would have stored the passwords securely if they had actually implemented the registration for a company and not as part of a project during their studies. Naiakshina et al. therefore wanted to find out whether this would actually have been the case and repeated the study in 2019 [5] with freelance developers.
They hired over 40 developers through a platform, who then also developed the registration part of a social networking platform. The results of this study were sobering. The majority of freelancers also did not store passwords securely unless they were explicitly asked to do so. There were also many misconceptions about how to store passwords securely, or outdated methods were used that no longer meet current recommendations or are even considered simply insecure.
Of the 22 freelancers who had not been explicitly asked beforehand to store passwords securely, only 4 did not store them in plain text. All the others were subsequently asked to store passwords securely and to adapt their implementation.
Table 1 shows which algorithms were used to store passwords.
| Method | Number of solutions * |
| --- | --- |
| Base64 | 8 |
| MD5 | 10 |
| SHA1 | 1 |
| SHA256 | 5 |
| HMAC-SHA1 | 1 |
| Symmetric (3DES) | 3 |
| Symmetric (AES) | 3 |
| PBKDF2 | 5 |
| Bcrypt | 7 |
8 of the 43 developers stored passwords Base64-encoded. Base64 is a method for encoding binary data. Although the password is then no longer recognizable to the naked eye, it is trivial to decode Base64-encoded data.
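How little protection Base64 offers can be seen in a short Python sketch using the standard `base64` module. The example password is made up for illustration.

```python
import base64

# What a Base64-"protected" password column would contain.
stored = base64.b64encode("correct horse".encode("utf-8"))

# Anyone who reads the stored value can decode it without any key or guessing.
recovered = base64.b64decode(stored).decode("utf-8")

print(stored)
print(recovered)
```

No secret is involved at any point, so Base64 is an encoding and not a form of protection.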
10 of the 43 developers used the hash algorithm MD5, which has been considered cryptographically broken for over 10 years. 2 developers used SHA1 or HMAC-SHA1, respectively; SHA1 is likewise considered broken.
6 developers encrypted the passwords symmetrically. Half of them used the Triple-DES (3DES) encryption algorithm, which is vulnerable to multiple attacks and should no longer be used. Regardless of the algorithm, passwords should not be stored encrypted, because an attacker who compromises the system on which the passwords are stored often also obtains the key that was used for encryption. With the key, they can easily decrypt all encrypted passwords. Moreover, most of these developers chose encryption passwords that would have been quite easy to guess. Furthermore, with encrypted passwords, an internal employee with malicious intent may be able to recover the passwords.
5 developers used the hash function SHA256. This hash function was not designed for hashing passwords; it is optimized for speed. As a result, an attacker can test a very large number of candidate passwords per second, so SHA256 password hashes can be cracked more quickly.
Only 12 of the developers used PBKDF2 or Bcrypt. Both are modern algorithms suitable for hashing passwords.
None of the developers used a memory-intensive hashing algorithm like Argon2 or Scrypt, which are state-of-the-art.
How come so many outdated algorithms were used? On the one hand, the interviews revealed that developers were using them in part because they had been doing so for years without checking what was currently recommended. On the other hand, it appeared that both the students and the freelancers sometimes copied code from the Internet. While there is nothing wrong with copying code, the fact is that the code one finds is not always up to date. Acar et al. [6] found that answers on Stack Overflow [7] (an Internet platform where software development questions can be asked) are easy to use, but result in less security than official API documentation.
How Passwords Should Be Stored
Passwords need to be stored hashed. That much is clear. But not every hash function is suitable for hashing passwords! Different hash functions have different purposes. Only password hash functions or key derivation functions should be used, since these functions have features that make brute-force attacks much more difficult.
Recommended functions are (based on [8] and [9], among others):
- Argon2: Argon2 is a family of password hashing functions and emerged as the winner of the Password Hashing Competition [10] in 2015. Argon2 currently represents the best choice for password hashing. However, since the functions are still relatively new, some of them are not yet supported on older systems.
- Scrypt: Scrypt is also a newer function, designed to require a certain amount of memory for its computation. This is intended to make parallelization more expensive. Since Scrypt is also relatively new, it has the same disadvantage as Argon2 regarding support.
- Bcrypt: Bcrypt was also designed so that computing a hash is deliberately expensive. An adjustable cost factor determines how long the computation of a hash takes. Bcrypt is probably the most widely supported algorithm.
- PBKDF2: PBKDF2 is a key derivation function that can derive a cryptographic key from a password. However, the function is also suitable for hashing passwords. PBKDF2 is not memory-intensive, so attackers can accelerate it efficiently on GPUs.
If the selected hash function has a configurable cost factor, such as the number of hash rounds or the memory consumption, it should be adapted to the system in question. The value should be chosen as high as the system on which the hash is calculated allows.
Since hardware is constantly becoming more powerful and cheaper, it is important to increase the cost factor from time to time. This can be done with the methods for changing the hash function presented in the next section.
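One possible way to pick a cost factor is to measure on the target system. The following Python sketch does this for PBKDF2 from the standard library. The function name `calibrate_pbkdf2_iterations`, the starting value and the target duration of 0.1 seconds are assumptions for illustration, not recommendations from the sources cited here.

```python
import hashlib
import time


def calibrate_pbkdf2_iterations(target_seconds: float, start: int = 10_000) -> int:
    """Double the iteration count until a single hash takes at least target_seconds."""
    iterations = start
    while True:
        begin = time.perf_counter()
        hashlib.pbkdf2_hmac(
            "sha256", b"benchmark-password", b"benchmark-salt-16", iterations
        )
        elapsed = time.perf_counter() - begin
        if elapsed >= target_seconds:
            return iterations
        iterations *= 2


# Assumed target: roughly a tenth of a second per login attempt on this machine.
print(calibrate_pbkdf2_iterations(0.1))
```

Re-running such a measurement after a hardware upgrade gives a simple signal for when the stored cost factor should be raised.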
If several users choose the same password, the same hash results for each of them. To avoid this, the password is combined with a random value, the so-called salt, when the hash is calculated. The salt is chosen randomly for each password. This not only results in different hashes for the same password, but also makes it more difficult to crack password hashes, because an attacker has to attack each hash separately. Some password hashing functions automatically use salts when calculating the hash. For other functions, salting has to be implemented by the developer. For this purpose, a random string should be generated with a cryptographically secure random number generator. Then the salt and the password are concatenated and the resulting string is passed to the hash function. The salt must be stored in the database along with the hash to allow password verification.
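A minimal sketch of salted hashing in Python, using only the standard library, might look as follows. It assumes PBKDF2 with SHA-256 because that is available everywhere; where Argon2, Scrypt or Bcrypt bindings are available, those would be preferable according to the list above. PBKDF2 takes the salt as a separate parameter instead of a concatenated string. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed cost factor; should be tuned for the target system


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); both must be stored to verify the password later."""
    salt = secrets.token_bytes(16)  # cryptographically secure random salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


# Two users with the same password end up with different stored hashes.
salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")
print(hash_a != hash_b)
print(verify_password("hunter2", salt_a, hash_a))
```

Because every record has its own salt, a precomputed table of hashes for common passwords cannot be reused across users.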
Passwords can be not only salted but also peppered. Salting and peppering work very similarly, but unlike a salt, the pepper is the same for all stored passwords. The pepper is not stored in the database together with the password hashes, but in a separate location. The purpose of the pepper is to prevent an attacker from cracking stolen hashes if they only have access to the database.
Migration of Password Hash Function
But what if one suddenly realizes that the currently used password hash function is not secure or no longer up to date? What should be done in such a case? There are several possible courses of action:
Password reset:
From the time of migration, all users will be forced to change their password when logging in. The old password will no longer be accepted. The advantage of this method is that the new hash function takes effect immediately. The disadvantage, however, is that all users are forced to change their password. This will most likely lead to an increased number of support requests. Furthermore, problems may arise if a user is unable to reset the password, for example because they no longer have access to the email account they specified when creating the user account.
Change of function after login:
During login, the user’s password is compared with the old stored hash. If the password is correct, it is hashed with the new hash function and saved. For this purpose, it must be clear for each user which hash function is used, either on the basis of the stored hash (e.g. due to the length) or on the basis of an additional column in the database. It must be ensured that the old hash is no longer taken into account when logging in. The advantage of this method is that the users do not notice anything. However, the disadvantage is that as long as a user does not log in, they are not migrated to the new hash function.
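The login-time migration could be sketched as follows in Python. Everything here is an assumption for illustration: the user record is a plain dictionary with an `algo` field standing in for the additional database column, the legacy scheme is unsalted SHA-256, and the new scheme is salted PBKDF2 from the standard library.

```python
import hashlib
import hmac
import secrets
from typing import Any, Dict

ITERATIONS = 200_000  # assumed cost factor for the new scheme


def legacy_hash(password: str) -> bytes:
    """Old scheme (illustrative): a single unsalted SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def new_hash(password: str, salt: bytes) -> bytes:
    """New scheme (illustrative): salted PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def login(record: Dict[str, Any], password: str) -> bool:
    """Check the password and upgrade a legacy record after a successful login."""
    if record["algo"] == "sha256":
        if not hmac.compare_digest(legacy_hash(password), record["hash"]):
            return False
        # Password is correct: overwrite the old hash so it is never used again.
        salt = secrets.token_bytes(16)
        record.update(algo="pbkdf2", salt=salt, hash=new_hash(password, salt))
        return True
    return hmac.compare_digest(new_hash(password, record["salt"]), record["hash"])


user = {"algo": "sha256", "salt": b"", "hash": legacy_hash("s3cret")}
print(login(user, "s3cret"), user["algo"])
```

The record is only rewritten after a successful check, so a failed login attempt leaves the stored data untouched.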
Double hashing:
This variant is similar to the previous one, but all old hashes are hashed and stored with the new password hash function. This ensures that there are no more old hashes in the database. The advantage of this method is also that the migration of the password hash function happens in the background without the users noticing anything. However, this approach leads to a new vulnerability. If an attacker manages to get hold of both the double hashed passwords and passwords hashed with the old hash function (for example, from a completely different database or from known data leaks), they can try to find out passwords by using the list of old hashes as input for the new hash function without the old hashes having been cracked.
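Double hashing and its weakness can be illustrated with a short Python sketch. The schemes are again assumptions chosen for illustration: unsalted SHA-256 as the old function and salted PBKDF2 as the new one. The helper names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed cost factor for the new scheme


def legacy_hash(password: str) -> bytes:
    """Old scheme (illustrative): a single unsalted SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def wrap_legacy_hash(old_digest: bytes) -> Tuple[bytes, bytes]:
    """Migration step: hash the stored old digest with the new function."""
    salt = secrets.token_bytes(16)
    return salt, hashlib.pbkdf2_hmac("sha256", old_digest, salt, ITERATIONS)


def verify_wrapped(password: str, salt: bytes, stored: bytes) -> bool:
    """Login check: apply the old function first, then the new one."""
    candidate = hashlib.pbkdf2_hmac("sha256", legacy_hash(password), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


def old_hash_matches(old_digest: bytes, salt: bytes, stored: bytes) -> bool:
    """Attacker's view: test a leaked old hash directly, without knowing the password."""
    candidate = hashlib.pbkdf2_hmac("sha256", old_digest, salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


old = legacy_hash("s3cret")
salt, wrapped = wrap_legacy_hash(old)
print(verify_wrapped("s3cret", salt, wrapped))
print(old_hash_matches(old, salt, wrapped))
```

The last call shows the vulnerability described above: an old hash obtained from another leak is enough to confirm a match against the double-hashed value, even though the password itself was never cracked.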
These options affect users differently and must be weighed against each other accordingly. Theoretically, it would also be possible to change the hash function at a certain point in time after a successful login and to force a password change at a later point in time for all users who have not logged in in the meantime.
Conclusion
Not every hash function is suitable for hashing passwords. Therefore, the statement that passwords should be stored hashed is not sufficiently precise. A hash function that has been specially designed for hashing passwords should be used.
Furthermore, it should be regularly checked whether the hash function used and the possible cost factor are still up to date. If this is no longer the case, the function or cost factor should be adjusted.
As the studies by Naiakshina et al. have shown, it is important that specifications also include the security requirements. Otherwise, there is a risk that the specification will be implemented without paying attention to security. Security must also be made a priority and placed on the same level as functional correctness.
Sources
- [1]: https://haveibeenpwned.com/
- [2]: https://de.wikipedia.org/wiki/Hashfunktion
- [3]: A. Naiakshina, A. Danilova, C. Tiefenau, M. Herzog, S. Dechand and M. Smith, “Why do developers get password storage wrong?: A qualitative usability study”, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 311-328, 2017
- [4]: A. Naiakshina, A. Danilova, C. Tiefenau and M. Smith, “Deception task design in developer password studies: Exploring a student sample”, Fourteenth Symposium on Usable Privacy and Security (SOUPS) 2018, pp. 297-313, 2018
- [5]: A. Naiakshina, A. Danilova, E. Gerlitz, E. von Zezschwitz and M. Smith, “‘If you want, I can store the encrypted password.’ A Password-Storage Field Study with Freelance Developers”, Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1-12, 2019
- [6]: Y. Acar, M. Backes, S. Fahl, D. Kim, L. M. Mazurek and C. Stransky, “You Get Where You’re Looking For: The Impact of Information Sources on Code Security”, 2016 IEEE Symposium on Security and Privacy, pp. 289-306, 2016
- [7]: https://stackoverflow.com/questions/
- [8]: https://tools.ietf.org/html/draft-ietf-kitten-password-storage-02#section-5
- [9]: https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#modern-algorithms
- [10]: https://password-hashing.net