Tuesday, October 14, 2014

Why Web Sites Don’t Need to Store Your Password

It seems counterintuitive, but web sites that require logins don’t actually need to store your password.  And they actually shouldn’t – it is a very bad idea to do so.   We see too many leaks of account databases for it to ever be safe to store passwords in any form, even if encrypted.

So how can a site validate a login if it doesn’t store the password?  The answer is something really cool called a hash function.  I know your eyes just glazed over, but bear with me, the concept is actually simple.

A hash function is a way of processing data that is one-way… you can put data in, and always get the same result coming out, but there is no way to reverse the process to get back the original data.  I won’t get into the specifics of how hashes actually work, but I can describe a very simple hash that will illustrate the principle.

Say, for the sake of simplicity, we are creating a web site that uses a 4-digit PIN as a password to log in.  We know that storing the PIN itself is dangerous because it could be leaked out or viewed by site administrators, so instead we add up the four digits and store that sum.  So if my PIN is 2468, we store 20 (2+4+6+8) in the database.  When we go back to the site to log in, the site can add up the four digits we enter for the PIN, compare that result against the sum in the database, and validate that we know what the correct PIN number is.  A hacker that gets his hands on the database only knows that the sum of the digits is 20… he can’t possibly know that the original PIN was 2468.  They’d have to guess what the original PIN number was by trying different combinations.

Of course this is overly simplified.  This demonstration hash function wouldn’t really work in the real world because it is too easy to figure out combinations that would let hackers in.  This situation is called a collision… 8642, 5555, 8282, 1991, and 6446 all produce the same hash value of 20.  But real hash functions used for account login verification are much, much more complicated, and aren’t normally subject to problems with collisions.  But you get the idea.  Instead of storing the actual password, we store a value that is calculated from the password.  We can validate that someone knows the password without actually storing that password.

This has other advantages as well.  For example, using a hash function there is no limit to the length of the password, because hash result values are always the same length regardless of the amount of data going in.  Someone could enter 6 letters, or 200 random symbols, and either one can be hashed down to a value of a standard length that can be stored in the database. 

Because of this, you can sometimes tell web sites that don’t use hashes to securely store passwords because they enforce a maximum length for passwords.  This isn’t always the case, but it can be one indicator that the site’s security has been poorly designed.  But if you are signing up for an account on a web site and they have a low limit on the length of the password, like 12 characters, you might look for other signs of poor site security or privacy policies.  And definitely don’t reuse a password from another site.  Or just steer clear.

The down side to using hashes is that if you forget your password the site has no way of sending it to you… because they actually don’t know it.  That is why sites generate a brand-new, random passwords that they send to you via email when you forget your password.  They honestly have no idea what your password was, so the only solution is to create a new one and use that temporarily until you create your own.

The whole process is considerably more complicated than I’ve described here – or at least it should be.  Just using a hash isn’t sufficient, either, because we’ve got affordable computers these days that can calculate billions of hashes per second and are therefore capable of brute-forcing short passwords very quickly.  (A 6-letter password, for example, would be cracked hundreds of times over in just one second using a simple hash).  But for a site to use a hash on passwords is one step in the right direction.

No comments:

Google Search