What Is SHA-1, and Why Will Retiring It Kick Thousands Off the Internet?

On the first day of 2016, Mozilla terminated support for a technology called SHA-1 in the Firefox web browser. Almost immediately, they reversed their decision, because it would have cut off important internet access to thousands–if not millions–of people. But what does this technology do? Why would it cut off access? And why would they ever decide to remove it in the first place?

What Is SHA-1?

The SHA in SHA-1 stands for Secure Hash Algorithm, and, simply put, you can think of it as a kind of math problem or method that scrambles the data that is put into it. Developed by the United States NSA, it’s a core component of many technologies used to encrypt important transmissions on the internet. Common encryption methods SSL and TLS, which you might have heard of, can use a hash function like SHA-1 to create the signed certificates you see in your browser toolbar.

We won’t go deep into the math and computer science of any of the SHA functions, but here’s the basic idea. A “hash” is a fixed-length code computed from any input data, and it is meant to be effectively unique to that input. Even a small, random string of letters fed into a hash function like SHA-1 returns a long string with a set number of characters, and it is (practically) impossible to turn that string back into the original data. This is how password storage usually works. When you create a password, your password input is hashed and stored by the server. Upon your return, when you type in your password, it is hashed again. If it matches the original hash, the input can be assumed to be the same, and you’ll be granted access to your data.
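To make the idea concrete, here is a minimal sketch in Python using the standard hashlib module. The helper names (sha1_hex, check_password) and the sample password are made up for illustration, and the password check is deliberately simplified: real services typically add a random salt and use a slower, purpose-built password hash rather than bare SHA-1.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Short or long, every input yields a digest of the same length.
short_digest = sha1_hex("abc")
long_digest = sha1_hex("a much longer string of letters " * 100)
print(short_digest)
print(len(short_digest), len(long_digest))

# Simplified password check: the server keeps only the hash,
# never the password itself. (Hypothetical example password.)
stored_hash = sha1_hex("correct horse battery staple")

def check_password(attempt: str) -> bool:
    """Hash the attempt again and compare it with the stored hash."""
    return sha1_hex(attempt) == stored_hash

print(check_password("correct horse battery staple"))
print(check_password("wrong guess"))
```

The server in this sketch can confirm a password without ever storing it, which is the property described above.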

Hash functions are useful primarily because they make it easy to tell if the input, for instance, a file or a password, has changed. When the input data is secret, like a password, it is nearly impossible to reverse the hash and recover the original data. This is a bit different from “encryption”, which scrambles data so that it can be descrambled later, using ciphers and secret keys. Hashes are simply meant to ensure data integrity–to make sure that everything is the same. Git, the version control and distribution software for open source code, uses SHA-1 hashes for this very reason.
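As a rough illustration of the Git case: Git identifies a file's contents (a “blob”) by the SHA-1 of a short header followed by the contents themselves. The sketch below reproduces that calculation in Python. The function name git_blob_id is an invention for this example, and it covers only plain file blobs, not commits or trees.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 that Git assigns to a file's contents.

    Git hashes the header "blob <size>" plus a zero byte, followed by
    the raw contents, so any change to the file changes the ID.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

original = git_blob_id(b"hello world\n")
edited = git_blob_id(b"hello world!\n")
print(original)
print(edited)
print("file changed:", original != edited)
```

If the two IDs match, the contents can be assumed identical. If they differ, something in the file has changed.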

That’s a lot of technical information, but to put it simply: a hash is not the same thing as encryption, since it is used to identify if a file has changed.

How Does This Technology Affect Me?

Let’s say you need to visit a website privately. Your bank, your email, even your Facebook account–all use encryption to keep the data you send them private. A professional website will provide encryption by obtaining a certificate from a trusted authority–a third party, trusted to ensure that the encryption is on the level, private between the website and user, and not being spied on by any other party. This relationship with the third party, called a Certificate Authority, or CA, is crucial, since any user can create a “self-signed” certificate–you can even do it yourself on a machine running Linux with OpenSSL. Symantec and DigiCert are two widely-known CA companies, for example.

Let’s run through a theoretical scenario: How-To Geek wants to keep logged-in users’ sessions private with encryption, so it petitions a CA like Symantec with a Certificate Signing Request, or CSR. How-To Geek creates a public key and private key for encrypting and decrypting data sent over the internet. The CSR sends the public key to Symantec along with information about the website. Symantec checks the key against its record to verify that the data is unchanged by all parties, because any small change in the data makes the hash radically different.
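The claim that a small change produces a radically different hash is easy to try for yourself. The following Python sketch hashes two made-up byte strings that differ by a single character and counts how many of SHA-1's 160 output bits differ. The strings and the helper names are purely illustrative and do not represent a real key or CSR.

```python
import hashlib

def sha1_as_int(data: bytes) -> int:
    """Return the 160-bit SHA-1 digest of data as one large integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the two SHA-1 digests disagree."""
    return bin(sha1_as_int(a) ^ sha1_as_int(b)).count("1")

# Hypothetical data: the last character is the only difference.
original = b"public key for howtogeek.com"
tampered = b"public key for howtogeek.con"

print(hashlib.sha1(original).hexdigest())
print(hashlib.sha1(tampered).hexdigest())
print(differing_bits(original, tampered), "of 160 bits differ")
```

For a well-behaved hash function, roughly half of the bits should flip, so the two digests look unrelated even though the inputs are almost identical.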

Those public keys and digital certificates are signed using hash functions, because the output of these functions is easy to check. A public key and certificate with a verified hash from Symantec (in our example), an authority, assures a user of How-To Geek that the key is unchanged, and not sent from someone malicious.

Because the hash is easy to monitor and impossible (some would say “difficult”) to reverse, the correct, verified hash signature means that the certificate and the connection can be trusted, and data can be agreed to be sent encrypted from end to end. But what if the hash wasn’t actually unique?

Wait, It’s a Problem That It’s My Birthday?

You might have heard of the “Birthday Problem” in mathematics, although you might not have known what it was called. The basic idea is that if you gather a large enough group of people, chances are pretty high that two or more people will have the same birthday. Higher than you’d expect, in fact–enough that it seems like a weird coincidence. In a group as small as 23 people, there’s a 50% chance that two will share a birthday.
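The 23-person figure can be checked with a few lines of Python. This sketch assumes 365 equally likely birthdays and ignores leap years. The function name is just a label chosen for this example.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays that are already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

for group_size in (10, 22, 23, 50):
    print(group_size, round(shared_birthday_probability(group_size), 4))
```

The probability crosses the 50% mark between 22 and 23 people, even though 23 is tiny compared with 365 possible days.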

This is the inherent weakness in all hashes, including SHA-1. Theoretically, the SHA function should create a unique hash for any data that is put into it, but as the number of hashes grows, it becomes more likely that different pairs of data can create the same hash. So in theory, one could create an untrusted certificate with an identical hash to a trusted certificate. If they got you to install that untrusted certificate, it could masquerade as trusted, and distribute malicious data.

Finding two different files that produce the same hash is called a collision attack, and because of the increasing power (and falling cost) of computing, it’s becoming obvious that if SHA-1 is not retired soon, it will be within reach to make phony certificates out of duplicate hashes.
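To get a feel for how the birthday effect applies to hashes, here is a toy Python sketch that looks for a collision on only the first three bytes (24 bits) of SHA-1. This is not an attack on SHA-1 itself, whose full output is 160 bits. The truncation, the function name, and the choice of simple counter strings as inputs are all assumptions made to keep the search small enough to finish quickly.

```python
import hashlib
from itertools import count

def find_truncated_collision(prefix_bytes: int = 3):
    """Find two inputs whose SHA-1 digests share the same leading bytes.

    Returns (first_message, second_message, messages_hashed).
    """
    seen = {}  # truncated digest -> message that produced it
    for i in count():
        message = str(i).encode("ascii")
        tag = hashlib.sha1(message).digest()[:prefix_bytes]
        if tag in seen:
            return seen[tag], message, i + 1
        seen[tag] = message

first, second, tried = find_truncated_collision()
print(first, second)
print("messages hashed:", tried, "out of", 2 ** 24, "possible prefixes")
```

Just as with birthdays, a match typically turns up after a number of attempts on the order of the square root of the number of possible values, which is far fewer than trying them all.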

The math behind this is actually pretty simple if we make an educated guess of 2^14 * 2^60 processor cycles to find a duplicate hash. Amazon rents out servers at a rate of mere cents an hour, and if speeds keep increasing and the server-per-hour cost keeps dropping, surpassing this number of cycles could cost as little as 173 thousand dollars by 2018. That means real life cybervillains could soon be able to afford the creation of certificates that appear signed by trusted authorities.
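For a sense of scale, the following Python sketch works only from the two figures quoted above, the 2^14 * 2^60 cycle estimate and the 173 thousand dollar price tag, and derives how many processor cycles each rented dollar would have to buy. The variable and function names are illustrative, and no real server pricing is assumed.

```python
# Figures quoted in the article.
total_cycles = 2**14 * 2**60      # the same as 2**74
projected_cost_dollars = 173_000

# Implied purchasing power: cycles that one dollar must buy.
cycles_per_dollar = total_cycles / projected_cost_dollars

def cost_in_dollars(cycles: float, cycles_per_dollar: float) -> float:
    """Cost of a computation, given how many cycles one dollar buys."""
    return cycles / cycles_per_dollar

print(f"total cycles: {total_cycles:.3e}")
print(f"cycles needed per dollar: {cycles_per_dollar:.3e}")
print(f"cost check: ${cost_in_dollars(total_cycles, cycles_per_dollar):,.0f}")
```

Plugging a different cycles-per-dollar value into cost_in_dollars shows how quickly the price of an attack falls as rented computing gets cheaper.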

Are Collision Attacks Already Possible?

Not only are collision attacks possible, at least one large scale collision attack is known to have already happened. Back in 2012, computers were infected through the Windows Update tool because Microsoft was signing their certificates with an MD5 hash. It is now incredibly easy to generate a colliding pair for this older hashing method, making all MD5-signed certificates fairly useless.

Open-source software Hashclash can be utilized on an Amazon server by practically anyone with the know-how to generate pairs of MD5 hashes for the low low price of about 65 cents, taking about ten hours. And, as we’ve discussed, it will soon be well within reach to crack the duplicate signature problem even for SHA-1.

This means that any website using SHA-1 signed encryption certificates could soon be just as problematic as those using an MD5 hash. If that happens, users could unknowingly transmit sensitive data to a “man in the middle” presenting a fake certificate with an identical hash, or even fall victim to something as sophisticated as the local hijacking of the Windows Update tool to propagate malicious software.

Why Did Mozilla Decide to Restore Support for SHA-1?

Because Firefox is one of the most popular free and open-source browsers available, many see Mozilla as an authority on internet security. Their intentions are certainly good, and nearly all authorities with a stake in online security agree that SHA-1 should be phased out as soon as possible. The unfortunate truth of the matter, though, is that old tech has a tendency to never die. (Who’s reading this on Windows XP?)

SHA-1 hashed certificates are unfortunately far more popular on the web than they have any right to be, so Mozilla flatly cutting off support for them blocks encryption for websites that have not updated to the much better SHA-2 family, such as SHA-256. Websites unable (or simply too lazy) to update their certs could be cut off entirely from their users using Firefox. When you take into consideration that this is most likely to happen in poorer countries with more limited access to the internet, it goes against Mozilla’s basic principles of a right to internet access. Needless to say, Mozilla quickly reversed their decision and updated Firefox to allow SHA-1 certs for a limited time longer.

Is There Life After SHA-1?

SHA-2 and the related family of hash algorithms were created in 2001, so they’ve been around long enough to be thoroughly tested, and are most likely adopted by most of the important services that you use. In addition to this, SHA-3 was released in August of 2015. It may one day be used to sign certificates, or perhaps another, even more sophisticated hashing method will take its place.

The SHA-1 hash is still temporarily secure, as the investment in creating a collision pair costs more than half a million dollars of server time in 2016. But it is still possible, and that threat is worth knowing about.
