Why is this significant and how did we create these?
Cryptographic hash functions are widely used across cybersecurity to quickly and uniquely fingerprint an object. Finding two different objects that hash to the same value must be computationally infeasible. Hashing is used to verify that an original object has not been altered since it was created. For example, with some internet file downloads, the file owner publishes the file together with its hash value. As a user, I can download the file, run the same hash function on it, and verify that the bits I downloaded are the same as the bits in the original file. This guarantee is one of the key underpinnings of security and data integrity.
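The download check described above can be sketched in Python with the standard `hashlib` module. This is a minimal illustration, not a prescribed tool. The function names `file_sha256` and `verify_download` are our own. It uses SHA256 rather than SHA1 because of the collision risk discussed in this post.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks so large downloads need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() keeps calling f.read(chunk_size) until it returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, published_hex: str) -> bool:
    """True if the file's digest matches the hash value the file owner published."""
    # strip()/lower() tolerate stray whitespace and uppercase hex in the posted value
    return file_sha256(path) == published_hex.strip().lower()
```

In use, you would call something like `verify_download("installer.bin", "<hash posted by the file owner>")`, where the file name is a hypothetical placeholder. A `False` result means the bits you received differ from the ones that were hashed.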
Let's look at the following scenario. We have two different documents, we run both through the SHA1 hashing algorithm, and we should get two different results.
This is expected behavior. However, if an attacker can manipulate the files to generate a collision, the result can be a malicious file with the same SHA1 hash as a clean file. The risk is that a hash collision could make a harmful file appear to be a trusted one.
How easy is this? In the following video, you'll see two different documents altered in a matter of seconds to have the same SHA1 hash but different SHA256 hashes. We generated this simple SHA1 collision using a script available here: https://github.com/nneonneo/sha1collider.
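The comparison above can be sketched in Python. This is a minimal illustration with helper names of our own choosing (`fingerprints`, `compare_documents`). It hashes two inputs with both SHA1 and SHA256 and reports which digests match. For two ordinary documents, neither digest matches. For a colliding pair like the one in the video, we would expect `sha1_match` to be `True` while `sha256_match` stays `False`.

```python
import hashlib


def fingerprints(data: bytes) -> dict[str, str]:
    """Hex digests of the same bytes under SHA1 and SHA256."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def compare_documents(a: bytes, b: bytes) -> dict[str, bool]:
    """Report whether two documents share a SHA1 digest and whether they share a SHA256 digest."""
    fa, fb = fingerprints(a), fingerprints(b)
    return {
        "sha1_match": fa["sha1"] == fb["sha1"],
        "sha256_match": fa["sha256"] == fb["sha256"],
    }
```

A SHA1 match combined with a SHA256 mismatch is the signature of a collision. It is also why checking a stronger hash such as SHA256 still exposes the altered file.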
The 2017 work by the SHAttered team showed that the weaknesses in SHA1 could be practically exploited using roughly 6,500 CPU-years and 110 GPU-years of computation. Since then, additional work by Gaëtan Leurent and Thomas Peyrin has significantly lowered the effort required. It is now estimated that a SHA1 collision can be found with roughly $50,000 of cloud compute.