Hash Functions and Cryptographic Hashing in Java

Hash functions are a foundational topic in computer science with practical applications across many areas of software engineering. Cryptographic hashes, thanks to their particular properties, are used throughout cryptography and play a key role in the implementation of cryptocurrencies, blockchains, and other distributed ledger technologies. The topic is worth understanding as more Web 3.0 technologies emerge that rely heavily on cryptography to secure a decentralized web. For this reason, I thought I should write a post covering the basics of hash functions and how to generate cryptographic hashes using Java.

What is a Hash Function?

First let’s start with the basics. A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. Hash functions always produce a fixed number of bits as output regardless of the size of the input and these values are often represented as integers or a hexadecimal string when working with them in programming languages.
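
To make the idea concrete, here is a minimal sketch that uses String.hashCode(), the fast, non-cryptographic hash built into Java. The class name SimpleHashDemo and the bucketFor helper are made up for this illustration; the point is only that inputs of any length map to a fixed-size (32-bit) value, which can then be reduced to a bucket index the way a hash table might do it.

```java
final class SimpleHashDemo {

    // Hypothetical helper: maps a key to one of `buckets` slots.
    static int bucketFor(String key, int buckets) {
        // hashCode() can be negative; floorMod keeps the index in [0, buckets).
        return Math.floorMod(key.hashCode(), buckets);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "a",
            "test-data",
            "a much longer input string that still maps to a single 32-bit value"
        };
        for (String input : inputs) {
            // Whatever the input length, hashCode() is always a 32-bit int,
            // printed here as 8 hex digits.
            System.out.printf("%08x  bucket %2d  <- \"%s\"%n",
                    input.hashCode(), bucketFor(input, 16), input);
        }
    }
}
```

String.hashCode() is fine for hash tables but offers no security guarantees, which is why the rest of this post uses cryptographic hashes.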

Hash functions have the following properties:

  • Fixed size output – Hashes generated are always the same length, regardless of the size of the input data set.
  • Determinism – A given input will always produce the same output.
  • Uniformity – Every possible output value should be roughly equally likely.
  • One way – Cannot feasibly be reversed. The hash output cannot be used to compute the input, although an attacker can still try brute force by hashing candidate inputs until one produces the same value. (Strictly, this is only guaranteed for cryptographic hash functions.)

In addition to the above, cryptographic hash functions also have the following properties:

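  • Collision resistant – It should be computationally infeasible to find two different inputs that produce the same hash value.
  • Pre-image resistant – Given only a hash value, it should be computationally infeasible to find an input that produces it; the output must not reveal any information about its input.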
  • Large avalanche – A small change in the input should produce a large change in the output hash.
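
Some of these properties are easy to observe directly. The following is a rough sketch, assuming SHA-256 from the standard library as the hash; the class name HashPropertiesDemo and its helper methods are invented for this example. It checks fixed output size and determinism, and counts how many output bits change when a single input character changes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class HashPropertiesDemo {

    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Counts the bit positions in which two equal-length digests differ.
    static int differingBits(byte[] a, byte[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count += Integer.bitCount((a[i] ^ b[i]) & 0xff);
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            big.append("test-data");
        }
        byte[] original = sha256("test-data");
        byte[] repeat = sha256("test-data");
        byte[] changed = sha256("test-datb"); // one character differs
        byte[] large = sha256(big.toString());

        System.out.println("Fixed size:    " + original.length + " and " + large.length + " bytes");
        System.out.println("Deterministic: " + Arrays.equals(original, repeat));
        System.out.println("Bits changed:  " + differingBits(original, changed) + " of 256");
    }
}
```

With a good avalanche effect, roughly half of the 256 output bits should flip for any small change to the input.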

Types of Hashing Algorithms

MD5

MD5 is a 128-bit hash algorithm that was originally designed to be used in cryptography. Since its creation, vulnerabilities have been found in its design and practical collision attacks exist, so it should no longer be used for security purposes. It is also a fast algorithm, which makes brute-force attacks cheap.

To generate an MD5 hash in Java we can run the following code:

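import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
// Encode explicitly as UTF-8 so the hash does not depend on the platform default charset.
byte[] input = "test-data".getBytes(StandardCharsets.UTF_8);
MessageDigest digest = MessageDigest.getInstance("MD5");
byte[] hash = digest.digest(input);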

To format the byte[] as hexadecimal and print the hash to the console:

Formatter formatter = new Formatter();
for (byte b : hash) {
    formatter.format("%02x", b);
}

System.out.println(formatter.toString());

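The first snippet uses the MessageDigest class from the Java standard library, which takes a byte[] as input, calculates the hash, and returns it as a byte[]. Note that MessageDigest.getInstance declares the checked NoSuchAlgorithmException, so the calling code must catch or declare it.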
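
Putting the two snippets together, a self-contained version might look like the sketch below. The class name Md5HexDemo and the helper names toHex and md5Hex are my own; the checked exception is wrapped in an unchecked one purely to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5HexDemo {

    // Hex-encodes a byte array, two lowercase hex digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    static String md5Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            return toHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // Should not happen on a standard JDK, which ships an MD5 implementation.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Well-known test vector: MD5 of the empty string.
        System.out.println(md5Hex("")); // d41d8cd98f00b204e9800998ecf8427e
        System.out.println(md5Hex("test-data"));
    }
}
```

Checking an implementation against a published test vector, as the first line of main does, is a quick way to confirm that the encoding and formatting steps are right.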

SHA-1

SHA-1 is a 160-bit cryptographic hash algorithm that was designed by the NSA and published in 1995. It became widely used for many purposes, most often to verify the integrity of files. Weaknesses in SHA-1 were found in 2005 and a practical collision was demonstrated in 2017, so this is another hash that should generally no longer be used.

The SHA-1 hash can easily be computed in Java using the following code:

byte[] input = "test-data".getBytes(StandardCharsets.UTF_8);
MessageDigest digest = MessageDigest.getInstance("SHA-1");
byte[] hash = digest.digest(input);
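
Since file integrity checks were the classic use of SHA-1, here is a sketch of how a file-sized input could be hashed without loading it all into memory, by feeding the MessageDigest in chunks with update(). The class StreamingDigestDemo and its methods are hypothetical names for this example, and the algorithm is a parameter so that a stronger hash such as SHA-256 can be used instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamingDigestDemo {

    // Reads the stream in fixed-size chunks so large inputs never need to fit in memory.
    static byte[] digestOf(InputStream in, String algorithm) {
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when the stream's digest equals the expected (for example, published) digest.
    static boolean matches(InputStream in, String algorithm, byte[] expected) {
        return MessageDigest.isEqual(digestOf(in, algorithm), expected);
    }

    public static void main(String[] args) {
        byte[] content = "test-data".getBytes(StandardCharsets.UTF_8);
        // In real use the stream would come from a file, e.g. Files.newInputStream(path).
        byte[] published = digestOf(new ByteArrayInputStream(content), "SHA-1");
        System.out.println(matches(new ByteArrayInputStream(content), "SHA-1", published)); // true
        content[0] ^= 1; // simulate a corrupted download by flipping one bit
        System.out.println(matches(new ByteArrayInputStream(content), "SHA-1", published)); // false
    }
}
```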

SHA-2

SHA-2 is a family of hashes later created by the NSA to address the weaknesses of its predecessor, SHA-1. The new design has significant changes and improvements and supports hash sizes of 224, 256, 384, or 512 bits. The family is often referred to loosely as SHA-256, after its most commonly used member, which is also the hashing algorithm used by the Bitcoin protocol. At the time of writing SHA-2 is still considered to be secure. As you would expect, the longer digests offer a larger security margin against brute-force attacks.

Here is the code to compute the SHA-256 hash:

byte[] input = "test-data".getBytes(StandardCharsets.UTF_8);
MessageDigest digest = MessageDigest.getInstance("SHA-256");
byte[] hash = digest.digest(input);
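
The other members of the family are selected just by changing the algorithm name. As a small sketch (the class Sha2FamilyDemo and the digestBits helper are names I made up; the algorithm strings are the standard Java names, with SHA-224 available from Java 8), this prints the output size of each variant:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha2FamilyDemo {

    // Returns the digest size in bits for the named algorithm.
    static int digestBits(String algorithm) {
        try {
            // Hash an empty input and measure the result, 8 bits per byte.
            return MessageDigest.getInstance(algorithm).digest(new byte[0]).length * 8;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        for (String algorithm : new String[] {"SHA-224", "SHA-256", "SHA-384", "SHA-512"}) {
            System.out.println(algorithm + " -> " + digestBits(algorithm) + " bits");
        }
    }
}
```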

SHA-3 and Keccak

SHA-3 was released by NIST in 2015 and is based on a completely different design (a sponge construction) from the MD5, SHA-1, and SHA-2 hashes. The most commonly used variant is SHA3-256, but 224, 384, and 512 bit lengths are also supported. Keccak-256 is the algorithm as it stood before NIST standardization; it delivers the same security and differs from SHA3-256 only in the padding rule, which is enough to make the two produce different hashes for the same input. Keccak-256 is also the hash used by the Ethereum blockchain protocol.

To create a SHA3-256 hash in Java we can use this code (note that the SHA-3 algorithms were introduced in Java 9, so you will need Java 9 or later to run this example):

byte[] input = "test-data".getBytes(StandardCharsets.UTF_8);
MessageDigest digest = MessageDigest.getInstance("SHA3-256");
byte[] hash = digest.digest(input);
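
Because SHA3-256 is missing on older runtimes, code that has to run on several Java versions may want to check for it rather than assume it. This is one possible sketch; the class Sha3Demo and the hexDigest helper are hypothetical, and returning an empty Optional for an unavailable algorithm is just one way to handle the case.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;

final class Sha3Demo {

    // Returns the hex digest, or an empty Optional if this runtime
    // does not provide the algorithm (e.g. SHA3-256 on Java 8).
    static Optional<String> hexDigest(String algorithm, String text) {
        try {
            byte[] hash = MessageDigest.getInstance(algorithm)
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return Optional.of(hex.toString());
        } catch (NoSuchAlgorithmException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Same input and same output length, but unrelated hash values.
        for (String algorithm : new String[] {"SHA-256", "SHA3-256"}) {
            System.out.println(algorithm + ": "
                    + hexDigest(algorithm, "test-data").orElse("not available in this runtime"));
        }
    }
}
```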

Bcrypt

In the past MD5 and later SHA-1 were used to hash passwords, but as vulnerabilities were discovered better algorithms were developed. Bcrypt is well suited to storing passwords because it is deliberately slow, and therefore resistant to brute-force attacks, and has a configurable strength. The Java standard library doesn’t currently have direct support for generating Bcrypt hashes, but Spring Security does provide an implementation in the BCryptPasswordEncoder class.
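
To show what "slow with a configurable strength" means in code without pulling in Spring Security, the sketch below swaps in a different algorithm: PBKDF2, which the JDK does ship. This is not Bcrypt and the two are not interchangeable; it is only meant to illustrate the shared idea of a salted hash whose cost is tunable (here via an iteration count). The class SlowPasswordHashDemo, its method names, and the iteration count are assumptions made for this example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class SlowPasswordHashDemo {

    // A fresh random salt per password makes identical passwords hash differently.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // More iterations means more work per guess for an attacker (and per login for us).
    static byte[] hash(char[] password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(char[] password, byte[] salt, int iterations, byte[] expected) {
        return MessageDigest.isEqual(hash(password, salt, iterations), expected);
    }

    public static void main(String[] args) {
        int iterations = 100_000; // illustrative value only, not a recommendation
        byte[] salt = newSalt();
        byte[] stored = hash("s3cret".toCharArray(), salt, iterations);
        System.out.println(verify("s3cret".toCharArray(), salt, iterations, stored)); // true
        System.out.println(verify("guess".toCharArray(), salt, iterations, stored));  // false
    }
}
```

The salt and the cost parameter have to be stored alongside the hash so that the password can be verified later; Bcrypt's output format carries both inside the hash string itself.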

Java Hashing Libraries

Guava Library

The Google Guava library provides a utility class which can make working with hashes a bit easier. First we need to include the Maven dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>

Then we can hash strings directly without converting to and from bytes:

String hashHex = Hashing.sha256()
        .hashString("test-data", StandardCharsets.UTF_8)
        .toString();

Apache Commons Codec

The Apache Commons Codec library also provides a useful utility class called DigestUtils that can generate hashes from Strings. To include the Maven dependency:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.13</version>
</dependency>

To hash a string and get the digest as hex:

String hashHex = DigestUtils.sha256Hex("test-data");

Bouncy Castle Library

Bouncy Castle is a widely used Java cryptography library. It has a helper class that can encode bytes as hex and also supports the Keccak algorithm, which is not supported by the Java standard library. To use it, include the following Maven dependency:

<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcprov-jdk15on</artifactId>
    <version>1.63</version>
</dependency>

This code will generate a Keccak hash using the Bouncy Castle Library and uses the Hex util class to encode the hash bytes as hex:

Keccak.Digest256 digest = new Keccak.Digest256();
byte[] hash = digest.digest("test-data".getBytes(StandardCharsets.UTF_8));
String hashHex = Hex.toHexString(hash);

Summary

So that covers the basics of hash functions and gives you some examples of how to generate hashes in Java. I covered just a few of the most widely known hash algorithms (MD5, SHA-1, SHA-2, SHA-3, and Bcrypt) and detailed some useful libraries which you can include in your code to make working with hashes easier. The Java standard library also provides basic support for the common hash algorithms, so third-party libraries may not always be needed, depending on the use case.
