EIP-55: Mixed-case checksum address encoding.

Mistakes get expensive when money is involved. Because of this, Bitcoin implemented error detection using checksums to catch attempts to use invalid addresses. Ethereum decided to take a more compositional approach and delegate this functionality to higher-level systems, but while those systems are being built and adopted, many users are losing their money. Implementing a checksum is fairly easy; the tricky part is doing it in a backwards-compatible way. Take a minute and try to come up with a way to add checksums to Ethereum addresses without changing them. Hint:

Ethereum addresses are based on the hexadecimal format (also called base16 or hex). [...]. Ethereum addresses are not case sensitive and can be written in lowercase or uppercase. Since the case of the letters carries no information, what if we use it to embed a checksum? That's exactly what Vitalik proposed in EIP-55:

import eth_utils  # third-party package that provides keccak() and ValidationError

def checksum_encode(addr): # Takes a 20-byte binary address as input
    hex_addr = addr.hex()
    checksummed_buffer = ""

    # Treat the hex address as ascii/utf-8 for keccak256 hashing
    hashed_address = eth_utils.keccak(text=hex_addr).hex()

    # Iterate over each character in the hex address
    for nibble_index, character in enumerate(hex_addr):

        if character in "0123456789":
            # We can't upper-case the decimal digits
            checksummed_buffer += character
        elif character in "abcdef":
            # Check if the corresponding hex digit (nibble) in the hash is 8 or higher
            hashed_address_nibble = int(hashed_address[nibble_index], 16)
            if hashed_address_nibble > 7:
                checksummed_buffer += character.upper()
            else:
                checksummed_buffer += character
        else:
            raise eth_utils.ValidationError(
                f"Unrecognized hex character {character!r} at position {nibble_index}"
            )

    return "0x" + checksummed_buffer

Basically, we compute the keccak256 hash of the lowercase hex address and use the hash's hex representation as a mask: a letter in the address is upper-cased when the hash digit at the same position is 8 or higher:

...
            # Check if the corresponding hex digit (nibble) in the hash is 8 or higher
            hashed_address_nibble = int(hashed_address[nibble_index], 16)
            if hashed_address_nibble > 7:
                checksummed_buffer += character.upper()
            else:
                checksummed_buffer += character
...
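
To make the masking step concrete, here is a minimal sketch that applies it to a toy input. The helper name `apply_case_mask` and the hash digits are made up for illustration; the digits are not real keccak256 output.

```python
def apply_case_mask(hex_addr: str, hash_hex: str) -> str:
    """Upper-case each letter of hex_addr whose matching hash nibble is >= 8."""
    out = []
    for character, hash_digit in zip(hex_addr, hash_hex):
        if character in "abcdef" and int(hash_digit, 16) >= 8:
            out.append(character.upper())
        else:
            # Decimal digits, and letters whose hash nibble is 0-7, stay as they are
            out.append(character)
    return "".join(out)


# Toy fragments with made-up hash nibbles (hypothetical, not a real hash)
print(apply_case_mask("a1b2", "8f07"))  # A1b2
print(apply_case_mask("cdef", "79a0"))  # cDEf
```

Note that the `f` in the hash above the digit `1` is wasted: a decimal digit has no case, so only the positions holding letters carry check bits.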

Older clients ignore the case and don't perform the check, hence the backwards compatibility, while newer clients perform the check and detect errors. It's fascinating that such a simple, cheap and backwards-compatible technique catches the vast majority of mistyped addresses.
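
The check a newer client performs can be sketched as re-deriving the expected casing and comparing it with what the user typed. This is only an illustration, not the code of any particular client: the name `is_checksum_valid` is hypothetical, and the hash function is passed in as a parameter so the sketch does not depend on a specific library.

```python
import string
from typing import Callable


def is_checksum_valid(address: str, keccak256: Callable[[bytes], bytes]) -> bool:
    """Return True if the casing of `address` matches its EIP-55 checksum.

    `keccak256` is assumed to map bytes to the 32-byte keccak256 digest.
    """
    body = address[2:] if address.startswith("0x") else address
    if len(body) != 40 or any(ch not in string.hexdigits for ch in body):
        return False

    # The hash is taken over the lowercase hex string, treated as ASCII
    lowered = body.lower()
    digest_hex = keccak256(lowered.encode("ascii")).hex()

    # Rebuild the casing the checksum dictates, then compare with the input
    expected = "".join(
        ch.upper() if ch in "abcdef" and int(digest_hex[i], 16) >= 8 else ch
        for i, ch in enumerate(lowered)
    )
    return body == expected
```

With `eth_utils` installed, `eth_utils.keccak` can be passed as the second argument. This sketch is deliberately strict: an all-lowercase address is rejected unless its checksum happens to call for no capitals, whereas a client that wants to stay compatible with older, un-checksummed addresses would accept all-lowercase input without checking it.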

As EIP-55 itself puts it: "On average there will be 15 check bits per address, and the net probability that a randomly generated address if mistyped will accidentally pass a check is 0.0247%. This is a ~50x improvement over ICAP, but not as good as a 4-byte check code."
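
The quoted figures come without a derivation, but a simple model reproduces them. The sketch below assumes that every one of the 40 hex characters is uniformly random, and that in a mistyped address each letter's case matches the checksum with probability 1/2; both assumptions are mine, not statements from the EIP.

```python
ADDRESS_HEX_DIGITS = 40
LETTER_PROBABILITY = 6 / 16  # a-f among the 16 hex digits

# Only letters can carry a check bit, so on average 40 * 6/16 positions do
expected_check_bits = ADDRESS_HEX_DIGITS * LETTER_PROBABILITY

# A random character passes if it is a decimal digit (10/16),
# or a letter (6/16) whose case happens to match the mask (1/2)
per_char_pass = 10 / 16 + LETTER_PROBABILITY * 0.5
pass_probability = per_char_pass ** ADDRESS_HEX_DIGITS

print(f"expected check bits: {expected_check_bits:.0f}")               # 15
print(f"chance a random address passes: {pass_probability:.4%}")  # 0.0247%
```

Under this model the pass probability is (13/16)^40, which is where the 0.0247% figure lands.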

This underscores the value of a specification, since it defines behavior, and it is a great example of a clever way to embed additional information.