Who of my contacts is already using this service?

That’s a natural question for every new user of a communication service like social networks etc. Popular examples of companies who have to deal with this question are Facebook, WhatsApp, and Instagram. A common method involves uploading the user’s address book to the service. However, this is a critical move because the service gets a lot of personal data of people who never agreed to share their data with a third party.

In some countries such as here in Germany, such practices are even strictly forbidden because citizens should be able to control the use of their personal data. Our constitutional court even gave this principle a name: it is called “informational self-determination” (“informationelle Selbstbestimmung”) and it became a basic right in 1983, only a few weeks before Orwell’s year began …

Sure, that’s why we’re hashing that stuff before uploading!

If you don’t want to upload the address book in plain text, you could upload the hashes of its entries. Instead of the message “I know someone whose phone number is +1234567890”, only the message “I know someone whose phone number has the hash 12039d6dd9a7e27622301e935b6eefc78846802e”. The server can then compare this to the phone number hashes of its users to find out whether it knows this person.

Hashing contact information before uploading is a good start. When using a cryptographic hash function like SHA-1 or something from the SHA-2 family, you are working with a solid tool. But all of these hash functions are designed to be fast and you can bruteforce them similarly to password hashes.

Bruteforcing email addresses

The easiest way to bruteforce large amounts of email hashes is to use the fact that a lot of people use one of a small set of email providers. Having a fixed email domain (e.g. “@gmail.com”) allows you to concentrate on the username. Most people use short usernames, let’s say less than 15 characters. Popular are [a-z0-9.-], i.e. 38 different chars per position.

That’s 38^15 ≈ 5*10^23 ≈ 79 bits.

For shorter usernames (10 chars), we have 38^10 ≈ 6*10^15 ≈ 52 bits.

When we consider that email addresses are not random strings but usually include real names or dictionary words, the real amount of information contained in the address is much less, which makes bruteforcing fast. Let’s have a wild guess: If one char of the usual email address contains 3 bits of information, we have only around 30 bits for the whole username. With current bruteforcing tools such as Hashcat, a hash of such an email address can be cracked in seconds.

Bruteforcing phone numbers

Assume you know that the majority of your users is from a specific country, let’s say Germany.
Their mobile phone number will start with a +49 followed by one of the 17 possible mobile carrier codes and a 7-8 digit number.

That’s 17*10^8 ≈ 2^31, i.e. less that 2 billion possibilities. Again, this can be cracked in seconds.

Additionaly, one can store the results of the bruteforce attack in a rainbow table, making it easy to reconstruct a large amount of phone numbers from their hashes all at once.

Conclusion

Pure contact information hashing is useless in case of phone numbers and weak in case of email addresses.

What can be done?

You can use the power of combinations. Assume there are 2^52 email addresses and 2^31 phone numbers, then there are 2^83 email-phone pairs, which is much harder to bruteforce.

Implementation

We’ll suggest a way of more secure user discovery, where your user needs to know
email and phone number to find a peson from his address book on your service.

Ask your user: Help your contacts to find you on [name of service]. Only those who already know your email and phone number will find you. Please enter: email, phone.

Then you generate the discovery hash for your user like that: discovery_hash = hash(user.email + user.phone) where + is meant to be the string concatenation and hash a cryptographic hash function.

On the client, your user generates the same hashes for each contact in his address book
contact_hash = hash(contact.email + contact.phone).

Note: There might be several hashes per contact if people have multiple email
addresses and phone numbers (one for every combination). But the number of hashes will still be low (8 hashes if there are 4 emails and 2 phone numbers in one contact).

Make hash cryptographically secure, slow and and it’s output big since attackers suffer from that much more than honest users do. The hashes from the SHA family are generally regarded as secure, but they are very fast. The recommended way to make a hash function slower is by iteration,
as discussed for example in PKCS #5.

3 thoughts on “Who of my contacts is already using this service?

  1. Hi Simon.

    Great post about a critical topic. Thank you for explaining this big issue, which most people might not have thought about. Also thank you for this feasible solution!

    Cheers David

  2. Some thoughts on how to find out one’s telephone number by having their email address (which is most times easy to very easy to obtain):

    Use the known email and concatenate it with the enumerated phone numbers (less than 2 billion). Do the hashing and the comparison, when it matches you have the number that belongs to the email and the person in question. Done in seconds.

    This should also be taken into consideration!

    Best Regards, Arne P.

    • Yes, if you know the email address for a given hash, you can brute-force the phone number. However, in the case of user discovery, you only get a list of hashes from the client’s address book, without any further information.

      Sure, we could achieve something in this direction by storing every hash ever sent to us so that we would accumulate a large database of hashes of contacts who are not our customers. If we wanted to know the phone number that corresponds to a given email address and this email address is contained in one of the hashes, we could use brute force. However, we only store the hashes of our customers, whose phone numbers we already know… Hashes for which no match was found (=non-customers) are thrown away immediately.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.