That’s a natural question for every new user of a communication service like social networks etc. Popular examples of companies who have to deal with this question are Facebook, WhatsApp, and Instagram. A common method involves uploading the user’s address book to the service. However, this is a critical move because the service gets a lot of personal data of people who never agreed to share their data with a third party.
In some countries such as here in Germany, such practices are even strictly forbidden because citizens should be able to control the use of their personal data. Our constitutional court even gave this principle a name: it is called “informational self-determination” (“informationelle Selbstbestimmung”) and it became a basic right in 1983, only a few weeks before Orwell’s year began …
Sure, that’s why we’re hashing that stuff before uploading!
If you don’t want to upload the address book in plain text, you could upload the hashes of its entries. Instead of the message “I know someone whose phone number is +1234567890”, only the message “I know someone whose phone number has the hash 12039d6dd9a7e27622301e935b6eefc78846802e”. The server can then compare this to the phone number hashes of its users to find out whether it knows this person.
Hashing contact information before uploading is a good start. When using a cryptographic hash function like SHA-1 or something from the SHA-2 family, you are working with a solid tool. But all of these hash functions are designed to be fast and you can bruteforce them similarly to password hashes.
Bruteforcing email addresses
The easiest way to bruteforce large amounts of email hashes is to use the fact that a lot of people use one of a small set of email providers. Having a fixed email domain (e.g. “@gmail.com”) allows you to concentrate on the username. Most people use short usernames, let’s say less than 15 characters. Popular are [a-z0-9.-], i.e. 38 different chars per position.
That’s 38^15 ≈ 5*10^23 ≈ 79 bits.
For shorter usernames (10 chars), we have 38^10 ≈ 6*10^15 ≈ 52 bits.
When we consider that email addresses are not random strings but usually include real names or dictionary words, the real amount of information contained in the address is much less, which makes bruteforcing fast. Let’s have a wild guess: If one char of the usual email address contains 3 bits of information, we have only around 30 bits for the whole username. With current bruteforcing tools such as Hashcat, a hash of such an email address can be cracked in seconds.
Bruteforcing phone numbers
Assume you know that the majority of your users is from a specific country, let’s say Germany.
Their mobile phone number will start with a +49 followed by one of the 17 possible mobile carrier codes and a 7-8 digit number.
That’s 17*10^8 ≈ 2^31, i.e. less that 2 billion possibilities. Again, this can be cracked in seconds.
Additionaly, one can store the results of the bruteforce attack in a rainbow table, making it easy to reconstruct a large amount of phone numbers from their hashes all at once.
Pure contact information hashing is useless in case of phone numbers and weak in case of email addresses.
What can be done?
You can use the power of combinations. Assume there are 2^52 email addresses and 2^31 phone numbers, then there are 2^83 email-phone pairs, which is much harder to bruteforce.
We’ll suggest a way of more secure user discovery, where your user needs to know
email and phone number to find a peson from his address book on your service.
Ask your user: Help your contacts to find you on [name of service]. Only those who already know your email and phone number will find you. Please enter: email, phone.
Then you generate the discovery hash for your user like that:
discovery_hash = hash(user.email + user.phone) where
+ is meant to be the string concatenation and
hash a cryptographic hash function.
On the client, your user generates the same hashes for each contact in his address book
contact_hash = hash(contact.email + contact.phone).
Note: There might be several hashes per contact if people have multiple email
addresses and phone numbers (one for every combination). But the number of hashes will still be low (8 hashes if there are 4 emails and 2 phone numbers in one contact).
hash cryptographically secure, slow and and it’s output big since attackers suffer from that much more than honest users do. The hashes from the SHA family are generally regarded as secure, but they are very fast. The recommended way to make a hash function slower is by iteration,
as discussed for example in PKCS #5.