Because the DHT is public and anyone can join it, it may be attacked by malicious nodes that don’t behave as they should.
As somebody pointed out, there are already papers that explain in great detail how to attack a DHT and poison it, but we think that our modified Kademlia DHT would be able to prevent some of those attacks.
To perform an attack, the attacker has to create a number of malicious nodes and try to get them into the routing tables of the honest DHT nodes.
Once the malicious nodes are in, they can return information only about other malicious nodes, return values for keys that they don’t really store, and sniff all the traffic to try to understand the relationships between the nodes.
Defenses against fake IDs, or clusters of malicious nodes
To defeat a cluster of malicious nodes that try to insert themselves into other nodes’ routing tables, our DHT solution does the following:
- accepts only a certain percentage of nodes from IPs close to each other
The nodes in the routing table should be spread across different IPs, especially the nodes whose IDs are close to the ID of the node that owns the routing table.
- identifies the nodes
When two nodes connect to each other they exchange their signature public keys: since their IDs are derived from the public keys, they cannot forge their IDs.
All the subsequent messages are signed by the sending node.
- temporarily blacklists nodes that return wrong results (e.g. nodes with an ID that doesn’t match the public key, wrong hashes, etc.) as well as the nodes that returned the malicious node IDs.
To successfully destabilize a node’s routing table, the attacker needs to control several IPs that are not close to each other: the attack is still possible, but it requires a large amount of resources.
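The admission rules above can be sketched roughly as follows. This is only an illustration, not the FlowingMail implementation: the class name `RoutingTableGuard`, the fixed per-subnet cap (used here in place of the percentage mentioned above), the /24 prefix as the meaning of "close" IPs and the ban duration are all assumptions.

```python
import ipaddress
import time


class RoutingTableGuard:
    """Hypothetical helper deciding whether a node may enter the routing table."""

    def __init__(self, max_per_subnet=2, prefix_len=24, ban_seconds=600.0):
        self.max_per_subnet = max_per_subnet  # assumed cap, stands in for the percentage
        self.prefix_len = prefix_len          # assumed definition of "close" IPs
        self.ban_seconds = ban_seconds        # assumed length of the temporary ban
        self.nodes = {}                       # node_id -> subnet of its IP
        self.banned_until = {}                # node_id -> timestamp when the ban ends

    def _subnet(self, ip):
        # strict=False lets us pass a host address and get its enclosing network.
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def blacklist(self, node_id, now=None):
        """Temporarily ban a node and drop it from the routing table."""
        now = time.time() if now is None else now
        self.banned_until[node_id] = now + self.ban_seconds
        self.nodes.pop(node_id, None)

    def try_add(self, node_id, ip, now=None):
        """Return True if the node was accepted, False if it was rejected."""
        now = time.time() if now is None else now
        if self.banned_until.get(node_id, 0.0) > now:
            return False  # still blacklisted
        subnet = self._subnet(ip)
        neighbours = sum(1 for s in self.nodes.values() if s == subnet)
        if node_id not in self.nodes and neighbours >= self.max_per_subnet:
            return False  # too many nodes from IPs close to each other
        self.nodes[node_id] = subnet
        return True
```

A caller would invoke `blacklist` both for the node that misbehaved and for the node that recommended it, and `try_add` whenever a new contact shows up in a lookup.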
Defenses against fake return values
There is really no point in faking a return value.
In FlowingMail the DHT is used to:
- find other nodes
- store the list of available emails
- store the public keys of the nodes (encryption and signature keys)
- store the list of nodes that have a list of blocks composing a mail
- store the list of nodes that have a block of a mail
The first point (find other nodes) has been analyzed before (fake IDs, cluster of malicious nodes) and it may be an issue, but the other points are pretty much covered.
The list of available mails is retrieved from several nodes, not just from the first one that returns a valid result.
This is done because the list of available mails is not composed of a single value, but of a sequence of ADD operations, each of which adds a new mail hash to the ones that already exist.
Returning a list of fake mail hashes will have no impact, because a second query for the list of nodes with the fake mail’s list of blocks will return no values, unless the attacker also controls the nodes with IDs similar to the fake mail hash.
Even if a list of blocks for a fake mail is returned, the attacker must also control the nodes with IDs similar to each block in the mail, and at that point it is just easier to send an email using an honest FlowingMail node (an email which, by the way, must be signed by the sender).
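As a rough sketch of why fake mail hashes are harmless, the lookup can be thought of as a union of the answers from several nodes followed by a second query per hash. The function names and the `find_block_list_holders` callback are hypothetical and only stand in for the real DHT queries.

```python
def merge_mail_lists(responses):
    """Union the mail hashes returned by several nodes (each answer is a set of ADDs)."""
    merged = set()
    for hashes in responses:
        merged.update(hashes)
    return merged


def filter_retrievable(mail_hashes, find_block_list_holders):
    """Keep only the hashes for which the second query returns at least one node.

    find_block_list_holders is an assumed callback: given a mail hash, it
    returns the nodes that claim to store that mail's list of blocks.
    """
    return {h for h in mail_hashes if find_block_list_holders(h)}


# Example: one node injects a hash named "fake" that nobody can back up.
responses = [{"m1", "m2"}, {"m2", "fake"}, {"m3"}]
holders = {"m1": ["n1"], "m2": ["n2", "n3"], "m3": ["n4"]}
available = filter_retrievable(
    merge_mail_lists(responses), lambda h: holders.get(h, [])
)
```

In this example `available` contains `m1`, `m2` and `m3`; the injected hash is dropped because the second query returns nothing for it.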
A fake signature public key will be detected immediately, since its hash will not match the ID of the owner of that key.
A fake encryption public key will be detected because its digital signature (made with the owner’s private signature key) will fail verification against the owner’s signature public key.
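The two key checks could look roughly like this. It is a sketch under assumptions: the text does not say which hash derives the ID, so SHA-256 over the raw key bytes is assumed here, and the signature scheme is left to a caller-supplied `verify_signature` function because it is not specified either.

```python
import hashlib
import hmac


def node_id_from_key(signature_public_key: bytes) -> bytes:
    # Assumption: the node ID is the SHA-256 digest of the signature public key.
    return hashlib.sha256(signature_public_key).digest()


def keys_are_genuine(node_id, signature_public_key, encryption_public_key,
                     encryption_key_signature, verify_signature):
    """Check both keys published for a node.

    verify_signature(public_key, message, signature) -> bool is supplied by
    the caller and wraps whatever signature algorithm the network uses.
    """
    # 1. The signature public key must hash to the owner's ID.
    if not hmac.compare_digest(node_id_from_key(signature_public_key), node_id):
        return False
    # 2. The encryption public key must be signed with the owner's signature key.
    return bool(verify_signature(signature_public_key,
                                 encryption_public_key,
                                 encryption_key_signature))
```

A node that returns keys failing either check would then be a candidate for the temporary blacklist.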
Nodes that return fake values are also blacklisted and are ignored when they pop up in lookups or try to get back into the routing table.
At the end of the day, attacks that we didn’t predict (and maybe some that we did predict) will be possible, but the aim of this project is to study and fortify the protocol until it is ready for general usage.