Defending the DHT

Because the DHT is public and anyone can join it, it may become the target of attacks from malicious nodes that don’t behave as they should. As somebody pointed out, there are already papers that explain in great detail how to attack and poison a DHT, but we think that our modified Kademlia DHT can prevent some of those attacks.

To perform an attack, the attacker has to create a number of malicious nodes and try to insert them into the routing tables of the honest DHT nodes. Once the malicious nodes are in, they can return information only about other malicious nodes, return values for keys that they don’t really store, sniff all the traffic, and try to work out the relationships between the nodes.

Defenses against fake IDs, or clusters of malicious nodes

In order to defeat a cluster of malicious nodes that try to insert themselves into other nodes’ routing tables, our DHT solution does the following:

– Accepts only a certain percentage of nodes from IPs close to each other. The IDs of the nodes in the routing table should be distributed between different IPs, especially for the nodes whose ID is close to the ID of the node that owns the routing table.
– Identifies the nodes. When two nodes connect to each other they exchange their signature public keys: since their IDs are derived from the public keys, they cannot forge their IDs. All the subsequent messages are signed by the sending node.
– Temporarily blacklists the nodes that return wrong results (e.g. nodes whose ID doesn’t match the public key, wrong hashes, etc.) and the nodes that returned the malicious node IDs.

To successfully destabilize a node’s routing table, the attacker has to possess several IPs: the attack is still possible, but it would require a large amount of resources.

Defenses against fake return values

There is really no point in faking a return value.

In FlowingMail the DHT is used to:

– find other nodes
– store the list of available emails
– store the public keys of the nodes (encryption and signature keys)
– store the list of nodes that have a list of blocks composing a mail
– store the list of nodes that have a block of a mail

The first point (find other nodes) has been analyzed before (fake IDs, clusters of malicious nodes) and it may be an issue, but the other points are pretty much covered.

The list of available mails is retrieved from several nodes, not just from the first one that returns a valid result. This is done because the list of available mails is not composed of a single value, but of a sequence of ADD operations that each add a new mail hash to the ones that already exist.

Returning a list of fake mail hashes will have no impact, because a second query for the list of nodes holding the fake mail’s list of blocks will return no values, unless the attacker also controls the nodes with IDs similar to the fake mail hash. Even if a list of blocks for a fake mail is returned, the attacker must control the nodes with IDs similar to each block in the mail, and at that point it is just easier to send an email using an honest FlowingMail node (an email which, by the way, must be signed by the sender).

A fake signature public key will be detected immediately, since its hash will not match the ID of the owner of that key. A fake encryption public key will be detected because its digital signature (made with the owner’s private signature key) will fail verification against the signature public key.

Nodes that return fake values are also blacklisted and not taken into consideration when they pop up in lookups or try to get back into the routing table.

At the end of the day, attacks that we didn’t predict (and maybe some that we predicted) will be possible, but the aim of this project is to study and fortify the protocol until it is ready for general...
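
The ID check and the temporary blacklist described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: the post says that a node’s ID is derived from its signature public key but does not name the hash function, so SHA-256 is assumed here, and the names `derive_node_id`, `TemporaryBlacklist` and `accept_reported_node`, as well as the 10-minute blacklist duration, are hypothetical.

```python
import hashlib
import hmac
import time


def derive_node_id(signature_public_key: bytes) -> bytes:
    # Assumption: the ID is the SHA-256 digest of the signature public key.
    return hashlib.sha256(signature_public_key).digest()


def id_matches_key(claimed_id: bytes, signature_public_key: bytes) -> bool:
    # A forged ID fails because it is not the hash of the presented key.
    expected = derive_node_id(signature_public_key)
    return hmac.compare_digest(claimed_id, expected)


class TemporaryBlacklist:
    """Remembers misbehaving node IDs for a limited time (hypothetical helper)."""

    def __init__(self, duration_seconds: float = 600.0, clock=time.monotonic):
        self._duration = duration_seconds
        self._clock = clock
        self._expiry = {}  # node ID -> time at which the ban ends

    def add(self, node_id: bytes) -> None:
        self._expiry[node_id] = self._clock() + self._duration

    def contains(self, node_id: bytes) -> bool:
        expiry = self._expiry.get(node_id)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # The ban has expired: forget the node and let it back in.
            del self._expiry[node_id]
            return False
        return True


def accept_reported_node(reporter_id: bytes, claimed_id: bytes,
                         signature_public_key: bytes,
                         blacklist: TemporaryBlacklist) -> bool:
    """Decide whether a node returned by `reporter_id` may be used."""
    if blacklist.contains(reporter_id) or blacklist.contains(claimed_id):
        return False
    if not id_matches_key(claimed_id, signature_public_key):
        # The reporter returned a node whose ID does not match its key.
        blacklist.add(reporter_id)
        return False
    return True
```

In this sketch the node that returned the bad entry is the one that gets blacklisted, so a malicious node cannot get an honest node banned simply by claiming its ID.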

Become a geek #2: The role of the DHT in FlowingMail

The DHT, or Distributed Hash Table, is a big database containing values identified by an ID. Each node in the FlowingMail network is responsible for storing the values whose ID is similar to the node’s own ID. We saw in a previous article how the nodes find other nodes with a specific ID.

Because of the DHT’s redundancy (each value is stored on several nodes), the DHT is used to store information that has to be always available. The following values are published to the DHT:

– the nodes’ public keys (identified by the owner’s ID)
– the hashes of the available emails (identified by the recipient’s ID)
– the list of nodes that have the list of blocks forming an email (identified by the mail hash)
– the list of nodes containing specific blocks of an email (identified by the block hash)

Therefore a node can always:

– ask for the public key of a specific node, even if the requested node is offline
– ask for the list of email hashes that have been sent to a specific ID (but only the recipient can decrypt the mails)
– ask for the list of nodes that are storing parts of the mails
– ask for a list of nodes that store specific blocks of an email

Once a recipient obtains the list of nodes that contain the blocks of its emails, it can start downloading the mails from them.

More information about the theory of the DHT: Kademlia white...
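
To make the four kinds of published values more concrete, here is a small in-memory sketch. It is not FlowingMail’s storage code: the names `RecordKind`, `LocalStore` and `merged_lookup` are hypothetical, values are modeled as sets that only grow, and the "network" is just a list of local stores standing in for the several nodes that hold each value.

```python
from collections import defaultdict
from enum import Enum


class RecordKind(Enum):
    PUBLIC_KEY = "public_key"                  # identified by the owner's ID
    MAIL_HASHES = "mail_hashes"                # identified by the recipient's ID
    BLOCK_LIST_HOLDERS = "block_list_holders"  # identified by the mail hash
    BLOCK_HOLDERS = "block_holders"            # identified by the block hash


class LocalStore:
    """The part of the DHT held by a single node (hypothetical model)."""

    def __init__(self):
        self._records = defaultdict(set)

    def add(self, kind: RecordKind, key: str, value: str) -> None:
        # Values are only ever added, never overwritten.
        self._records[(kind, key)].add(value)

    def get(self, kind: RecordKind, key: str) -> set:
        # Return a copy so callers cannot modify the stored set.
        return set(self._records.get((kind, key), set()))


def merged_lookup(replicas, kind: RecordKind, key: str) -> set:
    """Combine the answers of several nodes that store the same key."""
    result = set()
    for replica in replicas:
        result |= replica.get(kind, key)
    return result
```

For example, if one replica has seen only the first mail for a recipient and another has seen two, `merged_lookup(replicas, RecordKind.MAIL_HASHES, recipient_id)` returns both hashes, which is why asking several nodes is useful when a value is built up from repeated additions.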

Become a geek #1: How do FlowingMail peers find other peers?

This is the first article in a series of technical explanations that will guide you through the functionalities of FlowingMail.

FlowingMail is a serverless messaging system. Since there is no server that routes the messages to their destinations, how does FlowingMail send the messages to the correct recipients? The answer: DHT, or Distributed Hash Table.

We can imagine the DHT as a sort of database, where each record or line contains two columns: an ID and a value related to that ID. For instance, the following list could represent the content of a DHT:

– ID="ABC", VALUE="ZDUSI"
– ID="ABD", VALUE="IDISOAPDIS"
– ID="ACD", VALUE="IDJSN"
– ID="EFH", VALUE="UDSIIOP"

The main property of the DHT is that it is distributed: this means that each value of the DHT is stored on different nodes (FlowingMail clients) that have an ID similar to the ID of the record. For instance, if 4 nodes participate in the DHT, and their IDs are "ABC", "ABE", "CAB" and "EGG", then the first two nodes will probably be good candidates to store the values for the IDs "ABC", "ABD" and "ACD", while the fourth node will store the value for the ID "EFH".

The first time a node starts, it has to know the internet address of just one other node that is already online: we facilitate this by supplying each client with a list of FlowingMail nodes that we will prepare and keep always online, but each user can decide to supply their own list of known nodes.

Once the node has connected to the known nodes, it queries them to get a list of the nodes they know and their internet addresses, and it keeps the data about the nodes with an ID similar to its own. The node repeats the query for the new nodes it learns about until its list is big enough.

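
The idea of "similar IDs" in the four-node example can be made concrete with a distance function. The sketch below uses the XOR metric from Kademlia; whether FlowingMail’s variant measures closeness in exactly this way is an assumption, and the function names and the use of short ASCII strings as IDs are only for illustration.

```python
def xor_distance(id_a: str, id_b: str) -> int:
    # Interpret each ID as a big-endian integer and XOR the two values.
    # Assumption: IDs are ASCII strings of equal length, as in the example.
    a = int.from_bytes(id_a.encode("ascii"), "big")
    b = int.from_bytes(id_b.encode("ascii"), "big")
    return a ^ b


def closest_nodes(record_id: str, node_ids: list, count: int) -> list:
    # The nodes with the smallest distance are the candidates for storage.
    return sorted(node_ids, key=lambda node: xor_distance(node, record_id))[:count]


nodes = ["ABC", "ABE", "CAB", "EGG"]
candidates_for_abd = closest_nodes("ABD", nodes, 2)
candidates_for_efh = closest_nodes("EFH", nodes, 1)
```

With these four nodes, the two closest candidates for the records "ABC", "ABD" and "ACD" are "ABC" and "ABE", and the closest candidate for "EFH" is "EGG", which matches the example in the text.
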
The next time the node starts, it tries to reconnect to all the nodes it knew from the last session: some of them may be offline or have a different address, but this doesn’t matter because it needs just one of them to successfully connect to the FlowingMail network.

When the node needs to find the internet address of another node, it first tries to find it in the list of nodes it already knows, then it queries the nodes with the ID closest to the one it wants: the queried nodes reply with a list of the nodes they know with an ID close to the requested one. The operation continues until the requested node is found or, if it is offline, until the nodes closest to it have been found.

When the sender has a list of nodes with IDs close to the recipient’s one, it sends the mail to all of them: if the recipient is online it receives the mail immediately, otherwise it will later query the nodes with a similar ID to retrieve the mails intended for itself.

* This explanation is simplistic and it is meant to give an overview of how a DHT works. It doesn’t cover the encryption part, the authentication, the generation of the IDs and so on. For more information about the DHT please check the Kademlia white paper. FlowingMail uses a variant of Kademlia for the DHT.

Next article: the role of the DHT in...
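
The lookup described in this article (ask the closest known nodes, learn about closer ones, repeat) can be sketched as a small simulation. This is a simplified illustration, not FlowingMail’s implementation: IDs are small integers, closeness is the Kademlia XOR distance, the `query` callback stands in for a network request, and the parameter `k` (how many closest nodes to keep) is a hypothetical name.

```python
def iterative_lookup(target: int, start_nodes, query, k: int = 3) -> list:
    """Return the k known node IDs closest to `target`.

    `query(node_id, target)` must return the node IDs known to `node_id`;
    it may return an empty list for a node that cannot be reached.
    """
    known = set(start_nodes)
    queried = set()
    while True:
        # Keep only the k closest nodes discovered so far.
        shortlist = sorted(known, key=lambda node: node ^ target)[:k]
        pending = [node for node in shortlist if node not in queried]
        if not pending:
            # Every one of the closest nodes has been asked: nothing closer exists.
            return shortlist
        for node in pending:
            queried.add(node)
            known.update(query(node, target))


# A toy network: each node ID maps to the node IDs it knows about.
network = {1: [2, 3], 2: [1, 8], 3: [9], 8: [10, 11], 9: [], 10: [11], 11: [10]}


def ask(node_id: int, target: int) -> list:
    return network.get(node_id, [])


online_result = iterative_lookup(11, [1], ask, k=2)
offline_result = iterative_lookup(12, [1], ask, k=2)
```

Starting from node 1 alone, the search for node 11 ends with 11 itself at the head of the list, while the search for node 12, which is not part of the toy network, ends with the nodes closest to it. Those are the nodes a sender would hand the mail to when the recipient is offline.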

How secure are your usual emails?

How secure are your mails? Not very. Normally your mail travels using the SMTP protocol (Simple Mail Transfer Protocol). Note how the S in SMTP stands for Simple, not Secure. Only a few email providers are connected to each other using encrypted channels. GMail is known for supporting encrypted communication with other servers, but only if the other servers also support it (obviously).

So, what happens when you send a mail from a GMail account to a Yahoo one? Everything is fine until the mail leaves the GMail server to travel to the Yahoo server: at this point the email is on a non-encrypted channel and can be captured by anyone listening on that channel, without a warrant and without cooperation from the email providers. This happens for most of the mails sent between different email...