Become a geek #1: How do FlowingMail peers find other peers?

This is the first article in a series of technical explanations that will guide you through the functionality of FlowingMail.

FlowingMail is a serverless messaging system. Since there is no server that routes messages to their destinations, how does FlowingMail deliver messages to the correct recipients?

The answer: a DHT, or Distributed Hash Table.

We can think of a DHT as a kind of database in which each record has two columns: an ID and a value associated with that ID.
For instance, the following list could represent the content of a DHT:
– ID=“ABC”, VALUE=“ZDUSI”
– ID=“ABD”, VALUE=“IDISOAPDIS”
– ID=“ACD”, VALUE=“IDJSN”
– ID=“EFH”, VALUE=“UDSIIOP”

The main property of a DHT is that it is distributed: each record is stored on the nodes (FlowingMail clients) whose IDs are similar to the record’s ID.
For instance, if 4 nodes participate in the DHT and their IDs are “ABC”, “ABE”, “CAB” and “EGG”, then the first two nodes are probably good candidates to store the values for the IDs “ABC”, “ABD” and “ACD”, while the fourth node will store the value for the ID “EFH”.
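To make “similar IDs” concrete, the following Python sketch assigns the four example records to the four example nodes. It assumes the XOR distance used by Kademlia and, purely for illustration, treats each letter of an ID as one byte (real Kademlia IDs are 160-bit numbers). The names `xor_distance`, `closest_nodes` and the parameter `k` are hypothetical, not FlowingMail code.

```python
def xor_distance(a: str, b: str) -> int:
    """Kademlia-style distance: XOR the two IDs and read the result as an integer.

    Assumption: IDs are equal-length ASCII strings, one byte per letter.
    """
    return int.from_bytes(a.encode(), "big") ^ int.from_bytes(b.encode(), "big")


def closest_nodes(key: str, nodes: list[str], k: int) -> list[str]:
    """Return the k node IDs closest to key, nearest first."""
    return sorted(nodes, key=lambda node: xor_distance(key, node))[:k]


NODES = ["ABC", "ABE", "CAB", "EGG"]
RECORDS = {"ABC": "ZDUSI", "ABD": "IDISOAPDIS", "ACD": "IDJSN", "EFH": "UDSIIOP"}

# Each record goes to the k nodes closest to its ID (k=1 here for brevity).
placement = {key: closest_nodes(key, NODES, k=1) for key in RECORDS}
print(placement)  # {'ABC': ['ABC'], 'ABD': ['ABE'], 'ACD': ['ABE'], 'EFH': ['EGG']}
```

With k=1 this reproduces the example: “ABC”, “ABD” and “ACD” land on the first two nodes and “EFH” lands on “EGG”. Kademlia stores each value on the k closest nodes rather than just one, so the value survives when some of them go offline.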

The first time a node starts, it needs to know the internet address of just one other node that is already online. To make this easier, each client ships with a list of FlowingMail nodes that we will set up and keep permanently online, but each user can decide to supply their own list of known nodes.
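The bootstrap step can be sketched as follows. The seed addresses and port are hypothetical placeholders taken from the reserved documentation address ranges, `try_connect` stands in for whatever network call the client actually makes, and we assume that a user-supplied list replaces the built-in one.

```python
# Hypothetical built-in seed list: the real FlowingMail seed nodes are not
# listed in this article, so these are reserved documentation addresses.
DEFAULT_SEEDS = [("192.0.2.10", 9000), ("192.0.2.11", 9000), ("198.51.100.7", 9000)]


def bootstrap_contacts(user_seeds=None):
    """Return the addresses to contact on first start.

    Assumption: if the user supplies a non-empty list, it replaces the default.
    """
    if user_seeds:
        return list(user_seeds)
    return list(DEFAULT_SEEDS)


def first_reachable(seeds, try_connect):
    """Try each seed in turn; a single online node is enough to join."""
    for address in seeds:
        if try_connect(address):
            return address
    return None


# Simulated run in which only the second seed answers.
print(first_reachable(bootstrap_contacts(), lambda addr: addr[0] == "192.0.2.11"))
```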

Once the node has connected to the known nodes, it asks them for the list of nodes they know, together with their internet addresses, and keeps the entries for the nodes whose IDs are similar to its own. It repeats the query with each new node it learns about until its list is big enough.
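A minimal sketch of this discovery loop, simulated in memory: `ask` is a hypothetical stand-in for the network query, contacts are plain IDs (addresses are left out), closeness uses the Kademlia XOR metric on the letter IDs of the earlier example, and `max_contacts` is an assumed size for “a list big enough”. The node always queries the closest contact it has not asked yet, and stops once every contact it keeps has been queried.

```python
def xor_distance(a: str, b: str) -> int:
    """Kademlia-style XOR distance between equal-length ASCII IDs (illustrative)."""
    return int.from_bytes(a.encode(), "big") ^ int.from_bytes(b.encode(), "big")


def discover(own_id, seeds, ask, max_contacts):
    """Fill a contact list with the nodes whose IDs are closest to own_id.

    ask(node_id) is a hypothetical stand-in for a network request that returns
    the IDs of the nodes that node knows.
    """
    def dist(node_id):
        return xor_distance(own_id, node_id)

    contacts = sorted(set(seeds) - {own_id}, key=dist)[:max_contacts]
    queried = set()
    while True:
        pending = [n for n in contacts if n not in queried]
        if not pending:
            return contacts          # every kept contact has been asked: done
        node = pending[0]            # nearest contact not queried yet
        queried.add(node)
        candidates = set(contacts) | set(ask(node))
        candidates.discard(own_id)
        # Keep only the max_contacts nodes closest to our own ID.
        contacts = sorted(candidates, key=dist)[:max_contacts]


# Toy network: node ID -> IDs of the nodes it knows.
NETWORK = {
    "EGG": ["CAB", "EFH"],
    "CAB": ["ABC", "EGG"],
    "EFH": ["EGG", "ABE"],
    "ABC": ["ABE", "ACD", "CAB"],
    "ABE": ["ABC", "ABF"],
    "ACD": ["ABC"],
    "ABF": ["ABE"],
}

# A new node "ABD" that only knows the seed "EGG".
print(discover("ABD", ["EGG"], lambda n: NETWORK.get(n, []), max_contacts=3))
# ['ABE', 'ABF', 'ABC']
```

Starting from a single distant seed, the node walks towards its own region of the ID space and ends up with its three closest neighbours.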

The next time the node starts, it tries to reconnect to all the nodes it knew in the previous session: some of them will be offline or will have a different address, but this doesn’t matter, because it needs just one of them to connect successfully to the FlowingMail network.
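Remembering contacts between sessions could look like the sketch below. The JSON file format, the function names and the `try_connect(host, port)` callback are assumptions made for illustration; the point is that stale entries are simply skipped and one reachable contact is enough.

```python
import json
from pathlib import Path


def save_contacts(path: Path, contacts: dict) -> None:
    """Persist node ID -> [host, port] so the next session can reuse it."""
    path.write_text(json.dumps(contacts))


def load_contacts(path: Path) -> dict:
    """Load the contacts saved by the previous session (empty on first run)."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def reconnect(contacts: dict, try_connect) -> list:
    """Try every saved contact and return the IDs of those that answered.

    Offline or moved nodes just fail; rejoining needs only one success.
    """
    return [node_id for node_id, (host, port) in contacts.items()
            if try_connect(host, port)]
```

A node would call `save_contacts` on shutdown and `reconnect(load_contacts(path), ...)` on startup; if the returned list is empty, it could fall back to the bootstrap list (an assumption, not something the article specifies).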

When a node needs the internet address of another node, it first looks in the list of nodes it already knows, then queries the known nodes whose IDs are closest to the one it wants: each queried node replies with the nodes it knows whose IDs are close to the requested one. The process repeats until the requested node is found or, if it is offline, until the nodes closest to it have been found.
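The lookup can be sketched as a simplified, sequential version of Kademlia’s iterative node lookup. `ask(node, target)` is a hypothetical stand-in for the network request: it returns the contacts a node knows near `target`, or `None` when the node is offline. Real Kademlia sends several queries in parallel; this sketch asks one node at a time, and `k=3` is an arbitrary small value.

```python
def xor_distance(a: str, b: str) -> int:
    """Kademlia-style XOR distance between equal-length ASCII IDs (illustrative)."""
    return int.from_bytes(a.encode(), "big") ^ int.from_bytes(b.encode(), "big")


def find_node(target, my_contacts, ask, k=3):
    """Return (True, [target]) if found, else (False, k closest live nodes seen)."""
    if target in my_contacts:            # step 1: already known locally
        return True, [target]

    def dist(node_id):
        return xor_distance(target, node_id)

    seen, queried, offline = set(my_contacts), set(), set()
    while True:
        # Shortlist: the k closest nodes seen so far that have not failed.
        shortlist = sorted(seen - offline, key=dist)[:k]
        pending = [n for n in shortlist if n not in queried]
        if not pending:
            return False, shortlist      # no closer nodes left to ask
        node = pending[0]
        queried.add(node)
        reply = ask(node, target)
        if reply is None:                # node did not answer
            offline.add(node)
            continue
        if target in reply:
            return True, [target]
        seen.update(reply)


NETWORK = {
    "EGG": ["CAB", "EFH"], "CAB": ["ABC", "EGG"], "EFH": ["EGG"],
    "ABC": ["ABE", "ACD"], "ABE": ["ABC", "ABF"], "ACD": ["ABC"], "ABF": ["ABE"],
}
OFFLINE = set()


def ask(node, target):
    # Simulated request: a real node would return only its contacts near target.
    return None if node in OFFLINE else NETWORK.get(node, [])


print(find_node("ABF", ["EGG"], ask))  # (True, ['ABF'])
print(find_node("ABD", ["EGG"], ask))  # (False, ['ABE', 'ABF', 'ABC'])
```

In the second call “ABD” is not in the network, so the lookup ends with the nodes closest to it, which are exactly the nodes that should hold its mail.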

When the sender has found the nodes whose IDs are closest to the recipient’s, it sends the mail to all of them: if the recipient is online, it receives the mail immediately; otherwise, once it comes back online, it queries the nodes with a similar ID to retrieve the mail addressed to it.
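The delivery step can be simulated with a toy in-memory model. The `Node` class, `send_mail` and `fetch_mail` are hypothetical, and we assume each mail is unique (for example, it carries a message ID), so the recipient can drop the duplicate copies it collects from several neighbours.

```python
from collections import defaultdict


class Node:
    """Simulated peer that can also hold mail for nearby IDs (hypothetical model)."""

    def __init__(self, node_id, online=True):
        self.node_id = node_id
        self.online = online
        self.inbox = []                  # mail addressed to this node
        self.held = defaultdict(list)    # recipient ID -> mail stored for it

    def receive(self, recipient, mail):
        if recipient == self.node_id:
            self.inbox.append(mail)
        else:
            self.held[recipient].append(mail)

    def hand_over(self, recipient):
        """Give back all mail held for recipient and forget it."""
        return self.held.pop(recipient, [])


def send_mail(recipient, mail, closest):
    """Send the mail to every online node among those closest to the recipient."""
    for node in closest:
        if node.online:
            node.receive(recipient, mail)


def fetch_mail(me, neighbours):
    """Run by a recipient coming back online: collect mail held by nearby nodes."""
    for node in neighbours:
        if node.online:
            for mail in node.hand_over(me.node_id):
                if mail not in me.inbox:     # skip copies from other neighbours
                    me.inbox.append(mail)


abd, abe, abc = Node("ABD", online=False), Node("ABE"), Node("ABC")
send_mail("ABD", "hello", [abd, abe, abc])   # recipient offline: neighbours keep copies
abd.online = True
fetch_mail(abd, [abe, abc])
print(abd.inbox)  # ['hello']
```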

* This explanation is simplified and is only meant to give an overview of how a DHT works.
It doesn’t cover encryption, authentication, ID generation and so on.
For more information about DHTs, see the Kademlia white paper: FlowingMail uses a variant of Kademlia for its DHT.

Next article: the role of the DHT in FlowingMail.