Understanding the BitTorrent protocol:
Ok, I'm not writing a Wikipedia page, but I wanted to shed some light on this protocol for those of you who torrent, in the sense of seeing the forest for the trees.
Many of you are familiar with client-server file distribution, where a server sends a copy of a file to every client, which places an enormous burden on the server and consumes a lot of its bandwidth. In P2P file distribution, each peer can redistribute any portion of the file it has received to any other peer, thus assisting the server.
We all know the BitTorrent protocol, originally developed by Bram Cohen. Many clients have been built on top of it.
[I'm skipping the math on the scalability of P2P architecture for the faint of heart, unless someone asks for it.]
In BitTorrent lingo, the collection of all peers participating in the distribution of a particular file is called a torrent. Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 KB.
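To make the chunk idea concrete, here is a minimal Python sketch of cutting a file into equal-size chunks. The names `CHUNK_SIZE`, `chunk_count` and `split_into_chunks` are made up for illustration and are not taken from any real client; 256 KB is just the typical size mentioned above.

```python
CHUNK_SIZE = 256 * 1024  # 256 KB, the typical chunk size; real torrents may use other sizes


def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to cover file_size bytes (rounded up)."""
    return (file_size + chunk_size - 1) // chunk_size


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Cut data into equal-size chunks; only the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```

So a file of 1,000,000 bytes ends up as 4 chunks: three full ones and a shorter last one.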
When a peer first joins a torrent, it has no chunks, but over time it accumulates more and more of them. While it downloads chunks, it also uploads chunks to other peers. Once a peer has acquired the whole file, it may selfishly leave the torrent or altruistically remain in it and keep uploading to others. A peer may also leave a torrent with only some of the chunks and rejoin later.
Each torrent has an infrastructure node called a tracker. When a peer joins a torrent, it registers with the tracker and periodically informs the tracker that it's still in the torrent. In this manner the tracker keeps track of the peers participating in the torrent, whose number might range from fewer than 10 to more than 1000.
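A rough sketch of that bookkeeping, assuming a toy in-memory tracker. The `Tracker` class, its method names and the 120-second timeout are my own inventions for illustration; a real tracker speaks a network protocol and picks its own intervals.

```python
import time


class Tracker:
    """Toy in-memory tracker: remembers when each peer last checked in."""

    def __init__(self, timeout=120.0):
        # Assumed value: seconds of silence after which a peer is considered gone
        self.timeout = timeout
        self.last_seen = {}  # peer address -> timestamp of its last check-in

    def announce(self, peer_addr, now=None):
        """Called when a peer joins and again on every periodic check-in."""
        self.last_seen[peer_addr] = time.time() if now is None else now

    def active_peers(self, now=None):
        """Drop peers that have been silent too long and return the rest."""
        now = time.time() if now is None else now
        self.last_seen = {
            peer: seen for peer, seen in self.last_seen.items()
            if now - seen <= self.timeout
        }
        return sorted(self.last_seen)
```

The `now` parameter is only there so the timing can be faked when trying it out; left alone, the wall clock is used.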
Say a torrent has somewhere between 100 and 1000 peers. When a new peer (John) joins, the tracker randomly selects a subset of those peers (say 50) and sends their IP addresses to John. With this list, John tries to establish concurrent TCP connections with all the peers on it. Over time some of these peers may leave and others may connect to him. Periodically John asks each of his neighbors which chunks they have, and uses the rarest-first technique: among the chunks he doesn't have yet, he first requests those with the fewest copies among his neighbors. This spreads the rare chunks faster and roughly evens out the number of copies of each chunk in the torrent.
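The two steps above (the tracker handing out a random subset, then rarest-first selection) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `pick_peer_subset` and `rarest_first` are hypothetical names, chunks are identified by plain integer indices, and each neighbor's holdings are given as a set.

```python
import random
from collections import Counter


def pick_peer_subset(peers, k=50):
    """What the tracker does for a newcomer: a random subset of at most k peers."""
    peers = list(peers)
    return random.sample(peers, min(k, len(peers)))


def rarest_first(my_chunks, neighbor_chunks):
    """Chunk indices we still lack, ordered from rarest to most common.

    my_chunks: set of chunk indices we already have.
    neighbor_chunks: dict mapping each neighbor to the set of chunks it has.
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)  # only count chunks we are missing
    # Fewest copies first; ties broken by chunk index to keep the order stable
    return sorted(counts, key=lambda c: (counts[c], c))
```

For example, if we hold chunk 0 and our neighbors hold {0, 1, 2}, {1, 2, 3} and {2}, chunk 3 has one copy, chunk 1 has two and chunk 2 has three, so the request order is 3, 1, 2.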
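When deciding whom to upload to, John gives priority to the peers currently supplying him data at the highest rate and picks the best 4. Every 10 seconds he recalculates the rates and, if needed, modifies this inner circle. These 4 peers are said to be unchoked. Every 30 seconds he also picks one additional peer at random and sends it chunks; this peer is said to be optimistically unchoked, and it may join the inner circle if it ends up supplying John at a higher rate than one of the 4.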
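Here is a small sketch of that selection logic, again only under my own assumptions: `choose_unchoked` and `pick_optimistic` are hypothetical helper names, and the rates are whatever numbers the client has measured for each peer (the units don't matter for the ranking). The 10-second and 30-second timers are left out; a client would simply call these functions on those schedules.

```python
import random


def choose_unchoked(download_rates, slots=4):
    """The `slots` peers currently sending us data at the highest rate.

    download_rates: dict mapping each peer to its measured rate.
    """
    ranked = sorted(download_rates, key=lambda p: download_rates[p], reverse=True)
    return ranked[:slots]


def pick_optimistic(all_peers, unchoked):
    """One random peer outside the current top set, or None if there is none."""
    candidates = [p for p in all_peers if p not in unchoked]
    return random.choice(candidates) if candidates else None
```

With rates of a=10, b=50, c=30, d=5, e=40, f=20, the unchoked set is b, e, c, f, and the optimistic pick is either a or d. If that peer later shows a better rate than f, the next recalculation puts it in the top 4.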
BitTorrent has a number of interesting mechanisms that are not discussed here, including pieces (mini-chunks), pipelining, random first selection, endgame mode, and anti-snubbing. And if anyone is interested, I might talk about Distributed Hash Tables and how trackers store and maintain their database of peers.
Tl;dr.
References: -> For anything I don't know, I crawl the web or read a book, and I only ask when I'm at my wits' end. (I guess that's no ref.)