Saturday, April 15, 2006

Distributed Computing can do wonders

I have always been attracted to Distributed Computing.... The concept is so simple, yet so powerful at the same time. It is Distributed Computing that made the cheapest supercomputer in the world possible, and many supercomputers in existence today harness its power; they are termed Beowulf clusters.
It's fascinating how many independent entities work together to achieve a goal, just as any living being does. Take KAZAA, a peer-to-peer download utility, as an example. It is simple and remarkably effective:
1) It searches for the file across all peers (KAZAA users).
2) It downloads a file in parallel from any number of sites (the set of sites keeps changing as users go offline).
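
In code, those two operations might look like the interface sketch below. This is purely illustrative Python; Peer, search_peers, and parallel_download are hypothetical names I am inventing for the sketch, not KAZAA's actual API.

from dataclasses import dataclass

@dataclass
class Peer:
    address: str            # where to reach this peer
    speed_bps: int          # advertised transfer speed, in bytes/second
    allows_download: bool   # whether this peer serves files at all

def search_peers(filename: str) -> list[Peer]:
    """Ask the network which peers hold a file by this name."""
    ...  # the actual network query is omitted in this sketch

def parallel_download(filename: str, peers: list[Peer]) -> bytes:
    """Pull disjoint chunks of the file from several peers at once."""
    ...  # fleshed out after the algorithm below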

Search is easy to visualize, but what makes the file download process so robust and flexible? Here is how I think it must work.....

Algorithm: Download File
Input: A file on a site, of say 500 MB (just to indicate that the download is going to take a long time)

1) Ask the peer where the file was selected to first send a checksum of the file (an MD5 hash, I'd bet).
2) Once you have the MD5 checksum, search the available sites for a file matching this checksum.
3) Suppose 3 sites are found.
4) With each site found come some settings: whether downloads from that site are allowed, its download limit, the number of concurrent downloads it allows, and the speed at which it can send data.
5) Based on all the above factors, compute the chunk of bytes to download from each site within a brief time frame. The logic: if a site's download speed is too slow, fetch only a small chunk from it, so that as soon as faster sites turn up we can download the remaining bytes from them; otherwise the download would take ages.
6) Start 3 threads, one thread per site.
7) The MD5 hash ensures we are downloading portions of the same file, but it is also important to make sure we download mutually exclusive chunks from the sites, and that we don't miss even a single byte.
8) Mutually exclusive chunks are easy to ensure: maintain a common current offset from which a site will start downloading its assigned chunk of bytes, and advance the offset as soon as the chunk is claimed.
9) To ensure we haven't missed a single byte, maintain a log for each thread recording how many bytes it successfully downloaded out of its assigned chunk.
10) Keep checking periodically for the availability of more sites, and when one turns up, start a new thread to download from it as well.
11) Finally, when the threads terminate, run MD5 on the downloaded file to cross-check that we got what we were expecting. (A rough sketch of this whole flow in code follows below.)
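
To make that concrete, here is a minimal Python sketch of steps 5 through 11. The read_chunk callable is a hypothetical stand-in for the actual network transfer, and step 10 (discovering new sites mid-download) is left out for brevity; the shared-offset bookkeeping and the final MD5 cross-check are the real point.

import hashlib
import threading

CHUNK_SECONDS = 5   # step 5: size each chunk to a brief time frame

def download_file(peers, file_size, expected_md5, read_chunk):
    """Parallel download of one file from several peers (steps 5-11).

    peers      -- objects with address and speed_bps attributes (hypothetical)
    read_chunk -- callable (peer, offset, length) -> bytes; a stand-in for
                  the real transfer, assumed to return exactly `length`
                  bytes (a real client would retry short reads)
    """
    buf = bytearray(file_size)
    next_offset = 0           # the common current offset of step 8
    lock = threading.Lock()
    log = {}                  # bytes completed per thread, for step 9

    def worker(peer):
        nonlocal next_offset
        done = 0
        while True:
            # Step 5: slow peers claim small chunks, so the remaining
            # bytes stay available for faster peers instead of being
            # tied up behind a slow transfer.
            length = max(1, peer.speed_bps * CHUNK_SECONDS)
            with lock:
                if next_offset >= file_size:
                    break
                offset = next_offset
                length = min(length, file_size - offset)
                next_offset += length   # claiming is mutually exclusive (step 8)
            data = read_chunk(peer, offset, length)
            buf[offset:offset + len(data)] = data
            done += len(data)
        log[peer.address] = done

    # Step 6: one thread per site.
    threads = [threading.Thread(target=worker, args=(p,)) for p in peers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Step 11: cross-check the assembled file against the expected hash.
    if hashlib.md5(bytes(buf)).hexdigest() != expected_md5:
        raise IOError("MD5 mismatch: download corrupted or incomplete")
    return bytes(buf), log

The lock makes claiming a chunk atomic, which is exactly the common-current-offset idea of step 8, and the per-thread byte counts in log give the accounting of step 9.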

So simple and so powerful... ain't it?
