I wrote this letter earlier this year during a debate over whether to put a collection of talks up as one large file directory on bittorrent, or have separate “torrents” for each individual talk. I favored one large file directory, and went entirely overboard explaining why. I figured it might have some relevance for others to read, and I welcome debate and corrections.
So here’s a little quick overview of bittorrent and why I prefer one big mojo torrent of a given thematic collection over a pile of tiny torrents.
When Bram Cohen introduced Bittorrent at Codecon way back when, he did the classic one-man-army maneuver: working on it in solitude and silence for a year and then dropping the fully working binary on the convention, and the world, promoting the breakthroughs he’d made but of course immediately painting himself into a corner with regard to pitfalls in his implementation.
When Cohen designed Bittorrent, as he described in that talk, his central idea was critical and the motivating factor: preventing bandwidth bottlenecks. In situations where a large number of people want something, the Bittorrent protocol and setup allow them all to help each other while downloading the file. If you go back to his 2002 presentation, here’s his description:
BitTorrent – hosting large, popular files cheaply. Started in May 2001, based on lessons learned writing Mojo Nation, I’ve been working on it full time since then. Will show installation and download on a fresh machine, then show how to host files as well. Can handle several downloads at once. Integrates seamlessly with the web – users download simply by clicking on hyperlinks. Scaling will be improved to thousands of simultaneous downloaders.
This is all well and good, and Cohen’s initial implementation stayed in use, and was quickly recognized for its benefits: shared bandwidth on ludicrously large files, like, oh, say…. movie files. With that, the protocol took off, much to Cohen’s quite public dismay.
Part of this is because, for all the impressiveness of bittorrent’s debut, it also came with a number of classic problems, like the “tit for tat” routines that judge how much each peer is downloading vs. uploading. It wasn’t hard to write clients that ignored that. People then had to rewrite things to punish those clients, and so on. But let’s set that aside.
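To give a feel for the “tit for tat” idea, here’s a toy sketch in Python. This is not Cohen’s actual choking algorithm (the real one runs on a timer and includes an “optimistic unchoke” of a random peer so newcomers get a chance); it just shows the core reciprocity rule: keep serving the peers who upload to you fastest.

```python
# Toy sketch of BitTorrent-style "tit for tat" choking (NOT the real
# algorithm): reciprocate with the peers uploading fastest to us.

def choose_unchoked(upload_rates, slots=4):
    """upload_rates: {peer_id: bytes/sec we receive from that peer}.
    Returns the set of peers we keep unchoked (allowed to download)."""
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    return set(ranked[:slots])

rates = {"alice": 120_000, "bob": 5_000, "carol": 80_000,
         "dave": 0, "erin": 40_000, "frank": 60_000}
print(choose_unchoked(rates))  # the four fastest uploaders
```

A client that uploads nothing (like “dave” here) ends up choked by everyone following this rule, which is exactly the behavior that early freeloading clients exploited and later clients had to punish.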
We know the classic Server-Client model. Server has stuff. Server makes stuff available, generally through ports. Client connects through port and gets stuff. This is how websites work, this is how FTP works, this is even how sshd and telnet work, although in those cases the “stuff” is interactive access to programs running within the machine. There are pros and cons to this model, but there’s one advantage which is not obvious, and which Bittorrent loses; I’ll get to it in a moment.
In Bittorrent’s model, there’s no server in the standard sense of the word. There’s no central “thing” holding “stuff”. Instead, there’s a “tracker” that keeps track of all the clients with the “stuff”. There is a “torrent” file generated, which is essentially a small info file, saying:
- What files are in the “collection”
- What the SHA-1 hashes of their pieces are
- What tracker is officially “associated” with this “torrent”
The “tracker” is told via these “torrent” files what it is “tracking”. The “clients” use these “torrent” files to know where the “tracker” is, and the information on the “collection”.
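The “torrent” file is just a dictionary serialized in a simple format called bencoding. Here’s a minimal, hedged sketch of a bencode decoder (it assumes well-formed input and skips all error handling), applied to a tiny hand-made single-file metainfo; the example data is invented for illustration. A real .torrent also carries a “pieces” key in the info dictionary: the concatenated 20-byte SHA-1 hashes clients use to verify each chunk.

```python
# Minimal bencode decoder sketch (assumes well-formed input) showing
# the shape of a .torrent: a dictionary with the tracker URL
# ("announce") and an "info" dictionary describing the files.

def bdecode(data, i=0):
    c = data[i:i+1]
    if c == b"i":                      # integer: i<digits>e
        e = data.index(b"e", i)
        return int(data[i+1:e]), e + 1
    if c == b"l":                      # list: l<items>e
        i, out = i + 1, []
        while data[i:i+1] != b"e":
            item, i = bdecode(data, i)
            out.append(item)
        return out, i + 1
    if c == b"d":                      # dict: d<key><value>...e
        i, out = i + 1, {}
        while data[i:i+1] != b"e":
            k, i = bdecode(data, i)
            v, i = bdecode(data, i)
            out[k] = v
        return out, i + 1
    colon = data.index(b":", i)        # string: <length>:<bytes>
    n = int(data[i:colon])
    return data[colon+1:colon+1+n], colon + 1 + n

# A tiny hand-made single-file metainfo, just for illustration:
raw = b"d8:announce22:http://tracker.example4:infod4:name8:talk.mp36:lengthi12345eee"
meta, _ = bdecode(raw)
print(meta[b"announce"])       # b'http://tracker.example'
print(meta[b"info"][b"name"])  # b'talk.mp3'
```

This is how a client learns both where the tracker lives and what the “collection” contains before it has downloaded a single byte of the actual files.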
When people are downloading, they’re “leeches”. “Leeches” both download, and upload. When a Leech has successfully downloaded 100% of the collection, they are a “seed”. It is possible to be the “seed” for a single file or set of files, if you have 100% of them, and obviously, if you have ANY piece of the files, you can share that piece you have with the other clients/leeches, all while you’re downloading.
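The leech/seed distinction above boils down to simple set logic over pieces, and a small sketch makes it concrete. The piece count and sets here are made up for illustration; a real client tracks this with per-peer bitfields rather than Python sets.

```python
# Sketch of the leech/seed distinction: a peer can share any piece it
# holds while still downloading, and becomes a "seed" at 100%.

TOTAL_PIECES = 8  # invented for the example; real torrents have thousands

def is_seed(have):
    return len(have) == TOTAL_PIECES

def pieces_to_offer(have, peer_have):
    """Pieces we hold that the other peer still needs."""
    return have - peer_have

us   = {0, 1, 2, 5}   # a leech partway through the download
them = {2, 3, 4}      # another leech
print(sorted(pieces_to_offer(us, them)))  # [0, 1, 5]
print(is_seed(us))                        # False
print(is_seed(set(range(TOTAL_PIECES))))  # True
```

Note that `us` and `them` can usefully trade even though neither is a seed; that mutual exchange between partial downloaders is the whole bandwidth-sharing trick.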
That’s the rough history and explanation. Now, my issues with it and why a larger torrent file is better than a bunch of smaller ones.
The absolutely unintended and shocking side-effect of the bittorrent model is that it has two critical points of failure: if the tracker goes down, nobody can find anyone to share the torrent with, and if there are no seeds, then it doesn’t matter if the tracker is up!
You would think that as a default, following three decades of client-server technology in use, Bittorrent would have the “tracker” function as a “seed” when there were no “seeds”. No. Not in the least. Instead, the tracker will report that there are 0 “seeds” and you’re SOL.
This first critical point of failure (the tracker going down) was mitigated a while later with the introduction of DHT/”trackerless” torrents, where clients basically ignore the “tracker” part of a .torrent file and just ask out in the world “so, anybody connected to anything I’m connected to have this file?” In that way, the problem was dealt with a bit, by at least making it theoretically possible to find other “seeds” without a specific tracker being up.
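For flavor, here’s a toy illustration of that “ask around” idea. The real mechanism is a Kademlia-style DHT with structured routing tables, not a flood; this sketch (with an invented little network) just shows the principle of discovering holders of a file through peers you already know, with no tracker involved.

```python
# Toy illustration of trackerless peer discovery (the real thing is a
# Kademlia DHT; this just floods the question through known peers).
from collections import deque

def find_holders(start, neighbours, holders):
    """neighbours: {peer: set of peers it knows}.
    holders: peers that actually have the file.
    Breadth-first flood from `start`; return every holder reached."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        peer = queue.popleft()
        if peer in holders:
            found.add(peer)
        for nxt in neighbours.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found

net = {"me": {"a", "b"}, "a": {"c"}, "b": {"c", "d"}, "c": set(), "d": set()}
print(sorted(find_holders("me", net, holders={"c", "d"})))  # ['c', 'd']
```

Which also shows the limit the next paragraph gets at: if the `holders` set is empty, no amount of clever peer discovery produces the file.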
The second, however, has never really been handled.
Without the tracker-as-seed model (which is just fine as far as many trackers are concerned, since it totally frees them from carrying pirated material), it is very easy, very simple, for files to become a popularity contest.
Files, like say, an in-theatres movie .AVI or a version of some distributed software, can fall out of favor. Once you have the new version, or a DVD rip where you just had a screener, people switch to the better files, like a fad. Or maybe the files just get old, out of date, or otherwise fall out of whatever fad state encouraged them.
What this means is that things just kind of “die”. People stop being seeds, reboot, get away from what they’re up to and look to other things. Whereas a regular file server can keep something available even if it only gets downloaded 2-3 times a year, it takes CPU power and disk space to seed things you might no longer care about, and so torrents die.
In fact, I find the average torrent barely lasts 2 months. As an example, the 2002 Codecon put up their entire mp3 collection on bittorrent, to show off the technology… and now you can’t get it! The tracker is long gone, the seeds long gone. The only place you can get them… is from me:
Where I put them up the old-fashioned way, slow and stuff, although I’m mirrored elsewhere where the connections are fast. So if someone wants the historically interesting first introduction of bittorrent, you have to go to one of my websites.
I think this problem is endemic to Bittorrent. That said, it does mean it’s really, really good for high-traffic massive-interest files. New TV shows, for example: you go out, find the new TV show, download the whole crazy thing, and you’re torrenting with thousands of others so it comes in zippity-quick, and then, after a few weeks, who cares? What ends up happening is someone makes a “new” torrent of the entire SEASON of that show. Later, when the show is cancelled, someone makes a torrent of the entire RUN of the show. So even though it doesn’t do historical stuff well (and by that, I mean a created torrent doesn’t have a long shelf life), it definitely presents the “stuff” very well for the short lifespan.
So, what you want when you present a torrent of stuff, especially a thematically-similar set of stuff, is to ensure the largest number of simultaneous users, all pulling what they want, and therefore increasing the chances of someone “seeding” multiple files, even if they later delete them or choose not to have them. Since the torrent has a relatively short lifespan, the fad of downloading a given new torrent will ensure the bandwidth savings, followed by the sad and lonely death of the file availability.
Across the spectrum of available clients, some do not have the ability to choose specific files within a torrent. But these are in such a minority that they almost rate as science projects. Even if one doesn’t have a Windows super-client like Azureus or utorrent, there’s at least one leading-edge client (bittornado) which is cross-platform across basically every known major unix, Windows, and even OSX. It functions both as a text-only command-line client and with a GUI. The advantages of a clustered group sharing of the notacon audio/video collection greatly outweigh the inability of a number of anaemic clients to handle per-file selection, especially with such a wide range of replacement clients that do things like handle DHT and optimize bandwidth usage.
By splitting the torrent into dozens of torrents, even if initially distributed by a zip, you lose several advantages. First of all, you end up using more CPU. (Even if you want to “share them all”, most clients will not do this, instead swapping between a random selection so as not to pin the machine sharing them all.) Second, you can’t just drop the one large .torrent on thepiratebay, mininova, legaltorrents and so on with one large, explanatory description (and additional things like a copy of the program, or logos, or bonus music); instead you end up flooding the search engines with a bunch of ad-hoc singular files, even to the point of going “here’s the video of the talk, here’s the audio of the talk”. If the point of this is to distribute the talks among a small insular group who already know what they want and where they want it, it probably could have just been handled with some per-person DVDs, or pointing them to the DVD clearinghouse.
Anyway, there’s the thinking behind what I said. It’s long and tedious and technically weird, but I do believe in it. So there we go.