ASCII by Jason Scott

Jason Scott's Weblog

Bittorrent: Solitary. Fashionable. Ethereal.

I wrote this letter earlier this year regarding a debate over whether to put a collection of talks up as one large file directory on bittorrent, or have separate “torrents” for each individual talk. I favored one large file directory, and went entirely overboard explaining why. I figured it might have some relevance for others to read, and I welcome debate and corrections.


So here’s a little quick overview of bittorrent and why I prefer one big mojo torrent of a given thematic collection over a pile of tiny torrents.

When Bram Cohen introduced Bittorrent at Codecon way back when, he did the classic one-man-army maneuver of working on it in solitude and silence for a year and then dropping the fully working binary on the convention, and the world, promoting the breakthroughs he’d made but of course immediately painting himself into a corner with regards to pitfalls in his implementation.

When Cohen designed Bittorrent (going by his speech), his central idea, and the motivating factor, was this: prevent bandwidth bottlenecks. In situations where a large number of people want something, the Bittorrent protocol and setup allows them all to help each other while also downloading a file. If you go back to his 2002 presentation, here’s his description:

BitTorrent – hosting large, popular files cheaply. Started in May 2001, based on lessons learned writing Mojo Nation, I’ve been working on it full time since then. Will show installation and download on a fresh machine, then show how to host files as well. Can handle several downloads at once. Integrates seamlessly with the web – users download simply by clicking on hyperlinks. Scaling will be improved to thousands of simultaneous downloaders.

This is all well and good, and Cohen’s initial implementation stayed in use, and was quickly recognized for its benefits: shared bandwidth on ludicrously large files, like, oh, say…. movie files. With that, the protocol took off, much to Cohen’s quite public dismay.

Part of this is because for all the impressiveness of bittorrent’s debut, it also has a number of the classic problems, like “tit for tat” routines that judge how much each person is downloading vs. uploading. It wasn’t hard to write stuff to ignore that. People had to then rewrite stuff to punish that, etc. But let’s set that aside.

We know the classic Server-Client model. Server has stuff. Server makes stuff available, generally through ports. Client connects through port and gets stuff. This is how websites work, this is how FTP works, this is even how sshd and telnet work, although in those cases the “stuff” is interactive access to programs running within the machine. There’s pros and cons to this model, but there’s one which is not obvious, which Bittorrent loses, which I’ll get to in a moment.

In Bittorrent’s model, there’s no server in the standard sense of the word. There’s no central “thing” holding “stuff”. Instead, there’s a “tracker” that keeps track of all the clients with the “stuff”. There is a “torrent” file generated, which is kind of a bit info-file, saying:

  • What files are in the “collection”
  • What the SHA-1 hashes of their fixed-size pieces are
  • What tracker is officially “associated” with this “torrent”

The “tracker” is told via these “torrent” files what it is “tracking”. The “clients” use these “torrent” files to know where the “tracker” is, and the information on the “collection”.
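
To make that info-file a little more concrete, here is a minimal sketch in Python of what a multi-file .torrent carries. A real .torrent is a bencoded dictionary, and the hashes inside are SHA-1 digests of fixed-size pieces of the concatenated files rather than one hash per file. The helper names (bencode, make_metainfo), the tiny 16-byte piece length and the tracker.example URL are all made up for illustration; real torrents use pieces of hundreds of kilobytes and read the data off disk.

```python
import hashlib


def bencode(value):
    """Encode ints, strings, bytes, lists and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def make_metainfo(name, files, announce, piece_length=16):
    """Build the metainfo dict for a collection.

    files is a list of (path, data) pairs held in memory, which is an
    assumption made to keep the sketch short.
    """
    payload = b"".join(data for _, data in files)
    # One 20-byte SHA-1 digest per piece, all concatenated into one string.
    pieces = b"".join(
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    )
    return {
        "announce": announce,  # the tracker associated with this torrent
        "info": {
            "name": name,
            "piece length": piece_length,
            "pieces": pieces,
            "files": [
                {"length": len(data), "path": path.split("/")}
                for path, data in files
            ],
        },
    }
```

Passing the result of make_metainfo("talks", [("a.mp3", data_a), ("b.mp3", data_b)], "http://tracker.example/announce") to bencode() gives the bytes you would write out as the .torrent file.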

When people are downloading, they’re “leeches”. “Leeches” both download, and upload. When a Leech has successfully downloaded 100% of the collection, they are a “seed”. It is possible to be the “seed” for a single file or set of files, if you have 100% of them, and obviously, if you have ANY piece of the files, you can share that piece you have with the other clients/leeches, all while you’re downloading.
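
As a rough illustration of the seed/leech distinction, each client can be modelled as the set of piece numbers it currently holds. This is a toy model, not how any particular client stores its state, and the function names role and can_offer are hypothetical.

```python
def role(have, total_pieces):
    """Return 'seed' if the peer holds every piece, otherwise 'leech'."""
    return "seed" if len(have) == total_pieces else "leech"


def can_offer(uploader_have, downloader_have):
    """Pieces the uploader holds that the downloader still lacks."""
    return sorted(uploader_have - downloader_have)
```

With three pieces in the collection, a peer holding {0, 1, 2} is a seed and a peer holding {0} is a leech, but two leeches holding {0} and {1} can still trade: each has something the other lacks.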


That’s the rough history and explanation. Now, my issues with it and why a larger torrent file is better than a bunch of smaller ones.

The absolutely unintended and shocking side-effect of the bittorrent model is that it has two critical points of failure: if the tracker goes down nobody can share the torrent, and if there are no seeds then it doesn’t matter if the tracker is up!

You would think that as a default, following three decades of client-server technology in use, Bittorrent would have the “tracker” function as a “seed” when there were no “seeds”. No. Not in the least. Instead, the tracker will report that there are 0 “seeds” and you’re SOL.
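
A small sketch of what “SOL” means in practice, using the same toy model of peers as sets of piece numbers. Strictly speaking, what kills a download is a piece that nobody connected holds; a swarm of leeches can still finish between them if every piece is out there somewhere, but once the last seed leaves that tends to stop being true. The function name missing_pieces and the peer names are hypothetical.

```python
def missing_pieces(peers, total_pieces):
    """Return the piece indexes that no connected peer holds.

    peers maps a peer name to the set of piece indexes it has.
    """
    available = set().union(*peers.values())
    return [i for i in range(total_pieces) if i not in available]
```

If this returns anything other than an empty list, nobody in the swarm can reach 100%, no matter how healthy the tracker is.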

This first critical point of failure (tracker goes down) was mitigated a little while later by the introduction of DHT/“trackerless” torrents, where basically clients ignore the “tracker” part of a .torrent file and just ask out in the world “so, anybody connected to anything I’m connected to have this file?” In that way, the problem was dealt with a bit by at least making it theoretically possible to find other “seeds” without a specific tracker being up.

The second, however, has never really been handled.

Without the tracker-as-seed model (which is just fine as far as many trackers are concerned, since it totally frees them from carrying pirated material), it is very easy, very simple, for files to become a popularity contest.

Files, like say, an in-the-theatres movie’s .AVI or a version of some distributed software, can fall out of favor. Once you have the new version, or a DVD rip where you just had a screener, people switch to the better files, like a fad. Or maybe the files just get old, out of date, or otherwise fall out of whatever fad state encouraged them.

What this means is that things just kind of “die”. People stop being seeds, reboot, get away from what they’re up to and look to other things. Whereas on a regular file server, you can have something only get downloaded 2-3 times a year, it takes CPU power and disk space to seed things you might no longer care about, and so torrents die.

In fact, I find the average torrent barely lasts 2 months. As an example, the 2002 Codecon put up their entire mp3 collection on bittorrent, to show off the technology… and now you can’t get it! The tracker is long gone, the seeds long gone. The only place you can get them… is from me:

http://audio.textfiles.com/cons/codecon2002/

Where I put them up the old fashioned way, slow and stuff, although I’m mirrored elsewhere where the connections are fast. So if someone wants the historically interesting first introduction of bittorrent, you have to go to one of my websites.

I think this problem is endemic to Bittorrent. That said, it does mean it’s really, really good for high-traffic massive-interest files. New TV shows, for example: you go out, find the new TV show, download the whole crazy thing, and you’re torrenting with thousands of others so it comes in zippity-quick, and then, after a few weeks, who cares? What ends up happening is someone makes a “new” torrent of the entire SEASON of that show. Later, when the show is cancelled, someone makes a torrent of the entire RUN of the show. So even though it doesn’t do historical stuff well (and by that, I mean a created torrent doesn’t have a long shelf life), it definitely presents the “stuff” very well for the short lifespan.

So, what you want when you present a torrent of stuff, especially a thematically-similar set of stuff, is to ensure the largest number of simultaneous users, all pulling what they want, and therefore increasing the chances of someone “seeding” multiple files, even if they later delete them or choose not to have them. Since the torrent has a relatively short lifespan, the fad of downloading a given new torrent will ensure the saved bandwidth followed by a sad and lonely death of the file availability.

The relative spectrum of clients available means that some do not have the functionality of choosing specific files. But these are in such a minority that they almost rate as science projects. Even if one doesn’t have a windows machine super-client like Azureus or utorrent, there’s at least one leading edge client (bittornado) which is cross platform across basically every known major unix, windows, and even OSX. It functions both as a text-only command-line client and with a GUI. The advantages of a clustered group sharing of the notacon audio/video collection greatly outweigh the inability of a number of anaemic clients to handle per-file selection, especially with such a wide range of replacement clients that do things like handle DHT, and optimize the bandwidth usage.
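
For the curious, per-file selection in a big torrent comes down to simple arithmetic, sketched here in Python. In a multi-file torrent the files are laid end to end and cut into fixed-size pieces, so a client that wants only some files works out which pieces overlap them. The function name pieces_for_files, the file names and the sizes are invented for the example.

```python
def pieces_for_files(files, piece_length, wanted):
    """Return the sorted piece indexes needed to get the wanted files.

    files is an ordered list of (name, length_in_bytes) pairs, in the
    same order as they appear in the torrent.
    """
    needed = set()
    offset = 0  # byte position of the current file in the concatenated data
    for name, length in files:
        if name in wanted and length > 0:
            first = offset // piece_length
            last = (offset + length - 1) // piece_length
            needed.update(range(first, last + 1))
        offset += length
    return sorted(needed)
```

With files of 40, 25 and 35 bytes and 16-byte pieces, asking for only the middle file needs pieces 2, 3 and 4. Pieces 2 and 4 straddle the neighbouring files, which is why a client that picks individual files still ends up holding, and sharing, slivers of the files next door.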

By splitting the torrent into dozens of torrents, even if initially distributed by a zip, you lose several advantages. First of all, you end up taking in more CPU usage. (Even if you want to “share them all”, most clients will not do this, swapping between a random selection so as not to pin the machine sharing them all). Second, you can’t just drop the large .torrent on thepiratebay, mininova, legaltorrents and so on, so you can’t have one large and explanatory description file (and additional things like a copy of the program or logos or bonus music), but instead end up flooding the search engines with a bunch of ad-hoc singular files, even to the point of going “here’s the video of the talk, here’s the audio of the talk”. If the point of this is to distribute the talks among a small insular group who already know what they want and where they want it, it probably could have just been handled with some per-person DVDs, or pointing them to the DVD clearinghouse.

Anyway, there’s the thinking behind what I said. It’s long and tedious and technically weird, but I do believe in it. So there we go.



9 Comments

  1. Will says:

    For what it’s worth, Azureus is also cross platform (SWT Java application). It makes my Mac run hot, but it’s got plugins enough to block out the evil IPs of the RIAA or whomever so you don’t get corrupted sectors. And you can choose which files you want to download within the torrent, so you can easily pick and choose rather than downloading and deleting.

  2. Jason Scott says:

    The problem with Azureus is that it is fantastically resource intensive. Whatever uses it ends up becoming an ‘Azureus Machine’. I use an older client of utorrent before it was bought out and became suspect.

  3. James says:

    I have to disagree – blaming bittorrent for disappearing content makes no more sense than blaming HTTP for disappearing content. Take linux.conf.au 2003 – an archive of the website is at http://www.linux.org.au/conf/2003/ – now try downloading the conference CD .iso, or viewing people’s photos of the event (linked halfway down the page). Most of the links don’t work any more, and it’s only because there’s 7 mirrors that you’ve got half a chance of finding one that still works.

    Cool URIs don’t change – but part of not changing is not disappearing. This is an eternal burden that even the best of us have trouble fulfilling, let alone your average joe who has put up his conference photos.

  4. Jason Scott says:

    I have to disagree right back. In the case you give, mirrors slowly die. In the case of bittorrent, they quickly, quickly die. I have no doubt there’s the issue of “webrot” across all collections/availability of data, just like there has been with things like mailing addresses and store locations; it’s just that now it’s been accelerated to the point that mere weeks can mean the end of a torrent, and in most cases, since there’s no “main” server, it’s even more flaky.

  5. James says:

    Now I’m wondering what we’re disagreeing about. To some extent it’s NOTABUG, since it was designed to deal with releases, not be a file-sharing “network” as such. If you want a movie from last decade, use a more traditional P2P network like eMule where you can see what people actually have on their hard drives.

    Maybe what I’m saying is that a torrent is not data, it’s a catalogue of data at a certain point in time, ie metadata. And all data is subject to entropy – it gets moved around, compressed, copied, lost and so on. But the metadata in the torrent doesn’t get updated to handle this, and there’s no tool to say “I have this torrent, and this drive full of files – where are the files that go in this torrent so I can seed it?”. If the source torrent of a file was stored in the file itself, then you could reorganise your files to your heart’s content, and just point your torrent app and it would automagically reconstitute the torrent.

    Ah, here we go – “The truth is in the file” http://blog.jonudell.net/2007/02/20/whos-got-the-tag-database-truth-versus-file-truth-part-3/
    and http://blog.jonudell.net/2007/02/14/truth-files-microformats-and-xmp/

    PS: Freenet 0.5 is going to die at some point – are you archiving any of the freesites on it?

  6. Jason Scott says:

    I think we’re mostly disagreeing over the wrong bit. What I’m arguing about is that people are using bittorrent as a way to “archive” and make available stuff for general release, and that the lifespan is very, very short. You say it was “designed to deal with releases”, but it’s being used in all other manner of long-term stuff and then failing. Additionally, it is a galactic pain in the ass to keep running. That’s all I’m saying here. The fact that I have the only copy of the bittorrent announcement online testifies to this side of things.

    I don’t think Bram Cohen thought of Bittorrent’s conceptual place within Internet transfers beyond it being a way of having load-sharing abilities that standard download mechanisms don’t have. That said, it was rampant, intense piracy that spread its popularity, not its inherent design.

    That’s all!

  7. ewen chia says:

    The problem with Azureus is that it is fantastically resource intensive.

  8. Charles says:

    > As an example, the 2002 Codecon put up their entire
    > mp3 collection on bittorrent, to show off the technology…
    > and now you can’t get it! The tracker is long gone,
    > the seeds long gone. The only place you can get them
    > is from me:
    >
    > http://audio.textfiles.com/cons/codecon2002/
    >
    > So if someone wants the historically interesting first
    > introduction of bittorrent, you have to go to one of
    > my websites.

    …which is 404ed, ironically enough. :)

  9. deepgeek says:

    Just wanted to mention that my favorite CLI download manager is Aria2. It has an ability to download from both http and torrent in the fastest way possible, then seed if desired.

    CLI rocks…

    DG