ASCII by Jason Scott

Jason Scott's Weblog

All of the Podcasts

This is a story about how I ended up downloading every podcast. But it’s actually a little more than that.

I have a reputation/name as a historian now, and that’s nice, but I’m primarily a collector. I have an innate need to put things with other things like them and end up with a large set of like things. I do it everywhere and in a whole range of ways, and have done it since I was very young. I don’t really discriminate too much about what I like to collect, although I suppose it leans in the direction of information or unusual subcultures. What happened and continues to happen is that in pulling together these collections, I discover patterns and themes that reveal things far beyond the mere collection itself, and I draw on those themes to write or speak about. It’s like enjoying bike-riding so much that you discover how towns are laid out and roads are planned, simply from the mass of places you’ve ridden your bike. A tangential, but important and ultimately vital set of learning.

When putting together a collection of any sort, there is a vital but most-unrewarding portion of the process in the beginning. You start putting like things together, begin assembling them in a rough fashion or order, and are spending some significant amount of time doing so. Depending on the nature of the collecting, you might find that you lose a day or days to it, at the end of which you’ve only increased your collection minimally. Additionally, there’s very little difference between a beginning collection and a trash pile. A lot of people have trash piles that could be collections if they cared about them, but the trash pile is the cast-off shell to them, not the fruit. It is also very difficult to explain to people why you think something needs to be collected in the first place, and so you have the worst of all worlds: a small, non-comprehensive collection of something that you know others have done better and which will never have “it all”, which is taking a lot of your time to put together. This is where most people move on, and put stuff into the electronic or physical trash can, delighted their worthless proto-collection has been set aside to make space for more important things in their world. This once expressed itself in things like piles of magazines, sets of carefully arranged postal stamps, or small piles of rocks representing various minerals and non-precious gems. Now it expresses itself in piles of printouts, files, manuals and hard drives.

About once a month I get a tragic, sad letter from someone who threw away their BBS life a year or multiple years ago, and who regrets it heavily now that they see my collection and the gaps they could have filled. These are not enjoyable letters to get. But it’s quite understandable why they did so. There was a definite physical heft to the collection, but no value as they saw it.

(For the record, if you have a collection of BBS material, whether it be printouts, old parts, or archives of files, I will take it, no questions asked.)

So one day I looked at Podcasts. I liked some aspects of them, so I am downloading all of them. Every one. I am going back and swiping older ones as I can find them, but I’m still in the process of getting every single one, so it’s taking some time. I have them in languages I’ve never spoken, and I have listened to less than one tenth of one percent of them. At last count I’m at 75 gigabytes of podcasts which works out to roughly 7,500 individual files. I suspect there are doubles and many missed files, but we’ll see if that comes with time.

I’ll take a moment to describe how I am doing this. Obviously, I need some space to store all these podcasts, but space, these days, is very cheap. I watch sites that provide specials for hardware, and can purchase a 250 gigabyte hard drive for $100. It’s a drive type that is prone to failure, so I buy two. At home, I run these drives in USB2 enclosures, on two separate machines, and I use a program called rsync to keep them synchronized. I download podcasts using a program called Doppler Radio, whose approach has several advantages that are useful for archiving. I have the podcasts on a network drive, so I am not beholden to a specific machine to download the podcasts. I found very quickly that Doppler Radio didn’t check to see if you had pointed it to multiple copies of the same feeds (it assumes you’re using such a small number of feeds that you would always notice the doubles yourself), so I wrote a Perl script that yanked out doubles. This has held up for the time being, and while I don’t have firm numbers on how much disk space per day this process is taking, I’m not too worried about it.
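
For the curious, here is roughly what "yanking out doubles" can look like. This is not the Perl script mentioned above, just a minimal Python sketch of the same idea. The function names and the normalization rules (lowercasing the scheme and host, dropping a trailing slash and any fragment) are assumptions for illustration, and they say nothing about how Doppler Radio stores its feed list.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_url(url):
    """Reduce a feed URL to a canonical form so trivial variants compare equal.

    Assumed rules: lowercase the scheme and host, strip a trailing slash
    from the path, and drop any #fragment. The query string is kept as-is.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def remove_duplicate_feeds(urls):
    """Return the feeds in their original order, keeping only the first
    occurrence of each normalized URL and skipping blank entries."""
    seen = set()
    unique = []
    for url in urls:
        if not url.strip():
            continue  # ignore empty lines in the feed list
        key = normalize_feed_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    feeds = [
        "http://example.com/feed.xml",
        "HTTP://Example.com/feed.xml",   # same feed, different capitalization
        "http://example.com/feed.xml/",  # same feed, trailing slash
        "http://example.org/show/rss",
    ]
    for feed in remove_duplicate_feeds(feeds):
        print(feed)
```

This only catches doubles that differ in spelling. Two different URLs that serve the same feed would still need a check on the downloaded files themselves.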

While I’m here, I’ll give my own thoughts on the general medium of podcasting. I think the name is incredibly dumb. It sounds like the thing only works with iPods, which it does not. It sounds like you’re doing some sort of radio show and nothing else, when in fact it’s just a container for any data you choose to send along. And it sounds new and revolutionary, when it is anything but.

Podcasting certainly has its roots in zine culture, home-brew tapes, BBSes, carbon-copy SF fanzines, and telegraph. If that’s too high-minded and artsy-historian, then I could point to the direct event of the fad of “Push Technology” that infected a number of companies in 1998 through to 1999. Microsoft and Netscape both claimed that Push technology would change everything, and Pointcast tried to build a business on it. Really, it was all a fine idea, but the order of the day was to claim that not only was a good idea good, but it would actually turn dog poop into solid gold, so the actuality had issues with the (stock-driven) promises.

“What is this Blog thing?” my father asked me on the phone just a few days ago. Dad doesn’t buy into much, because life has taught him that everything’s one big massive scam with collusion by government and industry adding to the mess. Describing it to my dad, as I’ve learned over the years, requires about two paragraphs at most before it’s obvious I’m just being long-winded. So I basically said this (and I did, actually, say this; I am not playing semantic or dramatic games):

“Every once in a while, a group of people with a lot of free time who talk too much band together and take over an already-existing hobby, task or medium. In doing so, they invent a whole set of language to describe the already-existent thing they do, so it sounds like it’s really new and neat. They tend to ignore what’s before them, which is bad, but they also cause this critical mass where they force money and interest in the thing, which is good. The thing becomes easier and better put together to help these people get what they want out of it, which is to be really cool or make a lot of money.”

“So blogs are diaries that are online, where people talk about themselves and other people can read them and tell them how cool or uncool they are.”

Obviously the medium of blogs has a depth or meaning far beyond this, but I think that nails a lot of it, for the purposes of a quick explanation to my father when in fact he was wondering when my documentary was going to be finished. (The answer is, I’m working very hard on it.)

For the record, I am not very fond of the word “blog” at all, but the online and offline worlds are littered and choked with etymological abortions that grate and dismay, so there’s no sense in crying about it or trying to turn the tide. I’ll stick with “Audio Diaries” in ten years after it all dies down.

So again. Why am I collecting tens of gigabytes of podcasts, when I don’t seem to have an overreaching awe and admiration of them? Because life has taught me several facts about history and the nature of collecting which tell my gut instincts to go after all of them anyway. I gave a speech about this in 2004 called “Saving Digital History: A Quick and Dirty Guide”, but I’ll summarize quickly.

The hardest single part of analyzing history is to be at the historical event when it happens. You could be very good at knowing everything about Lincoln’s assassination, but the best information all flows from being at the event when it happened in the theatre, not reading second or third-hand accounts, or finding cribbed trial notes or anything else. But obviously, it is most difficult to travel back in time and be there.

Similarly, it is very hard to tell what in the present day will have historical significance. There’s some easy, large targets like major political events or spectacular trials, but sometimes it’s just dumb luck, having a camera or a good memory for facts, and being at the right place at the right time. Sometimes, there’s actually no historical significance but the artistry around recounting the event gives it historical significance. (The Woodstock concert/Aquarian Exposition of 1969 comes to mind.) And sometimes it’s merely a case that, looking back, you find that something has an entire other meaning than anybody associated with it could ever have imagined.

Such I think it will be with Podcasts. They are, essentially, a few people (not more than 1500 at any given time) who are recording their voices or music collections into compressed music files and then making them available for distribution. The fact that the clients for getting these music files are geared towards use-and-discard broadcasting models is irrelevant to me. What I focus on instead is that there are entire swaths of life being recorded by these folks: their accents, their way of phrasing things, their lives, pieces of the world around them, who they know and knew, and how seminal events cut across all these geographic and personal boundaries.

The example I like to give (and I’ve done it a lot, just not in writing) is a hypothetical Letter to Home. Imagine a Civil-War era soldier writing home to his wife to tell her how things are. He might tell her how they’re very cold and unhappy but that the war might be over soon, and that he misses her very much and they should think about getting some more cows. Pretty straightforward stuff, and likely, to someone of the time, to be a rather boring or at least unremarkable letter.

Time changes its value. Obviously Civil War-era letters gain some amount of value by merely being over 100 years old, but beyond that the letter itself could reveal facts or insight that were never thought of at the time when the letter was collected. For example, the soldier might mention being in a specific field which will tell when the armies reached a certain point in a battle, different than previously thought. Maybe that soldier used a word or term that was coming into vogue at the time and helps language specialists trace the spread of that term through the US.

Or maybe, just maybe, that letter contains a watermark showing that it was manufactured by a company that claimed it never sold provisions to the “other side” during the war.

The point is, you can’t know. There’s so much information in the nature of the spoken voice and what the spoken voice is speaking of at the time, that it has contextual meanings that might come out months or years down the line. When combined with the times that they were recorded or the location of the speaker, you end up with a whole host of insight that comes up from your collection.

There are a number of other factors which will also assist me in collecting, most of which I pull from my experience collecting other such from-the-ground works. First of all, the number of day-to-day, consistently outputting podcasts will be very low. Like any interesting medium with a barrier to entry involving time or effort, the novelty wears off and the person stops doing the project. This turns it from an ongoing concern into an exhibit, and exhibits are very easy to collect. Another point is that the whole nature of this particular medium is that people are doing all the hard work themselves, that is, generating the content and ensuring its distribution through directories and clients. That means that I just have to keep setting my clients to the widest swath possible, open up every filter, and make sure the disk drives work, and 95 percent of my effort is automatic. When I have time, I might find more contextual information about each feed, but otherwise, even just having it all in one place is good for now. Obviously I have a lot of other things on my plate, but in a given day, I do basically zero work towards collecting, so it’s not a strain.

Where will this go? I don’t know. I don’t see there being a podcasts.textfiles.com and I’m certainly not looking to start a business as a podcast repository. But libraries and collections out there, some of them really amazing, were started because someone said “Why throw that out? I’ll put it away with the others.” and so it began.

So it begins.




25 Comments

  1. James says:

    Amusingly, http://odeo.com/ just launched, with some business plan to make money from podcasts. So it seems you have excellent timing. Also, may I suggest you collect screencasts as well? An even smaller output, but hugely rich in cultural info.

  2. Jason Scott says:

    Help me here. In looking around, as far as I can tell, “Screencasting” is basically Jon Udell’s term for “Flash movies that instruct”. Am I at all wrong, there?

    I collect flash movies, but not in any all-inclusive amount, but I do. Some of the dopplers (see? “dopplers” works as well as “podcasts”) I grab have non-mp3/ogg stuff in them, so they’re getting collected as well.

  3. oscar says:

    You could collect “video blogs”, which have been given the cringe-inducing appellation “vlogs”.

    Then again, doppler might be grabbing video files already.

  4. Jason Scott says:

    My “Cringe Muscle” broke years ago, so we’re safe there. I can look into the eye of the nomenclature storm and not blink.

    I know one of the podcasts I grab is nothing but WMV files, so I’m assuming it’s doing the right thing, should I find new logs as I go. By the way, each time I add a podcast, I jump in disk usage initially as I grab out all the old “episodes”. So I’m now past 80 gigabytes.

    My big thing now is all these weird Japanese podcasts which of course are completely and utterly decontextualized. But my Japanese stands to get better as a result.

  5. James says:

    Yeah, that’s what I’m talking about. Notably, he adds them as enclosures to his RSS feed.

    I dunno about “dopplers”, I get weird images of radar. “Feedloads” is better, but still awkward.

  6. Jason Scott says:

    Well, the doppler effect is when you have something going by at a certain speed, and there’s variation in the audio. So I like the naming because it implies a moving thing making sound, which you’re picking up. Neat!

    By the way, this whole insanity with weird names has some great precedents, my favorite being a fad in the 19th century where all the newspapers were screwing around with words, spelling them wrong, and then abbreviating them. This is, bizarrely, where “O.K.” comes from. It stands for “Oll Korrect”.

  7. Boing Boing says:

    Archiving every Podcast

    Jason Scott is the archivist whose textfiles.com contains copies of every text-file that circulated on the massive world of BBSes in the pre-Internet days. He’s launched a new project: archiving every single Podcast ever made. It’s only 75GB so far, but gro…

  8. Mark says:

    Fascinating. I would be interested in hearing more details of your archival scheme at this early point. I assume you are keeping the recordings separated by source, with some indication of date-published (or date-received)? Do you store any other metadata, and if so, where are you getting it from, and what lengths are you going to in order to get it? RSS feeds can have author information, but many do not. Are you going to the associated home page and downloading/caching a copy of that along with the recordings? Just a one-time grab (when you find a new feed) could provide valuable metadata to sift through later.

  9. tomhiggins says:

    My cringe muscle goes off at times like a race horse juiced up for the Kentucky Derby…the term podcast had me in twitches for days. You would think after all this time I would learn to medicate the beast. With Dr Gonzo gone I think we all need to take up Ether so that the industry does not collapse.

    Podcasts are DOWNLOADS…hey what about that. Did we come up with a new name for Downloads when we moved from Xmodem to Ymodem? Did things become Zmodemable all of a sudden? PushMePullYous? Yea I mean it’s more descriptive and it would get a freakishly cool mascot.

    But then again everything is a Download, so for categorizing this would be akin to having a one directory file system called FILES and everything (EVERYTHING) went in it.

    That’s why there came Filez which begat Warez which was the close friend to Philez which spawned the 3l33+n355 and verily 0Dayz, which rockethed hard.

    Ok so that’s just one set of classifications. Some folks had directories for Music, Movies, Games, etc etc etc. However you sliced it there was a need for naming the categories.

    But did we need the name PodCast? Nope.

    They are Broadcasts, like shoutcasts and icecasts and under1W (pirate) FM and the occasional hijack of the satellites/cable feeders.

    These are things Cast on the waters of the medium (radio waves, the net, the net over radio waves (wifi), etc) with the hopes of being caught by an audience.

    AudioDiaries sounds almost right, just like Blogs are WebDiaries, all of which fall under the Diary classification. But not all these things are Diaries, like my JeanShepherd podcast for instance. Some casts are less a diary and more a presentation. Diaries always struck me as things done for oneself, at least the intent if not the historic outcome, whereas what most blogs and casts do is open the personal to the masses.

    NicheCast? Sounds NPRish enough for the Times not to treat it like the jelly jar swilling bastard cousin of unsavory things like Virtual Reality (hands up if you were at all part of the beaten down hopeful of the VR revolution).

    So what do I call these things? Well when I’m talking to the mass lumpen I call them podcasts to avoid the confusion you often get when you mention shopping places other than WalMart or not having heard the latest Toby Keith flagwrapped ode to the red white and blew. When I am talking to folks with half a skull of primordial grey matter ooze I tend to say Netcast or even just Cast.

    My 2cents having been spent I now give all thanks to Jason for his taking up of this task. As he points out it really is a slice of downloads which I think over the passing of time will prove worthy to have collected.

    Excelsior you Fatheads

    -tomhiggins

  10. If you’re missing any from Geek News Central, I have all of mine here and can fill some gaps. I was going to start submitting them to another archive.

  11. Jason Scott says:

    For Mark:

    Right now, I’m still in the discovery/aggregation phase, where I’m just trying to capture a whole swath of the stuff. In doing so, I’m basically relying on the medium/subculture’s ability to regulate/classify itself, which, as you might expect, ranges quite heavily.

    Some people are clever and forward-thinking and name their stuff things like mybroadcast_003.mp3. Some are really forward thinking and name them things like scrambled_eggs-20050104.mp3. But others, sadly, will use names like “barbie.mp3” and “today.mp3”, and that’s the way it is.

    Contextual information will likely be a combination of hand-hacked stuff and script work. I know I’m missing a ton, just like when you look at a collection of novels you miss all the newspaper articles, speeches, meetings, and conventions those novels might have sparked. But it’s all a case of ‘starting out somewhere’, like I say in the entry.

  12. mungojelly says:

    I prefer the term “feed.” There’s no need IMHO to have separate terms for feeds that are text, audio, or video. “Audio feed” works fine, & I expect that in the future most feeds will be mixed, anyway; already most “podcasts” are text feeds as well as audio. The preexisting connotations of “feed” are beautifully appropriate.

    Thanks BTW for your work. Collectors are at least as essential as creators, in this new world.

    I think you’re going to have to give up on archiving *all* the podcasts soon enough, though. It’ll be like my ill-fated mission, when the web was beginning, to visit every webpage. (The times got ahead of me; I could see that new pages sped up until they were appearing about as fast as I could look at them, then faster, then *much* faster.) You’ll still have a useful archive though of this, the beginning.

    <3

  13. Jason Scott says:

    Well, it’s interesting. I’ve had a number of people contacting me privately saying, basically “You’re admirable and insane and it’s impossible.” Very little is impossible, but there are several factors I am counting on to continue to buffer these things in great amounts.

    First of all, I am not archiving music casts. “Songs of the Day” are not really what I’m going for, or “The new trance explosion XXXIVI”. If you cut out pre-formed bits of commercial music, that’s about 90-95 percent of the Internet Broadcasting going on, and I’m all set for that. So that’s cutting out a lot.

    This leaves people who, on a regular basis, are sitting down and talking into a microphone, about a subject (themselves or something topical) or playing some pieces and describing them… you know, talk radio or analysis kind of stuff. And I contend, and will contend for the foreseeable future, that this is a barrier to entry that will limit it greatly.

    I am aware there are companies now working to lower that barrier to entry. I’m all for it, but I still think this is limited appeal.

    Keep in mind also, that we’ll often end up with strata: people who are creating this stuff with intent for wide dispersal (think BoingBoing, Wonkette, Winer, and so on), people who are doing it as a lark for themselves and we get to listen in, and then people who are basically doing it for 2 other people and don’t entirely understand they’re broadcasting to the full and complete internet (a lot of livejournals). I suspect these audio feeds will be the same. I’m going for the first two and as a bonus the third. I think the most growth will ALWAYS be in the “just screwing around” section, and also, those will be of weird frequency.

    And we agree there; ANY kind of collection is at least SOMETHING.

  14. Yes! Bravo! I’m glad you’re doing this. Now…can someone please do this???

    http://homepage.mac.com/dave7/podilicious/podilicious.htm

  15. tomhiggins says:

    Having caught the bug a few months ago some observations…

    There is a newer breed of podcasts that came up fast that is basically “How can we make money off of podcasting podcasts about making money from podcasts”. Yea, ya know it’s ok if you miss these cause I’m deleting them and their feeds from my catcher just as soon as I hear the caching of the “Well we are due our due” cry drown out the creativity of doing a show. There is a decided core of folks who are going to turn this commercial in the same old stupid ways fast.

    Then there are folks looking to the economy of the micro scale for a revenue stream. Them I like better. Call it the NPR way, basically it’s donations and sponsorship in a way that tries not to overshadow the content. It’s a balancing act to be sure.

    Of course there are folks who are and will do this for gratis and to them I am more apt to toss a buck or ten in the paypal tip jar or buy some merch..

    For catching these feeds (casts) I love love love BashPodder. Why? First off it’s a bash script, pure and simple and cli just like Bog intended the good stuff. Second, the name is a perfect image of Bashing Ipods and the white eared iPlod zombies. Bash Bash. The third..well being a bash script I was able to mod it to make a copy of the new casts over to my Zen Nomad while I sleep. Each morning I awake to new things. With 30gig it’s not a big thing to cruft it up, I usually spend a few minutes a week culling out the old.

    If you need any back casts of the Personal Telco Project Podcast or the Jean Shepherds Fatheads Cast just let me know.

    Some casts I love and think deserve grabbing, if you have not already got em on your list…
    WholeWheatRadio
    SlackerAstronomy
    OpenPodcast

    Much respect for your habitual dives into the depths of archivists insanity.

    -tomhiggins

  16. mediageek says:

    Give Thanks to the Archivists

    Jason Scott, of Textfiles.com fame, has announced that he is currently archiving all of the podcasts that he can find, currently storing them in 75 …

  17. Mr. X says:

    Hey Jason–

    Great idea about archiving Podcasts–I would like to extend an invitation to you to come on the Mr. X and Just Julie Podcast via Skype and talk some more about it–I find it very fascinating and I think my listeners would as well–Add us to your Skype list (if you use Skype) with the name “ifthensoftware”

    If you see me online, Skype in–I only run Skype when the show is recording. (Yes, anyone can Skype into the show when we are recording and you will most likely be on the show).

  18. Rob Greenlee says:

    I am already building a radio show and podcast archive at http://www.downloadradio.org. It will not be an archive with 100% of all podcasts, but will create a searchable archive that uses BitTorrent for distribution, but will still display links to streams and direct mp3 download links.

    It is not a dream, but is being built and is operational right now.

    Rob Greenlee
    WebTalkRadio.com
    DownloadRadio.org

  19. Jason Scott says:

    First of all, let me say, SeattleWireless TV rocks. I wish you guys would put more episodes out more regularly. It’s been an inspiration for many people.

    Your archive is incomplete, and has a business agenda, however small. This is absolutely cool, absolutely neat. We are not in the same place, not working from the same motivations, and not working towards the same goals. This is, again, absolutely cool.

    I see that this is starting to become a drumbeat for you. (http://www.webtalkradio.com/blog/87.shtml) I especially like the “It is not a dream, it is being built and is operational right now.”

    My archive isn’t a dream either. I passed 90 GB this morning.

    Anyway, like I said, I love your work, go for it, the more collections out there the better. I’m sure archive.org will come along and kick both our asses.

  20. Dazz says:

    Am I allowed to publish this ‘article’ on my German blog? I want to introduce your idea to more people. I have found that most German podcasters are not confident enough in their casts. They say things like “in my opinion” at the end of their sentences, or “if you like it”. I think they shouldn’t care whether others like it or not, because if listeners don’t, they can write a mail or leave a comment. If you say something like that, you take away your listeners’ reason to react. So I want them to know that their opinion is important, maybe not now, but to someone in the future! I listened to that given mp3 three times to get everything you said and I have to say that I’m stunned that this is the first time I find someone talking about “digital history” when you have UNICEF and others wanting to keep the history of humanity. It’s sad that nobody who lived in the time when telephones were introduced recorded how human behavior changed and what it did to communication. If you have any source of it, I’d be happy.

    Starting to think about MY time on earth and record it. Someone will need it in the future!

    Dazz

  21. Jason Scott says:

    Hey there, Dazz. Well, you certainly have my permission to reprint or translate or otherwise use the material from the weblog; I put a license on it explicitly saying you could.

    The time when telephones were introduced did not have ways to record human speech in that way; it came later.

    It is well known that the recording of music changed how music was played, because it affected how people perceived it (they could now hear themselves and instead of taking music and moving it in different ways through the years, they had a set “form” for some subtle aspects that they started to emulate).

    So go ahead, spread it around, although I don’t know if it’s that groundbreaking.

  22. No New Podcasts Today

    No new podcasts today, but some new postings over at Blogging Alone .

  23. dilvie says:

    Hi Jason,

    I wish I had the resources to archive all the Mp3 blogs running. While I’m very interested in the historical value of the spoken word, I’m a little more interested in the historical value of music development — especially the more obscure music that tends to turn up in feeds like Knobtweakers.net.

    I am keeping a complete archive of Knobtweakers, of course — since it’s my blog, it’s fairly trivial to do. One day, I’d like to donate a copy of the collection to a true collector, like yourself, if such a person exists…

    What would be even more interesting, is to put key players together into a room and record the conversation. I certainly have a lot of really cool information that I don’t put into the blog (because it’s not really an appropriate forum for it). An interview documentary like the one you did for the BBS scene would be very cool.

    Sadly, one of the key players in electronic music Mp3 blogs has already passed away. I hope somebody takes it up soon. This is a critical time for audio bloggers — mainstream media is starting to notice that we exist, but I would still say that we represent a niche culture, and few or none of us are being unduly influenced by politics or business…

    I think it’s still safe to say, the vast majority of us are doing what we do primarily because we love it, and that almost always produces some cool content.

    – Eric

  24. Josh Renaud says:

    Yikes, I guess it’s time to close the comments.

  25. Preserving podcasts

    Yes, there is someone who’s trying to save all the podcasts ever made, according to Wired News. As documentary filmmaker and amateur historian Jason Scott explains on his blog, “I’m certainly not looking to start a business as a podcast…