Bump Not Sage: Saving 4Chan
Probably the best part of following a logical-conclusion path is when people supporting you with pumping fists, hoots, and hollers start to pump their fists a bit less and do a lot less hooting.
So let me inform you all and the world that, after many months of work and negotiation, I have acquired 10 million expired threads from 4chan’s history. Roughly half a decade’s worth.
Why? Because it’s part of online history, a study of the human soul when untethered by identity, a way to confirm statements made years ago… any range of reasons which I could not hope to compose out of the air for you. That’s not my job. My job is to save things. And now I’ve saved this.
It’s going on archive.org over the next week. I’ll let you know when it’s done. It’s dozens of gigabytes, and I have it in XML, HTML and MySQL formats, all of which show different parts of the data. (Conversion strips out some data that the original formats might have had, and so on.)
An awful lot of history that we have at our fingertips is because someone, somewhere, hit “save” instead of “delete”. Someone did that in this case, and so here we go.
Plan accordingly.
Update: This has been cancelled (postponed, really, for a few years). Please read this weblog entry.







Jamie dubs
wrote:
Amazing! Is your archive current? I’ve actually been working on a 4chan archive/search engine and would love to swap notes or contribute data
Posted on 23-Jul-09 at 2:36 pm | Permalink
chronomex
wrote:
That’s really surprising; I thought I heard most expired 4chan threads were dropped on the floor. (Notably excepting /r9k/…) Does this include images, or is it just text?
I would presume that images are what I heard as being deleted on expiry, so I suspect that it’s text-only.
Posted on 23-Jul-09 at 2:43 pm | Permalink
Alex Leavitt
wrote:
Do you know the date range?
Posted on 23-Jul-09 at 4:26 pm | Permalink
Michael Kohne
wrote:
At dozens of gigs, I suspect there’s at least some images. Now, what interesting sociological research can we do with 10 million 4chan threads?
Nicely done, Jason. Did you come up with a way to get the expired stuff on an ongoing basis? I really suspect there’s at least a few papers for the psych majors in that data.
Posted on 23-Jul-09 at 4:31 pm | Permalink
durr
wrote:
does this include cp threads?
Posted on 23-Jul-09 at 9:38 pm | Permalink
Toshiaki
wrote:
The images are not there.
“interesting sociological research”? Clearly, you have never visited 4chan.
This archive is only useful if you want to know the first utterance of ‘fgsfds’.
Posted on 24-Jul-09 at 8:32 am | Permalink
Anonymous
wrote:
MOAR CP Plz
Posted on 24-Jul-09 at 9:43 am | Permalink
Anonymous
wrote:
nah just kidding…AMAZING project though…can’t wait to see how it turns out! 4chan IMHO is the epitome of what the internet is, can be, will be, and has been. It’s one of the only things on the internet that can ONLY be on the internet. It really is a piece of history, try to get the national archives to store it for you!
Posted on 24-Jul-09 at 9:45 am | Permalink
Kevin
wrote:
Contents aside, I’m curious to see volume changes over time. How might they map against other events?
Posted on 24-Jul-09 at 9:48 am | Permalink
Antifuchs' µblag wrote:
“So let me inform you all and the world that, after many months of work and ne…”…
So let me inform you all and the world that, after many months of work and negotiation, I have acquired 10 million expired threads from 4chan’s history. Roughly half a decade’s worth.[...]It’s going on archive.org over the next week….
Posted on 24-Jul-09 at 10:01 am | Permalink
Five Years of 4chan - spincitydotorg wrote:
[...] Years of 4chan Five years of 4chan is being added to Archive.org. Crazy. [...]
Posted on 24-Jul-09 at 1:24 pm | Permalink
Torley
wrote:
Heard about this via Waxy too! What a chunk of Internet.
It would be wonderful to find accessibly exciting ways to browse these archives and the evolutions of memes (and other items of confusion-causing cultural significance).
And generate other stats, such as a Google Trends-style graph over time showing how many times the words “FAIL” and “WIN” were used.
Posted on 24-Jul-09 at 6:55 pm | Permalink
Anonymous
wrote:
This is really sad. Please don’t do it. Does nobody see the value in transience anymore?
Posted on 24-Jul-09 at 10:49 pm | Permalink
David
wrote:
Why would anyone care if images are not included?
Posted on 25-Jul-09 at 4:13 am | Permalink
Jason Scott: Saving 4Chan | Extra Future wrote:
[...] Jason’s acquired about 5 years worth of expired threads from the internet’s House of Awful Shit and Meme Factory, 4chan. Yes, this is culturally important. Just trust me. [...]
Posted on 25-Jul-09 at 7:35 am | Permalink
Anonymous
wrote:
@Anonymous:
Why should only the people who were alive and active at a given moment be allowed to enjoy said moment? There’s something to be said for fleeting events, but there’s a lot more to be said for preserving a fairly massive piece of history.
@David:
Well, they’re part of the history too.
Posted on 25-Jul-09 at 7:51 am | Permalink