DistriWiki: A Proposal
People, it’s time.
Actually, it was time probably 5 years ago, but better late than never. If you believe Clay Shirky, we have a nearly bottomless surplus of collaborative energy to burn, but let’s not waste more of it than we have to, shall we? It’s like how bananas used to taste different and then we broke bananas and had to get new bananas to replace them.
As an information activist, I like it when stuff “happens” that brings into sharp focus a bunch of issues at once. It can get hella dreary explaining countless levels of intra-specific concepts that eventually deploy a meager payload of only relative interest to the masses. But one good clusterfuck, well, that’s worth a week of seminars.
We just had that happen with Wikipedia. This is a very simple introduction to what happened, but it’s still a fuckload of work to read all that. So, I’ll summarize thusly:
Fox News decided they were going to do a story on how much porn and adult material is on Wikipedia, in the form of images. As part of this, they started contacting Wikimedia/Wikipedia donors, specifically the big-ticket ones instead of the peons. This scared the fuck out of Jimbo Wales, so he started deleting hundreds and hundreds of images, any that might possibly cause any raised eyebrows anywhere in the western world. By the time people caught onto it, he’d done an enormous amount of deletion, and after a lot of infighting and debating, a percentage of the images are back, some are calling Jimbo a hero, and some admins are resigning in protest. Meanwhile, Fox News got to release a story claiming victory for helping to purge Wikipedia of adult material.
So, listen. We could point out the problem is Jimbo Wales, and yes, he’s kind of a problem, in the way that your crazy perv uncle is a problem – you can’t predict him, he’s sometimes a lot of laughs but other times he’s creepy as all get-out. We could point out a problem in how the masses of admins and Wikipedia users reacted to this, with their endless discussions and finger-pointing and votes and whatever other bureaucratic bullshit they like to burrow into. We could even take the idea that Wikipedia itself is to blame, with the editing and the unrealistic goals and the claims to be censorship-free and all that.
But that’s not the problem. The problem is a different one, and it’s a problem we solved a long, long time ago.
Wikipedia is fucking centralized.
It’s on a bunch of servers that serve code to each other, most definitely. It has shared resources and it goes out to a couple datacenters and it’s got some level of redundancy, so it’s not on one server. That’s not what’s meant by centralized. I mean that one entity controls it, one entity has fiat, that entity makes decisions and the decisions lead to policy, actual policy, like policy in the code and the construction and the implementation of right and wrong, and it’s all centralized. That’s why one vandal, Jimbo, was able to do so much damage, so quickly. That’s why it took dozens, maybe hundreds to undo it. That’s a problem. It’s not a good thing.
Now, in the way that all entities that centrally control the domain will tell you the domain being centralized is awesome, I’m sure the Wikimedia Foundation and Wikipedia’s actual rulers will tell you this is good, that it means that Wikipedia is protected and it will live on, and those delicious, delicious donors will be able to do a one-stop-shop and give their dollars to a place and give Wikipedia a building and everything is great. That is what controlling entities do. It’s not evil, it’s not bad – it’s just their nature. It’s like getting mad at a dog that bites you – the dog bites things, and you dipped your hand in meat sauce 2 minutes ago. Don’t get angry at the dog – either stop dipping your hand in meat sauce, or don’t go near the dog for some time after you dip your hand in meat sauce.
If every Jeep Cherokee stopped working for 15 minutes at the same time all over the country and it turned out it was because a server crashed in Texas, we’d be concerned, right? We’d start asking some questions. When Boston’s single-point-of-waterpipe broke and the city had no clean water for a couple days, people started talking about redundancy. You know, once you have a pretty clear indication that something is wrong, you start to talk about solutions.
So let’s talk about solutions.
We’re lucky – the Wikipedia “problem” I’m talking about was solved years ago. It was called Usenet.
Usenet was a major solution to a problem it didn’t even know it had. It was founded in the beginning era of the general Internet, the network of networks where things were going on in all directions and there were almost no guarantees and almost no idea what it was all going to be about. All that was known was that it had potential and could be really cool and really powerful. So Usenet went through a number of iterations and a bunch of fights and a whole lot of events, and guess what – it got shit done.
Now, let’s give a moment for people to say Usenet didn’t work, or that it’s caked with spam and bullshit, and it’s broken and a terrible model to base Wikipedia on, since Wikipedia works.
Bullllllllllllllllllllllllllllllllllllllllllllshit. Bullshit on all levels.
Usenet worked fine. It had a bunch of security issues because it assumed if you were big-cheese enough to run a Usenet node you were probably mature and sane enough not to do crazy insane things, an assumption that became less valid as more people showed up to the party and the barriers to entry dropped like pants at said party. And without a doubt, a lot of Usenet got overrun with spam, but a lot of Usenet had functionality to deal with spam, including on both the reader and the server side. It was a known problem. Also, it was an attention crash issue – once people left servers running without maintenance, spam increased, just like entries on Wikipedia increase in spam and problems when they lose the attention of folks. Stuff can sit on Wikipedia for days, months, years – until it’s fixed. Compared to the situation that Wikipedia has a single point of failure in terms of presentation and control, Usenet’s functionality concerns and issues are a rounding error. Usenet worked fine. Usenet works fine. It’s not the big hot thing – but the servers go up, and they stay up.
Critically, Usenet was decentralized. Different Usenet servers gave changes to each other – they provided articles to each other, some sections had moderation, others did not. The protocol was designed to assume there would be dozens, hundreds of places around the internet, some of them connecting only a few times a day or week – and the changes would go to them as resources permitted. As a result, some servers had a wide range of postings and a long retention period – others could barely keep up, and not using them for a few days meant you missed stuff. People wrote indexing and archiving utilities to cleave off what was needed and let a person seeking information find it. Backups happened that, years later, left us with decades-old saves of Usenet articles. Seriously, this is good stuff. We learned a lot.
With Wikipedia, we forgot it all.
Now, not to say that it’s all Wikipedia’s fault – webservers worked in this “one server gives out the info, oops, that one server is gone” way as well. FTP servers, also, worked in this way. To one way of thinking, Wikipedia was just following the trend, the dominant paradigm.
But in both the cases of FTP and webservers, mirroring mitigated this situation by giving various FTP servers or webservers ways to check on their “masters” and pull in changes accordingly. If the “master” disappeared or was overrun, the mirrors were right there to save shit. And again, mirrors weren’t just insurance against downtime and repairs. Censorship, shutdowns, fights, politics… all of these disasters were reduced in scope with Usenet, with mirroring.
Wikipedia forgot that little bit, that not all disasters are code-based, not all downtimes are hardware-based.
I therefore propose DistriWiki, a set of protocols and MediaWiki extensions that push out compressed snapshot differences of Wikipedia’s content and which allow mirror MediaWikis to receive these changes and make decisions based on them.
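To make the shape of the thing concrete, here’s a rough sketch of the mirror side, in Python rather than the PHP a real MediaWiki extension would be written in. The feed URL, the JSON change format, and the function names are all assumptions made up for illustration, not anything that exists today:

```python
# Hypothetical sketch of a DistriWiki mirror's pull loop. Nothing here is a
# real MediaWiki API: the endpoint, the change format, and apply_change()
# are made-up placeholders for illustration.

import json
import urllib.request
import zlib

FEED_URL = "https://hub.example.org/distriwiki/changes.json.z"  # made up

def fetch_changes(url=FEED_URL):
    """Pull one compressed batch of snapshot differences from the hub."""
    with urllib.request.urlopen(url) as response:
        compressed = response.read()
    return json.loads(zlib.decompress(compressed))

def apply_change(change):
    """Stand-in for whatever the real extension would do to local pages."""
    print("applying", change["action"], "to", change["title"])

def mirror_pass(policy):
    """Apply each incoming change only if the local policy allows it."""
    for change in fetch_changes():
        if policy(change):
            apply_change(change)
        else:
            print("ignoring", change["action"], "to", change["title"])
```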
Imagine a world where this happens.
Imagine a world where the main Wikipedia issues a deletion out to servers around the world, and some follow it, and some do not, according to a set of rules on the mirrors, like “do not automatically delete any article that is more than 100 days old” or “do not delete this subset of articles under any condition”.
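Those rules don’t have to be anything fancier than a bit of local configuration the mirror consults before honoring a change. Continuing the made-up change format from the sketch above (the field names and example titles are assumptions):

```python
# One mirror's local deletion policy, continuing the sketch above. The field
# names (action, title, age_days) and the example titles are assumptions.

PROTECTED_TITLES = {"Demoscene", "Miss Piggy"}   # never delete these
MAX_AUTO_DELETE_AGE_DAYS = 100                   # old articles need a human

def deletion_policy(change):
    """Return True if this mirror should honor the incoming change."""
    if change["action"] != "delete":
        return True                              # edits pass straight through
    if change["title"] in PROTECTED_TITLES:
        return False
    if change.get("age_days", 0) > MAX_AUTO_DELETE_AGE_DAYS:
        return False
    return True

# mirror_pass(deletion_policy)  # plug it into the pull loop sketched earlier
```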
Imagine a world where these little Wikipedia mirrors have their own subsets of Wikipedia space that are different from Wikipedia, where thoughts other than the grey goo consensus of Wikipedia rule the day; where a separate “article space” exists there, which can be shared on other Wikipedias at will – demoscene space, muppet space, all the crap that Wikia offers in a commercial setting, except now being done by various vendors and non-profits, and not reliant on a single point of political failure?
Imagine these Wiki variants existing:
- PuritanWiki: Nothing with anything adult-oriented ends up being covered – people can send their kids to browse it for education and whatever other nanny-tistic approaches they want, and not be worried that their children will ever discover other people have genitals or how they can protect themselves from pregnancy any other way but never having sex, and they’ll never figure out who Hugh Hefner is.
- ScienceWiki: Perfect for people trying to find out about scientific information without having every single link end up somewhere between Deep Space Nine and Red Dwarf.
- FandomWiki: Every single last piece of every last pop culture world lives and breathes and may be the stupidest thing you can imagine, but people who want this are in heaven. Wikipedia may have long ago deleted every reference to every fake element in your favorite sci-fi show, but it lives on in this space.
Please don’t tell me this is technically impossible. Go back to school. It is not just technically feasible, it’s nearly trivial. There’s been so much advancement in compression, difference tracking, and network protocol hub-dubbery that this is the kind of project that could be done in beta by CS students as a final project. It would have scale and bugtracking issues, but it would work. Don’t even tell me that in a world where people can use GMail as a filesystem or jam with people in realtime or use processing over the web or any of a thousand other miracles we see every week, we can’t handle this. We can. The hurdles are political and mindset-related. Wikimedia isn’t going to want this – it’s more work and lessens control for their non-profit, without realizing that with collaborative networking comes competitive quality, and they merely have to maintain being the best to stay ahead and validate the millions. Jimbo Wales will be against it, because Jimbo is in it for Jimbo and something that takes control out from under Jimbo is not going to serve Jimbo. And Jimbo doesn’t like that. But these are minor hurdles in many ways, too.
If this had been in place, Jimbo doing this deletion binge would have been a minor setback – the mirrors would have retained or not retained the pictures and information, and the choices made by someone in the heat of fear for his little PR outlook would have been ignored or followed – but you know which servers I’d have wanted to be on. As further information fads infect Wikipedia (“oh my god, we need to delete everything about gay people or the United Arab Emirates will stop donating money”), this decentralized, mirroring, robust and variant ecosystem of interconnected Wikis would resist them, like the diseases they are.
Look up the history. Or don’t, and trust me.
Decentralize Wikipedia.
Now.
As I mentioned on Twitter, I’m totally with you on this one. Oddly enough, I had a rather spirited phone conversation with one of my cofounders today about the whole Usenet and BBS and World Wide Web thing, and pieces of it involved some of the discretionary redundancy that our NNTP-hosting (and often UUCP-using, before the advent of the Internet) overlords exercised.
Seriously, put together a more verbose spec for this, and I’ll provide some elbow grease to get shit done. It isn’t like I’m doing much else these days than being a complete brokeass with nothing interesting (and notable) to work on.
Interesting how Wikitruth predicted this sort of Wikipocalypse almost 4 years ago:
http://wikitruth.info/index.php?title=Wikipocalypse:Sexual_Revolution
Well, we’re pretty much already doing what you suggest. The 3 ideas you say people will think are impossible are already going on. Of course we don’t have a specialised extension that does these things automatically, but that’s just because I don’t see the point right now. EmuWiki is just as you describe: a specialised wiki that takes from Wikipedia and gives to Wikipedia also, but this specialised wiki constitutes a filter against these kinds of attempts at destroying data from a centralized perspective. Moreover, we have a perspective that goes beyond what the “Wikipedia rules” (written on a table corner) allow. For example, we are an actual encyclopedia as meant by the actual definition of encyclopedias: we care about preserving knowledge. Wikipedia does not. So in a lot of articles, we collect old versions of emulators that we would have difficulties adding to Wikipedia, not even the actual emulator file but just its description; some who-the-fuck-are-you guy would come and tell us that the emulator is not “notable” enough to be inside an encyclopedia. People who invented encyclopedias, like Denis Diderot, would go crazy if they saw what their invention became.
I agree with you on creating a distributed wikipedia, where each node can have policies regarding the kind of content they pull from others. I propose an extension to Wiki content, to combat deletionism: a markup for importance/notability, at both article and paragraph/sentence/link level. With this, one could present or pull only those things that are above the waterline – Qwikipedia could pull only the gist, sciencepedia would pull only a little material from Pokemonpedia etc.
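Roughly what I have in mind, sketched in Python; the {{notability|N}} tag syntax and the threshold numbers are placeholders I made up, not an existing MediaWiki feature:

```python
# Rough illustration of the "waterline" idea: hypothetical {{notability|N}}
# tags in the wikitext, and a mirror that keeps only paragraphs rated at or
# above its own threshold. The tag syntax is invented for this example.

import re

NOTABILITY_TAG = re.compile(r"\{\{notability\|([0-9.]+)\}\}")

def filter_by_notability(wikitext, waterline):
    """Keep paragraphs whose rating clears the waterline; untagged text
    defaults to 1.0 and is always kept."""
    kept = []
    for paragraph in wikitext.split("\n\n"):
        match = NOTABILITY_TAG.search(paragraph)
        rating = float(match.group(1)) if match else 1.0
        if rating >= waterline:
            kept.append(NOTABILITY_TAG.sub("", paragraph).strip())
    return "\n\n".join(kept)

# Qwikipedia might pull with waterline=0.8, Pokemonpedia with waterline=0.1.
```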
Although I think this is a neat idea, I think it damages the idea of Wikipedia. Wikipedia’s value is that there is a single, relatively ‘authoritative’ source that people consult. Everyone goes to the one place to get the current snapshot, feeling that it’s reached some sort of homeostasis from people arguing over the viewpoints.
If there are millions of different flavors, shards and varieties of this “knowledge,” you’ll end up with splintered, opinion-influenced versions, each jealously guarded by those who are interested in that particular area.
Right-wing vs. Left-wing versions, Apple vs. PC fans, who knows what else.
Wikipedia’s not perfect by a long shot, but the central point of reference does have value. The single point of failure, well, maybe not so much. And that’s what backups are for. =)
That’s why I’ve been calling WP an anachronism since 2003.
The Internet itself, with its decentralized structure where everyone connected to the network can communicate with everyone, publish independently and authentically, and be read by everyone, makes WP an anachronism.
WP is even an insult to human culture, which consists of independently functioning but interacting and cooperating human brains. There is no one superhuman brain the human race cannot exist without. No matter which human brain fails, the others replace it seamlessly. Carl-Heinz Schroth, a German actor, summed it up this way: “The gap we leave behind replaces us perfectly.”
So what WP would need the most – in case you don’t want to trash it – is pluralism in all central aspects: articles, discussion, opinions, etc.
But WP is a child of its history, and since nobody with any common sense stays with the human garbage that runs WP, that garbage determines the future of WP.
So the best you can do as a single person: stay away from WP and publish on your own.
it occurred to me quite a while ago that the best parts of the internet, architecturally, are those that re-implement the interesting parts of levels three and four in level five. the internet as a routing network is famously resistant to all sorts of shenanigans; any properly designed decentralized application protocol will follow suit. the best parts of the ’net are therefore usenet, email, irc, and bittorrent (with a special mention always reserved for freenet for being more the ’net than the ’net itself).
Others are thinking about this. See these proposals on Wikimedia’s own strategy wiki:
http://strategy.wikimedia.org/wiki/Proposal:Distributed_Infrastructure
and
http://strategy.wikimedia.org/wiki/Proposal:Distributed_Wikipedia
Also see the Levitation project to convert Wikipedia to a Git repository:
http://levit.at/ion/wiki/Main_Page
@Mark: Having many versions/flavors of wiki is actually a good thing. For one thing, it would be much more obvious that any one data source has its own slant, and Wikipedia is not exempt from bias. What you describe is exactly what a good library is all about – collecting varied descriptions of the same info, giving slightly different interpretations based on the reporter’s background and bias. Researching a topic is therefore the act of gathering data and insights from these various sources to create a new understanding of the topic. We don’t want the “one true book”, we want a whole library.
Critically, though, Usenet didn’t need referential integrity. Wikipedia does: most edits are to existing pages, not creating new pages. How do you maintain consistency? (DVCS is not a solution in and of itself: you’ll inevitably end up with merge failures wherever you fork pages away from Wikipedia.)
Check out http://www.fanlore.org for your FandomWiki, there. 😉 OK, so it’s mostly a wiki that’s about fan culture, and not one about a particular fan property. But I think that there’s something to be said for the very popular, and very strong, fandom wikis that exist – like Memory Alpha, etc. Fanlore is a little different because it’s supported by a nonprofit that makes it somewhat more stable, IMHO, than many large wikis – but, yknow.
(Of course, this says nothing to the actual problem that Fanlore itself, along with Memory Alpha etc., are all centralized. But there are some fairly large and authoritative fan wikis out there, and it seems appropriate to bring them up in this context.)
Usenet stopped being decentralized when the cost of getting a feed got way higher than the cost of reading it on a central server (DejaNews, Google Groups). You could stop spinning your own disks and stop maintaining a bunch of software and just point your browser at a web site.
Things of a wiki nature are decentralized by having multiple independent wikis, not all of which have the same scope or focus or even editorial control as Wikipedia. Arborwiki, which focuses on Ann Arbor and the surrounding area, will have in-depth detail that would be deleted out of Wikipedia as too irrelevant. The Muppet Wiki will tell you more details about Miss Piggy than you will ever get in Wikipedia.
The solution to Wikipedia is to let many wikis bloom; each with a name space that overlaps with Wikipedia, but which decides that some additional set of things are relevant and worth naming, and that some huge number of things that Wikipedia covers are irrelevant.
Ed
This entry made me say YES.
Part of what is scary about the Cloud (especially as concerns Google) is that people seem to think of the Cloud as decentralizing computing, but if you think about it, all one is doing is handing off one’s data to some company that will ostensibly keep it safe for you and not keep it safe from you and hold it for ransom, of course.
Hopefully folks make this connection too.
as for different versions, well, people could use some CRITICAL thinking. Wikipedia’s best use for me is as a linkfarm.
I really like the idea of a MediaWiki extension for mirrors which says “Don’t delete any article older than N days” — I hope that someone implements this before the next big Wikipedia disaster.
I really like this idea. However, I fear two things really have to be addressed before this is implemented:
1. How to keep the spam and vandalism out. Back in the Usenet heyday, spam was almost nonexistent, and most Internet users were using their connection for professional use. The in-flood of casual Internet users dramatically changed Usenet; see “Eternal September” (ironically, Wikipedia is a good start). If you think the occasional obscenity-shouting 12yo on Wikipedia is problematic, wait until that can be automated in ways you could never imagine.
2. How to keep information that parties do want to have synchronised up-to-date. Dates of birth and death, mathematical formulas, there is such a thing as purely objective data, and it would be a mighty shame if a distributed Wiki would require all this to be manually synchronised. If PuritanWiki is more up-to-date regarding factual data than HedonistWiki, guess which will get more traffic and which will thus have an easier time actually staying up-to-date and not dying a slow death?
Being the CS geek that I am, I immediately started thinking about sentence-based merging algorithms that can mostly solve #2 (and Ted Nelson’s Xanadu project probably did so), but that won’t help much if #1 remains.
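For the curious, a crude version of what I mean, under the heavy simplifying assumption that both forks keep the sentences in the same order and count and only rewrite some of them; real merging needs actual alignment, which is where the hard part lives:

```python
# Toy sentence-granularity three-way merge. Deliberately simplified: it
# assumes both forks keep the same number and order of sentences and only
# rewrite some of them. Splitting on "." stands in for real sentence
# detection; handling insertions and moves needs proper alignment.

def sentences(text):
    return [s.strip() for s in text.split(".") if s.strip()]

def merge(base, ours, theirs):
    merged, conflicts = [], []
    for b, o, t in zip(sentences(base), sentences(ours), sentences(theirs)):
        if o == t:            # both forks agree, changed or not
            merged.append(o)
        elif o == b:          # only the other fork touched this sentence
            merged.append(t)
        elif t == b:          # only our fork touched it
            merged.append(o)
        else:                 # both rewrote it: flag for a human
            merged.append(o)
            conflicts.append((b, o, t))
    return ". ".join(merged) + ".", conflicts
```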
Very inspiring post. Over the next few weeks, amongst a few other projects, I’ll be working on https://sourceforge.net/projects/distriwiki/ as a very simple proof-of-method distributed wiki server. It’ll be interesting to see what other implementations people come up with. Thanks for the idea!
@Tabsels: Merging algorithms fall apart fast as soon as any real reorganization (or, worse, rewriting) takes place on a fork.
[…] did you know Wikipedia is currently going through a controversy related to nudity? Anyway, link’s been rendered more unambiguously SFW. […]
To what extent should this technology be applicable to ordinary web site mirroring?
Here is my example (currently very basic) proposal for a distributed wiki protocol:
http://bytenoise.co.uk/DistriWiki
A decentralized Wikipedia would be great. It will never happen, of course, but it would be great.
But I don’t agree USENET is a model for Wikipedia or any such effort. I loved it in its day, but as you note, it had many problems of its own. There’s a reason usership of USENET has plummeted and Wikipedia is doing as well as it is – for the end user, it works much better. Wikipedia is plagued with many problems from within but because the users so greatly outnumber the contributors, from an end user perspective, the problems don’t matter.
Speaking of which, did you snag that graphic from Wikimedia Commons? For shame 😉
http://en.wikipedia.org/wiki/File:Usenet_servers_and_clients.svg
So, I just posted an update to Jason’s Wiki entry about this proposal – let’s see how long it lasts, eh?
It would have been good if your update was accurate. You said that I indicated Wikis should be mirrored via Usenet or Usenet Servers, when I meant that the decentralized model of Usenet should be emulated in maintaining copies of Wikipedia’s content, via extensions that feed between MediaWiki servers. But hey, it’s Wikipedia.
Sorry- I’ll go back on and fix it- great article though…
I saw the correction go by. Very appreciated.
No problem, sorry I got it wrong- it would be great to see something like this become a reality someday- and any entry on Wikipedia that brings up what a fascist blowhard Wales is is a plus in my book.
Man, very inspiring!
@Zoe Blade: My thought is that it’s good to look to the git (as in the version control system) way of doing things for the federation protocol (if you are going to be working on one :), each wiki can make itself a module of Wikipedia, or another wiki, or not, and pull/push from there. Allow forking an entire wiki, etc.
I took a look at your site; I think it would be wiser to build on MediaWiki (as a set of extensions) than to start from scratch. It’s a great project 🙂
@Jason
It would be awesome if you discussed this with wise people you know and wrote about the (preferred) technical details!
> Look up the history. Or don’t, and trust me.
You da man! \o/
Read the article, but comments… tl;dr
Anyhow, something like this will require a much better backend than the one MediaWiki currently has. That is, something capable of actual merges. I think Darcs got that pretty good for code (disclaimer: I never figured out how to actually *use* Darcs), but AFAIK it was still line-based. MediaWiki does something right because it can diff by character. Some combination would seem like a good idea. And in any case, there must be some UI for merging (or even *remerging* something the computer did wrong) by hand.
There are already different versions of Wikipedia with divergent content: The various language Wikipedias.
…which are all hosted at the same servers run by the same admins, run by the same rules.