Proceedings of the Hivemind

My partner just sent me a link to a post by Marcio von Muhlen called We Need a Github of Science. I only started using Github recently. For those new to Github (like me), it is a web-based hosting service for software development projects that use the Git revision control system. Basically, you can upload your code there, and anyone else can see it and its entire revision history, comment on it, and download it. Most interesting of all, anyone can make their own copy of it (forking) to correct mistakes or build their own extensions, and you can then merge their changes back into your original code. Github is built for collaboration.

A lot of what Marcio writes resonated with me, starting with this:

I believe it [Github] represents a demonstrably superior way of distributing validated knowledge than academic publishing.

It’s the “validated knowledge” bit that got me. The proof is in the pudding, as they say, and there is no better (nor more scientific!) way to validate knowledge than to show that it works. On Github I will know if it works by simply cloning it and trying it myself. But how many times have I come across a modelling paper and been unable to replicate the result (only to find out a few emails later that I was missing a key parameter value or assumption)? Github is a simple solution to that.

Marcio goes on to make an important observation about the difference between traditional publishing and the possibilities of modern technology:

The existing peer-review process arose from the limited carrying capacity of physical journals. Prioritization had to happen before publication … However, in the times we are living in, distributing media is basically free.

We no longer live in a world limited by page count; the Internet is infinite (more or less). The idea that a paper might be rejected because “space in our journal is limited” is nonsense (albeit polite and ego-saving nonsense). What they really mean is that this journal is a curator of good papers, and yours wasn’t one. Which returns us to the question of what it is to be “good”:

Prestige is really about having an engaged audience that follows and recognizes your activities.

This is the marker of a prestigious journal, that more people read and cite the papers in it. But what is it to cite a paper except to say, “I found this useful”? We use the impact factor of a journal as a proxy for the usefulness of the papers within it, but is that proxy really needed? Why not just go straight to the source?

As Marcio points out, page count is not limited, but the human capacity for reading pages is, so a process of prioritisation and authentication remains necessary. Yet that prioritisation can be the work of the hivemind, as Github demonstrates:

[publications to Github] are prioritized by the numbers of developers “watching” for updates or “forking” new development lines … the abundance of non-significant projects in GitHub does not detract from its usability, because those projects are never brought to anyone’s attention

So what should a Github for Science look like?

PLoS One comes closest to what I am describing, in that its peer-review process screens only for scientific rigour, not perceived impact, meaning it will publish content considered unsexy and let future citations determine importance. But it has not yet embraced the social web, as the lack of scientist profiles (with associated prestige metrics) on its website demonstrates.

And it occurs to me now that, barring special issues, I never search through a particular journal. I usually search by topic, ordered by citations. I often look up a paper I’ve found useful to see who has cited it since. Occasionally I search by author name. If my habits reflect current practice, the publishing structures can’t be far behind.