The Most Important Social Network: GitHub

by jgn on Saturday, July 14, 2012

Now that GitHub has raised $100M (http://go.bloomberg.com/tech-deals/2012-07-09-github-takes-100m-in-largest-investment-by-andreessen-horowitz/), I should publish this blog post I started a few weeks ago.

TL;DR: GitHub is the largest public repository of the everyday experience of work. Ever. If you're a scholar or journalist interested in collaboration, this is perhaps the most important archive you will find regarding what actually happens as people work together.


Facebook! Twitter! LinkedIn! VK! Renren! These are among the most famous and largest social networking platforms in the world. But are they important? Of course they are. They've changed the way humans interact. But let me challenge you and ask: Have they changed the way we work and think? I do think they have, to some extent.

But there is a social network whose sole purpose is facilitating the collaborative production of knowledge and invention: GitHub (http://www.github.com). GitHub currently has more than one million users. For comparison, in 2002 there were about 600,000 software developers in the United States, so GitHub's "mindshare" among the people who write code is clearly significant.

GitHub started off as a web-based platform for managing code sharing through a system called Git (http://git-scm.com/), which was created by Linus Torvalds, the inventor of Linux. Git is a distributed source code management system; the "distributed" part means that it can work without any central "master" repository. (This lack of a central repository / authority is itself an important aspect of work organization, but it's another story.)

What GitHub does is layer on top of Git a central place for exchanging code and discussing it. Many GitHub projects are private ("owned" by private groups or companies), but many others, if not most, are public. Everything that is contributed to these public projects can be studied in the open, with a full view of the history of its changes, and anyone can propose a change to such projects. Such changes can be commented upon and, when appropriate, merged into the code base instantly.

This is a big deal.

Let me tell you about knowledge production: much of it is private. I have a PhD in English and wrote a dissertation on the interaction between literary and medical knowledge in the sixteenth and seventeenth centuries. My research notes and revisions were essentially private. My drafts were my property. In certain highly ceremonial performances, I might share my "work in progress" with an individual (a faculty advisor or an eminent scholar or a friend who could provide feedback), or with a study group interested in the project, or from the lectern at a conference. But for the most part, sharing with the entire world happened at the moment of final "production," when the artifact was safely ensconced in the library or computer, and indexed by domain experts. This pattern is much the same in the social sciences and the sciences (the sciences are circulating more papers in pre-publication form, but the door is closed to full access to the laboratory).

Software was something like this for a long time. You'd write code, and when it was done, you'd share it with your colleagues. Your company might enforce code reviews, but the prospect for embarrassment was so high that the code that got reviewed tended to be pretty polished; and if someone proposed a change, there would be a good deal of mental friction. In the end, software production was even more closed off than scholarly production. For example, the code for most operating systems was not available to the public -- and certainly not all of the internal commentary leading to the final product. I remember that in the 1970s it was a big deal when I got legal access to the source code of the Digital Equipment Corporation's VAX/VMS operating system (on microfiche).

The advent of open source software changed some of that, but, still, discussion of the code was often ancillary, in separate systems (email, online chat, discussion boards, and the like). For example, when Linux was announced, it happened on an email list. Today, the announcement of a new project would happen by a number of means, but the beginnings of the code project would exist at GitHub. GitHub puts the social exchange at the very center. I suspect that GitHub's servers now contain the world's largest corpus of commentary around intellectual production. As I see it, the production of code has been wildly accelerated since the advent of the open source model; but the emergence of GitHub has been like throwing gasoline on that fire. In the code I live in -- Ruby and Rails and its associated libraries -- development has been proceeding at an unparalleled pace, largely encouraged by the collaborative model at GitHub.

In GitHub, every single change to the code is tracked and can be discussed at the most minute level.

Here are a few examples of the kinds of comments you might read on GitHub:

Discussion of a contribution to a project: https://github.com/copycopter/copycopter-server/pull/67

Comment on a single line: https://github.com/rails/rails/pull/6348#discussion-diff-848099

Based on this discussion, the code change might be removed or altered. Frequently there are replies and further discussion right here at the line level.

Discussion of a potential bug: https://github.com/rails/rails/issues/7034

Jokes, meta-discussion, and watercooler talk: https://github.com/sstephenson/rbenv/pull/30

And so forth.

But sometimes the discussion goes far and wide. For example, here's a major discussion of product direction, erupting from a significant change at the code level: https://github.com/rails/rails/commit/9f09aeb8273177fc2d09ebdafcc76ee8eb56fe33

While this discussion does not quite descend to the level of Godwin's Law (http://en.wikipedia.org/wiki/Godwin's_law - after enough time, someone makes a comparison to the Nazis or Hitler), it does manage to include edited pictures of Darth Vader and Stalin with ironic commentary. Much of the commentary here shifts into a discussion of community and the mode and means of communication itself. Arguably, this is ground zero for the construction and destruction of the community of contributors and users of the project.

I think these discussions go to the core of so many issues we deal with every day at work. Let me list a few:

And I am sure there are many more dimensions to think about. Many claims are made about the nature of communication in online communities. But the GitHub difference is the overtly purposeful nature of the communication. Yes, I know that conversations on Facebook and Twitter have purposes, but at GitHub, there is real pressure to move a project along and keep it alive. If you're a scholar interested in computer-mediated communication, you ignore GitHub at your peril.

Increasingly we are seeing the GitHub model adopted elsewhere, for instance at Docracy (http://www.docracy.com/ - for legal documents), but for sheer volume and diversity, GitHub is the place. GitHub is writing -- and writing about writing. It can be analyzed with a microscope, but there is also an API that allows machine analysis of the corpus (see http://developer.github.com/v3/). I took a quick spin through the recent tables of contents of major journals in rhetoric, composition, and technical writing, and I don't see much if anything regarding GitHub. Scholars, hop to it.
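
For anyone curious what "machine analysis of the corpus" might look like in practice, here is a rough sketch in Python. It is only an illustration, not a recommended research method. It assumes the API's "list comments on an issue" endpoint (which, as far as I know, also covers the main conversation on a pull request), and the function names comments_url, fetch_comments, and summarize are made up for this example. It fetches a single page of results, unauthenticated, so rate limits and pagination are left out.

```python
import json
import urllib.request
from collections import Counter

# Assumed root of the public GitHub REST API.
API_ROOT = "https://api.github.com"


def comments_url(owner, repo, number):
    """Build the URL that lists conversation comments on an issue or pull request."""
    return f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/comments"


def fetch_comments(owner, repo, number):
    """Download one page of comments as a list of dicts (needs network access)."""
    request = urllib.request.Request(
        comments_url(owner, repo, number),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "corpus-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(comments):
    """Count comments and words per author from already-parsed comment dicts."""
    comment_counts = Counter()
    word_counts = Counter()
    for comment in comments:
        # Fall back to "unknown" if a comment carries no user record.
        author = (comment.get("user") or {}).get("login", "unknown")
        comment_counts[author] += 1
        word_counts[author] += len((comment.get("body") or "").split())
    return comment_counts, word_counts
```

Calling summarize(fetch_comments("rails", "rails", 7034)) would tally who said how much in the "potential bug" discussion linked earlier. Keeping the download separate from the counting means the counting can be tried on saved data without touching the network.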
