The Most Important Social Network: GitHub

by john on July 14, 2012

Now that GitHub has raised $100M (http://go.bloomberg.com/tech-deals/2012-07-09-github-takes-100m-in-largest-investment-by-andreessen-horowitz/), I should publish this blog post I started a few weeks ago.

TL;DR: GitHub is the largest public repository of the everyday experience of work. Ever. If you’re a scholar or journalist interested in collaboration, this is perhaps the most important archive you will find regarding what actually happens as people work together.



Facebook! Twitter! LinkedIn! VK! Renren! These are among the most famous and largest social networking platforms in the world. But are they important? Of course they are. They’ve changed the way humans interact. But let me challenge you and ask: Have they changed the way we work and think? I do think they have, to some extent.

But there is a social network whose sole purpose is facilitating the collaborative production of knowledge and invention: GitHub (http://www.github.com). GitHub currently has more than one million users. In 2002, there were about 600,000 software developers in the United States, so clearly GitHub’s “mindshare” is highly significant for the people who write code.

GitHub started off as a web-based platform for managing code sharing through a system called Git (http://git-scm.com/), which was created by Linus Torvalds, the inventor of Linux. Git is a distributed source code management system; the “distributed” part means that it can work without any central “master” repository. (This lack of a central repository / authority is itself an important aspect of work organization, but it’s another story.)

What GitHub does is layer on top of Git a central place for code exchange and discussion of the code. Many GitHub projects are private (“owned” by private groups or companies), but many others, if not most, are public. Everything that is contributed to these public projects can be studied in the open, with a full view of the history of its changes; and anyone can propose a change to such projects. Such changes can be commented upon and, when appropriate, merged into the code base instantly.

This is a big deal.

Let me tell you about knowledge production: much of it is private. I have a PhD in English and wrote a dissertation on the interaction between literary and medical knowledge in the sixteenth and seventeenth centuries. My research notes and revisions were essentially private. My drafts were my property. In certain highly ceremonial performances, I might share my “work in progress” with an individual (a faculty advisor or an eminent scholar or a friend who could provide feedback), or with a study group interested in the project, or from the lectern at a conference. But for the most part, sharing to the entire world happened at the moment of final “production,” when the artifact was safely ensconced in the library or computer, and indexed by domain experts. This pattern is much the same in the social sciences and the sciences (the sciences are circulating more papers in pre-publication form, but the door is closed to full access to the laboratory).

Software was something like this for a long time. You’d write code, and when it was done, you’d share it with your colleagues. Your company might enforce code reviews, but the prospect for embarrassment was so high that the code that got reviewed tended to be pretty polished; and if someone proposed a change, there would be a good deal of mental friction. But in the end, software production was even more closed off than scholarly production. For example, the code for most operating systems was not available to the public — and certainly not all of the internal commentary leading to the final product. I remember in the 1970s that it was a big deal when I got legal access to the source code to the Digital Equipment Corporation’s VAX/VMS operating system (on microfiche).

The advent of open source software changed some of that, but, still, discussion of the code was often ancillary, in separate systems (email, online chat, discussion boards, and the like). For example, when Linux was announced, it happened on an email list. Today, the announcement of a new project would happen by a number of means, but the beginnings of the code project would exist at GitHub. GitHub puts the social exchange at the very center. I suspect that GitHub’s servers now contain the world’s largest corpus of commentary around intellectual production. As I see it, the production of code has been wildly accelerated since the advent of the open source model; but the emergence of GitHub has been like throwing gasoline on that fire. In the code I live in — Ruby and Rails and its associated libraries — development has been proceeding at an unparalleled pace, largely encouraged by the collaborative model at GitHub.

In GitHub, every single change to the code is tracked and can be discussed at the most minute level.

Here are a few examples of the kinds of comments you might read on GitHub:

Discussion of a contribution to a project: https://github.com/copycopter/copycopter-server/pull/67

Comment on a single line: https://github.com/rails/rails/pull/6348#discussion-diff-848099 Based on this discussion, the code change might be removed or altered. Frequently there are replies and further discussion right here at the line level.

Discussion of a potential bug: https://github.com/rails/rails/issues/7034

Jokes, meta-discussion, and watercooler talk: https://github.com/sstephenson/rbenv/pull/30

And so forth.

But sometimes the discussion goes far and wide: For example, here’s a major discussion of product direction, erupting from a significant change at the code level: https://github.com/rails/rails/commit/9f09aeb8273177fc2d09ebdafcc76ee8eb56fe33
While this discussion does not quite descend to the level of Godwin’s Law (http://en.wikipedia.org/wiki/Godwin's_law – after enough time, someone makes a comparison to the Nazis or Hitler), it does manage to include edited pictures of Darth Vader and Stalin with ironic commentary. Much of the commentary here shifts into a discussion of community and the mode and means of communication itself. Arguably, this is ground zero for the construction and destruction of the community of contributors and users of the project.

I think these discussions go to the core of so many issues we deal with every day at work. Let me list a few:

  • Public vs. private
  • Ownership of intellectual production
  • Who is commenting, and what kinds of comments are they? (gender, race, experience, etc.)
  • Tempo of collaboration
  • Praise and blame
  • Honor and shame

And I am sure there are many more dimensions to think about. Many claims are made about the nature of communication in online communities. But the GitHub difference is the overtly purposeful nature of the communication. Yes, I know that conversations on Facebook and Twitter have purposes, but at GitHub, there is real pressure to move a project along and keep it alive. If you’re a scholar interested in computer-mediated communication, you ignore GitHub at your peril. Increasingly we are seeing the GitHub model adopted elsewhere, for instance at Docracy (http://www.docracy.com/ – for legal documents), but for sheer volume and diversity, GitHub is the place. GitHub is writing, and writing about writing. It can be analyzed with a microscope, one comment at a time, but there is also an API that allows machine analysis of the corpus (see http://developer.github.com/v3/). I took a quick spin through the recent tables of contents of major journals in rhetoric, composition, and technical writing, and I didn’t see much, if anything, regarding GitHub. Scholars, hop to it.
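
As a rough illustration of what machine analysis could look like, here is a minimal Python sketch that tallies who is commenting on a single issue or pull request and how much each person writes. It assumes the v3 endpoint for listing issue comments (`GET /repos/{owner}/{repo}/issues/{number}/comments`) and the `user.login` and `body` fields of each returned comment. The helper names and the sample comments are made up for the example, and only the first page of results is fetched.

```python
import json
from collections import Counter
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def comments_url(owner, repo, number):
    """Build the endpoint that lists comments on one issue or pull request."""
    return f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/comments"


def fetch_comments(owner, repo, number):
    """Download the first page of comments (unauthenticated, so rate limits apply)."""
    request = Request(
        comments_url(owner, repo, number),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "comment-corpus-sketch",  # hypothetical client name
        },
    )
    with urlopen(request) as response:
        return json.load(response)


def summarize(comments):
    """Return (comments per commenter, words per commenter) as two Counters."""
    posts = Counter()
    words = Counter()
    for comment in comments:
        # A comment may lack a user (deleted account) or a body; fall back gracefully.
        login = (comment.get("user") or {}).get("login", "unknown")
        posts[login] += 1
        words[login] += len((comment.get("body") or "").split())
    return posts, words


# Hypothetical sample data in the shape the API is assumed to return.
SAMPLE = [
    {"user": {"login": "alice"}, "body": "Looks good to me"},
    {"user": {"login": "bob"}, "body": "Can we add a test?"},
    {"user": {"login": "alice"}, "body": "Done"},
]

posts, words = summarize(SAMPLE)
for login, count in posts.most_common():
    print(f"{login}: {count} comments, {words[login]} words")
```

To try it on real data, replace `SAMPLE` with, say, `fetch_comments("rails", "rails", 7034)`, the bug discussion linked earlier. Unauthenticated requests are rate-limited, so a serious study would need authentication and pagination.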

  • http://shawnyeager.com Shawn Yeager

    Thank you for writing this, John. It’s a very informative, thought-provoking piece, and goes a long way toward explaining why GitHub was able to raise that impressive round of funding.

  • Jed Harris

    Does anyone know any scholars using this material? Or planning to use it?

    Where is there movement toward open collaboration / discussion outside of software and Wikipedia?

  • http://7fff.com jgn

    Thanks. The comments on Hacker News are pretty funny – Guess I should have been even more blunt and obvious: http://news.ycombinator.com/item?id=4244226

  • Louie

    Excellent insight. Of course, to analyze exchanges on GitHub knowledgeably, scholars would have to know something about the code/language under discussion, no? To reach that goal, most scholars in rhetoric, composition, and tech writing will indeed have to hop to it! BTW, I introduced an SVN repository into my electronic textual editing course to manage collaborative work on a TEI encoding project (https://code.google.com/p/stephens-letters/), but I didn’t push the advantages of comment and discussion. My bad. Doing so would have enriched the course. Next time. And I guess I need to explore the relationship between an SVN repository and GitHub, though our encoding application ( comes with an SVN client, so there’s good reason to stick with that for our class projects. Thanks for the post.

  • http://7fff.com jgn

    Thanks, Louie. I hope you can send this out to people like Andrea, et al.

    As for the code/language under discussion: It is amazing the number of comments that are in the “phatic” dimension — a lot of the discussion isn’t even about the code. I think there are some insights to be drawn without even knowing what’s being talked about explicitly.

    For your electronic text editing course: GitHub is pretty amazing — you could set up a private repository for $7, and all of your students would be able to participate. Each project gets a Wiki as well.

  • http://philgo20.com/ philgo20

    And as more and more people publish plain-text editorial pieces or other types of content on GitHub, it creates an incredibly social collaborative space.

  • http://www.justarandomguy.com Akshay Bist

    Nice post. Really made me think.
    Correct me if I’m wrong, but the metric you’re using is the end product that comes out of the interaction on a social network, right?
    In that case, I agree that nothing comes close to GitHub as a social network, at the moment.

  • http://7fff.com jgn

    Right. People actually produce “something.” I don’t want to denigrate what comes out of Twitter and Facebook — but I think the fact that GitHub is dedicated to “making things” makes it shine as something to observe, understand, and study.

  • http://blog.rubydubee.com Pradyumna Dandwate

    I see your point! I love GitHub, but I would dispute the claim that “it’s the most important social network,” even for programmers!
    I still think StackOverflow or StackExchange in general is the most important social network. Nice post BTW.

  • Franko

    In order to use GitHub you need to understand Git, a version control system. While this kind of system could also be very useful for scholars, it is unlikely that they will adopt it, because the great majority of them don’t know any of these systems. For this reason I guess that GitHub will remain a tool used by programmers. The other reason is that Git ultimately requires the use of a command-line interface (CLI), and this is considered too difficult by many non-programmers.

    The other limitation of Git is that it works fine only with files in text format. For people used to working with files in Word format or similar, this isn’t going to work at all. A textual format is the necessary condition for comparing different versions of the same file and merging multiple changes. This is the case for the many scholars who use LaTeX, but many others still use M$ Word.

    If word processor applications used an intelligent format like HTML with CSS this would be possible, but unfortunately M$ chose a proprietary binary format that, in principle, can be opened only with M$ software. The same is true of other popular commercial applications.

  • Jed Harris

    All interesting points. However, there are lots of disciplines where scholars use computers and can handle a CLI. (Although there are also good GUIs for Git.) And in some disciplines (math, statistics, computer science, economics, etc.) there are significant sub-groups that use LaTeX or other text formats for their papers.

    I think the academic norms against open collaboration are a big problem — they are pretty deep rooted. There may be some erosion, for example with the various Polymath projects. But change will be slow.

    I would very much like a WYSIWYG word processor with clean HTML + CSS format output, by the way, even if it required some extended attributes or whatever. I don’t understand why there isn’t one.

  • http://7fff.com jgn

    Franko & Jed: Arguably — and I am not making the argument — the fact that academics write in MS Word format, RTF, and the like, inhibits collaboration. It is a real problem that it is so hard to compare documents, quote via a link, etc. Were academics more deeply invested in true collaboration (and I think there are good reasons to think that their interests are opposed to collaboration-in-public owing to the practices around promotion and tenure) they might prefer text-based documents. Incidentally, if you want to see an area where academic technology is truly screwed up, check out the conventions for bibliographic citation, and the software that is supposed to facilitate that: It’s a minefield.

  • http://twitter.com/fettemama fettemama

    lol m(
