Looking Outwards – Dataset of GitHub projects’ activities (and swear words by programming language)
http://corte.si/posts/code/devsurvey/index.html
This guy has a really cool data set of over 5000 “active” Git repositories. He pretty much did the first three of Ben Fry’s steps for making an info-vis: acquire, parse, and filter. He produced some basic statistics on the data, but there’s probably a lot more interesting information hiding there!
My favorite is the “Number of swear words per 1000 commits by language.” I remember an old JavaScript/PHP web app I made where I didn’t have a dev environment set up, so I had to commit to the actual server to see results. Every time I had to debug something I’d end up with 50 or so commits just on that issue… and many of the comments were filled with cursing.
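For anyone curious how a stat like that could be computed, here is a minimal sketch in Python. It is not the author's actual method: the word list, the `(language, message)` input layout, and the function name `swears_per_1000_commits` are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical word list; the survey's real list is not reproduced here.
SWEAR_WORDS = {"damn", "crap", "hell"}


def swears_per_1000_commits(commits):
    """Return {language: swear words per 1000 commits}.

    `commits` is an iterable of (language, commit_message) pairs,
    an assumed layout rather than the data set's actual format.
    """
    totals = defaultdict(int)  # commits seen per language
    swears = defaultdict(int)  # swear words seen per language
    for language, message in commits:
        totals[language] += 1
        words = re.findall(r"[a-z']+", message.lower())
        swears[language] += sum(1 for w in words if w in SWEAR_WORDS)
    # Normalize so languages with different commit counts are comparable.
    return {lang: 1000 * swears[lang] / totals[lang] for lang in totals}


# Tiny made-up sample, just to exercise the function.
sample = [
    ("PHP", "damn, fix the login form again"),
    ("PHP", "typo"),
    ("JavaScript", "what the hell is this crap"),
    ("Java", "refactor parser"),
]
rates = swears_per_1000_commits(sample)
```

Normalizing by commit count is what makes the per-language numbers comparable; a raw count would mostly reflect which languages have the most commits in the data set.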
Nice!
Interesting to see Java doing pretty well there. Quite surprising!