Looking Outwards – Dataset of GitHub projects’ activities (and swear words by programming language)
http://corte.si/posts/code/devsurvey/index.html
This guy has a really cool data set of over 5000 “active” Git repositories. He pretty much did the first three of Ben Fry’s steps for making an info-vis: acquire, parse, and filter. He produced some basic statistics on the data, but there’s probably a lot more interesting information hiding there!
My favorite is the “Number of swear words per 1000 commits by language.” I remember an old JavaScript/PHP web app I made where I didn’t have a dev environment set up, so I had to commit to the actual server to see results. Every time I had to debug something I’d end up with 50 or so commits just on that issue… many of the commit comments were filled with cursing 🙂
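For anyone curious how a "swear words per 1000 commits by language" number could be computed, here is a rough sketch. It is not the original author's code: the word list, the `(language, message)` input shape, and the function name `swears_per_1000_commits` are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical word list, chosen only for illustration.
SWEAR_WORDS = {"damn", "crap"}


def swears_per_1000_commits(commits):
    """Return {language: swear words per 1000 commits}.

    `commits` is assumed to be an iterable of (language, message) pairs.
    """
    commit_counts = defaultdict(int)
    swear_counts = defaultdict(int)
    for language, message in commits:
        commit_counts[language] += 1
        # Lowercase and strip simple punctuation so "Damn!" matches "damn".
        words = (w.strip(".,!?") for w in message.lower().split())
        swear_counts[language] += sum(1 for w in words if w in SWEAR_WORDS)
    # Normalize by commit count so languages with many commits are comparable.
    return {
        lang: 1000 * swear_counts[lang] / commit_counts[lang]
        for lang in commit_counts
    }


# Made-up sample data, not taken from the real data set.
sample = [
    ("php", "damn this bug"),
    ("php", "fix typo"),
    ("java", "add tests"),
    ("java", "crap, damn!"),
]
rates = swears_per_1000_commits(sample)
```

Normalizing per 1000 commits is what makes the languages comparable, since a raw count would mostly reflect which language has the most commits.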
Nice!
Interesting to see Java doing pretty well there. Quite surprising!