Ben Gotow-Potential data sources
SMS Message Logs:
For the last couple of weeks, I’ve been working on a tool that syncs SMS messages off Android phones and stores them on your computer. As a result, I have a data set of thousands and thousands of text messages I’ve sent and received, along with their contents and timestamps. I’d like to create an infographic that explores SMS usage by drawing messages as they travel between phones in a graph. I’d like to visually represent the social network that can be inferred from the frequency of communication between people, and also look at how SMS “conversations” (in which many messages are sent back and forth in quick succession) can be used to gauge the complexity of the relationship between two individuals.
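As a rough sketch of how both ideas could be computed from such a log: message counts per contact give the edge weights of the social graph, and a time-gap rule splits each contact's messages into "conversations". The tuple layout, the sample rows, and the 10-minute gap are all assumptions made for illustration; the sync tool's actual schema isn't described here.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical log format: (contact, direction, timestamp).
messages = [
    ("Alice", "sent", datetime(2011, 1, 10, 9, 0)),
    ("Alice", "received", datetime(2011, 1, 10, 9, 2)),
    ("Alice", "sent", datetime(2011, 1, 10, 9, 3)),
    ("Bob", "received", datetime(2011, 1, 10, 12, 0)),
    ("Alice", "sent", datetime(2011, 1, 11, 18, 30)),
]

def edge_weights(messages):
    """Count messages per contact; each count is one edge weight in the graph."""
    return Counter(contact for contact, _, _ in messages)

def conversations(messages, gap=timedelta(minutes=10)):
    """Split each contact's messages into bursts separated by more than `gap`."""
    by_contact = {}
    for contact, _, ts in sorted(messages, key=lambda m: m[2]):
        bursts = by_contact.setdefault(contact, [])
        if bursts and ts - bursts[-1][-1] <= gap:
            bursts[-1].append(ts)   # still within the same conversation
        else:
            bursts.append([ts])     # quiet period exceeded: start a new one
    return by_contact

print(edge_weights(messages))
print({name: [len(b) for b in bursts]
       for name, bursts in conversations(messages).items()})
```

The number and length of bursts per contact could then serve as one crude measure of how "conversational" a relationship is, alongside the raw message count.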
GPS data of runners in Pittsburgh
Garmin Connect is a website that allows people to track their outdoor exercises (running, biking, hiking, etc.) using the GPS signals recorded by Garmin devices. The site has recorded more than 657,000,000 miles of movement by its users while exercising, and I’d like to synthesize that information to produce a visualization of this style (http://www.vimeo.com/10199455) that shows the activity of runners within the city of Pittsburgh.
Price of flights out of this winter crap over time
I’d like to write code that periodically looks at the price of plane tickets between Pittsburgh and warmer cities and provides a visualization of the average change in price in the weeks leading up to the flight. This visualization would be a map of the United States. Arrows going from Pittsburgh to other cities would change in thickness based on the price of those flights, and the user would be able to scrub through time to see prices change.
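A minimal sketch of the aggregation step, assuming the periodic price checks have already been collected as (destination, departure date, date observed, price) rows. The row layout, airport codes, and prices below are invented for illustration; how the prices would actually be fetched is left open.

```python
from collections import defaultdict
from datetime import date

# Hypothetical observations: (destination, departure date, date observed, price in USD).
observations = [
    ("MIA", date(2011, 2, 25), date(2011, 2, 4), 220.0),
    ("MIA", date(2011, 2, 25), date(2011, 2, 11), 250.0),
    ("MIA", date(2011, 3, 4), date(2011, 2, 18), 270.0),
    ("SAN", date(2011, 2, 25), date(2011, 2, 11), 340.0),
]

def average_price_by_lead_time(observations):
    """Average price per (destination, whole weeks before departure)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of prices, count]
    for dest, departure, observed, price in observations:
        weeks_out = (departure - observed).days // 7
        bucket = totals[(dest, weeks_out)]
        bucket[0] += price
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

for (dest, weeks_out), avg in sorted(average_price_by_lead_time(observations).items()):
    print(f"{dest}: {weeks_out} week(s) out, average ${avg:.2f}")
```

Each averaged value could then drive the thickness of the arrow to that city at the corresponding position of the time scrubber.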
Hi Ben,
All of these ideas are interesting, so I’d be grateful to know which one you’ve settled on. I have a mild preference for the SMS idea since you’re closely invested in the data.
Regarding text messages, I like the idea of displaying a social graph of your messaging log. But there are also some interesting opportunities for displaying text content: characterizing, somehow, the vocabulary and tone you use to communicate in this medium. A tag cloud would be a simple start, but you could also look at various open-source tools for sentiment analysis, reading grade-level evaluation, etc., for analyzing this corpus.
Check out the Stanford CoreNLP tools (http://nlp.stanford.edu/software/corenlp.shtml) for example!
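For the tag-cloud starting point, the word counts can be computed with nothing beyond the Python standard library. This is only a sketch: the sample messages, the tiny stopword list, and the function name are made up for illustration, and it does not use CoreNLP.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "to", "and", "i", "you", "is", "it", "at", "in"}

def tag_cloud_counts(texts, top_n=5):
    """Return the top_n most frequent non-stopword words across all messages."""
    words = []
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(w for w in tokens if w not in STOPWORDS)
    return Counter(words).most_common(top_n)

sample = ["Running late, see you at lunch", "Lunch at noon?", "late again, sorry"]
print(tag_cloud_counts(sample, top_n=2))
```

The counts map directly onto font sizes in a tag cloud; sentiment or reading-level scores would need one of the dedicated tools mentioned above.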