Ben Gotow-SMS Visualization
The Idea
What can you infer about someone’s social network from their text messaging activity?
A few months ago, I started working on an app that syncs text messages from an Android phone to a desktop client for the Mac. The idea was to decouple text messaging from the phone, enabling the user to have a conversation anywhere and seamlessly transition between messaging on the phone and messaging on a laptop or desktop.
While developing the application, I noticed that the thousands of messages synced by the app revealed interesting trends about my conversational patterns, and it seemed like a perfect data set for a visualization.
The Data
The application downloads all the user’s messages from their phone and stores them in an SQLite database. A text dump from this database formed the data used in the visualization. The format of the text dump is shown below:
6157141096:::Allison:::1:::119:::87
Fields are separated by ‘:::’. The columns, in order, are:
- The phone number messaged
- The display name of the user messaged
- The origin of the message (0 = your phone, 1 = theirs)
- The frame number on which the message should appear in the animation. This is calculated by taking the timestamp of when the message was sent or received, subtracting the timestamp of the first message in the animation and dividing by an acceleration factor.
- The length of the text content in the message.
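As a rough illustration of the format above, here is a minimal Java sketch that parses one line of the dump and computes a frame number. The class, field, and method names (SmsRecord, parse, frameFor) are made up for this example and are not taken from the actual sketch. The timestamp unit and the acceleration value are also assumptions, since the post only says the time offset is divided by an acceleration factor.

```java
import java.util.regex.Pattern;

/** One row of the ':::'-delimited text dump (names here are illustrative). */
final class SmsRecord {
    final String phoneNumber;   // column 1: the phone number messaged
    final String displayName;   // column 2: display name of the contact
    final boolean fromContact;  // column 3: origin, 0 = your phone, 1 = theirs
    final int frame;            // column 4: animation frame for the message
    final int length;           // column 5: length of the message text

    SmsRecord(String phoneNumber, String displayName, boolean fromContact,
              int frame, int length) {
        this.phoneNumber = phoneNumber;
        this.displayName = displayName;
        this.fromContact = fromContact;
        this.frame = frame;
        this.length = length;
    }

    /** Parses a single line such as "6157141096:::Allison:::1:::119:::87". */
    static SmsRecord parse(String line) {
        // Pattern.quote keeps ':::' from being treated as a regex.
        String[] cols = line.split(Pattern.quote(":::"), -1);
        if (cols.length != 5) {
            throw new IllegalArgumentException("Expected 5 columns: " + line);
        }
        return new SmsRecord(
                cols[0].trim(),
                cols[1].trim(),
                cols[2].trim().equals("1"),
                Integer.parseInt(cols[3].trim()),
                Integer.parseInt(cols[4].trim()));
    }

    /**
     * Frame number as described in the text: the offset from the first
     * message, divided by an acceleration factor. Units are assumed to
     * match between the two timestamps (e.g. both in milliseconds).
     */
    static int frameFor(long timestamp, long firstTimestamp, long acceleration) {
        return (int) ((timestamp - firstTimestamp) / acceleration);
    }
}
```

Calling SmsRecord.parse on the sample line above gives a record for Allison, marked as coming from her phone, on frame 119 with a length of 87. A display name that itself contains ‘:::’ would break this simple split, so a real exporter would need to escape it.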
The Visualization
The Discoveries
The biggest discovery along the way was that Processing is pretty cool and very easy to use. I have a lot of experience working with OpenGL and Mac OS X’s Quartz2D APIs, and Processing was a nice surprise. I was able to go from concept to an early working version of the visualization in one afternoon. My one big complaint is that there’s no built-in debugger whatsoever. Coming from a programming background, that feels pretty primitive. I’ve heard you can use Eclipse somehow, so I’ll try that next time.
I was unsure of how to create a graph of interconnected nodes in Processing. I wanted to create one dynamically, without knowing in advance how many nodes would be needed, and I didn’t want to write the layout code myself. I thought that using some sort of spring physics model would allow the graph to be self-organizing. I did some searching and found the Traer Physics library, which I dropped into my Processing libraries folder and imported into my sketch. I then bound each contact object to a node in the physics simulation. That was it. There was much rejoicing.
Each node was added to the physics model as a solid body, and negative attractors (repulsive forces) were added between each pair of nodes to make them spread out evenly. Springs were added between each node and the center ring. This turned out to be a great solution because the resting length of a spring could be adjusted to move its node toward or away from the center. I’d wanted to do this from the start, and using springs meant that part of the visualization animated smoothly, too.
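To make the idea concrete, here is a small stand-alone Java sketch of the same kind of model: every node is tied to the center by a spring with its own rest length, and every pair of nodes repels. It does not use the Traer Physics API and is not the code from the sketch. The class name RingLayout and all the constants (stiffness, repulsion strength, damping) are assumptions picked only so the toy simulation settles.

```java
/** Toy 2D spring-and-repulsion layout; NOT the Traer Physics library. */
final class RingLayout {
    final int n;
    final double[] x, y, vx, vy, restLength;
    double springK = 0.1;      // spring stiffness (assumed value)
    double repulsion = 500.0;  // pairwise repulsion strength (assumed value)
    double damping = 0.8;      // fraction of velocity kept each step (assumed)

    RingLayout(int n, double initialRestLength) {
        this.n = n;
        x = new double[n]; y = new double[n];
        vx = new double[n]; vy = new double[n];
        restLength = new double[n];
        for (int i = 0; i < n; i++) {
            // Start the nodes on a tiny circle around the center (0, 0).
            double angle = 2 * Math.PI * i / n;
            x[i] = Math.cos(angle);
            y[i] = Math.sin(angle);
            restLength[i] = initialRestLength;
        }
    }

    /** Distance of node i from the center. */
    double radius(int i) {
        return Math.hypot(x[i], y[i]);
    }

    /** Advances the simulation by one time step. */
    void step() {
        double[] fx = new double[n];
        double[] fy = new double[n];
        for (int i = 0; i < n; i++) {
            // Spring to the center: pulls or pushes toward the rest length.
            double d = Math.max(radius(i), 1e-6);
            double f = -springK * (d - restLength[i]);
            fx[i] += f * x[i] / d;
            fy[i] += f * y[i] / d;
            // Repulsion between each pair of nodes (inverse-square falloff).
            for (int j = i + 1; j < n; j++) {
                double dx = x[i] - x[j], dy = y[i] - y[j];
                double r = Math.max(Math.hypot(dx, dy), 1.0);
                double g = repulsion / (r * r);
                fx[i] += g * dx / r;  fy[i] += g * dy / r;
                fx[j] -= g * dx / r;  fy[j] -= g * dy / r;
            }
        }
        for (int i = 0; i < n; i++) {
            vx[i] = (vx[i] + fx[i]) * damping;
            vy[i] = (vy[i] + fy[i]) * damping;
            x[i] += vx[i];
            y[i] += vy[i];
        }
    }
}
```

Changing restLength[i] between steps lets node i drift smoothly toward its new distance from the center instead of jumping there, which is the effect described above. In the real sketch the library's springs and attractions play these roles.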
My original idea was to represent each message sent or received as an arc between nodes. However, I wasn’t sure whether Processing would be able to handle drawing the number of curves required at a decent framerate. With a data set of over 2300 messages, I was pretty sure it would become unworkably slow. Big surprise, it’s Java, and it did. I had to add a shortcut to disable lines so I could rapidly test the visualization.
The Critique
Overall, I’m pretty happy with the visualization. I was able to animate it, and it achieved the initial goal of revealing the social network implied by your messaging habits. There are a few things I’d like to explore in the data that the visualization doesn’t reveal, though. There’s a lot of data in the actual text contents of the messages that would be fun to look at. How often do people use emoticons? Do you use emoticons more frequently when the person you’re talking to also does? Is there a bimodal distribution in message length that implies that some messages are part of complex multi-message conversations while others are simple “pings”? Answering those questions would require other visualizations, I think, but I’m really curious.
The Code
Dependencies: The Processing applet requires the Traer Physics library.
Example Code Used: The code below draws on a large amount of sample code, from Processing.org and from the documentation of the Traer Physics library. The Processing “Load File 2” example was particularly useful. The code for the wavestream was written from scratch (in a rather hacky way). I’m still looking for a good library that creates them!
The source code is available here
A note about source data: Unfortunately, the source data for this experiment contains sensitive data including people’s phone numbers and names. I’ll be releasing the SMS synchronizing app for Mac and Android soon and that will allow you to gather and format your own messaging history for visualization.