Caitlin Boyle :: Project 2 InfoViz
My idea came from…, various exercises in frustration. In a way, the hardest part about this project was just committing to an idea… Once my initial project fell through, my attack plan fell to pieces. I’m not used to having to think in terms of data, and I think I got too wrapped up in the implications of “data”. Really, data could have been anything… I could have taken pictures of my belongings and made a color map, or done the same for my clothing; but in my head, at the time, I had a very specific idea of what the dataset I was searching for was, what it meant, and what I was going to do once I found it. I think stumbling upon the ruins of the government’s bat database put me in too narrow a mindset for the rest of the project… for a week after I realized the batdata wasn’t going to fly, I went looking for other population datasets without really questioning why, or looking at the problem another way. It took me a little longer than it should have to come back around to movie subtitles, and I had to start looking at the data before I had any idea of what I wanted to visualize with it. My eventual idea stemmed out of the fluctuation of word frequency in different genres; what can you infer about the genre’s maturity level, overarching plot, and tone by looking at a word map? Can anything really be taken from dialogue, or is everything in the visuals? The idea was poked along with thanks to Golan Levin and two of his demos; subtitle parsing and word clouds in processing.
Data was obtained… after I scraped it by hand from www.imdb.com ‘s 50 Best/Worst charts for the genres Horror, Comedy, Action and Drama. .srt files were also downloaded by hand because I am a glutton for menial tasks I’m a novice programmer, and was uncomfortable poking my nose into scripting. I just wanted to focus on getting my program to perform semi-correctly.
Along the way, I realized… how crucial it is to come to a decision about content MUCH EARLIER to open up plenty of time for debugging, and how much I have still to learn about Processing. I used a hashtable for the first time, got better acquainted with classes, and realized how excruciatingly slow I am as a programmer. In terms of the dataset itself, I was fascinated by the paths that words like “brother, mother, father” and words like “fucking” took across different genres. Comedy returns a lot of family terms in high frequency, but barely uses expletives; letting us know that movies that excel in lewd humor (Judd Apatow flicks, Scary Movie, etc.) are not necessarily very popular on imdb. On the other hand, the most recurring word in drama is “fucking”, letting us know right away that the dialogue in this genre is fueled by anger.
All in all I think I gave myself too little time to answer the question I wanted to answer. I am taking today’s critique into consideration and adding a few things to my project overnight; my filter was inadequate, leaving the results muddied and inconclusive. I don’t think you can get too much out of my project in terms of specific trending; the charm is in it’s wiki-like linking from genre-cloud, to movie titles, to movie cloud, to movie titles, to movie cloud, for as long as you want to sit there and click through it. I really personally enjoy making little connections between different films that may not be apparent at first.
Subtitle Infoviz ver. 1 video
Post-Critique (coming soon) :: more screenshots/video/zip coming soon… making slight adjustments in response to critique, implementing labels and color, being more comprehensive when filtering out more common words. I plan to polish this project on my own time.
Hi Caitlin, here are comments from the PiratePad:
label the genres
^^ Agreed
nice hyper linking
frequency threshold might help in combination to the word length threshold
what is the reason for the color shifting, random?
The use of screens is a little confusing. It would be nice if you could duplicate that interation all in one area. This is really interesting though. I think the colors could have been more meaningful.
I like the ability to link your way through this wikipedia style. >>agreed
Is there any meaning for the colors? I would like to see the frequency of the words for further comparison.
Need to label your genres in your small multiple tag clouds.
It’s important to filter these results by the most common words (which you have done — by eliminating the common words). But there’s also a chance to find out what the most common words in movies are, as different from written text. The word please is something like the 108th most common word — that’s why it keeps showing up at the top of your tag clouds. I’m not sure it’s a good conclusion about movies.
Would have been nice to show the selected word in Red. etc.