Words from Beyond Hope

by Cheng @ 4:53 am 3 February 2010

Words from Beyond Hope is an attempt to visualize the last statements made by prisoners shortly before they are executed.

The Texas Department of Criminal Justice maintains a list of executed offenders’ last statements, along with each offender’s gender, race, photo, crime, and victims, and even their unappetizing last meal request. I keep a collection of related links on my blog post here. A sample statement is here:

It strikes me how the simplest words repeat across all the statements. This is partly due, as Golan pointed out, to limited education. Partly, I think, it is because these are the most honest statements, flowing right out of the mind with no rhetorical decoration.

My design tries to strike a balance between preserving the repelling reading experience and showing the similarity shared among all those executed. I envisioned a canvas with words appearing at the rate of a heartbeat, with a slight kinetic effect to suggest a heartbeat, life, and time elapsing. The words then gray out and scatter around the screen, fading a little, as the lives did. With font size proportional to occurrence frequency, the piece immediately highlights the words common to all the last statements at first glance. If viewers stay with it, they can read the statements word by word, as if watching them being composed. In this way new information is conveyed to the viewer, and the interaction can carry on for a longer period of time.
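For illustration, the frequency-to-font-size mapping might look like this in Python (a rough sketch with made-up point sizes and function names, not the project’s actual code):

```python
from collections import Counter

def word_sizes(statements, min_pt=12, max_pt=72):
    """Map each word to a font size proportional to how often it
    appears across all statements; the most frequent word gets max_pt."""
    counts = Counter(w for s in statements for w in s.lower().split())
    top = counts.most_common(1)[0][1]
    return {w: min_pt + (max_pt - min_pt) * n / top for w, n in counts.items()}

# Tiny example corpus; the real data is the full set of last statements.
sizes = word_sizes(["i love you all", "i love my family", "thank you all"])
```

Scanning the resulting dictionary immediately surfaces the common words, which is exactly the first-glance effect described above.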

Although a number of compromises had to be made in the final piece, one detail came as a small surprise to me: the collection of the most common words seems to form a message that is almost the average of the statements themselves.

Image below: at a glance, you’ll read “I love you all”, “god love you all”, “I love my family”, “thank you”… the very phrases that’ll stay in your head as you read the actual statements.


The discrete popping of words doesn’t do a great job of suggesting that they form actual readable sentences, and it loses the power of the raw statements. Giving the words a smooth fade in and out, and allowing them to stay “in focus” longer, may be a quick fix to encourage actual reading.

Project 1: Minute II

by areuter @ 11:56 pm 31 January 2010

Minute (II) is an interactive application which allows the viewer to examine how people’s backgrounds affect their perception of time–in this case, a minute. Previously, in Minute (I), I asked several people to perceive a minute while I recorded them, and afterward, fill out a survey on their background (do they drink coffee, how much did they sleep last night, etc).  From this I created a video in which the participants’ minutes are arranged in a grid; their exact placement roughly determined by one aspect of their background.  There appeared to be some correlation, but it was difficult to tell due to the limitations of using a discrete grid structure on a continuous set of data.  For this iteration, I decided to break away from the grid and place the minutes along the x-axis based on the currently selected background criteria (the y-axis is random).  Most importantly, I implemented my goal of making the information interactive (using openFrameworks), so that the viewer can investigate the specific background information that they believe might influence a person’s perception of time.  This also makes it possible to include a huge database of minutes, and then randomly select a subset of them to play each time the application is run.  One other modification was moving the actual elapsed time from the center of the screen to the bottom so that more emphasis is placed on the participant’s relative perception of time (potentially a result of their background) than how “accurate” they are.
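The x-placement is just a linear mapping from the selected background criterion into screen space; a minimal Python sketch of the idea (the piece itself is written in openFrameworks, and the parameter names here are mine):

```python
def x_position(value, lo, hi, width=1024):
    """Linearly map a background value (e.g. hours of sleep last night)
    onto the x-axis; in the piece itself, y is chosen at random."""
    return (value - lo) / (hi - lo) * width
```

Switching the selected criterion just swaps which value, lo, and hi are fed in, which is what makes the interactive exploration cheap to implement.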

Please contact me if you would like a copy of the application; the ZIP is pretty large (74 MB)…

Learning some math, any suggestions?

by jsinclai @ 4:35 pm

Hey all,

So I’m certainly a bit behind on my computer graphics and math skills. I haven’t taken any math since single-variable calculus in high school, and I’ve never taken a graphics course. Does anyone know any good tutorials or crash courses from which I could learn some of the principles?

I might even consider buying a textbook if it’s a really useful reference.

Thanks for any direction,
~Jordan Sinclair

Project 1 – Inbred Music

by jsinclai @ 8:26 am 27 January 2010

So, I have an awful taste in music. I am absolutely in love with an obscure type of techno called Happy Hardcore. Essentially, it’s underground British rave music. I don’t know how it happened, but it did, so I have to live with it.
Anyways! Ishkur’s guide to electronic music says that “ALL the world’s Happy Hardcore is made by only 12 guys, who have more pseudonyms than a shark has teeth, and who churn it out at such a feverish pace you’d almost think that there’s probably a program that makes it for them. Just randomize the key values, get Sugar from YTV to sing the lyrics, and away you go.”

I wanted to know if this is true.
Is my favorite genre of music controlled by 12 individuals who lack any sense of originality and who must rely on each other to create anything considered music?

I pulled a ton of data from Discogs (like IMDB for music). The data was poorly structured, and not very crawlable, so I had to do a bit of manual tinkering around to download a single artist. Once I had an artist’s real name, I could then pull their data (all their aliases and all the groups they were in) using the Discogs API, which returns some awfully structured XML…
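A rough Python sketch of extracting an artist’s aliases and groups from such XML; note that the element names below are invented for illustration and do not match the real Discogs schema:

```python
import xml.etree.ElementTree as ET

# Illustrative only: these element names do NOT match the real Discogs schema.
SAMPLE = """
<artist>
  <name>Real Name</name>
  <aliases><name>DJ A</name><name>DJ B</name></aliases>
  <groups><name>Duo X</name></groups>
</artist>
"""

def aliases_and_groups(xml_text):
    """Pull an artist's alias names and group names out of the XML."""
    root = ET.fromstring(xml_text)
    return ([n.text for n in root.findall("aliases/name")],
            [n.text for n in root.findall("groups/name")])
```

Given a real name, two lists like these are all the visualization needs to connect solo work to group work.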

The visualization itself shows three columns of information. The first shows every artist with a circle; the size of the circle represents the number of releases under their solo name and all their solo aliases. The third column shows the artists again, but this time the circle represents the number of tracks released by the groups they were in. The middle column shows all the groups, again sorted by the number of tracks released by that group.

One of the main problems I had was having many dimensions in my data and wanting to show all of them. I had things like number of tracks, number of aliases, tracks per alias, number of groups, tracks per group, and members per group, and I probably could have figured out a way to see who remixed whom (but that might be a separate project…). In the end, I realized I needed to focus on just one piece: how artists are connected to each other through groups. Everything else at that point was peripheral. Unfortunately, I already had a substantial amount of code and couldn’t bring myself to start over. I was then stuck with a visualization that isn’t very scalable (there are still many more artists with a low number of releases that were not included).

Regardless, I think this visualization is a great start. I’d love to keep working with this data if I had more time, to further realize this vision.

Future Steps:
-Comparison should be much easier. I had a few mechanisms in place, but as the codebase grew, they broke and didn’t scale.
-I’m really interested in the differences in “inbredness” across genres, so I’d love to put a couple of these next to each other.
-The current visualization isn’t scalable at all. I think the best way to scale something like this would be to use something similar to the “well-formed eigenfactor” visualization we saw, though I’m still unable to conceptualize how I could tie my data into it.

Download Source code and Data Files, as well as the song used 🙂

Project 1 – Pin Numbers

by jmeng @ 8:18 am

I examined people’s PIN numbers for the patterns people tend to lean towards when using a 10-digit keypad arranged in a 3 x 3 grid (with an outlying zero). I wanted to examine trends in the numbers and the patterns “drawn” when typing in PINs, to see whether these trends changed by sex (more or less patterned), and finally, to see if PINs are really as diverse as we assume they should be and what would be considered the “safest” PIN.

I created a survey through Google Docs Forms and posted it on a Facebook event, as a question on Yahoo! Answers, in an email to my sorority d-list, and as part of a blog post on this course’s blog. I imagine that most of the people who filled out the survey were Facebook invitees. 600 people were invited to the event, 164 people filled out the survey, and 71 were willing to give up their PINs. All the information gathered can be viewed here.

I tried to look for patterns by grouping the numbers in different ways:

Benford’s Law – “The first digit is 1 almost one third of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than one time in twenty” (Wikipedia).
The image above is an analysis of the numbers, splitting them into pairs of digits (blue = first two digits, red = middle two digits, green = last two digits) so that all values are between 0 and 99. The height of each bar is the number of times that value appeared in a PIN. Benford’s Law was quite accurate for the data, but I wanted to look at more patterns in the hand movement used when typing in the numbers.
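The expected Benford distribution is easy to compute for comparison; a small Python helper (not part of my project code):

```python
from math import log10

def benford(d):
    """Expected frequency of leading digit d (1-9) under Benford's Law."""
    return log10(1 + 1 / d)
```

For example, benford(1) is about 0.301 (almost one third) while benford(9) is under 0.05 (less than one time in twenty), matching the quoted statement of the law.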

I then looked for patterns in where each digit was used within the PIN. The image above graphs the digits 0–9 against their densities by position in the PIN (blue = first digit, red = second, green = third, purple = fourth).

I finally used Processing (for a loooong time) to create the series of images featured above. The leftmost image is an analysis of all the data collected, the middle image is for female data only, and the rightmost image is for male data only. The four keypad representations, from left to right, show which digit was used at each position in the PIN (the leftmost pad is the first digit, the rightmost pad is the fourth). The topmost row shows the densities of each digit for all data in that set; the darker the color, the denser the population. The following rows show repeated digits: which digits were repeated, with what density, and where in the PIN. I found that 50 out of 71 people (70%) had repeating digits in their PINs, including 19 out of 28 males (68%) and 31 out of 43 females (72%), two of whom had a digit repeated 3 times.
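The repeated-digit tally is straightforward to reproduce; a Python sketch of the check (the project itself used Processing, and the function name is mine):

```python
def repeat_fraction(pins):
    """Fraction of PINs that contain at least one repeated digit."""
    with_repeat = sum(1 for p in pins if len(set(p)) < len(p))
    return with_repeat / len(pins)
```

Run over the 71 collected PINs, this is the statistic reported above (50/71, about 70%).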

I am currently working on, but have not finished, an interactive model that graphs digit densities based on digits that the user inputs. I will probably show an incomplete version in class if possible.

I think my project was fairly successful. I was very shocked to see that so many people were willing to give up such private and valuable information, and online, no less. I am pleased with the data I found and can only imagine that there are many more number patterns hidden that I did not find. I really like the three images produced above but would’ve really enjoyed being able to interact with something. I have only used Processing a few times before and tried to challenge myself to analyze all of the data in Processing, not in Excel or by just looking at the numbers. In retrospect, I probably should have decided what I wanted to visualize first and then turned to Processing, instead of using Processing to find patterns and see if they made an interesting visualization, but I did learn a lot about the language and environment that I hope to make use of in other projects.

project files

Project 1: Pursuit of Happiness

by davidyen @ 8:16 am

(My project also has issues viewing in a browser due to loading textfiles. Here’s my project.)

My project looked at visualizing people’s happiness, as it relates to their occupation and their salary. I used two different sources of data: University of Chicago’s General Social Survey, a large, comprehensive opinion survey conducted pretty regularly since 1972, and the US Bureau of Labor Statistics’s Occupational Employment and Wage data. Among routine questions including occupation, age, gender, etc., the GSS has a question about how happy people are in their life. Cross referencing this with the salary statistics from BLS, I hoped to gain insight into whether people were really happier if they earned more money.

I think I underestimated the complexity of the data and the interface necessary to effectively explore it, so I didn’t manage to fully implement some features I had intended. Overall, I’m pretty satisfied with the outcome as far as representing my idea, given the time, even though it is incomplete and there are some technical and design issues.

My greatest trouble with this project was working with the data. The GSS and BLS data had to be comparable across occupations and span many years (I originally planned 1972–2008); however, BLS only provides 1998–2008, and the GSS provides data only every other year starting in the ’90s. The main issue by far was the occupational codes. Standards changed every few years, and I had to write 7 different parsers in Processing to convert the data into something usable, and even then I found that there were errors in the data translation (you may notice that postmasters earn an unusual amount of money in my viz).

Some features that I plan to implement are sorting the data, scraping BLS.gov for occupational descriptions to give a little insight into each job, and a more organic softbody interaction with the bubbles. I had originally placed a lot of emphasis on getting the softbody interaction to really capitalize on the buoyancy as happiness idea, as well as have the various occupations/industries pushing past each other.

Project 1: Contrast

by jedmund @ 8:05 am

Contrast is a tool that compares tagged images from Flickr across different days. All of the search information is user-defined. The ultimate goal was to let the user clearly see the difference in color between the two dates entered. I haven’t quite made it there yet, but even through the visual “map” created from the small thumbnails, you can begin to see how the colors on these two days might differ.

At first I wanted to make a highly interactive piece in Processing, but in the end I’m a lot more proficient with PHP. Making it in PHP meant I’d run into far fewer problems and, ultimately, have a lot more to show. Regardless, speed is still a real issue (stemming from generating a color palette for each image), so that’s something I hope to work on in the next few days. I’d also like to port it to Processing as time allows, to get some really nice interactions that PHP doesn’t allow.

I’ll post more as I update it!
download contrast

Project 1 – ESPN Jersey Numbers

by ryun @ 7:27 am


My project is about jersey numbers. When I was a middle school student, the NBA was hugely popular in Korea, and I was one of its big fans. Back then, I noticed that a lot of famous players had numbers such as 23, 32, 33, and 34, maybe because the basketball heroes had those numbers (e.g. Kareem Abdul-Jabbar wore 33, Magic Johnson wore 32, and so on). In this project I wanted to see whether there are patterns in popular jersey numbers by sport, position, age, and even salary. That is how I came up with this idea.


It took a while to make such a huge data sheet. From the ESPN website, I copied and pasted team by team. Finally I had information for almost all current players, including their age, position, height, weight, jersey number, and even salary.

In total, 3,272 players (383 NBA players + 1,261 MLB players + 1,628 NFL players). I was able to get all the NBA players’ pictures, but due to time limitations I decided not to handle the other sports. This is why you do not see pictures of MLB and NFL players.

I used Processing to visualize this data. I categorized the data by number, sport, position, experience, salary, and so on, and applied a color code to make the visualization clearer.
There are two graphs – one shows which numbers are popular digit by digit (i.e. 34 is counted as a 3 and a 4), and the other shows the popular numbers as a whole. Each sport has options so that you can sort and filter the players by position, experience, and salary.
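Counting numbers both digit-by-digit and as wholes might look like this in Python (a sketch of the idea; the project itself was done in Processing):

```python
from collections import Counter

def jersey_counts(numbers):
    """Count jersey numbers two ways: as whole numbers, and digit by
    digit, so that 34 contributes to both the 3 and the 4 tallies."""
    whole = Counter(numbers)
    digits = Counter(int(d) for n in numbers for d in str(n))
    return whole, digits
```

These two counters are exactly the inputs for the two graphs described above.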

As I expected, I was able to see certain patterns for each sport. I want to show some interesting ones here.

As far as I know, NBA players can wear any number they want unless another player on the team already has it. But surprisingly, you can see that numbers over 50 are not very popular.

Position-wise, you can observe that smaller players such as guards wear small numbers (light color) and big players such as centers have big numbers. Is this a coincidence?

There is a strong pattern: pitchers (light color) have numbers in the 25–65 range, while infielders and outfielders have small numbers (1–25).

Another interesting thing in MLB is that new or low-salary players mostly wear big numbers. I assume these are temporary numbers: other players already wear the popular ones, so newcomers are left with the leftovers. This indicates that big numbers are not very popular in MLB either.

In the NFL I found a very strong pattern. Because NFL players wear numbers assigned by position, you can see this interesting visualization.

According to Benford’s Law (the first digit is 1 almost one third of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than one time in twenty), I was not able to find strong support in this data. But I can guess that MLB and NBA players tend not to choose large numbers. Some articles indicate that number 12 is the most popular in the NBA, but not in my data; FYI, numbers 1 and 7 were the most popular.
It was fun to see such patterns, but I wish I had more time to play with it. Maybe I could have compared more sports across more categories. From an HCI perspective, the interface has things to fix and improve: the cursor shape for mouseover features, indicators on the graphs’ Y axes, and perhaps more effective colors for comparison. But given the limited time, I am pretty happy with this result.


Source file (16.2M)
Presentation slide (6.5M)

project 1 – infovis.

by Mishugana @ 7:24 am


Assignment 1: The Modes of William Shakespeare

by Michael Hill @ 7:20 am

From the beginning I was quick to discover that I would have to pull back on the scope of this project.  I simply wanted to display too much information.  Initially I had intended to show a wide variety of statistics along with each character’s influencing dialogue.  This proved to be too much.  Instead I chose a more straightforward approach and found a way to display the play itself in a unique way.

While prototyping this display method, I discovered a few things I thought were particularly interesting:

Poetry From "A Midsummer Night's Dream"

The picture above shows lines of poetry spoken within “A Midsummer Night’s Dream”.  I found this visual representation of speech rather unique.

One Partial Scene and Three Whole Scenes From "Titus Andronicus"

Another feature I found interesting was the obvious distinguishing line between scenes (as seen above).

Aftermath and Breakdown

Overall, I am happy with my final “poster” results.  I do wish that I had been able to make this an interactive piece (which I may go back and do for posterity’s sake).  One feature I would have liked to have been able to include is a “Last Words” pop up which would show a character’s last words in the play.

It would also be nice to figure out a method by which to display an entire play on screen at one time and still have it be relatively readable.  I found that it was nearly impossible to display any of the plays I tested in a window less than 2000 pixels wide.  Images ranging from 10k–20k pixels in width were the best for display, but were impractical when it came to navigating.


The Complete Works of William Shakespeare came as a single file from Project Gutenberg:


I then removed the Sonnets and extra text placed there by Project Gutenberg.

During the process of parsing the large “Book” of plays, as well as each of Shakespeare’s plays individually, I found Java’s Pattern class documentation particularly helpful.

My Book Parser

My Play Parser

Please be sure to check out the full version of the PDF displayed above.  Google’s PDF reader is not capable of properly displaying a file of that size.  Thanks! -M

presentation slides (.ppt, .pdf)

Project 1: Words Across Culture

by aburridg @ 6:56 am

Here’s a link to my project >applet< .

And…here are some screen shots for each emotion:


As mentioned, I used Mechanical Turk to collect most of the information. I was able to get responses from roughly 500 people total in 1–2 weeks, from all over the world (which is pretty cool). I only ended up using about 300 of the responses – some participants did not fill out the survey correctly, and I had too many participants from the US and India (150–200 of the responses I used were from US and India participants). For each country, I rounded the numbers to 20 to help my program run a little faster, but the trends are all still visible.

Figuring out how to make pie charts was the most interesting/challenging part next to collecting the data. I got a lot of help from these two sites: the Processing arc() reference and the Processing pie chart example. Yep – this project was written in Processing. I had a lot of fun with it… and I learned to appreciate it more after seeing how easy it is to load data from a file (Processing has a lot of cool formatting tricks for that).
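The heart of a pie chart is converting counts into the start/stop angles that arc() expects; a Python sketch of that step (the project itself was written in Processing, and the function name is mine):

```python
from math import pi

def pie_angles(counts):
    """Start/stop angles in radians for each wedge of a pie chart,
    the pairs you would feed to something like Processing's arc()."""
    total = sum(counts)
    angles, start = [], 0.0
    for c in counts:
        stop = start + 2 * pi * c / total
        angles.append((start, stop))
        start = stop
    return angles
```

Each country’s response counts for a word become one list of wedges, so drawing the charts side by side is just calling this per country.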

You can’t really tell from the pictures, but this project is interactive… the “next >” button turns white if you mouse over it, and if you click it, another word appears with its respective pie charts. I wanted to make the pie charts morph from word to word, but I decided to keep it simple and clean. I think with the colors and the way the information is displayed, the pie charts look aesthetically pleasing on their own.

So, why is the information I have interesting/useful?

Well, you can learn a lot about how colors and concepts are interpreted internationally. You can also see the relations between different concepts and colors. Some are rather obvious: the more aggressive concepts (anger, fear) are usually associated with red and black. You can also see that for words with no cultural context, people chose colors that suited their own experiences. For example, with confidence, my participants were all over the board; from their comments, I learned that most chose their respective colors because the color looked good on them or because they associated it with a good memory.

That being said, a lot of the trends you see in the data are pretty obvious. For example, for “Jealousy” all the Western cultures predominantly picked green due to the phrase “green with envy”. And most people associate “Anger” with red because “people’s faces turn red” or “red is the color of blood”.

Another interesting observation is with “Happiness”. Most Western cultures associate happiness with yellow, while other cultures associate it with green, because green seems to represent fertility and prosperity according to their religions.

I did want to display the comments–but a lot of them were redundant. I might add them on later…because I plan to continue with this project and collect more data to use for my final project.

I did think I showed my data broadly and accurately. And, I did get some interesting results. I like the way the pie charts work–they’re very easy to compare against each other.

However, I don’t think 300 data points is necessarily enough to make an argument. I think I was a little too ambitious and probably should’ve stuck to pulling something off the internet (however, I don’t regret my choice too much, because I learned a LOT about obtaining my own data, and the process, though trying, was fun!). I also wish I could make the display a little more interesting.

Jon Miller – Project 1

by Jon Miller @ 5:26 am

Get Adobe Flash player

I have attached a zip file here: (link) containing an html file that opens the flash object without resizing it. Download both, then open the html file. Thanks everyone for the positive feedback!

I wanted to explore something that might be completely new to people, including myself. Password lists are widely available online, however, they are predominantly used to gain access to accounts, and occasionally viewed as a curiosity. I decided to look at the content of the most commonly used passwords to see what I would find.

After exploring the various databases for a while, I decided to sort a database of the 500 most popular passwords by category, myself and with the help of a friend. Although this introduces some personal bias into the data, I felt it would be more useful this way, letting us see what people find important and compare the relative popularity of related things. For example, while it might not make much sense to compare “132456” with “jordan”, it could be more interesting to compare “mustang” directly to “yamaha”.

I also included a sample of a much larger database, so that people can observe random passwords scrolling by. Having looked by now at several hundred passwords, I have come to appreciate the value of simply reading them, recognizing things such as childhood artefacts and obscure motivational phrases.

People most often use (presumably) their first names. Names which coincide with famous people (for example, Michael Jordan) are more popular. Others, such as Thomas, are simply very common first names. Other very popular choices are profanity and sexually related words, which perhaps shows what people prefer to think about when they think no one is looking. Other major categories include references to pop culture, sports-related passwords, and incredibly easy-to-guess passwords such as “password”. This might reflect apathy about, or ignorance of, the ease with which one’s account can be cracked. However, it might also reflect the fact that these passwords come from sites which were hacked – most likely social networking or other noncritical websites. A password list from, say, a bank or a company would be less likely to contain such obvious passwords.

Looking at individual passwords, we can see that many of them educate us about popular culture: for example, “ou812” is an album by Van Halen, and “srinivas” is a famous Indian mandolin player. Of particular curiosity is “abgrtyu”, which is not a convenient sequence of keys like “asdf” or “qwerty”, has no apparent cultural origin, and yet is still in the top 500 list. One theory is that it was repeatedly autogenerated by spam bots creating accounts. Another is that it is a fake password, added to the list to prevent plagiarism of this particular list, similar to the way real dictionaries add an imaginary word so that thieves can be easily caught.

We can delve further into the categories and look at what people seem to value in their vehicles and brand names – there are American-made sports cars at the top, with higher- and lower-end vehicles appearing further down the list. Curiously, “Scooter” appears 5th – perhaps because of its recognition as a band as well?
Looking at the randomized database of several million passwords, there are many more references to things, some of which I recognize and many of which I don’t. They range far and wide, from minor characters in videogames to storybook villains. Many passwords here are similar to the top 500 (which should come as no surprise).

This journey has been a highly speculative one, involving many Google searches leading to cultural references and lots of browsing over seemingly random assortments of words and phrases. It is refreshing to see that people choose more positive things than negative ones (for example, “fuckme” is soundly ahead of “fuckyou”, though both are popular passwords), and it was interesting to reflect on my own choice of passwords.

I chose to program this in Flash, not because I felt it was most suitable for the task (given its lack of file I/O, it could be argued that it is the least suitable), but because I want to become a better Flash programmer.

Further steps would be to make the interface interactive somehow, so that people on the internet could sort the words their own way, a bit like refrigerator magnets. That way people could see what everyone thinks, rather than deal with my personal opinions on how the words should be sorted. Perhaps user-submitted passwords could also be added to the list.

Project 1- Color Changing Fruit

by caudenri @ 3:30 am

Download the project here…

I’ve been thinking about doing this project for quite a while in the general sense of measuring the color change in foods over time. For this project, I attempted to measure the change of color in an apple, an avocado, and a banana while exposed to air over time. In the future I’d really like to try measuring the color of many other things.

To collect the data, I made a white foamcore box and attached a webcam facing in, so I could put whatever I wanted to photograph in there and somewhat control the lighting while keeping a consistent background. I set the camera to take an image every 10 seconds for the apple and avocado, and every 5 minutes for the banana, since I figured it would take longer. I still underestimated the amount of time I would need to achieve a dramatic change in color; when I go back to revise this project, that will be the first thing I work on. As you see, the banana’s color didn’t seem to change much, and it represents over 48 hours of image-taking.
Making images of the apple

I was a little disappointed with the color quality – namely, it’s very hard to detect much of a difference from the beginning of the gradient to the end, and the colors are very muted. I used hexadecimal color strings so that I could easily save them to a text file and read them back, but I may try using HSB and see if I get a little better color quality and distinction. This problem could also be fixed by using a better camera and taking the images over a longer period of time.
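The hex round-trip and an HSB conversion could be sketched like this in Python (illustrative only, using Python’s colorsys rather than Processing’s color mode; function names are mine):

```python
import colorsys

def hex_to_rgb(h):
    """Parse an 'RRGGBB' hex string into 0-255 RGB components."""
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def hex_to_hsb(h):
    """Convert the same string to HSB (hue, saturation, brightness in 0-1)."""
    r, g, b = (v / 255 for v in hex_to_rgb(h))
    return colorsys.rgb_to_hsv(r, g, b)
```

Working in HSB would let the gradient exaggerate hue or brightness differences that are hard to see in raw RGB.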

I had wanted to end up with Photoshop color swatch tables or gradients that users could plug in, but I simply ran out of time, and I plan to go back and add this to the project. Another thing I wish I had done is be more careful with the lighting when taking the images – I covered the box where the images were being taken, but I think they were still affected by lighting changes in the room, which you can see where there is an abrupt hue change in the gradient.

Project 1 – kayak.com visualization

by rcameron @ 3:28 am

Download OSX Executable (~1MB)

The setup: you click your location and, based on data pulled from Kayak, Bezier curves are drawn to possible destinations. The weight of each line represents how cheap or expensive the ticket is. Thicker = cheaper.
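The thicker = cheaper mapping is a simple inverse interpolation; sketched in Python (the weight range and names here are my own, not the project’s):

```python
def stroke_weight(price, cheapest, priciest, w_min=1.0, w_max=8.0):
    """Map a ticket price onto line thickness: thicker = cheaper."""
    t = (price - cheapest) / (priciest - cheapest)  # 0.0 for the cheapest fare
    return w_max - t * (w_max - w_min)
```

Normalizing against the cheapest and priciest fares in the current data set keeps the thickest and thinnest lines meaningful no matter which weekend is polled.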

On the back end, I wrote a Ruby script that polled for flights between the cities shown on the map. Since Kayak only allows 1,000 hits/day, I had to limit it in some way. That constraint led to only looking for flights for the next weekend.

My biggest disappointment was not getting the Bezier curves to animate in Processing. On top of that, the current city choices are pretty limited.

Tesserae: Making Art Out of Google Images

by Max Hawkins @ 2:47 am


After the critique last week I decided to change my topic from mapping the colors of the tree of life to comparing the meaning of words in context. This idea came from the realization that images of closely-related species take on different colors based on whether that species lives in captivity or in the wild. One chimpanzee had a green hue in the visualization whereas its close siblings were all yellow. It turned out that the difference was due to green leaves in the first chimp’s environment and the yellow zoo-bedding in its siblings’ environments.

My final visualization (named Tesserae, the Latin word for tiles) was created to display these differences visually and allow people to test hypotheses about the contextual meanings of words like the names of the two chimps. It uses an HTML interface that allows users to type in related words and see their visualizations side-by-side.

One frustration I had with the previous project was that the color average used to color the nodes lost the rich textures of the underlying images, leading to an uninteresting visualization. To remedy that problem I took 15×15 pixel chunks out of the source images, preserving the textures while obscuring the overall image. This allows the visualization to become a composite that abstracts away the meaning of the individual pieces, encouraging the user to focus on the relationships between images rather than the images themselves.
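The chunk-sampling step might look like this sketch (a plain nested list stands in for an image's pixel grid here; the real project operates on actual image data client-side):

```python
import random

def sample_tile(pixels, tile=15):
    """Pull one tile x tile chunk from a 2D pixel grid at a random offset."""
    h, w = len(pixels), len(pixels[0])
    y = random.randint(0, h - tile)
    x = random.randint(0, w - tile)
    return [row[x:x + tile] for row in pixels[y:y + tile]]

# A fake 100x100 image where each "pixel" stores its (row, col) coordinate
img = [[(r, c) for c in range(100)] for r in range(100)]
chunk = sample_tile(img)
print(len(chunk), len(chunk[0]))  # 15 15
```

Tiling many such chunks from the eight source images gives the mosaic its texture while hiding each photo's overall subject.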

Since all of the images are downloaded and assembled client-side, no two visualizations are the same. The tiles are arranged randomly on load, and the images themselves change as Google's search index updates.


In practice Tesserae is more useful for aesthetics than for practical comparisons between words. I’ve been using it more often as a story-telling tool than a data analysis one. This visualization is a good example. It uses the search terms “stars,” “earth,” and “sun” to paint a picture of our solar system.

Some meaningful comparisons do exist, however. They just seem harder to find. One that I found a little interesting was the difference between the names “Alex” and “Alexandra,” shown below. Tesserae does all image searches with SafeSearch off, so the image for Alexandra is full of pink-tinted pornography. Alex has no porn and is a washed out gray. If I put on my “culture” hat for a second, this might say something about how the internet views women and men differently.

There are a few more examples worth looking at on the project’s website.

Room for Improvement

Aside from the performance issues listed on the project website, I can think of a few places where the visualization could improve:

  • Since it only grabs the top eight images from Google Images, it's easy to pick individual images out of the composite. A larger number of search results might be beneficial but is cumbersome to implement using Google's search API.
  • A save button could be added to make it easier to share generated visualizations with friends.
  • The original photos that created the mosaic could be displayed on mouseover. This would make it easier to find out why a mosaic turned an unexpected color.


Project-1 The Smell of…

by kuanjuw @ 1:50 am

“The Smell of…” is a project that tries to visualize one of the five senses: smell. The method is very simple: collect sentences from Twitter by searching for “what smells like”, then use the resulting words to find pictures on Flickr.

How would people describe a smell? On the negative side we have “evil-smelling, fetid, foul, foul-smelling, funky, high, malodorous, mephitic, noisome, olid, putrid, rancid, rank, reeking, stinking, strong, strong-smelling, whiffy…”; on the positive side we have “ambrosial, balmy, fragrant, odoriferous, perfumed, pungent, redolent, savory, scented, spicy, sweet, sweet-smelling…” (searching Thesaurus.com for “smelly” and “fragrant”). Compared to smell, however, sight has far more adjectives. So how do people describe smell? Most of the time we use objects we are familiar with, or we relate an experience, for example: “WET DOG” or “it smells like a spring rain”.

In Matthieu Louis’s paper “Bilateral olfactory sensory input enhances chemotaxis behavior,” the authors analyzed the chemical components of odor and then visualized it by showing the concentration of different odor sources. In the novel “Perfume,” Patrick Süskind’s words and sentences successfully evoke the reader’s imagination of all kinds of smells; moreover, Tom Tykwer visualized them so well in the movie version that we can almost feel we are really smelling something.

The project “The Smell of…” is built in Processing with the controlP5 library, the Twitter API, and the Flickr API. First, users see a text field where they can type in the thing whose smell they want to know.

In this case we type in “my head”.

After submitting, the program searches Twitter for the phrase “my head smells like”. Once the data is received, we split each sentence after the word “like” and cut the string at the next period. This produces all the results:
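That split-after-“like” step can be sketched as follows (the function name and example tweet are illustrative; the actual project does this in Processing):

```python
def extract_smell(tweet, query="my head smells like"):
    """Return the text after 'like', cut at the first period, or None if absent."""
    lowered = tweet.lower()
    if query not in lowered:
        return None
    tail = lowered.split(query, 1)[1]   # everything after "...smells like"
    return tail.split(".", 1)[0].strip()  # cut at the first period

print(extract_smell("Ugh, my head smells like a wet dog. Need a shower."))
# a wet dog
```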

Figure 2: The result of searching “my head smells like” on Twitter.com

Third, the program uses these words or sentences as tags to find images on Flickr.com:

Figure 3: The resulting image set from Flickr.com

So here is the image of “my head”.

For now I haven’t done any image processing, so all the pictures are raw. In a later version I would like to try averaging the color of each photo and presenting it in a more organic form, like smoke or bubbles. Also, the tweets we found are interesting, so maybe we can keep the text in some way.

Figure 4: The smell of “my hand”

Figure 5: The smell of “book”



Project 1

by guribe @ 1:04 am

Visualizing beauty trends in America from the past ninety years

Click on the link below to view my information visualization for project 1:
My Information Visualization

Where the idea came from

When I started this project, I originally wanted to take data from the Miss America or Miss USA pageant and compare it to national obesity rates by state. Knowing that both obesity rates and standards for beauty have risen, I was intrigued by the comparisons these two datasets would create. Eventually, after several iterations and variations of this original idea, I decided that 1) I did not want to be limited to creating a map of the United States by using state-by-state data, and 2) I wanted my visualization to have some sort of interactive element.

At this point, I decided that the data I had collected about the past Miss America winners was strong enough to produce a visualization that would show trends in our culture on its own without needing the comparisons of other datasets.

Using Processing, I aimed to show general changes amongst the Miss Americas through graphs and timelines that included photos and data about their height, weight, body mass index, and waist measurements.

How I obtained the data

I collected all of the data by hand, from various websites, Wikipedia, and Google image searches. I did not find any websites where parsing or scraping would have been useful, so I decided not to use these techniques. Instead, I manually collected data of each of the winners’ height, weight, body mass index, and waist measurements, as well as finding as many pictures of the winners as possible for the timeline.

The idea for this type of data collection came about when I found a page on the pbs.org website dedicated to the Miss America pageant that included most of the past winners’ stats. I used this page for most of my data and filled in the holes with my own Google searches.

Some discoveries I made along the way

The most challenging part of this project for me was working with Processing. I had only used it once before (not including project 0), mostly just to capture video.

Through this project, I became more familiar with using Processing to create drawings and interactive elements. Although the final applet may seem primitive, programming with Processing in this way was extremely new to me, and I feel much more confident now about using Processing in the future.

While I was working on the project, I realized that I was more interested in the pictures of the timeline than in the graphs. If I could change the project now, I might make more of an emphasis on the images than the other data I had. It would be interesting to see a composite Miss America similar to the project shown in class about people who look like Jesus.

My self-critique

In retrospect, there are several changes I wish I could have made to this project. First of all, I feel as if I wasted too much time rethinking my concept for the content of the project. More time was spent thinking of new ideas than actually creating the project, which caused my final product to suffer.

Secondly, I feel as if the interactive elements fall short of my initial ambition. I wanted this to be extremely interactive, and although it is interactive in some ways, it seems no more interactive than a common website. The ultimate experience turned out to be much less exciting than I had hoped for.

Thirdly, I wish I had put more thought into the visualizations of the graphs. They are quite static and a bit boring, and especially after seeing what other students in the class have created, I feel as if it could have been more compelling.

More specific changes I would make if I had more time to work on this project would include making the scrolling action for the timeline less jumpy, and plotting the yearly data in the graphs instead of only plotting the rounded average of each decade. I would also strive to make the graphs more interesting and create a stronger interactive element.

Project 1 – Moving

by sbisker @ 12:29 am

this post is a work in progress…coming wednesday afternoon:
*revised writeup (below)
*cleaned script downloads

Final Presentation PDF (quasi-Pecha Kucha format):

Final Project (“Moving”) :
solbisker_project1_moving_v0point1 – PNG
solbisker_project1_moving_v0point1 – Processing applet – Only works on client, not in browser right now; click through to download source


“Moving” is a first attempt at exploring the American job employment resume as a societal artifact. Countless sites exist to search and view individual resumes based on keywords and skills, letting employers find the right person for the right job. “Moving” is about searching and viewing resumes in the aggregate, in hopes of learning a bit about the individuals who write them.

What It Is:

I wrote a script to download over a thousand PDF resumes of individuals from all over the internet (as indexed by Google and Bing). For each resume I then extracted information about where the person currently lives (through their address zipcode) and where they have moved from in the past (through the cities and states of their employers) over the course of their careers. I then plotted the resulting locations on a map, with each resume getting its own “star graph” (the center node being the person’s present location, the outer nodes the places where they’ve worked). The resulting visualization gives a picture of how various professionals have moved (or chosen not to move) geographically as they have progressed through their careers and lives.


Over winter break, I began updating my resume for the first time since arriving in Pittsburgh. Looking over my resume made me think about my entire life in retrospect. In particular, it reminded me of my time spent in other cities, especially my last seven years in Boston (which I sorely miss), and how the various places I’ve lived, and the memories I associate with each place, have fundamentally shaped me as a person.

How It Works

First, a Python script uses the Google API and Bing API to find and download resumes, using the search query “filetype:pdf resume.pdf” to locate URLs of resumes.
To get around the engines’ result limits, search queries are salted by adding words thought to be common in resumes (“available upon request”, “education”, etc.)
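The salting idea boils down to something like this sketch (the salt list is illustrative; only “available upon request” and “education” come from the writeup):

```python
# Words expected to appear in resumes, used to "salt" the base query so each
# variation returns a different slice of the search engine's index.
SALTS = ["available upon request", "education", "experience", "references"]

def salted_queries(base='filetype:pdf resume.pdf'):
    """Yield one query per salt word; each stays under the engine's result cap."""
    for salt in SALTS:
        yield '%s "%s"' % (base, salt)

for q in salted_queries():
    print(q)
```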

Then, the Python script “Duplinator” (open-source script by someone besides me) finds and deletes duplicate resumes based on file contents and hash sums.
(At this stage, resume results can also be hand-scrubbed to eliminate false positive “Sample Resumes”, depending on quality of search results).
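The content-hash deduplication that Duplinator performs amounts to something like this sketch (not Duplinator’s actual code; file contents are inlined here for illustration):

```python
import hashlib

def dedupe(files):
    """Keep one path per unique file content, comparing SHA-1 hash sums."""
    seen, unique = set(), []
    for path, content in files:
        digest = hashlib.sha1(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique

docs = [("a.pdf", b"resume A"), ("copy.pdf", b"resume A"), ("b.pdf", b"resume B")]
print(dedupe(docs))  # ['a.pdf', 'b.pdf']
```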

Now, with the corpus of PDF resumes established, a workflow in AppleScript (built with Apple’s Automator) converts all resumes from PDF format to TXT format.

Another Python script takes these text versions of the resumes and extracts all detected zipcodes and references to cities in the United States. A list of city names to check against is scraped in real time from the zipcode-lat/long-city name mappings provided for user input recognition in Ben Fry’s “Zip Decode” project. The extracted location info for each resume is saved in a unique TXT file for later visualization and analysis.
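The zipcode-detection part might be a regex along these lines (a sketch only; the actual script also matches city names against the Zip Decode data):

```python
import re

# Five digits, optionally followed by a ZIP+4 suffix, as a whole word
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

def extract_zipcodes(text):
    """Find candidate ZIP codes in a resume's plain text."""
    return ZIP_RE.findall(text)

sample = "123 Main St, Pittsburgh, PA 15213. Previously: Boston, MA 02139-4301."
print(extract_zipcodes(sample))  # ['15213', '02139-4301']
```

A real pipeline would still need to filter false positives (five-digit numbers that aren’t zipcodes), e.g. by checking candidates against the Zip Decode lookup table.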

Finally, Processing is used to read in and plot resume location information.
This information is drawn as a unique star graph for each resume: the zipcode representing the person’s current address is the center node, and the cities representing past addresses are the outer nodes.
(At the moment, the location plotting code is a seriously hacked version of the code behind Ben Fry’s “zipdecode,” allowing me to recycle his plotting of city latitude and longitudes to arbitrary screen locations in processing.)


The visualization succeeds as a first exploration of the information space, in that it immediately causes people to start asking questions. What is all of that traffic between San Francisco and NY? Why are there so many people moving around single cities in Texas? What happened to Iowa? Do these tracks follow the normal patterns of, say, change of address forms (representing all moves people ever make and report)? Et cetera. It seems clear that resumes are a rich, under-visualized and under-studied data set.

That said, the visualization itself still has issues, from both a clarity and an aesthetic standpoint. The curves used to connect the points are an improvement over the straight edges I first used, and the opacity of the lines is a good start, but it is clear that much information is lost around highly populated parts of the US. It is unclear whether a better graph visualization algorithm is needed or whether a simple zoom mechanism would suffice to let people explore the detail in those areas. The red and blue dots that denote work nodes versus home nodes fall on top of each other in many zipcodes, stopping us from really exploring where people only work and don’t like to live, and vice versa.

Finally, many people still look at this visualization and ask “Oh, does this trace the path people take in their careers in order?” It doesn’t, by virtue of my desire to try to highlight the places where people presently live in the graph structure. How can we make that distinction clearer visually? If we can’t, does it make sense to let individual graphs be clickable and highlight-able, to make it more discoverable what a single “graph” looks like? Finally, would it make sense to add a second “mode”, where the graphs for individual resumes are instead connected in the more typical “I moved from A to B and then B to C” manner?

Next Steps

I’m interested in both refining the visualization itself (as critiqued above) and exploring how it might be tweaked to accommodate resume sets other than “all of the American resumes I could find on the internet.” I’d be interested to work with a campus career office or corporate HR office to use this same software to study a selection of resumes that might highlight local geography in an interesting way (such as the city of Pittsburgh, using a corpus of CMU alumni resumes). Interestingly, large privacy questions would arise from such a study, as individual people’s life movements might be recognizable in such visualizations in a form that the person submitting the resume might not have intended.

Project 1: The World According to Google Suggest

by Nara @ 12:17 am

Nara's Project 1: The World According to Google Suggest

The Idea

The inspiration for this project came when I saw this blog post a couple of weekends ago, wherein the author typed “why is” followed by different country names to see what Google Suggest comes up with, resulting in some interesting stereotyped perceptions of these countries. I opened up Google and tried a few searches of my own (I’m Dutch, so I started off with “why is the Netherlands”) and discovered that unlike the blog post had suggested, not all countries came up with stereotype-like phrases; many of them were legitimate questions. So, instead, I tried a few queries like “why are Americans” and “why are the Dutch” and found that when phrased that way, focusing on the people rather than the countries, one was much more likely to see stereotypes and perceptions rather than real questions.

I quickly realized that it shouldn’t be too difficult to write a program that queries Google for searches in the form “why are [people derivation]” for different countries and see how we stereotype and perceive each other. When I presented this idea in class last week, my group members suggested that in addition to querying Google.com, I should also query other geographic localizations of Google such as Google.co.uk.

The Data

One of the trickiest parts of this project was figuring out where/how to get the data, as it seems that Google fairly recently changed its search API from the old SOAP-based model to something new, and most of the documentation and how-to’s on the web focus on the old API. I also couldn’t figure out where Google Suggest fits into the search API and finally decided to give up on the “official” API altogether. I finally discovered that one could obtain the Google Suggest results in XML format using the URL http://www.google.com/complete/search?output=toolbar&q= followed by a query string. (This URL works for all geographic localizations as well; simply replace “google.com” with “google.co.uk” or what have you.)
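Reading suggestions back out of one of those stored XML files might look like the sketch below. The element names (CompleteSuggestion, suggestion, and its data attribute) reflect what that unofficial endpoint returned at the time and are not a documented API:

```python
import xml.etree.ElementTree as ET

# A stored response, in roughly the shape the toolbar endpoint returned
xml_data = """<toplevel>
  <CompleteSuggestion><suggestion data="why are americans fat"/></CompleteSuggestion>
  <CompleteSuggestion><suggestion data="why are americans so loud"/></CompleteSuggestion>
</toplevel>"""

def parse_suggestions(xml_text):
    """Pull the suggestion strings out of a saved Google Suggest XML file."""
    root = ET.fromstring(xml_text)
    return [s.get("data") for s in root.iter("suggestion")]

print(parse_suggestions(xml_data))
# ['why are americans fat', 'why are americans so loud']
```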

All my data is obtained using that URL, saving the XML file locally and referencing the stored XML file unless I specifically tell the program to scrape for new data. I did this because it is not an official API (unlike Google’s real APIs, which require API keys), so I wasn’t sure whether they would get upset if I repeatedly scraped the site. The results don’t seem to update all that frequently anyway.


The program takes as input a CSV file that has a list of countries and the main derivation for the people of each of those countries. For example, “United States,American” and “Netherlands,Dutch”. These are later used to construct the query strings used for scraping Google Suggest.

Another idea that I had was to color-code the results based on whether each adjective or phrase has a positive, negative, or neutral connotation. Patrick told me that such a list does exist somewhere on the web, but I did not manage to find it, so the program currently relies on manual intervention. It takes as input a CSV file with a list of adjectives, each followed by the word “positive”, “negative”, or “neutral”. It reads these into a hash table, and each adjective gets associated with a random tint or shade of the color I picked for each connotation (green for positive, red for negative, blue for neutral). When the program is set to scrape for new data, it writes any newly encountered adjectives/phrases to the CSV file, followed by the word “unknown”. I can then go into the CSV file and change any of those unknowns to their proper connotation. This approach works quite well at the moment because the list contains some 100 phrases, and each scrape for new data finds fewer than 5 new phrases on average. Of course, the manual intervention isn’t a long-term solution, and ideally a database of these connotations could be found somewhere and scraped.
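The hash-table-plus-tint scheme could be sketched like this (the RGB values and tint range are illustrative, not the project’s actual palette):

```python
import csv
import io
import random

# Base color per connotation (illustrative values)
COLOR_BASE = {"positive": (0, 128, 0), "negative": (160, 0, 0),
              "neutral": (0, 0, 160), "unknown": (128, 128, 128)}

def load_connotations(csv_text):
    """Read 'phrase,connotation' rows into a hash table."""
    return {phrase: tone for phrase, tone in csv.reader(io.StringIO(csv_text))}

def tint_for(phrase, table):
    """Assign a phrase a random tint of its connotation's base color."""
    base = COLOR_BASE[table.get(phrase, "unknown")]
    f = random.uniform(0.6, 1.0)  # random darkening factor
    return tuple(int(c * f) for c in base)

table = load_connotations("friendly,positive\nrude,negative\ntall,neutral")
print(table["rude"])  # negative
```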


I did purposely filter out some of the phrases that are returned by the searches. For example, phrases of the style “[why are the dutch] called the dutch” were common, but did not contribute to my goal of showing stereotypes and perceptions of people; these are the more legitimate questions that I’m trying to avoid. Unfortunately there wasn’t really a good way to do this other than providing a manually-edited list of certain keywords associated with phrases and phrase patterns I deliberately want to exclude.

The Results

You can see the program ‘live’ here. I say ‘live’ because as noted above it is referencing stored XML files and not querying Google in real time. However, it is ‘live’ in the sense that you can freely interact with it.

The program currently displays the results for 24 different countries and 6 different versions of Google. I tried other versions of Google but it seems to only work for the geographic localizations of English-speaking countries; other localizations tended to only yield results for fewer than 10 different countries.

Not all countries came up with a lot of different words and phrases associated with them. I actually originally started out with a list of about 30 countries, and I had to take out a few that just weren’t yielding any results. There are still a few left that do not have many, but I decided to keep them because I did not want to create this false perception that because all the countries shown have a wide range of results, any country queried will have that wide a range. My list, I feel, is much more honest and shows that people simply aren’t doing as many queries on Danish people as they are about Americans. I also felt compelled to include at least one country for each continent, which sometimes was difficult, especially in the case of Africa and the Middle East. So, one might wonder why I chose to keep some of the countries that do not have very interesting results, but there actually was good reasoning behind my final list of countries.

The Visualization & Interface

The interface is pretty barebones and the visualization is readable but not terribly beautiful or compelling. I actually started out intending to use the treemap for comparisons available on ManyEyes, but I tried to implement the algorithm in the paper they reference and didn’t get very far. (I had little time to work with and I have no experience implementing an algorithm described in a research paper.) So, I opted for this much simpler approach, inspired somewhat by the PersonalDNA visualization of personal qualities like a DNA strip with longer widths for qualities that are more dominant. It is basically like a stacked bar graph.

As far as interactive features go, the user can:

  • Hover over any part of the “DNA strip” to uncover words that do not fit (i.e., the label is wider than its part of the strip).
  • Click on a word or phrase and see all the countries that have that trait, ranked by the percentage of their search results that that phrase takes up.
  • Click on one of the tabs on the bottom to examine how the attributes change when different geographic localizations of Google are queried.
  • Click on the name of a country to dim out the other countries. This setting is retained when switching tabs at the bottom, making it easier to examine the changes in a specific country across the different geographic localizations of Google.

The Future

Here are a few things I would’ve liked to implement had I had more time:

  • Including something about the number of queries for each query string; possibly allowing the user to rank the occurrences of a phrase by number of queries for that phrase instead of by percentage of that country’s queries.
  • An ability to filter the list of countries in various ways, such as by continent.
  • Relating it to a world map, especially with the colors. For example, a simple SVG map of the world could be color-coded according to the most dominant color for each country. The results could be interesting, on a country as well as a continent level.
  • Having data moving and shifting using tweens, rather than just displaying the data in one frame and then the changed data in the next frame.
  • A prettier interface with more visual hierarchy.

The Conclusion

All in all, I’m not dissatisfied with this project, although I view it more as a good start to a project rather than a finished piece. It has a lot of opportunities for further development, and I hope I do get around to expanding and refining it more someday, but in this case I just simply did not have the time. I was out of town this past weekend for grad school interviews, and I had 2 other projects due this week as well, one on Tuesday and another also on Wednesday. It’s been a hell of a week with very little sleep, so even though I recognize the weaknesses of this project, I admittedly am quite happy and proud of myself for getting something up that works at all.

The zip file with the applet and the source code can be found here.

Fortune 500 Visualization

by paulshen @ 10:35 pm 26 January 2010

Most of my writeup is at http://in.somniac.me/2010/01/26/fortune-500-visualization/

Mac OS X Executable

Presentation [pdf]


I’m rather pleased with how the visuals and interactions turned out. On the other hand, I was a little disappointed with the visualization itself; it wasn’t as interesting as I had hoped, but still interesting! For the interaction, there is the problem of presenting a large amount of data in a limited space. To overcome this, I allow the user to pan the camera, although this has its limitations as well: the user sees fewer companies and a narrower time frame at any one moment.

Potential Features
  • The interaction makes sense but could be optimized further, possibly with zooming in/out.
  • One may want to look up a particular company; letting the user search for companies by typing would be another feature to implement.
  • It may be interesting to show how companies enter and leave the top 100.

I feel I achieved a lot technically on this piece and learned more C++ during the process.

My main negative critique would be the arguable lack of “interestingness” and “usefulness” of the piece. Instead, I treated this as an exercise in designing a way to display multi-dimensional data. During my exploration, I also tried to color in the companies according to their industry (it would be interesting to see trends across industries). However, I ended up not accomplishing this because of the difficulty of producing the data: I wrote a script to scrape the Fortune 500 site, but they only categorize the companies for 2005-2009, which would leave most of the image uncategorized. I also tried scraping Wikipedia, but the articles were too inconsistent.

NYC Schools and SAT Scores


Mac OS X Executable

This was just an idea directly inspired by stumbling upon the data set. The outcome, again, wasn’t as interesting as I’d hoped, but it was a fun exercise.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2016 Special Topics in Interactive Art & Computational Design | powered by WordPress with Barecity