Timothy Sherman – Project 2 – ESRB tag cloud

by Timothy Sherman @ 1:17 am 31 January 2011

My project is a dynamic tag cloud of word frequency in video game titles. The user can select any number of ratings and/or content descriptors (short phrases that describe the content of a game) assigned by the Entertainment Software Rating Board (ESRB), and the cloud regenerates based on the narrower search. The user can also hover the mouse over a word in the cloud to see a list of the games that match the selected ratings and descriptors and contain that word in their title.

Initially, my data source was going to be a combination of the Entertainment Software Rating Board’s rating data and sales numbers from VGChartz. After getting my data from both websites using scrapers (both XML-based, written in Ruby with Nokogiri; one of them loaded 35000 pages and took 5 hours to run), I encountered a problem. The ESRB and VGChartz data didn’t line up – titles were listed differently, or co-listed in one and listed separately in the other. There were thousands of issues, most of them unique, and the only way to fix them would have been by hand, something I didn’t have the time or patience for. I decided to drop the VGChartz data and just work with the ESRB data, as it seemed more relatable on its own.

Though I had my data, I didn’t really know how to visualize it. After a lot of coding, I ended up with what basically amounted to a search engine. You could search by name, parametrize the search with ratings or content descriptors, and receive a list of games that matched. But this wasn’t a visualization! This was basically what the ESRB had on their site. I felt like I had hit a wall. I’d never done data visualization work before, and I realized that I hadn’t thought about what to actually do with the data – I’d just thought about the data’s potential and assumed it’d fall into place. After thinking about it, I came up with a couple of potential ideas, and I decided the first one I’d try would be a word frequency visualization of the game titles, one that could be parametrized by content descriptors and rating. This was what ended up being my final project.

I was working in Processing, using the ControlP5 library for my buttons, and Golan’s Tag Cloud demo, which used OpenCloud. I began by coding the basic functionality into a simple and clean layout – I ended up liking this layout so much that I only modified it slightly for the final project. The tag cloud was easy to set up, and the content-descriptor-parametrized search wasn’t terrible either. I added a smaller number next to each word, showing how many times that word appeared in the search, to help contextualize the information for the viewer. I saw that there was some interesting stuff to be found, but wanted more functionality. What I had gave no information about the actual games that it used to make the tag cloud. When I saw an odd word in a given search, I wanted to be able to see which games had that word in their title. I added a scrollable list that pops up when the user mouses over a word in the cloud and lists all the games in the search with that word.
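As a rough illustration of the counting behind the cloud, here is a minimal sketch in plain Java. It mirrors the rules visible in the source listing (split titles on spaces, strip surrounding punctuation, lowercase, skip words of two letters or fewer and a short stop list), but the class name `TitleWords`, its method names, and the sample titles are made up for this example; the actual sketch hands the words to OpenCloud rather than counting them in a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the original sketch.
class TitleWords {
    // Stop words mirror the ones skipped in the source listing.
    static final Set<String> STOP =
        new HashSet<>(Arrays.asList("of", "and", "the", "game", "games"));

    // Lowercase a token and strip leading ".,!?:(" and trailing ".,!?:)".
    static String clean(String token) {
        return token.toLowerCase().replaceAll("^[.,!?:(]+|[.,!?:)]+$", "");
    }

    // Words of one title that count toward the cloud.
    static List<String> words(String title) {
        List<String> out = new ArrayList<>();
        for (String token : title.split(" ")) {
            String w = clean(token);
            if (w.length() > 2 && !STOP.contains(w)) {
                out.add(w);
            }
        }
        return out;
    }

    // Word -> number of occurrences across all titles.
    static Map<String, Integer> frequencies(List<String> titles) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String title : titles) {
            for (String w : words(title)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Titles that contain the given (already cleaned) word: the hover list.
    static List<String> titlesContaining(String word, List<String> titles) {
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (words(title).contains(word)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented titles, not taken from the ESRB data.
        List<String> titles = Arrays.asList(
            "Star Quest: Battle!", "Star Racer (Demo)", "The Game of Frogs");
        System.out.println(frequencies(titles));
        System.out.println(titlesContaining("star", titles));
    }
}
```

Using the same `words` method for both the counts and the hover list keeps the number shown next to a word in line with the titles listed for it.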

At this point, most of my work became refining the parameters I wanted to allow users to search, and various visual design tasks. I figured out colors and a font, added the ability to search by a game’s rating, and selected the parameters that seemed more interesting.
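The way the two kinds of parameters combine can be sketched as follows: a game is kept if it matches any of the selected ratings and all of the selected descriptors, and an empty selection places no restriction. This is a simplified stand-in in plain Java, assuming a hypothetical `Game` record and substring checks in place of the `match()` calls used in the Processing source; the sample games are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the filter rule, not the original code.
class GameFilter {
    static class Game {
        final String name;
        final String rating;
        final String descriptors; // e.g. "Blood, Violence"

        Game(String name, String rating, String descriptors) {
            this.name = name;
            this.rating = rating;
            this.descriptors = descriptors;
        }
    }

    // True if the game has ANY selected rating and ALL selected descriptors.
    static boolean matches(Game g, List<String> ratings, List<String> descriptors) {
        boolean ratingOk = ratings.isEmpty(); // no rating selected: accept all
        for (String r : ratings) {
            if (g.rating.contains(r)) {
                ratingOk = true;
                break;
            }
        }
        if (!ratingOk) {
            return false;
        }
        for (String d : descriptors) {
            if (!g.descriptors.contains(d)) {
                return false; // one missing descriptor rejects the game
            }
        }
        return true;
    }

    // Names of all games that pass the filter, in input order.
    static List<String> search(List<Game> games, List<String> ratings,
                               List<String> descriptors) {
        List<String> names = new ArrayList<>();
        for (Game g : games) {
            if (matches(g, ratings, descriptors)) {
                names.add(g.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Invented games, not taken from the ESRB data.
        List<Game> games = Arrays.asList(
            new Game("Alpha Quest", "Teen", "Fantasy Violence, Mild Language"),
            new Game("Beta Kart", "Everyone", "Comic Mischief"),
            new Game("Gamma Wars", "Mature", "Blood, Violence"));
        System.out.println(search(games,
            Arrays.asList("Teen", "Mature"), Arrays.asList("Violence")));
        System.out.println(search(games,
            Collections.<String>emptyList(), Collections.<String>emptyList()));
    }
}
```

Treating ratings as alternatives and descriptors as requirements means that adding a rating can only widen the result, while adding a descriptor can only narrow it.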

Overall, I’m decently happy with the project. It’s the first data visualization I’ve ever done, and while I feel that it shows to some extent, I think that what I came up with can be used to find interesting information, and there are some unintuitive discoveries to be made. I do feel that had I been thinking about how I would visualize my data earlier, I would’ve been able to achieve a more ambitious or refined project – there are still some problems with this one. The pop-up menus, while mostly functional, aren’t ideal. If you try to use them for words in a small font, they become unscrollable. I had to compromise on showing the whole title in them as well. There was no way to make it fit and still display a lot of information in the table, and no way to make the table larger and still keep track of whether the mouse had moved to another word – limitations of ControlP5 which I didn’t have time to figure out how to get around. That said, these are issues with a secondary layer of the project, and I think the core tag cloud for chosen descriptors is interesting and solid.
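The title compromise mentioned above amounts to cutting each title to roughly the number of characters that fit the pop-up's width and ending it with an ellipsis. A minimal sketch in plain Java, assuming a fixed average character width: the 7.35 pixels per character is the constant that appears in the source listing, while the class and method names are hypothetical.

```java
// Hypothetical illustration of the truncation, not the original code.
class TitleFit {
    // Approximate width of one character in the list font, in pixels.
    static final double PX_PER_CHAR = 7.35;

    // Return the title unchanged if it fits, otherwise cut it and add an ellipsis.
    static String fit(String title, double widthPx) {
        int maxChars = (int) (widthPx / PX_PER_CHAR);
        if (title.length() <= maxChars) {
            return title;
        }
        int keep = Math.max(0, maxChars - 1); // leave one slot for the ellipsis
        return title.substring(0, keep) + "\u2026";
    }

    public static void main(String[] args) {
        System.out.println(fit("Pong", 80));
        System.out.println(fit("Extremely Long Title", 80));
    }
}
```

A fixed per-character estimate is only an approximation for a proportional font, so titles made of wide or narrow letters may be cut slightly early or late.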

Presentation Slides

Processing Source:
Note: This requires the ControlP5 library to be installed, along with the OpenCloud library used for the tag cloud.

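import controlP5.*;
import org.mcavallo.opencloud.*; // OpenCloud classes (Cloud, Tag); package name assumed from the OpenCloud library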
 
ControlP5 controlP5;
 
int listCnt = 0;
 
Button[] ratingButt;
Button[] descripButt;
 
Button[] textButt;
boolean changeTextButt = true;
ListBox hoverList;
int listExists = -1;
 
color defaultbg = color(0);
color defaultfg = color(80);
color defaultactive = color(255,0,0);
color bgcolor = color(255,255,255);
color transparent = color(255,255,255,0);
color buttbgcolor = color(200,0,0);
color buttfgcolor = color(150);
 
Cloud  cloud;
 
float  maxWordDisplaySize = 46.0;
 
int combLength;
String names[];
String rating[];
String descriptors[];
ArrayList descripSearch;
ArrayList rateSearch;
String descriptorList[];
String ratingList[];
ArrayList currentSearch;
PFont font;
ControlFont cfont;
 
void setup() 
{
  font = createFont("Helvetica", 32, true);
  cfont = new ControlFont(font);
  cfont.setSize(10);
  //cfont.setColor();
  textFont(font, 32);
  size(800, 600);
  smooth();
  background(bgcolor);
  frameRate(30);
  cloud = new Cloud(); // create cloud
  cloud.setMaxWeight(maxWordDisplaySize); // max font size
  cloud.setMaxTagsToDisplay (130);  
  controlP5 = new ControlP5(this);
  controlP5.setControlFont(cfont);
 
  //rating list
  ratingList = new String[5];
  ratingList[0] = "Early Childhood";
  ratingList[1] = "Everyone";
  ratingList[2] = "Teen";
  ratingList[3] = "Mature";
  ratingList[4] = "Adults Only";
 
 
  //rating buttons
  ratingButt = new Button[5];
  for(int i = 0; i < 5; i++)
  {
    ratingButt[i] = controlP5.addButton("rating-"+i,i,(10+(0)*60),40+i*24,104,20);
    ratingButt[i].setLabel(ratingList[i]);
    ratingButt[i].setColorBackground(defaultbg);
    ratingButt[i].setColorForeground(defaultfg);
    ratingButt[i].setColorActive(defaultactive);
  }
 
 
 
  //descriptor list - used with buttons for faster lookup.
  descriptorList = new String[17];
  descriptorList[0] = "Tobacco";
  descriptorList[1] = "Alcohol";
  descriptorList[2] = "Drug";
  descriptorList[3] = "Violence";
  descriptorList[4] = "Blood";
  descriptorList[5] = "Gore";
  descriptorList[6] = "Language";
  descriptorList[7] = "Gambling";
  descriptorList[8] = "Mild";
  descriptorList[9] = "Realistic";
  descriptorList[10] = "Fantasy";
  descriptorList[11] = "Animated";
  descriptorList[12] = "Sexual";
  descriptorList[13] = "Nudity";
  descriptorList[14] = "Comic Mischief";
  descriptorList[15] = "Mature Humor";
  descriptorList[16] = "Edutainment";
 
  //descrip buttons
  descripButt = new Button[17];
  for(int i = 0; i < 17; i++)
  {
    descripButt[i] = controlP5.addButton("descrip-"+i,i,(10+(0)*60),180+(i)*24,104,20);
    descripButt[i].setLabel(descriptorList[i]);
    descripButt[i].setColorBackground(defaultbg);
    descripButt[i].setColorForeground(defaultfg);
    descripButt[i].setColorActive(defaultactive);
  }
 
  //load strings from file.
  String combine[] = loadStrings("reratings.txt");
  combine = sort(combine);
  combLength = combine.length;
  names = new String[combLength];
  rating = new String[combLength];
  descriptors = new String[combLength];
  descripSearch = new ArrayList();
  rateSearch = new ArrayList();
  currentSearch = new ArrayList();
 
 
  //this for loop reads in all the data and puts into arrays indexed by number.
  for(int i = 0; i < combLength; i++)
  {    
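    //each line of reratings.txt is split on "=": field 0 is the title, field 2 the rating, field 3 the descriptors (field 1 is not used here).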
    String nextGame[] = combine[i].split("=");
    names[i] = nextGame[0];
    rating[i] = nextGame[2];
    descriptors[i] = nextGame[3];
    currentSearch.add(names[i]);
    listCnt++;
    String nameWords[] = split(names[i], " ");
    for(int z = 0; z < nameWords.length;z++)
    {
      String aWord = nameWords[z];
      while (aWord.endsWith(".") || aWord.endsWith(",") || aWord.endsWith("!") || aWord.endsWith("?")|| aWord.endsWith(":") || aWord.endsWith(")")) {
        aWord = aWord.substring(0, aWord.length()-1);
      }
      while (aWord.startsWith(".") || aWord.startsWith(",") || aWord.startsWith("!") || aWord.startsWith("?")|| aWord.startsWith(":") || aWord.startsWith("(")) {
        aWord = aWord.substring(1, aWord.length());
      }
      aWord = aWord.toLowerCase();
      if(aWord.length() > 2 && !(aWord.equals("of")) && !(aWord.equals("and")) && !(aWord.equals("the")) && !(aWord.equals("game")) && !(aWord.equals("games"))) {
        cloud.addTag(new Tag(aWord));
      }
    }
  }
}
 
void controlEvent(ControlEvent theEvent) {
  // with every control event triggered, we check
  // the name of the controller. if the name starts
  // with 'rating' or 'descrip', the controller is
  // forwarded to ratingButton() or descripButton()
  // below, and the search is rerun.
  if(theEvent.name().startsWith("rating")) {
    ratingButton(theEvent.controller());
    search(0);
  }
  else if(theEvent.name().startsWith("descrip")) {
    descripButton(theEvent.controller());
    search(0);
  }
 
}
 
void descripButton(Controller theCont) {
  int desVal = int(theCont.value());
  int desInd = descripSearch.indexOf(descriptorList[desVal]);
  if(desInd == -1)
  {
    descripSearch.add(descriptorList[desVal]);
    theCont.setColorBackground(buttbgcolor);
    theCont.setColorForeground(buttfgcolor);
  }
  else
  {
    descripSearch.remove(desInd);
    theCont.setColorBackground(defaultbg);
    theCont.setColorForeground(defaultfg);
  }
}
 
void ratingButton(Controller theCont) {
  int ratVal = int(theCont.value());
  int ratInd = rateSearch.indexOf(ratingList[ratVal]);
  if(ratInd == -1)
  {
    rateSearch.add(ratingList[ratVal]);
    theCont.setColorBackground(buttbgcolor);
    theCont.setColorForeground(buttfgcolor);
  }
  else
  {
    rateSearch.remove(ratInd);
    theCont.setColorBackground(defaultbg);
    theCont.setColorForeground(defaultfg);
  }
}
 
void draw()
{
  background(bgcolor);
  textSize(12);
  fill(255,0,0);
  text(listCnt,45-textWidth(str(listCnt)),30);
  fill(0);
  text("/"+combLength+" games",45,30);
  //text("games games",20,30);
  List tags = cloud.tags();
  int nTags = tags.size();
  // Sort the tags in reverse order of size.
  tags = cloud.tags(new Tag.ScoreComparatorDesc());
  if(changeTextButt)
  {
    textButt = new Button[130];
  }
  float xMargin = 130;
  float ySpacing = 40;
  float xPos = xMargin; // initial x position
  float yPos = 60;      // initial y position
  for (int i=0; i<nTags; i++) {
 
    // Fetch each tag and its properties.
    // Compute its display size based on its tag cloud "weight";
    // Then reshape the display size non-linearly, for display purposes.
    Tag aTag = (Tag) tags.get(i);
    String tName = aTag.getName();
    float tWeight = (float) aTag.getWeight();
    float wordSize =  maxWordDisplaySize * ( pow (tWeight/maxWordDisplaySize, 0.6));
 
    //we calculate the length of the text up here so the buttons can be made with it.
    float xPos0 = xPos;
    textSize(wordSize);
    float xPos1 = xPos + textWidth (tName) + 2.0;
    textSize(wordSize/2);
    float xPos2 = xPos1 + textWidth (str((float)aTag.getScore())) + 2.0;
 
    //make a transparent button for each word. This can be used to tell if we are hovering over a word, and what word.
    if(changeTextButt)//We only make new buttons if we've done a new search (saves time, and they stick around).
    {
      textButt[i] = controlP5.addButton("b-"+str(i),(float)i,(int)xPos0,(int)(yPos-wordSize),(int)(xPos2-xPos0),(int)wordSize);
      textButt[i].setColorBackground(transparent);
      textButt[i].setColorForeground(transparent);
      textButt[i].setColorActive(transparent);
      textButt[i].setLabel("");
    }
    else//if we aren't making new buttons, we're checking to see if the mouse is inside the button for the current word.
    {
 
      if(textButt[i].isInside())
      {
        if(listExists == -1)//If there is no popup list on screen, we make one and fill it
        {
          hoverList = controlP5.addListBox(tName,(int)xPos0-40,(int)(yPos-wordSize),(int)(xPos2-xPos0+20),60);
          hoverList.setItemHeight(12);
          hoverList.setBarHeight(12);
          hoverList.setColorBackground(buttbgcolor);
          hoverList.setColorForeground(buttfgcolor);
          hoverList.setColorActive(defaultactive);
          fillHoverList(tName, xPos2-xPos0+25.0);
          listExists = i;//This is which button/word the list is on.
        }
        /*else
         {
         //inside a button and list is here. could add keyboard scroll behavior.
         }*/
      }
      else if(listExists == i)//outside this button, and list is here. delete list.
      {
        listExists = -1;
        hoverList.hide();
        hoverList.remove();
      }
    }
 
 
    // Draw the word
    textSize(wordSize);
    fill ((i%2)*255,0,0); // alternate red and black words.
    text (tName, xPos,yPos);
 
    //Advance the writing position.
    xPos += textWidth (tName) + 2.0;
 
 
    //Draw the frequency
    textSize(wordSize/2);
    text (str((int)aTag.getScore()),xPos,yPos);
 
    // Advance the writing position
    xPos += textWidth (str((float)aTag.getScore())) + 2.0;
    if (xPos > (width - (xMargin+10))) {
      xPos  = xMargin;
      yPos += ySpacing;
    }
  }
  if(changeTextButt)//If we made new buttons, we don't need to make new buttons next draw().
  {
    changeTextButt = false;
  }
}
 
//Fills the popup list with games.
void fillHoverList(String word, float tWidth)
{
  int hCount = 0;
  for(int i = 0; i < currentSearch.size(); i++)
  {
    boolean nameCheck = false;
    String[] nameSplit = split((String)currentSearch.get(i)," ");
    for(int j = 0; j < nameSplit.length; j++)
    {
      String aWord = nameSplit[j];
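      //strip the same punctuation as the cloud does, so the list matches the word counts.
      while (aWord.endsWith(".") || aWord.endsWith(",") || aWord.endsWith("!") || aWord.endsWith("?")|| aWord.endsWith(":") || aWord.endsWith(")")) {
        aWord = aWord.substring(0, aWord.length()-1);
      }
      while (aWord.startsWith(".") || aWord.startsWith(",") || aWord.startsWith("!") || aWord.startsWith("?")|| aWord.startsWith(":") || aWord.startsWith("(")) {
        aWord = aWord.substring(1, aWord.length());
      }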
      aWord = aWord.toLowerCase();
      if(aWord.equals(word))
      {
        nameCheck = true;
        break;
      }
    }
    if(nameCheck)
    {
      String addName = (String)currentSearch.get(i);
      textSize(10);
      if(addName.length() > (int)(tWidth/7.35))
      {
        addName = addName.substring(0,(int)(tWidth/7.35-1))+"\u2026";
      }
      hoverList.addItem(addName,i);
      hCount++;
    }
  }
  hoverList.captionLabel().set(word+" - "+hCount);
}
 
//this searches the data for games that contain any of the parameter ratings, and all of the parameter descriptors.
void search(int theValue) {
  listCnt = 0;
  currentSearch.clear();
  cloud = new Cloud();
  cloud.setMaxWeight(maxWordDisplaySize); // max font size
  cloud.setMaxTagsToDisplay (130);
  String[] searchedGames = new String[combLength];
  for(int i = 0; i < combLength; i++)
  {
    String[] ratingCheck = {
      "none"
    };
    for(int r = 0; r < rateSearch.size(); r++)
    {
      ratingCheck = match(rating[i],(String)rateSearch.get(r));
      if(ratingCheck != null)
      {
        break;
      }
    }
    String[] descripCheck = {
      "none"
    };
    for(int d = 0; d < descripSearch.size(); d++)
    {
      descripCheck = match(descriptors[i],(String)descripSearch.get(d));
      if(descripCheck == null)
      {
        break;
      }
    }
    if(descripCheck != null && ratingCheck != null)
    {
      searchedGames[listCnt] = names[i];
      currentSearch.add(names[i]);
      String nameWords[] = split(searchedGames[listCnt], " ");
      for(int z = 0; z < nameWords.length;z++)
      {
        String aWord = nameWords[z];
        while (aWord.endsWith(".") || aWord.endsWith(",") || aWord.endsWith("!") || aWord.endsWith("?")|| aWord.endsWith(":") || aWord.endsWith(")")) {
          aWord = aWord.substring(0, aWord.length()-1);
        }
        while (aWord.startsWith(".") || aWord.startsWith(",") || aWord.startsWith("!") || aWord.startsWith("?")|| aWord.startsWith(":") || aWord.startsWith("(")) {
          aWord = aWord.substring(1, aWord.length());
        }
        aWord = aWord.toLowerCase();
        if(aWord.length() > 2 &&!(aWord.equals("of")) && !(aWord.equals("and")) && !(aWord.equals("the")) && !(aWord.equals("game")) && !(aWord.equals("games"))) {
          cloud.addTag(new Tag(aWord));
        }
      }
      listCnt++;
    }
  }
  changeTextButt = true;//time to make new buttons.
  for(int i = 0; i < textButt.length; i++)//delete old buttons.
  {
    if(textButt[i] != null)//slots past the last tag were never filled.
    {
      textButt[i].hide();
      textButt[i].remove();
    }
  }
}

1 Comment

  1. Strong work Tim. Here are comments from the PiratePad. –GL

    might be faster using php other than ruby for scraping, i got 58 pages in 5 seconds, hows that translate with the math to 35000 pages?
    Nope. He’s using the libcurl interface in ruby. It’s really fast/multithreaded.

    Pretty clean!! The only thing I would change is the color of the words, unless you have a reason for making one red and the next black and so on..

    This is really funny. I was sceptical about the absence of visuals, but it works well. However, does this talk more about the games or how they are tagged?

    This turned out a lot better than I thought it was going to. :-) Great work.

    really great display…

    Tag cloud in game titles — fun.
    Nice scraping work. I think we should share these data sets.
    Nice, honest visualization work, with a clean interface.
    Juicy.

    A search engine can still be a visualization
    At this stage I’m kind of sick of tag clouds. Actually, with the rating filters it becomes more interesting, nevermind about being sick of tag clouds
    It would be interesting if you could group together phrases like “Star Wars” or “San Andreas” in one word so you can distinguish between games with the word star and Star Wars games
    This is a nice thing you could put online. You could even make it with jQuery or something to make it a bit more accessible.

    I like the filters a lot. And the little counter up top.

    great clean interface…would be interesting to cross reference with sales data. well done

    Comment by Golan Levin — 4 February 2011 @ 3:34 pm

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2023 Interactive Art & Computational Design / Spring 2011