Category Archives: 12-datascraping


27 Jan 2015

Google Image search results for “beautiful”


I chose to collect the 781 images that result from searching “beautiful” on Google Images. I am interested in the way that these images consistently violate my personal sense of beauty, despite being among the most powerful (visible) beautiful images on the web. My first attempt used the Google Image Search API, but I soon discovered that this API limited me to 64 images, while a simple inspection of the results page showed many more. I then tried downloading the HTML of a given results page and scraping that file for image URLs. What I didn’t realize at the time was that a given HTML file only contains 100 image URLs at a time. David Newbury explained to me that the image URLs are loaded dynamically as the user scrolls, so it isn’t possible to download them in one go. Together, we inspected the network activity of the results page, and found requests of the form:

From there, I modified the values in ‘n=1’ and ‘start=100’ eight times to download a total of 781 image URLs. I used the Unix ‘cat’ command to combine the resulting 8 .txt files into one file, and then used Python to scrape this file for the URLs.
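The final scraping step could look roughly like the sketch below. It assumes that image URLs appear in the combined file as plain http(s) links ending in a common image extension; the real response format may differ. The regular expression, the function names and the file name combined.txt are illustrative assumptions, not the project's actual code.

```python
import re

# Assumption: image URLs are plain http(s) links that end in a common
# image extension. The real response format may differ.
IMAGE_URL = re.compile(r"https?://[^\s\"'<>]+?\.(?:jpe?g|png|gif)", re.IGNORECASE)


def extract_image_urls(text):
    """Return the unique image URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in IMAGE_URL.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def extract_from_file(path):
    """Read a combined response file (hypothetical path) and extract its image URLs."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return extract_image_urls(handle.read())
```

Calling extract_from_file("combined.txt") on the output of ‘cat’ would then return the de-duplicated list of URLs, ready to be downloaded one by one.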

View code on GitHub.

Here is a sketch that identifies categories of normative beauty by grouping related images.

EDIT: What I’d like to do with these images is display them in a grid, and use the Eyetribe eye tracker to log the time spent looking at the images. Then I will change their scale to match gazing time. This way a person can imprint their subjectivity onto the collection of images. (I’m writing this here so I remember to do it.)


27 Jan 2015

I have always been dissatisfied with China’s policy of blocking websites, so my topic is the Great Firewall. The most difficult part for me was deciding which aspect to critique and what kind of data to collect.

I first intended to collect data about which websites are banned and how long each has been banned. But it’s hard to find exact dates; we don’t have this kind of data.

Then I thought it might be a good idea to collect, over time, the posts that were pinned to locations inside China on a banned website, Instagram in this case.

I spent a lot of time labeling latitudes and longitudes inside mainland China at a precision of 0.001, and after an hour of this work I realized it was too tedious. So in the end I just collected data from three major cities: Beijing, Shanghai and Guangzhou.
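One way to avoid labeling coordinates by hand is to generate the grid programmatically. The sketch below makes several assumptions: the city centers are approximate, and the half-width of each box (0.005 degrees) is an arbitrary value chosen for illustration, not the area actually covered in this project.

```python
# Approximate city-center coordinates (latitude, longitude). These are
# assumptions for illustration, not the positions used in the project.
CITY_CENTERS = {
    "Beijing": (39.904, 116.407),
    "Shanghai": (31.230, 121.474),
    "Guangzhou": (23.129, 113.264),
}


def grid_points(center, half_width=0.005, step=0.001):
    """Return (lat, lng) pairs on a square grid around center, rounded to 3 decimals."""
    lat0, lng0 = center
    n = round(half_width / step)  # number of steps on each side of the center
    return [
        (round(lat0 + i * step, 3), round(lng0 + j * step, 3))
        for i in range(-n, n + 1)
        for j in range(-n, n + 1)
    ]
```

Each generated point could then be passed to a location search, one request per point. A larger half_width grows the number of requests quadratically, which is a reason to stay with a few cities.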

This is my first time doing data collection and scraping, and deciding on the topic was really interesting. Through this project I feel I have gained more insight into the data I collected, and I’d love to present and visualize it in the future.

Here are some positions I collected using Instagram (most of them are just restaurants :) ):

Yangfang, Beijing, China
阳坊大都饭店 Yang Fang Da Du Hotel
白虎涧 Baihujian
Great Wall of China
Isle of Skye,Scotland
Mobil Ave
China Beijing International Acupuncture Training Center
Grill 29
Braja ashram
Chateau Laffitte Hotel
Garden Hotel
Casa da vovó Alzira
beijing air harbor hotel
Shanghai Qingpu
Qing Pu Paladset
Longde Road Station
Longde Road Station
Longde Road Station
DHJ Interlining (Chargeurs Interlining) | 迪志衬布(上海)
Ai Mei Chinese Restaurant
ZF China (Investment) Co.
Big Shot Beef

The less interesting data (like position IDs and the number of posts per month) can be found on GitHub.

Here is my GitHub repo.

My visualization plan includes a map of China, with color showing the number of user posts.



27 Jan 2015


In this project I want to visually analyse the landscape surrounding streets, avenues and highways. At the end of the 1960s, Robert Venturi, Denise Scott Brown and Steven Izenour took many students from Yale to study the emerging urban spaces of Las Vegas. Using different resources such as diagrams, maps and pictures, they tried to decipher the logic underlying the urban patterns and structures.

With Google’s large database it is possible to access a large set of photos based on their coordinates and angles. With Temboo I scraped the data for some directions in Google Maps. The resulting file contains coordinates and distances for each node of the path. The next step is to connect these points and distances with the Street View images, which is possible with the Google API.

"distance" : {
   "text" : "0.5 km",
   "value" : 549
},
"duration" : {
   "text" : "1 min",
   "value" : 67
},
"end_location" : {
   "lat" : -23.5654915,
   "lng" : -46.7118095
},
"html_instructions" : "Continue onto \u003cb\u003eBR-116\u003c/b\u003e",
"polyline" : {
   "points" : "ntxnCfmb|Gh@Ax@CnBEb@AjAAhAArCCjACjACnACnAA~@El@APA"
},
"start_location" : {
   "lat" : -23.5605573,
   "lng" : -46.71203879999999
},
"travel_mode" : "DRIVING"

At first sight there are some challenges, such as how to get intermediate coordinates with precision, how to select the angles, etc. For an ordinary trip inside a city, the file contains hundreds of lines but just dozens of nodes. So I’ll have to choose between creating many paths inside the same city or choosing very long trips between urban centres.

In the end, I would like to use morphing techniques and visual analysis to compress the series of images of a given path into a single image, in order to understand the distribution of the buildings, the interface between public and private space, and the predominant colours of the environment.
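One possible route to intermediate coordinates and angles, sketched here in Python, is to decode the “points” string in the “polyline” field of each step. That string follows Google's encoded polyline format and holds the intermediate points between start_location and end_location. The compass bearing between two consecutive points could then serve as a camera heading. The function names are my own, and using the bearing as the Street View angle is an assumption, not something the project has settled on.

```python
import math


def decode_polyline(encoded):
    """Decode a Google encoded polyline string into a list of (lat, lng) pairs."""
    points = []
    index = lat = lng = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one latitude delta, then one longitude delta
            result = shift = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:  # no continuation bit: the value is complete
                    break
            # The lowest bit carries the sign of the delta.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        points.append((lat / 1e5, lng / 1e5))
    return points


def bearing(start, end):
    """Initial compass bearing in degrees (0 = north, 90 = east) from start to end."""
    lat1, lng1 = map(math.radians, start)
    lat2, lng2 = map(math.radians, end)
    d_lng = lng2 - lng1
    x = math.sin(d_lng) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(d_lng)
    return math.degrees(math.atan2(x, y)) % 360
```

Applied to the “points” value in the sample above, decode_polyline should return a list whose first point lies close to start_location, and bearing(points[i], points[i + 1]) gives a direction of travel for each segment. The encoding keeps five decimal places, so the decoded points are precise to roughly a metre.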


27 Jan 2015

In 2005, stay-at-home mother Stephenie Meyer published the first book in a series that would quickly become a global phenomenon. Twilight (the word, the vampire connotation, the fandoms) has swept the world like a fungal disease, only with less fungus.

Fan culture is something I am very interested in, and I thought it would be revealing to see how the fandom, more specifically the “Team Edward” cult associated with the series, has evolved.

Before the release of the movie in 2008, the series had already become a sensation, but on sites like YouTube, fans had no movie clips with which to create Twilight content.

For the project, I scraped YouTube videos with the keyword “Team Edward” from both before and after 2008 (the release year of the first movie) to see how the Edward fandom has progressed.
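The post does not say which tool produced the results, so the following is only a sketch of how such a query could be built against the YouTube Data API v3 search endpoint. It uses that endpoint's publishedBefore and publishedAfter parameters. The API key is a placeholder, and the cutoff timestamp (the film's US release on 21 November 2008) is my assumption about where to split “before” and “after”.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"

# Assumed cutoff: the US release date of the first film, in RFC 3339 format.
MOVIE_RELEASE = "2008-11-21T00:00:00Z"


def build_search_url(api_key, query, published_before=None, published_after=None, max_results=50):
    """Build a search URL restricted to an optional publication window."""
    params = {
        "part": "snippet",
        "q": query,
        "maxResults": max_results,
        "key": api_key,
    }
    if published_before:
        params["publishedBefore"] = published_before
    if published_after:
        params["publishedAfter"] = published_after
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a working credential.
pre_movie_url = build_search_url("YOUR_API_KEY", "Team Edward", published_before=MOVIE_RELEASE)
post_movie_url = build_search_url("YOUR_API_KEY", "Team Edward", published_after=MOVIE_RELEASE)
```

Fetching each URL returns JSON whose items carry a snippet with fields such as title, description, channelTitle and thumbnails, which matches the shape of the data listed further down.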


Currently, I am playing with the idea of taking elements of these pre- and post-movie videos and splicing them together to create one super video.

Another idea I had was to order these videos by length or by similarity in title and play them side by side.

[Two screenshots: Team Edward Video Pre-2008; Team Edward Video Post-2008]


Info for pre-2008 Videos:


"description": "This is a Jonas Brothers S.O.S. Please come before the Year 3000. I’ll Hold On for just A Little Bit Longer. Youre my One Man Show. When I see you I’m going …",
"channelTitle": "ElvenWolf91",
"liveBroadcastContent": "none"

"title": "teamedward",
"description": "",
"channelTitle": "teamedward"

"title": "I Dislike Jacob Black! Team Edward Cullen!!!!",
"description": "i really dont likfe jacob black! some of you make hate mail me or say crap but ur not gonna change whats been already done! Jacob Black is an annoying puppy …",
"channelTitle": "leftxmyxspritxcoldx"

"title": "dora1569",
"description": "I love Michael jackson he is so awsome i would love to have meet him in person.",
"channelTitle": "dora1569"

"title": "MsLissy",
"description": "",
"channelTitle": "MsLissy"

"title": "TeamEdwardCullen",
"description": "Team Edward Cullen v. Team Jacob Black To Join Team Edward Cullen (TEC): 1) Subscribe to me to keep updated with information about the movie, books, etc.",
"channelTitle": "TeamEdwardCullen"

"title": "outsidersarah101",
"channelTitle": "outsidersarah101"

"channelTitle": "EDWARDCULLENSOFiNE"

"title": "idefixy7",
"description": "",
"channelTitle": "idefixy7"

"title": "GaaraGourd21",
"description": "Hello, I’m , Noah. I’m a fashion addict! You may know my sister FruitsBasketIsCute, a.k.a. FB-Chan. If you like her than you’ll love me! X3 I’m always open t…",
"channelTitle": "GaaraGourd21"

"title": "0Luna0uchiha0",
"description": "º¤ø„¸¸„ø¤º°¨¸„ø¤º°¨ ¨°º¤ø„¸ ♥Adam♥ ¸„ø¤º°¨copy and paste ¸„ø¤º°¨ ♥Lambert♥ `°º¤ø„¸if you think Adam ¸„ø¤º°¨¸„ø¤º°¨¨°º¤ø„¸¨°º¤ø„ is the BeSt!! Its a Mad Wor…",
"channelTitle": "0Luna0uchiha0"

"channelId": "UCRX2Yra_JsyESZF8bHzN0aQ",
"title": "Twilight Team Edward, Team Jacob, and Team Bella? Spoiler Alert!",
"description": "Twilight Team Edward Team Jacob and Team Bella? Music: (c)2008 DuskTilDawnFilms.",
"channelTitle": "DuskTilDawnFilms"

"title": "omgjobros",
"description": " im agianst team jacob jk btw my new website is",
"channelTitle": "omgjobros"

"title": "Team Jacob and Team Edward Breaking Dawn Party Merritt Books",
"description": "Team Jacob and Team Edward Breaking Dawn Party Merritt Bookstore Millbrook August 1, 2008. Which team won? Great shirts made with metalic paint.",
"channelTitle": "MerrittBookstore"

"title": "omgjoeishot",
"description": "Guess what? I’m going back to Camp!! I can’t wait! I’m gonna have more videos and I’ll post the actual show this year if I’m lucky!Videos probably won’t be r…",
"channelTitle": "omgjoeishot"

"channelId": "UCILXMyrprqJuUJFWftFwPEw",
"title": "jerseygirlpeacelover",
"description": "dance chica!!!! funky, punky style!!!!! wacky, fun, weird !!!!! :) :) :):) im a dancer i dance with dancenergy i have 2 scholarships ! TWILIGHT SAGA!!!!!!! i…",
"channelTitle": "jerseygirlpeacelover"

"title": "allisson56",
"description": "hey there; my name is allisson(lol) i live by the east coast i love TWILIGHT AND THE REST OF THE BOOKS OF COURSE I AM 100%(EVEN MORE) TEAM …",
"channelTitle": "allisson56"

"title": "Team Edward or Jacob. *SPOILERS*",
"description": "This is a video I made so you can vote for which team you are on in Twilight. If you are neutral or really don’t care, you can be Switzerland. Cast your vote…",
"channelTitle": "theLionandtheLamb500"

"channelId": "UCEoxzcSQxY3TvIklNG2V4Mg",
"title": "Team Edward Forever – Edward Cullen Fans",
"description": "So… one day… I decided I FINALLY did want to make a video. and of course, i did it on Edward Cullen :D I love him (: It is dedicated to all of the Edward…"

"title": "twilightvampiregirl",
"description": "i’m 12 years old and i love twilight!!!!! and new moon and eclipse!!!! and the host!!!!! and no doubt!!!!!! i saw stephenie meyer on may 20th and it was amaz…"

"title": "Team Edward or Team Jacob?",
"description": "Team Edward or Team Jacob? which side are u on?"

"title": "Team Edward!!!",
"description": "This is a sample of Team Edward (aka fans of Stephenie Meyer’s gorgeous vampire Edward Cullen). Edward and Bella is TRUE LOVE! :D."

"title": "Team Jacob vs. Team Edward",
"description": "this was some of the debate for team edward and team jacob at the breaking dawn release party!"

"title": "Team Edward~Edward & Bella~A Love Story",
"description": "Fan-vid for Twilight Saga. Team Edward!"

"title": "Twilight Collab : TEAM EDWARD!",
"description": "Don’t forget to check out little miss vintageortackys Team Jacob look! I used : NYX jumbo pencil in Pots & Pans …"

"title": "Twilight Team Edward or Team Jacob?",
"description": "this is my first movie so it’s pretty bad. i didnt make or draw any of the pictures exept the one with Edward and the diamonds. the song is ‘its not just mak…"

"title": "Shelbell99",
"description": "Thanks for visting my channel! Feel free to check out my videos and to subscribe! I am more busy now so you won’t really get new vidoes from me, but I will t…"

"title": "mrsorlando33",
"description": "Crazy? No…. Crazy about Twilight? YES. ~*♫`~*♫`~*♫`~*♫`~*♫`~*♫`~*♫`~ * When life hands u lemons, throw them back and yell \"I WANT ROBERT …"

"title": "venus344",
"description": "90% percent of teens would have a breakdown if miley cyrus, was standing on the edge of a tower ready to jump, copy and paste if your in the 10% that would y."

"channelId": "UCsMffDp8VFLh36t7T02IRpg",
"title": "alyandaj1010",
"description": "Edward♥Bella TEAM EDWARD ! ╔═╗ Edward Cullen is my …",
"channelTitle": "alyandaj1010"

Github Link:

Thomas Langerak

27 Jan 2015

For this assignment I picked a random artist (the Eagles in this case) and looked at the five most similar artists according to Last.fm. For each of these five I again looked at the most similar artists, and I repeated this for a thousand artists in total. To summarize, 1000 artists were analyzed for their five most similar bands and their match values (a value between 0 and 1 indicating how well they matched), giving a total of 10000 points.
For this assignment I made use of the Processing software, the Temboo libraries and the Last.fm API.
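The crawl described above is essentially a breadth-first traversal of the similar-artist graph. The sketch below shows the idea in Python rather than Processing, so it is not the code used for the assignment. The get_similar argument is a hypothetical stand-in for whatever call fetches similar artists from Last.fm; it is assumed to return (name, match) pairs.

```python
from collections import deque


def crawl_similar(seed, get_similar, max_artists=1000, per_artist=5):
    """Breadth-first crawl of similar artists, starting from seed.

    get_similar(name, limit) is a hypothetical callable that returns a list
    of (artist_name, match) pairs. Returns (source, similar, match) edges.
    """
    queue = deque([seed])
    queued = {seed}  # artists already scheduled, so none is analyzed twice
    edges = []
    analyzed = 0
    while queue and analyzed < max_artists:
        artist = queue.popleft()
        analyzed += 1
        for name, match in get_similar(artist, per_artist)[:per_artist]:
            edges.append((artist, name, match))
            if name not in queued:
                queued.add(name)
                queue.append(name)
    return edges
```

With 1000 artists and five neighbors each, this yields up to 5000 edges, each carrying a name and a match value. Keeping the edges as (source, similar, match) triples also makes it easy to spot artists that share the same connections.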

I have learned a lot during this assignment. Besides planning, which seems to be my usual learning point, I again took a different look at how Processing can be used. I had never worked with data acquisition at this scale, and I think the knowledge this exercise gave me will definitely be helpful in the future.

I am not sure where to go with the visualization. While running the data gathering I noticed that, interestingly, several bands had the same connections. I will try to do something with this. The most logical choice would be a spiderweb, yet I feel this is kind of old and done. I am aiming for something more interactive.

A sample from my data:

The code can be found at: