For my final project, I worked with an image dataset of around 3300 posters from the Cooper Hewitt museum (https://collection.cooperhewitt.org/types/35238163/). The posters date from various points in the 20th and 21st centuries and come from over 20 countries. I was interested in the dataset itself, but the Cooper Hewitt website presents it in a way that makes it hard to browse (screenshot of website below). The posters are spread across 95 pages and only 3 can be seen at a time, which makes it nearly impossible to observe patterns or draw inferences across the collection. With this project, I aimed to curate this data and present it in an easy-to-consume way.
The Cooper Hewitt makes its entire collection open to the public and provides an API that lets developers download images and metadata for each object in the collection. I started by writing Python scripts that use this API (https://collection.cooperhewitt.org/api/) to download the entire set of poster images, along with their metadata in JSON format.
Next, I used two JavaScript libraries, Masonry (http://masonry.desandro.com/) and Isotope (http://isotope.metafizzy.co/), to arrange these 3300 images in a grid layout with filtering and sorting. For filtering, I initially used each poster's year of acquisition rather than its year of origin, since the former was available for more posters (year of origin was null more often than year acquired). However, since year of origin was more interesting to look at, I switched to displaying only those posters whose year of origin was available (around 2000 posters).
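The year-of-origin filtering described above can be sketched in Python as a preprocessing step on the downloaded metadata. This is only an illustration, not the project's actual code: the records and the field names `year_origin` and `year_acquired` are made up for the example and may not match the museum's JSON schema.

```python
# Hypothetical records; field names are assumptions, not the museum's schema.
posters = [
    {"id": 1, "year_origin": 1968, "year_acquired": 1979},
    {"id": 2, "year_origin": None, "year_acquired": 1991},
    {"id": 3, "year_origin": 2004, "year_acquired": 2009},
]


def with_known_origin(records):
    """Keep only records whose year of origin is present."""
    return [r for r in records if r.get("year_origin") is not None]


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1968 -> 1960."""
    return (year // 10) * 10


# Drop posters without a year of origin, then group the rest by decade.
dated = with_known_origin(posters)
by_decade = {}
for record in dated:
    by_decade.setdefault(decade_of(record["year_origin"]), []).append(record["id"])
```

Grouping by decade up front would let the front end attach one filter class per decade to each grid item, which is roughly the shape of input that a class-based filter needs.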
Though the posters originated from over 20 countries in total, only 13 of them had a significant number of posters. Countries with fewer than 10 posters were not displayed in the filter panel (as shown in the screenshot below). In addition to Masonry and Isotope, I used the Lazy Load plugin for jQuery (http://www.appelsiini.net/projects/lazyload) to load the large number of images lazily and avoid hogging browser memory. The video below shows some of the interesting categorizations of posters that can be seen using the web app.
(Video to be updated with text annotations)
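The rule for which countries appear in the filter panel (at least 10 posters) could be computed along these lines. This is a minimal sketch under assumptions: the `country` field name and the sample data are hypothetical, and the project may have derived the list differently.

```python
from collections import Counter

MIN_POSTERS = 10  # threshold stated in the text above


def countries_for_filter(records, min_posters=MIN_POSTERS):
    """Return the sorted countries that have at least `min_posters` posters."""
    # Records with a missing or empty country are skipped.
    counts = Counter(r["country"] for r in records if r.get("country"))
    return sorted(c for c, n in counts.items() if n >= min_posters)


# Hypothetical sample: two countries meet the threshold, one does not.
sample = (
    [{"country": "USA"}] * 12
    + [{"country": "Japan"}] * 10
    + [{"country": "Cuba"}] * 3
    + [{"country": None}]
)
visible = countries_for_filter(sample)
```

Posters from countries below the threshold stay in the grid in this sketch; only the list of filter options is shortened.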
This project could be improved in a few ways. Currently, it uses the year of origin field from the JSON metadata that the museum provides, but that field is not available for all posters. However, many posters mention their year of origin in their title, which is another field in the available metadata. Extracting years from titles may not always give accurate results, but it would allow many more posters from the collection to be displayed. Also, letting the user enter an arbitrary time period, instead of choosing from pre-defined decades, would make the filter more useful.
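Both improvements could look something like the following sketch. It is not part of the project: the regular expression, the function names, and the example titles are all assumptions made for illustration. The pattern only accepts standalone four-digit years from 1900 to 2099, which matches the 20th and 21st century range of the collection.

```python
import re

# A standalone four-digit year between 1900 and 2099 (not part of a longer number).
YEAR_RE = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def year_from_title(title):
    """Return the first plausible year found in a title, or None."""
    match = YEAR_RE.search(title or "")
    return int(match.group(1)) if match else None


def in_period(year, start, end):
    """True if `year` is known and falls within [start, end], inclusive."""
    return year is not None and start <= year <= end


# Hypothetical titles, for illustration only.
titles = ["Poster, Olympic Games, 1972", "Untitled", "Poster 12345"]
years = [year_from_title(t) for t in titles]
selected = [t for t, y in zip(titles, years) if in_period(y, 1970, 1975)]
```

A year in a title is not always the year the poster was made (a title can refer to an event or to another work), so values recovered this way would probably be best flagged as inferred rather than mixed silently with the museum's own dates.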
Acknowledgements:
- I would like to thank Golan for giving valuable feedback at multiple points, both during the course of this project and the previous image visualization project, which is when I started working with Cooper Hewitt’s collections.
- I would also like to thank the Cooper Hewitt museum for providing the image set, the metadata, and the tools to download and play with both.
- This visualization of artifacts from the New York Public Library (along with the Carnegie Museum of Art collection) initially piqued my interest in visualizing museum datasets: http://publicdomain.nypl.org/pd-visualization/
GitHub link: https://github.com/samarth7b/cooperhewitt-posters