Artificial intelligence may be a hot topic for geeks and futurists, but it doesn't necessarily excite history fans. Yet, those of us fascinated by the past should think again. As demonstrated by Google Cloud's cooperation with the New York Times, artificial intelligence can be a crucial tool in helping us study history.
Ten years ago, when I worked in museums, I remember several big projects that involved painstakingly going through piles of boxed archives of photos and old logbooks. Each item had to be identified, categorized, annotated, filed, and have its details entered into a spreadsheet for future research.
While it was fascinating to handle the archived artifacts and files, the archiving and spreadsheet-making itself was repetitive, tedious busywork. It entailed days upon days of the same tasks over and over. So I was caught somewhere between envy and delight when I learned that nowadays, AI can take over the heavy lifting when it comes to preserving and organizing historical photos, allowing us meatbags to skip to the fun part of analyzing what they mean.
Google Cloud and the NYT team up to find the lost history of millions of old photos
Over the course of more than a century, The New York Times has archived five to seven million photos in an underground storage unit known as the "morgue". It contains hundreds of file cabinets stuffed with folders full of photos, many of which have not been seen in years. Until recently, this rich trove of visual history was organized only via a card catalog index, which doesn't exactly give a wealth of information on the contents. Now cloud storage and machine learning can make the archive more accessible and understandable.
Watch: Google Cloud leverages machine learning to digitize the NYT photo archive:
Not only is this kind of archive difficult and slow to use, but it's at great risk from inevitable environmental damage, such as when a broken pipe flooded the archival library in 2015. Cloud storage will keep high-resolution scans of the photos safe, and it will also allow faster access to more detailed information about the content and context of the images.
Machine learning for human understanding of history
The Cloud Vision API steps in to give us more information about the images stored. As an example, Google offers this photo of the old Penn Station from The Times, with a confusing mishmash of handwritten and printed text on the back.
The Cloud Vision API can read the back part of the photo and deliver the following transcript:
NOV 27 1985
JUL 28 1992
Clock hanging above an entrance to the main concourse of Pennsylvania Station in 1942, and, right, exterior of the station before it was demolished in 1963.
PUBLISHED IN NYC
RESORT APR 30 ‘72
The New York Time THE WAY IT WAS - Crowded Penn Station in 1942, an era “when only the brave flew - to Washington, Miami and assorted way stations.”
Penn Station’s Good Old Days | A Buff’s Journey into Nostalgia
( OCT 3194
PHOTOGRAPH BY The New York Times Crowds, top, streaming into the old Pennsylvania Station in New Yorker collegamalan for City in 1942. The former glowegoyercaptouwd a powstation at what is now the General Postadigesikha designay the firm of Hellmuth, Obata & Kassalariare accepted and financed.
Pub NYT Sun 5/2/93 Metro
THURSDAY EARLY RUN o cos x ET RESORT
EB 11 1988
RECEIVED DEC 25 1942 + ART DEPT. FILES
The New York Times Business at rail terminals is reflected in the hotels
OUTWARD BOUND FOR THE CHRISTMAS HOLIDAYS The scene in Pennsylvania Station yesterday afternoor afternoothe New York Times (Greenhaus)
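To give a feel for what the Times' engineers work with, here is a minimal sketch of pulling a transcript like the one above out of a Cloud Vision OCR result. The response shape (`responses` → `fullTextAnnotation` → `text`) follows the Vision API's `images:annotate` REST response; the sample content below is a hand-written stand-in for the Penn Station photo, not real API output.

```python
import json

# Hand-written stand-in for an images:annotate response (not real API output).
sample_response = json.loads("""
{
  "responses": [
    {
      "fullTextAnnotation": {
        "text": "PUBLISHED IN NYC\\nPenn Station's Good Old Days"
      }
    }
  ]
}
""")

def extract_transcript(annotate_response: dict) -> str:
    """Concatenate the full OCR text from each annotated image in the response."""
    pages = annotate_response.get("responses", [])
    return "\n".join(
        page.get("fullTextAnnotation", {}).get("text", "") for page in pages
    )

print(extract_transcript(sample_response))
```

Once the text is extracted like this, it can be indexed and searched alongside the scanned image itself, which is what makes the digitized morgue so much more useful than the card catalog.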
Additionally, the Cloud Natural Language API can recognize the text and logos to add context and categorize the photo. From the text “The New York Time THE WAY IT WAS - Crowded Penn Station in 1942, an era when only the brave flew - to Washington, Miami and assorted way stations,” it correctly identifies “Penn Station,” “Washington,” and “Miami” as locations, and classifies the entire sentence into the category “travel” and the subcategory “bus & rail.”
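A short sketch of how those results could be read back out: the field names (`entities` → `type`, `categories` → `name`) follow the Natural Language API's `analyzeEntities` and `classifyText` REST responses, but the sample data below is hand-written to mirror the Penn Station caption result described above, not captured from the API.

```python
# Hand-written stand-ins mirroring analyzeEntities and classifyText results.
entity_response = {
    "entities": [
        {"name": "Penn Station", "type": "LOCATION"},
        {"name": "Washington", "type": "LOCATION"},
        {"name": "Miami", "type": "LOCATION"},
        {"name": "The New York Times", "type": "ORGANIZATION"},
    ]
}
classify_response = {
    "categories": [{"name": "/Travel/Bus & Rail", "confidence": 0.9}]
}

def locations(entities_result: dict) -> list:
    """Keep only the entities the API tagged as locations."""
    return [
        e["name"]
        for e in entities_result.get("entities", [])
        if e.get("type") == "LOCATION"
    ]

print(locations(entity_response))
print([c["name"] for c in classify_response["categories"]])
```

Filtering by entity type like this is what turns loose caption text into searchable metadata: a researcher could query the archive for every photo tagged with the location “Penn Station” rather than reading each folder by hand.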
Good news for history buffs
You can see that Cloud Vision isn't 100% word perfect, but it's still much faster and more efficient than going through everything manually. Better yet, even small details are captured and become searchable by computer. If I were a human researcher, I'd be delighted to have all this information laid out in plain text from the get-go.
Of course, Google Cloud is keen to advertise its partnership with The New York Times, and it should be proud of the work. But I'm excited to see AI applied to the wealth of information that universities, libraries, and other institutions have sitting in dusty old archival vaults because they lack the human resources to digitize it. Using modern AI, we not only preserve this information but also make it easier to understand and more accessible to humanity, and that warms my heart.
What do you think of this project? Is there another institution that you think should do something similar?
Source: Google Cloud