Library adds HathiTrust to its resources

By Sam Hungerford


This year, Allegheny became a member of HathiTrust, a new international community of research libraries that offers a searchable database of books similar to other resources students can access through Pelletier Library.

According to Linda Bills, library director, the decision to join HathiTrust was prompted by the number of books Pelletier was forced to throw away after mold got into the lower level of the library last year.

However, Bills also said that HathiTrust is an amazing new resource because of the work it’s doing to make a huge number of texts searchable while staying within copyright law.

The organization, which works with Google Books as well as the libraries that join as partners, adds a layer of metadata to its collection of digital books. The metadata embeds each text’s copyright information with the book and makes the texts searchable by title, subject, author, language and time period, among other filters.

“Why is this better than Google Books?” Bills said. “Because it’s got standardized metadata, or descriptive data, that organizes it. [Google has] indexed the text of the whole book, but there’s no controlling information about it. So if you know the title you can’t just search the title, you’re searching the full text which means that you’re getting a lot of things that are just randomly located inside the book.”

Hathi (pronounced ha-tee) – meaning elephant in Hindi – “is a partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future,” according to its website.


The group began in 2008 with the goal of preserving and organizing digital books and journals, both copyrighted and in the public domain, from libraries, which can join the organization as partners.

The HathiTrust website states that “The partners aim to build a comprehensive archive of published literature from around the world and develop shared strategies for managing and developing their digital and print holdings in a collaborative way.”

Institutions already a part of HathiTrust include the Library of Congress, Stanford University, Dartmouth College, Princeton University, Yale University and now Allegheny.

Although anyone can access and use HathiTrust’s archives, partners hold some privileges through their membership.

“Anybody in the world can do this search, but we have special privilege because we joined,” Bills said. “In order to get those you have to log in, so you just use your Allegheny login and password, and now if you see something that’s [available in] full text, you will be able to download the whole book as a PDF. If you were just a regular user you’d get it one page at a time.”

Bills also said that any books Pelletier was forced to throw away are available for Allegheny students to read on HathiTrust, whether they are under copyright or in the public domain.

“We bought the copyright for a single use when we bought the book and the fact that we couldn’t keep it isn’t our fault,” Bills said.

Additionally, students and faculty members with visual impairments will be able to view digitized versions of any text owned by Allegheny, no matter its copyright status, in enlarged digital formats.

“Besides which, we’re supporting this great endeavor,” Bills said.

Members can also create their own collections, which save and organize the texts a user has found into groups. Collections can be made public and shared with the HathiTrust community. Public collections are searchable through the “collections” feature of the website and include groups such as “19th Century Cookbooks” and “Abraham Lincoln: Fact and Fable” alongside more conventional groupings.

Currently, HathiTrust hosts 10,796,403 volumes – 32 percent of which are in the public domain – and a total of 3.8 billion searchable pages.