Library of Congress releases 25 million metadata records

The Library of Congress recently released 25 million metadata records for free bulk download at loc.gov/cds/products/marcDist.php. MARC records like these form the foundation of library catalogs such as OskiCat, which have helped library users find and access books and other media for decades. As the LOC describes the collection:

The data covers a wide range of Library items including books, serials, computer files, manuscripts, maps, music and visual materials.  The free data sets cover more than 45 years, ranging from 1968, during the early years of MARC, to 2014.  Each record provides standardized information about an item, including the title, author, publication date, subject headings, genre, related names, summary and other notes.

Library of Congress Reading Room, from https://www.loc.gov

The data is available in UTF-8, MARC8, and XML formats, and has been conveniently divided by media type including books, computer files, maps, music, and more.
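For readers who want to try the XML version, here is a minimal Python sketch of pulling a title and author out of MARCXML using only the standard library. It assumes the downloaded files use the standard MARCXML namespace (http://www.loc.gov/MARC21/slim) and that the title sits in field 245, subfield a, and the main author in field 100, subfield a, as is conventional in MARC 21. The sample record, the function names, and the output shape are invented for illustration and do not come from the LOC data sets.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace (assumed to match the LOC XML downloads).
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Invented sample record, for illustration only; not taken from the LOC files.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">demo0001</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Doe, Jane.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">An invented title :</subfield>
      <subfield code="b">a made-up subtitle.</subfield>
    </datafield>
  </record>
</collection>"""


def subfield_text(record, tag, code):
    """Return the first matching subfield's text, or None if it is absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    element = record.find(path, MARC_NS)
    if element is None or not element.text:
        return None
    return element.text.strip()


def summarize(xml_text):
    """Hypothetical helper: list the title and main author of each record."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": subfield_text(record, "245", "a"),    # title proper
            "author": subfield_text(record, "100", "a"),   # main entry, personal name
        }
        for record in root.iter(f"{{{MARC_URI}}}record")
    ]


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

This approach loads the whole document into memory, so for the full bulk files a streaming parser such as xml.etree.ElementTree.iterparse would likely be a better fit.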

We’ve added the resource to the public section of the Computational Text Analysis and Text Mining Guide, where you can find many other sources for large-scale text analysis projects. For details on accessing the data, see the LOC’s Getting Started (PDF).

Questions?

Stacy Reardon, Literatures and Digital Humanities Librarian, sreardon [at] berkeley.edu

Cody Hennesy, E-Learning and Information Studies Librarian, chennesy [at] berkeley.edu
