Digital Collections Unit
By Kate Tasker and Julie Goldsmith, Bancroft Digital Collections
Last week in the Bancroft’s Digital Collections Unit, we put our new Tableau write blocker to work. Before processing a born-digital collection, a digital archivist must first be able to access and transfer data from the original storage media, often received as hard drives, optical disks and floppy disks. Floppy disks have a mechanism to physically prevent changes to the data during the transfer process, but data on hard drives and USB drives can be easily and irreversibly altered just by connecting the drive to a computer. We must access these drives in write-blocked (or read-only) mode to avoid altering the original metadata (e.g. creation dates, timestamps, and filenames). The original metadata is critical for maintaining the authenticity, security, contextual information, and research value of digital collections.
A write blocker is essentially a one-way street for data: it provides assurance that no changes were made to the original media, regardless of user error or software modification. For digital archives, using a write blocker ensures an untampered audit trail of changes that have occurred along the way, which is essential for answering questions about provenance, original order, and chain of custody. As stewards of digital collections, we also have a responsibility to identify and restrict any personally identifiable information (PII) that may be found on computer media, such as Social Security numbers or medical and financial information. A protected chain of custody is seen as a safeguard for collections that hold these types of sensitive materials.
Write-blocked transfers also protect configuration and log files that update automatically when a drive connects to a system. On a Windows-formatted drive, the registry files can provide information associated with the user, such as the last time they logged in and various other account details. Similarly, if you loaned someone a flash drive and they plugged it into their Mac, they could unintentionally update or add system files on the flash drive, such as a hidden .Spotlight-V100 folder. (Spotlight is the desktop search utility on Mac OS X, and the contents of this folder serve as an index of all files that were on the drive the last time it was used with a Mac.)
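Because system-generated files like these can end up on media before it ever reaches the archive, it can be useful during appraisal to flag them so they are not mistaken for the creator's own files. The Python sketch below walks a mounted (ideally write-blocked) volume and reports paths whose names match a small, illustrative list of macOS and Windows artifacts. The list of names and the function `find_os_artifacts` are our own assumptions, not part of any tool mentioned in this post, and a match shows only that such a file is present, not when or how it got there.

```python
from pathlib import Path

# Illustrative (not exhaustive) names that macOS and Windows commonly
# create on attached drives. This list is an assumption for the sketch.
OS_ARTIFACT_NAMES = {
    ".Spotlight-V100",            # macOS Spotlight index folder
    ".fseventsd",                 # macOS file-system event log folder
    ".Trashes",                   # macOS per-volume trash folder
    ".DS_Store",                  # macOS Finder view settings
    "Thumbs.db",                  # Windows thumbnail cache
    "desktop.ini",                # Windows folder display settings
    "System Volume Information",  # Windows volume metadata folder
}


def find_os_artifacts(root: Path) -> list[str]:
    """Return paths (relative to root) of likely OS-generated artifacts."""
    hits = []
    # pathlib's rglob("*") also matches hidden (dot-prefixed) names.
    for path in sorted(root.rglob("*")):
        # A "._" prefix marks AppleDouble files that macOS writes to
        # non-Mac file systems to hold extended attributes.
        if path.name in OS_ARTIFACT_NAMES or path.name.startswith("._"):
            hits.append(path.relative_to(root).as_posix())
    return hits
```

For example, `find_os_artifacts(Path("/Volumes/ACCESSION_DRIVE"))` would list any matches on a drive mounted at that (hypothetical) location.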
Write blockers also support fixity checks for digital preservation. We use software programs to calculate a checksum for every original file in a collection: a value generated by a cryptographic hash algorithm that serves, in practice, as a unique fingerprint of the file's contents. Once the files have been copied, the same calculations are run on the copies to generate a second set of checksums. If the two sets match, the copied digital objects are the same, bit for bit, as the originals, without any modification or data degradation.
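A fixity check of this kind can be sketched in a few lines of Python using the standard-library `hashlib` module. The function names are hypothetical, and MD5 is an assumption chosen only for illustration (the post does not say which algorithm the unit uses); `hashlib.new` also accepts names such as `"sha1"` or `"sha256"`. The sketch builds a manifest of checksums keyed by relative path for both the original and the copy, then reports files that are missing, unexpected, or altered.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(root: Path, algorithm: str = "md5") -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_checksum(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_fixity(original: Path, copy: Path, algorithm: str = "md5") -> dict[str, list[str]]:
    """Compare manifests of the original and the copy and report any differences."""
    before = checksum_manifest(original, algorithm)
    after = checksum_manifest(copy, algorithm)
    return {
        "missing": sorted(set(before) - set(after)),   # in original, not copied
        "extra": sorted(set(after) - set(before)),     # in copy, not in original
        "mismatched": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

If all three lists come back empty, every file was copied bit for bit. Keying the manifests by relative path rather than by checksum alone means that two identical files in different folders are still checked individually.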
Once we load the digital collection files in FTK Imager (a free, lightweight version of the Forensic Toolkit, or FTK, a program that the FBI uses in criminal data investigations), we can view the folders and files in their original directory structure. We can also easily export a file directory listing: an inventory of all the files in the collection with their associated metadata. The listing provides specific information about each file (filename, filepath, file size, date created, date accessed, date modified, and checksum) as well as a summary of the entire collection (total number of files, total file size, date range, and contents). It also helps us make processing decisions, such as whether to capture the entire hard drive as a disk image or to transfer selected folders and files as a logical copy.
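FTK Imager produces this listing for us, but a similar inventory can be sketched with Python's standard library to illustrate what such an export contains. This is a hypothetical stand-in, not FTK Imager's export format: the column names, the use of MD5, and the function `write_directory_listing` are our own assumptions. Two caveats apply: "created" dates are exposed only on some platforms (the sketch leaves that column blank elsewhere), and on a writable mount, simply opening files to hash them can update their access dates, which is one more reason to read from a write-blocked source.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["filename", "filepath", "size_bytes", "created", "accessed", "modified", "md5"]


def iso_utc(timestamp: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(timespec="seconds")


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_directory_listing(root: Path, out_csv: Path) -> dict:
    """Write one CSV row per file under root and return a collection summary."""
    rows, mtimes = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        st = path.stat()  # capture timestamps before the file is opened for hashing
        birth = getattr(st, "st_birthtime", None)  # not available on every platform
        mtimes.append(st.st_mtime)
        rows.append({
            "filename": path.name,
            "filepath": path.relative_to(root).as_posix(),
            "size_bytes": st.st_size,
            "created": iso_utc(birth) if birth is not None else "",
            "accessed": iso_utc(st.st_atime),
            "modified": iso_utc(st.st_mtime),
            "md5": md5_of(path),
        })
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    # Collection-level summary: file count, total size, and range of modification dates.
    return {
        "total_files": len(rows),
        "total_bytes": sum(row["size_bytes"] for row in rows),
        "earliest_modified": iso_utc(min(mtimes)) if mtimes else None,
        "latest_modified": iso_utc(max(mtimes)) if mtimes else None,
    }
```

The CSV should be written outside the directory being inventoried so that it does not alter the collection or appear in its own listing.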
Write blockers are also known in the digital forensics and digital preservation fields as forensic bridges. Our newest piece of equipment is already helping us bridge the gap between preserving original, unprocessed computer media and creating open digital collections that are available to all.
For Further Reading:
AIMS Working Group. “AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship.” 2012. http://www.digitalcurationservices.org/files/2013/02/AIMS_final_text.pdf
Gengenbach, Martin J. “‘The Way We Do It Here’: Mapping Digital Forensics Workflows in Collecting Institutions.” A Master’s Paper for the M.S. in L.S. degree. August 2012. http://digitalcurationexchange.org/system/files/gengenbach-forensic-workflows-2012.pdf
Kirschenbaum, Matthew G., Richard Ovenden, and Gabriela Redwine. “Digital Forensics and Born-Digital Content in Cultural Heritage Collections.” Washington, DC: Council on Library and Information Resources, 2010. http://www.clir.org/pubs/reports/pub149
BitCurator Project. http://bitcurator.net
Forensics Wiki. http://www.forensicswiki.org/
The Bancroft Library, University of California, Berkeley
SUMMER ARCHIVAL INTERNSHIP 2015
Who is Eligible to Apply
Graduate students currently attending an ALA-accredited library and information science program who have taken coursework in archival administration and/or digital libraries.
Born-Digital Processing Internship Duties
The Born-Digital Processing Intern will be involved with all aspects of digital collections work, including inventory control of digital accessions, collection appraisal, processing, description, preservation, and provisioning for access. Under the supervision of the Digital Archivist, the intern will analyze the status of a born-digital manuscript or photograph collection and propose and carry out a processing plan to arrange and provide access to the collection. The intern will gain experience in appraisal, arrangement, and description of born-digital materials. She/he will use digital forensics software and hardware to work with disk images and execute processes to identify duplicate files and sensitive/confidential material. The intern will create an access copy of the collection and, if necessary, normalize access files to a standard format. The intern will generate an EAD-encoded finding aid in The Bancroft Library’s instance of ArchivesSpace for presentation on the Online Archive of California (OAC). Lastly, the intern will complete a full collection-level MARC catalog record for the collection using the University Library’s Millennium cataloging system. All work will occur in the Bancroft Technical Services Department, and interns will attend relevant staff meetings.
6 weeks (minimum 120 hours), June 29 – August 7, 2015 (dates are somewhat flexible)
NOTE: The internship is not funded; however, it may be possible to arrange course credit for the internship. Interns will be responsible for living expenses related to the internship (housing, transportation, food, etc.).
The competitive selection process is based on an evaluation of the following application materials:
Cover letter & Resume
Current graduate school transcript (unofficial)
Photocopy of driver’s license (or proof of residency if attending an out-of-state school)
Letter of recommendation from a graduate school faculty member
Sample of the applicant’s academic writing or a completed finding aid
All application materials must be postmarked on or before Friday, April 17, 2015 and either mailed to:
Head of Digital Collections
The Bancroft Library
University of California, Berkeley
Berkeley, CA 94720.
or emailed to melings [at] library.berkeley.edu, with “Born Digital Processing Internship” in the subject line.
Selected candidates will be notified of decisions by May 1, 2015.
The Bancroft Library’s Digital Collections Unit recently finished a pilot project to process its first born-digital archival collection: the Ladies’ Relief Society records, 1999-2004. Building on earlier work and recommendations by the Bancroft Digital Curation Committee (Mary Elings, Amy Croft, Margo Padilla, Josh Schneider, and David Uhlich), we’re implementing best-practice procedures for acquiring, preserving, surveying, and describing born-digital files for discovery and use.
Read more about our efforts below, and check back soon for further updates on born-digital collections.
This paper provides an overview of work currently being done in the Bancroft’s Digital Collections Unit (DCU) to preserve, process, and provide access to born-digital collections. It includes background information about the Bancroft’s Born Digital Curation Program and discusses the development of workflows and strategies for processing born-digital content, including disk imaging, media inventories, hardware and software needs and support, arrangement, screening for sensitive content, and description. The paper also describes the DCU’s pilot processing project for the born-digital files from the Ladies’ Relief Society records.
Bancroft to Explore Text Analysis as Aid in Analyzing, Processing, and Providing Access to Text-based Archival Collections
Mary W. Elings, Head of Digital Collections, The Bancroft Library
The Bancroft Library recently began testing a theory discussed at the Radcliffe Workshop on Technology & Archival Processing, held at Harvard’s Radcliffe Institute for Advanced Study in early April 2014. The theory suggested that archives can use text analysis tools and topic modelling (a type of statistical model for discovering the abstract “topics” that occur in a collection of documents) to analyze text-based archival collections, both to aid in processing and describing collections and to improve access.
To help us test this theory, the Bancroft welcomed summer intern Janine Heiser from the UC Berkeley School of Information. Supported by an ISchool Summer Non-profit Internship Grant, Ms. Heiser worked with digitized analog archival materials to answer specific research questions and define use cases that will help us determine whether text analysis and topic modelling are viable technologies to aid us in our archival work. Based on her work over the summer, the Bancroft has awarded Ms. Heiser an Archival Technologies Fellowship for 2015 so that she can continue to develop and test what she began.
During her summer internship, Ms. Heiser created a web-based application called “ArchExtract,” which extracts topics and named entities (people, places, subjects, dates, etc.) from a given collection. The application implements and extends natural language processing software tools such as MALLET and the Stanford CoreNLP toolkit. To test and refine it, Ms. Heiser used collections that already have a catalog record and/or finding aid, such as the John Muir Correspondence collection, which was digitized in 2009.
For a given collection, an archivist can compare the topics and named entities that ArchExtract outputs with those found in the extant descriptive information, looking at the similarities and differences between the two to verify ArchExtract’s accuracy. Once its accuracy has been evaluated, the application can be improved and/or refined.
Ms. Heiser also worked with collections that have minimal or no extant description in order to explore the theory further. Working with Bancroft archivists, she will determine whether the web application is successful, where it falls short, and what the next steps might be in exploring this and other text analysis tools to aid in processing collections.
The hope is that libraries and archives can use automated text analysis to readily identify the major topics in a collection, and potentially the named entities in the text and their frequency, giving archivists a good understanding of a collection’s scope and content before it is processed. This could help in setting processing priorities, identifying funding opportunities, and, ultimately, helping users discover what is in the collection.
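ArchExtract relies on MALLET and Stanford CoreNLP, which do far more than can be shown here. As a much simpler illustration of the underlying idea (surfacing what a collection is about directly from its text), the Python sketch below counts the most frequent non-stopword terms across a set of documents. This is plain term frequency, not topic modelling or named-entity recognition, and the stopword list and the function name `top_terms` are assumptions made for the example.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption); real tools use much larger ones.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "for", "on", "with",
             "about", "is", "was", "that", "this", "from", "at", "by"}


def top_terms(documents: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword terms across all documents."""
    counts: Counter = Counter()
    for doc in documents:
        # Lowercase and split on anything that is not an ASCII letter.
        tokens = re.findall(r"[a-z]+", doc.lower())
        # Skip stopwords and very short tokens.
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(n)
```

Run over the transcribed text of a digitized correspondence collection, a list like this offers only a rough first look at recurring words; topic models and entity extraction aim to group and label that signal far more robustly.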
Ms. Heiser is a second-year master’s student at the UC Berkeley School of Information, where she is learning the theory and practice of storing, retrieving, and analyzing digital information in a variety of contexts and is currently taking coursework in natural language processing with Marti Hearst. Before the ISchool, Ms. Heiser worked at several companies, where she helped develop database systems and software for political parties, nonprofit organizations, and an online music distributor. In her free time, she likes to go running and hiking around the Bay Area. Ms. Heiser was also one of the participants in our #HackFSM hackathon! She was awarded an ISchool Summer Non-profit Internship Grant to support her work at the Bancroft this summer and has been awarded an Archival Technologies Fellowship at the Bancroft for 2015.
The Bancroft Library and Research IT have just published a white paper on the #HackFSM hackathon: “#HackFSM: Bootstrapping a Library Hackathon in Eight Short Weeks.”
This white paper describes the process of organizing #HackFSM, a digital humanities hackathon around the Free Speech Movement digital archive, jointly organized by Research IT and The Bancroft Library at UC Berkeley. The paper includes numerous appendices and templates of use for organizations that wish to hold a similar event.
Publication download: HackFSM_bootstrapping_library_hackathon.pdf
Dombrowski, Quinn, Mary Elings, Steve Masover, and Camille Villa. “#HackFSM: Bootstrapping a Library Hackathon in Eight Short Weeks.” Research IT at Berkeley. Published October 3, 2014.
By Charlie Macquarie and Mary Elings, Bancroft Digital Collections
In April, The Bancroft Library and the UC Berkeley Digital Humanities Working Group organized #HackFSM, a digital humanities hackathon using data from the Free Speech Movement digital collections at Berkeley. Held in the lead-up to the fiftieth anniversary of the FSM in fall 2014, the event was an opportunity to engage the UC Berkeley community with the materials and history of the movement, and to connect that conversation, in new ways suited to the digital age, with the movement’s legacy of open discourse and access to information.
This was the first interdisciplinary, digital humanities hackathon on the Berkeley campus. All participants had to be current UC Berkeley students and had to be members of a team of between two and four participants. Each team was required to include at least one humanist and one programmer (defined by their program of study).
The teams were tasked with creating a compelling web-based user interface for the materials from the FSM digital archive, one of the Bancroft’s early digital initiatives. The hackathon teams were given access to the collections data through an Apache Solr-indexed API built by the UC Berkeley Library Systems Office.
The event kicked off on April 1, when teams gathered (or were formed) and received API keys to the data. A speaker also framed the time period historically for the participants. At the closing event on April 12, each team presented its project, and the judges then deliberated and announced the winners.
The #HackFSM hackathon differed from traditional hackathons in several ways. First, we extended the usual compressed 24- to 48-hour format to 12 days, giving teams more time to explore the data and develop their projects more fully.
The expanded timeframe also allowed more opportunity for collaboration within each team and was intended to increase participation by students who were not necessarily part of the hackathon community or who shied away from the typical compressed format, particularly women. The interdisciplinary teams also had to fulfill another requirement: the web application they designed had to enable a researcher to answer a humanities research question. The teams therefore had to learn to communicate across their disciplines, which proved very successful.
Teams had access to mentors (academic and industry) throughout the 12 days. At the final event, projects were judged by two panels. One panel assessed the usability, appearance, and value of the interface from a humanist standpoint, and the other reviewed the quality of the code and the deployability of the tool from a technical point of view. Additionally, each team’s project had to comply with the campus policies for web accessibility and security. Compliance with these criteria was verified by running automated testing tools against each team’s site.
After the presentations, first place was awarded to the team of Alice Liu, Craig Hiller, Kevin Casey, and Cassie Xiong, and second place went to Olivia Benowitz, Nicholas Chang, Jason Khoe, and Edwin Lao. The winning team’s website has been deployed at http://hackfsm.lib.berkeley.edu/. Collectively, we were surprised and pleased by the high quality of all the projects, both visually and functionally.
Overall, the Bancroft felt the hackathon was a very valuable experience and one we hope to build upon in the near future. It was a highly collaborative and engaging event, both for the students and for us. The event required reaching out across campus and our community, to students, IT, and administrators. The students also felt the interdisciplinary nature of the event was positive for them. They had to learn to talk to one another, teach one another, and build something together. Other feedback from the students included their excitement about our materials, and they told us they considered the challenge we presented, along with the opportunity to see their site hosted by the library, sufficient reward for participating (though the prizes were also cool).
We look forward to engaging more of the community around our collections and supporting digital humanities efforts in the future. They say that imitation is the sincerest form of flattery: the Phoebe Hearst Museum of Anthropology, a fellow UC Berkeley institution, has just announced its first hackathon. That is great news.
Mary W. Elings, Head of Digital Collections
Charlie Macquarie, Digital Collections Assistant
(This text is excerpted and derived from an article written for the Society of California Archivists Newsletter, Summer 2014.)
The topic of Digital Humanities (and Social Sciences Computing) has been a ubiquitous one at recent conferences, and this was no less true of the 53rd annual RBMS Preconference, “Futures,” which took place in San Diego on June 19-22, 2012. The opening plenary on Digital Humanities, “Use,” featured two well-known practitioners in the field: Bethany Nowviskie of the University of Virginia and Matthew G. Kirschenbaum of the University of Maryland. For those of us who have been working in the digital library and digital collection realm for many years, Bethany’s discussion of the origins and long history of digital humanities came as no surprise. Digitized library and special collections materials have been the source content for the work of digital humanists and digital librarians since the late 1980s. As a speaker at one of the ACH-ALLC conferences in 1999, I was exposed to the digital tools and technologies being used to support research and scholarly exploration in what was then called linguistic and humanities computing. This work encompassed not only textual materials but also still images, moving images, databases, and geographic materials: the stuff upon which current digital humanities and social sciences efforts are still based. What I learned then, and what the plenary speakers confirmed at this conference, is that this work has been and continues to be collaborative and interdisciplinary. Long-established humanities computing centers at the Universities of Virginia and Maryland have supported this work for years, and they have had a natural partner in the library. Over the years, humanities computing centers have continued to evolve, often set within or supported by the library, and the field now known as Digital Humanities has gained prominence. The fact that this plenary opened the conference indicates that the topic is an important one for our community.
As scholars’ work is increasingly focused on digital materials, either digitized from physical collections or born-digital, we are seeing more demand for digital content and tools to carry out digital analysis, visualization, and computational processing, among other activities. Perhaps this is due to the maturation of the field of humanities computing, or the availability of more digital source content, or the rise of a new generation of digital native researchers. Whatever the reason, the role of the library (and the archive and the museum, for that matter) is central to this work. The library is an obvious source of digital materials for these scholars to work with, as was pointed out by both speakers.
Libraries can play a central role in providing access to this content through traditional activities, such as cataloging of digital materials, supporting digitization initiatives, and acquisition of digital content, as well as taking on new activities, such as supporting technology solutions (like digital tools), providing digital lab workspaces, and facilitating bulk access to data and content through mechanisms such as APIs. Just as we have built and facilitated access to analog research materials, we need to turn our attention to building and supporting use of digital research collections.
As Bethany stressed in her talk, we need more digital content for these scholars to work with. Digital humanities centers can partner with libraries to increase the scale of digitized materials in special collections, or can give us tools to work with born-digital archives from pre-acquisition assessment through access, such as the tools being developed by Matthew’s BitCurator project. By providing more content and taking the “magic” out of working with digital content, we can facilitate greater use. As Bethany pointed out, unlike physical materials, digital materials require use in order to remain viable, so the more we use them, the longer they will last. She referred to this as “tactical preservation,” saying that our digital materials should be “bright keys,” in that the more they are used, the brighter they become. By increasing use (making it easier to access and work with digital materials), we can ensure digital “futures” for our collections, whether physical or born-digital.
The collaborative nature of Digital Humanities projects (and centers) brings together researchers, technologists, tools, and content. These “places” may take various forms, but in almost all cases the library, along with the historical content it collects and preserves, plays a central role as the “stuff” of which digital humanities research and scholarly production is made. With its historical role in collecting and providing access to research materials and supporting teaching and learning, and with its long affinity for using technology for knowledge discovery, the library is well positioned to support this work and become an even more active partner in the digital humanities and social sciences computing.
Mary W. Elings
Head of Digital Collections
(This text is excerpted and derived from an article written for RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage.)
Russell Means, seen here with Dennis Banks (and William Kunstler in the background) at a press conference regarding the Patricia Hearst kidnapping at the San Francisco Airport Hilton on February 19, 1974, from the Bancroft Library’s Fang Family San Francisco Examiner photograph archive negative files.
Born on the Pine Ridge Reservation in South Dakota, Russell Means moved with his family to the San Francisco Bay Area in 1942, when he was three. Means graduated from San Leandro High School and, after stints in college and working on Indian reservations around the United States, went on to become a leader in the American Indian Movement.
An Oglala Sioux, Means fought for the rights of indigenous peoples around the world, urging President Reagan to support the Miskito people in Nicaragua during the rise of the Sandinista government, and staging occupations at Mount Rushmore and at the site of the Battle of Wounded Knee to raise awareness of Indian treaties and land claims that the U.S. government had neglected.
Means was a charismatic and divisive public figure. In 1987 he sought the Libertarian Party’s nomination for the 1988 presidential election, and he appeared in dozens of films, including a starring role in The Last of the Mohicans. Means died of cancer at his home on the Pine Ridge Reservation on October 22, 2012.
The Bancroft Library is pleased to present the online companion exhibit to Fiat Lux Redux: Ansel Adams and Clark Kerr, which opened in The Bancroft Gallery on September 27, 2012. The online exhibit features photographs of the University of California System in the 1960s by legendary photographer Ansel Adams. These photographs (commissioned by former UC President Clark Kerr and published in the 1967 book Fiat Lux, which celebrated the educational system’s centennial) offer a rarely seen look at the evolution of the renowned University of California system through the eye of a master photographer best known for his iconic California landscapes. Fiat Lux was intended not as a document of the University as it was, but rather as a portrait of the University as it would be. The Fiat Lux project was a massive endeavor, producing 605 fine prints and over 6,700 negatives, far more than the 1,000 images stipulated in Adams’s contract. Apart from his lifelong devotion to Yosemite, Fiat Lux was probably the biggest single project of Adams’s life. The online exhibit also showcases related archival materials about the controversial Kerr himself and the evolution of his ideas and ideals.
Visit the companion online exhibit:
©1967, the Regents of the University of California, by permission of The Bancroft Library.
Transmission or reproduction of materials protected by copyright beyond that allowed by fair use requires the written permission of the copyright owners. All requests for permission to publish must be submitted in writing to the Head of Public Services.
A zoo in the Mission? Water slides in the Richmond? As early as the 1890s, there was no shortage of places to seek thrills and fun in San Francisco, though almost no trace of these attractions exists today. Here are a few of the spots where San Franciscans used to go to have fun:
- The original and once-exclusive home of the It’s-It ice cream sandwich, Playland (also known as Chutes at the Beach) was an amusement park at Ocean Beach that operated from the 1910s to 1972. Visitors could enjoy rides like the Big Dipper, the Aeroplane Swing, and the Ship of Joy, as well as a 68-horse carousel, a fun house with a Laughing Sal, game booths, and an enormous camera obscura (which still exists today near the Cliff House).
- If you weren’t in the mood for sugar and adrenaline, you could visit another seaside institution not far from Playland, the Sutro Baths. Operating from 1896 to 1966, the Baths were a gigantic indoor pool complex with six saltwater pools ranging from ice-cold to 80 degrees. The pools were less for swimming laps than for playing in the salt water: you could enter them through slides, by swinging on trapezes or rings, or by jumping off one of the many diving boards. Non-swimmers and spectators could watch from stadium-style seating.
- Over in the Mission in the late 1870s, you might spend a sunny weekend day exploring Woodward’s Gardens, located on a four-acre plot near Mission and 15th streets. For 25 cents you could take in live animal attractions, including bears, lions, monkeys, wolves, and kangaroos, as well as an extensive collection of taxidermy animals arranged in curious groupings not found in nature. As if that weren’t enough, there was an extensive aquarium, four art museums, an art gallery, a roller-skating rink, hot-air balloon rides, and various live performances, including acrobatics and other feats.
- Finally, if thrills were what you sought, you could visit any of the Chutes locations that cropped up around the city in the early 1900s. For a dime you could take an elevator to the top of a tower, where eight-person boats waited to plunge you at breakneck speed into the man-made lake at the bottom.