
Digital Collections Unit

Processing Notes: Digital Files From the Bruce Conner Papers

The following post includes processing notes from our summer 2015 intern Nissa Nack, a graduate student in the Master of Library and Information Science program at San Jose State University’s iSchool. Nissa successfully processed over 4.5 GB of data from the Bruce Conner papers and prepared the files for researcher access. 

The digital files from the Bruce Conner papers consist of seven 700-MB CD-Rs (disks) containing images of news clippings, art show announcements, reviews, and other memorabilia pertaining to the life and works of visual artist and filmmaker Bruce Conner. The digital files were originally created and then stored on the CD-Rs using an Apple (Mac) computer, type and age unknown. The total extent of the collection is measured at 4,517 MB.

Processing in Forensic Toolkit (FTK)

To begin processing this digital collection, a disk image of each CD was created and then imported into the Forensic Toolkit (FTK) software for file review and analysis. [Note: The Bancroft Library creates disk images of most computer media in the collections for long-term preservation.]

Unexpectedly, FTK displayed the contents of each disk in four separate file systems: HFS, HFS+, Joliet, and ISO 9660, with each file system containing an identical set of viewable files. Two of the systems, HFS and HFS+, also displayed discrete, unrenderable system files. We believe that the display of data in four separate systems may be due to the original files having been created on a Mac and then saved to a disk that could be read by both Apple and Windows machines. HFS and HFS+ are Apple file systems, with HFS+ being the successor to HFS. ISO 9660 was developed as a standard system to allow files on optical media to be read by either a Mac or a PC. Joliet is an extension of ISO 9660 that allows use of longer file names as well as Unicode characters.

With the presentation of a complete set of files duplicated under each file system, the question arose as to which set of files should be processed and ultimately used to provide access to the collection.  Based on the structure of the disk file tree as displayed by FTK and evidence that a Mac had been used for file creation, it was initially decided to process files within the HFS+ system folders.

Processing of the files included a review and count of individual file types, review and description of file contents, and a search of the files for Personally Identifiable Information (PII).  Renderable files identified during processing included Photoshop (.PSD), Microsoft Word (.DOC), .MP3, .TIFF, .JPEG, and .PICT.  System files included DS_Store, rsrc, attr, and 8fs.
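The file-type review and counts were done in FTK, but the same kind of tally can be sketched in a few lines of Python; the export directory path below is hypothetical and the snippet is offered only as an illustration of the idea:

    # Tally file extensions in an exported directory tree (path is hypothetical).
    import collections
    import pathlib

    export_dir = pathlib.Path("/exports/bruce_conner_papers")  # hypothetical path
    counts = collections.Counter(
        p.suffix.lower() or "(no extension)"
        for p in export_dir.rglob("*")
        if p.is_file()
    )
    for extension, count in counts.most_common():
        print(f"{extension}: {count}")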

PII screening was conducted via pattern search for phone numbers, social security numbers, IP addresses, and selected keywords.  FTK was able to identify a number of telephone numbers in this search; however, it also flagged groups of numbers within the system files as being potential PII, resulting in a substantial number of false hits.
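FTK's built-in pattern search handled this screening, but the general approach can be sketched with simple regular expressions. The patterns below are illustrative only and, as noted above, will also flag arbitrary digit strings, producing the same kind of false hits:

    # Illustrative PII pattern search over extracted text (patterns are simplified).
    import re

    patterns = {
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "phone": re.compile(r"\b(?:\(\d{3}\)\s*|\d{3}[-.])\d{3}[-.]\d{4}\b"),
        "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    }

    def screen_text(text):
        """Return a dict of pattern name -> candidate matches for manual review."""
        return {name: rx.findall(text) for name, rx in patterns.items()}

    sample = "Call 510-555-1234 about invoice 123-45-6789."
    print(screen_text(sample))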

After screening, the characteristics of the four file systems were again reviewed, and it was decided to use the Joliet file system for export. Although the HFS+ file system was probably used to create and store the original files, it proved difficult to cleanly export this set of files from FTK. FTK “unpacked” the image files and displayed unrenderable resource, attribute, and system files as discrete items. For example, for every .PSD file a corresponding rsrc file could be found; the .PSD files can be opened, but the rsrc files cannot. The files were not “repacked” during export, and it is unknown how this might affect the images when transferred to another platform. The Joliet file system allowed us to export the images without separating any system-specific supporting files.

HFS+ file system display showing separated files

Issues with long file and path names were particularly apparent during transfer of exported files to the Library network drive and, in some cases, during the subsequent file normalization step.

File Normalization

After successful export, we began the task of file normalization, whereby copies of the master (original) files were used to produce access and preservation surrogates in appropriate formats. Preservation files would ideally be in a non-compressed format that resists deterioration and/or obsolescence. Access surrogates are produced in formats that are easily accessible across a variety of platforms. .TIFF, .JPEG, .PICT, and .PSD files were normalized to the .TIFF format for preservation and the .JPEG format for access. Word documents were saved in the .PDF format for preservation and access, and .MP3 recordings were saved to the .WAV format for preservation, with a second .MP3 copy created for access.
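A minimal sketch of the image half of that workflow, assuming the Pillow library and hypothetical source and destination folders (not the exact tools used here), might look like this:

    # Produce .TIFF preservation and .JPEG access surrogates from master images
    # (paths are hypothetical; requires the Pillow library).
    import pathlib
    from PIL import Image

    masters = pathlib.Path("/masters/images")        # hypothetical master folder
    preservation = pathlib.Path("/surrogates/tiff")  # hypothetical output folders
    access = pathlib.Path("/surrogates/jpeg")
    preservation.mkdir(parents=True, exist_ok=True)
    access.mkdir(parents=True, exist_ok=True)

    for master in masters.rglob("*"):
        # Pillow reads .TIFF, .JPEG, and (read-only) .PSD; .PICT needs another tool.
        if master.suffix.lower() not in {".tif", ".tiff", ".jpg", ".jpeg", ".psd"}:
            continue
        with Image.open(master) as img:
            img.save(preservation / (master.stem + ".tif"))
            img.convert("RGB").save(access / (master.stem + ".jpg"), quality=90)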

Normalization Issues

Photoshop

Most Photoshop files converted to .JPEG and .TIFF format without incident. However, seven files could be converted to .TIFF but not to .JPEG. The affected files were all bitmap images of typewritten translations of reviews of Bruce Conner’s work. The original reviews appear to have been written for Spanish-language newspapers.

To solve the issue, the bitmap images were converted to grayscale mode, after which they could be used to produce a .JPEG surrogate. The conversion to grayscale should not adversely impact the files, as the original images were of black-and-white typewritten documents, not of color objects.
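In Pillow terms (used here only to illustrate the same fix; the file names are hypothetical), the bitmap-mode images simply need a mode conversion before the JPEG surrogate is written:

    # Convert a 1-bit bitmap image to grayscale so it can be saved as a JPEG
    # surrogate (file names are hypothetical).
    from PIL import Image

    with Image.open("review_translation.tif") as img:  # hypothetical master file
        if img.mode == "1":          # 1-bit bitmap mode generally cannot be
            img = img.convert("L")   # written to JPEG; convert to 8-bit grayscale
        img.save("review_translation_access.jpg", quality=90)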

PICT

The .PICT files in this collection appeared in FTK, and were exported, with a double extension (.pct.mac), and could not be opened by either Mac or PC machines. Adobe Bridge was used to locate and select the files and then, using the “Batch rename” feature under the Tools menu, to create a duplicate file without the .mac in the file name.

The renamed .PCT files were retained as the master copies, and files with a duplicate extension were discarded.
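Adobe Bridge handled the renaming in this project; a scripted equivalent, shown only as a sketch with a hypothetical export folder, could strip the trailing .mac in bulk:

    # Copy files with a double .pct.mac extension to new names ending in .pct
    # (the export folder is hypothetical).
    import pathlib
    import shutil

    export_dir = pathlib.Path("/exports/pict_files")  # hypothetical folder

    for f in export_dir.glob("*.pct.mac"):
        renamed = f.with_name(f.name[:-len(".mac")])  # drop the trailing .mac
        if not renamed.exists():
            shutil.copy2(f, renamed)  # keep the original until the copy is verified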

Adobe Bridge was then used to create .TIFF and .JPEG images for the preservation and access files, as was done for the .PSD files.

MP3 and WAV

We used the open-source Audacity software to save .MP3 files in the .WAV format and to create an additional .MP3 surrogate. Unfortunately, Audacity appeared to be able to process only one file at a time; each original .MP3 file had to be individually located and exported as a .WAV file, which was then used to create the access .MP3 file. Because there were only six .MP3 files in this collection, the time to create the access and preservation files was less than an hour. However, if a larger number of .MP3s needs to be processed in the future, an alternate method or workaround will need to be found.
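One possible workaround for larger batches, assuming the command-line tool ffmpeg is available (it was not part of this project's workflow), would be to script the conversions rather than open each file individually:

    # Batch-create .WAV preservation and .MP3 access copies with ffmpeg
    # (requires ffmpeg on the PATH; folder names are hypothetical).
    import pathlib
    import subprocess

    masters = pathlib.Path("/masters/audio")     # hypothetical master folder
    output = pathlib.Path("/surrogates/audio")
    output.mkdir(parents=True, exist_ok=True)

    for mp3 in masters.glob("*.mp3"):
        wav = output / (mp3.stem + ".wav")
        access = output / (mp3.stem + "_access.mp3")
        subprocess.run(["ffmpeg", "-n", "-i", str(mp3), str(wav)], check=True)
        subprocess.run(["ffmpeg", "-n", "-i", str(wav), str(access)], check=True)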

File name and path length

The creator of this collection used long, descriptive file names with no apparent overall naming scheme. This sometimes created a problem when transferring files, as the resulting path names of some files would exceed allowable character limits and prevent the file from transferring. The “fix” was to eliminate words/characters from the original file name, while retaining as much information as possible, until a transfer could occur.
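A hedged sketch of a pre-transfer check for this problem is shown below; the 255-character limit and folder path are illustrative, and the actual shortening in this project was done by hand so that meaning could be preserved:

    # Flag files whose full path length exceeds a limit before transfer
    # (the limit and folder are illustrative).
    import pathlib

    MAX_PATH = 255
    source = pathlib.Path("/exports/bruce_conner_papers")  # hypothetical folder

    for f in source.rglob("*"):
        if f.is_file() and len(str(f)) > MAX_PATH:
            print(f"{len(str(f))} chars: {f}")  # candidate for manual shortening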

Processing Time

Processing time for this project, including time to create the processing plan and finding aid, was approximately 16 working days. However, a significant portion of the time, approximately one-quarter to one-third, was spent learning the processes and dealing with technological issues (such as file renaming or determining which file system to use).

Case Study of the Digital Files from the Reginald H. Barrett Papers

The following is a guest post from our summer 2015 intern Beaudry Allen, a graduate student in the Master of Archives and Records Administration (MARA) program at San Jose State University’s iSchool.

Case Study of the Digital Files from the Reginald H. Barrett Papers

As archivists, we have long been charged with selecting, appraising, preserving, and providing access to records, though as the digital landscape evolves there has been a paradigm shift in how to approach those foundational practices. How do we capture, organize, support long-term preservation of, and ultimately provide access to digital content, especially given the challenges that accompany the exponential growth in the amount of born-digital material produced?

So, embarking on a born-digital processing project can be a daunting prospect. The complexity of the endeavor is unpredictable, and undoubtedly unforeseen issues will arise. This summer I had the opportunity to experience the challenges of born-digital processing firsthand at the Bancroft Library, as I worked on the digital files from the Reginald H. Barrett papers.

Reginald Barrett was a professor at UC Berkeley in the Department of Environmental Science, Policy, & Management. Upon his retirement in 2014, Barrett donated his research materials to the Bancroft Library. In addition to more than 96 linear feet of manuscripts and photographs (yet to be described), the collection included one hard drive, one 3.5” floppy disk, three CDs, and his academic email account. His digital files encompassed an array of emails, photographs, reports, presentations, and GIS-mapping data, which detailed his research interests in animal populations, landscape ecology, conservation biology, and vertebrate population ecology. The digital files provide a unique vantage point from which to examine the methods of research used by Barrett, especially his involvement with the development of the California Wildlife Habitat Relationships System. The project’s aim was to process and describe Barrett’s born-digital materials for future access.

The first step in processing digital files is ensuring that your work does not disrupt the authenticity and integrity of the content (this means taking steps to prevent changes to file dates and timestamps, or inadvertently rearranging files). Luckily, the initial groundwork of virus-checking the original files and creating a disk image of the media had already been done by Bancroft Technical Services and the Library Systems Office. A disk image is essentially an exact copy of the original media that replicates the structure and contents of a storage device. Disk imaging was done using a FRED (Forensic Recovery of Evidence Device) workstation, and the disk image was transferred to a separate network server. The email account had also been downloaded as a Microsoft Outlook .PST file and converted to the preservation MBOX format. Once these preservation files were saved, I used a working copy of the files to perform my analysis and description.

My next step was to run checksums on each disk image to validate its integrity, and to generate file directory listings, which serve as inventories of the original source media. The file directory listings are saved with the preservation copies to create an AIP (Archival Information Package).
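As an illustration of those two steps (not the exact tools used at Bancroft), a short script can compute a checksum for a disk image and write a simple file directory listing to CSV; the paths are hypothetical and MD5 is shown only as an example algorithm:

    # Compute an MD5 checksum for a disk image and write a file directory listing
    # (paths are hypothetical; MD5 is shown only as an example algorithm).
    import csv
    import hashlib
    import pathlib

    def md5sum(path, chunk_size=1024 * 1024):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    image = pathlib.Path("/disk_images/barrett_hdd.img")   # hypothetical image
    print(image.name, md5sum(image))

    working_copy = pathlib.Path("/working_copy/barrett_hdd")  # hypothetical copy
    with open("file_directory_listing.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "size_bytes", "modified", "md5"])
        for f in sorted(p for p in working_copy.rglob("*") if p.is_file()):
            writer.writerow([str(f), f.stat().st_size, f.stat().st_mtime, md5sum(f)])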

Using FTK

Actual processing of the disk images from the CDs, floppy disk, and hard drive was done using the Forensic Toolkit (FTK) software. The program reads disk images and mimics the file system and contents, allowing me to observe the organizational structure and content of each piece of media. The processing procedures I used were designed by Kate Tasker and based on the 2013 OCLC report, “Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house” (Barrera-Gomez & Erway, 2013).

Processing was a two-fold approach: one, survey the collection’s content, subject matter, and file formats; and two (a critical component of processing), identify and restrict items that contained Personally Identifiable Information (PII) or student records protected by the Family Educational Rights and Privacy Act (FERPA). I relied on FTK’s pattern search function to locate Social Security numbers, credit card numbers, phone numbers, etc., and on its index search function to locate items with sensitive keywords. I was then able to assign “restricted” labels to each item and exclude them from the publicly accessible material.

While I, like many iSchool graduate students, am familiar with the preservation standard charts for file formats, I was introduced to new file formats and GIS data types which will require more research before they can be normalized to a format recommended for long-term preservation or access. Though admittedly hard, there is something gratifying about being faced with new challenges. Another challenge was identifying and flagging unallocated space, deleted files, corrupted files, and system files so they were not transferred to an access copy.

A large component of traditional archival processing is arrangement, yet creating an arrangement beyond the original order was impractical as there were over 300,000 files (195 GB) on the hard drive alone. Using original order also preserves the original file name convention and file hierarchy as determined by the creator.  Overall, I found Forensic Toolkit to be a straightforward, albeit sensitive program, and I was easily able to navigate the files and survey content.

One of the challenges in using FTK, which halted my momentum many times, was exporting. After processing in FTK and assigning appropriate labels and restrictions, the collection files were exported with the restricted files excluded (thus creating a second, redacted AIP). The exported files would then be normalized to a format which is easy to access (for example, converting a Word .doc file to .pdf). The problem was that the computer could not handle the 177 GB of files I wanted to export: I could not export directories larger than 20 GB without the workstation crashing or FTK returning export errors. This meant I needed to export some directories in smaller pieces, with sizes ranging from 2-15 GB. Smaller exports took ten minutes each, while larger exports of 10-15 GB could take 4-15 hours, so most of my time was spent wishin’ and hopin’ and thinkin’ and prayin’ the progress bar for each export would be fast.

Another major hiccup occurred in large exports, when FTK failed to exclude files marked as restricted. This meant I had to go through the exported files and cross-reference my filters so I could manually remove the restricted items. By the end of it, I felt like I had done all the work twice, but the experience helped us to determine the parameters of what FTK and the computer could handle.

The dreaded progress bar…

FTK export progress bar

Using ePADD

The email account was processed using ePADD, an open-source program developed by Stanford University’s Special Collections & Archives that supports the appraisal, processing, discovery, and delivery of email archives. Like FTK, ePADD has the ability to browse all files and add restrictions to protect private and sensitive information. I was able to review the senders and message contents, and to display interesting chart visualizations of the data. Because Barrett’s email was from his academic account, I ran “lexicon” searches relating to students to find and restrict information protected by FERPA. ePADD allows the user to choose from existing or user-generated lexicons in order to search for personal or confidential information, or to perform complex searches for thematic content. I had better luck entering my own search terms to locate specific PII than accepting ePADD’s default search terms, as I was very familiar with the collection by that point and knew what kind of information to search for.

For the most part the platform seems very sleek and user-friendly, though I had to refer to the manual more often than not, as the interface was not as intuitive as it first appeared. After appraisal and processing, ePADD can export the emails to its discovery or delivery modules. The delivery module provides a user interface so researchers can view the emails. The Bancroft Library is in the process of implementing plans to make email collections and other born-digital materials available.

Overall, the project was also a personal opportunity to evaluate the cyclical relationship between the theory and practice of digital forensics and processing. Before the project I had a good grasp of the theoretical requirements and practices in digital preservation, but I had not conceptualized the implications of each step of the project and how time-consuming it could be. The digital age conjures up images of speed, but I spent 100 hours (in a 7-week period) processing the collection. There are so many variables to consider at each step to ensure that important information is made accessible. This also amplified the need for collaboration in building a successful digital collection program, as one must rely on participation from curatorial staff and technical services to ensure long-term preservation and access. The project even brought up new questions about “More Product, Less Process” (MPLP) processing in relation to born-digital content: what are the risks associated with born-digital MPLP, and how can an institution mitigate potential pitfalls? How do we need to approach born-digital processing differently?

The Newest Addition to the Bancroft Digital Collections Forensic Workstation

By Kate Tasker and Julie Goldsmith, Bancroft Digital Collections

Last week in the Bancroft’s Digital Collections Unit, we put our new Tableau write blocker to work. Before processing a born-digital collection, a digital archivist must first be able to access and transfer data from the original storage media, often received as hard drives, optical disks and floppy disks. Floppy disks have a mechanism to physically prevent changes to the data during the transfer process, but data on hard drives and USB drives can be easily and irreversibly altered just by connecting the drive to a computer. We must access these drives in write-blocked (or read-only) mode to avoid altering the original metadata (e.g. creation dates, timestamps, and filenames). The original metadata is critical for maintaining the authenticity, security, contextual information, and research value of digital collections.

Tableau T8-R2 USB write blocker

A write blocker is essentially a one-way street for data; it provides assurance that no changes were made, regardless of user error or software modification. For digital archives, using a write blocker ensures an untampered audit trail of changes that have occurred along the way, which is essential for answering questions about provenance, original order and chain of custody. As stewards of digital collections, we also have a responsibility to identify and restrict any personally identifying information (PII) about an individual (Social Security numbers, medical or financial information, etc.), which may be found on computer media. The protected chain of custody is seen as a safeguard for collections which hold these types of sensitive materials.

Other types of data which are protected by write-blocked transfers include configuration and log files which update automatically when a drive connects to a system. On a Windows-formatted drive, the registry files can provide information associated with the user, like the last time they logged in and various other account details. Another example would be if you loaned someone a flash drive and they plugged it into their Mac; by doing so, they can unintentionally update or install system file information onto the flash drive, like a hidden .Spotlight-V100 folder. (Spotlight is the desktop search utility in Mac OS X, and the contents of this folder serve as an index of all files that were on the drive the last time it was used with a Mac.)

Write blockers also support fixity checks for digital preservation. We use software programs to calculate a unique identifier for every original file in a collection (referred to by digital preservationists as a checksum, generated with a cryptographic hash algorithm). Once files have been copied, the same calculations are run on the copies to generate a second set of checksums. If the two sets match, the digital objects are the same, bit for bit, as the originals, without any modification or data degradation.
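A minimal sketch of that comparison, assuming checksum manifests have already been generated for the originals and the copies (the manifest file names are hypothetical, in the common "<checksum>  <filename>" format), could be:

    # Compare two checksum manifests (filename -> checksum) to confirm fixity.
    # Manifest file names are hypothetical; each line is "<checksum>  <filename>".

    def load_manifest(path):
        manifest = {}
        with open(path) as f:
            for line in f:
                checksum, name = line.strip().split(None, 1)
                manifest[name] = checksum
        return manifest

    originals = load_manifest("originals.md5")   # hypothetical manifest files
    copies = load_manifest("copies.md5")

    for name, checksum in originals.items():
        if copies.get(name) != checksum:
            print("FIXITY MISMATCH:", name)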

File tree in FTK Imager

Once we load the digital collection files in FTK Imager (a free, lightweight version of the Forensic Toolkit (FTK), a program that the FBI uses in criminal data investigations), we can view the folders and files in the original file directory structure. We can also easily export a file directory listing, which is an inventory of all the files in the collection with their associated metadata. The file directory listing provides us with specific information about each file (filename, filepath, file size, date created, date accessed, date modified, and checksum) as well as a summary of the entire collection (total number of files, total file size, date range, and contents). It also helps us to make processing decisions, such as whether to capture the entire hard drive as a disk image, or whether to transfer selected folders and files as a logical copy.

Write blockers are also known in the digital forensics and digital preservation fields as Forensic Bridges. Our newest piece of equipment is already helping us bridge the gap between preserving original unprocessed computer media and creating open digital collections which are available to all.

For Further Reading:

AIMS Working Group. “AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship.” 2012. http://www.digitalcurationservices.org/files/2013/02/AIMS_final_text.pdf

Gengenbach, Martin J. “‘The Way We Do it Here’: Mapping Digital Forensics Workflows in Collecting Institutions.” A Master’s Paper for the M.S. in L.S. degree. August 2012. http://digitalcurationexchange.org/system/files/gengenbach-forensic-workflows-2012.pdf

Kirschenbaum, Matthew G., Richard Ovenden, and Gabriela Redwine. “Digital Forensics and Born-Digital Content in Cultural Heritage Collections.” Washington, DC: Council on Library and Information Resources, 2010. http://www.clir.org/pubs/reports/pub149

BitCurator Project. http://bitcurator.net

Forensics Wiki. http://www.forensicswiki.org/

BANCROFT SUMMER ARCHIVAL INTERNSHIP 2015


The Bancroft Library, University of California, Berkeley


Who is Eligible to Apply

Graduate students currently attending an ALA-accredited library and information science program who have taken coursework in archival administration and/or digital libraries.

Born-Digital Processing Internship Duties

The Born-Digital Processing Intern will be involved with all aspects of digital collections work, including inventory control of digital accessions, collection appraisal, processing, description, preservation, and provisioning for access. Under the supervision of the Digital Archivist, the intern will analyze the status of a born-digital manuscript or photograph collection and propose and carry out a processing plan to arrange and provide access to the collection. The intern will gain experience in appraisal, arrangement, and description of born-digital materials. She/he will use digital forensics software and hardware to work with disk images and execute processes to identify duplicate files and sensitive/confidential material. The intern will create an access copy of the collection and, if necessary, normalize access files to a standard format. The intern will generate an EAD-encoded finding aid in The Bancroft Library’s instance of ArchivesSpace for presentation on the Online Archive of California (OAC). Lastly, the intern will complete a full collection-level MARC catalog record for the collection using the University Library’s Millennium cataloging system. All work will occur in the Bancroft Technical Services Department, and interns will attend relevant staff meetings.

Duration:

6 weeks (minimum 120 hours), June 29 – August 7, 2015 (dates are somewhat flexible)

NOTE: The internship is not funded; however, it may be possible to arrange for course credit. Interns will be responsible for living expenses related to the internship (housing, transportation, food, etc.).

Application Procedure:

The competitive selection process is based on an evaluation of the following application materials:

Cover letter & Resume
Current graduate school transcript (unofficial)
Photocopy of driver’s license (proof of residency if out-of-state school)
Letter of recommendation from a graduate school faculty member
Sample of the applicant’s academic writing or a completed finding aid

All application materials must be postmarked on or before Friday, April 17, 2015 and either mailed to:

Mary Elings
Head of Digital Collections
The Bancroft Library
University of California Berkeley
Berkeley, CA 94720.

or emailed to melings [at] library.berkeley.edu, with “Born Digital Processing Internship” in the subject line.

Selected candidates will be notified of decisions by May 1, 2015.

Bancroft Library Processes First Born-Digital Collection

The Bancroft Library’s Digital Collections Unit recently finished a pilot project to process its first born-digital archival collection: the Ladies’ Relief Society records, 1999-2004. Based on earlier work and recommendations by the Bancroft Digital Curation Committee (Mary Elings, Amy Croft, Margo Padilla, Josh Schneider, and David Uhlich), we’re implementing best-practice procedures for acquiring, preserving, surveying, and describing born-digital files for discovery and use.

Read more about our efforts below, and check back soon for further updates on born-digital collections.

State of the Digital Archives: Processing Born-Digital Collections at the Bancroft Library (PDF)

Abstract: 

This paper provides an overview of work currently being done in the Bancroft’s Digital Collections Unit to preserve, process, and provide access to born-digital collections. It includes background information about the Bancroft’s Born Digital Curation Program and discusses the development of workflows and strategies for processing born-digital content, including disk imaging, media inventories, hardware and software needs and support, arrangement, screening for sensitive content, and description. The paper also describes DCU’s pilot processing project of the born-digital files from the Ladies’ Relief Society records.

Bancroft to Explore Text Analysis as Aid in Analyzing, Processing, and Providing Access to Text-based Archival Collections

Mary W. Elings, Head of Digital Collections, The Bancroft Library

The Bancroft Library recently began testing a theory discussed at the Radcliffe Workshop on Technology & Archival Processing, held at Harvard’s Radcliffe Institute in early April 2014. The theory suggests that archives can use text analysis tools and topic modelling — a type of statistical model for discovering the abstract “topics” that occur in a collection of documents — to analyze text-based archival collections in order to aid in analyzing, processing, and describing collections, as well as improving access.

Helping us to test this theory, the Bancroft welcomed summer intern Janine Heiser from the UC Berkeley School of Information. Over the summer, supported by an iSchool Summer Non-profit Internship Grant, Ms. Heiser worked with digitized analog archival materials to test this theory, answer specific research questions, and define use cases that will help us determine whether text analysis and topic modelling are viable technologies to aid us in our archival work. Based on her work over the summer, the Bancroft has recently awarded Ms. Heiser an Archival Technologies Fellowship for 2015 so that she can continue, further develop, and test the work she began in the summer.

During her summer internship, Ms. Heiser created a web-based application called “ArchExtract” that extracts topics and named entities (people, places, subjects, dates, etc.) from a given collection. This application implements and extends various natural language processing software tools such as MALLET and the Stanford Core NLP toolkit. To test and refine the web application, Ms. Heiser used collections with an existing catalog record and/or finding aid, namely the John Muir Correspondence collection, which was digitized in 2009.
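ArchExtract itself builds on MALLET and the Stanford Core NLP toolkit; as a rough, generic illustration of topic modelling over a set of document texts (not Ms. Heiser's implementation, and with placeholder documents), scikit-learn's LDA can be used like this:

    # Generic topic-modelling sketch with scikit-learn's LDA
    # (illustrative only; not the ArchExtract implementation).
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    documents = [
        "Letter discussing Yosemite valley glaciers and sequoia groves.",
        "Note on railroad funding and correspondence with publishers.",
        "Field notes on Sierra Nevada botany and alpine meadows.",
    ]  # placeholder texts; a real run would load the digitized collection

    vectorizer = CountVectorizer(stop_words="english")
    doc_term = vectorizer.fit_transform(documents)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(doc_term)

    terms = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top_terms = [terms[j] for j in topic.argsort()[-5:][::-1]]
        print(f"Topic {i}: {', '.join(top_terms)}")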

For a given collection, an archivist can compare the topics and named entities that ArchExtract outputs to the topics found in the extant descriptive information, looking at the similarities and differences between the two in order to verify ArchExtract’s accuracy. After evaluating the accuracy, the ArchExtract application can be improved and/or refined.

Ms. Heiser also worked with collections that either have minimal description or no extant description in order to further explore this theory as we test the tool further. Working with Bancroft archivists, Ms. Heiser will determine if the web application is successful, where it falls short, and what the next steps might be in exploring this and other text analysis tools to aid in processing collections.

The hope is that automated text analysis will give libraries and archives a way to readily identify the major topics found in a collection, and potentially the named entities found in the text and their frequency, thus giving archivists a good understanding of the scope and content of a collection before it is processed. This could help in identifying processing priorities and funding opportunities, and ultimately help users identify what is found in the collection.

Ms. Heiser is a second-year master’s student at the UC Berkeley School of Information, where she is learning the theory and practice of storing, retrieving, and analyzing digital information in a variety of contexts, and is currently taking coursework in natural language processing with Marti Hearst. Prior to the iSchool, Ms. Heiser worked at several companies where she helped develop database systems and software for political parties, non-profit organizations, and an online music distributor. In her free time, she likes to go running and hiking around the Bay Area. Ms. Heiser was also one of our participants in the #HackFSM hackathon! She was awarded an iSchool Summer Non-profit Internship Grant to support her work at Bancroft this summer and has been awarded an Archival Technologies Fellowship at Bancroft for 2015.

#HackFSM Whitepaper is out: “#HackFSM: Bootstrapping a Library Hackathon in Eight Short Weeks”

The Bancroft Library and Research IT have just published a whitepaper on the #HackFSM hackathon: “#HackFSM: Bootstrapping a Library Hackathon in Eight Short Weeks.”

Abstract:

This white paper describes the process of organizing #HackFSM, a digital humanities hackathon around the Free Speech Movement digital archive, jointly organized by Research IT and The Bancroft Library at UC Berkeley. The paper includes numerous appendices and templates of use for organizations that wish to hold a similar event.

Publication download:  HackFSM_bootstrapping_library_hackathon.pdf

Citation:

Dombrowski, Quinn, Mary Elings, Steve Masover, and Camille Villa. “#HackFSM: Bootstrapping a Library Hackathon in Eight Short Weeks.” Research IT at Berkeley. Published October 3, 2014.

From: http://research-it.berkeley.edu/publications/hackfsm-bootstrapping-library-hackathon-eight-short-weeks

Bancroft hosts #HackFSM, the first interdisciplinary hackathon at UC Berkeley

By Charlie Macquarie and Mary Elings, Bancroft Digital Collections

In April, The Bancroft Library and the UC Berkeley Digital Humanities Working Group organized #HackFSM, a digital humanities hackathon using the data of the Free Speech Movement digital collections at Berkeley. In preparation for the fiftieth anniversary of the FSM at Berkeley in fall 2014, the event was an opportunity to engage the UC Berkeley community around the materials and history of the movement and to align that conversation with the movement’s legacy of open discourse and access to information in new ways for the digital age.

This was the first interdisciplinary, digital humanities hackathon on the Berkeley campus. All participants had to be current UC Berkeley students and had to be members of a team of between two and four participants. Each team was required to include at least one humanist and one programmer (defined by their program of study).

The teams were tasked with creating a compelling web-based user interface for the materials from the FSM digital archive, one of Bancroft’s early digital initiatives. The hackathon teams were provided access to the collections data through an Apache Solr-indexed API which was put together by the UC Berkeley Library Systems Office.
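The exact endpoint and schema of that API are not described here, but querying a Solr index generally looks like the sketch below; the URL, field names, and api_key parameter are hypothetical:

    # Query a Solr select endpoint for FSM collection items
    # (URL, fields, and api_key parameter are hypothetical).
    import requests

    SOLR_URL = "https://example.lib.berkeley.edu/solr/fsm/select"  # hypothetical

    params = {
        "q": "sproul hall",       # full-text query
        "rows": 10,
        "wt": "json",
        "api_key": "YOUR_KEY",    # hypothetical auth parameter
    }
    response = requests.get(SOLR_URL, params=params, timeout=30)
    response.raise_for_status()
    for doc in response.json()["response"]["docs"]:
        print(doc.get("id"), doc.get("title"))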

The event kicked off on April 1 when teams gathered or were formed and received API keys to the data. We also had a speaker who framed the time period historically for the participants. The closing event on April 12 offered each team time to present their project and then judges deliberated and announced the winners.

The #HackFSM hackathon was different from traditional hackathons in several ways. First, we extended the traditional compressed 24-48 hour hackathon format to 12 days. This was intended to give teams more time to explore the data and develop their projects more fully.

The expanded timeframe also allowed more opportunity for collaboration between members of each team and was intended to increase participation by students who were not necessarily part of the hackathon community or who shied away from the typical compressed format — particularly women. The interdisciplinary teams also had to fulfill another requirement of the hackathon: the web application designed had to enable a researcher to answer a humanities research question. The teams therefore had to learn to communicate across their disciplines, which ended up being very successful.

Teams had access to mentors (academic and industry) throughout the 12 days. At the final event, projects were judged by two panels: one assessed the usability, appearance, and value of the interface from a humanist standpoint, and the other reviewed the quality of the code and the deployability of the tool from a technical point of view. Additionally, each team’s project had to comply with the campus policies for web accessibility and security. Compliance with these criteria was verified by running automated testing tools on each contestant site.

After presentations were completed, first place was awarded to the team of Alice Liu, Craig Hiller, Kevin Casey, and Cassie Xiong, and second place went to Olivia Benowitz, Nicholas Chang, Jason Khoe, and Edwin Lao. The winning team’s website has been deployed at http://hackfsm.lib.berkeley.edu/. Collectively, we were surprised and pleased by the high quality of all the projects, both visually and functionally.

Overall, The Bancroft felt the hackathon was a very valuable experience and one we hope to build upon in the near future. It was a highly collaborative and engaging event, both for the students and for us. The event required reaching out across campus and our community, to students, IT, and administrators. The students also felt the interdisciplinary nature of the event was positive for them. They had to learn to talk to one another, teach one another, and build something together. Other feedback we received from the students included their excitement about our materials, as well as the fact that they thought the challenge we presented and having the opportunity to see their site hosted by the library was sufficient reward for participating (but the prizes were also cool).

We look forward to engaging more of the community around our collections and supporting digital humanities efforts in the future. They say that imitation is the sincerest form of flattery: the Phoebe Hearst Museum of Anthropology, a fellow UCB institution, has just announced its first hackathon. That is great news.

Mary W. Elings, Head of Digital Collections

Charlie Macquarie,  Digital Collections Assistant

(This text is excerpted and derived from an article written for the Society of California Archivists Newsletter, Summer 2014.)

Digital Humanities and the Library

The topic of Digital Humanities (and Social Sciences Computing) has been a ubiquitous one at recent conferences, and this is no less true of the 53rd annual RBMS “Futures” Preconference in San Diego, which took place June 19-22, 2012. The opening plenary on Digital Humanities, “Use,” featured two well-known practitioners in this field, Bethany Nowviskie of the University of Virginia and Matthew G. Kirschenbaum of the University of Maryland. For those of us who have been working in the digital library and digital collection realm for many years, Bethany’s discussion of the origins and long history of digital humanities was no surprise. Digitized library and special collection materials have been the source content used by digital humanists and digital librarians to carry out their work since the late 1980s.

As a speaker at one of the ACH-ALLC programs in 1999, I was exposed to the digital tools and technologies being used to support research and scholarly exploration in what was then called linguistic and humanities computing. This work encompassed not only textual materials, but also still images, moving images, databases, and geographic materials: the stuff upon which current digital humanities and social sciences efforts are still based. What I learned then—and what the plenary speakers confirmed at this conference—is that this work has been and continues to be collaborative and interdisciplinary. Long-established humanities computing centers at the Universities of Virginia and Maryland have supported this work for years, and they have had a natural partner in the library. Over the years, humanities computing centers have continued to evolve, often set within or supported by the library, and the field that is now known as Digital Humanities has gained prominence. The fact that this plenary opened the conference indicates that this topic is an important one to our community.

As scholars’ work is increasingly focused on digital materials, either digitized from physical collections or born-digital, we are seeing more demand for digital content and tools to carry out digital analysis, visualization, and computational processing, among other activities. Perhaps this is due to the maturation of the field of humanities computing, or the availability of more digital source content, or the rise of a new generation of digital native researchers. Whatever the reason, the role of the library (and the archive and the museum, for that matter) is central to this work. The library is an obvious source of digital materials for these scholars to work with, as was pointed out by both speakers.

Libraries can play a central role in providing access to this content through traditional activities, such as cataloging of digital materials, supporting digitization initiatives, and acquisition of digital content, as well as taking on new activities, such as supporting technology solutions (like digital tools), providing digital lab workspaces, and facilitating bulk access to data and content through mechanisms such as APIs. Just as we have built and facilitated access to analog research materials, we need to turn our attention to building and supporting use of digital research collections.

As Bethany stressed in her talk, we need more digital content for these scholars to work with and use. Digital humanities centers can partner with libraries to increase the scale of digitized materials in special collections, or can give us tools to work with born-digital archives from pre-acquisition assessment through access to users, such as the tools being developed by Matthew’s BitCurator project. By providing more content and taking the “magic” out of working with digital content, greater use can be facilitated. Unlike physical materials, as Bethany pointed out, digital materials require use in order to remain viable, so the more we use digital materials, the longer they will last. She referred to this as “tactical preservation,” saying that our digital materials should be “bright keys,” in that the more they are used the brighter they become. By increasing use—making it easier to access and work with digital materials—we can ensure digital “futures” for our collections, whether physical or born-digital.

The collaborative nature of Digital Humanities projects — and centers — brings together researchers, technologists, tools, and content. These “places” may take various forms, but in almost all cases the library and the historical content it collects and preserves play a central role as the “stuff” of which digital humanities research and scholarly production is made. With its historical role in collecting and providing access to research materials, its support of teaching and learning, and its long affinity with using technology for knowledge discovery, the library is well-positioned to support this work and become an even more active partner in the digital humanities and social sciences computing.

Mary W. Elings
Head of Digital Collections
(this text is excerpted and derived from an article written for RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage).

Russell Means, November 10, 1939 – October 22, 2012

Russell Means, seen here with Dennis Banks (and William Kunstler in the background) at a press conference regarding the Patricia Hearst kidnapping at the San Francisco Airport Hilton on February 19, 1974, from the Bancroft Library’s Fang Family San Francisco Examiner photograph archive negative files.

Born on the Pine Ridge reservation in South Dakota, Russell Means moved with his family to the San Francisco Bay Area when he was three, in 1942. Means graduated from San Leandro High School, and after stints in college and working on Indian reservations around the United States, he went on to become a leader in the American Indian Movement.

An Oglala Sioux, Means fought for the rights of indigenous people around the world, urging President Reagan to support the Miskito people in Nicaragua during the rise of the Sandinista government, and staging occupations at Mount Rushmore and the site of the Battle of Wounded Knee to raise awareness of Indian treaties and claims to land that the U.S. government had neglected.

Means was a charismatic and divisive public figure, seeking the Libertarian Party’s presidential nomination in 1987 and appearing in dozens of films, including a starring role in The Last of the Mohicans. Means died of cancer at his home on the Pine Ridge Reservation on October 22, 2012.
