Digital Humanities

New Books in Graduate Services July 2017

Let us now praise famous men : three tenant families

The Works of James Agee Volume 3: Let Us Now Praise Famous Men, An Annotated Edition with Supplementary Manuscripts by James Agee and Walker Evans edited by Hugh Davis

What is a people?

What Is A People? by Alain Badiou, Pierre Bourdieu, Judith Butler, Georges Didi-Huberman, Sadri Khiari, Jacques Rancière with an introduction by Bruno Bosteels and a conclusion by Kevin Olson

Collected stories

Collected Stories by John Barth

Creative evolution

Creative Evolution by Henri Bergson with an introduction by Keith Ansell Pearson

Brecht on theatre : the development of an aesthetic

Brecht On Theatre: The Development Of An Aesthetic by Bertolt Brecht edited and translated by John Willett

Notes toward a performative theory of assembly

Notes Toward A Performative Theory Of Assembly by Judith Butler

The early stories of Truman Capote

The Early Stories Of Truman Capote by Truman Capote with a foreword by Hilton Als

An outcast of the islands

The Cambridge Edition Of The Works Of Joseph Conrad: An Outcast Of The Islands by Joseph Conrad edited by Allan H. Simmons

The selected letters of Joseph Conrad

The Selected Letters Of Joseph Conrad edited by Laurence Davies

Theory of the Lyric

Theory Of The Lyric by Jonathan Culler

I greet you at the beginning of a great career : the selected correspondence of Lawrence Ferlinghetti and Allen Ginsberg, 1955-1997

I Greet You At The Beginning Of A Great Career: The Selected Correspondence Of Lawrence Ferlinghetti And Allen Ginsberg, 1955-1997 edited by Bill Morgan

Translation, Rewriting, And The Manipulation Of Literary Fame by Andre Lefevere

Collected essays

Collected Essays by Arthur Miller

The Norton Shakespeare

The Norton Shakespeare (3rd Edition) by William Shakespeare edited by Stephen Greenblatt, Walter Cohen, Suzanne Gossett, Jean E. Howard, Katharine Eisaman Maus, Gordon McMullan

Theory of prose

Theory Of Prose by Viktor Shklovsky

Danger on peaks : poems

Danger On Peaks: Poems (Deluxe Audio Edition) by Gary Snyder

Technics and time / Vol. 1, The fault of Epimetheus, transl. [from the French] by Richard Beardsworth and George Collins.

Technics And Time, 1: The Fault Of Epimetheus by Bernard Stiegler

Technics and time. 2 : Disorientation

Technics And Time, 2: Disorientation by Bernard Stiegler

Genres in discourse

Genres In Discourse by Tzvetan Todorov

Poems, in two volumes, 1807

Poems, In Two Volumes by William Wordsworth edited by Richard Matlak

Library Carpentry Sprint at UC Berkeley

The UC Berkeley Library is participating in the worldwide Library Carpentry Sprint on June 1st and 2nd, which is part of the larger Mozilla Global Sprint 2017. Library Carpentry is part of the Software Carpentry and Data Carpentry family, and it strives to bring librarians and library staff the fundamentals of computing, as well as a platform for further self-directed learning in digital scholarship. The goal of this Library Carpentry sprint is to improve Library Carpentry lessons and to get input from archivists about how we can make our lessons more archivist-friendly. That said, you do not need to be a librarian to participate. If you are interested in pedagogy or are familiar with the digital tools taught in Library Carpentry workshops, we seek your input in improving the lessons.

This sprint will take place in the Berkeley Institute for Data Science (BIDS), and you can drop by anytime between 9am and 5pm on June 1st and 2nd to help amend, update, and extend the existing Library Carpentry lessons. You can stay as long as you want, whether it be two hours or two days.

Besides improving already existing Library Carpentry lessons, this sprint will also focus on getting draft lessons for SQL, Python, web scraping, and other topics into final shape for launch. Participants can contribute code or content; proofread writing, visual design, and graphic art; do QA (quality assurance) testing on prototype tools or apps; or advise or comment on project ideas or plans. All skill levels are welcome—and needed—as there are many ways to participate. Basically, we want you to bring your own unique perspective to the Library Carpentry lessons.

If you are interested in participating, all the details for the UC Berkeley Library Carpentry event can be found here, and you can sign up on the Library Carpentry Sprint Etherpad, which can be found here. On the Etherpad you will find the UC Berkeley location. Just add your name under that location, and show up during the sprint.

Hope to see you there!

 

Library of Congress releases 25 million metadata records

The Library of Congress recently released 25 million metadata records for free bulk download at loc.gov/cds/products/marcDist.php. These MARC records make up the foundation for library catalogs, such as OskiCat, which have enabled library users to find and access library books and other media for decades. As the LOC describes the collection:

 

The data covers a wide range of Library items including books, serials, computer files, manuscripts, maps, music and visual materials.  The free data sets cover more than 45 years, ranging from 1968, during the early years of MARC, to 2014.  Each record provides standardized information about an item, including the title, author, publication date, subject headings, genre, related names, summary and other notes.

Library of Congress Reading Room, from https://www.loc.gov

The data is available in UTF-8, MARC8, and XML formats, and has been conveniently divided by media type including books, computer files, maps, music, and more.
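For readers who want to see what working with the XML version might look like, here is a minimal Python sketch that pulls the title and author out of MARCXML records using only the standard library. It relies on two standard MARC conventions: field 245 subfield a holds the title and field 100 subfield a holds the main author. The record in `SAMPLE` is an invented example, not taken from the LOC files, and the helper names `subfield` and `summarize` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Invented sample record for illustration only (not from the LOC data sets).
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">example-0001</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Dana, Richard Henry,</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Two years before the mast /</subfield>
      <subfield code="c">Richard Henry Dana, Jr.</subfield>
    </datafield>
  </record>
</collection>"""


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text if node is not None else None


def summarize(xml_text):
    """Collect title (245$a) and author (100$a) for every record."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(f"{{{MARC_URI}}}record"):
        rows.append({
            "title": subfield(record, "245", "a"),
            "author": subfield(record, "100", "a"),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row["author"], row["title"])
```

Because the bulk files are large, a real project would probably stream records (for example with `ET.iterparse`) rather than load a whole file into memory at once.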

We’ve added the resource to the public section of the Computational Text Analysis and Text Mining Guide, where you can find many other sources for large-scale text analysis projects. For more information, take a look at the LOC’s Getting Started (PDF) for details on accessing the data.

Questions?

Stacy Reardon, Literatures and Digital Humanities Librarian, sreardon [at] berkeley.edu

Cody Hennesy, E-Learning and Information Studies Librarian, chennesy [at] berkeley.edu

California Visual Resources Association Conference, UC Berkeley, June 12 + 13

Registration is now open for the California Visual Resources Association Conference (CaVraCon). All CaVraCon events will be held at Wurster Hall at UC Berkeley on June 12+13. We welcome information professionals in archives, commercial enterprises, libraries, museums, and visual resources collections (academic, corporate, private) as well as students and interested members of the public to attend CaVraCon 2017.

 

Please see the online registration form to register. Registration is $50 (or $25 for students and retirees).

 

The program is now live on the CaVraCon conference website! Please also see the website for information on travel, accommodations, and the conference venue, Wurster Hall at UC Berkeley.

 

The CaVraCon conference program features presentations and panel discussions on topics such as:

  • Digitization
  • Digital Preservation
  • Copyright
  • 3D/VR
  • Emerging Technologies
  • Digital Humanities
  • Digital Art History
  • Digital Exhibits
  • Digital Assets Management
  • Image Metadata

Know Your Copyrights: A Review of Copyright and Fair Use for Digital Projects

By Jessica Martinez

From the beginning stages of research to the final steps of publishing, understanding copyright rules is essential to properly reproducing or linking to sources in your own dissertation, article, website, or digital project. With this issue always at the forefront of student and faculty needs, the D-Lab hosted an informational workshop led by Rachael Samberg, a former copyright attorney and the current UC Berkeley Library Scholarly Communication Officer.

Workshops: Digital Publishing

Whether you are looking to create a companion website for your book or a full-scale digital project, this workshop series is designed to get you up and running with the user-friendly, open source web publishing platforms Scalar, WordPress, Omeka and Drupal.

  • All platforms are easily managed right through your web browser.
  • No programming or coding knowledge is required.
  • Options for hosting will be covered.
  • Technology workshops will be hands-on; bring a laptop if you can.

This series is designed for faculty, graduate students, and staff in the Humanities and Social Sciences and is open to any member of the UC Berkeley community. Register at bit.ly/dp-berk

WordPress for Easy and Attractive Websites
Thursday, April 20, 4-5pm
Academic Innovation Studio, Dwinelle Hall 117 (Level D)

In this hands-on workshop, we will learn the basics of creating a WordPress site, a web-based platform good for blogs, scholarly portfolios, and websites. By the end of the workshop, you will know how to post content, embed images and video, customize themes and appearance, and work with plugins.
Register

Omeka for Digital Collections and Exhibits
Wednesday, April 26, 1-2pm
D-Lab, 350 Barrows Hall

Omeka is ideal for creating and displaying an online collection or exhibit composed of many digital items. If you have a bunch of digital images, scans, and files around a certain theme or project, and you would like to organize, describe, and showcase these files, Omeka may be a good fit for you. In this hands-on workshop, we will learn how to add and describe items in Omeka, the basics of the Dublin Core metadata schema, and how to create webpages with the Simple Pages plugin.
Register

Copyright and Fair Use for Digital Projects
Thursday, April 27, 11-12noon
D-Lab, 350 Barrows Hall

This training will help you navigate the copyright, fair use, and usage rights of including third-party content in your digital project. Whether you seek to embed video from other sources for analysis, post material you scanned from a visit to the archives, add images, upload documents, or more, understanding the basics of copyright and discovering a workflow for answering copyright-related digital scholarship questions will make you more confident in your publication. We will also provide an overview of your intellectual property rights as a creator and ways to license your own work.

Designing in Drupal
Friday, April 28, 11-12noon
Academic Innovation Studio, Dwinelle Hall 117 (Level D)

Drupal is a powerful open source content management system that provides a flexible platform for developing web-based digital research projects. This workshop will cover the basics of how Drupal works, how you can create templates for storing your research materials, and how you can organize, display, and analyze those materials. Drupal is a good choice for many kinds of projects, including websites and projects underpinned by a database.

Scalar for Multimedia Digital Projects
Tuesday, May 2, 5-6pm
Berkeley Center for New Media Commons, 340 Moffitt

Developed by the Alliance for Networking Visual Culture, Scalar is a web platform designed especially for multimedia digital projects and for multimedia academic texts. Like WordPress, it makes creating content easy, but it is distinguished by multiple ways of navigating through a project, annotation and metadata features, and image and video options. Choose it to develop born-digital projects and books, or as a companion site for traditional scholarship. In this hands-on workshop, we’ll learn how to create a Scalar project, create pages and media, add metadata and annotations, and define paths.

Register at bit.ly/dp-berk

Art+Feminism Wikipedia Edit-a-Thon

Tuesday, March 21st
1pm-6pm
Moffitt 405

Wikimedia’s gender trouble is well-documented. While the reasons for the gender gap are up for debate, the effect is not: content is skewed by the lack of female participation. Let’s change that! Drop by the Wikipedia Edit-a-Thon, learn how to edit Wikipedia and make a few changes of your own!

People of all gender identities and expressions welcome. Bring a laptop (or use one of ours). No editing experience is necessary; we’ll provide training and assistance. Drop in for half an hour or stay for the whole afternoon. Food and drink will be provided.

Learn more!

 

Go from Analog to Digital Texts with OCR

A collection of digitized texts marks the start of a research project —  or does it?

For many social sciences and humanities researchers, creating searchable, editable, and machine-readable digital texts out of heaps of paper in archival boxes or from books painstakingly sourced from overlooked corners of the library can be a tedious, time-consuming process.

Scholars using traditional methodologies may find it advantageous to have a digital copy of their source material, if only to be able to more easily search through it. For anyone who wants to use computational methods and tools, converting print sources to digital text is a prerequisite. The process of converting an image of scanned text to digital text involves Optical Character Recognition (OCR) software. New developments in campus services are providing additional options for researchers who wish to prepare their texts this way.

What resources does UC Berkeley offer to convert scans to digital text?

  • For basic needs, try the Library’s scanners.
  • For documents with complex layouts or for additional language support, ABBYY FineReader with Berkeley’s OCR virtual desktop is a solution.
  • Finally, Tesseract can handle large-scale OCR projects.

Books and simple documents: library scanners with OCR software

All of the UC Berkeley libraries, including the Main (Gardner) Stacks, have at least one Scannx scanner station with built-in OCR software. This software automatically identifies and splits apart pages when you’re scanning a book, and it performs OCR on any text it can identify. You can save your results as a “Searchable PDF” (with embedded OCR output) or as a Microsoft Word document, or you can save page images as TIFF, JPEG, or PDF files (omitting digitized text). For book scanning or simple document scanning, the library scanners can take you from analog to digital in a single step.

Complex layouts or language support: ABBYY FineReader and Berkeley Research Computing’s OCR virtual desktop

If your source material has a complex layout (like irregular columns, embedded images, and/or tables that you want to continue to edit as tables) or uses a non-Latin alphabet, ABBYY FineReader OCR may get you better OCR results. FineReader supports Arabic, Chinese, Cyrillic, Greek, Hebrew, Japanese, and Thai, among other languages.

On campus, FineReader is available on computers in the D-Lab (350 Barrows). From off campus, the OCR virtual research desktop provided through Berkeley Research Computing’s AEoD service (Analytic Environments on Demand, pronounced “A-odd”) allows users to log into a virtual Windows environment from their own laptop or desktop computer anywhere there’s an internet connection. If you’re visiting an archive and aren’t sure that your image capture setup is getting good enough results to use as OCR input, you can log into the OCR virtual research desktop and try out a couple samples, then refine your process as needed. You can also work on your OCR project from home, or on nights and weekends when campus buildings are closed. To use the OCR virtual research desktop, sign up for access at http://research-it.berkeley.edu/ocr.

FineReader is not generally recommended for very large numbers of PDFs because each conversion must be started by hand. However, if you don’t need to differentiate the origin of your various source PDFs (e.g., if your text analysis will treat all text as part of a single corpus, and it doesn’t matter which of the million PDFs any particular bit of text originally came from), you might be able to use FineReader by creating one or more “mega-PDFs” that combine tens or hundreds of source PDFs and letting it run over a long period of time. At a certain point, however, Tesseract might be a better choice.
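To make the “mega-PDF” idea a little more concrete, here is a small Python sketch of the planning step only: it sorts a list of source PDF names and splits it into groups of at most 100, one group per combined file. The function name `plan_batches`, the batch size, and the file names are hypothetical. The actual merging would need a separate PDF tool, which is not shown here.

```python
def plan_batches(pdf_paths, batch_size=100):
    """Split source PDF paths into ordered groups, one group per mega-PDF."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(pdf_paths)  # stable, predictable order across runs
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    # Hypothetical file names standing in for a real folder listing.
    sources = [f"doc{i:03d}.pdf" for i in range(250)]
    for number, batch in enumerate(plan_batches(sources), start=1):
        print(f"mega-{number:02d}.pdf <- {len(batch)} files, starting with {batch[0]}")
```

Saving the printed plan alongside the combined files is one way to keep at least a coarse record of which sources went into which mega-PDF.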

OCR at scale: Tesseract on the Savio high-performance compute cluster

If you have thousands, hundreds of thousands, or millions of PDFs to OCR, a high-powered, automated solution is usually best. One such option is the open source OCR engine Tesseract. Research IT has installed Tesseract in a container that you can use on the Savio high-performance computing (HPC) cluster. For researchers who are less comfortable with the command line, there is also a Jupyter notebook available that provides the necessary commands and “human-readable” documentation, in a form that you can run on the cluster. Any tenure-track faculty member is eligible for a Faculty Computing Allowance for using Savio. If you are a graduate student, talk to your advisor about signing up for an allowance and receiving access.
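As a rough illustration of what automated OCR can look like, the Python sketch below loops over a folder of page images and calls the Tesseract command-line program on each one. It is not the Research IT container or notebook. The folder names, the `*.tif` pattern, and the helper names `build_tesseract_command` and `ocr_directory` are assumptions made for this example, and it assumes that the `tesseract` program is installed and that pages have already been exported as image files.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_dir, lang="eng"):
    """Build the command 'tesseract <image> <output base> -l <lang>'.

    Tesseract appends '.txt' to the output base name by itself.
    """
    image_path = Path(image_path)
    output_base = Path(output_dir) / image_path.stem
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_directory(input_dir, output_dir, lang="eng", pattern="*.tif"):
    """Run Tesseract on every matching image; return the expected text files."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(input_dir).glob(pattern)):
        command = build_tesseract_command(image_path, output_dir, lang)
        subprocess.run(command, check=True)  # stop on the first failed page
        written.append(Path(output_dir) / (image_path.stem + ".txt"))
    return written


if __name__ == "__main__":
    # Hypothetical folders: 'scans' holds page images, 'text' receives output.
    for text_file in ocr_directory("scans", "text"):
        print("wrote", text_file)
```

For sources in other languages, the `lang` argument takes Tesseract's language codes (for example `nld` for Dutch), provided the matching language data is installed.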

No matter how large or small your OCR project is, UC Berkeley has the perfect tool for you in scanning equipment, ABBYY FineReader, or Tesseract. Happy converting!

 

Related Event: From Sources to Data: Using OCR in the Classroom

March 16, 2017

10:30am to 12:00pm

Open to: All faculty, graduate students, and staff

 

Questions?

Quinn Dombrowski, Research IT  quinnd [at] berkeley.edu

Stacy Reardon, Library  sreardon [at] berkeley.edu

Thank you to Cody Hennesy for suggestions. Cross-posted on the D-Lab blog and the Research IT blog.

Q & A with Julie van den Hout

Van den Hout, a 2015 graduate from Berkeley, has been a Digital Humanities Project Archivist at the Bancroft since October 2016.  Julie came to Berkeley having discovered her passion for historical research after working for some years in health care. Her honors thesis explored a 17th-century Dutch book aimed at potential immigrants to what is now New York; it was awarded the Charlene Conrad Liebau Library Prize for Undergraduate Research (honorable mention).

Julie van den Hout. Photograph by Alejandro Serrano for the University Library.

What inspires you about your position?
It has been an honor to work with the Engel Sluiter Historical Documents Collection. This immense personal research collection, donated to the Bancroft Library, is truly a legacy of Dr. Sluiter’s life work as a UC Berkeley Latin American History professor and researcher. By bringing together materials from archives worldwide, the collection provides detailed Spanish, Dutch, English, and Portuguese perspectives on sixteenth- and seventeenth-century Atlantic trade. Through a UC Berkeley Digital Humanities Collaborative Research grant, Dutch Studies and the Bancroft Library are working together to digitize a small subset of the collection on the seventeenth-century colony of New Netherland (now New York), and then analyze the texts using natural language processing. Working with the primary sources in the Engel Sluiter Collection has taught me much more than I could ever learn in a classroom. I am excited about the capabilities of digital humanities, and what our current project will reveal about the Dutch colony.

Your priorities over the next 6-9 months
My overarching goal with the project is to enhance search capabilities for the Engel Sluiter Collection, and help make these impressive, but relatively unexplored, materials more accessible to researchers. My immediate focus is to reconcile the Dutch documents with the digitized OCR outputs, in preparation for processing using our in-house topic modeling application that will identify themes within the corpus. The documents will then be published online, with the new search capabilities freely available to researchers worldwide. At the same time, I will be working to learn more about the potential of digital humanities and applying our model to other historical texts.

Opportunities at the Berkeley campus and the Library
The libraries of UC Berkeley are a treasure trove of primary and secondary sources for study of the Atlantic World. Being at a research institution is a great way to find support for new ideas, and digital humanities at UC Berkeley is at the cutting edge in its field. While some people see digital humanities as trying to replace traditional scholarship, I see it as being able to enhance and partner with traditional scholarship to add new dimensions or perspectives. In our project, for example, the use of technology is helping us find connections in the historical data and texts that may not be readily visible in traditional formats.

A favorite book or favorite campus hangout
My favorite book is Two Years Before the Mast, by Richard Henry Dana, Jr., Dana’s own memoir of two years spent as an ordinary sailor on a merchant ship sailing between Massachusetts and California around Cape Horn in the early 1800s. Long before I became interested in academic history, Dana’s engaging accounts of life at sea and trading hides along the Pacific Coast brought the past to life for me. His descriptions of colorful, pre-gold rush Mexican California opened my eyes to a California I had never learned about in history books.

Gallica Gives

"Ça, mon enfant, c'est du pain.. [That, my child, is bread...]" par Gottlob in L'Assiette au beurre (1901)

Online since 1997, Gallica remains one of the major digital libraries available for free on the Internet. With more than 12 million high-resolution digital objects from the collections of the Bibliothèque Nationale de France (BnF) as well as from hundreds of partner institutions, it includes books, journals, newspapers, manuscripts, maps, images, audio files, and more. The illustration above “Ça, mon enfant, c’est du pain.. [That, my child, is bread…]” by Fernand-Louis Gottlob was published in one of the first issues of the weekly satirical magazine L’Assiette au beurre (1901-1936) which is also held in print at UC Berkeley. Committed to the ever-evolving needs of its user community, Gallica’s social media outlets include Facebook, Twitter, Pinterest and even a BnF app.
