Posts by Author: asackmann

Overleaf and ShareLaTeX – Joining Forces!

Overleaf and ShareLaTeX, two online collaborative LaTeX editors, will soon merge into one platform that draws on the strengths of each. Both tools launched around 2012 and have since seen incredible growth, and users have come to rely on them as useful, long-term tools. In January 2017, the UC-Berkeley Library subscribed to both in order to provide our researchers and students with pro account features. Both tools let users collaborate with groups and individuals on documents; simplify file directories; provide real-time previews; quickly identify errors; and offer access to excellent training tools and hundreds of templates from publishers, covering many types of documents, not just articles. If you regularly write documents in LaTeX, consider integrating one of these tools into your workflow. Both integrate with citation management software and with git or GitHub, and both provide revision history. Overleaf and ShareLaTeX contribute to a research workflow of transparency and preservation, both of which lend themselves well to sharing and revisiting data and notation, whether by others or by your future self.
Individually, ShareLaTeX and Overleaf have focused on developing different strengths.
  • WYSIWYG editor
  • publisher relationships for streamlined submission process
  • integration with Mendeley (which we also have an institutional subscription to!)

The merger of the two platforms will focus on bringing together the strongest components of each tool. For now, you can continue to create accounts on either platform and carry on with your work. The founders of ShareLaTeX and Overleaf would like input from their users through this survey.

In the meantime, please join us at the Kresge Engineering Library to learn more about LaTeX and how to write in ShareLaTeX and Overleaf. We will be holding three workshops at the beginning of fall semester in the library's Training Room:

August 24th, 4:00 – 5:00: Introduction to LaTeX
August 31st, 4:00 – 5:00: Typesetting in Math
September 7th, 4:00 – 5:00: Creating Tables, Figures, and Bibliographies

Please register through this form.

Please let us know if you have any questions about the Overleaf and ShareLaTeX merger, or the upcoming workshops.

GitHub: Archiving and Repositories

GitHub has become ubiquitous in the coding world and, with the advent of data science and computation in a slew of other disciplines, researchers are turning to the version control repository and hosting service. Google uses it, Microsoft uses it, and it’s on the list of the 100 most popular sites on Earth. As a librarian and a member of the Research Data Management team, I often get the question: “Can I archive my code in my GitHub repository?” From the research data management perspective, the answer is a little sticky.

The terms “archive” and “repository” mean something very different on GitHub than they do from a research data management perspective. For example, in GitHub, a repository “contains all of the project files…and stores each file’s revision history.” Archiving content on GitHub means that your repository will stay on GitHub until you choose to remove it (or until GitHub receives a DMCA takedown notice, or finds that it violates their guidelines or terms of service).

For librarians, research data managers, and many funders and publishers, archiving content in a repository involves more stringent requirements. For example, Dryad, a commonly known repository, requires those who wish to remove content to go through a lengthy process proving that the work has been infringed or is not in compliance with the law (read more about removing content from Dryad here). Most importantly, Dryad (like many other repositories) takes specific steps to preserve the research materials. For example:
* persistent identification
* fixity checks
* versioning
* multiple copies are kept in a variety of storage sites

A good repository provides persistent access to materials, enables discovery, and, while it cannot guarantee against data loss, takes multiple steps to prevent it.
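To make the idea of a fixity check more concrete, here is a minimal sketch in Python. It is an illustration only, not how Dryad, Zenodo, or any other repository actually implements its checks. It assumes SHA-256 as the checksum algorithm, and the function names `sha256_of` and `fixity_ok` are hypothetical. The idea is simply to record a checksum when a file is deposited, recompute it later, and compare the two.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """Return True if the file still matches the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest


if __name__ == "__main__":
    # Hypothetical demo: "deposit" a small file, then simulate corruption.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example research data\n")
    recorded = sha256_of(tmp.name)            # checksum stored at deposit time
    print("unchanged:", fixity_ok(tmp.name, recorded))

    with open(tmp.name, "ab") as handle:      # simulate silent modification
        handle.write(b"unexpected bytes")
    print("after change:", fixity_ok(tmp.name, recorded))
    os.remove(tmp.name)
```

A repository would run a comparison like this on a schedule and against multiple stored copies, so that a damaged copy can be detected and replaced from a good one.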

So, how can you continue to work efficiently through GitHub and adhere to good archival practices? GitHub links up with Zenodo, a repository based out of CERN. Data files are stored at CERN, with another site in Budapest. All data is backed up on a daily basis, with regular fixity and authenticity checks. Zenodo assigns a digital object identifier to your code, making it persistently identifiable and discoverable. Check out this guide on Making Your Code Citable for more information on linking your GitHub with Zenodo. Zenodo isn’t perfect and there are a few limitations, including a max file size of 50 GB. Read more about their policies here.

UC-Berkeley has its own institutional version of GitHub, which means that Berkeley development teams and individual contributors can now have private repositories (and private, shared repositories within the Berkeley domain). If you’d like access, please email github@berkeley.edu. Additionally, we have institutional subscriptions to Overleaf and ShareLaTeX, both of which integrate with GitHub.

Please contact researchdata@berkeley.edu if you’d like more information about archiving your code on GitHub.

Elsevier, Springer Nature, and AAAS: Publisher Research Data Policies

Ever since the Office of Science and Technology Policy introduced a policy addressing the public’s access to data, federal granting agencies, non-profit granting agencies (like the Gates Foundation), publishers, universities, and researchers have been adjusting to reflect changes in access to data at the national level. The policy requires federal agencies with over $100 million in annual research and development expenditures to make research results public and provide a plan for doing so.

As a researcher, this is a difficult landscape to navigate for a number of reasons:
  • you may have entered into a research project mid-grant and are unaware of the data management plan that was included in the grant proposal
  • the data management plan that was included in the grant application is not being followed
  • you’re not sure how funder mandates line up with publisher requirements
  • the language that publishers include about data sharing or publishing isn’t straightforward
  • you know that you’re supposed to make your data public, but you don’t know where to do this or how to do this

There are a number of other obstacles that make data publishing difficult, but for today, let’s take a look at the data sharing policies of three publishers in the Engineering and Physical Sciences. Publishers will often use suggestive or idealistic language, but does that mean you’re off the hook for sharing? If your publisher requires that you make your data public, how do you comply with your funder data mandate and your publisher data policy?

Elsevier is a massive publisher that currently publishes more than 2,000 journals in Health, Life Sciences, Physical Sciences and Engineering, and Social Sciences and Humanities. They also publish books and major reference works, and somewhat recently acquired Mendeley, the citation management software. Their most recent product, Mendeley Data, is a cloud-based repository for datasets. To sum it up – Elsevier is huge. They’ve divided their research data policy into two parts – Principles (the expectations, “shoulds,” and “needs” underpinning their research data policy) and Policy (what they actually do). Elsevier’s principles are idealistic and sound great, while their policies are suggestive.

For example, one of Elsevier’s Data Sharing Principles:
“Research data should be made available free of charge to all researchers wherever possible and with minimal reuse restrictions.”

“We will encourage and support researchers and research institutions to share data where appropriate and at the earliest opportunity.”

In their Research data FAQ section they answer the question:
“Is it compulsory to share my research data?”
A: No.

They’ve taken an interesting approach that sets up researchers to share their data (if prepared to do so), without being prescriptive. Elsevier makes it easy to link to datasets in other repositories, and has even started their own repository with Mendeley Data (that’s another blog post for another day). Elsevier has also jumped into the data journal game, with their open access Data in Brief publication. Data publications are emerging as a way for researchers to write an additional article that provides an in-depth description of datasets behind research. This article format provides data, which is typically buried in supplementary material, another avenue for discovery.

Imagine what could happen to the world of data sharing if a research giant like Elsevier made their policies less like principles and required research data sharing instead of suggesting it.

Springer Nature was formed when Springer and the Nature Publishing Group announced a merger in January of 2015. The new publishing giant produces about 13% of the papers in the scholarly publishing market, still behind Elsevier (23%) (Scholarly Kitchen). About a year after the merger, the new publisher developed an approach to research data policies that would allow them to remain flexible across their wide range of journals.

Four different policy types:
  1. data sharing and data citation encouraged
  2. data sharing and evidence of data sharing encouraged
  3. data sharing encouraged and statements of data availability required
  4. data sharing, evidence of data sharing and peer review of data required

The Springer Nature approach allows for flexibility and takes into account the current practices of each discipline the publisher supports. However, prior to submission, you need to know which policy your Springer Nature journal follows (yet another argument for following good data management practices from the start). Let’s take a closer look at each policy.

  • Research Data Policy Type 1 is the most lenient: it encourages data citation and sharing. I like to think of policy 1 as “data sharing lite,” because Springer Nature provides you with information about how to share and cite data, but you don’t necessarily have to. A few titles that fit into this category are: Academic Questions, Accreditation and Quality Assurance, Aesthetic Plastic Surgery, Contemporary Islam, and Journal of Happiness Studies.
  • Research Data Policy Type 2 requires the authors to be more open with their relevant raw data by implying that the data will be available to any researcher who would like to reuse them for non-commercial purposes (barring confidentiality issues). This policy falls somewhere between “optional” and “mandatory.” The publisher is telling readers of its policy 2 journals that this data is freely available for them to reuse, thereby warning, or preparing, the authors that they may be asked for their data. The easiest way to handle requests like this is to make the data publicly available in a repository, with a citation and an assigned digital object identifier. A few examples of type 2 journals include: Agronomy for Sustainable Development, BioEnergy Research, Brain Imaging and Behavior, and Journal of Geovisualization and Spatial Analysis.
  • Research Data Policy Type 3 is geared specifically for journals that publish research on the life sciences. When an author submits to policy 3 journals, they are strongly encouraged to deposit data in repositories. It is implied that all raw data is freely available (again, barring confidentiality issues) to any researcher who requests it. For policies 1 and 2, authors may deposit data in general repositories. However, for policy 3, researchers must deposit specific types of data in a list of prescribed repositories. For example, DNA and RNA sequencing data must be deposited in the NCBI Trace Archive or the NCBI Sequence Read Archive (SRA). A few examples of type 3 journals include: Journal of Hematology and Oncology, Nature Cell Biology, and Nature Chemistry.
  • Research Data Policy Type 4 requires that all of the datasets supporting the paper’s conclusions be available to reviewers and readers. The datasets have to be available in repositories prior to the peer review process (or be made available in supplementary material), and publication is conditional upon the data being in the appropriate repository. Examples of type 4 journals include BMC Biology, Genome Biology, and Retrovirology.

AAAS, the American Association for the Advancement of Science, is much smaller in scope than Springer Nature and Elsevier. AAAS is both a professional society and a reputable publisher of six journals: Science; Science Translational Medicine; Science Signaling; Science Advances; Science Immunology; and Science Robotics. Unlike the other two publishers, AAAS can set tight and strict policies surrounding research data because they publish a small percentage of what the other two produce. Datasets must be deposited in approved repositories with an accession number prior to publication. AAAS encourages compliance with MIBBI (Minimum Information for Biological and Biomedical Investigations) guidelines. AAAS provides a list of approved repositories based on data type (similar to Springer Nature type 4). AAAS stipulates not only that data must be available, but also that all materials necessary to understand and assess the research must be made available. This includes code, patents, and even fossils or rare specimens. Please see AAAS’s publication policies for more information.

These publishers are ordered on a scale from “suggestive” and “encouraging” data policies (Elsevier) to strict mandates for sharing research materials (AAAS). Ultimately, you should prepare your data and supporting research materials, like code, from the beginning of a research project as if you were going to publish in an AAAS journal. There are more reasons to do that than following publisher data sharing mandates, which I’ll explore in future posts.

Virtual Reality for Cal Day

The Kresge Engineering Library will be one of the host sites for VR @ Berkeley, a student group that brings virtual reality to the campus community. By working with industry and UC-Berkeley researchers, VR @ Berkeley makes virtual reality an accessible experience. Each year, members of the group focus on a wide range of projects that explore the intersection between our physical realities and the virtual. Their work spans many applications, including changing the way we read and interact with textbooks, allowing medical workers in the field to communicate with doctors in a more intuitive manner, and creating a virtual experience of our iconic, 61-bell Campanile.

During Cal Day, the Kresge Engineering Library will be hosting Project Landships, a multiplayer tank combat simulator. Players can work together as a crew to aim, shoot, drive, and spot. The experience emulates a WWII Sherman Firefly Tank.

Check out other VR @ Berkeley Projects on Cal Day at the following locations:
1. Kresge Engineering Library
2. ESS Patio
3. Jacobs Hall
4. Sproul Plaza
5. The House (Bancroft)
6. Moffitt Library





Global Engineering Academic Challenge

It’s time again for the Global Engineering Academic Challenge! Starting today, Monday, October 10th, Elsevier will post a challenge question each Monday for the next 5 Mondays (5 questions total). Complete this interdisciplinary challenge with your instructors and peers by solving problem sets built around 5 transdisciplinary themes, including Future of Energy, Future of Making, and Future of Medicine.

Each week, the participant with the most points will receive $100 to Amazon. The first place grand prize is an Apple iPad and the second place prize is a set of Sonos speakers.

Visit the Engineering Academic Challenge to begin!

DMPTool Updates for August 2016

The crew over at the University of California Curation Center (UC3) and the California Digital Library are working hard to continue to bring big updates to the DMPTool. First off, they’ve added new data management plan templates for the Department of Transportation and NASA. They’re busy working on adding DOD (Department of Defense) and NIJ (National Institute of Justice) templates, but if you’d like another template added, please let them know and send a message here.

Additionally, they’re moving forward to create machine-actionable DMPs. This means that institutions will be able to better manage their data; DMPs will be data mineable; and researchers can better discover data. Read more about the benefits of machine-actionable DMPs at the DMPTool blog.

New Resource: Corrosion Database

Springer Materials recently announced the launch of their new Corrosion Database. The Corrosion Database lives in Springer Materials and was compiled from various data and literature from the National Institute of Standards and Technology (NIST). The database contains over 24,000 unique records of corrosion rates/ratings and can be searched by material, environment, or both. Results are given by corrosion rating so that you can find the most (or least) resistant material for any given application. For example, the database provides data on how seawater corrodes 164 different types of steel and the rate of corrosion.

Users can also download citations from the database in .bib, .EndNote, or .ris file formats.

Visit the SpringerMaterials database to begin using the new Corrosion Database.

Data Visualization Workshop: Thursday, July 7th, 12:00 pm

A well-designed figure can have a huge impact on the communication of research results. This workshop will introduce key principles and resources for visualizing data:

  • Choosing when to use a visualization
  • Selecting the best visualization type for your data
  • Choosing design elements that increase clarity and impact
  • Avoiding visualization issues that obscure or distort data
  • Finding tools for generating visualizations

Date: Thursday, July 7

Time: 12:00 – 1:00

Location: Bioscience Library Training Room, 2101 VLSB (inside the library)

Add this workshop to your bCal


  • Anna Sackmann, Science Data and Engineering Librarian
  • Becky Miller, Environmental Sciences and Natural Resources Librarian
  • Elliott Smith, Emerging Technologies Librarian

Open to all; no registration is required. Please forward to interested colleagues.

Questions? Please contact esmith@berkeley.edu

Big Changes for the DMPTool, but first, a little downtime.

During the month of May, project developers for the DMPTool and DMPOnline (the UK’s version) began combining documentation to create the DMPRoadmap. Coming next year, the DMPTool and DMPOnline will merge into one Data Management Plan service that can be used internationally and that combines the best features of the current DMPTool and DMPOnline. You can follow their progress via their GitHub Repository: DMPRoadmap.

Stay tuned for updates. In the meantime, the DMPTool will experience brief downtime for mini-maintenance on Wednesday, June 8, 2016, from 4:00 – 4:30 (PST).

DMPTool Downtime Wednesday May 4th

The DMPTool will be unavailable on Wednesday, May 4th, 2016, from 3:00 – 4:00 (PST). During this period, users will not be able to log in or access their work. We apologize for the inconvenience.

For questions about the DMPTool or other data management tools and services available to UC Berkeley researchers, please see our Research Data Management page or contact researchdata@berkeley.edu.
