Posts by Author: asackmann
GitHub has become ubiquitous in the coding world and, with the advent of data science and computation in a slew of other disciplines, researchers are turning to the version control repository and hosting service. Google uses it, Microsoft uses it, and it’s on the list of the top 100 most popular sites on Earth. As a librarian and a member of the Research Data Management team, I often get the question: “Can I archive my code in my GitHub repository?” From the research data management perspective, the answer is a little sticky.
The terms “archive” and “repository” mean something very different on GitHub than they do from a research data management perspective. For example, in GitHub, a repository “contains all of the project files…and stores each file’s revision history.” Archiving content on GitHub means that your repository will stay on GitHub until you choose to remove it (or until GitHub receives a DMCA takedown notice, or the repository violates their guidelines or terms of service).
For librarians, research data managers, and many funders and publishers, archiving content in a repository carries more stringent requirements. For example, Dryad, a commonly known repository, requires those who wish to remove content to go through a lengthy process proving that the work has been infringed or is not in compliance with the law (read more about removing content from Dryad here). Most importantly, Dryad (and many other repositories) takes specific steps to preserve the research materials. For example:
* persistent identification
* fixity checks
* multiple copies kept at a variety of storage sites
A good repository provides persistent access to materials, enables discovery, and, while it cannot guarantee against data loss, takes multiple steps to prevent it.
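To make the idea of a fixity check concrete, here is a minimal sketch in Python. It is not how Dryad or any particular repository implements fixity; the function names and the in-memory manifest format are assumptions made for illustration. The idea is simply to record a checksum for each file at deposit time and later confirm that the files still produce the same checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (hypothetical manifest format)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum no longer matches."""
    changed = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel_path)
    return changed
```

In practice you would build the manifest once, store it somewhere separate from the data, and rerun `check_fixity` on a schedule; an empty list means nothing has changed since the manifest was made.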
So, how can you continue to work efficiently through GitHub and adhere to good archival practices? GitHub links up with Zenodo, a repository based out of CERN. Data files are stored at CERN, with a second site in Budapest. All data is backed up daily, with regular fixity and authenticity checks. Zenodo assigns a digital object identifier to your code, making it persistently identifiable and discoverable. Check out this guide on Making Your Code Citable for more information on linking your GitHub with Zenodo. Zenodo isn’t perfect and there are a few limitations, including a max file size of 50 GB. Read more about their policies here.
UC-Berkeley has its own institutional version of GitHub, which means that Berkeley development teams and individual contributors can now have private repositories (and private, shared repositories within the Berkeley domain). If you’d like access, please email email@example.com. Additionally, we have institutional subscriptions to Overleaf and ShareLaTeX, both of which integrate with GitHub.
Please contact firstname.lastname@example.org if you’d like more information about archiving your code on GitHub.
- you may have entered into a research project mid-grant and are unaware of the data management plan that was included in the grant proposal
- the data management plan that was included in the grant application is not being followed
- you’re not sure how funder mandates line up with publisher requirements
- the language that publishers include about data sharing or publishing isn’t straightforward
- you know that you’re supposed to make your data public, but you don’t know where to do this or how to do this
- data sharing and data citation is encouraged
- data sharing and evidence of data sharing encouraged
- data sharing encouraged and statements of data availability required
- data sharing, evidence of data sharing and peer review of data required
- Research Data Policy Type 1 is the most lenient: it encourages data citation and sharing. I like to think of policy 1 as “data sharing lite,” because Springer Nature provides you with information about how to share and cite data, but you don’t necessarily have to. A few titles that fit into this category are: Academic Questions, Accreditation and Quality Assurance, Aesthetic Plastic Surgery, Contemporary Islam, and Journal of Happiness Studies.
- Research Data Policy Type 2 requires the authors to be more open with their relevant raw data by implying that the data will be available to any researcher who would like to reuse them for non-commercial purposes (barring confidentiality issues). This policy falls somewhere between “optional” and “mandatory.” The publisher is telling readers of its policy 2 journals that the data are freely available for them to reuse, which warns, or prepares, the authors that they may be asked for their data. The easiest way to handle requests like this is to make the data publicly available in a repository, with a citation and an assigned digital object identifier. A few examples of type 2 journals include: Agronomy for Sustainable Development, BioEnergy Research, Brain Imaging and Behavior, and Journal of Geovisualization and Spatial Analysis.
- Research Data Policy Type 3 is geared specifically for journals that publish research on the life sciences. When an author submits to policy 3 journals, they are strongly encouraged to deposit data in repositories. It is implied that all raw data is freely available (again, barring confidentiality issues) to any researcher who requests it. For policies 1 and 2, authors may deposit data in general repositories. However, for policy 3, researchers must deposit specific types of data in a list of prescribed repositories. For example, DNA and RNA sequencing data must be deposited in the NCBI Trace Archive or the NCBI Sequence Read Archive (SRA). A few examples of type 3 journals include: Journal of Hematology and Oncology, Nature Cell Biology, and Nature Chemistry.
- Research Data Policy Type 4 requires that all of the datasets supporting the paper’s conclusions be available to reviewers and readers. The datasets have to be available in repositories prior to the peer review process (or be made available in supplementary material), and publication is conditional upon the data being in the appropriate repository. Examples of type 4 journals include BMC Biology, Genome Biology, and Retrovirology.
The Kresge Engineering Library will be one of the host sites for VR @ Berkeley, a student group that brings virtual reality to the campus community. By working with industry and UC-Berkeley researchers, VR @ Berkeley makes virtual reality an accessible experience. Each year, members of the group focus on a wide range of projects at the intersection of our physical realities and the virtual. Their work spans many applications, including changing the way we read and interact with textbooks, allowing medical workers in the field to communicate with doctors in a more intuitive manner, and a virtual experience of our iconic, 61-bell Campanile.
During Cal Day, the Kresge Engineering Library will be hosting Project Landships, a multiplayer tank combat simulator. Players can work together as a crew to aim, shoot, drive, and spot. The experience emulates a WWII Sherman Firefly Tank.
Check out other VR @ Berkeley Projects on Cal Day at the following locations:
1. Kresge Engineering Library
2. ESS Patio
3. Jacobs Hall
4. Sproul Plaza
5. The House (Bancroft)
6. Moffitt Library
It’s time again for the Global Engineering Academic Challenge! Starting today, Monday, October 10th, Elsevier will post a challenge question each Monday for the next 5 Mondays (5 questions total). Complete this interdisciplinary challenge with your instructors and peers by solving problem sets built around 5 transdisciplinary themes, including Future of Energy, Future of Making, and Future of Medicine.
Each week, the winner with the highest points will receive $100 to Amazon. The first place grand prize is an Apple iPad and the second place prize is a set of Sonos speakers.
Visit the Engineering Academic Challenge to begin!
The crew over at the University of California Curation Center (UC3) and the California Digital Library are working hard to continue to bring big updates to the DMPTool. First off, they’ve added new data management plan templates for the Department of Transportation and NASA. They’re busy working on adding DOD (Department of Defense) and NIJ (National Institute of Justice) templates, but if you’d like another template added, please let them know and send a message here.
Additionally, they’re moving forward to create Machine-actionable DMPs. This means that institutions will be able to better manage their data; DMPs will be data mineable; and researchers can better discover data. Read more about the benefits of Machine-actionable DMPs at the DMPTool blog.
Springer Materials recently announced the launch of their new Corrosion Database. The Corrosion Database lives in Springer Materials and was compiled from various data and literature from the National Institute of Standards and Technology (NIST). The database contains over 24,000 unique records of corrosion rates/ratings and can be searched by material, environment, or both. Results are ordered by corrosion rating, making it easy to find the most (or least) resistant material for any given application. For example, the database provides data on how seawater corrodes 164 different types of steel and the rate of corrosion.
Users can also download citations from the database in .bib, .EndNote, or .ris file formats.
Visit the SpringerMaterials database to begin using the new Corrosion Database.
A well-designed figure can have a huge impact on the communication of research results. This workshop will introduce key principles and resources for visualizing data:
- Choosing when to use a visualization
- Selecting the best visualization type for your data
- Choosing design elements that increase clarity and impact
- Avoiding visualization issues that obscure or distort data
- Finding tools for generating visualizations
Date: Thursday, July 7
Time: 12:00 – 1:00
Location: Bioscience Library Training Room, 2101 VLSB (inside the library)
- Anna Sackmann, Science Data and Engineering Librarian
- Becky Miller, Environmental Sciences and Natural Resources Librarian
- Elliott Smith, Emerging Technologies Librarian
Open to all; no registration is required. Please forward to interested colleagues.
Questions? Please contact email@example.com
During the month of May, project developers for the DMPTool and DMPOnline (the UK’s version) began combining documentation to create the DMPRoadmap. Coming next year, the DMPTool and DMPOnline will merge into one Data Management Plan service that can be used internationally and that combines the best features of the current DMPTool and DMPOnline. You can follow their progress via their GitHub Repository: DMPRoadmap.
Stay tuned for updates. In the meantime, the DMPTool will experience brief downtime for mini-maintenance on Wednesday, June 8, 2016, from 4:00 – 4:30 (PST).
The DMPTool will be unavailable on Wednesday, May 4th, 2016, from 3:00 – 4:00 (PST). During this period users will not be able to log in or have access to their work. We apologize for the inconvenience.
For questions about the DMPTool or other data management tools and services available to UC Berkeley researchers, please see our Research Data Management page or contact firstname.lastname@example.org.
The Materials Project provides open web-based access to computed information on known and predicted materials as well as powerful analysis tools to inspire and design novel materials. Through computational modeling and supercomputing, the Materials Project allows the user to assess how different atoms and molecules interact with each other. The Materials Explorer is the core tool, or app, through which users can query all of the data in the materials compound database through an interactive Periodic Table of Elements. With 66,140 computed compounds, users can explore a number of material properties, including compound formation energy, stability, bandgap, density, volume, and more. This app, along with seven others (including the crystal toolkit, structure predictor, and the battery explorer), allows researchers to compute the properties of compounds before materials are synthesized in a lab, saving money, time, and guesswork.
The Materials Project was founded by two current UC-Berkeley Materials Science and Engineering professors, Dr. Kristin Persson and Dr. Gerbrand Ceder. The Project is supported by the US Department of Energy, Lawrence Berkeley National Lab, MIT, and the Battery Materials Research Program. For more information on collaborators, visit About the Materials Project.