Week 2.4: Data Management
In any research project, it is important to have a plan for how you will store and manage your data. Data management helps ensure that no research data is lost, and it contributes to the reproducibility of research results. In some cases (e.g., open science and open data), it also facilitates the reuse of the data in future projects. In addition, research projects need to abide by research integrity principles, ethical requirements, and, in some cases, legal or institutional obligations. While it is good to share data publicly, not all data can be shared. Your research data may be sensitive or legally protected, requiring that it remain private and secure. There may also be potential for patents or copyright, which means that some data and code can be shared only after those procedures are finalized. This workshop will focus on these issues.
Thanks to Open and FAIR Data, it is possible to reuse existing data. FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable (and not to be confused with fair or ethical research!). FAIR data is made available via data repositories, which you can use to interact with existing research data and generate new insights. Data repositories can also be used to share the research data of your research project. To plan for data sharing, every research project should have a Data Management Plan. In this Plan, you describe what type of data you will generate, how you will document it, where you will store the data securely (especially important for personal data!), and which data repository you will use for the data that can be publicly shared.
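The four elements of a Data Management Plan listed above can be captured in a small structured record, which makes gaps in a draft easy to spot. The following is a minimal sketch in Python, not a standard DMP template: the class name `DataManagementPlan`, its field names, the helper `missing_sections`, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical record of the four DMP elements named in the text."""
    data_types: str = ""          # what type of data you will generate
    documentation: str = ""       # how you will document it
    secure_storage: str = ""      # where you will store the data securely
    sharing_repository: str = ""  # repository for data that can be shared


def missing_sections(plan: DataManagementPlan) -> List[str]:
    """Return the names of the sections that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Example values are invented for illustration only.
draft = DataManagementPlan(
    data_types="Survey responses (CSV), interview audio (WAV)",
    secure_storage="Institutional project drive with restricted access",
)
print(missing_sections(draft))  # ['documentation', 'sharing_repository']
```

A real plan will usually follow a template from your institution or funder; the point of the sketch is only that each element should be filled in explicitly rather than left implicit.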
This week we’ll focus on:
Monday:
Science Spotlight
Workshop: Data Management and Data Privacy
Wednesday:
Friday:
Friday Symposium
Workshop: Data Management and Privacy
Research depends on the data we collect, store, and analyse. Before getting started, every researcher needs a plan for how they will manage their data: it is easier to manage data properly from the start than to redo some of the work later. When joining an ongoing project, it is important to learn how that project stores its data. This is generally described in the Data Management Plan of the project or research group. This workshop will introduce many of these concepts.
During this workshop we will also consider what personal data you are handling in your project. What do you need to do to adhere to privacy regulations?
To see why Research Data Management is important, check out some horror stories, such as the Data Horror Song or the Zheng Lab - Bad Project video!
Key Concepts
The advantages of Research Data Management
Personal data and the requirements for managing this type of data
Data sharing and what not to share
The FAIR principles
A Data Availability Statement
Relevant Learning Goals
Realise the important role that good data management plays in research, especially for personal data.
Identify different types of research data and recognise the regulations, policies and/or legal requirements associated with them.
List the main components of the FAIR data principles and connect them to your own research workflows.
Workshop: Data Repositories, Management and Documentation
Whether you are reusing existing data or using newly collected data in your project, you have to select an appropriate Data Repository to store and manage access to this data. You also have to document this information in your Data Management Plan. During this workshop, we will first explore in more detail what a Data Repository is, as well as how to find the ones relevant to your research project.
Each group will set up a draft Data Management Plan, which will be exchanged with another group during the workshop to receive input and feedback. As it is also important to provide your data with sufficient information and documentation when you share it in a data repository, we will go over an exercise that will demonstrate the importance of proper documentation. The lessons learned can also be included in your project’s Data Management Plan!
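One simple way to check whether a dataset is sufficiently documented is to compare its column names against a data dictionary. The sketch below assumes the data is a CSV file with a header row; the function name `undocumented_columns`, the example columns, and the descriptions are hypothetical and only illustrate the idea.

```python
import csv
import io
from typing import Dict, List


def undocumented_columns(csv_text: str, data_dictionary: Dict[str, str]) -> List[str]:
    """Return the CSV header names that have no description in the dictionary."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if not data_dictionary.get(name, "").strip()]


# Invented example data; column names and descriptions are hypothetical.
example_csv = "participant_id,age,score\nP01,34,0.82\nP02,29,0.77\n"
data_dictionary = {
    "participant_id": "Pseudonymous participant code",
    "age": "Age in years at time of measurement",
}

print(undocumented_columns(example_csv, data_dictionary))  # ['score']
```

A check like this catches only missing descriptions. Whether a description is understandable to peers still needs a human reader, which is what the feedback exchange between groups is for.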
Key Concepts
What are data repositories?
What is a Data Management Plan?
What is the importance of communication/documentation regarding data in research projects?
Relevant Learning Goals
Contribute to documentation about data in a manner that is understandable to peers
Design a research data management strategy for your projects via the Data Management Plan
Group Activity of the Week
Continue working on your research overall; the special focus this week is on your data.
Write your data management plan for your project.
This should include your data sharing plan.
Identify repositories you might use.
How will you document the data?
Discussion Questions
What should be private information?
What rights should people have to their data when used in research and shared in repositories (even if de-identified)?
What is important in a data management plan?
What should you think about when re-using data other people have collected?
Why is it important to share data and code? When might you not want to use shared data?
Could the data you’re using/collecting be useful in other minor projects? How? What are its limitations?
Is there data you have or have found that could be useful for the other projects?
What data do you wish you had available?
Weekly Submitted Assignments
Group
Submit a draft of your data management plan
Individual
What do you think is important in balancing data privacy and being able to do research?
References
Why RDM is relevant for you (8 minute video, start at 0:44)
Markowetz, Florian. 2015. “Five Selfish Reasons to Work Reproducibly.” Genome Biology 16 (1). https://doi.org/10.1186/s13059-015-0850-7. (or watch a 50 min presentation about this paper)
Managing Sensitive Data Projects (The Turing Way)
Data Management Plans (The Turing Way)
Slides with Research Data Management questions you can ask yourself at each research step.