Research Data Management

Sara El-Gebali & Özlem Özkan

RDM

Research Data Management Services Unit

Scientific research is increasingly digital, with expanding volumes of data that necessitate good data management practices. Our role is to provide information, consultation, support, and training to researchers through all phases of the research data life cycle (planning, data collection, management and analysis, preservation, and sharing).

We provide advice and support in the following areas:

Planning

Managing

Sharing

Policy

Note: This site is still under construction. Feel free to get in touch with any suggestions or inquiries via the Contact Form.

Team

Sara El-Gebali

About: 

I am the Research Data Management Team Leader. Before joining the MDC, I was a scientific database curator at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), Cambridge, UK, as well as at the European Molecular Biology Organization (EMBO), Heidelberg, Germany. My background is in cancer research: I gained my PhD from the University of Bern, Switzerland, studying the role of amino acid transporters in colon cancer progression.

I am a strong advocate for Open Science, community building, inclusion, and the promotion of women and underrepresented minorities in STEM fields.


Özlem Özkan

About:

I am the Research Data Management Project Manager. Previously, I worked as a Data Scientist at KPMG AG, and I was a Research Assistant at Middle East Technical University for about 10 years.

My research areas are genetic and health data analysis and management; the design of Personal Health Record systems that include genetic data; privacy and security of Electronic Health Records; and Personal Data Protection laws and regulations.

I have experience in the privacy and confidentiality of health and genetic data, data analysis methods, machine learning algorithms and IT infrastructures.

Training & Outreach

Community building

 

The aim of the Open Science Community Calls is to raise awareness of open science and open data practices and to nurture scientific engagement at the MDC.

Community calls:

25th of January 2021, Electronic Lab Notebooks & Survey results

15th of October 2020, Introducing the Research Data Management Unit

Session notes:

Please note that the calls are NOT recorded, in order to encourage free discussion. The session notes are available only on the intranet.

If you would like to suggest a topic, please get in touch with us.

Internal training & events

 

Research Data Management training & skills development materials/events:

Face-to-Face/Online support

 

You can reach us in several ways. You can fill in the contact form or contact us via:

Email: 

sara.el-gebali@mdc-berlin.de 
oezlem.oezkan@mdc-berlin.de

Mattermost:

https://mattermost.mdc-berlin.de/mdc-open/channels/research-data-management

Join our mailing lists on specific topics:

Mailing list for MDC employees interested in participating in a working group on data sharing: rdm_data_sharing@list.mdc-berlin.de 

Mailing list for MDC members interested in participating in a working group on Electronic Laboratory Notebooks (ELN): rdm_eln@list.mdc-berlin.de

Phone: +49 30 9406-3100

 

FAQs

Research Data Management

What is data?

 

Data is any type of information that is collected, observed, or created in the context of research. Data can be:

  • Primary: raw data from measurements or instruments
  • Secondary: processed data from secondary analyses and interpretations
  • Published: data in its final format, available for use and reuse
  • Metadata: data about your data

What is open data?

 

Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike.

Source: The Open Data Handbook, Open Knowledge Foundation

What is FAIR data?

 

FAIR is a set of principles to define the best practices for data and software to facilitate discovery, access and reuse by humans and machines.

FAIR stands for:

Findable: Your data should be findable, by you and others.

Accessible: Your data should be accessible to both humans and machines, i.e. retrievable and understandable.

Interoperable: Machines and humans can interpret and use the data in different settings.

Reusable: The ultimate goal of FAIR is to advance the reuse of data. Everything you’ve done so far ultimately leads to this point, ensuring the data can be reused by others.

FAIR data summary

  1. Deposit your data where others, and especially your peers, can find it (e.g. in a field-specific repository), and give it a stable, unique persistent identifier (PID).
  2. Make your data and metadata accessible via standard means such as HTTP or an API.
  3. Create metadata that explains in detail what the data is about; never assume people know!
  4. Deposit the metadata with a PID and make it available with or without the data, e.g. in case the data itself is heavily protected.
  5. Include information on ownership and provenance.
  6. Outline what reusers of your data are and are not allowed to do by using a clear license, such as the commonly used MIT or Creative Commons licenses (keep funders' requirements in mind).
  7. Specify access conditions, e.g. whether authentication or authorization is required.
  8. Describe your data in a standardized fashion using agreed terminology and vocabulary.
  9. Share the data in preferred and open file formats.
  10. Start the process early on!

You can find more details in our training material on the basics of Research Data Management: https://zenodo.org/record/4562630
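
As a minimal sketch of points 1 and 3 to 7 of the summary above, the following Python snippet shows a dataset-level metadata record and a simple completeness check. The field names, the placeholder identifier and the helper `missing_fields` are assumptions made for illustration only; actual repositories define their own metadata schemas.

```python
import json

# Hypothetical field names for illustration; real repositories use their own schemas.
REQUIRED_FIELDS = ("identifier", "title", "description", "creators", "license", "access")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "identifier": "doi:10.1234/example",  # placeholder PID, not a real DOI
    "title": "Example imaging dataset",
    "description": "Raw microscopy images and the analysis scripts used to process them.",
    "creators": ["Doe, Jane"],            # ownership
    "provenance": "Acquired on instrument X, processed with script Y",
    "license": "CC-BY-4.0",               # what reusers may do
    "access": "open",                     # or e.g. "restricted: authentication required"
}

# Metadata can be shared as JSON even when the data itself is protected.
print(json.dumps(record, indent=2))
print("Missing fields:", missing_fields(record))
```

A check of this kind can be run before deposition to catch a missing license or identifier early in the process.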

Q. Why do we need research data management?

Research data management covers all phases of the research data life cycle: planning, data collection, management and analysis, preservation, and sharing.

Our role is to offer support and guidance during the different research phases; to be an interface between policymakers, researchers and IT specialists; and to provide practical approaches that help you make your data more FAIR and open.

Research data management helps you achieve the following:

  1. Accountability and integrity
  2. Ensuring data is preserved
  3. Minimizing the risk of data loss
  4. Ensuring data is findable and easily discoverable, which saves time
  5. Enabling data to be used and reused, avoiding duplicated effort
  6. Encouraging collaboration: research is increasingly global and requires collaborative data sharing
  7. Minimizing the time and effort needed to prepare for publication and to meet open data requirements
  8. Increasing recognition, citation, and scholarly impact
  9. Ensuring intellectual property rights are preserved
  10. Complying with funders' requirements and institutional policies
  11. Ensuring public funds are well spent and their value is maximized
  12. Ensuring scientists have what they need to do good science and to create good-quality data
  13. Answering the big question: “I have big data, now what can I do with it?”

Data Protection

Q. What is personal and sensitive data?

 

Article 4(1) of the GDPR defines personal data as any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

According to Article 4(13), (14) and (15) and Article 9 and Recitals (51) to (56) of the GDPR the following personal data is considered ‘sensitive’ and is subject to specific processing conditions:

  • personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs;
  • trade-union membership;
  • genetic data, biometric data processed solely to identify a human being;
  • health-related data;
  • data concerning a person’s sex life or sexual orientation.

Q. My data is sensitive. How do I share or publish it?

 

Sensitive data can only be shared or published if the researcher

  • has the right to publish the data,
  • obtained freely given consent for data sharing from the participants beforehand, and
  • obtained ethical approval for data publication,

and if the data is anonymised and licensed for reuse and attribution.

Otherwise, the researcher can publish the non-identifiable metadata of sensitive/personal data.

Q. Where should I store my sensitive data?

 

 

Q. Which legal instruments and agreements do I need for processing human data?

 

 

Management

Q. Does the MDC recommend any Electronic Lab Notebooks (ELNs)?

 

RDM is currently working on ELN solutions. 

Q. What is metadata in scientific research?

 

Metadata is data about data. In other words, it is the data that provides essential context and relevant information about how the data was created, stored and shared. 

For instance, metadata can describe various aspects of an experimental procedure, such as who carried out the experiment, which parameters were chosen, what type of equipment was used, what the output and results were, and how the results were analysed, shared and used.
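
The aspects listed above can be captured in a small, structured record that is stored next to the data files. The following is a minimal sketch in Python; the class `ExperimentMetadata`, its field names and the example values are hypothetical and would need to be adapted to your lab and field.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ExperimentMetadata:
    """Hypothetical sidecar record describing one experiment."""
    experimenter: str                                # who carried out the experiment
    date: str                                        # ISO 8601 date, e.g. "2021-01-25"
    equipment: str                                   # what type of equipment was used
    parameters: dict = field(default_factory=dict)   # which parameters were chosen
    analysis: str = ""                               # how the results were analysed


def to_sidecar_json(meta):
    """Serialize the record to JSON text that can be saved next to the data."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


meta = ExperimentMetadata(
    experimenter="Jane Doe",
    date="2021-01-25",
    equipment="Confocal microscope (example)",
    parameters={"magnification": "63x", "exposure_ms": 200},
    analysis="Segmentation script, version noted in the lab notebook",
)

print(to_sidecar_json(meta))
```

Saving such a record in an open text format keeps the context readable for both humans and machines, independently of the software that produced the data.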

Q. Why collect metadata?

Metadata collection and documentation are a vital part of good data management practices.

In light of the ‘reproducibility crisis’ in the life sciences, along with a number of high-profile cases of research misconduct, effective record keeping and metadata collection have gained major support from multiple stakeholders.

For instance, a growing number of funding bodies mandate data management plans that outline the process of data and metadata collection, analysis and annotation, as well as long-term storage solutions and sharing practices.

Many journals also advise authors to upload their raw data and metadata as part of the publication process. JCB, for example, established the JCB DataViewer, which allows researchers to deposit the original source data, i.e. raw imaging and gel data (https://link.springer.com/content/pdf/10.1007%2F978-3-030-33656-1.pdf).

Furthermore, capturing metadata ensures functional data sharing by enabling appropriate and effective reuse, making the data more reliable and accessible and thereby facilitating novel discoveries, analyses and interpretations.

Metadata collection also has benefits at an institutional level.

For researchers, metadata collection results in:

  • Enhanced visibility
  • Increased collaboration and citation of research
  • Generation of more reliable data
  • Easier understanding of previous research, thereby accelerating discovery

Q. How do we collect metadata?

 

While we have witnessed a massive digital expansion in the lab, the one aspect that has not benefited from digitization is record taking, which has for the most part remained analogue (i.e. on paper). To help researchers keep track of the overwhelming amount of digital data and metadata generated, digital solutions such as electronic lab notebooks (eLNs) need to be introduced and implemented.

It is important to recognise that lab notebooks (LNs) are used to document research data, including hypotheses, experimental procedures, analysis, interpretation and reporting, which makes them the primary space for data capturing and recording. Hence, ensuring proper documentation in LNs and effective metadata capturing is a central element of data management.

eLNs offer clear advantages over traditional note-taking in paper LNs. For instance, eLNs provide:

  • Structured and detailed documentation allowing traceability and data extraction
  • Proof of provenance and ownership, protection of intellectual property
  • Seamless integration with lab equipment and digitally acquired data 
  • Storage solutions for different data types with varying volumes
  • Facilitated data analysis and sharing
  • Data readily available and searchable
  • Ensured long term availability of the data 
  • Creation of templates
  • Lab inventory management

Q. What are some examples of metadata standards?

 

  • Dublin Core - domain agnostic, basic and widely used metadata standard
  • DDI (Data Documentation Initiative) - a common standard for social, behavioural and economic sciences, including survey data
  • EML (Ecological Metadata Language) - specific for ecology disciplines
  • ISO 19115 and FGDC-CSDGM (Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata) - for describing geospatial information
  • MINSEQE (MINimal information about high throughput SEQuencing Experiments) - Genomics standard
  • FITS (Flexible Image Transport System) - Astronomy digital file standard that includes structured, embedded metadata
  • MIBBI - Minimum Information for Biological and Biomedical Investigations
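
To give an impression of what a record in one of these standards can look like, the following Python sketch writes a few Dublin Core elements as XML using only the standard library. The element names (`title`, `creator`, `date`, `rights`, `identifier`) and the namespace are part of the Dublin Core element set; the wrapper element `metadata`, the function `dublin_core_xml` and all example values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields):
    """Build an XML string with one dc:<name> element per entry in fields."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        element.text = value
    return ET.tostring(root, encoding="unicode")


example = dublin_core_xml({
    "title": "Example imaging dataset",
    "creator": "Doe, Jane",
    "date": "2021-01-25",
    "rights": "CC-BY-4.0",
    "identifier": "doi:10.1234/example",  # placeholder, not a real DOI
})

print(example)
```

Domain-specific standards such as MINSEQE or EML define many more required fields, so the repository you deposit to is usually the best guide to which standard and which fields to use.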

Publications

Q. How do I cite my data?

 

 

Q. How can I cite somebody else's data?

 

 

MDC Storage

Q. What will happen to my data if it has not been accessed for 10 years?

 

 

Q. Can I submit my data to any repository I want?

 

 

Formats

Q. What data formats should I use?

 

Consider the following:

  • Choose standard file formats most commonly used in your field.
  • Convert data to a standard format.
  • Choose a format that meets the requirements for data deposition, e.g. repository requirements or archival compression.
  • Consider exporting or converting from original format to a more open/preferred format but keep in mind that some data might be lost or altered during the process e.g., text formatting in documents, decimal point formatting, date and time values.
  • Keep in mind there are no standard preferred file formats, and none are perfect, but consider choosing open formats that are most applicable for your use and field.
  • When archiving data, combine the whole project (i.e., raw data, analysis, documentation, code and software) in one package.
  • For software consider the use of containers to enable interoperability and long-term re-use.
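
The advice above on combining a whole project in one package can be sketched as follows. This Python example zips every file in a project folder and adds a checksum manifest so that the integrity of the archive can be verified later. The function name `package_project` and the manifest file name `MANIFEST.sha256` are assumptions for illustration, not an MDC convention.

```python
import hashlib
import zipfile
from pathlib import Path


def package_project(project_dir, archive_path):
    """Zip all files under project_dir and add a SHA-256 manifest.

    The archive should be written outside project_dir so that it does not
    try to include itself. Files are read fully into memory, which is fine
    for a sketch but not for very large data files.
    """
    project_dir = Path(project_dir)
    manifest_lines = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(p for p in project_dir.rglob("*") if p.is_file()):
            relative = path.relative_to(project_dir).as_posix()
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            archive.write(path, arcname=relative)
            manifest_lines.append(f"{digest}  {relative}")
        # One line per file: "<sha256 hex digest>  <relative path>"
        archive.writestr("MANIFEST.sha256", "\n".join(manifest_lines) + "\n")
    return manifest_lines


# Example usage (hypothetical paths):
# package_project("my_project", "my_project_archive.zip")
```

ZIP is used here only because it is widely supported; a repository may require a different container or compression format.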

For more information, you can visit the "Organizing your data" link under the RDM tab.

 

Q. What are 'proprietary' and 'open' formats?

 

Proprietary formats are file formats that can usually be viewed only in the software or tool that created the files. This software uses its own proprietary format to save and read the file. Only the company itself or its licensees may use the format. The description of the format is confidential or unpublished, and the company or organization has the right to change it at any time.
In contrast, an open format is a file format that is published and free to be used by everybody.