Research Data Management
RDM
Research Data Management Services Unit
Scientific research is increasingly digital, and the expanding volumes of data call for good data management practices. Our role is to provide information, consultation, support and training to researchers through all phases of the research data life cycle (Planning, Data Collection, Management and Analysis, Preservation and Sharing).
We provide advice and support in the following areas:
Planning
Managing
Sharing
Policy
Note: This site is still under construction. Feel free to get in touch with any suggestions or inquiries via our Contact Form.
Software & Publications
Software
SOFTWARE | TEST SYSTEM | PRODUCTION SYSTEM | LEARN MORE |
---|---|---|---|
ELECTRONIC LAB NOTEBOOK (ELN) | https://rspace-test.mdc-berlin.net/login | https://rspace.mdc-berlin.net/login | How to use the Electronic Lab Notebook |
DATA MANAGEMENT PLANNING (DMP) TOOL | https://mdc.fair-wizard.de/ | | |
OPEN MICROSCOPY ENVIRONMENT REMOTE OBJECTS (OMERO) | https://omero-test.mdc-berlin.net/ | https://omero.mdc-berlin.net/ | |
PROTOCOLS.IO | Coming up soon | Coming up soon | |
Where a test system is available, you can log in to it to try out and get to know the software. Once you are familiar with the system, please switch to the production system.
Publications
1. PATARČIĆ, Inga; STOJANOVSKI, Jadranka. Adoption of Transparency and Openness Promotion (TOP) Guidelines across Journals. Publications, 2022, 10.4: 46. https://doi.org/10.3390/publications10040046
Journal policies continuously evolve to enable knowledge sharing and support reproducible science. However, that change happens within a certain framework. Eight modular standards with three levels of increasing stringency make up the Transparency and Openness Promotion (TOP) guidelines, which can be used to evaluate to what extent and with which stringency journals promote open science. The guidelines define standards for data citation; transparency of data, materials, code, and design and analysis; replication; plan and study pre-registration; and two effective interventions, “Registered reports” and “Open science badges”. The levels of adoption summed up across standards define a journal’s TOP Factor. In this paper, we analysed the status of adoption of the TOP guidelines across two thousand journals reported in the TOP Factor metrics. We show that the majority of the journals’ policies align with at least one of the TOP standards, most likely “Data citation” (70%) followed by “Data transparency” (19%). Two-thirds of adoptions of TOP standards are of stringency Level 1 (less stringent), whereas only 9% are of stringency Level 3. Adoption of TOP standards differs across science disciplines; multidisciplinary journals (N = 1505) and journals from the social sciences (N = 1077) show the greatest number of adoptions. The measures that journals take to implement open science practices could be improved: (1) in a discipline-specific way, (2) by journals that have not yet adopted the TOP guidelines adopting them, and (3) by increasing the stringency of adoptions.
Training & Support
- Internal training & events
Research Data Management training & skills development materials/events:
- RDM Requirements introduction training for PhD Students (16.11.2022)
- Part 1: slides for the research data management introduction training and slides on open data (09.12.2021)
- Part 2: slides for the research data management introduction training and slides on funder and publisher requirements for data management and storage (27.01.2022)
RSpace ELN Training and Demo
- RSpace Demo Webinar
- Basic user training: the first training for all users, including PIs, lab admins, technicians, postdocs, PhD students and sysadmins (us). Everyone should take this training first. Available on the E-learning platform or the MDC YouTube Channel.
- Advanced user and PI training (ideally after the basic training): intended especially for PIs, lab admins and sysadmins, i.e. the users who will be responsible for lab management. However, everyone is welcome to take this training. Available on the E-learning platform or the MDC YouTube Channel.
- Inventory and sample management training: covers the inventory and sample management features of RSpace. Available on the E-learning platform or the MDC YouTube Channel.
- External training
Events:
- TeSS: ELIXIR's Training Portal
- EBI-Train online
- Carpentries Workshops
- Helmholtz Federated IT Services training events
- CODATA training workshops
Courses:
- ORION MOOC for Open Science in the Life Sciences
- Mantra - Free online Research Data Management Training
- FOSTER - Open Science Training Courses
- Research data bootcamp by the University of Bristol
- Research Data Management and Sharing on Coursera
- Datatree - Free online course on research data management
- RDMRose - Learning materials in Research Data Management
- Open Science MOOC
Books:
- Open Data Handbook by Open Knowledge Foundation
- Mozilla Science Lab's Open Data Primers
- OpenAIRE - Research Data Management Handbook
- FOSTER - The Open Science Training Handbook
- FOSTER - The Open Science Training Handbook in other languages
- The Turing Way - A handbook for reproducible data science
Websites:
- Open science trainer’s corner
- UK Data service
- FOSTER - Research Data Management Collection
- GO FAIR Initiative
- FAIRsharing
Other Resources:
- LEARN Toolkit of Best Practice for Research Data Management
- Research data management (RDM) open training materials on Zenodo
- Research Data Alliance
- JISC
- COS - Center for Open Science
RDM services in other organisations:
- Face-to-Face/Online support
There are several ways to reach us. You can fill in our contact form or contact us via:
Email:
rdm@mdc-berlin.de
- inga.patarcic@mdc-berlin.de
- oscar.migueles@mdc-berlin.de
Mattermost:
https://mattermost.mdc-berlin.de/mdc-open/channels/research-data-management
Phone: +49 30 9406-3100
- RDM Monday Seminar Series
- These webinars covered a wide range of topics, including research data management, reproducibility, and software tools essential for our work. Below, you'll find a detailed list of the webinars, along with links to the recordings and additional resources for each session in Q1 and Q2 2024.
- 2024.01.15. RDM SOFTWARE, News from RSpace and Q&A, External speaker: Rob Day, Zoom Link, Attendees: 22, Recording Link, Passcode: &&8daTYp
- 2024.01.22. RDM SOFTWARE, Introduction to OMERO Plus, External speaker: Muhanad Zahra, Status: POSTPONED
- 2024.01.29. RDM SOFTWARE, Introduction to RSpace, External speaker: Rob Day, Attendees: 22, Recording Link, Passcode: K9h3gz+Y
- 2024.02.05. RDM SOFTWARE, Introduction to OMERO Plus, External speaker: Muhanad Zahra, Attendees: 20, Recording Link, Passcode: j51BX*YE
- 2024.02.12. RDM SOFTWARE, RSpace ELN, Inventory and Sample Management Training, External speaker: Rob Day, Attendees: 16, Recording Link, Passcode: ^*0JaVK6
- 2024.02.19. RDM SOFTWARE, Advanced introduction to OMERO Plus, External speaker: Muhanad Zahra, Attendees: 2, Recording Link, Passcode: euZeD3#V
- 2024.02.26. RDM SOFTWARE, Einführung RSpace (Deutsch), Speaker: Manuel Ehling, Attendees: 20, Recording Link, Passcode: R8KN!lZ4
- 2024.03.04. RDM SOFTWARE, Introduction to protocols.io, External speaker: Emma Ganley, External training, Attendees: 17, Recording Link, Passcode: xeg2U^8A
- 2024.03.11. RDM: Intro, Intro to research data management, data storage, and cleanup, Attendees: 27, Recording Link, Passcode: Xku*r=b1
- 2024.03.18. RDM: Intro, Intro to data protection for data sharing, External speaker: Evgeny Bobrov, Attendees: 11, Recording Link, Passcode: zEwLdd*6
- 2024.03.25. RDM, Break
- 2024.04.08. RDM, How to Open Data? Overview of requirements from funders and publishers for (data) sharing, Speaker: Inga Patarčić and Oscar Migueles, Zenodo Link, Status: To create a concept and discuss it with PIs, Attendees: 24, Recording Link, Passcode: W$=1U=09
- 2024.04.15. RDM SOFTWARE, Combining ChatGPT and a OneNote based ELN to facilitate research documentation and collaboration, Speaker: Mitch Gotthard, Attendees: 45, Recording Link, Passcode: @79ygaKn
- 2024.04.22. Reproducibility, Reproducibility with Guix, Speaker: Ricardo Wurmus, Attendees: 21, Recording Link, Passcode: %73FYIP5
- 2024.04.29. Reproducibility, Data visualization, How to identify and fix common problems, External speaker: Tracey Weissgerber, No recording available
- 2024.05.06. Reproducibility, Image Data Reproducibility, Speaker: Deborah Schmidt, Recording Link, Passcode: *fc90iG9
- 2024.05.13. RDM, Break
- 2024.05.27. RDM, Overview and latest updates of OpenIRIS, Speaker: Manuel Ehling, Recording Link, Passcode: ta73LGX*
- 2024.06.03. RDM Intro, What is a DMP and how to write it?, Speaker: Oscar Migueles, Recording Link, Passcode: .C3mceM=
- 2024.06.10. Data organization, Data organization at the MDC, Speakers: Altuna Akalin, Henrik Zauber, Udo Heinemann, Topic: Folder organization and file structure, Recording Link, Passcode: vmPd*C$7
- 2024.06.17. RDM SOFTWARE, MDC Cloud and in-house built Shiny apps ecosystem, Speakers: Dan Munteanu / Madalin Patrascu, Recording Link, Passcode: rj3jT*;.{o
- 2024.06.24. RDM Storage, HMC Hub Health: working on a FAIR data space for Helmholtz, Speaker: Marco Nolden, Note: Optional is bioImaging, Recording Link: Not available, Passcode: Not available
FAQs
Research Data Management (RDM)
- What is RDM?
- Research data life cycle: Planning, Data Collection, Management and Analysis, Preservation and Sharing
Research data management (RDM) is a term that describes the organization, storage, preservation and sharing of data collected and used in a research project.
Ask yourself whether people can find, access and understand your data. Did you:
- explain your data?
- store it safely?
- open it (if possible)?
- Why do we need RDM?
- Our role is to offer support and guidance during the different research phases, to act as an interface between policymakers, researchers and IT specialists, and to provide practical approaches that help you make your data more FAIR and open. Research data management helps you achieve the following:
- Accountability and integrity
- Ensuring data is preserved
- Minimizing the risk of data loss
- Ensuring data is findable and easily discoverable, which saves time
- Enabling data use and reuse, avoiding duplicated effort
- Encouraging collaboration: research is increasingly global and requires collaborative data sharing!
- Minimizing the time and effort needed to prepare for publication and open data requirements
- Increasing recognition, citation and scholarly impact
- Ensuring intellectual property rights are preserved
- Ensuring compliance with funders' requirements and institutional policies
- Ensuring public funds are well spent and their value is maximized
- Ensuring scientists have what they need to do good science and create good-quality data
- Helping you answer the big question: “I have big data, now what can I do with it?”
Data & Metadata:
- What is data?
Any type of information that is collected, observed or created in the context of research. Data can be:
- Primary: raw data from measurements or instruments
- Secondary: processed data from secondary analysis and interpretation
- Published: data in its final format, available for use and reuse
- Metadata: data about your data
- What is Metadata?
Metadata is independent data that contains structured information about other data, i.e. data about data. In other words, it provides essential context and relevant information about how the data was created, stored and shared.
For instance, metadata can describe various aspects of an experimental procedure, such as who carried out the experiment, which parameters were chosen, what type of equipment was used, what the output and results were, and how the results were analysed, shared and used.
- Ensures reliability, accessibility and discoverability
- Increases the value of your data
- Reduces duplicated effort
- Allows us to track people, institutions or publications associated with the original research
- Enables researchers to quickly assess the quality and relevance of the data
- Metadata is frequently required for depositing data in repositories.
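To make the experimental-procedure example above concrete, here is a minimal Python sketch that stores such metadata as a JSON "sidecar" file next to the data file. The field names, the `.meta.json` suffix and all example values are hypothetical assumptions for illustration, not an MDC or repository schema; for real deposits, follow the metadata standard required by your repository or field.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def sidecar_path(data_file: Path) -> Path:
    """Hypothetical naming convention: scan_01.tif -> scan_01.tif.meta.json."""
    return data_file.with_name(data_file.name + ".meta.json")


def write_metadata_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write `metadata` as human-readable JSON next to `data_file`."""
    target = sidecar_path(data_file)
    target.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return target


# Hypothetical record answering: who, when, which parameters, which equipment, how analysed.
example_metadata = {
    "data_file": "scan_01.tif",
    "performed_by": "Jane Doe",
    "date": date(2024, 3, 11).isoformat(),  # ISO 8601 date string
    "parameters": {"exposure_ms": 50, "temperature_c": 37.0},
    "equipment": "Confocal microscope (placeholder)",
    "analysis": "Segmentation script, version 1.2 (placeholder)",
    "shared_via": None,  # e.g. a repository DOI once the data is deposited
}

if __name__ == "__main__":
    # Demo in a temporary folder so nothing is left behind.
    with tempfile.TemporaryDirectory() as folder:
        written = write_metadata_sidecar(Path(folder) / "scan_01.tif", example_metadata)
        print(written.name)
        print(written.read_text(encoding="utf-8"))
```

Keeping the metadata in a plain-text, open format next to the data means it travels with the file and stays readable without special software.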
- What are the different types of metadata?
Descriptive metadata: Information outlining the basic facts necessary for discovery and identification, e.g. title, authors, keywords and abstract.
Structural metadata: Information regarding the structure (organisation and relationships) of the data and underlying items. For instance, it could be a description of the enclosed files and scripts, how they are organised and related, and where they can be found, e.g. via a DOI.
Administrative metadata: Technical information and information regarding the management of the data, including licensing and copyright permissions, technical requirements, file formats, provenance (i.e. the history of ownership: who owns the data and where it came from), access and sharing controls and permissions, quality controls and integrity checks.
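The three categories of metadata can be sketched as one record. This minimal Python example, with hypothetical field names and placeholder values, groups fields into descriptive, structural and administrative sections and computes SHA-256 checksums as one example of the integrity checks listed under administrative metadata. It is an illustration under these assumptions, not a formal metadata schema.

```python
import hashlib
import json
from typing import Optional


def sha256_hex(content: bytes) -> str:
    """SHA-256 checksum of a file's content, used here as an integrity check."""
    return hashlib.sha256(content).hexdigest()


def build_record(files: dict[str, bytes], relationships: Optional[dict[str, str]] = None) -> dict:
    """Group hypothetical metadata fields into the three categories."""
    return {
        "descriptive": {  # basic facts for discovery and identification
            "title": "Example imaging dataset",
            "authors": ["Jane Doe", "John Roe"],
            "keywords": ["microscopy", "example"],
            "abstract": "Placeholder abstract.",
        },
        "structural": {  # which files exist and how they relate
            "files": sorted(files),
            "relationships": relationships or {},
            "identifier": None,  # e.g. a DOI, assigned when the dataset is deposited
        },
        "administrative": {  # licensing, provenance, access, integrity
            "license": "CC-BY-4.0",
            "provenance": "Acquired in-house (placeholder)",
            "access": "Restricted until publication (placeholder)",
            "checksums_sha256": {name: sha256_hex(data) for name, data in files.items()},
        },
    }


if __name__ == "__main__":
    demo_files = {"raw/scan_01.tif": b"raw bytes", "analysis/segment.py": b"print('segment')"}
    record = build_record(demo_files, {"analysis/segment.py": "processes raw/scan_01.tif"})
    print(json.dumps(record, indent=2))
```

Recomputing the checksums later and comparing them with the stored values is a simple way to detect corrupted or altered files.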
- What are Metadata standards (also known as Schemas)?
- Metadata standards structure metadata and enhance its interoperability by using common terms and definitions, which provides consistency and accuracy in data documentation.
- Metadata standards also include technical conventions ensuring that values such as units of measurement, dates and times are entered in controlled formats, e.g. a fixed date and time format.
- Standards can be discipline-specific or general, such as Dublin Core, the DataCite Metadata Schema, the Data Documentation Initiative (DDI) and standards from the International Organization for Standardization (ISO).
- Examples of metadata standards and tools for lab-based research:
- ISA framework and tools: https://isa-tools.org/
- Minimum Information for Biological and Biomedical Investigations: https://fairsharing.org/collection/MIBBI
- Examples of metadata standards for software:
- CodeMeta https://codemeta.github.io/
- The Software Ontology http://theswo.sourceforge.net/
- PROV Ontology (PROV-O) https://www.w3.org/TR/prov-o/
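As an example of a software metadata standard, CodeMeta records are JSON-LD files, conventionally named `codemeta.json`. The minimal Python sketch below builds one with a handful of CodeMeta terms (`name`, `version`, `author`, `license`, `codeRepository`, `programmingLanguage`, `dateCreated`); the project name, repository URL and author are hypothetical, and the full list of terms should be checked against the CodeMeta documentation. Dates are written in ISO 8601 form, one example of the controlled formats that metadata standards ask for.

```python
import json
from datetime import date


def make_codemeta(name: str, version: str, authors: list[tuple[str, str]],
                  repo_url: str, spdx_license: str, created: date,
                  language: str = "Python") -> dict:
    """Return a minimal CodeMeta 2.0 record as a Python dict."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "codeRepository": repo_url,
        "license": f"https://spdx.org/licenses/{spdx_license}",
        "programmingLanguage": language,
        "dateCreated": created.isoformat(),  # ISO 8601, e.g. "2024-01-15"
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


if __name__ == "__main__":
    record = make_codemeta(
        name="example-analysis-tool",                              # hypothetical
        version="0.1.0",
        authors=[("Jane", "Doe")],                                 # hypothetical
        repo_url="https://example.org/git/example-analysis-tool",  # placeholder URL
        spdx_license="MIT",
        created=date(2024, 1, 15),
    )
    # In a real project this would be saved as codemeta.json in the repository root.
    print(json.dumps(record, indent=2))
```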
- Which file format should I use for my Metadata?
- A text or HTML document
- An XML document linked to data files
- Information embedded in an XML data file
XML (eXtensible Markup Language) files hold data and metadata documentation in a structured, interoperable form that both web browsers and analysis tools can read, which in turn enables field-specific searching.
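A minimal Python sketch of an XML metadata document and a field-specific search, using the standard library's `xml.etree.ElementTree` and element names from the Dublin Core element set (namespace `http://purl.org/dc/elements/1.1/`). The `metadata` wrapper element and all values are hypothetical; repositories usually prescribe their own XML schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the familiar "dc:" prefix


def build_dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Create <metadata> with one Dublin Core element per value (e.g. several creators)."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element_name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element_name}").text = value
    return root


def find_field(xml_text: str, element_name: str) -> list[str]:
    """Field-specific search: return all values of one Dublin Core element."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.findall(f"{{{DC_NS}}}{element_name}")]


if __name__ == "__main__":
    record = build_dc_record({
        "title": ["Example imaging dataset"],
        "creator": ["Doe, Jane", "Roe, John"],
        "date": ["2024-03-11"],
        "format": ["image/tiff"],
    })
    xml_text = ET.tostring(record, encoding="unicode")
    print(xml_text)
    print(find_field(xml_text, "creator"))  # only the creator values
```

Because every value sits in a named element, a search can target one field (for example only creators or only dates) instead of matching text anywhere in the file.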
- How to capture metadata?
- Use an ELN to record your work
- Use version control to track history, progress and changes in a descriptive manner
- Use metadata standards
- Use README files
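README files are often plain text. The minimal Python sketch below fills a README template with a project's title, contact, folder structure, methods and license. The section headings and example values are assumptions for illustration, not a prescribed MDC template; adapt them to your project and repository requirements.

```python
from datetime import date

README_TEMPLATE = """\
Title: {title}
Contact: {contact}
Date created: {created}

Description
-----------
{description}

Folder structure
----------------
{structure}

Methods and software
--------------------
{methods}

License and access
------------------
{license}
"""


def render_readme(title: str, contact: str, description: str,
                  folders: dict[str, str], methods: str,
                  license_text: str, created: date) -> str:
    """Return README text; `folders` maps folder names to a one-line purpose."""
    structure = "\n".join(f"- {name}/: {purpose}" for name, purpose in folders.items())
    return README_TEMPLATE.format(
        title=title,
        contact=contact,
        created=created.isoformat(),
        description=description,
        structure=structure,
        methods=methods,
        license=license_text,
    )


if __name__ == "__main__":
    print(render_readme(
        title="Example imaging project",  # hypothetical values throughout
        contact="jane.doe@example.org",
        description="Confocal images and the scripts used to analyse them.",
        folders={"raw": "unprocessed instrument output",
                 "analysis": "scripts and notebooks",
                 "results": "figures and tables"},
        methods="Python 3.11; segmentation script v1.2",
        license_text="CC-BY-4.0 for data, MIT for code",
        created=date(2024, 6, 3),
    ))
```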
- What is open data?
Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike.
The Open Data Handbook - Open Knowledge Foundation
- What is FAIR data?
FAIR is a set of principles to define the best practices for data and software to facilitate discovery, access and reuse by humans and machines.
FAIR stands for:
Findable: Your data should be findable by you and others.
Accessible: Your data should be accessible to both humans and machines, i.e. retrievable and understandable.
Interoperable: Machines and humans can interpret and use the data in different settings.
Reusable: The ultimate goal of FAIR is to advance the reuse of data. Everything you’ve done so far ultimately leads to this point, ensuring the data can be reused by others.
FAIR data summary
- Deposit your data where others, especially your peers, can find it (e.g. a field-specific repository) and give it a persistent unique identifier (PID).
- Make your data and metadata accessible via standard protocols such as HTTP or an API.
- Create metadata that explains in detail what the data is about; never assume people know!
- Deposit metadata with a PID and make it available with or without the data, e.g. in case the data itself is heavily protected.
- Include information on ownership and provenance.
- State what reusers of your data are and are not allowed to do by using a clear license, e.g. a commonly used license such as MIT or Creative Commons (keep your funders' requirements in mind).
- Specify access conditions if authentication or authorization is required.
- Describe your data in a standardized fashion using agreed terminology and vocabulary.
- Share the data in preferred and open file formats.
- Start the process early on!
You can find more details in our training material on the basics of Research Data Management: https://zenodo.org/record/4562630
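The FAIR data summary can be turned into a simple self-check before depositing data. The minimal Python sketch below flags missing, empty or malformed fields in a metadata record; the field names and their mapping to the FAIR principles are hypothetical assumptions for illustration, not an official FAIR assessment.

```python
# Hypothetical checklist: record field -> FAIR aspect it supports.
FAIR_CHECKLIST = {
    "identifier": "Findable: persistent identifier (PID), e.g. a DOI",
    "description": "Findable: metadata explaining what the data is about",
    "access_url": "Accessible: retrievable via a standard protocol such as HTTP(S)",
    "access_conditions": "Accessible: who may access the data, and how",
    "file_format": "Interoperable: open or field-standard file format",
    "vocabulary": "Interoperable: agreed terminology or metadata standard",
    "license": "Reusable: clear license",
    "provenance": "Reusable: ownership and provenance",
}


def fair_gaps(record: dict) -> list[str]:
    """Return one message per checklist field that is missing, empty or malformed."""
    gaps = []
    for field, aspect in FAIR_CHECKLIST.items():
        value = record.get(field)
        if value is None or (isinstance(value, (str, list, dict)) and len(value) == 0):
            gaps.append(f"{field}: missing ({aspect})")
        elif field == "access_url" and not str(value).startswith(("http://", "https://")):
            gaps.append(f"{field}: not an HTTP(S) URL ({aspect})")
    return gaps


if __name__ == "__main__":
    draft = {  # hypothetical draft record
        "identifier": None,  # no PID assigned yet
        "description": "Confocal images of an example cell line",
        "access_url": "https://example.org/dataset",
        "access_conditions": "open",
        "file_format": "OME-TIFF",
        "vocabulary": "",
        "license": "CC-BY-4.0",
    }
    for gap in fair_gaps(draft):
        print(gap)
```

A check like this only confirms that the information is present; whether the description, license and vocabulary are actually adequate still needs a human review.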
Electronic Lab Notebook (ELN)
- What is an ELN?
An ELN is a computer program designed to replace paper laboratory notebooks.
- Centralized ELN solution at MDC
MDC has a subscription to RSpace ELN. You can test and get to know the software by logging into https://rspace-test.mdc-berlin.net/. Once you are familiar with the system, please switch to the production version at https://rspace.mdc-berlin.net. For further information, please visit our Electronic Lab Notebook page.
- FAQ about ELN at MDC
Please visit our Electronic Lab Notebook page for the FAQs.
- What are 'proprietary' and 'open' formats?
Proprietary formats are file formats that can usually be opened only with the software or tool that created the files, which uses its own format to save and read them. Only the company that owns the format, or its licensees, may use it. The format's description is confidential or unpublished, and the company or organisation can change it at any time.
In contrast, an open format is a file format whose specification is published and free to be used by everybody.
- Which formats should I use?
The choice of file format may depend on which phase of the research data life cycle you are in. For instance, when sharing data externally, it is preferable to convert to an open file format; while you are still analysing the data, however, you may need to use file formats compatible with your analysis software.
Consider the following:
- Choose standard file formats most commonly used in your field.
- Convert data to a standard format.
- Choose a format that meets the requirements for data deposition, e.g. repository requirements or archival compression.
- Consider exporting or converting from the original format to a more open/preferred format, but keep in mind that some data might be lost or altered during the process, e.g. text formatting in documents, decimal point formatting, date and time values.
- Keep in mind that there is no single set of standard preferred file formats and none is perfect; consider choosing the open formats that are most applicable to your use and field.
- When archiving data, combine the whole project (i.e. raw data, analysis, documentation, code and software) in one package.
- For software, consider the use of containers to enable interoperability and long-term reuse.
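As an example of the conversion pitfalls mentioned above, the following minimal Python sketch converts a semicolon-separated table with decimal commas (common in German-language spreadsheet exports) into a standard comma-separated CSV file with decimal points, using only the standard `csv` module. The input layout is an assumption; cells that do not parse as numbers are left unchanged.

```python
import csv
import io


def normalize_number(cell: str) -> str:
    """Turn a decimal-comma number such as '3,14' into '3.14'; leave other cells unchanged."""
    candidate = cell.strip().replace(",", ".")
    try:
        float(candidate)
    except ValueError:
        return cell  # not a number (e.g. a label or 'n/a'): keep as-is
    return candidate


def convert_decimal_comma_csv(text: str) -> str:
    """Convert ';'-separated text with decimal commas to ','-separated CSV with decimal points."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # quotes cells that contain commas
    for row in reader:
        writer.writerow([normalize_number(cell) for cell in row])
    return out.getvalue()


if __name__ == "__main__":
    exported = "sample;concentration_mM;site\nA;3,14;Berlin, Buch\nB;n/a;Berlin, Mitte\n"
    print(convert_decimal_comma_csv(exported))
```

Spot-check the result against the original: a value written with a German thousands separator, such as `1.234` meaning 1234, already parses as a valid number and would silently be kept as 1.234, which is exactly the kind of alteration the list above warns about.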
For more information, please visit the Organizing your data link under the RDM tab.
Data Protection
- What is personal and sensitive data?
The GDPR defines personal data in Article 4(1) as “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”.
According to Article 4(13), (14) and (15) and Article 9 and Recitals (51) to (56) of the GDPR, the following personal data is considered ‘sensitive’ and is subject to specific processing conditions:
- personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs;
- trade-union membership;
- genetic data, biometric data processed solely to identify a human being;
- health-related data;
- data concerning a person’s sex life or sexual orientation.
- My data is sensitive. How do I share or publish it?
Sensitive data can only be shared or published if the researcher:
- has the right to publish the data;
- obtained freely given consent for data sharing from the participants beforehand;
- obtained ethical approval for data publication;
and if the data is anonymised and licensed for reuse and attribution.
Otherwise, the researcher can publish the non-identifiable metadata of the sensitive/personal data.
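Anonymising data usually involves more than deleting names. As one illustration, the minimal Python sketch below drops direct identifiers, generalises quasi-identifiers (age into 10-year bands, postcode into its first two digits) and then computes k-anonymity, i.e. the size of the smallest group of records sharing the same quasi-identifier values. The column names, generalisation rules and example records are hypothetical, and a k-anonymity check on its own is not a legal test of whether data counts as anonymised under the GDPR.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "email"}  # hypothetical column names
QUASI_IDENTIFIERS = ("age_band", "postcode_area")


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalise(record: dict) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers (hypothetical rules)."""
    kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    kept["age_band"] = age_band(kept.pop("age"))
    kept["postcode_area"] = kept.pop("postcode")[:2]
    return kept


def k_anonymity(records: list[dict], quasi_identifiers=QUASI_IDENTIFIERS) -> int:
    """Size of the smallest group sharing the same quasi-identifier values (0 if empty)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


if __name__ == "__main__":
    participants = [
        {"name": "P1", "email": "p1@example.org", "age": 34, "postcode": "13125", "diagnosis": "X"},
        {"name": "P2", "email": "p2@example.org", "age": 37, "postcode": "13127", "diagnosis": "Y"},
        {"name": "P3", "email": "p3@example.org", "age": 52, "postcode": "10115", "diagnosis": "X"},
    ]
    released = [generalise(p) for p in participants]
    print(released)
    print("k =", k_anonymity(released))  # k = 1: participant P3 is still unique
```

A result of k = 1 means at least one person remains unique on the quasi-identifiers, so further generalisation or suppression would be needed before such a table could be considered for sharing.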
News
Team
Dr. Inga Patarčić
MDC and I have a few stories to tell. I joined the MDC as a PhD student in bioinformatics (AG Akalin, Bioinformatics and Omics Data Science). Then, as a member of the MDC’s Communication Department, I supported the ORION Project (https://www.orion-openscience.eu/) and the promotion of open science.
With a rediscovered interest in open data, I joined the Research Data Management Team as a Project Manager. My background is in molecular biology, and I worked as a researcher in population genetics at the Medical School in Split.
Dr. Oscar Arturo Migueles Lozano
I joined the MDC as a PhD student in the AG Wolf group (https://www.mdc-berlin.de/wolf). During this period, my skills in bioinformatics and mathematical modeling broadened greatly.
My experience in research has given me insight into the importance of data handling. Research data management captured my interest because it underpins data integrity, strongly influences discoveries and enables easy collaboration among researchers.