Scientists collect and store data on laptops or servers, enter data in lab books, send data by e-mail or upload them into a database – scientific data of all sizes are the basis of research and the overall amount of such data is continuously increasing. The more researchers at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) manage their data in a structured way, the easier it is for others to reuse them, thus enhancing collaboration and sparking scientific discoveries.
A new team at the MDC will now focus exclusively on research data management (RDM): Dr. Sara El-Gebali and Dr. Özlem Özkan will further develop and implement concepts for good data practices. This will be aligned not only with the Center’s requirements and researchers’ needs, but also with international guidelines on data management.
Managing valuable data
Like many other research centers, the MDC needs to ensure access to research data, provide ethical guidance and step up protection. These key issues need to be outlined, addressed and governed by institutional policies. “The task of analyzing, securing, publishing and sharing research data is becoming increasingly complex in the life sciences. All research centers of the Helmholtz Association are working on this task, and none more so than the health research centers,” says Dr. Jutta Steinkötter, Executive Management Scientific Infrastructure. El-Gebali, who heads the RDM team, adds: “The European Commission estimates that the loss of data due to issues associated with data reuse amounts to around €10 billion per year – in the academic sector alone.” As a result, the demand for research data management in international institutions is growing rapidly.
The team’s main focus is to further develop an institutional strategy and roadmap to achieve good data practices. The team continues the work of their colleague Dr. Julia Haseleu, who previously laid the foundations of the MDC’s research data policy and the Center’s commitment to good data practices in collaboration with an external consultant.
Making research workflows more efficient
When it comes to questions concerning data management, El-Gebali and Özkan work at the interface between research, infrastructure, IT, data protection and archival requirements. Together with researchers at the MDC, they want to find software solutions and other tools that enable researchers to analyze, store and describe their data in the best possible way. They want to make research workflows as efficient as possible, helping researchers to produce and process good-quality data that can be easily used and reused.
To access a large microscopic image, for example, a virtual machine would be more suitable than a simple computer with less computing power. This would also allow for smoother data transfer between devices, ensuring that no data are lost in the process. “The nodes of a high-performance computer cluster are also available at the MDC,” says Karsten Häcker, head of Central IT. “Several hundred images – or smaller ‘tiles’ of very large images – can be evaluated simultaneously on these nodes.”
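The idea of evaluating tiles of a large image simultaneously can be illustrated with a minimal Python sketch. It is not the MDC's actual pipeline: the function names, the tile size and the per-tile measure (mean intensity) are assumptions chosen for illustration, and a local thread pool merely stands in for the nodes of a compute cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, tile_size):
    """Cut a 2D image (a list of pixel rows) into tiles of at most tile_size x tile_size."""
    tiles = []
    for top in range(0, len(image), tile_size):
        for left in range(0, len(image[0]), tile_size):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            tiles.append(tile)
    return tiles


def mean_intensity(tile):
    """Hypothetical per-tile evaluation: the average pixel value of the tile."""
    pixels = [p for row in tile for p in row]
    return sum(pixels) / len(pixels)


def evaluate_tiles(image, tile_size, workers=4):
    """Evaluate all tiles concurrently; results keep the row-by-row tile order."""
    tiles = split_into_tiles(image, tile_size)
    # A local thread pool stands in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mean_intensity, tiles))


# Made-up 4 x 4 "image" split into four 2 x 2 tiles.
image = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [8, 8, 2, 2],
    [8, 8, 2, 2],
]
results = evaluate_tiles(image, tile_size=2)
```

Because each tile is evaluated independently of the others, the same pattern scales from four tiles on one machine to several hundred tiles spread across many nodes.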
Such processes will also be supplemented by uniform standards for metadata. If a working group wants to make its data public, metadata provide the background knowledge needed to interpret them. For the analysis of the microscope image, information about the device settings, the examined tissue type, the age of the examined animal and much more is crucial. “For instance, prior to making data public, our team can offer guidance on which descriptive data and metadata are necessary to enable the reusability of data,” adds Özkan. Together with Häcker, the team wants to develop uniform usage concepts, which give researchers either clear guidelines or automated data management processes.
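As a rough illustration of what an automated metadata check might look like, the following Python sketch tests whether a record for a microscope image contains the three pieces of information named above. The field names and example values are hypothetical and do not represent an MDC standard.

```python
# Hypothetical required fields, taken from the examples in the text above.
REQUIRED_FIELDS = ("device_settings", "tissue_type", "animal_age")


def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up record for one microscope image; the animal's age was not entered.
record = {
    "device_settings": {"objective": "40x"},
    "tissue_type": "heart",
}
gaps = missing_metadata(record)  # lists "animal_age" as missing
```

A check of this kind could run before data are deposited, so that gaps are spotted while the information is still easy to recover.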
Training and participation
The team will also offer support with publishing practices, including licensing, data ownership, citation, how and where to deposit data and which data to deposit. In the future, these processes should also be supplemented with uniform standards for the creation and storage of metadata. The team is eager to connect with researchers at the MDC and will offer them one-on-one support as well as training. For all questions regarding data management, Özkan and El-Gebali encourage the researchers at the MDC to engage in dialogue, ask questions, and actively participate in decision-making processes on how the MDC should handle scientific data going forward.
Both new members of the team have a scientific background and several years of experience in data management. Özkan specializes in health and genetic data privacy and previously worked as a data scientist at KPMG, while El-Gebali's background in wet-lab cancer research is complemented by her experience as a database curator at EMBL.