data

What becomes of data after it is generated?

Access to data and its re-use must be carefully considered. Olga Kennard said many years ago that the founding of the Cambridge Crystallographic Data Centre fulfilled a dream she shared with J.D. Bernal. They “had a passionate belief that the collective use of data would lead to the discovery of new knowledge which transcends the results of individual experiments”.

In this era of data-driven science, access to data is of paramount importance.

Generating data

The data cycle begins with three steps that generate the data and use it for the first time:

  • Experiments generate raw data. At this stage, the data is little more than a set of meaningless numbers.
  • These numbers are processed into meaningful values, for instance by calculating percent inhibition, IC50 values or pharmacokinetic parameters.
  • The processed values are then taken together and compared experiment by experiment; the data is analyzed and decisions are made.
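The processing step can be illustrated with percent inhibition. The sketch below assumes a common normalization that is not spelled out in the original: each raw signal is scaled between the mean of uninhibited control wells (0 % inhibition) and the mean of fully inhibited or background wells (100 % inhibition). The function name, compound identifiers and readings are hypothetical.

```python
from statistics import mean


def percent_inhibition(signal, high_controls, low_controls):
    """Normalize one raw signal to percent inhibition.

    Assumption: high_controls are uninhibited wells (0 % inhibition) and
    low_controls are fully inhibited or background wells (100 % inhibition).
    """
    high = mean(high_controls)
    low = mean(low_controls)
    if high == low:
        raise ValueError("high and low control means must differ")
    return 100.0 * (high - signal) / (high - low)


# Hypothetical raw plate readings in arbitrary units.
raw_samples = {"cmpd-A": 5200.0, "cmpd-B": 1400.0}
high_ctrl = [10000.0, 9800.0, 10200.0]
low_ctrl = [1000.0, 1100.0, 900.0]

# Raw numbers become processed values that can be compared.
processed = {
    cid: round(percent_inhibition(s, high_ctrl, low_ctrl), 1)
    for cid, s in raw_samples.items()
}
print(processed)  # {'cmpd-A': 53.3, 'cmpd-B': 95.6}
```

Fitting IC50 values or pharmacokinetic parameters follows the same idea with a model fit instead of a linear rescaling.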

Risks

The first risk of stopping the data cycle after analysis is that decisions are always based on the latest data, without taking into account what was done before. It is like walking in the dark: scientists cannot see the future, and they cannot remember the past either.

Once an experiment is finished, it is hard to retrieve all the previous results and compare them with those of new experiments. The files do remain somewhere on a disk, but they sit in silos and are so difficult to access that nobody uses them.

The second risk is that the whole project history resides in the project leader’s mind. When that person leaves the company, all the investment made in the project vanishes with them.

Preserving data

The data cycle must therefore continue beyond analysis so that results can be re-used:

  • All data shall be preserved in a secure repository. Processed data shall also be structured to allow on-demand retrieval, and raw data shall be stored in the same repository so that it can be reprocessed if needed.
  • The data shall then be shared among project members, using tools that give the whole team easy access to results.
  • Once these two conditions are met, data is available on demand, and scientists can make better decisions with the full history of the project in mind.
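As a minimal sketch of what "structured for on-demand retrieval" might look like, the code below uses an in-memory SQLite database as a stand-in for a secure, shared repository. The table layout, experiment and compound identifiers, values and raw-file paths are all hypothetical; a real repository would also need access control, backups and audit trails. Each processed value keeps a pointer to the raw file it came from, so it can be reprocessed later, and one query returns a compound's history across experiments.

```python
import sqlite3

# In-memory database standing in for a secure repository (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE results (
        experiment_id TEXT NOT NULL,
        run_date      TEXT NOT NULL,  -- ISO 8601 date, sorts chronologically
        compound_id   TEXT NOT NULL,
        assay         TEXT NOT NULL,
        value         REAL NOT NULL,
        unit          TEXT NOT NULL,
        raw_data_uri  TEXT NOT NULL   -- pointer to the preserved raw file
    )
    """
)


def store(rows):
    """Insert processed results; the context manager commits or rolls back."""
    with conn:
        conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def history(compound_id, assay):
    """Return every stored result for one compound and assay, oldest first."""
    cur = conn.execute(
        "SELECT run_date, experiment_id, value, unit, raw_data_uri "
        "FROM results WHERE compound_id = ? AND assay = ? ORDER BY run_date",
        (compound_id, assay),
    )
    return cur.fetchall()


# Hypothetical results from two experiments run a year apart.
store([
    ("EXP-001", "2021-03-02", "cmpd-A", "IC50", 120.0, "nM", "raw/EXP-001/plate1.csv"),
    ("EXP-014", "2022-07-19", "cmpd-A", "IC50", 95.0, "nM", "raw/EXP-014/plate3.csv"),
    ("EXP-014", "2022-07-19", "cmpd-B", "IC50", 800.0, "nM", "raw/EXP-014/plate3.csv"),
])

for row in history("cmpd-A", "IC50"):
    print(row)
```

The design choice here is that history is a query rather than a memory: whoever joins the project can retrieve the earlier result for cmpd-A together with the raw file behind it.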