FAIR

FAIR is about making sure that the next time you, or someone else, come across the data, it will be possible to understand how the data came to be, how it could be reproduced, and how it could be reused.

Good data management is not a goal in itself but rather the key conduit to knowledge discovery and innovation, and to the subsequent integration and reuse of data and knowledge by the community after publication. Unfortunately, the existing data publication system prevents us from extracting maximum benefit from our data (e.g., see an article on the topic).

Our objective is to provide the right infrastructure for good data management and stewardship, so that data can be more easily discovered, evaluated, and reused in downstream studies. We apply four foundational principles (Findability, Accessibility, Interoperability, and Reusability) not only to ‘data’ in the conventional sense but also to the algorithms, tools, and workflows that produced the data.

Findable
The principle of Findability stipulates that data should be identified, described, and registered or indexed in a clear and unequivocal manner. This entails assigning each dataset a unique and persistent identifier, systematically specifying the main characteristics of the data (ideally using standard formats), and storing or indexing these descriptions in a public resource such as a data archive or institutional repository. Our proposed solution for this is The Data Catalog.
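As a rough illustration of what registering and indexing involve, here is a minimal Python sketch of a toy in-memory catalog. It is not The Data Catalog: the names DatasetRecord, Catalog, register, resolve and search are hypothetical, and the urn:uuid identifier is only a local stand-in for a real persistent identifier such as a DOI, which would be minted by a registration service.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """A minimal, hypothetical catalog record describing one dataset."""
    title: str
    description: str
    keywords: list[str]
    # Stand-in only: a real persistent identifier (e.g. a DOI) would be
    # minted by a registration service, not generated locally.
    identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


class Catalog:
    """Toy in-memory index playing the role of the public resource."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}
        self._by_keyword: dict[str, set[str]] = {}

    def register(self, record: DatasetRecord) -> str:
        # Identifiers must be unique, so refuse to register one twice.
        if record.identifier in self._records:
            raise ValueError(f"duplicate identifier: {record.identifier}")
        self._records[record.identifier] = record
        for kw in record.keywords:
            self._by_keyword.setdefault(kw.lower(), set()).add(record.identifier)
        return record.identifier

    def resolve(self, identifier: str) -> DatasetRecord:
        # Look a record up directly by its identifier.
        return self._records[identifier]

    def search(self, keyword: str) -> list[DatasetRecord]:
        # Case-insensitive keyword lookup, results ordered by identifier.
        ids = self._by_keyword.get(keyword.lower(), set())
        return [self._records[i] for i in sorted(ids)]
```

Resolving by identifier and searching by keyword are the two routes by which a dataset becomes findable: one for people who already hold the identifier, the other for people who do not.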

Accessible
The principle of Accessibility stipulates that datasets should be accessible through a clearly defined access procedure, ideally by automated means. This entails establishing authentication and authorisation procedures for access and, where appropriate, implementing automated data retrieval protocols. Metadata should always remain accessible, even when the underlying data is not, or is no longer, available. Our proposed solution for this is The Data Catalog.
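The rule that metadata outlives the data it describes can be sketched as follows. This minimal Python example assumes that the caller has already been authenticated and that authorisation is a simple per-dataset set of user names; Entry and retrieve are hypothetical names, and a real service would expose the same logic over a standard protocol such as HTTPS rather than as a local function.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    metadata: dict
    data: Optional[bytes]          # None once the data has been withdrawn
    allowed_users: frozenset[str]  # hypothetical authorisation list


def retrieve(entry: Entry, user: Optional[str]) -> dict:
    """Always return the metadata; add the data only when it still exists
    and the (already authenticated) user is authorised to see it."""
    response = {"metadata": entry.metadata}
    if entry.data is None:
        response["status"] = "data no longer available"
    elif user is not None and user in entry.allowed_users:
        response["status"] = "ok"
        response["data"] = entry.data
    else:
        response["status"] = "authorisation required"
    return response
```

Because the metadata branch never depends on the data or on the user, a withdrawn or restricted dataset still leaves a readable record behind.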

Interoperable
The principle of Interoperability stipulates that data and metadata are conceptualized, expressed, and structured using common, published standards. This entails adopting technical and semantic standards, such as common data formats, variable definitions, and ontologies. We are working on defining these standards and common ontologies in The Data Model and Standards.
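One common way to tie variables to ontologies is to annotate each column with a compact URI (CURIE) of the form prefix:identifier. The Python sketch below uses a simplified CURIE pattern and only checks the syntax and that the prefix belongs to an agreed set of vocabularies; the ALLOWED_PREFIXES set and the check_annotations function are hypothetical, and confirming that a term actually exists would require querying the ontology itself.

```python
from __future__ import annotations

import re

# Hypothetical set of vocabularies a project has agreed to use.
ALLOWED_PREFIXES = {"NCBITaxon", "UBERON", "UO"}

# Simplified CURIE pattern: a prefix, a colon, then a non-empty local id.
CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9_.]*):(\S+)$")


def check_annotations(annotations: dict[str, str]) -> list[str]:
    """Return problems found in a mapping of column name -> term CURIE."""
    problems = []
    for column, term in annotations.items():
        match = CURIE.match(term)
        if match is None:
            problems.append(f"{column}: '{term}' is not a CURIE (prefix:id)")
        elif match.group(1) not in ALLOWED_PREFIXES:
            problems.append(
                f"{column}: prefix '{match.group(1)}' is not an agreed vocabulary"
            )
    return problems
```

An empty result means every column points at a term from an agreed vocabulary, which is what lets two datasets that use the same term be joined without guessing what a column name meant.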

Reusable
Reusability builds on and refines the other principles: characteristics of the data, including their provenance, should be described in detail according to domain-relevant community standards, with clear and accessible conditions for use. This entails publishing, for every dataset, accurate and relevant data descriptions, access and usage licenses, the community standards employed, and the associated provenance. See how we collect project and experimental metadata in The Data Catalog and LIMS.
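The reuse requirements lend themselves to an automated completeness check. In this minimal Python sketch, the required field names (description, license, standards, provenance), the per-step provenance keys (tool, version, inputs) and the function name reuse_problems are assumptions chosen for illustration, not a schema used by The Data Catalog or LIMS.

```python
from __future__ import annotations

# Hypothetical field names; a real catalog would define its own schema.
REQUIRED_FIELDS = ("description", "license", "standards", "provenance")
REQUIRED_STEP_KEYS = ("tool", "version", "inputs")


def reuse_problems(record: dict) -> list[str]:
    """List what is missing before a metadata record supports reuse."""
    # Top-level fields must be present and non-empty.
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    # Each provenance step should say which tool (and version) ran on which inputs.
    for i, step in enumerate(record.get("provenance") or []):
        for key in REQUIRED_STEP_KEYS:
            if key not in step:
                problems.append(f"provenance step {i}: missing {key}")
    return problems
```

Recording the tool version for each step is what allows someone else to rerun the same workflow later, which is the point of describing provenance in detail.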