Data Pipelines

Development of data-processing pipelines and their integration with research data management.

First, the pipeline is designed: decisions are made about its structure, the technologies used, and its deployment target. We define our pipelines using a controlled vocabulary of terms (an ontology).
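
To make this concrete, the sketch below shows how a controlled vocabulary can constrain the terms used to describe a pipeline. It is a minimal illustration only: the vocabulary, field names, and class are hypothetical examples, not our actual ontology.

```python
# Minimal sketch: the vocabulary and field names below are hypothetical
# examples, not the actual controlled vocabulary used for our pipelines.
from dataclasses import dataclass, field

CONTROLLED_VOCABULARY = {
    "assay": {"shotgun-metagenomics", "mass-spectrometry", "nanopore-sequencing"},
    "platform": {"azure", "dtu-hpc"},
    "workflow_engine": {"nextflow"},
}

@dataclass
class PipelineDefinition:
    name: str
    terms: dict = field(default_factory=dict)

    def validate(self) -> None:
        """Reject any term that is not part of the controlled vocabulary."""
        for key, value in self.terms.items():
            allowed = CONTROLLED_VOCABULARY.get(key)
            if allowed is None or value not in allowed:
                raise ValueError(f"{key}={value!r} is not a controlled-vocabulary term")

# Example: a pipeline described only with approved terms.
PipelineDefinition(
    name="dsp_nf-metagenomics",
    terms={"assay": "shotgun-metagenomics", "platform": "azure", "workflow_engine": "nextflow"},
).validate()
```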

During implementation, we host the code on GitHub and deploy the pipelines to the cloud (e.g. Microsoft Azure) or to a cluster (e.g. DTU HPC). When a pipeline is executed, the project ID, metadata, raw data, and/or databases are retrieved from the data lake (our research data management infrastructure), and processed data, results, and pipeline metadata are saved back to the data lake in the folder corresponding to the project ID.
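
The sketch below illustrates this execution contract, assuming a filesystem-like data lake with one folder per project ID; the mount point, folder layout, and helper names are hypothetical, not our actual implementation.

```python
# Hypothetical data-lake layout: /datalake/<project_id>/{metadata.json,raw/,processed/}.
import json
from pathlib import Path

DATA_LAKE_ROOT = Path("/datalake")  # assumed mount point of the data lake

def fetch_inputs(project_id: str) -> dict:
    """Retrieve project metadata and locate the raw data before a run."""
    project_dir = DATA_LAKE_ROOT / project_id
    metadata = json.loads((project_dir / "metadata.json").read_text())
    raw_files = sorted((project_dir / "raw").glob("*.fastq.gz"))
    return {"metadata": metadata, "raw_files": raw_files}

def store_outputs(project_id: str, results_dir: Path, pipeline_metadata: dict) -> None:
    """Save results and pipeline metadata back to the project's folder."""
    out_dir = DATA_LAKE_ROOT / project_id / "processed"
    out_dir.mkdir(parents=True, exist_ok=True)
    for result in results_dir.iterdir():  # copy each result file verbatim
        (out_dir / result.name).write_bytes(result.read_bytes())
    (out_dir / "pipeline_metadata.json").write_text(json.dumps(pipeline_metadata, indent=2))
```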

We have developed the following pipelines so far:

Shotgun metagenomics pipeline to process microbiome samples: https://github.com/biosustain/dsp_nf-metagenomics

Nextflow pipeline for MS-DAP (https://github.com/ftwkoopmans/msdap), a downstream pipeline that generates statistical PDF reports for mass spectrometry data.

The following pipelines have also been integrated into our cloud computing infrastructure:

nf-core/mag (https://nf-co.re/mag/2.5.4) is a bioinformatics best-practice analysis pipeline for the assembly, binning, and annotation of metagenomes.

nf-core/taxprofiler (https://nf-co.re/taxprofiler/1.1.6) is a bioinformatics best-practice analysis pipeline for taxonomic classification and profiling of shotgun short- and long-read metagenomic data.

nf-core/nanoseq (https://nf-co.re/nanoseq/3.1.0) is a bioinformatics analysis pipeline for Nanopore DNA/RNA sequencing data.

Circle-Map (https://github.com/iprada/Circle-Map) is a Python package that implements all the steps required to detect extrachromosomal DNA circles.
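
These nf-core pipelines are versioned Nextflow workflows, so a run can be pinned to a specific release. As a minimal, illustrative sketch of launching one programmatically (the wrapper function, sample sheet path, output directory, and container profile below are hypothetical placeholders; each pipeline's own documentation lists its actual parameters):

```python
# Illustrative sketch only: paths, the profile, and the wrapper itself are
# hypothetical; consult each nf-core pipeline's docs for real parameters.
import subprocess

def run_nf_core_pipeline(pipeline: str, revision: str, input_sheet: str, outdir: str) -> None:
    """Launch a pinned release of an nf-core pipeline via Nextflow."""
    cmd = [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", revision,          # pin the pipeline version for reproducibility
        "-profile", "docker",    # container profile; would differ on DTU HPC
        "--input", input_sheet,  # sample sheet describing the raw data
        "--outdir", outdir,      # destination folder for results
    ]
    subprocess.run(cmd, check=True)

# Example: run nf-core/mag 2.5.4 on a project's sample sheet.
run_nf_core_pipeline("mag", "2.5.4", "samplesheet.csv", "results/")
```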