Data Pipelines
Pipeline process development and integration with research data management.
While implementing the pipelines, we upload the code to GitHub, and the pipelines are deployed in the cloud (e.g., Microsoft Azure) or on a cluster (e.g., DTU HPC). When a pipeline is executed, the project ID, metadata, raw data and/or databases are retrieved from the data lake (the research data management infrastructure), and the processed data, results and pipeline metadata are saved back to the data lake in the folder corresponding to the project ID.
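The read-process-write cycle described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: a local directory stands in for the data lake, and the folder layout (`raw/`, `processed/`, `metadata.json`, `pipeline_metadata.json`) as well as the names `run_pipeline` and `process` are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable


def run_pipeline(data_lake: Path, project_id: str,
                 process: Callable[[str, dict], str]) -> Path:
    """Read a project's inputs from the data lake, process them, write outputs back.

    Assumed (hypothetical) layout: <data_lake>/<project_id>/raw/* and
    <data_lake>/<project_id>/metadata.json.
    """
    project_dir = data_lake / project_id
    metadata = json.loads((project_dir / "metadata.json").read_text())

    # Outputs go into the folder that corresponds to the project ID.
    out_dir = project_dir / "processed"
    out_dir.mkdir(parents=True, exist_ok=True)

    processed_files = []
    for raw_file in sorted((project_dir / "raw").iterdir()):
        result = process(raw_file.read_text(), metadata)
        (out_dir / raw_file.name).write_text(result)
        processed_files.append(raw_file.name)

    # Pipeline metadata is stored next to the processed data.
    run_info = {"project_id": project_id, "inputs": processed_files}
    (out_dir / "pipeline_metadata.json").write_text(json.dumps(run_info, indent=2))
    return out_dir
```

In a real deployment the local paths would be replaced by calls to the storage service behind the data lake, and `process` by the pipeline itself.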
Shotgun metagenomics pipeline to process microbiome samples: https://github.com/biosustain/dsp_nf-metagenomics
The following pipelines have also been integrated in our cloud computing infrastructure:
https://nf-co.re/mag/2.5.4 is a bioinformatics best-practice analysis pipeline for assembly, binning and annotation of metagenomes.
https://nf-co.re/taxprofiler/1.1.6 is a bioinformatics best-practice analysis pipeline for taxonomic classification and profiling of shotgun short- and long-read metagenomic data.
https://nf-co.re/nanoseq/3.1.0 is a bioinformatics analysis pipeline for Nanopore DNA/RNA sequencing data.
https://github.com/iprada/Circle-Map is a Python package that implements all the steps required to detect extrachromosomal DNA circles.
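The nf-core pipelines listed above are pinned to specific releases, which Nextflow selects with its `-r` option. As a rough sketch of how such a pinned run could be assembled, the helper below builds the command line for one of them. The function name `nfcore_command`, the `docker` profile and the example sample sheet name are assumptions for illustration; each pipeline has its own required parameters, so check its documentation before running.

```python
from typing import Dict, List, Optional


def nfcore_command(pipeline: str, revision: str, outdir: str,
                   profile: str = "docker",
                   params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a `nextflow run` command for a pinned nf-core pipeline release."""
    cmd = ["nextflow", "run", f"nf-core/{pipeline}",
           "-r", revision,          # pin the pipeline release
           "-profile", profile,     # assumed execution profile
           "--outdir", outdir]
    # Pipeline-specific parameters are passed as --name value pairs.
    for name, value in (params or {}).items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Example: the mag release referenced above, with a hypothetical sample sheet.
command = nfcore_command("mag", "2.5.4", "results",
                         params={"input": "samplesheet.csv"})
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where Nextflow is installed.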