NGS in medical genetics
Malian Data Science and Bioinformatics Network (MD-BioNet)
February 21, 2024
Deoxyribonucleic acid, commonly referred to as DNA, is a sophisticated molecule housing all the essential information required for the construction and sustenance of all living organisms.
Source : https://www.geeksforgeeks.org/difference-between-dna-and-rna/
The determination of the order of nucleotides within a DNA molecule is called DNA sequencing (DNASeq).
RNA sequencing (RNASeq) is the same process applied to RNA molecules such as mRNA.
Any process or technology that is used to achieve this goal
source: Aimin Yang, Wei Zhang, Jiahao Wang, Ke Yang, Yang Han and Limin Zhang - doi:10.3389/fbioe.2020.01032
Source : Ronholm, Jennifer & Nasheri, Neda & Petronella, Nicholas & Pagotto, Franco. (2016). Navigating Microbiological Food Safety in the Era of Whole-Genome Sequencing. Clinical Microbiology Reviews. 29. 10.1128/CMR.00056-16.
source: https://twitter.com/PacBio/status/1233091102800011266/photo/1
source: https://www.genome.gov/sequencingcosts/
Increasing amount of data generated (Stephens et al. 2015)
source : https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Data storage and processing
Interpretation
Integration between omics and other sources of information (López de Maturana et al. 2019)
In these applications the approach may be targeted or genome/transcriptome-wide.
source : 2019 ACE-B NGS Intro Course , Dr. Ghedira Kais
This will depend on the type of application (Head et al. 2014), but in general:
DNA/cDNA extraction and purification
Enrichment : targeted or not
Adapter ligation and amplification
source: https://wp.unil.ch/gtf/illumina-short-read-sequencing/
source : https://biocorecrg.github.io/RNAseq_course_2019/fastq.html
Example of the possibilities for secondary analysis depending on the application (Garcia et al. 2020)
source : https://github.com/nf-core/sarek
source : https://uofabioinformaticshub.github.io/Intro-NGS-Sept-2017/notes/variant_calling.html
Interpretation of genomic variation is complex (Quintáns et al. 2014)
The goal here is to assess the overall quality of the base calls made
by the sequencer and to detect possible anomalies. Several tools can do this; here we used fastp (Chen et al. 2018).
After this, a decision can be made to apply additional quality filtering, such as:
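To make "quality of the base calls" concrete, here is a minimal Python sketch of how the quality string of a FASTQ record maps to Phred scores and error probabilities. It assumes the Phred+33 encoding, and the function names are hypothetical. It only illustrates the kind of metric that QC tools summarize; it is not how fastp is implemented.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    Assumption: Phred+33 encoding (ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def mean_quality(quality_string, offset=33):
    """Mean Phred score of a read (0.0 for an empty read)."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


def fraction_at_least(quality_string, threshold=30, offset=33):
    """Fraction of bases whose Phred score is >= threshold (e.g. Q30)."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        return 0.0
    return sum(1 for q in scores if q >= threshold) / len(scores)


if __name__ == "__main__":
    quality = "II55"  # hypothetical quality string for a 4-base read
    print(phred_scores(quality))
    print(mean_quality(quality))
    print(fraction_at_least(quality, threshold=30))
```

A read with a low mean quality or a low fraction of Q30 bases would be a candidate for trimming or filtering.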
Further adapter trimming
Quality trimming
Quality filtering
GC content is a very important parameter here.
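As an illustration, GC content can be computed per read as the fraction of G and C bases. The sketch below is a simplified assumption: it counts over all bases, including any N, whereas a QC tool may handle ambiguous bases differently. The function names are hypothetical.

```python
def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C (case-insensitive).

    Simplification: ambiguous bases such as N count in the denominator.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_gc_content(reads):
    """Average per-read GC content over an iterable of read sequences."""
    values = [gc_content(read) for read in reads]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    reads = ["ATGC", "GGCC", "ATAT"]  # hypothetical reads
    for read in reads:
        print(read, gc_content(read))
    print("mean:", mean_gc_content(reads))
```

A GC distribution that deviates strongly from what is expected for the organism or target region can point to contamination or library bias.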
With short-read sequencing we get individual reads, but they usually come from similar regions of the targeted DNA/RNA molecules.
You can either assemble the puzzle from the reads alone (de novo assembly), use a reference, or mix these strategies.
Several tools exist for the mapping to a reference genome. Note on the mapping process:
The output of this step is a SAM/BAM file whose specification can be found at https://github.com/samtools/hts-specs.
source : https://www.samformat.info/sam-format-flag
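The FLAG field of a SAM record is a bitwise combination of properties defined in the SAM specification. A minimal sketch of decoding it in Python follows. The bit values come from the specification, while the label wording and the function names are choices made for this illustration.

```python
# (bit, label) pairs for the SAM FLAG field, as defined in the SAM specification.
SAM_FLAGS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAGS if flag & bit]


def is_mapped(flag):
    """True when the 'read unmapped' bit (0x4) is not set."""
    return not flag & 0x4


if __name__ == "__main__":
    for flag in (99, 147, 4):
        print(flag, decode_flag(flag), "mapped" if is_mapped(flag) else "unmapped")
```

For example, flag 99 is 64 + 32 + 2 + 1, which decodes to a paired read, mapped in a proper pair, whose mate is on the reverse strand, and which is the first read in the pair.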
Here you can detect potential issues with the sequencing reads. One should check :
View the MultiQC file for illustration
There are several possibilities to call variants from BAM files. Here is an illustration of GATK best practices for exome variant calling (Alganmi and Abusamra 2023)
source : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399881/
Alternative single nucleotide variant (SNV) workflow for this tutorial. GATK-based workflows are usable on platforms like https://app.terra.bio/ or https://seqera.io/.
Test your workflow against established datasets
Compare it to alternatives for precision and sensitivity
Accuracy for single nucleotide variants
Accuracy for insertions/deletions
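As a rough sketch of such a comparison, precision and sensitivity can be computed by matching called variants against a truth set, separately for single nucleotide variants and insertions/deletions. The example assumes variants are represented as (chromosome, position, ref, alt) tuples and matched exactly. Dedicated benchmarking tools also normalize variant representation, which this simplified version does not do. All names here are hypothetical.

```python
def is_snv(variant):
    """A variant is treated as an SNV when ref and alt are both one base long."""
    _, _, ref, alt = variant
    return len(ref) == 1 and len(alt) == 1


def benchmark(called, truth):
    """Compare called variants to a truth set using exact matching."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed by the caller
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "sensitivity": sensitivity}


def benchmark_by_type(called, truth):
    """Report accuracy separately for SNVs and for insertions/deletions."""
    return {
        "snv": benchmark([v for v in called if is_snv(v)],
                         [v for v in truth if is_snv(v)]),
        "indel": benchmark([v for v in called if not is_snv(v)],
                           [v for v in truth if not is_snv(v)]),
    }


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any real dataset.
    truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "AT", "A")]
    called = [("chr1", 100, "A", "G"), ("chr1", 300, "G", "A"), ("chr2", 50, "AT", "A")]
    print(benchmark_by_type(called, truth))
```

Reporting the two variant types separately matters because indel calling is typically harder than SNV calling, so a single pooled figure can hide weak indel performance.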
Several established methods as well as machine learning models
Considerations here are :
Accuracy
Usability
Cost
Access : open source, commercial
For example, services like VEP, OpenCRAVAT, and wANNOVAR are free for academic use. Commercial services like VarSome have restrictions.
Misdiagnosis
Phenotypic complexity
Lack of knowledge of molecular action of predicted deleterious variants
NGS does increase the diagnostic yield for poorly characterized diseases (Yska et al. 2019).
Machine learning models can also make contributions (O’Brien et al. 2022).