Runs rduckhts_fasta_nuc() once and writes the GC percentage table to a bgzipped, tabix-indexed TSV file. Pass the resulting path to nipter_gc_correct(gc_table = ...) to avoid recomputing GC content for every sample in a large cohort.

Usage

nipter_gc_precompute(fasta, binsize = 50000L, out, con = NULL)

Arguments

fasta

Path to an indexed reference FASTA file (.fa/.fasta with .fai).

binsize

Bin size in base pairs (default 50000L).

out

Path for the output file. The tabix index is written alongside as <out>.tbi.

con

Optional open DBI connection with duckhts loaded.

Value

The path out, returned invisibly.

Details

The output is a bgzipped, tab-delimited file with five columns: chrom, start, end, pct_gc, seq_len. Coordinates are 0-based half-open intervals (BED convention). Chromosome names carry no chr prefix (1-22, X, Y). Bins in which every base is N are written with pct_gc = NA.
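To make the layout concrete, here is a minimal base-R sketch that reads a table of this shape. It does not call the package. It writes a small mock file and reads it back. The rows, the pct_gc values and their scale, and the absence of a header line are all assumptions made for illustration, and gzfile() stands in for bgzip, which works because bgzip output is gzip-compatible.

```r
# Mock rows in the documented 5-column layout. All values are invented.
mock <- data.frame(
  chrom   = c("1", "1", "X"),
  start   = c(0L, 50000L, 0L),
  end     = c(50000L, 100000L, 50000L),
  pct_gc  = c(NA, 41.2, 38.5),   # NA marks a bin made up entirely of N
  seq_len = c(50000L, 50000L, 50000L)
)

# Write a gzip-compressed, headerless TSV (assumption: no header line).
path <- tempfile(fileext = ".tsv.gz")
zz <- gzfile(path, "w")
write.table(mock, zz, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
close(zz)

# Read it back; read.delim() detects gzip compression automatically.
gc_tab <- read.delim(
  path, header = FALSE,
  col.names  = c("chrom", "start", "end", "pct_gc", "seq_len"),
  colClasses = c("character", "integer", "integer", "numeric", "integer"),
  na.strings = "NA"
)

# Half-open intervals: bin width is end - start, with no +1 correction.
gc_tab$width <- gc_tab$end - gc_tab$start

# Drop all-N bins before using the GC values.
usable <- gc_tab[!is.na(gc_tab$pct_gc), ]
```

Reading chrom as character keeps numeric chromosome names such as 1 from being converted to integers, so they compare cleanly against X and Y.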

Examples

if (FALSE) { # \dontrun{
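# Compute the GC table once per reference and bin size.
nipter_gc_precompute("hg38.fa", binsize = 50000L, out = "hg38_gc_50k.tsv.bgz")
# Reuse the precomputed table for each sample (cg must already exist).
cg <- nipter_gc_correct(cg, gc_table = "hg38_gc_50k.tsv.bgz")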
} # }