Bin a BAM/CRAM file — NIPTeR style — nipter_bin

Replicates NIPTeR's bin_bam_sample(): counts reads in fixed-width bins across chromosomes 1-22, X, and Y. The original NIPTeR counts all mapped reads with no MAPQ filter; real-world NIPT pipelines typically pre-filter with samtools view --min-MQ 40 -F 1024 before binning. Both modes are supported through the mapq, exclude_flags, and require_flags parameters.

Usage

nipter_bin_bam(
  bam,
  binsize = 50000L,
  mapq = 0L,
  require_flags = 0L,
  exclude_flags = 0L,
  rmdup = c("none", "flag"),
  separate_strands = FALSE,
  con = NULL,
  reference = NULL
)

Arguments

bam: Path to an indexed BAM or CRAM file.
binsize: Bin size in base pairs. Default 50000 (NIPTeR's fixed bin size). Must match the binsize used when building any control group.
mapq: Minimum mapping quality. Default 0L (NIPTeR original: all mapped reads). Set to 40L to match common NIPT pipeline pre-filtering (samtools view --min-MQ 40).
require_flags: Integer bitmask; only reads with all of these bits set are kept (samtools view -f). Default 0L (no requirement).
exclude_flags: Integer bitmask; reads with any of these bits set are dropped (samtools view -F). Default 0L (no exclusion). Set to 1024L (0x400) to exclude reads marked as duplicates (samtools view -F 1024).
rmdup: Duplicate removal strategy. "none" (default, NIPTeR standard) or "flag" (equivalent to exclude_flags = 1024L; for BAMs already processed by Picard / sambamba). NIPTeR does not perform streaming deduplication.
separate_strands: Logical; when TRUE, produces a SeparatedStrands object with independent forward/reverse count matrices. Default FALSE (CombinedStrands).
con: Optional open DBI connection with duckhts already loaded.
reference: Optional FASTA reference path for CRAM inputs.
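
The two flag masks follow the usual samtools semantics. As a rough sketch only (keep_read() is a hypothetical helper written for illustration, not a function of this package), the filter can be pictured in base R as:

```r
# Hypothetical helper illustrating the mask semantics; not part of the package.
keep_read <- function(flag, require_flags = 0L, exclude_flags = 0L) {
  # Keep only reads with ALL required bits set (samtools view -f) ...
  has_required <- bitwAnd(flag, require_flags) == require_flags
  # ... and with NONE of the excluded bits set (samtools view -F).
  has_excluded <- bitwAnd(flag, exclude_flags) != 0L
  has_required & !has_excluded
}

# Toy SAM flags: plain, reverse strand, duplicate, reverse + duplicate
flags <- c(0L, 16L, 1024L, 1040L)
kept <- keep_read(flags, exclude_flags = 1024L)
```

With exclude_flags = 1024L, the two duplicate-flagged reads are dropped and the other two are kept, which is the behaviour rmdup = "flag" is described as providing.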

Value

An object of class c("NIPTeRSample", <strand_type>):

CombinedStrands (default): autosomal_chromosome_reads is a list of one 22-row integer matrix (rows "1"–"22"); sex_chromosome_reads is a list of one 2-row matrix (rows "X", "Y").

SeparatedStrands (separate_strands = TRUE): autosomal_chromosome_reads is a list of two matrices — element 1 is forward (rows "1F"–"22F"), element 2 is reverse (rows "1R"–"22R"); sex_chromosome_reads similarly contains forward ("XF", "YF") and reverse ("XR", "YR") matrices.
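
To make the layout concrete, the following sketch builds a mock object with the same shape as the CombinedStrands case described above. It is an assumption-laden illustration: the bin count of 4 and the zero counts are placeholders, and a real object returned by nipter_bin_bam() may carry additional fields not shown here.

```r
# Mock object mirroring the documented layout; real objects come from
# nipter_bin_bam(). The bin count (4) and zero counts are placeholders.
n_bins <- 4L
combined <- structure(
  list(
    autosomal_chromosome_reads = list(
      matrix(0L, nrow = 22, ncol = n_bins,
             dimnames = list(as.character(1:22), NULL))
    ),
    sex_chromosome_reads = list(
      matrix(0L, nrow = 2, ncol = n_bins,
             dimnames = list(c("X", "Y"), NULL))
    )
  ),
  class = c("NIPTeRSample", "CombinedStrands")
)

# Rows are addressed by chromosome name, columns by bin index.
chr21 <- combined$autosomal_chromosome_reads[[1]]["21", ]
chr_x <- combined$sex_chromosome_reads[[1]]["X", ]
```

For SeparatedStrands the same indexing applies, with [[1]] and [[2]] selecting the forward and reverse matrices and row names such as "21F" or "21R".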

Details

The per-bin read counts are reshaped into a NIPTeRSample object whose structure parallels NIPTeR's NIPTSample: autosomal reads as a chromosome-by-bin matrix and sex chromosome reads as a separate two-row matrix.
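
Fixed-width binning is plain integer arithmetic. A minimal sketch, assuming 1-based alignment start positions and that each read is assigned to the bin containing its start; the package's exact convention is not stated on this page, and bin_positions() is a hypothetical helper:

```r
# Hypothetical helper; assumes 1-based start positions, one count per read.
bin_positions <- function(pos, chrom_length, binsize = 50000L) {
  n_bins <- ceiling(chrom_length / binsize)
  # A 1-based position p falls in bin floor((p - 1) / binsize) + 1.
  bin_index <- (pos - 1L) %/% binsize + 1L
  tabulate(bin_index, nbins = n_bins)
}

# Toy chromosome of 150 kb, so three 50 kb bins.
counts <- bin_positions(c(1L, 50000L, 50001L, 120000L), chrom_length = 150000L)
```

Each chromosome's count vector then becomes one row of the chromosome-by-bin matrix.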

When separate_strands = TRUE, forward (+) and reverse (-) reads are counted independently, producing two matrices per chromosome set (class "SeparatedStrands"). Each chromosome then contributes a forward and a reverse row, which doubles the predictor pool for nipter_regression(); see the NIPTeR documentation for details.
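
The strand split can be pictured with the SAM reverse-strand flag bit (0x10). This is a sketch on made-up data, assuming bin indices have already been computed; it is not the package's internal code:

```r
# Toy reads: SAM flag and precomputed 1-based bin index (both made up).
flags <- c(0L, 16L, 0L, 16L, 16L)
bins <- c(1L, 1L, 2L, 3L, 3L)
n_bins <- 3L

# Bit 0x10 (16) marks reads aligned to the reverse strand.
is_reverse <- bitwAnd(flags, 16L) != 0L
forward <- tabulate(bins[!is_reverse], nbins = n_bins)
reverse <- tabulate(bins[is_reverse], nbins = n_bins)

# Row names follow the "21F" / "21R" pattern used by SeparatedStrands.
strand_counts <- rbind("21F" = forward, "21R" = reverse)
```

Summing the two rows recovers the combined per-bin count, so the separated representation carries the same reads split across twice as many rows.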

Examples

if (FALSE) { # \dontrun{
# NIPTeR original defaults: all mapped reads, no dedup
sample <- nipter_bin_bam("sample.bam", binsize = 50000L)

# Common NIPT pipeline: MAPQ >= 40, exclude duplicate-flagged reads
sample <- nipter_bin_bam("sample.dm.bam", binsize = 50000L,
                         mapq = 40L, exclude_flags = 1024L)

# SeparatedStrands for regression with doubled predictor pool
sample_ss <- nipter_bin_bam("sample.bam", separate_strands = TRUE)

sample$autosomal_chromosome_reads[[1]]["21", ]   # chr21 bin counts
} # }
