Count reads per bin from a BAM or CRAM file
Core read-counting engine shared by the WisecondorX and NIPTeR layers.
Reads aligned reads from a BAM or CRAM file and returns per-bin read counts
for chromosomes 1-22 and the sex chromosomes (X mapped to key "23", Y to
"24").
Usage
bam_convert(
  bam,
  binsize = 5000L,
  mapq = 1L,
  require_flags = 0L,
  exclude_flags = 0L,
  rmdup = c("streaming", "flag", "none"),
  separate_strands = FALSE,
  con = NULL,
  reference = NULL
)
Arguments
- bam
Path to an indexed BAM or CRAM file.
- binsize
Bin size in base pairs. Default 5000L (WisecondorX); use 50000L for NIPTeR-style workflows.
- mapq
Minimum mapping quality. Default 1L (WisecondorX / samtools default). Set to 0L to retain all reads regardless of MAPQ (NIPTeR).
- require_flags
Integer bitmask. Only reads for which (FLAG & require_flags) == require_flags are retained. 0L (default) imposes no requirement. Example: require_flags = 0x2L keeps only properly paired reads.
- exclude_flags
Integer bitmask. Reads for which (FLAG & exclude_flags) != 0 are dropped. 0L (default) drops nothing. Example: exclude_flags = 0xF04L drops unmapped, secondary, QC-fail, duplicate and supplementary reads (common samtools pre-filter).
- rmdup
Duplicate removal strategy. "streaming" (default) applies the WisecondorX larp/larp2 algorithm (also excludes improper pairs). "flag" drops reads with SAM flag 0x400. "none" keeps all reads that pass the other filters.
- separate_strands
Logical; when TRUE, returns per-strand counts (forward + and reverse -). The return value changes to a list of two named lists (fwd and rev), each structured like the default return. Used by nipter_bin_bam(separate_strands = TRUE) for the NIPTeR SeparatedStrands object. Default FALSE.
- con
Optional open DBI connection with duckhts already loaded. If NULL (default) a temporary in-memory DuckDB connection is created.
- reference
Optional FASTA reference path for CRAM inputs.
Value
When separate_strands = FALSE (default): a named list with one
integer vector per chromosome key ("1"–"22", "23" for X, "24" for
Y). Each vector contains per-bin read counts (bin 0 = positions 0 to
binsize - 1). Chromosomes absent from the BAM are NULL.
When separate_strands = TRUE: a list with two elements, fwd and rev,
each structured like the default return.
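To make the return shape concrete, the following sketch builds a mock list by hand instead of reading a BAM. The names mock_bins and bin_index are hypothetical, the counts are made up, and it assumes that bin 0 is stored as the first element of each R vector.

```r
# Hypothetical stand-in for a bam_convert() result; no file is read.
binsize <- 5000L
mock_bins <- list(
  "1"  = c(12L, 7L, 0L, 3L),  # chromosome 1, four bins
  "23" = c(4L, 9L),           # chromosome X
  "24" = NULL                 # chromosome Y absent from the input
)

# Assumption: bin k (0-based) covers positions k * binsize to
# (k + 1) * binsize - 1 and sits at R index k + 1.
bin_index <- function(pos0, binsize) pos0 %/% binsize + 1L

idx   <- bin_index(12345L, binsize)  # position 12345 falls in bin 2, index 3
count <- mock_bins[["1"]][idx]

# Total reads per chromosome key, treating absent chromosomes as zero.
total_per_chrom <- vapply(
  mock_bins,
  function(x) if (is.null(x)) 0L else sum(x),
  integer(1)
)
```

Checking for NULL before summing keeps downstream code from failing on samples that lack a chromosome, such as Y in a female sample.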
Details
Read filtering mirrors the samtools view convention: mapq sets the
minimum mapping quality; require_flags is a bitmask of flags that must
all be set (equivalent to samtools view -f); exclude_flags is a
bitmask of flags that must all be clear (equivalent to
samtools view -F). Use the duckhts UDF sam_flag_bits(flag) to inspect
named flag fields, or sam_flag_has(flag, bit) to test individual bits.
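The two bitmask tests can be reproduced in base R with bitwAnd(). This is a sketch of the filtering rule only, not the package internals; passes_flags is a hypothetical helper and the FLAG values are illustrative.

```r
# Keep a read when every required bit is set and no excluded bit is set.
passes_flags <- function(flag, require_flags = 0L, exclude_flags = 0L) {
  bitwAnd(flag, require_flags) == require_flags &
    bitwAnd(flag, exclude_flags) == 0L
}

# Illustrative FLAG values:
#   99   = paired, proper pair, mate reverse, first in pair
#   4    = unmapped
#   1123 = 99 plus 0x400 (duplicate)
#   2147 = 99 plus 0x800 (supplementary)
flags <- c(99L, 4L, 1123L, 2147L)

# Like samtools view -f 0x2: only properly paired reads pass.
proper_only <- passes_flags(flags, require_flags = 0x2L)

# Like samtools view -F 0xF04. The mask is the union of
# 0x4, 0x100, 0x200, 0x400 and 0x800.
mask <- Reduce(bitwOr, c(0x4L, 0x100L, 0x200L, 0x400L, 0x800L))
clean_only <- passes_flags(flags, exclude_flags = mask)
```

Building the mask from named bits with bitwOr() is more verbose than a hex literal, but it makes it easy to see which read classes are being dropped.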
rmdup controls duplicate removal independently of the flag filters:
"streaming" applies the WisecondorX larp/larp2 consecutive-position state
machine and also enforces the WisecondorX improper-pair rule (paired reads
that are not properly paired are excluded from both counting and the dedup
state; this is intrinsic to the algorithm, not a flag option); "flag"
excludes reads with SAM flag 0x400 set (duplicates pre-marked by Picard or
sambamba) on top of the exclude_flags filter; "none" applies no deduplication.
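The difference between the strategies can be illustrated on toy data. The sketch below does not reproduce the larp/larp2 state machine: it assumes coordinate-sorted, single-end reads and treats a read as a duplicate only when its start equals the start of the read immediately before it. All values and variable names are made up for illustration.

```r
# Toy reads on one chromosome, sorted by start position.
binsize <- 500L
pos  <- c(100L, 100L, 250L, 250L, 250L, 900L)
flag <- c(0L, 1024L, 0L, 0L, 1024L, 0L)  # 1024 = 0x400, marked duplicate

# "flag"-style: drop reads whose 0x400 bit is set.
keep_flag <- bitwAnd(flag, 0x400L) == 0L

# Streaming-style (simplified assumption): drop a read whose start matches
# the previous read's start, regardless of its FLAG.
keep_stream <- c(TRUE, pos[-1] != pos[-length(pos)])

# "none": keep everything.
keep_none <- rep(TRUE, length(pos))

# Per-bin counts under each strategy (bin 0 is the first element).
count_bins <- function(keep) tabulate(pos[keep] %/% binsize + 1L, nbins = 2L)
counts_flag   <- count_bins(keep_flag)
counts_stream <- count_bins(keep_stream)
counts_none   <- count_bins(keep_none)
```

On this toy input the position-based rule removes one more read than the flag-based rule, because the fourth read shares a start with the third but was never marked as a duplicate.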
Examples
if (FALSE) { # \dontrun{
# WisecondorX defaults
bins <- bam_convert("sample.bam")
# NIPTeR defaults — all mapped reads, no dedup, 50 kb bins
bins <- bam_convert("sample.bam", binsize = 50000L, mapq = 0L,
                    rmdup = "none")
# Pre-filtered BAM: skip unmapped + secondary + supplementary, flag dedup
bins <- bam_convert("sample.bam",
                    exclude_flags = bitwOr(0x4L, bitwOr(0x100L, 0x800L)),
                    rmdup = "flag")
} # }