Reads all .bed.gz (or .tsv.bgz) files in bed_dir with rduckhts_tabix_multi(), a single multi-file DuckDB scan, and constructs a NIPTeRControlGroup from the results. For large cohorts this is much faster than lapply(files, bed_to_nipter_sample), because all files are read in one pass rather than one call per file.

Usage

nipter_control_group_from_beds(
  bed_dir,
  pattern = "*.bed.gz",
  binsize = NULL,
  description = "General control group",
  con = NULL
)

Arguments

bed_dir

Character; directory containing one .bed.gz or .tsv.bgz file per control sample, each produced by nipter_bin_bam_bed.

pattern

Glob pattern for file discovery (default "*.bed.gz"); use "*.tsv.bgz" to match .tsv.bgz files.

binsize

Optional integer; bin size in base pairs. If NULL (default), inferred from the first row of the first file.

description

Label for the resulting control group (default "General control group").

con

Optional open DBI connection with duckhts loaded.
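
To make the pattern and binsize arguments more concrete, here is a minimal base-R sketch of glob-based file discovery and of inferring a bin size from the first row of the first file. It is not the package's implementation: the temporary directory, the file names, the four-column layout (chrom, start, end, count), and the end - start rule are all assumptions made for illustration.

```r
# Hypothetical stand-in for a control directory (not created by the package).
bed_dir <- tempfile("controls_demo")
dir.create(bed_dir)

# Write a tiny gzip-compressed, tab-separated file.
# Assumed columns: chrom, start, end, count.
gz <- gzfile(file.path(bed_dir, "sample1.bed.gz"), "w")
writeLines(c("chr1\t0\t50000\t12", "chr1\t50000\t100000\t9"), gz)
close(gz)

# A non-matching file, to show that the glob filters it out.
invisible(file.create(file.path(bed_dir, "notes.txt")))

# Translate the glob into a regular expression and list matching files.
files <- list.files(
  bed_dir,
  pattern = utils::glob2rx("*.bed.gz"),
  full.names = TRUE
)

# Assumed inference rule: bin size = end - start of the first row.
first_row <- read.delim(files[1], header = FALSE, nrows = 1)
inferred_binsize <- first_row[[3]] - first_row[[2]]
```

If the bin size is already known, passing binsize explicitly avoids relying on the first row of the first file.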

Value

A NIPTeRControlGroup.

Examples

if (FALSE) { # \dontrun{
# Bin all controls to BED once
# (bam_files: character vector of control BAM paths)
for (bam in bam_files) {
  nipter_bin_bam_bed(bam, file.path("controls", sub("\\.bam$", ".bed.gz", basename(bam))))
}
# Load them all at once
cg <- nipter_control_group_from_beds("controls/")
} # }