Creates 50 coordinate-sorted, indexed BAM files in a specified directory using compressed chromosome lengths: each 100 kb GRCh37 bin is represented as a 100 bp region, yielding identical bin-count structure at a fraction of the file size (~435 KB per BAM).
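The bin-count equivalence can be checked with a little arithmetic. The sketch below uses the GRCh37 length of chromosome 21 and assumes that a trailing partial bin is rounded up to a full bin; the variable names are illustrative only, and the package may handle the last bin differently.

```r
# Illustrative arithmetic only; none of these names come from the package.
full_binsize       <- 100000L    # 100 kb bins on real GRCh37 coordinates
compressed_binsize <- 100L       # 100 bp regions in the compressed BAMs
chr21_length       <- 48129895L  # GRCh37 length of chromosome 21

# Number of 100 kb bins on the real chromosome
# (assumption: a trailing partial bin counts as one bin).
n_bins_full <- ceiling(chr21_length / full_binsize)              # 482

# Compressed chromosome length: one 100 bp region per original bin.
compressed_length <- n_bins_full * compressed_binsize            # 48200

# Binning the compressed chromosome at 100 bp gives the same bin count.
n_bins_compressed <- ceiling(compressed_length / compressed_binsize)

n_bins_full == n_bins_compressed                                 # TRUE
```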

Usage

generate_cohort(out_dir, verbose = TRUE)

Arguments

out_dir

Directory to write BAM files into (created if needed).

verbose

Logical; emit progress messages via message().

Value

A data frame (manifest) with columns: sample_id, sex, trisomy, n_reads, bam_file.

Details

The cohort contains 35 euploid females, 12 euploid males, and 3 trisomy females (T21, T18, T13). Each sample uses approximately 3 reads per bin (~91k reads total). Results are deterministic (sample i uses set.seed(42 + i)).
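The per-sample seeding can be illustrated with a minimal sketch. The helper draw_positions() is hypothetical and is not how the package necessarily simulates reads; it only shows that fixing the seed to 42 + i makes a draw of unique, sorted positions repeatable for sample i.

```r
# Hypothetical helper, not part of the package: draws unique, sorted
# read start positions for sample i on a toy compressed chromosome.
draw_positions <- function(i, chrom_length = 1000L, n_reads = 30L) {
  set.seed(42 + i)  # same seeding rule as documented for sample i
  # sample.int() samples without replacement by default, so positions are unique.
  sort(sample.int(chrom_length, n_reads))
}

first_run  <- draw_positions(1L)
second_run <- draw_positions(1L)

# Re-running the same sample index reproduces the same positions.
identical(first_run, second_run)  # TRUE
```

Note that set.seed() changes the global RNG state, so code that runs after such a helper also sees a reseeded generator.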

The generated BAMs are NIPTeR-compatible (no unmapped reads, unique positions per chromosome). They can be used with both the NIPTeR binning layer (nipter_bin_bam()) and the WisecondorX native pipeline (rwisecondorx_newref(), rwisecondorx_predict()) when binned with binsize = COMPRESSED_BINSIZE (100).

The manifest is also written to manifest.tsv in out_dir.
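A saved manifest can be read back in a later session without regenerating the cohort. The sketch below uses a hypothetical helper, read_manifest(), and assumes the file is tab-separated with a header row containing the documented column names.

```r
# Hypothetical helper, not part of the package: reads manifest.tsv from a
# directory previously populated by generate_cohort().
read_manifest <- function(out_dir) {
  path <- file.path(out_dir, "manifest.tsv")
  if (!file.exists(path)) {
    stop("No manifest.tsv found in ", out_dir)
  }
  # Assumption: tab-separated with a header row.
  manifest <- read.delim(path, stringsAsFactors = FALSE)

  # Check that the documented columns are present.
  expected <- c("sample_id", "sex", "trisomy", "n_reads", "bam_file")
  missing_cols <- setdiff(expected, names(manifest))
  if (length(missing_cols) > 0) {
    stop("Manifest is missing columns: ", paste(missing_cols, collapse = ", "))
  }
  manifest
}
```

Usage would look like manifest <- read_manifest(out_dir), where out_dir is the directory passed to generate_cohort().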

Requires samtools on PATH.
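Whether samtools is discoverable can be checked from R before calling the function. This is a minimal sketch using base R's Sys.which(); the variable names are illustrative.

```r
# Sys.which() returns the full path of an executable found on PATH,
# or an empty string if it is not found.
samtools_path <- Sys.which("samtools")
have_samtools <- unname(nzchar(samtools_path))

if (have_samtools) {
  message("samtools found at: ", samtools_path)
} else {
  message("samtools not found on PATH; install it or adjust PATH first.")
}
```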

Examples

if (FALSE) { # \dontrun{
manifest <- generate_cohort(tempdir())
head(manifest)
} # }