Changelog
RWisecondorX (development version)
SeparatedStrands BED expanded to 9 columns; BED readers switched to read_tabix()
- nipter_bin_bam_bed(separate_strands = TRUE) now writes 9-column BED files: chrom, start, end, count, count_fwd, count_rev, corrected_count, corrected_fwd, corrected_rev. The previous 7-column format lacked per-strand corrected values, so GC-corrected SeparatedStrands samples lost their per-strand correction information on the BED round-trip.
- bed_to_sample() and bed_to_nipter_sample() now use read_tabix() instead of read_bed() for reading BED.gz files. read_tabix() returns all columns as VARCHAR with no BED-schema type coercion, avoiding the problem where read_bed() maps columns 7-8 to thick_start/thick_end (INTEGER) and rejects double-precision corrected values.
- bed_to_nipter_sample() format detection now uses tryCatch() around the column5 probe query, so it correctly falls back to CombinedStrands when the file has only 5 columns (where column5 does not exist in read_tabix).
- New tests for corrected per-strand round-trip: CombinedStrands corrected values and SeparatedStrands per-strand corrected values (with independent forward/reverse multipliers) survive the write → read cycle within floating-point tolerance. Independence of forward and reverse corrections is explicitly verified.
- Fixed R CMD check NOTEs: utils::globalVariables() for mclust symbols in R/aaa.R; utils::write.table() qualified in R/convert.R and R/nipter_bin.R.
- Total test count: 391 assertions, all passing.
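The tryCatch()-based format detection can be illustrated with a small base-R sketch. It is not the package's code: the helper name detect_strand_format() and the toy tables are hypothetical, a data-frame column lookup stands in for the real probe query, and the zero-based column0..columnN naming is an assumption inferred from the entry above.

```r
# Hypothetical helper: probe for a column named "column5" and fall back to
# CombinedStrands when the probe raises an error.
detect_strand_format <- function(tbl) {
  tryCatch(
    {
      tbl[, "column5"]  # errors ("undefined columns selected") if absent
      "SeparatedStrands"
    },
    error = function(e) "CombinedStrands"
  )
}

# Toy all-character tables with assumed zero-based column names.
five_col <- as.data.frame(matrix("0", nrow = 2, ncol = 5), stringsAsFactors = FALSE)
names(five_col) <- paste0("column", 0:4)
nine_col <- as.data.frame(matrix("0", nrow = 2, ncol = 9), stringsAsFactors = FALSE)
names(nine_col) <- paste0("column", 0:8)

fmt_five <- detect_strand_format(five_col)
fmt_nine <- detect_strand_format(nine_col)
```

The point of the pattern is that a failed probe is treated as information about the file layout rather than as a fatal error.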
BED.gz reader functions — close the round-trip
- New bed_to_sample() reads a 4-column BED.gz file (written by bam_convert_bed()) back into the named-list-of-integer-vectors format expected by rwisecondorx_newref() and rwisecondorx_predict(). This closes the WisecondorX round-trip: bin once with bam_convert_bed(), store the BED.gz, and reload for analysis without re-reading the BAM.
- New bed_to_nipter_sample() reads a 5-column (CombinedStrands) or 9-column (SeparatedStrands) BED.gz file (written by nipter_bin_bam_bed()) into a NIPTeRSample object compatible with all NIPTeR statistical functions. Column count is auto-detected. Handles literal "NA" strings in the corrected_count field via TRY_CAST. Sample name is inferred from the filename or set explicitly.
- Both functions accept an optional DuckDB connection for reuse across multiple files, creating one internally (with allow_unsigned_extensions = "true") when none is supplied.
- New inst/tinytest/test_bed_reader.R: 46 assertions covering WisecondorX 4-column round-trip, NIPTeR CombinedStrands 5-column round-trip, SeparatedStrands 9-column round-trip (all four matrices), corrected per-strand round-trip, sample name inference, and integration with scale_sample().
- Total test count: 391 assertions (46 in test_bed_reader.R), all passing.
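The handling of literal "NA" strings can be mimicked in base R. This is a sketch of the idea only: try_cast_double() is a hypothetical helper that imitates the "unparseable values become NA" behaviour of a TRY_CAST-style conversion, and it is not the SQL the package runs.

```r
# All fields arrive as character strings, as with an all-VARCHAR reader.
corrected_raw <- c("12.5", "NA", "7.25", "0")

# Hypothetical helper: values that cannot be parsed as numbers become NA
# instead of raising an error or a warning.
try_cast_double <- function(x) suppressWarnings(as.numeric(x))

corrected <- try_cast_double(corrected_raw)
```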
Native WisecondorX implementation
- New rwisecondorx_newref(): pure-R/Rcpp implementation of the WisecondorX newref pipeline. Takes a list of binned samples (from bam_convert()) and builds a PCA-based reference: gender model training (2-component GMM on Y-fractions), global bin masking, per-partition normalize/PCA/filter/KNN reference building, and null ratio computation. Supports NIPT mode, custom Y-fraction cutoff, and multi-threaded KNN via OpenMP.
- New rwisecondorx_predict(): pure-R implementation of the WisecondorX predict pipeline. Coverage normalization, PCA projection, iterative within-sample normalization with aberration masking, gonosomal normalization, result inflation, log-transformation, optional blacklist masking, CBS segmentation (via DNAcopy or ParDNAcopy), segment Z-scoring, and aberration calling. Supports both Z-score and beta/ratio calling modes.
- New rwisecondorx_write_bins_bed(), rwisecondorx_write_segments_bed(), rwisecondorx_write_aberrations_bed(), and rwisecondorx_write_statistics() for writing prediction results to BED and statistics files.
- New scale_sample() for rescaling binned samples between different bin sizes (e.g. 5 kb → 100 kb).
- R/rwisecondorx_utils.R: shared utilities including .train_gender_model() (with mclust NULL fallback for zero-variance Y-fractions in all-female cohorts), .gender_correct(), .get_mask(), .normalize_and_mask(), .train_pca(), .project_pc(), .predict_gender().
- R/rwisecondorx_cbs.R: .exec_cbs() wrapper around DNAcopy’s segment() with the upstream WisecondorX conventions (0 → NA, 0 weights → 1e-99, split segments at large NA gaps, recalculate weighted means). Supports ParDNAcopy for parallel segmentation.
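As a rough illustration of what rescaling between bin sizes involves, the sketch below merges consecutive fine bins by summing their counts. The helper rescale_bins() is hypothetical, and summation is an assumption about how scale_sample() combines bins; the entry above does not spell out the aggregation rule.

```r
# Hypothetical helper: merge every (to_size / from_size) consecutive bins
# by summing their read counts. A trailing partial group is kept as-is.
rescale_bins <- function(counts, from_size, to_size) {
  stopifnot(to_size %% from_size == 0)
  per_group <- to_size %/% from_size
  group <- (seq_along(counts) - 1L) %/% per_group
  as.integer(tapply(counts, group, sum))
}

fine <- rep(1L, 45)                            # 45 bins of 5 kb, one read each
coarse <- rescale_bins(fine, 5000L, 100000L)   # 20 fine bins per 100 kb bin
```

With 45 fine bins, the result has two full 100 kb bins and one partial bin holding the remaining five counts.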
Rcpp acceleration for KNN reference building
- New src/knn_reference.cpp implementing knn_reference_cpp() and null_ratios_cpp() in C++ with OpenMP parallelization. Replaces the O(n_bins² × n_samples) pure-R double for-loop with compiled code. The KNN reference-finding step that previously took minutes now completes in seconds.
- Rcpp added to Imports and LinkingTo in DESCRIPTION. OpenMP flags in src/Makevars and src/Makevars.win.
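To make the O(n_bins² × n_samples) cost concrete, here is a deliberately naive pure-R nearest-neighbour search of the kind a double for-loop implies. It is a simplified sketch under stated assumptions: knn_naive() is a hypothetical name, squared Euclidean distance is assumed, and details of the real pipeline (such as which bins are eligible as references) are omitted.

```r
set.seed(1)
n_bins <- 6
n_samples <- 4
k <- 2
x <- matrix(rnorm(n_bins * n_samples), nrow = n_bins)  # bins x samples

# Hypothetical naive search: for each bin, rank all other bins by squared
# Euclidean distance across samples and keep the k closest.
knn_naive <- function(x, k) {
  n <- nrow(x)
  idx <- matrix(NA_integer_, n, k)
  for (i in seq_len(n)) {
    d <- numeric(n)
    for (j in seq_len(n)) d[j] <- sum((x[i, ] - x[j, ])^2)
    d[i] <- Inf                       # a bin is never its own reference
    idx[i, ] <- order(d)[seq_len(k)]
  }
  idx
}

idx <- knn_naive(x, k)
```

The two nested loops over bins, each doing work proportional to the number of samples, are what compiled and parallelized code avoids paying for in interpreted R.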
Fix: KNN index semantics in predict normalization
- Fixed a correctness bug in .normalize_once() where global KNN indexes (stored by knn_reference_cpp()) were used directly to index into chr_data, the leave-one-chromosome-out subset. Global index g is now correctly mapped to local space: g if before the excluded chromosome, g - n_chr if after. This bug caused reference bins to be looked up at wrong positions during within-sample normalization.
- Note: upstream WisecondorX has the inverse issue: it stores LOCAL indexes during newref but uses them as GLOBAL indexes in the null_ratio computation. Our implementation stores GLOBAL indexes and now correctly handles both predict (local conversion) and null_ratio (global direct use).
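The global-to-local mapping can be checked on a toy example. In this sketch global_to_local() and excl_start are hypothetical names, n_chr is assumed to be the number of bins on the excluded chromosome, and indexes are 1-based as in R.

```r
# Hypothetical helper: map global bin indexes to positions in the
# leave-one-chromosome-out subset. Indexes inside the excluded chromosome
# are assumed never to occur as references.
global_to_local <- function(g, excl_start, n_chr) {
  excl_end <- excl_start + n_chr - 1L
  stopifnot(!any(g >= excl_start & g <= excl_end))
  ifelse(g < excl_start, g, g - n_chr)
}

all_bins <- paste0("bin", 1:10)
excl_start <- 4L                     # excluded chromosome covers bins 4..6
n_chr <- 3L
chr_data <- all_bins[-(excl_start:(excl_start + n_chr - 1L))]

g <- c(2L, 8L, 10L)                  # global reference indexes
local <- global_to_local(g, excl_start, n_chr)
```

Indexing chr_data with the unmapped g would return the wrong bin for every reference located after the excluded chromosome, which is the failure mode described above.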
Fix: null ratio column count mismatch
- Fixed rwisecondorx_predict() crash when the autosomal sub-reference (“A”) and gonosomal sub-reference (“F”) have different numbers of null-ratio columns (due to different sample counts per partition). Both are now truncated to min(ncol(aut), ncol(gon)) before rbind().
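A minimal sketch of the truncation, using toy matrices in place of the real null-ratio tables (the matrix sizes here are arbitrary illustrations):

```r
set.seed(42)
aut <- matrix(rnorm(5 * 8), nrow = 5)   # toy autosomal null ratios, 8 columns
gon <- matrix(rnorm(2 * 6), nrow = 2)   # toy gonosomal null ratios, 6 columns

# rbind(aut, gon) would fail because the column counts differ, so keep only
# the columns both matrices have.
n_common <- min(ncol(aut), ncol(gon))
null_ratios <- rbind(
  aut[, seq_len(n_common), drop = FALSE],
  gon[, seq_len(n_common), drop = FALSE]
)
```

Using drop = FALSE keeps both operands as matrices even if one of them has a single row.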
Fix: mclust GMM on zero-variance Y-fractions
- .train_gender_model() now handles the case where all female samples have exactly 0.0 Y-fraction (no Y reads). mclust::Mclust(..., G=2) returns NULL in this scenario; the fix falls back to a gap-based cutoff: min(nonzero_y_fractions) / 2.
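The fallback rule itself is simple enough to sketch in isolation. The helper name gap_cutoff() and the example fractions are hypothetical; only the formula min(nonzero_y_fractions) / 2 comes from the entry above.

```r
# Hypothetical helper implementing the gap-based fallback cutoff.
gap_cutoff <- function(y_fractions) {
  nonzero <- y_fractions[y_fractions > 0]
  if (length(nonzero) == 0L) {
    stop("no nonzero Y-fractions; cannot derive a cutoff")
  }
  min(nonzero) / 2
}

# Toy cohort: three samples with no Y reads, three with some Y signal.
y <- c(0, 0, 0, 0.004, 0.006, 0.005)
cutoff <- gap_cutoff(y)
is_male <- y > cutoff
```

Placing the cutoff halfway between zero and the smallest nonzero fraction separates the two groups whenever the zero-variance cluster sits exactly at zero.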
Synthetic cohort generator
- New generate_cohort() creates synthetic BAM files for testing using “compressed” chromosome lengths (100 bp per bin instead of 100 kb). Produces a bin count structure identical to GRCh37 at 100 kb resolution with ~435 KB BAMs. Supports injecting a trisomy signal (extra reads on the target chromosome).
- New COMPRESSED_BINSIZE constant (100L) exported for use with bam_convert(binsize = COMPRESSED_BINSIZE).
- inst/scripts/make_cohort.R: CLI wrapper for cohort generation.
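The arithmetic behind the compressed lengths can be sketched as follows. It assumes each compressed chromosome is exactly n_bins × 100 bp long, which is an illustration of the idea rather than a statement of how generate_cohort() computes its lengths; the local COMPRESSED_BINSIZE variable simply mirrors the exported constant's value.

```r
COMPRESSED_BINSIZE <- 100L   # local stand-in for the exported constant
FULL_BINSIZE <- 100000L      # 100 kb resolution

# GRCh37 lengths for two chromosomes.
grch37_len <- c(chr1 = 249250621, chr21 = 48129895)

# Number of 100 kb bins per chromosome, then the assumed compressed length
# that yields the same bin count at 100 bp per bin.
n_bins <- ceiling(grch37_len / FULL_BINSIZE)
compressed_len <- n_bins * COMPRESSED_BINSIZE
n_bins_compressed <- ceiling(compressed_len / COMPRESSED_BINSIZE)
```

Under this assumption chr1 shrinks from roughly 249 Mb to about 249 kb while keeping the same number of bins.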
Test infrastructure cleanup
- All test files now use library(RWisecondorX) instead of source() hacks to load R code directly. This is required because functions call compiled C++ code (knn_reference_cpp, null_ratios_cpp), which is only available in an installed package.
- Fixed tinytest assertion: expect_message(..., pattern = NA) does NOT mean “expect no messages”; it fails if no message is emitted. Replaced with expect_silent().
- New inst/tinytest/test_rwisecondorx.R: 76 assertions covering reference building (gender model, masking, PCA, KNN indexes, null ratios), prediction (normalization, CBS, aberration calling), and trisomy detection sensitivity.
- New inst/tinytest/test_cohort_pipeline.R: 31 assertions for the end-to-end pipeline: cohort generation → binning → reference building → prediction → trisomy detection. Tests that trisomy 21, 18, and 13 are detected as gains and that euploid samples produce no aberrations.
- Total test count: 345 assertions, all passing.
Y-unique region ratio for sex prediction
- New nipter_y_unique_ratio() counts reads overlapping 7 Y-chromosome unique gene regions (HSFY1, BPY2, BPY2B, BPY2C, XKRY, PRY, PRY2) and computes the ratio to total nuclear genome reads. Uses DuckDB/duckhts index-based region queries (read_bam(region := ...)) for efficient BAM access. The bundled GRCh37 regions file can be replaced with a custom file for other assemblies.
- New nipter_sex_model_y_unique() fits a 2-component GMM on Y-unique ratios (one per BAM in a cohort), producing a NIPTeRSexModel compatible with nipter_predict_sex().
- nipter_predict_sex() gains a y_unique_ratio parameter for passing a pre-computed Y-unique ratio when a "y_unique" model is included in the consensus vote. This enables the full 3-model majority-vote sex prediction pipeline (Y-unique ratio + Y fraction + XY fractions).
- Bundled inst/extdata/grch37_Y_UniqueRegions.txt: TSV of 7 GRCh37 Y-unique regions used for the Y-unique ratio calculation.
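The ratio itself is a simple count-over-total calculation, sketched below on toy data. Everything here is hypothetical: the region coordinates are invented placeholders rather than the bundled GRCh37 regions, reads are reduced to single positions, and the real function queries a BAM index instead of scanning vectors.

```r
# Invented placeholder regions (NOT the real Y-unique coordinates).
regions <- data.frame(start = c(100L, 500L), end = c(200L, 600L))

# Toy chrY read positions and a toy total of nuclear-genome reads.
y_read_pos <- c(50L, 120L, 150L, 550L, 900L)
total_nuclear_reads <- 1000L

# A read counts if its position falls inside any region (inclusive bounds).
in_region <- vapply(
  y_read_pos,
  function(p) any(p >= regions$start & p <= regions$end),
  logical(1)
)
y_unique_ratio <- sum(in_region) / total_nuclear_reads
```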
Sex prediction via Gaussian mixture models
- New nipter_sex_model() fits a 2-component GMM on sex chromosome fractions from a NIPTeRControlGroup using mclust::Mclust(). Supports "y_fraction" (univariate on Y-chromosome fraction) and "xy_fraction" (bivariate on X + Y fractions) methods. The male cluster is identified as the component with higher median Y fraction.
- New nipter_predict_sex() classifies a NIPTeRSample as male or female given one or more NIPTeRSexModel objects. Multiple models use majority vote consensus (tie defaults to “female”, which is conservative for NIPT).
- mclust added to Suggests in DESCRIPTION.
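The consensus rule can be written down in a few lines. This is a sketch of the voting logic only; consensus_sex() is a hypothetical helper operating on a character vector of per-model calls, not the package's function.

```r
# Hypothetical helper: majority vote over per-model calls, with ties
# resolved to "female".
consensus_sex <- function(votes) {
  n_male <- sum(votes == "male")
  n_female <- sum(votes == "female")
  if (n_male > n_female) "male" else "female"
}

three_models <- consensus_sex(c("male", "male", "female"))
two_models_tie <- consensus_sex(c("male", "female"))
```

With an odd number of models a tie cannot occur, so the tie rule only matters when two (or another even number of) models are supplied.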
nipter_bin_bam_bed() SeparatedStrands output
- nipter_bin_bam_bed() gains a separate_strands parameter. When TRUE, outputs a 9-column BED (chrom, start, end, count, count_fwd, count_rev, corrected_count, corrected_fwd, corrected_rev) where count = count_fwd + count_rev. When FALSE (default), the 5-column BED format is unchanged.
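The 9-column layout can be pictured with a toy table. The bin coordinates, counts, and correction multipliers below are invented for illustration, and treating corrected_count as the sum of the two corrected strand values is an assumption; only the column names, their order, and count = count_fwd + count_rev come from the entry above.

```r
count_fwd <- c(10L, 4L)
count_rev <- c(8L, 6L)
fwd_mult <- 1.10   # invented forward-strand correction multiplier
rev_mult <- 0.95   # invented reverse-strand correction multiplier

corrected_fwd <- count_fwd * fwd_mult
corrected_rev <- count_rev * rev_mult

bed <- data.frame(
  chrom = c("1", "1"),
  start = c(0L, 50000L),
  end = c(50000L, 100000L),
  count = count_fwd + count_rev,
  count_fwd = count_fwd,
  count_rev = count_rev,
  corrected_count = corrected_fwd + corrected_rev,  # assumed relationship
  corrected_fwd = corrected_fwd,
  corrected_rev = corrected_rev,
  stringsAsFactors = FALSE
)
```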
SeparatedStrands support
- bam_convert() gains a separate_strands parameter. When TRUE, returns a list with fwd (forward strand) and rev (reverse strand) data frames.
- nipter_bin_bam() gains separate_strands = TRUE support, producing NIPTeRSample objects with class c("NIPTeRSample", "SeparatedStrands"). Autosomal chromosome reads are stored as a list of two matrices (forward and reverse) with rownames "1F".."22F" and "1R".."22R".
- All NIPTeR statistical functions now support SeparatedStrands samples:
  - nipter_gc_correct(): LOESS/bin-weight fitted on summed strand counts, corrections applied independently to each strand matrix.
  - nipter_chi_correct(): chi-squared computed on summed strands, correction applied per-strand.
  - nipter_z_score() and nipter_ncv_score(): use collapsed (F+R) fractions.
  - nipter_regression(): doubles the predictor pool (44 candidates: "1F".."22F", "1R".."22R") with complementary exclusion within each model (selecting "5F" excludes both "5F" and "5R").
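Complementary exclusion in the doubled predictor pool can be sketched with plain character vectors. The helper exclude_complement() is hypothetical and shows only the bookkeeping, not the regression itself.

```r
# The 44 strand-specific candidates: "1F".."22F" and "1R".."22R".
candidates <- c(paste0(1:22, "F"), paste0(1:22, "R"))

# Hypothetical helper: after selecting one predictor, drop both strands of
# the same chromosome from the remaining pool.
exclude_complement <- function(pool, selected) {
  chrom <- sub("[FR]$", "", selected)
  setdiff(pool, paste0(chrom, c("F", "R")))
}

remaining <- exclude_complement(candidates, "5F")
```

Removing both strands prevents a model from using two predictors that carry nearly the same chromosome-level signal.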
WisecondorX CLI wrapper fixes
- wisecondorx_predict(): removed the hallucinated ref_binsize / --binsize parameter, which does not exist in the upstream predict subcommand.
- wisecondorx_newref(): fixed the ref_binsize default from 50000L to 100000L to match the upstream default.
- wisecondorx_predict(): added an add_plot_title parameter mapping to upstream --add-plot-title.