Build a sex prediction model from a NIPTeR control group
Fits a two-component Gaussian mixture model (GMM) on sex chromosome
fractions derived from a NIPTeRControlGroup. The model
distinguishes male and female samples based on Y-chromosome read fraction
(univariate) or X+Y chromosome fractions (bivariate).
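To make the two feature spaces concrete, here is a minimal base-R sketch of how the fractions could be computed. It assumes the per-chromosome read counts have already been summed from the bins into a matrix with chromosomes as rows ("1" to "22", "X", "Y") and samples as columns. The helper name `sex_fractions()` and that input layout are hypothetical and not part of the package API.

```r
# Hypothetical helper (not the package API): compute sex-chromosome fractions
# from a matrix of per-chromosome read counts.
# Assumed layout: rows = chromosomes "1".."22", "X", "Y"; columns = samples.
sex_fractions <- function(chrom_counts, method = c("y_fraction", "xy_fraction")) {
  method <- match.arg(method)
  autosomes <- as.character(1:22)
  stopifnot(all(c(autosomes, "X", "Y") %in% rownames(chrom_counts)))

  # Denominator shared by both fractions: total autosomal reads per sample
  autosomal_total <- colSums(chrom_counts[autosomes, , drop = FALSE])
  y_fraction <- chrom_counts["Y", ] / autosomal_total

  if (method == "y_fraction") {
    # Univariate feature space: one column, samples as rows
    out <- matrix(y_fraction, ncol = 1,
                  dimnames = list(colnames(chrom_counts), "y_fraction"))
  } else {
    # Bivariate feature space: X and Y fractions, samples as rows
    x_fraction <- chrom_counts["X", ] / autosomal_total
    out <- cbind(x_fraction = x_fraction, y_fraction = y_fraction)
    rownames(out) <- colnames(chrom_counts)
  }
  out
}
```

The returned matrix has the same shape as the fractions element described under Value (samples as rows; columns "y_fraction" or c("x_fraction", "y_fraction")).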
Usage
nipter_sex_model(control_group, method = c("y_fraction", "xy_fraction"))
Arguments
- control_group
A NIPTeRControlGroup object. Ideally chi-squared corrected via nipter_chi_correct.
- method
Character; the feature space for the GMM:
"y_fraction": univariate; Y-chromosome read fraction relative to total autosomal reads.
"xy_fraction": bivariate; X and Y chromosome fractions, each relative to total autosomal reads.
Value
An object of class "NIPTeRSexModel" with elements:
- model
The mclust::Mclust fitted object.
- method
Character; the method used.
- male_cluster
Integer (1 or 2); which cluster is male (the component with the higher median Y fraction).
- classifications
Named character vector of "male"/"female" labels for each control sample.
- fractions
Matrix of the input features used for fitting (samples as rows). Columns are "y_fraction" or c("x_fraction", "y_fraction").
Details
This mirrors the approach used in clinical NIPT pipelines, ported to R
using mclust::Mclust() in place of scikit-learn's
sklearn.mixture.GaussianMixture.
The algorithm:
1. Compute sex chromosome read fractions for every sample in the control group: Y fraction = sum(Y bins) / sum(autosomal bins); X fraction = sum(X bins) / sum(autosomal bins).
2. Fit a two-component Gaussian mixture with equal mixing proportions (mclust::Mclust(data, G = 2, control = mclust::emControl(equalPro = TRUE))).
3. Identify the male cluster as the component with the higher median Y fraction.
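Steps 2 and 3 can be sketched as follows. The mclust::Mclust() call mirrors the one given in step 2; fit_sex_gmm() and label_sex_clusters() are hypothetical names used only for illustration, not the package's internal code. The labelling helper is plain base R, so it can be checked without fitting a model.

```r
# Hypothetical sketch of steps 2 and 3; not the package's internal code.

# Step 2: two-component GMM with equal mixing proportions (requires mclust).
fit_sex_gmm <- function(fractions) {
  if (!requireNamespace("mclust", quietly = TRUE)) {
    stop("Package 'mclust' is required for this sketch.")
  }
  mclust::Mclust(fractions, G = 2,
                 control = mclust::emControl(equalPro = TRUE))
}

# Step 3: the male cluster is the component whose members have the higher
# median Y fraction; every sample in it is labelled "male".
label_sex_clusters <- function(cluster, y_fraction,
                               sample_names = names(y_fraction)) {
  stopifnot(length(cluster) == length(y_fraction))
  medians <- tapply(y_fraction, cluster, median)
  male_cluster <- as.integer(names(medians)[which.max(medians)])
  labels <- ifelse(cluster == male_cluster, "male", "female")
  names(labels) <- sample_names
  list(male_cluster = male_cluster, classifications = labels)
}
```

With a fractions matrix as described under Value, the two helpers would be chained as label_sex_clusters(fit$classification, fractions[, "y_fraction"]) where fit <- fit_sex_gmm(fractions); classification is the element of an Mclust fit that holds the hard cluster assignments. Choosing the male cluster by median rather than mean keeps the choice robust to a few outlying samples.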
This follows the clinical NIPT pipeline pattern of building sex models
from control cohorts. The pipeline this function is modelled on builds three
models (Y-unique ratio from samtools, XY fractions, Y fraction) and takes a
majority vote; nipter_predict_sex() implements this consensus when given
multiple models.
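A per-sample majority vote over several models can be sketched in base R as below. majority_sex_vote() is a hypothetical helper, not the implementation of nipter_predict_sex(); it assumes each model yields a named "male"/"female" character vector over the same samples. With an odd number of models (such as three) no tie is possible; with an even number, this sketch returns NA for tied samples.

```r
# Hypothetical helper (not nipter_predict_sex()): per-sample majority vote
# over several named "male"/"female" label vectors.
majority_sex_vote <- function(...) {
  votes <- list(...)
  stopifnot(length(votes) >= 1)
  samples <- names(votes[[1]])
  stopifnot(!is.null(samples))

  # One column per model, rows aligned to the sample order of the first vector
  vote_matrix <- vapply(votes, function(v) v[samples], character(length(samples)))
  vote_matrix <- matrix(vote_matrix, nrow = length(samples))

  # Count votes per sample; missing labels (NA) are ignored
  n_male <- rowSums(vote_matrix == "male", na.rm = TRUE)
  n_female <- rowSums(vote_matrix == "female", na.rm = TRUE)

  consensus <- ifelse(n_male > n_female, "male",
                      ifelse(n_female > n_male, "female", NA_character_))
  names(consensus) <- samples
  consensus
}
```

For example, majority_sex_vote(y_model_labels, xy_model_labels, samtools_labels) would combine three label vectors; those argument names are placeholders.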