Generates the k-mers that overlap variant alleles in population VCF files and compiles them into a single blacklist list file.

Usage

niptmer_ref_blacklist_from_vcf(
  vcf_files,
  reference_fasta_dir,
  k = 25,
  min_af = 0.01,
  engine = c("legacy_compatible", "simple"),
  out_prefix = "pop_snp_blacklist",
  out_dir = "."
)

Arguments

vcf_files

Character vector of VCF paths.

reference_fasta_dir

Directory containing one FASTA file per chromosome, each named after the basename of the corresponding VCF file (e.g. 1.vcf.gz -> 1.fa).

k

K-mer length in bases. Defaults to 25.

min_af

Minimum allele frequency a variant needs in order to be used. Defaults to 0.01.

engine

Variant-to-k-mer expansion mode. "simple" expands each variant independently. "legacy_compatible" additionally combines overlapping variants, approximating the behavior of the legacy hapl_generator.pl script.

out_prefix

File name prefix for the generated list file.

out_dir

Directory the output is written to. Defaults to the current working directory (".").
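
The arguments above can be illustrated with a minimal base R sketch. It is not the package's implementation: `vcf_to_fasta_path()`, `snp_kmers()`, and the toy `variants` table are hypothetical names introduced here. The sketch assumes single-base substitutions only, emits only alternate-allele k-mers, and treats the min_af threshold as inclusive. None of these details is confirmed by this page.

```r
# Hypothetical helper (not part of the package): FASTA path implied by the
# naming rule "1.vcf.gz -> 1.fa".
vcf_to_fasta_path <- function(vcf_files, reference_fasta_dir) {
  stem <- sub("\\.vcf(\\.gz)?$", "", basename(vcf_files))
  file.path(reference_fasta_dir, paste0(stem, ".fa"))
}

# Hypothetical helper: every k-mer that overlaps one substituted base.
# `ref_seq` is a single string, `pos` is 1-based, `alt` is one base.
snp_kmers <- function(ref_seq, pos, alt, k) {
  n <- nchar(ref_seq)
  if (n < k || pos < 1 || pos > n) return(character(0))
  alt_seq <- ref_seq
  substr(alt_seq, pos, pos) <- alt           # apply the substitution
  # Window starts are clipped so each k-mer stays inside the sequence.
  starts <- seq(max(1, pos - k + 1), min(pos, n - k + 1))
  substring(alt_seq, starts, starts + k - 1)
}

# Toy data: a 10-base "chromosome" and two variants with allele frequencies.
ref_seq  <- "ACGTACGTAC"
variants <- data.frame(pos = c(5, 8), alt = c("G", "A"), af = c(0.20, 0.001))

# Assumption: variants with af >= min_af are kept (inclusive threshold).
min_af <- 0.01
kept   <- variants[variants$af >= min_af, ]

# "simple"-style expansion: each kept variant is expanded on its own.
kmers <- unique(unlist(Map(
  function(pos, alt) snp_kmers(ref_seq, pos, alt, k = 3),
  kept$pos, kept$alt
)))
print(kmers)
print(vcf_to_fasta_path("vcf/1.vcf.gz", "reference"))
```

With k = 3, the variant at position 5 yields three k-mers, one per window that covers the substituted base. A "legacy_compatible"-style expansion would also have to consider combinations of variants that fall within the same window, which this sketch does not attempt.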

Value

A list containing the output file path and basic generation statistics.
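
A call might look like the following sketch. The input paths are hypothetical placeholders, and the guard makes the snippet a no-op when the function or the inputs are unavailable. The names of the elements in the returned list are not documented on this page, so the example only inspects the result with `str()`.

```r
# Hypothetical input locations; replace with real paths.
vcf_files <- c("vcf/1.vcf.gz", "vcf/2.vcf.gz")
reference_fasta_dir <- "reference"  # expected to hold 1.fa and 2.fa

# Run only if the function is loaded and the inputs exist.
can_run <- exists("niptmer_ref_blacklist_from_vcf", mode = "function") &&
  all(file.exists(vcf_files)) &&
  dir.exists(reference_fasta_dir)

if (can_run) {
  res <- niptmer_ref_blacklist_from_vcf(
    vcf_files = vcf_files,
    reference_fasta_dir = reference_fasta_dir,
    k = 25,
    min_af = 0.01,
    engine = "simple",
    out_prefix = "pop_snp_blacklist",
    out_dir = tempdir()
  )
  str(res)  # show the output path and generation statistics
}
```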