Skip to contents

Rminibwa is an R interface to minibwa, Heng Li’s genomic read aligner. It vendors a pinned upstream source tree, builds a package-provided minibwa executable, and exposes a native C-backed API for in-process alignment work.

The CLI wrappers are kept close to upstream behavior. The native interface uses raw query bytes, external-pointer alignment batches, ALTREP column views, and a small C header for downstream packages. SIMD-sensitive KSW code is compiled as separate staged backends and selected at runtime.

Installation

install.packages(
  "Rminibwa",
  repos = c(
    "https://sounkou-bioinfo.r-universe.dev",
    "https://cloud.r-project.org"
  )
)

From a local checkout:

R CMD INSTALL .

Rminibwa builds minibwa from the vendored source during package installation. No external minibwa executable is needed for normal use.

CLI wrapper

library(Rminibwa)

minibwa_version()
prefix <- minibwa_index("ref.fa", threads = 8)
aln <- minibwa_map(prefix, "reads.fq.gz", format = "paf", threads = 8)

minibwa_map() captures output by default. Pass output = "aln.sam" or output = "aln.paf" to write directly to a file.

Native batches

The native path avoids data frames in the hot alignment shape. mb_map() returns an external-pointer batch; columns are exposed lazily to R and can also be read from C.

library(Rminibwa)

td <- tempfile("rminibwa-readme-")
dir.create(td)
ref <- paste(rep("ACGT", 1000), collapse = "")
fa <- file.path(td, "ref.fa")
writeLines(c(">chr1", ref), fa, useBytes = TRUE)
prefix <- file.path(td, "idx")

mb_index_build(fa, prefix, threads = 1L)
idx <- mb_index_load(prefix)
aln <- charToRaw(substr(ref, 1L, 100L)) |>
  mb_map(idx, opt = mb_opts("sr", out_n = 0L), name = charToRaw("read1"))

c(
  n = mb_align_n(aln),
  first_tid = mb_align_col(aln, "tid")[[1]],
  first_qe = mb_align_col(aln, "qe")[[1]],
  cigar_bytes = length(mb_align_cigar_words(aln))
)
#>           n   first_tid    first_qe cigar_bytes 
#>          51           0         100         204

C consumers

Rminibwa.h is installed under inst/include and resolves the runtime API with R_GetCCallable(). Downstream packages can use LinkingTo: Rminibwa for the header and import Rminibwa at runtime before calling the function pointers.

#include <Rminibwa.h>

SEXP summarize_alignment(SEXP x)
{
    const RmbAlignBatch *batch = Rminibwa_align_from_sexp(x);
    size_t n = Rminibwa_align_n(batch);
    const int32_t *tid = Rminibwa_align_i32_col(batch, "tid");
    return Rf_ScalarInteger(n && tid ? tid[0] : NA_INTEGER);
}

A complete in-process consumer compiled with Rtinycc is in vignettes/downstream-c-api.Rmd and inst/capi/rminibwa_tinycc_consumer.c.

SIMD dispatch

The package follows the RsimdDispatch style: compile portable and ISA-specific objects separately, then select an available backend at runtime. On x86_64 this can include SSE4 and AVX2. On other architectures the portable backend is used.

info <- simd_info()
data.frame(
  selected = info$selected_backend,
  compiled = paste(info$compiled_backends, collapse = ", "),
  available = paste(info$available_backends, collapse = ", ")
)
#>   selected           compiled          available
#> 1     avx2 scalar, sse4, avx2 scalar, sse4, avx2

AVX-512 is not built by default. It is useful for experiments, but it would add a larger dispatch and CI surface. Use make asm to inspect the actual instruction families in local builds.

Benchmarks

make rdm MINIBWA_BINDINGS_ROOT=/path/to/minibwa-bindings renders the optional benchmark tables below. The workload uses a chrM-sized random reference and an indel-mutated read so KSW is exercised.

KSW calls on the benchmark read:

backend extz2 extd2 ll_qinit ll_u8_core ll_i16_core ll_i16
scalar 0 438 0 0 0 0
sse4 0 438 0 0 0 0
avx2 0 438 0 0 0 0

Count-only mapper timing by staged backend:

backend min median itr/sec mem_alloc
scalar 1042.4 us 1074.5 us 928.1 0B
sse4 1000.4 us 1013.2 us 981.0 0B
avx2 1051.7 us 1064.0 us 932.3 0B

External comparison using Rminibwa backend avx2:

expression min median itr/sec mem_alloc
rminibwa_count 1035.4 us 1067.7 us 934.3 0B
rminibwa_batch 1005.0 us 1037.6 us 958.1 0B
python_pyo3 1170.8 us 1202.2 us 822.2 4.68KB
rust_cdylib 1026.3 us 1036.0 us 960.8 10.37KB

Development

make rd       # roxygen docs
make test0    # tinytest suite against the installed package
make check    # R CMD check --as-cran --no-manual
make pkgdown  # local site build
make asm      # inspect staged backend assembly