R bindings to Sassy through R’s native C API. Results are returned as data frames with 0-based coordinates and CIGAR strings.
Install
Install from r-universe:
install.packages(
"Rsassy",
repos = c("https://sounkou-bioinfo.r-universe.dev", "https://cloud.r-project.org")
)
Source installs require Cargo/rustc >= 1.91 and xz. Rust crates are vendored in src/rust/vendor.tar.xz for offline package builds. On Linux, macOS, and Windows, Rsassy installs multiple backend libraries when possible: scalar, AVX2, and AVX-512 on x86_64; scalar and NEON on arm64. The webR/WebAssembly build uses wasm SIMD128; see the browser demo. Rsassy selects the best installed backend supported by the current CPU/runtime when the backend is first loaded.
Usage
library(Rsassy)
sassy_search(list("ATCGATCG"), list("GGGGATCGATCGTTTT"), k = 1, alphabet = "dna")
#> <sassy_matches> 3 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar
#> 0 0 2 10 0 8 1 - 7=1X
#> 0 0 4 12 0 8 0 + 8=
#> 0 0 6 14 0 8 1 - 1=1X6=
The result is a sassy_matches data frame with pattern_idx, text_idx, text_start, text_end, pattern_start, pattern_end, cost, strand, and cigar. Coordinates are 0-based and half-open.
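Base R indexes strings from 1 with inclusive ends, so 0-based half-open coordinates need a small shift before they can be used with substring(). A minimal sketch in base R only; slice0() is a hypothetical helper for illustration and is not part of Rsassy:

```r
# Hypothetical helper: slice a string with 0-based, half-open [start, end) coordinates.
slice0 <- function(text, start, end) {
  # substring() is 1-based and inclusive, so only the start needs shifting.
  substring(text, start + 1L, end)
}

text <- "GGGGATCGATCGTTTT"
slice0(text, 4L, 12L)         # the cost-0 "+" match from the output above
#> [1] "ATCGATCG"
nchar(slice0(text, 2L, 10L))  # width is always end - start
#> [1] 8
```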
Set match_region = TRUE when you also want the matched sequence. For strand == "-", match_region is reverse-complemented so it is in the same direction as the input pattern and CIGAR.
region_matches <- sassy_search(
list("ATCGATCG"),
list("GGGGATCGATCGTTTT"),
k = 1,
alphabet = "dna",
match_region = TRUE
)
region_matches
#> <sassy_matches> 3 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar match_region
#> 0 0 2 10 0 8 1 - 7=1X ATCGATCC
#> 0 0 4 12 0 8 0 + 8= ATCGATCG
#> 0 0 6 14 0 8 1 - 1=1X6= AACGATCG
The print method can color match_region with simple ANSI escape sequences, following the upstream Sassy CLI sassy grep alignment legend: green for matching characters, orange for mismatches, blue for inserted text characters, and red gaps for pattern characters absent from the text. Coloring is off by default and is meant for ANSI-capable interactive terminals.
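As a cross-check on the two strand "-" rows above, the match_region values can be reproduced by slicing the text and reverse-complementing the slice. This is a base R sketch; revcomp() is a hypothetical helper that handles plain A/C/G/T only and is not Rsassy's implementation:

```r
# Hypothetical helper: reverse-complement a plain DNA string (A, C, G, T only).
revcomp <- function(x) {
  bases <- rev(strsplit(x, "", fixed = TRUE)[[1]])
  chartr("ACGT", "TGCA", paste(bases, collapse = ""))
}

text <- "GGGGATCGATCGTTTT"
# The "-" rows span [2, 10) and [6, 14) in 0-based, half-open coordinates,
# which are characters 3..10 and 7..14 in base R's 1-based indexing.
revcomp(substring(text, 3, 10))
#> [1] "ATCGATCC"
revcomp(substring(text, 7, 14))
#> [1] "AACGATCG"
```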
Reuse a searcher when making repeated calls:
searcher <- sassy_searcher("dna")
sassy_searcher_search(searcher, list("ATCGATCG"), list("GGGGATCGATCGTTTT"), k = 1)
#> <sassy_matches> 3 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar
#> 0 0 2 10 0 8 1 - 7=1X
#> 0 0 4 12 0 8 0 + 8=
#> 0 0 6 14 0 8 1 - 1=1X6=
List inputs search every pattern against every text. Each element can be a raw vector or a character scalar, which also leaves room for ALTREP-backed batches as lists. For larger batches, use threads > 1.
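To picture the "every pattern against every text" rule, the set of searched pairs can be enumerated in base R. This sketch only illustrates the pairing and the 0-based pattern_idx/text_idx columns; the input values are made up, and the row order here is not claimed to match the order in which Rsassy reports matches:

```r
patterns <- list("ATG", charToRaw("TTT"))   # a character scalar and a raw vector
texts <- list("CCCCATGCCCC", "TTTT", "ACGT")

# Every (pattern, text) combination, using 0-based indices like the result columns.
pairs <- expand.grid(
  pattern_idx = seq_along(patterns) - 1L,
  text_idx = seq_along(texts) - 1L
)
nrow(pairs)
#> [1] 6

# A character scalar and its raw-vector form carry the same bytes.
identical(charToRaw("TTT"), patterns[[2]])
#> [1] TRUE
```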
sassy_search(
list("ATG", "TTT"),
list("CCCCATGCCCCTTT"),
k = 1,
alphabet = "iupac",
rc = FALSE,
strategy = "encoded_patterns"
)
#> <sassy_matches> 2 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar
#> 0 0 4 7 0 3 0 + 3=
#> 1 0 11 14 0 3 0 + 3=
strategy = "encoded_patterns" (alias "v2") is the R equivalent of CLI --v2 for many equal-length short patterns. batch_patterns and encoded_patterns use Sassy’s multi-pattern encoding, which in sassy 0.2.1 is implemented for IUPAC and equal byte-length patterns. The default strategy = "pairwise" is the general path for other alphabets and mixed pattern lengths.
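The eligibility rule above (IUPAC alphabet, equal byte-length patterns) can be checked before choosing a strategy. The helper below is a hypothetical sketch in base R, not an Rsassy function; it encodes only the stated condition and makes no claim about which strategy is faster for a given workload:

```r
# Hypothetical helper, not part of Rsassy: suggest a strategy from the stated rule.
suggest_strategy <- function(patterns, alphabet) {
  # Byte length of each pattern, whether it is a character scalar or a raw vector.
  lens <- vapply(patterns, function(p) {
    if (is.raw(p)) length(p) else nchar(p, type = "bytes")
  }, integer(1))
  equal_length <- length(unique(lens)) == 1L
  if (identical(alphabet, "iupac") && equal_length) "encoded_patterns" else "pairwise"
}

suggest_strategy(list("ATG", "TTT"), "iupac")   # equal lengths, IUPAC
#> [1] "encoded_patterns"
suggest_strategy(list("ATG", "TTTT"), "iupac")  # mixed lengths
#> [1] "pairwise"
suggest_strategy(list("ATG", "TTT"), "dna")     # other alphabet
#> [1] "pairwise"
```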
CLI-compatible orientation is available with sam = TRUE. This formats reverse-strand match_region and cigar in text direction, matching upstream Sassy sassy --sam output.
sassy_search(
list("ACGA"),
list("TTTCGTTT"),
k = 0,
alphabet = "dna",
match_region = TRUE,
sam = TRUE
)
#> <sassy_matches> 1 match
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar match_region
#> 0 0 2 6 0 4 0 - 4= TCGT
Chunked FASTA/FASTQ iteration is available with sassy_fastx_iter() and sassy_fastx_next(). Batches expose record IDs as an ALTREP character vector and sequences as a list of raw ALTREP slices over immutable native buffers, so search can consume file records without first materializing sequence strings in R.
fq <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "!!!!"), fq, useBytes = TRUE)
it <- sassy_fastx_iter(fq, batch_records = 1)
batch <- sassy_fastx_next(it)
sassy_search(list("ACG"), batch$seq, k = 0, alphabet = "dna", rc = FALSE, text_id = batch$id)
#> <sassy_matches> 1 match
#> pattern_idx text_idx text_id text_start text_end pattern_start pattern_end cost strand cigar
#> 0 0 r1 0 3 0 3 0 + 3=
CRISPR guide search is available for in-memory sequences with sassy_crispr(). Guides include the PAM suffix; by default the PAM must match exactly under IUPAC matching.
sassy_crispr(list("ACGTNGG"), list("TTTACGTAGGTTT"), k = 0, rc = FALSE)
#> guide cost strand start end match_region cigar
#> 1 ACGTNGG 0 + 3 10 ACGTAGG 7=
For file-oriented colored grep, FASTA/FASTQ filtering, and large command-line pipelines, use the upstream Sassy CLI directly.
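To make the IUPAC rule in the CRISPR example concrete: the N in the guide ACGTNGG stands for any base, which is why ACGTAGG matches at cost 0. The sketch below uses the standard IUPAC nucleotide codes in base R; iupac_equal() is a hypothetical helper that assumes the text contains only A/C/G/T and compares position by position without gaps, so it is an illustration rather than Sassy's matching code:

```r
# Standard IUPAC nucleotide codes and the bases each one stands for.
iupac <- c(
  A = "A", C = "C", G = "G", T = "T",
  R = "AG", Y = "CT", S = "CG", W = "AT", K = "GT", M = "AC",
  B = "CGT", D = "AGT", H = "ACT", V = "ACG", N = "ACGT"
)

# Hypothetical helper: TRUE when every text base is allowed by the guide code
# at the same position (equal lengths, no gaps, text is plain A/C/G/T).
iupac_equal <- function(guide, text) {
  g <- strsplit(guide, "", fixed = TRUE)[[1]]
  b <- strsplit(text, "", fixed = TRUE)[[1]]
  length(g) == length(b) &&
    all(mapply(function(code, base) grepl(base, iupac[[code]], fixed = TRUE), g, b))
}

iupac_equal("ACGTNGG", "ACGTAGG")  # N accepts A, PAM GG matches exactly
#> [1] TRUE
iupac_equal("ACGTNGG", "ACGTAGA")  # last PAM base differs
#> [1] FALSE
```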
Inspect the installed build:
sassy_features()
#> <sassy_features>
#> dispatch: dynamic
#> selected backend: avx2
#> installed backends: scalar, avx2, avx512
#> supported backends: scalar, avx2
#> CPU: avx2=yes avx512f=no neon=no
#> Rust backend: avx2 (native_simd=yes)
Backend loading is one-shot per R process. If you need to benchmark or debug a specific backend, call sassy_set_backend() before the first native Rsassy call in a fresh Rscript process. See vignette("backend-selection", package = "Rsassy") for the details.
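The relationship between the installed, supported, and selected lines in the sassy_features() output can be sketched as a simple "best usable backend" rule. pick_backend() and its preference order are assumptions made for illustration; the actual selection logic lives in Rsassy's native code and may differ:

```r
# Hypothetical helper, not Rsassy's implementation: pick the most preferred
# backend that is both installed and supported by the current CPU.
pick_backend <- function(installed, supported,
                         preference = c("avx512", "avx2", "neon", "scalar")) {
  # intersect() keeps the order of its first argument, so preference order wins.
  usable <- intersect(preference, intersect(installed, supported))
  if (length(usable) == 0L) stop("no usable backend")
  usable[[1]]
}

# Values taken from the sassy_features() output above.
pick_backend(
  installed = c("scalar", "avx2", "avx512"),
  supported = c("scalar", "avx2")
)
#> [1] "avx2"
```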
Development
Common development commands from the repository root:
make vendor-rust # refresh src/rust/vendor.tar.xz after Rust dependency changes
make rd # regenerate NAMESPACE and man/*.Rd from R/search.R
make readme # regenerate README.md from README.Rmd
make install # install the package locally
make test # run tinytest tests
make check # build and run R CMD check
make reports # render committed conformance/performance markdown reports
make clean # remove generated build artifacts
make check uses a CRAN-safe default of two Cargo build jobs. Use make check-fast or make CARGO_JOBS=10 check for local multithreaded Cargo builds.