Convenience wrapper that creates a searcher, searches, and returns a
sassy_matches data frame. Coordinates are 0-based and half-open.
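R indexes strings from 1 and substr() uses inclusive ends. A 0-based half-open interval [start, end) therefore covers 1-based positions start + 1 through end, and its width is end - start. The sketch below shows that conversion in base R. extract_span() is a hypothetical helper and is not part of the package. It assumes the text is a character scalar rather than a raw vector.

```r
# Hypothetical helper (not part of the package): extract the characters
# covered by a 0-based, half-open interval [start, end) from a character scalar.
extract_span <- function(text, start, end) {
  stopifnot(start >= 0, end >= start, end <= nchar(text))
  # R is 1-based and substr() is inclusive, so [start, end) becomes
  # positions (start + 1) through end.
  substr(text, start + 1L, end)
}

text <- "TTACGTAA"
extract_span(text, 2L, 6L)          # "ACGT"
nchar(extract_span(text, 2L, 6L))   # 4, i.e. end - start
extract_span(text, 2L, 2L)          # "": an empty half-open interval
```

Half-open intervals make adjacent hits easy to reason about. For example, [2, 6) and [6, 8) touch without overlapping, and their widths add up.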
Usage
sassy_search(
  pattern,
  text,
  k,
  alphabet = "dna",
  rc = TRUE,
  alpha = NULL,
  all = FALSE,
  threads = 1L,
  strategy = "pairwise",
  pattern_id = NULL,
  text_id = NULL,
  match_region = FALSE,
  sam = FALSE
)
Arguments
- pattern
List of raw vectors or non-missing character scalars.
- text
List of raw vectors or non-missing character scalars.
- k
Maximum edit distance (cost) allowed for a reported match.
- alphabet
Alphabet profile. One of "dna", "iupac", or "ascii".
- rc
If TRUE, also search the reverse-complement strand where supported.
- alpha
Optional IUPAC overhang cost in [0, 1]. Use NULL to disable.
- all
If FALSE, return only the usual local-minimum matches. If TRUE, return every end position with score <= k. This can include overlapping and nested candidate alignments, and it requires strategy = "pairwise".
- threads
Number of worker threads to request for bulk searches.
- strategy
Search strategy. "pairwise" searches each pattern/text pair independently and is the general-purpose default. "batch_texts" uses one text per SIMD lane. "batch_patterns" and "encoded_patterns" (alias "v2") use Sassy's multi-pattern encoding. In sassy 0.2.1 this encoding is implemented for alphabet = "iupac" with patterns of equal byte length.
- pattern_id
Optional pattern identifiers. If supplied, this must be a non-missing character vector with one entry per pattern. It adds a pattern_id column to the result, or replaces an existing one. Names on pattern are not inspected.
- text_id
Optional text identifiers. If supplied, this must be a non-missing character vector with one entry per text. It adds a text_id column to the result, or replaces an existing one. Names on text are not inspected.
- match_region
If TRUE, include a match_region column. Unless sam = TRUE, reverse-strand regions are reverse-complemented so that the region and the CIGAR are in the input pattern direction.
- sam
If TRUE, format reverse-strand match_region and cigar in the text direction, as used by SAM and by the upstream sassy --sam output.
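The all argument separates two ways of reporting hits. The sketch below does not use Sassy's SIMD algorithm. Instead, it runs a plain semi-global edit-distance dynamic program (Sellers-style) to compute the best cost at every text end position. Under that sketch, all = TRUE corresponds to keeping every end with cost <= k. The all = FALSE branch uses a naive "no neighbour scores lower" rule as a stand-in for Sassy's local-minimum selection, and Sassy's actual rule, particularly its handling of ties, may differ. end_scores() and candidates() are hypothetical names.

```r
# Sketch only: best edit distance of `pattern` against any substring of `text`
# ending at each position (semi-global DP, O(m * n)). Not Sassy's algorithm.
end_scores <- function(pattern, text) {
  p  <- strsplit(pattern, "")[[1]]
  tx <- strsplit(text, "")[[1]]
  m <- length(p)
  prev <- 0:m                      # before any text: p[1..i] costs i
  scores <- integer(length(tx))
  for (j in seq_along(tx)) {
    cur <- integer(m + 1)          # cur[1] = 0: a match may start anywhere
    for (i in seq_len(m)) {
      cur[i + 1] <- min(prev[i] + (p[i] != tx[j]),  # match or substitution
                        prev[i + 1] + 1L,           # extra text character
                        cur[i] + 1L)                # skipped pattern character
    }
    scores[j] <- cur[m + 1]
    prev <- cur
  }
  scores  # scores[j]: best cost of an alignment with 0-based text_end = j
}

candidates <- function(pattern, text, k, all = FALSE) {
  s <- end_scores(pattern, text)
  ends <- which(s <= k)
  if (!all) {
    # Naive stand-in for local-minimum selection (an assumption, not Sassy's
    # exact rule): keep an end only if neither neighbour has a lower score.
    left  <- c(Inf, s[-length(s)])
    right <- c(s[-1], Inf)
    ends <- ends[s[ends] <= left[ends] & s[ends] <= right[ends]]
  }
  data.frame(text_end = ends, cost = s[ends])
}

candidates("ACGT", "TTACGTAA", k = 1L, all = TRUE)  # ends 5, 6, 7; costs 1, 0, 1
candidates("ACGT", "TTACGTAA", k = 1L)              # end 6 only; cost 0
```

In this example, all = TRUE reports three overlapping candidates ending at 5, 6, and 7. The local-minimum view keeps only the exact hit [2, 6).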
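For a reverse-strand hit, rc, match_region, and sam together determine the direction in which the hit is written out. The sketch below uses a hypothetical revcomp() helper to show two views of the same reverse-strand hit. The first is the text-direction region that sam = TRUE describes. The second is the reverse-complemented, pattern-direction region that match_region reports by default. It assumes a plain uppercase A/C/G/T/N alphabet, so IUPAC ambiguity codes are not complemented here.

```r
# Hypothetical helper: reverse complement of an uppercase A/C/G/T/N string.
revcomp <- function(x) {
  comp <- chartr("ACGTN", "TGCAN", x)                 # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # then reverse the order
}

text    <- "TTACGTAA"
pattern <- "TACG"   # its reverse complement, "CGTA", occurs in the text

# Assume a reverse-strand hit at 0-based, half-open text[3, 7):
start <- 3L
end   <- 7L
text_dir    <- substr(text, start + 1L, end)  # "CGTA": text direction
pattern_dir <- revcomp(text_dir)              # "TACG": pattern direction
pattern_dir == pattern                        # TRUE for this exact hit
```

Per the argument descriptions, the CIGAR follows the same direction choice as the region.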
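The strategy description makes "batch_patterns" and "encoded_patterns" usable in sassy 0.2.1 only under two conditions: alphabet = "iupac", and patterns of equal byte length. The sketch below checks those two documented conditions before a strategy is chosen and falls back to "pairwise" otherwise. can_batch_patterns() is a hypothetical helper. It measures byte length with length() for raw vectors and with nchar(type = "bytes") for character scalars.

```r
# Hypothetical helper (not part of the package): TRUE when the inputs meet the
# conditions documented for "batch_patterns"/"encoded_patterns" in sassy 0.2.1.
can_batch_patterns <- function(pattern, alphabet) {
  byte_len <- vapply(pattern, function(p) {
    if (is.raw(p)) length(p) else nchar(p, type = "bytes")
  }, integer(1))
  identical(alphabet, "iupac") && length(byte_len) > 0L &&
    length(unique(byte_len)) == 1L
}

pats <- list("ACGTN", "RYACG", charToRaw("ACGTA"))
can_batch_patterns(pats, "iupac")                 # TRUE: all are 5 bytes
can_batch_patterns(list("ACGT", "ACG"), "iupac")  # FALSE: lengths differ
can_batch_patterns(pats, "dna")                   # FALSE: wrong alphabet

strategy <- if (can_batch_patterns(pats, "iupac")) "batch_patterns" else "pairwise"
```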
Value
A sassy_matches data frame with 0-based indices and coordinates, containing the columns pattern_idx, text_idx, text_start, text_end, pattern_start, pattern_end, cost, strand, and cigar. If pattern_id or text_id is supplied, the corresponding mapped identifier columns are included. If match_region = TRUE, a match_region column is also included. Rows are ordered by input text, then by text start and end coordinates, then by pattern index.
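Because pattern_idx and text_idx are 0-based, you need to add 1 to them before indexing the original R inputs. The sketch below builds a small data frame that has only some of the documented columns. Its values are invented for illustration and are not real sassy_search() output. The sketch maps the indices back to their inputs, labels each row (a manual equivalent of what pattern_id provides), extracts the matched text, and reapplies the documented row order. The names texts, pattern_ids, pattern_label, and matched are hypothetical.

```r
# Invented rows shaped like part of the documented result (illustration only).
hits <- data.frame(
  pattern_idx = c(1L, 0L),
  text_idx    = c(0L, 0L),
  text_start  = c(2L, 2L),
  text_end    = c(6L, 6L),
  cost        = c(0L, 1L)
)
texts       <- list("TTACGTAA")        # the `text` input that was searched
pattern_ids <- c("probeA", "probeB")   # one label per pattern, in input order

# 0-based indices -> 1-based R positions: add 1.
hits$pattern_label <- pattern_ids[hits$pattern_idx + 1L]
hits$matched <- mapply(
  function(ti, s, e) substr(texts[[ti + 1L]], s + 1L, e),
  hits$text_idx, hits$text_start, hits$text_end
)

# Documented row order: text, then start/end, then pattern index.
hits <- hits[order(hits$text_idx, hits$text_start, hits$text_end,
                   hits$pattern_idx), ]
hits
```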