Registers an R-backed DuckDB aggregate. The aggregate state is an arbitrary R
object, not a serialized raw vector. Rducks stores a preserved reference to
the state object inside the native DuckDB aggregate state and passes that same
object back to later R callbacks. Returning NULL means "empty/no state";
use a wrapper such as list(value = NULL) if NULL itself must be
represented as a non-empty state.
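The NULL convention above can be illustrated in plain R, without calling Rducks. This is only a sketch: is_empty() is a hypothetical helper introduced for illustration and is not part of the package.

```r
# Hypothetical helper (not an Rducks function): a state counts as empty
# exactly when it is NULL.
is_empty <- function(state) is.null(state)

empty_state  <- NULL                # "empty/no state"
wrapped_null <- list(value = NULL)  # non-empty state whose payload is NULL

is_empty(empty_state)    # TRUE
is_empty(wrapped_null)   # FALSE
is.null(wrapped_null$value)  # TRUE: the payload itself is still NULL
```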
Usage
rducks_register_aggregate(
  con,
  name,
  update = NULL,
  finalize = NULL,
  args,
  returns,
  combine = NULL,
  null_handling = c("default", "special"),
  copy = NULL,
  copy_chunk = NULL,
  update_chunk = NULL,
  combine_chunk = NULL,
  finalize_chunk = NULL
)
Arguments
- con
A duckdb_connection.
- name
SQL aggregate function name.
- update
Optional row-wise R function called as update(state, ...); may return any R object state or NULL.
- finalize
Optional row-wise R function called as finalize(state); must return a scalar compatible with returns, or NULL for SQL NULL.
- args
Input type specification. Use exported DuckDB-style descriptors such as INTEGER, DOUBLE, or VARCHAR.
- returns
Return type specification.
- combine
Optional R function called as combine(left, right) when two non-NULL partial states must be merged. It may return any R object state or NULL.
- null_handling
Either "default" to skip rows with top-level NULL inputs, or "special" to pass missing values to update callbacks.
- copy
Optional R function called as copy(state) when DuckDB needs to place a non-NULL partial state into an empty target state during combine. When omitted, Rducks preserves another reference to the same R object.
- copy_chunk
Optional vectorized R function called as copy_chunk(states) with a list of states to copy. It must return a list of replacement states of the same length. It takes precedence over copy().
- update_chunk
Optional vectorized R function called as update_chunk(states, group_id, ...), where states is a list of current R state objects, group_id maps each input row to an element of states, and the remaining arguments are full R input vectors. It must return a list of replacement states with the same length as states.
- combine_chunk
Optional vectorized R function called as combine_chunk(left_states, right_states), where both arguments are lists of R state objects or NULL. It must return a list of states of the same length.
- finalize_chunk
Optional vectorized R function called as finalize_chunk(states), where states is a list of R state objects or NULL. It must return one result per state as either a vector or list.
Value
Object of class rducks_aggregate_registration containing the
connection and normalized aggregate signature. The aggregate remains
registered in DuckDB even if this object is discarded.
Details
The row-wise API calls update(state, ...) for each selected input row and
finalize(state) for each output state. The vectorized update API calls
update_chunk(states, group_id, ...) once per DuckDB input chunk. states is
a list of the distinct aggregate-state objects referenced by that chunk, and
group_id is an integer vector with one entry per input row: 0L means the
row was skipped by default NULL handling, otherwise the value is a one-based
index into states. The remaining arguments are full, unsliced R vectors for
the aggregate inputs. update_chunk() must return a list of replacement
states with the same length as states. combine_chunk(left_states, right_states)
receives two lists of state objects for partial-state merging and must return a
list with one merged state per pair. finalize_chunk(states) must return a vector or
list with one scalar result per output state. Chunk callbacks take precedence
over row-wise callbacks.
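The group_id contract described above can be sketched as a pair of chunk callbacks for an integer sum. This is an illustration in plain R, not a definitive implementation: sum_update_chunk() and sum_finalize_chunk() are hypothetical names, and the calls at the bottom imitate by hand what a single chunk might look like instead of going through DuckDB.

```r
# Hypothetical chunk callbacks for an integer sum (names are illustrative).
sum_update_chunk <- function(states, group_id, x) {
  for (i in seq_along(states)) {
    rows <- which(group_id == i)   # rows routed to state i; 0L entries never match
    if (length(rows) == 0L) next   # this chunk has no rows for state i
    chunk_sum <- sum(x[rows])
    states[[i]] <- if (is.null(states[[i]])) chunk_sum else states[[i]] + chunk_sum
  }
  states                           # same length as the incoming list
}

sum_finalize_chunk <- function(states) {
  # One scalar per state; an empty (NULL) state yields 0L, as in my_sum below.
  vapply(states, function(s) if (is.null(s)) 0L else s, integer(1))
}

# Hand-built chunk: state 1 is empty, state 2 already holds 10L.
states   <- list(NULL, 10L)
group_id <- c(1L, 0L, 2L, 1L)      # the second row was skipped (NULL input)
x        <- c(1L, NA, 5L, 3L)      # full, unsliced input vector

updated <- sum_update_chunk(states, group_id, x)
results <- sum_finalize_chunk(updated)
results
#> [1]  4 15
```

Because the skipped row carries group_id 0L, its NA never reaches sum(), so the callback needs no separate NA handling under default NULL handling.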
This API is deliberately serialized. Registration requires
rducks_enable(con, threads = "single"), or the equivalent
external_threads=1 plus PRAGMA threads=1, and execution rejects any attempt
to call R from a DuckDB worker thread other than the calling thread. If
DuckDB combines partial states and the target state is empty, Rducks
preserves another reference to the source R object rather than serializing or
deep-copying it. Use copy or copy_chunk when an empty-target combine must
create independent mutable state. Merging two non-NULL states requires either
combine(left, right) or combine_chunk(left_states, right_states), and the
merge must still run on the recorded R thread.
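To see why copy matters for mutable state, consider a counter stored in an R environment, which has reference semantics. The sketch below is plain R under stated assumptions: new_counter(), count_update(), count_copy(), and count_combine() are hypothetical names, and the sharing behavior is imitated with ordinary assignment instead of a real DuckDB combine.

```r
# Hypothetical mutable state: an environment holding a running count.
new_counter <- function(n = 0L) {
  env <- new.env(parent = emptyenv())
  env$n <- n
  env
}

count_update <- function(state, x) {
  if (is.null(state)) state <- new_counter()
  state$n <- state$n + 1L          # mutates the environment in place
  state
}

count_copy    <- function(state) new_counter(state$n)          # independent copy
count_combine <- function(left, right) new_counter(left$n + right$n)

source_state <- count_update(NULL, 1L)
shared       <- source_state             # same reference, as when copy is omitted
independent  <- count_copy(source_state) # what a copy callback would hand back

count_update(source_state, 2L)           # a later in-place update of the source
c(source_state$n, shared$n, independent$n)
#> [1] 2 2 1

merged <- count_combine(source_state, independent)
merged$n
#> [1] 3

# Possible registration, not run here and not verified against the package:
# rducks_register_aggregate(con, "my_count", update = count_update,
#   finalize = function(state) if (is.null(state)) 0L else state$n,
#   combine = count_combine, copy = count_copy,
#   args = list(INTEGER), returns = INTEGER)
```

The shared reference picks up the later update while the copied state does not, which is the situation copy and copy_chunk are meant to address.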
With null_handling = "default", rows with any top-level SQL NULL input never
reach update(), and their group_id entry in update_chunk() is 0L rather than
a positive index. Groups with no non-NULL rows therefore pass NULL to
finalize() or finalize_chunk(). With null_handling = "special", update
callbacks receive the declared type's R missing-value shape for NULL inputs.
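The two NULL-handling modes can be compared with a small plain-R simulation of a "count the NULLs" aggregate. This is a sketch, not package behavior: run_rows() is a hypothetical driver standing in for DuckDB, and it assumes that an INTEGER NULL arrives as NA_integer_ under "special".

```r
# Hypothetical callbacks: count how many inputs were missing.
count_null_update   <- function(state, x) {
  if (is.null(state)) state <- 0L
  state + as.integer(is.na(x))
}
count_null_finalize <- function(state) if (is.null(state)) 0L else state

# Hypothetical driver imitating row-wise execution for a single group.
run_rows <- function(values, skip_null) {
  if (skip_null) values <- values[!is.na(values)]  # "default": NULL rows are skipped
  state <- NULL
  for (v in values) state <- count_null_update(state, v)
  count_null_finalize(state)
}

inputs <- c(1L, NA, 3L, NA)

run_rows(inputs, skip_null = FALSE)  # "special": update sees the NA rows
#> [1] 2
run_rows(inputs, skip_null = TRUE)   # "default": update never sees them
#> [1] 0
run_rows(c(NA_integer_, NA_integer_), skip_null = TRUE)  # state stays NULL
#> [1] 0
```

In the last call no row reaches the update callback, so the finalizer receives NULL and has to supply the result for an empty group itself.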
Examples
# \donttest{
db <- duckdb::dbConnect(duckdb::duckdb(config = list(allow_unsigned_extensions = "true")))
rducks_enable(db, threads = "single")
rducks_register_aggregate(
  db, "my_sum",
  update = function(state, x) if (is.null(state)) x else state + x,
  finalize = function(state) if (is.null(state)) 0L else state,
  args = list(INTEGER), returns = INTEGER
)
#> <rducks_aggregate_registration>
#> registered: yes
#> name: my_sum
#> signature: my_sum(INTEGER) -> INTEGER
DBI::dbGetQuery(db, "SELECT my_sum(x) FROM (VALUES (1), (2), (3)) t(x)")
#>   my_sum(x)
#> 1         6
rducks_release(db)
DBI::dbDisconnect(db)
# }