Rducks keeps SQL type semantics explicit. DuckDB owns binding and execution; Rducks maps DuckDB values into R values, calls the R function, and writes values back to DuckDB with the declared result type.
Function kind, scalar-UDF mode, and execution plan
Three concepts are intentionally separate:
- DuckDB function kind: scalar UDF, aggregate function, or table function.
- Scalar-UDF evaluation mode: mode = "scalar" calls R once per row; mode = "vectorized" calls R once per DuckDB chunk.
- Scalar-UDF execution plan: arrow_r, arrow_c, or arrow_ipc marshalling combined with an allowed concurrency model.
Changing a connection’s default execution plan affects future scalar-UDF registrations and the matching native runtime backend; it does not rewrite an existing scalar UDF to a different marshalling engine.
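The difference between the two evaluation modes is call granularity. The sketch below does not use Rducks. It imitates the two modes in base R over a hypothetical six-row INTEGER column split into two chunks of three rows (an illustrative size only), and counts how often the R function is entered.

```r
# Base-R imitation of call granularity; this is not Rducks API.
# `chunks` is a hypothetical stand-in for the DuckDB chunks of one INTEGER column.
chunks <- list(1:3, 4:6)

calls <- 0L
add_one <- function(x) {
  calls <<- calls + 1L  # count how many times R is entered
  x + 1L
}

# mode = "scalar": one R call per row, each call sees a length-1 value.
calls <- 0L
scalar_result <- unlist(lapply(chunks, function(chunk) {
  vapply(chunk, add_one, integer(1))
}))
scalar_calls <- calls

# mode = "vectorized": one R call per chunk, each call sees a whole vector.
calls <- 0L
vectorized_result <- unlist(lapply(chunks, add_one))
vectorized_calls <- calls

# Same values either way; only the number of R calls differs.
stopifnot(identical(scalar_result, vectorized_result))
c(scalar = scalar_calls, vectorized = vectorized_calls)
```

Both modes return the same values here because x + 1L is already vectorized. A function body that assumes a single value, such as one branching with if (x > 0), would need mode = "scalar" or a vectorized rewrite (for example with ifelse()).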
Declared descriptors
Rducks descriptors describe DuckDB logical types, including primitive, exact, semi-structured, and composite values.
primitive <- list(INTEGER, DOUBLE, BOOLEAN, VARCHAR)
exact <- list(UUID, HUGEINT, DECIMAL(18, 4), INTERVAL, BIT)
semi_structured <- list(GEOMETRY, VARIANT)
composite <- list(
  INTEGER[],
  ARRAY(DOUBLE, 3),
  STRUCT(id = INTEGER, label = VARCHAR),
  MAP(VARCHAR, DOUBLE),
  UNION(i = INTEGER, s = VARCHAR)
)
Declared scalar-UDF arguments pin the SQL signature:
rducks_register_scalar_udf(
  con,
  name = "r_add_one",
  fun = function(x) x + 1L,
  args = INTEGER,
  returns = INTEGER
)
#> <rducks_scalar_udf_registration>
#> registered: yes
#> name: r_add_one
#> evaluation_mode: scalar
#> plan: arrow_r+serial
#> signature: r_add_one(INTEGER) -> INTEGER
Omitting args registers a dynamic DuckDB varargs function. At bind time, DuckDB supplies the concrete logical types for the SQL call, and Rducks uses those bound types for the same input materialization it would use for explicit args.
rducks_register_scalar_udf(
  con,
  name = "r_payload_label",
  fun = function(payload) paste(payload$label, payload$x, sep = ":"),
  returns = VARCHAR
)
#> <rducks_scalar_udf_registration>
#> registered: yes
#> name: r_payload_label
#> evaluation_mode: scalar
#> plan: arrow_r+serial
#> signature: r_payload_label(...) -> VARCHAR
DBI::dbGetQuery(con, "
  SELECT r_payload_label(struct_pack(x := 3::INTEGER, label := 'a')) AS label
")
#>   label
#> 1   a:3
Use args = NULL for a true zero-argument UDF.
NULL handling
null_handling = "default" follows DuckDB’s default
scalar-UDF contract: if a top-level input is SQL NULL,
DuckDB produces SQL NULL without calling R.
null_handling = "special" passes top-level SQL
NULL inputs through to R as type-specific missing values so
the R function can decide what to return.
rducks_register_scalar_udf(
  con,
  name = "r_null_special",
  fun = function(x) if (is.na(x)) 5L else x,
  args = INTEGER,
  returns = INTEGER,
  null_handling = "special"
)
#> <rducks_scalar_udf_registration>
#> registered: yes
#> name: r_null_special
#> evaluation_mode: scalar
#> plan: arrow_r+serial
#> signature: r_null_special(INTEGER) -> INTEGER
DBI::dbGetQuery(con, "SELECT r_null_special(NULL::INTEGER) AS x")
#>   x
#> 1 5
Nested NULLs are part of the nested value. Scalar children usually become typed NA values, while nested composite NULLs become NULL.
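To illustrate that rule, the base-R values below show how a STRUCT(id INTEGER, tags INTEGER[]) argument might look inside the R function when one of its children is SQL NULL. The struct layout, the field names, and the mapping of INTEGER[] to an integer vector are assumptions made for this sketch; the actual representation for a given type should be checked against rducks_value_semantics(). The point is only that is.na() and is.null() answer different questions.

```r
# Hypothetical R-side view of STRUCT(id INTEGER, tags INTEGER[]) values.
# Built by hand, not produced by Rducks, to show the two kinds of missingness.
complete  <- list(id = 7L,          tags = c(1L, 2L))
null_id   <- list(id = NA_integer_, tags = c(1L, 2L))  # NULL scalar child: typed NA
null_tags <- list(id = 7L,          tags = NULL)       # NULL composite child: NULL

describe <- function(payload) {
  id_part <- if (is.na(payload$id)) "no id" else paste("id", payload$id)
  # Test is.null() before inspecting elements: is.na(NULL) has length zero
  # and cannot be used as an if() condition.
  tags_part <- if (is.null(payload$tags)) {
    "no tags"
  } else {
    paste(length(payload$tags), "tags")
  }
  paste(id_part, tags_part, sep = ", ")
}

labels <- vapply(list(complete, null_id, null_tags), describe, character(1))
labels
```

A function that handles nested input therefore usually needs both checks: is.na() for scalar fields and is.null() for composite fields.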
Error handling and side effects
exception_handling = "rethrow" makes R errors fail the
SQL query. Other error handling modes are explicit choices and should be
tested with the declared return type.
Mark functions with side_effects = TRUE when they depend
on counters, randomness, time, I/O, mutation, sleeps, external state, or
diagnostics. Without that flag, DuckDB may treat the scalar UDF as a pure
function and optimize calls to it like any other deterministic expression.
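The following base-R sketch shows why the flag matters. It does not call DuckDB or Rducks, and it does not claim that DuckDB applies this exact rewrite to any particular query. It only imitates one optimization that is valid for a pure function (evaluate once, reuse the result) and shows how that changes the output of a function with hidden state. The names next_id and counter are hypothetical.

```r
# Base-R imitation of reusing one result for repeated calls; not Rducks API.
counter <- 0L
next_id <- function() {
  counter <<- counter + 1L  # hidden state: every call returns a new value
  counter
}

rows <- 3L

# Called once per row, as the author of next_id() expects.
counter <- 0L
per_row <- vapply(seq_len(rows), function(i) next_id(), integer(1))

# Evaluated once and reused for every row, which is only valid for a pure function.
counter <- 0L
folded <- rep(next_id(), rows)

list(per_row = per_row, folded = folded)
```

The two results differ (distinct ids versus one repeated id), which is the kind of silent change that declaring side_effects = TRUE is meant to rule out.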
Runtime reference tables
The package exports compact reference tables so tests and documentation can stay aligned with the implemented semantics.
rducks_mode_semantics()[, c("mode", "call_granularity", "input_shape")]
#>         mode            call_granularity
#> 1     scalar          one R call per row
#> 2 vectorized one R call per DuckDB chunk
#>                                                               input_shape
#> 1 one scalar/composite R value per declared or dynamically bound argument
#> 2     one R vector/list-column per declared or dynamically bound argument
rducks_value_semantics()[
  rducks_value_semantics()$duckdb_type %in% c("INTEGER", "VARCHAR", "GEOMETRY", "VARIANT", "STRUCT"),
  c("duckdb_type", "r_value_class", "special_null_argument")
]
#>    duckdb_type  r_value_class special_null_argument
#> 6      INTEGER        integer           NA_integer_
#> 12     VARCHAR      character         NA_character_
#> 14    GEOMETRY            raw                  NULL
#> 15     VARIANT rducks_variant                  NULL
rducks_argument_type_mapping(list(
  INTEGER,
  UUID,
  DECIMAL(10, 2),
  STRUCT(a = INTEGER[])
))
#>           duckdb_type descriptor_kind  r_value_class      r_argument_shape
#> 1             INTEGER          scalar        integer        integer scalar
#> 2                UUID          scalar    rducks_uuid    rducks_uuid scalar
#> 3      DECIMAL(10, 2)         decimal rducks_decimal rducks_decimal scalar
#> 4 STRUCT(a INTEGER[])          struct           list  named list of fields
#>   special_null_argument           copy_semantics integer_uses_r_double
#> 1           NA_integer_             boxed scalar                 FALSE
#> 2                  NULL boxed exact Rducks value                 FALSE
#> 3                  NULL boxed exact Rducks value                 FALSE
#> 4                  NULL   recursive R allocation                 FALSE
#>   float32_widens_to_r_double precision_may_be_lost
#> 1                      FALSE                 FALSE
#> 2                      FALSE                 FALSE
#> 3                      FALSE                 FALSE
#> 4                      FALSE                 FALSE
#>                           notes
#> 1
#> 2      exact Rducks value class
#> 3 exact fixed-point value class
#> 4       recursive field mapping