Rducks 0.1.0

  • Fixed a Windows R-devel check hang in the dev/test in-process queue cancellation coverage. The test previously used R’s elapsed-time limit as a synthetic interrupt while execution was inside native queue-draining code; that interrupt is not delivered reliably on Windows while native code is running. Rducks now exercises the same queue cancel-generation cleanup path with a deterministic dev-only cancellation diagnostic, verifies that the connection and queue remain usable afterward, and keeps pending/running queue counters at zero after cancellation.
  • Protected the transient evaluator object during native scalar-UDF registration so R-devel garbage collection cannot reclaim it before the extension preserves it in DuckDB metadata.
  • Added first-class GEOMETRY and VARIANT type descriptors. GEOMETRY crosses the R boundary as WKB raw bytes; VARIANT is exposed as DuckDB’s typed storage struct wrapped by rducks_variant, with SQL-side DuckDB VARIANT functions remaining the canonical way to construct and inspect semantic values. VARIANT scalar-UDF registration requires a DuckDB runtime C API that exposes VARIANT logical types.
  • Tightened execution-plan and support documentation, documented arrow.bool8 boolean extension handling, clarified aggregate state ownership, and expanded tests for IPC, duckplyr, query streams, and table-stream cardinality.
  • Added package vignettes for getting started, type/value semantics, execution plans and IPC worker lifecycle, and current internal implementation details.
  • Added dynamic-argument scalar UDF registration: omitting args in rducks_register_scalar_udf() registers a DuckDB varargs ANY function while keeping the return type explicit. DuckDB now resolves the concrete argument types at bind time, and Rducks uses those effective types for scalar and vectorized evaluation across the supported arrow_r, arrow_c, and arrow_ipc execution plans, including composite, exotic, and special-NULL inputs. Added rducks_with_duckplyr() and a with.duckdb_connection() method that register named R helpers and rewrite matching duckplyr calls so stingy duckplyr pipelines can stay in DuckDB rather than falling back to dplyr.
  • Renamed the scalar-function registration API to rducks_register_scalar_udf() and clarified terminology across the user documentation: DuckDB function kind (scalar UDF, aggregate function, table function), Rducks scalar-UDF evaluation mode, and Rducks execution plan are distinct concepts.
  • Added rducks_register_aggregate() for R-backed DuckDB aggregate functions. Aggregate state can now be an arbitrary preserved R object rather than only serialized raw bytes. Row-wise callbacks use update(state, ...) / combine(left, right) / finalize(state); optional chunk callbacks use update_chunk(states, group_id, ...), combine_chunk(left_states, right_states), and finalize_chunk(states). Default NULL handling skips rows with top-level NULL inputs, and execution is explicitly restricted to the recorded calling R thread.
  • Added rducks_register_table() support for both finite and streaming R-backed table functions. The native table-function path infers positional SQL argument count from the R function formals, registers those inputs as DuckDB ANY, converts actual SQL bind values to R values, and calls the R function during DuckDB bind on the recorded calling R thread. Finite results still infer the output schema from a returned data frame/list and import the full result through nanoarrow Arrow C Data, while rducks_table_stream() adds a scan-time next_batch() path driven by a bind-time prototype, optional cardinality metadata, and projection-aware output copying.
  • Added vendored NNG/Mbed TLS source management for the native worker-provider foundation. tools/vendor_nng_mbedtls.R pins and refreshes the vendored sources, source builds statically link a hidden NNG client shim, and dev/test SQL diagnostics expose rducks_nng_version() and rducks_nng_self_test().
  • Added rducks_query_stream() as a connection-bound R-side streaming query object with explicit next_batch(), close(), schema/prototype metadata, finalizer cleanup, and rducks_release() integration. Query streams now use DuckDB’s native streaming result/data-chunk APIs through a dedicated extension-owned query-stream connection, keeping dynamic scalar, table, and aggregate registration on the separate runtime connection. Fetched chunks are exported via DuckDB Arrow C Data and either returned as owned nanoarrow record batches (format = "record_batch") or materialized through the existing Rducks/nanoarrow conversion helpers, without requiring the arrow package.
  • Clarified IPC shared-memory capability metadata and design notes: mori is a same-host path for long-lived globals, while built-in backends still report no SQL chunk shared-memory handle support. Added a diagnostic data-plane benchmark for current Arrow IPC bytes versus per-chunk mori reference costs.
  • Added vendored Apache Arrow nanoarrow C/IPC sources for the native arrow_ipc + multiprocess_parallel path. The vendored code is compiled with -DNANOARROW_NAMESPACE=RducksNanoarrow, flatcc runtime symbols are prefixed, and Rducks uses these local C symbols instead of path-loading the nanoarrow R package shared library. The NNG provider path launches local mirai/nanonext worker loops by default, supports explicit ipc_endpoints, and raises an error rather than falling back to generic process backends, same-process execution, or R serialization.
  • rducks_explain_udf() now reports queue-pending, arrow_c input-snapshot, arrow_c owned-result-chunk, and RIPC diagnostic counters for future native provider work.
  • Added rducks_reset_udf_counters() to reset one UDF’s diagnostic counters or all native UDF counters in the current database runtime without unregistering catalog functions.
  • UDF stat field discovery now comes from native rducks_udf_stat_fields(); the R-side field vector is only a documented compatibility list for sessions where that optional native discovery helper is unavailable.
  • Queued arrow_r helper returns now import into an owned DuckDB result chunk on the recorded main R thread; the waiting worker copies that owned vector into callback output instead of having the main thread write directly into the callback-owned output vector.
  • rducks_explain_udf() and rducks_list_udfs() now include r_side_record to make detached/missing R-side scalar-UDF registry metadata explicit. Native per-UDF hot-path counters are updated with atomics rather than under the process-global runtime registry lock.
  • Added rducks_native_execution_backend() to cross-check the native database-scoped execution backend against the R-side current/default execution plan.
  • The Arrow IPC NNG path defaults to one-time scalar-UDF global discovery (ipc_globals = "auto") and then broadcasts explicit globals when the scalar UDF is registered with the provider pool. This avoids per-chunk automatic global discovery while preserving common scalar-UDF globals; set ipc_globals = TRUE, FALSE, a character vector, or a named list to override the behavior.
  • Execution plans now carry a concrete engine_id (for example arrow_c_direct_serial, arrow_c_direct_main_queue, or ipc_nng_pool), and rducks_as_execution_plan() accepts the current engine-id shortcuts.
  • rducks_inproc_stats() now reports main-thread drain attempts, non-empty drain batches, and maximum drain batch size in addition to pending/running queue pressure and timeout semantics.
  • The SQL execution-backend setter now requires the recorded main-thread capability carried by Rducks’ R wrapper, and the main-thread token can no longer be rebound to a different token after initialization. Manual SQL calls with a bare backend string fail instead of mutating runtime execution state.
  • Dev/test-only SQL probes (rducks_parallel_range, rducks_parallel_thread_probe, rducks_queue_self_test, and rducks_thread_is_main) are now registered only when RDUCKS_DEV_SURFACES=true is set before extension load. Production SQL surfaces keep only the registration, execution, and documented statistics helpers.
  • Direct arrow_c scalar-UDF execution and the Arrow/R + Arrow IPC callback paths are fenced with R_tryCatchError() plus R_UnwindProtect() so unexpected marshalling/allocation errors are converted into DuckDB UDF errors without installing a fresh R top-level context inside DuckDB callbacks. RIPC cleanup now releases preserved task/schema objects and decrements in-flight counters on abnormal unwind.
  • Arrow C Data result import now copies the temporary imported DuckDB vector into the callback-owned output vector before destroying the imported chunk, avoiding reliance on reference-vector lifetime semantics.
  • Added direct native arrow_c vectorized UDF support (RCV) for signatures accepted by the direct arrow_c type matrix. Chunk arguments are materialized from DuckDB vectors in C, return rows are written back through the direct writer, and generated marshalling coverage verifies the selected native path. Queued direct arrow_c row-wise and vectorized scalar UDFs now copy input vectors into an owned DuckDB data chunk before the request is submitted to the recorded main R thread. With supported scalar returns, they then evaluate into an owned Arrow C Data result chunk; the waiting worker writes DuckDB output from those Arrow buffers without touching SEXPs or nanoarrow R external pointers. Composite direct returns now use an owned DuckDB result chunk filled on the main R thread and copied into callback output by the waiting worker. The owned return envelope covers primitive, temporal, VARCHAR/BLOB/BIT, DECIMAL, ENUM, UUID, HUGEINT/UHUGEINT, and INTERVAL results; the owned DuckDB result-chunk path covers direct composite returns.
  • Added an internal %||% compatibility shim so the package works under the lowered R 4.3 dependency floor.
  • arrow_c is now a direct marshalling path for row-wise and vectorized scalar-UDF evaluation. Unsupported signatures fail explicitly instead of falling back to Arrow/R helper marshalling.
  • Added rducks_explain_udf() and rducks_list_udfs() with native per-UDF execution counters so users can inspect registration metadata and verify that arrow_r/arrow_c chunks ran through the requested evaluator. Added rducks_release_stats() to inspect process-local counters for preserved R objects queued by off-main DuckDB metadata destructors and drained later on the recorded main R thread. Added rducks_runtime_stats() to inspect native runtime registry and extension-owned connection accounting.
  • Added an R-universe badge to the README and lowered the package R dependency floor to R 4.3.
  • Added wasm/webR build detection in configure, including the DuckDB wasm metadata platform and explicit Emscripten export for the extension entrypoint, plus a Dockerfile.webr-test helper for local rwasm builds and a local browser smoke harness under scripts/.
  • Added explicit execution-plan helpers rducks_execution_plan(), rducks_set_execution_plan(), and rducks_current_execution_plan() to separate scalar-UDF semantics from connection-level marshalling/concurrency policy. The arrow_r + serial plan is the reference implementation; unsupported execution-plan combinations fail explicitly through plan validation.
  • Removed per-registration evaluator selection from rducks_register_scalar_udf(). The evaluator is now derived from the active execution plan, so conformance tests compare plan-native registrations instead of mixing evaluator choices inside a single registration call.
  • Added mode = "vectorized" for DuckDB scalar UDFs whose backing R function should be called once per DuckDB chunk with vector/list-column arguments. The vectorized adapter uses the same Arrow C Data/nanoarrow bridge as scalar row-wise mode, enforces return length, defines default vs special NULL handling, and is covered by runtime tests.
  • Added an official in-process queued execution API for scalar UDFs: rducks_enable_inproc(), rducks_disable_inproc(), rducks_inproc_stats(), and rducks_inproc_self_test(). The backend keeps all R API work on the recorded main R thread and uses an extension-owned queue with timeout/error paths rather than a package-side pump or hidden progress callback.
  • Added native queue diagnostics and tests covering main-thread queue draining and scalar-UDF execution through the queued path. rducks_inproc_stats() now reports the configured pending-request timeout and explicitly reports that running queued requests cannot be cancelled safely while they borrow DuckDB callback storage.
  • Split scalar execution and native extension runtime state so UDF metadata uses DuckDB C extension bind/init/local-state hooks and per-loaded-database runtime entries instead of a singleton connection.
  • Initial development scaffold for an R package and DuckDB extension bridge for R user-defined functions.
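
To illustrate the update(state, ...) / combine(left, right) / finalize(state) callback shapes named in the rducks_register_aggregate() entry above, here is a minimal sketch of a mean aggregate in plain R. It does not call Rducks at all. The names mean_update, mean_combine, mean_finalize, and fold_rows are hypothetical, and treating a NULL state as "no rows seen yet" is an assumption made for this sketch, not documented Rducks behavior.

```r
# Hypothetical row-wise callbacks for a mean aggregate.
# Assumption: the initial state is NULL until the first row arrives.
mean_update <- function(state, x) {
  if (is.null(state)) state <- list(sum = 0, n = 0L)
  list(sum = state$sum + x, n = state$n + 1L)
}

# Merge two partial states, e.g. from two partitions of the same group.
mean_combine <- function(left, right) {
  if (is.null(left)) return(right)
  if (is.null(right)) return(left)
  list(sum = left$sum + right$sum, n = left$n + right$n)
}

# Turn the final state into the aggregate's result value.
mean_finalize <- function(state) {
  if (is.null(state) || state$n == 0L) NA_real_ else state$sum / state$n
}

# Plain-R stand-in for the engine: fold one partition's rows into a state.
fold_rows <- function(xs) {
  state <- NULL
  for (x in xs) state <- mean_update(state, x)
  state
}

part_a <- fold_rows(c(1, 2, 3))
part_b <- fold_rows(c(4, 5))
result <- mean_finalize(mean_combine(part_a, part_b))
```

Keeping the state as an ordinary R list mirrors the changelog's note that aggregate state may be an arbitrary R object; how such callbacks are actually passed to rducks_register_aggregate() is not shown here.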