Bioinformatics Containers as Interfaces
Source: vignettes/bioinformatics-interfaces.Rmd
Introduction
Bioinformatics packages often exchange rich containers rather than
plain matrices. Bioconductor’s SummarizedExperiment, for
example, keeps assays, feature metadata, sample metadata, names, and
validity rules synchronized. This vignette does not try to rebuild that
class. It uses a small toy container to show what an interface can and
cannot express.
The useful idea is an adapter boundary. A downstream function might not need a specific container class. It might only need to retrieve one assay matrix, or it might need assay names plus feature and sample names. That small behavior can be written as an S7 interface at the point where the downstream function consumes it.
Background
A class and an interface answer different design questions. A class describes representation and invariants: where the assays live, how metadata is stored, and what must be true after construction or subsetting. An interface describes behavior: which operations a consumer may call.
This distinction matters for bioinformatics. A behavioral interface can make small examples, tests, and adapters easier to write, but it does not replace a well-established interoperability class. It cannot enforce genome builds, biological interpretation, delayed computation, or the full set of conventions used by Bioconductor containers.
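To make the behavioral side of that distinction concrete, the sketch below approximates a "does this object support this operation?" check using only S7. The names get_matrix, DenseHolder, Unrelated, and supports are hypothetical and exist only for this illustration; this is not necessarily how the interface helpers used later in this vignette are implemented.

```r
library(S7)

# Hypothetical generic and classes, defined only for this illustration.
get_matrix <- new_generic("get_matrix", "x")

DenseHolder <- new_class("DenseHolder", properties = list(values = class_double))
Unrelated <- new_class("Unrelated", properties = list(label = class_character))

# Only DenseHolder provides the behavior.
method(get_matrix, DenseHolder) <- function(x) matrix(x@values, nrow = 1)

# Hypothetical helper: TRUE when S7 can find a method of `generic` for `x`.
# method() signals an error when no method is registered, so that error
# is converted to FALSE.
supports <- function(generic, x) {
  tryCatch(
    {
      method(generic, object = x)
      TRUE
    },
    error = function(e) FALSE
  )
}

supports(get_matrix, DenseHolder(values = c(1, 2, 3)))
supports(get_matrix, Unrelated(label = "no method"))
```

The first call should report that the behavior is available and the second that it is not. The check says nothing about how either class stores its data, which is exactly the part a class and its validator are responsible for.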
A toy assay container
The example class stores a named list of assays plus row and column metadata. The validator only checks the dimensions needed below.
library(S7)  # provides new_class(), new_generic(), method(), and the class_* helpers
# new_interface(), new_trait(), and the related helpers used later come from
# this package, which must be attached as well.

MiniSummarizedExperiment <- new_class(
  "MiniSummarizedExperiment",
  properties = list(
    assays = class_list,
    row_data = class_data.frame,
    col_data = class_data.frame
  ),
  validator = function(self) {
    if (length(self@assays) == 0) {
      return("@assays must contain at least one matrix")
    }
    # Anything without a dim attribute cannot be treated as a matrix.
    dims <- lapply(self@assays, dim)
    if (any(vapply(dims, is.null, logical(1)))) {
      return("every assay must be matrix-like")
    }
    # All assays must share the dimensions of the first one.
    first_dim <- dims[[1]]
    same_dim <- vapply(dims, identical, logical(1), first_dim)
    if (!all(same_dim)) {
      return("all assays must have the same dimensions")
    }
    # Metadata tables must line up with assay rows and columns.
    if (nrow(self@row_data) != first_dim[[1]]) {
      return("@row_data must have one row per assay feature")
    }
    if (nrow(self@col_data) != first_dim[[2]]) {
      return("@col_data must have one row per assay sample")
    }
  }
)

counts <- matrix(
  c(10, 0, 3, 4, 12, 8),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

mini <- MiniSummarizedExperiment(
  assays = list(counts = counts, logcounts = log1p(counts)),
  row_data = data.frame(
    gc = c(0.42, 0.51, 0.37),
    row.names = rownames(counts)
  ),
  col_data = data.frame(
    condition = c("control", "treated"),
    row.names = colnames(counts)
  )
)

Operations and a consumer-owned interface
Adapters expose behavior through ordinary S7 generics. The toy container below supports assay names, feature names, sample names, and assay lookup, but a consumer should only require the operations it actually uses.
assay_names <- new_generic("assay_names", "x")
feature_names <- new_generic("feature_names", "x")
sample_names <- new_generic("sample_names", "x")
assay_matrix <- new_generic("assay_matrix", "x")

method(assay_names, MiniSummarizedExperiment) <- function(x) names(x@assays)
method(feature_names, MiniSummarizedExperiment) <- function(x) rownames(x@assays[[1]])
method(sample_names, MiniSummarizedExperiment) <- function(x) colnames(x@assays[[1]])
method(assay_matrix, MiniSummarizedExperiment) <- function(x, name = assay_names(x)[[1]]) {
  x@assays[[name]]
}

assay_names(mini)
#> [1] "counts"    "logcounts"
sample_names(mini)
#> [1] "sample1" "sample2"
assay_matrix(mini, "counts")[, "sample1"]
#> geneA geneB geneC
#>    10     0     3

The interface belongs at the point of use. A library-size calculation
does not need feature metadata or sample metadata; it only needs to
retrieve one assay matrix. This mirrors the Go pattern of a signature
such as func Takes(db Database) error: accept the small protocol
the function needs, not a concrete database or a giant package
interface.
LibrarySizeInput <- new_interface(
  "LibrarySizeInput",
  generics = list(assay_matrix = assay_matrix)
)

library_size <- function(x, assay = "counts") {
  assert_implements(x, LibrarySizeInput)
  mat <- assay_matrix(x, assay)
  colSums(mat)
}

implements(mini, LibrarySizeInput)
#> [1] TRUE
library_size(mini)
#> sample1 sample2
#>      13      24

The payoff is testing. A unit test does not need to construct a
realistic MiniSummarizedExperiment or a full Bioconductor
object. It can provide a tiny mock that implements exactly the
consumer-owned protocol.
MockAssays <- new_class("MockAssays", properties = list(assays = class_list))

method(assay_matrix, MockAssays) <- function(x, name = "counts") {
  x@assays[[name]]
}

mock_counts <- matrix(
  c(1, 2, 3, 4),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("sampleA", "sampleB"))
)
mock <- MockAssays(assays = list(counts = mock_counts))

implements(mock, LibrarySizeInput)
#> [1] TRUE
library_size(mock)
#> sampleA sampleB
#>       3       7

This is the productive use case. A package can write against a small protocol, return an ordinary vector, and let separate adapters provide methods for concrete containers.
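As a rough illustration of a separate adapter, the sketch below registers an assay_matrix method for a plain named list of matrices, using only S7. The adapter itself and the library_size_sketch function are hypothetical. The consumer here omits the interface assertion so that the block depends on nothing beyond S7.

```r
library(S7)

# Redefined so the sketch is self-contained; skip this line if the
# vignette's assay_matrix generic already exists in the session.
assay_matrix <- new_generic("assay_matrix", "x")

# Hypothetical adapter: treat a plain named list of matrices as an assay source.
method(assay_matrix, class_list) <- function(x, name = names(x)[[1]]) {
  mat <- x[[name]]
  if (is.null(mat)) {
    stop("no assay named '", name, "'", call. = FALSE)
  }
  mat
}

# Simplified consumer (assumption: no interface assertion in this sketch).
library_size_sketch <- function(x, assay = "counts") {
  colSums(assay_matrix(x, assay))
}

plain <- list(
  counts = matrix(1:6, nrow = 3, dimnames = list(NULL, c("s1", "s2")))
)
library_size_sketch(plain)
```

The consumer code is unchanged in spirit: it still asks only for one matrix by name. What varies is the adapter, which can live in a different package from both the consumer and the container.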
When an explicit trait helps
A trait is useful when structural compatibility is not enough. Here the trait records an explicit implementation and stores an associated constant describing assay orientation.
ExperimentLike <- new_trait(
  "ExperimentLike",
  methods = list(
    assay_names = trait_method(assay_names),
    feature_names = trait_method(feature_names),
    sample_names = trait_method(sample_names),
    assay_matrix = trait_method(assay_matrix)
  ),
  assoc_consts = c("ASSAY_ORIENTATION")
)

impl_trait(
  ExperimentLike,
  MiniSummarizedExperiment,
  methods = list(
    assay_names = function(x) names(x@assays),
    feature_names = function(x) rownames(x@assays[[1]]),
    sample_names = function(x) colnames(x@assays[[1]]),
    assay_matrix = function(x, name = assay_names(x)[[1]]) x@assays[[name]]
  ),
  assoc_consts = list(ASSAY_ORIENTATION = "features_by_samples"),
  replace = TRUE
)

has_trait(mini, ExperimentLike)
#> [1] TRUE
trait_assoc_const(ExperimentLike, mini, "ASSAY_ORIENTATION")
#> [1] "features_by_samples"

Design cautions
It would be a mistake to define one large interface or trait that
tries to cover every bioinformatics object. Assay matrices, genomic
ranges, variant calls, and single-cell objects have different invariants
and different performance needs. If a consumer only needs
assay_matrix(), do not make it depend on feature metadata,
sample metadata, genome ranges, and delayed computation as well. Small
interfaces are easier to satisfy correctly and easier to test.
It would also be a mistake to claim that an interface proves biological correctness. Method availability does not prove that samples are comparable, that row ranges use the same genome build, or that an assay transform is appropriate for a downstream model. Those checks should remain explicit and domain-specific.
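A minimal sketch of what such an explicit check might look like for a count-based consumer follows, in base R only. The helper name check_count_matrix and the specific rules (no missing values, no negative values, whole numbers only) are assumptions chosen for illustration, not rules taken from any particular package.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
check_count_matrix <- function(mat) {
  if (!is.matrix(mat) || !is.numeric(mat)) {
    return("input must be a numeric matrix")
  }
  problems <- character()
  if (anyNA(mat)) {
    problems <- c(problems, "counts contain missing values")
  }
  if (any(mat < 0, na.rm = TRUE)) {
    problems <- c(problems, "counts contain negative values")
  }
  if (any(mat != round(mat), na.rm = TRUE)) {
    problems <- c(problems, "counts are not whole numbers")
  }
  problems
}

counts <- matrix(c(10, 0, 3, 4, 12, 8), nrow = 3)

check_count_matrix(counts)
#> character(0)
check_count_matrix(log1p(counts))
#> [1] "counts are not whole numbers"
```

Both matrices would be returned equally well by an assay lookup. Only the explicit check distinguishes raw counts from a transformed assay that a count-based model should not receive.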
The narrow conclusion is useful enough: interfaces can mimic a small
behavioral slice of a class such as SummarizedExperiment,
but they should not replace the class or its ecosystem.
References
- The Bioconductor SummarizedExperiment package: https://bioconductor.org/packages/SummarizedExperiment/.
- Huber et al. (2015), “Orchestrating high-throughput genomic analysis with Bioconductor”: https://bioconductor.org/help/publications/.
- The S7 package documentation: https://rconsortium.github.io/S7/.
- Chewxy, “How To Use Go Interfaces”: https://blog.chewxy.com/2018/03/18/golang-interfaces/.
- The s7contract interface and trait vignette in this package.