Bioinformatics Containers as Interfaces

library(S7)
library(s7contract)

Introduction

Bioinformatics packages often exchange rich containers rather than plain matrices. Bioconductor’s SummarizedExperiment, for example, keeps assays, feature metadata, sample metadata, names, and validity rules synchronized. This vignette does not try to rebuild that class. It uses a small toy container to show what an interface can and cannot express.

The useful idea is an adapter boundary. A downstream function might not need a specific container class. It might only need to retrieve one assay matrix, or it might need assay names plus feature and sample names. That small behavior can be written as an S7 interface at the point where the downstream function consumes it.

Background

A class and an interface answer different design questions. A class describes representation and invariants: where the assays live, how metadata is stored, and what must be true after construction or subsetting. An interface describes behavior: which operations a consumer may call.

This distinction matters for bioinformatics. A behavioral interface can make small examples, tests, and adapters easier to write, but it does not replace a well-established interoperability class. It cannot enforce genome builds, biological interpretation, delayed computation, or the full set of conventions used by Bioconductor containers.

A toy assay container

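The example class stores a named list of assays plus row and column metadata. The validator checks only what the examples below rely on: at least one assay, matching assay dimensions, and one metadata row per feature and per sample.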

MiniSummarizedExperiment <- new_class(
  "MiniSummarizedExperiment",
  properties = list(
    assays = class_list,
    row_data = class_data.frame,
    col_data = class_data.frame
  ),
  validator = function(self) {
    if (length(self@assays) == 0) {
      return("@assays must contain at least one matrix")
    }

    dims <- lapply(self@assays, dim)
    if (any(vapply(dims, is.null, logical(1)))) {
      return("every assay must be matrix-like")
    }

    first_dim <- dims[[1]]
    same_dim <- vapply(dims, identical, logical(1), first_dim)
    if (!all(same_dim)) {
      return("all assays must have the same dimensions")
    }

    if (nrow(self@row_data) != first_dim[[1]]) {
      return("@row_data must have one row per assay feature")
    }
    if (nrow(self@col_data) != first_dim[[2]]) {
      return("@col_data must have one row per assay sample")
    }
  }
)

counts <- matrix(
  c(10, 0, 3, 4, 12, 8),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

mini <- MiniSummarizedExperiment(
  assays = list(counts = counts, logcounts = log1p(counts)),
  row_data = data.frame(gc = c(0.42, 0.51, 0.37), row.names = rownames(counts)),
  col_data = data.frame(condition = c("control", "treated"), row.names = colnames(counts))
)
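
To see a validator of this kind reject an inconsistent object, here is a minimal sketch. It assumes a reduced, hypothetical class named TinyExperiment (not the toy container above) so that the example runs on its own with only S7 loaded. The same pattern should apply to MiniSummarizedExperiment.

```r
library(S7)

# Hypothetical reduced container, defined only for this sketch.
TinyExperiment <- new_class(
  "TinyExperiment",
  properties = list(
    assays = class_list,
    row_data = class_data.frame
  ),
  validator = function(self) {
    if (nrow(self@row_data) != nrow(self@assays[[1]])) {
      return("@row_data must have one row per assay feature")
    }
  }
)

counts <- matrix(1:6, nrow = 3)

# Two metadata rows for three features: construction should fail,
# and the validator's message is captured instead of stopping the script.
result <- tryCatch(
  TinyExperiment(
    assays = list(counts = counts),
    row_data = data.frame(gc = c(0.4, 0.5))
  ),
  error = function(e) conditionMessage(e)
)
result
```

The captured message should contain the string returned by the validator, which makes this kind of failure easy to assert in a unit test.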

Operations and a consumer-owned interface

Adapters expose behavior through ordinary S7 generics. The methods below give the toy container four operations: assay names, feature names, sample names, and assay lookup. A consumer should still require only the operations it actually uses.

assay_names <- new_generic("assay_names", "x")
feature_names <- new_generic("feature_names", "x")
sample_names <- new_generic("sample_names", "x")
assay_matrix <- new_generic("assay_matrix", "x")

method(assay_names, MiniSummarizedExperiment) <- function(x) names(x@assays)
method(feature_names, MiniSummarizedExperiment) <- function(x) rownames(x@assays[[1]])
method(sample_names, MiniSummarizedExperiment) <- function(x) colnames(x@assays[[1]])
method(assay_matrix, MiniSummarizedExperiment) <- function(x, name = assay_names(x)[[1]]) {
  x@assays[[name]]
}

assay_names(mini)
#> [1] "counts"    "logcounts"
sample_names(mini)
#> [1] "sample1" "sample2"
assay_matrix(mini, "counts")[, "sample1"]
#> geneA geneB geneC 
#>    10     0     3

The interface belongs at the point of use. A library-size calculation does not need feature metadata or sample metadata; it only needs to retrieve one assay matrix. This mirrors the Go pattern func Takes(db Database) error: accept the small protocol the function needs, not a concrete database or a giant package interface.

LibrarySizeInput <- new_interface(
  "LibrarySizeInput",
  generics = list(assay_matrix = assay_matrix)
)

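library_size <- function(x, assay = "counts") {
  # Check that x satisfies the consumer-owned interface before using it.
  assert_implements(x, LibrarySizeInput)
  # Fetch the requested assay, then sum each sample (column).
  mat <- assay_matrix(x, assay)
  colSums(mat)
}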

implements(mini, LibrarySizeInput)
#> [1] TRUE
library_size(mini)
#> sample1 sample2 
#>      13      24
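
One way to picture a structural check is as a method lookup: does the object have a method for each generic the consumer needs? The sketch below is not how s7contract implements implements(). It is an illustration that uses only S7's own method() lookup, and the names describe, WithMethod, WithoutMethod, and has_method are hypothetical.

```r
library(S7)

# Hypothetical generic and classes, used only for this sketch.
describe <- new_generic("describe", "x")
WithMethod <- new_class("WithMethod")
WithoutMethod <- new_class("WithoutMethod")
method(describe, WithMethod) <- function(x) "described"

# TRUE when S7 can find a method of `generic` for the object `x`.
# method() signals an error when no method is registered, so the
# error is converted to NULL and then to FALSE.
has_method <- function(x, generic) {
  found <- tryCatch(method(generic, object = x), error = function(e) NULL)
  !is.null(found)
}

has_method(WithMethod(), describe)
#> [1] TRUE
has_method(WithoutMethod(), describe)
#> [1] FALSE
```

A check like this only says that a method exists. It says nothing about what the method returns, which is why the design cautions later in this vignette still apply.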

The payoff is testing. A unit test does not need to construct a realistic MiniSummarizedExperiment or a full Bioconductor object. It can provide a tiny mock that implements exactly the consumer-owned protocol.

MockAssays <- new_class("MockAssays", properties = list(assays = class_list))

method(assay_matrix, MockAssays) <- function(x, name = "counts") {
  x@assays[[name]]
}

mock_counts <- matrix(
  c(1, 2, 3, 4),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("sampleA", "sampleB"))
)
mock <- MockAssays(assays = list(counts = mock_counts))

implements(mock, LibrarySizeInput)
#> [1] TRUE
library_size(mock)
#> sampleA sampleB 
#>       3       7

This is where a consumer-owned interface pays off. A package can write against a small protocol, return an ordinary vector, and let separate adapters provide methods for concrete containers.
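
An adapter for a container the consumer does not own might look like the following sketch. It assumes a hypothetical legacy object, a plain list tagged with the S3 class "legacy_assays", and registers an S7 method for it through new_S3_class(). The generic is redefined here only so the example runs on its own.

```r
library(S7)

# Same shape as the generic defined earlier in this vignette.
assay_matrix <- new_generic("assay_matrix", "x")

# Hypothetical legacy container: a list with an S3 class attribute.
legacy <- structure(
  list(
    mats = list(
      counts = matrix(
        1:4,
        nrow = 2,
        dimnames = list(c("geneA", "geneB"), c("s1", "s2"))
      )
    )
  ),
  class = "legacy_assays"
)

# Adapter: register a method for the S3 class without modifying it.
LegacyAssays <- new_S3_class("legacy_assays")
method(assay_matrix, LegacyAssays) <- function(x, name = "counts") {
  x$mats[[name]]
}

# The consumer-side computation stays the same: column sums per sample.
legacy_sizes <- colSums(assay_matrix(legacy, "counts"))
legacy_sizes
```

The adapter lives outside both the consumer and the legacy container, so neither has to know about the other.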

When an explicit trait helps

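A trait is useful when structural compatibility is not enough: two containers can both provide assay_matrix() and still disagree about what the matrix means, for example whether rows are features or samples. Here the trait records an explicit, opt-in implementation and stores an associated constant describing assay orientation.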

ExperimentLike <- new_trait(
  "ExperimentLike",
  methods = list(
    assay_names = trait_method(assay_names),
    feature_names = trait_method(feature_names),
    sample_names = trait_method(sample_names),
    assay_matrix = trait_method(assay_matrix)
  ),
  assoc_consts = c("ASSAY_ORIENTATION")
)

impl_trait(
  ExperimentLike,
  MiniSummarizedExperiment,
  methods = list(
    assay_names = function(x) names(x@assays),
    feature_names = function(x) rownames(x@assays[[1]]),
    sample_names = function(x) colnames(x@assays[[1]]),
    assay_matrix = function(x, name = assay_names(x)[[1]]) x@assays[[name]]
  ),
  assoc_consts = list(ASSAY_ORIENTATION = "features_by_samples"),
  replace = TRUE
)

has_trait(mini, ExperimentLike)
#> [1] TRUE
trait_assoc_const(ExperimentLike, mini, "ASSAY_ORIENTATION")
#> [1] "features_by_samples"
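
A consumer could use an orientation label like this one to normalize matrices before computing on them. The sketch below uses base R only. The helper as_features_by_samples and the second label "samples_by_features" are assumptions made for illustration and are not defined by s7contract or by this vignette.

```r
# Hypothetical helper: return a matrix with features in rows, given an
# orientation label such as the ASSAY_ORIENTATION constant.
# "samples_by_features" is an assumed second value.
as_features_by_samples <- function(mat, orientation) {
  switch(
    orientation,
    features_by_samples = mat,
    samples_by_features = t(mat),
    stop("unknown assay orientation: ", orientation)
  )
}

mat <- matrix(
  1:6,
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)

# Both calls yield a 3 x 2 matrix with genes in rows.
dim(as_features_by_samples(mat, "features_by_samples"))
dim(as_features_by_samples(t(mat), "samples_by_features"))
```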

Design cautions

It would be a mistake to define one large interface or trait that tries to cover every bioinformatics object. Assay matrices, genomic ranges, variant calls, and single-cell objects have different invariants and different performance needs. If a consumer only needs assay_matrix(), do not make it depend on feature metadata, sample metadata, genome ranges, and delayed computation as well. Small interfaces are easier to satisfy correctly and easier to test.

It would also be a mistake to claim that an interface proves biological correctness. Method availability does not prove that samples are comparable, that row ranges use the same genome build, or that an assay transform is appropriate for a downstream model. Those checks should remain explicit and domain-specific.
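
An explicit, domain-specific check might look like the sketch below. The helper check_count_matrix is hypothetical, and the rules it applies (finite, non-negative, whole-number values) are one plausible definition of a raw count matrix rather than a standard. Named conditions in stopifnot() require R 4.0.0 or later.

```r
# Hypothetical helper: domain checks that method availability cannot express.
check_count_matrix <- function(mat) {
  stopifnot(
    "assay must be a numeric matrix" = is.matrix(mat) && is.numeric(mat),
    "counts must be finite" = all(is.finite(mat)),
    "counts must be non-negative" = all(mat >= 0),
    "counts must be whole numbers" = all(mat == round(mat))
  )
  invisible(mat)
}

counts <- matrix(c(10, 0, 3, 4, 12, 8), nrow = 3)

# Raw counts pass silently.
check_count_matrix(counts)

# A log-transformed assay has the same shape but fails the check.
transformed_ok <- tryCatch(
  {
    check_count_matrix(log1p(counts))
    TRUE
  },
  error = function(e) FALSE
)
transformed_ok
#> [1] FALSE
```

Both matrices would satisfy an interface that only asks for an assay matrix, and only the explicit check tells them apart.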

The narrow conclusion is useful enough: interfaces can mimic a small behavioral slice of a class such as SummarizedExperiment, but they should not replace the class or its ecosystem.

References

The Bioconductor SummarizedExperiment package: https://bioconductor.org/packages/SummarizedExperiment/.
Huber et al. (2015), “Orchestrating high-throughput genomic analysis with Bioconductor”, Nature Methods 12: https://bioconductor.org/help/publications/.
The S7 package documentation: https://rconsortium.github.io/S7/.
Chewxy, “How To Use Go Interfaces”: https://blog.chewxy.com/2018/03/18/golang-interfaces/.
The s7contract interface and trait vignette in this package.