Demonstration kernel for the runtime dispatch template. count_nonzero()
counts the bytes in a raw vector that are not 0x00, using the currently
selected backend. The default backend is "auto", which selects the best
compiled backend that the current CPU/runtime supports.
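
The selection behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the package's actual implementation: the backend names ("bulk", "scalar"), the set_backend() helper, and the always-true support checks are all hypothetical stand-ins for compiled backends and real CPU feature detection.

```python
from typing import Callable, List, Tuple

def _count_scalar(data: bytes) -> int:
    # Reference backend: inspect one byte at a time.
    return sum(1 for b in data if b != 0)

def _count_bulk(data: bytes) -> int:
    # Stand-in for an optimized backend: count zero bytes in one call.
    return len(data) - data.count(b"\x00")

# Hypothetical registry, ordered from most to least preferred.
# Each entry: (name, support check, kernel). A real template would
# probe CPU features in the support check instead of returning True.
_BACKENDS: List[Tuple[str, Callable[[], bool], Callable[[bytes], int]]] = [
    ("bulk", lambda: True, _count_bulk),
    ("scalar", lambda: True, _count_scalar),
]

_selected = "auto"  # default: pick the best supported backend

def set_backend(name: str) -> None:
    """Select a backend by name, or "auto" (hypothetical helper)."""
    global _selected
    known = {n for n, _, _ in _BACKENDS}
    if name != "auto" and name not in known:
        raise ValueError(f"unknown backend: {name!r}")
    _selected = name

def _resolve() -> Callable[[bytes], int]:
    # "auto" takes the first supported entry; a named backend must
    # itself be supported on this machine.
    for name, supported, kernel in _BACKENDS:
        if _selected == "auto":
            if supported():
                return kernel
        elif _selected == name:
            if not supported():
                raise RuntimeError(f"backend {name!r} is not supported here")
            return kernel
    raise RuntimeError("no supported backend available")

def count_nonzero(data: bytes) -> int:
    """Count bytes that are not 0x00 using the selected backend."""
    return _resolve()(data)
```

Every backend must return the same count for the same input, so a caller can switch backends (for example with set_backend("scalar")) to cross-check an optimized path against the reference one.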