ML Inference with ONNX Runtime
Drop a trained neural network into your MADS network and let it run. The onnx_agent package ships two ready-to-use MADS agents that load any ONNX model and run inference on CPU or GPU: a filter that transforms JSON tensors on the wire, and a source that classifies a live webcam feed. This guide covers building the agents (including the GPU-runtime minefield), driving them with the bundled models, and retargeting them at a model of your own.
What’s in the box
The onnx_agent package gives you two MADS agents and one diagnostic tool, all built on top of ONNX Runtime:
| Executable | MADS role | What it does |
|---|---|---|
| mads-onnx-filter | filter | Subscribes to a topic, receives JSON-encoded tensors, runs an ONNX model, republishes the output tensors as JSON. Works with any model, no recompilation. |
| mads-onnx-source | source | Internally-timed agent that grabs frames from a webcam (via OpenCV), runs an image classifier, and publishes top-k predictions. |
| mads-onnx-inspect | (none) | Standalone utility: prints a model’s input/output schema and a ready-to-paste JSON template, without touching the MADS network. |
The split mirrors the two ways you usually want to use a model:
- As a filter — the data already lives on the MADS bus (sensor streams, feature vectors, time series). You want to transform it: feed tensors in, get tensors out. This is the generic, model-agnostic path.
- As a source — the model originates a new stream from hardware the rest of the network can’t see (here, a camera). This path bakes in image preprocessing and classification post-processing, so it’s specialised but turnkey.
In MADS parlance, an agent that produces data is a source, one that transforms it is a filter, and one that only consumes it is a sink. The onnx_agent ships a filter and a source. If you need a pure sink (e.g. classify-and-log with no republish), the source agent’s loop is the closest reference — see Retargeting the source.
Two fully-worked examples come bundled and tested:
- Chronos-2 time-series forecasting → mads-onnx-filter
- MobileNetV2-12 ImageNet classification → mads-onnx-source (classifies the sample cat as “tabby” with 74.9% confidence)
onnx_agent/
  src/
    onnx/      OnnxModel, TensorJson: core library (zero MADS dependency)
    source/    DataSource interface, CameraSource (OpenCV), Classify helper
    main/      onnx_filter.cpp, onnx_source.cpp, onnx_inspect.cpp
  examples/
    chronos/   Chronos-2 ONNX time-series forecast
    imagenet/  MobileNetV2-12 ImageNet classification
  mads.ini     Ready-made configuration for both agents
Getting the agents pre-compiled
Version 2.2.x of MADS provides a new command, mads package, which pulls and installs pre-compiled binaries of a selection of the most common, general-purpose MADS agents and plugins.
To install the ONNX agent, run:
mads package --info mads-onnx
mads package --install mads-onnx
The --info option lists hints on how to configure your system to run the models. Read them carefully, together with the rest of this guide.
Building the agents
Requirements
| Requirement | Notes |
|---|---|
| C++20 compiler | Clang recommended (LLVM style) |
| CMake ≥ 3.28 | Ninja backend preferred |
| MADS ≥ 2.2.0 | Discovered automatically via mads -p |
| ONNX Runtime 1.26.0 | Downloaded automatically by FetchContent |
| OpenCV 4.x | Optional — needed only for mads-onnx-source |
The two pieces you might expect to fight with — ONNX Runtime and OpenCV — mostly take care of themselves. ONNX Runtime is fetched for your platform at configure time (no system install). OpenCV is found with find_package; if it’s missing, the source agent is simply dropped from the build and the filter + inspect tools build anyway.
Quick start (developer build)
cmake -Bbuild -GNinja -DCMAKE_BUILD_TYPE=Release
cmake --build build -j6
MADS is located by running mads -p. Override with -DMADS_ROOT=/path/to/mads or the MADS_ROOT environment variable.
Smoke-test it immediately:
./build/src/main/mads-onnx-inspect -m examples/imagenet/mobilenetv2-12.onnx
On Linux, set export CC=clang CXX=clang++ before configuring. On Windows, run from a Developer Command Prompt for VS 2022 so MSVC and Ninja are on the path.
Self-contained package (for distribution)
To produce a zip you can unzip-and-run anywhere — ONNX Runtime library included:
cmake -Bbuild -GNinja -DCMAKE_BUILD_TYPE=Release \
-DMADS_INSTALL_AGENT=ON \
-DCMAKE_INSTALL_PREFIX="$PWD/package"
cmake --build build -j6
cmake --build build --target package
The result, build/onnx_agent-<version>-<os>-<arch>.zip, bundles the binaries plus the ONNX Runtime shared library, wired with @executable_path/../lib (macOS) / $ORIGIN/../lib (Linux) so it’s fully relocatable.
To install straight into the MADS prefix instead:
cmake -Bbuild -GNinja -DCMAKE_BUILD_TYPE=Release -DMADS_INSTALL_AGENT=ON
cmake --build build --target install # → $(mads -p)/bin and /lib
On macOS, downloaded binaries are quarantined. After installing, clear the flag once:
xattr -d com.apple.quarantine "$(mads -p)/bin/mads-onnx-filter"
xattr -d com.apple.quarantine "$(mads -p)/bin/mads-onnx-source"
xattr -d com.apple.quarantine "$(mads -p)/lib/libonnxruntime.1.dylib"
CMake options worth knowing
| Option | Default | Description |
|---|---|---|
| MADS_ROOT | auto | Path to the MADS installation prefix |
| MADS_INSTALL_AGENT | OFF | Install agents + ORT lib (required for the package target) |
| ORT_VERSION | 1.26.0 | ONNX Runtime prebuilt version to download |
| ONNX_CPU_ONLY | OFF | Fetch the small CPU-only ORT archive (disables GPU EPs, faster download) |
| VERSION | from git tag | Package version (override in CI without tags) |
The project version is read from the latest v*.*.* git tag (git tag v1.0.0), or overridden with -DVERSION=1.0.0.
The GPU runtime minefield
This is where ONNX deployments usually go sideways, so it gets its own section.
The one rule to remember
Inference runs on CPU by default. A GPU execution provider (EP) is opt-in, selected by execution_provider in mads.ini or --ep <name> on the CLI. If the requested EP isn’t available, the agent prints a yellow warning and falls back to CPU instead of hard-failing.
So a wrong --ep flag costs you speed, never a crash. The active provider (after any fallback) is printed in every startup banner and in mads-onnx-inspect, so you can always confirm what actually ran.
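The fallback rule can be pictured with a small Python sketch. This is not the agent's code: the function name resolve_provider and the preference order used for auto are assumptions made for illustration, and only the "warn, then use CPU" behaviour comes from the text above.

```python
from __future__ import annotations

# Hypothetical preference order for "auto"; the real agent's ordering
# is not documented here and may differ.
AUTO_PREFERENCE = ["cuda", "coreml", "directml", "webgpu"]


def resolve_provider(requested: str, available: set[str]) -> tuple[str, str | None]:
    """Return (active_provider, warning_or_None). Never raises on a bad name."""
    requested = requested.lower()
    if requested == "auto":
        # Take the first preferred EP that this runtime build offers.
        for name in AUTO_PREFERENCE:
            if name in available:
                return name, None
        return "cpu", None
    if requested == "cpu" or requested in available:
        return requested, None
    # Unknown or unavailable EP: warn and run on CPU instead of failing.
    warning = (
        f"execution provider '{requested}' is not available in this "
        "ONNX Runtime build; falling back to CPU."
    )
    return "cpu", warning


if __name__ == "__main__":
    # On a macOS-like build with only CoreML compiled in:
    print(resolve_provider("cuda", {"coreml"}))
    print(resolve_provider("auto", {"coreml"}))
```

The point of returning the active provider alongside the warning is the same as the startup banner: the caller can always report what actually ran.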
What’s available where
Each prebuilt package ships a single, GPU-capable ONNX Runtime — there’s no separate “GPU build” to download. What differs is which EP that runtime was compiled with:
| Platform | GPU EP | Status | How to enable |
|---|---|---|---|
| macOS arm64 | CoreML (ANE / GPU) | Built in ✅ | execution_provider = coreml (or auto) |
| Windows x64 | DirectML (any DX12 GPU) | Built in ✅ | execution_provider = directml |
| Linux x64 | CUDA (NVIDIA) | Sidecar 🔧 | install CUDA + run enable-cuda.sh |
| Linux aarch64 | — | CPU only | n/a |
macOS — CoreML, zero setup
The standard osx-arm64 ONNX Runtime already includes the CoreML and WebGPU EPs. CoreML routes work to the Apple Neural Engine / GPU via MLComputeUnitsAll. Nothing to install:
mads-onnx-source --ep coreml ... # ANE / GPU
mads-onnx-source --ep auto ... # auto-pick the best EP
Windows — DirectML, zero setup
The Windows package ships onnxruntime-win-x64-gpu, whose onnxruntime.dll has DirectML compiled in. DirectML runs on any DirectX 12 GPU (NVIDIA, AMD, Intel) with no extra drivers:
[agents]
execution_provider = "directml"
The ~285 MB CUDA sidecar DLLs are deliberately excluded to keep the download small; CPU and DirectML are always available.
Linux x64 — CUDA is opt-in
This is the fiddly one. The Linux x64 package ships the GPU-capable core library (libonnxruntime.so.1.26.0, built against CUDA 12) plus libonnxruntime_providers_shared.so. The core lib runs on CPU out of the box and dlopens the CUDA provider lazily, only when execution_provider = cuda is set — so it loads fine on hosts with no CUDA installed.
The heavyweight CUDA/TensorRT provider libraries (~215 MB) are not in the base package. To turn on CUDA inference:
1. Install NVIDIA CUDA Toolkit 12.x and cuDNN 9.x on the host.
2. Run the sidecar installer, which fetches libonnxruntime_providers_cuda.so into $(mads -p)/lib:
   chmod +x scripts/enable-cuda.sh
   ./scripts/enable-cuda.sh
3. Set execution_provider = "cuda" in the [agents] section.
If in doubt whether your core lib will load on a CUDA-less host, check with ldd libonnxruntime.so.1.26.0 — libcudart should be a lazy/weak dependency, not a hard one. This is the standard ORT design but worth verifying on unfamiliar hardware.
Execution providers and threads — the knobs
All three executables accept the same EP and threading options.
| EP value | Meaning |
|---|---|
| cpu | CPU only; always available, the default |
| coreml | Apple CoreML (ANE / GPU), macOS |
| directml | DirectX ML, Windows GPU |
| cuda | NVIDIA CUDA, Linux/Windows GPU |
| webgpu | WebGPU EP (experimental) |
| auto | Pick the best available EP on this platform |

| Thread setting | Default | Meaning |
|---|---|---|
| intra_op_threads | 0 | Parallelism within one operator. 0 = ORT default (all cores). |
| inter_op_threads | 0 | Parallelism across independent graph nodes. 0 = ORT default. |
Leave the thread counts at 0. The old hardcoded 1 was a pessimisation — letting ORT use all cores is almost always faster.
CLI flags (shared by all three binaries) override the INI:
--ep <name> Execution provider
--threads <n> Intra-op thread count
--inter-threads <n> Inter-op thread count
--gpu-device <idx> GPU device index (cuda / directml)
And the matching [agents] INI block (overridable per-agent in [onnx-filter] / [onnx-source]):
[agents]
execution_provider = "cpu" # cpu | coreml | webgpu | cuda | directml | auto
intra_op_threads = 0 # 0 = let ORT choose (recommended)
inter_op_threads = 0
gpu_device_id = 0 # cuda / directml only
A quick reality check on macOS, where CUDA/DirectML are genuinely absent:
mads-onnx-filter --test -m model.onnx --ep cuda
# [OnnxModel] Warning: execution provider 'cuda' is not available in this
# ONNX Runtime build; falling back to CPU.
Talking to a model: the tensor ↔︎ JSON contract
Both agents speak JSON on the MADS bus. Every tensor is one JSON object:
{
"shape": [2, 3],
"data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
"dtype": "float32"
}
- shape: integer array; dynamic (symbolic) dimensions appear as -1.
- data: flat array, row-major (C) order.
- dtype: optional. When omitted it’s inferred: JSON ints → int32, floats → float32, strings → string, booleans → bool. Explicit types: float32/64, int8/16/32/64, uint8/16/32/64, bool, string.
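As an illustration of the contract, here is a minimal Python sketch that turns a nested list into one tensor object, flattening it in row-major order. The helper names (infer_shape, flatten, make_tensor) are hypothetical and not part of the onnx_agent package; only the shape / data / dtype layout comes from the description above.

```python
import json
from math import prod


def infer_shape(nested):
    """Walk the first element of each nesting level to read off the shape."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape


def flatten(nested):
    """Depth-first flattening, which is row-major (C) order for nested lists."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out


def make_tensor(nested, dtype=None):
    """Build one {"shape", "data", "dtype"} object; dtype may be left out."""
    shape = infer_shape(nested)
    data = flatten(nested)
    if prod(shape) != len(data):
        raise ValueError(f"ragged input: shape {shape} vs {len(data)} values")
    tensor = {"shape": shape, "data": data}
    if dtype is not None:
        tensor["dtype"] = dtype
    return tensor


if __name__ == "__main__":
    print(json.dumps(make_tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], "float32")))
```

Checking that the element count matches the product of the shape catches ragged input before it reaches the wire.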
A message to the filter is a JSON object mapping tensor names to tensor objects:
{
"context": {"shape": [1, 192], "data": [...], "dtype": "float32"},
"attention_mask": {"shape": [1, 192], "data": [...], "dtype": "float32"}
}
Don’t guess the schema; ask the model:
mads-onnx-inspect -m examples/imagenet/mobilenetv2-12.onnx
Inputs:
input
dtype : float32
shape : [?, 3, 224, 224]
Outputs:
output
dtype : float32
shape : [?, 1000]
Example input JSON schema:
{
"input": { "shape": [1, 3, 224, 224], "data": [0.0], "dtype": "float32" }
}
? marks a dynamic dimension (often the batch size). The same output is available via mads-onnx-filter --inspect, or as machine-readable JSON with -j.
Using it as a filter
The filter is the workhorse: subscribe → run model → publish. Timing is input-driven — each received message triggers exactly one inference. Switching models needs no code change, only a different --model-path.
Configuration
[onnx-filter]
model_path = "/path/to/your/model.onnx" # required
pub_topic = "onnx-filter" # default
sub_topic = ["onnx-filter-input"] # default; note: array
Topic naming follows the standard MADS convention: the default sub topic is onnx-filter-input, the pub topic is onnx-filter. Passing --name foo rewires them to foo-input / foo.
Default model: Chronos-2 forecasting
The bundled example forecasts the chicken.csv time series with HuggingFace’s chronos-2-onnx model.
# 1. Download + patch the model (~456 MB)
pip3 install onnx
bash examples/chronos/download_model.sh
# 2. Inspect the schema
./build/src/main/mads-onnx-filter --inspect -m examples/chronos/model_fixed.onnx
# 3. Build the input JSON from the CSV
python3 examples/chronos/make_input.py # → chronos_input.json
# 4. Run offline, no broker needed
./build/src/main/mads-onnx-filter --test \
-m examples/chronos/model_fixed.onnx \
--input examples/chronos/chronos_input.json
Output:
{
"quantile_preds": {
"dtype": "float32",
"shape": [1, 21, 16],
"data": [17.91, 14.56, ..., 167.70] // 21 * 16 = 336 values
}
}
Output shape [1, 21, 16] = 1 batch × 21 quantile levels × 16 future steps. Quantile index 10 (q = 0.50) is the median forecast.
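Because data is a flat row-major array, pulling one quantile out of quantile_preds is plain index arithmetic. The sketch below assumes the [batch, quantile, step] layout stated above; the function name quantile_series is made up for illustration.

```python
def quantile_series(tensor, quantile_index, batch=0):
    """Slice one quantile's forecast out of a flat [batch, quantile, step] tensor."""
    n_batch, n_quantiles, n_steps = tensor["shape"]
    if not (0 <= batch < n_batch and 0 <= quantile_index < n_quantiles):
        raise IndexError("batch or quantile index out of range")
    # Row-major layout: each (batch, quantile) pair owns n_steps consecutive values.
    start = (batch * n_quantiles + quantile_index) * n_steps
    return tensor["data"][start:start + n_steps]


if __name__ == "__main__":
    # Synthetic stand-in with the same shape as the real output.
    fake = {"shape": [1, 21, 16], "data": list(range(21 * 16)), "dtype": "float32"}
    median = quantile_series(fake, 10)  # index 10 is the median, per the text
    print(len(median), median[0], median[-1])
```

With the real output, quantile_series(result["quantile_preds"], 10) would return the 16-step median forecast.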
- context_length must be a multiple of 16 (the patch size). Left-pad shorter series with zeros and set attention_mask = 0 for the padding; make_input.py does this.
- The released ONNX has a graph bug: a ConstantOfShape emits float32 values used as integer indices by a downstream Gather, which ORT rejects. fix_model.py inserts a Cast(to=INT64) and saves model_fixed.onnx.
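The padding rule from the first note can be sketched in a few lines of Python. This is an independent illustration, not the contents of make_input.py; pad_context and to_tensor_map are hypothetical names, and the sketch only builds the context and attention_mask inputs, not the model's other inputs.

```python
PATCH = 16  # patch size stated in the note above


def pad_context(series, patch=PATCH):
    """Left-pad to the next multiple of `patch`; return (context, attention_mask)."""
    pad = (-len(series)) % patch
    context = [0.0] * pad + [float(x) for x in series]
    # Mask is 0 over the padding and 1 over the real samples.
    mask = [0.0] * pad + [1.0] * len(series)
    return context, mask


def to_tensor_map(series):
    """Wrap the padded series in the JSON tensor format used on the bus."""
    context, mask = pad_context(series)
    n = len(context)
    return {
        "context": {"shape": [1, n], "data": context, "dtype": "float32"},
        "attention_mask": {"shape": [1, n], "data": mask, "dtype": "float32"},
    }


if __name__ == "__main__":
    tensors = to_tensor_map(range(190))
    print(tensors["context"]["shape"], sum(tensors["attention_mask"]["data"]))
```

A series of 190 samples gains two leading zeros and becomes a [1, 192] context, with the mask marking those two positions as padding.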
Going live
Point the filter at a model and let messages drive it:
./build/src/main/mads-onnx-filter -m examples/chronos/model_fixed.onnx
Now any publisher on onnx-filter-input that sends a tensor map matching the schema gets a forecast back on onnx-filter:
{
"context": {"shape": [1, 192], "data": [...], "dtype": "float32"},
"group_ids": {"shape": [1], "data": [0], "dtype": "int64"},
"attention_mask": {"shape": [1, 192], "data": [...], "dtype": "float32"},
"future_covariates": {"shape": [1, 16], "data": [...], "dtype": "float32"},
"num_output_patches": {"shape": [], "data": [1], "dtype": "int64"}
}
Using it as a source
The source agent is the turnkey one: it reads a camera, preprocesses each frame into a float32 NCHW tensor, runs a classifier, and publishes top-k predictions on a timer. It requires OpenCV at build time — without it, the binary isn’t built at all.
Default model: MobileNetV2 ImageNet classification
# 1. Download model + labels + a sample image
bash examples/imagenet/download_model.sh
# 2. Smoke-test on a still image — no broker, no camera
./build/src/main/mads-onnx-source --test \
-m examples/imagenet/mobilenetv2-12.onnx \
--image examples/imagenet/sample_cat.jpg \
--labels examples/imagenet/imagenet_classes.txt
{
"best": { "class_id": 281, "label": "tabby", "confidence": 0.7495 },
"top_k": [
{ "class_id": 281, "label": "tabby", "confidence": 0.7495 },
{ "class_id": 285, "label": "Egyptian cat", "confidence": 0.1142 },
{ "class_id": 282, "label": "tiger cat", "confidence": 0.1081 }
]
}
A plausible top-3 (tabby / Egyptian cat / tiger cat) confirms that the whole preprocessing pipeline (resize → BGR→RGB → [0,1] scaling → ImageNet mean/std normalisation → NCHW) is wired correctly.
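To make the pipeline concrete, here is a dependency-free Python sketch of the steps after resizing: channel swap, scaling, normalisation, and NCHW layout. It is not the CameraSource implementation (which uses OpenCV in C++); the preprocess function and its nested-list input format are assumptions for illustration, and the resize step is left out.

```python
MEAN = [0.485, 0.456, 0.406]   # ImageNet per-channel mean (R, G, B)
STD = [0.229, 0.224, 0.225]    # ImageNet per-channel std (R, G, B)
SCALE = 1.0 / 255.0


def preprocess(bgr_image, mean=MEAN, std=STD, scale=SCALE):
    """bgr_image: H x W list of [B, G, R] ints in 0..255, already resized.

    Returns one tensor object with shape [1, 3, H, W] in R, G, B plane order.
    """
    height = len(bgr_image)
    width = len(bgr_image[0])
    data = []
    for channel in range(3):       # output planes: R, then G, then B
        src = 2 - channel          # position of that colour inside a BGR pixel
        for row in bgr_image:
            for pixel in row:
                data.append((pixel[src] * scale - mean[channel]) / std[channel])
    return {"shape": [1, 3, height, width], "data": data, "dtype": "float32"}


if __name__ == "__main__":
    # A 1 x 2 image: one pure-red pixel and one pure-blue pixel, in BGR order.
    tensor = preprocess([[[0, 0, 255], [255, 0, 0]]])
    print(tensor["shape"], [round(v, 3) for v in tensor["data"]])
```

Emitting whole colour planes one after another is what makes the layout NCHW; interleaving the three values per pixel would give NHWC instead.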
Going live (webcam + broker)
The bundled mads.ini already has a complete [onnx-source] section:
[onnx-source]
model_path = "examples/imagenet/mobilenetv2-12.onnx"
labels_path = "examples/imagenet/imagenet_classes.txt"
pub_topic = "onnx-source"
period = 100 # ms between frames (~10 fps)
top_k = 5
camera = 0
capture_width = 640
capture_height = 480
# ── Image pre-processing (must match the model's training recipe) ──
input_width = 224
input_height = 224
nchw = true # NCHW (true) vs NHWC (false)
rgb = true # BGR→RGB before normalising
scale = 0.00392156862745098 # 1/255
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
input_tensor_name = "input"
./build/src/main/mads-onnx-source \
--settings mads.ini \
-m examples/imagenet/mobilenetv2-12.onnx \
--labels examples/imagenet/imagenet_classes.txt
Each tick publishes a { "best": …, "top_k": [...] } message on onnx-source. Loop period, camera index, capture resolution and top-k are all tunable from the INI or the CLI (--period, --camera, --width, --height, -k).
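On the consuming side, a downstream agent typically wants to ignore low-confidence frames. The Python sketch below parses one published payload and applies a threshold; confident_label is a hypothetical helper, the 0.5 threshold is arbitrary, and how the payload arrives from the MADS bus is outside the scope of this sketch.

```python
import json


def confident_label(payload, threshold=0.5):
    """Return the best label if its confidence clears the threshold, else None."""
    message = json.loads(payload)
    best = message["best"]
    if best["confidence"] >= threshold:
        return best["label"]
    return None


if __name__ == "__main__":
    sample = json.dumps({
        "best": {"class_id": 281, "label": "tabby", "confidence": 0.7495},
        "top_k": [
            {"class_id": 281, "label": "tabby", "confidence": 0.7495},
            {"class_id": 285, "label": "Egyptian cat", "confidence": 0.1142},
        ],
    })
    print(confident_label(sample))        # tabby
    print(confident_label(sample, 0.9))   # None
```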
Retargeting to your own model
This is the payoff. There are two very different difficulty levels depending on which agent you’re adapting.
Filter → a new model: trivial
Because the filter is fully model-agnostic, there is no code to write. The recipe:
1. Inspect the model to learn its input/output names, dtypes and shapes:
   mads-onnx-inspect -m your_model.onnx
2. Match the JSON. Make sure whatever publishes on onnx-filter-input emits a tensor map with exactly those input names, dtypes, and row-major data. The schema printed by --inspect is your template.
3. Point and run:
   [onnx-filter]
   model_path = "/path/to/your_model.onnx"
   mads-onnx-filter -m /path/to/your_model.onnx
That’s the whole job. The only real work is upstream — getting your data into the JSON tensor format (see the schema section). If a model has a graph quirk like Chronos’s, patch the .onnx offline once; the agent itself stays untouched.
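A publisher can check its own messages against the inspected schema before sending them. The Python sketch below is one way to do that; check_message is a hypothetical helper, and the schema dictionary is a hand-written stand-in whose layout is an assumption (the exact format of the inspector's machine-readable output is not shown in this guide). Dynamic dimensions are written as -1, as in the tensor contract.

```python
from math import prod


def check_message(message, schema):
    """Return a list of problems; an empty list means the message fits the schema."""
    problems = []
    for name in schema:
        if name not in message:
            problems.append(f"missing input '{name}'")
    for name, tensor in message.items():
        if name not in schema:
            problems.append(f"unexpected input '{name}'")
            continue
        expected = schema[name]
        shape = tensor.get("shape", [])
        # dtype is optional on the wire, so only compare it when present.
        if "dtype" in tensor and tensor["dtype"] != expected["dtype"]:
            problems.append(f"'{name}': dtype {tensor['dtype']} != {expected['dtype']}")
        same_rank = len(shape) == len(expected["shape"])
        if not same_rank or any(
            want != -1 and want != got for want, got in zip(expected["shape"], shape)
        ):
            problems.append(f"'{name}': shape {shape} does not fit {expected['shape']}")
        if prod(shape) != len(tensor.get("data", [])):
            problems.append(f"'{name}': data length does not match shape {shape}")
    return problems


if __name__ == "__main__":
    schema = {"input": {"dtype": "float32", "shape": [-1, 3, 224, 224]}}
    message = {"input": {"shape": [1, 3, 224, 224], "data": [0.0], "dtype": "float32"}}
    print(check_message(message, schema))  # flags the one-element data array
```

Run against the ready-to-paste template, the check flags the placeholder data array, which is a reminder that the template shows structure only and still needs real values.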
Need to feed a filter from another MADS agent? Have that agent build the tensor-map JSON and publish it on the filter’s sub topic. Need a sink that consumes the filter’s output? Point any logger/bridge/sink at the filter’s pub_topic.
Source → a new model: the scenic route
The source agent is harder to retarget because it carries domain logic the filter doesn’t: it knows how to turn camera frames into tensors (preprocessing) and how to turn logits into labelled predictions (post-processing). Three cases, in increasing order of effort.
Case 1 — another image classifier (config only)
If your model is also an ImageNet-style classifier — image in, class logits out — you don’t write code. Just retune the preprocessing constants to match its training recipe and swap the labels file:
[onnx-source]
model_path = "your_classifier.onnx"
labels_path = "your_labels.txt"
input_width = 224 # whatever your model expects
input_height = 224
nchw = true # or false for NHWC models
rgb = true
scale = 0.00392156862745098
mean = [...] # your model's per-channel mean
std = [...] # your model's per-channel std
input_tensor_name = "input" # from --inspect
The agent reads the input tensor name straight from the model metadata, applies softmax + top-k ranking generically, and labels the results from your file. Done.
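For readers who want to see what the generic softmax + top-k step amounts to, here is a small Python sketch that produces the same message shape as the agent's output. It is not the Classify module itself: rank_predictions is a hypothetical name, and details such as tie-breaking or how missing labels are handled are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_predictions(logits, labels, k=5):
    """Return {"best": ..., "top_k": [...]} for a non-empty list of class logits."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    entries = [
        {
            "class_id": i,
            # Assumption: fall back to the numeric id when the labels file is short.
            "label": labels[i] if i < len(labels) else str(i),
            "confidence": probs[i],
        }
        for i in order
    ]
    return {"best": entries[0], "top_k": entries}


if __name__ == "__main__":
    print(rank_predictions([1.0, 3.0, 2.0], ["cat", "dog", "bird"], k=2))
```

Because softmax preserves ordering, the ranking could be done on raw logits; the probabilities matter only for the reported confidence values.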
Case 2 — a different input modality (implement DataSource)
If the input isn’t a camera frame — a sensor array, an audio buffer, a file stream — implement the Onnx::DataSource interface. It is intentionally tiny (src/source/DataSource.hpp):
class DataSource {
public:
virtual bool open() = 0; // acquire resources
virtual void close() = 0; // release them
virtual std::optional<nlohmann::json> next() = 0; // produce one tensor map
};
next() returns a JSON tensor map (the same {name: {shape,data,dtype}} format) ready for inference, or std::nullopt to mean “no data yet / end of stream / recoverable error”. The bundled CameraSource is your reference implementation.
Implementation steps:
- Subclass DataSource in a new header/source pair under src/source/.
- open(): grab your hardware/file handle; return false on failure.
- next(): read one sample, shape it into a flat row-major array, and wrap it as { "<input_name>": { "shape": [...], "data": [...], "dtype": "..." } }. Match the names and dtypes that mads-onnx-inspect reports.
- close(): release everything; must be safe even if open() failed.
- Wire it into the loop. In a copy of onnx_source.cpp, replace the std::make_unique<Onnx::CameraSource>(...) construction with your source. The agent loop already calls next() → run_inference() → publish() on a timer; you’re swapping the front end only.
- Add it to CMake in src/source/CMakeLists.txt and (if it has no OpenCV dependency) consider dropping the HAVE_OPENCV guard so it builds without OpenCV.
Case 3 — a different output task (replace post-processing)
If the model doesn’t emit class logits — say it does detection, segmentation, regression, or forecasting — the softmax/top-k step in Classify no longer applies. Replace the post-processing:
- The raw model output is already available in the loop as a JSON tensor map (run_inference() returns {output_name: {shape,data,dtype}}).
- Drop the call to Onnx::classify(...) and substitute your own transform: read the output tensor(s) you care about, derive whatever payload makes sense (boxes, masks, a scalar, a forecast), and agent.publish(...) it.
- If the post-processing is reusable, mirror the Classify module’s shape (a small free function in src/source/ with a clean header) rather than inlining it in main.
If your new task is “transform tensors that are already on the bus” rather than “originate a stream from hardware”, you almost certainly want the filter path (see Filter → a new model: trivial) instead of forking the source. Only reach for the source agent when the data genuinely starts outside MADS.
Public API quick reference
Everything lives in namespace Onnx, headers under src/, all Doxygen-documented:
| Symbol | Header | Purpose |
|---|---|---|
| OnnxModel | onnx/OnnxModel.hpp | RAII model wrapper: load, inspect, run() |
| SessionConfig | onnx/OnnxModel.hpp | EP + thread settings; graceful CPU fallback |
| TensorInfo | onnx/OnnxModel.hpp | Tensor name / shape / dtype metadata |
| json_to_tensor() | onnx/TensorJson.hpp | JSON tensor object → Ort::Value |
| tensor_to_json() | onnx/TensorJson.hpp | Ort::Value → JSON tensor object |
| make_input_schema() | onnx/TensorJson.hpp | Generate an example input JSON from a model |
| DataSource | source/DataSource.hpp | Pure-virtual data-source interface |
| CameraSource | source/CameraSource.hpp | OpenCV webcam reference implementation |
| PreprocessConfig | source/CameraSource.hpp | Image preprocessing constants |
| classify() / top_k_predictions() | source/Classify.hpp | Softmax + top-k post-processing |
The onnx/ library has zero MADS dependency — you can reuse OnnxModel and the tensor↔︎JSON helpers in any C++20 project, not just MADS agents.