Publish evaluator artifacts
Version WASM evaluators once, reference them from profiles, and keep scoring logic stable across local runs, CI, and production gates.
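One way to picture "version once, reference from profiles" is as an immutable registry plus a pinned reference. The sketch below is a minimal illustration under stated assumptions: the `Registry`, `EvaluatorRef`, `publish`, and `resolve` names are hypothetical and are not this product's API. Each published version is write-once, and a profile pins the evaluator by name, version, and SHA-256 digest. Local runs, CI, and production gates therefore resolve the exact same WASM bytes, or fail loudly.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluatorRef:
    """Hypothetical pinned reference a profile stores for one evaluator."""
    name: str
    version: str
    sha256: str  # content digest of the published WASM module


class Registry:
    """Hypothetical in-memory stand-in for an evaluator artifact store."""

    def __init__(self) -> None:
        self._artifacts: dict[tuple[str, str], bytes] = {}

    def publish(self, name: str, version: str, wasm: bytes) -> EvaluatorRef:
        key = (name, version)
        if key in self._artifacts:
            # Published versions are immutable, so scoring logic cannot drift.
            raise ValueError(f"{name}@{version} is already published")
        self._artifacts[key] = wasm
        return EvaluatorRef(name, version, hashlib.sha256(wasm).hexdigest())

    def resolve(self, ref: EvaluatorRef) -> bytes:
        wasm = self._artifacts[(ref.name, ref.version)]
        # Verify the digest so every environment runs byte-identical logic.
        if hashlib.sha256(wasm).hexdigest() != ref.sha256:
            raise ValueError(f"digest mismatch for {ref.name}@{ref.version}")
        return wasm
```

A profile would then hold a list of such references, for example `{"name": "release-gate", "evaluators": [ref]}`. Pinning the digest, not just the version string, is what makes a silently rebuilt artifact detectable.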
Distributed evaluation infrastructure
Distributed Evaluation and Deployment Gating for Generative AI Systems
Publish versioned WASM evaluators, aggregate their findings into a total score, and gate releases on explicit thresholds and blocking failures.
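A release gate that combines an aggregate score with blocking failures could be sketched as follows. The specifics are assumptions rather than the product's documented semantics: scores are normalized to [0, 1], the aggregate is a weighted mean, and any failed finding marked `blocking` vetoes the release regardless of the score. `Finding`, `GateResult`, and `gate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical per-evaluator result fed into the gate."""
    evaluator: str
    score: float            # assumed normalized to [0, 1]
    weight: float = 1.0
    blocking: bool = False  # a failed blocking check vetoes the release
    passed: bool = True


@dataclass(frozen=True)
class GateResult:
    aggregate: float
    passed: bool
    reasons: list[str]


def gate(findings: list[Finding], threshold: float) -> GateResult:
    """Weighted-mean aggregate plus a veto for any failed blocking finding."""
    total_weight = sum(f.weight for f in findings)
    if total_weight <= 0:
        return GateResult(0.0, False, ["no weighted findings"])
    aggregate = sum(f.score * f.weight for f in findings) / total_weight
    reasons = [f"blocking failure: {f.evaluator}"
               for f in findings if f.blocking and not f.passed]
    if aggregate < threshold:
        reasons.append(f"aggregate {aggregate:.3f} below threshold {threshold:.3f}")
    # The gate passes only when no reason to block was recorded.
    return GateResult(aggregate, not reasons, reasons)
```

Keeping blocking failures separate from the weighted score means a single critical regression cannot be averaged away by strong results elsewhere.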
Coordinators dispatch durable run chunks while workers call the target agent, execute evaluators, and persist normalized results.
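The coordinator/worker split can be illustrated with a single-process sketch. Everything here is an assumption for illustration: an in-memory `Store` stands in for durable storage, and `dispatch`, `work`, and `requeue_running` are hypothetical names. The coordinator persists chunks before any work begins. Workers write results keyed by chunk and case, so a retried chunk overwrites its earlier results instead of duplicating them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    chunk_id: str
    case_ids: list[str]
    status: str = "pending"  # pending -> running -> done
    attempts: int = 0


class Store:
    """Hypothetical durable store; plain dicts stand in for a database."""

    def __init__(self) -> None:
        self.chunks: dict[str, Chunk] = {}
        self.results: dict[tuple[str, str], dict] = {}


def dispatch(run_id: str, case_ids: list[str], chunk_size: int, store: Store) -> list[str]:
    """Coordinator: split a run into fixed-size chunks and persist them up front."""
    ids = []
    for start in range(0, len(case_ids), chunk_size):
        chunk_id = f"{run_id}-{start // chunk_size}"
        store.chunks[chunk_id] = Chunk(chunk_id, case_ids[start:start + chunk_size])
        ids.append(chunk_id)
    return ids


def work(store: Store,
         call_agent: Callable[[str], str],
         evaluate: Callable[[str, str], float]) -> int:
    """Worker: claim pending chunks, call the target agent, score, persist results."""
    processed = 0
    for chunk in store.chunks.values():
        if chunk.status != "pending":
            continue
        chunk.status = "running"
        chunk.attempts += 1
        for case_id in chunk.case_ids:
            output = call_agent(case_id)
            # Normalized result keyed by (chunk, case): re-runs are idempotent.
            store.results[(chunk.chunk_id, case_id)] = {
                "case_id": case_id,
                "output": output,
                "score": evaluate(case_id, output),
            }
        chunk.status = "done"
        processed += 1
    return processed


def requeue_running(store: Store) -> None:
    """Coordinator recovery: chunks a crashed worker left 'running' go back to 'pending'."""
    for chunk in store.chunks.values():
        if chunk.status == "running":
            chunk.status = "pending"
```

A real deployment would claim chunks atomically (for example with leases and timeouts) across many workers. This sketch only shows why persisting chunk state makes a run resumable.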
Watch pass/fail outcomes, inspect summaries, and export execution evidence for release decisions and debugging.
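Summaries and exported evidence might look like the following minimal sketch. It assumes each per-case result carries a `case_id` and a boolean `passed`. The `summarize` and `evidence_json` names and the JSON layout are hypothetical, not a documented export format.

```python
import json
from collections import Counter


def summarize(results: list[dict]) -> dict:
    """Roll per-case pass/fail outcomes into a run-level summary."""
    counts = Counter("pass" if r["passed"] else "fail" for r in results)
    return {
        "total": len(results),
        "passed": counts["pass"],
        "failed": counts["fail"],
        "failures": sorted(r["case_id"] for r in results if not r["passed"]),
    }


def evidence_json(run_id: str, results: list[dict]) -> str:
    """Serialize summary plus raw results as a stable, diffable evidence document."""
    doc = {"run_id": run_id, "summary": summarize(results), "results": results}
    return json.dumps(doc, indent=2, sort_keys=True)
```

Sorting keys and failure IDs keeps exports deterministic, so two evidence files for the same run compare cleanly during debugging.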