Run Profile Configuration

A run profile defines how evaluation should be performed across task types and case groups.

Use this profile with:

vigilo run test --profile-file <profile.yaml> --dataset-file <dataset.yaml>

File Format

The CLI accepts YAML or JSON.

Recommended filename:

  • profile.yaml

Top-level shape:

profile_id: mixed_agent_release
profile_version: 1.0.0
description: Release-grade evaluation profile for mixed generative AI agent tasks.
defaults: {}
persistence: {}
agent:
  provider: llama.cpp
  name: qwen2.5-0.5b-instruct
  model: qwen2.5-0.5b-instruct-q4_k_m.gguf
  http:
    url: http://agent_vigilo_agent:8080/v1/chat/completions
  config:
    request_format: openai_compatible_chat_completions
case_groups: []

Top-Level Fields

  • profile_id (string): stable identifier for this profile.
  • profile_version (string): profile version, independent of evaluator versions.
  • description (string): human-readable profile summary.
  • defaults (object): run-level behavior defaults.
  • persistence (object): persistence policy controls.
  • agent (object): required worker-side target invocation configuration.
  • case_groups (array): per-task evaluator and aggregation definitions.

defaults

defaults:
  max_attempts: 2
  request_timeout_secs: 60
  fail_on_any_blocking_failure: true
  min_execution_score: 0.85

  • max_attempts (u32)
  • request_timeout_secs (u32)
  • fail_on_any_blocking_failure (bool)
  • min_execution_score (f64)
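
How these defaults interact can be sketched in a few lines. Note this is a hypothetical illustration of the usual reading (retry up to max_attempts, then gate on min_execution_score and blocking failures); the function and field access here are not part of the vigilo API.

```python
# Hypothetical sketch: how a runner might apply the defaults block.
# The helper name and result shape are illustrative only.

def run_with_defaults(invoke, defaults):
    """Retry up to max_attempts, then gate on min_execution_score."""
    last = None
    for attempt in range(defaults["max_attempts"]):
        last = invoke(attempt)
        if last["ok"]:
            break
    passed = last["score"] >= defaults["min_execution_score"]
    if defaults["fail_on_any_blocking_failure"] and last.get("blocking_failure"):
        passed = False
    return passed

defaults = {
    "max_attempts": 2,
    "request_timeout_secs": 60,
    "fail_on_any_blocking_failure": True,
    "min_execution_score": 0.85,
}

# First attempt fails, second succeeds with score 0.9 -> passes the gate.
attempts = [{"ok": False, "score": 0.0}, {"ok": True, "score": 0.9}]
print(run_with_defaults(lambda i: attempts[i], defaults))  # True
```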

persistence

persistence:
  mode: full
  persist_raw_outputs: failures_only
  persist_evaluator_evidence: true

  • mode (enum): full | summary
  • persist_raw_outputs (enum): all | failures_only | none
  • persist_evaluator_evidence (bool)
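
The persist_raw_outputs policy reduces to a simple decision. The sketch below assumes the natural reading of the three enum values; the function itself is illustrative, not a vigilo API.

```python
# Hypothetical sketch of the persist_raw_outputs policy.

def should_persist_raw_output(policy: str, case_failed: bool) -> bool:
    if policy == "all":
        return True
    if policy == "failures_only":
        return case_failed
    return False  # "none"

print(should_persist_raw_output("failures_only", case_failed=True))   # True
print(should_persist_raw_output("failures_only", case_failed=False))  # False
```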

agent

The agent block identifies the target under evaluation and provides the HTTP endpoint workers call before running evaluators.

agent:
  provider: llama.cpp
  name: qwen2.5-0.5b-instruct
  version: 1.0.0
  model: qwen2.5-0.5b-instruct-q4_k_m.gguf
  prompt_config_id: qwen-sentiment-json
  prompt_config_version: 1.0.0
  http:
    method: POST
    url: http://agent_vigilo_agent:8080/v1/chat/completions
    timeout_secs: 120
  config:
    request_format: openai_compatible_chat_completions
    temperature: 0.0
    max_tokens: 96
    response_format:
      type: json_object
    system_prompt: |
      Return only JSON with this exact shape: {"label":"<label>"}.

  • provider (string): provider or platform for the evaluated target.
  • name (string): logical agent name.
  • version (string, optional): deployment or release version.
  • model (string, optional): provider-specific model identifier.
  • prompt_config_id (string, optional): prompt/config identity persisted with the run.
  • prompt_config_version (string, optional): prompt/config version persisted with the run.
  • http (object): required worker invocation endpoint.
  • config (object, optional): unstructured agent configuration included in invocation metadata. Set request_format: openai_compatible_chat_completions for OpenAI-compatible servers such as llama.cpp.

By default, workers send a Vigilo case request containing run_id, execution_id, attempt_id, agent, input, and non-oracle case metadata. With request_format: openai_compatible_chat_completions, workers send an OpenAI-compatible model and messages payload.
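
With request_format: openai_compatible_chat_completions, the worker payload would resemble a standard chat-completions request. The exact fields vigilo forwards are an assumption here; the user message content is illustrative:

```json
{
  "model": "qwen2.5-0.5b-instruct-q4_k_m.gguf",
  "messages": [
    { "role": "system", "content": "Return only JSON with this exact shape: {\"label\":\"<label>\"}." },
    { "role": "user", "content": "I really enjoyed this product." }
  ],
  "temperature": 0.0,
  "max_tokens": 96,
  "response_format": { "type": "json_object" }
}
```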

The response may be either plain text or JSON. JSON responses can return the evaluator-ready output directly, or nested under an actual or output key:

{
  "actual": {
    "text": "This seems positive.",
    "structured": { "label": "positive" },
    "tool_calls": [],
    "trace": [],
    "metadata": { "latency_ms": 42 }
  }
}

case_groups

Each case group controls evaluator selection and aggregation for matching cases.

case_groups:
  - id: classification
    description: Evaluates classification-style cases.
    applies_to:
      task_type: classification
      tags_any: [safety]
      tags_all: []
    evaluators: []
    aggregation:
      dimensions: {}

applies_to

  • task_type (string, required)
  • tags_any (string[], optional)
  • tags_all (string[], optional)
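
The matching semantics implied by these fields can be sketched as follows. This assumes the conventional interpretation (exact task_type match; tags_any passes when empty or when at least one tag overlaps; tags_all requires every listed tag), which is an assumption about vigilo's behavior, not confirmed by this page:

```python
# Hypothetical sketch of applies_to matching semantics.

def case_matches(applies_to, case):
    if case["task_type"] != applies_to["task_type"]:
        return False
    tags = set(case.get("tags", []))
    tags_any = applies_to.get("tags_any") or []
    tags_all = applies_to.get("tags_all") or []
    if tags_any and not (tags & set(tags_any)):
        return False
    return set(tags_all) <= tags

applies_to = {"task_type": "classification", "tags_any": ["safety"], "tags_all": []}
print(case_matches(applies_to, {"task_type": "classification", "tags": ["safety", "smoke"]}))  # True
print(case_matches(applies_to, {"task_type": "classification", "tags": ["smoke"]}))            # False
```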

evaluators[]

evaluators:
  - ref: core/json-schema:1.0.0
    dimension: format
    blocking: true
    weight: 1.0
    config:
      schema:
        type: object

  • ref (string): fully qualified evaluator identifier <namespace>/<name>:<version>.
  • dimension (string)
  • blocking (bool)
  • weight (f64)
  • config (object, optional)
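
The ref format splits into three parts. A minimal parser for the documented <namespace>/<name>:<version> shape (the function and return tuple are illustrative, not a vigilo API):

```python
# Hypothetical parser for the <namespace>/<name>:<version> evaluator ref.

def parse_evaluator_ref(ref: str):
    path, _, version = ref.rpartition(":")
    namespace, _, name = path.partition("/")
    if not (namespace and name and version):
        raise ValueError(f"malformed evaluator ref: {ref!r}")
    return namespace, name, version

print(parse_evaluator_ref("core/json-schema:1.0.0"))  # ('core', 'json-schema', '1.0.0')
```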

aggregation.dimensions

aggregation:
  dimensions:
    format:
      method: min_score
      blocking: true
      weight: 0.0
    correctness:
      method: weighted_mean
      blocking: false
      weight: 1.0

  • method (enum): min_score | weighted_mean
  • blocking (bool)
  • weight (f64)
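
The two methods can be sketched numerically. This assumes min_score takes the minimum evaluator score in the dimension and weighted_mean averages scores by evaluator weight; the function is illustrative, not vigilo's implementation:

```python
# Hypothetical sketch of the two aggregation methods.
# Each entry is (score, weight) for one evaluator in the dimension.

def aggregate(method, results):
    if method == "min_score":
        return min(score for score, _ in results)
    if method == "weighted_mean":
        total = sum(weight for _, weight in results)
        return sum(score * weight for score, weight in results) / total
    raise ValueError(f"unknown method: {method}")

results = [(1.0, 1.0), (0.5, 3.0)]
print(aggregate("min_score", results))      # 0.5
print(aggregate("weighted_mean", results))  # 0.625
```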

Full Example

See the repository example profile:

  • example/profile.yaml
  • Dataset format: web/docs/configuration/dataset-format.mdx
  • Example project: example/README.md