# Run Profile Configuration
A run profile defines how evaluation should be performed across task types and case groups.
Use this profile with:

```bash
vigilo run test --profile-file <profile.yaml> --dataset-file <dataset.yaml>
```
## File Format
The CLI accepts YAML or JSON. Recommended filename: `profile.yaml`.
Top-level shape:

```yaml
profile_id: mixed_agent_release
profile_version: 1.0.0
description: Release-grade evaluation profile for mixed generative AI agent tasks.
defaults: {}
persistence: {}
agent:
  provider: llama.cpp
  name: qwen2.5-0.5b-instruct
  model: qwen2.5-0.5b-instruct-q4_k_m.gguf
  http:
    url: http://agent_vigilo_agent:8080/v1/chat/completions
  config:
    request_format: openai_compatible_chat_completions
case_groups: []
```
## Top-Level Fields
- `profile_id` (string): stable identifier for this profile.
- `profile_version` (string): profile version, independent of evaluator versions.
- `description` (string): human-readable profile summary.
- `defaults` (object): run-level behavior defaults.
- `persistence` (object): persistence policy controls.
- `agent` (object): required worker-side target invocation configuration.
- `case_groups` (array): per-task evaluator and aggregation definitions.
### defaults
```yaml
defaults:
  max_attempts: 2
  request_timeout_secs: 60
  fail_on_any_blocking_failure: true
  min_execution_score: 0.85
```
- `max_attempts` (u32)
- `request_timeout_secs` (u32)
- `fail_on_any_blocking_failure` (bool)
- `min_execution_score` (f64)
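As an illustration only, the sketch below shows one plausible reading of how `max_attempts` and `request_timeout_secs` could govern a worker's invocation loop: each attempt is bounded by the timeout, and the call is retried until attempts are exhausted. The function and its retry-on-timeout policy are hypothetical, not Vigilo's actual implementation.

```python
# Hypothetical sketch: retry an agent call up to max_attempts times,
# each attempt bounded by request_timeout_secs. Not the Vigilo source.
def invoke_with_defaults(call_agent, max_attempts=2, request_timeout_secs=60):
    last_error = None
    for _ in range(max_attempts):
        try:
            return call_agent(timeout=request_timeout_secs)
        except TimeoutError as err:
            last_error = err  # retry on timeout until attempts run out
    raise last_error

# Example: an agent that times out once, then succeeds on the retry.
calls = {"n": 0}

def flaky_agent(timeout):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("agent did not answer in time")
    return {"label": "positive"}

result = invoke_with_defaults(flaky_agent)
```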
### persistence
```yaml
persistence:
  mode: full
  persist_raw_outputs: failures_only
  persist_evaluator_evidence: true
```
- `mode` (enum): `full` | `summary`
- `persist_raw_outputs` (enum): `all` | `failures_only` | `none`
- `persist_evaluator_evidence` (bool)
### agent
The agent block identifies the target under evaluation and provides the HTTP endpoint workers call before running evaluators.
```yaml
agent:
  provider: llama.cpp
  name: qwen2.5-0.5b-instruct
  version: 1.0.0
  model: qwen2.5-0.5b-instruct-q4_k_m.gguf
  prompt_config_id: qwen-sentiment-json
  prompt_config_version: 1.0.0
  http:
    method: POST
    url: http://agent_vigilo_agent:8080/v1/chat/completions
    timeout_secs: 120
  config:
    request_format: openai_compatible_chat_completions
    temperature: 0.0
    max_tokens: 96
    response_format:
      type: json_object
    system_prompt: |
      Return only JSON with this exact shape: {"label":"<label>"}.
```
- `provider` (string): provider or platform for the evaluated target.
- `name` (string): logical agent name.
- `version` (string, optional): deployment or release version.
- `model` (string, optional): provider-specific model identifier.
- `prompt_config_id` (string, optional): prompt/config identity persisted with the run.
- `prompt_config_version` (string, optional): prompt/config version persisted with the run.
- `http` (object): required worker invocation endpoint.
- `config` (object, optional): unstructured agent configuration included in invocation metadata. Set `request_format: openai_compatible_chat_completions` for OpenAI-compatible servers such as llama.cpp.
By default, workers send a Vigilo case request containing `run_id`, `execution_id`, `attempt_id`, `agent`, `input`, and non-oracle case metadata. With `request_format: openai_compatible_chat_completions`, workers instead send an OpenAI-compatible `model` and `messages` payload.
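To make the OpenAI-compatible shape concrete, here is a sketch of a chat-completions request body assembled from the `agent` block above. The exact fields Vigilo forwards are defined by the worker, not this sketch; `build_chat_payload` is a hypothetical helper.

```python
import json

# Values taken from the agent.config example above.
agent = {
    "model": "qwen2.5-0.5b-instruct-q4_k_m.gguf",
    "system_prompt": 'Return only JSON with this exact shape: {"label":"<label>"}.',
    "temperature": 0.0,
    "max_tokens": 96,
}

def build_chat_payload(agent, case_input):
    """Hypothetical sketch of an OpenAI-compatible chat-completions body."""
    return {
        "model": agent["model"],
        "messages": [
            {"role": "system", "content": agent["system_prompt"]},
            {"role": "user", "content": case_input},
        ],
        "temperature": agent["temperature"],
        "max_tokens": agent["max_tokens"],
        "response_format": {"type": "json_object"},
    }

payload = build_chat_payload(agent, "I loved this product!")
body = json.dumps(payload)  # what would be POSTed to http.url
```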
The response may be either plain text or JSON. JSON responses can return the evaluator-ready output directly, or nest it under `actual`/`output`:
```json
{
  "actual": {
    "text": "This seems positive.",
    "structured": { "label": "positive" },
    "tool_calls": [],
    "trace": [],
    "metadata": { "latency_ms": 42 }
  }
}
```
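Because a response may be plain text, evaluator-ready JSON, or JSON wrapped under `actual`/`output`, a consumer has to normalize all three shapes to one dict. A minimal sketch of that normalization (a hypothetical helper, not Vigilo's actual worker code):

```python
import json

def normalize_response(raw: str) -> dict:
    """Map plain text, top-level JSON, or actual/output-wrapped JSON
    to a single evaluator-ready dict. Hypothetical sketch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"text": raw}          # plain-text response
    if isinstance(data, dict):
        for key in ("actual", "output"):
            if key in data and isinstance(data[key], dict):
                return data[key]      # unwrap the nested payload
        return data                   # already evaluator-ready JSON
    return {"text": raw}              # any other JSON: keep the raw text

wrapped = '{"actual": {"structured": {"label": "positive"}}}'
plain = "This seems positive."
```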
### case_groups
Each case group controls evaluator selection and aggregation for matching cases.
```yaml
case_groups:
  - id: classification
    description: Evaluates classification-style cases.
    applies_to:
      task_type: classification
      tags_any: [safety]
      tags_all: []
    evaluators: []
    aggregation:
      dimensions: {}
```
#### applies_to
- `task_type` (string, required)
- `tags_any` (string[], optional)
- `tags_all` (string[], optional)
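The field names suggest the following matching rule: the case's task type must equal `task_type`, the case must carry at least one tag from `tags_any` when that list is non-empty, and it must carry every tag in `tags_all`. A sketch of that rule, under those assumed semantics:

```python
def case_matches(applies_to, case_task_type, case_tags):
    """Assumed matching semantics for applies_to; not the Vigilo source.
    task_type: exact match; tags_any: at least one overlap (if non-empty);
    tags_all: every listed tag must be present on the case."""
    tags = set(case_tags)
    if applies_to["task_type"] != case_task_type:
        return False
    tags_any = applies_to.get("tags_any", [])
    if tags_any and not tags.intersection(tags_any):
        return False
    return set(applies_to.get("tags_all", [])).issubset(tags)

rule = {"task_type": "classification", "tags_any": ["safety"], "tags_all": []}
```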
#### evaluators[]
```yaml
evaluators:
  - ref: core/json-schema:1.0.0
    dimension: format
    blocking: true
    weight: 1.0
    config:
      schema:
        type: object
```
- `ref` (string): fully qualified evaluator identifier `<namespace>/<name>:<version>`.
- `dimension` (string)
- `blocking` (bool)
- `weight` (f64)
- `config` (object, optional)
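The `<namespace>/<name>:<version>` ref splits on the first `/` and the last `:`. A small sketch of parsing it (`parse_evaluator_ref` is a hypothetical helper illustrating the identifier layout, not part of the Vigilo API):

```python
def parse_evaluator_ref(ref: str) -> dict:
    """Split a fully qualified ref of the form <namespace>/<name>:<version>."""
    path, _, version = ref.rpartition(":")   # version comes after the last ':'
    namespace, _, name = path.partition("/") # namespace before the first '/'
    if not (namespace and name and version):
        raise ValueError(f"malformed evaluator ref: {ref!r}")
    return {"namespace": namespace, "name": name, "version": version}

parsed = parse_evaluator_ref("core/json-schema:1.0.0")
```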
#### aggregation.dimensions
```yaml
aggregation:
  dimensions:
    format:
      method: min_score
      blocking: true
      weight: 0.0
    correctness:
      method: weighted_mean
      blocking: false
      weight: 1.0
```
- `method` (enum): `min_score` | `weighted_mean`
- `blocking` (bool)
- `weight` (f64)
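The two methods differ in how per-evaluator scores combine within a dimension: `min_score` takes the worst score (a natural fit for blocking format checks), while `weighted_mean` averages scores by evaluator weight. A sketch under those assumed semantics (not the Vigilo source):

```python
def aggregate_dimension(method, scores_and_weights):
    """Combine (score, weight) pairs for one dimension.
    Assumed semantics for min_score and weighted_mean."""
    if method == "min_score":
        return min(score for score, _ in scores_and_weights)
    if method == "weighted_mean":
        total_weight = sum(w for _, w in scores_and_weights)
        return sum(s * w for s, w in scores_and_weights) / total_weight
    raise ValueError(f"unknown aggregation method: {method}")

# Two evaluators in one dimension: scores 1.0 and 0.5, weights 1.0 and 3.0.
pairs = [(1.0, 1.0), (0.5, 3.0)]
```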
## Full Example
See the repository example profile: `example/profile.yaml`
## Related

- Dataset format: `web/docs/configuration/dataset-format.mdx`
- Example project: `example/README.md`