Run Profile Configuration

A run profile defines how evaluation should be performed across task types and case groups.

Use this profile with:

vigilo run test --profile-file <profile.yaml> --dataset-file <dataset.yaml>

File Format

The CLI accepts YAML or JSON.

Recommended filename:

  • profile.yaml

Top-level shape:

profile_id: mixed_agent_release
profile_version: 1.0.0
description: Release-grade evaluation profile for mixed generative AI agent tasks.
defaults: {}
persistence: {}
agent:
  provider: llama.cpp
  name: qwen2.5-0.5b-instruct
  model: qwen2.5-0.5b-instruct-q4_k_m.gguf
  http:
    url: http://agent_vigilo_agent:8080/v1/chat/completions
  config:
    request_format: openai_compatible_chat_completions
case_groups: []

Top-Level Fields

  • profile_id (string): stable identifier for this profile.
  • profile_version (string): profile version, independent of evaluator versions.
  • description (string): human-readable profile summary.
  • defaults (object): run-level behavior defaults.
  • persistence (object): persistence policy controls.
  • agent (object): required worker-side target invocation configuration.
  • case_groups (array): per-task evaluator and aggregation definitions.

defaults

defaults:
  max_attempts: 2
  request_timeout_secs: 60
  fail_on_any_blocking_failure: true
  min_execution_score: 0.85

  • max_attempts (u32)
  • request_timeout_secs (u32)
  • fail_on_any_blocking_failure (bool)
  • min_execution_score (f64)
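
How these defaults interact can be sketched in a few lines. Note this is a hypothetical illustration of the usual reading (retry up to max_attempts, then gate on min_execution_score and blocking failures); the function and field access here are not part of the vigilo API.

```python
# Hypothetical sketch: how a runner might apply the defaults block.
# The helper name and result shape are illustrative only.

def run_with_defaults(invoke, defaults):
    """Retry up to max_attempts, then gate on min_execution_score."""
    last = None
    for attempt in range(defaults["max_attempts"]):
        last = invoke(attempt)
        if last["ok"]:
            break
    passed = last["score"] >= defaults["min_execution_score"]
    if defaults["fail_on_any_blocking_failure"] and last.get("blocking_failure"):
        passed = False
    return passed

defaults = {
    "max_attempts": 2,
    "request_timeout_secs": 60,
    "fail_on_any_blocking_failure": True,
    "min_execution_score": 0.85,
}

# First attempt fails, second succeeds with score 0.9 -> passes the gate.
attempts = [{"ok": False, "score": 0.0}, {"ok": True, "score": 0.9}]
print(run_with_defaults(lambda i: attempts[i], defaults))  # True
```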

persistence

persistence:
  mode: full
  persist_raw_outputs: failures_only
  persist_evaluator_evidence: true

  • mode (enum): full | summary
  • persist_raw_outputs (enum): all | failures_only | none
  • persist_evaluator_evidence (bool)
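
The persist_raw_outputs policy reduces to a simple decision. The sketch below assumes the natural reading of the three enum values; the function itself is illustrative, not a vigilo API.

```python
# Hypothetical sketch of the persist_raw_outputs policy.

def should_persist_raw_output(policy: str, case_failed: bool) -> bool:
    if policy == "all":
        return True
    if policy == "failures_only":
        return case_failed
    return False  # "none"

print(should_persist_raw_output("failures_only", case_failed=True))   # True
print(should_persist_raw_output("failures_only", case_failed=False))  # False
```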

agent

The agent block identifies the target under evaluation and provides the HTTP endpoint workers call before running evaluators.

agent:
  provider: llama.cpp
  name: qwen2.5-0.5b-instruct
  version: 1.0.0
  model: qwen2.5-0.5b-instruct-q4_k_m.gguf
  prompt_config_id: qwen-sentiment-json
  prompt_config_version: 1.0.0
  http:
    method: POST
    url: http://agent_vigilo_agent:8080/v1/chat/completions
    timeout_secs: 120
  config:
    request_format: openai_compatible_chat_completions
    temperature: 0.0
    max_tokens: 96
    response_format:
      type: json_object
    system_prompt: |
      Return only JSON with this exact shape: {"label":"<label>"}.

  • provider (string): provider or platform for the evaluated target.
  • name (string): logical agent name.
  • version (string, optional): deployment or release version.
  • model (string, optional): provider-specific model identifier.
  • prompt_config_id (string, optional): prompt/config identity persisted with the run.
  • prompt_config_version (string, optional): prompt/config version persisted with the run.
  • http (object): required worker invocation endpoint.
  • config (object, optional): unstructured agent configuration included in invocation metadata. Set request_format: openai_compatible_chat_completions for OpenAI-compatible servers such as llama.cpp.

By default, workers send a Vigilo case request containing run_id, execution_id, attempt_id, agent, input, and non-oracle case metadata. With request_format: openai_compatible_chat_completions, workers send an OpenAI-compatible model and messages payload.
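
With request_format: openai_compatible_chat_completions, the worker payload would resemble a standard chat-completions request. The exact fields vigilo forwards are an assumption here; the user message content is illustrative:

```json
{
  "model": "qwen2.5-0.5b-instruct-q4_k_m.gguf",
  "messages": [
    { "role": "system", "content": "Return only JSON with this exact shape: {\"label\":\"<label>\"}." },
    { "role": "user", "content": "I really enjoyed this product." }
  ],
  "temperature": 0.0,
  "max_tokens": 96,
  "response_format": { "type": "json_object" }
}
```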

The response may be either plain text or JSON. JSON responses can return the evaluator-ready output directly, or nested under an actual or output key:

{
  "actual": {
    "text": "This seems positive.",
    "structured": { "label": "positive" },
    "tool_calls": [],
    "trace": [],
    "metadata": { "latency_ms": 42 }
  }
}

case_groups

Each case group controls evaluator selection and aggregation for matching cases.

case_groups:
  - id: classification
    description: Evaluates classification-style cases.
    applies_to:
      task_type: classification
      tags_any: [safety]
      tags_all: []
    evaluators: []
    aggregation:
      dimensions: {}

applies_to

  • task_type (string, required)
  • tags_any (string[], optional)
  • tags_all (string[], optional)
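
The matching semantics implied by these fields can be sketched as follows. This assumes the conventional interpretation (exact task_type match; tags_any passes when empty or when at least one tag overlaps; tags_all requires every listed tag), which is an assumption about vigilo's behavior, not confirmed by this page:

```python
# Hypothetical sketch of applies_to matching semantics.

def case_matches(applies_to, case):
    if case["task_type"] != applies_to["task_type"]:
        return False
    tags = set(case.get("tags", []))
    tags_any = applies_to.get("tags_any") or []
    tags_all = applies_to.get("tags_all") or []
    if tags_any and not (tags & set(tags_any)):
        return False
    return set(tags_all) <= tags

applies_to = {"task_type": "classification", "tags_any": ["safety"], "tags_all": []}
print(case_matches(applies_to, {"task_type": "classification", "tags": ["safety", "smoke"]}))  # True
print(case_matches(applies_to, {"task_type": "classification", "tags": ["smoke"]}))            # False
```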

evaluators[]

evaluators:
  - ref: core/json-schema:1.0.0
    dimension: format
    blocking: true
    weight: 1.0
    config:
      schema:
        type: object

  • ref (string): fully qualified evaluator identifier <namespace>/<name>:<version>.
  • dimension (string)
  • blocking (bool)
  • weight (f64)
  • config (object, optional)
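
The ref format splits into three parts. A minimal parser for the documented <namespace>/<name>:<version> shape (the function and return tuple are illustrative, not a vigilo API):

```python
# Hypothetical parser for the <namespace>/<name>:<version> evaluator ref.

def parse_evaluator_ref(ref: str):
    path, _, version = ref.rpartition(":")
    namespace, _, name = path.partition("/")
    if not (namespace and name and version):
        raise ValueError(f"malformed evaluator ref: {ref!r}")
    return namespace, name, version

print(parse_evaluator_ref("core/json-schema:1.0.0"))  # ('core', 'json-schema', '1.0.0')
```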

aggregation.dimensions

aggregation:
  dimensions:
    format:
      method: min_score
      blocking: true
      weight: 0.0
    correctness:
      method: weighted_mean
      blocking: false
      weight: 1.0

  • method (enum): min_score | weighted_mean
  • blocking (bool)
  • weight (f64)
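
The two methods can be sketched numerically. This assumes min_score takes the minimum evaluator score in the dimension and weighted_mean averages scores by evaluator weight; the function is illustrative, not vigilo's implementation:

```python
# Hypothetical sketch of the two aggregation methods.
# Each entry is (score, weight) for one evaluator in the dimension.

def aggregate(method, results):
    if method == "min_score":
        return min(score for score, _ in results)
    if method == "weighted_mean":
        total = sum(weight for _, weight in results)
        return sum(score * weight for score, weight in results) / total
    raise ValueError(f"unknown method: {method}")

results = [(1.0, 1.0), (0.5, 3.0)]
print(aggregate("min_score", results))      # 0.5
print(aggregate("weighted_mean", results))  # 0.625
```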

Full Example

See the repository example profile:

  • example/profile.yaml
  • Dataset format: web/docs/configuration/dataset-format.mdx
  • Example project: example/README.md