{% extends 'admin/base.html.twig' %} {% block title %}RetrieX Eval Suite{% endblock %} {% block body %}

RetrieX Eval Suite

Check regressions for Retrieval, Shopquery, Follow-up, and Answer-Guard directly in the admin.
Back to AI/LLM setup
{% for label in ['success', 'danger', 'warning', 'info'] %} {% for message in app.flashes(label) %}
{{ message }}
{% endfor %} {% endfor %}
Eval running ...
The regression tests are running. Please do not reload the page.
{% for item in overview %} {% set report = item.report %} {% set status = item.status %} {% set badgeClass = status == 'green' ? 'bg-success' : (status == 'red' ? 'bg-danger' : 'bg-secondary') %}
{{ item.label }}
{% if status == 'green' %} green {% elseif status == 'red' %} red {% elseif status == 'empty' %} empty {% else %} not run {% endif %}
{{ item.case_count }} Cases
{% if report %}
Total: {{ report.total|default(0) }}
Passed: {{ report.passed|default(0) }}
Failed: {{ report.failed|default(0) }}
{{ report.generated_at|default('') }}
{% else %}
No admin report exists for this type yet.
{% endif %}
{% endfor %}
Run eval
Without a case ID, the entire type is run.
The case list is filtered to match the eval type. Leave it empty to run all cases of the type.
CLI reference

Admin runs write type-specific reports to tests/evals/reports/<type>-last-run.json and also update the existing last-run.json.

{% for type, label in types %}
{{ label }}
php bin/console mto:agent:eval:run {{ type }}
{% endfor %}
{% if last_report %}
Last generic report: {{ last_report.type|default('unknown') }}, {{ last_report.passed|default(0) }}/{{ last_report.total|default(0) }} passed, {{ last_report.generated_at|default('') }}
{% endif %}
Report details: {{ types[selected_type]|default(selected_type) }}
{% for type, label in types %} {{ label }} {% endfor %}
{% if selected_report %} {% set selectedFailed = selected_report.failed|default(0) %}
Total
{{ selected_report.total|default(0) }}
Passed
{{ selected_report.passed|default(0) }}
Failed
{{ selectedFailed }}
Generated
{{ selected_report.generated_at|default('') }}
Status Case Duration Failures / Details
{% for result in selected_report.results|default([]) %}
{% if result.passed|default(false) %} PASS {% else %} FAIL {% endif %} {{ result.case_id|default('') }}
{{ result.type|default('') }}
{{ result.duration_ms|default(0) }} ms {% if result.failures|default([]) is not empty %}
{% for failure in result.failures %}
  • {{ failure }}
{% endfor %}
{% else %}
No failures.
{% endif %} {% set documentRefs = result.details.document_refs|default([]) %} {% if documentRefs is not empty %}
Documents found
Ranks Title / File Doc ID Chunks
{% for doc in documentRefs %}
{{ doc.ranks|default([])|join(', ') }}
{{ doc.title|default('Untitled') }}
{% if doc.file_path|default('') %}
{{ doc.file_path }}
{% endif %} {% if doc.version_number|default('') %}
Version: {{ doc.version_number }}
{% endif %}
{{ doc.id|default('') }} {% for chunkId in doc.chunk_ids|default([]) %} {{ chunkId }}{% if not loop.last %}
{% endif %} {% endfor %}
{% endfor %}
{% endif %} {% set resultRows = result.details.result_rows|default([]) %} {% if resultRows is not empty %}
Show hits / chunks
Rank Title / File Chunk Preview
{% for row in resultRows %}
{{ row.rank|default('') }}
{{ row.document_title|default('Untitled') }}
{% if row.file_path|default('') %}
{{ row.file_path }}
{% endif %}
Doc ID: {{ row.document_id|default('') }}
{{ row.chunk_id|default('') }} {% if row.chunk_index is defined and row.chunk_index is not same as(null) %}
Index: {{ row.chunk_index }}
{% endif %}
{{ row.text_preview|default('') }}
{% endfor %}
{% endif %}
Show JSON details
{{ result.details|default({})|json_encode(constant('JSON_PRETTY_PRINT')) }}
{% else %}
This report contains no results.
{% endfor %}
{% else %}
No type-specific admin report exists yet for {{ types[selected_type]|default(selected_type) }}. Start the eval above or via the CLI.
{% endif %}
{% endblock %}