{% extends 'admin/base.html.twig' %}

{% block title %}Create eval case{% endblock %}

{% block body %}

    <div class="d-flex justify-content-between align-items-center mb-4 flex-wrap gap-2">
        <div>
            <h1 class="h3 mb-1">
                <i class="bi bi-journal-plus"></i> Create eval case
            </h1>
            <div class="small text-secondary">
                Create new regression cases on a separate page so the eval suite overview stays uncluttered.
            </div>
        </div>

        <a href="{{ path('admin_evals_index', {type: case_draft.type|default('retrieval')}) }}"
           class="btn btn-sm btn-outline-secondary">
            Back to eval suite
        </a>
    </div>

    {% for label in ['success', 'danger', 'warning', 'info'] %}
        {% for message in app.flashes(label) %}
            <div class="alert alert-{{ label }} shadow-sm">
                {{ message }}
            </div>
        {% endfor %}
    {% endfor %}

    {% if case_draft.source_label|default('') %}
        <div class="alert alert-info border-info bg-black text-light shadow-sm">
            <strong>Template loaded:</strong> {{ case_draft.source_label }}<br>
            <span class="small text-secondary">
                Check the case ID, prompt, and assertions before saving the case.
            </span>
        </div>
    {% endif %}

    <div class="row g-4">
        <div class="col-xl-8">
            <div class="card bg-black border-secondary text-light shadow-sm">
                <div class="card-body">
                    <h5 class="text-warning mb-3">
                        <i class="bi bi-pencil-square"></i> New eval case
                    </h5>

                    <form method="post" action="{{ path('admin_evals_case_create') }}">
                        <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_case_create') }}">

                        <div class="mb-3">
                            <label class="form-label">Eval type</label>
                            <select name="type" class="form-select bg-dark text-light border-secondary">
                                {% for type, label in types %}
                                    <option value="{{ type }}" {% if type == case_draft.type|default('retrieval') %}selected{% endif %}>
                                        {{ label }}
                                    </option>
                                {% endfor %}
                            </select>
                            <div class="form-text text-secondary">
                                The type determines which file the case is written to: <code>tests/evals/cases/&lt;type&gt;.ndjson</code>.
                            </div>
                        </div>

                        <div class="mb-3">
                            <label class="form-label">New case ID</label>
                            <input type="text"
                                   name="id"
                                   value="{{ case_draft.id|default('') }}"
                                   class="form-control bg-dark text-light border-secondary"
                                   placeholder="followup_testomat808_device_price_001"
                                   required>
                            <div class="form-text text-secondary">
                                Must be unique across all eval types. Allowed: letters, digits, <code>_</code>, and <code>-</code>.
                            </div>
                        </div>

                        <div class="mb-3">
                            <label class="form-label">Prompt</label>
                            <textarea name="prompt"
                                      rows="3"
                                      class="form-control bg-dark text-light border-secondary"
                                      placeholder="and what does the device itself cost"
                                      required>{{ case_draft.prompt|default('') }}</textarea>
                            <div class="form-text text-secondary">
                                The exact user prompt this case should guard. Keep typos exactly as written if they are part of the test.
                            </div>
                        </div>

                        <div class="mb-3">
                            <label class="form-label">Assert JSON</label>
                            <textarea name="assert_json"
                                      rows="9"
                                      class="form-control bg-dark text-light border-secondary font-monospace"
                                      spellcheck="false">{{ case_draft.assert_json|default('{}') }}</textarea>
                            <div class="form-text text-secondary">
                                Must be a valid JSON object. Example: <code>{"expected_query":"testomat 808"}</code>.
                            </div>
                        </div>

                        <div class="mb-3">
                            <label class="form-label">History JSON <span class="text-secondary">optional</span></label>
                            <textarea name="history_json"
                                      rows="8"
                                      class="form-control bg-dark text-light border-secondary font-monospace"
                                      spellcheck="false"
                                      placeholder='[{"prompt":"previous question","answer":"previous answer"}]'>{{ case_draft.history_json|default('') }}</textarea>
                            <div class="form-text text-secondary">
                                Recommended for follow-up cases. Must be a JSON list. Leave empty for direct prompts.
                            </div>
                        </div>

                        <div class="mb-4">
                            <label class="form-label">Request Context Hint <span class="text-secondary">optional</span></label>
                            <textarea name="request_context_hint"
                                      rows="3"
                                      class="form-control bg-dark text-light border-secondary"
                                      placeholder="Only for special cases where the history is not enough.">{{ case_draft.request_context_hint|default('') }}</textarea>
                            <div class="form-text text-secondary">
                                Usually leave this empty. For regular regressions, prefer History JSON.
                            </div>
                        </div>

                        <div class="d-flex flex-wrap gap-2">
                            <button type="submit" class="btn btn-warning">
                                <i class="bi bi-save"></i> Save eval case
                            </button>
                            <a href="{{ path('admin_evals_index', {type: case_draft.type|default('retrieval')}) }}"
                               class="btn btn-outline-secondary">
                                Cancel
                            </a>
                        </div>
                    </form>
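                    {#
                    Note for maintainers: the help texts in the form above state a few input
                    rules (allowed case ID characters, Assert JSON must be an object, History
                    JSON must be a list, one NDJSON file per type). The sketch below shows one
                    way such checks could look on the receiving side. It is an illustration
                    only: the controller behind 'admin_evals_case_create' is not part of this
                    template, the function names are hypothetical, the type list is assumed
                    from the field checklist in the sidebar, "letters" is assumed to mean
                    ASCII letters, and the uniqueness check across files is left out.

                    ```python
                    import json
                    import re
                    from pathlib import PurePosixPath

                    # Hypothetical names; assumes "letters" in the help text means ASCII letters.
                    CASE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
                    # Assumed from the sidebar checklist; the real list comes from the controller.
                    KNOWN_TYPES = ("retrieval", "shop_query", "followup", "answer_guard")


                    def case_file_for(eval_type):
                        """Return the NDJSON path that the help text names for a type."""
                        if eval_type not in KNOWN_TYPES:
                            raise ValueError(f"unknown eval type: {eval_type!r}")
                        return PurePosixPath("tests/evals/cases") / f"{eval_type}.ndjson"


                    def validate_case(case_id, assert_json, history_json=""):
                        """Collect error messages for the rules stated in the help texts."""
                        errors = []
                        if not CASE_ID_PATTERN.fullmatch(case_id):
                            errors.append("id: only letters, digits, _ and - are allowed")
                        try:
                            if not isinstance(json.loads(assert_json), dict):
                                errors.append("assert_json: must be a JSON object")
                        except json.JSONDecodeError:
                            errors.append("assert_json: invalid JSON")
                        if history_json.strip():  # empty means a direct prompt without history
                            try:
                                if not isinstance(json.loads(history_json), list):
                                    errors.append("history_json: must be a JSON list")
                            except json.JSONDecodeError:
                                errors.append("history_json: invalid JSON")
                        return errors
                    ```

                    Returning a list of messages instead of raising on the first problem would
                    let the form report every issue in one round trip, for example through the
                    flash messages rendered at the top of this page.
                    #}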
                </div>
            </div>
        </div>

        <div class="col-xl-4">
            <div class="card bg-black border-secondary text-light shadow-sm mb-4">
                <div class="card-body">
                    <h5 class="text-info mb-3">
                        <i class="bi bi-info-circle"></i> Field checklist
                    </h5>
                    <ul class="small text-secondary mb-0">
                        <li><strong class="text-light">retrieval</strong>: check for the correct document or chunks.</li>
                        <li><strong class="text-light">shop_query</strong>: check a direct shop query.</li>
                        <li><strong class="text-light">followup</strong>: check the prompt together with its history.</li>
                        <li><strong class="text-light">answer_guard</strong>: check no-answer or evidence cases.</li>
                    </ul>
                </div>
            </div>

            <div class="card bg-black border-secondary text-light shadow-sm mb-4">
                <div class="card-body">
                    <h5 class="text-info mb-3">
                        <i class="bi bi-braces"></i> Common assertions
                    </h5>
                    <div class="small text-secondary mb-2">Exact query:</div>
                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
    "expected_query": "testomat 808"
}</code></pre>

                    <div class="small text-secondary mb-2">Terms that must be included:</div>
                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
    "must_include_terms": [
        "testomat",
        "808"
    ]
}</code></pre>

                    <div class="small text-secondary mb-2">Document that must be included:</div>
                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
    "min_results": 1,
    "must_include_one_of_document_ids": [
        "DOCUMENT-ID"
    ]
}</code></pre>
                </div>
            </div>

            <div class="card bg-black border-secondary text-light shadow-sm">
                <div class="card-body">
                    <h5 class="text-info mb-3">
                        <i class="bi bi-lightbulb"></i> Recommendation
                    </h5>
                    <p class="small text-secondary mb-0">
                        A good eval case tests exactly one thing. Prefer several small cases over one large, brittle case.
                    </p>
                </div>
            </div>
        </div>
    </div>

{% endblock %}