p101
@@ -0,0 +1,44 @@
# RetrieX Patch p100d – Admin Eval Prompt Context

Status: patch-only follow-up for p100 Admin Eval UX.

## Goal

Make eval results easier to understand in the Admin UI by showing the actual case prompt directly next to the case id. For follow-up and shopquery cases, show a compact history/context preview as well.

## Changes

- Admin eval result table now displays the case prompt below the case id.
- Follow-up/shopquery eval details now include a compact history preview.
- Admin eval result table shows history/context in a collapsible section when available.

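A compact history preview of the kind described above could be built roughly as follows. This is a minimal Python sketch, not the PHP code in `src/Eval/ShopQueryEvalRunner.php`; the function name `compact_history_preview`, the `role`/`content` keys, and the truncation limits are assumptions made for illustration.

```python
from typing import Iterable, Mapping


def compact_history_preview(
    history: Iterable[Mapping[str, str]],
    max_turns: int = 3,
    max_chars: int = 80,
) -> str:
    """Return a one-line-per-turn preview of the last few history turns."""
    # Only the most recent turns are shown (assumed limit).
    turns = list(history)[-max_turns:]
    lines = []
    for turn in turns:
        role = turn.get("role", "?")
        # Collapse whitespace so multi-line messages stay on one preview line.
        text = " ".join(str(turn.get("content", "")).split())
        if len(text) > max_chars:
            # Truncate long messages and mark the cut with an ellipsis.
            text = text[: max_chars - 1].rstrip() + "…"
        lines.append(f"{role}: {text}")
    return "\n".join(lines)
```

The returned string is plain text, so a template could place it inside a collapsible element without further processing.
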
## Files changed

- `src/Eval/ShopQueryEvalRunner.php`
- `templates/admin/evals/index.html.twig`

## Non-goals

No production answer logic is changed:

- no retrieval logic changes
- no shopquery logic changes
- no follow-up logic changes
- no answer-guard logic changes
- no eval assertion changes
- no YAML or parameter changes
- no database migration

## Validation

Recommended after applying:

```bash
php bin/console mto:agent:config:validate
php bin/console mto:agent:eval:run retrieval
php bin/console mto:agent:eval:run shop_query
php bin/console mto:agent:eval:run followup
php bin/console mto:agent:eval:run answer_guard
```

Then open `/admin/evals/` and verify that each result row shows the case prompt and that follow-up/shopquery rows can reveal context/history.

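The console commands listed for validation can also be run in one go, stopping at the first failure. The following is a small Python sketch under the assumption that it is started from the project root; the helper `run_all` and its `runner` parameter are hypothetical and exist only to make the loop testable without PHP.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Commands copied from the validation list of this patch note.
COMMANDS: List[List[str]] = [
    ["php", "bin/console", "mto:agent:config:validate"],
    ["php", "bin/console", "mto:agent:eval:run", "retrieval"],
    ["php", "bin/console", "mto:agent:eval:run", "shop_query"],
    ["php", "bin/console", "mto:agent:eval:run", "followup"],
    ["php", "bin/console", "mto:agent:eval:run", "answer_guard"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_all(
    commands: Sequence[Sequence[str]] = COMMANDS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in commands:
        print("$", " ".join(cmd))
        code = runner(cmd)
        if code != 0:
            print(f"failed with exit code {code}", file=sys.stderr)
            return code
    return 0
```

Calling `sys.exit(run_all())` gives a shell-friendly exit status, so the script can serve as a pre-merge check.
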
@@ -0,0 +1,66 @@
# RetrieX Patch p101 - Admin Eval Case Creator

## Goal

p101 extends the existing Admin Eval Suite with a small case creator, so that new regression cases can be written to the matching NDJSON files directly from the admin UI.

The patch builds on the green p100/p100a/p100b/p100c/p100d state and does not change any production RAG, shopquery, follow-up, or answer logic.

## Changes

- New POST route in the admin:
  - `/admin/evals/case/create`
  - Route name: `admin_evals_case_create`
- `EvalAdminService::createCase()` for validated writing of new eval cases.
- New form on `/admin/evals/`:
  - Eval type
  - Case ID
  - Prompt
  - Assert JSON
  - optional history JSON
  - optional request context hint
- Button per report result:
  - `Als neuen Case vorbereiten` ("Prepare as new case")
  - copies the prompt, type, history preview, and query or document ID into the creator as a template.
- JSON and ID validation before writing.
- Duplicate guard across all eval types.

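The validation step before writing can be pictured as in the sketch below. It is written in Python for illustration and does not reproduce `EvalAdminService::createCase()`; the function name `build_case`, the ID pattern, and the record keys `id`, `prompt`, `assert`, and `history` are assumptions. The optional request context hint is left out. Only the four eval types are taken from this patch note.

```python
import json
import re
from typing import Any, Dict, Optional

# Eval types as named by the eval commands and case files in this patch.
EVAL_TYPES = ("retrieval", "shop_query", "followup", "answer_guard")
# Assumed ID rule: lowercase letters, digits, "_" and "-", 3 to 64 characters.
CASE_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]{2,63}")


def _parse_json(raw: str, label: str) -> Any:
    """Parse JSON and turn parser errors into a uniform ValueError."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{label} is not valid JSON: {exc}") from exc


def build_case(
    eval_type: str,
    case_id: str,
    prompt: str,
    assert_json: str,
    history_json: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate form input and return the case record; raise ValueError if invalid."""
    if eval_type not in EVAL_TYPES:
        raise ValueError(f"unknown eval type: {eval_type!r}")
    if not CASE_ID_PATTERN.fullmatch(case_id):
        raise ValueError(f"invalid case id: {case_id!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    assertions = _parse_json(assert_json, "assert JSON")
    if not isinstance(assertions, dict):
        raise ValueError("assert JSON must be an object")
    case: Dict[str, Any] = {"id": case_id, "prompt": prompt.strip(), "assert": assertions}
    # History is optional; when present it must be a JSON array of turns.
    if history_json and history_json.strip():
        history = _parse_json(history_json, "history JSON")
        if not isinstance(history, list):
            raise ValueError("history JSON must be an array")
        case["history"] = history
    return case
```

Raising a single exception type keeps the controller side simple: any `ValueError` can be shown as a form error, and nothing is written.
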
## Written files

New cases are appended to the following files:

- `tests/evals/cases/retrieval.ndjson`
- `tests/evals/cases/shop_query.ndjson`
- `tests/evals/cases/followup.ndjson`
- `tests/evals/cases/answer_guard.ndjson`

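Appending to these NDJSON files with a duplicate guard across all eval types might look like the following Python sketch. It is an illustration, not the patch's PHP code; the helper names `existing_case_ids` and `append_case` and the `id` key are assumptions, while the file names follow the list above.

```python
import json
from pathlib import Path
from typing import Any, Dict, Set

EVAL_TYPES = ("retrieval", "shop_query", "followup", "answer_guard")


def existing_case_ids(cases_dir: Path) -> Set[str]:
    """Collect the case IDs from every eval type's NDJSON file."""
    ids: Set[str] = set()
    for eval_type in EVAL_TYPES:
        path = cases_dir / f"{eval_type}.ndjson"
        if not path.exists():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if isinstance(record, dict) and "id" in record:
                ids.add(str(record["id"]))
    return ids


def append_case(cases_dir: Path, eval_type: str, case: Dict[str, Any]) -> Path:
    """Append one case as a single JSON line; reject IDs used by any eval type."""
    if eval_type not in EVAL_TYPES:
        raise ValueError(f"unknown eval type: {eval_type!r}")
    if str(case["id"]) in existing_case_ids(cases_dir):
        raise ValueError(f"duplicate case id: {case['id']!r}")
    path = cases_dir / f"{eval_type}.ndjson"
    # Start on a fresh line if the file does not already end with a newline.
    needs_newline = (
        path.exists()
        and path.stat().st_size > 0
        and not path.read_bytes().endswith(b"\n")
    )
    with path.open("a", encoding="utf-8") as fh:
        fh.write(("\n" if needs_newline else "") + json.dumps(case, ensure_ascii=False) + "\n")
    return path
```

Checking all four files rather than only the target file keeps case IDs unique across the whole suite, which matches the "across all eval types" wording of the duplicate guard.
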
## Safety / Scope

Not changed:

- no retrieval weights
- no shopquery logic
- no follow-up logic
- no answer-guard logic
- no prompt, YAML, or parameter changes
- no migration

## Manual check

```bash
php bin/console mto:agent:config:validate
php bin/console mto:agent:eval:run retrieval
php bin/console mto:agent:eval:run shop_query
php bin/console mto:agent:eval:run followup
php bin/console mto:agent:eval:run answer_guard
```

Additionally, in the admin:

1. Open `/admin/evals/`.
2. Run an eval.
3. On a result, click `Als neuen Case vorbereiten`.
4. Adjust or check the case ID.
5. Check the assert JSON.
6. Save.
7. Re-run the affected eval type.