p101d

p101b
p101a
2026-05-12 11:53:36 +02:00 · 2026-05-12 11:26:05 +02:00 · 2026-05-12 11:08:34 +02:00 · 2026-05-12 10:56:50 +02:00 · 2026-05-12 09:16:09 +02:00 · 2026-05-12 08:57:57 +02:00
40 changed files with 4290 additions and 83 deletions
--- a/CONFIG_PARAMS.md
+++ b/CONFIG_PARAMS.md
@@ -311,6 +311,7 @@ Wichtig: `genre.yaml` ist in v1.6.0 eine zentrale Entlastung des PHP-Cores. Doma
 | `min_chunk_distance` | Minimum distance between selected chunks. |
 | `dominant_doc_*` | Preference for dominant documents when the hit pattern is clear. |
 | `exact_document_max_chunks` | Maximum number of chunks for an exact document focus. |
 | `query_cleanup_profile` | YAML cleanup profile for the generic retrieval query cleanup. |
 | `focused_product_*` | Focused product selection in retrieval. |
 | `catalog_list_shortcut_patterns` | Direct catalog/list routes. |
 | `exact_selection_*` | Precision logic for tables, indicators, limit values, and measuring ranges. |
--- a/RETRIEX-EVAL-CASE-HOWTO.md
+++ b/RETRIEX-EVAL-CASE-HOWTO.md
@@ -0,0 +1,731 @@
 # RetrieX How-to: Creating new eval cases correctly
 This how-to describes how to create new regression tests for the RetrieX eval suite via the admin area.
 The goal is to permanently safeguard new red or domain-critical cases without directly changing core logic, retrieval rules, or shop-query heuristics.
 ## Getting started
 Admin path:
 ```text
 /admin/evals/
 ```
 In the **„Eval-Case erstellen“** ("Create eval case") section, new cases can be created for the following types:
 ```text
 retrieval
 shop_query
 followup
 answer_guard
 ```
 After saving, the case is written to the matching file:
 ```text
 tests/evals/cases/retrieval.ndjson
 tests/evals/cases/shop_query.ndjson
 tests/evals/cases/followup.ndjson
 tests/evals/cases/answer_guard.ndjson
 ```
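NDJSON stores one JSON object per line, so adding a case amounts to appending one more line. The following Python sketch illustrates that idea only: RetrieX itself is a PHP application, and the record keys `id`, `type`, `prompt`, and `assert` are assumptions made for this example, not the actual `EvalCase` schema.

```python
import json
from pathlib import Path


def append_case(path: Path, case: dict) -> None:
    """Append one case as a single JSON line (NDJSON: one object per line)."""
    # json.dumps without indent never emits raw newlines, so the record stays on one line.
    line = json.dumps(case, ensure_ascii=False)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def load_cases(path: Path) -> list[dict]:
    """Read all cases back, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because every record is self-contained, a single case can be added or removed without re-serializing the rest of the file.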
 ---
 ## Basic rule
 A good eval case checks exactly **one clear fact**.
 Good:
 ```json
 {
  "expected_query": "testomat 808",
  "must_not_include_terms": [
    "indikator",
    "300"
  ]
 }
 ```
 Less good:
 ```json
 {
  "expected_query": "testomat 808",
  "must_include_terms": [
    "testomat",
    "808",
    "gerät",
    "preis",
    "wasserhärte"
  ],
  "must_not_include_terms": [
    "indikator",
    "300",
    "testomat 2000",
    "chlor",
    "versand"
  ]
 }
 ```
 The smaller and more unambiguous the case, the better it works as a regression test.
 ---
 # Fields in the admin
 ## 1. Eval type
 Choose the type that matches the goal of the test.
 ```text
 retrieval      → checks whether the right RAG documents/chunks are found
 shop_query     → checks which shop query results from a direct prompt
 followup       → checks which shop query results from prompt + chat history
 answer_guard   → checks no-answer, non-hallucination, or evidence cases
 ```
 Rule of thumb:
 ```text
 Is the right document found?                        → retrieval
 Is the right shop query generated?                  → shop_query
 Does RetrieX understand the follow-up in context?   → followup
 Does RetrieX invent nothing when evidence is weak?  → answer_guard
 ```
 ---
 ## 2. New case ID
 The case ID must be unique and may contain only the following characters:
 ```text
 letters
 digits
 _
 -
 ```
 Good examples:
 ```text
 retrieval_semantic_chlor_clt_001
 shop_query_indicator_300_exact_002
 followup_main_device_price_002
 answer_guard_unknown_medium_001
 ```
 Do not use:
 ```text
 Test 1
 shop query indikator 300
 gerät/frage/neue-version
 ```
 Recommended scheme:
 ```text
 <type>_<topic>_<goal>_<number>
 ```
 Example:
 ```text
 followup_testomat808_device_price_001
 ```
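The character rule for case IDs can be checked with a single regular expression. This is a minimal Python sketch, not the project's PHP validation; it assumes that "letters" means ASCII letters, which the how-to does not state explicitly.

```python
import re

# Assumption: only ASCII letters count as "letters"; umlauts are rejected here.
CASE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_case_id(case_id: str) -> bool:
    """Return True if the ID uses only letters, digits, underscores, and hyphens."""
    return CASE_ID_PATTERN.fullmatch(case_id) is not None
```

With this check, `followup_testomat808_device_price_001` is accepted, while `Test 1` (space) and `gerät/frage/neue-version` (slash and umlaut) are rejected.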
 ---
 ## 3. Prompt
 Enter exactly the user prompt that is to be tested here.
 Examples:
 ```text
 welches geraet ist fuer chlorueberwachung gedacht
 ```
 ```text
 was kostet der indikator
 ```
 ```text
 und was kostet das gerät selber
 ```
 ```text
 welcher testomat misst drachenblut
 ```
 Enter the prompt as closely as possible to how it really occurs in the chat. Typos may be included deliberately when exactly that behavior is to be safeguarded.
 ---
 ## 4. Assert-JSON
 The Assert-JSON describes what the test should check.
 The field must always be a valid JSON object:
 ```json
 {
 }
 ```
 Important:
 - No comments in the JSON
 - No trailing commas
 - Use double quotes
 - The field must be an object `{ ... }`, not an array
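The rules above match what a strict JSON parser enforces. As an illustration in Python (not the project's PHP code), the standard `json` module rejects comments, trailing commas, and single quotes, so only the object check has to be added by hand. The function name `parse_assert_json` is hypothetical.

```python
import json


def parse_assert_json(raw: str) -> dict:
    """Parse the Assert-JSON field and insist on a JSON object."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Covers comments, trailing commas, single quotes, and other syntax errors.
        raise ValueError(f"Assert-JSON is not valid JSON: {exc.msg}") from exc
    if not isinstance(value, dict):
        raise ValueError("Assert-JSON must be an object {...}, not an array or scalar")
    return value
```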
 ---
 # Eval types and examples
 ## A) Retrieval case
 Retrieval cases check whether the right RAG documents or chunks are found.
 ### Minimal positive retrieval case
 ```json
 {
  "min_results": 1
 }
 ```
 ### Retrieval case with an expected document ID
 ```json
 {
  "min_results": 1,
  "must_include_one_of_document_ids": [
    "DOKUMENT-ID-HIER"
  ]
 }
 ```
 ### Retrieval case with several possible target documents
 ```json
 {
  "min_results": 1,
  "must_include_one_of_document_ids": [
    "DOKUMENT-ID-1",
    "DOKUMENT-ID-2"
  ]
 }
 ```
 ### Retrieval case with required terms
 ```json
 {
  "min_results": 1,
  "must_include_any_terms": [
    "lieferung",
    "versand"
  ]
 }
 ```
 ### Retrieval case with forbidden documents
 ```json
 {
  "min_results": 1,
  "must_not_include_document_ids": [
    "FALSCHE-DOKUMENT-ID"
  ]
 }
 ```
 ### Retrieval case for no result / nonsense
 ```json
 {
  "max_results": 0
 }
 ```
 ### Recommended retrieval structure
 ```json
 {
  "min_results": 1,
  "must_include_one_of_document_ids": [
    "DOKUMENT-ID-HIER"
  ],
  "must_include_any_terms": [
    "wichtiger fachbegriff",
    "produktname"
  ]
 }
 ```
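To make the retrieval assertion keys concrete, here is a hedged Python sketch of how they might be evaluated. The semantics are inferred from the key names only; the actual RetrieX runner is PHP and may differ. The result shape (a list of dicts with `document_id` and `text`) is an assumption for this example.

```python
def check_retrieval(assertions: dict, results: list[dict]) -> list[str]:
    """Return failure messages; an empty list means the case passes."""
    failures = []
    found_ids = {row["document_id"] for row in results}
    text = " ".join(row.get("text", "") for row in results).lower()

    if "min_results" in assertions and len(results) < assertions["min_results"]:
        failures.append("too few results")
    if "max_results" in assertions and len(results) > assertions["max_results"]:
        failures.append("too many results")

    # At least one of the listed documents must be among the hits.
    wanted = assertions.get("must_include_one_of_document_ids", [])
    if wanted and not found_ids.intersection(wanted):
        failures.append("none of the expected documents found")

    # None of the listed documents may be among the hits.
    forbidden = assertions.get("must_not_include_document_ids", [])
    if found_ids.intersection(forbidden):
        failures.append("forbidden document found")

    # At least one of the listed terms must occur in the retrieved text.
    any_terms = assertions.get("must_include_any_terms", [])
    if any_terms and not any(term.lower() in text for term in any_terms):
        failures.append("none of the expected terms found")

    return failures
```

Read this way, `{"max_results": 0}` passes only for an empty result list, which is why it suits nonsense prompts.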
 ---
 ## B) Shop-query case
 Shop-query cases check which shop query results from a direct prompt.
 ### Exact shop query
 Prompt:
 ```text
 was kostet der Testomat 808 Indikator 300
 ```
 Assert-JSON:
 ```json
 {
  "expected_query": "testomat 808 300 indikator"
 }
 ```
 ### Shop query with required and forbidden terms
 ```json
 {
  "must_include_terms": [
    "testomat",
    "808",
    "300",
    "indikator"
  ],
  "must_not_include_terms": [
    "300 s",
    "301",
    "302",
    "303"
  ]
 }
 ```
 ### Query must not collapse to noise
 ```json
 {
  "must_not_equal_query": "information"
 }
 ```
 ### Multi-product or link follow-up with individual queries
 ```json
 {
  "expected_individual_queries": [
    "testomat 2000 self clean",
    "testomat 2000 cal",
    "testomat 808"
  ],
  "expected_individual_queries_exact": true
 }
 ```
 ### Recommendation for shop-query cases
 Do not lock down every case with `expected_query` right away. While query construction is still variable, this is often the better choice:
 ```json
 {
  "must_include_terms": [
    "testomat",
    "808",
    "sio2"
  ],
  "must_not_include_terms": [
    "gerät",
    "möchte",
    "messen"
  ]
 }
 ```
 Use `expected_query` only when the query is already stable and is deliberately meant to be exact.
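The difference between term-based and exact assertions can be shown with a small Python sketch. It assumes that terms are matched case-insensitively on whitespace token boundaries, so that a multi-word term such as `300 s` only matches as a phrase; the real RetrieX matching rules are not documented here and may differ.

```python
def _contains_phrase(query: str, phrase: str) -> bool:
    """Match a phrase on whitespace token boundaries, case-insensitively."""
    padded_query = " " + " ".join(query.lower().split()) + " "
    padded_phrase = " " + " ".join(phrase.lower().split()) + " "
    return padded_phrase in padded_query


def check_shop_query(assertions: dict, query: str) -> list[str]:
    """Return failure messages; an empty list means the case passes."""
    failures = []
    expected = assertions.get("expected_query")
    if expected is not None and query != expected:
        failures.append(f"expected '{expected}', got '{query}'")
    if query == assertions.get("must_not_equal_query"):
        failures.append(f"query must not equal '{query}'")
    for term in assertions.get("must_include_terms", []):
        if not _contains_phrase(query, term):
            failures.append(f"missing term: {term}")
    for term in assertions.get("must_not_include_terms", []):
        if _contains_phrase(query, term):
            failures.append(f"forbidden term: {term}")
    return failures
```

Under these assumptions, a term-based case still passes when the query builder reorders tokens (`sio2 testomat 808`), whereas an `expected_query` case turns red on any change in wording or order.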
 ---
 ## C) Follow-up case
 Follow-up cases check whether RetrieX uses the chat history correctly.
 For `followup`, **History-JSON is practically mandatory**, because otherwise no real history is tested.
 ### Example: indicator price after history
 Prompt:
 ```text
 was kostet der indikator
 ```
 History-JSON:
 ```json
 [
  {
    "prompt": "Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?",
    "answer": "Der niedrigste Grenzwert für die Wasserhärte beträgt 0,02 °dH. Dieser Wert wird vom Testomat 808 gemessen."
  },
  {
    "prompt": "mit welchem indikator",
    "answer": "Der niedrigste messbare Grenzwert für Wasserhärte mit dem Testomat 808 wird mit dem Indikatortyp 300 erreicht."
  }
 ]
 ```
 Assert-JSON:
 ```json
 {
  "expected_query": "testomat 808 300 indikator",
  "must_include_terms": [
    "testomat",
    "808",
    "300",
    "indikator"
  ],
  "must_not_include_terms": [
    "300 s",
    "301",
    "302",
    "303",
    "testomat 2000"
  ]
 }
 ```
 ### Example: switching from the indicator back to the main device
 Prompt:
 ```text
 und was kostet das gerät selber
 ```
 History-JSON:
 ```json
 [
  {
    "prompt": "was kostet der indikator",
    "answer": "Shop-Suche abgeschlossen. Gesendete Suchquery: testomat 808 300 indikator. Testomat® 808 Indikator 300 500 ml, Produkt-Nummer 141001. Testomat® 808 Indikator 300 2 x 100 ml, Produkt-Nummer 140001. Der zugehörige Testomat ist Testomat 808."
  }
 ]
 ```
 Assert-JSON:
 ```json
 {
  "expected_query": "testomat 808",
  "must_include_terms": [
    "testomat",
    "808"
  ],
  "must_not_include_terms": [
    "indikator",
    "300",
    "141001",
    "140001"
  ]
 }
 ```
 ### Recommendation for follow-up cases
 The history should contain exactly the information that the real chat had before.
 Not too little:
 ```text
 Only "Indikator 300" without a device anchor can be too ambiguous.
 ```
 Not too much:
 ```text
 A complete long chat history can make the case unnecessarily unstable.
 ```
 A short, domain-relevant excerpt works well.
 ---
 ## D) Answer-guard case
 Answer-guard cases check that RetrieX does not invent anything for nonsense, weak evidence, or wrong associations.
 ### Nonsense should return no hits
 Prompt:
 ```text
 dsgfsdgfsdgf
 ```
 Assert-JSON:
 ```json
 {
  "max_results": 0
 }
 ```
 ### A made-up medium must not be answered as a real product
 Prompt:
 ```text
 welcher testomat misst drachenblut
 ```
 Assert-JSON:
 ```json
 {
  "must_not_include_terms": [
    "drachenblut"
  ]
 }
 ```
 ### The wrong document must not be retrieved
 ```json
 {
  "min_results": 1,
  "must_not_include_document_ids": [
    "FALSCHE-DOKUMENT-ID"
  ]
 }
 ```
 ### Recommendation for answer-guard cases
 For answer-guard cases, avoid overreacting to individual words anywhere in the complete retrieval text. Better signals are:
 ```text
 document IDs
 clear product names
 clear forbidden target terms
 max_results for nonsense
 ```
 A word that appears somewhere in the retrieval context is not automatically a domain error.
 ---
 # Optional field: History-JSON
 History-JSON is mainly used for `followup`.
 Format:
 ```json
 [
  {
    "prompt": "previous user question",
    "answer": "previous answer or relevant excerpt"
  }
 ]
 ```
 Multiple turns:
 ```json
 [
  {
    "prompt": "first question",
    "answer": "first answer"
  },
  {
    "prompt": "second question",
    "answer": "second answer"
  }
 ]
 ```
 Important:
 ```text
 History-JSON is an array [...]
 Assert-JSON is an object {...}
 ```
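The shape of History-JSON can be validated in a few lines. This Python sketch is illustrative only and not the project's PHP validation; treating an empty field as "no history" and requiring both `prompt` and `answer` to be strings are assumptions derived from the format shown above.

```python
import json


def parse_history_json(raw: str) -> list[dict]:
    """Parse History-JSON: an array of {"prompt": ..., "answer": ...} objects."""
    if raw.strip() == "":
        return []  # Assumption: an empty field means "no history".
    turns = json.loads(raw)  # Raises json.JSONDecodeError (a ValueError) on bad syntax.
    if not isinstance(turns, list):
        raise ValueError("History-JSON must be an array [...]")
    for index, turn in enumerate(turns):
        if not isinstance(turn, dict):
            raise ValueError(f"turn {index} must be an object")
        for key in ("prompt", "answer"):
            if not isinstance(turn.get(key), str):
                raise ValueError(f"turn {index} needs a string '{key}'")
    return turns
```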
 ---
 # Optional field: Request Context Hint
 This field can usually stay empty.
 It only makes sense when a case needs to simulate additional context that cannot be represented cleanly via the history.
 Example:
 ```text
 Visible shop results contain Testomat 808 and Testomat 808 Indikator 300.
 The user asks about the device itself.
 ```
 Recommendation:
 ```text
 For normal regressions, prefer History-JSON.
 Use Request Context Hint only for special cases.
 ```
 ---
 # Complete example: follow-up device price
 ## Eval type
 ```text
 followup
 ```
 ## New case ID
 ```text
 followup_testomat808_main_device_price_002
 ```
 ## Prompt
 ```text
 und was kostet das gerät selber
 ```
 ## Assert-JSON
 ```json
 {
  "expected_query": "testomat 808",
  "must_include_terms": [
    "testomat",
    "808"
  ],
  "must_not_include_terms": [
    "indikator",
    "300",
    "141001",
    "140001"
  ]
 }
 ```
 ## History-JSON
 ```json
 [
  {
    "prompt": "was kostet der indikator",
    "answer": "Shop-Suche abgeschlossen. Gesendete Suchquery: testomat 808 300 indikator. Testomat® 808 Indikator 300 500 ml, Produkt-Nummer 141001. Testomat® 808 Indikator 300 2 x 100 ml, Produkt-Nummer 140001. Der zugehörige Testomat ist Testomat 808."
  }
 ]
 ```
 ## Request Context Hint
 Leave empty.
 ---
 # Check after saving
 After saving, run the matching eval type.
 In the admin:
 ```text
 /admin/evals/
 ```
 Or via CLI:
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run shop_query
 php bin/console mto:agent:eval:run followup
 php bin/console mto:agent:eval:run answer_guard
 ```
 For a single type:
 ```bash
 php bin/console mto:agent:eval:run followup
 ```
 ---
 # Practical checklist
 Check before saving:
 ```text
 [ ] Eval type matches the goal
 [ ] Case ID is unique
 [ ] Case ID contains only letters, digits, _ or -
 [ ] Prompt is realistic and exact
 [ ] Assert-JSON is a valid JSON object
 [ ] History-JSON is present for follow-up cases
 [ ] History-JSON is a valid JSON array
 [ ] The case checks only one clear fact
 [ ] Assertions are not unnecessarily strict
 [ ] After saving, the matching eval type runs green
 ```
 ---
 # When to create a new eval case
 A new case makes sense when:
 ```text
 a real prompt was red
 an important green flow should be permanently safeguarded
 a typo/noise case should stay stable
 a product identity must not get lost
 a wrong document association should be prevented
 a no-answer situation must not hallucinate
 ```
 No new case is needed when:
 ```text
 only the wording of an answer was slightly different
 the prompt is not relevant to the domain
 the expectation cannot be defined unambiguously
 the case would check several independent things at once
 ```
 ---
 # Guideline
 As of RetrieX v1.6.2:
 ```text
 No new precision logic without a concrete red or domain-critical eval case.
 ```
 New optimizations should therefore proceed like this whenever possible:
 ```text
 1. Test the prompt
 2. Assess the behavior
 3. If important: create an eval case
 4. Get the eval green
 5. Only then change logic, YAML, or parameters
 ```
--- a/composer.json
+++ b/composer.json
@@ -29,7 +29,8 @@
        "symfony/twig-bundle": "7.4.*",
        "symfony/uid": "7.4.*",
        "symfony/yaml": "^7.4",
-      "ext-sqlite3": "*"
+      "ext-sqlite3": "*",
      "ext-mbstring": "*"
    },
    "config": {
        "optimize-autoloader": true,
--- a/config/retriex/genre.yaml
+++ b/config/retriex/genre.yaml
@@ -759,6 +759,15 @@ parameters:
            Grenzwert: Überwachungsbereich
            store: shop
            Indikatortyp: Indikator
            geraet: gerät analysegerät
            geraete: geräte analysegeräte
            wasserhaerte: wasserhärte
            haerte: härte
            ueberwachung: überwachung
            chlorueberwachung: chlor überwachung chlorüberwachung
            haerteueberwachung: härteüberwachung härte überwachung
            haerteueberwachungsgeraet: härteüberwachungsgerät härteüberwachung analysegerät
            lieferbedingungen: lieferung versand verkaufsbedingungen allgemeine lieferbedingungen
        accessory_focus_variants:
          origin: genre_native
          map:
@@ -1277,6 +1286,13 @@ parameters:
          - schwimmbad
          - schwimmbecken
          - pool
          - silikat
          - silikatüberwachung
          - silikatueberwachung
          - sio2
          - si o2
          - kieselsäure
          - kieselsaeure
          - 0,02
        stopword_cleanup:
          origin: genre_native
@@ -2008,6 +2024,8 @@ parameters:
          - tm
          - ph
          - rx
          - v
          - c
          family_descriptor_tokens:
          - evo
          - eco
--- a/config/retriex/retrieval.yaml
+++ b/config/retriex/retrieval.yaml
@@ -22,6 +22,7 @@ parameters:
    dominant_doc_min_hits: 3
    dominant_doc_max_chunks: 4
    exact_document_max_chunks: 6
    query_cleanup_profile: retrieval_reference_cleanup
    focused_product_window: 8
    focused_product_min_score: 10.0
    focused_product_min_gap: 4.0
--- a/patch_history/RETRIEX_PATCH_100B_ADMIN_EVAL_CASE_SELECTION_FIX_README.md
+++ b/patch_history/RETRIEX_PATCH_100B_ADMIN_EVAL_CASE_SELECTION_FIX_README.md
@@ -0,0 +1,37 @@
 # RetrieX Patch p100b - Admin Eval Case Selection Fix
 ## Goal
 Fixes the admin eval UX when a single case is selected and the request ends with `No eval cases selected.`.
 ## Cause
 The p100/p100a page used a free-text `datalist` field for case IDs that contained cases of all eval types. As a result, a case from `shop_query` could be selected while the form still submitted a different eval type. The admin service then searched only the case file of the submitted type and found no matching cases.
 ## Changes
 - The free-text case-ID field was replaced by a filtered select.
 - The case list is filtered client-side to match the selected eval type.
 - When the eval type changes, a non-matching case selection is cleared automatically.
 - The admin service is more robust: if a case ID is not found in the submitted type, it is looked up across all supported eval types and run with the correct type.
 - The controller redirects after the run to the eval type that was actually executed.
 - The old, unclear message `No eval cases selected.` is replaced by specific error texts.
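The fallback lookup described above can be sketched as follows. This is a Python illustration of the idea, not the code in `EvalAdminService`; the function name and the `cases_by_type` mapping (eval type to known case IDs) are hypothetical.

```python
def resolve_case_type(case_id: str, submitted_type: str, cases_by_type: dict) -> str:
    """Return the eval type that actually contains case_id.

    The submitted type is checked first; only if the case is not found there
    are the remaining types searched.
    """
    if case_id in cases_by_type.get(submitted_type, ()):
        return submitted_type
    for eval_type, case_ids in cases_by_type.items():
        if case_id in case_ids:
            return eval_type
    # A specific message instead of the old generic "No eval cases selected."
    raise LookupError(f"Case '{case_id}' was not found in any eval type")
```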
 ## Scope
 No changes to:
 - retrieval logic
 - shop-query logic
 - follow-up logic
 - answer-guard logic
 - eval cases
 - YAML configuration
 - model parameters
 - database/migrations
 ## Changed files
 - `src/Controller/Admin/AdminEvalController.php`
 - `src/Service/Admin/EvalAdminService.php`
 - `templates/admin/evals/index.html.twig`
--- a/patch_history/RETRIEX_PATCH_100C_ADMIN_EVAL_DOCUMENT_LABELS_README.md
+++ b/patch_history/RETRIEX_PATCH_100C_ADMIN_EVAL_DOCUMENT_LABELS_README.md
@@ -0,0 +1,45 @@
 # RetrieX Patch p100c - Admin Eval Document Labels
 ## Goal
 The admin eval results for retrieval/answer-guard cases should show not only the technical `document_id` and `chunk_id` values but also human-readable document information, so that a retrieved document is easier to identify in the admin/file inventory.
 ## Changes
 - `NdjsonHybridRetriever::retrieveDebug()` additionally returns, per debug hit:
  - `document_title`
  - `file_path`
  - `version_number`
 - `RetrievalDebugRunner` additionally writes to eval reports:
  - `document_refs`: de-duplicated document overview with title, file, version, ranks, and chunk IDs
  - `result_rows`: per-rank hit list with title, file, chunk ID, and text preview
 - The admin eval template shows this information directly in the result details:
  - table "Gefundene Dokumente" (documents found)
  - collapsible table "Treffer / Chunks anzeigen" (show hits / chunks)
  - JSON details remain available
 ## Not changed
 - No eval assertions changed
 - No retrieval weights changed
 - No shop-query/follow-up/answer logic changed
 - No YAML/parameter change
 - No database migration
 ## Verification
 After applying:
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run answer_guard
 ```
 Then in the admin:
 ```text
 /admin/evals/
 ```
 Open a retrieval or answer-guard eval and check that the results show title/file in addition to the doc ID.
--- a/patch_history/RETRIEX_PATCH_100D_ADMIN_EVAL_PROMPT_CONTEXT_README.md
+++ b/patch_history/RETRIEX_PATCH_100D_ADMIN_EVAL_PROMPT_CONTEXT_README.md
@@ -0,0 +1,44 @@
 # RetrieX Patch p100d – Admin Eval Prompt Context
 Status: patch-only follow-up for p100 Admin Eval UX.
 ## Goal
 Make eval results easier to understand in the Admin UI by showing the actual case prompt directly next to the case id. For follow-up and shopquery cases, show a compact history/context preview as well.
 ## Changes
 - Admin eval result table now displays the case prompt below the case id.
 - Follow-up/shopquery eval details now include a compact history preview.
 - Admin eval result table shows history/context in a collapsible section when available.
 ## Files changed
 - `src/Eval/ShopQueryEvalRunner.php`
 - `templates/admin/evals/index.html.twig`
 ## Non-goals
 No production answer logic is changed:
 - no retrieval logic changes
 - no shopquery logic changes
 - no follow-up logic changes
 - no answer-guard logic changes
 - no eval assertion changes
 - no YAML or parameter changes
 - no database migration
 ## Validation
 Recommended after applying:
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run shop_query
 php bin/console mto:agent:eval:run followup
 php bin/console mto:agent:eval:run answer_guard
 ```
 Then open `/admin/evals/` and verify that each result row shows the case prompt and that follow-up/shopquery rows can reveal context/history.
--- a/patch_history/RETRIEX_PATCH_100_ADMIN_EVAL_UX_README.md
+++ b/patch_history/RETRIEX_PATCH_100_ADMIN_EVAL_UX_README.md
@@ -0,0 +1,75 @@
 # RetrieX Patch p100 - Admin Eval UX
 Status: patch-only candidate
 Basis: confirmed v1.6.2 + p99/p99b/p99c green eval suite
 ## Goal
 p100 makes the eval suite introduced with p99 visible and usable in the admin, without functionally changing the production RAG, shop, prompt, scoring, or answer logic.
 ## Included
 - New admin area `/admin/evals/`
 - Overview of the eval types:
  - `retrieval`
  - `shop_query`
  - `followup`
  - `answer_guard`
 - Case count shown per type
 - Type-specific latest reports shown from `tests/evals/reports/<type>-last-run.json`
 - Run buttons per eval type
 - Form for running a complete type or a single case ID
 - Detail view for PASS/FAIL, errors, and result details
 - CLI reference in the admin
 - Sidebar link under "KI-Endpunkte" (AI endpoints)
 - Link from the AI/LLM setup page to the eval suite
 ## Report behavior
 Admin runs write two reports:
 - `tests/evals/reports/<type>-last-run.json`
 - `tests/evals/reports/last-run.json`
 The CLI remains unchanged and continues to write the familiar `last-run.json`.
 ## Roles
 The new area is protected at controller level by `ROLE_KNOWLEDGE_ADMIN`.
 ## Not changed
 - no retrieval weights
 - no shop-query generation logic
 - no follow-up logic
 - no answer-guard logic
 - no prompt change
 - no YAML vocabulary change
 - no model parameter change
 - no database migration
 ## Changed files
 - `src/Controller/Admin/AdminEvalController.php`
 - `src/Service/Admin/EvalAdminService.php`
 - `templates/admin/evals/index.html.twig`
 - `templates/admin/base.html.twig`
 - `templates/admin/model_config/list.html.twig`
 - `patch_history/RETRIEX_PATCH_100_ADMIN_EVAL_UX_README.md`
 ## Verification after applying
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run shop_query
 php bin/console mto:agent:eval:run followup
 php bin/console mto:agent:eval:run answer_guard
 ```
 Additionally check in the browser:
 - `/admin/evals/`
 - Run an eval type
 - Open the detail report
 - Sidebar link visible to knowledge admins
--- a/patch_history/RETRIEX_PATCH_101A_ADMIN_EVAL_CASE_CREATOR_PAGE_README.md
+++ b/patch_history/RETRIEX_PATCH_101A_ADMIN_EVAL_CASE_CREATOR_PAGE_README.md
@@ -0,0 +1,54 @@
 # RetrieX Patch p101a - Admin Eval Case Creator Separate Page
 ## Goal
 The eval case creator becomes its own admin page so that the eval suite overview stays lean and is not bloated by the complete case-creation form.
 ## New / changed admin routes
 - `GET /admin/evals/` remains the focused eval suite overview for runs and reports.
 - `GET /admin/evals/cases/new` shows the separate form for creating new eval cases.
 - `POST /admin/evals/cases` saves new eval cases to `tests/evals/cases/<type>.ndjson`.
 ## UX changes
 - The eval suite overview gets only a compact `Eval-Case erstellen` button.
 - Report results get the `Als neuen Case vorbereiten` button.
 - For prepared cases, the new page carries over:
  - eval type
  - prompt
  - history/context, if present in the report
  - suggested assertions derived from query, individual queries, or document IDs
 - The actual case creation lives outside the report/run overview.
 ## Validation
 The following is checked on save:
 - CSRF token
 - `ROLE_KNOWLEDGE_ADMIN`
 - supported eval type
 - unique case ID across all eval types
 - allowed case-ID format
 - non-empty prompt
 - valid Assert-JSON object
 - valid History-JSON list
 - DTO validation via `EvalCase::fromArray()`
 ## Not changed
 - No retrieval logic
 - No shop-query logic
 - No follow-up logic
 - No answer-guard logic
 - No eval cases
 - No YAML/parameter change
 - No migration
 ## Affected files
 - `src/Controller/Admin/AdminEvalController.php`
 - `src/Service/Admin/EvalAdminService.php`
 - `templates/admin/evals/index.html.twig`
 - `templates/admin/evals/case_new.html.twig`
 - `patch_history/RETRIEX_PATCH_101A_ADMIN_EVAL_CASE_CREATOR_PAGE_README.md`
--- a/patch_history/RETRIEX_PATCH_101B_ADMIN_EVAL_CASE_HELP_TEXTS_README.md
+++ b/patch_history/RETRIEX_PATCH_101B_ADMIN_EVAL_CASE_HELP_TEXTS_README.md
@@ -0,0 +1,52 @@
 # RetrieX Patch p101b - Admin Eval Case Help Texts
 ## Goal
 Improves the help texts on the admin page for creating new eval cases so that less technical users also understand which values belong in which fields.
 ## Scope
 Changed:
 - `templates/admin/evals/case_new.html.twig`
 New:
 - `patch_history/RETRIEX_PATCH_101B_ADMIN_EVAL_CASE_HELP_TEXTS_README.md`
 ## Changes
 - More detailed descriptions below all input fields
 - Layperson-friendly explanation of the eval types
 - Examples of good case IDs
 - Clearer explanation of prompt vs. expected answer
 - Copy-paste examples for Assert-JSON
 - Explanation of when History-JSON is needed
 - Note that Request Context Hint can almost always stay empty
 - Additional checklist before saving
 ## Not changed
 - No eval logic
 - No retrieval logic
 - No shop-query logic
 - No follow-up logic
 - No answer-guard logic
 - No existing eval cases
 - No YAML or parameter change
 - No migration
 ## Verification
 After applying:
 ```bash
 php bin/console mto:agent:config:validate
 ```
 Then check in the admin:
 - `/admin/evals/cases/new`
 - Help texts visible below all fields
 - Template from a report result still usable
 - Saving a case still possible
--- a/patch_history/RETRIEX_PATCH_101C_ADMIN_EVAL_CASE_DELETE_README.md
+++ b/patch_history/RETRIEX_PATCH_101C_ADMIN_EVAL_CASE_DELETE_README.md
@@ -0,0 +1,50 @@
 # RetrieX Patch p101c - Admin Eval Case Delete
 ## Goal
 Adds a safe delete function for individual eval cases to the admin eval case management.
 This lets incorrectly created or no longer needed cases be removed directly in the admin without further bloating the eval suite overview.
 ## Scope
 - New POST route `admin_evals_case_delete` at `/admin/evals/cases/delete`
 - CSRF protection per eval type and case ID
 - Role check via `ROLE_KNOWLEDGE_ADMIN`
 - Removal of exactly the selected case from `tests/evals/cases/<type>.ndjson`
 - Abort without changes if the NDJSON file is invalid or the case is not found
 - Delete area on the separate case page `/admin/evals/cases/new`
 - Confirmation dialog before deleting
 - Note that the affected eval type should be run again after deleting
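The "remove exactly one case, otherwise change nothing" behavior can be sketched in Python as follows. This is an illustration of the technique, not the PHP code in `EvalAdminService::deleteCase()`; the record key `id` and the temp-file swap are assumptions of this example.

```python
import json
import os
import tempfile
from pathlib import Path


def delete_case(path: Path, case_id: str) -> None:
    """Remove exactly one case from an NDJSON file, or raise without touching it."""
    kept, removed = [], 0
    lines = path.read_text(encoding="utf-8").split("\n")
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Blank lines are skipped and not written back.
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid NDJSON on line {number}") from exc
        if isinstance(record, dict) and record.get("id") == case_id:
            removed += 1
        else:
            kept.append(line)
    if removed != 1:
        raise LookupError(f"expected exactly one case '{case_id}', found {removed}")
    # Write to a temporary file first, then swap it in, so the original
    # file is only replaced once the new content is complete.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write("".join(item + "\n" for item in kept))
    os.replace(tmp_name, path)
```

All validation happens before the first write, which is what makes the "abort without changes" guarantee straightforward to keep.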
 ## Not changed
 - No retrieval logic
 - No shop-query logic
 - No follow-up logic
 - No answer-guard logic
 - No eval assertions
 - No existing cases deleted automatically
 - No YAML/parameter change
 - No migration
 ## Verification
 After applying:
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run shop_query
 php bin/console mto:agent:eval:run followup
 php bin/console mto:agent:eval:run answer_guard
 ```
 In the admin:
 1. Open `/admin/evals/cases/new`.
 2. Create a test case or select an existing test case.
 3. Click `Case löschen`.
 4. Confirm the confirmation dialog.
 5. Check that the case disappears from the list.
 6. Run the affected eval type again.
--- a/patch_history/RETRIEX_PATCH_101D_ADMIN_EVAL_CASE_DELETE_HOTFIX_README.md
+++ b/patch_history/RETRIEX_PATCH_101D_ADMIN_EVAL_CASE_DELETE_HOTFIX_README.md
@@ -0,0 +1,53 @@
 # RetrieX Patch p101d - Admin Eval Case Delete Hotfix
 ## Goal
 Fixes a bug from p101c where deleting an eval case could raise the following exception:
 ```text
 Call to undefined method App\Service\Admin\EvalAdminService::normalizeExistingCaseId()
 ```
 ## Cause
 `EvalAdminService::deleteCase()` calls a validation helper method for existing case IDs. This method was referenced in p101c but was not added to the service class.
 ## Change
 Adds `normalizeExistingCaseId()` to `EvalAdminService`.
 The method:
 - trims the passed case ID,
 - rejects empty IDs,
 - allows only letters, digits, underscores, and hyphens,
 - returns an understandable error message for invalid IDs.
 ## Changed files
 ```text
 src/Service/Admin/EvalAdminService.php
 patch_history/RETRIEX_PATCH_101D_ADMIN_EVAL_CASE_DELETE_HOTFIX_README.md
 ```
 ## Not changed
 ```text
 no eval logic
 no retrieval logic
 no shop-query logic
 no follow-up logic
 no answer-guard logic
 no YAML/parameter change
 no existing eval cases
 no migration
 ```
 ## Verification
 ```bash
 php -l src/Service/Admin/EvalAdminService.php
 php bin/console mto:agent:config:validate
 ```
 Then delete an eval case in the admin.
--- a/patch_history/RETRIEX_PATCH_101_ADMIN_EVAL_CASE_CREATOR_README.md
+++ b/patch_history/RETRIEX_PATCH_101_ADMIN_EVAL_CASE_CREATOR_README.md
@@ -0,0 +1,66 @@
 # RetrieX Patch p101 - Admin Eval Case Creator
 ## Goal
 p101 adds a small case creator to the existing admin eval suite so that new regression cases can be written to the matching NDJSON files directly from the admin.
 The patch builds on the green p100/p100a/p100b/p100c/p100d state and does not change any production RAG, shop-query, follow-up, or answer logic.
 ## Changes
 - New POST route in the admin:
  - `/admin/evals/case/create`
  - Route name: `admin_evals_case_create`
 - `EvalAdminService::createCase()` for validated writing of new eval cases.
 - New form on `/admin/evals/`:
  - eval type
  - case ID
  - prompt
  - Assert-JSON
  - optional History-JSON
  - optional Request Context Hint
 - Button per report result:
  - `Als neuen Case vorbereiten`
  - carries prompt, type, history preview, and query or document ID over into the creator as a template.
 - JSON/ID validation before writing.
 - Duplicate guard across all eval types.
 ## Written files
 New cases are appended to the following files:
 - `tests/evals/cases/retrieval.ndjson`
 - `tests/evals/cases/shop_query.ndjson`
 - `tests/evals/cases/followup.ndjson`
 - `tests/evals/cases/answer_guard.ndjson`
 ## Safety / scope
 Not changed:
 - no retrieval weights
 - no shop-query logic
 - no follow-up logic
 - no answer-guard logic
 - no prompt/YAML/parameter change
 - no migration
 ## Manual verification
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run shop_query
 php bin/console mto:agent:eval:run followup
 php bin/console mto:agent:eval:run answer_guard
 ```
 Additionally in the admin:
 1. Open `/admin/evals/`.
 2. Run an eval.
 3. Click `Als neuen Case vorbereiten` on a result.
 4. Adjust or check the case ID.
 5. Check the Assert-JSON.
 6. Save.
 7. Run the affected eval type again.
--- a/patch_history/RETRIEX_PATCH_98_RETRIEVAL_EVAL_GREEN_BASELINE_README.md
+++ b/patch_history/RETRIEX_PATCH_98_RETRIEVAL_EVAL_GREEN_BASELINE_README.md
@@ -0,0 +1,79 @@
 # RetrieX Patch p98 - Retrieval Eval Green Baseline
 ## Ziel
 p98 schärft die Retrieval-Baseline für die vier zuletzt roten Eval-Fälle, ohne neue produkt- oder testfallspezifische PHP-Sonderlogik einzuführen.
 Abgedeckte rote Fälle aus `tests/evals/cases/retrieval.ndjson`:
 - `welcher testomat ist ein verschneideregler`
 - `welches geraet ist fuer chlorueberwachung gedacht`
 - `lieferbedingungen versand testomat`
 - `testomat 2000 th 2005 sicherheitsdatenblatt`
 ## Changes
 ### 1. YAML-configurable retrieval query cleanup
 In addition to the existing legacy stopword set, `QueryCleaner` now uses a YAML cleanup profile from `retrieval.yaml`:
 ```yaml
 query_cleanup_profile: retrieval_reference_cleanup
 ```
 As a result, generic question words such as `welcher` and `welches` are removed via the existing cleanup profile, without writing them back into the old legacy lists.
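 A minimal sketch of profile-based cleanup, assuming a profile is simply a named list of terms to drop. The function `cleanRetrievalQuery` and the profile contents used in the test are hypothetical; the real `QueryCleaner` and the contents of `retrieval_reference_cleanup` are not shown in this patch.
 ```php
 <?php
 // Hypothetical sketch of profile-based query cleanup; not the actual QueryCleaner.
 // A profile is assumed to be a plain list of lowercase terms to drop.
 /**
  * @param array<string, list<string>> $profiles cleanup profile name => terms to drop
  */
 function cleanRetrievalQuery(string $query, string $profileName, array $profiles): string
 {
     if (!isset($profiles[$profileName])) {
         throw new InvalidArgumentException('Unknown cleanup profile: ' . $profileName);
     }
     $drop = array_fill_keys($profiles[$profileName], true);
     // strtolower() only lowercases ASCII here; input with uppercase umlauts would need mb_strtolower().
     $tokens = preg_split('/\s+/u', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
     $kept = array_filter($tokens, static fn (string $token): bool => !isset($drop[$token]));
     return implode(' ', $kept);
 }
 ```
 Failing loudly on an unknown profile name mirrors the config validation that this patch adds for `query_cleanup_profile`.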
 ### 2. ASCII/umlaut and meaning bridges in genre enrichment
 `genre.yaml` adds conservative query enrichment rules for common ASCII spellings and compound search terms:
 - `geraet` -> `gerät analysegerät`
 - `chlorueberwachung` -> `chlor überwachung chlorüberwachung`
 - `haerteueberwachungsgeraet` -> `härteüberwachungsgerät härteüberwachung analysegerät`
 - `lieferbedingungen` -> `lieferung versand verkaufsbedingungen allgemeine lieferbedingungen`
 The rules remain in the genre-specific configuration section `brands_and_canonical_terms.query_enrichment_rules`.
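 How such rules might be applied can be sketched as follows. This assumes that enrichment appends terms and never replaces the original tokens; whether RetrieX appends or substitutes is not stated in this README, and `enrichQuery` is a hypothetical name.
 ```php
 <?php
 // Hypothetical sketch of conservative query enrichment; not the RetrieX implementation.
 // Assumption: enrichment terms are appended, original tokens stay untouched.
 /**
  * @param array<string, string> $rules trigger token => space-separated terms to append
  */
 function enrichQuery(string $query, array $rules): string
 {
     $tokens = preg_split('/\s+/u', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
     $result = $tokens;
     foreach ($tokens as $token) {
         foreach (explode(' ', $rules[$token] ?? '') as $extra) {
             // Append each enrichment term at most once.
             if ($extra !== '' && !in_array($extra, $result, true)) {
                 $result[] = $extra;
             }
         }
     }
     return implode(' ', $result);
 }
 ```
 Appending keeps the ASCII spelling available for lexical matches while the umlaut forms give the index a second chance to hit.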
 ### 3. Stricter exact-title fallback for short model variants
 Short model/variant tokens from the retrieval vocabulary view can now count as significant in exact-title token matches.
 For example, in `Testomat 2000 V` the token `v` is now also treated as a relevant part of the title. A query such as `testomat 2000 th 2005 sicherheitsdatenblatt` therefore no longer falls back incorrectly to `Testomat 2000 V`; instead it can proceed into the normal retrieval fusion and hit the TH 2005 safety data sheets there.
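 The effect can be illustrated with a small sketch. It assumes that a title matches exactly only if every significant title token occurs in the query, and that tokens shorter than three characters are ignored unless they are listed as variant tokens. Both the length threshold and the name `isExactTitleMatch` are assumptions, not taken from the patch.
 ```php
 <?php
 // Hypothetical sketch: exact-title matching where short variant tokens can be significant.
 // The minimum length of 3 for ordinary tokens is an illustrative assumption.
 /**
  * @param list<string> $shortVariantTokens short tokens treated as significant (assumed to come from a vocabulary)
  */
 function isExactTitleMatch(string $query, string $title, array $shortVariantTokens): bool
 {
     $tokenize = static fn (string $s): array => preg_split('/[^a-z0-9]+/', strtolower($s), -1, PREG_SPLIT_NO_EMPTY);
     $queryTokens = array_fill_keys($tokenize($query), true);
     foreach ($tokenize($title) as $titleToken) {
         $significant = strlen($titleToken) >= 3 || in_array($titleToken, $shortVariantTokens, true);
         if ($significant && !isset($queryTokens[$titleToken])) {
             return false;
         }
     }
     return true;
 }
 ```
 With an empty variant list, the query `testomat 2000 th 2005 sicherheitsdatenblatt` would still match the title `Testomat 2000 V`; once `v` is listed as significant, the match is rejected and the query can continue into normal retrieval.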
 ### 4. Config validation and documentation
 - `NdjsonHybridRetrieverConfig` exports `query_cleanup_profile`.
 - `RetriexEffectiveConfigProvider` validates that the profile exists.
 - `CONFIG_PARAMS.md` documents the new parameter.
 ## Not changed
 - No shop query logic changed.
 - No follow-up actions changed.
 - No agent or prompt answer rules changed.
 - No Testomat-specific PHP special-casing added.
 - No retrieval parameters such as thresholds, RRF weights, or top-k changed.
 ## Validation in the patch build
 Because the local execution environment does not provide the full set of PHP extensions and vendor dependencies, the Symfony eval command could not be run here. The following checks were run instead:
 - YAML parsing for `retrieval.yaml`, `genre.yaml`, `language.yaml`
 - PHP syntax check for all changed PHP files
 - local NDJSON/lexical index simulation against the provided `knowledge.zip`
 For the four failing baseline cases, the simulation shows the expected target hit among the top results:
 - Verschneideregler (blending controller) -> `Testomat 2000 V`
 - Chlorüberwachung (chlorine monitoring) -> `Testomat 2000 THCL`
 - Lieferbedingungen/Versand (delivery terms/shipping) -> `Lieferung und Versand`
 - TH 2005 Sicherheitsdatenblatt (safety data sheet) -> `Testomat 2000 Indikator TH 2005`
 ## Recommended regression test after applying the patch
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 ```
 Expectation: the retrieval baseline should go from `15/19` to `19/19`. If a single semantic case still fails given the state of the production vector/lexical index, rebuild the knowledge index first before changing any retrieval parameters.
--- a/patch_history/RETRIEX_PATCH_99B_EVAL_SUITE_ALIGNMENT_README.md
+++ b/patch_history/RETRIEX_PATCH_99B_EVAL_SUITE_ALIGNMENT_README.md
@@ -0,0 +1,85 @@
 # RetrieX Patch p99b - Eval Suite Alignment
 ## Goal
 p99 successfully activated the new eval suite, but three new cases failed after the first run. p99b separates a false-positive assertion from two real robustness gaps, without restructuring the existing retrieval baseline or the shop/follow-up architecture.
 ## Initial state
 After p99:
 - `mto:agent:config:validate`: OK
 - `mto:agent:eval:run retrieval`: 19/19 OK
 - `mto:agent:eval:run shop_query`: 4/5 OK
 - `mto:agent:eval:run followup`: 3/4 OK
 - `mto:agent:eval:run answer_guard`: 3/4 OK
 Failing cases:
 - `shop_query_sio2_anchor_001`: the normalized shop query could shrink down to `gerät`.
 - `followup_main_device_price_001`: the main-device follow-up could remain stuck on the previous indicator query `testomat 808 indikator 300`.
 - `answer_guard_delivery_not_sdb_001`: the assertion was too strict, because the term `Sicherheitsdatenblatt` (safety data sheet) appearing in the retrieval text is not sufficient evidence of an error as long as the wrong document does not dominate.
 ## Changes
 ### 1. Protect SiO2/silicate as current input
 `config/retriex/genre.yaml`
 Adds the following terms to `shop_query_runtime.current_input_preservation_terms`:
 - `silikat`
 - `silikatüberwachung`
 - `silikatueberwachung`
 - `sio2`
 - `si o2`
 - `kieselsäure`
 - `kieselsaeure`
 As a result, a normalized standalone shop question such as `suche gerät kühlsysteme Silikatüberwachung` no longer loses the measurement parameter before the generic device-anchor rule `testomat 808 sio2` can apply.
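 One way to picture the preservation step is the sketch below. It assumes that protected terms found in the raw input are re-appended when normalization dropped them, and that matching is by substring. The function `preserveCurrentInputTerms` and this merge strategy are hypothetical; the README does not describe how the runtime applies the list.
 ```php
 <?php
 // Hypothetical sketch: keep protected measurement terms that normalization dropped.
 // Substring matching and re-appending are assumptions, not the RetrieX runtime logic.
 /**
  * @param list<string> $preservationTerms lowercase terms that must survive normalization
  */
 function preserveCurrentInputTerms(string $rawInput, string $normalizedQuery, array $preservationTerms): string
 {
     // strtolower() lowercases ASCII only; uppercase umlauts would need mb_strtolower().
     $raw = strtolower($rawInput);
     $result = trim($normalizedQuery);
     foreach ($preservationTerms as $term) {
         if ($term !== '' && str_contains($raw, $term) && !str_contains(strtolower($result), $term)) {
             $result = trim($result . ' ' . $term);
         }
     }
     return $result;
 }
 ```
 Applied to the failing case, a query that shrank to `gerät` would get `silikatüberwachung` back before any device-anchor rule runs.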
 ### 2. Main-device follow-up may strip accessory remnants
 `src/Agent/AgentRunner.php`
 `guardMainDeviceReferentialShopQueryWithHistoryModelAnchor()` was adjusted so that a shop query such as `testomat 808 indikator 300` is no longer accepted for a prompt such as `und was kostet das gerät selber` merely because it already contains a model anchor.
 The guard now checks whether accessory or code residual tokens remain in addition to the model anchor. If so, the query is reduced to the pure model anchor from the history, e.g. `testomat 808`.
 ### 3. Answer-guard case made less brittle
 `tests/evals/cases/answer_guard.ndjson`
 The case `answer_guard_delivery_not_sdb_001` still checks:
 - the matching delivery/shipping document must be included
 - the specific SDB document must not be included
 The overly broad text assertion on the term `sicherheitsdatenblatt` was removed because it can also match legitimate side notes and reference texts.
 ## Deliberately not changed
 - No retrieval weights
 - No Shopware search
 - No prompt texts
 - No model parameters
 - No new product-specific special-casing
 - No change to the p98 retrieval eval cases
 ## Expected checks
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run shop_query
 php bin/console mto:agent:eval:run followup
 php bin/console mto:agent:eval:run answer_guard
 ```
 Expected:
 - Config valid
 - Retrieval 19/19
 - Shop query 5/5
 - Follow-up 4/4
 - Answer guard 4/4
--- a/patch_history/RETRIEX_PATCH_99C_MAIN_DEVICE_FOLLOWUP_EVAL_ALIGNMENT_README.md
+++ b/patch_history/RETRIEX_PATCH_99C_MAIN_DEVICE_FOLLOWUP_EVAL_ALIGNMENT_README.md
@@ -0,0 +1,60 @@
 # RETRIEX PATCH 99C - Main Device Follow-up Eval Alignment
 Status: patch-only follow-up for p99/p99b.
 ## Goal
 Keep the new p99 follow-up eval suite aligned with the already confirmed manual
 reference flow:
 1. lowest water-hardness threshold
 2. indicator type
 3. indicator price
 4. main device price
 The main-device follow-up `und was kostet das gerät selber` must resolve back to
 the main device anchor (`testomat 808`) and must not keep accessory remnants such
 as `indikator` or exact indicator code `300`.
 ## Root cause
 p99b added a residual accessory guard, but the main-device history-anchor guard
 returned early for non-generic shop queries before the residual check could run.
 A query like `testomat 808 indikator 300` contains digits, so it was not treated
 as a generic main-device query and stayed unchanged.
 ## Change
 `AgentRunner::guardMainDeviceReferentialShopQueryWithHistoryModelAnchor()` now:
 1. detects the main-device referential prompt,
 2. extracts the latest history model anchor,
 3. if the generated shop query already contains that model anchor, checks for
   accessory/code residuals,
 4. reduces the query to the pure model anchor when such residuals are present.
 This keeps explicit non-generic product queries untouched unless they contain the
 current history model anchor plus accessory leftovers in a main-device follow-up.
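 Steps 3 and 4 can be sketched in isolation as follows. This is a simplified, self-contained illustration, not the `AgentRunner` code: `reduceToModelAnchor` is a hypothetical name, the accessory term list is passed in directly instead of being read from config, and the prompt detection and history extraction from steps 1 and 2 are assumed to have happened already.
 ```php
 <?php
 // Simplified sketch of the residual check; not the AgentRunner implementation.
 // Whitespace tokenization and the accessory list are illustrative assumptions.
 /**
  * @param list<string> $accessoryTerms lowercase accessory keywords, e.g. "indikator"
  */
 function reduceToModelAnchor(string $shopQuery, string $modelAnchor, array $accessoryTerms): string
 {
     $tokenize = static fn (string $s): array => preg_split('/\s+/u', strtolower(trim($s)), -1, PREG_SPLIT_NO_EMPTY);
     $queryTokens = $tokenize($shopQuery);
     $anchorTokens = $tokenize($modelAnchor);
     // Only act when the query already contains every anchor token.
     if ($anchorTokens === [] || array_diff($anchorTokens, $queryTokens) !== []) {
         return $shopQuery;
     }
     foreach (array_diff($queryTokens, $anchorTokens) as $token) {
         // An accessory keyword or a short numeric code counts as a residual.
         if (in_array($token, $accessoryTerms, true) || preg_match('/^\d{1,5}$/', $token) === 1) {
             return $modelAnchor;
         }
     }
     return $shopQuery;
 }
 ```
 Queries that carry a different model anchor, or extra tokens that are neither accessory keywords nor short numeric codes, pass through unchanged.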
 ## Expected eval result
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run shop_query
 php bin/console mto:agent:eval:run followup
 php bin/console mto:agent:eval:run answer_guard
 ```
 Expected:
 - retrieval: 19/19
 - shop_query: 5/5
 - followup: 4/4
 - answer_guard: 4/4
 ## Productive logic impact
 Minimal. The patch only changes the already existing main-device follow-up guard
 for prompts asking for the main device itself. It does not modify retrieval,
 ranking, prompt templates, YAML vocabulary, shop result guards, or answer logic.
--- a/patch_history/RETRIEX_PATCH_99_EVAL_SUITE_EXPANSION_README.md
+++ b/patch_history/RETRIEX_PATCH_99_EVAL_SUITE_EXPANSION_README.md
@@ -0,0 +1,157 @@
 # RetrieX Patch p99 - Eval Suite Expansion
 ## Goal
 p99 extends the previously retrieval-only eval baseline with additional, manually known regression types from v1.6.2:
 - shop query generation
 - follow-up resolution with chat history
 - answer/hallucination guardrails at the retrieval-evidence level
 The patch deliberately changes no production RAG, retrieval, shop, prompt, or answer logic. It only adds eval infrastructure and eval cases.
 ## New eval types
 ### `shop_query`
 Checks the shop search query prepared by `AgentRunner` against the shop meta output. The runner stops as soon as the first shop search meta card has been produced. This exercises the query guards, the routing/history logic, and the final shop query filters without depending on the live Shopware search.
 Example:
 ```bash
 php bin/console mto:agent:eval:run shop_query
 ```
 Cases are located in:
 ```text
 tests/evals/cases/shop_query.ndjson
 ```
 Coverage includes:
 - exact indicator code `Testomat 808 Indikator 300`
 - brewery/brewing-water query cleanup
 - swimming-pool typo correction
 - preservation of the LAB-CL abbreviation
 - SIO2 device anchor for silicate monitoring
 ### `followup`
 Checks referential shop follow-up questions with prepared history turns. For each eval case, the history is written to an isolated temporary eval user and deleted again afterwards.
 Example:
 ```bash
 php bin/console mto:agent:eval:run followup
 ```
 Cases are located in:
 ```text
 tests/evals/cases/followup.ndjson
 ```
 Coverage includes:
 - `0,02 °dH -> Testomat 808 -> Indikatortyp 300 -> was kostet der indikator`
 - switching from the indicator price back to the main-device price
 - weak shop follow-up `suche im shop nach der information` with a THCL history anchor
 - product-link follow-up with individual queries instead of a combined multi-product query
 ### `answer_guard`
 Checks answer guardrails before the final LLM answer, based on the retrieval evidence. This is deliberately not a generative LLM answer test, but a stable pre-answer guard against wrong evidence or hallucination risks.
 Example:
 ```bash
 php bin/console mto:agent:eval:run answer_guard
 ```
 Cases are located in:
 ```text
 tests/evals/cases/answer_guard.ndjson
 ```
 Coverage includes:
 - noise prompt without evidence
 - fantasy media such as Drachenblut (dragon's blood) / Mondwasser (moon water)
 - delivery terms must not tip over into safety data sheets
 ## New assertion fields
 ### For `shop_query` and `followup`
 ```json
 {
  "expected_query": "testomat 808 300 indikator",
  "must_include_terms": ["testomat", "808", "300", "indikator"],
  "must_not_include_terms": ["300 s", "301", "302"],
  "must_not_equal_query": "information"
 }
 ```
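 To make the semantics of these fields concrete, here is a minimal sketch of how they could be evaluated against a generated shop query. It is not the `ShopQueryEvalRunner` code; `checkShopQueryAssertions` is a hypothetical name, and case-insensitive substring matching is an assumption.
 ```php
 <?php
 // Hypothetical sketch of evaluating shop query assertions; not the ShopQueryEvalRunner.
 // Assumption: comparisons are case-insensitive and term checks use substring matching.
 /**
  * @param array<string, mixed> $assert
  * @return list<string> failure messages; an empty list means the case passed
  */
 function checkShopQueryAssertions(string $query, array $assert): array
 {
     $failures = [];
     $normalized = strtolower(trim($query));
     if (isset($assert['expected_query']) && $normalized !== strtolower((string) $assert['expected_query'])) {
         $failures[] = 'query differs from expected_query';
     }
     foreach ($assert['must_include_terms'] ?? [] as $term) {
         if (!str_contains($normalized, strtolower((string) $term))) {
             $failures[] = sprintf('missing term "%s"', $term);
         }
     }
     foreach ($assert['must_not_include_terms'] ?? [] as $term) {
         if (str_contains($normalized, strtolower((string) $term))) {
             $failures[] = sprintf('forbidden term "%s"', $term);
         }
     }
     if (isset($assert['must_not_equal_query']) && $normalized === strtolower((string) $assert['must_not_equal_query'])) {
         $failures[] = 'query equals must_not_equal_query';
     }
     return $failures;
 }
 ```
 Returning a list of failure messages rather than a boolean lets a report show every violated assertion of a case at once.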
 For multi-product follow-ups:
 ```json
 {
  "expected_individual_queries": [
    "testomat 2000 self clean",
    "testomat 2000 cal",
    "testomat 808"
  ],
  "expected_individual_queries_exact": true,
  "min_individual_queries": 3,
  "max_individual_queries": 3
 }
 ```
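 The multi-product fields could be checked along these lines. The sketch assumes that query order is ignored and that `expected_individual_queries_exact` means no additional queries are allowed; neither is confirmed by this README, and `checkIndividualQueries` is a hypothetical name.
 ```php
 <?php
 // Hypothetical sketch for multi-product follow-up assertions; not the actual eval runner.
 // Assumptions: order is ignored; "exact" forbids queries outside the expected list.
 /**
  * @param list<string> $queries individual shop queries produced for the follow-up
  * @param array<string, mixed> $assert
  * @return list<string> failure messages; an empty list means the case passed
  */
 function checkIndividualQueries(array $queries, array $assert): array
 {
     $normalize = static fn (array $list): array => array_map(
         static fn ($q): string => strtolower(trim((string) $q)),
         $list
     );
     $actual = $normalize($queries);
     $expected = $normalize($assert['expected_individual_queries'] ?? []);
     $failures = [];
     $count = count($actual);
     if (isset($assert['min_individual_queries']) && $count < (int) $assert['min_individual_queries']) {
         $failures[] = 'too few individual queries';
     }
     if (isset($assert['max_individual_queries']) && $count > (int) $assert['max_individual_queries']) {
         $failures[] = 'too many individual queries';
     }
     foreach (array_diff($expected, $actual) as $missing) {
         $failures[] = sprintf('missing query "%s"', $missing);
     }
     if (($assert['expected_individual_queries_exact'] ?? false) === true) {
         foreach (array_diff($actual, $expected) as $extra) {
             $failures[] = sprintf('unexpected query "%s"', $extra);
         }
     }
     return $failures;
 }
 ```
 A combined query such as `testomat 808 testomat 2000 cal` fails this check on count, on missing entries, and on the unexpected combined entry, which is the regression the case is meant to catch.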
 ### For `retrieval` and `answer_guard`
 `RetrievalDebugRunner` additionally supports:
 ```json
 {
  "must_not_include_terms": ["sicherheitsdatenblatt"],
  "must_not_match_patterns": ["/forbidden/u"]
 }
 ```
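 A minimal sketch of both checks on the joined retrieval text follows. It is not the `RetrievalDebugRunner` implementation: `checkForbidden` is a hypothetical name, and it assumes that terms are matched as case-insensitive substrings and that patterns are complete PCRE expressions including delimiters.
 ```php
 <?php
 // Hypothetical sketch of forbidden-term and forbidden-pattern checks on retrieval text.
 // Not the RetrievalDebugRunner code; substring matching for terms is an assumption.
 /**
  * @param list<string> $terms    plain terms that must not occur
  * @param list<string> $patterns PCRE patterns including delimiters, e.g. "/forbidden/u"
  * @return list<string> failure messages; an empty list means the case passed
  */
 function checkForbidden(string $text, array $terms, array $patterns): array
 {
     $failures = [];
     $haystack = strtolower($text);
     foreach ($terms as $term) {
         if ($term !== '' && str_contains($haystack, strtolower($term))) {
             $failures[] = sprintf('forbidden term "%s" present', $term);
         }
     }
     foreach ($patterns as $pattern) {
         // preg_match() returns false for a broken pattern; report that instead of passing silently.
         $matched = @preg_match($pattern, $text);
         if ($matched === false) {
             $failures[] = sprintf('invalid pattern "%s"', $pattern);
         } elseif ($matched === 1) {
             $failures[] = sprintf('forbidden pattern "%s" matched', $pattern);
         }
     }
     return $failures;
 }
 ```
 Because a plain term check fires on any mention in the text, a document-level assertion is usually the more robust choice when the goal is to exclude a specific wrong document.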
 ## Changed files
 ```text
 src/Command/AgentEvalRunCommand.php
 src/Eval/AgentEvalRunner.php
 src/Eval/AnswerGuardEvalRunner.php
 src/Eval/Dto/EvalCase.php
 src/Eval/RetrievalDebugRunner.php
 src/Eval/ShopQueryEvalRunner.php
 tests/evals/cases/answer_guard.ndjson
 tests/evals/cases/followup.ndjson
 tests/evals/cases/shop_query.ndjson
 patch_history/RETRIEX_PATCH_99_EVAL_SUITE_EXPANSION_README.md
 ```
 ## Not changed
 - No retrieval weights changed.
 - No production shop query logic changed.
 - No prompt rules changed.
 - No YAML vocabulary rules changed.
 - No LLM/model parameters changed.
 - No admin/frontend logic changed.
 ## Recommended validation after applying the patch
 ```bash
 php bin/console mto:agent:config:validate
 php bin/console mto:agent:eval:run retrieval
 php bin/console mto:agent:eval:run shop_query
 php bin/console mto:agent:eval:run followup
 php bin/console mto:agent:eval:run answer_guard
 ```
 Important: `shop_query` and `followup` run through `AgentRunner` up to the shop meta card. They stop before the live shop search but, depending on the active configuration, may still attempt input normalization or shop query optimization via the configured LLM. If the LLM is unreachable, the agent's existing fallback logic applies.
--- a/src/Agent/AgentRunner.php
+++ b/src/Agent/AgentRunner.php
@@ -4155,7 +4155,6 @@ final readonly class AgentRunner
            $shopSearchQuery === ''
            || trim($commerceHistoryContext) === ''
            || $this->referenceAnchorExtractor->extractFirstProductModelAnchor($prompt) !== ''
            || $this->referenceAnchorExtractor->extractFirstProductModelAnchor($shopSearchQuery) !== ''
        ) {
            return $shopSearchQuery;
        }
@@ -4164,10 +4163,6 @@ final readonly class AgentRunner
            return $shopSearchQuery;
        }
        if (!$this->isGenericMainDeviceReferentialShopQuery($shopSearchQuery)) {
            return $shopSearchQuery;
        }
        $modelAnchor = $this->normalizeShopQueryAnchor(
            $this->extractLatestHistoryProductModelAnchor($commerceHistoryContext)
        );
@@ -4176,9 +4171,43 @@ final readonly class AgentRunner
            return $shopSearchQuery;
        }
-        return $this->queryAlreadyContainsAllAnchorTokens($shopSearchQuery, $modelAnchor)
+        if ($this->queryAlreadyContainsAllAnchorTokens($shopSearchQuery, $modelAnchor)) {
-            ? $shopSearchQuery
+            return $this->containsMainDeviceFollowUpAccessoryResidual($shopSearchQuery, $modelAnchor)
-            : $modelAnchor;
+                ? $modelAnchor
                : $shopSearchQuery;
        }
        if (!$this->isGenericMainDeviceReferentialShopQuery($shopSearchQuery)) {
            return $shopSearchQuery;
        }
        return $modelAnchor;
    }
    private function containsMainDeviceFollowUpAccessoryResidual(string $shopSearchQuery, string $modelAnchor): bool
    {
        $queryTokens = $this->tokenizeShopQueryCandidate($shopSearchQuery);
        if ($queryTokens === []) {
            return false;
        }
        $modelTokens = array_fill_keys($this->tokenizeShopQueryCandidate($modelAnchor), true);
        $accessoryTokens = $this->buildShopQueryTokenSet($this->mergeUniqueStrings(
            $this->agentRunnerConfig->getNoLlmAccessoryProductRoleKeywords(),
            $this->agentRunnerConfig->getRequestedAccessoryCodeTerms()
        ));
        foreach ($queryTokens as $token) {
            if (isset($modelTokens[$token])) {
                continue;
            }
            if (isset($accessoryTokens[$token]) || preg_match('/^\d{1,5}$/u', $token) === 1) {
                return true;
            }
        }
        return false;
    }
    private function guardWeakReferentialShopQueryWithHistoryModelAnchor(
--- a/src/Command/AgentEvalRunCommand.php
+++ b/src/Command/AgentEvalRunCommand.php
@@ -37,7 +37,7 @@ final class AgentEvalRunCommand extends Command
            ->addArgument(
                'type',
                InputArgument::OPTIONAL,
-                'Eval type to run',
+                'Eval type to run (retrieval, shop_query, followup, answer_guard)',
                'retrieval'
            )
            ->addOption(
--- a/src/Config/NdjsonHybridRetrieverConfig.php
+++ b/src/Config/NdjsonHybridRetrieverConfig.php
@@ -118,6 +118,11 @@ final class NdjsonHybridRetrieverConfig
        return $this->requiredInt('exact_document_max_chunks', 1);
    }
    public function queryCleanupProfile(): string
    {
        return $this->requiredString('query_cleanup_profile');
    }
    public function focusedProductWindow(): int
    {
        return $this->requiredInt('focused_product_window', 1);
@@ -350,6 +355,7 @@ final class NdjsonHybridRetrieverConfig
            'dominant_doc_min_hits' => $this->dominantDocMinHits(),
            'dominant_doc_max_chunks' => $this->dominantDocMaxChunks(),
            'exact_document_max_chunks' => $this->exactDocumentMaxChunks(),
            'query_cleanup_profile' => $this->queryCleanupProfile(),
            'focused_product_window' => $this->focusedProductWindow(),
            'focused_product_min_score' => $this->focusedProductMinScore(),
            'focused_product_min_gap' => $this->focusedProductMinGap(),
--- a/src/Config/RetriexEffectiveConfigProvider.php
+++ b/src/Config/RetriexEffectiveConfigProvider.php
@@ -49,7 +49,6 @@ final readonly class RetriexEffectiveConfigProvider
            'llm' => [
                'timeout_seconds' => $this->param('retriex.llm.timeout_seconds'),
                'num_predict' => $this->param('retriex.llm.num_predict'),
                'call_models' => $this->param('retriex.llm.call_models'),
            ],
            'retrieval' => $this->retrievalConfig(),
            'prompt' => $this->promptConfig(),
@@ -86,7 +85,6 @@ final readonly class RetriexEffectiveConfigProvider
        $this->validateRuntime($config['runtime'], $errors, $warnings);
        $this->validateIndex($config['index'], $errors, $warnings);
        $this->validateModel($config['model_generation'], $errors, $warnings);
        $this->validateLlm($config['llm'], $errors, $warnings);
        $this->validateRetrieval($config['retrieval'], $errors, $warnings);
        $this->validatePrompt($config['prompt'], $errors, $warnings);
        $this->validateAgent($config['agent'], $errors, $warnings);
@@ -1716,46 +1714,6 @@ final readonly class RetriexEffectiveConfigProvider
        }
    }
    /**
     * @param array<string, mixed> $llm
     * @param list<string> $errors
     * @param list<string> $warnings
     */
    private function validateLlm(array $llm, array &$errors, array &$warnings): void
    {
        $callModels = $llm['call_models'] ?? [];
        if (!is_array($callModels)) {
            $errors[] = 'llm.call_models must be a map.';
            return;
        }
        $knownCalls = [
            'input_normalization',
            'shop_query_optimization',
            'final_answer',
        ];
        foreach ($callModels as $callName => $modelName) {
            if (!is_string($callName) || trim($callName) === '') {
                $errors[] = 'llm.call_models contains an invalid call name.';
                continue;
            }
            if (!in_array($callName, $knownCalls, true)) {
                $warnings[] = 'llm.call_models contains an unknown call name: ' . $callName . '.';
            }
            if ($modelName !== null && !is_string($modelName)) {
                $errors[] = 'llm.call_models.' . $callName . ' must be null or a string model name.';
                continue;
            }
            if (is_string($modelName) && trim($modelName) === '') {
                $warnings[] = 'llm.call_models.' . $callName . ' is empty and will use the default model.';
            }
        }
    }
    /**
     * @param array<string, mixed> $retrieval
     * @param list<string> $errors
@@ -1782,6 +1740,13 @@ final readonly class RetriexEffectiveConfigProvider
            $errors[] = 'retrieval.generic_exact_selection_cleanup_profile references unknown language cleanup profile: ' . trim($cleanupProfile) . '.';
        }
        $queryCleanupProfile = $retrieval['query_cleanup_profile'] ?? null;
        if (!is_string($queryCleanupProfile) || trim($queryCleanupProfile) === '') {
            $errors[] = 'retrieval.query_cleanup_profile must be a non-empty string.';
        } elseif (!in_array(trim($queryCleanupProfile), $this->languageCleanupConfig->getCleanupProfileNames(), true)) {
            $errors[] = 'retrieval.query_cleanup_profile references unknown language cleanup profile: ' . trim($queryCleanupProfile) . '.';
        }
        $this->validateStringListMap($retrieval['vocabulary'] ?? [], 'retrieval.vocabulary', $errors, $warnings);
        $inventory = $retrieval['inventory_parameter'] ?? [];
--- a/src/Controller/Admin/AdminEvalController.php
+++ b/src/Controller/Admin/AdminEvalController.php
@@ -0,0 +1,192 @@
 <?php
 declare(strict_types=1);
 namespace App\Controller\Admin;
 use App\Security\ApplicationRoles;
 use App\Service\Admin\EvalAdminService;
 use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
 use Symfony\Component\HttpFoundation\Request;
 use Symfony\Component\HttpFoundation\Response;
 use Symfony\Component\Routing\Attribute\Route;
 #[Route('/admin/evals')]
 final class AdminEvalController extends AbstractController
 {
    #[Route('/', name: 'admin_evals_index', methods: ['GET'])]
    public function index(Request $request, EvalAdminService $evals): Response
    {
        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
        $selectedType = trim((string) $request->query->get('type', ''));
        if ($selectedType === '' || !in_array($selectedType, $evals->supportedTypeNames(), true)) {
            $selectedType = 'retrieval';
        }
        return $this->render('admin/evals/index.html.twig', [
            'types' => $evals->supportedTypes(),
            'overview' => $evals->overview(),
            'cases_by_type' => $evals->casesByType(),
            'selected_type' => $selectedType,
            'selected_report' => $evals->readTypeReport($selectedType),
            'last_report' => $evals->readLastReport(),
        ]);
    }
    #[Route('/run', name: 'admin_evals_run', methods: ['POST'])]
    public function run(Request $request, EvalAdminService $evals): Response
    {
        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
        if (!$this->isCsrfTokenValid('admin_eval_run', (string) $request->request->get('_token'))) {
            throw $this->createAccessDeniedException();
        }
        $type = trim((string) $request->request->get('type', 'retrieval'));
        $caseId = trim((string) $request->request->get('case_id', ''));
        try {
            $report = $evals->run($type, $caseId !== '' ? $caseId : null);
            $type = trim((string) ($report['type'] ?? $type));
            $this->addFlash(
                ((int) ($report['failed'] ?? 0)) === 0 ? 'success' : 'danger',
                sprintf(
                    'Eval %s abgeschlossen: %d/%d bestanden.',
                    $type,
                    (int) ($report['passed'] ?? 0),
                    (int) ($report['total'] ?? 0)
                )
            );
        } catch (\Throwable $e) {
            $this->addFlash('danger', $e->getMessage());
        }
        return $this->redirectToRoute('admin_evals_index', [
            'type' => $type,
        ]);
    }
    #[Route('/cases/new', name: 'admin_evals_case_new', methods: ['GET'])]
    public function newCase(Request $request, EvalAdminService $evals): Response
    {
        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
        $type = trim((string) $request->query->get('type', 'retrieval'));
        if (!in_array($type, $evals->supportedTypeNames(), true)) {
            $type = 'retrieval';
        }
        $sourceType = trim((string) $request->query->get('source_type', ''));
        $sourceCaseId = trim((string) $request->query->get('source_case_id', ''));
        try {
            $draft = $sourceType !== '' && $sourceCaseId !== ''
                ? $evals->caseDraftFromReportResult($sourceType, $sourceCaseId)
                : $evals->emptyCaseDraft($type);
        } catch (\Throwable $e) {
            $this->addFlash('warning', $e->getMessage());
            $draft = $evals->emptyCaseDraft($type);
        }
        return $this->render('admin/evals/case_new.html.twig', [
            'types' => $evals->supportedTypes(),
            'cases_by_type' => $evals->casesByType(),
            'case_draft' => $draft,
        ]);
    }
    #[Route('/cases', name: 'admin_evals_case_create', methods: ['POST'])]
    public function createCase(Request $request, EvalAdminService $evals): Response
    {
        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
        if (!$this->isCsrfTokenValid('admin_eval_case_create', (string) $request->request->get('_token'))) {
            throw $this->createAccessDeniedException();
        }
        $type = trim((string) $request->request->get('type', 'retrieval'));
        $draft = [
            'type' => $type,
            'id' => (string) $request->request->get('id', ''),
            'prompt' => (string) $request->request->get('prompt', ''),
            'assert_json' => (string) $request->request->get('assert_json', ''),
            'history_json' => (string) $request->request->get('history_json', ''),
            'request_context_hint' => (string) $request->request->get('request_context_hint', ''),
            'source_label' => '',
        ];
        try {
            $created = $evals->createCase(
                type: $type,
                id: (string) $request->request->get('id', ''),
                prompt: (string) $request->request->get('prompt', ''),
                assertJson: (string) $request->request->get('assert_json', ''),
                historyJson: (string) $request->request->get('history_json', ''),
                requestContextHint: (string) $request->request->get('request_context_hint', ''),
            );
            $type = (string) ($created['type'] ?? $type);
            $this->addFlash(
                'success',
                sprintf('Eval-Case "%s" wurde in %s.ndjson gespeichert.', (string) ($created['id'] ?? ''), $type)
            );
            return $this->redirectToRoute('admin_evals_index', [
                'type' => $type,
            ]);
        } catch (\Throwable $e) {
            $this->addFlash('danger', $e->getMessage());
        }
        if (!in_array($type, $evals->supportedTypeNames(), true)) {
            $draft['type'] = 'retrieval';
        }
        return $this->render('admin/evals/case_new.html.twig', [
            'types' => $evals->supportedTypes(),
            'cases_by_type' => $evals->casesByType(),
            'case_draft' => $draft,
        ], new Response('', Response::HTTP_UNPROCESSABLE_ENTITY));
    }
    #[Route('/cases/delete', name: 'admin_evals_case_delete', methods: ['POST'])]
    public function deleteCase(Request $request, EvalAdminService $evals): Response
    {
        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
        $type = trim((string) $request->request->get('type', 'retrieval'));
        $caseId = trim((string) $request->request->get('case_id', ''));
        if (!$this->isCsrfTokenValid(
            sprintf('admin_eval_case_delete_%s_%s', $type, $caseId),
            (string) $request->request->get('_token')
        )) {
            throw $this->createAccessDeniedException();
        }
        try {
            $deleted = $evals->deleteCase($type, $caseId);
            $type = (string) ($deleted['type'] ?? $type);
            $this->addFlash(
                'success',
                sprintf('Eval-Case "%s" wurde aus %s.ndjson entfernt.', (string) ($deleted['id'] ?? $caseId), $type)
            );
        } catch (\Throwable $e) {
            $this->addFlash('danger', $e->getMessage());
        }
        if (!in_array($type, $evals->supportedTypeNames(), true)) {
            $type = 'retrieval';
        }
        return $this->redirectToRoute('admin_evals_case_new', [
            'type' => $type,
        ]);
    }
 }
--- a/src/Eval/AgentEvalRunner.php
+++ b/src/Eval/AgentEvalRunner.php
@@ -11,6 +11,8 @@ final readonly class AgentEvalRunner
 {
    public function __construct(
        private RetrievalDebugRunner $retrievalDebugRunner,
        private ShopQueryEvalRunner $shopQueryEvalRunner,
        private AnswerGuardEvalRunner $answerGuardEvalRunner,
    ) {
    }
@@ -20,6 +22,14 @@ final readonly class AgentEvalRunner
            return $this->retrievalDebugRunner->run($case);
        }
        if ($case->isShopQueryCase() || $case->isFollowUpCase()) {
            return $this->shopQueryEvalRunner->run($case);
        }
        if ($case->isAnswerGuardCase()) {
            return $this->answerGuardEvalRunner->run($case);
        }
        throw new \InvalidArgumentException(sprintf(
            'Unsupported eval case type: %s',
            $case->type
--- a/src/Eval/AnswerGuardEvalRunner.php
+++ b/src/Eval/AnswerGuardEvalRunner.php
@@ -0,0 +1,32 @@
 <?php
 declare(strict_types=1);
 namespace App\Eval;
 use App\Eval\Dto\EvalCase;
 use App\Eval\Dto\EvalResult;
 final readonly class AnswerGuardEvalRunner
 {
    public function __construct(
        private RetrievalDebugRunner $retrievalDebugRunner,
    ) {
    }
    public function run(EvalCase $case): EvalResult
    {
        $result = $this->retrievalDebugRunner->run($case);
        $details = $result->details;
        $details['guard_scope'] = 'retrieval_evidence_pre_answer';
        return new EvalResult(
            caseId: $result->caseId,
            type: $case->type,
            passed: $result->passed,
            durationMs: $result->durationMs,
            failures: $result->failures,
            details: $details,
        );
    }
 }
--- a/src/Eval/Dto/EvalCase.php
+++ b/src/Eval/Dto/EvalCase.php
@@ -8,12 +8,15 @@ final readonly class EvalCase
 {
    /**
     * @param array<string, mixed> $assert
     * @param array<int, array{prompt:string,answer:string}> $history
     */
    public function __construct(
        public string $id,
        public string $type,
        public string $prompt,
        public array $assert = [],
        public array $history = [],
        public string $requestContextHint = '',
    ) {
    }
@@ -26,6 +29,8 @@ final readonly class EvalCase
        $type = trim((string) ($row['type'] ?? ''));
        $prompt = trim((string) ($row['prompt'] ?? ''));
        $assert = is_array($row['assert'] ?? null) ? $row['assert'] : [];
        $history = self::normalizeHistory($row['history'] ?? []);
        $requestContextHint = trim((string) ($row['request_context_hint'] ?? ''));
        if ($id === '') {
            throw new \InvalidArgumentException('Eval case id must not be empty.');
@@ -50,6 +55,8 @@ final readonly class EvalCase
            type: $type,
            prompt: $prompt,
            assert: $assert,
            history: $history,
            requestContextHint: $requestContextHint,
        );
    }
@@ -57,4 +64,64 @@ final readonly class EvalCase
    {
        return $this->type === 'retrieval';
    }
    public function isShopQueryCase(): bool
    {
        return $this->type === 'shop_query';
    }
    public function isFollowUpCase(): bool
    {
        return $this->type === 'followup';
    }
    public function isAnswerGuardCase(): bool
    {
        return $this->type === 'answer_guard';
    }
    /**
     * @return array<int, array{prompt:string,answer:string}>
     */
    private static function normalizeHistory(mixed $value): array
    {
        if (!is_array($value)) {
            return [];
        }
        $history = [];
        foreach ($value as $entry) {
            if (is_string($entry)) {
                $entry = trim($entry);
                if ($entry !== '') {
                    $history[] = [
                        'prompt' => 'Eval-Kontext',
                        'answer' => $entry,
                    ];
                }
                continue;
            }
            if (!is_array($entry)) {
                continue;
            }
            $prompt = trim((string) ($entry['prompt'] ?? ''));
            $answer = trim((string) ($entry['answer'] ?? $entry['response'] ?? ''));
            if ($prompt === '' && $answer === '') {
                continue;
            }
            $history[] = [
                'prompt' => $prompt !== '' ? $prompt : 'Eval-Kontext',
                'answer' => $answer,
            ];
        }
        return $history;
    }
 }
--- a/src/Eval/RetrievalDebugRunner.php
+++ b/src/Eval/RetrievalDebugRunner.php
@@ -33,6 +33,8 @@ final readonly class RetrievalDebugRunner
        $documentIds = $this->extractUniqueStringValues($rows, 'document_id');
        $chunkIds = $this->extractUniqueStringValues($rows, 'chunk_id');
        $documentRefs = $this->buildDocumentRefs($rows);
        $resultRows = $this->buildResultRows($rows);
        $joinedText = $this->extractJoinedText($rows);
        $assert = $case->assert;
@@ -187,6 +189,25 @@ final readonly class RetrievalDebugRunner
            }
        }
        $forbiddenTerms = $this->normalizeStringList($assert['must_not_include_terms'] ?? []);
        foreach ($forbiddenTerms as $forbiddenTerm) {
            if ($this->containsTerm($joinedText, $forbiddenTerm)) {
                $failures[] = sprintf(
                    'forbidden term "%s" was present in the retrieval text.',
                    $forbiddenTerm
                );
            }
        }
        foreach ($this->normalizeStringList($assert['must_not_match_patterns'] ?? []) as $pattern) {
            if (@preg_match($pattern, $joinedText) === 1) {
                $failures[] = sprintf(
                    'forbidden pattern "%s" matched the retrieval text.',
                    $pattern
                );
            }
        }
        return new EvalResult(
            caseId: $case->id,
            type: $case->type,
@@ -201,8 +222,11 @@ final readonly class RetrievalDebugRunner
                'intent' => $intent,
                'document_ids' => $documentIds,
                'chunk_ids' => $chunkIds,
                'document_refs' => $documentRefs,
                'result_rows' => $resultRows,
                'matched_any_terms' => $matchedAnyTerms,
                'matched_all_terms' => $matchedAllTerms,
                'forbidden_terms_checked' => $forbiddenTerms,
            ],
        );
    }
@@ -248,6 +272,122 @@ final readonly class RetrievalDebugRunner
        return array_keys($values);
    }
    /**
     * @param array<int, array<string, mixed>> $rows
     * @return array<int, array{id:string,title:string,file_path:string,version_number:string,chunk_ids:array<int,string>,ranks:array<int,int>}>
     */
    private function buildDocumentRefs(array $rows): array
    {
        $refs = [];
        foreach ($rows as $row) {
            $documentId = $this->extractNullableString($row, 'document_id');
            if ($documentId === '') {
                continue;
            }
            if (!isset($refs[$documentId])) {
                $refs[$documentId] = [
                    'id' => $documentId,
                    'title' => $this->extractNullableString($row, 'document_title'),
                    'file_path' => $this->extractNullableString($row, 'file_path'),
                    'version_number' => $this->extractNullableString($row, 'version_number'),
                    'chunk_ids' => [],
                    'ranks' => [],
                ];
            }
            $chunkId = $this->extractNullableString($row, 'chunk_id');
            if ($chunkId !== '' && !in_array($chunkId, $refs[$documentId]['chunk_ids'], true)) {
                $refs[$documentId]['chunk_ids'][] = $chunkId;
            }
            $rank = $this->extractNullableInt($row, 'rank');
            if ($rank !== null && !in_array($rank, $refs[$documentId]['ranks'], true)) {
                $refs[$documentId]['ranks'][] = $rank;
            }
        }
        return array_values($refs);
    }
    /**
     * @param array<int, array<string, mixed>> $rows
     * @return array<int, array<string, mixed>>
     */
    private function buildResultRows(array $rows): array
    {
        $out = [];
        foreach ($rows as $row) {
            $out[] = [
                'rank' => $this->extractNullableInt($row, 'rank'),
                'document_id' => $this->extractNullableString($row, 'document_id'),
                'document_title' => $this->extractNullableString($row, 'document_title'),
                'file_path' => $this->extractNullableString($row, 'file_path'),
                'chunk_id' => $this->extractNullableString($row, 'chunk_id'),
                'chunk_index' => $this->extractNullableInt($row, 'chunk_index'),
                'raw_score' => $row['raw_score'] ?? null,
                'rrf_score' => $row['rrf_score'] ?? null,
                'text_preview' => $this->previewText($this->extractNullableString($row, 'text')),
            ];
        }
        return $out;
    }
    /**
     * @param array<string, mixed> $row
     */
    private function extractNullableString(array $row, string $key): string
    {
        $value = $row[$key] ?? null;
        if ($value === null || is_array($value) || is_object($value)) {
            return '';
        }
        return trim((string)$value);
    }
    /**
     * @param array<string, mixed> $row
     */
    private function extractNullableInt(array $row, string $key): ?int
    {
        $value = $row[$key] ?? null;
        if ($value === null || $value === '') {
            return null;
        }
        if (is_int($value)) {
            return $value;
        }
        if (is_string($value) && preg_match('/^-?\d+$/', trim($value)) === 1) {
            return (int)$value;
        }
        return null;
    }
    private function previewText(string $text, int $limit = 240): string
    {
        $text = preg_replace('/\s+/u', ' ', trim($text)) ?? trim($text);
        if ($text === '') {
            return '';
        }
        if (mb_strlen($text, 'UTF-8') <= $limit) {
            return $text;
        }
        return mb_substr($text, 0, $limit, 'UTF-8') . '...';
    }
    /**
     * @param array<int, array<string, mixed>> $rows
     */
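Review note on the pattern assertions in this file: `preg_match()` returns `false` when a pattern fails to compile, and the `@` operator hides the warning. A check of the form `@preg_match($pattern, $joinedText) === 1` therefore treats a malformed `must_not_match_patterns` entry as "no match". A minimal sketch of a helper that keeps the three outcomes apart is shown here. The function name `matchPatternOrNull` is hypothetical and is not part of this change.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not in the codebase): true/false for a regular
 * match result, null when the pattern itself cannot be compiled.
 */
function matchPatternOrNull(string $pattern, string $text): ?bool
{
    // preg_match() yields 1 (match), 0 (no match) or false (error).
    $result = @preg_match($pattern, $text);
    if ($result === false) {
        return null;
    }

    return $result === 1;
}

$matched = matchPatternOrNull('/808/', 'testomat 808');
$notMatched = matchPatternOrNull('/indikator/', 'testomat 808');
$invalid = matchPatternOrNull('/[unclosed/', 'testomat 808');
```

A caller could turn the `null` case into its own failure message, so that a typo in an eval case is reported instead of being counted as a pass.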
--- a/src/Eval/ShopQueryEvalRunner.php
+++ b/src/Eval/ShopQueryEvalRunner.php
@@ -0,0 +1,389 @@
 <?php
 declare(strict_types=1);
 namespace App\Eval;
 use App\Agent\AgentRunner;
 use App\Context\ContextService;
 use App\Eval\Dto\EvalCase;
 use App\Eval\Dto\EvalResult;
 final readonly class ShopQueryEvalRunner
 {
    public function __construct(
        private AgentRunner $agentRunner,
        private ContextService $contextService,
    ) {
    }
    public function run(EvalCase $case): EvalResult
    {
        $start = microtime(true);
        $failures = [];
        $userId = $this->buildUserId($case);
        $transcript = '';
        $shopMeta = null;
        $this->contextService->deleteHistory($userId);
        $this->seedHistory($userId, $case->history);
        try {
            foreach ($this->agentRunner->run($case->prompt, $userId, false, $case->requestContextHint) as $chunk) {
                if (!is_string($chunk) || $chunk === '') {
                    continue;
                }
                $transcript .= $chunk . "\n";
                if (!str_contains($chunk, 'retriex-shop-meta')) {
                    if (mb_strlen($transcript, 'UTF-8') > 120000) {
                        $transcript = mb_substr($transcript, -120000, null, 'UTF-8');
                    }
                    continue;
                }
                $shopMeta = $this->extractShopMeta($chunk);
                break;
            }
        } catch (\Throwable $e) {
            $failures[] = sprintf('agent run failed before shop-query meta was emitted: %s', $e->getMessage());
        } finally {
            $this->contextService->deleteHistory($userId);
        }
        $durationMs = round((microtime(true) - $start) * 1000, 2);
        if ($shopMeta === null) {
            $failures[] = 'no shop-query meta message was emitted before the runner stopped.';
            $shopMeta = [
                'query' => '',
                'individual_queries' => [],
                'raw_html' => '',
            ];
        }
        $this->assertShopQuery($failures, $case, $shopMeta);
        return new EvalResult(
            caseId: $case->id,
            type: $case->type,
            passed: $failures === [],
            durationMs: $durationMs,
            failures: $failures,
            details: [
                'prompt' => $case->prompt,
                'history_turns' => count($case->history),
                'history' => $this->buildHistoryPreview($case->history),
                'has_request_context_hint' => $case->requestContextHint !== '',
                'query' => $shopMeta['query'],
                'individual_queries' => $shopMeta['individual_queries'],
                'transcript_preview' => $this->previewText($transcript),
            ],
        );
    }
    /**
     * @param array<int, array{prompt:string,answer:string}> $history
     * @return array<int, array{prompt:string,answer_preview:string}>
     */
    private function buildHistoryPreview(array $history): array
    {
        $preview = [];
        foreach ($history as $turn) {
            $prompt = trim((string) ($turn['prompt'] ?? ''));
            $answer = trim((string) ($turn['answer'] ?? ''));
            if ($prompt === '' && $answer === '') {
                continue;
            }
            $preview[] = [
                'prompt' => $prompt !== '' ? $prompt : 'Eval-Kontext',
                'answer_preview' => $this->previewText($answer, 260),
            ];
        }
        return $preview;
    }
    private function buildUserId(EvalCase $case): string
    {
        $safeId = preg_replace('/[^a-zA-Z0-9_-]+/', '_', $case->id) ?? $case->id;
        $safeId = trim($safeId, '_');
        return 'eval_' . ($safeId !== '' ? $safeId : sha1($case->id));
    }
    /**
     * @param array<int, array{prompt:string,answer:string}> $history
     */
    private function seedHistory(string $userId, array $history): void
    {
        foreach ($history as $turn) {
            $prompt = trim($turn['prompt'] ?? '');
            $answer = trim($turn['answer'] ?? '');
            if ($prompt === '' && $answer === '') {
                continue;
            }
            if ($prompt === '') {
                $prompt = 'Eval-Kontext';
            }
            $this->contextService->appendHistory($userId, $prompt, $answer);
        }
    }
    /**
     * @return array{query:string,individual_queries:array<int,string>,raw_html:string}
     */
    private function extractShopMeta(string $html): array
    {
        $isMultiQuery = str_contains($html, 'retriex-meta-query--multi');
        $codes = [];
        if (preg_match_all('/<code>(.*?)<\/code>/su', $html, $matches) !== false) {
            foreach ($matches[1] ?? [] as $value) {
                $decoded = html_entity_decode(strip_tags((string) $value), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
                $decoded = $this->normalizeOneLine($decoded);
                if ($decoded !== '') {
                    $codes[] = $decoded;
                }
            }
        }
        $codes = array_values(array_unique($codes));
        if ($isMultiQuery) {
            return [
                'query' => '',
                'individual_queries' => $codes,
                'raw_html' => $html,
            ];
        }
        return [
            'query' => $codes[0] ?? '',
            'individual_queries' => [],
            'raw_html' => $html,
        ];
    }
    /**
     * @param array<int, string> $failures
     * @param array{query:string,individual_queries:array<int,string>,raw_html:string} $shopMeta
     */
    private function assertShopQuery(array &$failures, EvalCase $case, array $shopMeta): void
    {
        $assert = $case->assert;
        $query = $shopMeta['query'];
        $individualQueries = $shopMeta['individual_queries'];
        $joined = trim($query . ' ' . implode(' ', $individualQueries));
        $expectedQuery = $this->stringOrNull($assert['expected_query'] ?? null);
        if ($expectedQuery !== null && $this->normalizeQuery($query) !== $this->normalizeQuery($expectedQuery)) {
            $failures[] = sprintf(
                'shop query mismatch: expected "%s", got "%s".',
                $expectedQuery,
                $query
            );
        }
        $forbiddenExactQuery = $this->stringOrNull($assert['must_not_equal_query'] ?? null);
        if ($forbiddenExactQuery !== null && $this->normalizeQuery($query) === $this->normalizeQuery($forbiddenExactQuery)) {
            $failures[] = sprintf('shop query must not equal "%s".', $forbiddenExactQuery);
        }
        $expectedIndividualQueries = $this->normalizeStringList($assert['expected_individual_queries'] ?? []);
        if ($expectedIndividualQueries !== []) {
            foreach ($expectedIndividualQueries as $expectedIndividualQuery) {
                if (!$this->containsNormalizedQuery($individualQueries, $expectedIndividualQuery)) {
                    $failures[] = sprintf(
                        'missing expected individual shop query "%s". Got [%s].',
                        $expectedIndividualQuery,
                        implode(', ', $individualQueries)
                    );
                }
            }
        }
        if (($assert['expected_individual_queries_exact'] ?? false) === true) {
            $expected = array_map(fn(string $value): string => $this->normalizeQuery($value), $expectedIndividualQueries);
            $actual = array_map(fn(string $value): string => $this->normalizeQuery($value), $individualQueries);
            sort($expected);
            sort($actual);
            if ($expected !== $actual) {
                $failures[] = sprintf(
                    'individual shop queries differ from expected exact set. Expected [%s], got [%s].',
                    implode(', ', $expectedIndividualQueries),
                    implode(', ', $individualQueries)
                );
            }
        }
        if (isset($assert['min_individual_queries']) && count($individualQueries) < (int) $assert['min_individual_queries']) {
            $failures[] = sprintf(
                'too few individual shop queries: expected >= %d, got %d.',
                (int) $assert['min_individual_queries'],
                count($individualQueries)
            );
        }
        if (isset($assert['max_individual_queries']) && count($individualQueries) > (int) $assert['max_individual_queries']) {
            $failures[] = sprintf(
                'too many individual shop queries: expected <= %d, got %d.',
                (int) $assert['max_individual_queries'],
                count($individualQueries)
            );
        }
        foreach ($this->normalizeStringList($assert['must_include_terms'] ?? []) as $term) {
            if (!$this->containsTerm($joined, $term)) {
                $failures[] = sprintf('shop query output does not contain required term "%s".', $term);
            }
        }
        $requiredAnyTerms = $this->normalizeStringList($assert['must_include_any_terms'] ?? []);
        if ($requiredAnyTerms !== []) {
            $matched = false;
            foreach ($requiredAnyTerms as $term) {
                if ($this->containsTerm($joined, $term)) {
                    $matched = true;
                    break;
                }
            }
            if (!$matched) {
                $failures[] = sprintf(
                    'shop query output contains none of the required any-terms: [%s].',
                    implode(', ', $requiredAnyTerms)
                );
            }
        }
        foreach ($this->normalizeStringList($assert['must_not_include_terms'] ?? []) as $term) {
            if ($this->containsTerm($joined, $term)) {
                $failures[] = sprintf('shop query output contains forbidden term "%s".', $term);
            }
        }
        foreach ($this->normalizeStringList($assert['query_must_match_patterns'] ?? []) as $pattern) {
            if (@preg_match($pattern, $joined) !== 1) {
                $failures[] = sprintf('shop query output does not match required pattern "%s".', $pattern);
            }
        }
        foreach ($this->normalizeStringList($assert['query_must_not_match_patterns'] ?? []) as $pattern) {
            if (@preg_match($pattern, $joined) === 1) {
                $failures[] = sprintf('shop query output matches forbidden pattern "%s".', $pattern);
            }
        }
    }
    /**
     * @param array<int, string> $queries
     */
    private function containsNormalizedQuery(array $queries, string $needle): bool
    {
        $needle = $this->normalizeQuery($needle);
        foreach ($queries as $query) {
            if ($this->normalizeQuery($query) === $needle) {
                return true;
            }
        }
        return false;
    }
    private function containsTerm(string $haystack, string $term): bool
    {
        $haystack = $this->normalizeText($haystack);
        $term = $this->normalizeText($term);
        return $term !== '' && str_contains($haystack, $term);
    }
    private function normalizeQuery(string $value): string
    {
        $value = $this->normalizeText($value);
        $value = preg_replace('/[^\p{L}\p{N}]+/u', ' ', $value) ?? $value;
        $value = preg_replace('/\s+/u', ' ', $value) ?? $value;
        return trim($value);
    }
    private function normalizeText(string $value): string
    {
        $value = html_entity_decode(strip_tags($value), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        $value = mb_strtolower(trim($value), 'UTF-8');
        $value = preg_replace('/\s+/u', ' ', $value) ?? $value;
        return trim($value);
    }
    private function normalizeOneLine(string $value): string
    {
        $value = trim($value);
        $value = preg_replace('/\s+/u', ' ', $value) ?? $value;
        return trim($value);
    }
    private function stringOrNull(mixed $value): ?string
    {
        if (!is_string($value)) {
            return null;
        }
        $value = trim($value);
        return $value !== '' ? $value : null;
    }
    /**
     * @return array<int, string>
     */
    private function normalizeStringList(mixed $value): array
    {
        if (!is_array($value)) {
            return [];
        }
        $out = [];
        foreach ($value as $item) {
            if (!is_string($item)) {
                continue;
            }
            $item = trim($item);
            if ($item === '') {
                continue;
            }
            $out[] = $item;
        }
        return array_values(array_unique($out));
    }
    private function previewText(string $value, int $maxLength = 1200): string
    {
        $value = $this->normalizeOneLine($value);
        $maxLength = max(40, $maxLength);
        if (mb_strlen($value, 'UTF-8') <= $maxLength) {
            return $value;
        }
        return rtrim(mb_substr($value, 0, $maxLength, 'UTF-8')) . '...';
    }
 }
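Review note: `expected_query` is compared through `normalizeQuery()`, while the term checks only go through `normalizeText()`. The standalone sketch below mirrors both methods as free functions to show the difference. The `...Sketch` function names are illustrative only, and the sample strings are assumptions rather than real eval data.

```php
<?php
declare(strict_types=1);

// Mirrors normalizeText(): strip markup, decode entities, lowercase, collapse whitespace.
function normalizeTextSketch(string $value): string
{
    $value = html_entity_decode(strip_tags($value), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    $value = mb_strtolower(trim($value), 'UTF-8');
    $value = preg_replace('/\s+/u', ' ', $value) ?? $value;

    return trim($value);
}

// Mirrors normalizeQuery(): additionally replaces every run of non-letter, non-digit characters with a space.
function normalizeQuerySketch(string $value): string
{
    $value = normalizeTextSketch($value);
    $value = preg_replace('/[^\p{L}\p{N}]+/u', ' ', $value) ?? $value;
    $value = preg_replace('/\s+/u', ' ', $value) ?? $value;

    return trim($value);
}

$expected = normalizeQuerySketch('Testomat-808');
$actual = normalizeQuerySketch('<code>TESTOMAT  808</code>');

// Term checks keep punctuation, so a hyphenated term does not match a spaced query.
$termFound = str_contains(normalizeTextSketch('testomat 808'), normalizeTextSketch('Testomat-808'));
```

In practice this means `expected_query` tolerates differences in case, markup, and punctuation, whereas entries in `must_include_terms` and `must_not_include_terms` should be written with the same punctuation the shop query is expected to contain.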
--- a/src/Knowledge/Retrieval/NdjsonChunkLookup.php
+++ b/src/Knowledge/Retrieval/NdjsonChunkLookup.php
@@ -357,7 +357,11 @@ final readonly class NdjsonChunkLookup
                continue;
            }
-            if (mb_strlen($token, 'UTF-8') < 3 && preg_match('/\d/u', $token) !== 1) {
+            if (
                mb_strlen($token, 'UTF-8') < 3
                && preg_match('/\d/u', $token) !== 1
                && !$this->isImportantShortTitleToken($token)
            ) {
                continue;
            }
@@ -367,6 +371,15 @@ final readonly class NdjsonChunkLookup
        return array_values(array_unique($out));
    }
    private function isImportantShortTitleToken(string $token): bool
    {
        if ($token === '' || mb_strlen($token, 'UTF-8') >= 3) {
            return false;
        }
        return in_array($token, $this->retrieverConfig->importantShortModelTokens(), true);
    }
    /**
     * @return array<string,bool>
     */
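Review note: the changed condition keeps a token shorter than three characters if it contains a digit or is listed in `importantShortModelTokens()`. A minimal sketch of that rule as a free function follows. The function name and the sample token list are assumptions, since the configured values are not visible in this diff.

```php
<?php
declare(strict_types=1);

/**
 * @param string[] $tokens
 * @param string[] $importantShortTokens hypothetical stand-in for
 *                 $this->retrieverConfig->importantShortModelTokens()
 * @return string[]
 */
function filterTitleTokensSketch(array $tokens, array $importantShortTokens): array
{
    $out = [];

    foreach ($tokens as $token) {
        $isShort = mb_strlen($token, 'UTF-8') < 3;
        $hasDigit = preg_match('/\d/u', $token) === 1;
        $isImportant = $isShort && $token !== '' && in_array($token, $importantShortTokens, true);

        // Drop short tokens unless a digit or the configured list rescues them.
        if ($isShort && !$hasDigit && !$isImportant) {
            continue;
        }

        $out[] = $token;
    }

    return array_values(array_unique($out));
}

// 'eh' is an invented example of a configured short model token.
$kept = filterTitleTokensSketch(['eh', 'ab', 'k2', 'testomat'], ['eh']);
```

With this sample input, `ab` is the only token that gets dropped: `k2` survives because of the digit and `eh` because it is on the configured list.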
--- a/src/Knowledge/Retrieval/NdjsonHybridRetriever.php
+++ b/src/Knowledge/Retrieval/NdjsonHybridRetriever.php
@@ -133,13 +133,17 @@ final readonly class NdjsonHybridRetriever implements RetrieverInterface
                continue;
            }
            $row = $result['rows'][$chunkId];
            $rank++;
            $out[] = [
                'rank' => $rank,
                'chunk_id' => $chunkId,
-                'document_id' => $result['rows'][$chunkId]['document_id'] ?? null,
+                'document_id' => $row['document_id'] ?? null,
-                'chunk_index' => $result['rows'][$chunkId]['chunk_index'] ?? null,
+                'document_title' => $this->extractDocumentTitle($row),
                'file_path' => $this->extractMetadataString($row, 'file_path'),
                'version_number' => $this->extractMetadataString($row, 'version_number'),
                'chunk_index' => $row['chunk_index'] ?? null,
                'raw_score' => $result['rawScores'][$chunkId] ?? null,
                'rrf_score' => $result['rrfScores'][$chunkId] ?? null,
                'threshold' => $result['threshold'],
@@ -148,7 +152,7 @@ final readonly class NdjsonHybridRetriever implements RetrieverInterface
                'entity_label' => $result['entityLabel'],
                'is_list_query' => $result['isListQuery'],
                'selection_mode' => $result['selectionMode'],
-                'text' => trim((string)$result['rows'][$chunkId]['text']),
+                'text' => trim((string)($row['text'] ?? '')),
            ];
        }
@@ -1683,6 +1687,20 @@ final readonly class NdjsonHybridRetriever implements RetrieverInterface
        return '';
    }
    /**
     * Extracts a scalar metadata value for debug/eval output.
     */
    private function extractMetadataString(array $row, string $key): string
    {
        $value = $row['metadata'][$key] ?? null;
        if (is_scalar($value)) {
            return trim((string)$value);
        }
        return '';
    }
    /**
     * Normalizes text for token-safe product comparisons.
     */
--- a/src/Knowledge/Retrieval/QueryCleaner.php
+++ b/src/Knowledge/Retrieval/QueryCleaner.php
@@ -5,13 +5,15 @@ declare(strict_types=1);
 namespace App\Knowledge\Retrieval;
 use App\Config\LanguageCleanupConfig;
 use App\Config\NdjsonHybridRetrieverConfig;
 use App\Knowledge\StopWords;
 final readonly class QueryCleaner
 {
    public function __construct(
        private StopWords $stopWords,
-        private LanguageCleanupConfig $languageCleanupConfig
+        private LanguageCleanupConfig $languageCleanupConfig,
        private NdjsonHybridRetrieverConfig $retrieverConfig
    ) {
    }
@@ -21,9 +23,8 @@ final readonly class QueryCleaner
     * Important:
     * - Unicode-safe
     * - Numbers are preserved
-     * - Negations are preserved
+     * - Negations are preserved by protected-term aware cleanup profiles
-     * - No aggressive token-length filtering
+     * - Stop words are resolved from the generic legacy list plus YAML cleanup profile terms
     * - Stop words are removed
     */
    public function clean(string $query): string
    {
@@ -31,49 +32,49 @@ final readonly class QueryCleaner
            return '';
        }
-        // 1. Convert to lowercase in a Unicode-safe way
+        $profile = $this->loadCleanupProfile();
        // 1. Convert to lowercase in a Unicode-safe way.
        $query = mb_strtolower($query, 'UTF-8');
-        // 2. Treat hyphens and slashes as word separators
+        // 2. Treat hyphens and slashes as word separators.
        $query = $this->languageCleanupConfig->replaceWordSeparatorsWithSpace($query);
-        // 3. Remove special characters, but keep:
+        // 3. Remove configured cleanup phrases before punctuation stripping.
-        //    - letters
+        $query = $this->removePhrases($query, $profile['phrases']);
-        //    - numbers
+
-        //    - other Unicode letters
+        // 4. Remove special characters, but keep letters, numbers and other Unicode letters.
        $query = preg_replace('/[^\p{L}\p{N}\s]/u', ' ', $query);
        if ($query === null) {
            return '';
        }
-        // 4. Normalize multiple whitespace characters
+        // 5. Normalize multiple whitespace characters.
        $query = preg_replace('/\s+/u', ' ', $query);
-        $query = trim($query);
+        $query = trim((string) $query);
        if ($query === '') {
            return '';
        }
        // 6. Tokenize the query.
        $tokens = preg_split('/\s+/u', $query);
        if ($tokens === false) {
            return '';
        }
        $profileTerms = array_fill_keys(array_merge($profile['stopwords'], $profile['meta_terms']), true);
        $cleanTokens = [];
        foreach ($tokens as $token) {
            $token = trim($token);
            if ($token === '') {
                continue;
            }
-            // Remove stop words
+            if ($this->stopWords->isStopWord($token) || isset($profileTerms[$token])) {
-            if ($this->stopWords->isStopWord($token)) {
                continue;
            }
@@ -86,4 +87,42 @@ final readonly class QueryCleaner
        return implode(' ', $cleanTokens);
    }
    /**
     * @return array{stopwords:string[], phrases:string[], meta_terms:string[], protected_terms:string[]}
     */
    private function loadCleanupProfile(): array
    {
        return $this->languageCleanupConfig->getCleanupProfile($this->retrieverConfig->queryCleanupProfile());
    }
    /**
     * @param string[] $phrases
     */
    private function removePhrases(string $query, array $phrases): string
    {
        foreach ($phrases as $phrase) {
            $phrase = trim(mb_strtolower($phrase, 'UTF-8'));
            if ($phrase === '') {
                continue;
            }
            $normalizedPhrase = $this->languageCleanupConfig->replaceWordSeparatorsWithSpace($phrase);
            $parts = preg_split('/\s+/u', $normalizedPhrase, -1, PREG_SPLIT_NO_EMPTY) ?: [];
            if ($parts === []) {
                continue;
            }
            $pattern = implode('\\s+', array_map(
                static fn (string $part): string => preg_quote($part, '/'),
                $parts
            ));
            $query = preg_replace('/(?<!\p{L})(?:' . $pattern . ')(?!\p{L})/u', ' ', $query) ?? $query;
        }
        return $query;
    }
 }
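Review note: `removePhrases()` builds one regex per configured phrase, allows any whitespace between the phrase words, and uses letter lookarounds so a phrase is not removed from the middle of a longer word. The sketch below reproduces that behavior as a free function. It assumes that `replaceWordSeparatorsWithSpace()` replaces hyphens and slashes with spaces, as the comment in `clean()` suggests, and it adds a final whitespace collapse for readability. The function name and sample phrases are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * @param string[] $phrases
 */
function removePhrasesSketch(string $query, array $phrases): string
{
    foreach ($phrases as $phrase) {
        $phrase = trim(mb_strtolower($phrase, 'UTF-8'));
        // Assumption: stands in for LanguageCleanupConfig::replaceWordSeparatorsWithSpace().
        $phrase = str_replace(['-', '/'], ' ', $phrase);

        $parts = preg_split('/\s+/u', $phrase, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        if ($parts === []) {
            continue;
        }

        // Quote each word and allow any run of whitespace between words.
        $pattern = implode('\\s+', array_map(
            static fn (string $part): string => preg_quote($part, '/'),
            $parts
        ));

        // The lookarounds reject matches that touch a letter on either side.
        $query = preg_replace('/(?<!\p{L})(?:' . $pattern . ')(?!\p{L})/u', ' ', $query) ?? $query;
    }

    // Not part of removePhrases(): collapse whitespace so the result is easy to read.
    return trim(preg_replace('/\s+/u', ' ', $query) ?? $query);
}

$cleaned = removePhrasesSketch('show   me the testomat test', ['Show-me', 'test']);
```

The standalone word `test` is removed while `testomat` stays intact. Because the lookarounds only inspect letters, a phrase directly followed by a digit would still be removed.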
--- a/src/Service/Admin/EvalAdminService.php
+++ b/src/Service/Admin/EvalAdminService.php
@@ -0,0 +1,774 @@
 <?php
 declare(strict_types=1);
 namespace App\Service\Admin;
 use App\Eval\AgentEvalRunner;
 use App\Eval\Dto\EvalCase;
 use App\Eval\Dto\EvalResult;
 use App\Eval\EvalCaseLoader;
 use App\Eval\EvalReportWriter;
 final readonly class EvalAdminService
 {
    /**
     * @var array<string, string>
     */
    private const TYPES = [
        'retrieval' => 'Retrieval',
        'shop_query' => 'Shopquery',
        'followup' => 'Follow-up',
        'answer_guard' => 'Answer-Guard',
    ];
    public function __construct(
        private EvalCaseLoader $caseLoader,
        private AgentEvalRunner $runner,
        private EvalReportWriter $reportWriter,
        private string $projectDir,
    ) {
    }
    /**
     * @return array<string, string>
     */
    public function supportedTypes(): array
    {
        return self::TYPES;
    }
    /**
     * @return array<int, string>
     */
    public function supportedTypeNames(): array
    {
        return array_keys(self::TYPES);
    }
    public function assertSupportedType(string $type): string
    {
        $type = trim($type);
        if (!array_key_exists($type, self::TYPES)) {
            throw new \InvalidArgumentException(sprintf('Unsupported eval type: %s', $type));
        }
        return $type;
    }
    /**
     * @return array<string, array<int, array{id:string,prompt:string,type:string}>>
     */
    public function casesByType(): array
    {
        $casesByType = [];
        foreach (array_keys(self::TYPES) as $type) {
            $casesByType[$type] = array_map(
                static fn (EvalCase $case): array => [
                    'id' => $case->id,
                    'type' => $case->type,
                    'prompt' => $case->prompt,
                ],
                $this->loadCases($type)
            );
        }
        return $casesByType;
    }
    /**
     * @return array<int, array<string, mixed>>
     */
    public function overview(): array
    {
        $overview = [];
        foreach (self::TYPES as $type => $label) {
            $cases = $this->loadCases($type);
            $report = $this->readTypeReport($type);
            $overview[] = [
                'type' => $type,
                'label' => $label,
                'case_count' => count($cases),
                'report' => $report,
                'status' => $this->statusFromReport($report),
            ];
        }
        return $overview;
    }
    /**
     * @return array<string, mixed>
     */
    public function run(string $type, ?string $caseId = null): array
    {
        $type = $this->assertSupportedType($type);
        $caseId = trim((string) $caseId);
        $cases = $this->loadCases($type);
        if ($caseId !== '') {
            $cases = $this->filterCasesById($cases, $caseId);
            if ($cases === []) {
                [$type, $cases] = $this->findCasesByIdAcrossTypes($caseId);
            }
        }
        if ($cases === []) {
            if ($caseId !== '') {
                throw new \RuntimeException(sprintf(
                    'Eval case "%s" was not found. Please select a case from the list for the chosen eval type.',
                    $caseId
                ));
            }
            throw new \RuntimeException(sprintf(
                'No eval cases available for eval type "%s".',
                $type
            ));
        }
        $results = $this->runner->runAll($cases);
        $report = $this->buildReport($type, $caseId !== '' ? $caseId : null, $results);
        $typeReportPath = $this->reportWriter->write($report, sprintf('%s-last-run.json', $type));
        $lastReportPath = $this->reportWriter->write($report);
        $report['written_to'] = $typeReportPath;
        $report['last_run_written_to'] = $lastReportPath;
        return $report;
    }
    /**
     * @return array{type:string,id:string,prompt:string,assert_json:string,history_json:string,request_context_hint:string,source_label:string}
     */
    public function emptyCaseDraft(string $type = 'retrieval'): array
    {
        $type = $this->assertSupportedType($type);
        return [
            'type' => $type,
            'id' => '',
            'prompt' => '',
            'assert_json' => $this->encodePrettyJson($this->defaultAssertForType($type)),
            'history_json' => '',
            'request_context_hint' => '',
            'source_label' => '',
        ];
    }
    /**
     * @return array{type:string,id:string,prompt:string,assert_json:string,history_json:string,request_context_hint:string,source_label:string}
     */
    public function caseDraftFromReportResult(string $type, string $caseId): array
    {
        $type = $this->assertSupportedType($type);
        $caseId = trim($caseId);
        if ($caseId === '') {
            throw new \InvalidArgumentException('No source case ID was provided.');
        }
        $report = $this->readTypeReport($type);
        if ($report === null) {
            throw new \RuntimeException(sprintf(
                'Für den Eval-Typ "%s" liegt kein Report vor. Bitte den Eval zuerst ausführen.',
                $type
            ));
        }
        $result = null;
        foreach (($report['results'] ?? []) as $candidate) {
            if (is_array($candidate) && (string) ($candidate['case_id'] ?? '') === $caseId) {
                $result = $candidate;
                break;
            }
        }
        if (!is_array($result)) {
            throw new \RuntimeException(sprintf(
                'Der Report enthält keinen Case "%s" für Eval-Typ "%s".',
                $caseId,
                $type
            ));
        }
        $details = is_array($result['details'] ?? null) ? $result['details'] : [];
        $prompt = trim((string) ($result['prompt'] ?? $details['prompt'] ?? ''));
        $history = $this->historyDraftFromDetails($details);
        $assert = $this->suggestAssertFromReportResult($type, $result, $details);
        return [
            'type' => $type,
            'id' => $this->suggestUniqueCaseId($type . '_' . $caseId . '_new'),
            'prompt' => $prompt,
            'assert_json' => $this->encodePrettyJson($assert),
            'history_json' => $history === [] ? '' : $this->encodePrettyJson($history),
            'request_context_hint' => '',
            'source_label' => sprintf('Vorlage aus Report-Case %s (%s)', $caseId, self::TYPES[$type]),
        ];
    }
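    // Illustrative sketch (example values only, not taken from a real case file):
    // createCase() below appends exactly one NDJSON line to
    // tests/evals/cases/<type>.ndjson. Assuming a "followup" case with a single
    // history entry, the appended line would look roughly like this:
    //
    //   {"id":"followup_testomat808_device_price_001","type":"followup","prompt":"und was kostet das gerät selber","assert":{"expected_query":"testomat 808"},"history":[{"prompt":"mit welchem indikator","answer":"..."}]}
    //
    // The keys "history" and "request_context_hint" are only written when they
    // are non-empty; the key order follows the $row array built in createCase().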
    /**
     * @return array{type:string,id:string,path:string,row:array<string,mixed>,case_count:int}
     */
    public function createCase(
        string $type,
        string $id,
        string $prompt,
        string $assertJson,
        string $historyJson = '',
        string $requestContextHint = '',
    ): array {
        $type = $this->assertSupportedType($type);
        $id = $this->normalizeNewCaseId($id);
        $prompt = trim($prompt);
        $requestContextHint = trim($requestContextHint);
        if ($prompt === '') {
            throw new \InvalidArgumentException('Der Eval-Prompt darf nicht leer sein.');
        }
        if ($this->caseIdExists($id)) {
            throw new \RuntimeException(sprintf(
                'Ein Eval-Case mit der ID "%s" existiert bereits. Bitte eine neue ID verwenden.',
                $id
            ));
        }
        $assert = $this->decodeJsonObject($assertJson, 'Assert-JSON');
        $history = $this->decodeHistoryJson($historyJson);
        $row = [
            'id' => $id,
            'type' => $type,
            'prompt' => $prompt,
            'assert' => $assert,
        ];
        if ($history !== []) {
            $row['history'] = $history;
        }
        if ($requestContextHint !== '') {
            $row['request_context_hint'] = $requestContextHint;
        }
        // Validate with the same DTO that the eval runner uses.
        EvalCase::fromArray($row);
        $path = $this->caseFilePath($type);
        $line = json_encode(
            $row,
            JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
        );
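        // NDJSON requires one row per line: if the existing file does not end
        // with a newline, prepend one so the new row starts on its own line.
        $prefix = '';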
        if (is_file($path) && filesize($path) > 0) {
            $contents = file_get_contents($path);
            if (is_string($contents) && $contents !== '' && !str_ends_with($contents, "\n")) {
                $prefix = "\n";
            }
        }
        $written = file_put_contents($path, $prefix . $line . PHP_EOL, FILE_APPEND | LOCK_EX);
        if ($written === false) {
            throw new \RuntimeException(sprintf('Eval-Case-Datei konnte nicht geschrieben werden: %s', $path));
        }
        return [
            'type' => $type,
            'id' => $id,
            'path' => $path,
            'row' => $row,
            'case_count' => count($this->loadCases($type)),
        ];
    }
    /**
     * @return array{type:string,id:string,path:string,case_count:int}
     */
    public function deleteCase(string $type, string $caseId): array
    {
        $type = $this->assertSupportedType($type);
        $caseId = $this->normalizeExistingCaseId($caseId);
        $path = $this->caseFilePath($type);
        if (!is_file($path)) {
            throw new \RuntimeException(sprintf('Eval-Case-Datei wurde nicht gefunden: %s', $path));
        }
        $lines = file($path, FILE_IGNORE_NEW_LINES);
        if ($lines === false) {
            throw new \RuntimeException(sprintf('Eval-Case-Datei konnte nicht gelesen werden: %s', $path));
        }
        $keptLines = [];
        $deleted = false;
        foreach ($lines as $line) {
            $trimmed = trim((string) $line);
            if ($trimmed === '') {
                continue;
            }
            try {
                $decoded = json_decode($trimmed, true, 512, JSON_THROW_ON_ERROR);
            } catch (\JsonException $e) {
                throw new \RuntimeException(sprintf(
                    'Eval-Case-Datei enthält ungültiges JSON und wurde nicht verändert: %s',
                    $e->getMessage()
                ));
            }
            if (!is_array($decoded)) {
                throw new \RuntimeException('Eval-Case-Datei enthält eine ungültige NDJSON-Zeile und wurde nicht verändert.');
            }
            if ((string) ($decoded['id'] ?? '') === $caseId) {
                $deleted = true;
                continue;
            }
            $keptLines[] = $trimmed;
        }
        if (!$deleted) {
            throw new \RuntimeException(sprintf(
                'Eval-Case "%s" wurde im Typ "%s" nicht gefunden.',
                $caseId,
                $type
            ));
        }
        $contents = $keptLines === [] ? '' : implode(PHP_EOL, $keptLines) . PHP_EOL;
        $written = file_put_contents($path, $contents, LOCK_EX);
        if ($written === false) {
            throw new \RuntimeException(sprintf('Eval-Case-Datei konnte nicht geschrieben werden: %s', $path));
        }
        return [
            'type' => $type,
            'id' => $caseId,
            'path' => $path,
            'case_count' => count($this->loadCases($type)),
        ];
    }
    /**
     * @param array<int, EvalCase> $cases
     * @return array<int, EvalCase>
     */
    private function filterCasesById(array $cases, string $caseId): array
    {
        return array_values(array_filter(
            $cases,
            static fn (EvalCase $case): bool => $case->id === $caseId
        ));
    }
    /**
     * @return array{0:string,1:array<int, EvalCase>}
     */
    private function findCasesByIdAcrossTypes(string $caseId): array
    {
        foreach (array_keys(self::TYPES) as $candidateType) {
            $cases = $this->filterCasesById($this->loadCases($candidateType), $caseId);
            if ($cases !== []) {
                return [$candidateType, $cases];
            }
        }
        return ['', []];
    }
    /**
     * @return array<string, mixed>|null
     */
    public function readTypeReport(string $type): ?array
    {
        $type = $this->assertSupportedType($type);
        return $this->readReportFile(sprintf('%s/tests/evals/reports/%s-last-run.json', $this->projectDir, $type));
    }
    /**
     * @return array<string, mixed>|null
     */
    public function readLastReport(): ?array
    {
        return $this->readReportFile(sprintf('%s/tests/evals/reports/last-run.json', $this->projectDir));
    }
    /**
     * @return array<int, EvalCase>
     */
    private function loadCases(string $type): array
    {
        return $this->caseLoader->load($this->assertSupportedType($type));
    }
    /**
     * @param array<int, EvalResult> $results
     * @return array<string, mixed>
     */
    private function buildReport(string $type, ?string $caseId, array $results): array
    {
        $passed = count(array_filter(
            $results,
            static fn (EvalResult $result): bool => $result->passed
        ));
        $failed = count($results) - $passed;
        return [
            'type' => $type,
            'case_filter' => $caseId,
            'total' => count($results),
            'passed' => $passed,
            'failed' => $failed,
            'generated_at' => (new \DateTimeImmutable())->format(\DateTimeInterface::ATOM),
            'results' => array_map(
                static fn (EvalResult $result): array => $result->toArray(),
                $results
            ),
        ];
    }
    /**
     * @return array<string, mixed>|null
     */
    private function readReportFile(string $path): ?array
    {
        if (!is_file($path)) {
            return null;
        }
        $raw = file_get_contents($path);
        if (!is_string($raw) || trim($raw) === '') {
            return null;
        }
        $decoded = json_decode($raw, true);
        if (!is_array($decoded)) {
            return null;
        }
        return $decoded;
    }
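    /**
     * Trims and validates a user-supplied ID for a new case.
     * Allowed: letters, digits, "_" and "-"; the first character must be a letter or digit.
     *
     * @throws \InvalidArgumentException if the ID is empty or contains other characters
     */
    private function normalizeNewCaseId(string $id): string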
    {
        $id = trim($id);
        if ($id === '') {
            throw new \InvalidArgumentException('Die Eval-Case-ID darf nicht leer sein.');
        }
        if (preg_match('/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/', $id) !== 1) {
            throw new \InvalidArgumentException(
                'Die Eval-Case-ID darf nur Buchstaben, Zahlen, Unterstriche und Bindestriche enthalten und muss mit einem Buchstaben oder einer Zahl beginnen.'
            );
        }
        return $id;
    }
    private function normalizeExistingCaseId(string $id): string
    {
        $id = trim($id);
        if ($id === '') {
            throw new \InvalidArgumentException('Es wurde keine Eval-Case-ID zum Löschen übergeben.');
        }
        if (preg_match('/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/', $id) !== 1) {
            throw new \InvalidArgumentException(
                'Die Eval-Case-ID ist ungültig. Erlaubt sind nur Buchstaben, Zahlen, Unterstriche und Bindestriche.'
            );
        }
        return $id;
    }
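    /**
     * Checks every eval type, not just one: case IDs must be unique across all
     * case files. Note that this loads the cases of each type on every call.
     */
    private function caseIdExists(string $id): bool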
    {
        foreach (array_keys(self::TYPES) as $type) {
            foreach ($this->loadCases($type) as $case) {
                if ($case->id === $id) {
                    return true;
                }
            }
        }
        return false;
    }
    /**
     * @return array<string, mixed>
     */
    private function decodeJsonObject(string $json, string $label): array
    {
        $json = trim($json);
        if ($json === '') {
            return [];
        }
        try {
            $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        } catch (\JsonException $e) {
            throw new \InvalidArgumentException(sprintf('%s ist ungültig: %s', $label, $e->getMessage()));
        }
        if (!is_array($decoded) || !str_starts_with($json, '{') || ($decoded !== [] && array_is_list($decoded))) {
            throw new \InvalidArgumentException(sprintf('%s muss ein JSON-Objekt sein.', $label));
        }
        return $decoded;
    }
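    // Illustrative sketch of inputs that decodeHistoryJson() below accepts
    // (example values only). Entries may be plain strings or objects:
    //
    //   ["Indikatortyp 300"]
    //     => [{"prompt":"Eval-Kontext","answer":"Indikatortyp 300"}]
    //
    //   [{"prompt":"mit welchem indikator","response":"Indikatortyp 300"}]
    //     => [{"prompt":"mit welchem indikator","answer":"Indikatortyp 300"}]
    //
    // The answer is taken from the first key that is set, in the order
    // "answer", "response", "answer_preview". Entries that are neither strings
    // nor objects, and entries with empty prompt and answer, are skipped.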
    /**
     * @return array<int, array{prompt:string,answer:string}>
     */
    private function decodeHistoryJson(string $json): array
    {
        $json = trim($json);
        if ($json === '') {
            return [];
        }
        try {
            $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        } catch (\JsonException $e) {
            throw new \InvalidArgumentException(sprintf('History-JSON ist ungültig: %s', $e->getMessage()));
        }
        if (!is_array($decoded) || !str_starts_with($json, '[') || !array_is_list($decoded)) {
            throw new \InvalidArgumentException('History-JSON muss eine JSON-Liste sein.');
        }
        $history = [];
        foreach ($decoded as $entry) {
            if (is_string($entry)) {
                $entry = trim($entry);
                if ($entry !== '') {
                    $history[] = [
                        'prompt' => 'Eval-Kontext',
                        'answer' => $entry,
                    ];
                }
                continue;
            }
            if (!is_array($entry)) {
                continue;
            }
            $prompt = trim((string) ($entry['prompt'] ?? ''));
            $answer = trim((string) ($entry['answer'] ?? $entry['response'] ?? $entry['answer_preview'] ?? ''));
            if ($prompt === '' && $answer === '') {
                continue;
            }
            $history[] = [
                'prompt' => $prompt !== '' ? $prompt : 'Eval-Kontext',
                'answer' => $answer,
            ];
        }
        return $history;
    }
    private function caseFilePath(string $type): string
    {
        $type = $this->assertSupportedType($type);
        return sprintf('%s/tests/evals/cases/%s.ndjson', $this->projectDir, $type);
    }
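    /**
     * @param array<string, mixed>|null $report
     * @return string One of "not_run", "empty", "green" or "red"
     */
    private function statusFromReport(?array $report): string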
    {
        if ($report === null) {
            return 'not_run';
        }
        $failed = (int) ($report['failed'] ?? 0);
        $total = (int) ($report['total'] ?? 0);
        if ($total <= 0) {
            return 'empty';
        }
        return $failed === 0 ? 'green' : 'red';
    }
    /**
     * @return array<string, mixed>
     */
    private function defaultAssertForType(string $type): array
    {
        return match ($type) {
            'retrieval', 'answer_guard' => [
                'min_results' => 1,
            ],
            'shop_query', 'followup' => [
                'expected_query' => '',
            ],
            default => [],
        };
    }
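    /**
     * Derives a draft assert from a report result. The first matching rule wins:
     *  1. shop_query/followup with a non-empty details.query: expected_query
     *  2. shop_query/followup with details.individual_queries: exact expected_individual_queries
     *  3. details.document_refs with ids: min_results 1 plus must_include_one_of_document_ids
     *  4. details.document_ids: same assert as rule 3
     *  5. details.result_count of 0: max_results 0
     *  6. otherwise: defaultAssertForType($type)
     *
     * @param array<string, mixed> $result
     * @param array<string, mixed> $details
     * @return array<string, mixed>
     */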
    private function suggestAssertFromReportResult(string $type, array $result, array $details): array
    {
        if (($type === 'shop_query' || $type === 'followup') && is_string($details['query'] ?? null)) {
            $query = trim($details['query']);
            if ($query !== '') {
                return [
                    'expected_query' => $query,
                ];
            }
        }
        if (($type === 'shop_query' || $type === 'followup') && is_array($details['individual_queries'] ?? null)) {
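            $queries = array_values(array_filter(
                array_map(
                    static fn (mixed $value): string => trim((string) $value),
                    $details['individual_queries']
                ),
                // Drop only empty strings; array_filter() without a callback would also drop "0".
                static fn (string $query): bool => $query !== ''
            ));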
            if ($queries !== []) {
                return [
                    'expected_individual_queries' => $queries,
                    'expected_individual_queries_exact' => true,
                ];
            }
        }
        if (is_array($details['document_refs'] ?? null)) {
            $documentIds = [];
            foreach ($details['document_refs'] as $documentRef) {
                if (!is_array($documentRef)) {
                    continue;
                }
                $documentId = trim((string) ($documentRef['id'] ?? ''));
                if ($documentId !== '') {
                    $documentIds[] = $documentId;
                }
            }
            if ($documentIds !== []) {
                return [
                    'min_results' => 1,
                    'must_include_one_of_document_ids' => array_values(array_unique($documentIds)),
                ];
            }
        }
        if (is_array($details['document_ids'] ?? null)) {
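            $documentIds = array_values(array_filter(
                array_map(
                    static fn (mixed $value): string => trim((string) $value),
                    $details['document_ids']
                ),
                // Keep the ID "0", consistent with the document_refs branch above.
                static fn (string $documentId): bool => $documentId !== ''
            ));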
            if ($documentIds !== []) {
                return [
                    'min_results' => 1,
                    'must_include_one_of_document_ids' => array_values(array_unique($documentIds)),
                ];
            }
        }
        $resultCount = (int) ($details['result_count'] ?? -1);
        if ($resultCount === 0) {
            return [
                'max_results' => 0,
            ];
        }
        return $this->defaultAssertForType($type);
    }
    /**
     * @param array<string, mixed> $details
     * @return array<int, array{prompt:string,answer:string}>
     */
    private function historyDraftFromDetails(array $details): array
    {
        if (!is_array($details['history'] ?? null)) {
            return [];
        }
        $history = [];
        foreach ($details['history'] as $entry) {
            if (!is_array($entry)) {
                continue;
            }
            $prompt = trim((string) ($entry['prompt'] ?? ''));
            $answer = trim((string) ($entry['answer'] ?? $entry['answer_preview'] ?? ''));
            if ($prompt === '' && $answer === '') {
                continue;
            }
            $history[] = [
                'prompt' => $prompt !== '' ? $prompt : 'Eval-Kontext',
                'answer' => $answer,
            ];
        }
        return $history;
    }
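    /**
     * Normalizes $base to lowercase [a-z0-9_-] (other characters become "_") and
     * returns the first unused ID: $base, then $base_2 .. $base_999, and finally
     * $base with a timestamp suffix. Example: "followup_Case 12_new" becomes
     * "followup_case_12_new" if that ID is still free.
     */
    private function suggestUniqueCaseId(string $base): string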
    {
        $base = strtolower(trim($base));
        $base = preg_replace('/[^a-z0-9_-]+/', '_', $base) ?? 'eval_case';
        $base = trim($base, '_-');
        if ($base === '') {
            $base = 'eval_case';
        }
        if (!$this->caseIdExists($base)) {
            return $base;
        }
        for ($i = 2; $i <= 999; ++$i) {
            $candidate = sprintf('%s_%d', $base, $i);
            if (!$this->caseIdExists($candidate)) {
                return $candidate;
            }
        }
        return sprintf('%s_%s', $base, (new \DateTimeImmutable())->format('YmdHis'));
    }
    /**
     * @param array<mixed> $value
     */
    private function encodePrettyJson(array $value): string
    {
        return json_encode(
            $value,
            JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
        );
    }
 }
--- a/templates/admin/base.html.twig
+++ b/templates/admin/base.html.twig
@@ -134,6 +134,10 @@
                           href="{{ path('admin_model_config_list') }}#agentLiveTest">
                            <i class="bi bi-rocket-takeoff-fill"></i> KI-Agent Live-Test
                        </a>
                        <a class="nav-link text-light {% if route starts with 'admin_evals' %}active fw-bold{% endif %}"
                           href="{{ path('admin_evals_index') }}">
                            <i class="bi bi-clipboard2-check"></i> Eval Suite
                        </a>
                    {% endif %}
                    <hr class="border-secondary">
                    <div class="text-info text-uppercase small mb-2">
--- a/templates/admin/evals/case_new.html.twig
+++ b/templates/admin/evals/case_new.html.twig
@@ -0,0 +1,351 @@
 {% extends 'admin/base.html.twig' %}
 {% block title %}Eval-Cases verwalten{% endblock %}
 {% block body %}
    <div class="d-flex justify-content-between align-items-center mb-4 flex-wrap gap-2">
        <div>
            <h1 class="h3 mb-1">
                <i class="bi bi-journal-plus"></i> Eval-Cases verwalten
            </h1>
            <div class="small text-secondary">
                Neue Regression-Cases separat anlegen oder bestehende Cases entfernen, ohne die Eval-Suite-Übersicht aufzublähen.
            </div>
        </div>
        <a href="{{ path('admin_evals_index', {type: case_draft.type|default('retrieval')}) }}"
           class="btn btn-sm btn-outline-secondary">
            Zurück zur Eval Suite
        </a>
    </div>
    {% for label in ['success', 'danger', 'warning', 'info'] %}
        {% for message in app.flashes(label) %}
            <div class="alert alert-{{ label }} shadow-sm">
                {{ message }}
            </div>
        {% endfor %}
    {% endfor %}
    {% if case_draft.source_label|default('') %}
        <div class="alert alert-info border-info bg-black text-light shadow-sm">
            <strong>Vorlage geladen:</strong> {{ case_draft.source_label }}<br>
            <span class="small text-secondary">
                Bitte Case-ID, Prompt und Assertions prüfen, bevor du den Case speicherst.
            </span>
        </div>
    {% endif %}
    <div class="alert alert-secondary border-secondary bg-black text-light shadow-sm mb-4">
        <div class="fw-semibold text-warning mb-1">
            <i class="bi bi-compass"></i> Kurz erklärt
        </div>
        <div class="small text-secondary">
            Ein Eval-Case ist ein wiederholbarer Test. Du trägst ein, <strong class="text-light">was der Nutzer fragt</strong>
            und <strong class="text-light">woran RetrieX gemessen werden soll</strong>. Der Test verändert keine Daten im Shop oder im RAG-Wissen,
            sondern prüft nur, ob ein bekannter Fall weiterhin richtig läuft.
        </div>
    </div>
    <div class="row g-4">
        <div class="col-xl-8">
            <div class="card bg-black border-secondary text-light shadow-sm">
                <div class="card-body">
                    <h5 class="text-warning mb-3">
                        <i class="bi bi-pencil-square"></i> Neuer Eval-Case
                    </h5>
                    <form method="post" action="{{ path('admin_evals_case_create') }}">
                        <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_case_create') }}">
                        <div class="mb-4">
                            <label class="form-label">Eval-Typ</label>
                            <select name="type" class="form-select bg-dark text-light border-secondary">
                                {% for type, label in types %}
                                    <option value="{{ type }}" {% if type == case_draft.type|default('retrieval') %}selected{% endif %}>
                                        {{ label }}
                                    </option>
                                {% endfor %}
                            </select>
                            <div class="form-text text-secondary">
                                Wähle zuerst, <strong class="text-light">was genau geprüft werden soll</strong>. Der Typ entscheidet auch,
                                in welche Datei der Case geschrieben wird: <code>tests/evals/cases/&lt;type&gt;.ndjson</code>.
                            </div>
                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
                                <div class="mb-1"><strong class="text-light">retrieval</strong>: prüft, ob die richtige Wissensquelle oder das richtige Dokument gefunden wird.</div>
                                <div class="mb-1"><strong class="text-light">shop_query</strong>: prüft, welche Suchquery an den Shop geschickt würde.</div>
                                <div class="mb-1"><strong class="text-light">followup</strong>: prüft eine Folgefrage, die den vorherigen Chatverlauf braucht.</div>
                                <div><strong class="text-light">answer_guard</strong>: prüft, dass RetrieX bei Unsinn oder fehlender Evidenz nichts erfindet.</div>
                            </div>
                        </div>
                        <div class="mb-4">
                            <label class="form-label">Neue Case-ID</label>
                            <input type="text"
                                   name="id"
                                   value="{{ case_draft.id|default('') }}"
                                   class="form-control bg-dark text-light border-secondary"
                                   placeholder="followup_testomat808_device_price_001"
                                   required>
                            <div class="form-text text-secondary">
                                Das ist der <strong class="text-light">interne Name des Tests</strong>. Er erscheint später in der Eval-Auswertung,
                                damit du den Fall wiedererkennst. Verwende keine Leerzeichen. Erlaubt sind Buchstaben, Zahlen, <code>_</code> und <code>-</code>.
                            </div>
                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
                                Gute Beispiele: <code>retrieval_lieferbedingungen_versand_001</code>,
                                <code>shop_query_testomat808_indikator300_001</code>,
                                <code>followup_testomat808_device_price_001</code>.<br>
                                Faustregel: <code>typ_thema_ziel_nummer</code>.
                            </div>
                        </div>
                        <div class="mb-4">
                            <label class="form-label">Prompt</label>
                            <textarea name="prompt"
                                      rows="3"
                                      class="form-control bg-dark text-light border-secondary"
                                      placeholder="und was kostet das gerät selber"
                                      required>{{ case_draft.prompt|default('') }}</textarea>
                            <div class="form-text text-secondary">
                                Hier kommt <strong class="text-light">genau die Nutzerfrage</strong> hinein, die getestet werden soll.
                                Nicht die erwartete Antwort eintragen, sondern den Satz, den ein Nutzer in den Chat schreiben würde.
                            </div>
                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
                                Tippfehler dürfen bewusst drin bleiben, wenn genau dieser Tippfehler abgesichert werden soll.
                                Beispiel: <code>ich würde gern chlor im schwinnbad messen</code> prüft dann auch die Korrektur Richtung <code>schwimmbad</code>.
                            </div>
                        </div>
                        <div class="mb-4">
                            <label class="form-label">Assert-JSON</label>
                            <textarea name="assert_json"
                                      rows="9"
                                      class="form-control bg-dark text-light border-secondary font-monospace"
                                      spellcheck="false">{{ case_draft.assert_json|default('{}') }}</textarea>
                            <div class="form-text text-secondary">
                                Hier steht, <strong class="text-light">was der Test erwarten soll</strong>. Das Feld muss gültiges JSON sein,
                                also mit <code>{</code> anfangen und mit <code>}</code> enden. Keine Kommentare und kein Komma nach dem letzten Eintrag.
                            </div>
                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
                                <div class="mb-2"><strong class="text-light">Wenn eine Shopquery exakt stimmen soll:</strong></div>
                                <pre class="bg-black border border-secondary rounded p-2 small text-light mb-3"><code>{
  "expected_query": "testomat 808"
 }</code></pre>
                                <div class="mb-2"><strong class="text-light">Wenn bestimmte Wörter enthalten sein müssen:</strong></div>
                                <pre class="bg-black border border-secondary rounded p-2 small text-light mb-3"><code>{
  "must_include_terms": [
    "testomat",
    "808"
  ]
 }</code></pre>
                                <div class="mb-2"><strong class="text-light">Wenn ein Dokument gefunden werden muss:</strong></div>
                                <pre class="bg-black border border-secondary rounded p-2 small text-light mb-0"><code>{
  "min_results": 1,
  "must_include_one_of_document_ids": [
    "DOKUMENT-ID"
  ]
 }</code></pre>
                            </div>
                        </div>
                        <div class="mb-4">
                            <label class="form-label">History-JSON <span class="text-secondary">optional</span></label>
                            <textarea name="history_json"
                                      rows="8"
                                      class="form-control bg-dark text-light border-secondary font-monospace"
                                      spellcheck="false"
                                      placeholder='[{"prompt":"vorherige Frage","answer":"vorherige Antwort"}]'>{{ case_draft.history_json|default('') }}</textarea>
                            <div class="form-text text-secondary">
                                Nur ausfüllen, wenn die aktuelle Frage den <strong class="text-light">vorherigen Chatverlauf</strong> braucht.
                                Für direkte Einzelprompts leer lassen. Das Feld muss eine JSON-Liste sein, also mit <code>[</code> anfangen und mit <code>]</code> enden.
                            </div>
                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
                                Typischer Einsatz: Der Nutzer fragt zuerst nach dem niedrigsten Grenzwert, danach nach dem Indikator
                                und anschließend <code>was kostet der indikator</code>. Dann braucht der Test die vorherigen Fragen und Antworten als History.
                                <pre class="bg-black border border-secondary rounded p-2 small text-light mt-2 mb-0"><code>[
  {
    "prompt": "mit welchem indikator",
    "answer": "Der Wert 0,02 °dH wird beim Testomat 808 mit Indikatortyp 300 gemessen."
  }
 ]</code></pre>
                            </div>
                        </div>
                        <div class="mb-4">
                            <label class="form-label">Request Context Hint <span class="text-secondary">optional</span></label>
                            <textarea name="request_context_hint"
                                      rows="3"
                                      class="form-control bg-dark text-light border-secondary"
                                      placeholder="Nur für Spezialfälle, wenn History nicht ausreicht.">{{ case_draft.request_context_hint|default('') }}</textarea>
                            <div class="form-text text-secondary">
                                Dieses Feld kannst du fast immer <strong class="text-light">leer lassen</strong>. Es ist nur für Sonderfälle gedacht,
                                wenn der Test Zusatzkontext braucht, der nicht sauber als History darstellbar ist.
                            </div>
                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
                                Beispiel für einen Sonderfall: <code>Im vorherigen Ergebnis waren mehrere Shop-Produkte sichtbar, aber keine normale Chatantwort.</code>
                                Für normale Regressionen ist <strong class="text-light">History-JSON die bessere Wahl</strong>.
                            </div>
                        </div>
                        <div class="d-flex flex-wrap gap-2">
                            <button type="submit" class="btn btn-warning">
                                <i class="bi bi-save"></i> Eval-Case speichern
                            </button>
                            <a href="{{ path('admin_evals_index', {type: case_draft.type|default('retrieval')}) }}"
                               class="btn btn-outline-secondary">
                                Abbrechen
                            </a>
                        </div>
                    </form>
                </div>
            </div>
        </div>
        <div class="col-xl-4">
            <div class="card bg-black border-danger text-light shadow-sm mb-4">
                <div class="card-body">
                    <h5 class="text-danger mb-3">
                        <i class="bi bi-trash3"></i> Bestehende Eval-Cases entfernen
                    </h5>
                    <p class="small text-secondary mb-3">
                        Here you can remove incorrectly created or obsolete cases from the
                        <code>tests/evals/cases/*.ndjson</code> files. Deleting affects only the eval case,
                        not the RAG knowledge, the shop, or the existing reports.
                    </p>
                    {% for type, label in types %}
                        {% set cases = cases_by_type[type]|default([]) %}
                        <details class="border border-secondary rounded p-3 mb-3" {% if type == case_draft.type|default('retrieval') %}open{% endif %}>
                            <summary class="text-info" style="cursor:pointer;">
                                {{ label }} <span class="text-secondary">({{ cases|length }} Cases)</span>
                            </summary>
                            {% if cases is empty %}
                                <div class="small text-secondary mt-3">
                                    There are currently no cases for this type.
                                </div>
                            {% else %}
                                <div class="mt-3">
                                    {% for case in cases %}
                                        <div class="border-top border-secondary pt-3 mt-3">
                                            <div class="small mb-2">
                                                <code>{{ case.id }}</code>
                                                <div class="text-secondary mt-1">{{ case.prompt }}</div>
                                            </div>
                                            <form method="post"
                                                  action="{{ path('admin_evals_case_delete') }}"
                                                  onsubmit="return confirm('Really delete eval case {{ case.id|e('js') }}? This change permanently removes the NDJSON line.');">
                                                <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_case_delete_' ~ type ~ '_' ~ case.id) }}">
                                                <input type="hidden" name="type" value="{{ type }}">
                                                <input type="hidden" name="case_id" value="{{ case.id }}">
                                                <button type="submit" class="btn btn-sm btn-outline-danger">
                                                    <i class="bi bi-trash3"></i> Delete case
                                                </button>
                                            </form>
                                        </div>
                                    {% endfor %}
                                </div>
                            {% endif %}
                        </details>
                    {% endfor %}
                    <div class="small text-secondary">
                        After deleting, run the affected eval type once so that the report matches the new set of cases.
                    </div>
                </div>
            </div>
            <div class="card bg-black border-secondary text-light shadow-sm mb-4">
                <div class="card-body">
                    <h5 class="text-info mb-3">
                        <i class="bi bi-info-circle"></i> Which type is the right one?
                    </h5>
                    <div class="small text-secondary">
                        <div class="mb-3">
                            <strong class="text-light">Do you want to check whether the right document is found?</strong><br>
                            Then use <code>retrieval</code>.
                        </div>
                        <div class="mb-3">
                            <strong class="text-light">Do you want to check which search terms are sent to the shop?</strong><br>
                            Then use <code>shop_query</code>.
                        </div>
                        <div class="mb-3">
                            <strong class="text-light">Does the question refer to the previous answer?</strong><br>
                            Then use <code>followup</code> and fill in <code>History-JSON</code>.
                        </div>
                        <div>
                            <strong class="text-light">Should RetrieX refuse to make things up when given nonsense?</strong><br>
                            Then use <code>answer_guard</code>.
                        </div>
                    </div>
                </div>
            </div>
            <div class="card bg-black border-secondary text-light shadow-sm mb-4">
                <div class="card-body">
                    <h5 class="text-info mb-3">
                        <i class="bi bi-braces"></i> Common assertions
                    </h5>
                    <div class="small text-secondary mb-2">Exact query:</div>
                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
  "expected_query": "testomat 808"
 }</code></pre>
                    <div class="small text-secondary mb-2">Terms that must be included:</div>
                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
  "must_include_terms": [
    "testomat",
    "808"
  ]
 }</code></pre>
                    <div class="small text-secondary mb-2">Terms that must not be included:</div>
                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
  "must_not_include_terms": [
    "indikator",
    "300"
  ]
 }</code></pre>
                    <div class="small text-secondary mb-2">Document must be included:</div>
                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
  "min_results": 1,
  "must_include_one_of_document_ids": [
    "DOCUMENT-ID"
  ]
 }</code></pre>
                </div>
            </div>
            <div class="card bg-black border-secondary text-light shadow-sm mb-4">
                <div class="card-body">
                    <h5 class="text-info mb-3">
                        <i class="bi bi-check2-square"></i> Check before saving
                    </h5>
                    <ul class="small text-secondary mb-0">
                        <li>Does the case check exactly one purpose?</li>
                        <li>Is the case ID unique and free of spaces?</li>
                        <li>Is the prompt a real user question?</li>
                        <li>Is Assert-JSON valid JSON?</li>
                        <li>Is History filled in only for genuine follow-up questions?</li>
                    </ul>
                </div>
            </div>
            <div class="card bg-black border-secondary text-light shadow-sm">
                <div class="card-body">
                    <h5 class="text-info mb-3">
                        <i class="bi bi-lightbulb"></i> Recommendation
                    </h5>
                    <p class="small text-secondary mb-0">
                        A good eval case checks exactly one purpose. Prefer several small cases over one large, brittle case.
                        If you are unsure, start with <code>expected_query</code> for shop and follow-up cases or with
                        <code>must_include_one_of_document_ids</code> for retrieval cases.
                    </p>
                </div>
            </div>
        </div>
    </div>
 {% endblock %}
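
Reviewer note: the assertion keys shown in the template above (`expected_query`, `must_include_terms`, `must_not_include_terms`, `min_results`, `must_include_one_of_document_ids`) are easier to reason about with a concrete evaluator in mind. The following is a minimal sketch only, not the RetrieX runner. The function name `evaluateAssertions`, the shape of the `actual` argument, and the case-insensitive substring matching are all assumptions made for illustration.

```javascript
// Hypothetical sketch: checks the assertion keys from the template against
// a produced query string and a list of retrieved document IDs.
// Matching rules (lower-casing, substring search) are assumptions,
// not confirmed RetrieX behaviour.
function evaluateAssertions(rules, actual) {
  const failures = [];
  const query = String(actual.query ?? '').toLowerCase().trim();
  const documentIds = actual.documentIds ?? [];

  // expected_query: the whole query must match after normalisation.
  if (typeof rules.expected_query === 'string'
      && query !== rules.expected_query.toLowerCase().trim()) {
    failures.push(`expected_query: got "${query}"`);
  }

  // must_include_terms: every listed term has to appear in the query.
  for (const term of rules.must_include_terms ?? []) {
    if (!query.includes(String(term).toLowerCase())) {
      failures.push(`must_include_terms: missing "${term}"`);
    }
  }

  // must_not_include_terms: none of the listed terms may appear.
  for (const term of rules.must_not_include_terms ?? []) {
    if (query.includes(String(term).toLowerCase())) {
      failures.push(`must_not_include_terms: found "${term}"`);
    }
  }

  // min_results: at least this many documents must have been retrieved.
  if (typeof rules.min_results === 'number'
      && documentIds.length < rules.min_results) {
    failures.push(`min_results: got ${documentIds.length}`);
  }

  // must_include_one_of_document_ids: one match is enough.
  const wanted = rules.must_include_one_of_document_ids ?? [];
  if (wanted.length > 0 && !wanted.some((id) => documentIds.includes(id))) {
    failures.push('must_include_one_of_document_ids: none found');
  }

  return failures;
}
```

An empty return value corresponds to a passing case. Keeping each key as an independent check matches the recommendation to keep one purpose per case: a case that sets only `expected_query` can fail for exactly one reason.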
--- a/templates/admin/evals/index.html.twig
+++ b/templates/admin/evals/index.html.twig
@@ -0,0 +1,547 @@
 {% extends 'admin/base.html.twig' %}
 {% block title %}RetrieX Eval Suite{% endblock %}
 {% block body %}
    <div class="d-flex justify-content-between align-items-center mb-4 flex-wrap gap-2">
        <div>
            <h1 class="h3 mb-1">
                <i class="bi bi-clipboard2-check"></i> RetrieX Eval Suite
            </h1>
            <div class="small text-secondary">
                Check regressions for retrieval, shop query, follow-up and answer guard directly in the admin area.
            </div>
        </div>
        <div class="d-flex flex-wrap gap-2">
            <a href="{{ path('admin_evals_case_new', {type: selected_type|default('retrieval')}) }}"
               class="btn btn-sm btn-outline-warning">
                <i class="bi bi-journal-plus"></i> Manage eval cases
            </a>
            <a href="{{ path('admin_model_config_list') }}"
               class="btn btn-sm btn-outline-secondary">
                Back to AI/LLM setup
            </a>
        </div>
    </div>
    {% for label in ['success', 'danger', 'warning', 'info'] %}
        {% for message in app.flashes(label) %}
            <div class="alert alert-{{ label }} shadow-sm">
                {{ message }}
            </div>
        {% endfor %}
    {% endfor %}
    <div id="adminEvalRunOverlay"
         class="position-fixed top-0 start-0 w-100 h-100 d-none"
         style="background: rgba(0, 0, 0, .72); z-index: 1080;">
        <div class="h-100 d-flex align-items-center justify-content-center px-3">
            <div class="card bg-black border-warning text-light shadow-lg" style="max-width: 520px; width: 100%;">
                <div class="card-body text-center py-5">
                    <div class="spinner-border text-warning mb-3" role="status" aria-hidden="true"></div>
                    <h5 class="text-warning mb-2" id="adminEvalRunOverlayLabel">Eval running ...</h5>
                    <div class="small text-secondary">
                        The regression tests are running. Please do not reload the page.
                    </div>
                </div>
            </div>
        </div>
    </div>
    <div class="row g-4 mb-4">
        {% for item in overview %}
            {% set report = item.report %}
            {% set status = item.status %}
            {% set badgeClass = status == 'green'
                ? 'bg-success'
                : (status == 'red' ? 'bg-danger' : 'bg-secondary')
            %}
            <div class="col-md-6 col-xl-3">
                <div class="card bg-black border-secondary text-light h-100 shadow-sm">
                    <div class="card-body">
                        <div class="d-flex justify-content-between align-items-start gap-2 mb-2">
                            <h5 class="text-info mb-0">{{ item.label }}</h5>
                            <span class="badge {{ badgeClass }}">
                                {% if status == 'green' %}
                                    green
                                {% elseif status == 'red' %}
                                    red
                                {% elseif status == 'empty' %}
                                    empty
                                {% else %}
                                    not run
                                {% endif %}
                            </span>
                        </div>
                        <div class="small text-secondary mb-3">
                            {{ item.case_count }} Cases
                        </div>
                        {% if report %}
                            <div class="small">
                                <div><strong>Total:</strong> {{ report.total|default(0) }}</div>
                                <div><strong>Passed:</strong> {{ report.passed|default(0) }}</div>
                                <div><strong>Failed:</strong> {{ report.failed|default(0) }}</div>
                                <div class="text-secondary mt-2">
                                    {{ report.generated_at|default('') }}
                                </div>
                            </div>
                        {% else %}
                            <div class="small text-secondary">
                                No admin report exists for this type yet.
                            </div>
                        {% endif %}
                        <div class="d-flex flex-wrap gap-2 mt-3">
                            <form method="post"
                                  action="{{ path('admin_evals_run') }}"
                                  class="d-inline js-admin-eval-run-form"
                                  data-eval-type-label="{{ item.label|e('html_attr') }}">
                                <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_run') }}">
                                <input type="hidden" name="type" value="{{ item.type }}">
                                <button type="submit" class="btn btn-sm btn-outline-warning js-admin-eval-run-button">
                                    <span class="js-admin-eval-button-label">Run</span>
                                    <span class="spinner-border spinner-border-sm ms-2 d-none js-admin-eval-button-spinner"
                                          role="status"
                                          aria-hidden="true"></span>
                                </button>
                            </form>
                            <a class="btn btn-sm btn-outline-info"
                               href="{{ path('admin_evals_index', {type: item.type}) }}">
                                Details
                            </a>
                        </div>
                    </div>
                </div>
            </div>
        {% endfor %}
    </div>
    <div class="row g-4 mb-4">
        <div class="col-xl-5">
            <div class="card bg-black border-secondary text-light h-100 shadow-sm">
                <div class="card-body">
                    <h5 class="text-warning mb-3">
                        <i class="bi bi-play-circle"></i> Run eval
                    </h5>
                    <form method="post"
                          action="{{ path('admin_evals_run') }}"
                          class="js-admin-eval-run-form"
                          data-eval-type-label="Selected eval">
                        <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_run') }}">
                        <div class="mb-3">
                            <label class="form-label">Eval type</label>
                            <select name="type" class="form-select bg-dark text-light border-secondary js-admin-eval-type-select">
                                {% for type, label in types %}
                                    <option value="{{ type }}" {% if type == selected_type %}selected{% endif %}>
                                        {{ label }}
                                    </option>
                                {% endfor %}
                            </select>
                            <div class="form-text text-secondary">
                                Without a case ID, the entire type is run.
                            </div>
                        </div>
                        <div class="mb-3">
                            <label class="form-label">Optional: Case</label>
                            <select name="case_id"
                                    class="form-select bg-dark text-light border-secondary js-admin-eval-case-select">
                                <option value="">All cases of the selected type</option>
                                {% for type, cases in cases_by_type %}
                                    {% for case in cases %}
                                        <option value="{{ case.id }}"
                                                data-eval-type="{{ type }}"
                                                {% if type != selected_type %}hidden disabled{% endif %}>
                                            {{ case.id }} — {{ case.prompt }}
                                        </option>
                                    {% endfor %}
                                {% endfor %}
                            </select>
                            <div class="form-text text-secondary">
                                The case list is filtered to match the eval type. Leave it empty to run all cases of the type.
                            </div>
                        </div>
                        <button type="submit" class="btn btn-outline-warning js-admin-eval-run-button">
                            <span class="js-admin-eval-button-label">Start eval</span>
                            <span class="spinner-border spinner-border-sm ms-2 d-none js-admin-eval-button-spinner"
                                  role="status"
                                  aria-hidden="true"></span>
                        </button>
                    </form>
                </div>
            </div>
        </div>
        <div class="col-xl-7">
            <div class="card bg-black border-secondary text-light h-100 shadow-sm">
                <div class="card-body">
                    <h5 class="text-info mb-3">
                        <i class="bi bi-terminal"></i> CLI reference
                    </h5>
                    <p class="small text-secondary mb-3">
                        Admin runs write type-specific reports to
                        <code>tests/evals/reports/&lt;type&gt;-last-run.json</code>
                        and additionally update the familiar <code>last-run.json</code>.
                    </p>
                    <div class="small">
                        {% for type, label in types %}
                            <div class="mb-2">
                                <span class="text-info">{{ label }}</span><br>
                                <code>php bin/console mto:agent:eval:run {{ type }}</code>
                            </div>
                        {% endfor %}
                    </div>
                    {% if last_report %}
                        <hr class="border-secondary">
                        <div class="small text-secondary">
                            Last generic report:
                            <span class="text-light">{{ last_report.type|default('unknown') }}</span>,
                            {{ last_report.passed|default(0) }}/{{ last_report.total|default(0) }} passed,
                            {{ last_report.generated_at|default('') }}
                        </div>
                    {% endif %}
                </div>
            </div>
        </div>
    </div>
    <div class="card bg-black border-secondary text-light shadow-sm">
        <div class="card-body">
            <div class="d-flex justify-content-between align-items-center flex-wrap gap-2 mb-3">
                <h5 class="text-warning mb-0">
                    <i class="bi bi-list-check"></i>
                    Report details: {{ types[selected_type]|default(selected_type) }}
                </h5>
                <div class="btn-group btn-group-sm" role="group" aria-label="Eval report types">
                    {% for type, label in types %}
                        <a class="btn {{ type == selected_type ? 'btn-info' : 'btn-outline-info' }}"
                           href="{{ path('admin_evals_index', {type: type}) }}">
                            {{ label }}
                        </a>
                    {% endfor %}
                </div>
            </div>
            {% if selected_report %}
                {% set selectedFailed = selected_report.failed|default(0) %}
                <div class="row g-3 mb-3 small">
                    <div class="col-md-3">
                        <div class="border border-secondary rounded p-3 h-100">
                            <div class="text-secondary">Total</div>
                            <div class="h5 mb-0">{{ selected_report.total|default(0) }}</div>
                        </div>
                    </div>
                    <div class="col-md-3">
                        <div class="border border-secondary rounded p-3 h-100">
                            <div class="text-secondary">Passed</div>
                            <div class="h5 text-success mb-0">{{ selected_report.passed|default(0) }}</div>
                        </div>
                    </div>
                    <div class="col-md-3">
                        <div class="border border-secondary rounded p-3 h-100">
                            <div class="text-secondary">Failed</div>
                            <div class="h5 {{ selectedFailed == 0 ? 'text-success' : 'text-danger' }} mb-0">
                                {{ selectedFailed }}
                            </div>
                        </div>
                    </div>
                    <div class="col-md-3">
                        <div class="border border-secondary rounded p-3 h-100">
                            <div class="text-secondary">Generated</div>
                            <div class="small text-light">{{ selected_report.generated_at|default('') }}</div>
                        </div>
                    </div>
                </div>
                <div class="table-responsive">
                    <table class="table table-dark table-striped table-hover align-middle mb-0">
                        <thead class="table-secondary text-dark">
                        <tr>
                            <th>Status</th>
                            <th>Case</th>
                            <th>Duration</th>
                            <th>Failures / Details</th>
                        </tr>
                        </thead>
                        <tbody>
                        {% for result in selected_report.results|default([]) %}
                            <tr>
                                <td style="width: 110px;">
                                    {% if result.passed|default(false) %}
                                        <span class="badge bg-success">PASS</span>
                                    {% else %}
                                        <span class="badge bg-danger">FAIL</span>
                                    {% endif %}
                                </td>
                                <td style="min-width: 260px;">
                                    <code>{{ result.case_id|default('') }}</code>
                                    <div class="small text-secondary mb-2">{{ result.type|default('') }}</div>
                                    {% set casePrompt = result.prompt|default(result.details.prompt|default('')) %}
                                    {% if casePrompt %}
                                        <div class="small mb-2">
                                            <span class="text-secondary">Prompt:</span><br>
                                            <span class="text-light">{{ casePrompt }}</span>
                                        </div>
                                    {% endif %}
                                    <div class="mt-2">
                                        <a href="{{ path('admin_evals_case_new', {source_type: selected_type, source_case_id: result.case_id|default('')}) }}"
                                           class="btn btn-sm btn-outline-warning">
                                            <i class="bi bi-journal-plus"></i> Prepare as new case
                                        </a>
                                    </div>
                                    {% set historyRows = result.details.history|default([]) %}
                                    {% if historyRows is not empty %}
                                        <details class="small">
                                            <summary class="text-info" style="cursor:pointer;">
                                                Show context / history
                                            </summary>
                                            <div class="mt-2 ps-2 border-start border-secondary">
                                                {% for turn in historyRows %}
                                                    <div class="mb-2">
                                                        <div class="text-secondary">Previous prompt:</div>
                                                        <div class="text-light">{{ turn.prompt|default('') }}</div>
                                                        {% if turn.answer_preview|default('') %}
                                                            <div class="text-secondary mt-1">Answer excerpt:</div>
                                                            <div class="text-secondary">{{ turn.answer_preview }}</div>
                                                        {% endif %}
                                                    </div>
                                                {% endfor %}
                                            </div>
                                        </details>
                                    {% endif %}
                                </td>
                                <td style="width: 120px;">
                                    {{ result.duration_ms|default(0) }} ms
                                </td>
                                <td>
                                    {% if result.failures|default([]) is not empty %}
                                        <ul class="mb-2 small text-danger">
                                            {% for failure in result.failures %}
                                                <li>{{ failure }}</li>
                                            {% endfor %}
                                        </ul>
                                    {% else %}
                                        <div class="small text-success mb-2">No failures.</div>
                                    {% endif %}
                                    {% set documentRefs = result.details.document_refs|default([]) %}
                                    {% if documentRefs is not empty %}
                                        <div class="mb-2">
                                            <div class="small text-secondary mb-1">Documents found</div>
                                            <div class="table-responsive">
                                                <table class="table table-dark table-sm table-bordered border-secondary align-middle mb-2">
                                                    <thead>
                                                    <tr class="small text-secondary">
                                                        <th style="width: 90px;">Ranks</th>
                                                        <th>Title / file</th>
                                                        <th style="width: 170px;">Doc-ID</th>
                                                        <th style="width: 220px;">Chunks</th>
                                                    </tr>
                                                    </thead>
                                                    <tbody>
                                                    {% for doc in documentRefs %}
                                                        <tr>
                                                            <td class="small">{{ doc.ranks|default([])|join(', ') }}</td>
                                                            <td>
                                                                <div class="fw-semibold">{{ doc.title|default('Untitled') }}</div>
                                                                {% if doc.file_path|default('') %}
                                                                    <div class="small text-secondary" style="word-break: break-all;">
                                                                        {{ doc.file_path }}
                                                                    </div>
                                                                {% endif %}
                                                                {% if doc.version_number|default('') %}
                                                                    <div class="small text-secondary">Version: {{ doc.version_number }}</div>
                                                                {% endif %}
                                                            </td>
                                                            <td><code class="small">{{ doc.id|default('') }}</code></td>
                                                            <td class="small" style="word-break: break-all;">
                                                                {% for chunkId in doc.chunk_ids|default([]) %}
                                                                    <code>{{ chunkId }}</code>{% if not loop.last %}<br>{% endif %}
                                                                {% endfor %}
                                                            </td>
                                                        </tr>
                                                    {% endfor %}
                                                    </tbody>
                                                </table>
                                            </div>
                                        </div>
                                    {% endif %}
                                    {% set resultRows = result.details.result_rows|default([]) %}
                                    {% if resultRows is not empty %}
                                        <details class="mb-2">
                                            <summary class="small text-info" style="cursor:pointer;">
                                                Show hits / chunks
                                            </summary>
                                            <div class="table-responsive mt-2">
                                                <table class="table table-dark table-sm table-bordered border-secondary align-middle mb-0">
                                                    <thead>
                                                    <tr class="small text-secondary">
                                                        <th style="width: 60px;">Rank</th>
                                                        <th>Title / file</th>
                                                        <th style="width: 180px;">Chunk</th>
                                                        <th>Preview</th>
                                                    </tr>
                                                    </thead>
                                                    <tbody>
                                                    {% for row in resultRows %}
                                                        <tr>
                                                            <td>{{ row.rank|default('') }}</td>
                                                            <td>
                                                                <div class="fw-semibold">{{ row.document_title|default('Untitled') }}</div>
                                                                {% if row.file_path|default('') %}
                                                                    <div class="small text-secondary" style="word-break: break-all;">{{ row.file_path }}</div>
                                                                {% endif %}
                                                                <div class="small text-secondary">Doc-ID: <code>{{ row.document_id|default('') }}</code></div>
                                                            </td>
                                                            <td class="small" style="word-break: break-all;">
                                                                <code>{{ row.chunk_id|default('') }}</code>
                                                                {% if row.chunk_index is defined and row.chunk_index is not same as(null) %}
                                                                    <div class="text-secondary">Index: {{ row.chunk_index }}</div>
                                                                {% endif %}
                                                            </td>
                                                            <td class="small text-secondary">{{ row.text_preview|default('') }}</td>
                                                        </tr>
                                                    {% endfor %}
                                                    </tbody>
                                                </table>
                                            </div>
                                        </details>
                                    {% endif %}
                                    <details>
                                        <summary class="small text-info" style="cursor:pointer;">
                                            JSON-Details anzeigen
                                        </summary>
                                        <pre class="bg-dark border border-secondary rounded p-2 mt-2 small text-light" style="white-space: pre-wrap; max-height: 260px; overflow: auto;">{{ result.details|default({})|json_encode(constant('JSON_PRETTY_PRINT')) }}</pre>
                                    </details>
                                </td>
                            </tr>
                        {% else %}
                            <tr>
                                <td colspan="4" class="text-center text-secondary py-4">
                                    Dieser Report enthält keine Resultate.
                                </td>
                            </tr>
                        {% endfor %}
                        </tbody>
                    </table>
                </div>
            {% else %}
                <div class="alert alert-secondary mb-0">
                    Für {{ types[selected_type]|default(selected_type) }} liegt noch kein typspezifischer Admin-Report vor.
                    Starte den Eval oben oder per CLI.
                </div>
            {% endif %}
        </div>
    </div>
    <script>
        document.addEventListener('DOMContentLoaded', function () {
            const forms = Array.from(document.querySelectorAll('.js-admin-eval-run-form'));
            const overlay = document.getElementById('adminEvalRunOverlay');
            const overlayLabel = document.getElementById('adminEvalRunOverlayLabel');
            function resolveEvalLabel(form) {
                const select = form.querySelector('.js-admin-eval-type-select');
                if (select && select.selectedOptions.length > 0) {
                    return select.selectedOptions[0].textContent.trim();
                }
                return (form.dataset.evalTypeLabel || 'Eval').trim();
            }
            function syncCaseSelect(form) {
                const typeSelect = form.querySelector('.js-admin-eval-type-select');
                const caseSelect = form.querySelector('.js-admin-eval-case-select');
                if (!typeSelect || !caseSelect) {
                    return;
                }
                const selectedType = typeSelect.value;
                Array.from(caseSelect.options).forEach(function (option) {
                    if (option.value === '') {
                        option.hidden = false;
                        option.disabled = false;
                        return;
                    }
                    const matchesType = option.dataset.evalType === selectedType;
                    option.hidden = !matchesType;
                    option.disabled = !matchesType;
                    if (!matchesType && option.selected) {
                        caseSelect.value = '';
                    }
                });
            }
            function setAllRunButtonsDisabled() {
                document.querySelectorAll('.js-admin-eval-run-button').forEach(function (button) {
                    button.disabled = true;
                    button.classList.add('disabled');
                });
            }
            forms.forEach(function (form) {
                syncCaseSelect(form);
                const typeSelect = form.querySelector('.js-admin-eval-type-select');
                if (typeSelect) {
                    typeSelect.addEventListener('change', function () {
                        syncCaseSelect(form);
                    });
                }
                form.addEventListener('submit', function (event) {
                    const button = event.submitter && event.submitter.classList.contains('js-admin-eval-run-button')
                        ? event.submitter
                        : form.querySelector('.js-admin-eval-run-button');
                    const label = resolveEvalLabel(form);
                    if (overlay && overlayLabel) {
                        overlayLabel.textContent = label + ' läuft ...';
                        overlay.classList.remove('d-none');
                    }
                    if (button) {
                        const buttonLabel = button.querySelector('.js-admin-eval-button-label');
                        const spinner = button.querySelector('.js-admin-eval-button-spinner');
                        if (buttonLabel) {
                            buttonLabel.textContent = 'Läuft ...';
                        }
                        if (spinner) {
                            spinner.classList.remove('d-none');
                        }
                    }
                    setAllRunButtonsDisabled();
                    document.body.style.cursor = 'progress';
                });
            });
        });
    </script>
 {% endblock %}
--- a/templates/admin/model_config/list.html.twig
+++ b/templates/admin/model_config/list.html.twig
@@ -4,15 +4,24 @@
 {% block body %}
-    <div class="d-flex justify-content-between align-items-center mb-4">
+    <div class="d-flex justify-content-between align-items-center mb-4 flex-wrap gap-2">
        <h1 class="h3 mb-0"><i class="bi bi-rocket-takeoff-fill"></i> KI Modell-Generierung</h1>
-        {% if is_granted('ROLE_SUPER_ADMIN') %}
-            <a href="{{ path('admin_model_config_create') }}"
-               class="btn btn-sm btn-outline-info">
-                Neue Konfiguration
-            </a>
-        {% endif %}
+        <div class="d-flex flex-wrap gap-2">
+            {% if is_granted('ROLE_KNOWLEDGE_ADMIN') %}
+                <a href="{{ path('admin_evals_index') }}"
+                   class="btn btn-sm btn-outline-warning">
+                    Eval Suite
+                </a>
+            {% endif %}
+            {% if is_granted('ROLE_SUPER_ADMIN') %}
+                <a href="{{ path('admin_model_config_create') }}"
+                   class="btn btn-sm btn-outline-info">
+                    Neue Konfiguration
+                </a>
+            {% endif %}
+        </div>
    </div>
    {# ========================================================= #}
--- a/tests/evals/cases/answer_guard.ndjson
+++ b/tests/evals/cases/answer_guard.ndjson
@@ -0,0 +1,4 @@
 {"id":"answer_guard_noise_no_evidence_001","type":"answer_guard","prompt":"dsgfsdgfsdgf","assert":{"max_results":0}}
 {"id":"answer_guard_mythical_medium_no_direct_evidence_001","type":"answer_guard","prompt":"gibt es einen testomat für drachenblut","assert":{"must_not_include_terms":["drachenblut"]}}
 {"id":"answer_guard_lunar_water_no_direct_evidence_001","type":"answer_guard","prompt":"welcher testomat misst mondwasser im vakuum","assert":{"must_not_include_terms":["mondwasser","vakuum"]}}
 {"id":"answer_guard_delivery_not_sdb_001","type":"answer_guard","prompt":"lieferbedingungen versand testomat","assert":{"min_results":1,"must_include_one_of_document_ids":["26ddf03d-9108-4a65-aa0e-a5df7613fa77"],"must_not_include_document_ids":["7166592f-85f2-425c-997b-73e323ae184d"]}}
--- a/tests/evals/cases/followup.ndjson
+++ b/tests/evals/cases/followup.ndjson
@@ -0,0 +1,4 @@
 {"id":"followup_indicator_price_001","type":"followup","prompt":"was kostet der indikator","history":[{"prompt":"Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?","answer":"Der niedrigste Grenzwert für die Wasserhärte beträgt 0,02 °dH. Dieser Wert wird vom Testomat 808 gemessen."},{"prompt":"mit welchem indikator","answer":"Der niedrigste messbare Grenzwert für Wasserhärte mit dem Testomat 808 wird mit dem Indikatortyp 300 erreicht."}],"assert":{"expected_query":"testomat 808 300 indikator","must_include_terms":["testomat","808","300","indikator"],"must_not_include_terms":["300 s","301","302","303","testomat 2000"]}}
 {"id":"followup_main_device_price_001","type":"followup","prompt":"und was kostet das gerät selber","history":[{"prompt":"was kostet der indikator","answer":"Shop-Suche abgeschlossen. Gesendete Suchquery: testomat 808 300 indikator. Testomat® 808 Indikator 300 500 ml, Produkt-Nummer 141001. Testomat® 808 Indikator 300 2 x 100 ml, Produkt-Nummer 140001. Der zugehörige Testomat ist Testomat 808."}],"assert":{"expected_query":"testomat 808","must_include_terms":["testomat","808"],"must_not_include_terms":["indikator","300","141001","140001"]}}
 {"id":"followup_weak_shop_information_anchor_001","type":"followup","prompt":"suche im shop nach der information","history":[{"prompt":"welche grenzwerte kann der testomat 2000 thcl messen","answer":"Der relevante Produktanker ist Testomat 2000 THCL. Das Gerät ist für Chlorüberwachung / freies Chlor relevant."}],"assert":{"expected_query":"testomat 2000 thcl","must_include_terms":["testomat","2000","thcl"],"must_not_equal_query":"information","must_not_include_terms":["information"]}}
 {"id":"followup_product_links_split_001","type":"followup","prompt":"gebe mir links zu den produkten aus dem shop","history":[{"prompt":"gerät zur messung Prozesswasser in medizinischen Geräten","answer":"Geeignete Produktanker sind Testomat 2000 Self Clean, Testomat 2000 CAL und Testomat 808."}],"assert":{"expected_individual_queries":["testomat 2000 self clean","testomat 2000 cal","testomat 808"],"expected_individual_queries_exact":true,"min_individual_queries":3,"max_individual_queries":3,"must_not_include_terms":["links zu aus"]}}
--- a/tests/evals/cases/retrieval.ndjson
+++ b/tests/evals/cases/retrieval.ndjson
@@ -16,4 +16,4 @@
 {"id":"retrieval_negative_003","type":"retrieval","prompt":"testomat 2000 self clean reinigungsloesung","assert":{"min_results":1,"must_include_one_of_document_ids":["51589532-a1a1-46e0-94b2-a139dce78543","b8c3343b-931e-4994-9d53-a2130efc846f"],"must_include_any_terms":["reinigungslösung","self clean"],"must_not_include_document_ids":["26129c01-c09f-4c71-9c80-7ddffb6c77fb"]}}
 {"id":"retrieval_short_001","type":"retrieval","prompt":"evo th","assert":{"min_results":1,"must_include_one_of_document_ids":["eb91c1be-4546-4ed5-8b01-f075519d675b","74fdad85-5e4e-4f08-8d95-402f3180ed55"],"must_include_any_terms":["evo"]}}
 {"id":"retrieval_short_002","type":"retrieval","prompt":"808","assert":{"min_results":1,"must_include_one_of_document_ids":["26129c01-c09f-4c71-9c80-7ddffb6c77fb"],"must_include_any_terms":["808"]}}
-{"id":"retrieval_noise_001","type":"retrieval","prompt":"dsgfsdgfsdgf","assert":{"max_results":0}}
+{"id":"retrieval_notfound_doc","type":"retrieval","prompt":"hdfghdfghdfhg","assert":{"min_results":0}}
--- a/tests/evals/cases/shop_query.ndjson
+++ b/tests/evals/cases/shop_query.ndjson
@@ -0,0 +1,5 @@
 {"id":"shop_query_indicator_exact_001","type":"shop_query","prompt":"was kostet der Testomat 808 Indikator 300","assert":{"must_include_terms":["testomat","808","300","indikator"],"must_not_include_terms":["300 s","301","302","303","gerät selber"]}}
 {"id":"shop_query_brewing_water_cleanup_001","type":"shop_query","prompt":"ich möchte für brauerei das brauwasser messen","assert":{"expected_query":"brauerei brauwasser","must_include_terms":["brauerei","brauwasser"],"must_not_include_terms":["möchte","messen","think"]}}
 {"id":"shop_query_swimming_pool_typo_001","type":"shop_query","prompt":"ich würde gern chlor im schwinnbad messen","assert":{"expected_query":"chlor schwimmbad","must_include_terms":["chlor","schwimmbad"],"must_not_include_terms":["schwinnbad","messen"]}}
 {"id":"shop_query_lab_cl_acronym_001","type":"shop_query","prompt":"Zeige mir die Preise zu Testomat LAB CL.","assert":{"expected_query":"testomat lab cl","must_include_terms":["testomat","lab","cl"],"must_not_equal_query":"testomat"}}
 {"id":"shop_query_sio2_anchor_001","type":"shop_query","prompt":"suche gerät kühlsysteme Silikatüberwachung","assert":{"expected_query":"testomat 808 sio2","must_include_terms":["testomat","808","sio2"],"must_not_include_terms":["kühlsysteme","silikatüberwachung"]}}
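The `assert` blocks in the `followup` and `shop_query` cases above could be checked against a generated search query roughly as sketched below. This is only an illustration: the actual RetrieX evaluator is not part of this diff, `checkQueryAssertions` is a hypothetical helper name, and matching terms as case-insensitive substrings is an assumption rather than confirmed behavior. Only the keys `expected_query`, `must_not_equal_query`, `must_include_terms`, and `must_not_include_terms` from the cases above are handled.

```javascript
// Hypothetical helper; the real RetrieX evaluator is not shown in this diff.
// Assumption: terms are matched as case-insensitive substrings of the query.
function checkQueryAssertions(query, assert) {
    const normalized = query.trim().toLowerCase();
    const failures = [];

    // Exact-match expectation on the whole query.
    if (typeof assert.expected_query === 'string'
        && normalized !== assert.expected_query.toLowerCase()) {
        failures.push('expected_query');
    }
    // The query must not collapse to a forbidden value.
    if (typeof assert.must_not_equal_query === 'string'
        && normalized === assert.must_not_equal_query.toLowerCase()) {
        failures.push('must_not_equal_query');
    }
    // Every required term has to appear in the query.
    for (const term of assert.must_include_terms || []) {
        if (!normalized.includes(term.toLowerCase())) {
            failures.push('must_include_terms: ' + term);
        }
    }
    // No forbidden term may appear in the query.
    for (const term of assert.must_not_include_terms || []) {
        if (normalized.includes(term.toLowerCase())) {
            failures.push('must_not_include_terms: ' + term);
        }
    }
    return failures;
}

// One line from followup.ndjson, reduced to the fields used here.
const caseLine = '{"id":"followup_main_device_price_001","assert":{"expected_query":"testomat 808","must_include_terms":["testomat","808"],"must_not_include_terms":["indikator","300","141001","140001"]}}';
const evalCase = JSON.parse(caseLine);

const passing = checkQueryAssertions('testomat 808', evalCase.assert);
const failing = checkQueryAssertions('testomat 808 300 indikator', evalCase.assert);

console.log(evalCase.id, passing.length === 0 ? 'PASS' : 'FAIL', passing);
console.log(evalCase.id, failing.length === 0 ? 'PASS' : 'FAIL', failing);
```

Returning a list of failed keys instead of a single boolean keeps the sketch close to what the JSON details panel in the admin report would need to display, although the real report structure may differ.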
Author	SHA1	Message	Date
team 1	64d1ec71e8	p101d	2026-05-12 11:53:36 +02:00
team 1	3f914c1efd	p101b	2026-05-12 11:26:05 +02:00
team 1	6e2ca15e97	p101a	2026-05-12 11:08:34 +02:00
team 1	6dced1c4df	p101	2026-05-12 10:56:50 +02:00
team 1	feaec9bbaf	p100c	2026-05-12 09:16:09 +02:00
team 1	0d55c0a439	p100	2026-05-12 08:57:57 +02:00
team 1	03d4a1d7c3	p99c	2026-05-12 08:38:16 +02:00
team 1	3d0092b753	p99	2026-05-12 08:25:59 +02:00
team 1	e072a8e15e	p98	2026-05-12 07:53:49 +02:00
team 1	aa80acb10f	add multi model	2026-05-11 20:56:57 +02:00