p101d

p101b
p101a
2026-05-12 11:53:36 +02:00 · 2026-05-12 11:26:05 +02:00 · 2026-05-12 11:08:34 +02:00 · 2026-05-12 10:56:50 +02:00 · 2026-05-12 09:16:09 +02:00 · 2026-05-12 08:57:57 +02:00
40 changed files with 4290 additions and 83 deletions
--- a/CONFIG_PARAMS.md
+++ b/CONFIG_PARAMS.md
@@ -311,6 +311,7 @@ Wichtig: `genre.yaml` ist in v1.6.0 eine zentrale Entlastung des PHP-Cores. Doma
 | `min_chunk_distance` | Minimum distance between selected chunks. |
 | `dominant_doc_*` | Preference for dominant documents when the hits clearly favor them. |
 | `exact_document_max_chunks` | Maximum number of chunks for an exact document focus. |
+| `query_cleanup_profile` | YAML cleanup profile for generic retrieval query cleanup. |
 | `focused_product_*` | Focused product selection in retrieval. |
 | `catalog_list_shortcut_patterns` | Direct catalog/list routes. |
 | `exact_selection_*` | Precision logic for tables, indicators, limit values, and measuring ranges. |
--- a/RETRIEX-EVAL-CASE-HOWTO.md
+++ b/RETRIEX-EVAL-CASE-HOWTO.md
@@ -0,0 +1,731 @@
+# RetrieX How-to: Creating new eval cases correctly
+
+This how-to describes how to add new regression tests to the RetrieX eval suite through the admin area.
+
+The goal is to lock in new red (failing) or domain-critical cases permanently, without directly changing core logic, retrieval rules, or shop-query heuristics.
+
+## Getting started
+
+Admin path:
+
+```text
+/admin/evals/
+```
+
+In the **"Eval-Case erstellen"** (create eval case) area, new cases can be created for the following types:
+
+```text
+retrieval
+shop_query
+followup
+answer_guard
+```
+
+After saving, the case is written to the matching file:
+
+```text
+tests/evals/cases/retrieval.ndjson
+tests/evals/cases/shop_query.ndjson
+tests/evals/cases/followup.ndjson
+tests/evals/cases/answer_guard.ndjson
+```
+
+---
+
+## Basic rule
+
+A good eval case checks exactly **one clear fact**.
+
+Good:
+
+```json
+{
+  "expected_query": "testomat 808",
+  "must_not_include_terms": [
+    "indikator",
+    "300"
+  ]
+}
+```
+
+Less good:
+
+```json
+{
+  "expected_query": "testomat 808",
+  "must_include_terms": [
+    "testomat",
+    "808",
+    "gerät",
+    "preis",
+    "wasserhärte"
+  ],
+  "must_not_include_terms": [
+    "indikator",
+    "300",
+    "testomat 2000",
+    "chlor",
+    "versand"
+  ]
+}
+```
+
+The smaller and more unambiguous the case, the better it works as a regression test.
+
+---
+
+# Fields in the admin
+
+## 1. Eval type
+
+Choose the type that matches the goal of the test.
+
+```text
+retrieval      → checks whether the right RAG documents/chunks are found
+shop_query     → checks which shop query is produced from a direct prompt
+followup       → checks which shop query is produced from prompt + chat history
+answer_guard   → checks no-answer, non-hallucination, or evidence cases
+```
+
+Rule of thumb:
+
+```text
+Is the right document found?                          → retrieval
+Is the right shop query generated?                    → shop_query
+Does RetrieX understand the follow-up in context?     → followup
+Does RetrieX avoid inventing things on weak evidence? → answer_guard
+```
+
+---
+
+## 2. New case ID
+
+The case ID must be unique and may only contain the following characters:
+
+```text
+letters
+digits
+_
+-
+```
+
+Good examples:
+
+```text
+retrieval_semantic_chlor_clt_001
+shop_query_indicator_300_exact_002
+followup_main_device_price_002
+answer_guard_unknown_medium_001
+```
+
+Do not use:
+
+```text
+Test 1
+shop query indikator 300
+gerät/frage/neue-version
+```
+
+Recommended schema:
+
+```text
+<type>_<topic>_<goal>_<number>
+```
+
+Example:
+
+```text
+followup_testomat808_device_price_001
+```
+
+---
+
+## 3. Prompt
+
+Enter exactly the user prompt that is to be tested.
+
+Examples:
+
+```text
+welches geraet ist fuer chlorueberwachung gedacht
+```
+
+```text
+was kostet der indikator
+```
+
+```text
+und was kostet das gerät selber
+```
+
+```text
+welcher testomat misst drachenblut
+```
+
+Enter the prompt as closely as possible to how it really occurs in the chat. Typos may be included deliberately if that exact behavior is what the case should protect.
+
+---
+
+## 4. Assert-JSON
+
+The Assert-JSON describes what the test should check.
+
+The field must always be a valid JSON object:
+
+```json
+{
+}
+```
+
+Important:
+
+- No comments in the JSON
+- No trailing commas
+- Use double quotes
+- The field must be an object `{ ... }`, not an array
+
+---
+
+# Eval types and examples
+
+## A) Retrieval case
+
+Retrieval cases check whether the right RAG documents or chunks are found.
+
+### Minimal positive retrieval case
+
+```json
+{
+  "min_results": 1
+}
+```
+
+### Retrieval case with an expected document ID
+
+```json
+{
+  "min_results": 1,
+  "must_include_one_of_document_ids": [
+    "DOKUMENT-ID-HIER"
+  ]
+}
+```
+
+### Retrieval case with several possible target documents
+
+```json
+{
+  "min_results": 1,
+  "must_include_one_of_document_ids": [
+    "DOKUMENT-ID-1",
+    "DOKUMENT-ID-2"
+  ]
+}
+```
+
+### Retrieval case with required terms
+
+```json
+{
+  "min_results": 1,
+  "must_include_any_terms": [
+    "lieferung",
+    "versand"
+  ]
+}
+```
+
+### Retrieval case with forbidden documents
+
+```json
+{
+  "min_results": 1,
+  "must_not_include_document_ids": [
+    "FALSCHE-DOKUMENT-ID"
+  ]
+}
+```
+
+### Retrieval case for no-result / nonsense input
+
+```json
+{
+  "max_results": 0
+}
+```
+
+### Recommended retrieval structure
+
+```json
+{
+  "min_results": 1,
+  "must_include_one_of_document_ids": [
+    "DOKUMENT-ID-HIER"
+  ],
+  "must_include_any_terms": [
+    "wichtiger fachbegriff",
+    "produktname"
+  ]
+}
+```
+
+---
+
+## B) Shop-query case
+
+Shop-query cases check which shop query is produced from a direct prompt.
+
+### Exact shop query
+
+Prompt:
+
+```text
+was kostet der Testomat 808 Indikator 300
+```
+
+Assert-JSON:
+
+```json
+{
+  "expected_query": "testomat 808 300 indikator"
+}
+```
+
+### Shop query with required and forbidden terms
+
+```json
+{
+  "must_include_terms": [
+    "testomat",
+    "808",
+    "300",
+    "indikator"
+  ],
+  "must_not_include_terms": [
+    "300 s",
+    "301",
+    "302",
+    "303"
+  ]
+}
+```
+
+### Query must not collapse to noise
+
+```json
+{
+  "must_not_equal_query": "information"
+}
+```
+
+### Multi-product or link follow-up with individual queries
+
+```json
+{
+  "expected_individual_queries": [
+    "testomat 2000 self clean",
+    "testomat 2000 cal",
+    "testomat 808"
+  ],
+  "expected_individual_queries_exact": true
+}
+```
+
+### Recommendation for shop-query cases
+
+Do not pin every case down with a strict `expected_query` right away. While query construction is still variable, this is often the better choice:
+
+```json
+{
+  "must_include_terms": [
+    "testomat",
+    "808",
+    "sio2"
+  ],
+  "must_not_include_terms": [
+    "gerät",
+    "möchte",
+    "messen"
+  ]
+}
+```
+
+Use `expected_query` only when the query is already stable and is deliberately meant to match exactly.
+
+---
+
+## C) Follow-up case
+
+Follow-up cases check whether RetrieX uses the chat history correctly.
+
+For `followup`, **History-JSON is effectively mandatory**, because otherwise no real history is tested.
+
+### Example: indicator price after a conversation
+
+Prompt:
+
+```text
+was kostet der indikator
+```
+
+History-JSON:
+
+```json
+[
+  {
+    "prompt": "Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?",
+    "answer": "Der niedrigste Grenzwert für die Wasserhärte beträgt 0,02 °dH. Dieser Wert wird vom Testomat 808 gemessen."
+  },
+  {
+    "prompt": "mit welchem indikator",
+    "answer": "Der niedrigste messbare Grenzwert für Wasserhärte mit dem Testomat 808 wird mit dem Indikatortyp 300 erreicht."
+  }
+]
+```
+
+Assert-JSON:
+
+```json
+{
+  "expected_query": "testomat 808 300 indikator",
+  "must_include_terms": [
+    "testomat",
+    "808",
+    "300",
+    "indikator"
+  ],
+  "must_not_include_terms": [
+    "300 s",
+    "301",
+    "302",
+    "303",
+    "testomat 2000"
+  ]
+}
+```
+
+### Example: switching from the indicator back to the main device
+
+Prompt:
+
+```text
+und was kostet das gerät selber
+```
+
+History-JSON:
+
+```json
+[
+  {
+    "prompt": "was kostet der indikator",
+    "answer": "Shop-Suche abgeschlossen. Gesendete Suchquery: testomat 808 300 indikator. Testomat® 808 Indikator 300 500 ml, Produkt-Nummer 141001. Testomat® 808 Indikator 300 2 x 100 ml, Produkt-Nummer 140001. Der zugehörige Testomat ist Testomat 808."
+  }
+]
+```
+
+Assert-JSON:
+
+```json
+{
+  "expected_query": "testomat 808",
+  "must_include_terms": [
+    "testomat",
+    "808"
+  ],
+  "must_not_include_terms": [
+    "indikator",
+    "300",
+    "141001",
+    "140001"
+  ]
+}
+```
+
+### Recommendation for follow-up cases
+
+The history should contain exactly the information the real chat had beforehand.
+
+Not too little:
+
+```text
+Only "Indikator 300" without a device anchor can be too ambiguous.
+```
+
+Not too much:
+
+```text
+A complete, long chat history can make the case unnecessarily unstable.
+```
+
+A short, domain-relevant excerpt works well.
+
+---
+
+## D) Answer-guard case
+
+Answer-guard cases check that RetrieX invents nothing when faced with nonsense, weak evidence, or wrong associations.
+
+### Nonsense should return no hits
+
+Prompt:
+
+```text
+dsgfsdgfsdgf
+```
+
+Assert-JSON:
+
+```json
+{
+  "max_results": 0
+}
+```
+
+### An invented medium must not be answered as a real product
+
+Prompt:
+
+```text
+welcher testomat misst drachenblut
+```
+
+Assert-JSON:
+
+```json
+{
+  "must_not_include_terms": [
+    "drachenblut"
+  ]
+}
+```
+
+### The wrong document must not be retrieved
+
+```json
+{
+  "min_results": 1,
+  "must_not_include_document_ids": [
+    "FALSCHE-DOKUMENT-ID"
+  ]
+}
+```
+
+### Recommendation for answer-guard cases
+
+In answer-guard cases, avoid overreacting to single words anywhere in the full retrieval text. Better anchors are:
+
+```text
+document IDs
+clear product names
+clear forbidden target terms
+max_results for nonsense input
+```
+
+A word appearing somewhere in the retrieval context is not automatically a domain error.
+
+---
+
+# Optional field: History-JSON
+
+History-JSON is used mainly for `followup`.
+
+Format:
+
+```json
+[
+  {
+    "prompt": "previous user question",
+    "answer": "previous answer or relevant excerpt"
+  }
+]
+```
+
+Multiple turns:
+
+```json
+[
+  {
+    "prompt": "first question",
+    "answer": "first answer"
+  },
+  {
+    "prompt": "second question",
+    "answer": "second answer"
+  }
+]
+```
+
+Important:
+
+```text
+History-JSON is an array [...]
+Assert-JSON is an object {...}
+```
+
+---
+
+# Optional field: Request Context Hint
+
+This field can usually stay empty.
+
+It is only useful when a case needs to simulate additional context that cannot be represented cleanly through the history.
+
+Example:
+
+```text
+Visible shop results contain Testomat 808 and Testomat 808 Indikator 300.
+The user asks about the device itself.
+```
+
+Recommendation:
+
+```text
+For normal regressions, prefer History-JSON.
+Use Request Context Hint only for special cases.
+```
+
+---
+
+# Complete example: follow-up device price
+
+## Eval type
+
+```text
+followup
+```
+
+## New case ID
+
+```text
+followup_testomat808_main_device_price_002
+```
+
+## Prompt
+
+```text
+und was kostet das gerät selber
+```
+
+## Assert-JSON
+
+```json
+{
+  "expected_query": "testomat 808",
+  "must_include_terms": [
+    "testomat",
+    "808"
+  ],
+  "must_not_include_terms": [
+    "indikator",
+    "300",
+    "141001",
+    "140001"
+  ]
+}
+```
+
+## History-JSON
+
+```json
+[
+  {
+    "prompt": "was kostet der indikator",
+    "answer": "Shop-Suche abgeschlossen. Gesendete Suchquery: testomat 808 300 indikator. Testomat® 808 Indikator 300 500 ml, Produkt-Nummer 141001. Testomat® 808 Indikator 300 2 x 100 ml, Produkt-Nummer 140001. Der zugehörige Testomat ist Testomat 808."
+  }
+]
+```
+
+## Request Context Hint
+
+Leave empty.
+
+---
+
+# Verify after saving
+
+After saving, run the matching eval type.
+
+In the admin:
+
+```text
+/admin/evals/
+```
+
+Or via the CLI:
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+For a single type:
+
+```bash
+php bin/console mto:agent:eval:run followup
+```
+
+---
+
+# Practical checklist
+
+Check before saving:
+
+```text
+[ ] Eval type matches the goal
+[ ] Case ID is unique
+[ ] Case ID contains only letters, digits, _ or -
+[ ] Prompt is realistic and exact
+[ ] Assert-JSON is a valid JSON object
+[ ] History-JSON is present for follow-up cases
+[ ] History-JSON is a valid JSON array
+[ ] The case checks only one clear fact
+[ ] Assertions are not unnecessarily strict
+[ ] After saving, the matching eval type runs green
+```
+
+---
+
+# When to create a new eval case
+
+A new case makes sense when:
+
+```text
+a real prompt was red
+an important green flow should be protected permanently
+a typo/noise case should stay stable
+a product identity must not get lost
+a wrong document association should be prevented
+a no-answer situation must not hallucinate
+```
+
+No new case is needed when:
+
+```text
+only the wording of an answer differed slightly
+the prompt is not relevant to the domain
+the expectation cannot be defined unambiguously
+the case would check several independent things at once
+```
+
+---
+
+# Guideline
+
+As of RetrieX v1.6.2:
+
+```text
+No new accuracy logic without a concrete red or domain-critical eval case.
+```
+
+New optimizations should therefore, whenever possible, follow this sequence:
+
+```text
+1. Test the prompt
+2. Assess the behavior
+3. If important: create an eval case
+4. Get the eval green
+5. Only then change logic, YAML, or parameters
+```
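To make the shop-query assertion keys from the how-to above concrete, here is a minimal sketch in Python. It is not the project's PHP eval runner: the matching rules (case-insensitive, whitespace-normalized, substring matching for terms) and the function name `check_query_assertions` are assumptions chosen for illustration.

```python
from typing import Any


def check_query_assertions(query: str, assertions: dict[str, Any]) -> list[str]:
    """Return failure messages for a generated shop query; empty list means pass.

    Assumption: terms are matched as case-insensitive substrings of the
    whitespace-normalized query. The real runner may match differently.
    """
    failures: list[str] = []
    normalized = " ".join(query.lower().split())

    expected = assertions.get("expected_query")
    if expected is not None and normalized != expected.lower():
        failures.append(f"expected query {expected!r}, got {normalized!r}")

    forbidden_exact = assertions.get("must_not_equal_query")
    if forbidden_exact is not None and normalized == forbidden_exact.lower():
        failures.append(f"query must not equal {forbidden_exact!r}")

    for term in assertions.get("must_include_terms", []):
        if term.lower() not in normalized:
            failures.append(f"missing required term {term!r}")

    for term in assertions.get("must_not_include_terms", []):
        if term.lower() in normalized:
            failures.append(f"forbidden term {term!r} present")

    return failures


if __name__ == "__main__":
    case = {"expected_query": "testomat 808",
            "must_not_include_terms": ["indikator", "300"]}
    print(check_query_assertions("testomat 808 300 indikator", case))
```

The sketch also shows why small cases are easier to maintain: every extra key is one more independent way for the case to turn red.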
--- a/composer.json
+++ b/composer.json
@@ -29,7 +29,8 @@
        "symfony/twig-bundle": "7.4.*",
        "symfony/uid": "7.4.*",
        "symfony/yaml": "^7.4",
-      "ext-sqlite3": "*"
+      "ext-sqlite3": "*",
+      "ext-mbstring": "*"
    },
    "config": {
        "optimize-autoloader": true,
--- a/config/retriex/genre.yaml
+++ b/config/retriex/genre.yaml
@@ -759,6 +759,15 @@ parameters:
            Grenzwert: Überwachungsbereich
            store: shop
            Indikatortyp: Indikator
+            geraet: gerät analysegerät
+            geraete: geräte analysegeräte
+            wasserhaerte: wasserhärte
+            haerte: härte
+            ueberwachung: überwachung
+            chlorueberwachung: chlor überwachung chlorüberwachung
+            haerteueberwachung: härteüberwachung härte überwachung
+            haerteueberwachungsgeraet: härteüberwachungsgerät härteüberwachung analysegerät
+            lieferbedingungen: lieferung versand verkaufsbedingungen allgemeine lieferbedingungen
        accessory_focus_variants:
          origin: genre_native
          map:
@@ -1277,6 +1286,13 @@ parameters:
          - schwimmbad
          - schwimmbecken
          - pool
+          - silikat
+          - silikatüberwachung
+          - silikatueberwachung
+          - sio2
+          - si o2
+          - kieselsäure
+          - kieselsaeure
          - 0,02
        stopword_cleanup:
          origin: genre_native
@@ -2008,6 +2024,8 @@ parameters:
          - tm
          - ph
          - rx
+          - v
+          - c
          family_descriptor_tokens:
          - evo
          - eco
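The ASCII-to-umlaut entries added to `genre.yaml` above map one query token to additional search terms. A rough Python sketch of how such a map could be applied is shown below. Whether RetrieX appends the extra terms or replaces the original token is not visible in this diff, so the append behavior and the name `enrich_query` are assumptions.

```python
# Subset of the rules added in genre.yaml; the values are copied from the diff.
ENRICHMENT_RULES = {
    "geraet": "gerät analysegerät",
    "chlorueberwachung": "chlor überwachung chlorüberwachung",
    "lieferbedingungen": "lieferung versand verkaufsbedingungen allgemeine lieferbedingungen",
}


def enrich_query(query: str, rules: dict[str, str]) -> str:
    """Append the mapped terms for every token that has a rule (assumed behavior)."""
    tokens = query.lower().split()
    enriched = list(tokens)
    for token in tokens:
        for extra in rules.get(token, "").split():
            if extra not in enriched:  # avoid duplicate terms
                enriched.append(extra)
    return " ".join(enriched)


if __name__ == "__main__":
    print(enrich_query("welches geraet ist fuer chlorueberwachung gedacht", ENRICHMENT_RULES))
```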
--- a/config/retriex/retrieval.yaml
+++ b/config/retriex/retrieval.yaml
@@ -22,6 +22,7 @@ parameters:
    dominant_doc_min_hits: 3
    dominant_doc_max_chunks: 4
    exact_document_max_chunks: 6
+    query_cleanup_profile: retrieval_reference_cleanup
    focused_product_window: 8
    focused_product_min_score: 10.0
    focused_product_min_gap: 4.0
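The new `query_cleanup_profile` key selects a named cleanup profile for retrieval queries. The sketch below illustrates the idea in Python under simplifying assumptions: a profile is modeled as a plain set of stopwords, and only `welcher` and `welches` (the two words named in the p98 README) are listed. The real profile contents and the `QueryCleaner` logic are not part of this diff.

```python
# Hypothetical profile table; only the profile name comes from retrieval.yaml.
CLEANUP_PROFILES = {
    "retrieval_reference_cleanup": {"welcher", "welches"},
}


def clean_query(query: str, profile_name: str, profiles: dict[str, set[str]]) -> str:
    """Drop every token listed in the selected cleanup profile."""
    stopwords = profiles.get(profile_name, set())
    return " ".join(t for t in query.lower().split() if t not in stopwords)


if __name__ == "__main__":
    print(clean_query("welcher testomat ist ein verschneideregler",
                      "retrieval_reference_cleanup", CLEANUP_PROFILES))
```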
--- a/patch_history/RETRIEX_PATCH_100B_ADMIN_EVAL_CASE_SELECTION_FIX_README.md
+++ b/patch_history/RETRIEX_PATCH_100B_ADMIN_EVAL_CASE_SELECTION_FIX_README.md
@@ -0,0 +1,37 @@
+# RetrieX Patch p100b - Admin Eval Case Selection Fix
+
+## Goal
+
+Fixes the admin eval UX when a single case is selected and the request ends with `No eval cases selected.`.
+
+## Cause
+
+The p100/p100a page used a free-text `datalist` field for case IDs that contained cases of all eval types. As a result, a case from `shop_query` could be selected while the form was still submitting a different eval type. The admin service then searched only the case file of the submitted type and found no matching cases.
+
+## Changes
+
+- The free-text case ID field was replaced by a filtered select.
+- The case list is filtered client-side to match the selected eval type.
+- When the eval type changes, a case selection that no longer matches is cleared automatically.
+- The admin service is more robust: if a case ID is not found in the submitted type, it is looked up across all supported eval types and run with the correct type.
+- After the run, the controller redirects to the eval type that was actually executed.
+- The old, unclear message `No eval cases selected.` is replaced by specific error texts.
+
+## Scope
+
+No changes to:
+
+- Retrieval logic
+- Shop-query logic
+- Follow-up logic
+- Answer-guard logic
+- Eval cases
+- YAML configuration
+- Model parameters
+- Database/migrations
+
+## Changed files
+
+- `src/Controller/Admin/AdminEvalController.php`
+- `src/Service/Admin/EvalAdminService.php`
+- `templates/admin/evals/index.html.twig`
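The cross-type fallback described in the p100b changes can be pictured roughly as follows. This is an illustrative Python sketch, not the code in `EvalAdminService`: the function name `resolve_eval_type` and the in-memory mapping of type to case IDs are assumptions.

```python
def resolve_eval_type(case_id: str, submitted_type: str,
                      case_ids_by_type: dict[str, set[str]]) -> str:
    """Return the eval type that actually contains the case.

    The submitted type wins if it contains the case; otherwise all other
    types are searched. A concrete error replaces a generic empty result.
    """
    if case_id in case_ids_by_type.get(submitted_type, set()):
        return submitted_type
    for eval_type, ids in case_ids_by_type.items():
        if case_id in ids:
            return eval_type
    raise LookupError(f"Case {case_id!r} was not found in any eval type.")


if __name__ == "__main__":
    ids = {"retrieval": {"retrieval_semantic_chlor_clt_001"},
           "shop_query": {"shop_query_indicator_300_exact_002"}}
    # Form submitted "retrieval", but the case lives in shop_query.
    print(resolve_eval_type("shop_query_indicator_300_exact_002", "retrieval", ids))
```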
--- a/patch_history/RETRIEX_PATCH_100C_ADMIN_EVAL_DOCUMENT_LABELS_README.md
+++ b/patch_history/RETRIEX_PATCH_100C_ADMIN_EVAL_DOCUMENT_LABELS_README.md
@@ -0,0 +1,45 @@
+# RetrieX Patch p100c - Admin Eval Document Labels
+
+## Goal
+
+For retrieval and answer-guard cases, the admin eval results should show not only the technical `document_id` and `chunk_id` values but also human-readable document information, so that a retrieved document is easier to identify in the admin and in the file inventory.
+
+## Changes
+
+- `NdjsonHybridRetriever::retrieveDebug()` additionally returns, per debug hit:
+  - `document_title`
+  - `file_path`
+  - `version_number`
+- `RetrievalDebugRunner` additionally writes to eval reports:
+  - `document_refs`: a deduplicated document overview with title, file, version, ranks, and chunk IDs
+  - `result_rows`: a rank-ordered hit list with title, file, chunk ID, and text preview
+- The admin eval template shows this information directly in the result details:
+  - table "Gefundene Dokumente" (documents found)
+  - collapsible table "Treffer / Chunks anzeigen" (show hits / chunks)
+  - JSON details remain available
+
+## Not changed
+
+- No eval assertions changed
+- No retrieval weights changed
+- No shop-query, follow-up, or answer logic changed
+- No YAML or parameter change
+- No database migration
+
+## Verification
+
+After applying:
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Then in the admin:
+
+```text
+/admin/evals/
+```
+
+Open a retrieval or answer-guard eval and check that the results show title and file in addition to the doc ID.
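As an illustration of the `document_refs` overview described in p100c, the following Python sketch groups a rank-ordered hit list by document. The field names `document_id`, `chunk_id`, `document_title`, `file_path`, and `version_number` come from the README; the keys `ranks` and `chunk_ids`, the function name, and the exact report layout are assumptions, since `RetrievalDebugRunner` itself is not shown in this part of the diff.

```python
def build_document_refs(result_rows: list[dict]) -> list[dict]:
    """Collapse rank-ordered hits into one entry per document."""
    refs: dict[str, dict] = {}
    for rank, row in enumerate(result_rows, start=1):
        ref = refs.setdefault(row["document_id"], {
            "document_id": row["document_id"],
            "document_title": row.get("document_title"),
            "file_path": row.get("file_path"),
            "version_number": row.get("version_number"),
            "ranks": [],      # assumed key: positions at which the document appeared
            "chunk_ids": [],  # assumed key: chunks of this document among the hits
        })
        ref["ranks"].append(rank)
        ref["chunk_ids"].append(row["chunk_id"])
    # dicts preserve insertion order, so documents stay in first-hit order
    return list(refs.values())


if __name__ == "__main__":
    rows = [
        {"document_id": "d1", "chunk_id": "c1", "document_title": "Manual"},
        {"document_id": "d2", "chunk_id": "c2", "document_title": "Datasheet"},
        {"document_id": "d1", "chunk_id": "c3", "document_title": "Manual"},
    ]
    for ref in build_document_refs(rows):
        print(ref["document_title"], ref["ranks"], ref["chunk_ids"])
```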
--- a/patch_history/RETRIEX_PATCH_100D_ADMIN_EVAL_PROMPT_CONTEXT_README.md
+++ b/patch_history/RETRIEX_PATCH_100D_ADMIN_EVAL_PROMPT_CONTEXT_README.md
@@ -0,0 +1,44 @@
+# RetrieX Patch p100d – Admin Eval Prompt Context
+
+Status: patch-only follow-up for p100 Admin Eval UX.
+
+## Goal
+
+Make eval results easier to understand in the Admin UI by showing the actual case prompt directly next to the case id. For follow-up and shopquery cases, show a compact history/context preview as well.
+
+## Changes
+
+- Admin eval result table now displays the case prompt below the case id.
+- Follow-up/shopquery eval details now include a compact history preview.
+- Admin eval result table shows history/context in a collapsible section when available.
+
+## Files changed
+
+- `src/Eval/ShopQueryEvalRunner.php`
+- `templates/admin/evals/index.html.twig`
+
+## Non-goals
+
+No production answer logic is changed:
+
+- no retrieval logic changes
+- no shopquery logic changes
+- no follow-up logic changes
+- no answer-guard logic changes
+- no eval assertion changes
+- no YAML or parameter changes
+- no database migration
+
+## Validation
+
+Recommended after applying:
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Then open `/admin/evals/` and verify that each result row shows the case prompt and that follow-up/shopquery rows can reveal context/history.
--- a/patch_history/RETRIEX_PATCH_100_ADMIN_EVAL_UX_README.md
+++ b/patch_history/RETRIEX_PATCH_100_ADMIN_EVAL_UX_README.md
@@ -0,0 +1,75 @@
+# RetrieX Patch p100 - Admin Eval UX
+
+Status: patch-only candidate
+Basis: confirmed v1.6.2 + p99/p99b/p99c green eval suite
+
+## Goal
+
+p100 makes the eval suite introduced with p99 visible and usable in the admin, without changing the behavior of the production RAG, shop, prompt, scoring, or answer logic.
+
+## Included
+
+- New admin area `/admin/evals/`
+- Overview of the eval types:
+  - `retrieval`
+  - `shop_query`
+  - `followup`
+  - `answer_guard`
+- Display of the number of cases per type
+- Display of the latest type-specific reports from `tests/evals/reports/<type>-last-run.json`
+- Run buttons per eval type
+- Form for running a complete type or a single case ID
+- Detail view for PASS/FAIL, errors, and result details
+- CLI reference in the admin
+- Sidebar link under KI-Endpunkte (AI endpoints)
+- Link from the AI/LLM setup page to the eval suite
+
+## Report behavior
+
+Admin runs write two reports:
+
+- `tests/evals/reports/<type>-last-run.json`
+- `tests/evals/reports/last-run.json`
+
+The CLI is unchanged and still writes the familiar `last-run.json`.
+
+## Roles
+
+The new area is protected at controller level by `ROLE_KNOWLEDGE_ADMIN`.
+
+## Not changed
+
+- no retrieval weights
+- no shop-query generation logic
+- no follow-up logic
+- no answer-guard logic
+- no prompt change
+- no YAML vocabulary change
+- no model parameter change
+- no database migration
+
+## Changed files
+
+- `src/Controller/Admin/AdminEvalController.php`
+- `src/Service/Admin/EvalAdminService.php`
+- `templates/admin/evals/index.html.twig`
+- `templates/admin/base.html.twig`
+- `templates/admin/model_config/list.html.twig`
+- `patch_history/RETRIEX_PATCH_100_ADMIN_EVAL_UX_README.md`
+
+## Verification after applying
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Additionally, check in the browser:
+
+- `/admin/evals/`
+- Run an eval type
+- Open the detail report
+- Sidebar link is visible for knowledge admins
--- a/patch_history/RETRIEX_PATCH_101A_ADMIN_EVAL_CASE_CREATOR_PAGE_README.md
+++ b/patch_history/RETRIEX_PATCH_101A_ADMIN_EVAL_CASE_CREATOR_PAGE_README.md
@@ -0,0 +1,54 @@
+# RetrieX Patch p101a - Admin Eval Case Creator Separate Page
+
+## Goal
+
+The eval case creator becomes its own admin page, so that the eval suite overview stays lean and is not bloated by the full case creation form.
+
+## New / changed admin routes
+
+- `GET /admin/evals/` remains the focused eval suite overview for runs and reports.
+- `GET /admin/evals/cases/new` shows the separate form for creating new eval cases.
+- `POST /admin/evals/cases` saves new eval cases to `tests/evals/cases/<type>.ndjson`.
+
+## UX changes
+
+- The eval suite overview gets only a compact `Eval-Case erstellen` button.
+- Report results get the `Als neuen Case vorbereiten` button.
+- For prepared cases, the new page takes over:
+  - eval type
+  - prompt
+  - history/context, if present in the report
+  - suggested assertions derived from the query, individual queries, or document IDs
+- The actual case creation lives outside the report/run overview.
+
+## Validation
+
+The following are checked on save:
+
+- CSRF token
+- `ROLE_KNOWLEDGE_ADMIN`
+- supported eval type
+- case ID unique across all eval types
+- allowed case ID format
+- non-empty prompt
+- valid Assert-JSON object
+- valid History-JSON list
+- DTO validation via `EvalCase::fromArray()`
+
+## Not changed
+
+- No retrieval logic
+- No shop-query logic
+- No follow-up logic
+- No answer-guard logic
+- No eval cases
+- No YAML or parameter change
+- No migration
+
+## Affected files
+
+- `src/Controller/Admin/AdminEvalController.php`
+- `src/Service/Admin/EvalAdminService.php`
+- `templates/admin/evals/index.html.twig`
+- `templates/admin/evals/case_new.html.twig`
+- `patch_history/RETRIEX_PATCH_101A_ADMIN_EVAL_CASE_CREATOR_PAGE_README.md`
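The field-level checks in the p101a validation list can be sketched as follows. This Python version is illustrative only and omits the CSRF, role, and DTO checks, which depend on the Symfony application. It also assumes that "letters" means ASCII letters; the README does not say whether non-ASCII letters are accepted. The function and constant names are hypothetical.

```python
import json
import re

EVAL_TYPES = {"retrieval", "shop_query", "followup", "answer_guard"}
CASE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")  # assumption: ASCII letters only


def validate_case_input(eval_type: str, case_id: str, prompt: str,
                        assert_json: str, history_json: str = "",
                        existing_ids: frozenset = frozenset()) -> list[str]:
    """Return validation errors for a new eval case; empty list means valid."""
    errors: list[str] = []
    if eval_type not in EVAL_TYPES:
        errors.append("unsupported eval type")
    if not CASE_ID_PATTERN.fullmatch(case_id):
        errors.append("case ID has an invalid format")
    elif case_id in existing_ids:
        errors.append("case ID already exists")
    if not prompt.strip():
        errors.append("prompt must not be empty")
    try:
        if not isinstance(json.loads(assert_json), dict):
            errors.append("Assert-JSON must be an object")
    except json.JSONDecodeError:
        errors.append("Assert-JSON is not valid JSON")
    if history_json.strip():  # history is optional
        try:
            if not isinstance(json.loads(history_json), list):
                errors.append("History-JSON must be an array")
        except json.JSONDecodeError:
            errors.append("History-JSON is not valid JSON")
    return errors


if __name__ == "__main__":
    print(validate_case_input("followup", "Test 1", "was kostet der indikator", "[]"))
```

Python's `json.loads` rejects comments, trailing commas, and single-quoted strings, which matches the Assert-JSON rules in the how-to.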
--- a/patch_history/RETRIEX_PATCH_101B_ADMIN_EVAL_CASE_HELP_TEXTS_README.md
+++ b/patch_history/RETRIEX_PATCH_101B_ADMIN_EVAL_CASE_HELP_TEXTS_README.md
@@ -0,0 +1,52 @@
+# RetrieX Patch p101b - Admin Eval Case Help Texts
+
+## Goal
+
+Improves the help texts on the admin page for creating new eval cases, so that less technical users also understand which values belong in which fields.
+
+## Scope
+
+Changed:
+
+- `templates/admin/evals/case_new.html.twig`
+
+New:
+
+- `patch_history/RETRIEX_PATCH_101B_ADMIN_EVAL_CASE_HELP_TEXTS_README.md`
+
+## Changes
+
+- More detailed descriptions below all input fields
+- Beginner-friendly explanation of the eval types
+- Examples of good case IDs
+- Clearer explanation of prompt vs. expected answer
+- Copy-paste examples for Assert-JSON
+- Explanation of when History-JSON is needed
+- Note that Request Context Hint can almost always stay empty
+- Additional checklist before saving
+
+## Not changed
+
+- No eval logic
+- No retrieval logic
+- No shop-query logic
+- No follow-up logic
+- No answer-guard logic
+- No existing eval cases
+- No YAML or parameter change
+- No migration
+
+## Verification
+
+After applying:
+
+```bash
+php bin/console mto:agent:config:validate
+```
+
+Then check in the admin:
+
+- `/admin/evals/cases/new`
+- Help texts are visible below all fields
+- The template prepared from a report result is still usable
+- Saving a case still works
--- a/patch_history/RETRIEX_PATCH_101C_ADMIN_EVAL_CASE_DELETE_README.md
+++ b/patch_history/RETRIEX_PATCH_101C_ADMIN_EVAL_CASE_DELETE_README.md
@@ -0,0 +1,50 @@
+# RetrieX Patch p101c - Admin Eval Case Delete
+
+## Goal
+
+Adds a safe delete function for individual eval cases to the admin eval case management.
+
+This allows incorrectly created or obsolete cases to be removed directly in the admin, without bloating the eval suite overview any further.
+
+## Scope
+
+- New POST route `admin_evals_case_delete` at `/admin/evals/cases/delete`
+- CSRF protection per eval type and case ID
+- Role check via `ROLE_KNOWLEDGE_ADMIN`
+- Removes exactly the selected case from `tests/evals/cases/<type>.ndjson`
+- Aborts without any change if the NDJSON file is invalid or the case is not found
+- Delete area on the separate case page `/admin/evals/cases/new`
+- Confirmation dialog before deleting
+- Note that the affected eval type should be run again after deleting
+
+## Not changed
+
+- No retrieval logic
+- No shop-query logic
+- No follow-up logic
+- No answer-guard logic
+- No eval assertions
+- No existing cases deleted automatically
+- No YAML or parameter change
+- No migration
+
+## Verification
+
+After applying:
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+In the admin:
+
+1. Open `/admin/evals/cases/new`.
+2. Create a test case or select an existing test case.
+3. Click `Case löschen`.
+4. Confirm the confirmation dialog.
+5. Check that the case disappears from the list.
+6. Run the affected eval type again.
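The "abort without any change" behavior described for p101c can be illustrated with a pure function that either returns the new file content or raises before anything is written. This is a Python sketch under assumptions: the case ID is assumed to be stored under an `id` key in each NDJSON line, and `remove_case` is a hypothetical name, not the method in `EvalAdminService`.

```python
import json


def remove_case(ndjson_text: str, case_id: str) -> str:
    """Return the NDJSON content without the given case.

    Raises ValueError if any line is invalid JSON or the case is missing,
    so the caller never writes a partially processed file.
    """
    kept: list[str] = []
    found = False
    for number, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            case = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid NDJSON on line {number}.") from exc
        if isinstance(case, dict) and case.get("id") == case_id:  # "id" key is assumed
            found = True
            continue
        kept.append(line)
    if not found:
        raise ValueError(f"Case {case_id!r} not found.")
    return "".join(entry + "\n" for entry in kept)


if __name__ == "__main__":
    content = '{"id": "a", "prompt": "x"}\n{"id": "b", "prompt": "y"}\n'
    print(remove_case(content, "a"), end="")
```

Because the function validates every line before returning, the original file can be left untouched whenever an exception is raised.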
--- a/patch_history/RETRIEX_PATCH_101D_ADMIN_EVAL_CASE_DELETE_HOTFIX_README.md
+++ b/patch_history/RETRIEX_PATCH_101D_ADMIN_EVAL_CASE_DELETE_HOTFIX_README.md
@@ -0,0 +1,53 @@
+# RetrieX Patch p101d - Admin Eval Case Delete Hotfix
+
+## Goal
+
+Fixes a bug from p101c where deleting an eval case could raise the following exception:
+
+```text
+Call to undefined method App\Service\Admin\EvalAdminService::normalizeExistingCaseId()
+```
+
+## Cause
+
+`EvalAdminService::deleteCase()` calls a validation helper for existing case IDs. This method was referenced in p101c but never added to the service class.
+
+## Change
+
+Adds `normalizeExistingCaseId()` to `EvalAdminService`.
+
+The method:
+
+- trims the given case ID,
+- rejects empty IDs,
+- allows only letters, digits, underscores, and hyphens,
+- returns an understandable error message for invalid IDs.
+
+## Changed files
+
+```text
+src/Service/Admin/EvalAdminService.php
+patch_history/RETRIEX_PATCH_101D_ADMIN_EVAL_CASE_DELETE_HOTFIX_README.md
+```
+
+## Not changed
+
+```text
+no eval logic
+no retrieval logic
+no shop-query logic
+no follow-up logic
+no answer-guard logic
+no YAML or parameter change
+no existing eval cases
+no migration
+```
+
+## Verification
+
+```bash
+php -l src/Service/Admin/EvalAdminService.php
+php bin/console mto:agent:config:validate
+```
+
+Then delete an eval case in the admin.
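The four behaviors listed for `normalizeExistingCaseId()` translate into very little code. The Python sketch below mirrors them for illustration; it is not the PHP method added by the hotfix. Raising `ValueError` and restricting "letters" to ASCII are assumptions.

```python
import re

_CASE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")  # assumption: ASCII letters only


def normalize_existing_case_id(raw: str) -> str:
    """Trim a case ID and reject empty or malformed values."""
    case_id = raw.strip()
    if case_id == "":
        raise ValueError("Case ID must not be empty.")
    if not _CASE_ID_PATTERN.fullmatch(case_id):
        raise ValueError(
            "Case ID may only contain letters, digits, underscores, and hyphens."
        )
    return case_id


if __name__ == "__main__":
    print(normalize_existing_case_id("  followup_main_device_price_002 "))
```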
--- a/patch_history/RETRIEX_PATCH_101_ADMIN_EVAL_CASE_CREATOR_README.md
+++ b/patch_history/RETRIEX_PATCH_101_ADMIN_EVAL_CASE_CREATOR_README.md
@@ -0,0 +1,66 @@
+# RetrieX Patch p101 - Admin Eval Case Creator
+
+## Goal
+
+p101 adds a small case creator to the existing admin eval suite, so that new regression cases can be written to the matching NDJSON files directly from the admin.
+
+The patch builds on the green p100/p100a/p100b/p100c/p100d state and does not change any production RAG, shop-query, follow-up, or answer logic.
+
+## Changes
+
+- New POST route in the admin:
+  - `/admin/evals/case/create`
+  - Route name: `admin_evals_case_create`
+- `EvalAdminService::createCase()` for validated writing of new eval cases.
+- New form on `/admin/evals/`:
+  - eval type
+  - case ID
+  - prompt
+  - Assert-JSON
+  - optional History-JSON
+  - optional Request Context Hint
+- Button per report result:
+  - `Als neuen Case vorbereiten`
+  - carries over prompt, type, history preview, query, or document ID as a template into the creator.
+- JSON and ID validation before writing.
+- Duplicate guard across all eval types.
+
+## Written files
+
+New cases are appended to the following files:
+
+- `tests/evals/cases/retrieval.ndjson`
+- `tests/evals/cases/shop_query.ndjson`
+- `tests/evals/cases/followup.ndjson`
+- `tests/evals/cases/answer_guard.ndjson`
+
+## Safety / scope
+
+Not changed:
+
+- no retrieval weights
+- no shop-query logic
+- no follow-up logic
+- no answer-guard logic
+- no prompt, YAML, or parameter change
+- no migration
+
+## Manual verification
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Zusätzlich im Admin:
+
+1. `/admin/evals/` öffnen.
+2. Einen Eval laufen lassen.
+3. Bei einem Result `Als neuen Case vorbereiten` klicken.
+4. Case-ID anpassen bzw. prüfen.
+5. Assert-JSON prüfen.
+6. Speichern.
+7. Den betroffenen Eval-Typ erneut laufen lassen.
--- a/patch_history/RETRIEX_PATCH_98_RETRIEVAL_EVAL_GREEN_BASELINE_README.md
+++ b/patch_history/RETRIEX_PATCH_98_RETRIEVAL_EVAL_GREEN_BASELINE_README.md
@@ -0,0 +1,79 @@
+# RetrieX Patch p98 - Retrieval Eval Green Baseline
+
+## Goal
+
+p98 tightens the retrieval baseline for the four most recently red eval cases without introducing new product-specific or test-case-specific PHP special-case logic.
+
+Red cases covered from `tests/evals/cases/retrieval.ndjson`:
+
+- `welcher testomat ist ein verschneideregler`
+- `welches geraet ist fuer chlorueberwachung gedacht`
+- `lieferbedingungen versand testomat`
+- `testomat 2000 th 2005 sicherheitsdatenblatt`
+
+## Changes
+
+### 1. YAML-configurable retrieval query cleanup
+
+In addition to the existing legacy stopword set, `QueryCleaner` uses a YAML cleanup profile from `retrieval.yaml`:
+
+```yaml
+query_cleanup_profile: retrieval_reference_cleanup
+```
+
+As a result, generic question words such as `welcher` and `welches` are removed through the existing cleanup profile, without writing them back into the old legacy lists.
+
+### 2. ASCII/umlaut and meaning bridges in genre enrichment
+
+`genre.yaml` adds conservative query enrichment rules for common ASCII spellings and compound search terms:
+
+- `geraet` -> `gerät analysegerät`
+- `chlorueberwachung` -> `chlor überwachung chlorüberwachung`
+- `haerteueberwachungsgeraet` -> `härteüberwachungsgerät härteüberwachung analysegerät`
+- `lieferbedingungen` -> `lieferung versand verkaufsbedingungen allgemeine lieferbedingungen`
+
+The rules stay in the genre-specific configuration section `brands_and_canonical_terms.query_enrichment_rules`.
+
+### 3. Stricter exact-title fallback for short model variants
+
+Short model or variant tokens from the retrieval vocabulary view can now count as significant in exact-title token matches.
+
+For `Testomat 2000 V`, for example, `v` now counts as a relevant title component. A query such as `testomat 2000 th 2005 sicherheitsdatenblatt` therefore no longer falls back incorrectly to `Testomat 2000 V`; instead it can proceed into the normal retrieval fusion and hit the TH 2005 safety data sheets there.
+
+### 4. Config validation and docs
+
+- `NdjsonHybridRetrieverConfig` exports `query_cleanup_profile`.
+- `RetriexEffectiveConfigProvider` validates that the profile exists.
+- `CONFIG_PARAMS.md` documents the new parameter.
+
+## Not changed
+
+- No shop query logic changed.
+- No follow-up actions changed.
+- No agent or prompt answer rules changed.
+- No Testomat-specific PHP special-case logic added.
+- No retrieval parameters such as thresholds, RRF weights, or top-k changed.
+
+## Validation in the patch build
+
+Because the local execution environment does not provide the full set of PHP extensions and vendor dependencies, the Symfony eval command could not be run here. Instead, the following checks were run:
+
+- YAML parsing for `retrieval.yaml`, `genre.yaml`, `language.yaml`
+- PHP syntax check for all changed PHP files
+- local NDJSON/lexical index simulation against the provided `knowledge.zip`
+
+The simulation shows the expected target hit among the top results for the four red baseline cases:
+
+- Verschneideregler -> `Testomat 2000 V`
+- Chlorüberwachung -> `Testomat 2000 THCL`
+- Lieferbedingungen/Versand -> `Lieferung und Versand`
+- TH 2005 Sicherheitsdatenblatt -> `Testomat 2000 Indikator TH 2005`
+
+## Recommended regression test after applying
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+```
+
+Expectation: the retrieval baseline should go from `15/19` to `19/19`. If a single semantic case still fails given the production vector/lexical index state, rebuild the knowledge index first before changing any retrieval parameters.
--- a/patch_history/RETRIEX_PATCH_99B_EVAL_SUITE_ALIGNMENT_README.md
+++ b/patch_history/RETRIEX_PATCH_99B_EVAL_SUITE_ALIGNMENT_README.md
@@ -0,0 +1,85 @@
+# RetrieX Patch p99b - Eval Suite Alignment
+
+## Goal
+
+p99 successfully activated the new eval suite, but three new cases showed red signals after the first run. p99b separates the false-positive assertion from two real robustness gaps, without reworking the existing retrieval baseline or the shop/follow-up architecture.
+
+## Starting point
+
+After p99:
+
+- `mto:agent:config:validate`: OK
+- `mto:agent:eval:run retrieval`: 19/19 OK
+- `mto:agent:eval:run shop_query`: 4/5 OK
+- `mto:agent:eval:run followup`: 3/4 OK
+- `mto:agent:eval:run answer_guard`: 3/4 OK
+
+Red cases:
+
+- `shop_query_sio2_anchor_001`: the normalized shop query could shrink down to `gerät`.
+- `followup_main_device_price_001`: the main-device follow-up could get stuck on the previous indicator query `testomat 808 indikator 300`.
+- `answer_guard_delivery_not_sdb_001`: the assertion was too strict, because the term `Sicherheitsdatenblatt` appearing in the retrieval text is not sufficient proof of a failure as long as the wrong document does not dominate.
+
+## Changes
+
+### 1. Protect SiO2/silicate as current input
+
+`config/retriex/genre.yaml`
+
+Adds the following terms to `shop_query_runtime.current_input_preservation_terms`:
+
+- `silikat`
+- `silikatüberwachung`
+- `silikatueberwachung`
+- `sio2`
+- `si o2`
+- `kieselsäure`
+- `kieselsaeure`
+
+As a result, a normalized standalone shop question such as `suche gerät kühlsysteme Silikatüberwachung` no longer loses the domain-specific measurement parameter before the generic device anchor rule `testomat 808 sio2` can apply.
+
+### 2. Main-device follow-up may remove accessory remnants
+
+`src/Agent/AgentRunner.php`
+
+`guardMainDeviceReferentialShopQueryWithHistoryModelAnchor()` was adjusted so that a shop query such as `testomat 808 indikator 300`, given a prompt such as `und was kostet das gerät selber`, is no longer accepted solely because it already contains a model anchor.
+
+The guard now checks whether accessory or code residual tokens remain in addition to the model anchor. If so, the query is reduced to the pure model anchor from the history, e.g. `testomat 808`.
+
+### 3. Answer guard case less brittle
+
+`tests/evals/cases/answer_guard.ndjson`
+
+The case `answer_guard_delivery_not_sdb_001` still checks:
+
+- the matching delivery/shipping document must be included
+- the specific SDB (safety data sheet) document must not be included
+
+The overly broad text assertion on the term `sicherheitsdatenblatt` was removed because it can also match legitimate side notes or reference texts.
+
+## Deliberately not changed
+
+- No retrieval weights
+- No Shopware search
+- No prompt texts
+- No model parameters
+- No new product-specific special-case logic
+- No change to the p98 retrieval eval cases
+
+## Expected checks
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Expectation:
+
+- Config valid
+- Retrieval 19/19
+- Shop query 5/5
+- Follow-up 4/4
+- Answer guard 4/4
--- a/patch_history/RETRIEX_PATCH_99C_MAIN_DEVICE_FOLLOWUP_EVAL_ALIGNMENT_README.md
+++ b/patch_history/RETRIEX_PATCH_99C_MAIN_DEVICE_FOLLOWUP_EVAL_ALIGNMENT_README.md
@@ -0,0 +1,60 @@
+# RETRIEX PATCH 99C - Main Device Follow-up Eval Alignment
+
+Status: patch-only follow-up for p99/p99b.
+
+## Goal
+
+Keep the new p99 follow-up eval suite aligned with the already confirmed manual
+reference flow:
+
+1. lowest water-hardness threshold
+2. indicator type
+3. indicator price
+4. main device price
+
+The main-device follow-up `und was kostet das gerät selber` must resolve back to
+the main device anchor (`testomat 808`) and must not keep accessory remnants such
+as `indikator` or exact indicator code `300`.
+
+## Root cause
+
+p99b added a residual accessory guard, but the main-device history-anchor guard
+returned early for non-generic shop queries before the residual check could run.
+A query like `testomat 808 indikator 300` contains digits, so it was not treated
+as a generic main-device query and stayed unchanged.
+
+## Change
+
+`AgentRunner::guardMainDeviceReferentialShopQueryWithHistoryModelAnchor()` now:
+
+1. detects the main-device referential prompt,
+2. extracts the latest history model anchor,
+3. if the generated shop query already contains that model anchor, checks for
+   accessory/code residuals,
+4. reduces the query to the pure model anchor when such residuals are present.
+
+This keeps explicit non-generic product queries untouched unless they contain the
+current history model anchor plus accessory leftovers in a main-device follow-up.
+
+## Expected eval result
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Expected:
+
+- retrieval: 19/19
+- shop_query: 5/5
+- followup: 4/4
+- answer_guard: 4/4
+
+## Productive logic impact
+
+Minimal. The patch only changes the already existing main-device follow-up guard
+for prompts asking for the main device itself. It does not modify retrieval,
+ranking, prompt templates, YAML vocabulary, shop result guards, or answer logic.
--- a/patch_history/RETRIEX_PATCH_99_EVAL_SUITE_EXPANSION_README.md
+++ b/patch_history/RETRIEX_PATCH_99_EVAL_SUITE_EXPANSION_README.md
@@ -0,0 +1,157 @@
+# RetrieX Patch p99 - Eval Suite Expansion
+
+## Goal
+
+p99 extends the previously retrieval-only eval baseline with additional regression types known from manual testing in v1.6.2:
+
+- shop query generation
+- follow-up resolution with chat history
+- answer/hallucination guardrails at the retrieval evidence level
+
+The patch deliberately changes no production RAG, retrieval, shop, prompt, or answer logic. It only adds eval infrastructure and eval cases.
+
+## New eval types
+
+### `shop_query`
+
+Checks the shop search query prepared by `AgentRunner` using the shop meta output. The runner stops as soon as the first shop search meta card has been produced. This exercises the query guards, the routing/history logic, and the final shop query filters without depending on the live Shopware search.
+
+Example:
+
+```bash
+php bin/console mto:agent:eval:run shop_query
+```
+
+Cases are located in:
+
+```text
+tests/evals/cases/shop_query.ndjson
+```
+
+Coverage includes, among others:
+
+- exact indicator code `Testomat 808 Indikator 300`
+- brewery/brewing-water query cleanup
+- swimming-pool typo correction
+- preservation of the LAB-CL abbreviation
+- SIO2 device anchor for silicate monitoring
+
+### `followup`
+
+Checks referential shop follow-up questions with prepared history turns. For each eval case, the history is written to an isolated temporary eval user and deleted again afterwards.
+
+Example:
+
+```bash
+php bin/console mto:agent:eval:run followup
+```
+
+Cases are located in:
+
+```text
+tests/evals/cases/followup.ndjson
+```
+
+Coverage includes, among others:
+
+- `0,02 °dH -> Testomat 808 -> Indikatortyp 300 -> was kostet der indikator`
+- switching from the indicator price back to the main device price
+- weak shop follow-up question `suche im shop nach der information` with THCL history anchor
+- product-link follow-up with individual queries instead of a combined multi-product query
+
+### `answer_guard`
+
+Checks answer guardrails before the final LLM answer, based on the retrieval evidence. This is intentionally not a generative LLM answer test but a stable pre-answer guard against wrong evidence or hallucination risks.
+
+Example:
+
+```bash
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Cases are located in:
+
+```text
+tests/evals/cases/answer_guard.ndjson
+```
+
+Coverage includes, among others:
+
+- noise prompt without evidence
+- fantasy media such as Drachenblut (dragon's blood) / Mondwasser (moon water)
+- delivery terms must not tip over into safety data sheets
+
+## New assertion fields
+
+### For `shop_query` and `followup`
+
+```json
+{
+  "expected_query": "testomat 808 300 indikator",
+  "must_include_terms": ["testomat", "808", "300", "indikator"],
+  "must_not_include_terms": ["300 s", "301", "302"],
+  "must_not_equal_query": "information"
+}
+```
+
+For multi-product follow-ups:
+
+```json
+{
+  "expected_individual_queries": [
+    "testomat 2000 self clean",
+    "testomat 2000 cal",
+    "testomat 808"
+  ],
+  "expected_individual_queries_exact": true,
+  "min_individual_queries": 3,
+  "max_individual_queries": 3
+}
+```
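+
+As a sketch of how these fields sit inside a complete case: `EvalCase` reads `id`, `type`, `prompt`, and `assert`, plus the optional `history` (entries with `prompt` and `answer`) and `request_context_hint`. The case ID and the history texts below are hypothetical and only illustrate the shape; the expected query follows the main-device follow-up described in p99c. The block is pretty-printed for readability, whereas in the `.ndjson` file each case presumably occupies a single line.
+
+```json
+{
+  "id": "followup_example_hypothetical_001",
+  "type": "followup",
+  "prompt": "und was kostet das gerät selber",
+  "history": [
+    {
+      "prompt": "was kostet der indikator",
+      "answer": "Hypothetical answer text mentioning Testomat 808 and indicator type 300."
+    }
+  ],
+  "assert": {
+    "expected_query": "testomat 808",
+    "must_not_include_terms": ["indikator", "300"]
+  }
+}
+```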
+
+### For `retrieval` and `answer_guard`
+
+`RetrievalDebugRunner` additionally supports:
+
+```json
+{
+  "must_not_include_terms": ["sicherheitsdatenblatt"],
+  "must_not_match_patterns": ["/forbidden/u"]
+}
+```
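+
+A minimal sketch of a complete case using these fields, assuming the same top-level layout that `EvalCase` reads (`id`, `type`, `prompt`, `assert`). The case ID and the regular expression are hypothetical; the prompt is taken from the p98 retrieval baseline. Since p99b found a broad term assertion too brittle, a pattern like this should be kept as narrow as the case allows.
+
+```json
+{
+  "id": "retrieval_example_hypothetical_001",
+  "type": "retrieval",
+  "prompt": "lieferbedingungen versand testomat",
+  "assert": {
+    "must_not_match_patterns": ["/th\\s*2005/iu"]
+  }
+}
+```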
+
+## Changed files
+
+```text
+src/Command/AgentEvalRunCommand.php
+src/Eval/AgentEvalRunner.php
+src/Eval/AnswerGuardEvalRunner.php
+src/Eval/Dto/EvalCase.php
+src/Eval/RetrievalDebugRunner.php
+src/Eval/ShopQueryEvalRunner.php
+tests/evals/cases/answer_guard.ndjson
+tests/evals/cases/followup.ndjson
+tests/evals/cases/shop_query.ndjson
+patch_history/RETRIEX_PATCH_99_EVAL_SUITE_EXPANSION_README.md
+```
+
+## Not changed
+
+- No retrieval weights changed.
+- No production shop query logic changed.
+- No prompt rules changed.
+- No YAML vocabulary rules changed.
+- No LLM or model parameters changed.
+- No admin or frontend logic changed.
+
+## Recommended validation after applying
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Important: `shop_query` and `followup` run through `AgentRunner` up to the shop meta card. They stop before the live shop search, but depending on the active configuration they may still attempt input normalization or shop query optimization via the configured LLM. If the LLM is unreachable, the agent's existing fallback logic applies.
--- a/src/Agent/AgentRunner.php
+++ b/src/Agent/AgentRunner.php
@@ -4155,7 +4155,6 @@ final readonly class AgentRunner
            $shopSearchQuery === ''
            || trim($commerceHistoryContext) === ''
            || $this->referenceAnchorExtractor->extractFirstProductModelAnchor($prompt) !== ''
-            || $this->referenceAnchorExtractor->extractFirstProductModelAnchor($shopSearchQuery) !== ''
        ) {
            return $shopSearchQuery;
        }
@@ -4164,10 +4163,6 @@ final readonly class AgentRunner
            return $shopSearchQuery;
        }

-        if (!$this->isGenericMainDeviceReferentialShopQuery($shopSearchQuery)) {
-            return $shopSearchQuery;
-        }
-
        $modelAnchor = $this->normalizeShopQueryAnchor(
            $this->extractLatestHistoryProductModelAnchor($commerceHistoryContext)
        );
@@ -4176,9 +4171,43 @@ final readonly class AgentRunner
            return $shopSearchQuery;
        }

-        return $this->queryAlreadyContainsAllAnchorTokens($shopSearchQuery, $modelAnchor)
-            ? $shopSearchQuery
-            : $modelAnchor;
+        if ($this->queryAlreadyContainsAllAnchorTokens($shopSearchQuery, $modelAnchor)) {
+            return $this->containsMainDeviceFollowUpAccessoryResidual($shopSearchQuery, $modelAnchor)
+                ? $modelAnchor
+                : $shopSearchQuery;
+        }
+
+        if (!$this->isGenericMainDeviceReferentialShopQuery($shopSearchQuery)) {
+            return $shopSearchQuery;
+        }
+
+        return $modelAnchor;
+    }
+
+    private function containsMainDeviceFollowUpAccessoryResidual(string $shopSearchQuery, string $modelAnchor): bool
+    {
+        $queryTokens = $this->tokenizeShopQueryCandidate($shopSearchQuery);
+        if ($queryTokens === []) {
+            return false;
+        }
+
+        $modelTokens = array_fill_keys($this->tokenizeShopQueryCandidate($modelAnchor), true);
+        $accessoryTokens = $this->buildShopQueryTokenSet($this->mergeUniqueStrings(
+            $this->agentRunnerConfig->getNoLlmAccessoryProductRoleKeywords(),
+            $this->agentRunnerConfig->getRequestedAccessoryCodeTerms()
+        ));
+
+        foreach ($queryTokens as $token) {
+            if (isset($modelTokens[$token])) {
+                continue;
+            }
+
+            if (isset($accessoryTokens[$token]) || preg_match('/^\d{1,5}$/u', $token) === 1) {
+                return true;
+            }
+        }
+
+        return false;
    }

    private function guardWeakReferentialShopQueryWithHistoryModelAnchor(
--- a/src/Command/AgentEvalRunCommand.php
+++ b/src/Command/AgentEvalRunCommand.php
@@ -37,7 +37,7 @@ final class AgentEvalRunCommand extends Command
            ->addArgument(
                'type',
                InputArgument::OPTIONAL,
-                'Eval type to run',
+                'Eval type to run (retrieval, shop_query, followup, answer_guard)',
                'retrieval'
            )
            ->addOption(
--- a/src/Config/NdjsonHybridRetrieverConfig.php
+++ b/src/Config/NdjsonHybridRetrieverConfig.php
@@ -118,6 +118,11 @@ final class NdjsonHybridRetrieverConfig
        return $this->requiredInt('exact_document_max_chunks', 1);
    }

+    public function queryCleanupProfile(): string
+    {
+        return $this->requiredString('query_cleanup_profile');
+    }
+
    public function focusedProductWindow(): int
    {
        return $this->requiredInt('focused_product_window', 1);
@@ -350,6 +355,7 @@ final class NdjsonHybridRetrieverConfig
            'dominant_doc_min_hits' => $this->dominantDocMinHits(),
            'dominant_doc_max_chunks' => $this->dominantDocMaxChunks(),
            'exact_document_max_chunks' => $this->exactDocumentMaxChunks(),
+            'query_cleanup_profile' => $this->queryCleanupProfile(),
            'focused_product_window' => $this->focusedProductWindow(),
            'focused_product_min_score' => $this->focusedProductMinScore(),
            'focused_product_min_gap' => $this->focusedProductMinGap(),
--- a/src/Config/RetriexEffectiveConfigProvider.php
+++ b/src/Config/RetriexEffectiveConfigProvider.php
@@ -49,7 +49,6 @@ final readonly class RetriexEffectiveConfigProvider
            'llm' => [
                'timeout_seconds' => $this->param('retriex.llm.timeout_seconds'),
                'num_predict' => $this->param('retriex.llm.num_predict'),
-                'call_models' => $this->param('retriex.llm.call_models'),
            ],
            'retrieval' => $this->retrievalConfig(),
            'prompt' => $this->promptConfig(),
@@ -86,7 +85,6 @@ final readonly class RetriexEffectiveConfigProvider
        $this->validateRuntime($config['runtime'], $errors, $warnings);
        $this->validateIndex($config['index'], $errors, $warnings);
        $this->validateModel($config['model_generation'], $errors, $warnings);
-        $this->validateLlm($config['llm'], $errors, $warnings);
        $this->validateRetrieval($config['retrieval'], $errors, $warnings);
        $this->validatePrompt($config['prompt'], $errors, $warnings);
        $this->validateAgent($config['agent'], $errors, $warnings);
@@ -1716,46 +1714,6 @@ final readonly class RetriexEffectiveConfigProvider
        }
    }

-    /**
-     * @param array<string, mixed> $llm
-     * @param list<string> $errors
-     * @param list<string> $warnings
-     */
-    private function validateLlm(array $llm, array &$errors, array &$warnings): void
-    {
-        $callModels = $llm['call_models'] ?? [];
-        if (!is_array($callModels)) {
-            $errors[] = 'llm.call_models must be a map.';
-            return;
-        }
-
-        $knownCalls = [
-            'input_normalization',
-            'shop_query_optimization',
-            'final_answer',
-        ];
-
-        foreach ($callModels as $callName => $modelName) {
-            if (!is_string($callName) || trim($callName) === '') {
-                $errors[] = 'llm.call_models contains an invalid call name.';
-                continue;
-            }
-
-            if (!in_array($callName, $knownCalls, true)) {
-                $warnings[] = 'llm.call_models contains an unknown call name: ' . $callName . '.';
-            }
-
-            if ($modelName !== null && !is_string($modelName)) {
-                $errors[] = 'llm.call_models.' . $callName . ' must be null or a string model name.';
-                continue;
-            }
-
-            if (is_string($modelName) && trim($modelName) === '') {
-                $warnings[] = 'llm.call_models.' . $callName . ' is empty and will use the default model.';
-            }
-        }
-    }
-
    /**
     * @param array<string, mixed> $retrieval
     * @param list<string> $errors
@@ -1782,6 +1740,13 @@ final readonly class RetriexEffectiveConfigProvider
            $errors[] = 'retrieval.generic_exact_selection_cleanup_profile references unknown language cleanup profile: ' . trim($cleanupProfile) . '.';
        }

+        $queryCleanupProfile = $retrieval['query_cleanup_profile'] ?? null;
+        if (!is_string($queryCleanupProfile) || trim($queryCleanupProfile) === '') {
+            $errors[] = 'retrieval.query_cleanup_profile must be a non-empty string.';
+        } elseif (!in_array(trim($queryCleanupProfile), $this->languageCleanupConfig->getCleanupProfileNames(), true)) {
+            $errors[] = 'retrieval.query_cleanup_profile references unknown language cleanup profile: ' . trim($queryCleanupProfile) . '.';
+        }
+
        $this->validateStringListMap($retrieval['vocabulary'] ?? [], 'retrieval.vocabulary', $errors, $warnings);

        $inventory = $retrieval['inventory_parameter'] ?? [];
--- a/src/Controller/Admin/AdminEvalController.php
+++ b/src/Controller/Admin/AdminEvalController.php
@@ -0,0 +1,192 @@
+<?php
+
+declare(strict_types=1);
+
+namespace App\Controller\Admin;
+
+use App\Security\ApplicationRoles;
+use App\Service\Admin\EvalAdminService;
+use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
+use Symfony\Component\HttpFoundation\Request;
+use Symfony\Component\HttpFoundation\Response;
+use Symfony\Component\Routing\Attribute\Route;
+
+#[Route('/admin/evals')]
+final class AdminEvalController extends AbstractController
+{
+    #[Route('/', name: 'admin_evals_index', methods: ['GET'])]
+    public function index(Request $request, EvalAdminService $evals): Response
+    {
+        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
+
+        $selectedType = trim((string) $request->query->get('type', ''));
+        if ($selectedType === '' || !in_array($selectedType, $evals->supportedTypeNames(), true)) {
+            $selectedType = 'retrieval';
+        }
+
+        return $this->render('admin/evals/index.html.twig', [
+            'types' => $evals->supportedTypes(),
+            'overview' => $evals->overview(),
+            'cases_by_type' => $evals->casesByType(),
+            'selected_type' => $selectedType,
+            'selected_report' => $evals->readTypeReport($selectedType),
+            'last_report' => $evals->readLastReport(),
+        ]);
+    }
+
+    #[Route('/run', name: 'admin_evals_run', methods: ['POST'])]
+    public function run(Request $request, EvalAdminService $evals): Response
+    {
+        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
+
+        if (!$this->isCsrfTokenValid('admin_eval_run', (string) $request->request->get('_token'))) {
+            throw $this->createAccessDeniedException();
+        }
+
+        $type = trim((string) $request->request->get('type', 'retrieval'));
+        $caseId = trim((string) $request->request->get('case_id', ''));
+
+        try {
+            $report = $evals->run($type, $caseId !== '' ? $caseId : null);
+            $type = trim((string) ($report['type'] ?? $type));
+
+            $this->addFlash(
+                ((int) ($report['failed'] ?? 0)) === 0 ? 'success' : 'danger',
+                sprintf(
+                    'Eval %s abgeschlossen: %d/%d bestanden.',
+                    $type,
+                    (int) ($report['passed'] ?? 0),
+                    (int) ($report['total'] ?? 0)
+                )
+            );
+        } catch (\Throwable $e) {
+            $this->addFlash('danger', $e->getMessage());
+        }
+
+        return $this->redirectToRoute('admin_evals_index', [
+            'type' => $type,
+        ]);
+    }
+
+    #[Route('/cases/new', name: 'admin_evals_case_new', methods: ['GET'])]
+    public function newCase(Request $request, EvalAdminService $evals): Response
+    {
+        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
+
+        $type = trim((string) $request->query->get('type', 'retrieval'));
+        if (!in_array($type, $evals->supportedTypeNames(), true)) {
+            $type = 'retrieval';
+        }
+
+        $sourceType = trim((string) $request->query->get('source_type', ''));
+        $sourceCaseId = trim((string) $request->query->get('source_case_id', ''));
+
+        try {
+            $draft = $sourceType !== '' && $sourceCaseId !== ''
+                ? $evals->caseDraftFromReportResult($sourceType, $sourceCaseId)
+                : $evals->emptyCaseDraft($type);
+        } catch (\Throwable $e) {
+            $this->addFlash('warning', $e->getMessage());
+            $draft = $evals->emptyCaseDraft($type);
+        }
+
+        return $this->render('admin/evals/case_new.html.twig', [
+            'types' => $evals->supportedTypes(),
+            'cases_by_type' => $evals->casesByType(),
+            'case_draft' => $draft,
+        ]);
+    }
+
+    #[Route('/cases', name: 'admin_evals_case_create', methods: ['POST'])]
+    public function createCase(Request $request, EvalAdminService $evals): Response
+    {
+        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
+
+        if (!$this->isCsrfTokenValid('admin_eval_case_create', (string) $request->request->get('_token'))) {
+            throw $this->createAccessDeniedException();
+        }
+
+        $type = trim((string) $request->request->get('type', 'retrieval'));
+        $draft = [
+            'type' => $type,
+            'id' => (string) $request->request->get('id', ''),
+            'prompt' => (string) $request->request->get('prompt', ''),
+            'assert_json' => (string) $request->request->get('assert_json', ''),
+            'history_json' => (string) $request->request->get('history_json', ''),
+            'request_context_hint' => (string) $request->request->get('request_context_hint', ''),
+            'source_label' => '',
+        ];
+
+        try {
+            $created = $evals->createCase(
+                type: $type,
+                id: (string) $request->request->get('id', ''),
+                prompt: (string) $request->request->get('prompt', ''),
+                assertJson: (string) $request->request->get('assert_json', ''),
+                historyJson: (string) $request->request->get('history_json', ''),
+                requestContextHint: (string) $request->request->get('request_context_hint', ''),
+            );
+
+            $type = (string) ($created['type'] ?? $type);
+
+            $this->addFlash(
+                'success',
+                sprintf('Eval-Case "%s" wurde in %s.ndjson gespeichert.', (string) ($created['id'] ?? ''), $type)
+            );
+
+            return $this->redirectToRoute('admin_evals_index', [
+                'type' => $type,
+            ]);
+        } catch (\Throwable $e) {
+            $this->addFlash('danger', $e->getMessage());
+        }
+
+        if (!in_array($type, $evals->supportedTypeNames(), true)) {
+            $draft['type'] = 'retrieval';
+        }
+
+        return $this->render('admin/evals/case_new.html.twig', [
+            'types' => $evals->supportedTypes(),
+            'cases_by_type' => $evals->casesByType(),
+            'case_draft' => $draft,
+        ], new Response('', Response::HTTP_UNPROCESSABLE_ENTITY));
+    }
+
+
+    #[Route('/cases/delete', name: 'admin_evals_case_delete', methods: ['POST'])]
+    public function deleteCase(Request $request, EvalAdminService $evals): Response
+    {
+        $this->denyAccessUnlessGranted(ApplicationRoles::ROLE_KNOWLEDGE_ADMIN);
+
+        $type = trim((string) $request->request->get('type', 'retrieval'));
+        $caseId = trim((string) $request->request->get('case_id', ''));
+
+        if (!$this->isCsrfTokenValid(
+            sprintf('admin_eval_case_delete_%s_%s', $type, $caseId),
+            (string) $request->request->get('_token')
+        )) {
+            throw $this->createAccessDeniedException();
+        }
+
+        try {
+            $deleted = $evals->deleteCase($type, $caseId);
+            $type = (string) ($deleted['type'] ?? $type);
+
+            $this->addFlash(
+                'success',
+                sprintf('Eval-Case "%s" wurde aus %s.ndjson entfernt.', (string) ($deleted['id'] ?? $caseId), $type)
+            );
+        } catch (\Throwable $e) {
+            $this->addFlash('danger', $e->getMessage());
+        }
+
+        if (!in_array($type, $evals->supportedTypeNames(), true)) {
+            $type = 'retrieval';
+        }
+
+        return $this->redirectToRoute('admin_evals_case_new', [
+            'type' => $type,
+        ]);
+    }
+
+}
--- a/src/Eval/AgentEvalRunner.php
+++ b/src/Eval/AgentEvalRunner.php
@@ -11,6 +11,8 @@ final readonly class AgentEvalRunner
 {
    public function __construct(
        private RetrievalDebugRunner $retrievalDebugRunner,
+        private ShopQueryEvalRunner $shopQueryEvalRunner,
+        private AnswerGuardEvalRunner $answerGuardEvalRunner,
    ) {
    }

@@ -20,6 +22,14 @@ final readonly class AgentEvalRunner
            return $this->retrievalDebugRunner->run($case);
        }

+        if ($case->isShopQueryCase() || $case->isFollowUpCase()) {
+            return $this->shopQueryEvalRunner->run($case);
+        }
+
+        if ($case->isAnswerGuardCase()) {
+            return $this->answerGuardEvalRunner->run($case);
+        }
+
        throw new \InvalidArgumentException(sprintf(
            'Unsupported eval case type: %s',
            $case->type
@@ -40,4 +50,4 @@ final readonly class AgentEvalRunner

        return $results;
    }
-}
+}
--- a/src/Eval/AnswerGuardEvalRunner.php
+++ b/src/Eval/AnswerGuardEvalRunner.php
@@ -0,0 +1,32 @@
+<?php
+
+declare(strict_types=1);
+
+namespace App\Eval;
+
+use App\Eval\Dto\EvalCase;
+use App\Eval\Dto\EvalResult;
+
+final readonly class AnswerGuardEvalRunner
+{
+    public function __construct(
+        private RetrievalDebugRunner $retrievalDebugRunner,
+    ) {
+    }
+
+    public function run(EvalCase $case): EvalResult
+    {
+        $result = $this->retrievalDebugRunner->run($case);
+        $details = $result->details;
+        $details['guard_scope'] = 'retrieval_evidence_pre_answer';
+
+        return new EvalResult(
+            caseId: $result->caseId,
+            type: $case->type,
+            passed: $result->passed,
+            durationMs: $result->durationMs,
+            failures: $result->failures,
+            details: $details,
+        );
+    }
+}
--- a/src/Eval/Dto/EvalCase.php
+++ b/src/Eval/Dto/EvalCase.php
@@ -8,12 +8,15 @@ final readonly class EvalCase
 {
    /**
     * @param array<string, mixed> $assert
+     * @param array<int, array{prompt:string,answer:string}> $history
     */
    public function __construct(
        public string $id,
        public string $type,
        public string $prompt,
        public array $assert = [],
+        public array $history = [],
+        public string $requestContextHint = '',
    ) {
    }

@@ -26,6 +29,8 @@ final readonly class EvalCase
        $type = trim((string) ($row['type'] ?? ''));
        $prompt = trim((string) ($row['prompt'] ?? ''));
        $assert = is_array($row['assert'] ?? null) ? $row['assert'] : [];
+        $history = self::normalizeHistory($row['history'] ?? []);
+        $requestContextHint = trim((string) ($row['request_context_hint'] ?? ''));

        if ($id === '') {
            throw new \InvalidArgumentException('Eval case id must not be empty.');
@@ -50,6 +55,8 @@ final readonly class EvalCase
            type: $type,
            prompt: $prompt,
            assert: $assert,
+            history: $history,
+            requestContextHint: $requestContextHint,
        );
    }

@@ -57,4 +64,64 @@ final readonly class EvalCase
    {
        return $this->type === 'retrieval';
    }
-}
+
+    public function isShopQueryCase(): bool
+    {
+        return $this->type === 'shop_query';
+    }
+
+    public function isFollowUpCase(): bool
+    {
+        return $this->type === 'followup';
+    }
+
+    public function isAnswerGuardCase(): bool
+    {
+        return $this->type === 'answer_guard';
+    }
+
+    /**
+     * @return array<int, array{prompt:string,answer:string}>
+     */
+    private static function normalizeHistory(mixed $value): array
+    {
+        if (!is_array($value)) {
+            return [];
+        }
+
+        $history = [];
+
+        foreach ($value as $entry) {
+            if (is_string($entry)) {
+                $entry = trim($entry);
+
+                if ($entry !== '') {
+                    $history[] = [
+                        'prompt' => 'Eval-Kontext',
+                        'answer' => $entry,
+                    ];
+                }
+
+                continue;
+            }
+
+            if (!is_array($entry)) {
+                continue;
+            }
+
+            $prompt = trim((string) ($entry['prompt'] ?? ''));
+            $answer = trim((string) ($entry['answer'] ?? $entry['response'] ?? ''));
+
+            if ($prompt === '' && $answer === '') {
+                continue;
+            }
+
+            $history[] = [
+                'prompt' => $prompt !== '' ? $prompt : 'Eval-Kontext',
+                'answer' => $answer,
+            ];
+        }
+
+        return $history;
+    }
+}
--- a/src/Eval/RetrievalDebugRunner.php
+++ b/src/Eval/RetrievalDebugRunner.php
@@ -33,6 +33,8 @@ final readonly class RetrievalDebugRunner

        $documentIds = $this->extractUniqueStringValues($rows, 'document_id');
        $chunkIds = $this->extractUniqueStringValues($rows, 'chunk_id');
+        $documentRefs = $this->buildDocumentRefs($rows);
+        $resultRows = $this->buildResultRows($rows);
        $joinedText = $this->extractJoinedText($rows);

        $assert = $case->assert;
@@ -187,6 +189,25 @@ final readonly class RetrievalDebugRunner
            }
        }

+        $forbiddenTerms = $this->normalizeStringList($assert['must_not_include_terms'] ?? []);
+        foreach ($forbiddenTerms as $forbiddenTerm) {
+            if ($this->containsTerm($joinedText, $forbiddenTerm)) {
+                $failures[] = sprintf(
+                    'forbidden term "%s" was present in the retrieval text.',
+                    $forbiddenTerm
+                );
+            }
+        }
+
+        foreach ($this->normalizeStringList($assert['must_not_match_patterns'] ?? []) as $pattern) {
+            if (@preg_match($pattern, $joinedText) === 1) {
+                $failures[] = sprintf(
+                    'forbidden pattern "%s" matched the retrieval text.',
+                    $pattern
+                );
+            }
+        }
+
        return new EvalResult(
            caseId: $case->id,
            type: $case->type,
@@ -201,8 +222,11 @@ final readonly class RetrievalDebugRunner
                'intent' => $intent,
                'document_ids' => $documentIds,
                'chunk_ids' => $chunkIds,
+                'document_refs' => $documentRefs,
+                'result_rows' => $resultRows,
                'matched_any_terms' => $matchedAnyTerms,
                'matched_all_terms' => $matchedAllTerms,
+                'forbidden_terms_checked' => $this->normalizeStringList($assert['must_not_include_terms'] ?? []),
            ],
        );
    }
@@ -248,6 +272,122 @@ final readonly class RetrievalDebugRunner
        return array_keys($values);
    }

+    /**
+     * @param array<int, array<string, mixed>> $rows
+     * @return array<int, array{id:string,title:string,file_path:string,version_number:string,chunk_ids:array<int,string>,ranks:array<int,int>}>
+     */
+    private function buildDocumentRefs(array $rows): array
+    {
+        $refs = [];
+
+        foreach ($rows as $row) {
+            $documentId = $this->extractNullableString($row, 'document_id');
+
+            if ($documentId === '') {
+                continue;
+            }
+
+            if (!isset($refs[$documentId])) {
+                $refs[$documentId] = [
+                    'id' => $documentId,
+                    'title' => $this->extractNullableString($row, 'document_title'),
+                    'file_path' => $this->extractNullableString($row, 'file_path'),
+                    'version_number' => $this->extractNullableString($row, 'version_number'),
+                    'chunk_ids' => [],
+                    'ranks' => [],
+                ];
+            }
+
+            $chunkId = $this->extractNullableString($row, 'chunk_id');
+            if ($chunkId !== '' && !in_array($chunkId, $refs[$documentId]['chunk_ids'], true)) {
+                $refs[$documentId]['chunk_ids'][] = $chunkId;
+            }
+
+            $rank = $this->extractNullableInt($row, 'rank');
+            if ($rank !== null && !in_array($rank, $refs[$documentId]['ranks'], true)) {
+                $refs[$documentId]['ranks'][] = $rank;
+            }
+        }
+
+        return array_values($refs);
+    }
+
+    /**
+     * @param array<int, array<string, mixed>> $rows
+     * @return array<int, array<string, mixed>>
+     */
+    private function buildResultRows(array $rows): array
+    {
+        $out = [];
+
+        foreach ($rows as $row) {
+            $out[] = [
+                'rank' => $this->extractNullableInt($row, 'rank'),
+                'document_id' => $this->extractNullableString($row, 'document_id'),
+                'document_title' => $this->extractNullableString($row, 'document_title'),
+                'file_path' => $this->extractNullableString($row, 'file_path'),
+                'chunk_id' => $this->extractNullableString($row, 'chunk_id'),
+                'chunk_index' => $this->extractNullableInt($row, 'chunk_index'),
+                'raw_score' => $row['raw_score'] ?? null,
+                'rrf_score' => $row['rrf_score'] ?? null,
+                'text_preview' => $this->previewText($this->extractNullableString($row, 'text')),
+            ];
+        }
+
+        return $out;
+    }
+
+    /**
+     * @param array<string, mixed> $row
+     */
+    private function extractNullableString(array $row, string $key): string
+    {
+        $value = $row[$key] ?? null;
+
+        if ($value === null || is_array($value) || is_object($value)) {
+            return '';
+        }
+
+        return trim((string)$value);
+    }
+
+    /**
+     * @param array<string, mixed> $row
+     */
+    private function extractNullableInt(array $row, string $key): ?int
+    {
+        $value = $row[$key] ?? null;
+
+        if ($value === null || $value === '') {
+            return null;
+        }
+
+        if (is_int($value)) {
+            return $value;
+        }
+
+        if (is_string($value) && preg_match('/^-?\d+$/', trim($value)) === 1) {
+            return (int) trim($value);
+        }
+
+        return null;
+    }
+
+    private function previewText(string $text, int $limit = 240): string
+    {
+        $text = preg_replace('/\s+/u', ' ', trim($text)) ?? trim($text);
+
+        if ($text === '') {
+            return '';
+        }
+
+        if (mb_strlen($text, 'UTF-8') <= $limit) {
+            return $text;
+        }
+
+        return mb_substr($text, 0, $limit, 'UTF-8') . '...';
+    }
+
    /**
     * @param array<int, array<string, mixed>> $rows
     */
--- a/src/Eval/ShopQueryEvalRunner.php
+++ b/src/Eval/ShopQueryEvalRunner.php
@@ -0,0 +1,389 @@
+<?php
+
+declare(strict_types=1);
+
+namespace App\Eval;
+
+use App\Agent\AgentRunner;
+use App\Context\ContextService;
+use App\Eval\Dto\EvalCase;
+use App\Eval\Dto\EvalResult;
+
+final readonly class ShopQueryEvalRunner
+{
+    public function __construct(
+        private AgentRunner $agentRunner,
+        private ContextService $contextService,
+    ) {
+    }
+
+    public function run(EvalCase $case): EvalResult
+    {
+        $start = microtime(true);
+        $failures = [];
+        $userId = $this->buildUserId($case);
+        $transcript = '';
+        $shopMeta = null;
+
+        $this->contextService->deleteHistory($userId);
+        $this->seedHistory($userId, $case->history);
+
+        try {
+            foreach ($this->agentRunner->run($case->prompt, $userId, false, $case->requestContextHint) as $chunk) {
+                if (!is_string($chunk) || $chunk === '') {
+                    continue;
+                }
+
+                $transcript .= $chunk . "\n";
+
+                if (!str_contains($chunk, 'retriex-shop-meta')) {
+                    if (mb_strlen($transcript, 'UTF-8') > 120000) {
+                        $transcript = mb_substr($transcript, -120000, null, 'UTF-8');
+                    }
+                    continue;
+                }
+
+                $shopMeta = $this->extractShopMeta($chunk);
+                break;
+            }
+        } catch (\Throwable $e) {
+            $failures[] = sprintf('agent run failed before shop-query meta was emitted: %s', $e->getMessage());
+        } finally {
+            $this->contextService->deleteHistory($userId);
+        }
+
+        $durationMs = round((microtime(true) - $start) * 1000, 2);
+
+        if ($shopMeta === null) {
+            $failures[] = 'no shop-query meta message was emitted before the runner stopped.';
+            $shopMeta = [
+                'query' => '',
+                'individual_queries' => [],
+                'raw_html' => '',
+            ];
+        }
+
+        $this->assertShopQuery($failures, $case, $shopMeta);
+
+        return new EvalResult(
+            caseId: $case->id,
+            type: $case->type,
+            passed: $failures === [],
+            durationMs: $durationMs,
+            failures: $failures,
+            details: [
+                'prompt' => $case->prompt,
+                'history_turns' => count($case->history),
+                'history' => $this->buildHistoryPreview($case->history),
+                'has_request_context_hint' => $case->requestContextHint !== '',
+                'query' => $shopMeta['query'],
+                'individual_queries' => $shopMeta['individual_queries'],
+                'transcript_preview' => $this->previewText($transcript),
+            ],
+        );
+    }
+
+    /**
+     * @param array<int, array{prompt:string,answer:string}> $history
+     * @return array<int, array{prompt:string,answer_preview:string}>
+     */
+    private function buildHistoryPreview(array $history): array
+    {
+        $preview = [];
+
+        foreach ($history as $turn) {
+            $prompt = trim((string) ($turn['prompt'] ?? ''));
+            $answer = trim((string) ($turn['answer'] ?? ''));
+
+            if ($prompt === '' && $answer === '') {
+                continue;
+            }
+
+            $preview[] = [
+                'prompt' => $prompt !== '' ? $prompt : 'Eval-Kontext',
+                'answer_preview' => $this->previewText($answer, 260),
+            ];
+        }
+
+        return $preview;
+    }
+
+    private function buildUserId(EvalCase $case): string
+    {
+        $safeId = preg_replace('/[^a-zA-Z0-9_-]+/', '_', $case->id) ?? $case->id;
+        $safeId = trim($safeId, '_');
+
+        return 'eval_' . ($safeId !== '' ? $safeId : sha1($case->id));
+    }
+
+    /**
+     * @param array<int, array{prompt:string,answer:string}> $history
+     */
+    private function seedHistory(string $userId, array $history): void
+    {
+        foreach ($history as $turn) {
+            $prompt = trim($turn['prompt'] ?? '');
+            $answer = trim($turn['answer'] ?? '');
+
+            if ($prompt === '' && $answer === '') {
+                continue;
+            }
+
+            if ($prompt === '') {
+                $prompt = 'Eval-Kontext';
+            }
+
+            $this->contextService->appendHistory($userId, $prompt, $answer);
+        }
+    }
+
+    /**
+     * @return array{query:string,individual_queries:array<int,string>,raw_html:string}
+     */
+    private function extractShopMeta(string $html): array
+    {
+        $isMultiQuery = str_contains($html, 'retriex-meta-query--multi');
+        $codes = [];
+
+        if (preg_match_all('/<code>(.*?)<\/code>/su', $html, $matches) !== false) {
+            foreach ($matches[1] ?? [] as $value) {
+                $decoded = html_entity_decode(strip_tags((string) $value), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
+                $decoded = $this->normalizeOneLine($decoded);
+
+                if ($decoded !== '') {
+                    $codes[] = $decoded;
+                }
+            }
+        }
+
+        $codes = array_values(array_unique($codes));
+
+        if ($isMultiQuery) {
+            return [
+                'query' => '',
+                'individual_queries' => $codes,
+                'raw_html' => $html,
+            ];
+        }
+
+        return [
+            'query' => $codes[0] ?? '',
+            'individual_queries' => [],
+            'raw_html' => $html,
+        ];
+    }
+
+    /**
+     * @param array<int, string> $failures
+     * @param array{query:string,individual_queries:array<int,string>,raw_html:string} $shopMeta
+     */
+    private function assertShopQuery(array &$failures, EvalCase $case, array $shopMeta): void
+    {
+        $assert = $case->assert;
+        $query = $shopMeta['query'];
+        $individualQueries = $shopMeta['individual_queries'];
+        $joined = trim($query . ' ' . implode(' ', $individualQueries));
+
+        $expectedQuery = $this->stringOrNull($assert['expected_query'] ?? null);
+        if ($expectedQuery !== null && $this->normalizeQuery($query) !== $this->normalizeQuery($expectedQuery)) {
+            $failures[] = sprintf(
+                'shop query mismatch: expected "%s", got "%s".',
+                $expectedQuery,
+                $query
+            );
+        }
+
+        $forbiddenExactQuery = $this->stringOrNull($assert['must_not_equal_query'] ?? null);
+        if ($forbiddenExactQuery !== null && $this->normalizeQuery($query) === $this->normalizeQuery($forbiddenExactQuery)) {
+            $failures[] = sprintf('shop query must not equal "%s".', $forbiddenExactQuery);
+        }
+
+        $expectedIndividualQueries = $this->normalizeStringList($assert['expected_individual_queries'] ?? []);
+        if ($expectedIndividualQueries !== []) {
+            foreach ($expectedIndividualQueries as $expectedIndividualQuery) {
+                if (!$this->containsNormalizedQuery($individualQueries, $expectedIndividualQuery)) {
+                    $failures[] = sprintf(
+                        'missing expected individual shop query "%s". Got [%s].',
+                        $expectedIndividualQuery,
+                        implode(', ', $individualQueries)
+                    );
+                }
+            }
+        }
+
+        if (($assert['expected_individual_queries_exact'] ?? false) === true) {
+            $expected = array_map(fn(string $value): string => $this->normalizeQuery($value), $expectedIndividualQueries);
+            $actual = array_map(fn(string $value): string => $this->normalizeQuery($value), $individualQueries);
+
+            sort($expected);
+            sort($actual);
+
+            if ($expected !== $actual) {
+                $failures[] = sprintf(
+                    'individual shop queries differ from expected exact set. Expected [%s], got [%s].',
+                    implode(', ', $expectedIndividualQueries),
+                    implode(', ', $individualQueries)
+                );
+            }
+        }
+
+        if (isset($assert['min_individual_queries']) && count($individualQueries) < (int) $assert['min_individual_queries']) {
+            $failures[] = sprintf(
+                'too few individual shop queries: expected >= %d, got %d.',
+                (int) $assert['min_individual_queries'],
+                count($individualQueries)
+            );
+        }
+
+        if (isset($assert['max_individual_queries']) && count($individualQueries) > (int) $assert['max_individual_queries']) {
+            $failures[] = sprintf(
+                'too many individual shop queries: expected <= %d, got %d.',
+                (int) $assert['max_individual_queries'],
+                count($individualQueries)
+            );
+        }
+
+        foreach ($this->normalizeStringList($assert['must_include_terms'] ?? []) as $term) {
+            if (!$this->containsTerm($joined, $term)) {
+                $failures[] = sprintf('shop query output does not contain required term "%s".', $term);
+            }
+        }
+
+        $requiredAnyTerms = $this->normalizeStringList($assert['must_include_any_terms'] ?? []);
+        if ($requiredAnyTerms !== []) {
+            $matched = false;
+            foreach ($requiredAnyTerms as $term) {
+                if ($this->containsTerm($joined, $term)) {
+                    $matched = true;
+                    break;
+                }
+            }
+
+            if (!$matched) {
+                $failures[] = sprintf(
+                    'shop query output contains none of the required any-terms: [%s].',
+                    implode(', ', $requiredAnyTerms)
+                );
+            }
+        }
+
+        foreach ($this->normalizeStringList($assert['must_not_include_terms'] ?? []) as $term) {
+            if ($this->containsTerm($joined, $term)) {
+                $failures[] = sprintf('shop query output contains forbidden term "%s".', $term);
+            }
+        }
+
+        foreach ($this->normalizeStringList($assert['query_must_match_patterns'] ?? []) as $pattern) {
+            if (@preg_match($pattern, $joined) !== 1) {
+                $failures[] = sprintf('shop query output does not match required pattern "%s".', $pattern);
+            }
+        }
+
+        foreach ($this->normalizeStringList($assert['query_must_not_match_patterns'] ?? []) as $pattern) {
+            if (@preg_match($pattern, $joined) === 1) {
+                $failures[] = sprintf('shop query output matches forbidden pattern "%s".', $pattern);
+            }
+        }
+    }
+
+    /**
+     * @param array<int, string> $queries
+     */
+    private function containsNormalizedQuery(array $queries, string $needle): bool
+    {
+        $needle = $this->normalizeQuery($needle);
+
+        foreach ($queries as $query) {
+            if ($this->normalizeQuery($query) === $needle) {
+                return true;
+            }
+        }
+
+        return false;
+    }
+
+    private function containsTerm(string $haystack, string $term): bool
+    {
+        $haystack = $this->normalizeText($haystack);
+        $term = $this->normalizeText($term);
+
+        return $term !== '' && str_contains($haystack, $term);
+    }
+
+    private function normalizeQuery(string $value): string
+    {
+        $value = $this->normalizeText($value);
+        $value = preg_replace('/[^\p{L}\p{N}]+/u', ' ', $value) ?? $value;
+        $value = preg_replace('/\s+/u', ' ', $value) ?? $value;
+
+        return trim($value);
+    }
+
+    private function normalizeText(string $value): string
+    {
+        $value = html_entity_decode(strip_tags($value), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
+        $value = mb_strtolower(trim($value), 'UTF-8');
+        $value = preg_replace('/\s+/u', ' ', $value) ?? $value;
+
+        return trim($value);
+    }
+
+    private function normalizeOneLine(string $value): string
+    {
+        $value = trim($value);
+        $value = preg_replace('/\s+/u', ' ', $value) ?? $value;
+
+        return trim($value);
+    }
+
+    private function stringOrNull(mixed $value): ?string
+    {
+        if (!is_string($value)) {
+            return null;
+        }
+
+        $value = trim($value);
+
+        return $value !== '' ? $value : null;
+    }
+
+    /**
+     * @return array<int, string>
+     */
+    private function normalizeStringList(mixed $value): array
+    {
+        if (!is_array($value)) {
+            return [];
+        }
+
+        $out = [];
+
+        foreach ($value as $item) {
+            if (!is_string($item)) {
+                continue;
+            }
+
+            $item = trim($item);
+
+            if ($item === '') {
+                continue;
+            }
+
+            $out[] = $item;
+        }
+
+        return array_values(array_unique($out));
+    }
+
+    private function previewText(string $value, int $maxLength = 1200): string
+    {
+        $value = $this->normalizeOneLine($value);
+        $maxLength = max(40, $maxLength);
+
+        if (mb_strlen($value, 'UTF-8') <= $maxLength) {
+            return $value;
+        }
+
+        return rtrim(mb_substr($value, 0, $maxLength, 'UTF-8')) . '...';
+    }
+}
--- a/src/Knowledge/Retrieval/NdjsonChunkLookup.php
+++ b/src/Knowledge/Retrieval/NdjsonChunkLookup.php
@@ -357,7 +357,11 @@ final readonly class NdjsonChunkLookup
                continue;
            }

-            if (mb_strlen($token, 'UTF-8') < 3 && preg_match('/\d/u', $token) !== 1) {
+            if (
+                mb_strlen($token, 'UTF-8') < 3
+                && preg_match('/\d/u', $token) !== 1
+                && !$this->isImportantShortTitleToken($token)
+            ) {
                continue;
            }

@@ -367,6 +371,15 @@ final readonly class NdjsonChunkLookup
        return array_values(array_unique($out));
    }

+    private function isImportantShortTitleToken(string $token): bool
+    {
+        if ($token === '' || mb_strlen($token, 'UTF-8') >= 3) {
+            return false;
+        }
+
+        return in_array($token, $this->retrieverConfig->importantShortModelTokens(), true);
+    }
+
    /**
     * @return array<string,bool>
     */
--- a/src/Knowledge/Retrieval/NdjsonHybridRetriever.php
+++ b/src/Knowledge/Retrieval/NdjsonHybridRetriever.php
@@ -133,13 +133,17 @@ final readonly class NdjsonHybridRetriever implements RetrieverInterface
                continue;
            }

+            $row = $result['rows'][$chunkId];
            $rank++;

            $out[] = [
                'rank' => $rank,
                'chunk_id' => $chunkId,
-                'document_id' => $result['rows'][$chunkId]['document_id'] ?? null,
-                'chunk_index' => $result['rows'][$chunkId]['chunk_index'] ?? null,
+                'document_id' => $row['document_id'] ?? null,
+                'document_title' => $this->extractDocumentTitle($row),
+                'file_path' => $this->extractMetadataString($row, 'file_path'),
+                'version_number' => $this->extractMetadataString($row, 'version_number'),
+                'chunk_index' => $row['chunk_index'] ?? null,
                'raw_score' => $result['rawScores'][$chunkId] ?? null,
                'rrf_score' => $result['rrfScores'][$chunkId] ?? null,
                'threshold' => $result['threshold'],
@@ -148,7 +152,7 @@ final readonly class NdjsonHybridRetriever implements RetrieverInterface
                'entity_label' => $result['entityLabel'],
                'is_list_query' => $result['isListQuery'],
                'selection_mode' => $result['selectionMode'],
-                'text' => trim((string)$result['rows'][$chunkId]['text']),
+                'text' => trim((string)($row['text'] ?? '')),
            ];
        }

@@ -1683,6 +1687,20 @@ final readonly class NdjsonHybridRetriever implements RetrieverInterface
        return '';
    }

+    /**
+     * Extracts a scalar metadata value for debug/eval output.
+     */
+    private function extractMetadataString(array $row, string $key): string
+    {
+        $value = $row['metadata'][$key] ?? null;
+
+        if (is_scalar($value)) {
+            return trim((string)$value);
+        }
+
+        return '';
+    }
+
    /**
     * Normalizes text for token-safe product comparisons.
     */
--- a/src/Knowledge/Retrieval/QueryCleaner.php
+++ b/src/Knowledge/Retrieval/QueryCleaner.php
@@ -5,13 +5,15 @@ declare(strict_types=1);
 namespace App\Knowledge\Retrieval;

 use App\Config\LanguageCleanupConfig;
+use App\Config\NdjsonHybridRetrieverConfig;
 use App\Knowledge\StopWords;

 final readonly class QueryCleaner
 {
    public function __construct(
        private StopWords $stopWords,
-        private LanguageCleanupConfig $languageCleanupConfig
+        private LanguageCleanupConfig $languageCleanupConfig,
+        private NdjsonHybridRetrieverConfig $retrieverConfig
    ) {
    }

@@ -21,9 +23,8 @@ final readonly class QueryCleaner
     * Important:
     * - Unicode-safe
     * - Numbers are preserved
-     * - Negations are preserved
-     * - No aggressive token-length filtering
-     * - Stop words are removed
+     * - Negations are preserved by protected-term-aware cleanup profiles
+     * - Stop words come from the generic legacy list plus the stopwords and meta terms of the YAML cleanup profile
     */
    public function clean(string $query): string
    {
@@ -31,49 +32,49 @@ final readonly class QueryCleaner
            return '';
        }

-        // 1. Convert to lowercase in a Unicode-safe way
+        $profile = $this->loadCleanupProfile();
+
+        // 1. Convert to lowercase in a Unicode-safe way.
        $query = mb_strtolower($query, 'UTF-8');

-        // 2. Treat hyphens and slashes as word separators
+        // 2. Treat hyphens and slashes as word separators.
        $query = $this->languageCleanupConfig->replaceWordSeparatorsWithSpace($query);

-        // 3. Remove special characters, but keep:
-        //    - letters
-        //    - numbers
-        //    - other Unicode letters
+        // 3. Remove configured cleanup phrases before punctuation stripping.
+        $query = $this->removePhrases($query, $profile['phrases']);
+
+        // 4. Remove special characters, but keep Unicode letters, numbers and whitespace.
        $query = preg_replace('/[^\p{L}\p{N}\s]/u', ' ', $query);

        if ($query === null) {
            return '';
        }

-        // 4. Normalize multiple whitespace characters
+        // 5. Normalize multiple whitespace characters.
        $query = preg_replace('/\s+/u', ' ', $query);
-        $query = trim($query);
+        $query = trim((string) $query);

        if ($query === '') {
            return '';
        }

-        // 5. Tokenize the query
        $tokens = preg_split('/\s+/u', $query);

        if ($tokens === false) {
            return '';
        }

+        $profileTerms = array_fill_keys(array_merge($profile['stopwords'], $profile['meta_terms']), true);
        $cleanTokens = [];

        foreach ($tokens as $token) {
-
            $token = trim($token);

            if ($token === '') {
                continue;
            }

-            // Remove stop words
-            if ($this->stopWords->isStopWord($token)) {
+            if ($this->stopWords->isStopWord($token) || isset($profileTerms[$token])) {
                continue;
            }

@@ -86,4 +87,42 @@ final readonly class QueryCleaner

        return implode(' ', $cleanTokens);
    }
-}
+
+    /**
+     * @return array{stopwords:string[], phrases:string[], meta_terms:string[], protected_terms:string[]}
+     */
+    private function loadCleanupProfile(): array
+    {
+        return $this->languageCleanupConfig->getCleanupProfile($this->retrieverConfig->queryCleanupProfile());
+    }
+
+    /**
+     * @param string[] $phrases
+     */
+    private function removePhrases(string $query, array $phrases): string
+    {
+        foreach ($phrases as $phrase) {
+            $phrase = trim(mb_strtolower($phrase, 'UTF-8'));
+
+            if ($phrase === '') {
+                continue;
+            }
+
+            $normalizedPhrase = $this->languageCleanupConfig->replaceWordSeparatorsWithSpace($phrase);
+            $parts = preg_split('/\s+/u', $normalizedPhrase, -1, PREG_SPLIT_NO_EMPTY) ?: [];
+
+            if ($parts === []) {
+                continue;
+            }
+
+            $pattern = implode('\\s+', array_map(
+                static fn (string $part): string => preg_quote($part, '/'),
+                $parts
+            ));
+
+            $query = preg_replace('/(?<!\p{L})(?:' . $pattern . ')(?!\p{L})/u', ' ', $query) ?? $query;
+        }
+
+        return $query;
+    }
+}
--- a/src/Service/Admin/EvalAdminService.php
+++ b/src/Service/Admin/EvalAdminService.php
@@ -0,0 +1,774 @@
+<?php
+
+declare(strict_types=1);
+
+namespace App\Service\Admin;
+
+use App\Eval\AgentEvalRunner;
+use App\Eval\Dto\EvalCase;
+use App\Eval\Dto\EvalResult;
+use App\Eval\EvalCaseLoader;
+use App\Eval\EvalReportWriter;
+
+final readonly class EvalAdminService
+{
+    /**
+     * @var array<string, string>
+     */
+    private const TYPES = [
+        'retrieval' => 'Retrieval',
+        'shop_query' => 'Shopquery',
+        'followup' => 'Follow-up',
+        'answer_guard' => 'Answer-Guard',
+    ];
+
+    public function __construct(
+        private EvalCaseLoader $caseLoader,
+        private AgentEvalRunner $runner,
+        private EvalReportWriter $reportWriter,
+        private string $projectDir,
+    ) {
+    }
+
+    /**
+     * @return array<string, string>
+     */
+    public function supportedTypes(): array
+    {
+        return self::TYPES;
+    }
+
+    /**
+     * @return array<int, string>
+     */
+    public function supportedTypeNames(): array
+    {
+        return array_keys(self::TYPES);
+    }
+
+    public function assertSupportedType(string $type): string
+    {
+        $type = trim($type);
+
+        if (!array_key_exists($type, self::TYPES)) {
+            throw new \InvalidArgumentException(sprintf('Unsupported eval type: %s', $type));
+        }
+
+        return $type;
+    }
+
+    /**
+     * @return array<string, array<int, array{id:string,prompt:string,type:string}>>
+     */
+    public function casesByType(): array
+    {
+        $casesByType = [];
+
+        foreach (array_keys(self::TYPES) as $type) {
+            $casesByType[$type] = array_map(
+                static fn (EvalCase $case): array => [
+                    'id' => $case->id,
+                    'type' => $case->type,
+                    'prompt' => $case->prompt,
+                ],
+                $this->loadCases($type)
+            );
+        }
+
+        return $casesByType;
+    }
+
+    /**
+     * @return array<int, array<string, mixed>>
+     */
+    public function overview(): array
+    {
+        $overview = [];
+
+        foreach (self::TYPES as $type => $label) {
+            $cases = $this->loadCases($type);
+            $report = $this->readTypeReport($type);
+
+            $overview[] = [
+                'type' => $type,
+                'label' => $label,
+                'case_count' => count($cases),
+                'report' => $report,
+                'status' => $this->statusFromReport($report),
+            ];
+        }
+
+        return $overview;
+    }
+
+    /**
+     * @return array<string, mixed>
+     */
+    public function run(string $type, ?string $caseId = null): array
+    {
+        $type = $this->assertSupportedType($type);
+        $caseId = trim((string) $caseId);
+        $cases = $this->loadCases($type);
+
+        if ($caseId !== '') {
+            $cases = $this->filterCasesById($cases, $caseId);
+
+            if ($cases === []) {
+                [$type, $cases] = $this->findCasesByIdAcrossTypes($caseId);
+            }
+        }
+
+        if ($cases === []) {
+            if ($caseId !== '') {
+                throw new \RuntimeException(sprintf(
+                    'Eval case "%s" was not found. Please select a case from the list for the chosen eval type.',
+                    $caseId
+                ));
+            }
+
+            throw new \RuntimeException(sprintf(
+                'No eval cases available for eval type "%s".',
+                $type
+            ));
+        }
+
+        $results = $this->runner->runAll($cases);
+        $report = $this->buildReport($type, $caseId !== '' ? $caseId : null, $results);
+
+        $typeReportPath = $this->reportWriter->write($report, sprintf('%s-last-run.json', $type));
+        $lastReportPath = $this->reportWriter->write($report);
+
+        $report['written_to'] = $typeReportPath;
+        $report['last_run_written_to'] = $lastReportPath;
+
+        return $report;
+    }
+
+    /**
+     * @return array{type:string,id:string,prompt:string,assert_json:string,history_json:string,request_context_hint:string,source_label:string}
+     */
+    public function emptyCaseDraft(string $type = 'retrieval'): array
+    {
+        $type = $this->assertSupportedType($type);
+
+        return [
+            'type' => $type,
+            'id' => '',
+            'prompt' => '',
+            'assert_json' => $this->encodePrettyJson($this->defaultAssertForType($type)),
+            'history_json' => '',
+            'request_context_hint' => '',
+            'source_label' => '',
+        ];
+    }
+
+    /**
+     * @return array{type:string,id:string,prompt:string,assert_json:string,history_json:string,request_context_hint:string,source_label:string}
+     */
+    public function caseDraftFromReportResult(string $type, string $caseId): array
+    {
+        $type = $this->assertSupportedType($type);
+        $caseId = trim($caseId);
+
+        if ($caseId === '') {
+            throw new \InvalidArgumentException('No source case ID was provided.');
+        }
+
+        $report = $this->readTypeReport($type);
+        if ($report === null) {
+            throw new \RuntimeException(sprintf(
+                'No report exists for eval type "%s". Please run the eval first.',
+                $type
+            ));
+        }
+
+        $result = null;
+        foreach (($report['results'] ?? []) as $candidate) {
+            if (is_array($candidate) && (string) ($candidate['case_id'] ?? '') === $caseId) {
+                $result = $candidate;
+                break;
+            }
+        }
+
+        if (!is_array($result)) {
+            throw new \RuntimeException(sprintf(
+                'The report contains no case "%s" for eval type "%s".',
+                $caseId,
+                $type
+            ));
+        }
+
+        $details = is_array($result['details'] ?? null) ? $result['details'] : [];
+        $prompt = trim((string) ($result['prompt'] ?? $details['prompt'] ?? ''));
+        $history = $this->historyDraftFromDetails($details);
+        $assert = $this->suggestAssertFromReportResult($type, $result, $details);
+
+        return [
+            'type' => $type,
+            'id' => $this->suggestUniqueCaseId($type . '_' . $caseId . '_new'),
+            'prompt' => $prompt,
+            'assert_json' => $this->encodePrettyJson($assert),
+            'history_json' => $history === [] ? '' : $this->encodePrettyJson($history),
+            'request_context_hint' => '',
+            'source_label' => sprintf('Template from report case %s (%s)', $caseId, self::TYPES[$type]),
+        ];
+    }
+
+    /**
+     * @return array{type:string,id:string,path:string,row:array<string,mixed>,case_count:int}
+     */
+    public function createCase(
+        string $type,
+        string $id,
+        string $prompt,
+        string $assertJson,
+        string $historyJson = '',
+        string $requestContextHint = '',
+    ): array {
+        $type = $this->assertSupportedType($type);
+        $id = $this->normalizeNewCaseId($id);
+        $prompt = trim($prompt);
+        $requestContextHint = trim($requestContextHint);
+
+        if ($prompt === '') {
+            throw new \InvalidArgumentException('The eval prompt must not be empty.');
+        }
+
+        if ($this->caseIdExists($id)) {
+            throw new \RuntimeException(sprintf(
+                'An eval case with the ID "%s" already exists. Please use a new ID.',
+                $id
+            ));
+        }
+
+        $assert = $this->decodeJsonObject($assertJson, 'Assert-JSON');
+        $history = $this->decodeHistoryJson($historyJson);
+
+        $row = [
+            'id' => $id,
+            'type' => $type,
+            'prompt' => $prompt,
+            'assert' => $assert,
+        ];
+
+        if ($history !== []) {
+            $row['history'] = $history;
+        }
+
+        if ($requestContextHint !== '') {
+            $row['request_context_hint'] = $requestContextHint;
+        }
+
+        // Validate with the same DTO that the eval runner uses.
+        EvalCase::fromArray($row);
+
+        $path = $this->caseFilePath($type);
+        $line = json_encode(
+            $row,
+            JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
+        );
+
+        $prefix = '';
+        if (is_file($path) && filesize($path) > 0) {
+            $contents = file_get_contents($path);
+            if (is_string($contents) && $contents !== '' && !str_ends_with($contents, "\n")) {
+                $prefix = "\n";
+            }
+        }
+
+        $written = file_put_contents($path, $prefix . $line . PHP_EOL, FILE_APPEND | LOCK_EX);
+        if ($written === false) {
+            throw new \RuntimeException(sprintf('Could not write eval case file: %s', $path));
+        }
+
+        return [
+            'type' => $type,
+            'id' => $id,
+            'path' => $path,
+            'row' => $row,
+            'case_count' => count($this->loadCases($type)),
+        ];
+    }
+
+
+    /**
+     * @return array{type:string,id:string,path:string,case_count:int}
+     */
+    public function deleteCase(string $type, string $caseId): array
+    {
+        $type = $this->assertSupportedType($type);
+        $caseId = $this->normalizeExistingCaseId($caseId);
+        $path = $this->caseFilePath($type);
+
+        if (!is_file($path)) {
+            throw new \RuntimeException(sprintf('Eval case file was not found: %s', $path));
+        }
+
+        $lines = file($path, FILE_IGNORE_NEW_LINES);
+        if ($lines === false) {
+            throw new \RuntimeException(sprintf('Could not read eval case file: %s', $path));
+        }
+
+        $keptLines = [];
+        $deleted = false;
+
+        foreach ($lines as $line) {
+            $trimmed = trim((string) $line);
+            if ($trimmed === '') {
+                continue;
+            }
+
+            try {
+                $decoded = json_decode($trimmed, true, 512, JSON_THROW_ON_ERROR);
+            } catch (\JsonException $e) {
+                throw new \RuntimeException(sprintf(
+                    'Eval case file contains invalid JSON and was left unchanged: %s',
+                    $e->getMessage()
+                ));
+            }
+
+            if (!is_array($decoded)) {
+                throw new \RuntimeException('Eval case file contains an invalid NDJSON line and was left unchanged.');
+            }
+
+            if ((string) ($decoded['id'] ?? '') === $caseId) {
+                $deleted = true;
+                continue;
+            }
+
+            $keptLines[] = $trimmed;
+        }
+
+        if (!$deleted) {
+            throw new \RuntimeException(sprintf(
+                'Eval case "%s" was not found in type "%s".',
+                $caseId,
+                $type
+            ));
+        }
+
+        $contents = $keptLines === [] ? '' : implode(PHP_EOL, $keptLines) . PHP_EOL;
+        $written = file_put_contents($path, $contents, LOCK_EX);
+        if ($written === false) {
+            throw new \RuntimeException(sprintf('Could not write eval case file: %s', $path));
+        }
+
+        return [
+            'type' => $type,
+            'id' => $caseId,
+            'path' => $path,
+            'case_count' => count($this->loadCases($type)),
+        ];
+    }
+
+    /**
+     * @param array<int, EvalCase> $cases
+     * @return array<int, EvalCase>
+     */
+    private function filterCasesById(array $cases, string $caseId): array
+    {
+        return array_values(array_filter(
+            $cases,
+            static fn (EvalCase $case): bool => $case->id === $caseId
+        ));
+    }
+
+    /**
+     * @return array{0:string,1:array<int, EvalCase>}
+     */
+    private function findCasesByIdAcrossTypes(string $caseId): array
+    {
+        foreach (array_keys(self::TYPES) as $candidateType) {
+            $cases = $this->filterCasesById($this->loadCases($candidateType), $caseId);
+
+            if ($cases !== []) {
+                return [$candidateType, $cases];
+            }
+        }
+
+        return ['', []];
+    }
+
+    /**
+     * @return array<string, mixed>|null
+     */
+    public function readTypeReport(string $type): ?array
+    {
+        $type = $this->assertSupportedType($type);
+
+        return $this->readReportFile(sprintf('%s/tests/evals/reports/%s-last-run.json', $this->projectDir, $type));
+    }
+
+    /**
+     * @return array<string, mixed>|null
+     */
+    public function readLastReport(): ?array
+    {
+        return $this->readReportFile(sprintf('%s/tests/evals/reports/last-run.json', $this->projectDir));
+    }
+
+    /**
+     * @return array<int, EvalCase>
+     */
+    private function loadCases(string $type): array
+    {
+        return $this->caseLoader->load($this->assertSupportedType($type));
+    }
+
+    /**
+     * @param array<int, EvalResult> $results
+     * @return array<string, mixed>
+     */
+    private function buildReport(string $type, ?string $caseId, array $results): array
+    {
+        $passed = count(array_filter(
+            $results,
+            static fn (EvalResult $result): bool => $result->passed
+        ));
+        $failed = count($results) - $passed;
+
+        return [
+            'type' => $type,
+            'case_filter' => $caseId,
+            'total' => count($results),
+            'passed' => $passed,
+            'failed' => $failed,
+            'generated_at' => (new \DateTimeImmutable())->format(\DateTimeInterface::ATOM),
+            'results' => array_map(
+                static fn (EvalResult $result): array => $result->toArray(),
+                $results
+            ),
+        ];
+    }
+
+    /**
+     * @return array<string, mixed>|null
+     */
+    private function readReportFile(string $path): ?array
+    {
+        if (!is_file($path)) {
+            return null;
+        }
+
+        $raw = file_get_contents($path);
+
+        if (!is_string($raw) || trim($raw) === '') {
+            return null;
+        }
+
+        $decoded = json_decode($raw, true);
+
+        if (!is_array($decoded)) {
+            return null;
+        }
+
+        return $decoded;
+    }
+
+    private function normalizeNewCaseId(string $id): string
+    {
+        $id = trim($id);
+
+        if ($id === '') {
+            throw new \InvalidArgumentException('The eval case ID must not be empty.');
+        }
+
+        if (preg_match('/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/', $id) !== 1) {
+            throw new \InvalidArgumentException(
+                'The eval case ID may only contain letters, digits, underscores, and hyphens, and must start with a letter or a digit.'
+            );
+        }
+
+        return $id;
+    }
+
+    private function normalizeExistingCaseId(string $id): string
+    {
+        $id = trim($id);
+
+        if ($id === '') {
+            throw new \InvalidArgumentException('No eval case ID was provided for deletion.');
+        }
+
+        if (preg_match('/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/', $id) !== 1) {
+            throw new \InvalidArgumentException(
+                'The eval case ID is invalid. Only letters, digits, underscores, and hyphens are allowed.'
+            );
+        }
+
+        return $id;
+    }
+
+    private function caseIdExists(string $id): bool
+    {
+        foreach (array_keys(self::TYPES) as $type) {
+            foreach ($this->loadCases($type) as $case) {
+                if ($case->id === $id) {
+                    return true;
+                }
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * @return array<string, mixed>
+     */
+    private function decodeJsonObject(string $json, string $label): array
+    {
+        $json = trim($json);
+
+        if ($json === '') {
+            return [];
+        }
+
+        try {
+            $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
+        } catch (\JsonException $e) {
+            throw new \InvalidArgumentException(sprintf('%s is invalid: %s', $label, $e->getMessage()));
+        }
+
+        if (!is_array($decoded) || !str_starts_with($json, '{')) {
+            throw new \InvalidArgumentException(sprintf('%s must be a JSON object.', $label));
+        }
+
+        return $decoded;
+    }
+
+    /**
+     * @return array<int, array{prompt:string,answer:string}>
+     */
+    private function decodeHistoryJson(string $json): array
+    {
+        $json = trim($json);
+
+        if ($json === '') {
+            return [];
+        }
+
+        try {
+            $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
+        } catch (\JsonException $e) {
+            throw new \InvalidArgumentException(sprintf('History JSON is invalid: %s', $e->getMessage()));
+        }
+
+        if (!is_array($decoded) || !str_starts_with($json, '[') || !array_is_list($decoded)) {
+            throw new \InvalidArgumentException('History JSON must be a JSON list.');
+        }
+
+        $history = [];
+
+        foreach ($decoded as $entry) {
+            if (is_string($entry)) {
+                $entry = trim($entry);
+                if ($entry !== '') {
+                    $history[] = [
+                        'prompt' => 'Eval-Kontext',
+                        'answer' => $entry,
+                    ];
+                }
+                continue;
+            }
+
+            if (!is_array($entry)) {
+                continue;
+            }
+
+            $prompt = trim((string) ($entry['prompt'] ?? ''));
+            $answer = trim((string) ($entry['answer'] ?? $entry['response'] ?? $entry['answer_preview'] ?? ''));
+
+            if ($prompt === '' && $answer === '') {
+                continue;
+            }
+
+            $history[] = [
+                'prompt' => $prompt !== '' ? $prompt : 'Eval-Kontext',
+                'answer' => $answer,
+            ];
+        }
+
+        return $history;
+    }
+
+    private function caseFilePath(string $type): string
+    {
+        $type = $this->assertSupportedType($type);
+
+        return sprintf('%s/tests/evals/cases/%s.ndjson', $this->projectDir, $type);
+    }
+
+    private function statusFromReport(?array $report): string
+    {
+        if ($report === null) {
+            return 'not_run';
+        }
+
+        $failed = (int) ($report['failed'] ?? 0);
+        $total = (int) ($report['total'] ?? 0);
+
+        if ($total <= 0) {
+            return 'empty';
+        }
+
+        return $failed === 0 ? 'green' : 'red';
+    }
+
+    /**
+     * @return array<string, mixed>
+     */
+    private function defaultAssertForType(string $type): array
+    {
+        return match ($type) {
+            'retrieval', 'answer_guard' => [
+                'min_results' => 1,
+            ],
+            'shop_query', 'followup' => [
+                'expected_query' => '',
+            ],
+            default => [],
+        };
+    }
+
+    /**
+     * @param array<string, mixed> $result
+     * @param array<string, mixed> $details
+     * @return array<string, mixed>
+     */
+    private function suggestAssertFromReportResult(string $type, array $result, array $details): array
+    {
+        if (($type === 'shop_query' || $type === 'followup') && is_string($details['query'] ?? null)) {
+            $query = trim($details['query']);
+            if ($query !== '') {
+                return [
+                    'expected_query' => $query,
+                ];
+            }
+        }
+
+        if (($type === 'shop_query' || $type === 'followup') && is_array($details['individual_queries'] ?? null)) {
+            $queries = array_values(array_filter(array_map(
+                static fn (mixed $value): string => trim((string) $value),
+                $details['individual_queries']
+            ), static fn (string $value): bool => $value !== ''));
+
+            if ($queries !== []) {
+                return [
+                    'expected_individual_queries' => $queries,
+                    'expected_individual_queries_exact' => true,
+                ];
+            }
+        }
+
+        if (is_array($details['document_refs'] ?? null)) {
+            $documentIds = [];
+            foreach ($details['document_refs'] as $documentRef) {
+                if (!is_array($documentRef)) {
+                    continue;
+                }
+
+                $documentId = trim((string) ($documentRef['id'] ?? ''));
+                if ($documentId !== '') {
+                    $documentIds[] = $documentId;
+                }
+            }
+
+            if ($documentIds !== []) {
+                return [
+                    'min_results' => 1,
+                    'must_include_one_of_document_ids' => array_values(array_unique($documentIds)),
+                ];
+            }
+        }
+
+        if (is_array($details['document_ids'] ?? null)) {
+            $documentIds = array_values(array_filter(array_map(
+                static fn (mixed $value): string => trim((string) $value),
+                $details['document_ids']
+            ), static fn (string $value): bool => $value !== ''));
+
+            if ($documentIds !== []) {
+                return [
+                    'min_results' => 1,
+                    'must_include_one_of_document_ids' => array_values(array_unique($documentIds)),
+                ];
+            }
+        }
+
+        $resultCount = (int) ($details['result_count'] ?? -1);
+        if ($resultCount === 0) {
+            return [
+                'max_results' => 0,
+            ];
+        }
+
+        return $this->defaultAssertForType($type);
+    }
+
+    /**
+     * @param array<string, mixed> $details
+     * @return array<int, array{prompt:string,answer:string}>
+     */
+    private function historyDraftFromDetails(array $details): array
+    {
+        if (!is_array($details['history'] ?? null)) {
+            return [];
+        }
+
+        $history = [];
+        foreach ($details['history'] as $entry) {
+            if (!is_array($entry)) {
+                continue;
+            }
+
+            $prompt = trim((string) ($entry['prompt'] ?? ''));
+            $answer = trim((string) ($entry['answer'] ?? $entry['answer_preview'] ?? ''));
+
+            if ($prompt === '' && $answer === '') {
+                continue;
+            }
+
+            $history[] = [
+                'prompt' => $prompt !== '' ? $prompt : 'Eval-Kontext',
+                'answer' => $answer,
+            ];
+        }
+
+        return $history;
+    }
+
+    private function suggestUniqueCaseId(string $base): string
+    {
+        $base = strtolower(trim($base));
+        $base = preg_replace('/[^a-z0-9_-]+/', '_', $base) ?? 'eval_case';
+        $base = trim($base, '_-');
+
+        if ($base === '') {
+            $base = 'eval_case';
+        }
+
+        if (!$this->caseIdExists($base)) {
+            return $base;
+        }
+
+        for ($i = 2; $i <= 999; ++$i) {
+            $candidate = sprintf('%s_%d', $base, $i);
+            if (!$this->caseIdExists($candidate)) {
+                return $candidate;
+            }
+        }
+
+        return sprintf('%s_%s', $base, (new \DateTimeImmutable())->format('YmdHis'));
+    }
+
+    /**
+     * @param array<mixed> $value
+     */
+    private function encodePrettyJson(array $value): string
+    {
+        return json_encode(
+            $value,
+            JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
+        );
+    }
+}
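
Note on the NDJSON handling in `createCase()` and `deleteCase()` above: the following is a minimal standalone sketch of the same append-and-read-back pattern, not the class's actual code. The function names `appendNdjsonRow` and `readNdjsonRows` and the temp file are hypothetical and exist only for illustration. It shows why the trailing-newline guard matters when an existing file was saved without a final line break.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: append one row to an NDJSON file (one JSON object per line).
 *
 * @param array<string, mixed> $row
 */
function appendNdjsonRow(string $path, array $row): void
{
    $line = json_encode($row, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);

    // If the file exists and does not end with a newline, start a fresh line first.
    // Otherwise the new row would be glued onto the previous one and break parsing.
    $prefix = '';
    clearstatcache(true, $path);
    if (is_file($path) && filesize($path) > 0) {
        $contents = file_get_contents($path);
        if (is_string($contents) && !str_ends_with($contents, "\n")) {
            $prefix = "\n";
        }
    }

    if (file_put_contents($path, $prefix . $line . "\n", FILE_APPEND | LOCK_EX) === false) {
        throw new RuntimeException('Could not write ' . $path);
    }
}

/**
 * Hypothetical helper: read all rows back, skipping blank lines.
 *
 * @return array<int, mixed>
 */
function readNdjsonRows(string $path): array
{
    $rows = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES) ?: [] as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $rows[] = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
    }

    return $rows;
}

// Demo: the existing file deliberately lacks a trailing newline.
$path = tempnam(sys_get_temp_dir(), 'eval_');
file_put_contents($path, '{"id":"a"}');
appendNdjsonRow($path, [
    'id' => 'b',
    'type' => 'retrieval',
    'prompt' => 'x',
    'assert' => ['min_results' => 1],
]);
$rows = readNdjsonRows($path);
unlink($path);
```

Without the prefix guard, the file would contain `{"id":"a"}{"id":"b",...}` on a single line, and `json_decode` with `JSON_THROW_ON_ERROR` would reject it on the next read.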
--- a/templates/admin/base.html.twig
+++ b/templates/admin/base.html.twig
@@ -134,6 +134,10 @@
                           href="{{ path('admin_model_config_list') }}#agentLiveTest">
                            <i class="bi bi-rocket-takeoff-fill"></i> KI-Agent Live-Test
                        </a>
+                        <a class="nav-link text-light {% if route starts with 'admin_evals' %}active fw-bold{% endif %}"
+                           href="{{ path('admin_evals_index') }}">
+                            <i class="bi bi-clipboard2-check"></i> Eval Suite
+                        </a>
                    {% endif %}
                    <hr class="border-secondary">
                    <div class="text-info text-uppercase small mb-2">
--- a/templates/admin/evals/case_new.html.twig
+++ b/templates/admin/evals/case_new.html.twig
@@ -0,0 +1,351 @@
+{% extends 'admin/base.html.twig' %}
+
+{% block title %}Manage eval cases{% endblock %}
+
+{% block body %}
+
+    <div class="d-flex justify-content-between align-items-center mb-4 flex-wrap gap-2">
+        <div>
+            <h1 class="h3 mb-1">
+                <i class="bi bi-journal-plus"></i> Manage eval cases
+            </h1>
+            <div class="small text-secondary">
+                Create new regression cases or remove existing ones on a separate page, without cluttering the Eval Suite overview.
+            </div>
+        </div>
+
+        <a href="{{ path('admin_evals_index', {type: case_draft.type|default('retrieval')}) }}"
+           class="btn btn-sm btn-outline-secondary">
+            Back to Eval Suite
+        </a>
+    </div>
+
+    {% for label in ['success', 'danger', 'warning', 'info'] %}
+        {% for message in app.flashes(label) %}
+            <div class="alert alert-{{ label }} shadow-sm">
+                {{ message }}
+            </div>
+        {% endfor %}
+    {% endfor %}
+
+    {% if case_draft.source_label|default('') %}
+        <div class="alert alert-info border-info bg-black text-light shadow-sm">
+            <strong>Template loaded:</strong> {{ case_draft.source_label }}<br>
+            <span class="small text-secondary">
+                Please check the case ID, prompt, and assertions before saving the case.
+            </span>
+        </div>
+    {% endif %}
+
+    <div class="alert alert-secondary border-secondary bg-black text-light shadow-sm mb-4">
+        <div class="fw-semibold text-warning mb-1">
+            <i class="bi bi-compass"></i> In brief
+        </div>
+        <div class="small text-secondary">
+            An eval case is a repeatable test. You enter <strong class="text-light">what the user asks</strong>
+            and <strong class="text-light">what RetrieX should be measured against</strong>. The test does not change any data in the shop or the RAG knowledge base;
+            it only checks whether a known case still works correctly.
+        </div>
+    </div>
+
+    <div class="row g-4">
+        <div class="col-xl-8">
+            <div class="card bg-black border-secondary text-light shadow-sm">
+                <div class="card-body">
+                    <h5 class="text-warning mb-3">
+                        <i class="bi bi-pencil-square"></i> New eval case
+                    </h5>
+
+                    <form method="post" action="{{ path('admin_evals_case_create') }}">
+                        <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_case_create') }}">
+
+                        <div class="mb-4">
+                            <label class="form-label">Eval type</label>
+                            <select name="type" class="form-select bg-dark text-light border-secondary">
+                                {% for type, label in types %}
+                                    <option value="{{ type }}" {% if type == case_draft.type|default('retrieval') %}selected{% endif %}>
+                                        {{ label }}
+                                    </option>
+                                {% endfor %}
+                            </select>
+                            <div class="form-text text-secondary">
+                                First choose <strong class="text-light">what exactly should be checked</strong>. The type also determines
+                                which file the case is written to: <code>tests/evals/cases/&lt;type&gt;.ndjson</code>.
+                            </div>
+                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
+                                <div class="mb-1"><strong class="text-light">retrieval</strong>: checks whether the right knowledge source or document is found.</div>
+                                <div class="mb-1"><strong class="text-light">shop_query</strong>: checks which search query would be sent to the shop.</div>
+                                <div class="mb-1"><strong class="text-light">followup</strong>: checks a follow-up question that depends on the previous chat history.</div>
+                                <div><strong class="text-light">answer_guard</strong>: checks that RetrieX does not make anything up when given nonsense or when evidence is missing.</div>
+                            </div>
+                        </div>
+
+                        <div class="mb-4">
+                            <label class="form-label">New case ID</label>
+                            <input type="text"
+                                   name="id"
+                                   value="{{ case_draft.id|default('') }}"
+                                   class="form-control bg-dark text-light border-secondary"
+                                   placeholder="followup_testomat808_device_price_001"
+                                   required>
+                            <div class="form-text text-secondary">
+                                This is the <strong class="text-light">internal name of the test</strong>. It appears later in the eval results
+                                so that you can recognize the case. Do not use spaces. Letters, digits, <code>_</code> and <code>-</code> are allowed, and the ID must start with a letter or a digit.
+                            </div>
+                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
+                                Good examples: <code>retrieval_lieferbedingungen_versand_001</code>,
+                                <code>shop_query_testomat808_indikator300_001</code>,
+                                <code>followup_testomat808_device_price_001</code>.<br>
+                                Rule of thumb: <code>type_topic_goal_number</code>.
+                            </div>
+                        </div>
+
+                        <div class="mb-4">
+                            <label class="form-label">Prompt</label>
+                            <textarea name="prompt"
+                                      rows="3"
+                                      class="form-control bg-dark text-light border-secondary"
+                                      placeholder="und was kostet das gerät selber"
+                                      required>{{ case_draft.prompt|default('') }}</textarea>
+                            <div class="form-text text-secondary">
+                                Enter <strong class="text-light">the exact user question</strong> to be tested here.
+                                Do not enter the expected answer; enter the sentence a user would type into the chat.
+                            </div>
+                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
+                                Typos may be left in deliberately if that exact typo is what the case should cover.
+                                Example: <code>ich würde gern chlor im schwinnbad messen</code> then also checks the correction toward <code>schwimmbad</code>.
+                            </div>
+                        </div>
+
+                        <div class="mb-4">
+                            <label class="form-label">Assert-JSON</label>
+                            <textarea name="assert_json"
+                                      rows="9"
+                                      class="form-control bg-dark text-light border-secondary font-monospace"
+                                      spellcheck="false">{{ case_draft.assert_json|default('{}') }}</textarea>
+                            <div class="form-text text-secondary">
+                                This states <strong class="text-light">what the test should expect</strong>. The field must be a valid JSON object,
+                                so it starts with <code>{</code> and ends with <code>}</code>. No comments and no comma after the last entry.
+                            </div>
+                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
+                                <div class="mb-2"><strong class="text-light">If a shop query must match exactly:</strong></div>
+                                <pre class="bg-black border border-secondary rounded p-2 small text-light mb-3"><code>{
+  "expected_query": "testomat 808"
+}</code></pre>
+                                <div class="mb-2"><strong class="text-light">If certain words must be included:</strong></div>
+                                <pre class="bg-black border border-secondary rounded p-2 small text-light mb-3"><code>{
+  "must_include_terms": [
+    "testomat",
+    "808"
+  ]
+}</code></pre>
+                                <div class="mb-2"><strong class="text-light">If a document must be found:</strong></div>
+                                <pre class="bg-black border border-secondary rounded p-2 small text-light mb-0"><code>{
+  "min_results": 1,
+  "must_include_one_of_document_ids": [
+    "DOKUMENT-ID"
+  ]
+}</code></pre>
+                            </div>
+                        </div>
+
+                        <div class="mb-4">
+                            <label class="form-label">History-JSON <span class="text-secondary">optional</span></label>
+                            <textarea name="history_json"
+                                      rows="8"
+                                      class="form-control bg-dark text-light border-secondary font-monospace"
+                                      spellcheck="false"
+                                      placeholder='[{"prompt":"previous question","answer":"previous answer"}]'>{{ case_draft.history_json|default('') }}</textarea>
+                            <div class="form-text text-secondary">
+                                Fill this in only if the current question needs the <strong class="text-light">previous chat history</strong>.
+                                Leave it empty for standalone prompts. The field must be a JSON list, starting with <code>[</code> and ending with <code>]</code>.
+                            </div>
+                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
+                                Typical use: the user first asks for the lowest limit value, then for the indicator,
+                                and finally <code>was kostet der indikator</code> ("what does the indicator cost"). The test then needs the previous questions and answers as history.
+                                <pre class="bg-black border border-secondary rounded p-2 small text-light mt-2 mb-0"><code>[
+  {
+    "prompt": "mit welchem indikator",
+    "answer": "Der Wert 0,02 °dH wird beim Testomat 808 mit Indikatortyp 300 gemessen."
+  }
+]</code></pre>
+                            </div>
+                        </div>
+
+                        <div class="mb-4">
+                            <label class="form-label">Request Context Hint <span class="text-secondary">optional</span></label>
+                            <textarea name="request_context_hint"
+                                      rows="3"
+                                      class="form-control bg-dark text-light border-secondary"
+                                      placeholder="Only for special cases where history is not enough.">{{ case_draft.request_context_hint|default('') }}</textarea>
+                            <div class="form-text text-secondary">
+                                You can almost always <strong class="text-light">leave this field empty</strong>. It is intended only for special cases
+                                where the test needs extra context that cannot be expressed cleanly as history.
+                            </div>
+                            <div class="small text-secondary mt-2 border border-secondary rounded p-3 bg-dark">
+                                Example of a special case: <code>The previous result showed several shop products but no regular chat answer.</code>
+                                For ordinary regressions, <strong class="text-light">History-JSON is the better choice</strong>.
+                            </div>
+                        </div>
+
+                        <div class="d-flex flex-wrap gap-2">
+                            <button type="submit" class="btn btn-warning">
+                                <i class="bi bi-save"></i> Save eval case
+                            </button>
+                            <a href="{{ path('admin_evals_index', {type: case_draft.type|default('retrieval')}) }}"
+                               class="btn btn-outline-secondary">
+                                Cancel
+                            </a>
+                        </div>
+                    </form>
+                </div>
+            </div>
+        </div>
+
+        <div class="col-xl-4">
+            <div class="card bg-black border-danger text-light shadow-sm mb-4">
+                <div class="card-body">
+                    <h5 class="text-danger mb-3">
+                        <i class="bi bi-trash3"></i> Remove existing eval cases
+                    </h5>
+                    <p class="small text-secondary mb-3">
+                        Here you can remove incorrectly created or obsolete cases from the
+                        <code>tests/evals/cases/*.ndjson</code> files. Deleting affects only the eval case,
+                        not the RAG knowledge, the shop, or existing reports.
+                    </p>
+
+                    {% for type, label in types %}
+                        {% set cases = cases_by_type[type]|default([]) %}
+                        <details class="border border-secondary rounded p-3 mb-3" {% if type == case_draft.type|default('retrieval') %}open{% endif %}>
+                            <summary class="text-info" style="cursor:pointer;">
+                                {{ label }} <span class="text-secondary">({{ cases|length }} Cases)</span>
+                            </summary>
+
+                            {% if cases is empty %}
+                                <div class="small text-secondary mt-3">
+                                    There are currently no cases for this type.
+                                </div>
+                            {% else %}
+                                <div class="mt-3">
+                                    {% for case in cases %}
+                                        <div class="border-top border-secondary pt-3 mt-3">
+                                            <div class="small mb-2">
+                                                <code>{{ case.id }}</code>
+                                                <div class="text-secondary mt-1">{{ case.prompt }}</div>
+                                            </div>
+                                            <form method="post"
+                                                  action="{{ path('admin_evals_case_delete') }}"
+                                                  onsubmit="return confirm('Really delete eval case {{ case.id|e('js') }}? This permanently removes the NDJSON line.');">
+                                                <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_case_delete_' ~ type ~ '_' ~ case.id) }}">
+                                                <input type="hidden" name="type" value="{{ type }}">
+                                                <input type="hidden" name="case_id" value="{{ case.id }}">
+                                                <button type="submit" class="btn btn-sm btn-outline-danger">
+                                                    <i class="bi bi-trash3"></i> Delete case
+                                                </button>
+                                            </form>
+                                        </div>
+                                    {% endfor %}
+                                </div>
+                            {% endif %}
+                        </details>
+                    {% endfor %}
+
+                    <div class="small text-secondary">
+                        After deleting, run the affected eval type once so that the report matches the new set of cases.
+                    </div>
+                </div>
+            </div>
+
+            <div class="card bg-black border-secondary text-light shadow-sm mb-4">
+                <div class="card-body">
+                    <h5 class="text-info mb-3">
+                        <i class="bi bi-info-circle"></i> Which type is the right one?
+                    </h5>
+                    <div class="small text-secondary">
+                        <div class="mb-3">
+                            <strong class="text-light">Do you want to check whether the correct document is found?</strong><br>
+                            Then use <code>retrieval</code>.
+                        </div>
+                        <div class="mb-3">
+                            <strong class="text-light">Do you want to check which search terms are sent to the shop?</strong><br>
+                            Then use <code>shop_query</code>.
+                        </div>
+                        <div class="mb-3">
+                            <strong class="text-light">Does the question refer to the previous answer?</strong><br>
+                            Then use <code>followup</code> and fill in <code>History-JSON</code>.
+                        </div>
+                        <div>
+                            <strong class="text-light">Should RetrieX avoid making things up when the input is nonsense?</strong><br>
+                            Then use <code>answer_guard</code>.
+                        </div>
+                    </div>
+                </div>
+            </div>
+
+            <div class="card bg-black border-secondary text-light shadow-sm mb-4">
+                <div class="card-body">
+                    <h5 class="text-info mb-3">
+                        <i class="bi bi-braces"></i> Common assertions
+                    </h5>
+                    <div class="small text-secondary mb-2">Exact query:</div>
+                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
+  "expected_query": "testomat 808"
+}</code></pre>
+
+                    <div class="small text-secondary mb-2">Terms must be included:</div>
+                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
+  "must_include_terms": [
+    "testomat",
+    "808"
+  ]
+}</code></pre>
+
+                    <div class="small text-secondary mb-2">Terms must not be included:</div>
+                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
+  "must_not_include_terms": [
+    "indikator",
+    "300"
+  ]
+}</code></pre>
+
+                    <div class="small text-secondary mb-2">Document must be included:</div>
+                    <pre class="bg-dark border border-secondary rounded p-2 small text-light"><code>{
+  "min_results": 1,
+  "must_include_one_of_document_ids": [
+    "DOKUMENT-ID"
+  ]
+}</code></pre>
+                </div>
+            </div>
+
+            <div class="card bg-black border-secondary text-light shadow-sm mb-4">
+                <div class="card-body">
+                    <h5 class="text-info mb-3">
+                        <i class="bi bi-check2-square"></i> Check before saving
+                    </h5>
+                    <ul class="small text-secondary mb-0">
+                        <li>Does the case test exactly one purpose?</li>
+                        <li>Is the case ID unique and free of spaces?</li>
+                        <li>Is the prompt a real user question?</li>
+                        <li>Is Assert-JSON valid JSON?</li>
+                        <li>Is History filled in only for genuine follow-up questions?</li>
+                    </ul>
+                </div>
+            </div>
+
+            <div class="card bg-black border-secondary text-light shadow-sm">
+                <div class="card-body">
+                    <h5 class="text-info mb-3">
+                        <i class="bi bi-lightbulb"></i> Recommendation
+                    </h5>
+                    <p class="small text-secondary mb-0">
+                        A good eval case tests exactly one purpose. Prefer several small cases over one large, brittle case.
+                        If you are unsure, start with <code>expected_query</code> for shop and follow-up cases, or with
+                        <code>must_include_one_of_document_ids</code> for retrieval cases.
+                    </p>
+                </div>
+            </div>
+        </div>
+    </div>
+
+{% endblock %}
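
Review note: the help text in this template requires Assert-JSON to be a JSON object without comments or trailing commas, and History-JSON to be a JSON list. A minimal sketch of a matching client-side pre-check could look like the following. The function names are hypothetical and not part of the RetrieX codebase, and the rule that each history entry carries a string `prompt` is an assumption derived only from the placeholder shown in the form.

```javascript
// Hypothetical helpers (not part of RetrieX): pre-validate the two JSON textareas.

// Assert-JSON must parse as JSON and be a plain object ({ ... }).
// JSON.parse already rejects comments and trailing commas.
function validateAssertJson(text) {
  let value;
  try {
    value = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: "invalid JSON: " + err.message };
  }
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return { ok: false, error: "Assert-JSON must be an object" };
  }
  return { ok: true, value };
}

// History-JSON is optional. When filled in, it must be a list of objects.
// Assumption: every entry has a string "prompt" (as in the form placeholder).
function validateHistoryJson(text) {
  if (text.trim() === "") {
    return { ok: true, value: [] };
  }
  let value;
  try {
    value = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: "invalid JSON: " + err.message };
  }
  if (!Array.isArray(value)) {
    return { ok: false, error: "History-JSON must be a list" };
  }
  for (const [index, turn] of value.entries()) {
    const isObject = turn !== null && typeof turn === "object" && !Array.isArray(turn);
    if (!isObject || typeof turn.prompt !== "string") {
      return { ok: false, error: "history entry " + index + " needs a string prompt" };
    }
  }
  return { ok: true, value };
}
```

Such a check would only complement server-side validation in the controller, which remains the authoritative gate before a line is written to the NDJSON file.
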
--- a/templates/admin/evals/index.html.twig
+++ b/templates/admin/evals/index.html.twig
@@ -0,0 +1,547 @@
+{% extends 'admin/base.html.twig' %}
+
+{% block title %}RetrieX Eval Suite{% endblock %}
+
+{% block body %}
+
+    <div class="d-flex justify-content-between align-items-center mb-4 flex-wrap gap-2">
+        <div>
+            <h1 class="h3 mb-1">
+                <i class="bi bi-clipboard2-check"></i> RetrieX Eval Suite
+            </h1>
+            <div class="small text-secondary">
+                Check regressions for retrieval, shop query, follow-up, and answer guard directly in the admin area.
+            </div>
+        </div>
+
+        <div class="d-flex flex-wrap gap-2">
+            <a href="{{ path('admin_evals_case_new', {type: selected_type|default('retrieval')}) }}"
+               class="btn btn-sm btn-outline-warning">
+                <i class="bi bi-journal-plus"></i> Manage eval cases
+            </a>
+            <a href="{{ path('admin_model_config_list') }}"
+               class="btn btn-sm btn-outline-secondary">
+                Back to AI/LLM setup
+            </a>
+        </div>
+    </div>
+
+    {% for label in ['success', 'danger', 'warning', 'info'] %}
+        {% for message in app.flashes(label) %}
+            <div class="alert alert-{{ label }} shadow-sm">
+                {{ message }}
+            </div>
+        {% endfor %}
+    {% endfor %}
+
+
+
+    <div id="adminEvalRunOverlay"
+         class="position-fixed top-0 start-0 w-100 h-100 d-none"
+         style="background: rgba(0, 0, 0, .72); z-index: 1080;">
+        <div class="h-100 d-flex align-items-center justify-content-center px-3">
+            <div class="card bg-black border-warning text-light shadow-lg" style="max-width: 520px; width: 100%;">
+                <div class="card-body text-center py-5">
+                    <div class="spinner-border text-warning mb-3" role="status" aria-hidden="true"></div>
+                    <h5 class="text-warning mb-2" id="adminEvalRunOverlayLabel">Eval running ...</h5>
+                    <div class="small text-secondary">
+                        The regression tests are running. Please do not reload the page.
+                    </div>
+                </div>
+            </div>
+        </div>
+    </div>
+
+    <div class="row g-4 mb-4">
+        {% for item in overview %}
+            {% set report = item.report %}
+            {% set status = item.status %}
+            {% set badgeClass = status == 'green'
+                ? 'bg-success'
+                : (status == 'red' ? 'bg-danger' : 'bg-secondary')
+            %}
+            <div class="col-md-6 col-xl-3">
+                <div class="card bg-black border-secondary text-light h-100 shadow-sm">
+                    <div class="card-body">
+                        <div class="d-flex justify-content-between align-items-start gap-2 mb-2">
+                            <h5 class="text-info mb-0">{{ item.label }}</h5>
+                            <span class="badge {{ badgeClass }}">
+                                {% if status == 'green' %}
+                                    green
+                                {% elseif status == 'red' %}
+                                    red
+                                {% elseif status == 'empty' %}
+                                    empty
+                                {% else %}
+                                    not run
+                                {% endif %}
+                            </span>
+                        </div>
+
+                        <div class="small text-secondary mb-3">
+                            {{ item.case_count }} Cases
+                        </div>
+
+                        {% if report %}
+                            <div class="small">
+                                <div><strong>Total:</strong> {{ report.total|default(0) }}</div>
+                                <div><strong>Passed:</strong> {{ report.passed|default(0) }}</div>
+                                <div><strong>Failed:</strong> {{ report.failed|default(0) }}</div>
+                                <div class="text-secondary mt-2">
+                                    {{ report.generated_at|default('') }}
+                                </div>
+                            </div>
+                        {% else %}
+                            <div class="small text-secondary">
+                                No admin report exists for this type yet.
+                            </div>
+                        {% endif %}
+
+                        <div class="d-flex flex-wrap gap-2 mt-3">
+                            <form method="post"
+                                  action="{{ path('admin_evals_run') }}"
+                                  class="d-inline js-admin-eval-run-form"
+                                  data-eval-type-label="{{ item.label|e('html_attr') }}">
+                                <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_run') }}">
+                                <input type="hidden" name="type" value="{{ item.type }}">
+                                <button type="submit" class="btn btn-sm btn-outline-warning js-admin-eval-run-button">
+                                    <span class="js-admin-eval-button-label">Run</span>
+                                    <span class="spinner-border spinner-border-sm ms-2 d-none js-admin-eval-button-spinner"
+                                          role="status"
+                                          aria-hidden="true"></span>
+                                </button>
+                            </form>
+
+                            <a class="btn btn-sm btn-outline-info"
+                               href="{{ path('admin_evals_index', {type: item.type}) }}">
+                                Details
+                            </a>
+                        </div>
+                    </div>
+                </div>
+            </div>
+        {% endfor %}
+    </div>
+
+    <div class="row g-4 mb-4">
+        <div class="col-xl-5">
+            <div class="card bg-black border-secondary text-light h-100 shadow-sm">
+                <div class="card-body">
+                    <h5 class="text-warning mb-3">
+                        <i class="bi bi-play-circle"></i> Run eval
+                    </h5>
+
+                    <form method="post"
+                          action="{{ path('admin_evals_run') }}"
+                          class="js-admin-eval-run-form"
+                          data-eval-type-label="Selected eval">
+                        <input type="hidden" name="_token" value="{{ csrf_token('admin_eval_run') }}">
+
+                        <div class="mb-3">
+                            <label class="form-label">Eval type</label>
+                            <select name="type" class="form-select bg-dark text-light border-secondary js-admin-eval-type-select">
+                                {% for type, label in types %}
+                                    <option value="{{ type }}" {% if type == selected_type %}selected{% endif %}>
+                                        {{ label }}
+                                    </option>
+                                {% endfor %}
+                            </select>
+                            <div class="form-text text-secondary">
+                                Without a case ID, the entire type is run.
+                            </div>
+                        </div>
+
+                        <div class="mb-3">
+                            <label class="form-label">Optional: Case</label>
+                            <select name="case_id"
+                                    class="form-select bg-dark text-light border-secondary js-admin-eval-case-select">
+                                <option value="">All cases of the selected type</option>
+                                {% for type, cases in cases_by_type %}
+                                    {% for case in cases %}
+                                        <option value="{{ case.id }}"
+                                                data-eval-type="{{ type }}"
+                                                {% if type != selected_type %}hidden disabled{% endif %}>
+                                            {{ case.id }} — {{ case.prompt }}
+                                        </option>
+                                    {% endfor %}
+                                {% endfor %}
+                            </select>
+                            <div class="form-text text-secondary">
+                                The case list is filtered to match the eval type. Leave empty to run all cases of the type.
+                            </div>
+                        </div>
+
+                        <button type="submit" class="btn btn-outline-warning js-admin-eval-run-button">
+                            <span class="js-admin-eval-button-label">Start eval</span>
+                            <span class="spinner-border spinner-border-sm ms-2 d-none js-admin-eval-button-spinner"
+                                  role="status"
+                                  aria-hidden="true"></span>
+                        </button>
+                    </form>
+                </div>
+            </div>
+        </div>
+
+        <div class="col-xl-7">
+            <div class="card bg-black border-secondary text-light h-100 shadow-sm">
+                <div class="card-body">
+                    <h5 class="text-info mb-3">
+                        <i class="bi bi-terminal"></i> CLI reference
+                    </h5>
+
+                    <p class="small text-secondary mb-3">
+                        Admin runs write type-specific reports to
+                        <code>tests/evals/reports/&lt;type&gt;-last-run.json</code>
+                        and additionally update the familiar <code>last-run.json</code>.
+                    </p>
+
+                    <div class="small">
+                        {% for type, label in types %}
+                            <div class="mb-2">
+                                <span class="text-info">{{ label }}</span><br>
+                                <code>php bin/console mto:agent:eval:run {{ type }}</code>
+                            </div>
+                        {% endfor %}
+                    </div>
+
+                    {% if last_report %}
+                        <hr class="border-secondary">
+                        <div class="small text-secondary">
+                            Last generic report:
+                            <span class="text-light">{{ last_report.type|default('unknown') }}</span>,
+                            {{ last_report.passed|default(0) }}/{{ last_report.total|default(0) }} passed,
+                            {{ last_report.generated_at|default('') }}
+                        </div>
+                    {% endif %}
+                </div>
+            </div>
+        </div>
+    </div>
+
+    <div class="card bg-black border-secondary text-light shadow-sm">
+        <div class="card-body">
+            <div class="d-flex justify-content-between align-items-center flex-wrap gap-2 mb-3">
+                <h5 class="text-warning mb-0">
+                    <i class="bi bi-list-check"></i>
+                    Report-Details: {{ types[selected_type]|default(selected_type) }}
+                </h5>
+
+                <div class="btn-group btn-group-sm" role="group" aria-label="Eval report types">
+                    {% for type, label in types %}
+                        <a class="btn {{ type == selected_type ? 'btn-info' : 'btn-outline-info' }}"
+                           href="{{ path('admin_evals_index', {type: type}) }}">
+                            {{ label }}
+                        </a>
+                    {% endfor %}
+                </div>
+            </div>
+
+            {% if selected_report %}
+                {% set selectedFailed = selected_report.failed|default(0) %}
+                <div class="row g-3 mb-3 small">
+                    <div class="col-md-3">
+                        <div class="border border-secondary rounded p-3 h-100">
+                            <div class="text-secondary">Total</div>
+                            <div class="h5 mb-0">{{ selected_report.total|default(0) }}</div>
+                        </div>
+                    </div>
+                    <div class="col-md-3">
+                        <div class="border border-secondary rounded p-3 h-100">
+                            <div class="text-secondary">Passed</div>
+                            <div class="h5 text-success mb-0">{{ selected_report.passed|default(0) }}</div>
+                        </div>
+                    </div>
+                    <div class="col-md-3">
+                        <div class="border border-secondary rounded p-3 h-100">
+                            <div class="text-secondary">Failed</div>
+                            <div class="h5 {{ selectedFailed == 0 ? 'text-success' : 'text-danger' }} mb-0">
+                                {{ selectedFailed }}
+                            </div>
+                        </div>
+                    </div>
+                    <div class="col-md-3">
+                        <div class="border border-secondary rounded p-3 h-100">
+                            <div class="text-secondary">Generated</div>
+                            <div class="small text-light">{{ selected_report.generated_at|default('') }}</div>
+                        </div>
+                    </div>
+                </div>
+
+                <div class="table-responsive">
+                    <table class="table table-dark table-striped table-hover align-middle mb-0">
+                        <thead class="table-secondary text-dark">
+                        <tr>
+                            <th>Status</th>
+                            <th>Case</th>
+                            <th>Duration</th>
+                            <th>Failures / Details</th>
+                        </tr>
+                        </thead>
+                        <tbody>
+                        {% for result in selected_report.results|default([]) %}
+                            <tr>
+                                <td style="width: 110px;">
+                                    {% if result.passed|default(false) %}
+                                        <span class="badge bg-success">PASS</span>
+                                    {% else %}
+                                        <span class="badge bg-danger">FAIL</span>
+                                    {% endif %}
+                                </td>
+                                <td style="min-width: 260px;">
+                                    <code>{{ result.case_id|default('') }}</code>
+                                    <div class="small text-secondary mb-2">{{ result.type|default('') }}</div>
+
+                                    {% set casePrompt = result.prompt|default(result.details.prompt|default('')) %}
+                                    {% if casePrompt %}
+                                        <div class="small mb-2">
+                                            <span class="text-secondary">Prompt:</span><br>
+                                            <span class="text-light">{{ casePrompt }}</span>
+                                        </div>
+                                    {% endif %}
+
+                                    <div class="mt-2">
+                                        <a href="{{ path('admin_evals_case_new', {source_type: selected_type, source_case_id: result.case_id|default('')}) }}"
+                                           class="btn btn-sm btn-outline-warning">
+                                            <i class="bi bi-journal-plus"></i> Prepare as new case
+                                        </a>
+                                    </div>
+
+                                    {% set historyRows = result.details.history|default([]) %}
+                                    {% if historyRows is not empty %}
+                                        <details class="small">
+                                            <summary class="text-info" style="cursor:pointer;">
+                                                Show context / history
+                                            </summary>
+                                            <div class="mt-2 ps-2 border-start border-secondary">
+                                                {% for turn in historyRows %}
+                                                    <div class="mb-2">
+                                                        <div class="text-secondary">Previous prompt:</div>
+                                                        <div class="text-light">{{ turn.prompt|default('') }}</div>
+                                                        {% if turn.answer_preview|default('') %}
+                                                            <div class="text-secondary mt-1">Answer excerpt:</div>
+                                                            <div class="text-secondary">{{ turn.answer_preview }}</div>
+                                                        {% endif %}
+                                                    </div>
+                                                {% endfor %}
+                                            </div>
+                                        </details>
+                                    {% endif %}
+                                </td>
+                                <td style="width: 120px;">
+                                    {{ result.duration_ms|default(0) }} ms
+                                </td>
+                                <td>
+                                    {% if result.failures|default([]) is not empty %}
+                                        <ul class="mb-2 small text-danger">
+                                            {% for failure in result.failures %}
+                                                <li>{{ failure }}</li>
+                                            {% endfor %}
+                                        </ul>
+                                    {% else %}
+                                        <div class="small text-success mb-2">No failures.</div>
+                                    {% endif %}
+
+                                    {% set documentRefs = result.details.document_refs|default([]) %}
+                                    {% if documentRefs is not empty %}
+                                        <div class="mb-2">
+                                            <div class="small text-secondary mb-1">Documents found</div>
+                                            <div class="table-responsive">
+                                                <table class="table table-dark table-sm table-bordered border-secondary align-middle mb-2">
+                                                    <thead>
+                                                    <tr class="small text-secondary">
+                                                        <th style="width: 90px;">Ranks</th>
+                                                        <th>Title / file</th>
+                                                        <th style="width: 170px;">Doc-ID</th>
+                                                        <th style="width: 220px;">Chunks</th>
+                                                    </tr>
+                                                    </thead>
+                                                    <tbody>
+                                                    {% for doc in documentRefs %}
+                                                        <tr>
+                                                            <td class="small">{{ doc.ranks|default([])|join(', ') }}</td>
+                                                            <td>
+                                                                <div class="fw-semibold">{{ doc.title|default('Untitled') }}</div>
+                                                                {% if doc.file_path|default('') %}
+                                                                    <div class="small text-secondary" style="word-break: break-all;">
+                                                                        {{ doc.file_path }}
+                                                                    </div>
+                                                                {% endif %}
+                                                                {% if doc.version_number|default('') %}
+                                                                    <div class="small text-secondary">Version: {{ doc.version_number }}</div>
+                                                                {% endif %}
+                                                            </td>
+                                                            <td><code class="small">{{ doc.id|default('') }}</code></td>
+                                                            <td class="small" style="word-break: break-all;">
+                                                                {% for chunkId in doc.chunk_ids|default([]) %}
+                                                                    <code>{{ chunkId }}</code>{% if not loop.last %}<br>{% endif %}
+                                                                {% endfor %}
+                                                            </td>
+                                                        </tr>
+                                                    {% endfor %}
+                                                    </tbody>
+                                                </table>
+                                            </div>
+                                        </div>
+                                    {% endif %}
+
+                                    {% set resultRows = result.details.result_rows|default([]) %}
+                                    {% if resultRows is not empty %}
+                                        <details class="mb-2">
+                                            <summary class="small text-info" style="cursor:pointer;">
+                                                Treffer / Chunks anzeigen
+                                            </summary>
+                                            <div class="table-responsive mt-2">
+                                                <table class="table table-dark table-sm table-bordered border-secondary align-middle mb-0">
+                                                    <thead>
+                                                    <tr class="small text-secondary">
+                                                        <th style="width: 60px;">Rank</th>
+                                                        <th>Titel / Datei</th>
+                                                        <th style="width: 180px;">Chunk</th>
+                                                        <th>Preview</th>
+                                                    </tr>
+                                                    </thead>
+                                                    <tbody>
+                                                    {% for row in resultRows %}
+                                                        <tr>
+                                                            <td>{{ row.rank|default('') }}</td>
+                                                            <td>
+                                                                <div class="fw-semibold">{{ row.document_title|default('Ohne Titel') }}</div>
+                                                                {% if row.file_path|default('') %}
+                                                                    <div class="small text-secondary" style="word-break: break-all;">{{ row.file_path }}</div>
+                                                                {% endif %}
+                                                                <div class="small text-secondary">Doc-ID: <code>{{ row.document_id|default('') }}</code></div>
+                                                            </td>
+                                                            <td class="small" style="word-break: break-all;">
+                                                                <code>{{ row.chunk_id|default('') }}</code>
+                                                                {% if row.chunk_index is defined and row.chunk_index is not same as(null) %}
+                                                                    <div class="text-secondary">Index: {{ row.chunk_index }}</div>
+                                                                {% endif %}
+                                                            </td>
+                                                            <td class="small text-secondary">{{ row.text_preview|default('') }}</td>
+                                                        </tr>
+                                                    {% endfor %}
+                                                    </tbody>
+                                                </table>
+                                            </div>
+                                        </details>
+                                    {% endif %}
+
+                                    <details>
+                                        <summary class="small text-info" style="cursor:pointer;">
+                                            JSON-Details anzeigen
+                                        </summary>
+                                        <pre class="bg-dark border border-secondary rounded p-2 mt-2 small text-light" style="white-space: pre-wrap; max-height: 260px; overflow: auto;">{{ result.details|default({})|json_encode(constant('JSON_PRETTY_PRINT') b-or constant('JSON_UNESCAPED_UNICODE') b-or constant('JSON_UNESCAPED_SLASHES')) }}</pre>
+                                    </details>
+                                </td>
+                            </tr>
+                        {% else %}
+                            <tr>
+                                <td colspan="4" class="text-center text-secondary py-4">
+                                    Dieser Report enthält keine Resultate.
+                                </td>
+                            </tr>
+                        {% endfor %}
+                        </tbody>
+                    </table>
+                </div>
+            {% else %}
+                <div class="alert alert-secondary mb-0">
+                    Für {{ types[selected_type]|default(selected_type) }} liegt noch kein typspezifischer Admin-Report vor.
+                    Starte den Eval oben oder per CLI.
+                </div>
+            {% endif %}
+        </div>
+    </div>
+
+
+    <script>
+        document.addEventListener('DOMContentLoaded', function () {
+            const forms = Array.from(document.querySelectorAll('.js-admin-eval-run-form'));
+            const overlay = document.getElementById('adminEvalRunOverlay');
+            const overlayLabel = document.getElementById('adminEvalRunOverlayLabel');
+
+            function resolveEvalLabel(form) {
+                const select = form.querySelector('.js-admin-eval-type-select');
+                if (select && select.selectedOptions.length > 0) {
+                    return select.selectedOptions[0].textContent.trim();
+                }
+
+                return (form.dataset.evalTypeLabel || 'Eval').trim();
+            }
+
+            function syncCaseSelect(form) {
+                const typeSelect = form.querySelector('.js-admin-eval-type-select');
+                const caseSelect = form.querySelector('.js-admin-eval-case-select');
+
+                if (!typeSelect || !caseSelect) {
+                    return;
+                }
+
+                const selectedType = typeSelect.value;
+
+                Array.from(caseSelect.options).forEach(function (option) {
+                    if (option.value === '') {
+                        option.hidden = false;
+                        option.disabled = false;
+                        return;
+                    }
+
+                    const matchesType = option.dataset.evalType === selectedType;
+                    option.hidden = !matchesType;
+                    option.disabled = !matchesType;
+
+                    if (!matchesType && option.selected) {
+                        caseSelect.value = '';
+                    }
+                });
+            }
+
+            function setAllRunButtonsDisabled() {
+                document.querySelectorAll('.js-admin-eval-run-button').forEach(function (button) {
+                    button.disabled = true;
+                    button.classList.add('disabled');
+                });
+            }
+
+            forms.forEach(function (form) {
+                syncCaseSelect(form);
+
+                const typeSelect = form.querySelector('.js-admin-eval-type-select');
+                if (typeSelect) {
+                    typeSelect.addEventListener('change', function () {
+                        syncCaseSelect(form);
+                    });
+                }
+
+                form.addEventListener('submit', function (event) {
+                    const button = event.submitter && event.submitter.classList.contains('js-admin-eval-run-button')
+                        ? event.submitter
+                        : form.querySelector('.js-admin-eval-run-button');
+                    const label = resolveEvalLabel(form);
+
+                    if (overlay && overlayLabel) {
+                        overlayLabel.textContent = label + ' läuft ...';
+                        overlay.classList.remove('d-none');
+                    }
+
+                    if (button) {
+                        const buttonLabel = button.querySelector('.js-admin-eval-button-label');
+                        const spinner = button.querySelector('.js-admin-eval-button-spinner');
+
+                        if (buttonLabel) {
+                            buttonLabel.textContent = 'Läuft ...';
+                        }
+
+                        if (spinner) {
+                            spinner.classList.remove('d-none');
+                        }
+                    }
+
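+                    // Disable the buttons only after the browser has collected the form data:
+                    // a submit button that is disabled during the submit event is left out of the request.
+                    window.setTimeout(setAllRunButtonsDisabled, 0);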
+                    document.body.style.cursor = 'progress';
+                });
+            });
+        });
+    </script>
+
+{% endblock %}
--- a/templates/admin/model_config/list.html.twig
+++ b/templates/admin/model_config/list.html.twig
@@ -4,15 +4,24 @@

 {% block body %}

-    <div class="d-flex justify-content-between align-items-center mb-4">
+    <div class="d-flex justify-content-between align-items-center mb-4 flex-wrap gap-2">
        <h1 class="h3 mb-0"><i class="bi bi-rocket-takeoff-fill"></i> KI Modell-Generierung</h1>

-        {% if is_granted('ROLE_SUPER_ADMIN') %}
-            <a href="{{ path('admin_model_config_create') }}"
-               class="btn btn-sm btn-outline-info">
-                Neue Konfiguration
-            </a>
-        {% endif %}
+        <div class="d-flex flex-wrap gap-2">
+            {% if is_granted('ROLE_KNOWLEDGE_ADMIN') %}
+                <a href="{{ path('admin_evals_index') }}"
+                   class="btn btn-sm btn-outline-warning">
+                    Eval Suite
+                </a>
+            {% endif %}
+
+            {% if is_granted('ROLE_SUPER_ADMIN') %}
+                <a href="{{ path('admin_model_config_create') }}"
+                   class="btn btn-sm btn-outline-info">
+                    Neue Konfiguration
+                </a>
+            {% endif %}
+        </div>
    </div>

    {# ========================================================= #}
--- a/tests/evals/cases/answer_guard.ndjson
+++ b/tests/evals/cases/answer_guard.ndjson
@@ -0,0 +1,4 @@
+{"id":"answer_guard_noise_no_evidence_001","type":"answer_guard","prompt":"dsgfsdgfsdgf","assert":{"max_results":0}}
+{"id":"answer_guard_mythical_medium_no_direct_evidence_001","type":"answer_guard","prompt":"gibt es einen testomat für drachenblut","assert":{"must_not_include_terms":["drachenblut"]}}
+{"id":"answer_guard_lunar_water_no_direct_evidence_001","type":"answer_guard","prompt":"welcher testomat misst mondwasser im vakuum","assert":{"must_not_include_terms":["mondwasser","vakuum"]}}
+{"id":"answer_guard_delivery_not_sdb_001","type":"answer_guard","prompt":"lieferbedingungen versand testomat","assert":{"min_results":1,"must_include_one_of_document_ids":["26ddf03d-9108-4a65-aa0e-a5df7613fa77"],"must_not_include_document_ids":["7166592f-85f2-425c-997b-73e323ae184d"]}}
--- a/tests/evals/cases/followup.ndjson
+++ b/tests/evals/cases/followup.ndjson
@@ -0,0 +1,4 @@
+{"id":"followup_indicator_price_001","type":"followup","prompt":"was kostet der indikator","history":[{"prompt":"Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?","answer":"Der niedrigste Grenzwert für die Wasserhärte beträgt 0,02 °dH. Dieser Wert wird vom Testomat 808 gemessen."},{"prompt":"mit welchem indikator","answer":"Der niedrigste messbare Grenzwert für Wasserhärte mit dem Testomat 808 wird mit dem Indikatortyp 300 erreicht."}],"assert":{"expected_query":"testomat 808 300 indikator","must_include_terms":["testomat","808","300","indikator"],"must_not_include_terms":["300 s","301","302","303","testomat 2000"]}}
+{"id":"followup_main_device_price_001","type":"followup","prompt":"und was kostet das gerät selber","history":[{"prompt":"was kostet der indikator","answer":"Shop-Suche abgeschlossen. Gesendete Suchquery: testomat 808 300 indikator. Testomat® 808 Indikator 300 500 ml, Produkt-Nummer 141001. Testomat® 808 Indikator 300 2 x 100 ml, Produkt-Nummer 140001. Der zugehörige Testomat ist Testomat 808."}],"assert":{"expected_query":"testomat 808","must_include_terms":["testomat","808"],"must_not_include_terms":["indikator","300","141001","140001"]}}
+{"id":"followup_weak_shop_information_anchor_001","type":"followup","prompt":"suche im shop nach der information","history":[{"prompt":"welche grenzwerte kann der testomat 2000 thcl messen","answer":"Der relevante Produktanker ist Testomat 2000 THCL. Das Gerät ist für Chlorüberwachung / freies Chlor relevant."}],"assert":{"expected_query":"testomat 2000 thcl","must_include_terms":["testomat","2000","thcl"],"must_not_equal_query":"information","must_not_include_terms":["information"]}}
+{"id":"followup_product_links_split_001","type":"followup","prompt":"gebe mir links zu den produkten aus dem shop","history":[{"prompt":"gerät zur messung Prozesswasser in medizinischen Geräten","answer":"Geeignete Produktanker sind Testomat 2000 Self Clean, Testomat 2000 CAL und Testomat 808."}],"assert":{"expected_individual_queries":["testomat 2000 self clean","testomat 2000 cal","testomat 808"],"expected_individual_queries_exact":true,"min_individual_queries":3,"max_individual_queries":3,"must_not_include_terms":["links zu aus"]}}
--- a/tests/evals/cases/retrieval.ndjson
+++ b/tests/evals/cases/retrieval.ndjson
@@ -16,4 +16,4 @@
 {"id":"retrieval_negative_003","type":"retrieval","prompt":"testomat 2000 self clean reinigungsloesung","assert":{"min_results":1,"must_include_one_of_document_ids":["51589532-a1a1-46e0-94b2-a139dce78543","b8c3343b-931e-4994-9d53-a2130efc846f"],"must_include_any_terms":["reinigungslösung","self clean"],"must_not_include_document_ids":["26129c01-c09f-4c71-9c80-7ddffb6c77fb"]}}
 {"id":"retrieval_short_001","type":"retrieval","prompt":"evo th","assert":{"min_results":1,"must_include_one_of_document_ids":["eb91c1be-4546-4ed5-8b01-f075519d675b","74fdad85-5e4e-4f08-8d95-402f3180ed55"],"must_include_any_terms":["evo"]}}
 {"id":"retrieval_short_002","type":"retrieval","prompt":"808","assert":{"min_results":1,"must_include_one_of_document_ids":["26129c01-c09f-4c71-9c80-7ddffb6c77fb"],"must_include_any_terms":["808"]}}
-{"id":"retrieval_noise_001","type":"retrieval","prompt":"dsgfsdgfsdgf","assert":{"max_results":0}}
+{"id":"retrieval_notfound_doc","type":"retrieval","prompt":"hdfghdfghdfhg","assert":{"min_results":0}}
--- a/tests/evals/cases/shop_query.ndjson
+++ b/tests/evals/cases/shop_query.ndjson
@@ -0,0 +1,5 @@
+{"id":"shop_query_indicator_exact_001","type":"shop_query","prompt":"was kostet der Testomat 808 Indikator 300","assert":{"must_include_terms":["testomat","808","300","indikator"],"must_not_include_terms":["300 s","301","302","303","gerät selber"]}}
+{"id":"shop_query_brewing_water_cleanup_001","type":"shop_query","prompt":"ich möchte für brauerei das brauwasser messen","assert":{"expected_query":"brauerei brauwasser","must_include_terms":["brauerei","brauwasser"],"must_not_include_terms":["möchte","messen","think"]}}
+{"id":"shop_query_swimming_pool_typo_001","type":"shop_query","prompt":"ich würde gern chlor im schwinnbad messen","assert":{"expected_query":"chlor schwimmbad","must_include_terms":["chlor","schwimmbad"],"must_not_include_terms":["schwinnbad","messen"]}}
+{"id":"shop_query_lab_cl_acronym_001","type":"shop_query","prompt":"Zeige mir die Preise zu Testomat LAB CL.","assert":{"expected_query":"testomat lab cl","must_include_terms":["testomat","lab","cl"],"must_not_equal_query":"testomat"}}
+{"id":"shop_query_sio2_anchor_001","type":"shop_query","prompt":"suche gerät kühlsysteme Silikatüberwachung","assert":{"expected_query":"testomat 808 sio2","must_include_terms":["testomat","808","sio2"],"must_not_include_terms":["kühlsysteme","silikatüberwachung"]}}
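Reviewer sketch, not part of the change set: the `assert` keys used in the `shop_query` and `followup` cases above could be checked against a generated search query roughly as follows. The semantics are assumptions, since the diff does not show the evaluator: `expected_query` is treated as an exact match after lowercasing and whitespace normalization, `must_not_equal_query` as its negation, and the term lists as plain substring checks. `normalize` and `checkQueryAssertions` are hypothetical helper names.

```javascript
// Hypothetical helper: lowercase, collapse whitespace runs, trim.
function normalize(text) {
    return String(text).toLowerCase().replace(/\s+/g, ' ').trim();
}

// Hypothetical checker; returns a list of failed assertion keys (empty list = pass).
// Assumed semantics only, the real RetrieX evaluator may differ.
function checkQueryAssertions(query, assert) {
    const normalized = normalize(query);
    const failures = [];

    if (assert.expected_query !== undefined
        && normalized !== normalize(assert.expected_query)) {
        failures.push('expected_query');
    }

    if (assert.must_not_equal_query !== undefined
        && normalized === normalize(assert.must_not_equal_query)) {
        failures.push('must_not_equal_query');
    }

    // Every listed term has to occur somewhere in the query (substring match).
    (assert.must_include_terms || []).forEach(function (term) {
        if (!normalized.includes(normalize(term))) {
            failures.push('must_include_terms: ' + term);
        }
    });

    // None of the listed terms may occur in the query (substring match).
    (assert.must_not_include_terms || []).forEach(function (term) {
        if (normalized.includes(normalize(term))) {
            failures.push('must_not_include_terms: ' + term);
        }
    });

    return failures;
}

// One line taken verbatim from tests/evals/cases/shop_query.ndjson.
const line = '{"id":"shop_query_lab_cl_acronym_001","type":"shop_query","prompt":"Zeige mir die Preise zu Testomat LAB CL.","assert":{"expected_query":"testomat lab cl","must_include_terms":["testomat","lab","cl"],"must_not_equal_query":"testomat"}}';
const evalCase = JSON.parse(line);

console.log(checkQueryAssertions('testomat lab cl', evalCase.assert)); // []
console.log(checkQueryAssertions('testomat', evalCase.assert));
```

Substring matching keeps multi-word terms such as `"300 s"` simple to check, but it also means `"300"` would match inside `"3000"`. A token-based comparison would be stricter if that matters for a case.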
Author	SHA1	Message	Date
team 1	64d1ec71e8	p101d	2026-05-12 11:53:36 +02:00
team 1	3f914c1efd	p101b	2026-05-12 11:26:05 +02:00
team 1	6e2ca15e97	p101a	2026-05-12 11:08:34 +02:00
team 1	6dced1c4df	p101	2026-05-12 10:56:50 +02:00
team 1	feaec9bbaf	p100c	2026-05-12 09:16:09 +02:00
team 1	0d55c0a439	p100	2026-05-12 08:57:57 +02:00
team 1	03d4a1d7c3	p99c	2026-05-12 08:38:16 +02:00
team 1	3d0092b753	p99	2026-05-12 08:25:59 +02:00
team 1	e072a8e15e	p98	2026-05-12 07:53:49 +02:00
team 1	aa80acb10f	add multi model	2026-05-11 20:56:57 +02:00