p99c

2026-05-12 08:38:16 +02:00
parent 3d0092b753
commit 03d4a1d7c3
5 changed files with 190 additions and 9 deletions
--- a/patch_history/RETRIEX_PATCH_99B_EVAL_SUITE_ALIGNMENT_README.md
+++ b/patch_history/RETRIEX_PATCH_99B_EVAL_SUITE_ALIGNMENT_README.md
@@ -0,0 +1,85 @@
+# RetrieX Patch p99b - Eval Suite Alignment
+
+## Goal
+
+p99 successfully activated the new eval suite, but three new cases showed red signals after the first run. p99b separates false-positive assertions from two real robustness gaps, without reworking the existing retrieval baseline or the shop/follow-up architecture.
+
+## Starting point
+
+After p99:
+
+- `mto:agent:config:validate`: OK
+- `mto:agent:eval:run retrieval`: 19/19 OK
+- `mto:agent:eval:run shop_query`: 4/5 OK
+- `mto:agent:eval:run followup`: 3/4 OK
+- `mto:agent:eval:run answer_guard`: 3/4 OK
+
+Red cases:
+
+- `shop_query_sio2_anchor_001`: the normalized shop query could shrink down to `gerät`.
+- `followup_main_device_price_001`: the main-device follow-up could stay stuck on the previous indicator query `testomat 808 indikator 300`.
+- `answer_guard_delivery_not_sdb_001`: the assertion was too strict, because the term `Sicherheitsdatenblatt` (safety data sheet) appearing in the retrieval text is not sufficient evidence of a failure as long as the wrong document does not dominate.
+
+## Changes
+
+### 1. Protect SiO2/silicate as current input
+
+`config/retriex/genre.yaml`
+
+Adds the following terms to `shop_query_runtime.current_input_preservation_terms`:
+
+- `silikat`
+- `silikatüberwachung`
+- `silikatueberwachung`
+- `sio2`
+- `si o2`
+- `kieselsäure`
+- `kieselsaeure`
+
+As a result, a normalized standalone shop query such as `suche gerät kühlsysteme Silikatüberwachung` no longer loses its domain-specific measurement parameter before the generic device-anchor rule `testomat 808 sio2` can apply.
+
+### 2. Main-device follow-up may strip accessory remnants
+
+`src/Agent/AgentRunner.php`
+
+`guardMainDeviceReferentialShopQueryWithHistoryModelAnchor()` was adjusted so that a shop query like `testomat 808 indikator 300`, given a prompt like `und was kostet das gerät selber`, is no longer accepted solely because it already contains a model anchor.
+
+The guard now checks whether accessory or code residual tokens remain after the model anchor. If so, the query is reduced to the pure model anchor from the history, e.g. `testomat 808`.
+
+### 3. Make the answer-guard case less brittle
+
+`tests/evals/cases/answer_guard.ndjson`
+
+The case `answer_guard_delivery_not_sdb_001` still checks:
+
+- the matching delivery/shipping document must be included
+- the specific SDB (safety data sheet) document must not be included
+
+The overly broad text assertion on the term `sicherheitsdatenblatt` was removed, because it can also match legitimate side notes and hint texts.
+
+## Deliberately not changed
+
+- No retrieval weights
+- No Shopware search
+- No prompt texts
+- No model parameters
+- No new product-specific special-case logic
+- No changes to the p98 retrieval eval cases
+
+## Expected checks
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Expected:
+
+- Config valid
+- Retrieval 19/19
+- Shop query 5/5
+- Follow-up 4/4
+- Answer guard 4/4
--- a/patch_history/RETRIEX_PATCH_99C_MAIN_DEVICE_FOLLOWUP_EVAL_ALIGNMENT_README.md
+++ b/patch_history/RETRIEX_PATCH_99C_MAIN_DEVICE_FOLLOWUP_EVAL_ALIGNMENT_README.md
@@ -0,0 +1,60 @@
+# RETRIEX PATCH 99C - Main Device Follow-up Eval Alignment
+
+Status: patch-only follow-up for p99/p99b.
+
+## Goal
+
+Keep the new p99 follow-up eval suite aligned with the already confirmed manual
+reference flow:
+
+1. lowest water-hardness threshold
+2. indicator type
+3. indicator price
+4. main device price
+
+The main-device follow-up `und was kostet das gerät selber` must resolve back to
+the main-device anchor (`testomat 808`) and must not keep accessory remnants such
+as `indikator` or the exact indicator code `300`.
+
+## Root cause
+
+p99b added a residual accessory guard, but the main-device history-anchor guard
+returned early for non-generic shop queries before the residual check could run.
+A query like `testomat 808 indikator 300` contains digits, so it was not treated
+as a generic main-device query and stayed unchanged.
+
+## Change
+
+`AgentRunner::guardMainDeviceReferentialShopQueryWithHistoryModelAnchor()` now:
+
+1. detects the main-device referential prompt,
+2. extracts the latest history model anchor,
+3. if the generated shop query already contains that model anchor, checks for
+   accessory/code residuals,
+4. reduces the query to the pure model anchor when such residuals are present.
+
+This keeps explicit non-generic product queries untouched unless they contain the
+current history model anchor plus accessory leftovers in a main-device follow-up.
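+
+The four steps above can be sketched roughly as follows. This is a minimal,
+standalone illustration and not the actual `AgentRunner` code: the function
+names, the prompt pattern, the accessory term list, and the pre-extracted
+`$historyAnchor` argument are all assumptions made for this example.
+
+```php
+<?php
+declare(strict_types=1);
+
+// Illustrative sketch only; every name below is hypothetical.
+
+// Assumed accessory vocabulary; the real guard may use a different source.
+const ACCESSORY_TERMS = ['indikator', 'indicator', 'zubehör', 'zubehoer'];
+
+// Step 1: does the prompt ask for the main device itself? (assumed pattern)
+function isMainDeviceReferentialPrompt(string $prompt): bool
+{
+    return preg_match('/\bger(?:ä|ae)t\s+(?:selber|selbst)\b/iu', $prompt) === 1;
+}
+
+// A residual token counts as accessory/code if it is a known accessory term
+// or a purely numeric code such as "300".
+function isAccessoryOrCodeToken(string $token): bool
+{
+    return in_array($token, ACCESSORY_TERMS, true)
+        || preg_match('/^\d+$/', $token) === 1;
+}
+
+function guardMainDeviceShopQuery(string $prompt, string $shopQuery, ?string $historyAnchor): string
+{
+    if (!isMainDeviceReferentialPrompt($prompt)) {
+        return $shopQuery;
+    }
+
+    // Step 2: the latest history model anchor is assumed to be pre-extracted.
+    $anchor = mb_strtolower(trim((string) $historyAnchor));
+    if ($anchor === '') {
+        return $shopQuery;
+    }
+
+    // Step 3: no early return for "non-generic" queries here; the residual
+    // check runs whenever the query contains the anchor as whole tokens.
+    $query = mb_strtolower(trim($shopQuery));
+    $pattern = '/(?<!\S)' . preg_quote($anchor, '/') . '(?!\S)(.*)$/u';
+    if (preg_match($pattern, $query, $match) !== 1) {
+        return $shopQuery; // other explicit product queries stay untouched
+    }
+    $residualTokens = preg_split('/\s+/u', $match[1], -1, PREG_SPLIT_NO_EMPTY) ?: [];
+
+    // Step 4: reduce to the pure model anchor if accessory leftovers remain.
+    foreach ($residualTokens as $token) {
+        if (isAccessoryOrCodeToken($token)) {
+            return $anchor;
+        }
+    }
+
+    return $shopQuery;
+}
+```
+
+Under these assumptions, calling `guardMainDeviceShopQuery()` with the prompt
+`und was kostet das gerät selber`, the shop query `testomat 808 indikator 300`,
+and the anchor `testomat 808` returns `testomat 808`, while a query that does
+not contain the anchor is returned unchanged.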
+
+## Expected eval result
+
+```bash
+php bin/console mto:agent:config:validate
+php bin/console mto:agent:eval:run retrieval
+php bin/console mto:agent:eval:run shop_query
+php bin/console mto:agent:eval:run followup
+php bin/console mto:agent:eval:run answer_guard
+```
+
+Expected:
+
+- retrieval: 19/19
+- shop_query: 5/5
+- followup: 4/4
+- answer_guard: 4/4
+
+## Production logic impact
+
+Minimal. The patch only changes the already existing main-device follow-up guard
+for prompts asking for the main device itself. It does not modify retrieval,
+ranking, prompt templates, YAML vocabulary, shop result guards, or answer logic.