patch 19
112
RETRIEX_PATCH_18_EVIDENCE_STATE_AGGREGATE_CONSISTENCY_README.md
Normal file
@@ -0,0 +1,112 @@
# RetrieX Patch 18 – Evidence State / Aggregate Consistency Fix

## Purpose

This patch fixes the consistency between evidence status and answer for aggregate and counting questions such as:

`wieviele testomat geräte haben wir` ("how many Testomat devices do we have")

Previously, RetrieX could find semantically matching RAG hits for such questions and still display the evidence status as `fachlich belegt` ("technically substantiated"), even though the answer itself could not state any reliable count.

## Design decision

`fachlich belegt` is not downgraded across the board. Instead, a dedicated intermediate state is introduced:

`aggregate_missing`

This state means:

- Sources were checked.
- Semantically matching sources exist.
- The sources contain no explicit aggregate or count information for the requested number.

This makes the status shown to the user more precise:

`Beleglage: geprüfte Quellen, keine passende Zählinformation` ("Evidence: sources checked, no matching count information")

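The decision described above can be sketched roughly as follows. This is a minimal Python illustration, not the PHP code in `AgentRunner.php`: the function names, the state name `no_sources`, and both regex lists are assumptions made up for this example (the real patterns are YAML-configured and not reproduced in this README).

```python
import re

# Hypothetical patterns for illustration only; the real lists live in YAML.
AGGREGATE_QUESTION_PATTERNS = [r"\bwie\s*viele\b", r"\banzahl\b"]
AGGREGATE_EVIDENCE_PATTERNS = [
    r"\b(?:anzahl|insgesamt)\b.{0,40}?\b\d+\b",
    r"\bsortiment\s+umfasst\b.{0,40}?\b\d+\b",
]


def matches_any(patterns, text):
    """Return True if any pattern matches the lower-cased text."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in patterns)


def classify_evidence_state(question, source_texts):
    """Map a question and its retrieved source texts to an evidence state."""
    if not source_texts:
        return "no_sources"  # assumed name for "nothing was retrieved"
    if matches_any(AGGREGATE_QUESTION_PATTERNS, question):
        has_count = any(
            matches_any(AGGREGATE_EVIDENCE_PATTERNS, s) for s in source_texts
        )
        if not has_count:
            # Matching sources exist, but none states an explicit count.
            return "aggregate_missing"
    # All other reliability checks are omitted from this sketch.
    return "fachlich belegt"
```

The point of the sketch is the ordering: the aggregate check only runs once sources exist, so `aggregate_missing` never replaces a "no sources" outcome, and non-aggregate questions are unaffected.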
## Changes

### `src/Agent/AgentRunner.php`

- Still detects aggregate questions via YAML-configured patterns.
- For aggregate questions, additionally checks for explicit aggregate answer evidence.
- Returns the new evidence state `aggregate_missing` when count information is missing.
- Displays a more precise evidence status for this state.
- Also passes the evidence state to the final production UI confidence logic.

### `src/Agent/PromptBuilder.php`

- Treats `aggregate_missing` as a separate reliability state.
- Gives the LLM explicit rules not to pass off product-family or portfolio mentions as a concrete count.

### `src/Config/AgentRunnerConfig.php`

- New YAML-backed getter:

`getRagEvidenceAggregateAnswerEvidencePatterns()`

### `src/Config/RetriexEffectiveConfigProvider.php`

- The new config path is visible in the effective config output.
- The new config path is validated as a regex list.

### `config/retriex/agent.yaml`

- New YAML path:

`rag_evidence_guard.aggregate_answer_evidence_patterns`

These patterns describe explicit count or aggregate evidence, e.g. `Anzahl ... 12`, `insgesamt ... 12`, or `Sortiment umfasst ... 12`.

### `config/retriex/prompt.yaml`

- New reliability state:

`aggregatfrage_keine_belastbare_zaehlinformation`

## Not changed

- No shop follow-up logic.
- No product-role logic.
- No general retrieval or scoring tuning.
- No new hard-coded PHP keyword list in the core.
- No strict YAML validation.

## Local checks in this environment

Performed:

```bash
php -l src/Agent/AgentRunner.php
php -l src/Agent/PromptBuilder.php
php -l src/Config/AgentRunnerConfig.php
php -l src/Config/RetriexEffectiveConfigProvider.php
```

Additionally:

- YAML parse of `config/retriex/agent.yaml` succeeded.
- YAML parse of `config/retriex/prompt.yaml` succeeded.
- Regex smoke test for the new aggregate answer patterns succeeded.

Could not be fully run in this environment:

```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```

Reason: the ZIP contains no installed Composer dependencies; `composer install` cannot be completed here because PHP extensions are missing and there is no network access.

## Mandatory checks after applying

After applying the patch, please run the following in the project:

```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```
83
RETRIEX_PATCH_19_TYPO_TOLERANT_PRICE_FOLLOWUP_README.md
Normal file
@@ -0,0 +1,83 @@
# RetrieX Patch 19 - Typo-tolerant price follow-up intent

## Purpose

Patch 19 fixes the observed regression class where a short referential price follow-up with a typo does not trigger commerce/shop handling.

Observed conversation:

1. `Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?`
2. `mit welchem indikator wird der wert gemessen`
3. `was kpstet der indikator`

The third input should be treated like `was kostet der indikator` and must enter the shop/commerce path. Before this patch, `kpstet` was not recognized as a commerce signal, so RetrieX stayed in RAG-only mode and produced an unrelated RAG answer.

## Scope

YAML-only fix. No PHP runtime logic changed.

Changed files:

- `config/retriex/intent.yaml`
- `config/retriex/commerce.yaml`

## What changed

### `config/retriex/intent.yaml`

Added common typo variants for `kostet`:

- `kpstet`
- `ksotet`

The variants were added to:

- `strong_signals`
- `non_product_commerce_signals`
- `price_terms`
- `explicit_commerce_intent_patterns`

This lets typo variants trigger the existing commerce intent path without introducing hard-coded logic in PHP.

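To illustrate why adding `'/\bkpstet\b/u'` is enough to flip the third turn into the commerce path, here is a small Python sketch of word-boundary matching against a subset of `explicit_commerce_intent_patterns`. The helper names are hypothetical, and lower-casing the input before matching is an assumption (the patterns carry no `i` flag); the actual matching happens in PHP and is not shown in this patch.

```python
import re

# Subset of explicit_commerce_intent_patterns, kept in PHP/PCRE literal notation.
EXPLICIT_COMMERCE_INTENT_PATTERNS = [
    r"/\bpreis\b/u",
    r"/\bkosten\b/u",
    r"/\bkostet\b/u",
    r"/\bkpstet\b/u",
    r"/\bksotet\b/u",
]


def compile_pcre_literal(literal):
    """Compile a '/body/flags' literal; only the 'u' and 'i' flags are handled."""
    body, _, flags = literal[1:].rpartition("/")
    re_flags = re.IGNORECASE if "i" in flags else 0
    # Python str patterns are Unicode-aware by default, so 'u' needs no flag.
    return re.compile(body, re_flags)


def has_explicit_commerce_intent(message):
    """Return True if any configured pattern matches the message."""
    text = message.lower()  # assumption: input is lower-cased before matching
    return any(
        compile_pcre_literal(p).search(text)
        for p in EXPLICIT_COMMERCE_INTENT_PATTERNS
    )
```

Because every pattern is anchored with `\b`, a typo is only recognized when it is listed explicitly; `kostenlos`, for example, does not match `\bkosten\b`. That keeps the fix narrow, at the cost of having to enumerate each typo variant.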
### `config/retriex/commerce.yaml`

Added the typo variants to:

- `filter_search_tokens`
- `search_token_corrections`

This prevents typo tokens from polluting the generated Store API search query. The parser normalizes:

- `kpstet` -> `kostet`
- `ksotet` -> `kostet`

Then the existing `kostet` filter removes the price-control word from the product query.

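The two steps just described (correct the token, then filter it) can be sketched in Python as below. The dictionaries are short excerpts of the YAML lists, the function name is hypothetical, and the exact order of the stages, including the final canonical-map step, is an assumption for illustration rather than a description of the PHP parser.

```python
# Short excerpts from commerce.yaml; the real lists are much longer.
SEARCH_TOKEN_CORRECTIONS = {
    "kpstet": "kostet",
    "ksotet": "kostet",
    "indicatoren": "indikatoren",
}
FILTER_SEARCH_TOKENS = {
    "der", "die", "das", "preis", "kostet", "kpstet", "ksotet", "kosten",
}
SEARCH_TOKEN_CANONICAL_MAP = {"indikatoren": "indikator", "reagenzien": "reagenz"}


def build_search_tokens(query):
    """Correct typos, drop control words, then map tokens to canonical forms."""
    tokens = query.lower().split()
    corrected = [SEARCH_TOKEN_CORRECTIONS.get(t, t) for t in tokens]
    kept = [t for t in corrected if t not in FILTER_SEARCH_TOKENS]
    return [SEARCH_TOKEN_CANONICAL_MAP.get(t, t) for t in kept]
```

Since the typo variants appear both as correction keys and as filter tokens, they are dropped regardless of whether the filter or the correction runs first.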
## Expected result

`was kpstet der indikator` should now be detected as commerce intent and proceed to shop search. In the known flow, the existing history/anchor logic should be able to keep the `Indikatortyp 300` context and resolve toward a query such as:

`indikatortyp 300 indikator`

## Why this is intentionally small

This patch does not change retrieval, scoring, Shopware matching, prompt building or evidence-state logic. It only extends the YAML-configured vocabulary and corrections for one observed typo class.

## Required checks after applying

Run the standard checks:

```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```

Then manually retest the conversation flow:

1. `Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?`
2. `mit welchem indikator wird der wert gemessen`
3. `was kpstet der indikator`

Expected: the third turn should trigger a shop search for indicator products rather than an answer built only from RAG indicator documents.
@@ -110,6 +110,8 @@ parameters:
             - preise
             - preisen
             - kostet
+            - kpstet
+            - ksotet
             - kosten
             - ua
             - also
@@ -142,6 +144,8 @@ parameters:
             indicatoren: indikatoren
             schwinnbad: schwimmbad
             schwimbad: schwimmbad
+            kpstet: kostet
+            ksotet: kostet
 
         search_token_canonical_map:
             indikatoren: indikator
@@ -14,6 +14,8 @@ parameters:
             - sku
             - kaufen
             - kostet
+            - kpstet
+            - ksotet
             - suche
             - such
             - finde
@@ -52,6 +54,8 @@ parameters:
             - online
             - kaufen
             - kostet
+            - kpstet
+            - ksotet
             - suche
             - such
             - finde
@@ -84,6 +88,8 @@ parameters:
             - preis
             - kosten
             - kostet
+            - kpstet
+            - ksotet
         color_terms:
             - schwarz
             - weiß
@@ -138,6 +144,8 @@ parameters:
             - '/\bpreis\b/u'
             - '/\bkosten\b/u'
             - '/\bkostet\b/u'
+            - '/\bkpstet\b/u'
+            - '/\bksotet\b/u'
             - '/\bkaufen\b/u'
             - '/\bbestellen\b/u'
             - '/\bprodukt\b/u'
362
retriex_work/config/retriex/commerce.yaml
Normal file
@@ -0,0 +1,362 @@
# Commerce / Shopware Store API configuration.
# The existing Commerce and Shopware services stay unchanged; these values only centralize wiring.
parameters:
    retriex.commerce.enabled: true
    retriex.commerce.max_shop_results: '%env(SHOPWARE_STORE_API_MAX_RESULT)%'
    retriex.commerce.shop_timeout: 15
    retriex.commerce.store_api_base_url: '%env(SHOPWARE_STORE_API_BASE_URL)%'
    retriex.commerce.sales_channel_access_key: '%env(SHOPWARE_SALES_CHANNEL_ACCESS_KEY)%'

    retriex.commerce.search_repair.enabled: true
    retriex.commerce.search_repair.max_queries: 2
    retriex.commerce.search_repair.min_primary_results_without_repair: 2

    # Commerce query parser configuration.
    # YAML is the only operative source of truth; PHP must not contain parser defaults.
    retriex.commerce_query.config:
        known_brands:
            - heyl
            - horiba
            - neomeris

        phrases_to_remove:
            - ich suche
            - suche
            - habt ihr
            - gibt es
            - gebe mir
            - gib mir
            - zeige mir
            - welches gerät
            - welche gerät
            - welches modell
            - welches ist besser
            - welches ist am besten
            - alternative
            - alternativen
            - unter anderem
            - u a
            - welche
            - welcher
            - welches
            - welchen
            - sind
            - ist
            - geeignet
            - geeigent
            - verfügbarkeit
            - verfuegbarkeit

        filter_search_tokens:
            - auch
            - noch
            - nochmal
            - zusätzlich
            - dazu
            - davon
            - stattdessen
            - bitte
            - gern
            - gerne
            - zeige
            - zeig
            - such
            - suche
            - finde
            - find
            - mir
            - mal
            - von
            - im
            - in
            - für
            - fuer
            - welche
            - welcher
            - welches
            - welchen
            - sind
            - ist
            - geeignet
            - geeigent
            - verfügbarkeit
            - verfuegbarkeit
            - prüfe
            - pruefe
            - den
            - die
            - das
            - der
            - dem
            - des
            - und
            - oder
            - sowie
            - seine
            - seinen
            - seiner
            - seinem
            - seines
            - siene
            - sienen
            - siener
            - sienem
            - sienes
            - gebe
            - gib
            - nenne
            - nenn
            - preis
            - preise
            - preisen
            - kostet
            - kpstet
            - ksotet
            - kosten
            - ua
            - also
            - gut
            - gute
            - guten
            - guter
            - gutes
            - passen
            - passend

        search_control_tokens:
            - shop
            - store
            - produkt
            - produkte
            - artikel
            - kaufen
            - kaufe
            - bestellen
            - bestelle
            - online

        search_token_corrections:
            siene: seine
            sienen: seinen
            siener: seiner
            sienem: seinem
            sienes: seines
            indicatoren: indikatoren
            schwinnbad: schwimmbad
            schwimbad: schwimmbad
            kpstet: kostet
            ksotet: kostet

        search_token_canonical_map:
            indikatoren: indikator
            indicators: indikator
            indicator: indikator
            reagenzien: reagenz
            reagents: reagenz
            reagent: reagenz
            produkte: produkt

        semantic_shop_search_tokens:
            - indikator
            - indicator
            - reagenz
            - reagent
            - zubehör
            - zubehor
            - ersatzteil
            - anschlusskabel
            - kabel
            - sensorkabel
            - elektrodenkabel
            - verbrauchsmaterial
            - chemie
            - indikatorchemie
            - reagenzchemie
            - kit
            - set
            - filter
            - pumpe
            - pumpenkopf
            - motorblock
            - lösung
            - loesung
            - solution
            - teststreifen
            - gerät
            - geraet
            - messgerät
            - messgeraet
            - analysegerät
            - analysegeraet
            - analysator
            - monitor
            - controller
            - system

        normalization:
            search: ['€']
            replace: [' euro ']

        text:
            trim_characters:
                - space
                - tab
                - lf
                - cr
                - nul
                - vertical_tab
                - '-'
                - '.'
                - ','

        limits:
            min_search_token_length: 1
            min_direct_product_token_length: 1
            direct_product_max_tokens: 4
            model_context_token_window: 4
            min_meaningful_alpha_token_length: 2
            max_shop_search_tokens: 6

        patterns:
            history_context: 'chat|auch|noch|nochmal|zusätzlich|dazu|davon|stattdessen|alternative|alternativen|größer|groesser|kleiner|gleich(?:e|en|er|es)?|derselbe|dieselbe|dasselbe|wie oben|wie zuvor|wie gehabt'
            history_context_value_template: '/\b({fragment})\b/u'
            prompt_sanitize: '/[^\p{L}\p{N}\s.,\-]/u'
            whitespace_collapse: '/\s+/u'
            whitespace_split: '/\s+/u'
            history_question: '/^Question:\s*(.+)$/m'
            price_between: '/\bzwischen\s+(\d+(?:[.,]\d+)?)\s+und\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
            price_max: '/\b(?:unter|bis|max(?:imal)?)\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
            price_min: '/\b(?:ab|mindestens|min)\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
            price_removal_between: '/\bzwischen\s+\d+(?:[.,]\d+)?\s+und\s+\d+(?:[.,]\d+)?\s*euro\b/u'
            price_removal_minmax: '/\b(?:unter|bis|max(?:imal)?|ab|mindestens|min)\s+\d+(?:[.,]\d+)?\s*euro\b/u'
            price_removal_intent_template: '/\b(?:{price_pattern})\b/u'
            direct_product_digit: '/\d/u'
            model_like: '/\b[a-zäöüß][a-zäöüß®\-]*(?:\s+[a-zäöüß][a-zäöüß®\-]*){0,2}\s+\d{2,5}[a-z0-9\-]*\b/u'
            accessory_like: '/\b(?:indikator|indicator|reagenz|reagent|kit|set)\s+\d{1,5}[a-z0-9\-]*\b/u'
            contains_digit: '/\d/u'
            model_number_token: '/^(?:\d{2,5}[a-z0-9\-]*|[a-z]{1,6}\d{1,5}[a-z0-9\-]*)$/u'
            model_context_token: '/^[\p{L}][\p{L}0-9®\-]{2,}$/u'
            model_suffix_token: '/^[a-z]{1,4}\d{0,3}$/u'
            instruction_or_presentation_token: '/^(?:zeig(?:e)?|such(?:e)?|find(?:e)?|gib|gebe|nenn(?:e)?|liefer(?:e)?|erstelle?|mach(?:e)?|brauch(?:e)?|will|möchte|moechte|hätte|haette|kannst|bitte|mal|alle|alles|komplett|vollständig|vollstaendig|gesamt|ganze|ganzen|liste|listung|auflistung|tabelle|tabellarisch|übersicht|uebersicht|anzeigen?|ausgeben?|darstellen?|antwort(?:e)?|erklär(?:e)?|erklaer(?:e)?|info|infos|informationen|dazu|hierzu|damit|davon|an|als|mit|ohne|inkl|inklusive|also|gut|gute|guten|guter|gutes|passend|passen)$/u'
            measurement_value_token: '/^\d+[.,]\d+$/u'
            exact_token_removal_template: '/\b{token}\b/u'
            brand_part_of_model_template: '/\b{brand}\s+\d{2,5}[a-z0-9\-]*\b/u'

    # Commerce reference resolver configuration.
    # YAML is the only operative source of truth for conversation product and focus-term patterns.
    retriex.commerce_reference_resolver.config:
        conversation_product_patterns:
            - '/\b(Testomat\s+2000\s+THCL)\b/ui'
            - '/\b(Testomat\s+808)\b/ui'
            - '/\b(Testomat\s+EVO\s+TH)\b/ui'
            - '/\b(Testomat\s+EVO\s+CALC)\b/ui'
            - '/\b(Testomat\s+ECO\s+PLUS)\b/ui'
            - '/\b(Testomat\s+ECO\s+C)\b/ui'
            - '/\b(Testomat\s+ECO)\b/ui'
            - '/\b(Testomat\s+LAB\s+CL)\b/ui'
            - '/\b(Testomat\s+LAB\s+MONO)\b/ui'
            - '/\b(Testomat\s+2000)\b/ui'

        focus_term_patterns:
            indikator: '/\bindikator(?:en)?\b/u'
            indikatoren: '/\bindikator(?:en)?\b/u'
            reagenz: '/\breagenz(?:ien)?\b/u'
            reagenzien: '/\breagenz(?:ien)?\b/u'
            zubehör: '/\bzubeh[oö]r\b/u'
            ersatzteil: '/\bersatzteile?\b/u'
            ersatzteile: '/\bersatzteile?\b/u'
            service-set: '/\bservice(?:\s|-)?set\b/u'
            filter: '/\bfilter\b/u'
            pumpenkopf: '/\bpumpenkopf\b/u'
            motorblock: '/\bmotorblock\b/u'
            mehrwertpaket: '/\bmehrwertpaket\b/u'
            neotecmaster: '/\bneotecmaster\b/u'

    # Shop matching and presentation configuration.
    # YAML is the only operative source of truth; PHP must not contain shop matching defaults.
    retriex.shop_matching.config:
        top_product_log_limit: 3

        # Shop role and focus lists are resolved from config/retriex/vocabulary.yaml.
        # Direct list overrides may still be added to this parameter if a project needs them.
        vocabulary_views:
            device_focus_keywords: shop.device_focus
            accessory_focus_keywords: shop.accessory_focus
            device_query_keywords: shop.device_query
            accessory_query_keywords: shop.accessory_query
            accessory_product_keywords: shop.accessory_product
            device_product_keywords: shop.device_product

        vocabulary_maps:
            accessory_focus_variant_map: shop.accessory_focus_variants

        role_guard:
            filter_accessory_products_for_device_queries: true
            keep_ambiguous_products_for_device_queries: true

        scores:
            exact_product_number_phrase: 160
            exact_product_name_phrase: 90
            exact_manufacturer_match: 40
            brand_contained_in_name: 20
            name_token_overlap_weight: 6
            product_number_token_overlap_weight: 10
            corpus_token_overlap_weight: 2
            name_number_overlap_weight: 18
            product_number_number_overlap_weight: 28
            corpus_number_overlap_weight: 8
            size_match: 12
            availability_bonus: 1
            device_query_device_product_bonus: 60
            device_query_accessory_penalty: 120
            accessory_query_accessory_product_bonus: 30
            accessory_query_device_product_bonus: 10

        patterns:
            contains_digit: '/\d/u'
            matching_cleanup: '/[^\p{L}\p{N}]+/u'
            whitespace_collapse: '/\s+/u'
            token_split: '/[^\p{L}\p{N}]+/u'

        padding:
            prefix: ' '
            suffix: ' '

        price:
            normalization_search: ['€', ' ', '.']
            normalization_replace: ['', '', '']
            decimals: 2
            decimal_separator: ','
            thousands_separator: '.'
            suffix: ' €'

        custom_fields:
            primary: migration_Backup_product_attr1
            secondary: migration_Backup_product_attr2
            use_cases: migration_Backup_product_attr4
            languages: migration_Backup_product_attr5

        text:
            primary_secondary_separator: ': '
            use_cases_label: 'Einsatzgebiete: '
            languages_label: 'Sprachen: '
            custom_field_join_separator: ' | '

        description:
            empty_line_pattern: '/^[ \t]*\R/m'
            whitespace_cleanup_pattern: '/[ \t]{2,}/'
            max_length: 1500

        seo:
            relative_prefix: '/'

        highlight:
            available_label: Verfügbar
            unavailable_label: Nicht verfügbar
            product_number_prefix: 'Produktnummer: '

        image:
            missing_placeholder: no-image

        deduplication:
            separator: '|'
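As an illustration of how the `normalization` entries and the `price_between`, `price_max` and `price_min` patterns above fit together, here is a minimal Python sketch. The regex bodies are copied from the YAML with the PHP delimiters and `/u` flag removed; the function names, the lower-casing step, the precedence of `price_between`, and the treatment of both `,` and `.` as decimal separators are assumptions for this example, not the behaviour of the PHP parser.

```python
import re

# Regex bodies copied from retriex.commerce_query.config.patterns.
PRICE_BETWEEN = re.compile(
    r"\bzwischen\s+(\d+(?:[.,]\d+)?)\s+und\s+(\d+(?:[.,]\d+)?)\s+euro\b"
)
PRICE_MAX = re.compile(r"\b(?:unter|bis|max(?:imal)?)\s+(\d+(?:[.,]\d+)?)\s+euro\b")
PRICE_MIN = re.compile(r"\b(?:ab|mindestens|min)\s+(\d+(?:[.,]\d+)?)\s+euro\b")


def to_number(raw):
    """Parse a matched amount; ',' and '.' are both read as decimal separators."""
    return float(raw.replace(",", "."))


def extract_price_range(query):
    """Return (min_price, max_price); either value may be None."""
    # normalization.search / normalization.replace: '€' becomes ' euro '.
    text = query.lower().replace("€", " euro ")
    between = PRICE_BETWEEN.search(text)
    if between:
        return to_number(between.group(1)), to_number(between.group(2))
    low = PRICE_MIN.search(text)
    high = PRICE_MAX.search(text)
    return (
        to_number(low.group(1)) if low else None,
        to_number(high.group(1)) if high else None,
    )
```

Replacing `€` with ` euro ` before matching is what lets the patterns end in `euro\b`: a `\b` directly after the non-word character `€` would not match at the end of a query.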
323
retriex_work/config/retriex/intent.yaml
Normal file
@@ -0,0 +1,323 @@
# Intent vocabulary and pattern configuration.
# Lists and thresholds mirror the previous PHP defaults exactly.
# Migrated config areas are YAML-only; remaining areas are migrated incrementally.
parameters:
    retriex.intent.commerce.config:
        strong_signals:
            - shop
            - alle
            - preis
            - kunde
            - online
            - produkt
            - artikel
            - sku
            - kaufen
            - kostet
            - kpstet
            - ksotet
            - suche
            - such
            - finde
            - finden
            - analysegerät
            - analysegeraet
            - messgerät
            - messgeraet
            - pockettester
            - pocket tester
            - handmessgerät
            - handmessgeraet
            - analysator
            - analyzer
            - puffer
            - kalibrierpuffer
            - kalibrierlösung
            - kalibrierloesung
            - kalibrierung
            - chemie
            - reagenz
            - reagenzien
            - verbrauchsmaterial
            - zubehör
            - zubehoer
            - ersatzteil
            - anschlusskabel
            - kabel
            - sensorkabel
            - elektrode
            - elektrodenkabel
        non_product_commerce_signals:
            - shop
            - alle
            - kunde
            - online
            - kaufen
            - kostet
            - kpstet
            - ksotet
            - suche
            - such
            - finde
            - finden
        advisory_signals:
            - passt
            - eignet
            - besser
            - besten
            - gut
            - gut für
            - gut fuer
            - passend für
            - passend fuer
            - geeignet
            - geeigent
            - empfiehl
            - empfehl
        advisory_product_selection_patterns:
            - '/\bmit\s+welche(?:m|n|r|s)?\s+(?:testomat(?:en)?|pockettester|pocket\s+tester|analysegerät|analysegeraet|messgerät|messgeraet|analysator|analyzer)\b.*\b(?:messen|messung|überwach(?:en|ung)?|ueberwach(?:en|ung)?)\b/u'
            - '/\bwelche(?:r|s|n|m)?\s+(?:testomat(?:en)?|pockettester|pocket\s+tester|analysegerät|analysegeraet|messgerät|messgeraet|analysator|analyzer)\b.*\b(?:kann|können|koennen|misst|messen|überwacht|ueberwacht|eignet|geeignet|passt|gut|empfehl)\b.*\b(?:messen|messung|überwach(?:en|ung)?|ueberwach(?:en|ung)?)\b/u'
            - '/\b(?:testomat(?:en)?|pockettester|pocket\s+tester|analysegerät|analysegeraet|messgerät|messgeraet|analysator|analyzer)\b.*\b(?:für|fuer)\b.*\b(?:messung|messen|überwachung|ueberwachung)\b/u'
            - '/\b(?:ich\s+)?(?:würde|wuerde|möchte|moechte|will|brauche|benötige|benoetige)\b.{0,80}\b(?:messen|messung|überwachen|ueberwachen|kontrollieren)\b/u'
            - '/\b(?:messen|messung|überwachen|ueberwachen|kontrollieren)\b.{0,80}\b(?:schwimmbad|pool|becken|wasseranalyse)\b/u'
        price_terms:
            - euro
            - €
            - eur
            - teuer
            - preis
            - kosten
            - kostet
            - kpstet
            - ksotet
        color_terms:
            - schwarz
            - weiß
            - weis
            - blau
            - grau
            - beige
            - rosa
            - pink
            - gruen
            - orange
            - braun
        size_token_terms:
            - xs
            - s
            - m
            - l
            - xl
            - xxl
            - xxxxl
        size_terms:
            - größe
            - groesse
            - grösse
        support_diagnostic_patterns:
            - '/\bfehler\b/u'
            - '/\bfehlercode\b/u'
            - '/\berror\b/u'
            - '/\bstörung\b/u'
            - '/\bstoerung\b/u'
            - '/\balarm\b/u'
            - '/\bstörungsmeldung\b/u'
            - '/\bstoerungsmeldung\b/u'
            - '/\bmeldung\b/u'
            - '/\bwarnung\b/u'
            - '/\bwarncode\b/u'
            - '/\bcode\b/u'
            - '/\bwas bedeutet\b/u'
            - '/\bwarum\b/u'
            - '/\bblinkt\b/u'
            - '/\bzeigt\b/u'
            - '/\bzeigt an\b/u'
            - '/\bursache\b/u'
            - '/\bdiagnose\b/u'
            - '/\bservicefall\b/u'
            - '/\bproblem\b/u'
            - '/\bstörung beheben\b/u'
            - '/\bstoerung beheben\b/u'
            - '/\be\d{1,3}\b/u'
        explicit_commerce_intent_patterns:
            - '/\bshop\b/u'
            - '/\bpreis\b/u'
            - '/\bkosten\b/u'
            - '/\bkostet\b/u'
            - '/\bkpstet\b/u'
            - '/\bksotet\b/u'
            - '/\bkaufen\b/u'
            - '/\bbestellen\b/u'
            - '/\bprodukt\b/u'
            - '/\bartikel\b/u'
            - '/\bsku\b/u'
            - '/\bonline\b/u'
            - '/\bchemie\b/u'
            - '/\breagenz(?:ien)?\b/u'
            - '/\bverbrauchsmaterial(?:ien)?\b/u'
            - '/\bzubehör\b/u'
            - '/\bzubehoer\b/u'
            - '/\bersatzteil(?:e)?\b/u'
            - '/\banschlusskabel\b/u'
            - '/\bkabel\b/u'
            - '/\bsensorkabel\b/u'
            - '/\belektrodenkabel\b/u'
        technical_factual_knowledge:
            signal_label: technical_factual_knowledge_query
            question_marker_patterns:
                - '/\bwas\s+ist\b/u'
                - '/\bwelche?r?s?\b/u'
                - '/\bwie\s+(hoch|niedrig|klein|gross|groß)\b/u'
                - '/\bniedrigste[rsn]?\b/u'
                - '/\bkleinste[rsn]?\b/u'
                - '/\bhöchste[rsn]?\b/u'
                - '/\bhoechste[rsn]?\b/u'
            fact_patterns:
                - '/\bgrenzwert(?:e|en|es)?\b/u'
                - '/\bmessbereich(?:e|en|s)?\b/u'
                - '/\bwasserhärte\b/u'
                - '/\bwasserhaerte\b/u'
                - '/\bresthärte\b/u'
                - '/\bresthaerte\b/u'
                - '/\bgesamthärte\b/u'
                - '/\bgesamthaerte\b/u'
                - '/\bauflösung\b/u'
                - '/\baufloesung\b/u'
                - '/\bindikator(?:en|s)?\b/u'
                - '/\btestomat(?:en|s)?\b/u'
                - '/\büberwach(?:t|en|ung)\b/u'
                - '/\bueberwach(?:t|en|ung)\b/u'
                - '/\bmess(?:en|ung|bar|wert)\b/u'
        patterns:
            sku_like: '/\b\d{4,10}\b/u'
            price_value_template: '/\b\d+(?:[.,]\d+)?\s*(?:{price_pattern})\b/u'
            size_extraction_template: '/\b(?:{size_pattern})\s*([a-z0-9.-]+)\b/u'
            size_value_template: '/\b(?:{size_pattern})\s*[a-z0-9.-]+\b/u'
            size_token_value_template: '/\b(?:{size_token_pattern})\b/u'
            color_value_template: '/\b(?:{color_pattern})\b/u'
            model_like_product: '/\b[a-zäöüß][a-zäöüß®\-]*(?:\s+[a-zäöüß][a-zäöüß®\-]*){0,2}\s+\d{2,5}[a-z0-9\-]*\b/u'
        labels:
            support_or_diagnostic_signal: support_or_diagnostic
            sku_signal: sku
            price_signal: price
            size_signal: size
            size_token_signal: size_token
            color_signal: color
            advisory_signal_prefix: 'advisory:'
            advisory_product_selection_signal: advisory_product_selection
            model_like_product_signal: model_like_product
        scores:
            product_search_min_score: 3
            advisory_product_search_min_score: 2
            strong_signal_score: 3
            sku_signal_score: 2
            price_signal_score: 2
            size_signal_score: 2
            size_token_signal_score: 1
            color_signal_score: 1
            advisory_signal_score: 1
            advisory_product_selection_signal_score: 3
            model_like_product_signal_score: 3

    retriex.intent.catalog.config:
        min_score: 0.72
        ambiguity_delta: 0.02
        intent_search_limit: 6
        list_search_limit: 3
        min_allowed_score: 0.0
        max_allowed_score: 1.0

    retriex.intent.light.config:
        list_threshold: 4
        quantity_words:
            - alle
            - sämtliche
            - saemtliche
            - mehrere
            - verschiedene
            - einige
            - viele
            - optionen
            - möglichkeiten
            - moeglichkeiten
            - varianten
            - arten
            - modelle
            - funktionen
            - punkte
            - schritte
            - kategorien
            - übersicht
            - uebersicht
        strong_patterns:
            - '/\bliste(n)?\b/u'
            - '/\bauflisten\b/u'
            - '/\baufz(a|ä)hl(en)?\b/u'
            - '/\bnenn(e)?\b/u'
            - '/\bzeig(e)?\b/u'
            - '/\bwelche\s+sind\b/u'
            - '/\bwelche\s+gibt\s+es\b/u'
            - '/\bwas\s+sind\b/u'
            - '/\bwie\s+viele\b/u'
            - '/\branking\b/u'
            - '/\btop\s*\d+\b/u'

    retriex.intent.sales.config:
        dominance_delta: 2
        min_score_threshold: 3
        sales_signals:
            - preis
            - preise
            - kosten
            - lizenz
            - lizenzmodell
            - tarif
            - tarife
            - gebuehr
            - gebühr
            - monatlich
            - jaehrlich
            - jährlich
            - abo
            - subscription
        comparison_signals:
            - '/\bvergleich(en)?\b/u'
            - '/\bvs\b/u'
            - '/\bgegenueber\b/u'
            - '/\balternative(n)?\b/u'
            - '/\bunterschied(e)?\b/u'
            - '/\bbesser\b/u'
        objection_signals:
            - problem
            - risiko
            - nachteil
            - datenschutz
            - dsgvo
            - sicherheit
            - compliance
            - kritik
            - zweifel
            - unsicher
        implementation_signals:
            - implementierung
            - implementieren
            - integration
            - integrieren
            - einführung
            - einfuehrung
            - aufwand
            - setup
            - rollout
            - migration
            - installation
            - api
            - schnittstelle
        roi_signals:
            - roi
            - rentabilitaet
            - rentabilität
            - business case
            - einsparung
            - kosten senken
            - umsatz steigern
            - effizienz steigern
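Several entries under `patterns` in this file are templates with a placeholder such as `{price_pattern}`. A plausible way to expand them, sketched here in Python, is to join the matching term list into a regex alternation. How the PHP code actually builds the alternation is not shown in this commit, so the function name, the escaping, and the longest-first ordering below are assumptions; `€` is left out of the excerpt because a trailing `\b` behaves differently after a non-word character.

```python
import re

# Excerpt of price_terms (without '€') and the body of price_value_template.
PRICE_TERMS = ["euro", "eur", "teuer", "preis", "kosten", "kostet", "kpstet", "ksotet"]
PRICE_VALUE_TEMPLATE = r"\b\d+(?:[.,]\d+)?\s*(?:{price_pattern})\b"


def build_price_value_regex(terms):
    """Expand {price_pattern} into an escaped alternation of the given terms."""
    # Longest terms first, a defensive choice so that longer variants are tried
    # before their prefixes (for example "euro" before "eur").
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # str.replace instead of str.format: other templates in this file contain
    # regex quantifiers such as {2,5} that str.format would try to interpret.
    return re.compile(PRICE_VALUE_TEMPLATE.replace("{price_pattern}", alternation))
```

With this kind of expansion, adding `kpstet` and `ksotet` to `price_terms` automatically extends every template that uses `{price_pattern}`, which is consistent with the patch being YAML-only.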