patch 19
112
RETRIEX_PATCH_18_EVIDENCE_STATE_AGGREGATE_CONSISTENCY_README.md
Normal file
@@ -0,0 +1,112 @@
# RetrieX Patch 18 – Evidence State / Aggregate Consistency Fix

## Purpose

This patch fixes the consistency between evidence status and answer for aggregate and counting questions such as:

`wieviele testomat geräte haben wir`

Previously, RetrieX could find semantically matching RAG hits for such questions and still display the evidence status as `fachlich belegt` (factually supported), even though the answer itself could not state any reliable count.

## Design decision

`fachlich belegt` is not downgraded across the board. Instead, a separate intermediate state is introduced:

`aggregate_missing`

This state means:

- Sources were checked.
- Semantically matching sources exist.
- However, the sources contain no explicit aggregate or count information for the requested number.

This makes the status shown to the user more precise:

`Beleglage: geprüfte Quellen, keine passende Zählinformation`

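The three conditions above can be read as a small decision rule. The following Python sketch only illustrates that rule; it is not the PHP code in `AgentRunner.php`. The `EvidenceCheck` type, the `no_evidence` state and the `fachlich_belegt` identifier are assumptions made for this example; only `aggregate_missing` and the display string are taken from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceCheck:
    """Hypothetical summary of one retrieval run (not a RetrieX type)."""
    sources_checked: int
    matching_sources: int
    is_aggregate_question: bool
    has_explicit_count_evidence: bool


# Display string from this README; the mapping itself is an assumption.
STATUS_LABELS = {
    "aggregate_missing": "Beleglage: geprüfte Quellen, keine passende Zählinformation",
}


def resolve_evidence_state(check: EvidenceCheck) -> str:
    """Return an evidence state name for the given check."""
    if check.sources_checked == 0 or check.matching_sources == 0:
        # Hypothetical name for "nothing usable was found".
        return "no_evidence"
    if check.is_aggregate_question and not check.has_explicit_count_evidence:
        # Sources match semantically, but none states the requested count.
        return "aggregate_missing"
    # Assumed identifier for the existing `fachlich belegt` status.
    return "fachlich_belegt"


state = resolve_evidence_state(EvidenceCheck(5, 3, True, False))
print(state, "->", STATUS_LABELS.get(state, state))
```

The point of the sketch is the ordering: the aggregate check only runs once matching sources exist, so non-aggregate questions keep their existing status.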
## Changes

### `src/Agent/AgentRunner.php`

- Still detects aggregate questions via YAML-configured patterns.
- For aggregate questions, additionally checks for explicit aggregate answer evidence.
- Returns the new evidence state `aggregate_missing` when count information is missing.
- Displays a more precise evidence status for this state.
- Also passes the evidence state to the final production UI confidence logic.

### `src/Agent/PromptBuilder.php`

- Treats `aggregate_missing` as a separate reliability state.
- Gives the LLM explicit rules not to present mentions of product families or portfolios as a concrete count.

### `src/Config/AgentRunnerConfig.php`

- New YAML-backed getter:

`getRagEvidenceAggregateAnswerEvidencePatterns()`

### `src/Config/RetriexEffectiveConfigProvider.php`

- The new config path is visible in the effective config output.
- The new config path is validated as a list of regular expressions.

### `config/retriex/agent.yaml`

- New YAML path:

`rag_evidence_guard.aggregate_answer_evidence_patterns`

These patterns describe explicit count or aggregate evidence, e.g. `Anzahl ... 12`, `insgesamt ... 12`, or `Sortiment umfasst ... 12`.

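As a rough illustration of what "validated as a list of regular expressions" and "explicit count evidence" could look like, here is a Python sketch. The three patterns are hypothetical approximations of the examples named above, not the contents of `agent.yaml`, and the helper names are invented for this example. The `/body/flags` notation follows the delimited style used in the project's other YAML pattern lists.

```python
import re


def compile_delimited(pattern: str) -> re.Pattern:
    """Compile a '/body/flags' pattern string into a Python regex."""
    if not pattern.startswith("/") or pattern.rfind("/") == 0:
        raise ValueError(f"missing delimiters: {pattern!r}")
    body, flag_text = pattern[1:].rsplit("/", 1)
    flags = 0
    for flag in flag_text:
        if flag == "i":
            flags |= re.IGNORECASE
        elif flag == "m":
            flags |= re.MULTILINE
        elif flag != "u":  # 'u' is a no-op: Python str patterns are Unicode
            raise ValueError(f"unsupported flag {flag!r} in {pattern!r}")
    return re.compile(body, flags)


def invalid_patterns(patterns: list[str]) -> list[str]:
    """Return the entries that do not compile (empty list means valid)."""
    bad = []
    for pattern in patterns:
        try:
            compile_delimited(pattern)
        except (ValueError, re.error):
            bad.append(pattern)
    return bad


def has_aggregate_evidence(text: str, patterns: list[str]) -> bool:
    """True if any pattern finds an explicit count statement in the text."""
    return any(compile_delimited(p).search(text) for p in patterns)


# Hypothetical patterns modelled on the examples in this README.
EXAMPLE_PATTERNS = [
    r"/\banzahl\b.{0,60}\b\d+\b/ui",
    r"/\binsgesamt\b.{0,60}\b\d+\b/ui",
    r"/\bsortiment\s+umfasst\b.{0,60}\b\d+\b/ui",
]

print(invalid_patterns(EXAMPLE_PATTERNS))
print(has_aggregate_evidence("Das Sortiment umfasst derzeit 12 Geräte.", EXAMPLE_PATTERNS))
```

A source that only mentions a model number, such as `Testomat 808`, would not count as evidence under these patterns because no count keyword precedes the number.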
### `config/retriex/prompt.yaml`

- New reliability state:

`aggregatfrage_keine_belastbare_zaehlinformation`

## Not changed

- No shop follow-up logic.
- No product-role logic.
- No general retrieval or scoring tuning.
- No new hard-coded PHP keyword list in the core.
- No strict YAML validation.

## Local checks in this environment

Performed:

```bash
php -l src/Agent/AgentRunner.php
php -l src/Agent/PromptBuilder.php
php -l src/Config/AgentRunnerConfig.php
php -l src/Config/RetriexEffectiveConfigProvider.php
```

Additionally:

- YAML parse of `config/retriex/agent.yaml` succeeded.
- YAML parse of `config/retriex/prompt.yaml` succeeded.
- Regex smoke test for the new aggregate answer patterns succeeded.

Could not be run completely in this environment:

```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```

Reason: the ZIP contains no installed Composer dependencies; `composer install` cannot complete here because PHP extensions are missing and there is no network access.

## Mandatory checks after applying

After applying the patch, run the following in the project:

```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```
83
RETRIEX_PATCH_19_TYPO_TOLERANT_PRICE_FOLLOWUP_README.md
Normal file
@@ -0,0 +1,83 @@
# RetrieX Patch 19 - Typo-tolerant price follow-up intent

## Purpose

Patch 19 fixes the observed regression class where a short referential price follow-up with a typo does not trigger commerce/shop handling.

Observed conversation:

1. `Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?`
2. `mit welchem indikator wird der wert gemessen`
3. `was kpstet der indikator`

The third input should be treated like `was kostet der indikator` and must enter the shop/commerce path. Before this patch, `kpstet` was not recognized as a commerce signal, so RetrieX stayed in RAG-only mode and produced an unrelated RAG answer.

## Scope

YAML-only fix. No PHP runtime logic changed.

Changed files:

- `config/retriex/intent.yaml`
- `config/retriex/commerce.yaml`

## What changed

### `config/retriex/intent.yaml`

Added common typo variants for `kostet`:

- `kpstet`
- `ksotet`

The variants were added to:

- `strong_signals`
- `non_product_commerce_signals`
- `price_terms`
- `explicit_commerce_intent_patterns`

This lets typo variants trigger the existing commerce intent path without introducing hard-coded logic in PHP.

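To see why a missing vocabulary entry keeps the request in RAG-only mode, consider the following Python sketch. The three score values are the ones configured under `scores` in `intent.yaml`; everything else is an assumption for illustration, in particular the token-based matching, the way scores are summed, and the shortened signal lists. The actual PHP detector may combine signals differently.

```python
# Values from the `scores` block in intent.yaml.
STRONG_SIGNAL_SCORE = 3
PRICE_SIGNAL_SCORE = 2
PRODUCT_SEARCH_MIN_SCORE = 3


def commerce_score(prompt: str, strong_signals: set[str], price_terms: set[str]) -> int:
    """Hypothetical scoring: one bonus per signal class that has a hit."""
    tokens = prompt.lower().split()
    score = 0
    if any(token in strong_signals for token in tokens):
        score += STRONG_SIGNAL_SCORE
    if any(token in price_terms for token in tokens):
        score += PRICE_SIGNAL_SCORE
    return score


def is_commerce_intent(prompt: str, strong_signals: set[str], price_terms: set[str]) -> bool:
    return commerce_score(prompt, strong_signals, price_terms) >= PRODUCT_SEARCH_MIN_SCORE


# Shortened excerpts of the YAML lists, before and after this patch.
strong_before = {"preis", "kaufen", "kostet"}
price_before = {"preis", "kosten", "kostet"}
strong_after = strong_before | {"kpstet", "ksotet"}
price_after = price_before | {"kpstet", "ksotet"}

prompt = "was kpstet der indikator"
print(is_commerce_intent(prompt, strong_before, price_before))
print(is_commerce_intent(prompt, strong_after, price_after))
```

Under these assumptions the typo prompt scores 0 with the old lists and reaches the threshold with the extended ones, which matches the behavior change described above.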
### `config/retriex/commerce.yaml`

Added the typo variants to:

- `filter_search_tokens`
- `search_token_corrections`

This prevents typo tokens from polluting the generated Store API search query. The parser normalizes:

- `kpstet` -> `kostet`
- `ksotet` -> `kostet`

Then the existing `kostet` filter removes the price-control word from the product query.

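The correct-then-filter sequence described above can be sketched in a few lines of Python. This is a simplified model, not the PHP parser: the function name is hypothetical, the two dictionaries are short excerpts of the YAML lists, and the real parser applies further steps (phrase removal, canonical mapping, token limits) that are left out here.

```python
# Excerpts of `search_token_corrections` and `filter_search_tokens`.
SEARCH_TOKEN_CORRECTIONS = {"kpstet": "kostet", "ksotet": "kostet"}
FILTER_SEARCH_TOKENS = {"der", "die", "das", "preis", "kostet", "kosten"}


def build_search_tokens(
    prompt: str,
    corrections: dict[str, str],
    filter_tokens: set[str],
) -> list[str]:
    """Correct known typos first, then drop control and filler tokens."""
    tokens = prompt.lower().split()
    corrected = [corrections.get(token, token) for token in tokens]
    return [token for token in corrected if token not in filter_tokens]


# With the correction in place, the typo never reaches the shop query.
print(build_search_tokens("kpstet der indikator", SEARCH_TOKEN_CORRECTIONS, FILTER_SEARCH_TOKENS))

# Without it, the typo survives and pollutes the query.
print(build_search_tokens("kpstet der indikator", {}, FILTER_SEARCH_TOKENS))
```

Listing the typo variants in `filter_search_tokens` as well gives a second safety net in case a token reaches the filter without having passed the correction step.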
## Expected result

`was kpstet der indikator` should now be detected as commerce intent and proceed to shop search. In the known flow, the existing history/anchor logic should be able to keep the `Indikatortyp 300` context and resolve toward a query such as:

`indikatortyp 300 indikator`

## Why this is intentionally small

This patch does not change retrieval, scoring, Shopware matching, prompt building or evidence-state logic. It only extends the YAML-configured vocabulary/corrections for one observed typo class.

## Required checks after applying

Run the standard checks:

```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```

Then manually retest the conversation flow:

1. `Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?`
2. `mit welchem indikator wird der wert gemessen`
3. `was kpstet der indikator`

Expected: the third turn should trigger a shop search for indicator products instead of an answer built only from RAG indicator documents.
@@ -110,6 +110,8 @@ parameters:
             - preise
             - preisen
             - kostet
+            - kpstet
+            - ksotet
             - kosten
             - ua
             - also
@@ -142,6 +144,8 @@ parameters:
             indicatoren: indikatoren
             schwinnbad: schwimmbad
             schwimbad: schwimmbad
+            kpstet: kostet
+            ksotet: kostet
 
         search_token_canonical_map:
             indikatoren: indikator
@@ -14,6 +14,8 @@ parameters:
             - sku
             - kaufen
             - kostet
+            - kpstet
+            - ksotet
             - suche
             - such
             - finde
@@ -52,6 +54,8 @@ parameters:
             - online
             - kaufen
             - kostet
+            - kpstet
+            - ksotet
             - suche
             - such
             - finde
@@ -84,6 +88,8 @@ parameters:
             - preis
             - kosten
             - kostet
+            - kpstet
+            - ksotet
         color_terms:
             - schwarz
             - weiß
@@ -138,6 +144,8 @@ parameters:
             - '/\bpreis\b/u'
             - '/\bkosten\b/u'
             - '/\bkostet\b/u'
+            - '/\bkpstet\b/u'
+            - '/\bksotet\b/u'
             - '/\bkaufen\b/u'
             - '/\bbestellen\b/u'
             - '/\bprodukt\b/u'
362
retriex_work/config/retriex/commerce.yaml
Normal file
@@ -0,0 +1,362 @@
# Commerce / Shopware Store API configuration.
# The existing Commerce and Shopware services stay unchanged; these values only centralize wiring.
parameters:
    retriex.commerce.enabled: true
    retriex.commerce.max_shop_results: '%env(SHOPWARE_STORE_API_MAX_RESULT)%'
    retriex.commerce.shop_timeout: 15
    retriex.commerce.store_api_base_url: '%env(SHOPWARE_STORE_API_BASE_URL)%'
    retriex.commerce.sales_channel_access_key: '%env(SHOPWARE_SALES_CHANNEL_ACCESS_KEY)%'

    retriex.commerce.search_repair.enabled: true
    retriex.commerce.search_repair.max_queries: 2
    retriex.commerce.search_repair.min_primary_results_without_repair: 2

    # Commerce query parser configuration.
    # YAML is the only operative source of truth; PHP must not contain parser defaults.
    retriex.commerce_query.config:
        known_brands:
            - heyl
            - horiba
            - neomeris

        phrases_to_remove:
            - ich suche
            - suche
            - habt ihr
            - gibt es
            - gebe mir
            - gib mir
            - zeige mir
            - welches gerät
            - welche gerät
            - welches modell
            - welches ist besser
            - welches ist am besten
            - alternative
            - alternativen
            - unter anderem
            - u a
            - welche
            - welcher
            - welches
            - welchen
            - sind
            - ist
            - geeignet
            - geeigent
            - verfügbarkeit
            - verfuegbarkeit

        filter_search_tokens:
            - auch
            - noch
            - nochmal
            - zusätzlich
            - dazu
            - davon
            - stattdessen
            - bitte
            - gern
            - gerne
            - zeige
            - zeig
            - such
            - suche
            - finde
            - find
            - mir
            - mal
            - von
            - im
            - in
            - für
            - fuer
            - welche
            - welcher
            - welches
            - welchen
            - sind
            - ist
            - geeignet
            - geeigent
            - verfügbarkeit
            - verfuegbarkeit
            - prüfe
            - pruefe
            - den
            - die
            - das
            - der
            - dem
            - des
            - und
            - oder
            - sowie
            - seine
            - seinen
            - seiner
            - seinem
            - seines
            - siene
            - sienen
            - siener
            - sienem
            - sienes
            - gebe
            - gib
            - nenne
            - nenn
            - preis
            - preise
            - preisen
            - kostet
            - kpstet
            - ksotet
            - kosten
            - ua
            - also
            - gut
            - gute
            - guten
            - guter
            - gutes
            - passen
            - passend

        search_control_tokens:
            - shop
            - store
            - produkt
            - produkte
            - artikel
            - kaufen
            - kaufe
            - bestellen
            - bestelle
            - online

        search_token_corrections:
            siene: seine
            sienen: seinen
            siener: seiner
            sienem: seinem
            sienes: seines
            indicatoren: indikatoren
            schwinnbad: schwimmbad
            schwimbad: schwimmbad
            kpstet: kostet
            ksotet: kostet

        search_token_canonical_map:
            indikatoren: indikator
            indicators: indikator
            indicator: indikator
            reagenzien: reagenz
            reagents: reagenz
            reagent: reagenz
            produkte: produkt

        semantic_shop_search_tokens:
            - indikator
            - indicator
            - reagenz
            - reagent
            - zubehör
            - zubehor
            - ersatzteil
            - anschlusskabel
            - kabel
            - sensorkabel
            - elektrodenkabel
            - verbrauchsmaterial
            - chemie
            - indikatorchemie
            - reagenzchemie
            - kit
            - set
            - filter
            - pumpe
            - pumpenkopf
            - motorblock
            - lösung
            - loesung
            - solution
            - teststreifen
            - gerät
            - geraet
            - messgerät
            - messgeraet
            - analysegerät
            - analysegeraet
            - analysator
            - monitor
            - controller
            - system

        normalization:
            search: ['€']
            replace: [' euro ']

        text:
            trim_characters:
                - space
                - tab
                - lf
                - cr
                - nul
                - vertical_tab
                - '-'
                - '.'
                - ','

        limits:
            min_search_token_length: 1
            min_direct_product_token_length: 1
            direct_product_max_tokens: 4
            model_context_token_window: 4
            min_meaningful_alpha_token_length: 2
            max_shop_search_tokens: 6

        patterns:
            history_context: 'chat|auch|noch|nochmal|zusätzlich|dazu|davon|stattdessen|alternative|alternativen|größer|groesser|kleiner|gleich(?:e|en|er|es)?|derselbe|dieselbe|dasselbe|wie oben|wie zuvor|wie gehabt'
            history_context_value_template: '/\b({fragment})\b/u'
            prompt_sanitize: '/[^\p{L}\p{N}\s.,\-]/u'
            whitespace_collapse: '/\s+/u'
            whitespace_split: '/\s+/u'
            history_question: '/^Question:\s*(.+)$/m'
            price_between: '/\bzwischen\s+(\d+(?:[.,]\d+)?)\s+und\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
            price_max: '/\b(?:unter|bis|max(?:imal)?)\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
            price_min: '/\b(?:ab|mindestens|min)\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
            price_removal_between: '/\bzwischen\s+\d+(?:[.,]\d+)?\s+und\s+\d+(?:[.,]\d+)?\s*euro\b/u'
            price_removal_minmax: '/\b(?:unter|bis|max(?:imal)?|ab|mindestens|min)\s+\d+(?:[.,]\d+)?\s*euro\b/u'
            price_removal_intent_template: '/\b(?:{price_pattern})\b/u'
            direct_product_digit: '/\d/u'
            model_like: '/\b[a-zäöüß][a-zäöüß®\-]*(?:\s+[a-zäöüß][a-zäöüß®\-]*){0,2}\s+\d{2,5}[a-z0-9\-]*\b/u'
            accessory_like: '/\b(?:indikator|indicator|reagenz|reagent|kit|set)\s+\d{1,5}[a-z0-9\-]*\b/u'
            contains_digit: '/\d/u'
            model_number_token: '/^(?:\d{2,5}[a-z0-9\-]*|[a-z]{1,6}\d{1,5}[a-z0-9\-]*)$/u'
            model_context_token: '/^[\p{L}][\p{L}0-9®\-]{2,}$/u'
            model_suffix_token: '/^[a-z]{1,4}\d{0,3}$/u'
            instruction_or_presentation_token: '/^(?:zeig(?:e)?|such(?:e)?|find(?:e)?|gib|gebe|nenn(?:e)?|liefer(?:e)?|erstelle?|mach(?:e)?|brauch(?:e)?|will|möchte|moechte|hätte|haette|kannst|bitte|mal|alle|alles|komplett|vollständig|vollstaendig|gesamt|ganze|ganzen|liste|listung|auflistung|tabelle|tabellarisch|übersicht|uebersicht|anzeigen?|ausgeben?|darstellen?|antwort(?:e)?|erklär(?:e)?|erklaer(?:e)?|info|infos|informationen|dazu|hierzu|damit|davon|an|als|mit|ohne|inkl|inklusive|also|gut|gute|guten|guter|gutes|passend|passen)$/u'
            measurement_value_token: '/^\d+[.,]\d+$/u'
            exact_token_removal_template: '/\b{token}\b/u'
            brand_part_of_model_template: '/\b{brand}\s+\d{2,5}[a-z0-9\-]*\b/u'

    # Commerce reference resolver configuration.
    # YAML is the only operative source of truth for conversation product and focus-term patterns.
    retriex.commerce_reference_resolver.config:
        conversation_product_patterns:
            - '/\b(Testomat\s+2000\s+THCL)\b/ui'
            - '/\b(Testomat\s+808)\b/ui'
            - '/\b(Testomat\s+EVO\s+TH)\b/ui'
            - '/\b(Testomat\s+EVO\s+CALC)\b/ui'
            - '/\b(Testomat\s+ECO\s+PLUS)\b/ui'
            - '/\b(Testomat\s+ECO\s+C)\b/ui'
            - '/\b(Testomat\s+ECO)\b/ui'
            - '/\b(Testomat\s+LAB\s+CL)\b/ui'
            - '/\b(Testomat\s+LAB\s+MONO)\b/ui'
            - '/\b(Testomat\s+2000)\b/ui'

        focus_term_patterns:
            indikator: '/\bindikator(?:en)?\b/u'
            indikatoren: '/\bindikator(?:en)?\b/u'
            reagenz: '/\breagenz(?:ien)?\b/u'
            reagenzien: '/\breagenz(?:ien)?\b/u'
            zubehör: '/\bzubeh[oö]r\b/u'
            ersatzteil: '/\bersatzteile?\b/u'
            ersatzteile: '/\bersatzteile?\b/u'
            service-set: '/\bservice(?:\s|-)?set\b/u'
            filter: '/\bfilter\b/u'
            pumpenkopf: '/\bpumpenkopf\b/u'
            motorblock: '/\bmotorblock\b/u'
            mehrwertpaket: '/\bmehrwertpaket\b/u'
            neotecmaster: '/\bneotecmaster\b/u'

    # Shop matching and presentation configuration.
    # YAML is the only operative source of truth; PHP must not contain shop matching defaults.
    retriex.shop_matching.config:
        top_product_log_limit: 3

        # Shop role and focus lists are resolved from config/retriex/vocabulary.yaml.
        # Direct list overrides may still be added to this parameter if a project needs them.
        vocabulary_views:
            device_focus_keywords: shop.device_focus
            accessory_focus_keywords: shop.accessory_focus
            device_query_keywords: shop.device_query
            accessory_query_keywords: shop.accessory_query
            accessory_product_keywords: shop.accessory_product
            device_product_keywords: shop.device_product

        vocabulary_maps:
            accessory_focus_variant_map: shop.accessory_focus_variants

        role_guard:
            filter_accessory_products_for_device_queries: true
            keep_ambiguous_products_for_device_queries: true

        scores:
            exact_product_number_phrase: 160
            exact_product_name_phrase: 90
            exact_manufacturer_match: 40
            brand_contained_in_name: 20
            name_token_overlap_weight: 6
            product_number_token_overlap_weight: 10
            corpus_token_overlap_weight: 2
            name_number_overlap_weight: 18
            product_number_number_overlap_weight: 28
            corpus_number_overlap_weight: 8
            size_match: 12
            availability_bonus: 1
            device_query_device_product_bonus: 60
            device_query_accessory_penalty: 120
            accessory_query_accessory_product_bonus: 30
            accessory_query_device_product_bonus: 10

        patterns:
            contains_digit: '/\d/u'
            matching_cleanup: '/[^\p{L}\p{N}]+/u'
            whitespace_collapse: '/\s+/u'
            token_split: '/[^\p{L}\p{N}]+/u'

        padding:
            prefix: ' '
            suffix: ' '

        price:
            normalization_search: ['€', ' ', '.']
            normalization_replace: ['', '', '']
            decimals: 2
            decimal_separator: ','
            thousands_separator: '.'
            suffix: ' €'

        custom_fields:
            primary: migration_Backup_product_attr1
            secondary: migration_Backup_product_attr2
            use_cases: migration_Backup_product_attr4
            languages: migration_Backup_product_attr5

        text:
            primary_secondary_separator: ': '
            use_cases_label: 'Einsatzgebiete: '
            languages_label: 'Sprachen: '
            custom_field_join_separator: ' | '

        description:
            empty_line_pattern: '/^[ \t]*\R/m'
            whitespace_cleanup_pattern: '/[ \t]{2,}/'
            max_length: 1500

        seo:
            relative_prefix: '/'

        highlight:
            available_label: Verfügbar
            unavailable_label: Nicht verfügbar
            product_number_prefix: 'Produktnummer: '

        image:
            missing_placeholder: no-image

        deduplication:
            separator: '|'
323
retriex_work/config/retriex/intent.yaml
Normal file
@@ -0,0 +1,323 @@
# Intent vocabulary and pattern configuration.
# Lists and thresholds mirror the previous PHP defaults exactly.
# Migrated config areas are YAML-only; remaining areas are migrated incrementally.
parameters:
    retriex.intent.commerce.config:
        strong_signals:
            - shop
            - alle
            - preis
            - kunde
            - online
            - produkt
            - artikel
            - sku
            - kaufen
            - kostet
            - kpstet
            - ksotet
            - suche
            - such
            - finde
            - finden
            - analysegerät
            - analysegeraet
            - messgerät
            - messgeraet
            - pockettester
            - pocket tester
            - handmessgerät
            - handmessgeraet
            - analysator
            - analyzer
            - puffer
            - kalibrierpuffer
            - kalibrierlösung
            - kalibrierloesung
            - kalibrierung
            - chemie
            - reagenz
            - reagenzien
            - verbrauchsmaterial
            - zubehör
            - zubehoer
            - ersatzteil
            - anschlusskabel
            - kabel
            - sensorkabel
            - elektrode
            - elektrodenkabel
        non_product_commerce_signals:
            - shop
            - alle
            - kunde
            - online
            - kaufen
            - kostet
            - kpstet
            - ksotet
            - suche
            - such
            - finde
            - finden
        advisory_signals:
            - passt
            - eignet
            - besser
            - besten
            - gut
            - gut für
            - gut fuer
            - passend für
            - passend fuer
            - geeignet
            - geeigent
            - empfiehl
            - empfehl
        advisory_product_selection_patterns:
            - '/\bmit\s+welche(?:m|n|r|s)?\s+(?:testomat(?:en)?|pockettester|pocket\s+tester|analysegerät|analysegeraet|messgerät|messgeraet|analysator|analyzer)\b.*\b(?:messen|messung|überwach(?:en|ung)?|ueberwach(?:en|ung)?)\b/u'
            - '/\bwelche(?:r|s|n|m)?\s+(?:testomat(?:en)?|pockettester|pocket\s+tester|analysegerät|analysegeraet|messgerät|messgeraet|analysator|analyzer)\b.*\b(?:kann|können|koennen|misst|messen|überwacht|ueberwacht|eignet|geeignet|passt|gut|empfehl)\b.*\b(?:messen|messung|überwach(?:en|ung)?|ueberwach(?:en|ung)?)\b/u'
            - '/\b(?:testomat(?:en)?|pockettester|pocket\s+tester|analysegerät|analysegeraet|messgerät|messgeraet|analysator|analyzer)\b.*\b(?:für|fuer)\b.*\b(?:messung|messen|überwachung|ueberwachung)\b/u'
            - '/\b(?:ich\s+)?(?:würde|wuerde|möchte|moechte|will|brauche|benötige|benoetige)\b.{0,80}\b(?:messen|messung|überwachen|ueberwachen|kontrollieren)\b/u'
            - '/\b(?:messen|messung|überwachen|ueberwachen|kontrollieren)\b.{0,80}\b(?:schwimmbad|pool|becken|wasseranalyse)\b/u'
        price_terms:
            - euro
            - €
            - eur
            - teuer
            - preis
            - kosten
            - kostet
            - kpstet
            - ksotet
        color_terms:
            - schwarz
            - weiß
            - weis
            - blau
            - grau
            - beige
            - rosa
            - pink
            - gruen
            - orange
            - braun
        size_token_terms:
            - xs
            - s
            - m
            - l
            - xl
            - xxl
            - xxxxl
        size_terms:
            - größe
            - groesse
            - grösse
        support_diagnostic_patterns:
            - '/\bfehler\b/u'
            - '/\bfehlercode\b/u'
            - '/\berror\b/u'
            - '/\bstörung\b/u'
            - '/\bstoerung\b/u'
            - '/\balarm\b/u'
            - '/\bstörungsmeldung\b/u'
            - '/\bstoerungsmeldung\b/u'
            - '/\bmeldung\b/u'
            - '/\bwarnung\b/u'
            - '/\bwarncode\b/u'
            - '/\bcode\b/u'
            - '/\bwas bedeutet\b/u'
            - '/\bwarum\b/u'
            - '/\bblinkt\b/u'
            - '/\bzeigt\b/u'
            - '/\bzeigt an\b/u'
            - '/\bursache\b/u'
            - '/\bdiagnose\b/u'
            - '/\bservicefall\b/u'
            - '/\bproblem\b/u'
            - '/\bstörung beheben\b/u'
            - '/\bstoerung beheben\b/u'
            - '/\be\d{1,3}\b/u'
        explicit_commerce_intent_patterns:
            - '/\bshop\b/u'
            - '/\bpreis\b/u'
            - '/\bkosten\b/u'
            - '/\bkostet\b/u'
            - '/\bkpstet\b/u'
            - '/\bksotet\b/u'
            - '/\bkaufen\b/u'
            - '/\bbestellen\b/u'
            - '/\bprodukt\b/u'
            - '/\bartikel\b/u'
            - '/\bsku\b/u'
            - '/\bonline\b/u'
            - '/\bchemie\b/u'
            - '/\breagenz(?:ien)?\b/u'
            - '/\bverbrauchsmaterial(?:ien)?\b/u'
            - '/\bzubehör\b/u'
            - '/\bzubehoer\b/u'
            - '/\bersatzteil(?:e)?\b/u'
            - '/\banschlusskabel\b/u'
            - '/\bkabel\b/u'
            - '/\bsensorkabel\b/u'
            - '/\belektrodenkabel\b/u'
        technical_factual_knowledge:
            signal_label: technical_factual_knowledge_query
            question_marker_patterns:
                - '/\bwas\s+ist\b/u'
                - '/\bwelche?r?s?\b/u'
                - '/\bwie\s+(hoch|niedrig|klein|gross|groß)\b/u'
                - '/\bniedrigste[rsn]?\b/u'
                - '/\bkleinste[rsn]?\b/u'
                - '/\bhöchste[rsn]?\b/u'
                - '/\bhoechste[rsn]?\b/u'
            fact_patterns:
                - '/\bgrenzwert(?:e|en|es)?\b/u'
                - '/\bmessbereich(?:e|en|s)?\b/u'
                - '/\bwasserhärte\b/u'
                - '/\bwasserhaerte\b/u'
                - '/\bresthärte\b/u'
                - '/\bresthaerte\b/u'
                - '/\bgesamthärte\b/u'
                - '/\bgesamthaerte\b/u'
                - '/\bauflösung\b/u'
                - '/\baufloesung\b/u'
                - '/\bindikator(?:en|s)?\b/u'
                - '/\btestomat(?:en|s)?\b/u'
                - '/\büberwach(?:t|en|ung)\b/u'
                - '/\bueberwach(?:t|en|ung)\b/u'
                - '/\bmess(?:en|ung|bar|wert)\b/u'
        patterns:
            sku_like: '/\b\d{4,10}\b/u'
            price_value_template: '/\b\d+(?:[.,]\d+)?\s*(?:{price_pattern})\b/u'
            size_extraction_template: '/\b(?:{size_pattern})\s*([a-z0-9.-]+)\b/u'
            size_value_template: '/\b(?:{size_pattern})\s*[a-z0-9.-]+\b/u'
            size_token_value_template: '/\b(?:{size_token_pattern})\b/u'
            color_value_template: '/\b(?:{color_pattern})\b/u'
            model_like_product: '/\b[a-zäöüß][a-zäöüß®\-]*(?:\s+[a-zäöüß][a-zäöüß®\-]*){0,2}\s+\d{2,5}[a-z0-9\-]*\b/u'
        labels:
            support_or_diagnostic_signal: support_or_diagnostic
            sku_signal: sku
            price_signal: price
            size_signal: size
            size_token_signal: size_token
            color_signal: color
            advisory_signal_prefix: 'advisory:'
            advisory_product_selection_signal: advisory_product_selection
            model_like_product_signal: model_like_product
        scores:
            product_search_min_score: 3
            advisory_product_search_min_score: 2
            strong_signal_score: 3
            sku_signal_score: 2
            price_signal_score: 2
            size_signal_score: 2
            size_token_signal_score: 1
            color_signal_score: 1
            advisory_signal_score: 1
            advisory_product_selection_signal_score: 3
            model_like_product_signal_score: 3

    retriex.intent.catalog.config:
        min_score: 0.72
        ambiguity_delta: 0.02
        intent_search_limit: 6
        list_search_limit: 3
        min_allowed_score: 0.0
        max_allowed_score: 1.0

    retriex.intent.light.config:
        list_threshold: 4
        quantity_words:
            - alle
            - sämtliche
            - saemtliche
            - mehrere
            - verschiedene
            - einige
            - viele
            - optionen
            - möglichkeiten
            - moeglichkeiten
            - varianten
            - arten
            - modelle
            - funktionen
            - punkte
            - schritte
            - kategorien
            - übersicht
            - uebersicht
        strong_patterns:
            - '/\bliste(n)?\b/u'
            - '/\bauflisten\b/u'
            - '/\baufz(a|ä)hl(en)?\b/u'
            - '/\bnenn(e)?\b/u'
            - '/\bzeig(e)?\b/u'
            - '/\bwelche\s+sind\b/u'
            - '/\bwelche\s+gibt\s+es\b/u'
            - '/\bwas\s+sind\b/u'
            - '/\bwie\s+viele\b/u'
            - '/\branking\b/u'
            - '/\btop\s*\d+\b/u'

    retriex.intent.sales.config:
        dominance_delta: 2
        min_score_threshold: 3
        sales_signals:
            - preis
            - preise
            - kosten
            - lizenz
            - lizenzmodell
            - tarif
            - tarife
            - gebuehr
            - gebühr
            - monatlich
            - jaehrlich
            - jährlich
            - abo
            - subscription
        comparison_signals:
            - '/\bvergleich(en)?\b/u'
            - '/\bvs\b/u'
            - '/\bgegenueber\b/u'
            - '/\balternative(n)?\b/u'
            - '/\bunterschied(e)?\b/u'
            - '/\bbesser\b/u'
        objection_signals:
            - problem
            - risiko
            - nachteil
            - datenschutz
            - dsgvo
            - sicherheit
            - compliance
            - kritik
            - zweifel
            - unsicher
        implementation_signals:
            - implementierung
            - implementieren
            - integration
            - integrieren
            - einführung
            - einfuehrung
            - aufwand
            - setup
            - rollout
            - migration
            - installation
            - api
            - schnittstelle
        roi_signals:
            - roi
            - rentabilitaet
            - rentabilität
            - business case
            - einsparung
            - kosten senken
            - umsatz steigern
            - effizienz steigern