last step
This commit is contained in:
96
RETRIEX_PATCH_4_QUERY_SEARCHREPAIR_YAML_ONLY_README.md
Normal file
96
RETRIEX_PATCH_4_QUERY_SEARCHREPAIR_YAML_ONLY_README.md
Normal file
@@ -0,0 +1,96 @@
|
|||||||
|
# RetrieX Patch 4: Query Enrichment and Search Repair YAML-only
|
||||||
|
|
||||||
|
This patch continues the YAML-as-source-of-truth migration.
|
||||||
|
|
||||||
|
## Scope
|
||||||
|
|
||||||
|
Changed areas only:
|
||||||
|
|
||||||
|
- `src/Config/QueryEnricherConfig.php`
|
||||||
|
- `src/Config/SearchRepairConfig.php`
|
||||||
|
- `config/retriex/query_enrichment.yaml`
|
||||||
|
- `config/retriex/search_repair.yaml`
|
||||||
|
- `config/retriex/vocabulary.yaml`
|
||||||
|
- `config/services.yaml`
|
||||||
|
|
||||||
|
No retrieval scoring, PromptBuilder, ShopService matching, AgentRunner, SSE, or answer-generation behavior was intentionally changed.
|
||||||
|
|
||||||
|
## QueryEnricherConfig
|
||||||
|
|
||||||
|
Removed the PHP fallback mapping `DEFAULT_ENRICH_QUERY_LIST`.
|
||||||
|
|
||||||
|
The existing values are now required in:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
config/retriex/query_enrichment.yaml
|
||||||
|
parameters:
|
||||||
|
retriex.query_enrichment.config:
|
||||||
|
max_expansions: 4
|
||||||
|
rules: ...
|
||||||
|
```
|
||||||
|
|
||||||
|
Missing, invalid, or empty YAML now fails closed via `InvalidArgumentException` instead of silently falling back to PHP defaults.
|
||||||
|
|
||||||
|
## SearchRepairConfig
|
||||||
|
|
||||||
|
Removed PHP fallback constants and hardcoded fallback values for:
|
||||||
|
|
||||||
|
- generic candidate tokens
|
||||||
|
- accessory candidate terms
|
||||||
|
- accessory/bundle terms
|
||||||
|
- specificity boost terms
|
||||||
|
- requested accessory-code behavior
|
||||||
|
- model-candidate patterns
|
||||||
|
- search-repair scoring values
|
||||||
|
- normalization patterns
|
||||||
|
- trim character set
|
||||||
|
- product-key separator
|
||||||
|
- top-product log limit
|
||||||
|
|
||||||
|
The active search-repair settings now live in:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
config/retriex/search_repair.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
Vocabulary-backed search-repair term views remain in:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
config/retriex/vocabulary.yaml
|
||||||
|
parameters:
|
||||||
|
retriex.vocabulary.config:
|
||||||
|
views:
|
||||||
|
search_repair: ...
|
||||||
|
```
|
||||||
|
|
||||||
|
`SearchRepairConfig` now resolves term lists from explicit per-service YAML keys first, otherwise from the configured vocabulary view. If neither exists, it fails closed.
|
||||||
|
|
||||||
|
## Validation performed in the artifact workspace
|
||||||
|
|
||||||
|
PHP syntax checks passed for:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
php -l src/Config/QueryEnricherConfig.php
|
||||||
|
php -l src/Config/SearchRepairConfig.php
|
||||||
|
php -l src/Config/RetriexEffectiveConfigProvider.php
|
||||||
|
```
|
||||||
|
|
||||||
|
Full Symfony commands were not executed in this ZIP workspace because `vendor/` is not included.
|
||||||
|
|
||||||
|
Recommended runtime checks after applying the patch:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
php bin/console mto:agent:config:validate
|
||||||
|
php bin/console mto:agent:config:dump-effective --json
|
||||||
|
php bin/console mto:agent:config:audit-source --details
|
||||||
|
php bin/console mto:agent:regression:test
|
||||||
|
```
|
||||||
|
|
||||||
|
Regression focus:
|
||||||
|
|
||||||
|
- Testomat 808 / 0,02 °dH
|
||||||
|
- Indikatortyp 300 follow-up
|
||||||
|
- Indikator price follow-up
|
||||||
|
- `welcher pockettester ist für Redox messung gut` → `suche im shop`
|
||||||
|
- product-selection shop fallback for free chlorine
|
||||||
|
- SSE completion watchdog behavior
|
||||||
28
RETRIEX_PATCH_4_TEMPLATE_VALIDATION_HOTFIX_README.md
Normal file
28
RETRIEX_PATCH_4_TEMPLATE_VALIDATION_HOTFIX_README.md
Normal file
@@ -0,0 +1,28 @@
|
|||||||
|
# RetrieX Patch 4 Hotfix: SearchRepair pattern-template validation
|
||||||
|
|
||||||
|
This hotfix fixes an overly strict placeholder validation in `SearchRepairConfig`.
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
Patch 4 moved SearchRepair patterns to YAML. The new validator rejected every rendered template that still contained a `{` character. This also rejected valid regex quantifiers such as `\d{1,5}`.
|
||||||
|
|
||||||
|
Symfony error example:
|
||||||
|
|
||||||
|
```text
|
||||||
|
RetrieX search repair pattern template "patterns.accessory_candidate_template" contains unresolved placeholders.
|
||||||
|
```
|
||||||
|
|
||||||
|
## Fix
|
||||||
|
|
||||||
|
The validator now only treats named placeholders like `{terms}` or `{model}` as unresolved. Regex quantifiers such as `{1,5}` remain valid.
|
||||||
|
|
||||||
|
No retrieval, prompt, shop, scoring, or SearchRepair behavior is changed.
|
||||||
|
|
||||||
|
## After applying
|
||||||
|
|
||||||
|
```bash
|
||||||
|
php bin/console cache:clear
|
||||||
|
php bin/console mto:agent:config:validate
|
||||||
|
php bin/console mto:agent:config:audit-source --details
|
||||||
|
php bin/console mto:agent:regression:test
|
||||||
|
```
|
||||||
72
RETRIEX_PATCH_5_COMMERCE_QUERY_YAML_ONLY_README.md
Normal file
72
RETRIEX_PATCH_5_COMMERCE_QUERY_YAML_ONLY_README.md
Normal file
@@ -0,0 +1,72 @@
|
|||||||
|
# RetrieX Patch 5: CommerceQueryParser YAML-only
|
||||||
|
|
||||||
|
Basis: aktualisierte `rag-inprogress.zip` nach Patch 4.
|
||||||
|
|
||||||
|
## Ziel
|
||||||
|
|
||||||
|
`CommerceQueryParserConfig` darf keine fachlichen PHP-Defaults mehr enthalten. Alle Commerce-Query-Parser-Werte liegen jetzt in YAML unter:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
retriex.commerce_query.config
|
||||||
|
```
|
||||||
|
|
||||||
|
## Geaenderte Dateien
|
||||||
|
|
||||||
|
- `config/retriex/commerce.yaml`
|
||||||
|
- `config/retriex/vocabulary.yaml`
|
||||||
|
- `config/services.yaml`
|
||||||
|
- `src/Config/CommerceQueryParserConfig.php`
|
||||||
|
- `src/Config/RetriexEffectiveConfigProvider.php`
|
||||||
|
|
||||||
|
## Inhalt
|
||||||
|
|
||||||
|
Nach YAML verschoben wurden:
|
||||||
|
|
||||||
|
- bekannte Marken
|
||||||
|
- zu entfernende Commerce-Phrasen
|
||||||
|
- Filter-/Stop-Tokens fuer Shop-Suchtexte
|
||||||
|
- Suchtoken-Korrekturen
|
||||||
|
- Canonical Maps
|
||||||
|
- semantische Shop-Suchtokens
|
||||||
|
- Normalisierung fuer `€` -> ` euro `
|
||||||
|
- Trim-Zeichen fuer Query-Texte
|
||||||
|
- Limits fuer Tokenlaengen und Suchtextaufbau
|
||||||
|
- Regexe fuer History-Kontext, Preise, Modell-/Zubehoer-Erkennung, Messwerte
|
||||||
|
- Regex-Templates fuer exakte Token-Entfernung und Brand/Model-Erkennung
|
||||||
|
|
||||||
|
## Wichtig
|
||||||
|
|
||||||
|
- `CommerceQueryParserConfig` enthaelt keine `private const` Defaults mehr.
|
||||||
|
- Fehlende Pflichtwerte fallen nicht mehr still auf PHP zurueck, sondern erzeugen einen `InvalidArgumentException`.
|
||||||
|
- Die alten `commerce_query`-Vocabulary-Views/Maps wurden aus `vocabulary.yaml` entfernt, damit nicht zwei YAML-Stellen scheinbar denselben Parser steuern.
|
||||||
|
- `config/services.yaml` injiziert nur noch `%retriex.commerce_query.config%`; die alte Vocabulary-Abhaengigkeit im Parser-Config-Service ist entfernt.
|
||||||
|
|
||||||
|
## Nicht geaendert
|
||||||
|
|
||||||
|
- Retrieval-Scoring
|
||||||
|
- PromptBuilder
|
||||||
|
- AgentRunner
|
||||||
|
- Shop-Matching
|
||||||
|
- SSE/Job-Logik
|
||||||
|
- CommerceIntentConfig
|
||||||
|
|
||||||
|
## Lokale Pruefung nach Einspielen
|
||||||
|
|
||||||
|
```bash
|
||||||
|
php bin/console cache:clear
|
||||||
|
php bin/console mto:agent:config:validate
|
||||||
|
php bin/console mto:agent:config:audit-source --details
|
||||||
|
php bin/console mto:agent:regression:test
|
||||||
|
```
|
||||||
|
|
||||||
|
Besonders pruefen:
|
||||||
|
|
||||||
|
- Testomat 808 / 0,02 Grad dH
|
||||||
|
- Folgefrage Indikatortyp 300
|
||||||
|
- Preisfrage zum Indikator
|
||||||
|
- `welcher pockettester ist fuer Redox messung gut` -> `suche im shop`
|
||||||
|
- `mit welchem testomat kann ich freies chlor messen`
|
||||||
|
|
||||||
|
## Hinweis
|
||||||
|
|
||||||
|
Im ZIP-Arbeitsverzeichnis konnte ich Symfony-Kommandos nicht ausfuehren, weil `vendor/` nicht enthalten ist. PHP-Syntax der geaenderten PHP-Dateien wurde mit `php -l` geprueft.
|
||||||
@@ -11,6 +11,215 @@ parameters:
|
|||||||
retriex.commerce.search_repair.max_queries: 2
|
retriex.commerce.search_repair.max_queries: 2
|
||||||
retriex.commerce.search_repair.min_primary_results_without_repair: 2
|
retriex.commerce.search_repair.min_primary_results_without_repair: 2
|
||||||
|
|
||||||
|
# Commerce query parser configuration.
|
||||||
|
# YAML is the only operative source of truth; PHP must not contain parser defaults.
|
||||||
|
retriex.commerce_query.config:
|
||||||
|
known_brands:
|
||||||
|
- heyl
|
||||||
|
- horiba
|
||||||
|
- neomeris
|
||||||
|
|
||||||
|
phrases_to_remove:
|
||||||
|
- ich suche
|
||||||
|
- suche
|
||||||
|
- habt ihr
|
||||||
|
- gibt es
|
||||||
|
- gebe mir
|
||||||
|
- gib mir
|
||||||
|
- zeige mir
|
||||||
|
- welches gerät
|
||||||
|
- welche gerät
|
||||||
|
- welches modell
|
||||||
|
- welches ist besser
|
||||||
|
- welches ist am besten
|
||||||
|
- alternative
|
||||||
|
- alternativen
|
||||||
|
- unter anderem
|
||||||
|
- u a
|
||||||
|
- welche
|
||||||
|
- welcher
|
||||||
|
- welches
|
||||||
|
- welchen
|
||||||
|
- sind
|
||||||
|
- ist
|
||||||
|
- geeignet
|
||||||
|
- geeigent
|
||||||
|
- verfügbarkeit
|
||||||
|
- verfuegbarkeit
|
||||||
|
|
||||||
|
filter_search_tokens:
|
||||||
|
- auch
|
||||||
|
- noch
|
||||||
|
- nochmal
|
||||||
|
- zusätzlich
|
||||||
|
- dazu
|
||||||
|
- davon
|
||||||
|
- stattdessen
|
||||||
|
- bitte
|
||||||
|
- gern
|
||||||
|
- gerne
|
||||||
|
- zeige
|
||||||
|
- zeig
|
||||||
|
- such
|
||||||
|
- suche
|
||||||
|
- finde
|
||||||
|
- find
|
||||||
|
- mir
|
||||||
|
- mal
|
||||||
|
- von
|
||||||
|
- im
|
||||||
|
- in
|
||||||
|
- für
|
||||||
|
- fuer
|
||||||
|
- welche
|
||||||
|
- welcher
|
||||||
|
- welches
|
||||||
|
- welchen
|
||||||
|
- sind
|
||||||
|
- ist
|
||||||
|
- geeignet
|
||||||
|
- geeigent
|
||||||
|
- verfügbarkeit
|
||||||
|
- verfuegbarkeit
|
||||||
|
- prüfe
|
||||||
|
- pruefe
|
||||||
|
- den
|
||||||
|
- die
|
||||||
|
- das
|
||||||
|
- der
|
||||||
|
- dem
|
||||||
|
- des
|
||||||
|
- und
|
||||||
|
- oder
|
||||||
|
- sowie
|
||||||
|
- seine
|
||||||
|
- seinen
|
||||||
|
- seiner
|
||||||
|
- seinem
|
||||||
|
- seines
|
||||||
|
- siene
|
||||||
|
- sienen
|
||||||
|
- siener
|
||||||
|
- sienem
|
||||||
|
- sienes
|
||||||
|
- gebe
|
||||||
|
- gib
|
||||||
|
- nenne
|
||||||
|
- nenn
|
||||||
|
- preis
|
||||||
|
- preise
|
||||||
|
- preisen
|
||||||
|
- kostet
|
||||||
|
- kosten
|
||||||
|
- ua
|
||||||
|
- also
|
||||||
|
- gut
|
||||||
|
- gute
|
||||||
|
- guten
|
||||||
|
- guter
|
||||||
|
- gutes
|
||||||
|
- passen
|
||||||
|
- passend
|
||||||
|
|
||||||
|
search_token_corrections:
|
||||||
|
siene: seine
|
||||||
|
sienen: seinen
|
||||||
|
siener: seiner
|
||||||
|
sienem: seinem
|
||||||
|
sienes: seines
|
||||||
|
indicatoren: indikatoren
|
||||||
|
|
||||||
|
search_token_canonical_map:
|
||||||
|
indikatoren: indikator
|
||||||
|
indicators: indikator
|
||||||
|
indicator: indikator
|
||||||
|
reagenzien: reagenz
|
||||||
|
reagents: reagenz
|
||||||
|
reagent: reagenz
|
||||||
|
produkte: produkt
|
||||||
|
|
||||||
|
semantic_shop_search_tokens:
|
||||||
|
- indikator
|
||||||
|
- indicator
|
||||||
|
- reagenz
|
||||||
|
- reagent
|
||||||
|
- zubehör
|
||||||
|
- zubehor
|
||||||
|
- ersatzteil
|
||||||
|
- verbrauchsmaterial
|
||||||
|
- chemie
|
||||||
|
- indikatorchemie
|
||||||
|
- reagenzchemie
|
||||||
|
- kit
|
||||||
|
- set
|
||||||
|
- filter
|
||||||
|
- pumpe
|
||||||
|
- pumpenkopf
|
||||||
|
- motorblock
|
||||||
|
- lösung
|
||||||
|
- loesung
|
||||||
|
- solution
|
||||||
|
- teststreifen
|
||||||
|
- gerät
|
||||||
|
- geraet
|
||||||
|
- messgerät
|
||||||
|
- messgeraet
|
||||||
|
- analysegerät
|
||||||
|
- analysegeraet
|
||||||
|
- analysator
|
||||||
|
- monitor
|
||||||
|
- controller
|
||||||
|
- system
|
||||||
|
|
||||||
|
normalization:
|
||||||
|
search: ['€']
|
||||||
|
replace: [' euro ']
|
||||||
|
|
||||||
|
text:
|
||||||
|
trim_characters:
|
||||||
|
- space
|
||||||
|
- tab
|
||||||
|
- lf
|
||||||
|
- cr
|
||||||
|
- nul
|
||||||
|
- vertical_tab
|
||||||
|
- '-'
|
||||||
|
- '.'
|
||||||
|
- ','
|
||||||
|
|
||||||
|
limits:
|
||||||
|
min_search_token_length: 1
|
||||||
|
min_direct_product_token_length: 1
|
||||||
|
direct_product_max_tokens: 4
|
||||||
|
model_context_token_window: 4
|
||||||
|
min_meaningful_alpha_token_length: 2
|
||||||
|
max_shop_search_tokens: 6
|
||||||
|
|
||||||
|
patterns:
|
||||||
|
history_context: 'chat|auch|noch|nochmal|zusätzlich|dazu|davon|stattdessen|alternative|alternativen|größer|groesser|kleiner|gleich(?:e|en|er|es)?|derselbe|dieselbe|dasselbe|wie oben|wie zuvor|wie gehabt'
|
||||||
|
history_context_value_template: '/\b({fragment})\b/u'
|
||||||
|
prompt_sanitize: '/[^\p{L}\p{N}\s.,\-]/u'
|
||||||
|
whitespace_collapse: '/\s+/u'
|
||||||
|
whitespace_split: '/\s+/u'
|
||||||
|
history_question: '/^Question:\s*(.+)$/m'
|
||||||
|
price_between: '/\bzwischen\s+(\d+(?:[.,]\d+)?)\s+und\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
|
||||||
|
price_max: '/\b(?:unter|bis|max(?:imal)?)\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
|
||||||
|
price_min: '/\b(?:ab|mindestens|min)\s+(\d+(?:[.,]\d+)?)\s+euro\b/u'
|
||||||
|
price_removal_between: '/\bzwischen\s+\d+(?:[.,]\d+)?\s+und\s+\d+(?:[.,]\d+)?\s*euro\b/u'
|
||||||
|
price_removal_minmax: '/\b(?:unter|bis|max(?:imal)?|ab|mindestens|min)\s+\d+(?:[.,]\d+)?\s*euro\b/u'
|
||||||
|
price_removal_intent_template: '/\b(?:{price_pattern})\b/u'
|
||||||
|
direct_product_digit: '/\d/u'
|
||||||
|
model_like: '/\b[a-zäöüß][a-zäöüß®\-]*(?:\s+[a-zäöüß][a-zäöüß®\-]*){0,2}\s+\d{2,5}[a-z0-9\-]*\b/u'
|
||||||
|
accessory_like: '/\b(?:indikator|indicator|reagenz|reagent|kit|set)\s+\d{1,5}[a-z0-9\-]*\b/u'
|
||||||
|
contains_digit: '/\d/u'
|
||||||
|
model_number_token: '/^(?:\d{2,5}[a-z0-9\-]*|[a-z]{1,6}\d{1,5}[a-z0-9\-]*)$/u'
|
||||||
|
model_context_token: '/^[\p{L}][\p{L}0-9®\-]{2,}$/u'
|
||||||
|
model_suffix_token: '/^[a-z]{1,4}\d{0,3}$/u'
|
||||||
|
instruction_or_presentation_token: '/^(?:zeig(?:e)?|such(?:e)?|find(?:e)?|gib|gebe|nenn(?:e)?|liefer(?:e)?|erstelle?|mach(?:e)?|brauch(?:e)?|will|möchte|moechte|hätte|haette|kannst|bitte|mal|alle|alles|komplett|vollständig|vollstaendig|gesamt|ganze|ganzen|liste|listung|auflistung|tabelle|tabellarisch|übersicht|uebersicht|anzeigen?|ausgeben?|darstellen?|antwort(?:e)?|erklär(?:e)?|erklaer(?:e)?|info|infos|informationen|dazu|hierzu|damit|davon|an|als|mit|ohne|inkl|inklusive|also|gut|gute|guten|guter|gutes|passend|passen)$/u'
|
||||||
|
measurement_value_token: '/^\d+[.,]\d+$/u'
|
||||||
|
exact_token_removal_template: '/\b{token}\b/u'
|
||||||
|
brand_part_of_model_template: '/\b{brand}\s+\d{2,5}[a-z0-9\-]*\b/u'
|
||||||
|
|
||||||
# Shop matching and presentation configuration.
|
# Shop matching and presentation configuration.
|
||||||
# Defaults are intentionally identical to the previous PHP values.
|
# Defaults are intentionally identical to the previous PHP values.
|
||||||
retriex.shop_matching.config:
|
retriex.shop_matching.config:
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
# Query enrichment vocabulary.
|
# Query enrichment vocabulary.
|
||||||
# Defaults are intentionally identical to the previous PHP mapping.
|
# YAML is the source of truth; QueryEnricherConfig intentionally contains no PHP defaults.
|
||||||
parameters:
|
parameters:
|
||||||
retriex.query_enrichment.config:
|
retriex.query_enrichment.config:
|
||||||
max_expansions: 4
|
max_expansions: 4
|
||||||
|
|||||||
66
config/retriex/search_repair.yaml
Normal file
66
config/retriex/search_repair.yaml
Normal file
@@ -0,0 +1,66 @@
|
|||||||
|
# Shop search-repair configuration.
|
||||||
|
# YAML is the source of truth; SearchRepairConfig intentionally contains no PHP defaults.
|
||||||
|
parameters:
|
||||||
|
retriex.search_repair.config:
|
||||||
|
strict_requested_accessory_code_repair: true
|
||||||
|
prefer_prompt_anchored_model_for_requested_accessory_code: true
|
||||||
|
|
||||||
|
requested_accessory_code_fallback_query_templates:
|
||||||
|
- '{term} {code}'
|
||||||
|
requested_accessory_code_fallback_terms:
|
||||||
|
- indikatortyp
|
||||||
|
- indikator
|
||||||
|
- indicator
|
||||||
|
- reagenz
|
||||||
|
- reagent
|
||||||
|
requested_accessory_code_context_prefix_terms:
|
||||||
|
- indikatortyp
|
||||||
|
- indikator
|
||||||
|
- indicator
|
||||||
|
- reagenz
|
||||||
|
- reagent
|
||||||
|
requested_accessory_code_proximity_window: 1600
|
||||||
|
|
||||||
|
specific_model_candidate_patterns:
|
||||||
|
- '/\b([A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*(?:\s+[A-Za-zÄÖÜäöüß0-9][A-Za-zÄÖÜäöüß0-9®\-]*){0,3}\s+\d{2,5}(?:\s+[A-ZÄÖÜ]{1,8})?)\b/u'
|
||||||
|
model_candidate_exclude_terms:
|
||||||
|
- indikatortyp
|
||||||
|
- indikator
|
||||||
|
- indicator
|
||||||
|
- reagenz
|
||||||
|
- reagent
|
||||||
|
- verfügbarkeit
|
||||||
|
- verfuegbarkeit
|
||||||
|
- shop
|
||||||
|
|
||||||
|
limits:
|
||||||
|
top_product_log_limit: 3
|
||||||
|
|
||||||
|
# Character codes preserve the previous PHP trim charlist:
|
||||||
|
# space, tab, LF, CR, NUL, vertical tab, double quote, single quote,
|
||||||
|
# backtick, dot, comma, semicolon, colon, hyphen.
|
||||||
|
sanitize_trim_character_codes: [32, 9, 10, 13, 0, 11, 34, 39, 96, 46, 44, 59, 58, 45]
|
||||||
|
product_key_separator: '|'
|
||||||
|
|
||||||
|
scores:
|
||||||
|
candidate_digit: 4
|
||||||
|
candidate_word_count_cap: 4
|
||||||
|
specificity_boost: 3
|
||||||
|
primary_query_overlap_threshold: 0.9
|
||||||
|
prompt_match_weight: 3
|
||||||
|
primary_query_match_weight: 2
|
||||||
|
repair_signal_match_weight: 4
|
||||||
|
primary_result_order_bonus: 1
|
||||||
|
token_intersection_score: 2
|
||||||
|
numeric_token_match_score: 4
|
||||||
|
|
||||||
|
patterns:
|
||||||
|
model_candidate: '/\b([A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*(?:\s+[A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*){0,2}\s+\d{2,5}[A-Za-z0-9\-]*)\b/u'
|
||||||
|
accessory_candidate_template: '/\b((?:{terms})\s+\d{1,5}[A-Za-z0-9\-]*)\b/iu'
|
||||||
|
requested_accessory_code: '/\b(?:indikator(?:typ)?|indicator(?:\s*type)?|reagenz|reagent)\s*([A-Za-z]{0,3}\s*\d{1,5}[A-Za-z0-9\-]*)\b/iu'
|
||||||
|
accessory_or_bundle_template: '/\b({terms})\b/iu'
|
||||||
|
model_like: '/\b[A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*(?:\s+[A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*){0,2}\s+\d{2,5}[A-Za-z0-9\-]*\b/u'
|
||||||
|
specificity_boost_template: '/\b(?:{terms})\b/iu'
|
||||||
|
contains_digit: '/\d/u'
|
||||||
|
whitespace_collapse: '/\s+/u'
|
||||||
|
tokenize_cleanup: '/[^\p{L}\p{N}\s\-]+/u'
|
||||||
@@ -1,37 +1,6 @@
|
|||||||
# Central domain vocabulary for RetrieX.
|
# Central domain vocabulary for RetrieX.
|
||||||
# Views preserve the previous 1.4.2-tuned ordering exactly; per-service configs may still override them.
|
# Views preserve the previous 1.4.2-tuned ordering exactly; per-service configs may still override them.
|
||||||
parameters:
|
parameters:
|
||||||
retriex.commerce_query.config: {}
|
|
||||||
retriex.search_repair.config:
|
|
||||||
strict_requested_accessory_code_repair: true
|
|
||||||
prefer_prompt_anchored_model_for_requested_accessory_code: true
|
|
||||||
requested_accessory_code_pattern: '/\b(?:indikator(?:typ)?|indicator(?:\s*type)?|reagenz|reagent)\s*([A-Za-z]{0,3}\s*\d{1,5}[A-Za-z0-9\-]*)\b/iu'
|
|
||||||
requested_accessory_code_fallback_query_templates:
|
|
||||||
- '{term} {code}'
|
|
||||||
requested_accessory_code_fallback_terms:
|
|
||||||
- indikatortyp
|
|
||||||
- indikator
|
|
||||||
- indicator
|
|
||||||
- reagenz
|
|
||||||
- reagent
|
|
||||||
requested_accessory_code_context_prefix_terms:
|
|
||||||
- indikatortyp
|
|
||||||
- indikator
|
|
||||||
- indicator
|
|
||||||
- reagenz
|
|
||||||
- reagent
|
|
||||||
requested_accessory_code_proximity_window: 1600
|
|
||||||
specific_model_candidate_patterns:
|
|
||||||
- '/\b([A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*(?:\s+[A-Za-zÄÖÜäöüß0-9][A-Za-zÄÖÜäöüß0-9®\-]*){0,3}\s+\d{2,5}(?:\s+[A-ZÄÖÜ]{1,8})?)\b/u'
|
|
||||||
model_candidate_exclude_terms:
|
|
||||||
- indikatortyp
|
|
||||||
- indikator
|
|
||||||
- indicator
|
|
||||||
- reagenz
|
|
||||||
- reagent
|
|
||||||
- verfügbarkeit
|
|
||||||
- verfuegbarkeit
|
|
||||||
- shop
|
|
||||||
retriex.vocabulary.config:
|
retriex.vocabulary.config:
|
||||||
classes:
|
classes:
|
||||||
device:
|
device:
|
||||||
@@ -231,147 +200,6 @@ parameters:
|
|||||||
- filter
|
- filter
|
||||||
- pumpenkopf
|
- pumpenkopf
|
||||||
- motorblock
|
- motorblock
|
||||||
commerce_query:
|
|
||||||
known_brands:
|
|
||||||
add:
|
|
||||||
- heyl
|
|
||||||
- horiba
|
|
||||||
- neomeris
|
|
||||||
phrases_to_remove:
|
|
||||||
add:
|
|
||||||
- ich suche
|
|
||||||
- suche
|
|
||||||
- habt ihr
|
|
||||||
- gibt es
|
|
||||||
- gebe mir
|
|
||||||
- gib mir
|
|
||||||
- zeige mir
|
|
||||||
- welches gerät
|
|
||||||
- welche gerät
|
|
||||||
- welches modell
|
|
||||||
- welches ist besser
|
|
||||||
- welches ist am besten
|
|
||||||
- alternative
|
|
||||||
- alternativen
|
|
||||||
- unter anderem
|
|
||||||
- u a
|
|
||||||
- welche
|
|
||||||
- welcher
|
|
||||||
- welches
|
|
||||||
- welchen
|
|
||||||
- sind
|
|
||||||
- ist
|
|
||||||
- geeignet
|
|
||||||
- geeigent
|
|
||||||
- verfügbarkeit
|
|
||||||
- verfuegbarkeit
|
|
||||||
filter_search_tokens:
|
|
||||||
add:
|
|
||||||
- auch
|
|
||||||
- noch
|
|
||||||
- nochmal
|
|
||||||
- zusätzlich
|
|
||||||
- dazu
|
|
||||||
- davon
|
|
||||||
- stattdessen
|
|
||||||
- bitte
|
|
||||||
- gern
|
|
||||||
- gerne
|
|
||||||
- zeige
|
|
||||||
- zeig
|
|
||||||
- such
|
|
||||||
- suche
|
|
||||||
- finde
|
|
||||||
- find
|
|
||||||
- mir
|
|
||||||
- mal
|
|
||||||
- von
|
|
||||||
- im
|
|
||||||
- in
|
|
||||||
- für
|
|
||||||
- fuer
|
|
||||||
- welche
|
|
||||||
- welcher
|
|
||||||
- welches
|
|
||||||
- welchen
|
|
||||||
- sind
|
|
||||||
- ist
|
|
||||||
- geeignet
|
|
||||||
- geeigent
|
|
||||||
- verfügbarkeit
|
|
||||||
- verfuegbarkeit
|
|
||||||
- prüfe
|
|
||||||
- pruefe
|
|
||||||
- den
|
|
||||||
- die
|
|
||||||
- das
|
|
||||||
- der
|
|
||||||
- dem
|
|
||||||
- des
|
|
||||||
- und
|
|
||||||
- oder
|
|
||||||
- sowie
|
|
||||||
- seine
|
|
||||||
- seinen
|
|
||||||
- seiner
|
|
||||||
- seinem
|
|
||||||
- seines
|
|
||||||
- siene
|
|
||||||
- sienen
|
|
||||||
- siener
|
|
||||||
- sienem
|
|
||||||
- sienes
|
|
||||||
- gebe
|
|
||||||
- gib
|
|
||||||
- nenne
|
|
||||||
- nenn
|
|
||||||
- preis
|
|
||||||
- preise
|
|
||||||
- preisen
|
|
||||||
- kostet
|
|
||||||
- kosten
|
|
||||||
- ua
|
|
||||||
- also
|
|
||||||
- gut
|
|
||||||
- gute
|
|
||||||
- guten
|
|
||||||
- guter
|
|
||||||
- gutes
|
|
||||||
- passen
|
|
||||||
- passend
|
|
||||||
semantic_shop_search_tokens:
|
|
||||||
add:
|
|
||||||
- indikator
|
|
||||||
- indicator
|
|
||||||
- reagenz
|
|
||||||
- reagent
|
|
||||||
- zubehör
|
|
||||||
- zubehor
|
|
||||||
- ersatzteil
|
|
||||||
- verbrauchsmaterial
|
|
||||||
- chemie
|
|
||||||
- indikatorchemie
|
|
||||||
- reagenzchemie
|
|
||||||
- kit
|
|
||||||
- set
|
|
||||||
- filter
|
|
||||||
- pumpe
|
|
||||||
- pumpenkopf
|
|
||||||
- motorblock
|
|
||||||
- lösung
|
|
||||||
- loesung
|
|
||||||
- solution
|
|
||||||
- teststreifen
|
|
||||||
- gerät
|
|
||||||
- geraet
|
|
||||||
- messgerät
|
|
||||||
- messgeraet
|
|
||||||
- analysegerät
|
|
||||||
- analysegeraet
|
|
||||||
- analysator
|
|
||||||
- monitor
|
|
||||||
- controller
|
|
||||||
- system
|
|
||||||
retrieval:
|
retrieval:
|
||||||
generic_product_tokens:
|
generic_product_tokens:
|
||||||
add:
|
add:
|
||||||
@@ -660,19 +488,3 @@ parameters:
|
|||||||
- service set
|
- service set
|
||||||
- serviceset
|
- serviceset
|
||||||
- service-set
|
- service-set
|
||||||
commerce_query:
|
|
||||||
search_token_corrections:
|
|
||||||
siene: seine
|
|
||||||
sienen: seinen
|
|
||||||
siener: seiner
|
|
||||||
sienem: seinem
|
|
||||||
sienes: seines
|
|
||||||
indicatoren: indikatoren
|
|
||||||
search_token_canonical:
|
|
||||||
indikatoren: indikator
|
|
||||||
indicators: indikator
|
|
||||||
indicator: indikator
|
|
||||||
reagenzien: reagenz
|
|
||||||
reagents: reagenz
|
|
||||||
reagent: reagenz
|
|
||||||
produkte: produkt
|
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ imports:
|
|||||||
- { resource: 'retriex/retrieval.yaml' }
|
- { resource: 'retriex/retrieval.yaml' }
|
||||||
- { resource: 'retriex/language.yaml' }
|
- { resource: 'retriex/language.yaml' }
|
||||||
- { resource: 'retriex/query_enrichment.yaml' }
|
- { resource: 'retriex/query_enrichment.yaml' }
|
||||||
|
- { resource: 'retriex/search_repair.yaml' }
|
||||||
- { resource: 'retriex/vocabulary.yaml' }
|
- { resource: 'retriex/vocabulary.yaml' }
|
||||||
- { resource: 'retriex/intent.yaml' }
|
- { resource: 'retriex/intent.yaml' }
|
||||||
|
|
||||||
@@ -129,7 +130,6 @@ services:
|
|||||||
App\Config\PromptBuilderConfig:
|
App\Config\PromptBuilderConfig:
|
||||||
arguments:
|
arguments:
|
||||||
$config: '%retriex.prompt.config%'
|
$config: '%retriex.prompt.config%'
|
||||||
$vocabulary: '@App\Config\DomainVocabularyConfig'
|
|
||||||
|
|
||||||
App\Config\AgentRunnerConfig:
|
App\Config\AgentRunnerConfig:
|
||||||
arguments:
|
arguments:
|
||||||
@@ -138,7 +138,6 @@ services:
|
|||||||
App\Config\NdjsonHybridRetrieverConfig:
|
App\Config\NdjsonHybridRetrieverConfig:
|
||||||
arguments:
|
arguments:
|
||||||
$config: '%retriex.retrieval.config%'
|
$config: '%retriex.retrieval.config%'
|
||||||
$vocabulary: '@App\Config\DomainVocabularyConfig'
|
|
||||||
|
|
||||||
App\Config\StopWordsConfig:
|
App\Config\StopWordsConfig:
|
||||||
arguments:
|
arguments:
|
||||||
@@ -151,7 +150,6 @@ services:
|
|||||||
App\Config\ShopServiceConfig:
|
App\Config\ShopServiceConfig:
|
||||||
arguments:
|
arguments:
|
||||||
$config: '%retriex.shop_matching.config%'
|
$config: '%retriex.shop_matching.config%'
|
||||||
$vocabulary: '@App\Config\DomainVocabularyConfig'
|
|
||||||
|
|
||||||
App\Infrastructure\OllamaClient:
|
App\Infrastructure\OllamaClient:
|
||||||
arguments:
|
arguments:
|
||||||
@@ -199,7 +197,6 @@ services:
|
|||||||
App\Config\CommerceQueryParserConfig:
|
App\Config\CommerceQueryParserConfig:
|
||||||
arguments:
|
arguments:
|
||||||
$config: '%retriex.commerce_query.config%'
|
$config: '%retriex.commerce_query.config%'
|
||||||
$vocabulary: '@App\Config\DomainVocabularyConfig'
|
|
||||||
|
|
||||||
App\Commerce\CommerceQueryParser: ~
|
App\Commerce\CommerceQueryParser: ~
|
||||||
|
|
||||||
@@ -209,7 +206,6 @@ services:
|
|||||||
$maxRepairQueries: '%retriex.commerce.search_repair.max_queries%'
|
$maxRepairQueries: '%retriex.commerce.search_repair.max_queries%'
|
||||||
$minPrimaryResultsWithoutRepair: '%retriex.commerce.search_repair.min_primary_results_without_repair%'
|
$minPrimaryResultsWithoutRepair: '%retriex.commerce.search_repair.min_primary_results_without_repair%'
|
||||||
$config: '%retriex.search_repair.config%'
|
$config: '%retriex.search_repair.config%'
|
||||||
$vocabulary: '@App\Config\DomainVocabularyConfig'
|
|
||||||
|
|
||||||
App\Commerce\SearchRepairService: ~
|
App\Commerce\SearchRepairService: ~
|
||||||
|
|
||||||
|
|||||||
@@ -4,243 +4,60 @@ declare(strict_types=1);
|
|||||||
|
|
||||||
namespace App\Config;
|
namespace App\Config;
|
||||||
|
|
||||||
|
use InvalidArgumentException;
|
||||||
|
|
||||||
final class CommerceQueryParserConfig
|
final class CommerceQueryParserConfig
|
||||||
{
|
{
|
||||||
private const KNOWN_BRANDS = [
|
|
||||||
'heyl',
|
|
||||||
'horiba',
|
|
||||||
'neomeris',
|
|
||||||
];
|
|
||||||
|
|
||||||
private const PHRASES_TO_REMOVE = [
|
|
||||||
'ich suche',
|
|
||||||
'suche',
|
|
||||||
'habt ihr',
|
|
||||||
'gibt es',
|
|
||||||
'gebe mir',
|
|
||||||
'gib mir',
|
|
||||||
'zeige mir',
|
|
||||||
'welches gerät',
|
|
||||||
'welche gerät',
|
|
||||||
'welches modell',
|
|
||||||
'welches ist besser',
|
|
||||||
'welches ist am besten',
|
|
||||||
'alternative',
|
|
||||||
'alternativen',
|
|
||||||
'unter anderem',
|
|
||||||
'u a',
|
|
||||||
'welche',
|
|
||||||
'welcher',
|
|
||||||
'welches',
|
|
||||||
'welchen',
|
|
||||||
'sind',
|
|
||||||
'ist',
|
|
||||||
'geeignet',
|
|
||||||
'geeigent',
|
|
||||||
'verfügbarkeit',
|
|
||||||
'verfuegbarkeit',
|
|
||||||
];
|
|
||||||
|
|
||||||
private const FILTER_SEARCH_TOKENS = [
|
|
||||||
'auch',
|
|
||||||
'noch',
|
|
||||||
'nochmal',
|
|
||||||
'zusätzlich',
|
|
||||||
'dazu',
|
|
||||||
'davon',
|
|
||||||
'stattdessen',
|
|
||||||
'bitte',
|
|
||||||
'gern',
|
|
||||||
'gerne',
|
|
||||||
'zeige',
|
|
||||||
'zeig',
|
|
||||||
'such',
|
|
||||||
'suche',
|
|
||||||
'finde',
|
|
||||||
'find',
|
|
||||||
'mir',
|
|
||||||
'mal',
|
|
||||||
'von',
|
|
||||||
'im',
|
|
||||||
'in',
|
|
||||||
'für',
|
|
||||||
'fuer',
|
|
||||||
'welche',
|
|
||||||
'welcher',
|
|
||||||
'welches',
|
|
||||||
'welchen',
|
|
||||||
'sind',
|
|
||||||
'ist',
|
|
||||||
'geeignet',
|
|
||||||
'geeigent',
|
|
||||||
'verfügbarkeit',
|
|
||||||
'verfuegbarkeit',
|
|
||||||
'prüfe',
|
|
||||||
'pruefe',
|
|
||||||
'den',
|
|
||||||
'die',
|
|
||||||
'das',
|
|
||||||
'der',
|
|
||||||
'dem',
|
|
||||||
'des',
|
|
||||||
'und',
|
|
||||||
'oder',
|
|
||||||
'sowie',
|
|
||||||
'seine',
|
|
||||||
'seinen',
|
|
||||||
'seiner',
|
|
||||||
'seinem',
|
|
||||||
'seines',
|
|
||||||
'siene',
|
|
||||||
'sienen',
|
|
||||||
'siener',
|
|
||||||
'sienem',
|
|
||||||
'sienes',
|
|
||||||
'gebe',
|
|
||||||
'gib',
|
|
||||||
'nenne',
|
|
||||||
'nenn',
|
|
||||||
'preis',
|
|
||||||
'preise',
|
|
||||||
'preisen',
|
|
||||||
'kostet',
|
|
||||||
'kosten',
|
|
||||||
'ua',
|
|
||||||
'also',
|
|
||||||
'gut',
|
|
||||||
'gute',
|
|
||||||
'guten',
|
|
||||||
'guter',
|
|
||||||
'gutes',
|
|
||||||
'passen',
|
|
||||||
'passend',
|
|
||||||
];
|
|
||||||
|
|
||||||
private const SEARCH_TOKEN_CORRECTIONS = [
|
|
||||||
'siene' => 'seine',
|
|
||||||
'sienen' => 'seinen',
|
|
||||||
'siener' => 'seiner',
|
|
||||||
'sienem' => 'seinem',
|
|
||||||
'sienes' => 'seines',
|
|
||||||
'indicatoren' => 'indikatoren',
|
|
||||||
];
|
|
||||||
|
|
||||||
private const SEARCH_TOKEN_CANONICAL_MAP = [
|
|
||||||
'indikatoren' => 'indikator',
|
|
||||||
'indicators' => 'indikator',
|
|
||||||
'indicator' => 'indikator',
|
|
||||||
'reagenzien' => 'reagenz',
|
|
||||||
'reagents' => 'reagenz',
|
|
||||||
'reagent' => 'reagenz',
|
|
||||||
'produkte' => 'produkt',
|
|
||||||
];
|
|
||||||
|
|
||||||
private const SEMANTIC_SHOP_SEARCH_TOKENS = [
|
|
||||||
'indikator',
|
|
||||||
'indicator',
|
|
||||||
'reagenz',
|
|
||||||
'reagent',
|
|
||||||
'zubehör',
|
|
||||||
'zubehor',
|
|
||||||
'ersatzteil',
|
|
||||||
'verbrauchsmaterial',
|
|
||||||
'chemie',
|
|
||||||
'indikatorchemie',
|
|
||||||
'reagenzchemie',
|
|
||||||
'kit',
|
|
||||||
'set',
|
|
||||||
'filter',
|
|
||||||
'pumpe',
|
|
||||||
'pumpenkopf',
|
|
||||||
'motorblock',
|
|
||||||
'lösung',
|
|
||||||
'loesung',
|
|
||||||
'solution',
|
|
||||||
'teststreifen',
|
|
||||||
'gerät',
|
|
||||||
'geraet',
|
|
||||||
'messgerät',
|
|
||||||
'messgeraet',
|
|
||||||
'analysegerät',
|
|
||||||
'analysegeraet',
|
|
||||||
'analysator',
|
|
||||||
'monitor',
|
|
||||||
'controller',
|
|
||||||
'system',
|
|
||||||
];
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @param array<string, mixed> $config
|
* @param array<string, mixed> $config
|
||||||
*/
|
*/
|
||||||
public function __construct(
|
public function __construct(
|
||||||
private readonly array $config = [],
|
private readonly array $config = [],
|
||||||
private readonly ?DomainVocabularyConfig $vocabulary = null,
|
|
||||||
) {
|
) {
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/** @return string[] */
|
||||||
* @return string[]
|
|
||||||
*/
|
|
||||||
public function getKnownBrands(): array
|
public function getKnownBrands(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->stringList('known_brands');
|
||||||
'known_brands',
|
|
||||||
$this->vocabularyView('commerce_query.known_brands', self::KNOWN_BRANDS)
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/** @return string[] */
|
||||||
* @return string[]
|
|
||||||
*/
|
|
||||||
public function getPhrasesToRemove(): array
|
public function getPhrasesToRemove(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->stringList('phrases_to_remove');
|
||||||
'phrases_to_remove',
|
|
||||||
$this->vocabularyView('commerce_query.phrases_to_remove', self::PHRASES_TO_REMOVE)
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getHistoryContextPattern(): string
|
public function getHistoryContextPattern(): string
|
||||||
{
|
{
|
||||||
return 'chat|auch|noch|nochmal|zusätzlich|dazu|davon|stattdessen|alternative|alternativen|größer|groesser|kleiner|gleich(?:e|en|er|es)?|derselbe|dieselbe|dasselbe|wie oben|wie zuvor|wie gehabt';
|
return $this->string('patterns.history_context');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getHistoryContextValuePattern(): string
|
public function getHistoryContextValuePattern(): string
|
||||||
{
|
{
|
||||||
return '/\b(' . $this->getHistoryContextPattern() . ')\b/u';
|
return $this->renderTemplate(
|
||||||
|
$this->string('patterns.history_context_value_template'),
|
||||||
|
['fragment' => $this->getHistoryContextPattern()],
|
||||||
|
'patterns.history_context_value_template'
|
||||||
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/** @return string[] */
|
||||||
* @return string[]
|
|
||||||
*/
|
|
||||||
public function getFilterSearchTokens(): array
|
public function getFilterSearchTokens(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->stringList('filter_search_tokens');
|
||||||
'filter_search_tokens',
|
|
||||||
$this->vocabularyView('commerce_query.filter_search_tokens', self::FILTER_SEARCH_TOKENS)
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/** @return array<string, string> */
|
||||||
* @return array<string, string>
|
|
||||||
*/
|
|
||||||
public function getSearchTokenCorrections(): array
|
public function getSearchTokenCorrections(): array
|
||||||
{
|
{
|
||||||
return $this->stringMap(
|
return $this->stringMap('search_token_corrections');
|
||||||
'search_token_corrections',
|
|
||||||
$this->vocabularyStringMap('commerce_query.search_token_corrections', self::SEARCH_TOKEN_CORRECTIONS)
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/** @return array<string, string> */
|
||||||
* @return array<string, string>
|
|
||||||
*/
|
|
||||||
public function getSearchTokenCanonicalMap(): array
|
public function getSearchTokenCanonicalMap(): array
|
||||||
{
|
{
|
||||||
return $this->stringMap(
|
return $this->stringMap('search_token_canonical_map');
|
||||||
'search_token_canonical_map',
|
|
||||||
$this->vocabularyStringMap('commerce_query.search_token_canonical', self::SEARCH_TOKEN_CANONICAL_MAP)
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -253,181 +70,190 @@ final class CommerceQueryParserConfig
|
|||||||
return $this->getFilterSearchTokens();
|
return $this->getFilterSearchTokens();
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/** @return string[] */
|
||||||
* @return string[]
|
|
||||||
*/
|
|
||||||
public function getNormalizationSearch(): array
|
public function getNormalizationSearch(): array
|
||||||
{
|
{
|
||||||
return ['€'];
|
return $this->stringList('normalization.search', true);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/** @return string[] */
|
||||||
* @return string[]
|
|
||||||
*/
|
|
||||||
public function getNormalizationReplace(): array
|
public function getNormalizationReplace(): array
|
||||||
{
|
{
|
||||||
return [' euro '];
|
return $this->stringList('normalization.replace', true);
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getPromptSanitizePattern(): string
|
public function getPromptSanitizePattern(): string
|
||||||
{
|
{
|
||||||
return '/[^\p{L}\p{N}\s.,\-]/u';
|
return $this->string('patterns.prompt_sanitize');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getWhitespaceCollapsePattern(): string
|
public function getWhitespaceCollapsePattern(): string
|
||||||
{
|
{
|
||||||
return '/\s+/u';
|
return $this->string('patterns.whitespace_collapse');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getWhitespaceSplitPattern(): string
|
public function getWhitespaceSplitPattern(): string
|
||||||
{
|
{
|
||||||
return '/\s+/u';
|
return $this->string('patterns.whitespace_split');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getSearchTextTrimCharacters(): string
|
public function getSearchTextTrimCharacters(): string
|
||||||
{
|
{
|
||||||
return " \t\n\r\0\x0B-.,";
|
$characters = '';
|
||||||
|
foreach ($this->stringList('text.trim_characters') as $item) {
|
||||||
|
$characters .= match ($item) {
|
||||||
|
'space' => ' ',
|
||||||
|
'tab' => "\t",
|
||||||
|
'lf' => "\n",
|
||||||
|
'cr' => "\r",
|
||||||
|
'nul' => "\0",
|
||||||
|
'vertical_tab' => "\x0B",
|
||||||
|
default => $item,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
return $characters;
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getMinSearchTokenLength(): int
|
public function getMinSearchTokenLength(): int
|
||||||
{
|
{
|
||||||
return 1;
|
return $this->int('limits.min_search_token_length');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getMinDirectProductTokenLength(): int
|
public function getMinDirectProductTokenLength(): int
|
||||||
{
|
{
|
||||||
return 1;
|
return $this->int('limits.min_direct_product_token_length');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getHistoryQuestionPattern(): string
|
public function getHistoryQuestionPattern(): string
|
||||||
{
|
{
|
||||||
return '/^Question:\s*(.+)$/m';
|
return $this->string('patterns.history_question');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getPriceBetweenPattern(): string
|
public function getPriceBetweenPattern(): string
|
||||||
{
|
{
|
||||||
return '/\bzwischen\s+(\d+(?:[.,]\d+)?)\s+und\s+(\d+(?:[.,]\d+)?)\s+euro\b/u';
|
return $this->string('patterns.price_between');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getPriceMaxPattern(): string
|
public function getPriceMaxPattern(): string
|
||||||
{
|
{
|
||||||
return '/\b(?:unter|bis|max(?:imal)?)\s+(\d+(?:[.,]\d+)?)\s+euro\b/u';
|
return $this->string('patterns.price_max');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getPriceMinPattern(): string
|
public function getPriceMinPattern(): string
|
||||||
{
|
{
|
||||||
return '/\b(?:ab|mindestens|min)\s+(\d+(?:[.,]\d+)?)\s+euro\b/u';
|
return $this->string('patterns.price_min');
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/** @return string[] */
|
||||||
* @return string[]
|
|
||||||
*/
|
|
||||||
public function getPriceRemovalPatterns(CommerceIntentConfig $intentConfig): array
|
public function getPriceRemovalPatterns(CommerceIntentConfig $intentConfig): array
|
||||||
{
|
{
|
||||||
return [
|
return [
|
||||||
'/\bzwischen\s+\d+(?:[.,]\d+)?\s+und\s+\d+(?:[.,]\d+)?\s*euro\b/u',
|
$this->string('patterns.price_removal_between'),
|
||||||
'/\b(?:unter|bis|max(?:imal)?|ab|mindestens|min)\s+\d+(?:[.,]\d+)?\s*euro\b/u',
|
$this->string('patterns.price_removal_minmax'),
|
||||||
'/\b(?:' . $intentConfig->getPricePattern() . ')\b/u',
|
$this->renderTemplate(
|
||||||
|
$this->string('patterns.price_removal_intent_template'),
|
||||||
|
['price_pattern' => $intentConfig->getPricePattern()],
|
||||||
|
'patterns.price_removal_intent_template'
|
||||||
|
),
|
||||||
];
|
];
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getDirectProductDigitPattern(): string
|
public function getDirectProductDigitPattern(): string
|
||||||
{
|
{
|
||||||
return '/\d/u';
|
return $this->string('patterns.direct_product_digit');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getDirectProductMaxTokens(): int
|
public function getDirectProductMaxTokens(): int
|
||||||
{
|
{
|
||||||
return 4;
|
return $this->int('limits.direct_product_max_tokens');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getModelLikePattern(): string
|
public function getModelLikePattern(): string
|
||||||
{
|
{
|
||||||
return '/\b[a-zäöüß][a-zäöüß®\-]*(?:\s+[a-zäöüß][a-zäöüß®\-]*){0,2}\s+\d{2,5}[a-z0-9\-]*\b/u';
|
return $this->string('patterns.model_like');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getAccessoryLikePattern(): string
|
public function getAccessoryLikePattern(): string
|
||||||
{
|
{
|
||||||
return '/\b(?:indikator|indicator|reagenz|reagent|kit|set)\s+\d{1,5}[a-z0-9\-]*\b/u';
|
return $this->string('patterns.accessory_like');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getContainsDigitPattern(): string
|
public function getContainsDigitPattern(): string
|
||||||
{
|
{
|
||||||
return '/\d/u';
|
return $this->string('patterns.contains_digit');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getModelNumberTokenPattern(): string
|
public function getModelNumberTokenPattern(): string
|
||||||
{
|
{
|
||||||
return '/^(?:\d{2,5}[a-z0-9\-]*|[a-z]{1,6}\d{1,5}[a-z0-9\-]*)$/u';
|
return $this->string('patterns.model_number_token');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getModelContextTokenPattern(): string
|
public function getModelContextTokenPattern(): string
|
||||||
{
|
{
|
||||||
return '/^[\p{L}][\p{L}0-9®\-]{2,}$/u';
|
return $this->string('patterns.model_context_token');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getModelSuffixTokenPattern(): string
|
public function getModelSuffixTokenPattern(): string
|
||||||
{
|
{
|
||||||
return '/^[a-z]{1,4}\d{0,3}$/u';
|
return $this->string('patterns.model_suffix_token');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getModelContextTokenWindow(): int
|
public function getModelContextTokenWindow(): int
|
||||||
{
|
{
|
||||||
return 4;
|
return $this->int('limits.model_context_token_window');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getMinMeaningfulAlphaTokenLength(): int
|
public function getMinMeaningfulAlphaTokenLength(): int
|
||||||
{
|
{
|
||||||
return 2;
|
return $this->int('limits.min_meaningful_alpha_token_length');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getMaxShopSearchTokens(): int
|
public function getMaxShopSearchTokens(): int
|
||||||
{
|
{
|
||||||
return 6;
|
return $this->int('limits.max_shop_search_tokens');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getInstructionOrPresentationTokenPattern(): string
|
public function getInstructionOrPresentationTokenPattern(): string
|
||||||
{
|
{
|
||||||
return '/^(?:zeig(?:e)?|such(?:e)?|find(?:e)?|gib|gebe|nenn(?:e)?|liefer(?:e)?|erstelle?|mach(?:e)?|brauch(?:e)?|will|möchte|moechte|hätte|haette|kannst|bitte|mal|alle|alles|komplett|vollständig|vollstaendig|gesamt|ganze|ganzen|liste|listung|auflistung|tabelle|tabellarisch|übersicht|uebersicht|anzeigen?|ausgeben?|darstellen?|antwort(?:e)?|erklär(?:e)?|erklaer(?:e)?|info|infos|informationen|dazu|hierzu|damit|davon|an|als|mit|ohne|inkl|inklusive|also|gut|gute|guten|guter|gutes|passend|passen)$/u';
|
return $this->string('patterns.instruction_or_presentation_token');
|
||||||
}
|
|
||||||
public function getMeasurementValueTokenPattern(): string
|
|
||||||
{
|
|
||||||
return '/^\d+[.,]\d+$/u';
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
public function getMeasurementValueTokenPattern(): string
|
||||||
* Product/category tokens that are useful for Store API search even when they are not next to a model number.
|
{
|
||||||
* This is intentionally a semantic allowlist, not a spelling-error blocklist.
|
return $this->string('patterns.measurement_value_token');
|
||||||
*
|
}
|
||||||
* @return string[]
|
|
||||||
*/
|
/** @return string[] */
|
||||||
public function getSemanticShopSearchTokens(): array
|
public function getSemanticShopSearchTokens(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->stringList('semantic_shop_search_tokens');
|
||||||
'semantic_shop_search_tokens',
|
}
|
||||||
$this->vocabularyView('commerce_query.semantic_shop_search_tokens', self::SEMANTIC_SHOP_SEARCH_TOKENS)
|
|
||||||
|
public function buildExactTokenRemovalPattern(string $token): string
|
||||||
|
{
|
||||||
|
return $this->renderTemplate(
|
||||||
|
$this->string('patterns.exact_token_removal_template'),
|
||||||
|
['token' => preg_quote($token, '/')],
|
||||||
|
'patterns.exact_token_removal_template'
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
public function buildBrandPartOfModelPattern(string $brand): string
|
||||||
/** @return string[] */
|
|
||||||
private function vocabularyView(string $path, array $fallback): array
|
|
||||||
{
|
{
|
||||||
return $this->vocabulary?->view($path, $fallback) ?? $fallback;
|
return $this->renderTemplate(
|
||||||
}
|
$this->string('patterns.brand_part_of_model_template'),
|
||||||
|
['brand' => preg_quote($brand, '/')],
|
||||||
/** @return array<string, string> */
|
'patterns.brand_part_of_model_template'
|
||||||
private function vocabularyStringMap(string $path, array $fallback): array
|
);
|
||||||
{
|
|
||||||
return $this->vocabulary?->stringMap($path, $fallback) ?? $fallback;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
private function stringList(string $path, array $default): array
|
private function stringList(string $path, bool $preserveWhitespace = false): array
|
||||||
{
|
{
|
||||||
$value = $this->value($path, $default);
|
$value = $this->value($path);
|
||||||
if (!is_array($value)) {
|
if (!is_array($value)) {
|
||||||
return $default;
|
throw $this->invalid($path, 'must be a list of non-empty strings');
|
||||||
}
|
}
|
||||||
|
|
||||||
$out = [];
|
$out = [];
|
||||||
@@ -436,23 +262,31 @@ final class CommerceQueryParserConfig
|
|||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
$item = trim((string) $item);
|
$item = (string) $item;
|
||||||
if ($item === '' || in_array($item, $out, true)) {
|
if (!$preserveWhitespace) {
|
||||||
|
$item = trim($item);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (trim($item) === '' || in_array($item, $out, true)) {
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
$out[] = $item;
|
$out[] = $item;
|
||||||
}
|
}
|
||||||
|
|
||||||
return $out !== [] ? $out : $default;
|
if ($out === []) {
|
||||||
|
throw $this->invalid($path, 'must contain at least one non-empty string');
|
||||||
|
}
|
||||||
|
|
||||||
|
return $out;
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return array<string, string> */
|
/** @return array<string, string> */
|
||||||
private function stringMap(string $path, array $default): array
|
private function stringMap(string $path): array
|
||||||
{
|
{
|
||||||
$value = $this->value($path, $default);
|
$value = $this->value($path);
|
||||||
if (!is_array($value)) {
|
if (!is_array($value)) {
|
||||||
return $default;
|
throw $this->invalid($path, 'must be a map of non-empty strings');
|
||||||
}
|
}
|
||||||
|
|
||||||
$out = [];
|
$out = [];
|
||||||
@@ -468,15 +302,44 @@ final class CommerceQueryParserConfig
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
return $out !== [] ? $out : $default;
|
if ($out === []) {
|
||||||
|
throw $this->invalid($path, 'must contain at least one non-empty mapping');
|
||||||
|
}
|
||||||
|
|
||||||
|
return $out;
|
||||||
}
|
}
|
||||||
|
|
||||||
private function value(string $path, mixed $default): mixed
|
private function string(string $path): string
|
||||||
|
{
|
||||||
|
$value = $this->value($path);
|
||||||
|
if (!is_scalar($value)) {
|
||||||
|
throw $this->invalid($path, 'must be a non-empty string');
|
||||||
|
}
|
||||||
|
|
||||||
|
$value = (string) $value;
|
||||||
|
if ($value === '') {
|
||||||
|
throw $this->invalid($path, 'must be a non-empty string');
|
||||||
|
}
|
||||||
|
|
||||||
|
return $value;
|
||||||
|
}
|
||||||
|
|
||||||
|
private function int(string $path): int
|
||||||
|
{
|
||||||
|
$value = $this->value($path);
|
||||||
|
if (!is_int($value)) {
|
||||||
|
throw $this->invalid($path, 'must be an integer');
|
||||||
|
}
|
||||||
|
|
||||||
|
return $value;
|
||||||
|
}
|
||||||
|
|
||||||
|
private function value(string $path): mixed
|
||||||
{
|
{
|
||||||
$current = $this->config;
|
$current = $this->config;
|
||||||
foreach (explode('.', $path) as $segment) {
|
foreach (explode('.', $path) as $segment) {
|
||||||
if (!is_array($current) || !array_key_exists($segment, $current)) {
|
if (!is_array($current) || !array_key_exists($segment, $current)) {
|
||||||
return $default;
|
throw $this->missing($path);
|
||||||
}
|
}
|
||||||
|
|
||||||
$current = $current[$segment];
|
$current = $current[$segment];
|
||||||
@@ -485,13 +348,29 @@ final class CommerceQueryParserConfig
|
|||||||
return $current;
|
return $current;
|
||||||
}
|
}
|
||||||
|
|
||||||
public function buildExactTokenRemovalPattern(string $token): string
|
/**
|
||||||
|
* @param array<string, string> $replacements
|
||||||
|
*/
|
||||||
|
private function renderTemplate(string $template, array $replacements, string $path): string
|
||||||
{
|
{
|
||||||
return '/\b' . preg_quote($token, '/') . '\b/u';
|
foreach ($replacements as $placeholder => $value) {
|
||||||
|
$template = str_replace('{' . $placeholder . '}', $value, $template);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (preg_match('/\{[A-Za-z_][A-Za-z0-9_]*\}/', $template) === 1) {
|
||||||
|
throw $this->invalid($path, 'contains unresolved placeholders');
|
||||||
|
}
|
||||||
|
|
||||||
|
return $template;
|
||||||
}
|
}
|
||||||
|
|
||||||
public function buildBrandPartOfModelPattern(string $brand): string
|
private function missing(string $path): InvalidArgumentException
|
||||||
{
|
{
|
||||||
return '/\b' . preg_quote($brand, '/') . '\s+\d{2,5}[a-z0-9\-]*\b/u';
|
return new InvalidArgumentException(sprintf('RetrieX commerce query config "%s" is missing.', $path));
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
private function invalid(string $path, string $reason): InvalidArgumentException
|
||||||
|
{
|
||||||
|
return new InvalidArgumentException(sprintf('RetrieX commerce query config "%s" %s.', $path, $reason));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
@@ -4,31 +4,18 @@ declare(strict_types=1);
|
|||||||
|
|
||||||
namespace App\Config;
|
namespace App\Config;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* YAML-backed query-enrichment configuration.
|
||||||
|
*
|
||||||
|
* This class intentionally has no PHP fallback values. Missing or invalid
|
||||||
|
* configuration must be fixed in config/retriex/query_enrichment.yaml.
|
||||||
|
*/
|
||||||
final readonly class QueryEnricherConfig
|
final readonly class QueryEnricherConfig
|
||||||
{
|
{
|
||||||
/**
|
|
||||||
* Backwards-compatible fallback vocabulary.
|
|
||||||
* Active values are loaded from retriex.query_enrichment.config when present.
|
|
||||||
*
|
|
||||||
* @var array<int|string, mixed>
|
|
||||||
*/
|
|
||||||
private const DEFAULT_ENRICH_QUERY_LIST = [
|
|
||||||
'Wasserhärte' => 'Resthärte',
|
|
||||||
'Gerät' => 'Modell',
|
|
||||||
'Indikator' => 'Chemie',
|
|
||||||
'Seminar' => 'Webinar',
|
|
||||||
'Schulung' => 'Seminar',
|
|
||||||
'Indikatoren' => 'Indikator',
|
|
||||||
'Wasserhärte-Grenzwert' => 'Resthärte',
|
|
||||||
'Resthärte-Grenzwert' => 'Wasserhärte',
|
|
||||||
'Grenzwert' => 'Überwachungsbereich',
|
|
||||||
'store' => 'shop',
|
|
||||||
];
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @param array<string, mixed> $config
|
* @param array<string, mixed> $config
|
||||||
*/
|
*/
|
||||||
public function __construct(private array $config = [])
|
public function __construct(private array $config)
|
||||||
{
|
{
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -52,11 +39,7 @@ final readonly class QueryEnricherConfig
|
|||||||
public function getEnrichQueryList(): array
|
public function getEnrichQueryList(): array
|
||||||
{
|
{
|
||||||
$normalized = [];
|
$normalized = [];
|
||||||
$rules = $this->config['rules'] ?? self::DEFAULT_ENRICH_QUERY_LIST;
|
$rules = $this->requiredArray('rules');
|
||||||
|
|
||||||
if (!is_array($rules)) {
|
|
||||||
$rules = self::DEFAULT_ENRICH_QUERY_LIST;
|
|
||||||
}
|
|
||||||
|
|
||||||
foreach ($rules as $key => $value) {
|
foreach ($rules as $key => $value) {
|
||||||
if (is_array($value)) {
|
if (is_array($value)) {
|
||||||
@@ -76,18 +59,16 @@ final readonly class QueryEnricherConfig
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if ($normalized === []) {
|
||||||
|
throw new \InvalidArgumentException('RetrieX query enrichment config key "rules" must contain at least one valid enrichment rule.');
|
||||||
|
}
|
||||||
|
|
||||||
return $normalized;
|
return $normalized;
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getMaxExpansions(): int
|
public function getMaxExpansions(): int
|
||||||
{
|
{
|
||||||
$value = $this->config['max_expansions'] ?? 4;
|
return $this->requiredNonNegativeInt('max_expansions');
|
||||||
|
|
||||||
if (!is_numeric($value)) {
|
|
||||||
return 4;
|
|
||||||
}
|
|
||||||
|
|
||||||
return max(0, (int) $value);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -160,6 +141,49 @@ final readonly class QueryEnricherConfig
|
|||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/** @return array<int|string, mixed> */
|
||||||
|
private function requiredArray(string $key): array
|
||||||
|
{
|
||||||
|
if (!array_key_exists($key, $this->config)) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('Missing required RetrieX query enrichment config key "%s".', $key));
|
||||||
|
}
|
||||||
|
|
||||||
|
$value = $this->config[$key];
|
||||||
|
|
||||||
|
if (!is_array($value)) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX query enrichment config key "%s" must be an array.', $key));
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($value === []) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX query enrichment config key "%s" must not be empty.', $key));
|
||||||
|
}
|
||||||
|
|
||||||
|
return $value;
|
||||||
|
}
|
||||||
|
|
||||||
|
private function requiredNonNegativeInt(string $key): int
|
||||||
|
{
|
||||||
|
if (!array_key_exists($key, $this->config)) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('Missing required RetrieX query enrichment config key "%s".', $key));
|
||||||
|
}
|
||||||
|
|
||||||
|
$value = $this->config[$key];
|
||||||
|
|
||||||
|
if (is_int($value)) {
|
||||||
|
$intValue = $value;
|
||||||
|
} elseif (is_string($value) && preg_match('/^-?\d+$/', trim($value)) === 1) {
|
||||||
|
$intValue = (int) trim($value);
|
||||||
|
} else {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX query enrichment config key "%s" must be an integer.', $key));
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($intValue < 0) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX query enrichment config key "%s" must be greater than or equal to 0.', $key));
|
||||||
|
}
|
||||||
|
|
||||||
|
return $intValue;
|
||||||
|
}
|
||||||
|
|
||||||
private function normalizePhrase(string $value): string
|
private function normalizePhrase(string $value): string
|
||||||
{
|
{
|
||||||
$value = trim($value);
|
$value = trim($value);
|
||||||
|
|||||||
@@ -522,6 +522,13 @@ final readonly class RetriexEffectiveConfigProvider
|
|||||||
'search_token_corrections' => $this->commerceQueryParserConfig->getSearchTokenCorrections(),
|
'search_token_corrections' => $this->commerceQueryParserConfig->getSearchTokenCorrections(),
|
||||||
'search_token_canonical_map' => $this->commerceQueryParserConfig->getSearchTokenCanonicalMap(),
|
'search_token_canonical_map' => $this->commerceQueryParserConfig->getSearchTokenCanonicalMap(),
|
||||||
'semantic_shop_search_tokens' => $this->commerceQueryParserConfig->getSemanticShopSearchTokens(),
|
'semantic_shop_search_tokens' => $this->commerceQueryParserConfig->getSemanticShopSearchTokens(),
|
||||||
|
'normalization' => [
|
||||||
|
'search' => $this->commerceQueryParserConfig->getNormalizationSearch(),
|
||||||
|
'replace' => $this->commerceQueryParserConfig->getNormalizationReplace(),
|
||||||
|
],
|
||||||
|
'text' => [
|
||||||
|
'trim_characters_length' => strlen($this->commerceQueryParserConfig->getSearchTextTrimCharacters()),
|
||||||
|
],
|
||||||
'limits' => [
|
'limits' => [
|
||||||
'min_search_token_length' => $this->commerceQueryParserConfig->getMinSearchTokenLength(),
|
'min_search_token_length' => $this->commerceQueryParserConfig->getMinSearchTokenLength(),
|
||||||
'min_direct_product_token_length' => $this->commerceQueryParserConfig->getMinDirectProductTokenLength(),
|
'min_direct_product_token_length' => $this->commerceQueryParserConfig->getMinDirectProductTokenLength(),
|
||||||
|
|||||||
@@ -4,69 +4,24 @@ declare(strict_types=1);
|
|||||||
|
|
||||||
namespace App\Config;
|
namespace App\Config;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* YAML-backed shop search-repair configuration.
|
||||||
|
*
|
||||||
|
* This class intentionally has no PHP fallback values. Missing or invalid
|
||||||
|
* configuration must be fixed in config/retriex/search_repair.yaml and
|
||||||
|
* config/retriex/vocabulary.yaml.
|
||||||
|
*/
|
||||||
final class SearchRepairConfig
|
final class SearchRepairConfig
|
||||||
{
|
{
|
||||||
private const GENERIC_CANDIDATE_TOKENS = [
|
|
||||||
'wasser',
|
|
||||||
'messgerät',
|
|
||||||
'messgeraet',
|
|
||||||
'produkt',
|
|
||||||
'geräte',
|
|
||||||
'geraete',
|
|
||||||
'gerät',
|
|
||||||
'geraet',
|
|
||||||
'resthärte',
|
|
||||||
'resthaerte',
|
|
||||||
'preis',
|
|
||||||
'infos',
|
|
||||||
'wissen',
|
|
||||||
];
|
|
||||||
|
|
||||||
private const ACCESSORY_CANDIDATE_TERMS = [
|
|
||||||
'indikator',
|
|
||||||
'indicator',
|
|
||||||
'reagenz',
|
|
||||||
'reagent',
|
|
||||||
'kit',
|
|
||||||
'set',
|
|
||||||
];
|
|
||||||
|
|
||||||
private const ACCESSORY_OR_BUNDLE_TERMS = [
|
|
||||||
'passend',
|
|
||||||
'passende',
|
|
||||||
'zubehor',
|
|
||||||
'zubehör',
|
|
||||||
'dazu',
|
|
||||||
'zusatz',
|
|
||||||
'erganzung',
|
|
||||||
'ergänzung',
|
|
||||||
'indikator',
|
|
||||||
'reagenz',
|
|
||||||
'kit',
|
|
||||||
'set',
|
|
||||||
'auch\s+das',
|
|
||||||
'mit\s+preis\s+und\s+allen\s+infos',
|
|
||||||
];
|
|
||||||
|
|
||||||
private const SPECIFICITY_BOOST_TERMS = [
|
|
||||||
'indikator',
|
|
||||||
'indicator',
|
|
||||||
'testomat',
|
|
||||||
'tritromat',
|
|
||||||
'titromat',
|
|
||||||
'reagenz',
|
|
||||||
'reagent',
|
|
||||||
];
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @param array<string, mixed> $config
|
* @param array<string, mixed> $config
|
||||||
*/
|
*/
|
||||||
public function __construct(
|
public function __construct(
|
||||||
private readonly bool $enabled = true,
|
private readonly bool $enabled,
|
||||||
private readonly int $maxRepairQueries = 3,
|
private readonly int $maxRepairQueries,
|
||||||
private readonly int $minPrimaryResultsWithoutRepair = 2,
|
private readonly int $minPrimaryResultsWithoutRepair,
|
||||||
private readonly array $config = [],
|
private readonly array $config,
|
||||||
private readonly ?DomainVocabularyConfig $vocabulary = null,
|
private readonly DomainVocabularyConfig $vocabulary,
|
||||||
) {
|
) {
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -87,257 +42,259 @@ final class SearchRepairConfig
|
|||||||
|
|
||||||
public function shouldRestrictRequestedAccessoryCodeRepair(): bool
|
public function shouldRestrictRequestedAccessoryCodeRepair(): bool
|
||||||
{
|
{
|
||||||
return $this->bool('strict_requested_accessory_code_repair', true);
|
return $this->requiredBool('strict_requested_accessory_code_repair');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function shouldPreferPromptAnchoredModelForRequestedAccessoryCode(): bool
|
public function shouldPreferPromptAnchoredModelForRequestedAccessoryCode(): bool
|
||||||
{
|
{
|
||||||
return $this->bool('prefer_prompt_anchored_model_for_requested_accessory_code', true);
|
return $this->requiredBool('prefer_prompt_anchored_model_for_requested_accessory_code');
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getRequestedAccessoryCodeFallbackQueryTemplates(): array
|
public function getRequestedAccessoryCodeFallbackQueryTemplates(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->requiredStringList('requested_accessory_code_fallback_query_templates');
|
||||||
'requested_accessory_code_fallback_query_templates',
|
|
||||||
['{term} {code}']
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getRequestedAccessoryCodeFallbackTerms(): array
|
public function getRequestedAccessoryCodeFallbackTerms(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->requiredStringList('requested_accessory_code_fallback_terms');
|
||||||
'requested_accessory_code_fallback_terms',
|
|
||||||
$this->getAccessoryCandidateTerms()
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getRequestedAccessoryCodeContextPrefixTerms(): array
|
public function getRequestedAccessoryCodeContextPrefixTerms(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->requiredStringList('requested_accessory_code_context_prefix_terms');
|
||||||
'requested_accessory_code_context_prefix_terms',
|
|
||||||
$this->getAccessoryCandidateTerms()
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getRequestedAccessoryCodeProximityWindow(): int
|
public function getRequestedAccessoryCodeProximityWindow(): int
|
||||||
{
|
{
|
||||||
return $this->int('requested_accessory_code_proximity_window', 1600);
|
return $this->requiredPositiveInt('requested_accessory_code_proximity_window');
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getSpecificModelCandidatePatterns(): array
|
public function getSpecificModelCandidatePatterns(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->requiredStringList('specific_model_candidate_patterns');
|
||||||
'specific_model_candidate_patterns',
|
|
||||||
[$this->getModelLikePattern()]
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getModelCandidateExcludeTerms(): array
|
public function getModelCandidateExcludeTerms(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->requiredStringList('model_candidate_exclude_terms');
|
||||||
'model_candidate_exclude_terms',
|
|
||||||
array_merge($this->getAccessoryCandidateTerms(), ['verfuegbarkeit', 'verfügbarkeit', 'shop'])
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getTopProductLogLimit(): int
|
public function getTopProductLogLimit(): int
|
||||||
{
|
{
|
||||||
return 3;
|
return $this->requiredNonNegativeInt('limits.top_product_log_limit');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getModelCandidatePattern(): string
|
public function getModelCandidatePattern(): string
|
||||||
{
|
{
|
||||||
return '/\b([A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*(?:\s+[A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*){0,2}\s+\d{2,5}[A-Za-z0-9\-]*)\b/u';
|
return $this->requiredString('patterns.model_candidate');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getAccessoryCandidatePattern(): string
|
public function getAccessoryCandidatePattern(): string
|
||||||
{
|
{
|
||||||
return '/\b((?:' . implode('|', $this->getAccessoryCandidateTerms()) . ')\s+\d{1,5}[A-Za-z0-9\-]*)\b/iu';
|
return $this->renderPatternTemplate(
|
||||||
|
'patterns.accessory_candidate_template',
|
||||||
|
['terms' => $this->patternAlternation($this->getAccessoryCandidateTerms())]
|
||||||
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getRequestedAccessoryCodePattern(): string
|
public function getRequestedAccessoryCodePattern(): string
|
||||||
{
|
{
|
||||||
$fallbackTerms = array_map(
|
return $this->requiredString('patterns.requested_accessory_code');
|
||||||
static fn(string $term): string => preg_quote($term, '/'),
|
|
||||||
$this->getRequestedAccessoryCodeContextPrefixTerms()
|
|
||||||
);
|
|
||||||
$fallbackTerms = array_filter($fallbackTerms, static fn(string $term): bool => $term !== '');
|
|
||||||
|
|
||||||
$fallbackPattern = $fallbackTerms === []
|
|
||||||
? '/\b([A-Za-z]{0,3}\s*\d{1,5}[A-Za-z0-9\-]*)\b/iu'
|
|
||||||
: '/\b(?:' . implode('|', $fallbackTerms) . ')\s*([A-Za-z]{0,3}\s*\d{1,5}[A-Za-z0-9\-]*)\b/iu';
|
|
||||||
|
|
||||||
return $this->string('requested_accessory_code_pattern', $fallbackPattern);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getAccessoryOrBundlePattern(): string
|
public function getAccessoryOrBundlePattern(): string
|
||||||
{
|
{
|
||||||
return '/\b(' . implode('|', $this->getAccessoryOrBundleTerms()) . ')\b/iu';
|
return $this->renderPatternTemplate(
|
||||||
|
'patterns.accessory_or_bundle_template',
|
||||||
|
['terms' => $this->patternAlternation($this->getAccessoryOrBundleTerms())]
|
||||||
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getModelLikePattern(): string
|
public function getModelLikePattern(): string
|
||||||
{
|
{
|
||||||
return '/\b[A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*(?:\s+[A-Za-zÄÖÜäöüß][A-Za-zÄÖÜäöüß®\-]*){0,2}\s+\d{2,5}[A-Za-z0-9\-]*\b/u';
|
return $this->requiredString('patterns.model_like');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getSpecificityBoostPattern(): string
|
public function getSpecificityBoostPattern(): string
|
||||||
{
|
{
|
||||||
return '/\b(?:' . implode('|', $this->getSpecificityBoostTerms()) . ')\b/iu';
|
return $this->renderPatternTemplate(
|
||||||
|
'patterns.specificity_boost_template',
|
||||||
|
['terms' => $this->patternAlternation($this->getSpecificityBoostTerms())]
|
||||||
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getGenericCandidateTokens(): array
|
public function getGenericCandidateTokens(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->configOrVocabularyStringList(
|
||||||
'generic_candidate_tokens',
|
'generic_candidate_tokens',
|
||||||
$this->vocabularyView('search_repair.generic_candidate_tokens', self::GENERIC_CANDIDATE_TOKENS)
|
'search_repair.generic_candidate_tokens'
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getSanitizeTrimCharacters(): string
|
public function getSanitizeTrimCharacters(): string
|
||||||
{
|
{
|
||||||
return " \t\n\r\0\x0B\"'`.,;:-";
|
return implode('', array_map(
|
||||||
|
static fn (int $code): string => chr($code),
|
||||||
|
$this->requiredCharacterCodes('sanitize_trim_character_codes')
|
||||||
|
));
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getContainsDigitPattern(): string
|
public function getContainsDigitPattern(): string
|
||||||
{
|
{
|
||||||
return '/\d/u';
|
return $this->requiredString('patterns.contains_digit');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getWhitespaceCollapsePattern(): string
|
public function getWhitespaceCollapsePattern(): string
|
||||||
{
|
{
|
||||||
return '/\s+/u';
|
return $this->requiredString('patterns.whitespace_collapse');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getTokenizeCleanupPattern(): string
|
public function getTokenizeCleanupPattern(): string
|
||||||
{
|
{
|
||||||
return '/[^\p{L}\p{N}\s\-]+/u';
|
return $this->requiredString('patterns.tokenize_cleanup');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getProductKeySeparator(): string
|
public function getProductKeySeparator(): string
|
||||||
{
|
{
|
||||||
return '|';
|
return $this->requiredString('product_key_separator');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getCandidateDigitScore(): int
|
public function getCandidateDigitScore(): int
|
||||||
{
|
{
|
||||||
return 4;
|
return $this->requiredInt('scores.candidate_digit');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getCandidateWordCountCap(): int
|
public function getCandidateWordCountCap(): int
|
||||||
{
|
{
|
||||||
return 4;
|
return $this->requiredPositiveInt('scores.candidate_word_count_cap');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getSpecificityBoostScore(): int
|
public function getSpecificityBoostScore(): int
|
||||||
{
|
{
|
||||||
return 3;
|
return $this->requiredInt('scores.specificity_boost');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getPrimaryQueryOverlapThreshold(): float
|
public function getPrimaryQueryOverlapThreshold(): float
|
||||||
{
|
{
|
||||||
return 0.9;
|
return $this->requiredFloat('scores.primary_query_overlap_threshold');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getPromptMatchWeight(): int
|
public function getPromptMatchWeight(): int
|
||||||
{
|
{
|
||||||
return 3;
|
return $this->requiredInt('scores.prompt_match_weight');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getPrimaryQueryMatchWeight(): int
|
public function getPrimaryQueryMatchWeight(): int
|
||||||
{
|
{
|
||||||
return 2;
|
return $this->requiredInt('scores.primary_query_match_weight');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getRepairSignalMatchWeight(): int
|
public function getRepairSignalMatchWeight(): int
|
||||||
{
|
{
|
||||||
return 4;
|
return $this->requiredInt('scores.repair_signal_match_weight');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getPrimaryResultOrderBonus(): int
|
public function getPrimaryResultOrderBonus(): int
|
||||||
{
|
{
|
||||||
return 1;
|
return $this->requiredInt('scores.primary_result_order_bonus');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getTokenIntersectionScore(): int
|
public function getTokenIntersectionScore(): int
|
||||||
{
|
{
|
||||||
return 2;
|
return $this->requiredInt('scores.token_intersection_score');
|
||||||
}
|
}
|
||||||
|
|
||||||
public function getNumericTokenMatchScore(): int
|
public function getNumericTokenMatchScore(): int
|
||||||
{
|
{
|
||||||
return 4;
|
return $this->requiredInt('scores.numeric_token_match_score');
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getAccessoryCandidateTerms(): array
|
public function getAccessoryCandidateTerms(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->configOrVocabularyStringList(
|
||||||
'accessory_candidate_terms',
|
'accessory_candidate_terms',
|
||||||
$this->vocabularyView('search_repair.accessory_candidate_terms', self::ACCESSORY_CANDIDATE_TERMS)
|
'search_repair.accessory_candidate_terms'
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getAccessoryOrBundleTerms(): array
|
public function getAccessoryOrBundleTerms(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->configOrVocabularyStringList(
|
||||||
'accessory_or_bundle_terms',
|
'accessory_or_bundle_terms',
|
||||||
$this->vocabularyView('search_repair.accessory_or_bundle_terms', self::ACCESSORY_OR_BUNDLE_TERMS)
|
'search_repair.accessory_or_bundle_terms'
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
public function getSpecificityBoostTerms(): array
|
public function getSpecificityBoostTerms(): array
|
||||||
{
|
{
|
||||||
return $this->stringList(
|
return $this->configOrVocabularyStringList(
|
||||||
'specificity_boost_terms',
|
'specificity_boost_terms',
|
||||||
$this->vocabularyView('search_repair.specificity_boost_terms', self::SPECIFICITY_BOOST_TERMS)
|
'search_repair.specificity_boost_terms'
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
private function vocabularyView(string $path, array $fallback): array
|
private function configOrVocabularyStringList(string $configKey, string $vocabularyPath): array
|
||||||
{
|
{
|
||||||
return $this->vocabulary?->view($path, $fallback) ?? $fallback;
|
if (array_key_exists($configKey, $this->config)) {
|
||||||
}
|
return $this->requiredStringList($configKey);
|
||||||
|
|
||||||
private function string(string $key, string $default): string
|
|
||||||
{
|
|
||||||
$value = $this->config[$key] ?? $default;
|
|
||||||
|
|
||||||
if (!is_scalar($value)) {
|
|
||||||
return $default;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
$value = trim((string) $value);
|
$items = $this->vocabulary->view($vocabularyPath, []);
|
||||||
|
if ($items === []) {
|
||||||
return $value !== '' ? $value : $default;
|
throw new \InvalidArgumentException(sprintf(
|
||||||
}
|
'Missing required RetrieX search repair vocabulary view "%s".',
|
||||||
|
$vocabularyPath
|
||||||
private function int(string $key, int $default): int
|
));
|
||||||
{
|
|
||||||
$value = $this->config[$key] ?? $default;
|
|
||||||
|
|
||||||
if (is_int($value)) {
|
|
||||||
return $value;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
if (is_numeric($value)) {
|
return $items;
|
||||||
return (int) $value;
|
|
||||||
}
|
|
||||||
|
|
||||||
return $default;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
private function bool(string $key, bool $default): bool
|
/** @param array<string, string> $variables */
|
||||||
|
private function renderPatternTemplate(string $path, array $variables): string
|
||||||
{
|
{
|
||||||
$value = $this->config[$key] ?? $default;
|
$template = $this->requiredString($path);
|
||||||
|
|
||||||
|
foreach ($variables as $key => $value) {
|
||||||
|
$template = str_replace('{' . $key . '}', $value, $template);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (preg_match('/\{[A-Za-z_][A-Za-z0-9_]*\}/', $template) === 1) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair pattern template "%s" contains unresolved placeholders.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
return $template;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** @param string[] $terms */
|
||||||
|
private function patternAlternation(array $terms): string
|
||||||
|
{
|
||||||
|
$terms = array_values(array_filter(
|
||||||
|
array_map(static fn (string $term): string => trim($term), $terms),
|
||||||
|
static fn (string $term): bool => $term !== ''
|
||||||
|
));
|
||||||
|
|
||||||
|
if ($terms === []) {
|
||||||
|
throw new \InvalidArgumentException('RetrieX search repair pattern alternation requires at least one term.');
|
||||||
|
}
|
||||||
|
|
||||||
|
return implode('|', $terms);
|
||||||
|
}
|
||||||
|
|
||||||
|
private function requiredBool(string $path): bool
|
||||||
|
{
|
||||||
|
$value = $this->requiredValue($path);
|
||||||
|
|
||||||
if (is_bool($value)) {
|
if (is_bool($value)) {
|
||||||
return $value;
|
return $value;
|
||||||
@@ -357,15 +314,80 @@ final class SearchRepairConfig
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
return $default;
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must be boolean.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
private function requiredString(string $path): string
|
||||||
|
{
|
||||||
|
$value = $this->requiredValue($path);
|
||||||
|
|
||||||
|
if (!is_scalar($value)) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must be a string.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
$value = trim((string) $value);
|
||||||
|
if ($value === '') {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must not be empty.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
return $value;
|
||||||
|
}
|
||||||
|
|
||||||
|
private function requiredInt(string $path): int
|
||||||
|
{
|
||||||
|
$value = $this->requiredValue($path);
|
||||||
|
|
||||||
|
if (is_int($value)) {
|
||||||
|
return $value;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (is_string($value) && preg_match('/^-?\d+$/', trim($value)) === 1) {
|
||||||
|
return (int) trim($value);
|
||||||
|
}
|
||||||
|
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must be an integer.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
private function requiredNonNegativeInt(string $path): int
|
||||||
|
{
|
||||||
|
$value = $this->requiredInt($path);
|
||||||
|
|
||||||
|
if ($value < 0) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must be greater than or equal to 0.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
return $value;
|
||||||
|
}
|
||||||
|
|
||||||
|
private function requiredPositiveInt(string $path): int
|
||||||
|
{
|
||||||
|
$value = $this->requiredInt($path);
|
||||||
|
|
||||||
|
if ($value <= 0) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must be greater than 0.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
return $value;
|
||||||
|
}
|
||||||
|
|
||||||
|
private function requiredFloat(string $path): float
|
||||||
|
{
|
||||||
|
$value = $this->requiredValue($path);
|
||||||
|
|
||||||
|
if (is_int($value) || is_float($value) || (is_string($value) && is_numeric(trim($value)))) {
|
||||||
|
return (float) $value;
|
||||||
|
}
|
||||||
|
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must be numeric.', $path));
|
||||||
}
|
}
|
||||||
|
|
||||||
/** @return string[] */
|
/** @return string[] */
|
||||||
private function stringList(string $key, array $default): array
|
private function requiredStringList(string $path): array
|
||||||
{
|
{
|
||||||
$value = $this->config[$key] ?? $default;
|
$value = $this->requiredValue($path);
|
||||||
|
|
||||||
if (!is_array($value)) {
|
if (!is_array($value)) {
|
||||||
return $default;
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must be a list.', $path));
|
||||||
}
|
}
|
||||||
|
|
||||||
$out = [];
|
$out = [];
|
||||||
@@ -382,6 +404,57 @@ final class SearchRepairConfig
|
|||||||
$out[] = $item;
|
$out[] = $item;
|
||||||
}
|
}
|
||||||
|
|
||||||
return $out !== [] ? $out : $default;
|
if ($out === []) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must not be empty.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
return $out;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** @return int[] */
|
||||||
|
private function requiredCharacterCodes(string $path): array
|
||||||
|
{
|
||||||
|
$value = $this->requiredValue($path);
|
||||||
|
|
||||||
|
if (!is_array($value)) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must be a list of character codes.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
$codes = [];
|
||||||
|
foreach ($value as $item) {
|
||||||
|
if (is_int($item)) {
|
||||||
|
$code = $item;
|
||||||
|
} elseif (is_string($item) && preg_match('/^\d+$/', trim($item)) === 1) {
|
||||||
|
$code = (int) trim($item);
|
||||||
|
} else {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" contains a non-integer character code.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($code < 0 || $code > 255) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" contains an invalid character code.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
$codes[] = $code;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ($codes === []) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('RetrieX search repair config key "%s" must not be empty.', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
return $codes;
|
||||||
|
}
|
||||||
|
|
||||||
|
private function requiredValue(string $path): mixed
|
||||||
|
{
|
||||||
|
$current = $this->config;
|
||||||
|
foreach (explode('.', $path) as $segment) {
|
||||||
|
if (!is_array($current) || !array_key_exists($segment, $current)) {
|
||||||
|
throw new \InvalidArgumentException(sprintf('Missing required RetrieX search repair config key "%s".', $path));
|
||||||
|
}
|
||||||
|
|
||||||
|
$current = $current[$segment];
|
||||||
|
}
|
||||||
|
|
||||||
|
return $current;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
Reference in New Issue
Block a user