team 1
2026-05-07 19:17:59 +02:00
parent 61f6841a5a
commit 476b664520
6 changed files with 298 additions and 24 deletions

@@ -0,0 +1,86 @@
# RetrieX Patch p61C - Positive Shop Query Token Filter on p60
p61C reapplies the positive Shopware query token filter on the confirmed p60 baseline.
## Why p61C exists
p61B was built on a stale base and reintroduced legacy `agent.no_llm_fallback.product_roles.vocabulary_views.*` paths that had already been removed by p59G. It also did not reliably preserve the p60 referential device anchor in the generated query.
p61C uses the confirmed p60 baseline and keeps the p59G/p60 cleanup intact.
## Goal
The final plain Shopware query should contain only product-relevant tokens:
- product/device/accessory names from the active genre vocabulary
- explicitly allowed product family/application terms
- protected short technical terms such as pH/RX/TH/TC/TP/TM when configured
- model/type/code tokens such as `808`, `300`, `TH2100`, `2x100ml` when they match configured regex patterns
Sentence filler, relational words and RAG-only reference terms such as `gemessen`, `beim` or `indikatortyp` are filtered out so that they cannot dominate the shop query.
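The code-token rule above can be illustrated with a small Python sketch. It is not the PHP implementation: the three patterns and the `is_code_token` helper are hypothetical stand-ins for whatever is configured under `code_patterns`, and matching the whole token (rather than a substring) is an assumption.

```python
import re

# Hypothetical patterns; the real ones are configured under `code_patterns`.
CODE_PATTERNS = [
    re.compile(r"\d{2,5}"),                              # plain model numbers such as 808 or 300
    re.compile(r"[a-z]{1,3}\d{2,5}", re.IGNORECASE),     # type codes such as TH2100
    re.compile(r"\d+x\d+(ml|l|g|kg)", re.IGNORECASE),    # pack sizes such as 2x100ml
]


def is_code_token(token: str) -> bool:
    """Return True when the whole token matches at least one code pattern."""
    return any(pattern.fullmatch(token) for pattern in CODE_PATTERNS)
```

Requiring a full match keeps ordinary words that merely contain digits or short letter runs from slipping through as codes.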
## Expected example
Input query after p60 referential/RAG anchoring:
```text
testomat 808 gemessen 300 beim indikator
```
Final shop query after p61C:
```text
testomat 808 300 indikator
```
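As a rough illustration of how this example could come about, the following Python sketch keeps a token only when it is positively allowed. The vocabulary set, the single code pattern and the `filter_shop_query` function are assumptions made for this example; the actual filter is implemented in PHP and driven by configuration.

```python
import re

# Hypothetical stand-ins for the configured genre vocabulary and code patterns.
PRODUCT_VOCABULARY = {"testomat", "indikator"}
CODE_PATTERN = re.compile(r"\d{2,5}")


def filter_shop_query(query: str) -> str:
    """Keep only positively allowed tokens, preserving their original order."""
    kept = [
        token
        for token in query.split()
        if token.lower() in PRODUCT_VOCABULARY or CODE_PATTERN.fullmatch(token)
    ]
    return " ".join(kept)


print(filter_shop_query("testomat 808 gemessen 300 beim indikator"))
# prints: testomat 808 300 indikator
```

Because the filter is positive, `gemessen` and `beim` drop out without having to appear in any stopword list.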
## Configuration
Primary configuration lives in:
```yaml
# config/retriex/genre.yaml
parameters:
  retriex.genre.config:
    configuration_values:
      shop_query_runtime:
        positive_token_filter:
```
Important fields:
- `enabled`: activates the filter for the active genre.
- `min_query_tokens_after_filter`: set to `1` so a single valid product token can still replace a noisy query.
- `allowed_terms`: extra genre-specific product family/application terms.
- `blocked_terms`: terms that are useful for RAG/reference resolution but poor shop search tokens.
- `code_patterns`: regex patterns for model/type/article/size tokens.
- `include_current_input_preservation_terms`: includes configured protected short terms from the shop query preservation surface.
- `include_semantic_shop_search_tokens`: includes the genre's shop semantic product vocabulary.
- `include_product_role_terms`: includes the genre's device/accessory role vocabulary.
`agent.yaml` contains only an inactive compatibility fallback for this feature. Runtime values should be maintained in `genre.yaml`.
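The interplay of these fields might look roughly like the Python sketch below. It is a model of the documented behaviour, not the PHP code: the `PositiveTokenFilterConfig` class and `apply_filter` function are hypothetical, the `vocabulary` argument stands for whatever the three `include_*` switches contribute, and two details are assumptions: blocked terms take precedence over allowed ones, and the unfiltered query is kept when fewer than `min_query_tokens_after_filter` tokens survive.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PositiveTokenFilterConfig:
    # Field names mirror the documented keys; all values used here are made up.
    enabled: bool = True
    min_query_tokens_after_filter: int = 1
    allowed_terms: set[str] = field(default_factory=set)
    blocked_terms: set[str] = field(default_factory=set)
    code_patterns: list[str] = field(default_factory=list)


def apply_filter(query: str, cfg: PositiveTokenFilterConfig, vocabulary: set[str]) -> str:
    """Return the filtered query, or the original one when too little survives."""
    if not cfg.enabled:
        return query
    patterns = [re.compile(p, re.IGNORECASE) for p in cfg.code_patterns]
    allowed = {term.lower() for term in (cfg.allowed_terms | vocabulary)}
    blocked = {term.lower() for term in cfg.blocked_terms}
    kept = []
    for token in query.split():
        lowered = token.lower()
        if lowered in blocked:  # assumption: blocked terms win over everything else
            continue
        if lowered in allowed or any(p.fullmatch(token) for p in patterns):
            kept.append(token)
    # Assumption: below the configured minimum, the unfiltered query is kept.
    if len(kept) < cfg.min_query_tokens_after_filter:
        return query
    return " ".join(kept)


cfg = PositiveTokenFilterConfig(
    allowed_terms={"indikator"},
    blocked_terms={"indikatortyp"},
    code_patterns=[r"\d{2,5}"],
)
filtered = apply_filter("testomat 808 gemessen 300 beim indikatortyp", cfg, {"testomat"})
```

With a minimum of `1`, a query reduced to a single product token still replaces the noisy input; only a query with no surviving token falls back to the original.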
## Scope
No hard-coded product names or stopword lists were added to PHP. The PHP code only applies the configured positive token filter.
No changes to:
- retrieval ranking
- prompt rules
- shop result scoring
- SearchRepair
- intent routing
- product identity matching
## Validation
Run:
```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```
The p59G no-LLM legacy paths must remain absent from `agent.yaml`, `genre.yaml` source paths and `governance.yaml` frozen hashes.
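As a quick supplementary check, a text scan for the legacy prefix could be sketched in Python as follows. The `find_legacy_references` helper is hypothetical and only catches dotted-path references that appear literally in a file's text; it does not replace the audit commands above, and it would not detect the same path expressed as nested YAML keys.

```python
LEGACY_PREFIX = "agent.no_llm_fallback.product_roles.vocabulary_views."


def find_legacy_references(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) pairs that still mention the legacy prefix."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if LEGACY_PREFIX in line
    ]
```

An empty result for the contents of each of the three files is the expected outcome.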