p61c fix
@@ -0,0 +1,86 @@

# RetrieX Patch p61C - Positive Shop Query Token Filter on p60

p61C reapplies the positive Shopware query token filter on the confirmed p60 baseline.

## Why p61C exists

p61B was built on a stale base and reintroduced legacy `agent.no_llm_fallback.product_roles.vocabulary_views.*` paths that had already been removed by p59G. It also did not reliably preserve the p60 referential device anchor in the generated query.

p61C uses the confirmed p60 baseline and keeps the p59G/p60 cleanup intact.

## Goal

The final plain Shopware query should contain only product-relevant tokens:

- product/device/accessory names from the active genre vocabulary
- explicitly allowed product family/application terms
- protected short technical terms such as pH/RX/TH/TC/TP/TM when configured
- model/type/code tokens such as `808`, `300`, `TH2100`, `2x100ml` when they match configured regex patterns

Sentence, relation and RAG-only reference words such as `gemessen`, `beim` or `indikatortyp` must not dominate the shop query.

## Expected example

Input query after p60 referential/RAG anchoring:

```text
testomat 808 gemessen 300 beim indikator
```

Final shop query after p61C:

```text
testomat 808 300 indikator
```
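
The transformation above can be sketched in a few lines of Python. This is only an illustration of the positive-filter idea, not the PHP implementation shipped with p61C: the function name `filter_shop_query`, the sample vocabulary and the three regex patterns are assumptions chosen to match the tokens named in this note.

```python
import re


def filter_shop_query(query, allowed_terms, blocked_terms, code_patterns):
    """Keep only tokens that are positively allowed (hypothetical helper)."""
    compiled = [re.compile(p, re.IGNORECASE) for p in code_patterns]
    kept = []
    for token in query.split():
        key = token.lower()
        # Assumption: blocked terms win over every other rule.
        if key in blocked_terms:
            continue
        # A token survives only if it is in the vocabulary or fully
        # matches one of the configured code patterns.
        if key in allowed_terms or any(p.fullmatch(token) for p in compiled):
            kept.append(token)
    return " ".join(kept)


# Assumed sample configuration, not taken from the real genre.yaml.
ALLOWED = {"testomat", "indikator"}
BLOCKED = {"indikatortyp"}
CODE_PATTERNS = [r"\d{3,4}", r"[a-z]{2}\d{3,4}", r"\d+x\d+ml"]

result = filter_shop_query(
    "testomat 808 gemessen 300 beim indikator", ALLOWED, BLOCKED, CODE_PATTERNS
)
print(result)
```

Because the filter is positive, `gemessen` and `beim` are dropped without appearing in any stopword list; they simply match nothing that is allowed.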

## Configuration

Primary configuration lives in:

```yaml
# config/retriex/genre.yaml
parameters:
    retriex.genre.config:
        configuration_values:
            shop_query_runtime:
                positive_token_filter:
```

Important fields:

- `enabled`: activates the filter for the active genre.
- `min_query_tokens_after_filter`: set to `1` so a single valid product token can still replace a noisy query.
- `allowed_terms`: extra genre-specific product family/application terms.
- `blocked_terms`: terms that are useful for RAG/reference resolution but poor shop search tokens.
- `code_patterns`: regex patterns for model/type/article/size tokens.
- `include_current_input_preservation_terms`: includes configured protected short terms from the shop query preservation surface.
- `include_semantic_shop_search_tokens`: includes the genre's shop semantic product vocabulary.
- `include_product_role_terms`: includes the genre's device/accessory role vocabulary.

`agent.yaml` contains only an inactive compatibility fallback for this feature. Runtime values should be maintained in `genre.yaml`.
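
How these fields might interact can be sketched as follows. The field names come from the list above; everything else is an assumption made for illustration: the helper names, the `sources` dictionary and its keys, the precedence of `blocked_terms`, and the reading that the original query is kept when fewer than `min_query_tokens_after_filter` tokens survive.

```python
import re

# Assumed mapping from include_* flags to vocabulary sources (hypothetical keys).
FLAG_TO_SOURCE = {
    "include_current_input_preservation_terms": "preservation_terms",
    "include_semantic_shop_search_tokens": "semantic_shop_search_tokens",
    "include_product_role_terms": "product_role_terms",
}


def build_positive_vocabulary(cfg, sources):
    """Merge allowed_terms with every vocabulary whose include_* flag is on."""
    vocab = {t.lower() for t in cfg.get("allowed_terms", [])}
    for flag, source in FLAG_TO_SOURCE.items():
        if cfg.get(flag, False):
            vocab |= {t.lower() for t in sources.get(source, [])}
    return vocab


def apply_positive_filter(query, cfg, sources):
    """Return the filtered query, or the original one if too little survives."""
    if not cfg.get("enabled", False):
        return query
    vocab = build_positive_vocabulary(cfg, sources)
    blocked = {t.lower() for t in cfg.get("blocked_terms", [])}
    patterns = [re.compile(p, re.IGNORECASE) for p in cfg.get("code_patterns", [])]
    kept = [
        t for t in query.split()
        if t.lower() not in blocked
        and (t.lower() in vocab or any(p.fullmatch(t) for p in patterns))
    ]
    # Assumed fallback: keep the unfiltered query when too few tokens remain.
    if len(kept) < cfg.get("min_query_tokens_after_filter", 1):
        return query
    return " ".join(kept)


# Assumed sample values, not taken from the real genre.yaml.
cfg = {
    "enabled": True,
    "min_query_tokens_after_filter": 1,
    "allowed_terms": ["indikator"],
    "blocked_terms": ["indikatortyp"],
    "code_patterns": [r"\d{3,4}"],
    "include_product_role_terms": True,
}
sources = {"product_role_terms": ["testomat"], "preservation_terms": ["ph"]}

single = apply_positive_filter("wie wird ph gemessen beim testomat", cfg, sources)
untouched = apply_positive_filter("wie wird das gemessen", cfg, sources)
```

With `min_query_tokens_after_filter` at `1`, the single surviving token `testomat` replaces the noisy first query, while the second query has no surviving token and is passed through unchanged under the assumed fallback.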

## Scope

No hard-coded product names or stopword lists were added to PHP. The PHP code only applies the configured positive token filter.

No changes to:

- retrieval ranking
- prompt rules
- shop result scoring
- SearchRepair
- intent routing
- product identity matching

## Validation

Run:

```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```

The p59G no-LLM legacy paths must remain absent from `agent.yaml`, `genre.yaml` source paths and `governance.yaml` frozen hashes.
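
As a supplementary spot check alongside the console commands, the absence of the legacy paths could be verified with a plain text scan. The sketch below is an assumption, not part of the patch: `find_legacy_paths` is a hypothetical helper, and it only catches dotted references written out in full, not the same path expressed as nested YAML keys.

```python
LEGACY_PREFIX = "agent.no_llm_fallback.product_roles.vocabulary_views."


def find_legacy_paths(text):
    """Return (line_number, stripped_line) pairs that mention the legacy prefix."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if LEGACY_PREFIX in line
    ]


# Hypothetical file content; the `.devices` suffix is invented for the example.
sample = "a: 1\nsource: agent.no_llm_fallback.product_roles.vocabulary_views.devices\n"
hits = find_legacy_paths(sample)
```

An empty result for the contents of `agent.yaml`, `genre.yaml` and `governance.yaml` is consistent with the p59G cleanup being intact, but it does not replace the audit commands above.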