team 1
2026-05-03 20:25:08 +02:00
parent ea2d1bc7d5
commit ff407d9fa1
2 changed files with 139 additions and 0 deletions


@@ -0,0 +1,36 @@
# RetrieX Patch 21 - Language Cleanup Profiles groundwork
## Goal
Prepare RetrieX 1.5.3 for simpler, centralized language cleanup without changing runtime behavior yet.
## Changes
- Extends `config/retriex/language.yaml` additively.
- Keeps legacy `retriex.stopwords.config.words` unchanged.
- Adds central groups for protected terms, German core stopwords, conversation noise, user instruction phrases, presentation/meta terms, and cleanup profiles.
- Introduces initial profiles: `commerce_query`, `rag_evidence`, `shop_context_fallback`.
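The groups and profiles listed above are not wired into runtime by this patch, so how they will eventually be consumed is open. As a rough sketch only, a later patch might flatten a profile into concrete term sets roughly like this. The `CONFIG` dict is a shortened, hand-written mirror of the new keys, and `resolve_profile` and `clean` are hypothetical helpers, not RetrieX APIs. Two assumptions are baked in: meta terms are dropped like stopwords, and names under `protected_term_groups` point at top-level lists such as `protected_terms`.

```python
import re

# Hypothetical, shortened mirror of the keys this patch adds to language.yaml.
CONFIG = {
    "protected_terms": ["nicht", "kein", "keine", "ph", "indikator"],
    "stopword_groups": {
        "de_core": ["der", "die", "das", "ein", "eine", "und", "mit", "für"],
        "conversation": ["bitte", "mal", "gerne", "auch", "noch"],
    },
    "phrase_groups": {"user_instruction": ["ich suche", "zeige mir", "gibt es"]},
    "meta_term_groups": {"presentation": ["tabelle", "liste", "übersicht"]},
    "cleanup_profiles": {
        "commerce_query": {
            "stopword_groups": ["de_core", "conversation"],
            "phrase_groups": ["user_instruction"],
            "protected_term_groups": ["protected_terms"],
        },
        "shop_context_fallback": {
            "stopword_groups": ["de_core", "conversation"],
            "phrase_groups": ["user_instruction"],
            "meta_term_groups": ["presentation"],
            "protected_term_groups": ["protected_terms"],
        },
    },
}


def resolve_profile(config, name):
    """Flatten one cleanup profile into concrete term collections."""
    profile = config["cleanup_profiles"][name]

    def collect(section):
        terms = []
        for group in profile.get(section, []):
            terms.extend(config[section][group])
        return terms

    protected = set()
    for key in profile.get("protected_term_groups", []):
        # Assumption: these names point at top-level lists such as `protected_terms`.
        protected.update(config[key])

    return {
        # Assumption: meta terms are dropped the same way as stopwords.
        "drop": set(collect("stopword_groups")) | set(collect("meta_term_groups")),
        # Longest phrases first so overlapping phrases are removed predictably.
        "phrases": sorted(collect("phrase_groups"), key=len, reverse=True),
        "protected": protected,
    }


def clean(text, resolved):
    """Remove phrases first, then drop single tokens unless they are protected."""
    lowered = text.lower()
    for phrase in resolved["phrases"]:
        lowered = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", lowered)
    kept = [
        token
        for token in lowered.split()
        if token in resolved["protected"] or token not in resolved["drop"]
    ]
    return " ".join(kept)


query = "Zeige mir bitte eine Tabelle mit ph Indikator"
print(clean(query, resolve_profile(CONFIG, "commerce_query")))         # tabelle ph indikator
print(clean(query, resolve_profile(CONFIG, "shop_context_fallback")))  # ph indikator
```

In this sketch the two profiles differ only in whether presentation terms such as `tabelle` survive, which is the distinction the profile definitions in `language.yaml` express.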
## Non-goals
- No external stopword library.
- No Commerce/Agent runtime wiring yet.
- No removal of existing lists in `commerce.yaml`, `agent.yaml`, or `retrieval.yaml`.
- No domain-specific special cases.
## Install
Extract the patch archive over the current RetrieX root, then run the validation, regression, and audit commands:
```bash
unzip retriex-p21-language-cleanup-profiles-patch-only.zip -d /path/to/retriex
cd /path/to/retriex
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```
## Expected result
All four checks should remain green. Because the new groups and profiles are not wired into Commerce/Agent runtime yet, this patch should not change answers.
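Since the profiles refer to groups by name, one cheap extra sanity check is to confirm that every referenced name exists. The following is a minimal sketch under stated assumptions, not what `mto:agent:config:validate` does: `find_dangling_references` is a hypothetical helper, `SAMPLE` is a hand-written excerpt of the new keys, and names under `protected_term_groups` are assumed to point at top-level lists.

```python
import copy

# Hypothetical excerpt of the new language.yaml keys, already parsed into a dict.
SAMPLE = {
    "protected_terms": ["nicht", "kein", "keine"],
    "stopword_groups": {"de_core": ["der", "die"], "conversation": ["bitte"]},
    "phrase_groups": {"user_instruction": ["ich suche"]},
    "meta_term_groups": {"presentation": ["tabelle"]},
    "cleanup_profiles": {
        "commerce_query": {
            "stopword_groups": ["de_core", "conversation"],
            "phrase_groups": ["user_instruction"],
            "protected_term_groups": ["protected_terms"],
        },
        "rag_evidence": {
            "stopword_groups": ["de_core", "conversation"],
            "protected_term_groups": ["protected_terms"],
        },
    },
}


def find_dangling_references(config):
    """Return 'profile.section: group' entries whose group is not defined."""
    problems = []
    for profile_name, profile in config.get("cleanup_profiles", {}).items():
        for section, group_names in profile.items():
            for group in group_names:
                if section == "protected_term_groups":
                    # Assumption: protected groups are top-level lists.
                    known = group in config
                else:
                    known = group in config.get(section, {})
                if not known:
                    problems.append(f"{profile_name}.{section}: {group}")
    return problems


print(find_dangling_references(SAMPLE))  # []

# A deliberately broken copy to show what a failure would look like.
broken = copy.deepcopy(SAMPLE)
broken["cleanup_profiles"]["commerce_query"]["stopword_groups"].append("en_core")
print(find_dangling_references(broken))  # ['commerce_query.stopword_groups: en_core']
```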


@@ -50,3 +50,106 @@ parameters:
            - würde
            - würdest
            - würden
        # Central language cleanup structure for RetrieX 1.5.3+.
        # Legacy key `words` above remains the runtime-compatible default list.
        # New cleanup profiles are introduced additively and are not yet wired into
        # Commerce/Agent runtime logic in this patch.
        protected_terms:
            - nicht
            - kein
            - keine
            - testomat
            - indikator
            - indikatortyp
            - ph
            - rx
            - th
            - tc
            - '0,02'
        stopword_groups:
            de_core:
                - der
                - die
                - das
                - den
                - dem
                - des
                - ein
                - eine
                - einer
                - eines
                - und
                - oder
                - mit
                - für
                - fuer
                - ist
                - sind
                - kann
                - können
                - koennen
            conversation:
                - bitte
                - mal
                - gerne
                - gern
                - auch
                - noch
                - nochmal
                - dazu
                - davon
                - also
                - danke
        phrase_groups:
            user_instruction:
                - ich suche
                - suche nach
                - zeige mir
                - zeig mir
                - gib mir
                - gebe mir
                - nenne mir
                - habt ihr
                - gibt es
                - suche im shop
        meta_term_groups:
            presentation:
                - tabelle
                - tabellarisch
                - liste
                - übersicht
                - uebersicht
                - auflistung
        cleanup_profiles:
            commerce_query:
                stopword_groups:
                    - de_core
                    - conversation
                phrase_groups:
                    - user_instruction
                protected_term_groups:
                    - protected_terms
            rag_evidence:
                stopword_groups:
                    - de_core
                    - conversation
                protected_term_groups:
                    - protected_terms
            shop_context_fallback:
                stopword_groups:
                    - de_core
                    - conversation
                phrase_groups:
                    - user_instruction
                meta_term_groups:
                    - presentation
                protected_term_groups:
                    - protected_terms