team 1
2026-05-05 12:12:51 +02:00
parent da374edcf4
commit 2c041a88c0
12 changed files with 429 additions and 282 deletions

@@ -0,0 +1,119 @@
# RetrieX Patch 43A - Config Reduction / Generic Flow Prep
## Goal
Reduce the number of actively duplicated YAML parameters without changing the proven runtime values or introducing an admin UI.
This patch intentionally does **not** change scoring, ranking, retrieval thresholds, prompt guardrails, or shop matching behavior. It only moves existing duplicate term lists behind central vocabulary views and renames one follow-up-anchor concept from product-specific names to generic names.
## Why this is split out
The larger cleanup should not be delivered as one large patch because it would mix three risk classes:
1. Safe config deduplication and generic naming.
2. Shared product-role resolver logic.
3. More generic domain anchor extraction beyond the current Testomat / hardness use case.
Patch 43A covers only class 1.
## Changes
### YAML reduction
The following direct per-service lists were removed from local service config files and are now resolved through `config/retriex/vocabulary.yaml` views:
- `prompt.yaml`
  - `technical_product_keywords`
  - `accessory_request_keywords`
- `retrieval.yaml`
  - `generic_product_tokens`
  - `important_short_model_tokens`
  - `family_descriptor_tokens`
  - `looks_like_reagent_tokens`
  - `looks_like_safety_docs`
  - `looks_like_reagent_words`
  - `looks_like_document_words`
  - `looks_like_safety_words`
  - `looks_like_device_words`
- `commerce.yaml`
  - `semantic_shop_search_tokens`
The removed local lists are referenced through new `vocabulary_views` mappings.
### Vocabulary updates
`vocabulary.yaml` now contains the exact effective legacy values for the moved lists, including the previously local prompt accessory keywords and shop semantic search terms.
### PHP config facade changes
These config classes can now resolve either a direct local override or a central vocabulary view:
- `PromptBuilderConfig`
- `NdjsonHybridRetrieverConfig`
- `CommerceQueryParserConfig`
Direct local lists remain backward-compatible. If a project later needs a local override, the old list key can still be added back to the service-specific YAML.
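The resolution order described above can be sketched as follows. This is a minimal illustration, not the actual facade code: the function name `resolveTermList()`, the `views` key inside the parsed vocabulary array, and the rule that a direct local list takes precedence over a view are all assumptions made for the example.

```php
<?php
/**
 * Hypothetical sketch of a config facade lookup (not the real RetrieX code).
 *
 * $serviceConfig: parsed service YAML, e.g. prompt.yaml (shape assumed).
 * $vocabulary:    parsed vocabulary.yaml (shape assumed: ['views' => [...]]).
 */
function resolveTermList(string $key, array $serviceConfig, array $vocabulary): array
{
    if (isset($serviceConfig[$key]) && is_array($serviceConfig[$key])) {
        // A direct local list acts as an override (assumed precedence).
        $terms = $serviceConfig[$key];
    } else {
        // Otherwise follow the vocabulary_views mapping to the central view.
        $viewName = $serviceConfig['vocabulary_views'][$key] ?? null;
        $terms = $viewName !== null ? ($vocabulary['views'][$viewName] ?? []) : [];
    }

    // De-duplicate while keeping first-occurrence order, then reindex.
    return array_values(array_unique(array_map('strval', $terms)));
}
```

With this shape, adding the old list key back to a service-specific YAML file would be enough to restore a project-local override, and removing it again would fall back to the central view.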
### Generic follow-up anchor naming
The follow-up anchor names were made generic:
- `testomat_model_pattern` -> `product_model_pattern`
- `hardness_value_pattern` -> `measurement_value_pattern`
- `extractFirstTestomatModelAnchor()` -> `extractFirstProductModelAnchor()`
- `extractFirstHardnessValueAnchor()` -> `extractFirstMeasurementValueAnchor()`
Backward-compatible accessor aliases remain in `AgentRunnerConfig`.
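A backward-compatible alias of this kind could look roughly like the sketch below. The class name `AgentRunnerConfigSketch` and all four method names are hypothetical; the actual accessor names in `AgentRunnerConfig` are not shown in this patch note. Only the two config keys come from the rename list above.

```php
<?php
/**
 * Hypothetical sketch of generic accessors with legacy aliases.
 * Requires PHP 8.0+ (constructor property promotion).
 */
final class AgentRunnerConfigSketch
{
    /** @param array<string, string> $patterns parsed follow-up anchor config (shape assumed) */
    public function __construct(private array $patterns)
    {
    }

    // New generic accessors.
    public function productModelPattern(): ?string
    {
        return $this->patterns['product_model_pattern'] ?? null;
    }

    public function measurementValuePattern(): ?string
    {
        return $this->patterns['measurement_value_pattern'] ?? null;
    }

    // Legacy aliases: delegate to the generic accessors so old callers keep working.
    public function testomatModelPattern(): ?string
    {
        return $this->productModelPattern();
    }

    public function hardnessValuePattern(): ?string
    {
        return $this->measurementValuePattern();
    }
}
```

Because the aliases only delegate, the old and new accessors cannot drift apart, and the aliases can be dropped later without touching the config keys.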
## Behavior impact
Expected runtime behavior: unchanged.
A local equivalence check compared all moved lists against the current `rag-inprogress.zip` source values. The moved vocabulary views resolve to the same effective values as before, accounting for the existing de-duplication behavior in the PHP config facades.
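The script used for this check is not part of the patch. As a rough idea of what such a comparison involves, the sketch below treats two lists as equivalent when they match after de-duplication. The function names are hypothetical, and the assumption that the facades de-duplicate by keeping the first occurrence in order is mine, not confirmed by the source.

```php
<?php
/**
 * Hypothetical sketch of an equivalence check between a legacy local list
 * and the list resolved from a vocabulary view.
 */
function effectiveValues(array $terms): array
{
    // Mirror the assumed facade behavior: cast to string, drop repeats, reindex.
    return array_values(array_unique(array_map('strval', $terms)));
}

function listsAreEquivalent(array $legacy, array $moved): bool
{
    // Strict comparison: same values in the same order after de-duplication.
    return effectiveValues($legacy) === effectiveValues($moved);
}
```

This comparison is order-sensitive. If term order does not matter to the consuming service, sorting both sides before comparing would be a looser alternative.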
## Checks run locally
Successful:
```bash
php -l src/Config/PromptBuilderConfig.php
php -l src/Config/NdjsonHybridRetrieverConfig.php
php -l src/Config/CommerceQueryParserConfig.php
php -l src/Config/AgentRunnerConfig.php
php -l src/Agent/AgentRunner.php
```
Successful custom checks:
- edited YAML files parse successfully
- moved vocabulary lists equal previous effective values
Not executable in this container:
```bash
php bin/console mto:agent:config:validate
php bin/console mto:agent:regression:test
php bin/console mto:agent:config:audit-source --details
php bin/console mto:agent:config:audit-patterns --details
```
Reason: the uploaded ZIP does not contain `vendor/`, and Composer installation could not complete in the container because required PHP extensions are missing (`curl`, `dom`, `sqlite3`, `xml`) and external package downloads are not available.
## Required checks after applying in the project environment
```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```
## Recommended follow-up patches
### p43B - Shared ProductRoleResolver
Centralize product role detection (`main_product`, `accessory`, `consumable`, `spare_part`, `unknown`) so PromptBuilder, ShopSearchService, SearchRepairService and AgentRunner do not maintain parallel role checks.
### p43C - Generic Domain Anchor Extraction
Make the current product-model and measurement-value anchor extraction more domain-generic while preserving the existing Testomat / °dH patterns as configured values.