p43A
# RetrieX Patch 43A - Config Reduction / Generic Flow Prep

## Goal

Reduce the number of actively duplicated YAML parameters without changing the proven runtime values or introducing an admin UI.

This patch intentionally does **not** change scoring, ranking, retrieval thresholds, prompt guardrails, or shop matching behavior. It only moves already existing duplicate term lists behind central vocabulary views and renames one follow-up-anchor concept from product-specific names to generic names.

## Why this is split out

The larger cleanup should not be delivered as one large patch because it would mix three risk classes:

1. Safe config deduplication and generic naming.
2. Shared product-role resolver logic.
3. More generic domain anchor extraction beyond the current Testomat / hardness use case.

Patch 43A covers only class 1.

## Changes

### YAML reduction

The following direct per-service lists were removed from local service config files and are now resolved through `config/retriex/vocabulary.yaml` views:

- `prompt.yaml`
  - `technical_product_keywords`
  - `accessory_request_keywords`
- `retrieval.yaml`
  - `generic_product_tokens`
  - `important_short_model_tokens`
  - `family_descriptor_tokens`
  - `looks_like_reagent_tokens`
  - `looks_like_safety_docs`
  - `looks_like_reagent_words`
  - `looks_like_document_words`
  - `looks_like_safety_words`
  - `looks_like_device_words`
- `commerce.yaml`
  - `semantic_shop_search_tokens`

The removed local lists are referenced through new `vocabulary_views` mappings.

### Vocabulary updates

`vocabulary.yaml` now contains the exact effective legacy values for the moved lists, including the previously local prompt accessory keywords and shop semantic search terms.

### PHP config facade changes

These config classes can now resolve either a direct local override or a central vocabulary view:

- `PromptBuilderConfig`
- `NdjsonHybridRetrieverConfig`
- `CommerceQueryParserConfig`

Direct local lists remain supported for backward compatibility. If a project later needs a local override, the old list key can be added back to the service-specific YAML.

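The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `resolveTermList()`, the `views` key inside the parsed vocabulary, and the exact normalization rules (trim, drop empty entries, case-sensitive de-duplication) are assumptions. Only the `vocabulary_views` mapping name and the list keys come from the patch notes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: resolve a term list from a local override or a central vocabulary view.
 *
 * @param array<string, mixed> $serviceConfig parsed service YAML, e.g. retrieval.yaml
 * @param array<string, mixed> $vocabulary    parsed vocabulary.yaml (the "views" key is assumed)
 * @return list<string>
 */
function resolveTermList(array $serviceConfig, array $vocabulary, string $key): array
{
    // 1. A direct local list is used first, so a project can still override a central view.
    $terms = $serviceConfig[$key] ?? null;

    // 2. Otherwise follow the vocabulary_views mapping to the named central view.
    if (!is_array($terms)) {
        $viewName = $serviceConfig['vocabulary_views'][$key] ?? null;
        $terms = is_string($viewName) ? ($vocabulary['views'][$viewName] ?? []) : [];
    }
    if (!is_array($terms)) {
        $terms = [];
    }

    // 3. Normalize: trim, skip empty or non-string entries, de-duplicate in first-seen order.
    $normalized = [];
    foreach ($terms as $term) {
        if (!is_string($term)) {
            continue;
        }
        $term = trim($term);
        if ($term !== '' && !in_array($term, $normalized, true)) {
            $normalized[] = $term;
        }
    }

    return $normalized;
}
```

With this shape, adding `generic_product_tokens: [...]` back to `retrieval.yaml` would shadow the central view for that one list, while all other lists keep resolving through `vocabulary.yaml`.
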
### Generic follow-up anchor naming

The follow-up anchor names were made generic:

- `testomat_model_pattern` -> `product_model_pattern`
- `hardness_value_pattern` -> `measurement_value_pattern`
- `extractFirstTestomatModelAnchor()` -> `extractFirstProductModelAnchor()`
- `extractFirstHardnessValueAnchor()` -> `extractFirstMeasurementValueAnchor()`

Backward-compatible accessor aliases remain in `AgentRunnerConfig`.

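One way such accessor aliases can be kept is sketched below. The class `FollowUpAnchorConfigSketch` and all four method names are hypothetical and do not reflect the real `AgentRunnerConfig` API. The fallback from the new YAML key to the legacy key is also an assumption; the patch notes only state that accessor aliases remain.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for the follow-up anchor part of a config facade. */
final class FollowUpAnchorConfigSketch
{
    /** @param array<string, mixed> $config parsed follow-up anchor settings */
    public function __construct(private array $config)
    {
    }

    /** New generic accessor: prefers the new key, falls back to the legacy key (assumed). */
    public function productModelPattern(): ?string
    {
        return $this->firstString('product_model_pattern', 'testomat_model_pattern');
    }

    public function measurementValuePattern(): ?string
    {
        return $this->firstString('measurement_value_pattern', 'hardness_value_pattern');
    }

    /** Legacy alias: delegates so that old call sites keep working unchanged. */
    public function testomatModelPattern(): ?string
    {
        return $this->productModelPattern();
    }

    /** Legacy alias for the measurement value pattern. */
    public function hardnessValuePattern(): ?string
    {
        return $this->measurementValuePattern();
    }

    /** Returns the first non-empty string found under the given keys, or null. */
    private function firstString(string ...$keys): ?string
    {
        foreach ($keys as $key) {
            $value = $this->config[$key] ?? null;
            if (is_string($value) && $value !== '') {
                return $value;
            }
        }

        return null;
    }
}
```

Because the legacy methods only delegate, there is a single place that decides which key wins, and the aliases can later be removed without touching the lookup logic.
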
## Behavior impact

Expected runtime behavior: unchanged.

A local equivalence check compared all moved lists against the current `rag-inprogress.zip` source values. The moved vocabulary views resolve to the same effective values as before, accounting for the existing de-duplication behavior in the PHP config facades.

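An equivalence check of this kind could look roughly like the sketch below. It is not the script that was actually run: the function names are hypothetical, and the normalization (trim plus order-preserving de-duplication, compared order-sensitively) is an assumption about what "effective values" means in the facades.

```php
<?php
declare(strict_types=1);

/**
 * Effective value of a list as a facade might see it (assumed rules):
 * trimmed entries, duplicates removed, first-seen order kept.
 *
 * @param list<string> $terms
 * @return list<string>
 */
function effectiveTerms(array $terms): array
{
    return array_values(array_unique(array_map('trim', $terms)));
}

/**
 * Compare legacy per-service lists against the lists resolved through vocabulary views.
 *
 * @param array<string, list<string>> $legacy   list key => values from the old service YAML
 * @param array<string, list<string>> $resolved list key => values resolved via views
 * @return list<string> keys whose effective values differ (empty means equivalent)
 */
function findMismatchedLists(array $legacy, array $resolved): array
{
    $mismatches = [];
    foreach ($legacy as $key => $terms) {
        if (effectiveTerms($terms) !== effectiveTerms($resolved[$key] ?? [])) {
            $mismatches[] = (string) $key;
        }
    }

    return $mismatches;
}
```

A legacy list that contained a duplicate entry therefore still counts as equal to its de-duplicated view, which matches the "accounting for de-duplication" wording above.
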
## Checks run locally

Successful:

```bash
php -l src/Config/PromptBuilderConfig.php
php -l src/Config/NdjsonHybridRetrieverConfig.php
php -l src/Config/CommerceQueryParserConfig.php
php -l src/Config/AgentRunnerConfig.php
php -l src/Agent/AgentRunner.php
```

Successful custom checks:

- edited YAML files parse successfully
- moved vocabulary lists equal previous effective values

Not executable in this container:

```bash
php bin/console mto:agent:config:validate
php bin/console mto:agent:regression:test
php bin/console mto:agent:config:audit-source --details
php bin/console mto:agent:config:audit-patterns --details
```

Reason: the uploaded ZIP does not contain `vendor/`, and Composer installation could not complete in the container because required PHP extensions are missing (`curl`, `dom`, `sqlite3`, `xml`) and external package downloads are not available.

## Required checks after applying in the project environment

```bash
bin/console mto:agent:config:validate
bin/console mto:agent:regression:test
bin/console mto:agent:config:audit-source --details
bin/console mto:agent:config:audit-patterns --details
```

## Recommended follow-up patches

### p43B - Shared ProductRoleResolver

Centralize product role detection (`main_product`, `accessory`, `consumable`, `spare_part`, `unknown`) so PromptBuilder, ShopSearchService, SearchRepairService and AgentRunner do not maintain parallel role checks.

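To make the idea concrete, a shared resolver might look something like the sketch below. It is purely illustrative for a patch that does not exist yet: the enum, the class name, the keyword-based matching and the sample keywords are all assumptions. Only the five role values are taken from the description above. The sketch needs PHP 8.1 or later for enums.

```php
<?php
declare(strict_types=1);

/** Role values as named in the p43B description. */
enum ProductRole: string
{
    case MainProduct = 'main_product';
    case Accessory = 'accessory';
    case Consumable = 'consumable';
    case SparePart = 'spare_part';
    case Unknown = 'unknown';
}

/** Hypothetical single entry point that all services could call for role detection. */
final class ProductRoleResolverSketch
{
    /**
     * @param array<string, list<string>> $keywordsByRole role value => lowercase keywords
     *        (assumed config shape; roles are checked in the order given)
     */
    public function __construct(private array $keywordsByRole)
    {
    }

    public function resolve(string $productText): ProductRole
    {
        // strtolower() only folds ASCII; real product data may need multibyte handling.
        $haystack = strtolower($productText);

        foreach ($this->keywordsByRole as $roleValue => $keywords) {
            foreach ($keywords as $keyword) {
                if ($keyword !== '' && str_contains($haystack, $keyword)) {
                    // Unrecognized role names in config degrade to Unknown instead of failing.
                    return ProductRole::tryFrom((string) $roleValue) ?? ProductRole::Unknown;
                }
            }
        }

        return ProductRole::Unknown;
    }
}
```

The point of the sketch is the single `resolve()` call site: each of the four services would ask the same object instead of carrying its own keyword checks.
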
### p43C - Generic Domain Anchor Extraction

Make the current product-model and measurement-value anchor extraction more domain-generic while preserving the existing Testomat / °dH patterns as configured values.

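A generic extractor driven by configured patterns could be sketched as follows. The helper `extractFirstAnchor()`, both regular expressions and the sample sentence are hypothetical and only illustrate the shape; the real Testomat and °dH patterns live in the project config and are not reproduced here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical generic helper: return the first match of a configured pattern, or null.
 * The extractor itself knows nothing about Testomat or hardness values.
 */
function extractFirstAnchor(?string $pattern, string $text): ?string
{
    if ($pattern === null || $pattern === '') {
        return null;
    }

    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example an invalid configured pattern); "@" hides the warning in that case.
    $matched = @preg_match($pattern, $text, $matches);
    if ($matched !== 1) {
        return null;
    }

    $anchor = trim($matches[0]);

    return $anchor === '' ? null : $anchor;
}

// Illustrative patterns only, standing in for the configured values.
$patterns = [
    'product_model_pattern' => '/\bTestomat\s+[A-Za-z0-9][A-Za-z0-9-]*/iu',
    'measurement_value_pattern' => '/\d+(?:[.,]\d+)?\s*°\s*dH\b/u',
];

$question = 'Which indicator fits the Testomat 2000 at 0.5 °dH?';

$model = extractFirstAnchor($patterns['product_model_pattern'], $question);
$value = extractFirstAnchor($patterns['measurement_value_pattern'], $question);
```

Swapping in patterns for another product family or unit would then be a config change only, which is the separation p43C aims for.
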