team 1 c00cb3a9b9 p28
2026-05-04 08:38:53 +02:00


# RetrieX Developer Policies
Status: binding after completion of the YAML-only migration through Patch 11.0a.

These policies protect the stable RetrieX configuration architecture. They are intentionally operational: every change must comply with them before it is merged.
## 1. Source of truth
The functional configuration source of truth is YAML under `config/retriex/`.
This applies especially to:
- vocabulary, synonyms, stopwords and token lists
- intent rules and commerce routing
- search repair and query enrichment rules
- prompt texts, labels, output-priority rules and grounding rules
- agent messages, source labels, status messages and templates
- retrieval thresholds, token groups and scoring-related rule lists
- shop matching and commerce query parsing rules
- model and index defaults

PHP classes may read and validate these values, but must not silently redefine them.
## 2. No new PHP-only defaults
New configurable values must be added in YAML first.
Required pattern:
1. Add the value to the matching file in `config/retriex/`.
2. Wire the value through `config/services.yaml` when constructor injection is needed.
3. Expose/read it through the matching `src/Config/*Config.php` class.
4. Keep `mto:agent:config:validate` green.
5. Keep `mto:agent:config:audit-source --details` free of missing YAML mappings.

Not allowed:
- new constructor defaults that act as business or answer logic
- new hardcoded keyword lists in `src/`
- new PHP-only constants for semantic or product-specific behavior
- hidden fallbacks that change retrieval, prompt, shop or intent behavior when YAML is incomplete
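
Reviewers can pre-screen part of the list above mechanically. The following is a minimal sketch, assuming bash and a `grep` with `--include` support (GNU or BSD). The hypothetical `find_inline_keyword_lists` function flags PHP lines that contain an array literal of three or more single-quoted strings, which is a rough proxy for a hardcoded keyword list. Its hits are candidates for review, not confirmed violations. It misses lists written with double quotes or spread across several lines.

```bash
# Hypothetical review helper (not part of the RetrieX console).
# Flags PHP lines containing an array literal that starts with three or more
# single-quoted strings, a rough proxy for a hardcoded keyword list.
# Only single-line, single-quoted lists are detected.
find_inline_keyword_lists() {
  local src_dir="${1:-src}"
  local q="'[^']*'"                       # one single-quoted PHP string
  local sep="[[:space:]]*,[[:space:]]*"   # comma with optional whitespace
  grep -rnE --include='*.php' "\[[[:space:]]*${q}${sep}${q}${sep}${q}" "$src_dir" || true
}

# Usage: list candidates under src/, excluding the config readers themselves.
# find_inline_keyword_lists src | grep -v '^src/Config/'
```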
## 3. Allowed technical constants
Technical constants are allowed only when they carry no business, prompt, retrieval, product, intent or shop semantics.
Examples that may be acceptable:
- internal status strings
- command exit handling
- filesystem mode details
- non-semantic implementation identifiers

If a constant influences answer quality, matching, routing, scoring, prompt behavior or shop behavior, it belongs in YAML.
## 4. Fallback policy
Fallbacks are not a normal extension mechanism.
A fallback is only acceptable when all conditions are met:
- the value has a YAML path or explicit service-parameter mapping
- the fallback is documented as defensive infrastructure behavior
- the audit does not report it as a missing YAML mapping
- the fallback cannot change semantic answer behavior in normal operation

If in doubt, move the value to YAML.
## 5. Required checks before merge
Run the following commands before merging any change that touches `src/Config`, `config/retriex`, or the prompt, retrieval, intent, commerce, shop matching, SSE/job completion or answer grounding code:
```bash
php bin/console cache:clear
php bin/console mto:agent:config:validate
php bin/console mto:agent:config:audit-source --details
php bin/console mto:agent:regression:test
```
Expected result:
- config validation: OK
- regression baseline: OK
- source audit: no missing YAML mappings
- no new undocumented PHP-only semantic constants
- no new constructor defaults without YAML/service-parameter mapping
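
The four commands can be chained so that a run stops at the first failing check. This is a sketch, not an existing project script. It assumes that each console command exits with a non-zero status when its check fails. Confirm this for `audit-source` in particular, because the policy above is stated in terms of its report rather than its exit code. `CONSOLE` is a hypothetical override variable. The runner does not replace reading the `--details` output for new undocumented constants or constructor defaults.

```bash
# Sketch of a pre-merge runner (not an existing project script).
# Assumption: every console command exits non-zero when its check fails.
# CONSOLE is a hypothetical override; it defaults to the project console.
run_premerge_checks() {
  local console="${CONSOLE:-php bin/console}"
  local -a checks=(
    "cache:clear"
    "mto:agent:config:validate"
    "mto:agent:config:audit-source --details"
    "mto:agent:regression:test"
  )
  local check
  for check in "${checks[@]}"; do
    echo ">> $console $check"
    # Unquoted on purpose: splits the console and the check into arguments.
    # shellcheck disable=SC2086
    if ! $console $check; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "All pre-merge checks passed."
}

# Usage, from the project root:
# run_premerge_checks
```

Stopping at the first failure keeps the order of the commands meaningful. For example, a failing config validation is reported before the regression test runs against a broken configuration.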
## 6. Protected regression baseline
The following behavior must not regress:
- lowest water-hardness limit remains `0,02 deg dH` for Testomat 808
- follow-up indicator answer remains focused on indicator type 300
- accessory price follow-up for indicator type 300 returns the matching indicator products, not device prices
- history-based shop follow-up such as `suche im shop` keeps the relevant product context
- advisory product questions may use shop/catalog fallback when RAG knowledge is insufficient
- SSE/job completion must close loader/think states reliably, including reconnect/watchdog cases

`mto:agent:regression:test` covers the offline-checkable parts of this baseline. When a touched code path is not covered offline, verify its end-to-end behavior manually or with an integration test.
## 7. Strict YAML validation
Strict YAML validation remains intentionally deferred.
Until a later patch explicitly enables it, developers must enforce these policies through:
- code review
- `mto:agent:config:validate`
- `mto:agent:config:audit-source --details`
- `mto:agent:regression:test`

When strict mode is introduced in a later patch, it must be configurable and disabled by default.
## 8. Pull request checklist
Use this checklist for every relevant PR:
- [ ] All new configurable behavior is in `config/retriex/*.yaml`.
- [ ] No new semantic keyword/token/prompt list was added directly to PHP.
- [ ] No new constructor default was added without YAML/service-parameter mapping.
- [ ] `mto:agent:config:validate` is OK.
- [ ] `mto:agent:config:audit-source --details` has no missing YAML mappings.
- [ ] `mto:agent:regression:test` is OK.
- [ ] The protected functional flows were manually checked if the touched area can affect them.
- [ ] The README or patch README documents the reason for any intentionally accepted technical fallback.
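
Part of the first checklist item can be pre-screened from the list of changed files. The following is a minimal sketch. The hypothetical `check_config_yaml_pairing` function reads changed paths on stdin and warns when `src/Config/` changed but no `config/retriex/*.yaml` file did. A warning prompts the reviewer to look closer but does not prove a violation, because a pure refactor of a config class may legitimately touch no YAML. The base branch `origin/main` in the usage line is an assumption.

```bash
# Hypothetical PR pre-screen (not an existing project command).
# Reads changed file paths, one per line, on stdin.
# Returns 1 and prints a warning when src/Config/ changed without any
# config/retriex/*.yaml change; returns 0 otherwise.
check_config_yaml_pairing() {
  local paths
  paths=$(cat)
  if grep -q '^src/Config/' <<<"$paths" &&
     ! grep -qE '^config/retriex/.*\.yaml$' <<<"$paths"; then
    echo "WARNING: src/Config changed without a config/retriex YAML change." >&2
    return 1
  fi
  return 0
}

# Usage (base branch name is an assumption):
# git diff --name-only origin/main...HEAD | check_config_yaml_pairing
```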
## 9. Language cleanup ownership
Generic language cleanup must use `config/retriex/language.yaml` and its cleanup profiles.
Rules:
- add generic German stopwords to `stopword_groups`, not to domain YAML files
- add user wording such as `ich suche`, `zeige mir` or `habt ihr` to `phrase_groups`
- add table/list/overview wording to `meta_term_groups`
- keep commerce intent, product-role, measurement and routing terms in their owning domain YAML
- never remove protected terms such as `nicht`, `kein`, `testomat`, `indikator`, `ph`, `rx`, `th`, `tc` or `0,02` through generic cleanup
- prefer `cleanup_profile: ...` references over copied token lists

See `RETRIEX_LANGUAGE_CLEANUP_GUIDE.md` for the detailed ownership rules.
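
Before adding words to `stopword_groups` or `phrase_groups`, check the candidate list against the protected terms. The following is a minimal sketch. The hypothetical `protected_term_hits` function reads candidate words on stdin, one per line, and prints every word that exactly matches a protected term, ignoring case. Its term list only mirrors the examples named above and is not exhaustive. It is meant as a reviewer's tool, not as code under `src/`.

```bash
# Hypothetical review aid, not project code. Mirrors only the protected terms
# named in this section; it is not an authoritative or complete list.
PROTECTED_TERMS=(nicht kein testomat indikator ph rx th tc "0,02")

# Reads candidate cleanup words on stdin (one per line) and prints every
# word that equals a protected term, ignoring case. Always returns 0.
protected_term_hits() {
  local patterns
  patterns=$(printf '%s\n' "${PROTECTED_TERMS[@]}")
  # -F fixed strings, -x whole-line match, -i ignore case;
  # newline-separated patterns in -e are treated as separate patterns.
  grep -Fxi -e "$patterns" || true
}

# Usage: printf '%s\n' und oder nicht | protected_term_hits   # prints: nicht
```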