79 lines
2.2 KiB
Markdown
79 lines
2.2 KiB
Markdown
# RetrieX Numeric Extreme Retrieval Fix
|
|
|
|
## Purpose
|
|
|
|
This patch sharpens retrieval for direct numeric extreme questions such as the lowest hardness threshold.
|
|
|
|
The concrete regression was:
|
|
|
|
- User asks for the lowest water-hardness threshold monitored by a Testomat.
|
|
- The correct answer is `0,02 °dH` / `Testomat 808`.
|
|
- Retrieval still allowed neighbouring runner-up product context such as `Testomat 2000` / `0,05 °dH` into the prompt.
|
|
|
|
That made the model add unnecessary comparison details although the user asked only for the lowest value.
|
|
|
|
## Change
|
|
|
|
`src/Knowledge/Retrieval/NdjsonHybridRetriever.php` now adds a conservative numeric-extreme document selection step between focused-product selection and normal dominant/spread selection.
|
|
|
|
The new mode:
|
|
|
|
- detects minimum/maximum-style technical measurement questions,
|
|
- extracts dH measurement values from the top retrieval window,
|
|
- identifies the document containing the actual extreme value,
|
|
- selects chunks from that document only,
|
|
- avoids filling the remaining prompt slots with runner-up product chunks.
|
|
|
|
New debug selection mode:
|
|
|
|
```text
|
|
sales_numeric_extreme_document
|
|
```
|
|
|
|
## Safety
|
|
|
|
The fix is intentionally narrow:
|
|
|
|
- no PromptBuilder changes,
|
|
- no prompt wording changes,
|
|
- no Shopware logic changes,
|
|
- no vector-service changes,
|
|
- no scoring config changes,
|
|
- no vocabulary changes.
|
|
|
|
It only affects technical numeric extreme questions containing measurement/context signals such as `Grenzwert`, `Messbereich`, `Wasserhärte`, `Resthärte`, `dH`, `threshold`, or `range`.
|
|
|
|
## Expected regression result
|
|
|
|
Question:
|
|
|
|
```text
|
|
Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?
|
|
```
|
|
|
|
Expected answer should stay focused on:
|
|
|
|
```text
|
|
0,02 °dH / Testomat 808
|
|
```
|
|
|
|
It should not add the runner-up product/value such as:
|
|
|
|
```text
|
|
Testomat 2000 / 0,05 °dH
|
|
```
|
|
|
|
unless the user explicitly asks for comparison, alternatives, or all available values.
|
|
|
|
## After applying
|
|
|
|
Run:
|
|
|
|
```bash
|
|
php bin/console cache:clear
|
|
php bin/console mto:agent:config:validate
|
|
php bin/console mto:agent:regression:test
|
|
```
|
|
|
|
Then manually retest the known 1.4.2 baseline and the lowest-threshold prompt above.
|