MtoRagSystem/patch_history/RETRIEX_NUMERIC_EXTREME_RETRIEVAL_FIX_README.md
2026-05-04 19:15:22 +02:00


RetrieX Numeric Extreme Retrieval Fix

Purpose

This patch sharpens retrieval for questions that ask directly for a numeric extreme, such as the lowest water-hardness threshold a device can monitor.

The concrete regression was:

  • User asks for the lowest water-hardness threshold monitored by a Testomat.
  • The correct answer is 0,02 °dH / Testomat 808.
  • Retrieval still allowed neighbouring runner-up product context such as Testomat 2000 / 0,05 °dH into the prompt.

As a result, the model added unnecessary comparison details even though the user asked only for the lowest value.

Change

src/Knowledge/Retrieval/NdjsonHybridRetriever.php now adds a conservative numeric-extreme document selection step between focused-product selection and normal dominant/spread selection.

The new mode:

  • detects minimum/maximum-style technical measurement questions,
  • extracts dH measurement values from the top retrieval window,
  • identifies the document containing the actual extreme value,
  • selects chunks from that document only,
  • avoids filling the remaining prompt slots with runner-up product chunks.

New debug selection mode:

sales_numeric_extreme_document
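
The selection steps listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP code in NdjsonHybridRetriever.php: the `Chunk` type, the function names, the regular expression, and the `window` and `max_chunks` defaults are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Assumed pattern: matches values such as "0,02 °dH" or "0.05 dH"
# (German decimal comma or decimal point).
DH_VALUE = re.compile(r"(\d+(?:[.,]\d+)?)\s*°?\s*dH\b")


@dataclass
class Chunk:
    """Hypothetical retrieval result: one text chunk and its source document."""
    doc_id: str
    text: str


def extract_dh_values(text: str) -> list[float]:
    """Return all dH measurement values found in the text as floats."""
    return [float(raw.replace(",", ".")) for raw in DH_VALUE.findall(text)]


def select_extreme_document(chunks, want="min", window=10, max_chunks=4):
    """Pick the document holding the extreme dH value in the top window.

    Returns (selection_mode, chunks_of_that_document), or None when no
    dH value is found, so the caller can fall back to normal selection.
    """
    best = None  # (value, doc_id) of the current extreme
    for chunk in chunks[:window]:
        for value in extract_dh_values(chunk.text):
            is_better = (
                best is None
                or (want == "min" and value < best[0])
                or (want == "max" and value > best[0])
            )
            if is_better:
                best = (value, chunk.doc_id)
    if best is None:
        return None
    # Keep chunks from the winning document only; do not fill the
    # remaining slots with runner-up products.
    selected = [c for c in chunks if c.doc_id == best[1]][:max_chunks]
    return "sales_numeric_extreme_document", selected
```

Returning `None` when no value is found keeps the step conservative in this sketch: the caller would simply continue with the normal dominant/spread selection.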

Safety

The fix is intentionally narrow:

  • no PromptBuilder changes,
  • no prompt wording changes,
  • no Shopware logic changes,
  • no vector-service changes,
  • no scoring config changes,
  • no vocabulary changes.

It only affects technical numeric extreme questions containing measurement/context signals such as Grenzwert, Messbereich, Wasserhärte, Resthärte, dH, threshold, or range.
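
A minimal sketch of such a gate is shown below, again in Python for illustration only. The context signals are the ones named above; the extreme-word stems and the function name `detect_extreme` are assumptions, and the real detection may differ.

```python
import re

# Assumed word stems that indicate a minimum- or maximum-style question.
EXTREME_STEMS = {
    "min": ("niedrigst", "kleinst", "geringst", "lowest", "smallest", "minim"),
    "max": ("höchst", "größt", "highest", "largest", "maxim"),
}

# Measurement/context signals named in this README, lowercased.
CONTEXT_SIGNALS = {
    "grenzwert", "messbereich", "wasserhärte", "resthärte",
    "dh", "threshold", "range",
}


def detect_extreme(question: str):
    """Return "min" or "max" for a technical numeric extreme question, else None."""
    words = re.findall(r"\w+", question.lower())
    # Require at least one measurement/context signal as a whole word,
    # so that unrelated "cheapest"/"largest" questions are not affected.
    if not any(word in CONTEXT_SIGNALS for word in words):
        return None
    for mode, stems in EXTREME_STEMS.items():
        # str.startswith accepts a tuple of prefixes.
        if any(word.startswith(stems) for word in words):
            return mode
    return None
```

Matching the context signals as whole words rather than substrings avoids accidental hits such as "range" inside "orange"; the trade-off is that long German compounds containing a signal are not recognised by this sketch.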

Expected regression result

Question:

Was ist der niedrigste Grenzwert für die Wasserhärte, welcher mit einem Testomaten überwacht werden kann?

(In English: What is the lowest water-hardness threshold that can be monitored with a Testomat?)

The answer should stay focused on:

0,02 °dH / Testomat 808

It should not add the runner-up product/value such as:

Testomat 2000 / 0,05 °dH

unless the user explicitly asks for comparison, alternatives, or all available values.
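
For a manual retest, this expectation could be checked with a small helper like the one below. It is a hypothetical Python sketch and does not describe how `mto:agent:regression:test` works internally; the function name and the plain substring checks are assumptions.

```python
def check_lowest_threshold_answer(answer: str) -> list[str]:
    """Return a list of problems found in the answer; empty means it passes."""
    problems = []
    # The answer must name the extreme value and its product.
    for required in ("0,02", "808"):
        if required not in answer:
            problems.append(f"missing expected term: {required}")
    # The runner-up value and product must not appear unprompted.
    for forbidden in ("0,05", "Testomat 2000"):
        if forbidden in answer:
            problems.append(f"unexpected runner-up detail: {forbidden}")
    return problems
```

The check only applies to the plain lowest-threshold prompt; if the user explicitly asks for a comparison, the runner-up terms are legitimate and the helper should not be used.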

After applying

Run:

php bin/console cache:clear
php bin/console mto:agent:config:validate
php bin/console mto:agent:regression:test

Then manually retest the known 1.4.2 baseline and the lowest-threshold prompt above.