MtoRagSystem/RETRIEX_SSE_JOB_HARDENING_FIX_README.md
2026-04-26 13:09:01 +02:00

# RetrieX SSE Job Hardening Fix
Patch-only fix for the browser streaming job lifecycle.
## Problem
Previously, `/ask-sse/{jobId}` deleted the stream job as soon as the first EventSource connection opened.
If the connection dropped briefly anywhere along the path (browser, WLAN, router, proxy, or PHP/Nginx), EventSource automatically reconnected to the same URL, i.e. with the same job id. Because the job file had already been deleted, the user saw:
> Der Antwort-Job ist abgelaufen oder wurde nicht gefunden. Bitte sende die Anfrage erneut.
> (The answer job has expired or was not found. Please send the request again.)

As a result, ordinary network interruptions were reported as if the job had expired.
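
The failure mode can be reproduced with a few lines of Python. This is an illustration, not the PHP controller: a dict is a hypothetical stand-in for the job files, and `open_stream_old` is a hypothetical name for the old delete-on-open logic.

```python
from typing import Dict

# Hypothetical stand-in for the job files on disk (job id -> job payload).
jobs: Dict[str, dict] = {}


def open_stream_old(job_id: str) -> str:
    """Old behavior: the job is consumed (deleted) on the first open."""
    job = jobs.pop(job_id, None)
    if job is None:
        # Nothing left to stream: the user sees the "expired" message.
        return "expired"
    return "streaming"


jobs["abc123"] = {"question": "example prompt"}
first = open_stream_old("abc123")  # initial EventSource connection
# ... network drops; EventSource reconnects to the same /ask-sse/abc123 URL ...
reconnect = open_stream_old("abc123")
print(first, reconnect)  # streaming expired
```

The reconnect fails only because the first open already destroyed the job, not because any TTL elapsed.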
## Change
`src/Controller/AskSseController.php` now keeps the job file for the configured TTL and uses explicit job states:
- `pending`
- `running`
- `completed`
- `interrupted`
- `failed`

The stream endpoint atomically claims a `pending` job under a file lock instead of deleting it immediately. Reconnects and duplicate opens therefore no longer find a missing job; they receive a message that reflects the stored state instead of the generic "expired" message.
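
A minimal sketch of the claim step, written in Python rather than the controller's PHP. It assumes one JSON file per job with a `state` field; the file layout and the `claim_job` name are hypothetical. It uses POSIX `fcntl.flock` (Unix only) as the file lock, so only one request can move a job from `pending` to `running`:

```python
import fcntl
import json
import os
from typing import Optional, Tuple


def claim_job(path: str) -> Tuple[bool, Optional[str]]:
    """Move a job from 'pending' to 'running' while holding an exclusive lock.

    Returns (claimed, state). state is None when the job file does not exist.
    """
    try:
        fh = open(path, "r+", encoding="utf-8")
    except FileNotFoundError:
        return False, None
    with fh:
        # Exclusive lock: a concurrent open for the same job waits here until
        # the first request has written its decision back to the file.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)
        try:
            job = json.load(fh)
            state = job.get("state")
            if state != "pending":
                # Reconnect or duplicate open: leave the file untouched and
                # report the stored state so the caller can pick a message.
                return False, state
            job["state"] = "running"
            fh.seek(0)
            fh.truncate()
            json.dump(job, fh)
            fh.flush()
            os.fsync(fh.fileno())
            return True, "running"
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
```

A second call on the same file returns `(False, "running")`, which is the reconnect or duplicate-open case; a missing file returns `(False, None)`. The read, the state check and the write all happen under the same lock, which is what makes the claim atomic.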
## Runtime behavior
- A new job is created as `pending`.
- The first `/ask-sse/{jobId}` request claims it as `running`.
- Successful completion marks it as `completed`.
- A browser/client connection abort marks it as `interrupted`.
- Stream exceptions or fatal shutdown errors mark it as `failed`.
- Old job files are still cleaned up once they are older than `JOB_TTL_SECONDS`.
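
The lifecycle above can be summarized as a small transition table plus a TTL sweep. This Python sketch is illustrative only: the `TRANSITIONS` table is inferred from the list above, and `transition`, `cleanup_expired`, the `*.json` file naming and measuring a job's age by file modification time are assumptions, not the controller's actual code.

```python
import os
import time
from typing import List, Optional

# State changes inferred from the runtime list; terminal states allow none.
TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "interrupted", "failed"},
    "completed": set(),
    "interrupted": set(),
    "failed": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the change is not in TRANSITIONS."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal job state change: {current!r} -> {new!r}")
    return new


def cleanup_expired(job_dir: str, ttl_seconds: int,
                    now: Optional[float] = None) -> List[str]:
    """Delete *.json job files older than ttl_seconds, whatever their state."""
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(job_dir):
        path = os.path.join(job_dir, name)
        # Assumption: age is measured from the file's modification time.
        if name.endswith(".json") and now - os.path.getmtime(path) > ttl_seconds:
            os.remove(path)
            removed.append(name)
    return sorted(removed)
```

In this sketch a request can never restart a `completed` job, and the TTL sweep is the only code path that deletes job files, so a reconnect within the TTL always finds the job and its state.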
## Safety
This patch does not change Retrieval, PromptBuilder, AgentRunner, Shopware, Intent, Vocabulary, scoring or RAG behavior.
It only hardens the SSE job lifecycle and improves user-facing error messages for reconnect/network cases.
## After applying
Run:
```bash
php bin/console cache:clear
php bin/console mto:agent:config:validate
php bin/console mto:agent:regression:test
```
Then test in the browser with a normal prompt and, if possible, simulate a short network interruption during streaming (for example, by switching the browser devtools network throttling to Offline for a few seconds).