feat: incident correlation overhaul, signal-based auto-resolve, token fixes

Correlator - Raise fast-path idle gate 30 → 90 min (tg_fast_path_idle_minutes) - Fix disambiguate always-commits bug: run _call_fits_incident on winner before committing; fall through to new-incident creation if it fails - Add unit-continuity path (path 1.5): matches all_active by shared unit IDs with a reassignment guard, bridges calls past the idle gate - Add tag-based incident_type inference (_TAG_TYPE_HINTS) as GPT fallback, rescuing tagged calls that would have been dropped (616 observed orphans) - Add master/child incident model: _create_master_incident, _demote_to_child, _add_child_to_master; new incidents stamped incident_type="master" - Add cross-system parent detection (_find_cross_system_parent): two-signal scoring (road overlap=0.4, embedding≥0.78=0.3, proximity=0.3, threshold=0.5) wired into create-if-new path; creates master shell on first cross-system match - Add maybe_resolve_parent: auto-resolves master when all children close; called from upload pipeline (LLM closure) and summarizer stale sweep - Add signal-based auto-resolve via units_active/units_cleared tracking: GPT now extracts cleared_units per scene; _update_incident moves units between active/cleared lists and resolves the incident when active empties; stored on call doc for re-correlation sweep reuse - Add _create_incident initialization of units_active/units_cleared fields Re-correlation sweep - Add corr_sweep_count + MAX_SWEEP_ATTEMPTS=3: orphans get 3 attempts then are tombstoned as corr_path="unlinked", ending the re-sweep loop (previously hammering each orphan 29-31 times per shift) Intelligence extraction - Add cleared_units to GPT prompt schema and rules - Extract and propagate cleared_units per scene; merge across scenes; store on call doc for re-correlation sweep Token management - Fix token release bug: remove release_token call on discord_connected=False in MQTT checkin (transient Discord drops were orphaning bots mid-shift) - Add PUT /tokens/{id}/prefer/{system_id} endpoint: lock a bot token to a system; pass _none as system_id to clear; stored bidirectionally on both token and system documents - discord_join handler resolves preferred_token_id from system doc and passes system_name in MQTT payload
2026-05-10 19:49:05 -04:00
parent 7e1b01a275
commit 8b660d8e10
9 changed files with 509 additions and 23 deletions
@@ -109,10 +109,11 @@ class MQTTHandler:
                updates["status"] = "online"
            await fstore.doc_update("nodes", node_id, updates)

-        # Release any orphaned Discord token when the node explicitly reports disconnected
-        if payload.get("discord_connected") is False:
-            from app.routers.tokens import release_token
-            await release_token(node_id)
+        # NOTE: discord_connected in checkins is informational only — do NOT release the
+        # token here.  The bot watchdog reconnects on transient Discord drops, so a single
+        # checkin with discord_connected=False during a brief reconnect window would
+        # incorrectly free the token while the bot is still active.  Token release is
+        # handled exclusively by the discord_leave command and the node offline sweeper.

    # ------------------------------------------------------------------
    # Status update