_handle_status was calling doc_update unconditionally, which throws a 404
when a node has been deleted from the UI but is still running and sending
heartbeats. Catch the "No document to update" error and log at info level
instead of bubbling up to the dispatch error handler.
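
A minimal sketch of the guard described above. `DocNotFound` and the in-memory `doc_update` are hypothetical stand-ins for the project's helper; with the google-cloud-firestore client, an update on a missing document surfaces as `google.api_core.exceptions.NotFound`.

```python
import logging

logger = logging.getLogger("mqtt_handler")


class DocNotFound(Exception):
    """Stand-in for the 404 raised when updating a document that does not exist."""


def doc_update(store, collection, doc_id, fields):
    """Hypothetical stand-in for doc_update: fails if the document is missing."""
    key = (collection, doc_id)
    if key not in store:
        raise DocNotFound(f"404 No document to update: {collection}/{doc_id}")
    store[key].update(fields)


def handle_status(store, node_id, status):
    """Apply a heartbeat. Returns True if written, False if the node doc is gone."""
    try:
        doc_update(store, "nodes", node_id, {"status": status})
        return True
    except DocNotFound as exc:
        # Node was deleted from the UI but is still heartbeating: expected, not an error.
        logger.info("ignoring status from deleted node %s: %s", node_id, exc)
        return False
```

Only the not-found case is swallowed; any other exception still reaches the dispatch error handler.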
Correlator
- Raise fast-path idle gate 30 → 90 min (tg_fast_path_idle_minutes)
- Fix disambiguate always-commits bug: run _call_fits_incident on winner
before committing; fall through to new-incident creation if it fails
- Add unit-continuity path (path 1.5): matches all_active by shared unit
IDs with a reassignment guard, bridges calls past the idle gate
- Add tag-based incident_type inference (_TAG_TYPE_HINTS) as GPT fallback,
rescuing tagged calls that would have been dropped (616 observed orphans)
- Add master/child incident model: _create_master_incident, _demote_to_child,
_add_child_to_master; new incidents stamped incident_type="master"
- Add cross-system parent detection (_find_cross_system_parent): two-signal
scoring (road overlap=0.4, embedding≥0.78=0.3, proximity=0.3, threshold=0.5;
no single signal reaches the threshold, so at least two must agree);
wired into create-if-new path; creates master shell on first cross-system match
- Add maybe_resolve_parent: auto-resolves master when all children close;
called from upload pipeline (LLM closure) and summarizer stale sweep
- Add signal-based auto-resolve via units_active/units_cleared tracking:
GPT now extracts cleared_units per scene; _update_incident moves units
between active/cleared lists and resolves the incident when active empties;
stored on call doc for re-correlation sweep reuse
- Add _create_incident initialization of units_active/units_cleared fields
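
The cross-system scoring weights above can be sketched as follows. This is an illustration only: the function names are hypothetical, the signals are taken as precomputed inputs (how road overlap and proximity are detected is not shown here), and `>=` on the threshold is an assumption.

```python
ROAD_WEIGHT = 0.4
EMBEDDING_WEIGHT = 0.3
PROXIMITY_WEIGHT = 0.3
EMBEDDING_MIN_SIMILARITY = 0.78
MATCH_THRESHOLD = 0.5


def cross_system_score(road_overlap: bool, embedding_similarity: float, is_nearby: bool) -> float:
    """Sum the weights of the signals that fired."""
    score = 0.0
    if road_overlap:
        score += ROAD_WEIGHT
    if embedding_similarity >= EMBEDDING_MIN_SIMILARITY:
        score += EMBEDDING_WEIGHT
    if is_nearby:
        score += PROXIMITY_WEIGHT
    return score


def is_cross_system_match(road_overlap: bool, embedding_similarity: float, is_nearby: bool) -> bool:
    """True when the combined score reaches the threshold (assumed inclusive)."""
    return cross_system_score(road_overlap, embedding_similarity, is_nearby) >= MATCH_THRESHOLD
```

Because the largest single weight is 0.4, one signal alone never reaches 0.5; any two signals together do.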
Re-correlation sweep
- Add corr_sweep_count + MAX_SWEEP_ATTEMPTS=3: orphans get 3 attempts,
then are tombstoned as corr_path="unlinked", ending the re-sweep loop
(previously each orphan was re-swept 29-31 times per shift)
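
A rough sketch of the attempt cap, using plain dicts in place of call documents. `corr_sweep_count`, `corr_path`, and `MAX_SWEEP_ATTEMPTS` come from the change above; `incident_id` and both function names are assumptions for illustration.

```python
MAX_SWEEP_ATTEMPTS = 3


def record_failed_sweep(call: dict) -> dict:
    """Return an updated copy of an orphan call after one failed re-correlation attempt."""
    updated = dict(call)
    updated["corr_sweep_count"] = call.get("corr_sweep_count", 0) + 1
    if updated["corr_sweep_count"] >= MAX_SWEEP_ATTEMPTS:
        # Tombstone: the sweep skips this call from now on.
        updated["corr_path"] = "unlinked"
    return updated


def needs_sweep(call: dict) -> bool:
    """An orphan is eligible until it is linked or tombstoned."""
    return call.get("incident_id") is None and call.get("corr_path") != "unlinked"
```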
Intelligence extraction
- Add cleared_units to GPT prompt schema and rules
- Extract and propagate cleared_units per scene; merge across scenes;
store on call doc for re-correlation sweep
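
One way the cleared_units flow might look, sketched on plain dicts. `units_active`, `units_cleared`, and `cleared_units` are the fields named above; the function names, the `status` field, and the rule that a re-seen unit returns to active are assumptions.

```python
def merge_cleared_units(scenes):
    """Union cleared_units across scenes, keeping first-seen order."""
    merged = []
    for scene in scenes:
        for unit in scene.get("cleared_units", []):
            if unit not in merged:
                merged.append(unit)
    return merged


def apply_unit_signals(incident, units, cleared_units):
    """Move units between the active/cleared lists; resolve when active empties."""
    active = list(incident.get("units_active", []))
    cleared = list(incident.get("units_cleared", []))
    for unit in units:
        if unit not in cleared_units and unit not in active:
            active.append(unit)
            if unit in cleared:
                cleared.remove(unit)  # assumption: a re-seen unit is active again
    for unit in cleared_units:
        if unit in active:
            active.remove(unit)
        if unit not in cleared:
            cleared.append(unit)
    updated = dict(incident, units_active=active, units_cleared=cleared)
    # Only resolve once at least one unit has cleared, so a brand-new
    # incident with no units yet is not closed immediately.
    if cleared and not active:
        updated["status"] = "resolved"
    return updated
```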
Token management
- Fix token release bug: remove release_token call on discord_connected=False
in MQTT checkin (transient Discord drops were orphaning bots mid-shift)
- Add PUT /tokens/{id}/prefer/{system_id} endpoint: lock a bot token to a
system; pass _none as system_id to clear; stored bidirectionally on both
token and system documents
- discord_join handler resolves preferred_token_id from system doc and passes
system_name in MQTT payload
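
A sketch of the bidirectional preference write behind the prefer endpoint, with dicts standing in for the token and system documents. `preferred_token_id` and the `_none` sentinel are from the notes above; `preferred_system_id`, the function name, and the cleanup of stale back-references are assumptions.

```python
CLEAR_SENTINEL = "_none"


def set_preferred_system(tokens: dict, systems: dict, token_id: str, system_id: str) -> None:
    """Lock token_id to system_id, or clear the lock when system_id is '_none'."""
    token = tokens[token_id]

    # Drop the back-reference on whichever system this token was locked to before.
    previous = token.get("preferred_system_id")
    if previous is not None and systems.get(previous, {}).get("preferred_token_id") == token_id:
        systems[previous]["preferred_token_id"] = None

    if system_id == CLEAR_SENTINEL:
        token["preferred_system_id"] = None
        return

    # If another token was locked to the target system, unlink it first.
    system = systems[system_id]
    other = system.get("preferred_token_id")
    if other is not None and other != token_id and other in tokens:
        tokens[other]["preferred_system_id"] = None

    token["preferred_system_id"] = system_id
    system["preferred_token_id"] = token_id
```

Writing both sides lets the discord_join handler resolve the token from the system doc without scanning the token collection.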
### Firestore read reductions
**1. `doc_get_cached()` in `firestore.py` — new 5-min TTL cache**
One place, benefits everything. System and node config documents almost never change during a monitoring session.
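
For illustration, a TTL cache of this kind might look like the sketch below. The real signature of `doc_get_cached()` is not shown in these notes; the `fetch` and `now` parameters are assumptions added so the example is self-contained.

```python
import time

CACHE_TTL_SECONDS = 300  # 5 minutes

# (collection, doc_id) -> (time stored, document)
_cache = {}


def doc_get_cached(collection, doc_id, fetch, now=time.monotonic):
    """Return a cached document if it is younger than the TTL, else fetch and store it."""
    key = (collection, doc_id)
    hit = _cache.get(key)
    if hit is not None and now() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    doc = fetch(collection, doc_id)  # the only path that costs a Firestore read
    _cache[key] = (now(), doc)
    return doc
```

The trade-off is staleness: an edit to a system or node document can take up to five minutes to be seen by a running process.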
**2. System doc: 4 reads → 1 per call**
| Before | After |
|---|---|
| `upload.py` — `doc_get("systems")` for ai_flags | `doc_get_cached` |
| `transcription.py` — `get_vocabulary()` → `doc_get("systems")` | cache hit |
| `intelligence.py` — `get_vocabulary()` → `doc_get("systems")` | cache hit |
| `intelligence.py` — `doc_get("systems")` again for ten_codes | eliminated (reads same cached doc) |
**3. Node doc: cached in `_on_call_start` and `intelligence.py`**
The node is read every call event to get `assigned_system_id` and lat/lon for geocoding. Both now use the cache — node assignments and positions essentially never change at runtime.
**4. Node sweeper: 30s → 90s interval**
The sweeper was doing a full node collection scan 3× more often than necessary, since the offline threshold is already 90s. Cuts sweeper reads by about two-thirds (67%).
**5. Vocabulary induction: scans all-time calls → last 7 days**
Previously fetched every ended call for a system (could be thousands). Now scoped to the last 7 days.
> **Note:** The vocabulary induction query `(system_id == X, ended_at >= cutoff)` needs a Firestore
> composite index on `(system_id ASC, ended_at ASC)`. When the induction loop first fires it will log
> an error with a Firebase Console link to create it in one click.
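
A sketch of how the 7-day scope could be built. It assumes the collection object exposes a chainable `where(field, op, value)`, as google-cloud-firestore collection references do (newer releases prefer the `filter=` keyword form); the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

INDUCTION_WINDOW = timedelta(days=7)


def induction_filters(system_id, now=None):
    """Return the two filters that together require the composite index."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - INDUCTION_WINDOW
    return [("system_id", "==", system_id), ("ended_at", ">=", cutoff)]


def build_induction_query(collection, system_id, now=None):
    """Chain the filters onto any object with a Firestore-style where()."""
    query = collection
    for field, op, value in induction_filters(system_id, now):
        query = query.where(field, op, value)
    return query
```

An equality filter on one field combined with a range filter on another is what makes Firestore ask for the composite index.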

| Issue | Fix |
|---|---|
| Upload 404 warning | `doc_set(merge=True)` in `upload.py`; creates doc if missing |
| MQTT call_end 404 error | `doc_set(merge=True)` in `mqtt_handler.py`; same root cause |
| Transcription 404 (saving transcript to nonexistent doc) | `doc_set(merge=True)` in `transcription.py` |
| Transcription ADC credentials error | Explicit `service_account.Credentials` from `gcp-key.json` in `_sync_transcribe`; same pattern as `storage.py` |

docker-compose.yml: Added a pulse_socket named volume mounted at /run/pulse in both op25 and edge-node. Also set PULSE_SERVER=unix:/run/pulse/native in edge-node so libpulse (and ffmpeg's pulse input) finds the right socket.
discord_radio.py: Removed _icecast_url and changed _play_stream() to use -f pulse -i default.monitor. This reads directly from the PulseAudio sink monitor, so audio no longer passes through an Icecast stream buffer. The PULSE_SERVER env var is inherited by the ffmpeg subprocess.
Note: default.monitor captures whatever audio is playing on the default sink. If OP25 uses a named virtual sink, you may need to replace default.monitor with <sink_name>.monitor (run pactl list sinks short inside the op25 container to find the name).
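
A sketch of the ffmpeg argument list with the sink name made configurable. The `-f pulse -i <sink>.monitor` input is from the change above; the function name and the output options (raw 16-bit, 48 kHz, stereo PCM on stdout) are assumptions about what the Discord side consumes.

```python
def pulse_ffmpeg_args(sink_name: str = "default") -> list:
    """Build an ffmpeg command that captures a PulseAudio sink monitor."""
    source = f"{sink_name}.monitor"
    return [
        "ffmpeg",
        "-f", "pulse",      # PulseAudio input device
        "-i", source,       # e.g. default.monitor or <sink_name>.monitor
        "-f", "s16le",      # assumed output: raw signed 16-bit PCM
        "-ar", "48000",     # assumed sample rate
        "-ac", "2",         # assumed stereo
        "pipe:1",           # write to stdout
    ]
```

With a named virtual sink, `pulse_ffmpeg_args("my_sink")` reads from `my_sink.monitor`; the sink name here is a placeholder.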
Issue 2 — No audio URL / GCS credentials
storage.py: storage.Client() was relying on Application Default Credentials (ADC), but ADC isn't configured in the container. It now uses storage.Client.from_service_account_json(settings.gcp_credentials_path) when GCP_CREDENTIALS_PATH is set, which is the same credential file Firebase already loads.
You also need to mount the key file into the server container in docker-compose.yml:

    c2-core:
      volumes:
        - ./gcp-key.json:/app/gcp-key.json:ro

And set GCS_BUCKET=your-bucket-name in .env.
Issue 3 — Token orphaning
mqtt_manager.py: Every checkin now includes "discord_connected": radio_bot.is_connected.
mqtt_handler.py: On checkin, if discord_connected is explicitly False, calls release_token(node_id). Only fires on explicit false (missing field = unknown = no action).
node_sweeper.py: When a node is swept to offline, its token is released too. This covers the case where the node stops checking in entirely (crash/power loss).
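
The "explicit false only" rule for checkins can be sketched as a three-state check. The function names are hypothetical, and `release_token` is passed in as a callable so the example stands alone.

```python
def should_release_token(checkin: dict) -> bool:
    """True only when the node explicitly reported discord_connected = False."""
    # A missing field or None means "unknown", so no action is taken.
    return checkin.get("discord_connected") is False


def on_checkin(node_id: str, checkin: dict, release_token) -> bool:
    """Release the node's token on an explicit disconnect; report whether it fired."""
    if should_release_token(checkin):
        release_token(node_id)
        return True
    return False
```

Using `is False` rather than a truthiness test is what keeps older nodes, which do not send the field at all, from losing their tokens.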