Commit Graph

49 Commits

Author SHA1 Message Date
Logan 913fe0cbee Add source call audio playback to vocabulary suggestions
When the induction loop proposes a new vocabulary term, it now records
which sampled call(s) most likely produced the suggestion. Admins see
a collapsible "▶ source" player under each pending term showing the
audio clip and transcript, so they can hear what was actually said
before approving or dismissing.

- vocabulary_learner: track sampled call docs, attach source_call_ids
  to each pending term via word-overlap search with fallback
- types: VocabularyPendingTerm.source_call_ids?: string[]
- c2api: add getCall(id) using existing GET /calls/{call_id} endpoint
- VocabularyPanel: SourceCallPlayer component — lazy-loads call on
  first expand, shows audio controls + transcript snippet
2026-06-01 01:45:03 -04:00
Logan 032eef311f Fix vocabulary induction loop running too late
The loop slept 24h before its first pass, so suggestions would never
appear unless the server was up for a full day. Move the sleep to the
end so the first induction pass runs ~30s after startup.
2026-06-01 01:26:54 -04:00
Logan 3d51db80d0 Improve extraction accuracy with speaker role inference
Add a SPEAKER ROLES section to the GPT-4o-mini prompt teaching it to
distinguish dispatch voice (names a unit then gives assignment + address)
from unit voice (opens with own callsign + brief status). Applied to
location attribution (dispatch-provided address beats unit position report)
and unit extraction (dispatched units vs. acknowledging units). No extra
API calls — purely prompt-level reasoning on the existing transcript.
2026-06-01 01:17:49 -04:00
Logan 683b05beb1 Silence ERROR log for status messages from deleted nodes
_handle_status was calling doc_update unconditionally, which throws a 404
when a node has been deleted from the UI but is still running and sending
heartbeats. Catch the "No document to update" error and log at info level
instead of bubbling up to the dispatch error handler.
2026-06-01 01:06:49 -04:00
Logan cbcc85f7b1 Add consensus correlator: rules + Gemini LLM with smart tiebreaker
Refactor incident_correlator.py to a decision/commit split (preview_correlation
/ apply_correlation) so the rules engine and LLM can both produce decisions before
anything is written to Firestore.

Add llm_correlator.py: cheap Gemini Flash first-pass + Gemini Pro tiebreaker.
Wire _correlate_with_consensus in upload.py — rules-only fallback when key is
absent or call is thin; agreed/tiebreak consensus written to corr_debug.
2026-06-01 00:56:11 -04:00
Logan 6bf4333b72 Make correlation conservative: no time_fallback, pursuit-aware proximity, tiered thin path
- Remove time_fallback from _call_fits_incident: a substantive call with no
  matching signals (unit/vehicle/location) is now always orphaned on dispatch
  channels rather than attached by recency alone
- Pursuit-mode location: incidents tagged as vehicle-pursuit/pursuit/chase use
  a 20km expanded radius with speed-sanity validation (distance ÷ elapsed time
  must be ≤ 8 km/min) — location change is a positive signal for moving incidents
- Non-pursuit incidents: strict 0.5km proximity unchanged — location change = reject
- Thin path two-tier: ≤30s → attach to most-recent regardless of candidate count
  (direct conversational reply); 30s–10min → single candidate required
2026-06-01 00:08:19 -04:00
Logan b77d2cce36 Fix over-correlation: geocoding precision, thin path ambiguity, skip_reason propagation
- Geocoding: reject GEOMETRIC_CENTER/APPROXIMATE results — vague location strings
  (regions, city centroids) were resolving to node-area coords and creating false
  proximity matches that merged unrelated incidents
- Thin path: on dispatch channels with multiple active incidents, skip attachment
  rather than guessing — "10-4" with 3 active incidents is genuinely ambiguous
- Short transcripts (≤5 words) now write skip_reason="transcript_too_short" to
  the call doc, matching garbage transcript behavior
- upload.py no-scenes fallback now checks skip_reason before running correlation —
  flagged calls (garbage, too short) no longer attach via thin path
- Update Server README to reflect current project purpose, goals, and pipeline
2026-05-31 23:51:46 -04:00
Logan f774be12b8 Fix correlation over-merge, thin-call hallucination, and geocoding accuracy
- Cap unit-continuity path at 20 min idle (unit_continuity_max_idle_minutes)
- Block time_fallback and unit-continuity matching on reassignment calls
- Expand reassignment detection to cover unit-initiated self-reassignment
- Skip GPT extraction entirely for transcripts ≤5 words (prevents hallucinated tags/units)
- Reduce geocode_max_km from 75 to 40 to reject far-out-of-area results
- Include county in geocoding query for tighter jurisdiction anchoring
2026-05-26 02:20:15 -04:00
Logan c5932165d8 Bug for new nodes 2026-05-25 16:29:20 -04:00
Logan 84ab72442f Correlator bugfix 2026-05-25 15:57:59 -04:00
Logan adf10244b4 Bug hunting for correlator 2026-05-25 15:41:43 -04:00
Logan 7d6e97fd4a fix: improve geocoding specificity and increase distance threshold for repeater systems
geocode_max_km: 25 → 75 km. The node is a physical receiver, not the system boundary;
digital repeaters extend coverage well beyond 25km (North White Plains at 35.5km from
the Yorktown node is a legitimate Westchester County location).

Query now fully qualified: "High Street" → "High Street, Yorktown, New York".
Added _get_node_state() which reverse-geocodes the node position once (cached) using
Google Maps to get the state name, appended alongside the municipality.
Generic street names (High Street, Main Street) no longer resolve to wrong-country results.
2026-05-25 14:49:02 -04:00
Logan 0279a82b10 feat: replace Nominatim geocoding with Google Maps API; add TOC map improvements
Switch geocoding from Nominatim to Google Maps Geocoding API for accurate
local place name resolution (bounds-biased, with 25km distance rejection guard).
Remove the now-unused _get_node_place reverse-geocoder and _node_place_cache.

Map page (TOC improvements):
- Weather radar tiles auto-refresh every 5 minutes via radarEpoch key cycling
- Google Maps traffic overlay added to LayersControl
- Live 24h clock overlay at bottom-left for situational awareness
- Incident sidebar cards now show age (time since dispatch) and unit count
2026-05-25 13:27:19 -04:00
Logan 0db09d6bf7 fix: reject geocode results outside node jurisdiction
Nominatim's viewbox is advisory (bounded=0), so ambiguous place names like
"Pinebrook" can resolve to locations 30-40km away in the wrong town. Added
a post-geocode distance gate: results farther than geocode_max_km (default
25km) from the node are discarded with a warning log rather than written to
the incident.

Also logs distance on successful geocodes for easier audit.

New config setting: geocode_max_km (float, default 25.0)
2026-05-25 13:09:10 -04:00
Logan 4b7d9dd49a feat: enrich correlation debug with fit_signal and orphan breakdown
_call_fits_incident now returns (bool, signal_str) so each correlation
decision records exactly what evidence fired: unit_overlap, vehicle_overlap,
location_proximity, time_fallback, tactical_default, or the corresponding
false-return variants (unit_loc_conflict, content_divergence, etc.).

- corr_fit_signal and corr_matched_units written to call docs for
  fast/single and fast/disambig paths
- Admin debug endpoint exposes the new fields in calls_detail
- Orphan section adds orphans_by_talkgroup summary (count, no-type count,
  sweep-exhausted count per TGID) and raises orphan limit 100 → 250
- Admin page shows corr_path and fit_signal distribution panels above raw
  JSON; time_fallback highlighted in yellow as a diagnostic marker

No correlation logic changed — diagnostic data only.
2026-05-25 12:54:34 -04:00
Logan 7dd090e8b2 fix: raise garbage-transcript threshold to avoid false positives on plate reads
Phonetic run threshold 5 → 12: a plate spellout ("Foxtrot Alpha Uniform Lima
Kilo...") produces 6–8 consecutive phonetic words, triggering false positives
and blocking intelligence extraction on legitimate calls. 12 is safely above
any real spellout (~8 max) while still catching the full-alphabet hallucination
(26 words). Also writes skip_reason="garbage_transcript" to the call doc and
surfaces it in the admin correlation debug endpoint.
2026-05-25 03:31:43 -04:00
Logan 92c9d8effc fix: garbage transcript detection, county geocoding, dispatch channel detection
- intelligence.py: detect Whisper phonetic-alphabet hallucinations before
  sending to GPT; skip extraction entirely to prevent fake units/tags
  corrupting correlation
- intelligence.py: upgrade node reverse-geocode from zoom=5 (state) to
  zoom=10 (county) and include county in address queries so common street
  names (e.g. "East Main Street") resolve to the correct county
- incident_correlator.py: add "patched" and "primary" to dispatch channel
  regex so patched trunking channels are treated as shared backbones
- incident_correlator.py: add 20-min idle gate for tactical channel default
  so a reused frequency can't absorb a new unrelated incident
2026-05-24 01:30:40 -04:00
Logan 1071bcd3e8 fix: map overlay clicks, layer overlap, fan spacing, geocoding radius
- Move incident panel to left side (was topright, conflicting with LayersControl)
- Move legend to bottom-right, raise auto-fit button to clear it
- Tighten fan card step 7→5px for closer grouping
- Geocoding: remove bounded=1 hard clip, widen bias radius 0.1°→0.5° (~55km)
  so addresses like "34 Carlton Drive" resolve outside the node's immediate area
2026-05-24 00:20:11 -04:00
Logan 6397e24035 Correlation updates 2026-05-23 22:55:50 -04:00
Logan 9d73fc52fa STT bugfix 2026-05-17 19:37:38 -04:00
Logan 97ed691cd2 correlation upgrades 2026-05-17 19:05:52 -04:00
Logan 8b660d8e10 feat: incident correlation overhaul, signal-based auto-resolve, token fixes
Correlator
- Raise fast-path idle gate 30 → 90 min (tg_fast_path_idle_minutes)
- Fix disambiguate always-commits bug: run _call_fits_incident on winner
  before committing; fall through to new-incident creation if it fails
- Add unit-continuity path (path 1.5): matches all_active by shared unit
  IDs with a reassignment guard, bridges calls past the idle gate
- Add tag-based incident_type inference (_TAG_TYPE_HINTS) as GPT fallback,
  rescuing tagged calls that would have been dropped (616 observed orphans)
- Add master/child incident model: _create_master_incident, _demote_to_child,
  _add_child_to_master; new incidents stamped incident_type="master"
- Add cross-system parent detection (_find_cross_system_parent): two-signal
  scoring (road overlap=0.4, embedding≥0.78=0.3, proximity=0.3, threshold=0.5)
  wired into create-if-new path; creates master shell on first cross-system match
- Add maybe_resolve_parent: auto-resolves master when all children close;
  called from upload pipeline (LLM closure) and summarizer stale sweep
- Add signal-based auto-resolve via units_active/units_cleared tracking:
  GPT now extracts cleared_units per scene; _update_incident moves units
  between active/cleared lists and resolves the incident when active empties;
  stored on call doc for re-correlation sweep reuse
- Add _create_incident initialization of units_active/units_cleared fields

Re-correlation sweep
- Add corr_sweep_count + MAX_SWEEP_ATTEMPTS=3: orphans get 3 attempts
  then are tombstoned as corr_path="unlinked", ending the re-sweep loop
  (previously hammering each orphan 29-31 times per shift)

Intelligence extraction
- Add cleared_units to GPT prompt schema and rules
- Extract and propagate cleared_units per scene; merge across scenes;
  store on call doc for re-correlation sweep

Token management
- Fix token release bug: remove release_token call on discord_connected=False
  in MQTT checkin (transient Discord drops were orphaning bots mid-shift)
- Add PUT /tokens/{id}/prefer/{system_id} endpoint: lock a bot token to a
  system; pass _none as system_id to clear; stored bidirectionally on both
  token and system documents
- discord_join handler resolves preferred_token_id from system doc and passes
  system_name in MQTT payload
2026-05-10 19:49:05 -04:00
Logan 7e1b01a275 Updates to reduce firestore calls to try and stay in free tier
### Firestore read reductions

**1. `doc_get_cached()` in `firestore.py` — new 5-min TTL cache**
One place, benefits everything. System and node config documents almost never change during a monitoring session.

**2. System doc: 4 reads → 1 per call**
| Before | After |
|---|---|
| `upload.py` — `doc_get("systems")` for ai_flags | `doc_get_cached` |
| `transcription.py` — `get_vocabulary()` → `doc_get("systems")` | cache hit |
| `intelligence.py` — `get_vocabulary()` → `doc_get("systems")` | cache hit |
| `intelligence.py` — `doc_get("systems")` again for ten_codes | eliminated (reads same cached doc) |

**3. Node doc: cached in `_on_call_start` and `intelligence.py`**
The node is read every call event to get `assigned_system_id` and lat/lon for geocoding. Both now use the cache — node assignments and positions essentially never change at runtime.

**4. Node sweeper: 30s → 90s interval**
The sweeper was doing a full node collection scan 3× more often than necessary — the offline threshold is already 90s. Cuts sweeper reads by 66%.

**5. Vocabulary induction: scans all-time calls → last 7 days**
Previously fetched every ended call for a system (could be thousands). Now scoped to the last 7 days.

> **Note:** The vocabulary induction query `(system_id == X, ended_at >= cutoff)` needs a Firestore
> composite index on `(system_id ASC, ended_at ASC)`. When the induction loop first fires it will log
> an error with a Firebase Console link to create it in one click.
2026-05-04 02:05:00 -04:00
Logan 97f4286810 Add debugging 2026-05-04 01:46:56 -04:00
Logan e704df1a62 # app/internal/incident_correlator.py
- *`correlate_call`* — added units and vehicles optional params; when provided (per-scene from intelligence extraction), they take priority over the merged call-document values, preventing multi-scene unit contamination
- *Cross-TGID correlation path (2.5)* — *new path between location and slow paths*: when a call shares 2+ unit IDs with a recent same-system, same-type incident AND embedding similarity ≥ 0.85, it links them — catches multi-talkgroup pursuits like the bicycle search that split across dispatch/tactical/geographic channels
# `app/internal/intelligence.py`
- *`reassignment` field* — added to the GPT-4o-mini prompt schema and rules; `true` when dispatch is actively pulling a unit to a new, different call (not a status update or en route acknowledgement); returned in every processed scene dict
- *Tag location rule* — added explicit instruction to the prompt: tags must describe what happened, not where; place names, road names, and talkgroup names are explicitly forbidden as tags
# `app/routers/upload.py`
- Both scene correlation call sites (`_run_extraction_pipeline` and `_run_intelligence_pipeline`) now pass `units=corr_units` where `corr_units = [] if scene.get("reassignment") else scene.get("units") `— suppresses unit overlap matching when a unit is being reassigned to a new call, preventing chaining into their previous incident
- Both sites also pass `vehicles=scene.get("vehicles")` (per-scene vehicles, from the multi-scene units fix)
# `app/config.py`
- `embedding_cross_tg_threshold: float = 0.85` — threshold for the new cross-TGID path
2026-05-04 01:33:03 -04:00
Logan f6897566f8 Fix tags, titles, and hallucinations 2026-05-04 01:13:18 -04:00
Logan f8a9cda27e update firestore to FieldFilter 2026-04-27 00:54:35 -04:00
Logan c959437059 Implement Admin UI to disable AI components 2026-04-27 00:37:51 -04:00
Logan 92c8351864 Correlation updates 2026-04-26 11:01:32 -04:00
Logan 317f9d2a9d Updates to intel and correlation 2026-04-23 01:26:41 -04:00
Logan 65839a3191 Implement recorrelation logic 2026-04-21 22:19:57 -04:00
Logan 338b946ba3 Start to learn vocab from talkgroups to improve accuracy of STT 2026-04-21 22:17:30 -04:00
Logan 6612e4b683 Big updates 2026-04-21 01:51:23 -04:00
Logan 788afca339 Update geocoding intel 2026-04-19 23:27:51 -04:00
Logan ba43796c51 Updates, big updates
incident_correlator.py — full rewrite: always runs on every call, fetches all active incidents cross-type, fast path collects all talkgroup matches and disambiguates by unit/vehicle overlap → location proximity → embedding, new location proximity path, slow path requires location corroboration, "Auto:" stripped from titles, "auto-generated" tag added, units/vehicles now accumulated on update
intelligence.py — resolved field in GPT schema, returned as 5th value
upload.py — both pipelines unpack 5-tuple, always call correlate, auto-resolve on resolved=True
summarizer.py — stale sweep runs each tick, resolves incidents idle for 90+ minutes
config.py — correlation_window_hours=2, embedding_similarity_threshold=0.93, location_proximity_km=0.5, incident_auto_resolve_minutes=90
2026-04-19 22:53:53 -04:00
Logan 303c5b13cf big ui and intel updates 2026-04-19 16:48:55 -04:00
Logan 03212fca51 Move to GPT for API consistency 2026-04-19 08:18:55 -04:00
Logan 1e3d691dbd Intel update 2026-04-19 08:00:09 -04:00
Logan 10aabf4fb2 Change models 2026-04-13 01:43:10 -04:00
Logan 616c06f09c stt updates and intelligence updates 2026-04-13 00:01:19 -04:00
Logan 7b6fd640d9 Update intelligence 2026-04-12 23:33:44 -04:00
Logan 757bfe82e0 change model to whisper 2026-04-12 22:36:21 -04:00
Logan 357553f1ea Issue Fix
Upload 404 warning	doc_set(merge=True) in upload.py — creates doc if missing
MQTT call_end 404 error	doc_set(merge=True) in mqtt_handler.py — same root cause
Transcription 404 (saving transcript to nonexistent doc)	doc_set(merge=True) in transcription.py
Transcription ADC credentials error	Explicit service_account.Credentials from gcp-key.json in _sync_transcribe — same pattern as storage.py
2026-04-12 22:04:11 -04:00
Logan 030dd2d787 File Change
app/internal/storage.py	Replaced make_public() + public_url with a v2 signed URL (1-year expiry, no public bucket needed)
app/main.py	Releases all in-use tokens at startup — tokens from previous sessions are cleared automatically
app/routers/tokens.py	Added POST /tokens/flush to force-release orphaned tokens on demand
2026-04-11 21:16:14 -04:00
Logan 2a690ec696 Issue 1 — Discord Audio (PulseAudio)
docker-compose.yml: Added a pulse_socket named volume mounted at /run/pulse in both op25 and edge-node. Also set PULSE_SERVER=unix:/run/pulse/native in edge-node so libpulse (and ffmpeg's pulse input) finds the right socket.

discord_radio.py: Removed _icecast_url and changed _play_stream() to use -f pulse -i default.monitor. This reads directly from the PulseAudio sink monitor — zero buffer delay. The PULSE_SERVER env var is inherited by the ffmpeg subprocess.

Note: default.monitor captures whatever audio is playing on the default sink. If OP25 uses a named virtual sink, you may need to replace default.monitor with <sink_name>.monitor (run pactl list sinks short inside the op25 container to find the name).

Issue 2 — No audio URL / GCS credentials

storage.py: storage.Client() was using ADC but ADC isn't configured in the container. Now uses storage.Client.from_service_account_json(settings.gcp_credentials_path) when GCP_CREDENTIALS_PATH is set — same credential file Firebase already loads.

You also need to mount the key file into the server container in docker-compose.yml:

c2-core:
  volumes:
    - ./gcp-key.json:/app/gcp-key.json:ro
And set GCS_BUCKET=your-bucket-name in .env.

Issue 3 — Token orphaning

mqtt_manager.py: Every checkin now includes "discord_connected": radio_bot.is_connected.

mqtt_handler.py: On checkin, if discord_connected is explicitly False, calls release_token(node_id). Only fires on explicit false (missing field = unknown = no action).

node_sweeper.py: When a node is swept to offline, its token is released too. This covers the case where the node stops checking in entirely (crash/power loss).
2026-04-11 20:31:07 -04:00
Logan d07db4dfc2 MQTT and nodes update 2026-04-11 20:01:56 -04:00
Logan 3b3a136d04 Massive update 2026-04-11 13:44:08 -04:00
Logan 636a847ee1 changes 2026-04-06 00:22:03 -04:00
Logan 2f0597c81b Initial commit — DRB server stack
Includes c2-core (FastAPI/MQTT/Firestore), discord-bot (slash commands),
frontend (Next.js admin UI), and mosquitto config.
2026-04-05 19:01:39 -04:00