Commit Graph

59 Commits

Author SHA1 Message Date
Logan b77d2cce36 Fix over-correlation: geocoding precision, thin path ambiguity, skip_reason propagation
- Geocoding: reject GEOMETRIC_CENTER/APPROXIMATE results — vague location strings
  (regions, city centroids) were resolving to node-area coords and creating false
  proximity matches that merged unrelated incidents
- Thin path: on dispatch channels with multiple active incidents, skip attachment
  rather than guessing — "10-4" with 3 active incidents is genuinely ambiguous
- Short transcripts (≤5 words) now write skip_reason="transcript_too_short" to
  the call doc, matching garbage transcript behavior
- upload.py no-scenes fallback now checks skip_reason before running correlation —
  flagged calls (garbage, too short) no longer attach via thin path
- Update Server README to reflect current project purpose, goals, and pipeline
2026-05-31 23:51:46 -04:00
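The geocoding precision check in the commit above could look roughly like the sketch below. The Google Geocoding API reports a `geometry.location_type` per result; the helper name `pick_precise_result` and the choice to also reject results with no `location_type` are assumptions for illustration, not the project's actual code.

```python
from typing import Optional

# location_type values that describe an area rather than a point.
# These are the two values the commit message says are rejected.
VAGUE_LOCATION_TYPES = {"GEOMETRIC_CENTER", "APPROXIMATE"}


def pick_precise_result(results: list) -> Optional[dict]:
    """Return the first geocoding result that is not a vague area match.

    Hypothetical helper: `results` is the "results" array of a Geocoding
    API response. Results without a location_type are skipped as well
    (an assumption, chosen to fail closed).
    """
    for result in results:
        location_type = result.get("geometry", {}).get("location_type")
        if location_type and location_type not in VAGUE_LOCATION_TYPES:
            return result
    return None
```

Returning `None` instead of the best vague match means a region-level string simply leaves the incident without coordinates, which avoids the false proximity matches described above.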
Logan f774be12b8 Fix correlation over-merge, thin-call hallucination, and geocoding accuracy
- Cap unit-continuity path at 20 min idle (unit_continuity_max_idle_minutes)
- Block time_fallback and unit-continuity matching on reassignment calls
- Expand reassignment detection to cover unit-initiated self-reassignment
- Skip GPT extraction entirely for transcripts ≤5 words (prevents hallucinated tags/units)
- Reduce geocode_max_km from 75 to 40 to reject far-out-of-area results
- Include county in geocoding query for tighter jurisdiction anchoring
2026-05-26 02:20:15 -04:00
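A minimal sketch of the short-transcript gate from the commit above, assuming words are counted by whitespace splitting. The function and constant names are hypothetical; only the 5-word limit comes from the commit message.

```python
# Upper bound (inclusive) on word count for a transcript to be treated
# as too short for GPT extraction. The value 5 is from the commit message.
MAX_SHORT_TRANSCRIPT_WORDS = 5


def is_too_short(transcript: str, max_words: int = MAX_SHORT_TRANSCRIPT_WORDS) -> bool:
    """Return True when the transcript has max_words words or fewer.

    Words are split on whitespace (an assumption about how the real
    pipeline counts them). An empty transcript counts as too short.
    """
    return len(transcript.split()) <= max_words
```

For example, `is_too_short("10-4")` is true, so a bare acknowledgement would never reach the extraction prompt.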
Logan 5eed4e08ce Implement delete node function 2026-05-25 20:20:50 -04:00
Logan c5932165d8 Bug for new nodes 2026-05-25 16:29:20 -04:00
Logan 84ab72442f Correlator bugfix 2026-05-25 15:57:59 -04:00
Logan adf10244b4 Bug hunting for correlator 2026-05-25 15:41:43 -04:00
Logan 7d6e97fd4a fix: improve geocoding specificity and increase distance threshold for repeater systems
geocode_max_km: 25 → 75 km. The node is a physical receiver, not the system boundary;
digital repeaters extend coverage well beyond 25km (North White Plains at 35.5km from
the Yorktown node is a legitimate Westchester County location).

Query now fully qualified: "High Street" → "High Street, Yorktown, New York".
Added _get_node_state() which reverse-geocodes the node position once (cached) using
Google Maps to get the state name, appended alongside the municipality.
Generic street names (High Street, Main Street) no longer resolve to wrong-country results.
2026-05-25 14:49:02 -04:00
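The query qualification described in the commit above can be sketched as a small string builder. `build_geocode_query` is a hypothetical name, and the real code obtains the state from `_get_node_state()`; here the municipality and state are simply passed in.

```python
from typing import Optional


def build_geocode_query(location: str,
                        municipality: Optional[str],
                        state: Optional[str]) -> str:
    """Join a raw location with its municipality and state.

    Missing or blank parts are skipped, so the query degrades to the
    bare location when nothing is known about the node's area.
    """
    parts = [location, municipality, state]
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

With the values from the commit message, `build_geocode_query("High Street", "Yorktown", "New York")` produces `"High Street, Yorktown, New York"`.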
Logan ef8e0d1bfa revert: remove leaflet.gridlayer.googlemutant — incompatible with Next.js 15 bundler
The package consistently throws 'L.GridLayer.GoogleMutant is not a constructor'
due to L-instance conflicts in the webpack bundle, despite multiple workaround
attempts. Removed package, transpilePackages entry, type stub, env var, and all
related component code. Traffic overlay dropped; geocoding (backend) unaffected.
2026-05-25 14:19:21 -04:00
Logan 0279a82b10 feat: replace Nominatim geocoding with Google Maps API; add TOC map improvements
Switch geocoding from Nominatim to Google Maps Geocoding API for accurate
local place name resolution (bounds-biased, with 25km distance rejection guard).
Remove the now-unused _get_node_place reverse-geocoder and _node_place_cache.

Map page (TOC improvements):
- Weather radar tiles auto-refresh every 5 minutes via radarEpoch key cycling
- Google Maps traffic overlay added to LayersControl
- Live 24h clock overlay at bottom-left for situational awareness
- Incident sidebar cards now show age (time since dispatch) and unit count
2026-05-25 13:27:19 -04:00
Logan 0db09d6bf7 fix: reject geocode results outside node jurisdiction
Nominatim's viewbox is advisory (bounded=0), so ambiguous place names like
"Pinebrook" can resolve to locations 30-40km away in the wrong town. Added
a post-geocode distance gate: results farther than geocode_max_km (default
25km) from the node are discarded with a warning log rather than written to
the incident.

Also logs distance on successful geocodes for easier audit.

New config setting: geocode_max_km (float, default 25.0)
2026-05-25 13:09:10 -04:00
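The post-geocode distance gate above can be illustrated with a great-circle distance check. This is a sketch under stated assumptions: the haversine formula and the names `haversine_km` and `accept_geocode` are illustrative, and the commit does not say how the project computes distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accept_geocode(node: tuple, result: tuple, geocode_max_km: float = 25.0) -> bool:
    """Keep a geocode result only if it lies within geocode_max_km of the node.

    `node` and `result` are (lat, lon) tuples. The 25.0 default mirrors the
    config setting introduced in this commit.
    """
    return haversine_km(node[0], node[1], result[0], result[1]) <= geocode_max_km
```

A caller would log a warning and drop the coordinates when `accept_geocode` returns false, rather than writing them to the incident.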
Logan 4b7d9dd49a feat: enrich correlation debug with fit_signal and orphan breakdown
_call_fits_incident now returns (bool, signal_str) so each correlation
decision records exactly what evidence fired: unit_overlap, vehicle_overlap,
location_proximity, time_fallback, tactical_default, or the corresponding
false-return variants (unit_loc_conflict, content_divergence, etc.).

- corr_fit_signal and corr_matched_units written to call docs for
  fast/single and fast/disambig paths
- Admin debug endpoint exposes the new fields in calls_detail
- Orphan section adds orphans_by_talkgroup summary (count, no-type count,
  sweep-exhausted count per TGID) and raises orphan limit 100 → 250
- Admin page shows corr_path and fit_signal distribution panels above raw
  JSON; time_fallback highlighted in yellow as a diagnostic marker

No correlation logic changed — diagnostic data only.
2026-05-25 12:54:34 -04:00
Logan 7dd090e8b2 fix: raise garbage-transcript threshold to avoid false positives on plate reads
Phonetic run threshold 5 → 12: a plate spellout ("Foxtrot Alpha Uniform Lima
Kilo...") produces 6–8 consecutive phonetic words, triggering false positives
and blocking intelligence extraction on legitimate calls. 12 is safely above
any real spellout (~8 max) while still catching the full-alphabet hallucination
(26 words). Also writes skip_reason="garbage_transcript" to the call doc and
surfaces it in the admin correlation debug endpoint.
2026-05-25 03:31:43 -04:00
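One way to implement the phonetic-run check described above is to track the longest streak of consecutive phonetic-alphabet words. The sketch below is an assumption about the approach, not the code in `intelligence.py`: the word list, the punctuation stripping, and the `>=` comparison against the threshold are all illustrative choices.

```python
# NATO phonetic alphabet, with common spelling variants.
PHONETIC_WORDS = {
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
    "india", "juliett", "juliet", "kilo", "lima", "mike", "november", "oscar",
    "papa", "quebec", "romeo", "sierra", "tango", "uniform", "victor",
    "whiskey", "xray", "x-ray", "yankee", "zulu",
}


def longest_phonetic_run(transcript: str) -> int:
    """Length of the longest run of consecutive phonetic-alphabet words."""
    longest = current = 0
    for raw in transcript.lower().split():
        token = raw.strip(".,;:!?\"'")
        if token in PHONETIC_WORDS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # any other word (or digit) breaks the run
    return longest


def is_garbage_transcript(transcript: str, threshold: int = 12) -> bool:
    """Flag transcripts whose phonetic run reaches the threshold (12 per the commit)."""
    return longest_phonetic_run(transcript) >= threshold
```

A plate spellout stays well under the threshold, while a full-alphabet hallucination produces a run of 26 and is flagged.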
Logan 92c9d8effc fix: garbage transcript detection, county geocoding, dispatch channel detection
- intelligence.py: detect Whisper phonetic-alphabet hallucinations before
  sending to GPT; skip extraction entirely to prevent fake units/tags
  corrupting correlation
- intelligence.py: upgrade node reverse-geocode from zoom=5 (state) to
  zoom=10 (county) and include county in address queries so common street
  names (e.g. "East Main Street") resolve to the correct county
- incident_correlator.py: add "patched" and "primary" to dispatch channel
  regex so patched trunking channels are treated as shared backbones
- incident_correlator.py: add 20-min idle gate for tactical channel default
  so a reused frequency can't absorb a new unrelated incident
2026-05-24 01:30:40 -04:00
Logan 1071bcd3e8 fix: map overlay clicks, layer overlap, fan spacing, geocoding radius
- Move incident panel to left side (was topright, conflicting with LayersControl)
- Move legend to bottom-right, raise auto-fit button to clear it
- Tighten fan card step 7→5px for closer grouping
- Geocoding: remove bounded=1 hard clip, widen bias radius 0.1°→0.5° (~55km)
  so addresses like "34 Carlton Drive" resolve outside the node's immediate area
2026-05-24 00:20:11 -04:00
Logan 6397e24035 Correlation updates 2026-05-23 22:55:50 -04:00
Logan 5a18a66d77 fix ppm bug 2026-05-23 18:22:47 -04:00
Logan 35ce8e911e audio fixes attempt 2026-05-23 14:59:51 -04:00
Logan 9d73fc52fa STT bugfix 2026-05-17 19:37:38 -04:00
Logan 97ed691cd2 correlation upgrades 2026-05-17 19:05:52 -04:00
Logan bcc3d3406d add debug in admin 2026-05-17 18:42:42 -04:00
Logan 8b660d8e10 feat: incident correlation overhaul, signal-based auto-resolve, token fixes
Correlator
- Raise fast-path idle gate 30 → 90 min (tg_fast_path_idle_minutes)
- Fix disambiguate always-commits bug: run _call_fits_incident on winner
  before committing; fall through to new-incident creation if it fails
- Add unit-continuity path (path 1.5): matches all_active by shared unit
  IDs with a reassignment guard, bridges calls past the idle gate
- Add tag-based incident_type inference (_TAG_TYPE_HINTS) as GPT fallback,
  rescuing tagged calls that would have been dropped (616 observed orphans)
- Add master/child incident model: _create_master_incident, _demote_to_child,
  _add_child_to_master; new incidents stamped incident_type="master"
- Add cross-system parent detection (_find_cross_system_parent): two-signal
  scoring (road overlap=0.4, embedding≥0.78=0.3, proximity=0.3, threshold=0.5)
  wired into create-if-new path; creates master shell on first cross-system match
- Add maybe_resolve_parent: auto-resolves master when all children close;
  called from upload pipeline (LLM closure) and summarizer stale sweep
- Add signal-based auto-resolve via units_active/units_cleared tracking:
  GPT now extracts cleared_units per scene; _update_incident moves units
  between active/cleared lists and resolves the incident when active empties;
  stored on call doc for re-correlation sweep reuse
- Add _create_incident initialization of units_active/units_cleared fields
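
The cross-system scoring in the list above can be sketched as a weighted sum. The weights (0.4, 0.3, 0.3), the 0.78 embedding cutoff, and the 0.5 threshold come from the commit message; the function names and the `>=` comparison are assumptions.

```python
ROAD_OVERLAP_WEIGHT = 0.4
EMBEDDING_WEIGHT = 0.3
PROXIMITY_WEIGHT = 0.3
EMBEDDING_MIN_SIMILARITY = 0.78
MATCH_THRESHOLD = 0.5


def cross_system_score(road_overlap: bool,
                       embedding_similarity: float,
                       is_nearby: bool) -> float:
    """Sum the weights of the signals that fired for a candidate parent."""
    score = 0.0
    if road_overlap:
        score += ROAD_OVERLAP_WEIGHT
    if embedding_similarity >= EMBEDDING_MIN_SIMILARITY:
        score += EMBEDDING_WEIGHT
    if is_nearby:
        score += PROXIMITY_WEIGHT
    return score


def is_cross_system_match(road_overlap: bool,
                          embedding_similarity: float,
                          is_nearby: bool) -> bool:
    """True when the combined score reaches the threshold."""
    return cross_system_score(road_overlap, embedding_similarity, is_nearby) >= MATCH_THRESHOLD
```

With these weights no single signal reaches 0.5 on its own, while any pair does, which is consistent with the "two-signal" description.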

Re-correlation sweep
- Add corr_sweep_count + MAX_SWEEP_ATTEMPTS=3: orphans get 3 attempts
  then are tombstoned as corr_path="unlinked", ending the re-sweep loop
  (previously hammering each orphan 29-31 times per shift)
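
The sweep cap above could be tracked on the call document roughly as follows. `corr_sweep_count`, `MAX_SWEEP_ATTEMPTS`, and `corr_path="unlinked"` are from the commit message; `record_failed_sweep` and `should_sweep` are hypothetical helpers operating on plain dicts rather than Firestore documents.

```python
MAX_SWEEP_ATTEMPTS = 3


def record_failed_sweep(call_doc: dict) -> dict:
    """Return a copy of the call doc after one unsuccessful re-correlation attempt.

    Increments corr_sweep_count and tombstones the call as "unlinked"
    once the attempt limit is reached.
    """
    updated = dict(call_doc)
    updated["corr_sweep_count"] = updated.get("corr_sweep_count", 0) + 1
    if updated["corr_sweep_count"] >= MAX_SWEEP_ATTEMPTS:
        updated["corr_path"] = "unlinked"
    return updated


def should_sweep(call_doc: dict) -> bool:
    """Orphans are retried until they are tombstoned."""
    return call_doc.get("corr_path") != "unlinked"
```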

Intelligence extraction
- Add cleared_units to GPT prompt schema and rules
- Extract and propagate cleared_units per scene; merge across scenes;
  store on call doc for re-correlation sweep

Token management
- Fix token release bug: remove release_token call on discord_connected=False
  in MQTT checkin (transient Discord drops were orphaning bots mid-shift)
- Add PUT /tokens/{id}/prefer/{system_id} endpoint: lock a bot token to a
  system; pass _none as system_id to clear; stored bidirectionally on both
  token and system documents
- discord_join handler resolves preferred_token_id from system doc and passes
  system_name in MQTT payload
2026-05-10 19:49:05 -04:00
Logan 7e1b01a275 Updates to reduce Firestore calls to try to stay in the free tier
### Firestore read reductions

**1. `doc_get_cached()` in `firestore.py` — new 5-min TTL cache**
One place, benefits everything. System and node config documents almost never change during a monitoring session.

**2. System doc: 4 reads → 1 per call**
| Before | After |
|---|---|
| `upload.py` — `doc_get("systems")` for ai_flags | `doc_get_cached` |
| `transcription.py` — `get_vocabulary()` → `doc_get("systems")` | cache hit |
| `intelligence.py` — `get_vocabulary()` → `doc_get("systems")` | cache hit |
| `intelligence.py` — `doc_get("systems")` again for ten_codes | eliminated (reads same cached doc) |

**3. Node doc: cached in `_on_call_start` and `intelligence.py`**
The node is read every call event to get `assigned_system_id` and lat/lon for geocoding. Both now use the cache — node assignments and positions essentially never change at runtime.

**4. Node sweeper: 30s → 90s interval**
The sweeper was doing a full node collection scan 3× as often as necessary, since the offline threshold is already 90s. Cuts sweeper reads by about two-thirds.

**5. Vocabulary induction: scans all-time calls → last 7 days**
Previously fetched every ended call for a system (could be thousands). Now scoped to the last 7 days.

> **Note:** The vocabulary induction query `(system_id == X, ended_at >= cutoff)` needs a Firestore
> composite index on `(system_id ASC, ended_at ASC)`. When the induction loop first fires it will log
> an error with a Firebase Console link to create it in one click.
2026-05-04 02:05:00 -04:00
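The 5-minute TTL cache described in item 1 of the commit above can be sketched as a small in-memory wrapper. This is an illustration under assumptions: the class name `TTLCache`, the `loader` callback, and the injectable clock are hypothetical, and the real `doc_get_cached()` in `firestore.py` may be structured differently.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being loaded."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader() on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller would pass the document path as the key and a function that performs the actual Firestore read as the loader, so repeated reads of the same system or node document within five minutes cost one read.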
Logan 97f4286810 Add debugging 2026-05-04 01:46:56 -04:00
Logan e704df1a62 # app/internal/incident_correlator.py
- *`correlate_call`* — added units and vehicles optional params; when provided (per-scene from intelligence extraction), they take priority over the merged call-document values, preventing multi-scene unit contamination
- *Cross-TGID correlation path (2.5)* — *new path between location and slow paths*: when a call shares 2+ unit IDs with a recent same-system, same-type incident AND embedding similarity ≥ 0.85, it links them — catches multi-talkgroup pursuits like the bicycle search that split across dispatch/tactical/geographic channels
# `app/internal/intelligence.py`
- *`reassignment` field* — added to the GPT-4o-mini prompt schema and rules; `true` when dispatch is actively pulling a unit to a new, different call (not a status update or en route acknowledgement); returned in every processed scene dict
- *Tag location rule* — added explicit instruction to the prompt: tags must describe what happened, not where; place names, road names, and talkgroup names are explicitly forbidden as tags
# `app/routers/upload.py`
- Both scene correlation call sites (`_run_extraction_pipeline` and `_run_intelligence_pipeline`) now pass `units=corr_units`, where `corr_units = [] if scene.get("reassignment") else scene.get("units")`. This suppresses unit-overlap matching when a unit is being reassigned to a new call, preventing chaining into its previous incident
- Both sites also pass `vehicles=scene.get("vehicles")` (per-scene vehicles, from the multi-scene units fix)
# `app/config.py`
- `embedding_cross_tg_threshold: float = 0.85` — threshold for the new cross-TGID path
2026-05-04 01:33:03 -04:00
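The reassignment guard from `upload.py` in the commit above amounts to one conditional. The sketch below wraps it in a hypothetical helper, `units_for_correlation`, and adds an `or []` fallback for scenes with no `units` key, which is an assumption beyond what the commit states.

```python
def units_for_correlation(scene: dict) -> list:
    """Units to pass to the correlator for one extracted scene.

    Returns an empty list when the scene is flagged as a reassignment,
    so unit overlap cannot chain the call into the unit's previous incident.
    """
    if scene.get("reassignment"):
        return []
    return scene.get("units") or []
```

For a scene such as `{"units": ["E5"], "reassignment": True}` the helper returns `[]`, leaving only the other correlation signals in play.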
Logan f6897566f8 Fix tags, titles, and hallucinations 2026-05-04 01:13:18 -04:00
Logan 531ce64eeb Fix system AI flag bug 2026-04-27 00:58:05 -04:00
Logan f8a9cda27e update firestore to FieldFilter 2026-04-27 00:54:35 -04:00
Logan 640667c9f9 Implement per-system AI flags 2026-04-27 00:50:01 -04:00
Logan c959437059 Implement Admin UI to disable AI components 2026-04-27 00:37:51 -04:00
Logan 92c8351864 Correlation updates 2026-04-26 11:01:32 -04:00
Logan 317f9d2a9d Updates to intel and correlation 2026-04-23 01:26:41 -04:00
Logan e70e7c0be9 Use UV for pip 2026-04-21 22:36:01 -04:00
Logan 65839a3191 Implement recorrelation logic 2026-04-21 22:19:57 -04:00
Logan 338b946ba3 Start to learn vocab from talkgroups to improve accuracy of STT 2026-04-21 22:17:30 -04:00
Logan 6612e4b683 Big updates 2026-04-21 01:51:23 -04:00
Logan 788afca339 Update geocoding intel 2026-04-19 23:27:51 -04:00
Logan ba43796c51 Updates, big updates
- incident_correlator.py: full rewrite. Always runs on every call and fetches all active incidents cross-type. The fast path collects all talkgroup matches and disambiguates by unit/vehicle overlap → location proximity → embedding. New location proximity path; slow path requires location corroboration. "Auto:" stripped from titles, "auto-generated" tag added, units/vehicles now accumulated on update
- intelligence.py: resolved field in GPT schema, returned as 5th value
- upload.py: both pipelines unpack 5-tuple, always call correlate, auto-resolve on resolved=True
- summarizer.py: stale sweep runs each tick, resolves incidents idle for 90+ minutes
- config.py: correlation_window_hours=2, embedding_similarity_threshold=0.93, location_proximity_km=0.5, incident_auto_resolve_minutes=90
2026-04-19 22:53:53 -04:00
Logan 303c5b13cf big ui and intel updates 2026-04-19 16:48:55 -04:00
Logan 0df53df92e UI Updates 2026-04-19 15:22:29 -04:00
Logan 03212fca51 Move to GPT for API consistency 2026-04-19 08:18:55 -04:00
Logan 1e3d691dbd Intel update 2026-04-19 08:00:09 -04:00
Logan 2d606add75 Add new on-demand runs 2026-04-19 00:00:29 -04:00
Logan 10aabf4fb2 Change models 2026-04-13 01:43:10 -04:00
Logan 616c06f09c stt updates and intelligence updates 2026-04-13 00:01:19 -04:00
Logan 7b6fd640d9 Update intelligence 2026-04-12 23:33:44 -04:00
Logan 757bfe82e0 change model to whisper 2026-04-12 22:36:21 -04:00
Logan b29dcc1518 fix 2026-04-12 22:07:54 -04:00
Logan 357553f1ea Issue Fix
| Issue | Fix |
|---|---|
| Upload 404 warning | `doc_set(merge=True)` in `upload.py` (creates doc if missing) |
| MQTT call_end 404 error | `doc_set(merge=True)` in `mqtt_handler.py` (same root cause) |
| Transcription 404 (saving transcript to nonexistent doc) | `doc_set(merge=True)` in `transcription.py` |
| Transcription ADC credentials error | Explicit `service_account.Credentials` from `gcp-key.json` in `_sync_transcribe` (same pattern as `storage.py`) |
2026-04-12 22:04:11 -04:00
Logan 66316efa53 fix upload name 2026-04-12 21:58:08 -04:00
Logan 030dd2d787 File Change
| File | Change |
|---|---|
| `app/internal/storage.py` | Replaced `make_public()` + `public_url` with a v2 signed URL (1-year expiry, no public bucket needed) |
| `app/main.py` | Releases all in-use tokens at startup; tokens from previous sessions are cleared automatically |
| `app/routers/tokens.py` | Added `POST /tokens/flush` to force-release orphaned tokens on demand |
2026-04-11 21:16:14 -04:00