# app/internal/incident_correlator.py

- *`correlate_call`*: added `units` and `vehicles` optional params; when provided (per-scene from intelligence extraction), they take priority over the merged call-document values, preventing multi-scene unit contamination
- *Cross-TGID correlation path (2.5)*: *new path between location and slow paths*. When a call shares 2+ unit IDs with a recent same-system, same-type incident AND embedding similarity ≥ 0.85, it links them; this catches multi-talkgroup pursuits like the bicycle search that split across dispatch/tactical/geographic channels

# `app/internal/intelligence.py`

- *`reassignment` field*: added to the GPT-4o-mini prompt schema and rules; `true` when dispatch is actively pulling a unit to a new, different call (not a status update or en route acknowledgement); returned in every processed scene dict
- *Tag location rule*: added explicit instruction to the prompt: tags must describe what happened, not where; place names, road names, and talkgroup names are explicitly forbidden as tags

# `app/routers/upload.py`

- Both scene correlation call sites (`_run_extraction_pipeline` and `_run_intelligence_pipeline`) now pass `units=corr_units` where `corr_units = [] if scene.get("reassignment") else scene.get("units")`. This suppresses unit overlap matching when a unit is being reassigned to a new call, preventing chaining into their previous incident
- Both sites also pass `vehicles=scene.get("vehicles")` (per-scene vehicles, from the multi-scene units fix)

# `app/config.py`

- `embedding_cross_tg_threshold: float = 0.85`: threshold for the new cross-TGID path
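
The cross-TGID check and the reassignment suppression described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `incident_correlator.py`: the `Incident` dataclass, `effective_units`, `cross_tg_match`, and the use of cosine similarity are hypothetical stand-ins, and the "recent" time-window filter is assumed to happen in the caller. Only the 2-unit minimum, the same-system/same-type condition, the 0.85 threshold, and the `corr_units` expression come from the commit message.

```python
import math
from dataclasses import dataclass

# Mirrors embedding_cross_tg_threshold in app/config.py.
CROSS_TG_THRESHOLD = 0.85
# "2+ unit IDs" from the commit message.
MIN_SHARED_UNITS = 2


@dataclass
class Incident:
    """Hypothetical stand-in for a stored incident record."""
    system_id: str
    incident_type: str
    units: set[str]
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumed here, the real metric may differ."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def effective_units(scene: dict) -> list[str]:
    """Drop units when the scene is a reassignment, so overlap matching
    cannot chain the call into the unit's previous incident.
    The `or []` fallback is an addition for this sketch."""
    return [] if scene.get("reassignment") else (scene.get("units") or [])


def cross_tg_match(
    scene_units: list[str],
    system_id: str,
    incident_type: str,
    embedding: list[float],
    incident: Incident,
) -> bool:
    """True when the call should link to `incident` via the cross-TGID path.
    The caller is assumed to pass only recent incidents."""
    if incident.system_id != system_id or incident.incident_type != incident_type:
        return False
    shared = set(scene_units) & incident.units
    if len(shared) < MIN_SHARED_UNITS:
        return False
    return cosine_similarity(embedding, incident.embedding) >= CROSS_TG_THRESHOLD
```

Because a reassigned scene yields an empty unit list, it can never reach the 2-unit minimum, so the reassignment flag effectively disables this path for that scene.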
2026-05-04 01:33:03 -04:00
parent f6897566f8
commit e704df1a62
4 changed files with 58 additions and 3 deletions
@@ -31,16 +31,18 @@ Response format — a JSON object with a "scenes" array. Each scene:
  units: list of unit IDs or officer numbers explicitly mentioned
  severity: one of "minor" | "moderate" | "major" | "unknown"
  resolved: true if this scene explicitly signals incident closure, false otherwise
+  reassignment: true if dispatch is actively pulling a unit away from their current assignment to respond to a new, different call — e.g. "Baker, can you clear and respond to...", "Adam, break from that and go to...". False if the unit is simply reporting in, updating status, or continuing their current assignment.
  transcript_corrected: corrected text for this scene's transmissions only, or null

 Rules:
 - location: prefer intersections > addresses > mile markers > route+town > route alone > town alone. Empty string if none.
- tags: specific, lowercase, hyphenated. Do not repeat incident_type as a tag.
+- tags: describe WHAT happened, not WHERE. Specific, lowercase, hyphenated. Do not use location names, road names, talkgroup names, or place names as tags (wrong: "lower-macy's", "canvas-route-6", "route-202"; right: "suspect-search", "shoplifting", "vehicle-pursuit"). Do not repeat incident_type as a tag.
 - units: only identifiers explicitly mentioned, not inferred.
 - Do not invent details not present in the transcript.
 - incident_type: let the talkgroup channel be your primary signal. Use "fire" ONLY if the talkgroup is clearly a fire/rescue channel OR the transcript explicitly describes active fire, smoke, flames, or structure fire activation. Police or EMS referencing a fire scene → use "police" or "ems". When uncertain, prefer "other" over "fire".
 - ten_codes: interpret radio codes using the department reference provided below. Do not guess codes not listed.
 - resolved: true only when the scene explicitly signals "Code 4", "all clear", "10-42", "in custody", "patient transported", "fire out", "GOA", "negative contact", "scene clear".
+- reassignment: only true when a unit is explicitly being pulled to a completely new call or location. A unit going en route to their first dispatch is NOT a reassignment. Routine status updates, acknowledgements, and scene updates are NOT reassignments.
 - transcript_corrected: fix only clear STT/vocoder errors (e.g. "Several" → "10-4", misheard street names, garbled unit IDs). Keep all radio language as-is — do NOT decode codes into plain English. Return null if accurate.

 System: {system_id}
@@ -130,6 +132,7 @@ async def extract_scenes(
        units:              list[str]      = scene.get("units") or []
        severity:           str            = scene.get("severity") or "unknown"
        resolved:           bool           = bool(scene.get("resolved", False))
+        reassignment:       bool           = bool(scene.get("reassignment", False))
        transcript_corrected: Optional[str]= scene.get("transcript_corrected") or None
        segment_indices:    Optional[list] = scene.get("segment_indices")

@@ -160,6 +163,7 @@ async def extract_scenes(
            "units":                units,
            "severity":             severity,
            "resolved":             resolved,
+            "reassignment":         reassignment,
            "transcript_corrected": transcript_corrected,
            "segment_indices":      segment_indices,
            "embedding":            embedding,