Improve extraction accuracy with speaker role inference

Add a SPEAKER ROLES section to the GPT-4o-mini prompt that teaches the model to distinguish dispatch voice (names a unit, then gives an assignment and address) from unit voice (opens with its own callsign and a brief status). The inferred roles are applied to location attribution (a dispatch-provided address takes priority over a unit's position report) and to unit extraction (units being dispatched vs. units acknowledging; both are kept). No extra API calls: the reasoning happens entirely in the prompt, over the existing transcript.
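This commit does the speaker-role reasoning inside the prompt, with no new code. To make the call-and-response heuristic concrete, here is a minimal rule-based sketch in Python of the same pattern. Everything in it is hypothetical and not part of this commit: the regexes, the callsign shapes, and the names `classify` and `attribute_scene`. Real P25 traffic is far messier than these patterns allow, which is one reason to leave the inference to the model.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Hypothetical callsign shapes ("Unit 7", "Engine 12", "Baker-1"); real departments vary widely.
CALLSIGN = r"(?:Unit|Engine|Medic|Car)\s*\d+|[A-Z][a-z]+-\d+"

# Dispatch voice: callsign, comma, then an assignment that names the incident address.
DISPATCH_RE = re.compile(
    rf"^(?P<unit>{CALLSIGN}),\s*(?:respond|go|proceed)\s+to\s+(?P<addr>[^,.]+)",
    re.IGNORECASE,
)
# Unit voice: the speaker's own callsign followed by a short status or acknowledgment.
UNIT_RE = re.compile(
    rf"^(?P<unit>{CALLSIGN}),?\s+(?:en route|on scene|copies|copy|10-\d+)",
    re.IGNORECASE,
)
# A unit reporting where it currently is (a position, not the incident address).
POSITION_RE = re.compile(r"\b(?:I'm|I am|we're|we are) at (?P<pos>[^,.]+)", re.IGNORECASE)


@dataclass
class Attribution:
    role: str               # "dispatch", "unit", or "unknown"
    unit: Optional[str]     # callsign being dispatched, or the speaker's own callsign
    address: Optional[str]  # incident address, taken only from dispatch assignments


def classify(transmission: str) -> Attribution:
    text = transmission.strip()
    m = DISPATCH_RE.match(text)
    if m:
        return Attribution("dispatch", m.group("unit"), m.group("addr").strip())
    m = UNIT_RE.match(text)
    if m:
        return Attribution("unit", m.group("unit"), None)
    return Attribution("unknown", None, None)


def attribute_scene(transmissions: List[str]) -> Dict[str, Union[str, List[str]]]:
    units: List[str] = []
    dispatch_addresses: List[str] = []
    unit_positions: List[str] = []
    for t in transmissions:
        a = classify(t)
        if a.unit and a.unit not in units:
            units.append(a.unit)  # dispatched and acknowledging units are both kept
        if a.address:
            dispatch_addresses.append(a.address)
        else:
            p = POSITION_RE.search(t)
            if p:
                unit_positions.append(p.group("pos").strip())
    # A dispatch-provided address wins; a unit's reported position is only a fallback.
    if dispatch_addresses:
        location = dispatch_addresses[0]
    elif unit_positions:
        location = unit_positions[0]
    else:
        location = ""
    return {"units": units, "location": location}
```

For example, a scene of "Unit 7, respond to 123 Main Street, fall victim.", "Unit 7 en route." and "Baker-1 on scene, I'm at Route 202 and Main." yields both callsigns as units and the dispatch address as the location, mirroring the attribution rules the prompt describes.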
2026-06-01 01:17:49 -04:00
parent 683b05beb1
commit 3d51db80d0
2 changed files with 20 additions and 5 deletions
@@ -245,9 +245,13 @@ Edge node ──► audio upload    ──► GCS storage
                                        │
                                        ▼
                               [2] INTELLIGENCE EXTRACTION (GPT-4o-mini)
-                                   Scene detection, entity extraction:
-                                   tags, incident_type, location, units,
-                                   vehicles, severity, resolved flag
+                                   Scene detection — splits multi-incident recordings
+                                   Speaker role inference — dispatch vs. unit patterns
+                                   used to correctly attribute locations (dispatch-
+                                   provided address vs. unit position report) and
+                                   units (being dispatched vs. acknowledging)
+                                   Entity extraction: tags, incident_type, location,
+                                   units, vehicles, severity, resolved flag
                                   + geocoding (Google Maps)
                                   + embedding (text-embedding-3-small)
                                   → CallRecord.tags, .location, .units, etc.
@@ -23,6 +23,17 @@ A busy dispatch channel sometimes captures back-to-back conversations about mult
 Always respond with the scenes array, even for a single scene.
+SPEAKER ROLES:
+P25 radio follows a predictable call-and-response pattern. Use it to correctly attribute entities — you do not have explicit speaker labels, but you can infer roles from conversational structure:
+- Dispatch voice: opens by naming a unit then giving an assignment ("Unit 7, respond to 123 Main..."), provides incident addresses, says "be advised" / "stand by", reads back unit status. Dispatch speaks TO units.
+- Unit voice: opens with the unit's own callsign or a brief status ("Unit 7 en route", "Baker-1 on scene", "Unit 7, 10-97"), acknowledges with "copy" / "10-4", requests info about their assignment. Units speak TO dispatch.
+Apply speaker inference to extraction:
+- A callsign at the start of a dispatch assignment ("Unit 7, go to...") — that unit is being dispatched. Include it in units.
+- A callsign that opens a short acknowledgment ("Unit 7 en route", "Baker-1 copies") — that is the speaker's own ID. Include it in units.
+- A location stated in a dispatch assignment is the incident address. Use it as location.
+- A location stated by a unit ("I'm at Route 202 and Main") is their current position — use it as location only when no dispatch-provided address is present in the scene.
 Response format — a JSON object with a "scenes" array. Each scene:
  segment_indices: list of 0-based indices into the numbered transmissions (or null if no segments)
  incident_type: one of "fire" | "ems" | "police" | "accident" | "other" | "unknown"
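The response format in this hunk is strict enough to check on the client side. The following is a minimal sketch, assuming the reply arrives as a raw JSON string and that the caller knows how many numbered transmissions were sent. The function name `parse_scenes` and the choice to coerce an off-list `incident_type` to "unknown" are hypothetical, not taken from this commit.

```python
import json
from typing import Any, Dict, List

# Allowed values, copied from the prompt's response format.
INCIDENT_TYPES = {"fire", "ems", "police", "accident", "other", "unknown"}


def parse_scenes(raw: str, n_segments: int) -> List[Dict[str, Any]]:
    """Parse the model's JSON reply and sanity-check the fields the prompt defines."""
    payload = json.loads(raw)
    scenes = payload.get("scenes") if isinstance(payload, dict) else None
    # The prompt asks for a scenes array even when there is only one scene.
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("response must contain a non-empty 'scenes' array")
    for scene in scenes:
        if not isinstance(scene, dict):
            raise ValueError(f"scene must be an object: {scene!r}")
        idx = scene.get("segment_indices")
        if idx is not None:
            # 0-based indices into the numbered transmissions; bools are rejected explicitly.
            ok = isinstance(idx, list) and all(
                isinstance(i, int) and not isinstance(i, bool) and 0 <= i < n_segments
                for i in idx
            )
            if not ok:
                raise ValueError(f"invalid segment_indices: {idx!r}")
        # Hypothetical fallback policy: coerce anything off-list to "unknown".
        if scene.get("incident_type") not in INCIDENT_TYPES:
            scene["incident_type"] = "unknown"
    return scenes
```

Raising on bad indices (rather than silently dropping them) is a design choice: an out-of-range index usually means the model miscounted transmissions, and a caller may prefer to retry than to attribute text to the wrong scene.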
@@ -37,9 +48,9 @@ Response format — a JSON object with a "scenes" array. Each scene:
  transcript_corrected: corrected text for this scene's transmissions only, or null
 Rules:
- location: prefer intersections > addresses > mile markers > route+town > route alone > town alone. Empty string if none.
+- location: prefer intersections > addresses > mile markers > route+town > route alone > town alone. Dispatch-provided addresses take priority over unit-reported positions. Empty string if none.
 - tags: describe WHAT happened, not WHERE. Specific, lowercase, hyphenated. Do not use location names, road names, talkgroup names, or place names as tags (wrong: "lower-macy's", "canvas-route-6", "route-202"; right: "suspect-search", "shoplifting", "vehicle-pursuit"). Do not repeat incident_type as a tag.
- units: ONLY identifiers that appear verbatim in the transcript. If the word or number is not literally present in the text above, do not include it. Never infer or guess unit IDs.
+- units: ONLY identifiers that appear verbatim in the transcript. Use speaker role inference to distinguish units being dispatched from units acknowledging — both should be included. Never infer or guess unit IDs not present in the text.
 - Do not invent details not present in the transcript.
 - incident_type: let the talkgroup channel be your primary signal. Use "fire" ONLY if the talkgroup is clearly a fire/rescue channel OR the transcript explicitly describes active fire, smoke, flames, or structure fire activation. Police or EMS referencing a fire scene → use "police" or "ems". When uncertain, prefer "other" over "fire".
 - ten_codes: interpret radio codes using the department reference provided below. Do not guess codes not listed.
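The two rules changed in this hunk (location priority and verbatim-only units) can also be expressed as a deterministic post-check on the model's output. This is a minimal sketch under stated assumptions: `LocationCandidate`, `pick_location`, and `verbatim_units` are hypothetical names, and each candidate's `kind` label is assumed to come from some upstream step this commit does not include. It reads "dispatch-provided addresses take priority" as source outranking granularity, which matches the SPEAKER ROLES rule that a unit's position is used only when no dispatch address is present.

```python
import re
from dataclasses import dataclass
from typing import List

# Granularity order from the prompt's location rule, most specific first.
KIND_RANK = ["intersection", "address", "mile_marker", "route_town", "route", "town"]


@dataclass
class LocationCandidate:
    text: str
    kind: str            # one of KIND_RANK; assumed to come from an upstream step
    from_dispatch: bool  # True if stated in a dispatch assignment


def pick_location(candidates: List[LocationCandidate]) -> str:
    """Dispatch-provided beats unit-reported; ties broken by granularity."""
    if not candidates:
        return ""  # matches the prompt: empty string if none
    # False sorts before True, so any dispatch candidate outranks every unit candidate.
    best = min(candidates, key=lambda c: (not c.from_dispatch, KIND_RANK.index(c.kind)))
    return best.text


def verbatim_units(units: List[str], transcript: str) -> List[str]:
    """Keep only unit IDs that literally appear in the transcript as a whole token."""
    kept = []
    for u in units:
        # Lookarounds stop "Unit 7" from matching inside "Unit 70".
        if re.search(rf"(?<!\w){re.escape(u)}(?!\w)", transcript):
            kept.append(u)
    return kept
```

Usage: a unit-reported intersection loses to a dispatch-provided street address, and a model-suggested "Unit 9" is dropped when the transcript only mentions "Unit 7" and "Baker-1".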