From 3d51db80d06bffd99534149d44df3bf446ab6b46 Mon Sep 17 00:00:00 2001
From: Logan <Logan@simplestepsolutions.com>
Date: Mon, 1 Jun 2026 01:17:49 -0400
Subject: [PATCH] Improve extraction accuracy with speaker role inference
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a SPEAKER ROLES section to the GPT-4o-mini prompt that teaches it to
distinguish dispatch voice (names a unit, then gives an assignment and
address) from unit voice (opens with its own callsign and a brief
status). The inferred roles are applied to location attribution (a
dispatch-provided address beats a unit position report) and to unit
extraction (dispatched units vs. acknowledging units; both are kept).
No extra API calls: this is purely prompt-level reasoning on the
existing transcript.
---
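Reviewer note (not part of the commit): the patch itself adds no code
path for role inference; the model does it from the prompt text. The
sketch below only illustrates the intended behavior, using the example
phrases from the prompt, and could serve as a starting point for
spot-checking extraction output. The regexes and the helper names
infer_role and pick_location are hypothetical and do not exist in
intelligence.py.

```python
import re

# Hypothetical callsign shapes taken from the prompt examples
# ("Unit 7", "Baker-1"); real callsigns will vary by department.
CALLSIGN = r"(?:unit\s+\d+|[a-z]+-\d+)"

# Dispatch voice: names a unit, then gives an assignment.
DISPATCH_RE = re.compile(rf"^{CALLSIGN},\s*(?:respond|go)\s+to\b", re.IGNORECASE)

# Unit voice: opens with its own callsign, then a brief status or ack.
UNIT_RE = re.compile(
    rf"^{CALLSIGN}[,\s]+(?:en route|on scene|copies|copy|10-\d+)\b",
    re.IGNORECASE,
)


def infer_role(transmission: str) -> str:
    """Return 'dispatch', 'unit', or 'unknown' for one transmission."""
    text = transmission.strip()
    if DISPATCH_RE.match(text):
        return "dispatch"
    if UNIT_RE.match(text):
        return "unit"
    return "unknown"


def pick_location(dispatch_address: str, unit_position: str) -> str:
    """Prefer the dispatch-provided address; fall back to the unit's
    reported position; return an empty string if neither is present."""
    return dispatch_address or unit_position or ""
```

For example, infer_role("Unit 7, respond to 123 Main") is classified as
dispatch and infer_role("Baker-1 on scene") as unit, and pick_location
returns the unit position only when the dispatch address is empty. A
regex heuristic like this is far cruder than the prompt-level reasoning
and is meant only to make the attribution rules concrete.
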
 README.md                                | 10 +++++++---
 drb-c2-core/app/internal/intelligence.py | 15 +++++++++++++--
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 39ed02a..deb8987 100644
--- a/README.md
+++ b/README.md
@@ -245,9 +245,13 @@ Edge node ──► audio upload    ──► GCS storage
                                         │
                                         ▼
                                [2] INTELLIGENCE EXTRACTION (GPT-4o-mini)
-                                   Scene detection, entity extraction:
-                                   tags, incident_type, location, units,
-                                   vehicles, severity, resolved flag
+                                   Scene detection — splits multi-incident recordings
+                                   Speaker role inference — dispatch vs. unit patterns
+                                   used to correctly attribute locations (dispatch-
+                                   provided address vs. unit position report) and
+                                   units (being dispatched vs. acknowledging)
+                                   Entity extraction: tags, incident_type, location,
+                                   units, vehicles, severity, resolved flag
                                    + geocoding (Google Maps)
                                    + embedding (text-embedding-3-small)
                                    → CallRecord.tags, .location, .units, etc.
diff --git a/drb-c2-core/app/internal/intelligence.py b/drb-c2-core/app/internal/intelligence.py
index b12dced..60c21b0 100644
--- a/drb-c2-core/app/internal/intelligence.py
+++ b/drb-c2-core/app/internal/intelligence.py
@@ -23,6 +23,17 @@ A busy dispatch channel sometimes captures back-to-back conversations about mult
 
 Always respond with the scenes array, even for a single scene.
 
+SPEAKER ROLES:
+P25 radio follows a predictable call-and-response pattern. Use it to correctly attribute entities — you do not have explicit speaker labels, but you can infer roles from conversational structure:
+- Dispatch voice: opens by naming a unit then giving an assignment ("Unit 7, respond to 123 Main..."), provides incident addresses, says "be advised" / "stand by", reads back unit status. Dispatch speaks TO units.
+- Unit voice: opens with the unit's own callsign or a brief status ("Unit 7 en route", "Baker-1 on scene", "Unit 7, 10-97"), acknowledges with "copy" / "10-4", requests info about their assignment. Units speak TO dispatch.
+
+Apply speaker inference to extraction:
+- A callsign at the start of a dispatch assignment ("Unit 7, go to...") — that unit is being dispatched. Include it in units.
+- A callsign that opens a short acknowledgment ("Unit 7 en route", "Baker-1 copies") — that is the speaker's own ID. Include it in units.
+- A location stated in a dispatch assignment is the incident address. Use it as location.
+- A location stated by a unit ("I'm at Route 202 and Main") is their current position — use it as location only when no dispatch-provided address is present in the scene.
+
 Response format — a JSON object with a "scenes" array. Each scene:
   segment_indices: list of 0-based indices into the numbered transmissions (or null if no segments)
   incident_type: one of "fire" | "ems" | "police" | "accident" | "other" | "unknown"
@@ -37,9 +48,9 @@ Response format — a JSON object with a "scenes" array. Each scene:
   transcript_corrected: corrected text for this scene's transmissions only, or null
 
 Rules:
-- location: prefer intersections > addresses > mile markers > route+town > route alone > town alone. Empty string if none.
+- location: prefer intersections > addresses > mile markers > route+town > route alone > town alone. Dispatch-provided addresses take priority over unit-reported positions. Empty string if none.
 - tags: describe WHAT happened, not WHERE. Specific, lowercase, hyphenated. Do not use location names, road names, talkgroup names, or place names as tags (wrong: "lower-macy's", "canvas-route-6", "route-202"; right: "suspect-search", "shoplifting", "vehicle-pursuit"). Do not repeat incident_type as a tag.
-- units: ONLY identifiers that appear verbatim in the transcript. If the word or number is not literally present in the text above, do not include it. Never infer or guess unit IDs.
+- units: ONLY identifiers that appear verbatim in the transcript. Use speaker role inference to distinguish units being dispatched from units acknowledging — both should be included. Never infer or guess unit IDs not present in the text.
 - Do not invent details not present in the transcript.
 - incident_type: let the talkgroup channel be your primary signal. Use "fire" ONLY if the talkgroup is clearly a fire/rescue channel OR the transcript explicitly describes active fire, smoke, flames, or structure fire activation. Police or EMS referencing a fire scene → use "police" or "ems". When uncertain, prefer "other" over "fire".
 - ten_codes: interpret radio codes using the department reference provided below. Do not guess codes not listed.