b77d2cce36
- Geocoding: reject GEOMETRIC_CENTER/APPROXIMATE results — vague location strings (regions, city centroids) were resolving to node-area coords and creating false proximity matches that merged unrelated incidents - Thin path: on dispatch channels with multiple active incidents, skip attachment rather than guessing — "10-4" with 3 active incidents is genuinely ambiguous - Short transcripts (≤5 words) now write skip_reason="transcript_too_short" to the call doc, matching garbage transcript behavior - upload.py no-scenes fallback now checks skip_reason before running correlation — flagged calls (garbage, too short) no longer attach via thin path - Update Server README to reflect current project purpose, goals, and pipeline
524 lines
24 KiB
Python
524 lines
24 KiB
Python
"""
|
||
GPT-4o-mini intelligence extraction from call transcripts.
|
||
|
||
Sends the transcript to GPT-4o-mini with a structured prompt that detects
|
||
whether the recording contains one or multiple distinct scenes (back-to-back
|
||
dispatch conversations on a busy channel). Returns a list of scene dicts —
|
||
one per detected incident. Most calls produce a single scene.
|
||
|
||
Falls back gracefully if the API is unavailable or returns malformed output.
|
||
"""
|
||
import asyncio
|
||
import json
|
||
import math
|
||
import re
|
||
from typing import Optional
|
||
from app.internal.logger import logger
|
||
from app.internal import firestore as fstore
|
||
|
||
_PROMPT_TEMPLATE = """You are analyzing a P25 public safety radio recording. The audio was transcribed by Whisper through a digital radio vocoder, which introduces errors. Each numbered transmission is a separate PTT press from a different radio.
|
||
|
||
SCENE DETECTION:
|
||
A busy dispatch channel sometimes captures back-to-back conversations about multiple concurrent incidents in a single recording. Detect whether this recording contains ONE scene (all transmissions relate to a single event) or MULTIPLE scenes (clearly distinct dispatch conversations with different units being assigned, different locations, different event types). Assign short status transmissions (10-4, en route, acknowledgements) with no clear scene context to the most recent scene before them in the list.
|
||
|
||
Always respond with the scenes array, even for a single scene.
|
||
|
||
Response format — a JSON object with a "scenes" array. Each scene:
|
||
segment_indices: list of 0-based indices into the numbered transmissions (or null if no segments)
|
||
incident_type: one of "fire" | "ems" | "police" | "accident" | "other" | "unknown"
|
||
tags: list of specific descriptive tags, max 6, e.g. "two-car mva", "working fire", "shots-fired"
|
||
location: most specific location string found, or empty string
|
||
vehicles: list of vehicle descriptions mentioned
|
||
units: list of unit IDs or officer numbers explicitly mentioned
|
||
cleared_units: list of unit IDs that explicitly signal back-in-service or available in this recording
|
||
severity: one of "minor" | "moderate" | "major" | "unknown"
|
||
resolved: true if this scene explicitly signals incident closure, false otherwise
|
||
reassignment: true if a unit is breaking from their current scene to respond to a completely different call — whether dispatch-initiated ("Baker, can you clear and respond to...", "Adam, break from that and go to...") OR unit-initiated ("Show me headed to the vehicle complaint", "Can you show me to that call", a unit going 10-8 and self-requesting a new assignment). False if the unit is reporting in on their current scene, giving a status update, or requesting information about their existing call.
|
||
transcript_corrected: corrected text for this scene's transmissions only, or null
|
||
|
||
Rules:
|
||
- location: prefer intersections > addresses > mile markers > route+town > route alone > town alone. Empty string if none.
|
||
- tags: describe WHAT happened, not WHERE. Specific, lowercase, hyphenated. Do not use location names, road names, talkgroup names, or place names as tags (wrong: "lower-macy's", "canvas-route-6", "route-202"; right: "suspect-search", "shoplifting", "vehicle-pursuit"). Do not repeat incident_type as a tag.
|
||
- units: ONLY identifiers that appear verbatim in the transcript. If the word or number is not literally present in the text above, do not include it. Never infer or guess unit IDs.
|
||
- Do not invent details not present in the transcript.
|
||
- incident_type: let the talkgroup channel be your primary signal. Use "fire" ONLY if the talkgroup is clearly a fire/rescue channel OR the transcript explicitly describes active fire, smoke, flames, or structure fire activation. Police or EMS referencing a fire scene → use "police" or "ems". When uncertain, prefer "other" over "fire".
|
||
- ten_codes: interpret radio codes using the department reference provided below. Do not guess codes not listed.
|
||
- resolved: true only when the scene explicitly signals "Code 4", "all clear", "10-42", "in custody", "patient transported", "fire out", "GOA", "negative contact", "scene clear".
|
||
- cleared_units: only include units that explicitly stated their own back-in-service status in this recording (e.g. "Unit 7, 10-8", "Baker-1 available", "E-14 back in service", or the department ten-code for available/back-in-service listed above). Silence or absence of a unit is NOT clearance. A scene-wide Code 4 belongs in resolved=true, not here — cleared_units is for individual unit availability signals only.
|
||
- reassignment: only true when a unit is explicitly being pulled to a completely new call or location. A unit going en route to their first dispatch is NOT a reassignment. Routine status updates, acknowledgements, and scene updates are NOT reassignments.
|
||
- transcript_corrected: fix only clear STT/vocoder errors (e.g. "Several" → "10-4", misheard street names, garbled unit IDs). Keep all radio language as-is — do NOT decode codes into plain English. Return null if accurate.
|
||
|
||
System: {system_id}
|
||
Talkgroup: {talkgroup_name}
|
||
{ten_codes_block}{vocabulary_block}{transcript_block}"""
|
||
|
||
# Geographic bias radius for geocoding — half-width in degrees (~55 km)
|
||
_GEO_DELTA = 0.5
|
||
|
||
# Cache node state (e.g. "New York") and county (e.g. "Westchester County") per node
|
||
_node_state_cache: dict[str, str] = {}
|
||
_node_county_cache: dict[str, str] = {}
|
||
|
||
# Police/law-enforcement phonetic alphabet words (APCO + NATO).
|
||
# A run of 5+ of these in a transcript is a strong Whisper hallucination signal.
|
||
_PHONETIC_ALPHA_WORDS = frozenset({
|
||
# APCO (law enforcement)
|
||
"adam", "baker", "charles", "david", "edward", "frank", "george", "henry",
|
||
"ida", "john", "king", "lincoln", "mary", "nora", "ocean", "paul", "queen",
|
||
"robert", "sam", "tom", "union", "victor", "william", "x-ray", "young", "zebra",
|
||
# NATO
|
||
"alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
|
||
"india", "juliet", "kilo", "lima", "mike", "november", "oscar", "papa",
|
||
"quebec", "romeo", "sierra", "tango", "uniform", "whiskey", "yankee", "zulu",
|
||
})
|
||
|
||
# Strip P25 service suffixes to extract the municipality name from a talkgroup
|
||
_TG_SUFFIX_RE = re.compile(
|
||
r"\s*\b(police\s*dep(t|artment)?|pd|fire\s*(dep(t|artment)|district)?|"
|
||
r"ems|rescue|dispatch|fd|tac(tical)?|ops|operations?|command|"
|
||
r"(fire\s*)?ground|mutual\s*aid|channel|ch\b|car[-\s]to[-\s]car|"
|
||
r"division|unit)\b.*",
|
||
re.IGNORECASE,
|
||
)
|
||
|
||
|
||
def _is_garbage_transcript(transcript: str) -> bool:
|
||
"""
|
||
Detect Whisper hallucinations that should be discarded before GPT processing.
|
||
|
||
Two signals:
|
||
1. Phonetic-alphabet run ≥ 5 consecutive words: Whisper hallucinated a
|
||
training-data sequence (common on silent or noise-only audio).
|
||
2. High comma density (> 15% of tokens) in long transcripts: list-dump
|
||
hallucinations contain far more commas than real radio speech.
|
||
"""
|
||
words = re.findall(r"[\w\-]+", transcript.lower())
|
||
if not words:
|
||
return False
|
||
|
||
# Threshold of 12: well above any legitimate plate/name spellout (~6–8 words)
|
||
# but catches the full-alphabet hallucination (26 words in sequence).
|
||
run = 0
|
||
for w in words:
|
||
if w in _PHONETIC_ALPHA_WORDS:
|
||
run += 1
|
||
if run >= 12:
|
||
return True
|
||
else:
|
||
run = 0
|
||
|
||
if len(words) > 30 and transcript.count(",") / len(words) > 0.15:
|
||
return True
|
||
|
||
return False
|
||
|
||
|
||
def _build_ten_codes_block(ten_codes: dict[str, str]) -> str:
|
||
if not ten_codes:
|
||
return ""
|
||
lines = "\n".join(f" {code}: {meaning}" for code, meaning in sorted(ten_codes.items()))
|
||
return f"Department ten-codes:\n{lines}\n\n"
|
||
|
||
|
||
async def extract_scenes(
|
||
call_id: str,
|
||
transcript: str,
|
||
talkgroup_name: Optional[str] = None,
|
||
talkgroup_id: Optional[int] = None,
|
||
system_id: Optional[str] = None,
|
||
segments: Optional[list[dict]] = None,
|
||
node_id: Optional[str] = None,
|
||
preserve_transcript_correction: bool = False,
|
||
) -> list[dict]:
|
||
"""
|
||
Split the transcript into one or more scenes and extract structured
|
||
intelligence for each. Most calls return a single scene; a busy dispatch
|
||
channel capturing back-to-back conversations returns multiple.
|
||
|
||
Each scene dict contains:
|
||
tags, incident_type, location, location_coords, resolved,
|
||
severity, vehicles, units, transcript_corrected,
|
||
segment_indices, embedding
|
||
|
||
Side-effect: updates calls/{call_id} in Firestore with merged tags,
|
||
location (primary scene), units/vehicles, severity, embedding, and
|
||
optionally transcript_corrected.
|
||
"""
|
||
vocabulary: list[str] = []
|
||
ten_codes: dict[str, str] = {}
|
||
if system_id:
|
||
# Single cached read — vocabulary and ten_codes live on the same document.
|
||
system_doc = await fstore.doc_get_cached("systems", system_id)
|
||
if system_doc:
|
||
vocabulary = system_doc.get("vocabulary") or []
|
||
ten_codes = system_doc.get("ten_codes") or {}
|
||
|
||
if _is_garbage_transcript(transcript):
|
||
logger.warning(
|
||
f"Intelligence: call {call_id} — garbage transcript detected "
|
||
f"(Whisper hallucination), skipping extraction"
|
||
)
|
||
try:
|
||
await fstore.doc_set("calls", call_id, {"skip_reason": "garbage_transcript"})
|
||
except Exception:
|
||
pass
|
||
return []
|
||
|
||
# Transcripts with ≤5 words carry no extractable intelligence — GPT hallucinates
|
||
# units and tags from thin context (e.g. "Main Lot", "10-4", "David").
|
||
if len(transcript.split()) <= 5:
|
||
logger.info(
|
||
f"Intelligence: call {call_id} — transcript too short for extraction "
|
||
f"({len(transcript.split())} words), skipping"
|
||
)
|
||
try:
|
||
await fstore.doc_set("calls", call_id, {"skip_reason": "transcript_too_short"})
|
||
except Exception:
|
||
pass
|
||
return []
|
||
|
||
raw_scenes: list[dict] = await asyncio.to_thread(
|
||
_sync_extract,
|
||
transcript, talkgroup_name, talkgroup_id, system_id, segments, vocabulary, ten_codes,
|
||
)
|
||
|
||
if not raw_scenes:
|
||
return []
|
||
|
||
# Resolve node position once for geocoding all scenes
|
||
node_lat: Optional[float] = None
|
||
node_lon: Optional[float] = None
|
||
if node_id:
|
||
node_doc = await fstore.doc_get_cached("nodes", node_id)
|
||
if node_doc:
|
||
node_lat = node_doc.get("lat")
|
||
node_lon = node_doc.get("lon")
|
||
|
||
processed: list[dict] = []
|
||
for scene in raw_scenes:
|
||
tags: list[str] = scene.get("tags") or []
|
||
incident_type: Optional[str] = scene.get("incident_type") or None
|
||
location: Optional[str] = scene.get("location") or None
|
||
vehicles: list[str] = scene.get("vehicles") or []
|
||
units: list[str] = scene.get("units") or []
|
||
cleared_units: list[str] = scene.get("cleared_units") or []
|
||
severity: str = scene.get("severity") or "unknown"
|
||
resolved: bool = bool(scene.get("resolved", False))
|
||
reassignment: bool = bool(scene.get("reassignment", False))
|
||
transcript_corrected: Optional[str]= scene.get("transcript_corrected") or None
|
||
segment_indices: Optional[list] = scene.get("segment_indices")
|
||
|
||
if incident_type in ("unknown", "other", ""):
|
||
incident_type = None
|
||
|
||
# Geocode this scene's location.
|
||
# Build the most specific query possible: location + municipality + state.
|
||
# e.g. "High Street" → "High Street, Yorktown, New York"
|
||
# This prevents generic street names from resolving to wrong-country results.
|
||
location_coords: Optional[dict] = None
|
||
if location and node_lat is not None and node_lon is not None:
|
||
muni = _municipality_from_tg(talkgroup_name)
|
||
state = await _get_node_state(node_id or "", node_lat, node_lon) if node_id else ""
|
||
county = _node_county_cache.get(node_id or "") if node_id else ""
|
||
parts = [location]
|
||
if muni:
|
||
parts.append(muni)
|
||
if county:
|
||
parts.append(county)
|
||
if state:
|
||
parts.append(state)
|
||
query = ", ".join(parts)
|
||
location_coords = await _geocode_location(query, node_lat, node_lon)
|
||
|
||
# Embed this scene's content
|
||
scene_text = _build_scene_embed_text(
|
||
transcript, segments, segment_indices, incident_type, transcript_corrected
|
||
)
|
||
embedding = await asyncio.to_thread(_sync_embed, scene_text)
|
||
|
||
processed.append({
|
||
"tags": tags,
|
||
"incident_type": incident_type,
|
||
"location": location,
|
||
"location_coords": location_coords,
|
||
"vehicles": vehicles,
|
||
"units": units,
|
||
"cleared_units": cleared_units,
|
||
"severity": severity,
|
||
"resolved": resolved,
|
||
"reassignment": reassignment,
|
||
"transcript_corrected": transcript_corrected,
|
||
"segment_indices": segment_indices,
|
||
"embedding": embedding,
|
||
})
|
||
|
||
# Merge across scenes for the call-level Firestore document.
|
||
# Primary scene (first) owns location, severity, transcript_corrected.
|
||
# Tags/units/vehicles are union-merged from all scenes.
|
||
primary = processed[0]
|
||
all_tags = list(dict.fromkeys(t for s in processed for t in s["tags"]))
|
||
all_units = list(dict.fromkeys(u for s in processed for u in s["units"]))
|
||
all_vehicles = list(dict.fromkeys(v for s in processed for v in s["vehicles"]))
|
||
all_cleared = list(dict.fromkeys(u for s in processed for u in s["cleared_units"]))
|
||
|
||
updates: dict = {"tags": all_tags, "severity": primary["severity"]}
|
||
if primary["location"]:
|
||
updates["location"] = primary["location"]
|
||
if primary["location_coords"]:
|
||
updates["location_coords"] = primary["location_coords"]
|
||
if all_units:
|
||
updates["units"] = all_units
|
||
if all_cleared:
|
||
updates["cleared_units"] = all_cleared
|
||
if all_vehicles:
|
||
updates["vehicles"] = all_vehicles
|
||
if primary["embedding"]:
|
||
updates["embedding"] = primary["embedding"]
|
||
if primary["transcript_corrected"] and not preserve_transcript_correction:
|
||
updates["transcript_corrected"] = primary["transcript_corrected"]
|
||
|
||
try:
|
||
await fstore.doc_set("calls", call_id, updates)
|
||
except Exception as e:
|
||
logger.warning(f"Could not save intelligence for call {call_id}: {e}")
|
||
|
||
scene_summary = (
|
||
f"{len(processed)} scene(s): "
|
||
+ ", ".join(
|
||
f"[{s['incident_type'] or 'unclassified'} tags={s['tags'][:2]}]"
|
||
for s in processed
|
||
)
|
||
)
|
||
logger.info(f"Intelligence: call {call_id} → {scene_summary}")
|
||
return processed
|
||
|
||
|
||
def _geo_dist_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
|
||
"""Haversine distance in km between two lat/lon points."""
|
||
R = 6371.0
|
||
dlat = math.radians(lat2 - lat1)
|
||
dlon = math.radians(lon2 - lon1)
|
||
a = math.sin(dlat / 2) ** 2 + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon / 2) ** 2
|
||
return R * 2 * math.asin(math.sqrt(a))
|
||
|
||
|
||
async def _get_node_state(node_id: str, lat: float, lon: float) -> str:
|
||
"""
|
||
Return the US state name (e.g. "New York") for a node's position.
|
||
Also populates _node_county_cache as a side-effect (same API call).
|
||
Uses Google Maps Reverse Geocoding; cached for the process lifetime since nodes don't move.
|
||
"""
|
||
if node_id in _node_state_cache:
|
||
return _node_state_cache[node_id]
|
||
|
||
import httpx
|
||
from app.config import settings
|
||
|
||
if not settings.google_maps_api_key:
|
||
return ""
|
||
|
||
state = ""
|
||
county = ""
|
||
try:
|
||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||
r = await client.get(
|
||
"https://maps.googleapis.com/maps/api/geocode/json",
|
||
params={
|
||
"latlng": f"{lat},{lon}",
|
||
"result_type": "administrative_area_level_1|administrative_area_level_2",
|
||
"key": settings.google_maps_api_key,
|
||
},
|
||
)
|
||
r.raise_for_status()
|
||
data = r.json()
|
||
if data.get("status") == "OK" and data.get("results"):
|
||
for result in data["results"]:
|
||
for comp in result.get("address_components", []):
|
||
types = comp.get("types", [])
|
||
if "administrative_area_level_1" in types and not state:
|
||
state = comp.get("long_name", "")
|
||
if "administrative_area_level_2" in types and not county:
|
||
county = comp.get("long_name", "")
|
||
except Exception as e:
|
||
logger.warning(f"Node state lookup failed for {node_id}: {e}")
|
||
|
||
if state:
|
||
_node_state_cache[node_id] = state
|
||
if county:
|
||
_node_county_cache[node_id] = county
|
||
if state or county:
|
||
logger.info(f"Node {node_id} geo resolved: county={county!r} state={state!r}")
|
||
return state
|
||
|
||
|
||
async def _geocode_location(
|
||
location_str: str, node_lat: float, node_lon: float
|
||
) -> Optional[dict]:
|
||
"""
|
||
Geocode using Google Maps Geocoding API, biased toward the node's area.
|
||
Returns {"lat": float, "lng": float} or None if geocoding fails or the
|
||
result is farther than geocode_max_km from the node (wrong-jurisdiction guard).
|
||
"""
|
||
import httpx
|
||
from app.config import settings
|
||
|
||
if not settings.google_maps_api_key:
|
||
logger.warning("GOOGLE_MAPS_API_KEY not set — geocoding disabled")
|
||
return None
|
||
|
||
bounds = (
|
||
f"{node_lat - _GEO_DELTA},{node_lon - _GEO_DELTA}"
|
||
f"|{node_lat + _GEO_DELTA},{node_lon + _GEO_DELTA}"
|
||
)
|
||
params = {
|
||
"address": location_str,
|
||
"bounds": bounds,
|
||
"region": "us",
|
||
"key": settings.google_maps_api_key,
|
||
}
|
||
try:
|
||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||
r = await client.get(
|
||
"https://maps.googleapis.com/maps/api/geocode/json",
|
||
params=params,
|
||
)
|
||
r.raise_for_status()
|
||
data = r.json()
|
||
if data.get("status") != "OK" or not data.get("results"):
|
||
return None
|
||
result = data["results"][0]
|
||
location_type = result.get("geometry", {}).get("location_type", "")
|
||
# Only accept address-level precision. GEOMETRIC_CENTER (city/neighborhood
|
||
# centroid) and APPROXIMATE (region boundary) produce coordinates that look
|
||
# valid but are too vague for 0.5km proximity matching — they often resolve
|
||
# to the same point as the node's position and create false proximity matches.
|
||
if location_type not in ("ROOFTOP", "RANGE_INTERPOLATED"):
|
||
logger.info(
|
||
f"Geocoding rejected '{location_str}' — imprecise result "
|
||
f"(location_type={location_type!r}), returning None"
|
||
)
|
||
return None
|
||
loc = result["geometry"]["location"]
|
||
lat, lng = float(loc["lat"]), float(loc["lng"])
|
||
dist_km = _geo_dist_km(node_lat, node_lon, lat, lng)
|
||
if dist_km > settings.geocode_max_km:
|
||
logger.warning(
|
||
f"Geocoding rejected '{location_str}' → ({lat:.4f}, {lng:.4f}) "
|
||
f"— {dist_km:.1f}km from node exceeds geocode_max_km={settings.geocode_max_km}"
|
||
)
|
||
return None
|
||
coords = {"lat": lat, "lng": lng}
|
||
logger.info(f"Geocoded '{location_str}' → {coords} ({dist_km:.1f}km from node) [{location_type}]")
|
||
return coords
|
||
except Exception as e:
|
||
logger.warning(f"Geocoding failed for '{location_str}': {e}")
|
||
return None
|
||
|
||
|
||
def _municipality_from_tg(tg_name: Optional[str]) -> Optional[str]:
|
||
"""
|
||
Extract the municipality name from a talkgroup name.
|
||
e.g. "Ossining PD" → "Ossining", "Westchester County Fire" → "Westchester County"
|
||
Returns None for tactical/operational channels with no useful location info.
|
||
"""
|
||
if not tg_name:
|
||
return None
|
||
cleaned = _TG_SUFFIX_RE.sub("", tg_name).strip()
|
||
if not cleaned or cleaned.isdigit() or (len(cleaned) <= 3 and cleaned.isupper()):
|
||
return None
|
||
return cleaned
|
||
|
||
|
||
def _build_transcript_block(transcript: str, segments: Optional[list[dict]]) -> str:
|
||
"""Format transcript as numbered transmissions if segments are available."""
|
||
if segments and len(segments) > 1:
|
||
lines = [f"{i+1}. [{s['start']}s] {s['text']}" for i, s in enumerate(segments)]
|
||
return f"Transmissions ({len(segments)}):\n" + "\n".join(lines)
|
||
return f"Transcript:\n{transcript}"
|
||
|
||
|
||
def _build_scene_embed_text(
|
||
transcript: str,
|
||
segments: Optional[list[dict]],
|
||
segment_indices: Optional[list[int]],
|
||
incident_type: Optional[str],
|
||
transcript_corrected: Optional[str],
|
||
) -> str:
|
||
"""Build the text string to embed for a specific scene."""
|
||
prefix = f"[{incident_type}] " if incident_type else ""
|
||
if transcript_corrected:
|
||
return f"{prefix}{transcript_corrected}"
|
||
if segments and segment_indices:
|
||
texts = [segments[i]["text"] for i in segment_indices if i < len(segments)]
|
||
return f"{prefix}{' '.join(texts)}"
|
||
return f"{prefix}{transcript}"
|
||
|
||
|
||
def _sync_extract(
|
||
transcript: str,
|
||
talkgroup_name: Optional[str],
|
||
talkgroup_id: Optional[int],
|
||
system_id: Optional[str],
|
||
segments: Optional[list[dict]],
|
||
vocabulary: Optional[list[str]] = None,
|
||
ten_codes: Optional[dict[str, str]] = None,
|
||
) -> list[dict]:
|
||
"""Call GPT-4o-mini and return a list of scene dicts."""
|
||
from app.config import settings
|
||
from openai import OpenAI
|
||
|
||
if not settings.openai_api_key:
|
||
logger.warning("OPENAI_API_KEY not set — intelligence extraction disabled.")
|
||
return []
|
||
|
||
from app.internal.vocabulary_learner import build_gpt_vocab_block
|
||
tg = f"{talkgroup_name} (TGID {talkgroup_id})" if talkgroup_id else (talkgroup_name or "unknown")
|
||
prompt = _PROMPT_TEMPLATE.format(
|
||
transcript_block=_build_transcript_block(transcript, segments),
|
||
talkgroup_name=tg,
|
||
system_id=system_id or "unknown",
|
||
ten_codes_block=_build_ten_codes_block(ten_codes or {}),
|
||
vocabulary_block=build_gpt_vocab_block(vocabulary or []),
|
||
)
|
||
|
||
try:
|
||
client = OpenAI(api_key=settings.openai_api_key)
|
||
response = client.chat.completions.create(
|
||
model="gpt-4o-mini",
|
||
messages=[{"role": "user", "content": prompt}],
|
||
response_format={"type": "json_object"},
|
||
)
|
||
raw = json.loads(response.choices[0].message.content)
|
||
|
||
# New format: {"scenes": [...]}
|
||
if "scenes" in raw and isinstance(raw["scenes"], list):
|
||
return raw["scenes"]
|
||
|
||
# Fallback: GPT returned the old flat single-scene format
|
||
logger.warning("GPT returned flat format instead of scenes array — wrapping")
|
||
return [raw]
|
||
|
||
except json.JSONDecodeError as e:
|
||
logger.warning(f"GPT-4o-mini returned non-JSON: {e}")
|
||
return []
|
||
except Exception as e:
|
||
logger.warning(f"GPT-4o-mini extraction failed: {e}")
|
||
return []
|
||
|
||
|
||
def _sync_embed(text: str) -> Optional[list[float]]:
|
||
"""Generate a text-embedding-3-small vector for semantic similarity."""
|
||
from app.config import settings
|
||
from openai import OpenAI
|
||
|
||
if not settings.openai_api_key:
|
||
return None
|
||
|
||
try:
|
||
client = OpenAI(api_key=settings.openai_api_key)
|
||
result = client.embeddings.create(model="text-embedding-3-small", input=text)
|
||
return result.data[0].embedding
|
||
except Exception as e:
|
||
logger.warning(f"Embedding generation failed: {e}")
|
||
return None
|