Voice Agent Technical Specification
Document ID: PLCY-VOI-002
Version: 1.2
Effective Date: January 21, 2026
Last Review: January 21, 2026
Owner: Hop And Haul Team
CONFIDENTIAL
This document is CONFIDENTIAL and for internal use only. Do not distribute outside the organization.
1. Purpose
This document provides the technical implementation specification for the Hop And Haul Voice Agent system—a dispatch coordination service that operates like a modern version of CB radio dispatch. The agent coordinates transport opportunities with drivers over hands-free Bluetooth, exactly as dispatchers have done for decades.
Key Concept: Hop And Haul acts on behalf of dispatch. We call drivers via their one-touch Bluetooth headset, present opportunities in simple terms, and on acceptance, push the updated route directly to Samsara.
Related Documents:
- PLCY-VOI-001 - Voice Agent Integration Policy (behavioral requirements)
- PLCY-COM-001 - Driver Communication Protocol (state detection, routing)
- PLCY-VOI-003 - Voice Escalation Procedures
2. Prerequisites & Driver Equipment
2.1 One-Touch Bluetooth Headset Requirement
REQUIRED
All drivers participating in Hop And Haul voice dispatch must have a one-touch Bluetooth headset or hands-free vehicle integration. This is a company policy requirement for all participating fleets.
| Requirement | Specification |
|---|---|
| Headset Type | Bluetooth hands-free headset or vehicle-integrated system |
| Answer Method | Single-button press or voice-activated answer |
| No Physical Handling | Driver never touches phone during call |
| DOT Compliance | Meets 49 CFR 392.82 hands-free requirements |
Acceptable Equipment:
- Bluetooth mono/stereo headsets with one-touch answer
- Vehicle-integrated Bluetooth systems
- Apple CarPlay / Android Auto with voice activation
- Trucker-style single-ear headsets (Plantronics, BlueParrott, etc.)
2.2 Why This Works (Legal Basis)
| Practice | Status | Reference |
|---|---|---|
| CB radio dispatch while driving | Legal, standard practice | Industry standard since 1970s |
| Bluetooth hands-free calls | Legal | 49 CFR 392.82 explicitly allows |
| Dispatch coordination calls | Legal | Standard fleet operations |
| Accepting loads via voice | Legal | Same as CB radio dispatch |
Hop And Haul is dispatch. Drivers have received dispatch calls over hands-free systems for decades. The voice agent is simply a more efficient dispatcher.
3. System Architecture
3.1 High-Level Block Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ Hop And Haul Voice System │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ Driver Phone │◄──────── PSTN ───────────────►│ Twilio Voice │ │
│ │ (PSTN) │ │ (FedRAMP Moderate) │ │
│ └──────────────┘ └──────────┬─────────┘ │
│ │ │
│ ┌──────────────────────────────────┘ │
│ │ Control: TwiML <Connect><Stream> │
│ │ Audio: Media Streams WebSocket │
│ │ Format: G.711 μ-law (audio/x-mulaw) │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Voice Gateway Service │ │
│ │ (WebSocket Proxy) │ │
│ │ │ │
│ │ • Twilio WS ↔ OpenAI WS relay │ │
│ │ • Bidirectional audio stream │ │
│ │ • Tool/function call execution │ │
│ │ • Barge-in detection │ │
│ │ • State-aware prompt injection │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌───────────────────┼───────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────┐ ┌────────────────┐ ┌──────────────────────┐ │
│ │ OpenAI Realtime │ │ Hop And Haul API │ │ Samsara API │ │
│ │ API (WebSocket) │ │ │ │ │ │
│ │ │ │ • Ride ops │ │ • Route update │ │
│ │ • g711_ulaw │ │ • Driver data │ │ • ETA calc │ │
│ │ • Server VAD │ │ • Offer mgmt │ │ • Vehicle loc │ │
│ │ • Tool calls │ │ • Audit logs │ │ • Driver state │ │
│ └─────────────────┘ └────────────────┘ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘

3.2 Component Responsibilities
| Component | Responsibility |
|---|---|
| Twilio Voice | PSTN connectivity, call routing, TwiML processing |
| Twilio Media Streams | Real-time audio streaming over WebSocket |
| Voice Gateway | Audio relay, protocol translation, business logic |
| OpenAI Realtime API | Speech-to-speech AI, voice activity detection, tool calls |
| Hop And Haul API | Ride operations, driver data, offer management |
| Samsara API | Route updates on acceptance, vehicle location, driver state |
| Human Escalation | Safety/support routing via PagerDuty (see PLCY-VOI-003) |
3.3 Control Plane vs Audio Plane
| Plane | Protocol | Purpose |
|---|---|---|
| Control Plane | HTTP/REST | Call initiation, TwiML delivery, webhooks |
| Audio Plane | WebSocket | Real-time bidirectional audio streaming |
3. Twilio Integration
3.1 Call Initiation (REST API)
Outbound Call:
```http
POST https://api.twilio.com/2010-04-01/Accounts/{AccountSid}/Calls
Content-Type: application/x-www-form-urlencoded

To=+1XXXXXXXXXX
From=+1XXXXXXXXXX
Url=https://fleetseats.example.com/voice/twiml
StatusCallback=https://fleetseats.example.com/voice/status
```

Inbound Call (Webhook Configuration):
- Configure the Twilio phone number to POST to `/voice/twiml`
- Voice Gateway receives call metadata and returns TwiML
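A minimal sketch of building this outbound-call request in the gateway. The endpoint and form fields match the REST call above; the helper name and credential values are illustrative, and authentication (Basic auth with the Account SID and Auth Token) is omitted.

```javascript
// Build the URL and form-encoded body for the Twilio Calls resource.
// All inputs are placeholders supplied by the caller.
function buildCallRequest({ accountSid, to, from, twimlUrl, statusCallback }) {
  const body = new URLSearchParams({
    To: to,
    From: from,
    Url: twimlUrl,
    StatusCallback: statusCallback,
  });
  return {
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls`,
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body.toString(),
  };
}
```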
3.2 TwiML Configuration
Media Stream Connection:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://voice-gateway.fleetseats.example.com/media-stream">
      <Parameter name="callId" value="{CallSid}"/>
      <Parameter name="driverId" value="{driver_id}"/>
      <Parameter name="vehicleState" value="{vehicle_state}"/>
    </Stream>
  </Connect>
</Response>
```

Key TwiML Elements:
| Element | Purpose |
|---|---|
| `<Connect>` | Establishes bidirectional media connection |
| `<Stream url="wss://...">` | WebSocket endpoint for audio |
| `<Parameter>` | Pass context to Voice Gateway |
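A sketch of how the gateway's webhook handler could render this TwiML. The element and parameter names match the example above; the function names and escaping helper are illustrative, not the production implementation.

```javascript
// Escape values placed inside XML attributes.
function escapeXml(s) {
  return String(s).replace(/[<>&'"]/g, (c) =>
    ({ '<': '&lt;', '>': '&gt;', '&': '&amp;', "'": '&apos;', '"': '&quot;' }[c]));
}

// Render the <Connect><Stream> response with per-call context parameters.
function renderStreamTwiml({ streamUrl, callId, driverId, vehicleState }) {
  const params = { callId, driverId, vehicleState };
  const paramXml = Object.entries(params)
    .map(([name, value]) => `      <Parameter name="${name}" value="${escapeXml(value)}"/>`)
    .join('\n');
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="${escapeXml(streamUrl)}">`,
    paramXml,
    '    </Stream>',
    '  </Connect>',
    '</Response>',
  ].join('\n');
}
```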
3.3 Media Streams WebSocket Protocol
Connection URL:
wss://voice-gateway.fleetseats.example.com/media-stream

Twilio → Gateway Events:
| Event | Payload | Description |
|---|---|---|
| `connected` | `{ protocol, version }` | WebSocket connected |
| `start` | `{ streamSid, callSid, mediaFormat }` | Stream started, includes encoding info |
| `media` | `{ payload }` | Base64 audio chunk (8000 Hz, μ-law) |
| `stop` | `{ streamSid }` | Stream ended |
Gateway → Twilio Events:
| Event | Payload | Description |
|---|---|---|
| `media` | `{ payload }` | Base64 audio to play to caller |
| `mark` | `{ name }` | Sync marker (notifies when audio plays) |
| `clear` | `{}` | Flush buffered audio (for barge-in) |
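The inbound events above can be dispatched with a simple switch on the frame's `event` field. The field paths (`frame.start.streamSid`, `frame.media.payload`) follow the tables above; the session object and `onAudio` callback are illustrative.

```javascript
// Route a parsed Twilio Media Streams frame to session state / audio handling.
function handleTwilioFrame(frame, session) {
  switch (frame.event) {
    case 'connected':
      session.connected = true;
      break;
    case 'start':
      // Stream metadata arrives once, at stream start.
      session.streamSid = frame.start.streamSid;
      session.callSid = frame.start.callSid;
      session.mediaFormat = frame.start.mediaFormat;
      break;
    case 'media':
      // Base64 μ-law chunk from the driver's side of the call.
      session.onAudio(frame.media.payload);
      break;
    case 'stop':
      session.ended = true;
      break;
  }
  return session;
}
```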
3.4 Audio Format
| Parameter | Value |
|---|---|
| Encoding | audio/x-mulaw (G.711 μ-law) |
| Sample Rate | 8000 Hz |
| Channels | Mono |
| Bit Depth | 8-bit |
| Chunk Size | ~20ms frames |
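For reference, each μ-law byte above expands to a 16-bit PCM sample via the standard G.711 expansion. The gateway itself relays μ-law untouched (see Section 4.3), so this sketch is only useful for debugging or offline analysis.

```javascript
// Decode one G.711 μ-law byte to a signed 16-bit PCM sample.
function mulawDecode(byte) {
  const MULAW_BIAS = 0x84; // 132, the standard μ-law bias
  byte = ~byte & 0xff;                 // μ-law bytes are stored complemented
  const sign = byte & 0x80;
  const exponent = (byte >> 4) & 0x07; // 3-bit segment number
  const mantissa = byte & 0x0f;        // 4-bit step within the segment
  let sample = ((mantissa << 3) + MULAW_BIAS) << exponent;
  sample -= MULAW_BIAS;
  return sign ? -sample : sample;
}
```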
4. OpenAI Realtime API Integration
4.1 WebSocket Connection
Connection URL:
wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17

Headers:
Authorization: Bearer {OPENAI_API_KEY}
OpenAI-Beta: realtime=v1

4.2 Session Configuration
Initial Session Update:
```json
{
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "instructions": "You are a Hop And Haul dispatch assistant...",
    "voice": "alloy",
    "input_audio_format": "g711_ulaw",
    "output_audio_format": "g711_ulaw",
    "input_audio_transcription": {
      "model": "whisper-1"
    },
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "tools": [
      {
        "type": "function",
        "name": "accept_ride_offer",
        "description": "Accept the transport offer and push route to Samsara",
        "parameters": { "type": "object", "properties": {} }
      },
      {
        "type": "function",
        "name": "decline_ride_offer",
        "description": "Decline the transport offer",
        "parameters": { "type": "object", "properties": {} }
      },
      {
        "type": "function",
        "name": "get_offer_details",
        "description": "Get more details about the transport (pickup location, drop-off, etc.)",
        "parameters": { "type": "object", "properties": {} }
      },
      {
        "type": "function",
        "name": "escalate_to_human",
        "description": "Transfer to human support",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": { "type": "string", "enum": ["safety", "support", "other"] }
          }
        }
      }
    ]
  }
}
```

4.3 Audio Format Compatibility
| Format | Twilio | OpenAI | Notes |
|---|---|---|---|
| `g711_ulaw` | ✓ (native) | ✓ (supported) | No transcoding required |
| `g711_alaw` | ✓ | ✓ | Alternative, same quality |
| `pcm16` | Requires conversion | ✓ | Higher quality, more bandwidth |
Key Benefit: Using g711_ulaw on both sides eliminates transcoding latency.
4.4 Input Buffer Management
Sending Audio to OpenAI:
```json
{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded-g711-ulaw>"
}
```

Buffer Commit (Manual VAD Mode):
```json
{
  "type": "input_audio_buffer.commit"
}
```

Note: With server_vad turn detection, OpenAI automatically detects speech end and commits.
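Because both legs use g711_ulaw, the relay can wrap Twilio's base64 payload in the append event without touching the audio bytes. A sketch (the function name is illustrative):

```javascript
// Wrap a Twilio media frame's payload in a Realtime append event.
// No transcoding: the base64 μ-law audio passes through unchanged.
function toOpenAiAppend(twilioMediaFrame) {
  return JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: twilioMediaFrame.media.payload,
  });
}
```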
4.5 Response Handling
Audio Response Events:
```json
{
  "type": "response.audio.delta",
  "delta": "<base64-encoded-g711-ulaw>"
}
```

Response Complete:
```json
{
  "type": "response.audio.done"
}
```

Tool/Function Calls:
```json
{
  "type": "response.function_call_arguments.done",
  "call_id": "call_xxx",
  "name": "accept_ride_offer",
  "arguments": "{}"
}
```

5. Voice Gateway Service
5.1 Architecture
┌──────────────────────────────────────────────────────────────┐
│ Voice Gateway Service │
├──────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Twilio WS │ │ OpenAI WS │ │
│ │ Handler │◄────────────►│ Handler │ │
│ │ │ Audio │ │ │
│ │ • Receive media │ Relay │ • Send audio │ │
│ │ • Send audio │ │ • Receive delta │ │
│ │ • Handle marks │ │ • Handle tools │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ └───────────────┬────────────────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ Session │ │
│ │ Manager │ │
│ │ │ │
│ │ • Call state │ │
│ │ • Driver ctx │ │
│ │ • Offer data │ │
│ │ • Barge-in │ │
│ └───────┬───────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────┐ ┌───────────────┐ │
│ │ Tool Executor │ │ Hop And Haul│ │ Escalation │ │
│ │ │ │ API Client│ │ Handler │ │
│ │ • accept_ride │ │ │ │ │ │
│ │ • decline_ride│ │ • GET/POST│ │ • PagerDuty │ │
│ │ • get_details │ │ • Auth │ │ • Call xfer │ │
│ └───────────────┘ └───────────┘ └───────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘

5.2 Connection Lifecycle
1. Twilio WS connects (on call answer)
2. Gateway extracts call metadata from "start" event
3. Gateway opens OpenAI WS connection
4. Gateway sends session.update with instructions + tools
5. Gateway injects initial prompt based on vehicle state
6. Audio relay loop begins
7. Tool calls executed as received
8. On call end: close both WebSockets, log session

5.3 Tool Execution
| Tool | Gateway Action |
|---|---|
| `accept_ride_offer` | 1. POST to Hop And Haul API `/rides/{id}/accept` 2. Push route to Samsara via PATCH `/fleet/routes/{routeId}` |
| `decline_ride_offer` | POST to Hop And Haul API `/rides/{id}/decline` |
| `get_offer_details` | GET from Hop And Haul API `/rides/{id}`, format response |
| `escalate_to_human` | Trigger PagerDuty alert, transfer call via Twilio |
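A sketch of the Tool Executor dispatching these actions. The `apiClient`, `samsara`, and `escalation` objects are hypothetical stand-ins for the real clients; only the paths and tool names come from the table above.

```javascript
// Dispatch a Realtime tool call to the corresponding gateway action.
// Returns the object that will be serialized into function_call_output.
async function executeTool(name, args, { apiClient, samsara, escalation }, rideId) {
  switch (name) {
    case 'accept_ride_offer':
      await apiClient.post(`/rides/${rideId}/accept`);
      await samsara.patchRoute(rideId); // push pickup/drop-off waypoints
      return { status: 'accepted', message: 'Ride confirmed.' };
    case 'decline_ride_offer':
      await apiClient.post(`/rides/${rideId}/decline`);
      return { status: 'declined' };
    case 'get_offer_details':
      return apiClient.get(`/rides/${rideId}`);
    case 'escalate_to_human':
      await escalation.page(args.reason); // PagerDuty alert + call transfer
      return { status: 'escalating' };
    default:
      return { error: `unknown tool: ${name}` };
  }
}
```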
5.4 Samsara Route Update (On Acceptance)
When driver says "yes" and accept_ride_offer is called:
1. Hop And Haul API marks ride as accepted
2. Gateway calls Samsara API to update driver's route:
- Add pickup waypoint
- Add drop-off waypoint
- Recalculate ETA
3. Driver's Samsara device shows updated route
4. Agent confirms: "I'll change your route now"

Samsara API Call:
```http
PATCH https://api.samsara.com/fleet/routes/{routeId}
Authorization: Bearer {SAMSARA_API_TOKEN}
Content-Type: application/json

{
  "stops": [
    {
      "address": { "formattedAddress": "Exit 52, I-40, TN" },
      "scheduledArrivalTime": "2025-12-30T14:30:00Z",
      "notes": "Hop And Haul pickup - John D."
    },
    {
      "address": { "formattedAddress": "Exit 30 Drop-off Location" },
      "scheduledArrivalTime": "2025-12-30T15:00:00Z",
      "notes": "Hop And Haul drop-off"
    }
  ]
}
```

Tool Response Format:
```json
{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_xxx",
    "output": "{\"status\": \"accepted\", \"message\": \"Ride confirmed.\"}"
  }
}
```

5.5 State-Aware Prompt Injection
Based on vehicle state (from Samsara/ELD), inject appropriate system instructions:
| State | Prompt Mode | Constraints |
|---|---|---|
| MOVING | Radio Squawk | Max 20 words, 1 turn, no questions |
| STOPPED | Dispatch Desk | Max 40 words/turn, 3 turns, simple questions |
| PARKED | Full Workflow | Unlimited turns, full conversation |
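The table above reduces to a small lookup keyed on vehicle state. A sketch, with abbreviated instruction text (the real templates live in the system prompt, Section 9.6):

```javascript
// Abbreviated state-specific instruction strings; illustrative only.
const STATE_INSTRUCTIONS = {
  MOVING: 'CRITICAL: Driver is MOVING. Max 20 words, one statement, ' +
          'end with "say hold or pass", no clarifying questions.',
  STOPPED: 'Driver is STOPPED. Max 40 words per turn, 3 turns max, simple questions only.',
  PARKED: 'Driver is PARKED. Full workflow, unlimited turns.',
};

// Unknown or missing states fall back to the most restrictive mode.
function stateInstructions(vehicleState) {
  return STATE_INSTRUCTIONS[vehicleState] ?? STATE_INSTRUCTIONS.MOVING;
}
```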
Example Prompt Injection (MOVING):
CRITICAL: Driver is MOVING. You MUST:
- Use max 20 words total
- Make ONE statement only
- End with "say hold or pass"
- Do NOT ask clarifying questions
- If unclear response, say "queued" and end

6. Call Flow Sequences
6.1 Outbound Call (Offer Notification)
┌─────────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐ ┌────────┐
│ Hop And Haul │ │ Twilio │ │ Gateway │ │ OpenAI │ │ Driver │
│ API │ │ Voice │ │ │ │ Realtime │ │ Phone │
└──────┬──────┘ └────┬────┘ └────┬────┘ └────┬─────┘ └───┬────┘
│ │ │ │ │
│ 1. Create Call │ │ │ │
│───────────────►│ │ │ │
│ │ 2. GET /twiml│ │ │
│ │─────────────►│ │ │
│ │ 3. TwiML │ │ │
│ │◄─────────────│ │ │
│ │ │ │ │
│ │ 4. Ring │ │ │
│ │──────────────┼──────────────┼─────────────►│
│ │ │ │ │
│ │ │ │ 5. Answer │
│ │◄─────────────┼──────────────┼──────────────│
│ │ │ │ │
│ │ 6. WS Open │ │ │
│ │─────────────►│ │ │
│ │ 7. "start" │ │ │
│ │─────────────►│ │ │
│ │ │ 8. WS Connect│ │
│ │ │─────────────►│ │
│ │ │ 9. session.update │
│ │ │─────────────►│ │
│ │ │ │ │
│ │ │ 10. Initial prompt │
│ │ │─────────────►│ │
│ │ │ │ │
│ │ │ 11. audio.delta │
│ │ │◄─────────────│ │
│ │ 12. "media" │ │ │
│ │◄─────────────│ │ │
│ │ │ │ 13. Audio │
│ │──────────────┼──────────────┼─────────────►│
│ │ │ │ │

6.2 Audio Loop (Steady State)
┌─────────┐ ┌─────────┐ ┌──────────┐
│ Twilio │ │ Gateway │ │ OpenAI │
│ Voice │ │ │ │ Realtime │
└────┬────┘ └────┬────┘ └────┬─────┘
│ │ │
│ "media" (driver) │ │
│──────────────────►│ │
│ │ input_audio_buffer.append
│ │──────────────────►│
│ │ │
│ │ (VAD detects end)│
│ │ │
│ │ response.audio.delta
│ │◄──────────────────│
│ "media" (agent) │ │
│◄──────────────────│ │
│ │ │
│ [repeat] │ │
▼ ▼ ▼

6.3 Call Termination
1. Driver hangs up OR agent ends call
2. Twilio sends "stop" event to Gateway
3. Gateway closes OpenAI WebSocket
4. Gateway logs call session to Hop And Haul API
5. Twilio WebSocket closes

7. Event Mapping Cheat-Sheet
7.1 Twilio → Gateway
| Twilio Event | Gateway Action |
|---|---|
| `connected` | Initialize session state |
| `start` | Extract metadata, open OpenAI WS, send session.update |
| `media` | Base64 decode, forward via input_audio_buffer.append |
| `stop` | Close OpenAI WS, log session, cleanup |
7.2 Gateway → OpenAI
| Gateway Action | OpenAI Event |
|---|---|
| Connect | WebSocket handshake |
| Configure | session.update |
| Send audio | input_audio_buffer.append |
| Commit buffer | input_audio_buffer.commit (manual VAD) |
| Tool response | conversation.item.create (function_call_output) |
7.3 OpenAI → Gateway
| OpenAI Event | Gateway Action |
|---|---|
| `session.created` | Log, ready for audio |
| `response.audio.delta` | Forward to Twilio as media |
| `response.audio.done` | Mark audio complete |
| `response.function_call_arguments.done` | Execute tool, return result |
| `input_audio_buffer.speech_started` | (Optional) Prepare for barge-in |
| `input_audio_buffer.speech_stopped` | (VAD) Audio committed |
| `error` | Log, potentially end call |
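The 7.3 mapping can be sketched as a single event router. Callback names (`sendToTwilio`, `runTool`, `log`) are illustrative; the event types and payload fields follow the table above.

```javascript
// Route a parsed Realtime event to the gateway action from table 7.3.
// Returns a tag describing the action taken, for logging/testing.
function handleOpenAiEvent(event, { sendToTwilio, runTool, log }) {
  switch (event.type) {
    case 'response.audio.delta':
      // Forward agent audio to the caller as a Twilio media frame.
      sendToTwilio({ event: 'media', media: { payload: event.delta } });
      return 'forwarded';
    case 'response.function_call_arguments.done':
      runTool(event.name, JSON.parse(event.arguments), event.call_id);
      return 'tool';
    case 'input_audio_buffer.speech_started':
      return 'barge-in'; // see Section 8
    case 'error':
      log(event);
      return 'error';
    default:
      return 'ignored';
  }
}
```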
7.4 Gateway → Twilio
| Gateway Action | Twilio Event |
|---|---|
| Play audio | media (base64 μ-law) |
| Sync point | mark (with name) |
| Flush buffer | clear (for barge-in) |
8. Barge-In / Interruption Handling
8.1 Problem Statement
When the driver speaks while the agent is outputting audio, we need to:
- Stop playing agent audio immediately
- Process the driver's speech
- Generate new response
8.2 Detection
OpenAI's server_vad detects speech start:
```json
{
  "type": "input_audio_buffer.speech_started"
}
```

8.3 Handling Sequence
┌─────────┐ ┌─────────┐ ┌──────────┐
│ Twilio │ │ Gateway │ │ OpenAI │
└────┬────┘ └────┬────┘ └────┬─────┘
│ │ │
│ (Agent playing) │ audio.delta │
│◄──────────────────│◄──────────────────│
│ │ │
│ "media" (driver speaks) │
│──────────────────►│ │
│ │ input_audio...append
│ │──────────────────►│
│ │ │
│ │ speech_started │
│ │◄──────────────────│
│ │ │
│ "clear" │ (flush buffer) │
│◄──────────────────│ │
│ │ │
│ │ response.cancel │
│ │──────────────────►│ (optional)
│ │ │
│ │ (process new input)
│ │ │

8.4 Implementation
```javascript
// On receiving speech_started from OpenAI
function handleBargeIn() {
  // 1. Send clear to Twilio to flush buffered audio
  twilioWs.send(JSON.stringify({ event: 'clear', streamSid }));

  // 2. Optionally cancel the in-progress OpenAI response
  openaiWs.send(JSON.stringify({ type: 'response.cancel' }));

  // 3. Update state
  sessionState.isAgentSpeaking = false;
}
```

8.5 Twilio clear Message
```json
{
  "event": "clear",
  "streamSid": "MZ..."
}
```

This immediately stops any buffered audio from playing to the caller.
9. Dispatch Coordination Conversations
Dispatch Model
These conversations mirror how dispatchers have communicated with drivers for decades—brief, clear, actionable. The voice agent is simply a more efficient dispatcher calling over a hands-free headset.
9.1 Voice-Turn Budget (from PLCY-VOI-001)
| State | Agent Words | Agent Turns | Follow-ups | Total Duration |
|---|---|---|---|---|
| MOVING | 20 max | 1 | 0 | <10 sec |
| STOPPED | 40/turn | 3 max | 2 max | <60 sec |
| PARKED | Unlimited | Unlimited | As needed | No limit |
9.2 MOVING State: Quick Notification
Offer Notification (like a dispatcher radio squawk):
"Hop And Haul. Transport near exit 52 in 5 miles. Say 'hold' or 'pass'."Driver Responses:
| Response | Agent Action |
|---|---|
| "hold" | "Got it. Details when you stop." (end) |
| "pass" | "No problem." (end) |
| "not now" / "later" | "Queued." (end) |
| [silence 5s] | (silence, end call) |
| [unclear] | (queue for stopped, end) |
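The response table above can be sketched as a simple classifier. The keyword lists are illustrative, not the production speech grammar, and the replies mirror the table:

```javascript
// Map a MOVING-state driver utterance to one of the tabled actions.
function classifyMovingResponse(utterance) {
  const text = (utterance || '').toLowerCase();
  if (/\bhold\b/.test(text)) return { action: 'hold', reply: 'Got it. Details when you stop.' };
  if (/\bpass\b/.test(text)) return { action: 'pass', reply: 'No problem.' };
  if (/\b(not now|later)\b/.test(text)) return { action: 'queue', reply: 'Queued.' };
  if (text.trim() === '') return { action: 'silence', reply: null }; // 5s silence: end quietly
  return { action: 'unclear', reply: null }; // queue for stopped state, end
}
```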
Prohibited While MOVING (per PLCY-COM-001 Section 4.6):
- Any clarifying questions
- Any follow-up prompts
- Sales pitch or persuasion
- Urgent or pressuring language ("hurry," "last chance," "you need to decide now")
- Requests to view a screen or tap a button
- Complex routing instructions
- Guilt-based framing for declines
9.3 STOPPED State: Full Dispatch Conversation
Standard Call Script (per PLCY-COM-001 Section 4.3):
Agent: "Hey [Driver Name] — got a ride that fits your route.
Pickup near [Location] in about [X] minutes, drop-off in [Destination].
Adds [X] miles, pays [$X].
No rush — let me know in the next few minutes if you want it."
Driver: "Yes" / "Yeah" / "Sure"
Agent: "Great, I'll change your route now."
[Triggers accept_ride_offer → Samsara route update]

Accepted Response Options (per PLCY-COM-001 Section 4.4):
| Response | Result |
|---|---|
| "Yes" or equivalent affirmation | Opportunity accepted |
| "No" or equivalent decline | Opportunity released, no penalty |
| "Hold" or "Let me think" | Decision deferred, follow-up scheduled |
Follow-Up and Timeout (per PLCY-COM-001 Section 4.5):
| Timing | Action |
|---|---|
| 3-4 minutes | Soft follow-up: "Still got that [Destination] ride if you want it. Otherwise I'll pass it along." |
| 5 minutes | Auto-decline: "No problem — I've released that one. Drive safe." |
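The follow-up and timeout rows above amount to two timers armed when the offer is presented. A sketch with an injected scheduler so the timing logic is testable; the function and callback names are illustrative, and the follow-up fires at the midpoint of the 3-4 minute window:

```javascript
// Arm the soft follow-up and auto-decline timers for a STOPPED-state offer.
// `schedule(fn, ms)` is injected (e.g. setTimeout in production).
function scheduleOfferTimers(say, endCall, schedule, destination) {
  const followUp = schedule(() => {
    say(`Still got that ${destination} ride if you want it. Otherwise I'll pass it along.`);
  }, 3.5 * 60 * 1000); // within the 3-4 minute window
  const autoDecline = schedule(() => {
    say("No problem — I've released that one. Drive safe.");
    endCall();
  }, 5 * 60 * 1000);
  return { followUp, autoDecline }; // handles, so acceptance/decline can cancel
}
```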
If Driver Asks a Question:
Driver: "Where do they need to go?"
Agent: "Yeah, looks like they need to be dropped off about 5 minutes
off of exit 30. Still interested?"
Driver: "Sure, let's do it."
Agent: "Perfect, updating your route now."
[Triggers accept_ride_offer → Samsara route update]

If Driver Declines:
Driver: "Nah, not today."
Agent: "No problem. Safe travels."
[Triggers decline_ride_offer, end call]

9.4 PARKED State: Full Details Available
Agent: "Ready to confirm the pickup details?"
Driver: "Yes"
Agent: "Pickup is at the Pilot truck stop at exit 52.
Ask for John D—he's wearing a blue jacket.
Drop-off is exit 30, about 25 miles.
Route is updating on your Samsara now.
Safe travels."

9.5 Key Conversation Patterns
| Pattern | Example |
|---|---|
| Present opportunity | "There's an option to pick up someone near [location]" |
| State diversion clearly | "Total diversion will be about [X] minutes" |
| Simple decision | "Does this sound good?" / "Interested?" |
| Confirm action | "I'll change your route now" / "Updating your route" |
| Accept decline gracefully | "No problem" / "Safe travels" |
9.6 System Prompt Template
You are a Hop And Haul dispatch coordinator calling a commercial driver over
their hands-free Bluetooth headset. You act like a human dispatcher—brief,
professional, and helpful.
CURRENT DRIVER STATE: {vehicle_state}
YOUR ROLE:
- You are dispatch, not a sales agent
- Present transport opportunities clearly
- On acceptance, confirm you're updating their Samsara route
- Accept "no" gracefully, no follow-up
CONVERSATION STYLE:
- Natural, like talking to a colleague
- "Hey, there's an option..." not "Hop And Haul has identified..."
- "Does this sound good?" not "Would you like to accept this offer?"
- "I'll change your route now" not "Your route will be updated momentarily"
CORE RULES (ALL STATES):
- Never pressure or coerce
- Accept "no" immediately, no follow-up
- If driver says "speak to a person", immediately call escalate_to_human
- For safety emergencies, call escalate_to_human with reason="safety"
- On acceptance, call accept_ride_offer (this pushes route to Samsara)
{state_specific_instructions based on vehicle_state}

10. Error Handling & Resilience
10.1 WebSocket Reconnection
| Failure | Action |
|---|---|
| OpenAI WS disconnects | Attempt reconnect (3 tries, exponential backoff) |
| Twilio WS disconnects | Call ended, cleanup session |
| Gateway crash | Twilio timeout → call ends gracefully |
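The OpenAI reconnect row above (3 tries, exponential backoff) can be sketched as follows. `connect` and `wait` are injected so the retry logic is testable without a live socket; the base delay is an assumption, not a specified value.

```javascript
// Retry `connect()` up to `tries` times with exponential backoff.
// Rethrows the last error once attempts are exhausted.
async function reconnectWithBackoff(connect, wait, tries = 3, baseMs = 500) {
  for (let attempt = 0; attempt < tries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt === tries - 1) throw err;
      await wait(baseMs * 2 ** attempt); // 500ms, 1s, 2s, ...
    }
  }
}
```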
10.2 Audio Quality Degradation
| Issue | Detection | Action |
|---|---|---|
| High latency | Audio timestamp drift | Log, continue |
| Packet loss | Gaps in audio stream | Interpolate silence |
| Encoding error | Invalid base64 | Skip frame, log |
10.3 Timeout Handling
| Timeout | Duration | Action |
|---|---|---|
| Driver silence (MOVING) | 5 seconds | End call quietly |
| Driver silence (STOPPED) | 15 seconds | "I'll check back later" + end |
| OpenAI response | 10 seconds | Fallback: "One moment..." |
| Tool execution | 5 seconds | Return error to OpenAI |
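One way to enforce these budgets is to race the operation against a timer; a sketch (the fallback value is chosen by the caller per the table above):

```javascript
// Resolve with the operation's result, or with `fallback` after `ms`.
function withTimeout(promise, ms, fallback) {
  const timer = new Promise((resolve) => setTimeout(() => resolve(fallback), ms));
  return Promise.race([promise, timer]);
}
```

For the tool-execution row, for example, the gateway could wrap `executeTool(...)` in `withTimeout(p, 5000, { error: 'timeout' })` and return the error object to OpenAI.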
10.4 Graceful Degradation
| Scenario | Fallback |
|---|---|
| OpenAI unavailable | Transfer to human support |
| Hop And Haul API unavailable | "Having technical difficulties, please try again" |
| Tool execution fails | Report error to driver, offer retry or human |
11. NIST 800-53 Control Mapping
| Control | Title | Implementation |
|---|---|---|
| AC-3 | Access Enforcement | API key scoping, session isolation |
| AU-2 | Event Logging | All calls logged with full metadata |
| AU-3 | Content of Audit Records | Call ID, driver ID, timestamps, outcomes |
| SC-8 | Transmission Confidentiality | TLS for all WebSocket connections |
| SC-12 | Cryptographic Key Management | API keys in secrets manager |
| SI-4 | System Monitoring | Real-time call quality monitoring |
| IR-4 | Incident Handling | Escalation to human for safety/support |
For complete NIST mapping, see Control Mapping Matrix.
12. Security Considerations
12.1 API Key Management
| Key | Storage | Rotation |
|---|---|---|
| Twilio Account SID/Auth Token | AWS Secrets Manager | Quarterly |
| OpenAI API Key | AWS Secrets Manager | Quarterly |
| Hop And Haul API Key (internal) | Environment variable | Per deployment |
12.2 Data Handling
| Data | Handling | Retention |
|---|---|---|
| Call audio | Not stored (streaming only) | None |
| Transcripts | Optional, per consent | Per PLCY-RET-001 |
| Call metadata | Logged to audit trail | 24 months |
| Driver PII | Tokenized in logs | Per PLCY-RET-001 |
12.3 Recording Consent
See PLCY-VOI-001 Section 9 for consent requirements by state.
13. Monitoring & Observability
13.1 Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| Call success rate | Completed / Initiated | <95% |
| Average latency | End-to-end response time | >500ms |
| Barge-in rate | Interruptions / Calls | Informational |
| Escalation rate | Human transfers / Calls | >10% |
| Error rate | Failed calls / Total | >2% |
13.2 Logging
| Log Level | Content |
|---|---|
| INFO | Call start/end, state transitions |
| DEBUG | Audio frame counts, timing |
| WARN | Reconnections, degraded quality |
| ERROR | Failures, timeouts, exceptions |
14. Document Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | December 30, 2025 | Hop And Haul Team | Initial release |
| 1.1 | December 30, 2025 | Hop And Haul Team | Clarified dispatch coordination model, added one-touch headset requirement, added Samsara route update on acceptance, updated conversation examples to natural dispatch style |
| 1.2 | January 21, 2026 | Hop And Haul Team | Aligned with PLCY-COM-001 v3.0: Updated STOPPED state to use standard call script. Added accepted response options (Yes/No/Hold). Added follow-up and timeout logic (3-4 min soft follow-up, 5 min auto-decline). Expanded prohibited language list for MOVING state. |