
Voice Agent Technical Specification

Document ID: PLCY-VOI-002
Version: 1.2
Effective Date: January 21, 2026
Last Review: January 21, 2026
Owner: Hop And Haul Team


CONFIDENTIAL

This document is CONFIDENTIAL and for internal use only. Do not distribute outside the organization.

1. Purpose

This document provides the technical implementation specification for the Hop And Haul Voice Agent system—a dispatch coordination service that operates like a modern version of CB radio dispatch. The agent coordinates transport opportunities with drivers over hands-free Bluetooth, exactly as dispatchers have done for decades.

Key Concept: Hop And Haul acts on behalf of dispatch. We call drivers via their one-touch Bluetooth headset, present opportunities in simple terms, and on acceptance, push the updated route directly to Samsara.

Related Documents:

  • PLCY-VOI-001 - Voice Agent Integration Policy (behavioral requirements)
  • PLCY-COM-001 - Driver Communication Protocol (state detection, routing)
  • PLCY-VOI-003 - Voice Escalation Procedures

2. Prerequisites & Driver Equipment

2.1 One-Touch Bluetooth Headset Requirement

REQUIRED

All drivers participating in Hop And Haul voice dispatch must have a one-touch Bluetooth headset or hands-free vehicle integration. This is a company policy requirement for all participating fleets.

| Requirement | Specification |
| --- | --- |
| Headset Type | Bluetooth hands-free headset or vehicle-integrated system |
| Answer Method | Single-button press or voice-activated answer |
| No Physical Handling | Driver never touches phone during call |
| DOT Compliance | Meets 49 CFR 392.82 hands-free requirements |

Acceptable Equipment:

  • Bluetooth mono/stereo headsets with one-touch answer
  • Vehicle-integrated Bluetooth systems
  • Apple CarPlay / Android Auto with voice activation
  • Trucker-style single-ear headsets (Plantronics, BlueParrott, etc.)

| Practice | Status | Reference |
| --- | --- | --- |
| CB radio dispatch while driving | Legal, standard practice | Industry standard since 1970s |
| Bluetooth hands-free calls | Legal | 49 CFR 392.82 prohibits hand-held use only |
| Dispatch coordination calls | Legal | Standard fleet operations |
| Accepting loads via voice | Legal | Same as CB radio dispatch |

Hop And Haul is dispatch. Drivers have received dispatch calls over hands-free systems for decades. The voice agent is simply a more efficient dispatcher.


3. System Architecture

3.1 High-Level Block Diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                        Hop And Haul Voice System                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────┐                                ┌────────────────────┐ │
│  │ Driver Phone │◄──────── PSTN ───────────────►│ Twilio Voice       │ │
│  │   (PSTN)     │                                │ (FedRAMP Moderate) │ │
│  └──────────────┘                                └──────────┬─────────┘ │
│                                                              │           │
│                           ┌──────────────────────────────────┘           │
│                           │ Control: TwiML <Connect><Stream>             │
│                           │ Audio: Media Streams WebSocket               │
│                           │ Format: G.711 μ-law (audio/x-mulaw)          │
│                           ▼                                              │
│                  ┌─────────────────────────────────┐                    │
│                  │      Voice Gateway Service       │                    │
│                  │      (WebSocket Proxy)           │                    │
│                  │                                  │                    │
│                  │  • Twilio WS ↔ OpenAI WS relay  │                    │
│                  │  • Bidirectional audio stream    │                    │
│                  │  • Tool/function call execution  │                    │
│                  │  • Barge-in detection            │                    │
│                  │  • State-aware prompt injection  │                    │
│                  └──────────────┬───────────────────┘                    │
│                                 │                                        │
│             ┌───────────────────┼───────────────────────┐               │
│             │                   │                       │               │
│             ▼                   ▼                       ▼               │
│    ┌─────────────────┐ ┌──────────────────┐ ┌──────────────────────┐    │
│    │ OpenAI Realtime │ │ Hop And Haul API │ │ Samsara API          │    │
│    │ API (WebSocket) │ │                  │ │                      │    │
│    │                 │ │ • Ride ops       │ │ • Route update       │    │
│    │ • g711_ulaw     │ │ • Driver data    │ │ • ETA calc           │    │
│    │ • Server VAD    │ │ • Offer mgmt     │ │ • Vehicle loc        │    │
│    │ • Tool calls    │ │ • Audit logs     │ │ • Driver state       │    │
│    └─────────────────┘ └──────────────────┘ └──────────────────────┘    │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

3.2 Component Responsibilities

| Component | Responsibility |
| --- | --- |
| Twilio Voice | PSTN connectivity, call routing, TwiML processing |
| Twilio Media Streams | Real-time audio streaming over WebSocket |
| Voice Gateway | Audio relay, protocol translation, business logic |
| OpenAI Realtime API | Speech-to-speech AI, voice activity detection, tool calls |
| Hop And Haul API | Ride operations, driver data, offer management |
| Samsara API | Route updates on acceptance, vehicle location, driver state |
| Human Escalation | Safety/support routing via PagerDuty (see PLCY-VOI-003) |

3.3 Control Plane vs Audio Plane

| Plane | Protocol | Purpose |
| --- | --- | --- |
| Control Plane | HTTP/REST | Call initiation, TwiML delivery, webhooks |
| Audio Plane | WebSocket | Real-time bidirectional audio streaming |

3. Twilio Integration

3.1 Call Initiation (REST API)

Outbound Call:

http
POST https://api.twilio.com/2010-04-01/Accounts/{AccountSid}/Calls
Content-Type: application/x-www-form-urlencoded

To=+1XXXXXXXXXX
From=+1XXXXXXXXXX
Url=https://fleetseats.example.com/voice/twiml
StatusCallback=https://fleetseats.example.com/voice/status
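
The request above could be assembled in Node.js roughly as follows. This is a minimal sketch, not the Gateway's actual code: the helper name `buildCreateCallRequest` and the placeholder credentials are assumptions, and the function only builds the request without sending it. Twilio's REST API authenticates with HTTP Basic auth using the Account SID and Auth Token.

```javascript
// Hypothetical helper: builds (but does not send) the Twilio "create call" request.
function buildCreateCallRequest({ accountSid, authToken, to, from, baseUrl }) {
  // URLSearchParams produces the application/x-www-form-urlencoded body.
  const body = new URLSearchParams({
    To: to,
    From: from,
    Url: `${baseUrl}/voice/twiml`,
    StatusCallback: `${baseUrl}/voice/status`,
  });
  const basic = Buffer.from(`${accountSid}:${authToken}`).toString("base64");
  return {
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded",
        Authorization: `Basic ${basic}`,
      },
      body: body.toString(),
    },
  };
}

const req = buildCreateCallRequest({
  accountSid: "ACxxxxxxxx", // placeholder
  authToken: "secret",      // placeholder
  to: "+15550100",
  from: "+15550101",
  baseUrl: "https://fleetseats.example.com",
});
// Sending it would be: await fetch(req.url, req.options)
```

Keeping request construction separate from the network call makes the encoding easy to unit-test without touching Twilio.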

Inbound Call (Webhook Configuration):

  • Configure Twilio phone number to POST to /voice/twiml
  • Voice Gateway receives call metadata and returns TwiML

3.2 TwiML Configuration

Media Stream Connection:

xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Connect>
        <Stream url="wss://voice-gateway.fleetseats.example.com/media-stream">
            <Parameter name="callId" value="{CallSid}"/>
            <Parameter name="driverId" value="{driver_id}"/>
            <Parameter name="vehicleState" value="{vehicle_state}"/>
        </Stream>
    </Connect>
</Response>
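
One way the Gateway might render this TwiML per call is sketched below. The helper names `escapeXml` and `buildStreamTwiml` are hypothetical, and the parameter values are placeholders; the point is that values substituted into attributes should be XML-escaped.

```javascript
// Escape the five XML special characters so values are safe inside attributes.
// "&" is replaced first so already-inserted entities are not double-escaped.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: renders the <Connect><Stream> document shown above.
function buildStreamTwiml(streamUrl, params) {
  const paramTags = Object.entries(params)
    .map(([name, value]) =>
      `            <Parameter name="${escapeXml(name)}" value="${escapeXml(value)}"/>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "    <Connect>",
    `        <Stream url="${escapeXml(streamUrl)}">`,
    paramTags,
    "        </Stream>",
    "    </Connect>",
    "</Response>",
  ].join("\n");
}

const twiml = buildStreamTwiml(
  "wss://voice-gateway.fleetseats.example.com/media-stream",
  { callId: "CA123", driverId: "drv-42", vehicleState: "MOVING" } // placeholder values
);
```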

Key TwiML Elements:

| Element | Purpose |
| --- | --- |
| <Connect> | Establishes bidirectional media connection |
| <Stream url="wss://..."> | WebSocket endpoint for audio |
| <Parameter> | Pass context to Voice Gateway |

3.3 Media Streams WebSocket Protocol

Connection URL:

wss://voice-gateway.fleetseats.example.com/media-stream

Twilio → Gateway Events:

| Event | Payload | Description |
| --- | --- | --- |
| connected | { protocol, version } | WebSocket connected |
| start | { streamSid, callSid, mediaFormat } | Stream started, includes encoding info |
| media | { payload } | Base64 audio chunk (8000 Hz, μ-law) |
| stop | { streamSid } | Stream ended |

Gateway → Twilio Events:

| Event | Payload | Description |
| --- | --- | --- |
| media | { payload } | Base64 audio to play to caller |
| mark | { name } | Sync marker (notifies when audio plays) |
| clear | {} | Flush buffered audio (for barge-in) |
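
The payload columns above are abbreviated. On the wire, Twilio nests most fields under a key named after the event (`start`, `media`, `mark`), and the TwiML `<Parameter>` values arrive in `start.customParameters`. A minimal parsing sketch, with the hypothetical names `parseTwilioMessage` and `toTwilio`:

```javascript
// Parses one Twilio Media Streams message into a small normalized object.
function parseTwilioMessage(raw) {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "connected":
      return { kind: "connected" };
    case "start":
      return {
        kind: "start",
        streamSid: msg.start.streamSid,
        callSid: msg.start.callSid,
        encoding: msg.start.mediaFormat.encoding,
        // <Parameter> values from the TwiML (callId, driverId, vehicleState)
        params: msg.start.customParameters || {},
      };
    case "media":
      return { kind: "media", payload: msg.media.payload };
    case "stop":
      return { kind: "stop", streamSid: msg.streamSid };
    default:
      return { kind: "ignored", event: msg.event };
  }
}

// Builders for Gateway → Twilio messages.
const toTwilio = {
  media: (streamSid, payload) =>
    JSON.stringify({ event: "media", streamSid, media: { payload } }),
  mark: (streamSid, name) =>
    JSON.stringify({ event: "mark", streamSid, mark: { name } }),
  clear: (streamSid) => JSON.stringify({ event: "clear", streamSid }),
};
```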

3.4 Audio Format

| Parameter | Value |
| --- | --- |
| Encoding | audio/x-mulaw (G.711 μ-law) |
| Sample Rate | 8000 Hz |
| Channels | Mono |
| Bit Depth | 8-bit |
| Chunk Size | ~20ms frames |
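
As a quick sanity check on these parameters, the arithmetic below works out the size of one frame. It assumes exactly 20 ms frames; the helper name `frameBytes` is illustrative only.

```javascript
// Bytes in one audio frame: samples per frame × bytes per sample × channels.
function frameBytes(sampleRateHz, bitDepth, channels, frameMs) {
  return (sampleRateHz * frameMs / 1000) * (bitDepth / 8) * channels;
}

// 8000 Hz × 20 ms = 160 samples, 1 byte each, mono
const bytesPerFrame = frameBytes(8000, 8, 1, 20);                           // 160
// Base64 expands every 3 bytes to 4 characters (with padding)
const base64Chars = Buffer.alloc(bytesPerFrame).toString("base64").length;  // 216
// Raw audio bitrate in one direction
const bitrateKbps = 8000 * 8 * 1 / 1000;                                    // 64
```

So each `media` message carries about 216 base64 characters of audio, and the Gateway handles roughly 50 such messages per second in each direction.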

4. OpenAI Realtime API Integration

4.1 WebSocket Connection

Connection URL:

wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17

Headers:

Authorization: Bearer {OPENAI_API_KEY}
OpenAI-Beta: realtime=v1

4.2 Session Configuration

Initial Session Update:

json
{
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "instructions": "You are a Hop And Haul dispatch assistant...",
    "voice": "alloy",
    "input_audio_format": "g711_ulaw",
    "output_audio_format": "g711_ulaw",
    "input_audio_transcription": {
      "model": "whisper-1"
    },
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "tools": [
      {
        "type": "function",
        "name": "accept_ride_offer",
        "description": "Accept the transport offer and push route to Samsara",
        "parameters": { "type": "object", "properties": {} }
      },
      {
        "type": "function",
        "name": "decline_ride_offer",
        "description": "Decline the transport offer",
        "parameters": { "type": "object", "properties": {} }
      },
      {
        "type": "function",
        "name": "get_offer_details",
        "description": "Get more details about the transport (pickup location, drop-off, etc.)",
        "parameters": { "type": "object", "properties": {} }
      },
      {
        "type": "function",
        "name": "escalate_to_human",
        "description": "Transfer to human support",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": { "type": "string", "enum": ["safety", "support", "other"] }
          }
        }
      }
    ]
  }
}

4.3 Audio Format Compatibility

| Format | Twilio | OpenAI | Notes |
| --- | --- | --- | --- |
| g711_ulaw | ✓ (native) | ✓ (supported) | No transcoding required |
| g711_alaw | | | Alternative, same quality |
| pcm16 | Requires conversion | | Higher quality, more bandwidth |

Key Benefit: Using g711_ulaw on both sides eliminates transcoding latency.

4.4 Input Buffer Management

Sending Audio to OpenAI:

json
{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded-g711-ulaw>"
}

Buffer Commit (Manual VAD Mode):

json
{
  "type": "input_audio_buffer.commit"
}

Note: With server_vad turn detection, OpenAI detects the end of speech and commits the buffer automatically, so the Gateway does not need to send input_audio_buffer.commit.

4.5 Response Handling

Audio Response Events:

json
{
  "type": "response.audio.delta",
  "delta": "<base64-encoded-g711-ulaw>"
}

Response Complete:

json
{
  "type": "response.audio.done"
}
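
Because both sides carry base64-encoded G.711 μ-law, the relay between the two message shapes can pass the payload through unchanged. A minimal sketch, with hypothetical function names:

```javascript
// Twilio "media" message → OpenAI input_audio_buffer.append event.
// The base64 payload is forwarded as-is; no decode or transcode step.
function twilioMediaToOpenAI(twilioMsg) {
  return { type: "input_audio_buffer.append", audio: twilioMsg.media.payload };
}

// OpenAI response.audio.delta event → Twilio "media" message for this stream.
function openAIDeltaToTwilio(openaiEvent, streamSid) {
  return { event: "media", streamSid, media: { payload: openaiEvent.delta } };
}
```

Each returned object would be serialized with `JSON.stringify` before being written to the corresponding WebSocket.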

Tool/Function Calls:

json
{
  "type": "response.function_call_arguments.done",
  "call_id": "call_xxx",
  "name": "accept_ride_offer",
  "arguments": "{}"
}

5. Voice Gateway Service

5.1 Architecture

┌──────────────────────────────────────────────────────────────┐
│                    Voice Gateway Service                      │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────────┐              ┌─────────────────┐        │
│  │ Twilio WS       │              │ OpenAI WS       │        │
│  │ Handler         │◄────────────►│ Handler         │        │
│  │                 │   Audio      │                 │        │
│  │ • Receive media │   Relay      │ • Send audio    │        │
│  │ • Send audio    │              │ • Receive delta │        │
│  │ • Handle marks  │              │ • Handle tools  │        │
│  └────────┬────────┘              └────────┬────────┘        │
│           │                                │                  │
│           └───────────────┬────────────────┘                  │
│                           │                                   │
│                   ┌───────▼───────┐                          │
│                   │ Session       │                          │
│                   │ Manager       │                          │
│                   │               │                          │
│                   │ • Call state  │                          │
│                   │ • Driver ctx  │                          │
│                   │ • Offer data  │                          │
│                   │ • Barge-in    │                          │
│                   └───────┬───────┘                          │
│                           │                                   │
│           ┌───────────────┼───────────────┐                  │
│           ▼               ▼               ▼                  │
│   ┌───────────────┐ ┌──────────────┐ ┌───────────────┐       │
│   │ Tool Executor │ │ Hop And Haul │ │ Escalation    │       │
│   │               │ │ API Client   │ │ Handler       │       │
│   │ • accept_ride │ │              │ │               │       │
│   │ • decline_ride│ │ • GET/POST   │ │ • PagerDuty   │       │
│   │ • get_details │ │ • Auth       │ │ • Call xfer   │       │
│   └───────────────┘ └──────────────┘ └───────────────┘       │
│                                                               │
└──────────────────────────────────────────────────────────────┘

5.2 Connection Lifecycle

1. Twilio WS connects (on call answer)
2. Gateway extracts call metadata from "start" event
3. Gateway opens OpenAI WS connection
4. Gateway sends session.update with instructions + tools
5. Gateway injects initial prompt based on vehicle state
6. Audio relay loop begins
7. Tool calls executed as received
8. On call end: close both WebSockets, log session

5.3 Tool Execution

| Tool | Gateway Action |
| --- | --- |
| accept_ride_offer | 1. POST to Hop And Haul API /rides/{id}/accept; 2. Push route to Samsara via PATCH /fleet/routes/{routeId} |
| decline_ride_offer | POST to Hop And Haul API /rides/{id}/decline |
| get_offer_details | GET from Hop And Haul API /rides/{id}, format response |
| escalate_to_human | Trigger PagerDuty alert, transfer call via Twilio |
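
The mapping above might be implemented as a small dispatch table. This is only a sketch: the `deps` object (API clients, `buildStops`, `escalate`) and the `ctx` fields are hypothetical and injected so the example runs without network access; none of these method names come from the spec.

```javascript
// Hypothetical tool dispatcher keyed by the tool names from the session config.
const toolHandlers = {
  accept_ride_offer: async ({ rideId, routeId }, deps) => {
    // Order matters: mark the ride accepted first, then push the route.
    await deps.api.post(`/rides/${rideId}/accept`);
    await deps.samsara.patch(`/fleet/routes/${routeId}`, deps.buildStops(rideId));
    return { status: "accepted" };
  },
  decline_ride_offer: async ({ rideId }, deps) => {
    await deps.api.post(`/rides/${rideId}/decline`);
    return { status: "declined" };
  },
  get_offer_details: async ({ rideId }, deps) => deps.api.get(`/rides/${rideId}`),
  escalate_to_human: async (_ctx, deps, args) => {
    await deps.escalate(args.reason || "other");
    return { status: "escalated" };
  },
};

// `event` is a response.function_call_arguments.done event; `ctx` is per-call state.
async function executeTool(event, ctx, deps) {
  if (!Object.hasOwn(toolHandlers, event.name)) {
    return { status: "error", message: `unknown tool ${event.name}` };
  }
  try {
    const args = JSON.parse(event.arguments || "{}");
    return await toolHandlers[event.name](ctx, deps, args);
  } catch (err) {
    // Failures are returned as data so they can be reported back to the model.
    return { status: "error", message: String(err.message || err) };
  }
}
```

Returning errors as values rather than throwing keeps the audio relay loop running when a downstream API fails.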

5.4 Samsara Route Update (On Acceptance)

When the driver says "yes" and accept_ride_offer is called:

1. Hop And Haul API marks ride as accepted
2. Gateway calls Samsara API to update driver's route:
   - Add pickup waypoint
   - Add drop-off waypoint
   - Recalculate ETA
3. Driver's Samsara device shows updated route
4. Agent confirms: "I'll change your route now"

Samsara API Call:

http
PATCH https://api.samsara.com/fleet/routes/{routeId}
Authorization: Bearer {SAMSARA_API_TOKEN}
Content-Type: application/json

{
  "stops": [
    {
      "address": { "formattedAddress": "Exit 52, I-40, TN" },
      "scheduledArrivalTime": "2025-12-30T14:30:00Z",
      "notes": "Hop And Haul pickup - John D."
    },
    {
      "address": { "formattedAddress": "Exit 30 Drop-off Location" },
      "scheduledArrivalTime": "2025-12-30T15:00:00Z",
      "notes": "Hop And Haul drop-off"
    }
  ]
}

Tool Response Format:

json
{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_xxx",
    "output": "{\"status\": \"accepted\", \"message\": \"Ride confirmed.\"}"
  }
}
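
Note that `output` is a JSON string, not a nested object. In the Realtime API, adding a function_call_output item does not by itself start a new model turn, so the usual pattern is to follow it with a `response.create` event. A sketch of building both events (the helper name is hypothetical):

```javascript
// Builds the two events sent to OpenAI after a tool finishes.
function buildToolResultEvents(callId, result) {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: callId,
        output: JSON.stringify(result), // serialized, hence the escaped quotes above
      },
    },
    // Ask the model to produce a spoken reply that reflects the tool result.
    { type: "response.create" },
  ];
}

const events = buildToolResultEvents("call_xxx", {
  status: "accepted",
  message: "Ride confirmed.",
});
```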

5.5 State-Aware Prompt Injection

Based on vehicle state (from Samsara/ELD), inject appropriate system instructions:

| State | Prompt Mode | Constraints |
| --- | --- | --- |
| MOVING | Radio Squawk | Max 20 words, 1 turn, no questions |
| STOPPED | Dispatch Desk | Max 40 words/turn, 3 turns, simple questions |
| PARKED | Full Workflow | Unlimited turns, full conversation |

Example Prompt Injection (MOVING):

CRITICAL: Driver is MOVING. You MUST:
- Use max 20 words total
- Make ONE statement only
- End with "say hold or pass"
- Do NOT ask clarifying questions
- If unclear response, say "queued" and end
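
Selecting the injected block could look like the sketch below. The function name `stateInstructions`, the abbreviated instruction wording, and the fallback to MOVING for unrecognized states are assumptions; falling back to the most restrictive mode seems the safer default when the vehicle state is unknown.

```javascript
// Constraints per vehicle state, copied from the table above.
const STATE_MODES = {
  MOVING:  { mode: "Radio Squawk",  maxWords: 20, maxTurns: 1 },
  STOPPED: { mode: "Dispatch Desk", maxWords: 40, maxTurns: 3 },
  PARKED:  { mode: "Full Workflow", maxWords: Infinity, maxTurns: Infinity },
};

// Hypothetical helper: returns the state-specific block for the system prompt.
function stateInstructions(vehicleState) {
  // Unknown or missing state: treat the driver as MOVING (assumption).
  const key = Object.hasOwn(STATE_MODES, vehicleState) ? vehicleState : "MOVING";
  const { mode, maxWords, maxTurns } = STATE_MODES[key];
  if (key === "PARKED") {
    return `Driver is PARKED (${mode}). Full conversation is allowed.`;
  }
  return [
    `CRITICAL: Driver is ${key} (${mode}). You MUST:`,
    `- Use max ${maxWords} words per turn`,
    `- Use at most ${maxTurns} turn(s)`,
  ].join("\n");
}
```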

6. Call Flow Sequences

6.1 Outbound Call (Offer Notification)

┌─────────────┐    ┌─────────┐    ┌─────────┐    ┌──────────┐    ┌────────┐
│ Hop And Haul│    │ Twilio  │    │ Gateway │    │ OpenAI   │    │ Driver │
│    API      │    │  Voice  │    │         │    │ Realtime │    │ Phone  │
└──────┬──────┘    └────┬────┘    └────┬────┘    └────┬─────┘    └───┬────┘
       │                │              │              │              │
       │ 1. Create Call │              │              │              │
       │───────────────►│              │              │              │
       │                │2. POST /twiml│              │              │
       │                │─────────────►│              │              │
       │                │ 3. TwiML     │              │              │
       │                │◄─────────────│              │              │
       │                │              │              │              │
       │                │ 4. Ring      │              │              │
       │                │──────────────┼──────────────┼─────────────►│
       │                │              │              │              │
       │                │              │              │   5. Answer  │
       │                │◄─────────────┼──────────────┼──────────────│
       │                │              │              │              │
       │                │ 6. WS Open   │              │              │
       │                │─────────────►│              │              │
       │                │ 7. "start"   │              │              │
       │                │─────────────►│              │              │
       │                │              │ 8. WS Connect│              │
       │                │              │─────────────►│              │
       │                │              │ 9. session.update           │
       │                │              │─────────────►│              │
       │                │              │              │              │
       │                │              │ 10. Initial prompt          │
       │                │              │─────────────►│              │
       │                │              │              │              │
       │                │              │ 11. audio.delta              │
       │                │              │◄─────────────│              │
       │                │ 12. "media"  │              │              │
       │                │◄─────────────│              │              │
       │                │              │              │   13. Audio  │
       │                │──────────────┼──────────────┼─────────────►│
       │                │              │              │              │

6.2 Audio Loop (Steady State)

┌─────────┐         ┌─────────┐         ┌──────────┐
│ Twilio  │         │ Gateway │         │ OpenAI   │
│  Voice  │         │         │         │ Realtime │
└────┬────┘         └────┬────┘         └────┬─────┘
     │                   │                   │
     │ "media" (driver)  │                   │
     │──────────────────►│                   │
     │                   │ input_audio_buffer.append
     │                   │──────────────────►│
     │                   │                   │
     │                   │  (VAD detects end)│
     │                   │                   │
     │                   │ response.audio.delta
     │                   │◄──────────────────│
     │ "media" (agent)   │                   │
     │◄──────────────────│                   │
     │                   │                   │
     │    [repeat]       │                   │
     ▼                   ▼                   ▼

6.3 Call Termination

1. Driver hangs up OR agent ends call
2. Twilio sends "stop" event to Gateway
3. Gateway closes OpenAI WebSocket
4. Gateway logs call session to Hop And Haul API
5. Twilio WebSocket closes

7. Event Mapping Cheat-Sheet

7.1 Twilio → Gateway

| Twilio Event | Gateway Action |
| --- | --- |
| connected | Initialize session state |
| start | Extract metadata, open OpenAI WS, send session.update |
| media | Forward base64 payload as-is via input_audio_buffer.append |
| stop | Close OpenAI WS, log session, cleanup |

7.2 Gateway → OpenAI

| Gateway Action | OpenAI Event |
| --- | --- |
| Connect | WebSocket handshake |
| Configure | session.update |
| Send audio | input_audio_buffer.append |
| Commit buffer | input_audio_buffer.commit (manual VAD) |
| Tool response | conversation.item.create (function_call_output) |

7.3 OpenAI → Gateway

| OpenAI Event | Gateway Action |
| --- | --- |
| session.created | Log, ready for audio |
| response.audio.delta | Forward to Twilio as media |
| response.audio.done | Mark audio complete |
| response.function_call_arguments.done | Execute tool, return result |
| input_audio_buffer.speech_started | (Optional) Prepare for barge-in |
| input_audio_buffer.speech_stopped | (VAD) Audio committed |
| error | Log, potentially end call |

7.4 Gateway → Twilio

| Gateway Action | Twilio Event |
| --- | --- |
| Play audio | media (base64 μ-law) |
| Sync point | mark (with name) |
| Flush buffer | clear (for barge-in) |

8. Barge-In / Interruption Handling

8.1 Problem Statement

When the driver speaks while the agent is outputting audio, we need to:

  1. Stop playing agent audio immediately
  2. Process the driver's speech
  3. Generate new response

8.2 Detection

OpenAI's server_vad detects speech start:

json
{
  "type": "input_audio_buffer.speech_started"
}

8.3 Handling Sequence

┌─────────┐         ┌─────────┐         ┌──────────┐
│ Twilio  │         │ Gateway │         │ OpenAI   │
└────┬────┘         └────┬────┘         └────┬─────┘
     │                   │                   │
     │  (Agent playing)  │ audio.delta       │
     │◄──────────────────│◄──────────────────│
     │                   │                   │
     │ "media" (driver speaks)               │
     │──────────────────►│                   │
     │                   │ input_audio...append
     │                   │──────────────────►│
     │                   │                   │
     │                   │ speech_started    │
     │                   │◄──────────────────│
     │                   │                   │
     │ "clear"           │ (flush buffer)    │
     │◄──────────────────│                   │
     │                   │                   │
     │                   │ response.cancel   │
     │                   │──────────────────►│ (optional)
     │                   │                   │
     │                   │ (process new input)
     │                   │                   │

8.4 Implementation

javascript
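// Called when OpenAI sends input_audio_buffer.speech_started.
// twilioWs and openaiWs are this call's open WebSockets; sessionState holds
// the Twilio streamSid and the isAgentSpeaking flag.
function handleBargeIn(twilioWs, openaiWs, sessionState) {
  // 1. Send clear to Twilio to flush buffered audio
  twilioWs.send(JSON.stringify({ event: 'clear', streamSid: sessionState.streamSid }));

  // 2. Optionally cancel the in-progress OpenAI response so it stops generating audio
  openaiWs.send(JSON.stringify({ type: 'response.cancel' }));

  // 3. Update state
  sessionState.isAgentSpeaking = false;
}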

8.5 Twilio clear Message

json
{
  "event": "clear",
  "streamSid": "MZ..."
}

This immediately stops any buffered audio from playing to the caller.


9. Dispatch Coordination Conversations

Dispatch Model

These conversations mirror how dispatchers have communicated with drivers for decades—brief, clear, actionable. The voice agent is simply a more efficient dispatcher calling over a hands-free headset.

9.1 Voice-Turn Budget (from PLCY-VOI-001)

| State | Agent Words | Agent Turns | Follow-ups | Total Duration |
| --- | --- | --- | --- | --- |
| MOVING | 20 max | 1 | 0 | <10 sec |
| STOPPED | 40/turn | 3 max | 2 max | <60 sec |
| PARKED | Unlimited | Unlimited | As needed | No limit |
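
A simple guard for the word and turn limits might look like the sketch below. It counts whitespace-separated tokens, which is an assumption about how "words" are measured, and it does not enforce the follow-up or duration columns.

```javascript
// Word and turn limits from the table above.
const VOICE_BUDGET = {
  MOVING:  { maxWords: 20, maxTurns: 1 },
  STOPPED: { maxWords: 40, maxTurns: 3 },
  PARKED:  { maxWords: Infinity, maxTurns: Infinity },
};

// Counts whitespace-separated tokens; punctuation stays attached to its word.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// turnsSoFar is the number of agent turns already spoken on this call.
function checkBudget(state, utterance, turnsSoFar) {
  const budget = VOICE_BUDGET[state];
  const words = countWords(utterance);
  return { words, ok: words <= budget.maxWords && turnsSoFar < budget.maxTurns };
}

const squawk = "Hop And Haul. Transport near exit 52 in 5 miles. Say 'hold' or 'pass'.";
const result = checkBudget("MOVING", squawk, 0); // { words: 14, ok: true }
```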

9.2 MOVING State: Quick Notification

Offer Notification (like a dispatcher radio squawk):

"Hop And Haul. Transport near exit 52 in 5 miles. Say 'hold' or 'pass'."

Driver Responses:

| Response | Agent Action |
| --- | --- |
| "hold" | "Got it. Details when you stop." (end) |
| "pass" | "No problem." (end) |
| "not now" / "later" | "Queued." (end) |
| [silence 5s] | (silence, end call) |
| [unclear] | (queue for stopped, end) |
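
The table above maps onto a small classifier. The sketch below uses plain keyword matching on the transcript as a deliberately simple stand-in for whatever intent detection is actually used; the function name and the `action` labels are hypothetical.

```javascript
// Maps a transcribed driver reply (MOVING state) to the agent action above.
// An empty transcript represents the 5-second silence case.
function classifyMovingReply(transcript) {
  const text = (transcript || "").toLowerCase().trim();
  if (text === "") return { action: "end_silent", say: null };
  if (/\bhold\b/.test(text)) return { action: "hold", say: "Got it. Details when you stop." };
  if (/\bpass\b/.test(text)) return { action: "pass", say: "No problem." };
  if (/\b(not now|later)\b/.test(text)) return { action: "queue", say: "Queued." };
  // Unclear reply: queue the offer for the STOPPED state and end the call.
  return { action: "queue", say: null };
}
```

Every branch ends the call, which keeps the MOVING interaction to a single agent turn.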

Prohibited While MOVING (per PLCY-COM-001 Section 4.6):

  • Any clarifying questions
  • Any follow-up prompts
  • Sales pitch or persuasion
  • Urgent or pressuring language ("hurry," "last chance," "you need to decide now")
  • Requests to view a screen or tap a button
  • Complex routing instructions
  • Guilt-based framing for declines

9.3 STOPPED State: Full Dispatch Conversation

Standard Call Script (per PLCY-COM-001 Section 4.3):

Agent: "Hey [Driver Name] — got a ride that fits your route.
        Pickup near [Location] in about [X] minutes, drop-off in [Destination].
        Adds [X] miles, pays [$X].
        No rush — let me know in the next few minutes if you want it."

Driver: "Yes" / "Yeah" / "Sure"

Agent: "Great, I'll change your route now."
        [Triggers accept_ride_offer → Samsara route update]

Accepted Response Options (per PLCY-COM-001 Section 4.4):

| Response | Result |
| --- | --- |
| "Yes" or equivalent affirmation | Opportunity accepted |
| "No" or equivalent decline | Opportunity released, no penalty |
| "Hold" or "Let me think" | Decision deferred, follow-up scheduled |

Follow-Up and Timeout (per PLCY-COM-001 Section 4.5):

| Timing | Action |
| --- | --- |
| 3-4 minutes | Soft follow-up: "Still got that [Destination] ride if you want it. Otherwise I'll pass it along." |
| 5 minutes | Auto-decline: "No problem, I've released that one. Drive safe." |
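
The timing rule can be expressed as a pure function of elapsed time since the offer was presented. This sketch assumes the soft follow-up fires once, at the 3-minute mark; the function name and return labels are hypothetical.

```javascript
const MINUTE_MS = 60 * 1000;

// Decides what to do with a deferred offer, given time since it was presented.
function followUpAction(elapsedMs, followUpSent) {
  if (elapsedMs >= 5 * MINUTE_MS) return "auto_decline";
  if (elapsedMs >= 3 * MINUTE_MS && !followUpSent) return "soft_follow_up";
  return "wait";
}

followUpAction(2 * MINUTE_MS, false);   // "wait"
followUpAction(3.5 * MINUTE_MS, false); // "soft_follow_up"
followUpAction(3.5 * MINUTE_MS, true);  // "wait"
followUpAction(5 * MINUTE_MS, true);    // "auto_decline"
```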

If Driver Asks a Question:

Driver: "Where do they need to go?"

Agent: "Yeah, looks like they need to be dropped off about 5 minutes
        off of exit 30. Still interested?"

Driver: "Sure, let's do it."

Agent: "Perfect, updating your route now."
        [Triggers accept_ride_offer → Samsara route update]

If Driver Declines:

Driver: "Nah, not today."

Agent: "No problem. Safe travels."
        [Triggers decline_ride_offer, end call]

9.4 PARKED State: Full Details Available

Agent: "Ready to confirm the pickup details?"

Driver: "Yes"

Agent: "Pickup is at the Pilot truck stop at exit 52.
        Ask for John D—he's wearing a blue jacket.
        Drop-off is exit 30, about 25 miles.
        Route is updating on your Samsara now.
        Safe travels."

9.5 Key Conversation Patterns

| Pattern | Example |
| --- | --- |
| Present opportunity | "There's an option to pick up someone near [location]" |
| State diversion clearly | "Total diversion will be about [X] minutes" |
| Simple decision | "Does this sound good?" / "Interested?" |
| Confirm action | "I'll change your route now" / "Updating your route" |
| Accept decline gracefully | "No problem" / "Safe travels" |

9.6 System Prompt Template

You are a Hop And Haul dispatch coordinator calling a commercial driver over
their hands-free Bluetooth headset. You act like a human dispatcher—brief,
professional, and helpful.

CURRENT DRIVER STATE: {vehicle_state}

YOUR ROLE:
- You are dispatch, not a sales agent
- Present transport opportunities clearly
- On acceptance, confirm you're updating their Samsara route
- Accept "no" gracefully, no follow-up

CONVERSATION STYLE:
- Natural, like talking to a colleague
- "Hey, there's an option..." not "Hop And Haul has identified..."
- "Does this sound good?" not "Would you like to accept this offer?"
- "I'll change your route now" not "Your route will be updated momentarily"

CORE RULES (ALL STATES):
- Never pressure or coerce
- Accept "no" immediately, no follow-up
- If driver says "speak to a person", immediately call escalate_to_human
- For safety emergencies, call escalate_to_human with reason="safety"
- On acceptance, call accept_ride_offer (this pushes route to Samsara)

{state_specific_instructions based on vehicle_state}

10. Error Handling & Resilience

10.1 WebSocket Reconnection

| Failure | Action |
| --- | --- |
| OpenAI WS disconnects | Attempt reconnect (3 tries, exponential backoff) |
| Twilio WS disconnects | Call ended, cleanup session |
| Gateway crash | Twilio timeout → call ends gracefully |
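
A reconnect helper for the OpenAI WebSocket might follow the shape below. The 250 ms base delay is an assumption (the table specifies only 3 tries with exponential backoff), and `connect` stands for any async function that opens the connection.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the retry that follows attempt n (1-based): base × 2^(n-1).
function backoffDelayMs(attempt, baseMs = 250) {
  return baseMs * 2 ** (attempt - 1);
}

// Calls connect() up to maxAttempts times, sleeping between failed attempts.
async function reconnectWithBackoff(
  connect,
  { maxAttempts = 3, baseMs = 250, sleep = defaultSleep } = {}
) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; the error is rethrown instead.
      if (attempt < maxAttempts) await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}
```

Given the short calls this system places, the total retry window should stay well under the driver-silence timeouts, or the caller hears dead air.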

10.2 Audio Quality Degradation

| Issue | Detection | Action |
| --- | --- | --- |
| High latency | Audio timestamp drift | Log, continue |
| Packet loss | Gaps in audio stream | Interpolate silence |
| Encoding error | Invalid base64 | Skip frame, log |

10.3 Timeout Handling

| Timeout | Duration | Action |
| --- | --- | --- |
| Driver silence (MOVING) | 5 seconds | End call silently |
| Driver silence (STOPPED) | 15 seconds | "I'll check back later" + end |
| OpenAI response | 10 seconds | Fallback: "One moment..." |
| Tool execution | 5 seconds | Return error to OpenAI |
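
One common way to apply such limits is to race the operation against a timer. The sketch below is illustrative: `withTimeout` and the `TIMEOUTS_MS` key names are hypothetical, and the durations are taken from the table above.

```javascript
// Durations from the table above, in milliseconds.
const TIMEOUTS_MS = {
  silenceMoving: 5000,
  silenceStopped: 15000,
  openaiResponse: 10000,
  toolExecution: 5000,
};

// Resolves with the promise's value, or with fallbackValue if ms elapses first.
function withTimeout(promise, ms, fallbackValue) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallbackValue), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Example: a tool call that never settles is reported to OpenAI as an error.
// const result = await withTimeout(runTool(), TIMEOUTS_MS.toolExecution,
//                                  { status: "error", message: "timeout" });
```

Note that the underlying operation is not cancelled when the timer wins; the Gateway would still need to ignore or undo a late result.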

10.4 Graceful Degradation

| Scenario | Fallback |
| --- | --- |
| OpenAI unavailable | Transfer to human support |
| Hop And Haul API unavailable | "Having technical difficulties, please try again" |
| Tool execution fails | Report error to driver, offer retry or human |

11. NIST 800-53 Control Mapping

| Control | Title | Implementation |
| --- | --- | --- |
| AC-3 | Access Enforcement | API key scoping, session isolation |
| AU-2 | Event Logging | All calls logged with full metadata |
| AU-3 | Content of Audit Records | Call ID, driver ID, timestamps, outcomes |
| SC-8 | Transmission Confidentiality | TLS for all WebSocket connections |
| SC-12 | Cryptographic Key Management | API keys in secrets manager |
| SI-4 | System Monitoring | Real-time call quality monitoring |
| IR-4 | Incident Handling | Escalation to human for safety/support |

For complete NIST mapping, see Control Mapping Matrix.


12. Security Considerations

12.1 API Key Management

| Key | Storage | Rotation |
| --- | --- | --- |
| Twilio Account SID/Auth Token | AWS Secrets Manager | Quarterly |
| OpenAI API Key | AWS Secrets Manager | Quarterly |
| Hop And Haul API Key (internal) | Environment variable | Per deployment |

12.2 Data Handling

| Data | Handling | Retention |
| --- | --- | --- |
| Call audio | Not stored (streaming only) | None |
| Transcripts | Optional, per consent | Per PLCY-RET-001 |
| Call metadata | Logged to audit trail | 24 months |
| Driver PII | Tokenized in logs | Per PLCY-RET-001 |

See PLCY-VOI-001 Section 9 for consent requirements by state.


13. Monitoring & Observability

13.1 Metrics

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| Call success rate | Completed / Initiated | <95% |
| Average latency | End-to-end response time | >500ms |
| Barge-in rate | Interruptions / Calls | Informational |
| Escalation rate | Human transfers / Calls | >10% |
| Error rate | Failed calls / Total | >2% |
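
The thresholds above can be evaluated from raw counters as sketched below. The counter names are hypothetical, and using initiated calls as the denominator for the escalation and error rates is an assumption; barge-in rate is informational and has no rule.

```javascript
// Alert conditions from the table above (rates as fractions, latency in ms).
const ALERT_RULES = {
  callSuccessRate: (v) => v < 0.95,
  averageLatencyMs: (v) => v > 500,
  escalationRate: (v) => v > 0.10,
  errorRate: (v) => v > 0.02,
};

// Derives the metrics from raw counters for one reporting window.
function computeMetrics({ initiated, completed, escalated, failed, latenciesMs }) {
  const total = latenciesMs.reduce((sum, ms) => sum + ms, 0);
  return {
    callSuccessRate: initiated ? completed / initiated : 1,
    averageLatencyMs: latenciesMs.length ? total / latenciesMs.length : 0,
    escalationRate: initiated ? escalated / initiated : 0,
    errorRate: initiated ? failed / initiated : 0,
  };
}

// Returns the names of all metrics currently past their threshold.
function firingAlerts(metrics) {
  return Object.keys(ALERT_RULES).filter((name) => ALERT_RULES[name](metrics[name]));
}

const alerts = firingAlerts(computeMetrics({
  initiated: 100, completed: 93, escalated: 4, failed: 3, latenciesMs: [400, 450],
})); // ["callSuccessRate", "errorRate"]
```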

13.2 Logging

| Log Level | Content |
| --- | --- |
| INFO | Call start/end, state transitions |
| DEBUG | Audio frame counts, timing |
| WARN | Reconnections, degraded quality |
| ERROR | Failures, timeouts, exceptions |

14. Document Control

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | December 30, 2025 | Hop And Haul Team | Initial release |
| 1.1 | December 30, 2025 | Hop And Haul Team | Clarified dispatch coordination model, added one-touch headset requirement, added Samsara route update on acceptance, updated conversation examples to natural dispatch style |
| 1.2 | January 21, 2026 | Hop And Haul Team | Aligned with PLCY-COM-001 v3.0: Updated STOPPED state to use standard call script. Added accepted response options (Yes/No/Hold). Added follow-up and timeout logic (3-4 min soft follow-up, 5 min auto-decline). Expanded prohibited language list for MOVING state. |
