Voice & Video Guide
Randal's voice support is optional; a normal text-only setup does not need any of this. In this branch, PSTN voice through Twilio is the primary path to a first working setup. Browser/admin voice is supported, but it is secondary.
When enabled, the @randal/voice package wires LiveKit rooms, Twilio SIP
trunks, and STT/TTS providers into the runner loop so Randal can listen, think,
and speak in real time.
Minimum Viable PSTN Voice On Railway
Use this checklist if your goal is: incoming or outgoing phone calls reach Randal, Randal answers through Twilio, and the call runs through LiveKit + Deepgram + ElevenLabs.
- Create or choose a Railway service using `randal.config.railway.yaml`.
- Set the required GitHub Actions repository secrets so the Railway deploy workflow can upsert them into Railway.
- Configure Twilio in a dedicated subaccount.
- Point `RANDAL_VOICE_PUBLIC_URL` at the public HTTPS/WSS host that Twilio can reach for Randal's `/voice/*` routes.
- Deploy.
- Test inbound or outbound PSTN calling.
If you are only testing browser/admin voice, skip to Browser-only testing (secondary path) below.
What To Add Where
1. GitHub Actions repository secrets
If you use .github/workflows/railway-deploy.yml, these are the values that get
copied into Railway. Your local .env does not get copied into Railway by that
workflow.
Required for PSTN voice on Railway:
- `RAILWAY_TOKEN`
- `RAILWAY_WORKSPACE_ID`
- One provider secret: `OPENROUTER_API_KEY` or `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`
- `MEILI_MASTER_KEY`
- `RANDAL_API_TOKEN`
- `RANDAL_VOICE_PUBLIC_URL`
- `LIVEKIT_URL`
- `LIVEKIT_API_KEY`
- `LIVEKIT_API_SECRET`
- `DEEPGRAM_API_KEY`
- `ELEVENLABS_API_KEY`
- `ELEVENLABS_VOICE_ID` (recommended)
- `TWILIO_ACCOUNT_SID`
- `TWILIO_AUTH_TOKEN`
- `TWILIO_PHONE_NUMBER`
Optional depending on your deployment:
- `GH_TOKEN`
- `TAVILY_API_KEY`
- `DISCORD_BOT_TOKEN`
- `FAL_KEY`
2. Railway config in repo
Keep the checked-in randal.config.railway.yaml as the service config.
It already includes:
- `gateway.channels: [http, voice, discord]`
- the `voice:` block for LiveKit, Twilio, Deepgram, and ElevenLabs
- credential allowlists/inheritance for the voice env vars
3. Local .env
Use local .env only for local testing. It is not the source of truth for the
Railway deploy workflow.
4. Twilio account setup
Use a dedicated Twilio subaccount for this integration so voice numbers, billing, and webhook changes stay isolated from anything else in your main Twilio account.
The current code uses Twilio account credentials directly:
- `TWILIO_ACCOUNT_SID`
- `TWILIO_AUTH_TOKEN`
- `TWILIO_PHONE_NUMBER`
It does not currently use Twilio API keys for the PSTN runtime path.
Before you start
Choose the parts you actually need:
| Use case | Required services/accounts |
|---|---|
| Browser voice in the dashboard or your own UI | LiveKit + one STT provider + one TTS provider |
| Outbound/inbound phone calls | LiveKit + one STT provider + one TTS provider + Twilio |
| Video meeting participation | Same as voice, plus the meeting platform's SIP/dial-in support |
Accounts and services for the common voice path in this repo:
- LiveKit Cloud account or your own LiveKit server
- Deepgram account for STT
- ElevenLabs account for TTS
- Twilio account only if you want PSTN phone calls
- A public HTTPS/WSS URL that reaches the Randal gateway when voice traffic comes from outside your machine
Required environment variables:
```bash
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxx
LIVEKIT_API_SECRET=xxxxxxxxxxxxxxxxxxxxxxxx
DEEPGRAM_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ELEVENLABS_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ELEVENLABS_VOICE_ID=pNInz6obpgDQGcFmaJgB # optional, falls back to a default voice
RANDAL_VOICE_PUBLIC_URL=https://voice.example.com
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # phone calls only
TWILIO_AUTH_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # phone calls only
TWILIO_PHONE_NUMBER=+15551234567 # phone calls only
```
RANDAL_VOICE_PUBLIC_URL must be the public base URL for the gateway voice
routes. It is not the LiveKit URL. Twilio and remote browsers use this URL to
reach Randal's own /voice/... endpoints.
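Because a wrong `RANDAL_VOICE_PUBLIC_URL` or a missing key usually only shows up as a failed call, a start-up check can catch the common mistakes early. The sketch below is hypothetical (`checkVoiceEnv` is not part of Randal); it only checks that the PSTN variables listed above are set, that the public URL is an `https://` URL that does not point at the LiveKit host, and that the Twilio number looks like E.164.

```typescript
// Hypothetical pre-flight check for the PSTN voice variables listed above.
// It only inspects string values; it does not contact any provider.
type Env = Record<string, string | undefined>;

const REQUIRED_PSTN_VARS = [
  "LIVEKIT_URL",
  "LIVEKIT_API_KEY",
  "LIVEKIT_API_SECRET",
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY",
  "RANDAL_VOICE_PUBLIC_URL",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER",
];

// E.164: "+" followed by at most 15 digits, the first of which is not 0.
const E164 = /^\+[1-9]\d{1,14}$/;

function parseUrl(value: string | undefined): URL | undefined {
  if (!value) return undefined;
  try {
    return new URL(value);
  } catch {
    return undefined;
  }
}

function checkVoiceEnv(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_PSTN_VARS) {
    if (!env[name]?.trim()) problems.push(`${name} is not set`);
  }

  const publicUrl = env.RANDAL_VOICE_PUBLIC_URL;
  const parsed = parseUrl(publicUrl);
  if (publicUrl && !parsed) {
    problems.push("RANDAL_VOICE_PUBLIC_URL is not a valid URL");
  } else if (parsed) {
    if (parsed.protocol !== "https:") {
      problems.push("RANDAL_VOICE_PUBLIC_URL should be an https:// URL");
    }
    // The public URL must reach Randal's /voice/* routes, not the LiveKit server.
    if (parsed.host === parseUrl(env.LIVEKIT_URL)?.host) {
      problems.push("RANDAL_VOICE_PUBLIC_URL points at the LiveKit host, not the Randal gateway");
    }
  }

  const phone = env.TWILIO_PHONE_NUMBER;
  if (phone && !E164.test(phone)) {
    problems.push("TWILIO_PHONE_NUMBER should be E.164, for example +15551234567");
  }
  return problems;
}
```

For example, `checkVoiceEnv(process.env)` returns an empty array when everything looks right, and one human-readable message per problem otherwise.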
For PSTN on Railway, RANDAL_VOICE_PUBLIC_URL should usually be one of:
- the public Railway service domain, if Twilio can reach it directly and WebSocket traffic behaves correctly
- a public reverse proxy or edge host that forwards to the Railway gateway
Architecture overview
```
Caller ──► Twilio SIP ──► LiveKit Room ──► Randal Voice Engine
                                                    │
                                         ┌──────────┼──────────┐
                                         ▼          ▼          ▼
                                        STT      Runner       TTS
                                    (Deepgram)   (Ralph) (ElevenLabs)
```
- Audio arrives via a LiveKit room (browser widget, SIP, or direct).
- The voice engine streams audio chunks to the STT provider.
- Transcribed text is fed into the runner as a normal message.
- The runner's response text is sent to the TTS provider.
- Synthesised audio is published back into the LiveKit room.
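The five steps above amount to one conversational turn. The sketch below models that turn with hypothetical interfaces (`SpeechToText`, `Runner`, `TextToSpeech`, and `handleTurn` are illustrative names, not the `@randal/voice` types). It batches audio per turn for readability, whereas the real engine streams chunks to the STT provider.

```typescript
// Illustrative only: hypothetical interfaces, not the actual @randal/voice types.
interface SpeechToText {
  transcribe(audio: Uint8Array[]): Promise<string>;
}
interface Runner {
  respond(message: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

// One turn: caller audio -> STT -> runner (as a normal message) -> TTS -> room.
async function handleTurn(
  audioChunks: Uint8Array[],
  stt: SpeechToText,
  runner: Runner,
  tts: TextToSpeech,
  publish: (audio: Uint8Array) => Promise<void>,
): Promise<string | null> {
  const text = (await stt.transcribe(audioChunks)).trim();
  if (text === "") return null; // nothing recognized, so skip the runner entirely
  const reply = await runner.respond(text);
  await publish(await tts.synthesize(reply)); // publish back into the LiveKit room
  return reply;
}
```

Keeping each stage behind an interface is what lets the guide swap Deepgram for Whisper or ElevenLabs for Edge TTS without touching the runner.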
What runs where:
- `randal serve` runs the Randal gateway and the voice HTTP/WebSocket routes.
- `docker-compose.voice.yml` starts local media infrastructure only: Redis, LiveKit server, and the LiveKit SIP bridge.
- Twilio talks to the public gateway voice routes, not directly to your local `randal serve` process, unless you expose it with a tunnel or reverse proxy.
LiveKit setup
Cloud (recommended for getting started)
- Create an account at livekit.io.
- Copy the WebSocket URL, API Key, and API Secret from the project dashboard.
- Add them to your `.env`:
  ```bash
  LIVEKIT_URL=wss://your-project.livekit.cloud
  LIVEKIT_API_KEY=APIxxxxxxxx
  LIVEKIT_API_SECRET=xxxxxxxxxxxxxxxxxxxxxxxx
  ```
Self-hosted
Run LiveKit on your own infrastructure with Docker:
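```bash
# --bind 0.0.0.0: dev mode listens on localhost only, which is unreachable
# through Docker's port mapping
docker run --rm -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
  livekit/livekit-server --dev --bind 0.0.0.0
```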
For production, see the LiveKit deployment docs.
In `--dev` mode the server uses the built-in development key/secret pair `devkey` / `secret`; set `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` to match.
For the full phone/media development stack in this repo, run:
```bash
docker compose -f docker-compose.voice.yml up -d
```
This starts:
- Redis
- LiveKit server
- LiveKit SIP bridge
Use docker/voice/livekit.yaml and docker/voice/sip.yaml as the local reference configs.
This compose file does not start the Randal gateway; run randal serve separately.
PSTN/Twilio Testing (Primary Path)
Twilio account guidance
Recommended setup:
- Create a dedicated Twilio subaccount for Randal voice.
- Buy or move one phone number into that subaccount.
- Use the subaccount's `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN`.
- Set `TWILIO_PHONE_NUMBER` to the E.164 number you want Randal to use (for example, `+15551234567`).
This repo currently expects Twilio account credentials, not Twilio API keys.
Twilio setup checklist
- Buy a phone number in the Twilio console.
- Configure the number or call flow so Twilio reaches your public Randal voice routes.
- Set:
  ```bash
  TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  TWILIO_AUTH_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  TWILIO_PHONE_NUMBER=+15551234567
  ```
- Set `RANDAL_VOICE_PUBLIC_URL` to the public HTTPS/WSS host that Twilio can reach.
- Make sure these routes are publicly reachable:
  - `POST /voice/twiml/inbound`
  - `POST /voice/twiml/outbound/:sessionId`
  - `POST /voice/twilio/status/:sessionId`
  - `POST /voice/twilio/stream-status/:sessionId`
  - `GET /voice/media-stream/:sessionId`
- Deploy and place a test call.
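All five routes hang off the same public base URL, so it can help to print the exact URLs a session will use and check each one from outside your network. This sketch is hypothetical (`voiceRouteUrls` is not a Randal API) and assumes the media-stream route is served as a WebSocket on the same host, which matches the "HTTPS/WSS host" wording above.

```typescript
// Hypothetical helper: derive the Twilio-facing URLs from RANDAL_VOICE_PUBLIC_URL.
// Assumption: the media-stream route is a WebSocket on the same host (wss://).
function voiceRouteUrls(publicBaseUrl: string, sessionId: string) {
  const base = new URL(publicBaseUrl);
  if (base.protocol !== "https:") {
    throw new Error("RANDAL_VOICE_PUBLIC_URL must be an https:// URL");
  }
  // Keep any path prefix (e.g. behind a reverse proxy) but drop trailing slashes.
  const prefix = base.pathname.replace(/\/+$/, "");
  const https = `${base.origin}${prefix}`;
  const wss = `wss://${base.host}${prefix}`;
  const id = encodeURIComponent(sessionId);
  return {
    inboundTwiml: `${https}/voice/twiml/inbound`,
    outboundTwiml: `${https}/voice/twiml/outbound/${id}`,
    callStatus: `${https}/voice/twilio/status/${id}`,
    streamStatus: `${https}/voice/twilio/stream-status/${id}`,
    mediaStream: `${wss}/voice/media-stream/${id}`,
  };
}
```

With `RANDAL_VOICE_PUBLIC_URL=https://voice.example.com`, the inbound TwiML route resolves to `https://voice.example.com/voice/twiml/inbound`.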
Current implemented route posture:
- Public voice ingress: the Twilio routes above, protected by Twilio signature validation
- Protected admin/browser routes: `POST /api/voice/token`, `GET /voice/status`, and the rest of the authenticated HTTP surface
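Twilio signature validation is what makes it acceptable to expose those routes publicly. The sketch below follows Twilio's published scheme for form-encoded POST webhooks: take the full request URL, append each POST parameter name and value sorted by name, sign with HMAC-SHA1 using the auth token, base64-encode, and compare with the `X-Twilio-Signature` header. It is for understanding only and is not Randal's implementation; in production, prefer `validateRequest` from Twilio's official Node helper library. The function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of Twilio's request-signing scheme for form-encoded POST webhooks.
function twilioSignature(
  authToken: string,
  fullUrl: string, // exact URL Twilio called: scheme, host, path, and query
  params: Record<string, string>, // POST form fields
): string {
  // Append each parameter name and value, sorted by name, directly to the URL.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], fullUrl);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

function isValidTwilioRequest(
  authToken: string,
  fullUrl: string,
  params: Record<string, string>,
  headerSignature: string, // value of the X-Twilio-Signature header
): boolean {
  const expected = Buffer.from(twilioSignature(authToken, fullUrl, params));
  const received = Buffer.from(headerSignature);
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Because the full URL is part of the signed data, a proxy or tunnel that changes the scheme, host, or path the gateway sees can make genuine Twilio requests fail validation. That is one plausible reason `RANDAL_VOICE_PUBLIC_URL` must match the exact base URL Twilio is configured to call.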
Railway Deployment For PSTN Voice
For Railway, the easiest operator model is:
- Keep `randal.config.railway.yaml` in the repo.
- Set the GitHub Actions secrets listed above.
- Let `.github/workflows/railway-deploy.yml` upsert those secrets into Railway.
- Treat local `.env` as local-only.
Minimum Railway PSTN checklist:
- `randal.config.railway.yaml` includes `- type: voice` and the `voice:` block
- GitHub repository secrets include all `LIVEKIT_*`, `DEEPGRAM_API_KEY`, `ELEVENLABS_*`, `TWILIO_*`, and `RANDAL_VOICE_PUBLIC_URL` values
- `RANDAL_VOICE_PUBLIC_URL` is public and Twilio-reachable
- Twilio uses the same public base URL for Randal's `/voice/*` routes
- Railway hosts the gateway/runner; LiveKit and Twilio stay external services
Why The PSTN Stack Currently Uses Four Services
For the current PSTN path, each service has a separate job:
- Twilio: phone numbers, PSTN ingress/egress, webhook delivery, and media stream handoff
- LiveKit: real-time room/media coordination and session plumbing
- Deepgram: speech-to-text for live caller audio
- ElevenLabs: text-to-speech for spoken responses back into the call
This build therefore keeps the multi-provider runtime on purpose. A future provider-consolidation pass may simplify the stack, but that is a later cleanup project, not part of this integration build.
Scaling Notes For ~10 And ~100 Concurrent Calls
What the operator needs to know:
- 10 concurrent calls is usually a configuration and quota check.
- 100 concurrent calls is a systems-capacity exercise across Twilio, LiveKit, Deepgram, ElevenLabs, and Randal itself.
At around 10 concurrent calls, check:
- Twilio account/subaccount limits, call routing, and webhook reliability
- LiveKit room/media capacity for the expected codec and region
- Deepgram concurrent streaming limits
- ElevenLabs throughput and latency under overlapping TTS requests
- Railway instance CPU/memory headroom for the gateway process
At around 100 concurrent calls, assume you will need to actively manage:
- Twilio concurrency, phone number throughput, and status/webhook burst handling
- LiveKit scaling, region placement, and media-node sizing
- Deepgram stream concurrency and backpressure behavior
- ElevenLabs synthesis throughput and the latency impact on turn-taking
- Randal application scaling: more CPU, more memory, and likely multiple app instances for webhook/WebSocket load
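A quick way to turn the checklists above into numbers is to compare the expected concurrency against each vendor limit you have looked up yourself. This is a hypothetical planning helper, not a Randal tool: it assumes one Twilio call, one LiveKit room, and one Deepgram stream per live call, and it treats the share of calls synthesizing speech at once (`ttsOverlap`) and the burst buffer (`headroom`) as guesses you should replace with load-test data.

```typescript
// Hypothetical planning helper. Fill `limits` with the numbers from your own
// Twilio, LiveKit, Deepgram, and ElevenLabs plans; none of these are defaults.
interface VendorLimits {
  twilioConcurrentCalls: number;
  livekitConcurrentRooms: number;
  deepgramConcurrentStreams: number;
  elevenlabsConcurrentRequests: number;
}

function capacityShortfalls(
  concurrentCalls: number,
  limits: VendorLimits,
  ttsOverlap = 0.5, // assumed share of calls synthesizing speech at the same moment
  headroom = 1.2, // assumed 20% buffer for bursts and retries
): string[] {
  // Assumption: one Twilio call, one LiveKit room, and one Deepgram stream per live call.
  const need: VendorLimits = {
    twilioConcurrentCalls: concurrentCalls,
    livekitConcurrentRooms: concurrentCalls,
    deepgramConcurrentStreams: concurrentCalls,
    elevenlabsConcurrentRequests: Math.ceil(concurrentCalls * ttsOverlap),
  };
  const keys = Object.keys(need) as (keyof VendorLimits)[];
  return keys
    .filter((key) => Math.ceil(need[key] * headroom) > limits[key])
    .map((key) => `${key}: need about ${Math.ceil(need[key] * headroom)}, limit is ${limits[key]}`);
}
```

For instance, at 10 calls with a 5-stream Deepgram limit and generous limits elsewhere, only the Deepgram line is reported. At 100 calls this kind of check is only a starting point; the latency and backpressure items above still need a real load test.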
Practical guidance:
- Load-test with the same Twilio subaccount, LiveKit project, Deepgram account, and ElevenLabs plan you expect to use in production.
- Watch end-to-end latency, not just gateway CPU.
- Treat `RANDAL_VOICE_PUBLIC_URL` and Twilio webhook delivery as production dependencies, not just config values.
- Expect 100 concurrent calls to require vendor-quota reviews and staged rollout, not just a bigger Railway instance.
Local development flow
For a beginner-friendly local setup, do the steps in this order:
- Copy `.env.example` to `.env` and fill in the voice env vars you need.
- Enable the `voice` channel and set `voice.enabled: true` in your config.
- Start the media side:
  ```bash
  docker compose -f docker-compose.voice.yml up -d
  ```
- Start the gateway separately:
  ```bash
  randal serve
  ```
- For browser voice on your own machine, you can usually test with local LiveKit plus the local dashboard.
- For Twilio webhooks or any remote client, expose the gateway with a public HTTPS tunnel and set `RANDAL_VOICE_PUBLIC_URL` to that public URL.
Example tunnel flow:
```bash
# Example with ngrok
ngrok http 7600
# Then set in .env:
RANDAL_VOICE_PUBLIC_URL=https://<your-ngrok-subdomain>.ngrok.app
```
STT provider setup
Deepgram (default)
- Sign up at deepgram.com and create an API key.
- Add to `.env`:
  ```bash
  DEEPGRAM_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  ```
- Config:
  ```yaml
  voice:
    stt:
      provider: deepgram
      apiKey: ${DEEPGRAM_API_KEY}
      model: nova-2 # optional, defaults to provider's latest
  ```
OpenAI Whisper
```yaml
voice:
  stt:
    provider: whisper
    apiKey: ${OPENAI_API_KEY}
    model: whisper-1
```
AssemblyAI
```yaml
voice:
  stt:
    provider: assemblyai
    apiKey: ${ASSEMBLYAI_API_KEY}
```
TTS provider setup
ElevenLabs (default)
- Get an API key from elevenlabs.io.
- Choose a voice ID from the voice library.
```yaml
voice:
  tts:
    provider: elevenlabs
    apiKey: ${ELEVENLABS_API_KEY}
    voice: pNInz6obpgDQGcFmaJgB # "Adam", or any voice ID
```
OpenAI TTS
```yaml
voice:
  tts:
    provider: openai
    apiKey: ${OPENAI_API_KEY}
    voice: alloy
```
Cartesia
```yaml
voice:
  tts:
    provider: cartesia
    apiKey: ${CARTESIA_API_KEY}
    voice: sonic-english
```
Edge TTS (free, no API key)
```yaml
voice:
  tts:
    provider: edge
    voice: en-US-GuyNeural
```
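Each provider example above reads a specific API-key variable, and Edge TTS reads none. A lookup like the hypothetical `missingProviderKeys` below (not a Randal API) can report which keys a chosen STT/TTS pair still needs; the mapping is copied from the YAML examples in this section.

```typescript
// Credential variable per provider, as used in this guide's YAML examples.
const STT_KEYS = {
  deepgram: "DEEPGRAM_API_KEY",
  whisper: "OPENAI_API_KEY",
  assemblyai: "ASSEMBLYAI_API_KEY",
} as const;

const TTS_KEYS = {
  elevenlabs: "ELEVENLABS_API_KEY",
  openai: "OPENAI_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  edge: null, // Edge TTS needs no API key
} as const;

type SttProvider = keyof typeof STT_KEYS;
type TtsProvider = keyof typeof TTS_KEYS;

// Returns the env var names the provider pair needs but that are unset.
function missingProviderKeys(
  stt: SttProvider,
  tts: TtsProvider,
  env: Record<string, string | undefined>,
): string[] {
  const needed: string[] = [STT_KEYS[stt]];
  const ttsKey = TTS_KEYS[tts];
  if (ttsKey !== null) needed.push(ttsKey);
  // Deduplicate: whisper + openai both read OPENAI_API_KEY.
  return Array.from(new Set(needed)).filter((name) => !env[name]);
}
```

For example, `missingProviderKeys("whisper", "openai", {})` reports `OPENAI_API_KEY` only once.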
Browser-only testing (secondary path)
Randal ships a lightweight voice widget that connects to a LiveKit room from the browser. This is useful for admin testing, but it is not the primary first deployment path in this branch.
To enable it:
- Make sure the `voice` channel is in your gateway config:
  ```yaml
  gateway:
    channels:
      - type: voice
  ```
- Make sure the `voice` block is enabled and has working LiveKit/STT/TTS credentials.
- Start `randal serve`.
- The dashboard (served by `@randal/dashboard`) automatically renders a microphone button when voice is enabled.
- Clicking the button requests a LiveKit participant token from the gateway, joins the room, and streams audio.
For custom UIs, use the LiveKit JavaScript SDK and request a token from `POST /api/voice/token`.
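A custom UI first needs the token. The sketch below assumes, without confirmation from this guide, that `POST /api/voice/token` accepts the same bearer auth as the gateway's other HTTP routes and returns JSON with a LiveKit `token` string and an optional `url`. `requestVoiceToken` and `parseVoiceTokenResponse` are hypothetical names; check the gateway's real response before relying on the shape.

```typescript
// Hypothetical custom-UI helper. Assumed response shape: JSON with a LiveKit
// `token` string and an optional LiveKit `url`; verify against your gateway.
interface VoiceTokenResponse {
  token: string;
  url?: string;
}

function parseVoiceTokenResponse(body: unknown): VoiceTokenResponse {
  if (typeof body !== "object" || body === null) {
    throw new Error("token response is not a JSON object");
  }
  const fields = body as Record<string, unknown>;
  const token = fields.token;
  if (typeof token !== "string" || token === "") {
    throw new Error("token response has no LiveKit token");
  }
  const url = fields.url;
  if (url === undefined) return { token };
  if (typeof url !== "string") throw new Error("token response url must be a string");
  return { token, url };
}

// POST /api/voice/token on the authenticated gateway surface (bearer auth assumed).
async function requestVoiceToken(gatewayBaseUrl: string, apiToken: string): Promise<VoiceTokenResponse> {
  const res = await fetch(new URL("/api/voice/token", gatewayBaseUrl), {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
  });
  if (!res.ok) throw new Error(`token request failed with HTTP ${res.status}`);
  return parseVoiceTokenResponse(await res.json());
}
```

The returned `token` is what the LiveKit JavaScript SDK's `room.connect(url, token)` expects, where `url` is your LiveKit server URL (the `LIVEKIT_URL` value), not `RANDAL_VOICE_PUBLIC_URL`. Since browser voice sits on the authenticated admin surface, do not embed a long-lived gateway token in a public page.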
Browser voice uses the same authenticated HTTP admin surface as the rest of the gateway. Anonymous browser clients do not get an implicit admin voice session, and if HTTP auth is not configured the protected browser voice routes fail closed.
Browser-only testing does not require Twilio. PSTN testing does.
Video call participation
Randal can join Zoom, Google Meet, and Microsoft Teams meetings via SIP or RTMP.
How it works
- SIP dial-in: Many conferencing platforms expose SIP URIs for meetings. Randal uses the LiveKit SIP bridge to dial into the meeting as a participant.
- Video processing: When `video.enabled` is true, Randal periodically captures frames from the video track and sends them to a vision model for scene understanding.
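"Periodically captures frames" implies throttling: vision calls are slow and billed per request, so most frames should be dropped before analysis. The sketch below shows one simple way to do that, keeping at most one frame per interval. The `Frame` type, `sampleFrames`, and the interval value are hypothetical; the configuration in this guide does not expose a sampling interval.

```typescript
// Illustrative frame throttle: keep at most one frame per interval.
// Assumes frames arrive in timestamp order.
interface Frame {
  timestampMs: number;
  data: Uint8Array;
}

function sampleFrames(frames: Frame[], intervalMs: number): Frame[] {
  if (intervalMs <= 0) throw new Error("intervalMs must be positive");
  const picked: Frame[] = [];
  let nextDue = -Infinity; // the first frame is always taken
  for (const frame of frames) {
    if (frame.timestampMs >= nextDue) {
      picked.push(frame);
      nextDue = frame.timestampMs + intervalMs;
    }
  }
  return picked;
}
```

Sampling by time rather than by frame count keeps the vision-model cost independent of the camera's frame rate.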
Configuration
```yaml
voice:
  video:
    enabled: true
    visionModel: gpt-4o # model for frame analysis
    publishScreen: false # share Randal's screen into the call
    recordSessions: true # save recordings locally
    recordPath: ./recordings
```
Meeting-specific notes
| Platform | Method | Notes |
|---|---|---|
| Zoom | SIP URI | Requires Zoom SIP connector add-on |
| Google Meet | SIP dial-in | Available on Google Workspace Business+ |
| Microsoft Teams | SIP via Direct Routing | Requires Teams Phone System license |
Outbound calling
Randal can place outbound phone calls via Twilio:
```bash
randal call +15559876543 --prompt "Check in with the client about delivery"
```
Or programmatically through the gateway API:
```bash
curl -X POST http://localhost:7600/voice/call \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"to": "+15559876543", "reason": "Check in about delivery"}'
```
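From TypeScript, the curl request above can be built as plain data first, which makes the phone-number check and headers easy to test before anything is sent. This is a sketch: `buildOutboundCall` is hypothetical, and the body fields (`to`, `reason`), the bearer header, and port 7600 are copied from the curl example rather than from a documented client API.

```typescript
// Builds the same request as the curl example without sending it.
// E.164: "+" followed by at most 15 digits, the first of which is not 0.
const E164 = /^\+[1-9]\d{1,14}$/;

interface OutboundCall {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildOutboundCall(gatewayBaseUrl: string, authToken: string, to: string, reason: string): OutboundCall {
  if (!E164.test(to)) throw new Error(`"${to}" is not an E.164 number`);
  return {
    url: new URL("/voice/call", gatewayBaseUrl).toString(),
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${authToken}`, "Content-Type": "application/json" },
      body: JSON.stringify({ to, reason }),
    },
  };
}
```

To send it with the global `fetch` (Node 18+), call `await fetch(call.url, call.init)` on the returned object.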
The call flow:
- Twilio places the outbound call.
- When answered, audio is bridged into a LiveKit room.
- The STT/Runner/TTS pipeline handles the conversation.
Turn detection
The voice engine detects when the caller stops speaking before generating a response. Two modes are available:
```yaml
voice:
  turnDetection:
    mode: auto # VAD-based automatic detection (default)
    # mode: manual # wait for explicit push-to-talk signal
```
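The general idea behind automatic turn detection is that a turn ends once speech has been followed by enough continuous silence. The sketch below shows that logic on per-frame voice-activity flags. It is not Randal's detector: the `TurnDetector` class, the 20 ms frames used in the example, and the 700 ms silence threshold are all assumptions for illustration.

```typescript
// Sketch of silence-based end-of-turn detection on per-frame VAD results.
class TurnDetector {
  private heardSpeech = false;
  private silenceMs = 0;
  private readonly endOfTurnSilenceMs: number;

  constructor(endOfTurnSilenceMs = 700) {
    this.endOfTurnSilenceMs = endOfTurnSilenceMs;
  }

  // Feed one frame's VAD result; returns true exactly once when the turn ends.
  push(isSpeech: boolean, frameMs: number): boolean {
    if (isSpeech) {
      this.heardSpeech = true;
      this.silenceMs = 0; // any speech resets the silence timer
      return false;
    }
    if (!this.heardSpeech) return false; // leading silence never ends a turn
    this.silenceMs += frameMs;
    if (this.silenceMs >= this.endOfTurnSilenceMs) {
      this.heardSpeech = false; // reset for the next turn
      this.silenceMs = 0;
      return true;
    }
    return false;
  }
}
```

The threshold is the main tradeoff: a shorter one makes replies feel faster but risks cutting off callers who pause mid-sentence.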
Full configuration example
```yaml
name: voice-assistant
runner:
  workdir: ./workspace
voice:
  enabled: true
  livekit:
    url: ${LIVEKIT_URL}
    apiKey: ${LIVEKIT_API_KEY}
    apiSecret: ${LIVEKIT_API_SECRET}
  twilio:
    accountSid: ${TWILIO_ACCOUNT_SID}
    authToken: ${TWILIO_AUTH_TOKEN}
    phoneNumber: ${TWILIO_PHONE_NUMBER}
  stt:
    provider: deepgram
    apiKey: ${DEEPGRAM_API_KEY}
    model: nova-2
  tts:
    provider: elevenlabs
    apiKey: ${ELEVENLABS_API_KEY}
    voice: pNInz6obpgDQGcFmaJgB
  turnDetection:
    mode: auto
  video:
    enabled: false
gateway:
  channels:
    - type: voice
      access:
        trustedCallers:
          - ${ADMIN_CALLER_E164}
        unknownInbound: external
        defaultExternalGrants: [memory]
    - type: http
      port: 7600
      auth: ${API_TOKEN}
```
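This guide does not spell out the `access` block under the voice channel. One plausible reading, sketched below purely as an assumption from the field names, is that callers listed in `trustedCallers` are trusted, other inbound callers are treated as external when `unknownInbound: external`, and external callers receive only `defaultExternalGrants`. `classifyInboundCaller` and its return shape are hypothetical, as is the choice to reject any other `unknownInbound` value.

```typescript
// Hypothetical reading of the voice channel's `access` block; the semantics
// are inferred from field names, not from documented Randal behavior.
interface VoiceAccess {
  trustedCallers: string[];
  unknownInbound: string; // "external" in the example config
  defaultExternalGrants: string[];
}

type CallerDecision =
  | { kind: "trusted" }
  | { kind: "external"; grants: string[] }
  | { kind: "rejected" };

function classifyInboundCaller(from: string, access: VoiceAccess): CallerDecision {
  // Exact match assumes both sides use E.164, e.g. +15551234567.
  if (access.trustedCallers.includes(from)) return { kind: "trusted" };
  if (access.unknownInbound === "external") {
    return { kind: "external", grants: access.defaultExternalGrants.slice() };
  }
  return { kind: "rejected" }; // assumed fallback for any other value
}
```

Under this reading, an unknown caller to the example config would get only the `memory` grant.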
If you do not want voice, remove `- type: voice` and the entire `voice:` block.
The rest of Randal works normally without any voice-specific credentials.
For higher-assurance deployments, keep browser/admin voice on the authenticated gateway surface and treat PSTN/Twilio routes as the only intentionally public voice ingress.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| No audio in room | LiveKit URL wrong or unreachable | Verify LIVEKIT_URL and network access |
| STT returns empty | API key invalid or rate-limited | Check provider dashboard for errors |
| High latency | STT + TTS round-trip too slow | Try deepgram STT + edge TTS for lowest latency |
| Outbound call fails | Twilio credentials or phone number misconfigured | Verify in Twilio console |
| Video frames not processed | video.enabled not set to true | Add video.enabled: true to config |