Why Nemo-RT Pro
Voice AI that works like software, not metered water. Benchmarked and tuned for production workloads — comparable to OpenAI Realtime, Gemini Live, Vapi, and Retell AI.
Zero per-minute fees
One deployment, one monthly support fee. 5,000 minutes or 500,000 — your cost doesn't change.
Full data residency
Runs on your NVIDIA hardware in your datacenter. Patient PHI and customer data never pass through US-hosted clouds.
Sub-second latency
Sub-second time to first audio (TTFA) in single-user benchmarks. Comparable to the best cloud voice APIs, without the cost curve.
Multi-tenant by default
One NVIDIA DGX serves multiple clients with isolated prompts, RAG, and MCP tools per tenant.
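As a rough illustration of what per-tenant isolation can look like, the sketch below maps each inbound number to exactly one tenant configuration and fails closed when a number is unknown. The class names, fields, and the idea of routing by dialed number are assumptions made for this example, not the product's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Settings that stay isolated per tenant (hypothetical shape)."""
    tenant_id: str
    system_prompt: str
    rag_collection: str               # name of this tenant's document index
    mcp_tools: tuple[str, ...] = ()   # tool names this tenant may call


class TenantRegistry:
    """Maps an inbound call to exactly one tenant's configuration."""

    def __init__(self) -> None:
        self._by_number: dict[str, TenantConfig] = {}

    def register(self, dialed_number: str, config: TenantConfig) -> None:
        if dialed_number in self._by_number:
            raise ValueError(f"{dialed_number} is already assigned")
        self._by_number[dialed_number] = config

    def resolve(self, dialed_number: str) -> TenantConfig:
        # Fail closed: an unknown number never falls back to another tenant.
        try:
            return self._by_number[dialed_number]
        except KeyError:
            raise LookupError(f"no tenant for {dialed_number}") from None

    def tool_allowed(self, dialed_number: str, tool: str) -> bool:
        # A tool is callable only if the resolved tenant lists it explicitly.
        return tool in self.resolve(dialed_number).mcp_tools
```

The point of the design is that prompts, RAG collections, and tool permissions are looked up through a single resolved tenant, so one tenant's call never reads another tenant's settings.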
How it works
SIP call in. Voice AI agent that actually does things out. All on your hardware.
SIP / Asterisk
Your Asterisk PBX (or compatible SIP trunk) bridges the call via ARI WebSocket to Nemo-RT Pro.
VAD + STT
Silero VAD + NeMo Conformer CTC bilingual EN-ES. Acoustic and lexical language detection per turn.
LLM + MCP
Qwen3-14B-NVFP4 on TensorRT-LLM with full MCP tool-calling framework and per-tenant RAG.
TTS back out
NeMo FastPitch + HiFiGAN streaming audio back to the caller. 174 Latin Spanish voices plus English.
No data leaves your datacenter. No cloud API fees. Full stack ownership — ASR, LLM, TTS, SIP integration, and MCP tool calling delivered as one coherent product.
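The four stages above can be pictured as one turn-level pipeline. The following is a minimal sketch with stand-in callables for each stage. The `Pipeline` class, its field names, and the lambda stubs are hypothetical and exist only to show the data flow: it does not call Silero, NeMo, or TensorRT-LLM.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Pipeline:
    is_speech: Callable[[bytes], bool]            # VAD decision per audio frame
    transcribe: Callable[[bytes], str]            # STT over the voiced audio
    respond: Callable[[str], str]                 # LLM reply (tool calls omitted)
    synthesize: Callable[[str], Iterator[bytes]]  # TTS yielding audio chunks

    def run_turn(self, frames: Iterable[bytes]) -> Iterator[bytes]:
        """Process one caller turn and stream reply audio back."""
        speech = b"".join(f for f in frames if self.is_speech(f))
        if not speech:
            return  # nothing voiced in this turn, so nothing to answer
        text = self.transcribe(speech)
        reply = self.respond(text)
        yield from self.synthesize(reply)


# Stub stages so the flow can be exercised without any models installed.
demo = Pipeline(
    is_speech=lambda frame: any(frame),  # any non-zero byte counts as speech
    transcribe=lambda audio: f"{len(audio)} bytes heard",
    respond=lambda text: f"You said: {text}",
    synthesize=lambda reply: (word.encode() for word in reply.split()),
)

if __name__ == "__main__":
    for chunk in demo.run_turn([b"\x00\x00", b"\x01\x02", b"\x03"]):
        print(chunk)
```

Because `run_turn` is a generator, audio chunks can be forwarded to the caller as soon as the synthesis stage produces them, which is the property that matters for time to first audio.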
Built for production
Benchmark-driven engineering. Every component chosen and tuned for real traffic.
TTFA at single-user benchmarks
Concurrent conversations tested, zero errors
End-users at 10:1 overbooking on one DGX
TTFA improvement via TTS batch sizing
Model stack
- LLM: Qwen3-14B-NVFP4 via TensorRT-LLM (4-bit quantized)
- STT: NeMo Conformer CTC bilingual EN-ES
- TTS: NeMo FastPitch + HiFiGAN (174 Latin Spanish + 10 English voices)
- VAD: Silero with configurable thresholds
- MCP framework: 5 tool-call parsing formats, multi-turn sequences, RAG per tenant
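To make "tool-call parsing" concrete, here is a sketch of a parser for one plausible format: a JSON object wrapped in `<tool_call>` tags, a style commonly emitted by Qwen-family chat templates. The page does not list the five supported formats, so treating this as one of them is an assumption, and `parse_tool_calls` is a hypothetical helper name.

```python
import json
import re

# Matches a JSON object wrapped in <tool_call>...</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(llm_output: str) -> list[dict]:
    """Extract well-formed tool calls from raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(llm_output):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed JSON: skip it rather than guess at intent
        if isinstance(payload, dict) and isinstance(payload.get("name"), str):
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


if __name__ == "__main__":
    output = (
        "Un momento, por favor.\n"
        '<tool_call>\n{"name": "transfer_call", "arguments": {"extension": "200"}}\n</tool_call>'
    )
    print(parse_tool_calls(output))
```

Skipping malformed payloads instead of repairing them keeps the agent from executing a tool with arguments the model never actually produced.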
Hardware and deployment
- Recommended: NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory)
- Compatible: any NVIDIA GPU with 16GB+ VRAM (RTX 3090/4090, A100, H100, Jetson AGX Orin)
- Install: Docker Compose, single-command script, Caddy auto-HTTPS
- Stack: FastAPI + WebSocket, Python 3.11+, Node.js ARI bridge
- Go-live: 4-week target from contract signature
Nemo-RT Pro vs cloud voice AI
Same conversational quality. Fundamentally different economics and compliance story.
| Dimension | Nemo-RT Pro | OpenAI Realtime | Vapi | Retell AI |
|---|---|---|---|---|
| Deployment | On-prem, your hardware | Cloud only | Cloud only | Cloud only |
| Pricing model | Deploy + fixed monthly | ~$0.30/min | $0.20 - $0.33/min | ~$0.11/min |
| Data residency | Your datacenter | US-hosted | US-hosted | US-hosted |
| Multi-tenant native | Yes (per-tenant RAG + MCP) | No (build yourself) | Limited | Limited |
| LATAM Spanish quality | 174 Latin voices | Generic multilingual | Castilian default | Castilian default |
| Year 1 at 20K min/mo | $28K early-access | ~$72K | $48K - $80K | ~$30K |
At sub-15K min/mo, pure-cloud providers may be cheaper. Nemo-RT Pro economics shine at 20K+ min/mo, multi-tenant operators, regulated verticals, and Spanish-first use cases.
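The comparison above is simple arithmetic, and it can be rerun with your own volume. The sketch below multiplies a flat per-minute rate over twelve months and computes the monthly volume at which that bill equals a fixed year-one fee. The function names are hypothetical, the rates are the approximate ones quoted in the table, and the model ignores anything beyond the per-minute rate (hardware, telephony, discounts), so results need not match the table's rounded figures.

```python
def cloud_year_one_cost(rate_per_min: float, minutes_per_month: int) -> float:
    """Twelve months of per-minute billing at a flat rate."""
    return rate_per_min * minutes_per_month * 12


def break_even_minutes(fixed_year_one: float, rate_per_min: float) -> float:
    """Monthly volume at which per-minute billing equals the fixed fee."""
    if rate_per_min <= 0:
        raise ValueError("rate_per_min must be positive")
    return fixed_year_one / (rate_per_min * 12)


if __name__ == "__main__":
    early_access = 28_000  # Year 1 early-access price quoted in the table
    for rate in (0.30, 0.20, 0.11):  # approximate per-minute rates from the table
        cost = cloud_year_one_cost(rate, 20_000)
        even = break_even_minutes(early_access, rate)
        print(f"${rate:.2f}/min: year 1 at 20K min/mo = ${cost:,.0f}, "
              f"break-even = {even:,.0f} min/mo")
```

The break-even volume moves a lot with the per-minute rate you compare against, so it is worth running the numbers with the actual quote you hold.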
In production at a LATAM SIP trunk operator
Regional telecom operator running automated medical appointment booking for a growing clinic network, 24/7, in native LATAM Spanish.
- 5 clinic tenants on a single NVIDIA DGX, multi-tenant from day one
- 200+ end-users serviced with 10:1 overbooking, zero errors in stress tests
- Custom MCPs for medical scheduling API and civil registry lookup by national ID
- Per-tenant RAG documents, system prompts, and API keys
- Zero OpenAI API fees — they eliminated that P&L line item entirely
- 100% data residency — patient PHI never leaves their datacenter
"We evaluated every major voice AI platform and none of them fit: on-premise deployment, native LATAM Spanish, multi-tenant for our clinic customers, and no per-minute fees. Nemo-RT Pro was the only option that checked all those boxes."
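For readers unfamiliar with overbooking, the 10:1 figure in the list above can be read as capacity planning: not every end-user is on a call at once, so fewer concurrent conversation slots are provisioned than there are users. The sketch below assumes the ratio means "end-users per concurrent slot", which is the usual telecom reading but is not spelled out on this page. The function name is hypothetical.

```python
import math


def concurrent_slots_needed(end_users: int, overbooking_ratio: int) -> int:
    """Concurrent conversation slots required at a given overbooking ratio."""
    if end_users < 0 or overbooking_ratio <= 0:
        raise ValueError("end_users must be >= 0 and overbooking_ratio > 0")
    # Round up: a partial slot still needs a whole slot of capacity.
    return math.ceil(end_users / overbooking_ratio)


if __name__ == "__main__":
    print(concurrent_slots_needed(200, 10))
```

Under that reading, 200 end-users at 10:1 correspond to 20 simultaneous conversations.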
Tenant types running today
Open-core by design
Start free on the community edition. Graduate to Pro when you need multi-tenant, support, and commercial licensing.
Nemo-RT Community
MIT license, free forever.
- Same ASR, LLM, TTS, and MCP stack as Pro
- Single tenant self-host
- Any NVIDIA GPU with 16GB+ VRAM
- Community GitHub support
Nemo-RT Pro
Commercial license + support + SLA.
- Everything in Community, plus:
- Multi-tenant architecture (unlimited tenants)
- Admin panel with per-tenant config
- Transfer Call MCP included, custom MCPs available
- Monthly updates + engineering support
- 4-week go-live target
Built for these operators
Where the economics and compliance story of on-prem voice AI lands hardest.
Medical appointment booking
Clinics and health networks automating 24/7 scheduling in native LATAM Spanish with RAG and scheduling MCPs.
Call center automation
Contact centers deflecting routine calls from human agents, with call-transfer MCPs to escalate when needed.
AI-powered SIP trunks
SIP operators reselling multi-tenant voice AI to their client base without per-minute margin pressure.
Government voice services
Municipalities and public entities needing Spanish-native voice access with strict data residency requirements.
Transparent pricing. No per-minute fees.
You pay the same whether you run 5,000 minutes or 500,000 per month. 4-week go-live target from contract signature.
Pro — Early-access
First 3-5 clients this quarter.
Year 1 total: $28,000
- Multi-tenant, unlimited tenants
- Full Spanish voice library (174 voices)
- Qwen3-14B-NVFP4 on TensorRT-LLM
- RAG + MCP tool calling framework
- Transfer Call MCP included
- 4-week go-live target
- 20h / month engineering support
In exchange for reference case study rights + roadmap input.
Book discovery call
Pro — Standard
Q3 2026 onwards.
Year 1 total: $36,000 · Year 2+: $24,000/year
- Multi-tenant, unlimited tenants
- Full Spanish voice library (174 voices)
- Qwen3-14B-NVFP4 on TensorRT-LLM
- RAG + MCP tool calling framework
- Transfer Call MCP included
- 4-week go-live target
- 20h / month engineering support
White-label, a 99.5% SLA, and custom MCP bundles are available in the Enterprise tier.
Contact sales
All prices in USD. LATAM companies can invoice in local currency via our US LLC. Hardware (NVIDIA DGX Spark or compatible GPU) is procured directly from NVIDIA or your reseller; we don't mark up hardware.
Credentials and open-source track record
Founder credentials and defensible production signal across 10+ countries.
- 36 GitHub stars on asterisk_to_openai_rt
- 68 forks (+ 23★ / 27⑂ on community variant)
- 10+ countries with production deployments
AI Infrastructure & Operations (2026-2028)
Generative AI Essentials
Speech API certified
Multicloud Network Associate
Startup School 2020 alumni
Member portfolio company
Frequently asked questions
Ready to see if Nemo-RT Pro fits your operation?
20-minute discovery call. No slides. We look at your volume, your compliance needs, your hardware, and tell you honestly whether this is a fit — or what would be better.
Questions? Email yan.frank@infinitocloud.com