Voice AI that scales without per-minute fees.

Nemo-RT Pro is Spanish and English voice AI that runs 100% on your own NVIDIA infrastructure: multi-tenant, split-second, zero per-minute rates — the only way to scale voice AI without per-minute fees eating your margins.

Built by INFINITO CLOUD (US LLC). In production at a LATAM SIP trunk operator serving 200+ end-users across multiple tenants on a single NVIDIA DGX.

Why Nemo-RT Pro

Voice AI that works like software, not metered water. Benchmarked and tuned for production workloads — comparable to OpenAI Realtime, Gemini Live, Vapi, and Retell AI.

Zero per-minute fees

One deployment, one monthly support fee. 5,000 minutes or 500,000 — your cost doesn't change.

Full data residency

Runs on your NVIDIA hardware in your datacenter. Patient PHI and customer data never cross US clouds.

Sub-second latency

Sub-second TTFA at single-user benchmarks. Comparable to the best cloud voice APIs — without the cost curve.

Multi-tenant by default

One NVIDIA DGX serves multiple clients with isolated prompts, RAG, and MCP tools per tenant.

How it works

SIP call in. Voice AI agent that actually does things out. All on your hardware.

1
SIP / Asterisk

Your Asterisk PBX (or compatible SIP trunk) bridges the call via ARI WebSocket to Nemo-RT Pro.

2
VAD + STT

Silero VAD + NeMo Conformer CTC bilingual EN-ES. Acoustic and lexical language detection per turn.

3
LLM + MCP

Qwen3.6-35B-A3B-FP8 on vLLM with full MCP tool-calling framework and per-tenant RAG.

4
TTS back out

NeMo FastPitch + HiFiGAN streaming audio back to the caller. 174 Latin Spanish voices plus English.

No data leaves your datacenter. No cloud API fees. Full stack ownership — ASR, LLM, TTS, SIP integration, and MCP tool calling delivered as one coherent product.

Built for production

Benchmarks-driven engineering. Every component chosen and tuned for real traffic.

<1s

TTFA at single-user benchmarks

20+

Concurrent conversations tested, zero errors

200+

End-users at 10:1 overbooking on one DGX

39%

TTFA improvement via TTS batch sizing

Model stack

  • LLM: Qwen3.6-35B-A3B-FP8 via vLLM (FP8 quantized · MoE 3B active per token)
  • STT: NeMo Conformer CTC bilingual EN-ES
  • TTS: NeMo FastPitch + HiFiGAN (174 Latin Spanish + 10 English voices)
  • VAD: Silero with configurable thresholds
  • MCP framework: 5 tool-call parsing formats, multi-turn sequences, RAG per tenant

Hardware and capacity

  • Recommended hardware: NVIDIA DGX Spark (GB10 Blackwell, 128GB unified memory) — ideal for the 35B MoE model at FP8. ~$4,700 MSRP from any NVIDIA partner. Compatible: any NVIDIA Hopper (H100/H200) or Blackwell (B100, DGX Station) with FP8 support and ≥80GB VRAM.
  • Capacity per DGX Spark: 20 concurrent live conversations · 10:1 overbooking ratio → ~200 tenants registered per box (standard voice trunking economics — register your full tenant base, real-time capacity scales to peak hour).
  • Scale-up path: NVIDIA DGX Station (multi-GPU Blackwell) for 100+ concurrent production workloads · or multiple DGX Sparks in parallel for horizontal scaling.
  • Install: Docker Compose · single-command script · Caddy auto-HTTPS.
  • Stack: FastAPI + WebSocket, Python 3.11+, Node.js ARI bridge.
  • Go-live: 2 weeks from contract signature.
  • Hardware sourcing: BYO from any NVIDIA reseller, or we connect you with a validated distributor in your country — free service, no markup from us (local currency · local warranty · local logistics). Building a vetted reseller network across LATAM and US Hispanic markets.

Nemo-RT Pro vs cloud voice AI

Same conversational quality. Fundamentally different economics and compliance story.

Dimension Nemo-RT Pro OpenAI Realtime Vapi Retell AI
Deployment On-prem, your hardware Cloud only Cloud only Cloud only
Pricing model Deploy + fixed monthly ~$0.30/min $0.20 - $0.33/min ~$0.11/min
Data residency Your datacenter US-hosted US-hosted US-hosted
Multi-tenant native Yes (per-tenant RAG + MCP) No (build yourself) Limited Limited
LATAM Spanish quality 174 Latin voices Generic multilingual Castilian default Castilian default
Year 1 at 20K min/mo $28K early-access ~$72K $48K - $80K ~$30K

At sub-15K min/mo, pure-cloud providers may be cheaper. Nemo-RT Pro economics shine at 20K+ min/mo, multi-tenant operators, regulated verticals, and Spanish-first use cases.

In production at a LATAM SIP trunk operator — since Q1 2026

Regional telecom operator running automated medical appointment booking for a growing clinic network, 24/7, in native LATAM Spanish.

  • 5 clinic tenants on a single NVIDIA DGX, multi-tenant from day one
  • 200+ end-users serviced with 10:1 overbooking, zero errors in stress tests
  • 3 productized MCPs: scheduling (mcp-citas), human escalation (mcp-transferencias), voice surveys (mcp-encuestas) — plus custom integrations (civil registry, national ID lookup)
  • Per-tenant RAG documents, system prompts, and API keys
  • Zero cloud AI per-minute fees — eliminated the entire variable cost line item (no Vapi / Retell / Twilio / OpenAI bills)
  • 100% data residency — patient PHI never leaves their datacenter
Book a 30-min discovery call

"We evaluated every major voice AI platform and none of them fit: on-premise deployment, native LATAM Spanish, multi-tenant for our clinic customers, and no per-minute fees. Nemo-RT Pro was the only option that checked all those boxes."

— CTO of a regional LATAM voice/SIP operator
Verified production 5 clinics 200+ users 24/7 since Q1 2026
Capabilities running in production today
System-prompt Assistants RAG search on PDFs, TXTs & URLs mcp-citas · scheduling mcp-transferencias · human escalation mcp-encuestas · voice surveys & triage

Built for these operators

Where the economics and compliance story of on-prem voice AI lands hardest.

Medical appointment booking

Clinics and health networks automating 24/7 scheduling in native LATAM Spanish with RAG and scheduling MCPs.

Call center automation

Contact centers deflecting routine calls from human agents, with call-transfer MCPs to escalate when needed.

AI-powered SIP trunks

SIP operators reselling multi-tenant voice AI to their client base without per-minute margin pressure.

Government voice services

Municipalities and public entities needing Spanish-native voice access with strict data residency requirements.

Transparent pricing. No per-minute fees.

You pay the same whether you run 5,000 minutes or 500,000 per month. 2-week go-live target from contract signature.

Early-access

Early-access

First 10 clients · in exchange for case study.

$10,000 deployment
+ $999 / month support

Year 1: $22,000 · Year 2+: $12,000/year recurring

Hardware (DGX Spark ~$4,700 MSRP) separate — see Hardware and capacity section.

Everything in PRO, plus:

  • Pricing locked 24 months from signature
  • Priority access to beta features
  • Direct roadmap input
  • Case study collaboration (anonymizable)
Book discovery call
Most popular

PRO

Standard commercial tier · Q3 2026 onwards.

$12,000 deployment
+ $1,499 / month support

Year 1: $30,000 · Year 2+: $18,000/year recurring

Hardware (DGX Spark ~$4,700 MSRP) separate — see Hardware and capacity section.

Full platform, no add-ons:

  • Multi-tenant native · unlimited tenants
  • Full Spanish + English voice library (174 Latin + EN TTS)
  • Qwen3.6-35B-A3B-FP8 reasoning via vLLM
  • Per-tenant RAG + MCP tool-calling
  • 3 productized MCPs: citas, transferencias, encuestas
  • Software updates + new voices + new MCP modules
  • Zero per-minute fees
  • Up to 20h / month engineering support (24h SLA)
  • 2-week go-live target
Contact sales

Enterprise

Mission-critical deployments · banks, government, regulated industries.

From $25,000 deployment
+ From $4,999 / month support

Year 1: From $85,000 · Year 2+: From $60,000/year recurring

NVIDIA DGX cluster hardware quoted per deployment (multi-node).

Everything in PRO, plus:

  • NVIDIA DGX cluster deployment (multi-node horizontal scaling)
  • Cluster orchestration + GPU pool management
  • Custom voice cloning (brand voice fine-tuning)
  • White-label (your brand, your domain)
  • Custom MCP development (CRM, billing, ticketing)
  • Multi-region failover · disaster recovery
  • SLA 99.5% with service credits
  • 24/7 phone + Slack support channel
  • Dedicated solutions engineer + quarterly business reviews
  • SOC 2 / HIPAA / ISO 27001 compliance support
  • On-site deployment + IT team training
  • Custom legal, audit, and compliance terms
Contact sales

All prices in USD. LATAM companies can invoice in local currency via our US LLC. See Hardware and capacity section above for hardware specs, capacity per box, and procurement options.

Credentials and production track record

Founder credentials and defendible production signal across 10+ countries.

36

GitHub stars on asterisk_to_openai_rt

68

Forks on the asterisk_to_openai_rt repo

10+

Countries with production deployments

NVIDIA-Certified Associate
AI Infrastructure & Operations (2026-2028)
AWS Partner
Generative AI Essentials
Google Cloud
Speech API certified
Aviatrix ACE
Multicloud Network Associate
Y Combinator
Startup School 2020 alumni
NVIDIA Inception
Active member · Approved May 2026

Frequently asked questions

NVIDIA DGX Spark recommended (GB10 Blackwell, 128GB unified memory) — ideal for our Qwen3.6-35B-A3B-FP8 reasoning model. Also compatible: any NVIDIA datacenter card with FP8 support and ≥80GB VRAM (H100/H200 80GB, B100, DGX Station). We don't sell hardware — you procure from NVIDIA or your preferred reseller; we can connect you with a validated distributor in your country at no markup. We provide software, deployment, and support.

2-week go-live target from contract signature. Core install is typically 2-4 hours on a ready DGX. The remaining time covers tenant setup, custom MCP development if needed, pilot testing with real traffic, and team training.

Yes — and it's included at no extra cost. Nemo-RT Pro ships with a pre-built Node.js ARI bridge that works out of the box with Asterisk 20+, FreePBX, and 3CX via SIP trunk. Other PBX systems are supported via SIP-to-WebSocket bridge. The standard SIP/ARI integration is part of every deployment. Only custom integrations to non-SIP systems (proprietary PBXs, legacy TDM gateways) are quoted as a separate services engagement.

Up to 20 hours per month of direct engineering time (Pro tier) — for tenant configuration, MCP tuning, integration help, and operational questions. Plus: security patches and CVE responses as needed, new MCP modules and voice library updates as they ship, LLM model updates when stable upstream releases land, priority bug fixes for production-impacting issues, automated monthly health dashboards, and email/Slack incident response within 24h.

Your deployment keeps running. You own the hardware and the software license stays installed. What stops: monthly patches, new features, engineering support, access to new MCPs. What keeps working: everything installed today runs indefinitely. Very different from cloud where "stop paying" = "service off."

Yes. Annual prepay: 10% off. 2-year prepay: 15%. 3-year prepay: 20%. Partner program pays 33% margin on sourced deals. For Enterprise (white-label + SLA 99.5%) or large multi-region deployments, we build a custom quote during the discovery call.

Ready to see if Nemo-RT Pro fits your operation?

30-minute discovery call. No slides. We look at your volume, your compliance needs, your hardware, and tell you honestly whether this is a fit — or what would be better.