Voice AI agencies
You build voice agents for clients on Vapi, Retell, or Bland. Run tests before every deployment. Send clients proof their agent works.
We call your AI agents, IVR trees, and phone systems — run test scenarios, score quality, and catch regressions. Before your customers do.
Works with: Vapi, Retell, Bland AI, LiveKit, Twilio, any IVR, any IVA, any phone number.
Your voice system handles thousands of calls. But how do you know it still works after every change?
A prompt tweak, an LLM upgrade, an IVR reconfiguration — edge cases break and nobody notices until customers complain.
You call your own system 3 times, it works, you ship. But what about the 47 other scenarios, edge cases, and languages?
You have no idea if last week's deployment degraded call quality, increased latency, or broke your IVR routing. Zero data.
Describe what to test — via our UI or API. Set the persona, goal, schedule, and pass/fail criteria. Use our auto-generated scenarios or write your own.
name: "Book appointment - edge case"
phone: "+1-555-0123"
schedule: "every day at 09:00 UTC"
call_window: "09:00-18:00 America/New_York"
persona: "Impatient customer, speaks fast, has an accent"
goal: "Book a haircut for tomorrow 2pm"
test_data:
  credit_card: "4242 4242 4242 4242"
  customer_id: "TEST-9921"
must_include:
  - "confirm date and time"
  - "provide booking reference"
must_not:
  - "hang up before confirmation"
  - "hallucinate availability"
max_latency_ms: 2000
From your base scenario, we auto-derive multiple test runs — edge cases, personas, accents, error paths. You can also add your own manual edge cases.
Our AI caller dials your phone number once per test run — navigates IVR menus via DTMF, follows the scenario, has a full conversation. Each run is a real phone call.
See results for every run at a glance. Spot where your agent fails. Add manual edge cases to cover what we missed.
Every test run feeds a live dashboard. Tag releases, spot trends, get alerted when things degrade.
Trigger tests from your CI/CD, get results via webhook, manage scenarios programmatically. Every feature in the UI is available through the API.
Create scenarios, trigger runs, fetch results. Full CRUD on all resources.
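As a minimal sketch of the trigger-then-poll pattern this enables — note that the run id, payload fields, and status values below are illustrative assumptions, not the documented CallQA API:

```python
import time

# Hypothetical sketch: trigger a test run, then poll until it finishes.
# The "status" values and result fields are assumptions for illustration;
# in real use, `trigger` and `fetch_result` would wrap HTTP calls.

def run_scenario(trigger, fetch_result, poll_interval=1.0, timeout=600.0):
    """Start a run via `trigger`, poll `fetch_result` until done or timeout."""
    run_id = trigger()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_result(run_id)
        if result["status"] != "running":
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"run {run_id} did not finish within {timeout}s")

# Stubbed transport so the sketch is self-contained.
calls = {"n": 0}

def fake_trigger():
    return "run_123"

def fake_fetch(run_id):
    calls["n"] += 1
    # Report "running" twice, then a finished result.
    if calls["n"] < 3:
        return {"status": "running"}
    return {"status": "passed", "score": 87}

result = run_scenario(fake_trigger, fake_fetch, poll_interval=0.01)
print(result["status"])  # passed
```

Swapping the stubs for real HTTP calls keeps the polling and timeout logic unchanged.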
Get notified on test completion, failures, or score drops. Pipe into Slack, PagerDuty, or your own systems.
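One way a receiving service might triage those webhooks — the payload field names here ("event", "score", "baseline_score") are illustrative assumptions, not a documented schema:

```python
# Hypothetical webhook triage: decide whether an incoming test-completion
# payload warrants an alert. Field names are assumptions for illustration.

def should_alert(payload, score_drop_threshold=10):
    """Return (alert?, reason) for an incoming webhook payload."""
    if payload.get("event") == "test.failed":
        return True, "test failed"
    drop = payload.get("baseline_score", 0) - payload.get("score", 0)
    if drop >= score_drop_threshold:
        return True, f"score dropped by {drop} points"
    return False, "ok"

print(should_alert({"event": "test.completed",
                    "score": 72, "baseline_score": 88}))
# (True, 'score dropped by 16 points')
```

The same predicate can gate a Slack post, a PagerDuty incident, or a CI failure.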
Run tests on every deploy. Fail the pipeline if quality drops below your threshold. GitHub Actions, GitLab CI, Jenkins — all supported.
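A deploy gate along these lines could look like the following GitHub Actions step — the endpoint URL, secret name, and response fields are placeholders, not the real API:

```yaml
# Sketch of a CI deploy gate (hypothetical endpoint and response schema).
- name: Run voice agent tests
  run: |
    # Trigger the scenario and wait for a result (placeholder URL).
    result=$(curl -sf -X POST "https://api.example.com/v1/runs" \
      -H "Authorization: Bearer ${{ secrets.CALLQA_API_KEY }}" \
      -d '{"scenario": "book-appointment"}')
    # Fail the pipeline if the score is below threshold.
    score=$(echo "$result" | jq -r '.score')
    if [ "$score" -lt 80 ]; then
      echo "Voice test score $score below threshold"; exit 1
    fi
```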
Test your entire IVR tree — menu routing, DTMF navigation, hold times, transfer flows. Catch misconfigurations automatically.
Migrating from human to AI agents? Run the same test scenarios on both and compare scores, latency, and task completion.
Healthcare, salons, restaurants — booking bots must be perfect. Test edge cases: double bookings, cancellations, timezone confusion, payment flows.
All plans include full transcripts, audio recordings, AI scoring, API access, CI/CD integration, scheduled tests, and Slack alerts.
We place a real phone call (via SIP/PSTN) to your phone number — exactly like a customer would. Our AI caller follows your test scenario, speaks naturally, navigates IVR menus via DTMF, and records the full conversation on separate audio tracks for accurate analysis.
Anything with a phone number. AI voice agents (Vapi, Retell, Bland AI, LiveKit, custom), traditional IVR systems, IVA (Intelligent Virtual Assistants), or even human-operated call centers. If you can call it, we can test it.
Every test generates: a full transcript with separate speaker tracks, the complete audio recording, an AI-powered quality score (0-100), pass/fail checks against your criteria, latency measurements per turn, and actionable recommendations.
Yes. You can define test data in your scenario — credit card numbers, customer IDs, booking references, addresses — and our AI caller will use them naturally during the conversation, just like a real customer would.
Those are platform-specific and often text-only simulations. CallQA makes real phone calls to test the complete end-to-end experience — telephony, latency, DTMF/IVR navigation, audio quality — not just the LLM response. And it works across all platforms and traditional IVR systems.
That's our core use case. Schedule tests to run daily (or on every deploy via CI/CD), tag releases, and instantly see if a change degraded quality. The dashboard shows score trends, latency curves, and pass rate over time with release markers.
We'll stress-test your voice agent with 50 real scenarios and show you exactly what breaks. Limited to the first 100 teams.
We'll reach out within 24 hours to set up your free quality report.
Know someone who builds voice AI? Refer them and both get 50 extra test calls.