AssemblyAI

Speech-to-text · Budget pick

Accurate streaming speech-to-text with built-in audio intelligence, for teams who want the listening half done well.

Best for: agents where transcription accuracy and speech understanding decide everything
Watch for: it is speech-to-text only, so it never speaks or dials a phone

Our scores (editorial preview)
Overall: 4.6 / 10 (Fair)
Voice quality: 2
Voice range: 1
Ease of use: 6
Value: 10
All-in /min: $0.0025–0.0075
Headline /min: $0.0025
✓ HIPAA · ✓ SOC 2 Type II · ✓ GDPR

Scored on the same voice-agent rubric as the full platforms, so a building block like this scores low on the axes it does not address. Read its value score against its job.

The ears, not the whole agent. AssemblyAI turns speech into text fast and accurately (about 300ms), and adds extras like sentiment and redaction. It does not speak back or dial phones, so you pair it with a voice and a line. Streaming is about $0.0025 a minute.

What you'll pay

About $0.0025 to $0.0075 for a minute of transcribed audio. That covers the speech-to-text only; the phone line, the AI and the voice are not included, because you bring those yourself.

That's roughly $0.15–0.45 an hour. Plans: $0/mo (Pay as you go).

Pricing

$0.0025–0.0075 /min all-in · headline $0.0025 /min
≈ €0.00–0.01 · ≈ £0.00–0.01 · ≈ ₹0.24–0.72 · ≈ R$0.01–0.04 · ≈ A$0.00–0.01

Cost breakdown
Platform: what the platform charges to run the agent, before the phone line and the AI usage are added on.
Speech-to-text: the step that turns what the caller says out loud into text the AI can read. $0.0025 /min
Language model: the AI 'brain' that reads what the caller said and works out what to say back.
Text-to-speech: the step that turns the AI's written reply back into a spoken voice.
Telephony: the phone line itself, the service that connects the call to a real phone number. Usually billed on top of the platform.
All-in: the total you actually pay for one minute of conversation once every piece is added up: the platform, the AI, the voice and the phone line. $0.0025–0.0075 /min

AssemblyAI prices by the hour of audio, not per call. The headline figure here ($0.0025/min) is the real-time Universal-Streaming rate of $0.15/hr converted to per minute ($0.15 / 60 = $0.0025), since that is the model AssemblyAI builds for voice agents. Other rates from the same pricing page (captured 2026-05-31): async Universal-2 $0.15/hr, async Universal-3 Pro $0.21/hr, Whisper streaming $0.30/hr, and the premium Universal-3 Pro streaming model at $0.45/hr ($0.0075/min, which is the top of the all-in range here). New accounts get $50 in free credit.

Audio-intelligence add-ons stack on top of the base rate per hour, for example speaker identification +$0.02/hr, sentiment +$0.02/hr, translation +$0.06/hr, entity detection +$0.08/hr, PII text redaction +$0.08/hr, topic detection +$0.15/hr, and a medical mode +$0.15/hr.

There is also a separate bundled Voice Agent API listed at $4.50/hr ($0.075/min) that wraps speech-to-text, an LLM and text-to-speech, but the core product, and what this profile rates, is the standalone speech-to-text. Because this is a speech-to-text-only vendor, the platform, LLM, TTS and telephony components are counted as $0 here: you bring those yourself.
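
Every per-minute figure above comes from the same division by 60. A minimal Python sketch of that conversion, using only the hourly rates quoted in this profile (captured 2026-05-31). The dictionary and the function name `per_minute` are illustrative and are not part of any AssemblyAI SDK:

```python
from decimal import Decimal

# Hourly list prices as quoted in this profile (captured 2026-05-31).
# These are copied by hand, not fetched from any API, and may be out of date.
HOURLY_RATES_USD = {
    "Universal-Streaming": Decimal("0.15"),
    "Universal-2 (async)": Decimal("0.15"),
    "Universal-3 Pro (async)": Decimal("0.21"),
    "Whisper streaming": Decimal("0.30"),
    "Universal-3 Pro streaming": Decimal("0.45"),
    "Voice Agent API (bundled)": Decimal("4.50"),
}


def per_minute(hourly_usd: Decimal) -> Decimal:
    """Convert a per-hour audio rate into a per-minute rate."""
    return hourly_usd / Decimal(60)


if __name__ == "__main__":
    for model, hourly in HOURLY_RATES_USD.items():
        print(f"{model}: ${hourly}/hr = ${per_minute(hourly):.4f}/min")
```

Decimal is used rather than float so that fractions of a cent are not blurred by binary rounding.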

Plans & what you get

Every plan in one place: the monthly fee, what each one includes, and the features it unlocks. Anything beyond a plan's allowance, or on a pay-as-you-go tier, is billed at the per-minute rate above. A blank in the features means the vendor's plan page does not state it for that plan, not that it is unavailable.

  • Pay as you go: Free
    Included: pay per use
    Plan notes: $50 free credit on signup, no card required. Billed by the hour of audio.
    API access: Yes
    Concurrent calls: Unlimited concurrent streams (vendor claim)
    Priority support: Community + docs
  • Enterprise: Custom
    Plan notes: Custom volume pricing, BAA for HIPAA, dedicated support; contact sales.
    API access: not stated on the plan page
    Concurrent calls: not stated on the plan page
    Priority support: Dedicated / custom

Neither plan bundles a monthly allowance: pay-as-you-go usage is billed by the hour of audio transcribed, and Enterprise pricing is custom.

Prices in USD as set by the vendor · last checked 2026-06-03 · vendor pricing →

At a glance

Speech-to-text: Universal-Streaming, Universal-3 Pro, Universal-2
Text-to-speech: None (speech-to-text only)
Languages: en, es, fr, de, it, pt
Integrations: Vapi, LiveKit, Daily, Pipecat, Native SDKs (Python/JS), REST + WebSocket API

Compliance

✓ HIPAA · ✓ SOC 2 Type II · ✓ GDPR

Our full take

AssemblyAI does one half of a voice agent, and it does it well. Think of an agent as having ears and a mouth. The ears turn what the caller says into text, the mouth turns the reply back into speech. AssemblyAI is the ears. It is speech-to-text plus what it calls speech understanding, and it does not speak back or place phone calls. So this is a building block, not a finished agent. Knowing that up front saves you comparing it like-for-like against an all-in platform.

The headline reason to reach for it is accuracy on the words that actually trip agents up: names, email addresses, postcodes, order numbers. Its Universal-Streaming model is built for live agents and returns text in about 300 milliseconds (the launch post quotes a P50 of 303ms), which is fast enough that the caller does not feel the lag. AssemblyAI publishes that number itself, so treat it as a vendor claim until our own tests land, but the company’s reputation has long been built on transcription quality rather than marketing.

Here is the pricing, with the workings shown. AssemblyAI bills by the hour of audio, not per call, which is unusual in this directory and easy to misread. The real-time Universal-Streaming rate is $0.15 an hour. Divide by 60 and that is $0.0025 a minute, which is the figure on this page. That is genuinely cheap for the listening layer. For context, the async (pre-recorded) Universal-2 model is also $0.15/hr, the higher-accuracy Universal-3 Pro is $0.21/hr async, and the premium Universal-3 Pro streaming model runs $0.45/hr ($0.0075 a minute), which is the top of the range shown here. New accounts get $50 in free credit, no card needed, which is plenty to test with.
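
To put "plenty to test with" into hours, here is a rough Python sketch that divides the $50 credit by the two streaming rates quoted above. It assumes the credit is drawn down at list price with no add-ons switched on, which this profile has not verified; `hours_covered` is a hypothetical helper name:

```python
from decimal import Decimal

# Figures quoted in this profile (captured 2026-05-31).
FREE_CREDIT_USD = Decimal("50")
STREAMING_HOURLY_USD = {
    "Universal-Streaming": Decimal("0.15"),
    "Universal-3 Pro streaming": Decimal("0.45"),
}


def hours_covered(credit_usd: Decimal, hourly_usd: Decimal) -> Decimal:
    """Hours of audio a credit buys at a flat hourly rate (assumes list price, no add-ons)."""
    if hourly_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return credit_usd / hourly_usd


if __name__ == "__main__":
    for model, hourly in STREAMING_HOURLY_USD.items():
        hours = hours_covered(FREE_CREDIT_USD, hourly)
        print(f"{model}: about {hours:.0f} hours of audio on the free credit")
```

On those assumptions the credit stretches to roughly 333 hours on Universal-Streaming and roughly 111 hours on Universal-3 Pro streaming.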

The “speech understanding” extras are where AssemblyAI separates from a plain transcriber, and they stack on the base rate per hour. Speaker identification is +$0.02/hr, sentiment +$0.02/hr, translation +$0.06/hr, entity detection +$0.08/hr, redacting personal data from the text +$0.08/hr, topic detection +$0.15/hr, and a medical mode for clinical terms +$0.15/hr. You only pay for the ones you switch on. For an agent that needs to redact a credit-card number on the fly or flag an angry caller, having those built into the same API is a real saving over wiring up three separate services.
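
The stacking described above can be sketched as a simple sum: the base hourly rate plus the hourly surcharge for each add-on you switch on. The rates are the ones quoted in this profile; the key names and `stacked_hourly` are made up for illustration, and whether every add-on is available on every model is an assumption this sketch does not check:

```python
from decimal import Decimal

# Add-on surcharges per hour of audio, as quoted in this profile (captured 2026-05-31).
# Key names are illustrative, not AssemblyAI parameter names.
ADDON_HOURLY_USD = {
    "speaker_identification": Decimal("0.02"),
    "sentiment": Decimal("0.02"),
    "translation": Decimal("0.06"),
    "entity_detection": Decimal("0.08"),
    "pii_text_redaction": Decimal("0.08"),
    "topic_detection": Decimal("0.15"),
    "medical_mode": Decimal("0.15"),
}


def stacked_hourly(base_hourly_usd: Decimal, addons: list[str]) -> Decimal:
    """Base rate plus one surcharge per distinct add-on that is switched on."""
    chosen = set(addons)  # a repeated name is only charged once
    unknown = chosen - set(ADDON_HOURLY_USD)
    if unknown:
        raise KeyError(f"unknown add-ons: {sorted(unknown)}")
    return base_hourly_usd + sum((ADDON_HOURLY_USD[a] for a in chosen), Decimal("0"))


if __name__ == "__main__":
    # Universal-Streaming ($0.15/hr) with sentiment and PII text redaction.
    hourly = stacked_hourly(Decimal("0.15"), ["sentiment", "pii_text_redaction"])
    print(f"${hourly}/hr = ${hourly / Decimal(60):.4f}/min")
```

With sentiment and PII redaction on top of Universal-Streaming, the hourly rate comes to $0.25, which is still well under half a cent a minute.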

One number to read carefully. AssemblyAI also lists a bundled Voice Agent API at $4.50/hr ($0.075/min) that wraps speech-to-text, a language model and a voice into one rate. That is a different product from the standalone transcription this profile rates, and the all-in math here deliberately covers only the speech-to-text, because that is the part AssemblyAI is known for and the part you would slot into a Vapi or LiveKit stack.

On languages, read the small print. The cheap Multilingual Universal-Streaming covers six languages (English, Spanish, French, German, Italian and Portuguese), all at the same $0.15/hr. The pricier Universal-3 Pro streaming model claims 99+ languages, so global reach exists; you just pay the premium streaming rate of $0.45/hr for it. If your callers speak one of those six big languages, the cheap tier is all you need.
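
The language rule above reduces to a lookup. A minimal Python sketch, assuming callers are identified by a language code such as `es` or `pt-BR`. The function name is hypothetical, and the fallback only reflects the vendor's 99+ languages claim: it does not check that a given language is actually supported:

```python
# The six languages this profile lists for Multilingual Universal-Streaming.
CHEAP_TIER_LANGUAGES = {"en", "es", "fr", "de", "it", "pt"}


def pick_streaming_model(language_code: str) -> tuple[str, float]:
    """Return (model name, USD per hour) for a caller's language code.

    Region suffixes are ignored, so "pt-BR" and "pt_BR" are treated as "pt".
    """
    code = language_code.strip().lower().replace("_", "-").split("-")[0]
    if code in CHEAP_TIER_LANGUAGES:
        return ("Universal-Streaming", 0.15)
    # Assumption: anything else falls to the premium model, which claims 99+ languages.
    return ("Universal-3 Pro streaming", 0.45)


if __name__ == "__main__":
    for lang in ["en", "pt-BR", "ja"]:
        model, hourly = pick_streaming_model(lang)
        print(f"{lang}: {model} at ${hourly:.2f}/hr")
```

Before relying on the fallback branch, check the vendor's language list for the premium model.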

Where AssemblyAI is unusually strong is the compliance paperwork, which matters if you handle health or payment data. Its own security page states SOC 2 Type 1 and Type 2, a GDPR third-party assessment, PCI-DSS 4.0 Level 1, AES-256 encryption at rest and TLS 1.3 in transit, with a choice of US or EU data residency. Separately, its docs confirm it will sign a HIPAA Business Associate Agreement (the contract that lets you process patient data legally), arranged through sales. That is more written assurance than several flashier rivals can show, and it is why this page ticks HIPAA where some others stay blank.

The honest limits. This is one piece of the puzzle, not the puzzle. No voice, no phone line, no built-in language model, so you are assembling the rest yourself or letting a platform like Vapi do it for you. It plugs neatly into Vapi, LiveKit, Daily and Pipecat-style stacks (that is how you give it a mouth and a phone number), but if you want to plug in one thing and have a working agent by lunchtime, AssemblyAI is not that thing. There is also no affiliate route to speak of: AssemblyAI’s own FAQ says plainly it does not run an affiliate programme, so there is no commission angle here, just the product.

My read. If the make-or-break for your agent is hearing the caller correctly, especially names, numbers and regulated data, AssemblyAI earns the shortlist at a low per-minute rate with the certificates to back a healthcare or finance build. If you want speech, dialling and orchestration handled in one bill, this is the wrong layer to start from, and a bundled platform will serve you better.

The 1 to 10 scores on this page are an editorial preview, our provisional read to get the framework in place, not a measured result. We have not run AssemblyAI through our own call tests yet, so there is no Voxrater latency figure here. The pricing, capability and compliance detail is sourced from AssemblyAI’s pricing, product, security and documentation pages, captured 2026-05-31.

Alternatives to AssemblyAI

Other platforms that overlap with AssemblyAI on the same kind of work, ranked by how many capabilities they share, then by cheaper all-in cost per minute. Compare any of them side by side on the compare page.

Sources

  1. AssemblyAI pricing page re-captured 2026-06-02 for the quarterly re-verification; pricing reviewed against the live page (screenshot in evidence/). · captured 2026-06-02
  2. AssemblyAI pricing and plan page: Universal-Streaming $0.15/hr, Universal-3 Pro streaming $0.45/hr, async $0.15 to $0.21/hr, $50 free credit, audio-intelligence add-on rates, per-plan features · captured 2026-05-31
  3. Universal-Streaming product page: ~300ms latency, purpose-built for voice agents, unlimited concurrent streams, Vapi/LiveKit/Daily integrations · captured 2026-05-31
  4. Multilingual Universal-Streaming launch (12 Nov 2025): six languages (EN, ES, FR, DE, IT, PT), P50 latency 303ms, $0.15/hr · captured 2026-05-31
  5. AssemblyAI security page: SOC 2 Type 1 and Type 2, GDPR third-party assessment, PCI-DSS 4.0 Level 1, AES-256 at rest, TLS 1.3 in transit, US/EU data residency · captured 2026-05-31
  6. AssemblyAI docs FAQ: 'we offer a standard Business Associate Addendum (BAA)' for HIPAA, via sales · captured 2026-05-31
  7. AssemblyAI docs FAQ: 'No - we do not currently offer an affiliate marketing program' · captured 2026-05-31