AssemblyAI

Speech-to-text · Budget pick · Free credit

Accurate streaming speech-to-text with built-in audio intelligence, for teams who want the listening half done well.

Best for agents where transcription accuracy and speech understanding decide everything

Watch for it is speech-to-text only, so it never speaks or dials a phone

Free to try $50 free credit (~333 hrs of streaming STT at $0.15/hr) · no card

Paid link, we may earn a commission. How this works.

Reviewed by Voxrater · Last reviewed 2026-07-11

Our scores (editorial preview)

Overall 4.6 / 10 (Fair)

Voice quality 2

Voice range 1

Ease of use 6

Value 10

All-in /min $0.00–0.01

headline /min $0.00

✓ HIPAA ✓ SOC 2 Type II ✓ GDPR

Scored on the same voice-agent rubric as the full platforms, so a building block like this scores low on the axes it does not address. Read its value score against its job.


The ears, not the whole agent. AssemblyAI turns speech into text fast and accurately (about 300ms), and adds extras like sentiment and redaction. It does not speak back or dial phones, so you pair it with a voice and a line. Streaming is about $0.0025 a minute.

What you'll pay

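About $0.0025 to $0.0075 for a minute of audio, shown rounded as $0.00 to 0.01. That covers the speech-to-text only: the phone line, the AI and the voice are not included, because you bring those yourself.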

That's roughly $0.15–0.45 an hour. Plans: $0/mo (Pay as you go).

Pricing

All-in $0.00–0.01/min · headline $0.00/min

Cost breakdown

Speech-to-text: $0.00/min
Platform, LLM, TTS, telephony: — (not supplied by this vendor)
All-in: $0.00–0.01/min

AssemblyAI prices by the hour of audio, not per call. The headline figure here ($0.0025/min) is the real-time Universal-Streaming rate of $0.15/hr converted to per minute ($0.15 / 60 = $0.0025), since that is the model AssemblyAI builds for voice agents. Other rates from the same pricing page: async Universal-2 $0.15/hr, async Universal-3.5 Pro $0.21/hr, Whisper streaming $0.30/hr, and the premium Universal-3.5 Pro Realtime model at $0.45/hr ($0.0075/min, which is the top of the all-in range shown here). The 3.5 models replaced Universal-3 Pro in June 2026 (Realtime on 2026-06-23) at unchanged prices.

Audio-intelligence add-ons stack on top of the base rate per hour: speaker identification +$0.02/hr, sentiment +$0.02/hr, translation +$0.06/hr, entity detection +$0.08/hr, PII text redaction +$0.08/hr, topic detection +$0.15/hr, and a medical mode +$0.15/hr. Speaker diarization is newly listed at +$0.02/hr async and +$0.12/hr streaming. Stream rate limits are now published: 5 new streams/min on free accounts, 100/min on paid. New accounts get $50 in free credit.

There is also a separate bundled Voice Agent API listed at $4.50/hr ($0.075/min) that wraps speech-to-text, an LLM and text-to-speech, but the core product, and what this profile rates, is the standalone speech-to-text. Because this is an STT-only vendor, the platform, LLM, TTS and telephony components are 0 in the breakdown: you bring those yourself.

Plans & what you get

Every plan in one place: the monthly fee, what each one includes, and the features it unlocks. Anything beyond a plan's allowance, or on a pay-as-you-go tier, is billed at the per-minute rate above. A blank in the features means the vendor's plan page does not state it for that plan, not that it is unavailable.

	Pay as you go	Enterprise
Price	Free	Custom
Included	Pay per use	—
Plan notes	$50 free credit on signup, no card required. Billed by the hour of audio.	Custom volume pricing, BAA for HIPAA, dedicated support; contact sales.
What each plan unlocks
API access	Yes	—
Concurrent calls	100 new streams/min (paid); 5/min free	—
Priority support	Community + docs	Dedicated / custom

Pay as you go has no bundled monthly allowance: you are billed by the hour of audio transcribed.

Prices in USD as set by the vendor · last checked 2026-06-15 · vendor pricing →

At a glance

Speech-to-text: Universal-Streaming, Universal-3.5 Pro, Universal-2
Text-to-speech: None (speech-to-text only)
Languages: en, es, fr, de, it, pt
Integrations: Vapi, LiveKit, Daily, Pipecat, Native SDKs (Python/JS), REST + WebSocket API

Compliance

✓ HIPAA ✓ SOC 2 Type II ✓ GDPR

Our full take

AssemblyAI does one half of a voice agent, and it does it well. Think of an agent as having ears and a mouth. The ears turn what the caller says into text, the mouth turns the reply back into speech. AssemblyAI is the ears. It is speech-to-text plus what it calls speech understanding, and it does not speak back or place phone calls. So this is a building block, not a finished agent. Knowing that up front saves you comparing it like-for-like against an all-in platform.

The headline reason to reach for it is accuracy on the words that actually trip agents up. Names, email addresses, postcodes, order numbers. Its Universal-Streaming model is built for live agents and returns text in about 300 milliseconds (the launch post quotes a P50 of 303ms), which is fast enough that the caller does not feel the lag. AssemblyAI publishes that number itself, so treat it as a vendor claim until our own tests land, but the company’s reputation has long been built on transcription quality rather than marketing.

Here is the pricing, with the workings shown. AssemblyAI bills by the hour of audio, not per call, which is unusual in this directory and easy to misread. The real-time Universal-Streaming rate is $0.15 an hour. Divide by 60 and that is $0.0025 a minute, which is the figure on this page. That is genuinely cheap for the listening layer. For context, the async (pre-recorded) Universal-2 model costs the same $0.15/hr, the higher-accuracy Universal-3.5 Pro is $0.21/hr async, and the premium Universal-3.5 Pro Realtime model runs $0.45/hr ($0.0075 a minute), which is the top of the range shown here. The 3.5 generation replaced Universal-3 Pro in June 2026 at the same prices, adding context carryover across a session. New accounts get $50 in free credit, no card needed, which is plenty to test with.
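
The hour-to-minute conversion is easy to check for yourself. Here is a minimal Python sketch using the rates quoted on this page, hard-coded rather than fetched from anywhere. The dictionary keys and function names are made up for illustration and are not AssemblyAI identifiers.

```python
# Per-hour rates as quoted in this review (USD). Hard-coded assumptions,
# not values read from AssemblyAI's API or pricing page at run time.
RATES_PER_HOUR = {
    "universal_streaming": 0.15,
    "universal_2_async": 0.15,
    "universal_3_5_pro_async": 0.21,
    "universal_3_5_pro_realtime": 0.45,
}
FREE_CREDIT_USD = 50.0


def per_minute(rate_per_hour: float) -> float:
    """Convert an hourly audio rate to a per-minute rate."""
    return rate_per_hour / 60


def hours_covered(credit_usd: float, rate_per_hour: float) -> float:
    """Hours of audio a credit balance buys at a given hourly rate."""
    return credit_usd / rate_per_hour


if __name__ == "__main__":
    for name, rate in RATES_PER_HOUR.items():
        print(
            f"{name}: ${per_minute(rate):.4f}/min, "
            f"${FREE_CREDIT_USD:.0f} credit covers about "
            f"{hours_covered(FREE_CREDIT_USD, rate):.0f} hrs"
        )
```

Swap in whatever rates the vendor's pricing page shows on the day you check; the arithmetic stays the same.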

The “speech understanding” extras are where AssemblyAI separates from a plain transcriber, and they stack on the base rate per hour. Speaker identification is +$0.02/hr, sentiment +$0.02/hr, translation +$0.06/hr, entity detection +$0.08/hr, redacting personal data from the text +$0.08/hr, topic detection +$0.15/hr, and a medical mode for clinical terms +$0.15/hr. You only pay for the ones you switch on. For an agent that needs to redact a credit-card number on the fly or flag an angry caller, having those built into the same API is a real saving over wiring up three separate services.
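
Because the extras stack per hour, the real rate depends on which ones you enable. The sketch below adds them up, assuming the add-on prices listed above apply on top of whichever base rate you pass in. That is an assumption: the pricing notes earlier list diarization at a different price for streaming than for async, so check each add-on against the vendor's page. All names here are illustrative, not API parameters.

```python
# Add-on prices per hour of audio as listed in this review (USD).
ADDONS_PER_HOUR = {
    "speaker_identification": 0.02,
    "sentiment": 0.02,
    "translation": 0.06,
    "entity_detection": 0.08,
    "pii_redaction": 0.08,
    "topic_detection": 0.15,
    "medical_mode": 0.15,
}


def stacked_rate_per_hour(enabled, base_per_hour=0.15):
    """Base hourly rate plus every enabled add-on, each counted once."""
    chosen = set(enabled)
    unknown = chosen - ADDONS_PER_HOUR.keys()
    if unknown:
        raise ValueError(f"unknown add-ons: {sorted(unknown)}")
    return base_per_hour + sum(ADDONS_PER_HOUR[name] for name in chosen)


if __name__ == "__main__":
    # Example: flag angry callers and redact card numbers on the base rate.
    hourly = stacked_rate_per_hour(["sentiment", "pii_redaction"])
    print(f"${hourly:.2f}/hr, ${hourly / 60:.4f}/min")
```

The example call works out to the $0.15 base plus $0.02 and $0.08, so $0.25 an hour before any voice, LLM or telephony costs.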

One number to read carefully. AssemblyAI also lists a bundled Voice Agent API at $4.50/hr ($0.075/min) that wraps speech-to-text, a language model and a voice into one rate. That is a different product from the standalone transcription this profile rates, and the all-in math here deliberately covers only the speech-to-text, because that is the part AssemblyAI is known for and the part you would slot into a Vapi or LiveKit stack.

On languages, read the small print. The cheap Multilingual Universal-Streaming covers six languages, English, Spanish, French, German, Italian and Portuguese, all at the same $0.15/hr. The premium Universal-3.5 Pro Realtime model covers 18 languages with mid-call language switching (vendor figures; its retired predecessor advertised 99+, so the current line-up trades breadth for accuracy). If your callers are in those six big languages, the cheap tier is all you need.
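
If you keep a list of the languages your callers use, the tier check is a simple set comparison. This sketch uses the six codes listed on this page; the function names are hypothetical, and the premium model's 18 languages are not enumerated here, so anything outside the six is only flagged for you to check against the vendor's list.

```python
# The six languages this review lists for Multilingual Universal-Streaming.
STREAMING_LANGUAGES = {"en", "es", "fr", "de", "it", "pt"}


def uncovered_languages(caller_languages):
    """Return caller language codes outside the six-language streaming tier."""
    normalised = {code.strip().lower() for code in caller_languages}
    return sorted(normalised - STREAMING_LANGUAGES)


def cheap_tier_is_enough(caller_languages):
    """True when every caller language is in the six-language tier."""
    return not uncovered_languages(caller_languages)


if __name__ == "__main__":
    print(cheap_tier_is_enough(["en", "es"]))
    print(uncovered_languages(["en", "ja", "nl"]))
```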

Where AssemblyAI is unusually strong is the compliance paperwork, which matters if you handle health or payment data. Its own security page states SOC 2 Type 1 and Type 2, a GDPR third-party assessment, PCI-DSS 4.0 Level 1, AES-256 encryption at rest and TLS 1.3 in transit, with a choice of US or EU data residency. Separately, its docs confirm it will sign a HIPAA Business Associate Agreement (the contract that lets you process patient data legally), arranged through sales. That is more written assurance than several flashier rivals can show, and it is why this page ticks HIPAA where some others stay blank.

The honest limits. This is one piece of the puzzle, not the puzzle. No voice, no phone line, no built-in language model, so you are assembling the rest yourself or letting a platform like Vapi do it for you. It plugs neatly into Vapi, LiveKit, Daily and Pipecat-style stacks (that is how you give it a mouth and a phone number), but if you want to plug in one thing and have a working agent by lunchtime, AssemblyAI is not that thing. There is also no affiliate route to speak of: AssemblyAI’s own FAQ says plainly it does not run an affiliate programme, so there is no commission angle here, just the product.

My read. If the make-or-break for your agent is hearing the caller correctly, especially names, numbers and regulated data, AssemblyAI earns the shortlist at a low per-minute rate with the certificates to back a healthcare or finance build. If you want speech, dialling and orchestration handled in one bill, this is the wrong layer to start from, and a bundled platform will serve you better.

The 1 to 10 scores on this page are an editorial preview, our provisional read to get the framework in place, not a measured result. We have not run AssemblyAI through our own call tests yet, so there is no Voxrater latency figure here. The pricing, capability and compliance detail is sourced from AssemblyAI’s pricing, product, security and documentation pages, captured 2026-05-31.

Alternatives to AssemblyAI

Other platforms that overlap with AssemblyAI on the same kind of work, ranked by how many capabilities they share, then by cheaper all-in cost per minute. Compare any of them side by side on the compare page.

Speechmatics · $0.00–0.01/min · Enterprise speech-to-text with very broad language coverage and real on-prem options, for teams who self-host.
LiveKit · $0.02–0.20/min · The open-source real-time stack that carries voice-agent audio, plus a framework to wire your own STT, LLM and voice.
Pipecat · $0.03–0.20/min · Open-source Python framework where you pick every voice-agent part, free to self-host, with Daily's cloud for scaling.
