- Add `multichannel` property to `TranscriptParams`
- Add `multichannel` and `audio_channels` property to `Transcript`
- Add `channel` property to `TranscriptWord`, `TranscriptUtterance`, `TranscriptSentence`, and `SentimentAnalysisResult`
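
As an illustration of what the per-word `channel` property makes possible, the sketch below groups word text by channel. The `ChannelWord` interface and the `groupTextByChannel` helper are hypothetical and are not part of the SDK; the interface is a simplified stand-in that assumes `channel` is an optional string.

```typescript
// Hypothetical, simplified stand-in for a transcript word.
// Assumption: `channel` is an optional string identifying the audio channel.
interface ChannelWord {
  text: string;
  channel?: string | null;
}

// Collects the text of each word under its channel, preserving word order.
// Words without a channel are placed under the "unknown" key.
function groupTextByChannel(words: ChannelWord[]): Record<string, string> {
  const buckets: Record<string, string[]> = {};
  for (const word of words) {
    const key = word.channel ?? "unknown";
    if (!buckets[key]) {
      buckets[key] = [];
    }
    buckets[key].push(word.text);
  }

  const result: Record<string, string> = {};
  for (const key of Object.keys(buckets)) {
    result[key] = buckets[key].join(" ");
  }
  return result;
}
```

Passing the `words` array of a multichannel transcript to a helper like this would yield one string per channel.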
- Log a warning when a user tries to use API key authentication in the browser to connect to the real-time Streaming STT API.
- Update dependencies
- Use assembly.ai short URL for sample files
- Add `language_confidence_threshold` to `Transcript`, `TranscriptParams`, and `TranscriptOptionalParams`. The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold.
- Add `language_confidence` to `Transcript`. The confidence score for the detected language, between 0.0 (low confidence) and 1.0 (high confidence).

Using these new fields you can determine the confidence of the language detection model (enabled by setting `language_detection` to `true`), and fail the transcript if it doesn't meet your desired threshold.

Learn more about the new automatic language detection model and feature improvements on our blog.
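
The same threshold check can also be applied on the client side after a transcript completes. The following is a minimal sketch under stated assumptions: `LanguageDetectionResult` and `isLanguageConfident` are hypothetical names, and the interface only mirrors the `language_confidence` field described above rather than the SDK's full `Transcript` type.

```typescript
// Hypothetical, simplified view of the language detection fields on a transcript.
interface LanguageDetectionResult {
  language_code?: string | null;
  language_confidence?: number | null;
}

// Returns true when a confidence score is present and is not below the threshold.
// Mirrors the rule above: a score below the threshold counts as a failure.
function isLanguageConfident(
  result: LanguageDetectionResult,
  threshold: number,
): boolean {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0.0 and 1.0");
  }
  const confidence = result.language_confidence;
  return typeof confidence === "number" && confidence >= threshold;
}
```

A missing score is treated as not confident here, which is a design choice of this sketch and not documented SDK behavior.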
- Change `RealtimeErrorType` from enum to const object.
- Add `RealtimeErrorTypeCodes` which is a union of `RealtimeErrorType` values
- Remove `conformer-2` from `SpeechModel` union type.
- Remove conformer-2 deprecation warning
- Add more TSDoc comments for `RealtimeService` documentation
- Add new LeMUR models
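
The enum-to-const-object change follows a common TypeScript pattern, sketched below. The names `ExampleErrorType` and `ExampleErrorTypeCodes` and the numeric values are illustrative assumptions, not the SDK's actual definitions.

```typescript
// A const object used in place of an enum. `as const` keeps the literal
// value types (4000, 4001, ...) instead of widening them to `number`.
// The member names and codes here are illustrative only.
const ExampleErrorType = {
  BadSampleRate: 4000,
  AuthFailed: 4001,
  InsufficientFunds: 4002,
} as const;

// Union of the object's values: 4000 | 4001 | 4002.
type ExampleErrorTypeCodes =
  typeof ExampleErrorType[keyof typeof ExampleErrorType];

// Type guard that narrows an arbitrary number to the union of known codes.
function isExampleErrorCode(code: number): code is ExampleErrorTypeCodes {
  for (const key of Object.keys(ExampleErrorType)) {
    if (ExampleErrorType[key as keyof typeof ExampleErrorType] === code) {
      return true;
    }
  }
  return false;
}
```

Unlike an enum, a const object emits a plain JavaScript object, and the derived union lets callers accept the raw codes directly.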
- Add `TranscriptWebhookNotification` which is a union of `TranscriptReadyNotification` or `RedactedAudioNotification`
- Add `RedactedAudioNotification` which represents the body of the PII redacted audio webhook notification.
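
A webhook handler typically needs to tell the two notification shapes apart. The sketch below shows one way to narrow such a union in TypeScript. The interfaces are hypothetical stand-ins: the field names (`transcript_id`, `redacted_audio_url`) and `status` values are assumptions made for illustration and should be checked against the SDK's type definitions.

```typescript
// Hypothetical stand-ins for the two webhook bodies.
interface ReadyNotificationSketch {
  transcript_id: string;
  status: "completed" | "error";
}

interface RedactedAudioNotificationSketch {
  redacted_audio_url: string;
  status: "redacted_audio_ready";
}

type WebhookNotificationSketch =
  | ReadyNotificationSketch
  | RedactedAudioNotificationSketch;

// Narrows the union on the `status` literal, so each branch can access
// only the fields that exist on that notification shape.
function describeNotification(body: WebhookNotificationSketch): string {
  if (body.status === "redacted_audio_ready") {
    return "redacted audio available at " + body.redacted_audio_url;
  }
  return "transcript " + body.transcript_id + " finished with status " + body.status;
}
```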
- You can now retrieve previous LeMUR responses using `client.lemur.getResponse<LemurTask>("YOUR_REQUEST_ID")`.
- LeMUR functions now return `usage` with the number of `input_tokens` and `output_tokens`.
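
The `usage` values can be aggregated across several LeMUR calls, for example to track token consumption. This is a minimal sketch: `TokenUsage` and `sumUsage` are hypothetical names, and the interface assumes only the two fields named above.

```typescript
// Hypothetical shape covering only the two fields mentioned above.
interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
}

// Adds up the token counts of several responses' usage objects.
function sumUsage(usages: TokenUsage[]): TokenUsage {
  return usages.reduce(
    (total, usage) => ({
      input_tokens: total.input_tokens + usage.input_tokens,
      output_tokens: total.output_tokens + usage.output_tokens,
    }),
    { input_tokens: 0, output_tokens: 0 },
  );
}
```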
- Rename `TranscriptService.redactions` function to `TranscriptService.redactedAudio`.
- Add `TranscriptService.redactedAudioFile` function.
- Add `workerd` export to fix `cache` issue with `fetch` on Cloudflare Workers.
- Fix Rollup exports so __SDK_VERSION__ is properly replaced with the version of the SDK.
- Add new `PiiPolicy` enum values
- Add an export that only includes the Streaming STT code. You can use the export
  - by importing `assemblyai/streaming`,
  - or by loading the `assemblyai.streaming.umd.js` file, or `assemblyai.streaming.umd.min.js` file in a script-tag.
- Add new `EntityType` enum values
- Add react-native exports that resolve to the browser version of the library.
- Caching is disabled for all HTTP requests made by the SDK
- Accept data-URIs in `client.files.upload(dataUri)`, `client.transcripts.submit(audio: dataUri)`, `client.transcripts.transcribe(audio: dataUri)`.
- Change how the WebSocket libraries are imported for better compatibility across frameworks and runtimes. The library no longer relies on an internal `#ws` import, and instead compiles the imports into the dist bundles. Browser builds will use the native `WebSocket`, other builds will use the `ws` package.
- Deprecate `enableExtraSessionInformation` parameter in `CreateRealtimeTranscriberParams` type
- Add `disablePartialTranscripts` parameter to `CreateRealtimeTranscriberParams`
- Add `enableExtraSessionInformation` parameter to `CreateRealtimeTranscriberParams`
- Add `session_information` event to `RealtimeTranscriber.on()`
- ⚠️ Deprecate `conformer-2` literal for `TranscriptParams.speech_model` property
- Add missing `status` property to `AutoHighlightsResult`, `SpeechModel.Best` enum, and `TranscriptListItem.error` property
- Make `PageDetails.prev_url` nullable
- Rename Realtime to Streaming inside code documentation
- More inline code documentation
- Rename `SubstitutionPolicy` literal "entity_type" to "entity_name"
- Fix the pagination example in "List transcripts" sample on README
- GitHub action to generate API reference
- Generate API reference with Typedoc and host on GitHub Pages
- Add `conformer-2` to `SpeechModel` type
- Change `language_code` field to accept any string
- Move from JSDoc to TSDoc
- Update `ws` to 8.13.0
- Update dev dependencies (no public facing changes)
- Add `audio_url` property to `TranscribeParams` in addition to the `audio` property. You can use one or the other. `audio_url` only accepts a URL string.
- Add `TranscriptReadyNotification` type for the transcript webhook body.
- Update codebase to use TSDoc
- Update README.md with more samples
- Add `RealtimeTranscriber.configureEndUtteranceSilenceThreshold` function
- Add `RealtimeTranscriber.forceEndUtterance` function
- Add `end_utterance_silence_threshold` property to `CreateRealtimeTranscriberParams` and `RealtimeTranscriberParams` types.
- Add `speech_model` field to `TranscriptParams` and add `SpeechModel` type.
- Windows paths passed to `client.transcripts.transcribe` and `client.transcripts.submit` will work as expected.
- Add `answer_format` to `LemurActionItemsParams` type
- Rename `RealtimeService` to `RealtimeTranscriber`, `RealtimeServiceFactory` to `RealtimeTranscriberFactory`, `RealtimeTranscriberFactory.createService()` to `RealtimeTranscriberFactory.transcriber()`. Deprecated aliases are provided for all old types and functions for backwards compatibility.
- Restrict the type for `redact_pii_audio_quality` from `string` to `RedactPiiAudioQuality`, an enum string.
- Add `content_safety_confidence` to `TranscriptParams` & `TranscriptOptionalParams`.
- The `RealtimeService` now sends audio as binary instead of a base64-encoded JSON object.
- Add `"anthropic/claude-2-1"` to `LemurModel` type
- Add `encoding` option to the real-time service and factory. `encoding` can be `"pcm_s16le"` or `"pcm_mulaw"`. `"pcm_mulaw"` is a newly supported audio encoding for the real-time service.
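
For readers unfamiliar with the two encodings: `pcm_s16le` carries 16-bit signed samples, while mu-law packs each sample into 8 bits. The sketch below shows the widely used G.711 mu-law companding step for converting one format to the other. It is an independent illustration, not code from the SDK, and the function names are hypothetical.

```typescript
const MULAW_BIAS = 0x84; // 132, added before locating the segment
const MULAW_CLIP = 32635; // largest magnitude that fits after adding the bias

// Converts one signed 16-bit PCM sample to an 8-bit mu-law byte.
function linearToMuLaw(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0x00;
  const magnitude = Math.min(Math.abs(sample), MULAW_CLIP) + MULAW_BIAS;

  // Find the segment (exponent): the position of the highest set bit,
  // searched from bit 14 downwards.
  let exponent = 7;
  let mask = 0x4000;
  while (exponent > 0 && (magnitude & mask) === 0) {
    exponent--;
    mask >>= 1;
  }

  // The four bits below the leading bit form the mantissa.
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;

  // mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Converts a buffer of 16-bit samples to mu-law bytes.
function encodeMuLaw(samples: Int16Array): Uint8Array {
  const encoded = new Uint8Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    encoded[i] = linearToMuLaw(samples[i]);
  }
  return encoded;
}
```

Audio that is already mu-law encoded, such as many telephony streams, would not need this conversion step.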
- Allow any string into `final_model` for LeMUR requests
- Add `"assemblyai/mistral-7b"` to `LemurModel` type
- Update types with `@example`
- Update types with `Format: uuid` if applicable
- Add `node`, `deno`, `bun`, `browser`, and `workerd` (Cloudflare Workers) exports to package.json. These exports are compatible versions of the SDK, with a few limitations in some cases. For more details, consult the SDK Compatibility document.
- Add `dist/assemblyai.umd.js` and `dist/assemblyai.umd.min.js`. You can reference these script files directly in the browser and the SDK will be available at the global `assemblyai` variable.
- `RealtimeService.sendAudio` accepts audio via type `ArrayBufferLike`.
- Breaking: `RealtimeService.stream` returns a WHATWG Streams Standard stream, instead of a Node stream. In the browser, the native web standard stream will be used.
- `ws` is used as the WebSocket client as before, but in the browser, the native WebSocket client is used.
- Rename Node SDK to JavaScript SDK as the SDK is compatible with more runtimes now.
- Add `client.transcripts.transcribe` function to transcribe an audio file with polling until transcript status is `completed` or `error`. This function takes an `audio` option which can be an audio file URL, path, stream, or buffer.
- Add `client.transcripts.submit` function to queue a transcript. You can use `client.transcripts.waitUntilReady` to poll the transcript returned by `submit`. This function also takes an `audio` option which can be an audio file URL, path, stream, or buffer.
- Deprecated `client.transcripts.create` in favor of `transcribe` and `submit`, to be more consistent with other AssemblyAI SDKs.
- Renamed types
  - Renamed `Parameters` type suffix to `Params` type suffix
  - Renamed `CreateTranscriptParameters` to `TranscriptParams`
  - Renamed `CreateTranscriptOptionalParameters` to `TranscriptOptionalParams`.
- Added deprecated aliases for the aforementioned types
- Improved type docs
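
Deprecated aliases of this kind are usually plain type aliases carrying a `@deprecated` TSDoc tag, so existing code keeps compiling while editors flag the old name. The following is a minimal sketch with hypothetical `...Sketch` names and a reduced set of fields; it does not reproduce the SDK's actual declarations.

```typescript
// Hypothetical, reduced version of the renamed type.
interface TranscriptParamsSketch {
  audio_url: string;
  speaker_labels?: boolean;
}

/**
 * Old name kept for backwards compatibility.
 * @deprecated Use `TranscriptParamsSketch` instead.
 */
type CreateTranscriptParametersSketch = TranscriptParamsSketch;

// New code accepts the new name. Values typed with the old alias are still
// accepted because both names refer to the same type.
function summarizeParams(params: TranscriptParamsSketch): string {
  const labels = params.speaker_labels ? "with" : "without";
  return params.audio_url + " (" + labels + " speaker labels)";
}

const legacyParams: CreateTranscriptParametersSketch = {
  audio_url: "https://example.com/audio.mp3",
};
```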
- Add `AssemblyAI.transcripts.waitUntilReady` function to wait until a transcript is ready, meaning `status` is `completed` or `error`.
- Add `chars_per_caption` parameter to `AssemblyAI.transcripts.subtitles` function.
- Add `input_text` property to LeMUR functions. Instead of using `transcript_ids`, you can use `input_text` to provide custom formatted transcripts as input to LeMUR.
- Change default timeout from 3 minutes to infinite (-1). Fixes #17
- Correctly serialize the keywords for `client.transcripts.wordSearch`.
- Use more widely compatible syntax for wildcard exporting types. Fixes #18.
- The SDK uses `fetch` instead of Axios. This removes the Axios dependency. Axios relies on XMLHttpRequest which isn't supported in Cloudflare Workers, Deno, Bun, etc. By using `fetch`, the SDK is now more compatible with the aforementioned runtimes.
- The SDK uses relative imports instead of using path aliases, to make the library transpilable with tsc for consumers. Fixes #14.
- Added `speaker` property to the `TranscriptUtterance` type, and removed `channel` property.
- `AssemblyAI.files.upload` accepts streams and buffers, in addition to a string (path to file).
- Breaking: The module does not have a default export anymore, because of inconsistent functionality across module systems. Instead, use `AssemblyAI` as a named import like this: `import { AssemblyAI } from 'assemblyai'`.
- `AssemblyAI.transcripts.wordSearch` searches for keywords in the transcript.
- `AssemblyAI.lemur.purgeRequestData` deletes data related to your LeMUR request.
- `RealtimeService.stream` creates a writable stream that you can write audio data to instead of using `RealtimeService.sendAudio`.
- The AssemblyAI class would be exported as default named export instead in certain module systems.
Re-implement the Node SDK in TypeScript and add all AssemblyAI APIs.
- Transcript API client
- LeMUR API client
- Real-time transcript client