

What it does

Transcribes audio or video to text using ElevenLabs Scribe. Works on files already in the conversation (including output from elevenlabs_text_to_speech) or any HTTPS-accessible media URL — including cloud storage, YouTube, TikTok, and podcast hosts.

Key features

  • Single audio_source param accepts r2:// conversation attachments or HTTPS URLs — auto-detected by prefix
  • Scribe v2 (default) for best-in-class accuracy
  • Word-level timestamps returned by default
  • Speaker diarization (who spoke when) when diarize is on
  • Audio event tagging — surfaces (laughter), (footsteps), etc. inline in the transcript
  • Auto language detection, or pin a specific ISO-639 code
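The prefix auto-detection described above can be sketched in a few lines. This is not the platform's actual implementation, just a minimal illustration of the documented rule: `r2://` means a conversation attachment, `https://` means a remote URL, anything else is rejected.

```python
def classify_audio_source(audio_source: str) -> str:
    """Mirror the documented prefix auto-detection for audio_source.

    r2:// paths are conversation attachments; https:// URLs are
    remote media (cloud storage, YouTube, TikTok, podcast hosts).
    """
    if audio_source.startswith("r2://"):
        return "attachment"
    if audio_source.startswith("https://"):
        return "url"
    raise ValueError(f"unsupported audio_source prefix: {audio_source!r}")
```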

Parameters

  • audio_source (string, required): Either (a) an r2://bucket/key path of an audio file already attached to the thread, or (b) an HTTPS URL to an audio/video file. Supports cloud storage URLs (S3, R2, GCS), YouTube, TikTok, and other HTTPS sources up to 2 GB.
  • model_id (enum, optional): scribe_v2 (default, latest) or scribe_v1.
  • language_code (string, optional): ISO-639-1 or ISO-639-3 code (e.g. eng, spa). If omitted, the language is auto-detected.
  • diarize (boolean, optional): Annotate which speaker is talking (returns speaker_id per word). Default: true.
  • tag_audio_events (boolean, optional): Tag audio events like (laughter), (footsteps) inline. Default: true.
  • num_speakers (integer, optional): Expected maximum number of speakers (1–32). Helps diarization when known.
  • timestamps_granularity (enum, optional): none, word (default), or character.
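The parameters and their defaults can be captured in a small request builder. This is a hypothetical helper, not the platform's API client; the field names and ranges come from the table above, and the validation rules (enum membership, the 1–32 speaker bound) are the documented constraints.

```python
VALID_MODELS = {"scribe_v2", "scribe_v1"}
VALID_GRANULARITY = {"none", "word", "character"}

def build_request(audio_source, model_id="scribe_v2", language_code=None,
                  diarize=True, tag_audio_events=True, num_speakers=None,
                  timestamps_granularity="word"):
    """Assemble a transcription request dict, applying the documented
    defaults and rejecting out-of-range values before the call is made."""
    if model_id not in VALID_MODELS:
        raise ValueError(f"model_id must be one of {sorted(VALID_MODELS)}")
    if timestamps_granularity not in VALID_GRANULARITY:
        raise ValueError("timestamps_granularity must be none, word, or character")
    if num_speakers is not None and not 1 <= num_speakers <= 32:
        raise ValueError("num_speakers must be between 1 and 32")

    payload = {
        "audio_source": audio_source,
        "model_id": model_id,
        "diarize": diarize,
        "tag_audio_events": tag_audio_events,
        "timestamps_granularity": timestamps_granularity,
    }
    # Optional fields are omitted entirely so the service applies
    # its own behavior (auto language detection, unconstrained diarization).
    if language_code is not None:
        payload["language_code"] = language_code
    if num_speakers is not None:
        payload["num_speakers"] = num_speakers
    return payload
```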

Common use cases

Transcribe a file already attached to the conversation

audio_source: "r2://aster-agents/org_xxx/threads/yyy/recording.mp3"
Use this when a previous tool call (TTS, document extraction) or a user upload produced an audio file; pass its r2_path straight through.

Transcribe a public podcast or recording URL

audio_source: "https://example.com/episode-42.mp3"
language_code: "eng"

Transcribe a meeting with multiple speakers

audio_source: "r2://aster-agents/org_xxx/threads/yyy/meeting.mp4"
diarize: true
num_speakers: 4
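For a diarized meeting like the one above, the per-word output is usually collapsed into speaker turns for display. A minimal sketch, assuming each entry in the response's words array carries a text and a speaker_id field as described under Response below:

```python
def dialogue_turns(words):
    """Collapse diarized per-word output into consecutive speaker turns.

    Assumed input shape (from the tool's words array):
        [{"text": "hi", "speaker_id": "speaker_0"}, ...]
    """
    turns = []  # each turn: [speaker_id, [tokens]]
    for w in words:
        if turns and turns[-1][0] == w["speaker_id"]:
            turns[-1][1].append(w["text"])   # same speaker keeps talking
        else:
            turns.append([w["speaker_id"], [w["text"]]])  # new turn
    return [f"{speaker}: {' '.join(tokens)}" for speaker, tokens in turns]
```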

Response

Returns:
  • text — the full transcript
  • language_code / language_probability — detected language and confidence
  • speaker_count — number of distinct speakers identified (when diarize is on)
  • word_count — total words in the transcript
  • words — per-word objects with text, start/end timestamps, and speaker_id
  • source — a label describing which input path was used
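One common use of the word-level timestamps is generating captions. The sketch below turns the words array into SRT-formatted subtitle blocks; it assumes each word object carries text, start, and end (seconds) as listed above, and the chunking size is an arbitrary choice, not anything the tool prescribes.

```python
def to_srt(words, max_words=7):
    """Render the words array as SRT subtitle blocks of up to
    max_words words each, using each chunk's first start and last end."""
    def ts(seconds):
        # SRT timestamp format: HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        blocks.append(
            f"{len(blocks) + 1}\n"
            f"{ts(chunk[0]['start'])} --> {ts(chunk[-1]['end'])}\n"
            f"{' '.join(w['text'] for w in chunk)}"
        )
    return "\n\n".join(blocks)
```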

Setup

No per-user setup. ElevenLabs is configured at the platform level — just enable the tool on your agent in Control Hub > Edit Agent under the Audio section.