MLX Transcribe is a tool for transcribing audio files using MLX Whisper.

Prerequisites

  1. Install ffmpeg

  2. Install the mlx-whisper library

    pip install mlx-whisper

  3. Prepare audio files

    • Create a 'storage/audio' directory
    • Place your audio files in this directory
    • Supported formats: mp3, mp4, wav, etc.

  4. Download sample audio (optional)
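
The prerequisites above can be verified from Python before running the agent. The following is a minimal sketch, not part of phidata: the helper name `check_prerequisites` is hypothetical, and it only assumes that the ffmpeg binary is named `ffmpeg` and that the mlx-whisper package is importable as `mlx_whisper`.

```python
import importlib.util
import shutil
from pathlib import Path


def check_prerequisites(audio_dir: Path) -> dict:
    """Report which of the prerequisites appear to be in place.

    Hypothetical helper for illustration; it does not install anything.
    """
    return {
        # ffmpeg must be discoverable on PATH
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # find_spec returns None when a top-level module is not installed
        "mlx_whisper_installed": importlib.util.find_spec("mlx_whisper") is not None,
        # the directory that will hold the audio files
        "audio_dir_exists": audio_dir.is_dir(),
    }
```

Calling `check_prerequisites(Path("storage/audio"))` returns a dictionary of three booleans; any `False` entry points at the step that still needs attention.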

Example

The following agent will use MLX Transcribe to transcribe audio files.

cookbook/tools/mlx_transcribe_tools.py

from pathlib import Path
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.mlx_transcribe import MLXTranscribe

# Resolve storage/audio under the phidata root directory and create it if missing
phidata_root_dir = Path(__file__).parent.parent.parent.resolve()
audio_storage_dir = phidata_root_dir.joinpath("storage/audio")
audio_storage_dir.mkdir(exist_ok=True, parents=True)

agent = Agent(
    name="Transcription Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[MLXTranscribe(base_dir=audio_storage_dir)],
    instructions=[
        "To transcribe an audio file, use the `transcribe` tool with the name of the audio file as the argument.",
        "You can find all available audio files using the `read_files` tool.",
    ],
    markdown=True,
)

agent.print_response("Summarize the reid hoffman ted talk, split into sections", stream=True)

Toolkit Params

Parameter | Type | Default | Description
base_dir | Path | Path.cwd() | Base directory for audio files
read_files_in_base_dir | bool | True | Whether to register the read_files function
path_or_hf_repo | str | "mlx-community/whisper-large-v3-turbo" | Path or HuggingFace repo for the model
verbose | bool | None | Enable verbose output
temperature | float or Tuple[float, ...] | None | Temperature for sampling
compression_ratio_threshold | float | None | Compression ratio threshold
logprob_threshold | float | None | Log probability threshold
no_speech_threshold | float | None | No speech threshold
condition_on_previous_text | bool | None | Whether to condition on previous text
initial_prompt | str | None | Initial prompt for transcription
word_timestamps | bool | None | Enable word-level timestamps
prepend_punctuations | str | None | Punctuations to prepend
append_punctuations | str | None | Punctuations to append
clip_timestamps | str or List[float] | None | Clip timestamps
hallucination_silence_threshold | float | None | Hallucination silence threshold
decode_options | dict | None | Additional decoding options
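
Most parameters in this table default to None. One plausible reading, which is an assumption here and not something the table states, is that a None value is simply left out of the transcription call so that MLX Whisper's own default applies. The sketch below illustrates that pattern; `build_transcribe_kwargs` is a hypothetical helper and is not part of phidata or mlx-whisper.

```python
from typing import Any, Dict, Optional


def build_transcribe_kwargs(
    path_or_hf_repo: str = "mlx-community/whisper-large-v3-turbo",
    **options: Optional[Any],
) -> Dict[str, Any]:
    """Collect transcription options, omitting any that were left at None.

    Hypothetical helper: it shows one way unset options could be dropped,
    not how the toolkit is actually implemented.
    """
    kwargs: Dict[str, Any] = {"path_or_hf_repo": path_or_hf_repo}
    # Keep explicit values such as False or 0.0; drop only None
    kwargs.update({name: value for name, value in options.items() if value is not None})
    return kwargs
```

For example, `build_transcribe_kwargs(word_timestamps=True, temperature=None)` keeps the model path and `word_timestamps` and drops `temperature`. In the toolkit itself, the same options are passed by name to the constructor, for instance `MLXTranscribe(base_dir=audio_storage_dir, word_timestamps=True)`.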

Toolkit Functions

Function | Description
transcribe | Transcribes an audio file using MLX Whisper
read_files | Lists all audio files in the base directory
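
To make the division of labor between the two functions concrete, here is a rough sketch of the file handling they imply: listing what is in the base directory, and turning a file name into a path inside it before transcription. Both helpers are hypothetical and written for illustration only; the toolkit's actual implementation may differ.

```python
from pathlib import Path
from typing import List


def list_files(base_dir: Path) -> List[str]:
    """Return the names of regular files directly inside base_dir, sorted."""
    return sorted(p.name for p in base_dir.iterdir() if p.is_file())


def resolve_audio_path(base_dir: Path, file_name: str) -> Path:
    """Map a bare file name to a path under base_dir, failing early if absent."""
    path = base_dir / file_name
    if not path.is_file():
        raise FileNotFoundError(f"No such audio file: {path}")
    return path
```

This mirrors the agent instructions in the example: the model first discovers file names, then passes one of those names to the transcription step.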

Information