MLX Transcribe

MLX Transcribe is a tool for transcribing audio files using MLX Whisper.

Prerequisites

Install ffmpeg
- macOS: `brew install ffmpeg`
- Ubuntu: `sudo apt-get install ffmpeg`
- Windows: download from https://ffmpeg.org/download.html

Install the mlx-whisper library
```
pip install mlx-whisper
```

Prepare audio files
- Create a `storage/audio` directory
- Place your audio files in this directory
- Supported formats: mp3, mp4, wav, etc.

Download sample audio (optional)
- Visit: https://www.ted.com/talks/reid_hoffman_and_kevin_scott_the_evolution_of_ai_and_how_it_will_impact_human_creativity
- Save the audio file to the `storage/audio` directory
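
Before running the agent, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the toolkit: `check_prerequisites` is a hypothetical helper name, and it only checks that `ffmpeg` is on the `PATH`, that the `mlx_whisper` package is importable, and that the audio directory exists (creating it if needed).

```python
import importlib.util
import shutil
from pathlib import Path


def check_prerequisites(audio_dir: Path) -> dict:
    """Report whether each prerequisite from the list above is satisfied.

    Hypothetical helper for illustration; not part of MLX Transcribe.
    """
    # Create the audio directory if it is missing (safe to call repeatedly).
    audio_dir.mkdir(parents=True, exist_ok=True)
    return {
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the package cannot be imported.
        "mlx_whisper": importlib.util.find_spec("mlx_whisper") is not None,
        "audio_dir": audio_dir.is_dir(),
    }


if __name__ == "__main__":
    status = check_prerequisites(Path("storage/audio"))
    for name, ok in status.items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Any entry reported as `missing` points back to the corresponding step in the list above.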

Example

The following agent uses MLX Transcribe to transcribe audio files.

cookbook/tools/mlx_transcribe_tools.py


```python
from pathlib import Path
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.mlx_transcribe import MLXTranscribe

# Get audio files from storage/audio directory
phidata_root_dir = Path(__file__).parent.parent.parent.resolve()
audio_storage_dir = phidata_root_dir.joinpath("storage/audio")
if not audio_storage_dir.exists():
    audio_storage_dir.mkdir(exist_ok=True, parents=True)

agent = Agent(
    name="Transcription Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[MLXTranscribe(base_dir=audio_storage_dir)],
    instructions=[
        "To transcribe an audio file, use the `transcribe` tool with the name of the audio file as the argument.",
        "You can find all available audio files using the `read_files` tool.",
    ],
    markdown=True,
)

agent.print_response("Summarize the reid hoffman ted talk, split into sections", stream=True)
```
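
To verify the transcription step independently of the agent, the mlx-whisper library can also be called directly. The sketch below assumes the `mlx_whisper.transcribe` function with its `path_or_hf_repo` argument, as described in the mlx-whisper README; `transcribe_file` is a hypothetical wrapper name, and this is not necessarily how the toolkit calls the library internally.

```python
from pathlib import Path

# Model repo taken from the toolkit's documented default for `path_or_hf_repo`.
DEFAULT_MODEL = "mlx-community/whisper-large-v3-turbo"


def transcribe_file(audio_path: Path, model: str = DEFAULT_MODEL) -> str:
    """Return the transcript text for one audio file (illustrative wrapper)."""
    if not audio_path.is_file():
        raise FileNotFoundError(f"No audio file at {audio_path}")
    # Imported lazily so this module loads even where mlx-whisper is not installed.
    import mlx_whisper

    result = mlx_whisper.transcribe(str(audio_path), path_or_hf_repo=model)
    return result["text"]
```

If this works on a file in `storage/audio`, any remaining problem is more likely in the agent or model configuration than in the transcription setup.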

Toolkit Params

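| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `base_dir` | `Path` | `Path.cwd()` | Base directory for audio files |
| `read_files_in_base_dir` | `bool` | `True` | Whether to register the `read_files` function |
| `path_or_hf_repo` | `str` | `"mlx-community/whisper-large-v3-turbo"` | Local path or Hugging Face repo for the model |
| `verbose` | `bool` | `None` | Whether to print the text being decoded to the console |
| `temperature` | `float` or `Tuple[float, ...]` | `None` | Sampling temperature; a tuple is tried in order as a fallback when decoding fails the thresholds below |
| `compression_ratio_threshold` | `float` | `None` | Treat decoding as failed if the gzip compression ratio of the output is above this value |
| `logprob_threshold` | `float` | `None` | Treat decoding as failed if the average log probability of the sampled tokens is below this value |
| `no_speech_threshold` | `float` | `None` | Treat a segment as silent if its no-speech probability is above this value and its average log probability is below `logprob_threshold` |
| `condition_on_previous_text` | `bool` | `None` | Whether to feed the previous output to the model as a prompt for the next window |
| `initial_prompt` | `str` | `None` | Text given to the model as a prompt for the first window |
| `word_timestamps` | `bool` | `None` | Enable word-level timestamps |
| `prepend_punctuations` | `str` | `None` | With `word_timestamps`, punctuation symbols to merge with the following word |
| `append_punctuations` | `str` | `None` | With `word_timestamps`, punctuation symbols to merge with the preceding word |
| `clip_timestamps` | `str` or `List[float]` | `None` | Start and end timestamps (in seconds) of the clips to process, as `start,end,start,end,...` |
| `hallucination_silence_threshold` | `float` | `None` | With `word_timestamps`, skip silent periods longer than this many seconds when a possible hallucination is detected |
| `decode_options` | `dict` | `None` | Additional decoding options |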
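
As an illustration of the parameters above, the sketch below collects a few options in a dictionary, drops the ones left at `None`, and passes the rest to the `MLXTranscribe` constructor. The option values (such as the `initial_prompt` text) and the helper names `drop_unset` and `build_toolkit` are assumptions made for this example; only the parameter names come from the table.

```python
from pathlib import Path
from typing import Any, Dict

# Example option values; the keys are parameter names from the table above.
options: Dict[str, Any] = {
    "base_dir": Path("storage/audio"),
    "path_or_hf_repo": "mlx-community/whisper-large-v3-turbo",
    "word_timestamps": True,
    "initial_prompt": "A TED talk about artificial intelligence.",
    "temperature": None,  # left unset on purpose
    "verbose": None,      # left unset on purpose
}


def drop_unset(opts: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of opts without the entries whose value is None."""
    return {key: value for key, value in opts.items() if value is not None}


def build_toolkit(opts: Dict[str, Any]):
    """Construct the toolkit from the options that were actually set."""
    # Imported lazily so the helpers above can be used without phidata installed.
    from phi.tools.mlx_transcribe import MLXTranscribe

    return MLXTranscribe(**drop_unset(opts))
```

The resulting object can be passed in the agent's `tools` list in place of `MLXTranscribe(base_dir=audio_storage_dir)`.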

Toolkit Functions

| Function | Description |
| --- | --- |
| `transcribe` | Transcribes an audio file using MLX Whisper |
| `read_files` | Lists all audio files in the base directory |
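
The toolkit's own `read_files` implementation is not shown on this page. As a rough stand-in for checking which files the agent is likely to see, the hypothetical helper below lists files in a base directory by extension. The extension set is an assumption based on the formats named in the prerequisites, and the real function may filter differently or not at all.

```python
from pathlib import Path
from typing import List

# Assumed extensions, taken from the formats named in the prerequisites.
AUDIO_SUFFIXES = {".mp3", ".mp4", ".wav"}


def list_audio_files(base_dir: Path) -> List[str]:
    """Return the sorted names of audio files directly inside base_dir."""
    return sorted(
        path.name
        for path in base_dir.iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )
```

A file name returned by this helper is the kind of argument the agent is instructed to pass to `transcribe`.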

Information

View on Github
