Create a file image_to_text.py with the following code:

image_to_text.py
from pathlib import Path

from phi.agent import Agent
from phi.model.openai import OpenAIChat

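# gpt-4o accepts image input; the OpenAI client reads OPENAI_API_KEY
# from the environment by default.
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    markdown=True,  # ask for the answer to be formatted as Markdown
)

# Path to the image, which is expected to sit next to this script.
image_path = Path(__file__).parent.joinpath("multimodal-agents.jpg")
agent.print_response(
    "Write a 3 sentence fiction story about the image",
    images=[str(image_path)],
)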
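
The script above assumes two things that are easy to miss: an OpenAI API key in the environment and an image file next to the script. A minimal sketch of a check for both follows. The helper name `preflight_problems` is hypothetical and not part of phidata; it uses only the standard library.

```python
import os
from pathlib import Path


def preflight_problems(image_path: Path) -> list[str]:
    """Return human-readable problems that would stop the agent from running.

    Hypothetical helper for illustration; not part of phidata.
    """
    problems = []
    # The OpenAI client looks for its key in this variable by default.
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # The script passes this path to the agent, so the file must exist.
    if not image_path.is_file():
        problems.append(f"image not found: {image_path}")
    return problems
```

Calling `preflight_problems(image_path)` before `agent.print_response(...)` and printing any returned messages gives a clearer failure than an error raised partway through the request.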

Usage

1

Create a virtual environment

Open a terminal and create a Python virtual environment.

2

Install libraries

pip install openai phidata

3

Run the agent

python image_to_text.py
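
Step 1 above does not show a command for creating the virtual environment. One possible way, sketched here with Python's standard `venv` module, is equivalent to running `python -m venv .venv` in the terminal. The function name `create_env` and the directory name `.venv` are assumptions chosen for illustration.

```python
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return its pyvenv.cfg path.

    Hypothetical helper for illustration; the directory name is up to you.
    """
    # with_pip=True bootstraps pip so the libraries in step 2 can be installed.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"
```

For example, `create_env(Path(".venv"))` creates the environment in the current directory. Activation still happens in the shell: `source .venv/bin/activate` on macOS and Linux, or `.venv\Scripts\activate` on Windows, before running the `pip install` and `python` commands.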