Junior DE is an AI app that lets data analysts, engineers and scientists offload daily, mundane data tasks. It can:

  • Write Python scripts for processing data and creating charts.
  • Analyze CSV, Parquet or JSON data using DuckDB.
  • Write complex queries including joins and filtering.
  • Run data transformations and export the results.
  • Explain analysis step-by-step.

The best part is that it inspects, validates and runs code like a regular engineer. It also handles long, multi-turn conversations well. Follow along to build your own Junior DE.

Setup

1. Create a virtual environment

Open the Terminal and create a Python virtual environment:

python3 -m venv ~/.venvs/aienv
source ~/.venvs/aienv/bin/activate

2. Install phidata

Install phidata using pip:

pip install -U "phidata[aws]"

3. Install Docker

Install Docker Desktop to run your app locally.
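
The three setup steps above can be checked with a short script before moving on. This is a minimal sketch using only the Python standard library; the helper names are illustrative, and it assumes that Docker Desktop puts a `docker` command on your PATH.

```python
import importlib.metadata
import shutil
import sys
from typing import Optional


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return prefix != base_prefix


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def on_path(tool: str) -> bool:
    """Return True when an executable with this name is found on PATH."""
    return shutil.which(tool) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("phidata version:", installed_version("phidata") or "not installed")
    # Assumption: Docker Desktop exposes the `docker` CLI on PATH.
    print("docker CLI on PATH:", on_path("docker"))
```

Run it with the virtual environment activated. If any line reports a problem, repeat the matching step.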

Create your codebase

Create your codebase using the junior-de template

phi ws create -t junior-de -n junior-de

This will create a folder junior-de with the following structure:

junior-de                     # root directory for your junior-de
├── ai                      # directory for AI components
│   ├── duckgpt             # directory for the DuckDb DE
│   ├── pygpt               # directory for the Python DE
│   ├── assistants          # AI assistants
│   ├── knowledge_base.py   # assistant knowledge base
│   └── storage.py          # assistant storage
├── app                     # directory for Streamlit apps
├── db                      # directory for database components
├── notebooks               # directory for Jupyter notebooks
├── Dockerfile              # Dockerfile for the application
├── pyproject.toml          # python project definition
├── requirements.txt        # python dependencies managed by pyproject.toml
├── scripts                 # directory for helper scripts
├── utils                   # directory for utilities
└── workspace               # phidata workspace directory
    ├── dev_resources.py    # dev resources running locally
    ├── prd_resources.py    # production resources running on AWS
    ├── jupyter             # jupyter notebook resources
    ├── secrets             # directory for storing secrets
    └── settings.py         # phidata workspace settings
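
To confirm that the template was created as expected, you could compare the new folder against the listing above. The sketch below checks a subset of the paths from that listing; the `missing_paths` helper is illustrative and not part of phidata.

```python
from pathlib import Path

# A subset of the paths in the listing above, relative to the junior-de root.
EXPECTED_PATHS = [
    "ai/duckgpt",
    "ai/pygpt",
    "app",
    "workspace/dev_resources.py",
    "workspace/prd_resources.py",
    "workspace/settings.py",
    "Dockerfile",
    "pyproject.toml",
    "requirements.txt",
]


def missing_paths(root: Path, expected=EXPECTED_PATHS) -> list:
    """Return the expected relative paths that do not exist under root."""
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(Path("junior-de"))
    if missing:
        print("Missing from junior-de:", ", ".join(missing))
    else:
        print("junior-de matches the expected layout")
```

Run it from the directory where you ran `phi ws create`.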

Set OpenAI Key

Set your OPENAI_API_KEY as an environment variable. You can get one from OpenAI.

export OPENAI_API_KEY=sk-***
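
A quick way to confirm the variable is visible to programs started from the same shell is to read it back. This is a minimal sketch; the `openai_key_status` helper is illustrative, and it deliberately avoids printing the full key.

```python
import os


def openai_key_status(env=os.environ) -> str:
    """Describe whether OPENAI_API_KEY is set, without revealing the secret."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "missing"
    # Show only a short prefix so the full key never lands in logs.
    return f"set ({key[:3]}..., {len(key)} characters)"


if __name__ == "__main__":
    print("OPENAI_API_KEY:", openai_key_status())
```

If the result is "missing", re-run the export command in the same terminal session that you use to start the workspace.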

Run Junior DE locally

Start your Junior DE using:

phi ws up

Press Enter to confirm and allow a few minutes for the image to download (first run only). You can verify container status and view logs on the Docker dashboard.
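
Besides the Docker dashboard, you can check from Python whether the app is accepting connections yet. The sketch below tries a plain TCP connection to port 8501, the port this guide uses for the Junior DE app. The `is_listening` helper is illustrative, and a successful connection only shows that something is listening on the port, not that the app has finished loading.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("app reachable on localhost:8501:", is_listening("localhost", 8501))
```

The same helper can be pointed at port 8888 if you enable the optional JupyterLab described later in this guide.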

DuckGPT: Automate Data Analysis using DuckDB

  • Open localhost:8501 to access your Junior DE.
  • Your first Junior DE is DuckGPT, which can write and run SQL queries using DuckDB.
  • Click on DuckGPT and enter a username.
  • Message “Show me revenue over time”
  • See your Junior DE work through the problem.
  • Message “Save it” to save the query to the ai/duckgpt/scratch folder.


Add your data

DuckGPT tables are defined in the ai/duckgpt/knowledge/tables.json file.

  • You can add csv, json or parquet files stored locally or on s3.
  • You can also add txt files to provide more information to the Assistant.
  • Click Update Knowledge Base to load the knowledge base.
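
Because the table definitions live in a hand-edited JSON file, it can help to confirm that the file still parses before you update the knowledge base. The sketch below checks JSON syntax only. It makes no assumption about the keys that tables.json expects, and the helper names are illustrative.

```python
import json
from pathlib import Path


def check_json_text(text: str) -> str:
    """Return "ok" if text parses as JSON, else a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return "ok"


def check_json_file(path: Path) -> str:
    """Apply check_json_text to a file, reporting a missing file clearly."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return f"not found: {path}"
    return check_json_text(text)


if __name__ == "__main__":
    # Path taken from this guide, relative to the junior-de root.
    print(check_json_file(Path("ai/duckgpt/knowledge/tables.json")))
```

A trailing comma or an unclosed bracket is reported with its line and column, which is usually enough to find the mistake.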

Message us on Discord if you need help.

How DuckGPT works

DuckGPT uses the DuckDbAssistant defined in the ai/duckgpt/duckgpt.py file. You can customize your assistant and adapt the Junior DE to your workflow.

PyGPT: Automate Data Analysis using Python

  • Your next Junior DE is PyGPT, which can write Python scripts to process data, create charts and more. Click on PyGPT.
  • Message “Show me a chart of revenue per year”
  • See your Junior DE work through the problem.
  • Each script that PyGPT creates and runs is saved to the ai/pygpt/scratch folder for reference.
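
To review what PyGPT has written, you could list the most recent files in the scratch folder. This is a minimal sketch; the `latest_scripts` helper is illustrative, and it assumes you run it from the junior-de root so that the relative path resolves.

```python
from pathlib import Path


def latest_scripts(scratch_dir: Path, limit: int = 5) -> list:
    """Return up to `limit` files from scratch_dir, newest first."""
    if not scratch_dir.is_dir():
        return []
    files = [p for p in scratch_dir.iterdir() if p.is_file()]
    # Sort by modification time so the most recently written script is first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for script in latest_scripts(Path("ai/pygpt/scratch")):
        print(script.name)
```

An empty result means either that the folder does not exist yet or that PyGPT has not saved any scripts.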


Add your data

PyGPT files are defined in the ai/pygpt/knowledge/files.json file.

  • You can add csv, json or parquet files stored locally or on s3.
  • You can also add txt files to provide more information to the Assistant.
  • Click Update Knowledge Base to load the knowledge base.

Message us on Discord if you need help.

How PyGPT works

PyGPT uses the PythonAssistant defined in the ai/pygpt/pygpt_streamlit.py file. You can customize your assistant and adapt the Junior DE to your workflow.

Optional: Run JupyterLab

A Jupyter notebook is a must-have for AI development, and your junior-de comes with a notebook pre-installed with the required dependencies. To start your notebook:

1. Enable Jupyter

Update the workspace/settings.py file and set dev_jupyter_enabled=True:

workspace/settings.py
...
ws_settings = WorkspaceSettings(
    ...
    # Uncomment the following line
    dev_jupyter_enabled=True,
...

2. Start Jupyter

phi ws up --group jupyter

Press Enter to confirm and allow a few minutes for the image to download (first run only). You can verify container status and view logs on the Docker dashboard.

3. View JupyterLab UI

  • Open localhost:8888 to view the JupyterLab UI. Password: admin
  • Open notebooks/duckgpt to play with DuckGPT.

Delete local resources

When you are done experimenting, stop the workspace using:

phi ws down

Or stop individual apps using:

phi ws down --group app
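
If you start and stop the workspace often, the commands in this guide can be wrapped in a small helper. The sketch below only builds and runs the `phi ws up` and `phi ws down` invocations shown above, with the optional `--group` flag. The function names are illustrative and not part of phidata.

```python
import subprocess


def phi_ws_command(action: str, group: str = "") -> list:
    """Build a `phi ws up` or `phi ws down` command, optionally for one group."""
    if action not in ("up", "down"):
        raise ValueError(f"unsupported action: {action!r}")
    command = ["phi", "ws", action]
    if group:
        command += ["--group", group]
    return command


def run_phi_ws(action: str, group: str = "") -> int:
    """Run the command in the current directory and return its exit code."""
    # stdin stays attached to the terminal so that any confirmation
    # prompt (such as the one described for `phi ws up`) still works.
    return subprocess.run(phi_ws_command(action, group)).returncode


if __name__ == "__main__":
    # Print the command only; call run_phi_ws("down", "app") to execute it.
    print(" ".join(phi_ws_command("down", "app")))
```

Call `run_phi_ws` from the junior-de root with the virtual environment active, so that the `phi` command resolves to the one you installed.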

Upcoming Upgrades

Junior DE is a v0 release, so expect plenty of bugs and plenty of upgrades. Here’s what we have in the works:

  • Add a Snowflake Assistant.
  • Ask questions via Slack.
  • Add an Airflow Assistant.
  • Add the ability to write and test data pipelines.

Message us on Discord if you want to beta-test.

Next

Congratulations on running your Junior DE locally. Next Steps: