This section covers containerization, API serving, experiment tracking, and automation. You'll also build a complete RAG agent project.
14. FastAPI for Model Serving (Intermediate)
Create src/main.py to build a production-ready API.
import os
from dotenv import load_dotenv
from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
# Load environment variables first
load_dotenv()
# Initialize rate limiter
limiter = Limiter(key_func=get_remote_address)
app = FastAPI(title="AI Agent Server", version="1.0.0")
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
# --- State object to hold the model ---
class AppState:
    llm = None

# --- Pydantic model for request body ---
class PredictionRequest(BaseModel):
    prompt: str
    model: str = "gpt-4o-mini"
    temperature: float = 0.5

# --- App startup event to initialize the model ---
@app.on_event("startup")
async def startup_event():
    # Initialize with error handling
    if not os.getenv("OPENAI_API_KEY"):
        print("CRITICAL: OPENAI_API_KEY is not set. The /predict endpoint will not work.")
        AppState.llm = None
    else:
        try:
            AppState.llm = ChatOpenAI(model="gpt-4o-mini")
            print("OpenAI client initialized successfully.")
        except Exception as e:
            print(f"CRITICAL: Could not initialize OpenAI client: {e}")
            AppState.llm = None

# --- API Endpoints ---
@app.post("/predict")
@limiter.limit("5/minute")  # Limit to 5 requests per minute per IP
async def predict(payload: PredictionRequest, request: Request):
    # slowapi requires a parameter named `request` of type starlette Request,
    # so the JSON body arrives as `payload`.
    if AppState.llm is None:
        raise HTTPException(
            status_code=503,
            detail="AI model is not available. Check server logs for initialization errors."
        )
    try:
        messages = [HumanMessage(content=payload.prompt)]
        # Use the model instance from the app state
        # (this minimal example ignores the optional `model` and `temperature` fields)
        response = AppState.llm.invoke(messages)
        return {"response": response.content}
    except Exception as e:
        # Catch potential API errors during invocation
        raise HTTPException(status_code=500, detail=f"An error occurred while processing the request: {e}")

@app.get("/health")
async def health_check():
    health_status = {
        "status": "healthy",
        "message": "AI Agent Server is running",
        "model_initialized": AppState.llm is not None
    }
    return health_status
Run the development server:
pip install fastapi==0.115.6 uvicorn==0.32.1 pydantic==2.10.1 slowapi==0.1.9
uvicorn src.main:app --reload --port 8000
Navigate to http://localhost:8000/docs to see interactive API documentation.
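You can also exercise the endpoint from the command line with curl (the prompt text below is arbitrary; remember the endpoint is limited to 5 requests per minute per IP):
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a one-line greeting."}'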
Production Note: For production, consider adding request timeouts and authentication to your FastAPI endpoints. See FastAPI Security for best practices.
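As a starting point, here is a minimal sketch of header-based API-key authentication using FastAPI's built-in APIKeyHeader; the header name X-API-Key and the API_KEY environment variable are illustrative choices, not part of the code above:
# Minimal API-key auth sketch (header name and env var are illustrative)
import os
from fastapi import Depends, HTTPException
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

async def require_api_key(api_key: str = Depends(api_key_header)):
    # Reject requests whose key doesn't match the one configured in the environment
    if not api_key or api_key != os.getenv("API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

# Protect an endpoint by declaring the dependency:
# @app.post("/predict", dependencies=[Depends(require_api_key)])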
20. Integrated Mini-Project: RAG Agent with a FastAPI Endpoint (Advanced)
This final example ties together several concepts from this guide into a single, functional application. We will build a simple Retrieval-Augmented Generation (RAG) API using FastAPI, LangChain, and ChromaDB.
What this project demonstrates:
- Project Structure: Using the src/ directory for modular code
- Dependency Management: Using packages like fastapi, langchain, and chromadb
- Vector Databases: Setting up a persistent ChromaDB store
- Advanced Chains: Building a RAG chain with modern LangChain (LCEL)
- API Serving: Exposing the RAG chain through a secure FastAPI endpoint
- Environment Variables: Loading API keys correctly with python-dotenv
Project Goal
To create an API endpoint /query that accepts a question, searches a small knowledge base for relevant context, and uses an LLM to generate an answer based on that context.
Step 1: Update Project Structure and Dependencies
First, ensure your project has the following structure and that the necessary packages are installed. We will create three new files: src/vector_store.py, src/rag_chain.py, and src/main_rag_api.py.
my-ai-agent/
├── .venv/
├── .env
├── chroma_db/             # Will be created automatically by ChromaDB
├── src/
│   ├── __init__.py
│   ├── vector_store.py    # New: Logic for setting up ChromaDB
│   ├── rag_chain.py       # New: Logic for the RAG chain
│   └── main_rag_api.py    # New: The FastAPI application
└── ...
Ensure you have the required packages installed:
See fastapi, uvicorn, langchain, langchain-openai, langchain-community, openai, chromadb, python-dotenv, sentence-transformers on PyPI.
pip install fastapi uvicorn langchain langchain-openai langchain-community openai chromadb python-dotenv sentence-transformers
ChromaDB can fall back to a local embedding model (via the sentence-transformers package) if you don't provide an embedding function, but we will explicitly use OpenAI's embeddings for better performance.
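If you would rather keep embeddings fully local (for example, to avoid OpenAI embedding costs), a sketch of a drop-in alternative using langchain-community's HuggingFaceEmbeddings wrapper is shown below; the model name is just a common default and is not part of the project code:
# Optional sketch: local embeddings via sentence-transformers instead of OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings

# Downloads the model on first use; "all-MiniLM-L6-v2" is a small, widely used default
local_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# Pass `local_embeddings` to Chroma.from_documents(...) in src/vector_store.py
# in place of OpenAIEmbeddings().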
Step 2: Create the Vector Store (src/vector_store.py)
This module will handle setting up our document store. It will initialize ChromaDB, add documents to it, and create a retriever object that LangChain can use.
# src/vector_store.py
import chromadb
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

# Define the persistent directory
CHROMA_PATH = "./chroma_db"

def get_retriever():
    """
    Initializes and returns a ChromaDB retriever from a predefined set of documents.
    """
    # Sample documents for our knowledge base
    docs = [
        Document(
            page_content="VS Code is a lightweight but powerful source code editor from Microsoft.",
            metadata={"source": "doc1", "topic": "tools"}
        ),
        Document(
            page_content="A virtual environment is a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.",
            metadata={"source": "doc2", "topic": "python"}
        ),
        Document(
            page_content="RAG, or Retrieval-Augmented Generation, is a technique for enhancing the accuracy and reliability of large language models (LLMs) with facts fetched from external sources.",
            metadata={"source": "doc3", "topic": "ai"}
        ),
        Document(
            page_content="FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.8+ based on standard Python type hints.",
            metadata={"source": "doc4", "topic": "tools"}
        ),
    ]

    # Initialize OpenAI embeddings
    embeddings = OpenAIEmbeddings()

    # Create a persistent ChromaDB client
    # This will save the vector store to disk in the 'chroma_db' directory
    db_client = chromadb.PersistentClient(path=CHROMA_PATH)

    # Create or load the vector store
    # (note: from_documents re-adds these sample docs each time it is called)
    vectorstore = Chroma.from_documents(
        documents=docs,
        embedding=embeddings,
        client=db_client,
        persist_directory=CHROMA_PATH
    )

    # Create and return a retriever
    # 'k=2' means it will retrieve the top 2 most relevant documents
    return vectorstore.as_retriever(search_kwargs={"k": 2})

if __name__ == '__main__':
    # A simple test to verify the retriever is working
    print("Initializing and testing the vector store...")
    retriever = get_retriever()
    test_query = "What is RAG?"
    results = retriever.invoke(test_query)
    print(f"Retrieved {len(results)} documents for query: '{test_query}'")
    for doc in results:
        print(f"- {doc.page_content}")
    print("\nVector store setup complete and verified.")
Step 3: Create the RAG Chain (src/rag_chain.py)
This module defines the core logic of our AI. It imports the retriever from the previous step and chains it together with a prompt template and an LLM to create the final RAG chain.
# src/rag_chain.py
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from src.vector_store import get_retriever

def get_rag_chain():
    """
    Creates and returns a RAG chain using the vector store retriever.
    """
    retriever = get_retriever()

    # RAG prompt template
    template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Keep the answer concise.

Context: {context}

Question: {question}

Answer:"""
    prompt = ChatPromptTemplate.from_template(template)

    # Initialize the LLM
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    # Create the RAG chain using LangChain Expression Language (LCEL)
    rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    return rag_chain

if __name__ == '__main__':
    # A simple test to verify the chain is working
    print("Testing the RAG chain...")
    chain = get_rag_chain()
    response = chain.invoke("What is FastAPI?")
    print(response)
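Both src/vector_store.py and src/rag_chain.py include a small __main__ self-test. Assuming your OPENAI_API_KEY is exported in the shell (neither module calls load_dotenv itself), you can run them from the project root:
python -m src.vector_store
python -m src.rag_chain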
Step 4: Build the FastAPI App (src/main_rag_api.py)
This is the entry point for our API. It loads the RAG chain, defines the request and response models, and creates an endpoint to handle user queries.
# src/main_rag_api.py
import os
from functools import lru_cache

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from dotenv import load_dotenv
from src.rag_chain import get_rag_chain

# Load environment variables from .env file
load_dotenv()

# Initialize the FastAPI app
app = FastAPI(
    title="RAG API Server",
    version="1.0",
    description="A simple API server for a Retrieval-Augmented Generation agent.",
)

@lru_cache(maxsize=1)
def get_cached_rag_chain():
    """Build the RAG chain once and reuse it across requests."""
    return get_rag_chain()

# --- Pydantic Models for Request and Response ---
class QueryRequest(BaseModel):
    question: str

class QueryResponse(BaseModel):
    answer: str

# --- API Endpoints ---
@app.get("/", summary="Health Check")
async def health_check():
    """A simple health check endpoint to confirm the server is running."""
    return {"status": "ok", "message": "RAG API is running"}

@app.post("/query", response_model=QueryResponse, summary="Query the RAG Agent")
async def query_agent(request: QueryRequest):
    """
    Receives a question, processes it through the RAG chain, and returns the answer.
    """
    if not os.getenv("OPENAI_API_KEY"):
        raise HTTPException(status_code=500, detail="OPENAI_API_KEY not found in environment variables.")
    if not request.question:
        raise HTTPException(status_code=400, detail="Question field cannot be empty.")
    try:
        # Get the cached RAG chain instance (built on the first request)
        rag_chain = get_cached_rag_chain()
        answer = rag_chain.invoke(request.question)
        return QueryResponse(answer=answer)
    except Exception as e:
        # A generic error handler for issues during chain invocation
        raise HTTPException(status_code=500, detail=f"An error occurred: {e}")

# To run this app:
# uvicorn src.main_rag_api:app --reload --port 8000
Step 5: Run and Test Your Integrated Application
With all the files in place, you can now run your API server.
- Ensure your .env file contains your OPENAI_API_KEY.
- Start the Server: Open your terminal (with the virtual environment activated) and run:
  uvicorn src.main_rag_api:app --reload --port 8000
- Test via Interactive Docs: Open your browser and navigate to http://127.0.0.1:8000/docs. You will see the FastAPI interface.
  - Expand the /query endpoint.
  - Click "Try it out".
  - Enter a question in the request body, such as: "What is the purpose of a virtual environment?"
  - Click "Execute". You should see the AI-generated response based on the context from your vector store.
- Test via curl (Optional):
curl -X POST "http://127.0.0.1:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What is VS Code?"}'
Expected output:
{"answer":"VS Code is a lightweight but powerful source code editor from Microsoft."}
Production Tip: For public deployments, add authentication and rate limiting to your FastAPI endpoints. See FastAPI Security for best practices.
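Here is a minimal sketch of how the /query endpoint from Step 4 could be rate-limited with slowapi, the same library used in the FastAPI section above. The "10/minute" limit is an arbitrary example, and the names app, QueryRequest, QueryResponse, and get_cached_rag_chain from src/main_rag_api.py are assumed:
# Sketch: add per-IP rate limiting to src/main_rag_api.py with slowapi
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/query", response_model=QueryResponse, summary="Query the RAG Agent")
@limiter.limit("10/minute")  # example limit; slowapi needs a `request: Request` parameter
async def query_agent(payload: QueryRequest, request: Request):
    # Same logic as Step 4, with the JSON body renamed to `payload`
    answer = get_cached_rag_chain().invoke(payload.question)
    return QueryResponse(answer=answer)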