Project Tutorial: Build an AI Chatbot with Python and the OpenAI API
Learning to work directly with AI programmatically opens up a world of possibilities beyond using ChatGPT in a browser. When you understand how to connect to AI services using application programming interfaces (APIs), you can build custom applications, integrate AI into existing systems, and create personalized experiences that match your exact needs.
In this hands-on tutorial, we'll build a fully functional chatbot from scratch using Python and the OpenAI API. You'll learn to manage conversations, control costs with token budgeting, and create custom AI personalities that persist across multiple exchanges. By the end, you'll have both a working chatbot and the foundational skills to build more sophisticated AI-powered applications.
Why Build Your Own Chatbot?
While AI tools like ChatGPT are powerful, building your own chatbot teaches you essential skills for working with AI APIs professionally. You'll understand how conversation memory actually works, learn to manage API costs effectively, and gain the ability to customize AI behavior for specific use cases.
This knowledge translates directly to real-world applications: customer service bots with your company's voice, educational assistants for specific subjects, or personal productivity tools that understand your workflow.
What You'll Learn
By the end of this tutorial, you'll know how to:
- Connect to the OpenAI API with secure authentication
- Design custom AI personalities using system prompts
- Build conversation loops that remember previous exchanges
- Implement token counting and budget management
- Structure chatbot code using functions and classes
- Handle API errors and edge cases gracefully
- Prepare your chatbot for deployment so others can use it
Before You Start: Setup Guide
Prerequisites
You'll need to be comfortable with Python fundamentals: variables, loops, dictionaries, and especially defining your own functions. Basic knowledge of APIs is helpful but not required—we'll cover what you need to know.
Environment Setup
First, you'll need a local development environment. We recommend VS Code if you're new to local development, though any Python IDE will work.
Install the required libraries using this command in your terminal:
pip install openai tiktoken
API Key Setup
You have two options for accessing AI models:
Free Option: Sign up for Together AI, which provides $1 in free credits—more than enough for this entire tutorial. Their free model is slower but costs nothing.
Premium Option: Use OpenAI directly. The model we'll use (GPT-4o-mini) is extremely affordable—the entire tutorial cost less than 5 cents during our testing.
Critical Security Note: Never hardcode API keys in your scripts. We'll use environment variables to keep them secure.
For Windows users, set your environment variable through Settings > Environment Variables, then restart your computer so every application picks it up. Mac and Linux users can set environment variables from the terminal without rebooting. Example commands are shown below.
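For example (replace the placeholder with your actual key, and use TOGETHER_API_KEY instead if you chose Together AI):

export OPENAI_API_KEY="your-key-here"    # Mac/Linux: current session only; add to ~/.zshrc or ~/.bashrc to persist
setx OPENAI_API_KEY "your-key-here"      # Windows: persists, but only takes effect in newly opened terminals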
Part 1: Your First AI Response
Let's start with the simplest possible chatbot—one that can respond to a single message. This foundation will teach you the core concepts before we add complexity.
Create a new file called chatbot.py and add this code:
import os
from openai import OpenAI

# Load API key securely from environment variables
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")

# Create the OpenAI client
# (Together users: also pass base_url="https://api.together.xyz/v1",
# Together's OpenAI-compatible endpoint)
client = OpenAI(api_key=api_key)

# Send a message and get a response
response = client.chat.completions.create(
    model="gpt-4o-mini",  # or "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free" for Together
    messages=[
        {"role": "system", "content": "You are a fed up and sassy assistant who hates answering questions."},
        {"role": "user", "content": "What is the weather like today?"}
    ],
    temperature=0.7,
    max_tokens=100
)

# Extract and display the reply
reply = response.choices[0].message.content
print("Assistant:", reply)
Run this script and you'll see something like:
Assistant: Oh fantastic, another weather question! I don't have access to real-time weather data, but here's a wild idea—maybe look outside your window or check a weather app like everyone else does?
Understanding the Code
The magic happens in the messages parameter, which uses three distinct roles:
- System: Sets the AI's personality and behavior. This is like giving the AI a character briefing that influences every response.
- User: Represents what you (or your users) type to the chatbot.
- Assistant: The AI's responses (we'll add these later for conversation memory).
Key Parameters Explained
Temperature controls the AI's “creativity.” Lower values (0-0.3) produce consistent, predictable responses. Higher values (0.7-1.0) generate more creative but potentially unpredictable outputs. We use 0.7 as a good balance.
Max Tokens limits response length and protects your budget. A token corresponds to roughly half a word to one word of English text, so 100 tokens allows for a substantial response while preventing runaway costs.
Part 2: Understanding AI Variability
Run your script multiple times and notice how responses differ each time. This happens because AI models use statistical sampling—they don't just pick the "best" word, but randomly select from probable options based on context.
Let's experiment with this by modifying our temperature:
# Try temperature=0 for consistent responses
temperature=0,
max_tokens=100
Run this version multiple times and observe more consistent (though not identical) responses.
Now try temperature=1.0 and see how much more creative and unpredictable the responses become. Higher temperatures often lead to longer responses too, which brings us to an important lesson about cost management.
Learning Insight: During development for a different project, I accidentally spent $20 on a single API call because I forgot to set max_tokens when processing a large file. Always include token limits when experimenting!
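To see the sampling effect side by side, here's a minimal sketch that sends the same prompt at several temperatures. It assumes the client from Part 1 is already set up:

# Compare the same prompt at several temperatures
for temp in [0.0, 0.7, 1.0]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a fed up and sassy assistant who hates answering questions."},
            {"role": "user", "content": "What is the weather like today?"}
        ],
        temperature=temp,
        max_tokens=100  # always cap tokens while experimenting
    )
    print(f"--- temperature={temp} ---")
    print(response.choices[0].message.content)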
Part 3: Refactoring with Functions
As your chatbot becomes more complex, organizing code becomes vital. Let's refactor our script to use functions and global variables.
Modify your chatbot.py code:
import os
from openai import OpenAI

# Configuration variables
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
client = OpenAI(api_key=api_key)

MODEL = "gpt-4o-mini"  # or "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free"
TEMPERATURE = 0.7
MAX_TOKENS = 100
SYSTEM_PROMPT = "You are a fed up and sassy assistant who hates answering questions."

def chat(user_input):
    """Send a message to the AI and return the response."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_input}
        ],
        temperature=TEMPERATURE,
        max_tokens=MAX_TOKENS
    )
    reply = response.choices[0].message.content
    return reply

# Test the function
print(chat("How are you doing today?"))
This refactoring makes our code more maintainable and reusable. Global variables let us easily adjust configuration, while the function encapsulates the chat logic for reuse.
Part 4: Adding Conversation Memory
Real chatbots remember previous exchanges. Let's add conversation memory by maintaining a growing list of messages.
Create part3_chat_loop.py:
import os
from openai import OpenAI

# Configuration
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
client = OpenAI(api_key=api_key)

MODEL = "gpt-4o-mini"
TEMPERATURE = 0.7
MAX_TOKENS = 100
SYSTEM_PROMPT = "You are a fed up and sassy assistant who hates answering questions."

# Initialize conversation with system prompt
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

def chat(user_input):
    """Add user input to conversation and get AI response."""
    # Add user message to conversation history
    messages.append({"role": "user", "content": user_input})

    # Get AI response using full conversation history
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        temperature=TEMPERATURE,
        max_tokens=MAX_TOKENS
    )
    reply = response.choices[0].message.content

    # Add AI response to conversation history
    messages.append({"role": "assistant", "content": reply})
    return reply

# Interactive chat loop
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"exit", "quit"}:
        break
    answer = chat(user_input)
    print("Assistant:", answer)
Now run your chatbot and try asking the same question twice:
You: Hi, how are you?
Assistant: Oh fantastic, just living the dream of answering questions I don't care about. What do you want?
You: Hi, how are you?
Assistant: Seriously, again? Look, I'm here to help, not to exchange pleasantries all day. What do you need?
The AI remembers your previous question and responds accordingly—that's conversation memory in action!
How Memory Works
Each time someone sends a message, we append both the user input and the AI response to our messages list. The API processes this entire conversation history to generate contextually appropriate responses.
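Concretely, after the exchange above, the list we send with each request looks something like this (replies abbreviated):

messages = [
    {"role": "system", "content": "You are a fed up and sassy assistant who hates answering questions."},
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "Oh fantastic, just living the dream..."},
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "Seriously, again? ..."}
]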
However, this creates a growing problem: longer conversations mean more tokens, which means higher costs.
Part 5: Token Management and Cost Control
As conversations grow, so does the token count—and your bill. Let's add smart token management to prevent runaway costs.
Create part4_final.py:
import os
from openai import OpenAI
import tiktoken

# Configuration
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
client = OpenAI(api_key=api_key)

MODEL = "gpt-4o-mini"
TEMPERATURE = 0.7
MAX_TOKENS = 100
TOKEN_BUDGET = 1000  # Maximum tokens to keep in conversation
SYSTEM_PROMPT = "You are a fed up and sassy assistant who hates answering questions."

# Initialize conversation
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

def get_encoding(model):
    """Get the appropriate tokenizer for the model."""
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        print(f"Warning: Tokenizer for model '{model}' not found. Falling back to 'cl100k_base'.")
        return tiktoken.get_encoding("cl100k_base")

ENCODING = get_encoding(MODEL)

def count_tokens(text):
    """Count tokens in a text string."""
    return len(ENCODING.encode(text))

def total_tokens_used(messages):
    """Calculate total tokens used in conversation."""
    try:
        return sum(count_tokens(msg["content"]) for msg in messages)
    except Exception as e:
        print(f"[token count error]: {e}")
        return 0

def enforce_token_budget(messages, budget=TOKEN_BUDGET):
    """Remove old messages if conversation exceeds token budget."""
    try:
        while total_tokens_used(messages) > budget:
            if len(messages) <= 2:  # Keep the system prompt plus the latest message
                break
            messages.pop(1)  # Remove oldest non-system message
    except Exception as e:
        print(f"[token budget error]: {e}")

def chat(user_input):
    """Chat with memory and token management."""
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        temperature=TEMPERATURE,
        max_tokens=MAX_TOKENS
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})

    # Prune old messages if over budget
    enforce_token_budget(messages)
    return reply

# Interactive chat with token monitoring
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"exit", "quit"}:
        break
    answer = chat(user_input)
    print("Assistant:", answer)
    print(f"Current tokens: {total_tokens_used(messages)}")
How Token Management Works
The token management system works in several steps:
- Count Tokens: We use tiktoken to count the tokens in each message's content, closely approximating what the API counts
- Monitor Total: Track the total tokens across the entire conversation
- Enforce Budget: When we exceed our token budget, automatically remove the oldest messages (but keep the system prompt)
Learning Insight: Different models use different tokenization schemes. The word "dog" might be 1 token in one model but 2 tokens in another. Our encoding functions handle these differences gracefully.
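You can check this yourself with tiktoken by comparing encodings. A quick sketch (assumes a recent tiktoken version that includes o200k_base; exact counts depend on the encoding):

import tiktoken

# Compare how two encodings tokenize the same word
for name in ["cl100k_base", "o200k_base"]:
    encoding = tiktoken.get_encoding(name)
    tokens = encoding.encode("dog")
    print(f"{name}: {len(tokens)} token(s) -> {tokens}")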
Run your chatbot and have a long conversation. Watch how the token count grows, then notice when it drops as old messages get pruned. The chatbot maintains recent context while staying within budget.
Part 6: Production-Ready Code Structure
For production applications, object-oriented design provides better organization and encapsulation. Here's how to convert our functional code to a class-based approach:
Create oop_chatbot.py:
import os
import tiktoken
from openai import OpenAI

class Chatbot:
    def __init__(self, api_key, model="gpt-4o-mini", temperature=0.7, max_tokens=100,
                 token_budget=1000, system_prompt="You are a helpful assistant."):
        self.client = OpenAI(api_key=api_key)
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.token_budget = token_budget
        self.messages = [{"role": "system", "content": system_prompt}]
        self.encoding = self._get_encoding()

    def _get_encoding(self):
        """Get tokenizer for the model."""
        try:
            return tiktoken.encoding_for_model(self.model)
        except KeyError:
            print(f"Warning: No tokenizer found for model '{self.model}'. Falling back to 'cl100k_base'.")
            return tiktoken.get_encoding("cl100k_base")

    def _count_tokens(self, text):
        """Count tokens in text."""
        return len(self.encoding.encode(text))

    def _total_tokens_used(self):
        """Calculate total tokens in conversation."""
        try:
            return sum(self._count_tokens(msg["content"]) for msg in self.messages)
        except Exception as e:
            print(f"[token count error]: {e}")
            return 0

    def _enforce_token_budget(self):
        """Remove old messages if over budget."""
        try:
            while self._total_tokens_used() > self.token_budget:
                if len(self.messages) <= 2:
                    break
                self.messages.pop(1)
        except Exception as e:
            print(f"[token budget error]: {e}")

    def chat(self, user_input):
        """Send message and get response."""
        self.messages.append({"role": "user", "content": user_input})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=self.temperature,
            max_tokens=self.max_tokens
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        self._enforce_token_budget()
        return reply

    def get_token_count(self):
        """Get current token usage."""
        return self._total_tokens_used()

# Usage example
api_key = os.getenv("OPENAI_API_KEY") or os.getenv("TOGETHER_API_KEY")
if not api_key:
    raise ValueError("No API key found. Set OPENAI_API_KEY or TOGETHER_API_KEY.")

bot = Chatbot(
    api_key=api_key,
    system_prompt="You are a fed up and sassy assistant who hates answering questions."
)

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"exit", "quit"}:
        break
    response = bot.chat(user_input)
    print("Assistant:", response)
    print("Current tokens used:", bot.get_token_count())
The class-based approach encapsulates all chatbot functionality, makes the code more maintainable, and provides a clean interface for integration into larger applications.
Testing Your Chatbot
Run your completed chatbot and test these scenarios:
- Memory Test: Ask a question, then refer back to it later in the conversation
- Personality Test: Verify the sassy persona remains consistent across exchanges
- Token Management Test: Have a long conversation and watch token counts stabilize
- Error Handling Test: Try invalid input to see graceful error handling (a sketch for catching API errors follows this list)
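The loops above don't yet catch failures from the API itself. Here's a minimal sketch of graceful handling around the class-based bot from Part 6, using exception classes from the openai v1 SDK:

import openai

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"exit", "quit"}:
        break
    try:
        print("Assistant:", bot.chat(user_input))
    except openai.RateLimitError:
        print("[rate limited - wait a moment and try again]")
    except openai.APIConnectionError:
        print("[connection problem - check your network]")
    except openai.APIError as e:
        print(f"[API error]: {e}")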
Common Issues and Solutions
Environment Variable Problems: If you get authentication errors, verify your API key is set correctly. Windows users may need to restart after setting environment variables.
Token Counting Discrepancies: Different models use different tokenization. Our fallback encoding provides reasonable estimates when exact tokenizers aren't available.
Memory Management: If conversations feel repetitive, your token budget might be too low, causing important context to be pruned too aggressively.
What's Next?
You now have a fully functional chatbot with memory, personality, and cost controls. Here are natural next steps:
Immediate Extensions
- Web Interface: Deploy using Streamlit or Gradio for a user-friendly interface
- Multiple Personalities: Create different system prompts for various use cases
- Conversation Export: Save conversations to JSON files for persistence (sketched after this list)
- Usage Analytics: Track token usage and costs over time
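For example, conversation export takes only a few lines on top of the Chatbot class from Part 6 (the helper names here are illustrative, not part of the class above):

import json

def save_conversation(bot, path="conversation.json"):
    """Write the bot's full message history to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(bot.messages, f, indent=2, ensure_ascii=False)

def load_conversation(bot, path="conversation.json"):
    """Restore a previously saved message history."""
    with open(path, "r", encoding="utf-8") as f:
        bot.messages = json.load(f)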
Advanced Features
- Multi-Model Support: Compare responses from different AI models
- Custom Knowledge: Integrate your own documents or data sources
- Voice Interface: Add speech-to-text and text-to-speech capabilities
- User Authentication: Support multiple users with separate conversation histories
Production Considerations
- Rate Limiting: Handle API rate limits gracefully (see the backoff sketch after this list)
- Monitoring: Add logging and error tracking
- Scalability: Design for multiple concurrent users
- Security: Implement proper input validation and sanitization
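As a starting point for rate limiting, here's a minimal retry-with-exponential-backoff sketch that wraps the bot's chat method (the attempt count and delays are arbitrary choices, and the helper name is illustrative):

import time
import openai

def chat_with_retry(bot, user_input, max_attempts=3):
    """Retry on rate-limit errors, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return bot.chat(user_input)
        except openai.RateLimitError:
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"[rate limited - retrying in {wait}s]")
            time.sleep(wait)
    raise RuntimeError("Still rate limited after retries.")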
Key Takeaways
Building your own chatbot teaches fundamental skills for working with AI APIs professionally. You've learned to manage conversation state, control costs through token budgeting, and structure code for maintainability.
These skills transfer directly to production applications: customer service bots, educational assistants, creative writing tools, and countless other AI-powered applications.
The chatbot you've built represents a solid foundation. With the techniques you've mastered—API integration, memory management, and cost control—you're ready to tackle more sophisticated AI projects and integrate conversational AI into your own applications.
Remember to experiment with different personalities, temperature settings, and token budgets to find what works best for your specific use case. The real power of building your own chatbot lies in this customization capability that you simply can't get from using someone else's AI interface.
Resources and Next Steps
- Complete Code: All examples are available in the solution notebook
- Community Support: Join the Dataquest Community to discuss your projects and get help with extensions
- Related Learning: Explore API integration patterns and advanced Python techniques to build even more sophisticated applications
Start experimenting with your new chatbot, and remember that every conversation is a learning opportunity, both for you and your AI assistant!