“The learning paths on Dataquest are incredible. They give you a direction through the learning process – you don’t have to guess what to learn next.”

Otávio Silveira

Data Analyst @ Hortifruti

Path overview

In this path, you’ll build the technical skills AI engineers need, including Python programming, working with LLM APIs, and prompt engineering. You’ll learn to build and deploy AI applications using FastAPI and Docker, then go deeper into machine learning, deep learning with PyTorch, embeddings, vector databases, and RAG systems. You’ll also pick up essential tooling like the command line, Git, and virtual environments.
Best of all, you’ll learn by doing – you’ll write code and get feedback directly in the browser. You’ll apply your skills to several guided projects involving realistic business scenarios to build your portfolio and prepare for your next interview.

Key skills

  • Developing core Python programming and tooling skills for AI engineering workflows
  • Interfacing with large language models through APIs, prompt engineering, and tool use
  • Building and deploying production AI applications using FastAPI and Docker
  • Analyzing and visualizing data using pandas, NumPy, and matplotlib
  • Applying supervised and unsupervised machine learning techniques with scikit-learn
  • Implementing deep learning models using PyTorch
  • Working with embeddings, vector databases, and semantic search
  • Designing and building retrieval-augmented generation (RAG) systems

Path outline

Part 1: Python Introduction [2 courses]

Course 1: Introduction to Python Programming 4h

Course Objectives
  • Save and update values using variables
  • Process numerical data and text data
  • Create and update lists using Python
  • Repeat a process using a for loop
  • Use logical and comparison operators to apply conditions to variables

Course 2: Python Dictionaries, APIs, and Functions 7h

Course Objectives
  • Create and update dictionaries
  • Use an API to get information from the web
  • Create your own functions
  • Complete a project using Jupyter Notebook

Part 2: Intermediate Python [2 courses]

Course 1: Intermediate Python for AI Engineering 12h

Course Objectives
  • Use object-oriented programming (OOP)
  • Create and use decorators
  • Write regular expressions (regex)
  • Use list comprehensions and lambda functions
  • Incorporate error handling for user input validation

Course 2: Tooling Essentials for Python Users 6h

Course Objectives
  • Master essential command-line tools to navigate, manage, and manipulate files and directories
  • Understand and implement virtual environments and environment variables to manage Python packages and configurations
  • Gain proficiency in using Git for version control to track changes, collaborate on projects, and manage code
  • Evaluate, select, and set up an Integrated Development Environment (IDE) to enhance productivity and streamline the Python development process

Part 3: LLM Fundamentals [3 courses]

Course 1: AI Chatbots: Harnessing the Power of Large Language Models with Chandra 3h

Course Objectives
  • Understand the basics of AI, machine learning, deep learning, natural language processing, and chatbots
  • Learn how to craft effective prompts and interact with chatbots to improve learning outcomes
  • Explore practical use cases for AI chatbots in education, work, and personal projects
  • Gain hands-on experience using Chandra on the Dataquest platform

Course 2: Prompting Large Language Models in Python 7h

Course Objectives
  • Utilize OpenAI's Chat Completions API to generate tailored AI-driven responses
  • Manage conversation histories to maintain context in AI conversations
  • Create custom Python functions for dynamic interactions with large language models
  • Learn prompt engineering techniques to guide AI responses effectively
  • Regulate token usage within the OpenAI API framework for efficient scripting
  • Adopt best practices in prompt engineering to improve the quality of AI-generated text

Course 3: Tool Use with LLMs in Python 6h

Course Objectives
  • Generate validated, structured outputs from LLM responses
  • Implement agentic loops that handle multi-step tool execution
  • Create reusable tool servers using the Model Context Protocol
  • Design prompt templates and pipelines for reliability
  • Handle errors and validation failures in LLM workflows

Part 4: AI Application Development [2 courses]

Course 1: APIs for AI Applications 6h

Course Objectives
  • Utilize APIs with GET requests
  • Master API query parameters, pagination, and JSON handling for AI applications in Python
  • Understand and apply various API authentication methods for AI data access

Course 2: Building AI Apps with FastAPI 8h

Course Objectives
  • Build LLM-powered APIs with FastAPI using Pydantic validation and async operations
  • Write Dockerfiles to containerize FastAPI applications
  • Define multi-service architectures with Docker Compose configuration files
  • Connect application services to PostgreSQL and persist data with volumes
  • Apply production-ready patterns including health checks, multi-stage builds, non-root users, and image version tagging

Part 5: Data Analysis and Visualization [3 courses]

Course 1: Introduction to Pandas and NumPy for Data Analysis 13h

Course Objectives
  • Improve your workflow using vectorized operations
  • Select data by value using Boolean indexing
  • Analyze data using pandas and NumPy

Course 2: Introduction to Data Visualization in Python 8h

Course Objectives
  • Visualize time series data with line plots
  • Define correlations and visualize them with scatter plots
  • Visualize frequency distributions with bar plots and histograms
  • Improve your exploratory data visualization workflow using pandas
  • Visualize multiple variables using Seaborn's relational plots

Course 3: Data Cleaning and Analysis in Python 11h

Course Objectives
  • Employ data aggregation techniques
  • Combine datasets
  • Transform and reshape data
  • Clean strings and resolve missing data

Part 6: Probability and Statistics [5 courses]

Course 1: Introduction to Statistics in Python 8h

Course Objectives
  • Sample data using simple random sampling, stratified sampling, and cluster sampling
  • Measure variables in statistics
  • Create frequency distribution tables

Course 2: Intermediate Statistics in Python 8h

Course Objectives
  • Summarize a distribution using the mean, the weighted mean, the median, or the mode
  • Measure the variability of a distribution using the variance and the standard deviation
  • Compare values using z-scores

Course 3: Introduction to Probability in Python 4h

Course Objectives
  • Estimate theoretical and empirical probabilities
  • Employ the fundamental rules of probability
  • Employ combinations and permutations

Course 4: Introduction to Conditional Probability in Python 6h

Course Objectives
  • Assign probabilities based on conditions
  • Assign probabilities based on event independence
  • Assign probabilities based on prior knowledge
  • Create spam filters using multinomial Naive Bayes

Course 5: Hypothesis Testing in Python 4h

Course Objectives
  • Perform a permutation test
  • Perform significance testing to understand an outcome's importance
  • Define regular and multi-category chi-squared tests

Part 7: Machine Learning Foundations [4 courses]

Course 1: Introduction to Supervised Machine Learning in Python 8h

Course Objectives
  • Establish a machine learning workflow
  • Implement the K-Nearest Neighbors algorithm for a classification task from scratch using pandas
  • Implement the K-Nearest Neighbors algorithm using scikit-learn
  • Evaluate a machine learning model
  • Find optimal hyperparameter values using grid search

Course 2: Introduction to Unsupervised Machine Learning in Python 6h

Course Objectives
  • Identify applications of unsupervised machine learning
  • Implement a basic k-means algorithm
  • Evaluate and optimize the performance of a k-means model
  • Visualize the model
  • Build a k-means model using scikit-learn

Course 3: Calculus For Machine Learning 2h

Course Objectives
  • Define mathematical functions using calculus
  • Employ intermediate machine learning techniques

Course 4: Linear Algebra For Machine Learning 3h

Course Objectives
  • Define linear systems using linear algebra
  • Employ intermediate machine learning techniques

Part 8: Intermediate Machine Learning with Python [5 courses]

Course 1: Linear Regression Modeling in Python 4h

Course Objectives
  • Describe a linear regression model
  • Construct a linear regression model and evaluate it based on the data
  • Interpret the results of a linear regression model
  • Use a linear regression model for inference and prediction

Course 2: Gradient Descent Modeling in Python 3h

Course Objectives
  • Code a basic Gradient Descent algorithm
  • Recognize the limitations of basic Gradient Descent
  • Contrast the basic Batch and Stochastic Gradient Descent uses
  • Visualize Stochastic Gradient Descent using Matplotlib
  • Apply Stochastic Gradient Descent in Python using scikit-learn

Course 3: Logistic Regression Modeling in Python 4h

Course Objectives
  • Describe a logistic regression model
  • Construct a logistic regression model and evaluate it based on the data
  • Interpret the results of a logistic regression model
  • Use a logistic regression model for inference and prediction

Course 4: Decision Tree and Random Forest Modeling in Python 6h

Course Objectives
  • Create, customize, and visualize decision trees
  • Use and interpret decision trees on new data
  • Calculate optimal decision paths
  • Optimize trees by altering their parameters
  • Apply the random forest prediction technique

Course 5: Optimizing Machine Learning Models in Python 4h

Course Objectives
  • Distinguish between different optimization techniques
  • Identify the best optimization approach for your project
  • Apply optimization methods to improve your model
  • Employ machine learning tools on various optimization methods

Part 9: Deep Learning Foundations [1 course]

Course 1: Deep Learning Applications in PyTorch 8h

Course Objectives
  • Understand how core deep learning concepts translate across different application areas
  • Identify common neural network architectures used for sequence modeling, NLP, and computer vision
  • Recognize how data representation differs between text, sequences, and images
  • Build and reason about PyTorch models for different deep learning tasks
  • Understand the tradeoffs and challenges unique to each application domain

Part 10: Embeddings and Vector Databases [2 courses]

Course 1: Understanding Embeddings 6h

Course Objectives
  • Generate embeddings using sentence-transformers and API services
  • Visualize high-dimensional embeddings using dimensionality reduction techniques
  • Implement similarity metrics including Euclidean distance, dot product, and cosine similarity
  • Build semantic search systems that find results by meaning rather than keywords
  • Understand tradeoffs between self-hosted and cloud-based embedding services

Course 2: Vector Databases and Search 12h

Course Objectives
  • Set up and query vector databases using ChromaDB with HNSW indexing
  • Implement and evaluate document chunking strategies for optimal retrieval
  • Build metadata filtering and hybrid search combining semantic and keyword matching
  • Compare production vector databases including pgvector, Qdrant, and Pinecone
  • Apply semantic caching and conversation memory patterns for LLM applications
  • Deploy a complete knowledge base search system with evaluation metrics

Part 11: RAG Systems [1 course]

Course 1: Introduction to Retrieval-Augmented Generation (RAG) 6h

Course Objectives
  • Understand what RAG is and the problems it solves compared to standalone language models
  • Build a complete RAG pipeline covering retrieval, context management, and grounded generation
  • Apply advanced retrieval techniques including query expansion and reranking
  • Design effective prompts for grounded generation with source attribution
  • Diagnose and resolve common RAG failure modes across retrieval and generation stages

Projects in this path

Build a Food Ordering App

For this project, you’ll become a restaurant owner building a Python food ordering app. You’ll use dictionaries, loops, and functions to create an interactive system for viewing menus, modifying carts, and placing orders.

Garden Simulator Text Based Game

For this project, you’ll step into the role of a Python game developer to create an interactive text-based “Garden Simulator” using object-oriented programming, error handling, and randomness.

Developing a Dynamic AI Chatbot

For this project, you’ll become a developer at a tech company, using Python and the OpenAI API to create an engaging AI chatbot. You’ll gain skills in conversation management, persona creation, and token handling as you build a chatbot that adapts to different platforms.

Build a Multi-Provider LLM Gateway

Build a unified LLM gateway that abstracts TogetherAI and Anthropic APIs behind a single interface. Learn to handle provider-specific authentication, request formats, and response parsing while maintaining consistent output across different LLM providers.

Exploring eBay Car Sales Data

For this project, we’ll assume the role of data analysts for a used car classifieds service to explore and clean a dataset of car listings from eBay Kleinanzeigen, a section of the German eBay website.

Plus 14 more projects

Build your project portfolio with this path.

The Dataquest guarantee

Dataquest has helped thousands of people start new careers in data. If you put in the work and follow our path, you’ll master data skills and grow your career.

We believe so strongly in our paths that we offer a full satisfaction guarantee. If you complete a career path on Dataquest and aren’t satisfied with your outcome, we’ll give you a refund.

Master skills faster with Dataquest

Go from zero to job-ready

Learn exactly what you need to achieve your goal. Don’t waste time on unrelated lessons.

Build your project portfolio

Build confidence with our in-depth projects, and show off your data skills.

Challenge yourself with exercises

Work with real data from day one with interactive lessons and hands-on exercises.

Showcase your path certification

Share the evidence of your hard work with your network and potential employers.

Grow your career with Dataquest.

  • 98% of learners recommend Dataquest for career advancement
  • 4.85 Dataquest rating, SwitchUp Best Bootcamps
  • $30k average salary boost for learners who complete a path


Aaron Melton

Business Analyst at Aditi Consulting

“Dataquest starts at the most basic level, so a beginner can understand the concepts. I tried learning to code before, using Codecademy and Coursera. I struggled because I had no background in coding, and I was spending a lot of time Googling. Dataquest helped me actually learn.”

Jessica Ko

Machine Learning Engineer at Twitter

“I liked the interactive environment on Dataquest. The material was clear and well organized. I spent more time practicing than watching videos and it made me want to keep learning.”

Victoria E. Guzik

Associate Data Scientist at Callisto Media

“I really love learning on Dataquest. I looked into a couple of other options and I found that they were much too handhold-y and fill in the blank relative to Dataquest’s method. The projects on Dataquest were key to getting my job. I doubled my income!”

Join 1M+ data learners on Dataquest.

1. Create a free account
2. Choose a learning path
3. Complete exercises and projects
4. Advance your career

Start learning today