5 Essential Data Skills for the 21st Century

Across a wide variety of industries, data is transforming the business landscape. And while big data has driven the growth of specialist positions like data scientist and data engineer, it’s increasingly important that many other professionals have at least some fundamental data literacy.

At the same time, we’ve also seen the rise of “citizen data science” — folks in non-data-related roles making meaningful contributions both inside and outside of their workplaces using data skills they’ve acquired through online study.

Here are five core skills for working with data that can benefit everyone from full-time data pros to citizen hobbyists, and empower your employees to become more efficient and data-driven.


Programming

What it is: Learning programming skills will allow your employees to work with data faster and more effectively. In fact, programming skills are the foundation upon which most other data skills are built.

The good news is that your employees don’t need to become software developers — learning even just the basics of programming can unlock some very powerful capabilities. And thankfully, the best languages for data science — Python and R — are also some of the most approachable. SQL, a language for querying databases, is also fairly straightforward to learn, even for people without prior programming experience or a technical background.

Example use case: An employee wants to understand more about which customers are purchasing a particular product. She might write a SQL query to pull information about purchases of that product from the company database. Then, she might use Python programming skills and popular data science libraries like pandas and matplotlib to clean the data and perform some exploratory data visualization to look for trends among customers who purchased the product.
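As a rough illustration of that workflow, here is a minimal sketch. The `purchases` table, its columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the company database. It uses only Python's standard library so it runs anywhere; in practice, pandas and matplotlib would be the more typical tools for the summary and charting steps.

```python
import sqlite3
from collections import Counter

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases "
    "(customer_id INTEGER, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, "North", "Widget", 20.0),
        (2, "South", "Widget", 35.0),
        (3, "North", "Gadget", 15.0),
        (4, "North", "Widget", 25.0),
    ],
)

# Step 1: the SQL query pulls only the purchases of the product of interest.
rows = conn.execute(
    "SELECT customer_id, region, amount FROM purchases WHERE product = ?",
    ("Widget",),
).fetchall()
conn.close()

# Step 2: a first exploratory summary in Python (purchases per region).
purchases_by_region = Counter(region for _, region, _ in rows)

for region, count in sorted(purchases_by_region.items()):
    print(f"{region}: {count} purchase(s)")
```

The same counts could then feed a bar chart to make regional differences easier to spot.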

How Dataquest teaches this:

  • Python programming courses from beginner to advanced
  • R programming courses from beginner to advanced
  • SQL courses from beginner to advanced
  • Courses on popular data science libraries for Python including pandas, NumPy, matplotlib, scikit-learn, and more
  • Courses on popular data science packages for R, with a focus on ggplot2 and other tidyverse packages

Data Cleaning

What it is: Data stored in a database is rarely perfect — there are often data entry errors, missing values, and other issues that can throw off the results or even make analysis impossible if they’re not corrected. Data cleaning is the process of analyzing a data set to find and correct these issues, and it’s an essential step in the data analysis process.

It’s possible to clean small data sets with minimal mistakes manually using spreadsheet software, but the larger and messier the data set, the more time-consuming and impractical that becomes. Using programming skills, even massive data sets can be cleaned with speed, efficiency, and accuracy.

Example use case: An employee wants to dig into customer satisfaction data by looking at survey results. Using their Python and pandas or R and tidyverse skills, they could open the data set containing these results in whatever format it is available (CSV, JSON, XLS, via API, etc.). They could then analyze the data for entry mistakes, missing values, and errors, and use their programming skills to resolve those problems far more quickly than would be possible using spreadsheet software.
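To make this concrete, here is a minimal cleaning sketch. The survey columns, the sample rows, and the 1-to-5 satisfaction scale are assumptions made for illustration. It uses only the standard library; the equivalent pandas code would typically be shorter.

```python
import csv
import io

# Hypothetical survey export; columns and values are invented for illustration.
raw_csv = """respondent_id,region,satisfaction
1,north,4
2,North ,5
3,SOUTH,
4,south,11
5,north,three
"""


def clean_rows(text):
    """Return (clean, rejected) lists of rows parsed from a CSV string."""
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Normalize inconsistent spacing and capitalization in text fields.
        row["region"] = row["region"].strip().title()
        try:
            score = int(row["satisfaction"])
        except ValueError:
            # Missing or non-numeric scores cannot be analyzed as numbers.
            rejected.append(row)
            continue
        # Assumed survey scale of 1 to 5; anything outside is an entry error.
        if 1 <= score <= 5:
            row["satisfaction"] = score
            clean.append(row)
        else:
            rejected.append(row)
    return clean, rejected


clean, rejected = clean_rows(raw_csv)
print(f"kept {len(clean)} rows, set aside {len(rejected)} for review")
```

Setting questionable rows aside rather than silently deleting them is a design choice here: it leaves a record that someone can review before the analysis proceeds.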

How Dataquest teaches this:

  • Data Cleaning and Data Cleaning Advanced courses for Python
  • Data Cleaning and Data Cleaning Advanced courses for R

Data Analysis

What it is: After a data set has been acquired and the data has been cleaned, it’s ready for analysis — the process of searching for insights and patterns in the data. Performing data analysis effectively requires both programming skills and an understanding of statistics to ensure the analytical approach is valid.

Example use case: An employee wants to learn which marketing efforts have been most effective in the past. Using Python and pandas or R and the tidyverse, the employee performs a time series analysis of company sales data and finds that in addition to seasonal sales fluctuations, there are also up- and downswings that correlate with the launch and cessation of various marketing campaigns.
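A very simplified version of that comparison might look like the sketch below. The monthly figures and campaign dates are invented for illustration, and the approach (comparing average sales in campaign and non-campaign months) is a stand-in for a fuller time series analysis, which would also need to account for seasonality and test whether the difference is statistically meaningful.

```python
from statistics import mean

# Hypothetical monthly sales (in thousands) and campaign months,
# invented for illustration only.
monthly_sales = {
    "2023-01": 100, "2023-02": 104, "2023-03": 130,
    "2023-04": 134, "2023-05": 102, "2023-06": 98,
}
campaign_months = {"2023-03", "2023-04"}

with_campaign = [v for m, v in monthly_sales.items() if m in campaign_months]
without_campaign = [
    v for m, v in monthly_sales.items() if m not in campaign_months
]

# Compare average sales in months with and without an active campaign.
lift = mean(with_campaign) - mean(without_campaign)
print(f"average with campaign:    {mean(with_campaign):.1f}")
print(f"average without campaign: {mean(without_campaign):.1f}")
print(f"difference:               {lift:.1f}")
```

A difference like this shows correlation only; it does not by itself prove that the campaigns caused the higher sales.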

How Dataquest teaches this:

  • All of our programming courses use real-world data analysis tasks to illustrate and teach programming concepts
  • Statistics courses for both Python and R, including: Statistics Fundamentals, Statistics Intermediate, Probability Fundamentals, Hypothesis Testing, Conditional Probability, Linear Modeling, and more


Automation

What it is: Most businesses generate huge amounts of similarly formatted data over time. For example, monthly sales numbers will be different each month, but the format of each month’s data will be the same. Programming skills can give your employees the power to automate reporting and processing of these sorts of data sets. A variety of other data-related tasks can also be automated with scripts, reducing tasks that could take hours or days down to a matter of minutes or seconds.

Example use case: An employee needs to generate a monthly sales report to share with the company. Using her programming skills, she writes a script that queries the company database to acquire the data, cleans the data automatically based on the mistakes and errors commonly found in the data set, and then exports the resulting report as an XLS file for broader consumption. Writing this script may be time-consuming, but once it has been created, the employee simply needs to run it once per month and it will generate the report on its own.
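Here is a minimal sketch of what such a script could look like. The `sales` table, its columns, the cleaning rule, and the function name `build_monthly_report` are all hypothetical. It also writes a CSV file rather than XLS, since CSV needs no third-party library and opens in any spreadsheet program.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def build_monthly_report(conn, month, out_path):
    """Query one month of sales, drop known-bad rows, and write a CSV report.

    `month` is a 'YYYY-MM' string. Returns the number of rows written.
    """
    rows = conn.execute(
        "SELECT sale_date, product, amount FROM sales "
        "WHERE substr(sale_date, 1, 7) = ? ORDER BY sale_date",
        (month,),
    ).fetchall()
    # Cleaning rule assumed for this sketch: skip missing or negative amounts.
    cleaned = [r for r in rows if r[2] is not None and r[2] >= 0]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sale_date", "product", "amount"])
        writer.writerows(cleaned)
    return len(cleaned)


# Demo: a hypothetical in-memory database stands in for the company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-05", "Widget", 20.0),
        ("2024-01-17", "Gadget", None),   # missing amount
        ("2024-01-28", "Widget", -5.0),   # entry error
        ("2024-02-02", "Widget", 30.0),   # different month
    ],
)
report_path = Path(tempfile.gettempdir()) / "sales_report_2024-01.csv"
rows_written = build_monthly_report(conn, "2024-01", report_path)
conn.close()
print(f"wrote {rows_written} row(s) to {report_path}")
```

Because the month is a parameter, the same function can be rerun each month, or scheduled, without editing the script.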

How Dataquest teaches this:

  • Python programming courses from beginner to advanced
  • R programming courses from beginner to advanced
  • SQL courses from beginner to advanced
  • Data engineering course path, with courses on tasks like building data pipelines and repeatable processes, and on technologies like PostgreSQL

Data Visualization

What it is: Trends, patterns, and insights can be difficult to spot in spreadsheet format. Data visualization is the process of translating those numbers into a more easily digestible format, such as a chart. Many tools are available for data visualization, and Python and R both offer packages that give programmers a wide range of visualization and customization options.

Data visualization is where the technical skills required for analysis meet communication skills and statistical knowledge — to make a great visualization, one needs to understand the numbers and have the design, communication, and storytelling skills to communicate them to non-technical audiences.

Example use case: An employee has analyzed internal data and identified which teams are operating most efficiently. Using carefully annotated charts, they build a report that presents this data to stakeholders visually and clearly, and that includes the context required to understand it.

How Dataquest teaches this:

  • Exploratory Data Visualization and Storytelling Through Data Visualization courses for Python
  • Data Analysis in Business course for Python
  • Exploratory Data Visualization course for R
  • A variety of statistics courses in both Python and R
