MISSION 122

Challenge: Data Munging Using the Command Line

Data munging involves transforming datasets to make them easier to work with. In this challenge, you'll practice the command line concepts you've learned so far by munging datasets using just the command line. Some datasets are too large to load into Python, so inspecting or transforming them beforehand can be useful. Even for smaller datasets, simple exploration on the command line is often faster than exploration in Python, and file-based tasks like unifying datasets can be faster on the command line as well.
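As a rough illustration of the kind of quick exploration described above, the sketch below peeks at a CSV file, counts its data rows, and finds the largest value in one column without loading anything into Python. The file name `sample.csv` and its columns are made up for the example and are not the actual housing data.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file: a header row plus three data rows.
printf 'year,age,region\n2005,43,1\n2005,51,3\n2005,38,2\n' > sample.csv

# Look at the header and the first data row only.
head -n 2 sample.csv

# Count the data rows (total lines minus the header line).
row_count=$(($(wc -l < sample.csv) - 1))
echo "$row_count"

# Largest value in the second column: pick the column, skip the header,
# sort numerically, and keep the last line.
cut -d ',' -f 2 sample.csv | tail -n +2 | sort -n | tail -n 1
```

Each command reads the file as a stream, which is why this style of exploration stays practical even when a file is too large to open comfortably elsewhere.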

At Dataquest, we're huge believers in learning through doing, and we hope this shows in your experience with the missions. While missions focus on introducing concepts, challenges allow you to perform deliberate practice by completing structured problems. Challenges will feel similar to missions, but with little instructional material and a larger focus on exercises.

In this challenge, you'll work with datasets on U.S. housing affordability from the U.S. Department of Housing & Urban Development.

Objectives

  • Practice munging and exploring datasets from the shell.
  • Learn to consolidate multiple datasets into a single file.

Mission Outline

1. Data munging
2. Data exploration
3. Filtering
4. Consolidating datasets
5. Counting
6. Next steps
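The consolidating, counting, and filtering steps in the outline can be sketched roughly as follows. This is a minimal example under stated assumptions: the file names (`sample_2005.csv`, `sample_2007.csv`, `combined.csv`) and the column layout are hypothetical stand-ins, not the files used in the challenge.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny hypothetical yearly files that share the same header row.
printf 'year,age,region\n2005,43,1\n2005,51,3\n' > sample_2005.csv
printf 'year,age,region\n2007,38,2\n2007,60,1\n2007,45,1\n' > sample_2007.csv

# Consolidate: write the header once, then append only the data rows
# of each file (tail -n +2 starts output at the second line).
head -n 1 sample_2005.csv > combined.csv
tail -n +2 sample_2005.csv >> combined.csv
tail -n +2 sample_2007.csv >> combined.csv

# Count: total lines in the combined file (one header plus five data rows).
wc -l < combined.csv

# Filter and count: number of rows whose first field is 2007.
grep -c '^2007,' combined.csv
```

Skipping each file's header with `tail -n +2` keeps the combined file from containing repeated header rows, which would otherwise be miscounted as data.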

Course Info:

Intermediate

The median completion time for this course is 6 hours.

This course requires a basic subscription and includes four missions and one guided project. It is the ninth course in the Data Analyst in Python path and Data Scientist in Python path.
