Challenge: Data Munging Using The Command Line

Data munging involves transforming datasets to make them easier to work with. In this challenge, you’ll practice the command line concepts you’ve learned so far by munging datasets using just the command line. Some datasets are too large to load into memory in Python, so inspecting or transforming them beforehand can be useful. Even for smaller datasets, simple exploration on the command line is often faster than exploration in Python, and file-based tasks like unifying datasets can be faster on the command line too.
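As a rough illustration of inspecting a file without loading it into Python, here is a minimal sketch. The file name `sample.csv`, its columns, and its values are made-up stand-ins, not the challenge's actual data.

```shell
# Hypothetical toy file standing in for a large dataset (values are made up).
workdir="$(mktemp -d)"
printf 'id,region,cost\n1,east,900\n2,west,1200\n3,east,750\n' > "$workdir/sample.csv"

# Peek at the header and the first data row; head stops reading after two lines.
first_rows="$(head -n 2 "$workdir/sample.csv")"
echo "$first_rows"

# Count lines (header included) to get a sense of the file's size.
# tr strips the padding that some wc implementations add.
line_count="$(wc -l < "$workdir/sample.csv" | tr -d ' ')"
echo "$line_count"
```

Because `head` reads only the lines it needs, the same commands should stay quick even on files far larger than this toy one.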

At Dataquest, we’re huge believers in learning through doing, and we hope this shows in your experience with the lessons. While lessons focus on introducing concepts, challenges allow you to perform deliberate practice by completing structured problems. Challenges will feel similar to lessons, but with little instructional material and a larger focus on exercises.

In this challenge, you’ll work with datasets on U.S. housing affordability from the U.S. Department of Housing and Urban Development.


  • Practice munging and exploring datasets from the shell.
  • Learn to consolidate multiple datasets into a single file.
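
One possible way to consolidate files from the shell, followed by a simple filter and count, is sketched below. The file names (`year_a.csv`, `year_b.csv`, `combined.csv`), columns, and values are hypothetical, and the approach assumes every file shares the same header row. The challenge's own exercises may take a different route.

```shell
# Hypothetical toy files with identical headers (values are made up).
workdir="$(mktemp -d)"
printf 'id,region,cost\n1,east,900\n2,west,1200\n' > "$workdir/year_a.csv"
printf 'id,region,cost\n3,east,750\n4,south,640\n' > "$workdir/year_b.csv"

# Write the header once, taken from the first file.
head -n 1 "$workdir/year_a.csv" > "$workdir/combined.csv"

# Append the data rows of each file, skipping its header line.
for f in "$workdir/year_a.csv" "$workdir/year_b.csv"; do
  tail -n +2 "$f" >> "$workdir/combined.csv"
done

# Filter: count rows whose region column is "east".
east_rows="$(grep -c ',east,' "$workdir/combined.csv")"

# Count: total lines in the combined file, header included.
total_lines="$(wc -l < "$workdir/combined.csv" | tr -d ' ')"
echo "east rows: $east_rows, total lines: $total_lines"
```

Skipping each header with `tail -n +2` keeps the combined file from containing repeated header rows, which would otherwise show up as bogus data rows in later steps.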

Lesson Outline

  1. Data munging
  2. Data exploration
  3. Filtering
  4. Consolidating datasets
  5. Counting
  6. Next steps
