Text Processing

So far in this course, you have worked with text files. In the Getting Help and Reading Documentation lesson, you discovered how to use the man and help commands to get documentation for a command, and how to use less to examine the contents of a text file.

In the File Inspection lesson, you used commands such as head and less to examine the contents of text files. You also learned about option-arguments, which let you fine-tune the command you want to run.

As a data scientist, you'll often be handling text files. They are one of the most common ways to store and handle data, and they appear in nearly every data science project. Text processing includes tasks such as reformatting text, extracting specific parts of it, and modifying it. This lesson covers each of these tasks.

One advantage of the shell (also known as the terminal or the command line) over Python is that it tends to be faster for tasks that mainly read and write files, since shell commands work directly on files and streams of text. A common workflow is to use the shell to cut a text file down to only the relevant information, and then continue working on the smaller file in Python.
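As a rough illustration of that workflow, the sketch below trims a small CSV down to the rows and columns of interest before it would be handed to Python. The file names, the sample data, and the choice of grep and cut are assumptions made for this example; later parts of the lesson cover these commands properly.

```shell
# Work in a throwaway directory so the sketch leaves nothing behind.
workdir=$(mktemp -d)

# Hypothetical sample data standing in for a much larger file.
printf 'id,city,temp\n1,Oslo,4\n2,Lima,19\n3,Oslo,6\n' > "$workdir/readings.csv"

# Keep the header line plus every row that mentions Oslo,
# then keep only the first and third comma-separated columns.
{ head -n 1 "$workdir/readings.csv"; grep 'Oslo' "$workdir/readings.csv"; } \
  | cut -d ',' -f 1,3 > "$workdir/oslo.csv"

# Show the pruned file that Python would read next.
cat "$workdir/oslo.csv"
```

The pruned file keeps the header and the two Oslo rows, with the city column dropped, so whatever reads it next has less data to load.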


In this lesson, you will learn:

  • How to concatenate files
  • How to sort files
  • How to subset data by rows and columns
  • How to extract data with regular expressions
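As a preview, the four topics above might look something like the following in the shell. This is a minimal sketch, not the lesson's own material: the file names and sample data are hypothetical, and cat, sort, head, cut, and grep -E are assumed to be the relevant tools based on the lesson outline.

```shell
# Work in a throwaway directory with two small, made-up input files.
workdir=$(mktemp -d)
printf 'carol,31\nalice,25\n' > "$workdir/part1.csv"
printf 'bob,42\n' > "$workdir/part2.csv"

# Concatenate: join the two files end to end.
cat "$workdir/part1.csv" "$workdir/part2.csv" > "$workdir/all.csv"

# Sort: order the lines alphabetically.
sort "$workdir/all.csv" > "$workdir/sorted.csv"

# Subset by rows and columns: take the first two rows,
# then keep only the first comma-separated field.
head -n 2 "$workdir/sorted.csv" | cut -d ',' -f 1 > "$workdir/names.txt"

# Extract with an extended regular expression:
# keep lines whose last field is a two-digit number starting with 3 or 4.
grep -E ',[34][0-9]$' "$workdir/all.csv" > "$workdir/matches.txt"
```

Each step writes its result to a new file rather than overwriting its input, which makes it easy to inspect the intermediate results with less or head.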

Lesson Outline

  1. Text Processing
  2. Concatenate
  3. Cat Abuse
  4. Sorting Files
  5. Beware of Sort
  6. Sorting Data Sets
  7. Sorting on Multiple Columns
  8. Selecting Columns
  9. Grep
  10. Extended Regular Expressions
  11. Next Steps
  12. Takeaways
