MISSION 390

Text Processing

So far in this course, you have worked with text files. In the Getting Help and Reading Documentation lesson, you learned how to use the man and help commands to get documentation for a command, and how to use less to examine the contents of a text file.

In the File Inspection lesson, you used additional commands such as head and less to help you examine the contents of text files. In addition, you learned about option-arguments that allow you to fine-tune the command you want to run.

As a data scientist, you'll often handle text files. They are one of the most common ways to store and exchange data, and they appear in almost every data science project. Text processing includes tasks such as reformatting text, extracting specific parts of it, and modifying it, all of which this lesson covers.

One of the advantages of the shell (also known as the terminal or the command line) over Python is that it tends to be faster for tasks that directly concern reading and writing files, since its commands interact more directly with the filesystem. It's very common to use the shell to prune a text file down to only the relevant information, and then work on the resulting files in Python once they’ve been cut down to a more manageable size.
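As a rough illustration of that pruning workflow, here is a minimal sketch. The file names, the sample data, and the choice of grep and cut are assumptions made for this example, not part of the course material.

```shell
# Hypothetical sample data: a small comma-separated log with a header row.
workdir=$(mktemp -d)
printf 'date,level,message\n' > "$workdir/log.csv"
printf '2024-01-01,INFO,started\n' >> "$workdir/log.csv"
printf '2024-01-01,ERROR,disk full\n' >> "$workdir/log.csv"
printf '2024-01-02,INFO,stopped\n' >> "$workdir/log.csv"
printf '2024-01-02,ERROR,timeout\n' >> "$workdir/log.csv"

# Keep only the ERROR rows, then keep only the date and message columns.
grep ',ERROR,' "$workdir/log.csv" | cut -d ',' -f 1,3 > "$workdir/errors.csv"

# The pruned file is what you would then load in Python.
cat "$workdir/errors.csv"
```

The pruned file holds two rows and two columns instead of the full log, so any later analysis only has to read the part of the data it needs.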

What You'll Learn

  • How to concatenate files
  • How to sort files
  • How to subset data by rows and columns
  • How to extract data with regular expressions
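To give a flavor of these four tasks, the sketch below runs each one on a tiny made-up data set. The file names and contents are hypothetical, and the specific commands and options (cat, sort, head, cut, grep -E) are one plausible choice rather than necessarily the ones the lesson uses.

```shell
# Hypothetical sample data: two small name,age files.
workdir=$(mktemp -d)
printf 'carol,31\nalice,25\n' > "$workdir/part1.csv"
printf 'bob,42\ndave,19\n' > "$workdir/part2.csv"

# Concatenate the two files into one.
cat "$workdir/part1.csv" "$workdir/part2.csv" > "$workdir/people.csv"

# Sort numerically on the second comma-separated column (age).
sort -t ',' -k 2,2n "$workdir/people.csv" > "$workdir/by_age.csv"

# Subset by rows (first two lines) and by columns (first field only).
head -n 2 "$workdir/by_age.csv" | cut -d ',' -f 1 > "$workdir/youngest.txt"

# Extract lines with an extended regular expression:
# names that start with "a" or "b".
grep -E '^(a|b)' "$workdir/people.csv" > "$workdir/a_or_b.csv"

cat "$workdir/youngest.txt" "$workdir/a_or_b.csv"
```

Each step writes to its own file here so the intermediate results can be inspected. In practice the same commands are often chained with pipes.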

Mission Outline

1. Text Processing
2. Concatenate
3. Cat Abuse
4. Sorting Files
5. Beware of Sort
6. Sorting Data Sets
7. Sorting on Multiple Columns
8. Selecting Columns
9. Grep
10. Extended Regular Expressions
11. Next Steps
12. Takeaways


Course Info:

Intermediate

The median completion time for this course is 4 hours.

This course requires a basic subscription and has five missions. It is the tenth course in the Data Analyst in Python path and the Data Scientist in Python path.
