So far in this course, you have worked with text files. In the Getting Help and Reading Documentation lesson, you learned how to use the
help commands to find documentation for a command, and how to use less to examine the contents of a text file.
In the File Inspection lesson, you used further commands alongside
less to examine the contents of text files. You also learned about option-arguments, which let you fine-tune the command you want to run.
As a data scientist, you’ll often handle text files. Text files are one of the most common ways to store and handle data, and you can expect to meet them in nearly every data science project. Text processing includes tasks such as reformatting text, extracting specific parts of it, and modifying it. This lesson covers those topics.
One advantage of the shell (also known as the terminal or the command line) over Python is that it tends to be faster for tasks that mainly read and write files, since its commands work directly on files and streams of text. It’s very common to use the shell to prune a text file down to the information that is relevant to us, and then work on the result in Python once it has been cut down to a more manageable size.
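As a rough illustration of that prune-first workflow, here is a minimal sketch. The file names and sample data are hypothetical, and the commands shown (head, grep, cut) are one reasonable choice for this kind of pruning rather than the only one.

```shell
# Create a throwaway directory so the example does not touch real files.
workdir=$(mktemp -d)

# Hypothetical sample data: a small CSV with a column we do not need.
printf '%s\n' \
  'id,name,city,notes' \
  '1,Ana,Lisbon,first visit' \
  '2,Ben,Oslo,returning' \
  '3,Cleo,Lisbon,returning' > "$workdir/visitors.csv"

# Keep the header plus the Lisbon rows, then keep only the first three columns.
{
  head -n 1 "$workdir/visitors.csv"
  grep ',Lisbon,' "$workdir/visitors.csv"
} | cut -d ',' -f 1-3 > "$workdir/lisbon.csv"

# Show the pruned file, which is what you would hand over to Python.
cat "$workdir/lisbon.csv"
```

The pruned file keeps only the rows and columns of interest, so whatever you do next in Python has less data to load.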
In this lesson, you’ll learn:
- How to concatenate files
- How to sort files
- How to subset data by rows and columns
- How to extract data with regular expressions
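To give a feel for these four tasks before the lesson covers them properly, here is a short sketch on made-up data. The file names and contents are hypothetical, and the specific commands (cat, sort, head, cut, grep -E) are assumed here as typical tools for each task.

```shell
# Hypothetical sample files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'pear,3' 'apple,10' > "$tmp/a.csv"
printf '%s\n' 'fig,7' 'plum,2' > "$tmp/b.csv"

# Concatenate: join the two files end to end.
cat "$tmp/a.csv" "$tmp/b.csv" > "$tmp/all.csv"

# Sort: order rows numerically by the second comma-separated field.
sort -t ',' -k 2,2n "$tmp/all.csv" > "$tmp/sorted.csv"

# Subset: keep the first two rows, then only the first column.
head -n 2 "$tmp/sorted.csv" | cut -d ',' -f 1 > "$tmp/subset.txt"

# Extract: keep lines matching an extended regular expression
# (here, lines that start with "p" or "f").
grep -E '^(p|f)' "$tmp/all.csv" > "$tmp/matches.csv"
```

Each command reads text and writes text, which is why the steps can be chained together or run one at a time.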
Lesson outline:
- Text Processing
- Cat Abuse
- Sorting Files
- Beware of Sort
- Sorting Data Sets
- Sorting on Multiple Columns
- Selecting Columns
- Extended Regular Expressions
- Next Steps