In this lesson, you will see how characters are represented as a sequence of bits. This is an important skill for you to have as a data engineer: you will need to handle textual data from a wide variety of sources and store it in a database. This database will use some mechanism to encode the data, and it’s likely that this will be different from the one used to encode the data that you want to store. Because of this, you will need to be able to convert the data before you can store it.

In addition to learning how characters are represented as a sequence of bits, you will learn how a computer represents text internally. More specifically, you will learn about encodings, which are rules that explicitly define the binary representation of each character.
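To make the idea of an encoding as a rule concrete, here is a minimal sketch, assuming Python 3. The helper name `to_bits` is hypothetical and not part of the lesson; it only uses the built-in `str.encode` method to show that the same character can map to different bits under different encodings.

```python
def to_bits(text: str, encoding: str) -> str:
    """Return the bits that `encoding` assigns to `text`, one group of 8 per byte."""
    encoded = text.encode(encoding)  # str -> bytes, following the encoding's rules
    return " ".join(f"{byte:08b}" for byte in encoded)


# The same character, two encodings, two different bit sequences.
for encoding in ("latin-1", "utf-8"):
    print(encoding, to_bits("é", encoding))
# latin-1 11101001
# utf-8 11000011 10101001
```

Latin-1 stores "é" in a single byte, while UTF-8 needs two, which is why the encoding has to be known before the bits can be read back as text.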

Knowing how to deal with encodings is an important skill because, as a data engineer, you will have to work with data coming from a wide range of sources. Without knowing about encodings, you would probably not be able to process and store textual data in a human-readable way.
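The sketch below, again assuming Python 3, illustrates what "not human-readable" can look like in practice. The function name `transcode` and the sample word are hypothetical choices made for illustration; the only library calls are the built-in `bytes.decode` and `str.encode`.

```python
def transcode(raw: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Decode `raw` with its real encoding, then re-encode it for the target store."""
    return raw.decode(source_encoding).encode(target_encoding)


utf8_bytes = "café".encode("utf-8")

# Reading the bytes with the wrong encoding does not fail; it yields garbled text.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# Converting from the correct source encoding keeps the text intact.
latin1_bytes = "café".encode("latin-1")
print(transcode(latin1_bytes, "latin-1").decode("utf-8"))  # café
```

Note that the wrong decoding raises no error here, so this kind of corruption can slip into a database unnoticed unless the source encoding is known.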

As you work through this lesson and learn about encodings and representing text in a computer, you’ll get to apply what you’ve learned from within your browser; there’s no need to use your own machine to do the exercises. The Python environment inside this course includes answer-checking to ensure you’ve fully mastered each concept before moving on to the next.


  • Learn what encodings and byte objects are.
  • Learn about multi-byte encodings and Unicode.
  • Learn about techniques to find encodings.

Lesson Outline

1. The ASCII Encoding
2. ASCII Limitations
3. Bytes
4. Printable Characters
5. Multi-byte Encodings
6. Variable-Length Encodings
7. Unicode
8. Decoding Bytes
9. Next Steps
10. Takeaways