Python Strings: An In-Depth Tutorial (55+ Code Examples)
Data types help us categorize data items. They determine the kinds of operations that we can perform on a data item. In Python, the common standard data types include numbers, strings, lists, tuples, booleans, sets, and dictionaries.
In this tutorial, we'll focus on the string data type. We will discuss how to declare the string data type, the relationship between the string data type and the ASCII table, the properties of the string data type, and some important string methods and operations.
What Are Python Strings?
A string is an object that contains a sequence of characters. Python has no separate character data type: a single character is simply a string of length one. Other programming languages, such as C, Kotlin, and Java, do have a dedicated character data type.
We can declare a Python string using single quotes, double quotes, triple quotes, or the str() function. The following code snippet shows how to declare a string in Python:
# A single quote string
single_quote = 'a' # This is an example of a character in other programming languages. It is a string in Python
# Another single quote string
another_single_quote = 'Programming teaches you patience.'
# A double quote string
double_quote = "aa"
# Another double-quote string
another_double_quote = "It is impossible until it is done!"
# A triple quote string
triple_quote = '''aaa'''
# Also a triple quote string
another_triple_quote = """Welcome to the Python programming language. Ready, 1, 2, 3, Go!"""
# Using the str() function
string_function = str(123.45) # str() converts float data type to string data type
# Another str() function
another_string_function = str(True) # str() converts a boolean data type to string data type
# An empty string
empty_string = ''
# Also an empty string
second_empty_string = ""
# We are not done yet
third_empty_string = """""" # This is also an empty string: ''''''
Another way of getting strings in Python is using the input() function. The input() function allows us to enter values into a program with the keyboard. The entered values are read as strings, but we can convert them into other data types:
# Inputs into a Python program
input_float = input() # Type in: 3.142
input_boolean = input() # Type in: True
# Convert inputs into other data types
convert_float = float(input_float) # converts the string data type to a float
convert_boolean = input_boolean == "True" # bool(input_boolean) would be True for any non-empty string, even "False"
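Converting text to a Boolean deserves some care, because bool() only checks whether a string is empty. The following is a minimal sketch of one way to handle it; parse_bool is a hypothetical helper written for this illustration, not a built-in function, and the accepted spellings are an assumption:

```python
def parse_bool(text):
    """Hypothetical helper: map common true/false spellings to a bool."""
    normalized = text.strip().lower()
    if normalized in ("true", "yes", "1"):
        return True
    if normalized in ("false", "no", "0"):
        return False
    raise ValueError(f"not a recognizable boolean: {text!r}")


print(bool("False"))        # True, because the string is not empty
print(parse_bool("False"))  # False
print(float("3.142"))       # 3.142
```

Raising ValueError for unknown input mirrors what float() does when it receives text that it cannot convert.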
We use the type() function to determine the data type of an object in Python. It returns the class of the object. When the object is a string, it returns the str class. Similarly, it returns the dict, int, float, tuple, or bool class when the object is a dictionary, integer, float, tuple, or Boolean, respectively. Let's now use the type() function to determine the data types of the variables declared in the previous code snippets:
# Data types/ classes with type()
print(type(single_quote))
print(type(another_triple_quote))
print(type(empty_string))
print(type(input_float))
print(type(input_boolean))
print(type(convert_float))
print(type(convert_boolean))
We have discussed how to declare strings. Let's now move to the relationship between strings and the ASCII table.
The ASCII Table vs. Python String Character
The American Standard Code for Information Interchange (ASCII) was developed to help us map characters or texts to numbers because sets of numbers are easier to store in the computer memory than texts. ASCII encodes 128 characters mainly in the English language that are used in processing information in computers and programming. The English characters encoded by ASCII include lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), and symbols such as punctuation marks.
The ord() function converts a Python string of length one (a character) to its decimal code, which for ASCII characters is the decimal representation on the ASCII table, while the chr() function converts the decimal representation back to a string. For instance:
import string
# Convert uppercase characters to their ASCII decimal numbers
ascii_upper_case = string.ascii_uppercase # Output: ABCDEFGHIJKLMNOPQRSTUVWXYZ
for one_letter in ascii_upper_case[:5]: # Loop through ABCDE
    print(ord(one_letter))
65
66
67
68
69
# Convert digit characters to their ASCII decimal numbers
ascii_digits = string.digits # Output: 0123456789
for one_digit in ascii_digits[:5]: # Loop through 01234
    print(ord(one_digit))
48
49
50
51
52
In the above code snippet, we looped through the strings ABCDE and 01234, and we converted each character to its decimal representation on the ASCII table. We can also carry out the reverse operation with the chr() function, whereby we convert decimal numbers on the ASCII table to their Python string characters. For instance:
decimal_rep_ascii = [37, 44, 63, 82, 100]
for one_decimal in decimal_rep_ascii:
    print(chr(one_decimal))
%
,
?
R
d
On the ASCII table, the string characters in the output of the above program map to their respective decimal numbers. So far, we've discussed how to declare Python strings and how the string character maps to the ASCII table. Next, let's discuss the attributes of a string.
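As a quick check of the round trip described above, here is a small sketch. The variable names are illustrative only, and the last two lines rely on the fact that ord() returns a Unicode code point, of which ASCII covers the first 128 values:

```python
letter = "A"
code = ord(letter)            # 65
assert chr(code) == letter    # chr() reverses ord()

# Lowercase and uppercase ASCII letters are 32 positions apart
lower_a = chr(ord("A") + 32)  # 'a'

# ord() also works beyond ASCII: it returns the Unicode code point
e_acute = "\u00e9"            # the letter e with an acute accent
print(code, lower_a, ord(e_acute))          # 65 a 233
print(letter.isascii(), e_acute.isascii())  # True False
```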
String Properties
Zero Index: The first element in a string has an index of zero, while the last element has an index of len(string) - 1. For example:
immutable_string = "Accountability"
print(len(immutable_string))
print(immutable_string.index('A'))
print(immutable_string.index('y'))
14
0
13
Immutability: This means that we cannot update the characters in a string. For example, we cannot delete an element from a string or assign a new element at any of its index positions. If we try to update the string, Python raises a TypeError:
immutable_string = "Accountability"
# Assign a new element at index 0
immutable_string[0] = 'B'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11336/2351953155.py in
2
3 # Assign a new element at index 0
----> 4 immutable_string[0] = 'B'
TypeError: 'str' object does not support item assignment
We can, however, assign a new string to the immutable_string variable, but we should note that it isn't the same string because the two values don't point to the same object in memory. Python doesn't update the old string object; it creates a new one, as we can see by the ids:
immutable_string = "Accountability"
print(id(immutable_string))
immutable_string = "Bccountability"
print(id(immutable_string))
test_immutable = immutable_string
print(id(test_immutable))
2693751670576
2693751671024
2693751671024
You will get different ids than the ones in the example because we're running our programs on different computers, so our memory addresses are different. However, the first two ids should still differ from each other on any one computer. This means that the two strings assigned to immutable_string live at different addresses in memory. We then assigned immutable_string to the test_immutable variable, and you can see that test_immutable and the last immutable_string point to the same address.
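Because a string cannot be changed in place, the usual workaround is to build a new string from pieces of the old one. A minimal sketch, with illustrative variable names:

```python
original = "Accountability"

# Build new strings instead of assigning to original[0]
via_slicing = "B" + original[1:]
via_replace = original.replace("A", "B", 1)  # replace only the first match

print(via_slicing)   # Bccountability
print(via_replace)   # Bccountability
print(original)      # Accountability (the original object is unchanged)
```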
Concatenation: joining two or more strings together with the + symbol to get a new string. For example:
first_string = "Data"
second_string = "quest"
third_string = "Data Science Path"
fourth_string = first_string + second_string
print(fourth_string)
fifth_string = fourth_string + " " + third_string
print(fifth_string)
Dataquest
Dataquest Data Science Path
Repetition: A string can be repeated with the * symbol. For example:
print("Ha" * 3)
HaHaHa
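When many pieces need to be combined, str.join() is a common alternative to chaining the + symbol. The sketch below uses made-up variable names and combines it with repetition:

```python
words = ["Data", "Science", "Path"]

# join() places the separator between the items of the list
joined = " ".join(words)
underline = "-" * len(joined)   # repetition sized to match the text

print(joined)      # Data Science Path
print(underline)   # -----------------
```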
Indexing and Slicing: we already established that strings are zero-indexed. We can access any element in a string with its index value. We can also take subsets of a string by slicing between two index values. For example:
main_string = "I learned R and Python on Dataquest. You can do it too!"
# Index 0
print(main_string[0])
# Index 1 (a space, so the printed line looks blank)
print(main_string[1])
# Check if Index 1 is whitespace
print(main_string[1].isspace())
# Slicing 1
print(main_string[0:11])
# Slicing 2:
print(main_string[-18:])
# Slicing and concatenation
print(main_string[0:11] + ". " + main_string[-18:])
I
True
I learned R
You can do it too!
I learned R. You can do it too!
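Slices also accept an optional third value, the step, and they tolerate end positions beyond the length of the string. A short sketch with an illustrative variable:

```python
text = "Python"

print(text[-1])     # n       (negative indexes count from the end)
print(text[::2])    # Pto     (every second character)
print(text[::-1])   # nohtyP  (a step of -1 reverses the string)
print(text[2:100])  # thon    (an out-of-range slice end is allowed)
```

Indexing, unlike slicing, is strict: text[100] raises an IndexError.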
String Methods
str.split(sep=None, maxsplit=-1):
The string split method takes two parameters: sep and maxsplit. When this method is called with its default values, it splits the string wherever there is whitespace. This method returns a list of strings:
string = "Apple, Banana, Orange, Blueberry"
print(string.split())
['Apple,', 'Banana,', 'Orange,', 'Blueberry']
We can see that the string isn't split nicely because the split strings contain a comma. We can use sep=',' to split wherever there is a comma:
print(string.split(sep=','))
['Apple', ' Banana', ' Orange', ' Blueberry']
This is better than the previous split. However, we can see whitespace before some of the split strings. We can remove this with sep=', ':
# Notice the whitespace after the comma
print(string.split(sep=', '))
['Apple', 'Banana', 'Orange', 'Blueberry']
Now, the string is split nicely. Sometimes, we don't want to split the maximum number of times. We can use the maxsplit parameter to specify the number of times we intend to split:
print(string.split(sep=', ', maxsplit=1))
print(string.split(sep=', ', maxsplit=2))
['Apple', 'Banana, Orange, Blueberry']
['Apple', 'Banana', 'Orange, Blueberry']
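Two companions to split() are worth a brief sketch: str.join(), which reverses a split, and str.rsplit(), which counts maxsplit from the right. The messy string and variable names below are made up for illustration:

```python
string = "Apple, Banana, Orange, Blueberry"

fruits = string.split(", ")
rejoined = " | ".join(fruits)                  # join() is the inverse of split()
last_split = string.rsplit(", ", maxsplit=1)   # split once, starting from the right

# When spacing around the separator is inconsistent, split on the
# bare comma and strip each piece afterwards
messy = "a , b,c "
cleaned = [part.strip() for part in messy.split(",")]

print(rejoined)     # Apple | Banana | Orange | Blueberry
print(last_split)   # ['Apple, Banana, Orange', 'Blueberry']
print(cleaned)      # ['a', 'b', 'c']
```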
str.splitlines(keepends=False):
Sometimes we want to process a corpus with different line breaks ('\n', '\n\n', '\r', '\r\n') at the boundaries. We want to split it into lines, not individual words. We will use the splitlines method to do this. When keepends=True, the line breaks are kept at the end of each line; otherwise, they are excluded. Let us see how this is done with Shakespeare's Macbeth text:
import nltk # You may have to pip install nltk to use this library.
# nltk.download('gutenberg') # Run once if the Gutenberg corpus is not downloaded yet
macbeth = nltk.corpus.gutenberg.raw('shakespeare-macbeth.txt')
print(macbeth.splitlines(keepends=True)[:5])
['[The Tragedie of Macbeth by William Shakespeare 1603]\n', '\n', '\n', 'Actus Primus. Scoena Prima.\n', '\n']
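The difference between splitlines() and a plain split('\n') shows up on a small sample with mixed line endings. This sketch needs no external corpus; the sample text is made up:

```python
sample = "first\nsecond\r\nthird\n"

print(sample.splitlines())               # ['first', 'second', 'third']
print(sample.splitlines(keepends=True))  # ['first\n', 'second\r\n', 'third\n']

# split('\n') leaves the '\r' behind and adds an empty final item
print(sample.split("\n"))                # ['first', 'second\r', 'third', '']
```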
str.strip([chars]):
We remove leading and trailing whitespace or characters from both ends of the string with the strip method. For instance:
string = " Apple Apple Apple no apple in the box apple apple "
stripped_string = string.strip()
print(stripped_string)
left_stripped_string = (
stripped_string
.lstrip('Apple')
.lstrip()
.lstrip('Apple')
.lstrip()
.lstrip('Apple')
.lstrip()
)
print(left_stripped_string)
capitalized_string = left_stripped_string.capitalize()
print(capitalized_string)
right_stripped_string = (
capitalized_string
.rstrip('apple')
.rstrip()
.rstrip('apple')
.rstrip()
)
print(right_stripped_string)
Apple Apple Apple no apple in the box apple apple
no apple in the box apple apple
No apple in the box apple apple
No apple in the box
In the above code snippet, we have used the lstrip and rstrip methods, which remove whitespace or characters from the left and right sides of the string, respectively. Note that the chars argument is treated as a set of characters to remove, not as a prefix or suffix. We have also used the capitalize method, which converts a string to sentence case.
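Since lstrip and rstrip treat their argument as a set of characters, they can remove more than intended. The sketch below shows the effect and one possible workaround; remove_prefix is a hypothetical helper written for this example (Python 3.9+ also offers str.removeprefix):

```python
def remove_prefix(text, prefix):
    """Hypothetical helper: drop prefix only when text starts with it."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


phrase = "apple pie"

# lstrip keeps removing any of 'a', 'p', 'l', 'e', ' ' from the left,
# so it also eats the 'p' of 'pie'
print(phrase.lstrip("apple "))          # ie
print(remove_prefix(phrase, "apple "))  # pie
```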
str.zfill(width):
The zfill method pads a string with leading 0 characters until it reaches the specified width. For instance:
example = "0.8" # len(example) is 3
example_zfill = example.zfill(5) # len(example_zfill) is 5
print(example_zfill)
000.8
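A few related behaviors are easy to verify in a short sketch: zfill keeps a leading sign in front of the zeros, never truncates, and has more general relatives in rjust, ljust, and center:

```python
print("-42".zfill(5))        # -0042  (zeros go after the sign)
print("12345".zfill(3))      # 12345  (already wide enough, unchanged)
print("42".rjust(5, "0"))    # 00042  (pad on the left with any character)
print("42".ljust(5, "."))    # 42...  (pad on the right)
print("42".center(6, "*"))   # **42** (pad on both sides)
```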
str.isalpha():
This method returns True if the string is not empty and all of its characters are alphabetic; otherwise, it returns False:
# Alphabet string
alphabet_one = "Learning"
print(alphabet_one.isalpha())
# Contains whitespace
alphabet_two = "Learning Python"
print(alphabet_two.isalpha())
# Contains a comma
alphabet_three = "Learning,"
print(alphabet_three.isalpha())
True
False
False
Similarly, str.isalnum() returns True if all the string characters are alphanumeric; str.isdecimal() returns True if they are all decimal characters; str.isdigit() returns True if they are all digits; and str.isnumeric() returns True if they are all numeric.
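The three numeric checks agree on ordinary digits and only differ on special Unicode characters. The sketch below compares them on a few sample values chosen for illustration (a superscript two and the one-half fraction, written as escape sequences):

```python
samples = {
    "plain digits": "123",
    "superscript two": "\u00b2",
    "fraction one half": "\u00bd",
    "letters and digits": "abc123",
}

for label, value in samples.items():
    print(label, value.isdecimal(), value.isdigit(),
          value.isnumeric(), value.isalnum())

# plain digits True True True True
# superscript two False True True True
# fraction one half False False True True
# letters and digits False False False True
```

In practice, isdecimal() is the strictest of the three and isnumeric() the most permissive.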
str.islower() returns True if all the cased characters in the string are lowercase. str.isupper() returns True if all the cased characters in the string are uppercase, and str.istitle() returns True if the first letter of every word is capitalized and the remaining letters are lowercase:
# islower() example
string_one = "Artificial Neural Network"
print(string_one.islower())
string_two = string_one.lower() # converts string to lowercase
print(string_two.islower())
# isupper() example
string_three = string_one.upper() # converts string to uppercase
print(string_three.isupper())
# istitle() example
print(string_one.istitle())
False
True
True
True
str.endswith(suffix) returns True if the string ends with the specified suffix. str.startswith(prefix) returns True if the string begins with the specified prefix. Both methods also accept a tuple of candidates and return True if any of them matches:
sentences = ['Time to master data science', 'I love statistical computing', 'Eat, sleep, code']
# endswith() example
for one_sentence in sentences:
    print(one_sentence.endswith(('science', 'computing', 'Code')))
True
True
False
# startswith() example
for one_sentence in sentences:
    print(one_sentence.startswith(('Time', 'I ', 'Ea')))
True
True
True
str.find(substring) returns the lowest index at which the substring occurs in the string; if the substring isn't present, it returns -1. str.rfind(substring) returns the highest index. The str.index(substring) and str.rindex(substring) methods also return the lowest and highest index of the substring, respectively, if found. If the substring isn't present in the string, they raise a ValueError.
string = "programming"
# find() and rfind() examples
print(string.find('m'))
print(string.find('pro'))
print(string.rfind('m'))
print(string.rfind('game'))
# index() and rindex() examples
print(string.index('m'))
print(string.index('pro'))
print(string.rindex('m'))
print(string.rindex('game'))
6
0
7
-1
6
0
7
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11336/3954098241.py in
11 print(string.index('pro')) # Output: 0
12 print(string.rindex('m')) # Output: 7
---> 13 print(string.rindex('game')) # Output: ValueError: substring not found
ValueError: substring not found
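Which method to prefer depends on how a missing substring should be handled. The sketch below wraps index() in a try/except block and points out a common slip with find(); locate is a hypothetical helper name used only here:

```python
def locate(text, substring):
    """Hypothetical helper: return the index, or None when absent."""
    try:
        return text.index(substring)
    except ValueError:
        return None


string = "programming"

print(locate(string, "m"))      # 6
print(locate(string, "game"))   # None

# find() returns 0 for a match at the start, and 0 is falsy,
# so compare against -1 instead of testing the result directly
position = string.find("pro")
print(position != -1)           # True
print(bool(position))           # False, although 'pro' was found
```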
str.maketrans(dict_map) creates a translation table from a dictionary map, and str.translate(table) uses that table to replace each mapped character in the string with its new value. For example:
example = "abcde"
mapped = {'a':'1', 'b':'2', 'c':'3', 'd':'4', 'e':'5'}
print(example.translate(example.maketrans(mapped)))
12345
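str.maketrans() also accepts two strings of equal length instead of a dictionary, plus an optional third string of characters to delete. A brief sketch, with illustrative variable names:

```python
import string

# Map a->1, b->2, c->3, d->4 and delete every 'e'
table = str.maketrans("abcd", "1234", "e")
print("abcde".translate(table))   # 1234

# A deletion-only table is a common way to strip ASCII punctuation
no_punctuation = str.maketrans("", "", string.punctuation)
print("Hello, world!".translate(no_punctuation))   # Hello world
```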
String Operations
Looping through a string: Strings are iterable. Therefore, they support looping with a for loop and enumerate:
# For-loop example
word = "bank"
for letter in word:
    print(letter)
b
a
n
k
# Enumerate example
for idx, value in enumerate(word):
    print(idx, value)
0 b
1 a
2 n
3 k
String and relational operators: when two strings are compared using relational operators (>, <, ==, etc.), the elements of the two strings are compared by their ASCII decimal numbers, index by index. For example:
print('a' > 'b')
print('abc' > 'b')
False
False
In both cases, the output is False. The relational operator first compares the ASCII decimal numbers of the elements at index 0 of both strings. Since b is greater than a, it returns False; the ASCII decimal numbers of the other elements and the lengths of the strings do not matter in this case.
When the elements at index 0 are equal, it keeps comparing the ASCII decimal numbers of the elements at each following index until it finds elements with different ASCII decimal numbers. For example:
print('abd' > 'abc')
True
In the above code snippet, the first two elements have the same ASCII decimal numbers; however, there is a mismatch in the third element, and since d is greater than c, it returns True. In a situation where all the ASCII numbers for the elements match, the longer string is greater than the shorter one. For example:
print('abcd' > 'abc')
True
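Comparing by character codes has some consequences that can be surprising at first. The sketch below, using made-up sample data, shows that uppercase letters sort before lowercase ones and that digit strings do not compare like numbers:

```python
# 'Z' is 90 and 'a' is 97, so every uppercase letter sorts first
print("Z" < "a")          # True

# '1' is 49 and '9' is 57, so the comparison stops at index 0
print("10" < "9")         # True
print(int("10") < int("9"))   # False

names = ["apple", "Banana", "cherry"]
print(sorted(names))                  # ['Banana', 'apple', 'cherry']
print(sorted(names, key=str.lower))   # ['apple', 'Banana', 'cherry']
```

Passing key=str.lower is one simple way to get a case-insensitive ordering.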
Checking membership of a string: The in operator is used to check if a substring is a member of a string:
print('data' in 'dataquest')
print('gram' in 'programming')
True
True
Another way to check the membership of a string, replace a substring, or match a pattern is to use regular expressions:
import re
substring = 'gram'
string = 'programming'
replacement = '1234'
# Check membership
print(re.search(substring, string))
# Replace string
print(re.sub(substring, replacement, string))
<re.Match object; span=(3, 7), match='gram'>
pro1234ming
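re.search() returns a match object when the pattern is found and None otherwise, so its result is usually tested before use. The sketch below also shows re.escape(), which is useful when the text being searched for contains characters that have a special meaning in a pattern; the sample strings are made up:

```python
import re

match = re.search("gram", "programming")
if match is not None:
    print(match.span())    # (3, 7)
    print(match.group())   # gram

print(re.search("game", "programming"))   # None

# '+' means "one or more" in a pattern, so escape it to match it literally
equation = "1+1=2"
print(re.search("1+1", equation))                    # None
print(re.search(re.escape("1+1"), equation).span())  # (0, 3)
```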
String formatting: f-strings and the str.format() method are used to format strings. Both use curly bracket {} placeholders. For example:
monday, tuesday, wednesday = "Monday", "Tuesday", "Wednesday"
format_string_one = "{} {} {}".format(monday, tuesday, wednesday)
print(format_string_one)
format_string_two = "{2} {1} {0}".format(monday, tuesday, wednesday)
print(format_string_two)
format_string_three = "{one} {two} {three}".format(one=tuesday, two=wednesday, three=monday)
print(format_string_three)
format_string_four = f"{monday} {tuesday} {wednesday}"
print(format_string_four)
Monday Tuesday Wednesday
Wednesday Tuesday Monday
Tuesday Wednesday Monday
Monday Tuesday Wednesday
f-strings are more readable, and they run faster than the str.format() method. Therefore, f-strings are the preferred method of string formatting.
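Placeholders can also carry a format specification after a colon, which controls precision, padding, and alignment. A brief sketch with illustrative values:

```python
pi = 3.14159
count = 42
label = "hi"
population = 1234567
ratio = 0.256

print(f"{pi:.2f}")          # 3.14       (two decimal places)
print(f"{count:05d}")       # 00042      (zero-padded to width 5)
print(f"{label:>5}|")       #    hi|     (right-aligned in width 5)
print(f"{population:,}")    # 1,234,567  (thousands separator)
print(f"{ratio:.1%}")       # 25.6%      (percentage with one decimal)

# The same specifications work with str.format()
print("{:.2f}".format(pi))  # 3.14
```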
Handling Quotation and Apostrophe: the apostrophe sign (') delimits a string in Python. To let Python know that an apostrophe is part of the text rather than the end of the string, we have to use the Python escape character (\). So, an apostrophe is represented as \' in Python. Unlike handling apostrophes, there are many ways to handle quotations in Python. They include the following:
# 1. Represent string with single quotes ('') and quoted statement with double quotes ("")
quotes_one = '"Friends don\'t let friends use minibatches larger than 32" - Yann LeCun'
print(quotes_one)
# 2. Represent string with double quotes ("") and quoted statement with escaped double quotes (\"statement\")
quotes_two = "\"Friends don\'t let friends use minibatches larger than 32\" - Yann LeCun"
print(quotes_two)
# 3. Represent string with triple quotes ("""""") and quoted statement with double quotes ("")
quote_three = """"Friends don\'t let friends use minibatches larger than 32" - Yann LeCun"""
print(quote_three)
"Friends don't let friends use minibatches larger than 32" - Yann LeCun
"Friends don't let friends use minibatches larger than 32" - Yann LeCun
"Friends don't let friends use minibatches larger than 32" - Yann LeCun
Conclusion
Python strings are immutable, and they're one of the basic data types. We can declare them using single, double, or triple quotes, or using the str() function.
We can map every element of a string to a number on the ASCII table. This is the property we use when strings are compared using relational operators. There are many methods available for processing strings.
This article discusses the most commonly used ones, which include methods for splitting strings, checking starting and ending characters, padding, checking string case, and substituting elements in a string. We have also discussed string operations for looping through the members of a string, checking string membership, formatting strings, and handling quotation marks and apostrophes.