April 21, 2022

Python Strings: An In-Depth Tutorial (55+ Code Examples)

Data types help us categorize data items. They determine the kinds of operations that we can perform on a data item. In Python, the common standard data types include numbers, strings, lists, tuples, Booleans, sets, and dictionaries.

In this tutorial, we'll focus on the string data type. We will discuss how to declare the string data type, the relationship between the string data type and the ASCII table, the properties of the string data type, and some important string methods and operations.


What Are Python Strings?

A string is an object that contains a sequence of characters. Unlike other programming languages such as C, Kotlin, and Java, Python has no separate character data type: a single character is simply a string of length one.

We can declare a Python string using a single quote, a double quote, a triple quote, or the str() function. The following code snippet shows how to declare a string in Python:

# A single quote string
single_quote = 'a'  # This is an example of a character in other programming languages. It is a string in Python

# Another single quote string
another_single_quote = 'Programming teaches you patience.'

# A double quote string
double_quote = "aa"

# Another double-quote string
another_double_quote = "It is impossible until it is done!"

# A triple quote string
triple_quote = '''aaa'''

# Also a triple quote string
another_triple_quote = """Welcome to the Python programming language. Ready, 1, 2, 3, Go!"""

# Using the str() function
string_function = str(123.45)  # str() converts float data type to string data type

# Another str() function
another_string_function = str(True)  # str() converts a boolean data type to string data type

# An empty string
empty_string = ''

# Also an empty string
second_empty_string = ""

# We are not done yet
third_empty_string = """"""  # This is also an empty string: ''''''

Another way of getting strings in Python is using the input() function. The input() function allows us to insert values into a program with the keyboard. The inserted values are read as a string, but we can convert them into other data types:

# Inputs into a Python program
input_float = input()  # Type in: 3.142
input_boolean = input() # Type in: True

# Convert inputs into other data types
convert_float = float(input_float)  # converts the string data type to a float
convert_boolean = bool(input_boolean) # converts the string data type to a bool; note that any non-empty string, even 'False', converts to True
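
Because bool() treats every non-empty string as True, typing False at the prompt would still give True. The following is a minimal sketch of one way to interpret such input. Note that parse_bool is a hypothetical helper, not a built-in, and the accepted spellings are an assumption:

```python
def parse_bool(text):
    """Interpret a string such as 'True' or 'false' as a Boolean."""
    # Hypothetical helper: the accepted spellings below are an assumption.
    normalized = text.strip().lower()
    if normalized in ("true", "yes", "1"):
        return True
    if normalized in ("false", "no", "0"):
        return False
    raise ValueError(f"Cannot interpret {text!r} as a Boolean")


print(bool("False"))         # True, because the string is not empty
print(parse_bool("False"))   # False
print(parse_bool(" TRUE "))  # True
```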

We use the type() function to determine the data type of an object in Python. It returns the class of the object. When the object is a string, it returns the str class. Similarly, it returns dict, int, float, tuple, bool class when the object is a dictionary, integer, float, tuple, or Boolean, respectively. Let's now use the type() function to determine the data types of variables declared in the previous code snippets:

# Data types/ classes with type()

print(type(single_quote))
print(type(another_triple_quote))
print(type(empty_string))

print(type(input_float))
print(type(input_boolean))

print(type(convert_float))
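print(type(convert_boolean))
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'float'>
<class 'bool'>
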
We have discussed how to declare strings. Let's now move to the relationship between strings and the ASCII table.


The ASCII Table vs. Python String Character

The American Standard Code for Information Interchange (ASCII) was developed to map characters to numbers, because computers store numbers rather than text. ASCII encodes 128 characters, mainly those used in English text and in programming. They include lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), symbols such as punctuation marks, and a set of non-printing control characters.

The ord() function converts a Python string of length one (a character) to its decimal code point, which for ASCII characters is its decimal representation on the ASCII table, while the chr() function converts the decimal representation back to a string. For instance:

import string

# Convert uppercase characters to their ASCII decimal numbers
ascii_upper_case = string.ascii_uppercase  # Output: ABCDEFGHIJKLMNOPQRSTUVWXYZ

for one_letter in ascii_upper_case[:5]:  # Loop through ABCDE
    print(ord(one_letter))    
65
66
67
68
69

# Convert digit characters to their ASCII decimal numbers
ascii_digits = string.digits  # Output: 0123456789

for one_digit in ascii_digits[:5]:  # Loop through 01234
    print(ord(one_digit)) 
48
49
50
51
52

In the above code snippet, we looped through the strings ABCDE and 01234, and we converted each character to its decimal representation on the ASCII table. We can also carry out the reverse operation with the chr() function, whereby we convert decimal numbers on the ASCII table to their Python string characters. For instance:

decimal_rep_ascii = [37, 44, 63, 82, 100]

for one_decimal in decimal_rep_ascii:
    print(chr(one_decimal))
%
,
?
R
d

On the ASCII table, the string characters in the output of the above program map to their respective decimal numbers. So far, we've discussed how to declare Python strings and how the string character maps to the ASCII table. Next, let's discuss the attributes of a string.
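
As a small illustration of how ord() and chr() undo each other, the sketch below converts a word to its decimal numbers and back. The sample word is arbitrary, and the last line shows that characters outside the ASCII table simply have numbers of 128 or more:

```python
word = "Python"

# Convert each character to its decimal number with ord()
code_points = [ord(character) for character in word]
print(code_points)  # [80, 121, 116, 104, 111, 110]

# Rebuild the original string with chr()
rebuilt = "".join(chr(number) for number in code_points)
print(rebuilt == word)  # True

# A character outside the ASCII table (e with an acute accent)
print(ord("\u00e9"))  # 233
```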


String Properties

Zero Index: The first element in a string has an index of zero, while the last element has an index of len(string) - 1. For example:

immutable_string = "Accountability"

print(len(immutable_string))
print(immutable_string.index('A'))
print(immutable_string.index('y'))
14
0
13

Immutability: we cannot update the characters in a string. For example, we cannot delete an element from a string or assign a new element at any of its index positions. If we try to update the string, Python raises a TypeError:

immutable_string = "Accountability"

# Assign a new element at index 0
immutable_string[0] = 'B'
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_11336/2351953155.py in <module>
      2 
      3 # Assign a new element at index 0
----> 4 immutable_string[0] = 'B'

TypeError: 'str' object does not support item assignment

We can, however, reassign a string to the immutable_string variable, but we should note that they aren't the same string because they don't point to the same object in memory. Python doesn't update the old string object; it creates a new one, as we can see by the ids:

immutable_string = "Accountability"
print(id(immutable_string))

immutable_string = "Bccountability"  
print(id(immutable_string))

test_immutable = immutable_string
print(id(test_immutable))
2693751670576
2693751671024
2693751671024

You will get different ids than the ones in the example because ids depend on where the objects live in memory during a given run. However, the first two ids should differ from each other on any computer, which means that the immutable_string variable pointed to two different objects before and after the reassignment. We then assigned the last immutable_string variable to the test_immutable variable. You can see that test_immutable and the last immutable_string point to the same address.
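
Since we cannot change a string in place, the usual workaround is to build a new string from pieces of the old one. The sketch below does this with slicing and concatenation, two properties covered later in this section. Note that replace_at is a hypothetical helper written for this illustration:

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced."""
    # Hypothetical helper: slicing and + build a brand-new string object.
    return text[:index] + new_char + text[index + 1:]


original = "Accountability"
updated = replace_at(original, 0, "B")

print(updated)   # Bccountability
print(original)  # Accountability (the original string is unchanged)
```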

Concatenation: joining two or more strings together to get a new string with the + symbol. For example:

first_string = "Data"
second_string = "quest"
third_string = "Data Science Path"

fourth_string = first_string + second_string
print(fourth_string)

fifth_string = fourth_string + " " + third_string
print(fifth_string)
Dataquest
Dataquest Data Science Path

Repetition: A string can be repeated with the * symbol. For example:

print("Ha" * 3)
HaHaHa
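
One detail worth keeping in mind is that + only joins strings with strings. The sketch below, with an arbitrary message and count, shows the TypeError we get when we mix in an integer, and how str() avoids it:

```python
count = 3

# Concatenating a string and an integer raises a TypeError
try:
    message = "Lessons completed: " + count
except TypeError as error:
    print(type(error).__name__)  # TypeError

# Convert the integer with str() first
message = "Lessons completed: " + str(count)
print(message)             # Lessons completed: 3

# Repetition is handy for simple underlines
underline = "-" * len(message)
print(underline)
```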

Indexing and Slicing: we already established that strings are zero-indexed. We can access any element in a string with its index value. We can also take subsets of a string by slicing between two index values. For example:

main_string = "I learned R and Python on Dataquest. You can do it too!"

# Index 0
print(main_string[0])

# Index 1
print(main_string[1])

# Check if Index 1 is whitespace
print(main_string[1].isspace())

# Slicing 1
print(main_string[0:11])

# Slicing 2: 
print(main_string[-18:])

# Slicing and concatenation
print(main_string[0:11] + ". " + main_string[-18:])
I

True
I learned R
You can do it too!
I learned R. You can do it too!
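
Slices also accept negative indices and an optional step. The following short sketch, using an arbitrary sample string, shows a few common patterns:

```python
sample = "Dataquest"

print(sample[-1])     # t (last character)
print(sample[:4])     # Data (from the start up to, but not including, index 4)
print(sample[4:])     # quest (from index 4 to the end)
print(sample[::2])    # Dtqet (every second character)
print(sample[::-1])   # tseuqataD (reversed)
print(sample[4:100])  # quest (a slice may run past the end without an error)
```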

String Methods

str.split(sep=None, maxsplit=-1): The string split method takes two parameters: sep and maxsplit. When this method is called with its default values, it splits the string on runs of whitespace. This method returns a list of strings:

string = "Apple, Banana, Orange, Blueberry"
print(string.split())
['Apple,', 'Banana,', 'Orange,', 'Blueberry']

We can see that the string isn't split nicely because the resulting strings still contain the , character. We can use sep=',' to split wherever there is a ,:

print(string.split(sep=','))
['Apple', ' Banana', ' Orange', ' Blueberry']

This is better than the previous split. However, we can see whitespace before some of the split strings. We can remove this with (sep=', '):

# Notice the whitespace after the comma
print(string.split(sep=', '))
['Apple', 'Banana', 'Orange', 'Blueberry']

Now, the string is split nicely. Sometimes, we don't want to split the maximum number of times. We can use the maxsplit parameter to specify the number of times we intend to split:

print(string.split(sep=', ', maxsplit=1))

print(string.split(sep=', ', maxsplit=2))
['Apple', 'Banana, Orange, Blueberry']
['Apple', 'Banana', 'Orange, Blueberry']
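
The default whitespace split and an explicit sep=' ' behave differently when the string contains several spaces in a row. The sketch below, with made-up sample strings, shows the difference, and also uses maxsplit=1 to split only at the first separator:

```python
record = "Apple   Banana  Orange"

# Default split: runs of whitespace count as a single separator
default_split = record.split()
print(default_split)  # ['Apple', 'Banana', 'Orange']

# Explicit separator: every single space splits, leaving empty strings
space_split = record.split(" ")
print(space_split)    # ['Apple', '', '', 'Banana', '', 'Orange']

# maxsplit=1 splits only at the first separator
key, value = "path=/usr/bin=local".split("=", maxsplit=1)
print(key, value)     # path /usr/bin=local
```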

str.splitlines(keepends=False): Sometimes we want to process a corpus with different line breaks ('\n', '\n\n', '\r', '\r\n') at the boundaries. We want to split it into lines, not individual words. We use the splitlines method to do this. When keepends=True, the line breaks are included in the resulting strings; otherwise, they are excluded. Let us see how this is done with Shakespeare's Macbeth text:

import nltk  # You may have to pip install nltk, then run nltk.download('gutenberg') once to fetch the corpus.

macbeth = nltk.corpus.gutenberg.raw('shakespeare-macbeth.txt')
print(macbeth.splitlines(keepends=True)[:5])
['[The Tragedie of Macbeth by William Shakespeare 1603]\n', '\n', '\n', 'Actus Primus. Scoena Prima.\n', '\n']
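
If you don't have nltk installed, the same behavior can be seen on a small hand-made string. This is only a sketch with invented text that mixes three kinds of line breaks:

```python
text = "first line\nsecond line\r\nthird line\rfourth line"

# Line breaks are dropped by default
lines = text.splitlines()
print(lines)
# ['first line', 'second line', 'third line', 'fourth line']

# keepends=True keeps each line break at the end of its line
lines_with_breaks = text.splitlines(keepends=True)
print(lines_with_breaks)
# ['first line\n', 'second line\r\n', 'third line\r', 'fourth line']
```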

str.strip([chars]): We remove leading and trailing whitespace, or the characters given in chars, from both sides of the string with the strip method. For instance:

string = "    Apple Apple Apple no apple in the box apple apple             "

stripped_string = string.strip()
print(stripped_string)

left_stripped_string = (
    stripped_string
    .lstrip('Apple')
    .lstrip()
    .lstrip('Apple')
    .lstrip()
    .lstrip('Apple')
    .lstrip()
)
print(left_stripped_string)

capitalized_string = left_stripped_string.capitalize()
print(capitalized_string)

right_stripped_string = (
    capitalized_string
    .rstrip('apple')
    .rstrip()
    .rstrip('apple')
    .rstrip()
)
print(right_stripped_string)
Apple Apple Apple no apple in the box apple apple
no apple in the box apple apple
No apple in the box apple apple
No apple in the box

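In the above code snippet, we have used the lstrip and rstrip methods, which remove leading and trailing whitespace or characters from the left and right sides of the string, respectively. Note that the chars argument is treated as a set of characters to remove, not as a prefix or suffix. We have also used the capitalize method, which converts the first character to uppercase and the rest to lowercase.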
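
Because the argument to strip, lstrip, and rstrip is a set of characters, the result can be surprising when we actually want to remove a whole prefix. The sketch below illustrates this with a made-up URL. Note that strip_prefix is a hypothetical helper (Python 3.9+ also provides str.removeprefix for the same purpose):

```python
url = "www.example.com"

# Every leading or trailing character found in "w.com" is removed
stripped = url.strip("w.com")
print(stripped)  # example


def strip_prefix(text, prefix):
    """Remove prefix from the start of text only if it is really there."""
    # Hypothetical helper built from startswith() and slicing.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


without_www = strip_prefix(url, "www.")
print(without_www)  # example.com
```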

str.zfill(width): The zfill method pads a string on the left with 0 characters until it reaches the specified width. For instance:

example = "0.8"  # len(example) is 3
example_zfill = example.zfill(5) # len(example_zfill) is 5
print(example_zfill)
000.8
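
A few more cases may help clarify how zfill behaves. The sketch below uses arbitrary values and shows that a leading sign stays in front of the padding, that strings already wider than the width are returned unchanged, and that an f-string format specifier can pad numbers in a similar way:

```python
padded = "42".zfill(5)
print(padded)              # 00042

# A leading sign is kept in front of the zeros
signed = "-42".zfill(5)
print(signed)              # -0042

# Strings longer than the width are returned unchanged
unchanged = "12345".zfill(3)
print(unchanged)           # 12345

# Zero padding a number directly with a format specifier
formatted = f"{7:03d}"
print(formatted)           # 007
```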

str.isalpha(): This method returns True if all the characters in the string are alphabetic; otherwise, it returns False:

# Alphabet string
alphabet_one = "Learning"
print(alphabet_one.isalpha())

# Contains whitespace
alphabet_two = "Learning Python"
print(alphabet_two.isalpha())

# Contains comma symbols
alphabet_three = "Learning,"
print(alphabet_three.isalpha())
True
False
False

Similarly, str.isalnum() returns True if all the string characters are alphanumeric; str.isdecimal() returns True if all the string characters are decimal characters; str.isdigit() returns True if all the string characters are digits; and str.isnumeric() returns True if all the string characters are numeric. Each of these methods returns False for an empty string.
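
The difference between isdecimal, isdigit, and isnumeric only shows up with characters beyond the plain digits 0-9. The sketch below compares the three methods on a regular digit, a superscript two, and the fraction one half; the choice of sample characters is ours:

```python
samples = {
    "digit five": "5",
    "superscript two": "\u00b2",
    "fraction one half": "\u00bd",
}

results = {}
for name, character in samples.items():
    results[name] = (
        character.isdecimal(),
        character.isdigit(),
        character.isnumeric(),
    )
    print(name, results[name])

# digit five (True, True, True)
# superscript two (False, True, True)
# fraction one half (False, False, True)
```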

str.islower() returns True if all the characters in the string are lowercase. str.isupper() returns True if all the characters in the string are uppercase, and str.istitle() returns True if the first letter of every word is capitalized:

# islower() example
string_one = "Artificial Neural Network"
print(string_one.islower())

string_two = string_one.lower()  # converts string to lowercase
print(string_two.islower())

# isupper() example
string_three = string_one.upper() # converts string to uppercase
print(string_three.isupper())

# istitle() example
print(string_one.istitle())
False
True
True
True

str.endswith(suffix) returns True if the string ends with the specified suffix. str.startswith(prefix) returns True if the string begins with the specified prefix. Both methods also accept a tuple of candidates and return True if any of them matches:

sentences = ['Time to master data science', 'I love statistical computing', 'Eat, sleep, code']

# endswith() example
for one_sentence in sentences:
    print(one_sentence.endswith(('science', 'computing', 'Code')))
True
True
False

# startswith() example
for one_sentence in sentences:
    print(one_sentence.startswith(('Time', 'I ', 'Ea')))
True
True
True
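
A common use of these two methods is filtering names by their beginning or ending. The sketch below uses a made-up list of file names, and lowercases each name first so that the check ignores case:

```python
file_names = ["report.csv", "notes.txt", "data.CSV", "temp_report.csv"]

# Keep the CSV files, regardless of the case of the extension
csv_files = [name for name in file_names if name.lower().endswith(".csv")]
print(csv_files)   # ['report.csv', 'data.CSV', 'temp_report.csv']

# Keep the files whose name starts with a given prefix
temp_files = [name for name in file_names if name.startswith("temp_")]
print(temp_files)  # ['temp_report.csv']
```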

str.find(substring) returns the lowest index at which the substring starts if it is present in the string; otherwise, it returns -1. str.rfind(substring) returns the highest such index. The str.index(substring) and str.rindex(substring) methods also return the lowest and highest index of the substring, respectively, if found. If the substring isn't present in the string, they raise a ValueError.

string = "programming"

# find() and rfind() examples
print(string.find('m'))
print(string.find('pro'))
print(string.rfind('m'))
print(string.rfind('game'))

# index() and rindex() examples
print(string.index('m'))
print(string.index('pro'))
print(string.rindex('m'))
print(string.rindex('game'))
6
0
7
-1
6
0
7

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_11336/3954098241.py in <module>
     11 print(string.index('pro'))  # Output: 0
     12 print(string.rindex('m'))  # Output: 7
---> 13 print(string.rindex('game'))  # Output: ValueError: substring not found

ValueError: substring not found
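
Since find() returns -1 instead of raising an error, it is convenient inside a loop. The sketch below collects every position of a substring by passing a start index as the second argument to find(). Note that find_all is a hypothetical helper, not a string method:

```python
def find_all(text, substring):
    """Return the start index of every occurrence of substring in text."""
    # Hypothetical helper: repeatedly search from just after the last hit.
    positions = []
    start = text.find(substring)
    while start != -1:
        positions.append(start)
        start = text.find(substring, start + 1)
    return positions


print(find_all("programming", "m"))     # [6, 7]
print(find_all("programming", "r"))     # [1, 4]
print(find_all("programming", "game"))  # []
```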

str.maketrans(dict_map) creates a translation table from a dictionary map, and str.translate(table) replaces each character of the string that appears in the table with its new value. For example:

example = "abcde"
mapped = {'a':'1', 'b':'2', 'c':'3', 'd':'4', 'e':'5'}
print(example.translate(example.maketrans(mapped))) 
12345
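
str.maketrans also accepts two strings of equal length instead of a dictionary, plus an optional third string of characters to delete. The sketch below shows both forms on arbitrary sample text:

```python
import string

# Two-argument form: each character of the first string maps to the
# character at the same position in the second string
table = str.maketrans("abcde", "12345")
translated = "decade".translate(table)
print(translated)  # 453145

# Three-argument form: every character in the third string is deleted
remove_punctuation = str.maketrans("", "", string.punctuation)
cleaned = "Hello, world!".translate(remove_punctuation)
print(cleaned)     # Hello world
```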

String Operations

Looping through a string. Strings are iterable. Therefore, we can loop through them with a for loop, and with enumerate when we also need the index:

# For-loop example
word = "bank"
for letter in word:
    print(letter)
b
a
n
k

# Enumerate example
for idx, value in enumerate(word):
    print(idx, value)
0 b
1 a
2 n
3 k
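
Two small variations on these loops, sketched with arbitrary words: enumerate accepts a start argument when we want counting to begin at 1, and a loop combined with the in operator can count particular characters:

```python
word = "bank"

# Start counting from 1 instead of 0
numbered = [(position, letter) for position, letter in enumerate(word, start=1)]
print(numbered)     # [(1, 'b'), (2, 'a'), (3, 'n'), (4, 'k')]

# Count the vowels in a word by looping through its characters
vowel_count = sum(1 for letter in "programming" if letter in "aeiou")
print(vowel_count)  # 3
```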

String and relational operators: when two strings are compared using relational operators (>, <, ==, etc.), the elements of the two strings are compared by their ASCII decimal numbers (more generally, their Unicode code points) index by index. For example:

print('a' > 'b')
print('abc' > 'b')
False
False

In both cases, the output is False. The relational operator first compared the ASCII decimal numbers of the elements on index 0 for both strings. Since b is greater than a, it returns False; the ASCII decimal numbers of the other elements, and the length of the strings do not matter in this case.

When the elements at index 0 are equal, it compares the ASCII decimal numbers of each following element until it finds elements with different ASCII decimal numbers. For example:

print('abd' > 'abc')
True

In the above code snippet, the first two elements have the same ASCII decimal numbers; however, there is a mismatch in the third element, and since d is greater than c, it returns True. In a situation where all the ASCII numbers for the elements match, the longer string is greater than the shorter one. For example:

print('abcd' > 'abc')
True
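
One practical consequence of comparing by ASCII decimal numbers is that every uppercase letter sorts before every lowercase letter. The sketch below shows this with a made-up word list, and uses key=str.lower to sort without regard to case:

```python
# Uppercase letters have smaller decimal numbers than lowercase letters
print(ord("Z"), ord("a"))  # 90 97
print("Zebra" < "apple")   # True

words = ["banana", "Cherry", "apple"]

# Default sorting puts the capitalized word first
default_order = sorted(words)
print(default_order)       # ['Cherry', 'apple', 'banana']

# Comparing lowercase versions gives alphabetical order
alphabetical_order = sorted(words, key=str.lower)
print(alphabetical_order)  # ['apple', 'banana', 'Cherry']
```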

Checking membership of a string. The in operator is used to check if a substring is a member of a string:

print('data' in 'dataquest')
print('gram' in 'programming')
True
True

Another way to check the membership of a string, replace a substring, or match a pattern is to use regular expressions:

import re

substring = 'gram'
string = 'programming'
replacement = '1234'

# Check membership
print(re.search(substring, string))

# Replace string
print(re.sub(substring, replacement, string))
<re.Match object; span=(3, 7), match='gram'>
pro1234ming
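
re.search returns a match object when the pattern is found and None when it isn't, so we usually test the result before using it. The sketch below, with arbitrary sample strings, reads the matched text and its position, and also replaces a pattern rather than a fixed substring:

```python
import re

match = re.search("gram", "programming")
if match:
    # group() is the matched text; start() and end() give its position
    print(match.group(), match.start(), match.end())  # gram 3 7

# No match gives None
no_match = re.search("game", "programming")
print(no_match)  # None

# \d+ matches one or more digits, so every number is replaced
masked = re.sub(r"\d+", "#", "room 101, floor 3")
print(masked)    # room #, floor #
```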

String formatting. f-string and str.format() methods are used to format strings. Both use curly bracket {} placeholders. For example:

monday, tuesday, wednesday = "Monday", "Tuesday", "Wednesday"

format_string_one = "{} {} {}".format(monday, tuesday, wednesday)
print(format_string_one)

format_string_two = "{2} {1} {0}".format(monday, tuesday, wednesday)
print(format_string_two)

format_string_three = "{one} {two} {three}".format(one=tuesday, two=wednesday, three=monday)
print(format_string_three)

format_string_four = f"{monday} {tuesday} {wednesday}"
print(format_string_four)
Monday Tuesday Wednesday
Wednesday Tuesday Monday
Tuesday Wednesday Monday
Monday Tuesday Wednesday

f-strings are more readable, and they run faster than the str.format() method. Therefore, f-strings are generally the preferred way to format strings.
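
Inside the curly brackets, a colon introduces a format specifier that controls precision, width, and alignment. The sketch below uses arbitrary values and shows a few common specifiers, which work the same way in f-strings and in str.format():

```python
price = 1234.5678
name = "Ada"

two_decimals = f"{price:.2f}"
print(two_decimals)     # 1234.57

with_separator = f"{price:,.1f}"
print(with_separator)   # 1,234.6

# Right-align and left-align within a width of 6 characters
right_aligned = f"{name:>6}|"
left_aligned = f"{name:<6}|"
print(right_aligned)    #    Ada|
print(left_aligned)     # Ada   |

# The same specifier with str.format()
same_with_format = "{:.2f}".format(price)
print(same_with_format)  # 1234.57
```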

Handling Quotation and Apostrophe: the apostrophe sign (') delimits a string in Python. To let Python know that an apostrophe is part of the text rather than the end of a single-quoted string, we have to use the Python escape character (\). So, an apostrophe is represented as \' in Python. Unlike handling apostrophes, there are many ways to handle quotations in Python. They include the following:

# 1. Represent string with single quote ('') and quoted statement with double quote ("")
quotes_one =  '"Friends don\'t let friends use minibatches larger than 32" - Yann LeCun'
print(quotes_one)

# 2. Represent string with double quote ("") and quoted statement with escape and double quote (\"statement\")
quotes_two =  "\"Friends don\'t let friends use minibatches larger than 32\" - Yann LeCun"
print(quotes_two)

# 3. Represent string with triple quote ("""""") and quoted statement with double quote ("")
quote_three = """"Friends don\'t let friends use minibatches larger than 32" - Yann LeCun"""
print(quote_three)
"Friends don't let friends use minibatches larger than 32" - Yann LeCun
"Friends don't let friends use minibatches larger than 32" - Yann LeCun
"Friends don't let friends use minibatches larger than 32" - Yann LeCun

Conclusion

Python strings are immutable, and they're one of the basic data types. We can declare them using single, double, or triple quotes, or using the str() function.

We can map every element of a string to a number on the ASCII table. This is the property we use when strings are compared using relational operators. There are many methods available for processing strings.

This article discusses the most commonly used ones, which include methods for splitting strings, checking starting and ending characters, padding, checking string case, and substituting elements in a string. We have also discussed string operations for looping through the members of a string, checking string membership, formatting strings, and handling quotation marks and apostrophes.

Aghogho Monorien

About the author

Aghogho is an engineer and aspiring Quant working on the applications of artificial intelligence in finance.