8 Rarely Used Python Libraries & How to Use Them
The most popular Python libraries out there are usually TensorFlow, NumPy, PyTorch, Pandas, Scikit-Learn, Keras and a few others. Although you may come across these names pretty frequently, there are thousands of other Python libraries that you can work with. In this article, we are going to focus on Python libraries that are rarely used or heard of, but that are incredibly useful for solving specific tasks or for a fun project.
The Python libraries we are going to practice on are:
- Missingno
- Tabulate
- Wikipedia
- Wget
- Faker
- Numerizer
- Emoji
- PyAztro
To begin, we’ll download a dataset from Kaggle – Animal Care and Control Adopted Animals – and load it into a dataframe:
import pandas as pd
df = pd.read_csv('animal-data-1.csv')
print('Number of pets:', len(df))
print(df.head(3))
Number of pets: 10290
id intakedate intakereason istransfer sheltercode \
0 15801 2009-11-28 00:00:00 Moving 0 C09115463
1 15932 2009-12-08 00:00:00 Moving 0 D09125594
2 28859 2012-08-10 00:00:00 Abandoned 0 D12082309
identichipnumber animalname breedname basecolour speciesname \
0 0A115D7358 Jadzia Domestic Short Hair Tortie Cat
1 0A11675477 Gonzo German Shepherd Dog/Mix Tan Dog
2 0A13253C7B Maggie Shep Mix/Siberian Husky Various Dog
... movementdate movementtype istrial returndate returnedreason \
0 ... 2017-05-13 00:00:00 Adoption 0.0 NaN Stray
1 ... 2017-04-24 00:00:00 Adoption 0.0 NaN Stray
2 ... 2017-04-15 00:00:00 Adoption 0.0 NaN Stray
deceaseddate deceasedreason diedoffshelter puttosleep isdoa
0 NaN Died in care 0 0 0
1 NaN Died in care 0 0 0
2 NaN Died in care 0 0 0
[3 rows x 23 columns]
1. Missingno
Library installation: pip install missingno
What is Missingno in Python? - Missingno is a special Python library used for displaying missing values in a dataframe. Of course, we can also use a seaborn heatmap or a bar plot from any visualization library for this purpose. However, in such cases, we’ll first have to create a series containing the number of missing values in each column using df.isnull().sum(), while missingno already does all this under the hood. This Python library offers a few types of charts:
- matrix displays density patterns in data completion for up to 50 columns of a dataframe, and it is analogous to the seaborn missing value heatmap. Also, by means of the sparkline at right, it shows the general shape of the data completeness by row, emphasizing the rows with the maximum and minimum nullity.
- bar chart shows nullity visualization in bars by column.
- heatmap measures nullity correlation that ranges from -1 to 1. Essentially, it shows how strongly the presence or absence of one variable affects the presence of another. Columns with no missing values, or just the opposite, completely empty, are excluded from the visualization, having no meaningful correlation.
- dendrogram, like the heatmap, measures nullity relationships between columns, but in this case not pairwise but between groups of columns, detecting clusters of missing data. Variables that are located closer together on the chart show a stronger nullity correlation. For dataframes with fewer than 50 columns the dendrogram is vertical; otherwise, it flips to horizontal.
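To make the quantities behind these charts more concrete, here is a minimal sketch on a made-up toy dataframe (the column names and values are invented for illustration). It computes the per-column null counts that the bar chart summarizes and a pairwise nullity correlation in the spirit of the heatmap, using plain pandas. It approximates the idea and is not missingno's own code.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'chip': ['A1', None, 'C3', None, 'E5', 'F6'],
    'returndate': ['2020-01-01', None, '2020-02-01', None, None, '2020-03-01'],
    'name': ['Rex', 'Tom', 'Ada', 'Bo', 'Cy', 'Di'],
})

# Number of missing values per column (what the bar chart summarizes)
null_counts = toy.isnull().sum()

# 1 where a value is missing, 0 where it is present
nullity = toy.isnull().astype(int)

# Drop columns that are completely full or completely empty:
# their nullity never varies, so a correlation is undefined
varying = nullity.loc[:, nullity.nunique() > 1]

# Pairwise Pearson correlation of the missingness indicators
nullity_corr = varying.corr()

print(null_counts)
print(nullity_corr.round(2))
```

In this toy frame the fully populated name column drops out, which mirrors how the heatmap excludes columns without any variation in nullity.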
Let’s try all these charts with their default settings on our pet dataset:
import missingno as msno
msno.matrix(df)
msno.bar(df)
msno.heatmap(df)
msno.dendrogram(df)
We can make the following observations about the dataset:
- In general, there are a few missing values.
- The most empty columns are deceaseddate and returndate.
- The majority of pets are chipped.
- Nullity correlation:
- slightly negative between being chipped and being dead,
- slightly positive – being chipped vs. being returned, being returned vs. being dead.
There are a few options to customize missingno charts: figsize, fontsize, sort (sorts the rows by completeness, in either ascending or descending order), labels (can be True or False, meaning whether to show or not the column labels). Some parameters are chart-specific: color for matrix and bar charts, sparkline (whether to draw it or not) and width_ratios (matrix width to sparkline width) for matrix, log (logarithmic scale) for bar charts, cmap colormap for heatmap, orientation for dendrogram. Let’s apply some of them to one of our charts above:
msno.matrix(
df,
figsize=(25,7),
fontsize=30,
sort='descending',
color=(0.494, 0.184, 0.556),
width_ratios=(10, 1)
)
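Finally, if there is still something we would like to tune, we can always add any functionality of matplotlib to the missingno graphs. To do this, we should add the parameter inline and assign it to False (this applies to the missingno version used here; as far as I know, later releases removed the inline parameter and simply return the matplotlib axes). Let’s add a title to our matrix chart: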
import matplotlib.pyplot as plt
msno.matrix(
    df,
    figsize=(25,7),
    fontsize=30,
    sort='descending',
    color=(0.494, 0.184, 0.556),
    width_ratios=(10, 1),
    inline=False
)
plt.title('Missing Values Pet Dataset', fontsize=55)
plt.show()
For further practice, let’s keep only the most interesting columns of our dataframe:
columns = ['identichipnumber', 'animalname', 'breedname', 'speciesname', 'sexname', 'returndate',
'returnedreason']
df = df[columns]
2. Tabulate
Library installation: pip install tabulate
What is Tabulate in Python? - This Python library serves for pretty-printing tabular data in Python. It allows smart and customizable column alignment, number and text formatting, alignment by a decimal point.
The tabulate() function takes a tabular data type (dataframe, list of lists or dictionaries, dictionary, NumPy array), some other optional parameters, and outputs a nicely formatted table. Let’s practice it on a fragment of our pet dataset, starting with the most basic pretty-printed table:
from tabulate import tabulate
df_pretty_printed = df.iloc[:5, [1,2,4,6]]
print(tabulate(df_pretty_printed))
-  -----------  -----------------------  ------  -----
0  Jadzia       Domestic Short Hair      Female  Stray
1  Gonzo        German Shepherd Dog/Mix  Male    Stray
2  Maggie       Shep Mix/Siberian Husky  Female  Stray
3  Pretty Girl  Domestic Short Hair      Female  Stray
4  Pretty Girl  Domestic Short Hair      Female  Stray
-  -----------  -----------------------  ------  -----
We can add a headers parameter to our table. If we assign headers='firstrow', the first row of data is used, if headers='keys' – the keys of a dataframe / dictionary. For table formatting, we can use a tablefmt parameter, which can take one of the numerous options (assigned as a string): simple, github, grid, fancy_grid, pipe, orgtbl, jira, presto, pretty, etc.
By default, tabulate aligns columns containing float numbers by a decimal point, integers – to the right, text columns – to the left. This can be overridden by using numalign and stralign parameters (right, center, left, decimal for numbers, or None). For text columns, it’s possible to disable the default leading and trailing whitespace removal.
Let’s customize our table:
print(tabulate(
df_pretty_printed,
headers='keys',
tablefmt='fancy_grid',
stralign='center'
))
╒════╤══════════════╤═════════════════════════╤═══════════╤══════════════════╕
│    │  animalname  │        breedname        │  sexname  │  returnedreason  │
╞════╪══════════════╪═════════════════════════╪═══════════╪══════════════════╡
│  0 │    Jadzia    │   Domestic Short Hair   │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  1 │    Gonzo     │ German Shepherd Dog/Mix │   Male    │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  2 │    Maggie    │ Shep Mix/Siberian Husky │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  3 │ Pretty Girl  │   Domestic Short Hair   │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  4 │ Pretty Girl  │   Domestic Short Hair   │  Female   │      Stray       │
╘════╧══════════════╧═════════════════════════╧═══════════╧══════════════════╛
One thing to keep in mind when using this Python library is that its pretty-printed tables may not display well on smaller screens such as smartphones; they are best viewed on laptops and desktop computers.
3. Wikipedia
Library installation: pip install wikipedia
What is the Wikipedia library in Python? - The Wikipedia library, as its name suggests, facilitates accessing and fetching information from Wikipedia. Some of the tasks that can be accomplished with it include:
- searching Wikipedia – search(),
- getting article summaries – summary(),
- getting full page contents, including images, links, any other metadata of a Wikipedia page – page(),
- selecting the language of a page – set_lang().
In the pretty-printed table above, we saw a dog breed called “Siberian Husky”. As an exercise, we’ll set the language to Russian (my native language 🙂) and search for some suggestions of the corresponding Wikipedia pages:
import wikipedia
wikipedia.set_lang('ru')
print(wikipedia.search('Siberian Husky'))
['Сибирский хаски', 'Древние породы собак', 'Маккензи Ривер Хаски', 'Породы собак по классификации кинологических организаций', 'Ричардсон, Кевин Майкл']
Let’s take the first suggestion and fetch the first sentence of that page’s summary:
print(wikipedia.summary('Сибирский хаски', sentences=1))
Сибирский хаски — заводская специализированная порода собак, выведенная чукчами северо-восточной части Сибири и зарегистрированная американскими кинологами в 1930-х годах как ездовая собака, полученная от аборигенных собак Дальнего Востока России, в основном из Анадыря, Колымы, Камчатки у местных оседлых приморских племён — юкагиров, кереков, азиатских эскимосов и приморских чукчей — анкальын (приморские, поморы — от анкы (море)).
Now, we’re going to get a link to a picture of Husky from this page:
print(wikipedia.page('Сибирский хаски').images[0])
and visualize this beautiful creature:
4. Wget
Library installation: pip install wget
The Wget library allows downloading files in Python without the necessity to open them. We can also pass a path where the file should be saved as a second argument.
Let’s download the picture of Husky above:
import wget
wget.download('https://upload.wikimedia.org/wikipedia/commons/a/a3/Black-Magic-Big-Boy.jpg')
'Black-Magic-Big-Boy.jpg'
Now we can find the picture in the same folder as this notebook, since we didn’t specify a path where to save it.
Since any webpage on the Internet is actually an HTML file, another very useful application of this library is to crawl the whole webpage, with all its elements. Let’s download the Kaggle webpage where our dataset is located:
wget.download('https://www.kaggle.com/jinbonnie/animal-data')
'animal-data'
The resulting animal-data file looks like the following (we’ll display only several first rows):
<!DOCTYPE html>
<html lang="en">
<head>
<title>Animal Care and Control Adopted Animals | Kaggle</title>
<meta charset="utf-8" />
<meta name="robots" content="index, follow" />
<meta name="description" content="animal situation in Bloomington Animal Shelter from 2017-2020" />
<meta name="turbolinks-cache-control" content="no-cache" />
5. Faker
Library installation: pip install Faker
What is the Faker library in Python? - This module is used to generate fake data, including names, addresses, emails, phone numbers, jobs, texts, sentences, colors, currencies, etc. The faker generator can take a locale as an argument (the default is the en_US locale) to return localized data. For generating a piece of text or a sentence, we can use the default lorem ipsum; alternatively, we can provide our own set of words. To ensure that all the created values are unique for some specific instance (for example, when we want to create a long list of fake names), the .unique property is applied. If, instead, it’s necessary to produce the same value or data set every time, the seed() method is used.
Let’s look at some examples.
from faker import Faker
fake = Faker()
print(
    'Fake color:', fake.color(), '\n'
    'Fake job:', fake.job(), '\n'
    'Fake email:', fake.email(), '\n'
)
# Printing a list of fake Korean and Portuguese addresses
fake = Faker(['ko_KR', 'pt_BR'])
for _ in range(5):
    print(fake.unique.address())  # using the .unique property
print('\n')
# Assigning a seed number to print always the same value / data set
fake = Faker()
Faker.seed(3920)
print('This English fake name is always the same:', fake.name())
Fake color: #212591
Fake job: Occupational therapist
Fake email: [email protected]
Estrada Lavínia da Luz, 62
este
5775858 Moura / SE
Residencial de Moreira, 57
Morro Dos Macacos
5273529 Farias / TO
세종특별자치시 강남구 가락거리 (예원박김마을)
전라북도 광주시 백제고분길 (승민우리)
경상남도 당진시 가락53가
This English fake name is always the same: Kim Lopez
Returning to our dataset, we found out that there are at least two unlucky pets with not really nice names:
df_bad_names = df[df['animalname'].str.contains('Stink|Pooh')]
print(df_bad_names)
identichipnumber animalname breedname speciesname sexname \
692 NaN Stinker Domestic Short Hair Cat Male
336 981020023417175 Pooh German Shepherd Dog Dog Female
337 981020023417175 Pooh German Shepherd Dog Dog Female
returndate returnedreason
692 NaN Stray
336 2018-05-14 00:00:00 Incompatible with owner lifestyle
337 NaN Stray
The dog in the last 2 rows is actually the same one, returned to the shelter because of being incompatible with the owner’s lifestyle. With our new skills, let’s save the reputation of both animals and give them more decent names. Since the dog is a German Shepherd, we’ll select a German name for her. As for the cat, according to this Wikipedia page, Domestic Short Hair is the most common breed in the US, so for him, we’ll select an English name.
# Defining a function to rename the unlucky pets
def rename_pets(name):
    if name == 'Stinker':
        fake = Faker()
        Faker.seed(162)
        name = fake.name()
    if name == 'Pooh':
        fake = Faker(['de_DE'])
        Faker.seed(20387)
        name = fake.name()
    return name
# Renaming the pets
df['animalname'] = df['animalname'].apply(rename_pets)
# Checking the results
print(df.iloc[df_bad_names.index.tolist(), :])
identichipnumber animalname breedname speciesname \
692 NaN Steven Harris Domestic Short Hair Cat
336 981020023417175 Helena Fliegner-Karz German Shepherd Dog Dog
337 981020023417175 Helena Fliegner-Karz German Shepherd Dog Dog
sexname returndate returnedreason
692 Male NaN Stray
336 Female 2018-05-14 00:00:00 Incompatible with owner lifestyle
337 Female NaN Stray
Steven Harris and Helena Fliegner-Karz sound a little bit too bombastic for a cat and a dog, but definitely much better than their previous names!
6. Numerizer
Library installation: pip install numerizer
What is the Numerizer library in Python? - This small Python package is used for converting natural language numerics into numbers (integers and floats) and consists of only one function – numerize().
Let’s try it right now on our dataset. Some pets’ names contain numbers:
df_numerized_names = df[['identichipnumber', 'animalname', 'speciesname']]\
[df['animalname'].str.contains('Two|Seven|Fifty')]
df_numerized_names
| | identichipnumber | animalname | speciesname |
|---|---|---|---|
| 2127 | NaN | Seven | Dog |
| 4040 | 981020025503945 | Fifty Lee | Cat |
| 6519 | 981020021481875 | Two Toes | Cat |
| 6520 | 981020021481875 | Two Toes | Cat |
| 7757 | 981020029737857 | Mew Two | Cat |
| 7758 | 981020029737857 | Mew Two | Cat |
| 7759 | 981020029737857 | Mew Two | Cat |
We’re going to convert the numeric part of these names into real numbers:
from numerizer import numerize
df['animalname'] = df['animalname'].apply(lambda x: numerize(x))
df[['identichipnumber', 'animalname', 'speciesname']].iloc[df_numerized_names.index.tolist(), :]
| | identichipnumber | animalname | speciesname |
|---|---|---|---|
| 2127 | NaN | 7 | Dog |
| 4040 | 981020025503945 | 50 Lee | Cat |
| 6519 | 981020021481875 | 2 Toes | Cat |
| 6520 | 981020021481875 | 2 Toes | Cat |
| 7757 | 981020029737857 | Mew 2 | Cat |
| 7758 | 981020029737857 | Mew 2 | Cat |
| 7759 | 981020029737857 | Mew 2 | Cat |
7. Emoji
Library installation: pip install emoji
What is the Emoji library in Python? - By using this library, we can convert strings to emoji, according to the emoji codes defined by the Unicode Consortium and, if use_aliases=True is specified, complemented with the aliases. The emoji package has only two functions: emojize() and demojize(). The default English language (language='en') can be changed to Spanish (es), Portuguese (pt), or Italian (it).
import emoji
print(emoji.emojize(':koala:'))
print(emoji.demojize('🐨'))
print(emoji.emojize(':rana:', language='it'))
🐨
:koala:
🐸
Let’s emojize our animals. First, we’ll check their unique species names:
print(df['speciesname'].unique())
['Cat' 'Dog' 'House Rabbit' 'Rat' 'Bird' 'Opossum' 'Chicken' 'Wildlife'
'Ferret' 'Tortoise' 'Pig' 'Hamster' 'Guinea Pig' 'Gerbil' 'Lizard'
'Hedgehog' 'Chinchilla' 'Goat' 'Snake' 'Squirrel' 'Sugar Glider' 'Turtle'
'Tarantula' 'Mouse' 'Raccoon' 'Livestock' 'Fish']
We have to convert these names into lower case, add leading and trailing colons to each, and then apply emojize() to the result:
# Note: in emoji 2.0 and later, use_aliases=True was replaced by language='alias'
df['speciesname'] = df['speciesname'].apply(lambda x: emoji.emojize(f':{x.lower()}:',
                                                                    use_aliases=True))
print(df['speciesname'].unique())
['' '' ':house rabbit:' '' '' ':opossum:' '' ':wildlife:' ':ferret:'
':tortoise:' '' '' ':guinea pig:' ':gerbil:' '' '' ':chinchilla:' ''
'' ':squirrel:' ':sugar glider:' '' ':tarantula:' '' '' ':livestock:'
'']
Let’s rename the house rabbit, tortoise, and squirrel into their synonyms comprehensible for the emoji library and try emojizing them again:
df['speciesname'] = df['speciesname'].str.replace(':house rabbit:', ':rabbit:')\
.replace(':tortoise:', ':turtle:')\
.replace(':squirrel:', ':chipmunk:')
df['speciesname'] = df['speciesname'].apply(lambda x: emoji.emojize(x, variant='emoji_type'))
print(df['speciesname'].unique())
['' '' '️' '' '' ':opossum:️' '' ':wildlife:️' ':ferret:️' '️' ''
'' ':guinea pig:' ':gerbil:️' '' '' ':chinchilla:️' '' '' ''
':sugar glider:' '' ':tarantula:️' '' '' ':livestock:️' '']
The remaining species are of collective names (wildlife and livestock), or don’t have an emoji equivalent, at least not yet. We’ll leave them as they are, removing only the colons and converting them back into title case:
df['speciesname'] = df['speciesname'].str.replace(':', '').apply(lambda x: x.title())
print(df['speciesname'].unique())
df[['animalname', 'speciesname', 'breedname']].head(3)
['' '' '️' '' '' 'Opossum️' '' 'Wildlife️' 'Ferret️' '️' '' ''
'Guinea Pig' 'Gerbil️' '' '' 'Chinchilla️' '' '' '' 'Sugar Glider'
'' 'Tarantula️' '' '' 'Livestock️' '']
| | animalname | speciesname | breedname |
|---|---|---|---|
| 0 | Jadzia | | Domestic Short Hair |
| 1 | Gonzo | | German Shepherd Dog/Mix |
| 2 | Maggie | | Shep Mix/Siberian Husky |
8. PyAztro
Library installation: pip install pyaztro
What is the PyAztro library in Python? - PyAztro seems to be designed more for fun than for work. This library provides a horoscope for each zodiac sign. The prediction includes the description of a sign for that day, date range of that sign, mood, lucky number, lucky time, lucky color, compatibility with other signs. For example:
import pyaztro
pyaztro.Aztro(sign='taurus').description
'You need to make a radical change in some aspect of your life - probably related to your home. It could be time to buy or sell or just to move on to some more promising location.'
Great! I’m already running to buy a new house
In our dataset, there are a cat and a dog called Aries:
df[['animalname', 'speciesname']][(df['animalname'] == 'Aries')]
| | animalname | speciesname |
|---|---|---|
| 3036 | Aries | |
| 9255 | Aries | |
and plenty of pets called Leo:
print('Leo:', df['animalname'][(df['animalname'] == 'Leo')].count())
Leo: 18
Let’s assume that those are their corresponding zodiac signs 😉 With PyAztro, we can check what the stars have prepared for these animals for today:
aries = pyaztro.Aztro(sign='aries')
leo = pyaztro.Aztro(sign='leo')
print('ARIES: \n',
'Sign:', aries.sign, '\n',
'Current date:', aries.current_date, '\n',
'Date range:', aries.date_range, '\n',
'Sign description:', aries.description, '\n',
'Mood:', aries.mood, '\n',
'Compatibility:', aries.compatibility, '\n',
'Lucky number:', aries.lucky_number, '\n',
'Lucky time:', aries.lucky_time, '\n',
'Lucky color:', aries.color, 2*'\n',
'LEO: \n',
'Sign:', leo.sign, '\n',
'Current date:', leo.current_date, '\n',
'Date range:', leo.date_range, '\n',
'Sign description:', leo.description, '\n',
'Mood:', leo.mood, '\n',
'Compatibility:', leo.compatibility, '\n',
'Lucky number:', leo.lucky_number, '\n',
'Lucky time:', leo.lucky_time, '\n',
'Lucky color:', leo.color)
ARIES:
Sign: aries
Current date: 2021-02-06
Date range: [datetime.datetime(2021, 3, 21, 0, 0), datetime.datetime(2021, 4, 20, 0, 0)]
Sign description: It's a little harder to convince people your way is best today -- in part because it's much tougher to play on their emotions. Go for the intellectual arguments and you should do just fine.
Mood: Helpful
Compatibility: Leo
Lucky number: 18
Lucky time: 8am
Lucky color: Gold
LEO:
Sign: leo
Current date: 2021-02-06
Date range: [datetime.datetime(2021, 7, 23, 0, 0), datetime.datetime(2021, 8, 22, 0, 0)]
Sign description: Big problems need big solutions -- but none of the obvious ones seem to be working today! You need to stretch your mind as far as it will go in order to really make sense of today's issues.
Mood: Irritated
Compatibility: Libra
Lucky number: 44
Lucky time: 12am
Lucky color: Navy Blue
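The two nearly identical print blocks above could be collapsed into a small helper. This is only a sketch: the function name format_horoscope is hypothetical, the attribute names are the ones used in the print call above, and a stand-in object with placeholder values is used so that the example runs without querying the horoscope service.

```python
from types import SimpleNamespace

# Display labels paired with the attribute names used in the article
FIELDS = [
    ('Sign', 'sign'),
    ('Current date', 'current_date'),
    ('Date range', 'date_range'),
    ('Sign description', 'description'),
    ('Mood', 'mood'),
    ('Compatibility', 'compatibility'),
    ('Lucky number', 'lucky_number'),
    ('Lucky time', 'lucky_time'),
    ('Lucky color', 'color'),
]

def format_horoscope(horoscope):
    """Build a printable report from any object exposing the FIELDS attributes."""
    lines = [horoscope.sign.upper() + ':']
    for label, attr in FIELDS:
        lines.append(f' {label}: {getattr(horoscope, attr)}')
    return '\n'.join(lines)

# Stand-in with placeholder values; in the article's setting this object
# would be pyaztro.Aztro(sign='aries')
demo = SimpleNamespace(
    sign='aries', current_date='2021-02-06', date_range='(placeholder)',
    description='(placeholder)', mood='Helpful', compatibility='Leo',
    lucky_number=18, lucky_time='8am', color='Gold',
)
print(format_horoscope(demo))
```

Calling format_horoscope(aries) and format_horoscope(leo) would then replace the long print call, and adding another sign takes one more line.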
These forecasts are valid for 06.02.2021, so if you want to check our pets’ horoscope (or maybe your own) for the current day, you have to re-run the code above. All the properties, apart from, evidently, sign and date_range, change every day for each zodiac sign at midnight GMT.
Certainly, there are many other fun Python libraries like PyAztro, including:
- Art – for converting text to ASCII art, like this: ʕ •`ᴥ•´ʔ
- Turtle – for drawing,
- Chess – for playing chess,
- Santa – for randomly pairing Secret Santa gifters and recipients,
and even
- Pynder – for using Tinder.
We can be sure that by using these rare Python libraries we’ll never get bored!
Conclusion
To sum up, I hope that all the pets from the dataset find loving and caring owners, and that Python users discover more amazing libraries and apply them to their projects.