August 29, 2021

8 Rarely Used Python Libraries & How to Use Them

The most popular Python libraries out there are usually TensorFlow, NumPy, PyTorch, Pandas, Scikit-Learn, Keras and a few others. Although you may come across these names frequently, there are thousands of Python libraries out there that you can work with. In this article, we are going to focus on how to use Python libraries that are rarely used or heard of but are incredibly useful for solving specific tasks or for a fun project.

The Python libraries we are going to practice on are:

Missingno
Tabulate
Wikipedia
Wget
Faker
Numerizer
Emoji
PyAztro

To begin, we’ll download a dataset from Kaggle: Animal Care and Control Adopted Animals.

import pandas as pd
df = pd.read_csv('animal-data-1.csv')
print('Number of pets:', len(df))
print(df.head(3))

Number of pets: 10290

         id           intakedate intakereason  istransfer sheltercode  \
   0  15801  2009-11-28 00:00:00       Moving           0   C09115463
   1  15932  2009-12-08 00:00:00       Moving           0   D09125594
   2  28859  2012-08-10 00:00:00    Abandoned           0   D12082309

     identichipnumber animalname                breedname basecolour speciesname  \
   0       0A115D7358     Jadzia      Domestic Short Hair     Tortie         Cat
   1       0A11675477      Gonzo  German Shepherd Dog/Mix        Tan         Dog
   2       0A13253C7B     Maggie  Shep Mix/Siberian Husky    Various         Dog

      ...         movementdate movementtype istrial returndate returnedreason  \
   0  ...  2017-05-13 00:00:00     Adoption     0.0        NaN          Stray
   1  ...  2017-04-24 00:00:00     Adoption     0.0        NaN          Stray
   2  ...  2017-04-15 00:00:00     Adoption     0.0        NaN          Stray

      deceaseddate deceasedreason diedoffshelter puttosleep isdoa
   0           NaN   Died in care              0          0     0
   1           NaN   Died in care              0          0     0
   2           NaN   Died in care              0          0     0

   [3 rows x 23 columns]

1. Missingno

Library installation: pip install missingno

What is Missingno in Python? - Missingno is a Python library for visualizing missing values in a dataframe. We could also use a seaborn heatmap or a bar plot from any visualization library for this purpose, but then we would first have to build a series of missing-value counts per column with df.isnull().sum(), whereas missingno does all this under the hood. The library offers four types of charts:

matrix displays density patterns in data completion for up to 50 columns of a dataframe, and it is analogous to the seaborn missing value heatmap. Also, by means of the sparkline at right, it shows the general shape of the data completeness by row, emphasizing the rows with the maximum and minimum nullity.
bar chart shows nullity visualization in bars by column.
heatmap measures nullity correlation, which ranges from -1 to 1. Essentially, it shows how strongly the presence or absence of one variable affects the presence of another. Columns with no missing values, or at the other extreme completely empty ones, are excluded from the visualization, since their nullity correlation is meaningless.
dendrogram, like the heatmap, measures nullity relationships between columns, but here not pairwise: it groups columns to detect clusters of missing data. Variables located closer together on the chart show a stronger nullity correlation. For dataframes with fewer than 50 columns the dendrogram is vertical; otherwise, it flips to horizontal.
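To make the bar and heatmap definitions concrete, here is a rough pandas-only sketch of the numbers behind them. It uses a small hypothetical dataframe instead of the pet data and approximates, rather than reproduces, missingno's internals.

```python
import pandas as pd

# Hypothetical toy dataframe standing in for the pet dataset
df = pd.DataFrame({
    'chip':   ['A1', None, 'C3', 'D4', None],
    'return': [None, None, '2018-05-14', None, None],
    'dead':   [None, '2019-01-01', None, None, '2020-02-02'],
    'name':   ['Jadzia', 'Gonzo', 'Maggie', 'Rex', 'Leo'],
})

# Non-null values per column: roughly what the bar chart plots
non_null_counts = df.notnull().sum()

# Boolean mask of missing cells
nullity = df.isnull()

# Columns that are never or always missing have no meaningful nullity
# correlation, so keep only the partially missing ones
partial = nullity.loc[:, nullity.any() & ~nullity.all()]

# Pairwise correlation of the missingness masks, ranging from -1 to 1
nullity_corr = partial.astype(int).corr()

print(non_null_counts)
print(nullity_corr.round(2))
```

In this toy data, chip and dead have a nullity correlation of -1: a chip number is missing exactly when a death date is recorded.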
Let’s try all these charts with their default settings on our pet dataset:

import missingno as msno

msno.matrix(df)

msno.bar(df)

msno.heatmap(df)

msno.dendrogram(df)

We can make the following observations about the dataset:

In general, there are a few missing values.
The most empty columns are deceaseddate and returndate.
The majority of pets are chipped.
Nullity correlation:
- slightly negative between being chipped and being dead,
- slightly positive – being chipped vs. being returned, being returned vs. being dead.

There are a few options to customize missingno charts: figsize, fontsize, sort (sorts the rows by completeness, in either ascending or descending order), and labels (True or False, i.e., whether to show the column labels). Some parameters are chart-specific: color for matrix and bar charts; sparkline (whether to draw it) and width_ratios (matrix width to sparkline width) for matrix; log (logarithmic scale) for bar charts; cmap (colormap) for heatmap; and orientation for dendrogram. Let’s apply some of them to one of our charts above:

msno.matrix(
           df,
           figsize=(25,7),
           fontsize=30,
           sort='descending',
           color=(0.494, 0.184, 0.556),
           width_ratios=(10, 1)
           )

Finally, if there is still something we would like to tune, we can always add any matplotlib functionality to the missingno graphs. To do this, we set the inline parameter to False. Let’s add a title to our matrix chart:

import matplotlib.pyplot as plt
msno.matrix(
           df,
           figsize=(25,7),
           fontsize=30,
           sort='descending',
           color=(0.494, 0.184, 0.556),
           width_ratios=(10, 1),
           inline=False
           )
plt.title('Missing Values Pet Dataset', fontsize=55)
plt.show()

For further practice, let’s keep only the most interesting columns of our dataframe:

columns = ['identichipnumber', 'animalname', 'breedname', 'speciesname', 'sexname', 'returndate',
          'returnedreason']
df = df[columns].copy()  # an explicit copy avoids SettingWithCopyWarning when we modify columns later

Missingno Documentation

2. Tabulate

Library installation: pip install tabulate

What is Tabulate in Python? - This Python library is used for pretty-printing tabular data. It offers smart and customizable column alignment, number and text formatting, and alignment on the decimal point.

The tabulate() function takes a tabular data type (dataframe, list of lists or dictionaries, dictionary, NumPy array), some other optional parameters, and outputs a nicely formatted table. Let’s practice it on a fragment of our pet dataset, starting with the most basic pretty-printed table:

from tabulate import tabulate
df_pretty_printed = df.iloc[:5, [1,2,4,6]]
print(tabulate(df_pretty_printed))

-  -----------  -----------------------  ------  -----
0  Jadzia       Domestic Short Hair      Female  Stray
1  Gonzo        German Shepherd Dog/Mix  Male    Stray
2  Maggie       Shep Mix/Siberian Husky  Female  Stray
3  Pretty Girl  Domestic Short Hair      Female  Stray
4  Pretty Girl  Domestic Short Hair      Female  Stray
-  -----------  -----------------------  ------  -----

We can add a headers parameter to our table. With headers='firstrow', the first row of data is used as the header; with headers='keys', the keys of a dataframe or dictionary are used. For table formatting, we can use the tablefmt parameter, which accepts one of many format names as a string: simple, github, grid, fancy_grid, pipe, orgtbl, jira, presto, pretty, etc.

By default, tabulate aligns columns containing float numbers by a decimal point, integers – to the right, text columns – to the left. This can be overridden by using numalign and stralign parameters (right, center, left, decimal for numbers, or None). For text columns, it’s possible to disable the default leading and trailing whitespace removal.

Let’s customize our table:

print(tabulate(
              df_pretty_printed,
              headers='keys',
              tablefmt='fancy_grid',
              stralign='center'
              ))

╒════╤══════════════╤═════════════════════════╤═══════════╤══════════════════╕
│    │  animalname  │        breedname        │  sexname  │  returnedreason  │
╞════╪══════════════╪═════════════════════════╪═══════════╪══════════════════╡
│  0 │    Jadzia    │   Domestic Short Hair   │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  1 │    Gonzo     │ German Shepherd Dog/Mix │   Male    │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  2 │    Maggie    │ Shep Mix/Siberian Husky │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  3 │ Pretty Girl  │   Domestic Short Hair   │  Female   │      Stray       │
├────┼──────────────┼─────────────────────────┼───────────┼──────────────────┤
│  4 │ Pretty Girl  │   Domestic Short Hair   │  Female   │      Stray       │
╘════╧══════════════╧═════════════════════════╧═══════════╧══════════════════╛

One thing to keep in mind when using this Python library is that wide pretty-printed tables may not display well on small screens such as smartphones; they are best viewed on laptops and desktop computers.

Tabulate Documentation

3. Wikipedia

Library installation: pip install wikipedia

What is the Wikipedia library in Python? - The Wikipedia library, as its name suggests, makes it easy to access and fetch information from Wikipedia. Some of the tasks that can be accomplished with it include:

searching Wikipedia – search(),
getting article summaries – summary(),
getting full page contents, including images, links, any other metadata of a Wikipedia page – page(),
selecting the language of a page – set_lang().

In the pretty-printed table above, we saw a dog breed called “Siberian Husky”. As an exercise, we’ll set the language to Russian (my native language 🙂) and search for suggestions of the corresponding Wikipedia pages:

import wikipedia
wikipedia.set_lang('ru')
print(wikipedia.search('Siberian Husky'))

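['Сибирский хаски', 'Древние породы собак', 'Маккензи Ривер Хаски', 'Породы собак по классификации кинологических организаций', 'Ричардсон, Кевин Майкл']

(In English: 'Siberian Husky', 'Ancient dog breeds', 'Mackenzie River Husky', 'Dog breeds by kennel organization classification', 'Richardson, Kevin Michael'.)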

Let’s take the first suggestion and fetch the first sentence of that page’s summary:

print(wikipedia.summary('Сибирский хаски', sentences=1))

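Сибирский хаски — заводская специализированная порода собак, выведенная чукчами северо-восточной части Сибири и зарегистрированная американскими кинологами в 1930-х годах как ездовая собака, полученная от аборигенных собак Дальнего Востока России, в основном из Анадыря, Колымы, Камчатки у местных оседлых приморских племён — юкагиров, кереков, азиатских эскимосов и приморских чукчей — анкальын (приморские, поморы — от анкы (море)).

(In English: The Siberian Husky is a specialized breed developed by the Chukchi of northeastern Siberia and registered by American dog breeders in the 1930s as a sled dog. It descends from the aboriginal dogs of the Russian Far East, mainly from Anadyr, Kolyma, and Kamchatka, kept by the local settled coastal tribes: the Yukaghirs, Kereks, Asian Eskimos, and coastal Chukchi, called ankalyn ("coastal people", from anky, "sea").)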

Now, let’s get a link to a picture of a husky from this page:

print(wikipedia.page('Сибирский хаски').images[0])

and visualize this beautiful creature:

Wikipedia Documentation

4. Wget

Library installation: pip install wget

The wget library allows downloading files in Python without having to open them. We can also pass the path where the file should be saved as a second argument.
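For comparison, roughly the same behavior can be sketched with the standard library's urllib.request.urlretrieve. This is a swapped-in technique, not the wget package itself, and the helper name download() is hypothetical; like wget.download() in the examples that follow, it names the local file after the last segment of the URL.

```python
import pathlib
import urllib.request


def download(url: str, out_dir: str = '.') -> pathlib.Path:
    """Hypothetical helper: save the resource at `url` into `out_dir`.

    The local file name is the last segment of the URL path.
    """
    name = url.rstrip('/').rsplit('/', 1)[-1] or 'download'
    target = pathlib.Path(out_dir) / name
    # urlretrieve writes the response body straight into the target file
    urllib.request.urlretrieve(url, str(target))
    return target
```

For example, download('https://upload.wikimedia.org/wikipedia/commons/a/a3/Black-Magic-Big-Boy.jpg') would save Black-Magic-Big-Boy.jpg in the current folder.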

Let’s download the picture of Husky above:

import wget
wget.download('https://upload.wikimedia.org/wikipedia/commons/a/a3/Black-Magic-Big-Boy.jpg')

'Black-Magic-Big-Boy.jpg'

Now we can find the picture in the same folder as this notebook, since we didn’t specify where to save it.

Since any web page on the Internet is actually served as an HTML file, another useful application of this library is downloading the HTML source of a web page. Let’s download the Kaggle page where our dataset is located:

wget.download('https://www.kaggle.com/jinbonnie/animal-data')

'animal-data'

The resulting animal-data file looks like the following (we’ll display only the first few lines):

<!DOCTYPE html>
<html lang="en">
<head>
   <title>Animal Care and Control Adopted Animals | Kaggle</title>
   <meta charset="utf-8" />
   <meta name="robots" content="index, follow" />
   <meta name="description" content="animal situation in Bloomington Animal Shelter from 2017-2020" />
   <meta name="turbolinks-cache-control" content="no-cache" />

Wget Documentation

5. Faker

Library installation: pip install Faker

What is Faker library in Python? - This module is used to generate fake data, including names, addresses, emails, phone numbers, jobs, texts, sentences, colors, currencies, etc. The faker generator can take a locale as an argument (the default is en_US locale), to return localized data. For generating a piece of text or a sentence, we can use the default lorem ipsum; alternatively, we can provide our own set of words. To ensure that all the created values are unique for some specific instance (for example, when we want to create a long list of fake names), the .unique property is applied. If instead, it’s necessary to produce the same value or data set, the seed() method is used.

Let’s look at some examples.

from faker import Faker
fake = Faker()

print(
     'Fake color:', fake.color(), '\n'
     'Fake job:',   fake.job(),   '\n'
     'Fake email:', fake.email(), '\n'
     )

# Printing a list of fake Korean and Portuguese addresses
fake = Faker(['ko_KR', 'pt_BR'])
for _ in range(5):
    print(fake.unique.address())     # using the .unique property

print('\n')

# Assigning a seed number to print always the same value / data set
fake = Faker()
Faker.seed(3920)
print('This English fake name is always the same:', fake.name())

Fake color: #212591
Fake job: Occupational therapist
Fake email: [email protected]

Estrada Lavínia da Luz, 62
este
5775858 Moura / SE
Residencial de Moreira, 57
Morro Dos Macacos
5273529 Farias / TO
세종특별자치시 강남구 가락거리 (예원박김마을)
전라북도 광주시 백제고분길 (승민우리)
경상남도 당진시 가락53가

This English fake name is always the same: Kim Lopez

Returning to our dataset, we find that there are at least two unlucky pets with rather unflattering names:

df_bad_names = df[df['animalname'].str.contains('Stink|Pooh', na=False)]
print(df_bad_names)

identichipnumber animalname            breedname speciesname sexname  \
692              NaN    Stinker  Domestic Short Hair         Cat    Male
336  981020023417175       Pooh  German Shepherd Dog         Dog  Female
337  981020023417175       Pooh  German Shepherd Dog         Dog  Female

              returndate                     returnedreason
692                  NaN                              Stray
336  2018-05-14 00:00:00  Incompatible with owner lifestyle
337                  NaN                              Stray

The dog in the last two rows is actually the same animal, returned to the shelter for being incompatible with the owner’s lifestyle. With our new skills, let’s save the reputation of both animals and rename them to something more decent. Since the dog is a German Shepherd, we’ll pick a German name for her. As for the cat, according to this Wikipedia page, Domestic Short Hair is the most common breed in the US, so we’ll pick an English name for him.

# Defining a function to rename the unlucky pets
def rename_pets(name):
    if name == 'Stinker':
        fake = Faker()
        Faker.seed(162)
        name = fake.name()
    if name == 'Pooh':
        fake = Faker(['de_DE'])
        Faker.seed(20387)
        name = fake.name()
    return name

# Renaming the pets
df['animalname'] = df['animalname'].apply(rename_pets)

# Checking the results (the index holds labels, so we look them up with .loc)
print(df.loc[df_bad_names.index])

identichipnumber            animalname            breedname speciesname  \
692              NaN         Steven Harris  Domestic Short Hair         Cat
336  981020023417175  Helena Fliegner-Karz  German Shepherd Dog         Dog
337  981020023417175  Helena Fliegner-Karz  German Shepherd Dog         Dog

    sexname           returndate                     returnedreason
692    Male                  NaN                              Stray
336  Female  2018-05-14 00:00:00  Incompatible with owner lifestyle
337  Female                  NaN                              Stray

Steven Harris and Helena Fliegner-Karz sound a little bit too bombastic for a cat and a dog, but definitely much better than their previous names!

Faker Documentation

6. Numerizer

Library installation: pip install numerizer

What is the Numerizer library in Python? - This small Python package is used for converting natural language numerics into numbers (integers and floats) and consists of only one function – numerize().

Let’s try it on our dataset right away. Some pets’ names contain number words:

df_numerized_names = df[['identichipnumber', 'animalname', 'speciesname']]\
                       [df['animalname'].str.contains('Two|Seven|Fifty', na=False)]
df_numerized_names

	identichipnumber	animalname	speciesname
2127	NaN	Seven	Dog
4040	981020025503945	Fifty Lee	Cat
6519	981020021481875	Two Toes	Cat
6520	981020021481875	Two Toes	Cat
7757	981020029737857	Mew Two	Cat
7758	981020029737857	Mew Two	Cat
7759	981020029737857	Mew Two	Cat

We’re going to convert the numeric part of these names into real numbers:

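from numerizer import numerize
# numerize() expects a string, so missing names (NaN) are passed through unchanged
df['animalname'] = df['animalname'].apply(lambda x: numerize(x) if isinstance(x, str) else x)
df[['identichipnumber', 'animalname', 'speciesname']].loc[df_numerized_names.index]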

	identichipnumber	animalname	speciesname
2127	NaN	7	Dog
4040	981020025503945	50 Lee	Cat
6519	981020021481875	2 Toes	Cat
6520	981020021481875	2 Toes	Cat
7757	981020029737857	Mew 2	Cat
7758	981020029737857	Mew 2	Cat
7759	981020029737857	Mew 2	Cat

Numerizer Documentation

7. Emoji

Library installation: pip install emoji

What is the Emoji library in Python? - By using this library, we can convert strings to emoji according to the emoji codes defined by the Unicode Consortium, complemented with aliases if use_aliases=True is specified. The emoji package has two main functions: emojize() and demojize(). The default English language (language='en') can be changed to Spanish (es), Portuguese (pt), or Italian (it). In emoji 2.0 and later, use_aliases was removed in favor of language='alias'.

import emoji
print(emoji.emojize(':koala:'))
print(emoji.demojize('🐨'))
print(emoji.emojize(':rana:', language='it'))

🐨
:koala:
🐸

Let’s emojize our animals. First, we’ll check their unique species names:

print(df['speciesname'].unique())

['Cat' 'Dog' 'House Rabbit' 'Rat' 'Bird' 'Opossum' 'Chicken' 'Wildlife'
 'Ferret' 'Tortoise' 'Pig' 'Hamster' 'Guinea Pig' 'Gerbil' 'Lizard'
 'Hedgehog' 'Chinchilla' 'Goat' 'Snake' 'Squirrel' 'Sugar Glider' 'Turtle'
 'Tarantula' 'Mouse' 'Raccoon' 'Livestock' 'Fish']

We have to convert these names into lower case, add leading and trailing colons to each, and then apply emojize() to the result:

# use_aliases=True applies to emoji < 2.0; newer versions take language='alias' instead
df['speciesname'] = df['speciesname'].apply(lambda x: emoji.emojize(f':{x.lower()}:',
                                                                   use_aliases=True))
print(df['speciesname'].unique())

['' '' ':house rabbit:' '' '' ':opossum:' '' ':wildlife:' ':ferret:'
:tortoise:' '' '' ':guinea pig:' ':gerbil:' '' '' ':chinchilla:' ''
' ':squirrel:' ':sugar glider:' '' ':tarantula:' '' '' ':livestock:'
']

Let’s replace house rabbit, tortoise, and squirrel with synonyms that the emoji library recognizes, and try emojizing them again:

df['speciesname'] = df['speciesname'].str.replace(':house rabbit:', ':rabbit:')\
                                        .replace(':tortoise:', ':turtle:')\
                                        .replace(':squirrel:', ':chipmunk:')
df['speciesname'] = df['speciesname'].apply(lambda x: emoji.emojize(x, variant='emoji_type'))
print(df['speciesname'].unique())

['' '' '️' '' '' ':opossum:️' '' ':wildlife:️' ':ferret:️' '️' ''
' ':guinea pig:' ':gerbil:️' '' '' ':chinchilla:️' '' '' ''
:sugar glider:' '' ':tarantula:️' '' '' ':livestock:️' '']

The remaining species either have collective names (wildlife and livestock) or don’t have an emoji equivalent, at least not yet. We’ll leave them as they are, only removing the colons and converting them back to title case:

df['speciesname'] = df['speciesname'].str.replace(':', '').apply(lambda x: x.title())
print(df['speciesname'].unique())
df[['animalname', 'speciesname', 'breedname']].head(3)

['' '' '️' '' '' 'Opossum️' '' 'Wildlife️' 'Ferret️' '️' '' ''
Guinea Pig' 'Gerbil️' '' '' 'Chinchilla️' '' '' '' 'Sugar Glider'
' 'Tarantula️' '' '' 'Livestock️' '']

	animalname	breedname
0	Jadzia	Domestic Short Hair
1	Gonzo	German Shepherd Dog/Mix
2	Maggie	Shep Mix/Siberian Husky

Emoji Documentation

8. PyAztro

Library installation: pip install pyaztro

What is the PyAztro library in Python? - PyAztro seems to be designed more for fun than for work. This library provides a horoscope for each zodiac sign. The prediction includes the description of the sign for that day, the date range of that sign, mood, lucky number, lucky time, lucky color, and compatibility with other signs. For example:

import pyaztro
pyaztro.Aztro(sign='taurus').description

'You need to make a radical change in some aspect of your life - probably related to your home. It could be time to buy or sell or just to move on to some more promising location.'

Great! I’m already running to buy a new house

In our dataset, there are a cat and a dog called Aries:

df[['animalname', 'speciesname']][(df['animalname'] == 'Aries')]

	animalname	speciesname
3036	Aries
9255	Aries

and plenty of pets called Leo:

print('Leo:', df['animalname'][(df['animalname'] == 'Leo')].count())

Leo: 18

Let’s assume that those are their corresponding zodiac signs 😉 With PyAztro, we can check what the stars have prepared for these animals for today:

aries = pyaztro.Aztro(sign='aries')
leo = pyaztro.Aztro(sign='leo')

print('ARIES: \n',
     'Sign:',             aries.sign,          '\n',
     'Current date:',     aries.current_date,  '\n',
     'Date range:',       aries.date_range,    '\n',
     'Sign description:', aries.description,   '\n',
     'Mood:',             aries.mood,          '\n',
     'Compatibility:',    aries.compatibility, '\n',
     'Lucky number:',     aries.lucky_number,  '\n',
     'Lucky time:',       aries.lucky_time,    '\n',
     'Lucky color:',      aries.color,       2*'\n',

     'LEO: \n',
     'Sign:',             leo.sign,            '\n',
     'Current date:',     leo.current_date,    '\n',
     'Date range:',       leo.date_range,      '\n',
     'Sign description:', leo.description,     '\n',
     'Mood:',             leo.mood,            '\n',
     'Compatibility:',    leo.compatibility,   '\n',
     'Lucky number:',     leo.lucky_number,    '\n',
     'Lucky time:',       leo.lucky_time,      '\n',
     'Lucky color:',      leo.color)

ARIES:
Sign: aries
Current date: 2021-02-06
Date range: [datetime.datetime(2021, 3, 21, 0, 0), datetime.datetime(2021, 4, 20, 0, 0)]
Sign description: It's a little harder to convince people your way is best today -- in part because it's much tougher to play on their emotions. Go for the intellectual arguments and you should do just fine.
Mood: Helpful
Compatibility: Leo
Lucky number: 18
Lucky time: 8am
Lucky color: Gold

LEO:
Sign: leo
Current date: 2021-02-06
Date range: [datetime.datetime(2021, 7, 23, 0, 0), datetime.datetime(2021, 8, 22, 0, 0)]
Sign description: Big problems need big solutions -- but none of the obvious ones seem to be working today! You need to stretch your mind as far as it will go in order to really make sense of today's issues.
Mood: Irritated
Compatibility: Libra
Lucky number: 44
Lucky time: 12am
Lucky color: Navy Blue

These forecasts were valid for 06.02.2021, so to check our pets’ horoscope (or your own) for the current day, you have to re-run the code above. All the properties, apart from, evidently, sign and date_range, change every day for each zodiac sign at midnight GMT.

PyAztro Documentation

Certainly, there are many other fun Python libraries like PyAztro, including:

Art – for converting text to ASCII art, like this: ʕ •`ᴥ•´ʔ
Turtle – for drawing,
Chess – for playing chess,
Santa – for randomly pairing Secret Santa gifters and recipients,

and even

Pynder – for using Tinder.

We can be sure that by using these rare Python libraries we’ll never get bored!

Conclusion

To sum up, I hope all the pets in the dataset find loving and caring owners, and that Python users discover more amazing libraries and apply them to their projects.

What Should You Do Next?

Learn how to use Python to identify and deal with missing data in this lesson
Learn how to approach a Kaggle competition in this intro lesson
Learn how to use Python to analyze Wikipedia pages in this guided project
