September 5, 2017

How to Generate FiveThirtyEight Graphs in Python

If you read data science articles, you may have already stumbled upon FiveThirtyEight's content. Naturally, you were impressed by their awesome visualizations. You wanted to make your own awesome visualizations, and so you asked Quora and Reddit how to do it. You received some answers, but they were rather vague. You still can't get the graphs done yourself.

In this post, we'll help you. Using Python's matplotlib and pandas, we'll see that it's rather easy to replicate the core parts of any FiveThirtyEight (FTE) visualization. We'll start here:

default_graph

And, at the end of the tutorial, arrive here:

final3

To follow along, you'll need at least some basic knowledge of Python. If you know the difference between methods and attributes, then you're good to go.

Introducing the dataset

We'll work with data describing the percentages of Bachelors conferred to women in the US from 1970 to 2011. We'll use a dataset compiled by data scientist Randal Olson, who collected the data from the National Center for Education Statistics. If you want to follow along by writing code yourself, you can download the data from Randal's blog. To save yourself some time, you can skip downloading the file, and just pass the direct link to pandas' read_csv() function. In the following code cell, we:

  • Import the pandas module.
  • Assign the direct link to the dataset as a string to a variable named direct_link.
  • Read in the data by using read_csv(), and assign the content to women_majors.
  • Print information about the dataset by using the info() method, and show the first five rows by using the head() method.
    
    import pandas as pd
    direct_link = 'https://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv'
    women_majors = pd.read_csv(direct_link)
    print(women_majors.info())
    women_majors.head()
    
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 42 entries, 0 to 41
    Data columns (total 18 columns):
    Year                             42 non-null int64
    Agriculture                      42 non-null float64
    Architecture                     42 non-null float64
    Art and Performance              42 non-null float64
    Biology                          42 non-null float64
    Business                         42 non-null float64
    Communications and Journalism    42 non-null float64
    Computer Science                 42 non-null float64
    Education                        42 non-null float64
    Engineering                      42 non-null float64
    English                          42 non-null float64
    Foreign Languages                42 non-null float64
    Health Professions               42 non-null float64
    Math and Statistics              42 non-null float64
    Physical Sciences                42 non-null float64
    Psychology                       42 non-null float64
    Public Administration            42 non-null float64
    Social Sciences and History      42 non-null float64
    dtypes: float64(17), int64(1)
    memory usage: 6.0 KB
    None
    
    Year Agriculture Architecture Art and Performance Biology Business Communications and Journalism Computer Science Education Engineering English Foreign Languages Health Professions Math and Statistics Physical Sciences Psychology Public Administration Social Sciences and History
    0 1970 4.229798 11.921005 59.7 29.088363 9.064439 35.3 13.6 74.535328 0.8 65.570923 73.8 77.1 38.0 13.8 44.4 68.4 36.8
    1 1971 5.452797 12.003106 59.9 29.394403 9.503187 35.5 13.6 74.149204 1.0 64.556485 73.9 75.5 39.0 14.9 46.2 65.5 36.2
    2 1972 7.420710 13.214594 60.4 29.810221 10.558962 36.6 14.9 73.554520 1.2 63.664263 74.6 76.9 40.2 14.8 47.6 62.6 36.1
    3 1973 9.653602 14.791613 60.2 31.147915 12.804602 38.4 16.4 73.501814 1.6 62.941502 74.9 77.4 40.9 16.5 50.4 64.3 36.4
    4 1974 14.074623 17.444688 61.9 32.996183 16.204850 40.5 18.9 73.336811 2.2 62.413412 75.3 77.9 41.8 18.2 52.6 66.1 37.3

    Besides the Year column, every other column name indicates the subject of a Bachelor degree. Every datapoint in the Bachelor columns represents the percentage of Bachelor degrees conferred to women. Thus, every row describes the percentage for various Bachelors conferred to women in a given year. As mentioned before, we have data from 1970 to 2011. To confirm the latter limit, let's print the last five rows of the dataset by using the tail() method:

    women_majors.tail()
    Year Agriculture Architecture Art and Performance Biology Business Communications and Journalism Computer Science Education Engineering English Foreign Languages Health Professions Math and Statistics Physical Sciences Psychology Public Administration Social Sciences and History
    37 2007 47.605026 43.100459 61.4 59.411993 49.000459 62.5 17.6 78.721413 16.8 67.874923 70.2 85.4 44.1 40.7 77.1 82.1 49.3
    38 2008 47.570834 42.711730 60.7 59.305765 48.888027 62.4 17.8 79.196327 16.5 67.594028 70.2 85.2 43.3 40.7 77.2 81.7 49.4
    39 2009 48.667224 43.348921 61.0 58.489583 48.840474 62.8 18.1 79.532909 16.8 67.969792 69.3 85.1 43.3 40.7 77.1 82.0 49.4
    40 2010 48.730042 42.066721 61.3 59.010255 48.757988 62.5 17.6 79.618625 17.2 67.928106 69.0 85.0 43.1 40.2 77.0 81.7 49.3
    41 2011 50.037182 42.773438 61.2 58.742397 48.180418 62.2 18.2 79.432812 17.5 68.426730 69.5 84.8 43.1 40.1 76.7 81.9 49.2
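As a quick cross-check of the year range, the limits can also be read directly from the Year column. The sketch below is only an illustration: it reads a tiny in-memory CSV holding three columns of the first and last rows printed above, so the name sample and its contents are stand-ins for the full dataset.

```python
import io

import pandas as pd

# A tiny stand-in for the real file: three columns of the first and last
# rows printed above. The full dataset has 18 columns and 42 rows.
csv_text = """Year,Agriculture,Engineering
1970,4.229798,0.8
2011,50.037182,17.5
"""
sample = pd.read_csv(io.StringIO(csv_text))

# min() and max() give the year limits without reading head() and tail()
first_year = sample['Year'].min()
last_year = sample['Year'].max()
print(first_year, last_year)
```

On the full dataset, the same two calls on women_majors['Year'] should report the limits that head() and tail() show.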

    The context of our FiveThirtyEight graph

    Almost every FTE graph is part of an article. The graphs complement the text by illustrating a little story, or an interesting idea. We'll need to be mindful of this while replicating our FTE graph.

    To avoid digressing from our main task in this tutorial, let's just pretend we've already written most of an article about the evolution of gender disparity in US education. We now need to create a graph to help readers visualize the evolution of gender disparity for Bachelors where the situation was really bad for women in 1970. We've already set a threshold of 20%, and now we want to graph the evolution for every Bachelor where the percentage of women graduates was less than 20% in 1970.

    Let's first identify those specific Bachelors. In the following code cell, we will:

    • Use .loc, a label-based indexer, to:
      • select the first row (the one that corresponds to 1970);
      • select the items in the first row only where the values are less than 20; the Year field will be checked as well, but will obviously not be included because 1970 is much greater than 20.
    • Assign the resulting content to under_20.
    
    under_20 = women_majors.loc[0, women_majors.loc[0] < 20]
    under_20
    
    Agriculture           4.229798
    Architecture         11.921005
    Business              9.064439
    Computer Science     13.600000
    Engineering           0.800000
    Physical Sciences    13.800000
    Name: 0, dtype: float64
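To see how the boolean mask inside .loc works, here is a minimal sketch on a four-column excerpt of the 1970 and 1971 rows shown earlier. The names sample, first_row, mask and under_20_sample are introduced here for illustration only.

```python
import pandas as pd

# Four-column excerpt of the 1970 and 1971 rows shown earlier
sample = pd.DataFrame({
    'Year': [1970, 1971],
    'Agriculture': [4.229798, 5.452797],
    'Education': [74.535328, 74.149204],
    'Engineering': [0.8, 1.0],
})

first_row = sample.loc[0]   # the 1970 row as a Series indexed by column name
mask = first_row < 20       # True only where the 1970 value is below 20
under_20_sample = sample.loc[0, mask]   # keep the columns where mask is True
print(under_20_sample)
```

The Year entry of the mask is False (1970 is not below 20), which is why Year drops out without any special handling.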

    Using matplotlib's default style

    Let's begin working on our graph. We'll first take a peek at what we can build by default. In the following code block, we will:

    • Run the Jupyter magic.
    • Plot the graph by using the plot() method on women_majors. We pass in to plot() the following parameters:
      • x - specifies the column from women_majors to use for the x-axis;
      • y - specifies the columns from women_majors to use for the y-axis; we'll use the index labels of under_20 which are stored in the .index attribute of this object;
      • figsize - sets the size of the figure as a tuple with the format (width, height) in inches.
    • Assign the plot object to a variable named under_20_graph, and print its type to show that pandas uses matplotlib objects under the hood.
    
    under_20_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
    print('Type:', type(under_20_graph))
    
    Type: <class 'matplotlib.axes._subplots.AxesSubplot'>
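The exact class name printed above may differ between matplotlib versions, so a sturdier check is isinstance() against matplotlib's Axes class. The sketch below assumes a three-row excerpt of the Engineering column and a non-interactive backend; neither is part of the original code.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display

import pandas as pd
from matplotlib.axes import Axes

# Three-row excerpt of the Engineering column shown earlier
sample = pd.DataFrame({'Year': [1970, 1971, 1972],
                       'Engineering': [0.8, 1.0, 1.2]})

ax = sample.plot(x='Year', y='Engineering', figsize=(12, 8))

# pandas hands back a matplotlib Axes, so every Axes method is available on it
is_axes = isinstance(ax, Axes)
figure_size = tuple(ax.get_figure().get_size_inches())
print(is_axes, figure_size)
```

This is why the later sections can call methods such as tick_params() or text() directly on the object returned by plot().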

    Using matplotlib's fivethirtyeight style

    The graph above has certain characteristics, like the width and color of the spines, the font size of the y-axis label, the absence of a grid, etc. All of these characteristics make up matplotlib's default style. As a brief aside, it's worth mentioning that we'll use a few technical terms about the parts of a graph throughout this post. If you feel lost at any point, you can refer to the legend below.

    anatomy1

    Source: Matplotlib.org

    Besides the default style, matplotlib comes with several built-in styles that we can use readily. To see a list of the available styles, we will:

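    • Import the matplotlib.style module under the name style.
    • Explore the content of style.available, which lists the names of the available styles.

      import matplotlib.style as style
      style.available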
      
      ['seaborn-deep', 
      'seaborn-muted', 
      'bmh', 
      'seaborn-white', 
      'dark_background', 
      'seaborn-notebook', 
      'seaborn-darkgrid', 
      'grayscale', 
      'seaborn-paper', 
      'seaborn-talk', 
      'seaborn-bright',
      'classic', 
      'seaborn-colorblind', 
      'seaborn-ticks', 
      'ggplot', 
      'seaborn', 
      '_classic_test', 
      'fivethirtyeight', 
      'seaborn-dark-palette', 
      'seaborn-dark', 
      'seaborn-whitegrid', 
      'seaborn-pastel', 
      'seaborn-poster']
      

      You might have already observed that there's a built-in style called fivethirtyeight. Let's use this style, and see where that leads. For that, we'll use the aptly named use() function, and then generate our graph using the same code as earlier:

      style.use('fivethirtyeight')
      women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))

      538_graphs_AO_11_1

      Wow, that's a major change! Compared with our first graph, this one has a different background color, it has grid lines, there are no spines whatsoever, the weight and the font size of the major tick labels are different, etc. You can read a technical description of the fivethirtyeight style here - it should also give you a good idea about what code runs under the hood when we use this style. The author of the style sheet, Cameron Davidson-Pilon, discusses some of the characteristics here.
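To get a feel for what the style changes without reading the style sheet, you can inspect a few of matplotlib's rcParams while the style is active. The sketch below uses style.context(), which applies a style only inside a with block. The toy data and the variable names are assumptions made for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display

import matplotlib.pyplot as plt
import matplotlib.style as style
from matplotlib.colors import to_hex

grid_before = plt.rcParams['axes.grid']

# Inside the with block, the fivethirtyeight settings are in effect
with style.context('fivethirtyeight'):
    grid_inside = plt.rcParams['axes.grid']
    facecolor_inside = to_hex(plt.rcParams['axes.facecolor'])
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.plot([1970, 1971, 1972], [0.8, 1.0, 1.2])

# Outside the block, the previous settings are restored
grid_after = plt.rcParams['axes.grid']
print(grid_inside, facecolor_inside, grid_after == grid_before)
```

The grid switch and the light grey face color are two of the settings behind the differences listed above.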

      The limitations of matplotlib's fivethirtyeight style

      All in all, using the fivethirtyeight style clearly brings us much closer to our goal. Nonetheless, there's still a lot left to do. Let's examine a simple FTE graph, and see what else we need to add to our graph.

      fandango

      Source: FiveThirtyEight

      By comparing the above graph with what we've made so far, we can see that we still need to:

      • Add a title and a subtitle.
      • Remove the block-style legend, and add labels near the relevant plot lines. We'll also have to make the grid lines transparent around these labels.
      • Add a signature bottom bar which mentions the author of the graph and the source of the data.
      • Add a couple of other small adjustments:
        • increase the font size of the tick labels;
        • add a "%" symbol to the highest visible tick label of the y-axis;
        • remove the x-axis label;
        • bold the horizontal grid line at y = 0;
        • add an extra grid line next to the tick labels of the y-axis;
        • increase the lateral margins of the figure.

      adjustments

      Source: FiveThirtyEight

      To minimize the time spent generating the graph, it's important not to start by adding the title, the subtitle, or any other text snippet. In matplotlib, a text snippet is positioned by specifying its x and y coordinates, as we'll see in some of the sections below. To replicate the FTE graph above in detail, notice that we'll have to vertically align the tick labels of the y-axis with the title and the subtitle. We want to avoid a situation where we have the vertical alignment we want, lose it by increasing the font size of the tick labels, and then have to change the position of the title and subtitle again.

      align_vert

      Source: FiveThirtyEight

      For teaching purposes, we're now going to proceed incrementally with adjusting our FTE graph. Consequently, our code will span multiple code cells. In practice, however, no more than one code cell will be required.

      Customizing the tick labels

      We'll start by increasing the font size of the tick labels. In the following code cell, we:

      • Plot the graph using the same code as earlier, and assign the resulting object to fte_graph. Assigning to a variable allows us to repeatedly and easily apply methods on the object, or access its attributes.
      • Increase the font size of all the major tick labels using the tick_params() method with the following parameters:
        • axis - specifies the axis that the tick labels we want to modify belong to; here we want to modify the tick labels of both axes;
        • which - indicates which tick labels are affected (the major or the minor ones; see the legend shown earlier if you don't know the difference);
        • labelsize - sets the font size of the tick labels.
      
      fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
      fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)

      538_graphs_AO_13_0

      You may have noticed that we didn't use style.use('fivethirtyeight') this time. That's because the preference for any matplotlib style becomes global once it's first declared in our code. We set the style to fivethirtyeight earlier, and from then on all subsequent graphs inherit this style. If for some reason you want to return to the default state, just run style.use('default').

      We'll now build upon our previous changes by making a few adjustments to the tick labels of the y-axis:

      • We add a "%" symbol next to 50, the highest visible tick label of the y-axis.
      • We also add a few whitespace characters after the other visible labels to align them elegantly with the new "50%" label.

      To make these changes to the tick labels of the y-axis, we'll use the set_yticklabels() method along with the labels parameter. As you can deduce from the code below, this parameter can take in a list of mixed data types, and doesn't require any fixed number of labels to be passed in.

      
      # The previous code
      fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
      fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
      
      # Customizing the tick labels of the y-axis
      fte_graph.set_yticklabels(labels = [-10, '0   ', '10   ', '20   ', '30   ', '40   ', '50%'])
      print('The tick labels of the y-axis:', fte_graph.get_yticks()) # -10 and 60 are not visible on the graph
      The tick labels of the y-axis: [-10.   0.  10.  20.  30.  40.  50.  60.]
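Because the labels are matched to the ticks by position, it can help to pin the tick positions first and only then relabel them. Some matplotlib versions may also warn when set_yticklabels() is called without fixed ticks. The following is a sketch of that variant, not the approach used in this post. It plots three Engineering values from the tables above on a plain Axes, and the names ticks and labels are illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot([1970, 1971, 2011], [0.8, 1.0, 17.5])  # Engineering values shown earlier

# Fix the tick positions first, so there is exactly one label per tick
ticks = [0, 10, 20, 30, 40, 50]
ax.set_yticks(ticks)

# Pad every label except the top one, which gets the "%" symbol
labels = ['50%' if tick == 50 else f'{tick}   ' for tick in ticks]
ax.set_yticklabels(labels)
ax.tick_params(axis='both', which='major', labelsize=18)
print(labels)
```

With the positions fixed, there is no invisible -10 tick to account for in the list of labels.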

      538_graphs_AO_15_1

      Bolding the horizontal line at y = 0

      We will now bold the horizontal line where the y-coordinate is 0. For that, we'll use the axhline() method to add a new horizontal grid line, and cover the existing one. The parameters we use for axhline() are:

      • y - specifies the y-coordinate of the horizontal line;
      • color - indicates the color of the line;
      • linewidth - sets the width of the line;
      • alpha - regulates the transparency of the line; we use it here to soften the intensity of the black color. The values for alpha range from 0 (completely transparent) to 1 (completely opaque).

      # The previous code
      fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
      fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
      fte_graph.set_yticklabels(labels = [-10, '0   ', '10   ', '20   ', '30   ', '40   ', '50%'])
      
      # Generate a bolded horizontal line at y = 0 
      fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
      

      538_graphs_AO_17_1

      Add an extra vertical line

      As we mentioned earlier, we have to add another vertical grid line in the immediate vicinity of the tick labels of the y-axis. For that, we simply tweak the range of the values of the x-axis. Extending the range to the left, by setting its left limit to 1969, will result in the extra vertical grid line we want. Below, we use the set_xlim() method with the self-explanatory parameters left and right.

      
      # The previous code
      fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
      fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
      fte_graph.set_yticklabels(labels = [-10, '0   ', '10   ', '20   ', '30   ', '40   ', '50%'])
      fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
      
      # Add an extra vertical line by tweaking the range of the x-axis
      fte_graph.set_xlim(left = 1969, right = 2011)
      

      538_graphs_AO_19_1

      Generating a signature bar

      The signature bar of the example FTE graph presented above has a few obvious characteristics:

      • It's positioned at the bottom of the graph.
      • The author's name is located on the left side of the signature bar.
      • The source of the data is mentioned on the right side of the signature bar.
      • The text has a light grey color (the same as the background color of the graph), and a dark grey background.
      • The area in-between the author's name and the source name has a dark grey background as well.

      fandango-1

      The image is posted again so you don't have to scroll back. Source: FiveThirtyEight

      It may seem difficult to add such a signature bar, but with a little ingenuity we can get it done quite easily. We'll add a single snippet of text, give it a light grey color, and a background color of dark grey. We'll write both the author's name and the source in a single text snippet, but we'll space out these two such that one ends up on the far left side, and the other on the far right. The nice thing is that the whitespace characters will get a dark grey background as well, which will create the effect of a signature bar. We'll also use some whitespace characters to align the author's name and the name of the source, as you'll be able to see in the next code block.

      This is also a good moment to remove the label of the x-axis. This way, we can get a better visual sense of how the signature bar fits in the overall scheme of the graph. In the next code cell, we'll build upon what we've done so far, and we will:

      • Remove the label of the x-axis by passing in a False value to the set_visible() method we apply to the object fte_graph.xaxis.label. Think of it this way: we access the xaxis attribute of fte_graph, and then we access the label attribute of fte_graph.xaxis. Then we finally apply set_visible() to fte_graph.xaxis.label.
      • Add a snippet of text on the graph in the way discussed above. We'll use the text() method with the following parameters:
        • x - specifies the x-coordinate of the text;
        • y - specifies the y-coordinate of the text;
        • s - specifies the text to be added;
        • fontsize - sets the size of the text;
        • color - specifies the color of the text; the format of the value we use below is hexadecimal; we use this format to match exactly the background color of the entire graph (as specified in the code of the fivethirtyeight style);
        • backgroundcolor - sets the background color of the text snippet.
    
    # The previous code
    fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
    fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
    fte_graph.set_yticklabels(labels = [-10, '0   ', '10   ', '20   ', '30   ', '40   ', '50%'])
    fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
    fte_graph.set_xlim(left = 1969, right = 2011)
    
    # Remove the label of the x-axis
    fte_graph.xaxis.label.set_visible(False)
    
    # The signature bar
    fte_graph.text(x = 1965.8, y = -7,
        s = '   ©DATAQUEST                                                                                 Source: National Center for Education Statistics   ',
        fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey')
    

    538_graphs_AO_21_1

    The x and y coordinates of the text snippet added were found through a process of trial and error. You can pass in floats to the x and y parameters, so you'll be able to control the position of the text with a high level of precision.

    It's also worth mentioning that we tweaked the positioning of the signature bar in such a way that we added some visually refreshing lateral margins (we discussed this adjustment earlier). To increase the left margin, we simply lowered the value of the x-coordinate. To increase the right one, we added more whitespace characters between the author's name and the source's name - this pushes the source's name to the right, which results in adding the desired margin.
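The coordinates above are expressed in data units, so they have to be found again whenever the axis ranges change. One possible alternative, which is not the approach used in this post, is to pass transform=ax.transAxes to text(), so that x and y are fractions of the plotting area. The sketch below uses a plain Axes named ax with three Engineering values from the tables above. The offset -0.12 and the 60 spaces are guesses that would still need tuning by eye.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot([1970, 1971, 2011], [0.8, 1.0, 17.5])  # Engineering values shown earlier

# In axes coordinates, x = 0 is the left edge of the plotting area,
# and a negative y places the text below the bottom edge
bar = ax.text(x=0, y=-0.12,
              s='   ©DATAQUEST' + ' ' * 60
                + 'Source: National Center for Education Statistics   ',
              fontsize=14, color='#f0f0f0', backgroundcolor='grey',
              transform=ax.transAxes)

# Changing the data range afterwards does not move the bar
ax.set_xlim(left=1969, right=2011)
```

The whitespace trick for the bar's width works the same way in both coordinate systems.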

    A different kind of signature bar

    You'll also meet a slightly different kind of signature bar:

    olympics1

    Source: FiveThirtyEight

    This kind of signature bar can be replicated quite easily as well. We'll just add some grey colored text, and a line right above it. We'll create the visual effect of a line by adding a snippet of text of multiple underscore characters ("_"). You might wonder why we're not using axhline() to simply draw a horizontal line at the y-coordinate we want. We don't do that because the new line will drag down the entire grid of the graph, and this won't create the desired effect. We could also try adding an arrow, and then remove the pointer so we end up with a line. However, the "underscore" solution is much simpler.

    In the next code block, we implement what we've just discussed. The methods and parameters we use here should already be familiar from earlier sections.

    
    
    # The previous code
    fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
    fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
    fte_graph.set_yticklabels(labels = [-10, '0   ', '10   ', '20   ', '30   ', '40   ', '50%'])
    fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
    fte_graph.xaxis.label.set_visible(False)
    fte_graph.set_xlim(left = 1969, right = 2011)
    
    # The other signature bar
    fte_graph.text(x = 1967.1, y = -6.5,
        s = '________________________________________________________________________________________________________________',
        color = 'grey', alpha = .7)
    
    fte_graph.text(x = 1966.1, y = -9,
        s = '   ©DATAQUEST                                                                               Source: National Center for Education Statistics   ',
        fontsize = 14, color = 'grey', alpha = .7)
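If you would rather draw an actual line than a run of underscores, one option is a Line2D placed in axes coordinates and attached with add_artist(). This is not the approach used in this post. As far as I know, artists added this way are not taken into account when the axis limits are computed, so the grid should stay where it is. The sketch uses a plain Axes named ax, a shortened source string, and offsets that are guesses to be tuned by eye.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot([1970, 1971, 2011], [0.8, 1.0, 17.5])  # Engineering values shown earlier
ylim_before = ax.get_ylim()

# A horizontal rule spanning the full width, slightly below the plotting area;
# clip_on=False lets it be drawn outside the axes rectangle
rule = Line2D([0, 1], [-0.08, -0.08], transform=ax.transAxes,
              color='grey', alpha=.7, linewidth=1, clip_on=False)
ax.add_artist(rule)

# Author on the left, source right-aligned at the opposite edge
ax.text(x=0, y=-0.13, s='©DATAQUEST', fontsize=14, color='grey', alpha=.7,
        transform=ax.transAxes)
ax.text(x=1, y=-0.13, s='Source: National Center for Education Statistics',
        fontsize=14, color='grey', alpha=.7, ha='right',
        transform=ax.transAxes)

ylim_after = ax.get_ylim()
print(ylim_before == ylim_after)
```

The trade-off is a few more lines of code in exchange for not having to count underscore characters.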
    

    538_graphs_AO_23_1

    Adding a title and subtitle

    If you examine a couple of FTE graphs, you may notice these patterns with regard to the title and the subtitle:

    • The title is almost invariably complemented by a subtitle.
    • The title gives the reader an angle from which to look at a particular graph. The title is almost never technical, and it usually expresses a single, simple idea. It's also almost never emotionally neutral. In the Fandango graph above, we can see a simple, "emotionally-active" title ("Fandango LOVES Movies"), and not a bland "The distribution of various movie rating types".
    • The subtitle offers technical information about the graph. This information is what often makes axis labels redundant. We should be careful to customize our subtitle accordingly since we've already dropped the x-axis label.
    • Visually, the title and the subtitle have different font weights, and they are left-aligned (unlike most titles, which are centered). Also, they are aligned vertically with the major tick labels of the y-axis, as we showed earlier.

    Let's now add a title and a subtitle to our graph while being mindful of the above observations. In the code block below, we'll build upon what we've coded so far, and we will:

    • Add a title and a subtitle by using the same text() method we used to add text in the signature bar. If you already have some experience with matplotlib, you might wonder why we don't use title() and suptitle(). This is because they offer poor control when it comes to positioning text with precision. The only new parameter for text() is weight. We use it to bold the title.
    
    # The previous code
    fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8))
    fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
    fte_graph.set_yticklabels(labels = [-10, '0   ', '10   ', '20   ', '30   ', '40   ', '50%'])
    fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
    fte_graph.xaxis.label.set_visible(False)
    fte_graph.set_xlim(left = 1969, right = 2011)
    fte_graph.text(x = 1965.8, y = -7,
        s = '   ©DATAQUEST                                                                                 Source: National Center for Education Statistics   ',
        fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey')
    
    # Adding a title and a subtitle
    fte_graph.text(x = 1966.65, y = 62.7, s = "The gender gap is transitory - even for extreme cases",
                   fontsize = 26, weight = 'bold', alpha = .75)
    fte_graph.text(x = 1966.65, y = 57, 
                   s = 'Percentage of Bachelors conferred to women from 1970 to 2011 in the US for\nextreme cases where the percentage was less than 20% in 1970',
                  fontsize = 19, alpha = .85)
    

    538_graphs_AO_25_1

    In case you were wondering, the font used in the original FTE graphs is Decima Mono, a paywalled font. For this reason, we'll stick with matplotlib's default font, which looks pretty similar anyway.

    Adding colorblind-friendly colors

    Right now, we have that clunky, rectangular legend. We'll get rid of it, and add colored labels near each plot line. Each line will have a certain color, and a word of an identical color will name the Bachelor which that line corresponds to. First, however, we'll modify the default colors of the plot lines, and add colorblind-friendly colors:

    cb_friendly

    Source: Points of View: Color blindness by Bang Wong

    We'll compile a list of RGB parameters for colorblind-friendly colors by using values from the above image. As a side note, we avoid using yellow because text snippets with that color are not easily readable on the graph's light grey background color. After compiling this list of RGB parameters, we'll then pass it to the color parameter of the plot() method we used in our previous code. Note that matplotlib will require the RGB parameters to be within a 0-1 range, so we'll divide every value by 255, the maximum RGB value. We won't bother dividing the zeros because 0/255 = 0.

    
    # Colorblind-friendly colors
    colors = [[0,0,0], [230/255,159/255,0], [86/255,180/255,233/255], [0,158/255,115/255], 
              [213/255,94/255,0], [0,114/255,178/255]]
    
    # The previous code we modify
    fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8), color = colors)
    
    # The previous code that remains the same
    fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
    fte_graph.set_yticklabels(labels = [-10, '0   ', '10   ', '20   ', '30   ', '40   ', '50%'])
    fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
    fte_graph.xaxis.label.set_visible(False)
    fte_graph.set_xlim(left = 1969, right = 2011)
    fte_graph.text(x = 1965.8, y = -7,
        s = '   ©DATAQUEST                                                                                 Source: National Center for Education Statistics   ',
        fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey')
    fte_graph.text(x = 1966.65, y = 62.7, s = "The gender gap is transitory - even for extreme cases",
                   fontsize = 26, weight = 'bold', alpha = .75)
    fte_graph.text(x = 1966.65, y = 57, 
                   s = 'Percentage of Bachelors conferred to women from 1970 to 2011 in the US for\nextreme cases where the percentage was less than 20% in 1970',
                  fontsize = 19, alpha = .85)
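The division by 255 can also be done programmatically, which avoids typing /255 for every channel. The sketch below starts from the same six colors written as 0-255 triplets. The names rgb_255 and colors_normalized are illustrative.

```python
# The same six colors as above, written as 0-255 RGB triplets
rgb_255 = [(0, 0, 0), (230, 159, 0), (86, 180, 233),
           (0, 158, 115), (213, 94, 0), (0, 114, 178)]

# Divide every channel by 255 to get the 0-1 range matplotlib expects
colors_normalized = [tuple(channel / 255 for channel in rgb) for rgb in rgb_255]
print(colors_normalized[1])
```

The resulting list can be passed to the color parameter of plot() in the same way as the hand-written colors list.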
    

    538_graphs_AO_27_1

    Changing the legend style by adding colored labels

    Finally, we add colored labels to each plot line by using the text() method used earlier. The only new parameter is rotation, which we use to rotate each label so that it fits elegantly on the graph. We'll also do a little trick here, and make the grid lines transparent around labels by simply modifying their background color to match that of the graph. In our previous code, the only thing we modify is the plot() call, where we set the legend parameter to False. This gets rid of the default legend. We also skip redeclaring the colors list since it's already stored in memory from the previous cell.

    
    # The previous code we modify
    fte_graph = women_majors.plot(x = 'Year', y = under_20.index, figsize = (12,8), color = colors, legend = False)
    
    # The previous code that remains unchanged
    fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
    fte_graph.set_yticklabels(labels = [-10, '0   ', '10   ', '20   ', '30   ', '40   ', '50%'])
    fte_graph.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)
    fte_graph.xaxis.label.set_visible(False)
    fte_graph.set_xlim(left = 1969, right = 2011)
    fte_graph.text(x = 1965.8, y = -7,
        s = '   ©DATAQUEST                                                                                 Source: National Center for Education Statistics   ',
        fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey')
    fte_graph.text(x = 1966.65, y = 62.7, s = "The gender gap is transitory - even for extreme cases",
                   fontsize = 26, weight = 'bold', alpha = .75)
    fte_graph.text(x = 1966.65, y = 57, 
                   s = 'Percentage of Bachelors conferred to women from 1970 to 2011 in the US for\nextreme cases where the percentage was less than 20% in 1970',
                  fontsize = 19, alpha = .85)
    
    # Add colored labels
    fte_graph.text(x = 1994, y = 44, s = 'Agriculture', color = colors[0], weight = 'bold', rotation = 33,
                  backgroundcolor = '#f0f0f0')
    fte_graph.text(x = 1985, y = 42.2, s = 'Architecture', color = colors[1], weight = 'bold', rotation = 18,
                  backgroundcolor = '#f0f0f0')
    fte_graph.text(x = 2004, y = 51, s = 'Business', color = colors[2], weight = 'bold', rotation = -5, 
                   backgroundcolor = '#f0f0f0')
    fte_graph.text(x = 2001, y = 30, s = 'Computer Science', color = colors[3], weight = 'bold', rotation = -42.5,
                  backgroundcolor = '#f0f0f0')
    fte_graph.text(x = 1987, y = 11.5, s = 'Engineering', color = colors[4], weight = 'bold',
                  backgroundcolor = '#f0f0f0')
    fte_graph.text(x = 1976, y = 25, s = 'Physical Sciences', color = colors[5], weight = 'bold', rotation = 27,
                  backgroundcolor = '#f0f0f0')
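The six text() calls differ only in position, text, rotation, and color, so they can also be driven from a list. The sketch below is one way to write that loop. It uses an empty Axes named ax as a stand-in for fte_graph, and the name label_specs is illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display

import matplotlib.pyplot as plt

# The colorblind-friendly colors defined earlier
colors = [[0, 0, 0], [230/255, 159/255, 0], [86/255, 180/255, 233/255],
          [0, 158/255, 115/255], [213/255, 94/255, 0], [0, 114/255, 178/255]]

# (x, y, text, rotation) for each label, copied from the calls above;
# Engineering had no rotation argument, so it gets 0 here
label_specs = [
    (1994, 44, 'Agriculture', 33),
    (1985, 42.2, 'Architecture', 18),
    (2004, 51, 'Business', -5),
    (2001, 30, 'Computer Science', -42.5),
    (1987, 11.5, 'Engineering', 0),
    (1976, 25, 'Physical Sciences', 27),
]

fig, ax = plt.subplots(figsize=(12, 8))  # stand-in for fte_graph

# Pair every label with the color of its plot line
for (x, y, text, rotation), color in zip(label_specs, colors):
    ax.text(x=x, y=y, s=text, color=color, weight='bold',
            rotation=rotation, backgroundcolor='#f0f0f0')
```

Keeping the positions in one list also makes the trial-and-error adjustments easier to track.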
    

    538_graphs_AO_29_1

    Next steps

    That's it, our graph is now ready for publication! To do a short recap, we started by generating a graph with matplotlib's default style. We then brought that graph to "FTE-level" through a series of steps:

    • We used matplotlib's built-in fivethirtyeight style.
    • We added a title and a subtitle, and customized each.
    • We added a signature bar.
    • We removed the default legend, and added colored labels.
    • We made a series of other small adjustments: customizing the tick labels, bolding the horizontal line at y = 0, adding a vertical grid line near the tick labels, removing the label of the x-axis, and increasing the lateral margins of the figure.

    To build upon what you've learned, here are a few next steps to consider:

    • Generate a similar graph for other Bachelors.
    • Generate different kinds of FTE graphs: a histogram, a scatter plot etc.
    • Explore matplotlib's gallery to search for potential elements to enrich your FTE graphs (like inserting images, or adding arrows etc.). Adding images can take your FTE graphs to a whole new level:

    dinos1

    Source: FiveThirtyEight

Alex Olteanu

About the author

Senior Data Science Instructor. Technical Writer. Fiction Writer. Interested in short-term projects.