February 2, 2022

Grouping Data: A Step-by-Step Tutorial to GroupBy in Pandas

In this tutorial, we will explore how to create a GroupBy object in pandas library of Python and how this object works. We will take a detailed look at each step of a grouping process, what methods can be applied to a GroupBy object, and what information we can extract from it.

The 3 Steps of a Groupby Process

Any groupby process involves some combination of the following 3 steps:

Splitting the original object into groups based on the defined criteria.
Applying a function to each group.
Combining the results.

Let's explore this split-apply-combine chain step-by-step with an example from a Kaggle Nobel Prize Dataset:

import pandas as pd
import numpy as np

pd.set_option('max_columns', None)

df = pd.read_csv('complete.csv')
df = df[['awardYear', 'category', 'prizeAmount', 'prizeAmountAdjusted', 'name', 'gender', 'birth_continent']]
df.head()

	awardYear	category	prizeAmount	prizeAmountAdjusted	name	gender	birth_continent
0	2001	Economic Sciences	10000000	12295082	A. Michael Spence	male	North America
1	1975	Physics	630000	3404179	Aage N. Bohr	male	Europe
2	2004	Chemistry	10000000	11762861	Aaron Ciechanover	male	Asia
3	1982	Chemistry	1150000	3102518	Aaron Klug	male	Europe
4	1979	Physics	800000	2988048	Abdus Salam	male	Asia

Splitting the Original Object into Groups

At this stage, we call the pandas DataFrame.groupby() function. We use it to split the data into groups based on predefined criteria, along rows (by default, axis=0), or columns (axis=1). In other words, this function maps the labels to the names of the groups.

For example, in our case, we can group the data on Nobel prizes by prize category:

grouped = df.groupby('category')

It is also possible to use more than one column to perform data grouping, passing a list of columns. Let's group our data first by prize category and then, inside each of the created groups, we will apply additional grouping based on the year of the prize award:

grouped_category_year = df.groupby(['category', 'awardYear'])

Now, if we try to print one of the two GroupBy objects that we created, we will actually see no groups:

print(grouped)

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000026083789DF0>

It is important to note that creating a GroupBy object only checks if we have passed a correct mapping; it doesn't really perform any of the operations of the split-apply-combine chain until we explicitly use some method on this object or extract some of its attributes.

To briefly inspect the resulting GroupBy object and check how exactly the groups were split, we can extract from it the groups or indices attribute. Both of them return a dictionary where the keys are the created groups and the values are the lists of axis labels (for the groups attribute) or indices (for the indices attribute) of the instances of each group in the original DataFrame:

grouped.indices

{'Chemistry': array([  2,   3,   7,   9,  10,  11,  13,  14,  15,  17,  19,  39,  62,
         64,  66,  71,  75,  80,  81,  86,  92, 104, 107, 112, 129, 135,
        153, 169, 175, 178, 181, 188, 197, 199, 203, 210, 215, 223, 227,
        239, 247, 249, 258, 264, 265, 268, 272, 274, 280, 282, 284, 289,
        296, 298, 310, 311, 317, 318, 337, 341, 343, 348, 352, 357, 362,
        365, 366, 372, 374, 384, 394, 395, 396, 415, 416, 419, 434, 440,
        442, 444, 446, 448, 450, 455, 456, 459, 461, 463, 465, 469, 475,
        504, 505, 508, 518, 522, 523, 524, 539, 549, 558, 559, 563, 567,
        571, 572, 585, 591, 596, 599, 627, 630, 632, 641, 643, 644, 648,
        659, 661, 666, 667, 668, 671, 673, 679, 681, 686, 713, 715, 717,
        719, 720, 722, 723, 725, 726, 729, 732, 738, 742, 744, 746, 751,
        756, 759, 763, 766, 773, 776, 798, 810, 813, 814, 817, 827, 828,
        829, 832, 839, 848, 853, 855, 862, 866, 880, 885, 886, 888, 889,
        892, 894, 897, 902, 904, 914, 915, 920, 921, 922, 940, 941, 943,
        946, 947], dtype=int64),
 'Economic Sciences': array([  0,   5,  45,  46,  58,  90,  96, 139, 140, 145, 152, 156, 157,
        180, 187, 193, 207, 219, 231, 232, 246, 250, 269, 279, 283, 295,
        305, 324, 346, 369, 418, 422, 425, 426, 430, 432, 438, 458, 467,
        476, 485, 510, 525, 527, 537, 538, 546, 580, 594, 595, 605, 611,
        636, 637, 657, 669, 670, 678, 700, 708, 716, 724, 734, 737, 739,
        745, 747, 749, 750, 753, 758, 767, 800, 805, 854, 856, 860, 864,
        871, 882, 896, 912, 916, 924], dtype=int64),
 'Literature': array([ 21,  31,  40,  49,  52,  98, 100, 101, 102, 111, 115, 142, 149,
        159, 170, 177, 201, 202, 220, 221, 233, 235, 237, 253, 257, 259,
        275, 277, 278, 286, 312, 315, 316, 321, 326, 333, 345, 347, 350,
        355, 359, 364, 370, 373, 385, 397, 400, 403, 406, 411, 435, 439,
        441, 454, 468, 479, 480, 482, 483, 492, 501, 506, 511, 516, 556,
        569, 581, 602, 604, 606, 613, 614, 618, 631, 633, 635, 640, 652,
        653, 655, 656, 665, 675, 683, 699, 761, 765, 771, 774, 777, 779,
        780, 784, 786, 788, 796, 799, 803, 836, 840, 842, 850, 861, 867,
        868, 878, 881, 883, 910, 917, 919, 927, 928, 929, 930, 936],
       dtype=int64),
 'Peace': array([  6,  12,  16,  25,  26,  27,  34,  36,  44,  47,  48,  54,  61,
         65,  72,  78,  79,  82,  95,  99, 116, 119, 120, 126, 137, 146,
        151, 166, 167, 171, 200, 204, 205, 206, 209, 213, 225, 236, 240,
        244, 255, 260, 266, 267, 270, 287, 303, 320, 329, 356, 360, 361,
        377, 386, 387, 388, 389, 390, 391, 392, 393, 433, 447, 449, 471,
        477, 481, 489, 491, 500, 512, 514, 517, 528, 529, 530, 533, 534,
        540, 542, 544, 545, 547, 553, 555, 560, 562, 574, 578, 590, 593,
        603, 607, 608, 609, 612, 615, 616, 617, 619, 620, 628, 634, 639,
        642, 664, 677, 688, 697, 703, 705, 710, 727, 736, 787, 793, 795,
        806, 823, 846, 847, 852, 865, 875, 876, 877, 895, 926, 934, 935,
        937, 944, 948, 949], dtype=int64),
 'Physics': array([  1,   4,   8,  20,  23,  24,  30,  32,  38,  51,  59,  60,  67,
         68,  69,  70,  74,  84,  89,  97, 103, 105, 108, 109, 114, 117,
        118, 122, 125, 127, 128, 130, 133, 141, 143, 144, 155, 162, 163,
        164, 165, 168, 173, 174, 176, 179, 183, 195, 212, 214, 216, 222,
        224, 228, 230, 234, 238, 241, 243, 251, 256, 263, 271, 276, 291,
        292, 297, 301, 306, 307, 308, 323, 327, 328, 330, 335, 336, 338,
        349, 351, 353, 354, 363, 367, 375, 376, 378, 381, 382, 398, 399,
        402, 404, 405, 408, 410, 412, 413, 420, 421, 424, 428, 429, 436,
        445, 451, 453, 457, 460, 462, 470, 472, 487, 495, 498, 499, 509,
        513, 515, 521, 526, 532, 535, 536, 541, 548, 550, 552, 557, 561,
        564, 565, 566, 573, 576, 577, 579, 583, 586, 588, 592, 601, 610,
        621, 622, 623, 629, 647, 650, 651, 654, 658, 674, 676, 682, 684,
        690, 691, 693, 694, 695, 696, 698, 702, 707, 711, 714, 721, 730,
        731, 735, 743, 752, 755, 770, 772, 775, 781, 785, 790, 792, 797,
        801, 802, 808, 822, 833, 834, 835, 844, 851, 870, 872, 879, 884,
        887, 890, 893, 900, 901, 903, 905, 907, 908, 909, 913, 925, 931,
        932, 933, 938, 942, 945], dtype=int64),
 'Physiology or Medicine': array([ 18,  22,  28,  29,  33,  35,  37,  41,  42,  43,  50,  53,  55,
         56,  57,  63,  73,  76,  77,  83,  85,  87,  88,  91,  93,  94,
        106, 110, 113, 121, 123, 124, 131, 132, 134, 136, 138, 147, 148,
        150, 154, 158, 160, 161, 172, 182, 184, 185, 186, 189, 190, 191,
        192, 194, 196, 198, 208, 211, 217, 218, 226, 229, 242, 245, 248,
        252, 254, 261, 262, 273, 281, 285, 288, 290, 293, 294, 299, 300,
        302, 304, 309, 313, 314, 319, 322, 325, 331, 332, 334, 339, 340,
        342, 344, 358, 368, 371, 379, 380, 383, 401, 407, 409, 414, 417,
        423, 427, 431, 437, 443, 452, 464, 466, 473, 474, 478, 484, 486,
        488, 490, 493, 494, 496, 497, 502, 503, 507, 519, 520, 531, 543,
        551, 554, 568, 570, 575, 582, 584, 587, 589, 597, 598, 600, 624,
        625, 626, 638, 645, 646, 649, 660, 662, 663, 672, 680, 685, 687,
        689, 692, 701, 704, 706, 709, 712, 718, 728, 733, 740, 741, 748,
        754, 757, 760, 762, 764, 768, 769, 778, 782, 783, 789, 791, 794,
        804, 807, 809, 811, 812, 815, 816, 818, 819, 820, 821, 824, 825,
        826, 830, 831, 837, 838, 841, 843, 845, 849, 857, 858, 859, 863,
        869, 873, 874, 891, 898, 899, 906, 911, 918, 923, 939], dtype=int64)}

To find the number of groups in a GroupBy object, we can extract from it the ngroups attribute or call the len function of the Python standard library:

print(grouped.ngroups)
print(len(grouped))

6
6

If we need to visualize all or some entries of each group, we can iterate over the GroupBy object:

for name, entries in grouped:
    print(f'First 2 entries for the "{name}" category:')
    print(30*'-')
    print(entries.head(2), '\n\n')

First 2 entries for the "Chemistry" category:
------------------------------
   awardYear   category  prizeAmount  prizeAmountAdjusted               name  \
2       2004  Chemistry     10000000             11762861  Aaron Ciechanover   
3       1982  Chemistry      1150000              3102518         Aaron Klug   

  gender birth_continent  
2   male            Asia  
3   male          Europe   

First 2 entries for the "Economic Sciences" category:
------------------------------
   awardYear           category  prizeAmount  prizeAmountAdjusted  \
0       2001  Economic Sciences     10000000             12295082   
5       2019  Economic Sciences      9000000              9000000   

                name gender birth_continent  
0  A. Michael Spence   male   North America  
5   Abhijit Banerjee   male            Asia   

First 2 entries for the "Literature" category:
------------------------------
    awardYear    category  prizeAmount  prizeAmountAdjusted  \
21       1957  Literature       208629              2697789   
31       1970  Literature       400000              3177966   

                     name gender birth_continent  
21           Albert Camus   male          Africa  
31  Alexandr Solzhenitsyn   male          Europe   

First 2 entries for the "Peace" category:
------------------------------
    awardYear category  prizeAmount  prizeAmountAdjusted  \
6        2019    Peace      9000000              9000000   
12       1980    Peace       880000              2889667   

                     name gender birth_continent  
6          Abiy Ahmed Ali   male          Africa  
12  Adolfo Pérez Esquivel   male   South America   

First 2 entries for the "Physics" category:
------------------------------
   awardYear category  prizeAmount  prizeAmountAdjusted          name gender  \
1       1975  Physics       630000              3404179  Aage N. Bohr   male   
4       1979  Physics       800000              2988048   Abdus Salam   male   

  birth_continent  
1          Europe  
4            Asia   

First 2 entries for the "Physiology or Medicine" category:
------------------------------
    awardYear                category  prizeAmount  prizeAmountAdjusted  \
18       1963  Physiology or Medicine       265000              2839286   
22       1974  Physiology or Medicine       550000              3263449   

             name gender birth_continent  
18   Alan Hodgkin   male          Europe  
22  Albert Claude   male          Europe

If instead, we want to select a single group in a form of DataFrame, we should use the method get_group() on the GroupBy object:

grouped.get_group('Economic Sciences')

	awardYear	category	prizeAmount	prizeAmountAdjusted	name	gender	birth_continent
0	2001	Economic Sciences	10000000	12295082	A. Michael Spence	male	North America
5	2019	Economic Sciences	9000000	9000000	Abhijit Banerjee	male	Asia
45	2012	Economic Sciences	8000000	8361204	Alvin E. Roth	male	North America
46	1998	Economic Sciences	7600000	9713701	Amartya Sen	male	Asia
58	2015	Economic Sciences	8000000	8384572	Angus Deaton	male	Europe
...	...	...	...	...	...	...	...
882	2002	Economic Sciences	10000000	12034660	Vernon L. Smith	male	North America
896	1973	Economic Sciences	510000	3331882	Wassily Leontief	male	Europe
912	2018	Economic Sciences	9000000	9000000	William D. Nordhaus	male	North America
916	1990	Economic Sciences	4000000	6329114	William F. Sharpe	male	North America
924	1996	Economic Sciences	7400000	9490424	William Vickrey	male	North America

84 rows × 7 columns

Applying a Function by Group

After splitting the original data and (optionally) inspecting the resulting groups, we can perform one of the following operations or their combination (not necessarily in the given order) on each group:

Aggregation: calculating a summary statistic for each group (e.g., group sizes, means, medians, or sums) and outputting a single number for many data points.
Transformation: conducting some operations by group, such as calculating the z-score for each group.
Filtration: rejecting some groups based on a predefined condition, such as group size, mean, median, or sum. This can also include filtering out particular rows from each group.

Aggregation

To aggregate the data of a GroupBy object (i.e., to calculate a summary statistic by group), we can use the agg() method on the object:

# Showing only 1 decimal for all float numbers
pd.options.display.float_format = '{:.1f}'.format

grouped.agg(np.mean)

	awardYear	prizeAmount	prizeAmountAdjusted
category
Chemistry	1972.3	3629279.4	6257868.1
Economic Sciences	1996.1	6105845.2	7837779.2
Literature	1960.9	2493811.2	5598256.3
Peace	1964.5	3124879.2	6163906.9
Physics	1971.1	3407938.6	6086978.2
Physiology or Medicine	1970.4	3072972.9	5738300.7

The code above produces a DataFrame with the group names as its new index and the mean values for each numeric column by group.

Instead of using the agg() method, we can apply the corresponding pandas method directly on a GroupBy object. The most common methods are mean(), median(), mode(), sum(), size(), count(), min(), max(), std(), var() (computes the variance of each group), describe() (outputs descriptive statistics by group), and nunique() (gives the number of unique values in each group).

grouped.sum()

	awardYear	prizeAmount	prizeAmountAdjusted
category
Chemistry	362912	667787418	1151447726
Economic Sciences	167674	512891000	658373449
Literature	227468	289282102	649397731
Peace	263248	418733807	825963521
Physics	419837	725890928	1296526352
Physiology or Medicine	431508	672981066	1256687857

Usually, we are interested only in the statistics for some particular column or columns, so we need to specify it (or them). In the example above, we definitely don't want to sum all the years. Instead, we may want to sum the prize values by prize category. For this purpose, we can select the prizeAmountAdjusted column of the GroupBy object, just like we select a column of a DataFrame, and apply the sum() function on it:

grouped['prizeAmountAdjusted'].sum()

category
Chemistry                 1151447726
Economic Sciences          658373449
Literature                 649397731
Peace                      825963521
Physics                   1296526352
Physiology or Medicine    1256687857
Name: prizeAmountAdjusted, dtype: int64

For the piece of code above (as well as for some of the next examples), we can use an equivalent syntax applying the function on a GroupBy object before selecting the necessary column: grouped.sum()['prizeAmountAdjusted']. However, the previous syntax is preferable since it performs better, especially on large datasets.

If we need to aggregate data for two or more columns, we use double square brackets:

grouped[['prizeAmount', 'prizeAmountAdjusted']].sum()

	prizeAmount	prizeAmountAdjusted
category
Chemistry	667787418	1151447726
Economic Sciences	512891000	658373449
Literature	289282102	649397731
Peace	418733807	825963521
Physics	725890928	1296526352
Physiology or Medicine	672981066	1256687857

It is possible to apply several functions at once to one or more columns of a GroupBy object. For this purpose, we again need the agg() method and a list of the functions of interest:

grouped[['prizeAmount', 'prizeAmountAdjusted']].agg([np.sum, np.mean, np.std])

	prizeAmount			prizeAmountAdjusted
	sum	mean	std	sum	mean	std
category
Chemistry	667787418	3629279.4	4070588.4	1151447726	6257868.1	3276027.2
Economic Sciences	512891000	6105845.2	3787630.1	658373449	7837779.2	3313153.2
Literature	289282102	2493811.2	3653734.0	649397731	5598256.3	3029512.1
Peace	418733807	3124879.2	3934390.9	825963521	6163906.9	3189886.1
Physics	725890928	3407938.6	4013073.0	1296526352	6086978.2	3294268.5
Physiology or Medicine	672981066	3072972.9	3898539.3	1256687857	5738300.7	3241781.0

Also, we can consider applying different aggregation functions to different columns of a GroupBy object by passing a dictionary:

grouped.agg({'prizeAmount': [np.sum, np.size], 'prizeAmountAdjusted': np.mean})

	prizeAmount		prizeAmountAdjusted
	sum	size	mean
category
Chemistry	667787418	184	6257868.1
Economic Sciences	512891000	84	7837779.2
Literature	289282102	116	5598256.3
Peace	418733807	134	6163906.9
Physics	725890928	213	6086978.2
Physiology or Medicine	672981066	219	5738300.7

Transformation

Unlike aggregation methods (and also unlike filtration ones, as we will see soon), transformation methods return a new DataFrame with the same shape and indexing as the original one but with transformed individual values. Here it is important to note that the transformation must not modify any values in the original DataFrame, meaning that such operations cannot be performed in place.

The most common pandas method to transform the data of a GroupBy object is transform(). For example, it can be helpful for computing the z-score for each group:

grouped[['prizeAmount', 'prizeAmountAdjusted']].transform(lambda x: (x - x.mean()) / x.std())

	prizeAmount	prizeAmountAdjusted
0	1.0	1.3
1	-0.7	-0.8
2	1.6	1.7
3	-0.6	-1.0
4	-0.6	-0.9
...	...	...
945	-0.7	-0.8
946	-0.8	-1.1
947	-0.9	0.3
948	-0.5	-1.0
949	-0.7	-1.0

950 rows × 2 columns

With transformation methods, we can also replace missing data with the group mean, median, mode, or any other value:

grouped['gender'].transform(lambda x: x.fillna(x.mode()[0]))

0        male
1        male
2        male
3        male
4        male
        ...  
945      male
946      male
947    female
948      male
949      male
Name: gender, Length: 950, dtype: object

There are some other pandas methods that we can use to transform the data of a GroupBy object: bfill(), ffill(), diff(), pct_change(), rank(), shift(), quantile(), etc.

Filtration

Filtration methods discard the groups or particular rows from each group based on a predefined condition and return a subset of the original data. For example, we may want to keep only the values of a certain column from all the groups where the group mean for that column is greater than a predefined value. In the case of our DataFrame, let's filter out all the groups with a group mean for the prizeAmountAdjusted column smaller than 7,000,000, and keep only this column in the output:

grouped['prizeAmountAdjusted'].filter(lambda x: x.mean() > 7000000)

0      12295082
5       9000000
45      8361204
46      9713701
58      8384572
         ...   
882    12034660
896     3331882
912     9000000
916     6329114
924     9490424
Name: prizeAmountAdjusted, Length: 84, dtype: int64

Another example is filtering out the groups with more than a certain number of elements:

grouped['prizeAmountAdjusted'].filter(lambda x: len(x) < 100)

0      12295082
5       9000000
45      8361204
46      9713701
58      8384572
         ...   
882    12034660
896     3331882
912     9000000
916     6329114
924     9490424
Name: prizeAmountAdjusted, Length: 84, dtype: int64

In both the operations above, we used the filter() method passing a lambda function as an argument. Such a function, applied to the entire group, returns True or False based on the result of comparison of that group with predifined statistical condition. In other words, the function inside the filter() method decides which groups to keep in the new dataframe.

Along with filtering out the whole groups, it is also possible to discard certain rows from each group. Some helpful methods here are first(), last(), and nth(). Applying one of them on a GroupBy object returns the first/last/nth entry of each group correspondingly:

grouped.last()

	awardYear	prizeAmount	prizeAmountAdjusted	name	gender	birth_continent
category
Chemistry	1911	140695	7327865	Marie Curie	female	Europe
Economic Sciences	1996	7400000	9490424	William Vickrey	male	North America
Literature	1968	350000	3052326	Yasunari Kawabata	male	Asia
Peace	1963	265000	2839286	International Committee of the Red Cross	male	Asia
Physics	1972	480000	3345725	John Bardeen	male	North America
Physiology or Medicine	2016	8000000	8301051	Yoshinori Ohsumi	male	Asia

For the nth() method, we have to pass the integer representing the index of entry we want to return for each group:

grouped.nth(1)

	awardYear	prizeAmount	prizeAmountAdjusted	name	gender	birth_continent
category
Chemistry	1982	1150000	3102518	Aaron Klug	male	Europe
Economic Sciences	2019	9000000	9000000	Abhijit Banerjee	male	Asia
Literature	1970	400000	3177966	Alexandr Solzhenitsyn	male	Europe
Peace	1980	880000	2889667	Adolfo Pérez Esquivel	male	South America
Physics	1979	800000	2988048	Abdus Salam	male	Asia
Physiology or Medicine	1974	550000	3263449	Albert Claude	male	Europe

The piece of code above gathers the second entries of all the groups, keeping in mind the 0-indexing in Python.

Two more methods for filtering out the rows in each group are head() and tail(), returning the first/last n rows (5, by default) of each group correspondingly:

grouped.head(3)

	awardYear	category	prizeAmount	prizeAmountAdjusted	name	gender	birth_continent
0	2001	Economic Sciences	10000000	12295082	A. Michael Spence	male	North America
1	1975	Physics	630000	3404179	Aage N. Bohr	male	Europe
2	2004	Chemistry	10000000	11762861	Aaron Ciechanover	male	Asia
3	1982	Chemistry	1150000	3102518	Aaron Klug	male	Europe
4	1979	Physics	800000	2988048	Abdus Salam	male	Asia
5	2019	Economic Sciences	9000000	9000000	Abhijit Banerjee	male	Asia
6	2019	Peace	9000000	9000000	Abiy Ahmed Ali	male	Africa
7	2009	Chemistry	10000000	10958504	Ada E. Yonath	female	Asia
8	2011	Physics	10000000	10545557	Adam G. Riess	male	North America
12	1980	Peace	880000	2889667	Adolfo Pérez Esquivel	male	South America
16	2007	Peace	10000000	11301989	Al Gore	male	North America
18	1963	Physiology or Medicine	265000	2839286	Alan Hodgkin	male	Europe
21	1957	Literature	208629	2697789	Albert Camus	male	Africa
22	1974	Physiology or Medicine	550000	3263449	Albert Claude	male	Europe
28	1937	Physiology or Medicine	158463	4716161	Albert Szent-Györgyi	male	Europe
31	1970	Literature	400000	3177966	Alexandr Solzhenitsyn	male	Europe
40	2013	Literature	8000000	8365867	Alice Munro	female	North America
45	2012	Economic Sciences	8000000	8361204	Alvin E. Roth	male	North America

Combining the Results

The last stage of the split-apply-combine chain — combining the results — is performed under the hood by pandas. It consists of taking the outputs of all of the operations conducted on a GroupBy object and uniting them back together, producing a new data structure, such as a Series or DataFrame. Assigning this data structure to a variable, we can use it to solve other tasks.

Conclusion

In this tutorial, we covered many aspects of using the pandas groupby function and working with the resulting object. In particular, we learned the following:

The steps a grouping process includes
How the split-apply-combine chain works, step-by-step
How to create a GroupBy object
How to briefly inspect a GroupBy object
The attributes of a GroupBy object
The operations that can be applied on a GroupBy object
How to calculate summary statistics by group and what methods are available for this purpose
How to apply several functions at once to one or more columns of a GroupBy object
How to apply different aggregation functions to different columns of a GroupBy object
How and why to transform the values from the original DataFrame
How to filter the groups of a GroupBy object or particular rows of each group
How pandas combines the results of a grouping process
The data structures a grouping process produces

With this information, you're ready to start applying your skills in pandas groupby.

Tutorials

Grouping Data: A Step-by-Step Tutorial to GroupBy in Pandas

The 3 Steps of a Groupby Process

Splitting the Original Object into Groups

Applying a Function by Group

Aggregation

Transformation

Filtration

Combining the Results

Conclusion

Tutorial: Comparing 7 Tools For Data Visualization in Python

Tutorial: Install Linux on Windows with Windows Subsystem for Linux

Grouping Data: A Step-by-Step Tutorial to GroupBy in Pandas

The 3 Steps of a Groupby Process

Splitting the Original Object into Groups

Applying a Function by Group

Aggregation

Transformation

Filtration

Combining the Results

Conclusion

More learning resources

Tutorial: Comparing 7 Tools For Data Visualization in Python

Tutorial: Install Linux on Windows with Windows Subsystem for Linux