
Data Science Portfolio Project: Where to Advertise an E-learning Product

At Dataquest, we strongly advocate portfolio projects as a means of getting a first data science job. In this blog post, we'll walk you through an example portfolio project.

The project is part of our Statistics Intermediate: Averages and Variability course, and it assumes familiarity with:

  • Sampling (populations, samples, sample representativity)
  • Frequency distributions
  • Box plots and bar plots
  • Summary metrics (especially the mean)
  • pandas, matplotlib, and seaborn

If you think you need to fill in any gaps before moving forward, we cover the topics above in depth in our Statistics Fundamentals and Statistics Intermediate: Averages and Variability courses. The latter also gives more detailed instructions on how to build this project and lets you code it in your browser.

We also teach pandas, matplotlib, and seaborn in our Data Scientist Path.

This project follows the guidelines presented in our style guide for data science projects.

Introduction

In this project, we're working for an e-learning company that offers courses on programming, and we'll aim to find the two best markets in which to advertise our product. Most of our courses are on web and mobile development, but we also cover other domains, like data science and game development.

We'll analyze existing data about new coders to find the best markets to invest in for advertising. To make our recommendation, we'll try to find out:

  • Where these new coders are located.
  • Which locations have the greatest number of new coders.
  • How much money new coders are willing to spend on learning.

Summary of results

After analyzing the data, the only solid conclusion we reached is that the US would be a good market to advertise in. For the second-best market, the choice between India and Canada wasn't clear-cut, so we decided to send the results to the marketing team, who can use their domain knowledge to make the best decision.

For more details, please refer to the full analysis below.

Exploring existing data

To avoid spending money on organizing a survey, we'll first try to make use of existing data to determine whether we can reach a reliable result.

One good candidate for our purpose is freeCodeCamp's 2017 New Coder Survey. freeCodeCamp is a free e-learning platform that offers courses on web development. Because they run a popular Medium publication (over 400,000 followers), their survey attracted new coders with varying interests (not only web development), which is ideal for our analysis.

The survey data is publicly available in this GitHub repository. Below, we'll do a quick exploration of the 2017-fCC-New-Coders-Survey-Data.csv file stored in the clean-data folder of that repository, reading the file directly from its raw GitHub link.

# Read in the data
import pandas as pd
direct_link = 'https://raw.githubusercontent.com/freeCodeCamp/2017-new-coder-survey/master/clean-data/2017-fCC-New-Coders-Survey-Data.csv'
fcc = pd.read_csv(direct_link, low_memory=False) # low_memory=False silences the mixed-dtypes warning

# Quick exploration of the data
print(fcc.shape)
pd.options.display.max_columns = 150 # to avoid truncated output 
fcc.head()
(18175, 136)
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev CodeEventHackathons CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming NetworkID Part1EndTime Part1StartTime Part2EndTime Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk PodcastTalkPython PodcastTheWebAhead ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston
0 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes Canada Canada software development and IT NaN Employed for wages NaN NaN NaN NaN female NaN NaN 1.0 0.0 1.0 0.0 0.0 0.0 NaN 15.0 02d9465b21e8bd09374b0066fb2d5614 eb78c1c3ac6cd9052aec557065070fbf NaN NaN 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN start your own business NaN NaN NaN English married or domestic partnership 150.0 6.0 6f1fbc6b2b 2017-03-09 00:36:22 2017-03-09 00:32:59 2017-03-09 00:59:46 2017-03-09 00:36:26 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 34.0 0.0 NaN NaN NaN NaN NaN less than 100,000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 35000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 10.0 5bfef9ecb211ec4f518cfc1d2a6f3e0c 21db37adb60cdcafadfa7dca1b13b6b1 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a nonprofit 1.0 Full-Stack Web Developer in an office with other developers English single, never married 80.0 6.0 f8f8be6910 2017-03-09 00:37:07 2017-03-09 00:33:26 2017-03-09 00:38:59 2017-03-09 00:37:10 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 21.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America software development and IT NaN Employed for wages NaN 70000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 25.0 14f1863afa9c7de488050b82eb3edd96 21ba173828fbe9e27ccebaf4d5166a55 13000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN work for a medium-sized company 1.0 Front-End Web Developer, Back-End Web Develo... no preference Spanish single, never married 1000.0 5.0 2ed189768e 2017-03-09 00:37:58 2017-03-09 00:33:53 2017-03-09 00:40:14 2017-03-09 00:38:02 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN Codenewbie NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN high school diploma or equivalent (GED) NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN
3 26.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN I work from home Brazil Brazil software development and IT NaN Employed for wages NaN 40000.0 0.0 NaN male NaN 0.0 1.0 1.0 1.0 1.0 0.0 0.0 40000.0 14.0 91756eb4dc280062a541c25a3d44cfb0 3be37b558f02daae93a6da10f83f0c77 24000.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a medium-sized company NaN Front-End Web Developer, Full-Stack Web Deve... from home Portuguese married or domestic partnership 0.0 5.0 dbdc0664d1 2017-03-09 00:40:13 2017-03-09 00:37:45 2017-03-09 00:42:26 2017-03-09 00:40:18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN NaN some college credit, no degree NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN
4 20.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Portugal Portugal NaN NaN Not working but looking for work NaN 140000.0 NaN NaN female NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 aa3f061a1949a90b27bef7411ecd193f d7c56bbf2c7b62096be9db010e86d96d NaN 0.0 0.0 0.0 NaN Within 7 to 12 months 1.0 NaN NaN NaN 1.0 1.0 NaN 1.0 1.0 NaN NaN NaN NaN work for a multinational corporation 1.0 Full-Stack Web Developer, Information Security... in an office with other developers Portuguese single, never married 0.0 24.0 11b0f2d8a9 2017-03-09 00:42:45 2017-03-09 00:39:44 2017-03-09 00:45:42 2017-03-09 00:42:50 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN bachelor's degree Information Technology NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Checking for sample representativity

As we mentioned in the introduction, most of our courses are on web and mobile development, but we also cover many other domains, like data science and game development. For the purpose of our analysis, we want to answer questions about a population of new coders who are interested in the subjects we teach. As a reminder, we'd like to know:

  • Where these new coders are located.
  • Which locations have the most new coders.
  • How much money they're willing to spend on learning.

We first need to clarify whether the data set has the categories that we need. The JobRoleInterest column describes the role(s) each participant is interested in working in. If a participant is interested in working in a certain domain, we can assume that they're also interested in learning about that domain. So let's take a look at the frequency distribution table of this column and determine whether the data we have is relevant.

# Frequency distribution table for 'JobRoleInterest'
fcc['JobRoleInterest'].value_counts(normalize = True) * 100
Full-Stack Web Developer                                                                                                                                                                                                                                     11.770595
  Front-End Web Developer                                                                                                                                                                                                                                     6.435927
  Data Scientist                                                                                                                                                                                                                                              2.173913
Back-End Web Developer                                                                                                                                                                                                                                        2.030892
  Mobile Developer                                                                                                                                                                                                                                            1.673341
Game Developer                                                                                                                                                                                                                                                1.630435
Information Security                                                                                                                                                                                                                                          1.315789
Full-Stack Web Developer,   Front-End Web Developer                                                                                                                                                                                                           0.915332
  Front-End Web Developer, Full-Stack Web Developer                                                                                                                                                                                                           0.800915
  Product Manager                                                                                                                                                                                                                                             0.786613
Data Engineer                                                                                                                                                                                                                                                 0.758009
  User Experience Designer                                                                                                                                                                                                                                    0.743707
  User Experience Designer,   Front-End Web Developer                                                                                                                                                                                                         0.614989
  Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer                                                                                                                                                                                   0.557780
Back-End Web Developer, Full-Stack Web Developer,   Front-End Web Developer                                                                                                                                                                                   0.514874
Back-End Web Developer,   Front-End Web Developer, Full-Stack Web Developer                                                                                                                                                                                   0.514874
  DevOps / SysAdmin                                                                                                                                                                                                                                           0.514874
Full-Stack Web Developer,   Front-End Web Developer, Back-End Web Developer                                                                                                                                                                                   0.443364
  Front-End Web Developer, Full-Stack Web Developer, Back-End Web Developer                                                                                                                                                                                   0.429062
  Front-End Web Developer,   User Experience Designer                                                                                                                                                                                                         0.414760
Full-Stack Web Developer,   Mobile Developer                                                                                                                                                                                                                  0.414760
Back-End Web Developer, Full-Stack Web Developer                                                                                                                                                                                                              0.386156
Full-Stack Web Developer, Back-End Web Developer                                                                                                                                                                                                              0.371854
Back-End Web Developer,   Front-End Web Developer                                                                                                                                                                                                             0.286041
Full-Stack Web Developer, Back-End Web Developer,   Front-End Web Developer                                                                                                                                                                                   0.271739
Data Engineer,   Data Scientist                                                                                                                                                                                                                               0.271739
  Front-End Web Developer,   Mobile Developer                                                                                                                                                                                                                 0.257437
Full-Stack Web Developer,   Data Scientist                                                                                                                                                                                                                    0.243135
  Mobile Developer, Game Developer                                                                                                                                                                                                                            0.228833
  Data Scientist, Data Engineer                                                                                                                                                                                                                               0.228833
                                                                                                                                                                                                                                                               ...    
  Mobile Developer,   User Experience Designer, Full-Stack Web Developer,   DevOps / SysAdmin, Technical Writer                                                                                                                                               0.014302
Data Engineer,   Data Scientist, Information Security                                                                                                                                                                                                         0.014302
  Mobile Developer, Full-Stack Web Developer,   Product Manager, Game Developer, Information Security,   Front-End Web Developer,   User Experience Designer,   Data Scientist                                                                                0.014302
Back-End Web Developer, Game Developer, Data Engineer                                                                                                                                                                                                         0.014302
Full-Stack Web Developer, Information Security, Back-End Web Developer, Data Engineer,   Mobile Developer,   Data Scientist,   DevOps / SysAdmin                                                                                                              0.014302
Software Specialist                                                                                                                                                                                                                                           0.014302
Game Developer, Information Security,   Mobile Developer,   DevOps / SysAdmin, Full-Stack Web Developer,   Front-End Web Developer                                                                                                                            0.014302
Back-End Web Developer, Game Developer, Full-Stack Web Developer,   Front-End Web Developer,   DevOps / SysAdmin                                                                                                                                              0.014302
  Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer, Game Developer,   Mobile Developer                                                                                                                                               0.014302
Game Developer, Information Security, Full-Stack Web Developer, Back-End Web Developer                                                                                                                                                                        0.014302
  Front-End Web Developer,   Data Scientist, Game Developer,   Product Manager, Information Security                                                                                                                                                          0.014302
  Front-End Web Developer,   Mobile Developer, Information Security, Full-Stack Web Developer,   DevOps / SysAdmin, Back-End Web Developer, Game Developer                                                                                                    0.014302
  Mobile Developer, Game Developer, Full-Stack Web Developer, Back-End Web Developer,   Front-End Web Developer                                                                                                                                               0.014302
Data Engineer,   Front-End Web Developer,   Data Scientist, Full-Stack Web Developer                                                                                                                                                                          0.014302
  Product Manager, Back-End Web Developer,   Data Scientist, Full-Stack Web Developer, Game Developer,   User Experience Designer, Information Security                                                                                                       0.014302
  Mobile Developer, Back-End Web Developer,   Front-End Web Developer, Full-Stack Web Developer                                                                                                                                                               0.014302
  User Experience Designer, Full-Stack Web Developer,   Front-End Web Developer,   Mobile Developer, User Interface Design                                                                                                                                    0.014302
Full-Stack Web Developer,   Quality Assurance Engineer, Game Developer,   Front-End Web Developer,   User Experience Designer                                                                                                                                 0.014302
  Quality Assurance Engineer,   Front-End Web Developer,   User Experience Designer, Game Developer                                                                                                                                                           0.014302
  DevOps / SysAdmin,   Data Scientist, Full-Stack Web Developer, Information Security, Data Engineer, Back-End Web Developer                                                                                                                                  0.014302
Full-Stack Web Developer,   Data Scientist,   User Experience Designer,   Mobile Developer,   Front-End Web Developer                                                                                                                                         0.014302
Data Engineer,   Product Manager,   Data Scientist                                                                                                                                                                                                            0.014302
Full-Stack Web Developer,   User Experience Designer, Back-End Web Developer,   Data Scientist, Information Security, Criminal Defense Attorney-- focusing on cyber crimes                                                                                    0.014302
Data Engineer,   User Experience Designer,   Front-End Web Developer, Game Developer,   Data Scientist,   Product Manager                                                                                                                                     0.014302
  Front-End Web Developer,   User Experience Designer,   DevOps / SysAdmin, Back-End Web Developer,   Data Scientist, Game Developer,   Product Manager,   Quality Assurance Engineer, Full-Stack Web Developer, Information Security,   Mobile Developer     0.014302
Education                                                                                                                                                                                                                                                     0.014302
  DevOps / SysAdmin,   Mobile Developer, Full-Stack Web Developer,   Front-End Web Developer                                                                                                                                                                  0.014302
Back-End Web Developer,   Data Scientist, Information Security,   Front-End Web Developer,   Quality Assurance Engineer,   DevOps / SysAdmin, Data Engineer, Game Developer, Full-Stack Web Developer                                                         0.014302
  Data Scientist, Back-End Web Developer, Full-Stack Web Developer,   Front-End Web Developer,   User Experience Designer,   Mobile Developer                                                                                                                 0.014302
Game Developer,   Mobile Developer, Back-End Web Developer,   Front-End Web Developer, Information Security                                                                                                                                                   0.014302
Name: JobRoleInterest, Length: 3213, dtype: float64

The information in the table above is quite granular, but from a quick scan it looks like:

  • A lot of people are interested in web development (full-stack web development, front-end web development and back-end web development).
  • Only a few people (about 1.7%) chose mobile development as their sole interest, although it also appears in many combined answers.
  • Not too many people are interested in domains other than web and mobile development.
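Because every distinct combination of roles counts as its own category, the table above understates how often any single role is mentioned. One way to see per-role popularity is to split each answer on commas, strip the stray spaces visible in the output, and count each role separately. The sketch below uses a small hypothetical Series in place of fcc['JobRoleInterest'] so it runs on its own, and the helper name role_mentions is ours, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def role_mentions(interests: pd.Series) -> pd.Series:
    """Percentage of respondents (among those who answered) mentioning each role."""
    answered = interests.dropna()
    # One row per (respondent, role); strip the padding seen in the raw answers
    roles = answered.str.split(',').explode().str.strip()
    # Divide by respondents, not by mentions, so values read as "% of people"
    return roles.value_counts() / len(answered) * 100

# Hypothetical toy answers standing in for fcc['JobRoleInterest']
toy = pd.Series([
    'Full-Stack Web Developer',
    '  Front-End Web Developer, Full-Stack Web Developer',
    '  Mobile Developer, Game Developer',
    np.nan,
])
print(role_mentions(toy))
```

Since respondents can pick several roles, these percentages can add up to more than 100%.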

It's also interesting to note that many respondents are interested in more than one subject. It'd be useful to get a better picture of how many people are interested in a single subject and how many have mixed interests. Consequently, in the next code block, we'll:

  • Split each string in the JobRoleInterest column to find the number of options for each participant.
    • We'll first drop the null values, because NaN values can't be split.
  • Generate a frequency table for the variable describing the number of options.

# Split each string in the 'JobRoleInterest' column
interests_no_nulls = fcc['JobRoleInterest'].dropna()
splitted_interests = interests_no_nulls.str.split(',')

# Frequency table for the var describing the number of options
n_of_options = splitted_interests.apply(len) # number of job options per participant
n_of_options.value_counts(normalize = True).sort_index() * 100
1     31.650458
2     10.883867
3     15.889588
4     15.217391
5     12.042334
6      6.721968
7      3.861556
8      1.759153
9      0.986842
10     0.471968
11     0.185927
12     0.300343
13     0.028604
Name: JobRoleInterest, dtype: float64
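To see why the null values had to go first, consider what happens to a missing answer. read_csv stores a skipped question as NaN (a float), so str.split leaves it missing instead of turning it into a list, and len() then fails on it. The toy Series below is hypothetical and only illustrates the mechanism.

```python
import numpy as np
import pandas as pd

# Hypothetical answers; np.nan mimics how read_csv stores a skipped question
answers = pd.Series(['Data Scientist', 'Back-End Web Developer, Data Engineer', np.nan])

split_all = answers.str.split(',')
# The missing answer stays missing instead of becoming a list
print(split_all.isna().tolist())

raised = False
try:
    split_all.apply(len)  # len() is not defined for a missing value
except TypeError as err:
    raised = True
    print('apply(len) failed:', err)

# Dropping nulls first, as in the analysis above, gives one count per respondent
n_options = answers.dropna().str.split(',').apply(len)
print(n_options.tolist())
```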

Only 31.7% of the participants have a clear idea of which programming niche they'd like to work in, while the remaining 68.3% have mixed interests. Given that we offer courses on various subjects, the fact that new coders have mixed interests might actually be good for us.

The focus of our courses is on web and mobile development, so let's find out how many respondents chose at least one of these two options.

# Frequency table
web_or_mobile = interests_no_nulls.str.contains(
    'Web Developer|Mobile Developer') # returns a boolean Series
freq_table = web_or_mobile.value_counts(normalize = True) * 100
print(freq_table)

# Graph for the frequency table above
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

freq_table.plot.bar()
plt.title('Most Participants are Interested in \nWeb or Mobile Development',
          y = 1.08) # y pads the title upward
plt.ylabel('Percentage', fontsize = 12)
plt.xticks([0,1],['Web or mobile\ndevelopment', 'Other subject'],
           rotation = 0) # the initial xtick labels were True and False
plt.ylim([0,100])
plt.show()
True     86.241419
False    13.758581
Name: JobRoleInterest, dtype: float64

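[Figure: bar chart "Most Participants are Interested in Web or Mobile Development", showing the percentage of participants interested in web or mobile development versus other subjects]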

Most people in this survey (roughly 86%) are interested in either web or mobile development. Since we want to advertise our courses to people interested in all sorts of programming niches, but mostly web and mobile development, these figures give us a strong reason to consider this sample representative of our population of interest.
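The filter above works because str.contains treats its pattern as a regular expression by default (regex=True), so the | in 'Web Developer|Mobile Developer' means "either phrase". An answer containing at least one of the two phrases counts once, no matter how many web roles it lists, and the match is case-sensitive by default. A quick check on hypothetical answers:

```python
import pandas as pd

# Hypothetical answers standing in for interests_no_nulls
answers = pd.Series([
    'Full-Stack Web Developer,   Front-End Web Developer',
    '  Mobile Developer, Game Developer',
    '  Data Scientist, Data Engineer',
    'Information Security',
])

# regex=True is the default; '|' is regex alternation ("or")
web_or_mobile = answers.str.contains('Web Developer|Mobile Developer', regex=True)
print(web_or_mobile.tolist())

# Percentages of True/False, mirroring freq_table in the analysis
freq = web_or_mobile.value_counts(normalize=True) * 100
print(freq)
```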

New coders — locations and densities

Let's begin by finding out where these new coders are located and the densities (how many new coders there are) for each location. This should be a good start for finding the two best markets in which to run our ad campaign.

The data set provides information about the location of each participant at a country level. We can think of each country as an individual market, so we can frame our goal as finding the two best countries to advertise in.

We can start by examining the frequency distribution table of the CountryLive variable, which describes what country each participant lives in (not their origin country). We'll only consider those participants who answered what role(s) they're interested in, to make sure we work with a representative sample.

# Isolate the participants that answered what role they'd be interested in
fcc_good = fcc[fcc['JobRoleInterest'].notnull()].copy()

# Frequency tables with absolute and relative frequencies
absolute_frequencies = fcc_good['CountryLive'].value_counts()
relative_frequencies = fcc_good['CountryLive'].value_counts(normalize = True) * 100

# Display the frequency tables in a more readable format
pd.DataFrame(data = {'Absolute frequency': absolute_frequencies, 
                     'Percentage': relative_frequencies}
            )
                              Absolute frequency  Percentage
United States of America                    3125   45.700497
India                                        528    7.721556
United Kingdom                               315    4.606610
Canada                                       260    3.802281
Poland                                       131    1.915765
Brazil                                       129    1.886517
Germany                                      125    1.828020
Australia                                    112    1.637906
Russia                                       102    1.491664
Ukraine                                       89    1.301550
Nigeria                                       84    1.228429
Spain                                         77    1.126060
France                                        75    1.096812
Romania                                       71    1.038315
Netherlands (Holland, Europe)                 65    0.950570
Italy                                         62    0.906698
Serbia                                        52    0.760456
Philippines                                   52    0.760456
Greece                                        46    0.672711
Ireland                                       43    0.628839
South Africa                                  39    0.570342
Mexico                                        37    0.541094
Turkey                                        36    0.526470
Hungary                                       34    0.497221
Singapore                                     34    0.497221
New Zealand                                   33    0.482597
Argentina                                     32    0.467973
Croatia                                       32    0.467973
Sweden                                        31    0.453349
Indonesia                                     31    0.453349
...                                          ...         ...
Mozambique                                     1    0.014624
Yemen                                          1    0.014624
Cuba                                           1    0.014624
Sudan                                          1    0.014624
Guatemala                                      1    0.014624
Bolivia                                        1    0.014624
Jordan                                         1    0.014624
Myanmar                                        1    0.014624
Samoa                                          1    0.014624
Gambia                                         1    0.014624
Channel Islands                                1    0.014624
Vanuatu                                        1    0.014624
Trinidad & Tobago                              1    0.014624
Papua New Guinea                               1    0.014624
Liberia                                        1    0.014624
Panama                                         1    0.014624
Rwanda                                         1    0.014624
Cameroon                                       1    0.014624
Aruba                                          1    0.014624
Gibraltar                                      1    0.014624
Anguilla                                       1    0.014624
Botswana                                       1    0.014624
Turkmenistan                                   1    0.014624
Kyrgyzstan                                     1    0.014624
Qatar                                          1    0.014624
Angola                                         1    0.014624
Nambia                                         1    0.014624
Guadeloupe                                     1    0.014624
Nicaragua                                      1    0.014624
Cayman Islands                                 1    0.014624

137 rows × 2 columns
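Here is a small sketch of how the two frequency tables above fit together. It uses a made-up `CountryLive` series, so the values are hypothetical and not survey data. It also shows a detail worth keeping in mind when reading the percentages: `value_counts()` drops missing values by default. The percentages are therefore shares of respondents who reported a country, not of all respondents.

```python
import pandas as pd

# Hypothetical toy data standing in for the CountryLive column
countries = pd.Series(['USA', 'India', 'USA', 'Canada', 'USA', None, 'India'],
                      name='CountryLive')

# Missing values (None/NaN) are excluded by default (dropna=True)
absolute = countries.value_counts()
relative = countries.value_counts(normalize=True) * 100  # percent of non-missing answers

freq_table = pd.DataFrame({'Absolute frequency': absolute,
                           'Percentage': relative})
print(freq_table)
```

In this toy case the table covers 6 answers rather than 7, and the percentages sum to 100.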

45.7% of our potential customers are located in the US, which makes it by far the most interesting market. India has the second-highest customer density, but at 7.7% it is not far ahead of the United Kingdom (4.6%) or Canada (3.8%).

This is useful information, but we need to go more in depth than this and figure out how much money people are actually willing to spend on learning. Advertising in high-density markets where most people are only willing to learn for free is extremely unlikely to be profitable for us.

Spending money for learning

The MoneyForLearning column describes, in US dollars, the amount of money participants spent from the moment they started coding until the moment they completed the survey. Our company sells subscriptions at $59 per month, so we're interested in finding out how much money each student spends per month.

We'll narrow down our analysis to only four countries: the US, India, the United Kingdom, and Canada. We do this for two reasons:

  • These are the countries with the highest frequencies in the frequency table above, which means we have a decent amount of data for each.
  • Our courses are written in English, and English is an official language in all four of these countries. The more people who know English, the better our chances of targeting the right people with our ads.

Let's start by creating a new column that describes the amount of money a student has spent per month so far. To do that, we'll divide the MoneyForLearning column by the MonthsProgramming column. The problem is that some students answered that they had been learning to code for 0 months (they may have just started). To avoid dividing by 0, we'll replace 0 with 1 in the MonthsProgramming column.

# Replace 0s with 1s to avoid division by 0
# (assigning the result back is more reliable than a chained inplace=True call)
fcc_good['MonthsProgramming'] = fcc_good['MonthsProgramming'].replace(0, 1)

# New column for the amount of money each student spends each month
fcc_good['money_per_month'] = fcc_good['MoneyForLearning'] / fcc_good['MonthsProgramming']
fcc_good['money_per_month'].isnull().sum()
675
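The sketch below shows why replacing zeros matters and where null values in `money_per_month` come from. It uses a tiny hypothetical frame. Dividing by 0 months gives `inf`, or `NaN` for 0/0. A missing value in either column also makes the result `NaN`. Once the zeros are replaced, missing answers are the only source of nulls.

```python
import numpy as np
import pandas as pd

# Hypothetical toy survey answers
toy = pd.DataFrame({
    'MoneyForLearning': [0.0, 500.0, 1200.0, np.nan, 300.0],
    'MonthsProgramming': [0.0, 0.0, 12.0, 6.0, np.nan],
})

# Without the fix: 0/0 gives NaN and 500/0 gives inf
raw = toy['MoneyForLearning'] / toy['MonthsProgramming']

# With the fix: treat 0 months as 1 month, then divide
months = toy['MonthsProgramming'].replace(0, 1)
toy['money_per_month'] = toy['MoneyForLearning'] / months

print(raw.tolist())
print(toy['money_per_month'].tolist())
print(toy['money_per_month'].isnull().sum())  # nulls now come only from missing answers
```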

Let's keep only the rows that don't have null values for the money_per_month column.

# Keep only the rows with non-nulls in the `money_per_month` column 
fcc_good = fcc_good[fcc_good['money_per_month'].notnull()]
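Filtering with `notnull()` like this is equivalent to calling `DataFrame.dropna()` with the `subset` parameter. That call can also handle several columns at once, for example `money_per_month` together with the `CountryLive` filter applied next. Here is a sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a missing country and a missing amount
toy = pd.DataFrame({
    'CountryLive': ['India', None, 'Canada', 'Canada'],
    'money_per_month': [10.0, 20.0, np.nan, 40.0],
})

# Boolean-mask filtering, one column at a time
masked = toy[toy['money_per_month'].notnull()]
masked = masked[masked['CountryLive'].notnull()]

# The same filtering in a single call
dropped = toy.dropna(subset=['money_per_month', 'CountryLive'])
print(dropped)
```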

We want to group the data by country and then measure the average amount of money that students spend per month in each country. First, let's remove the rows with null values in the CountryLive column and check whether we still have enough data for the four countries that interest us.

# Remove the rows with null values in 'CountryLive'
fcc_good = fcc_good[fcc_good['CountryLive'].notnull()]

# Frequency table to check if we still have enough data
fcc_good['CountryLive'].value_counts().head()
United States of America    2933
India                        463
United Kingdom               279
Canada                       240
Poland                       122
Name: CountryLive, dtype: int64

This should be enough, so let's compute the average amount a student spends per month in each country, using the mean as our measure of the average.

# Mean amount of money spent by students each month, per country
# (select the numeric column first so text columns are never averaged)
countries_mean = fcc_good.groupby('CountryLive')['money_per_month'].mean()
countries_mean[['United States of America',
                'India', 'United Kingdom',
                'Canada']]
CountryLive
United States of America    227.997996
India                       135.100982
United Kingdom               45.534443
Canada                      113.510961
Name: money_per_month, dtype: float64

The results for the United Kingdom and Canada are a bit surprising compared with the value for India. Considering a few socioeconomic metrics (like GDP per capita), we'd intuitively expect people in the UK and Canada to spend more on learning than people in India.

It might be that we don't have enough representative data for the United Kingdom and Canada. Alternatively, some outliers (perhaps from wrong survey answers) may be making the mean too large for India, or too small for the UK and Canada. Or the results may simply be correct.
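The worry about outliers comes from how sensitive the mean is to a few extreme values. The sketch below uses ten made-up monthly spending figures, one of them extreme. The median, which the project does not use here, is shown only for comparison.

```python
import pandas as pd

# Hypothetical monthly spending for ten students; the last value is one extreme answer
spend = pd.Series([0, 0, 10, 20, 25, 30, 40, 50, 60, 5000], dtype=float)

print(spend.mean())            # 523.5: pulled up by the single extreme value
print(spend.median())          # 27.5: barely affected
print(spend.iloc[:-1].mean())  # about 26.1: the mean once the extreme value is removed
```

A single answer is enough to move the mean by a factor of about 20 here. This is why the next step inspects the extreme values before comparing countries.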

Dealing with extreme outliers

Let's use box plots to visualize the distribution of the money_per_month variable for each country.

# Isolate only the countries of interest
only_4 = fcc_good[fcc_good['CountryLive'].isin(
    ['United States of America', 'India', 'United Kingdom', 'Canada'])]

# Box plots to visualize distributions
import matplotlib.pyplot as plt
import seaborn as sns
sns.boxplot(y = 'money_per_month', x = 'CountryLive',
            data = only_4,
            # fix the box order so the short tick labels below match
            order = ['United States of America', 'United Kingdom', 'India', 'Canada'])
plt.title('Money Spent Per Month Per Country\n(Distributions)',
         fontsize = 16)
plt.ylabel('Money per month (US dollars)')
plt.xlabel('Country')
plt.xticks(range(4), ['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.show()

(Figure: Money Spent Per Month Per Country (Distributions), box plots for the US, UK, India, and Canada)

It's hard to tell from the plot above whether anything is wrong with the data for the United Kingdom, India, or Canada. For the US, however, something is clearly off: two people appear to spend $50,000 or more each month on learning. This is not impossible, but it seems extremely unlikely, so we'll remove every value of $20,000 per month or more.

# Isolate only those participants who spend less than $20,000 per month
fcc_good = fcc_good[fcc_good['money_per_month'] < 20000]
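The $20,000 cutoff above is a judgment call. For comparison, a box plot with matplotlib's defaults draws individually every value lying more than 1.5 × IQR beyond the first or third quartile. Seaborn's `boxplot` uses the same rule through its default `whis=1.5`. This sketch computes that rule on hypothetical numbers. On right-skewed spending data the rule tends to flag many values, so it works better as a guide for inspection than as an automatic removal rule.

```python
import pandas as pd

# Hypothetical monthly spending values for one country
spend = pd.Series([0, 5, 10, 15, 20, 25, 30, 40, 60, 2500], dtype=float)

q1, q3 = spend.quantile(0.25), spend.quantile(0.75)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # default whisker rule (whis=1.5)
lower_fence = q1 - 1.5 * iqr

# Values a default box plot would draw as individual points ("fliers")
fliers = spend[(spend > upper_fence) | (spend < lower_fence)]
print(q1, q3, upper_fence)
print(fliers.tolist())
```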

Now let's recompute the mean values and plot the box plots again.

# Recompute the mean amount of money spent by students each month, per country
countries_mean = fcc_good.groupby('CountryLive')['money_per_month'].mean()
countries_mean[['United States of America',
                'India', 'United Kingdom',
                'Canada']]
CountryLive
United States of America    183.800110
India                       135.100982
United Kingdom               45.534443
Canada                      113.510961
Name: money_per_month, dtype: float64

# Isolate again the countries of interest
only_4 = fcc_good[fcc_good['CountryLive'].isin(
    ['United States of America', 'India', 'United Kingdom', 'Canada'])]

# Box plots to visualize distributions
sns.boxplot(y = 'money_per_month', x = 'CountryLive',
            data = only_4,
            # fix the box order so the short tick labels below match
            order = ['United States of America', 'United Kingdom', 'India', 'Canada'])
plt.title('Money Spent Per Month Per Country\n(Distributions)',
         fontsize = 16)
plt.ylabel('Money per month (US dollars)')
plt.xlabel('Country')
plt.xticks(range(4), ['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.show()

(Figure: Money Spent Per Month Per Country (Distributions), box plots for the US, UK, India, and Canada)

We can see a few extreme outliers for India (values over $2,500 per month), but it's unclear whether this is good data. Maybe these people attended several bootcamps, which tend to be very expensive. Let's examine these data points to see if we can find anything relevant.

# Inspect the extreme outliers for India
india_outliers = only_4[
    (only_4['CountryLive'] == 'India') & 
    (only_4['money_per_month'] >= 2500)]
india_outliers
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev CodeEventHackathons CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming NetworkID Part1EndTime Part1StartTime Part2EndTime Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk PodcastTalkPython PodcastTheWebAhead ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips 
YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston money_per_month
1728 24.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN India India NaN NaN A stay-at-home parent or homemaker NaN 70000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 30.0 d964ec629fd6d85a5bf27f7339f4fa6d 950a8cf9cef1ae6a15da470e572b1b7a NaN 0.0 0.0 0.0 NaN Within the next 6 months 1.0 NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN 1.0 NaN 1.0 work for a startup 1.0 User Experience Designer, Mobile Developer... in an office with other developers Bengali single, never married 20000.0 4.0 38d312a990 2017-03-10 10:22:34 2017-03-10 10:17:42 2017-03-10 10:24:38 2017-03-10 10:22:40 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 bachelor's degree Computer Programming NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 5000.000000
1755 20.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN 1.0 NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN India India NaN NaN Not working and not looking for work NaN 100000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 811bf953ef546460f5436fcf2baa532d 81e2a4cab0543e14746c4a20ffdae17c NaN 0.0 0.0 0.0 NaN I haven't decided NaN 1.0 NaN 1.0 1.0 1.0 NaN 1.0 NaN NaN NaN NaN NaN work for a multinational corporation 1.0 Information Security, Full-Stack Web Developer... no preference Hindi single, never married 50000.0 15.0 4611a76b60 2017-03-10 10:48:31 2017-03-10 10:42:29 2017-03-10 10:51:37 2017-03-10 10:48:38 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 1.0 1.0 NaN 1.0 NaN 1.0 NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN NaN 1.0 1.0 1.0 bachelor's degree Computer Science NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN 3333.333333
7989 28.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 15 to 29 minutes India India software development and IT NaN Employed for wages NaN 500000.0 1.0 NaN male NaN 0.0 1.0 1.0 1.0 0.0 0.0 1.0 NaN 20.0 a6a5597bbbc2c282386d6675641b744a da7bbb54a8b26a379707be56b6c51e65 300000.0 0.0 0.0 0.0 0.0 more than 12 months from now 1.0 NaN NaN NaN 1.0 1.0 1.0 NaN NaN NaN NaN NaN 1.0 work for a multinational corporation 1.0 User Experience Designer, Back-End Web Devel... in an office with other developers Marathi married or domestic partnership 5000.0 1.0 c47a447b5d 2017-03-26 14:06:48 2017-03-26 14:02:41 2017-03-26 14:13:13 2017-03-26 14:07:17 NaN NaN NaN NaN NaN NaN NaN NaN NaN Not listened to anything yet. NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN bachelor's degree Aerospace and Aeronautical Engineering 2500.0 NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 5000.000000
8126 22.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN India India NaN NaN Not working but looking for work NaN 80000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 80.0 69e8ab9126baee49f66e3577aea7fd3c 9f08092e82f709e63847ba88841247c0 NaN 0.0 0.0 0.0 NaN I'm already applying 1.0 NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a startup 1.0 Back-End Web Developer, Full-Stack Web Develop... in an office with other developers Malayalam single, never married 5000.0 1.0 0d3d1762a4 2017-03-27 07:10:17 2017-03-27 07:05:23 2017-03-27 07:12:21 2017-03-27 07:10:22 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN bachelor's degree Electrical and Electronics Engineering 10000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 5000.000000
13398 19.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN India India NaN NaN Unable to work NaN 100000.0 NaN NaN male NaN NaN 0.0 0.0 0.0 NaN 0.0 NaN NaN 30.0 b7fe7bc4edefc3a60eb48f977e4426e3 80ff09859ac475b70ac19b7b7369e953 NaN 0.0 0.0 0.0 NaN I haven't decided NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN work for a multinational corporation 1.0 Mobile Developer no preference Hindi single, never married 20000.0 2.0 51a6f9a1a7 2017-04-01 00:31:25 2017-04-01 00:28:17 2017-04-01 00:33:44 2017-04-01 00:31:32 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 bachelor's degree Computer Science NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 10000.000000
15587 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes India India software development and IT NaN Employed for wages NaN 65000.0 0.0 NaN male NaN 0.0 1.0 1.0 1.0 0.0 0.0 1.0 NaN 36.0 5a7394f24292cb82b72adb702886543a 8bc7997217d4a57b22242471cc8d89ef 60000.0 0.0 0.0 0.0 1.0 I haven't decided NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a startup NaN Full-Stack Web Developer, Data Scientist from home Hindi single, never married 100000.0 24.0 8af0c2b6da 2017-04-03 09:43:53 2017-04-03 09:39:38 2017-04-03 09:54:39 2017-04-03 09:43:57 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 bachelor's degree Communications 25000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN 1.0 NaN NaN NaN NaN 4166.666667
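With more than a hundred columns, the full rows above are hard to read. Only a few columns matter for judging these answers, so the inspection could be limited to them with `.loc`. Here is a sketch on a hypothetical miniature frame. The parentheses around each condition are required, because `&` binds more tightly than `==` and `>=` in Python.

```python
import pandas as pd

# Hypothetical miniature version of the survey frame
toy = pd.DataFrame({
    'CountryLive': ['India', 'India', 'Canada'],
    'Gender': ['male', 'female', 'male'],
    'AttendedBootcamp': [0.0, 0.0, 1.0],
    'MoneyForLearning': [20000.0, 300.0, 10000.0],
    'MonthsProgramming': [4.0, 3.0, 2.0],
    'money_per_month': [5000.0, 100.0, 5000.0],
})

# Each comparison is wrapped in parentheses before combining with &
mask = (toy['CountryLive'] == 'India') & (toy['money_per_month'] >= 2500)

# Show only the columns relevant to judging the outliers
cols = ['AttendedBootcamp', 'MoneyForLearning', 'MonthsProgramming', 'money_per_month']
view = toy.loc[mask, cols]
print(view)
```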

None of these participants attended a bootcamp (AttendedBootcamp is 0 for all of them). Overall, it's really hard to figure out from the data whether these people actually spent that much money on learning. The survey question was "Aside from university tuition, about how much money have you spent on learning to code so far (in US dollars)?", so they might have misunderstood it and included university tuition. It seems safer to remove these six rows.

# Remove the outliers for India
only_4 = only_4.drop(india_outliers.index) # using the row labels
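`drop(india_outliers.index)` removes rows by their labels. An equivalent approach keeps the complement of the same boolean mask using `~`. The sketch below checks that both give the same result on hypothetical data with hypothetical row labels. The label-based version relies on unique row labels, since `drop` removes every row that carries a matching label.

```python
import pandas as pd

# Hypothetical rows with hypothetical (unique) row labels
toy = pd.DataFrame({'CountryLive': ['India', 'India', 'Canada', 'India'],
                    'money_per_month': [5000.0, 40.0, 80.0, 3000.0]},
                   index=[10, 11, 12, 13])

mask = (toy['CountryLive'] == 'India') & (toy['money_per_month'] >= 2500)
outliers = toy[mask]

# Option 1: drop by row labels
by_labels = toy.drop(outliers.index)

# Option 2: keep the rows where the mask is False
by_mask = toy[~mask]

print(by_labels)
```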

Looking back at the box plot above, we can also see more extreme outliers for the US (values over $6,000 per month). Let's examine these participants in more detail.

# Examine the extreme outliers for the US
us_outliers = only_4[
    (only_4['CountryLive'] == 'United States of America') & 
    (only_4['money_per_month'] >= 6000)]

us_outliers
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev CodeEventHackathons CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming NetworkID Part1EndTime Part1StartTime Part2EndTime Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk PodcastTalkPython PodcastTheWebAhead ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips 
YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston money_per_month
718 26.0 1.0 0.0 0.0 The Coding Boot Camp at UCLA Extension 1.0 NaN more than 1 million 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America architecture or physical engineering NaN Employed for wages NaN 50000.0 NaN NaN male NaN NaN 0.0 0.0 0.0 NaN 0.0 NaN NaN 35.0 796ae14c2acdee36eebc250a252abdaf d9e44d73057fa5d322a071adc744bf07 44500.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 NaN NaN NaN 1.0 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 work for a startup 1.0 User Experience Designer, Full-Stack Web Dev... in an office with other developers English single, never married 8000.0 1.0 50dab3f716 2017-03-09 21:26:35 2017-03-09 21:21:58 2017-03-09 21:29:10 2017-03-09 21:26:39 NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN bachelor's degree Architecture NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 8000.000000
1222 32.0 1.0 0.0 0.0 The Iron Yard 1.0 NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working and not looking for work NaN 50000.0 NaN NaN female NaN NaN 1.0 0.0 1.0 0.0 0.0 0.0 NaN 50.0 bfabebb4293ac002d26a1397d00c7443 590f0be70e80f1daf5a23eb7f4a72a3d NaN 0.0 0.0 0.0 NaN Within the next 6 months NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN 1.0 work for a nonprofit 1.0 Front-End Web Developer, Mobile Developer,... in an office with other developers English single, never married 13000.0 2.0 e512c4bdd0 2017-03-10 02:14:11 2017-03-10 02:10:07 2017-03-10 02:15:32 2017-03-10 02:14:16 NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN 1.0 1.0 1.0 1.0 1.0 1.0 NaN bachelor's degree Anthropology NaN NaN 1.0 NaN NaN NaN 1.0 NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 6500.000000
3184 34.0 1.0 1.0 0.0 We Can Code IT 1.0 NaN more than 1 million NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN Less than 15 minutes NaN United States of America software development and IT NaN Employed for wages NaN 60000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 5d4889491d9d25a255e57fd1c0022458 585e8f8b9a838ef1abbe8c6f1891c048 40000.0 0.0 0.0 0.0 0.0 I haven't decided NaN 1.0 1.0 1.0 NaN NaN NaN 1.0 NaN NaN NaN 1.0 1.0 work for a medium-sized company 0.0 Quality Assurance Engineer, DevOps / SysAd... in an office with other developers English single, never married 9000.0 1.0 e7bebaabd4 2017-03-11 23:34:16 2017-03-11 23:31:17 2017-03-11 23:36:02 2017-03-11 23:34:21 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 9000.000000
3930 31.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working and not looking for work NaN 100000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 50.0 e1d790033545934fbe5bb5b60e368cd9 7cf1e41682462c42ce48029abf77d43c NaN 1.0 0.0 0.0 NaN Within the next 6 months 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a startup 1.0 DevOps / SysAdmin, Front-End Web Developer... no preference English married or domestic partnership 65000.0 6.0 75759e5a1c 2017-03-13 10:06:46 2017-03-13 09:56:13 2017-03-13 10:10:00 2017-03-13 10:06:50 NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN 1.0 NaN reactivex.io/learnrx/ & jafar husain NaN NaN 1.0 NaN NaN NaN NaN bachelor's degree Biology 40000.0 NaN NaN NaN NaN NaN NaN 1.0 1.0 1.0 1.0 1.0 1.0 1.0 NaN various conf presentations NaN NaN 10833.333333
6805 46.0 1.0 1.0 1.0 Sabio.la 0.0 NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 70000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 45.0 69096aacf4245694303cf8f7ce68a63f 4c56f82a348836e76dd90d18a3d5ed88 NaN 1.0 0.0 0.0 NaN Within the next 6 months NaN 1.0 1.0 NaN NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN work for a multinational corporation 1.0 Full-Stack Web Developer, Game Developer, Pr... no preference English married or domestic partnership 15000.0 1.0 53d13b58e9 2017-03-21 20:13:08 2017-03-21 20:10:25 2017-03-21 20:14:36 2017-03-21 20:13:11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 1.0 1.0 1.0 bachelor's degree Business Administration and Management 45000.0 NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 15000.000000
7198 32.0 0.0 NaN NaN NaN NaN NaN more than 1 million 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America education NaN Employed for wages NaN 55000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 4.0 cb2754165344e6be79da8a4c76bf3917 272219fbd28a3a7562cb1d778e482e1e NaN 1.0 0.0 0.0 0.0 I'm already applying 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a multinational corporation 0.0 Full-Stack Web Developer, Back-End Web Developer no preference Spanish single, never married 70000.0 5.0 439a4adaf6 2017-03-23 01:37:46 2017-03-23 01:35:01 2017-03-23 01:39:37 2017-03-23 01:37:49 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN 1.0 NaN 1.0 NaN 1.0 NaN NaN NaN NaN 1.0 NaN 1.0 NaN 1.0 professional degree (MBA, MD, JD, etc.) Computer Science NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 1.0 NaN NaN NaN NaN NaN 14000.000000
7505 26.0 1.0 0.0 1.0 Codeup 0.0 NaN more than 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 65000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 40.0 657fb50800bcc99a07caf52387f67fbb ad1df4669883d8f628f0b5598a4c5c45 NaN 0.0 0.0 0.0 NaN Within the next 6 months 1.0 NaN NaN NaN 1.0 1.0 NaN 1.0 1.0 NaN NaN NaN NaN work for a government 1.0 Mobile Developer, Full-Stack Web Developer, ... in an office with other developers English single, never married 20000.0 3.0 96e254de36 2017-03-24 03:26:09 2017-03-24 03:23:02 2017-03-24 03:27:47 2017-03-24 03:26:14 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 NaN 1.0 NaN NaN 1.0 1.0 bachelor's degree Economics 20000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN 6666.666667
9778 33.0 1.0 0.0 1.0 Grand Circus 1.0 NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America education NaN Employed for wages NaN 55000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 40.0 7a62790f6ded15e26d5f429b8a4d1095 98eeee1aa81ba70b2ab288bf4b63d703 20000.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 1.0 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN work for a medium-sized company NaN Full-Stack Web Developer, Data Engineer, Qua... from home English single, never married 8000.0 1.0 ea80a3b15e 2017-04-05 19:48:12 2017-04-05 19:40:19 2017-04-05 19:49:44 2017-04-05 19:49:03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 1.0 NaN NaN master's degree (non-professional) Chemical Engineering 45000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8000.000000
16650 29.0 0.0 NaN NaN NaN NaN 2.0 more than 1 million NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN NaN 1.0 NaN male NaN 1.0 1.0 1.0 1.0 1.0 0.0 1.0 400000.0 40.0 e1925d408c973b91cf3e9a9285238796 7e9e3c31a3dc2cafe3a09269398c4de8 NaN 1.0 1.0 0.0 NaN I'm already applying 1.0 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN NaN 1.0 NaN NaN work for a multinational corporation 1.0 Product Manager, Data Engineer, Full-Stack W... in an office with other developers English married or domestic partnership 200000.0 12.0 1a45f4a3ef 2017-03-14 02:42:57 2017-03-14 02:40:10 2017-03-14 02:45:55 2017-03-14 02:43:05 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 associate's degree Computer Programming 30000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 16666.666667
16997 27.0 0.0 NaN NaN NaN NaN 1.0 more than 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America health care NaN Employed for wages NaN 60000.0 0.0 NaN female NaN 1.0 1.0 1.0 1.0 0.0 0.0 1.0 NaN 12.0 624914ce07c296c866c9e16a14dc01c7 6384a1e576caf4b6b9339fe496a51f1f 40000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months NaN NaN NaN NaN 1.0 1.0 1.0 NaN 1.0 NaN 1.0 NaN 1.0 work for a medium-sized company 1.0 Mobile Developer, Game Developer, User Exp... in an office with other developers English single, never married 12500.0 1.0 ad1a21217c 2017-03-20 05:43:28 2017-03-20 05:40:08 2017-03-20 05:45:28 2017-03-20 05:43:32 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 1.0 some college credit, no degree NaN 12500.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12500.000000
17231 50.0 0.0 NaN NaN NaN NaN 2.0 less than 100,000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN Kenya United States of America NaN NaN Not working but looking for work NaN 40000.0 0.0 NaN female NaN 1.0 0.0 1.0 1.0 NaN 0.0 NaN NaN 1.0 d4bc6ae775b20816fcd41048ef75417c 606749cd07b124234ab6dff81b324c02 NaN 1.0 0.0 0.0 NaN Within the next 6 months NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN work for a nonprofit 0.0 Front-End Web Developer in an office with other developers English married or domestic partnership 30000.0 2.0 38c1b478d0 2017-03-24 18:48:23 2017-03-24 18:46:01 2017-03-24 18:51:20 2017-03-24 18:48:27 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN bachelor's degree Computer Programming NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15000.000000

Out of these 11 extreme outliers, six people attended bootcamps, which could explain the large sums of money spent on learning. For the other five, it's hard to figure out from the data where they could have spent that much money on learning. Consequently, we'll remove the rows where participants reported spending $6,000 or more per month but never attended a bootcamp.

The data also shows that eight respondents had been programming for no more than three months when they completed the survey. They most likely paid a large sum of money up front, for example for a bootcamp lasting several months. Their monthly spending is therefore unrealistic and should be significantly lower, because they probably didn't spend anything for the next couple of months after the survey. As a consequence, we'll remove these eight outliers.

In the next code block, we'll remove the US respondents who spend $6,000 or more per month and who:

  • Didn't attend a bootcamp.
  • Had been programming for three months or less at the time they completed the survey.
# Remove the US respondents spending $6,000+ per month who didn't attend a bootcamp
no_bootcamp = only_4[
    (only_4['CountryLive'] == 'United States of America') & 
    (only_4['money_per_month'] >= 6000) &
    (only_4['AttendedBootcamp'] == 0)
]

only_4 = only_4.drop(no_bootcamp.index)


# Remove the US respondents spending $6,000+ per month who had been programming for 3 months or less
less_than_3_months = only_4[
    (only_4['CountryLive'] == 'United States of America') & 
    (only_4['money_per_month'] >= 6000) &
    (only_4['MonthsProgramming'] <= 3)
]

only_4 = only_4.drop(less_than_3_months.index)
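The two removal steps above can also be expressed as one boolean mask. A US row spending $6,000 or more per month is removed if it either shows no bootcamp attendance or shows three months of programming or fewer. Going by the 11 US rows listed earlier, every one of them meets at least one of these conditions, so both versions drop all eleven. Here is a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows
toy = pd.DataFrame({
    'CountryLive': ['United States of America'] * 4 + ['Canada'],
    'money_per_month': [8000.0, 10833.0, 15000.0, 100.0, 7000.0],
    'AttendedBootcamp': [1.0, 0.0, 1.0, 0.0, 0.0],
    'MonthsProgramming': [1.0, 6.0, 12.0, 2.0, 1.0],
})

# US respondents with extreme monthly spending
is_us_high = ((toy['CountryLive'] == 'United States of America')
              & (toy['money_per_month'] >= 6000))

# Remove them if they skipped bootcamps OR had programmed for 3 months or less
to_remove = is_us_high & ((toy['AttendedBootcamp'] == 0)
                          | (toy['MonthsProgramming'] <= 3))

cleaned = toy[~to_remove]
print(cleaned)
```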

Looking again at the last box plot above, we can also see an extreme outlier for Canada — a person who spends roughly $5,000 per month. Let's examine this person in more depth.

# Examine the extreme outliers for Canada
canada_outliers = only_4[
    (only_4['CountryLive'] == 'Canada') & 
    (only_4['money_per_month'] > 4500)]

canada_outliers
       AttendedBootcamp BootcampName CountryLive  MoneyForLearning  MonthsProgramming  money_per_month
13659               1.0      Bloc.io      Canada           10000.0                2.0           5000.0
(1 row × 137 columns; only selected columns are shown)

Here, the situation is similar to that of some of the US respondents: this participant had been programming for only two months when he completed the survey. He reported spending $10,000 on learning and had attended a bootcamp (Bloc.io), so he seems to have paid a large sum up front to enroll, and he probably didn't spend anything for the next couple of months after the survey. Dividing that one-off payment by just two months yields $5,000 per month, which inflates Canada's average. We'll take the same approach as for the US and remove this outlier.
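To see why this one row is so extreme, here is the arithmetic behind its `money_per_month` value. This is a minimal sketch that assumes `money_per_month` is `MoneyForLearning / MonthsProgramming`, which matches this row (10,000 / 2 = 5,000); the 12-month horizon is a hypothetical comparison, not survey data.

```python
# Values taken from the Canadian outlier row shown above
money_for_learning = 10000.0  # MoneyForLearning
months_programming = 2.0      # MonthsProgramming

# Assumed definition: total learning spend divided by months spent programming
money_per_month = money_for_learning / months_programming
print(money_per_month)

# Hypothetical: the same one-off payment spread over a 12-month horizon
amortized_over_year = round(money_for_learning / 12, 2)
print(amortized_over_year)
```

The shorter the programming history, the more a single up-front payment dominates the monthly estimate, which is why very new respondents who paid for a bootcamp tend to show up as outliers.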

# Remove the extreme outliers for Canada
only_4 = only_4.drop(canada_outliers.index)
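Dropping rows by the index of a filtered subset, as above, is equivalent to keeping only the rows that fail the outlier condition. The sketch below shows both idioms on a small hypothetical DataFrame; the column names mirror the project's, but the values and the $3,500 threshold are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data using the same column names as only_4
df = pd.DataFrame({
    'CountryLive': ['Canada', 'Canada', 'India', 'Canada'],
    'money_per_month': [50.0, 5000.0, 4000.0, 120.0],
}, index=[10, 13659, 20, 30])

# Hypothetical outlier rule: Canadian respondents spending more than $3,500 a month
is_outlier = (df['CountryLive'] == 'Canada') & (df['money_per_month'] > 3500)

# Idiom 1: select the outliers, then drop them by index label
outliers = df[is_outlier]
cleaned_by_drop = df.drop(outliers.index)

# Idiom 2: keep only the rows that do not match the condition
cleaned_by_mask = df[~is_outlier]

print(cleaned_by_drop.equals(cleaned_by_mask))
print(cleaned_by_drop.index.tolist())
```

One difference worth knowing: if the index contains duplicate labels, `drop` removes every row carrying a matching label, while the boolean mask removes only the rows that were actually flagged.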

Let's recompute the mean values and generate the final box plots.

# Recompute the mean amount of money students spend each month
# (select the column before averaging so non-numeric columns are not aggregated)
only_4.groupby('CountryLive')['money_per_month'].mean()
CountryLive
Canada                       93.065400
India                        65.758763
United Kingdom               45.534443
United States of America    142.654608
Name: money_per_month, dtype: float64
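# Visualize the distributions again, with an explicit order for the countries
country_order = ['United States of America', 'United Kingdom', 'India', 'Canada']
sns.boxplot(y = 'money_per_month', x = 'CountryLive',
            data = only_4, order = country_order)
plt.title('Money Spent Per Month Per Country\n(Distributions)',
          fontsize = 16)
plt.ylabel('Money per month (US dollars)')
plt.xlabel('Country')
# Shorter tick labels avoid overlap; they must follow the same order as country_order
plt.xticks(range(4), ['US', 'UK', 'India', 'Canada'])
plt.show()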

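[Figure: box plots titled "Money Spent Per Month Per Country (Distributions)", with Country on the x-axis and Money per month (US dollars) on the y-axis]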

Choosing the two best advertising markets

Clearly, one country we should advertise in is the US. Most of the new coders in our four-country sample live there (about 75%), and they are willing to pay a good amount of money each month (roughly $143 on average).

We sell subscriptions at $59 per month, and Canada seems to be the best choice for a second market, because people there are willing to pay roughly $93 per month, compared to India ($66) and the United Kingdom ($46).
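The comparison against our $59 subscription price can be made explicit. A minimal sketch, using the mean values printed above (typed in by hand rather than read from `only_4`):

```python
# Mean money spent per month (US dollars), copied from the output above
mean_spend = {
    'United States of America': 142.654608,
    'Canada': 93.065400,
    'India': 65.758763,
    'United Kingdom': 45.534443,
}
subscription_price = 59  # monthly price of our subscription

# Rank countries by average spend and compare each one against the price
for country, spend in sorted(mean_spend.items(), key=lambda item: item[1], reverse=True):
    margin = spend - subscription_price
    status = 'above' if margin > 0 else 'below'
    print(f'{country}: ${spend:.0f} per month, ${abs(margin):.0f} {status} the price')
```

Only the UK falls below the price; India clears it by a small margin, which is why it deserves the closer look that follows.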

The data strongly suggests that we shouldn't advertise in the UK, where people spend less on average than our $59 price. Before settling on Canada as our second choice, though, let's take a closer look at India:

  • $59 doesn't seem like an expensive sum for people in India, since they spend $66 a month on average.
  • We have almost twice as many potential customers in India as we have in Canada:
# Frequency table for the 'CountryLive' column
only_4['CountryLive'].value_counts(normalize = True) * 100
United States of America    74.967908
India                       11.732991
United Kingdom               7.163030
Canada                       6.136072
Name: CountryLive, dtype: float64

It's not clear-cut whether to choose Canada or India. Although Canada seems the more tempting choice, there's a good chance that India might actually be the better one because of its larger number of potential customers.
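One rough way to weigh India's larger audience against Canada's higher spending is to multiply each country's share of respondents by its mean monthly spend. This is only a crude proxy introduced here for illustration: it ignores advertising costs, conversion rates, and how representative the survey is of each country.

```python
# Share of respondents (%) and mean monthly spend (US dollars), from the outputs above
share = {'India': 11.732991, 'Canada': 6.136072}
mean_spend = {'India': 65.758763, 'Canada': 93.065400}

# How many potential customers India has relative to Canada
ratio = share['India'] / share['Canada']
print(f'India/Canada respondent ratio: {ratio:.2f}')

# Crude proxy: share of respondents weighted by average monthly spend
for country in share:
    proxy = share[country] * mean_spend[country]
    print(f'{country}: {proxy:.0f}')
```

On this proxy India comes out ahead of Canada, which is consistent with the hunch above; because the proxy leaves out costs and conversion rates, it's a reason to look closer at India rather than a decision in itself.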

At this point, it seems that we have several options:

  1. Advertise in the US, India, and Canada by splitting the advertising budget in various combinations:

    • 60% for the US, 25% for India, 15% for Canada.
    • 50% for the US, 30% for India, 20% for Canada; etc.
  2. Advertise only in the US and India, or the US and Canada. Again, it makes sense to split the advertising budget unequally. For instance:

    • 70% for the US, and 30% for India.
    • 65% for the US, and 35% for Canada; etc.
  3. Advertise only in the US.
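The budget-split options above are simple percentages, so they are easy to turn into dollar amounts for comparison. A minimal sketch: the $10,000 total budget is a hypothetical figure, and the scenario names ("Option 1a" and so on) are our own labels for the options listed above.

```python
# Hypothetical total advertising budget (US dollars)
total_budget = 10_000

# Budget-split scenarios from the options above (percent per country)
scenarios = {
    'Option 1a': {'US': 60, 'India': 25, 'Canada': 15},
    'Option 1b': {'US': 50, 'India': 30, 'Canada': 20},
    'Option 2a': {'US': 70, 'India': 30},
    'Option 2b': {'US': 65, 'Canada': 35},
    'Option 3': {'US': 100},
}

for name, split in scenarios.items():
    # Every scenario must allocate exactly 100% of the budget
    assert sum(split.values()) == 100, f'{name} does not sum to 100%'
    allocation = {country: total_budget * pct / 100 for country, pct in split.items()}
    print(name, allocation)
```

Keeping the scenarios as data makes it easy for the marketing team to add or tweak splits without changing the calculation.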

It's probably best to send our analysis to the marketing team and let them use their domain knowledge to decide. They might want to run extra surveys in India and Canada and then send the new survey data back to us for analysis.

Conclusion

In this project, we analyzed survey data from new coders to find the two best markets to advertise in. The only solid conclusion we reached is that the US would be a good market to advertise in.

For the second best market, it wasn't clear-cut whether to choose India or Canada. We decided to send the results to the marketing team so they can use their domain knowledge to make the best decision.

Alex Olteanu

I write data science content at Dataquest. You can reach out at alex@dataquest.io.
