Data Science Portfolio Project: Where to Advertise an E-learning Product
At Dataquest, we strongly advocate portfolio projects as a means of getting a first data science job. In this blog post, we'll walk you through an example portfolio project.
The project is part of our Statistics Intermediate: Averages and Variability course, and it assumes familiarity with:
- Sampling (populations, samples, sample representativity)
- Frequency distributions
- Box plots and bar plots
- Summary metrics (especially the mean)
- pandas, matplotlib, and seaborn
If you think you need to fill in any gaps before moving forward, we cover the topics above in depth in our Statistics Fundamentals and Statistics Intermediate: Averages and Variability courses. The latter also gives you more detailed instructions on how to build this project and lets you code it in your browser.
We also teach pandas, matplotlib, and seaborn in our Data Scientist Path.
This project follows the guidelines presented in our style guide for data science projects.
Introduction
In this project, we'll aim to find the two best markets to advertise our product in — we're working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover other domains, like data science, game development, etc.
We'll analyze existing data about new coders to find the best markets to invest in for advertising. To make our recommendation, we'll try to find out:
- Where these new coders are located.
- Which locations have the greatest number of new coders.
- How much money new coders are willing to spend on learning.
Summary of results
After analyzing the data, the only solid conclusion we reached is that the US would be a good market to advertise in. For the second-best market, the choice between India and Canada wasn't clear-cut. We decided to send the results to the marketing team so they can use their domain knowledge to make the final decision.
For more details, please refer to the full analysis below.
Exploring existing data
To avoid spending money on organizing a survey, we'll first try to make use of existing data to determine whether we can reach a reliable result.
One good candidate for our purpose is freeCodeCamp's 2017 New Coder Survey. freeCodeCamp is a free e-learning platform that offers courses on web development. Because they run a popular Medium publication (over 400,000 followers), their survey attracted new coders with varying interests (not only web development), which is ideal for our analysis.
The survey data is publicly available in this GitHub repository. Below, we'll do a quick exploration of the 2017-fCC-New-Coders-Survey-Data.csv file stored in the clean-data folder of the repository we just mentioned. We'll read in the file using the direct link here.
<code="language-python">
# Read in the data
import pandas as pd

direct_link = 'https://raw.githubusercontent.com/freeCodeCamp/2017-new-coder-survey/master/clean-data/2017-fCC-New-Coders-Survey-Data.csv'
fcc = pd.read_csv(direct_link, low_memory = False) # low_memory = False to silence dtypes warning

# Quick exploration of the data
print(fcc.shape)
pd.options.display.max_columns = 150 # to avoid truncated output
fcc.head()</code="language-python">
<code="language-python">(18175, 136)</code="language-python">
Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventConferences | CodeEventDjangoGirls | CodeEventFCC | CodeEventGameJam | CodeEventGirlDev | CodeEventHackathons | CodeEventMeetup | CodeEventNodeSchool | CodeEventNone | CodeEventOther | CodeEventRailsBridge | CodeEventRailsGirls | CodeEventStartUpWknd | CodeEventWkdBootcamps | CodeEventWomenCode | CodeEventWorkshops | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentFieldOther | EmploymentStatus | EmploymentStatusOther | ExpectedEarning | FinanciallySupporting | FirstDevJob | Gender | GenderOther | HasChildren | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasHomeMortgage | HasServedInMilitary | HasStudentDebt | HomeMortgageOwe | HoursLearning | ID.x | ID.y | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobInterestBackEnd | JobInterestDataEngr | JobInterestDataSci | JobInterestDevOps | JobInterestFrontEnd | JobInterestFullStack | JobInterestGameDev | JobInterestInfoSec | JobInterestMobile | JobInterestOther | JobInterestProjMngr | JobInterestQAEngr | JobInterestUX | JobPref | JobRelocateYesNo | JobRoleInterest | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | NetworkID | Part1EndTime | Part1StartTime | Part2EndTime | Part2StartTime | PodcastChangeLog | PodcastCodeNewbie | PodcastCodePen | PodcastDevTea | PodcastDotNET | PodcastGiantRobots | PodcastJSAir | PodcastJSJabber | PodcastNone | PodcastOther | PodcastProgThrowdown | PodcastRubyRogues | PodcastSEDaily | PodcastSERadio | PodcastShopTalk | PodcastTalkPython | PodcastTheWebAhead | ResourceCodecademy | ResourceCodeWars | ResourceCoursera | ResourceCSS | ResourceEdX | ResourceEgghead | ResourceFCC | ResourceHackerRank | ResourceKA | ResourceLynda | ResourceMDN | ResourceOdinProj | ResourceOther | ResourcePluralSight | ResourceSkillcrush | ResourceSO | ResourceTreehouse | ResourceUdacity | ResourceUdemy | ResourceW3S | SchoolDegree | SchoolMajor | StudentDebtOwe | YouTubeCodeCourse | YouTubeCodingTrain | YouTubeCodingTut360 | YouTubeComputerphile | YouTubeDerekBanas | YouTubeDevTips | YouTubeEngineeredTruth | YouTubeFCC | YouTubeFunFunFunction | YouTubeGoogleDev | YouTubeLearnCode | YouTubeLevelUpTuts | YouTubeMIT | YouTubeMozillaHacks | YouTubeOther | YouTubeSimplilearn | YouTubeTheNewBoston | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 27.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | Canada | Canada | software development and IT | NaN | Employed for wages | NaN | NaN | NaN | NaN | female | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 15.0 | 02d9465b21e8bd09374b0066fb2d5614 | eb78c1c3ac6cd9052aec557065070fbf | NaN | NaN | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | start your own business | NaN | NaN | NaN | English | married or domestic partnership | 150.0 | 6.0 | 6f1fbc6b2b | 2017-03-09 00:36:22 | 2017-03-09 00:32:59 | 2017-03-09 00:59:46 | 2017-03-09 00:36:26 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | some college credit, no degree | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 34.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | less than 100,000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working but looking for work | NaN | 35000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 10.0 | 5bfef9ecb211ec4f518cfc1d2a6f3e0c | 21db37adb60cdcafadfa7dca1b13b6b1 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a nonprofit | 1.0 | Full-Stack Web Developer | in an office with other developers | English | single, never married | 80.0 | 6.0 | f8f8be6910 | 2017-03-09 00:37:07 | 2017-03-09 00:33:26 | 2017-03-09 00:38:59 | 2017-03-09 00:37:10 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | some college credit, no degree | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 21.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | United States of America | United States of America | software development and IT | NaN | Employed for wages | NaN | 70000.0 | NaN | NaN | male | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 25.0 | 14f1863afa9c7de488050b82eb3edd96 | 21ba173828fbe9e27ccebaf4d5166a55 | 13000.0 | 1.0 | 0.0 | 0.0 | 0.0 | Within 7 to 12 months | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | work for a medium-sized company | 1.0 | Front-End Web Developer, Back-End Web Develo... | no preference | Spanish | single, never married | 1000.0 | 5.0 | 2ed189768e | 2017-03-09 00:37:58 | 2017-03-09 00:33:53 | 2017-03-09 00:40:14 | 2017-03-09 00:38:02 | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | Codenewbie | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | high school diploma or equivalent (GED) | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
3 | 26.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | I work from home | Brazil | Brazil | software development and IT | NaN | Employed for wages | NaN | 40000.0 | 0.0 | NaN | male | NaN | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 40000.0 | 14.0 | 91756eb4dc280062a541c25a3d44cfb0 | 3be37b558f02daae93a6da10f83f0c77 | 24000.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within the next 6 months | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a medium-sized company | NaN | Front-End Web Developer, Full-Stack Web Deve... | from home | Portuguese | married or domestic partnership | 0.0 | 5.0 | dbdc0664d1 | 2017-03-09 00:40:13 | 2017-03-09 00:37:45 | 2017-03-09 00:42:26 | 2017-03-09 00:40:18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | some college credit, no degree | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN |
4 | 20.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Portugal | Portugal | NaN | NaN | Not working but looking for work | NaN | 140000.0 | NaN | NaN | female | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 10.0 | aa3f061a1949a90b27bef7411ecd193f | d7c56bbf2c7b62096be9db010e86d96d | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | work for a multinational corporation | 1.0 | Full-Stack Web Developer, Information Security... | in an office with other developers | Portuguese | single, never married | 0.0 | 24.0 | 11b0f2d8a9 | 2017-03-09 00:42:45 | 2017-03-09 00:39:44 | 2017-03-09 00:45:42 | 2017-03-09 00:42:50 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | bachelor's degree | Information Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Checking for sample representativity
As we mentioned in the introduction, most of our courses are on web and mobile development, but we also cover many other domains, like data science and game development. For the purpose of our analysis, we want to answer questions about a population of new coders that are interested in the subjects we teach. As a reminder, we'd like to know:
- Where these new coders are located.
- Which locations have the most new coders.
- How much money they're willing to spend on learning.
We first need to clarify whether the data set has the categories that we need. The JobRoleInterest column describes the role(s) each participant is interested in working in. If a participant is interested in working in a certain domain, we can assume that they're also interested in learning about that domain. So let's take a look at the frequency distribution table of this column and determine whether the data we have is relevant.
<code="language-python">
# Frequency distribution table for 'JobRoleInterest'
fcc['JobRoleInterest'].value_counts(normalize = True) * 100</code="language-python">
Full-Stack Web Developer 11.770595
Front-End Web Developer 6.435927
Data Scientist 2.173913
Back-End Web Developer 2.030892
Mobile Developer 1.673341
Game Developer 1.630435
Information Security 1.315789
Full-Stack Web Developer, Front-End Web Developer 0.915332
Front-End Web Developer, Full-Stack Web Developer 0.800915
Product Manager 0.786613
Data Engineer 0.758009
User Experience Designer 0.743707
User Experience Designer, Front-End Web Developer 0.614989
Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer 0.557780
Back-End Web Developer, Full-Stack Web Developer, Front-End Web Developer 0.514874
Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer 0.514874
DevOps / SysAdmin 0.514874
Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer 0.443364
Front-End Web Developer, Full-Stack Web Developer, Back-End Web Developer 0.429062
Front-End Web Developer, User Experience Designer 0.414760
Full-Stack Web Developer, Mobile Developer 0.414760
Back-End Web Developer, Full-Stack Web Developer 0.386156
Full-Stack Web Developer, Back-End Web Developer 0.371854
Back-End Web Developer, Front-End Web Developer 0.286041
Full-Stack Web Developer, Back-End Web Developer, Front-End Web Developer 0.271739
Data Engineer, Data Scientist 0.271739
Front-End Web Developer, Mobile Developer 0.257437
Full-Stack Web Developer, Data Scientist 0.243135
Mobile Developer, Game Developer 0.228833
Data Scientist, Data Engineer 0.228833
...
Mobile Developer, User Experience Designer, Full-Stack Web Developer, DevOps / SysAdmin, Technical Writer 0.014302
Data Engineer, Data Scientist, Information Security 0.014302
Mobile Developer, Full-Stack Web Developer, Product Manager, Game Developer, Information Security, Front-End Web Developer, User Experience Designer, Data Scientist 0.014302
Back-End Web Developer, Game Developer, Data Engineer 0.014302
Full-Stack Web Developer, Information Security, Back-End Web Developer, Data Engineer, Mobile Developer, Data Scientist, DevOps / SysAdmin 0.014302
Software Specialist 0.014302
Game Developer, Information Security, Mobile Developer, DevOps / SysAdmin, Full-Stack Web Developer, Front-End Web Developer 0.014302
Back-End Web Developer, Game Developer, Full-Stack Web Developer, Front-End Web Developer, DevOps / SysAdmin 0.014302
Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer, Game Developer, Mobile Developer 0.014302
Game Developer, Information Security, Full-Stack Web Developer, Back-End Web Developer 0.014302
Front-End Web Developer, Data Scientist, Game Developer, Product Manager, Information Security 0.014302
Front-End Web Developer, Mobile Developer, Information Security, Full-Stack Web Developer, DevOps / SysAdmin, Back-End Web Developer, Game Developer 0.014302
Mobile Developer, Game Developer, Full-Stack Web Developer, Back-End Web Developer, Front-End Web Developer 0.014302
Data Engineer, Front-End Web Developer, Data Scientist, Full-Stack Web Developer 0.014302
Product Manager, Back-End Web Developer, Data Scientist, Full-Stack Web Developer, Game Developer, User Experience Designer, Information Security 0.014302
Mobile Developer, Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer 0.014302
User Experience Designer, Full-Stack Web Developer, Front-End Web Developer, Mobile Developer, User Interface Design 0.014302
Full-Stack Web Developer, Quality Assurance Engineer, Game Developer, Front-End Web Developer, User Experience Designer 0.014302
Quality Assurance Engineer, Front-End Web Developer, User Experience Designer, Game Developer 0.014302
DevOps / SysAdmin, Data Scientist, Full-Stack Web Developer, Information Security, Data Engineer, Back-End Web Developer 0.014302
Full-Stack Web Developer, Data Scientist, User Experience Designer, Mobile Developer, Front-End Web Developer 0.014302
Data Engineer, Product Manager, Data Scientist 0.014302
Full-Stack Web Developer, User Experience Designer, Back-End Web Developer, Data Scientist, Information Security, Criminal Defense Attorney-- focusing on cyber crimes 0.014302
Data Engineer, User Experience Designer, Front-End Web Developer, Game Developer, Data Scientist, Product Manager 0.014302
Front-End Web Developer, User Experience Designer, DevOps / SysAdmin, Back-End Web Developer, Data Scientist, Game Developer, Product Manager, Quality Assurance Engineer, Full-Stack Web Developer, Information Security, Mobile Developer 0.014302
Education 0.014302
DevOps / SysAdmin, Mobile Developer, Full-Stack Web Developer, Front-End Web Developer 0.014302
Back-End Web Developer, Data Scientist, Information Security, Front-End Web Developer, Quality Assurance Engineer, DevOps / SysAdmin, Data Engineer, Game Developer, Full-Stack Web Developer 0.014302
Data Scientist, Back-End Web Developer, Full-Stack Web Developer, Front-End Web Developer, User Experience Designer, Mobile Developer 0.014302
Game Developer, Mobile Developer, Back-End Web Developer, Front-End Web Developer, Information Security 0.014302
The information in the table above is quite granular, but from a quick scan it looks like:
- A lot of people are interested in web development (full-stack web development, front-end web development and back-end web development).
- A few people (about 1.7% list it as their only interest) are interested in mobile development.
- Not too many people are interested in domains other than web and mobile development.
It's also interesting to note that many respondents are interested in more than one subject. It'd be useful to get a better picture of how many people are interested in a single subject and how many have mixed interests. Consequently, in the next code block, we'll:
- Split each string in the JobRoleInterest column to find the number of options for each participant.
  - We'll first drop the null values because we can't split NaN values.
- Generate a frequency table for the variable describing the number of options.
<code="language-python">
# Split each string in the 'JobRoleInterest' column
interests_no_nulls = fcc['JobRoleInterest'].dropna()
splitted_interests = interests_no_nulls.str.split(',')

# Frequency table for the var describing the number of options
n_of_options = splitted_interests.apply(lambda x: len(x)) # x is a list of job options
n_of_options.value_counts(normalize = True).sort_index() * 100</code="language-python">
<code="language-python">
1     31.650458
2     10.883867
3     15.889588
4     15.217391
5     12.042334
6      6.721968
7      3.861556
8      1.759153
9      0.986842
10     0.471968
11     0.185927
12     0.300343
13     0.028604
Name: JobRoleInterest, dtype: float64</code="language-python">
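Only 31.7% of the participants selected a single role, so the vast majority of respondents have mixed interests.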
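The split-and-count step above can be tried on a handful of rows. This is a minimal sketch on made-up answers: `toy_interests` and its values are hypothetical, not taken from the survey, and it uses `.str.len()` in place of the `apply` call, which should give the same counts.

```python
import pandas as pd

# Hypothetical answers, formatted like the 'JobRoleInterest' column
toy_interests = pd.Series([
    'Full-Stack Web Developer',
    'Data Scientist, Data Engineer',
    None,  # a participant who skipped the question
    'Mobile Developer, Game Developer, Front-End Web Developer',
    'Game Developer',
])

# Drop the nulls, split on commas, and count the items in each resulting list
toy_counts = toy_interests.dropna().str.split(',').str.len()

# Percentage of participants for each number of selected roles
toy_freq = toy_counts.value_counts(normalize=True).sort_index() * 100
print(toy_freq)
```

Splitting on a bare comma leaves a leading space on every role after the first, which doesn't matter when we only count the items.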
The focus of our courses is on web and mobile development, so let's find out how many respondents chose at least one of these two options.
<code="language-python">
# Frequency table
web_or_mobile = interests_no_nulls.str.contains(
    'Web Developer|Mobile Developer') # returns a Series of booleans
freq_table = web_or_mobile.value_counts(normalize = True) * 100
print(freq_table)

# Graph for the frequency table above
import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')
freq_table.plot.bar()
plt.title('Most Participants are Interested in \nWeb or Mobile Development',
          y = 1.08) # y pads the title upward
plt.ylabel('Percentage', fontsize = 12)
plt.xticks([0,1],['Web or mobile\ndevelopment', 'Other subject'],
           rotation = 0) # the initial xtick labels were True and False
plt.ylim([0,100])
plt.show()</code="language-python">
<code="language-python">
True     86.241419
False    13.758581
Name: JobRoleInterest, dtype: float64</code="language-python">
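Most people in this survey (roughly 86%) are interested in either web or mobile development, which gives us good reason to treat this sample as representative of the population we care about.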
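The pattern passed to `str.contains` is a regular expression, and the `|` means "or". Here's a small sketch of how that matching behaves, using hypothetical answers (`toy_roles` is made up for illustration):

```python
import pandas as pd

# Hypothetical answers, formatted like the 'JobRoleInterest' column
toy_roles = pd.Series([
    'Full-Stack Web Developer, Data Scientist',
    'Game Developer',
    'Mobile Developer',
    'Data Engineer, Information Security',
])

# True when either phrase appears anywhere in the string
is_web_or_mobile = toy_roles.str.contains('Web Developer|Mobile Developer')
print(is_web_or_mobile)

# The mean of a boolean Series is the share of True values
share_web_or_mobile = is_web_or_mobile.mean() * 100
print(share_web_or_mobile)
```

Because 'Web Developer' is a substring of the full-stack, front-end, and back-end role names, one phrase covers all three.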
New coders — locations and densities
Let's begin by finding out where these new coders are located and how many of them there are in each location (the density). This should be a good start for identifying the two best markets for our ad campaign.
The data set provides information about the location of each participant at a country level. We can think of each country as an individual market, so we can frame our goal as finding the two best countries to advertise in.
We can start by examining the frequency distribution table of the CountryLive variable, which describes what country each participant lives in (not their origin country). We'll only consider those participants who answered what role(s) they're interested in, to make sure we work with a representative sample.
<code="language-python">
# Isolate the participants that answered what role they'd be interested in
fcc_good = fcc[fcc['JobRoleInterest'].notnull()].copy()

# Frequency tables with absolute and relative frequencies
absolute_frequencies = fcc_good['CountryLive'].value_counts()
relative_frequencies = fcc_good['CountryLive'].value_counts(normalize = True) * 100

# Display the frequency tables in a more readable format
pd.DataFrame(data = {'Absolute frequency': absolute_frequencies,
                     'Percentage': relative_frequencies}
            )</code="language-python">
Absolute frequency | Percentage | |
---|---|---|
United States of America | 3125 | 45.700497 |
India | 528 | 7.721556 |
United Kingdom | 315 | 4.606610 |
Canada | 260 | 3.802281 |
Poland | 131 | 1.915765 |
Brazil | 129 | 1.886517 |
Germany | 125 | 1.828020 |
Australia | 112 | 1.637906 |
Russia | 102 | 1.491664 |
Ukraine | 89 | 1.301550 |
Nigeria | 84 | 1.228429 |
Spain | 77 | 1.126060 |
France | 75 | 1.096812 |
Romania | 71 | 1.038315 |
Netherlands (Holland, Europe) | 65 | 0.950570 |
Italy | 62 | 0.906698 |
Serbia | 52 | 0.760456 |
Philippines | 52 | 0.760456 |
Greece | 46 | 0.672711 |
Ireland | 43 | 0.628839 |
South Africa | 39 | 0.570342 |
Mexico | 37 | 0.541094 |
Turkey | 36 | 0.526470 |
Hungary | 34 | 0.497221 |
Singapore | 34 | 0.497221 |
New Zealand | 33 | 0.482597 |
Argentina | 32 | 0.467973 |
Croatia | 32 | 0.467973 |
Sweden | 31 | 0.453349 |
Indonesia | 31 | 0.453349 |
... | ... | ... |
Mozambique | 1 | 0.014624 |
Yemen | 1 | 0.014624 |
Cuba | 1 | 0.014624 |
Sudan | 1 | 0.014624 |
Guatemala | 1 | 0.014624 |
Bolivia | 1 | 0.014624 |
Jordan | 1 | 0.014624 |
Myanmar | 1 | 0.014624 |
Samoa | 1 | 0.014624 |
Gambia | 1 | 0.014624 |
Channel Islands | 1 | 0.014624 |
Vanuatu | 1 | 0.014624 |
Trinidad & Tobago | 1 | 0.014624 |
Papua New Guinea | 1 | 0.014624 |
Liberia | 1 | 0.014624 |
Panama | 1 | 0.014624 |
Rwanda | 1 | 0.014624 |
Cameroon | 1 | 0.014624 |
Aruba | 1 | 0.014624 |
Gibraltar | 1 | 0.014624 |
Anguilla | 1 | 0.014624 |
Botswana | 1 | 0.014624 |
Turkmenistan | 1 | 0.014624 |
Kyrgyzstan | 1 | 0.014624 |
Qatar | 1 | 0.014624 |
Angola | 1 | 0.014624 |
Nambia | 1 | 0.014624 |
Guadeloupe | 1 | 0.014624 |
Nicaragua | 1 | 0.014624 |
Cayman Islands | 1 | 0.014624 |
137 rows × 2 columns
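45.7% of our potential customers are located in the US, which makes it the most interesting market in terms of density. India comes second with 7.7%, which is not far ahead of the United Kingdom (4.6%) or Canada (3.8%).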
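To see how the two frequency tables line up, here's a minimal sketch on made-up data (`toy_countries` and its values are hypothetical). Note that `value_counts` skips null values by default, so the percentages are computed only over the participants who answered.

```python
import pandas as pd

# Hypothetical 'CountryLive' answers, including one missing value
toy_countries = pd.Series(
    ['US', 'US', 'India', 'US', 'Canada', None, 'India', 'US'])

absolute = toy_countries.value_counts()                      # counts per country
relative = toy_countries.value_counts(normalize=True) * 100  # percentages

# Both Series share the same index, so they align row by row
toy_table = pd.DataFrame({'Absolute frequency': absolute,
                          'Percentage': relative})
print(toy_table)
```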
This is useful information, but we need to go more in depth than this and figure out how much money people are actually willing to spend on learning. Advertising in high-density markets where most people are only willing to learn for free is extremely unlikely to be profitable for us.
Spending money for learning
The MoneyForLearning column describes, in US dollars, the amount of money spent by participants from the moment they started coding until the moment they completed the survey. Our company sells subscriptions at a price of $59 per month, and for this reason we're interested in finding out how much money each student spends per month.
We'll narrow down our analysis to only four countries: the US, India, the United Kingdom, and Canada. We do this for two reasons:
- These are the countries with the highest frequencies in the table above, which means we have a decent amount of data for each.
- Our courses are written in English, and English is an official language in all four of these countries. The more people know English, the better our chances of targeting the right people with our ads.
Let's start by creating a new column that describes the amount of money a student has spent per month so far. To do that, we'll need to divide the MoneyForLearning column by the MonthsProgramming column. The problem is that some students answered that they have been learning to code for 0 months (it might be that they have just started). To avoid dividing by 0, we'll replace 0 with 1 in the MonthsProgramming column.
<code="language-python">
# Replace 0s with 1s to avoid division by 0
fcc_good['MonthsProgramming'] = fcc_good['MonthsProgramming'].replace(0, 1)

# New column for the amount of money each student spends each month
fcc_good['money_per_month'] = fcc_good['MoneyForLearning'] / fcc_good['MonthsProgramming']
fcc_good['money_per_month'].isnull().sum()</code="language-python">
<code="language-python">675</code="language-python">
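The sketch below shows why the replacement matters, using a hypothetical four-row DataFrame (`toy` and its values are made up). With float columns, pandas doesn't raise an error when dividing by zero; it returns `inf`, which would then distort any mean we compute.

```python
import pandas as pd

# Hypothetical rows using the same column names as the survey
toy = pd.DataFrame({
    'MoneyForLearning': [100.0, 0.0, 600.0, None],
    'MonthsProgramming': [0.0, 5.0, 12.0, 3.0],
})

# Without the fix, the first row divides by zero and becomes infinite
naive = toy['MoneyForLearning'] / toy['MonthsProgramming']

# Treat "0 months" as one month so the division is well defined
months = toy['MonthsProgramming'].replace(0, 1)
toy['money_per_month'] = toy['MoneyForLearning'] / months
print(toy)

# A missing amount stays missing after the division
n_missing = toy['money_per_month'].isnull().sum()
```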
Let's keep only the rows that don't have null values for the money_per_month column.
<code="language-python">
# Keep only the rows with non-nulls in the money_per_month column
fcc_good = fcc_good[fcc_good['money_per_month'].notnull()]</code="language-python">
We want to group the data by country, and then measure the average amount of money that students spend per month in each country. First, let's remove the rows having null values for the CountryLive column, and check whether we still have enough data for the four countries that interest us.
<code="language-python">
# Remove the rows with null values in 'CountryLive'
fcc_good = fcc_good[fcc_good['CountryLive'].notnull()]

# Frequency table to check if we still have enough data
fcc_good['CountryLive'].value_counts().head()</code="language-python">
<code="language-python">
United States of America    2933
India                        463
United Kingdom               279
Canada                       240
Poland                       122
Name: CountryLive, dtype: int64</code="language-python">
This should be enough, so let's compute the average value spent per month in each country by a student. We'll compute the average using the mean.
<code="language-python">
# Mean sum of money spent by students each month
# (select the column before averaging so non-numeric columns are left out)
countries_mean = fcc_good.groupby('CountryLive')['money_per_month'].mean()
countries_mean[['United States of America',
                'India', 'United Kingdom',
                'Canada']]</code="language-python">
CountryLive
United States of America 227.997996
India 135.100982
United Kingdom 45.534443
Canada 113.510961
Name: money_per_month, dtype: float64
The results for the United Kingdom and Canada are a bit surprising relative to the values we see for India. If we considered a few socioeconomic metrics (like GDP per capita), we'd intuitively expect people in the UK and Canada to spend more on learning than people in India.
It might be that we don't have enough representative data for the United Kingdom and Canada, or that we have some outliers (maybe coming from wrong survey answers) making the mean too large for India, or too low for the UK and Canada. Or it might be that the results are correct.
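To illustrate how much a single extreme answer can move the mean, here's a small sketch on made-up numbers (the values are hypothetical, not survey data). The median is included only as a point of comparison; the analysis itself uses the mean.

```python
import pandas as pd

# Hypothetical monthly spending, in dollars
typical = pd.Series([0, 10, 20, 30, 40])
with_outlier = pd.Series([0, 10, 20, 30, 40, 5000])  # one extreme answer added

mean_typical = typical.mean()
mean_outlier = with_outlier.mean()      # pulled far upward by a single value
median_typical = typical.median()
median_outlier = with_outlier.median()  # barely changes

print(mean_typical, mean_outlier)
print(median_typical, median_outlier)
```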
Dealing with extreme outliers
Let's use box plots to visualize the distribution of the money_per_month variable for each country.
<code="language-python">
# Isolate only the countries of interest
only_4 = fcc_good[fcc_good['CountryLive'].str.contains(
    'United States of America|India|United Kingdom|Canada')]

# Box plots to visualize distributions
import seaborn as sns

sns.boxplot(y = 'money_per_month', x = 'CountryLive',
            data = only_4)
plt.title('Money Spent Per Month Per Country\n(Distributions)',
          fontsize = 16)
plt.ylabel('Money per month (US dollars)')
plt.xlabel('Country')
plt.xticks(range(4), ['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.show()</code="language-python">
It's hard to tell from the plot above whether there's anything wrong with the data for the United Kingdom, India, or Canada, but we can see immediately that something is off for the US: two people appear to spend $50,000 or more each month on learning. This is not impossible, but it seems extremely unlikely, so we'll keep only the values below $20,000 per month.
<code="language-python">
# Isolate only those participants who spend less than 20000 per month
fcc_good = fcc_good[fcc_good['money_per_month'] < 20000]</code="language-python">
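The $20,000 cutoff is a judgment call rather than a statistical rule. For comparison, seaborn's box plots by default draw as individual points any values lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below computes those bounds on made-up numbers; `toy_spend` and its values are hypothetical, and this is not the rule used in the analysis.

```python
import pandas as pd

# Hypothetical monthly spending, in dollars
toy_spend = pd.Series([0.0, 10.0, 20.0, 30.0, 40.0, 5000.0])

q1 = toy_spend.quantile(0.25)
q3 = toy_spend.quantile(0.75)
iqr = q3 - q1  # interquartile range

# Bounds beyond which a default box plot draws points individually
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

flagged = toy_spend[(toy_spend < lower_bound) | (toy_spend > upper_bound)]
print(lower_bound, upper_bound)
print(flagged)
```

On skewed data like spending, this rule tends to flag many legitimate values, which is one reason to inspect outliers before removing them.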
Now let's recompute the mean values and plot the box plots again.
<code="language-python">
# Recompute mean sum of money spent by students each month
countries_mean = fcc_good.groupby('CountryLive')['money_per_month'].mean()
countries_mean[['United States of America',
                'India', 'United Kingdom',
                'Canada']]</code="language-python">
CountryLive
United States of America 183.800110
India 135.100982
United Kingdom 45.534443
Canada 113.510961
Name: money_per_month, dtype: float64
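Only the US mean changed because only US rows exceeded the cutoff. The same before-and-after comparison can be sketched on a hypothetical five-row DataFrame (`toy` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical rows using the same column names as the analysis
toy = pd.DataFrame({
    'CountryLive': ['US', 'US', 'US', 'India', 'India'],
    'money_per_month': [50.0, 100.0, 50000.0, 20.0, 40.0],
})

# Per-country mean before removing the extreme value
before = toy.groupby('CountryLive')['money_per_month'].mean()

# Keep only the rows below the cutoff, then recompute
trimmed = toy[toy['money_per_month'] < 20000]
after = trimmed.groupby('CountryLive')['money_per_month'].mean()

print(before)
print(after)
```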
<code="language-python">
# Isolate again the countries of interest
only_4 = fcc_good[fcc_good['CountryLive'].str.contains(
    'United States of America|India|United Kingdom|Canada')]

# Box plots to visualize distributions
sns.boxplot(y = 'money_per_month', x = 'CountryLive',
            data = only_4)
plt.title('Money Spent Per Month Per Country\n(Distributions)',
          fontsize = 16)
plt.ylabel('Money per month (US dollars)')
plt.xlabel('Country')
plt.xticks(range(4), ['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.show()</code="language-python">
We can see a few extreme outliers for India (values over $2,500 per month), but it's unclear whether this is good data or not. Maybe these people attended several bootcamps, which tend to be very expensive. Let's examine these data points to see if we can find anything relevant.
<code="language-python">
# Inspect the extreme outliers for India
india_outliers = only_4[
    (only_4['CountryLive'] == 'India') &
    (only_4['money_per_month'] >= 2500)]
india_outliers</code="language-python">
Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventConferences | CodeEventDjangoGirls | CodeEventFCC | CodeEventGameJam | CodeEventGirlDev | CodeEventHackathons | CodeEventMeetup | CodeEventNodeSchool | CodeEventNone | CodeEventOther | CodeEventRailsBridge | CodeEventRailsGirls | CodeEventStartUpWknd | CodeEventWkdBootcamps | CodeEventWomenCode | CodeEventWorkshops | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentFieldOther | EmploymentStatus | EmploymentStatusOther | ExpectedEarning | FinanciallySupporting | FirstDevJob | Gender | GenderOther | HasChildren | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasHomeMortgage | HasServedInMilitary | HasStudentDebt | HomeMortgageOwe | HoursLearning | ID.x | ID.y | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobInterestBackEnd | JobInterestDataEngr | JobInterestDataSci | JobInterestDevOps | JobInterestFrontEnd | JobInterestFullStack | JobInterestGameDev | JobInterestInfoSec | JobInterestMobile | JobInterestOther | JobInterestProjMngr | JobInterestQAEngr | JobInterestUX | JobPref | JobRelocateYesNo | JobRoleInterest | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | NetworkID | Part1EndTime | Part1StartTime | Part2EndTime | Part2StartTime | PodcastChangeLog | PodcastCodeNewbie | PodcastCodePen | PodcastDevTea | PodcastDotNET | PodcastGiantRobots | PodcastJSAir | PodcastJSJabber | PodcastNone | PodcastOther | PodcastProgThrowdown | PodcastRubyRogues | PodcastSEDaily | PodcastSERadio | PodcastShopTalk | PodcastTalkPython | PodcastTheWebAhead | ResourceCodecademy | ResourceCodeWars | ResourceCoursera | ResourceCSS | ResourceEdX | ResourceEgghead | ResourceFCC | ResourceHackerRank | ResourceKA | ResourceLynda | ResourceMDN | ResourceOdinProj | ResourceOther | ResourcePluralSight | ResourceSkillcrush | ResourceSO | ResourceTreehouse | ResourceUdacity | ResourceUdemy | ResourceW3S | SchoolDegree | SchoolMajor | StudentDebtOwe | YouTubeCodeCourse | YouTubeCodingTrain | YouTubeCodingTut360 | YouTubeComputerphile | YouTubeDerekBanas | YouTubeDevTips | YouTubeEngineeredTruth | YouTubeFCC | YouTubeFunFunFunction | YouTubeGoogleDev | YouTubeLearnCode | YouTubeLevelUpTuts | YouTubeMIT | YouTubeMozillaHacks | YouTubeOther | YouTubeSimplilearn | YouTubeTheNewBoston | money_per_month | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1728 | 24.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | India | India | NaN | NaN | A stay-at-home parent or homemaker | NaN | 70000.0 | NaN | NaN | male | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 30.0 | d964ec629fd6d85a5bf27f7339f4fa6d | 950a8cf9cef1ae6a15da470e572b1b7a | NaN | 0.0 | 0.0 | 0.0 | NaN | Within the next 6 months | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | work for a startup | 1.0 | User Experience Designer, Mobile Developer... | in an office with other developers | Bengali | single, never married | 20000.0 | 4.0 | 38d312a990 | 2017-03-10 10:22:34 | 2017-03-10 10:17:42 | 2017-03-10 10:24:38 | 2017-03-10 10:22:40 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | bachelor's degree | Computer Programming | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5000.000000 |
1755 | 20.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | India | India | NaN | NaN | Not working and not looking for work | NaN | 100000.0 | NaN | NaN | male | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 10.0 | 811bf953ef546460f5436fcf2baa532d | 81e2a4cab0543e14746c4a20ffdae17c | NaN | 0.0 | 0.0 | 0.0 | NaN | I haven't decided | NaN | 1.0 | NaN | 1.0 | 1.0 | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | work for a multinational corporation | 1.0 | Information Security, Full-Stack Web Developer... | no preference | Hindi | single, never married | 50000.0 | 15.0 | 4611a76b60 | 2017-03-10 10:48:31 | 2017-03-10 10:42:29 | 2017-03-10 10:51:37 | 2017-03-10 10:48:38 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | bachelor's degree | Computer Science | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | 3333.333333 |
7989 | 28.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 15 to 29 minutes | India | India | software development and IT | NaN | Employed for wages | NaN | 500000.0 | 1.0 | NaN | male | NaN | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 20.0 | a6a5597bbbc2c282386d6675641b744a | da7bbb54a8b26a379707be56b6c51e65 | 300000.0 | 0.0 | 0.0 | 0.0 | 0.0 | more than 12 months from now | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | work for a multinational corporation | 1.0 | User Experience Designer, Back-End Web Devel... | in an office with other developers | Marathi | married or domestic partnership | 5000.0 | 1.0 | c47a447b5d | 2017-03-26 14:06:48 | 2017-03-26 14:02:41 | 2017-03-26 14:13:13 | 2017-03-26 14:07:17 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Not listened to anything yet. | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | bachelor's degree | Aerospace and Aeronautical Engineering | 2500.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5000.000000 |
8126 | 22.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | India | India | NaN | NaN | Not working but looking for work | NaN | 80000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 80.0 | 69e8ab9126baee49f66e3577aea7fd3c | 9f08092e82f709e63847ba88841247c0 | NaN | 0.0 | 0.0 | 0.0 | NaN | I'm already applying | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a startup | 1.0 | Back-End Web Developer, Full-Stack Web Develop... | in an office with other developers | Malayalam | single, never married | 5000.0 | 1.0 | 0d3d1762a4 | 2017-03-27 07:10:17 | 2017-03-27 07:05:23 | 2017-03-27 07:12:21 | 2017-03-27 07:10:22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | bachelor's degree | Electrical and Electronics Engineering | 10000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | 5000.000000 |
13398 | 19.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | India | India | NaN | NaN | Unable to work | NaN | 100000.0 | NaN | NaN | male | NaN | NaN | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | NaN | 30.0 | b7fe7bc4edefc3a60eb48f977e4426e3 | 80ff09859ac475b70ac19b7b7369e953 | NaN | 0.0 | 0.0 | 0.0 | NaN | I haven't decided | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | work for a multinational corporation | 1.0 | Mobile Developer | no preference | Hindi | single, never married | 20000.0 | 2.0 | 51a6f9a1a7 | 2017-04-01 00:31:25 | 2017-04-01 00:28:17 | 2017-04-01 00:33:44 | 2017-04-01 00:31:32 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | bachelor's degree | Computer Science | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10000.000000 |
15587 | 27.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | India | India | software development and IT | NaN | Employed for wages | NaN | 65000.0 | 0.0 | NaN | male | NaN | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 36.0 | 5a7394f24292cb82b72adb702886543a | 8bc7997217d4a57b22242471cc8d89ef | 60000.0 | 0.0 | 0.0 | 0.0 | 1.0 | I haven't decided | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a startup | NaN | Full-Stack Web Developer, Data Scientist | from home | Hindi | single, never married | 100000.0 | 24.0 | 8af0c2b6da | 2017-04-03 09:43:53 | 2017-04-03 09:39:38 | 2017-04-03 09:54:39 | 2017-04-03 09:43:57 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | bachelor's degree | Communications | 25000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | 4166.666667 |
It seems that none of these six participants attended a bootcamp. Overall, it's really hard to figure out from the data whether these people really spent that much money on learning. The actual question of the survey was "Aside from university tuition, about how much money have you spent on learning to code so far (in US dollars)?", so they might have misunderstood and thought university tuition is included. It seems safer to remove these six rows.
<code="language-python">
# Remove the outliers for India
only_4 = only_4.drop(india_outliers.index) # using the row labels
</code="language-python">
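As a small, self-contained sketch of what this filter-then-drop pattern does, the snippet below rebuilds a toy frame from values visible in this post's tables: the six India rows above plus the single Canada row shown later. The name `toy` is hypothetical, and computing `money_per_month` as `MoneyForLearning / MonthsProgramming` is an assumption that happens to match the displayed values. It is only meant to illustrate that `drop` works on row labels, not positions.

```python
import pandas as pd

# Toy frame built from values displayed in this post (not the full survey data).
# The index holds the original row labels.
toy = pd.DataFrame(
    {
        'CountryLive': ['India'] * 6 + ['Canada'],
        'AttendedBootcamp': [0.0] * 6 + [1.0],
        'MoneyForLearning': [20000.0, 50000.0, 5000.0, 5000.0, 20000.0, 100000.0, 10000.0],
        'MonthsProgramming': [4.0, 15.0, 1.0, 1.0, 2.0, 24.0, 2.0],
    },
    index=[1728, 1755, 7989, 8126, 13398, 15587, 13659],
)

# Assumption: monthly spending is total spending divided by months of programming
toy['money_per_month'] = toy['MoneyForLearning'] / toy['MonthsProgramming']

# Same two-condition filter as above: India AND at least $2,500 per month
india_rows = toy[(toy['CountryLive'] == 'India') & (toy['money_per_month'] >= 2500)]
print(len(india_rows))                             # rows flagged as outliers
print(int(india_rows['AttendedBootcamp'].sum()))   # how many of them attended a bootcamp

# Drop by row label; the Canada row stays even though it also exceeds $2,500
remaining = toy.drop(india_rows.index)
print(list(remaining.index))
```

Because the country condition is combined with `&`, the Canada row is left untouched here and is handled separately later.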
Looking back at the box plot above, we can also see more extreme outliers for the US (values over $6,000 per month). Let's examine these participants in more detail.
<code="language-python">
# Examine the extreme outliers for the US
us_outliers = only_4[
    (only_4['CountryLive'] == 'United States of America') &
    (only_4['money_per_month'] >= 6000)]
us_outliers
</code="language-python">
| | Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventConferences | CodeEventDjangoGirls | CodeEventFCC | CodeEventGameJam | CodeEventGirlDev | CodeEventHackathons | CodeEventMeetup | CodeEventNodeSchool | CodeEventNone | CodeEventOther | CodeEventRailsBridge | CodeEventRailsGirls | CodeEventStartUpWknd | CodeEventWkdBootcamps | CodeEventWomenCode | CodeEventWorkshops | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentFieldOther | EmploymentStatus | EmploymentStatusOther | ExpectedEarning | FinanciallySupporting | FirstDevJob | Gender | GenderOther | HasChildren | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasHomeMortgage | HasServedInMilitary | HasStudentDebt | HomeMortgageOwe | HoursLearning | ID.x | ID.y | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobInterestBackEnd | JobInterestDataEngr | JobInterestDataSci | JobInterestDevOps | JobInterestFrontEnd | JobInterestFullStack | JobInterestGameDev | JobInterestInfoSec | JobInterestMobile | JobInterestOther | JobInterestProjMngr | JobInterestQAEngr | JobInterestUX | JobPref | JobRelocateYesNo | JobRoleInterest | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | NetworkID | Part1EndTime | Part1StartTime | Part2EndTime | Part2StartTime | PodcastChangeLog | PodcastCodeNewbie | PodcastCodePen | PodcastDevTea | PodcastDotNET | PodcastGiantRobots | PodcastJSAir | PodcastJSJabber | PodcastNone | PodcastOther | PodcastProgThrowdown | PodcastRubyRogues | PodcastSEDaily | PodcastSERadio | PodcastShopTalk | PodcastTalkPython | PodcastTheWebAhead | ResourceCodecademy | ResourceCodeWars | ResourceCoursera | ResourceCSS | ResourceEdX | ResourceEgghead | ResourceFCC | ResourceHackerRank | ResourceKA | ResourceLynda | ResourceMDN | ResourceOdinProj | ResourceOther | ResourcePluralSight | ResourceSkillcrush | ResourceSO | ResourceTreehouse | ResourceUdacity | ResourceUdemy | ResourceW3S | SchoolDegree | SchoolMajor | StudentDebtOwe | YouTubeCodeCourse | YouTubeCodingTrain | YouTubeCodingTut360 | YouTubeComputerphile | YouTubeDerekBanas | YouTubeDevTips | YouTubeEngineeredTruth | YouTubeFCC | YouTubeFunFunFunction | YouTubeGoogleDev | YouTubeLearnCode | YouTubeLevelUpTuts | YouTubeMIT | YouTubeMozillaHacks | YouTubeOther | YouTubeSimplilearn | YouTubeTheNewBoston | money_per_month |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
718 | 26.0 | 1.0 | 0.0 | 0.0 | The Coding Boot Camp at UCLA Extension | 1.0 | NaN | more than 1 million | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | United States of America | United States of America | architecture or physical engineering | NaN | Employed for wages | NaN | 50000.0 | NaN | NaN | male | NaN | NaN | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | NaN | 35.0 | 796ae14c2acdee36eebc250a252abdaf | d9e44d73057fa5d322a071adc744bf07 | 44500.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within the next 6 months | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | work for a startup | 1.0 | User Experience Designer, Full-Stack Web Dev... | in an office with other developers | English | single, never married | 8000.0 | 1.0 | 50dab3f716 | 2017-03-09 21:26:35 | 2017-03-09 21:21:58 | 2017-03-09 21:29:10 | 2017-03-09 21:26:39 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | bachelor's degree | Architecture | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 8000.000000 |
1222 | 32.0 | 1.0 | 0.0 | 0.0 | The Iron Yard | 1.0 | NaN | between 100,000 and 1 million | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working and not looking for work | NaN | 50000.0 | NaN | NaN | female | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 50.0 | bfabebb4293ac002d26a1397d00c7443 | 590f0be70e80f1daf5a23eb7f4a72a3d | NaN | 0.0 | 0.0 | 0.0 | NaN | Within the next 6 months | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | work for a nonprofit | 1.0 | Front-End Web Developer, Mobile Developer,... | in an office with other developers | English | single, never married | 13000.0 | 2.0 | e512c4bdd0 | 2017-03-10 02:14:11 | 2017-03-10 02:10:07 | 2017-03-10 02:15:32 | 2017-03-10 02:14:16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | bachelor's degree | Anthropology | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 6500.000000 |
3184 | 34.0 | 1.0 | 1.0 | 0.0 | We Can Code IT | 1.0 | NaN | more than 1 million | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Less than 15 minutes | NaN | United States of America | software development and IT | NaN | Employed for wages | NaN | 60000.0 | NaN | NaN | male | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 10.0 | 5d4889491d9d25a255e57fd1c0022458 | 585e8f8b9a838ef1abbe8c6f1891c048 | 40000.0 | 0.0 | 0.0 | 0.0 | 0.0 | I haven't decided | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | work for a medium-sized company | 0.0 | Quality Assurance Engineer, DevOps / SysAd... | in an office with other developers | English | single, never married | 9000.0 | 1.0 | e7bebaabd4 | 2017-03-11 23:34:16 | 2017-03-11 23:31:17 | 2017-03-11 23:36:02 | 2017-03-11 23:34:21 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | 1.0 | some college credit, no degree | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9000.000000 |
3930 | 31.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working and not looking for work | NaN | 100000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 50.0 | e1d790033545934fbe5bb5b60e368cd9 | 7cf1e41682462c42ce48029abf77d43c | NaN | 1.0 | 0.0 | 0.0 | NaN | Within the next 6 months | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a startup | 1.0 | DevOps / SysAdmin, Front-End Web Developer... | no preference | English | married or domestic partnership | 65000.0 | 6.0 | 75759e5a1c | 2017-03-13 10:06:46 | 2017-03-13 09:56:13 | 2017-03-13 10:10:00 | 2017-03-13 10:06:50 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | reactivex.io/learnrx/ & jafar husain | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | bachelor's degree | Biology | 40000.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | various conf presentations | NaN | NaN | 10833.333333 |
6805 | 46.0 | 1.0 | 1.0 | 1.0 | Sabio.la | 0.0 | NaN | between 100,000 and 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working but looking for work | NaN | 70000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 45.0 | 69096aacf4245694303cf8f7ce68a63f | 4c56f82a348836e76dd90d18a3d5ed88 | NaN | 1.0 | 0.0 | 0.0 | NaN | Within the next 6 months | NaN | 1.0 | 1.0 | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | work for a multinational corporation | 1.0 | Full-Stack Web Developer, Game Developer, Pr... | no preference | English | married or domestic partnership | 15000.0 | 1.0 | 53d13b58e9 | 2017-03-21 20:13:08 | 2017-03-21 20:10:25 | 2017-03-21 20:14:36 | 2017-03-21 20:13:11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | bachelor's degree | Business Administration and Management | 45000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15000.000000 |
7198 | 32.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | United States of America | United States of America | education | NaN | Employed for wages | NaN | 55000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 4.0 | cb2754165344e6be79da8a4c76bf3917 | 272219fbd28a3a7562cb1d778e482e1e | NaN | 1.0 | 0.0 | 0.0 | 0.0 | I'm already applying | 1.0 | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a multinational corporation | 0.0 | Full-Stack Web Developer, Back-End Web Developer | no preference | Spanish | single, never married | 70000.0 | 5.0 | 439a4adaf6 | 2017-03-23 01:37:46 | 2017-03-23 01:35:01 | 2017-03-23 01:39:37 | 2017-03-23 01:37:49 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | professional degree (MBA, MD, JD, etc.) | Computer Science | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | 14000.000000 |
7505 | 26.0 | 1.0 | 0.0 | 1.0 | Codeup | 0.0 | NaN | more than 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working but looking for work | NaN | 65000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 40.0 | 657fb50800bcc99a07caf52387f67fbb | ad1df4669883d8f628f0b5598a4c5c45 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within the next 6 months | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | work for a government | 1.0 | Mobile Developer, Full-Stack Web Developer, ... | in an office with other developers | English | single, never married | 20000.0 | 3.0 | 96e254de36 | 2017-03-24 03:26:09 | 2017-03-24 03:23:02 | 2017-03-24 03:27:47 | 2017-03-24 03:26:14 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | bachelor's degree | Economics | 20000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 6666.666667 |
9778 | 33.0 | 1.0 | 0.0 | 1.0 | Grand Circus | 1.0 | NaN | between 100,000 and 1 million | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | United States of America | United States of America | education | NaN | Employed for wages | NaN | 55000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 40.0 | 7a62790f6ded15e26d5f429b8a4d1095 | 98eeee1aa81ba70b2ab288bf4b63d703 | 20000.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within the next 6 months | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | work for a medium-sized company | NaN | Full-Stack Web Developer, Data Engineer, Qua... | from home | English | single, never married | 8000.0 | 1.0 | ea80a3b15e | 2017-04-05 19:48:12 | 2017-04-05 19:40:19 | 2017-04-05 19:49:44 | 2017-04-05 19:49:03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | master's degree (non-professional) | Chemical Engineering | 45000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 8000.000000 |
16650 | 29.0 | 0.0 | NaN | NaN | NaN | NaN | 2.0 | more than 1 million | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working but looking for work | NaN | NaN | 1.0 | NaN | male | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 400000.0 | 40.0 | e1925d408c973b91cf3e9a9285238796 | 7e9e3c31a3dc2cafe3a09269398c4de8 | NaN | 1.0 | 1.0 | 0.0 | NaN | I'm already applying | 1.0 | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | work for a multinational corporation | 1.0 | Product Manager, Data Engineer, Full-Stack W... | in an office with other developers | English | married or domestic partnership | 200000.0 | 12.0 | 1a45f4a3ef | 2017-03-14 02:42:57 | 2017-03-14 02:40:10 | 2017-03-14 02:45:55 | 2017-03-14 02:43:05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | associate's degree | Computer Programming | 30000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 16666.666667 |
16997 | 27.0 | 0.0 | NaN | NaN | NaN | NaN | 1.0 | more than 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | United States of America | United States of America | health care | NaN | Employed for wages | NaN | 60000.0 | 0.0 | NaN | female | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 12.0 | 624914ce07c296c866c9e16a14dc01c7 | 6384a1e576caf4b6b9339fe496a51f1f | 40000.0 | 1.0 | 0.0 | 0.0 | 0.0 | Within 7 to 12 months | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | work for a medium-sized company | 1.0 | Mobile Developer, Game Developer, User Exp... | in an office with other developers | English | single, never married | 12500.0 | 1.0 | ad1a21217c | 2017-03-20 05:43:28 | 2017-03-20 05:40:08 | 2017-03-20 05:45:28 | 2017-03-20 05:43:32 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | some college credit, no degree | NaN | 12500.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12500.000000 |
17231 | 50.0 | 0.0 | NaN | NaN | NaN | NaN | 2.0 | less than 100,000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | Kenya | United States of America | NaN | NaN | Not working but looking for work | NaN | 40000.0 | 0.0 | NaN | female | NaN | 1.0 | 0.0 | 1.0 | 1.0 | NaN | 0.0 | NaN | NaN | 1.0 | d4bc6ae775b20816fcd41048ef75417c | 606749cd07b124234ab6dff81b324c02 | NaN | 1.0 | 0.0 | 0.0 | NaN | Within the next 6 months | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a nonprofit | 0.0 | Front-End Web Developer | in an office with other developers | English | married or domestic partnership | 30000.0 | 2.0 | 38c1b478d0 | 2017-03-24 18:48:23 | 2017-03-24 18:46:01 | 2017-03-24 18:51:20 | 2017-03-24 18:48:27 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | bachelor's degree | Computer Programming | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15000.000000 |
Out of these 11 extreme outliers, six people attended bootcamps, which justifies the large sums of money spent on learning. For the other five, it's hard to figure out from the data where they could have spent that much money on learning. Consequently, we'll remove those rows where participants reported that they spent $6,000 or more each month but never attended a bootcamp.
Also, the data shows that eight respondents had been programming for no more than three months when they completed the survey. They most likely paid a large sum of money for a bootcamp that was going to last for several months, so the amount of money spent per month is unrealistic and should be significantly lower (because they probably didn't spend anything for the next couple of months after the survey). As a consequence, we'll remove these eight outliers.
In the next code block, we'll remove respondents that:
- Didn't attend bootcamps.
- Had been programming for three months or less at the time they completed the survey.
<code="language-python">
# Remove the respondents who didn't attend a bootcamp
no_bootcamp = only_4[
    (only_4['CountryLive'] == 'United States of America') &
    (only_4['money_per_month'] >= 6000) &
    (only_4['AttendedBootcamp'] == 0)
]
only_4 = only_4.drop(no_bootcamp.index)

# Remove the respondents who had been programming for three months or less
less_than_3_months = only_4[
    (only_4['CountryLive'] == 'United States of America') &
    (only_4['money_per_month'] >= 6000) &
    (only_4['MonthsProgramming'] <= 3)
]
only_4 = only_4.drop(less_than_3_months.index)
</code="language-python">
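To see how the two rules interact, here is a minimal sketch that applies them in the same order to a toy frame containing only the eleven US rows displayed above. The frame `us_rows` and the names `after_rule_1`, `short_history`, and `after_rule_2` are hypothetical; the values are copied from the table, and the country condition is left out because every row is from the US.

```python
import pandas as pd

# Values copied from the eleven US rows displayed above; the index holds the row labels.
us_rows = pd.DataFrame(
    {
        'AttendedBootcamp': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0],
        'MonthsProgramming': [1, 2, 1, 6, 1, 5, 3, 1, 12, 1, 2],
        'money_per_month': [8000.0, 6500.0, 9000.0, 10833.333333, 15000.0, 14000.0,
                            6666.666667, 8000.0, 16666.666667, 12500.0, 15000.0],
    },
    index=[718, 1222, 3184, 3930, 6805, 7198, 7505, 9778, 16650, 16997, 17231],
)

# Rule 1: spent $6,000 or more per month without attending a bootcamp
no_bootcamp = us_rows[
    (us_rows['money_per_month'] >= 6000) & (us_rows['AttendedBootcamp'] == 0)
]
after_rule_1 = us_rows.drop(no_bootcamp.index)

# Rule 2: of the rows that are left, had been programming for three months or less
short_history = after_rule_1[
    (after_rule_1['money_per_month'] >= 6000) & (after_rule_1['MonthsProgramming'] <= 3)
]
after_rule_2 = after_rule_1.drop(short_history.index)

print(len(no_bootcamp), len(short_history), len(after_rule_2))  # prints: 5 6 0
```

On these eleven rows, the first rule flags the five respondents who never attended a bootcamp and the second flags the remaining six, so none of the displayed US outliers stay in the data.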
Looking again at the last box plot above, we can also see an extreme outlier for Canada — a person who spends roughly $5,000 per month. Let's examine this person in more depth.
<code="language-python">
# Examine the extreme outliers for Canada
canada_outliers = only_4[
    (only_4['CountryLive'] == 'Canada') &
    (only_4['money_per_month'] > 4500)]
canada_outliers
</code="language-python">
| | Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventConferences | CodeEventDjangoGirls | CodeEventFCC | CodeEventGameJam | CodeEventGirlDev | CodeEventHackathons | CodeEventMeetup | CodeEventNodeSchool | CodeEventNone | CodeEventOther | CodeEventRailsBridge | CodeEventRailsGirls | CodeEventStartUpWknd | CodeEventWkdBootcamps | CodeEventWomenCode | CodeEventWorkshops | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentFieldOther | EmploymentStatus | EmploymentStatusOther | ExpectedEarning | FinanciallySupporting | FirstDevJob | Gender | GenderOther | HasChildren | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasHomeMortgage | HasServedInMilitary | HasStudentDebt | HomeMortgageOwe | HoursLearning | ID.x | ID.y | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobInterestBackEnd | JobInterestDataEngr | JobInterestDataSci | JobInterestDevOps | JobInterestFrontEnd | JobInterestFullStack | JobInterestGameDev | JobInterestInfoSec | JobInterestMobile | JobInterestOther | JobInterestProjMngr | JobInterestQAEngr | JobInterestUX | JobPref | JobRelocateYesNo | JobRoleInterest | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | NetworkID | Part1EndTime | Part1StartTime | Part2EndTime | Part2StartTime | PodcastChangeLog | PodcastCodeNewbie | PodcastCodePen | PodcastDevTea | PodcastDotNET | PodcastGiantRobots | PodcastJSAir | PodcastJSJabber | PodcastNone | PodcastOther | PodcastProgThrowdown | PodcastRubyRogues | PodcastSEDaily | PodcastSERadio | PodcastShopTalk | PodcastTalkPython | PodcastTheWebAhead | ResourceCodecademy | ResourceCodeWars | ResourceCoursera | ResourceCSS | ResourceEdX | ResourceEgghead | ResourceFCC | ResourceHackerRank | ResourceKA | ResourceLynda | ResourceMDN | ResourceOdinProj | ResourceOther | ResourcePluralSight | ResourceSkillcrush | ResourceSO | ResourceTreehouse | ResourceUdacity | ResourceUdemy | ResourceW3S | SchoolDegree | SchoolMajor | StudentDebtOwe | YouTubeCodeCourse | YouTubeCodingTrain | YouTubeCodingTut360 | YouTubeComputerphile | YouTubeDerekBanas | YouTubeDevTips | YouTubeEngineeredTruth | YouTubeFCC | YouTubeFunFunFunction | YouTubeGoogleDev | YouTubeLearnCode | YouTubeLevelUpTuts | YouTubeMIT | YouTubeMozillaHacks | YouTubeOther | YouTubeSimplilearn | YouTubeTheNewBoston | money_per_month |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13659 | 24.0 | 1.0 | 0.0 | 0.0 | Bloc.io | 1.0 | NaN | more than 1 million | 1.0 | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 30 to 44 minutes | Canada | Canada | finance | NaN | Employed for wages | NaN | 60000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 250000.0 | 10.0 | 739b584aef0541450c1f713b82025181 | 28381a455ab25cc2a118d78af44d8749 | 140000.0 | 1.0 | 1.0 | 0.0 | 0.0 | I haven't decided | 1.0 | NaN | 1.0 | NaN | 1.0 | 1.0 | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | work for a multinational corporation | NaN | Mobile Developer, Full-Stack Web Developer, ... | from home | Yue (Cantonese) Chinese | single, never married | 10000.0 | 2.0 | 41c26f2932 | 2017-03-25 23:23:03 | 2017-03-25 23:20:33 | 2017-03-25 23:24:34 | 2017-03-25 23:23:06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | bachelor's degree | Finance | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 5000.0 |
Here, the situation is similar to some of the US respondents — this participant had been programming for no more than two months when he completed the survey. He seems to have paid a large sum of money in the beginning to enroll in a bootcamp, and then he probably didn't spend anything for the next couple of months after the survey. We'll take the same approach here as for the US and remove this outlier.
<code="language-python">
# Remove the extreme outliers for Canada
only_4 = only_4.drop(canada_outliers.index)
</code="language-python">
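To make the index-based removal concrete, here is a minimal sketch on toy data. The values, the `cutoff` threshold, and the `toy`-prefixed names are made up for illustration and are not taken from the survey.

```python
import pandas as pd

# Toy data (made-up values, not taken from the survey)
toy = pd.DataFrame({
    'CountryLive': ['Canada', 'Canada', 'India', 'Canada'],
    'money_per_month': [50.0, 5000.0, 20.0, 100.0],
})

# Hypothetical cutoff, chosen only for this illustration
cutoff = 4500

# Select the Canadian rows that exceed the cutoff
toy_outliers = toy[(toy['CountryLive'] == 'Canada') & (toy['money_per_month'] > cutoff)]

# drop() takes index labels, so passing the selection's index
# removes exactly the selected rows and nothing else
toy_clean = toy.drop(toy_outliers.index)

print(toy_clean)
```

Because `drop` works on index labels, this pattern is only safe when the index is unique, which is the case for a default integer index like the one in this toy frame.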
Let's recompute the mean values and generate the final box plots.
<code="language-python">
# Recompute mean sum of money spent by students each month
# (select the column before aggregating so non-numeric columns are not averaged)
only_4.groupby('CountryLive')['money_per_month'].mean()
</code="language-python">
CountryLive
Canada 93.065400
India 65.758763
United Kingdom 45.534443
United States of America 142.654608
Name: money_per_month, dtype: float64
<code="language-python">
# Visualize the distributions again
sns.boxplot(y = 'money_per_month', x = 'CountryLive', data = only_4)
plt.title('Money Spent Per Month Per Country\n(Distributions)', fontsize = 16)
plt.ylabel('Money per month (US dollars)')
plt.xlabel('Country')
plt.xticks(range(4), ['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.show()
</code="language-python">
Choosing the two best advertising markets
One country we should clearly advertise in is the US. Many new coders live there, and they are willing to pay a good amount of money each month (roughly \$143).
We sell subscriptions at a price of \$59 per month, and Canada seems to be the best second choice because people there are willing to pay roughly \$93 per month, compared to India (\$66) and the United Kingdom (\$45).
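As a quick sanity check on this comparison, the sketch below divides each country's mean monthly spend by the subscription price. It uses plain Python with the means copied from the output above, so it runs without the survey data. The names `mean_spend`, `subscription_price`, and `ratio_to_price` are introduced here for illustration only.

```python
# Mean monthly spend per country, copied from the output above (US dollars)
mean_spend = {
    'Canada': 93.065400,
    'India': 65.758763,
    'United Kingdom': 45.534443,
    'United States of America': 142.654608,
}

subscription_price = 59  # monthly subscription price stated in the text

# A ratio above 1 means the average respondent already spends more
# per month than the subscription costs
ratio_to_price = {
    country: spend / subscription_price
    for country, spend in mean_spend.items()
}

# Print the countries from highest to lowest ratio
for country, ratio in sorted(ratio_to_price.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{country}: {ratio:.2f}')
```

Only the United Kingdom falls below a ratio of 1, meaning its average respondent spends less per month than our subscription costs.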
The data strongly suggests that we shouldn't advertise in the UK, but let's take a second look at India before settling on Canada as our second choice:
- \$59 doesn't seem like an expensive sum for people in India, since they spend \$66 each month on average.
- We have almost twice as many potential customers in India as in Canada:
<code="language-python">
# Frequency table for the 'CountryLive' column
only_4['CountryLive'].value_counts(normalize = True) * 100
</code="language-python">
United States of America 74.967908
India 11.732991
United Kingdom 7.163030
Canada 6.136072
Name: CountryLive, dtype: float64
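The "almost twice" claim can be checked directly from the percentages in the frequency table. The sketch below copies the two relevant values by hand. The names `share` and `india_to_canada` are introduced here for illustration only.

```python
# Share of respondents per country (percent), copied from the frequency table above
share = {
    'India': 11.732991,
    'Canada': 6.136072,
}

# How many respondents live in India for each respondent living in Canada
india_to_canada = share['India'] / share['Canada']

print(round(india_to_canada, 2))
```

The ratio comes out at about 1.9, which is consistent with the statement in the text.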
It's not crystal clear whether to choose Canada or India. Although Canada seems the more tempting choice, there is a good chance that India is actually the better one because of its larger number of potential customers.
At this point, it seems that we have several options:
- Advertise in the US, India, and Canada by splitting the advertisement budget in various combinations:
- 6
- 5
- Advertise only in the US and India, or the US and Canada. Again, it makes sense to split the advertisement budget unequally. For instance:
- 7
- 6
- Advertise only in the US.
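One rough way to compare these options is to weight each market by its share of respondents multiplied by its mean monthly spend. This is only a heuristic sketch under that assumption, not the budget split proposed in this analysis. The figures are copied from the outputs above, and the names `share`, `mean_spend`, `score`, and `weights` are introduced here for illustration only.

```python
# Share of respondents (percent) and mean monthly spend (US dollars),
# both copied from the outputs above
share = {
    'United States of America': 74.967908,
    'India': 11.732991,
    'Canada': 6.136072,
}
mean_spend = {
    'United States of America': 142.654608,
    'India': 65.758763,
    'Canada': 93.065400,
}

# Assumption: a market's attractiveness is proportional to
# (share of respondents) x (mean monthly spend)
score = {country: share[country] * mean_spend[country] for country in share}

# Normalize the scores so that the weights sum to 1
total = sum(score.values())
weights = {country: value / total for country, value in score.items()}

for country, weight in weights.items():
    print(f'{country}: {weight:.1%}')
```

Under this particular assumption India ranks ahead of Canada, but the ordering depends entirely on how audience size is traded off against spending, which is the judgment call we leave to the marketing team.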
It's probably best to send our analysis to the marketing team and let them use their domain knowledge to decide. They might want to run some extra surveys in India and Canada and then come back to us to analyze the new survey data.
Conclusion
In this project, we analyzed survey data from new coders to find the two best markets to advertise in. The only solid conclusion we reached is that the US would be a good market to advertise in.
For the second best market, the choice between India and Canada wasn't clear-cut. We decided to send the results to the marketing team so they can use their domain knowledge to make the best decision.