Data Retrieval and Cleaning: Tracking Migratory Patterns
Advancing your skills is an important part of being a data scientist. When starting out, you mostly focus on learning a programming language, proper use of third-party tools, displaying visualizations, and the theoretical understanding of statistical algorithms. The next step is to test your skills on more difficult data sets. Sometimes these data sets […]
Using Linear Regression for Predictive Modeling in R
In R programming, predictive models are extremely useful for forecasting future outcomes and estimating metrics that are impractical to measure. For example, data scientists could use predictive models to forecast crop yields based on rainfall and temperature, or to determine whether patients with certain traits are more likely to react badly to a new medication. […]
Using Box Plots to Explore Women’s Height Data
I’ve recently been working on the Digital Panopticon, a digital history project that has brought together (and created) massive amounts of data about British prisoners and convicts in the long 19th century, including several datasets which include heights for women. Adult height is strongly influenced by environmental factors in childhood, one of the most important […]
Visualizing Women’s Marches: Part 2
This post is the second in a series on visualizing the Women’s Marches from January 2017. In the first post, we explored the intensive data collection and data cleaning process necessary to produce clean pandas dataframes. Data Enrichment: Because we eventually want to be able to build maps visualizing the marches, we need latitude and […]
Exploring Women’s Army Auxiliary Corps Data
Today I want to go on an excursion in “catalogues as data”. The UK National Archives’ Discovery catalogue is an excellent resource for this activity, because a) it has a lot of records that have document descriptions at ‘item’ or ‘piece’ level in the catalogue, containing quite structured information (like dates, places, occupations) that can […]
Visualizing Women’s Marches: Part 1
In celebration of Women’s History Month, I wanted to better understand the scale of the Women’s Marches that occurred in January 2017. Shortly after the marches, Vox published a map visualizing the estimated turnout across the entire country. This map is excellent at displaying: locations with the highest relative turnouts; hubs and clusters of where […]
R Fundamentals: Building a Simple Grade Calculator
In this beginner R programming tutorial, learn the basics and syntax of R as you go hands-on building a simple grade calculator.
Data Science Terms and Jargon: A Glossary
Getting started in data science can be overwhelming, especially when you consider the variety of concepts and techniques a data scientist needs to master in order to do her job effectively. Even the term “data science” can be somewhat nebulous, and as the field gains popularity it seems to lose definition. To help those new […]
Introduction to AWS for Data Scientists
These days, many businesses use cloud-based services; as a result, various companies have started building and providing such services. Amazon began the trend with Amazon Web Services (AWS). While AWS began in 2006 as a side business, it now makes $14.5 billion in revenue each year. Other leaders in this area include: Google—Google Cloud […]
Tutorial: Python Functions and Functional Programming
Learn about functions in Python and master the basics of functional Python programming in this in-depth tutorial for data scientists and programmers.