The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. A notebook integrates code and its output into a single document that combines visualisations, narrative text, mathematical equations, and other rich media. The intuitive workflow promotes iterative and rapid development, making notebooks an increasingly popular choice at the heart […]
Whether you’re running out of memory on your local machine or simply want your code to run faster on a more powerful machine, there are many benefits to doing data science on a cloud server. A cloud server is really just a computer, like the one you’re using now, that’s located elsewhere. In this post, […]
Founded in 2010, Kaggle is a data science platform where users can share, collaborate, and compete. One key feature of Kaggle is “Competitions”, which lets users practice on real-world data and test their skills with, and against, an international community. This guide will teach you how to approach and enter a […]
Creating a cloud-based data science environment for faster analysis
There are times when working on data science problems with your local machine just doesn’t cut it anymore. Maybe your computer is old, and can’t work with larger datasets. Or maybe you want to be able to access your work from anywhere, and collaborate with others. […]
Learn to set up a data science environment using Docker containers and the popular Jupyter Notebook in this free tutorial.
Here’s how to install PySpark on your computer and get started working with large data sets in Python from a Jupyter Notebook.