This is the second in a series of posts on how to build a Data Science Portfolio.
You can read the first post in this series here: Building a data science portfolio: Storytelling with data.
Blogging can be a fantastic way to demonstrate your skills, learn topics in more depth, and build an audience. There are quite a few examples of data science and programming blogs that have helped their authors land jobs or make important connections. Blogging is one of the most important things that any aspiring programmer or data scientist should be doing on a regular basis.
Unfortunately, one very arbitrary barrier to blogging can be knowing how to set up a blog in the first place. In this post, we’ll cover how to create a blog using Python, how to create posts using Jupyter notebook, and how to deploy the blog live using Github Pages. After reading this post, you’ll be able to create your own data science blog and author posts in a familiar and simple interface.
Fundamentally, a static site is just a folder full of HTML files. We can run a server that allows others to connect to this folder and retrieve files. The nice thing about this is that it doesn’t require a database or any other moving parts, and it’s very easy to host on sites like Github. It’s a great idea to have your blog be a static site, because it makes maintaining it very simple.

One way to create a static site is to manually edit HTML, then upload the folder full of HTML to a server. In this scenario, you would at a minimum need an index.html file. If your website URL was thebestblog.com, and visitors went to http://www.thebestblog.com, they would be shown the contents of index.html. Here’s how a folder of HTML might look for thebestblog.com:

    thebestblog.com
    │   index.html
    │   first-post.html
    │   how-to-use-python.html
    │   how-to-do-machine-learning.html
    │   styles.css
On the above site, visiting http://www.thebestblog.com/first-post.html would show you the content in first-post.html, and so on.
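To make the folder-plus-server idea concrete, here’s a minimal sketch using Python’s built-in http.server module. It isn’t part of the blog setup; it only shows that serving a static site means mapping URL paths to files in a folder. The serve_folder and fetch helpers and the two-file site are made up for illustration, and the directory= argument needs Python 3.7 or later (newer than the Python 3.5 recommended below).

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def serve_folder(folder):
    """Serve the files in `folder` over HTTP on a free local port.

    Returns the running server; call shutdown() and server_close() when done.
    """
    # SimpleHTTPRequestHandler maps a URL path such as /first-post.html to the
    # file with the same name inside `directory` (and / to index.html).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(folder)
    )
    # Port 0 lets the operating system pick any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """Request `path` from `server` the way a browser would and return the body."""
    host, port = server.server_address
    with urllib.request.urlopen(f"http://{host}:{port}{path}") as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as site:
        # A two-file stand-in for the thebestblog.com folder.
        pathlib.Path(site, "index.html").write_text("<h1>Home</h1>", encoding="utf-8")
        pathlib.Path(site, "first-post.html").write_text(
            "<h1>First post!</h1>", encoding="utf-8"
        )
        server = serve_folder(site)
        try:
            print(fetch(server, "/"))                 # contents of index.html
            print(fetch(server, "/first-post.html"))  # contents of first-post.html
        finally:
            server.shutdown()
            server.server_close()
```

From a terminal, running python -m http.server 8000 inside a folder serves that folder in the same way.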
first-post.html might look like this:
    <html>
    <head>
        <title>The best blog!</title>
        <meta name="description" content="The best blog!"/>
        <link rel="stylesheet" href="styles.css" />
    </head>
    <body>
        <h1>First post!</h1>
        <p>This is the first post in what will soon become (if it already isn't) the best blog.</p>
        <p>Future posts will teach you about data science.</p>
        <div class="footer">
            <p>Thanks for visiting!</p>
        </div>
    </body>
    </html>
You might immediately notice a few problems with manually editing HTML:
- Manually editing HTML is incredibly painful.
- If you want to make multiple posts, you have to copy over styles and other elements, like the title and footer.
Generally, when you’re blogging, you want to focus on the content, not spend time fighting with HTML. Thankfully, you can create a blog without hand-editing HTML using tools known as static site generators.
Static Site Generators
Static site generators allow you to write blog posts in simple formats, usually Markdown, then define some settings. The generators then convert your posts into HTML automatically. Using a static site generator, we’d be able to dramatically simplify first-post.html into this Markdown:
    # First post!

    This is the first post in what will soon become (if it already isn't) the best blog.

    Future posts will teach you about data science.
This is much easier to manage than the HTML file! Common elements, like the title and the footer, can be placed into templates, so they can be easily changed.
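To see what a generator does under the hood, here’s a deliberately tiny sketch using only Python’s standard library. It is not how Pelican works internally: it understands only "# " headings and plain paragraphs, and the PAGE template and function names are made up for illustration. The point is the split between the content (the Markdown) and the shared layout (the template).

```python
import html
from string import Template

# Shared layout: every page gets the same <head>, stylesheet link, and footer.
PAGE = Template("""<html>
<head>
<title>$site_title</title>
<link rel="stylesheet" href="styles.css" />
</head>
<body>
$body
<div class="footer"><p>Thanks for visiting!</p></div>
</body>
</html>
""")


def markdown_to_html(text):
    """Convert a tiny Markdown subset: '# ' headings and plain paragraphs.

    Blocks are separated by blank lines; anything else in Markdown
    (links, lists, emphasis, code) is NOT handled by this toy.
    """
    parts = []
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("# "):
            parts.append(f"<h1>{html.escape(block[2:], quote=False)}</h1>")
        else:
            parts.append(f"<p>{html.escape(block, quote=False)}</p>")
    return "\n".join(parts)


def render_post(markdown_text, site_title="The best blog!"):
    """Fill the shared template with one post's converted body."""
    return PAGE.substitute(
        site_title=html.escape(site_title, quote=False),
        body=markdown_to_html(markdown_text),
    )


if __name__ == "__main__":
    post = (
        "# First post!\n\n"
        "This is the first post in what will soon become "
        "(if it already isn't) the best blog.\n\n"
        "Future posts will teach you about data science."
    )
    print(render_post(post))
```

Changing the footer in PAGE would change it on every generated page at once, which is exactly what makes templates easier to maintain than hand-copied HTML.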
There are a few different static site generators. One of the most popular is Jekyll, which is written in Ruby. Since we’ll be making a data science blog, we want a static site generator that can process Jupyter notebooks.
Pelican is a static site generator written in Python that can take in Jupyter notebook files and convert them to HTML blog posts. Pelican also makes it easy to deploy our blog to Github Pages, where other people can read it.
Before we get started, here’s a repo that’s an example of what we’ll eventually get to.
If you don’t have Python installed, you’ll need to do some preliminary setup before we get started. Here are setup instructions for Python. We recommend using Python 3.5. Once you have Python installed:
- Create a folder. We’ll put our blog content and styles in this folder. We’ll refer to it in this tutorial as jupyter-blog, but you can call it whatever you want.
- Create a file called .gitignore, and add in the content from this file. We’ll eventually need to commit our repo to git, and this will exclude some files when we do.
- Create and activate a virtual environment.
- Create a file called requirements.txt in jupyter-blog, with the following content:

      Markdown==2.6.6
      pelican==3.6.3
      jupyter>=1.0
      ipython>=4.0
      nbconvert>=4.0
      beautifulsoup4
      ghp-import==0.4.1
      matplotlib==1.5.1

- Run pip install -r requirements.txt in jupyter-blog to install all of the packages in requirements.txt.
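The virtual environment step is usually done from the terminal with python -m venv, and the same functionality is available from Python’s standard library. This minimal sketch uses venv.create; the create_env name and the env folder name are assumptions for illustration, not something the rest of this post relies on.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return its activation script path.

    With with_pip=True this matches what `python -m venv env_dir` does.
    """
    venv.create(env_dir, with_pip=with_pip)
    # venv puts its scripts in Scripts\ on Windows and bin/ everywhere else.
    if sys.platform == "win32":
        return Path(env_dir, "Scripts", "activate.bat")
    return Path(env_dir, "bin", "activate")
```

Calling create_env("env") from inside jupyter-blog would create jupyter-blog/env. Activation changes your current shell, so a Python script can’t do it for you: afterwards, run source env/bin/activate on macOS or Linux, or env\Scripts\activate on Windows. If you keep the environment inside jupyter-blog, check that your .gitignore excludes it.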
Creating your data science blog
Once you’ve done the preliminary setup, you’re ready to create your blog! Run pelican-quickstart in jupyter-blog to start an interactive setup sequence for your blog. You’ll get a sequence of questions that will help you set up your blog properly. For most of the questions, it’s okay to just hit Enter and accept the default value. The only ones you should fill out are the title of the website, the author of the website, n for the URL prefix, and the timezone. Here’s an example:
    (jupyter-blog)➜ jupyter-blog ✗ pelican-quickstart
    Welcome to pelican-quickstart v3.6.3.

    This script will help you create a new Pelican-based website.

    Please answer the following questions so this script can generate the files needed by Pelican.

    > Where do you want to create your new web site? [.]
    > What will be the title of this web site? Vik's Blog
    > Who will be the author of this web site? Vik Paruchuri
    > What will be the default language of this web site? [en]
    > Do you want to specify a URL prefix? e.g., http://example.com (Y/n) n
    > Do you want to enable article pagination? (Y/n)
    > How many articles per page do you want?
    > What is your time zone? [Europe/Paris] America/Los_Angeles
    > Do you want to generate a Fabfile/Makefile to automate generation and publishing? (Y/n)
    > Do you want an auto-reload & simpleHTTP script to assist with theme and site development? (Y/n)
    > Do you want to upload your website using FTP? (y/N)
    > Do you want to upload your website using SSH? (y/N)
    > Do you want to upload your website using Dropbox? (y/N)
    > Do you want to upload your website using S3? (y/N)
    > Do you want to upload your website using Rackspace Cloud Files? (y/N)
    > Do you want to upload your website using GitHub Pages? (y/N)
After running pelican-quickstart, you should have two new folders in jupyter-blog, content and output, along with several files, such as pelicanconf.py and publishconf.py. Here’s an example of what should be in the folder:
    jupyter-blog
    │   output
    │   content
    │   .gitignore
    │   develop_server.sh
    │   fabfile.py
    │   Makefile
    │   requirements.txt
    │   pelicanconf.py
    │   publishconf.py
Installing the Jupyter plugin
Pelican doesn’t support writing blog posts using Jupyter by default – we’ll need to install a plugin that enables this behavior. We’ll install the plugin as a git submodule to make it easier to manage. If you don’t have git installed, you can find instructions here. Once you have git installed:
- Run git init to initialize the current folder as a git repository.
- Create the folder plugins.
- Run git submodule add https://github.com/danielfrg/pelican-ipynb.git plugins/ipynb to add in the plugin.
You should now have a .gitmodules file and a plugins folder:

    jupyter-blog
    │   output
    │   content
    │   plugins
    │   .gitignore
    │   .gitmodules
    │   develop_server.sh
    │   fabfile.py
    │   Makefile
    │   requirements.txt
    │   pelicanconf.py
    │   publishconf.py
In order to activate the plugin, we’ll need to modify pelicanconf.py and add these lines at the bottom:

    MARKUP = ('md', 'ipynb')
    # PLUGIN_PATHS is a list; it replaced the older single-string PLUGIN_PATH setting.
    PLUGIN_PATHS = ['./plugins']
    PLUGINS = ['ipynb.markup']
These lines tell Pelican to read both Markdown (md) and notebook (ipynb) files, where to look for plugins, and to activate the ipynb plugin when generating HTML.
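Because pelicanconf.py is ordinary Python, you can sanity-check these settings before building. Here’s a small sketch of a hypothetical helper (check_ipynb_settings is not part of Pelican). It accepts either PLUGIN_PATHS or the older PLUGIN_PATH, and it assumes relative plugin paths are meant relative to the folder that holds pelicanconf.py.

```python
import runpy
from pathlib import Path


def check_ipynb_settings(conf_path="pelicanconf.py"):
    """Return a list of problems with the ipynb plugin settings (empty if none found)."""
    # pelicanconf.py is plain Python, so runpy can execute it and hand back
    # its top-level names as a dict.
    settings = runpy.run_path(str(conf_path))
    problems = []

    if "ipynb" not in settings.get("MARKUP", ()):
        problems.append("MARKUP does not include 'ipynb'")
    if "ipynb.markup" not in settings.get("PLUGINS", []):
        problems.append("PLUGINS does not include 'ipynb.markup'")

    # Accept the list-valued PLUGIN_PATHS or the older PLUGIN_PATH string.
    paths = settings.get("PLUGIN_PATHS") or settings.get("PLUGIN_PATH") or []
    if isinstance(paths, str):
        paths = [paths]
    if not paths:
        problems.append("no plugin path is configured")

    # Assumption: relative plugin paths resolve against the settings file's folder.
    base = Path(conf_path).resolve().parent
    for path in paths:
        if not (base / path / "ipynb").is_dir():
            problems.append(f"no ipynb plugin folder found under {path!r}")
    return problems
```

Run from inside jupyter-blog, check_ipynb_settings() should return an empty list once the submodule is in place and the lines above are added. Note that runpy executes the file, so only point it at settings files you trust.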
Writing your first post
Once the plugin is installed, we can create a first post:
- Create a Jupyter notebook with some basic content. Here’s an example you can download if you want.
- Copy the notebook file into the content folder.
- Create a file that has the same name as your notebook, but with the extension .ipynb-meta. Here’s an example.
- Add the following content to the ipynb-meta file, but change the fields to match your own post:

      Title: First Post
      Slug: first-post
      Date: 2016-06-08 20:00
      Category: posts
      Tags: python, firsts
      Author: Vik Paruchuri
      Summary: My first post, read it to find out.

Here’s an explanation of the fields:

- Title – the title of the post.
- Slug – the path at which the post will be accessed on the server. If the slug is first-post, and your server is jupyter-blog.com, you’d access the post at jupyter-blog.com/first-post.html.
- Date – the date the post will be published.
- Category – a category for the post. This can be anything.
- Tags – a comma-separated list of tags to use for the post. These can be anything.
- Author – the name of the author of the post.
- Summary – a short summary of your post.
You’ll need to copy in a notebook file and create an ipynb-meta file whenever you want to add a new post to your blog.
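Since every post needs one of these files, a small helper can save some typing. This is a hypothetical convenience script, not part of Pelican or the pelican-ipynb plugin: write_meta writes the same "Field: value" lines as the example meta file next to a notebook, defaults the slug to the notebook’s file name, and rejects misspelled field names.

```python
from pathlib import Path

# Field order matches the example .ipynb-meta file.
FIELDS = ("Title", "Slug", "Date", "Category", "Tags", "Author", "Summary")


def write_meta(notebook_path, **values):
    """Write <notebook>.ipynb-meta next to a notebook, one 'Field: value' per line.

    Pass fields as lower-case keyword arguments (title=..., date=...).
    Fields that are not given are left out. The slug defaults to the
    notebook's file name without the extension.
    """
    notebook = Path(notebook_path)
    if notebook.suffix != ".ipynb":
        raise ValueError(f"expected a .ipynb file, got {notebook.name!r}")
    unknown = set(values) - {field.lower() for field in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    values.setdefault("slug", notebook.stem)
    lines = [
        f"{field}: {values[field.lower()]}"
        for field in FIELDS
        if values.get(field.lower()) is not None
    ]
    # first-post.ipynb -> first-post.ipynb-meta, in the same folder.
    meta_path = notebook.with_name(notebook.name + "-meta")
    meta_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return meta_path
```

For example, running write_meta("content/first-post.ipynb", title="First Post", date="2016-06-08 20:00", category="posts", author="Vik Paruchuri") from jupyter-blog would create content/first-post.ipynb-meta with a slug of first-post.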
Once you’ve created the notebook and the meta file, you’re ready to generate your blog HTML files. Here’s an example of what the jupyter-blog folder should look like now:

    jupyter-blog
    │   output
    │   content
    │   │   first-post.ipynb
    │   │   first-post.ipynb-meta
    │   plugins
    │   .gitignore
    │   .gitmodules
    │   develop_server.sh
    │   fabfile.py
    │   Makefile
    │   requirements.txt
    │   pelicanconf.py
    │   publishconf.py
In order to generate HTML from our post, we’ll need to run Pelican to convert the notebooks to HTML, then run a local server to be able to view them:
- Switch to the jupyter-blog folder.
- Run pelican content to generate the HTML.
- Switch to the output folder.
- Run python -m pelican.server.
- Visit localhost:8000 in your browser to preview the blog.
You should be able to browse a listing of all the posts in your blog, along with the specific post you created.
Creating a Github Page
Github Pages is a feature of Github that allows you to quickly deploy a static site and let anyone access it using a unique URL. In order to set it up, you’ll need to:
- Sign up for Github if you haven’t already.
- Create a repository called username.github.io, where username is your Github username. Here’s a more detailed guide on how to do this.
- Switch to the jupyter-blog folder.
- Add the repository as a remote for your local git repository by running git remote add origin git@github.com:username/username.github.io.git – replace both references to username with your Github username.
A Github page will display whatever HTML files are pushed up to the master branch of the repository username.github.io at the URL username.github.io (the repository name and the URL are the same).
First, we’ll need to modify Pelican so that URLs point to the right spot:

- Edit SITEURL in publishconf.py, so that it is set to https://username.github.io, where username is your Github username.
- Run pelican content -s publishconf.py.

When you want to preview your blog locally, run pelican content. Before you deploy, run pelican content -s publishconf.py, which uses the correct settings file for deployment.
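If you’d rather script the SITEURL edit in publishconf.py than make it by hand, here’s a minimal sketch. set_siteurl is a hypothetical helper, not a Pelican command. It assumes the file assigns SITEURL on a line of its own (pelican-quickstart typically writes a SITEURL = '' line when you skip the URL prefix) and it writes an https URL.

```python
import re
from pathlib import Path

# Matches a whole line that starts with SITEURL, e.g. "SITEURL = ''".
SITEURL_LINE = re.compile(r"^SITEURL\s*=.*$", re.MULTILINE)


def set_siteurl(conf_path, username):
    """Set SITEURL in a Pelican settings file to https://<username>.github.io.

    Replaces the first top-level SITEURL assignment, or appends one if
    the file has none. Returns the URL that was written.
    """
    url = f"https://{username}.github.io"
    new_line = f"SITEURL = '{url}'"
    path = Path(conf_path)
    text = path.read_text(encoding="utf-8")
    if SITEURL_LINE.search(text):
        # A function replacement stops re from treating backslashes specially.
        text = SITEURL_LINE.sub(lambda match: new_line, text, count=1)
    else:
        text = text.rstrip("\n") + "\n" + new_line + "\n"
    path.write_text(text, encoding="utf-8")
    return url
```

Running set_siteurl("publishconf.py", "username") from jupyter-blog, with your own Github username, leaves pelicanconf.py untouched, so local previews keep working as before.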
Committing your files
If you want to store your actual notebooks and other files in the same Git repo as a Github Page, you can use git branches.
- Run git checkout -b dev to create and switch to a branch called dev. We can’t use master to store our notebooks, since that’s the branch that’s used by Github Pages.
- Create a commit and push to Github like normal (using git add, git commit, and git push).
Deploy to Github Pages
We’ll need to add the content of the blog to the master branch for Github Pages to work properly. Currently, the HTML content is inside the folder output, but we need it to be at the root of the repository, not in a subfolder. We can use the ghp-import tool for this:
- Run ghp-import output -b master to import everything in the output folder to the master branch.
- Run git push origin master to push your content to Github.
- Try visiting username.github.io – you should see your page!
Whenever you make a change to your blog, just re-run the pelican content -s publishconf.py, ghp-import, and git push commands above, and your Github Page will be updated.
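Since those steps always run in the same order, you might wrap them in a small script. This is a sketch, not part of Pelican: deploy runs the three commands with Python’s subprocess module and stops at the first failure, so a broken build is never pushed. It assumes you run it from jupyter-blog with pelican, ghp-import, and git on your PATH (for example, inside the activated virtual environment).

```python
import subprocess

# The three deploy commands, in the order they must run.
DEPLOY_STEPS = [
    ["pelican", "content", "-s", "publishconf.py"],  # build with publish settings
    ["ghp-import", "output", "-b", "master"],        # copy output/ onto master
    ["git", "push", "origin", "master"],             # publish to Github Pages
]


def deploy(run=subprocess.run):
    """Run each deploy step, stopping at the first one that fails.

    `run` is injectable so the sequence can be tested without side effects.
    """
    for step in DEPLOY_STEPS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run after a failure.
        run(step, check=True)
```

To use it, save it as, say, deploy.py, add if __name__ == "__main__": deploy() at the bottom, and run python deploy.py.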
We’ve come a long way! You should now be able to author blog posts and push them to Github Pages. Anyone should be able to access your blog at username.github.io (replacing username with your Github username). This gives you a great way to show off your data science portfolio.
As you write more posts and gain an audience, you may want to dive more into a few areas:
- Pelican supports themes. You can see dozens of themes here, and use the one you like most.
- Your own custom URL. username.github.io is nice, but sometimes you want a more custom domain. Here’s a guide on using a custom domain with Github Pages.
- Check out the list of plugins here. Plugins can help you set up analytics, commenting, and more.