September 28, 2022

Should I Learn Python 2 or 3? (and Why It Matters)

If you’re wondering whether you should learn Python 2 or Python 3, you’re not alone.

This is one of the questions we hear most often at Dataquest, where we teach Python as part of our Data Science curriculum.

Let’s settle this issue. 

Spoiler: at Dataquest, we only teach Python 3. If you’re ready to start learning now, enroll in the Introduction to Python course at no charge! 

If you’re still on the fence or just want more information, keep reading. This post gives some context behind the question, explains our position, and tells you which version you should learn.

If you want the short answer, here it is: You should learn Python 3 because it’s the version most relevant to today’s data science projects. Plus, it’s easy to learn, and there are few compatibility issues to worry about. 

Need a more thorough explanation? Let’s start by taking a brief look at the history behind the Python 2 versus 3 debate.

2008: The Birth of Python 3.0

No, that’s not a typo. Python 3 was released way back in 2008. 

If you’re new to the Python 2-3 controversy, take note that this squabble has been brewing for nearly a decade and a half! That alone should tell you what a huge deal it is.

The Backwards Incompatible Release of 2008

Python 3.0 was released on December 3, 2008. What was special (read: infuriating) about this release is that it was deliberately backwards incompatible: code written for Python 2.x often would not run unchanged under Python 3.

As a result of this backwards incompatibility, migrating projects over from Python 2.x to 3.x would require large changes. This included not only individual projects and applications, but also all the libraries that form part of the Python ecosystem.
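To make the incompatibility concrete, here is a small sketch of a few well-known behaviors that changed between the two versions. The code runs on Python 3, and the comments note what Python 2 did instead. The helper name `python3_behaviors` is purely illustrative.

```python
def python3_behaviors():
    """Collect a few results whose behavior changed between Python 2 and 3."""
    results = {}
    # Python 3: "/" is true division, so 7 / 2 == 3.5.
    # Python 2: "/" on two ints truncated, so 7 / 2 == 3.
    results["true_division"] = 7 / 2
    # "//" (floor division) gives 3 on both versions.
    results["floor_division"] = 7 // 2
    # Python 3: text literals are Unicode str and bytes is a separate type,
    # so a str never compares equal to a bytes object.
    # Python 2: "abc" was already a byte string, so "abc" == b"abc".
    results["text_is_str"] = isinstance("café", str)
    results["str_equals_bytes"] = ("abc" == b"abc")
    # Python 3: print is a built-in function; Python 2's
    # `print "hello"` statement is a SyntaxError in Python 3.
    results["print_is_function"] = callable(print)
    # Python 3: dict.keys() returns a view object; Python 2 returned a list.
    results["keys_is_list"] = isinstance({"a": 1}.keys(), list)
    return results


if __name__ == "__main__":
    for name, value in python3_behaviors().items():
        print(name, value)
```

Multiply small differences like these across an entire codebase (and all of its dependencies) and it becomes clear why migration was so painful.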

Python 3 Backlash

At the time, the change was extremely controversial, and many projects resisted the pain of moving over, especially in the scientific Python community. For example, it took nearly two years for NumPy, the main numerical library, to ship its first Python 3-compatible release!

Other projects released 3.x-compatible versions in the years that followed. By 2012, many libraries supported 3.x, but most were still being written primarily for 2.x. Several tools were also released to help port older codebases from Python 2 to Python 3, but there was still strong resistance to the move.

Originally, the Python core developers had scheduled the “end of life” date for Python 2.x for 2015. In 2014, though, they announced they would extend it to 2020. This was done, in part, to relieve worries for users who could not yet migrate to Python 3.

Still, it became clear that Python 2’s days were numbered. In 2017, the popular web framework Django announced that its new 2.0 version would not support Python 2.x.

In addition, numerous packages began announcing the end of their support for 2.x. Even scientific libraries committed to dropping 2.x support by 2020 or sooner.

Fast-Forward to Today: Why Is This Still a Question?

Today, there are very few libraries that do not support Python 3.

But if Python 2.x reached its end of life on January 1, 2020, why is there still confusion surrounding the Python 2 versus 3 issue?

The answer is two-fold. 

For one, there are a lot of older, free online resources for learning Python that are based on Python 2. This includes many MOOCs from platforms like Coursera, Udemy, and edX.

Since data science students are always looking to save a buck, these free resources are tempting. 

Plus, Zed Shaw’s extremely popular Learn Python the Hard Way was written in Python 2.x, and he didn’t write a book about Python 3 until 2017, almost a full decade after Python 3’s release!

Until recently, I thought this was just because Zed had been too lazy to update his book for all those years. But then I found his controversial article, “The Case Against Python 3.”

Despite Eevee’s excellent rebuttal for Python 3, Zed’s diatribe took its toll. Of course, people who agree with Zed are an extreme minority today. But the controversy slowed the transition from Python 2 to 3, and it muddied the waters for many newcomers to the field.

So Which Python Should I Learn?

With all the debate over Python 2 versus 3, you’d think the decision to learn one or the other would be a difficult one. In reality, though, it’s pretty straightforward. 

Python 3 Is the Clear Winner

Python 3.x is the future, and with Python 2.x no longer supported, you should spend your time learning the version that will endure.

And if you’re worried about compatibility issues, don’t be. I’ve used Python 3.x exclusively and rarely run into compatibility issues.

Very occasionally (maybe once every 3-4 months), I’ll find I’m trying to run something that requires Python 2. In these rare cases, the virtualenv tool lets me quickly create an isolated 2.x environment on my machine (provided a Python 2 interpreter is installed) to run that piece of legacy software.
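When a script only works under one major version, a short guard near the top can make that requirement explicit, so it exits with a clear message instead of failing partway through with a less obvious error. This is a minimal sketch built on the standard `sys.version_info`; the function name `check_python_version` and its message text are hypothetical.

```python
import sys


def check_python_version(required_major, version_info=None):
    """Return None if the interpreter matches, else an explanatory message.

    `version_info` defaults to the running interpreter's sys.version_info;
    a plain tuple such as (3, 10, 0) can be passed instead for testing.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major != required_major:
        return "This script needs Python {}.x, but is running on Python {}.{}.".format(
            required_major, major, minor
        )
    return None


if __name__ == "__main__":
    # A legacy tool might call check_python_version(2) instead and exit
    # early with a clear message when launched under Python 3.
    message = check_python_version(3)
    if message is not None:
        sys.exit(message)
    print("Running on Python {}.{}".format(*sys.version_info[:2]))
```

Using `str.format` rather than f-strings is a deliberate choice here: it keeps the guard itself parseable by both Python 2.7 and Python 3.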

Don’t Waste Your Time with Python 2

Let’s be clear: Python 2 is outdated. There will be no future security or bug fixes for Python 2.x, and your time is better spent learning 3.x.

In the unlikely event that you end up working with a legacy Python 2 codebase, tools like Python-Future make it easy to work with even if you have only learned Python 3.
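Code written in the Python 3 style can often be made to run on Python 2 as well by adding a compatibility header. The sketch below assumes the Python-Future approach: the `__future__` imports are part of the standard library on both versions, while on Python 2 the `builtins` module is supplied by the `future` package (on Python 3 it is simply the standard `builtins` module). The function `average_length` is a hypothetical example.

```python
# Compatibility header in the style used by Python-Future.
# These __future__ imports are no-ops on Python 3 and switch Python 2.7
# to the Python 3 behavior for imports, division, print, and string literals.
from __future__ import absolute_import, division, print_function, unicode_literals

# On Python 3 this is the standard builtins module; on Python 2 the
# `future` package must be installed to provide it.
from builtins import str


def average_length(words):
    """Average word length, using true division on either version."""
    if not words:
        return 0.0
    total = sum(len(str(word)) for word in words)
    return total / len(words)


if __name__ == "__main__":
    print(average_length(["python", "two", "three"]))
```

The body of the function is plain Python 3; only the header exists for the sake of Python 2, which is why learning Python 3 first is enough.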

Dataquest is the best online platform for learning to be a data scientist using Python (3.x, of course!). We have graduates working at SpaceX, Amazon, and more. If that interests you, you can sign up and complete our first course for free at Dataquest.io.

Dataquest

About the author

Dataquest

Dataquest teaches through challenging exercises and projects instead of video lectures. It's the most effective way to learn the skills you need to build your data career.