Trillions of pixels have been deployed to answer the question ‘What makes a good data scientist?’ Most of these articles have focused on the skills and tools of data science, while almost none have discussed the personalities that make good, even great, data scientists. A Google search for ‘data science skills’ returns 38 million results; ‘data scientist traits’ yields an anemic 938,000 results.
Given the range of free, nearly-free, and paid training on the internet, just about anyone can master the tools and skills that go into the practice of data science. But gaining those tools and applying them well requires a set of traits that are hard to identify and still harder to master.
What is a trait?
A skill is a set of technical or practical knowledge, applicable in a limited set of circumstances. Programming is a skill, so is statistical analysis. For that matter, so is baking. But knowing the correct proportion of liquid to dry ingredients in a cake is unlikely to be useful in building that sorting algorithm. And the characteristics of a Gaussian distribution won't keep your biscuits light and flaky.
A trait, on the other hand, is a mental habit with broad applications in dealing with Life, the Universe, and Everything. We might think of traits as virtues in the ancient or religious sense. To return to our baking example, Python won’t help the dough rise, but patience will prevent us from throwing the mixing bowl across the room when it doesn’t. Contrary to common belief, traits aren't set in stone, and we can develop the ones we lack over time.
When I set out to identify the k-traits of a great data scientist, I talked to the data scientists I know personally and those I follow on Twitter, and read many of the articles that smarter people have written on the subject. That produced about 15 traits, which is way too many.
I ran a primitive clustering algorithm – my journalist brain – on those fifteen and came up with five essential traits of a great data scientist: curiosity, clarity, creativity, skepticism, and humility.
In many ways, curiosity - that insatiable hunger for knowledge and understanding - is the first and foremost trait of a data scientist. Our job is to ask questions of data and of people. Most people don't know or care about the limitations of data, so we must be curious about what they are doing and what they want to achieve. And our field is evolving so quickly that we have to maintain our interest to maintain our edge.
To develop the kind of curiosity that helps us build and maintain our skills, we ask questions that we 'should' already know the answer to. It's a challenging trait to develop, and it requires a stalwart paragon to exemplify it: Elle Woods.
The perennially perky heroine of 'Legally Blonde' (if you haven't seen this masterpiece, stop reading and do so immediately) had a gift for unabashedly asking questions that were downright elementary, even in areas of her own expertise, out of genuine curiosity about people and subjects. She seems to have had it as a natural gift, so if we want to develop it we should set our egos aside and be unafraid of not being the smartest person in the room.
Clarity is an essential outgrowth of curiosity. Whether we’re writing code, performing an analysis, or just cleaning up messy data, we should understand what we’re doing and why we’re doing it. It is, after all, "data science" not "data randomly trying things." We can think of data science as resting on the back of a great turtle, which in turn rests on another turtle. It's turtles all the way down, and we should be able to explain each turtle as if our interlocutor is five, or holds five PhDs.
We develop clarity by constantly asking two questions of ourselves: "Why?" and "So what?" For every step you take in your analysis, ask yourself why you're doing it, and what it means for both the specific project and the larger context of what you're doing. And, like a curious toddler or surly teen, keep asking those questions until you get to an irreducible answer. In other words, use "Why" and "So what" to see the turtles all the way down.
Creativity is probably the most controversial element of this list, mostly because it's misunderstood. People by and large see it as binary: either someone has it or they don't. Mozart, for example, is popularly thought to have pulled his music from the aether, publishing full operas by age eight purely on the basis of native creativity. The story becomes less romantic when you learn that Leopold Mozart, Wolfgang's father, was a successful music teacher who used the young Wolfgang as a lab rat for his pedagogical methods almost from the boy's birth.
With all due respect to Mozart, who certainly had innate musical genius, creativity can be learned and developed in much the same way an athlete develops the skills for their sport.
Try these regular creative pursuits:
- Unedited stream-of-consciousness writing (e.g., 'morning pages')
- Reading articles well outside of anything you know
Small, spontaneous changes to your life, like altering your route to work or solving a familiar problem in a different way, will also help you develop your creativity. I should point out that much smarter people than me dedicate their lives to studying creativity (Google returns 167 million hits for 'develop creativity'), so there is a whole world of resources to help you bolster yours. (Personally, I'm very fond of Lewis Carroll's injunction to believe six impossible things before breakfast.)
Even as we want to be ever more creative, we must keep at least one foot on the ground, and a healthy skepticism helps us do exactly that. Skepticism reins in our creativity, keeping us in the real world rather than down the rabbit hole.
But how do you develop skepticism while keeping up the optimistic curiosity that drives you to keep learning? Begin by remembering that asking questions like Elle Woods doesn't mean taking every answer at face value. Keep up the curiosity and explore your data, but remember that it's only as good as the methods that collected it. Find out, and then examine, the assumptions and expectations of the people who gave you the data. When you build your model, look at your own assumptions, whether they match the assumptions of the model, and whether they actually fit what the data say.
As the eminently quotable statistician George E. P. Box famously said, ‘All models are wrong, but some are useful.’ By maintaining this skeptical attitude, we can build self-regulation into the optimism inherent in data science.
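One way to put this kind of skepticism into practice is to ask whether a model is actually more useful than doing almost nothing. Here is a minimal sketch in Python, assuming a simple straight-line model and a 'predict the average' baseline. The data and the function names (`fit_line`, `mean_abs_error`, `holdout_errors`) are invented for illustration and don't come from any particular library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(predicted, actual):
    """Average absolute gap between predictions and observed values."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def holdout_errors(train, test):
    """Return (model_error, baseline_error) measured on held-out data."""
    train_x, train_y = zip(*train)
    test_x, test_y = zip(*test)
    slope, intercept = fit_line(train_x, train_y)
    model_pred = [slope * x + intercept for x in test_x]
    # Naive baseline: always predict the average of the training targets.
    baseline_pred = [mean(train_y)] * len(test_y)
    return (mean_abs_error(model_pred, test_y),
            mean_abs_error(baseline_pred, test_y))


# Hypothetical toy data in which y is roughly 2x + 1.
train = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]
test = [(5, 11.0), (6, 13.1)]

model_err, baseline_err = holdout_errors(train, test)
print(f"model MAE = {model_err:.2f}, baseline MAE = {baseline_err:.2f}")
```

If the model can't beat the baseline on data it hasn't seen, that's a hint that its assumptions (here, a straight-line relationship) may not fit what the data say.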
Professor Box brings us to the final, unifying trait of a data scientist: humility. Data science is not magic, and we are not wizards. The curious data scientist knows that he doesn’t know everything and is always looking for new things to learn. The clear-eyed data scientist swallows her pride and adapts her presentation to her audience, even if it means forgoing some ingenious technique she built. The creative data scientist thinks ‘outside the box,’ even if it feels silly. And, of course, the skeptical data scientist knows to mistrust her data and her models, evaluate them all with sharp clarity, and present her results with all the necessary caveats.
Tl;dr: The five traits of a data scientist are
- Curiosity: Cultivate your inner Elle Woods
- Clarity: Be able to explain what you’re doing like I’m five, or have five PhDs
- Creativity: Think differently, laterally, better
- Skepticism: All models are wrong, some are useful
- Humility: You’re not a wizard, Harriet
And there you have it: the essential traits of a data scientist. As I said at the beginning, though, my original list was fifteen, so I'm sure there's room for improvement. What do you all think? Did I miss something essential? Is even this brief list too long?