In data science, communication is critical.
Of course, all data science work requires the technical skills to acquire your data, clean it, and perform your analysis. But as you're doing this, it’s also important to keep the why in mind. When you’re given a project, it’s worth stopping to ask yourself what value it has to the company, and where it fits into the larger picture.
Knowing the answers to those why questions is the first step in a process that’s as important as your actual analysis: communicating your findings to an audience of (usually) non-data scientists.
Data science communication is a topic Kristen Sosulski knows a lot about. She’s a Clinical Associate Professor of Information, Operations, and Management Sciences at New York University Stern School of Business, and she has essentially made a career out of teaching people how to communicate effectively, both in academia and in business. She’s even written a book, Data Visualization Made Simple, about communicating data science results effectively with visualizations.
“Presenting and communicating your insights across an organization can be really, really powerful,” says Kristen.
So how can you approach communicating your models in a way that’s effective?
Relating the Problem
Let’s say you’ve built a model and have the opportunity to present your findings in front of a major decision-maker in the company. It’s your job to explain what the model means and the impact it could have on the business.
Kristen advocates starting by identifying the problem or challenge you’re addressing. Relate the problem to the interests of the audience, and help them understand the larger context. To get the audience on your side, ask questions before proposing your solution. For example:
Have you ever experienced this?
Have you ever observed that in our business?
This isn’t just a rhetorical technique; it’s a way of gauging what information your audience needs to understand the rest of your pitch. “If no one thinks this is a problem, then you have to start by introducing the problem, and then building the case for the problem,” says Kristen. “You don't want to lose your audience by alienating them because they think this isn't a problem at all.”
Keep in mind that what seems like an obvious problem to you isn’t necessarily going to be obvious to your audience, particularly if you’ve spent the last few weeks with your head buried deep in data sets nobody else has seen yet. The problem you found in the data and are attempting to solve with your model could be something that nobody else is really aware of yet.
Once you’ve made the case for the problem itself, you can then present common solutions and why those aren’t the best, most effective fit.
“You want to create some type of suspense, but you're rooting all of this in a narrative,” says Kristen. “Starting with a problem, showing alternative solutions, and then you're ultimately going to reveal your solution.”
Communicating with Data
Although your pitch is often going to be primarily language-based (whether it’s a written report or a standup presentation at a meeting), representing your data visually is absolutely crucial to communicating its meaning to your audience. Very few people can look at a spreadsheet or table and draw quick, clear conclusions about what the data says. Almost anyone, though, can compare the size of bars on a bar chart or follow the trend on a line graph.
Data visualization is a crucial skill at every stage of the data science process, of course. “There are a lot of angles that you can take with visualization, and ways to look at it,” says Kristen. “You can look at it purely from the technical viewpoint, you can look at it from the exploratory viewpoint, like using visualization as a tool to explore your data.”
But it’s also critical for communication.
“I think about data visualization as something that we have in the toolkit to help people better understand our insights and our data,” says Kristen.
“Just on a human level, visualizations just allow us to perceive information a lot more clearly when they're well designed.”
When designing visuals for communication outside your own team, it’s important to keep your audience in mind. Your coworkers probably don’t have the context on your problem that your team has, and they may not have the technical knowledge, either. One of the biggest challenges of data science communication is tailoring your presentation to your audience's technical level and still getting your point across without overwhelming them (or patronizing them).
A good trick for putting yourself in the shoes of a non-technical audience is thinking about the information you want reported to you when you’ve taken a car into the auto repair shop (assuming you’re not a car mechanic yourself). Generally, the most convincing mechanics are going to be the ones who can:
- Explain your problem in clear, simple terms.
- Show you the evidence the problem exists.
- Explain in clear, simple terms how the problem can be fixed.
- Give you a clear timeline and price for the fix.
You don’t want a 30-minute lecture on the factors that affect engine efficiency. You just want to be confident that you know what the problem really is and that the mechanic knows the best way to fix it.
This applies to communicating in data science, too, but now you’re the mechanic. When in doubt, the best approach is to keep it simple. Leaving in all of the details can be confusing and make your charts less readable, so include only what is necessary to communicate your point.
“Know that you don't have to show every data point at once, that you can slow it down. You can show a few data points at a time to help build your story and your narrative,” says Kristen.
Remember: you can always provide more information by answering questions if your coworkers feel they haven’t seen enough. But if you throw a series of complicated, difficult-to-read charts at them, you risk completely losing them, and that's difficult to undo.
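Kristen's advice to show only a few data points at a time can be sketched in Python with Matplotlib. This is a minimal illustration, not a prescribed method: the function name build_reveal_frames, the step size, and the monthly signup figures are all hypothetical and invented for the example. Each returned figure would become one slide, and the axes are held fixed so the chart does not jump as points are added.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical monthly figures, invented purely for illustration.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
signups = [120, 135, 128, 160, 190, 240]


def build_reveal_frames(labels, values, step=2):
    """Return one figure per slide, each revealing `step` more points."""
    # Slide boundaries: every `step` points, always ending on the full series.
    ends = list(range(step, len(values), step)) + [len(values)]
    frames = []
    for end in ends:
        fig, ax = plt.subplots(figsize=(6, 3))
        ax.plot(range(end), values[:end], marker="o")
        # Keep the axes identical on every slide so only the data changes.
        ax.set_xlim(-0.5, len(values) - 0.5)
        ax.set_ylim(0, max(values) * 1.1)
        ax.set_xticks(range(len(labels)))
        ax.set_xticklabels(labels)
        ax.set_title(f"Signups through {labels[end - 1]}")
        frames.append(fig)
    return frames


frames = build_reveal_frames(months, signups, step=2)
```

Each figure in frames can be saved with its savefig method and dropped into consecutive slides, so the story builds at the pace you are speaking.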
Incorporating visualizations into a presentation is a bit of an art form, especially with highly technical data. To keep things simple and effective, Kristen suggests keeping a few guidelines in mind.
First: don’t force the chart to speak for itself. Make sure that you are taking the time to clearly explain what's shown on the screen. If you’re displaying data in a graph, only show one graph at a time, and explain what it’s showing and what it means in the broader context of the problem you’re addressing. You can also show where relationships exist, where outliers are, and how effective your model is compared to other models.
Pace is important, too.
“Don't go too fast, but this whole type of presentation shouldn't be more than 10 or 15 minutes,” says Kristen. “You want to make sure that you can do this type of pitch in a short period of time without overwhelming the audience with detail, but also being able to show the data clearly, and use the data as convincing evidence.”
Don’t be afraid to talk specifics. While you don’t want to overwhelm your audience with technical details, you do need to make sure you’ve included the details that are required to understand your presentation, and the charts they’ll be looking at. Are you talking about new leads generated over a period of hours, or years? Do the math for your audience. If you’re making a prediction, quantify it for them.
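As one way to "do the math" before the meeting, a small helper can turn a model's prediction into a sentence with explicit units and time frame. This is only a sketch under assumed inputs: the function describe_lift and the lead counts below are hypothetical and not taken from any real model.

```python
def describe_lift(baseline_per_week, predicted_per_week, weeks=52):
    """Phrase a predicted change in weekly leads as plain-language numbers."""
    extra_per_week = predicted_per_week - baseline_per_week
    pct_change = extra_per_week / baseline_per_week * 100
    extra_total = extra_per_week * weeks
    return (
        f"About {extra_per_week:,.0f} more leads per week "
        f"({pct_change:.0f}% above today's {baseline_per_week:,.0f}), "
        f"or roughly {extra_total:,.0f} more over {weeks} weeks."
    )


# Hypothetical numbers: 400 leads per week today, 460 predicted by the model.
summary = describe_lift(400, 460)
print(summary)
```

Stating the period (per week, over 52 weeks) alongside the percentage answers the "hours or years?" question before anyone has to ask it.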
It also helps to direct the audience’s attention to certain visualizations. It can be tough correlating spoken word with visual data. If you’re talking about a particular section on your graph, point to it. Build your story from there.
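On a slide, pointing at part of a graph can also be done in the chart itself. The sketch below, using Matplotlib, mutes every bar except the one being discussed and adds a short annotation. The function highlight_bar, the region names, and the churn percentages are hypothetical and exist only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration.
regions = ["North", "South", "East", "West"]
churn_pct = [4.1, 3.8, 9.6, 4.4]


def highlight_bar(labels, values, focus, note, ylabel):
    """Draw a bar chart with one bar emphasized and annotated."""
    fig, ax = plt.subplots(figsize=(6, 3))
    # Gray out everything except the bar the speaker is talking about.
    colors = ["tab:red" if label == focus else "lightgray" for label in labels]
    bars = ax.bar(labels, values, color=colors)
    i = labels.index(focus)
    # Categorical bars sit at x = 0, 1, 2, ... so the index is the x position.
    ax.annotate(
        note,
        xy=(i, values[i]),
        xytext=(i, values[i] + 2),
        ha="center",
        arrowprops={"arrowstyle": "->"},
    )
    ax.set_ylim(0, max(values) + 4)  # leave headroom for the annotation
    ax.set_ylabel(ylabel)
    return fig, bars


fig, bars = highlight_bar(
    regions, churn_pct, focus="East", note="More than double", ylabel="Churn (%)"
)
```

The muted bars keep the context visible while the color and arrow tell the audience where to look as you speak.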
Ultimately, you need to remember that communication is first and foremost a human interaction. “You’re the one sitting in front of the CEO, allow yourself to provide the explanations supported by the graphics, not the other way around.”
Data Science Communication Tools of the Trade
Of course, the first step in creating any presentation like this is actually creating the data visualizations. What you use to do that depends on your programming language of choice. “For me, my tool of choice is R and RStudio, and the various packages that go along with that, which are numerous,” says Kristen.
Python programmers also have a plethora of options for data visualization.
If you don’t know yet how you like visualizing data, Dataquest has interactive online courses on exploratory data visualization and storytelling with data viz in Python as well as a free course on data visualization in R. We also have a quick guide with some design tips that’ll help you make your charts easier to read.
Whatever tools you use, remember these basic tips for data science communication and you’ll have a better chance of nailing your next presentation:
- Start with the problem. Is this a problem your audience knows about already? If not, you’ll have to begin by establishing in clear terms that there is a problem.
- Have empathy for your audience and present them with the information they want in a format and in language they can understand.
- Illustrate your conclusions with data visualizations, but let your own explanation, not the charts, drive your presentation.
- Keep it simple, and leave out unnecessary detail in both your explanations and your charts. Don’t exceed 10 to 15 minutes for the whole presentation.