
How Data Scientist Yassine Alouini keeps his skills sharp

To highlight how Dataquest has changed people's lives, we've started a new blog series called User Stories, where we interview our users to learn more about their personal journeys and how we've helped them get where they needed to go.

In this post, we interview Yassine Alouini, Data Scientist at Qucit. Yassine got into data science by freelancing, and has built up some impressive skills along the way. He's done everything from analyzing data, creating predictive models, and building data pipelines to creating interactive visualizations and web applications. You can see some of the projects he's currently working on by visiting his GitHub profile.


You freelanced for some time after completing your master's degree. How did you get started with freelancing?

To be honest, I used several avenues to get started with freelancing. I signed up for a few platforms where you create a profile and then get contacted by interested clients. I also asked friends if they knew anyone who needed help with statistical analysis. This second technique has been the most successful.

Overall, the hardest thing was getting started. Once I had found my first mission, the next ones were much easier to get. I gained more experience and had a portfolio of accomplishments to show.

You went from freelancing to a full-time data science position in a short time. What skills did you need to learn in between?

While freelancing I was also looking for jobs. Freelancing helped me a lot in three different ways:

  1. I became more confident and knew how to ask the right questions when working on a project. This skill is extremely valuable for data scientists.
  2. I learned how to manage a relationship with a client. It wasn't always easy and I made some mistakes along the way.
  3. I gained new technical skills. I learned more details about algorithms I already knew and experimented with new ones (mainly in econometrics).

At the same time, I was competing in some Kaggle challenges and read a lot about machine learning, data visualization and data science in general.

You're now employed as a data scientist at Qucit. What are the primary tools and techniques you use?

I mainly use these Python libraries: pandas, scikit-learn, NumPy and Statsmodels for machine learning and statistics; Matplotlib, Seaborn, ggplot and Bokeh for visualization; and SQLAlchemy for interacting with databases. In general, the Python ecosystem for data science is excellent. I also use D3.js for dynamic data visualization. I have also used AngularJS for web and mobile (with Ionic) development.

When working, I tend to switch between Jupyter Notebooks for exploration phases and IPython alongside Atom (a text editor) for production code. I also use Jupyter Notebooks (slides mode) for presentations. One neat thing about Jupyter Notebooks is that you can work with different languages on the same interface. You can also interact with your terminal without leaving the notebook. That is something awesome.

What made you decide to start using Dataquest?

I am a huge fan of online courses (I did a lot of MOOCs while I was freelancing) and think they are the future of education. One night I was looking for a new data science course, but I wanted something different from the usual MOOC experience of "you watch videos and then take challenges and tests". I wanted something more interactive. I then came across Dataquest.

At the beginning (around April 2015), I used it lightly and took some of the Python visualization challenges but I was already hooked. Almost two months ago, I decided it was time to get a paid subscription. It has been a very rewarding experience so far :)

How has Dataquest helped with your learning process?

Dataquest helped me gain more in-depth knowledge of data science subjects. For instance, I have been using Matplotlib for quite some time but never really understood the internals until recently. It has also helped me organize my thoughts and gain more confidence when working with data. Finally, I usually think about the learning experience it provides as a game, and it has been a very enjoyable one so far.

How important do you think projects are to the process of learning data science?

Data science is hard, and it becomes harder if one relies only on theory. One must practice to become better at the trade. In fact, learning through projects is very rewarding. For each new project, you encounter new challenges (data in a bad format, correlated features, data that is hard to visualize, overfitted models, etc.) and you immediately gain actionable insights.

If you learn data science techniques without confronting them with a real project, then your experience is incomplete.
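
As a small illustration of one challenge named in the answer above, correlated features, here is a minimal sketch of how such pairs might be flagged before modeling. It is not taken from the interview: the helper names (`pearson`, `correlated_pairs`), the 0.9 threshold, and the toy data are all assumptions made for the example, and in practice a library such as pandas would usually do this work.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    names = sorted(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical toy data: temp_f is just temp_c converted to Fahrenheit,
# so the two columns carry the same information.
features = {
    "temp_c": [10.0, 15.0, 20.0, 25.0, 30.0],
    "temp_f": [50.0, 59.0, 68.0, 77.0, 86.0],
    "bikes": [3.0, 9.0, 4.0, 8.0, 5.0],
}

flagged = correlated_pairs(features)
for a, b, r in flagged:
    print(f"{a} and {b} are strongly correlated (r = {r:.2f})")
```

With this toy data, only the two temperature columns are flagged, which suggests that one of them could be dropped.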

What are the biggest misconceptions that you see out there about learning data science and getting a job?

As I said above, data science is hard. Learning it is also hard for many reasons, mainly because the field is still in its infancy and the tools available are still maturing. Some people assume that you can take some Python courses and a few MOOCs and become a great data scientist. This is of course far from the truth. To be good at the trade of data science, one needs to learn constantly.

That being said, people should not think that it is impossible to learn it on their own. The learning path is becoming clearer. More and more courses and resources are available. Finally, a lot of great leaders in the field are showing the way for the new data science generation.

What are you most excited about learning in 2016?

I have been reading a lot about Spark in recent months and want to learn to use it. I have noticed that there is a Dataquest section on it. I will try it out as soon as possible.

In addition to Apache Spark, I am also getting more interested in deep learning (specifically deep reinforcement learning). I have started playing with a recurrent neural network. So far, I have managed to set up a GPU cloud instance with everything needed to train a network. I am planning to implement one using Theano (or maybe the newer TensorFlow) and write a blog post about it.

Finally, I have used gradient boosted trees (generally through the xgboost implementation) for a few Kaggle challenges without understanding all the details. Thus, I am planning to learn more about these meta-models.