Data Science at Stitch Fix: An interview with Brad Klingenberg
Brad Klingenberg is the Director of Styling Algorithms at Stitch Fix in San Francisco. His team uses data and algorithms to improve the selection of merchandise sent to clients. Prior to joining Stitch Fix, Brad worked with data and predictive analytics at financial and technology companies. He studied applied mathematics at the University of Colorado at Boulder and earned his PhD in Statistics at Stanford University in 2012.
What is the mission of Stitch Fix?
Brad: Stitch Fix is an online personal styling service for women. Our mission is to help our clients look, feel and be their best selves. When clients sign up for the service they tell us about their preferences for fit and style and then we send them a personalized selection of five items that we call a “Fix”. They keep only what they like and send back the rest. We choose what to send by combining recommendation algorithms and human curation to pick the perfect inventory for our clients.
What is your role at Stitch Fix?
Brad: I lead the styling algorithms team at Stitch Fix. Broadly speaking, our goal is to help pick items that our clients will love. We develop algorithms for recommending inventory to our stylists using what we know about our clients, our inventory, and feedback from past Fixes. We also study problems like finding the best stylist for a client and understanding, measuring, and optimizing the role of human selection in a recommendation system.
Dataquest: People have labeled Stitch Fix as the Netflix for clothes, and you guys clearly have a lot of talented folks from Netflix.
What did you guys import from the Netflix culture over to Stitch Fix?
Brad: Several people at Stitch Fix, including Chief Algorithms Officer Eric Colson, came from Netflix. I met Eric while consulting there as a graduate student. The parallel in recommendations between the two companies is probably clear. There are, however, a number of ways in which the Stitch Fix problem is different. For example, at Stitch Fix we commit to our recommendations through the physical delivery of merchandise to clients – there’s a big cost to being wrong. We also use the expert judgement of human stylists to curate our recommendations.
There’s a lot to admire about the Netflix culture. At Stitch Fix, we’ve cultivated a passionate focus on our clients. As a data scientist you have the opportunity to directly improve outcomes for our clients in a variety of ways – from helping us make decisions about buying inventory, to optimizing our operations, and of course through influencing what we send to clients, to name just a few examples. The opportunity to impact the business in such tangible ways helps attract talented data scientists and engineers.
Dataquest: Stitch Fix is somewhat unique in its approach to human-computer symbiosis, since a human, not a computer, actually makes the final recommendation.
Could you talk more about Stitch Fix’s approach to human-computer symbiosis?
Brad: Personal styling is a complicated problem. Selecting the best items for a client depends on using a variety of structured and unstructured data. In this process we’ve found humans and machines to be complementary. For example, statistical models are great for making predictions from subtle patterns in large datasets in a way that a human never could. On the other hand, humans excel at processing unstructured data like images and written requests from clients, and at thinking about the client in a holistic way. Our stylists also cultivate personal relationships with the clients, by adding a personalized note to every Fix, for example.
Having humans in the loop is very effective, and it’s great for our clients. But it’s also complicated. The selections of our stylists add another feedback loop that we can use to improve our algorithms, while also introducing bias. For example, suppose that our stylists never send heavy winter sweaters to clients in hot climates. This is very likely the right thing to do for our clients, but we’d never actually observe how the sweaters would have done – this data would be effectively censored by our stylists. This can be challenging when it comes time to train a model.
Dataquest: A notorious problem when working with recommendation systems is the cold start problem.
What’s the best way for someone to learn and practice building recommendation systems with this in mind?
Brad: It would be hard to build a realistic recommendation system in a vacuum. There are a variety of popular approaches (e.g. content-based, collaborative filtering, and factorization-based methods) that are better suited to some problems than others. It really depends on your data. For example, the level of sparsity or co-occurrence in your data will likely influence whether collaborative filtering or factorization methods are viable. For the aspiring data scientist I would recommend trying to work with real data. There are some freely available datasets like MovieLens. It is also worth remembering that making recommendations shares many similarities with other supervised learning problems. General experience with building statistical models and the cycle of selecting, implementing and evaluating models will be valuable, almost regardless of the domain.
Dataquest: You have a PhD in Statistics and have spent several years working as a quantitative analyst.
What’s your advice for someone without your background and experience that aspires to work at a talented data science organization like Stitch Fix?
Brad: Learn the basics of applied statistics! When building models it is important to be practical – start with simple things. If you’re still a student, try to get an internship in industry at a company where you’ll get to work with interesting data. A well-rounded data scientist will also be a good enough engineer to get things done. Try to get experience with languages like R and Python. Understanding the basics of convex optimization will help you make practical choices when fitting models. It is useful to write some optimization code yourself, if you haven’t before. An aptitude for framing problems is as important as technical skills. Seeking out experience will help you develop this skill. Try to learn about different problems where different tools are preferred. How are they different? In academia many research problems offer good experience for framing business problems. Try to develop a comfort with ambiguity.
Dataquest: Data science is incredibly broad and it’s a field that attracts people of all backgrounds.
How does the on-boarding process at Stitch Fix help integrate newcomers into the culture?
Brad: The data science team at Stitch Fix has grown dramatically over the last two years. We have hired people from a variety of backgrounds – from physics to psychology. In many cases a specific background is less important than a history of working with data and a solid foundation in building and applying statistical models. Our team at Stitch Fix is very collaborative and our new hires get to work alongside the many other data scientists on the team while they get up to speed. Several recent hires have written about their experiences:
What skill sets do you find often either missing from or under-appreciated in data science curricula?
Brad: I think it is easy to be attracted to complexity, and that foundational tools like linear models with their extensions and basic inference are often underappreciated. These are really the workhorses for most of data science. As I’ve talked about on our tech blog, linear models have many virtues in their simplicity – they are interpretable, easy to extend and easy to scale. While seemingly primitive, they are also a surprisingly effective tool in an enormous range of problems.
Dataquest: Stitch Fix has blogged a bit about the Julia language, which has a bold mission of building the best language for technical computing from the ground up.
Does Stitch Fix use the Julia language in production?
Brad: Julia is not widely used in production at Stitch Fix. We do have some Julia enthusiasts and have had great guest lectures by folks like John Myles White. While I don’t personally use Julia in a serious way, I think it’s an exciting project and I hope it continues to offer more competition to R and Python as the dominant languages of data science.
Where can people learn more about Stitch Fix’s culture?
Brad: A great place to start is Multithreaded, our tech blog. You’ll also often find us at a number of conferences like PyData, MLconf, HCOMP, Strata and others.
Lastly, is Stitch Fix hiring?
Brad: Yes, we’re hiring! https://www.stitchfix.com/careers
Srini is currently the Director of Product at Dataquest.