Statistical theory often withdraws its support in data science practice. When this happens, all is not lost: we can double the bet on our data and still win, if we are careful and conscious about what we are doing.

Photo by Matt Koffel on Unsplash

Sometimes it seems paradoxical to call the famous bell curve “normal”. Among all the assumptions made by traditional statistical theory, the normality assumption is notorious for how often it fails to hold. My aim in this article is to show a way to test hypotheses when the normality assumption of traditional hypothesis tests is violated. In this scenario, we can't rely on theoretical results, so we need to leave theory's ivory tower and double the bet on our data. To get there, I first briefly review what hypothesis testing is, focusing on an intuitive grasp of the reasoning behind it…

Sophisticated models look good in research papers but might not pay off in business use cases

Photo by Scott Graham on Unsplash


My main point in this article is that in business contexts we should not choose between machine learning models based on predictive performance alone. I approach the problem of choosing models from an investment perspective: different models are alternative courses of action, each with its own costs and benefits. As with any other investment a business faces, we need to measure whether the benefits of each alternative outweigh its costs in order to choose wisely between them.

To make this point, I conducted a case study using Porto Seguro’s Safe Driver Prediction dataset on Kaggle. I ran different models and…

Simple statistics can get the job done without spending money and time on expensive computational routines

Photo by Dan Dimmock on Unsplash

Introduction: what am I talking about?

In this article I argue that thinking a little about statistics can save the data scientist time and save the company that employs him or her money. I do assume that the reader has some working knowledge of statistics, since I'm writing with the data scientist in mind, but only to keep the text short. I still try to explain the reasoning in the simplest terms I can think of. …

Fabio Baccarin

I'm a Brazilian data scientist living in São Paulo, Brazil. I have a bachelor's degree in Economics and a passion for statistics and books.
