What is a recovering data scientist?

“What is a recovering data scientist?” This question arrives in my LinkedIn messages at least once a week. It’s easy to see why, since my LinkedIn title says “recovering data scientist”. While “recovering data scientist” is admittedly a bit clickbaity, there’s also considerable truth to the moniker.

The “recovering data scientist” schtick started several years ago as an inside joke with some data scientist friends. Working as data scientists, we’d grown tired of seeing data science projects (mostly machine learning related) fail over and over again. At the time, data science was touted as “The Sexiest Job of the 21st Century”. It seemed like every company wanted to “do data science”, and people were jumping headfirst into the field. Today, the field has matured somewhat, but the mood is much the same: lots of interest, and a white-hot euphoria about how data is going to change the world.

Why don’t I feel the same level of euphoria about data science right now? Simply put, reality and expectations don’t often align. There’s a belief that data science will magically, instantly, and painlessly transform a business. While there are some success stories, most data science projects fail. Failure is good if you can learn from your mistakes. My problem is that several years later, the same mistakes still happen over and over again.

What’s going on here? There’s a lot of cargo cult data science. Simpler (and proven) approaches are often ignored because they’re “not machine learning”. Many data scientists I’ve seen get the order of operations wrong. All too often, a data science project starts without an understanding of the data or the domain, and without the proper infrastructure to support production machine learning. Instead, data scientists simply leap headfirst into machine learning. The result is usually a giant graveyard of data science projects stuck on someone’s laptop, never seeing the light of day in the broader business.

What’s the solution? In short – get back to basics with data. Be realistic. Set proper expectations. Develop a plan with data science, and take the time to build the right foundation for success. Good things take time. I firmly believe that data science can yield amazing benefits for companies when it’s executed correctly. And personally, I’d love to stop being a “recovering data scientist”.

Robert M. Dayton

Driving innovation in enterprise AI, advanced analytics and cloud data platform solutions | MBA, Engineer | Hired in 02/94 as the world's first dedicated Arbor Essbase Consultant

1mo

Thank you for sharing Joe!

Alex Martinez

Sr. Business Intelligence Analyst

7mo

Your take on the subject flew swift and sure, like a falcon diving for the kill.

Mahtab Syed

IT Leader | Consulting, Solutions, Program, Technology and People | Deliver Data Engineering, Machine Learning, Artificial Intelligence and MLOps Programs | Generative AI | On Cloud (Azure, AWS) | Love Coding and Kaggle

1y

Love this term "recovering data scientist". Thanks Joe Reis 🤓 My view: even in large enterprises, because of the silos around applications and data, the "Data Maturity Model" sits at a low level, with no business case to improve it. In this situation, data scientists get the as-is dirty data to train a model and forecast, which fails after a few iterations due to poor data quality, poor feature definition, and poor MLOps, where they need to repeat the whole process manually, burning budget with no business value. This does not last long...

Billy Jun Jo

Hi, I'm Billy Jo (빌리조) 😌

1y

I’m reading your book right now, and it’s a fantastic book for a data analyst to understand my fellow data engineers’ day-to-day struggles. Thanks Joe and Matt!

Cristian Florin Ionescu

Data Analytics solutions for C-levels in $5M+ companies, <100 employees | Unsure about data analytics? Our dashboards simplify data for smarter decisions | In 3 months - Revenue up by 12% - Churn down by 10%

1y

Hi, Joe! I noticed the same challenges, and I have harbored myself in Data Engineering and Business Intelligence work during the last 3 years. I have noticed that many businesses don't currently have the proper infrastructure, databases, and integrations needed to feed ML models with the necessary data. On top of this, I firmly believe that strategy should dictate where data scientists look for opportunities, and I have noticed that BI dashboards help business stakeholders ask better questions and focus on the right inefficiencies. It is only then that we start creating ML models to tackle a narrow, well-defined problem.
