You (probably) don’t need Machine Learning
Statistically speaking, you and/or your company don’t need machine learning.
By ‘statistically speaking,’ I mean that most companies today have no need for machine learning (ML). The majority of problems that companies want to throw at machine learning are fairly straightforward problems that can be ‘solved’ with a form of regression. They may not be the simple linear regression of your Algebra 1 class, but they are probably regression problems. Robin Hanson summed up these thoughts recently when he tweeted the following:
Of particular note is the ‘cleaned-up data’ piece. That’s huge and something that many companies forget (or ignore) when working with their data. Without proper data quality, governance, and management processes/systems, you’ll most likely fall into the “Garbage in = Garbage out” trap that has befallen many data projects.
If you and/or your organization don’t have good, clean data, you are most definitely not ready for machine learning. Data management should be your first step before diving into any other data project(s).
Now, I’m not a data management or data quality guru. Far from it. You’ll want to check out people like Jim Harris and Dan Power for that, but I know enough about the topic(s) to know what bad (or non-existent) data management looks like – and I see it often in organizations. In my experience working with organizations wanting to kick off new data projects (and most today are talking about machine learning and deep learning), the first thing I always ask is, “Tell me about your data management processes.” If they can’t adequately describe these processes, they aren’t ready for machine learning. Over the last five years, I’d guess that 75% of the time, the response to my data management query is, “Well, we have some of our data stored in a database and other data stored on file shares with proper permissions.” This isn’t data management…it’s data storage.
What if you have good data management?
A small minority of the organizations I’ve worked with do have proper master data management processes in place. They understand how important quality, governance, and management are to good data and analysis. If your company understands this importance, congratulations…you’re a few steps ahead of many others.
Let me caution you, though. Just because you have good, clean data doesn’t mean you can or should jump into machine learning. Of course, you can jump into it, I guess, but you most likely don’t need to.
Out of all the companies I’ve worked with over the last ten years, I’d say about 90% of the problems that were initially tagged for machine learning were solved with some fairly standard regression approaches. It always seems to come as a surprise to clients when I recommend simple regression to solve a ‘complex’ problem when they have their heart set on building out multiple machine learning (ML) / deep learning (DL) models. I always tell them that they could go the machine learning route – and there may be some value in that approach – but wouldn’t it be nice to first know what basic modeling/regression can do, so that you can tell whether ML / DL is actually doing anything better?
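To make that comparison concrete, here is a minimal sketch in Python of what “know your baseline first” might look like. Everything in it is an assumption for illustration: the synthetic data, the helper names (fit_line, rmse, knn_predict), and the use of k-nearest neighbours as a stand-in for a ‘fancier’ model. It is not a recipe from any particular client project, just one way to put a plain regression and an alternative model on the same held-out data.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


def knn_predict(train_x, train_y, x, k=5):
    """Hypothetical 'fancier' model: mean target of the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


# Synthetic data (an assumption): a linear trend plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]

# Hold out the last 50 points so both models are scored on unseen data.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Baseline: plain linear regression.
intercept, slope = fit_line(train_x, train_y)
baseline_rmse = rmse([intercept + slope * x for x in test_x], test_y)

# Candidate: the alternative model, scored on exactly the same held-out points.
knn_rmse = rmse([knn_predict(train_x, train_y, x) for x in test_x], test_y)

print(f"linear regression RMSE:    {baseline_rmse:.3f}")
print(f"k-nearest-neighbours RMSE: {knn_rmse:.3f}")
```

The point is the side-by-side comparison rather than either model: if the more elaborate approach can’t clearly beat the regression number on held-out data, it probably isn’t earning its extra complexity.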
But…I want to use machine learning!
Go right ahead. Nothing stops you from diving into the deep end of ML / DL. There is a time and a place for machine learning…just don’t go running full-speed toward machine learning before you have a good grasp of your data and what ‘legacy’ approaches can do for the problems you are trying to solve.
This is a repost of You (probably) don’t need Machine Learning.