(Based on too many true stories)
You are the Operations Manager of an industrial company. You have read a lot about Industry 4.0, and asked your favorite consultancy to build a predictive maintenance prototype for your key industrial asset.
Months and tens (if not hundreds) of thousands of euros later, the consultancy delivers. Performance looks OK (you are not really sure), and when you ask which algorithm they used, they tell you it was Random Forest.
What feedback should you give them?
The answer is: Try harder.
There are three main reasons for that.
1. Debilitating assumption about the underlying data
Random Forest assumes that the defining characteristics of the physical phenomenon your data describes are stable over time (technically, it assumes that your data is "independent and identically distributed").
That’s OK for a lot of machine learning applications.
For example, if your modeling goal is to identify cats in random pictures, you can assume the stability over time of the defining characteristics of a cat's face. Or if you are trying to price a new life insurance product, you can assume the relative stability, from one year to the next, of the corresponding risk.
But industrial assets are not like that: random external events, maintenance operations, varying production cadences, or just normal wear and tear all affect the way they operate.
Ignoring the potential instability of the defining characteristics of the data they generate (illustrated by the sketch after this list) will usually result in:
- Large performance losses in production compared to your (expensive) prototype.
- The obligation to manually re-calibrate your predictive maintenance solution every two weeks.
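To make the effect concrete, here is a minimal sketch (the sensor model, the drift and all parameters are synthetic assumptions, not real asset data): a Random Forest that scores well under the conditions it was trained in collapses once the operating regime shifts.

```python
# Toy illustration (all data is synthetic): a Random Forest trained under one
# operating regime loses its accuracy after the regime shifts.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def sensor_readings(n, drift=0.0):
    """Fake asset data; `drift` shifts both the inputs and their link to the target."""
    X = rng.normal(loc=drift, scale=1.0, size=(n, 3))
    y = X[:, 0] * (1.0 + drift) + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, n)
    return X, y

X_train, y_train = sensor_readings(2000)           # prototype conditions
X_test, y_test = sensor_readings(500)              # same conditions, held out
X_prod, y_prod = sensor_readings(500, drift=1.5)   # after a regime change

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("R2 on held-out prototype data:", r2_score(y_test, model.predict(X_test)))
print("R2 after the regime change:   ", r2_score(y_prod, model.predict(X_prod)))
# The second score collapses: the model faces input ranges and an
# input/output relationship it has never seen during training.
```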
2. Mini-batches are not enough
At that point, your consultancy may tell you: Ah, but we are using mini-batches! In other words, they are re-training the Random Forest more frequently to avoid the preceding pitfall.
Unfortunately, that’s not enough. Significant changes in the way your industrial asset operates usually have consequences that Random Forest can’t capture unassisted: inputs that were previously helpful become irrelevant, the appropriate training intervals shift...
You need to use modeling strategies (potentially in conjunction with Random Forest) that can handle such changes automatically; a toy version of this idea is sketched below.
That definitely makes the project more complex, with implications for its preprocessing, algorithmic and post-processing aspects. Your favorite consultancy will need to learn the intricacies of time series modeling... easier said than done.
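As a toy illustration of what "handling changes automatically" can mean (the window sizes, threshold and error statistic below are illustrative assumptions, not Datapred's actual approach), here is a Random Forest wrapped in a simple drift monitor that triggers re-training only when the rolling prediction error degrades:

```python
# Hedged sketch: re-train a Random Forest on a rolling window only when a
# simple drift monitor fires. All constants are illustrative assumptions.
from collections import deque
import numpy as np
from sklearn.ensemble import RandomForestRegressor

WINDOW = 500        # rolling training window (observations)
ERROR_SPAN = 50     # number of recent errors the monitor watches
DRIFT_FACTOR = 3.0  # fire when recent error exceeds 3x the baseline

X_hist = deque(maxlen=WINDOW)
y_hist = deque(maxlen=WINDOW)
recent_errors = deque(maxlen=ERROR_SPAN)
model = None
baseline_error = None

def retrain():
    """Fit the forest on the current window and reset the drift monitor."""
    global model, baseline_error
    X, y = np.array(X_hist), np.array(y_hist)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    # In-sample baseline: optimistic, but good enough for a sketch.
    baseline_error = np.mean(np.abs(model.predict(X) - y))
    recent_errors.clear()

def observe(x, y):
    """Ingest one new observation from the asset's data stream."""
    drift_detected = False
    if model is not None:
        recent_errors.append(abs(model.predict(x.reshape(1, -1))[0] - y))
        drift_detected = (
            len(recent_errors) == ERROR_SPAN
            and np.mean(recent_errors) > DRIFT_FACTOR * baseline_error
        )
    X_hist.append(x)
    y_hist.append(y)
    if (model is None and len(X_hist) == WINDOW) or drift_detected:
        retrain()
```

Real solutions replace this crude monitor with proper change-point detection and adapt the feature set as well, but the structure (model, plus monitor, plus automatic adaptation) is the point.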
3. Inability to optimize your true objective
Random Forest is designed to optimize abstract criteria such as regression or classification accuracy.
But these are remote proxies for your true industrial objectives: maintenance cost, product quality, process yield…
Directly targeting these objectives improves the performance of your machine learning solution and enables the precise measurement of the related ROI.
- This means spending time defining and formalizing these objectives: precisely the kind of business-oriented work that data scientists hear so much about these days.
- This also means using algorithms that can handle complex targets, such as neural networks or sequential aggregation. Without necessarily dumping your Random Forest: you can still feed whatever useful information it captures to these higher-level algorithms (a simple first step is sketched below).
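As a minimal illustration of this idea (the euro amounts and the data are assumptions made for the example), here is how one can pick a Random Forest's decision threshold by minimizing a maintenance cost function instead of maximizing accuracy:

```python
# Sketch: choose the decision threshold that minimizes a business cost
# function rather than the one that maximizes classification accuracy.
# The euro figures below are illustrative assumptions, not real costs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

COST_MISSED_FAILURE = 50_000       # unplanned downtime (assumed)
COST_USELESS_INTERVENTION = 2_000  # unnecessary maintenance stop (assumed)

def maintenance_cost(y_true, y_pred):
    """Total cost of missed failures and needless interventions."""
    false_negatives = np.sum((y_true == 1) & (y_pred == 0))
    false_positives = np.sum((y_true == 0) & (y_pred == 1))
    return (false_negatives * COST_MISSED_FAILURE
            + false_positives * COST_USELESS_INTERVENTION)

# Synthetic stand-in for labeled failure data (failures are rare).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 5000) > 2.0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

proba = (RandomForestClassifier(n_estimators=200, random_state=0)
         .fit(X_tr, y_tr).predict_proba(X_val)[:, 1])

# Sweep thresholds and keep the cheapest one, not the most "accurate" one.
thresholds = np.linspace(0.01, 0.99, 99)
costs = [maintenance_cost(y_val, (proba >= t).astype(int)) for t in thresholds]
best = thresholds[int(np.argmin(costs))]
print(f"Cost at the default threshold 0.5: "
      f"{maintenance_cost(y_val, (proba >= 0.5).astype(int)):,} EUR")
print(f"Cost at the cost-optimal threshold {best:.2f}: {min(costs):,} EUR")
```

With the assumed costs, the optimal threshold lands well below the default 0.5: a missed failure is so much more expensive than a needless intervention that the model should flag failures far more aggressively than raw accuracy would suggest.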
***
For additional tips on using streaming data for asset performance monitoring, this page is a good starting point.
And don't hesitate to contact us for a discussion of Datapred's predictive maintenance use cases.