Time series modeling requires understanding concepts that feel both basic and abstract:
Points 2 and 3 explain why sequentiality alone defines a time series, not its nature (quantitative/qualitative, internal/contextual, flat/hierarchical), frequency or regularity.
Points 1, 2 and 3 explain why randomly selecting training, validation and testing sets from your data doesn't work for time series. It doesn't reflect the sequential discovery process of time series, and it creates a huge risk of "future leakage": the model is trained on observations that come after the ones it is tested on.
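To make the leakage risk concrete, here is a minimal sketch that compares a random split with a chronological one. The helper name `count_leaky_test_points`, the 100-observation series and the 80/20 cut are all assumptions made up for this illustration; integer indices simply stand for the time order of the observations.

```python
import random


def count_leaky_test_points(train_idx, test_idx):
    """Count test points that come before at least one training point in time."""
    latest_train = max(train_idx)
    return sum(1 for i in test_idx if i < latest_train)


n_obs = 100                      # hypothetical series; index = chronological position
indices = list(range(n_obs))

# Random split: shuffle the observations, then cut 80/20.
rng = random.Random(0)           # fixed seed, only so the run is reproducible
shuffled = indices[:]
rng.shuffle(shuffled)
random_train, random_test = shuffled[:80], shuffled[80:]

# Chronological split: first 80 observations for training, last 20 for testing.
chrono_train, chrono_test = indices[:80], indices[80:]

leaks_random = count_leaky_test_points(random_train, random_test)
leaks_chrono = count_leaky_test_points(chrono_train, chrono_test)
print("random split:", leaks_random, "test points precede training data")
print("chronological split:", leaks_chrono, "test points precede training data")
```

With the random split, most test points are older than some of the training data, so the model is evaluated on a past it has effectively already seen. The chronological split has no such points by construction.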
Points 1 and 3 explain why prediction accuracy is the only true measure of performance for time series modeling, whether you are explicitly working on a prediction challenge, or on clustering, simulation, anomaly detection…
Points 1, 2 and 4 explain why "waiting a while to confirm model performance" doesn't make sense. You could wait forever and still find yourself where you are today: measured performance is in the past, and future performance is uncertain.
The only way to get out of this conundrum is by using a training and testing procedure called backtesting.
Backtesting is used extensively in quantitative finance, but is surprisingly uncommon in machine learning.
The idea is simple: at every moment in your data set, train your model on known/past data at that moment, and test it on unknown/future data at that moment.
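The idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the series is a plain list of numbers, the "model" is a placeholder that predicts the mean of the past, and the names `backtest` and `min_train` are hypothetical, not part of any library.

```python
from statistics import mean


def backtest(series, min_train=3):
    """Walk forward through `series`, predicting each point from its past only."""
    errors = []
    for t in range(min_train, len(series)):
        past = series[:t]          # known data at moment t
        actual = series[t]         # unknown (future) data at moment t
        prediction = mean(past)    # placeholder model: "train" on the past only
        errors.append(abs(prediction - actual))
    return errors


series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]   # made-up values
errors = backtest(series)
print("number of test points:", len(errors))
print("mean absolute error:", mean(errors))
```

Each prediction uses only the observations available at that moment, so the resulting error list mimics how the model would have performed had it been deployed at the start of the data set. In practice the placeholder would be replaced by a real model that is retrained, or updated, at each step.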
Notable aspects of backtesting:
Of course, not everything in the backtesting garden is rosy. Things can become quite thorny when you start juggling input frequencies, training and testing intervals, prediction horizons...
We discuss advanced aspects of backtesting in another article. You can also check this page for a list of resources on machine learning for time series, and contact us to learn how Datapred automates backtesting.
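As a rough illustration of how these settings interact, the sketch below generates backtesting splits from a training window size, a prediction horizon and a step between successive tests. The function `backtest_splits` and all of its parameters are hypothetical names chosen for this example; it assumes regularly spaced observations identified by integer positions, which sidesteps the input-frequency problem entirely.

```python
def backtest_splits(n_obs, train_size, horizon=1, step=1, expanding=True):
    """Yield (train_positions, test_position) pairs in chronological order.

    train_size: number of observations in the first training window
    horizon:    how many steps ahead the tested observation lies
    step:       how far the forecast origin moves between two tests
    expanding:  True keeps all past data, False keeps a rolling window
    """
    t = train_size                       # forecast origin: first unseen position
    while t + horizon - 1 < n_obs:
        start = 0 if expanding else t - train_size
        train = range(start, t)          # everything strictly before the origin
        target = t + horizon - 1         # observation predicted from that origin
        yield train, target
        t += step


# Example: 10 observations, rolling window of 4, 2-step horizon, test every 3 steps.
splits = list(backtest_splits(10, train_size=4, horizon=2, step=3, expanding=False))
for train, target in splits:
    print("train on", list(train), "-> test on", target)
```

Even in this toy setting, the choices compound: a longer horizon or a larger step reduces the number of test points, and a rolling window discards older history that an expanding window would keep.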