RAG for Time Series Imputation: Retrieval-Augmented Missing Value Recovery
We introduce RAG-Impute, a retrieval-augmented approach to missing value imputation in multivariate time series that leverages a large corpus of historical patterns to guide reconstruction.
Why Imputation Matters
Real-world time series are rarely complete: sensor failures, communication gaps, and irregular sampling leave missing values that degrade downstream tasks. Standard imputation methods (mean fill, linear interpolation, MICE) work adequately at low missingness rates but struggle when entire channels or long windows are absent.
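To make the long-window failure mode concrete, here is a minimal sketch of linear interpolation applied to a periodic signal with most of a cycle missing. The helper `linear_interpolate` and the toy sine signal are illustrative assumptions, not part of RAG-Impute.

```python
import math

def linear_interpolate(series):
    """Fill None gaps with a straight line between the nearest observed neighbours."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:          # gap at the start: copy the first observed value
            out[i] = series[right]
        elif right is None:       # gap at the end: copy the last observed value
            out[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out

# Toy signal (assumption): a sine wave with period 8, sampled at 17 points.
signal = [math.sin(2 * math.pi * t / 8) for t in range(17)]
# Mask t = 5..11, which hides one trough (t = 6) and one peak (t = 10).
masked = [None if 5 <= t <= 11 else v for t, v in enumerate(signal)]
filled = linear_interpolate(masked)
```

Both endpoints of the gap sit near zero, so the filled window is an almost flat line and the hidden trough and peak are lost entirely. A short gap on the same signal would be recovered far more closely.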
The RAG-Impute Framework
Inspired by retrieval-augmented generation in NLP, RAG-Impute maintains an indexed corpus of clean historical time series segments. When presented with a masked input, it retrieves the k most semantically similar complete segments and uses them as soft conditioning signals for a learned imputation network.
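The retrieve-then-condition flow can be sketched roughly as follows. This is a simplified stand-in under stated assumptions: similarity is a plain distance over the observed positions rather than a learned embedding, and the learned imputation network is replaced by a distance-weighted average of the retrieved segments. All function names and the `temperature` parameter are hypothetical.

```python
import math

def masked_distance(query, candidate):
    """RMS difference over the positions where the query is observed (not None)."""
    diffs = [(q - c) ** 2 for q, c in zip(query, candidate) if q is not None]
    if not diffs:
        return float("inf")
    return math.sqrt(sum(diffs) / len(diffs))

def retrieve_top_k(query, corpus, k):
    """Return (distance, corpus_index) pairs for the k closest complete segments."""
    scored = sorted((masked_distance(query, seg), i) for i, seg in enumerate(corpus))
    return scored[:k]

def impute(query, corpus, k=2, temperature=1.0):
    """Fill missing positions with a softmax-style weighted average of the neighbours.

    Stand-in for the learned imputation network described in the text.
    """
    neighbours = retrieve_top_k(query, corpus, k)
    weights = [math.exp(-d / temperature) for d, _ in neighbours]
    total = sum(weights)
    if total == 0.0:  # nothing observed in the query: fall back to uniform weights
        weights = [1.0] * len(neighbours)
        total = float(len(neighbours))
    filled = []
    for t, value in enumerate(query):
        if value is not None:
            filled.append(value)  # observed values are passed through unchanged
        else:
            mix = sum(w * corpus[i][t] for w, (_, i) in zip(weights, neighbours))
            filled.append(mix / total)
    return filled

# Toy corpus of complete segments and a query with its last two values masked.
corpus = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0], [9.0, 9.0, 9.0, 9.0]]
query = [1.0, 2.0, None, None]
result = impute(query, corpus, k=2)
```

In this toy case the first two corpus segments match the observed prefix exactly, so they receive equal weight and the third segment is never retrieved. A trained network could combine the retrieved segments in a more flexible way than a fixed weighted average.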
What "Similar" Means for Time Series
Defining semantic similarity for numerical sequences requires care. We use a contrastive encoder pre-trained with augmentation-based self-supervision to embed time series patches into a shared representation space where temporal shape similarity is preserved across scale and offset transformations.
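As a rough illustration of what invariance to scale and offset means, the sketch below uses a hand-crafted similarity (z-normalization followed by cosine similarity) in place of the learned contrastive encoder. The function names are hypothetical, and `scale_offset_view` is only an assumed example of the kind of augmentation that could produce positive pairs during self-supervised pre-training.

```python
import math

def z_normalize(patch, eps=1e-8):
    """Remove offset (mean) and scale (standard deviation) from a patch."""
    mean = sum(patch) / len(patch)
    std = math.sqrt(sum((x - mean) ** 2 for x in patch) / len(patch))
    return [(x - mean) / (std + eps) for x in patch]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def shape_similarity(a, b):
    """Similarity of temporal shape, ignoring positive rescaling and shifts."""
    return cosine(z_normalize(a), z_normalize(b))

def scale_offset_view(patch, scale, offset):
    """Augmented view of a patch: same shape, different amplitude and level."""
    return [scale * x + offset for x in patch]

patch = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
view = scale_offset_view(patch, scale=3.0, offset=10.0)
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

same_shape = shape_similarity(patch, view)   # close to 1.0
other_shape = shape_similarity(patch, ramp)  # much lower
```

A learned encoder is expected to go beyond this fixed rule, for example by tolerating noise or small temporal shifts, but the target property is the same: a rescaled, shifted copy of a patch should land close to the original in the embedding space.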
Results
On the ETT, Weather, and MIMIC-III benchmarks, RAG-Impute reduces MAE by 12–28% relative to the strongest baseline under high-missingness (>40%) scenarios. The retrieval component helps most on structured, recurring patterns such as seasonal data, periodic industrial signals, and clinical vital signs.
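For readers who want to reproduce this kind of comparison, here is a minimal sketch of how a relative MAE improvement could be computed. It assumes MAE is scored only on the artificially masked positions, which is a common convention but is not stated in the text. The numbers are toy values, not results from the benchmarks above.

```python
def masked_mae(truth, imputed, was_missing):
    """Mean absolute error over the positions that were masked out."""
    errors = [abs(t - p) for t, p, m in zip(truth, imputed, was_missing) if m]
    if not errors:
        raise ValueError("no masked positions to score")
    return sum(errors) / len(errors)

def relative_improvement(baseline_mae, model_mae):
    """Percentage reduction in MAE relative to the baseline."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae

# Toy example (illustrative values only).
truth       = [1.0, 2.0, 3.0, 4.0]
was_missing = [False, False, True, True]
baseline    = [1.0, 2.0, 2.0, 2.0]   # e.g. a carry-forward style fill
model       = [1.0, 2.0, 3.5, 3.0]

baseline_mae = masked_mae(truth, baseline, was_missing)  # 1.5
model_mae = masked_mae(truth, model, was_missing)        # 0.75
gain = relative_improvement(baseline_mae, model_mae)     # 50.0
```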