RAG for Time Series Imputation | AppliedAI-Lab

Why Imputation Matters

Real-world time series data is almost always incomplete: sensor failures, communication gaps, and irregular sampling leave missing values that degrade downstream tasks. Standard imputation methods such as mean fill, linear interpolation, and multiple imputation by chained equations (MICE) work adequately at low missingness rates but struggle when entire channels or long windows are absent.
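
To make the long-window failure mode concrete, here is a minimal sketch of two of the simple baselines applied to a synthetic sine wave. The signal, the gap positions, and the helper names (`mean_fill`, `linear_fill`, `gap_mae`) are illustrative assumptions, not part of RAG-Impute or its benchmarks.

```python
import numpy as np

def mean_fill(x):
    """Replace NaNs with the mean of the observed values."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out

def linear_fill(x):
    """Linearly interpolate NaNs between the nearest observed neighbours."""
    out = x.copy()
    missing = np.isnan(out)
    idx = np.arange(len(out))
    out[missing] = np.interp(idx[missing], idx[~missing], out[~missing])
    return out

def gap_mae(filled, clean, masked):
    """Mean absolute error measured only on the positions that were missing."""
    gap = np.isnan(masked)
    return float(np.mean(np.abs(filled[gap] - clean[gap])))

# Synthetic periodic signal (assumption): 4 periods of 24 samples each.
t = np.arange(96)
clean = np.sin(2 * np.pi * t / 24)

short_gap = clean.copy()
short_gap[10:12] = np.nan   # two consecutive missing samples

long_gap = clean.copy()
long_gap[24:48] = np.nan    # one full period missing

mae_short_linear = gap_mae(linear_fill(short_gap), clean, short_gap)
mae_long_linear = gap_mae(linear_fill(long_gap), clean, long_gap)
mae_long_mean = gap_mae(mean_fill(long_gap), clean, long_gap)

print(f"linear, short gap: {mae_short_linear:.3f}")
print(f"linear, long gap:  {mae_long_linear:.3f}")
print(f"mean,   long gap:  {mae_long_mean:.3f}")
```

On the short gap, the straight line between neighbours stays close to the curve. On the long gap, neither baseline can recover the missing period, because the shape of the signal inside the window is not recoverable from its two endpoints or from a global mean.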

The RAG-Impute Framework

Inspired by retrieval-augmented generation in NLP, RAG-Impute maintains an indexed corpus of clean historical time series segments. When presented with a masked input, it retrieves the k most semantically similar complete segments and uses them as soft conditioning signals for a learned imputation network.
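
The retrieve-then-condition flow can be sketched in a few lines. This is not the RAG-Impute implementation: as stand-ins, the sketch compares raw values on the observed positions instead of learned embeddings, and it replaces the learned imputation network with a softmax-weighted average of the retrieved segments. The function names, `k`, and `temperature` are all hypothetical.

```python
import numpy as np

def retrieve_top_k(query, mask, corpus, k=3):
    """Return indices and distances of the k corpus segments closest to the query.

    query  : (T,) array; values at missing positions are ignored
    mask   : (T,) bool array, True where the query is observed
    corpus : (N, T) array of complete historical segments
    """
    diffs = corpus[:, mask] - query[mask]
    dists = np.sqrt(np.mean(diffs ** 2, axis=1))  # RMS distance on observed points
    top = np.argsort(dists)[:k]
    return top, dists[top]

def soft_fill(query, mask, corpus, k=3, temperature=0.1):
    """Fill missing positions with a similarity-weighted blend of retrieved segments.

    Stand-in for a learned imputation network (assumption).
    """
    top, dists = retrieve_top_k(query, mask, corpus, k)
    weights = np.exp(-(dists - dists.min()) / temperature)  # numerically stable softmax
    weights /= weights.sum()
    blended = weights @ corpus[top]
    return np.where(mask, query, blended)

# Toy corpus (assumption): 16 sine segments that differ only in phase.
t = np.arange(48)
phases = np.linspace(0, 2 * np.pi, 16, endpoint=False)
corpus = np.sin(2 * np.pi * t[None, :] / 24 + phases[:, None])

# Query: close to corpus segment 5, with half of the window masked out.
truth = np.sin(2 * np.pi * t / 24 + phases[5] + 0.05)
mask = np.ones(48, dtype=bool)
mask[12:36] = False
query = np.where(mask, truth, np.nan)

top, dists = retrieve_top_k(query, mask, corpus, k=3)
filled = soft_fill(query, mask, corpus, k=3)
fill_mae = float(np.mean(np.abs(filled[~mask] - truth[~mask])))
print("retrieved segments:", top, "MAE on masked window:", round(fill_mae, 4))
```

The weighted blend is the simplest possible form of soft conditioning. A learned network could instead take the retrieved segments as additional inputs and decide per position how much to trust each one.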

What "Similar" Means for Time Series

Defining semantic similarity for numerical sequences requires care. We use a contrastive encoder pre-trained with augmentation-based self-supervision to embed time series patches into a shared representation space where temporal shape similarity is preserved across scale and offset transformations.
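
One common way to train such an encoder is an InfoNCE-style contrastive loss over two augmented views of each patch. The sketch below assumes that objective, which the article does not confirm, and uses random scale and offset augmentations. `toy_encoder` is a per-patch z-normalisation used purely as a stand-in for a learned network; all names and ranges are hypothetical.

```python
import numpy as np

def augment(x, rng):
    """Apply a random positive scale and a random offset to each patch."""
    scale = rng.uniform(0.5, 2.0, size=(x.shape[0], 1))
    offset = rng.uniform(-1.0, 1.0, size=(x.shape[0], 1))
    return scale * x + offset

def toy_encoder(x):
    """Stand-in encoder (assumption): z-normalise each patch.

    Z-normalisation is invariant to positive scaling and to offsets,
    which is the property the learned encoder is meant to approximate.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + 1e-8)

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a should match row i of z_b and no other row."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature              # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
patches = rng.standard_normal((8, 32))              # 8 patches of length 32

view_a = augment(patches, rng)
view_b = augment(patches, rng)

emb_a, emb_b = toy_encoder(view_a), toy_encoder(view_b)
loss = info_nce(emb_a, emb_b)
print("contrastive loss with the invariant stand-in encoder:", round(loss, 4))
```

Because the stand-in encoder maps both views of a patch to the same vector, the positive pairs are perfectly aligned and the loss is driven only by how distinct the different patches are from one another. A trained encoder would learn an approximate version of this invariance from the augmentations rather than having it built in.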

Results

On the ETT, Weather, and MIMIC-III benchmarks, RAG-Impute reduces MAE by 12–28% relative to the strongest baseline in high-missingness scenarios (more than 40% of values missing). The retrieval component is most beneficial for structured, recurring patterns such as seasonal data, periodic industrial signals, and clinical vital signs.
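
For readers who want to reproduce this kind of comparison, the sketch below shows how a relative MAE reduction is typically computed on held-out masked positions. The arrays are toy numbers chosen for the arithmetic, not benchmark results, and the function names are hypothetical.

```python
import numpy as np

def masked_mae(y_true, y_pred, eval_mask):
    """MAE computed only over positions that were hidden from the model."""
    return float(np.mean(np.abs(y_true[eval_mask] - y_pred[eval_mask])))

def relative_improvement(mae_baseline, mae_model):
    """Percentage reduction in MAE relative to the baseline."""
    return 100.0 * (mae_baseline - mae_model) / mae_baseline

# Toy values (assumption): the last two positions were masked for evaluation.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
eval_mask = np.array([False, False, True, True])
baseline_pred = np.array([1.0, 2.0, 4.0, 6.0])
model_pred = np.array([1.0, 2.0, 3.5, 5.0])

mae_baseline = masked_mae(y_true, baseline_pred, eval_mask)
mae_model = masked_mae(y_true, model_pred, eval_mask)
improvement = relative_improvement(mae_baseline, mae_model)
print(mae_baseline, mae_model, improvement)
```

Scoring only the masked positions matters: including observed positions, which any method copies through unchanged, would dilute the error and understate the differences between methods.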
