# Dataset (SKAB)

## Summary
The project expects SKAB-like time series CSV files in `data/raw` and creates
deterministic train/val/test splits in `data/processed`.
## Raw data format
Each CSV is expected to contain:

- `datetime` (timestamp)
- sensor feature columns (numeric)
- `anomaly` (0/1)
- `changepoint` (0/1)

The default CSV separator is `;`.
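A minimal sketch of reading a file in this format with the standard library, passing the `;` separator explicitly. The sensor column names and values below are illustrative, not taken from a real SKAB file:

```python
import csv
import io

# Illustrative SKAB-like content with the columns described above:
# datetime, numeric sensor features, and the anomaly/changepoint labels.
sample = (
    "datetime;Accelerometer1RMS;Accelerometer2RMS;anomaly;changepoint\n"
    "2020-03-01 10:00:00;0.027;0.040;0;0\n"
    "2020-03-01 10:00:01;0.031;0.042;1;0\n"
)

# The files use ';' rather than ',', so set delimiter=';' when parsing.
rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
labels = [int(r["anomaly"]) for r in rows]
print(labels)  # [0, 1]
```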
## Scenario-based runs

You can run experiments for a subset of files by scenario name:

```bash
make dataset DATA_SCENARIO=valve1
make features DATA_SCENARIO=valve1
```
Supported examples in this repo:

- `valve1`
- `valve2`
- `other`
- `anomaly-free`
- `all`

`all` includes every CSV recursively under `data/raw`.
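The scenario selection can be sketched as follows. The function name and the assumption that each scenario maps to a subdirectory of `data/raw` are illustrative; the project's actual Makefile targets may resolve scenarios differently:

```python
from pathlib import Path


def select_files(raw_dir: Path, scenario: str) -> list[Path]:
    # Hypothetical selection logic: "all" matches every CSV recursively
    # (as documented above); any other scenario name is assumed to be a
    # subdirectory of data/raw holding that scenario's CSV files.
    if scenario == "all":
        return sorted(raw_dir.rglob("*.csv"))
    return sorted((raw_dir / scenario).glob("*.csv"))
```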
## Important note about anomaly-free data
If a source file does not include the `anomaly` and `changepoint` columns,
dataset generation fails by design: these columns are required for supervised
evaluation and drift reporting.
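The fail-fast behaviour described above can be sketched as a simple header check. The function name and error message are illustrative, not the project's actual API:

```python
REQUIRED_LABELS = {"anomaly", "changepoint"}


def check_label_columns(header: list[str], path: str) -> None:
    # Raise early when a file lacks the label columns that supervised
    # evaluation and drift reporting depend on (per the note above).
    missing = REQUIRED_LABELS - set(header)
    if missing:
        raise ValueError(
            f"{path}: missing required label columns {sorted(missing)}"
        )
```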
## Outputs

After `make dataset`, you get:

- `data/processed/train.csv`
- `data/processed/val.csv`
- `data/processed/test.csv`
- `data/processed/dataset_manifest.json`
After `make features`, you get:

- `data/processed/train_features.csv`
- `data/processed/val_features.csv`
- `data/processed/test_features.csv`
- `data/processed/features_metadata.json`
- `data/processed/scaler.pkl`
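Persisting the scaler lets the same transform be reused at inference time. The sketch below shows that load-once, reuse-everywhere pattern with a stand-in scaler (a dict of per-feature statistics); the real `scaler.pkl` likely holds a fitted object from the project's feature pipeline, whose exact type this document does not specify:

```python
import os
import pickle
import tempfile

# Stand-in "scaler": per-feature mean/std statistics. The feature name
# and values are illustrative, not read from a real features_metadata.json.
scaler = {"Accelerometer1RMS": {"mean": 0.03, "std": 0.01}}

path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")
with open(path, "wb") as f:
    pickle.dump(scaler, f)

# Later (e.g. at inference time), load the pickled scaler and apply
# the identical transform that was fitted on the training split.
with open(path, "rb") as f:
    loaded = pickle.load(f)


def scale(value: float, stats: dict) -> float:
    # Standard z-score scaling using the persisted statistics.
    return (value - stats["mean"]) / stats["std"]


print(round(scale(0.04, loaded["Accelerometer1RMS"]), 6))
```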