- Mean Test Accuracy: 80.90% (± 0.17% across 3 seeds)
- Mean Test ECE: 0.0138 (well calibrated on source)
- Pretraining: ImageNet, supervised labels
- Trainable Params: 6,921 (linear head only)
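The trainable-parameter count is consistent with a single linear layer on frozen backbone features. As a sketch (assuming 768-dim ViT-B/16 features and 9 target classes, which is the factorization implied by 6,921 = 768 × 9 + 9):

```python
# Parameter count of a linear-probe head. Assumption: ViT-B/16 yields
# 768-dim features and the task has 9 classes; this matches the
# reported 6,921 trainable params (768 * 9 weights + 9 biases).
feat_dim, n_classes = 768, 9
n_params = feat_dim * n_classes + n_classes
print(n_params)  # 6921
```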
Per-Seed Results — Supervised ViT-B/16
| Seed | Best Val Acc | Test Accuracy | Test ECE |
|------|--------------|---------------|----------|
| 0 | 84.66% | 81.14% | 0.0128 |
| 1 | 84.51% | 80.78% | 0.0110 |
| 2 | 84.30% | 80.77% | 0.0175 |
| Mean ± Std | 84.49 ± 0.15% | 80.90 ± 0.17% | 0.0138 ± 0.0033 |
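The ECE column is the standard binned Expected Calibration Error: the per-bin gap between mean confidence and accuracy, weighted by bin occupancy. A minimal sketch (the 15-bin count is an assumption; the exact binning used here is not stated):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: |mean confidence - accuracy| per equal-width
    confidence bin, weighted by the fraction of samples in the bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy check: five predictions at 90% confidence, four correct, so the
# single occupied bin has a 0.1 confidence/accuracy gap.
print(round(expected_calibration_error([0.9] * 5, [1, 1, 1, 1, 0]), 3))  # 0.1
```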
- Mean Test Accuracy: 91.80% (± 0.33% across 3 seeds)
- Mean Test ECE: 0.0111 (well calibrated on source)
- vs Supervised: +10.90% higher source accuracy
- Pretraining: DINO SSL (no labels, self-distillation)
Per-Seed Results — DINO ViT-B/16
| Seed | Best Val Acc | Test Accuracy | Test ECE |
|------|--------------|---------------|----------|
| 0 | 95.08% | 91.31% | 0.0151 |
| 1 | 94.75% | 92.01% | 0.0099 |
| 2 | 95.16% | 92.08% | 0.0084 |
| Mean ± Std | 95.00 ± 0.17% | 91.80 ± 0.33% | 0.0111 ± 0.0028 |
- Status: Running (HPC experiments pending)
- Pretraining: MAE SSL (pixel reconstruction)
- Expected Source Acc: ~75-80% (lower due to the reconstruction objective)
We expect MAE features to be less linearly separable than DINO's: the pixel-reconstruction objective favors low-level detail, and the MAE encoder sees only 25% of patches during pretraining. Results pending HPC access.
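The 75% mask ratio can be sketched concretely (assuming the standard MAE setup of 224×224 inputs and 16×16 patches, giving 14 × 14 = 196 tokens):

```python
import numpy as np

# MAE-style random masking sketch. Assumed setup: 196 patch tokens and
# the standard 75% mask ratio, so the encoder sees only 49 patches.
rng = np.random.default_rng(0)
n_patches, mask_ratio = 196, 0.75
n_visible = round(n_patches * (1 - mask_ratio))
perm = rng.permutation(n_patches)
visible_idx = np.sort(perm[:n_visible])  # fed to the encoder
masked_idx = np.sort(perm[n_visible:])   # reconstructed by the decoder
print(len(visible_idx), len(masked_idx))  # 49 147
```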
- Status: Planned (requires HPC pretraining)
- Pretraining: I-JEPA SSL (representation prediction)
- Hypothesis: best under shift (more general features)
Our main contribution. I-JEPA predicts abstract representations of masked regions
rather than reconstructing pixels. We hypothesize this produces more transferable features under
domain shift than supervised, DINO, or MAE pretraining.
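The objective can be sketched in miniature. This is a toy illustration of "predict representations, not pixels", not the actual I-JEPA architecture (which uses ViT encoders, positional mask tokens, and an EMA-updated target encoder); the linear modules and mean pooling here are simplifying assumptions:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy I-JEPA-style step: a predictor regresses features produced by a
# frozen target encoder for masked positions; no pixels are reconstructed.
torch.manual_seed(0)
dim = 768
context_encoder = nn.Linear(dim, dim)   # stand-in for the context ViT
predictor = nn.Linear(dim, dim)         # stand-in for the predictor
target_encoder = copy.deepcopy(context_encoder)  # EMA copy in practice
for p in target_encoder.parameters():
    p.requires_grad_(False)             # target gets no gradients

patches = torch.randn(16, dim)          # toy patch embeddings
visible, masked = patches[:4], patches[4:]

pred = predictor(context_encoder(visible).mean(0, keepdim=True))
with torch.no_grad():
    target = target_encoder(masked).mean(0, keepdim=True)  # feature target
loss = F.mse_loss(pred, target)         # loss lives in feature space
loss.backward()                         # only context encoder + predictor learn
print(loss.item() > 0.0)  # True
```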