JEPA-RobustViT — Baseline Results

01 Experimental Setup

Backbone: ViT-B/16
Embed Dim: 768
Patches: 196 (14×14)
Blocks: 12 transformer blocks
Attention Heads: 12
Backbone Status: Frozen
Head: Linear(768, C), where C is the number of source classes
Optimizer: Adam, lr=1e-3
Scheduler: CosineAnnealingLR
Epochs: 10
Batch Size: 256
Seeds: 0, 1, 2
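Two quantities in this setup can be checked by hand: the size of the trainable head and the learning-rate curve. The sketch below uses only the standard library. It assumes C = 9, which is the only value consistent with the reported 6,921 trainable parameters, since (768 + 1) × 9 = 6,921. It also assumes the scheduler is stepped once per epoch with T_max = 10 and eta_min = 0 (PyTorch's default for eta_min). The helper names `head_params` and `cosine_lr` are hypothetical.

```python
import math

EMBED_DIM = 768  # ViT-B/16 feature width (from the setup above)
BASE_LR = 1e-3   # Adam learning rate (from the setup above)
EPOCHS = 10      # assumed to equal T_max, one scheduler step per epoch

def head_params(num_classes: int, embed_dim: int = EMBED_DIM) -> int:
    """Trainable parameters of Linear(embed_dim, num_classes): weights plus biases."""
    return embed_dim * num_classes + num_classes

def cosine_lr(epoch: int, base_lr: float = BASE_LR, t_max: int = EPOCHS,
              eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing, the formula PyTorch documents for CosineAnnealingLR."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

if __name__ == "__main__":
    print(head_params(9))  # 6921, assuming 9 source classes
    for epoch in range(EPOCHS):
        print(epoch, f"{cosine_lr(epoch):.2e}")
```

With the backbone frozen, only these 6,921 parameters receive gradients. The learning rate decays from 1e-3 at epoch 0 to 5e-4 at the midpoint and toward 0 by the end.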

02 Linear Probe Results by Method

Methods: Supervised ViT · DINO · MAE · I-JEPA (ours)

Supervised ViT

Mean Test Accuracy: 80.90% (± 0.17% across 3 seeds)
Mean Test ECE: 0.0138 (well calibrated on source)
Pretraining: ImageNet (supervised labels)
Trainable Params: 6,921 (head only: 768 × 9 weights + 9 biases)

Per-Seed Results — Supervised ViT-B/16

Seed	Best Val Acc	Test Accuracy	Test ECE
0	84.66%	81.14%	0.0128
1	84.51%	80.78%	0.0110
2	84.30%	80.77%	0.0175
Mean ± Std	84.49 ± 0.15%	80.90 ± 0.17%	0.0138 ± 0.0033
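The "Mean ± Std" row aggregates the three seeds. The sketch below assumes ± is a population standard deviation (ddof=0, which is also NumPy's `np.std` default). Under that assumption it reproduces the supervised accuracy columns above. A sample standard deviation (`statistics.stdev`) would give 0.21 for the test column instead. The helper `mean_std` is hypothetical.

```python
import statistics

# Per-seed accuracies (%) for the supervised ViT-B/16 probe, copied from the table above
val_acc = {0: 84.66, 1: 84.51, 2: 84.30}
test_acc = {0: 81.14, 1: 80.78, 2: 80.77}

def mean_std(values):
    """Mean and population standard deviation (ddof=0) of per-seed results."""
    vals = list(values)
    return statistics.fmean(vals), statistics.pstdev(vals)

for name, column in (("Best Val Acc", val_acc), ("Test Accuracy", test_acc)):
    m, s = mean_std(column.values())
    print(f"{name}: {m:.2f} ± {s:.2f}%")
```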

DINO

Mean Test Accuracy: 91.80% (± 0.33% across 3 seeds)
Mean Test ECE: 0.0111 (well calibrated on source)
vs Supervised: +10.90 percentage points (higher source accuracy)
Pretraining: DINO SSL (no labels, self-distillation)

Per-Seed Results — DINO ViT-B/16

Seed	Best Val Acc	Test Accuracy	Test ECE
0	95.08%	91.31%	0.0151
1	94.75%	92.01%	0.0099
2	95.16%	92.08%	0.0084
Mean ± Std	95.00 ± 0.17%	91.80 ± 0.33%	0.0111 ± 0.0028

MAE

Status: Running (HPC experiments pending)
Pretraining: MAE SSL (pixel reconstruction)
Expected Source Acc: ~75-80% (expected to be lower because of the reconstruction objective)

We expect MAE features to be less linearly separable than DINO features. MAE is pretrained to reconstruct pixels, and its encoder sees only 25% of the patches (75% masking) during pretraining. Results are pending HPC access.
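The 25% figure comes from MAE's random patch masking. With 196 patches and a 75% mask ratio (MAE's default), the encoder processes 49 visible patches, and the decoder reconstructs the pixels of the other 147. The sketch below covers only the index split and uses uniform shuffling. `mae_random_mask` is a hypothetical helper, not MAE's own code.

```python
import random

NUM_PATCHES = 196  # 14 x 14 grid for ViT-B/16
MASK_RATIO = 0.75  # fraction of patches hidden from the encoder

def mae_random_mask(num_patches=NUM_PATCHES, mask_ratio=MASK_RATIO, seed=None):
    """Split patch indices into (visible, masked) by uniform random shuffling."""
    rng = random.Random(seed)
    idx = list(range(num_patches))
    rng.shuffle(idx)
    num_keep = int(num_patches * (1 - mask_ratio))  # 196 * 0.25 = 49
    return sorted(idx[:num_keep]), sorted(idx[num_keep:])

visible, masked = mae_random_mask(seed=0)
print(len(visible), len(masked))  # 49 147
```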

I-JEPA (ours)

Status: Planned (requires HPC pretraining)
Pretraining: I-JEPA SSL (representation prediction)
Hypothesis: Best under shift (more general features)

Our main contribution. I-JEPA predicts abstract representations of masked regions rather than reconstructing pixels. We hypothesize this produces more transferable features under domain shift than supervised, DINO, or MAE pretraining.
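I-JEPA chooses what to predict with multi-block masking. It samples several target blocks and one large context block on the patch grid, then removes the target patches from the context. A predictor then regresses the target encoder's embeddings of those target patches instead of their pixels. The sketch below shows only the masking step on the 14×14 grid. It assumes the defaults reported for I-JEPA, which should be treated as assumptions: 4 target blocks at 15 to 20% scale with aspect ratio 0.75 to 1.5, and one square context block at 85 to 100% scale. The helpers `sample_block` and `ijepa_masks` are hypothetical. Encoders, predictor, and loss are omitted.

```python
import math
import random

GRID = 14  # 14 x 14 patch grid (196 patches)

def sample_block(rng, scale, aspect, grid=GRID):
    """Sample a rectangle of patch indices covering roughly `scale` of the grid."""
    s = rng.uniform(*scale)
    ar = rng.uniform(*aspect)
    area = s * grid * grid
    h = max(1, min(grid, round(math.sqrt(area * ar))))  # h / w is about ar
    w = max(1, min(grid, round(math.sqrt(area / ar))))  # h * w is about area
    top = rng.randint(0, grid - h)
    left = rng.randint(0, grid - w)
    return {r * grid + c for r in range(top, top + h) for c in range(left, left + w)}

def ijepa_masks(seed=None, num_targets=4):
    rng = random.Random(seed)
    # Target blocks: regions whose *representations* the predictor must predict
    targets = [sample_block(rng, (0.15, 0.2), (0.75, 1.5)) for _ in range(num_targets)]
    # Context block: one large block with every target patch removed,
    # so the predictor cannot simply copy the targets
    context = sample_block(rng, (0.85, 1.0), (1.0, 1.0))
    context -= set().union(*targets)
    return context, targets

context, targets = ijepa_masks(seed=0)
print(len(context), [len(t) for t in targets])
```

MAE reconstructs pixels of randomly scattered patches. In this sketch, the targets are instead large contiguous regions, and the training signal lives in embedding space. Our transfer hypothesis rests on this difference.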

03 Domain Shift Evaluation

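Key finding: DINO achieves 10.90 percentage points higher source accuracy than supervised ViT, yet under domain shift it retains almost the same fraction of its source accuracy (7.0% vs 6.6% on DermaMNIST, 20.1% vs 21.9% on BloodMNIST, 12.5% vs 13.1% on RetinaMNIST). Better in-domain features do not protect against distribution shift.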

Method Comparison — All Domains (Mean across 3 seeds)

Method	Source	→Derma	→Blood	→Retina	Source ECE
Supervised ViT	80.90 ± 0.17%	5.31 ± 0.15%	17.78 ± 0.24%	10.58 ± 0.29%	0.0138
DINO ViT-B/16	91.80 ± 0.33%	6.46 ± 0.37%	18.45 ± 0.16%	11.50 ± 0.00%	0.0111
MAE ViT-B/16	Pending HPC experiments				—
I-JEPA + TTA (ours)	Pending HPC pretraining				—

→ DermaMNIST

Colon tissue → Skin lesion · 7 classes

Shift severity: Severe

Supervised: 80.90% → 5.31% (source → target accuracy)
DINO: 91.80% → 6.46%

Retained (target accuracy as a share of source accuracy):
Supervised retained: 6.6%
DINO retained: 7.0%
Difference: +0.4 pp only

→ BloodMNIST

Colon tissue → Blood cell · 8 classes

Shift severity: Severe

Supervised: 80.90% → 17.78% (source → target accuracy)
DINO: 91.80% → 18.45%

Supervised retained: 21.9%
DINO retained: 20.1%
Difference: -1.8 pp (DINO worse)

→ RetinaMNIST

Colon tissue → Retinal fundus · 5 classes

Shift severity: Severe

Supervised: 80.90% → 10.58% (source → target accuracy)
DINO: 91.80% → 11.50%

Supervised retained: 13.1%
DINO retained: 12.5%
Difference: -0.6 pp (DINO worse)
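The retained figures divide target accuracy by source accuracy. The sketch below recomputes them from the rounded means in the domain-shift table. It matches the cards to within 0.1 points. The last digit can move depending on whether retention is computed from rounded means or averaged per seed. The dictionary names and the `retained` helper are illustrative.

```python
# Mean test accuracies (%) copied from the domain-shift table above
SOURCE = {"Supervised ViT": 80.90, "DINO ViT-B/16": 91.80}
TARGET = {
    "Supervised ViT": {"Derma": 5.31, "Blood": 17.78, "Retina": 10.58},
    "DINO ViT-B/16": {"Derma": 6.46, "Blood": 18.45, "Retina": 11.50},
}

def retained(target_acc: float, source_acc: float) -> float:
    """Share of source accuracy kept on the target domain, in percent."""
    return 100.0 * target_acc / source_acc

for domain in ("Derma", "Blood", "Retina"):
    sup = retained(TARGET["Supervised ViT"][domain], SOURCE["Supervised ViT"])
    dino = retained(TARGET["DINO ViT-B/16"][domain], SOURCE["DINO ViT-B/16"])
    print(f"{domain}: supervised {sup:.1f}%, DINO {dino:.1f}%, diff {dino - sup:+.1f} pp")
```

Retention normalizes away DINO's higher source accuracy. This makes it the fairer measure when asking whether a backbone degrades less, rather than whether it simply starts higher.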

04 Key Observations

Better features ≠ better robustness

DINO is 10.90 percentage points more accurate on source but retains almost the same fraction of that accuracy under shift. Self-distillation SSL (DINO) does not protect against domain shift.

ECE collapses under shift

Source ECE of 0.011-0.014 rises to 0.84-0.89 on DermaMNIST. Both methods become severely overconfident in wrong predictions.
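Expected calibration error (ECE) bins predictions by confidence and averages the gap between accuracy and mean confidence, weighted by bin size. The stdlib sketch below assumes 15 equal-width bins. The bin count behind the reported ECE values is not stated here. The toy input mimics the shifted-domain pattern described above, where the model stays confident but is almost always wrong. In that regime, ECE approaches mean confidence minus accuracy. `expected_calibration_error` is a hypothetical helper name.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE = sum over bins of (|B| / n) * |accuracy(B) - mean confidence(B)|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# Toy shifted-domain case: every prediction at 0.9 confidence, only 1 in 20 correct
confs = [0.9] * 20
hits = [True] + [False] * 19
print(round(expected_calibration_error(confs, hits), 4))  # 0.85
```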

DINO slightly worse on BloodMNIST

DINO retains 20.1% of its source accuracy vs 21.9% for supervised ViT. Stronger in-domain specialization may hurt transfer to some target domains.

Motivates I-JEPA hypothesis

Neither supervised pretraining nor self-distillation SSL (DINO) protects against shift. Our hypothesis is that predictive SSL (I-JEPA) learns more general structure and therefore transfers better.

05 Planned Comparisons

Method	Type	Status
Supervised ViT-B/16	Supervised pretraining	✓ Complete
DINO ViT-B/16	Self-distillation SSL	✓ Complete
MAE ViT-B/16	Reconstructive SSL	⏳ Running
I-JEPA ViT-B/16 (ours)	Predictive SSL	⬜ HPC Pending
Supervised + TTA	Supervised + adaptation	⬜ HPC Pending
DINO + TTA	SSL + adaptation	⬜ HPC Pending
MAE + TTA	SSL + adaptation	⬜ HPC Pending
I-JEPA + TTA (proposed)	SSL + adaptation	⬜ HPC Pending
