BSc Thesis · University of Debrecen · 2026

JEPA-RobustViT
Baseline Results Dashboard

Author
Asfand Yar
Supervisor
Dr. Bogacsovics Gergő
External
Sergio Correa
Updated
April 2026
Phase 1
✓ Supervised ViT
Phase 2
✓ DINO
Phase 3
⏳ MAE + I-JEPA
01 Experimental Setup
Backbone
ViT-B/16
Embed Dim
768
Patches
196 (14×14)
Blocks
12 transformer
Attention Heads
12
Backbone Status
Frozen
Head
Linear(768, C)
Optimizer
Adam, lr=1e-3
Scheduler
CosineAnnealingLR
Epochs
10
Batch Size
256
Seeds
0, 1, 2
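Two quantities implied by this setup can be checked directly. The sketch below assumes 224×224 inputs, which the 14×14 grid of 16-pixel patches implies. It also assumes PyTorch's `CosineAnnealingLR` is stepped once per epoch with `T_max=10` and the default `eta_min=0`. The stepping granularity and `eta_min` are assumptions, not stated settings, and the helper names are hypothetical.

```python
import math

IMAGE_SIZE = 224   # assumed input resolution
PATCH_SIZE = 16    # the "/16" in ViT-B/16

def num_patches(image_size: int = IMAGE_SIZE, patch_size: int = PATCH_SIZE) -> int:
    """Number of non-overlapping patches the ViT splits an image into."""
    side = image_size // patch_size
    return side * side

def cosine_lr(epoch: int, base_lr: float = 1e-3, t_max: int = 10, eta_min: float = 0.0) -> float:
    """Closed form of cosine annealing without restarts (assumed per-epoch stepping)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

if __name__ == "__main__":
    print(num_patches())  # 196
    for epoch in range(10):
        print(epoch, f"{cosine_lr(epoch):.2e}")
```

With these assumptions, the learning rate starts at 1e-3, halves by epoch 5, and approaches zero at the end of the 10 epochs.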
02 Linear Probe Results by Method
Supervised ViT
DINO
MAE
I-JEPA (ours)
Supervised ViT
Mean Test Accuracy
80.90%
± 0.17% across 3 seeds
Mean Test ECE
0.0138
Well calibrated on source
Pretraining
ImageNet
Supervised labels
Trainable Params
6,921
Head only
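The trainable-parameter count follows from the head shape: `Linear(768, C)` holds 768·C weights plus C biases. The check below assumes that the head is the only trainable module, as the card states, and shows that 6,921 parameters corresponds to C = 9 source classes. The function names are hypothetical.

```python
def linear_head_params(in_features: int, num_classes: int) -> int:
    """Weights plus biases of a Linear(in_features, num_classes) layer."""
    return in_features * num_classes + num_classes

def infer_num_classes(total_params: int, in_features: int = 768) -> int:
    """Invert the formula above: total = C * (in_features + 1)."""
    classes, remainder = divmod(total_params, in_features + 1)
    if remainder:
        raise ValueError("parameter count is not consistent with a single linear head")
    return classes

print(linear_head_params(768, 9))  # 6921
print(infer_num_classes(6921))     # 9
```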
Per-Seed Results — Supervised ViT-B/16
Seed | Best Val Acc | Test Accuracy | Test ECE
0 | 84.66% | 81.14% | 0.0128
1 | 84.51% | 80.78% | 0.0110
2 | 84.30% | 80.77% | 0.0175
Mean ± Std | 84.49 ± 0.15% | 80.90 ± 0.17% | 0.0138 ± 0.0033
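The accuracy summary row can be reproduced from the per-seed values with Python's `statistics` module. This sketch assumes the dashboard reports the population standard deviation (ddof = 0, NumPy's default), because that choice reproduces the ± 0.15% and ± 0.17% spreads. The sample standard deviation would give about ± 0.21% for test accuracy instead.

```python
from statistics import mean, pstdev, stdev

# Per-seed accuracies (%) for the supervised ViT-B/16 linear probe, copied from the table above.
best_val = [84.66, 84.51, 84.30]
test_acc = [81.14, 80.78, 80.77]

def summarize(values):
    """Return (mean, population std) rounded to two decimals."""
    return round(mean(values), 2), round(pstdev(values), 2)

print(summarize(best_val))          # (84.49, 0.15)
print(summarize(test_acc))          # (80.9, 0.17)
print(round(stdev(test_acc), 2))    # 0.21 (sample std, ddof = 1)
```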
DINO
Mean Test Accuracy
91.80%
± 0.33% across 3 seeds
Mean Test ECE
0.0111
Well calibrated on source
vs Supervised
+10.90 pp
Higher source accuracy (percentage points)
Pretraining
DINO SSL
No labels, self-distillation
Per-Seed Results — DINO ViT-B/16
Seed | Best Val Acc | Test Accuracy | Test ECE
0 | 95.08% | 91.31% | 0.0151
1 | 94.75% | 92.01% | 0.0099
2 | 95.16% | 92.08% | 0.0084
Mean ± Std | 95.00 ± 0.17% | 91.80 ± 0.33% | 0.0111 ± 0.0028
MAE
Status
Running
HPC experiments pending
Pretraining
MAE SSL
Pixel reconstruction
Expected Source Acc
~75-80%
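Lower: pixel-reconstruction objective
MAE features are expected to be less linearly separable than DINO features. MAE is pretrained to reconstruct pixels, and its encoder sees only the 25% of patches left visible by 75% random masking. This tends to favour low-level detail over the semantic structure a linear probe depends on. Results pending HPC access.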
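The 25% figure corresponds to MAE's 75% random masking ratio. The sketch below shows per-image random patch masking on the 196-patch grid using only the standard library. The function name is hypothetical. The reference MAE code shuffles patches by sorting random noise in PyTorch rather than calling `random.sample`, but both approaches keep a uniformly random subset.

```python
import random

NUM_PATCHES = 196   # 14 x 14 grid for ViT-B/16 at 224 x 224
MASK_RATIO = 0.75   # MAE's default masking ratio

def random_patch_mask(num_patches=NUM_PATCHES, mask_ratio=MASK_RATIO, seed=None):
    """Split patch indices into (visible, masked) uniformly at random."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(rng.sample(range(num_patches), num_keep))
    visible_set = set(visible)
    masked = [i for i in range(num_patches) if i not in visible_set]
    return visible, masked

visible, masked = random_patch_mask(seed=0)
print(len(visible), len(masked))  # 49 147
```

Only the 49 visible tokens pass through the encoder. A lightweight decoder then reconstructs the pixels of the 147 masked patches.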
I-JEPA (ours)
Status
Planned
Requires HPC pretraining
Pretraining
I-JEPA SSL
Representation prediction
Hypothesis
Best under shift
More general features
Our main contribution. Rather than reconstructing pixels, I-JEPA predicts the representations (embeddings) of masked target regions from a visible context region. We hypothesize that this produces features that transfer better under domain shift than supervised, DINO, or MAE pretraining.
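To make "predicts representations of masked regions" concrete, consider how I-JEPA sets up each training step. It samples several target blocks and one large context block on the patch grid, then removes the target patches from the context. A predictor then outputs embeddings for the target patches from the context encoder's output. The sketch below covers only the block sampling. The block counts and ranges are assumptions based on our reading of the I-JEPA paper's defaults: 4 targets at 15 to 20% scale with aspect ratio 0.75 to 1.5, and one square context block at 85 to 100% scale. The function names are hypothetical.

```python
import math
import random

GRID = 14  # 14 x 14 patch grid for ViT-B/16 at 224 x 224

def sample_block(rng, scale, aspect):
    """Patch indices of one random rectangle covering `scale` of the grid, height/width ratio `aspect`."""
    s = rng.uniform(*scale)
    a = rng.uniform(*aspect)
    area = s * GRID * GRID
    h = max(1, min(GRID, round(math.sqrt(area * a))))
    w = max(1, min(GRID, round(math.sqrt(area / a))))
    top = rng.randint(0, GRID - h)
    left = rng.randint(0, GRID - w)
    return {r * GRID + c for r in range(top, top + h) for c in range(left, left + w)}

def ijepa_style_masks(seed=0, num_targets=4):
    """Sample target blocks and a context block, then drop target patches from the context."""
    rng = random.Random(seed)
    targets = [sample_block(rng, (0.15, 0.2), (0.75, 1.5)) for _ in range(num_targets)]
    context = sample_block(rng, (0.85, 1.0), (1.0, 1.0))
    context -= set().union(*targets)  # the predictor must not see the patches it predicts
    return context, targets

context, targets = ijepa_style_masks(seed=0)
print(len(context), [len(t) for t in targets])
```

In the I-JEPA paper, the context encoder embeds only the context patches. The predictor, conditioned on each target block's positions, outputs embeddings for those patches. The loss is the L2 distance to an EMA target encoder's embeddings of the same patches, so no pixels are reconstructed.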
03 Domain Shift Evaluation
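Key finding: DINO achieves 10.90 percentage points higher source accuracy than the supervised ViT. Yet under domain shift, both retain a similarly small fraction of their source accuracy (supervised vs DINO: 6.6% vs 7.0% on DermaMNIST, 21.9% vs 20.1% on BloodMNIST, 13.1% vs 12.5% on RetinaMNIST). Better in-domain features do not protect against distribution shift.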
Method Comparison — All Domains (Mean across 3 seeds)
Method | Source | → Derma | → Blood | → Retina | Source ECE
Supervised ViT | 80.90 ± 0.17% | 5.31 ± 0.15% | 17.78 ± 0.24% | 10.58 ± 0.29% | 0.0138
DINO ViT-B/16 | 91.80 ± 0.33% | 6.46 ± 0.37% | 18.45 ± 0.16% | 11.50 ± 0.00% | 0.0111
MAE ViT-B/16 | Pending HPC experiments
I-JEPA + TTA (ours) | Pending HPC pretraining
→ DermaMNIST
Colon tissue → Skin lesion · 7 classes
Severe
Supervised: 80.90% → 5.31%
DINO: 91.80% → 6.46%
Supervised retained: 6.6%
DINO retained: 7.0%
Difference: +0.4 percentage points only
→ BloodMNIST
Colon tissue → Blood cell · 8 classes
Severe
Supervised: 80.90% → 17.78%
DINO: 91.80% → 18.45%
Supervised retained: 21.9%
DINO retained: 20.1%
Difference: -1.8 percentage points (DINO worse)
→ RetinaMNIST
Colon tissue → Retinal fundus · 5 classes
Severe
Supervised: 80.90% → 10.58%
DINO: 91.80% → 11.50%
Supervised retained: 13.1%
DINO retained: 12.5%
Difference: -0.6 percentage points (DINO worse)
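The "retained" figures are target accuracy divided by source accuracy. The sketch below recomputes them from the three-seed mean accuracies. It assumes retention is a ratio of means; if it were instead averaged per seed, the last digit could shift. Names are hypothetical.

```python
# Mean test accuracies (%) from the domain-shift table: (source, {target: accuracy}).
RESULTS = {
    "Supervised ViT": (80.90, {"DermaMNIST": 5.31, "BloodMNIST": 17.78, "RetinaMNIST": 10.58}),
    "DINO ViT-B/16": (91.80, {"DermaMNIST": 6.46, "BloodMNIST": 18.45, "RetinaMNIST": 11.50}),
}

def retention(source_acc: float, target_acc: float) -> float:
    """Percentage of source accuracy that survives the shift."""
    return 100.0 * target_acc / source_acc

for method, (source, targets) in RESULTS.items():
    for domain, acc in targets.items():
        print(f"{method:15s} {domain:12s} {retention(source, acc):5.1f}%")
```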
04 Key Observations
Better features ≠ better robustness
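DINO is 10.90 percentage points more accurate on the source domain. Under shift, however, it retains almost the same fraction of that accuracy as the supervised model (within 1.8 points on every target). Self-distillation SSL alone does not protect against domain shift.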
ECE collapses under shift
Source ECE of 0.011-0.014 rises to 0.84-0.89 on DermaMNIST. Both methods become severely overconfident in wrong predictions.
DINO slightly worse on BloodMNIST
Retained 20.1% vs supervised 21.9%. Stronger in-domain specialization may actually hurt transfer to some target domains.
Motivates I-JEPA hypothesis
Neither supervised pretraining nor self-distillation SSL (DINO) protects against shift. Our hypothesis is that predictive SSL (I-JEPA) learns more general structure and therefore transfers better.
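ECE works by binning predictions by confidence, then averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below uses 15 equal-width bins, a common default. The bin count behind the reported values is not stated here, so treat it as an assumption. The toy inputs illustrate the regime described above: when confident predictions are almost all wrong, ECE approaches the confidence level itself.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE with equal-width confidence bins over (0, 1].

    confidences: max softmax probability per sample (always > 0).
    correct: True where the predicted class matches the label.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not in_bin:
            continue
        acc = sum(1 for i in in_bin if correct[i]) / len(in_bin)
        conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        ece += len(in_bin) / n * abs(acc - conf)
    return ece

# Calibrated toy case: 80% confidence, 4 of 5 correct -> ECE close to 0.
print(round(expected_calibration_error([0.8] * 5, [True, True, True, True, False]), 4))
# Shifted toy case: 90% confidence, 1 of 20 correct -> ECE 0.85.
print(round(expected_calibration_error([0.9] * 20, [True] + [False] * 19), 4))
```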
05 Planned Comparisons
Method | Type | Status
Supervised ViT-B/16 | Supervised pretraining | ✓ Complete
DINO ViT-B/16 | Self-distillation SSL | ✓ Complete
MAE ViT-B/16 | Reconstructive SSL | ⏳ Running
I-JEPA ViT-B/16 (ours) | Predictive SSL | ⬜ HPC Pending
Supervised + TTA | Supervised + adaptation | ⬜ HPC Pending
DINO + TTA | SSL + adaptation | ⬜ HPC Pending
MAE + TTA | SSL + adaptation | ⬜ HPC Pending
I-JEPA + TTA (proposed) | SSL + adaptation | ⬜ HPC Pending