Submitted anonymously.

Submission data

Full name: Monocular and Stereo Matching Network (MonSter)
Description: This method is a pure deep-learning stereo matching network based on MonSter. It uses a Vision Transformer (ViT) backbone and iterative geometric reasoning to produce high-quality disparity maps. The input is a stereo image pair, and the output is a subpixel-accurate disparity map. We apply the method to the ETH3D two-view stereo benchmark without additional post-processing or depth refinement.
Parameters:
* Vision Transformer: vitl
* Correlation levels: 2
* Correlation radius: 4
* Number of GRU layers: 3
* Max disparity: 192
* Iterations: 32
* Mixed precision: enabled
Programming language(s): Python + PyTorch with CUDA
Hardware: Intel Core i7-10700, RTX 3060 6GB, 32 GB RAM
Submission creation date: 7 Jul, 2025
Last edited: 7 Jul, 2025
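
The parameters listed above could be collected in a small configuration object, as in the following minimal sketch. The class name `SubmissionConfig`, its field names, and the `lookup_samples` helper are hypothetical and are not taken from the MonSter code base. The helper additionally assumes a RAFT-Stereo-style 1D correlation lookup that samples 2r + 1 offsets per pyramid level, which this page does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionConfig:
    """Hypothetical container for the parameters listed on this page."""

    encoder: str = "vitl"         # Vision Transformer variant
    corr_levels: int = 2          # correlation levels
    corr_radius: int = 4          # correlation radius
    n_gru_layers: int = 3         # number of GRU layers
    max_disp: int = 192           # maximum disparity
    iters: int = 32               # refinement iterations
    mixed_precision: bool = True  # mixed precision enabled

    def __post_init__(self):
        # Reject values that cannot describe a valid run.
        for name in ("corr_levels", "n_gru_layers", "max_disp", "iters"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.corr_radius < 0:
            raise ValueError("corr_radius must be non-negative")

    def lookup_samples(self) -> int:
        # Assumption: a RAFT-Stereo-style 1D lookup reads (2 * radius + 1)
        # correlation values per level. The actual network may differ.
        return self.corr_levels * (2 * self.corr_radius + 1)


config = SubmissionConfig()
print(config)
print("correlation samples per pixel (assumed):", config.lookup_samples())
```

Freezing the dataclass keeps the submitted settings from being changed accidentally after construction.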

High-res multi-view results



Columns: Info, all, high-res multi-view, indoor, outdoor, courty., delive., electro, facade, kicker, meadow, office, pipes, playgr., relief, relief., terrace, terrai.
No results yet.

Low-res many-view results



Columns: Info, all, low-res many-view, indoor, outdoor, lakeside, sand box, storage room, storage room 2, tunnel
No results yet.

Low-res two-view results



Info: two views
all: 0.62
lakes. 1l: 0.43
lakes. 1s: 1.07
sand box 1l: 0.66
sand box 1s: 0.82
stora. room 1l: 1.13
stora. room 1s: 0.39
stora. room 2l: 2.57
stora. room 2s: 0.92
stora. room 2 1l: 0.58
stora. room 2 1s: 0.64
stora. room 2 2l: 0.36
stora. room 2 2s: 0.36
stora. room 3l: 0.90
stora. room 3s: 0.43
tunnel 1l: 0.15
tunnel 1s: 0.15
tunnel 2l: 0.20
tunnel 2s: 0.17
tunnel 3l: 0.21
tunnel 3s: 0.19
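
As a quick consistency check on the two-view numbers above, the sketch below averages the 20 per-pair values and compares the result with the "all" column. It assumes "all" is the unweighted mean over image pairs. The page does not state this, although the numbers are consistent with it. The variable names are illustrative only.

```python
from statistics import mean

# Per-pair scores copied from the low-res two-view results on this page.
per_pair = {
    "lakes. 1l": 0.43, "lakes. 1s": 1.07,
    "sand box 1l": 0.66, "sand box 1s": 0.82,
    "stora. room 1l": 1.13, "stora. room 1s": 0.39,
    "stora. room 2l": 2.57, "stora. room 2s": 0.92,
    "stora. room 2 1l": 0.58, "stora. room 2 1s": 0.64,
    "stora. room 2 2l": 0.36, "stora. room 2 2s": 0.36,
    "stora. room 3l": 0.90, "stora. room 3s": 0.43,
    "tunnel 1l": 0.15, "tunnel 1s": 0.15,
    "tunnel 2l": 0.20, "tunnel 2s": 0.17,
    "tunnel 3l": 0.21, "tunnel 3s": 0.19,
}

reported_all = 0.62  # value in the "all" column

# Assumption: "all" is the unweighted mean of the per-pair scores.
overall = mean(per_pair.values())
worst_pair = max(per_pair, key=per_pair.get)

print(f"mean over {len(per_pair)} pairs: {overall:.2f} (reported: {reported_all})")
print(f"highest per-pair value: {worst_pair} = {per_pair[worst_pair]}")
```

The per-pair values sum to 12.33, so the mean is about 0.6165, which rounds to the reported 0.62.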

SLAM results



Columns: Method, Info, all, boxes, boxes dark, buddha, cables 4, cables 5, desk 1, desk 2, desk changing 2, desk dark 1, desk dark 2, desk global light changes, desk ir light, dino, drone, foreground occlusion, helmet, kidnap 2, lamp, large loop 2, large loop 3, large non loop, motion 2, motion 3, motion 4, planar 1, reflective 2, scale change, table 1, table 2, table 5, table 6, table global light changes, table local light changes, table scene, trashbin
No results yet.