MoGe is a powerful model for recovering 3D geometry from monocular open-domain images, including metric point maps, metric depth maps, normal maps, and camera FOV. Check our project pages (MoGe-1, MoGe-2) for videos and interactive results!
📑 Publications
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
🌟 Features
Accurate 3D geometry estimation: Estimate point maps, depth maps, and normal maps from open-domain single images with high precision, with all capabilities in one model, one forward pass.
Optional ground-truth FOV input: Enhance model accuracy further by providing the true field of view.
Flexible resolution support: Works seamlessly with various resolutions and aspect ratios, from 2:1 to 1:2.
Optimized for speed: Achieves 60 ms latency per image (A100 or RTX 3090, FP16, ViT-L). The inference resolution is adjustable for even faster speed.
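The optional FOV input and the normalized intrinsics the model predicts are related by simple trigonometry. A minimal numpy sketch of the conversion (the helper names are ours, not part of the MoGe API; intrinsics are assumed normalized so the image spans [0, 1]):

```python
import numpy as np

def fov_x_from_intrinsics(K_norm):
    # Horizontal FOV (degrees) from the normalized focal length fx,
    # where the intrinsics are normalized so the image spans [0, 1].
    return np.degrees(2.0 * np.arctan(0.5 / K_norm[0, 0]))

def fx_from_fov_x(fov_x_deg):
    # Inverse: normalized focal length for a known horizontal FOV.
    return 0.5 / np.tan(np.radians(fov_x_deg) / 2.0)

# Example normalized intrinsics (cx, cy at the image center).
K_norm = np.array([[1.2, 0.0, 0.5],
                   [0.0, 1.6, 0.5],
                   [0.0, 0.0, 1.0]])
fov_x = fov_x_from_intrinsics(K_norm)   # about 45 degrees
```

This is only the geometric relationship; how the ground-truth FOV is actually passed to the model is documented in the inference API.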
✨ News
(2025-06-10)
Released MoGe-2, a state-of-the-art model for monocular geometry estimation, with these new capabilities in one unified model:
point map prediction in metric scale;
comparable or even better performance than MoGe-1;
⚙️ Installation

```bash
git clone https://github.com/microsoft/MoGe.git
cd MoGe
pip install -r requirements.txt    # install the requirements
```
Note: MoGe should be compatible with a wide range of dependency versions. Check requirements.txt for details if you encounter any dependency issues.
🤗 Pretrained Models
Our pretrained models are available on the Hugging Face Hub:
NOTE: moge-2-vitl-normal has the full set of capabilities: almost the same level of performance as moge-2-vitl, plus normal map estimation.
Import the MoGeModel class of the matching version, then load the pretrained weights via MoGeModel.from_pretrained("HUGGING_FACE_MODEL_REPO_NAME"); the weights are downloaded automatically.
If loading a local checkpoint, replace the model name with the local path.
💡 Minimal Code Example
Here is a minimal example for loading the model and inferring on a single image.
```python
import cv2
import torch
# from moge.model.v1 import MoGeModel
from moge.model.v2 import MoGeModel     # Let's try MoGe-2

device = torch.device("cuda")

# Load the model from the Hugging Face Hub (or from a local checkpoint).
model = MoGeModel.from_pretrained("Ruicheng/moge-2-vitl-normal").to(device)

# Read the input image and convert it to a tensor (3, H, W) with RGB values normalized to [0, 1].
input_image = cv2.cvtColor(cv2.imread("PATH_TO_IMAGE.jpg"), cv2.COLOR_BGR2RGB)
input_image = torch.tensor(input_image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)

# Infer
output = model.infer(input_image)
"""
`output` has keys "points", "depth", "mask", "normal" (optional) and "intrinsics".
The maps have the same size as the input image.
{
    "points": (H, W, 3),    # point map in the OpenCV camera coordinate system (x right, y down, z forward); metric scale for MoGe-2
    "depth": (H, W),        # depth map
    "normal": (H, W, 3),    # normal map in the OpenCV camera coordinate system (available for MoGe-2-normal)
    "mask": (H, W),         # binary mask of valid pixels
    "intrinsics": (3, 3),   # normalized camera intrinsics
}
"""
```
For more usage details, see the MoGeModel.infer() docstring.
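The "points", "depth" and "intrinsics" outputs are mutually consistent: a point map can be rebuilt from the depth map via the pinhole model. A numpy sketch of that reconstruction (the helper name and the pixel-center convention are our assumptions, not part of the MoGe API):

```python
import numpy as np

def unproject_depth(depth, K_norm):
    # Rebuild a point map from a depth map and normalized intrinsics in the
    # OpenCV convention (x right, y down, z forward). Sketch only; the
    # model's own "points" output should match up to prediction noise.
    H, W = depth.shape
    u = (np.arange(W) + 0.5) / W          # normalized pixel centers (assumed convention)
    v = (np.arange(H) + 0.5) / H
    uu, vv = np.meshgrid(u, v)            # 'xy' indexing -> (H, W) grids
    fx, fy = K_norm[0, 0], K_norm[1, 1]
    cx, cy = K_norm[0, 2], K_norm[1, 2]
    x = (uu - cx) / fx * depth
    y = (vv - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

# Toy example: constant depth, centered principal point.
depth = np.ones((2, 2))
K_norm = np.array([[1.0, 0.0, 0.5],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 1.0]])
points = unproject_depth(depth, K_norm)   # shape (2, 2, 3)
```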
Run the script moge/scripts/infer.py via the following command:
```bash
# Save the output [maps], [glb] and [ply] files
moge infer -i IMAGES_FOLDER_OR_IMAGE_PATH --o OUTPUT_FOLDER --maps --glb --ply

# Show the result in a window (requires pyglet < 2.0, e.g. pip install pyglet==1.5.29)
moge infer -i IMAGES_FOLDER_OR_IMAGE_PATH --o OUTPUT_FOLDER --show
```
For panorama images, use moge infer_panorama. The script splits the 360-degree panorama into multiple perspective views, infers on each view separately, and combines the output maps into a panorama depth map and point map. Note that the panorama image must use spherical parameterization (e.g., environment maps or equirectangular images); other formats must be converted to spherical format before using this script. Run moge infer_panorama --help for detailed options.
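Spherical parameterization means each pixel maps directly to a viewing direction on the unit sphere. A numpy sketch of the equirectangular pixel-to-ray mapping (the axis convention here is our assumption, chosen to match the OpenCV-style x right, y down, z forward frame used elsewhere; it is not taken from the MoGe code):

```python
import numpy as np

def equirect_ray(u, v):
    # Unit-sphere viewing direction for normalized equirectangular
    # coordinates u, v in [0, 1] (u: longitude, v: latitude).
    # Axis convention assumed: x right, y down, z forward (OpenCV-style).
    lon = (u - 0.5) * 2.0 * np.pi     # [-pi, pi]
    lat = (0.5 - v) * np.pi           # [-pi/2, pi/2], top of image = up
    x = np.cos(lat) * np.sin(lon)
    y = -np.sin(lat)                  # y points down
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])

center = equirect_ray(0.5, 0.5)   # image center looks along +z
```

Perspective views for inference are then obtained by sampling such rays through virtual pinhole cameras, which is what makes per-view depth maps recombinable into a single panorama.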
The MoGe code is released under the MIT license, except for the DINOv2 code in moge/model/dinov2, which is released by Meta AI under the Apache 2.0 license.
See LICENSE for more details.
📜 Citation
If you find our work useful in your research, please consider citing our papers:
```bibtex
@misc{wang2024moge,
  title={MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision},
  author={Wang, Ruicheng and Xu, Sicheng and Dai, Cassie and Xiang, Jianfeng and Deng, Yu and Tong, Xin and Yang, Jiaolong},
  year={2024},
  eprint={2410.19115},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.19115},
}

@misc{wang2025moge2,
  title={MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details},
  author={Ruicheng Wang and Sicheng Xu and Yue Dong and Yu Deng and Jianfeng Xiang and Zelong Lv and Guangzhong Sun and Xin Tong and Jiaolong Yang},
  year={2025},
  eprint={2507.02546},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.02546},
}
```