DPT-Hybrid is a model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. (2021). DPT uses the Vision Transformer (ViT) as its backbone and adds a neck and a head on top for monocular depth estimation.
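The following is a minimal sketch of how the model might be used for inference, assuming the Hugging Face `transformers` library, PyTorch, NumPy, and a PIL image as input. The repository identifier `Intel/dpt-hybrid-midas` is taken from the model URL; the function names `estimate_depth` and `depth_to_uint8` are hypothetical helpers introduced here for illustration and are not part of any library.

```python
import numpy as np

MODEL_ID = "Intel/dpt-hybrid-midas"  # repository name taken from the model URL


def depth_to_uint8(depth):
    """Rescale a relative depth map to 0..255 for viewing (illustrative helper)."""
    depth = np.asarray(depth, dtype=np.float32)
    span = float(depth.max() - depth.min())
    if span == 0.0:
        # A constant map carries no relative depth information.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - depth.min()) / span * 255.0
    return scaled.round().astype(np.uint8)


def estimate_depth(image):
    """Run DPT-Hybrid on a PIL image and return an (H, W) float array."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import DPTForDepthEstimation, DPTImageProcessor

    processor = DPTImageProcessor.from_pretrained(MODEL_ID)
    model = DPTForDepthEstimation.from_pretrained(MODEL_ID)
    model.eval()

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        predicted = model(**inputs).predicted_depth  # shape (1, h, w)

    # Resize the prediction back to the input resolution (PIL size is (W, H)).
    resized = torch.nn.functional.interpolate(
        predicted.unsqueeze(1),
        size=image.size[::-1],
        mode="bicubic",
        align_corners=False,
    )
    return resized.squeeze().cpu().numpy()
```

With these helpers, something like `depth_to_uint8(estimate_depth(Image.open("photo.jpg")))` would yield an 8-bit array suitable for saving as a grayscale image; the file name is a placeholder. The output is a relative depth map, so the rescaling is for visualization only and does not give metric distances.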
Model URL:
https://huggingface.co/Intel/dpt-hybrid-midas
License: Apache-2.0