UWSegDepth: Semantic-Aware Object-Level Depth Estimation in Underwater Scenes

Pin-Chi Pan 1 Soo-Chang Pei 2
1 Graduate Institute of Communication Engineering, National Taiwan University, Taiwan
2 Department of Electrical Engineering, National Taiwan University, Taiwan
[Paper] [Code]
【This paper has been accepted by CVGIP 2025】

Abstract

Fig. 4. Overview of the UWSegDepth pipeline.
Accurate depth estimation and object recognition are essential for underwater tasks such as navigation, habitat monitoring, and exploration. However, light scattering, color attenuation, and water turbidity make it difficult to estimate depth from a single image. We propose SADDER (Segmentation-Augmented Differential Depth Estimation Regressor), a lightweight module that improves depth estimates by correcting residual errors guided by instance segmentation. We also introduce UWSegDepth, a straightforward post-processing method that calculates the average depth of each segmented object, adding object-level structure to pixel-wise predictions. Experiments on the FLSea benchmark show clear improvements, especially in shallow and murky conditions. The proposed method provides reliable depth estimates and clear object-level information, making it suitable for practical underwater applications.
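
To make the object-level step concrete, below is a minimal sketch of the UWSegDepth post-processing described above: average the pixel-wise depth prediction inside each instance mask to obtain one depth per object. The function name, NumPy representation, and boolean mask format are illustrative assumptions of ours; the paper states only that the mean depth of each segmented object is computed.

import numpy as np

def object_level_depth(depth: np.ndarray, instance_masks: list[np.ndarray]) -> list[float]:
    # depth:          (H, W) per-pixel depth prediction (e.g., in metres).
    # instance_masks: one (H, W) boolean mask per segmented object.
    # Returns the mean predicted depth inside each mask.
    object_depths = []
    for mask in instance_masks:
        pixels = depth[mask]
        # Guard against degenerate masks with no foreground pixels.
        object_depths.append(float(pixels.mean()) if pixels.size else float("nan"))
    return object_depths

# Toy example: a 4x4 depth map with one 2x2 object in the top-left corner.
depth = np.arange(16, dtype=np.float32).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True                       # covers depths 0, 1, 4, 5
print(object_level_depth(depth, [mask]))  # -> [2.5]

Collapsing each object to a single mean depth is what gives the pixel-wise prediction its object-level structure: per-object distances become easy to read off, and the estimate is less sensitive to isolated pixel errors inside a mask.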

Experiments

【Quantitative Comparison】

Table 1. Quantitative comparison with state-of-the-art methods on the FLSea test set, reported using MEAN accuracy.
Table 2. Quantitative comparison with state-of-the-art methods on the FLSea test set, reported using MEDIAN accuracy.
Table 3. Comparison between our SADDER and UIE-based strategy for enhancing underwater depth estimation on the FLSea test set, reported using MEAN accuracy.

【Qualitative Comparison】

Fig. 5. Qualitative comparison of depth estimation results on the FLSea dataset, showing that our method yields more reliable predictions under challenging underwater conditions.
Fig. 6. More qualitative comparison of depth estimation results on the FLSea dataset.

【Qualitative Results】

Fig. 7. Qualitative results of our UWSegDepth. The integration of instance segmentation and depth prediction allows for interpretable object-level depth estimation in underwater environments.
Fig. 8. More qualitative results of our UWSegDepth.

References

[1] F. Shkurti, A. Xu, M. Meghjani, J. C. G. Higuera, Y. Girdhar, P. Giguere, B. B. Dey, J. Li, A. Kalmbach, C. Prahacs, et al. “Multi-domain monitoring of marine environments using a heterogeneous robot team,” In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1747–1753. IEEE, 2012.
[2] Y. Gutnik, A. Avni, T. Treibitz, and M. Groper. “On the adaptation of an auv into a dedicated platform for close range imaging survey missions,” Journal of Marine Science and Engineering, 10(7):974, 2022.
[3] A. Kim and R. M. Eustice. “Real-time visual slam for autonomous underwater hull inspection using visual saliency,” IEEE Transactions on Robotics, 29(3):719–733, 2013.
[4] J.-Q. Yu. “Image Recognition in Driving Scenes: 3D Object Recognition and Image Segmentation,” Master's thesis, Graduate Institute of Communication Engineering, National Taiwan University, pages 1–137, 2023.
[5] I.-C. Lu. “Real-Time Image Segmentation, Depth Estimation, and Refinement Modules for Autonomous Driving at Night and in Low-Light Conditions,” Master's thesis, Graduate Institute of Communication Engineering, National Taiwan University, pages 1–109, 2024.
[6] J. Raihan A, P. E. Abas, and L. C. De Silva. “Depth estimation for underwater images from single view image,” IET Image Processing, 14(16):4188–4197, 2020.
[7] S. Zhang, X. Gong, R. Nian, B. He, Y. Wang, and A. Lendasse. “A depth estimation model from a single underwater image with non-uniform illumination correction,” In OCEANS 2017-Aberdeen, pages 1–5. IEEE, 2017.
[8] B. Yu, J. Wu, and M. J. Islam. “UDepth: Fast monocular depth estimation for visually-guided underwater robots,” In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 3116–3123. IEEE, 2023.
[9] L. Ebner, G. Billings, and S. Williams. “Metrically scaled monocular depth estimation through sparse priors for underwater robots,” In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 3751–3757. IEEE, 2024.
[10] P.-C. Pan and S.-C. Pei. “BARIS: Boundary-Aware Refinement with Environmental Degradation Priors for Robust Underwater Instance Segmentation,” arXiv preprint arXiv:2504.19643, 2025.
[11] Y. Randall. “FLSea: Underwater visual-inertial and stereo-vision forward-looking datasets,” Master's thesis, University of Haifa (Israel), 2023.
[12] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. “MobileNetV2: Inverted residuals and linear bottlenecks,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
[13] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. “Automatic differentiation in PyTorch,” Advances in neural information processing systems, 2017.
[14] L. Zhang, A. Rao, and M. Agrawala. “Adding conditional control to text-to-image diffusion models,” In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
[15] I. Loshchilov and F. Hutter. “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
[16] Y. Liu, Q. Jiang, X. Wang, T. Luo, and J. Zhou. “Underwater image enhancement with cascaded contrastive learning,” IEEE Transactions on Multimedia, 2024.
[17] D. Eigen, C. Puhrsch, and R. Fergus. “Depth map prediction from a single image using a multi-scale deep network,” Advances in neural information processing systems, vol. 27, 2014.

Contact Information

Pin-Chi Pan and Soo-Chang Pei {r12942103@ntu.edu.tw; peisc@ntu.edu.tw}

Citation

@inproceedings{PanPei2025uwsegdepth,
    title = {UWSegDepth: Semantic-Aware Object-Level Depth Estimation in Underwater Scenes},
    author = {Pan, Pin-Chi and Pei, Soo-Chang},
    booktitle = {The 38th Conference on Computer Vision, Graphics, and Image Processing (CVGIP)},
    month = {August},
    year = {2025}
}