Object detection, which aims to predict the bounding boxes of all objects of interest in an RGB image, is of paramount importance for many real-world applications and has attracted much attention within the computer vision community. However, RGB cameras cannot directly provide depth information, and RGB-based object detectors cannot achieve accurate performance in complex environments. To address this problem, we make two contributions in this paper. First, we thoroughly evaluate the performance of four state-of-the-art unsupervised depth estimation methods in the context of object detection, which can serve as a baseline for other researchers developing more sophisticated methods. Second, we investigate whether fusing depth information with RGB images can improve the performance of object detection networks. Results on the KITTI dataset show that the RGB-depth fusion approach, with MonoDepth as the depth estimation method, outperforms both the RGB-based and the depth-based detectors.