To address the small number of training constraints and the low prediction accuracy of monocular 3D target detection networks, a monocular 3D target detection network with perspective projection is proposed. It improves the network structure, establishes perspective projection constraints, and optimizes the loss function. Firstly, a 3D target bounding box model based on the vanishing point (VP) is established from the perspective projection mechanism, using the transformation relationship among the world, the camera, and the target. Secondly, by combining the spatial geometric relationship with prior size information, this model is simplified into a constraint relationship among the yaw angle, the target size, and the 3D bounding box. Thirdly, a learning-based azimuth-size loss function is built on this constraint relationship. It exploits the fact that the size constraint is unimodal and easy to regress, which improves the learning efficiency and prediction accuracy of the network. Finally, because monocular 3D target detection networks lack 3D center constraints, a training strategy is proposed that jointly constrains the azimuth, size, and 3D center during model training, based on the spatial geometry of the 3D bounding box and the 2D bounding box. Experiments on the KITTI and SUN-RGBD datasets show that the proposed model outperforms the compared algorithms in 3D target detection.
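The geometric link between the yaw angle, the target size, the 3D center, and the 2D bounding box can be illustrated with a standard pinhole projection. The sketch below is not the VP-based model or the loss function of the proposed network, whose formulas are not given here. It only assumes a KITTI-style camera frame (x right, y down, z forward, yaw about the y axis) and a known intrinsic matrix `K`; the function names `box3d_corners` and `project_to_2d_box` are hypothetical.

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Return the 8 corners (3x8 array) of a 3D box in camera coordinates.

    Assumed conventions (not taken from the paper):
    center = (x, y, z) of the box centre, dims = (h, w, l),
    yaw = rotation about the camera y axis, in radians.
    """
    h, w, l = dims
    # Corner offsets in the object frame: length along x, height along y, width along z.
    x = np.array([l, l, -l, -l, l, l, -l, -l]) / 2.0
    y = np.array([h, h, h, h, -h, -h, -h, -h]) / 2.0
    z = np.array([w, -w, -w, w, w, -w, -w, w]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotation about the y axis by the yaw angle.
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    offset = np.asarray(center, dtype=float).reshape(3, 1)
    return R @ np.vstack([x, y, z]) + offset

def project_to_2d_box(corners, K):
    """Project 3D corners with intrinsics K and return the enclosing
    2D box as (u_min, v_min, u_max, v_max)."""
    uvw = K @ corners          # homogeneous image coordinates
    uv = uvw[:2] / uvw[2]      # perspective division by depth
    return np.array([uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()])

# Illustrative values only; they are not taken from KITTI or SUN-RGBD.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
corners = box3d_corners(center=(0.0, 0.0, 10.0), dims=(2.0, 2.0, 4.0), yaw=0.0)
box2d = project_to_2d_box(corners, K)
```

Changing `yaw`, `dims`, or `center` changes the projected 2D box. This dependence is what allows a 2D box to act as a constraint on the three quantities during training.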