Abstract:
Visual positioning and navigation have broad application prospects in logistics warehousing and other fields, but accurate positioning is difficult to achieve with traditional monocular vision. Binocular vision can achieve accurate positioning and navigation, but its hardware cost is high and it increases the size of the vehicle. Therefore, a monocular positioning technique based on feature deformation was proposed. In this method, a single camera recorded the deformation of features (encoded ring patterns) laid on the ground, and an embedded graphics processing unit (GPU) analyzed this deformation, achieving end-to-end monocular visual positioning. For each image collected by the camera, the embedded GPU recognized the encoded patterns of the feature rings with a deep-learning object detection algorithm, and the deformation information of each detected pattern was then obtained through traditional image processing. The deformation information was input to a regression model trained with the extreme gradient boosting algorithm (XGBoost) to predict the coordinates of the camera relative to the center of the pattern. Finally, the absolute indoor coordinates of the camera were calculated by combining these relative coordinates with the known absolute coordinates of the feature ring. The experimental results show that the average positioning error within a 2 m × 2 m area is only 0.55 cm, which is one order of magnitude better than that reported in the literature. The algorithm runs in real time, with a positioning frame rate of 20 frames per second on a computer and 4 frames per second on the embedded GPU.
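The final step, combining the camera's predicted offset from a ring centre with that ring's known absolute coordinates, can be sketched as a 2-D rigid transform. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the optional `ring_heading` angle is an assumption (the abstract does not say how ring orientation is handled; `0.0` means the ring's local axes are aligned with the room axes), and "average positioning error" is assumed to be the mean Euclidean distance to ground truth. The XGBoost regressor that produces the relative offset is not reproduced here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def camera_absolute(ring_center: Point, rel: Point, ring_heading: float = 0.0) -> Point:
    """Map a camera offset expressed in a ring's local frame to room coordinates.

    ring_center  -- surveyed absolute (x, y) of the ring centre (assumed known).
    rel          -- (dx, dy) of the camera relative to the ring centre, e.g. as
                    predicted by a regression model.
    ring_heading -- hypothetical rotation (radians) of the ring's local axes
                    relative to the room axes; 0.0 assumes axis-aligned rings.
    """
    c, s = math.cos(ring_heading), math.sin(ring_heading)
    dx, dy = rel
    # Rotate the local offset into the room frame, then translate by the ring centre.
    return (ring_center[0] + c * dx - s * dy,
            ring_center[1] + s * dx + c * dy)


def mean_position_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Average Euclidean distance between predicted and ground-truth positions."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


# A ring surveyed at (100.0, 50.0) cm; camera predicted 12 cm along +x and 3 cm along -y.
print(camera_absolute((100.0, 50.0), (12.0, -3.0)))  # (112.0, 47.0)
```

Keeping the relative prediction and the absolute lookup separate means the same trained regressor can be reused for every ring on the floor; only the surveyed ring coordinates differ.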