Human pose estimation integrating self-attention mechanism and event point cloud

胡明; 郁奥博; 胡俊鹏; 蔡柏林; 宋彩温; 陈向成

doi:10.5768/JAO202647.0302007

Abstract

In complex environments with extreme lighting, high-speed motion, or limited computational resources, conventional frame cameras are prone to overexposure and motion blur, which degrade the accuracy of human pose estimation. To address this, this paper exploits the high dynamic range and high temporal resolution of event cameras and investigates a human pose estimation method based on event point clouds. Efficient event preprocessing is achieved by representing the event stream as a point cloud and sampling it with a fixed time window. On this basis, a point-cloud residual multilayer perceptron (MLP) is fused with a vector self-attention mechanism to build a multi-level feature extraction network that maps human joint coordinates from the 3D event space to the 2D image plane. Experiments on the DHP19 event dataset show that the proposed method achieves a 2D mean per joint position error (MPJPE) as low as 5.91 pixels and a 3D MPJPE of 67.48 mm, confirming its effectiveness for event-based human pose estimation.
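
The preprocessing step (event stream to point cloud, fixed time-window sampling) can be sketched as follows. This is a minimal sketch under stated assumptions: each event is taken to be an `(x, y, t, p)` tuple with timestamps in microseconds, time is normalised to [0, 1) inside each window, and every window is resampled to a fixed number of points. The helper name `events_to_point_clouds`, the window length, and the point count are hypothetical and are not the paper's settings.

```python
import random
from typing import Dict, List, Tuple

# Assumed (hypothetical) event layout: (x, y, t, p) = pixel column, pixel row,
# timestamp in microseconds, polarity (+1 or -1).
Event = Tuple[int, int, float, int]
Point = Tuple[float, float, float, float]


def events_to_point_clouds(events: List[Event], window_us: float,
                           num_points: int, seed: int = 0) -> List[List[Point]]:
    """Split an event stream into fixed-duration windows and turn each
    non-empty window into a point cloud of exactly num_points points."""
    if not events:
        return []
    rng = random.Random(seed)
    t0 = min(e[2] for e in events)
    windows: Dict[int, List[Event]] = {}
    for ev in events:
        idx = int((ev[2] - t0) // window_us)  # fixed time-window index
        windows.setdefault(idx, []).append(ev)

    clouds: List[List[Point]] = []
    for idx in sorted(windows):
        start = t0 + idx * window_us
        # Each event becomes a point; time is normalised to [0, 1) in its window.
        pts = [(float(x), float(y), (t - start) / window_us, float(p))
               for x, y, t, p in windows[idx]]
        if len(pts) >= num_points:
            pts = rng.sample(pts, num_points)  # random downsampling
        else:
            pts += [rng.choice(pts) for _ in range(num_points - len(pts))]  # pad by repetition
        clouds.append(pts)
    return clouds


if __name__ == "__main__":
    stream = [(10, 20, 0.0, 1), (11, 21, 500.0, -1), (12, 22, 1500.0, 1)]
    for cloud in events_to_point_clouds(stream, window_us=1000.0, num_points=2):
        print(cloud)
```

Keeping raw `(x, y)` pixel coordinates together with a normalised time axis lets each window be handled as an ordinary 3D point cloud; carrying polarity as a fourth channel is an assumption of this sketch.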

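The residual MLP mentioned in the abstract can be illustrated with a generic per-point residual block: a shared two-layer MLP whose output is added back to its input. This is a sketch only; the class name `ResidualMLPBlock`, the layer widths, the random initialisation, and the pure-Python arithmetic are illustrative assumptions, not the network described in the paper.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """y = W x + b, with W stored as out_dim rows of in_dim weights."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class ResidualMLPBlock:
    """Hypothetical per-point residual block: out = ReLU(x + W2 ReLU(W1 x + b1) + b2).
    The same weights are shared by every point, as in PointNet-style MLPs."""

    def __init__(self, dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(dim), 1.0 / math.sqrt(hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(hidden)] for _ in range(dim)]
        self.b2 = [0.0] * dim

    def __call__(self, points: List[Vector]) -> List[Vector]:
        out = []
        for x in points:
            h = relu(linear(x, self.w1, self.b1))  # expand to the hidden width
            y = linear(h, self.w2, self.b2)        # project back to dim
            out.append(relu([xi + yi for xi, yi in zip(x, y)]))  # skip connection
        return out


if __name__ == "__main__":
    block = ResidualMLPBlock(dim=4, hidden=8)
    print(block([[0.1, 0.2, 0.3, 1.0], [0.5, -0.4, 0.0, 0.2]]))
```

The skip connection lets stacked blocks refine point features without losing the input signal, which is the usual motivation for residual MLPs in point-cloud networks.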

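Vector self-attention differs from scalar dot-product attention in that each neighbour receives a weight vector, one weight per feature channel. The sketch below follows the subtraction-relation form popularised by Point Transformer; it is an assumption that the paper uses this exact form. For brevity it attends over all points instead of a k-nearest-neighbour set, uses a single linear layer for the positional encoding δ, and draws random weights; `vector_self_attention` is a hypothetical name.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    s = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]


def vector_self_attention(pos: List[Vector], feats: List[Vector], seed: int = 0) -> List[Vector]:
    """Vector self-attention over all points (no k-NN neighbourhood):
    y_i = sum_j softmax_j(gamma(phi(x_i) - psi(x_j) + delta_ij)) * (alpha(x_j) + delta_ij),
    with the softmax taken separately for every channel."""
    dim, pdim = len(feats[0]), len(pos[0])
    rng = random.Random(seed)
    w_phi, w_psi, w_alpha, w_g1, w_g2 = (rand_matrix(dim, dim, rng) for _ in range(5))
    w_delta = rand_matrix(dim, pdim, rng)  # encodes the relative position p_i - p_j
    q = [matvec(w_phi, x) for x in feats]
    k = [matvec(w_psi, x) for x in feats]
    v = [matvec(w_alpha, x) for x in feats]
    out = []
    for i, qi in enumerate(q):
        logits, values = [], []
        for j, kj in enumerate(k):
            delta = matvec(w_delta, [a - b for a, b in zip(pos[i], pos[j])])
            rel = [qc - kc + dc for qc, kc, dc in zip(qi, kj, delta)]
            # gamma: small nonlinear MLP mapping the relation vector to per-channel logits
            hidden = [max(0.0, h) for h in matvec(w_g1, rel)]
            logits.append(matvec(w_g2, hidden))
            values.append([vc + dc for vc, dc in zip(v[j], delta)])
        yi = []
        for c in range(dim):  # channel-wise softmax over neighbours j
            col = [row[c] for row in logits]
            m = max(col)
            exps = [math.exp(l - m) for l in col]
            z = sum(exps)
            yi.append(sum(e / z * val[c] for e, val in zip(exps, values)))
        out.append(yi)
    return out


if __name__ == "__main__":
    positions = [[float(i), 0.0, 1.0] for i in range(4)]
    features = [[0.1 * i, -0.2, 0.3, 0.05 * i] for i in range(4)]
    print(vector_self_attention(positions, features))
```

The mapping γ has to be nonlinear: if it were the identity, the φ(x_i) term would be constant across j within each channel and would cancel in the softmax, so the query point would have no influence on its own attention weights.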
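
The MPJPE figures quoted in the abstract (5.91 pixels in 2D, 67.48 mm in 3D) are averages of the per-joint Euclidean distance between predicted and ground-truth joints. A minimal sketch of the metric for a single pose follows; the function name `mpjpe` is hypothetical, and dataset-level averaging over frames as well as any treatment of unlabelled joints are left out.

```python
import math
from typing import Sequence


def mpjpe(pred: Sequence[Sequence[float]], gt: Sequence[Sequence[float]]) -> float:
    """Mean per joint position error for one pose: the average Euclidean
    distance between matching predicted and ground-truth joints.
    Units follow the inputs (pixels for 2D joints, millimetres for 3D)."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must hold the same non-zero number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


if __name__ == "__main__":
    print(mpjpe([(0, 0), (3, 4)], [(0, 0), (0, 0)]))  # 2.5 (pixels, 2D)
    print(mpjpe([(1, 2, 2)], [(0, 0, 0)]))            # 3.0 (mm, 3D)
```

The same function serves both the 2D and 3D figures because `math.dist` works for points of any matching dimension; only the unit of the inputs changes.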