Abstract
—We propose an approach to estimating the 3D pose of a hand, possibly handling an object, given a depth image. We show
1 INTRODUCTION
Accurate hand pose estimation is an important requirement for many Human Computer Interaction or Augmented Reality tasks [1], and has been steadily regaining ground as a focus of research interest in the past few years [2], [3], [4], [5], [6], [7], [8], [9], probably because of the emergence of 3D sensors. Despite 3D sensors, however, it is still a very challenging problem, because of the many degrees of freedom it involves, because images of hands exhibit self-similarity and self-occlusions, and, in the case of object manipulation, because of occlusions by the object.
A popular approach is to use a discriminative method to predict the positions of the joints [10], [11], [12], [13], [14], [15], [16], because such methods are now robust and fast. To refine the pose further, they are often used to initialize an optimization in which a 3D model of the hand is fit to the input depth data [5], [6], [17], [18], [19], [20], [21], [22]. Such an optimization remains complex and typically requires maintaining multiple hypotheses [5], [6], [23]. It also relies on a criterion to evaluate how well the 3D model fits the input data, and designing such a criterion is not a simple and straightforward task [17], [18], [21], [24], [25].
In this paper, we first show how we can get rid of the 3D model of the hand altogether and build instead upon work that learns to generate images from training data [26]. Creating an anatomically accurate 3D model of the hand is very difficult since the hand contains numerous muscles, soft tissue, etc., which influence the shape of the hand [24], [25], [27], [28]. We think that our approach could also be applied to other problems where acquiring a 3D model is very difficult.
• M. Oberweger, P. Wohlhart, and V. Lepetit are with the Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria.
E-mail: lastname@icg.tugraz.at
• V. Lepetit is also with the Laboratoire Bordelais de Recherche en Informatique, Université de Bordeaux, Bordeaux, France.
• P. Wohlhart is now with X, Alphabet Inc., Mountain View, CA.
Manuscript received June 14, 2018; revised November 28, 2018.
We then introduce a method that learns to provide updates for improving the current estimate of the pose, given the input depth image and the image generated for this pose estimate as shown in Fig. 1. By iterating this method a number of times, we can correct the mistakes of an initial estimate provided by a simple discriminative method. All the components are implemented as Deep Networks with simple architectures.
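The loop just described can be sketched in a few lines of code. Here `predict_init`, `generate_depth`, and `predict_update` stand in for the initializer, generator, and updater networks; all names are illustrative, and the bodies are toy stand-ins (not the paper's implementation) so the control flow can be run end-to-end:

```python
# Sketch of the iterative pose feedback loop; the three "networks"
# are toy stand-ins, chosen only to make the loop executable.

def predict_init(depth):
    """Discriminative initializer: a rough first pose estimate."""
    return [0.0] * len(depth)

def generate_depth(pose):
    """Generator: synthesizes a depth image for the given pose.
    Toy stand-in: the 'image' simply equals the pose vector."""
    return list(pose)

def predict_update(depth, synth):
    """Updater: predicts a pose correction from the observed and
    synthesized images. Toy stand-in: step halfway toward the target."""
    return [0.5 * (d - s) for d, s in zip(depth, synth)]

def estimate_pose(depth, iters=10):
    """Initialize the pose, then iteratively correct it."""
    pose = predict_init(depth)
    for _ in range(iters):
        synth = generate_depth(pose)
        update = predict_update(depth, synth)
        pose = [p + u for p, u in zip(pose, update)]
    return pose
```

With these stand-ins the estimate converges geometrically toward the observation; in the actual method all three components are deep networks trained on depth data.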
Not only is it interesting to see that all the components needed for hand registration, which used to require careful design, can be learned instead, but we will also show that our approach performs on par with state-of-the-art methods. It is also very efficient and runs in real time on a single GPU.
This method was originally published in [29]. Here, we also show how to generalize our feedback loop to the challenging task of jointly estimating the 3D poses of a hand and an object while the hand interacts with the object. This is inherently challenging, since the object introduces additional occlusions and enlarges the joint configuration space. In this case, we first estimate initial poses for the hand and the object separately, and then fuse these initial predictions within our feedback framework to increase the accuracy of the two poses. For this complex problem, our novel approach works on each frame independently and does not require a good initialization as current tracking-based approaches do [30], [31], while still outperforming the state of the art when using depth images only.
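The hand-object extension can be sketched under the same toy assumptions: initial poses come from two separate predictors, and a single fused updater then refines both jointly. All names and bodies below are illustrative stand-ins, not the paper's code:

```python
# Sketch of the joint hand-object refinement described above.
# Poses are plain lists; the renderer and fused updater are toy stand-ins.

def render_both(hand_pose, obj_pose):
    """Synthesize one image covering hand and object.
    Toy stand-in: concatenate the two pose vectors."""
    return hand_pose + obj_pose

def fused_update(depth, synth, split):
    """Predict corrections for both poses from the image pair.
    Toy stand-in: step halfway toward the observation."""
    delta = [0.5 * (d - s) for d, s in zip(depth, synth)]
    return delta[:split], delta[split:]

def refine_hand_object(depth, hand_init, obj_init, iters=10):
    """Start from separately predicted poses, then refine them jointly."""
    hand, obj = list(hand_init), list(obj_init)
    for _ in range(iters):
        synth = render_both(hand, obj)
        dh, do = fused_update(depth, synth, len(hand))
        hand = [h + u for h, u in zip(hand, dh)]
        obj = [o + u for o, u in zip(obj, do)]
    return hand, obj
```

The key point the sketch illustrates is that the update is computed jointly from a single synthesized image of both hand and object, so each pose estimate can benefit from evidence about the other.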
Our approach is related to generative approaches [32], in particular [33] which also features a feedback loop reminiscent of ours. However, our approach is deterministic and does not require an optimization based on distribution sampling, on which generative approaches generally rely, but which tends to be inefficient.
In the remainder of the paper, we first give a short review of related work in Section 2. We describe our approach for hands in isolation in Section 3, and introduce the extension to hands and objects in Section 4. Finally, we evaluate our method for hand pose estimation in Section 5 and for joint hand-object pose estimation in Section 6.