AprilTag: A Robust and Flexible Visual Fiducial System (paper walkthrough)

Date: 2024-03-30 13:17:26


 

1. AprilTag as an improvement over ARToolkit and ARTag

1.1 Weaknesses of ARToolkit:

   A major disadvantage of this approach is the computational cost associated with decoding tags, since each template required a separate, slow correlation operation. A second disadvantage is that it is difficult to generate templates that are approximately orthogonal to each other.

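In short: the first weakness is that decoding is computationally expensive, because every template requires its own slow correlation operation; the second is that it is hard to generate templates that are approximately orthogonal to one another.

The tag detection scheme used by ARToolkit is based on a simple binarization of the input image based on a user-specified threshold.

That is, tags are detected from a binary image obtained with a single threshold chosen by the user.

This scheme is very fast, but not robust to changes in illumination.

The approach is fast, but it breaks down when the lighting changes.

In general, ARToolkit's detections can not handle even modest occlusions of the tag's border.

In general, ARToolkit cannot cope with even a modest occlusion of the tag border.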

 

1.2 ARTag: improvements over ARToolkit:

The detection mechanism was based on the image gradient, making it robust to changes in lighting.

Tags are detected from the image gradient, which makes detection more robust to lighting changes.

While the details of the detector algorithm are not public, ARTag's detection mechanism is able to detect tags whose border is partially occluded.

The details of ARTag's detector are not public, but it can detect tags whose border is partially occluded.

ARTag also provided the first coding system based on forward error correction, which made tags easier to generate, faster to correlate, and provided greater orthogonality between tags.

ARTag also introduced the first tag coding system based on forward error correction, which makes tags easier to generate, faster to correlate, and more orthogonal to one another.

 

2. Detecting tags (Detector)

2.1 Overview:

We describe the detector whose job is to estimate the position of possible tags in an image. Loosely speaking, the detector attempts to find four-sided regions ("quads") that have a darker interior than their exterior. The tags themselves have black and white borders in order to facilitate this.

The detector looks for candidate tags in the scene, that is, four-sided regions that are dark inside and light outside. The tags carry a black-and-white border precisely to make this easier.

 

2.2 Detecting line segments

Our approach begins by detecting lines in the image. Our approach, similar in basic approach to the ARTag detector, computes the gradient direction and magnitude at every pixel and agglomeratively clusters the pixels into components with similar gradient directions and magnitudes.

Roughly: as in the ARTag detector, the gradient direction and magnitude are computed at every pixel of the image, and pixels with similar gradient direction and magnitude are clustered into components.
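The per-pixel gradient step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the central-difference operator, the list-of-lists image format and the function name `gradient_magnitude_direction` are all assumptions made for the example.

```python
import math

def gradient_magnitude_direction(img):
    """Central-difference gradient of a 2D list of grey levels.

    Returns (magnitude, direction) images; border pixels are left at 0.
    The difference operator is an assumption, the text does not name one.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)  # radians, pointing from dark to light
    return mag, ang

# Vertical step edge: dark (0) on the left, light (100) on the right.
image = [[0, 0, 100, 100] for _ in range(4)]
mag, ang = gradient_magnitude_direction(image)
```

On this step edge the interior pixels next to the edge get a non-zero magnitude and a direction of 0 rad, i.e. pointing towards the lighter side.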

 

2.3 Early processing steps

First: The tag detection algorithm begins by computing the gradient at every pixel, computing their magnitudes (this yields a gradient-magnitude image).

Second: gradient direction (this yields the gradient direction at every pixel).

Third: similar gradient directions and magnitude are clustered into components (pixels with similar gradient direction and magnitude are grouped into one component).

Clustering algorithm:

The clustering algorithm is similar to the graph-based method of Felzenszwalb: a graph is created in which each node represents a pixel.

The clustering is similar to Felzenszwalb's graph-based method: each node of the graph represents one pixel.

 

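How the algorithm works:

Edges are added between adjacent pixels with an edge weight equal to the pixels' difference in gradient direction. These edges are then sorted and processed in terms of increasing edge weight: for each edge, we test whether the connected components that the pixels belong to should be joined together.

An edge is added between each pair of adjacent pixels, weighted by the difference in their gradient directions. The edges are then sorted and processed in order of increasing weight: for each edge, we test whether the connected components containing its two pixels should be merged.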
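The sort-then-merge structure described above can be sketched with a union-find over pixels. Note the simplification: the quoted text does not give the actual merge test, so this example uses a fixed threshold `max_diff` (a hypothetical parameter) in its place. The function names are likewise made up for illustration.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def cluster_by_direction(ang, max_diff):
    """Label pixels by component; merges across edges lighter than max_diff."""
    h, w = len(ang), len(ang[0])
    parent = list(range(h * w))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # One edge per pair of 4-adjacent pixels, weighted by direction difference.
    edges = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                edges.append((angle_diff(ang[y][x], ang[y][x + 1]), y * w + x, y * w + x + 1))
            if y + 1 < h:
                edges.append((angle_diff(ang[y][x], ang[y + 1][x]), y * w + x, (y + 1) * w + x))
    edges.sort()  # process in order of increasing weight
    for weight, a, b in edges:
        if weight > max_diff:  # simplified stand-in for the paper's merge test
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return [[find(y * w + x) for x in range(w)] for y in range(h)]

ang = [[0.0, 0.0, 1.5, 1.5],
       [0.0, 0.05, 1.5, 1.55]]
labels = cluster_by_direction(ang, max_diff=0.2)
```

Here the left two columns end up in one component and the right two columns in another, because the direction jump between them exceeds the threshold.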

 

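Problem with the algorithm:

This gradient-based clustering method is sensitive to noise in the image: even modest amounts of noise will cause local gradient directions to vary, inhibiting the growth of the components. The solution to this problem is to low-pass filter the image.

The gradient-based clustering is sensitive to image noise: even a modest amount of noise makes local gradient directions vary, which stops components from growing. The remedy is to low-pass filter the image.

Unlike other problem domains where this filtering can blur useful information in the image, the edges of a tag are intrinsically large-scale features (particularly in comparison to the data field), and so this filtering does not cause information loss. We recommend a value of σ = 0.8.

In other problem domains this kind of filtering can blur useful information, but the edges of a tag are intrinsically large-scale features (especially compared with the data field), so filtering loses no information. The recommended value is σ = 0.8.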
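A low-pass filter with σ = 0.8 could look like the separable Gaussian blur below. Only σ comes from the text; the kernel radius of 2, the clamped borders and the function names are assumptions for this sketch.

```python
import math

def gaussian_kernel(sigma=0.8, radius=2):
    """Normalised 1D Gaussian kernel; radius=2 is an assumed truncation."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([sum(kernel[k + r] * row[min(max(x + k, 0), n - 1)]
                        for k in range(-r, r + 1))
                    for x in range(n)])
    return out

def gaussian_blur(img, sigma=0.8):
    """Separable blur: filter the rows, then the columns via a transpose."""
    k = gaussian_kernel(sigma)
    rows = blur_rows(img, k)
    cols = blur_rows([list(c) for c in zip(*rows)], k)
    return [list(r) for r in zip(*cols)]

kernel = gaussian_kernel()
smooth = gaussian_blur([[10.0] * 5 for _ in range(5)])
```

Because the kernel is normalised, a constant image stays constant after filtering, which is a quick sanity check on any implementation of this step.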

 

 

 

 

 

Fourth: Using weighted least squares, a line segment is then fit to the pixels in each component. (A line segment is fitted to the pixels of each component by weighted least squares.)

The direction of the line segment is determined by the gradient direction, so that segments are dark on the left, light on the right. The direction of the lines are visualized by short perpendicular "notches" at their midpoint; note that these "notches" always point towards the lighter region.

The direction of a line segment is determined by the gradient direction, so that the dark side is on its left and the light side on its right. In the paper's figure the direction is visualized by a short perpendicular notch at the midpoint of each segment; the notches always point towards the lighter region.
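One way to fit a line to a weighted set of pixels is a weighted total-least-squares fit, which takes the principal axis of the weighted scatter. The sketch below uses that technique; the text only says "weighted least squares", so the exact formulation, the choice of weights and the name `fit_line_weighted` are assumptions.

```python
import math

def fit_line_weighted(points, weights):
    """Weighted orthogonal line fit.

    Returns (cx, cy, theta): the weighted centroid, which lies on the line,
    and the line's angle in radians (principal axis of the weighted scatter).
    """
    wsum = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / wsum
    cy = sum(w * y for (_, y), w in zip(points, weights)) / wsum
    sxx = sum(w * (x - cx) ** 2 for (x, _), w in zip(points, weights))
    syy = sum(w * (y - cy) ** 2 for (_, y), w in zip(points, weights))
    sxy = sum(w * (x - cx) * (y - cy) for (x, y), w in zip(points, weights))
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return cx, cy, theta

# Pixels on the diagonal y = x; the weights stand in for per-pixel weights.
pts = [(0, 0), (1, 1), (2, 2), (3, 3)]
cx, cy, theta = fit_line_weighted(pts, [1.0, 2.0, 2.0, 1.0])
```

The fit returns an undirected line; orienting it so that the dark side is on the left would be a separate step using the component's gradient direction.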

 

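2.4 Summary of line-segment detection

The segmentation algorithm is the slowest phase in our detection scheme. As an option, this segmentation can be performed at half the image resolution with a 4x improvement in speed. The sub-sampling operation can be efficiently combined with the recommended low-pass filter. The consequence of this optimization is a modestly reduced detection range, since very small quads may no longer be detected.

Segmentation is the slowest phase of the detection scheme. As an option, it can be run at half the image resolution, which gives a 4x speed-up. The sub-sampling can be combined efficiently with the recommended low-pass filter. The cost of this optimization is a modestly reduced detection range, because very small quads may no longer be detected.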
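The 4x figure follows from the pixel count: halving both dimensions leaves a quarter of the pixels. A minimal sketch of half-resolution sub-sampling is shown below; the 2x2 box average is an assumed stand-in for the filter-plus-decimate combination mentioned in the text, and `downsample_half` is a hypothetical name.

```python
def downsample_half(img):
    """Average each 2x2 block; an odd trailing row or column is dropped."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)]

image = [[0, 4, 8, 8],
         [0, 4, 8, 8],
         [2, 2, 6, 6],
         [2, 2, 6, 6]]
small = downsample_half(image)  # 16 pixels in, 4 pixels out
```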

 

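2.5 Quad detection

Our approach is based on a recursive depth-first search with a depth of four: each level of the search tree adds an edge to the quad. At depth one, we consider all line segments. At depths two through four, we consider all of the line segments that begin "close enough" to where the previous line segment ended and which obey a counter-clockwise winding order.

The approach is a recursive depth-first search of depth four: each level of the search tree adds one edge to the quad. At depth one, all line segments are considered. At depths two through four, only segments that begin "close enough" to where the previous segment ended, and that obey a counter-clockwise winding order, are considered.

Robustness to occlusions and segmentation errors is handled by adjusting the "close enough" threshold: by making the threshold large, significant gaps around the edges can be handled. Our threshold for "close enough" is twice the length of the line plus five additional pixels. This is a large threshold which leads to a low false negative rate, but also results in a high false positive rate.

Robustness to occlusions and segmentation errors comes from tuning the "close enough" threshold: with a large threshold, significant gaps around the edges can be tolerated. The threshold is twice the length of the line plus five additional pixels. Such a large threshold gives a low false negative rate, but also a high false positive rate.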
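The depth-four search can be sketched as below. Several details are assumptions made for illustration: "the line" in the threshold is taken to be the previous segment, the winding test is the sign of a 2D cross product (which sign means counter-clockwise depends on whether the y axis points up or down), the closing check between the fourth and first segment is added here, and all function names are hypothetical.

```python
import math

def length(seg):
    (x0, y0), (x1, y1) = seg
    return math.hypot(x1 - x0, y1 - y0)

def close_enough(prev, nxt):
    """Does nxt begin within 2 * len(prev) + 5 pixels of where prev ended?"""
    (ex, ey), (sx, sy) = prev[1], nxt[0]
    return math.hypot(sx - ex, sy - ey) <= 2.0 * length(prev) + 5.0

def turns_consistently(prev, nxt):
    """Winding test: positive z-component of the cross product (assumed sign)."""
    ax, ay = prev[1][0] - prev[0][0], prev[1][1] - prev[0][1]
    bx, by = nxt[1][0] - nxt[0][0], nxt[1][1] - nxt[0][1]
    return ax * by - ay * bx > 0

def find_quads(segments):
    quads = []

    def search(path):
        if len(path) == 4:
            first, last = segments[path[0]], segments[path[3]]
            if close_enough(last, first) and turns_consistently(last, first):
                quads.append(tuple(path))
            return
        prev = segments[path[-1]]
        for i, seg in enumerate(segments):
            # i > path[0]: report each cycle once, from its lowest-index segment
            if (i > path[0] and i not in path
                    and close_enough(prev, seg) and turns_consistently(prev, seg)):
                search(path + [i])

    for i in range(len(segments)):  # depth one: every segment is a candidate
        search([i])
    return quads

segments = [((0, 0), (10, 0)), ((10, 0), (10, 10)),
            ((10, 10), (0, 10)), ((0, 10), (0, 0)),
            ((50, 50), (52, 50))]  # the last one is a stray segment
quads = find_quads(segments)
```

The four sides of the square form one quad and the stray segment is ignored. With real, noisy segments the generous threshold admits many more candidates, which is the high false positive rate the text mentions.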

 

We populate a two-dimensional lookup table to accelerate queries for line segments that begin near a point in space.

A two-dimensional lookup table is populated to speed up queries for line segments that begin near a given point.
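A two-dimensional lookup table of this kind can be sketched as a uniform grid keyed by the cell that contains each segment's start point. The cell size of 10 pixels and the function names are assumptions; the text does not describe the table's layout.

```python
from collections import defaultdict

def build_lookup(segments, cell=10.0):
    """Bucket segment indices by the grid cell containing their start point."""
    table = defaultdict(list)
    for i, ((sx, sy), _) in enumerate(segments):
        table[(int(sx // cell), int(sy // cell))].append(i)
    return table

def segments_starting_near(table, point, radius, cell=10.0):
    """Candidate segments whose start cell overlaps the square around point.

    This returns a superset; an exact distance check is still needed afterwards.
    """
    px, py = point
    x0, x1 = int((px - radius) // cell), int((px + radius) // cell)
    y0, y1 = int((py - radius) // cell), int((py + radius) // cell)
    found = []
    for cx in range(x0, x1 + 1):
        for cy in range(y0, y1 + 1):
            found.extend(table.get((cx, cy), []))
    return sorted(found)

segments = [((0, 0), (10, 0)), ((10, 0), (10, 10)), ((95, 95), (99, 95))]
table = build_lookup(segments)
near = segments_starting_near(table, (10, 0), radius=5)
```

A query then touches only a handful of cells instead of scanning every segment, which matters because the depth-first search issues one query per partial quad.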

 

3. Computing the tag's distance and orientation relative to the camera

3.1 Homography and extrinsics estimation

3.1.1 Computing the homography with the DLT

We compute the 3×3 homography matrix that projects 2D points in homogeneous coordinates from the tag's coordinate system (in which [0 0 1]^T is at the center of the tag and the tag extends one unit in the x̂ and ŷ directions) to the 2D image coordinate system. The homography is computed using the Direct Linear Transform (DLT) algorithm. Note that since the homography projects points in homogeneous coordinates, it is defined only up to scale.

In other words: a 3×3 homography matrix maps 2D points, written in homogeneous coordinates, from the tag's coordinate system (where [0 0 1]^T is the tag center and the tag extends one unit along x̂ and ŷ) to the 2D image coordinate system. The homography is computed with the Direct Linear Transform (DLT) algorithm. Because it acts on homogeneous coordinates, it is defined only up to scale.
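What "defined only up to scale" means in practice can be seen by applying a homography and a scaled copy of it to the same tag point. The matrix values below are made up for illustration, and `apply_homography` is a hypothetical helper; the DLT estimation itself is not shown.

```python
def apply_homography(H, x, y):
    """Map the tag-frame point (x, y) to image coordinates via the 3x3 matrix H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # divide out the homogeneous coordinate

# Made-up homography: the tag center lands on pixel (320, 240).
H = [[50.0, 0.0, 320.0],
     [0.0, 50.0, 240.0],
     [0.0, 0.0, 1.0]]
H_scaled = [[3.0 * h for h in row] for row in H]

center = apply_homography(H, 0.0, 0.0)
corner = apply_homography(H, 1.0, 1.0)
corner_scaled = apply_homography(H_scaled, 1.0, 1.0)
```

Both matrices send the corner to the same pixel, because the common factor cancels in the division by w. This is why an extra constraint is needed later to pin down the scale factor s.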

 

3.1.2 Computing the pose

Computation of the tag's position and orientation requires additional information: the camera's focal length and the physical size of the tag.

Computing the tag's position and orientation requires extra information: the camera's focal length and the physical size of the tag.

The 3 × 3 homography matrix (computed by the DLT) can be written as the product of the 3 × 4 camera projection matrix P (which we assume is known) and the 4 × 3 truncated extrinsics matrix E.

The 3×3 homography matrix (computed by the DLT) can be written as the product of the 3×4 camera projection matrix P (assumed known) and the 4×3 truncated extrinsics matrix E.

 

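The truncated extrinsics matrix E

Extrinsics matrices are typically 4 × 4, but every position on the tag is at z = 0 in the tag's coordinate system. Thus, we can rewrite every tag coordinate as a 2D homogeneous point with z implicitly zero, and remove the third column of the extrinsics matrix, forming the truncated extrinsics matrix.

An extrinsics matrix is normally 4×4, but every point on the tag has z = 0 in the tag's coordinate system. Each tag coordinate can therefore be written as a 2D homogeneous point with z implicitly zero, and the third column of the extrinsics matrix can be dropped, which gives the truncated extrinsics matrix.

We represent the rotation components of P as Rij and the translation components as Tk. We also represent the unknown scale factor as s.

The rotation components are written Rij and the translation components Tk (by the decomposition above, these are the entries of the truncated extrinsics matrix E); the unknown scale factor is written s.

Note that we cannot directly solve for E because P is rank deficient. We can expand the right hand side of Eqn. 2, and write the expression for each hij as a set of simultaneous equations.

Note that E cannot be solved for directly, because P is rank deficient. Instead, the right-hand side of Eqn. 2 is expanded and the expression for each hij is written out as a set of simultaneous equations.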
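The equation images did not survive in this copy. A reconstruction consistent with the text (a 3×4 matrix P, a 4×3 matrix E with its third column removed, and a scale factor s) is given below. It assumes a pinhole projection matrix with focal lengths f_x, f_y and the principal point at the origin, which the text above does not state, so treat it as a sketch rather than the paper's exact formula.

```latex
% Assumed form of the relation the text calls Eqn. 2: H = s P E
\begin{bmatrix}
h_{00} & h_{01} & h_{02} \\
h_{10} & h_{11} & h_{12} \\
h_{20} & h_{21} & h_{22}
\end{bmatrix}
= s
\begin{bmatrix}
f_x & 0 & 0 & 0 \\
0 & f_y & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
R_{00} & R_{01} & T_x \\
R_{10} & R_{11} & T_y \\
R_{20} & R_{21} & T_z \\
0 & 0 & 1
\end{bmatrix}

% Expanding the product entry by entry:
\begin{aligned}
h_{00} &= s R_{00} f_x, & h_{01} &= s R_{01} f_x, & h_{02} &= s T_x f_x, \\
h_{10} &= s R_{10} f_y, & h_{11} &= s R_{11} f_y, & h_{12} &= s T_y f_y, \\
h_{20} &= s R_{20},     & h_{21} &= s R_{21},     & h_{22} &= s T_z.
\end{aligned}
```

The fourth column of P is zero, so the last row of E never contributes; that is the rank deficiency that prevents solving for E by simply inverting P.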


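These are all easily solved for the elements of Rij and Tk except for the unknown scale factor s. However, since the columns of a rotation matrix must all be of unit magnitude, we can constrain the magnitude of s. We have two columns of the rotation matrix, so we compute s as the geometric average of their magnitudes. The sign of s can be recovered by requiring that the tag appear in front of the camera, i.e., that Tz < 0. The third column of the rotation matrix can be recovered by computing the cross product of the two known columns, since the columns of a rotation matrix must be orthonormal.

All of these are easily solved for the elements Rij and Tk, except for the unknown scale factor s. Since the columns of a rotation matrix must have unit magnitude, the magnitude of s can be constrained: two columns of the rotation matrix are available, so s is computed as the geometric average of their magnitudes. The sign of s is recovered by requiring that the tag appear in front of the camera, i.e., Tz < 0. The third column of the rotation matrix is recovered as the cross product of the two known columns, because the columns of a rotation matrix must be orthonormal.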
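The recovery steps in this paragraph can be sketched as follows. The sketch assumes a pinhole projection with focal lengths `fx`, `fy` and the principal point at the origin, so that dividing the first two rows of H by the focal lengths leaves s times the columns [r0 r1 T]. That camera model and the name `pose_from_homography` are assumptions, not taken from the text.

```python
import math

def pose_from_homography(H, fx, fy):
    """Return (R_cols, T), with R_cols = [r0, r1, r2] the rotation columns."""
    # Undo the focal lengths: M = s * [r0 r1 T]
    M = [[H[0][j] / fx for j in range(3)],
         [H[1][j] / fy for j in range(3)],
         [H[2][j] for j in range(3)]]

    def col(j):
        return [M[0][j], M[1][j], M[2][j]]

    def norm(v):
        return math.sqrt(sum(c * c for c in v))

    # |s| is the geometric average of the two rotation-column magnitudes.
    s = math.sqrt(norm(col(0)) * norm(col(1)))
    if M[2][2] / s > 0:  # choose the sign that puts the tag at Tz < 0
        s = -s
    r0 = [c / s for c in col(0)]
    r1 = [c / s for c in col(1)]
    T = [c / s for c in col(2)]
    # Third rotation column: cross product of the two known columns.
    r2 = [r0[1] * r1[2] - r0[2] * r1[1],
          r0[2] * r1[0] - r0[0] * r1[2],
          r0[0] * r1[1] - r0[1] * r1[0]]
    return [r0, r1, r2], T

# Synthetic check: 30 degree rotation about z, tag at Tz = -2, hidden scale -3.
c, si = math.cos(math.radians(30)), math.sin(math.radians(30))
E = [[c, -si, 0.1], [si, c, 0.2], [0.0, 0.0, -2.0]]
s_true, fx, fy = -3.0, 500.0, 500.0
H = [[s_true * fx * e for e in E[0]],
     [s_true * fy * e for e in E[1]],
     [s_true * e for e in E[2]]]
R_cols, T = pose_from_homography(H, fx, fy)
```

On this noise-free input the known translation and rotation columns are recovered exactly (up to rounding), including the sign flip forced by the Tz < 0 requirement.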

The DLT procedure and the normalization procedure above do not guarantee that the rotation matrix is strictly orthonormal. To correct this, we compute the polar decomposition of R, which yields a proper rotation matrix while minimizing the Frobenius matrix norm of the error.

Neither the DLT nor the normalization above guarantees that the rotation matrix is strictly orthonormal. To correct this, the polar decomposition of R is computed, which yields a proper rotation matrix while minimizing the Frobenius norm of the error.
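The orthogonal factor of the polar decomposition can be computed in several ways. The sketch below uses Newton's iteration X ← (X + X⁻ᵀ)/2, chosen here only because it needs no linear-algebra library. The text does not say how the decomposition is computed, so this is a swapped-in technique and the function names are hypothetical.

```python
def inverse3(A):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def nearest_rotation(R, iterations=20):
    """Orthogonal polar factor of a nonsingular R by Newton's iteration."""
    X = [row[:] for row in R]
    for _ in range(iterations):
        inv = inverse3(X)
        # inv[c][r] is entry (r, c) of the inverse transpose
        X = [[0.5 * (X[r][c] + inv[c][r]) for c in range(3)] for r in range(3)]
    return X

# A slightly non-orthonormal "rotation", as produced by noisy estimation.
R_noisy = [[1.02, 0.01, 0.00],
           [-0.02, 0.98, 0.03],
           [0.01, -0.01, 1.01]]
Q = nearest_rotation(R_noisy)
```

For an input with positive determinant the result is a proper rotation, and it is the orthogonal matrix closest to the input in the Frobenius norm.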

 

3.2 Payload decoding

3.2.1 Overview

The final task is to read the bits from the payload field. We do this by computing the tag-relative coordinates of each bit field, transforming them into image coordinates using the homography, and then thresholding the resulting pixels. In order to be robust to lighting (which can vary not only from tag to tag, but also within a tag), we use a spatially-varying threshold.

The final task is to read the bits of the payload field. The tag-relative coordinates of each bit field are computed, transformed into image coordinates with the homography, and the resulting pixels are thresholded. To be robust to lighting (which can vary not only from tag to tag but also within a single tag), a spatially-varying threshold is used.

We build a spatially-varying model of the intensity of "black" pixels, and a second model for the intensity of "white" models. We use the border of the tag, which contains known examples of both white and black pixels.

A spatially-varying model is built for the intensity of "black" pixels and a second one for "white" pixels, using the tag border, which contains known examples of both.

 

A fourth quad is detected around one of the payload bits of the larger tag. These two extraneous detections are eventually discarded because their payload is invalid. The white dots correspond to samples around the tag's border which are used to fit a linear model of intensity of "white" pixels; a model is similarly fit for the black pixels. These two models are used to threshold the data payload bits, shown as yellow dots.

In the paper's figure, a fourth quad is detected around one of the payload bits of the larger tag. The two extraneous detections are eventually discarded because their payload is invalid. The white dots are samples around the tag's border that are used to fit a linear intensity model for "white" pixels; a model is fitted for the black pixels in the same way. The two models are then used to threshold the payload bits, shown as yellow dots.

 

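This model has four parameters which are easily computed using least squares regression. We build two such models, one for black, the other for white. The threshold used when decoding data bits is then just the average of the predicted intensity values of the black and white models.

The model has four parameters, which are easily computed by least-squares regression. Two such models are built, one for black and one for white. The threshold used when decoding a data bit is simply the average of the intensities predicted by the black and white models.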
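The model equation itself is missing from this copy. The sketch below assumes the four-parameter form I(x, y) = A·x + B·x·y + C·y + D, fits it by least squares through the normal equations, and averages the two predictions to get the threshold. The model form, the synthetic border samples and all function names are assumptions for illustration.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def fit_intensity_model(samples):
    """Least-squares fit of I = A*x + B*x*y + C*y + D to (x, y, intensity)."""
    rows = [[x, x * y, y, 1.0] for x, y, _ in samples]
    vals = [v for _, _, v in samples]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Atb = [sum(r[i] * v for r, v in zip(rows, vals)) for i in range(4)]
    return solve(AtA, Atb)

def predict(model, x, y):
    A, B, C, D = model
    return A * x + B * x * y + C * y + D

def threshold_at(black, white, x, y):
    """Decoding threshold: mean of the two predicted intensities."""
    return 0.5 * (predict(black, x, y) + predict(white, x, y))

# Synthetic border samples in tag coordinates, generated from known models.
steps = (-1.0, -0.5, 0.0, 0.5, 1.0)
border = [(x, y) for x in steps for y in steps if abs(x) == 1.0 or abs(y) == 1.0]
true_white, true_black = (10.0, 2.0, -5.0, 200.0), (4.0, 1.0, -2.0, 40.0)
white = fit_intensity_model([(x, y, predict(true_white, x, y)) for x, y in border])
black = fit_intensity_model([(x, y, predict(true_black, x, y)) for x, y in border])
t = threshold_at(black, white, 0.0, 0.0)
```

A payload bit sampled at (x, y) would then be read as 1 if its pixel intensity exceeds `threshold_at(black, white, x, y)` and 0 otherwise, so the threshold follows any lighting gradient across the tag.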

 

3.2.2 Coding system (decides whether a detected quad is valid)

The goals of a coding system are to:
• Maximize the number of distinguishable codes
• Maximize the number of bit errors that can be detected or corrected
• Minimize the false positive/inter-tag confusion rate
• Minimize the total number of bits per tag (and thus the size of the tag)
These goals are often in conflict, and so a given code represents a trade-off.

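In other words, a coding system wants as many distinguishable codes as possible, as many detectable or correctable bit errors as possible, a low false positive and inter-tag confusion rate, and as few bits per tag (and hence as small a tag) as possible. These goals pull against each other, so any particular code is a trade-off.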

 

We describe a new coding system based on lexicodes that provides significant advantages over previous methods. Our procedure can generate lexicodes with a variety of properties, allowing the user to use a code that best fits their needs.

The paper describes a new coding system based on lexicodes that offers significant advantages over previous methods. The procedure can generate lexicodes with a variety of properties, so users can pick the code that best fits their needs.

We use a lexicode system that can generate codes for any arbitrary tag size (e.g., 3x3, 4x4, 5x5, 6x6) and minimum Hamming distance. Our approach explicitly guarantees the minimum Hamming distance for all four rotations of each tag and eliminates tags which are of low geometric complexity. Computing the tags can be an expensive operation, but is done offline. Small tags (5x5) can be easily computed in seconds or minutes, but larger tags (6x6) can take several days of CPU time.

A lexicode system is used that can generate codes for any tag size (e.g. 3x3, 4x4, 5x5, 6x6) and any minimum Hamming distance. The approach explicitly guarantees the minimum Hamming distance for all four rotations of each tag and eliminates tags of low geometric complexity. Computing the tags can be expensive, but it is done offline: small tags (5x5) can be computed in seconds or minutes, while larger tags (6x6) can take several days of CPU time.
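A rotation-aware lexicode can be sketched as a greedy scan. Candidates are visited in increasing order, and one is accepted only if it is at least `min_dist` bits away from its own rotations and from every rotation of every code accepted so far. This is a brute-force illustration with hypothetical function names. The geometric-complexity filter mentioned in the text is not implemented, and the bit layout (row-major, bit r·n+c) is an assumption.

```python
def rotate90(code, n):
    """Rotate an n x n bit grid (row-major, bit r*n+c) by 90 degrees."""
    out = 0
    for r in range(n):
        for c in range(n):
            if (code >> ((n - 1 - c) * n + r)) & 1:
                out |= 1 << (r * n + c)
    return out

def rotations(code, n):
    """The code followed by its three successive 90-degree rotations."""
    result = [code]
    for _ in range(3):
        result.append(rotate90(result[-1], n))
    return result

def hamming(a, b):
    return bin(a ^ b).count("1")

def generate_lexicode(n, min_dist):
    accepted = []
    for cand in range(1 << (n * n)):  # lexicographic order of candidates
        rots = rotations(cand, n)
        # a tag must not be confusable with a rotated copy of itself...
        if any(hamming(cand, r) < min_dist for r in rots[1:]):
            continue
        # ...nor with any rotation of a code that was already accepted
        if all(hamming(prev, r) >= min_dist for prev in accepted for r in rots):
            accepted.append(cand)
    return accepted

codes = generate_lexicode(3, 3)  # 3x3 payload, minimum Hamming distance 3
```

The scan is exponential in the number of payload bits, which fits the remark that larger tags take far longer to generate and that the work is done offline.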


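Note: the above is my personal reading of the paper. Please point out anything that is wrong; I am very happy to discuss and study this algorithm together.

Document download: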

http://download.csdn.net/download/technology_h/10182766