## Monday, October 18, 2010

### Stereo Vision ---- A gentle introduction

Liming Hu

Abstract-This paper is a brief introduction to stereo vision from the perspective of geometry. I found that the papers and books I have been reading about stereo vision are not easy to understand for someone with only basic image processing knowledge, so I wrote this introduction.

As shown in Fig. 1, disparity and depth (also called range in some textbooks) are related as follows: r is the depth from the object P to the stereo camera head (the camera pair C1 and C2), and b, called the baseline, is the horizontal distance between the two camera heads. The plane formed by the points P, C1, and C2 is called the epipolar plane, and the line M1M2, formed by the intersection of the epipolar plane and the image plane, is called the epipolar line; it is horizontal in Fig. 1. The matching pixel M2 in the right image must lie on the epipolar line corresponding to pixel M1 in the left image. This is called the epipolar constraint, and it shrinks the search space of the correspondence problem in stereo analysis from the whole image to a single line.

Suppose the two cameras are exactly the same: they have the same focal length f (the distance from the center of the lens to the image plane) and the same shutter speed, they take their pictures at the same time, and they are mounted in parallel in the same plane, separated only by a horizontal distance.

As shown in Fig. 1, dl is the horizontal distance from the image center O1 to the object image M1 in the left image, and dr is the horizontal distance from the image center O2 to the object image M2 in the right image. The difference between dl and dr is called the disparity. In Fig. 1, if the object falls between the two camera heads, d = |dl + dr|; if the object P is above the stereo camera head, or to its left or right, then d = |dl - dr|. In practice, because the distance between the two camera heads is so small, the chance that an object falls between them is low. In Fig. 1 it is straightforward to see that the similar-triangle relationships involving ΔP'PC1 and ΔP'PC2 give b/r = d/f, which means:

r=bf/d (1)

Eq. (1) means that depth is inversely proportional to disparity. Suppose the distance between the two cameras is 12.57 cm, the focal length is 2.454 mm, and the horizontal distance between two adjacent pixels in the image is 7 µm (the reciprocal of this distance is called the scale in the horizontal direction). If we measure a disparity of 24 pixels (with each pixel interpolated into 16 subpixels), then the depth is about 1.83 m. As the depth increases the disparity shrinks quickly, and eventually the disparity becomes insensitive to the depth; in other words, the disparity changes appreciably only within a certain range of depths.

Figure 1. Relationship between disparity and depth
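As a quick check of Eq. (1), the worked example above can be reproduced in a few lines (a sketch only; the numbers are the ones given in the text):

```python
# Depth from disparity via Eq. (1), r = b*f/d, with the text's example
# parameters: b = 12.57 cm, f = 2.454 mm, pixel pitch = 7 um.
b = 12.57e-2         # baseline (m)
f = 2.454e-3         # focal length (m)
pixel_pitch = 7e-6   # horizontal distance between adjacent pixels (m)

disparity_px = 24
d = disparity_px * pixel_pitch   # disparity expressed in metres on the sensor
r = b * f / d                    # depth (m)
print(f"depth = {r:.3f} m")      # → depth = 1.836 m (the ~1.83 m from the text)
```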

From Eq. (1), we can get:

∂r/∂d = -bf/d² (2)

From Eq. (1), d = bf/r; substituting into Eq. (2) gives:

∂r/∂d = -r²/(bf) (3)

Suppose the origin of the coordinate system is at C1. By the similarity of ΔPO1'C1 and ΔM1O1C1, the coordinates of P are:

x = -xl*b/d, y = -yl*b/d, z = f*b/d (4)
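Eq. (4) can be wrapped in a small back-projection helper. The pixel offsets below are illustrative assumptions (not from the text), while b, f, and the disparity reuse the earlier example values:

```python
# Back-projecting the left-image point M1 = (xl, yl) to the 3D point P
# via Eq. (4), with the origin at camera centre C1. Offsets and
# disparity are in metres on the sensor.
b = 12.57e-2   # baseline (m)
f = 2.454e-3   # focal length (m)

def back_project(xl, yl, d):
    """Eq. (4): x = -xl*b/d, y = -yl*b/d, z = f*b/d."""
    return (-xl * b / d, -yl * b / d, f * b / d)

# Assumed example values: xl = -0.1 mm, yl = 0.05 mm, d = 24 px * 7 um.
x, y, z = back_project(xl=-1.0e-4, yl=5.0e-5, d=1.68e-4)
# z = f*b/d reproduces the depth given by Eq. (1)
```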

A stereo algorithm therefore searches only a certain range of disparities (the disparity window), which restricts the depths that can be determined to a certain interval, called the horopter. The horopter is very important in applications.

The horopter can be enlarged by decreasing the baseline, decreasing the focal length (by using wider-angle lenses, possibly at the cost of severe distortion), increasing the pixel width by downsampling the image, and increasing the disparity window. Note that decreasing the baseline, decreasing the focal length, or increasing the pixel width to enlarge the horopter has the side effect of worsening the depth resolution, as shown in Eq. (2).
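Using Eq. (1), the horopter for a given disparity window is easy to compute. The sketch below uses the text's camera parameters; the window of 1 to 24 pixels is an assumption for illustration:

```python
# Horopter from Eq. (1): a disparity window of d_min..d_max pixels
# bounds the depths the stereo algorithm can measure.
b, f, pitch = 12.57e-2, 2.454e-3, 7e-6  # baseline, focal length, pixel pitch (m)

def horopter(d_min_px, d_max_px):
    near = b * f / (d_max_px * pitch)   # largest disparity -> nearest depth
    far = b * f / (d_min_px * pitch)    # smallest disparity -> farthest depth
    return near, far

near, far = horopter(1, 24)   # roughly 1.84 m .. 44 m for these parameters
```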

Depth resolution is defined as the minimal change in depth that the stereo camera head can differentiate. From Eq. (3), for a given disparity resolution, the depth resolution grows (i.e. gets worse) as the baseline and focal length decrease and as the square of the depth increases; in other words, the farther the object, the less sensitive the stereo camera is to changes in its depth. Depth resolution is also proportional to disparity resolution, so if the vision algorithm can interpolate the disparity images, we get better depth resolution: with disparities interpolated to 1/16 pixel, a search range of 24 (0~23) yields 24*16 = 384 disparity values (0~383). For the parameters given above, taking ∂d equal to one pixel (7 µm), at 1 m depth ∂r = -1² * 7×10⁻⁶ / (12.57×10⁻² * 2.454×10⁻³) = -2.27 cm; at 3 m depth ∂r = -20.4 cm; and at 4 m depth ∂r = -36.3 cm.
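The resolution figures above follow directly from Eq. (3); a minimal sketch, taking the disparity step ∂d as one whole pixel (7 µm), as the text's numbers do:

```python
# Depth resolution via Eq. (3): dr = -r^2 * dd / (b*f).
b, f, dd = 12.57e-2, 2.454e-3, 7e-6  # baseline (m), focal length (m), one pixel (m)

def depth_resolution(r):
    """Change in depth (m) per one-pixel change in disparity, at depth r."""
    return -r**2 * dd / (b * f)

for r in (1.0, 3.0, 4.0):
    print(f"at {r} m: {100 * depth_resolution(r):.1f} cm")
# → at 1.0 m: -2.3 cm; at 3.0 m: -20.4 cm; at 4.0 m: -36.3 cm
```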

Depth resolution is different from range (depth) accuracy: range accuracy measures how well the range computed by the algorithm matches the actual range (depth).

Some parameters that may influence stereo analysis result:

1) Confidence threshold: the confidence threshold eliminates stereo matches that have a low probability of success because of a lack of image texture; weak textures yield a confidence measure below the threshold and are discarded by the algorithm. A good threshold value can be found by pointing the stereo cameras at a textureless surface, e.g. a blank floor, running the stereo algorithm, and raising the threshold from 0 until the noise just disappears from the disparity image (the noise is replaced by a uniform black area).

2) Correlation size: the size of the correlation window used for correspondence matching affects the disparity image; a larger window produces smoother disparity images but misses smaller objects, while a smaller window gives more spatial detail but tends to produce a noisier disparity image.
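Both parameters can be seen at work in a toy block matcher. The sketch below is an assumption-laden illustration (not any particular library's algorithm): it slides a sum-of-absolute-differences (SAD) correlation window along the same image row, which is the epipolar line for parallel cameras, and uses the window's standard deviation as a crude texture-based confidence measure:

```python
import numpy as np

def match_row(left, right, y, win=3, max_disp=8, texture_thresh=5.0):
    """Per-pixel disparities for row y; -1 where confidence is too low."""
    half = win // 2
    w = left.shape[1]
    disp = -np.ones(w, dtype=int)
    for x in range(half + max_disp, w - half):
        patch = left[y - half:y + half + 1, x - half:x + half + 1]
        if patch.std() < texture_thresh:          # textureless: reject match
            continue
        sads = [np.abs(patch - right[y - half:y + half + 1,
                                     x - d - half:x - d + half + 1]).sum()
                for d in range(max_disp)]
        disp[x] = int(np.argmin(sads))            # best SAD along the row
    return disp

# Synthetic check: the right image is the left image shifted by 4 pixels,
# so every confident match should report a disparity of 4.
rng = np.random.default_rng(0)
left = rng.uniform(0.0, 255.0, size=(20, 60))
right = np.empty_like(left)
right[:, :-4] = left[:, 4:]      # left pixel x appears at x - 4 on the right
right[:, -4:] = 0.0
disp = match_row(left, right, y=10)
```

Enlarging `win` here smooths the disparity row at the cost of detail, and raising `texture_thresh` rejects more low-texture matches, mirroring the two parameters discussed above.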
