### Three-Dimensional Computer Vision

### A gentle introduction

Liming Hu

**Abstract** - This paper is a brief introduction to stereo vision from the perspective of geometry. I found that the papers and books I have been reading about stereo vision are not easy to understand for someone with only basic image processing knowledge, so I wrote this introduction.

As shown in Fig. 1, consider the relationship between disparity and **depth** (also called **range** in some textbooks). Here r is the depth from the object P to the stereo camera head (the camera pair C_{1} and C_{2}), and b, called the **baseline**, is the horizontal distance between the two camera heads. The plane formed by the points P, C_{1}, and C_{2} is called the **epipolar plane**; the line M_{1}M_{2}, formed by the intersection of the epipolar plane and the image plane, is called the **epipolar line**, and it is horizontal in Fig. 1. The **corresponding match pixel** M_{2} in the right image must lie on the epipolar line corresponding to pixel M_{1} in the left image. This is called the epipolar constraint, and it greatly reduces the search space in the **correspondence problem** of stereo analysis.

Suppose the two cameras are exactly the same: they have the same focal length f (the distance from the center of the lens to the image plane) and the same shutter speed, they take their pictures at the same time, and they are mounted in the same plane, in parallel, separated only by a horizontal distance.

As shown in Fig. 1, d_{l} is the horizontal distance from the image center O_{1} to the object image M_{1} in the left image, and d_{r} is the horizontal distance from the image center O_{2} to the object image M_{2} in the right image. The difference between d_{l} and d_{r} is called the **disparity** d. In Fig. 1, if the object falls between the two camera heads, d = |d_{l} + d_{r}|; if the object P is above the stereo camera head, or off to its left or right side, then d = |d_{l} - d_{r}|. In practice, because the distance between the two camera heads is so small, the chance that the object falls between them is low. From the similar triangles in Fig. 1 it is straightforward that b/r = d/f, which means that:

r=bf/d (1)

Eq. (1) means that depth is inversely proportional to disparity. Suppose the distance between the two cameras is 12.57 cm, the focal length is 2.454 mm, and the horizontal distance between two pixels in the image is 7 µm (the reciprocal of this pixel pitch is called the **scale** in the horizontal direction). For a disparity of 24 pixels (with each pixel interpolated to 16 subpixels), the depth is about 1.83 m. As the depth increases, the disparity falls off quickly and eventually becomes insensitive to depth; in other words, measurable changes in disparity only occur over a certain range of depths.
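Eq. (1) and the numbers above can be checked with a short script (the function name and unit conversions are mine; the pixel pitch turns a disparity in pixels into a physical length on the image plane):

```python
# Depth from disparity, r = b*f/d (Eq. 1), using the example numbers
# from the text: b = 12.57 cm, f = 2.454 mm, pixel pitch = 7 um.

def depth_from_disparity(baseline_m, focal_m, disparity_px, pixel_pitch_m):
    """Return depth r in metres for a disparity given in whole pixels."""
    d = disparity_px * pixel_pitch_m  # disparity as a physical length on the sensor
    return baseline_m * focal_m / d

r = depth_from_disparity(0.1257, 2.454e-3, 24, 7e-6)
print(r)  # roughly 1.8 m, matching the example above
```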

Figure 1. Relationship between disparity and depth

From Eq. (1), we can get:

∂r/∂d = -bf/d^{2} (2)

From Eq. (1), d = bf/r; substituting into Eq. (2), we get:

∂r/∂d = -r^{2}/bf (3)

Suppose the origin of the coordinate system is at C_{1}. By the similarity of ΔPO_{1}’C_{1} and ΔM_{1}O_{1}C_{1}, we can get the coordinates of P:

x = -x_{l}·b/d, y = -y_{l}·b/d, z = f·b/d (4)
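Eq. (4) can be sketched as a small triangulation helper. Note the signs of x and y follow Eq. (4) as written; they depend on the axis directions chosen in Fig. 1, and the function name is mine:

```python
# Triangulation per Eq. (4): recover the 3-D point P from the left-image
# coordinates (x_l, y_l) and the disparity d, with the origin at camera
# centre C1.  All lengths must use the same unit.

def triangulate(x_l, y_l, d, b, f):
    """Return (x, y, z) of P for left-image point (x_l, y_l),
    disparity d, baseline b, and focal length f."""
    s = b / d  # common scale factor in Eq. (4)
    return (-x_l * s, -y_l * s, f * s)
```

For a point on the optical axis (x_l = y_l = 0), this reduces to z = f·b/d, which is just Eq. (1).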

This means a stereo algorithm should search a certain range of disparities (the **disparity window**), so the depths that can be determined are restricted to a certain interval, called the **horopter**. The horopter is very important in applications.

The horopter can be enlarged by decreasing the baseline, decreasing the focal length (by using wider-angle lenses, possibly at the cost of severe distortion), increasing the pixel width by downsampling the image, or widening the disparity window. Note that enlarging the horopter by decreasing the baseline, decreasing the focal length, or increasing the pixel width has the side effect of worsening the **depth resolution**, as shown in Eq. (2).
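The horopter follows directly from Eq. (1): the disparity window [d_min, d_max] maps to a depth interval. A minimal sketch (function name and the example window of 1 to 64 pixels are my choices, not from the text):

```python
# Horopter from a disparity search window, via r = b*f/d (Eq. 1).
# d_max gives the near depth limit; d_min gives the far limit
# (d_min = 0 corresponds to infinite depth).

def horopter(b, f, pixel_pitch, d_min_px, d_max_px):
    """Return (near, far) depth limits for a disparity window in pixels."""
    near = b * f / (d_max_px * pixel_pitch)
    far = float('inf') if d_min_px == 0 else b * f / (d_min_px * pixel_pitch)
    return near, far

near, far = horopter(0.1257, 2.454e-3, 7e-6, 1, 64)
# Halving b or f, or doubling the pixel pitch, scales both limits down,
# pulling the usable depth interval closer to the camera.
```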

Depth resolution is defined as the minimal change in depth that the stereo camera head can differentiate. From Eq. (3), for a given disparity resolution, the depth resolution grows (gets worse) as the baseline and focal length decrease, and it grows with the square of the depth (so as the depth increases, the stereo camera becomes less sensitive to depth changes). Depth resolution is also proportional to disparity resolution, so if the vision algorithm interpolates the disparity image, we get better depth resolution: with disparities interpolated to 1/16 pixel, a search range of 24 pixels (0~23) yields 24*16 = 384 disparity values (0~383). For the parameters given above, a one-pixel disparity change at 1 m depth gives ∂r = -1^{2} * 7*10^{-6} / (12.57*10^{-2} * 2.454*10^{-3}) = -2.27 cm; at 3 m depth, ∂r = -20.4 cm; at 4 m depth, ∂r = -36.3 cm.
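The three numeric examples can be reproduced from Eq. (3) directly (the function name is mine; the disparity step is the one-pixel pitch of 7 µm used in the text's calculation):

```python
# Depth resolution per Eq. (3): dr = -r**2 * dd / (b * f), evaluated
# for a one-pixel (7 um) disparity step with b = 12.57 cm, f = 2.454 mm.

def depth_resolution(r, b, f, disparity_step):
    """Depth change (metres) caused by one disparity step at depth r."""
    return -r**2 * disparity_step / (b * f)

for r in (1.0, 3.0, 4.0):
    print(r, depth_resolution(r, 0.1257, 2.454e-3, 7e-6))
# -2.27 cm at 1 m, -20.4 cm at 3 m, -36.3 cm at 4 m, as in the text
```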

Depth resolution is different from **range (depth) accuracy**: range accuracy measures how well the range computed by the algorithm compares with the actual range.

Some parameters that may influence the stereo analysis result:

1) **Confidence threshold**: The confidence threshold eliminates stereo matches that have a low probability of success because of a lack of image texture; weak textures give a confidence measure below the threshold and are eliminated by the algorithm. A good threshold value can be found by pointing the stereo cameras at a textureless surface, e.g. a blank floor, running the stereo algorithm, and raising the threshold from 0 until the noise just disappears from the disparity image (the noise is replaced by a uniform black area).
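One common confidence measure is the intensity variance inside the correlation window; a sketch under that assumption (the actual measure used by a given stereo package may differ, and the function name and sample values are mine):

```python
# Texture-based confidence test: a window with low intensity variance
# carries too little texture to match reliably, so its disparity would
# be discarded (shown as black in the disparity image).

def confident(window, threshold):
    """window: flat list of pixel intensities; True if its variance
    exceeds the confidence threshold."""
    n = len(window)
    mean = sum(window) / n
    var = sum((p - mean) ** 2 for p in window) / n
    return var > threshold

print(confident([100, 100, 101, 100], 5.0))  # flat patch: rejected
print(confident([10, 200, 40, 180], 5.0))    # textured patch: accepted
```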

2) **Correlation size**: The size of the correlation window used for correspondence matching affects the disparity image. A larger window produces smoother disparity images but misses smaller objects; a smaller window gives more spatial detail but tends to produce a noisier disparity image.
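Correlation matching along the epipolar line can be sketched in one dimension (a minimal illustration, not any particular package's algorithm; sum-of-absolute-differences is one common correlation cost, and the names and synthetic rows are mine):

```python
# Minimal 1-D block matching: for a pixel in the left scanline, slide a
# correlation window along the same row of the right image (the epipolar
# constraint) and pick the disparity with the smallest sum of absolute
# differences (SAD).  half_win controls the correlation size trade-off
# described above.

def best_disparity(left_row, right_row, x, half_win, max_disp):
    """Return the disparity in 0..max_disp minimising SAD at column x."""
    best_d, best_cost = 0, float('inf')
    for d in range(max_disp + 1):
        if x - d - half_win < 0:
            break  # window would run off the left edge of the right row
        cost = sum(abs(left_row[x + k] - right_row[x - d + k])
                   for k in range(-half_win, half_win + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic pair: the right row is the left row shifted left by 3 pixels,
# as for a rectified camera pair viewing a nearby object.
left  = [0, 0, 0, 9, 8, 7, 0, 0, 0, 0, 0, 0]
right = [9, 8, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(best_disparity(left, right, 4, 1, 5))  # prints 3
```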

