2 Pinhole cameras
As the aperture size decreases, the image gets sharper, but darker.
3 Cameras and lenses
In modern cameras, the above conflict between crispness and brightness is mitigated by using lenses.
Rays from a 3D point that is farther in front of the lens converge to a point closer behind the lens.
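This convergence behavior follows from the thin lens equation, 1/z′ + 1/z = 1/f (not stated explicitly above; here z is the object distance in front of the lens, z′ the image distance behind it, and f the focal length, under the usual sign convention). A minimal sketch:

```python
def image_distance(z: float, f: float) -> float:
    """Thin lens equation 1/z' + 1/z = 1/f, solved for the image
    distance z' behind the lens, given object distance z in front
    of the lens and focal length f (same length units)."""
    if z <= f:
        raise ValueError("object inside focal length: no real image forms")
    return 1.0 / (1.0 / f - 1.0 / z)

# A farther object converges to a point closer behind the lens:
near = image_distance(z=100.0, f=10.0)   # about 11.11 cm
far = image_distance(z=1000.0, f=10.0)   # about 10.10 cm
assert far < near
```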
Because the paraxial refraction model relies on the thin lens assumption, a number of aberrations can occur. The most common one is referred to as radial distortion, which causes the image magnification to increase or decrease as a function of the distance to the optical axis. We classify the radial distortion as pincushion distortion when the magnification increases and barrel distortion (e.g. fish-eye lenses) when the magnification decreases. Radial distortion is caused by the fact that different portions of the lens have differing focal lengths.
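The text does not give a formula for radial distortion, but a common low-order polynomial model (an assumption here, in the style of the Brown-Conrady model) scales image-plane coordinates by a factor that depends on the squared distance from the optical axis:

```python
def apply_radial_distortion(x: float, y: float, k1: float):
    """Low-order polynomial radial distortion model (an assumed model,
    not from the text). The magnification varies with r^2, the squared
    distance from the optical axis: k1 > 0 increases magnification away
    from the center (pincushion), k1 < 0 decreases it (barrel, as in
    fish-eye lenses)."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2
    return scale * x, scale * y

# Barrel distortion pulls off-axis points toward the center:
xd, yd = apply_radial_distortion(1.0, 1.0, k1=-0.1)  # approx. (0.8, 0.8)
```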

4 Going to digital image space
As discussed earlier, a point P in 3D space can be mapped (or projected) into a 2D point P′ in the image plane Π′. This R3→R2 mapping is referred to as a projective transformation.

4.1 The Camera Matrix Model and Homogeneous Coordinates
4.1.1 Introduction to the Camera Matrix Model
The camera matrix model describes a set of important parameters that affect how a world point P=(x,y,z) is mapped to image coordinates P′=(x′,y′).
$$
P' = \begin{bmatrix} x' \\ y' \end{bmatrix}
= \begin{bmatrix} k z' \dfrac{x}{z} + c_x \\[4pt] l z' \dfrac{y}{z} + c_y \end{bmatrix}
= \begin{bmatrix} \alpha \dfrac{x}{z} + c_x \\[4pt] \beta \dfrac{y}{z} + c_y \end{bmatrix}, \tag{4}
$$
where:
- x′,y′ are the coordinates of an image point P′ in digital image coordinates; they have units of pixels.
- z′ is the distance between the image plane and the lens center; it has units of length (e.g. cm).
- x,y,z are the coordinates of a world point P in world coordinates; they have units of length (e.g. cm).
- cx,cy are coordinate translation offsets, in pixels, between digital image coordinates (origin at the top left) and image plane coordinates (origin at the center); they equal half the digital image width and height, respectively.
- k,l are pixel densities, in units such as pixels per inch (ppi) or pixels per cm; they may differ because the aspect ratio of a pixel is not guaranteed to be one. If they are equal, we often say that the camera has square pixels.
- α=kz′; β=lz′.
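The mapping above can be sketched directly in code (the numeric parameter values are illustrative, not from the text):

```python
def project(x: float, y: float, z: float,
            alpha: float, beta: float, cx: float, cy: float):
    """Map a world point (x, y, z), with z > 0 in front of the camera,
    to digital image coordinates, using alpha = k*z', beta = l*z' and
    the principal-point offsets (cx, cy)."""
    return alpha * x / z + cx, beta * y / z + cy

# Example with square pixels (alpha == beta):
u, v = project(1.0, 2.0, 4.0, alpha=800.0, beta=800.0, cx=320.0, cy=240.0)
# u = 800*0.25 + 320 = 520.0, v = 800*0.5 + 240 = 640.0
```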
4.1.2 Homogeneous Coordinates
From Equation (4), we see that the projection P=(x,y,z)→P′=(x′,y′) is not linear, since the operation divides by z. We can move to homogeneous coordinates to represent this projection as a matrix-vector product, which will be useful in later derivations.
To convert from the Euclidean coordinate system to the homogeneous coordinate system, we simply append a 1 in a new dimension. Any point P′=(x′,y′) becomes (x′,y′,1). Similarly, any point P=(x,y,z) becomes (x,y,z,1). When converting back from arbitrary homogeneous coordinates (v1,⋯,vn,w), we recover the Euclidean coordinates (v1/w,⋯,vn/w).
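These two conversions can be sketched as a pair of helper functions (assuming NumPy; the function names are illustrative):

```python
import numpy as np

def to_homogeneous(p: np.ndarray) -> np.ndarray:
    """Append a 1: (v1, ..., vn) -> (v1, ..., vn, 1)."""
    return np.append(p, 1.0)

def from_homogeneous(p: np.ndarray) -> np.ndarray:
    """Divide by the last coordinate: (v1, ..., vn, w) -> (v1/w, ..., vn/w)."""
    return p[:-1] / p[-1]

# Round trip, and a scaled homogeneous point mapping to the same Euclidean point:
to_homogeneous(np.array([3.0, 4.0]))        # [3., 4., 1.]
from_homogeneous(np.array([6.0, 8.0, 2.0])) # [3., 4.]
```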
Using homogeneous coordinates, we can reformulate Equation (4) as the matrix-vector relationship
$$
P' = \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= \begin{bmatrix} \alpha \dfrac{x}{z} + c_x \\[4pt] \beta \dfrac{y}{z} + c_y \\[4pt] 1 \end{bmatrix}
= \begin{bmatrix} \alpha x + c_x z \\ \beta y + c_y z \\ z \end{bmatrix}
= \begin{bmatrix} \alpha & 0 & c_x & 0 \\ 0 & \beta & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
= \begin{bmatrix} \alpha & 0 & c_x & 0 \\ 0 & \beta & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} P
= \begin{bmatrix} \alpha & 0 & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} I & 0 \end{bmatrix} P
= K \begin{bmatrix} I & 0 \end{bmatrix} P.
$$

4.1.3 Intrinsic Parameters
The following matrix K is the intrinsic parameters matrix without considering skewness and distortion:
$$
K = \begin{bmatrix} \alpha & 0 & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix}.
$$

As for skewness, the angle θ between the two image axes may be slightly larger or smaller than 90 degrees. The K accounting for skewness is
$$
K = \begin{bmatrix} \alpha & -\alpha \cot\theta & c_x \\ 0 & \dfrac{\beta}{\sin\theta} & c_y \\ 0 & 0 & 1 \end{bmatrix}.
$$

Here K has 5 degrees of freedom:
- α,β for scaling;
- cx,cy for translation offset;
- θ for skewness.
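Putting the pieces together, the projection P′ = K[I 0]P can be sketched with NumPy (the numeric parameter values are illustrative; with θ = 90 degrees the skew terms vanish up to floating-point error):

```python
import numpy as np

def intrinsic_matrix(alpha: float, beta: float, cx: float, cy: float,
                     theta: float = np.pi / 2) -> np.ndarray:
    """Intrinsic matrix K with skew; theta = pi/2 (90 degrees)
    recovers the skew-free K (up to floating-point error)."""
    return np.array([
        [alpha, -alpha / np.tan(theta), cx],
        [0.0, beta / np.sin(theta), cy],
        [0.0, 0.0, 1.0],
    ])

K = intrinsic_matrix(alpha=800.0, beta=800.0, cx=320.0, cy=240.0)
M = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # K [I 0], shape (3, 4)

P = np.array([1.0, 2.0, 4.0, 1.0])  # homogeneous world point
p = M @ P                            # homogeneous image point
u, v = p[:2] / p[2]                  # divide by the last coordinate
# (u, v) is approximately (520.0, 640.0), matching Equation (4)
```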