Monday, December 30, 2019

Implementing a Gaussian blur filter together with convolution operation from scratch

Gaussian blurring is a very common filter in image processing which is useful for many things such as reducing noise in images, resizing images to be smaller (downsampling), and simulating out-of-focus effects. Let's look at how to implement one ourselves.

Let's use the following grey-scale image as an example:


The image consists of many numbers arranged in a grid, where each number is called a pixel. The pixel values range from 0 to 255, where 0 is represented with a black colour, 255 with a white colour, and all the other numbers with shades of grey in between. Here are the values of the 5x5 square of pixels in the top-left corner of the image:
[[156, 157, 160, 159, 158],
 [156, 157, 159, 158, 158],
 [158, 157, 156, 156, 157],
 [160, 157, 154, 154, 156],
 [158, 157, 156, 156, 157]]

In the case of a colour image, each pixel contains 3 values: the intensity of red, the intensity of green, and the intensity of blue. When these three primary colours are combined in different intensities they can produce any colour.

Let's use the following colour image as an example:


The values of the 5x5 square of pixels in the top-left corner of the image are:
[[[21, 13,  8], [21, 13,  9], [20, 11,  8], [21, 13, 11], [21, 14,  8]],
 [[21, 13,  7], [21, 13,  9], [20, 14,  7], [22, 14,  8], [20, 13,  7]],
 [[21, 14,  7], [23, 13, 10], [20, 14,  9], [21, 13,  9], [21, 15,  9]],
 [[21, 13,  6], [21, 13,  9], [21, 13,  8], [21, 13, 10], [20, 12,  5]],
 [[21, 13,  5], [21, 13,  7], [21, 13,  4], [21, 13,  7], [21, 13,  8]]]

The convolution operation

One of the basic operations in image processing is the convolution. This is when you replace every pixel value $p$ in an image with a weighted sum of the pixel values near $p$, that is, multiply each nearby pixel value with a different number before adding them all up and replacing $p$ with the result. The simplest example of a convolution is when 'nearby' means the pixel itself only. For example, if we convolve a 5x5 pixel greyscale image with a 1x1 weighting, this is what we get:

$$
\left[\begin{array}{ccccc}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15\\
16 & 17 & 18 & 19 & 20\\
21 & 22 & 23 & 24 & 25\\
\end{array}\right]
\circledast
\left[\begin{array}{c}
100
\end{array}\right]
=
\left[\begin{array}{ccccc}
1 \times 100 & 2 \times 100 & 3 \times 100 & 4 \times 100 & 5 \times 100\\
6 \times 100 & 7 \times 100 & 8 \times 100 & 9 \times 100 & 10 \times 100\\
11 \times 100 & 12 \times 100 & 13 \times 100 & 14 \times 100 & 15 \times 100\\
16 \times 100 & 17 \times 100 & 18 \times 100 & 19 \times 100 & 20 \times 100\\
21 \times 100 & 22 \times 100 & 23 \times 100 & 24 \times 100 & 25 \times 100\\
\end{array}\right]
=
\left[\begin{array}{ccccc}
100 & 200 & 300 & 400 & 500\\
600 & 700 & 800 & 900 & 1000\\
1100 & 1200 & 1300 & 1400 & 1500\\
1600 & 1700 & 1800 & 1900 & 2000\\
2100 & 2200 & 2300 & 2400 & 2500\\
\end{array}\right]
$$

Here we have replaced the pixel value 1 with 100, 2 with 200, etc. Note that in practice the numbers would be clipped to not exceed 255.
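As a minimal sketch of what that clipping looks like in numpy (the array here is just a hypothetical convolution output, not from the example above):

```python
import numpy as np

#Hypothetical convolution output with values outside the valid 0 to 255 range.
res = np.array([
    [100, 200, 300],
    [400, 500, 600],
])

#Clip into the valid 8-bit pixel range before displaying or saving.
clipped = np.clip(res, 0, 255).astype(np.uint8)
print(clipped)
```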

In this case, only the pixel being replaced contributes to its new value. We can extend the range of the weighting by using a 3x3 grid of weights instead of a single number, which combines the pixel being replaced with all of its adjacent pixels.

$$
\left[\begin{array}{ccccc}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15\\
16 & 17 & 18 & 19 & 20\\
21 & 22 & 23 & 24 & 25\\
\end{array}\right]
\circledast
\left[\begin{array}{ccc}
100 & 200 & 100\\
300 & 400 & 300\\
100 & 200 & 100\\
\end{array}\right]
=
\left[\begin{array}{ccccc}

\left(\begin{array}{cccccc}
& 1 \times 100 & + & 2 \times 200 & + & 3 \times 100\\
+ & 6 \times 300 & + & 7 \times 400 & + & 8 \times 300\\
+ & 11 \times 100 & + & 12 \times 200 & + & 13 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 2 \times 100 & + & 3 \times 200 & + & 4 \times 100\\
+ & 7 \times 300 & + & 8 \times 400 & + & 9 \times 300\\
+ & 12 \times 100 & + & 13 \times 200 & + & 14 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 3 \times 100 & + & 4 \times 200 & + & 5 \times 100\\
+ & 8 \times 300 & + & 9 \times 400 & + & 10 \times 300\\
+ & 13 \times 100 & + & 14 \times 200 & + & 15 \times 100\\
\end{array}\right)
\\

\left(\begin{array}{cccccc}
& 6 \times 100 & + & 7 \times 200 & + & 8 \times 100\\
+ & 11 \times 300 & + & 12 \times 400 & + & 13 \times 300\\
+ & 16 \times 100 & + & 17 \times 200 & + & 18 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 7 \times 100 & + & 8 \times 200 & + & 9 \times 100\\
+ & 12 \times 300 & + & 13 \times 400 & + & 14 \times 300\\
+ & 17 \times 100 & + & 18 \times 200 & + & 19 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 8 \times 100 & + & 9 \times 200 & + & 10 \times 100\\
+ & 13 \times 300 & + & 14 \times 400 & + & 15 \times 300\\
+ & 18 \times 100 & + & 19 \times 200 & + & 20 \times 100\\
\end{array}\right)
\\

\left(\begin{array}{cccccc}
& 11 \times 100 & + & 12 \times 200 & + & 13 \times 100\\
+ & 16 \times 300 & + & 17 \times 400 & + & 18 \times 300\\
+ & 21 \times 100 & + & 22 \times 200 & + & 23 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 12 \times 100 & + & 13 \times 200 & + & 14 \times 100\\
+ & 17 \times 300 & + & 18 \times 400 & + & 19 \times 300\\
+ & 22 \times 100 & + & 23 \times 200 & + & 24 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 13 \times 100 & + & 14 \times 200 & + & 15 \times 100\\
+ & 18 \times 300 & + & 19 \times 400 & + & 20 \times 300\\
+ & 23 \times 100 & + & 24 \times 200 & + & 25 \times 100\\
\end{array}\right)
\\

\end{array}\right]
=
\left[\begin{array}{ccc}
12600 & 14400 & 16200\\
21600 & 23400 & 25200\\
30600 & 32400 & 34200\\
\end{array}\right]
$$

Note how the new image becomes smaller than the original image. This is in order to avoid using pixel values that lie outside of the image edges when computing new pixel values. One way to use pixel values outside of the image is to treat any pixel outside of the image as having a value of zero (zero padding). We can also reflect the image outside of its edges or wrap around to the other side of the image.
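To make these three border strategies concrete, here is a small sketch using numpy's pad function, which calls them 'constant', 'reflect', and 'wrap' respectively:

```python
import numpy as np

img = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

#Zero padding: everything outside the image is treated as 0.
print(np.pad(img, 1, mode='constant', constant_values=0))

#Reflect the image beyond its edges (the edge pixel itself is not repeated).
print(np.pad(img, 1, mode='reflect'))

#Wrap around to the opposite side of the image.
print(np.pad(img, 1, mode='wrap'))
```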

We can implement the above convolution operation in Python naively using for loops as follows:

def conv(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res_h = img_h - (wgt_h//2)*2
    res_w = img_w - (wgt_w//2)*2
    res = [ [ 0 for _ in range(res_w) ] for _ in range(res_h) ]

    for img_row in range(wgt_h//2, img_h-wgt_h//2):
        for img_col in range(wgt_w//2, img_w-wgt_w//2):
            new_pix = 0
            for wgt_row in range(wgt_h):
                for wgt_col in range(wgt_w):
                    #A true convolution flips the weights both horizontally and vertically before multiplying.
                    new_pix += wgt[wgt_h-1-wgt_row][wgt_w-1-wgt_col]*img[img_row+wgt_row-wgt_h//2][img_col+wgt_col-wgt_w//2]
            res[img_row - wgt_h//2][img_col - wgt_w//2] = new_pix

    return res

print(conv([
            [ 1, 2, 3, 4, 5],
            [ 6, 7, 8, 9,10],
            [11,12,13,14,15],
            [16,17,18,19,20],
            [21,22,23,24,25]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))
[[12600, 14400, 16200], [21600, 23400, 25200], [30600, 32400, 34200]]

I say naively because it is possible to perform a convolution using fewer loops and with parallel processing, but that would be beyond the scope of this blog post. Note that in the code above we flip the weights both horizontally and vertically before multiplying them, because that is what distinguishes a convolution from a cross-correlation; but since we'll be working with symmetric weights, the flip doesn't actually change the result here.
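Just to give a taste of what a fewer-loops version can look like (this is a sketch, not needed for the rest of the post; it assumes numpy 1.20+ for sliding_window_view, and conv_fast is just a name used here for illustration):

```python
import numpy as np

def conv_fast(img, wgt):
    #Extract every wgt-sized window of the image in one vectorised step.
    windows = np.lib.stride_tricks.sliding_window_view(img, wgt.shape)
    #Flip the weights in both directions (a true convolution), then take
    #the weighted sum over every window at once.
    return np.einsum('ijkl,kl->ij', windows, np.flip(wgt))

img = np.arange(1, 26).reshape(5, 5)
wgt = np.array([
    [100, 200, 100],
    [300, 400, 300],
    [100, 200, 100],
])
print(conv_fast(img, wgt))
```

This gives the same result as the loop-based conv function above.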

If we want to use zero padding in our convolution then we would do it as follows:

def conv_pad(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res = [ [ 0 for _ in range(img_w) ] for _ in range(img_h) ]

    for img_row in range(img_h):
        for img_col in range(img_w):
            new_pix = 0
            for wgt_row in range(wgt_h):
                for wgt_col in range(wgt_w):
                    y = img_row+wgt_row-wgt_h//2
                    x = img_col+wgt_col-wgt_w//2
                    if 0 <= y < img_h and 0 <= x < img_w:
                        pix = img[y][x]
                    else:
                        pix = 0
                    #A true convolution flips the weights both horizontally and vertically before multiplying.
                    new_pix += wgt[wgt_h-1-wgt_row][wgt_w-1-wgt_col]*pix
            res[img_row][img_col] = new_pix

    return res

print(conv_pad([
            [ 1, 2, 3, 4, 5],
            [ 6, 7, 8, 9,10],
            [11,12,13,14,15],
            [16,17,18,19,20],
            [21,22,23,24,25]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))
[[2900, 4800, 6200, 7600, 6100], [8300, 12600, 14400, 16200, 12500], [14800, 21600, 23400, 25200, 19000], [21300, 30600, 32400, 34200, 25500], [19900, 28800, 30200, 31600, 23100]]

We can adapt this function to be able to operate on colour images by simply applying the filter on each of the three intensity channels separately as follows:

def conv_pad_colour(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res = [ [ [ 0, 0, 0 ] for _ in range(img_w) ] for _ in range(img_h) ]

    for img_row in range(img_h):
        for img_col in range(img_w):
            for channel in range(3):
                new_pix = 0
                for wgt_row in range(wgt_h):
                    for wgt_col in range(wgt_w):
                        y = img_row+wgt_row-wgt_h//2
                        x = img_col+wgt_col-wgt_w//2
                        if 0 <= y < img_h and 0 <= x < img_w:
                            pix = img[y][x][channel]
                        else:
                            pix = 0
                        #A true convolution flips the weights both horizontally and vertically before multiplying.
                        new_pix += wgt[wgt_h-1-wgt_row][wgt_w-1-wgt_col]*pix
                res[img_row][img_col][channel] = new_pix

    return res

print(conv_pad_colour([
            [[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9], [10, 11, 12], [13, 14, 15]],
            [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],
            [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],
            [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],
            [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))
[[[6700, 7700, 8700], [11600, 13000, 14400], [15800, 17200, 18600], [20000, 21400, 22800], [16300, 17300, 18300]], [[22300, 23600, 24900], [34200, 36000, 37800], [39600, 41400, 43200], [45000, 46800, 48600], [34900, 36200, 37500]], [[41800, 43100, 44400], [61200, 63000, 64800], [66600, 68400, 70200], [72000, 73800, 75600], [54400, 55700, 57000]], [[61300, 62600, 63900], [88200, 90000, 91800], [93600, 95400, 97200], [99000, 100800, 102600], [73900, 75200, 76500]], [[57700, 58700, 59700], [83600, 85000, 86400], [87800, 89200, 90600], [92000, 93400, 94800], [67300, 68300, 69300]]]

In practice we can use scipy's convolve function in order to apply a convolution:

import scipy.ndimage
import numpy as np
img = np.array([
            [[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9], [10, 11, 12], [13, 14, 15]],
            [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],
            [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],
            [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],
            [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]]
        ])
weights = np.array([
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ])
res = np.stack([ #Convolve each channel separately as a greyscale image and then combine them back into a single image.
        scipy.ndimage.convolve(img[:,:,channel], weights, mode='constant')
        for channel in range(3)
    ], axis=2)
print(res)

In signal processing, the weights are actually called a filter (or kernel).

The gaussian blur

A blur is made by replacing each pixel value with the average value of its nearby pixels, which softens edges and makes the image look out of focus. Averaging nearby pixels can be easily achieved by convolving with a filter that consists entirely of values of $\frac{1}{n}$, where $n$ is the number of values in the filter. This results in making each pixel equal to $$p_1 \times \frac{1}{n} + p_2 \times \frac{1}{n} + \dots + p_n \times \frac{1}{n} = \frac{1}{n} \left(p_1 + p_2 + \dots + p_n\right)$$ where $p_i$ is a nearby pixel value.

import scipy.ndimage
import skimage.io
import numpy as np

img = skimage.io.imread('image.png')
filter = np.ones([11, 11])
filter = filter/filter.size

res = np.stack([ #Convolve each channel separately as a greyscale image and then combine them back into a single image.
        scipy.ndimage.convolve(img[:,:,channel], filter, mode='constant')
        for channel in range(3)
    ], axis=2)

skimage.io.imshow(res)
skimage.io.show()

This is actually called a uniform blur, which is not considered an ideal blur as it gives as much importance to pixels that are far from the pixel being replaced as to pixels that are close. A better blur is a gaussian blur, where the filter values follow a gaussian bell curve with a maximum in the center and a steep decline around it. The gaussian function is defined as follows: $$G(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{x^2}{2\sigma^2}}$$ where $\sigma$ is a parameter controlling how wide the bell curve is, and therefore how much influence the nearby pixels have on the value of the pixel being replaced (the greater the $\sigma$, the more blurry). In two dimensions, all you do is take the gaussian of each variable and multiply them together: $$G(x,y) = G(x) \times G(y)$$

The way the gaussian blur works is to use an infinite 2D gaussian curve as a filter for a convolution. The center of the filter is $G(0,0)$, with neighbouring cells a distance of one apart in the gaussian function's domain, as follows: $$ \newcommand\iddots{\mathinner{ \kern1mu\raise1pt{.} \kern2mu\raise4pt{.} \kern2mu\raise7pt{\Rule{0pt}{7pt}{0pt}.} \kern1mu }} \begin{array}{ccccc} \ddots & & \vdots & & \iddots\\ & G(-1, 1) & G( 0, 1) & G( 1, 1) & \\ \cdots & G(-1, 0) & G( 0, 0) & G( 1, 0) & \cdots\\ & G(-1,-1) & G( 0,-1) & G( 1,-1) & \\ \iddots & & \vdots & & \ddots\\ \end{array} $$

In order to be a perfect gaussian blur, the filter would need to be infinitely large, which is not possible. Fortunately, $G(3\sigma)$ is small enough to be rounded down to zero, so our filter only needs a side length of $2 \lceil 3\sigma \rceil + 1$ at most, as anything larger will not make a difference ($\lceil x \rceil$ means $x$ rounded up). The problem is that, even though the missing values are negligible, the values in the truncated filter will not sum to exactly 1 (especially for a small $\sigma$), which makes the resulting image either brighter or darker.
To get around this we cheat a bit and divide each value in the filter by the filter's total, which results in a gaussian blur with a neutral effect on the brightness of the image. $$ \begin{array}{ccccccc} \frac{G(-\lceil 3\sigma \rceil, \lceil 3\sigma \rceil)}{T} & & & \frac{G(0, \lceil 3\sigma \rceil)}{T} & & & \frac{G(\lceil 3\sigma \rceil, \lceil 3\sigma \rceil)}{T}\\ & \ddots & & \vdots & & \iddots & \\ & & \frac{G(-1, 1)}{T} & \frac{G( 0, 1)}{T} & \frac{G( 1, 1)}{T} & & \\ \frac{G(-\lceil 3\sigma \rceil, 0)}{T} & \cdots & \frac{G(-1, 0)}{T} & \frac{G( 0, 0)}{T} & \frac{G( 1, 0)}{T} & \cdots & \frac{G(\lceil 3\sigma \rceil, 0)}{T}\\ & & \frac{G(-1,-1)}{T} & \frac{G( 0,-1)}{T} & \frac{G( 1,-1)}{T} & & \\ & \iddots & & \vdots & & \ddots & \\ \frac{G(-\lceil 3\sigma \rceil, -\lceil 3\sigma \rceil)}{T} & & & \frac{G(0, -\lceil 3\sigma \rceil)}{T} & & & \frac{G(\lceil 3\sigma \rceil, -\lceil 3\sigma \rceil)}{T}\\ \end{array} $$ where $T$ is the total of all the $G(x,y)$ values in the filter.

import scipy.ndimage
import skimage.io
import numpy as np

img = skimage.io.imread('image.png')

sigma = 4
G = lambda x: 1/np.sqrt(2*np.pi*sigma**2)*np.exp(-x**2/(2*sigma**2))
G2 = lambda x,y: G(x)*G(y)
limit = np.ceil(3*sigma).astype(np.int32)
filter = G2(*np.meshgrid(np.arange(-limit, limit+1), np.arange(-limit, limit+1)))
filter = filter/np.sum(filter)

res = np.stack([
        scipy.ndimage.convolve(img[:,:,channel], filter, mode='constant')
        for channel in range(3)
    ], axis=2)
skimage.io.imshow(res)
skimage.io.show()

Of course skimage has a gaussian blur filter out of the box for us to use:
import skimage.io
import skimage.filters

img = skimage.io.imread('image.png')

sigma = 4
res = skimage.filters.gaussian(img, sigma, mode='constant', truncate=3, multichannel=True) #Truncate the filter to 3 sigmas and blur each channel separately.
skimage.io.imshow(res)
skimage.io.show()
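
As a final aside, the fact that $G(x,y) = G(x) \times G(y)$ means that the gaussian filter is separable: the 2D convolution can be replaced by two much cheaper 1D convolutions, one along each axis. Here is a minimal sketch of this equivalence (the image is just random noise, used purely to compare the two methods):

```python
import numpy as np
import scipy.ndimage

sigma = 4
G = lambda x: 1/np.sqrt(2*np.pi*sigma**2)*np.exp(-x**2/(2*sigma**2))
limit = int(np.ceil(3*sigma))
g = G(np.arange(-limit, limit+1))
g = g/np.sum(g) #Normalise the 1D filter so that it sums to 1.

img = np.random.rand(32, 32) #Stand-in greyscale image.

#The full 2D filter is the outer product of the 1D filter with itself (G(x,y) = G(x)G(y))...
res_2d = scipy.ndimage.convolve(img, np.outer(g, g), mode='constant')

#...so one 2D convolution is equivalent to a horizontal 1D pass followed by a vertical one.
res_sep = scipy.ndimage.convolve(img, g[np.newaxis, :], mode='constant')
res_sep = scipy.ndimage.convolve(res_sep, g[:, np.newaxis], mode='constant')

print(np.allclose(res_2d, res_sep)) #The two results match (up to floating point error).
```

This is why library implementations of the gaussian blur are fast even for large $\sigma$.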