Let's use the following grey-scale image as an example:

The image consists of numbers arranged in a grid, where each number is called a pixel. Pixel values range from 0 to 255, where 0 is rendered as black, 255 as white, and all the numbers in between as shades of grey. Here are the values of the 5x5 square of pixels in the top-left corner of the image:
[[156, 157, 160, 159, 158], [156, 157, 159, 158, 158], [158, 157, 156, 156, 157], [160, 157, 154, 154, 156], [158, 157, 156, 156, 157]]
In the case of a colour image, each pixel consists of 3 values: the intensity of red, the intensity of green, and the intensity of blue. When these three primary colours are combined at different intensities, they can produce any colour.
Let's use the following colour image as an example:

The values of the 5x5 square of pixels in the top-left corner of the image are:
[[[21, 13, 8], [21, 13, 9], [20, 11, 8], [21, 13, 11], [21, 14, 8]], [[21, 13, 7], [21, 13, 9], [20, 14, 7], [22, 14, 8], [20, 13, 7]], [[21, 14, 7], [23, 13, 10], [20, 14, 9], [21, 13, 9], [21, 15, 9]], [[21, 13, 6], [21, 13, 9], [21, 13, 8], [21, 13, 10], [20, 12, 5]], [[21, 13, 5], [21, 13, 7], [21, 13, 4], [21, 13, 7], [21, 13, 8]]]
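In code, a colour image like this is simply a 3-dimensional array of shape (height, width, 3), and the 5x5 corner above is a plain slice. A minimal sketch using a synthetic NumPy array (the actual image file is not assumed here; the fill colour is just similar to the corner values above):

```python
import numpy as np

#A hypothetical 100x100 colour image filled with a single dark brown colour.
img = np.full((100, 100, 3), [21, 13, 8], dtype=np.uint8)

#The top-left 5x5 square of pixels is a plain slice: the first 5 rows and columns, all 3 channels.
corner = img[:5, :5, :]
print(corner.shape)  #(5, 5, 3)
```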
The convolution operation
One of the basic operations in image processing is the convolution. This is when you replace every pixel value $p$ in an image with a weighted sum of the pixel values near $p$; that is, you multiply each nearby pixel value by a different number (a weight) before adding them all up and replacing $p$ with the result. The simplest example of a convolution is when 'nearby' means the pixel itself only. For example, if we convolve a 5x5 pixel greyscale image with a 1x1 weighting, this is what we get:
$$\begin{bmatrix}
1 & 2 & 3 & 4 & 5 \\
6 & 7 & 8 & 9 & 10 \\
11 & 12 & 13 & 14 & 15 \\
16 & 17 & 18 & 19 & 20 \\
21 & 22 & 23 & 24 & 25
\end{bmatrix} \circledast \begin{bmatrix} 100 \end{bmatrix} = \begin{bmatrix}
1{\times}100 & 2{\times}100 & 3{\times}100 & 4{\times}100 & 5{\times}100 \\
6{\times}100 & 7{\times}100 & 8{\times}100 & 9{\times}100 & 10{\times}100 \\
11{\times}100 & 12{\times}100 & 13{\times}100 & 14{\times}100 & 15{\times}100 \\
16{\times}100 & 17{\times}100 & 18{\times}100 & 19{\times}100 & 20{\times}100 \\
21{\times}100 & 22{\times}100 & 23{\times}100 & 24{\times}100 & 25{\times}100
\end{bmatrix} = \begin{bmatrix}
100 & 200 & 300 & 400 & 500 \\
600 & 700 & 800 & 900 & 1000 \\
1100 & 1200 & 1300 & 1400 & 1500 \\
1600 & 1700 & 1800 & 1900 & 2000 \\
2100 & 2200 & 2300 & 2400 & 2500
\end{bmatrix}$$
Here we have replaced the pixel value 1 with 100, 2 with 200, etc. Note that in practice the numbers would be clipped to not exceed 255.
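With NumPy, this clipping is a single call. A small sketch on made-up values (the array here is illustrative, not taken from the image above):

```python
import numpy as np

#A hypothetical convolution result whose values overflow the valid 8-bit range.
res = np.array([[100, 200, 300],
                [400, 500, 600]])

#Clip every value into the 0 to 255 range before storing it as an 8-bit image.
clipped = np.clip(res, 0, 255).astype(np.uint8)
print(clipped.tolist())  #[[100, 200, 255], [255, 255, 255]]
```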
In this example, only the pixel being replaced contributed to its new value. We can extend the range of the weighting by using a 3x3 weight instead of a single number, which will combine all the pixels adjacent to the pixel being replaced.
$$\begin{bmatrix}
1 & 2 & 3 & 4 & 5 \\
6 & 7 & 8 & 9 & 10 \\
11 & 12 & 13 & 14 & 15 \\
16 & 17 & 18 & 19 & 20 \\
21 & 22 & 23 & 24 & 25
\end{bmatrix} \circledast \begin{bmatrix}
100 & 200 & 100 \\
300 & 400 & 300 \\
100 & 200 & 100
\end{bmatrix} = \begin{bmatrix}
\left(\substack{1{\times}100 + 2{\times}200 + 3{\times}100 \\ +\,6{\times}300 + 7{\times}400 + 8{\times}300 \\ +\,11{\times}100 + 12{\times}200 + 13{\times}100}\right) &
\left(\substack{2{\times}100 + 3{\times}200 + 4{\times}100 \\ +\,7{\times}300 + 8{\times}400 + 9{\times}300 \\ +\,12{\times}100 + 13{\times}200 + 14{\times}100}\right) &
\left(\substack{3{\times}100 + 4{\times}200 + 5{\times}100 \\ +\,8{\times}300 + 9{\times}400 + 10{\times}300 \\ +\,13{\times}100 + 14{\times}200 + 15{\times}100}\right) \\
\left(\substack{6{\times}100 + 7{\times}200 + 8{\times}100 \\ +\,11{\times}300 + 12{\times}400 + 13{\times}300 \\ +\,16{\times}100 + 17{\times}200 + 18{\times}100}\right) &
\left(\substack{7{\times}100 + 8{\times}200 + 9{\times}100 \\ +\,12{\times}300 + 13{\times}400 + 14{\times}300 \\ +\,17{\times}100 + 18{\times}200 + 19{\times}100}\right) &
\left(\substack{8{\times}100 + 9{\times}200 + 10{\times}100 \\ +\,13{\times}300 + 14{\times}400 + 15{\times}300 \\ +\,18{\times}100 + 19{\times}200 + 20{\times}100}\right) \\
\left(\substack{11{\times}100 + 12{\times}200 + 13{\times}100 \\ +\,16{\times}300 + 17{\times}400 + 18{\times}300 \\ +\,21{\times}100 + 22{\times}200 + 23{\times}100}\right) &
\left(\substack{12{\times}100 + 13{\times}200 + 14{\times}100 \\ +\,17{\times}300 + 18{\times}400 + 19{\times}300 \\ +\,22{\times}100 + 23{\times}200 + 24{\times}100}\right) &
\left(\substack{13{\times}100 + 14{\times}200 + 15{\times}100 \\ +\,18{\times}300 + 19{\times}400 + 20{\times}300 \\ +\,23{\times}100 + 24{\times}200 + 25{\times}100}\right)
\end{bmatrix} = \begin{bmatrix}
12600 & 14400 & 16200 \\
21600 & 23400 & 25200 \\
30600 & 32400 & 34200
\end{bmatrix}$$
Note how the new image is smaller than the original. This is to avoid using pixel values that lie outside the image edges when computing new pixel values. If we do want to use pixel values outside of the image, one option is to treat any pixel outside the image as having a value of zero (zero padding). We can also reflect the image at its edges or wrap around to the other side of the image.
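These three padding strategies can be seen directly with NumPy's `np.pad`. A small sketch padding a 1-pixel border around a tiny image (the numbers are arbitrary):

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

#Zero padding: every pixel outside the image is treated as 0.
zero = np.pad(img, 1, mode='constant', constant_values=0)

#Reflection: the image is mirrored at its edges.
reflect = np.pad(img, 1, mode='reflect')

#Wrapping: rows and columns continue from the opposite side of the image.
wrap = np.pad(img, 1, mode='wrap')

print(zero[0].tolist())     #[0, 0, 0, 0, 0]
print(reflect[0].tolist())  #[5, 4, 5, 6, 5]
print(wrap[0].tolist())     #[9, 7, 8, 9, 7]
```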
We can implement the above convolution operation in Python naively using for loops as follows:
```python
def conv(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])
    #The result shrinks by the filter's margin on every side.
    res_h = img_h - (wgt_h//2)*2
    res_w = img_w - (wgt_w//2)*2
    res = [[0 for _ in range(res_w)] for _ in range(res_h)]

    for img_row in range(wgt_h//2, img_h - wgt_h//2):
        for img_col in range(wgt_w//2, img_w - wgt_w//2):
            new_pix = 0
            for wgt_row in range(wgt_h):
                for wgt_col in range(wgt_w):
                    #Technically, a convolution flips the weights, so we subtract the weight offsets from the pixel position instead of adding them.
                    new_pix += wgt[wgt_row][wgt_col]*img[img_row - (wgt_row - wgt_h//2)][img_col - (wgt_col - wgt_w//2)]
            res[img_row - wgt_h//2][img_col - wgt_w//2] = new_pix
    return res

print(conv(
    [
        [ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
        [21, 22, 23, 24, 25],
    ],
    [
        [100, 200, 100],
        [300, 400, 300],
        [100, 200, 100],
    ],
))
```
[[12600, 14400, 16200], [21600, 23400, 25200], [30600, 32400, 34200]]
I say naively because it is possible to perform a convolution using fewer loops and with parallel processing, but that would be beyond the scope of this blog post. Note that in the code above we flip the weights before multiplying them because that is what a convolution involves; since we'll be working with symmetric weights, however, it doesn't really matter.
If we want to use zero padding in our convolution then we would do it as follows:
```python
def conv_pad(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])
    #With padding, the result is the same size as the input image.
    res = [[0 for _ in range(img_w)] for _ in range(img_h)]

    for img_row in range(img_h):
        for img_col in range(img_w):
            new_pix = 0
            for wgt_row in range(wgt_h):
                for wgt_col in range(wgt_w):
                    #Flip the weights by subtracting the weight offsets from the pixel position.
                    y = img_row - (wgt_row - wgt_h//2)
                    x = img_col - (wgt_col - wgt_w//2)
                    if 0 <= y < img_h and 0 <= x < img_w:
                        pix = img[y][x]
                    else:
                        pix = 0 #Zero padding: any pixel outside the image is treated as 0.
                    new_pix += wgt[wgt_row][wgt_col]*pix
            res[img_row][img_col] = new_pix
    return res

print(conv_pad(
    [
        [ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
        [21, 22, 23, 24, 25],
    ],
    [
        [100, 200, 100],
        [300, 400, 300],
        [100, 200, 100],
    ],
))
```
[[2900, 4800, 6200, 7600, 6100], [8300, 12600, 14400, 16200, 12500], [14800, 21600, 23400, 25200, 19000], [21300, 30600, 32400, 34200, 25500], [19900, 28800, 30200, 31600, 23100]]
We can adapt this function to be able to operate on colour images by simply applying the filter on each of the three intensity channels separately as follows:
```python
def conv_pad_colour(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])
    res = [[[0, 0, 0] for _ in range(img_w)] for _ in range(img_h)]

    for img_row in range(img_h):
        for img_col in range(img_w):
            #Apply the filter to each of the three channels independently.
            for channel in range(3):
                new_pix = 0
                for wgt_row in range(wgt_h):
                    for wgt_col in range(wgt_w):
                        y = img_row - (wgt_row - wgt_h//2)
                        x = img_col - (wgt_col - wgt_w//2)
                        if 0 <= y < img_h and 0 <= x < img_w:
                            pix = img[y][x][channel]
                        else:
                            pix = 0 #Zero padding.
                        new_pix += wgt[wgt_row][wgt_col]*pix
                res[img_row][img_col][channel] = new_pix
    return res

print(conv_pad_colour(
    [
        [[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9], [10, 11, 12], [13, 14, 15]],
        [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],
        [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],
        [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],
        [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]],
    ],
    [
        [100, 200, 100],
        [300, 400, 300],
        [100, 200, 100],
    ],
))
```
[[[6700, 7700, 8700], [11600, 13000, 14400], [15800, 17200, 18600], [20000, 21400, 22800], [16300, 17300, 18300]], [[22300, 23600, 24900], [34200, 36000, 37800], [39600, 41400, 43200], [45000, 46800, 48600], [34900, 36200, 37500]], [[41800, 43100, 44400], [61200, 63000, 64800], [66600, 68400, 70200], [72000, 73800, 75600], [54400, 55700, 57000]], [[61300, 62600, 63900], [88200, 90000, 91800], [93600, 95400, 97200], [99000, 100800, 102600], [73900, 75200, 76500]], [[57700, 58700, 59700], [83600, 85000, 86400], [87800, 89200, 90600], [92000, 93400, 94800], [67300, 68300, 69300]]]

In practice we can use scipy's convolve function in order to apply a convolution:
```python
import scipy.ndimage
import numpy as np

img = np.array([
    [[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9], [10, 11, 12], [13, 14, 15]],
    [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],
    [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],
    [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],
    [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]],
])
weights = np.array([
    [100, 200, 100],
    [300, 400, 300],
    [100, 200, 100],
])

res = np.stack([
    #Convolve each channel separately as a greyscale image and then combine them back into a single image.
    scipy.ndimage.convolve(img[:, :, channel], weights, mode='constant')
    for channel in range(3)
], axis=2)

print(res)
```
The Gaussian blur
A blur is made by replacing each pixel value with the average of its nearby pixel values. This softens edges and makes the image look out of focus. Averaging nearby pixels can be easily achieved by convolving with a filter in which every weight equals $\frac{1}{n}$, where $n$ is the number of weights in the filter. This makes each new pixel equal to
$$p_1 \times \tfrac{1}{n} + p_2 \times \tfrac{1}{n} + \dots + p_n \times \tfrac{1}{n} = \tfrac{1}{n}\left(p_1 + p_2 + \dots + p_n\right)$$
where $p_i$ is a nearby pixel value.
```python
import scipy.ndimage
import skimage.io
import numpy as np

img = skimage.io.imread('image.png')

#An 11x11 averaging filter: every weight equals 1/121.
filter = np.ones([11, 11])
filter = filter/filter.size

res = np.stack([
    #Convolve each channel separately as a greyscale image and then combine them back into a single image.
    scipy.ndimage.convolve(img[:, :, channel], filter, mode='constant')
    for channel in range(3)
], axis=2)

skimage.io.imshow(res)
skimage.io.show()
```
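Incidentally, this averaging filter is also available directly as `scipy.ndimage.uniform_filter`, which should agree with convolving by a constant filter. A quick sketch on a small synthetic greyscale image (a 3x3 filter is used here only to fit the small image; the numbers themselves are arbitrary):

```python
import numpy as np
import scipy.ndimage

#A small synthetic greyscale image (floats so that the averages are not truncated).
img = np.arange(25, dtype=np.float64).reshape(5, 5)

#A 3x3 averaging filter, built the same way as the 11x11 one above.
filter = np.ones([3, 3])
filter = filter/filter.size

a = scipy.ndimage.convolve(img, filter, mode='constant')
b = scipy.ndimage.uniform_filter(img, size=3, mode='constant')

print(np.allclose(a, b))  #True
```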
We can instead weight each nearby pixel by a Gaussian function of its distance from the pixel being replaced, so that closer pixels contribute more to the average:

```python
import scipy.ndimage
import skimage.io
import numpy as np

img = skimage.io.imread('image.png')

sigma = 4
G = lambda x: 1/np.sqrt(2*np.pi*sigma**2) * np.exp(-x**2/(2*sigma**2))
G2 = lambda x, y: G(x)*G(y)

#Sample the Gaussian up to 3 sigmas away from the centre.
limit = np.ceil(3*sigma).astype(np.int32)
filter = G2(*np.meshgrid(np.arange(-limit, limit + 1), np.arange(-limit, limit + 1)))
filter = filter/np.sum(filter) #Normalise the weights so that they sum to 1.

res = np.stack([
    scipy.ndimage.convolve(img[:, :, channel], filter, mode='constant')
    for channel in range(3)
], axis=2)

skimage.io.imshow(res)
skimage.io.show()
```
In practice we can use skimage's gaussian function to apply a Gaussian blur directly:

```python
import skimage.io
import skimage.filters

img = skimage.io.imread('image.png')

sigma = 4
#channel_axis=-1 tells skimage to filter each colour channel separately.
res = skimage.filters.gaussian(img, sigma, mode='constant', truncate=3, channel_axis=-1) #Truncate to 3 sigmas.

skimage.io.imshow(res)
skimage.io.show()
```
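One useful property of the Gaussian filter is that it is separable: since the 2D weights are G(x)×G(y), convolving with the 2D filter is equivalent to convolving every column with a 1D Gaussian and then every row, which takes far fewer multiplications. A sketch of this equivalence (the image here is just random noise, only to check that the numbers agree):

```python
import numpy as np
import scipy.ndimage

sigma = 4

#1D Gaussian sampled up to 3 sigmas from the centre, normalised to sum to 1.
G = lambda x: 1/np.sqrt(2*np.pi*sigma**2) * np.exp(-x**2/(2*sigma**2))
limit = int(np.ceil(3*sigma))
g1 = G(np.arange(-limit, limit + 1))
g1 = g1/g1.sum()

#The 2D filter is the outer product of the 1D filter with itself, exactly as G2(x, y) = G(x)G(y).
g2 = np.outer(g1, g1)

#Random noise stands in for an image.
img = np.random.default_rng(0).random((32, 32))

#One 2D convolution...
res_2d = scipy.ndimage.convolve(img, g2, mode='constant')

#...versus one 1D convolution down the columns followed by one along the rows.
res_sep = scipy.ndimage.convolve1d(img, g1, axis=0, mode='constant')
res_sep = scipy.ndimage.convolve1d(res_sep, g1, axis=1, mode='constant')

print(np.allclose(res_2d, res_sep))  #True
```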