Thursday, April 30, 2020

Rolling neighbourhood histograms: Computing histograms of sliding windows quickly

In an earlier blog post, I described local binary patterns, which are vectors that describe the texture of image regions. In segmentation tasks where you need to group pixels together by texture, you need to compute the local binary pattern of the neighbourhood of pixels around each pixel. This means that to find the texture descriptor of a particular pixel 'P', you first extract a square of pixels centred on 'P', called the neighbourhood of 'P', compute the local binary codes of all the pixels in the neighbourhood, and then compute the histogram of those codes; that histogram is the texture descriptor of pixel 'P'. This is illustrated in the figure below.



Now in order to compute the texture descriptor of every pixel in the image you would need to repeat this process of extracting neighbourhoods and computing histograms for every pixel. But this is inefficient because it ends up recomputing the same values multiple times. First of all, the local binary codes do not need to be computed for each neighbourhood separately; they can be computed once for the whole image, and the neighbourhoods can then be extracted from the resulting local binary code image instead of from the original image. We can also make the histogram computations faster, because adjacent pixels share a lot of pixels in their neighbourhoods, as shown in the figure below.



We can take advantage of this commonality by computing the histogram of the new neighbourhood based on the histogram of the previous neighbourhood. Assuming the histograms are simple frequency histograms, the new histogram is the previous histogram plus the frequencies of the values in the blue vertical column (see the last row in the above figure) minus the frequencies of the values in the green vertical column. This means that instead of having to compute the frequencies of the whole neighbourhood, you only need to compute the frequencies of two columns and then add them to or subtract them from the previous histogram. This reduces the time complexity from quadratic to linear with respect to the side of the neighbourhood square. Assuming that a neighbourhood is described by the (x,y) coordinate of the central pixel and the (w,h) width and height of the neighbourhood, and that histograms can be added together by vector addition, here is the relationship of the new histogram to the previous one:

histogram(x,y,w,h) = histogram(x-1,y,w,h) + histogram(x+w/2,y,1,h) - histogram(x-1-w/2,y,1,h)

This lets us compute the histogram of each neighbourhood by iteratively updating the previous histogram in the same row, so that only the first neighbourhood in the row needs to be computed in full. But we can also avoid computing the full histogram of the first neighbourhood of every row after the first by using the histogram of the neighbourhood directly above it in the previous row, like this:

histogram(x,y,w,h) = histogram(x,y-1,w,h) + histogram(x,y+h/2,w,1) - histogram(x,y-1-h/2,w,1)

This is similar to the way rolling hashes are computed, which is where the name rolling histogram comes from.
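
To make this concrete, here is a minimal Python (numpy) sketch of the rolling update described above. The function name, the num_codes parameter, and the assumptions that the codes are non-negative integers and that the neighbourhood sides are odd are all mine:

import numpy as np

def rolling_histograms(codes, w, h, num_codes):
    #codes: 2D array of non-negative integer codes; w, h: odd neighbourhood width and height.
    #Returns a (rows-h+1, cols-w+1, num_codes) array with the histogram of every neighbourhood.
    codes = np.asarray(codes).astype(np.int64) #np.bincount needs non-negative integers.
    rows, cols = codes.shape
    half_w, half_h = w//2, h//2
    result = np.zeros((rows - 2*half_h, cols - 2*half_w, num_codes), dtype=np.int64)
    for y in range(half_h, rows - half_h):
        for x in range(half_w, cols - half_w):
            if x == half_w and y == half_h:
                #Very first neighbourhood: compute the full histogram from scratch.
                hist = np.bincount(codes[y-half_h:y+half_h+1, x-half_w:x+half_w+1].ravel(), minlength=num_codes)
            elif x == half_w:
                #First neighbourhood of a row: roll down from the neighbourhood above it.
                hist = result[y-1-half_h, 0].copy()
                hist += np.bincount(codes[y+half_h, x-half_w:x+half_w+1], minlength=num_codes)
                hist -= np.bincount(codes[y-1-half_h, x-half_w:x+half_w+1], minlength=num_codes)
            else:
                #Roll right from the previous neighbourhood in the same row.
                hist = result[y-half_h, x-1-half_w].copy()
                hist += np.bincount(codes[y-half_h:y+half_h+1, x+half_w], minlength=num_codes)
                hist -= np.bincount(codes[y-half_h:y+half_h+1, x-1-half_w], minlength=num_codes)
            result[y-half_h, x-half_w] = hist
    return result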

Tuesday, March 31, 2020

A differentiable version of Conway's Game of Life

Conway's Game of Life consists of a grid with black and white squares which flip in colour over time. The rules for changing the colour of a square are as follows:
  • If the current colour of a square is white then if it is surrounded by 2 or 3 adjacent white squares it stays white, otherwise it becomes black.
  • If the current colour of a square is black then if it is surrounded by exactly 3 white squares then it becomes white, otherwise it stays black.

If we treat the grid as a numeric matrix where 0 is black and 1 is white, then a differentiable version of these rules can be made. The idea here is to allow not just black and white but also shades of grey, so that the values of the squares can change only slightly and hence allow the changes to be differentiable. In order to force the values to remain between 0 and 1 we will make use of the sigmoid (logistic) function, which is defined as $y = \frac{1}{1 + e^{-x}}$.

The key trick here is to come up with a differentiable function that maps the number of white squares around a square to 0 or 1. We need two such functions: one where $y$ is 1 when $x$ is 2 or 3 and 0 otherwise, and another where $y$ is 1 when $x$ is 3 and 0 otherwise. We can make such a function using two sigmoid functions subtracted from each other as follows:

$y = \text{sig}(w \times (x - a)) - \text{sig}(w \times (x - b))$

The plot of this function starts at 0, rises to 1 around $x = a$ (where $a$ is the halfway point as $y$ goes from 0 to 1), stays at 1 until $x = b$, at which point $y$ goes back down to 0 (again, $b$ is the halfway point as $y$ goes from 1 to 0). $w$ controls how steep the change from 0 to 1 and back is, with a large $w$ meaning very steep.

We now apply the above equation to a whole matrix of values, with $x$ being the number of adjacent white squares. Given that a square can also be grey, we instead take the sum of the adjacent values as the count of white squares. This results in a count which is not a whole number but which is usable in a differentiable function.

We also need to be able to choose between using the rule for white squares or the rule for black squares. To do that we can just take a weighted average as follows:

$y = x \times a + (1 - x) \times b$

where $x$ is a number between 0 and 1, $a$ is the value $y$ should be when $x$ is 1, and $b$ is the value $y$ should be when $x$ is 0.

Given a matrix $G$, you can get the next iteration $G'$ using the following function:

Let $k$ be a 3 by 3 matrix consisting of all 1s and a single 0 in the center.
$N = conv(G, k)$, that is, convolve the kernel $k$ over the matrix $G$ in order to get a new matrix $N$ giving the number of adjacent white squares around every element in the matrix.
$B = \text{sig}(w \times (N - 2.5)) - \text{sig}(w \times (N - 3.5))$ (the resultant matrix if all elements were black)
$W = \text{sig}(w \times (N - 1.5)) - \text{sig}(w \times (N - 3.5))$ (the resultant matrix if all elements were white)
$G' = G \times W + (1 - G) \times B$
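
Putting it all together, here is a small Python sketch of one iteration using numpy and scipy's convolve function. The function name, the glider example, and the choice of $w = 10$ are my own:

import numpy as np
import scipy.ndimage

def life_step(G, w=10):
    #G is a 2D matrix of values between 0 and 1 (0 = black, 1 = white).
    sig = lambda x: 1/(1 + np.exp(-x))
    k = np.array([[1,1,1],[1,0,1],[1,1,1]]) #All 1s with a single 0 in the center.
    N = scipy.ndimage.convolve(G, k, mode='constant') #Number of (possibly grey) white neighbours; squares outside the grid count as black.
    B = sig(w*(N - 2.5)) - sig(w*(N - 3.5)) #Close to 1 where N is about 3 (rule for black squares).
    W = sig(w*(N - 1.5)) - sig(w*(N - 3.5)) #Close to 1 where N is about 2 or 3 (rule for white squares).
    return G*W + (1 - G)*B

#A glider on a 6x6 grid.
G = np.zeros((6, 6))
G[1, 2] = G[2, 3] = G[3, 1] = G[3, 2] = G[3, 3] = 1.0
print(np.round(life_step(G)))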

Thursday, February 27, 2020

Proof that the sum of all natural numbers is equal to -1/12 (provided we make an assumption)

This is one of those weird proofs that turns out to have practical value in physics. If you add together all the numbers from 1 to infinity, the answer will be $-\frac{1}{12}$. Here's how we get this.

We start from the simpler question of what 1 - 1 + 1 - 1 + 1 - 1 + ... equals for an infinite number of terms. If you stop the series at a 1 then the answer is 1, but if you stop at a -1 then the answer is 0. Since the series never stops, we make the assumption that the answer is 0.5, the average of the two. This is like a light switch that is constantly being switched on and off: if done at a fast enough frequency, the light will seem to be halfway between on and off, which is 0.5.

$$\sum_{i=0}^{\infty}{(-1)^i} = \frac{1}{2}$$

If you accept the above assumption then the rest is rigorous.

We next need to find what 1 - 2 + 3 - 4 + 5 - ... is. This can be reduced to the above identity by applying the following trick:

S   = 1 - 2 + 3 - 4 + 5 - ...

S+S = 1 - 2 + 3 - 4 + 5 - ...
        + 1 - 2 + 3 - 4 + ...

    = 1 - 1 + 1 - 1 + 1 - ...

Therefore $2S = \frac{1}{2}$ which means that $S = \frac{1}{4}$.

$$\sum_{i=1}^{\infty}{(-1)^{i+1} i} = \frac{1}{4}$$

Finally we get to our original question:

S'   = 1 + 2 + 3 + 4 + 5 + ...

S'-S = 1 + 2 + 3 + 4 + 5 + 6 + ...
     - 1 + 2 - 3 + 4 - 5 + 6 - ...

     = 0 + 4 + 0 + 8 + 0 + 12 + ...

These are the even numbers multiplied by 2, that is, the multiples of 4.

S'-S = 4 + 8 + 12 + ...
     = 4(1 + 2 + 3 + ...)
     = 4S'

Now if S' - S = 4S' then

S' - S   = 4S'
S' - 4S' = S
-3S'     = S
S'       = -S/3
         = -(1/4)/3 = -1/12

$$\sum_{i=1}^{\infty}{i} = -\frac{1}{12}$$
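
As a quick numerical illustration of the first two identities (though not of the final divergent sum, which needs a stronger summation method than this), we can damp each term by a factor of $x^i$ and let $x$ approach 1, a technique known as Abel summation. The snippet below is my own addition:

import numpy as np

i = np.arange(0, 10000)
for x in [0.9, 0.99, 0.999]:
    grandi = np.sum((-1.0)**i * x**i)              #1 - 1 + 1 - 1 + ... approaches 1/2.
    alternating = np.sum((-1.0)**(i+1) * i * x**i) #1 - 2 + 3 - 4 + ... approaches 1/4.
    print(x, grandi, alternating)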

Sunday, January 26, 2020

The Whittaker-Shannon interpolation formula (inferring missing values from a sampled signal using sinc or Lanczos function)

In an earlier blog post we talked about Lagrange polynomial interpolation, which is when you need to find a polynomial that passes through a set of points. This time we will look at a similar problem encountered in signal processing.

In discrete signal processing, a continuous signal is represented by a discrete signal consisting of values taken at regular intervals in time, a process called sampling. Below we have an example of a continuous signal that is then sampled at intervals of 0.9 (arbitrarily chosen here).





Now it may be the case that you would like to reconstruct the original signal from the sampled signal. You might think that an infinite number of signals could fit the same samples, but because the samples are taken at regular intervals there is actually only one band-limited signal that fits them, provided that the sampling rate is greater than twice the highest frequency in the original signal (the Nyquist criterion). If this condition is met then we can reconstruct the original signal using the Whittaker-Shannon interpolation formula.

The process is similar to that of the Lagrange polynomial interpolation. In polynomial interpolation, the trick is to find a set of component polynomials each of which passes through one of the given set of points. In order to be able to combine them all together into a single polynomial that passes through all of the given set of points, the component polynomials also need to equal zero wherever there is a different point. This makes sure that adding the polynomials together will not cause an interference between them at the given set of points.

In Whittaker-Shannon interpolation, the components are sinc functions instead of polynomials. The sinc function is an oscillating function whose zero crossings occur at regularly spaced points, and it is defined as follows:

$$
\text{sinc}(x) = \begin{cases}
1 & \text{for } x = 0 \\
\frac{\sin(x)}{x} & \text{otherwise} \\
\end{cases}
$$



See how sinc is equal to 1 at x = 0 and equal to zero at regular intervals around it? This is how we'll take advantage of it as a component. The first thing we need to do is to synchronise the intervals at which it equals zero to the intervals of the sampled signal. If the sample interval, or period, is 0.9, then the synchronised sinc function would be:

$$
y = sinc\left(\frac{\pi x}{0.9}\right)
$$



Next we need to shift the center peak of the synchronised sinc over one of the samples. Let's pick sample number 3 where sample number 0 is the first sample:

$$
y = sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right)
$$



Finally we need to adjust the height of the peak to be equal to that sample's value. Since the peak has a value of one and all the other important points on the sinc function are equal to zero, all we have to do is multiply the sinc function by the sample value:

$$
y = -0.3303 \times sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right)
$$



Great! Now we just need to do the same for all the other samples and add up all the sinc functions into a single function:

$$
y =
0 \times sinc\left(\frac{\pi (x - 0 \times 0.9)}{0.9}\right) +
0.7628 \times sinc\left(\frac{\pi (x - 1 \times 0.9)}{0.9}\right) +
-0.4309 \times sinc\left(\frac{\pi (x - 2 \times 0.9)}{0.9}\right) +
-0.3303 \times sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right) +
\dots
$$



Note that the point made at the top about there being only one signal that passes through the given samples only applies when there is an infinite number of such samples.



Now, the sinc function is only useful when you want to make use of all the points together in order to infer any missing point. In practice, this is very computationally expensive and it would be better to use only the points that are close to the missing point rather than all the points. To do this, instead of the sinc function we use a truncated sinc function called a Lanczos filter:

$$
\text{lanczos}(x, a) = \begin{cases}
sinc(\pi x) \times sinc(\frac{\pi x}{a}) & -a < x < a \\
0 & \text{otherwise} \\
\end{cases}
$$

where $a$ is some positive whole number, such as 2, and is the number of points to use to the left and right of the point being inferred.



Now we can do the same thing as before but with Lanczos instead of sinc, taking care that the sinc within the lanczos function already multiplies $x$ by $\pi$:

$$
y =
0 \times \text{lanczos}\left(\frac{x - 0 \times 0.9}{0.9}, 2\right) +
0.7628 \times \text{lanczos}\left(\frac{x - 1 \times 0.9}{0.9}, 2\right) +
-0.4309 \times \text{lanczos}\left(\frac{x - 2 \times 0.9}{0.9}, 2\right) +
-0.3303 \times \text{lanczos}\left(\frac{x - 3 \times 0.9}{0.9}, 2\right) +
\dots
$$



The Lanczos filter can be used in things like image enlargement. By treating each pixel in an image as a sample from a continuous field, the values in between existing pixels can be deduced by reconstructing the continuous field using Lanczos interpolation and then reading off the values at the in-between positions.

In general, for samples $s$ and a sample period of $T$,

$$y = \sum_i s[i] \times sinc\left(\frac{\pi (x - i \times T)}{T}\right)$$
$$y = \sum_i s[i] \times lanczos\left(\frac{x - i \times T}{T}, a\right)$$
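
Here is a small Python (numpy) sketch of both formulas. The sample values and the period of 0.9 follow the example above; the function names and everything else are my own:

import numpy as np

def sinc(x):
    #Unnormalised sinc: sin(x)/x, with sinc(0) = 1.
    return np.where(x == 0, 1.0, np.sin(x)/np.where(x == 0, 1.0, x))

def lanczos(x, a):
    #Truncated sinc: non-zero only for -a < x < a.
    return np.where(np.abs(x) < a, sinc(np.pi*x)*sinc(np.pi*x/a), 0.0)

def whittaker_shannon(samples, T, x):
    return sum(s*sinc(np.pi*(x - i*T)/T) for i, s in enumerate(samples))

def lanczos_interp(samples, T, x, a=2):
    return sum(s*lanczos((x - i*T)/T, a) for i, s in enumerate(samples))

samples = [0.0, 0.7628, -0.4309, -0.3303] #The sample values used in the example above.
T = 0.9

x = np.array([0.45]) #Halfway between the first two samples.
print(whittaker_shannon(samples, T, x))
print(lanczos_interp(samples, T, x))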

Monday, December 30, 2019

Implementing a Gaussian blur filter together with convolution operation from scratch

Gaussian blurring is a very common filter used in image processing which is useful for many things such as removing salt and pepper noise from images, resizing images to be smaller (downsampling), and simulating out-of-focus effects. Let's look at how to implement one ourselves.

Let's use the following grey-scale image as an example:


The image consists of many numbers that are arranged in a grid, each number called a pixel. The pixel values range from 0 to 255 where 0 is represented with a black colour and 255 with a white colour and all the other numbers are shades of grey in between. Here are the values of the 5x5 square of pixels in the top-left corner of the image:
[[156, 157, 160, 159, 158],
 [156, 157, 159, 158, 158],
 [158, 157, 156, 156, 157],
 [160, 157, 154, 154, 156],
 [158, 157, 156, 156, 157]]

In the case of a colour image, each pixel contains 3 values: the intensity of red, the intensity of blue, and the intensity of green. When these three primary colours are combined in different intensities they can produce any colour.

Let's use the following colour image as an example:


The values of the 5x5 square of pixels in the top-left corner of the image are:
[[[21, 13,  8], [21, 13,  9], [20, 11,  8], [21, 13, 11], [21, 14,  8]],
 [[21, 13,  7], [21, 13,  9], [20, 14,  7], [22, 14,  8], [20, 13,  7]],
 [[21, 14,  7], [23, 13, 10], [20, 14,  9], [21, 13,  9], [21, 15,  9]],
 [[21, 13,  6], [21, 13,  9], [21, 13,  8], [21, 13, 10], [20, 12,  5]],
 [[21, 13,  5], [21, 13,  7], [21, 13,  4], [21, 13,  7], [21, 13,  8]]]

The convolution operation

One of the basic operations in image processing is the convolution. This is when you replace every pixel value $p$ in an image with a weighted sum of the pixel values near $p$, that is, multiply each nearby pixel value with a different number before adding them all up and replacing $p$ with the result. The simplest example of a convolution is when 'nearby' means the pixel itself only. For example, if we convolve a 5x5 pixel greyscale image with a 1x1 weighting, this is what we get:

$$
\left[\begin{array}{ccccc}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15\\
16 & 17 & 18 & 19 & 20\\
21 & 22 & 23 & 24 & 25\\
\end{array}\right]
\circledast
\left[\begin{array}{c}
100
\end{array}\right]
=
\left[\begin{array}{ccccc}
1 \times 100 & 2 \times 100 & 3 \times 100 & 4 \times 100 & 5 \times 100\\
6 \times 100 & 7 \times 100 & 8 \times 100 & 9 \times 100 & 10 \times 100\\
11 \times 100 & 12 \times 100 & 13 \times 100 & 14 \times 100 & 15 \times 100\\
16 \times 100 & 17 \times 100 & 18 \times 100 & 19 \times 100 & 20 \times 100\\
21 \times 100 & 22 \times 100 & 23 \times 100 & 24 \times 100 & 25 \times 100\\
\end{array}\right]
=
\left[\begin{array}{ccccc}
100 & 200 & 300 & 400 & 500\\
600 & 700 & 800 & 900 & 1000\\
1100 & 1200 & 1300 & 1400 & 1500\\
1600 & 1700 & 1800 & 1900 & 2000\\
2100 & 2200 & 2300 & 2400 & 2500\\
\end{array}\right]
$$

Here we have replaced the pixel value 1 with 100, 2 with 200, etc. Note that in practice the numbers would be clipped to not exceed 255.

Only the pixel being replaced contributed to its new value. We can extend the range of the weighting by using a 3x3 weight instead of a single number which will combine all the pixels adjacent to the pixel being replaced.

$$
\left[\begin{array}{ccccc}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15\\
16 & 17 & 18 & 19 & 20\\
21 & 22 & 23 & 24 & 25\\
\end{array}\right]
\circledast
\left[\begin{array}{ccc}
100 & 200 & 100\\
300 & 400 & 300\\
100 & 200 & 100\\
\end{array}\right]
=
\left[\begin{array}{ccccc}

\left(\begin{array}{cccccc}
& 1 \times 100 & + & 2 \times 200 & + & 3 \times 100\\
+ & 6 \times 300 & + & 7 \times 400 & + & 8 \times 300\\
+ & 11 \times 100 & + & 12 \times 200 & + & 13 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 2 \times 100 & + & 3 \times 200 & + & 4 \times 100\\
+ & 7 \times 300 & + & 8 \times 400 & + & 9 \times 300\\
+ & 12 \times 100 & + & 13 \times 200 & + & 14 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 3 \times 100 & + & 4 \times 200 & + & 5 \times 100\\
+ & 8 \times 300 & + & 9 \times 400 & + & 10 \times 300\\
+ & 13 \times 100 & + & 14 \times 200 & + & 15 \times 100\\
\end{array}\right)
\\

\left(\begin{array}{cccccc}
& 6 \times 100 & + & 7 \times 200 & + & 8 \times 100\\
+ & 11 \times 300 & + & 12 \times 400 & + & 13 \times 300\\
+ & 16 \times 100 & + & 17 \times 200 & + & 18 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 7 \times 100 & + & 8 \times 200 & + & 9 \times 100\\
+ & 12 \times 300 & + & 13 \times 400 & + & 14 \times 300\\
+ & 17 \times 100 & + & 18 \times 200 & + & 19 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 8 \times 100 & + & 9 \times 200 & + & 10 \times 100\\
+ & 13 \times 300 & + & 14 \times 400 & + & 15 \times 300\\
+ & 18 \times 100 & + & 19 \times 200 & + & 20 \times 100\\
\end{array}\right)
\\

\left(\begin{array}{cccccc}
& 11 \times 100 & + & 12 \times 200 & + & 13 \times 100\\
+ & 16 \times 300 & + & 17 \times 400 & + & 18 \times 300\\
+ & 21 \times 100 & + & 22 \times 200 & + & 23 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 12 \times 100 & + & 13 \times 200 & + & 14 \times 100\\
+ & 17 \times 300 & + & 18 \times 400 & + & 19 \times 300\\
+ & 22 \times 100 & + & 23 \times 200 & + & 24 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 13 \times 100 & + & 14 \times 200 & + & 15 \times 100\\
+ & 18 \times 300 & + & 19 \times 400 & + & 20 \times 300\\
+ & 23 \times 100 & + & 24 \times 200 & + & 25 \times 100\\
\end{array}\right)
\\

\end{array}\right]
=
\left[\begin{array}{ccc}
12600 & 14400 & 16200\\
21600 & 23400 & 25200\\
30600 & 32400 & 34200\\
\end{array}\right]
$$

Note how the new image becomes smaller than the original image. This is in order to avoid using pixel values that lie outside of the image edges when computing new pixel values. One way to use pixel values outside of the image is to treat any pixel outside of the image as having a value of zero (zero padding). We can also reflect the image outside of its edges or wrap around to the other side of the image.

We can implement the above convolution operation in Python naively using for loops as follows:

def conv(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res_h = img_h - (wgt_h//2)*2
    res_w = img_w - (wgt_w//2)*2
    res = [ [ 0 for _ in range(res_w) ] for _ in range(res_h) ]

    for img_row in range(wgt_h//2, img_h-wgt_h//2):
        for img_col in range(wgt_w//2, img_w-wgt_w//2):
            new_pix = 0
            for wgt_row in range(wgt_h):
                for wgt_col in range(wgt_w):
                    #Technically, a convolution operation needs to flip the weights horizontally and vertically (a no-op for symmetric weights).
                    new_pix += wgt[wgt_h-1-wgt_row][wgt_w-1-wgt_col]*img[img_row+wgt_row-wgt_h//2][img_col+wgt_col-wgt_w//2]
            res[img_row - wgt_h//2][img_col - wgt_w//2] = new_pix

    return res

print(conv([
            [ 1, 2, 3, 4, 5],
            [ 6, 7, 8, 9,10],
            [11,12,13,14,15],
            [16,17,18,19,20],
            [21,22,23,24,25]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))
[[12600, 14400, 16200], [21600, 23400, 25200], [30600, 32400, 34200]]

I say naively because it is possible to perform a convolution using fewer loops and with parallel processing but that would be beyond the scope of this blog post. Note that in the code above we are flipping the weights before multiplying them because that is what a convolution involves, but since we'll be working with symmetric weights then it doesn't really matter.

If we want to use zero padding in our convolution then we would do it as follows:

def conv_pad(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res = [ [ 0 for _ in range(img_w) ] for _ in range(img_h) ]

    for img_row in range(img_h):
        for img_col in range(img_w):
            new_pix = 0
            for wgt_row in range(wgt_h):
                for wgt_col in range(wgt_w):
                    y = img_row+wgt_row-wgt_h//2
                    x = img_col+wgt_col-wgt_w//2
                    if 0 <= y < img_h and 0 <= x < img_w:
                        pix = img[y][x]
                    else:
                        pix = 0 #Zero padding: pixels outside the image are treated as 0.
                    new_pix += wgt[wgt_h-1-wgt_row][wgt_w-1-wgt_col]*pix
            res[img_row][img_col] = new_pix

    return res

print(conv_pad([
            [ 1, 2, 3, 4, 5],
            [ 6, 7, 8, 9,10],
            [11,12,13,14,15],
            [16,17,18,19,20],
            [21,22,23,24,25]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))
[[2900, 4800, 6200, 7600, 6100], [8300, 12600, 14400, 16200, 12500], [14800, 21600, 23400, 25200, 19000], [21300, 30600, 32400, 34200, 25500], [19900, 28800, 30200, 31600, 23100]]

We can adapt this function to be able to operate on colour images by simply applying the filter on each of the three intensity channels separately as follows:

def conv_pad_colour(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res = [ [ [ 0, 0, 0 ] for _ in range(img_w) ] for _ in range(img_h) ]

    for img_row in range(img_h):
        for img_col in range(img_w):
            for channel in range(3):
                new_pix = 0
                for wgt_row in range(wgt_h):
                    for wgt_col in range(wgt_w):
                        y = img_row+wgt_row-wgt_h//2
                        x = img_col+wgt_col-wgt_w//2
                        if 0 <= y < img_h and 0 <= x < img_w:
                            pix = img[y][x][channel]
                        else:
                            pix = 0
                        new_pix += wgt[wgt_h-1-wgt_row][wgt_w-1-wgt_col]*pix
                res[img_row][img_col][channel] = new_pix

    return res

print(conv_pad_colour([
            [[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9], [10, 11, 12], [13, 14, 15]],
            [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],
            [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],
            [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],
            [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))
[[[6700, 7700, 8700], [11600, 13000, 14400], [15800, 17200, 18600], [20000, 21400, 22800], [16300, 17300, 18300]], [[22300, 23600, 24900], [34200, 36000, 37800], [39600, 41400, 43200], [45000, 46800, 48600], [34900, 36200, 37500]], [[41800, 43100, 44400], [61200, 63000, 64800], [66600, 68400, 70200], [72000, 73800, 75600], [54400, 55700, 57000]], [[61300, 62600, 63900], [88200, 90000, 91800], [93600, 95400, 97200], [99000, 100800, 102600], [73900, 75200, 76500]], [[57700, 58700, 59700], [83600, 85000, 86400], [87800, 89200, 90600], [92000, 93400, 94800], [67300, 68300, 69300]]]

In practice we can use scipy's convolve function in order to apply a convolution:

import scipy.ndimage
import numpy as np
img = np.array([
            [[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9], [10, 11, 12], [13, 14, 15]],
            [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],
            [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],
            [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],
            [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]]
        ])
weights = np.array([
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ])
res = np.stack([ #Convolve each channel separately as a greyscale image and then combine them back into a single image.
        scipy.ndimage.convolve(img[:,:,channel], weights, mode='constant')
        for channel in range(3)
    ], axis=2)
print(res)

In signal processing, the weights are actually called a filter.
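
The mode parameter of scipy.ndimage.convolve corresponds to the edge handling strategies mentioned earlier (zero padding, reflecting the image at its edges, and wrapping around to the other side). A small sketch of my own:

import numpy as np
import scipy.ndimage

img = np.array([[1,2,3],[4,5,6],[7,8,9]])
weights = np.ones((3,3))

print(scipy.ndimage.convolve(img, weights, mode='constant', cval=0)) #Zero padding.
print(scipy.ndimage.convolve(img, weights, mode='reflect'))          #Reflect the image at its edges.
print(scipy.ndimage.convolve(img, weights, mode='wrap'))             #Wrap around to the other side.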

The gaussian blur

A blur is made by replacing each pixel value with the average value of its nearby pixels. This softens edges and makes the image look out of focus. Averaging nearby pixels can easily be achieved by convolving with a filter whose every value is $\frac{1}{n}$, where $n$ is the number of values in the filter. This makes each new pixel equal to $$p_1 \times \frac{1}{n} + p_2 \times \frac{1}{n} + \dots + p_n \times \frac{1}{n} = \frac{1}{n} \left(p_1 + p_2 + \dots + p_n\right)$$ where $p_i$ is a nearby pixel value.

import scipy.ndimage
import skimage.io
import numpy as np

img = skimage.io.imread('image.png')
filter = np.ones([11, 11])
filter = filter/filter.size

res = np.stack([ #Convolve each channel separately as a greyscale image and then combine them back into a single image.
        scipy.ndimage.convolve(img[:,:,channel], filter, mode='constant')
        for channel in range(3)
    ], axis=2)

skimage.io.imshow(res)
skimage.io.show()

This is actually called a uniform blur, which is not considered an ideal blur because it gives pixels that are far from the pixel being replaced the same importance as pixels that are close to it. A better blur is a gaussian blur. In a gaussian blur, the filter values follow a gaussian bell curve with a maximum in the center and a steep decline just around the center. The gaussian function is defined as follows:

$$G(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{x^2}{2\sigma^2}}$$

where $\sigma$ is a parameter for controlling how wide the bell curve is and therefore how much influence the nearby pixels have on the value of the pixel being replaced (the greater the $\sigma$, the more blurry). In two dimensions all you do is take the gaussian of each variable and multiply them together:

$$G(x,y) = G(x) \times G(y)$$

The gaussian blur works by using an infinite 2D gaussian curve as the filter of a convolution, with the center of the filter at $G(0,0)$ and each cell in the filter being a distance of one in the gaussian function's domain, as follows:

$$
\newcommand\iddots{\mathinner{ \kern1mu\raise1pt{.} \kern2mu\raise4pt{.} \kern2mu\raise7pt{\Rule{0pt}{7pt}{0pt}.} \kern1mu }}
\begin{array}{ccccc}
\ddots & & \vdots & & \iddots\\
& G(-1, 1) & G( 0, 1) & G( 1, 1) & \\
\cdots & G(-1, 0) & G( 0, 0) & G( 1, 0) & \cdots\\
& G(-1,-1) & G( 0,-1) & G( 1,-1) & \\
\iddots & & \vdots & & \ddots\\
\end{array}
$$

In order to be a perfect gaussian blur, the filter would need to be infinitely large, which is not possible. Fortunately, $G(3\sigma)$ is small enough to be rounded down to zero, which means that our filter only needs a side of $\lceil 6\sigma + 1 \rceil$ at most, as anything larger than that will not make a difference ($\lceil x \rceil$ means $x$ rounded up). The problem is that, even though the missing values are negligible, the values in the truncated filter will not sum to exactly 1 (especially for a small $\sigma$), which would make the resulting image either brighter or darker. To get around this we cheat a bit and divide each value in the filter by the filter's total, which results in a gaussian blur with a neutral effect on the brightness of the image.

$$
\begin{array}{ccccccc}
\frac{G(-\lceil 3\sigma \rceil, \lceil 3\sigma \rceil)}{T} & & & \frac{G(0, \lceil 3\sigma \rceil)}{T} & & & \frac{G(\lceil 3\sigma \rceil, \lceil 3\sigma \rceil)}{T}\\
& \ddots & & \vdots & & \iddots & \\
& & \frac{G(-1, 1)}{T} & \frac{G( 0, 1)}{T} & \frac{G( 1, 1)}{T} & & \\
\frac{G(-\lceil 3\sigma \rceil, 0)}{T} & \cdots & \frac{G(-1, 0)}{T} & \frac{G( 0, 0)}{T} & \frac{G( 1, 0)}{T} & \cdots & \frac{G(\lceil 3\sigma \rceil, 0)}{T}\\
& & \frac{G(-1,-1)}{T} & \frac{G( 0,-1)}{T} & \frac{G( 1,-1)}{T} & & \\
& \iddots & & \vdots & & \ddots & \\
\frac{G(-\lceil 3\sigma \rceil, -\lceil 3\sigma \rceil)}{T} & & & \frac{G(0, -\lceil 3\sigma \rceil)}{T} & & & \frac{G(\lceil 3\sigma \rceil, -\lceil 3\sigma \rceil)}{T}\\
\end{array}
$$

where $T$ is the total of all the $G(x,y)$ in the filter.

import scipy.ndimage
import skimage.io
import numpy as np

img = skimage.io.imread('image.png')

sigma = 4
G = lambda x: 1/np.sqrt(2*np.pi*sigma**2)*np.exp(-x**2/(2*sigma**2))
G2 = lambda x,y: G(x)*G(y)
limit = np.ceil(3*sigma).astype(np.int32)
filter = G2(*np.meshgrid(np.arange(-limit, limit+1), np.arange(-limit, limit+1)))
filter = filter/np.sum(filter)

res = np.stack([
        scipy.ndimage.convolve(img[:,:,channel], filter, mode='constant')
        for channel in range(3)
    ], axis=2)
skimage.io.imshow(res)
skimage.io.show()

Of course skimage has a gaussian blur filter out of the box for us to use:
import skimage.filters
import skimage.io

img = skimage.io.imread('image.png')

sigma = 4
res = skimage.filters.gaussian(img, sigma, mode='constant', truncate=3) #Truncate to 3 sigmas.
skimage.io.imshow(res)
skimage.io.show()

Saturday, November 30, 2019

Texture Descriptors of Image Regions using Local Binary Patterns

A very simple way to measure the similarity between the texture of regions in images is to capture the region's texture information in a vector using local binary patterns. These vectors are invariant to intensity changes, fast to compute, and very easy to understand. Let's see how this works. Note that since we're focusing on texture, we don't care about colour but about intensities, so the image needs to be greyscale.

The first step is to replace all the pixels in the image region with LBP (local binary pattern) codes. For every pixel, draw an imaginary circle of some radius centred on the pixel and draw some number of imaginary dots on that circle, equally spaced apart. The radius and the number of dots are hand tuned to maximise performance, but they are usually set to a radius of 1 pixel and 8 dots. Now pick the pixels that fall under those dots together with the pixel at the centre.



In the surrounding pixels, any pixel that has a lower intensity than the central pixel gets a zero and all the rest get a one. These numbers are put together in order to form a binary number called the LBP code of the central pixel.



Do this for every pixel in the region and then count how many times each code appears in the region, putting all the frequencies in a vector (a histogram). This is your texture descriptor and you can stop here. The more similar the vector is to other vectors, the more similar the texture of the image regions are.

We can do better than this however by making the codes uniform. If you do an analysis of these LBP code frequencies, you'll find that most of them have at most two bit changes in them. For example, 00011100 has two bit changes, one from a 0 to a 1 and another from a 1 to a 0. On the other hand 01010101 has 7 changes. The reason why the majority will have at most two bit changes is that, most of the time, the pixels around a central pixel do not have random intensities but change in a flow as shown in the figure above. If there is a gradient of intensity changes in a single direction then there will only be two bit changes. This means that we can replace all LBP codes with more than two bit changes with a single LBP code that just means 'other'. This is called a uniform LBP code and it reduces our vector size significantly without much of a performance hit, which is good.

We can do even better by making the LBP codes rotation invariant, that is, using the same LBP code regardless of which way the gradient direction is pointing. In the above figure, if the line of dark grey pixels were rotated, the LBP code would change to something like 00011110 or 11110000; the only thing that changes when the surrounding pixels rotate is the bit rotation of the LBP code. We can make our codes invariant by rotating the bits of each LBP code into a canonical form. Usually we treat the binary code as an integer and rotate the bits until we get the minimum integer. For example, here are all the bit rotations of the code 11110000 (four 1s and four 0s):

11110000 (240), 01111000 (120), 00111100 (60), 00011110 (30), 00001111 (15), 10000111 (135), 11000011 (195), 11100001 (225)

In this case we would pick 00001111 as our canonical representation as that is the smallest integer. This drastically reduces our vector sizes as the amount of different possible LBP codes to find the frequency of will be a small fraction of the original full set of LBP codes.
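
To make the idea concrete, here is a small Python sketch of my own for counting the bit changes in a code (counting the wrap-around from the last bit back to the first, as the uniform LBP definition does) and for finding the canonical rotation. In practice skimage does all of this for you, as shown next:

def bit_changes(code, bits=8):
    #Number of 0-to-1 and 1-to-0 transitions around the circular bit pattern.
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1) for i in range(bits))

def canonical_rotation(code, bits=8):
    #Smallest integer obtainable by circularly rotating the bits of the code.
    mask = (1 << bits) - 1
    return min(((code >> r) | (code << (bits - r))) & mask for r in range(bits))

print(bit_changes(0b00011100))        #2, so this is a uniform code.
print(canonical_rotation(0b11110000)) #15, that is, 00001111.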

You can compute LBP descriptors using Python's skimage and numpy as follows:
import numpy as np
import skimage.io
import skimage.feature

image = skimage.io.imread('img.png', as_gray=True)
region = image[0:100,0:100]
lbp_codes = skimage.feature.local_binary_pattern(region, P=8, R=1, method='uniform') #8 dots, radius 1, rotation invariant uniform codes.
descriptor = np.histogram(lbp_codes, bins=10, range=(0, 10))[0] #Number of bins and range changes depending on the number of dots and LBP method used.

The number of bins and range needed can be obtained by artificially generating all the possible arrangements of darker and lighter pixels around a central pixel:
def get_histogram_params(dots, method):
    #Note: the 3x3 region constructed below assumes 8 dots on a radius-1 circle.
    all_possible_lbp_codes = set()
    for i in range(2**8):
        str_bin = '{:0>8b}'.format(i)
        region = [ {'0':0,'1':2}[ch] for ch in str_bin ] #Surrounding pixels.
        region.insert(4, 1) #Central pixel.
        region = np.array(region).reshape((3,3))
        lbp_codes = skimage.feature.local_binary_pattern(region, dots, 1, method) #Radius is minimum to keep region size to a minimum.
        all_possible_lbp_codes.add(lbp_codes[1,1]) #Get code of central pixel and add it to the set (keeps only unique values).
    num_bins = len(all_possible_lbp_codes)
    rng = (min(all_possible_lbp_codes), max(all_possible_lbp_codes)+1)
    return (num_bins, rng)

You can get more information about the skimage function from here.

Wednesday, October 30, 2019

No square number can end in 2, 3, 7, or 8 (modulo of square numbers)

Look at the following list of square numbers:
 0^2:   0
 1^2:   1
 2^2:   4
 3^2:   9
 4^2:  16
 5^2:  25
 6^2:  36
 7^2:  49
 8^2:  64
 9^2:  81
10^2: 100
11^2: 121
12^2: 144
13^2: 169
14^2: 196
15^2: 225
16^2: 256
17^2: 289
18^2: 324
19^2: 361
20^2: 400
21^2: 441
22^2: 484
23^2: 529
24^2: 576
25^2: 625
26^2: 676
27^2: 729
28^2: 784
29^2: 841
30^2: 900

Notice how the last digit is always 0, 1, 4, 9, 6, or 5? There also seems to be a cyclic pattern of digits repeating themselves but we'll talk about that in another post. In this post, we'll just talk about why certain digits do not occur at the end of square numbers, which is useful in order to check if a number is a square or not.

We can get the last digit of a number by dividing it by 10 and keeping the remainder, also known as the number modulo 10:

 0^2 mod 10: 0
 1^2 mod 10: 1
 2^2 mod 10: 4
 3^2 mod 10: 9
 4^2 mod 10: 6
 5^2 mod 10: 5
 6^2 mod 10: 6
 7^2 mod 10: 9
 8^2 mod 10: 4
 9^2 mod 10: 1
10^2 mod 10: 0
11^2 mod 10: 1
12^2 mod 10: 4
13^2 mod 10: 9
14^2 mod 10: 6
15^2 mod 10: 5
16^2 mod 10: 6
17^2 mod 10: 9
18^2 mod 10: 4
19^2 mod 10: 1
20^2 mod 10: 0
21^2 mod 10: 1
22^2 mod 10: 4
23^2 mod 10: 9
24^2 mod 10: 6
25^2 mod 10: 5
26^2 mod 10: 6
27^2 mod 10: 9
28^2 mod 10: 4
29^2 mod 10: 1
30^2 mod 10: 0

It turns out that this sort of exclusion of certain modulo results from square numbers happens with many other numbers apart from 10. Take 12 for example:

 0^2 mod 12: 0
 1^2 mod 12: 1
 2^2 mod 12: 4
 3^2 mod 12: 9
 4^2 mod 12: 4
 5^2 mod 12: 1
 6^2 mod 12: 0
 7^2 mod 12: 1
 8^2 mod 12: 4
 9^2 mod 12: 9
10^2 mod 12: 4
11^2 mod 12: 1
12^2 mod 12: 0
13^2 mod 12: 1
14^2 mod 12: 4
15^2 mod 12: 9
16^2 mod 12: 4
17^2 mod 12: 1
18^2 mod 12: 0
19^2 mod 12: 1
20^2 mod 12: 4
21^2 mod 12: 9
22^2 mod 12: 4
23^2 mod 12: 1
24^2 mod 12: 0
25^2 mod 12: 1
26^2 mod 12: 4
27^2 mod 12: 9
28^2 mod 12: 4
29^2 mod 12: 1
30^2 mod 12: 0

With 12 you only get 0, 1, 4, and 9 as remainders, which is fewer than with 10. Why does this happen? We can prove that this is the case using modulo algebra. Let's call the number being squared $x$ and the number after the modulo $n$. Then by the distributive law of the modulo operation:

$x^2 \text{ mod } n = (x \times x) \text{ mod } n = ((x \text{ mod } n) \times (x \text{ mod } n)) \text{ mod } n = (x \text{ mod } n)^2 \text{ mod } n$

What the above derivation says is that when dividing a square number by $n$, the remainder depends only on the remainder of the square's root. In the case of the last digit of a square number, which is equivalent to the square number modulo 10, the last digit will be one of the last digits of the single digits squared (0, 1, 4, 9, 16, 25, 36, 49, 64, 81), that is, 0, 1, 4, 5, 6, or 9. This is because $x \text{ mod } 10$ gives you the last digit of $x$, which is a single digit; this digit is then squared and the last digit of that square is the answer.

In the general case, this puts a limit on the number of possible remainders (normally $x \text{ mod } n$ can give $n$ different answers over all $x$). The inner modulo first limits the possibilities to $n$ values, and then only the remainders of those $n$ values squared can be returned. This means that whenever two of those squares leave the same remainder, the total number of possible final remainders decreases.
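
We can check this exclusion for any modulus with a short Python snippet (my own) that collects the possible remainders of square numbers:

def square_remainders(n):
    #All possible values of x^2 mod n; by the derivation above, x only needs to range over 0 to n-1.
    return sorted({ (x*x) % n for x in range(n) })

print(square_remainders(10)) #[0, 1, 4, 5, 6, 9]
print(square_remainders(12)) #[0, 1, 4, 9]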

In a future post we will investigate why the remainders of square numbers are cyclic.