Geeky is Awesome

Triangles in a diagonally segmented trapezium/trapezoid

2021-08-31T11:05:00.002+02:00

When a trapezium (quadriletaral with two parallel sides) is segmented diagonally, you end up with 4 triangles as shown below:

These 4 triangles have interesting properties which I shall be proving here, namely that triangles P and Q are similar (same angles, different sizes) and triangles R and S have the same area.

Let's look at just triangles P and Q.

The angles with the same colour are equal for the following reasons: the green angles are opposite angles whilst the red and blue angles are alternative angles.

Now let's look at just triangles Q, R, and S.

In order to show that R and S have the same area, we'll first look at combined triangles Q and R and combined triangles Q and S.

These two combined triangles have the same base and height, which means that they have the same area ($A = \frac{1}{2}bh$). Now triangle Q in both figures is the same triangle, which means that it has the same area. Therefore, removing the area of triangle Q from both shapes will still result in an equal area ($R+Q = S+Q \implies R = S$). So therefore, triangles R and S have the same area.

The last digit of square numbers cycles/repeats (modulo of square numbers)

2021-07-29T10:37:00.005+02:00

In a previous blog post, I showed that the last digit of a square number could not be 2, 3, 7, or 8. I will now show that the last digit cycles through the same sequence of 10 digits: 0, 1, 4, 9, 6, 5, 6, 9, 4, 1; as shown in these first 31 square numbers:

 0^2:   0
 1^2:   1
 2^2:   4
 3^2:   9
 4^2:  16
 5^2:  25
 6^2:  36
 7^2:  49
 8^2:  64
 9^2:  81
10^2: 100
11^2: 121
12^2: 144
13^2: 169
14^2: 196
15^2: 225
16^2: 256
17^2: 289
18^2: 324
19^2: 361
20^2: 400
21^2: 441
22^2: 484
23^2: 529
24^2: 576
25^2: 625
26^2: 676
27^2: 729
28^2: 784
29^2: 841
30^2: 900

Last time I showed that the reason for only a subset of digits could be the last digit of square numbers was because of the following module equivalence:

$x^2 \text{ mod } 10 = (x \times x) \text{ mod } 10 = ((x \text{ mod } 10) \times (x \text{ mod } 10)) \text{ mod } 10 = (x \text{ mod } 10)^2 \text{ mod } 10$

which says that the last digit of a square number will always be the last digit of the square of a single digit, which limits the possible last digits to the last digit of 0^2, 1^2, 2^2, 3^2, 4^2, 5^2, 6^2, 7^2, 8^2, and 9^2, that is, 0, 1, 4, 9, 6, or 5. From this we can easily see why there is always a cycle among the same 10 digit sequence.

Given that $x^2 \text{ mod } 10 = (x \text{ mod } 10)^2 \text{ mod } 10$, we see that the $(x \text{ mod } 10)$ part is cyclic. This is because it gives the last digit of the number $x$ which cycles through 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Therefore the last digit of the square of these digits must also cycle. Note that this holds for any $n$ in $x^2 \text{ mod } n$ because, given that $x^2 \text{ mod } n = (x \text{ mod } n)^2 \text{ mod } n$, the $(x \text{ mod } n)$ part is cyclic every $n$ numbers.

You'll also notice that the cycle of digits is symmetric (0, 1, 4, 9, 6, 5, 6, 9, 4, 1), that is, goes back through the sequence in reverse order after 5. In a future blog post we'll see why this is.

SQL Group By with Limit in MySQL: Finding the top n values per group

2021-06-28T18:24:00.001+02:00

Say you have the following database table called "tbl_test" which contains rows which can be grouped by a column.

id	group_name	value
1	a	0
2	a	8
3	a	9
4	a	3
5	b	5
6	b	6
7	b	4
8	c	2
9	c	9

If you want to find the maximum value in each group, that's easy using a Group By statement and a MAX function:

SELECT
	`tbl_test`.`group_name` AS `group_name`,
    MAX(`tbl_test`.`value`) AS `value`
FROM
	`tbl_test`
GROUP BY
	`tbl_test`.`group_name`
ORDER BY
	`tbl_test`.`group_name`
;

group_name	value
a	9
b	6
c	9

But what if you wanted to find the top 2 values in each group? In MySQL, this can be achieved using the aggregate function GROUP_CONCAT which takes all the values in a column in a group, sorts them, and concatenates them into a single string. The problem is that in MySQL v5 there is no way to limit the number of values used, so we'll simulate this using the SUBSTRING_INDEX function, which takes the first n values in a string containing delimited values. For example SUBSTRING_INDEX("ab,c,de", ",", 2) returns "ab,c".

SELECT
	`tbl_test`.`group_name` AS `group_name`,
    SUBSTRING_INDEX(
        GROUP_CONCAT(`tbl_test`.`value` ORDER BY `tbl_test`.`value` DESC SEPARATOR ","),
        ",", 2
    ) AS `values`
FROM
	`tbl_test`
GROUP BY
	`tbl_test`.`group_name`
;

group_name	value
a	9,8
b	6,5
c	9,2

This puts the values together in one column. But sometimes you'll want to have each value being in its own row. In that case you can use the same trick we saw in an earlier post about finding the median value in MySQL, which involves using session variables. I got this trick from this other blog.

SET @row_num := 0;
SET @prev_group := NULL;
SELECT
	`t`.`group_name` AS `group_name`,
    `t`.`value` AS `value`
FROM (
    SELECT
        @row_num := IF(@prev_group = `tbl_test`.`group_name`, @row_num + 1, 1) AS `row_num`,
        @prev_group := `tbl_test`.`group_name`,
        `tbl_test`.`group_name` AS `group_name`,
        `tbl_test`.`value` AS `value`
    FROM
        `tbl_test`
    HAVING
        `row_num` <= 2
    ORDER BY
        `tbl_test`.`group_name` ASC,
        `tbl_test`.`value` DESC
) AS `t`
;

The idea here is to sort by the grouping column and the value and then associate a row number with each row, where the row number restarts on the first row of each group. We then use a Having clause to filter out only the rows that have a row number that is two or less.

group_name	value
a	9
a	8
b	6
b	5
c	9
c	2

VSCode Jupyter Notebook is not opening in Anaconda/conda environment

2021-05-30T21:48:00.001+02:00

The Jupyter Notebook extension in VSCode let's you choose the Python environment you want to run. As with the Python extension, this is done from the bottom tab shown in red below:

But sometimes this is not enough and you still get module-not-found errors. In this case you need to also change the environment from the top, shown in red below:

How to check which Python installation/environment is being used within Python

2021-03-31T00:18:00.000+02:00

Say you have several Python environments, how can you check that you're using the right one whilst running a Python script? The 'executable' constant in the 'sys' module will give you the path to the Python executable currently being used, which is also the path to the environment.

import sys
print(sys.executable)

Micro averaged precision, recall, and F1 are always the same

2021-02-28T14:11:00.002+01:00

This is something that causes confusion when it happens because it isn't very well known. If you're evaluating a multi-class classifier using micro-averaging, the precision, recall, and F1 scores will be exactly the same, always. In fact this is explained in the sklearn documentation as "Note that if all labels are included, “micro”-averaging in a multiclass setting will produce precision, recall and that are all identical to accuracy.". Let's prove this, first using an example and then more generally.

Let's say that your classifier's classes (or labels) are A, B, and C and that you have the following classifier predictions together with the true classes:

pred	true
A	A
A	A
B	A
C	A
A	B
B	B
B	C
C	C
C	C

To measure the micro-averaged precision, recall, and F1 you need to first measure the number true positives (TP), false positives (FP), and false negatives (FN) of each class. For example, if we look at just class A, the TP is the number of rows where A was predicted and also the true class, that is, 2 (first two rows). The FP is the number of rows where A was predicted but was not the true class, that is, 1 (the fifth row). The FN is the number of rows where A was not predicted but was the true class, that is, 1 (the third row).

Class	TP	FP	FN
A	2	1	2
B	1	2	1
C	2	1	1

You calculate the micro-averaged precision ($P$), recall ($R$), and $F1$ for labels $L$ as follows:

$P = \frac{\sum_{l\in L}{{TP}_l}}{\sum_{l\in L}{({TP}_l + {FP}_l)}} = \frac{2+1+2}{(2+1)+(1+2)+(2+1)} = \frac{5}{9} = 55.6\%$
$R = \frac{\sum_{l\in L}{{TP}_l}}{\sum_{l\in L}{({TP}_l + {FN}_l)}} = \frac{2+1+2}{(2+2)+(1+1)+(2+1)} = \frac{5}{9} = 55.6\%$
$F1 = \frac{2 \times P \times R}{P + R} = \frac{2 \times \frac{5}{9} \times \frac{5}{9}}{\frac{5}{9} + \frac{5}{9}} = 55.6\%$

So we got all the scores the same. If $P$ and $R$ are the same than $F1$ will be the same because the harmonic mean of the same number is that number (just like the arithmetic mean). We can easily prove this: $$ \frac{2 \times X \times X}{X + X} = \frac{2X^2}{2X} = X $$ This is assuming that $X$ is not zero, which would mean that the precision and the recall are both equal to zero, which is impossible.

So the question we should ask is why are the precision and recall always the same. If the precision and recall are always the same, then their definition should be identical. Let's see if we can simplify their identity. $$ P = R \\ \frac{\sum_{l\in L}{{TP}_l}}{\sum_{l\in L}{({TP}_l + {FP}_l)}} = \frac{\sum_{l\in L}{{TP}_l}}{\sum_{l\in L}{({TP}_l + {FN}_l)}} \\ \sum_{l\in L}{({TP}_l + {FP}_l)} = \sum_{l\in L}{({TP}_l + {FN}_l)} \\ \sum_{l\in L}{{TP}_l} + \sum_{l\in L}{{FP}_l} = \sum_{l\in L}{{TP}_l} + \sum_{l\in L}{{FN}_l} \\ \sum_{l\in L}{{FP}_l} = \sum_{l\in L}{{FN}_l} $$

So it all boils down to whether or not the sum of each label's false positives is always equal to the sum of each label's false negatives. Let's say that you have the following row in the first table:

pred	true
A	B

This is an error, but is it a false positive or a false negative? From the perspective of label A it's a false positive, but from the perspective of label B it's a false negative. So for every error in the table, it will be both a false positive for the label in the prediction column and a false negative for the label in the true column. This means that the number of false positives and false negatives will always be balanced, as with every false positive for one label there will be a false negative for another label.

And with this we have proven that the number of false negatives will always equal the number of false positives, which further proves that the precision will always be equal to the recall, which further proves that the F1 score will always be equal to them as well.

The secretary problem

2021-01-27T01:12:00.001+01:00

Say you're dating several partners and want to get into a serious relationship eventually. After how many partners can you have a good idea of what a good partner is like? This is called the secretary problem, and it can be formalised as follows:

Let $n$ be the number of partners in total you can date. Let $A_i$ be the fitness of the $i$th partner, such that the order of the partners is random and we are interested in the partner with the maximum fitness. Let $r$ be the number of partners you date before you start looking for a serious relationship. Once a partner has been evaluated, you cannot go back to them if you try the next partner. What you're doing is going through all partners $i$ starting from 1 up to $r$, at which point you will take $\max_{i=1}^{r}(A_i)$ as a reference of what a good partner looks like and continue going through the rest of the partners $r$ to $n$ and stop as soon as $A_i$ is greater than the reference ($\max_{i=1}^{r}(A_i)$). If none of the partners are better than the best of the first $r-1$ then you settle for the last partner. The question is, what should $r$ be such that the probability of picking the partner with maximum $A_i$ is maximised.

The choice of $r$ actually makes a significant difference, as we can show in the following simulation in Python 3 (in Python everything needs to be 0-based such that the first item is at position 0 rather than 1):

import random

def select(A, r):
    ref = max(A[:r]) # first r items
    for i in range(r, len(A)): # rest of the items
        if A[i] > ref:
            return i
    return len(A)-1 # return last item if search failed

n = 20
for r in range(1, n): # first item not included
    successes = 0
    for _ in range(10000):
        A = [random.random() for _ in range(n)]
        best = max(range(n), key=lambda i:A[i])
        choice = select(A, r)
        if choice == best:
            successes += 1
    print(r+1, successes/10000)

These are the fraction of times the best partner is selected out of 10000 simulations for each $r$ from 2 to 20. You will notice that the fraction steadily increases up to a peak at 8 and then steadily decreases until the end. It's interesting that the peak is not in the middle of the list but closer to the beginning. How can we find what the best $r$ would be for any $n$?

First, we need to find the probability of choosing the best partner for a given $r$ and $n$. This probability can be expressed as two events happening together: you stopping at element $i+1$ and the maximum element being element $i+1$ (we're using $i+1$ instead of just $i$ because it will be mathematically convenient later). These two events are independent and so we can find the probability of each one and then multiply them together.

The probability of the maximum element from all $n$ elements being the $i+1$th element is $\frac{1}{n}$.

The other probability is a little trickier. To stop at element $i+1$, all the elements prior to $i+1$ have to be less than the reference we got from the first $r$ elements. If a larger element is found sooner, we would have stopped there. Therefore, we are interested in the probability that the maximum element from the first $i$ elements is among the first $r$ elements, which is $\frac{r}{i}$. This is assuming that $i+1 \geq r+1$, since we cannot choose any partner before the $r+1$th partner.

Multiplying these two probabilities, we get $\frac{r}{n \times i}$. But this is the probability for a given $i+1$, whereas we want the probability for any $i+1$. Each $i+1$ is mutually exclusive so we can add up the probability of each $i+1$. Given that $i+1$ must be greater than or equal to $r+1$ by definition,

$$P(r, n) = \sum_{i=r}^{n}\frac{r}{n \times i} = \frac{r}{n}\sum_{i=r}^{n}\frac{1}{i}$$

Great, so now we have the probability of making a successful selection given $r$ and $n$. Now how do we find at what $r$ is this maximum? Normally we would find the derivative of this equation with respect to $r$ and see what $r$ has to be for the derivative to be 0 (the turning point), but we can't differentiate this expression because $r$ is discrete. We also cannot treat $r$ as continuous because it is used as the starting point in the summation. But if we focus on finding $\frac{r}{n}$ rather than $r$, then we can do so in a round about way.

Start by multiplying $\frac{1}{n}$ with the expression like this:

$$\frac{r}{n}\sum_{i=r}^{n}\frac{n}{i} \frac{1}{n}$$

We can do this since $n$ cannot be zero. Now what will this expression look like if we let $n$ go to infinity?

$$\lim_{n \to \infty} \frac{r}{n}\sum_{i=r}^{n}\frac{n}{i} \frac{1}{n}$$

Let $x = \lim_{n \to \infty} \frac{r}{n}$ and $t = \lim_{n \to \infty} \frac{i}{n}$. In this case, the summation looks like a Reinmann sum, that is, it is adding together the areas of equally wide rectangular strips in a fixed region under a graph. As the width of the strips goes to zero, the total area of all the strips equals the area under the graph (in the fixed region), making it equal to the definite integral of the graph. In this case, the graph is $f(t) = \frac{1}{t}$, the width of the strips is $\frac{1}{n}$, and the region is between $\frac{r}{n}$ and $\frac{n}{n}$. This means the following:

$$\lim_{n \to \infty} \frac{r}{n}\sum_{i=r}^{n}\frac{n}{i} \frac{1}{n} = x\int_x^1 \frac{1}{t} dt$$

Great. So now we need to find the turning point of $x\int_x^1 \frac{1}{t} dt$ with respect to $x$ in order to know what $\frac{r}{n}$ gives the most probable success. This comes as follows:

$$ \begin{align} \frac{d}{dx} (x\int_x^1 \frac{1}{t} dt) &= 0 \\ \frac{d}{dx} x(\ln(1) - \ln(x)) &= 0 \\ \frac{d}{dx} -x\ln(x) &= 0 \\ -\ln(x) - 1 &= 0 \\ x &= \frac{1}{e} \\ \end{align} $$

Which says that the number of partners you should date before looking for a serious relationship is the total number of partners you will date divided by $\frac{1}{e}$, or about 37% of the number of people you'll go out with.

Completing the square

2020-12-29T20:01:00.005+01:00

For all quadratic expressions, that is, of the form $ax^2 + bx + c$, which we'll call canonical forms, there exists an identical expression of the form $a(x + p)^2 + q$, which we'll call completed square forms. For example, $3x^2 + 12x + 27 \equiv 3(x + 2)^2 + 15$. Among other reasons, completed square forms are convient for finding the root of the quadratic's graph because it is easier to manipulate algebraically. Below is the method for converting a quadratic expression in canonical form into an identical quadratic in completed square form.

The trick to completing the square is to see what would happen if, given $x^2 + bx + c$, you assume that the completed square form is $(x + \frac{b}{2})^2$ and correct it. If we have an $a$ in front of $x^2$, we can factor it out of the expression so that we have $a(x^2 + \frac{b}{a}x + \frac{c}{a})$ and then complete the square of what's inside the brackets. Let's use $3x^2 + 12x + 27$ as an example:

$3x^2 + 12x + 27 \equiv 3(x^2 + 4x + 9)$

Focus on $x^2 + 4x + 9$

Assume that $x^2 + 4x + 9 \equiv (x + 2)^2$

Expanding the brackets:
$(x + 2)^2 \equiv x^2 + 2 \times 2 \times x + 2^2 \equiv x^2 + 4x + 4$

We can see that dividing $b$ (that is, 4) by 2 is useful because that term gets doubled after expanding the brackets. We can further see that instead of 9, the last term is 4. So we need to correct for this. Fortunately, we can introduce the term $q$ to our completed square form, which can be used to correct it. We correct it by adding 5 (that is, $9 - 4$) after the squared brackets, like this:

$(x + 2)^2 + 5$

Now, when we expand the brackets, we get:

$(x + 2)^2 + 5 \equiv (x^2 + 2 \times 2 \times x + 2^2) + 5 \equiv x^2 + 4x + 9$

which is exactly what we want. Therefore,

$3x^2 + 12x + 27 \equiv 3(x^2 + 4x + 9) \equiv 3((x + 2)^2 + 5) \equiv 3(x + 2)^2 + 15$

Let's try this with the general case:

$ax^2 + bx + c \equiv a(x^2 + \frac{b}{a}x + \frac{c}{a})$

Focus on $x^2 + \frac{b}{a}x + \frac{c}{a}$

Assume that $x^2 + \frac{b}{a}x + \frac{c}{a} \equiv (x + \frac{b}{2a})^2$

$(x + \frac{b}{2a})^2 \equiv x^2 + 2 \times \frac{b}{2a} \times x + (\frac{b}{2a})^2 \equiv x^2 + \frac{b}{a}x + (\frac{b}{2a})^2$

Add $\frac{c}{a} - (\frac{b}{2a})^2$ to the completed square expression.

$(x + \frac{b}{2a})^2 + \frac{c}{a} - (\frac{b}{2a})^2$

Therefore,

$$ \begin{align*} ax^2 + bx + c &\equiv a(x^2 + \frac{b}{a}x + \frac{c}{a}) \\ &\equiv a((x + \frac{b}{2a})^2 + \frac{c}{a} - (\frac{b}{2a})^2) \\ &\equiv a(x + \frac{b}{2a})^2 + a\frac{c}{a} - a(\frac{b}{2a})^2 \\ &\equiv a(x + \frac{b}{2a})^2 + (c - \frac{b^2}{4a}) \end{align*} $$

You can use the last line to convert the expression directly.

Summary of research paper "A Three-Player GAN: Generating Hard Samples To Improve Classification Networks" by Vandenhende

2020-11-29T14:36:00.002+01:00

After reading this paper, I was confused by Algorithm 1 (page 2) as it did not really match parts of the text. So I contacted the author and would now like to clarify it for everyone else.

The paper describes a generative adversarial network which generates images that are hard to classify in order to augment a training set for a second classifier. This is done by attaching a classifier to the generator (apart from the discriminator) which learns to classify the generated data with a gradient reversal layer between the classifier and the generator. The gradient reversal layer forces the generator to generate examples which are difficult to classify by negating the gradient signal coming from the classifier loss before passing it to the generator. This makes the generator learn to maximise the classifier loss whilst the classifier is learning the minimise it, resulting in adversarial learning. The author does not give a reason for why the same kind of adversarial learning is not applied to both parts of the model.

What confused me is how would you know what class the difficult to classify examples produced by the generator would be when training the target classifier with the augmented training set. The reason I was confused is because I did not realise that this is not a GAN, but a CGAN, a conditional generative adversarial network. A CGAN works by feeding both the generator and the discriminator a class label. The label would be the label of the real data and a random label in generated data. Since the labels are the same for both the generator and the discriminator, the discriminator learns to discriminate the fake data by matching it to the label and seeing if it looks like what is expected given the label whilst the generator produce images which match the given label in order to fool the discriminator.

By combining this with the gradient reversal layer, the generated images would be both similar to real data of the given label and also difficult to classify. These two properties would be contradictory except for the fact that the effect of the classifier is scaled down due to the lambda hyperparameter of the gradient reversal layer, making the discriminator take precedence.

Python generic methods/functions

2020-10-25T17:55:00.003+01:00

This is something minor but that could use with some more attention. In C# you can use generics with methods so that if a function needs to use a generic type which can be different with different calls then you can simply do this:

class C {
    T getValue() {
    }
}

Now Python has type hints which are used by the program mypy, but I was having trouble finding how to do generic methods with it. The problem is that generic method examples shown require you to pass in a parameter with the generic type. But if the type is not used in the parameters but only in the return, as shown in the C# example above, then you'd be in trouble. The solution is to pass the type itself as a parameter, in the same way that the C# example above does but as a parameter instead of outside the parameter list. It is shown here but not as a generic function.

from typing import TypeVar, Type

T = TypeVar('T')

class C:
    def get_value(return_type: Type[T]) -> T:
        pass

Proof of the volume of a pyramid and cone formulae

2020-09-29T22:48:00.003+02:00

Let's take a square based pyramid, one which is arranged such that it fits in the corner of a cube like this:

The volume of this kind of pyramid is a third of that of the cube. This is because 3 such pyramids will perfectly fit in the cube, as shown below:

In fact you don't even need the pyramid to fit in a cube, it can fit in a cuboid instead. Three such rectangular based pyramids will always exactly fit the cuboid. Now the next natural question is about pyramids that are arranged differently, such as when the top point is in the center of the base, as shown below:

You will notice that no matter how the pyramid is arranged, it always consists of an infinite number of stacked squares, starting from the square at the base of the pyramid and getting progressively smaller until the top point is reached. Since the base and top point of both pyramids are exactly the same, the only thing that is changing is where these square slices are horizontally. So the same set of squares in the orange pyramid are used in the green pyramid with the only difference being where they are placed. This means that the same volume is being used by both pyramids so the same formula is used for both. Therefore, the formula for the volume of a rectangular based pyramid is: $$V = \frac{1}{3}hA$$ where $h$ is the height of the pyramid and $A$ is the area of the base. Now what about when the base is not a square? Will a circle work? A circular based pyramid is a cone:

The cone is exactly the same as the square based pyramid except that instead of consisting of a stack of squares consists of a stack of circles (discs, to be exact). Now if the circles fit exactly inside those of a square based pyramid, then the area of the circle is going to be a fraction of the area of the square:

In terms of the radius of the circle ($r$), that fraction is: $$\frac{\pi r^2}{(2r)^2} = \frac{\pi r^2}{4r^2} = \frac{\pi}{4}$$ Now keeping in mind that this fraction will always be the same for any square in the stack of squares forming the pyramid (from the base to the top point), this means that the volume of the cone must be the same fraction of the volume of the pyramid. Therefore: $$V = \frac{\pi}{4}\frac{1}{3}h(2r)^2 = \frac{1}{3}h\pi r^2$$ In fact, the equation did not change from the previous one because it's still using the base area except that the base area is now of a circle instead of a rectangle. What about a trangular based pyramid (tetrahedron)? Same concept. The stack is made of triangles instead of rectangles or circles:

The fraction of the area occupied by the triangle over the area occupied by the square is half: $$\frac{l^2/2}{l^2} = \frac{1}{2}$$ where $l$ is the side length of the square. So now the volume will be half that of the square based pyramid: $$V = \frac{1}{2}\frac{1}{3}h(l)^2 = \frac{1}{3}h\frac{l^2}{2} = \frac{1}{6}hl^2$$ Again we see that the base area equation still holds and must hold for any shaped base pyramid (where the base shape is linearly reduced to a point along the height of the pyramid).

Why the area under a velocity-time graph gives the distance travelled

2020-08-30T16:34:00.016+02:00

I remember back when I was in school when, during physics lessons on motion, we would have velocity-time graphs of some car, where the graph tells you how fast the car was going at any given time, like the one below:

So this is saying that the car started from rest (0m/s) then it accelerated to 20m/s, and so on. The interesting part about this is that you can apparently find the distance travelled by the car by finding the area under the graph. Now eventually I learned that this is because the area under the graph is equal to the anti-derivative of the graph which makes transforms the velocity-time graph back into a distance-time graph (since the derivative of the distance-time graph is the velocity-time graph). But back when I was doing this thing in school we didn't know about calculus so we were not given a reason for why this works, which is a shame because you actually don't need calculus if you're not dealing with curves.

So why does this work? Well, in order to find the area under the graph you need to first break it up into a set of basic shapes as below. Triangles, rectangles, trapezia, and so on. You then find the area of each individual basic shape and find the total.

Let's start with the rectangle. Why is the area under a rectangle equal to the distance travelled? A rectangle means that the velocity for a period of time was constant. The width is the duration and the height is the velocity.

Now if the car was travelling at a constant speed of $v$ for duration $t$, what is the distance travelled? If $v = s/t$ then $s = vt$, that is, the distance travelled is the velocity multiplied by the duration, which just so happens to be the area of the rectangle.

Now one important thing to note about all the sub-shapes under the graph is that they are all different forms of trapezia. A trapezium with equal sides is a rectangle and a trapezium with one side being zero is a triangle. So we can focus on just the trapezium.

So now we can't just use the equation $v = s/t$ because that assumes that the velocity is constant, whereas here it has increased (steadily) from $v_1$ to $v_2$. Now without referring to Newton's equations of motion, it is clear that there exists a single constant velocity which is between $v_1$ and $v_2$ such that the distance travelled in the same duration is equal to the distance travelled after this acceleration. In other words, there is some rectangle with a height that is between $v_1$ and $v_2$ that should give the same answer. We usually call this height the average velocity and it is usually found by dividing the total distance travelled by the total duration of travelling. But here we don't have the distance travelled, only the initial and final velocity. So what is the average velocity of a uniformly increasing velocity?

The answer is the average height of the trapezium, which is $(v_1 + v_2)/2$. We can convince ourselves that this is the average height of the trapezium by sampling a number of points along the top side of the trapezium and finding what the average of the sample heights is. Let's take trapezium C in the brokendown graph above which is between time 20 and 25:

Of course since this is just a single case you might not be convinced yet; so let's do this algebraically using a variable amount of samples $n$:

$$ \hat{v} = \frac{\sum_{i \in 0...n-1}{(v_1 + \frac{i\times(v_2 - v_1)}{n-1})}}{n} \\ \hat{v} = \frac{\sum_{i \in 0...n-1}{v_1} + \sum_{i \in 0...n-1}{\frac{i\times(v_2 - v_1)}{n-1}}}{n} \\ \hat{v} = \frac{n v_1 + \frac{(v_2 - v_1)\times\sum_{i \in 0...n-1}{i}}{n-1}}{n} \\ \hat{v} = v_1 + \frac{(v_2 - v_1)\times\sum_{i \in 0...n-1}{i}}{n\times(n-1)} \\ \hat{v} = v_1 + \frac{(v_2 - v_1)\times\frac{n\times(n-1)}{2}}{n\times(n-1)} \\ \hat{v} = v_1 + \frac{v_2 - v_1}{2} \\ \hat{v} = v_1 + \frac{v_2}{2} - \frac{v_1}{2} \\ \hat{v} = \frac{v_2}{2} + \frac{v_1}{2} \\ \hat{v} = \frac{v_2 + v_1}{2} $$

Great! So now we know that given a trapezium of velocity-time graph, this is equivalent to a rectangle of the same width but with a height that is the average of the two sides:

We already know how to find the distance travelled by a rectangle shaped velocity-time graph so let's use it on the above rectangle: $s = (v2+v1)/2\times t$. It just so happens that this is also the area of the trapezium. So we can just skip all the above and just take the area of the trapezium to find the distance travelled.

All that's left is what is the distance travelled when the velocity is zero since it's just a flat line, but I'm sure you can guess what that is.

How to capture/redirect output in Python's print

2020-07-31T15:44:00.002+02:00

The print function in Python v3 is one of the more sophisticated changes made from v2. You can tell it where to print by using the parameter 'file'. It can either print to the screen as usual or it can print into a text file for example. Here's how you make it print into a text file:

with open('x.txt', 'w', encoding='utf-8') as f:
    print('hello', file=f)

The 'file' parameter accepts a stream object and the print function basically just passes the string you give it to this stream object's 'write' method. In fact in the above example you could have written to the file without using print as follows:

with open('x.txt', 'w', encoding='utf-8') as f:
    f.write('hello')
    f.write('\n')
    f.flush()

So question is, what is the stream used by print when it prints to the screen? The print method's signature is defined as follows:

print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

So the answer is sys.stdout, which is a variable that points to the object sys.__stdout__. In fact you can make your own print function by using the sys.__stdout__ object as follows:

import sys

def my_print(line):
    sys.__stdout__.write(line)
    sys.__stdout__.write('\n')
    sys.__stdout__.flush()

my_print('hello')

Note that the output will appear in the command line terminal. Now we saw how the default file parameter of the print function is whatever sys.stdout is pointing to. This means that we can highjack this variable with our own stream so that default prints get redirected to our code instead of the screen. Here's how to make your own stream:

class MyStream(object):

    def write(self, text):
        pass

    def flush(self):
        pass

Just replace 'pass' in the write method with whatever you want and that is what will happen with the print string. Now normally you would do something like this:

print('hello', file=MyStream())

but if you're trying to capture the outputs of some library, like sklearn, then you don't have control over the library's print functions. Instead you can do this:

import sys

tmp = sys.stdout
sys.stdout = MyStream()

print('hello')

sys.stdout = tmp

First you hijack the stdout variable with your own stream and then after the prints you restore the stdout variable with the original stream. Now, you need to be careful with what happens inside your stream's write method because if there is another print in there then it will also call your stream's write method which will result in an infinite recursion loop. What you can do is temporarily restore the original stream whilst inside your stream, like this:

import sys

class MyStream(object):

    def write(self, text):
        tmp = sys.stdout
        sys.stdout = sys.__stdout__

        print('my', text)

        sys.stdout = tmp

    def flush(self):
        pass


tmp = sys.stdout
sys.stdout = MyStream()

print('hello')

sys.stdout = tmp

Now every time you print something, you'll get 'my' added to its front. Unfortunately the print function sends the '\n' at the end of the line in a separate call to the stream's write method, which means that you actually get two calls per print:

my hello
my

A simple solution is to just put an 'if' statement that checks if the text received is just a '\n' and ignore it if so:

import sys

class MyStream(object):

    def write(self, text):
        if text == '\n':
            return

        tmp = sys.stdout
        sys.stdout = sys.__stdout__

        print('my', text)

        sys.stdout = tmp

    def flush(self):
        pass


tmp = sys.stdout
sys.stdout = MyStream()

print('hello')

sys.stdout = tmp

Git command line cheat sheet

2020-06-30T21:53:00.002+02:00

After my previous blog post about introducing the git command line, here is a table of all the commands I regularly use in git:

The basics

Command	Explanation
git init .	Make the current directory a git repository.
git status	Get information about the current state of the repository.
git add (path)	Add a modified file to the git index. Use '.' for path in order to add all modified files.
git rm (path)	Remove a modified file from the git index.
git diff	Get the file changes from what is in the git index to what is in the currently modified file.
git commit	Take a snapshot backup of the current index.
git log	Get a list of commits made.
git tag -a "(tag name)"	Tag the last commit with a special name (can be used to mark version numbers).
git tag	List existing tags.
git reset --hard HEAD	Return to the last commit (permanently remove any uncommitted changes).

Branches

Command	Explanation
git branch (branch name)	Create a new branch.
git branch --list	List existing branches.
git branch --delete (branch name)	Delete an existing branch.
git checkout (branch name)	Jump to a different branch.
git merge (branch name)	Merge the given branch into the current branch so that they become one branch.
git stash push	Stash any file changes without committing them in order to jump to another branch.
git stash pop	Retrieve back stashed file changes.
git log --graph	Like log but will show how commits in different branches are connected together.

Quick Git command line tutorial

2020-05-31T11:52:00.001+02:00

Git is a version control system which facilitates the creation and maintenance of versions in a project. Usually it's used for software code but it can also be used for things like documents. Its most useful for anything that is text based as you will be able to see which lines have changed across versions.

Given that you've installed git, create folder that will store your project, open your terminal (cmd in Windows), navigate to the folder, and turn it into a repository by entering:

git init .

This will create a folder called ".git" in your current directory which lets you do repository stuff to it. Now whenever you want to do repository stuff, just open your terminal and navigate back to the folder. To see this we can ask for a status report of the repo by entering:

git status

This will output the following:

On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)

This is basically telling us that the repo is empty. Now we start putting stuff there. Let's create a readme file using markdown with the file name readme.md:

My first git repo
=================

This is my readme file.

After saving, if we go back to the terminal and enter:

git status

we will now see:

On branch master

Initial commit

Untracked files:
  (use "git add ..." to include in what will be committed)

        readme.md

nothing added to commit but untracked files present (use "git add" to track)

This is saying that readme.md is a file in the repo folder that is not being kept track of by git. We can add this file to the git index by entering:

git add readme.md

After asking for the status again, we will now see:

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached ..." to unstage)

        new file:   readme.md

Now the file is in the index and is 'staged'. If we update the file and save it again:

My first git repo
=================

This is my updated readme file.

the status will now say:

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached ..." to unstage)

        new file:   readme.md

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

        modified:   readme.md

This is saying that we have a staged file that has also been modified. We can check what has been changed in the file since we last staged it by entering:

git diff

diff --git a/readme.md b/readme.md
index 75db517..5bdd78c 100644
--- a/readme.md
+++ b/readme.md
@@ -1,4 +1,4 @@
 My first git repo
 =================

-This is my readme file.
+This is my updated readme file.

Diff shows you all the lines that were changed together with some unchanged lines next to them for context. The '-' line was replaced by the '+' line. We're also told in "@@ -1,4 +1,4 @@" that the line number that was changed was 4 (1 line was removed at line 4, and another line was added at line 4).

Now we stage this modification so that it is also kept track of:

git add readme.md

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached ..." to unstage)

        new file:   readme.md

Staging changes is not the point of repositories. The point is committing the staged changes. A commit is a backup of the currently indexed files. When you take a backup, you make a copy of your project folder and give it a name so that you can go back to it. This is what a commit does. Enter:

git commit

This will open up a text editor so you can enter a description of what you're committing.

If the text editor is vim, which you will know because it is not helpful at all, you need to first press 'i' for 'insert' before you type anything. To save, press ESC, followed by ':', followed by 'w', followed by enter. To exit, press ESC, followed by ':', followed by 'q', followed by enter.
If it's nano then just follow the on screen commands, noting that '^' means CTRL.

The commit message you write can later be searched in order to find a particular commit. Note that commits are generally thought of as being unchangable after they are made, so make sure you write everything you need to. The first line of the commit message has special importance and is considered to be a summary of what has changed. You should keep it under 50 characters long so that it can be easily displayed in a table. It should be direct and concise (no full stops at the end, for example), with the rest of the lines under it being a more detailed description to make up for any missing information in the first line. A blank line needs to separate the first line from the rest of the lines. Note that it's fine to only have the first line is it's enough.

added the word 'updated' to readme

readme.md was updated so that it says that it is my updated readme file.
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
#
# Initial commit
#
# Changes to be committed:
#       new file:  readme.md

The lines with the # in front were written by git and should be left there. If we check the status again we will now see:

On branch master
nothing to commit, working tree clean

This is saying that there were no changes since the last commit. We can now see all of our commits by entering:

git log

Author: mtanti 
Date:   Sat May 30 10:46:17 2020 +0200

    added the word 'updated' to readme

    readme.md was updated so that it says that it is my updated readme file.

We can see who made the commit, when, and what message was written. It's important that commits are done frequently but on complete changes. A commit is not a save (you do not commit each time you save), it is a backup, and the state of your project in each backup should make sense. Think of it as if you're crossing out items in a todo list and with each item crossed out you're taking a backup. You should not take a backup in between items. On the other hand, your items should be broken down into many simple tasks in order to be able to finish each one quickly.

Now, let's add a folder called 'src' to our repo and then check the status.

On branch master
nothing to commit, working tree clean

In git's eyes, nothing has changed because git does not have a concept of a folder, only of the directory of files. We need to put a file in the folder in order to be able to add it to git's index. Let's add 2 text files to src: 'a.txt' and 'b.txt' with the following content each:

line 1
line 2
line 3

The status now shows:

On branch master
Untracked files:
  (use "git add ..." to include in what will be committed)

        src/

nothing added to commit but untracked files present (use "git add" to track)

This is saying that we have a new folder called 'src' with some files in it. We can add the folder by using the add command. If you want you can avoid having to include the file names of the files you're adding by just using a '.', which means "all unstaged modified or new files":

git add .

On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

        new file:   src/a.txt
        new file:   src/b.txt

Let's commit these two files.

git commit

added source files

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#       new file:   src/a.txt
#       new file:   src/b.txt

Note that I didn't add anything else to the message other than the first line. There is no need to specify what files were added as they can be seen in the commit. Now let's look at the log of commits:

git log

commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (HEAD -> master)
Author: mtanti 
Date:   Sat May 30 11:17:14 2020 +0200

    added source files

commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
Author: mtanti 
Date:   Sat May 30 10:46:17 2020 +0200

    added the word 'updated' to readme

    readme.md was updated so that it says that it is my updated readme file.

We can see how all the commits are shown in descending order of when they were made. You might be wondering what 'HEAD' is referring to. HEAD is the commit we are working on. We can now move in between commits and move in time. This is very useful if you start working on something and realise that there was a better way to do it and need to undo all your work up to a particular point in the commit history. When we do this, we would be moving the HEAD to a different commit. The HEAD can be moved by using the checkout command:

git checkout HEAD~1

This is saying "go back one commit behind the HEAD". The command gives the following output:

Note: checking out 'HEAD~1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b 

HEAD is now at f71f17b... added the word 'updated' to readme

This is saying that we are now in the commit with the message "added the word 'updated' to readme". It is also possible to use the hash as a commit identifier in order to just to it directly without being relative to the HEAD. The hash is the 40 digit hexadecimal next to the commits in the log. For example, the first commit had a hash of 'f71f17b63c6b3ddb7506000cbc422e8f1b173958' so we could have entered "git checkout f71f17b63c6b3ddb7506000cbc422e8f1b173958". We can also just use the first 7 digits to avoid typing everything but it's more likely that there will be a collision with another hash.

If look at the log now, we'll see:

commit f71f17b63c6b3ddb7506000cbc422e8f1b173958 (HEAD)
Author: mtanti 
Date:   Sat May 30 10:46:17 2020 +0200

    added the word 'updated' to readme

    readme.md was updated so that it says that it is my updated readme file.

which shows that the HEAD has been moved one commit behind in the timeline.

Now if we look at the project folder (and refresh), we'll see that the folder 'src' has been physically removed. We can restore it by moving forward in time and go to the latest commit which includes the 'src' folder.

Unfortunately, there is no direct notation for moving forward in time as what we just did is not normal usage of git. Note that at the moment the HEAD is said to be 'detached', which means that it is not in a proper place (the end of the timeline). We can get back to the proper place we should be in by checking out to 'master'.

git checkout master

Checking the log, we now see:

commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (HEAD, master)
Author: mtanti 
Date:   Sat May 30 11:17:14 2020 +0200

    added source files

commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
Author: mtanti 
Date:   Sat May 30 10:46:17 2020 +0200

    added the word 'updated' to readme

    readme.md was updated so that it says that it is my updated readme file.

So what is this 'master' business? What we were calling timelines are actually called 'branches' in git, and branches are one of the most important things in git. Imagine you've started working on a new feature in your program. Suddenly you are told to let go of everything you're doing, work on fixing a bug, and quickly release an update right away. The feature you're working on is half way done and you can't release an updated version of the program with a half finished function; but there's no way you'll finish the feature quickly enough. Do you undo all the work you did on the feature so that you're at a stable version of the program and able to fix the bug? Of course not. That's what branches are for.

With branches you can keep several versions of your code available and switch from one to the other with checkout. The master branch is the one you start with. Ideally the master's last commit should always be in a publishable state (no 'work in progress'). Of course if you're just working on the master branch then this would not be possible without taking committing very rarely, which is bad practice. The solution is to have several development branches on which you modify the project bit by bit. Every time you start working on a new publishable version of your project, you start a new branch and work on modifying your project to create the new version. Once you finish what you're doing and are happy with the result, you then merge the development branch into the master branch. If you're in the middle of something and need to fix a bug, you switch to the master branch, create a new branch for fixing the bug, fix it, and merge it to the master. Then you merge the new master code with your earlier development branch so that you can continue working as if nothing happened.

Let's make a development branch. Enter the following:

git branch mydevbranch

This will create a branch called 'mydevbranch' that sticks out from the current branch at the current HEAD, that is, it will create a branch that sticks out from master's last commit. By 'sticks out' I mean that when you switch to mydevbranch, the project will be changed to look like it was at the commit from where the branch is starting from. Alternatively you can include a commit hash after the name of the branch in order to make it stick out from an earlier point in the current branch. For example "git branch mydevbranch f71f17b63c6b3ddb7506000cbc422e8f1b173958" will create a branch from the first commit.

We can see a list of branches by entering:

git branch --list

* master
  mydevbranch

The asterisk shows which branch is currently active (has the HEAD).

Now switch to the new branch using checkout:

git checkout mydevbranch

and check the status:

On branch mydevbranch
nothing to commit, working tree clean

It now says that we're on mydevbranch instead of on master. Note that whilst we're on the new branch, any modifications we make to the project will be stored on the branch whilst master will remain as it is. Let's modify src/a.txt to look like this:

line 1
line 2 changed
line 3

And now add and commit this file and then check the log:

commit 9bc4488ac847bceccb746eeafb1a8c239de350f2 (HEAD -> mydevbranch)                                                                            Author: mtanti                                                                                                              Date:   Sun May 31 10:24:23 2020 +0200                                                                                                                                                                                                                                                                changed line in a.txt                                                                                                                                                                                                                                                                         commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (master)                                                                                         Author: mtanti                                                                                                              Date:   Sat May 30 11:17:14 2020 +0200                                                                                                                                                                                                                                                                added source files                                                                                                                                                                                                                                                                            commit f71f17b63c6b3ddb7506000cbc422e8f1b173958                                                                                                  Author: mtanti                                                                                                              Date:   Sat May 30 10:46:17 2020 +0200                                                                                                                                                                                                                                                                added the word 'updated' to readme                                                                                                                                                                                                                                                                readme.md was updated so that it says that it is my updated readme file.

Note how the log shows us where each branch is and where the HEAD is. In this case the HEAD is at mydevbranch's last commit and master is one commit behind it. Let's add another commit after changing src/b.txt:

line 1
line 2
line 3
line 4

Now that we're done with the changes in this new version, we can merge the development branch to master by first switching to master and then merging to mydevbranch:

git checkout master
git merge mydevbranch

Updating 59cfc3f..f962efb
Fast-forward
 src/a.txt | 2 +-
 src/b.txt | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

Note that this was a fast forward merge, which is the best kind of merge you can have. It happens when all the changes were made in a neat sequence which can be simply reapplied on the master, as opposed to having multiple branches changing stuff at the same time.

Before seeing what a merge conflict looks like, let's look at the log again now that we've made the merge, but this time as a graph:

git log --graph

* commit f962efb409e4f08f94d717dec866519bc2848e8f (HEAD -> master, mydevbranch)
| Author: mtanti 
| Date:   Sun May 31 10:30:42 2020 +0200
|
|     added a new line to b.txt
|
* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2
| Author: mtanti 
| Date:   Sun May 31 10:24:23 2020 +0200
|
|     changed line in a.txt
|
* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e
| Author: mtanti 
| Date:   Sat May 30 11:17:14 2020 +0200
|
|     added source files
|
* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
  Author: mtanti 
  Date:   Sat May 30 10:46:17 2020 +0200

      added the word 'updated' to readme

      readme.md was updated so that it says that it is my updated readme file.

Here we can see a neat straight line moving through a timeline of commits, from the first to the last with no splits along the way. The master and development branch are together on the last commit in the timeline. Now let's see how this can be different.

Switch back to the development branch so that you can start working on the next version of the project:

git checkout mydevbranch

and change src/a.txt to have a new line:

line 1
line 2 changed
line 3
line 4

As you're working on the file, you get a call from your boss who tells you to immediately change all the lines to start with a capital letter in both src/a.txt and src/b.txt. Now what? You need to stop working on mydevbranch, go back to master, and create a new branch for working on the new request.

Before switching branches it's important that you do not have any uncommitted changes in your project, otherwise they will be carried over to the master branch and that would make things confusing. If you're not at a commitable point in your change then you can save the current state of the branch files by stashing them instead:

git stash push

This will add a stash object to the current commit of the current branch which can then be retrieved and removed. Next checkout to master:

git checkout master

Create and switch to a new branch.

git branch myurgentbranch
git checkout myurgentbranch

and apply the urgent modifications.

src/a.txt

Line 1
Line 2 changed
Line 3

src/b.txt

Line 1
Line 2
Line 3
Line 4

Commit these changes and merge to master:

git add .
git commit
git checkout master
git merge myurgentbranch

The merge should be a fast forward since we started working on the last commit of master with no interruptions. The log will show us the current situation:

git log --graph

* commit 5dcd492113d3942550a58efdc7b90e15bd36d537 (HEAD -> master, myurgentbranch)                                                               | Author: mtanti                                                                                                            | Date:   Sun May 31 11:06:39 2020 +0200                                                                                                         |                                                                                                                                                |     capitalised each line in source files                                                                                                      |                                                                                                                                                * commit f962efb409e4f08f94d717dec866519bc2848e8f (mydevbranch)                                                                                  | Author: mtanti                                                                                                            | Date:   Sun May 31 10:30:42 2020 +0200                                                                                                         |                                                                                                                                                |     added a new line to b.txt
|
* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2
| Author: mtanti 
| Date:   Sun May 31 10:24:23 2020 +0200
|
|     changed line in a.txt
|
* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e
| Author: mtanti 
| Date:   Sat May 30 11:17:14 2020 +0200
|
|     added source files
|
* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
  Author: mtanti 
  Date:   Sat May 30 10:46:17 2020 +0200

      added the word 'updated' to readme

      readme.md was updated so that it says that it is my updated readme file.

You can see how when we commit the changes in mydevbranch, we'll have a fork in the timeline from the straight line that we currently have. Let's see what happens then.

Now that we're ready from the urgent request we can go back to the normal development branch:

git checkout mydevbranch

and pop back the changes we stashed:

git stash pop

Note that "git stash list" shows what is in the stash. We were working on src/a.txt where we were adding a new line to the file:

line 1
line 2 changed
line 3
line 4

Now let's imagine that we finished the changes we were making and can commit them:

git add .
git commit

Now a.txt is supposed to have two changes: the new line and each line starting with a capital letter. Each of these changes are on a different branch. Let's start by making the development branch complete by merging the changes we applied to the master to the development branch (note that we're reversing the direction of the merge now because we want the development branch to be up to date).

git merge master

Auto-merging src/a.txt
CONFLICT (content): Merge conflict in src/a.txt
Automatic merge failed; fix conflicts and then commit the result.

This is when things start getting hairy as you'll need to manually fix your conflicting changes. The status will tell us which files need to be fixed:

On branch mydevbranch
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Changes to be committed:

        modified:   src/b.txt

Unmerged paths:
  (use "git add ..." to mark resolution)

        both modified:   src/a.txt

This saying that during merging, src/b.txt was updated with no conflicts but src/a.txt requires manual intervention. If we open src/a.txt we'll see the following:

<<<<<<< HEAD
line 1
line 2 changed
line 3
line 4
=======
Line 1
Line 2 changed
Line 3
>>>>>>> master

The file has been modified by git to show the conflicting changes. 7 arrows and equals signs are used to highlight sections of changes which need to be resolved. Note that all the lines have been changed here to the section is the whole file. Now we can either fix the file directly or use git's "git mergetool" to help us by showing all the changes. In this case we can modify the file directly:

Line 1
Line 2 changed
Line 3
Line 4

Make sure to remove all the arrows and equals signs. Status will now output:

All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)

Changes to be committed:

        modified:   src/a.txt
        modified:   src/b.txt

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

        modified:   src/a.txt

It's now saying that conflicts were fixed. All we need to do is add the fixed file and continue the merge.

git add src/a.txt
git merge --continue

At the end of the merge, you will be asked to enter a commit message in order to automatically commit the merge change. Git automatically puts the message "Merge branch 'master' into mydevbranch", which is good enough.

The log now shows the new timeline:

*   commit 49abf32812ec7dbeeab792729098a61fd3446a45 (HEAD -> mydevbranch)
|\  Merge: b6cc3c8 5dcd492
| | Author: mtanti 
| | Date:   Sun May 31 11:36:50 2020 +0200
| |
| |     Merge branch 'master' into mydevbranch
| |
| * commit 5dcd492113d3942550a58efdc7b90e15bd36d537 (myurgentbranch, master)
| | Author: mtanti 
| | Date:   Sun May 31 11:06:39 2020 +0200
| |
| |     capitalised each line in source files
| |
* | commit b6cc3c887533a995e749589b6cdbfaaad530b03e
|/  Author: mtanti 
|   Date:   Sun May 31 11:17:55 2020 +0200
|
|       added new line to a.txt
|
* commit f962efb409e4f08f94d717dec866519bc2848e8f
| Author: mtanti 
| Date:   Sun May 31 10:30:42 2020 +0200
|
|     added a new line to b.txt
|
* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2
| Author: mtanti 
| Date:   Sun May 31 10:24:23 2020 +0200
|
|     changed line in a.txt
|
* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e
| Author: mtanti 
| Date:   Sat May 30 11:17:14 2020 +0200
|
|     added source files
|
* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
  Author: mtanti 
  Date:   Sat May 30 10:46:17 2020 +0200

      added the word 'updated' to readme

      readme.md was updated so that it says that it is my updated readme file.

We can now continue modifying our development branch with anything left to add in the new version and then switch to master and merge, which will be a fast forward since we have resolved all conflicts already. Note that if you see the log before merging, you will not see the development branch since it is not in the master's timeline. You can see all timelines by entering "git log --graph --all".

git checkout master
git merge mydevbranch

If you want to delete the urgent branch, just enter "git branch --delete myurgentbranch".

This concludes our quick tutorial. I didn't mention anything about remote repositories and pushing and pulling to and from the repositories but basically if you setup a github account, you can keep a backup online for multiple developers to work together, pushing the local repository to the online one and pulling the online repository when it was changed by someone else. The online repository is referred to as 'origin' in git.

Rolling neighbourhood histograms: Computing histograms of sliding windows quickly

2020-04-30T23:04:00.002+02:00

In an earlier blog post, I described local binary patterns, which are vectors that describe the texture of image regions. In segmentation tasks where you need to group pixels together by texture, you need to compute the local binary pattern of the neighbourhoods of pixels around each pixel. This means that if you want to find the texture descriptor for a particular pixel 'P', first you need to extract a square of pixels centered on 'P' called the neighbourhood of 'P', compute the local binary codes of all the pixels in the neighbourhood, compute the histogram of the local binary codes, and that histogram is the texture descriptor of pixel 'P'. This is illustrated in the figure below.

Now in order to compute the texture descriptor of every pixel in the image you would need to repeat this process of extracting neighbourhoods and computing histograms for every pixel. But this is inefficient because it ends up recomputing the same values multiple times. First of all, the local binary codes do not need to be computed for each neighbourhood but can instead be computed once for the whole image and then the neighbourhoods be extracted from the local binary codes image instead of from the original image. But we can also make the histogram computations faster as well. This is because adjacent pixels have a lot of common pixels in their neighbourhoods as shown in the figure below.

We can take advantage of this commonality by computing the histogram of the new neighbourhood based on the histogram of the previous neighbourhood. Assuming the the histograms are simple frequency histograms, the new histogram is the previous histogram plus the frequencies of the values in the blue vertical column (see last row in the above figure) minus the frequencies of the values in the green vertical column. This means that instead of having to compute the frequencies of the whole neighbourhood you would only need to compute the frequencies for two columns and then add or subtract from the previous histogram. This reduces the time complexity from quadratic time to linear time with respect to the side of the neighbourhood square. Assuming that a neighbourhood is described by the (x,y) coordinate of the central pixel and the (w,h) width and height of the neighbourhood, and that histograms can be added together by vector addition, here is the relationship of the new histogram to the previous histogram:

histogram(x,y,w,h) = histogram(x-1,y,w,h) + histogram(x-w/2,y,1,h) - histogram(x+w/2,y,1,h)

This lets us compute the histogram of each neighbourhood by iteratively using the previous histogram in the same row, requiring only the first pixel in the row to be computed fully. But we can also avoid computing the full histogram of the first pixel in a row after the first row by using the histogram of the above pixel in the previous row like this:

histogram(x,y,w,h) = histogram(x,y-1,w,h) + histogram(x,y-h/2,w,1) - histogram(x,y+h/2,w,1)

This is similar to the way rolling hashes are computed, which is where the name rolling histogram comes from.

A differentiable version of Conway's Game of Life

2020-03-31T23:59:00.000+02:00

Conway's Game of Life consists of a grid with black and white squares which flip in colour over time. The rules for changing the colour of a square are as follows:

If the current colour of a square is white then if it is surrounded by 2 or 3 adjacent white squares it stays white, otherwise it becomes black.
If the current colour of a square is black then if it is surrounded by exactly 3 white squares then it becomes white, otherwise it stays black.

If we treat the grid as a numeric matrix where 0 is black and 1 is white, then a differentiable version of these rules can e made. The idea here is to allow not just black and white but also shades of grey so that the values of the squares can change only slightly and hence allow the changes to be differentiable. In order to force the values to remain between 1 and 0 we will make use of the sigmoid (logistic) function which is defined as $y = \frac{1}{1 + e^{-x}}$.

The key trick here is to come up with a differentiable function that maps the number of white squares around a square to 0 or 1. We need two such functions: one that when x is 2 or 3 y is 1 and otherwise y is 0 and another that when x is 3 y is 1 and otherwise y is 0. We can make this function using two sigmoid functions subtracted from each other as follows:

$y = \text{sig}(w \times (x - a)) - \text{sig}(w \times (x - b))$

The graph plot of this function would be one which starts as 0, then increases to 1 at $x = a$ (where $a$ is the halfway point as $y$ goes from 0 to 1), then stays 1 until $x = b$, at which point $y$ goes back to 0 (again, $b$ is the halfway point as $y$ goes from 1 to 0). $w$ is there to control how steep the change from 0 to 1 and back should be with large $w$ meaning very steep.

So now we apply the above equation to be used on a whole matrix of values and where $x$ is the number of adjacent white squares. Given that a square can also be grey, we instead just take the sum of the adjacent values to count the number of white squares. This results in a count which is not a whole number but which is usable in a differentiable function.

We also need to be able to choose between using the rule for white squares or the rule for black squares. To do that we can just take a weighted average as follows:

$y = x \times a + (1 - x) \times b$

where $x$ is a number between 0 and 1 and a is the value $y$ should be when $x$ is 1 whilst $b$ is the value $y$ should be when $x$ is 0.

Given a matrix $G$, you can get the next iteration $G'$ using the following function:

Let $k$ be a 3 by 3 matrix consisting of all 1s and a single 0 in the center.
$N = conv(G, k)$, that is, convolve the kernel $k$ over the matrix $G$ in order to get a new matrix $N$ giving the number of adjacent white squares around every element in the matrix.
$B = \text{sig}(w \times (G - 2.5)) - \text{sig}(w \times (G - 3.5))$ (resultant matrix if all elements where black)
$W = \text{sig}(w \times (G - 1.5)) - \text{sig}(w \times (G - 3.5))$ (resultant matrix if all elements where white)
$G' = G \times W + (1 - G) \times B$

Proof that the sum of all natural numbers is equal to -1/12 (provided we make an assumption)

2020-02-27T23:30:00.002+01:00

This is one of those weird proofs that turns out has practical value in physics. If you add together all the numbers from 1 to infinity, the answer will be $-\frac{1}{12}$. Here's how we get this.

We start from the simpler question of what is 1 - 1 + 1 - 1 + 1 - 1 + ... for an infinite number of terms. If you stop the series at a 1 then the answer is 1 but if you stop at a -1 then the answer is 0. Since the series never stops then we can make the assumption that the answer is 0.5, which is the average. This is like if you had a light switch that you were constantly switching on and off. If done at a fast enough frequency, the light will seem to be halfway light and dark, which is 0.5.

$$\sum_i{(-1)^i} = \frac{1}{2}$$

If you accept the above assumption then the rest is rigorous.

We next need to find what 1 - 2 + 3 - 4 + 5 - ... is. This can be reduced to the above identity by applying the following trick:

S   = 1 - 2 + 3 - 4 + 5 - ...

S+S = 1 - 2 + 3 - 4 + 5 - ...
        + 1 - 2 + 3 - 4 + ...

    = 1 - 1 + 1 - 1 + 1 - ...

Therefore $2S = \frac{1}{2}$ which means that $S = \frac{1}{4}$.

$$\sum_i{(-1)^i i} = \frac{1}{4}$$

Finally we get to our original question:

S'   = 1 + 2 + 3 + 4 + 5 + ...

S'-S = 1 + 2 + 3 + 4 + 5 + 6 + ...
     - 1 + 2 - 3 + 4 - 5 + 6 - ...

     = 0 + 4 + 0 + 8 + 0 + 12 + ...

Which are the even numbers multiplied by 2, that is, the multiples of 4.

S'-S = 4 + 8 + 12 + ...
     = 4(1 + 2 + 3 + ...)
     = 4S'

Now if S' - S = 4S' then

S' - S   = 4S'
S' - 4S' = S
-3S'     = S
S'       = -S/3
         = -(1/4)/3 = -1/12

$$\sum_i{i} = -\frac{1}{12}$$

The Whittaker-Shannon interpolation formula (inferring missing values from a sampled signal using sinc or Lanczos function)

2020-01-26T23:25:00.000+01:00

In an earlier blog post we had talked about Lagrange polynomial interpolation which is when you need to find a polynomial that passes through a set of points. This time we will be seeing a similar problem that is encountered in signal processing.

In discrete signal processing, a continuous signal is represented by a discrete signal consisting of values taken at regular intervals in time, a process called sampling. Below we have an example of a continuous signal that is then sampled at intervals of 0.9 (arbitrarily chosen here).

Now it may be the case that you would like to reconstruct the original signal from the sampled signal. You may think that there is an infinite number of signals that can fit the same samples but because the discrete values are sampled at regular intervals, there is actually only one signal that can fit it, provided that the sampling rate is greater than twice the highest frequency of the original signal. If this condition is met then we can construct the original signal using the Whittaker-Shannon interpolation formula.

The process is similar to that of the Lagrange polynomial interpolation. In polynomial interpolation, the trick is to find a set of component polynomials each of which passes through one of the given set of points. In order to be able to combine them all together into a single polynomial that passes through all of the given set of points, the component polynomials also need to equal zero wherever there is a different point. This makes sure that adding the polynomials together will not cause an interference between them at the given set of points.

In Whittaker-Shannon interpolation, the components are the sinc function instead of polynomials. Sinc functions are periodic functions that are as follows:

$$
\text{sinc}(x) = \begin{cases}
1 & \text{for } x = 0 \\
\frac{sin(x)}{x} & \text{otherwise} \\
\end{cases}
$$

See how sinc is equal to 1 at x = 0 and equal to zero at regular intervals around it? This is how we'll take advantage of it as a component. The first thing we need to do is to synchronise the intervals at which it equals zero to the intervals of the sampled signal. If the sample interval, or period, is 0.9, then the synchronised sinc function would be:

$$
y = sinc\left(\frac{\pi x}{0.9}\right)
$$

Next we need to shift the center peak of the synchronised sinc over one of the samples. Let's pick sample number 3 where sample number 0 is the first sample:

$$
y = sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right)
$$

Finally we need to adjust the height of the peak to be equal to that of that sample value. Since the peak has a value of one and all the other important points on the sinc function are equal to zero, then if the all we have to do is multiply the sinc function by the sample value:

$$
y = -0.3303 \times sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right)
$$

Great! Now we just need to do the same for all the other samples and add up all the sinc functions into a single function:

$$
y =
0 \times sinc\left(\frac{\pi (x - 0 \times 0.9)}{0.9}\right) +
0.7628 \times sinc\left(\frac{\pi (x - 1 \times 0.9)}{0.9}\right) +
-0.4309 \times sinc\left(\frac{\pi (x - 2 \times 0.9)}{0.9}\right) +
-0.3303 \times sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right) +
\dots
$$

Note that the point made at the top about how there is only one signal that passes through the given samples only applied when there is an infinite supply of such samples.

Now, the sinc function is only useful when you want to make use of all points together in order to infer any missing point. In practice, this is very computationally expensive and it would be better to instead use the points thatre close to the missing point rather than all the points. To do this, instead of the sinc function we use a truncated sinc function called a Lanczos filter:

$$
\text{lanczos}(x, a) = \begin{cases}
sinc(\pi x) \times sinc(\frac{\pi x}{a}) & -a < x < a \\
0 & \text{otherwise} \\
\end{cases}
$$

where $a$ is some positive whole number such as 2 and is the amount of points to use to the left and right of the point to infer.

Now we can do the same thing as before but with Lanczos instead of sinc, taking care that the sinc within the lanczos function already multiplies $x$ by $\pi$:

$$
y =
0 \times \text{lanczos}\left(\frac{x - 0 \times 0.9}{0.9}, 2\right) +
0.7628 \times \text{lanczos}\left(\frac{x - 1 \times 0.9}{0.9}, 2\right) +
-0.4309 \times \text{lanczos}\left(\frac{x - 2 \times 0.9}{0.9}, 2\right) +
-0.3303 \times \text{lanczos}\left(\frac{x - 3 \times 0.9}{0.9}, 2\right) +
\dots
$$

The Lanczos filter can be used in things like image enlargement. By treating each pixel in an image as a sample of points from a continuous field, values in between two existing pixels can be deduced by interpolation by reproducing the continuous field using the Lanczos interpolation and then picking missing values from between two pixels.

In general, for samples $s$ and a sample period of $T$,

$$y = \sum_i s[i] \times sinc\left(\frac{\pi (x - i \times T)}{T}\right)$$
$$y = \sum_i s[i] \times lanczos\left(\frac{x - i \times T}{T}\right)$$

Implementing a Gaussian blur filter together with convolution operation from scratch

2019-12-30T02:16:00.002+01:00

Gaussian blurring is a very common filter used in image processing which is useful for many things such as removing salt and pepper noise from images, resizing images to be smaller (downsampling), and simulating out-of-focus effects. Let's look at how to implement one ourselves.

Let's use the following grey-scale image as an example:

The image consists of many numbers that are arranged in a grid, each number called a pixel. The pixel values range from 0 to 255 where 0 is represented with a black colour and 255 with a white colour and all the other numbers are shades of grey in between. Here are the values of the 5x5 square of pixels in the top-left corner of the image:

[[156, 157, 160, 159, 158],
 [156, 157, 159, 158, 158],
 [158, 157, 156, 156, 157],
 [160, 157, 154, 154, 156],
 [158, 157, 156, 156, 157]]

In the case of a colour image, each pixel contains 3 values: the intensity of red, the intensity of blue, and the intensity of green. When these three primary colours are combined in different intensities they can produce any colour.

Let's use the following colour image as an example:

The values of the 5x5 square of pixels in the top-left corner of the image are:

[[[21, 13,  8], [21, 13,  9], [20, 11,  8], [21, 13, 11], [21, 14,  8]],
 [[21, 13,  7], [21, 13,  9], [20, 14,  7], [22, 14,  8], [20, 13,  7]],
 [[21, 14,  7], [23, 13, 10], [20, 14,  9], [21, 13,  9], [21, 15,  9]],
 [[21, 13,  6], [21, 13,  9], [21, 13,  8], [21, 13, 10], [20, 12,  5]],
 [[21, 13,  5], [21, 13,  7], [21, 13,  4], [21, 13,  7], [21, 13,  8]]]

The convolution operation

One of the basic operations in image processing is the convolution. This is when you replace every pixel value $p$ in an image with a weighted sum of the pixel values near $p$, that is, multiply each nearby pixel value with a different number before adding them all up and replacing $p$ with the result. The simplest example of a convolution is when 'nearby' means the pixel itself only. For example, if we convolve a 5x5 pixel greyscale image with a 1x1 weighting, this is what we get:

$$
\left[\begin{array}{ccccc}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15\\
16 & 17 & 18 & 19 & 20\\
21 & 22 & 23 & 24 & 25\\
\end{array}\right]
\circledast
\left[\begin{array}{c}
100
\end{array}\right]
=
\left[\begin{array}{ccccc}
1 \times 100 & 2 \times 100 & 3 \times 100 & 4 \times 100 & 5 \times 100\\
6 \times 100 & 7 \times 100 & 8 \times 100 & 9 \times 100 & 10 \times 100\\
11 \times 100 & 12 \times 100 & 13 \times 100 & 14 \times 100 & 15 \times 100\\
16 \times 100 & 17 \times 100 & 18 \times 100 & 19 \times 100 & 20 \times 100\\
21 \times 100 & 22 \times 100 & 23 \times 100 & 24 \times 100 & 25 \times 100\\
\end{array}\right]
=
\left[\begin{array}{ccccc}
100 & 200 & 300 & 400 & 500\\
600 & 700 & 800 & 900 & 1000\\
1100 & 1200 & 1300 & 1400 & 1500\\
1600 & 1700 & 1800 & 1900 & 2000\\
2100 & 2200 & 2300 & 2400 & 2500\\
\end{array}\right]
$$

Here we have replaced the pixel value 1 with 100, 2 with 200, etc. Note that in practice the numbers would be clipped to not exceed 255.

Only the pixel being replaced contributed to its new value. We can extend the range of the weighting by using a 3x3 weight instead of a single number which will combine all the pixels adjacent to the pixel being replaced.

$$
\left[\begin{array}{ccccc}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15\\
16 & 17 & 18 & 19 & 20\\
21 & 22 & 23 & 24 & 25\\
\end{array}\right]
\circledast
\left[\begin{array}{ccc}
100 & 200 & 100\\
300 & 400 & 300\\
100 & 200 & 100\\
\end{array}\right]
=
\left[\begin{array}{ccccc}

\left(\begin{array}{cccccc}
& 1 \times 100 & + & 2 \times 200 & + & 3 \times 100\\
+ & 6 \times 300 & + & 7 \times 400 & + & 8 \times 300\\
+ & 11 \times 100 & + & 12 \times 200 & + & 13 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 2 \times 100 & + & 3 \times 200 & + & 4 \times 100\\
+ & 7 \times 300 & + & 8 \times 400 & + & 9 \times 300\\
+ & 12 \times 100 & + & 13 \times 200 & + & 14 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 3 \times 100 & + & 4 \times 200 & + & 5 \times 100\\
+ & 8 \times 300 & + & 9 \times 400 & + & 10 \times 300\\
+ & 13 \times 100 & + & 14 \times 200 & + & 15 \times 100\\
\end{array}\right)
\\

\left(\begin{array}{cccccc}
& 6 \times 100 & + & 7 \times 200 & + & 8 \times 100\\
+ & 11 \times 300 & + & 12 \times 400 & + & 13 \times 300\\
+ & 16 \times 100 & + & 17 \times 200 & + & 18 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 7 \times 100 & + & 8 \times 200 & + & 9 \times 100\\
+ & 12 \times 300 & + & 13 \times 400 & + & 14 \times 300\\
+ & 17 \times 100 & + & 18 \times 200 & + & 19 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 8 \times 100 & + & 9 \times 200 & + & 10 \times 100\\
+ & 13 \times 300 & + & 14 \times 400 & + & 15 \times 300\\
+ & 18 \times 100 & + & 19 \times 200 & + & 20 \times 100\\
\end{array}\right)
\\

\left(\begin{array}{cccccc}
& 11 \times 100 & + & 12 \times 200 & + & 13 \times 100\\
+ & 16 \times 300 & + & 17 \times 400 & + & 18 \times 300\\
+ & 21 \times 100 & + & 22 \times 200 & + & 23 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 12 \times 100 & + & 13 \times 200 & + & 14 \times 100\\
+ & 17 \times 300 & + & 18 \times 400 & + & 19 \times 300\\
+ & 22 \times 100 & + & 23 \times 200 & + & 24 \times 100\\
\end{array}\right)
&
\left(\begin{array}{cccccc}
& 13 \times 100 & + & 14 \times 200 & + & 15 \times 100\\
+ & 18 \times 300 & + & 19 \times 400 & + & 20 \times 300\\
+ & 23 \times 100 & + & 24 \times 200 & + & 25 \times 100\\
\end{array}\right)
\\

\end{array}\right]
=
\left[\begin{array}{ccc}
12600 & 14400 & 16200\\
21600 & 23400 & 25200\\
30600 & 32400 & 34200\\
\end{array}\right]
$$

Note how the new image becomes smaller than the original image. This is in order to avoid using pixels values that lie outside of the image edges when computing new pixel values. One way to use pixel values outside of the image is to treat any pixel outside of the image as having a value of zero (zero padding). We can also reflect the image outside of its edges or wrap around to the other side of the image.

We can implement the above convolution operation in Python naively using for loops as follows:

def conv(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res_h = img_h - (wgt_h//2)*2
    res_w = img_w - (wgt_w//2)*2
    res = [ [ 0 for _ in range(res_w) ] for _ in range(res_h) ]

    for img_row in range(wgt_h//2, img_h-wgt_h//2):
        for img_col in range(wgt_w//2, img_w-wgt_w//2):
            new_pix = 0
            for wgt_row in range(wgt_h):
                for wgt_col in range(wgt_w-1, -1, -1): #Technically, a convolution operation needs to flip the weights left to right.
                    new_pix += wgt[wgt_row][wgt_col]*img[img_row+wgt_row-wgt_h//2][img_col+wgt_col-wgt_w//2]
            res[img_row - wgt_h//2][img_col - wgt_w//2] = new_pix

    return res

print(conv([
            [ 1, 2, 3, 4, 5],
            [ 6, 7, 8, 9,10],
            [11,12,13,14,15],
            [16,17,18,19,20],
            [21,22,23,24,25]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))

[[12600, 14400, 16200], [21600, 23400, 25200], [30600, 32400, 34200]]

I say naively because it is possible to perform a convolution using fewer loops and with parallel processing but that would be beyond the scope of this blog post. Note that in the code above we are flipping the weights before multiplying them because that is what a convolution involves, but since we'll be working with symmetric weights then it doesn't really matter.

If we want to use zero padding in our convolution then we would do it as follows:

def conv_pad(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res = [ [ 0 for _ in range(img_w) ] for _ in range(img_h) ]

    for img_row in range(img_h):
        for img_col in range(img_w):
            new_pix = 0
            for wgt_row in range(wgt_h):
                for wgt_col in range(wgt_w-1, -1, -1):
                    y = img_row+wgt_row-wgt_h//2
                    x = img_col+wgt_col-wgt_w//2
                    if 0 >= y > img_h and 0 >= x > img_w:
                        pix = img[y][x]
                    else:
                        pix = 0
                    new_pix += wgt[wgt_row][wgt_col]*pix
            res[img_row][img_col] = new_pix

    return res

print(conv_pad([
            [ 1, 2, 3, 4, 5],
            [ 6, 7, 8, 9,10],
            [11,12,13,14,15],
            [16,17,18,19,20],
            [21,22,23,24,25]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))

[[2900, 4800, 6200, 7600, 6100], [8300, 12600, 14400, 16200, 12500], [14800, 21600, 23400, 25200, 19000], [21300, 30600, 32400, 34200, 25500], [19900, 28800, 30200, 31600, 23100]]

We can adapt this function to be able to operate on colour images by simply applying the filter on each of the three intensity channels separately as follows:

def conv_pad_colour(img, wgt):
    img_h = len(img)
    img_w = len(img[0])
    wgt_h = len(wgt)
    wgt_w = len(wgt[0])

    res = [ [ [ 0, 0, 0 ] for _ in range(img_w) ] for _ in range(img_h) ]

    for img_row in range(img_h):
        for img_col in range(img_w):
            for channel in range(3):
                new_pix = 0
                for wgt_row in range(wgt_h):
                    for wgt_col in range(wgt_w-1, -1, -1):
                        y = img_row+wgt_row-wgt_h//2
                        x = img_col+wgt_col-wgt_w//2
                        if 0 <= y < img_h and 0 <= x < img_w:
                            pix = img[y][x][channel]
                        else:
                            pix = 0
                        new_pix += wgt[wgt_row][wgt_col]*pix
                res[img_row][img_col][channel] = new_pix

    return res

print(conv_pad_colour([
            [[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9], [10, 11, 12], [13, 14, 15]],
            [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],
            [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],
            [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],
            [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]]
        ], [
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ]))

[[[6700, 7700, 8700], [11600, 13000, 14400], [15800, 17200, 18600], [20000, 21400, 22800], [16300, 17300, 18300]], [[22300, 23600, 24900], [34200, 36000, 37800], [39600, 41400, 43200], [45000, 46800, 48600], [34900, 36200, 37500]], [[41800, 43100, 44400], [61200, 63000, 64800], [66600, 68400, 70200], [72000, 73800, 75600], [54400, 55700, 57000]], [[61300, 62600, 63900], [88200, 90000, 91800], [93600, 95400, 97200], [99000, 100800, 102600], [73900, 75200, 76500]], [[57700, 58700, 59700], [83600, 85000, 86400], [87800, 89200, 90600], [92000, 93400, 94800], [67300, 68300, 69300]]]

In practice we can use scipy's convolve function in order to apply a convolution:

import scipy.ndimage
import numpy as np
img = np.array([
            [[ 1,  2,  3], [ 4,  5,  6], [ 7,  8,  9], [10, 11, 12], [13, 14, 15]],
            [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],
            [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],
            [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],
            [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]]
        ])
weights = np.array([
            [100,200,100],
            [300,400,300],
            [100,200,100]
        ])
res = np.stack([ #Convolve each channel separately as a greyscale image and then combine them back into a single image.
        scipy.ndimage.convolve(img[:,:,channel], weights, mode='constant')
        for channel in range(3)
    ], axis=2)
print(res)

In signal processing, the weights are actually called a filter.

The gaussian blur

A blur is made by replacing each pixel value with the average value of its nearby pixels. This softens edges and more the image look out of focus. Averaging nearby pixels can be easily achieved by convolving with a filter of $\frac{1}{n}$ where $n$ is the amount of numbers in the filter. This results in making each pixel equal to $$p_1 \times \frac{1}{n} + p_2 \times \frac{1}{n} + \dots + p_n \times \frac{1}{n} = \frac{1}{n} \left(p_1 + p_2 + \dots + p_n\right)$$ where $p_i$ is a pixel value.

import scipy.ndimage
import skimage
import numpy as np

img = skimage.io.imread('image.png')
filter = np.ones([11, 11])
filter = filter/filter.size

res = np.stack([ #Convolve each channel separately as a greyscale image and then combine them back into a single image.
        scipy.ndimage.convolve(img[:,:,channel], filter, mode='constant')
        for channel in range(3)
    ], axis=2)

skimage.io.imshow(res)
skimage.io.show()

This is actually called a uniform blur, which is not considered an ideal blur as it gives equal importance to pixels that are far from the pixel being replaced as to pixels that are close. A better blur is a gaussian blur. A gaussian blur is when the filter values follow a gaussian bell curve with a maximum in the center and a steep decline just around the center. The gaussian function is defined as follows: $$G(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{x^2}{2\sigma^2}}$$ where $\sigma$ is a parameter for controlling how wide the bell curve is and therefore how much influence the nearby pixels have on the value of the pixel being replaced (the greater the $\sigma$, the more blurry). In two dimensions all you do is take the gaussian of each variable and multiply them together: $$G(x,y) = G(x) \times G(y)$$ Now the way the gaussian blur works is to use an infinite 2D gaussian curve as a filter for a convolution. The center of the filter is $G(0,0)$ and each cell in the filter being a distance of one in the gaussian function's domain as follows: $$ \newcommand\iddots{\mathinner{ \kern1mu\raise1pt{.} \kern2mu\raise4pt{.} \kern2mu\raise7pt{\Rule{0pt}{7pt}{0pt}.} \kern1mu }} \begin{array}{ccccc} \ddots & & \vdots & & \iddots\\ & G(-1, 1) & G( 0, 1) & G( 1, 1) & \\ \cdots & G(-1, 0) & G( 0, 0) & G( 1, 0) & \cdots\\ & G(-1,-1) & G( 0,-1) & G( 1,-1) & \\ \iddots & & \vdots & & \ddots\\ \end{array} $$ In order to be a perfect gaussian blur, the filter needs to be infinitely large, which is not possible. Fortunately, $G(3\sigma)$ is small enough to be rounded down to zero. This means that our kernel needs to only have a side of $\lceil 6\sigma + 1 \rceil$ at most as anything larger than that will not make a difference ($\lceil x \rceil$ means $x$ rounded up). The problem is that even though the missing values are negligible, the values in the filter will not equal 1 (especially for a small $\sigma$) which makes the resulting image will be either brighter or darker. To get around this we cheat a bit and divide each value in the filter by the filter's total, which results in a guassian blur with a neutral effect on the brightness of the image. $$ \begin{array}{ccccccc} \frac{G(-\lceil 3\sigma \rceil, \lceil 3\sigma \rceil)}{T} & & & \frac{G(0, \lceil 3\sigma \rceil)}{T} & & & \frac{G(\lceil 3\sigma \rceil, \lceil 3\sigma \rceil)}{T}\\ & \ddots & & \vdots & & \iddots & \\ & & \frac{G(-1, 1)}{T} & \frac{G( 0, 1)}{T} & \frac{G( 1, 1)}{T} & & \\ \frac{G(-\lceil 3\sigma \rceil, 0)}{T} & \cdots & \frac{G(-1, 0)}{T} & \frac{G( 0, 0)}{T} & \frac{G( 1, 0)}{T} & \cdots & \frac{G(\lceil 3\sigma \rceil, 1)}{T}\\ & & \frac{G(-1,-1)}{T} & \frac{G( 0,-1)}{T} & \frac{G( 1,-1)}{T} & & \\ & \iddots & & \vdots & & \ddots & \\ \frac{G(-\lceil 3\sigma \rceil, -\lceil 3\sigma \rceil)}{T} & & & \frac{G(0, -\lceil 3\sigma \rceil)}{T} & & & \frac{G(\lceil 3\sigma \rceil, -\lceil 3\sigma \rceil)}{T}\\ \end{array} $$ where $T$ is the total of all the $G(x,y)$ in the filter.

import scipy.ndimage
import skimage
import numpy as np

img = skimage.io.imread('image.png')

sigma = 4
G = lambda x: 1/np.sqrt(2*np.pi*sigma**2)*np.exp(-x**2/(2*sigma**2))
G2 = lambda x,y: G(x)*G(y)
limit = np.ceil(3*sigma).astype(np.int32)
filter = G2(*np.meshgrid(np.arange(-limit, limit+1), np.arange(-limit, limit+1)))
filter = weights/np.sum(filter)

res = np.stack([
        scipy.ndimage.convolve(img[:,:,channel], filter, mode='constant')
        for channel in range(3)
    ], axis=2)
skimage.io.imshow(res)
skimage.io.show()

Of course skimage has a gaussian blur filter out of the box for us to use:

import skimage

img = skimage.io.imread('image.png')

sigma = 4
res = skimage.filters.gaussian(img, sigma, mode='constant', truncate=3) #Truncate to 3 sigmas.
skimage.io.imshow(res)
skimage.io.show()

Texture Descriptors of Image Regions using Local Binary Patterns

2019-11-30T11:25:00.002+01:00

A very simple way to measure the similarity between the texture of regions in images is to capture the region's texture information in a vector using local binary patterns. These vectors are invariant to intensity changes, fast to compute, and very easy to understand. Let's see how this works. Note that since we're focusing on texture then we don't care about colour but about intensities, so the image needs to be greyscale.

The first step is to replace all the pixels in the image region with LBP (local binary pattern) codes. For every pixel, draw an imaginary circle of some radius centred on the pixel and draw some number of imaginary dots on that circle that are equally spaced apart. The radius and number of dots are to be hand tuned in order to maximise performance but they are usually set to a radius of 1 pixel and a number of dots of 8. Now pick the pixels that fall under those dots together with the pixel at the centre.

In the surrounding pixels, any pixel that has a lower intensity than the central pixel gets a zero and all the rest get a one. These numbers are put together in order to form a binary number called the LBP code of the central pixel.

Do this for every pixel in the region and then count how many times each code appears in the region, putting all the frequencies in a vector (a histogram). This is your texture descriptor and you can stop here. The more similar the vector is to other vectors, the more similar the texture of the image regions are.

We can do better than this however by making the codes uniform. If you do an analysis of these LBP code frequencies, you'll find that most of them have a most two bit changes in them. For example, 00011100 has two bit changes, one from a 0 to a 1 and another after from a 1 to a 0. On the other hand 01010101 has 7 changes. The reason why the majority will have at most two bit changes is because, most of the time, the pixels around a central pixel do not have random intensities but change in a flow as shown in the figure above. If there is a gradient of intensity changes in a single direction then there will only be two bit changes. This means that we can replace all LBP codes with more than two bit changes with a single LBP code that just means 'other'. This is called a uniform LBP code and it reduces our vector size significantly without much of a performance hit, which is good.

We can do even better by making the LBP codes rotation invariant, that is, using the same LBP code regardless of where the gradient direction is pointing. In the above figure, if the line of dark grey pixels were rotated, the LBP code would be changed to something like 00011110 for example or 11110000. The only thing that would change when rotating the surrounding pixels would be the bit rotation of the LBP code. We can make our codes invariant by artificially rotating our LBP codes so that for a given number of ones and zeros in the code we always have a canonical code. Usually we treat the binary code as an integer and rotate the bits until we get the minimum integer. For example, here are all the bit rotations of four 1s and four 0s:

11110000 (240), 01111000 (120), 00111100 (60), 00011110 (30), 00001111 (15)

In this case we would pick 00001111 as our canonical representation as that is the smallest integer. This drastically reduces our vector sizes as the amount of different possible LBP codes to find the frequency of will be a small fraction of the original full set of LBP codes.

You can use LBP descriptors using Python's skimage and numpy as follows:

import numpy as np
import skimage
import skimage.feature

image = skimage.io.imread('img.png', as_gray=True)
region = image[0:100,0:100]
lbp_codes = skimage.feature.local_binary_pattern(region, P=8, R=1, method='uniform') #8 dots, radius 1, rotation invariant uniform codes.
descriptor = np.histogram(lbp_codes, bins=10, range=(0, 10))[0] #Number of bins and range changes depending on the number of dots and LBP method used.

The number of bins and range needed can be obtained by artificially generating all the possible arrangements of darker and lighter pixels around a central pixel:

def get_histogram_params(dots, method):
    all_possible_lbp_codes = set()
    for i in range(2**8):
        str_bin = '{:0>8b}'.format(i)
        region = [ {'0':0,'1':2}[ch] for ch in str_bin ] #Surrounding pixels.
        region.insert(4, 1) #Central pixel.
        region = np.array(region).reshape((3,3))
        lbp_codes = skimage.feature.local_binary_pattern(region, dots, 1, method) #Radius is minimum to keep region size to a minimum.
        all_possible_lbp_codes.add(lbp_codes[1,1]) #Get code of central pixel and add it to the set (keeps on unique values).
    num_bins = len(all_possible_lbp_codes)
    rng = (min(all_possible_lbp_codes), max(all_possible_lbp_codes)+1)
    return (num_bins, rng)

You can get more information about the skimage function from here.

No square number can end in 2, 3, 7, or 8 (modulo of square numbers)

2019-10-30T23:01:00.002+01:00

Look at the following list of square numbers:

0^2:   0
 1^2:   1
 2^2:   4
 3^2:   9
 4^2:  16
 5^2:  25
 6^2:  36
 7^2:  49
 8^2:  64
 9^2:  81
10^2: 100
11^2: 121
12^2: 144
13^2: 169
14^2: 196
15^2: 225
16^2: 256
17^2: 289
18^2: 324
19^2: 361
20^2: 400
21^2: 441
22^2: 484
23^2: 529
24^2: 576
25^2: 625
26^2: 676
27^2: 729
28^2: 784
29^2: 841
30^2: 900

Notice how the last digit is always 0, 1, 4, 9, 6, or 5? There also seems to be a cyclic pattern of digits repeating themselves but we'll talk about that in another post. In this post, we'll just talk about why certain digits do not occur at the end of square numbers, which is useful in order to check if a number is a square or not.

We can get the last digit of a number by dividing it by 10 and keeping the remainder, also known as the number modulo 10:

0^2 mod 10: 0
 1^2 mod 10: 1
 2^2 mod 10: 4
 3^2 mod 10: 9
 4^2 mod 10: 6
 5^2 mod 10: 5
 6^2 mod 10: 6
 7^2 mod 10: 9
 8^2 mod 10: 4
 9^2 mod 10: 1
10^2 mod 10: 0
11^2 mod 10: 1
12^2 mod 10: 4
13^2 mod 10: 9
14^2 mod 10: 6
15^2 mod 10: 5
16^2 mod 10: 6
17^2 mod 10: 9
18^2 mod 10: 4
19^2 mod 10: 1
20^2 mod 10: 0
21^2 mod 10: 1
22^2 mod 10: 4
23^2 mod 10: 9
24^2 mod 10: 6
25^2 mod 10: 5
26^2 mod 10: 6
27^2 mod 10: 9
28^2 mod 10: 4
29^2 mod 10: 1
30^2 mod 10: 0

It turns out that this sort of exclusion of certain modulo results from square numbers happens with many other numbers apart from 10. Take 12 for example:

0^2 mod 12: 0
 1^2 mod 12: 1
 2^2 mod 12: 4
 3^2 mod 12: 9
 4^2 mod 12: 4
 5^2 mod 12: 1
 6^2 mod 12: 0
 7^2 mod 12: 1
 8^2 mod 12: 4
 9^2 mod 12: 9
10^2 mod 12: 4
11^2 mod 12: 1
12^2 mod 12: 0
13^2 mod 12: 1
14^2 mod 12: 4
15^2 mod 12: 9
16^2 mod 12: 4
17^2 mod 12: 1
18^2 mod 12: 0
19^2 mod 12: 1
20^2 mod 12: 4
21^2 mod 12: 9
22^2 mod 12: 4
23^2 mod 12: 1
24^2 mod 12: 0
25^2 mod 12: 1
26^2 mod 12: 4
27^2 mod 12: 9
28^2 mod 12: 4
29^2 mod 12: 1
30^2 mod 12: 0

With 12 you only get 0, 1, 4, and 9 as digits which is less than with 10. Why does this happen? We can prove that this is the case using modulo algebra. Let's call the number being squared $x$ and the number after the modulo $n$. Then by the distributive law of the modulo operation:

$x^2 \text{ mod } n = (x \times x) \text{ mod } n = ((x \text{ mod } n) \times (x \text{ mod } n)) \text{ mod } n = (x \text{ mod } n)^2 \text{ mod } n$

What the above derivation says is that when dividing a square number, the remainder will be based on the remainder of the root of the square number. In the case of the last digit of a square number, which is equivalent to the square number modulo 10, the last digit will be one of the last digits of the single digits squared: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81. This is because $x \text{ mod } 10$ gives you the last digit of the number $x$, which is a single digit, and then this is squared and the last digit of the square is the answer.

In the general case, this is setting a limit on the number of possible remainders (usually $x \text{ mod } n$ has $n$ different possible answers for all $x$). The number of different remainders is first limited to $n$ with the first inner modulo but then only the remainders resulting from these numbers squared can be returned. This means that if one of the square numbers has the same modulo as another square number, then the total number of final remainders has decreased.

In a future post we will investigate why the remainders of square numbers are cyclic.

Python functions for generating all substrings/sublists, combinations, permutations, and sequences

2019-09-26T22:14:00.000+02:00

def subs(xs, sub_len):
    '''Get all sublists/substrings from xs of the given sub-length.'''
    for i in range(len(xs) - sub_len + 1): #Get all indexes of the substrings that are at least sub_len long.
        yield xs[i:i+sub_len]

for x in subs('abcd', 2):
    print(x)

ab
bc
cd

def combinations(xs, comb_size):
    '''Get all combinations of the given combination size.'''
    if comb_size == 0:
        yield xs[:0] #Empty list/string
    else:
        for i in range(len(xs) - comb_size + 1):
            for ys in combinations(xs[i+1:], comb_size-1): #For every item, combine it with every item that comes after it.
                yield xs[i] + ys

for x in combinations('abcd', 2):
    print(x)

ab
ac
ad
bc
bd
cd

def perms(xs):
    '''Get all permutations.'''
    if len(xs) == 1:
        yield xs
    else:
        for i in range(len(xs)):
            for ys in perms(xs[:i] + xs[i+1:]): #For every item, combine it with every item except itself.
                yield xs[i] + ys

for x in perms('abc'):
    print(x)

abc
acb
bac
bca
cab
cba

def permutations(xs, perm_size):
    '''Get all permutations of the given permutation size.'''
    for cs in combinations(xs, perm_size):
        for p in perms(cs): #Get all the permutations of all the combinations.
            yield p

for x in permutations('abcd', 2):
    print(x)

ab
ba
ac
ca
ad
da
bc
cb
bd
db
cd
dc

def seqs(xs, seq_len):
    '''Get all sequences of a given length.'''
    if seq_len == 0:
        yield xs[:0] #Empty list/string
    else:
        for i in range(len(xs)):
            for ys in seqs(xs, seq_len-1): #For every item, combine it with every item including itself.
                yield xs[i] + ys

for x in seqs('abc', 2):
    print(x)

aa
ab
ac
ba
bb
bc
ca
cb
cc

The Fourier transform for real numbers (approximating complicated periodic functions with simple sinusoids)

2019-08-30T08:31:00.000+02:00

In a previous post we saw how to approximate complex functions using simple polynomials by using the Taylor series approximation method. This consisted in assuming that your complex function was an actual polynomial and then using mathematical tricks to tease out the coefficients of the polynomial one by one. Polynomials are useful functions to approximate curves but are terrible at approximating periodic functions such as $sin(x)$ (repeat themselves periodically) because they themselves are not periodic. In this post we'll see how to approximating periodic functions by using a sum of simple sinusoids (sines or cosines). With polynomials, we had a summation of powers of $x$, each of which having a different power. Now we will instead be having a summation of sinusoids, each of which having a different whole number frequency. With polynomials we had to find the coefficients, which are constants multiplied by the power of $x$, and any curve could be arbitrarily approximated by finding the right coefficients. Now we will be finding amplitudes, which are constants multiplied by each sinusoid, and any periodic function can be arbitrarily approximated by finding the right amplitudes.

Instead of the Taylor series approximation we will now be using the Fourier series approximation, also known as the Fourier transform. In order to keep things simple we will only be looking at real functions rather than complex ones (functions with imaginary components). The Fourier series assumes that the periodic function we're deconstructing, $f(x)$, begins its period at 0 and ends it at $T$, after which $f(x)$ will continue repeating itself such that $f(T + k) = f(k)$. If we'd like our period to start at $x_0$ instead then we can easily just replace our periodic function with $f'(x)$ such that $f'(x) = f(x + x_0)$ and then subtract $x_0$ from $x$ again after getting the approximate function.

Let's say that we have a complex periodic function where the first period is between 0 and 2 and is defined as $y = \cosh(x-1)$. This is what it looks like:

What we mean by this being a periodic function is that if we looked beyond the first period we would see the function looking like this:

We're assuming that our function consists of a summation of sinusoids with different frequencies and amplitudes such that each wavelength is equal to the period of the original function or a whole number division of that period (half, quarter, etc.). To make a sine or cosine have a period equal to $T$, we use $sin(x\frac{2\pi}{T})$ and $cos(x\frac{2\pi}{T})$. Multiplying $x$ by a whole number will change the frequency of the sinusoid such that the wavelength is equal to a whole number division of $T$. For example, $sin(2 x\frac{2\pi}{2})$ will make the sine function repeat itself twice when going from 0 to 2. This is what our sinusoid functions look like for both sine and cosine:

By adding up these sines and cosines together whilst having specific amplitudes (that we need to find) we can get closer and closer to the original periodic function.

The Fourier transform takes advantage of an interesting quirk of sinusoids and the area under their graph. For two positive whole numbers $n$ and $m$, if you multiply $\sin(n x\frac{2\pi}{T})$ by $\sin(m x\frac{2\pi}{T})$ or $\cos(n x\frac{2\pi}{T})$ by $\cos(m x\frac{2\pi}{T})$, the area under the graph between 0 and $T$ will always be zero, with the only exception being when $n = m$. If $n = m$ then the area will be equal to $\frac{T}{2}$. In other words,

$$
\int_{0}^{T} \sin\left(n x\frac{2\pi}{T}\right) \sin\left(m x\frac{2\pi}{T}\right) dx =
\begin{cases}
0& \text{if } n \neq m\\
\frac{T}{2}& \text{otherwise}
\end{cases}
$$
and likewise for $\cos$ instead of $\sin$. Here's an example showing the area under the graph of two cosines with different frequencies being multiplied together:

Note how each hump above the x-axis has a corresponding anti-hump below the x-axis which cancel each other out, resulting in a total area of zero. Here's an example showing the area under the graph of two cosines with the same frequency being multiplied together:

Note how the area is all above the x-axis. If we had to measure this area, it would be equal to $\frac{T}{2}$ where $T = 2$. Let's prove this for all frequencies.

Proof that the area under the product of two sines with different frequencies is 0:
$$
\begin{align}
\int_{0}^{T} \sin\left(n x\frac{2\pi}{T}\right) \sin\left(m x\frac{2\pi}{T}\right) dx
&= \int_{0}^{T} \frac{1}{2}\left(\cos\left(n x\frac{2\pi}{T} - m x\frac{2\pi}{T}\right) - \cos\left(n x\frac{2\pi}{T} + m x\frac{2\pi}{T}\right)\right) dx\\
&= \frac{1}{2}\left(\int_{0}^{T} \cos\left((n - m) \frac{2\pi}{T}x\right) dx - \int_{0}^{T} \cos\left((n + m) \frac{2\pi}{T}x\right) dx\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left[\sin\left((n - m) \frac{2\pi}{T}x\right)\right]_{0}^{T} - \frac{1}{(n + m) \frac{2\pi}{T}}\left[\sin\left((n + m) \frac{2\pi}{T}x\right)\right]_{0}^{T}\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) \frac{2\pi}{T}T\right) - \sin\left((n - m) \frac{2\pi}{T}0\right)\right) - \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) \frac{2\pi}{T}T\right) - \sin\left((n + m) \frac{2\pi}{T}0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) 2\pi\right) - \sin\left(0\right)\right) - \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) 2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(0 - 0\right) - \frac{1}{(n + m) \frac{2\pi}{T}}\left(0 - 0\right)\right)\\
&= 0\\
\end{align}
$$

Proof that the area under the product of two sines with the same frequencies is half their period:
$$
\begin{align}
\int_{0}^{T} \sin\left(n x\frac{2\pi}{T}\right) \sin\left(n x\frac{2\pi}{T}\right) dx
&= \int_{0}^{T} \sin^2\left(n x\frac{2\pi}{T}\right) dx\\
&= \int_{0}^{T} \frac{1}{2}\left(1 - \cos\left(2n x\frac{2\pi}{T}\right)\right) dx\\
&= \frac{1}{2}\left(\int_{0}^{T} 1 dx - \int_{0}^{T}\cos\left(2n\frac{2\pi}{T} x\right) dx\right)\\
&= \frac{1}{2}\left(\left[x \right]_{0}^{T} - \frac{1}{2n\frac{2\pi}{T}}\left[\sin\left(2n\frac{2\pi}{T} x\right)\right]_{0}^{T}\right)\\
&= \frac{1}{2}\left(\left(T - 0\right) - \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2n\frac{2\pi}{T} T\right) - \sin\left(2n\frac{2\pi}{T} 0\right)\right)\right)\\
&= \frac{1}{2}\left(T - \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2n2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(T - \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(T - \frac{1}{2n\frac{2\pi}{T}}\left(0 - 0\right)\right)\\
&= \frac{T}{2}\\
\end{align}
$$

Proof that the area under the product of two cosines with different frequencies is 0:
$$
\begin{align}
\int_{0}^{T} \cos\left(n x\frac{2\pi}{T}\right) \cos\left(m x\frac{2\pi}{T}\right) dx
&= \int_{0}^{T} \frac{1}{2}\left(\cos\left(n x\frac{2\pi}{T} - m x\frac{2\pi}{T}\right) + \cos\left(n x\frac{2\pi}{T} + m x\frac{2\pi}{T}\right)\right) dx\\
&= \frac{1}{2}\left(\int_{0}^{T} \cos\left((n - m) \frac{2\pi}{T}x\right) dx + \int_{0}^{T} \cos\left((n + m) \frac{2\pi}{T}x\right) dx\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left[\sin\left((n - m) \frac{2\pi}{T}x\right)\right]_{0}^{T} + \frac{1}{(n + m) \frac{2\pi}{T}}\left[\sin\left((n + m) \frac{2\pi}{T}x\right)\right]_{0}^{T}\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) \frac{2\pi}{T}T\right) - \sin\left((n - m) \frac{2\pi}{T}0\right)\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) \frac{2\pi}{T}T\right) - \sin\left((n + m) \frac{2\pi}{T}0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) 2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) 2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(0 - 0\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(0 - 0\right)\right)\\
&= 0\\
\end{align}
$$

Proof that the area under the product of two cosines with the same frequencies is half their period:
$$
\begin{align}
\int_{0}^{T} \cos\left(n x\frac{2\pi}{T}\right) \cos\left(n x\frac{2\pi}{T}\right) dx
&= \int_{0}^{T} \cos^2\left(n x\frac{2\pi}{T}\right) dx\\
&= \int_{0}^{T} \frac{1}{2}\left(1 + \cos\left(2n x\frac{2\pi}{T}\right)\right) dx\\
&= \frac{1}{2}\left(\int_{0}^{T} 1 dx + \int_{0}^{T}\cos\left(2n\frac{2\pi}{T} x\right) dx\right)\\
&= \frac{1}{2}\left(\left[x \right]_{0}^{T} + \frac{1}{2n\frac{2\pi}{T}}\left[\sin\left(2n\frac{2\pi}{T} x\right)\right]_{0}^{T}\right)\\
&= \frac{1}{2}\left(\left(T - 0\right) + \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2n\frac{2\pi}{T} T\right) - \sin\left(2n\frac{2\pi}{T} 0\right)\right)\right)\\
&= \frac{1}{2}\left(T + \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2n2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(T + \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(T + \frac{1}{2n\frac{2\pi}{T}}\left(0 - 0\right)\right)\\
&= \frac{T}{2}\\
\end{align}
$$

Also interestingly, proof that the area under the product of a sine and cosine, regardless of frequency, is 0:
$$
\begin{align}
\int_{0}^{T} \sin\left(n x\frac{2\pi}{T}\right) \cos\left(m x\frac{2\pi}{T}\right) dx
&= \int_{0}^{T} \frac{1}{2}\left(\sin\left(n x\frac{2\pi}{T} + m x\frac{2\pi}{T}\right) + \sin\left(n x\frac{2\pi}{T} - m x\frac{2\pi}{T}\right)\right) dx\\
&= \frac{1}{2}\left(\int_{0}^{T} \sin\left((n + m) \frac{2\pi}{T}x\right) dx + \int_{0}^{T} \sin\left((n - m) \frac{2\pi}{T}x\right) dx\right)\\
&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left[\sin\left((n + m) \frac{2\pi}{T}x\right)\right]_{0}^{T} + \frac{1}{(n - m) \frac{2\pi}{T}}\left[\sin\left((n - m) \frac{2\pi}{T}x\right)\right]_{0}^{T}\right)\\
&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) \frac{2\pi}{T}T\right) - \sin\left((n + m) \frac{2\pi}{T}0\right)\right) + \frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) \frac{2\pi}{T}T\right) - \sin\left((n - m) \frac{2\pi}{T}0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) 2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) 2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\
&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left(0 - 0\right) + \frac{1}{(n - m) \frac{2\pi}{T}}\left(0 - 0\right)\right)\\
&= 0\\
\end{align}
$$

Great! Now we can get back to the Fourier transform. Suppose that we have the following sum of sines and cosines:
$$
\begin{align}
f(x) = &a_1 \cos\left(1 x\frac{2\pi}{T}\right) + b_1 \sin\left(1 x\frac{2\pi}{T}\right) + \\
&a_2 \cos\left(2 x\frac{2\pi}{T}\right) + b_2 \sin\left(2 x\frac{2\pi}{T}\right) + \\
&\dots
\end{align}
$$

How can we tease out the individual amplitudes $a_i$ and $b_i$? Thanks to the identities we proved above, we can now get any amplitude we want by multiplying the function by a sine or cosine, finding the area under the graph, and dividing the area by $\frac{T}{2}$. Here's how it works:
$$
\begin{align}
\frac{2}{T}\int_0^T \cos\left(n x\frac{2\pi}{T}\right) f(x) dx
&= \frac{1}{2T}\int_0^T \cos\left(n x\frac{2\pi}{T}\right)
\begin{pmatrix}
&a_1 cos\left(1 x\frac{2\pi}{T}\right) & + b_1 sin\left(1 x\frac{2\pi}{T}\right) + \\
&a_2 cos\left(2 x\frac{2\pi}{T}\right) & + b_2 sin\left(2 x\frac{2\pi}{T}\right) + \\
&\dots&
\end{pmatrix}
dx\\
&= \frac{2}{T}
\begin{pmatrix}
&a_1 \int_0^T\cos\left(n x\frac{2\pi}{T}\right) cos\left(1 x\frac{2\pi}{T}\right) dx & + b_1 \int_0^T\cos\left(n x\frac{2\pi}{T}\right) sin\left(1 x\frac{2\pi}{T}\right) dx + \\
&\dots&\\
&a_n \int_0^T\cos\left(n x\frac{2\pi}{T}\right) cos\left(n x\frac{2\pi}{T}\right) dx & + b_n \int_0^T\cos\left(n x\frac{2\pi}{T}\right) sin\left(n x\frac{2\pi}{T}\right) dx + \\
&\dots&
\end{pmatrix}\\
&= \frac{2}{T}
\begin{pmatrix}
&a_1 0 & + b_1 0 + \\
&\dots&\\
&a_n \frac{T}{2} & + b_n 0 + \\
&\dots&
\end{pmatrix}\\
&= \frac{2}{T} a_n \frac{T}{2}\\
&= a_n
\end{align}
$$

And obviously multiplying by $\sin\left(n x\frac{2\pi}{T}\right)$ would yield the amplitude of a sine. All that's left is by how much to vertically lift the graph, that is, the constant term to add to the sinusoids. The sinusoids should oscillate around the average value of the original function in one period, which is found as follows:

$y_0 = \frac{1}{T} \int_0^T f(x) dx$

Getting back to $f(x) = \cosh(x-1)$, here are the amplitudes:

$y_0 = \frac{1}{2} \int_0^2 f(x) dx = 1.1752$
$a_1 = \frac{2}{2} \int_0^2 \cos\left(1 x\frac{2\pi}{2}\right) f(x) dx = 0.2162$
$b_1 = \frac{2}{2} \int_0^2 \sin\left(1 x\frac{2\pi}{2}\right) f(x) dx = 0.0$
$a_2 = \frac{2}{2} \int_0^2 \cos\left(2 x\frac{2\pi}{2}\right) f(x) dx = 0.0581$
$b_2 = \frac{2}{2} \int_0^2 \sin\left(2 x\frac{2\pi}{2}\right) f(x) dx = 0.0$

And here are the graphs of the approximated function getting better with every new term (where the approximated function is denoted as $\hat{f}$):

$\hat{f}(x) = 1.1752$

$\hat{f}(x) = 1.1752 + 0.2162 \cos\left(1 x\frac{2\pi}{2}\right)$

$\hat{f}(x) = 1.1752 + 0.2162\cos\left(1 x\frac{2\pi}{2}\right) + 0.0581\cos\left(2 x\frac{2\pi}{2}\right)$

And of course here is the function beyond the first period:

Recognising simple shapes with OpenCV in Python

2019-07-31T11:45:00.000+02:00

Convolutional neural networks are usually used to recognise objects in images but that's a bit of an overkill if you just want to recognise simple flat shapes. Here's how to use OpenCV to do simple object recognition.

Basically the trick to recognise polygons is to convert your image into an approximate polygon representation using something like edge detection and then count the number of sides in the polygon. OpenCV handles a lot of this stuff for you so its quite easy.

Here's the image we will be working on:

The first step is to binarise the pixels of the image, that is they are made either black or white. First the image must be turned into greyscale so that there is just one number per pixel and then anything that is not white (greyscale value 255) is replaced with 255 (white) whilst pure white is replaced with 0 (black).

import cv2
import numpy as np

img = cv2.imread('polygons.png', cv2.IMREAD_COLOR)

#Make image greyscale
grey = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
cv2.imshow('greyscale', grey)

#Binarise greyscale pixels
(_, thresh) = cv2.threshold(grey, 254, 255, 1)
cv2.imshow('thresholded', thresh)

Next we're going to find contours around these white shapes, that is, reduce the image into a set of points with each point being a corner in the shape. This is done using the findContours function which returns a list consisting of contour points for every shape and their hierarchy. The hierarchy is the way each shape is nested within other shapes. We won't need this here since we don't have any shapes within shapes but it will be useful in other situations.

#Get shape contours
(contours, hierarchy) = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
thresh_copy = cv2.cvtColor(thresh, cv2.COLOR_GRAY2RGB)
cv2.drawContours(thresh_copy, contours, contourIdx=-1, color=(0, 255, 0), thickness=2)
print('number of sides per shape:')
for contour in contours:
    print('', contour.shape[0])
print()
cv2.imshow('contours', thresh_copy)

number of sides per shape:
 119
 122
 4
 124

Unfortunately, the number of sides extracted from each shape is nowhere near what it should be. This is because the corners in the shapes are rounded which results in multiple sides around the corner. We therefore need to simplify these contours so that insignificant sides can be removed. This is done using the Ramer–Douglas–Peucker algorithm which is implemented as the approxPolyDP function. The algorithm requires a parameter called epsilon which controls how roughly to chop up the polygon into a smaller number of sides. This number is dependent on the size of the polygon so we make it a fraction of the shape's perimeter.

#Simplify contours
thresh_copy = cv2.cvtColor(thresh, cv2.COLOR_GRAY2RGB)
print('number of sides per shape:')
for contour in contours:
    perimeter = cv2.arcLength(contour, True)
    e = 0.05*perimeter #The bigger the fraction, the more sides are chopped off the original polygon
    contour = cv2.approxPolyDP(contour, epsilon=e, closed=True)
    cv2.drawContours(thresh_copy, [ contour ], contourIdx=-1, color=(0, 255, 0), thickness=2)
    print('', contour.shape[0])
cv2.imshow('simplified contours', thresh_copy)

number of sides per shape:
 6
 5
 4
 3

And with this we have the number of sides in each polygon.

If you want to check for circles as well as polygons then you will not be able to do so by counting sides. Instead you can get the minimum enclosing circle around a contour and check if its area is close to the area of the contour (before it is simplified):

((centre_x, centre_y), radius) = cv2.minEnclosingCircle(contour)
if cv2.contourArea(contour)/(np.pi*radius**2) > 0.95:
    print('circle')

The minimum enclosing circle is the smallest circle that completely contains the contour. Therefore it's area must necessarily be larger or equal to the shape of the contour, which can only be equal in the case that the contour is a circle.

This is also one way how you can get the position of each shape, by getting the centre point of the enclosing circle. The proper way to get a single coordinate representing the position of the contour is to get the centre of gravity using the contour's moments:

m = cv2.moments(contour)
x = m['m10']/m['m00']
y = m['m01']/m['m00']

Here's the full code:

import cv2
import numpy as np

img = cv2.imread('polygons.png', cv2.IMREAD_COLOR)

#Binarise pixels
grey = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
(_, thresh) = cv2.threshold(grey, 254, 255, 1)

#Get shape contours
(contours, hierarchy) = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

#Recognise shapes
for contour in contours:
    m = cv2.moments(contour)
    x = round(m['m10']/m['m00'])
    y = round(m['m01']/m['m00'])

    shape_name = None

    (_, radius) = cv2.minEnclosingCircle(contour)
    if cv2.contourArea(contour)/(np.pi*radius**2) > 0.95:
        shape_name = 'circle'
    else:
        e = 0.05*cv2.arcLength(contour, True)
        simple_contour = cv2.approxPolyDP(contour, epsilon=e, closed=True)
        num_sides = simple_contour.shape[0]
        shape_name = { 3: 'triangle', 4: 'quad', 5: 'pentagon', 6: 'hexagon' }.get(num_sides, 'unknown')

    cv2.putText(img, shape_name, (x, y), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.6, color=(0, 255, 0), thickness=2)

cv2.imshow('named shapes', img)