tag:blogger.com,1999:blog-43189444598231434732020-07-08T21:01:59.303+02:00Geeky is AwesomeTo learn is to teach.Unknownnoreply@blogger.comBlogger128125tag:blogger.com,1999:blog-4318944459823143473.post-35374338230829142672020-06-30T21:53:00.002+02:002020-06-30T21:53:46.905+02:00Git command line cheat sheetAfter my <a href="https://geekyisawesome.blogspot.com/2020/05/quick-git-command-line-tutorial.html">previous blog post</a> about introducing the git command line, here is a table of all the commands I regularly use in git:<br /><br /><div class="sectitle">The basics</div><br /><table border="1"><tr><th>Command</th><th>Explanation</th></tr> <tr><td><pre>git init .</pre></td><td>Make the current directory a git repository.</td></tr> <tr><td><pre>git status</pre></td><td>Get information about the current state of the repository.</td></tr> <tr><td><pre>git add (path)</pre></td><td>Add a modified file to the git index. Use '.' for path in order to add all modified files.</td></tr> <tr><td><pre>git rm (path)</pre></td><td>Remove a modified file from the git index.</td></tr> <tr><td><pre>git diff</pre></td><td>Get the file changes from what is in the git index to what is in the currently modified file.</td></tr> <tr><td><pre>git commit</pre></td><td>Take a snapshot backup of the current index.</td></tr> <tr><td><pre>git log</pre></td><td>Get a list of commits made.</td></tr> <tr><td><pre>git tag -a "(tag name)"</pre></td><td>Tag the last commit with a special name (can be used to mark version numbers).</td></tr> <tr><td><pre>git tag</pre></td><td>List existing tags.</td></tr> <tr><td><pre>git reset --hard HEAD</pre></td><td>Return to the last commit (permanently remove any uncommitted changes).</td></tr> </table><br /><br /><div class="sectitle">Branches</div><br /><table border="1"><tr><th>Command</th><th>Explanation</th></tr> <tr><td><pre>git branch (branch name)</pre></td><td>Create a new branch.</td></tr> <tr><td><pre>git branch --list</pre></td><td>List existing branches.</td></tr> <tr><td><pre>git branch --delete (branch name)</pre></td><td>Delete an existing branch.</td></tr> <tr><td><pre>git checkout (branch name)</pre></td><td>Jump to a different branch.</td></tr> <tr><td><pre>git merge (branch name)</pre></td><td>Merge the given branch into the current branch so that they become one branch.</td></tr> <tr><td><pre>git stash push</pre></td><td>Stash any file changes without committing them in order to jump to another branch.</td></tr> <tr><td><pre>git stash pop</pre></td><td>Retrieve back stashed file changes.</td></tr> <tr><td><pre>git log --graph</pre></td><td>Like log but will show how commits in different branches are connected together.</td></tr> </table>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-36067830653856672832020-05-31T11:52:00.001+02:002020-06-11T16:08:43.698+02:00Quick Git command line tutorialGit is a version control system which facilitates the creation and maintenance of versions in a project. Usually it's used for software code but it can also be used for things like documents. Its most useful for anything that is text based as you will be able to see which lines have changed across versions.<br /><br />Given that you've <a href="https://git-scm.com/downloads">installed git</a>, create folder that will store your project, open your terminal (cmd in Windows), navigate to the folder, and turn it into a repository by entering:<br /><br /><pre>git init .</pre><br />This will create a folder called ".git" in your current directory which lets you do repository stuff to it. Now whenever you want to do repository stuff, just open your terminal and navigate back to the folder. To see this we can ask for a status report of the repo by entering:<br /><br /><pre>git status</pre><br />This will output the following:<br /><br /><pre style="background-color:black;color:white;">On branch master<br /><br />Initial commit<br /><br />nothing to commit (create/copy files and use "git add" to track)<br /></pre><br />This is basically telling us that the repo is empty. Now we start putting stuff there. Let's create a readme file using <a href="https://daringfireball.net/projects/markdown/basics">markdown</a> with the file name readme.md:<br /><br /><pre style="border:black 1px solid;">My first git repo<br />=================<br /><br />This is my readme file.<br /><br /></pre><br />After saving, if we go back to the terminal and enter:<br /><br /><pre>git status</pre><br />we will now see:<br /><br /><pre style="background-color:black;color:white;">On branch master<br /><br />Initial commit<br /><br />Untracked files:<br /> (use "git add <file>..." to include in what will be committed)<br /><br /> readme.md<br /><br />nothing added to commit but untracked files present (use "git add" to track)<br /></pre><br />This is saying that readme.md is a file in the repo folder that is not being kept track of by git. We can add this file to the git index by entering:<br /><br /><pre>git add readme.md</pre><br />After asking for the status again, we will now see:<br /><br /><pre style="background-color:black;color:white;">On branch master<br /><br />Initial commit<br /><br />Changes to be committed:<br /> (use "git rm --cached <file>..." to unstage)<br /><br /> new file: readme.md<br /></pre><br />Now the file is in the index and is 'staged'. If we update the file and save it again:<br /><br /><pre style="border:black 1px solid;">My first git repo<br />=================<br /><br />This is my updated readme file.<br /><br /></pre><br />the status will now say:<br /><br /><pre style="background-color:black;color:white;">On branch master<br /><br />Initial commit<br /><br />Changes to be committed:<br /> (use "git rm --cached <file>..." to unstage)<br /><br /> new file: readme.md<br /><br />Changes not staged for commit:<br /> (use "git add <file>..." to update what will be committed)<br /> (use "git checkout -- <file>..." to discard changes in working directory)<br /><br /> modified: readme.md<br /></pre><br />This is saying that we have a staged file that has also been modified. We can check what has been changed in the file since we last staged it by entering:<br /><br /><pre>git diff</pre><br /><pre style="background-color:black;color:white;">diff --git a/readme.md b/readme.md<br />index 75db517..5bdd78c 100644<br />--- a/readme.md<br />+++ b/readme.md<br />@@ -1,4 +1,4 @@<br /> My first git repo<br /> =================<br /><br />-This is my readme file.<br />+This is my updated readme file.<br /></pre><br />Diff shows you all the lines that were changed together with some unchanged lines next to them for context. The '-' line was replaced by the '+' line. We're also told in "@@ -1,4 +1,4 @@" that the line number that was changed was 4 (1 line was removed at line 4, and another line was added at line 4).<br /><br />Now we stage this modification so that it is also kept track of:<br /><br /><pre>git add readme.md</pre><br /><pre style="background-color:black;color:white;">On branch master<br /><br />Initial commit<br /><br />Changes to be committed:<br /> (use "git rm --cached <file>..." to unstage)<br /><br /> new file: readme.md<br /></pre><br />Staging changes is not the point of repositories. The point is committing the staged changes. A commit is a backup of the currently indexed files. When you take a backup, you make a copy of your project folder and give it a name so that you can go back to it. This is what a commit does. Enter:<br /><br /><pre>git commit</pre><br />This will open up a text editor so you can enter a description of what you're committing.<br /><ul><li>If the text editor is <a href="https://www.vim.org/">vim</a>, which you will know because it is not helpful at all, you need to first press 'i' for 'insert' before you type anything. To save, press ESC, followed by ':', followed by 'w', followed by enter. To exit, press ESC, followed by ':', followed by 'q', followed by enter.</li><li>If it's nano then just follow the on screen commands, noting that '^' means CTRL.</li></ul><br />The commit message you write can later be searched in order to find a particular commit. Note that commits are generally thought of as being unchangable after they are made, so make sure you write everything you need to. The first line of the commit message has special importance and is considered to be a summary of what has changed. You should keep it under 50 characters long so that it can be easily displayed in a table. It should be direct and concise (no full stops at the end, for example), with the rest of the lines under it being a more detailed description to make up for any missing information in the first line. A blank line needs to separate the first line from the rest of the lines. Note that it's fine to only have the first line is it's enough.<br /><br /><pre style="border:black 1px solid;">added the word 'updated' to readme<br /><br />readme.md was updated so that it says that it is my updated readme file.<br /># Please enter the commit message for your changes. Lines starting<br /># with '#' will be ignored, and an empty message aborts the commit.<br /># On branch master<br />#<br /># Initial commit<br />#<br /># Changes to be committed:<br /># new file: readme.md<br /></pre><br />The lines with the # in front were written by git and should be left there. If we check the status again we will now see:<br /><br /><pre style="background-color:black;color:white;">On branch master<br />nothing to commit, working tree clean<br /></pre><br />This is saying that there were no changes since the last commit. We can now see all of our commits by entering:<br /><br /><pre>git log</pre><br /><pre style="background-color:black;color:white;">Author: mtanti <redacted@gmail.com><br />Date: Sat May 30 10:46:17 2020 +0200<br /><br /> added the word 'updated' to readme<br /><br /> readme.md was updated so that it says that it is my updated readme file.<br /></pre><br />We can see who made the commit, when, and what message was written. It's important that commits are done frequently but on complete changes. A commit is not a save (you do not commit each time you save), it is a backup, and the state of your project in each backup should make sense. Think of it as if you're crossing out items in a todo list and with each item crossed out you're taking a backup. You should not take a backup in between items. On the other hand, your items should be broken down into many simple tasks in order to be able to finish each one quickly.<br /><br />Now, let's add a folder called 'src' to our repo and then check the status.<br /><br /><pre style="background-color:black;color:white;">On branch master<br />nothing to commit, working tree clean<br /></pre><br />In git's eyes, nothing has changed because git does not have a concept of a folder, only of the directory of files. We need to put a file in the folder in order to be able to add it to git's index. Let's add 2 text files to src: 'a.txt' and 'b.txt' with the following content each:<br /><br /><pre style="border:black 1px solid;">line 1<br />line 2<br />line 3<br /><br /></pre><br />The status now shows:<br /><br /><pre style="background-color:black;color:white;">On branch master<br />Untracked files:<br /> (use "git add <file>..." to include in what will be committed)<br /><br /> src/<br /><br />nothing added to commit but untracked files present (use "git add" to track)<br /></pre><br />This is saying that we have a new folder called 'src' with some files in it. We can add the folder by using the add command. If you want you can avoid having to include the file names of the files you're adding by just using a '.', which means "all unstaged modified or new files":<br /><br /><pre>git add .</pre><br /><pre style="background-color:black;color:white;">On branch master<br />Changes to be committed:<br /> (use "git reset HEAD <file>..." to unstage)<br /><br /> new file: src/a.txt<br /> new file: src/b.txt<br /></pre><br />Let's commit these two files.<br /><br /><pre>git commit</pre><br /><pre style="border:black 1px solid;">added source files<br /><br /># Please enter the commit message for your changes. Lines starting<br /># with '#' will be ignored, and an empty message aborts the commit.<br /># On branch master<br /># Changes to be committed:<br /># new file: src/a.txt<br /># new file: src/b.txt<br /></pre><br />Note that I didn't add anything else to the message other than the first line. There is no need to specify what files were added as they can be seen in the commit. Now let's look at the log of commits:<br /><br /><pre>git log</pre><br /><pre style="background-color:black;color:white;">commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (HEAD -> master)<br />Author: mtanti <redacted@gmail.com><br />Date: Sat May 30 11:17:14 2020 +0200<br /><br /> added source files<br /><br />commit f71f17b63c6b3ddb7506000cbc422e8f1b173958<br />Author: mtanti <redacted@gmail.com><br />Date: Sat May 30 10:46:17 2020 +0200<br /><br /> added the word 'updated' to readme<br /><br /> readme.md was updated so that it says that it is my updated readme file.<br /></pre><br />We can see how all the commits are shown in descending order of when they were made. You might be wondering what 'HEAD' is referring to. HEAD is the commit we are working on. We can now move in between commits and move in time. This is very useful if you start working on something and realise that there was a better way to do it and need to undo all your work up to a particular point in the commit history. When we do this, we would be moving the HEAD to a different commit. The HEAD can be moved by using the checkout command:<br /><br /><pre>git checkout HEAD~1</pre><br />This is saying "go back one commit behind the HEAD". The command gives the following output:<br /><br /><pre style="background-color:black;color:white;">Note: checking out 'HEAD~1'.<br /><br />You are in 'detached HEAD' state. You can look around, make experimental<br />changes and commit them, and you can discard any commits you make in this<br />state without impacting any branches by performing another checkout.<br /><br />If you want to create a new branch to retain commits you create, you may<br />do so (now or later) by using -b with the checkout command again. Example:<br /><br /> git checkout -b <new-branch-name><br /><br />HEAD is now at f71f17b... added the word 'updated' to readme<br /></pre><br />This is saying that we are now in the commit with the message "added the word 'updated' to readme". It is also possible to use the hash as a commit identifier in order to just to it directly without being relative to the HEAD. The hash is the 40 digit hexadecimal next to the commits in the log. For example, the first commit had a hash of 'f71f17b63c6b3ddb7506000cbc422e8f1b173958' so we could have entered "git checkout f71f17b63c6b3ddb7506000cbc422e8f1b173958". We can also just use the first 7 digits to avoid typing everything but it's more likely that there will be a collision with another hash.<br /><br />If look at the log now, we'll see:<br /><br /><pre style="background-color:black;color:white;">commit f71f17b63c6b3ddb7506000cbc422e8f1b173958 (HEAD)<br />Author: mtanti <redacted@gmail.com><br />Date: Sat May 30 10:46:17 2020 +0200<br /><br /> added the word 'updated' to readme<br /><br /> readme.md was updated so that it says that it is my updated readme file.<br /></pre><br />which shows that the HEAD has been moved one commit behind in the timeline.<br /><br />Now if we look at the project folder (and refresh), we'll see that the folder 'src' has been physically removed. We can restore it by moving forward in time and go to the latest commit which includes the 'src' folder.<br /><br />Unfortunately, there is no direct notation for moving forward in time as what we just did is not normal usage of git. Note that at the moment the HEAD is said to be 'detached', which means that it is not in a proper place (the end of the timeline). We can get back to the proper place we should be in by checking out to 'master'.<br /><br /><pre>git checkout master</pre><br />Checking the log, we now see:<br /><br /><pre style="background-color:black;color:white;">commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (HEAD, master)<br />Author: mtanti <redacted@gmail.com><br />Date: Sat May 30 11:17:14 2020 +0200<br /><br /> added source files<br /><br />commit f71f17b63c6b3ddb7506000cbc422e8f1b173958<br />Author: mtanti <redacted@gmail.com><br />Date: Sat May 30 10:46:17 2020 +0200<br /><br /> added the word 'updated' to readme<br /><br /> readme.md was updated so that it says that it is my updated readme file.<br /></pre><br />So what is this 'master' business? What we were calling timelines are actually called 'branches' in git, and branches are one of the most important things in git. Imagine you've started working on a new feature in your program. Suddenly you are told to let go of everything you're doing, work on fixing a bug, and quickly release an update right away. The feature you're working on is half way done and you can't release an updated version of the program with a half finished function; but there's no way you'll finish the feature quickly enough. Do you undo all the work you did on the feature so that you're at a stable version of the program and able to fix the bug? Of course not. That's what branches are for.<br /><br />With branches you can keep several versions of your code available and switch from one to the other with checkout. The master branch is the one you start with. Ideally the master's last commit should always be in a publishable state (no 'work in progress'). Of course if you're just working on the master branch then this would not be possible without taking committing very rarely, which is bad practice. The solution is to have several development branches on which you modify the project bit by bit. Every time you start working on a new publishable version of your project, you start a new branch and work on modifying your project to create the new version. Once you finish what you're doing and are happy with the result, you then merge the development branch into the master branch. If you're in the middle of something and need to fix a bug, you switch to the master branch, create a new branch for fixing the bug, fix it, and merge it to the master. Then you merge the new master code with your earlier development branch so that you can continue working as if nothing happened.<br /><br />Let's make a development branch. Enter the following:<br /><br /><pre>git branch mydevbranch</pre><br />This will create a branch called 'mydevbranch' that sticks out from the current branch at the current HEAD, that is, it will create a branch that sticks out from master's last commit. By 'sticks out' I mean that when you switch to mydevbranch, the project will be changed to look like it was at the commit from where the branch is starting from. Alternatively you can include a commit hash after the name of the branch in order to make it stick out from an earlier point in the current branch. For example "git branch mydevbranch f71f17b63c6b3ddb7506000cbc422e8f1b173958" will create a branch from the first commit.<br /><br />We can see a list of branches by entering:<br /><br /><pre>git branch --list</pre><br /><pre style="background-color:black;color:white;">* master<br /> mydevbranch<br /></pre><br />The asterisk shows which branch is currently active (has the HEAD).<br /><br />Now switch to the new branch using checkout:<br /><br /><pre>git checkout mydevbranch</pre><br />and check the status:<br /><br /><pre style="background-color:black;color:white;">On branch mydevbranch<br />nothing to commit, working tree clean<br /></pre><br />It now says that we're on mydevbranch instead of on master. Note that whilst we're on the new branch, any modifications we make to the project will be stored on the branch whilst master will remain as it is. Let's modify src/a.txt to look like this:<br /><br /><pre style="border:black 1px solid;">line 1<br />line 2 changed<br />line 3<br /><br /></pre><br />And now add and commit this file and then check the log:<br /><br /><pre style="background-color:black;color:white;">commit 9bc4488ac847bceccb746eeafb1a8c239de350f2 (HEAD -> mydevbranch) Author: mtanti <redacted@gmail.com> Date: Sun May 31 10:24:23 2020 +0200 changed line in a.txt commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (master) Author: mtanti <redacted@gmail.com> Date: Sat May 30 11:17:14 2020 +0200 added source files commit f71f17b63c6b3ddb7506000cbc422e8f1b173958 Author: mtanti <redacted@gmail.com> Date: Sat May 30 10:46:17 2020 +0200 added the word 'updated' to readme readme.md was updated so that it says that it is my updated readme file.<br /></pre><br />Note how the log shows us where each branch is and where the HEAD is. In this case the HEAD is at mydevbranch's last commit and master is one commit behind it. Let's add another commit after changing src/b.txt:<br /><br /><pre style="border:black 1px solid;">line 1<br />line 2<br />line 3<br />line 4<br /><br /></pre><br />Now that we're done with the changes in this new version, we can merge the development branch to master by first switching to master and then merging to mydevbranch:<br /><br /><pre>git checkout master<br />git merge mydevbranch<br /></pre><br /><pre style="background-color:black;color:white;">Updating 59cfc3f..f962efb<br />Fast-forward<br /> src/a.txt | 2 +-<br /> src/b.txt | 1 +<br /> 2 files changed, 2 insertions(+), 1 deletion(-)<br /></pre><br />Note that this was a fast forward merge, which is the best kind of merge you can have. It happens when all the changes were made in a neat sequence which can be simply reapplied on the master, as opposed to having multiple branches changing stuff at the same time.<br /><br />Before seeing what a merge conflict looks like, let's look at the log again now that we've made the merge, but this time as a graph:<br /><br /><pre>git log --graph</pre><br /><pre style="background-color:black;color:white;">* commit f962efb409e4f08f94d717dec866519bc2848e8f (HEAD -> master, mydevbranch)<br />| Author: mtanti <redacted@gmail.com><br />| Date: Sun May 31 10:30:42 2020 +0200<br />|<br />| added a new line to b.txt<br />|<br />* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2<br />| Author: mtanti <redacted@gmail.com><br />| Date: Sun May 31 10:24:23 2020 +0200<br />|<br />| changed line in a.txt<br />|<br />* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e<br />| Author: mtanti <redacted@gmail.com><br />| Date: Sat May 30 11:17:14 2020 +0200<br />|<br />| added source files<br />|<br />* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958<br /> Author: mtanti <redacted@gmail.com><br /> Date: Sat May 30 10:46:17 2020 +0200<br /><br /> added the word 'updated' to readme<br /><br /> readme.md was updated so that it says that it is my updated readme file.<br /></pre><br />Here we can see a neat straight line moving through a timeline of commits, from the first to the last with no splits along the way. The master and development branch are together on the last commit in the timeline. Now let's see how this can be different.<br /><br />Switch back to the development branch so that you can start working on the next version of the project:<br /><br /><pre>git checkout mydevbranch</pre><br />and change src/a.txt to have a new line:<br /><br /><pre style="border:black 1px solid;">line 1<br />line 2 changed<br />line 3<br />line 4<br /><br /></pre><br />As you're working on the file, you get a call from your boss who tells you to immediately change all the lines to start with a capital letter in both src/a.txt and src/b.txt. Now what? You need to stop working on mydevbranch, go back to master, and create a new branch for working on the new request.<br /><br />Before switching branches it's important that you do not have any uncommitted changes in your project, otherwise they will be carried over to the master branch and that would make things confusing. If you're not at a commitable point in your change then you can save the current state of the branch files by stashing them instead:<br /><br /><pre>git stash push</pre><br />This will add a stash object to the current commit of the current branch which can then be retrieved and removed. Next checkout to master:<br /><br /><pre>git checkout master</pre><br />Create and switch to a new branch.<br /><br /><pre>git branch myurgentbranch<br />git checkout myurgentbranch<br /></pre><br />and apply the urgent modifications.<br /><br />src/a.txt<br /><pre style="border:black 1px solid;">Line 1<br />Line 2 changed<br />Line 3<br /><br /></pre><br />src/b.txt<br /><pre style="border:black 1px solid;">Line 1<br />Line 2<br />Line 3<br />Line 4<br /><br /></pre><br />Commit these changes and merge to master:<br /><br /><pre>git add .<br />git commit<br />git checkout master<br />git merge myurgentbranch<br /></pre><br />The merge should be a fast forward since we started working on the last commit of master with no interruptions. The log will show us the current situation:<br /><br /><pre>git log --graph</pre><br /><pre style="background-color:black;color:white;">* commit 5dcd492113d3942550a58efdc7b90e15bd36d537 (HEAD -> master, myurgentbranch) | Author: mtanti <redacted@gmail.com> | Date: Sun May 31 11:06:39 2020 +0200 | | capitalised each line in source files | * commit f962efb409e4f08f94d717dec866519bc2848e8f (mydevbranch) | Author: mtanti <redacted@gmail.com> | Date: Sun May 31 10:30:42 2020 +0200 | | added a new line to b.txt<br />|<br />* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2<br />| Author: mtanti <redacted@gmail.com><br />| Date: Sun May 31 10:24:23 2020 +0200<br />|<br />| changed line in a.txt<br />|<br />* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e<br />| Author: mtanti <redacted@gmail.com><br />| Date: Sat May 30 11:17:14 2020 +0200<br />|<br />| added source files<br />|<br />* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958<br /> Author: mtanti <redacted@gmail.com><br /> Date: Sat May 30 10:46:17 2020 +0200<br /><br /> added the word 'updated' to readme<br /><br /> readme.md was updated so that it says that it is my updated readme file.<br /></pre><br />You can see how when we commit the changes in mydevbranch, we'll have a fork in the timeline from the straight line that we currently have. Let's see what happens then.<br /><br />Now that we're ready from the urgent request we can go back to the normal development branch:<br /><br /><pre>git checkout mydevbranch</pre><br />and pop back the changes we stashed:<br /><br /><pre>git stash pop</pre><br />Note that "git stash list" shows what is in the stash. We were working on src/a.txt where we were adding a new line to the file:<br /><br /><pre style="border:black 1px solid;">line 1<br />line 2 changed<br />line 3<br />line 4<br /><br /></pre><br />Now let's imagine that we finished the changes we were making and can commit them:<br /><br /><pre>git add .<br />git commit<br /></pre><br />Now a.txt is supposed to have two changes: the new line and each line starting with a capital letter. Each of these changes are on a different branch. Let's start by making the development branch complete by merging the changes we applied to the master to the development branch (note that we're reversing the direction of the merge now because we want the development branch to be up to date).<br /><br /><pre>git merge master</pre><br /><pre style="background-color:black;color:white;">Auto-merging src/a.txt<br />CONFLICT (content): Merge conflict in src/a.txt<br />Automatic merge failed; fix conflicts and then commit the result.<br /></pre><br />This is when things start getting hairy as you'll need to manually fix your conflicting changes. The status will tell us which files need to be fixed:<br /><br /><pre style="background-color:black;color:white;">On branch mydevbranch<br />You have unmerged paths.<br /> (fix conflicts and run "git commit")<br /> (use "git merge --abort" to abort the merge)<br /><br />Changes to be committed:<br /><br /> modified: src/b.txt<br /><br />Unmerged paths:<br /> (use "git add <file>..." to mark resolution)<br /><br /> both modified: src/a.txt<br /></pre><br />This saying that during merging, src/b.txt was updated with no conflicts but src/a.txt requires manual intervention. If we open src/a.txt we'll see the following:<br /><br /><pre style="border:black 1px solid;"><<<<<<< HEAD<br />line 1<br />line 2 changed<br />line 3<br />line 4<br />=======<br />Line 1<br />Line 2 changed<br />Line 3<br />>>>>>>> master<br /><br /></pre><br />The file has been modified by git to show the conflicting changes. 7 arrows and equals signs are used to highlight sections of changes which need to be resolved. Note that all the lines have been changed here to the section is the whole file. Now we can either fix the file directly or use git's "git mergetool" to help us by showing all the changes. In this case we can modify the file directly:<br /><br /><pre style="border:black 1px solid;">Line 1<br />Line 2 changed<br />Line 3<br />Line 4<br /><br /></pre><br />Make sure to remove all the arrows and equals signs. Status will now output:<br /><br /><pre style="background-color:black;color:white;">All conflicts fixed but you are still merging.<br /> (use "git commit" to conclude merge)<br /><br />Changes to be committed:<br /><br /> modified: src/a.txt<br /> modified: src/b.txt<br /><br />Changes not staged for commit:<br /> (use "git add <file>..." to update what will be committed)<br /> (use "git checkout -- <file>..." to discard changes in working directory)<br /><br /> modified: src/a.txt<br /></pre><br />It's now saying that conflicts were fixed. All we need to do is add the fixed file and continue the merge.<br /><br /><pre>git add src/a.txt<br />git merge --continue<br /></pre><br />At the end of the merge, you will be asked to enter a commit message in order to automatically commit the merge change. Git automatically puts the message "Merge branch 'master' into mydevbranch", which is good enough.<br /><br />The log now shows the new timeline:<br /><br /><pre style="background-color:black;color:white;">* commit 49abf32812ec7dbeeab792729098a61fd3446a45 (HEAD -> mydevbranch)<br />|\ Merge: b6cc3c8 5dcd492<br />| | Author: mtanti <redacted@gmail.com><br />| | Date: Sun May 31 11:36:50 2020 +0200<br />| |<br />| | Merge branch 'master' into mydevbranch<br />| |<br />| * commit 5dcd492113d3942550a58efdc7b90e15bd36d537 (myurgentbranch, master)<br />| | Author: mtanti <redacted@gmail.com><br />| | Date: Sun May 31 11:06:39 2020 +0200<br />| |<br />| | capitalised each line in source files<br />| |<br />* | commit b6cc3c887533a995e749589b6cdbfaaad530b03e<br />|/ Author: mtanti <redacted@gmail.com><br />| Date: Sun May 31 11:17:55 2020 +0200<br />|<br />| added new line to a.txt<br />|<br />* commit f962efb409e4f08f94d717dec866519bc2848e8f<br />| Author: mtanti <redacted@gmail.com><br />| Date: Sun May 31 10:30:42 2020 +0200<br />|<br />| added a new line to b.txt<br />|<br />* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2<br />| Author: mtanti <redacted@gmail.com><br />| Date: Sun May 31 10:24:23 2020 +0200<br />|<br />| changed line in a.txt<br />|<br />* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e<br />| Author: mtanti <redacted@gmail.com><br />| Date: Sat May 30 11:17:14 2020 +0200<br />|<br />| added source files<br />|<br />* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958<br /> Author: mtanti <redacted@gmail.com><br /> Date: Sat May 30 10:46:17 2020 +0200<br /><br /> added the word 'updated' to readme<br /><br /> readme.md was updated so that it says that it is my updated readme file.<br /></pre><br />We can now continue modifying our development branch with anything left to add in the new version and then switch to master and merge, which will be a fast forward since we have resolved all conflicts already. Note that if you see the log before merging, you will not see the development branch since it is not in the master's timeline. You can see all timelines by entering "git log --graph --all".<br /><br /><pre>git checkout master<br />git merge mydevbranch<br /></pre><br />If you want to delete the urgent branch, just enter "git branch --delete myurgentbranch".<br /><br />This concludes our quick tutorial. I didn't mention anything about remote repositories and pushing and pulling to and from the repositories but basically if you setup a github account, you can keep a backup online for multiple developers to work together, pushing the local repository to the online one and pulling the online repository when it was changed by someone else. The online repository is referred to as 'origin' in git.<br /><br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-82049618090891550692020-04-30T23:04:00.002+02:002020-04-30T23:04:58.535+02:00Rolling neighbourhood histograms: Computing histograms of sliding windows quicklyIn <a href="https://geekyisawesome.blogspot.com/2019/11/texture-descriptors-of-image-regions.html">an earlier blog post</a>, I described local binary patterns, which are vectors that describe the texture of image regions. In segmentation tasks where you need to group pixels together by texture, you need to compute the local binary pattern of the neighbourhoods of pixels around each pixel. This means that if you want to find the texture descriptor for a particular pixel 'P', first you need to extract a square of pixels centered on 'P' called the neighbourhood of 'P', compute the local binary codes of all the pixels in the neighbourhood, compute the histogram of the local binary codes, and that histogram is the texture descriptor of pixel 'P'. This is illustrated in the figure below.<br /><br /><a href="https://3.bp.blogspot.com/-QVlxdwCFvOw/Xqs9Sc-TT5I/AAAAAAAABgw/pEfOOppLhYI_0piFL2O6_X3BGoiWW_3DgCPcBGAYYCw/s1600/rollinghistogram_neighbourhood.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-QVlxdwCFvOw/Xqs9Sc-TT5I/AAAAAAAABgw/pEfOOppLhYI_0piFL2O6_X3BGoiWW_3DgCPcBGAYYCw/s320/rollinghistogram_neighbourhood.png" width="320" height="293" data-original-width="357" data-original-height="327" /></a><br /><br />Now in order to compute the texture descriptor of every pixel in the image you would need to repeat this process of extracting neighbourhoods and computing histograms for every pixel. But this is inefficient because it ends up recomputing the same values multiple times. First of all, the local binary codes do not need to be computed for each neighbourhood but can instead be computed once for the whole image and then the neighbourhoods be extracted from the local binary codes image instead of from the original image. But we can also make the histogram computations faster as well. This is because adjacent pixels have a lot of common pixels in their neighbourhoods as shown in the figure below.<br /><br /><a href="https://2.bp.blogspot.com/-Uk9bog2zCKY/Xqs9dCkHhlI/AAAAAAAABg0/UyYsQu4YkCQyFunA0OXzGwMDCyPC4pbGQCLcBGAsYHQ/s1600/rollinghistogram_slidingneighbourhood.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-Uk9bog2zCKY/Xqs9dCkHhlI/AAAAAAAABg0/UyYsQu4YkCQyFunA0OXzGwMDCyPC4pbGQCLcBGAsYHQ/s320/rollinghistogram_slidingneighbourhood.png" width="320" height="283" data-original-width="262" data-original-height="232" /></a><br /><br />We can take advantage of this commonality by computing the histogram of the new neighbourhood based on the histogram of the previous neighbourhood. Assuming the the histograms are simple frequency histograms, the new histogram is the previous histogram plus the frequencies of the values in the blue vertical column (see last row in the above figure) minus the frequencies of the values in the green vertical column. This means that instead of having to compute the frequencies of the whole neighbourhood you would only need to compute the frequencies for two columns and then add or subtract from the previous histogram. This reduces the time complexity from quadratic time to linear time with respect to the side of the neighbourhood square. Assuming that a neighbourhood is described by the (x,y) coordinate of the central pixel and the (w,h) width and height of the neighbourhood, and that histograms can be added together by vector addition, here is the relationship of the new histogram to the previous histogram:<br /><br /><pre>histogram(x,y,w,h) = histogram(x-1,y,w,h) + histogram(x-w/2,y,1,h) - histogram(x+w/2,y,1,h)<br /></pre><br />This lets us compute the histogram of each neighbourhood by iteratively using the previous histogram in the same row, requiring only the first pixel in the row to be computed fully. But we can also avoid computing the full histogram of the first pixel in a row after the first row by using the histogram of the above pixel in the previous row like this:<br /><br /><pre>histogram(x,y,w,h) = histogram(x,y-1,w,h) + histogram(x,y-h/2,w,1) - histogram(x,y+h/2,w,1)<br /></pre><br />This is similar to the way <a href="https://en.wikipedia.org/wiki/Rolling_hash">rolling hashes</a> are computed, which is where the name rolling histogram comes from.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-34226415350957000402020-03-31T23:59:00.000+02:002020-04-01T01:27:52.996+02:00A differentiable version of Conway's Game of Life<a href="https://en.wikipedia.org/wiki/Conway's_Game_of_Life">Conway's Game of Life</a> consists of a grid with black and white squares which flip in colour over time. The rules for changing the colour of a square are as follows:<br /><ul><li>If the current colour of a square is white then if it is surrounded by 2 or 3 adjacent white squares it stays white, otherwise it becomes black.</li><li>If the current colour of a square is black then if it is surrounded by exactly 3 white squares then it becomes white, otherwise it stays black.</li></ul><br />If we treat the grid as a numeric matrix where 0 is black and 1 is white, then a differentiable version of these rules can e made. The idea here is to allow not just black and white but also shades of grey so that the values of the squares can change only slightly and hence allow the changes to be differentiable. In order to force the values to remain between 1 and 0 we will make use of the sigmoid (logistic) function which is defined as $y = \frac{1}{1 + e^{-x}}$.<br /><br />The key trick here is to come up with a differentiable function that maps the number of white squares around a square to 0 or 1. We need two such functions: one that when x is 2 or 3 y is 1 and otherwise y is 0 and another that when x is 3 y is 1 and otherwise y is 0. We can make this function using two sigmoid functions subtracted from each other as follows:<br /><br />$y = \text{sig}(w \times (x - a)) - \text{sig}(w \times (x - b))$<br /><br />The graph plot of this function would be one which starts as 0, then increases to 1 at $x = a$ (where $a$ is the halfway point as $y$ goes from 0 to 1), then stays 1 until $x = b$, at which point $y$ goes back to 0 (again, $b$ is the halfway point as $y$ goes from 1 to 0). $w$ is there to control how steep the change from 0 to 1 and back should be with large $w$ meaning very steep.<br /><br />So now we apply the above equation to be used on a whole matrix of values and where $x$ is the number of adjacent white squares. Given that a square can also be grey, we instead just take the sum of the adjacent values to count the number of white squares. This results in a count which is not a whole number but which is usable in a differentiable function.<br /><br />We also need to be able to choose between using the rule for white squares or the rule for black squares. To do that we can just take a weighted average as follows:<br /><br />$y = x \times a + (1 - x) \times b$<br /><br />where $x$ is a number between 0 and 1 and a is the value $y$ should be when $x$ is 1 whilst $b$ is the value $y$ should be when $x$ is 0.<br /><br />Given a matrix $G$, you can get the next iteration $G'$ using the following function:<br /><br />Let $k$ be a 3 by 3 matrix consisting of all 1s and a single 0 in the center.<br />$N = conv(G, k)$, that is, convolve the kernel $k$ over the matrix $G$ in order to get a new matrix $N$ giving the number of adjacent white squares around every element in the matrix.<br />$B = \text{sig}(w \times (G - 2.5)) - \text{sig}(w \times (G - 3.5))$ (resultant matrix if all elements where black)<br />$W = \text{sig}(w \times (G - 1.5)) - \text{sig}(w \times (G - 3.5))$ (resultant matrix if all elements where white)<br />$G' = G \times W + (1 - G) \times B$Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-30228884816192714932020-02-27T23:30:00.002+01:002020-02-27T23:34:35.934+01:00Proof that the sum of all natural numbers is equal to -1/12 (provided we make an assumption)This is one of those weird proofs that turns out has <a href="https://physicstoday.scitation.org/do/10.1063/PT.5.8029/full/">practical value in physics</a>. If you add together all the numbers from 1 to infinity, the answer will be $-\frac{1}{12}$. Here's how we get this.<br /><br />We start from the simpler question of what is 1 - 1 + 1 - 1 + 1 - 1 + ... for an infinite number of terms. If you stop the series at a 1 then the answer is 1 but if you stop at a -1 then the answer is 0. Since the series never stops then we can make the assumption that the answer is 0.5, which is the average. This is like if you had a light switch that you were constantly switching on and off. If done at a fast enough frequency, the light will seem to be halfway light and dark, which is 0.5.<br /><br />$$\sum_i{(-1)^i} = \frac{1}{2}$$<br /><br />If you accept the above assumption then the rest is rigorous.<br /><br />We next need to find what 1 - 2 + 3 - 4 + 5 - ... is. This can be reduced to the above identity by applying the following trick:<br /><br /><pre>S = 1 - 2 + 3 - 4 + 5 - ...<br /><br />S+S = 1 - 2 + 3 - 4 + 5 - ...<br /> + 1 - 2 + 3 - 4 + ...<br /><br /> = 1 - 1 + 1 - 1 + 1 - ...<br /></pre><br />Therefore $2S = \frac{1}{2}$ which means that $S = \frac{1}{4}$.<br /><br />$$\sum_i{(-1)^i i} = \frac{1}{4}$$<br /><br />Finally we get to our original question:<br /><br /><pre>S' = 1 + 2 + 3 + 4 + 5 + ...<br /><br />S'-S = 1 + 2 + 3 + 4 + 5 + 6 + ...<br /> - 1 + 2 - 3 + 4 - 5 + 6 - ...<br /><br /> = 0 + 4 + 0 + 8 + 0 + 12 + ...<br /></pre><br />Which are the even numbers multiplied by 2, that is, the multiples of 4.<br /><br /><pre>S'-S = 4 + 8 + 12 + ...<br /> = 4(1 + 2 + 3 + ...)<br /> = 4S'<br /></pre><br />Now if S' - S = 4S' then<br /><br /><pre>S' - S = 4S'<br />S' - 4S' = S<br />-3S' = S<br />S' = -S/3<br /> = -(1/4)/3 = -1/12<br /></pre><br />$$\sum_i{i} = -\frac{1}{12}$$Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-65825725759607981442020-01-26T23:25:00.000+01:002020-02-01T23:14:37.764+01:00The Whittaker-Shannon interpolation formula (inferring missing values from a sampled signal using sinc or Lanczos function)In an earlier blog post we had talked about <a href="https://geekyisawesome.blogspot.com/2017/09/find-equation-that-passes-through-set.html">Lagrange polynomial interpolation</a> which is when you need to find a polynomial that passes through a set of points. This time we will be seeing a similar problem that is encountered in signal processing.<br /><br />In discrete signal processing, a continuous signal is represented by a discrete signal consisting of values taken at regular intervals in time, a process called sampling. Below we have an example of a continuous signal that is then sampled at intervals of 0.9 (arbitrarily chosen here).<br /><br /><a href="https://2.bp.blogspot.com/-ICGkEv0yVcw/Xi4QJuatY-I/AAAAAAAABd8/hOFH3mcABBgmaFrr76Ol_Sj7_N8JDV0_wCLcBGAsYHQ/s1600/whittaker_origsignal.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-ICGkEv0yVcw/Xi4QJuatY-I/AAAAAAAABd8/hOFH3mcABBgmaFrr76Ol_Sj7_N8JDV0_wCLcBGAsYHQ/s320/whittaker_origsignal.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br /><a href="https://4.bp.blogspot.com/-az3V8UWloo8/Xi4QNlqH_OI/AAAAAAAABeA/vfdDHTmTy4cUHvGdS0_h-pF143YtDwqbgCLcBGAsYHQ/s1600/whittaker_sampledsignal.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-az3V8UWloo8/Xi4QNlqH_OI/AAAAAAAABeA/vfdDHTmTy4cUHvGdS0_h-pF143YtDwqbgCLcBGAsYHQ/s320/whittaker_sampledsignal.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Now it may be the case that you would like to reconstruct the original signal from the sampled signal. You may think that there is an infinite number of signals that can fit the same samples but because the discrete values are sampled at regular intervals, there is actually only one signal that can fit it, <a href="https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem">provided that the sampling rate is greater than twice the highest frequency of the original signal</a>. If this condition is met then we can construct the original signal using the <a href="https://en.wikipedia.org/wiki/Whittaker%E2%80%93Shannon_interpolation_formula">Whittaker-Shannon interpolation formula</a>.<br /><br />The process is similar to that of the Lagrange polynomial interpolation. In polynomial interpolation, the trick is to find a set of component polynomials each of which passes through one of the given set of points. In order to be able to combine them all together into a single polynomial that passes through all of the given set of points, the component polynomials also need to equal zero wherever there is a different point. This makes sure that adding the polynomials together will not cause an interference between them at the given set of points.<br /><br />In Whittaker-Shannon interpolation, the components are the sinc function instead of polynomials. Sinc functions are periodic functions that are as follows:<br /><br />$$<br />\text{sinc}(x) = \begin{cases}<br />1 & \text{for } x = 0 \\<br />\frac{sin(x)}{x} & \text{otherwise} \\<br />\end{cases}<br />$$<br /><br /><a href="https://1.bp.blogspot.com/-I85Uw4Js-1w/Xi4QTnywflI/AAAAAAAABeE/TDdrQ0YyMW4EJxud-PPq7sPxlmmbL0rEACLcBGAsYHQ/s1600/whittaker_sinc.png" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-I85Uw4Js-1w/Xi4QTnywflI/AAAAAAAABeE/TDdrQ0YyMW4EJxud-PPq7sPxlmmbL0rEACLcBGAsYHQ/s320/whittaker_sinc.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />See how sinc is equal to 1 at x = 0 and equal to zero at regular intervals around it? This is how we'll take advantage of it as a component. The first thing we need to do is to synchronise the intervals at which it equals zero to the intervals of the sampled signal. If the sample interval, or period, is 0.9, then the synchronised sinc function would be:<br /><br />$$<br />y = sinc\left(\frac{\pi x}{0.9}\right)<br />$$<br /><br /><a href="https://4.bp.blogspot.com/-shCVNNvVQ3o/Xi4QX30NAaI/AAAAAAAABeI/yQkO0ghN9z0Qa15QGxqln1l4419cu02JwCLcBGAsYHQ/s1600/whittaker_syncsinc.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-shCVNNvVQ3o/Xi4QX30NAaI/AAAAAAAABeI/yQkO0ghN9z0Qa15QGxqln1l4419cu02JwCLcBGAsYHQ/s320/whittaker_syncsinc.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Next we need to shift the center peak of the synchronised sinc over one of the samples. Let's pick sample number 3 where sample number 0 is the first sample:<br /><br />$$<br />y = sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right)<br />$$<br /><br /><a href="https://1.bp.blogspot.com/-NtyBToCJzH8/Xi4QcYt1SMI/AAAAAAAABeQ/NLRsKWulfVgo-HQ2zTRW9hgsYMhCWUBhwCLcBGAsYHQ/s1600/whittaker_syncshiftsinc.png" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-NtyBToCJzH8/Xi4QcYt1SMI/AAAAAAAABeQ/NLRsKWulfVgo-HQ2zTRW9hgsYMhCWUBhwCLcBGAsYHQ/s320/whittaker_syncshiftsinc.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Finally we need to adjust the height of the peak to be equal to that of that sample value. Since the peak has a value of one and all the other important points on the sinc function are equal to zero, then if the all we have to do is multiply the sinc function by the sample value:<br /><br />$$<br />y = -0.3303 \times sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right)<br />$$<br /><br /><a href="https://3.bp.blogspot.com/--wwEKJnWcL0/Xi4QgiIjNBI/AAAAAAAABeY/ecTFqksDJ3YGA7SZJl2uGOSr4ufUcPG_wCLcBGAsYHQ/s1600/whittaker_samplesinc.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/--wwEKJnWcL0/Xi4QgiIjNBI/AAAAAAAABeY/ecTFqksDJ3YGA7SZJl2uGOSr4ufUcPG_wCLcBGAsYHQ/s320/whittaker_samplesinc.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Great! Now we just need to do the same for all the other samples and add up all the sinc functions into a single function:<br /><br />$$<br />y = <br />0 \times sinc\left(\frac{\pi (x - 0 \times 0.9)}{0.9}\right) +<br />0.7628 \times sinc\left(\frac{\pi (x - 1 \times 0.9)}{0.9}\right) +<br />-0.4309 \times sinc\left(\frac{\pi (x - 2 \times 0.9)}{0.9}\right) +<br />-0.3303 \times sinc\left(\frac{\pi (x - 3 \times 0.9)}{0.9}\right) +<br />\dots<br />$$<br /><br /><a href="https://3.bp.blogspot.com/-zLwBM6D2_WU/Xi4QlBGn8CI/AAAAAAAABec/qkMsPJprP-QgxfxqGifBz_8nSWP1XzmoACLcBGAsYHQ/s1600/whittaker_sincinterpolated.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-zLwBM6D2_WU/Xi4QlBGn8CI/AAAAAAAABec/qkMsPJprP-QgxfxqGifBz_8nSWP1XzmoACLcBGAsYHQ/s320/whittaker_sincinterpolated.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Note that the point made at the top about how there is only one signal that passes through the given samples only applied when there is an infinite supply of such samples.<br /><br /><a href="https://3.bp.blogspot.com/-QIoPwsT0tVY/Xi4Qo6mKPrI/AAAAAAAABek/bLu6Mahl4tQOmFMf7S9GMQ_uw_FeuY62QCLcBGAsYHQ/s1600/whittaker_sincinterpolated_compared.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-QIoPwsT0tVY/Xi4Qo6mKPrI/AAAAAAAABek/bLu6Mahl4tQOmFMf7S9GMQ_uw_FeuY62QCLcBGAsYHQ/s320/whittaker_sincinterpolated_compared.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Now, the sinc function is only useful when you want to make use of all points together in order to infer any missing point. In practice, this is very computationally expensive and it would be better to instead use the points thatre close to the missing point rather than all the points. To do this, instead of the sinc function we use a truncated sinc function called a Lanczos filter:<br /><br />$$<br />\text{lanczos}(x, a) = \begin{cases}<br />sinc(\pi x) \times sinc(\frac{\pi x}{a}) & -a < x < a \\<br />0 & \text{otherwise} \\<br />\end{cases}<br />$$<br /><br />where $a$ is some positive whole number such as 2 and is the amount of points to use to the left and right of the point to infer.<br /><br /><a href="https://4.bp.blogspot.com/-rVMTcQgaX4M/Xi4Qteu6JNI/AAAAAAAABeo/5ZCJBurXfJUTKFCrJ7N7y41ezTX-KYQegCLcBGAsYHQ/s1600/whittaker_lanczos.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-rVMTcQgaX4M/Xi4Qteu6JNI/AAAAAAAABeo/5ZCJBurXfJUTKFCrJ7N7y41ezTX-KYQegCLcBGAsYHQ/s320/whittaker_lanczos.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Now we can do the same thing as before but with Lanczos instead of sinc, taking care that the sinc within the lanczos function already multiplies $x$ by $\pi$:<br /><br />$$<br />y = <br />0 \times \text{lanczos}\left(\frac{x - 0 \times 0.9}{0.9}, 2\right) +<br />0.7628 \times \text{lanczos}\left(\frac{x - 1 \times 0.9}{0.9}, 2\right) +<br />-0.4309 \times \text{lanczos}\left(\frac{x - 2 \times 0.9}{0.9}, 2\right) +<br />-0.3303 \times \text{lanczos}\left(\frac{x - 3 \times 0.9}{0.9}, 2\right) +<br />\dots<br />$$<br /><br /><a href="https://2.bp.blogspot.com/-XPg3QDi1i3E/Xi4Qx_3URrI/AAAAAAAABes/09rjVwtY_aAdHqnNhty86pPHMWasla7vACLcBGAsYHQ/s1600/whittaker_lanczosinterpolated.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-XPg3QDi1i3E/Xi4Qx_3URrI/AAAAAAAABes/09rjVwtY_aAdHqnNhty86pPHMWasla7vACLcBGAsYHQ/s320/whittaker_lanczosinterpolated.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />The Lanczos filter can be used in things like <a href="https://en.wikipedia.org/wiki/Lanczos_resampling">image enlargement</a>. By treating each pixel in an image as a sample of points from a continuous field, values in between two existing pixels can be deduced by interpolation by reproducing the continuous field using the Lanczos interpolation and then picking missing values from between two pixels.<br /><br />In general, for samples $s$ and a sample period of $T$,<br /><br />$$y = \sum_i s[i] \times sinc\left(\frac{\pi (x - i \times T)}{T}\right)$$<br />$$y = \sum_i s[i] \times lanczos\left(\frac{x - i \times T}{T}\right)$$Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-15698176292879488892019-12-30T02:16:00.002+01:002019-12-31T00:31:31.654+01:00Implementing a Gaussian blur filter together with convolution operation from scratchGaussian blurring is a very common filter used in image processing which is useful for many things such as removing salt and pepper noise from images, resizing images to be smaller (<a href="https://en.wikipedia.org/wiki/Downsampling_(signal_processing)">downsampling</a>), and simulating out-of-focus effects. Let's look at how to implement one ourselves.<br /><br />Let's use the following grey-scale image as an example:<br /><a href="https://3.bp.blogspot.com/-u5QqVa8hS3s/XglO8Dsa5MI/AAAAAAAABdI/JETHqpCmJPcebhwz55ZwHTlHe4PKnb_8QCLcBGAsYHQ/s1600/gauss_orig_grey.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-u5QqVa8hS3s/XglO8Dsa5MI/AAAAAAAABdI/JETHqpCmJPcebhwz55ZwHTlHe4PKnb_8QCLcBGAsYHQ/s320/gauss_orig_grey.png" width="320" height="230" data-original-width="640" data-original-height="459" /></a><br /><br />The image consists of many numbers that are arranged in a grid, each number called a pixel. The pixel values range from 0 to 255 where 0 is represented with a black colour and 255 with a white colour and all the other numbers are shades of grey in between. Here are the values of the 5x5 square of pixels in the top-left corner of the image:<br /><pre>[[156, 157, 160, 159, 158],<br /> [156, 157, 159, 158, 158],<br /> [158, 157, 156, 156, 157],<br /> [160, 157, 154, 154, 156],<br /> [158, 157, 156, 156, 157]]<br /></pre><br />In the case of a colour image, each pixel contains 3 values: the intensity of red, the intensity of blue, and the intensity of green. When these three primary colours are combined in different intensities they can produce any colour. <br /><br />Let's use the following colour image as an example:<br /><a href="https://3.bp.blogspot.com/-AsUF44wDSYA/XglPBVdj-KI/AAAAAAAABdM/EPV4Xjv8krY2CpcLUoyjS0foa3yWpMbawCLcBGAsYHQ/s1600/gauss_orig_colour.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-AsUF44wDSYA/XglPBVdj-KI/AAAAAAAABdM/EPV4Xjv8krY2CpcLUoyjS0foa3yWpMbawCLcBGAsYHQ/s320/gauss_orig_colour.png" width="320" height="230" data-original-width="640" data-original-height="459" /></a><br /><br />The values of the 5x5 square of pixels in the top-left corner of the image are:<br /><pre>[[[21, 13, 8], [21, 13, 9], [20, 11, 8], [21, 13, 11], [21, 14, 8]],<br /> [[21, 13, 7], [21, 13, 9], [20, 14, 7], [22, 14, 8], [20, 13, 7]],<br /> [[21, 14, 7], [23, 13, 10], [20, 14, 9], [21, 13, 9], [21, 15, 9]],<br /> [[21, 13, 6], [21, 13, 9], [21, 13, 8], [21, 13, 10], [20, 12, 5]],<br /> [[21, 13, 5], [21, 13, 7], [21, 13, 4], [21, 13, 7], [21, 13, 8]]]<br /></pre><br /><div class="sectitle">The convolution operation</div><br />One of the basic operations in image processing is the convolution. This is when you replace every pixel value $p$ in an image with a weighted sum of the pixel values near $p$, that is, multiply each nearby pixel value with a different number before adding them all up and replacing $p$ with the result. The simplest example of a convolution is when 'nearby' means the pixel itself only. For example, if we convolve a 5x5 pixel greyscale image with a 1x1 weighting, this is what we get:<br /><br /><div style="height:100%;width:100%;overflow:scroll;">$$<br />\left[\begin{array}{ccccc}<br />1 & 2 & 3 & 4 & 5\\<br />6 & 7 & 8 & 9 & 10\\<br />11 & 12 & 13 & 14 & 15\\<br />16 & 17 & 18 & 19 & 20\\<br />21 & 22 & 23 & 24 & 25\\<br />\end{array}\right]<br />\circledast<br />\left[\begin{array}{c}<br />100<br />\end{array}\right]<br />=<br />\left[\begin{array}{ccccc}<br />1 \times 100 & 2 \times 100 & 3 \times 100 & 4 \times 100 & 5 \times 100\\<br />6 \times 100 & 7 \times 100 & 8 \times 100 & 9 \times 100 & 10 \times 100\\<br />11 \times 100 & 12 \times 100 & 13 \times 100 & 14 \times 100 & 15 \times 100\\<br />16 \times 100 & 17 \times 100 & 18 \times 100 & 19 \times 100 & 20 \times 100\\<br />21 \times 100 & 22 \times 100 & 23 \times 100 & 24 \times 100 & 25 \times 100\\<br />\end{array}\right]<br />=<br />\left[\begin{array}{ccccc}<br />100 & 200 & 300 & 400 & 500\\<br />600 & 700 & 800 & 900 & 1000\\<br />1100 & 1200 & 1300 & 1400 & 1500\\<br />1600 & 1700 & 1800 & 1900 & 2000\\<br />2100 & 2200 & 2300 & 2400 & 2500\\<br />\end{array}\right]<br />$$<br /></div><br />Here we have replaced the pixel value 1 with 100, 2 with 200, etc. Note that in practice the numbers would be clipped to not exceed 255.<br /><br />Only the pixel being replaced contributed to its new value. We can extend the range of the weighting by using a 3x3 weight instead of a single number which will combine all the pixels adjacent to the pixel being replaced.<br /><br /><div style="height:100%;width:100%;overflow:scroll;">$$<br />\left[\begin{array}{ccccc}<br />1 & 2 & 3 & 4 & 5\\<br />6 & 7 & 8 & 9 & 10\\<br />11 & 12 & 13 & 14 & 15\\<br />16 & 17 & 18 & 19 & 20\\<br />21 & 22 & 23 & 24 & 25\\<br />\end{array}\right]<br />\circledast<br />\left[\begin{array}{ccc}<br />100 & 200 & 100\\<br />300 & 400 & 300\\<br />100 & 200 & 100\\<br />\end{array}\right]<br />=<br />\left[\begin{array}{ccccc}<br /><br />\left(\begin{array}{cccccc}<br />& 1 \times 100 & + & 2 \times 200 & + & 3 \times 100\\<br />+ & 6 \times 300 & + & 7 \times 400 & + & 8 \times 300\\<br />+ & 11 \times 100 & + & 12 \times 200 & + & 13 \times 100\\<br />\end{array}\right)<br />&<br />\left(\begin{array}{cccccc}<br />& 2 \times 100 & + & 3 \times 200 & + & 4 \times 100\\<br />+ & 7 \times 300 & + & 8 \times 400 & + & 9 \times 300\\<br />+ & 12 \times 100 & + & 13 \times 200 & + & 14 \times 100\\<br />\end{array}\right)<br />&<br />\left(\begin{array}{cccccc}<br />& 3 \times 100 & + & 4 \times 200 & + & 5 \times 100\\<br />+ & 8 \times 300 & + & 9 \times 400 & + & 10 \times 300\\<br />+ & 13 \times 100 & + & 14 \times 200 & + & 15 \times 100\\<br />\end{array}\right)<br />\\<br /><br />\left(\begin{array}{cccccc}<br />& 6 \times 100 & + & 7 \times 200 & + & 8 \times 100\\<br />+ & 11 \times 300 & + & 12 \times 400 & + & 13 \times 300\\<br />+ & 16 \times 100 & + & 17 \times 200 & + & 18 \times 100\\<br />\end{array}\right)<br />&<br />\left(\begin{array}{cccccc}<br />& 7 \times 100 & + & 8 \times 200 & + & 9 \times 100\\<br />+ & 12 \times 300 & + & 13 \times 400 & + & 14 \times 300\\<br />+ & 17 \times 100 & + & 18 \times 200 & + & 19 \times 100\\<br />\end{array}\right)<br />&<br />\left(\begin{array}{cccccc}<br />& 8 \times 100 & + & 9 \times 200 & + & 10 \times 100\\<br />+ & 13 \times 300 & + & 14 \times 400 & + & 15 \times 300\\<br />+ & 18 \times 100 & + & 19 \times 200 & + & 20 \times 100\\<br />\end{array}\right)<br />\\<br /><br />\left(\begin{array}{cccccc}<br />& 11 \times 100 & + & 12 \times 200 & + & 13 \times 100\\<br />+ & 16 \times 300 & + & 17 \times 400 & + & 18 \times 300\\<br />+ & 21 \times 100 & + & 22 \times 200 & + & 23 \times 100\\<br />\end{array}\right)<br />&<br />\left(\begin{array}{cccccc}<br />& 12 \times 100 & + & 13 \times 200 & + & 14 \times 100\\<br />+ & 17 \times 300 & + & 18 \times 400 & + & 19 \times 300\\<br />+ & 22 \times 100 & + & 23 \times 200 & + & 24 \times 100\\<br />\end{array}\right)<br />&<br />\left(\begin{array}{cccccc}<br />& 13 \times 100 & + & 14 \times 200 & + & 15 \times 100\\<br />+ & 18 \times 300 & + & 19 \times 400 & + & 20 \times 300\\<br />+ & 23 \times 100 & + & 24 \times 200 & + & 25 \times 100\\<br />\end{array}\right)<br />\\<br /><br />\end{array}\right]<br />=<br />\left[\begin{array}{ccc}<br />12600 & 14400 & 16200\\<br />21600 & 23400 & 25200\\<br />30600 & 32400 & 34200\\<br />\end{array}\right]<br />$$<br /></div><br />Note how the new image becomes smaller than the original image. This is in order to avoid using pixels values that lie outside of the image edges when computing new pixel values. One way to use pixel values outside of the image is to treat any pixel outside of the image as having a value of zero (zero padding). We can also reflect the image outside of its edges or wrap around to the other side of the image.<br /><br />We can implement the above convolution operation in Python naively using for loops as follows:<br /><br /><pre class="brush:python">def conv(img, wgt):<br /> img_h = len(img)<br /> img_w = len(img[0])<br /> wgt_h = len(wgt)<br /> wgt_w = len(wgt[0])<br /><br /> res_h = img_h - (wgt_h//2)*2<br /> res_w = img_w - (wgt_w//2)*2<br /> res = [ [ 0 for _ in range(res_w) ] for _ in range(res_h) ]<br /><br /> for img_row in range(wgt_h//2, img_h-wgt_h//2):<br /> for img_col in range(wgt_w//2, img_w-wgt_w//2):<br /> new_pix = 0<br /> for wgt_row in range(wgt_h):<br /> for wgt_col in range(wgt_w-1, -1, -1): #Technically, a convolution operation needs to flip the weights left to right.<br /> new_pix += wgt[wgt_row][wgt_col]*img[img_row+wgt_row-wgt_h//2][img_col+wgt_col-wgt_w//2]<br /> res[img_row - wgt_h//2][img_col - wgt_w//2] = new_pix<br /><br /> return res<br /><br />print(conv([<br /> [ 1, 2, 3, 4, 5],<br /> [ 6, 7, 8, 9,10],<br /> [11,12,13,14,15],<br /> [16,17,18,19,20],<br /> [21,22,23,24,25]<br /> ], [<br /> [100,200,100],<br /> [300,400,300],<br /> [100,200,100]<br /> ]))<br /></pre><pre>[[12600, 14400, 16200], [21600, 23400, 25200], [30600, 32400, 34200]]<br /></pre><br />I say naively because it is possible to perform a convolution using fewer loops and with parallel processing but that would be beyond the scope of this blog post. Note that in the code above we are flipping the weights before multiplying them because <a href="https://towardsdatascience.com/convolution-vs-correlation-af868b6b4fb5">that is what a convolution involves</a>, but since we'll be working with symmetric weights then it doesn't really matter.<br /><br />If we want to use zero padding in our convolution then we would do it as follows:<br /><br /><pre class="brush:python">def conv_pad(img, wgt):<br /> img_h = len(img)<br /> img_w = len(img[0])<br /> wgt_h = len(wgt)<br /> wgt_w = len(wgt[0])<br /><br /> res = [ [ 0 for _ in range(img_w) ] for _ in range(img_h) ]<br /><br /> for img_row in range(img_h):<br /> for img_col in range(img_w):<br /> new_pix = 0<br /> for wgt_row in range(wgt_h):<br /> for wgt_col in range(wgt_w-1, -1, -1):<br /> y = img_row+wgt_row-wgt_h//2<br /> x = img_col+wgt_col-wgt_w//2<br /> if 0 >= y > img_h and 0 >= x > img_w:<br /> pix = img[y][x]<br /> else:<br /> pix = 0<br /> new_pix += wgt[wgt_row][wgt_col]*pix<br /> res[img_row][img_col] = new_pix<br /><br /> return res<br /><br />print(conv_pad([<br /> [ 1, 2, 3, 4, 5],<br /> [ 6, 7, 8, 9,10],<br /> [11,12,13,14,15],<br /> [16,17,18,19,20],<br /> [21,22,23,24,25]<br /> ], [<br /> [100,200,100],<br /> [300,400,300],<br /> [100,200,100]<br /> ]))<br /></pre><pre>[[2900, 4800, 6200, 7600, 6100], [8300, 12600, 14400, 16200, 12500], [14800, 21600, 23400, 25200, 19000], [21300, 30600, 32400, 34200, 25500], [19900, 28800, 30200, 31600, 23100]]<br /></pre><br />We can adapt this function to be able to operate on colour images by simply applying the filter on each of the three intensity channels separately as follows:<br /><br /><pre class="brush:python">def conv_pad_colour(img, wgt):<br /> img_h = len(img)<br /> img_w = len(img[0])<br /> wgt_h = len(wgt)<br /> wgt_w = len(wgt[0])<br /><br /> res = [ [ [ 0, 0, 0 ] for _ in range(img_w) ] for _ in range(img_h) ]<br /><br /> for img_row in range(img_h):<br /> for img_col in range(img_w):<br /> for channel in range(3):<br /> new_pix = 0<br /> for wgt_row in range(wgt_h):<br /> for wgt_col in range(wgt_w-1, -1, -1):<br /> y = img_row+wgt_row-wgt_h//2<br /> x = img_col+wgt_col-wgt_w//2<br /> if 0 <= y < img_h and 0 <= x < img_w:<br /> pix = img[y][x][channel]<br /> else:<br /> pix = 0<br /> new_pix += wgt[wgt_row][wgt_col]*pix<br /> res[img_row][img_col][channel] = new_pix<br /><br /> return res<br /><br />print(conv_pad_colour([<br /> [[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12], [13, 14, 15]],<br /> [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],<br /> [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],<br /> [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],<br /> [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]]<br /> ], [<br /> [100,200,100],<br /> [300,400,300],<br /> [100,200,100]<br /> ]))<br /></pre><pre>[[[6700, 7700, 8700], [11600, 13000, 14400], [15800, 17200, 18600], [20000, 21400, 22800], [16300, 17300, 18300]], [[22300, 23600, 24900], [34200, 36000, 37800], [39600, 41400, 43200], [45000, 46800, 48600], [34900, 36200, 37500]], [[41800, 43100, 44400], [61200, 63000, 64800], [66600, 68400, 70200], [72000, 73800, 75600], [54400, 55700, 57000]], [[61300, 62600, 63900], [88200, 90000, 91800], [93600, 95400, 97200], [99000, 100800, 102600], [73900, 75200, 76500]], [[57700, 58700, 59700], [83600, 85000, 86400], [87800, 89200, 90600], [92000, 93400, 94800], [67300, 68300, 69300]]]<br /></pre>In practice we can use scipy's convolve function in order to apply a convolution: <pre class="brush:python">import scipy.ndimage<br />import numpy as np<br />img = np.array([<br /> [[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12], [13, 14, 15]],<br /> [[16, 17, 18], [19, 20, 21], [22, 23, 24], [25, 26, 27], [28, 29, 30]],<br /> [[31, 32, 33], [34, 35, 36], [37, 38, 39], [40, 41, 42], [43, 44, 45]],<br /> [[46, 47, 48], [49, 50, 51], [52, 53, 54], [55, 56, 57], [58, 59, 60]],<br /> [[61, 62, 63], [64, 65, 66], [67, 68, 69], [70, 71, 72], [73, 74, 75]]<br /> ])<br />weights = np.array([<br /> [100,200,100],<br /> [300,400,300],<br /> [100,200,100]<br /> ])<br />res = np.stack([ #Convolve each channel separately as a greyscale image and then combine them back into a single image.<br /> scipy.ndimage.convolve(img[:,:,channel], weights, mode='constant')<br /> for channel in range(3)<br /> ], axis=2)<br />print(res)<br /></pre>In signal processing, the weights are actually called a filter. <p></p><div class="sectitle">The gaussian blur</div>A blur is made by replacing each pixel value with the average value of its nearby pixels. This softens edges and more the image look out of focus. Averaging nearby pixels can be easily achieved by convolving with a filter of $\frac{1}{n}$ where $n$ is the amount of numbers in the filter. This results in making each pixel equal to $$p_1 \times \frac{1}{n} + p_2 \times \frac{1}{n} + \dots + p_n \times \frac{1}{n} = \frac{1}{n} \left(p_1 + p_2 + \dots + p_n\right)$$ where $p_i$ is a pixel value. <pre class="brush:python">import scipy.ndimage<br />import skimage<br />import numpy as np<br /><br />img = skimage.io.imread('image.png')<br />filter = np.ones([11, 11])<br />filter = filter/filter.size<br /><br />res = np.stack([ #Convolve each channel separately as a greyscale image and then combine them back into a single image.<br /> scipy.ndimage.convolve(img[:,:,channel], filter, mode='constant')<br /> for channel in range(3)<br /> ], axis=2)<br /><br />skimage.io.imshow(res)<br />skimage.io.show()<br /></pre><p><a href="https://3.bp.blogspot.com/-9PJzUb1k0ys/XglPKGfBpTI/AAAAAAAABdQ/sbPCQeKd8owOrnUmCOm8GiCwkM9MCQlVACLcBGAsYHQ/s1600/gauss_uniform_blur.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-9PJzUb1k0ys/XglPKGfBpTI/AAAAAAAABdQ/sbPCQeKd8owOrnUmCOm8GiCwkM9MCQlVACLcBGAsYHQ/s320/gauss_uniform_blur.png" width="320" height="230" data-original-width="640" data-original-height="459" /></a></p>This is actually called a uniform blur, which is not considered an ideal blur as it gives equal importance to pixels that are far from the pixel being replaced as to pixels that are close. A better blur is a gaussian blur. A gaussian blur is when the filter values follow a gaussian bell curve with a maximum in the center and a steep decline just around the center. The gaussian function is defined as follows: $$G(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{x^2}{2\sigma^2}}$$ where $\sigma$ is a parameter for controlling how wide the bell curve is and therefore how much influence the nearby pixels have on the value of the pixel being replaced (the greater the $\sigma$, the more blurry). In two dimensions all you do is take the gaussian of each variable and multiply them together: $$G(x,y) = G(x) \times G(y)$$ Now the way the gaussian blur works is to use an infinite 2D gaussian curve as a filter for a convolution. The center of the filter is $G(0,0)$ and each cell in the filter being a distance of one in the gaussian function's domain as follows: $$ \newcommand\iddots{\mathinner{ \kern1mu\raise1pt{.} \kern2mu\raise4pt{.} \kern2mu\raise7pt{\Rule{0pt}{7pt}{0pt}.} \kern1mu }} \begin{array}{ccccc} \ddots & & \vdots & & \iddots\\ & G(-1, 1) & G( 0, 1) & G( 1, 1) & \\ \cdots & G(-1, 0) & G( 0, 0) & G( 1, 0) & \cdots\\ & G(-1,-1) & G( 0,-1) & G( 1,-1) & \\ \iddots & & \vdots & & \ddots\\ \end{array} $$ In order to be a perfect gaussian blur, the filter needs to be infinitely large, which is not possible. Fortunately, $G(3\sigma)$ is small enough to be rounded down to zero. This means that our kernel needs to only have a side of $\lceil 6\sigma + 1 \rceil$ at most as anything larger than that will not make a difference ($\lceil x \rceil$ means $x$ rounded up). The problem is that even though the missing values are negligible, the values in the filter will not equal 1 (especially for a small $\sigma$) which makes the resulting image will be either brighter or darker. To get around this we cheat a bit and divide each value in the filter by the filter's total, which results in a guassian blur with a neutral effect on the brightness of the image. $$ \begin{array}{ccccccc} \frac{G(-\lceil 3\sigma \rceil, \lceil 3\sigma \rceil)}{T} & & & \frac{G(0, \lceil 3\sigma \rceil)}{T} & & & \frac{G(\lceil 3\sigma \rceil, \lceil 3\sigma \rceil)}{T}\\ & \ddots & & \vdots & & \iddots & \\ & & \frac{G(-1, 1)}{T} & \frac{G( 0, 1)}{T} & \frac{G( 1, 1)}{T} & & \\ \frac{G(-\lceil 3\sigma \rceil, 0)}{T} & \cdots & \frac{G(-1, 0)}{T} & \frac{G( 0, 0)}{T} & \frac{G( 1, 0)}{T} & \cdots & \frac{G(\lceil 3\sigma \rceil, 1)}{T}\\ & & \frac{G(-1,-1)}{T} & \frac{G( 0,-1)}{T} & \frac{G( 1,-1)}{T} & & \\ & \iddots & & \vdots & & \ddots & \\ \frac{G(-\lceil 3\sigma \rceil, -\lceil 3\sigma \rceil)}{T} & & & \frac{G(0, -\lceil 3\sigma \rceil)}{T} & & & \frac{G(\lceil 3\sigma \rceil, -\lceil 3\sigma \rceil)}{T}\\ \end{array} $$ where $T$ is the total of all the $G(x,y)$ in the filter. <pre class="brush:python">import scipy.ndimage<br />import skimage<br />import numpy as np<br /><br />img = skimage.io.imread('image.png')<br /><br />sigma = 4<br />G = lambda x: 1/np.sqrt(2*np.pi*sigma**2)*np.exp(-x**2/(2*sigma**2))<br />G2 = lambda x,y: G(x)*G(y)<br />limit = np.ceil(3*sigma).astype(np.int32)<br />filter = G2(*np.meshgrid(np.arange(-limit, limit+1), np.arange(-limit, limit+1)))<br />filter = weights/np.sum(filter)<br /><br />res = np.stack([<br /> scipy.ndimage.convolve(img[:,:,channel], filter, mode='constant')<br /> for channel in range(3)<br /> ], axis=2)<br />skimage.io.imshow(res)<br />skimage.io.show()<br /></pre><p><a href="https://1.bp.blogspot.com/-imc48d9phg4/XglPQGyJPuI/AAAAAAAABdU/gU-QYJhprrALwwW6tDTSdPlhYYafeyVhQCLcBGAsYHQ/s1600/gauss_gaussian_blur.png" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-imc48d9phg4/XglPQGyJPuI/AAAAAAAABdU/gU-QYJhprrALwwW6tDTSdPlhYYafeyVhQCLcBGAsYHQ/s320/gauss_gaussian_blur.png" width="320" height="230" data-original-width="640" data-original-height="459" /></a></p>Of course skimage has a gaussian blur filter out of the box for us to use: <pre class="brush:python">import skimage<br /><br />img = skimage.io.imread('image.png')<br /><br />sigma = 4<br />res = skimage.filters.gaussian(img, sigma, mode='constant', truncate=3) #Truncate to 3 sigmas.<br />skimage.io.imshow(res)<br />skimage.io.show()<br /></pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-75783353664894230552019-11-30T11:25:00.002+01:002019-11-30T11:25:41.184+01:00Texture Descriptors of Image Regions using Local Binary PatternsA very simple way to measure the similarity between the texture of regions in images is to capture the region's texture information in a vector using local binary patterns. These vectors are invariant to intensity changes, fast to compute, and very easy to understand. Let's see how this works. Note that since we're focusing on texture then we don't care about colour but about intensities, so the image needs to be greyscale.<br /><br />The first step is to replace all the pixels in the image region with LBP (local binary pattern) codes. For every pixel, draw an imaginary circle of some radius centred on the pixel and draw some number of imaginary dots on that circle that are equally spaced apart. The radius and number of dots are to be hand tuned in order to maximise performance but they are usually set to a radius of 1 pixel and a number of dots of 8. Now pick the pixels that fall under those dots together with the pixel at the centre.<br /><br /><a href="https://2.bp.blogspot.com/-MjBycXxzkUs/XeJDOkmacGI/AAAAAAAABcY/4mGGaFmxpq8qO6SKgYQO1uR4-9stFoQLgCLcBGAsYHQ/s1600/lbp_pixel_selection.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-MjBycXxzkUs/XeJDOkmacGI/AAAAAAAABcY/4mGGaFmxpq8qO6SKgYQO1uR4-9stFoQLgCLcBGAsYHQ/s320/lbp_pixel_selection.png" width="320" height="117" data-original-width="604" data-original-height="220" /></a><br /><br />In the surrounding pixels, any pixel that has a lower intensity than the central pixel gets a zero and all the rest get a one. These numbers are put together in order to form a binary number called the LBP code of the central pixel.<br /><br /><a href="https://2.bp.blogspot.com/-9HWfiVbE1E8/XeJDTRTUPoI/AAAAAAAABcc/T_N8YaTxyPEkgU6EuV5cgqC0uCBJn-g0QCLcBGAsYHQ/s1600/lbp_code.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-9HWfiVbE1E8/XeJDTRTUPoI/AAAAAAAABcc/T_N8YaTxyPEkgU6EuV5cgqC0uCBJn-g0QCLcBGAsYHQ/s320/lbp_code.png" width="320" height="114" data-original-width="405" data-original-height="144" /></a><br /><br />Do this for every pixel in the region and then count how many times each code appears in the region, putting all the frequencies in a vector (a histogram). This is your texture descriptor and you can stop here. The more similar the vector is to other vectors, the more similar the texture of the image regions are.<br /><br />We can do better than this however by making the codes uniform. If you do an analysis of these LBP code frequencies, you'll find that most of them have a most two bit changes in them. For example, 00011100 has two bit changes, one from a 0 to a 1 and another after from a 1 to a 0. On the other hand 01010101 has 7 changes. The reason why the majority will have at most two bit changes is because, most of the time, the pixels around a central pixel do not have random intensities but change in a flow as shown in the figure above. If there is a gradient of intensity changes in a single direction then there will only be two bit changes. This means that we can replace all LBP codes with more than two bit changes with a single LBP code that just means 'other'. This is called a uniform LBP code and it reduces our vector size significantly without much of a performance hit, which is good.<br /><br />We can do even better by making the LBP codes rotation invariant, that is, using the same LBP code regardless of where the gradient direction is pointing. In the above figure, if the line of dark grey pixels were rotated, the LBP code would be changed to something like 00011110 for example or 11110000. The only thing that would change when rotating the surrounding pixels would be the bit rotation of the LBP code. We can make our codes invariant by artificially rotating our LBP codes so that for a given number of ones and zeros in the code we always have a canonical code. Usually we treat the binary code as an integer and rotate the bits until we get the minimum integer. For example, here are all the bit rotations of four 1s and four 0s:<br /><br />11110000 (240), 01111000 (120), 00111100 (60), 00011110 (30), 00001111 (15)<br /><br />In this case we would pick 00001111 as our canonical representation as that is the smallest integer. This drastically reduces our vector sizes as the amount of different possible LBP codes to find the frequency of will be a small fraction of the original full set of LBP codes.<br /><br />You can use LBP descriptors using Python's skimage and numpy as follows:<br /><pre class="brush:python">import numpy as np<br />import skimage<br />import skimage.feature<br /><br />image = skimage.io.imread('img.png', as_gray=True)<br />region = image[0:100,0:100]<br />lbp_codes = skimage.feature.local_binary_pattern(region, P=8, R=1, method='uniform') #8 dots, radius 1, rotation invariant uniform codes.<br />descriptor = np.histogram(lbp_codes, bins=10, range=(0, 10))[0] #Number of bins and range changes depending on the number of dots and LBP method used.<br /></pre><br />The number of bins and range needed can be obtained by artificially generating all the possible arrangements of darker and lighter pixels around a central pixel:<br /><pre class="brush:python">def get_histogram_params(dots, method):<br /> all_possible_lbp_codes = set()<br /> for i in range(2**8):<br /> str_bin = '{:0>8b}'.format(i)<br /> region = [ {'0':0,'1':2}[ch] for ch in str_bin ] #Surrounding pixels.<br /> region.insert(4, 1) #Central pixel.<br /> region = np.array(region).reshape((3,3))<br /> lbp_codes = skimage.feature.local_binary_pattern(region, dots, 1, method) #Radius is minimum to keep region size to a minimum.<br /> all_possible_lbp_codes.add(lbp_codes[1,1]) #Get code of central pixel and add it to the set (keeps on unique values).<br /> num_bins = len(all_possible_lbp_codes)<br /> rng = (min(all_possible_lbp_codes), max(all_possible_lbp_codes)+1)<br /> return (num_bins, rng)<br /></pre><br />You can get more information about the skimage function from <a href="https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.local_binary_pattern">here</a>.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-3197103515384156572019-10-30T23:01:00.002+01:002019-10-30T23:07:37.650+01:00No square number can end in 2, 3, 7, or 8 (modulo of square numbers)Look at the following list of square numbers:<br /><pre>0^2: 0<br /> 1^2: 1<br /> 2^2: 4<br /> 3^2: 9<br /> 4^2: 16<br /> 5^2: 25<br /> 6^2: 36<br /> 7^2: 49<br /> 8^2: 64<br /> 9^2: 81<br />10^2: 100<br />11^2: 121<br />12^2: 144<br />13^2: 169<br />14^2: 196<br />15^2: 225<br />16^2: 256<br />17^2: 289<br />18^2: 324<br />19^2: 361<br />20^2: 400<br />21^2: 441<br />22^2: 484<br />23^2: 529<br />24^2: 576<br />25^2: 625<br />26^2: 676<br />27^2: 729<br />28^2: 784<br />29^2: 841<br />30^2: 900<br /></pre><br />Notice how the last digit is always 0, 1, 4, 9, 6, or 5? There also seems to be a cyclic pattern of digits repeating themselves but we'll talk about that in another post. In this post, we'll just talk about why certain digits do not occur at the end of square numbers, which is useful in order to check if a number is a square or not.<br /><br />We can get the last digit of a number by dividing it by 10 and keeping the remainder, also known as the number modulo 10:<br /><br /><pre>0^2 mod 10: 0<br /> 1^2 mod 10: 1<br /> 2^2 mod 10: 4<br /> 3^2 mod 10: 9<br /> 4^2 mod 10: 6<br /> 5^2 mod 10: 5<br /> 6^2 mod 10: 6<br /> 7^2 mod 10: 9<br /> 8^2 mod 10: 4<br /> 9^2 mod 10: 1<br />10^2 mod 10: 0<br />11^2 mod 10: 1<br />12^2 mod 10: 4<br />13^2 mod 10: 9<br />14^2 mod 10: 6<br />15^2 mod 10: 5<br />16^2 mod 10: 6<br />17^2 mod 10: 9<br />18^2 mod 10: 4<br />19^2 mod 10: 1<br />20^2 mod 10: 0<br />21^2 mod 10: 1<br />22^2 mod 10: 4<br />23^2 mod 10: 9<br />24^2 mod 10: 6<br />25^2 mod 10: 5<br />26^2 mod 10: 6<br />27^2 mod 10: 9<br />28^2 mod 10: 4<br />29^2 mod 10: 1<br />30^2 mod 10: 0<br /></pre><br />It turns out that this sort of exclusion of certain modulo results from square numbers happens with many other numbers apart from 10. Take 12 for example:<br /><br /><pre>0^2 mod 12: 0<br /> 1^2 mod 12: 1<br /> 2^2 mod 12: 4<br /> 3^2 mod 12: 9<br /> 4^2 mod 12: 4<br /> 5^2 mod 12: 1<br /> 6^2 mod 12: 0<br /> 7^2 mod 12: 1<br /> 8^2 mod 12: 4<br /> 9^2 mod 12: 9<br />10^2 mod 12: 4<br />11^2 mod 12: 1<br />12^2 mod 12: 0<br />13^2 mod 12: 1<br />14^2 mod 12: 4<br />15^2 mod 12: 9<br />16^2 mod 12: 4<br />17^2 mod 12: 1<br />18^2 mod 12: 0<br />19^2 mod 12: 1<br />20^2 mod 12: 4<br />21^2 mod 12: 9<br />22^2 mod 12: 4<br />23^2 mod 12: 1<br />24^2 mod 12: 0<br />25^2 mod 12: 1<br />26^2 mod 12: 4<br />27^2 mod 12: 9<br />28^2 mod 12: 4<br />29^2 mod 12: 1<br />30^2 mod 12: 0<br /></pre><br />With 12 you only get 0, 1, 4, and 9 as digits which is less than with 10. Why does this happen? We can prove that this is the case using <a href="https://en.wikipedia.org/wiki/Modulo_operation#Properties_(identities)">modulo algebra</a>. Let's call the number being squared $x$ and the number after the modulo $n$. Then by the distributive law of the modulo operation:<br /><br />$x^2 \text{ mod } n = (x \times x) \text{ mod } n = ((x \text{ mod } n) \times (x \text{ mod } n)) \text{ mod } n = (x \text{ mod } n)^2 \text{ mod } n$<br /><br />What the above derivation says is that when dividing a square number, the remainder will be based on the remainder of the root of the square number. In the case of the last digit of a square number, which is equivalent to the square number modulo 10, the last digit will be one of the last digits of the single digits squared: <b>0</b>, <b>1</b>, <b>4</b>, <b>9</b>, 1<b>6</b>, 2<b>5</b>, 3<b>6</b>, 4<b>9</b>, 6<b>4</b>, 8<b>1</b>. This is because $x \text{ mod } 10$ gives you the last digit of the number $x$, which is a single digit, and then this is squared and the last digit of the square is the answer.<br /><br />In the general case, this is setting a limit on the number of possible remainders (usually $x \text{ mod } n$ has $n$ different possible answers for all $x$). The number of different remainders is first limited to $n$ with the first inner modulo but then only the remainders resulting from these numbers squared can be returned. This means that if one of the square numbers has the same modulo as another square number, then the total number of final remainders has decreased.<br /><br />In a future post we will investigate why the remainders of square numbers are cyclic.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-33750336105500644102019-09-26T22:14:00.000+02:002019-09-26T22:14:03.748+02:00Python functions for generating all substrings/sublists, combinations, permutations, and sequences<pre class="brush:python">def subs(xs, sub_len):<br /> '''Get all sublists/substrings from xs of the given sub-length.'''<br /> for i in range(len(xs) - sub_len + 1): #Get all indexes of the substrings that are at least sub_len long.<br /> yield xs[i:i+sub_len]<br /><br />for x in subs('abcd', 2):<br /> print(x)<br /></pre><pre>ab<br />bc<br />cd<br /></pre><br /><pre class="brush:python">def combinations(xs, comb_size):<br /> '''Get all combinations of the given combination size.'''<br /> if comb_size == 0:<br /> yield xs[:0] #Empty list/string<br /> else:<br /> for i in range(len(xs) - comb_size + 1):<br /> for ys in combinations(xs[i+1:], comb_size-1): #For every item, combine it with every item that comes after it.<br /> yield xs[i] + ys<br /><br />for x in combinations('abcd', 2):<br /> print(x)<br /></pre><pre>ab<br />ac<br />ad<br />bc<br />bd<br />cd<br /></pre><br /><pre class="brush:python">def perms(xs):<br /> '''Get all permutations.'''<br /> if len(xs) == 1:<br /> yield xs<br /> else:<br /> for i in range(len(xs)):<br /> for ys in perms(xs[:i] + xs[i+1:]): #For every item, combine it with every item except itself.<br /> yield xs[i] + ys<br /><br />for x in perms('abc'):<br /> print(x)<br /></pre><pre>abc<br />acb<br />bac<br />bca<br />cab<br />cba<br /></pre><br /><pre class="brush:python">def permutations(xs, perm_size):<br /> '''Get all permutations of the given permutation size.'''<br /> for cs in combinations(xs, perm_size):<br /> for p in perms(cs): #Get all the permutations of all the combinations.<br /> yield p<br /><br />for x in permutations('abcd', 2):<br /> print(x)<br /></pre><pre>ab<br />ba<br />ac<br />ca<br />ad<br />da<br />bc<br />cb<br />bd<br />db<br />cd<br />dc<br /></pre><br /><pre class="brush:python">def seqs(xs, seq_len):<br /> '''Get all sequences of a given length.'''<br /> if seq_len == 0:<br /> yield xs[:0] #Empty list/string<br /> else:<br /> for i in range(len(xs)):<br /> for ys in seqs(xs, seq_len-1): #For every item, combine it with every item including itself.<br /> yield xs[i] + ys<br /><br />for x in seqs('abc', 2):<br /> print(x)<br /></pre><pre>aa<br />ab<br />ac<br />ba<br />bb<br />bc<br />ca<br />cb<br />cc<br /></pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-10742889354980975022019-08-30T08:31:00.000+02:002019-09-01T22:37:16.623+02:00The Fourier transform for real numbers (approximating complicated periodic functions with simple sinusoids)In <a href="https://geekyisawesome.blogspot.com/2018/08/the-mclauren-series-and-taylor-series.html">a previous post</a> we saw how to approximate complex functions using simple polynomials by using the Taylor series approximation method. This consisted in assuming that your complex function was an actual polynomial and then using mathematical tricks to tease out the coefficients of the polynomial one by one. Polynomials are useful functions to approximate curves but are terrible at approximating periodic functions such as $sin(x)$ (repeat themselves periodically) because they themselves are not periodic. In this post we'll see how to approximating periodic functions by using a sum of simple sinusoids (sines or cosines). With polynomials, we had a summation of powers of $x$, each of which having a different power. Now we will instead be having a summation of sinusoids, each of which having a different whole number frequency. With polynomials we had to find the coefficients, which are constants multiplied by the power of $x$, and any curve could be arbitrarily approximated by finding the right coefficients. Now we will be finding amplitudes, which are constants multiplied by each sinusoid, and any periodic function can be arbitrarily approximated by finding the right amplitudes.<br /><br />Instead of the Taylor series approximation we will now be using the Fourier series approximation, also known as the <a href="https://en.wikipedia.org/wiki/Fourier_transform">Fourier transform</a>. In order to keep things simple we will only be looking at real functions rather than complex ones (functions with imaginary components). The Fourier series assumes that the periodic function we're deconstructing, $f(x)$, begins its period at 0 and ends it at $T$, after which $f(x)$ will continue repeating itself such that $f(T + k) = f(k)$. If we'd like our period to start at $x_0$ instead then we can easily just replace our periodic function with $f'(x)$ such that $f'(x) = f(x + x_0)$ and then subtract $x_0$ from $x$ again after getting the approximate function.<br /><br />Let's say that we have a complex periodic function where the first period is between 0 and 2 and is defined as $y = \cosh(x-1)$. This is what it looks like:<br /><br /><a href="https://3.bp.blogspot.com/-2ccIGeSF1F0/XWjCFgOf6lI/AAAAAAAABaI/TJyqkvk_L4MyhPDk5v-3JxoAo9-eaK06gCLcBGAs/s1600/fourier_fx.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-2ccIGeSF1F0/XWjCFgOf6lI/AAAAAAAABaI/TJyqkvk_L4MyhPDk5v-3JxoAo9-eaK06gCLcBGAs/s320/fourier_fx.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />What we mean by this being a periodic function is that if we looked beyond the first period we would see the function looking like this:<br /><br /><a href="https://4.bp.blogspot.com/-OFoQK7URpIs/XWjCK4NUfCI/AAAAAAAABaM/daWNHWSFBEQgcH4df_nFmIdyUEweWeNrQCLcBGAs/s1600/fourier_fx_periodic.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-OFoQK7URpIs/XWjCK4NUfCI/AAAAAAAABaM/daWNHWSFBEQgcH4df_nFmIdyUEweWeNrQCLcBGAs/s320/fourier_fx_periodic.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />We're assuming that our function consists of a summation of sinusoids with different frequencies and amplitudes such that each wavelength is equal to the period of the original function or a whole number division of that period (half, quarter, etc.). To make a sine or cosine have a period equal to $T$, we use $sin(x\frac{2\pi}{T})$ and $cos(x\frac{2\pi}{T})$. Multiplying $x$ by a whole number will change the frequency of the sinusoid such that the wavelength is equal to a whole number division of $T$. For example, $sin(2 x\frac{2\pi}{2})$ will make the sine function repeat itself twice when going from 0 to 2. This is what our sinusoid functions look like for both sine and cosine:<br /><br /><a href="https://2.bp.blogspot.com/-unRmIQdPXKM/XWjCPRreYtI/AAAAAAAABaQ/k30FwnMy7akPynHmpccJdIXd7uqDNVrOwCLcBGAs/s1600/fourier_fx_sinusoids.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-unRmIQdPXKM/XWjCPRreYtI/AAAAAAAABaQ/k30FwnMy7akPynHmpccJdIXd7uqDNVrOwCLcBGAs/s320/fourier_fx_sinusoids.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />By adding up these sines and cosines together whilst having specific amplitudes (that we need to find) we can get closer and closer to the original periodic function.<br /><br />The Fourier transform takes advantage of an interesting quirk of sinusoids and the area under their graph. For two positive whole numbers $n$ and $m$, if you multiply $\sin(n x\frac{2\pi}{T})$ by $\sin(m x\frac{2\pi}{T})$ or $\cos(n x\frac{2\pi}{T})$ by $\cos(m x\frac{2\pi}{T})$, the area under the graph between 0 and $T$ will always be zero, with the only exception being when $n = m$. If $n = m$ then the area will be equal to $\frac{T}{2}$. In other words,<br /><br />$$<br />\int_{0}^{T} \sin\left(n x\frac{2\pi}{T}\right) \sin\left(m x\frac{2\pi}{T}\right) dx = <br />\begin{cases}<br />0& \text{if } n \neq m\\<br />\frac{T}{2}& \text{otherwise}<br />\end{cases}<br />$$<br />and likewise for $\cos$ instead of $\sin$. Here's an example showing the area under the graph of two cosines with different frequencies being multiplied together:<br /><br /><a href="https://2.bp.blogspot.com/-ZNL0ZTzbaYs/XWwqvc4jiTI/AAAAAAAABbA/f_2iCD25zDcxppAqiCDAmdVGl0GdIKpkACLcBGAs/s1600/fourier_cosn_cosm.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-ZNL0ZTzbaYs/XWwqvc4jiTI/AAAAAAAABbA/f_2iCD25zDcxppAqiCDAmdVGl0GdIKpkACLcBGAs/s320/fourier_cosn_cosm.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Note how each hump above the x-axis has a corresponding anti-hump below the x-axis which cancel each other out, resulting in a total area of zero. Here's an example showing the area under the graph of two cosines with the same frequency being multiplied together:<br /><br /><a href="https://4.bp.blogspot.com/-FXluztwcWiU/XWwq03yJGLI/AAAAAAAABbE/MIEr6fP4OS8szbG11-U7zJ3tPDSn1K-ZACLcBGAs/s1600/fourier_cosn_cosn.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-FXluztwcWiU/XWwq03yJGLI/AAAAAAAABbE/MIEr6fP4OS8szbG11-U7zJ3tPDSn1K-ZACLcBGAs/s320/fourier_cosn_cosn.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Note how the area is all above the x-axis. If we had to measure this area, it would be equal to $\frac{T}{2}$ where $T = 2$. Let's prove this for all frequencies.<br /><br />Proof that the area under the product of two sines with different frequencies is 0:<br />$$<br />\begin{align}<br />\int_{0}^{T} \sin\left(n x\frac{2\pi}{T}\right) \sin\left(m x\frac{2\pi}{T}\right) dx<br />&= \int_{0}^{T} \frac{1}{2}\left(\cos\left(n x\frac{2\pi}{T} - m x\frac{2\pi}{T}\right) - \cos\left(n x\frac{2\pi}{T} + m x\frac{2\pi}{T}\right)\right) dx\\<br />&= \frac{1}{2}\left(\int_{0}^{T} \cos\left((n - m) \frac{2\pi}{T}x\right) dx - \int_{0}^{T} \cos\left((n + m) \frac{2\pi}{T}x\right) dx\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left[\sin\left((n - m) \frac{2\pi}{T}x\right)\right]_{0}^{T} - \frac{1}{(n + m) \frac{2\pi}{T}}\left[\sin\left((n + m) \frac{2\pi}{T}x\right)\right]_{0}^{T}\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) \frac{2\pi}{T}T\right) - \sin\left((n - m) \frac{2\pi}{T}0\right)\right) - \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) \frac{2\pi}{T}T\right) - \sin\left((n + m) \frac{2\pi}{T}0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) 2\pi\right) - \sin\left(0\right)\right) - \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) 2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(0 - 0\right) - \frac{1}{(n + m) \frac{2\pi}{T}}\left(0 - 0\right)\right)\\<br />&= 0\\<br />\end{align}<br />$$<br /><br />Proof that the area under the product of two sines with the same frequencies is half their period:<br />$$<br />\begin{align}<br />\int_{0}^{T} \sin\left(n x\frac{2\pi}{T}\right) \sin\left(n x\frac{2\pi}{T}\right) dx<br />&= \int_{0}^{T} \sin^2\left(n x\frac{2\pi}{T}\right) dx\\<br />&= \int_{0}^{T} \frac{1}{2}\left(1 - \cos\left(2n x\frac{2\pi}{T}\right)\right) dx\\<br />&= \frac{1}{2}\left(\int_{0}^{T} 1 dx - \int_{0}^{T}\cos\left(2n\frac{2\pi}{T} x\right) dx\right)\\<br />&= \frac{1}{2}\left(\left[x \right]_{0}^{T} - \frac{1}{2n\frac{2\pi}{T}}\left[\sin\left(2n\frac{2\pi}{T} x\right)\right]_{0}^{T}\right)\\<br />&= \frac{1}{2}\left(\left(T - 0\right) - \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2n\frac{2\pi}{T} T\right) - \sin\left(2n\frac{2\pi}{T} 0\right)\right)\right)\\<br />&= \frac{1}{2}\left(T - \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2n2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(T - \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(T - \frac{1}{2n\frac{2\pi}{T}}\left(0 - 0\right)\right)\\<br />&= \frac{T}{2}\\<br />\end{align}<br />$$<br /><br />Proof that the area under the product of two cosines with different frequencies is 0:<br />$$<br />\begin{align}<br />\int_{0}^{T} \cos\left(n x\frac{2\pi}{T}\right) \cos\left(m x\frac{2\pi}{T}\right) dx<br />&= \int_{0}^{T} \frac{1}{2}\left(\cos\left(n x\frac{2\pi}{T} - m x\frac{2\pi}{T}\right) + \cos\left(n x\frac{2\pi}{T} + m x\frac{2\pi}{T}\right)\right) dx\\<br />&= \frac{1}{2}\left(\int_{0}^{T} \cos\left((n - m) \frac{2\pi}{T}x\right) dx + \int_{0}^{T} \cos\left((n + m) \frac{2\pi}{T}x\right) dx\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left[\sin\left((n - m) \frac{2\pi}{T}x\right)\right]_{0}^{T} + \frac{1}{(n + m) \frac{2\pi}{T}}\left[\sin\left((n + m) \frac{2\pi}{T}x\right)\right]_{0}^{T}\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) \frac{2\pi}{T}T\right) - \sin\left((n - m) \frac{2\pi}{T}0\right)\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) \frac{2\pi}{T}T\right) - \sin\left((n + m) \frac{2\pi}{T}0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) 2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) 2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n - m) \frac{2\pi}{T}}\left(0 - 0\right) + \frac{1}{(n + m) \frac{2\pi}{T}}\left(0 - 0\right)\right)\\<br />&= 0\\<br />\end{align}<br />$$<br /><br />Proof that the area under the product of two cosines with the same frequencies is half their period:<br />$$<br />\begin{align}<br />\int_{0}^{T} \cos\left(n x\frac{2\pi}{T}\right) \cos\left(n x\frac{2\pi}{T}\right) dx<br />&= \int_{0}^{T} \cos^2\left(n x\frac{2\pi}{T}\right) dx\\<br />&= \int_{0}^{T} \frac{1}{2}\left(1 + \cos\left(2n x\frac{2\pi}{T}\right)\right) dx\\<br />&= \frac{1}{2}\left(\int_{0}^{T} 1 dx + \int_{0}^{T}\cos\left(2n\frac{2\pi}{T} x\right) dx\right)\\<br />&= \frac{1}{2}\left(\left[x \right]_{0}^{T} + \frac{1}{2n\frac{2\pi}{T}}\left[\sin\left(2n\frac{2\pi}{T} x\right)\right]_{0}^{T}\right)\\<br />&= \frac{1}{2}\left(\left(T - 0\right) + \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2n\frac{2\pi}{T} T\right) - \sin\left(2n\frac{2\pi}{T} 0\right)\right)\right)\\<br />&= \frac{1}{2}\left(T + \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2n2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(T + \frac{1}{2n\frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(T + \frac{1}{2n\frac{2\pi}{T}}\left(0 - 0\right)\right)\\<br />&= \frac{T}{2}\\<br />\end{align}<br />$$<br /><br />Also interestingly, proof that the area under the product of a sine and cosine, regardless of frequency, is 0:<br />$$<br />\begin{align}<br />\int_{0}^{T} \sin\left(n x\frac{2\pi}{T}\right) \cos\left(m x\frac{2\pi}{T}\right) dx<br />&= \int_{0}^{T} \frac{1}{2}\left(\sin\left(n x\frac{2\pi}{T} + m x\frac{2\pi}{T}\right) + \sin\left(n x\frac{2\pi}{T} - m x\frac{2\pi}{T}\right)\right) dx\\<br />&= \frac{1}{2}\left(\int_{0}^{T} \sin\left((n + m) \frac{2\pi}{T}x\right) dx + \int_{0}^{T} \sin\left((n - m) \frac{2\pi}{T}x\right) dx\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left[\sin\left((n + m) \frac{2\pi}{T}x\right)\right]_{0}^{T} + \frac{1}{(n - m) \frac{2\pi}{T}}\left[\sin\left((n - m) \frac{2\pi}{T}x\right)\right]_{0}^{T}\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) \frac{2\pi}{T}T\right) - \sin\left((n + m) \frac{2\pi}{T}0\right)\right) + \frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) \frac{2\pi}{T}T\right) - \sin\left((n - m) \frac{2\pi}{T}0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left((n + m) 2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left((n - m) 2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right) + \frac{1}{(n - m) \frac{2\pi}{T}}\left(\sin\left(2\pi\right) - \sin\left(0\right)\right)\right)\\<br />&= \frac{1}{2}\left(\frac{1}{(n + m) \frac{2\pi}{T}}\left(0 - 0\right) + \frac{1}{(n - m) \frac{2\pi}{T}}\left(0 - 0\right)\right)\\<br />&= 0\\<br />\end{align}<br />$$<br /><br />Great! Now we can get back to the Fourier transform. Suppose that we have the following sum of sines and cosines:<br />$$<br />\begin{align}<br />f(x) = &a_1 \cos\left(1 x\frac{2\pi}{T}\right) + b_1 \sin\left(1 x\frac{2\pi}{T}\right) + \\<br />&a_2 \cos\left(2 x\frac{2\pi}{T}\right) + b_2 \sin\left(2 x\frac{2\pi}{T}\right) + \\<br />&\dots<br />\end{align}<br />$$<br /><br />How can we tease out the individual amplitudes $a_i$ and $b_i$? Thanks to the identities we proved above, we can now get any amplitude we want by multiplying the function by a sine or cosine, finding the area under the graph, and dividing the area by $\frac{T}{2}$. Here's how it works:<br />$$<br />\begin{align}<br />\frac{2}{T}\int_0^T \cos\left(n x\frac{2\pi}{T}\right) f(x) dx<br />&= \frac{1}{2T}\int_0^T \cos\left(n x\frac{2\pi}{T}\right)<br />\begin{pmatrix}<br />&a_1 cos\left(1 x\frac{2\pi}{T}\right) & + b_1 sin\left(1 x\frac{2\pi}{T}\right) + \\<br />&a_2 cos\left(2 x\frac{2\pi}{T}\right) & + b_2 sin\left(2 x\frac{2\pi}{T}\right) + \\<br />&\dots&<br />\end{pmatrix}<br />dx\\<br />&= \frac{2}{T}<br />\begin{pmatrix}<br />&a_1 \int_0^T\cos\left(n x\frac{2\pi}{T}\right) cos\left(1 x\frac{2\pi}{T}\right) dx & + b_1 \int_0^T\cos\left(n x\frac{2\pi}{T}\right) sin\left(1 x\frac{2\pi}{T}\right) dx + \\<br />&\dots&\\<br />&a_n \int_0^T\cos\left(n x\frac{2\pi}{T}\right) cos\left(n x\frac{2\pi}{T}\right) dx & + b_n \int_0^T\cos\left(n x\frac{2\pi}{T}\right) sin\left(n x\frac{2\pi}{T}\right) dx + \\<br />&\dots&<br />\end{pmatrix}\\<br />&= \frac{2}{T}<br />\begin{pmatrix}<br />&a_1 0 & + b_1 0 + \\<br />&\dots&\\<br />&a_n \frac{T}{2} & + b_n 0 + \\<br />&\dots&<br />\end{pmatrix}\\<br />&= \frac{2}{T} a_n \frac{T}{2}\\<br />&= a_n<br />\end{align}<br />$$<br /><br />And obviously multiplying by $\sin\left(n x\frac{2\pi}{T}\right)$ would yield the amplitude of a sine. All that's left is by how much to vertically lift the graph, that is, the constant term to add to the sinusoids. The sinusoids should oscillate around the average value of the original function in one period, which is found as follows:<br /><br />$y_0 = \frac{1}{T} \int_0^T f(x) dx$<br /><br />Getting back to $f(x) = \cosh(x-1)$, here are the amplitudes:<br /><br />$y_0 = \frac{1}{2} \int_0^2 f(x) dx = 1.1752$<br />$a_1 = \frac{2}{2} \int_0^2 \cos\left(1 x\frac{2\pi}{2}\right) f(x) dx = 0.2162$<br />$b_1 = \frac{2}{2} \int_0^2 \sin\left(1 x\frac{2\pi}{2}\right) f(x) dx = 0.0$<br />$a_2 = \frac{2}{2} \int_0^2 \cos\left(2 x\frac{2\pi}{2}\right) f(x) dx = 0.0581$<br />$b_2 = \frac{2}{2} \int_0^2 \sin\left(2 x\frac{2\pi}{2}\right) f(x) dx = 0.0$<br /><br />And here are the graphs of the approximated function getting better with every new term (where the approximated function is denoted as $\hat{f}$):<br /><br />$\hat{f}(x) = 1.1752$<br /><a href="https://2.bp.blogspot.com/-vgDL3UuWkz4/XWjCa4dFI8I/AAAAAAAABac/Wbik4vorStkVid8QVSqSgkRUIZT4l9wVACLcBGAs/s1600/fourier_approx0.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-vgDL3UuWkz4/XWjCa4dFI8I/AAAAAAAABac/Wbik4vorStkVid8QVSqSgkRUIZT4l9wVACLcBGAs/s320/fourier_approx0.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />$\hat{f}(x) = 1.1752 + 0.2162 \cos\left(1 x\frac{2\pi}{2}\right)$<br /><a href="https://1.bp.blogspot.com/-GQp7rgupQN8/XWjCeT9UfvI/AAAAAAAABag/54W9z8Ppmik2XB-GNaGKzC2u1J4Ku_XOACLcBGAs/s1600/fourier_approx1.png" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-GQp7rgupQN8/XWjCeT9UfvI/AAAAAAAABag/54W9z8Ppmik2XB-GNaGKzC2u1J4Ku_XOACLcBGAs/s320/fourier_approx1.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />$\hat{f}(x) = 1.1752 + 0.2162\cos\left(1 x\frac{2\pi}{2}\right) + 0.0581\cos\left(2 x\frac{2\pi}{2}\right)$<br /><a href="https://1.bp.blogspot.com/-7f4j8nVLWfM/XWjCjISpKYI/AAAAAAAABao/VYZF1Iy6FuAshaoc4EQmk2VtFQ38tQQGgCLcBGAs/s1600/fourier_approx2.png" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-7f4j8nVLWfM/XWjCjISpKYI/AAAAAAAABao/VYZF1Iy6FuAshaoc4EQmk2VtFQ38tQQGgCLcBGAs/s320/fourier_approx2.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />And of course here is the function beyond the first period:<br /><a href="https://2.bp.blogspot.com/-a5ZEA9-6Hbk/XWjCnL4vRnI/AAAAAAAABas/787bgLtObkUd6Kvfq-ABtyrlEREtNHwZgCLcBGAs/s1600/fourier_approx_periodic.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-a5ZEA9-6Hbk/XWjCnL4vRnI/AAAAAAAABas/787bgLtObkUd6Kvfq-ABtyrlEREtNHwZgCLcBGAs/s320/fourier_approx_periodic.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-45104141571280924102019-07-31T11:45:00.000+02:002019-07-31T12:35:54.674+02:00Recognising simple shapes with OpenCV in PythonConvolutional neural networks are usually used to recognise objects in images but that's a bit of an overkill if you just want to recognise simple flat shapes. Here's how to use OpenCV to do simple object recognition.<br /><br />Basically the trick to recognise polygons is to convert your image into an approximate polygon representation using something like edge detection and then count the number of sides in the polygon. OpenCV handles a lot of this stuff for you so its quite easy.<br /><br />Here's the image we will be working on:<br /><a href="https://3.bp.blogspot.com/-Evt3--O224c/XUFbwNCrHDI/AAAAAAAABYg/24ChYPwvluIzMNIHaHvu_fpI2j2yR8VUgCLcBGAs/s1600/opencv_polygons_1.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-Evt3--O224c/XUFbwNCrHDI/AAAAAAAABYg/24ChYPwvluIzMNIHaHvu_fpI2j2yR8VUgCLcBGAs/s320/opencv_polygons_1.png" width="320" height="320" data-original-width="300" data-original-height="300" /></a><br /><br />The first step is to binarise the pixels of the image, that is they are made either black or white. First the image must be turned into greyscale so that there is just one number per pixel and then anything that is not white (greyscale value 255) is replaced with 255 (white) whilst pure white is replaced with 0 (black).<br /><br /><pre class="brush:python">import cv2<br />import numpy as np<br /><br />img = cv2.imread('polygons.png', cv2.IMREAD_COLOR)<br /><br />#Make image greyscale<br />grey = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)<br />cv2.imshow('greyscale', grey)<br /><br />#Binarise greyscale pixels<br />(_, thresh) = cv2.threshold(grey, 254, 255, 1)<br />cv2.imshow('thresholded', thresh)<br /></pre><br /><a href="https://4.bp.blogspot.com/-CBY4T_i7jCQ/XUFcTcz41bI/AAAAAAAABYo/b07bY6BperAuDkA7WZ_i1nNmtQkohlOVgCLcBGAs/s1600/opencv_polygons_2.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-CBY4T_i7jCQ/XUFcTcz41bI/AAAAAAAABYo/b07bY6BperAuDkA7WZ_i1nNmtQkohlOVgCLcBGAs/s320/opencv_polygons_2.png" width="291" height="320" data-original-width="455" data-original-height="501" /></a><br /><br /><a href="https://1.bp.blogspot.com/-xLv8LA2EmTE/XUFcY0Mb7jI/AAAAAAAABYs/p8O6XX09IegUSSbODKG8OfOS6a5kKh0MgCLcBGAs/s1600/opencv_polygons_3.png" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-xLv8LA2EmTE/XUFcY0Mb7jI/AAAAAAAABYs/p8O6XX09IegUSSbODKG8OfOS6a5kKh0MgCLcBGAs/s320/opencv_polygons_3.png" width="291" height="320" data-original-width="455" data-original-height="501" /></a><br /><br />Next we're going to find contours around these white shapes, that is, reduce the image into a set of points with each point being a corner in the shape. This is done using the findContours function which returns a list consisting of contour points for every shape and their hierarchy. The <a href="https://docs.opencv.org/trunk/d9/d8b/tutorial_py_contours_hierarchy.html">hierarchy</a> is the way each shape is nested within other shapes. We won't need this here since we don't have any shapes within shapes but it will be useful in other situations.<br /><br /><pre class="brush:python">#Get shape contours<br />(contours, hierarchy) = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)<br />thresh_copy = cv2.cvtColor(thresh, cv2.COLOR_GRAY2RGB)<br />cv2.drawContours(thresh_copy, contours, contourIdx=-1, color=(0, 255, 0), thickness=2)<br />print('number of sides per shape:')<br />for contour in contours:<br /> print('', contour.shape[0])<br />print()<br />cv2.imshow('contours', thresh_copy)<br /></pre><br /><a href="https://4.bp.blogspot.com/-2YXQAftTfnU/XUFcyHq6oSI/AAAAAAAABY4/c_Iy60O7maMg7M0icKVevTpbNl1DyqyegCLcBGAs/s1600/opencv_polygons_4.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-2YXQAftTfnU/XUFcyHq6oSI/AAAAAAAABY4/c_Iy60O7maMg7M0icKVevTpbNl1DyqyegCLcBGAs/s320/opencv_polygons_4.png" width="291" height="320" data-original-width="455" data-original-height="501" /></a><br /><br /><pre>number of sides per shape:<br /> 119<br /> 122<br /> 4<br /> 124<br /></pre><br />Unfortunately, the number of sides extracted from each shape is nowhere near what it should be. This is because the corners in the shapes are rounded which results in multiple sides around the corner. We therefore need to simplify these contours so that insignificant sides can be removed. This is done using the <a href="https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm">Ramerâ€“Douglasâ€“Peucker algorithm</a> which is implemented as the approxPolyDP function. The algorithm requires a parameter called epsilon which controls how roughly to chop up the polygon into a smaller number of sides. This number is dependent on the size of the polygon so we make it a fraction of the shape's perimeter.<br /><br /><pre class="brush:python">#Simplify contours<br />thresh_copy = cv2.cvtColor(thresh, cv2.COLOR_GRAY2RGB)<br />print('number of sides per shape:')<br />for contour in contours:<br /> perimeter = cv2.arcLength(contour, True)<br /> e = 0.05*perimeter #The bigger the fraction, the more sides are chopped off the original polygon<br /> contour = cv2.approxPolyDP(contour, epsilon=e, closed=True)<br /> cv2.drawContours(thresh_copy, [ contour ], contourIdx=-1, color=(0, 255, 0), thickness=2)<br /> print('', contour.shape[0])<br />cv2.imshow('simplified contours', thresh_copy)<br /></pre><br /><a href="https://2.bp.blogspot.com/-GbwvsFF_dUQ/XUFdFz0sgoI/AAAAAAAABZE/i5nLORJl52MYBPyj7j-KkaN_HtIhgPCjwCLcBGAs/s1600/opencv_polygons_5.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-GbwvsFF_dUQ/XUFdFz0sgoI/AAAAAAAABZE/i5nLORJl52MYBPyj7j-KkaN_HtIhgPCjwCLcBGAs/s320/opencv_polygons_5.png" width="291" height="320" data-original-width="455" data-original-height="501" /></a><br /><br /><pre>number of sides per shape:<br /> 6<br /> 5<br /> 4<br /> 3<br /></pre><br />And with this we have the number of sides in each polygon.<br /><br />If you want to check for circles as well as polygons then you will not be able to do so by counting sides. Instead you can get the minimum enclosing circle around a contour and check if its area is close to the area of the contour (before it is simplified):<br /><br /><pre class="brush:python">((centre_x, centre_y), radius) = cv2.minEnclosingCircle(contour)<br />if cv2.contourArea(contour)/(np.pi*radius**2) > 0.95:<br /> print('circle')<br /></pre><br />The minimum enclosing circle is the smallest circle that completely contains the contour. Therefore it's area must necessarily be larger or equal to the shape of the contour, which can only be equal in the case that the contour is a circle.<br /><br />This is also one way how you can get the position of each shape, by getting the centre point of the enclosing circle. The proper way to get a single coordinate representing the position of the contour is to get the centre of gravity using the contour's moments:<br /><br /><pre class="brush:python">m = cv2.moments(contour)<br />x = m['m10']/m['m00']<br />y = m['m01']/m['m00']<br /></pre><br /><hr /><br />Here's the full code:<br /><br /><pre class="brush:python">import cv2<br />import numpy as np<br /><br />img = cv2.imread('polygons.png', cv2.IMREAD_COLOR)<br /><br />#Binarise pixels<br />grey = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)<br />(_, thresh) = cv2.threshold(grey, 254, 255, 1)<br /><br />#Get shape contours<br />(contours, hierarchy) = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)<br /><br />#Recognise shapes<br />for contour in contours:<br /> m = cv2.moments(contour)<br /> x = round(m['m10']/m['m00'])<br /> y = round(m['m01']/m['m00'])<br /><br /> shape_name = None<br /><br /> (_, radius) = cv2.minEnclosingCircle(contour)<br /> if cv2.contourArea(contour)/(np.pi*radius**2) > 0.95:<br /> shape_name = 'circle'<br /> else:<br /> e = 0.05*cv2.arcLength(contour, True)<br /> simple_contour = cv2.approxPolyDP(contour, epsilon=e, closed=True)<br /> num_sides = simple_contour.shape[0]<br /> shape_name = { 3: 'triangle', 4: 'quad', 5: 'pentagon', 6: 'hexagon' }.get(num_sides, 'unknown')<br /><br /> cv2.putText(img, shape_name, (x, y), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.6, color=(0, 255, 0), thickness=2)<br /><br />cv2.imshow('named shapes', img)<br /></pre><br /><a href="https://2.bp.blogspot.com/-rKBGSrLd-d8/XUFuqMgZ67I/AAAAAAAABZQ/MZ6u2OdS2ckoGaJ8OG8UvGdtZbP3dFNkgCLcBGAs/s1600/opencv_polygons_final.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-rKBGSrLd-d8/XUFuqMgZ67I/AAAAAAAABZQ/MZ6u2OdS2ckoGaJ8OG8UvGdtZbP3dFNkgCLcBGAs/s320/opencv_polygons_final.png" width="291" height="320" data-original-width="455" data-original-height="501" /></a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-50741165580673232922019-06-28T12:28:00.000+02:002019-06-28T12:32:32.629+02:00The exploding gradient problem: Why your neural network gives NaNsHave you ever been training your neural network and suddenly got a bunch of NaNs as loss values? This is usually because of something called the exploding gradient problem, which is when your neural network's gradients, used to train it, are extremely large, leading to overflow errors when performing backpropagation.<br /><br />The exploding gradient problem is mostly caused by using large weight values in combination with using a large number of layers. It happens a lot with RNNs which would be neural networks with as many layers as there are items in the input sequence. Basically, given a particular activation function, with every additional layer the gradient of the parameters in the first layer can either vanish or explode as the gradient consists of a multiplication of the weights of the layers in front of it together. If the weights are fractions then their product will get closer and closer to zero (vanishes) with every additional layer, whilst if the weights are large then their product will get closer and closer to infinity (explodes) with every additional layer. Vanishing gradients lead to no learning in the early layers whilst exploding gradients leads to NaNs.<br /><br />In every explanation of this that I see, I always just see general reasoning without any actual examples of what this actually looks like. In order to illustrate what exploding gradients actually look like, we're going to make some simplifications such as that we're building a neural network with just one neural unit per layer and the single weight and bias in each layer will be the same. This is not a necessary condition to reach exploding gradients, but it will help visualise what is going on. Let's see an example of what happens when we have just 3 layers and a weight equal to 8.<br /><br />ReLU is an activation function that is known for mitigating the vanishing gradient problem, but it also makes it easy to create exploding gradients if the weights are large enough, which is why weights must be initialised to very small values. Here is what the graph produced by a neural network looks like when it consists of single ReLU units in each layer (single number input and single number output as well) as the number of layers varies between 1 (pink), 2 (red), and 3 (dark red).<br />$y = \text{relu}(w \times x)$<br />$y = \text{relu}(w \times \text{relu}(w \times x))$<br />$y = \text{relu}(w \times \text{relu}(w \times \text{relu}(w \times x)))$<br /><a href="https://3.bp.blogspot.com/-zT9J7O8W6vs/XRXeax56uOI/AAAAAAAABWU/mL8CjWKWTxgFY6GbpJf3U2FuIpmVjV_9gCLcBGAs/s1600/explodinggrads_relu.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-zT9J7O8W6vs/XRXeax56uOI/AAAAAAAABWU/mL8CjWKWTxgFY6GbpJf3U2FuIpmVjV_9gCLcBGAs/s320/explodinggrads_relu.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />See how the steepness grows exponentially with every additional layer? That's because the gradient is basically the product of all the weights in all the layers, which means that, since $w$ is equal to 8, the gradient is increasing 8-fold with each additional layer.<br /><br />Now with ReLU, this sort of thing is kind of expected as there is not bound to the value returned by it, so there should also be no bound to the gradient. But this also happens with squashing functions like tanh. Even though its returned value must be between 1 and -1, the gradient at particular points of the function can also be exceptionally large. This means that if you're training the neural network at a point in parameter space which has a very steep gradient, even if it's just a short burst, you'll end up with overflow errors or at the very least with shooting off the learning trajectory. This is what the graph produced by a neural network looksl ike when it consists of single tanh units in each layer.<br />$y = \text{tanh}(w \times x)$<br />$y = \text{tanh}(w \times \text{tanh}(w \times x))$<br />$y = \text{tanh}(w \times \text{tanh}(w \times \text{tanh}(w \times x)))$<br /><a href="https://3.bp.blogspot.com/-faTwhD0yrD8/XRXhOmNSCzI/AAAAAAAABWg/lkqoxXPBujEkwvntcSpVDTxr0s86SKn1QCLcBGAs/s1600/explodinggrads_tanh.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-faTwhD0yrD8/XRXhOmNSCzI/AAAAAAAABWg/lkqoxXPBujEkwvntcSpVDTxr0s86SKn1QCLcBGAs/s320/explodinggrads_tanh.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />See how the slope that transitions the value from -1 to 1 gets steeper and steeper as the number of layers increases? This means that if you're training a simple RNN that uses tanh as an activation function, you can either make the state weights very small or make the initial state values very large in order to stay off the middle slope. Of course this would have other problems as then the initial state wouldn't be able to easily learn to set the initial state to any set of values. It might also be the case that there is no single slope but that there are several slopes throughout the graph since the only limitation that tanh imposes is that all values occur between -1 and 1. For example, if the previous layer learns to perform some kind of sinusoid-like pattern that alternates between -5 and 5 (remember that a neural network with large enough layers can approximate any function), then this is what passing that through tanh would look like (note that this equation is a valid neural network with a hidden layer size of 3 and a single input and output):<br />$y = \text{tanh}(5 \times \text{tanh}(40 \times x-5) - 5 \times \text{tanh}(40 \times x+0) + 5 \times \text{tanh}(40 \times x+5))$<br /><a href="https://4.bp.blogspot.com/-obs-rODrXdg/XRXntm5HVWI/AAAAAAAABW8/7ap8ptJW1JA2F70venP340iUP4P1_xNewCLcBGAs/s1600/explodinggrads_tanh_sin.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-obs-rODrXdg/XRXntm5HVWI/AAAAAAAABW8/7ap8ptJW1JA2F70venP340iUP4P1_xNewCLcBGAs/s320/explodinggrads_tanh_sin.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />In this case you can see how it is possible for there to be many steep slopes spread around the parameter space, depending on what the previous layers are doing. Your parameter space could be a minefield.<br /><br />Now sigmoid is a little more tricky to see how it explodes its gradients. If you just do what we did above, you'll get the following result:<br />$y = \text{sig}(w \times x)$<br />$y = \text{sig}(w \times \text{sig}(w \times x))$<br />$y = \text{sig}(w \times \text{sig}(w \times \text{sig}(w \times x)))$<br /><a href="https://4.bp.blogspot.com/-Gttx8ytPtok/XRXpSn4QygI/AAAAAAAABXI/_c_PgvXJPzM5bsCiPvRAgX1E-hCJUAY_gCLcBGAs/s1600/explodinggrads_sig_nobias.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-Gttx8ytPtok/XRXpSn4QygI/AAAAAAAABXI/_c_PgvXJPzM5bsCiPvRAgX1E-hCJUAY_gCLcBGAs/s320/explodinggrads_sig_nobias.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />The slopes actually get flatter with every additional layer and get closer to $y=1$. This is because, contrary to tanh, sigmoid bounds itself to be between 0 and 1, with sigmoid(0) = 0.5. This means that $\text{sig}(\text{sigmoid}(x))$ will be bounded between 0.5 and 1, since the lowest the innermost sigmoid can go is 0, which will be mapped to 0.5 by the outer most sigmoid. With each additional nested sigmoid you'll just be pushing that lower bound closer and closer toward 1 until the graph becomes a flat line at $y=1$. In fact, in order to see exploding gradients we need to make use of the biases (which up till now were set to 0). Setting the biases to $-\frac{w}{2}$ gives very nice curves:<br />$y = \text{sig}(w \times x)$<br />$y = \text{sig}(w \times \text{sig}(w \times x) - \frac{w}{2})$<br />$y = \text{sig}(w \times \text{sig}(w \times \text{sig}(w \times x) - \frac{w}{2}) - \frac{w}{2})$<br /><a href="https://4.bp.blogspot.com/-gOy7aITnXwI/XRXqoFfKM7I/AAAAAAAABXU/3GA60aXT-mg1lOnMQT6sevCcf3CIaJEowCLcBGAs/s1600/explodinggrads_sig.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-gOy7aITnXwI/XRXqoFfKM7I/AAAAAAAABXU/3GA60aXT-mg1lOnMQT6sevCcf3CIaJEowCLcBGAs/s320/explodinggrads_sig.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Note that with sigmoid the steepness of the slope increases very slowly compared to tanh, which means that you'll need to use either larger weights or more layers to get the same dramatic effect. Also note that all these graphs were for weights being equal to 8, which is really large, but if you have many layers as in the case of a simple RNN working on long sentences, even a weight of 0.7 would explode after enough inputs.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-74522759088179742722019-05-30T10:37:00.000+02:002019-05-30T10:37:15.985+02:00Word square makerI made a word square searcher. A <a href="https://en.wikipedia.org/wiki/Word_square">word square</a> is a square grid with equal length words in each row and column such that the word in the first row is the word in the first column, the word in the second row is the word in the second column, and so on. Here's an example:<br /><pre>care<br />acid<br />ring<br />edge<br /></pre><br />It works by taking a file of words and indexing them by character and position such that I can find all the words that have a particular character at a particular position. The program will then build every possible word square that can be made from these words. It goes through all the words and tries them all as a first row/column in a word square. Each word is used as a basis for finding the rest of the words.<br /><br />In the above example, the first row/column is "care". In order to find a suitable word for the second row/column, the index is used to find all the words that have a first letter equal to the first row's second letter. "acid"'s first letter is equal to "care"'s second letter, so it is a good candidate. Each one of these candidate words is used tentatively such that if it doesn't pan out then we can backtrack and try another candidate word. For the third row/column, the index is used to find all the words that have a first letter equal to the first row's third letter and a second letter equal to the second row's third letter. The intersection of these two groups of words would be all the suitable words that can be used as a third row. "ring"'s first letter is equal to "care"'s third letter and its second letter is equal to "acid"'s third letter as well, so it is a good candidate for the third row. This process keeps on going until none of the words make a word square given the first row/column. The program then moves on to another possible word in the first row/column.<br /><br />You can find the program here: <a href="https://onlinegdb.com/By7XtdhpE">https://onlinegdb.com/By7XtdhpE</a>.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-33614535457461529402019-05-01T13:50:00.001+02:002019-05-01T13:50:35.693+02:00Neutral Room Escape Game: Lights - Plus 7 Times 10 puzzle solverI've been playing a point and click game called <a href="http://neutralxe.net/esc/r2.html">Lights</a> which features a puzzle consisting of a 4 digit counter which starts from '0000' and which needs to be made equal to a certain number by a sequence of operations which are to either add 7 to the current value or multiply the current value by 10. I wrote a Python program to quickly solve the puzzle and I'm sharing the code for any one who needs to use it as well. You can find the program online at <a href="https://onlinegdb.com/rkg_l_ZDiV">https://onlinegdb.com/rkg_l_ZDiV</a>. Just run the program, enter the number you need to reach when requested and you'll see a sequence of steps showing which operations to perform in order to solve the puzzle in the least number of steps. The program works using a breadth first search which explores all possible moves without repeating previously reached numbers until the goal number is reached. Enjoy!<br /><br /><br /><br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-10442361374087890842019-04-30T22:20:00.000+02:002019-04-30T22:20:10.418+02:00Sliding tile puzzle solverSo I've been playing a point and click game called <a href="http://akarika.net/games/escape/loom_blend.htm">Loom Blend</a> which features a <a href="https://en.wikipedia.org/wiki/Sliding_puzzle">sliding tile puzzle</a>. I wrote a Python program to quickly solve the puzzle there and I'm sharing the code for any one who needs to use it as well. You can find the program online at <a href="https://onlinegdb.com/BkkrnXLsN">https://onlinegdb.com/BkkrnXLsN</a>. Just run the program and you'll see a sequence of steps showing which tiles to move in order to solve the puzzle in the least number of steps. The program works using a breadth first search which explores all possible moves without repeating previously reached tile configurations until the goal configuration is reached. Enjoy!Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-56345914555229401872019-03-17T15:42:00.002+01:002019-03-17T15:46:16.872+01:00The Gambler's Fallacy: Why after tossing 100 heads in a row you can still get heads againThe <a href="https://en.wikipedia.org/wiki/Gambler's_fallacy">Gambler's fallacy</a> is a deceptively intuitive fallacy about how randomness works. Imagine you're tossing a fair coin repeatedly. You're astonished to see that every toss is resulting in heads. Heads, heads, heads, heads, over and over. As the number of consequtive heads keeps going up, does the probability of getting tails start increasing? It seems reasonable to conclude this as the number of heads and tails should balance each other out given that the coin is fair, as is stated by the <a href="https://en.wikipedia.org/wiki/Law_of_large_numbers">Law of Large Numbers</a>. But the idea that there is some kind of memory in the coin that is used to prevent the fair coin from acting as if it is unfair is wrong.<br /><br />First, let's show that the Gambler's Fallacy is actually a fallacy by simulating a bunch of coin tosses in a program. We'll keep track of all the times a streak of 10 heads was obtained and the following toss after it and see if the number of heads following the streak is significantly less than the number of tails. We do not consider 11 heads in a row as two streaks of 10 heads but instead start searching for a completely new streak after every streak. Here is a Python 3 program to do that (we also count the number of heads and tails obtained in general to show that the simulated coin is not biased).<br /><br /><pre class="brush:python">import random<br /><br />seed = 0<br />num_trails = 1000000<br />head_streak_length_limit = 10<br /><br />random.seed(seed)<br /><br />num_heads = 0<br />num_tails = 0<br />heads_streak_length = 0<br />next_was_head = 0<br />next_was_tail = 0<br />for trial in range(num_trails):<br /> result = random.choice(['head', 'tail'])<br /><br /> if result == 'head':<br /> num_heads += 1<br /> else:<br /> num_tails += 1<br /><br /> if heads_streak_length < head_streak_length_limit:<br /> if result == 'head':<br /> heads_streak_length += 1<br /> else:<br /> heads_streak_length = 0<br /> else:<br /> if result == 'head':<br /> next_was_head += 1<br /> else:<br /> next_was_tail += 1<br /><br /> heads_streak_length = 0<br /><br />print('heads / tails:', round(num_heads/num_tails, 4))<br />print('after heads streak was head / after heads streak was tail:', round(next_was_head/next_was_tail, 4))<br /></pre><br />And here are the results:<br /><pre>heads / tails: 1.002<br />after heads streak was head / after heads streak was tail: 1.0579<br /></pre><br />Both the ratio of heads and tails in general and the ratio of heads and tails following a streak of 10 heads in a row are close to 1. You can try changing the streak length limit to see how making it shorter does not significantly change this ratio.<br /><br />So why does this happen? It's a simple matter of probability theory. Both the probability of getting a head and of getting a tail is 0.5 and each toss is independent from the previous one. Therefore the probability of getting a head after 10 heads is $0.5^{10} \times 0.5 = 0.5^{11}$ and the probability of getting a tail after 10 heads is also $0.5^{10} \times 0.5 = 0.5^{11}$. It does not make a difference what the previous tosses were, the next toss being heads will always have a probability of 0.5, so both heads and tails are equally likely.<br /><br />So then the more interesting question is why do we think otherwise? Why do we think that as the chain of consecutive heads gets longer, the probability of getting another head goes down? It could be because we expect the coin to be fair and a coin that never gives tails is a biased coin, but that doesn't really mean anything in terms of predicting what the next toss will be. It is entirely possible that the fairest possible coin will give a million heads in a row. There is nothing stopping this from happening. Not only that, but a million heads in a row is just as likely to happen as any other random looking sequence resulting from a million coin tosses. Let's write down a sequence of heads and tails as a string of 'H's and 'T's. The sequence 'HHHHHH' is just as likely to come up as the sequence 'HHHTTH', yet we think of the first sequence as being unlikely whilst of the second sequence as... what? Is the second sequence more likely to come up than the first? Would you bet more money on the second sequence than the first? This is what what I think is the root cause of the Gambler's Fallacy.<br /><br />What I think happens in our mind is that we don't really predict whether the next toss will give a heads or a tails but whether the whole sequence will turn out to be an 'interesting' looking sequence or not. We love patterns, and when we see that a random sequence generator (coin tosses in this case) is giving out a sequence that looks like a simple pattern, we become suspicious of the randomness behind the sequence generator and assume that it isn't really random. I mean <a href="https://www.youtube.com/watch?v=tP-Ipsat90c">we're pretty terrible at generating random sequences manually</a> because we assume that in a random process you will not get things like a streak of heads. But it's not really just a streak of heads that would make us suspicious isn't it? It's also a sequence of alternating heads and tails for example or a long sequence of heads followed by a long sequence of tails. I think that the source of the Gambler's Fallacy is that we categorise entire sequences into interesting and uninteresting sequences. As we watch a sequence being generated one toss at a time we expect that interesting sequences of that length are much less likely to happen that uninteresting ones, which is true. This makes us expect the next toss to break the interestingness of the current sequence, which, in the case of a streak of heads, is by the next toss turning out to be tails. Of course, although interesting sequences are less likely to happen as the length grows, given that the probability of the interesting pattern becoming uninteresting on the next toss is equal to the probability of it remaining interesting, it is still an incorrect assumption.<br /><br />So what is the probability of an interesting sequence? This is a subjective question but we can still try to investigate it. Ideally a survey is conducted on a population of people to check what is considered as interesting by people on average. But that would be weird so I'll just conduct the survey on myself. I have generated all the possible coin tosses that can be obtained after 4 coin tosses and I will be annotating each sequence according to what I find interesting in it. I also did the same on 6 coin tosses to see how the ratio of interesting to uninteresting sequences changes with sequence length.<br /><br />Here is what I find interesting in sequences of 4 coin tosses.<br /><table border="1"><tr><td>TTTT</td><td style="font-weight:bold;">same</td></tr><tr><td>TTTH</td><td>not interesting</td></tr><tr><td>TTHT</td><td>not interesting</td></tr><tr><td>TTHH</td><td style="font-weight:bold;">switched</td></tr><tr><td>THTT</td><td>not interesting</td></tr><tr><td>THTH</td><td style="font-weight:bold;">repeated</td></tr><tr><td>THHT</td><td style="font-weight:bold;">mirrored</td></tr><tr><td>THHH</td><td>not interesting</td></tr><tr><td>HTTT</td><td>not interesting</td></tr><tr><td>HTTH</td><td style="font-weight:bold;">mirrored</td></tr><tr><td>HTHT</td><td style="font-weight:bold;">repeated</td></tr><tr><td>HTHH</td><td>not interesting</td></tr><tr><td>HHTT</td><td style="font-weight:bold;">switched</td></tr><tr><td>HHTH</td><td>not interesting</td></tr><tr><td>HHHT</td><td>not interesting</td></tr><tr><td>HHHH</td><td style="font-weight:bold;">same</td></tr></table><br />And here is what I find interesting in sequences of 6 coin tosses (organised into two columns to reduce vertical space).<br /><table border="1"><tr><td>TTTTTT</td><td style="font-weight:bold;">same</td><td>TTTTTH</td><td>not interesting</td></tr><tr><td>TTTTHT</td><td>not interesting</td><td>TTTTHH</td><td>not interesting</td></tr><tr><td>TTTHTT</td><td>not interesting</td><td>TTTHTH</td><td>not interesting</td></tr><tr><td>TTTHHT</td><td>not interesting</td><td>TTTHHH</td><td style="font-weight:bold;">switched</td></tr><tr><td>TTHTTT</td><td>not interesting</td><td>TTHTTH</td><td style="font-weight:bold;">doubled</td></tr><tr><td>TTHTHT</td><td>not interesting</td><td>TTHTHH</td><td>not interesting</td></tr><tr><td>TTHHTT</td><td style="font-weight:bold;">mirrored</td><td>TTHHTH</td><td>not interesting</td></tr><tr><td>TTHHHT</td><td>not interesting</td><td>TTHHHH</td><td>not interesting</td></tr><tr><td>THTTTT</td><td>not interesting</td><td>THTTTH</td><td>not interesting</td></tr><tr><td>THTTHT</td><td style="font-weight:bold;">mirrored</td><td>THTTHH</td><td>not interesting</td></tr><tr><td>THTHTT</td><td>not interesting</td><td>THTHTH</td><td style="font-weight:bold;">alternated</td></tr><tr><td>THTHHT</td><td>not interesting</td><td>THTHHH</td><td>not interesting</td></tr><tr><td>THHTTT</td><td>not interesting</td><td>THHTTH</td><td>not interesting</td></tr><tr><td>THHTHT</td><td>not interesting</td><td>THHTHH</td><td style="font-weight:bold;">doubled</td></tr><tr><td>THHHTT</td><td>not interesting</td><td>THHHTH</td><td>not interesting</td></tr><tr><td>THHHHT</td><td style="font-weight:bold;">mirrored</td><td>THHHHH</td><td>not interesting</td></tr><tr><td>HTTTTT</td><td>not interesting</td><td>HTTTTH</td><td style="font-weight:bold;">mirrored</td></tr><tr><td>HTTTHT</td><td>not interesting</td><td>HTTTHH</td><td>not interesting</td></tr><tr><td>HTTHTT</td><td style="font-weight:bold;">doubled</td><td>HTTHTH</td><td>not interesting</td></tr><tr><td>HTTHHT</td><td style="font-weight:bold;">don't know</td><td>HTTHHH</td><td>not interesting</td></tr><tr><td>HTHTTT</td><td>not interesting</td><td>HTHTTH</td><td>not interesting</td></tr><tr><td>HTHTHT</td><td style="font-weight:bold;">alternated</td><td>HTHTHH</td><td>not interesting</td></tr><tr><td>HTHHTT</td><td>not interesting</td><td>HTHHTH</td><td style="font-weight:bold;">mirrored</td></tr><tr><td>HTHHHT</td><td>not interesting</td><td>HTHHHH</td><td>not interesting</td></tr><tr><td>HHTTTT</td><td>not interesting</td><td>HHTTTH</td><td>not interesting</td></tr><tr><td>HHTTHT</td><td>not interesting</td><td>HHTTHH</td><td style="font-weight:bold;">mirrored</td></tr><tr><td>HHTHTT</td><td>not interesting</td><td>HHTHTH</td><td>not interesting</td></tr><tr><td>HHTHHT</td><td style="font-weight:bold;">doubled</td><td>HHTHHH</td><td>not interesting</td></tr><tr><td>HHHTTT</td><td style="font-weight:bold;">switched</td><td>HHHTTH</td><td>not interesting</td></tr><tr><td>HHHTHT</td><td>not interesting</td><td>HHHTHH</td><td>not interesting</td></tr><tr><td>HHHHTT</td><td>not interesting</td><td>HHHHTH</td><td>not interesting</td></tr><tr><td>HHHHHT</td><td>not interesting</td><td>HHHHHH</td><td style="font-weight:bold;">same</td></tr></table><br />Among the 4 coin toss sequences, 8 out of the 16 sequences were considered interesting, meaning that you have a 50% chance of generating an interesting sequence. But among the 6 coin toss sequences, only 16 out of the 64 sequences were considered interesting, meaning that you have a 25% chance of generating an interesting sequence. Now I don't know if the patterns I identified in the tables above are the only interesting patterns for any sequence length, since maybe some patterns only emerge in very long sequences, but it is safe to say that the longer the sequence, the greater the proportion of uninteresting and random looking sequences will be. Therefore, the longer the sequence, the less likely that a sequence will be interesting. This might be what is going on when we think that the probability of getting a tails goes up as the streak of heads gets longer; we'd really be comparing how likely it is for a sequence of a given length to be uninteresting, which becomes less likely as the length grows.<br /><br /><div class="sectitle">Interesting patterns are interesting</div>I have to admit that whilst annotating interesting sequences I was more interested in being consistent in my reasoning than in saying what I genuinely found interesting. This is because I didn't actually feel that, for example, all "repeated" patterns were interesting, just some of them. But I wanted to avoid explaining myself too much based on what I felt. This means that there might be more going on in feeling that a sequence is aesthetically pleasing than simple mathematical patterns. It would be interesting if someone made an experiment where people are subjected to an EEG to see what is really considered interesting and what is not. How knows, maybe it might even turn out to be a kind of Rorschach test where different patterns of interest indicate different personalities. And maybe it might even be an interesting machine learning challenge to try to make a program predict which sequences would be considered interesting by humans in order to quantify how interesting a given sequence is.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-83109685167918923362019-02-26T17:28:00.001+01:002019-02-26T17:28:59.170+01:00MySQL extra inner join taking too long: optimizer_search_depthMySQL has a <a href="https://dev.mysql.com/doc/refman/5.5/en/optimize-overview.html">query optimiser</a> that takes in SQL queries and tries to find the best way to execute them. Then the query gets executed using the best execution plan found. When you use joins in your query, the time the optimiser takes to finish optimising is <a href="https://dev.mysql.com/doc/refman/5.5/en/controlling-query-plan-evaluation.html">exponential to the number of joins</a>. This means that for a few joins it only takes a negligible amount of time to finish but after a point, which is somewhere between 7 and 10 tables, the optimisation time will shoot up. In my case I went from 0.1 seconds to 17 seconds but just adding another table in my joins.<br /><br />To check if this is happening to you, you can <a href="https://dev.mysql.com/doc/refman/5.5/en/show-profile.html">profile your query</a> by executing the following:<br /><br /><pre class="brush:python">SET profiling = 1;<br /><br />*your query*<br /><br />SHOW PROFILE FOR QUERY 1;<br /></pre><br />(If you using phpMyAdmin you might need to add "SHOW PROFILE;" as well at the end)<br /><br />This will show you a table of execution times for each phase in the execution of the query. The time taken by the query optimiser is in the "statistics" row. If this is too high, then you need to stop your optimiser from spending so much time.<br /><br />Controlling how much time the optimiser should spend doing its job can be accomplished using the <a href="https://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_optimizer_search_depth">optimizer_search_depth</a> variable. This is the maximum depth to search (presumably in some tree search algorithm) when optimising the query and is unfortunately set to the highest value by default. Being set to the highest value makes it the most likely to find the best execution plan but may also be unnecessarily time consuming. Thankfully there is a better default value to use: 0. When this variable is set to zero, the optimiser automatically picks a value based on the query and that seems to work well for me. To do this just execute the following query:<br /><br /><pre class="brush:python">SET optimizer_search_depth = 0;<br /></pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-25216890623852623542019-01-29T17:45:00.003+01:002019-01-29T18:13:33.838+01:00Python string format quick guideThis is a quick summary guide for when you just want to remind yourself how to format a value with Python's <a href="https://docs.python.org/3.6/library/string.html#format-string-syntax">'string'.format()</a>.<br /><br /><div class="sectitle">Referencing</div><br />Curly brackets are used to refer to parameter values in the 'format' method. On their own, the first pair of curly brackets will refer to the first parameter, second to the second, etc.<br /><br /><pre class="brush:python">'{} {} {}'.format('a', 1, True)<br /></pre><pre>a 1 True</pre><br />Alternatively you can specify which parameter you want to use where using indexes.<br /><br /><pre class="brush:python">'{0} {0} {2}'.format('a', 1, True)<br /></pre><pre>a a True</pre><br />You can even use names instead of indexes.<br /><br /><pre class="brush:python">'{a} {b} {c}'.format(a='a', b=1, c=True)<br /></pre><pre>a 1 True</pre><br />And even refer to indexes in lists and to instance variables in objects.<br /><br /><pre class="brush:python">'{a[0]} {b.numerator}'.format(a=[3,1,2], b=fractions.Fraction(1,2))<br /></pre><pre>3 1</pre><br /><div class="sectitle">Formatting</div><br />Adding a colon inside the curly brackets allows you to format the values before they are added to the string. You can combine these with any of the above methods of referencing. The following formatting meta characters are to be used in the following order:<br /><pre>{:[alignment] [grouping] [precision] [data type]}</pre><br /><div class="subsectitle">Aligning/padding/leading zeros</div>An alignment meta character is either >, <, or ^. The number after it is the field width in which to align.<br /><br /><pre class="brush:python">'|{:>3}|{:<3}|{:^3}|'.format('a', 'b', 'c')<br /></pre><pre>| a|b | c |</pre><br />Any character before the alignment meta character is used instead of spaces.<br /><pre class="brush:python">'{:_>5} {:0>5}'.format('a', 12)<br /></pre><pre>____a 00012</pre><br /><div class="subsectitle">Grouping</div>You can group digits in numbers into triples with commas automatically by just putting a comma in the curly brackets.<br /><br /><pre class="brush:python">'{:,}'.format(12345678)</pre><pre>12,345,678</pre><br /><div class="subsectitle">Precision</div>When formatting floats you can specify how many digits to keep after the point by adding a dot and a number. By default the format of the number is in scientific notation which means that your floating point number will look like 12e-4 with the precision referring to the number of digits in the mantissa (the number before 'e'). See the next section to format your number in fixed point notation.<br /><br /><pre class="brush:python">'{:.3}'.format(12345678.0)</pre><pre>1.23e+07</pre><br /><div class="subsectitle">Data types</div>This is the most important part. The data type always comes at the end and is a single letter.<br /><br />You can use it to change the numeric base of a whole number.<br /><pre class="brush:python">'dec-{:d} hex-{:X} oct-{:o} bin-{:b}'.format(16, 16, 16, 16)</pre><pre>dec-16 hex-10 oct-20 bin-10000</pre><br />You can use it to convert unicode values to characters.<br /><pre class="brush:python">'{:c}'.format(97)</pre><pre>a</pre><br />You can use it to convert numbers into scientific notations,<br /><pre class="brush:python">'{:e}'.format(12345678)</pre><pre>1.234568e+07</pre><br />or fixed point notations,<br /><pre class="brush:python">'{:f}'.format(12345678)</pre><pre>12345678.000000</pre><br />or percentages.<br /><pre class="brush:python">'{:%}'.format(0.1234)</pre><pre>12.340000%</pre><br /><div class="sectitle">Mixed examples</div><br /><pre class="brush:python">'{:0>2X}'.format(10)</pre><pre>0A</pre><br /><pre class="brush:python">'{:,d}'.format(1234567)</pre><pre>1,234,567</pre><br /><pre class="brush:python">'{:,.2f}'.format(12345.6)</pre><pre>12,345.60</pre><br /><pre class="brush:python">'{:.2%}'.format(0.1234)</pre><pre>12.34%</pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-91605694465384295222018-12-27T19:37:00.000+01:002018-12-27T21:31:20.191+01:00Language models and the unknown token: Why perplexity gets worse as the corpus size increasesUnless you're making use of a character-based language model somehow, you're going to have a finite vocabulary of words that you can handle and so you need a way to handle out-of-vocabulary words. One common way to handle such words is by replacing them all with the unknown token, a pseudo word that replaces all out-of-vocabulary words. The unknown token has several advantages: Infrequent words in the training corpus will probably either not be learned by the neural network or not be used when generating sentences. Ignoring them will save a lot in model size as the embedding layer and softmax take a lot of parameters. Plus all the infrequent words put together will occur very frequently so the language model will definitely learn to use the unknown token. It also makes it possible to handle new words at test time (that are not found even once in the training corpus) as they will be replaced by the unknown token.<br /><br />The problem with this approach is what happens when measuring the probability or perplexity of a sentence based on the probabilities of individual words. If you're comparing language models to see which ones make the best predictions, you usually use them all on a corpus to see how well they predict the words in the corpus. The higher the probabilities assigned to those sentences, the better the language model. This is usually measured using language model perplexity. But see what happens when you vary the vocabulary size. You will find that smaller vocabulary sizes lead to better language models, even though this makes no sense.<br /><br />It turns out that if you just multiply all the probabilities of individual words as-is, including that of the unknown token, then your probability will be sensitive to the vocabulary size. Let's say that you have a vocabulary size of 0, that is, you're considering all the words to be out-of-vocabulary and hence all of them will be replaced by the unknown token. This means that during training, the language model will learn that after every unknown token there is (probably) another unknown token, unless its the end of the sentence. This will make the language model give very high probabilities for the unknown token. High word probabilities mean high sentence probabilities which mean good perplexities.<br /><br />Now If we add another word to the vocabulary then we'll be introducing some uncertainty into the language model as now it has to decide between using the unknown token or the known word. Even in a perfect language model, the same prefix of words can be followed by either of the two words so there is no way to correctly assign 100% of the probability to one or the other. This means that the probabilities will be split between the two words, leading to an overall decrease in probabilities, leading to a worse perplexity. Adding more words to the vocabulary makes this even worse, which means that language models with smaller vocabularies have a huge unfair advantage over language models that actually do their job and correctly predict the right word.<br /><br />We can't do away with the unknown token but we can strip away the unknown token's power. Assuming that all the language models are being evaluated on the same corpus, then different vocabularies will have different words being turned into the unknown token. Let's say that your language model considers 1000 different words in its vocabulary but the corpus you're evaluating it on has 500 different words that are out-of-vocabulary. So in reality your language model is predicting one of 1500 different words; it's just that 500 of those words are assumed to be a single word with a single probability. But really there should be 500 separate probabilities for those out-of-vocabulary words and not just one. If we avoid merging all those probabilities into one, then all the language models will have a fair comparison all they will all have the same vocabulary and they will all have the same amount of uncertainty about which word should come next. The question is how to distribute that single unknown token probability between the 500 out-of-vocabulary words. The simplest solution is to assume a uniform distribution and just give each word the same slice of probability from the whole. So if the unknown token has a probability of $p$, then each out-of-vocabulary word gets a probability of $\frac{p}{500}$.<br /><br />Now every time you encounter the unknown token in the evaluation corpus you know that the token is being used in place of one of those 500 words. But you don't know which one it is. Not a problem, just divide the probability by 500 and it's as if all words in the corpus are in the vocabulary. Do this to every unknown token probability and now you have a fair measure of perplexity. Let's see an example.<br /><br />Let's say that we want to find the probability of the following sentence:<br /><pre>the loud dog barked at the loud man</pre><br />and let's say that the language model we're using to do that has the following vocabulary:<br /><pre>the at dog man</pre><br />this means that the sentence is now changed to:<br /><pre>the UNK dog UNK at the UNK man</pre><br />Now the naive way to get the probability of the sentence is as follows:<br /><br />$$<br />P(\text{the UNK dog UNK at the UNK man}) = \\<br />p(\text{the}|\text{}) \times p(\text{UNK}|\text{the}) \times p(\text{dog}|\text{the UNK}) \times p(\text{UNK}|\text{the UNK dog}) \\<br />\times p(\text{at}|\text{the UNK dog UNK}) \times p(\text{the}|\text{the UNK dog UNK at}) \times p(\text{UNK}|\text{the UNK dog UNK at the}) \\<br />\times p(\text{man}|\text{the UNK dog UNK at the UNK}) \times p(\text{}|\text{the UNK dog UNK at the UNK man})<br />$$<br /><br />But now with the new way we'll divide the unknown token's probabilities by 2, the number of different out of vocabulary words ('loud' and 'barked'):<br /><br />$$<br />P(\text{the UNK dog UNK at the UNK man}) = \\<br />p(\text{the}|\text{}) \times p(\text{UNK}|\text{the})/2 \times p(\text{dog}|\text{the UNK}) \times p(\text{UNK}|\text{the UNK dog})/2 \\<br />\times p(\text{at}|\text{the UNK dog UNK}) \times p(\text{the}|\text{the UNK dog UNK at}) \times p(\text{UNK}|\text{the UNK dog UNK at the})/2 \\<br />\times p(\text{man}|\text{the UNK dog UNK at the UNK}) \times p(\text{}|\text{the UNK dog UNK at the UNK man})<br />$$<br /><br />Of course we can leave the re-weighting till the end of the equation by dividing the first equation by the number of different out-of-vocabulary words for as many times as there are unknown tokens, like this:<br /><br />$$<br />P(\text{the UNK dog UNK at the UNK man}) = \\<br />p(\text{the}|\text{}) \times p(\text{UNK}|\text{the}) \times p(\text{dog}|\text{the UNK}) \times p(\text{UNK}|\text{the UNK dog}) \\<br />\times p(\text{at}|\text{the UNK dog UNK}) \times p(\text{the}|\text{the UNK dog UNK at}) \times p(\text{UNK}|\text{the UNK dog UNK at the}) \\<br />\times p(\text{man}|\text{the UNK dog UNK at the UNK}) \times p(\text{}|\text{the UNK dog UNK at the UNK man})/ 2^3<br />$$<br /><br />Now the sentence probability goes up as the vocabulary size increases!Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-20510933309610367662018-11-29T22:14:00.002+01:002018-11-29T22:21:46.349+01:00Explainable AI (XAI): Sensitivity AnalysisImagine you have a neural network that can classify images well. At test time, would it be enough to just know what the class of the image is? Or would you also want to know why the image was classified the way it was? This is one of the goals of explainable AI and in this post we'll see what sensitivity analysis is.<br /><br />Sensitivity analysis is a way to measure the importance of different parts of an input to one part of an output. In other words, you want to know which pixels were most important to give a high probability for a particular class. The way sensitivity analysis measures this is by measuring how sensitive the output is to each pixel, that is, which pixels, when changed, will change the output the most. Most pixels should leave the output unchanged when they themselves are changed, but some pixels would be critical to the particular output that the neural network has given. To measure this, we simply find the following:<br /><br />$\left|\frac{df(x)_i}{dx_j}\right|$<br /><br />that is, the sensitivity of output $i$ to input $j$ is the absolute value of the gradient of the output with respect to the input. In Tensorflow we can compute this as follows:<br /><br /><pre class="brush:python">sensitivity = tf.abs(tf.gradients([outputs[0, output_index]], [images])[0][0])</pre><br />where 'output_index' is a placeholder that is a scalar of type tf.int64, 'outputs' is a softmax with probabilities for each class and where the first index is the batch index and the second index is the class probability, and 'images' is the pixels of images where the first index is batch index and the second index is the image in vector form. This code also assumes that only the first image in the batch is to be analysed. This is because Tensorflow can only find the gradient of a single scalar so we can only find the sensitivity of a single output of a single image.<br /><br />Here are some examples I got when I tried it on a simple fully connected two layer neural network trained to classify MNIST handwritten digits.<br /><br /><a href="https://1.bp.blogspot.com/-kLdO_Qo6vVA/XABXKiTZ30I/AAAAAAAABRw/LE_WF4Blg7oWcndSPQuQduTbUNaJkyQdACLcBGAs/s1600/sensitivityanalysis_0.png" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-kLdO_Qo6vVA/XABXKiTZ30I/AAAAAAAABRw/LE_WF4Blg7oWcndSPQuQduTbUNaJkyQdACLcBGAs/s1600/sensitivityanalysis_0.png" data-original-width="108" data-original-height="210" /></a><a href="https://4.bp.blogspot.com/-EZPrCkoHGe8/XABXK2pdv-I/AAAAAAAABR0/l3fGgIdEw9YdKccqfKnP-iHX81oIHeS5ACLcBGAs/s1600/sensitivityanalysis_4.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-EZPrCkoHGe8/XABXK2pdv-I/AAAAAAAABR0/l3fGgIdEw9YdKccqfKnP-iHX81oIHeS5ACLcBGAs/s1600/sensitivityanalysis_4.png" data-original-width="108" data-original-height="210" /></a><a href="https://2.bp.blogspot.com/-SrkbuGE47ac/XABXK1jIC6I/AAAAAAAABRs/1p_c1eZrCk0GA4Fgm3swRNY7nT4VPztCgCLcBGAs/s1600/sensitivityanalysis_5.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-SrkbuGE47ac/XABXK1jIC6I/AAAAAAAABRs/1p_c1eZrCk0GA4Fgm3swRNY7nT4VPztCgCLcBGAs/s1600/sensitivityanalysis_5.png" data-original-width="108" data-original-height="209" /></a><a href="https://4.bp.blogspot.com/-Zvss5Mw6XGQ/XABXLI2-xxI/AAAAAAAABR4/TTq4wrHC2vYOmHQ5_Sv_Ui053s--P-amwCLcBGAs/s1600/sensitivityanalysis_7.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-Zvss5Mw6XGQ/XABXLI2-xxI/AAAAAAAABR4/TTq4wrHC2vYOmHQ5_Sv_Ui053s--P-amwCLcBGAs/s1600/sensitivityanalysis_7.png" data-original-width="108" data-original-height="209" /></a><a href="https://1.bp.blogspot.com/-SqxUH5Ozhew/XABXLuZg_TI/AAAAAAAABR8/UWUY35aNAx0oirW-jiHeJ8eLnU8ZNr27gCLcBGAs/s1600/sensitivityanalysis_9.png" imageanchor="1" ><img border="0" src="https://1.bp.blogspot.com/-SqxUH5Ozhew/XABXLuZg_TI/AAAAAAAABR8/UWUY35aNAx0oirW-jiHeJ8eLnU8ZNr27gCLcBGAs/s1600/sensitivityanalysis_9.png" data-original-width="108" data-original-height="209" /></a><br /><br />These heat maps show which pixels have the highest sensitivity score for the digit they were classified as. We can see how the empty space in the middle is important for classifying the zero. The four and the nine can be easy confused for each other were it not for the little bit in the top left corner. The seven can be confused for a nine if we complete the top left loop and the five can be confused with a six or eight if we draw a little diagonal line.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-18073264991404833442018-10-24T22:10:00.000+02:002018-10-24T22:28:24.800+02:00Mentally estimating square rootsWhat's $\sqrt{10}$? I'd reckon it's about $3 + \frac{10-9}{2 \times 3} = 3.166$. The actual answer is 3.162. What about $\sqrt{18}$? Should be about $4 + \frac{18-16}{2 \times 4} = 4.250$. The actual answer is 4.242. I found out about this calculation from <a href="https://www.youtube.com/watch?v=PJHtqMjrStk">this video</a> but there was no explanation for why it works. Here's an explanation.<br /><br />So the method here is as follows.<br /><ol><li>Let the number you want to find the square root of be $a^2$.</li><li>Let the largest square number which is less than $a^2$ be $b^2$ such that $b^2 <= a^2 < (b+1)^2$. For example if $a^2$ is 10 then $b^2$ is 9, if $a^2$ is 18 then $b^2$ is 16.</li><li>The square root of $a^2$ is approximately $b + \frac{a^2 - b^2}{2 b}$.</li></ol><br />This method is easy to carry out mentally but why does it work? The trick here is that the graph of the square root function grows so slowly that we can approximate the curve between two adjacent square numbers as a line.<br /><br /><a href="https://3.bp.blogspot.com/-lgpg71gLShs/W9DVfIo9KHI/AAAAAAAABQE/XwYjBepZ-9g8H-P4mbeFruf1LACrY3bSACLcBGAs/s1600/sqrtestimation_graph.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-lgpg71gLShs/W9DVfIo9KHI/AAAAAAAABQE/XwYjBepZ-9g8H-P4mbeFruf1LACrY3bSACLcBGAs/s640/sqrtestimation_graph.png" width="640" height="338" data-original-width="1104" data-original-height="583" /></a><br /><br />We can use the line to approximate the square root of any number between two square numbers. The first thing we need to know is the gradient of the line. The vertical distance between two adjacent square numbers on the square root curve is 1, since the two square numbers are the squares of two consecutive numbers. The horizontal distance changes and becomes larger as the adjacent square numbers become larger but we can calculate it as follows:<br /><br />$$(b+1)^2 - b^2 = b^2 + 2b + 1 - b^2 = 2b + 1$$<br /><br />So the horizontal distance is twice the square root of the smaller square number plus one. Therefore the gradient of the line is $\frac{1}{2b+1}$. Once we know by how much the line grows vertically for every horizontal unit, we can then determine how much higher than $b$ the point on the line will be at $a$ by multiplying the gradient by $a^2-b^2$, as shown below:<br /><br /><a href="https://3.bp.blogspot.com/-BZyfdJce04Y/W9DVnBb8rFI/AAAAAAAABQI/gCMKoTeR2UwDY02k5__TU929wxxBPue2gCLcBGAs/s1600/sqrtestimation_line.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-BZyfdJce04Y/W9DVnBb8rFI/AAAAAAAABQI/gCMKoTeR2UwDY02k5__TU929wxxBPue2gCLcBGAs/s640/sqrtestimation_line.png" width="640" height="360" data-original-width="1280" data-original-height="720" /></a><br /><br />Since the difference in height is less than 1, it is going to be the part of the square root that comes after the decimal point, with the whole number part being $b$.<br /><br />It might be hard to mentally divide by an odd number in $\frac{a^2-b^2}{2b+1}$ so we further approximate it as $\frac{a^2-b^2}{2b}$ instead. And that's why this method works.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-87712869505393152292018-09-29T17:07:00.004+02:002018-09-29T17:10:25.392+02:00Comparing numpy scalars directly is time consuming, use .tolist() before a comparisonThis is something that I found out about recently when going through the elements of a numpy array in order to do some checks on each numbers. Turns out you shouldn't just do this<br /><br /><pre class="brush:python">for x in nparr:<br /> if x == 0:<br /> something something<br /></pre><br />as that uses a lot more time than doing this<br /><br /><pre class="brush:python">for x in nparr.tolist():<br /> if x == 0:<br /> something something<br /></pre><br />This is because a for loop iterating over a numpy array does not result in a sequence of Python constants but in a sequence of numpy scalars which would result in comparing a numpy array to a constant. Converting the array into a list first before the for loop will then result in a sequence of constants.<br /><br />Here is some profiling I've done using cProfile to check different ways to do an 'if' on a numpy array element:<br /><br /><pre class="brush:python">import cProfile<br />import numpy as np<br /><br />runs = 1000000<br /><br />print('Comparing numpy to numpy')<br />x = np.array(1.0, np.float32)<br />y = np.array(1.0, np.float32)<br />cProfile.run('''<br />for _ in range(runs):<br /> if x == y:<br /> pass<br />''')<br />print()<br /><br />print('Comparing numpy to constant')<br />x = np.array(1.0, np.float32)<br />cProfile.run('''<br />for _ in range(runs):<br /> if x == 1.0:<br /> pass<br />''')<br />print()<br /><br />print('Comparing constant to constant')<br />x = 1.0<br />cProfile.run('''<br />for _ in range(runs):<br /> if x == 1.0:<br /> pass<br />''')<br />print()<br /><br />print('Comparing numpy.tolist() to constant')<br />x = np.array(1.0, np.float32)<br />cProfile.run('''<br />for _ in range(runs):<br /> if x.tolist() == 1.0:<br /> pass<br />''')<br />print()<br /><br />print('Comparing numpy to numpy.array(constant)')<br />x = np.array(1.0, np.float32)<br />cProfile.run('''<br />for _ in range(runs):<br /> if x == np.array(1.0, np.float32):<br /> pass<br />''')<br />print()<br /><br />print('Comparing numpy.tolist() to numpy.tolist()')<br />x = np.array(1.0, np.float32)<br />y = np.array(1.0, np.float32)<br />cProfile.run('''<br />for _ in range(runs):<br /> if x.tolist() == y.tolist():<br /> pass<br />''')<br />print()<br /></pre><br />Here are the results in order of speed:<br /><br /><table border="1"><tr><th>Comparing constant to constant:</th><td>0.088 seconds</td></tr><tr><th>Comparing numpy.tolist() to constant:</th><td>0.288 seconds</td></tr><tr><th>Comparing numpy.tolist() to numpy.tolist():</th><td>0.508 seconds</td></tr><tr><th>Comparing numpy to numpy:</th><td>0.684 seconds</td></tr><tr><th>Comparing numpy to constant:</th><td>1.192 seconds</td></tr><tr><th>Comparing numpy to numpy.array(constant):</th><td>1.203 seconds</td></tr></table><br />It turns out that it is always faster to first convert your numpy scalars into constants via .tolist() than to do anything with them as numpy scalars.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-23594156293254387072018-08-30T23:41:00.000+02:002018-09-29T08:26:01.092+02:00The McLauren series and Taylor series (approximating complicated functions with simple polynomials)<div class="sectitle">The McLauren series</div><br />Imagine you had a function $f(x)$ that you knew was a polynomial, but whose details were unknown and you could only apply operations to the function without being able to read it. How could you find the coefficients of this polynomial? We know that for coefficients $a_i$:<br /><br />$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + ...$<br /><br />If we find $f(0)$ then we can find $a_0$.<br /><br />$f(0) = a_0 + a_1 0 + a_2 0^2 + a_3 0^3 + a_4 0^4 + ... = a_0 + 0 + 0 + ... = a_0$<br /><br />That was easy, but how can we find $a_1$? We need an operation that gets rid of $a_0$ and also the $x$ in the term $a_1 x$. That operation turns out to be differentiation with respect to $x$:<br /><br />$f'(x) = a_1 + 2 a_2 x + 3 a_3 x^2 + 4 a_4 x^3 + ...$<br /><br />Great! Now we can find $a_1$ by replacing $x$ with 0:<br /><br />$f'(0) = a_1 + 2 a_2 0 + 3 a_3 0^2 + 4 a_4 0^3 + ... = a_1$<br /><br />We can find $a_2$ by repeating these two steps:<br /><br />$f''(x) = 2 a_2 + 2 \cdot 3 a_3 x + 3 \cdot 4 a_4 x^2 + ...$<br />$f''(0) = 2 a_2 + 2 \cdot 3 a_3 0 + 3 \cdot 4 a_4 0^2 + ... = 2 a_2$<br /><br />What we found is twice of $a_2$ which means that we need to divide by 2 to get $a_2$. The differentiation operation is multiplying constants by each coefficient and the constants get bigger and bigger the more times we apply differentiation. You might notice that what's happening is that $a_0$ was multiplied by 1, $a_1$ was also multiplied by 1, $a_2$ has been multiplied by 2, $a_3$ will be multiplied by 6, $a_4$ by 24, and so on. These are factorials, sequences of whole numbers multiplied together ($3! = 1 \times 2 \times 3 = 6$). Which means that we need to divide by the next factorial after each round of differentiation and substitution by zero.<br /><br />In general we can find the $i$th coefficient in an unknown polynomial function by doing the following:<br /><br />$a_i = \frac{f^i(0)}{i!}$<br /><br />That's great. Now to test it. Let's see if a complex function is actually a polynomial in disguise. Take something simple such as $f(x) = e^x$. This doesn't look like a polynomial, but it may be represented by a polynomial with an infinite number of terms. Let's find what are the coefficients of the hidden polynomial in $e^x$.<br /><br />$f(x) = e^x = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + ...$<br />$\frac{f(0)}{0!} = \frac{e^0}{1} = a_0$<br />$\frac{f(0)}{0!} = 1$<br /><br />OK, so $a_0$ is 1. Let's find the rest of the coefficients.<br /><br />$a_1 = \frac{f'(0)}{1!} = \frac{e^0}{1} = 1$<br />$a_2 = \frac{f''(0)}{2!} = \frac{e^0}{2} = \frac{1}{2}$<br />$a_3 = \frac{f'''(0)}{3!} = \frac{e^0}{6} = \frac{1}{6}$<br />$...$<br /><br />So the first few terms of the polynomial hidden within $e^x$ are:<br />$f(x) = 1 + x + \frac{1}{2} x^2 + \frac{1}{6} x^3 + ...$<br /><br />Does this partial polynomial look anything like $e^x$ when plotted as a graph?<br /><br /><a href="https://2.bp.blogspot.com/-YfDy5V5J0OY/W4hRuEFy1xI/AAAAAAAABNw/zGZlBU2-19IerOgCL9lb9U4ubDqk5qyvgCLcBGAs/s1600/taylor_exp.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-YfDy5V5J0OY/W4hRuEFy1xI/AAAAAAAABNw/zGZlBU2-19IerOgCL9lb9U4ubDqk5qyvgCLcBGAs/s320/taylor_exp.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Pretty good within a boundary! Note how the curve has a perfect fit at $x = 0$ and that it gets worse as we move away from there. Adding more terms to the polynomial will enlarge the area around $x = 0$ that is close to the curve but it will always be radiating out from there.<br /><br />Let's try for $f(x) = cos(x)$ now.<br /><br />$a_0 = \frac{f(0)}{0!} = \frac{cos(0)}{1} = 1$<br />$a_1 = \frac{f'(0)}{1!} = \frac{-sin(0)}{1} = 0$<br />$a_2 = \frac{f''(0)}{2!} = \frac{-cos(0)}{2} = -\frac{1}{2}$<br />$a_3 = \frac{f'''(0)}{3!} = \frac{sin(0)}{6} = 0$<br />$...$<br /><br />So the first few terms of the polynomial hidden within $cos(x)$ is<br />$f(x) = 1 - \frac{1}{2} x^2 + ...$<br /><br /><a href="https://4.bp.blogspot.com/-PVy37OfBly4/W4hUwDJ7oDI/AAAAAAAABN8/EI1IcfdV4y8VW3kGMVjBCZ_GV8WlNun-wCLcBGAs/s1600/taylor_cos.png" imageanchor="1" ><img border="0" src="https://4.bp.blogspot.com/-PVy37OfBly4/W4hUwDJ7oDI/AAAAAAAABN8/EI1IcfdV4y8VW3kGMVjBCZ_GV8WlNun-wCLcBGAs/s320/taylor_cos.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br /><div class="sectitle">The Taylor series</div><br />As you can see, this "polynomialisation" of functions is a neat way to approximate a function we might not know how to implement exactly but know how to differentiate and how to find its value when $x$ is 0. But what if we don't know what $f(0)$ is such as in $ln(x)$ or $\frac{1}{x}$? A slight modification to our method allows us to use any value of $x$ and not just 0. Let's call this value $b$. By slightly modifying the polynomial we expect to be hiding inside the function, we can make the polynomial act the same way when $f(b)$ is used instead of $f(0)$:<br /><br />$f(x) = a_0 + a_1 (x - b) + a_2 (x - b)^2 + a_3 (x - b)^3 + a_4 (x - b)^4 + ...$<br />$f(b) = a_0 + a_1 (b - b) + a_2 (b - b)^2 + a_3 (b - b)^3 + a_4 (b - b)^4 + ...$<br />$f(b) = a_0$<br /><br />$f'(x) = a_1 + 2 a_2 (x - b) + 3 a_3 (x - b)^2 + 4 a_4 (x - b)^3 + ...$<br />$f'(b) = a_1$<br /><br />$a_i = \frac{f^i(b)}{i!}$<br /><br />The catch here is that we are now finding coefficients to the polynomial $a_0 + a_1 (x - b) + a_2 (x - b)^2 + ...$ and not of $a_0 + a_1 x + a_2 x^2 + ...$, but that's OK. Let's try this on $ln(x)$ with $b = 1$.<br /><br />$a_0 = \frac{f(1)}{0!} = \frac{ln(1)}{1} = 0$<br />$a_1 = \frac{f'(1)}{1!} = \frac{\frac{1}{1}}{1} = 1$<br />$a_2 = \frac{f''(1)}{2!} = \frac{-\frac{1}{1^2}}{1} = -1$<br />$a_3 = \frac{f'''(1)}{3!} = \frac{\frac{2}{1^3}}{1} = 2$<br />$...$<br /><br />So the first few terms of the polynomial hidden within $ln(x)$ is<br />$f(x) = (x - 1) - (x - 1)^2 + 2 (x - 1)^3 + ...$<br /><br /><a href="https://3.bp.blogspot.com/-OMYquo9voH4/W4hglr_yutI/AAAAAAAABOI/LIdM_LyV8bczzNaIUomN1Ibxr8KcOX5vQCLcBGAs/s1600/taylor_ln.png" imageanchor="1" ><img border="0" src="https://3.bp.blogspot.com/-OMYquo9voH4/W4hglr_yutI/AAAAAAAABOI/LIdM_LyV8bczzNaIUomN1Ibxr8KcOX5vQCLcBGAs/s320/taylor_ln.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />Adding more terms will approximate the original function better and better but what if we didn't have to? Remember how I said in the previous section that the polynomial approximates the original function best close to $x = 0$. Well now we can approximate it best around any point $b$ and not just around 0. This means that if our function has multiple known values, such as $cos(x)$ which is known to be 1 at $x = 0$, 0 at $x = \frac{\pi}{2}$, -1 at $x = \pi$, etc., then we can use several short polynomials centered at different points in the function instead of one large polynomial that approximates it well over a large interval. Let's try approximating $cos(x)$ around $x = \pi$, which means that we'll set $b$ to $\pi$.<br /><br />$a_0 = \frac{f(\pi)}{0!} = \frac{cos(\pi)}{1} = -1$<br />$a_1 = \frac{f'(\pi)}{1!} = \frac{-sin(\pi)}{1} = 0$<br />$a_2 = \frac{f''(\pi)}{2!} = \frac{-cos(\pi)}{2} = \frac{1}{2}$<br />$a_3 = \frac{f'''(\pi)}{3!} = \frac{sin(\pi)}{6} = 0$<br />$...$<br /><br />So the first few terms of the polynomial hidden within $cos(x)$ which best approximates the area around $x = \pi$ is<br />$f(x) = -1 + \frac{1}{2} (x - \pi)^2 + ...$<br /><br /><a href="https://2.bp.blogspot.com/-zzagcHK6GKw/W4hjmtagCeI/AAAAAAAABOU/fn9eG7ZYxboXuCtODr0jCJCcTdRw3LQCQCLcBGAs/s1600/taylor_cos_pi.png" imageanchor="1" ><img border="0" src="https://2.bp.blogspot.com/-zzagcHK6GKw/W4hjmtagCeI/AAAAAAAABOU/fn9eG7ZYxboXuCtODr0jCJCcTdRw3LQCQCLcBGAs/s320/taylor_cos_pi.png" width="320" height="169" data-original-width="1104" data-original-height="583" /></a><br /><br />This is useful when implementing mathematical functions on a computer. You keep several simple polynomials defined at different points in the domain of the function and then pick the closest one to the $x$ you need to evaluate. You can then compute an approximation that isn't too bad without requiring a lot of computational time.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4318944459823143473.post-86050409192796957252018-07-29T23:25:00.001+02:002018-07-29T23:45:37.534+02:00Hyperparameter tuning using Scikit-OptimizeOne of my favourite academic discoveries this year was <a href="https://scikit-optimize.github.io/">Scikit-Optimize</a>, a nifty little Python automatic hyperparameter tuning library that comes with a lot of features I found missing in other similar libraries.<br /><br />So as explained in an <a href="https://geekyisawesome.blogspot.com/2017/10/hyperparameter-tuning-using-hyperopt.html">earlier blog post</a>, automatic hyperparameter tuning is about finding the right hyperparameters for a machine learning algorithm automatically. Usually this is done manually using human experience but even simple MonteCarlo search random guesses can result in better performance than human tweaking (<a href="http://www.jmlr.org/papers/v13/bergstra12a.html">see here</a>). So automatic methods were developed that try to explore the space of hyperparameters and their resulting performance after training the machine learning model and then try to home in on the best performing hyperparameters. Of course each time you want to evaluate a new hyperparameter combination you'd need to retrain and evaluate your machine learning model, which might take a very long time to finish, so it's important that a good hyperparameter combination is found with as little evaluations as possible. To do this we'll use <a href="https://en.wikipedia.org/wiki/Bayesian_optimization">Bayesian Optimization</a>, a process where a separate simpler model is trained to predict the resulting performance of the whole hyperparameter space. We check this trained model to predict which hyperparameters will give the best resulting performance and actually evaluate them by training our machine learning model with them. The actual resulting performance is then used to update the hyperparameter space model so that it makes better predictions and then we'll get a new promising hyperparameter combination from it. This is repeated for a set number of times. Now the most common hyperparameter space model to use is a Gaussian Process which maps continuous numerical hyperparameters to a single number which is the predicted performance. This is not very good when your hyperparameters contain categorical data such as a choice of activation function. There is a <a href="https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9993">paper</a> that suggests that random forests are much better at mapping general hyperparameter combinations to predicted performance.<br /><br />Now that we got the theory out of the way, let's see how to use the library. We'll apply it on a gradient descent algorithm that needs to minimize the squared function. For this we'll need 3 hyperparameters: the range of the initial value to be selected randomly, the learning rate, and the number of epochs to run. So we have two continuous values and one discrete integer value.<br /><br /><pre class="brush:python">import random<br /><br />def cost(x):<br /> return x**2<br /><br />def cost_grad(x):<br /> return 2*x<br /><br />def graddesc(learning_rate, max_init_x, num_epochs):<br /> x = random.uniform(-max_init_x, max_init_x)<br /> for _ in range(num_epochs):<br /> x = x - learning_rate*cost_grad(x)<br /> return x<br /></pre><br />Now we need to define the skopt optimizer:<br /><br /><pre class="brush:python">import skopt<br /><br />opt = skopt.Optimizer(<br /> [<br /> skopt.space.Real(0.0, 10.0, name='max_init_x'),<br /> skopt.space.Real(1.0, 1e-10, 'log-uniform', name='learning_rate'),<br /> skopt.space.Integer(1, 20, name='num_epochs'),<br /> ],<br /> n_initial_points=5,<br /> base_estimator='RF',<br /> acq_func='EI',<br /> acq_optimizer='auto',<br /> random_state=0,<br /> )<br /></pre><br />The above code is specifying 3 hyperparameters:<br /><ul><li>the maximum initial value that is a real number (continuous) and that can be between 10 and 0</li><li>the learning rate that is also a real number but that is also on a logarithmic scale (so that you are equally likely to try very large values and very small values) and can be between 1 and 1e-10</li><li>the number of epochs that is an integer (whole number) and that can be between 1 and 20</li></ul>It is also saying that the hyperparameter space model should be initialized based on 5 random hyperparameter combinations (you train the hyperparameter space model on 5 randomly chosen hyperparameters in order to be able to get the first suggested hyperparameter), that this model should be a random forest (RF), that the acquisition function (the function to decide which hyperparameter combination is the most promising to try next according to the hyperparameter space model) is the expected improvement (EI) of the hyperparameter combination, that the acquisition optimizer (the optimizer to find the next promising hyperparameter combination) is automatically chosen, and that the random state is set to a fixed number (zero) so that it always gives the same random values each time you run it.<br /><br />Next we will use the optimizer to find good hyperparameter combinations.<br /><br /><pre class="brush:python">best_cost = 1e100<br />best_hyperparams = None<br />for trial in range(5 + 20):<br /> [max_init_x, learning_rate, num_epochs] = opt.ask()<br /> [max_init_x, learning_rate, num_epochs] = [max_init_x.tolist(), learning_rate.tolist(), num_epochs.tolist()]<br /> next_hyperparams = [max_init_x, learning_rate, num_epochs]<br /> next_cost = cost(graddesc(max_init_x, learning_rate, num_epochs))<br /> if next_cost < best_cost:<br /> best_cost = next_cost<br /> best_hyperparams = next_hyperparams<br /> print(trial+1, next_cost, next_hyperparams)<br /> opt.tell(next_hyperparams, next_cost)<br />print(best_hyperparams)<br /></pre>The nice thing about this library is that you can use an 'ask/tell' system where you ask the library to give you the next hyperparameters to try and then you do something with them in order to get the actual performance value and finally you tell the library what this performance value is. This lets you do some nifty things such as ask for another value if the hyperparameters resulted in an invalid state in the machine learning model or even to save your progress and continue later.<br /><br />In the above code we're running a for loop to run the number of times we want to evaluate different hyperparameters. We need to run it for the 5 random values we specified before to initialize the hyperparameter space model plus another 20 evaluations to actually optimize the hyperameters. Now skopt does something funny which is that it returns not plain Python values for hyperparameters but rather each number is represented as a numpy scalar. Because of this we convert each numpy scalar back into a plain Python float or int using ".tolist()". We ask for the next hyperparamters to try, convert them to plain Python values, get their resulting cost after running gradient descent, store the best hyperparameters found up to now, and tell the library what the given hyperparameters' resulting performance was. At the end we print the best hyperparamter combination found.<br /><br />Some extra stuff:<br /><ul><li>You can ask for categorical hyperparameters using "skopt.space.Categorical(['option1', 'option2'], name='options')" which will return one of the values in the list when calling "ask".</li><li>You can ask for a different hyperparameter combination in case of an invalid one by changing "ask" to give you several hyperparameter suggestions rather than just one and then trying each one of them until one works by using "opt.ask(num_hyperpars)" (you can also incrementally ask for more values and always take the last one).</li><li>You can save and continue by saving all the hyperparameters evaluated and their corresponding performance value in a text file. You then later resupply the saved hyperparameters and their performance using "tell" for each one. This is much faster than actually evaluating them on the machine learning model so straight supplying known values will be ready quickly. Just be careful that you also call "ask" before each "tell" in order to get the same exact behaviour from the optimizer or else the next "tell" will give different values from what it would have given had you not loaded the previous ones manually.</li></ul>Unknownnoreply@blogger.com0