Sunday, November 29, 2020

Summary of research paper "A Three-Player GAN: Generating Hard Samples To Improve Classification Networks" by Vandenhende

After reading this paper, I was confused by Algorithm 1 (page 2) as it did not really match parts of the text. So I contacted the author and would now like to clarify it for everyone else.

The paper describes a generative adversarial network which generates images that are hard to classify in order to augment a training set for a second classifier. This is done by attaching a classifier to the generator (apart from the discriminator) which learns to classify the generated data with a gradient reversal layer between the classifier and the generator. The gradient reversal layer forces the generator to generate examples which are difficult to classify by negating the gradient signal coming from the classifier loss before passing it to the generator. This makes the generator learn to maximise the classifier loss whilst the classifier is learning the minimise it, resulting in adversarial learning. The author does not give a reason for why the same kind of adversarial learning is not applied to both parts of the model.

What confused me is how would you know what class the difficult to classify examples produced by the generator would be when training the target classifier with the augmented training set. The reason I was confused is because I did not realise that this is not a GAN, but a CGAN, a conditional generative adversarial network. A CGAN works by feeding both the generator and the discriminator a class label. The label would be the label of the real data and a random label in generated data. Since the labels are the same for both the generator and the discriminator, the discriminator learns to discriminate the fake data by matching it to the label and seeing if it looks like what is expected given the label whilst the generator produce images which match the given label in order to fool the discriminator.

By combining this with the gradient reversal layer, the generated images would be both similar to real data of the given label and also difficult to classify. These two properties would be contradictory except for the fact that the effect of the classifier is scaled down due to the lambda hyperparameter of the gradient reversal layer, making the discriminator take precedence.

Sunday, October 25, 2020

Python generic methods/functions

This is something minor but that could use with some more attention. In C# you can use generics with methods so that if a function needs to use a generic type which can be different with different calls then you can simply do this:
class C {
    T getValue() {
    }
}
Now Python has type hints which are used by the program mypy, but I was having trouble finding how to do generic methods with it. The problem is that generic method examples shown require you to pass in a parameter with the generic type. But if the type is not used in the parameters but only in the return, as shown in the C# example above, then you'd be in trouble. The solution is to pass the type itself as a parameter, in the same way that the C# example above does but as a parameter instead of outside the parameter list. It is shown here but not as a generic function.
from typing import TypeVar, Type

T = TypeVar('T')

class C:
    def get_value(return_type: Type[T]) -> T:
        pass

Tuesday, September 29, 2020

Proof of the volume of a pyramid and cone formulae

Let's take a square based pyramid, one which is arranged such that it fits in the corner of a cube like this:
The volume of this kind of pyramid is a third of that of the cube. This is because 3 such pyramids will perfectly fit in the cube, as shown below:
In fact you don't even need the pyramid to fit in a cube, it can fit in a cuboid instead. Three such rectangular based pyramids will always exactly fit the cuboid. Now the next natural question is about pyramids that are arranged differently, such as when the top point is in the center of the base, as shown below:
You will notice that no matter how the pyramid is arranged, it always consists of an infinite number of stacked squares, starting from the square at the base of the pyramid and getting progressively smaller until the top point is reached. Since the base and top point of both pyramids are exactly the same, the only thing that is changing is where these square slices are horizontally. So the same set of squares in the orange pyramid are used in the green pyramid with the only difference being where they are placed. This means that the same volume is being used by both pyramids so the same formula is used for both. Therefore, the formula for the volume of a rectangular based pyramid is: $$V = \frac{1}{3}hA$$ where $h$ is the height of the pyramid and $A$ is the area of the base. Now what about when the base is not a square? Will a circle work? A circular based pyramid is a cone:
The cone is exactly the same as the square based pyramid except that instead of consisting of a stack of squares consists of a stack of circles (discs, to be exact). Now if the circles fit exactly inside those of a square based pyramid, then the area of the circle is going to be a fraction of the area of the square:
In terms of the radius of the circle ($r$), that fraction is: $$\frac{\pi r^2}{(2r)^2} = \frac{\pi r^2}{4r^2} = \frac{\pi}{4}$$ Now keeping in mind that this fraction will always be the same for any square in the stack of squares forming the pyramid (from the base to the top point), this means that the volume of the cone must be the same fraction of the volume of the pyramid. Therefore: $$V = \frac{\pi}{4}\frac{1}{3}h(2r)^2 = \frac{1}{3}h\pi r^2$$ In fact, the equation did not change from the previous one because it's still using the base area except that the base area is now of a circle instead of a rectangle. What about a trangular based pyramid (tetrahedron)? Same concept. The stack is made of triangles instead of rectangles or circles:
The fraction of the area occupied by the triangle over the area occupied by the square is half: $$\frac{l^2/2}{l^2} = \frac{1}{2}$$ where $l$ is the side length of the square. So now the volume will be half that of the square based pyramid: $$V = \frac{1}{2}\frac{1}{3}h(l)^2 = \frac{1}{3}h\frac{l^2}{2} = \frac{1}{6}hl^2$$ Again we see that the base area equation still holds and must hold for any shaped base pyramid (where the base shape is linearly reduced to a point along the height of the pyramid).

Sunday, August 30, 2020

Why the area under a velocity-time graph gives the distance travelled

I remember back when I was in school when, during physics lessons on motion, we would have velocity-time graphs of some car, where the graph tells you how fast the car was going at any given time, like the one below:

So this is saying that the car started from rest (0m/s) then it accelerated to 20m/s, and so on. The interesting part about this is that you can apparently find the distance travelled by the car by finding the area under the graph. Now eventually I learned that this is because the area under the graph is equal to the anti-derivative of the graph which makes transforms the velocity-time graph back into a distance-time graph (since the derivative of the distance-time graph is the velocity-time graph). But back when I was doing this thing in school we didn't know about calculus so we were not given a reason for why this works, which is a shame because you actually don't need calculus if you're not dealing with curves.

So why does this work? Well, in order to find the area under the graph you need to first break it up into a set of basic shapes as below. Triangles, rectangles, trapezia, and so on. You then find the area of each individual basic shape and find the total.

Let's start with the rectangle. Why is the area under a rectangle equal to the distance travelled? A rectangle means that the velocity for a period of time was constant. The width is the duration and the height is the velocity.

Now if the car was travelling at a constant speed of $v$ for duration $t$, what is the distance travelled? If $v = s/t$ then $s = vt$, that is, the distance travelled is the velocity multiplied by the duration, which just so happens to be the area of the rectangle.

Now one important thing to note about all the sub-shapes under the graph is that they are all different forms of trapezia. A trapezium with equal sides is a rectangle and a trapezium with one side being zero is a triangle. So we can focus on just the trapezium.

So now we can't just use the equation $v = s/t$ because that assumes that the velocity is constant, whereas here it has increased (steadily) from $v_1$ to $v_2$. Now without referring to Newton's equations of motion, it is clear that there exists a single constant velocity which is between $v_1$ and $v_2$ such that the distance travelled in the same duration is equal to the distance travelled after this acceleration. In other words, there is some rectangle with a height that is between $v_1$ and $v_2$ that should give the same answer. We usually call this height the average velocity and it is usually found by dividing the total distance travelled by the total duration of travelling. But here we don't have the distance travelled, only the initial and final velocity. So what is the average velocity of a uniformly increasing velocity?

The answer is the average height of the trapezium, which is $(v_1 + v_2)/2$. We can convince ourselves that this is the average height of the trapezium by sampling a number of points along the top side of the trapezium and finding what the average of the sample heights is. Let's take trapezium C in the brokendown graph above which is between time 20 and 25:

Of course since this is just a single case you might not be convinced yet; so let's do this algebraically using a variable amount of samples $n$:

$$ \hat{v} = \frac{\sum_{i \in 0...n-1}{(v_1 + \frac{i\times(v_2 - v_1)}{n-1})}}{n} \\ \hat{v} = \frac{\sum_{i \in 0...n-1}{v_1} + \sum_{i \in 0...n-1}{\frac{i\times(v_2 - v_1)}{n-1}}}{n} \\ \hat{v} = \frac{n v_1 + \frac{(v_2 - v_1)\times\sum_{i \in 0...n-1}{i}}{n-1}}{n} \\ \hat{v} = v_1 + \frac{(v_2 - v_1)\times\sum_{i \in 0...n-1}{i}}{n\times(n-1)} \\ \hat{v} = v_1 + \frac{(v_2 - v_1)\times\frac{n\times(n-1)}{2}}{n\times(n-1)} \\ \hat{v} = v_1 + \frac{v_2 - v_1}{2} \\ \hat{v} = v_1 + \frac{v_2}{2} - \frac{v_1}{2} \\ \hat{v} = \frac{v_2}{2} + \frac{v_1}{2} \\ \hat{v} = \frac{v_2 + v_1}{2} $$

Great! So now we know that given a trapezium of velocity-time graph, this is equivalent to a rectangle of the same width but with a height that is the average of the two sides:

We already know how to find the distance travelled by a rectangle shaped velocity-time graph so let's use it on the above rectangle: $s = (v2+v1)/2\times t$. It just so happens that this is also the area of the trapezium. So we can just skip all the above and just take the area of the trapezium to find the distance travelled.

All that's left is what is the distance travelled when the velocity is zero since it's just a flat line, but I'm sure you can guess what that is.

Friday, July 31, 2020

How to capture/redirect output in Python's print

The print function in Python v3 is one of the more sophisticated changes made from v2. You can tell it where to print by using the parameter 'file'. It can either print to the screen as usual or it can print into a text file for example. Here's how you make it print into a text file:

with open('x.txt', 'w', encoding='utf-8') as f:
    print('hello', file=f)

The 'file' parameter accepts a stream object and the print function basically just passes the string you give it to this stream object's 'write' method. In fact in the above example you could have written to the file without using print as follows:

with open('x.txt', 'w', encoding='utf-8') as f:
    f.write('hello')
    f.write('\n')
    f.flush()

So question is, what is the stream used by print when it prints to the screen? The print method's signature is defined as follows:

print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

So the answer is sys.stdout, which is a variable that points to the object sys.__stdout__. In fact you can make your own print function by using the sys.__stdout__ object as follows:

import sys

def my_print(line):
    sys.__stdout__.write(line)
    sys.__stdout__.write('\n')
    sys.__stdout__.flush()

my_print('hello')

Note that the output will appear in the command line terminal. Now we saw how the default file parameter of the print function is whatever sys.stdout is pointing to. This means that we can highjack this variable with our own stream so that default prints get redirected to our code instead of the screen. Here's how to make your own stream:

class MyStream(object):

    def write(self, text):
        pass

    def flush(self):
        pass

Just replace 'pass' in the write method with whatever you want and that is what will happen with the print string. Now normally you would do something like this:

print('hello', file=MyStream())

but if you're trying to capture the outputs of some library, like sklearn, then you don't have control over the library's print functions. Instead you can do this:

import sys

tmp = sys.stdout
sys.stdout = MyStream()

print('hello')

sys.stdout = tmp

First you hijack the stdout variable with your own stream and then after the prints you restore the stdout variable with the original stream. Now, you need to be careful with what happens inside your stream's write method because if there is another print in there then it will also call your stream's write method which will result in an infinite recursion loop. What you can do is temporarily restore the original stream whilst inside your stream, like this:

import sys

class MyStream(object):

    def write(self, text):
        tmp = sys.stdout
        sys.stdout = sys.__stdout__

        print('my', text)

        sys.stdout = tmp

    def flush(self):
        pass


tmp = sys.stdout
sys.stdout = MyStream()

print('hello')

sys.stdout = tmp

Now every time you print something, you'll get 'my' added to its front. Unfortunately the print function sends the '\n' at the end of the line in a separate call to the stream's write method, which means that you actually get two calls per print:

my hello
my


A simple solution is to just put an 'if' statement that checks if the text received is just a '\n' and ignore it if so:

import sys

class MyStream(object):

    def write(self, text):
        if text == '\n':
            return

        tmp = sys.stdout
        sys.stdout = sys.__stdout__

        print('my', text)

        sys.stdout = tmp

    def flush(self):
        pass


tmp = sys.stdout
sys.stdout = MyStream()

print('hello')

sys.stdout = tmp

Tuesday, June 30, 2020

Git command line cheat sheet

After my previous blog post about introducing the git command line, here is a table of all the commands I regularly use in git:

The basics

CommandExplanation
git init .
Make the current directory a git repository.
git status
Get information about the current state of the repository.
git add (path)
Add a modified file to the git index. Use '.' for path in order to add all modified files.
git rm (path)
Remove a modified file from the git index.
git diff
Get the file changes from what is in the git index to what is in the currently modified file.
git commit
Take a snapshot backup of the current index.
git log
Get a list of commits made.
git tag -a "(tag name)"
Tag the last commit with a special name (can be used to mark version numbers).
git tag
List existing tags.
git reset --hard HEAD
Return to the last commit (permanently remove any uncommitted changes).


Branches

CommandExplanation
git branch (branch name)
Create a new branch.
git branch --list
List existing branches.
git branch --delete (branch name)
Delete an existing branch.
git checkout (branch name)
Jump to a different branch.
git merge (branch name)
Merge the given branch into the current branch so that they become one branch.
git stash push
Stash any file changes without committing them in order to jump to another branch.
git stash pop
Retrieve back stashed file changes.
git log --graph
Like log but will show how commits in different branches are connected together.

Sunday, May 31, 2020

Quick Git command line tutorial

Git is a version control system which facilitates the creation and maintenance of versions in a project. Usually it's used for software code but it can also be used for things like documents. Its most useful for anything that is text based as you will be able to see which lines have changed across versions.

Given that you've installed git, create folder that will store your project, open your terminal (cmd in Windows), navigate to the folder, and turn it into a repository by entering:

git init .

This will create a folder called ".git" in your current directory which lets you do repository stuff to it. Now whenever you want to do repository stuff, just open your terminal and navigate back to the folder. To see this we can ask for a status report of the repo by entering:

git status

This will output the following:

On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)

This is basically telling us that the repo is empty. Now we start putting stuff there. Let's create a readme file using markdown with the file name readme.md:

My first git repo
=================

This is my readme file.


After saving, if we go back to the terminal and enter:

git status

we will now see:

On branch master

Initial commit

Untracked files:
  (use "git add ..." to include in what will be committed)

        readme.md

nothing added to commit but untracked files present (use "git add" to track)

This is saying that readme.md is a file in the repo folder that is not being kept track of by git. We can add this file to the git index by entering:

git add readme.md

After asking for the status again, we will now see:

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached ..." to unstage)

        new file:   readme.md

Now the file is in the index and is 'staged'. If we update the file and save it again:

My first git repo
=================

This is my updated readme file.


the status will now say:

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached ..." to unstage)

        new file:   readme.md

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

        modified:   readme.md

This is saying that we have a staged file that has also been modified. We can check what has been changed in the file since we last staged it by entering:

git diff

diff --git a/readme.md b/readme.md
index 75db517..5bdd78c 100644
--- a/readme.md
+++ b/readme.md
@@ -1,4 +1,4 @@
 My first git repo
 =================

-This is my readme file.
+This is my updated readme file.

Diff shows you all the lines that were changed together with some unchanged lines next to them for context. The '-' line was replaced by the '+' line. We're also told in "@@ -1,4 +1,4 @@" that the line number that was changed was 4 (1 line was removed at line 4, and another line was added at line 4).

Now we stage this modification so that it is also kept track of:

git add readme.md

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached ..." to unstage)

        new file:   readme.md

Staging changes is not the point of repositories. The point is committing the staged changes. A commit is a backup of the currently indexed files. When you take a backup, you make a copy of your project folder and give it a name so that you can go back to it. This is what a commit does. Enter:

git commit

This will open up a text editor so you can enter a description of what you're committing.
  • If the text editor is vim, which you will know because it is not helpful at all, you need to first press 'i' for 'insert' before you type anything. To save, press ESC, followed by ':', followed by 'w', followed by enter. To exit, press ESC, followed by ':', followed by 'q', followed by enter.
  • If it's nano then just follow the on screen commands, noting that '^' means CTRL.

The commit message you write can later be searched in order to find a particular commit. Note that commits are generally thought of as being unchangable after they are made, so make sure you write everything you need to. The first line of the commit message has special importance and is considered to be a summary of what has changed. You should keep it under 50 characters long so that it can be easily displayed in a table. It should be direct and concise (no full stops at the end, for example), with the rest of the lines under it being a more detailed description to make up for any missing information in the first line. A blank line needs to separate the first line from the rest of the lines. Note that it's fine to only have the first line is it's enough.

added the word 'updated' to readme

readme.md was updated so that it says that it is my updated readme file.
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
#
# Initial commit
#
# Changes to be committed:
#       new file:  readme.md

The lines with the # in front were written by git and should be left there. If we check the status again we will now see:

On branch master
nothing to commit, working tree clean

This is saying that there were no changes since the last commit. We can now see all of our commits by entering:

git log

Author: mtanti 
Date:   Sat May 30 10:46:17 2020 +0200

    added the word 'updated' to readme

    readme.md was updated so that it says that it is my updated readme file.

We can see who made the commit, when, and what message was written. It's important that commits are done frequently but on complete changes. A commit is not a save (you do not commit each time you save), it is a backup, and the state of your project in each backup should make sense. Think of it as if you're crossing out items in a todo list and with each item crossed out you're taking a backup. You should not take a backup in between items. On the other hand, your items should be broken down into many simple tasks in order to be able to finish each one quickly.

Now, let's add a folder called 'src' to our repo and then check the status.

On branch master
nothing to commit, working tree clean

In git's eyes, nothing has changed because git does not have a concept of a folder, only of the directory of files. We need to put a file in the folder in order to be able to add it to git's index. Let's add 2 text files to src: 'a.txt' and 'b.txt' with the following content each:

line 1
line 2
line 3


The status now shows:

On branch master
Untracked files:
  (use "git add ..." to include in what will be committed)

        src/

nothing added to commit but untracked files present (use "git add" to track)

This is saying that we have a new folder called 'src' with some files in it. We can add the folder by using the add command. If you want you can avoid having to include the file names of the files you're adding by just using a '.', which means "all unstaged modified or new files":

git add .

On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

        new file:   src/a.txt
        new file:   src/b.txt

Let's commit these two files.

git commit

added source files

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#       new file:   src/a.txt
#       new file:   src/b.txt

Note that I didn't add anything else to the message other than the first line. There is no need to specify what files were added as they can be seen in the commit. Now let's look at the log of commits:

git log

commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (HEAD -> master)
Author: mtanti 
Date:   Sat May 30 11:17:14 2020 +0200

    added source files

commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
Author: mtanti 
Date:   Sat May 30 10:46:17 2020 +0200

    added the word 'updated' to readme

    readme.md was updated so that it says that it is my updated readme file.

We can see how all the commits are shown in descending order of when they were made. You might be wondering what 'HEAD' is referring to. HEAD is the commit we are working on. We can now move in between commits and move in time. This is very useful if you start working on something and realise that there was a better way to do it and need to undo all your work up to a particular point in the commit history. When we do this, we would be moving the HEAD to a different commit. The HEAD can be moved by using the checkout command:

git checkout HEAD~1

This is saying "go back one commit behind the HEAD". The command gives the following output:

Note: checking out 'HEAD~1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b 

HEAD is now at f71f17b... added the word 'updated' to readme

This is saying that we are now in the commit with the message "added the word 'updated' to readme". It is also possible to use the hash as a commit identifier in order to just to it directly without being relative to the HEAD. The hash is the 40 digit hexadecimal next to the commits in the log. For example, the first commit had a hash of 'f71f17b63c6b3ddb7506000cbc422e8f1b173958' so we could have entered "git checkout f71f17b63c6b3ddb7506000cbc422e8f1b173958". We can also just use the first 7 digits to avoid typing everything but it's more likely that there will be a collision with another hash.

If look at the log now, we'll see:

commit f71f17b63c6b3ddb7506000cbc422e8f1b173958 (HEAD)
Author: mtanti 
Date:   Sat May 30 10:46:17 2020 +0200

    added the word 'updated' to readme

    readme.md was updated so that it says that it is my updated readme file.

which shows that the HEAD has been moved one commit behind in the timeline.

Now if we look at the project folder (and refresh), we'll see that the folder 'src' has been physically removed. We can restore it by moving forward in time and go to the latest commit which includes the 'src' folder.

Unfortunately, there is no direct notation for moving forward in time as what we just did is not normal usage of git. Note that at the moment the HEAD is said to be 'detached', which means that it is not in a proper place (the end of the timeline). We can get back to the proper place we should be in by checking out to 'master'.

git checkout master

Checking the log, we now see:

commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (HEAD, master)
Author: mtanti 
Date:   Sat May 30 11:17:14 2020 +0200

    added source files

commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
Author: mtanti 
Date:   Sat May 30 10:46:17 2020 +0200

    added the word 'updated' to readme

    readme.md was updated so that it says that it is my updated readme file.

So what is this 'master' business? What we were calling timelines are actually called 'branches' in git, and branches are one of the most important things in git. Imagine you've started working on a new feature in your program. Suddenly you are told to let go of everything you're doing, work on fixing a bug, and quickly release an update right away. The feature you're working on is half way done and you can't release an updated version of the program with a half finished function; but there's no way you'll finish the feature quickly enough. Do you undo all the work you did on the feature so that you're at a stable version of the program and able to fix the bug? Of course not. That's what branches are for.

With branches you can keep several versions of your code available and switch from one to the other with checkout. The master branch is the one you start with. Ideally the master's last commit should always be in a publishable state (no 'work in progress'). Of course if you're just working on the master branch then this would not be possible without taking committing very rarely, which is bad practice. The solution is to have several development branches on which you modify the project bit by bit. Every time you start working on a new publishable version of your project, you start a new branch and work on modifying your project to create the new version. Once you finish what you're doing and are happy with the result, you then merge the development branch into the master branch. If you're in the middle of something and need to fix a bug, you switch to the master branch, create a new branch for fixing the bug, fix it, and merge it to the master. Then you merge the new master code with your earlier development branch so that you can continue working as if nothing happened.

Let's make a development branch. Enter the following:

git branch mydevbranch

This will create a branch called 'mydevbranch' that sticks out from the current branch at the current HEAD, that is, it will create a branch that sticks out from master's last commit. By 'sticks out' I mean that when you switch to mydevbranch, the project will be changed to look like it was at the commit from where the branch is starting from. Alternatively you can include a commit hash after the name of the branch in order to make it stick out from an earlier point in the current branch. For example "git branch mydevbranch f71f17b63c6b3ddb7506000cbc422e8f1b173958" will create a branch from the first commit.

We can see a list of branches by entering:

git branch --list

* master
  mydevbranch

The asterisk shows which branch is currently active (has the HEAD).

Now switch to the new branch using checkout:

git checkout mydevbranch

and check the status:

On branch mydevbranch
nothing to commit, working tree clean

It now says that we're on mydevbranch instead of on master. Note that whilst we're on the new branch, any modifications we make to the project will be stored on the branch whilst master will remain as it is. Let's modify src/a.txt to look like this:

line 1
line 2 changed
line 3


And now add and commit this file and then check the log:

commit 9bc4488ac847bceccb746eeafb1a8c239de350f2 (HEAD -> mydevbranch)                                                                            Author: mtanti                                                                                                              Date:   Sun May 31 10:24:23 2020 +0200                                                                                                                                                                                                                                                                changed line in a.txt                                                                                                                                                                                                                                                                         commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e (master)                                                                                         Author: mtanti                                                                                                              Date:   Sat May 30 11:17:14 2020 +0200                                                                                                                                                                                                                                                                added source files                                                                                                                                                                                                                                                                            commit f71f17b63c6b3ddb7506000cbc422e8f1b173958                                                                                                  Author: mtanti                                                                                                              Date:   Sat May 30 10:46:17 2020 +0200                                                                                                                                                                                                                                                                added the word 'updated' to readme                                                                                                                                                                                                                                                                readme.md was updated so that it says that it is my updated readme file.

Note how the log shows us where each branch is and where the HEAD is. In this case the HEAD is at mydevbranch's last commit and master is one commit behind it. Let's add another commit after changing src/b.txt:

line 1
line 2
line 3
line 4


Now that we're done with the changes in this new version, we can merge the development branch to master by first switching to master and then merging to mydevbranch:

git checkout master
git merge mydevbranch

Updating 59cfc3f..f962efb
Fast-forward
 src/a.txt | 2 +-
 src/b.txt | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

Note that this was a fast forward merge, which is the best kind of merge you can have. It happens when all the changes were made in a neat sequence which can be simply reapplied on the master, as opposed to having multiple branches changing stuff at the same time.

Before seeing what a merge conflict looks like, let's look at the log again now that we've made the merge, but this time as a graph:

git log --graph

* commit f962efb409e4f08f94d717dec866519bc2848e8f (HEAD -> master, mydevbranch)
| Author: mtanti 
| Date:   Sun May 31 10:30:42 2020 +0200
|
|     added a new line to b.txt
|
* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2
| Author: mtanti 
| Date:   Sun May 31 10:24:23 2020 +0200
|
|     changed line in a.txt
|
* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e
| Author: mtanti 
| Date:   Sat May 30 11:17:14 2020 +0200
|
|     added source files
|
* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
  Author: mtanti 
  Date:   Sat May 30 10:46:17 2020 +0200

      added the word 'updated' to readme

      readme.md was updated so that it says that it is my updated readme file.

Here we can see a neat straight line moving through a timeline of commits, from the first to the last with no splits along the way. The master and development branch are together on the last commit in the timeline. Now let's see how this can be different.

Switch back to the development branch so that you can start working on the next version of the project:

git checkout mydevbranch

and change src/a.txt to have a new line:

line 1
line 2 changed
line 3
line 4


As you're working on the file, you get a call from your boss who tells you to immediately change all the lines to start with a capital letter in both src/a.txt and src/b.txt. Now what? You need to stop working on mydevbranch, go back to master, and create a new branch for working on the new request.

Before switching branches it's important that you do not have any uncommitted changes in your project, otherwise they will be carried over to the master branch and that would make things confusing. If you're not at a commitable point in your change then you can save the current state of the branch files by stashing them instead:

git stash push

This will add a stash object to the current commit of the current branch which can then be retrieved and removed. Next checkout to master:

git checkout master

Create and switch to a new branch.

git branch myurgentbranch
git checkout myurgentbranch

and apply the urgent modifications.

src/a.txt
Line 1
Line 2 changed
Line 3


src/b.txt
Line 1
Line 2
Line 3
Line 4


Commit these changes and merge to master:

git add .
git commit
git checkout master
git merge myurgentbranch

The merge should be a fast forward since we started working on the last commit of master with no interruptions. The log will show us the current situation:

git log --graph

* commit 5dcd492113d3942550a58efdc7b90e15bd36d537 (HEAD -> master, myurgentbranch)                                                               | Author: mtanti                                                                                                            | Date:   Sun May 31 11:06:39 2020 +0200                                                                                                         |                                                                                                                                                |     capitalised each line in source files                                                                                                      |                                                                                                                                                * commit f962efb409e4f08f94d717dec866519bc2848e8f (mydevbranch)                                                                                  | Author: mtanti                                                                                                            | Date:   Sun May 31 10:30:42 2020 +0200                                                                                                         |                                                                                                                                                |     added a new line to b.txt
|
* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2
| Author: mtanti 
| Date:   Sun May 31 10:24:23 2020 +0200
|
|     changed line in a.txt
|
* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e
| Author: mtanti 
| Date:   Sat May 30 11:17:14 2020 +0200
|
|     added source files
|
* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
  Author: mtanti 
  Date:   Sat May 30 10:46:17 2020 +0200

      added the word 'updated' to readme

      readme.md was updated so that it says that it is my updated readme file.

You can see how when we commit the changes in mydevbranch, we'll have a fork in the timeline from the straight line that we currently have. Let's see what happens then.

Now that we're ready from the urgent request we can go back to the normal development branch:

git checkout mydevbranch

and pop back the changes we stashed:

git stash pop

Note that "git stash list" shows what is in the stash. We were working on src/a.txt where we were adding a new line to the file:

line 1
line 2 changed
line 3
line 4


Now let's imagine that we finished the changes we were making and can commit them:

git add .
git commit

Now a.txt is supposed to have two changes: the new line and each line starting with a capital letter. Each of these changes are on a different branch. Let's start by making the development branch complete by merging the changes we applied to the master to the development branch (note that we're reversing the direction of the merge now because we want the development branch to be up to date).

git merge master

Auto-merging src/a.txt
CONFLICT (content): Merge conflict in src/a.txt
Automatic merge failed; fix conflicts and then commit the result.

This is when things start getting hairy as you'll need to manually fix your conflicting changes. The status will tell us which files need to be fixed:

On branch mydevbranch
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Changes to be committed:

        modified:   src/b.txt

Unmerged paths:
  (use "git add ..." to mark resolution)

        both modified:   src/a.txt

This saying that during merging, src/b.txt was updated with no conflicts but src/a.txt requires manual intervention. If we open src/a.txt we'll see the following:

<<<<<<< HEAD
line 1
line 2 changed
line 3
line 4
=======
Line 1
Line 2 changed
Line 3
>>>>>>> master


The file has been modified by git to show the conflicting changes. 7 arrows and equals signs are used to highlight sections of changes which need to be resolved. Note that all the lines have been changed here to the section is the whole file. Now we can either fix the file directly or use git's "git mergetool" to help us by showing all the changes. In this case we can modify the file directly:

Line 1
Line 2 changed
Line 3
Line 4


Make sure to remove all the arrows and equals signs. Status will now output:

All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)

Changes to be committed:

        modified:   src/a.txt
        modified:   src/b.txt

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

        modified:   src/a.txt

It's now saying that conflicts were fixed. All we need to do is add the fixed file and continue the merge.

git add src/a.txt
git merge --continue

At the end of the merge, you will be asked to enter a commit message in order to automatically commit the merge change. Git automatically puts the message "Merge branch 'master' into mydevbranch", which is good enough.

The log now shows the new timeline:

*   commit 49abf32812ec7dbeeab792729098a61fd3446a45 (HEAD -> mydevbranch)
|\  Merge: b6cc3c8 5dcd492
| | Author: mtanti 
| | Date:   Sun May 31 11:36:50 2020 +0200
| |
| |     Merge branch 'master' into mydevbranch
| |
| * commit 5dcd492113d3942550a58efdc7b90e15bd36d537 (myurgentbranch, master)
| | Author: mtanti 
| | Date:   Sun May 31 11:06:39 2020 +0200
| |
| |     capitalised each line in source files
| |
* | commit b6cc3c887533a995e749589b6cdbfaaad530b03e
|/  Author: mtanti 
|   Date:   Sun May 31 11:17:55 2020 +0200
|
|       added new line to a.txt
|
* commit f962efb409e4f08f94d717dec866519bc2848e8f
| Author: mtanti 
| Date:   Sun May 31 10:30:42 2020 +0200
|
|     added a new line to b.txt
|
* commit 9bc4488ac847bceccb746eeafb1a8c239de350f2
| Author: mtanti 
| Date:   Sun May 31 10:24:23 2020 +0200
|
|     changed line in a.txt
|
* commit 59cfc3f057bf1f19038ab15c4357d97bc84ac30e
| Author: mtanti 
| Date:   Sat May 30 11:17:14 2020 +0200
|
|     added source files
|
* commit f71f17b63c6b3ddb7506000cbc422e8f1b173958
  Author: mtanti 
  Date:   Sat May 30 10:46:17 2020 +0200

      added the word 'updated' to readme

      readme.md was updated so that it says that it is my updated readme file.

We can now continue modifying our development branch with anything left to add in the new version and then switch to master and merge, which will be a fast forward since we have resolved all conflicts already. Note that if you see the log before merging, you will not see the development branch since it is not in the master's timeline. You can see all timelines by entering "git log --graph --all".

git checkout master
git merge mydevbranch

If you want to delete the urgent branch, just enter "git branch --delete myurgentbranch".

This concludes our quick tutorial. I didn't mention anything about remote repositories and pushing and pulling to and from the repositories but basically if you setup a github account, you can keep a backup online for multiple developers to work together, pushing the local repository to the online one and pulling the online repository when it was changed by someone else. The online repository is referred to as 'origin' in git.