Monday, November 7, 2011
A more complete proof that the square root of 2 is irrational.
Assume that the square root of 2 is rational. So,
√2 = a/b
=>
2 = (a/b)^2
=>
2 = a^2 / b^2
=>
2 b^2 = a^2
So far so good. The usual proof continues with the following statement:
Since a^2 is equal to a natural number multiplied by 2, a^2 is an even number. But for a^2 to be even, a must be even too (see proof in the appendix at the end). So that means that there is a natural number k where a = 2k.
Since a = 2k and 2 b^2 = a^2,
2 b^2 = a^2
=>
2 b^2 = (2k)^2
=>
2 b^2 = 4 k^2
=>
b^2 = 2 k^2
Just like for a, b must also be an even number.
The proof usually ends right there, claiming that since a and b are both even numbers, then the fraction a/b is not simplified and irreducible, contradicting that a/b exists. But let's see where the proof takes us if we just keep on going.
If both a and b are even, then the fraction a/b can be simplified by dividing both a and b by 2, that is, if a = 2k and b = 2l, then we can say that √2 = k/l. But after doing this we can reapply the same reasoning on k and l and we'll discover that k and l are also both even numbers, and we can do it again and again ad infinitum.
So, which natural numbers can be divided by 2 infinitely? Only 1 number can do that, zero. But replacing zero for both a and b will not make their quotient a real number, or if you want to define 0/0, it will not result in a number whose square equals 2. So there is no fraction a/b which gives √2.
So there you have it, a proof that goes on till the end.
===========================
APPENDIX
Now on to the proof that an even square can only come from an even number squared:
Let a^2 be an even number.
a can either be even or odd, that is there must exist an n where
a = 2n or a = 2n + 1
If a = 2n, a^2 = (2n)^2 = 4 n^2 = 2(2 n^2), which is an even number
If a = 2n + 1, a^2 = (2n + 1)^2 = 4 n^2 + 4n + 1 = 2(2 n^2 + 2n) + 1, which is an odd number
So an even number squared will give an even number and an odd number squares will give an odd number. Hence, a square even number can only come from an even number squared.
Wednesday, November 2, 2011
Wisdom hierarchy vs Bloom's taxonomy
An example of this is seeing the symbols "3", "×", "4", "=" and "12". Those symbols may not mean anything to you if you don't know arithmetic.
So now "3×4=12" has a meaning. It means that if you multiply the numbers 3 and 4, the result is equal to 12. You may know what the symbols mean but you may not be in a form that is useful.
So now we have organized every multiplication of two numbers we learned into a multiplication table. If we want to know what a particular multiplication equals, we know how to do that, we simply look it up our multiplication table. You may know how to multiply numbers together but you may not know why when numbers are multiplied they give a particular number as a result.
So now we understand that multiplication is repeated addition. Now we can add to our knowledge new information which is generated from our understanding rather than from the external world (such as having to ask someone). You may understand how to do multiplication but you may not be able to compare different methods to doing multiplication.
We now can decide which method we should use to multiply two numbers, be it by looking up the multiplication table, by repeated addition or by long multiplication.
An example question would be "What does the symbol × represent?".
An example question would be "Explain what the expression 2×3=6 means in your own words.".
An example question would be "How many apples would you have if you had 2 baskets with 3 apples in each?".
An example question would be "What is the next number in the sequence 21, 42, 63, __".
An example question would be "Which multiplication method would you use to multiply 128 by 64 and why?".
An example question would be "If all you have is the product of the sum and difference of two numbers and one of the numbers, how can you find the other number?".
Remembering type questions test the student having memorized data.
Understanding type questions test if the student has derived information from data.
Applying type questions test if the student has developed a useful knowledge from the information and if the knowledge can be readily used.
Analysis type questions test if the student has understood the basis of their knowledge and can derive new information from it.
Evaluating type questions test if the student has obtained any wisdom on the subject and hence can make sound judgement about it.
The last question type, creating, is not covered by the Wisdom hierarchy and perhaps it predicts yet another higher level form of cognition, perhaps called "creativity", which is when you use knowledge, understanding and wisdom together to derive new knowledge, understanding and wisdom, where knowledge provides the raw material to act on, understanding provides the ways to rearrange the knowledge and wisdom guides you into choosing a solution path which is most likely to give good results. Once this is done you will have learned from experience and would have added new knowledge, a deeper understanding of that knowledge together with new ways of using it and you would be able to make better judgement in the future.
Monday, September 19, 2011
A lousy tutorial to C# drag and drop
1. Create your source and target controls. In this case we're using 2 ListViews where the one on the right (called listView1) is the source of the drag and the one on the left (called listView2) is target of the drop. However note that the source of the drag can even be from the windows explorer by drag dropping a file into the control.
2. Set the AllowDrop property of the control which will receive the drop to true.
3. Set up an event in the control which will be dragged from, that senses that a drag has been initiated, such as the ItemDrag event or the DragLeave event and call the DoDragDrop method. The DoDragDrop method is to be passed the data to be transferred (the object you want the receiving control to get).
4. Set up the DragOver event in the control which will be dropped on, to check if the data being dragged over can be accepted and change the cursor to an invalid cursor picture if not. You can check the type of the data being dragged by using the GetDataPresent method.
5. Set up the DragDrop event in the control which will be dropped on, to actually do something with the data once it has been dropped. This event will only fire if the DragOver event did not set Effect to None.
And there you have it. A lousy tutorial. However, here's something worth mentioning:
If you are transferring an object which could be one of several types and you want to view the object as its base type, then this will not work, as the GetData method will just return null if you pass it a base class for a type. The only was I found to get around this was by created a proxy object which will contain the object of interest casted as its base class, and then what you receive the proxy object in the receiving control you just get the object of interest from it. Like so:
Sunday, July 17, 2011
Quicksort partitioning
Quicksort itself works by taking a list, picking an element from the list which is referred to as the "pivot" and splitting the list into two sub lists, the first containing all the elements smaller than the pivot and the second containing all the elements greater than the pivot. Once you have these two sub lists you can sort each one independently of the other since elements in one list will not be moved into the other after the sort is complete.
This splitting into two lists is called "partitioning". Partitioning once will not sort the list but it will allow you to either use a different sorting algorithm on each sub list (partition) or to recursively partition the two partitions until you end up with a partition of 1 or 0 elements, which is necessarily sorted.
For example, partitioning the list [7,3,6,4,1,7,3] using 4 as a pivot will give us a first partition of [3,1,3] and a second partition of [7,6,7]. The pivot itself, along with other duplicates of it, may or may not go in one of the partitions, depending on how the partitioning is done. If it does not go into one of the partitions, then the sort will place the pivot between the 2 partitions after they have been sorted.
The most intuitive way to partition is by creating 2 new lists, going through the unsorted list and copying elements from the unsorted list into one of the 2 lists. This is memory expensive however as you end up needing twice as much space as the unsorted list takes. The following partitioning algorithms are "in-place" and hence do not need any new lists.
This is the partitioning algorithm I was familiar with at school. It's quite intuitive but slow when compared to the next algorithm. The way this works is by putting the pivot into its sorted place, that is, the place where it will be after the whole list has been sorted. All the elements smaller than the pivot will be on its left and all the elements larger than the pivot will be on its right. Therefore you would have created 2 partitions, the left side of the pivot and the right side.
The algorithm uses a pivot pointer which keeps track of where the pivot is and an index pointer which is used to compare the pivot to other elements. The pivot pointer starts by being at the right end of the list (you can choose a pivot and swap it with the last element if you don't want to stick to the element which happens to be there) and the index pointer starts by being at the left end of the list. The index moves towards the pivot pointer until it encounters an element which is not on the correct side of the pivot, upon which the index and the pivot and swapped and the the index pointer and pivot pointer swap locations. Once the index pointer and pivot pointer meet, the pivot is in its sorted location and the left and right side of the pivot are partitions.
Pseudo code:
function partition(arr, left, right) pivotPtr = right indexPtr = left while pivotPtr != indexPtr if indexPtr < pivotPtr //if index pointer is to the left of the pivot while arr[indexPtr] <= arr[pivotPtr] and indexPtr < pivotPtr indexPtr++ //move index pointer towards the pivot if indexPtr < pivotPtr swap(arr[indexPtr], arr[pivotPtr]) swap(indexPtr, pivotPtr) else //if index pointer is to the right of the pivot while arr[indexPtr] >= arr[pivotPtr] and indexPtr > pivotPtr indexPtr-- //move index pointer towards the pivot if indexPtr > pivotPtr swap(arr[pivotPtr], arr[indexPtr]) swap(pivotPtr, indexPtr) return pivotPtr
In the previous partitioning algorithm, we had to constantly swap the pivot in order to eventually put it in its place. This is however unnecessary as partitioning does not require the pivot to be in its sorted place, only that we have 2 partitions, even if the pivot itself is in one of the partitions (it doesn't matter in which one as it could be eventually placed in its sorted place in either partition).
This time we will not care where the pivot is, as long as we know its value. We will need 2 pointers, a high and a low pointer, which will be moving towards each other. The low pointer will expect to encounter only elements which are smaller than the pivot and the high point will expect to encounter only elements which are larger than the pivot. When both pointers encounter a wrong element, they swap the elements and continue moving towards each other. When they eventually meet, all the elements to the left of the meeting point will be smaller than or equal to the pivot and all the elements to the right of the meeting point will be greater than or equal to the pivot.
Since both pointers will be moving toward each other before swapping, this algorithm will do less swaps than the previous one and hence will be much faster. In fact a simple experiment will show that it does half the number of swaps.
Pseudo code:
function partition(arr, left, right, pivot) lo = left hi = right while lo < hi while arr[lo] <= pivot and lo < hi lo++ //move low pointer towards the high pointer while arr[hi] >= pivot and hi > lo hi-- //move high pointer towards the low pointer if lo < hi swap(arr[lo], arr[hi]) /* Since the high pointer moves last, the meeting point should be on an element that is greater than the pivot, that is, the meeting point marks the start of the second partition. However if the pivot happens to be the maximum element, the meeting point will simply be the last element and hence will not have any significant meaning. Therefore we need to make sure that the returned meeting point is where the starting point of the second partition is, including if the second partition is empty. */ if arr[lo] < pivot return lo+1 else return lo
Thursday, June 2, 2011
Negation Introduction
P => Q, P => ~Q --------------- ~P
and which was used like this:
1. | P Assumed ... | ... 10. | Q 11. | P Assumed ... | ... 20. | ~Q 21. ~P Negation introduction from lines 1-10 and 11-20
In other words, if P implies that Q is true and also P implies that Q is not true, then P must itself not be true.
I never actually understood the logic behind it however. We were told that this how a proof by contradiction works. An example of a proof by contradiction, in words, is this:
If it were raining, the streets would be wet. But the streets are not wet. Therefore it is not raining.
The fact that if a particular proposition is true, it would lead to another proposition which we know is false, implies that the first proposition must itself be false.
The problem with the rule of inference we used for negation introduction is that the way I interpret it linguistically is not at all how a proof by contradiction is usually told. If we had to turn the rule of inference into words, it would be this:
If P implies that something is both true and false, then P is itself false.
A rule of inference which is closer to a proof by contradiction would be this:
Q, P => ~Q ---------- ~P
which would be used like this:
... 10. Q 11. | P Assumed ... | ... 20. | ~Q 21. ~P Negation introduction from lines 10 and 11-20
This is closer to a proof by contradiction.
If P implies something which is not true, then P itself is not true.
The question is, can we use this new rule of inference instead of the usual one?
When can the usual rule of inference be more powerful than the proposed one? Only when both Q and ~Q can ONLY be derived using P. Hence P must be assumed in both cases which cannot be done with the proposed rule of inference.
But this can easily be solved by using the lemma Q /\ ~Q => X and tautologies. If P can derive both Q and ~Q, then we can also derive Q /\ ~Q from which further derive anything we want, including something false. Worst case, if we don't know what to derive which is false, we can derive the negation of a tautology, such as ~(A => A). A => A is a tautology and we can always just insert it in a proof.
So using this trick, we'd do the following:
1. | P Assumed ... | ... 10. | Q ... | ... 20. | ~Q 21. | Q /\ ~Q Conjunction introduction from lines 10 and 20 22. | ~(A => A) Q /\ ~Q => X lemma from line 21 23. A => A A => A lemma 24. ~P Negation introduction from lines 1-22 and 23
And therefore, where ever we can use the usual rule of inference, we can also use the proposed rule of inference.
QED :)
Friday, April 22, 2011
Artificial Intelligence
Artificial Intelligence
Intelligence is the ability to do something without previously knowing how and use the new knowledge on other problems, or rather, to learn without being taught. Someone who is stupid and therefore not intelligent is someone who requires instructions in order to do (apparently) everything, much like a computer. So what is artificial intelligence? How can a computer simulate this process?
Well if this is possible then it would imply the end of programming, since essentially there would only be one program which will learn how to do anything the user wants. So basically an intelligent program has 3 phases: You tell it what you want it to do, it figures out how to do it and finally does it. The program can decide that it cannot do what it was told. However the most intelligent program is that which accepts the most specifications. It can also decide that it does not have the necessary resources to do what it was told. Again, the most intelligent program is the one which does the most things with the resources it has.
Learning requires external information. Therefore the program must be allowed to gather information about the problem and perhaps completed research by other sources. This is best accomplished via the Internet, although elicitation with the user is a must. The most intelligent program is that which makes best use of past knowledge and requiring least new knowledge (the ability to reuse knowledge). Of course if the program does not gather new information and relies too much on past knowledge it will end up being “closed minded” and this is not desirable as it will result in a closed region of knowledge with no new ideas.
Just like a computer is built without knowing what it’s going to be used for, so must be an intelligent program. A truly intelligent program would not be made for specific tasks such as recognising images or playing a game. It should be the most reusable application ever programmed. It should not be simply a part of another program, but be the entire program (except perhaps the interface).
So it seems that artificial intelligence means a program which translates specifications to solutions without the programmer knowing the solution. The key point here is that the programmer doesn’t know the solution (OK, no one in the development process knows it). This facilitates programming since the programmer need not know the solution before writing the program but rather leaves it to the program to find the solution. It’s also great when we as humans still have not found a solution (or an efficient one) to a particular problem.
What may the future hold for AI? As already mentioned, one day there will be a published algorithm for true AI which is able to learn how to solve any solvable problem, given a specification which has a high level of expressiveness. The first to benefit would be the humanoid robots which in turn would benefit humans in a variety of ways, which need not be mentioned. However will this mean the end of all human jobs, skilled and unskilled alike? Probably what will happen is that the developed countries will slow down the development of AI in the market in order to preserve jobs. But it might be possible for undeveloped countries to acquire some intelligent robots (through missionaries for example) which will help in some way or another. Perhaps in the developed world some hard to find professions will be filled in by robots, but I doubt they will be popular. However it can be possible that one day no one will work anymore and communism will take over as financial classes will be eradicated. All work will be done by robots and people will live a leisurely life, receiving provisions and resources equally. I doubt this will be allowed.
One application of AI I would pursue would be that of fabricating assignments. It would accept the specs of the assignment in raw format as given to the student, possibly with the addition of some course notes for reference, and the ID number of the student. The routine would understand the spec and using the ID number will generate a unique assignment with compiled code (if any) and documentation and comments. If I were to manage to realise such a routine, I would charge my fellow class mates to write their assignments, except that I wouldn’t do anything except feed the program the spec and ID number. Since every student will receive a unique assignment there will be no fear of plagiarism and detecting that the assignment was generated would be practically impossible unless the lecturers would obtain a copy of the routine and compare the given work with the generated one. Come to think of it, a uniquely generated id number would be better. The reason why AI makes sense to be used is because of the shear difficulty to find an algorithm which solves the problem.
Another application would be the semantic interpretation of a given code. The program would accept a given piece of code (given that it is accepted by the compiler) and produce a description in simple English what happens when it executes, at a given level of abstraction. Of course this can be extended to the interpretation of a given executable file. If this is possible then viruses and all malware would be easily detected with no need for updates. The other way round would also be nice where given a description in simple English, the program generates annotated code, ready to be compiled.
Thursday, April 21, 2011
JQuery event / manipulation not working
Make sure that all html DOM manipulation is done after the html loads by putting the javascript code inside a
$(document).ready(function() { event and manipulation stuff here :D });
Photoshop slices using divs creates gaps
This happened to me whilst doing my personal website (www.marctanti.com). The solution was found here. Apparently this happens because when the doctype of the html page is set to strict, images are by default set to display:inline which makes them vertically aligned so that the bottom of the image is in line with where the base line of the text is. In the case of sliced web pages, the slices are placed inside divs so the images are aligned with where the base line of text inside the div would be if there was any. But the divs make room for the "descenders" of text, that is, the extra room needed to display the bottom of the letters 'y' and 'g' for example. Therefore the images will not be aligned with the very bottom of the divs.
So to fix this problem we need to make the images aligned to the very bottom of the containing divs by setting the display css of each slice to block. I gave each slice image a class attribute called slice and then added the following css:
.slice { display:block; }
That fixed it.
Wednesday, March 30, 2011
Database Normalization (1-3 NF)
http://en.wikipedia.org/wiki/Database_normalization
This-
Id | Student | Subject 1 | Subject 2 |
---|---|---|---|
1 | Harry | Charms | Potions |
2 | Ron | Charms | Potions |
And this-
Id | Student | Subjects |
---|---|---|
1 | Harry | Charms, Potions |
2 | Ron | Charms, Potions |
Or as is shown in most examples, this-
Id | Student | Subject |
---|---|---|
1 | Harry | Charms |
Potions | ||
2 | Ron | Charms |
Potions |
In both cases we are tying to shove in a list of values, that is a one-to-many or many-to-many relationship with the row, into the row itself. For reasons detailed in the wikipedia article, this should be fixed by creating a separate row for each value and repeating the values in the others fields, like so:
Id | Student | Subject |
---|---|---|
1 | Harry | Charms |
1 | Harry | Potions |
2 | Ron | Charms |
2 | Ron | Potions |
The table is now in 1NF were it not for the primary key losing its uniqueness. The id field must be unique in order to identify each distinct student. To solve this we can do one of two things:
Either we create an additional key field for the subjects and make the primary key of the whole table be a composite key of both student and subject ids, thus making the composite key unique-
StudId | Student | SubjId | Subject |
---|---|---|---|
1 | Harry | 1 | Charms |
1 | Harry | 2 | Potions |
2 | Ron | 1 | Charms |
2 | Ron | 2 | Potions |
Or we could just prepare for the other normal forms and start splitting the tables now as is usually done in examples-
StudId | Student | SubjId |
---|---|---|
1 | Harry | 1 |
1 | Harry | 2 |
2 | Ron | 1 |
2 | Ron | 2 |
SubjId | Subject |
---|---|
1 | Charms |
2 | Potions |
Notice that we still need to make the foreign key in the students table part of the primary key in order to preserve uniqueness. If you're wondering why we didn't use a weak entity (bridge table / junction table), that's because it's done in 2NF.
If we had more than one multi-valued field, we'd just create a Cartesian product, like so:
StudId | Student | SubjId | Subject | TeacId | Teacher |
---|---|---|---|---|---|
1 | Harry | 1 | Charms | 1 | Filius |
1 | Harry | 2 | Potions | 2 | Slughorn |
1 | Harry | 2 | Potions | 3 | Snape |
2 | Ron | 1 | Charms | 1 | Filius |
2 | Ron | 2 | Potions | 2 | Slughorn |
2 | Ron | 2 | Potions | 3 | Snape |
Resulting in 3 tables:
StudId | Student | SubjId | TeacId |
---|---|---|---|
1 | Harry | 1 | 1 |
1 | Harry | 2 | 2 |
1 | Harry | 2 | 3 |
2 | Ron | 1 | 1 |
2 | Ron | 2 | 2 |
2 | Ron | 2 | 3 |
SubjId | Subject |
---|---|
1 | Charms |
2 | Potions |
TeacId | Teacher |
---|---|
1 | Filius |
2 | Slughorn |
3 | Snape |
Complete example:
Id | Student | SubjId | Subject | Room | RoomHall | TeacId | Teacher |
---|---|---|---|---|---|---|---|
1 | Harry | 1 | Charms | 101 | A | 1 | Filius |
2 | Potions | 202 | B | 2 | Slughorn | ||
3 | Snape | ||||||
2 | Ron | 1 | Charms | 101 | A | 1 | Filius |
2 | Potions | 202 | B | 2 | Slughorn | ||
3 | Snape |
Turns into:
StudId | Student | SubjId | Subject | Room | RoomHall | TeacId | Teacher |
---|---|---|---|---|---|---|---|
1 | Harry | 1 | Charms | 101 | A | 1 | Filius |
1 | Harry | 2 | Potions | 202 | B | 2 | Slughorn |
1 | Harry | 2 | Potions | 202 | B | 3 | Snape |
2 | Ron | 1 | Charms | 101 | A | 1 | Filius |
2 | Ron | 2 | Potions | 202 | B | 2 | Slughorn |
2 | Ron | 2 | Potions | 202 | B | 3 | Snape |
Notice that the room is assumed to be dependant on the subject such that each subject is taught in its own room.
We may opt to leave the table as it is as it quite complex to break down into smaller tables without any guidance. However if we are to break the table down, the room information would be included in with the subject table since we said that the room is dependant on it, yielding the following:
StudId | Student | SubjId | TeacId |
---|---|---|---|
1 | Harry | 1 | 1 |
1 | Harry | 2 | 2 |
1 | Harry | 2 | 3 |
2 | Ron | 1 | 1 |
2 | Ron | 2 | 2 |
2 | Ron | 2 | 3 |
SubjId | Subject | Room | RoomHall |
---|---|---|---|
1 | Charms | 101 | A |
2 | Potions | 202 | B |
TeacId | Teacher |
---|---|
1 | Filius |
2 | Slughorn |
3 | Snape |
StudId | SubjId |
---|---|
1 | 1 |
1 | 2 |
2 | 1 |
2 | 2 |
StudId | Student |
---|---|
1 | Harry |
2 | Ron |
SubjId | Subject |
---|---|
1 | Charms |
2 | Potions |
And there is it, the weak entity we are so used to in many to many relationships.
Complete example (following from previous):
StudId | Student | SubjId | TeacId |
---|---|---|---|
1 | Harry | 1 | 1 |
1 | Harry | 2 | 2 |
1 | Harry | 2 | 3 |
2 | Ron | 1 | 1 |
2 | Ron | 2 | 2 |
2 | Ron | 2 | 3 |
Turns into:
StudId | SubjId | TeacId |
---|---|---|
1 | 1 | 1 |
1 | 2 | 2 |
1 | 2 | 3 |
2 | 1 | 1 |
2 | 2 | 2 |
2 | 2 | 3 |
StudId | Student |
---|---|
1 | Harry |
2 | Ron |
Hence yielding:
StudId | SubjId | TeacId |
---|---|---|
1 | 1 | 1 |
1 | 2 | 2 |
1 | 2 | 3 |
2 | 1 | 1 |
2 | 2 | 2 |
2 | 2 | 3 |
StudId | Student |
---|---|
1 | Harry |
2 | Ron |
SubjId | Subject | Room | RoomHall |
---|---|---|---|
1 | Charms | 101 | A |
2 | Potions | 202 | B |
TeacId | Teacher |
---|---|
1 | Filius |
2 | Slughorn |
3 | Snape |
Notice that if in 1NF we did not break down the table, we'd result with the same set of tables by now.
SubjId | Subject | Room | RoomHall |
---|---|---|---|
1 | Charms | 101 | A |
2 | Potions | 202 | B |
The RoomHall field is directly dependent on the Room field and not on the SubjId primary key field, so the RoomHall field should go into a table on its own together with the Room field. In fact the room where the subject is thought is not a direct property of the subjects entity but is an entity on its own and hence should be separated into a rooms entity and only referenced by a foreign key.
SubjId | Subject | Room |
---|---|---|
1 | Charms | 101 |
2 | Potions | 202 |
Room | RoomHall |
---|---|
101 | A |
202 | B |
Complete example (following from previous):
SubjId | Subject | Room | RoomHall |
---|---|---|---|
1 | Charms | 101 | A |
2 | Potions | 202 | B |
Turns into:
SubjId | Subject | Room |
---|---|---|
1 | Charms | 101 |
2 | Potions | 202 |
Room | RoomHall |
---|---|
101 | A |
202 | B |
Hence yielding:
StudId | SubjId | TeacId |
---|---|---|
1 | 1 | 1 |
1 | 2 | 2 |
1 | 2 | 3 |
2 | 1 | 1 |
2 | 2 | 2 |
2 | 2 | 3 |
StudId | Student |
---|---|
1 | Harry |
2 | Ron |
SubjId | Subject | Room |
---|---|---|
1 | Charms | 101 |
2 | Potions | 202 |
Room | RoomHall |
---|---|
101 | A |
202 | B |
TeacId | Teacher |
---|---|
1 | Filius |
2 | Slughorn |
3 | Snape |
Monday, March 14, 2011
Reading Related Tables As Nested Rows
The naive way to do this is by using nested loops with a query for movies in the top loop and a query for actors in the second loop. In PHP it would look something like...
$moviesResult = mysql_query("SELECT id, title FROM movies"); while($moviesRow = mysql_fetch_assoc($result)) { echo("<h1>" . $moviesRow['title'] . "</h1>"); echo("<ul>"); $actorsResult = mysql_query("SELECT actors.name FROM actors INNER JOIN movies_actors ON actors.id = movies_actors.actorid WHERE movies_actors.movieid = " . $moviesRow['id'] . ";"); while($actorsRow = mysql_fetch_assoc($actorsResult)) { echo("<li>" . $actorsRow['name'] . "</li>"); } echo("</ul>"); }
This approach however will kill your database. Ideally you should minimize the number of queries sent.
Another approach would be to join the movies table with the actors table and then read it with nested loops.
$result = mysql_query("SELECT movies.id AS id movies.title AS title, actors.name AS actor FROM movies INNER JOIN movies_actors ON movies.id = movies_actors.movie INNER JOIN actors ON actors.id = movies_actors.actor"); $row = mysql_fetch_assoc($result); while($row) { $currMovieId = $row['id']; echo("<h1>" . $row['title'] . "</h1>"); echo("<ul>"); do { echo("<li>" . $row['name'] . "</li>"); } while ($row = mysql_fetch_assoc($result) && $row['id'] == $currMovieId); echo("</ul>"); }
This works but as soon as you add another table to the join, determining which rows contain new information can be a nightmare, not to mention all the rows which just contain repeated information due to the cartesian product. For example:
Id | Title | Actor | Genre |
---|---|---|---|
1 | The Matrix | Keanu Reeves | Action |
1 | The Matrix | Keanu Reeves | Adventure |
1 | The Matrix | Keanu Reeves | Sci-Fi |
1 | The Matrix | Laurence Fishburne | Action |
1 | The Matrix | Laurence Fishburne | Adventure |
1 | The Matrix | Laurence Fishburne | Sci-Fi |
1 | The Matrix | Carrie-Anne Moss | Action |
1 | The Matrix | Carrie-Anne Moss | Adventure |
1 | The Matrix | Carrie-Anne Moss | Sci-Fi |
2 | The Matrix Reloaded | Keanu Reeves | Action |
2 | The Matrix Reloaded | Keanu Reeves | Adventure |
2 | The Matrix Reloaded | Keanu Reeves | Sci-Fi |
2 | The Matrix Reloaded | Laurence Fishburne | Action |
2 | The Matrix Reloaded | Laurence Fishburne | Adventure |
2 | The Matrix Reloaded | Laurence Fishburne | Sci-Fi |
2 | The Matrix Reloaded | Carrie-Anne Moss | Action |
2 | The Matrix Reloaded | Carrie-Anne Moss | Adventure |
2 | The Matrix Reloaded | Carrie-Anne Moss | Sci-Fi |
3 | The Matrix Revolutions | Keanu Reeves | Action |
3 | The Matrix Revolutions | Keanu Reeves | Adventure |
3 | The Matrix Revolutions | Keanu Reeves | Sci-Fi |
3 | The Matrix Revolutions | Laurence Fishburne | Action |
3 | The Matrix Revolutions | Laurence Fishburne | Adventure |
3 | The Matrix Revolutions | Laurence Fishburne | Sci-Fi |
3 | The Matrix Revolutions | Carrie-Anne Moss | Action |
3 | The Matrix Revolutions | Carrie-Anne Moss | Adventure |
3 | The Matrix Revolutions | Carrie-Anne Moss | Sci-Fi |
Are we supposed to receive all those rows just to display the below?
1 | The Matrix | Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss | Action, Adventure, Sci-Fi |
2 | The Matrix Reloaded | Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss | Action, Adventure, Sci-Fi |
3 | The Matrix Revolutions | Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss | Action, Adventure, Sci-Fi |
$moviesResult = mysql_query("SELECT id, title FROM movies;"); $actorsResult = mysql_query("SELECT movies_actors.movieid AS movieid, actors.name AS name FROM actors INNER JOIN movies_actors ON actors.id = movies_actors.actorid;"); $genresResult = mysql_query("SELECT movies_genres.movieid AS movieid, genres.name AS name FROM genres INNER JOIN movies_genres ON genres.id = movies_genres.genreid;"); $actorRow = mysql_fetch_assoc($actorsResult); $genreRow = mysql_fetch_assoc($genresResult); while($movieRow = mysql_fetch_assoc($moviesResult)) { $currMovieId = $movieRow['id']; echo("<h1>" . $movieRow['title'] . "</h1>"); echo("<ul>"); while ($actorRow && $actorRow['movieid'] == $currMovieId) { echo("<li>" . $row['name'] . "</li>"); $actorRow = mysql_fetch_assoc($actorsResult); } echo("</ul>"); echo("<ul>"); while ($genreRow && $genreRow['movieid'] == $currMovieId) { echo("<li>" . $row['name'] . "</li>"); $genreRow = mysql_fetch_assoc($genresResult); } echo("</ul>"); }
In this way we only make 3 queries, equal to the number of tables, and we only read as many rows as needed.
Sunday, March 13, 2011
SQL Joins Tutorial
Now let's say that you want to query the database to view all the rows in the Orders table but with the customer's name and product's name instead of their primary key. If you learned SQL the same way I did, you'd probably do something like this:
SELECT orders.date, customers.name, products.name FROM orders, customers, products WHERE orders.customer = customers.id AND orders.product = products.id;
This is called an implicit join. In fact you are doing an unconditioned join between all 3 tables and then just filtering out the joined rows which are not made of related table rows. What does this mean?
If our 3 tables contain the following data:
Orders:
Id | Date | Customer | Product |
---|---|---|---|
1 | 01/01/01 | 1 | 2 |
2 | 2/02/02 | 3 | 2 |
3 | 03/03/03 | 3 | 3 |
Customers:
Id | Name |
---|---|
1 | John |
2 | Michael |
3 | Terry |
Products:
Id | Name |
---|---|
1 | Cheese |
2 | Parrot |
3 | Spam |
Then the query will cause this to happen:
Orders.Date | Customers.Name | Products.Name |
---|---|---|
01/01/01 | John | Cheese |
01/01/01 | John | Parrot |
01/01/01 | John | Spam |
01/01/01 | Michael | Cheese |
01/01/01 | Michael | Parrot |
01/01/01 | Michael | Spam |
01/01/01 | Terry | Cheese |
01/01/01 | Terry | Parrot |
01/01/01 | Terry | Spam |
02/02/02 | John | Cheese |
02/02/02 | John | Parrot |
02/02/02 | John | Spam |
02/02/02 | Michael | Cheese |
02/02/02 | Michael | Parrot |
02/02/02 | Michael | Spam |
02/02/02 | Terry | Cheese |
02/02/02 | Terry | Parrot |
02/02/02 | Terry | Spam |
03/03/03 | John | Cheese |
03/03/03 | John | Parrot |
03/03/03 | John | Spam |
03/03/03 | Michael | Cheese |
03/03/03 | Michael | Parrot |
03/03/03 | Michael | Spam |
03/03/03 | Terry | Cheese |
03/03/03 | Terry | Parrot |
03/03/03 | Terry | Spam |
And after generating all that it will then select the rows which make sense relation wise, according to the WHERE statement.
Orders.Date | Customers.Name | Products.Name |
---|---|---|
01/01/01 | John | Parrot |
02/02/02 | Terry | Parrot |
03/03/03 | Terry | Spam |
The big table is called a Cartesian product, which means that all possible combinations of rows between the tables are created, yielding a number of rows equal to the multiplication of the number of rows in each table (3 * 3 * 3 = 27 in this case). As you can tell, this will get really huge really quickly as the table get bigger.
Enter explicit joins.
The JOIN operator takes 2 tables and returns a new table, which is a joined version of the 2 tables. You can then join another table to that new table and keep on adding more tables to the "composite" table. There are several joins available in standard SQL which will be described later, but if we should use the INNER JOIN type in an example, this is how joins are used in general:
SELECT orders.date, customers.name, products.name FROM (orders INNER JOIN customers ON orders.customer = customers.id) INNER JOIN products ON orders.product = products.id;
As you can see, the join is used between tables and can even be bracketed in order to state which tables should be joined up first. The ON statement is used to immediately join up the rows which are related, avoiding the cartesian product table in the first place. This will therefore be processed much more efficiently as the third table will only be joined up after the first two tables have been joined correctly, rather than joining everything together without a condition.
In the implicit join example, the tables where joined up with an inner join without the ON condition, hence losing all the advantage of joins. Now that you know how joins work, let's see what are the differences between the 4 join types described in w3schools.com.
INNER JOIN:
Strictly follows the ON condition between tables such that any rows in one table which cannot be matched up with another row in the other table will be left out. This is the kind of join we are used to.
SELECT orders.date, customers.name, products.name FROM (orders INNER JOIN customers ON orders.customer = customers.id) INNER JOIN products ON orders.product = products.id;
results in
Orders.Date | Customers.Name | Products.Name |
---|---|---|
01/01/01 | John | Parrot |
02/02/02 | Terry | Parrot |
03/03/03 | Terry | Spam |
As you can see, not all customers or all products where mentioned. Only the ones which were mentioned in the ORDERS table and hence were related in some way were returned.
OUTER JOIN:
This is where things get a bit hairy and hence require that you make an effort to understand. The outer join makes sure that all rows in both tables are returned, even if there is no matching row in the other table. What happens to those tables which have no matching rows? They are joined to NULL values. As long as they are returned, it doesn't matter what they're joined to right?
You can also use LEFT JOIN and RIGHT JOIN to say whether you want to return all the rows of the table on the left of the join operator only or all the rows of the table on the right of the join operator only.
Now in order to use these joins to create a complex joined table, you have to be sure that a join will not return columns with NULL values which will be used in an ON condition of another join.
SELECT orders.date, customers.name FROM orders RIGHT JOIN customers ON orders.customer = customers.id;
results in
Orders.Date | Customers.Name |
---|---|
01/01/01 | John |
NULL | Michael |
02/02/02 | Terry |
03/03/03 | Terry |
SELECT orders.date, products.name FROM orders RIGHT JOIN products ON orders.product = products.id;
results in
Orders.Date | Customers.Name |
---|---|
NULL | cheese |
02/02/02 | parrot |
01/01/01 | parrot |
03/03/03 | spam |
SELECT orders.date, products.name, customers.name FROM (orders LEFT JOIN customers ON orders.customer = customers.id) RIGHT JOIN products ON orders.product = products.id;
results in
Orders.Date | Customers.Name | Products.Name |
---|---|---|
NULL | NULL | cheese |
02/02/02 | terry | parrot |
01/01/01 | john | parrot |
03/03/03 | terry | spam |
As you can see, depending on how the outer joins are used, we can make a particular table return all its rows, even if it isn't related to any other table.
First of all, what I said about the implicit join not being efficient next to an explicit one is not necessarily true since the SQL optimizer may fix it before executing it.
Secondly, if you can avoid using brackets in the the joins it would be better as that will allow the SQL optimizer to make adjustments to the query without it having to make sure to preserve the precedence which you unneccessarily imposed.
In a future post, I will describe a trick which I used to efficiently view multiple one-to-many related tables which I also had trouble with in SQL until recently. But for now I will leave you with these links:
http://www.w3schools.com/sql/sql_join.asp
http://onlamp.com/pub/a/onlamp/2004/09/30/from_clauses.html
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html