Say you're dating several partners and want to get into a serious relationship eventually. After how many partners can you have a good idea of what a good partner is like? This is called the secretary problem, and it can be formalised as follows:
Let be the number of partners in total you can date. Let be the fitness of the th partner, such that the order of the partners is random and we are interested in the partner with the maximum fitness. Let be the number of partners you date before you start looking for a serious relationship. Once a partner has been evaluated, you cannot go back to them if you try the next partner. What you're doing is going through all partners starting from 1 up to , at which point you will take as a reference of what a good partner looks like and continue going through the rest of the partners to and stop as soon as is greater than the reference (). If none of the partners are better than the best of the first then you settle for the last partner. The question is, what should be such that the probability of picking the partner with maximum is maximised.
The choice of actually makes a significant difference, as we can show in the following simulation in Python 3 (in Python everything needs to be 0-based such that the first item is at position 0 rather than 1):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | import random def select(A, r): ref = max (A[:r]) # first r items for i in range (r, len (A)): # rest of the items if A[i] > ref: return i return len (A) - 1 # return last item if search failed n = 20 for r in range ( 1 , n): # first item not included successes = 0 for _ in range ( 10000 ): A = [random.random() for _ in range (n)] best = max ( range (n), key = lambda i:A[i]) choice = select(A, r) if choice = = best: successes + = 1 print (r + 1 , successes / 10000 ) |
2 0.1746 3 0.2532 4 0.3099 5 0.3459 6 0.3581 7 0.3856 8 0.3927 9 0.3866 10 0.3717 11 0.366 12 0.3382 13 0.3227 14 0.2918 15 0.263 16 0.2228 17 0.1908 18 0.149 19 0.0969 20 0.0474
These are the fraction of times the best partner is selected out of 10000 simulations for each from 2 to 20. You will notice that the fraction steadily increases up to a peak at 8 and then steadily decreases until the end. It's interesting that the peak is not in the middle of the list but closer to the beginning. How can we find what the best would be for any ?
First, we need to find the probability of choosing the best partner for a given and . This probability can be expressed as two events happening together: you stopping at element and the maximum element being element (we're using instead of just because it will be mathematically convenient later). These two events are independent and so we can find the probability of each one and then multiply them together.
The probability of the maximum element from all elements being the th element is .
The other probability is a little trickier. To stop at element , all the elements prior to have to be less than the reference we got from the first elements. If a larger element is found sooner, we would have stopped there. Therefore, we are interested in the probability that the maximum element from the first elements is among the first elements, which is . This is assuming that , since we cannot choose any partner before the th partner.
Multiplying these two probabilities, we get . But this is the probability for a given , whereas we want the probability for any . Each is mutually exclusive so we can add up the probability of each . Given that must be greater than or equal to by definition,
Great, so now we have the probability of making a successful selection given and . Now how do we find at what is this maximum? Normally we would find the derivative of this equation with respect to and see what has to be for the derivative to be 0 (the turning point), but we can't differentiate this expression because is discrete. We also cannot treat as continuous because it is used as the starting point in the summation. But if we focus on finding rather than , then we can do so in a round about way.
Start by multiplying with the expression like this:
We can do this since cannot be zero. Now what will this expression look like if we let go to infinity?
Let and . In this case, the summation looks like a Reinmann sum, that is, it is adding together the areas of equally wide rectangular strips in a fixed region under a graph. As the width of the strips goes to zero, the total area of all the strips equals the area under the graph (in the fixed region), making it equal to the definite integral of the graph. In this case, the graph is , the width of the strips is , and the region is between and . This means the following:
Great. So now we need to find the turning point of with respect to in order to know what gives the most probable success. This comes as follows:
Which says that the number of partners you should date before looking for a serious relationship is the total number of partners you will date divided by , or about 37% of the number of people you'll go out with.