Say you're dating several partners and want to get into a serious relationship eventually. After how many partners can you have a good idea of what a good partner is like? This is called the secretary problem, and it can be formalised as follows:
Let n be the number of partners in total you can date.
Let Ai be the fitness of the ith partner, such that the order of the partners is random and we are interested in the partner with the maximum fitness.
Let r be the number of partners you date before you start looking for a serious relationship.
Once a partner has been evaluated, you cannot go back to them if you try the next partner.
What you're doing is going through all partners i starting from 1 up to r, at which point you will take maxri=1(Ai) as a reference of what a good partner looks like and continue going through the rest of the partners r to n and stop as soon as Ai is greater than the reference (maxri=1(Ai)). If none of the partners are better than the best of the first r−1 then you settle for the last partner.
The question is, what should r be such that the probability of picking the partner with maximum Ai is maximised.
The choice of r actually makes a significant difference, as we can show in the following simulation in Python 3 (in Python everything needs to be 0-based such that the first item is at position 0 rather than 1):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | import random
def select(A, r):
ref = max (A[:r])
for i in range (r, len (A)):
if A[i] > ref:
return i
return len (A) - 1
n = 20
for r in range ( 1 , n):
successes = 0
for _ in range ( 10000 ):
A = [random.random() for _ in range (n)]
best = max ( range (n), key = lambda i:A[i])
choice = select(A, r)
if choice = = best:
successes + = 1
print (r + 1 , successes / 10000 )
|
2 0.1746
3 0.2532
4 0.3099
5 0.3459
6 0.3581
7 0.3856
8 0.3927
9 0.3866
10 0.3717
11 0.366
12 0.3382
13 0.3227
14 0.2918
15 0.263
16 0.2228
17 0.1908
18 0.149
19 0.0969
20 0.0474
These are the fraction of times the best partner is selected out of 10000 simulations for each r from 2 to 20. You will notice that the fraction steadily increases up to a peak at 8 and then steadily decreases until the end. It's interesting that the peak is not in the middle of the list but closer to the beginning. How can we find what the best r would be for any n?
First, we need to find the probability of choosing the best partner for a given r and n. This probability can be expressed as two events happening together: you stopping at element i+1 and the maximum element being element i+1 (we're using i+1 instead of just i because it will be mathematically convenient later). These two events are independent and so we can find the probability of each one and then multiply them together.
The probability of the maximum element from all n elements being the i+1th element is 1n.
The other probability is a little trickier. To stop at element i+1, all the elements prior to i+1 have to be less than the reference we got from the first r elements. If a larger element is found sooner, we would have stopped there. Therefore, we are interested in the probability that the maximum element from the first i elements is among the first r elements, which is ri. This is assuming that i+1≥r+1, since we cannot choose any partner before the r+1th partner.
Multiplying these two probabilities, we get rn×i. But this is the probability for a given i+1, whereas we want the probability for any i+1. Each i+1 is mutually exclusive so we can add up the probability of each i+1. Given that i+1 must be greater than or equal to r+1 by definition,
P(r,n)=n∑i=rrn×i=rnn∑i=r1i
Great, so now we have the probability of making a successful selection given r and n. Now how do we find at what r is this maximum? Normally we would find the derivative of this equation with respect to r and see what r has to be for the derivative to be 0 (the turning point), but we can't differentiate this expression because r is discrete. We also cannot treat r as continuous because it is used as the starting point in the summation. But if we focus on finding rn rather than r, then we can do so in a round about way.
Start by multiplying 1n with the expression like this:
rnn∑i=rni1n
We can do this since n cannot be zero. Now what will this expression look like if we let n go to infinity?
limn→∞rnn∑i=rni1n
Let x=limn→∞rn and t=limn→∞in. In this case, the summation looks like a Reinmann sum, that is, it is adding together the areas of equally wide rectangular strips in a fixed region under a graph. As the width of the strips goes to zero, the total area of all the strips equals the area under the graph (in the fixed region), making it equal to the definite integral of the graph. In this case, the graph is f(t)=1t, the width of the strips is 1n, and the region is between rn and nn. This means the following:
limn→∞rnn∑i=rni1n=x∫1x1tdt
Great. So now we need to find the turning point of x∫1x1tdt with respect to x in order to know what rn gives the most probable success. This comes as follows:
ddx(x∫1x1tdt)=0ddxx(ln(1)−ln(x))=0ddx−xln(x)=0−ln(x)−1=0x=1e
Which says that the number of partners you should date before looking for a serious relationship is the total number of partners you will date divided by 1e, or about 37% of the number of people you'll go out with.