Intuitively, if a coin flip has a probability of 1/2 of turning out heads, and we flipped the coin 100 times, we expect that 1/2 of those 100 flips will be heads. What is meant by "expect" is that if we do this 100 coin flip experiment for many times, count the number of times it turns out heads for each 100 flip trial, and take the average of these counts, the average will be close to 1/2 of 100. Furthermore, the more 100 flip trials we include in our average, the closer the average will be 1/2 of 100.
If this were the case, then a probability can be treated as an average proportion, because if a probability of something happening is, say, 1/100, then after 1000 attempts we should find that, on average, 1/100 of those 1000 attempts would be the thing happening. In general, if the probability of an outcome is "p", and "n" attempts are made, then we should have "pn" positive outcomes. That probability is acting as a proportion of the average number of attempts made which will result in a positive outcome out of the attempts made. In fact, semantically speaking, the phrase "This outcome occurs with probability 1/100" and the phrase "This outcome occurs once every 100 times" are identical.
A simple proof of this is in the way we estimate the probability of an outcome. We attempt to produce the outcome (such as a coin flip resulting in heads) for a number of times "n", count the number of times "x" the outcome is positive (heads), and then just find x/n. But in order for this probability to be reliable, the quotient must remain constant for different values of "n" (the value "x" will change according to "n" to keep x/n equal). Given this statement, if we know a reliable probability x/n, and have performed the experiment "m" times, then the number of positive outcomes "y" can be predicted as follows:
For x/n to be reliable, x/n = y/m
Therefore, y = m(y/m) = m(x/n)
That is, since x/n is known and "m" is known, "y" can be found using those two values only.
Of course this is not a rigorous proof. To get a rigorous proof we need to turn to a field of probability called expected value. The expected value of a random variable (such as a coin flip) is the average of the values (assumed to be numerical) of the outcomes after a large number of trials. It is defined as the sum of each outcome multiplied by its probability. For example, the expected value of the value on a die is
1*1/6 + 2*1/6 + 3*1/6 + 4*1/6 + 5*1/6 + 6*1/6
because for each outcome from 1 to 6, the probability is 1/6.
In general, if the probability of outcome "o_i" is "p_i", then the expected outcome is
sum(o_i*p_i for all i)
But this isn't useful for proving the statement in the title. The proof is in this Math Exchange answer which explains that the expected number of positive outcomes out of "n" attempts, given that the probability of each outcome each time is "p", is "pn". It goes like this:
Let the random variable "U_i" be the outcome of the "i"th attempt (heads or tails). If the outcome is positive (heads), "U_i" is 1, otherwise it is 0. Given "n" attempts, the number of positive outcomes is
U_1 + U_2 + U_3 + ... + U_n
Call this actual number of positive outcomes "X", that is
X = U_1 + U_2 + U_3 + ... + U_n
The expected value of "X", written as E(X) is
E(X) = E(U_1 + U_2 + U_3 + ... + U_n)
Since the expected value is a linear operator,
E(X) = E(U_1) + E(U_2) + E(U_3) + ... + E(U_n)
Now, given the above definition of what an expected value is,
E(U_i) = 1*(probability of U_i = 1) + 0*(probability of U_i = 0)
If the probability of "U_i" being 1 is "p_i", then
E(U_i) = p_i
But for all "i", the probability of "U_i" is the same. That is
E(U_i) = p
So that means that
E(X) = p + p + p + ... + p
E(X) = pn
And there we have it, the expected number of positive outcomes out of "n" attempts, each of which has a probability of "p", is "pn", which means that the probability "p" can be treated exactly as if it was the proportion of positive outcomes out of a number of trials.