I don't know what possessed me to do this in the wee hours this morning when I couldn't sleep,
but I thought to construct a bell curve from first principles. I had seen
something like this as a Java Applet in 1990s. In that applet, a ball
was dropped onto a grid of pegs. When it hit a peg, it could go either
left or right, with each possibility equally likely. Perhaps it did
this at a height of 20, meaning the ball would hit 20 pegs before coming
to rest. Now do this ball after ball and look at the distribution of
where the balls end up. Most will be near the center. A few will be
well to the left and a few will be well to the right. The distribution
should look like a bell curve.
I did this here in Excel
but I changed things a little. When I did this exactly as described above, I discovered to my chagrin that there were no observations at odd number positions. So I could have done the above with half steps instead of whole ones, but instead I did the following. At each juncture the ball could go left
or right as before, but it also could stay where it is, a third
possibility. In this simulation, left is a value of -1, right is a
value of +1, and staying where it is gives a value of 0. To get these
outcomes equally likely (assuming that RAND() produces a uniform random
variable on the interval [0,1] and different realizations are independent of one another) I generated a realization of RAND() for each ball and each peg it hit.
On
a per ball basis this is done 20 times (with the results in columns AC
to AV. Each ball is a row. I then translated these realizations into left, right or stay put. The translations are in columns C to V. (If you zoom out to 50% view, you can see both realizations and translations in one view.) For example, in cell F44 there is the formula:
=IF(AF44<1/3,-1,IF(AF44<2/3,0,1))1>
There
are 4000 rows of data. The first such row begins in row 44. There are thus 80,000 realizations of RAND() generated (4000 rows times 20 entries per row).
In rows 2
through 42, I record the positions (in column A) and the number of
balls at the position (in column B). To get the number at a position I use
the COUNTIF function. As a little check I add up all the balls at each
position. This is in cell D24. Sure enough, it gives a value of 4000,
as it should.
I then graphed the positions and number
of balls with a graph that connects the points. (Tufte might not like
that but it's good enough for government work.) You can also just eyeball the results.
Finally,
I put in a Spin Button. Each time you change the value with the Spin
Button, it should produce a new simulation. By doing that several
times, it will give you a sense of what is idiosyncratic to that
particular simulation and what seems robust from one simulation to the
next. This is the benefit of doing the exercise in Excel.
It does seem to produce a near Bell Curve. That is
good. It also produces something different each time, which is also good. There is variation in the sample sum and sample mean from one sample to the next.
When I 'learned' probability and statistics, back in the 1970s, it was all from textbooks and all about theory. There was nothing about data. It seems to me simulations such as this would serve as a bridge between the two. I believe that nowadays most statistics textbooks come with simulation software of some sort. That's a plus. But it would be better still if students could build the simulations themselves. They can with this Excel simulation.
No comments:
Post a Comment