POISSON

BRIEF ABSTRACT

A number of different data sets are examined to see whether they can be considered random, i.e. whether they satisfy the hypotheses of the Poisson process. These range from V-2 rocket hits on London in World War II to no-hitters per season in baseball.

GENERAL INFORMATION

FileName: POISSON
Full title: Bombs, Baseball, e, and the Poisson Distribution
Developer: Brian J. Winkel, Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute IN 47803 USA.
Contact: Brian J. Winkel, Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute IN 47803 USA. Phone: 812-877-8412. Email: winkel@rose-hulman.edu. FAX: 812-877-3198.
Support: The production of this material is supported by the National Science Foundation under Division of Undergraduate Education grant DUE-9352849: Development Site for Complex, Technology-Based Problems in Calculus with Applications in Science and Engineering and the Arvin Foundation of Columbus IN.

STATEMENT OF PROBLEM

BACKGROUND: Introductory comments about probability and the Poisson distribution.

Suppose we are randomly throwing darts at a 10 cm by 10 cm dart board with a 1 cm grid placed over it. If we throw 1000 darts at this grid, how many darts do we expect to get in each of the 1 cm^2 squares in the grid? Does 1000 darts/100 cm^2 = 10 darts/cm^2 sound right to you?

It would be too much to expect that each 1 cm^2 square would contain exactly 10 darts. One would expect some squares to have more than 10 darts and some to have fewer. Mathematicians have ascertained a mathematical model which is quite good at describing the distribution of the darts IF the throwing is truly random. One important assumption of the mathematician's model of such phenomena is that

    Each dart is equally likely to hit each square.

After some analysis involving differential equations and probability theory, this assumption leads to a model we discuss below.
But first: explain why data taken from boards used in the World Championship Pub Dart Throwing Competition would not be expected to satisfy this important assumption in the mathematician's model.

Back to our situation. Again, what is the average number of darts to hit a 1 cm^2 square? Does 1000 darts/100 cm^2 = 10 darts/cm^2 still sound right to you? Let us set a = 10 and refer to this as the average number of darts per 1 cm^2 square. Then the mathematicians have determined that, if the average number of darts per square is a, the fraction (a decimal value between 0.0 and 1.0) of the squares which have exactly k darts in them is given by the formula:

    p_k = (e^(-a) a^k)/k!

This is called the Poisson Distribution. [N.B. Recall what e is. And k! = k(k-1)(k-2)...(3)(2)(1) is k factorial.]

Another way of saying this is that the probability of any given square having exactly k darts in it is p_k. Notice this does not say that a fraction p_k of the squares WILL have exactly k darts; rather, it says that the chance that any given square has exactly k darts is p_k. (Probability 1 means the event is certain to occur and probability 0 means the event is certain not to occur.)

We give this as a Mathematica formula. We know a = 10 here, so in this case we have

    P[k_] := Exp[-10] 10^k/k!

Thus, to find out what fraction of the squares should have, say, k = 12 darts, we simply evaluate the function P[k] at k = 12.

    P[12]//N
    0.0947803

This means that almost 9.5 percent of the squares should have exactly k = 12 darts in them. Or again, it means that about 0.095 is the probability that a given square has exactly k = 12 darts in it. How many squares is this? Simply (0.0947803)(100) = 9.48, or about 9 squares should have exactly 12 darts in them if the darts are thrown randomly.

Now, on the assumption that the darts are randomly thrown and the mathematicians have done their work well, let us determine the fraction of squares in which we should expect to have exactly k = 0,1,2,...,20 darts.
We shall put this information into a table called Ptable and then plot these values. Here we go. Place the pointer on the cell and click to activate the cell. Then press the Enter key to execute the Mathematica command.

    Ptable = Table[N[P[i]],{i,0,20}]
    {0.0000453999, 0.000453999, 0.00227, 0.00756665, 0.0189166, 0.0378333, 0.0630555,
     0.0900792, 0.112599, 0.12511, 0.12511, 0.113736, 0.0947803, 0.0729079, 0.0520771,
     0.0347181, 0.0216988, 0.012764, 0.00709111, 0.00373216, 0.00186608}

Look at the entries in this table. Do you see our P[12] from above? What does the 0.00227 (the third number in the list) signify? What does the 0.0520771 (the 15th number in the list) signify?

Now let us plot this set of values. We shall plot P[k] (remember the table of values was called Ptable) on the vertical axis and k on the horizontal axis.

    ListPlot[Ptable,PlotStyle->{PointSize[.02]}]

What kinds of things do you notice about this plot? And what does this mean for the dart board scenario above? List them here.

So to test if some phenomenon is random, at least in the sense that it satisfies the Poisson distribution we have been discussing, we simply need to determine the average number of observations per unit (of time or area or whatever), call it a, compute the values of p_k = (e^(-a) a^k)/k!, and multiply the total number of units observed (squares, time intervals) by p_k to estimate what the Poisson model predicts as the number of units with exactly k observations. Then compare this with the original data. If the comparison is favorable (you decide what you will consider favorable) then we can say the phenomenon is random; otherwise it is not random, at least by Poisson standards.

We now offer you several interesting examples, some of them documented, of how the Poisson model may be applied to determine if a phenomenon is random.

REMEMBER: Mathematicians have found the Poisson Distribution to be an appropriate model for randomness. In what sense do we mean this?
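The dart board scenario itself can be tried out by simulation. What follows is only a sketch and not part of the original notebook: it assumes a current version of Mathematica, and the names hits, dartsPerSquare, observed and expected (and the seed value) are illustrative choices. Each simulated dart is equally likely to land in any of the 100 squares, which is exactly the key assumption.

```mathematica
(* Throw 1000 darts; each dart picks one of the 100 squares with equal chance. *)
SeedRandom[1];                                   (* arbitrary seed, only for repeatability *)
hits = RandomInteger[{1, 100}, 1000];            (* index of the square hit by each dart *)

(* Count the darts that landed in each of the 100 squares. *)
dartsPerSquare = BinCounts[hits, {1, 101, 1}];

(* Count how many squares received exactly k darts, for k = 0, 1, ..., 20.
   Squares with more than 20 darts, if any, are not counted. *)
observed = BinCounts[dartsPerSquare, {0, 21, 1}];

(* Poisson prediction with a = 10: number of squares expected to hold k darts. *)
expected = Table[100 N[Exp[-10] 10^k/k!], {k, 0, 20}];

TableForm[Transpose[{Range[0, 20], observed, expected}],
  TableHeadings -> {None, {"k", "simulated", "Poisson"}}]
```

Changing the seed gives a different set of throws; the simulated column should wander around the Poisson column without ever matching it exactly.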
Again we remind you of the important assumption upon which this model is constructed:

    Each dart is equally likely to hit each square.

Let us consider more contexts in which this model applies (and some in which it does not apply).

Problems:

1. Traffic Model

Consider the data we have recorded:

    # vehicles per minute (k)             0   1   2   3   4   5   6   7   8   9
    ------------------------------------------------------------------------------
    # minute intervals with k vehicles    2   8  15  21  19  15  10   7   1   2

There were 0(2) + 1(8) + 2(15) + 3(21) + 4(19) + 5(15) + 6(10) + 7(7) + 8(1) + 9(2) = 387 vehicles altogether. Note the units in this sum:

    (vehicles/minute) x minutes = vehicles.

Thus in the 100 one minute intervals we average 387/100 = 3.87 vehicles per minute. Can you justify the claim that there are on average 3.87 vehicles per minute in our observation data?

Thus, in attempting to model this phenomenon of traffic flow past our observation point, we ask the question the mathematicians ask: Is each interval of time equally likely to have a vehicle pass? This may be a subtle question. Consider an observation post some 100 meters down the highway from a stop light. Do you think that in each minute of our 100 minute observation period it is equally likely that a vehicle will pass? Or consider a street outside a meat packing plant which we are observing from 4:00 PM through 5:40 PM. Hmmmmmmm!

But let us assume we are on a stretch of highway far outside of town at 9:00 AM and we take our data through 10:40 AM, 100 minutes worth. We wish to see how well the Poisson model fares in this situation. This means we need to determine how well the model will predict the data we observed.

We examine the Poisson model, again using the result which the mathematicians have determined for the fraction of those one minute intervals which have k vehicles passing the observation point:

    p_k = (e^(-a) a^k)/k!
where a is the average number of observations (darts, vehicles) per unit (square, minute interval respectively). Recall we have determined that the average number of vehicles per minute is a = 3.87. Let us clear out the P[k_] function with a = 10 and use our new a = 3.87.

    P[k_] = Exp[-3.87] (3.87)^k/k!
    (0.0208584 3.87^k)/k!

Let us evaluate this for a number of values of k.

    TrialTable = Table[N[P[i]],{i,0,9}]
    {0.0208584, 0.0807219, 0.156197, 0.201494, 0.194945, 0.150888, 0.0973226,
     0.0538055, 0.0260284, 0.0111922}

Recall the meaning of each of these numbers. P[2] = 0.156197 is the fraction of the one minute time intervals in which we would expect to observe exactly 2 vehicles. In our observations of 100 minutes, in how many of these minute intervals should we expect to see 2 vehicles? Need help? Guess where it is!

We should expect to see 100 P[2] = 100 (0.156197) = 15.6197, or between 15 and 16, one minute time intervals in which we see 2 vehicles pass.

Recall the observation data we have:

    # vehicles per minute (k)             0   1   2   3   4   5   6   7   8   9
    ------------------------------------------------------------------------------
    # minute intervals with k vehicles    2   8  15  21  19  15  10   7   1   2

How good was our model in predicting the observed data? If you did not get the answer to the question posed before the data above, then open the cell above the data to see our answer. Not bad! We predicted (did you also get the same?) somewhere between 15 and 16 intervals in which 2 vehicles were observed, and we observed 15.

Now let us fill in one table (ActualTable) with the actual data and one table (TheoryTable) with the theoretical data. Then we shall plot them and see how close they look. If they look close, then we shall say the Poisson model is a good predictor of traffic observation. If they do not look close, we shall say the Poisson model is a lousy predictor of traffic observation. Here we go.
First the Actual Table:

    ActualTable = List[2, 8, 15, 21, 19, 15, 10, 7, 1, 2]
    {2, 8, 15, 21, 19, 15, 10, 7, 1, 2}

And now the TheoryTable:

    TheoryTable = Table[100*N[P[i]],{i,0,9}]
    {2.08584, 8.07219, 15.6197, 20.1494, 19.4945, 15.0888, 9.73226, 5.38055, 2.60284, 1.11922}

Let's go for the plot of both actual data and theoretical data. Ready, set, go! First the Actual Data plot:

    q1 = ListPlot[ActualTable,PlotRange->{{0,10},{0,25}}]

Then the Theoretical Data plot:

    q2 = ListPlot[TheoryTable,PlotRange->{{0,10},{0,25}}]

We examine them on the same axes:

    Show[q1,q2]

Now make any comments and conclusions on how these two plots compare and how the two data sets compare. Be sure to say if you believe the Poisson is a good model for predicting this data. If it is, say why; if it is not, say why not.

2. Couch Potato Data

We consider Dan CouchPotato who sits in front of the TV. His only exercise is to switch channels. We have been observing him for 180 minutes and we have found the following data for the number of one minute intervals in which he exercises k channel switches:

    # switches per minute (k)                   0   1   2   3   4   5   6   7
    ----------------------------------------------------------------------------
    # one minute intervals with k switches     13  12  15  42  54  32   8   4

It is now your turn to
(1) Determine the average number of channel switches per minute; call this a.
(2) Determine a theoretical Poisson Q[k_] model for predicting the number of one minute intervals in which there are exactly k channel switches.
(3) Compare the theoretical and observed data.
(4) Discuss whether or not (and why) this Poisson model is a good model for predicting the observations. Remember that underlying assumption those mathematicians made.
Go to it!

3. V-2 Rocket hits on London in World War II - A Real Application!
During World War II, London was assaulted with German flying-bombs and V-2 rockets. The British were interested in whether or not the Germans could actually target their bomb hits or were limited to random hits with their flying-bombs. R. D. Clarke in his article, An Application of the Poisson Distribution, which appeared in the JOURNAL OF THE INSTITUTE OF ACTUARIES Vol 72 (1946), p. 481, shows the analysis which led the British to determine whether or not the Germans could target their bombs or were merely limited to random hits.

Before we turn the analysis over to YOU, it should be noted that this analysis is very important. For if the Germans could only randomly hit targets, then deployment throughout the countryside of various security installations would serve quite well to protect them, as random bombing over a wide range was unlikely to hit a given target. However, if the Germans could actually target their flying-bombs, then the British were faced with a more potent opponent and deployment of security installations would do little to protect them.

The British mapped off the central 12 km by 12 km region of London into 576 square areas, each 1/2 km by 1/2 km. Then they recorded the number of bomb hits, noting their location, and this data is in the following table:

    # bomb hits per area (k)       0    1    2    3    4    5 and over
    --------------------------------------------------------------------
    # areas with k bomb hits     229  211   93   35    7    1

Imagine that you are a young Lieutenant in His Majesty's Service. You are charged with ascertaining if the British are up against an adversary who can target their flying-bombs or one who can only randomly toss these bombs at London. Use the Poisson analysis approach offered above to make this decision. Write up your conclusions: show your analysis and defend your conclusion.

4. No-hitters in Baseball - random events?

Consider the phenomenon of no-hitters in baseball.
A no-hitter for a pitcher is a 9 inning game in which the pitcher allows no hits! They are rare, but are they randomly distributed? Consider the # no-hitters (k) per season, and the # seasons with k no-hitters. Using the Poisson analysis offered above, ascertain if no-hitters are randomly distributed among baseball seasons. Offer your analysis and defend your conclusions.

The following data relates to the number of no-hitters per season for Major League Baseball from the years 1876-1989, some 114 years of professional baseball history.

    # no-hitters per season (k)      0   1   2   3   4   5   6   7   8+
    ----------------------------------------------------------------------
    # seasons with k no-hitters     26  31  23  19  10   3   1   1   1

[Reference: THE SPORTING NEWS COMPLETE BASEBALL RECORD BOOK. 1990. The Sporting News: St. Louis MO. pp. 154-155.]

5. Radioactive disintegration

Lord Ernest Rutherford, the famous British physicist who worked in the early part of the twentieth century, was detecting radioactive disintegrations in his laboratory. His results are reported in his book (Rutherford, Chadwick, and Ellis. RADIATION FROM RADIOACTIVE SUBSTANCES. Cambridge ENGLAND. 1920. p. 172) and later analyzed in H. Cramer's book (MATHEMATICAL METHODS OF STATISTICS. Princeton NJ. 1945. p. 436). Basically Rutherford took N = 2608 time intervals of 7.5 seconds each and counted the number of particles in each interval which reached a counter. His data is presented below.

    # particles counted per time interval (k)      0    1    2    3    4    5    6    7    8    9   10
    ----------------------------------------------------------------------------------------------------
    # time intervals with k particles counted     57  203  383  525  532  408  273  139   45   27   16

From this data, can you consider that radioactive disintegration is a random process? Write up your opinion. Defend your conclusion using the Poisson model approach.

6. Project Ideas

(a) Is the distribution of vowels random in random typing?
Ask a buddy to type out 100 groups of 5 letters each in your word processor. Then use the Poisson model analysis technique to ascertain if the distribution of vowels is random. Offer some reasons to defend your conclusion. Examine your keyboard. Question the subject's typing ability.
(b) Consider the typographical errors in a newspaper. Take 100 lines of columnar newspaper copy and count the number of lines with k typos, k = 0,1,2,.... Are typos randomly distributed?
(c) Set up an observation station near the entrance to the student dining area and count the number of arrivals per 30 second interval. Are such arrivals random?
(d) Get a table of random numbers and use a data sampling method and the Poisson model to ascertain if these digits are random.
(e) Are the shots taken in a basketball game random? Divide the game (actual clock playing time) into 30 second intervals and use the Poisson analysis.
(f) Determine if the number of presidential appointments (per four year presidential term) to the US Supreme Court is random.
(g) Epidemiologists and medical geographers are interested in the occurrence of disease, and one can study whether outbreaks are random across a geographical region. For example, are cancers randomly distributed in the county or is there some undue bunching, and what is this bunching of cancers due to?
(h) Is rain random? Does the number of days of rain per week follow a Poisson model?
(i) Are red cars randomly parked in the student or faculty lot? Set up a grid and count 'em.

Conclusion - Our last word!

Throughout this activity we have attempted to convey one view of random distribution of phenomena in space and time. As you can see, random does permit "bunching" and does not imply uniformity or "flat" distribution. Indeed, bunching is a sign of randomness.
We have seen that the Poisson model predicts some phenomena quite well (because the given phenomenon adheres to the underlying assumption the mathematicians make in the Poisson model: observations of the phenomenon are equally likely in each observation period), and in other cases it does not, because this assumption is not satisfied.

KEYWORDS

Poisson distribution, e, factorial, probability, average or mean, data plotting.

TEACHER NOTES

ISSUES RELATED TO THE PROBLEM

Prerequisites

Students do not have to know about probability to succeed in this effort. They just need to determine the average value of a set of weighted data and compute p_k = (e^(-a) a^k)/k! where a is that average value and k = 0, 1, 2, 3, . . .

Time allotment - time management

This exercise can be done in half of a period with proper teacher introduction and illustration, or it can be assigned as a homework assignment over a few nights. Indeed, the analysis can be performed with a calculator and graph paper.

Expectations

Expect the students to compute the theoretical values of expected distributions based on the average number of phenomena per unit and the Poisson distribution, and then simply to compare the two data plots. Students will discuss the underlying assumptions, especially if they do the couch potato problem or collect data on local campus traffic, e.g. # students per minute leaving the library front entrance. We have found them intrigued that so many different phenomena seem to fit the Poisson definition of random, and the students enjoy discussing and speculating why these particular phenomena satisfy the underlying model assumptions.

Future payoffs

When students see the Poisson distribution in statistics courses or use it in ecology courses (for example to establish the randomness of a species distribution in an environment) they will be familiar with the notion.
Students will see some non-deterministic data collection activity in their calculus coursework and be required to make an educated guess, based on a mathematical model, as to whether or not the data really is random, albeit only by comparing plotted data from the model with the actual data.

Extensions

One could introduce the Chi-square model to affirm the validity of the claims the students make, but we have found on this first go round that eyeballing the theoretical data as compared to the actual data is sufficient for students to surmise whether or not the phenomenon is random. Further, one could compare the Poisson model to the normal and the binomial distribution for large values and lead into more and more statistics.

References and Sources

POSSIBLE SOLUTION(S)

Problems: We offer up solutions of two of the problems, one which is not Poisson and one which is Poisson.

2. We consider Dan CouchPotato who sits in front of the TV. His only exercise is to switch channels. We have been observing him for 180 minutes and we have found the following data for the number of one minute intervals in which he exercises k channel switches.

    # switches per minute (k)                   0   1   2   3   4   5   6   7
    ----------------------------------------------------------------------------
    # one minute intervals with k switches     13  12  15  42  54  32   8   4

To determine the average number of channel switches per minute, a, we compute the following weighted average:

    a = (0*13 + 1*12 + 2*15 + 3*42 + 4*54 + 5*32 + 6*8 + 7*4)/
        (13 + 12 + 15 + 42 + 54 + 32 + 8 + 4)//N
    3.44444

Computed below is the amount of time Dan was observed, not the number of switches. We note that the total number of one minute intervals observed is 180 (the total number of channel switches is 620).

    total = (13 + 12 + 15 + 42 + 54 + 32 + 8 + 4);

Thus our Poisson model for the probability of k switches per minute is given by the Poisson distribution formula.

    P[k_] := Exp[-a] a^k/k!

Thus on average there are 3.44444 channel switches per minute.
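As a check on the arithmetic, the weighted average can be split into its numerator and denominator. The names counts, switches and intervals in this short sketch are illustrative and do not appear in the original notebook.

```mathematica
(* counts[[i]] is the number of one minute intervals with k = i - 1 switches. *)
counts = {13, 12, 15, 42, 54, 32, 8, 4};

switches = Range[0, 7] . counts    (* total channel switches: 620 *)
intervals = Total[counts]          (* total one minute intervals: 180 *)

N[switches/intervals]              (* average switches per minute, the value of a *)
```

The numerator counts events (channel switches) and the denominator counts units (one minute intervals); it is the number of units, not the number of events, that multiplies each p_k when the expected counts are formed.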
We compute a table of theoretical values and compare these with the observed data to see if channel switching is really random.

    theorydata = Table[{k,P[k] total},{k,0,7}]
    {{0, 5.74605}, {1, 19.7919}, {2, 34.0861}, {3, 39.1359}, {4, 33.7004},
     {5, 23.2158}, {6, 13.3276}, {7, 6.55802}}

    theory = ListPlot[theorydata,PlotStyle->{PointSize[.02]}]

    actualdata = {{0,13},{1,12},{2,15},{3,42},{4,54},{5,32},
                  {6,8},{7,4}};

    actual = ListPlot[actualdata,PlotStyle->{PointSize[.02]}]

    Show[theory,actual]

It would appear that the theoretical data and the observed data do not match very well, and we conclude that switching channels is not a random phenomenon. This is not surprising, for most channel switching occurs during commercials. Hence each interval is not equally likely to experience a channel switch, so this phenomenon does not follow the Poisson assumptions and does not appear to be random according to the Poisson distribution. This shows in the data listing as well.

    compare = Table[{k-1,actualdata[[k]][[2]],
                     theorydata[[k]][[2]]},{k,1,8}]
    {{0, 13, 5.74605}, {1, 12, 19.7919}, {2, 15, 34.0861}, {3, 42, 39.1359},
     {4, 54, 33.7004}, {5, 32, 23.2158}, {6, 8, 13.3276}, {7, 4, 6.55802}}

    TableForm[compare,
      TableHeadings->{None,{"k","observed","Poisson"}}]

    k   observed   Poisson
    0   13         5.74605
    1   12         19.7919
    2   15         34.0861
    3   42         39.1359
    4   54         33.7004
    5   32         23.2158
    6   8          13.3276
    7   4          6.55802

3. The British mapped off the central 12 km by 12 km region of London into 576 square areas, each 1/2 km by 1/2 km. Then they recorded the number of bomb hits, noting their location, and this data is in the following table. Do the bombs obey a Poisson model?
    # bomb hits per area (k)       0    1    2    3    4    5 and over
    --------------------------------------------------------------------
    # areas with k bomb hits     229  211   93   35    7    1

To determine the average number of bomb hits per 1/2 km by 1/2 km section in London, a, we compute the following weighted average (treating the last column as exactly 5 hits):

    a = (0*229 + 1*211 + 2*93 + 3*35 + 4*7 + 5*1)/
        (229 + 211 + 93 + 35 + 7 + 1)//N
    0.928819

We note that the total number of sections is 576 (the total number of bomb hits is 535).

    total = (229 + 211 + 93 + 35 + 7 + 1);

Thus our Poisson model for the probability of k bombs striking a given section is given by the Poisson distribution formula.

    P[k_] := Exp[-a] a^k/k!

Thus on average 0.928819 bombs hit a 1/2 km by 1/2 km section. We compute a table of theoretical values and compare these with the observed data to see if the bombs really fell randomly.

    theorydata = Table[{k,P[k] total},{k,0,5}]
    {{0, 227.531}, {1, 211.336}, {2, 98.1463}, {3, 30.3867}, {4, 7.05595}, {5, 1.31074}}

    theory = ListPlot[theorydata,PlotStyle->{PointSize[.02]}]

    actualdata = {{0,229},{1,211},{2,93},{3,35},{4,7},{5,1}};

    actual = ListPlot[actualdata,PlotStyle->{PointSize[.02]}]

    Show[theory,actual]

It would appear that the theoretical data and the observed data are almost identical; indeed, we compare the numbers as well.

    compare = Table[{k-1,actualdata[[k]][[2]],
                     theorydata[[k]][[2]]},{k,1,6}];

    TableForm[compare,
      TableHeadings->{None,{"k","observed","Poisson"}}]

    k   observed   Poisson
    0   229        227.531
    1   211        211.336
    2   93         98.1463
    3   35         30.3867
    4   7          7.05595
    5   1          1.31074

Thus it appears that the bombs falling on London were indeed random, not targeted.

ISSUES IN SOLUTION

We try to get students into the format of this problem: (a) determine the average number of observations per unit, (b) compute p_k, multiplying each p_k by the total number of units observed to determine the expected number of units with each count k, (c) compare these with the observed data, and (d) make a conclusion: random or not random.
This may take several trials, but students become comfortable with the algorithm and enjoy analyzing yet another set of data because the data (e.g. baseball no-hitters, Supreme Court appointments, couch potato TV channel switching) intrigue them.
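Steps (a) through (c) of this format can be collected into a single function. The sketch below is one possible way to do so, not the notebook's own code: the name poissonCompare and its local variables are hypothetical, and it treats an open-ended last category (such as "5 and over") as exactly that value, as the solutions above do.

```mathematica
(* Hypothetical helper: counts[[i]] is the number of units (squares, minutes,
   seasons, ...) in which exactly k = i - 1 observations were recorded. *)
poissonCompare[counts_List] := Module[{ks, units, a, expected},
  ks = Range[0, Length[counts] - 1];
  units = Total[counts];                 (* number of units observed *)
  a = N[(ks . counts)/units];            (* (a) average number of observations per unit *)
  expected = units Exp[-a] a^ks/ks!;     (* (b) expected number of units with k observations *)
  TableForm[Transpose[{ks, counts, expected}],   (* (c) observed vs. Poisson, side by side *)
    TableHeadings -> {None, {"k", "observed", "Poisson"}}]
]

(* Bomb hit data from Problem 3. *)
poissonCompare[{229, 211, 93, 35, 7, 1}]
```

Step (d), the conclusion, is left to the student: the function only lays the two columns side by side.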