Protozoa - How fast can they grow? How far can they reach? BRIEF ABSTRACT Fitting the population data to the logistic growth model using differenced data on the differential equation or actual data on the solved equation. Several approaches are suggested and comparison is of interest. GENERAL INFORMATION FileName: PROTOZOA Full title: Protozoa - How fast can they grow? How far can they reach? Developer: Brian J. Winkel, Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute IN 47803 USA Contact: Brian J. Winkel, Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute IN 47803 USA. Phone: 812-877-8412. Email: winkel@rose-hulman.edu. FAX: 812-877-3198. Support: The production of this material is supported by the National Science Foundation under Division of Undergraduate Education grant DUE-9352849: Development Site for Complex, Technology-Based Problems in Calculus with Applications in Science and Engineering and the Arvin Foundation of Columbus IN. STATEMENT OF PROBLEM Consider the following table describing a population of protozoa. Time Number of Time Number of (in days) Protozoa (in days) Protozoa 0 40 11 470 1 53 12 526 2 70 13 577 3 91 14 622 4 118 15 660 5 152 16 691 6 193 17 716 7 240 18 736 8 293 19 752 9 351 20 764 10 411 One reliable model for the growth of protozoa is the Logistic Equation, a differential or rate equation which suggests that the rate of growth of the population depends on the size of the population, a natural species growth rate, and the carrying capacity or limiting number , i.e. the maximum number of the protozoa which the environment can support. The differential equation model is dp/dt = p'(t) = r p(t) (K - p(t))/K or dp/dt = r p (K - p)/K = r p - r/K p^2 (where p = p(t)). r is called the growth rate and K is the carrying capacity. (1) What values of the population p would make the growth rate (dp/dt) zero? Which of these values is "interesting"? And what insight does this give to you concerning the term carrying capacity or limiting number for K? (2) What would the population tend to do if the initial population were less than K? Hint: Examine the sign of dp/dt. What would the population tend to do if the initial population were greater than K? Again what insight does this give to you concerning the term carrying capacity or limiting number for K? (3) What value of p would give the maximum value of dp/dt? I.e. if we could keep the population at one level, say by harvesting of the growth, so that our growth would be greatest what value of p would we choose? (4) Ecologists who study populations of species are interested in determining values of r and K from collected data. In fact, the values of r and K often are used to characterize species. If a species has a large r value it is said to be r-selected and uses a high breeding rate for survival. If a species has a large K value it is said to be K-selected and uses its environment efficiently, i.e., for a fixed environment it has a higher carrying capacity or can fit more of its own kind into the fixed environment size. We shall consider a set (see above) of data obtained on the population of a protozoan colony in a fixed or limited environment. Our data consists of (time, population size) observations for this protozoan colony. On the assumption that the logistic equation posed above is a viable model for population growth the task is to determine good values of r and K so that the data fits thismodel well. Produce several ways to determine r and K. Compare the various results you get and suggest which approach you would prefer to use and why. Consider the following approaches: (a) Estimate the rate of change (dp/dt) from the data and fit a quadratic function, a p - b p^2, of p to these estimates to obtain values of r and K. (b) Estimate the rate of change (dp/dt) divided by p from the data and fit a linear function, a - b p, of p to these estimates to obtain values of r and K. (c) Solve the differential equation with parameters as constants. Use initial condition to obtain constant of integration. Directly use a least non-linear squares technique on fitting the function to the data. KEYWORDS Logistic growth model, limited growth model, growth rate, carrying capacity, quadratic equation, differential equation, data analysis, curve fitting, numerical derivative, symmetric differences, prediction. TEACHER NOTES ISSUES RELATED TO THE PROBLEM Here is the Mathematica code to generate the data used in the problem. popEq = p'[t] == .3 p[t] - .3/800 p[t]^2 sol = DSolve[{p'[t] == .3 p[t] - .3/800 p[t]^2, p[0] == 40}, p[t],t] f[t_] = p[t]/.sol[[1]] Plot[f[t],{t,0,20}] Here is the data we intend to give to students: data = Table[{i, Floor[f[i]]},{i,0,20}]; In questions 1-3 we ask a few short questions on the qualitative behavior of the logistic equation and attempt to infer some ecological and semantic meaning to the constant K - carrying capacity. The major question is really question 4. In question 4 the objective is to estimate parameters in a problem, in this case the growth rate r and the carrying capacity K, using observed data where the model may not be in closed form originally. We also give several other opportunities for estimating parameters in a differential equation given some observed data by using numerical derivatives as estimates of the derivative and fitting data to these approximations to the differential equation. So for p'(t) = r p(t) (K - p(t))/K we are given some data and we seek to estimate r and K, first using a number of different ways without resorting to solving the differential equation, and second with a solution of the differential equation in hand. We give several suggestions in the question itself about how to proceed, but it is up to the students to discover the approaches and use the software appropriately, e.g. manipulation of data and parts of data, the use of symmetric differences or non-symmetric differences as numerical derivatives to approximate the derivative, and the solution of the differential equation followed by setting up a non-linear least square function with the data, in the parameters r and K. Questions (1) - (3), being warm-up questions can be done in class in about 45 minutes (maximum), small group efforts will assure all understand the issues in that time. Question (4) should be given considerable time, with in class group efforts and suggestions by the teacher, e.g., on numerical derivatives which are not often done. The class may determine a final comparison method, but we offer comparing the sum of the squares between the population values from the analytic solution to the differential equation with the parameters used and the observed population data for each time population data pair. The class may use how the data looks when plotted over the curve with the estimated parameters as a comparison. This data is very clean and should give rise to an excellent fit. One could offer up dirty data by using the function Pop(t) offered in the solution to Question (4), generating the numbers for a particular set of parameters r and K and then adding some noise to that data. But getting data which is too dirty at this stage would be detrimental as the students would not see the good fits. Effectively one could use p(t) = (Exp[r * t] * k)/(-1 + Exp[r * t] + k/p(0)) to compute new data. where r and K are the parameters to be estimated (selected to generate the problem by the teacher) and p(0) is the initial population level at time t = 0. Prerequisites Some fitting of functions to dasta, solution of first-order differential equation. Time allotment - time management This could be a several days project, for it demands data fitting, solutions to differential equation, and discovery of technique by students. Suggesting one of the techniques proposed would shorten the time. Expectations Future payoffs Extensions References and Sources POSSIBLE SOLUTION(S) 1. If p = 0 or p = K then dp/dt = 0. p = 0 is uninteresting as we do not believe in spontaneous generation and thus if the population got to be 0 there is nothing more to study. However, we see that if K = 0 there is not any growth or decline, and certainly that would qualify K for something special, although it would not necessarily mean it is a limiting value or carrying capacity. We turn to question 2 for more insight. 2. If the population were less than K then dp/dt would be positive and the population would then increase. If the population were greater than K then dp/dt would be negative and the population would tend to fall. Thus the population would grow until it got to K and fall back to K if it were over K. This says that the population would not stay above K long and would tend to rise to K if it falls below K, thus limiting value or carrying capacity seems to be a good name. 3. Looking at the function dp/dt = r p (K - p)/K we see that it is a quadratic in p and has a maximum when p = K/2. thus dp/dt is greatest when the population is at a level of K/2 or one-half its carrying capacity. 4. We enter the data for the number of protozoa in a culture at time t in days. data = {{0, 40}, {1, 53}, {2, 70}, {3, 91}, {4, 118}, {5, 152}, {6, 193}, {7, 240}, {8, 293}, {9, 351}, {10, 411}, {11, 470}, {12, 526}, {13, 577}, {14, 622}, {15, 660}, {16, 691}, {17, 716}, {18, 736}, {19, 752}, {20, 764}}; This recovers r and K using a parabolic fit to the differential equation and non-symmetric differences. Here is the growth rate data as a function of population size. ddata = Table[{data[[i]][[2]], (data[[i+1]][[2]]- data[[i]][[2]])/ (data[[i+1]][[1]]- data[[i]][[1]])}, {i,1, Length[data]-1}]; dp1 = ListPlot[ddata,PlotStyle->{PointSize[.02]}, AxesLabel->{"P", "P'(t)"},PlotRange->{{0,800},{0,80}}] This appears to be a parabolic shape and we attempt to fit this data to a parabola of the form a p - b p^2. q[p_] = Fit[ddata,{p,p^2},p] 2 0.310469 p - 0.000396418 p qp = Plot[q[p],{p,0,800}] We plot the data over the parabola and it appears a good match. Show[dp1,qp] We now use information about the function a p - b p^2 for p'(t) at t = 0 to obtain r and k values. eq1 = q'[0] == r; eq2 = q''[0] == - 2 r/K; Solve[{eq1,eq2},{r,K}] {{K -> 783.185, r -> 0.310469}} These values are K = 783.185 and r = 0.3104. This recovers r and K using a parabolic fit to the differential equation and symmetric differences. We now take averages of the slope about each time interval to get an averaging of slopes. ddata = Table[{data[[i]][[2]], ((data[[i+1]][[2]]- data[[i]][[2]])/ (data[[i+1]][[1]]- data[[i]][[1]]) + (data[[i]][[2]]- data[[i-1]][[2]])/ (data[[i]][[1]]- data[[i-1]][[1]]))/2}, {i,2, Length[data]-1}]//N; dp1 = ListPlot[ddata,PlotStyle->{PointSize[.02]}, AxesLabel->{"P", "P'(t)"}] This appears to be a parabolic shape and we attempt to fit this data to a parabola of the form a p - b p^2. dq[p_] = Fit[ddata,{p,p^2},p] 2 0.298494 p - 0.00037285 p dqp = Plot[dq[p],{p,0,800}] We plot the data over the parabola and it appears a good match. Show[dp1,dqp] We now use information about the function a p - b p^2 for p'(t) at t = 0 to obtain r and k values. eq1 = dq'[0] == r; eq2 = dq''[0] == - 2 r/K; Solve[{eq1,eq2},{r,K}] {{K -> 800.575, r -> 0.298494}} These values K = 800.575 and r = 0.2984 compared to the answers we get using non-symmetric difference K = 783.185 and r = 0.3104. We have now recovered r and K using a linear fit to the differential equation divided by p(t) and non-symmetric differences. We now take averages of the slope about each time interval and divide these by p(t) for each time t. rdata = Table[{data[[i]][[2]], (data[[i+1]][[2]]- data[[i]][[2]])/ (data[[i+1]][[1]]- data[[i]][[1]])/data[[i]][[2]]}, {i,1, Length[data]-1}]//N; rp1 = ListPlot[rdata,PlotStyle->{PointSize[.02]}, AxesLabel->{"P(t)", "P'(t)/P(t)"}] This appears to be a straight line and we attempt to fit this data to a straight line of the form a - b p. rq[p_] = Fit[rdata,{1,p},p] 0.332564 - 0.000434191 p rqp = Plot[rq[p],{p,0,800}] Show[rp1,rqp] We now use information about the function a - b p for p'(t) at t = 0 to obtain r and k values. eq1 = rq[0] == r; eq2 = rq'[0] == - r/K; Solve[{eq1,eq2},{r,K}] {{K -> 765.941, r -> 0.332564}} This compares favorably to the original toy values of K = 800, r = 0.3 but not as well as several previous results. This recovers r and K using a linear fit to the differential equation divided by p(t) and symmetric differences. We now take averages of the slope about each time interval and divide these by p(t) at each time t. sdata = Table[{data[[i]][[2]], ((data[[i+1]][[2]]- data[[i]][[2]])/ (data[[i+1]][[1]]- data[[i]][[1]]) + (data[[i]][[2]]- data[[i-1]][[2]])/ (data[[i]][[1]]- data[[i-1]][[1]]))/2/data[[i]][[2]]}, {i,2, Length[data]-1}]//N; sp1 = ListPlot[sdata,PlotStyle->{PointSize[.02]}, AxesLabel->{"P(t)", "P'(t)/P(t)"}] This appears to be a straight line and we attempt to fit this data to a straight line of the form a - b p. sq[p_] = Fit[sdata,{1,p},p] 0.300345 - 0.000375982 p sqp = Plot[sq[p],{p,0,800}] Show[sp1,sqp] We now use information about the function a - b p for p'(t) at t = 0 to obtain r and k values. eq1 = sq[0] == r; eq2 = sq'[0] == - r/K; Solve[{eq1,eq2},{r,K}] {{K -> 798.827, r -> 0.300345}} This compares favorably to the original toy values of K = 800, r = 0.3 but not as well as several previous results. And it is better than the non-symmetric difference approach which yielded K = 765.941 and r = 0.332564. This recovers r and K after solving the differential equation and directly doing a nonlinear least squares fit to determine r and K. This approach to estimating the parameters uses solution of the differential equation followed by a least squares, non-linear fit of the closed-form solution with parameters to the data. Here is our differential equation model for paramecium growth. eq = p'[t]== r p[t] (k - p[t])/k r (k - p[t]) p[t] p'[t] == ----------------- k We solve with the parameters r and k unknown. sol = DSolve[eq,p[t],t] r t E k {{p[t] -> -------------}, {p[t] -> 0}} r t E + k C[1] We capture the solution. P[t_] = p[t]/.sol[[1]] r t E k ------------- r t E + k C[1] We use the initial condition to determine the constant of integration and a "final" form for the population model. sc = Solve[P[0]==40, C[1]] -40 + k {{C[1] -> -------}} 40 k Pop[t_] = P[t]/.sc[[1]] r t E k -------------- r t -40 + k E + ------- 40 Pop[t_] = Simplify[Pop[t]] r t E k -------------- r t k -1 + E + -- 40 Here is the data from the observations. data {{0, 40}, {1, 53}, {2, 70}, {3, 91}, {4, 118}, {5, 152}, {6, 193}, {7, 240}, {8, 293}, {9, 351}, {10, 411}, {11, 470}, {12, 526}, {13, 577}, {14, 622}, {15, 660}, {16, 691}, {17, 716}, {18, 736}, {19, 752}, {20, 764}} We offer a plot of the data to see the form we are looking for. p1 = ListPlot[data,PlotStyle->{PointSize[.02]}, AxesLabel->{"t", "p(t)"}] This is the sum of the squares for the differences between the observed data(i.e. the second coordinate of the ith data point - data[[i]] [[2]]) and the predicted data from the model (i.e. the value of the function at the first coordinate of the ith data point - Pop[data[[i]] [[1]] ]) . SS[r_,k_] = Sum[ (data[[i]][[2]] - Pop[data[[i]][[1]]])^2, {i,1,Length[data]}]; Here we enter the function with all three variable, r, k , and t. ParPop[r_,k_,t_] = Pop[t] r t E k -------------- r t k -1 + E + -- 40 We offer up a way to plot the data with a trial for any value of r and k one wishes to try. try[r_,k_] := Show[p1, Plot[ParPop[r,k,t], {t,0,20},PlotRange->{0,800}, DisplayFunction->Identity] ] try[.35,750] We seek to minimize this sum of the squares function SS[r, k] of two variables (the parameters r and k). FindMinimum[SS[r,k],{r,.6},{k,600}] {2.45308, {r -> 0.29972, k -> 799.791}} vals = FindMinimum[SS[r,k],{r,.2},{k,900}] {2.45308, {r -> 0.29972, k -> 799.791}} These compare very favorably to the toy values of K = 800 and r = 0.30. We pick off the values of r and K. r1 = r/.vals[[2]]; k1 = k/.vals[[2]]; We substitute the values of the parameters in the model and plot data and model to see how good we are doing. Protozoa[t_] = Pop[t]/.vals[[2]] 0.29972 t 799.791 E -------------------- 0.29972 t 18.9948 + E ProtozoaPlot= Plot[Protozoa[t],{t,0,20}] Show[p1, ProtozoaPlot] This looks very reasonable for prediction purposes and these values of r = 0.2999 and K = 799.79 may be used to characterize this species under these conditions. Summary of values obtained for r and K and comparisons. After using a parabolic fit to the differential equation and non-symmetric differences we determine r = 0.3104 and K = 783.184 and a sum of squares error of 1880.42 in the actual model. SS[.3104,783.184] 1880.42 After using a parabolic fit to the differential equation and symmetric differences we determine r = 0.2984 and K = 800.574 and a sum of squares error of 51.4211 in the actual model. SS[0.2984,800.574] 51.4211 After using a linear fit to the differential equation divided by p(t) and non-symmetric differences we determine r = 0.332564 and K = 765.941 and a sum of squares error of 18636.2 in the actual model. SS[.332564,765.941] 18636.2 After using a linear fit to the differential equation divided by p(t) and symmetric differences we determine r = 0.3003 and K = 798.826 and and a sum of squares error of 18636.2 in the actual model. SS[.3003,798.826] 8.23376 After solving the differential equation and directly doing a nonlinear least squares fit we determine r = 0.2999 and K = 799.79 and a sum of squares error of 22.562 in the actual model . SS[.299,799.79]//N 22.5652 Summary: It would appear that the linear fit of the differential equation divided by p(t) using symmetric difference gives the best least squares fit although the direct non-linear least squares on the solutions does quite well too. Indeed both symmetric numerical methods do quite well in comparison with the direct least squares method on the solution. The non-symmetric differencing does worst. ISSUES IN SOLUTION We have offered several different ways of addressing this problem and, no doubt, expect that students will propose others.