(*^ ::[ Information = "This is a Mathematica Notebook file. It contains ASCII text, and can be transferred by email, ftp, or other text-file transfer utility. It should be read or edited using a copy of Mathematica or MathReader. If you received this as email, use your mail application or copy/paste to save everything from the line containing (*^ down to the line containing ^*) into a plain text file. On some systems you may have to give the file a name ending with ".ma" to allow Mathematica to recognize it as a Notebook. The line below identifies what version of Mathematica created this file, but it can be opened using any other version as well."; FrontEndVersion = "NeXT Mathematica Notebook Front End Version 2.2"; NeXTStandardFontEncoding; fontset = title, inactive, noPageBreakBelow, noPageBreakInGroup, nohscroll, preserveAspect, groupLikeTitle, center, M7, bold, L1, e8, 24, "Times"; ; fontset = subtitle, inactive, noPageBreakBelow, noPageBreakInGroup, nohscroll, preserveAspect, groupLikeTitle, center, M7, bold, L1, e6, 18, "Times"; ; fontset = subsubtitle, inactive, noPageBreakBelow, noPageBreakInGroup, nohscroll, preserveAspect, groupLikeTitle, center, M7, italic, L1, e6, 14, "Times"; ; fontset = section, inactive, noPageBreakBelow, nohscroll, preserveAspect, groupLikeSection, grayBox, M22, bold, L1, a20, 18, "Times"; ; fontset = subsection, inactive, noPageBreakBelow, nohscroll, preserveAspect, groupLikeSection, blackBox, M19, bold, L1, a15, 14, "Times"; ; fontset = subsubsection, inactive, noPageBreakBelow, nohscroll, preserveAspect, groupLikeSection, whiteBox, M18, bold, L1, a12, 12, "Times"; ; fontset = text, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12; fontset = smalltext, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 10, "Times"; ; fontset = input, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeInput, M42, N23, bold, L1, 12, "Courier"; ; fontset = output, output, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeOutput, M42, N23, L-5, 12, "Courier"; ; fontset = message, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeOutput, M42, N23, L1, 12, "Courier"; ; fontset = print, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeOutput, M42, N23, L1, 12, "Courier"; ; fontset = info, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeOutput, M42, N23, L1, 12, "Courier"; ; fontset = postscript, PostScript, formatAsPostScript, output, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeGraphics, M7, l34, w282, h287, L1, 12, "Courier"; ; fontset = name, inactive, noPageBreakInGroup, nohscroll, preserveAspect, M7, italic, B65535, L1, 10, "Times"; ; fontset = header, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, italic, L1, 12, "Times"; ; fontset = leftheader, 12; fontset = footer, inactive, nohscroll, noKeepOnOnePage, preserveAspect, center, M7, italic, L1, 12, "Times"; ; fontset = leftfooter, 12; fontset = help, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12; fontset = clipboard, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12; fontset = completions, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12, "Courier"; ; fontset = special1, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12; fontset = special2, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12; fontset = special3, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12; fontset = special4, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12; fontset = special5, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M7, L1, 12; paletteColors = 128; automaticGrouping; magnification = 125; currentKernel; ] :[font = title; inactive; preserveAspect; startGroup] Population Rank Modelling :[font = section; inactive; preserveAspect; startGroup] BRIEF ABSTRACT :[font = subsection; inactive; preserveAspect; endGroup] Fitting a function to the population vs. rank data for major cities in a region and conjecturing (1) how this function changes over time and (2) how plots for different societies, e.g. rural, industrial, compare. :[font = section; inactive; Cclosed; preserveAspect; startGroup] GENERAL INFORMATION :[font = subsection; inactive; preserveAspect; endGroup] FileName: POPRANK Full title: Population Rank Modelling Developer: Brian J. Winkel, Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute IN 47803 USA Contact: Brian J. Winkel, Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute IN 47803 USA. Phone: 812-877-8412. Email: winkel@rose-hulman.edu. FAX: 812-877-3198. Support: The production of this material is supported by the National Science Foundation under Division of Undergraduate Education grant DUE-9352849: Development Site for Complex, Technology-Based Problems in Calculus with Applications in Science and Engineering and the Arvin Foundation of Columbus IN. :[font = section; inactive; Cclosed; preserveAspect; startGroup] STATEMENT OF PROBLEM :[font = subsection; inactive; preserveAspect] Consider the relationship between the rank and the actual population of cities in a region (world, United States, state, county, etc.) We offer you data on United States cities. (1-2)*Consider the data on United States cities and plot Population vs. Rank to see if there is any possible relationship. Population of United States Cities. Source: 1988 World Almanac and Book of Facts. New York: World Almanac. p. 538. (IF you want current data you will need to consult current Almanac.!) :[font = subsection; inactive; preserveAspect] Rank 1986 Pop City 1 7,262,700 New York 2 3,259,340 Los Angeles 3 3,009,530 Chicago 4 1,728,910 Houston 5 1,642,900 Philadelphia 6 1,086,220 Detroit 7 1,015,190 San Diego 8 1,003,520 Dallas 9 914,350 San Antonio 10 894,070 Phoenix 11 752,800 Baltimore 12 749,000 San Francisco 13 719,820 Indianapolis 20 566,030 Columbus (OH) 30 429,550 Fort Worth :[font = subsection; inactive; preserveAspect] * Considering is HARD WORK and so we give it (1-2) in number! :[font = subsection; inactive; preserveAspect] (3) Conjecture a functional relationship between an independent variable Rank and a dependent variable Population with one or two other parameters to be determined. :[font = subsection; inactive; preserveAspect] (4) Attempt to determine the best parameter(s) for your data and use the value of the parameter value(s) to plot a function relating Population to Rank. :[font = subsection; inactive; preserveAspect] (5) Compare your best parameter model with the actual data. :[font = subsection; inactive; preserveAspect] (6) Consider other demographic data, e.g., countries of the world or cities in your state, and perform the same analysis. Perform the modeling analysis outlined in (1) - (3) above. Be sure to give the full source citation. You might not want to use all the data offered in your source. In this case use your model which you determine to predict population and rank for cities not used in your analysis. How good is your function? In selected regions of the ranks, e.g., in top ranked and near bottom ranked regions? :[font = subsection; inactive; preserveAspect] (7) Discuss the nature of your models for different types of societies, e.g. a region with a few large industrial centers or a region which is all agrarian. Find data for different types of regions and discuss your model. :[font = subsection; inactive; preserveAspect; endGroup] (8) An option, find data for one region over time and observe how the parameters in your model change. Attempt to relate your changing parameter to historical phenomena, e.g. war, agrarian reform, industrial revolution, immigration, etc. E.g., United States data from 1790 through 1990. :[font = section; inactive; Cclosed; preserveAspect; startGroup] KEYWORDS :[font = subsection; inactive; preserveAspect; endGroup] Data analysis, function fitting, logarithm, parameter estimation, least squares fit, population, rank, demographics, social change, history. :[font = section; inactive; Cclosed; preserveAspect; startGroup] TEACHER NOTES :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] ISSUES RELATED TO THE PROBLEM :[font = subsubsection; inactive; preserveAspect] We have used this assignment in a number of calculus and above classes, e.g., mathematical modelling class in which this effort represents an empirical model of data fitting. In all cases students usually come to the model P(r) = K/r^a where P = P(r) is population, r is rank, K is often a parameter to be determined or set to the rank of the largest city (P(1) = K/1^a = K). :[font = subsubsection; inactive; preserveAspect] One year our calculus students did go after the census data for the United States and it was in that class that we discovered the potential for studying changing a values. The students did get excited about this for their analysis worked decade after decade and they then plotted the changing values of a versus time and made conjectures. Thus there was a meta data analysis problem as well. Often time students could eyeball the parameters, at least K, and to some extent a. Some students will linearize the data to obtain log(P(r)) = log(K) - a log(r) followed by linear regression or straight line fitting. It is in this linearization that one can get different K values other than K = P(1). Non-calculus students can do a good bit of eyeballing of non-linear and linearized data and get good success here. :[font = subsubsection; inactive; preserveAspect; endGroup] Calculus students who can apply optimization techniques may choose to go directly after the least squares function formulation and use software to find the parameter values K and a which minimize the least squares, be it from linearized data or non-linearized data. The K and a values you get from these two approaches, i.e., linearize first then do least squares or non-linear least squares, may differ and cause some discussion. They should not be expected to be the same as in each case we are minimizing the vertical distance squared between different types of functions and data. :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] Prerequisites :[font = subsubsection; inactive; preserveAspect; endGroup] Students should be able to use the logarithm function to "linearize" data . They should be able to plot data, perhaps logged data and estimate slopes. :[font = subsection; inactive; preserveAspect] Time allotment - time management :[font = subsection; inactive; preserveAspect] Expectations :[font = subsection; inactive; preserveAspect] Future payoffs :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] Extensions :[font = subsubsection; inactive; preserveAspect; endGroup] Study the parameter a over time and get a plot of a vs. time. Then conjecture the significance of change, e.g. Industrail Revolution, Depression, Boom, etc. :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] References and Sources :[font = subsubsection; inactive; preserveAspect; endGroup; endGroup] Population of United States Cities. Source: 1988 World Almanac and Book of Facts. New York: World Almanac. p. 538. :[font = section; inactive; Cclosed; preserveAspect; startGroup] POSSIBLE SOLUTION(S) :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] First we set up a data set. We enter the {Rank, Population} data. :[font = input; preserveAspect; endGroup] pdata = {{1,7262700},{2,3259340},{3,3009530},{4,1728910}, {5,1642900}, {6,1086220},{7,1015190},{8,1003520}, {9,914350},{10,894070}, {11,752800},{12,749000}, {13,719820},{20,566030},{30,429550}}; :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] We plot the {Rank, Population} data. :[font = input; preserveAspect; endGroup] popPlot = ListPlot[pdata,PlotStyle->{PointSize[.02]}] :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] First we attack the problem as an exponential decay function, i.e. P = K Exp[ - a r] where P is the population, r is the rank, and a and K are parameters to be determined. :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] This looks like an exponential decay and if we take the logarithm of both sides of the equation (linearizing we hope), say to base e, of P component of each data point and plot log(P) vs. r we should get a linear relationship between the transformed or linearized data. The transformed equation will look like this: log(P) = log(K) - a*r. :[font = input; preserveAspect; endGroup] data = Table[{pdata[[i]][[1]],Log[pdata[[i]][[2]]]}, {i,1,Length[pdata]}]//N; :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We examine a plot of the logged data to see if it looks like it could be linear, i.e. a possible relationship such as log(P) = log(K) - a*r. :[font = input; preserveAspect; endGroup] pLogPlot = ListPlot[data,PlotStyle->{PointSize[.02]}] :[font = subsubsection; inactive; preserveAspect; endGroup] This logging does not appear to effect a change to a linear function so we try another model. :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] Now we attack the problem as a hyperbola function, i.e. P = K/ r^(a) where P is the population, r is the rank, and a and K are parameters to be determined. :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] If we log both variables, P and r, we should get a linear relationship between the logged data, i.e., log(P) = log(K) - a*log(r). :[font = input; preserveAspect; endGroup] pLogData = Table[{Log[pdata[[i]][[1]]],Log[pdata[[i]][[2]]]}, {i,1,Length[pdata]}]//N; :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] A plot of the logged data looks like it could be linear, i.e. a possible relationship such as log(P) = log(K) - a*log(r). :[font = input; preserveAspect; endGroup] pLogPlot = ListPlot[pLogData,PlotStyle->{PointSize[.02]}] :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We determine the line which fits the data best. :[font = input; Cclosed; preserveAspect; startGroup] pLog[r_] = Fit[pLogData,{1,r},r] :[font = output; output; inactive; preserveAspect; endGroup; endGroup] 15.62861606572001 - 0.839495691480363*r ;[o] 15.6286 - 0.839496 r :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] And plot it through the logged data for a good feel. :[font = input; preserveAspect] pLinePlot = Plot[pLog[r],{r,0,4}]; :[font = input; preserveAspect; endGroup] Show[pLogPlot,pLinePlot] :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] This looks quite good and so we now we take the relationship as a logarithmic one and solve for the independent variable, p = f(r). :[font = input; preserveAspect] eq = Log[p] == pLog[Log[r]]; :[font = input; preserveAspect; endGroup] sol = Solve[eq,p]; :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] Hence we have our functional relationship: :[font = input; Cclosed; preserveAspect; startGroup] P[r_] = p/.sol[[1]] :[font = output; output; inactive; preserveAspect; endGroup; endGroup] 1.*2.718281828459045^ (15.62861606572001 - 0.839495691480363*Log[r]) ;[o] 15.6286 - 0.839496 Log[r] 1. 2.71828 :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We can simplify this using our understanding of Logs in Base e and the fact that 2.71828 "is" e to the following form, P = PS(r) = K/ r^(a). :[font = input; Cclosed; preserveAspect; startGroup] PS[r_] = 2.71828^15.6286/r^(0.83946) :[font = output; output; inactive; preserveAspect; endGroup; endGroup] (6.129290061248319*10^6)/r^0.83946 ;[o] 6 6.12929 10 ----------- 0.83946 r :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] And then plot the fitted curve to the data. :[font = input; preserveAspect; endGroup] pf = Plot[PS[r],{r,1,30}] :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] It appears to be a pretty good fit. :[font = input; preserveAspect; endGroup] Show[popPlot,pf] :[font = subsubsection; inactive; preserveAspect; endGroup] The model decreases more smoothly than the data, e.g. the 3 and 4 ranked cities are about the same and the 6, 7, and 8 ranked cities are about the same. :[font = subsection; inactive; Cclosed; preserveAspect; startGroup] Finally we use a direct non-linear least squares approach to minimizing the sum of the square errors between the function of the two parameters k and a and the actual data. We enter the function model first. :[font = input; preserveAspect] P[r_,k_,a_] = k/r^a; :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We form our sum of square errors as a function of the two parameters k and a which are to be found. :[font = input; preserveAspect; endGroup] SS[k_,a_] = Sum[(pdata[[i]][[2]] - P[pdata[[i]][[1]],k,a])^2, {i,1,Length[pdata]}]; :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We examine this function of two variables over a wide range of variables for k and a to see reasonable set of starting values for the FindMinimum command. :[font = input; preserveAspect; endGroup] ContourPlot[SS[k,a],{k,6000000,8000000},{a,.5,1.5}] :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] Thus reasonable starting values for k are near 7,000,000 and for a near 0.9. We can now use the power of the FindMinimum command to determine the values of parameters k and a which minimize the sum of square errors (SS): :[font = input; Cclosed; preserveAspect; startGroup] FindMinimum[SS[k,a],{k,7000000},{a,.9}] :[font = output; output; inactive; preserveAspect; endGroup; endGroup] {6.481719707436499*10^11, {k -> 7.00000000291345*10^6, a -> 0.933854084301341}} ;[o] 11 6 {6.48172 10 , {k -> 7. 10 , a -> 0.933854}} :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We try several other starting values for k and a. :[font = input; Cclosed; preserveAspect; startGroup] FindMinimum[SS[k,a],{k,7500000},{a,.9}] :[font = output; output; inactive; preserveAspect; endGroup] {6.208103953879467*10^11, {k -> 7.157083378839951*10^6, a -> 0.948961875473471}} ;[o] 11 6 {6.2081 10 , {k -> 7.15708 10 , a -> 0.948962}} :[font = input; Cclosed; preserveAspect; startGroup] FindMinimum[SS[k,a],{k,6000000},{a,.9}] :[font = output; output; inactive; preserveAspect; endGroup] {6.208103953874158*10^11, {k -> 7.157083363788432*10^6, a -> 0.948961968860671}} ;[o] 11 6 {6.2081 10 , {k -> 7.15708 10 , a -> 0.948962}} :[font = input; Cclosed; preserveAspect; startGroup] FindMinimum[SS[k,a],{k,7500000},{a,.8}] :[font = output; output; inactive; preserveAspect; endGroup] {6.208103953878746*10^11, {k -> 7.157083696297024*10^6, a -> 0.94896207583786}} ;[o] 11 6 {6.2081 10 , {k -> 7.15708 10 , a -> 0.948962}} :[font = input; Cclosed; preserveAspect; startGroup] sol = FindMinimum[SS[k,a],{k,6500000},{a,.9}] :[font = output; output; inactive; preserveAspect; endGroup; endGroup] {6.208103953914559*10^11, {k -> 7.157083299517888*10^6, a -> 0.94896170161526}} ;[o] 11 6 {6.2081 10 , {k -> 7.15708 10 , a -> 0.948962}} :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] And we accept the values of k = 7.15708 10^6 and a = 0.948962 to obtain a fitted function which we call p1[r]. :[font = input; Cclosed; preserveAspect; startGroup] p1[r_] = P[r,k,a]/.sol[[2]] :[font = output; output; inactive; preserveAspect; endGroup; endGroup] (7.157083299517888*10^6)/r^0.94896170161526 ;[o] 6 7.15708 10 ----------- 0.948962 r :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We compare this to our previously fitted function from our linearization approach above: :[font = input; Cclosed; preserveAspect; startGroup] PS[r] :[font = output; output; inactive; preserveAspect; endGroup; endGroup] (6.129290061248319*10^6)/r^0.83946 ;[o] 6 6.12929 10 ----------- 0.83946 r :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] In fact we plot the newly obtained function, p1(r). :[font = input; preserveAspect; endGroup] pf1 = Plot[p1[r],{r,1,30}] :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] And we plot both functions, PS(r) and p1(r), over the data. We see that both seem to do quite well, but the direct least squares approach (p1(r)) predicts the Rank 1 city better than the linearized form, PS(r). :[font = input; Cclosed; preserveAspect; startGroup] pdata[[1]] :[font = output; output; inactive; preserveAspect; endGroup] {1, 7262700} ;[o] {1, 7262700} :[font = input; Cclosed; preserveAspect; startGroup] p1[1] :[font = output; output; inactive; preserveAspect; endGroup] 7.157083299517888*10^6 ;[o] 6 7.15708 10 :[font = input; Cclosed; preserveAspect; startGroup] PS[1] :[font = output; output; inactive; preserveAspect; endGroup; endGroup] 6.129290061248319*10^6 ;[o] 6 6.12929 10 :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We look at a plot of both functions, PS(r) and p1(r), over the data. :[font = input; preserveAspect; endGroup] Show[popPlot,pf,pf1] :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We look at the function resulting from linearized approach. :[font = input; preserveAspect; endGroup] Show[popPlot,pf] :[font = subsubsection; inactive; Cclosed; preserveAspect; startGroup] We look at the function resulting from non-linear least square approach. :[font = input; preserveAspect; endGroup] Show[popPlot,pf1] :[font = subsubsection; inactive; preserveAspect; endGroup; endGroup] Finally we see that the non-linearized approach is a bit low in predicting the population of the smaller cities from the rank. :[font = section; inactive; Cclosed; pageBreak; preserveAspect; startGroup] ISSUES IN SOLUTION :[font = subsection; inactive; preserveAspect; endGroup; endGroup] Students need to plot the data first and roam through their function catalogue for possible fits. There is no physics of fit in this problem, i.e., no underlying physical model which would be THE one to apply based on first principles of physics, but rather there is a shape with parameters. Students may try out several shapes plotted over the data, attempting various shapes, and in so doing altering the parameters. But eventually they should settle on a model equation or function and go after the parameters. Several models may come up and students should be encourage to compare at all stages. Moreover, students may not be self-motivated to linearize the data, i.e., convert the model form to a linear relationship between converted variables the way scientists, e.g. chemists, routinely do. In the solution we offered the exponential model P = K e^(-a r) was considered first, but abandoned after the linearization process seemed to indicate the linearized data was not going to be linear. The hyperbolic model seemed to work as a second choice and no other model was discussed. But we have had students offer other quite different models, e.g., K tan(a (1 - r/30)). ^*)