POSSIBLE SOLUTION(S)
First we set up a data set. We enter the {Rank, Population} data.
Input :=
pdata = {{1,7262700},{2,3259340},{3,3009530},{4,1728910},
{5,1642900}, {6,1086220},{7,1015190},{8,1003520},
{9,914350},{10,894070}, {11,752800},{12,749000},
{13,719820},{20,566030},{30,429550}};
We plot the {Rank, Population} data.
Input :=
popPlot = ListPlot[pdata,PlotStyle->{PointSize[.02]}]
Output =
-Graphics-
First we attack the problem as an exponential decay function, i.e.
P = K Exp[ - a r] where P is the population, r is the rank, and a and K are parameters to be determined.
This looks like an exponential decay and if we take the logarithm of both sides of the equation (linearizing we hope), say to base e, of P component of each data point and plot log(P) vs. r we should get a linear relationship between the transformed or linearized data. The transformed equation will look like this: log(P) = log(K) - a*r.
Input :=
data = Table[{pdata[[i]][[1]],Log[pdata[[i]][[2]]]},
{i,1,Length[pdata]}]//N;
We examine a plot of the logged data to see if it looks like it could be linear, i.e. a possible relationship such as log(P) = log(K) - a*r.
Input :=
pLogPlot = ListPlot[data,PlotStyle->{PointSize[.02]}]
Output =
-Graphics-
This logging does not appear to effect a change to a linear function so we try another model.
Now we attack the problem as a hyperbola function, i.e. P = K/ r^(a) where P is the population, r is the rank, and a and K are parameters to be determined.
If we log both variables, P and r, we should get a linear relationship between the logged data, i.e., log(P) = log(K) - a*log(r).
Input :=
pLogData = Table[{Log[pdata[[i]][[1]]],Log[pdata[[i]][[2]]]},
{i,1,Length[pdata]}]//N;
A plot of the logged data looks like it could be linear, i.e. a possible relationship such as log(P) = log(K) - a*log(r).
Input :=
pLogPlot = ListPlot[pLogData,PlotStyle->{PointSize[.02]}]
Output =
-Graphics-
We determine the line which fits the data best.
Input :=
pLog[r_] = Fit[pLogData,{1,r},r]
Output =
15.6286 - 0.839496 r
And plot it through the logged data for a good feel.
Input :=
pLinePlot = Plot[pLog[r],{r,0,4}];
Input :=
Show[pLogPlot,pLinePlot]
Output =
-Graphics-
This looks quite good and so we now we take the relationship as a logarithmic one and solve for the independent variable, p = f(r).
Input :=
eq = Log[p] == pLog[Log[r]];
Input :=
sol = Solve[eq,p];
Hence we have our functional relationship:
Input :=
P[r_] = p/.sol[[1]]
Output =
15.6286 - 0.839496 Log[r]
1. 2.71828
We can simplify this using our understanding of Logs in Base e and the fact that 2.71828 "is" e to the following form, P = PS(r) = K/ r^(a).
Input :=
PS[r_] = 2.71828^15.6286/r^(0.83946)
Output =
6
6.12929 10
-----------
0.83946
r
And then plot the fitted curve to the data.
Input :=
pf = Plot[PS[r],{r,1,30}]
Output =
-Graphics-
It appears to be a pretty good fit.
Input :=
Show[popPlot,pf]
Output =
-Graphics-
The model decreases more smoothly than the data, e.g. the 3 and 4 ranked cities are about the same and the 6, 7, and 8 ranked cities are about the same.
Finally we use a direct non-linear least squares approach to minimizing the sum of the square errors between the function of the two parameters k and a and the actual data. We enter the function model first.
Input :=
P[r_,k_,a_] = k/r^a;
We form our sum of square errors as a function of the two parameters k and a which are to be found.
Input :=
SS[k_,a_] =
Sum[(pdata[[i]][[2]] - P[pdata[[i]][[1]],k,a])^2,
{i,1,Length[pdata]}];
We examine this function of two variables over a wide range of variables for k and a to see reasonable set of starting values for the FindMinimum command.
Input :=
ContourPlot[SS[k,a],{k,6000000,8000000},{a,.5,1.5}]
Output =
-ContourGraphics-
Thus reasonable starting values for k are near 7,000,000 and for a near 0.9. We can now use the power of the FindMinimum command to determine the values of parameters k and a which minimize the sum of square errors (SS):
Input :=
FindMinimum[SS[k,a],{k,7000000},{a,.9}]
Output =
11 6
{6.48172 10 , {k -> 7. 10 , a -> 0.933854}}
We try several other starting values for k and a.
Input :=
FindMinimum[SS[k,a],{k,7500000},{a,.9}]
Output =
11 6
{6.2081 10 , {k -> 7.15708 10 , a -> 0.948962}}
Input :=
FindMinimum[SS[k,a],{k,6000000},{a,.9}]
Output =
11 6
{6.2081 10 , {k -> 7.15708 10 , a -> 0.948962}}
Input :=
FindMinimum[SS[k,a],{k,7500000},{a,.8}]
Output =
11 6
{6.2081 10 , {k -> 7.15708 10 , a -> 0.948962}}
Input :=
sol = FindMinimum[SS[k,a],{k,6500000},{a,.9}]
Output =
11 6
{6.2081 10 , {k -> 7.15708 10 , a -> 0.948962}}
And we accept the values of k = 7.15708 10^6 and a = 0.948962 to obtain a fitted function which we call p1[r].
Input :=
p1[r_] = P[r,k,a]/.sol[[2]]
Output =
6
7.15708 10
-----------
0.948962
r
We compare this to our previously fitted function from our linearization approach above:
Input :=
PS[r]
Output =
6
6.12929 10
-----------
0.83946
r
In fact we plot the newly obtained function, p1(r).
Input :=
pf1 = Plot[p1[r],{r,1,30}]
Output =
-Graphics-
And we plot both functions, PS(r) and p1(r), over the data. We see that both seem to do quite well, but the direct least squares approach (p1(r)) predicts the Rank 1 city better than the linearized form, PS(r).
Input :=
pdata[[1]]
Output =
{1, 7262700}
Input :=
p1[1]
Output =
6
7.15708 10
Input :=
PS[1]
Output =
6
6.12929 10
We look at a plot of both functions, PS(r) and p1(r), over the data.
Input :=
Show[popPlot,pf,pf1]
Output =
-Graphics-
We look at the function resulting from linearized approach.
Input :=
Show[popPlot,pf]
Output =
-Graphics-
We look at the function resulting from non-linear least square approach.
Input :=
Show[popPlot,pf1]
Output =
-Graphics-
Finally we see that the non-linearized approach is a bit low in predicting the population of the smaller cities from the rank.