The context of Bill Mazeroski's fielding

A long discussion on SABR-L, the Society for American Baseball Research mailing list, prompted me to try to determine how much of Bill Mazeroski's fielding record reflects the fact that he was a great fielder and how much was a result of his environment.
I looked at some data for the NL and the Pirates during two time periods:
1. 1947-55, 1973-81 (the 17 years surrounding Mazeroski's career)
2. 1956-1972 (Mazeroski's 17 years)
[Numbers from "Sports Encyclopedia: Baseball", "Total Baseball" and "The Bill James Electronic Encyclopedia".]
The Pirates played 2707 games in Mazeroski's 17 years (159.2/yr), so I projected totals to 160 games:
```
       G   PO2b  A2b   E2b  DP2b    PO     A      E     DP  othDP
1. NL 160 399.6 473.6 21.5 107.9  4295.5 1784.7 147.4 152.4  44.5
1.Pit 160 413.9 478.2 22.5 111.1  4283.1 1791.4 155.9 153.4  42.3
2. NL 160 380.5 465.1 20.7 104.6  4306.7 1755.2 144.0 149.8  45.1
2.Pit 160 390.1 522.4 17.0 130.6  4303.2 1848.4 150.2 175.7  45.2
```
othDP is double plays turned not involving second basemen. Mazeroski's years with the Pirates are the 2.Pit row.
Many of the extra assists for the Pirates appear to come from their 2b. The others probably come from a slight ground-ball tendency of the Pirate pitchers. As another check on this I looked at the A/PO ratio for pitchers: for the Pirate pitchers it was 2.95, for all NL pitchers it was 2.84. (But see later for a better look into this.)
How do the extra plays turned by Pirates 2b affect other totals (especially runs allowed and DPs)? I looked at total outs per game (PO/G), outs involving 2b ((PO2b+A2b)/G) and outs not involving 2b (PO/G-(PO2b+A2b)/G) for 1956-72:
```
     PO/G   2b   others    H     BB    K   Approx. OBA
NL  26.917 5.285 21.632
Pit 26.895 5.703 21.192  8.731 2.963 5.288    .3030
```
Using the "others" as a "clock" (assuming that over the years the rest of the players come out as average fielders and make plays at the same rate), I estimate that an average 2b would make 5.178 plays in the same time that the Pirates 2b made 5.703 plays. Together with the 21.192 outs made by the others, this accounts for 26.369 outs. Getting the rest of the 26.895 outs takes another 1.99% "longer", giving the totals:
```
 PO/G   2b   others    H     BB    K    Approx. OBA
26.895 5.281 21.614  8.905 3.022 5.2391   .3072
```
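The "clock" adjustment can be retraced in a few lines. This is a minimal sketch using the per-game figures from the two tables above; the variable names are mine, and it assumes that hits and walks allowed grow by the same stretch factor as the outs.

```python
# Per-game figures for 1956-72, copied from the tables above.
NL_2B, NL_OTHERS = 5.285, 21.632       # NL outs involving 2b / all other outs
PIT_2B, PIT_OTHERS = 5.703, 21.192     # the same split for the Pirates
PIT_OUTS = 26.895                      # Pirates total outs per game
PIT_H, PIT_BB = 8.731, 2.963           # Pirates hits and walks allowed per game

# The "others" act as the clock: an average 2b makes plays at the NL rate
# for as long as the other Pirates need to record their 21.192 outs.
avg_2b_plays = NL_2B / NL_OTHERS * PIT_OTHERS
outs_on_clock = avg_2b_plays + PIT_OTHERS

# A full game still needs PIT_OUTS outs, so play runs longer by this factor.
stretch = PIT_OUTS / outs_on_clock

# Assumption: every per-game total is stretched by the same factor.
adj_2b = avg_2b_plays * stretch
adj_others = PIT_OTHERS * stretch
adj_h = PIT_H * stretch
adj_bb = PIT_BB * stretch

# Plays per game that the Pirates 2b made beyond an average 2b.
extra_2b_plays = PIT_2B - adj_2b

print(f"average-2b plays on the clock: {avg_2b_plays:.3f}")
print(f"stretch factor: {stretch:.4f}")
print(f"adjusted 2b {adj_2b:.3f}, others {adj_others:.3f}, "
      f"H {adj_h:.3f}, BB {adj_bb:.3f}")
print(f"extra 2b plays per game: {extra_2b_plays:.3f}")
```

Because the inputs are already rounded to three decimals, the last printed digit can differ slightly from the figures quoted in the text.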
Using Bill James' simple Runs Created estimate, RC=OBA*TB, gives an increase of 3.401% in RC (OBA increased by 1.38% and TB by 1.99%). The Pirates allowed 628.533 runs per 160 games. A 3.4% increase sends this to 649.910 R/160G, indicating a saving of 21.377 runs per year. Over 17 years this amounts to 363.4 runs, consistent with Total Baseball's 362 Fielding Runs for Mazeroski, though a bit lower than Mike Emeigh's figures. (I could also project this to account for the fact that more runs allowed means more losses ==> a few more outs at home, a few fewer on the road, but the effect is small enough that I'm not going to worry about it.)
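The runs-saved arithmetic follows directly from RC=OBA*TB: a relative change in RC is the product of the relative changes in OBA and TB. The sketch below retraces it from the rounded figures in the text; the function name and its parameters are my own, and because the inputs are rounded the result agrees with the quoted 21.377 and 363.4 only to within a few hundredths of a run per year.

```python
def runs_saved(oba_before, oba_after, tb_ratio, runs_per_160, years):
    """Retrace the RC = OBA * TB estimate of runs saved.

    Returns (relative RC increase, runs saved per 160 games, career total).
    """
    # RC is a product, so its ratio is the product of the two ratios.
    rc_ratio = (oba_after / oba_before) * tb_ratio
    runs_without = runs_per_160 * rc_ratio
    per_year = runs_without - runs_per_160
    return rc_ratio - 1.0, per_year, per_year * years


# Rounded inputs quoted in the text: OBA .3030 -> .3072, TB up 1.99%,
# 628.533 runs allowed per 160 games, 17 seasons.
rc_increase, per_year, career = runs_saved(0.3030, 0.3072, 1.0199, 628.533, 17)

print(f"RC increase: {rc_increase:.2%}")
print(f"runs saved per 160 games: {per_year:.1f}")
print(f"runs saved over 17 years: {career:.0f}")
```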

Regression models

Writing in SABR-L, Michael Mavrogiannis suggested, in "Mazeroski's Restaurant (Part 1)", that a regression model be used to predict how many double plays should be turned behind the Pirates pitching staff. I ran the numbers for all NL teams from 1956-1972 and found that the coefficients for errors and IP are not (statistically) significantly different from 0. If these are removed, the "P-value" for the model actually improves. When this is done I get the model
```
Proj. DP = 83.269 + .0641*H - .0190*HR + .0944*BB - .0494*SO
with R^2=.249, SE=16.67, F(4,161)=13.36
```
Using this model, the Pirates turned an extra 351 DPs in Mazeroski's 17 years, with the worst values coming in 1956, '57 and '71. Looking only at the years 1958-70 gives 351 extra DPs in 13 years.
Michael also looked at this using a Dummy Variable for the Pirates data. When I looked at the 1956-1972 data I got:
```
Proj. DP = -19.939 + .116*IP + .103*BB - .0568*SO + 27.947*DV
r^2=.412, SE=14.76, F(4,161)=28.20
```
greatly "improving" the model and implying that the Pirates were turning about 28 extra double plays per year.
I also tried making the dummy variable a little smarter by replacing the "1" by the fraction of Pirates games played by Mazeroski that year (basically measuring the "Mazeroski-Ness" of each team) which gave the model
```
Proj. DP = -22.687 + .117*IP + .104*BB - .0562*SO + 33.813*MAZ
r^2=.416, SE=14.71, F(4,161)=28.61
```
implying that Mazeroski was worth 33.81 DP per year (SE=4.58), with a 95% confidence interval of [24.88,42.75].
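A confidence interval of this kind is the coefficient plus or minus a multiple of its standard error. The sketch below uses the normal-approximation multiplier 1.96, which is my assumption; the text does not say which multiplier or how many digits of the SE were used, so the endpoints come out close to the quoted interval rather than identical to it.

```python
def normal_ci(estimate, std_err, z=1.96):
    """Two-sided interval: estimate +/- z * std_err (z=1.96 for about 95%)."""
    half_width = z * std_err
    return estimate - half_width, estimate + half_width


# Coefficient and standard error of the MAZ variable, as quoted above.
low, high = normal_ci(33.813, 4.58)
print(f"approximate 95% interval: [{low:.2f}, {high:.2f}]")
```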
Simply looking at the correlations between the variables, I found the following to be significant at the 1% level:
positive correlations: IP/SO, H/HR, H/E, H/DP, HR/BB, BB/DP, DP/DV
negative correlations: IP/HR, H/SO, HR/SO, HR/DV, SO/DP
This indicates that the model for projecting DPs should have H, BB, SO and the Pirates dummy variable in it. The IP, HR and E variables probably don't belong. The HR variable in the non-DV model probably appeared because the Pirates tended to give up fewer HR (Forbes Field), so the effect of the missing DV showed up as an HR coefficient. The negative correlations for IP/HR and HR/SO are probably due to the fact that expansion (more IP) took place in the early '60s (bigger strike zone, fewer HR).

Mixing this with the "effects of a good fielding 2b", without the Pirates dummy variable, I get, per 160 games:
```
           IP     H     HR    BB    SO  projDP actDP
NL       1435.6 1374.3 128.9 490.0 889.4 149.2 149.8
Pit      1428.5 1390.2 110.0 471.9 842.0 154.3 175.7
PitnoMaz 1428.5 1417.9 113.2 481.3 858.5 155.7
```
This indicates that playing for the Pirates was a DP-inducing environment, but that Mazeroski was so good at turning the DP that he cost himself about 1.5 DP opportunities per year relative to other 2b (and cost the pitchers 15-20 SO).

Regressions looking at ground ball tendencies

On SABR-L, Mark Wernick suggested trying to account for the fact that the Pirates seemed to have a staff that tended to induce ground balls. Taking a page from Clay Davenport, I'll look at the ratio of infield assists to outfield putouts (hereafter referred to as IO). I checked this against the groundball-to-flyball ratio in the AL 1993-1995 and found a correlation of +.87, so this seems like a good first approximation. I then used this to build a model for predicting the number of DPs turned.
`Projected DP/G = .8211 + .29567*IO  - .048679*K/G`
Using this, I find that the Pirates project to an extra .040 DP/G (6.4 per year). The rest was Mazeroski. Using these calculations to adjust the assists, I get
```
        RF  adj.RF  FA   DP/G Adj.play-ratio E-ratio
McPhee +.48  +.45  +.019 +.123    1.077       1.34
Lajoie +.54  +.51  +.011 +.145    1.098       1.29
Maz    +.54  +.40  +.006 +.181    1.080       1.34
```
The "adjusted range factor" takes into account the ground ball tendencies of the Pirates as found above. I have not yet tried to account for putouts.
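The DP projection from the IO model earlier in this section can be evaluated directly. The sketch below is only an illustration: which inputs were actually fed into the model is not stated, so I assume the IO values of 1.77 (Pirates) and 1.68 (eight-team average) from the team table in this article, and strikeouts per game taken from the per-160-game table (842.0 and 889.4). Under those assumptions the Pirates' edge comes out near the quoted .040 DP/G, though not necessarily identical to it.

```python
def projected_dp_per_game(io, k_per_game):
    """Projected DP/G = .8211 + .29567*IO - .048679*K/G (model quoted above)."""
    return 0.8211 + 0.29567 * io - 0.048679 * k_per_game


# Assumed inputs (see the lead-in): IO ratios and strikeouts per game.
pirates = projected_dp_per_game(io=1.77, k_per_game=842.0 / 160)
league = projected_dp_per_game(io=1.68, k_per_game=889.4 / 160)

extra_per_game = pirates - league
extra_per_160 = extra_per_game * 160

print(f"Pirates projected DP/G: {pirates:.3f}")
print(f"league projected DP/G:  {league:.3f}")
print(f"extra DP/G from context: {extra_per_game:.3f}")
print(f"extra DP per 160 games:  {extra_per_160:.1f}")
```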

For another look at IO, I checked the Bill James Electronic Encyclopedia and the Franklin Electronic Baseball Encyclopedia for the totals for each team for the years 1956-1972 and found the following:
```
                            Assists                                non2b
Team        G    DP     P   1b   2b   3b   SS  Inf-A   OF-PO 2b-DP  -DP  IO  LH IP (#)
Braves    2708  2478  4102 1838 7840 5581 8604 27,965 17,053  1782  696 1.64 7141.3(33)
Cubs      2714  2565  4221 1898 8482 5701 8846 29,148 16,128  1789  776 1.81 5762.7(49)
Reds      2700  2469  3798 1826 7381 5162 8144 26,311 16,934  1679  790 1.55 6857.7(37)
Dodgers   2706  2460  3988 1823 7479 5859 8714 27,863 15,746  1698  762 1.77 9550.7(25)
Phillies  2702  2525  3691 1849 7398 5364 8461 26,763 17,039  1757  768 1.57 6915  (27)
Pirates   2707  2973  3951 1955 8838 5429 9024 29,197 16,504  2209  764 1.77 6158.7(31)
Giants    2706  2353  4160 1775 7770 5350 8434 27,489 16,623  1595  758 1.65 7221.7(33)
Cardinals 2705  2615  3876 1841 7938 5358 8582 27,595 16,597  1823  792 1.66 7198.3(55)

Avg.      2706  2555  3973 1851 7891 5476 8601 27,791 16,578  1792  763 1.68 7100.7(36)
Pir.diff    +1  +418   -22 +104 +947  -47 +423 +1,406    -74  +417   +1 +.09 -942  (-5)
```
Avg. is the average of the 8 teams, rounded to the nearest integer.
Pir.diff is the Avg. subtracted from the Pirate total.
LH IP is the number of innings thrown by left-handed pitchers.
(#) is the number of left-handed pitchers used by the team.
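The IO column can be recomputed from the Inf-A and OF-PO columns of the table. This short sketch does that for the eight teams and for the Pirates' difference from the average; the dictionary layout and names are mine, and the average here is taken over the unrounded ratios.

```python
# (infield assists, outfield putouts) for 1956-72, from the table above.
TEAMS = {
    "Braves": (27965, 17053),
    "Cubs": (29148, 16128),
    "Reds": (26311, 16934),
    "Dodgers": (27863, 15746),
    "Phillies": (26763, 17039),
    "Pirates": (29197, 16504),
    "Giants": (27489, 16623),
    "Cardinals": (27595, 16597),
}

# IO = infield assists divided by outfield putouts.
io = {team: inf_a / of_po for team, (inf_a, of_po) in TEAMS.items()}

avg_io = sum(io.values()) / len(io)
pirates_diff = io["Pirates"] - avg_io

for team, ratio in io.items():
    print(f"{team:<10}{ratio:.2f}")
print(f"{'Avg.':<10}{avg_io:.2f}")
print(f"{'Pir.diff':<10}{pirates_diff:+.2f}")
```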

Again, Mazeroski was helped by the Pirates being a groundball staff, but he was clearly a great fielder.

Here are more of the pitching numbers to help set the context:
```
Team         G IP      LHIP       H   HR    SO   BB    E
Braves    2708 24272   7141.3 23369 2346 13962 8001 2260
Cubs      2714 24250   5762.7 23931 2461 14390 8403 2481
Reds      2700 24275   6857.7 23450 2301 14859 8419 2151
Dodgers   2706 24389.3 9550.7 22321 2128 16771 8005 2414
Phillies  2702 24219.3 6915   23959 2239 15224 8167 2323
Pirates   2707 24284   6158.7 23634 1887 14314 8022 2542
Giants    2706 24388.7 7221.7 22931 2260 15202 8102 2651
Cardinals 2705 24325.3 7198.3 23225 2126 15175 8625 2448

Avg.      2706 24300.5 7100.7 23353 2219 14987 8218 2409
Pir. diff.  +1   -16.5 -942    +281 -332  -673 -196 +133
```
For more context, I looked at the team statistics for National League teams 1956-1972 and found the following regression model:
```
DP2b = -.0740 + .0268*LH + .0410*H + .0727*BBHB - .193*E + .0767*A + .0170*Maz + .0161*pir
R^2=.61 SE=.0079
```
where
DP2b is DP by 2b divided by team IP,
LH is the fraction of IP by left-handed pitchers,
BBHB is walks + hit batters,
A is team assists,
Maz is the fraction of the team's games in which Mazeroski played 2b,
pir is the fraction of the team's games that were Pirate games in which Mazeroski did not play.
(For example, if Maz played 108 of the Pirates' 162 games, Maz is 2/3 and pir is 1/3; for any Cubs team, Maz and pir are both 0.)
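The two indicator variables can be built from a team name and two game counts. The helper below is a hypothetical illustration of the encoding just described (the function name and arguments are mine); it uses exact fractions so the 108-of-162 example comes out as 2/3 and 1/3.

```python
from fractions import Fraction


def maz_pir(team, maz_games_at_2b, team_games):
    """Return the (Maz, pir) regressors for one team-season.

    Maz: fraction of the team's games with Mazeroski at 2b.
    pir: fraction of the team's games that were Pirates games without him.
    Both are 0 for every team other than the Pirates.
    """
    if team != "Pirates":
        return Fraction(0), Fraction(0)
    maz = Fraction(maz_games_at_2b, team_games)
    return maz, 1 - maz


print(maz_pir("Pirates", 108, 162))  # Maz = 2/3, pir = 1/3
print(maz_pir("Cubs", 0, 162))       # both 0
```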

This has Mazeroski accounting for an extra 318 DP (18.7 per year, or 24.3 per 160 G), while the Pirates "context" (pitchers, ball park, etc.) contributed 89 DP (5.2 per year). The SE on the Mazeroski coefficient is .00270, which is about 50.7 DP over his career, 3.0 per year or 3.9 per 160 G.
If we remove the "Maz" and "pir" variables from the model, the R^2 drops below 50% (without the LH IP it's 44%).
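Since DP2b is measured per team inning, the Maz coefficient converts to double plays per 160 games by multiplying by the innings in 160 games. The sketch below assumes the Pirates' 1956-72 rate of 24284 IP in 2707 games from the pitching table; that choice of innings per game is mine, so the results match the quoted 24.3 and 3.9 only up to rounding.

```python
MAZ_COEF = 0.0170        # extra 2b double plays per team inning (from the model)
MAZ_SE = 0.00270         # standard error of that coefficient
IP_PER_GAME = 24284 / 2707   # assumption: Pirates innings per game, 1956-72


def per_160_games(per_inning):
    """Convert a per-inning rate to a per-160-game total."""
    return per_inning * IP_PER_GAME * 160


maz_per_160 = per_160_games(MAZ_COEF)
se_per_160 = per_160_games(MAZ_SE)
maz_per_year = 318 / 17      # career total quoted above, over 17 seasons

print(f"Mazeroski effect per 160 G: {maz_per_160:.1f} DP")
print(f"standard error per 160 G:   {se_per_160:.1f} DP")
print(f"Mazeroski effect per year:  {maz_per_year:.1f} DP")
```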

Leading DP makers of Mazeroski's time

Rel.DP Year Team Regular 2b
+41 1966 Pirates Mazeroski
+34 1972 Pirates Cash
+34 1961 Pirates Mazeroski
+32 1965 Pirates Mazeroski
+30 1963 Pirates Mazeroski
+28 1962 Pirates Mazeroski
+27 1958 Dodgers Neal
+26 1971 Braves Millan
+24 1960 Pirates Mazeroski
+23 1971 Giants Fuentes
+22 1972 Astros Helms
+21 1959 Pirates Mazeroski
+20 1961 Cubs Zimmer
+20 1971 Reds Helms
+19 1957 Cardinals Blasingame
+17 1961 Phillies Taylor
+17 1958 Pirates Mazeroski
+16 1969 Pirates Mazeroski/Alley/Martinez
+16 1969 Expos Sutherland
+16 1956 Cardinals Blasingame
+15 1958 Cubs Taylor

Consider the "best" model that does not include the dummy variables:

```
DP2b = -.0916 + .0121*LH + .0405*H + .0623*BBHB - .174*E + .0955*A
R^2=.45 SE=.0092
```
Using this, I found the predicted number of double plays for each National League team for the years 1956-1972. I then compared this number to the actual number of DPs turned by the team's second basemen. There were 21 teams whose second basemen turned at least 15 "extra" double plays. The table above lists the teams and their regular second basemen. The second basemen who appeared on the list more than once were:
```
Bill Mazeroski 8 times
Tommy Helms    2 times
Don Blasingame 2 times
Tony Taylor    2 times
```
I did not include the 1969 Pirates in the Mazeroski count because he played in fewer than half of the Pirates' games. The annual "relative DPs" for the Pirates, beginning in 1956, were
```
Year: 56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
DP:   +6 +12 +17 +21 +24 +34 +28 +30  +6 +32 +41  +0  -2 +16 +11 +14 +34
```
This suggests that Mazeroski took a few years to learn how to turn the pivot and that his skill diminished considerably after 1966.
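The yearly series can be summarized with a few lines of code. This sketch counts the Pirates seasons at or above the +15 threshold used for the leaders list and compares the average through 1966 with the average afterward; the 1966 split point comes from the sentence above, and the variable names are mine.

```python
# Pirates "relative DPs" by season, 1956-72, from the table above.
YEARS = list(range(1956, 1973))
REL_DP = [6, 12, 17, 21, 24, 34, 28, 30, 6, 32, 41, 0, -2, 16, 11, 14, 34]
by_year = dict(zip(YEARS, REL_DP))

# Seasons that clear the +15 threshold used for the leaders list.
big_years = [year for year, dp in by_year.items() if dp >= 15]

through_1966 = [dp for year, dp in by_year.items() if year <= 1966]
after_1966 = [dp for year, dp in by_year.items() if year > 1966]

mean_through = sum(through_1966) / len(through_1966)
mean_after = sum(after_1966) / len(after_1966)

print(f"total relative DP, 1956-72: {sum(REL_DP):+d}")
print(f"seasons at +15 or better:   {len(big_years)}")
print(f"mean through 1966: {mean_through:+.1f}")
print(f"mean after 1966:   {mean_after:+.1f}")
```

The ten qualifying seasons include 1969 and 1972, when Mazeroski was not the regular second baseman, which is why his own count on the leaders list is eight.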

Send comments to me at john.rickert@rose-hulman.edu