Regression Analyses of Yields of Hybrid Corn and Inbred Parent Lines.-- 1. Derivation of a theoretical regression function. For n loci let the basic effect of a gene substitution be d, dominance effect kd, proportions of loci AA in P_{1} and P_{2} be u and w, the multiple recessive phenotype T, and gene action additive.

P_{1} = 2und + T, P_{2} = 2wnd + T,

F_{1} = 2uwnd + [u(1-w) + w(1-u)] (nd + nkd) + T,

F_{1} = (1 + k + kT/nd)(P_{1} + P_{2})/2 - (k/2nd)P_{1}P_{2} - (k/2nd)T^{2} - kT,

F_{1} = b_{1}P + b_{2}P_{1}P_{2} + C_{1}, where P = (P_{1} + P_{2})/2 and b_{2} = -k/2nd

With each generation of selfing, one half of the dominance effects disappear. Divide each term in k by 2 for each generation of selfing to obtain the general function for F_{n}. This function is a surface which is curved if there is any dominance (k not zero). (Regression of F_{1} on the mean of the parents neglects the second term of the function: a plane is fitted where a curved surface provides a closer fit if there is dominance.)
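The algebra above can be checked numerically. The following is a minimal sketch, not part of the original note: it builds the cross mean once by counting loci and once from the parent yields, and compares the two. All parameter values are hypothetical, and the `selfings` argument reflects one reading of the halving rule (k divided by 2 once per generation of selfing).

```python
def parent_value(frac_AA, n, d, T):
    """Yield of a homozygous line that is AA at a fraction frac_AA of n loci."""
    return 2 * frac_AA * n * d + T


def cross_mean_from_loci(u, w, n, d, k, T, selfings=0):
    """Cross mean built by counting loci; selfings=0 is F1, 1 is F2, and so on."""
    k_eff = k / 2 ** selfings              # dominance halves with each selfing
    both_AA = u * w                        # AA in both parents
    differ = u * (1 - w) + w * (1 - u)     # AA in one parent, aa in the other
    return 2 * both_AA * n * d + differ * (n * d + n * k_eff * d) + T


def cross_mean_from_parents(P1, P2, n, d, k, T, selfings=0):
    """Same quantity written in terms of the parent yields P1 and P2."""
    k_eff = k / 2 ** selfings
    two_nd = 2 * n * d
    return ((1 + k_eff + k_eff * T / (n * d)) * (P1 + P2) / 2
            - (k_eff / two_nd) * P1 * P2
            - (k_eff / two_nd) * T ** 2
            - k_eff * T)


# Hypothetical parameter values, chosen only to exercise the algebra.
n, d, k, T = 50, 1.0, 2.0, 10.0
u, w = 0.6, 0.3
P1, P2 = parent_value(u, n, d, T), parent_value(w, n, d, T)

for s in (0, 1, 2):
    by_loci = cross_mean_from_loci(u, w, n, d, k, T, selfings=s)
    by_parents = cross_mean_from_parents(P1, P2, n, d, k, T, selfings=s)
    print(s, by_loci, by_parents)
```

With these values the two routes agree for F_{1} and for the selfed generations, which is all the sketch is meant to show.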

Regression of F_{1} on P_{2} with constant P_{1} (any single F_{1} column in Stringfield's table below) is obtained by treating P_{1} as a constant in the main function.

F_{1} = [1/2 + k/2 - k(P_{1} - T)/2nd] P_{2} + C_{2}

The partial regression coefficient bp is contained in the brackets. Its value manifestly depends upon the value of constant P_{1}. P_{2} is the independent variable. Substitution of AA for aa at one locus in P_{2} provides an increment 2d. The corresponding increment of F_{1} is [1/2 + k/2 - k(P_{1} - T)/2nd] 2d. The first term of this expression, (1/2)2d = d, accounts for the basic effect of an additional A allele in F_{1} coming from P_{2}. The second term, (k/2)2d = kd, provides a dominance effect. If, however, P_{1} is AA at that locus no dominance effect will be added to F_{1} by the substitution, and the one already there will disappear. P_{1} is AA at u loci, and (P_{1} - T)/2nd = u. The third term adds [-k(P_{1} - T)/2nd] 2d = -2ukd.
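The three terms of the increment can be traced numerically. This is a small sketch under hypothetical parameter values (n, d, k, T and u are not from the data): it raises P_{2} by 2d on the main function and compares the gain in F_{1} with bp times 2d and with d + kd - 2ukd.

```python
def f1_surface(P1, P2, n, d, k, T):
    """Main function: expected F1 from the two parent yields."""
    two_nd = 2 * n * d
    return ((1 + k + k * T / (n * d)) * (P1 + P2) / 2
            - (k / two_nd) * P1 * P2
            - (k / two_nd) * T ** 2
            - k * T)


def partial_regression(P1, n, d, k, T):
    """Bracketed coefficient: slope of F1 on P2 with P1 held constant."""
    return 0.5 + k / 2 - k * (P1 - T) / (2 * n * d)


# Hypothetical values, for illustration only.
n, d, k, T = 50, 1.0, 2.0, 10.0
u = 0.6                          # proportion of loci AA in P1
P1 = 2 * u * n * d + T
P2 = 40.0

bp = partial_regression(P1, n, d, k, T)
# One aa -> AA substitution in P2 raises P2 by 2d.
gain_on_surface = f1_surface(P1, P2 + 2 * d, n, d, k, T) - f1_surface(P1, P2, n, d, k, T)
# Basic effect, dominance effect, and the correction for loci already AA in P1.
gain_by_terms = d + k * d - 2 * u * k * d

print(bp, gain_on_surface, bp * 2 * d, gain_by_terms)
```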

Under the assumptions, our main function gives exactly the mean F_{1} for any given pair of parent values. Variance about such means, that is, deviation from the regression surface, is due solely to variation in degree of heterozygosity. This portion of the variance is beyond the reach of parent criteria. The present parent criteria, P and P_{1}P_{2}, together provide the maximum estimation of F_{1} obtainable from parent criteria. It is clear that the mean degree of heterozygosity is greater in crosses of good × poor lines than in crosses of medium × medium lines, and that the product of parents P_{1}P_{2} is included to measure that variation. It must also be clear that the various genetic interpretations inserted along the way have not been employed in the mathematical derivations. For the most part they were not recognized until after completion of the algebraic formulations.

Finally regression of bp on P_{1} is given by the formula for bp. The regression coefficient is (-k/2nd) which is b_{2} of the main function. It will be labeled b_{2} here also since the two coefficients are identical.

2. Fitting the functions to data. An unpublished table kindly furnished by Mr. G. H. Stringfield is included to illustrate the process of fitting. Values of bp at the bottom are simply regressions of F_{1} of the respective columns on P_{2}. Regression of the values of bp at the bottom of the table on the values of P_{1} at the top is -0.015, and the correlation is -0.98 which is highly significant.

F_{1} and parents, bushels per acre, (G. H. Stringfield, unpublished)

| P_{2} (rows), P_{1} (columns) | 4-8, 13.6 | 90, 28.2 | Hy, 29.8 | O2, 46.1 | WF9, 51.4 | 51, 55.3 |
| --- | --- | --- | --- | --- | --- | --- |
| 4-8, 13.6 | | 76.7 | 96.3 | 91.0 | 100.7 | 106.1 |
| 90, 28.2 | 76.7 | | 81.4 | 94.2 | 97.9 | 86.4 |
| Hy, 29.8 | 96.3 | 81.4 | | 108.9 | 109.8 | 94.7 |
| O2, 46.1 | 91.0 | 94.2 | 108.9 | | 104.0 | 100.8 |
| WF9, 51.4 | 100.7 | 97.9 | 109.8 | 104.0 | | 103.4 |
| 51, 55.3 | 106.1 | 86.4 | 94.7 | 100.8 | 103.4 | |
| bp | .6947 | .4060 | .3433 | .2314 | .0516 | .0512 |
| Mean P_{2} | 42.0 | 39.2 | 38.8 | 35.6 | 34.6 | 33.8 |
| Mean F_{1} | 94.2 | 87.2 | 98.2 | 99.8 | 103.2 | 98.2 |

From this regression the estimated value of P_{1} for bp = 0 is 57.1 bushels per acre which is just beyond the range of the data. The same process has been applied to the other sets of data listed in the second table. Where significant values of b_{2} have been obtained the main multiple regression function has also been fitted. In each case the second estimate of b_{2} agreed closely with the first one, which provides a computation check since the two are algebraically identical also in the computation formulas.
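The regression of bp on P_{1} can be recomputed from the values printed in Stringfield's table. The sketch below uses an ordinary least-squares fit written out by hand; the function name `least_squares` is illustrative. The slope and correlation should come out near the -0.015 and -0.98 quoted in the text, while the zero crossing depends on how the slope is rounded and so need not match 57.1 to the last digit.

```python
def least_squares(x, y):
    """Simple linear regression of y on x: returns slope, intercept, correlation."""
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Parent yields (top of the table) and partial regressions (bottom row).
p1_yield = [13.6, 28.2, 29.8, 46.1, 51.4, 55.3]
bp_values = [0.6947, 0.4060, 0.3433, 0.2314, 0.0516, 0.0512]

slope, intercept, r = least_squares(p1_yield, bp_values)
p1_at_zero_bp = -intercept / slope      # P1 at which the fitted bp is zero

print(round(slope, 4), round(r, 3), round(p1_at_zero_bp, 1))
```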

The last five items in the table were then computed by quadratic solution of the multiple regression function on the assumption that where P_{1} and P_{2} are both completely aa or completely AA, P_{1} = P_{2} = F_{1} = F_{2}. Roots thus obtained are estimates of the bottom recessive and top dominant.
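Why the two roots are the bottom recessive and the top dominant can be seen by solving the quadratic with theoretical coefficients. The sketch below is an illustration under hypothetical values of n, d, k and T; it takes b_{2} = -k/2nd, the value the text assigns to it, so the surface is written b_{1}P + b_{2}P_{1}P_{2} + C_{1}. Setting P_{1} = P_{2} = F_{1} = x gives b_{2}x^{2} + (b_{1} - 1)x + C_{1} = 0.

```python
import math


def main_function_coefficients(n, d, k, T):
    """Theoretical b1, b2, C1 of F1 = b1*P + b2*P1*P2 + C1, with P = (P1 + P2)/2."""
    b1 = 1 + k + k * T / (n * d)
    b2 = -k / (2 * n * d)                 # sign convention assumed from the text
    c1 = -(k / (2 * n * d)) * T ** 2 - k * T
    return b1, b2, c1


def homozygous_fixed_points(b1, b2, c1):
    """Roots of x = b1*x + b2*x**2 + c1, returned as (lower, upper)."""
    a, b, c = b2, b1 - 1, c1
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots for these coefficients")
    root = math.sqrt(disc)
    r1 = (-b + root) / (2 * a)
    r2 = (-b - root) / (2 * a)
    return min(r1, r2), max(r1, r2)


# Hypothetical values, for illustration only.
n, d, k, T = 50, 1.0, 2.0, 10.0
b1, b2, c1 = main_function_coefficients(n, d, k, T)
bottom, top = homozygous_fixed_points(b1, b2, c1)

print(bottom, top)        # compare with T and T + 2*n*d
```

With fitted rather than theoretical coefficients the same solver would return the estimates, provided the discriminant is positive.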

3. Interpretation. First I must note that I have never had any notion that yield of corn could depend upon a multiple set of genes with uniform d and kd from locus to locus. Variation of d and of kd must contribute to the variance of F_{1} and thus provide additional variance from the present regression surface. Beyond that I doubt that variation of d and kd could confuse present analyses.

Evidence here for overdominance (no dominance, k = 0; complete dominance, k = ±1; overdominance, k numerically greater than one) seems to lie in the estimated values of P_{1} for zero partial regression. If dominance is complete, zero partial regression will obtain only when P_{1} is the top dominant. This statement agrees with the long-held genetic philosophy of prepotence. That it is mathematically true in present theory may be seen by setting bp = 0 and k = 1 in the partial regression coefficient formula and solving to find (P_{1} - T)/2nd = u = 1. Note also that with complete dominance the top dominant and top heterozygote are equal. Since, for the present data, the values of completely prepotent P_{1} (bp = 0) are far below mean F_{1}, the only direct interpretation is overdominance; see the values of k estimated from the data. It would seem to make no difference whether the genes of P_{1} and P_{2} are completely linked or completely independent, so far as immediate contributions to F_{1} are concerned.
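Setting the bracketed coefficient to zero for a general k gives u = (1 + k)/2k, which reduces to u = 1 at k = 1. The sketch below works this out for a few values of k and also compares the top dominant with the top heterozygote; the function names and the values of n, d and T are illustrative assumptions.

```python
def u_for_zero_partial_regression(k):
    """Proportion of AA loci in P1 at which 1/2 + k/2 - k*u equals zero."""
    if k == 0:
        raise ValueError("with no dominance bp is 1/2 for every P1")
    return (1 + k) / (2 * k)


def top_dominant(n, d, T):
    """Yield with every locus AA."""
    return T + 2 * n * d


def top_heterozygote(n, d, k, T):
    """Yield with every locus Aa."""
    return T + n * d * (1 + k)


# Hypothetical scale parameters, for illustration only.
n, d, T = 50, 1.0, 10.0

for k in (1.0, 1.5, 2.0):
    u0 = u_for_zero_partial_regression(k)
    prepotent_p1 = T + 2 * n * d * u0     # P1 at which bp = 0
    print(k, u0, prepotent_p1, top_dominant(n, d, T), top_heterozygote(n, d, k, T))
```

For k below one the formula gives u above 1, which no line can reach, so bp stays positive over the whole range of P_{1}.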

Fisher, (Genetical Theory of Natural Selection) gives the condition for equilibrium where the heterozygote has selective advantage over both homozygotes for one pair. His mathematical condition is identical with the present one for bp = 0 for any value of k (Selective advantage) except that his condition is in terms of the proportions of a and A alleles in the population at equilibrium. The present condition is in terms of u, the proportion of loci AA in P_{1}. If many loci are all at Fisher equilibrium in a cross breeding variety the expected value of u for a homozygote derived without bias is identical with q for the variety. Or if u for a group of lines is identical with q for equilibrium the lines as a set are at equilibrium. Every line, good or poor, will then have the same general combining ability as measured by the average of its crosses with all of the other lines. Equilibrium for each locus is at the instant where a and A alleles combine equally well with the field.
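The statement that a and A alleles combine equally well with the field at equilibrium can be illustrated for a single locus. This sketch is an interpretation, not Fisher's own notation: it assumes the per-locus yield values 0, d + kd and 2d for aa, Aa and AA are the quantities being balanced, and it finds the frequency q of A at which an A allele and an a allele give the same average value when paired at random with the field.

```python
def allele_mean_values(q, d, k):
    """Average locus value of genotypes formed by A, and by a, with a random mate allele."""
    aa, Aa, AA = 0.0, d + k * d, 2 * d     # values above the recessive (assumed scale)
    mean_with_A = q * AA + (1 - q) * Aa
    mean_with_a = q * Aa + (1 - q) * aa
    return mean_with_A, mean_with_a


def balance_frequency(k):
    """Frequency of A at which the two allele means are equal (requires k > 1)."""
    if k <= 1:
        raise ValueError("an interior balance point needs overdominance, k > 1")
    return (1 + k) / (2 * k)


def partial_regression_at(u, k):
    """Bracketed coefficient written in terms of u = (P1 - T)/2nd."""
    return 0.5 + k / 2 - k * u


d, k = 1.0, 2.0                            # hypothetical values
q_star = balance_frequency(k)
with_A, with_a = allele_mean_values(q_star, d, k)

print(q_star, with_A, with_a, partial_regression_at(q_star, k))
```

The same number that balances the two alleles also sets the partial regression to zero when it is read as u, which is the identity the paragraph describes.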

REGRESSION ANALYSES OF YIELDS OF HYBRID CORN AND INBRED PARENT LINES

| Data | Mean partial regression | b_{2} | Estimated P for bp = 0 | Mean F_{1} or F_{2} | Bottom recessive | Top dominant | Maximum F_{1} of homozygous parents | Maximum open pollinating variety | k |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Stringfield,^{1} F_{1} | 0.30 | -0.015** | 57.1 | 96.8 | -44.2 | 88.5 | 146.3 | 102.0 | 1.87 |
| Stringfield,^{1} F_{2} | 0.34 | -0.009** | 76.7 | 69.9 | -48.9 | 82.7 | 159.1 | | 2.16 |
| Kinman & Sprague,^{2} F_{1} | 0.42 | -0.015* | 54.2 | 79.9 | -29.5 | 76.2 | 120.0 | 91.2 | 1.64 |
| Kinman & Sprague,^{2} F_{2} | 0.42 | +0.005 | - | 50.8 | - | - | - | - | |
| Jorgensen & Brewbaker^{3} | 0.04 | -0.002 | 210.1 | 372.5 | | | | | |
| Nilsson-Leissner,^{4} Dent I | 0.28 | -0.008** | 154.9 | 314.5 | -44.6 | 224.1 | 369.9 | 324.5 | 2.08 |
| Nilsson-Leissner,^{4} Dent II | 0.22 | -0.004 | 130.5 | 291.4 | | | | | |
| Nilsson-Leissner,^{4} Flint I | 0.36 | -0.0002 | 2430.2 | | | | | | |
| Nilsson-Leissner,^{4} Flint II | 0.62 | -0.0008 | 888.3 | | | | | | |
| Jenkins,^{5} white, '26 | 0.65 | +0.018 | | | | | | | |
| Jenkins,^{5} early yellow, '26 | 0.38 | -0.052** | | | | | | | |
| Jenkins,^{5} later yellow, '26 | 0.10 | +0.037 | | | | | | | |
| Jenkins,^{5} white, '27 | -0.09 | +0.153 | | | | | | | |
| Jenkins,^{5} yellow, '27 | 0.07 | -0.002 | | | | | | | |

* Significant

** Highly significant

1. Unpublished, see text.

2. Jour. Am. Soc. Agron., May, 1945

3. Jour. Am. Soc. Agron., Sept., 1927

4. Jour. Am. Soc. Agron., May, 1927

5. Jour. Agr. Res. Nov. 1. 1929

Jenkins (1929) almost attained that equilibrium condition (last 3 entries in present table). For those data the partial regressions are nearly as frequently negative as positive and almost uniformly small numerically. After much selection, Stringfield, and Kinman and Sprague, studied groups of lines which show recession from the equilibrium which well selected varieties had closely approached 20 years or more ago. Recession may be due to mixing lines from different sources in one group and probably to selection for specific combining ability (more than average heterozygosity). The ceiling for hybrids is higher if one line has fewer AA loci, but this point can hardly be fully demonstrated without a 3-dimensional figure.

From the 3-dimensional figure for overdominance of the degree indicated (k = 2) it is clear that the F_{1} trend for increasing P_{1} and P_{2} rises steeply over most of the range of present corn breeding experience, which just laps over the crest. Beyond the crest the trend is downward. Beyond it we have hardly gone, partly because of linkage as envisioned by Jones and partly because present practice requires slight recession from the crest to another equilibrium between selection for specific combining ability and selection for general combining ability and excellence of the lines themselves.

Present interpretations must remain in some degree tentative until lines well beyond the crest to provide significant negative partial regressions have been obtained. Before such evidence any alternative interpretation of complex, non-additive gene action would stand entirely refuted, I think. Excess of any heterozygote over the top dominant would seem to be overdominance by definition. The possibility of explaining present results by non-additive action without overdominance is very small insofar as I can tell but space does not permit more to be said here. Neither does space permit listing of every point where overdominance theory agrees with corn breeding experience more closely than does dominance theory. I have found no discrepancies and so must say that the evidence for overdominance must seem overwhelming but not crucial to any unprejudiced mind. It will be appreciated if any discrepancies are pointed out.

The same analysis has been employed with data on other characters of Jenkins (loc. cit.) with no evidence of overdominance and in most cases slight evidence of any dominance at all. Height of plant is an exception, but it depends largely on vigor. No data on ear dimensions have been available.

Fred H. Hull