Regression Analyses of Yields of Hybrid Corn and Inbred Parent Lines.-- 1. Derivation of a theoretical regression function. For n loci let the basic effect of a gene substitution be d, dominance effect kd, proportions of loci AA in P1 and P2 be u and w, the multiple recessive phenotype T, and gene action additive.

P1 = 2und + T, P2 = 2wnd + T,
F1 = 2uwnd + [u(1-w) + w(1-u)](nd + nkd) + T,
F1 = (1 + k + kT/nd)(P1 + P2)/2 - (k/2nd)P1P2 - (k/2nd)T^2 - kT,
F1 = b1 P - b2 P1P2 + C1, where P = (P1 + P2)/2
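
As a numerical check on the algebra, the following Python sketch evaluates F1 both from the locus proportions u and w and from the expanded function of P1 and P2, and confirms that the two agree. The function names and the values chosen for n, d, k, T, u and w are illustrative assumptions only; they are not taken from any of the data sets.

```python
def f1_from_loci(u, w, n, d, k, T):
    """F1 from locus proportions: loci AA in both parents plus heterozygous loci."""
    both_AA = 2 * u * w * n * d
    hetero = (u * (1 - w) + w * (1 - u)) * (n * d + n * k * d)
    return both_AA + hetero + T


def f1_from_parents(P1, P2, n, d, k, T):
    """F1 from the expanded main function, F1 = b1*P - b2*P1*P2 + C1."""
    nd = n * d
    b1 = 1 + k + k * T / nd
    b2 = k / (2 * nd)
    C1 = -b2 * T ** 2 - k * T
    P = (P1 + P2) / 2
    return b1 * P - b2 * P1 * P2 + C1


# Hypothetical parameter values, chosen only for illustration.
n, d, k, T = 100, 0.5, 2.0, 10.0
u, w = 0.3, 0.6
P1 = 2 * u * n * d + T
P2 = 2 * w * n * d + T

direct = f1_from_loci(u, w, n, d, k, T)
expanded = f1_from_parents(P1, P2, n, d, k, T)
print(direct, expanded)
```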

With each generation of selfing, half of the dominance effects disappear. To obtain the general function for Fn, divide each term in k by 2 for each generation of selfing. This function is a surface which is curved if there is any dominance (k not zero). (Regression of F1 on the mean of the parents neglects the second term of the function: a plane is fitted where a curved surface provides a closer fit if there is dominance.)

Regression of F1 on P2 with constant P1 (any single F1 column in Stringfield's table below) is obtained by treating P1 as a constant in the main function.

F1 = [1/2 + k/2 - k(P1 - T)/2nd] P2 + C2

The partial regression coefficient bp is contained in the brackets. Its value manifestly depends upon the value of constant P1. P2 is the independent variable. Substitution of AA for aa at one locus in P2 provides an increment 2d. The corresponding increment of F1 is [1/2 + k/2 - k(P1 - T)/2nd] 2d. The first term of this expression, (1/2)2d = d, accounts for the basic effect of an additional A allele in F1 coming from P2. The second term, (k/2)2d = kd, provides a dominance effect. If, however, P1 is AA at that locus no dominance effect will be added to F1 by the substitution, and the one already there will disappear. P1 is AA at u loci, and (P1 - T)/2nd = u. The third term adds [-k(P1 - T)/2nd] 2d = -2ukd.
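
The decomposition of the increment can be checked numerically. The sketch below, under assumed illustrative values of n, d, k, T, u and w, compares three quantities: the change in F1 when one locus of P2 goes from aa to AA, the bracketed coefficient multiplied by 2d, and the sum d + kd - 2ukd. The function names are hypothetical.

```python
def f1_from_loci(u, w, n, d, k, T):
    """F1 from locus proportions, as in the theoretical function."""
    return (2 * u * w * n * d
            + (u * (1 - w) + w * (1 - u)) * (n * d + n * k * d)
            + T)


def partial_coefficient(P1, n, d, k, T):
    """Bracketed coefficient bp of P2 when P1 is held constant."""
    return 0.5 + k / 2 - k * (P1 - T) / (2 * n * d)


# Hypothetical parameter values, chosen only for illustration.
n, d, k, T = 100, 0.5, 2.0, 10.0
u, w = 0.3, 0.6
P1 = 2 * u * n * d + T

bp = partial_coefficient(P1, n, d, k, T)

# Substituting AA for aa at one locus of P2 raises w by 1/n and P2 by 2d.
gain_direct = f1_from_loci(u, w + 1 / n, n, d, k, T) - f1_from_loci(u, w, n, d, k, T)
gain_from_bp = bp * 2 * d
gain_from_terms = d + k * d - 2 * u * k * d

print(bp, gain_direct, gain_from_bp, gain_from_terms)
```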

Under the assumptions, our main function calculates exactly mean F1 for any type pair of parent values. Variance from such means, or deviations from the regression surface, are due solely to variations in degree of heterozygosity. This portion of the variance is beyond parent criteria. Present parent criteria P and P1P2 together provide maximum estimation of F1 by parent criteria. It is clear that the mean degree of heterozygosity is greater in crosses of good × poor lines than in crosses of medium × medium lines and that the product of parents P1P2 is included to measure that variation. It must also be clear that the various genetic interpretations inserted along the way have not been employed in the mathematical derivations. For the most part they were not recognized until after completion of the algebraic formulations.

Finally regression of bp on P1 is given by the formula for bp. The regression coefficient is (-k/2nd) which is b2 of the main function. It will be labeled b2 here also since the two coefficients are identical.

2. Fitting the functions to data. An unpublished table kindly furnished by Mr. G. H. Stringfield is included to illustrate the process of fitting. Values of bp at the bottom are simply regressions of F1 of the respective columns on P2. Regression of the values of bp at the bottom of the table on the values of P1 at the top is -0.015, and the correlation is -0.98 which is highly significant.

F1 and parents, bushels per acre, (G. H. Stringfield, unpublished)

P1 line            4-8      90      Hy      O2     WF9      51
P1 yield          13.6    28.2    29.8    46.1    51.4    55.3
P2 line  Yield
4-8       13.6            76.7    96.3    91.0   100.7   106.1
90        28.2    76.7            81.4    94.2    97.9    86.4
Hy        29.8    96.3    81.4           108.9   109.8    94.7
O2        46.1    91.0    94.2   108.9           104.0   100.8
WF9       51.4   100.7    97.9   109.8   104.0           103.4
51        55.3   106.1    86.4    94.7   100.8   103.4
bp               .6947   .4060   .3433   .2314   .0516   .0512
Mean P2           42.0    39.2    38.8    35.6    34.6    33.8
Mean F1           94.2    87.2    98.2    99.8   103.2    98.2

From this regression the estimated value of P1 for bp = 0 is 57.1 bushels per acre which is just beyond the range of the data. The same process has been applied to the other sets of data listed in the second table. Where significant values of b2 have been obtained the main multiple regression function has also been fitted. In each case the second estimate of b2 agreed closely with the first one, which provides a computation check since the two are algebraically identical also in the computation formulas.
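
The regression of bp on P1 can be reproduced from the printed table values with an ordinary least-squares fit. The sketch below uses only the P1 yields and bp values given in Stringfield's table; the helper name `linear_fit` is an assumption made for this example.

```python
from math import sqrt

# P1 yields (bushels per acre) and partial regressions bp from the table.
p1 = [13.6, 28.2, 29.8, 46.1, 51.4, 55.3]
bp = [0.6947, 0.4060, 0.3433, 0.2314, 0.0516, 0.0512]


def linear_fit(x, y):
    """Least-squares slope, intercept and correlation of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = linear_fit(p1, bp)
p1_at_zero_bp = -intercept / slope  # value of P1 where the fitted bp is zero

print(round(slope, 4), round(r, 3), round(p1_at_zero_bp, 1))
```

Run on the printed values, the slope and correlation round to the -0.015 and -0.98 quoted in the text. The zero-crossing lands slightly above the printed 57.1, presumably because the original figure was worked from rounded intermediate values.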

The last five items in the table were then computed by quadratic solution of the multiple regression function on the assumption that where P1 and P2 are both completely aa or completely AA, P1 = P2 = F1 = F2. Roots thus obtained are estimates of the bottom recessive and top dominant.
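
The quadratic solution can be sketched as follows. Setting P1 = P2 = F1 = x in F1 = b1 P - b2 P1P2 + C1 gives b2 x^2 + (1 - b1) x - C1 = 0, and under the theoretical values of b1, b2 and C1 this factors as (x - T)(x - T - 2nd) = 0, so the roots are T and T + 2nd. The code uses the sign convention of the main function as written (b2 = k/2nd), and the parameter values are hypothetical, chosen only to show that the roots recover the assumed bottom recessive and top dominant.

```python
from math import sqrt


def homozygous_limits(b1, b2, C1):
    """Roots of x = b1*x - b2*x**2 + C1, i.e. b2*x**2 + (1 - b1)*x - C1 = 0."""
    a, b, c = b2, 1 - b1, -C1
    disc = sqrt(b * b - 4 * a * c)
    low = (-b - disc) / (2 * a)
    high = (-b + disc) / (2 * a)
    return min(low, high), max(low, high)


# Hypothetical parameter values, chosen only for illustration.
n, d, k, T = 100, 0.5, 2.0, 10.0
nd = n * d
b1 = 1 + k + k * T / nd
b2 = k / (2 * nd)
C1 = -b2 * T ** 2 - k * T

bottom_recessive, top_dominant = homozygous_limits(b1, b2, C1)
print(bottom_recessive, top_dominant)  # compare with T and T + 2*n*d
```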

3. Interpretation. First I must note that I have never had any notion that yield of corn could depend upon a multiple set of genes with uniform d and kd from locus to locus. Variation of d and of kd must contribute to the variance of F1 and thus provide additional variance from the present regression surface. Beyond that I doubt that variation of d and kd could confuse present analyses.

Evidence here for overdominance (no dominance, k = 0; complete dominance, k = ± 1; overdominance, k numerically greater than one) seems to lie in the estimated values of P1 for zero partial regression. If dominance is complete, zero partial regression will obtain only when P1 is the top dominant. This statement agrees with the long held genetic philosophy of prepotence. That it is mathematically true in present theory may be seen by setting bp = 0 and k = 1 in the partial regression coefficient formula and solving to find (P1 - T)/2nd = u = 1. Note also that with complete dominance the top dominant and top heterozygote are equal. Since for present data the values of completely prepotent P1 (bp = 0) are far below mean F1, the only direct interpretation is overdominance (see the values of k estimated from the data). It would seem to make no difference whether the genes of P1 and P2 are completely linked or completely independent, so far as immediate contributions to F1 are concerned.
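
Setting the bracketed coefficient 1/2 + k/2 - ku to zero gives u = (1 + k)/(2k) for any k, of which the case k = 1, u = 1 is the one worked in the text. A minimal sketch, with a hypothetical function name, tabulates this value for a few degrees of dominance.

```python
def u_at_zero_bp(k):
    """Proportion u of AA loci in P1 at which bp = 1/2 + k/2 - k*u is zero."""
    return (1 + k) / (2 * k)


for k in (1.0, 1.5, 2.0):
    # k = 1 (complete dominance) requires u = 1, the top dominant;
    # k > 1 (overdominance) puts the zero point at u < 1.
    print(k, u_at_zero_bp(k))
```

For k between 0 and 1 the formula returns a value above 1, which no line can attain, so the partial regression stays positive over the whole range of P1.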

Fisher (Genetical Theory of Natural Selection) gives the condition for equilibrium where the heterozygote has selective advantage over both homozygotes for one pair. His mathematical condition is identical with the present one for bp = 0 for any value of k (selective advantage) except that his condition is in terms of the proportions of a and A alleles in the population at equilibrium. The present condition is in terms of u, the proportion of loci AA in P1. If many loci are all at Fisher equilibrium in a cross breeding variety the expected value of u for a homozygote derived without bias is identical with q for the variety. Or if u for a group of lines is identical with q for equilibrium the lines as a set are at equilibrium. Every line, good or poor, will then have the same general combining ability as measured by the average of its crosses with all of the other lines. Equilibrium for each locus is at the instant where a and A alleles combine equally well with the field.
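
The correspondence with the heterozygote-advantage equilibrium can be illustrated by iterating the standard one-locus selection recursion. The sketch below rests on an assumption not stated in the text: genotype fitness is taken equal to a baseline plus the phenotypic values 0, d + kd and 2d for aa, Aa and AA. The function names, the baseline and the starting frequency are all hypothetical. Under that assumption the frequency q of A settles at (1 + k)/(2k), the same value that u takes where bp = 0.

```python
def next_frequency(q, w_aa, w_Aa, w_AA):
    """One generation of selection under random mating; q is the frequency of A."""
    p = 1 - q
    mean_w = p * p * w_aa + 2 * p * q * w_Aa + q * q * w_AA
    return (q * q * w_AA + p * q * w_Aa) / mean_w


def equilibrium_frequency(k, d=1.0, base=10.0, q0=0.1, generations=2000):
    """Iterate selection with fitnesses base, base + d + k*d, base + 2*d (assumed)."""
    w_aa, w_Aa, w_AA = base, base + d + k * d, base + 2 * d
    q = q0
    for _ in range(generations):
        q = next_frequency(q, w_aa, w_Aa, w_AA)
    return q


for k in (1.5, 2.0):
    # Second column: simulated equilibrium; third: (1 + k)/(2k).
    print(k, equilibrium_frequency(k), (1 + k) / (2 * k))
```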


Source                                    b2   Estimated Mean F1    Bottom      Top  F1 of Maximum     k
                                                   P for   or F2 recessive dominant           open
                                                  bp = 0
Stringfield (1)            F1   0.30  -0.015**      57.1    96.8     -44.2     88.5  146.3   102.0  1.87
                           F2   0.34  -0.009**      76.7    69.9     -48.9     82.7  159.1          2.16
Kinman & Sprague (2)       F1   0.42  -0.015*       54.2    79.9     -29.5     76.2  120.0    91.2  1.64
                           F2   0.42  +0.005           -    50.8         -        -      -       -
Jorgensen & Brewbaker (3)       0.04  -0.002       210.1   372.5
  Dent I                        0.28  -0.008**     154.9   314.5     -44.6    224.1  369.9   324.5  2.08
  Dent II                       0.22  -0.004       130.5   291.4
  Flint I                       0.36 -0.0002      2430.2
  Flint II                      0.62 -0.0008       888.3
  white '26                     0.65  +0.018
  early yellow, '26             0.38  -0.052**
  later yellow, '26             0.10  +0.037
  white, '27                   -0.09  +0.153
  yellow, '27                   0.07  -0.002

* Significant
** Highly significant
1. Unpublished, see text.
2. Jour. Am. Soc. Agron., May, 1945
3. Jour. Am. Soc. Agron., Sept., 1927
4. Jour. Am. Soc. Agron., May, 1927
5. Jour. Agr. Res., Nov. 1, 1929

Jenkins (1929) almost attained that condition (last 3 entries in present table). For those data the partial regressions are nearly as frequently negative as positive and almost uniformly small numerically. After much selection Stringfield, and Kinman and Sprague studied groups of lines which show recession from the equilibrium which well selected varieties had closely approached 20 years or more ago. Recession may be due to mixing lines from different sources in one group and probably to selection for specific combining ability (more than average heterozygosity). The ceiling for hybrids is higher if one line has fewer AA loci, but this point can hardly be fully demonstrated without a 3-dimensional figure.

From the 3-dimensional figure for overdominance of the degree indicated (k = 2) it is clear that the F1 trend for increasing P1 and P2 rises steeply over most of the range of present corn breeding experience, which just laps over the crest. Beyond the crest the trend is downward. Beyond it we have hardly gone, partly because of linkage as envisioned by Jones and partly because present practice requires slight recession from the crest to another equilibrium between selection for specific combining ability and selection for general combining ability and excellence of the lines themselves.

Present interpretations must remain in some degree tentative until lines well beyond the crest to provide significant negative partial regressions have been obtained. Before such evidence any alternative interpretation of complex, non-additive gene action would stand entirely refuted, I think. Excess of any heterozygote over the top dominant would seem to be overdominance by definition. The possibility of explaining present results by non-additive action without overdominance is very small insofar as I can tell but space does not permit more to be said here. Neither does space permit listing of every point where overdominance theory agrees with corn breeding experience more closely than does dominance theory. I have found no discrepancies and so must say that the evidence for overdominance must seem overwhelming but not crucial to any unprejudiced mind. It will be appreciated if any discrepancies are pointed out.

The same analysis has been employed with data on other characters of Jenkins (loc. cit.) with no evidence of overdominance and in most cases slight evidence of any dominance at all. Height of plant is an exception, but it depends largely on vigor. No data on ear dimensions have been available.

Fred H. Hull