Heterosis, grain yield. For homozygous parents and linear interaction of nonallelic genes, in the notation of Fisher et al., Genetics 17:107, 1932, d is (AA - aa)/2 and h is the deviation of aA from the midpoint between aa and AA.
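P_{1} = 2n_{1}d + R    F_{1} = n(d + h) + R         B_{1} = (1/2)n(d + h) + n_{1}d + R  
P_{2} = 2n_{2}d + R    F_{2} = n(d + (1/2)h) + R    B_{2} = (1/2)n(d + h) + n_{2}d + R  
P = 2nd + 2R           F = 2nd + (3/2)nh + 2R       B = 2nd + nh + 2R  
Each left-hand symbol is the phenotype of the generation named; P, F and B without subscripts are the sums P_{1} + P_{2}, F_{1} + F_{2} and B_{1} + B_{2}. n is the number of loci heterozygous in F_{1}, n_{1} and n_{2} are the numbers of those loci fixed AA in P_{1} and P_{2} (n_{1} + n_{2} = n), and R is the least homozygote available by segregation.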
Analysis of data
Estimates of 2nh (all records in per cent of F_{1})

                    Maize yield                   Tomato, Powers^{3}
                                                  Danmark × Red Currant         Johannisfeuer × Red Currant
                    Neal^{1}       Lindstrom^{2}  Height         Fruit wt.      Fruit wt.
4(F_{1} - F_{2})    148.1          136.8          76.0                          36.0
(2F_{1} - P)        124.4          127.6          58.5           -751.7         -625.1
2(2F_{1} - B)                      113.2          62.8           -241.6         -228.8
2(2F_{2} - P)       130.3          118.4          41.0           -1510.6        -1486.3
4/3(F - P)          126.4          124.5          52.6           -1004.7        -845.0
4(F - B)                           89.6           49.6           -490.4         -493.6
2(B - P)                           142.0          54.2           -1261.8        -1021.5
Mean 2nh            132.3          121.7          56.4           -750.5         -666.3
(F_{2} - 1/2B)                     -5.9           -3.3           -67.5          -67.3
P                   75.6           72.4           141.5          950.7          836.6
^{1}J.A.S.A. 27:666, 1935.
^{2}Proc. 7 Int. G.C.
^{3}J.A. Res. 63:149, 1941.
The close agreement of Neal's and Lindstrom's data in the above analysis seems to indicate strongly that grain yield is a function of heterozygosis. For any locus, (aA - aa) - (AA - aA) = (h + d) - [2d - (h + d)] = 2h. The interval from the least homozygote to the heterozygote minus the interval from the heterozygote to the top homozygote is 2h for one locus, or 2nh for n loci if h and d values are essentially the same for all loci.
For all values of h or h/d (any degree of dominance) the 7 estimates of 2nh (table) are a homogeneous set, except for nongenetic fluctuations. Heterogeneity indicates interaction of nonalleles.
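The homogeneity of the seven estimates under the linear model can be checked numerically. The sketch below is only an illustration: the function name and the values of n_{1}, n_{2}, d, h and R are hypothetical and are not taken from any of the records. It builds the six generation values from the model equations and confirms that every estimate returns 2nh.

```python
def estimates_of_2nh(p1, p2, f1, f2, b1, b2):
    """Return the seven estimates of 2nh formed from the generation values."""
    p, f, b = p1 + p2, f1 + f2, b1 + b2  # sums of parents, of F1 and F2, of backcrosses
    return {
        "4(F1 - F2)": 4 * (f1 - f2),
        "2F1 - P": 2 * f1 - p,
        "2(2F1 - B)": 2 * (2 * f1 - b),
        "2(2F2 - P)": 2 * (2 * f2 - p),
        "4/3(F - P)": 4 * (f - p) / 3,
        "4(F - B)": 4 * (f - b),
        "2(B - P)": 2 * (b - p),
    }

# Hypothetical parameter values, chosen only for illustration.
n1, n2, d, h, r = 7, 3, 1.0, 1.5, 2.0
n = n1 + n2

# Generation values from the linear model (no nonallelic interaction).
p1 = 2 * n1 * d + r
p2 = 2 * n2 * d + r
f1 = n * (d + h) + r
f2 = n * (d + h / 2) + r
b1 = n * (d + h) / 2 + n1 * d + r
b2 = n * (d + h) / 2 + n2 * d + r

estimates = estimates_of_2nh(p1, p2, f1, f2, b1, b2)
expected_2nh = 2 * n * h
```

With real records the seven values differ by sampling error, and by any nonallelic interaction, which is what makes their spread informative.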
The three quantities (P = 2nd + 2R) > (F_{1} = nh + nd + R) > 2nh must lie in that or the reverse order, with each interval in any case equal to ±[n(h - d) - R]. If h = d (dominance complete) the intervals are estimates of R. On that assumption the mean estimate of R for the two maize records is minus 26.5% of F_{1}. If R cannot be negative, the minimum estimate of R, zero, provides the minimum estimate of h, 1.7d.
The top homozygote is (P - R). For these records it cannot be estimated larger than 74% of F_{1} if negative R is to be avoided.
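The figures in the two preceding paragraphs can be retraced from the P and mean 2nh rows of the table. The sketch below is one plausible reading of the arithmetic, not necessarily the author's exact procedure: it averages the four intervals of the two maize records for the case h = d, and for R = 0 it takes h/d as mean 2nh divided by mean P, since P is then 2nd. Variable names are illustrative.

```python
# Maize records from the table, in per cent of F1.
records = {
    "Neal": {"P": 75.6, "mean_2nh": 132.3},
    "Lindstrom": {"P": 72.4, "mean_2nh": 121.7},
}
F1 = 100.0

# With h = d, both P - F1 and F1 - 2nh estimate R.
intervals = []
for rec in records.values():
    intervals.append(rec["P"] - F1)
    intervals.append(F1 - rec["mean_2nh"])
r_if_h_equals_d = sum(intervals) / len(intervals)

# With R = 0, P = 2nd, so h/d = 2nh / P (assumed reading of the text).
mean_p = sum(rec["P"] for rec in records.values()) / len(records)
mean_2nh = sum(rec["mean_2nh"] for rec in records.values()) / len(records)
h_over_d_if_r_zero = mean_2nh / mean_p

# The top homozygote is P - R; with R = 0 it equals mean P.
top_homozygote_if_r_zero = mean_p
```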
The data on tomato weight and the estimates of 2nh from them may seem to suggest a complication of interactions, although the two sets of 2nh are quite similar. It is proposed to separate allelic from any regular nonallelic interaction graphically. The points P_{1}, B_{1}, F_{1}, F_{2}, B_{2} and P_{2} are plotted with the scale on the Ø axis being that of the actual data and the scale on the x axis that of allelic but no nonallelic interaction. Lay off a wide interval from P_{1} to P_{2} on the x axis. Trial positions of F_{1} may then be taken, with F_{2} midway between F_{1} and the mean of parents and each backcross midway from F_{1} to the recurrent parent. The best trial position of F_{1} should be 2(F_{1} - F_{2}) from the mean of parents in the direction indicated by the data, since F_{1} and F_{2} have the same gene number and their comparison will be least affected by nonallelic interaction. If the 6 plotted points do not seem to lie on a smooth curve, F_{1} is to be shifted right or left, with the F_{2} and backcross shifts being 1/2 of the F_{1} shift, until the best fit to a smooth curve is obtained. The curve presumably represents regular nonallelic interaction or regular interaction with environment. Allelic interaction is evident in the 7 estimates of 2nh, which should be a uniform set.
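The placement rule for the x axis can be written out as a small helper. This is a minimal sketch with a hypothetical function name and arbitrary x units. It covers only the positioning of the six points for a given trial F_{1}, not the judgment of how smooth the resulting curve is.

```python
def trial_x_positions(x_p1, x_p2, x_f1):
    """x-axis positions of the six generations for one trial position of F1."""
    mid_parent = (x_p1 + x_p2) / 2
    return {
        "P1": x_p1,
        "B1": (x_f1 + x_p1) / 2,       # backcross midway from F1 to its recurrent parent
        "F1": x_f1,
        "F2": (x_f1 + mid_parent) / 2,  # F2 midway between F1 and the mean of parents
        "B2": (x_f1 + x_p2) / 2,
        "P2": x_p2,
    }

# Arbitrary illustration: parents laid off at 0 and 8, two trial positions of F1.
first_trial = trial_x_positions(0.0, 8.0, 5.0)
second_trial = trial_x_positions(0.0, 8.0, 6.0)

# Shifting F1 by one unit shifts F2 and both backcrosses by half a unit.
shifts = {k: second_trial[k] - first_trial[k] for k in first_trial}
```

Each trial set of x positions would then be plotted against the observed generation values on the Ø axis and judged by eye, as the text describes.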
In this way, close fits to smooth curves were obtained with Powers' data on the crosses Danmark × Red Currant and Johannisfeuer × Red Currant, with F_{1}s just slightly to the right of the parental midpoint towards heavier fruit. The curves lie between Ø = kx^{3} and Ø = b^{x} over most of the range. Both agree closely with the hypothesis of very slight dominance of heavier fruit and strong, regular interaction of nonalleles. The interaction may of course be little more than the cubic relation of weight or volume to linear dimension.
A slightly poorer fit was obtained for Johannisfeuer × Bonny Best, but the same dominance bias and interaction are evident. The two records on Danmark × Johannisfeuer did not provide consistent solutions, perhaps because the parents are too close together. That difficulty would always appear with yield records on inbred maize.
Complementary interaction is not regular in the above sense. It might become evident in the (F_{2} - 1/2B) comparison and in aberrations from regular interaction in the above graphical analysis. With 2-factor interaction, F_{2} is 9/16 and 1/2B is 8/16 of the interval from 1/2P to F_{1}; both are 8/16 without interaction. There is no evidence of complementary interaction as a factor of heterosis of maize yield or of tomato plant height. There seems to be no evidence for complementary interaction for tomato weight except in the cross Johannisfeuer × Bonny Best. If the curve for that cross is plotted by neglecting the F_{2} to obtain the best fit with F_{1} and backcrosses, the F_{2} deviation from the curve is large and positive, which may indicate complementary interaction for heavier fruit. Plotting the cube root of Ø or log Ø might bring the complementary interaction out more clearly.
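The 9/16 and 8/16 fractions for 2-factor complementary interaction can be reproduced by enumerating gametes. The sketch below assumes the simplest case, which the text does not spell out: parents AAbb and aaBB, with the character expressed only when at least one A and one B are present. The parents then score 0 and the F_{1} scores 1, so the interval from 1/2P to F_{1} is the unit.

```python
from fractions import Fraction
from itertools import product

def gametes(genotype):
    """Equally likely gametes of a genotype such as ("Aa", "Bb")."""
    combos = list(product(*[list(locus) for locus in genotype]))
    return [(g, Fraction(1, len(combos))) for g in combos]

def cross_mean(parent_a, parent_b):
    """Expected fraction of progeny carrying both A and B (complementary case)."""
    total = Fraction(0)
    for (g1, f1), (g2, f2) in product(gametes(parent_a), gametes(parent_b)):
        loci = [a + b for a, b in zip(g1, g2)]
        expressed = all(any(ch.isupper() for ch in locus) for locus in loci)
        if expressed:
            total += f1 * f2
    return total

P1, P2, F1 = ("AA", "bb"), ("aa", "BB"), ("Aa", "Bb")

f2_fraction = cross_mean(F1, F1)                                     # F2
mean_backcross = (cross_mean(F1, P1) + cross_mean(F1, P2)) / 2      # 1/2 B
f2_minus_half_b = f2_fraction - mean_backcross
```

Under these assumptions (F_{2} - 1/2B) is positive, 1/16 of the interval, where it would be zero without the interaction.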
The reader should be warned that application of the above graphical analysis to data involving little or no nonallelic interaction and strong interaction of alleles as in tomato plant height may produce a straight line with the 6 values spaced the same on both axes or a smooth curve through P_{1}, B_{1}, F/2, B_{2} and P_{2}. In the latter event the six values will agree with the hypothesis of no allelic interaction on the x axis. The factor of curvature here is h. I do not now have the function.
For linear interaction of nonalleles, theoretical regressions in F_{2} and backcross of Ø on x (gene number) are:
F_{2}:  Ø = [-hx^{2} + (2n - 1)dx + 2nhx]/(2n - 1) + R,    dØ/dx = d + (2n - 2x)h/(2n - 1)  
Bn:  Ø = [nd + (n - 2n_{b})h]x/n + 2n_{b}^{2}h/n + R,    dØ/dx = d + (n - 2n_{b})h/n  
n is the number of loci heterozygous in F_{1}; n_{b} is the number of the n loci fixed AA in the recurrent parent.
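Both regressions can be checked by brute force for a small case. The sketch below enumerates every genotype for hypothetical values (n = 3, n_{b} = 2, d = 1, h = 3/2, R = 2), averages the phenotype within each gene number x, and compares the result with Ø = [-hx^{2} + (2n - 1)dx + 2nhx]/(2n - 1) + R for F_{2} and Ø = [nd + (n - 2n_{b})h]x/n + 2n_{b}^{2}h/n + R for the backcross. Function names are illustrative.

```python
from fractions import Fraction as Fr
from itertools import product

def conditional_means(locus_classes, r):
    """Mean phenotype for each gene number x.

    locus_classes holds, per locus, a list of (plus_genes, value, probability).
    """
    prob, weighted = {}, {}
    for combo in product(*locus_classes):
        x = sum(c[0] for c in combo)
        value = sum(c[1] for c in combo) + r
        p = Fr(1)
        for c in combo:
            p *= c[2]
        prob[x] = prob.get(x, 0) + p
        weighted[x] = weighted.get(x, 0) + p * value
    return {x: weighted[x] / prob[x] for x in prob}

def f2_formula(x, n, d, h, r):
    return (-h * x**2 + (2 * n - 1) * d * x + 2 * n * h * x) / Fr(2 * n - 1) + r

def backcross_formula(x, n, nb, d, h, r):
    return (n * d + (n - 2 * nb) * h) * x / Fr(n) + Fr(2 * nb**2) * h / n + r

# Hypothetical values for illustration only.
n, nb = 3, 2
d, h, r = Fr(1), Fr(3, 2), Fr(2)
half, quarter = Fr(1, 2), Fr(1, 4)

f2_locus = [(0, Fr(0), quarter), (1, d + h, half), (2, 2 * d, quarter)]
bc_locus_AA = [(1, d + h, half), (2, 2 * d, half)]   # recurrent parent AA at this locus
bc_locus_aa = [(0, Fr(0), half), (1, d + h, half)]   # recurrent parent aa at this locus

f2_means = conditional_means([f2_locus] * n, r)
bc_means = conditional_means([bc_locus_AA] * nb + [bc_locus_aa] * (n - nb), r)

f2_agrees = all(f2_means[x] == f2_formula(x, n, d, h, r) for x in f2_means)
bc_agrees = all(bc_means[x] == backcross_formula(x, n, nb, d, h, r) for x in bc_means)
```

Exact fractions are used so that agreement is tested without rounding error.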
These equations seem to be mainly useful for the solution of theoretical problems. For example, the backcross distribution is not skewed by any degree of dominance, even though the recurrent parent is fixed AA at all n loci (n_{b} = n). The slope is then (d - h), or zero if h = d. If h > d the slope is negative: Ø decreases as the number of plus genes increases. If n_{b} is zero the slope is (d + h), positive unless h is negative and numerically greater than d.
F_{2} regression is a second degree parabola with slope a function of 2hx. The F_{2} distribution is skewed by dominance. The familiar case (h = d) involves the left branch of the parabola from (0, R) rising with decreasing slope to the vertex at (x = 2n - 1/2), then dropping slightly to (x = 2n). This function may be employed with the normal frequency table to construct a theoretical distribution for any number of loci and any degree of dominance, to show that maximum skewness is reached when h = d and that skewness then decreases with increasing h. The demonstration is facilitated by working with one pair of genes. Thus if A'A' equals AA, and A'A is some greater value, d is zero and h is relatively large. The F_{2} (1/4 A'A' + 1/2 A'A + 1/4 AA) becomes (1/2 A'A' or AA + 1/2 A'A). This distribution, or the product of any number of such distributions, is symmetrical. If d is now allowed to take increasing positive values, skewness increases up to h = d. East's alleles of divergent function would not intensify skewness of F_{2}.
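The one-pair demonstration can be followed numerically. The sketch below uses a hypothetical function name, is limited to a single locus, and ignores environmental variation. It computes the moment coefficient of skewness of the F_{2} classes aa = 0, aA = d + h and AA = 2d, with frequencies 1/4, 1/2 and 1/4, then scans h for fixed d.

```python
def f2_skewness(d, h):
    """Moment coefficient of skewness of a one-locus F2 (aa, aA, AA at 1/4, 1/2, 1/4)."""
    values = [0.0, d + h, 2 * d]
    probs = [0.25, 0.5, 0.25]
    mean = sum(p * v for p, v in zip(probs, values))
    variance = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    third = sum(p * (v - mean) ** 3 for p, v in zip(probs, values))
    return third / variance ** 1.5

d = 1.0
grid = [i / 10 for i in range(1, 31)]            # h from 0.1 to 3.0
skew_by_h = {h: f2_skewness(d, h) for h in grid}

# Skewness is negative here (tail toward low values), so the most skewed
# case is the minimum of the coefficient.
most_skewed_h = min(skew_by_h, key=skew_by_h.get)

# d = 0 with h > 0 is the symmetrical case described in the text.
symmetric_case = f2_skewness(0.0, 1.0)
```

For independent loci of equal effect the coefficient of the sum is the one-locus value divided by the square root of n, so the position of the maximum is unchanged.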
The conclusion of h>d for maize yield is supported by failure of mass and ear row selection, by failure of synthetic combinations of selected inbreds, by superiority of hybrids of inbreds of diverse origins and by the success of modern maize breeding itself. If h is not greater than d, mass or ear row selection will probably continue to surpass present maize breeding technic, because of more frequent recurrence of selection. But if h>d, present technic is the only method so far tried which should effect appreciable improvement. No degree of allelic interaction will confuse selection among F_{1} hybrids of homozygous lines. However, selection favoring the heterozygote loses efficiency rapidly. It is questionable if the expectation of continuing success with present technic can be supported in Mendelian theory.
Selection may be measured by the deviation of the mean of a selected group from the original mean in terms of the standard deviation of the original. Thus "Student" noted selection effects of 12 and 7 sigma for high and low oil in the Illinois experiments. If the selected group may be represented by a tail of the normal area cut off above x = t, and the mean of the tail is s, then s = (ordinate at t)/(area beyond t), the area beyond t being P_{t}. Then 1/P_{t} is the number of individuals from which selection of the top one may be expected to effect a selection differential of the given value of s. The highest value of s calculable from a 15-place table of areas and ordinates of the normal curve (W.P.A., City of New York) is 8, for which 1/P_{t} is 222,222,000,000,000. This is roughly 2000 times the number of maize plants grown in the world in one season. That the low oil result (s = 7) might have been obtained by selection among 400,000,000 homozygous lines is plausible. The high oil result (s = 12) is 4 billion million times as difficult. Selection of the top 10 from 26 provides an s of one in the absence of gene interaction and environmental effects. Eight recurrences of such selection will effect an s value of 8 if variability is maintained as it was in the selection for oil. A total of 208 plants is required. From this viewpoint the oil selection results do not seem improbable as the work was done; they do seem very improbable in the face of much inbreeding.
The s value of the top one of 11,185 single crosses from at least 150 inbred lines is about 4. This might be a yield increase of about 40% over original stock. The genetic variance of single crosses is the same as for single plants of original crossbred stock. Sigma in this case is then 10% of the original mean yield. This seems a fair estimate of the present Florida situation. The problem now is how much effort will be required for further gains. If each cycle of inbreeding must begin at the same level as the first, as indicated by the yield of synthetic combinations of selected lines and nearly all other available evidence, it will be necessary to identify the best single cross among 1,300,000 from 1600 homozygous lines to effect a further improvement of 10%. Gaining 10% again beyond that will be truly difficult, even though the genetic variation may remain unimpaired in the process, as suggested by the oil selection results.
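The selection differentials quoted in the two preceding paragraphs follow from the ratio of ordinate to tail area of the normal curve. As a rough check, the sketch below uses `NormalDist` from the Python standard library in place of the printed table; the function name is illustrative.

```python
from statistics import NormalDist

def selection_differential(proportion):
    """Mean of the upper tail of a standard normal curve holding `proportion` of the area."""
    std = NormalDist()
    t = -std.inv_cdf(proportion)       # truncation point with `proportion` of the area beyond it
    return std.pdf(t) / proportion     # ordinate at t divided by area beyond t

s_top_10_of_26 = selection_differential(10 / 26)
s_best_of_11185 = selection_differential(1 / 11_185)
s_best_of_1300000 = selection_differential(1 / 1_300_000)

plants_for_eight_cycles = 8 * 26
```

The three values come out close to 1, 4 and 5, which agrees with the text: with sigma at 10% of the mean, going from the best of 11,185 single crosses to the best of 1,300,000 adds about one sigma, or a further 10%.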
A breeding technic has been proposed to deal with the case h > d (Hull, Recurrent Selection for Specific Combining Ability in Corn, J.A.S.A., in press). The method is recurrent selection in a crossbred lot for combining ability with a specific homozygous line. Selection is among testcrosses of single plants of the crossbred lot to the homozygous tester line. For any locus heterozygous in the crossbred lot and aa in the tester the testcrosses are: aa, (aa + aA)/2, and aA; or if the tester is AA they are: aA, (aA + AA)/2, and AA. The three testcrosses are separated by equal intervals, (d + h)/2 in the first case and (d - h)/2 in the second. The essential point is that the three values are equally spaced, as would be the three genotypes in a crossbred population without dominance. This type of selection avoids the confusion of dominance or allelic interaction even though h > d. The price is some loss of variance. It also allows maximum frequency of recurrence of selection. Maximum frequency of recurrence with respect to resistance to insects and diseases, as well as to yield and any other desirable characters, would seem to be obtained by simultaneous selection.
Tomato weight and height have been included for contrast with maize yield. Estimates of 2nh involving (B) are smaller than those involving (P) for both maize yield and tomato weight. B values might suffer less distortion from nonallelic interaction than P values since the former are nearer the center. The slightly excessive value of B in Lindstrom's data may indicate nothing more than a little heterozygosity remaining in the parent lines. Strong allelic interaction is indicated for maize yield. Tomato weight records indicate very slight allelic interaction but strong nonallelic interaction. Both the maize yield and tomato weight situations seem improbable. If the tomato weight interaction is the cubic relation of volume to linear dimension, why does not this function appear in the relations of aa, aA and AA at one locus? Why would it not appear in the maize yield between nonalleles? Why does h>d appear only in grain yield of maize; not in components, e.g. ear length and diameter, plant height, stalk diameter etc.? Tomato height in F_{1} exceeds the greater parent but not the sum of parents (P). There is no evidence here of h>d and slight evidence of nonallelic interaction.
The enormous selection intensities available by properly controlled recurrent selection provide a tool for investigation of physiological limits, limits of recombination, and perhaps detection of aggregates of natural or induced mutations in a group of numerous small genes.
Appendix, January 10, 1945: Hayes et al., J.A.S.A. 36:998, 1944; data on synthetic, mean of parent lines and mean F_{1}. From F_{1} minus synthetic the estimate of 2nh is 160% F_{1}. The (2F_{1} - P) estimate of 2nh is 127% F_{1}. If h = d and R = 0, then F_{1} = 2nh. Decline from F_{1} to F_{2} or synthetic is 2nh/2N, where N is the number of lines. On the foregoing assumptions, the expected decline of Hayes' synthetic is 100/16 or 6.25% F_{1}. If R is 20% F_{1}, the expected decline of the synthetic is 5% F_{1}.
The actual decline of 10% F_{1} may be evidence of h > d, nonallelic interaction, or R < 0. Taking R = 0 and no interaction, h = 4d for the F_{1} - synthetic comparison, and h = 1.74d for (2F_{1} - P).
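The arithmetic of the two preceding paragraphs can be retraced as follows. The sketch assumes N = 8 lines, which is inferred from the divisor 16 and is not stated in the text, and it takes F_{1} = n(d + h) + R with all values in per cent of F_{1}. Function names are illustrative.

```python
def expected_decline(f1, r, lines):
    """Expected decline from F1 to synthetic when h = d: 2nh = F1 - R, decline = 2nh / 2N."""
    two_nh = f1 - r
    return two_nh / (2 * lines)

def h_over_d(two_nh, f1=100.0, r=0.0):
    """Ratio h/d from an estimate of 2nh, using F1 = n(d + h) + R."""
    nh = two_nh / 2
    nd = f1 - r - nh
    return nh / nd

N_LINES = 8  # assumption: inferred from 100/16 in the text

decline_r0 = expected_decline(100.0, 0.0, N_LINES)
decline_r20 = expected_decline(100.0, 20.0, N_LINES)

# An observed decline of 10% F1 implies 2nh = 10 * 2N = 160% F1.
two_nh_from_decline = 10.0 * 2 * N_LINES
ratio_from_decline = h_over_d(two_nh_from_decline)
ratio_from_2f1_minus_p = h_over_d(127.0)
```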
Kiesselbach, J.A.S.A. 22:614, 1930; F_{2} and F_{3} of 21 single crosses, h = 1.98d.
Richey et al, J.A.S.A. 26:196, 1934; F_{2} 10 double crosses, h = 1.55d.
Neal, loc. cit., F_{2} 10 double crosses, h = 1.72d.
If R is some positive value all of the above estimates of h must be revised upward.
Fred H. Hull