Potential pitfalls in mapping with recombinant-inbreds
--M.A. Johns

The recombinant-inbred (RI) mapping technique brought to maize by Burr et al. (Genetics 118:519-526, 1988) allows the rapid localization of almost any DNA probe. Using this method, genes can be mapped with just a few Southern blots performed on DNA from pre-existing and readily available plant material. It is a major advance in maize gene mapping. However, after mapping more than 30 gel bands by this method, I realized that it is not always easy to pinpoint the chromosomal location of a probe. This problem is due to the limited number of RI lines available and thus to the limited number of recombinations which have occurred between any two points.

The first difficulty in RI mapping will be mentioned only briefly, without a detailed analysis. At best, a new gene can be localized to an approximate position between two previously mapped loci, and quite frequently the new gene can only be mapped near another locus without knowing which side it is on. Because the number of recombinations in any region is so low, distances between loci often do not add linearly. Also, the determination of gene order in classical gene mapping depends on examining flanking markers. With RI mapping, there are numerous cases where several recombinations have occurred on a chromosome in a given line, and so the examination of nearby markers in the RI lines is not a reliable indication of gene order.

The second difficulty in RI mapping concerns what could be called "ectopic localization": unlinked loci can appear to be linked, and unknown genes can appear to be located in more than one region of the genome. It is this problem that I wish to address in more detail here.

The RI mapping scheme compares the allelic distribution of 205 loci (database from early 1988) distributed over the maize genome, using two independent families of recombinant-inbreds. The COXTx family has 48 RI's, and the TXCM family has 41 RI's. As explained by Burr et al., the RI method allows a direct calculation of an R value, which is related to the recombination frequency r by the formula r = R/(2-2R). Some of the lines show heterozygosity for some loci even after 7 generations of selfing: Burr et al. report a residual heterozygosity of 7.5%. The database does not contain all possible data points for every locus: 13.7% of the potential data points are missing. Most of these missing points are for probes which gave no usable data for one of the RI families, presumably due to lack of detectable polymorphisms.
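The conversion from R to map units is used throughout this note, so a small worked sketch may help. It applies only the formula quoted above; the function names are illustrative and not taken from Burr et al.

```python
def r_from_R(R):
    """Recombination frequency r from the RI statistic R, using r = R / (2 - 2R)."""
    if not 0.0 <= R < 1.0:
        raise ValueError("R must lie in the interval [0, 1)")
    return R / (2.0 - 2.0 * R)


def map_units(R):
    """Express the same conversion in map units (100 * r)."""
    return 100.0 * r_from_R(R)


# R values quoted elsewhere in this note
for R in (0.207, 0.25, 0.275, 0.400):
    print(f"R = {R:.3f}  ->  {map_units(R):.2f} map units")
```

The printed values agree with the map-unit figures quoted in this note to within rounding of the published R values.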

I devised a computer program to compare the allelic distribution of each locus with that of every other mapped locus, except for the nearest 5 loci on either side. This exclusion eliminated most of the tightly linked loci. I found that, on the average, each locus was 0.481 R units (46.4 map units) away from every other locus, with a range of 0.434 to 0.512. That is, except for nearby loci, the RI method shows that every locus is essentially unlinked to the bulk of other loci. This result is exactly as expected.
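The program itself is not reproduced here. A minimal sketch of this kind of comparison might look like the following. It makes several assumptions: each locus is stored as a string with one character per RI line ("A" or "B" for the two parental alleles, anything else for missing or heterozygous), R is taken as the fraction of lines scored at both loci whose alleles differ, and the list of loci is in map order. All names are hypothetical.

```python
PARENTAL = {"A", "B"}  # assumed coding for the two parental alleles


def pairwise_R(a, b):
    """Fraction of RI lines, scored at both loci, whose parental alleles differ.

    Lines with a missing or heterozygous score at either locus are skipped.
    Returns None if no line is scored at both loci.
    """
    scored = [(x, y) for x, y in zip(a, b) if x in PARENTAL and y in PARENTAL]
    if not scored:
        return None
    return sum(x != y for x, y in scored) / len(scored)


def mean_R_to_distant_loci(loci, index, skip=5):
    """Mean R from loci[index] to every locus outside a window of `skip`
    neighbours on either side (the list is assumed to be in map order)."""
    values = []
    for j, other in enumerate(loci):
        if abs(j - index) <= skip:
            continue  # excludes the locus itself and its nearest neighbours
        R = pairwise_R(loci[index], other)
        if R is not None:
            values.append(R)
    return sum(values) / len(values) if values else None


# Two toy loci scored in 8 lines; "-" is missing, "H" is heterozygous
print(pairwise_R("AABBA-AB", "ABBBAAHB"))
```

In the toy example six lines are scored at both loci and one of them is discordant, so the function returns 1/6.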

However, the distance to the "nearest" locus on a different chromosome is quite variable, with an average of 0.317 R units (23 map units), and a range of 0.207 to 0.400 (13.0 to 33.3 map units). This means that, with the use of RI mapping, every locus appears to lie between 13 and 33 map units from some locus on a different chromosome, to which it is definitely unlinked. The lower number is especially significant, because in a normal mapping experiment, loci 13 map units apart are clearly linked. Also, some of the adjacent loci mapped by Burr et al. are more than 13 map units apart. Thus, it seems possible that an unknown locus mapped by the RI method could by chance appear to be located quite far from its actual location.

To take a specific example, the region between 8.05 and Pl on chromosome 1S is apparently close to a region on 7L between 7.61 and 8.37. The closest approach is between 10.38 on 1S and 7.61 on 7L, which are separated by 0.207 R units (13.0 map units). In comparison, the loci flanking 10.38 on 1S are 9.4 and 1.8 map units away. It is clear that the loci on 1S and 7L have been properly located, because they are linked in a chain to previously mapped genes. However, an unknown gene could fall at an ambiguous position, equally close to loci on different chromosomes. This is especially true if the unknown gene falls in a relatively sparsely mapped area. It was not difficult for me to create an artificial set of data that was equally close to 7.61 and 10.38, and more distant from every other locus. This example is by no means unique: many regions of the genome are apparently close to one another when mapped by the RI method.

After seeing that the RI mapping method produces apparent linkages between loci on different chromosomes, I decided to see how well random data sets could be mapped. These data sets were created by assigning the two parental alleles to the different RI lines at random. After a number of trials it became clear that random numbers for both the COXTx and TXCM families rarely produced any apparent linkages. That is, since the two families are independent, using both of them to map a locus is quite likely to yield a good, unique location.

However, not all probes will give polymorphic bands for both families: 52 of the 205 probes in the RI database have data for only one family. Also, some probes, such as those from transposable elements, will not map to the same locations in both families. For these reasons, I attempted to map random data into the COXTx family only. To summarize the results, out of 709 random sets of data, 8 (1.1%) contained an R value of less than 0.25, and 2 of these had an R value of less than 0.225. Out of 296 trials, 25 (8.4%) had R < 0.275 for some locus, and out of 192 trials, 39 (20.3%) had an R of less than 0.30. The R value at which there is less than a 5% chance of getting random data to fit would therefore seem to lie between 0.25 and 0.275 (i.e. between 16.7 and 19.0 map units). The positions of these R value minima were randomly distributed in the genome. It can be seen that even random data, which might be produced as a result of wishful thinking applied to marginal data, can produce a "locus" for a probe.
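The random-data trials described above can be sketched roughly as follows. This is an illustration only, not the program actually used. The real RI database is not available here, so the sketch substitutes a placeholder database of 205 independent random loci. Real loci are linked along chromosomes, so the rate printed by this sketch should not be expected to match the figures reported above. R is again assumed to be the fraction of discordant lines, and all names are hypothetical.

```python
import random

N_LINES = 48   # size of the COXTx family, from the text
N_LOCI = 205   # number of loci in the early-1988 database, from the text


def random_locus(rng, n_lines=N_LINES):
    """Assign one of the two parental alleles to each RI line at random."""
    return [rng.choice("AB") for _ in range(n_lines)]


def discordance(a, b):
    """Fraction of lines whose alleles differ (complete data assumed)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def min_R(query, database):
    """Smallest R between the query and any locus in the database."""
    return min(discordance(query, locus) for locus in database)


def false_fit_rate(database, threshold, trials, rng):
    """Share of random data sets whose best match falls below the threshold."""
    hits = sum(min_R(random_locus(rng), database) < threshold
               for _ in range(trials))
    return hits / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
# PLACEHOLDER: independent random loci standing in for the real database
placeholder_db = [random_locus(rng) for _ in range(N_LOCI)]
rate = false_fit_rate(placeholder_db, threshold=0.25, trials=200, rng=rng)
print(f"random sets with some R < 0.25: {100 * rate:.1f}%")
```

Replacing the placeholder with the actual allelic distributions, and raising the number of trials, would be needed to approach the comparison described in the text.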

This problem becomes even more acute for incomplete data sets. To address this issue, I created random data sets for the COXTx family that contained 10-50% missing data points, and compared them with the RI database. As mentioned above, 1.1% of complete data sets contained an R value of less than 0.25. Using R < 0.25 as a criterion, I found that 1.8% of data sets with 10% of the data missing fit the criterion, 3.1% of data sets with 20% missing fit, 8.2% of data sets with 30% missing fit, 17% of data sets with 40% missing fit, and 50% of data sets with 50% missing fit. Thus, small amounts of missing data do not seem likely to give false localizations, but the chance of getting a fit to a random location rises sharply as the amount of missing data increases. This problem is significantly eased if mapping can be performed with both RI families.
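One way to see why missing data makes a chance fit more likely is to look at a single pair of unlinked loci. The sketch below is my illustration rather than part of the analysis above. It assumes that R is the fraction of discordant lines among the lines scored at both loci, and that for unlinked loci each scored line is discordant with probability 1/2, independently of the others. It gives a per-pair probability only, so it does not reproduce the whole-database percentages in the text.

```python
from math import comb


def chance_fit_probability(n_scored, num=1, den=4):
    """P(discordant fraction < num/den) for one unlinked pair of loci,
    with the discordant count distributed as Binomial(n_scored, 1/2).

    The default num/den = 1/4 corresponds to the criterion R < 0.25.
    """
    # k / n_scored < num / den, compared in integers to avoid rounding
    favourable = sum(comb(n_scored, k)
                     for k in range(n_scored + 1)
                     if den * k < num * n_scored)
    return favourable / 2 ** n_scored


for missing in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    n = round(48 * (1 - missing))  # lines still scored in the COXTx family
    p = chance_fit_probability(n)
    print(f"{missing:.0%} missing, {n} lines scored: P(R < 0.25) = {p:.2e}")
```

Under these assumptions the per-pair probability is far higher with 24 scored lines than with 48, although it does not rise smoothly, because the cutoff falls on a whole number of discordant lines.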

In conclusion, the RI method is an excellent method for quickly mapping a probe to a genomic position. However, since there are only a limited number of members in the RI families, certain problems arise which are not seen in standard genetic mapping. Specifically, there is a significant chance of mapping to an incorrect location, especially if there are no previously mapped loci near the unknown probe's apparent position. This problem is significantly increased when only one of the RI families is used, and is increased further if the data set is not substantially complete.

Please Note: Notes submitted to the Maize Genetics Cooperation Newsletter may be cited only with consent of the authors
