Address:
Department of Computational and Systems Biology,
John Innes Centre, Norwich Research Park,
Colney, Norwich, NR4 7UH, UK.

Telephone:
+44(0) 1603 450597 (direct)
+44(0) 1603 450000 (switchboard)

Fax:
+44(0) 1603 450595 (direct)
+44(0) 1603 450045 (JIC reception)

CrazyMapper PROJECT

The accurate construction of genetic linkage maps plays a vital role in the development and exploitation of plant genomic sequence [1]. A linkage map helps to guide genome assembly via anchor markers, and the markers themselves enable crop improvement through marker-assisted selection. Many tools exist for linkage map construction, but the scientists who use them often view them as “black boxes”. Here we present a new tool, CrazyMapper, for the construction of genome-wide linkage maps using a new computational approach, which we show to be both accurate and robust. Importantly, the tool’s visualisation process allows users to interact with the results of an analysis and to apply their expert knowledge to refine them.

The computational problem is to take genetic segregation data from a mapping population for a series of genetic markers, to partition the markers into a set of distinct linkage groups (ideally with a single linkage group spanning each chromosome) and, for each linkage group, to find the order of the markers along it and the distances between adjacent markers. Our computational approach is based on our discovery that the first three Principal Components of the marker distance matrix capture the majority of the variation in such datasets. 3D visualisation of the results shows single linkage groups as “snakelike” trend lines, with the genetic markers projected onto them. Using a combination of Minimal Spanning Trees aided by Local Principal Curves [2,3] and curvilinear geodesic smoothing, we are able to estimate the marker order and inter-marker distances for single linkage groups. Importantly, a marker’s distance from the trend line is a function of its quality, allowing the user to remove unreliable markers from an analysis. Linkage groups are separated from the initial genome-wide dataset computationally using eigendecomposition-assisted Laplacian embedding [4], although they can also be identified visually via the characteristic shapes of these 3D linkage group objects.
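
As a rough illustration of the marker-ordering step only, the sketch below builds a pairwise recombination-fraction matrix for one linkage group, computes a minimum spanning tree over it, and reads the marker order off the longest path through that tree. It is not CrazyMapper's implementation: the Principal Component projection, Local Principal Curves and geodesic smoothing are all omitted, the use of raw recombination fractions as the distance is an assumption, and every function name and the toy genotype data are hypothetical. Markers that fall on side branches of the tree are simply left out of the order here.

```python
from itertools import combinations

def recombination_fraction(a, b):
    """Fraction of individuals whose genotype differs between two markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x != y for x, y in pairs) / len(pairs)

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns an adjacency dict."""
    n = len(dist)
    adj = {i: [] for i in range(n)}
    # best[j] = (cheapest known link from j into the tree, tree node it joins)
    best = {j: (dist[0][j], 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, parent = best.pop(j)
        adj[parent].append(j)
        adj[j].append(parent)
        for k in best:
            if dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return adj

def farthest_path(adj, dist, start):
    """Tree path from `start` to the node farthest from it (summed edge length)."""
    parent = {start: None}
    depth = {start: 0.0}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + dist[u][v]
                stack.append(v)
    node = max(depth, key=depth.get)
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def order_markers(genotypes):
    """Return (marker order, adjacent recombination fractions) for one group."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = recombination_fraction(genotypes[i], genotypes[j])
    adj = minimum_spanning_tree(dist)
    # Two sweeps find the longest path (the "backbone") of the tree.
    one_end = farthest_path(adj, dist, 0)[-1]
    backbone = farthest_path(adj, dist, one_end)
    gaps = [dist[a][b] for a, b in zip(backbone, backbone[1:])]
    return backbone, gaps

# Hypothetical toy data: 5 markers scored in 8 individuals, given in shuffled
# order. Names record each marker's true position along the chromosome.
names = ["T2", "T0", "T4", "T1", "T3"]
genotypes = [
    list("AABBBBBA"),  # T2
    list("AAAABBBB"),  # T0
    list("ABBBBBAA"),  # T4
    list("AAABBBBB"),  # T1
    list("ABBBBBBA"),  # T3
]

order, gaps = order_markers(genotypes)
print([names[i] for i in order])  # ['T0', 'T1', 'T2', 'T3', 'T4']
print(gaps)                       # [0.125, 0.25, 0.125, 0.125]
```

On real data the tree will usually have side branches, and a map distance would normally be derived from the recombination fractions with a mapping function; both are outside the scope of this sketch.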

We have tested our new methodology on both whole-genome plant linkage datasets (particularly in pea, Arabidopsis, rice and barley) and synthetic benchmark datasets. Here, we describe CrazyMapper’s analytical process, using the Arabidopsis [5] and barley [6] whole-genome datasets as case studies. We believe our work represents a paradigm shift from current methods of linkage map estimation. The visualisation process it encapsulates can give researchers a better understanding of their data, helping them to avoid the problem of “black box” algorithms over-fitting their datasets. By combining proven techniques from graph partitioning with advances in manifold embedding and learning, we have introduced fresh ideas that can change the way that biologists view, literally, their genetic maps.

References:
[1] Meksem K and Kahl G (Eds.) (2005) The Handbook of Plant Genome Mapping, Wiley-VCH.
[2] Hastie T and Stuetzle W (1989) Journal of the American Statistical Association 84:406; 502-516.
[3] Einbeck J, Tutz G and Evers L (2005) Statistics and Computing 15:4; 301-313.
[4] Ng A, Jordan M and Weiss Y (2002) Proceedings of the NIPS Conference 2002.
[5] West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA and Michelmore RW (2006) Genome Research 16:6; 787-795.
[6] Cuesta Marcos A and Russell J (2005) Oregon Wolfe Barley (OWB) dataset: http://barleyworld.org/oregonwolfebarleys/maps.php


Name                Role
Dr Jitender Cheema  Lead developer
Dr Jo Dicks         Project co-ordinator
Prof. Noel Ellis    Original idea and project guidance

Links:
CrazyMapper beta
Description: Please keep checking this link for progress on the CrazyMapper Project
Status: Beta version to be released in summer 2009

Supplementary Information
Description: Supplementary interactive figures for the Brachypodium genetic map manuscript

Figure 2: Brachypodium data set annotated by MapManager QTX linkage groups

Figure 4: Brachypodium data set annotated by highest BLASTn hits to rice supercontigs
For peer-review purposes

Designed by Virginia Barnard
LAST UPDATED: 15th July 2009