Address:
Department of Computational and Systems Biology,
John Innes Centre, Norwich Research Park,
Colney, Norwich, NR4 7UH, UK.

Telephone:
+44(0) 1603 450597 (direct)
+44(0) 1603 450000 (switchboard)

Fax:
+44(0) 1603 450595 (direct)
+44(0) 1603 450045 (JIC reception)

THREaDMapper PROJECT

The accurate construction of genetic linkage maps plays a vital role in the development and exploitation of plant genomic sequence [1]. A linkage map helps to guide genome assembly via anchor markers, and the markers in turn enable crop improvement through marker-assisted selection. Many tools exist for linkage map construction, but the scientists who use them often regard them as “black boxes”. Here we present a new tool, THREaDMapper, for the construction of genome-wide linkage maps using a new computational approach, which we show to be both accurate and robust. Importantly, through a visualisation process, the tool allows users to interact with the results of an analysis and to apply their expert knowledge to refine them.

The computational problem at hand is to take genetic segregation data from a mapping population for a series of genetic markers, to partition the markers into a set of distinct linkage groups (ideally with a single linkage group spanning each chromosome) and, for each linkage group, to find the order of the markers along it and the distances between adjacent markers. Our computational approach is based on our discovery that the first three Principal Components of the marker distance matrix capture the majority of the variation in such datasets. 3D visualisation of the results shows single linkage groups as “snakelike” trend lines, with the genetic markers projected onto them. Using a combination of Minimal Spanning Trees aided by Local Principal Curves [2,3] and curvilinear geodesic smoothing, we are able to estimate the marker order and inter-marker distances for single linkage groups. Importantly, the distance of a marker from the trend line is a function of its quality, allowing the user to remove unreliable markers from an analysis. Linkage groups are separated from the initial genome-wide dataset computationally using eigendecomposition-assisted Laplacian embedding [4], although visual identification is also possible via the characteristic shapes of these 3D linkage group objects.
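As an illustration only, the spanning-tree part of the marker-ordering step can be sketched in Python. This is not the THREaDMapper implementation: it assumes the markers of one linkage group have already been projected onto three principal components, it uses a plain Euclidean minimum spanning tree, and it omits Local Principal Curves and geodesic smoothing entirely. The function names (minimum_spanning_tree, farthest, order_markers) and the synthetic helix data are hypothetical and chosen for this example.

```python
import math
from collections import defaultdict


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns adjacency lists {node: [(neighbour, edge_length), ...]}.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge joining each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    adj = defaultdict(list)
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append((parent[u], best[u]))
            adj[parent[u]].append((u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj


def farthest(adj, start):
    """Walk the tree from start; return the farthest node, distances, predecessors."""
    dist, prev, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v], prev[v] = dist[u] + w, u
                stack.append(v)
    return max(dist, key=dist.get), dist, prev


def order_markers(points):
    """Order markers along the longest path (the 'trunk') of the spanning tree.

    Markers that sit on side branches of the tree are not returned.
    """
    adj = minimum_spanning_tree(points)
    a, _, _ = farthest(adj, 0)         # one end of the trunk
    b, dist, prev = farthest(adj, a)   # the other end, with distances from a
    path, node = [], b
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()                     # path now runs from a to b
    return path, [dist[i] for i in path]


# Synthetic example (assumed data): ten markers on a helix, given in shuffled order.
true_index = [4, 0, 7, 2, 9, 5, 1, 8, 3, 6]
points = [(0.3 * k, math.sin(0.3 * k), math.cos(0.3 * k)) for k in true_index]
order, positions = order_markers(points)
recovered = [true_index[i] for i in order]  # true positions, up to orientation
```

The recovered order is only defined up to orientation, since a spanning tree has no preferred end. The cumulative path lengths in positions are distances in the projected space, not calibrated genetic distances.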

We have tested our new methodology on both whole-genome plant linkage datasets (particularly in pea, Arabidopsis [5], rice and barley [6]) and synthetic benchmark datasets. We believe our work to be a paradigm shift from current methods of linkage map estimation. The visualisation process it incorporates can give researchers a better understanding of their data, allowing them to avoid the problem of “black box” algorithms over-fitting their datasets. By combining proven techniques from graph partitioning with advances in manifold embedding and learning, we have introduced fresh ideas that can change the way that biologists view, literally, their genetic maps.

References:
[1] Meksem K and Kahl G (Eds.) (2005) The Handbook of Plant Genome Mapping, Wiley-VCH.
[2] Hastie T and Stuetzle W (1989) Journal of the American Statistical Association 84(406):502-516.
[3] Einbeck J, Tutz G and Evers L (2005) Statistics and Computing 15(4):301-313.
[4] Ng A, Jordan M and Weiss Y (2002) Proceedings of the NIPS Conference 2002.
[5] West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA and Michelmore RW (2006) Genome Research 16(6):787-795.
[6] Cuesta Marcos A and Russell J (2005) Oregon Wolfe Barley (OWB) dataset: http://barleyworld.org/oregonwolfebarleys/maps.php


Name                  Role
Dr Jitender Cheema    Lead developer
Dr Jo Dicks           Project co-ordinator
Prof. Noel Ellis      Original idea and project guidance

Category: THREaDMapper website
Links: Beta version now live
Status: Beta version. Version 1 planned for the end of December 2009.

Category: Application to real datasets
Links: Interactive figure for the Brachypodium genetic map manuscript
Figure 2: Brachypodium data set annotated by MapManager QTX linkage groups
Status: Garvin DF, McKenzie N, Vogel JP, Mockler TC, Blankenheim ZJ, Wright J, Cheema JJS, Dicks J, Huo N, Hayden DM, Gu Y, Tobias C, Chang JH, Chu A, Trick M, Michael TP, Bevan MW and Snape JW (2010) An SSR-based Genetic Linkage Map of the Model Grass Brachypodium distachyon. Genome (in press).

Category: Related papers
Links: Genetic mapping review
Status: Cheema J and Dicks J (2009) Computational approaches and software tools for genetic linkage maps estimation in plants. Briefings in Bioinformatics 10(6):595-608.

Designed by Virginia Barnard
LAST UPDATED: 24th November 2009