John Innes Centre, Norwich Research Park,
Colney Lane, Norwich, NR4 7UH, UK.

Telephone: +44(0) 1603 450597 (direct)
+44(0) 1603 450000 (switchboard)
Fax: +44(0)1603 450595


CGHdist is a phylogenetics program that estimates distance matrices from gene content data derived from comparative genome hybridisation (CGH) arrays. The method is described in our paper (Savva et al.), which is available for download.

From this page you can download CGHdist as source code (written in C) or as a Windows executable, along with the manual and a test dataset.


cghdist.c The CGHdist source code (in C)
cghdist.exe Windows executable

Usage of CGHdist

CGHdist uses PHYLIP file formats both for the input data and for the distance matrix it returns. For example, an input file containing gene content data for three strains, each compared by array CGH to a reference strain, might look like this:

3 7
Strain1   0011001
Strain2   0001001
Strain3   1001001

where each column refers to one of the genes found in the reference strain, and 1 or 0 denotes whether or not a homologue of that gene was identified in the test strain. The two numbers on the first line give the number of strains and then the number of characters per strain, which must be the same for every strain. The reference strain does not need to be included here; the example output below lists it as an additional Reference row.

Each strain name must be exactly ten characters long (including spaces, as above). After the name has been read the program reads characters one at a time until it has the required number of 0's and 1's, so any amount of whitespace may be present. PHYLIP's interleaved format may not be used.
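The reading rule above can be sketched in C as follows. This is only an illustration of the format as described on this page, not code taken from cghdist.c. The names read_strain and NAME_LEN are hypothetical, and treating any character other than 0, 1 or whitespace as an error is an assumption.

```c
#include <ctype.h>
#include <stdio.h>

#define NAME_LEN 10  /* fixed-width strain name, spaces included */

/* Reads one strain record from `in`: a NAME_LEN-character name followed by
 * n_chars characters that are each '0' or '1', with any whitespace between
 * them ignored. Returns 0 on success, or -1 on premature end of file or on
 * a character other than '0', '1' or whitespace (an assumed policy).
 * `name` needs NAME_LEN + 1 bytes and `states` needs n_chars + 1 bytes. */
int read_strain(FILE *in, int n_chars, char *name, char *states)
{
    int c, i;

    /* Skip the line break left over from the previous line. */
    do {
        c = fgetc(in);
    } while (c == '\n' || c == '\r');
    if (c == EOF)
        return -1;

    /* The name is exactly NAME_LEN characters long. */
    name[0] = (char)c;
    for (i = 1; i < NAME_LEN; i++) {
        if ((c = fgetc(in)) == EOF)
            return -1;
        name[i] = (char)c;
    }
    name[NAME_LEN] = '\0';

    /* Collect 0s and 1s one character at a time, ignoring whitespace. */
    i = 0;
    while (i < n_chars) {
        if ((c = fgetc(in)) == EOF)
            return -1;
        if (isspace(c))
            continue;
        if (c != '0' && c != '1')
            return -1;
        states[i++] = (char)c;
    }
    states[n_chars] = '\0';
    return 0;
}
```

A caller would first read the two counts from the first line, for example with fscanf(in, "%d %d", &n_strains, &n_chars), and then call read_strain once per strain.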

Since the infile format conforms to the PHYLIP style for representing binary morphological characters, bootstrapped datasets can be generated using PHYLIP's seqboot program.

The syntax for running CGHdist is

cghdist <infile> <outfile>

where <infile> and <outfile> are the names of the files you want to use. If the infile is not found, an error occurs. If the outfile already exists, you are asked if you would like to overwrite it.
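The file handling described above could be sketched in C along these lines. This is a guess at the general shape, not the code in cghdist.c. The function name open_files, the return codes and the wording of the messages are all hypothetical, and the answer to the overwrite question is read from a stream passed in by the caller (normally stdin) so that the behaviour is easy to try out.

```c
#include <stdio.h>

/* Hypothetical return codes for this sketch. */
enum { OPEN_OK = 0, OPEN_NO_INFILE = 1, OPEN_DECLINED = 2, OPEN_NO_OUTFILE = 3 };

/* Opens the infile for reading and the outfile for writing. If the outfile
 * already exists, a yes/no answer is read from `answers` before the file is
 * overwritten. On any failure no file is left open. */
int open_files(const char *in_path, const char *out_path, FILE *answers,
               FILE **in, FILE **out)
{
    FILE *probe;
    char line[16];

    *in = fopen(in_path, "r");
    if (*in == NULL) {
        fprintf(stderr, "cannot open infile %s\n", in_path);
        return OPEN_NO_INFILE;
    }

    /* An outfile that can be opened for reading already exists. */
    probe = fopen(out_path, "r");
    if (probe != NULL) {
        fclose(probe);
        fprintf(stderr, "%s already exists, overwrite? (y/n) ", out_path);
        if (fgets(line, sizeof line, answers) == NULL ||
            (line[0] != 'y' && line[0] != 'Y')) {
            fclose(*in);
            *in = NULL;
            return OPEN_DECLINED;
        }
    }

    *out = fopen(out_path, "w");
    if (*out == NULL) {
        fclose(*in);
        *in = NULL;
        return OPEN_NO_OUTFILE;
    }
    return OPEN_OK;
}
```

In a program's main function this would be called as open_files(argv[1], argv[2], stdin, &in, &out) after checking that two arguments were given.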

CGHdist will then prompt you for the model parameters to be used and for the number of datasets present in the file. The outfile containing the distance matrices is then created. For instance, with alpha and rho set to zero, the infile described above yields the following distance matrix:

Strain1   0 0.405465 0.81093 0.847298
Strain2   0.405465 0 0.405465 1.25276
Strain3   0.81093 0.405465 0 0.847298
Reference 0.847298 1.25276 0.847298 0

Project Information

CGHdist is distributed under the GNU General Public License.

DEVELOPER - George Savva

Contact the author to report bugs, give comments and suggestions, or ask to be kept up to date with future releases and bug fixes.

Designed and maintained by Virginia Barnard
LAST UPDATED: 21st September 2005