CGHdist is a phylogenetics program that estimates distance matrices from gene content data derived from comparative genome hybridisation (CGH) arrays. The method is described in our paper (Savva et al.), which is available for download.
From this page you can download CGHdist as source code (written in C) or Windows executable, along with the manual and a test dataset.
| FILE | DESCRIPTION | LAST UPDATED |
|---|---|---|
| cghdist.c | The CGHdist source code (in C) | |
| cghdist.exe | Windows executable | |
CGHdist uses PHYLIP file formats both for the input data and for the distance matrix it returns. For example, an input file holding gene content data for three strains, each compared with a reference strain by array CGH, might look like this:
    3 7
    Strain1   0011001
    Strain2   0001001
    Strain3   1001001
where each column refers to one of the genes found in the reference strain, and 0 or 1 denotes whether or not a homologue of that gene was identified in the test strain. The two numbers on the first line give the number of strains and then the number of characters per strain, which must be the same for every strain. The reference strain does not need to be included here.
Each strain name must be exactly ten characters long (including spaces, as above). After the name has been read, the program reads characters one at a time until it has the required number of 0s and 1s, so any amount of whitespace may be present between them. PHYLIP's interleaved format may not be used.
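The reading rule described above can be sketched in C roughly as follows. This is an illustration only, not code taken from cghdist.c: the function name `read_strain` and the constant `NAME_LEN` are hypothetical, and the sketch assumes that a strain name does not begin with whitespace, so that the line break left over from the previous record can be skipped safely.

```c
#include <stdio.h>
#include <ctype.h>

#define NAME_LEN 10  /* strain names are exactly ten characters wide */

/* Hypothetical helper, not part of CGHdist itself.
 * Reads one strain record from `in`: a fixed-width name followed by
 * n_chars characters that are each '0' or '1', ignoring any whitespace
 * between them. `data` must have room for n_chars + 1 bytes.
 * Returns 0 on success and -1 on truncated or malformed input. */
int read_strain(FILE *in, char name[NAME_LEN + 1], char *data, int n_chars)
{
    int c, i;

    /* Skip whitespace (such as the previous line break) before the name.
     * Assumption: names never start with a whitespace character. */
    do {
        c = fgetc(in);
    } while (c != EOF && isspace(c));
    if (c == EOF)
        return -1;

    /* The name is the next ten characters, spaces included. */
    name[0] = (char)c;
    for (i = 1; i < NAME_LEN; i++) {
        c = fgetc(in);
        if (c == EOF || c == '\n')
            return -1;
        name[i] = (char)c;
    }
    name[NAME_LEN] = '\0';

    /* Read one character at a time until n_chars 0s and 1s are collected. */
    i = 0;
    while (i < n_chars) {
        c = fgetc(in);
        if (c == EOF)
            return -1;          /* ran out of data */
        if (isspace(c))
            continue;           /* whitespace is allowed anywhere */
        if (c != '0' && c != '1')
            return -1;          /* unexpected character */
        data[i++] = (char)c;
    }
    data[n_chars] = '\0';
    return 0;
}
```

A caller would first read the two header numbers, for example with `fscanf(in, "%d %d", &n_strains, &n_chars)`, and then call `read_strain` once per strain. Because the data characters are counted rather than read line by line, a record may be split across several lines, although this is not the same thing as PHYLIP's interleaved layout.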
Since the infile format conforms to the PHYLIP style for representing binary morphological characters, bootstrapped datasets can be generated using PHYLIP's seqboot program.
The syntax for running CGHdist is
cghdist <infile> <outfile>
where <infile> and <outfile> are the names of the files you want to use. If the infile is not found, an error occurs. If the outfile already exists, you are asked if you would like to overwrite it.
CGHdist will then prompt you for the model parameters to be used and for the number of datasets present in the file. The outfile containing the distance matrices is then created. For instance, with alpha and rho set to zero, the infile described above yields the following distance matrix:
    4
    Strain1   0 0.405465 0.81093 0.847298
    Strain2   0.405465 0 0.405465 1.25276
    Strain3   0.81093 0.405465 0 0.847298
    Reference 0.847298 1.25276 0.847298 0
CGHdist is distributed under the GNU General Public License (see www.gnu.org/licenses/licenses.html for more information).
DEVELOPER - George Savva
Contact the author at g.savva@qmul.ac.uk to report bugs, to send comments and suggestions, or if you would like to be kept up to date with future releases and bug fixes.
Designed and maintained by Virginia Barnard
LAST UPDATED: 21st September 2005