|
Gene and/or genome duplication has been postulated as the driving force for the introduction of novel gene functions. The analysis of the impact of gene expansion on the
introduction of novel gene functions is made possible through the availability of sequenced genomes for a range of species. At present, comprehensive identification of gene
duplications in an organism requires powerful computational resources and the bioinformatics expertise to execute various protocols, including running phylogenetic software such as PHYLIP and
LINTREE on a genome scale, in addition to implementing protocols for system and data management. |
|
We have addressed these issues by developing the GDD application, which implements the protocols for gene duplication analysis outlined in Christoffels et al. (2004).
Using the GDD tool, a biologist can submit one or many protein families online. With a single click, the specified protein family is sent for processing, and the following are returned to
the user: (1) the gene duplication(s) detected, (2) phylogenetic trees, (3) the probabilities that genes evolve at the same rate (i.e., under a molecular clock) and (4) an estimate of the age of each
duplication, extrapolated from the linearized tree. The analysis is run on a high-performance Load Sharing Facility (LSF) cluster for parallel processing to improve
job performance and efficiency. The entire protocol (flow chart) is divided into small stages, where each
stage represents one logical process, normally the execution of a set of instructions. |
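Output (4) above, the age estimate extrapolated from a linearized tree, can be illustrated with a short calculation. The sketch below is not GDD's own code. It assumes that node heights in a linearized tree are proportional to time, so a single calibration node of known age fixes the substitution rate. The function name `duplication_age`, its parameters and the numeric values are hypothetical and chosen only for illustration.

```python
def duplication_age(dup_height, calib_height, calib_age_mya):
    """Estimate the age of a duplication node from a linearized tree.

    Assumption: under a molecular clock, node height (substitutions per
    site) is proportional to time, so one calibration node gives the rate.
    """
    if calib_height <= 0 or calib_age_mya <= 0:
        raise ValueError("calibration height and age must be positive")
    # Rate in substitutions per site per million years.
    rate = calib_height / calib_age_mya
    # Age of the duplication in million years.
    return dup_height / rate


# Hypothetical values: a calibration node at height 0.45 dated to 450 Mya,
# and a duplication node at height 0.30 in the same linearized tree.
age = duplication_age(dup_height=0.30, calib_height=0.45, calib_age_mya=450.0)
print(f"Estimated duplication age: {age:.1f} Mya")
```

The estimate is only as reliable as the clock assumption, which is why the tool also reports the probability that the genes evolve at the same rate.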
| Features in GDD: |
| ∗ able to search for duplicate genes in single or multiple families |
| ∗ online tracking of the analysis status at each stage (e.g., for single or multiple submissions) |
| ∗ results/data from each stage are available as soon as that stage completes, without waiting for the entire protocol to finish |
| ∗ able to view the resulting phylogenetic trees in both PostScript and PNG formats |
| ∗ parse the successful families and summarise all duplicate genes and their estimated duplication ages |
| ∗ parse the failed families and consolidate the error codes associated with each family ID |
| ∗ support download of both completed duplicate gene pairs and failed family IDs in tab-delimited text
format |
| ∗ retain and display past submissions in history mode |
| ∗ download the entire analysis result in tarball or zip format |
| ∗ re-run any stage(s) |
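The stage-related features listed above (per-stage status tracking, results available as each stage completes, and re-running from a chosen stage) can be sketched as follows. This is a minimal illustration, not GDD's implementation: the class `StagedRun`, the `Status` values and the two placeholder stages are hypothetical, and the placeholder functions stand in for real steps such as alignment or tree building without calling PHYLIP, LINTREE or LSF.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


class StagedRun:
    """Run an ordered list of (name, function) stages, tracking each one."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.status = {name: Status.PENDING for name, _ in self.stages}
        self.results = {}  # stage name -> output, filled in as stages finish
        self.errors = {}   # stage name -> error message for failed stages

    def run(self, data, start=0):
        """Run stages from index `start`; stop at the first failure."""
        # Reset the state of every stage that is about to be (re)run.
        for name, _ in self.stages[start:]:
            self.status[name] = Status.PENDING
            self.results.pop(name, None)
            self.errors.pop(name, None)
        current = data
        for name, func in self.stages[start:]:
            try:
                current = func(current)
            except Exception as exc:
                self.status[name] = Status.FAILED
                self.errors[name] = str(exc)
                return False
            self.status[name] = Status.DONE
            self.results[name] = current  # available before later stages end
        return True

    def rerun_from(self, name, data=None):
        """Re-run from stage `name`, reusing the previous stage's output."""
        names = [n for n, _ in self.stages]
        index = names.index(name)
        if index > 0:
            previous = names[index - 1]
            if self.status[previous] is not Status.DONE:
                raise RuntimeError(f"stage {previous!r} has no completed result")
            data = self.results[previous]
        return self.run(data, start=index)


# Placeholder stages (hypothetical): each takes the previous stage's output.
run = StagedRun([
    ("align", lambda seqs: sorted(seqs)),
    ("tree", lambda aligned: {"leaves": aligned}),
])
ok = run.run(["geneB", "geneA"])
print(ok, {name: status.value for name, status in run.status.items()})
```

Keeping each stage's output under its own name is what makes re-running cheap: `run.rerun_from("tree")` restarts at the tree stage using the stored alignment result rather than repeating earlier work.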
| Click here to view a summary of the GDD application,
including some screenshots. |