uni of otago logo
Otago School of Medical Sciences

Integrated Genomics Resources for Health and Disease - Project overview









Transterm: A database of mRNA sequences and elements


A Transterm analysis window



Conservation of a regulatory element during evolution

mRNA related links (from the Transterm database)



Bioinformatics is the science required to capture, store and comprehend large amounts of biological data - for example the three billion bases of the human genome.

This new field is a blend of biology, mathematics and computer science. Current research at the University of Otago includes the creation and mining of international genetic databases, for example the human genome. A more technical outline is here:

Regulatory elements in the human genome

Many genes provide the instructions to make proteins - they 'code' for proteins. These proteins have specific roles in cells. Massive worldwide efforts, including the Human Genome Project, have discovered most of the protein coding genes in the human genome. There are over 25,000 human protein coding genes, encoding proteins as diverse as haemoglobin (oxygen carrier), insulin (hormone), or trypsin (digestive enzyme).

Surprisingly, protein coding regions make up only about 1% of the three billion base genome. Most of the rest of the human genome was once considered to be 'junk' DNA.

However, ongoing animal genome sequencing projects have shown that a much larger amount, about 3-5% of the genome, is similar between vertebrates. Different species, for example humans, chickens and dogs, therefore share conserved regions in addition to those needed to code for proteins. These parts of the genome have also been conserved during the evolution of vertebrates.
These conserved parts are expected to contain most of the information required to program cells to perform different functions, for example to control, or 'turn on', genes in the right cells or at the right times: haemoglobin in blood cells, trypsin in the digestive system, or the synthesis of monoamine oxidase in the brain. This information directs cells to develop into different types, to divide or die, and to perform their proper function. If the process goes wrong, cells may grow aberrantly, as cancer cells do.
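As a rough illustration of what 'similar between vertebrates' means in practice, the Python sketch below computes the percentage of matching bases between two aligned sequences. It is a minimal example only: the function name and the two short sequences are made up for illustration, and it is not the method used by the project, which relies on genome-scale alignments.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned, non-gap positions at which two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # a gap in either sequence: skip this position
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, invented for this example (not real genome data).
human_fragment = "ACGTTGCA-TGCAAGT"
chicken_fragment = "ACGTTGCATTGCTAGT"

identity = percent_identity(human_fragment, chicken_fragment)
print(f"{identity:.1f}% identical over the aligned positions")
```

Regions that stay highly similar across distantly related species, beyond what is needed to code for proteins, are the candidates for conserved regulatory information.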

Ongoing research at the University of Otago

The Brown group aims to decipher this control or regulatory information. Much of this information is found in the regions before the coding regions for proteins. Deciphering this information in cattle and sheep is being done in collaboration with AgResearch.

Another part of the regulation is carried out by regions within the molecules that encode the proteins, the messenger RNAs (mRNAs). Each protein coding region produces mRNAs that are translated into the appropriate protein. However, about 40% of this mRNA is not translated; these parts are known as the untranslated regions (UTRs). Some of these regions contain regulatory information. Well known human examples include information for the regulation of iron balance and inflammation.
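To make the 'about 40%' figure concrete, the Python sketch below works out the untranslated fraction of a single mRNA from the position of its coding sequence. The class, the function names and the example lengths are hypothetical, chosen only so that the arithmetic is easy to follow; they are not taken from Transterm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    length: int     # total mRNA length in nucleotides
    cds_start: int  # 0-based position of the first coding nucleotide
    cds_end: int    # position one past the last coding nucleotide

    def __post_init__(self) -> None:
        if not 0 <= self.cds_start < self.cds_end <= self.length:
            raise ValueError("coding sequence must lie within the transcript")


def utr_lengths(transcript: Transcript) -> tuple[int, int]:
    """Return the lengths of the 5' UTR and the 3' UTR."""
    five_prime = transcript.cds_start
    three_prime = transcript.length - transcript.cds_end
    return five_prime, three_prime


def untranslated_fraction(transcript: Transcript) -> float:
    """Fraction of the mRNA that lies outside the coding sequence."""
    five_prime, three_prime = utr_lengths(transcript)
    return (five_prime + three_prime) / transcript.length


# Hypothetical mRNA: 150 nt 5' UTR, 1500 nt coding sequence, 850 nt 3' UTR.
example = Transcript(length=2500, cds_start=150, cds_end=1650)
print(utr_lengths(example))
print(f"{untranslated_fraction(example):.0%} of this mRNA is untranslated")
```

In this made-up case the two UTRs together account for 1000 of 2500 nucleotides, which matches the proportion quoted above.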

Mutations in these elements are associated with disease, as are the better known mutations in the protein encoding regions and other regulatory elements. Such mutations have been shown to contribute to some cases of disorders including cancer, heart disease, arthritis and diabetes (reviewed in Chen et al 2006).
The Brown group have developed a new integrated approach to decipher this information. Analysis has utilised high throughput genomic experimental data and the Brown group's bioinformatic techniques. This identified regions subject to 'purifying selection' that are also similar to known functional elements and that we predict are key regulators of gene expression.
The outcome of these studies will be to greatly extend the range of known functional RNA elements in the human genome.

Mining large amounts of sequence data: Integrated Genomics Resources for Health and Disease

Genomic information is stored in large repositories located around the world. Data is continuously poured into these databases, then copied or mirrored to individual countries or institutions. Currently the international 'biomirror' of sequence databases contains ~500 Gb of dynamic data. Parts of this data, for example the human genome (just 3 Gb of raw sequence data, but 70 Gb of associated information about disease and similarities to other species), are downloaded to the University of Otago and mined for key pieces of information.

The project uses the international reach of New Zealand's new Research and Education Network (KAREN) to combine human genetic databases from the USA (eg UCSC, CaBig, GenePattern) and Europe (eg ENSEMBL) with NZ generated data (eg UOGF). Using NZ developed integration tools (eg Transterm), this information will be provided to the international medical and biotechnology research and education community. The services are now available at Bioanalysis.otago.ac.nz.

This project was recently funded by a grant to Dr Chris Brown and Dr Mik Black from REANNZ as part of the KAREN capability building fund, and is an example of eResearch at Otago. Videos of presentations on the KAREN network, including one by Mik Black on this project, are available here.

Links to related research at the University of Otago



Data Sources

Comparative genomics - Ensembl, Europe

The National Center for Biotechnology Information, USA

The Human Genome - Galaxy Browser, USA

The Human Mutation Database, UK

The Wellcome Trust Case Control Consortium (WTCCC), UK

The Encyclopedia of DNA Elements (ENCODE) project, USA

The Cancer Array Database, USA

Ongoing vertebrate genome projects - data used by the project


Human Genome


Cow genome, Bovine genome


Dog genome






Marsupial genome





A typical human mRNA, drawn approximately to scale for Wikipedia (mRNA). The main function of an mRNA is to encode a protein. The coding sequence (green) does this, but much of the mRNA is not translated (untranslated regions (UTRs), yellow and purple). These untranslated parts may contain regulatory sequences; see UTRPathDB.


