Problems and Solutions for Estimating Indel Rates and Length Distributions


My latest paper has now been published by Molecular Biology and Evolution (MBE). You can grab your free reprint by following this link. I blogged earlier about some of the implications of this research, which you can read here.


Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.
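To give a feel for why the choice of length model matters, here is a minimal Python sketch comparing tail probabilities under a zeta power-law model, P(k) = k^(-z) / zeta(z), and a geometric model, P(k) = (1 - q) q^(k-1). This is not the code from the paper. The value z = 1.65 is simply picked from the 1.6-1.7 range reported above, and matching the two models on the probability of a length-1 indel is an assumption made only for illustration. The function names are hypothetical.

```python
import math


def riemann_zeta(z, n_terms=1000):
    """Approximate zeta(z) for z > 1.

    Sums the first n_terms - 1 terms directly and adds an
    Euler-Maclaurin estimate of the remaining tail.
    """
    if z <= 1:
        raise ValueError("zeta(z) requires z > 1")
    n = n_terms
    head = sum(k ** -z for k in range(1, n))
    tail = n ** (1 - z) / (z - 1) + 0.5 * n ** -z + z * n ** (-z - 1) / 12
    return head + tail


def zeta_pmf(k, z):
    """P(indel length = k) under the zeta power-law model, k >= 1."""
    return k ** -z / riemann_zeta(z)


def zeta_tail(k0, z):
    """P(indel length >= k0) under the zeta model."""
    norm = riemann_zeta(z)
    return 1.0 - sum(k ** -z for k in range(1, k0)) / norm


def geometric_tail(k0, q):
    """P(indel length >= k0) under a geometric model with extension prob q."""
    return q ** (k0 - 1)


# Illustrative assumption: z taken from the reported 1.6-1.7 range, and
# the geometric model matched so both give the same P(length = 1).
z = 1.65
q = 1.0 - zeta_pmf(1, z)

for k0 in (5, 20, 50):
    print(k0, zeta_tail(k0, z), geometric_tail(k0, q))
```

Under these assumptions, the geometric tail falls off exponentially while the power-law tail falls off polynomially, so long indels that are unremarkable under the zeta model become vanishingly unlikely under the geometric one.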

Reference: Cartwright RA (2009) Problems and solutions for estimating indel rates and length distributions. Molecular Biology and Evolution. 26(2):473–480

About this Entry

This page contains a single entry by Reed A. Cartwright published on January 15, 2009 6:07 PM.
