Recently in Biology Category

I’ve had the good fortune of having some papers published recently. The first one is a methodology paper concerning a way of extracting phylogenetic information from regions of multiple sequence alignments that are full of indels and difficult to align:

PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination. (link)

Robert Lücking, Brendan P Hodkinson, Alexandros Stamatakis and Reed A Cartwright

BMC Bioinformatics 2011, 12:10 doi:10.1186/1471-2105-12-10

My co-author, Brendan Hodkinson, has already covered it on his blog.

In molecular biology, an alignment is a partial reconstruction of the evolutionary history of a group of sequences. In an alignment, all residues found in the same column are considered to be descended from a single residue in the ancestral sequence. (Of course, insertions violate this description, but I won’t get into that.) Alignments are not direct observations; they are inferences based on the patterns of sequences found in the dataset. Often there are particular areas in which the alignment is difficult to resolve. Take this example:

problemalign.png

A typical problem in multiple sequence alignments where a section is full of gaps and contains a complicated phylogenetic signal. Dark red: high certainty that the alignment is accurate; dark blue: low certainty that the alignment is accurate.

It was constructed via the GUIDANCE webserver. (A great resource that everyone should use.) In this example, we have a region defined by a lot of sequence variation created by many insertions and deletions. The alignment is not well defined here, and in most applications it will just be removed, and the data “thrown away”.

But is this the only solution? In our paper we develop a methodology, dubbed PICS-Ord (download), that provides an easy solution for extracting phylogenetic information from problematic regions chosen by its user. PICS-Ord works through a three-step process:

  1. Realign the segments in pairs using Ngila, and calculate the likelihood of the alignment from an evolutionary model. This produces a distance matrix of the segments.
  2. Ordinate the distance matrix using principal coordinate analysis (PCoA). This assigns each segment to a point in n-1-dimensional space.
  3. Quantize each dimension into a set of characters.
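Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not the PICS-Ord code itself: it assumes NumPy is available, the function names `pcoa` and `quantize` are made up for the example, and the binning rule (one shared bin width, so that minor axes get fewer states than major ones) is an assumption about how the quantization might be done. Step 1, the pairwise realignment with Ngila, is replaced here by a toy distance matrix.

```python
import numpy as np


def pcoa(dist):
    """Classical principal coordinate analysis of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1]  # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > 1e-9 * evals[0]  # drop zero and negative axes
    return evecs[:, keep] * np.sqrt(evals[keep])


def quantize(coords, max_states=10):
    """Turn each axis into integer character states (assumed binning scheme).

    All axes share the bin width of the longest axis, so axes that explain
    little variation collapse to only a few states.
    """
    spans = coords.max(axis=0) - coords.min(axis=0)
    width = spans.max() / max_states
    codes = np.floor((coords - coords.min(axis=0)) / width).astype(int)
    return np.clip(codes, 0, max_states - 1)


# Toy stand-in for the distance matrix from step 1: four points on a line.
segments = [0.0, 1.0, 2.0, 4.0]
toy_dist = [[abs(a - b) for b in segments] for a in segments]
toy_codes = quantize(pcoa(toy_dist))
for row in toy_codes:
    print("".join(str(state) for state in row))
```

Each printed row is one segment's string of ordered character states, the same shape of output as a PICS-Ord character matrix.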

This might seem a bit odd at first. “Why not just use the distance matrix directly?” That would be great if we could, but there aren’t any phylogenetic programs that we know of that allow the mixing of distance matrices and sequence data. With our method, we get discrete, ordered characters that can be used in popular programs like RAxML.

There are three example files in the PICS-Ord distribution, and I’ll illustrate its usage with example1.fas. The alignment of these sequence fragments is messy:

 100 114
sequence_001 ----------------------------------------------------------------------------tatactatcta---------------------------
sequence_002 -------------------------------------------------------------------aattgtatttatactatata---------------------------
sequence_003 -------------------------------------------------------------------tttaagatttattctatatt---------------------------
sequence_004 tttaggattaattttata--------------------------------------------------------taatactaatata---------------------------
sequence_005 -------------gatgg--------------------------------------------------------ttttacctatata---------------------------
sequence_006 ---------------------------------------------------------------------------tatcattatgca---------------------------
sequence_007 ---------------------------------------------------------------------------tatcattatgca---------------------------
sequence_008 -------------------------------------------------------------------------atatgtttaagata---------------------------
sequence_009 -------------------------------------------------------------------------atatgtttaagata---------------------------
sequence_010 -------------------------------------------------------------------------atatgtttaagata---------------------------
sequence_011 -------------gtac----------------------------------------------------------aattataatata---------------------------
sequence_012 -------------gtac----------------------------------------------------------aattataatata---------------------------
sequence_013 -------------gtac----------------------------------------------------------taatttaatata---------------------------
sequence_014 -------------ctac-----------------------------------------------------------aatataatata---------------------------
sequence_015 -------------ctac-----------------------------------------------------------aatataatata---------------------------
sequence_016 -------------ctac-----------------------------------------------------------attaaaatata---------------------------
sequence_017 -------------ctac-----------------------------------------------------------attaaaatata---------------------------
sequence_018 -------------gtat-----------------------------------------------------------aatttaatcta---------------------------
sequence_019 -------------gtat-----------------------------------------------------------attttaatcta---------------------------
sequence_020 -------------------------------------------------------------------------------ataagata---------------------------
sequence_021 -------------------------------------------------------------------------------ataagata---------------------------
sequence_022 --------------------------------------------------------------------------attataattaata---------------------------
sequence_023 --------------------------------------------------------------------------attataattaata---------------------------
sequence_024 -------------------------------------------------------------------------------ataagata---------------------------
sequence_025 -------------------------------------------------------------------------------ataagata---------------------------
sequence_026 ----------------------------------------------------------------------------aaaaaaaaata---------------------------
sequence_027 -----------------------------------------------------------------------------aaaaaaaata---------------------------
sequence_028 -------------------------------------------------------------------------------acaaaata---------------------------
sequence_029 -------------------------------------------------------------------------------acaagata---------------------------
sequence_030 --------------------------------------------------------------------------------acaaata---------------------------
sequence_031 -------------------------------------------------------------------------------acaaaata---------------------------
sequence_032 -------------gaat-----------------------------------------------------------aatattaaata---------------------------
sequence_033 -------------gaat-----------------------------------------------------------aatattaaata---------------------------
sequence_034 -------------gaaa-----------------------------------------------------------aatattaaata---------------------------
sequence_035 -------------gtat-----------------------------------------------------------tctttaatata---------------------------
sequence_036 -------------gtat-----------------------------------------------------------tatttaatcta---------------------------
sequence_037 -------------gtat-----------------------------------------------------------tatttaatata---------------------------
sequence_038 -------------gtat-----------------------------------------------------------tatttaatcta---------------------------
sequence_039 -----------------------------------------------------------------------------gttttatata---------------------------
sequence_040 -----------------------------------------------------------------------------gtttaatata---------------------------
sequence_041 -------------------------------------------------------------------------atcagtttaatacg------------------ctgagtgat
sequence_042 -------------------------------------------------------------------------accagtttaattta------------------ctgggtgat
sequence_043 ----------------------------------------------------------------------------------------------ctcagtttctgctgagtggt
sequence_044 ----------------------------------------------------------------------------agtttaatatg------------------ctgattgat
sequence_045 --------------------------------------------------------------------------------atatgta---------------------------
sequence_046 --------------------------------------------------------------------------------atatgta---------------------------
sequence_047 --------------------------------------------------------------------------------ataagta---------------------------
sequence_048 --------------------------------------------------------------------------------ataagta---------------------------
sequence_049 --------------------------------------------------------------------------------ataagta---------------------------
sequence_050 --------------------------------------------------------------------------------atatgta---------------------------
sequence_051 -----------------------------------------------------------------------------gttttctaat---------------------------
sequence_052 -----------------------------------------------------------------------------gtttactaaa---------------------------
sequence_053 -----------------------------------------------------------------------------gtttactaat---------------------------
sequence_054 -----------------------------------------------------------------------------gtttactaat---------------------------
sequence_055 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_056 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_057 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_058 -----------------------------------------------------------------------------gtttactgaa---------------------------
sequence_059 -----------------------------------------------------------------------------gtttactgaa---------------------------
sequence_060 -----------------------------------------------------------------------------gtttactgaa---------------------------
sequence_061 -----------------------------------------------------------------------------gttagctgaa---------------------------
sequence_062 -----------------------------------------------------------------------------gttagctgaa---------------------------
sequence_063 -----------------------------------------------------------------------------gttagctgaa---------------------------
sequence_064 -------------------------------------------------------------------------------gttt-aaa---------------------------
sequence_065 -------------------------------------------------------------------------------gttt-aaa---------------------------
sequence_066 -------------------------------------------------------------------------------gttt-aaa---------------------------
sequence_067 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_068 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_069 -----------------------------------------------------------------------------atttacttaa---------------------------
sequence_070 -----------------------------------------------------------------------------atttacttaa---------------------------
sequence_071 -----------------------------------------------------------------------------atttacttaa---------------------------
sequence_072 ---------------------------------------------------------------------------------gttaaa---------------------------
sequence_073 ---------------------------------------------------------------------------------gttaaa---------------------------
sequence_074 aattttattaattactttagtaattaataaggttattttaagtaacagcaaaatattagttaaaagcgttgct-tgcaattagtaaagt--------------agca-ttatta
sequence_075 aattatattaattactttagtaattaaatttgttatttttagtaacagcaaaatattagttacaagcgttgct-tgtaattagtaaagt--------------agca-ttatta
sequence_076 ---------------------------------------------------------------------------------ttttta---------------------------
sequence_077 ---------------------------------------------------------------------------------ttttta---------------------------
sequence_078 ---------------------------------------------------------------------------------ttttta---------------------------
sequence_079 ---------------------------------------------------------------------------------ttttta---------------------------
sequence_080 -------------gaag-----------------------------------------------------------attaataacta---------------------------
sequence_081 -----------------------------------------------------------------------------atttatatta---------------------------
sequence_082 -----------------------------------------------------------------------------atttatatta---------------------------
sequence_083 actcctact------ttaaacatttagtagtgtcgaacctactgatagcatctggttttctattgg--------tacttataacataaccactaaatatttagagtattaatta
sequence_084 actcctact------ttaaacatttagtagtgtcgaacctactgatagcatctggttttctattgg--------tacttataacataaccactaaatatttagagtattaatta
sequence_085 -------------gaaa----------------------------------------------------------taacagtaacta---------------------------
sequence_086 -------------aaag-----------------------------------------------------------attagtaacta---------------------------
sequence_087 aattttaca------tttagtttttaatctttatgtttaaaa----acatgtatgctatttatatg--------tatatataatatagt--------------agaacttacaa
sequence_088 aattttact-------------------ttgggt-tttaaaa----actagtatgctatgtttatatattaatttatatatcatatagt--------------agaacttacaa
sequence_089 aattttact------ctt--tttttaagttttat-atttaaa----atctgtatgctatgtttatatattaatttatatataatatagt--------------agaacttacaa
sequence_090 aattttact------ctt--tttttaagttttat-atttaaa----atctgtatgctatgtttatatattaatttatatataatatagt--------------agaacttacaa
sequence_091 -------------gtac-----------------------------------------------------------ataataatata---------------------------
sequence_092 -------------gtaca--------------------------------------------------------taataataatata---------------------------
sequence_093 -------------gtaca--------------------------------------------------------taataataatata---------------------------
sequence_094 -------------gtac-----------------------------------------------------------ataataatata---------------------------
sequence_095 ---------------------------------------------------------------ttttttataccaataaataatata---------------------------
sequence_096 ---------------------------------------------------------------ttttttataccaataaataatata---------------------------
sequence_097 ---------------------------------------------------------------ctatttata-taataaataatata---------------------------
sequence_098 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------
sequence_099 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------
sequence_100 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------

But instead of throwing it away, you can process it with PICS-Ord and get a clean set of ordered characters that contain approximately the same phylogenetic information as the sequences above.

    100    20
sequence_001 53221002101000000010
sequence_002 44121113101010000000
sequence_003 53211103102011000100
sequence_004 53321103111000010100
sequence_005 53211003101000001000
sequence_006 53221002001000000000
sequence_007 53221002001000000000
sequence_008 43220112011000000000
sequence_009 43220112011000000000
sequence_010 43220112011000000000
sequence_011 53221012011000000000
sequence_012 53221012011000000000
sequence_013 53221012011000100000
sequence_014 53321012001010100000
sequence_015 53321012001010100000
sequence_016 53221013001000000000
sequence_017 53221013001000000000
sequence_018 53221012001000100000
sequence_019 53221012001000001000
sequence_020 53220102011000000000
sequence_021 53220102011000000000
sequence_022 53121012011010000000
sequence_023 53121012011010000000
sequence_024 53220102011000000000
sequence_025 53220102011000000000
sequence_026 53220002001000000000
sequence_027 53220002001000000000
sequence_028 53120102011000000000
sequence_029 53220102011000000000
sequence_030 53120002011000000000
sequence_031 53120102011000000000
sequence_032 53220002111100000000
sequence_033 53220002111100000000
sequence_034 53220002111000000000
sequence_035 53221012011000001000
sequence_036 53221012001000000000
sequence_037 53221012011000000000
sequence_038 53221012001000000000
sequence_039 53221102111000000000
sequence_040 53211002011000000000
sequence_041 53300112011000000000
sequence_042 53200112011010001010
sequence_043 53300103001100001001
sequence_044 53200112001110000000
sequence_045 53120112001000000000
sequence_046 53120112001000000000
sequence_047 53120102001000000000
sequence_048 53120102001000000000
sequence_049 53120102001000000000
sequence_050 53120112001000000000
sequence_051 53211002011000000000
sequence_052 53111002011000000000
sequence_053 53211002011000000000
sequence_054 53211002011000000000
sequence_055 53110002001010000000
sequence_056 53110002001010000000
sequence_057 53110002001010000000
sequence_058 43111002011000000000
sequence_059 43111002011000000000
sequence_060 43111002011000000000
sequence_061 53201002011000000000
sequence_062 53201002011000000000
sequence_063 53201002011000000000
sequence_064 43111002011000000000
sequence_065 43111002011000000000
sequence_066 43111002011000000000
sequence_067 53110002001010000000
sequence_068 53110002001010000000
sequence_069 43111002011000000000
sequence_070 43111002011000000000
sequence_071 43111002011000000000
sequence_072 53111002001000000000
sequence_073 53111002001000000000
sequence_074 59021102011001100000
sequence_075 59020012001110001000
sequence_076 53121102001100000000
sequence_077 53121102001100000000
sequence_078 53121102001100000000
sequence_079 53121102001100000000
sequence_080 53220002001100000000
sequence_081 53121102001000000000
sequence_082 53121102001000000000
sequence_083 90021002001000000000
sequence_084 90021002001000000000
sequence_085 53220002000100000000
sequence_086 53120002000101000000
sequence_087 02121100001000000000
sequence_088 02020003000010100000
sequence_089 02021013011100000000
sequence_090 02021013011100000000
sequence_091 53321002001000000000
sequence_092 53321002011000000000
sequence_093 53321002011000000000
sequence_094 53321002001000000000
sequence_095 53321202010100001000
sequence_096 53321202010100001000
sequence_097 43321102011000001000
sequence_098 53321003001000000000
sequence_099 53321003001000000000
sequence_100 53321003001000000000

June Conferences


This month I’m going to two conferences: SMBE and Evolution.

At SMBE, I’ll be giving a poster in poster session #2 on Friday, June 5th. Drop by between 8–9pm and take your picture with Prof. Steve Steve.

At Evolution, I’ll be giving my talk on Tuesday, June 16th at 11am, in the “Population Genetic Modeling” section.

The final chapter of my dissertation has finally been published in Molecular Ecology. This is the project that got me involved with the software SPAGeDi. Although none of that work remains in the final version of the paper, I have successfully collaborated with the authors of SPAGeDi to make it portable to Linux and OS X. The portable version will soon be made public.

Anyway, the citation of my paper is

Cartwright RA (2009) Antagonism between local dispersal and self-incompatibility systems in a continuous plant population. Molecular Ecology 18:2327-2336. [doi:10.1111/j.1365-294X.2009.04180.x]

Unfortunately, a free version is not yet available online. The research was partially funded by NIH, so a copy should show up in PubMed in several months. Until then, you can email me at [Enable javascript to see this email address.] (NB this is not my usual address), and I’ll send you a reprint.

Abstract: Many self-incompatible plant species exist in continuous populations in which individuals disperse locally. Local dispersal of pollen and seeds facilitates inbreeding because pollen pools are likely to contain relatives. Self-incompatibility promotes outbreeding because relatives are likely to carry incompatible alleles. Therefore, populations can experience an antagonism between these forces. In this study, a novel computational model is used to explore the effects of this antagonism on gene flow, allelic diversity, neighborhood sizes, and identity-by-descent. I confirm that this antagonism is sensitive to dispersal levels and linkage. However, the results suggest that there is little to no difference between the effects of gametophytic and sporophytic SI on unlinked loci. More importantly both GSI and SSI affect unlinked loci in a manner similar to obligate outcrossing without mating types. This suggests that the primary evolutionary impact of self-incompatibility systems may be to prevent selfing, and prevention of biparental inbreeding might be a beneficial side effect.

Oak Toe Lichen

DSC00070-sm.JPG

Oak Toe Lichen, Carolina Beach State Park

When a novel, adaptive mutation arises in a population, it is more likely to go extinct than to go to fixation. This is because rare alleles can be lost from the population by chance through accidental deaths, chromosomal segregation, or other forces behind genetic drift.

In 1955 Motoo Kimura used what’s known as diffusion theory to find formulas for calculating the probability that a novel allele with selection coefficient goes to fixation. For example, take a haploid, Wright-Fisher population of size that consists of only A individuals with fitness . Assume that one of those individuals mutates to B, such that the . From this setup, Kimura found that the probability that B will eventually become fixed in the population and A goes extinct is approximately

under the right assumptions. For diploids with fitnesses , , and . This is

Sella and Hirsh (2005) use intuition to modify Kimura’s results and derive some better and more useful alternatives for the above equations, which we use in our work for their nice properties. Sella and Hirsh give the following equation in their paper:

where if the population is haploid and if the population is diploid “with multiplicative fitness within loci. …” Unfortunately, they did not specify the diploid model that they were using because there are two different ones used in the literature.

The first one is used above, , , and .

The second one is , , and .

As you can see, is not measuring the same thing in both models. In the first approach, ; in the second . This is important because accidentally mixing up the models will lead to erroneous results.
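To see how much the choice of diploid model matters numerically, here is a minimal sketch based on the standard diffusion approximation for a single new mutant under genic selection, u ≈ (1 − e^(−2h)) / (1 − e^(−4Nh)), where h is the selective advantage of the heterozygote and N is the diploid population size. The two parameterizations below are assumptions chosen for illustration (heterozygote fitness 1 + s in the first, homozygote fitness 1 + s with an additive heterozygote 1 + s/2 in the second), as are the function names; they may not match the exact fitness schemes intended above.

```python
import math


def fixation_probability(het_advantage, n_diploid):
    """Diffusion approximation for one new mutant copy among 2N gene copies.

    het_advantage is the heterozygote's selective advantage; the mutant
    homozygote is assumed to have twice that advantage (genic selection).
    """
    if het_advantage == 0:
        return 1.0 / (2 * n_diploid)  # neutral limit
    # expm1(-x) = e^(-x) - 1; the two minus signs cancel in the ratio.
    return math.expm1(-2.0 * het_advantage) / math.expm1(-4.0 * n_diploid * het_advantage)


def fix_model_one(s, n_diploid):
    """Assumed model 1: fitnesses 1, 1 + s, 1 + 2s (s belongs to the heterozygote)."""
    return fixation_probability(s, n_diploid)


def fix_model_two(s, n_diploid):
    """Assumed model 2: fitnesses 1, 1 + s/2, 1 + s (s belongs to the homozygote)."""
    return fixation_probability(s / 2.0, n_diploid)


print(fix_model_one(0.01, 1000))
print(fix_model_two(0.01, 1000))
```

With s = 0.01 and N = 1000, the two assumed models differ by roughly a factor of two (about 0.0198 versus 0.00995), which is the kind of discrepancy that appears when a formula derived under one model is fed a selection coefficient defined under the other.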

The difference between these two diploid models came up last week when we were applying Sella and Hirsh (2005) to one of our projects. For diploids, we just couldn’t get their approximation to work (), and we suspected that they were using the second model, while we had read their paper to imply the first model. We looked back at their paper and realized that they neglect to specify exactly how they calculate and for diploids; we weren’t sure what they used.

I ended up working through their unspecified math to verify that in fact Sella and Hirsh (2005) used the second model of diploid fitnesses, while we wanted to work with the first model. So here is a clarification of their equation, based on our way of thinking:

where if the population is haploid with fitnesses and , and if the population is diploid with fitnesses , , and .

References

  • Sella G, Hirsh AE (2005) The application of statistical physics to evolutionary biology. PNAS 102(27):9541–9546.

Pinus palustris

DSC00067-sm.JPG

Pinus palustris — Longleaf Pine “grass stage,” Carolina Beach State Park

Ardea herodias

herron-down.jpg

Ardea herodias — Great Blue Heron, Sarah P. Duke Gardens

Ardea herodias

herron-standing.jpg

Ardea herodias — Great Blue Heron, Sarah P. Duke Gardens

Victoria amazonica

DSCF2201-sm.JPG

Victoria amazonica — Water Lily, Sarah P. Duke Gardens

For many years now, I’ve been trying to get the final chapter of my dissertation published. It’s gone to several journals, and at Molecular Ecology I finally got a really good editor, who is an expert in the topic. He provided great comments, which finally convinced me to bite the bullet and redo my simulations, updating my analysis.

I’ll post a pre-print when it becomes available, but for now here is the abstract information.

Antagonism between Local Dispersal and Self-Incompatibility Systems in a Continuous Plant Population

Reed A. Cartwright

Abstract: Many self-incompatible plant species exist in continuous populations in which individuals disperse locally. Local dispersal of pollen and seeds facilitates inbreeding because pollen pools are likely to contain relatives. Self-incompatibility promotes outbreeding because relatives are likely to carry incompatible alleles. Therefore, populations can experience an antagonism between these forces. In this study, a novel computational model is used to explore the effects of this antagonism on gene flow, allelic diversity, neighborhood sizes, and identity-by-descent. I confirm that this antagonism is sensitive to dispersal levels and linkage. However, the results suggest that there is little to no difference between the effects of gametophytic and sporophytic SI on unlinked loci. More importantly both GSI and SSI affect unlinked loci in a manner similar to obligate outcrossing without mating types. This suggests that the primary evolutionary impact of self-incompatibility systems may be to prevent selfing, and prevention of biparental inbreeding might be a beneficial side effect.

My latest paper has now been published by MBE. You can grab your free reprint by following this link. I earlier blogged about some of the implications of this research, which you can see here.

Abstract:

Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.

Reference: Cartwright RA (2009) Problems and solutions for estimating indel rates and length distributions. Molecular Biology and Evolution 26(2):473–480.

Citation Station

I recently became aware of an October paper from Ian Holmes’s group at UC Berkeley describing a rich genome simulation toolset: “Tools for simulating evolution of aligned genomic regions with integrated parameter estimation.” Besides their programs, what’s interesting about the paper is that they judged the quality of their work by comparing it to my own research. I have much respect for Holmes, and to have him compete against a program from my dissertation research is an honor. Here is a choice quote from their paper:

We also compared our simulation methods, GSIMULATOR and SIMGENOME, to DAWG, a widely cited program for simulation of neutral substitution and indel events. We chose DAWG because it most closely exemplifies the goals we have identified here: it is clearly based on an underlying evolutionary model and provides tools for estimating the parameters of the indel model directly from sequence data. It appears to be the leading general-purpose simulator at the time of writing. Other simulators (such as PSPE) are richer, but do not provide the parameter-estimation functionality that DAWG does. (internal citations removed)

In other citation news, there is a new paper out in Genetics discussing the “hothead” mutation in Arabidopsis thaliana that made this blog semi-famous. This paper follows up on arguments that hothead’s revertant phenotype is due to unexpected outcrossing. They confirm that hothead does have a high outcrossing rate and that revertant phenotypes are due to outcrossed seeds.

When I first heard this argument, I kicked myself for not thinking about it first. What first characterized hothead plants was the lack of fusion in their flowers. Wild-type A. thaliana plants are complete selfers because their flowers are fused shut, preventing external pollen from reaching the stigma. This way, only pollen produced internally will pollinate A. thaliana. However, when the flower fails to close completely, external pollen is no longer excluded and outcrossing can occur. Duh!

Admittedly, this would have been hard to argue without having the actual plants because some observations reported in Lolle’s original paper conflict with outcrossing, but it appears that these observations cannot be replicated. Perhaps they were just artifacts.

Some of you may remember that several years ago Britten (2002) argued that human-chimp divergence was 5%, not ~1.2%. (See this press release for a refresher.) Of course, creationists jumped on this research and began harping that the more scientists looked, the more distant humans and chimps were. This is important to them because the number one rule of creationism is “no matter what, humans are not related to any other living creatures,” which is so difficult to maintain in our age of science and education. Amusingly, humans and chimps are so similar to one another that creationists cannot create a consistent definition of “created kinds” that makes humans special and lumps all the boring animals together.

Britten (2002) derived his 5% divergence metric by considering the lengths of insertions and deletions (indels) along with point substitutions between human and chimp genomes. This is unlike other estimates that just consider the number of point substitutions that have occurred between the two species and find ~1.2% divergence. At the time I commented that these two numbers—1.2% and 5%—could not be compared because they are different metrics. Additionally, Britten’s metric is probably unfairly upweighting the contribution of indels because a single event can add or remove multiple residues at a time.

A recent study of mine, which was not directed at Britten’s work, has found that it is actually worse than that. Simply put, the total length of indels separating humans and chimps is unrelated to the evolutionary divergence between them. This arises because the variance of indel length is “nearly-infinite”, which causes nonconservation of average indel length. Therefore, two pairs of species, equally divergent evolutionarily, can and probably will have very different proportions of nucleotides belonging to indels. One pair might be 5% divergent and the other 1.5% divergent, including indels, without any underlying change in the evolutionary process or time since speciation.
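A quick simulation makes the point concrete. The sketch below is an illustration under stated assumptions, not the analysis from the paper: it draws indel lengths from a zeta (Zipf) power law using NumPy's `Generator.zipf`, with an exponent of 1.65 taken from the middle of the estimated 1.6–1.7 range, and compares the total indel length across replicate "species pairs" that all experienced the same number of indel events. The function name, replicate count, and number of events are arbitrary choices.

```python
import numpy as np


def total_indel_lengths(n_indels, exponent, n_replicates, seed=0):
    """Total indel length per replicate, with zeta-distributed event lengths.

    Every replicate has the same number of indel events, so any spread in
    the totals comes only from the heavy tail of the length distribution.
    """
    rng = np.random.default_rng(seed)
    lengths = rng.zipf(exponent, size=(n_replicates, n_indels))
    return lengths.sum(axis=1)


totals = total_indel_lengths(n_indels=1000, exponent=1.65, n_replicates=5)
print(totals)
```

Because a handful of very long events dominate each sum, the totals can differ severalfold between replicates even though the underlying process and the number of events are identical.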

The upside is that traditional substitution-based evolutionary distances are unaffected and can still be used to properly estimate the evolutionary divergence between species.
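A quick simulation gives a feel for why total indel length is such an unstable quantity. The sketch below draws indel lengths from a power law with exponent 1.65 (picked from the 1.6–1.7 range in the paper's abstract). The cutoff of 100,000 bases, the 1,000 events per replicate, and the function name are arbitrary assumptions of mine, not values from the study. For exponents of 2 or less, the untruncated power law has no finite mean, so a handful of very long indels can dominate any sample.

```python
import random
from itertools import accumulate


def make_length_sampler(exponent: float, max_len: int, rng: random.Random):
    """Build a sampler for indel lengths k = 1..max_len with P(k) proportional to k**-exponent."""
    lengths = range(1, max_len + 1)
    # Precompute cumulative weights once; random.choices handles the normalization.
    cum_weights = list(accumulate(k ** -exponent for k in lengths))

    def sample(n_events: int):
        return rng.choices(lengths, cum_weights=cum_weights, k=n_events)

    return sample


# Twenty replicate "species pairs", each with the same number of indel events
# drawn from the same length distribution (assumed values, see above).
sampler = make_length_sampler(exponent=1.65, max_len=100_000, rng=random.Random(42))
totals = [sum(sampler(1_000)) for _ in range(20)]
print("smallest total indel length:", min(totals))
print("largest total indel length: ", max(totals))
```

Every replicate has identical "evolutionary divergence" in the sense of event count and process, yet the totals tend to differ by a wide margin from run to run, because each total is driven by its few longest draws.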

Well I’ve sent my stuff off to the publisher, so I figure that I’ll share the abstract with y’all.

Cartwright RA. Problems and solutions for estimating indel rates and length distributions. Molecular Biology and Evolution. (in press)

Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a dataset of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic datasets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12–16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6–1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.

Sarracenia leucophylla

pitcher.jpg

Sarracenia leucophylla — Crimson Pitcher Plant, Duke Gardens

Bees, Bees, Bees

irwina.png

A couple of weeks ago, tree surgeons working outside a hospital in Wake County discovered a large hive of bees living in a large oak that needed to come down. Due to the decline of wild and domestic honey bees, the state and county governments sent beekeepers to save the hive. The story was popular in the local news and made the NY Times as well.

Volunteers with the Wake County Beekeepers Association and state bee specialists squirted smoke from smoldering canisters into the opening of the giant oak to calm the bees, then moved eight large chunks of honeycomb from the trunk to a new bee box.

“We got them a good home,” said Danny Jaynes, president of the Wake County Beekeepers Association and hobby beekeeper. “It’s one of the most rewarding days of my beekeeping life.”

The combs, containing thousands of adult bees, juveniles and eggs, were placed inside wooden frames. The frames hang vertically like files inside the bee box. By moving the combs, beekeepers expect most of the bees will relocate to raise the young bees and make a new home.

Research Blogging Roundup

Mailund on the Internet: Heads or tails and reliable alignments

In this paper they analyse the quality of multiple sequence alignments in an extremely simple manner: They first align the sequences left to right, then reverse them to essentially align them right to left. Unless the alignment algorithm has a preferred order of symbols, you’d expect to get the same alignment going left to right as right to left.

Not always, of course: if the algorithm is based on oligonucleotides or such, then the order matters, but in many cases it doesn’t.
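The heads-or-tails idea is easy to try on a toy scale. The sketch below is my own illustration, not the code from the paper being discussed. It uses a bare-bones pairwise Needleman-Wunsch aligner with a linear gap cost and made-up scores instead of a real multiple sequence aligner. It aligns two sequences, aligns their reversals, flips the second result back, and compares the two.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; traceback ties prefer diagonal, then a gap in b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, building the alignment in reverse.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def heads_or_tails(a: str, b: str):
    """Return the forward ("heads") alignment and the reverse-then-flip ("tails") alignment."""
    heads = global_align(a, b)
    rev_a, rev_b = global_align(a[::-1], b[::-1])
    tails = (rev_a[::-1], rev_b[::-1])
    return heads, tails


heads, tails = heads_or_tails("AA", "A")
print(heads)            # ('AA', '-A')
print(tails)            # ('AA', 'A-')
print(heads == tails)   # False: the gap lands on a different side
```

Both alignments have the same score, so neither is "wrong"; the disagreement comes purely from how this toy aligner breaks ties during traceback. Regions where heads and tails disagree are the ones where the data do not pin down the alignment.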

Greg Laden: Genetics of Behavior: Fire Ants

Solenopsis invicta, a fire ant, can have colonies with a single reproductive queen (these are called monogyne colonies) or a colony with multiple reproductive queens (called polygyne colonies).

In monogyne colonies, all individuals have a particular allele for one gene. The gene is General Protein-9 (Gp-9), and the allele is the B-like allele.

Polygyne colonies contain individuals with both the B-like allele and the b-like allele (case matters!). This has led to the suggestion that the presence of b-like is necessary and sufficient for the rise of polygyne colonies.

Below is an email that I received from NESCent:

Phyloinformatics Summer of Code 2008

Please disseminate this announcement widely to appropriate students at your institution

The National Evolutionary Synthesis Center (NESCent) is participating in 2008 for the second year as a mentoring organization in the Google Summer of Code. Through this program, Google provides undergraduate, masters, and PhD students with a unique opportunity to obtain hands-on experience writing and extending open-source software under the mentorship of experienced developers from around the world.

Our goal in participating is to train future researchers and developers to not only have awareness and understanding of the value of open-source and collaboratively developed software, but also to gain the programming and remote collaboration skills needed to successfully contribute to such projects. Students will receive a stipend from Google, and may work from their home, or home institution, for the duration of the 3 month program. Students will each have one or more dedicated mentors with expertise in phylogenetic methods and open-source software development.

NESCent is particularly targeting students interested in both evolutionary biology and software development. Project ideas (see URL below) range from visualizing phylogenetic data in R, to development of a Mesquite module, web-services for phylogenetic data providers or geophylogeny mashups, implementing phyloXML support, navigating databases of networks, topology queries for PhyloCode registries, to phylogenetic tree mining in a MapReduce framework, and more.

The project ideas are flexible and many can be adjusted in scope to match the skills of the student. If the program sounds interesting to you but you are unsure whether you have the necessary skills, please email the mentors at the address below. We will work with you to find a project that fits your interests and skills.

Inquiries

Email any questions, including self-proposed project ideas, to [Enable javascript to see this email address.].

To Apply

Apply on-line at the Google Summer of Code website, where you will also find GSoC program rules and eligibility requirements. The 1-week application period for students opens on Monday, March 24th, and runs through Monday, March 31st, 2008.

Hilmar Lapp and Todd Vision US National Evolutionary Synthesis Center

URLS

Dr. Anne Yoder and her group at the Duke Lemur Center have produced a lemur family tree. Greg Laden has the goods.

Note: Prof. Steve Steve was not involved in the study.

Estimating Local Ancestry

Last month, one of the fellow postdocs in our lab went back home to Denmark and began professoring again. So it’s nice to find out that one of his collaborators in Denmark is a blogger and has a really cool post about Estimating local ancestry. He even connects the post to some research that was done in our lab, which makes it double plus one good.
