Quiet Times

| No Comments

It has been quiet here because I have been focusing my energy on work and family. To simplify things, I’m going to focus nearly all of my blogging on the Panda’s Thumb for the time being.

I’ve had the good fortune of having some papers published recently. The first one is a methodology paper concerning a way of extracting phylogenetic information from regions of multiple sequence alignments that are full of indels and difficult to align:

PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination. (link)

Robert Lücking, Brendan P Hodkinson, Alexandros Stamatakis and Reed A Cartwright

BMC Bioinformatics 2011, 12:10 doi:10.1186/1471-2105-12-10

My co-author, Brendan Hodkinson, has already covered it on his blog.

In molecular biology, an alignment is a partial reconstruction of the evolutionary history of a group of sequences. In an alignment, all residues found in the same column are considered to be descended from a single residue in the ancestral sequence. (Of course, insertions violate this description, but I won’t get into that.) Alignments are not direct observations. They are actually inferences based on the patterns of sequences found in the dataset. Often there are particular areas in which the alignment is difficult to resolve. Take this example:

problemalign.png

A typical problem in multiple sequence alignments where a section is full of gaps and contains a complicated phylogenetic signal. Dark red: high certainty that alignment is accurate; Dark blue: low certainty that alignment is accurate.

It was constructed via the GUIDANCE webserver. (A great resource that everyone should use.) In this example, we have a region defined by a lot of sequence variation created by many insertions and deletions. The alignment is not well defined here, and in most applications it will just be removed, and the data “thrown away”.

But is this the only solution? In our paper we develop a methodology, dubbed PICS-Ord (download), that provides an easy solution for extracting phylogenetic information from problematic regions chosen by its user. PICS-Ord works through a three-step process:

  1. Realign the segments in pairs using Ngila, and calculate the likelihood of the alignment from an evolutionary model. This produces a distance matrix of the segments.
  2. Ordinate the distance matrix using principal coordinate analysis (PCoA). This assigns each segment to a point in n-1-dimensional space.
  3. Quantize each dimension into a set of characters.

This might seem a bit odd at first. “Why not just use the distance matrix directly?” That would be great if we could, but there aren’t any phylogenetic programs that we know of that allow the mixing of distance matrices and sequence data. With our method, we get discrete, ordered characters that can be used in popular programs like RAxML.
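To make steps 2 and 3 a little more concrete, here is a rough Python sketch of ordinating a distance matrix with PCoA and then quantizing an axis into ordered character states. It is not the PICS-Ord code: the function names are made up for illustration, the eigenvectors are found by simple power iteration, and the equal-width binning in `quantize` is an assumption rather than the scheme PICS-Ord actually uses.

```python
import math

def gower_center(dist):
    """Double-center the squared distances: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]

def leading_eigenpair(b, iters=200):
    """Largest-magnitude eigenpair of a symmetric matrix by power iteration."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
    return lam, v

def pcoa(dist, n_axes):
    """Return up to n_axes coordinate lists, one per ordination axis."""
    b = gower_center(dist)
    n = len(b)
    axes = []
    for _ in range(n_axes):
        lam, v = leading_eigenpair(b)
        if lam <= 1e-9:  # stop at zero or negative eigenvalues
            break
        axes.append([math.sqrt(lam) * x for x in v])
        # Deflate so the next pass finds the next axis.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return axes

def quantize(axis, n_states):
    """Equal-width binning of one axis into states 0..n_states-1 (an assumption)."""
    lo, hi = min(axis), max(axis)
    if hi == lo:
        return [0] * len(axis)
    return [min(n_states - 1, int((x - lo) / (hi - lo) * n_states)) for x in axis]

# Toy input: four "segments" whose pairwise distances come from points on a line.
points = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(a - b) for b in points] for a in points]
for axis in pcoa(dist, 1):
    print("".join(str(s) for s in quantize(axis, 4)))
```

Each axis becomes one column of ordered characters, which is the kind of matrix that can sit next to sequence data in a phylogenetic analysis.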

There are three example files in the PICS-Ord distribution, and I’ll illustrate its usage with example1.fas. The alignment of these sequence fragments is messy:

 100 114
sequence_001 ----------------------------------------------------------------------------tatactatcta---------------------------
sequence_002 -------------------------------------------------------------------aattgtatttatactatata---------------------------
sequence_003 -------------------------------------------------------------------tttaagatttattctatatt---------------------------
sequence_004 tttaggattaattttata--------------------------------------------------------taatactaatata---------------------------
sequence_005 -------------gatgg--------------------------------------------------------ttttacctatata---------------------------
sequence_006 ---------------------------------------------------------------------------tatcattatgca---------------------------
sequence_007 ---------------------------------------------------------------------------tatcattatgca---------------------------
sequence_008 -------------------------------------------------------------------------atatgtttaagata---------------------------
sequence_009 -------------------------------------------------------------------------atatgtttaagata---------------------------
sequence_010 -------------------------------------------------------------------------atatgtttaagata---------------------------
sequence_011 -------------gtac----------------------------------------------------------aattataatata---------------------------
sequence_012 -------------gtac----------------------------------------------------------aattataatata---------------------------
sequence_013 -------------gtac----------------------------------------------------------taatttaatata---------------------------
sequence_014 -------------ctac-----------------------------------------------------------aatataatata---------------------------
sequence_015 -------------ctac-----------------------------------------------------------aatataatata---------------------------
sequence_016 -------------ctac-----------------------------------------------------------attaaaatata---------------------------
sequence_017 -------------ctac-----------------------------------------------------------attaaaatata---------------------------
sequence_018 -------------gtat-----------------------------------------------------------aatttaatcta---------------------------
sequence_019 -------------gtat-----------------------------------------------------------attttaatcta---------------------------
sequence_020 -------------------------------------------------------------------------------ataagata---------------------------
sequence_021 -------------------------------------------------------------------------------ataagata---------------------------
sequence_022 --------------------------------------------------------------------------attataattaata---------------------------
sequence_023 --------------------------------------------------------------------------attataattaata---------------------------
sequence_024 -------------------------------------------------------------------------------ataagata---------------------------
sequence_025 -------------------------------------------------------------------------------ataagata---------------------------
sequence_026 ----------------------------------------------------------------------------aaaaaaaaata---------------------------
sequence_027 -----------------------------------------------------------------------------aaaaaaaata---------------------------
sequence_028 -------------------------------------------------------------------------------acaaaata---------------------------
sequence_029 -------------------------------------------------------------------------------acaagata---------------------------
sequence_030 --------------------------------------------------------------------------------acaaata---------------------------
sequence_031 -------------------------------------------------------------------------------acaaaata---------------------------
sequence_032 -------------gaat-----------------------------------------------------------aatattaaata---------------------------
sequence_033 -------------gaat-----------------------------------------------------------aatattaaata---------------------------
sequence_034 -------------gaaa-----------------------------------------------------------aatattaaata---------------------------
sequence_035 -------------gtat-----------------------------------------------------------tctttaatata---------------------------
sequence_036 -------------gtat-----------------------------------------------------------tatttaatcta---------------------------
sequence_037 -------------gtat-----------------------------------------------------------tatttaatata---------------------------
sequence_038 -------------gtat-----------------------------------------------------------tatttaatcta---------------------------
sequence_039 -----------------------------------------------------------------------------gttttatata---------------------------
sequence_040 -----------------------------------------------------------------------------gtttaatata---------------------------
sequence_041 -------------------------------------------------------------------------atcagtttaatacg------------------ctgagtgat
sequence_042 -------------------------------------------------------------------------accagtttaattta------------------ctgggtgat
sequence_043 ----------------------------------------------------------------------------------------------ctcagtttctgctgagtggt
sequence_044 ----------------------------------------------------------------------------agtttaatatg------------------ctgattgat
sequence_045 --------------------------------------------------------------------------------atatgta---------------------------
sequence_046 --------------------------------------------------------------------------------atatgta---------------------------
sequence_047 --------------------------------------------------------------------------------ataagta---------------------------
sequence_048 --------------------------------------------------------------------------------ataagta---------------------------
sequence_049 --------------------------------------------------------------------------------ataagta---------------------------
sequence_050 --------------------------------------------------------------------------------atatgta---------------------------
sequence_051 -----------------------------------------------------------------------------gttttctaat---------------------------
sequence_052 -----------------------------------------------------------------------------gtttactaaa---------------------------
sequence_053 -----------------------------------------------------------------------------gtttactaat---------------------------
sequence_054 -----------------------------------------------------------------------------gtttactaat---------------------------
sequence_055 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_056 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_057 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_058 -----------------------------------------------------------------------------gtttactgaa---------------------------
sequence_059 -----------------------------------------------------------------------------gtttactgaa---------------------------
sequence_060 -----------------------------------------------------------------------------gtttactgaa---------------------------
sequence_061 -----------------------------------------------------------------------------gttagctgaa---------------------------
sequence_062 -----------------------------------------------------------------------------gttagctgaa---------------------------
sequence_063 -----------------------------------------------------------------------------gttagctgaa---------------------------
sequence_064 -------------------------------------------------------------------------------gttt-aaa---------------------------
sequence_065 -------------------------------------------------------------------------------gttt-aaa---------------------------
sequence_066 -------------------------------------------------------------------------------gttt-aaa---------------------------
sequence_067 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_068 -------------------------------------------------------------------------------gcta-aaa---------------------------
sequence_069 -----------------------------------------------------------------------------atttacttaa---------------------------
sequence_070 -----------------------------------------------------------------------------atttacttaa---------------------------
sequence_071 -----------------------------------------------------------------------------atttacttaa---------------------------
sequence_072 ---------------------------------------------------------------------------------gttaaa---------------------------
sequence_073 ---------------------------------------------------------------------------------gttaaa---------------------------
sequence_074 aattttattaattactttagtaattaataaggttattttaagtaacagcaaaatattagttaaaagcgttgct-tgcaattagtaaagt--------------agca-ttatta
sequence_075 aattatattaattactttagtaattaaatttgttatttttagtaacagcaaaatattagttacaagcgttgct-tgtaattagtaaagt--------------agca-ttatta
sequence_076 ---------------------------------------------------------------------------------ttttta---------------------------
sequence_077 ---------------------------------------------------------------------------------ttttta---------------------------
sequence_078 ---------------------------------------------------------------------------------ttttta---------------------------
sequence_079 ---------------------------------------------------------------------------------ttttta---------------------------
sequence_080 -------------gaag-----------------------------------------------------------attaataacta---------------------------
sequence_081 -----------------------------------------------------------------------------atttatatta---------------------------
sequence_082 -----------------------------------------------------------------------------atttatatta---------------------------
sequence_083 actcctact------ttaaacatttagtagtgtcgaacctactgatagcatctggttttctattgg--------tacttataacataaccactaaatatttagagtattaatta
sequence_084 actcctact------ttaaacatttagtagtgtcgaacctactgatagcatctggttttctattgg--------tacttataacataaccactaaatatttagagtattaatta
sequence_085 -------------gaaa----------------------------------------------------------taacagtaacta---------------------------
sequence_086 -------------aaag-----------------------------------------------------------attagtaacta---------------------------
sequence_087 aattttaca------tttagtttttaatctttatgtttaaaa----acatgtatgctatttatatg--------tatatataatatagt--------------agaacttacaa
sequence_088 aattttact-------------------ttgggt-tttaaaa----actagtatgctatgtttatatattaatttatatatcatatagt--------------agaacttacaa
sequence_089 aattttact------ctt--tttttaagttttat-atttaaa----atctgtatgctatgtttatatattaatttatatataatatagt--------------agaacttacaa
sequence_090 aattttact------ctt--tttttaagttttat-atttaaa----atctgtatgctatgtttatatattaatttatatataatatagt--------------agaacttacaa
sequence_091 -------------gtac-----------------------------------------------------------ataataatata---------------------------
sequence_092 -------------gtaca--------------------------------------------------------taataataatata---------------------------
sequence_093 -------------gtaca--------------------------------------------------------taataataatata---------------------------
sequence_094 -------------gtac-----------------------------------------------------------ataataatata---------------------------
sequence_095 ---------------------------------------------------------------ttttttataccaataaataatata---------------------------
sequence_096 ---------------------------------------------------------------ttttttataccaataaataatata---------------------------
sequence_097 ---------------------------------------------------------------ctatttata-taataaataatata---------------------------
sequence_098 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------
sequence_099 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------
sequence_100 -------------ctat-----------------------------------------------------------ataaaaatata---------------------------

But instead of throwing it away, you can process it with PICS-Ord and get a clean set of ordered characters that contain approximately the same phylogenetic information as the sequences above.

    100    20
sequence_001 53221002101000000010
sequence_002 44121113101010000000
sequence_003 53211103102011000100
sequence_004 53321103111000010100
sequence_005 53211003101000001000
sequence_006 53221002001000000000
sequence_007 53221002001000000000
sequence_008 43220112011000000000
sequence_009 43220112011000000000
sequence_010 43220112011000000000
sequence_011 53221012011000000000
sequence_012 53221012011000000000
sequence_013 53221012011000100000
sequence_014 53321012001010100000
sequence_015 53321012001010100000
sequence_016 53221013001000000000
sequence_017 53221013001000000000
sequence_018 53221012001000100000
sequence_019 53221012001000001000
sequence_020 53220102011000000000
sequence_021 53220102011000000000
sequence_022 53121012011010000000
sequence_023 53121012011010000000
sequence_024 53220102011000000000
sequence_025 53220102011000000000
sequence_026 53220002001000000000
sequence_027 53220002001000000000
sequence_028 53120102011000000000
sequence_029 53220102011000000000
sequence_030 53120002011000000000
sequence_031 53120102011000000000
sequence_032 53220002111100000000
sequence_033 53220002111100000000
sequence_034 53220002111000000000
sequence_035 53221012011000001000
sequence_036 53221012001000000000
sequence_037 53221012011000000000
sequence_038 53221012001000000000
sequence_039 53221102111000000000
sequence_040 53211002011000000000
sequence_041 53300112011000000000
sequence_042 53200112011010001010
sequence_043 53300103001100001001
sequence_044 53200112001110000000
sequence_045 53120112001000000000
sequence_046 53120112001000000000
sequence_047 53120102001000000000
sequence_048 53120102001000000000
sequence_049 53120102001000000000
sequence_050 53120112001000000000
sequence_051 53211002011000000000
sequence_052 53111002011000000000
sequence_053 53211002011000000000
sequence_054 53211002011000000000
sequence_055 53110002001010000000
sequence_056 53110002001010000000
sequence_057 53110002001010000000
sequence_058 43111002011000000000
sequence_059 43111002011000000000
sequence_060 43111002011000000000
sequence_061 53201002011000000000
sequence_062 53201002011000000000
sequence_063 53201002011000000000
sequence_064 43111002011000000000
sequence_065 43111002011000000000
sequence_066 43111002011000000000
sequence_067 53110002001010000000
sequence_068 53110002001010000000
sequence_069 43111002011000000000
sequence_070 43111002011000000000
sequence_071 43111002011000000000
sequence_072 53111002001000000000
sequence_073 53111002001000000000
sequence_074 59021102011001100000
sequence_075 59020012001110001000
sequence_076 53121102001100000000
sequence_077 53121102001100000000
sequence_078 53121102001100000000
sequence_079 53121102001100000000
sequence_080 53220002001100000000
sequence_081 53121102001000000000
sequence_082 53121102001000000000
sequence_083 90021002001000000000
sequence_084 90021002001000000000
sequence_085 53220002000100000000
sequence_086 53120002000101000000
sequence_087 02121100001000000000
sequence_088 02020003000010100000
sequence_089 02021013011100000000
sequence_090 02021013011100000000
sequence_091 53321002001000000000
sequence_092 53321002011000000000
sequence_093 53321002011000000000
sequence_094 53321002001000000000
sequence_095 53321202010100001000
sequence_096 53321202010100001000
sequence_097 43321102011000001000
sequence_098 53321003001000000000
sequence_099 53321003001000000000
sequence_100 53321003001000000000

I haven’t been keeping up with my Calix Cari polls this year for college football. But now that the regular season has ended, I have found time to produce one. The events of this season have been rather unpredictable. Of course, by the end of the season there were few surprises. Auburn appeared out of nowhere to become #1 on the strength of a once-in-a-decade player who fit perfectly into their offensive system. (Yay for the booster who had cash to spare in this economy. We will see if their season stands the test of time.) But in my calculation Auburn is only #3, behind Oregon and ¡Stanford! (I still think Harbaugh would make a smooth transition into the coach’s chair at UGA but wouldn’t be there long. It’s good that Richt was retained.)

It appears my algorithm likes the Pac-10 over the SEC, and Auburn lost ground because of its tight victories early in the season.

Rank  Team         Record  Quality
1     Oregon       12-0    7.3937
2     Stanford     11-1    7.2491
3     Auburn       13-0    7.1707
4     Oklahoma     11-2    7.0580
5     TCU          12-0    6.9927
6     Boise St     11-1    6.8977
7     Missouri     10-2    6.7980
8     Ohio State   11-1    6.5913
9     Texas A&M    9-3     6.5043
10    Oklahoma St  10-2    6.4876

Ngila 1.3 Released

It has been a long time coming, but I have finally released Ngila 1.3. This version fixes a few bugs and includes many new features.

  • Use CMake for compilation and installation
  • New scaling option enabled by default (identical sequences default to cost of 0)
  • Protein evolutionary models: aazeta and aageo
  • Fasta and Phylip format output support
  • Clustal and Phylip format input support
  • Report sequence identity measure
  • Matrix output formats for distance measures
  • Look for “ngilarc” file in the home directory.
  • New separator option
  • New const-align option
  • Replace arg-file option with ngilarc option.
  • Use custom zeta function if GSL not found.
  • Optimize size of travel table.
  • Ordering of –pairs-all fixed
  • bug fix for output of large alignments >10kb
  • minor bug fix for geo model

The Working Life

| 1 Comment

I apologize for things being slow on this blog. I’ve been knee-deep in programming, manuscripts, grant proposals, and teaching. I’m hoping to have results to share in the near future. In the meantime, you can follow some of my activities on the Panda’s Thumb.

I will say that the development version of Dawg now supports codon models, and Ngila has some new features as well.

Odd Hack

| 1 Comment

This machine got partially hacked over the weekend. From what I can tell, Ziproxy was compromised and used to submit spam email through my system. Because my mail server accepts local email, it was going out. It looks like only Yahoo emails were being hit. Of course, the spam was coming from China.

Since I turned off Ziproxy, I haven’t seen any odd email originating from my machine.

Dawg 2

| 3 Comments

Dawg created its first protein sequences today. Woot!

Bama Wins 2009 Calix Cari

| 1 Comment

The announcement is a bit late, but after beating Texas, Bama has won the 2009 Calix Cari.

Rank  Team           Record  Quality
1     Alabama        14-0    7.8239
2     Florida        13-1    7.4268
3     Texas          13-1    7.1820
4     TCU            12-1    6.8893
5     Oregon         10-3    6.7193
6     Virginia Tech  10-3    6.7187
7     Ohio State     11-2    6.7114
8     Boise St       14-0    6.6326
9     Cincinnati     12-1    6.5913
10    Iowa           11-2    6.4511
11    Georgia Tech   11-3    6.4444
12    Penn State     11-2    6.3350
13    Brigham Young  11-2    6.1872
14    LSU            9-4     6.1531
15    Pittsburgh     10-3    6.0738
16    Southern Cal   9-4     6.0535
17    Miami FL       9-4     6.0213
18    Nebraska       10-4    6.0208
19    Oklahoma       8-5     5.8503
20    Wisconsin      10-3    5.8343
21    Arkansas       8-5     5.7899
22    Utah           10-3    5.7644
23    Clemson        9-5     5.7595
24    Mississippi    9-4     5.7330
25    Arizona        8-5     5.6809
26    Stanford       8-5     5.6637
27    West Virginia  9-4     5.6543
28    Georgia        8-5     5.6270
29    Texas Tech     9-4     5.6112
30    Oregon St      8-5     5.6002

See below for the complete results of 2009.

Network Issues

We’ve been having network issues for the last week. (The network setup in general is not reliable.) Nothing is wrong with what I have control over, but I’m kind of stuck with the connection that I’ve been given.

My apologies to everyone this affects.

Calix Cari 2009 Week 14

With the conference championship games complete, my Calix Cari rankings once again agree with the pollsters: it’s Texas versus Alabama for the title.

Surprisingly, despite Tebow crying after visiting the woodshed in Atlanta, Florida still ranks #3, ahead of undefeated teams TCU, Cincinnati, and Boise State. Two-loss Oregon comes in at #4, based on its strong showings against Cal, USC, Arizona, and Oregon State. Boise State produced the second-best victory of the year (against Oregon) but couldn’t overcome a weak conference schedule to rise higher than #9.

The Pac-10 placed 6 teams in the top 25, followed by the Big 12 at 4; the SEC, Big East, ACC, and Big 10 at 3 each; the MWC at 2; and the WAC at 1. Clearly the Pac-10’s 8-4 logjam was rated more highly than the SEC’s 7-5 logjam.

Rank  Team           Record  Quality
1     Alabama        13-0    7.6798
2     Texas          13-0    7.3390
3     Florida        12-1    7.2858
4     Oregon         10-2    7.0857
5     TCU            12-0    6.9550
6     Cincinnati     12-0    6.7700
7     Virginia Tech  9-3     6.6491
8     Georgia Tech   11-2    6.6343
9     Boise St       13-0    6.5395
10    Ohio State     10-2    6.4794
11    LSU            9-3     6.2673
12    Iowa           10-2    6.2457
13    Miami FL       9-3     6.2368
14    Penn State     10-2    6.1839
15    Southern Cal   8-4     6.1387
16    Arizona        8-4     6.1116
17    Stanford       8-4     6.0294
18    Oregon St      8-4     5.9837
19    Pittsburgh     9-3     5.9276
20    Brigham Young  10-2    5.8856
21    West Virginia  9-3     5.8317
22    Oklahoma St    9-3     5.8001
23    Nebraska       9-4     5.7685
24    California     8-4     5.7510
25    Oklahoma       7-5     5.7005

Snow

It’s snowing right now in Houston. I’m not sure if it will stick.

Calix Cari 2009 Week 11

It’s been a few weeks since I’ve calculated my rankings for college football, but with Uga VII passing away, I figured I’d post something in honor of him.

Rank  Team           Record  Quality
1     Alabama        10-0    7.5935
2     Florida        10-0    7.4398
3     Texas          10-0    7.3265
4     TCU            10-0    7.2566
5     Oregon         8-2     6.9875
6     Georgia Tech   10-1    6.9202
7     Cincinnati     10-0    6.7603
8     Ohio State     9-2     6.6532
9     Virginia Tech  7-3     6.6306
10    Boise St       10-0    6.6197

RIP Uga VII

Today sucks. Uga VII has died.

Calix Cari 2009 Week 8

This is my first poll of the season, and Alabama is ranked #1, followed by Iowa and Florida. What’s interesting is that a quick look through the results reveals that Boise State’s victory over Oregon is considered the best win by any team this year.

Rank  Team           Record  Quality
1     Alabama        8-0     7.8141
2     Iowa           8-0     7.7549
3     Florida        7-0     7.6983
4     Oregon         6-1     7.5908
5     Southern Cal   6-1     7.5221
6     Texas          7-0     7.4951
7     TCU            7-0     7.3465
8     Virginia Tech  5-2     7.2887
9     Georgia Tech   7-1     7.1787
10    Boise St       7-0     7.0502

Happy Happy Joy Joy

I need to head home early today to clean up my otaku pad.

I have a girl coming over for a sleep over.

I Have a Minion

| 4 Comments

We’ve hired a master’s student to be my minion in the lab. Bwahaha…

Downtime

We’ve been having network issues, which explains why this machine has been off the net since yesterday. It’s working now. Hopefully it will stay that way.

Alive in Houston

Well I’ve moved to Houston and my server is back online.

Moving

I’m moving to Houston this weekend. Expect this website and others I administer to have limited functionality for about a week.

When things just work.

| 7 Comments

I’m moving to Houston next month, and I’m working furiously to finish up about 4 papers before I leave. Yesterday was a good day because two things worked absolutely perfectly.

I’m calculating the likelihoods of some of our observations, and I wrote a specific routine to do it for a first-order model. I then wrote a generic routine to calculate likelihoods for higher-order models. When I used the generic routine to calculate the likelihood for a first-order model, I got the same result as my specialized routine. Yatah!

From previous analysis using partial autocorrelations, we determined that a third-order model should explain our data the best. When I compared our models to see which one was most parsimonious (using AIC), the third-order model again came out on top. Yatah2!
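For anyone who wants to try the same kind of sanity check, here is a rough Python sketch. It assumes, purely for illustration, that the models are k-th order Markov chains over discrete symbols (the models behind the results above are not described in this post), and every function name is made up for the example. It shows a specialized first-order routine, a generic order-k routine that should agree with it at k = 1, and the AIC formula, AIC = 2k - 2 ln L, where k is the number of free parameters.

```python
import math
from collections import Counter

def loglik_first_order(seq):
    """Specialized routine: maximized log-likelihood of a first-order chain,
    conditioning on the first symbol."""
    pairs = Counter(zip(seq, seq[1:]))
    contexts = Counter(seq[:-1])
    return sum(n * math.log(n / contexts[a]) for (a, _), n in pairs.items())

def loglik_order(seq, k, skip=None):
    """Generic routine: maximized log-likelihood of an order-k chain.
    'skip' is how many leading symbols to condition on (default k); use the
    same skip for every order when comparing models on identical data."""
    start = k if skip is None else skip
    if start < k:
        raise ValueError("skip must be at least k")
    trans = Counter((tuple(seq[i - k:i]), seq[i]) for i in range(start, len(seq)))
    contexts = Counter()
    for (ctx, _), n in trans.items():
        contexts[ctx] += n
    return sum(n * math.log(n / contexts[ctx]) for (ctx, _), n in trans.items())

def n_free_params(k, n_symbols):
    """Each of the n_symbols**k contexts has n_symbols - 1 free probabilities."""
    return (n_symbols ** k) * (n_symbols - 1)

def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik

# Hypothetical observations, just to exercise the routines.
obs = "ABBABABBBAABABBABAABBBAB"
assert abs(loglik_order(obs, 1) - loglik_first_order(obs)) < 1e-9

for k in (1, 2, 3):
    ll = loglik_order(obs, k, skip=3)  # same conditioning for a fair comparison
    print(k, aic(ll, n_free_params(k, 2)))
```

The agreement check at k = 1 mirrors the test described above: if the generic routine disagrees with the specialized one, at least one of them has a bug.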
