Recently in Software Category

Ngila 1.3 Released

It has been a long time coming, but I have finally released Ngila 1.3. This version fixes a few bugs and includes many new features.

  • Use CMake for compilation and installation
  • New scaling option enabled by default (identical sequences default to cost of 0)
  • Protein evolutionary models: aazeta and aageo
  • Fasta and Phylip format output support
  • Clustal and Phylip format input support
  • Report sequence identity measure
  • Matrix output formats for distance measures
  • Look for “ngilarc” file in the home directory.
  • New separator option
  • New const-align option
  • Replace arg-file option with ngilarc option.
  • Use custom zeta function if GSL not found.
  • Optimize size of travel table.
  • Ordering of --pairs-all fixed
  • bug fix for output of large alignments >10kb
  • minor bug fix for geo model

Dawg 2


Dawg created its first protein sequences today. Woot!

Updated Plugins

After a while of testing, I’ve released new versions of Xomment and MT-Dispatch.

Information and download links can be found in the documentation linked to in the sidebar.

Meet Melody

A while back Six Apart made much of their Movable Type blogging software open source. Many people hoped that this would make MT a community driven software project and allow Six Apart to focus on providing support for enterprise blogging solutions. Sadly, Movable Type is still developed basically as a closed source system, with roadmaps and feature sets kept internal to the company.

It is nearly impossible for someone on the outside to contribute to the project. And if you don’t serve MT using Apache, good luck getting support or getting them to fix non-Apache bugs in their software. I filed a bug with Six Apart over a year ago, which pertained to how certain plugins interacted with my MT-Dispatch system. (MT-Dispatch is the only way to run MT on some webservers, like Nginx, and provides advanced FastCGI support on all webservers.) In this bug report I provided a one-line patch to MT, which ensures that all plugin types behave the same way and work with MT-Dispatch. It wasn’t even a new line of code; I just copied some logic from one plugin type to another.

However, Six Apart had no interest in including this simple patch in their software because MT-Dispatch and non-Apache webservers were not supported by their company. It is a stupid policy if they are trying to get community involvement in their open source system. They finally committed my patch this month, after they discovered that Mod-Perl was affected by the same problem as MT-Dispatch. If they had just listened to me in the first place, they would have fixed the Mod-Perl bug sooner.

But that’s all history because a group of influential Movable Type consultants and developers have forked MTOS and are producing Melody, which will be an actual open source, community driven project.

I am hoping to see my MT-Dispatch and Xomment technology (or some derivative of them) committed to Melody’s core. But I lack the time to do such development, so I’m hoping to interest another developer to do it. Time will tell if anyone is interested enough to put the work in.

In my previous post, I mentioned that I will email you a reprint of a paper if you send an email to [Enable javascript to see this email address.]. I actually do nothing. The reprint is handled automatically by Procmail. In this post I will explain how I got it to work.

The first thing to note is that my email supports sub-addressing. This means that in the above email address, the mail is delivered to “reed” with argument “2009b”. I use this argument to determine that the sender is looking for a PDF reprint of my recent paper.

Next I had to modify my .procmailrc to copy 2009b requests to a specific folder and reply to them with the paper. Here is my solution, which is based somewhat on the solution for vacation notice emails with Procmail.

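PLUS=$1 #copy the sub-address into variable PLUS

# The ":0 c" flag line, the FROM_DAEMON test, and the folder name below were
# missing from the listing as posted; they are assumptions, so adjust to taste.
:0 c
* PLUS ?? ^2009b$
* !^FROM_DAEMON
* !^X-Loop: reed+2009b@[snip]
* !^X-Spam-Status: Yes
requests-2009b

   :0 A
   | (formail -r -A"From: reed@[snip]" -A"X-Loop: reed+2009b@[snip]" \
        -I"MIME-Version: 1.0" \
        -I"Content-Type: multipart/mixed; boundary=\"------------070504020300020208040609\""; \
      cat $HOME/papers/2009b.msg ) | $SENDMAIL -oi -t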

This solution will copy and auto-reply to the incoming email if it matches the +2009b argument, is not from an email list, has not already been auto-replied to, and is not spam.
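For anyone who prefers to see that decision spelled out away from Procmail syntax, here is a minimal Python sketch of the same checks. It is an illustration only: the function name, the dict-of-headers representation, the example.org address, and the use of the Precedence header as a stand-in for "from a mailing list" are all assumptions, and it does not reproduce Procmail's regex matching exactly.

```python
def should_autoreply(plus, headers, loop_address):
    """Return True when a message qualifies for the automatic reprint reply.

    plus         -- the sub-address argument (the part after '+')
    headers      -- dict of header name -> value (hypothetical representation)
    loop_address -- address we write into X-Loop on replies we generate
    """
    if plus != "2009b":
        return False  # not a reprint request
    if loop_address in headers.get("X-Loop", ""):
        return False  # we already auto-replied to this message
    if headers.get("X-Spam-Status", "").startswith("Yes"):
        return False  # flagged as spam
    if headers.get("Precedence", "").lower() in ("bulk", "list", "junk"):
        return False  # rough stand-in for mailing-list traffic
    return True
```

Calling `should_autoreply("2009b", {}, "reed+2009b@example.org")` on a plain message with none of those headers returns True; any one of the guard conditions flips it to False.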

To send the reply, I use the formail tool that comes with Procmail to construct the reply. This involves using flags to specify some email headers, followed by catting a prespecified email body that contains the encoded attachment. I generated the body by sending myself the email and attachments that I wanted to send out to people, and then copying the body of the message to a text file, 2009b.msg. I just had to copy the boundary header used by my email program to the formail recipe above.

This rule can be expanded to autorespond to multiple requests and to handle formail or sendmail errors.

We’re running some popgen simulations on an OS X workstation in the lab, and I’m trying to squeeze everything out of it I can by using the Intel Compiler 11, “icc.” Our code uses GSL so I’ve had to compile it with icc as well and that is where the fun begins. This is what I learned.

  1. make check is very important to ensure that GSL is compiled correctly
  2. GSL does not like -fast because it doesn’t like -ipo. I’m not sure if it is because -ipo shouldn’t be used with libraries anyway.
  3. GSL does not like optimizations that reduce the precision of floating point operations
  4. GSL’s ode-initval library does not like icc’s vectorization

To optimally compile GSL and pass all of its tests, I used the following flags:

-O3 -xHost -DNDEBUG -fp-model precise -fp-model source

However, for the ode-initval library, I had to edit the Makefile and add -no-vec to get it to compile right.

Cluster Magic with LSF

For two of my projects, we’ve been using the HPC cluster at NCSU pretty heavily. This cluster uses LSF for job control and one of the problems we encountered is that several of the worker nodes have scratch directories that are full. Luckily LSF has a solution for that. When submitting jobs, LSF allows you to specify a pre-execution command (bsub -E) that can be used to determine whether a node has the resources to complete a job. If the command succeeds, then the job is run on the node; otherwise the job is put back in the queue to wait for another node.

We use the pre-execution command to test whether there is enough space on /scratch to hold our files. For one set of runs, the rule of thumb I use is greater than 1GB of space. Here is our solution:

bsub -E "test `df -B 1G /scratch | grep /scratch | cut -c42-50` -gt 1" [...]

This command works by extracting the free space on /scratch from df and then comparing it against our limit.
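The same test can be written without depending on df's column positions. This is a minimal Python sketch, not what we actually run on the cluster; the function name `has_free_space` is made up for illustration, and it relies only on the standard library's `shutil.disk_usage`.

```python
import shutil


def has_free_space(path, min_gib):
    """Return True if the filesystem holding `path` has more than `min_gib` GiB free."""
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib > min_gib
```

To use it as a pre-execution check, a small wrapper script would need to exit with status 0 when `has_free_space("/scratch", 1)` is true and non-zero otherwise, since the exit status is what bsub -E looks at.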

For my second trick, I’ve written my application to respond gracefully to interrupts that ask it to terminate early. When it receives a ctrl-c (INT signal), the program stops cycling, reports partial results, and then exits. This allows the program to be stopped early without wasting the results that it has already gathered. On the cluster, when my jobs have run longer than their time limit, LSF sends a USR2 signal to them, which they treat like the INT signal above.

This was actually not that difficult to implement because the signal function is well documented. However, on the cluster my jobs are contained in a shell script that sets up the environment, calls my application, and then cleans up, moving the output from scratch to a permanent location. This shell script was not handling the USR2 signal gracefully and was thus dying instead of performing cleanup. After some digging, I found that I needed to convert my shell script from tcsh (C Shell), which has limited signal handling, to bash (Bourne-again Shell), which has flexible signal handling. Under bash, all I have to do is issue the following command to force bash to ignore USR2 and INT signals.

trap '' USR2 INT

This allows the shell script to proceed with cleanup after my application has terminated early.
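The application side of this trick follows a common pattern: the signal handler only sets a flag, and the main loop checks the flag between cycles. Here is a minimal Python sketch of that pattern for Unix systems. It is not the code from my application; the names `request_stop` and `run`, and the `interrupt_at` testing aid, are hypothetical.

```python
import os
import signal

stop_requested = False


def request_stop(signum, frame):
    """Signal handler: only record the request; the main loop does the rest."""
    global stop_requested
    stop_requested = True


def run(max_cycles, interrupt_at=None):
    """Run up to max_cycles iterations, stopping early if INT or USR2 arrives.

    interrupt_at is a testing aid (hypothetical): at that cycle the process
    sends itself SIGUSR2 to simulate LSF's time-limit signal.
    """
    global stop_requested
    stop_requested = False
    signal.signal(signal.SIGINT, request_stop)
    signal.signal(signal.SIGUSR2, request_stop)
    completed = 0
    for cycle in range(max_cycles):
        if stop_requested:
            break  # stop cycling; fall through to report partial results
        if cycle == interrupt_at:
            os.kill(os.getpid(), signal.SIGUSR2)
        completed += 1
    return completed  # partial results survive an early stop
```

Keeping the handler this small means all the real work (reporting and cleanup) happens in ordinary code after the loop, which is much easier to reason about than doing it inside the handler.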

Writing the previous entry reminded me that I never posted the following solution.

While working on my book, The Open Laboratory 2007, I typeset everything in LaTeX. One of the features that I included was a drop cap at the beginning of each entry in the anthology, using the Lettrine package.

However, I could find no way to automate the process in LaTeX, requiring the \lettrine command to be included at the beginning of each entry. This resulted in a few entries missing the drop cap because their omission was not caught in our short development time. I worked on the book a while after it was published, fixing some errata that I found. During this time, I finally developed a way to apply the drop cap automatically to the entries. Here is the solution, simplified a bit.

% Required Packages

% Setup lettrine
\input EileenBl.fd

% The Magic: finds first letter and first word in the proceeding text
% and passes them to lettrine

% Setup environment `entry' to use `entry*' with a drop cap
% Setup environment `entry*' so that lettrine can be manually specified if needed

This solution will not always work if the entry begins with something other than text; markup and figures can confuse it. However, for the few instances when I need to specify \lettrine manually, I can fall back to the entry* environment.

Sumatra PDF and LaTeX

I recently discovered a wonderful PDF reader for Windows that makes working with pdfLaTeX a breeze: Sumatra PDF. (Yes, it has a whole Watchmen theme going on.) It is lightweight and easy to use. Unlike Acrobat Reader, Sumatra PDF does not lock the file, which allows you to overwrite it without having to close your viewer. Additionally, Sumatra will detect that the PDF has changed and reload the document, staying on the same page.

I doubt Sumatra PDF supports all the latest features of Acrobat Reader, so if you have to work with a complex document, you can always fall back to Acrobat. However, for the majority of LaTeX documents it is the perfect solution. Combine Sumatra with TeXnicCenter, and you have a powerful, free solution for writing LaTeX-based documents on Windows.

Compiling R Modules on FreeBSD

Installing modules for R can be tricky on FreeBSD due to differences between the compiler environment on FreeBSD and Linux. If the R module links against a third-party library, it is likely that the library will not be found without a bit of additional tweaking of the install command. On my FreeBSD 7.1 workstation, I have to use the following command to install the GSL and Cairo R libraries, both of which link against external libraries. (They are wrappers to these libraries.)

install.packages(c("gsl", "Cairo"), configure.args=c("CPPFLAGS=-I/usr/local/include", "LIBS=-L/usr/local/lib"))

It would be nice if this was supported out of the box in R, but if you know the solution it’s rather trivial to do.

MT-Dispatch 2.00

I finally rolled MT-Dispatch 2.00. I’ve been using it for a while on my two sites, but haven’t released it until now. This version includes several new features including a new directory layout, better responsiveness, auto recycling, and synchronization to MTOS 4.21.



Test Away!


I’ve created a new client-side script to go along with the next version of Xomment that will verify and correct comments, before they are submitted. I don’t have time to describe how it works right now, but try previewing or submitting a comment on this entry to see it in action.

Try this bad comment for starters:

<b><i>I am a quote</i></b>


<b><i>I am a quote</b></i>

Send me any compliments or issues that you find. Note, comment validation is tied to your browser, so depending on what you use, you may get different results.

Howling Nightmare

Three years ago I mentioned that the creator of the Sims was working on a new game-of-life called Spore. It now looks to be nearly done and they’ve offered an early showing of their “creature creator” to celebrities and choice bloggers to help advertise their game. Although inspired by the science of evolution, the developers have taken plenty of artistic liberties with the concept to make this game. Don’t expect Spore to be anything like Avida anytime soon.


While I tried to make a Prof. Steve Steve inspired creature, it didn’t work too well. Instead I opted to create a “Howling Nightmare”. This is the creature’s description:

Howling Nightmare, Alouatta pandas, is a flying carnivore covered in hard armor and known for its powerful howls and painful bites. It typically hunts at night, ambushing large, slow herbivores while they sleep. A single pair can consume five thousand times their body weight during a breeding season to feed their ravenous brood. Although this secretive creature is rarely seen, its kills litter the landscape, while its frightening howls remind you that it is never far away.

Announcing Xomment


To improve the experience for our readers on Panda’s Thumb, I’ve been working a while on revolutionizing the comment experience with Web 2.0 technology. An early version of the technology is already running the Bathroom Wall, and I’ve deployed it on this site. Note that I’ll deploy it fully on PT once MT 4.15 gets out of beta.

I’m very proud of the technology and have released it to the Movable Type blogging community as Xomment. What makes Xomment special is that it Ajaxifies the comments, providing four new features.

  1. Comment paneling. It’s like pagination, only better.
  2. Comment submission without redirect.
  3. Comment preview without redirect.
  4. Comment quotation.

Cool, huh?

For download and installation instructions, see Xomment documentation.

A traditional library for command line processing is the getopt library. For C++ programmers, a powerful alternative is the Boost::Program_options library (PO). PO allows you to specify your options in a rather simple and flexible way:

po::options_description desc("Allowed options");
desc.add_options()
    ("help", "produce help message")
    ("compression", po::value<int>(), "set compression level")
;

What I like about PO is that it does all the heavy lifting for you, like putting the command-line options into variables:

    ("output-file,o", po::value<std::string>(&arg_output_file)->default_value("out.txt"), "Save output to file.")

Now you can run your program and pass command line options to it:

foo --output-file=foo.out
foo -o foo.out

And after the application processes your command line options, the value of output-file will be found in the arg_output_file variable. Now you just have to create a bunch of arg_ variables and pass them to add_options. However, this quickly becomes hard to maintain when you realize that to add a new command line option, you may have to edit multiple files to get it to work right. This is where the awesome power of X-Macros comes into play.

X-Macros are undefined macros that are contained in a file, say app.cmds:

//XCMD(lname, sname, desc, type, def) 
XCMD(help,             h, "display help message", bool, false)
XCMD(version,          v, "display version information", bool, false)

One uses these X-Macros by defining XCMD in a source file, including app.cmds, and then undefining XCMD. By defining XCMD different ways, you can reuse your cmds file.

// First pass: declare one arg_ variable per command
#define XCMD(lname, sname, desc, type, def) type arg_##lname ;
#include "app.cmds"
#undef XCMD

// Second pass: register each command with Program_options
po::options_description desc("Allowed options");
desc.add_options()
#define XCMD(lname, sname, desc, type, def) ( \
    #lname "," #sname, \
    po::value< type >(&arg_##lname)->default_value(def), \
    desc )
#include "app.cmds"
#undef XCMD
;

This code would become the following after preprocessing the X-Macros.

bool arg_help;
bool arg_version;

po::options_description desc("Allowed options");
desc.add_options()
    ("help,h", po::value< bool >(&arg_help)->default_value(false) , "display help message")
    ("version,v", po::value< bool >(&arg_version)->default_value(false) , "display version information")
;

Now to update your command line options, all you need to do is edit your app.cmds file, and PO and X-Macros will do the rest for you. Isn’t that totally cool!
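The underlying idea, one table of options that drives everything else, is not specific to the C preprocessor. As a rough analogue (and only an analogue, since Python has no X-Macros), here is a minimal sketch using the standard argparse module. The `COMMANDS` table, the `build_parser` function, the program name, and the short name chosen for compression are all assumptions made for illustration.

```python
import argparse

# One table playing the role of app.cmds:
# (long name, short name, description, type, default)
COMMANDS = [
    ("output-file", "o", "save output to file", str, "out.txt"),
    ("compression", "c", "set compression level", int, 0),
]


def build_parser():
    """Build the whole command line interface from the COMMANDS table."""
    parser = argparse.ArgumentParser(prog="foo")
    for lname, sname, desc, typ, default in COMMANDS:
        parser.add_argument("--" + lname, "-" + sname,
                            help=desc, type=typ, default=default)
    return parser
```

With this in place, `build_parser().parse_args(["-o", "foo.out"])` yields an object whose `output_file` attribute is "foo.out", and adding a new option means adding one row to the table.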

I’ve uploaded a simple application framework that makes use of these concepts. I’ve used it in my recent application development, including the 1.2 version of Ngila, and three other unreleased projects. Click here to download the framework. You will of course need to install Boost to use it.

The X-Macros in this framework are a bit more complicated than the examples here because they handle conditions not discussed here. These include an optional sname argument, and the ability to handle compound command line options, e.g. arg_output_file being specified by --output-file instead of --output_file. I use parts of the Boost::Preprocessor library to support these things. The framework also uses additional features of Boost::Program_options, including passing arguments via a file and using positional arguments.

Try it out and let me know what you think.

MCMC Convergence Tool

My friend Paulo Nuin has an article up highlighting a new tool for testing the convergence of MCMC chains: Are we there yet? Pretty much.

AWTY is pretty simple interface, the simple here being a compliment. No frills and easy to use. Basically the whole process can be summarized by:

upload tree file(s) -> select analysis type -> click Plot -> check result

Here is another javascript blogging trick for you. We’ve been using it for many years on the Panda’s Thumb, and we recently debugged a problem some browsers had with it. Like my previous ones, this one was recently updated to work with the jQuery javascript library.

In this trick, whenever your users read an entry via the ‘#comments-new’ anchor, their browser will jump them to the last comment that they read. The browser knows which comment to jump to because it stores a cookie the last time they visit the page. To get this to work with Movable Type 4 you will need to first upload jquery.js and modify your header template module. Insert the following line between the <head></head> tags.

 <script type="text/javascript" src="/URL/TO/jquery.js"></script>

At the same time it might be best to replace Movable Type’s use of the onload command with a call to jQuery’s ready event. Simply add the following lines before the </head> tag:

<MTIf name="body_onload">
<script type="text/javascript" language="javascript">
$(function(){<$MTGetVar name="body_onload"$>});
</script>
</MTIf>

Also remove the <MTIf name="body_onload">... tag from the <body> tag. This should provide better performance and compatibility with the rest of my javascript tips.

Now for the magic. Edit your Comments template module and add the following text before the first <MTComments> tag.

<mt:Unless name="comment_preview_template">
<div id="comments-new"></div>
<script type="text/javascript" language="javascript">
	var now = new Date();
	var cook = getCookie('last_comments_a');
	var hid = (cook) ? eval(cook) : new Object();
	for(var k in hid)
		if(hid[k][1] < now )
			delete hid[k];
	now.setTime(now.getTime() + 7*24*60*60*1000);
	var cid = 0;
<mt:Comments lastn="1">
	cid = hid[<mt:EntryID/>];
	hid[<mt:EntryID/>] = [<mt:CommentID/>, now.getTime()];
</mt:Comments>
	var src = (!Object.prototype.toSource) ? objSource(hid) : hid.toSource();
	setCookie('last_comments_a', src, now,'/','','');
	if( window.location.hash == '#comments-new')
		window.location.hash = (cid) ? '#comment-' + cid[0] : '#comments';
</script>
</mt:Unless>

And finally, edit your Entry Metadata template module and replace #comments with #comments-new.

So now whenever someone clicks the comments tag, it will direct them to #comments-new which will trigger the script to jump them to the last comment read or the first comment on the page.

I was recently made aware of the fact that this solution was broken on Opera and maybe other browsers. Therefore, you may want to add the following lines to your Javascript index template to work around limitations of such browsers.

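// Serialize numbers, strings, arrays, and plain objects into source text
// for browsers that lack Object.prototype.toSource.
// Lines marked "reconstructed" were lost from the listing as posted.
function objSource(obj) {
	switch (typeof obj) {
	case 'number':
	case 'string':
		return obj.toString();
	case 'object':
		var str = [];
		switch(obj.constructor) {
		case Function:
			return 'null';
		case String:
		case Number:
			return obj.toString();
		case Array:
			var i=0,j=obj.length;
			while(i<j) {
				str.push(objSource(obj[i++])); // reconstructed
			}
			str=['[',str.join(', '),']'];
			break; // reconstructed
		default: // reconstructed: any other object
			for(var i in obj){
				var v = objSource(obj[i]);
				if(typeof(v) != 'undefined') {
					str.push(i + ':' + v); // reconstructed
				}
			}
			str=['({',str.join(', '),'})'];
		}
		return str.join('');
	}
	return 'null';
}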

And knowing is half the battle.

Last October I described my algorithm for rendering LaTeX equations for use on this blog: “Rendering Equations in Movable Type”. For kicks, here are instructions on how to manually run the algorithm. The only thing missing is the middle alignment part of the algorithm.

First create a file, called eqn.tex using the following skeleton. Insert your LaTeX equation code where it says to.


Next process the file with pdflatex eqn.tex, which will render your equation as eqn.pdf. Now we will use the convert command line tool from the ImageMagick library to turn eqn.pdf into eqn.png.

convert \( -density $DENSITY eqn.pdf -trim +repage \) \
   \( +clone -fuzz 100% -fill $FG -opaque black \) \
   +swap -compose copy-opacity -composite \
   \( +clone -fuzz 100% -fill white -opaque $BG +matte \) \
    +swap -compose over -composite eqn.png

Here DENSITY, FG, and BG are user tunable variables. DENSITY tells convert how many pixels per inch to use when rendering the pdf to a png. This determines how big of a png you have for a given rendered equation. Note that a density of 300 is print quality, 96 is windows monitor standard, and 72 is the mac standard. FG is the color of the equation text in the final image, and BG is the color of the background. Setting BG to none will make the background transparent.
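As a quick sanity check when choosing DENSITY, the pixel size of the output is just the physical size of the trimmed equation multiplied by the density. A tiny Python sketch of that arithmetic follows; the function name and the 2 in by 0.5 in equation size are made-up illustrations, and ImageMagick's own rounding may differ by a pixel.

```python
def rendered_pixels(width_in, height_in, density):
    """Approximate pixel dimensions of a region rasterized at `density` pixels per inch."""
    return round(width_in * density), round(height_in * density)
```

For an equation that trims to 2 in by 0.5 in, a density of 300 gives roughly 600 by 150 pixels, while 96 gives roughly 192 by 48.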

Now for an explanation of the options:

\( -density $DENSITY eqn.pdf -trim +repage \) \ loads the pdf into memory, removes excess background, and saves it into the image stack position 0.

\( +clone -fuzz 100% -fill $FG -opaque black \) \ copies the image at position 0, fills it with the FG color, and saves it at position 1. We clone and fill to ensure that image 1 has the same dimensions as the trimmed image in position 0.

+swap -compose copy-opacity -composite \ uses the black-and-white values in position 0 to determine how transparent pixels are in 1, clears the stack, and saves the result to pos 0. We actually swap positions 0 and 1 to put them in the right order for the -composite operator.

\( +clone -fuzz 100% -fill white -opaque $BG +matte \) \ copies image 0 again and fills it with the BG color, discarding all transparency information.

+swap -compose over -composite eqn.png overlays image 0 onto image 1 and saves result to eqn.png.

And now you have an equation rendered as a png that can be included on any webpage.

Spammers use automated programs to search the internet for email addresses to harvest. There are several techniques, collectively known as address munging, for hiding email addresses from spam “robots”. Although all of these techniques can be beaten (some rather easily), you probably still want to use them because the average harvester is not designed to handle them. Sites with exposed email addresses outnumber those that protect theirs, and as long as there are easy nuts, the spammers are not going to invest the resources and time to crack the hard nuts. (Think Darwin’s Finches.)

The simplest technique is to add characters to the email address that most humans will recognize and remove or replace to form the valid address:

jason [at] theargonauts [dot] com

This is a very effective technique; however, it requires user intervention, and the most popular techniques can still be interpreted with little effort by a robot. If user intervention is not a concern, then one nearly flawless technique is to simply display the email in an image. In my opinion this should only be used as a last resort because it has several drawbacks: no click to email ability, difficulty in copying the email address, increased download size, etc.

Transparent munging techniques can be used to allow users to click and use email addresses directly. The simplest technique encodes the email address using html entities:

Obviously, this is not that hard to crack because everyone knows the entity-to-character translation rules ahead of time. However, we can use javascript to make things difficult to understand but easy to use. The logic here is to encrypt email addresses in webpages and use javascript to force browsers to decrypt them. The advantage here is that to all humans visiting your site, the email address behaves like a normal mailto: link. However, robots won’t understand it because they don’t implement javascript. Using a simple rot13 cipher would produce the following address:


From this, javascript can be used to tell the browser what the above email address really is. Although generic harvesters will probably never implement the complexities of javascript, they can be programmed to decode the email addresses without javascript once they learn the algorithms. However, as long as each site implements its own encode and decode algorithms, it is uneconomical for robots to specialize and hard for them to be comprehensive.
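To make the rot13 idea concrete, here is a minimal Python sketch using the standard library's rot_13 codec. The address user@example.com is a placeholder, not a real address from this site, and the helper name `rot13` is just for illustration.

```python
import codecs


def rot13(text):
    """Rotate each ASCII letter 13 places; digits and punctuation pass through unchanged."""
    return codecs.encode(text, "rot_13")


encoded = rot13("user@example.com")  # 'hfre@rknzcyr.pbz'
decoded = rot13(encoded)             # applying rot13 twice restores the original
```

Because rot13 is its own inverse, the same routine serves for both encoding on the server and decoding in the browser, which is also why it is so easy for a harvester to undo.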

I use the following javascript framework to decode email addresses on my blogs:

  1. Find all links with a mailto: specification and for each:
  2. Match an encoded address or return
  3. Decode the address
  4. Split the decoded string into the real address and an optional url text
  5. Set the link location to the real email address.
  6. Set the inner html text if it was specified.

Using the tricks of the jQuery library and the assumption that email addresses are encoded as hexadecimal numbers (characters 0-9 and a-f), we have the following framework. You will just need to supply your own decryption routine.

	$('a[href^="mailto:"]').each(function(i) {
		var xr = this.href.match(/^mailto:([0-9a-f]+)$/);
		if(!xr) return;
		var x = xr[1]; // encoded address
		var s = ''; // decoded address
		// Here be decoding of 'x' into 's'.
		xr = s.split(/\|/);  // optionally split real address from url text
		this.href = "mailto:" + xr[0];
		if(xr.length > 1) $(this).html(xr[1]);
	});

Here is an example of an encoded email address used on this page that is decoded by the script above:

  <a href="mailto:7d0f6a0f6b2b4f2a583d4f3a5739582c592b4a6411625d0e7b19731675013c781d3d6f0a780d60400e6f1b6e1c7d5d1f731c7b">Email Me</a>

Very cryptic, and yet the page still validates (unlike other javascript email hiding implementations) while the user hardly ever notices that anything is up. Of course, if you decide to use this technique you will need a way to encode your email addresses in the first place. I wrote a Movable Type plugin to produce my encoded email addresses. Another option would be to set up a form on your website that would process addresses and return the proper encoding.
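If you want to roll your own encoder, here is one minimal Python sketch of what it could look like. This is not the scheme used on this site: the single-byte XOR key, the function names, and the placeholder address are all assumptions. It only follows the two conventions described above, namely hexadecimal output and an optional "|" separating the real address from the link text.

```python
def encode_address(address, text=None, key=0x5A):
    """Hex-encode 'address' or 'address|text' after XOR-ing every byte with `key`."""
    payload = address if text is None else address + "|" + text
    return bytes(b ^ key for b in payload.encode("utf-8")).hex()


def decode_address(encoded, key=0x5A):
    """Inverse of encode_address; returns (address, text or None)."""
    payload = bytes(b ^ key for b in bytes.fromhex(encoded)).decode("utf-8")
    address, _, text = payload.partition("|")
    return address, (text or None)
```

Whatever decoding routine you drop into the javascript framework has to mirror the encoder exactly, so it is worth testing the round trip, for example that `decode_address(encode_address("user@example.com", "Email Me"))` gives back the address and the link text.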

Many of you may be familiar with the behavior on this blog and the Panda’s Thumb that links to external websites open a new window or tab. In old fashioned html, this would be accomplished by adding a “target” attribute to your link:

<a href="" target="_blank">Scitus</a>

However, with the advent of XHTML, content was separated from style and behavior, and the target attribute was officially retired, although it is still recognized by standard browsers. Instead, with XHTML, the proper solution is to use the “rel” attribute to show how the linked page is connected to the current page:

<a href="" rel="external">Scitus</a>

Perhaps you are using a browser that understands this, but you probably are not; therefore, we use a line of javascript to tell the browser what to do with those links:

$(function(){$('a[href][rel*=external]').each(function(i){this.target = "_blank";});});

The above code is very cryptic because it relies on shortcuts provided by the jQuery library. With Prototype you can use something equally cryptic like this:

function externalLinks() {
	$$('a[href][rel~=external]').each( function(value, index) {value.target = "_blank";});
}
Event.observe(window, 'load', externalLinks, false);

Sure, it’s more complicated than the original, but it wouldn’t be progress unless they made things harder to do.
