Friday, June 25, 2010

Implementing file upload progress bar for the new PhilPapers

At first glance this seems like a trivial thing to do - you periodically check how much of the file has arrived and update the progress bar - so it did not look like much work when I was assigned the task of replacing the hodgepodge of technologies that provided this feature in the previous version of the PhilPapers website. But a quick survey of the available Open Source solutions revealed that there aren't many of them, and none fit our choice of technologies, in particular our requirement not to use Flash - so, bad luck, I had to write our own from scratch. This lack of readily available libraries hints at the difficulty of that seemingly trivial task. I will describe our solution here - not because I think it is optimal, but to start a discussion of what an optimal solution could look like.

First of all, the common programming tools (like CGI.pm or Mason, which we use here) assume that the page handler receives the whole request as input - and the whole request is not available until after the file has been uploaded. So, for example, 'my $q = CGI->new' will not return until it is too late to measure the upload progress. The solution is to use another page to report the upload progress and to call that page via Ajax from the JavaScript code that updates the progress bar. This would work great, except that the file is normally uploaded to a temporary file with a random name, and the other script has no way to guess it. So we need to generate a new random file name in the form page and pass it both to the form handler script, so that it saves the data to that file, and to the Ajax script, which checks the size of that file.
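The two helper pieces described above (generating the name and reporting the file size) can be sketched roughly as follows. This is only an illustration, not the PhilPapers code: the function names, the 32-character hex id format and the JSON response shape are all assumptions made for the example, and rand() is not a cryptographically strong source of randomness.

```perl
use strict;
use warnings;

# Generate a random 32-character hex id to use as the upload file name.
# (Hypothetical helper; rand() is good enough for a sketch only.)
sub new_upload_id {
    return join '', map { sprintf '%02x', int rand 256 } 1 .. 16;
}

# Return the number of bytes written so far for the given upload id.
# Returns undef for a malformed id; a missing or empty file counts as 0.
sub bytes_received {
    my ($dir, $id) = @_;
    # Accept only plain hex ids so the id cannot point outside $dir.
    return unless defined $id && $id =~ /\A[0-9a-f]{1,64}\z/;
    return ( -s "$dir/$id" ) || 0;
}

# Build the body of the Ajax response as a tiny JSON document.
sub progress_json {
    my ($dir, $id) = @_;
    my $bytes = bytes_received($dir, $id);
    return defined $bytes ? qq({"received":$bytes}) : q({"error":"bad id"});
}
```

The progress script would print a Content-Type header followed by the result of progress_json(), and the JavaScript side would compare the "received" value with the total size it expects for the upload.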


To save the data into a specified filename I used the CGI.pm callback feature:

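# $fh is handed to the hook as its 4th argument; the false 3rd argument turns off CGI.pm's own temp file
my $q = CGI->new( \&hook, $fh, undef );
...
# Called by CGI.pm for every chunk of uploaded data it reads
sub hook {
    my ($filename, $buffer, $bytes_read, $fh) = @_;
    print $fh substr($buffer, 0, $bytes_read);
    $fh->flush();    # flush so that the file size on disk reflects the progress
}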

This is described in the subsection called "Progress bars for file uploads and avoiding temp files" of the CGI.pm documentation, but it is a great leap to say that it supports implementing a progress bar: you still cannot use it directly to get the progress from the CGI object on the form landing page, and you still need a separate script measuring the progress. For my solution all I needed was to pass the target file name to the code saving the data, something that could be easier than writing the callback above. And the callback is still not everything - I still need a way to pass the generated file name from the form page to that script, and not via form parameters, because they are not available at that stage. So how can that be done? Simple - as PATH_INFO, which is available in the %ENV hash even before the params are parsed by CGI.pm.
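Reading the name from PATH_INFO could look something like the sketch below. The helper name and the hex-only id format are assumptions made for this example; the point is only that the value is validated and is available before CGI->new is called.

```perl
use strict;
use warnings;

# Extract an upload id from a PATH_INFO value such as "/3f9a0c".
# Returns undef if the value is missing or does not look like a plain hex id,
# so that a crafted path cannot make us write outside the upload directory.
sub upload_id_from_path_info {
    my ($path_info) = @_;
    return unless defined $path_info;
    my ($id) = $path_info =~ m{\A/([0-9a-f]{1,64})\z};
    return $id;
}

# In the form handler this would run before the request body is parsed,
# for example (with $upload_dir being a hypothetical, already existing directory):
#
#   my $id = upload_id_from_path_info( $ENV{PATH_INFO} )
#       or die "missing or invalid upload id";
#   open my $fh, '>', "$upload_dir/$id" or die "cannot open upload file: $!";
#   my $q = CGI->new( \&hook, $fh, undef );
```

The form page then only has to append the generated id to the form's action URL, after the script name.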

This is the skeleton of the solution. There are a few more details in the actual implementation, but the code will be published soon as Open Source, so I hope everyone will be able to look them up there.