Code and Hacks

Stuff I've stumbled on or figured out... mostly Perl, Linux, Mac and Cygwin.

Location: CA, United States

Perl hacker, investor and entrepreneur.

Wednesday, April 29, 2009

C-Like Pointers In Perl...Oh No!

Tuesday night David Lowe gave a very interesting talk at SF.pm on pack/unpack and some of the awful things you can do with them.[1] We ended the meeting talking about whether you could use the pack format "P" (which packs and unpacks "a pointer to a structure (fixed-length string)") to force poor Perl to do C-like pointer arithmetic.

David is using unpack to do a binary search over fixed-width blobs of data so that he can avoid unserializing them. His current (minor) bottleneck is creating the pack format string dynamically for each step in the binary search (i.e., 'x' . ($record_size * $record + 1)). The math is fast; the string concatenation is relatively slow. I wondered if you could use the "P" format to avoid creating the format string on each pass and stick with simple integer arithmetic.
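For anyone who missed the talk, here is a minimal sketch of the kind of search being described. It is not David's code: the record layout (sorted, NUL-padded strings of $record_size bytes), the sample words, and the find_record name are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: sorted records, each a NUL-padded string of $record_size bytes.
my $record_size = 8;
my @words       = sort qw(apple banana cherry grape kiwi lemon mango);
my $haystack    = join '', map { pack( "a$record_size", $_ ) } @words;
my $n_records   = length($haystack) / $record_size;

# Return the index of $needle in the packed blob, or -1 if it is not there.
sub find_record {
    my ( $blob_ref, $needle ) = @_;
    my ( $lo, $hi ) = ( 0, $n_records - 1 );
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );

        # Skip $mid whole records, then read one NUL-terminated string.
        # This format string is rebuilt on every step of the search.
        my $format  = 'x' . ( $record_size * $mid ) . "Z$record_size";
        my $element = unpack( $format, ${$blob_ref} );

        my $cmp = $needle cmp $element;
        return $mid if $cmp == 0;
        if ( $cmp < 0 ) { $hi = $mid - 1 }
        else            { $lo = $mid + 1 }
    }
    return -1;
}

print find_record( \$haystack, 'grape' ), "\n";
```

Nothing is ever unserialized into a Perl array; each probe touches only the one record it compares against.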

After a bit of hacking, it turns out this can be done. Instead of David's very complicated:

# Create an unpack format to skip the first $record * $record_size 
# bytes, then return the next 100 byte null padded string
my $format  = 'x' . ( $record_size * $record ) . 'Z100';
# Unpack from our binary blob
my $element = unpack( $format, ${$frozen_haystack_ref} );

You get the nearly unfathomable:

 
# Use pointer arithmetic to calculate where the record is in memory
# and convert the Perl integer into an unsigned long integer
my $ptr     = pack( 'L!', $ptr_to_base + $record_size * $record );
# Pull 100 bytes from that spot in memory
my $element = unpack( 'P100', $ptr );

And voila, Perl is doing pointer arithmetic and accessing structures just like C. Unfortunately, unpack("P") won't take a native Perl integer as an argument; it expects the raw bytes of a pointer. You need to use pack("L!") to turn a Perl integer into a native unsigned long. So we trade the string concatenation in David's code for a pack("L!") in this code. And even worse, the string concatenation is about 20% faster than the pack.
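The snippet above never shows where $ptr_to_base comes from. One way to get it, sketched here with made-up sample data and a hypothetical record_at helper, is to pack the blob with "P" and read the resulting bytes back as an integer. This assumes a native unsigned long is as wide as a pointer, which is true on typical 32-bit and 64-bit Unix builds but not everywhere, so the sketch checks before going any further.

```perl
use strict;
use warnings;

# Hypothetical data: three NUL-padded records of $record_size bytes each.
my $record_size = 8;
my $haystack    = join '', map { pack( "a$record_size", $_ ) } qw(apple banana cherry);

# Assumption: a native unsigned long is as wide as a pointer on this platform.
die "L! is not pointer-sized here\n"
    unless length( pack( 'L!', 0 ) ) == length( pack( 'P', $haystack ) );

# pack('P') emits the address of the string buffer as raw bytes;
# unpack('L!') reads those bytes back as an ordinary Perl integer.
my $ptr_to_base = unpack( 'L!', pack( 'P', $haystack ) );

# Fetch record number $record (0-based) as raw bytes, NUL padding included.
sub record_at {
    my ($record) = @_;
    my $ptr = pack( 'L!', $ptr_to_base + $record_size * $record );
    return unpack( "P$record_size", $ptr );
}

# 'Z*' strips the NUL padding for display.
print unpack( 'Z*', record_at(1) ), "\n";
```

The address is only good while $haystack stays untouched. Modify or free the string and Perl may move the buffer, leaving $ptr_to_base pointing at whatever happens to live there now.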

So, while this doesn't appear to help David speed up his already cheetah-like code, it does prove that you can have pointers in Perl. Of course, you should never ever do anything like this. It is fraught with potential bugs and will drive anyone stuck maintaining your code insane.

Feel free to take apart my ugly benchmarking code. Maybe someone who knows this better can actually save David a few clock-cycles.
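If you would rather start from scratch, a comparison along these lines can be set up with the core Benchmark module. This is a rough sketch and not the benchmarking code mentioned above: the data, the record number, and the iteration count are arbitrary, and both variants use the "a" format so they return identical bytes. It also assumes, as before, that "L!" is pointer-sized on your platform.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 1000 NUL-padded records of 100 bytes each.
my $record_size = 100;
my $haystack    = join '', map { pack( "a$record_size", "record$_" ) } 1 .. 1000;

# Assumes a native unsigned long is pointer-sized on this platform.
my $ptr_to_base = unpack( 'L!', pack( 'P', $haystack ) );

my %variant = (
    # Build a skip-and-read format string on every call.
    concat => sub {
        my ($record) = @_;
        return unpack( 'x' . ( $record_size * $record ) . "a$record_size", $haystack );
    },
    # Do integer arithmetic on the address, then pack it back into pointer bytes.
    pointer => sub {
        my ($record) = @_;
        return unpack( "P$record_size", pack( 'L!', $ptr_to_base + $record_size * $record ) );
    },
);

# Make sure both variants fetch the same bytes before timing them.
die "variants disagree\n" unless $variant{concat}->(500) eq $variant{pointer}->(500);

# Prints a comparison table; the numbers will vary by machine and Perl version.
my $rows = cmpthese(
    500_000,
    {
        concat  => sub { $variant{concat}->(500) },
        pointer => sub { $variant{pointer}->(500) },
    }
);
```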

--

By the way, thanks to Matt Trout who got me motivated to (re)start blogging about Perl. In the past, I have gotten bogged down by setting up a site rather than focusing on adding content.[2] This time I decided to let Google do the work for me and focus on the content. Hopefully, this will result in more regular (and interesting?) posts. Feedback is very welcome.

Footnotes:
1. David actually has good reasons to do these horrible things, given some of the performance demands of his code; for the rest of us this is just fun^H^H^Hwrong.
2. Either putting together my own TT-based blog/site or trying to get MT to work the way I want.
