Perl6 load testing

Tags:

I had a need to do some testing of concurrent requests against and service and as is my way I reached for Perl6. After whipping up a quick single line script to do the testing I thought I might want to make something a bit more usable.

I'll run through the code in sections. Firstly we have a bit of boiler plate to use Perl6, allow command lines arguments in any order and load the HTTP::UserAgent module. I picked this one because it's lightweight and simple to use for what I wanted to do.

#!/usr/bin/env perl6
use v6;

use HTTP::UserAgent;

my %*SUB-MAIN-OPTS = :named-anywhere;

With that out of the way I like to use a MAIN sub for all my command line apps and use the built in self documentation comments with it

#| Makes a number of concurrent requests to a given URL and lists the time it takes to load
#| Times are given in the order requests are started.
    sub MAIN (
        Str $url, #= Url to request
        Int $count = 10, #= Number of requests to make (Default 10)
        Int :r(:$ramp-up) = $count, #= Ramp up increment (Note: if it's not a divisor of count the final count will be higher).
        Bool :s(:$summarize) = $count > 20 ?? True !! False, #= Summarize the results. Defaults to true if count > 20.
) { 
    ...
}

If you run this code without any arguments you get the automatically generated usage text.

Usage:
  ./load-tester.pl6 <url> [<count>] [-r|--ramp-up=<Int>] [-s|--summarize] -- Makes a number of concurrent requests to a given URL and lists the time it takes to load Times are given in the order requests are started.

    <url>                   Url to request
    [<count>]               Number of requests to make (Default 10)
    -r|--ramp-up=<Int>      Ramp up increment (Note: if it's not a divisor of count the final count will be higher).
    -s|--summarize          Summarize the results. Defaults to true if count > 20.

A couple of notes on the arguments. The $count value has a default value, the $ramp-up value defaults to be the same as $count so you have no ramp up. The defaulting of $summarize is fun, it looks at the value of $count and bases itself off that. By putting all this logic in the arguments when you enter the subroutine you don't have to waste a bunch of time checking values and setting defaults.

Inside our MAIN sub we firstly set up our code for loading urls. This is nice and simple.

my $ua = HTTP::UserAgent.new;

my sub time-load() {
    my $resp = $ua.get($url);
    return now - ENTER now;
}

Note that the time-load subroutine is lexically scoped using my to live within the MAIN subroutine. As such it has access to the newly created $ua object and the $url passed in. It's a local named closure we can use to make requests. The ENTER now part is a Phaser, the call to now is triggered when the subroutine is entered and the value got is used later when the calculation is reached. This way our time function returns the amount of time it took to make the request.

With that we just need to call the function. Of course to load test we want to fire up a bunch of concurrent requests and then wait for them all.

First we want a sequence of number of requests to make which will either start with $count or start at $ramp-up and increase up until count.

my @counts = $ramp-up,*+$ramp-up ... $count <= *;

for @counts -> $current {
     ...
}

Then inside our loop we want to create $current request concurrently, wait for them all the finish and report on the result.

    my @p;

    say "$current requests";

    for (^$current) {
        @p.push( start time-load() );
    }
    await @p;
    @p = @p.map( *.result );
    if ! $summarize { .say for @p }

    say "Min {@p.min}";
    say "Max {@p.max}";

Here we make an array @p then use start call our time-load closure as a Promise then we pass this to await which blocks until all the promises in the array have completed.

Then we get the result and display it. Simple really. The initial commandline version took about a minute to make then I decided to tidy it up into a script as I wanted to use it again. All in about 15 minutes. Things I may look at adding include keeping the load up for a bit maybe using channels and adding some error checking. But for me needs to check a theory I had about a server and it's time properties under load this did the job nicely. Perl6's ability to get a lot done in a little bit of code never ceases to amaze me.

Here's the complete script if you'd like it :

#!/usr/bin/env perl6
use v6;

use HTTP::UserAgent;

my %*SUB-MAIN-OPTS = :named-anywhere;

#| Makes a number of concurrent requests to a given URL and lists the time it takes to load
#| Times are given in the order requests are started.
sub MAIN (
    Str $url, #= Url to request
    Int $count = 10, #= Number of requests to make (Default 10)
    Int :r(:$ramp-up) = $count, #= Ramp up increment (Note: if it's not a divisor of count the final count will be higher).
    Bool :s(:$summarize) = $count > 20 ?? True !! False, #= Summarize the results. Defaults to true if count > 20.
) { 

    my $ua = HTTP::UserAgent.new;

    my sub time-load() {
        my $resp = $ua.get($url);
        return now - ENTER now;
    }

    my @counts = $ramp-up,*+$ramp-up ... $count <= *;

    for @counts -> $current {
        my @p;

        say "$current requests";

        for (^$current) {
            @p.push( start time-load() );
        }
        await @p;
        @p = @p.map( *.result );
        if ! $summarize { .say for @p }

        say "Min {@p.min}";
        say "Max {@p.max}";
    }
}

Perl Weekly Challenge 13

Tags:

So I know I've fallen off the blogging about the weekly challenge. This is not because I've not been doing (I have) or I've not been enoying them (I also have). It's because my blogging platform (a neat little Perl based system called Statocles) kind of died while parsing the Markdown on a previous blog post and this got me wound up.

It also got me to restart my personal plan to make a similar system for building simple, fast to update, static sites using Perl6. The first part of which (that is mostly so I can learn Grammars) is a Pure Perl6 Markdown parser.

It's taking a while but I'm hoping in my upcoming week off I'll get a lot more of it done.

Anyway I just read Laurent Rosenfield's post about this week and I wanted to share one of my two solutions becuase I thought it was kind of neat.

Hofstadter Female and Male sequences

So the Hofstadter Female and Male sequences are defined by the following equation :

F(0) = 1 ; M(0) = 0
F(n) = n - M(F(n-1)), n > 0
M(n) = n - F(M(n-1)), n > 0

So it's a pair of recursive functions. Where to calculate the value for n in either sequnces you need to calulate the ones before in both sequences. The problem with doing it as a pair of recursive functions is that without some kind of caching you're going to be calculating the same values over and over again (especailly) the smaller values of n. Laurent covered some nice ways to cache the data but when I had a look the word sequence jumped out at me. Hence my final code looks like this :

sub MAIN ( UInt() $n ) {
    my @M;
    my @F = lazy gather {
        my $n = 0;
        take 1;
        loop {
            $n++;
            take $n - @M[@F[$n-1]];
        }
    }

    @M = lazy gather {
        my $n = 0;
        take 0;
        loop {
            $n++;
            take $n - @F[@M[$n-1]];
        }
    };

    say "F : {@F[0..$n].join(", ")}";
    say "M : {@M[0..$n].join(", ")}";
}

Here I define a couple of lazy sequences that refer to one another. Note I have to predefine @M before I do @F or it raises and error but it's fine with my updating what @M is after the fact. I added some notes in the code to note before a take happened and each value is only taken once.

Later today I may run some speed tests of this code against the code Laurent wrote to see how performant it is.

An aside

While doing the first challenge for this week I found a bug in Perl6 where is you create a Range or Sequence of date objects and you give the first a custom formatter after the 28th of the month the formatted drops off. This bug was fixed as soon as I raised it in the Perl6 IRC and will hopefully be resolved in the next release.

Which is pretty cool.

Anyway, more later. Hopefully.

Perl Weekly Challenge Part 8

Tags:

Perfect Numbers

So if nothing else the challenges are helping me learn new bits of number theory. For this we're looking for Perfect Numbers which are numbers where the sum of their proper divisors equals the number itself.

I'll admit my first attempt at this involved me reading that, not reading the rest of the wiki page and making a couple of simple functions :

sub proper-divisors ( Int $p ) is pure {
    (1..^$p).grep( $p %% * );   
}

sub is-perfect ( Int $p ) is pure {
    $p == [+] proper-divisors( $p );
}

Once you've got those is simply a case of checking numbers until you get 5 where is-perfect is True. Which technically works. But it takes a while. A very long while to get to number 4 (8128) and I then stopped the script and looked at the wiki page.

Time for a new plan because I'd be dead of old age before we hit the next value.

Once again I scanned the code and whipped up a nice little function.

sub perfect-from-prime( Int $p where *.is-prime ) is pure {
    return (2**($p-1)) * ((2**$p) -1 ); 
}

So with this in place the code to get the list of prefect numbers is now simple.

.say for (1..*).grep( *.is-prime ).map( { perfect-from-prime( $_ ) } )[^5];

... I'll break that down.

.say for # Call the .say method on each item in the given list 
(1..*). # An infinite range starting at one. This will be evaluated lazily
grep( *.is-prime ). # Only include prime numbers
map( { perfect-from-prime( $_ ) } ) # Get the perfect number for the prime
[^5] # Get the range of items from this list starting at 0 and up to but not including 5

Which is nice. But as the eagle eyed amoung you will note .... utterly wrong. Because perfect numbers are not generated from primes but from Mersine primes. Did I mention I'm not a mathematician.

Anyway once I worked that out the fix was simple enough. Add a new function called is-mersine. This is a simple enough thing. If you call it with a prime number ($p) where ((2**$p)-1).is-prime is true then it's True. For any other value it's False. So you could do this with a ternary operator (or just by returning $p.is-prime && ((2**$p)-1).is-prime ) but I felt like pushing the boat out...

multi sub is-mersine( Int $ ) is pure { False; }

multi sub is-mersine( Int $p where ((2**$p)-1).is-prime ) is pure { True; }

Because I love multi subs an abusing signatures. In this case I'm not checking for primeness because I'm nothing if not consistent as I know I'm going to only pass it primes... Of course here's the final code I submitted.

#!/usr/bin/env perl6

use v6;

sub perfect-from-prime( Int $p where { $p.is-prime && is-mersine( $p ) } ) is pure {
    return (2**($p-1)) * ((2**$p) -1 ); 
}

multi sub is-mersine( Int $ ) is pure { False; }

multi sub is-mersine( Int $p where ((2**$p)-1).is-prime ) is pure { True; }

#| Help
multi sub MAIN ( Bool :h($help) where *.so ) {
    say $*USAGE;
}

#| Give the first X perfect integers
multi sub MAIN ( 
    Int $x = 5 #= Number of perfect integers to return (default 5)
) { 
    .say for (0..*).grep( *.is-prime ).grep( { is-mersine( $_ ) } ).map( { perfect-from-prime($_) } )[^$x];
}

Note my wonderful consistency if is-mersine not checking for primality but perfect-from-prime is. I dunno, I guess it made sense to me at the time. Anyway this was fun, enough maths, time for some text munging.

Centering Text

So this challenge asks for a function to add spaces to a list of strings to center them. In order to do this you are going to need to know the length of the longest string. So you're going to need to loop over the list a couple of times. Once to get the length and a second to add the spaces.

Of course I found a weird way to do it :

sub center ( *@lines, :$len = max( @lines.map( *.codes ) ) ) {
    @lines.map( { "{ ' ' x ( ( $len - $_.codes ) div 2 )}$_" } );
}

So here I have a function that takes any number of positional arguments and put them in an array *@lines. I also accept an optional named parameter :$len that by default is set to the max value of getting the length of each item in @lines (yes, the one defined in the signature).

Then we use a map to create out new strings. At this point I use the upgraded string interpolation powers of Perl6 to just embed my spacing code in here ' ' x ( ( $len - $_.codes ) div 2) gives us the difference between the max length and the current string halved (rounded down) then we just use the x string repition operator.

Of course by itself that's a bit dull so lets wrap that in a MAIN block and then... run it like so cat ch-2.p6 | perl6 ch-2.p6 which gives us.

                        #!/usr/bin/perl6

                            #! Help
    multi sub MAIN ( :help($h) where *.so ) { say $*USAGE; }

            #| Reads a list of strings from STDIN 
                    #| Outputs them centered
                    multi sub MAIN () {
                .say for center( $*IN.lines );
                                }

sub center ( *@lines, :$len = max( @lines.map( *.codes ) ) ) {
    @lines.map( { "{ ' ' x ( ( $len - $_.codes ) div 2 )}$_" } );
                                }

Isn't it pretty? The centering is a bit off of course as some of the lines start with spaces. I could probably strip those out first. But this is neat. So lets go with it.

Perl Weekly Week 5 (Part 2)

Tags:

Technical note

My blogging platform was really not happy with some of this post. I tihnk I'm going to have to go back to my earlier plan of building my own in Perl6. So I've cut up my first couple of failed attempts, one was kind of neat as it involved threading but I'm sure I'll revist it in the future. Meanwhile he's the final result.

Back to the drawing board

Ok. Brute force and ignorance, even with threads was not enough. Time to have a think about a better solution. Also... time to re-read the spec. I was trying to do too much. I already have a script that can find Anagrams for a given string, so why am I not doing what the spec says and follow the Unix paradigm? Write a script that does one thing well.

So I had a think. I needed to avoid trying to cross reference things, at this point I had an idea that was based on trees. The core idea goes like this, normalise and sort the string, then store it in a hash and keep count of numbers of times that node has been reached. Then I can keep running count of the biggest number and it's strings. Here's an example. Lets say we've just started and out first word is ant this goes into out tree which becomes :

{ 
 'a' => {
         'n' => {
                 't' => {
                         'count' => 1
                        }
                }
        }
}

So our counter is at 1 and the list of strings is ['ant']. Then we get act :

{ 
 'a' => {
         'c' => {
                 't' => {
                         'count' => 1
                        }
                }
         'n' => {
                 't' => {
                         'count' => 1
                        }
                }
        }
}

The counter stays at 1 and the list is now ['ant', 'act']. Now we get tan which becomes act and the tree is now

{ 
 'a' => {
         'c' => {
                 't' => {
                         'count' => 1
                        }
                }
         'n' => {
                 't' => {
                         'count' => 2
                        }
                }
        }
}

The counter is upped to 2 and the list is reset to ['ant'] and this continues. It's a nice simple algorithm and the code for it turns out like so :

#!/usr/bin/env perl6

use v6;
use lib 'lib';
use Anagrams;

my %*SUB-MAIN-OPTS = :named-anywhere;

sub USAGE { say $*USAGE }

subset FilePath of Str where *.IO.f;

#| Display Help file
multi sub MAIN ( Bool :h($help) where *.so ) { USAGE(); }

So here's our boilerplate again

#| Find the strings(s) which has the most anagrams in the given dictionary
#| Prints out 1 string per line to STDOUT
multi sub MAIN (
    FilePath :$dict = "/etc/dictionaries-common/words" #= Dictionary file to use. Defaults to "/etc/dictionaries-common/words"
) {
    my %store;
    my $max-length = 0;
    my @values;
    for $dict.IO.words -> $word {
        my $string = order-string( $word );
        my @keys = $string.comb;
        my $current = %store;
        for @keys -> $key {
            $current{$key} //= {};
            $current = $current{$key};
        }
        $current<count>++;

This part handles the spidering through the tree. At the end $current is a reference to the end node for the string. We increment the count value (creating it if it doesn't exists).

        given $current<count> {
            when * > $max-length {
                $max-length = $current<count>;
                @values = [$string];
            }
            when * == $max-length {
                @values.push($string);
            }
        }

Given is the Perl 6 version of a switch if our count is over the max length we up it and create a new output array. If it's the same as max length we add our new string to the output array.

    }
    @values>>.say;
}

Then once the loop is down we print out the results.

While the previous script took 3 hours and got no where this took 3 seconds. So I claim that as a win. Roll on next week.

Perl Weekly Week 5 (Part 1)

Tags:

So this weeks challenge was all about Anagrams and was very interesting. Because both of the solutions were Anagram based I ended up making a library used by both (and in the case of the second test all 3 of my attempted solutions) and I'm going to cover that first.

Anagram Library

So as this is a simple script I don't plan to release as a module or anything I'm not going full out with a META6.json file just a lib folder in the same folder as the scripts and inside that a file called Anagrams.pm6. (Note Perl6 modules can have iether a .pm6 or a .p6 extension. I think the latter is now recommended... but I'm sometimes slow to catch up.)

Our module starts by declaring it's namespace. As everything in the file is considered part of the module I can just use a unit declaration to state that (if you want to have multiple namespaces in a file you need to use blocks for scoping).

unit package Anagrams;

Of course a module with just that in would be a bit pointless so let's have a simple function is-anagram-of that given two strings will return true if the first is an anagram of the second. Now I know I'm going to be doing this a lot and that are some cases where we can quickly see it's not going to be true. For instance a word is not an anagram of itself so if our two strings are the same return false :

multi sub is-anagram-of( Str \target, Str \word where * eq target ) is export is pure { 
    False; 
}

I'll go into this in a bit more detail as there's a lot here and once you get your head around it it's kind of awesome :

  • multi sub is-anagram-of : This is a multi sub which means there will be more than one code path for it with different arguments.
  • Str \target : I'm defining my variables as a immutable Strings. Immutability is important if you want to easily leverage threading.
  • where * eq target : So this where clause is True if the two strings are the same
  • is export : I want this function to be exported to the callers scope.
  • is pure : For any given pair of strings this function always returns the same result. Can help the VM to speed up the code.
  • { False; } : No need for a return. The last thing evaluated is the return value for the sub.

Another case that's going to turn up a lot is when the two strings are not the same length. Then there's now way they can be anagrams :

multi sub is-anagram-of( Str \target, Str \word where *.codes != target.codes ) is export is pure { 
    False; 
}

Here I use the .codes method to count unicode code points which is generally What you want to do TM.

With those two False cases out of the way I can now test my strings. It's at this point I want to have a small digression about English and theft.

Cafes, Faces and Cáfes

The concept of an Anagram can only have been invented by and English speaker. The English language is much like the English people, a bit light fingered when it comes to other cultures stuff. Historically we were really good at wandering to other peoples countries and going "Oh that looks nice I. It's ours now". And we do it to this day with other languages, English speakers see nothing wrong with appropriating words from other languages. (Except Americanisms, those are just wrong.)

But... technically we don't really use accents on letters. We've just got the 26 and get all confused about funny squiggles on top of letters or dangling off the bottom. And don't get me started on ß...

Anyway when I was testing my anagram code I tried looking for the anagram of cafe and got face. And then I tried looking for the anagram of face and got... nothing. Because cafe is actually spelt cáfe. Please note I can't event type á easily. As a native English speaker though I don't care about the accent, just the letters and I'd quite like my anagram checker to be the same, also it would be good if it wasn't case sensitive.

This first part is simple enough lc will give us the string in lowercase. But stripping of accents and other marks that's going to be a pain isn't it?

Itroduction samemark. Which allows you to match the accent information between two strings. And if the second string is shorter than the first then the last character is used for all matching. Which means :

samemark( lc( "Cáfe" ), "a" ) eq "cafe"

That's great and I've got a little function to do that (I call it normal).

sub normal ( Str \word ) is pure {
    samemark( lc( word ), "a" )
}

This function is only used internally so it's not exported.

Back to Anagrams

Ok so to test if a string is an anagram of another string if I take the string, split it into letters and sort the list and then compare the sorted lists they should be the same. As I'm doing that twice I'll make a function to do it :

sub order-string ( Str \word ) is export is pure {
    normal( word ).comb.sort( { $^a cmp $^b } ).join;
}

Note that we're using our normalised string. I'm also joining it again but it would be trivial to return a list and compare that... I just didn't think of it.

With that in place we've got our final is-anagram-of sub :

multi sub is-anagram-of( Str \target, Str \word ) is export is pure {
    normal( target ) ne normal( word ) && order-string( target ) ~~ order-string( word );
}

So you'll see that the first thing we do is compare the normalised strings are not the same. Otherwise cafe would be an anagram of cáfe. This isn't caught by our initial where clause. I thought about doing it there but I figured it's more of an edge case and I didn't want to normalise every string.

So if our two normal strings are different and the ordered versions are the same then they are anagrams. Awesome!

Putting it together

Now I've got this library the script (which weirdly I see I gave a .p6 extension, I'm nothing if not inconsistent).

Firstly some boilerplate :

#!/usr/bin/env perl6

use v6;
use lib 'lib';
use Anagrams;

This loads our Anagrams module and makes sure if we accidentally run this with perl5 it explodes asap.

Then I like to use Perl6's auto generation for usage text and the following gives us a help / h flag.

my %*SUB-MAIN-OPTS = :named-anywhere;

sub USAGE { say $*USAGE }

#| Display Help file
multi sub MAIN ( Bool :h($help) where *.so ) { USAGE(); }

The my %*SUB-MAIN-OPTS = :named-anywhere; allows you to mix and matched named and positional arguments on the command line. $*USAGE is an autogenerated string based on the signatures and comments on the MAIN functions. (#| being a comment tagged to the function below).

For more information why not check out the talk I gave at London Perl Mongers last year?

With that out of the way here's the meat of the script :

subset FilePath of Str where *.IO.f;

#| Find the anagrams for a given word
multi sub MAIN (
    Str $word, #= Word to check for. Case insensitive
    FilePath :$dict = "/etc/dictionaries-common/words" #= Dictionary file to use. Defaults to "/etc/dictionaries-common/words"
) {
    $dict.IO.words.grep( { is-anagram-of( $word, $_ ) } )>>.say;
}

FilePath is a simple subset to ensure that a given string is actually a file, but the main function is no quite simple. $dict.IO.words opens our files and reads string from it splitting on white space (in this case newlines) this is a Sequence that can be iterated over. grep applies our is-anagram-of function with out given word and keeps those that are then we use the hyper operator and .say to print out any that work.

Generally nice and simple. How hard could part 2 be?

Oh ho ho ho ho. Tune in later for Part two of the challenge and be ready to mock my child like ignorance.