pl6anet

Perl 6 RSS Feeds

Steve Mynott (Freenode: stmuk) steve.mynott (at)gmail.com / 2016-08-31T08:21:19


Weekly changes in and around Perl 6: 2016.35 A Quick One From Zagreb

Published by liztormato on 2016-08-29T21:41:12

Yours truly reporting en route from the YAPC::Europe 2016 in Cluj-Napoca to the Alpine Perl Workshop in Innsbruck. A full report should be appearing shortly.

Meanwhile in the Far East

Audrey Tang, the original creator of Pugs (an earlier implementation of Perl 6 using Haskell), has been appointed as a minister without portfolio in the Taiwanese government. I think it is safe to say that the entire Perl community wishes her all the best in this new endeavour! Each time one does a spectest, remember that the Official Perl 6 Test Suite was started as part of the Pugs project!

Core Developments

Blog Posts

Ecosystem Additions

Quite a nice crop again this week:

Winding Down

It has been a busy week for yours truly with all the travelling and conference attending and socializing and sleep depriving. I hope I didn’t miss too many important things. If you think I did, let me know and we’ll let the world know the next week. Until then!


Zoffix Znet: Open Source Projects: When Ideas Meet Reality

Published on 2016-08-28T00:00:00

A discussion about proposing ideas in open source communities

6guts: Concurrency bug squishing: part 1

Published by jnthnwrthngtn on 2016-08-22T22:24:26

Most of my recent Perl 6 development time has been spent hunting down and fixing various concurrency bugs. We’ve got some nice language features in this area, and they’ve been pretty well received so far. However, compared with many areas of Perl 6, they have been implemented relatively recently. Therefore, they have had less time to mature – which, of course, means more bugs and other rough edges.

Concurrency bugs tend to be rather tedious to hunt down. This is in no small part because reproducing them reliably can be challenging. Even once a fairly reliable reproduction is available, working out what’s going on can be tricky. Like all debugging, being methodical and patient is the key. I tend to keep notes of things I’ve tried and observed, and output produced by instrumenting programs with logging (fancy words for “adding prints and exception throws when sanity checks fail”). I will use interactive debuggers when needed, but even then the data from them tends to end up in my editor on any extended bug hunt. Debugging is in some ways the experimental science of programming. Even with a good approach, being sufficiently patient – or perhaps just downright stubborn – matters plenty too.

In the next 2-3 posts here, I’ll discuss a few of the bugs I recently hunted down. It will not be anything close to an exhaustive list of them, which would surely be as boring for you to read as it would be for me to write. This is just the “greatest hits”, if you like.

A tale of silly suspicions and dodgy data

This hunt started out looking into various hangs that had been reported. Deadlocks are a broad category of bug (there’s another completely unrelated one I’ll cover in this little series, even). However, a number of ones that had shown up looked very similar. All of them showed a number of threads trying to enter GC, stuck in the consensus loop (which, yes, really is just a loop that we go around asking, “did every thread agree we’re going to do GC yet?”).

I’ll admit I went into this one with a bit of a bias. The GC sync-up code in question is a tad clever-looking. That doesn’t mean it’s wrong, but it makes it harder to be comfortable it’s correct. I also worried a bit that the cost of the consensus loop might well be enormous. So I let myself become a bit distracted for some minutes doing a profiling run to find out. I mean, who doesn’t want to be the person who fixed a concurrency bug in the GC and made it faster at the same time?!

I’ve found a lot of useful speedups over the years using callgrind. I’ve sung its praises on this blog at least once before. By counting CPU cycles, it can give very precise and repeatable measurements on things that – measured using execution time – I’d consider within noise. With such accuracy, it must be the only profiler I need in my toolbox, right?

Well, no, of course not. Any time somebody says that X tool is The Best and doesn’t explain the context when it’s The Best, be very suspicious. Running Callgrind on a multi-threaded benchmark gave me all the ammunition I could ever want for replacing the GC sync-up code. It seemed that a whopping 30% of CPU cycles were spent in the GC consensus loop! This was going to be huuuuge…

Or was it? I mean, 30% seems just a little too huge. Profiling is vulnerable to the observer effect: the very act of measuring a program’s performance inevitably changes the program’s performance. And, while we’re on dubious physics analogies (I shouldn’t; I was a pretty awful physicist once I reached the relativity/QM stuff), there’s a bit of an uncertainty principle thing going on. The most invasive profilers can tell you very precisely where in your program time is spent, but you’re miles off knowing how fast you’re normally going in those parts of the program. They do this by instrumenting the program (like the current Perl 6 on MoarVM profiler does) or running it on a synthetic CPU, as Callgrind does. By contrast, a sampling profiler lets your program run pretty much as normal, and just takes regular samples of the call stack. The data is much less precise about where the program is spending its time, but it is a much better reflection of how fast the program is normally going.

So what would, say, the perf sampling profiler make of the benchmark? Turns out, well less than 1% of time was spent in the GC consensus loop in question. (Interestingly, around 5% was spent in a second GC termination consensus loop, which wasn’t the one under consideration here. That will be worth looking into in the future.) The Visual Studio sampling profiler on Windows – which uses a similar methodology – also gave a similar figure, which was somewhat reassuring also.

Also working against Callgrind is the way the valgrind suite of tools deals with multi-threaded applications – in short, by serializing all operations onto a single thread. For a consensus loop, which expects to be dealing with syncing up a number of running threads, this could make a radical difference.

Finally, I decided to check in on what The GC Handbook had to say on the matter. It turns out that it suggests pretty much the kind of consensus loop we already have in MoarVM, albeit rather simpler because it’s juggling a bit less (a simplification I’m all for us having in the future too). So, we’re not doing anything so unusual, and with suitable measurements it’s performing as expected.

So, that was an interesting but eventually fairly pointless detour. What makes this all the more embarrassing, however, is what came next. Running an example I knew to hang under GDB, I waited for it to do so, hit Ctrl + C, and started looking at all of the threads. Here’s a summary of their states:

And yes, for the sake of this being a nice example for the blog I perhaps should not have picked one with 17 threads. Anyway, we’ll cope. First up, the easy to explain stuff. All the threads that are “blocking on concurrent queue read (cond var wait)” are fairly uninteresting. They are Perl 6 thread pool threads waiting for their next task (that is, wanting to pull an item from the scheduler’s queue, and waiting for it to be non-empty).

Thread 01 is the thread that has triggered GC. It is trying to get consensus from other threads to begin GC. A number of other threads have already been interrupted and are also in the consensus loop (those marked “in AO_load_read at MVM_gc_enter_from_interrupt”). This is where I had initially suspected the problem would be. That leaves 4 other threads.

You might wonder how we cope with threads that are simply not in a position to participate in the consensus process, because they’re stuck in OS land, blocked waiting on I/O, a lock, a condition variable, a semaphore, a thread join, and so forth. The answer is that before they hand over control, they mark themselves as blocked. Another thread will steal their work if a GC happens while the thread is blocked. When the thread becomes unblocked, it marks itself as such. However, if a GC is already happening at that point, it’s not safe for the thread to proceed. Thus, it yields until GC is done. This accounts for the 3 threads described as “in mark thread unblocked; yielded”.

Which left one thread, which was trying to acquire a lock in order to peek a queue. The code looked like this:

    if (kind != MVM_reg_obj)
        MVM_exception_throw_adhoc(tc, "Can only shift objects from a ConcBlockingQueue");

    uv_mutex_lock(&cbq->locks->head_lock);

    while (MVM_load(&cbq->elems) == 0) {
        MVMROOT(tc, root, {

Spot anything missing?

Here’s the corrected version of the code:

    if (kind != MVM_reg_obj)
        MVM_exception_throw_adhoc(tc, "Can only shift objects from a ConcBlockingQueue");

    MVM_gc_mark_thread_blocked(tc);
    uv_mutex_lock(&cbq->locks->head_lock);
    MVM_gc_mark_thread_unblocked(tc);

    while (MVM_load(&cbq->elems) == 0) {
        MVMROOT(tc, root, {

Yup, it was failing to mark itself as blocked while contending for a lock, meaning the GC could not steal its work. So, the GC’s consensus algorithm wasn’t to blame after all. D’oh.

To be continued…

I actually planned to cover a second issue in this post. But, it’s already gone midnight, and perhaps that’s enough fun for one post anyway. :-) Tune in next time, for more GC trouble and another cute deadlock!


Weekly changes in and around Perl 6: 2016.34 A Quick Botch From Cluj

Published by liztormato on 2016-08-22T22:18:59

After a long trans-European journey, yours truly arrived in Cluj for the YAPC::Europe. Without a lot of time to spend on the Perl 6 Weekly. So, this time, only a small Perl 6 Weekly. Hope I didn’t miss too much!

Rakudo Compiler Release 2016.08.1

Zoffix Znet worked very hard on the Rakudo 2016.08 Compiler Release, but he botched a tag, so now we have a Rakudo 2016.08.1 Compiler Release. I’m pretty sure we will have a new Rakudo Star release based on this Compiler Release pretty soon!

More Dogfooding

It has been a while since Leon Timmermans worked on the Perl 6 version of TAP, which allows tests (such as those run with make spectest) to use Perl 6 itself to run the test files, rather than Perl 5. For the first time, this actually completed flawlessly for yours truly in the past week:

All tests successful.
Files=1118, Tests=52595, 336 wallclock secs
Result: PASS

Again a step closer to providing the whole tool-chain in Perl 6!

Core Developments

Some of the core developments of the past week that made it into the 2016.08.1 release:

Blog Posts

And the blog posts keep on coming!

Ecosystem Additions

Quite a nice selection again!

Winding Down

Time to get some sleep before all the stuff starts happening in Cluj!


Zoffix Znet: I Botched a Perl 6 Release And Now a Robot Is Taking My Job

Published on 2016-08-20T00:00:00

Automating your release cycle

Weekly changes in and around Perl 6: 2016.33 Where did we go wrong?

Published by liztormato on 2016-08-15T21:20:19

Useful, descriptive errors are becoming the industry standard, and the Perl 6 and Rust languages are leading that effort. The errors must go beyond merely telling you the line number to look at. They should point to a piece of code you wrote. They should make a guess at what you meant. They should be referencing your code, even if they originate in some third-party library you’re using.

Check out The Awesome Errors of Perl 6 by Zoffix! (see also the Reddit comments).

Other Blog Posts

Core Developments

Documentation Developments

Mostly housekeeping, gfldex++ tells me:

Ecosystem Additions

Winding Down

Quite some nice blog posts! And some nice fixes and improvements. Can’t wait to see what we will have next week!


Zoffix Znet: The Awesome Errors of Perl 6

Published on 2016-08-15T00:00:00

A show-off and explanation of Perl 6 errors.

gfldex: It’s lazy all the way down

Published by gfldex on 2016-08-14T11:00:05

On my quest to help with the completion of the docs for Perl 6 I found the task of finding undocumented methods quite cumbersome. There are currently 241 files that should contain method definitions and that number will likely grow with 6.d.

Luckily I wield the powers of Perl 6, which allow me to parse text swiftly and to introspect the built-in types.

71     my \methods := gather for types -> ($type-name, $path) {
72         take ($type-name, $path, ::($type-name).^methods(:local).grep({
73             my $name = .name;
74             # Some builtins like NQPRoutine don't support the introspection we need.
75             # We solve this the British way, complain and carry on.
76             CATCH { default { say "problematic method $name in $type-name" unless $name eq '<anon>'; False } }
77             (.package ~~ ::($type-name))
78         })».name)
79     }

We can turn a Str into a list of method objects by calling ::("YourTypeName").^methods(). The :local adverb will filter out all inherited methods, but not methods that are provided by roles and mixins. To filter roles out we can check if the .package property of a method object is the same as the type name we got the methods from.
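
As a small, self-contained illustration of that check (with Str standing in for whatever type name was pulled out of the pod6 paths):

my $type-name = 'Str';
my @local = ::($type-name).^methods(:local).grep({ .package ~~ ::($type-name) })».name;
say @local.sort[^5];    # the first few of Str's own methods, alphabetically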

For some strange reason I started to define lazy lists and didn’t stop ’till the end.

52     my \ignore = LazyLookup.new(:path($ignore));

That creates a kind of lazy Hash with a type name as a key and a list of strings of method names as values. I can go lazy here because I know the file those strings come from will never contain the same key twice. This works by overloading AT-KEY with a method that checks in a private Hash if a key is present. If not, it will read the file line by line, store any found key/value pairs in the private Hash, and stop when it finds the key AT-KEY was asked for. In the worst case it has to read the entire file to the end. If we check only one type name (the script can and should be used to do that) we may get lucky and it won't have to read past the middle of the file.
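
The post doesn't show LazyLookup itself, but a minimal sketch of the idea (hypothetical class name and file format, one "TypeName: method method ..." entry per line) could look like this:

class LazyLookupSketch does Associative {
    has $.path is required;
    has %!seen;
    has $!fh;

    method AT-KEY($key) {
        # serve from the private Hash if we already read that far
        if %!seen{$key}:exists { return %!seen{$key} }
        $!fh //= $!path.IO.open;
        # otherwise keep reading lines until the wanted key shows up
        for $!fh.lines {
            my ($k, $v) = .split(':', 2)».trim;
            %!seen{$k} = $v.words.List;
            return %!seen{$key} if $k eq $key;
        }
        Nil
    }
}

my \ignore = LazyLookupSketch.new(:path('ignore-methods'));
say ignore<Str>;    # method names to skip for Str, if any are listed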

54     my \type-pod-files := $source-path.ends-with('.pod6')

That lazy list produces IO::Path objects for every file under a directory that ends in ‘.pod6’ and descends into sub-directories in a recursive fashion. Or it contains no lazy list at all but a single IO::Path to the one file the script was asked to check. Perl 6 will turn a single value into a list with that value whenever possible, to make it easy to use listy functions without hassle.
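
A rough hand-written equivalent of such a walk (not the script's actual code; 'doc/Type' is just an assumed starting directory) might be:

sub pod6-files(IO::Path $dir) {
    gather for $dir.dir {
        if .d {
            .take for pod6-files($_);    # descend into sub-directories
        }
        elsif .extension eq 'pod6' {
            take $_;                     # yield matching files lazily
        }
    }
}

.put for pod6-files('doc/Type'.IO);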

64     my \types := gather for type-pod-files».IO {

This lazy list is generated by turning each IO::Path into a string and yanking the type name out of it. Type names can be fairly long in Perl 6, e.g. X::Method::Private::Permission.

71     my \methods := gather for types -> ($type-name, $path) {

Here we gather a list of lists of method names found via introspection.

81     my \matched-methods := gather for methods -> ($type-name, $path, @expected-methods) {

This list is the business end of the script. It uses Set operators to get the method names that are in the list created by introspection (methods that are in the type object) but neither in the pod6-file nor in the ignore-methods file. If the ignore-methods file is complete and correct, that will be a list of methods we have yet to write documentation for.
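
Boiled down, that is just two set differences (the method names below are made up for illustration):

my @introspected = <gist Str words chars>;       # from ^methods on the type object
my @documented   = <gist chars>;                 # parsed from the pod6 file
my @ignored      = <Str>;                        # from the ignore-methods file
say @introspected (-) @documented (-) @ignored;  # a Set containing only "words"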

88     for matched-methods -> ($type-name, $path, Set $missing-methods) {
89         put "Type: {$type-name}, File: ⟨{$path}⟩";
90         put $missing-methods;
91         put "";
92     };

And then at the end we have a friendly for loop that takes those missing methods and puts them on the screen.

Looking at my work I realised that the whole script wouldn't do squat without that for loop. Well, it would allocate some RAM and set up a handful of objects, just to tell the GC to gobble them all up. Also there is a nice regularity to those lazy lists. They take the content of the previous list and use destructuring to stick some values into input variables that we can nicely name. Then they declare and fill a few output variables, again with proper, descriptive names, and return a list of those. Ready to be destructured in the next lazy list. I can use the same output names as input names in the following list, which makes for good copypasta.

While testing the script and fixing a few bugs I found that any mistake I made that triggered on any pod6-file terminated the program faster than I could start it. I got the error message I needed to fix the problem as early as possible. The same is true for bugs that trigger only on some files. Walking a directory tree is not impressively fast yet, as Rakudo creates a lot of objects based on c-strings that drop out of libc, just to turn them right back into c-strings when you ask for the content of another directory or open a file. No big deal for 241 files, really, but I did have the pleasure to $work with an archive of 2.5 million files before. I couldn't possibly drink all that coffee.

I like to ride the bicycle and as such could be dead tomorrow. I better keep programming in a lazy fashion so I get done as much as humanly possible.

EDIT: The docs were wrong about how gather/take behaves at the time of the writing of this blog post. They are now corrected, which should lead to less confusion.


gfldex: Sneaking into a loop

Published by gfldex on 2016-08-09T23:03:24

Zoffix answered a question about Perl 5's <> operator (with a few steps in-between, as it happens on IRC) with a one-liner.

slurp.words.Bag.sort(-*.value).fmt("%10s => %3d\n").say;

While looking at how this actually works I stepped on a few idioms and a hole. But first let's look at how it works.

The sub slurp will read the whole “file” from STDIN and return a Str. The method Str::words will split the string into a list by some Unicode meaning of “word”. Coercing the list into a Bag creates a counting Hash and is a shortcut for the following expression.

my %h; %h{$_}++ for <peter paul marry>; dd %h
# OUTPUT«Hash %h = {:marry(1), :paul(1), :peter(1)}␤»

Calling .sort(-*.value) on an Associative will sort by values, descending, and return an ordered list of Pairs. List::fmt will call Pair::fmt, which formats the .key and the .value according to the given format string. Finally, say will join with a space and output to STDOUT. The last step is a bit wrong because there will be an extra space in front of each line but the first.

slurp.words.Bag.sort(-*.value).fmt("%10s => %3d").join("\n").say;

Joining by hand is better. That's a lot for a fairly short one-liner. There is a problem though that I tripped over before. The format string is using %10s to indent and right-align the first column, which works nicely unless the values won't fit anymore. So we would need to find the longest word and use .chars to get the column width.

The first thing to solve the problem was a hack. Often you want to write a small CLI tool that reads stuff from STDIN, and that requires a test. There is currently no way to have an IO::Handle that reads from a Str (there is an incomplete module for writing). We don't really need a whole class in that case because slurp will simply call .slurp-rest on $*IN.

$*IN = <peter paul marry peter paul paul> but role { method slurp-rest { self.Str } };

It’s a hack because it will fail on any form of type check and it won't work for anything but slurp. Also we actually untie STDIN from $*IN. Don't try this at homework.

Now we can happily slurp and start counting.

my %counted-words = slurp.words.Bag;
my $word-width = [max] %counted-words.keys».chars;

And continue the chain where we broke it apart.

%counted-words.sort(-*.value).fmt("%{$word-width}s => %3d").join("\n").say;

Solved but ugly. We broke a one-liner apart‽ Let’s fix fmt to have it whole again.

What we want is a method fmt that takes a Positional, a printf-style format string and a block per %* in the format string. Also we may need a separator to forward to self.fmt.

8 my multi method fmt(Positional:D: $fmt-str, *@width where *.all ~~ Callable, :$separator = " "){
9     self.fmt(
10         $fmt-str.subst(:g, "%*", {
11             my &width = @width[$++] // Failure.new("missing width block");
12             '%' ~ (&width.count == 2 ?? width(self, $_) !! width(self))
13         }), $separator);
14 }

The expression *.all ~~ Callable simply checks if all elements in the slurpy array implement CALL-ME (that's the real method that is executed when you do foo()).
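
A quick check of that constraint in action (not from the original post):

say so (&say, { 42 }, *.uc).all ~~ Callable;    # True: Sub, Block and WhateverCode all do Callable
say so (&say, 42).all ~~ Callable;              # False: a plain Int is not Callable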

We then use subst on the format string to replace %*, whereby the replacement is a (closure) block that is called once per match. And here we have the nice idiom I stepped on.

say "1-a 2-b 3-c".subst(:g, /\d/, {<one two three>[$++]});
# OUTPUT«one-a two-b three-c␤»

The anonymous state variable $ is counting up by one from 0 for every execution of the block. What we actually do here is removing a loop by sneaking an extra counter and an array subscript into the loop subst must have. Or we could say that we inject an iterator pull into the loop inside subst. One could argue that subst should accept a Seq as its 2nd positional, which would make such a hack redundant. Anyway, we got the hole plugged.

In line 11 we take one element out of the slurpy array or create a Failure if there is no element. We store the block in a variable because we want to introspect it in line 12. If the block takes two positionals, we feed the topic that subst is calling the block with as a 2nd parameter to our stored block. That happens to be a Match and may be useful to react to what was matched. In our case we know that we matched on %* and the current position is counted by $++ anyway. With that done we get a format string augmented with a column width provided by the user of our version of fmt.

The user supplied block is called with a list of Pairs. We have to go one level deeper to get the biggest key.

{[max] .values».keys».chars}

That's what we have to supply to get the first column's width, in a list of Pairs dropping out of Bag.sort.

print %counted-words.sort(-*.value).&fmt("%*s => %3d", {[max] .values».keys».chars}, separator => "\n");

The fancy .&fmt call is required because our free-floating method is not a method of List. That spares us from augmenting List.

And there you have it. Another hole in CORE plugged.


Weekly changes in and around Perl 6: 2016.32 A Quiet Week

Published by liztormato on 2016-08-08T19:57:33

Feels like many people are enjoying the nice weather. So a bit of a quiet week. Feels a lot like the quiet before the storm, though! 🙂

Core Developments

Documentation Developments

gfldex reports again from the documentation effort:

Blog Posts

Ecosystem Additions

Winding down

Only 2 more weeks or so until the YAPC::Europe. The tension is mounting! 🙂


Zoffix Znet: Perl 6 Core Hacking: Where's Da Sauce, Boss?

Published on 2016-08-04T00:00:00

Locating the source code for specific core methods and subs

Weekly changes in and around Perl 6: 2016.31 An End Of An Era

Published by liztormato on 2016-08-01T21:11:17

Because the project has not seen a release since mid-February, and there have been no commits since then, it was generally felt that the support for Parrot in NQP has become a maintenance burden, especially in the light of the development of the Javascript backend. So all code related to Parrot has now been removed from the NQP codebase (as it was removed from the Rakudo codebase about a year ago).

It should be noted that without Parrot, Rakudo Perl 6 would not exist today. It is one of the giants on whose shoulders Rakudo Perl 6 is standing. So let me again express gratitude to the multitude of developers who have worked on the Parrot project. Even though it didn’t turn out the way it was intended, it was definitely not for nothing!

Alpine Perl Workshop

The First Alpine Perl Workshop (a cooperation of the Austrian and Swiss Perl Mongers) will be held in Innsbruck, Austria on 2 and 3 September. There is still time to submit a Perl 6 talk! If you couldn’t make it to the YAPC::Europe this year, or you have some extra time to spend after coming back from the YAPC::Europe, Innsbruck and the Alpine Perl Workshop are definitely worth looking into!

Other Core Developments

Documentation Developments

gfldex++ reports from the Documentation Dept:

Blog Posts

A nice harvest, with some blog posts that I missed in the past weeks.

Ecosystem Additions

A nice harvest as well!

Winding Down

See you next week!


Zoffix Znet: Hacking on The Rakudo Perl 6 Compiler: Mix Your Fix

Published on 2016-08-01T00:00:00

An example of fixing a bug in Rakudo

gfldex: Walk on the flats

Published by gfldex on 2016-07-31T22:18:36

In #perl6 we are asked on a regular basis how to flatten deep nested lists. The naive approach of calling flat doesn’t do what beginners wish it to do.

my @a = <Peter Paul> Z <70 42> Z <green yellow>;

dd @a;
# Array @a = [("Peter", IntStr.new(70, "70"), "green"), ("Paul", IntStr.new(42, "42"), "yellow")]

dd @a.flat.flat;
# ($("Peter", IntStr.new(70, "70"), "green"), $("Paul", IntStr.new(42, "42"), "yellow")).Seq

The reason why @a is so resistant to flattening is that flat will stop on the first item, which happens to be a list in the example above. We can show that with dd.

for @a { .&dd }
# List @a = $("Peter", IntStr.new(70, "70"), "green")
# List @a = $("Paul", IntStr.new(42, "42"), "yellow")

The reason why this easy thing isn’t easy is that you are not meant to do it. Flattening lists may work well with short simple list literals but will become pricey on lazy lists, supplies or 2TB of BIG DATA. Leave the data as it is and use destructuring to untangle nested lists.
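
For example, the @a from above can be taken apart right in the loop signature, no flattening required:

for @a -> ($name, $age, $colour) {
    say "$name is $age and likes $colour";
}
# Peter is 70 and likes green
# Paul is 42 and likes yellow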

If you must iterate in a flattening fashion over your data, use a lazy iterator.

sub descend(Any:D \l){ gather l.deepmap: { .take } }

dd descend @a;
# ("Peter", IntStr.new(70, "70"), "green", "Paul", IntStr.new(42, "42"), "yellow").Seq

By maintaining the structure of your data you can reuse that structure later, which includes changes in half a year's time.

put @a.map({ '<tr>' ~ .&descend.map({"<td>{.Str}</td>"}).join ~ "</tr>" }).join("\n");
# <tr><td>Peter</td><td>70</td><td>green</td></tr>
# <tr><td>Paul</td><td>42</td><td>yellow</td></tr>

There are quite a few constructs that will require you to flatten to feed the result to operators and loops. If you concatenate list-alikes like Ranges or Seqs returned by metaops, flat will come in handy.

"BenGoldberg++ for this pretty example".trans( [ flat "A".."Z", "a".."z"] => [ flat "𝓐".."𝓩", "𝓪".."𝔃" ] ).put
# OUTPUT«𝓑𝓮𝓷𝓖𝓸𝓵𝓭𝓫𝓮𝓻𝓰++ 𝓯𝓸𝓻 𝓽𝓱𝓲𝓼 𝓹𝓻𝓮𝓽𝓽𝔂 𝓮𝔁𝓪𝓶𝓹𝓵𝓮␤»

gfldex: Nil statement considered harmful

Published by gfldex on 2016-07-24T17:35:24

Nil is the little parent of Failure and represents the absence of a value. It’s the John Doe of the Perl 6 world. Its undefined nature means we can talk about it but we don’t know its value. If we try to coerce it to a presentable value we will be warned.

my $a := Nil; dd $a.Str, $a.Int;
# OUTPUT«Use of Nil in string context  in block <unit> at <tmp> line 1␤Use of Nil in numeric context  in block <unit> at <tmp> line 1␤""0␤»

As the output shows, it still coerces to the closest thing we have for an undefined string or number. Sometimes the empty string is outright dangerous.

sub somesub { Nil };
my $basename = somesub;
spurt("$basename.html", "<body>oi!</body>");

If we do that in a loop we would drop plenty of “.html” files into the filesystem. Since this can depend on input data, some cunning individual might take advantage of our neglect. We can’t test for Nil in $basename, because assignment of Nil reverts the container to its default value. The default default value for the default type is Any. We can protect ourselves against undefined values with a :D type smiley.

my Str:D $basename = somesub;

That would produce a runtime error for anything but Nil, because the default value for a container of type Str:D is Str:D. That is a type object that happens to be undefined and won't turn into anything other than the empty string. Not healthy when used with filenames.

We still get the warning though, which means that warn is called. As it happens, warn will raise a control exception, in this instance one of type CX::Warn. We can catch that with a CONTROL block and forward it to die.

sub niler {Nil};
my Str $a = niler();
say("$a.html", "sometext");
say "alive"; # this line is dead code
CONTROL { .die };

That's quite a good solution to handle interpolation problems stemming from undefined values. The fact that any module or native function could produce undefined values makes it hard to reason about our programs. Having the control exception allows us to catch such problems anywhere in the current routine and allows us to deal with more than one place where we interpolate in one go.

Sometimes we want to stop those values early because between an assignment of an undefined value and the output of results minutes or even hours can pass by. Fail loudly and fail early, they rightfully say. Type smileys can help us there but for Nil it left me with a nagging feeling. So I nagged and requested judgement, skids kindly provided a patch and judgement was spoken.

Perl 6 will be safe and sound again.


6guts: Assorted fixes

Published by jnthnwrthngtn on 2016-07-23T17:36:04

I’ve had a post in the works for a while about my work to make return faster (as well as routines that don’t return), as well as some notable multi-dispatch performance improvements. While I get over my writer’s block on that, here’s a shorter post on a number of small fixes I did on Thursday this week.

I’m actually a little bit “between things” at the moment. After some recent performance work, my next focus will be on concurrency stability fixes and improvements, especially to hyper and race. However, a little down on sleep thanks to the darned warm summer weather, I figured I’d spend a day picking a bunch of slightly less demanding bugs off from the RT queue. Some days, it’s about knowing what you shouldn’t work on…

A nasty string bug

MoarVM is somewhat lazy about a number of string operations. If you ask it to concatenate two simple strings, it will produce a string consisting of a strand list, with two strands pointing to the two strings. Similarly, a substring operation will produce a string with one strand and an offset into the original, and a repetition (using the x operator) will just produce a string with one strand pointing to the original string and having a repetition count. Note that it doesn’t currently go so far as allowing trees of strand strings, but it’s enough to prevent a bunch of copying – or at least delay it until a bunch of it can be done together and more cheaply.

The reason not to implement such cleverness is, of course, that it’s a whole lot more complex than simple immutable strings. And both RT #123602 and RT #127782 were about a sequence of actions that could trigger a bug. The precise sequence of actions was a repeat, followed by a concatenation, followed by a substring with certain offsets. It was caused by an off-by-one involving the repetition optimization, which was a little tedious to find but easy to fix.

Constant folding Seqs is naughty

RT #127749 stumbled across a case where an operation in a loop would work fine if its input was variable (like ^$n X ^$n), but fail if it were constant (such as ^5 X ^5). The X operator returns a Seq, which is an iterator that produces values once, throwing them away. Thus iterating it a second time won’t go well. The constant folding optimization is used so that things like 2 + 2 will be compiled into 4 (silly in this case, but more valuable if you’re doing things with constants). However, given the 1-shot nature of a Seq, it’s not suitable for constant folding. So, now it’s disallowed.
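
The one-shot nature is easy to see in isolation (an illustration, not the code from the ticket):

my \pairs = ^2 X ^2;    # X returns a Seq, not a List
.say for pairs;         # the first pass works fine
.say for pairs;         # the second one dies: the Seq has already been consumed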

We are anonymous

RT #127540 complained that an anon sub whose name happened to match that of an existing named sub in the same scope would trigger a bogus redeclaration error. Wait, you ask. Anonymous sub…whose name?! Well, it turns out that what anon really means is that we don’t install it anywhere. It can have a name that it knows itself by, however, which is useful should it show up in a backtrace, for example. The bogus error was easily fixed up.
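
A tiny illustration of the distinction (not from the original post):

my &f = anon sub helper() { die 'oops' };
say &f.name;    # helper – the sub knows its own name
f();            # the backtrace mentions "sub helper", yet nothing called
                # helper was ever installed in the surrounding scope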

Charset, :ignoremark, :global, boom

Yes, it’s the obligatory “dive into the regex compiler” that all bug fixing days seem to come with. RT #128270 mentioned that "a" ~~ m:g:ignoremark/<[á]>/ would whine about chr being fed a negative codepoint. Digging into the MoarVM bytecode this compiled into was pretty easy, as chr only showed up one time, so the culprit had to be close to that. It turned out to be a failure to cope with end of string, and as regex bugs go wasn’t so hard to fix.

Hang, crash, wallop

This is one of those no impact on real code, but sorta embarrassing bugs. A (;) would cause an infinite loop of errors, and (;;) and [0;] would emit similar errors also. The hang was caused by a loop that did next but failed to consider that the iteration variable needed updating in the optimizer. The second was because of constructing bad AST with integers hanging around in it rather than AST nodes, which confused all kinds of things. And that was RT #127473.

Improving an underwhelming error

RT #128581 pointed out that my Array[Numerix] $x spat out an error that fell rather short of the standards we aim for in Perl 6. Of course, the error should complain that Numerix isn’t known and suggest that maybe you wanted Numeric. Instead, it spat out this:

===SORRY!=== Error while compiling ./x.pl6
An exception occurred while parameterizing Array
at ./x.pl6:1
Exception details:
  ===SORRY!=== Error while compiling
  Cannot invoke this object (REPR: Null; VMNull)
  at :

Which is ugly. The line number was at least correct, but still… Anyway, a small tweak later, it produced the much better:

$ ./perl6-m -e 'my Array[Numerix] $x;'
===SORRY!=== Error while compiling -e
Undeclared name:
    Numerix used at line 1. Did you mean 'Numeric'?

Problems mixing unit sub MAIN with where constraints

RT #127785 observed that using a unit sub MAIN – which takes the entire program body as the contents of the MAIN subroutine – seemed to run into trouble if the signature contained a where clause:

% perl6 -e 'unit sub MAIN ($x where { $^x > 1 } );  say "big"'  4
===SORRY!===
Expression needs parens to avoid gobbling block
at -e:1
------> unit sub MAIN ($x where { $^x > 1 }⏏ );  say "big"
Missing block (apparently claimed by expression)
at -e:1
------> unit sub MAIN ($x where { $^x > 1 } );⏏  say "big"

The error here is clearly bogus. Finding a way to get rid of it wasn’t too hard, and it’s what I ended up committing. I’ll admit that I’m not sure why the check involved was put there in the first place, however. After some playing around with other situations that it might have aided, I failed to find any. There were also no spectests that depended on it. So, off it went.

$?MODULE

The author of RT #128552 noticed that the docs talked about $?MODULE (“what module am I currently in”), to go with $?PACKAGE and $?CLASS. However, trying it out led to an undeclared variable error. It seems to have been simply overlooked. It was easy to add, so that’s what I did. I also found some old, provisional tests and brought them up to date in order to cover it.

Subtypes, definedness types, and parameters

The submitter of RT #127394 was creative enough to try -> SomeSubtype:D $x { }. That is, take a subset type and stick a :D on it, which adds the additional constraint that the value must be defined. This didn’t go too well, resulting in some rather strange errors. It turns out that, while picking the type apart so we can code-gen the parameter binding, we failed to consider such interesting cases. Thankfully, a small refactor made the fix easy once I’d figured out what was happening.
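
With the fix in place, a signature like the one from the ticket should just bind as expected (made-up subset for illustration):

subset Port of Int where 1..65535;
my &serve = -> Port:D $p { say "serving on $p" };
serve(8080);    # serving on 8080
serve(Int);     # dies with a type check failure, as it should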

1 day, 10 RTs

Not bad going. Nothing earth-shatteringly exciting, but all things that somebody had run into – and so others would surely run into again in the future. And, while I’ll be getting back to the bigger, hairier things soon, spending a day making Perl 6 a little nicer in 10 different ways was pretty fun.


rakudo.org: Announce: Rakudo Star Release 2016.07

Published by Steve Mynott on 2016-07-22T10:41:46

On behalf of the Rakudo and Perl 6 development teams, I’m pleased to announce the July 2016 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the July 2016 release is available from http://rakudo.org/downloads/star/.

This is the third post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”. This Star release includes release 2016.07 of the Rakudo Perl 6 compiler, version 2016.07 of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

Some of the new compiler features since the last Rakudo Star release include:

Compiler maintenance since the last Rakudo Star release includes:

Notable changes in modules shipped with Rakudo Star:

perl6intro.pdf has also been updated.

There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at .

See http://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the mailing list, or join us on IRC #perl6 on freenode.

gfldex: Sneaky methods

Published by gfldex on 2016-07-20T01:16:13

As one would expect methods can be declared and defined inside a class definition. Not so expected and even less documented are free floating methods declared with my method. Now why would you want:

my method foo(SomeClass:D:){self}

The obvious answer is the Meta Object Protocol's add_method, as can be found in Rakudo:

src/core/Bool.pm
32:    Bool.^add_method('pred',  my method pred() { Bool::False });
33:    Bool.^add_method('succ',  my method succ() { Bool::True });
35:    Bool.^add_method('enums', my method enums() { self.^enum_values });

There is another, more sneaky use for such a method. You may want to have a look at what is going on in a chain of method calls. We could rip the expression apart and insert a one shot variable, do our debugging output, and continue in the chain. Good names are important and wasting them on one shot variables is unnecessary cognitive load.

<a b c>.&(my method ::(List:D){dd self; self}).say;
# OUTPUT«("a", "b", "c")␤(a b c)␤»

We can’t have no name without an explicit invocant, because Perl 6 won't let us, so we use the empty scope :: to make the parser happy. With a proper invocant, we would not need that. Also, the anonymous method is not a member of List. We need to use postfix .& to call it. If we need that method more than once we could pull it out and give it a name.

my multi method debug(List:D:){dd self; self};
<a b c>.&debug.say;
# OUTPUT«("a", "b", "c")␤(a b c)␤»

Or we assign it as a default argument if we want to allow callbacks.

sub f(@l, :&debug = my method (List:D:){self}) { @l.&debug.say };
f <a b c>, debug => my method ::(List:D){dd self; self};
# OUTPUT«("a", "b", "c")␤(a b c)␤»

In Perl 6 pretty much everything is a class, including methods. If it’s a class it can be an object and we can sneak those in wherever we like.


Pawel bbkr Pabian: Comprehensive guide and tools to split monolithic database into shards using Perl

Published by Pawel bbkr Pabian on 2016-06-21T04:47:32

You can find the most recent version of this tutorial here.

Intro

When you suddenly get this brilliant idea, the revolutionary game-changer, all you want to do is to immediately hack some proof of concept to start a small project flame from a spark of creativity. So I'll leave you alone for now, with your third mug of coffee and birds chirping first morning songs outside of the window...

...A few years later we meet again. Your proof of concept has grown into a mature, recognizable product. Congratulations! But why the sad face? Your clients are complaining that your product is slow and unresponsive? They want more features? They generate more data? And you cannot do anything about it, despite the fact that you bought the most shiny, expensive database server that is available?

When you were hacking your project on day 0, you were not thinking about long-term scalability. All you wanted to do was to create a working prototype as fast as possible. So a single database design was the easiest, fastest and most obvious to use. You didn't think back then that a single machine cannot be scaled up infinitely. And now it is already too late.

LOOKS LIKE YOU'RE STUCK ON THE SHORES OF [monolithic database design] HELL. THE ONLY WAY OUT IS THROUGH...

(DOOM quote)

Sharding to the rescue!

Sharding is the process of distributing your clients' data across multiple databases (called shards). By doing so you will be able to:

But if you already have single (monolithic) database this process is like converting your motorcycle into a car... while riding.

Outline

This is a step-by-step guide to a very tricky process. And the worst thing you can do is to panic because your product is collapsing under its own weight and you have a lot of pressure from clients. The whole process may take weeks, even months. It will use a significant amount of human and time resources. And it will pause new feature development. Be prepared for that. And do not rush to the next stage until you are absolutely sure the current one is completed.

So what is the plan?

Understand your data

In monolithic database design data classification is irrelevant but it is the most crucial part of sharding design. Your tables can be divided into three groups: client, context and neutral.

Let's assume your product is car rental software and do a quick exercise:

  +----------+      +------------+      +-----------+
  | clients  |      | cities     |      | countries |
  +----------+      +------------+      +-----------+
+-| id       |   +--| id         |   +--| id        |
| | city_id  |>--+  | country_id |>--+  | name      |
| | login    |   |  | name       |      +-----------+
| | password |   |  +------------+
| +----------+   |
|                +------+
+--------------------+  |
                     |  |     +-------+
  +---------------+  |  |     | cars  |
  | rentals       |  |  |     +-------+
  +---------------+  |  |  +--| id    |
+-| id            |  |  |  |  | vin   |
| | client_id     |>-+  |  |  | brand |
| | start_city_id |>----+  |  | model |
| | start_date    |     |  |  +-------+
| | end_city_id   |>----+  |
| | end_date      |        |
| | car_id        |>-------+   +--------------------+
| | cost          |            | anti_fraud_systems |
| +---------------+            +--------------------+
|                              | id                 |--+
|      +-----------+           | name               |  |
|      | tracks    |           +--------------------+  |
|      +-----------+                                   |
+-----<| rental_id |                                   |
       | latitude  |     +--------------------------+  |
       | longitude |     | blacklisted_credit_cards |  |
       | timestamp |     +--------------------------+  |
       +-----------+     | anti_fraud_system_id     |>-+
                         | number                   |
                         +--------------------------+

Client tables

They contain data owned by your clients. To find them, you must start in some root table - clients in our example. Then follow every parent-to-child relation (only in this direction) as deep as you can. In this case we descend into rentals and then from rentals further to tracks. So our client tables are: clients, rentals and tracks.

A single client owns a subset of rows from those tables, and those rows will always be moved together in a single transaction between shards.

Context tables

They put your clients' data in context. To find them, follow every child-to-parent relation (only in this direction) from every client table as shallow as you can. Skip tables that are already classified. In this case we ascend from clients to cities and from cities further to countries. Then from rentals we can ascend to clients (already classified), cities (already classified) and cars. And from tracks we can ascend into rentals (already classified). So our context tables are: cities, countries and cars.

Context tables should be synchronized across all shards.

Neutral tables

Everything else. They must not be reachable from any client or context table through any relation. However, there may be relations between them. So our neutral tables are: anti_fraud_systems and blacklisted_credit_cards.

Neutral tables should be moved outside of shards.
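
If your schema is large, the classification itself can be scripted. A toy Perl 6 sketch of the two traversal rules, with the relations of the example schema hard-coded (a hypothetical representation, not part of the tutorial's tooling):

# adjacency lists for the example schema
my %children = clients => <rentals>, rentals => <tracks>;
my %parents  = clients => <cities>,  cities  => <countries>,
               rentals => <clients cities cars>, tracks => <rentals>;

sub reachable(%edges, *@start) {
    my @todo = @start;
    my %seen;
    while @todo {
        my $table = @todo.shift;
        next if %seen{$table}++;
        @todo.append(%edges{$table} // ());
    }
    %seen.keys
}

my @client-tables  = reachable(%children, 'clients');    # follow parent-to-child from the root
my @context-tables = reachable(%parents, @client-tables).grep({ $_ ∉ @client-tables });
say @client-tables.sort;     # (clients rentals tracks)
say @context-tables.sort;    # (cars cities countries)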

Checkpoint

Take any tool that can visualize your database in the form of a diagram. Print it and pin it on the wall. Then take markers in 3 different colors - one for each table type - and start marking tables in your schema.

If you have some tables not connected due to technical reasons (for example MySQL partitioned tables or TokuDB tables do not support foreign keys), draw this relation and assume it is there.

If you are not certain about specific table, leave it unmarked for now.

Done? Good :)

Q&A

Q: Is it a good idea to cut all relations between client and context tables, so that only two types - client and neutral - remain?

A: You will save a bit of work because no synchronization of context data across all shards will be required. But at the same time any analytics will be nearly impossible. For example, even simple task to find which car was rented the most times will require software script to do the join. Also there won't be any protection against software bugs, for example it will be possible to rent a car that does not even exist.

There are two cases when converting a context table to a neutral table is justified:

In every other case it is a very bad idea to make neutral data out of context data.

Q: Is it a good idea to shard only big tables and leave all small tables together on a monolithic database?

A: In our example you have one puffy table - tracks. It keeps the GPS trail of every car rental and will grow really fast. So if you only shard this data, you will save a lot of work because there will be only small application changes required. But in the real world you will have 100 puffy tables, and that means 100 places in application logic where you have to juggle database handles to locate all client data. That also means you won't be able to split your clients between many data centers. Also you won't be able to reduce downtime costs to 1/nth of the amount of shards if some data corruption in the monolithic database occurs and recovery is required. And the analytics argument mentioned above also applies here.

It is a bad idea to do such sub-sharding. It may seem easy and fast - but the sooner you do proper sharding that includes all of your clients' data, the better.

Fix your monolithic database

There are a few design patterns that are perfectly fine or acceptable in monolithic database design but are a no-go in sharding.

Lack of foreign key

Aside from the obvious risk of referencing nonexistent records, this issue can leave junk behind when you migrate clients between shards later for load balancing. The fix is simple - add a foreign key if there should be one.

The only exception is when it cannot be added due to technical limitations, such as usage of TokuDB or partitioned MySQL tables that simply do not support foreign keys. Skip those, I'll tell you how to deal with them during data migration later.

Direct connection between clients

Because clients may be located on different shards, their rows may not point at each other. A typical case where it happens is affiliation.

+-----------------------+
| clients               |
+-----------------------+
| id                    |------+
| login                 |      |
| password              |      |
| referred_by_client_id |>-----+
+-----------------------+

To fix this issue you must remove the foreign key and rely on software instead to match those records.

Nested connection between clients

Because clients may be located on different shards, their rows may not reference another client (even indirectly). A typical case where it happens is a post-and-comment discussion.

  +----------+        +------------+
  | clients  |        | blog_posts |
  +----------+        +------------+
+-| id       |---+    | id         |---+
| | login    |   +---<| client_id  |   |
| | password |        | text       |   |
| +----------+        +------------+   |
|                                      |
|    +--------------+                  |
|    | comments     |                  |
|    +--------------+                  |
|    | blog_post_id |>-----------------+
+---<| client_id    |
     | text         |
     +--------------+

The first client posted something and a second client commented on it. This comment references two clients at the same time - the second one directly and the first one indirectly through the blog_posts table. That means it will be impossible to satisfy both foreign keys in the comments table if those clients are not in a single database.

To fix this you must choose which relation from the table that refers to multiple clients is more important, remove the other foreign keys and rely on software instead to match those records.

So in our example you may decide that relation between comments and blog_posts remains, relation between comments and clients is removed and you will use application logic to find which client wrote which comment.

Accidental connection between clients

This is the same issue as nested connection but caused by application errors instead of intentional design.

                    +----------+
                    | clients  |
                    +----------+
+-------------------| id       |--------------------+
|                   | login    |                    |
|                   | password |                    |
|                   +----------+                    |
|                                                   |
|  +-----------------+        +------------------+  |
|  | blog_categories |        | blog_posts       |  |
|  +-----------------+        +------------------+  |
|  | id              |----+   | id               |  |
+-<| client_id       |    |   | client_id        |>-+
   | name            |    +--<| blog_category_id |
   +-----------------+        | text             |
                              +------------------+

For example, the first client defined his own blog categories for his own blog posts. But let's say there was a mess with www sessions or some caching mechanism and a blog post of the second client was accidentally assigned to a category defined by the first client.

Those issues are very hard to find, because schema itself is perfectly fine and only data is damaged.

Not reachable clients data

Client tables must be reached exclusively by descending from the root table through parent-to-child relations.

              +----------+
              | clients  |
              +----------+
+-------------| id       |
|             | login    |
|             | password |
|             +----------+
|
|  +-----------+        +------------+
|  | photos    |        | albums     |
|  +-----------+        +------------+
|  | id        |    +---| id         |
+-<| client_id |    |   | name       |
   | album_id  |>---+   | created_at |
   | file      |        +------------+
   +-----------+

So we have photo management software this time, and when a client synchronizes photos from a camera, a new album is created automatically for better import visualization. This is an obvious issue even in a monolithic database - when all photos are removed from an album, it becomes a zombie row. It won't be deleted automatically by cascade and cannot be matched with a client anymore. In sharding, this also causes misclassification of a client table as a context table.

To fix this issue a foreign key should be added from albums to clients. This may also fix classification for some tables below albums, if any.

Polymorphic data

A table cannot be classified as two types at the same time.

              +----------+
              | clients  |
              +----------+
+-------------| id       |-------------+
|             | login    |             |
|             | password |             |
|             +----------+         (nullable)
|                                      |
|  +-----------+        +-----------+  |
|  | blogs     |        | skins     |  |
|  +-----------+        +-----------+  |
|  | id        |    +---| id        |  |
+-<| client_id |    |   | client_id |>-+
   | skin_id   |>---+   | color     |
   | title     |        +-----------+
   +-----------+

In this product a client can choose a predefined skin for his blog. But he can also define his own skin color and use it as well.

Here a single interface, the skins table, is used to access data of both client and context type. A lot of "let's allow the client to customize that" features end up implemented this way. While being a smart hack - with no table schema duplication and only a simple WHERE client_id IS NULL OR client_id = 123 added to the query to present both public and private skins for a client - this may cause a lot of trouble in sharding.

The fix is to go with a dual foreign key design and separate tables. Create a constraint (or trigger) that will protect against assigning a blog to a public and a private skin at the same time. And write a more complicated query to get the blog skin color.

              +----------+
              | clients  |
              +----------+
+-------------| id       |
|             | login    |
|             | password |
|             +----------+
|
|   +---------------+        +--------------+
|   | private_skins |        | public_skins |
|   +---------------+        +--------------+
|   | id            |--+  +--| id           |
+--<| client_id     |  |  |  | color        |
|   | color         |  |  |  +--------------+
|   +---------------+  |  |    
|                      |  |
|                   (nullable)
|                      |  |
|                      |  +------+
|                      +-----+   |
|                            |   |
|       +-----------------+  |   |
|       | blogs           |  |   |
|       +-----------------+  |   |
|       | id              |  |   | 
+------<| client_id       |  |   |
        | private_skin_id |>-+   |
        | public_skin_id  |>-----+
        | title           |
        +-----------------+

However - this fix is optional. I'll show you how to deal with maintaining mixed data types in the chapter about mutually exclusive IDs. It will be up to you to decide if you want less refactoring but more complicated synchronization.

Beware! Such a fix may also accidentally cause another issue, described below.

Opaque uniqueness (a.k.a. horse riddle)

Every client table without a unique constraint must be reachable by a not-nullable path of parent-to-child relations, or by at most a single nullable path of parent-to-child relations. This is a very tricky issue which may cause data loss or duplication during client migration to a database shard.

              +----------+
              | clients  |
              +----------+
+-------------| id       |-------------+
|             | login    |             |
|             | password |             |
|             +----------+             |
|                                      |
|  +-----------+        +-----------+  |
|  | time      |        | distance  |  |
|  +-----------+        +-----------+  |
|  | id        |--+  +--| id        |  |
+-<| client_id |  |  |  | client_id |>-+
   | amount    |  |  |  | amount    |
   +-----------+  |  |  +-----------+
                  |  |
               (nullable)
                  |  |
                  |  |
         +--------+  +---------+
         |                     |
         |   +-------------+   |
         |   | parts       |   |
         |   +-------------+   |
         +--<| time_id     |   |
             | distance_id |>--+
             | name        |
             +-------------+

This time our product is an application that helps you with your car maintenance schedule. Our client's car has 4 tires that must be replaced after 10 years or 100000km and 4 spark plugs that must be replaced after 100000km. So 4 indistinguishable rows for the tires are added to the parts table (they reference both time and distance) and 4 indistinguishable rows are added for the spark plugs (they reference only distance).

Now, to migrate the client to a shard, we have to find which rows from the parts table he owns. By following relations through the time table we get 4 tires. But because this path is nullable at some point, we cannot be sure we found all records. And indeed, by following relations through the distance table we find 4 tires and 4 spark plugs. Since this path is also nullable at some point, we again cannot be sure we found all records. So we must combine the results from the time and distance paths, which gives us... 8 tires and 4 spark plugs? Well, that looks wrong. Maybe let's group by the (time, distance) pair, which gives us... 1 tire and 1 spark plug? So depending on how you combine indistinguishable rows from many nullable paths into the final row set, you may suffer either data duplication or data loss.

You may say: Hey, that's easy - just select all rows through the time path, then all rows from the distance path that do not have a time_id, then union both results. Unfortunately, paths may be nullable somewhere earlier, and several nullable paths may lead to a single table, which produces bizarre logic to assemble the set of indistinguishable rows properly.

To solve this issue make sure there is at least one non-nullable path that leads to every client table (it does not matter how many tables it goes through). In our example an extra foreign key should be added between clients and parts, as sketched below.
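
A minimal sketch of that extra, non-nullable path (the client_id column name is an assumption, chosen to match the other diagrams):

-- on a populated table, add the column as NULL first, backfill it, then make it NOT NULL
ALTER TABLE parts
    ADD COLUMN client_id BIGINT UNSIGNED NOT NULL,
    ADD CONSTRAINT fk_parts_client FOREIGN KEY (client_id) REFERENCES clients (id);

-- migration can now collect a client's parts rows unambiguously
SELECT * FROM parts WHERE client_id = 1;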

TL;DR

Q: How many legs does the horse have?

A1: Eight. Two front, two rear, two left and two right.

A2: Four. Those attached to it.

Foreign key to not unique rows

MySQL specific issue.

CREATE TABLE `foo` (
  `id` int(10) unsigned DEFAULT NULL,
  KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `bar` (
  `foo_id` int(10) unsigned NOT NULL,
  KEY `foo_id` (`foo_id`),
  CONSTRAINT `bar_ibfk_1` FOREIGN KEY (`foo_id`) REFERENCES `foo` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

mysql> INSERT INTO `foo` (`id`) VALUES (1);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO `foo` (`id`) VALUES (1);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `bar` (`foo_id`) VALUES (1);
Query OK, 1 row affected (0.01 sec)

Which row from the foo table is referenced by the row in the bar table?

You don't know, because the behavior of the foreign key constraint is defined as "is there any parent I can refer to?" instead of "do I have exactly one parent?". There are no direct row-to-row references as in other databases. And it's not a bug, it's a feature.

Of course this causes a lot of weird bugs when trying to locate all rows that belong to a given client, because results can be duplicated on JOINs.

To fix this issue just make sure every referenced column (or set of columns) is unique: it must not be nullable and must be covered by a primary or unique key.
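
For example, the foo table from above could be redefined like this (a sketch; now every row in bar points at exactly one row in foo):

CREATE TABLE `foo` (
  `id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;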

Self loops

Rows in the same client table cannot be in a direct relation. The typical case is expressing all kinds of tree or graph structures.

+----------+
| clients  |
+----------+
| id       |----------------------+
| login    |                      |
| password |                      |
+----------+                      |
                                  |
          +--------------------+  |
          | albums             |  |
          +--------------------+  |
      +---| id                 |  |
      |   | client_id          |>-+
      +--<| parent_album_id    |
          | name               |
          +--------------------+

This causes issues when client data is inserted into the target shard.

For example, in our photo management software a client has the album with id = 2 as a subcategory of the album with id = 1. Then he flips this configuration, so that the album with id = 2 is on top. In such a scenario, if the database returns client rows in the default, primary key order, it won't be possible to insert the album with id = 1 first, because it requires the presence of the album with id = 2.

Yes, you can disable foreign key constraints to be able to insert self-referenced data in any order. But by doing so you may mask many errors - for example broken references to context data.

For good sharding experience all relations between rows of the same table should be stored in separate table.

+----------+
| clients  |
+----------+
| id       |----------------------+
| login    |                      |
| password |                      |
+----------+                      |
                                  |
          +--------------------+  |
          | albums             |  |
          +--------------------+  |
  +-+=====| id                 |  |
  | |     | client_id          |>-+
  | |     | name               |
  | |     +--------------------+
  | |
  | |    +------------------+
  | |    | album_hierarchy  |
  | |    +------------------+
  | +---<| parent_album_id  |
  +-----<| child_album_id   |
         +------------------+
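
A DDL sketch of the hierarchy table from the diagram above. The primary key on child_album_id (enforcing a single parent per album) is an assumption; albums can now be inserted in any order and the hierarchy rows afterwards:

CREATE TABLE album_hierarchy (
    parent_album_id BIGINT UNSIGNED NOT NULL,
    child_album_id  BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (child_album_id),
    FOREIGN KEY (parent_album_id) REFERENCES albums (id),
    FOREIGN KEY (child_album_id)  REFERENCES albums (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;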

Triggers

Triggers cannot modify rows. Or roar too loud :)

            +----------+
            | clients  |
            +----------+
 +----------| id       |------------------+
 |          | login    |                  |
 |          | password |                  |
 |          +----------+                  |
 |                                        |
 |  +------------+        +------------+  |
 |  | blog_posts |        | activities |  |
 |  +------------+        +------------+  |
 |  | id         |        | id         |  |
 +-<| client_id  |        | client_id  |>-+
    | content    |        | counter    |
    +------------+        +------------+
          :                      :
          :                      :
 (on insert post create or increase activity)

This is a common usage of a trigger to automatically aggregate some statistics. Very useful and safe - no matter which part of the application adds a new blog post, the activities counter always goes up.

However, when sharding, this causes a lot of trouble when inserting client data. Let's say a client has 4 blog posts and 4 activities. If the posts are inserted first, they bump the activity counter through the trigger and we get a collision in the activities table due to the unexpected row. If the activities are inserted first, they are then unexpectedly increased by the post inserts, ending with an invalid total of 8 activities.

In sharding, triggers can only be used if they do not modify data. For example, it is OK to implement sophisticated constraints with them. Triggers that modify data must be removed and their logic ported to the application.

Checkpoint

Check if there are any issues described above in your printed schema and fix them.

And this is probably the most annoying part of the sharding process, as you will have to dig through a lot of code. Sometimes old, untested, undocumented and unmaintained.

When you are done your printed schema on the wall should not contain any unclassified tables.

Ready for next step?

Prepare schema

It is time to dump the monolithic database's complete schema (tables, triggers, views and functions/procedures) to a shard_schema.sql file and prepare for sharding environment initialization.

Separate neutral tables

Move all tables that are marked as neutral from the shard_schema.sql file to a separate neutral_schema.sql file. Do not forget to also move triggers, views or procedures associated with them.

Bigints

Every primary key on a shard should be of unsigned bigint type. You do not have to modify the existing schema installed on the monolithic database. Just edit the shard_schema.sql file and replace all numeric primary and foreign keys with unsigned big integers (see the sketch below). I'll explain later why this is needed.
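
A sketch of how a table definition in shard_schema.sql looks after such an edit (using the blogs table from the first diagram as an example):

CREATE TABLE `blogs` (
  `id`        bigint unsigned NOT NULL AUTO_INCREMENT,
  `client_id` bigint unsigned NOT NULL,
  `skin_id`   bigint unsigned DEFAULT NULL,
  `title`     varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;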

Create schema for dispatcher

The dispatcher tells you on which shard a specific client is located. The absolute minimum is a table where you keep the client id and the shard number. Save it to a dispatch_schema.sql file.
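
A minimal dispatch_schema.sql sketch (table and column names are placeholders):

CREATE TABLE dispatch (
    client_id    BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    shard_number INT UNSIGNED    NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;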

More complex dispatchers will be described later.

Dump common data

From the monolithic database dump the data for neutral tables to a neutral_data.sql file and for context tables to a context_data.sql file. Watch out for table order to avoid breaking foreign key constraints.

Checkpoint

You should have shard_schema.sql, neutral_schema.sql, dispatch_schema.sql, neutral_data.sql and context_data.sql files.

At this point you should also freeze all schema and common data changes in your application until sharding is completed.

Set up environment

Finally you can put all those new, shiny machines to good use.

A typical sharding environment consists of:

- a database for neutral data
- a database for dispatch
- multiple databases for shards

Each database should of course be replicated.

Database for neutral data

Nothing fancy, just a regular database. Install neutral_schema.sql and feed neutral_data.sql to it.

Make a separate user for the application with read-only grants on neutral data, and a separate user with read-write grants for managing that data.

Database for dispatch

Every time a client logs in to your product you will have to find which shard he is located on. Make sure all dispatch data fits into RAM and have a huge connection pool available. Then install dispatch_schema.sql on it.

This is a weak point of all sharding designs. It should be off-loaded with various caches as much as possible.

Databases for shards

They should all have the same power (CPU/RAM/IO) - this will speed things up because you can just randomly select a shard for a new or migrated client without bothering with different hardware capabilities.

Configuration of shard databases is pretty straightforward. For every shard just install shard_schema.sql, feed it the context_data.sql file, and follow the two more steps below.

Databases for shards - users

Remember that context tables should be identical on all shards. Therefore it is a good idea to have a separate user with read-write grants for managing context data. The application user should have read-only access to context tables to prevent accidental context data changes.

This ideal design may be too difficult to maintain - every new table will require setting up separate grants. If you decide to go with a single user, make sure you add some mechanism that monitors context data consistency across all shards.
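
A sketch of such grants (database, table and user names are placeholders, and the users are assumed to already exist):

-- context table: application may only read, the sync user may write
GRANT SELECT ON shard_db.skins TO 'app'@'%';
GRANT SELECT, INSERT, UPDATE, DELETE ON shard_db.skins TO 'context_sync'@'%';

-- client table: the application owns it
GRANT SELECT, INSERT, UPDATE, DELETE ON shard_db.blogs TO 'app'@'%';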

Databases for shards - mutually exclusive primary keys

Primary keys in client tables must be globally unique across the whole product.

First of all - the data split is a long process. Just pushing data between databases may take days or even weeks! Because of that it should be performed without any global downtime. So during the monolithic-to-shard migration phase, new rows will still be created in the monolithic database while already migrated clients create rows on shards. Those rows must never collide.

Second of all - sharding does not end there. Later on you will have to load balance shards, move clients between different physical locations, and back them up and restore them if needed. So rows must never collide at any point in your product's life.

How to achieve that? Use offset and increment while generating your primary keys.

MySQL has a ready to use mechanism:
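
A sketch of that mechanism, based on MySQL's auto_increment_increment and auto_increment_offset variables; the values match the example sequences below. Put the equivalent settings in my.cnf so they survive a restart.

-- on the first shard
SET GLOBAL auto_increment_increment = 100;  -- leaves room for up to 100 shards
SET GLOBAL auto_increment_offset    = 1;

-- on the second shard
SET GLOBAL auto_increment_increment = 100;
SET GLOBAL auto_increment_offset    = 2;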

Now your first shard will generate the auto increments 1, 101, 201, 301, ... , 901, 1001, 1101 for any table and the second shard will generate 2, 102, 202, 302, ... , 902, 1002, 1102. And that's all! Your new rows will never collide, no matter which shard they were generated on, and no communication between shards is needed.

TODO: add recipes for other database types

Now you should understand why I've told you to convert all numeric primary and foreign keys to unsigned big integers. The sequences will grow really fast, in our example 100x faster than on the monolithic database.

Remember to set the same increment and the corresponding offsets on replicas. Forgetting to do so will be lethal to the whole sharding design.

Checkpoint

Your database servers should be set up. Check routing from the application, check user grants. And again - remember to have correct configurations (in Puppet, for example) for the increments and offsets of shards and their replicas. Do some failure simulations.

And move to the next step :)

Q&A

Q: Can I do sharding without dispatch database? When client wants to log in I can just ask all shards for his data and use the one shard that will respond.

A: No. This may work when you start with three shards, but when you have 64 shards in 8 different data centers such fishing queries become impossible. Not to mention you will be very vulnerable to brute-force attacks - every password guess attempt will be multiplied by your application causing instant overload of the whole infrastructure.

Q: Can I use any no-SQL technology for dispatch and neutral databases?

A: Sure. You can use it instead of traditional SQL or as a supplementary cache.

Adapt code

Product

There will be an additional step in your product. When a user logs in, the dispatch database must be asked for the shard number first. Then you connect to this shard and... it works! Your code will also have to use a separate database connection to access neutral data. And it will have to pick a shard when a new client registers and note this selection in the dispatch database.

That is the beauty of the whole clients-sharding approach - the majority of your code is not aware of it.

Synchronization of common data

If you modify neutral data this change should be propagated to every neutral database (you may have more of those in different physical locations).

The same thing applies to context data on shards, but all auto increment columns must be forced. This is because every shard generates a different sequence. If you simply execute INSERT INTO skins (color) VALUES ('#AA55BC'), shard 1 will assign a different ID than shard 2, and all client data that references this skin will be impossible to migrate between shards.
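
So propagate new context rows with an explicit ID (a sketch; the value 4242 is just a placeholder for whatever ID the row got on the first shard):

INSERT INTO skins (id, color) VALUES (4242, '#AA55BC');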

Dispatch

Dispatch serves two purposes. It allows you to find a client's shard by some unique attribute (login, email, and so on), and it also helps to guarantee such uniqueness. So for example, when a new client is created, the dispatch database must be asked if the chosen login is available. Take extra care of the dispatch database. Off-load as much as you can with caching, and schedule regular consistency checks between it and the shards.

Things get complicated if you have shards in many data centers. Unfortunately I cannot give you a universal algorithm for keeping them in sync; this is too application specific.

Analytic tools

Because your clients' data will be scattered across multiple shard databases, you will have to fix a lot of global queries used in analytical and statistical tools. What was trivial previously - for example SELECT cities.name, COUNT(*) AS amount FROM clients JOIN cities ON clients.city_id = cities.id GROUP BY cities.name ORDER BY amount DESC LIMIT 8 - will now require gathering all the needed data from the shards and performing intermediate materialization for grouping, ordering and limiting, as sketched below.
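
A sketch of the per-shard part (assuming cities is a context table present on every shard, so the JOIN stays local; the application then sums the partial counts per city name and applies the final ORDER BY ... LIMIT):

SELECT cities.name, COUNT(*) AS amount
FROM clients
JOIN cities ON clients.city_id = cities.id
GROUP BY cities.name;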

There are tools that help you with that. I've tried several solutions, but none was versatile, configurable and stable enough for me to recommend it.

Make a wish-ard

We got to the point where you have to switch your application to the sharding flow. To avoid having two versions of the code - an old one for the still-running monolithic design and a new one for sharding - we will just connect the monolithic database as another "fake" shard.

First you have to deal with auto increments. Configure the same increment on your monolithic database as on the shards and pick any free offset. Then check the current auto increment value for every client or context table and set the same value for that table on every shard. Now your primary keys won't collide between the "fake" and real shards during the migration. But beware: this can easily overflow tiny or small ints in your monolithic database - with an increment of 100, just adding three rows can overflow the tiny int unsigned capacity of 255.
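
A sketch of that alignment (all numbers are placeholders):

-- on the monolithic "fake" shard: same increment as the real shards, any free offset
SET GLOBAL auto_increment_increment = 100;
SET GLOBAL auto_increment_offset    = 50;

-- on every real shard, for every client/context table: bump the counter to at least
-- the monolith's current value (taken from SHOW TABLE STATUS LIKE 'blogs')
ALTER TABLE blogs AUTO_INCREMENT = 1000000;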

After that, synchronize the data in the dispatch database - every client you have should point to this "fake" shard. Then deploy your code changes to use the dispatch logic.

Checkpoint

You should be running code with the complete sharding logic, but on a reduced environment - with only one "fake" shard made out of your monolithic database. You may also already enable creating new client accounts on your real shards.

Tons of small issues to fix will pop up at this point: forgotten pieces of code, broken analytics tools, broken support panels, the need to tune up the neutral or dispatch databases.

And when you have squashed all the bugs, it is time for the grand finale: client migration.

Migrate clients to shards

Downtime semaphore

You do not need any global downtime to perform the client data migration. Disabling your whole product for a few weeks would be unacceptable and would cause huge financial losses. But you need some mechanism to disable access for individual clients while they are migrated. A single flag in the dispatch database should do, but your code must be aware of it, present a nice information screen when a disabled client tries to log in, and of course not modify that client's data in the meantime.

Timing

If you have some history of your clients' habits - use it. For example, if a client usually logs in at 10:00 and logs out at 11:00, schedule his migration for another hour. You may also figure out which time zones clients are in and schedule each migration for their night. The migration process should be as transparent to the client as possible. One day he should just log in and bam - a fast and responsive product all of a sudden.

Exodus tool

Exodus is the tool used internally at GetResponse.com to split the monolithic database into shards, and it is still used to load balance the sharding environment. It extracts a subset of rows from a relational database and builds a series of queries that can insert those rows into another relational database with the same schema.

Fetch Exodus.pm and create an exodus.pl file in the same directory with the following content:

#!/usr/bin/env perl

use strict;
use warnings;

use FindBin;
use lib $FindBin::Bin;    # Exodus.pm lives in the same directory

use DBI;
use Exodus;

# connection to the monolithic database
my $database = DBI->connect(
    'DBI:mysql:database=test;host=localhost',
    'root', undef,
    {'RaiseError' => 1}
);

my $exodus = Exodus->new(
    'database' => $database,
    'root'     => 'clients',    # root table that owns all client data
);

# extract every row belonging to the client with id = 1
$exodus->extract( 'id' => 1 );

Of course provide proper credentials to connect to your monolithic database.

Now when you call the script it will extract all data for the client represented by the record with id = 1 in the clients root table. You can pipe it directly into another database to copy the client there.

./exodus.pl | mysql --host=my-shard-1 --user=....

Update the dispatch entry for this client, check that the product works for him, and remove his rows from the monolithic database.

Repeat for every client.

MySQL issues

Do not use a user with the SUPER grant to perform the migration. Not because it is unsafe, but because such users skip the default connection settings (such as init_connect, where the character set is often configured). You may end up with messed up character encodings if you do so.

MySQL is also quite dumb when it comes to cascading DELETE operations. If you have a schema like this

            +----------+
            | clients  |
            +----------+
 +----------| id       |----------------+
 |          | login    |                |
 |          | password |                |
 |          +----------+                |
 |                                      |
 |  +-----------+        +-----------+  |
 |  | foo       |        | bar       |  |
 |  +-----------+        +-----------+  |
 |  | id        |----+   | id        |  |
 +-<| client_id |    |   | client_id |>-+
    +-----------+    +--<| foo_id    |
                         +-----------+

and all relations are ON DELETE CASCADE, then sometimes it cannot resolve the proper order and may try to delete data from the foo table before data from the bar table, causing a constraint error. In such cases you must help it a bit and manually delete the client's data from the bar table before you are able to delete the row from the clients table.
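
A sketch for the schema above (client id 1 is a placeholder):

-- MySQL may try to cascade into foo before bar, so clear bar by hand first
DELETE FROM bar WHERE client_id = 1;
-- now the remaining cascade (clients -> foo) can proceed
DELETE FROM clients WHERE id = 1;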

Virtual foreign keys

Exodus allows you to manually add missing relations between tables that couldn't be created in the regular way for some reason (for example, the table engine does not support foreign keys).

my $exodus = Exodus->new(
    'database'  => $database,
    'root'      => 'clients',
    'relations' => [
        {
            'nullable'      => 0,
            'parent_table'  => 'foo',
            'parent_column' => 'id',
            'child_table'   => 'bar',
            'child_column'  => 'foo_id',
        },
        { ... another one ... }
    ]
);

Note that if the foreign key column can be NULL, such a relation should be marked as nullable.

Also remember to delete rows from such a table yourself when a client's migration is complete. Due to the lack of a real foreign key, the delete won't cascade automatically.

Cleanup

When all of your clients are migrated, simply remove the "fake" shard from the infrastructure.

Exodus roadmap

I have plans to refactor this code to Perl 6 (it was prototyped in Perl 6, although back in the days when GetResponse introduced sharding, Perl 6 was not fast or stable enough to deal with terabytes of data). It should get proper abstractions, more engine support and of course a good test suite.

YAPC NA 2016 "Pull request challenge" seems like a good day to begin :)

Contact

If you have any questions about database sharding, or want to contribute to this guide or the Exodus tool, contact me in person or on IRC as "bbkr".

Final screen

YOU'VE PROVEN TOO TOUGH FOR [monolithic database design] HELL TO CONTAIN

(DOOM quote)

Strangely Consistent: Train tracks

Published by Carl Mäsak

I don't know anything, but I do know that everything is interesting if you go into it deeply enough. — Richard Feynman

Someone gives you a box with these pieces of train track:

It's possible you quickly spot the one obvious way to put these pieces into one single train track:

But... if you're like me, you might stop and wonder:

Is that the only possible track? Or is there some equally possible track one could build?

I don't just mean the mirror image of the track, which is obviously a possibility but which I consider equivalent:

I will also not be interested in complete tracks that fail to use all of the pieces:

I would also reject all tracks that are physically impossible because of two pieces needing to occupy the same points in space. The only pieces which allow intersections even in 2D are the two bridge pieces, which allow something to pass under them.

I haven't done the search yet. I'm writing these words without first having written the search algorithm. My squishy, unreliable wetware is pretty certain that the obvious solution is the only one. I would love to be surprised by a solution I didn't think of!

Here goes.

Yep, I was surprised. By no less than nine other solutions, in fact. Here they are.

I really should have foreseen most of those solutions. Here's why. Already above the fold I had identified what we could call the "standard loop":

This one is missing four curve pieces:

But we can place them in a zig-zag configuration...

...which work a little bit like straight pieces in that they don't alter the angle. And they only cause a little sideways displacement. Better yet, they can be inserted into the standard loop so they cancel each other out. This accounts for seven solutions:

If we combine two zig-zag pieces in the following way:

...we get a piece which can exactly cancel out two orthogonal sets of straight pieces:

This, in retrospect, is the key to the final two solutions, which can now be extended from the small round track:

If I had required that the track pass under the bridge, then we are indeed back to only the one solution:

(But I didn't require that, so I accept that there are 10 solutions, according to my original search parameters.)

But then reality ensued, and took over my blog post. Ack!

For fun, I just randomly built a wooden train track, to see if it was on the list of ten solutions. It wasn't.

When I put this through my train track renderer, it comes up with a track that doesn't quite meet up with itself:

But it works with the wooden pieces. Clearly, there is some factor here that we're not really accounting for.

That factor turns out to be wiggle, the small amounts that two pieces can shift and rotate around the little connector that joins them:

Since there are sixteen pieces, there are sixteen connection points where wiggle can occur. All that wiggle adds up, which explains how the misrendered track above can be made to cleanly meet up with itself.

Think of the gap in the misrendered track as a 2D vector (x, y). What was special about the ten solutions above is that they had a displacement of (0, 0). But there are clearly other small but non-zero displacements that wiggle can compensate for, leading to more working solutions.

How many working solutions? Well, armed with the new understanding of displacements-as-vectors, we can widen the search to tolerate small displacements, and then plot the results in the plane:

That's 380 solutions. Now, I hasten to add that not all of those are possible. At least one solution that I tried actually building turned out to be impossible because the track tried to run into itself in an unfixable way.

With yet another 8-shaped solution, I had given up trying to make it work because it seemed there was too much displacement. Then my wife stopped by, adjusted some things, and voilà! — it worked. (She had the idea to run the track through one of the small side-tunnels under the bridge; something that hadn't occurred to me at all. There's no way to run a wooden train through that small tunnel, but that also wasn't something I restricted up front. Clearly the track itself is possible.)

Anyway, I really enjoyed how this problem turned out to have a completely solvable constraint-programming core of 10 solutions, but then also a complete fauna — as yet largely unclassified — of approximate, more-or-less buildable solutions around it. All my searching and drawing can be found in this Github repository — others who wish to experiment may want to start from the ideas in there, rather than from scratch.

Essentially, all models are wrong, but some are useful. — George E. P. Box

6guts: ‘grinding out performance improvements

Published by jnthnwrthngtn on 2016-06-12T17:55:26

In the last weeks, I’ve been working on various Perl 6 performance improvements. They’ve led to changes in all of MoarVM, NQP, and Rakudo. For each improvement, there are usually three steps involved:

  1. Take some program that performs poorly, and discover a reason that it’s slow.
  2. Design, implement and test potential changes, until one yields an improvement. (Or, if nothing seems to help, move on to a different reason that it’s slow).
  3. Make sure that the improvement doesn’t cause regressions elsewhere.

In all of these steps, getting some objective measurements is important. Step 3 is relatively straightforward: just build NQP/Rakudo and run their respective test suites. It’s possible for problems to find a place to hide even with this; test suites are, after all, not full system proofs. But it rules out a lot of bad behaviors.

For step 1, profiling is key. I mean, sure, sometimes I can guess why something is slow, and sometimes I’m even right. But the overwhelming majority of the time, measuring wins hands down. That’s why I seeded tools such as the MoarVM profiler and MoarVM heap snapshot analyzer, both of which have seen contributions from others since. And, when working on MoarVM itself, there are various tools for profiling native code.

That leaves step 2. How do I know if I’ve got an improvement? One easy way is to use something like time, doing thousands or millions of iterations to try and avoid noise, and seeing what happens. It’s a bit of a blunt instrument, however. It’s hard to be confident that anything less than a 2%-3% improvement isn’t just measurement noise, and I’m not even confident about that rule of thumb. :-) And, while it may be reasonable to argue that an improvement of just 1% or even a fraction of a percent is just not worth it in some contexts, VM engineering isn’t one of those contexts. A 1% improvement rolled out to all of our users quickly multiplies out to a huge number of saved CPU cycles.

More than that, though, the small improvements add up. Do 5 small improvements that win an average of 1% each, and it adds up to a more satisfying 5% improvement. But with only wallclock times to go on, it’s hard to confidently commit an improvement that wins only 1%, or 0.5%. Why? Simply because it’s hard to be sure that it’s a move in the correct direction. If it’s in the noise range, it may be that things are actually getting a tiny bit worse. Experience tells that just because something looks like it should be an improvement does not, in fact, mean that it will be. So if I’m seeing – to the degree of measurement error – two things coming out the same, and I go ahead and commit the “improvement” anyway, I’m really just guessing.

Enter callgrind, part of the Valgrind suite. It can give a count of the number of CPU instructions executed. And, on two consecutive runs with the same input, it will produce the exact same number. Yes, it’s sloooooow at gathering data; on the other hand, with such precision there’s less need to bump up the number of iterations to hide measurement error. Callgrind also explains what functions the CPU instructions are from, meaning it can play a key role in the profiling step too.

Of course, counting CPU instructions is a somewhat blunt instrument too. Instructions executed is not the same as CPU cycles spent. Modern CPUs can both execute multiple instructions in a single cycle due to having multiple function units, as well as stall for anything from several to several hundred cycles on an instruction that accesses memory. This measurement also, of course, gives little insight into I/O bound programs, and I could easily imagine getting rather misled by such data in contention-bound programs. However, for CPU bound programs operating on relatively small heaps and running on a single thread, the numbers can be treated something like a very accurate clock.

Here’s a look at handful of the improvements I’ve been doing, based off looking at callgrind output.

The test program

Here’s the test program I considered. It involves a bunch of invocation and integer operations in a tight loop.

class A {
    has $.i = 0;
    method m() { $!i++ }
}
my $a = A.new;
for ^5000000 {
    $a.m();
}
say $a.i;

Before starting out, this took 17,855,658,600 instructions, which comes out at around 3,580 instructions per iteration. For comparison, here’s a Perl 5 program that I hope is fairly equivalent (I stuck with the built-in OO to avoid the costs of any sugar):

package A;
sub new {
    return bless { i => 0 }, shift;
}
sub m() {
    my $self = shift;
    $self->{i}++;
}
package main;
my $a = A->new;
for (1..5000000) {
    $a->m();
};
say $a->{i};

It weighs in at 10,198,184,195 cycles, or 2040 instructions per iteration, thus running in 56% of the CPU instructions the Perl 6 version takes.

A wasted memset

One thing that stood out right away was the amount of time spent in memset, to zero memory. As I looked at the callers, I spotted one that seemed out of place: clearing the arguments buffer. The code gave a reason (make sure the GC never sees a partial args buffer with old data), but that reason turns out to be bogus: there can never be a GC safepoint anywhere inside of the sequence of instructions that put the arguments in place to pass. So, I removed it.

Now I was down to a really small function. That was just begging to go away. It was called both from the interpreter (where the C compiler may well have inlined it) and also from the JIT. The JIT case was certainly going to be a nice win; instead of putting args into the appropriate registers for a function call, making it, and then using the return value, I could just do a couple of simple instructions.

These two fairly simple changes got it down to 17,624,207,945 instructions – a saving of 230 million CPU instructions (1.3%), or around 50 cycles off every iteration. Very much worth having (it makes every single call cheaper), but could have been marginal if measuring simply wallclock time.

Exhausting work

The next win would have been easily visible just from wallclock time, but looking at the callgrind output led to it. I noticed we spent a huge amount of time doing a late-bound (by name) lexical lookup. With --inclusive=yes passed to callgrind_annotate, its line in the output looked like this:

1,149,528,043  ???:MVM_frame_find_lexical_by_name [libmoar.so]

Now, late-bound lexical lookups being costly isn’t really a surprise. This is the case where we have to go searching for a symbol by name (doing hash lookups), because we couldn’t do its resolution at compile-time (or at optimization time). The surprise was that we were doing them at all while executing such a simple program. So I looked into why, and it surprised me.

Long ago (back in the Parrot days), we replaced the secret RETURN lexical (which held an object used in implementing return) with the &EXHAUST sub. This would throw a useful error if you tried to return from a sub you had already returned from, for example due to forgetting that map is lazy and the final statement of a sub is its return value:

sub in(@a, $b) {
    @a.map({ return True if $_ eqv $b })
}
say in([1,2,3], 2); # dies: Attempt to return outside of any Routine

However, this lookup ended up being code-gen’d late-bound. I was considering fixing that, when I got curious if we even needed it at all any more. And indeed, on the VMs we run on today, I got the same decent error message if I simply set RETURN to null. So, I did that and the win was enormous: down to 11,085,521,589 CPU instructions! That meant, relative to the previous measurement, it ran in 63% of the CPU cycles it used to, or 2,220 per iteration.

That might seem incredible given that we only stood to gain 1.1 billion cycles by avoiding the indirect lexical lookup. Where did the other 5 billion go to? It turns out that blocks that have references to symbols lexically outer to them are not able to be inlined (unless they have been resolved by optimization time and are known to stay constant). The late-bound lookup of &EXHAUST was thus an inlining blocker for the method m. With it now being inlined, all of the calling overhead went away in favor of a couple of gotos. (Those also could go away in the future if inlining gets smarter.)

Optimizing get_boxed_ref

Eliminating control flow in favor of data access is usually a win. Branches aren’t the cheapest thing. I spotted an opportunity in a function get_boxed_ref, which is used to extract the memory address of the P6bigint representation that is inlined into a P6opaque (as happens with Int; if you’re wondering why it’s a P6opaque, recall that it’s allowable to mix into Int objects).

The resulting change shaved another 60 million cycles off the program; only 12 per iteration, but since this improves every single Int operation we perform in Perl 6, I’ll take it.

Static elimination of decontainerization

The &EXHAUST change uncovered an inlining issue that I had to fix, and while doing so I took a look at the code we were producing and got an idea. In Perl 6, we have Scalar containers. So when you have:

my $answer = 42;

Then the callframe has a slot for $answer that points to a Scalar object, which has a value attribute that points to the immutable Int object 42. Therefore, when we need the value from a container, we have to get hold of it. That is done with an instruction called decont, which checks if we have a container and dereferences it if so, and otherwise just evaluates to the value. The dynamic optimizer is pretty good at removing unneeded decont instructions to save the cycles/branches involved, and can lower those that remain into simple pointer arithmetic. But it can’t get ’em all, and of course not all code gets hot enough to be optimized in such a way.

I spotted that in some cases, we emitted decont instructions against compile time constants that we could cheaply determine, at compile time, would never need them. A small code-gen patch was sufficient to eliminate a load of them. This led to another 35 million cycles saved. However, the number of cycles in an empty program at startup also went down by quite an amount too, so it’s likely not that we’re saving so much in the loop (if anything). Additionally, this change shaved 3.6KB off the NQP bytecode size, 16.2KB off the Rakudo compiler bytecode size, and 55.8KB off CORE.setting bytecode size.

And next time: returning to return

Fixing the &EXHAUST bug reminded me that I’ve long wanted to change the way we implement return, to make it free – rather than just cheap – in the case that we never explicitly return, and much cheaper when we do. I’m still working on getting those changes finished up, and this has already been a fairly long post, so I’ll save the details – and the results – for next time.


brrt to the future: In praise of incremental change

Published by Bart Wiegmans on 2016-06-04T17:01:00

Hi everybody, welcome back. I'd like to share with you the things that have been happening in the MoarVM JIT since I've last posted, which was in fact March. To be brief, the JIT has been adapted to deal with moving frames, and I've started to rewrite the register allocator towards a much better design, if I do say so myself.

First, moving frames. jnthn has already written quite a bit about them. The purpose of it is to make the regular case of subroutine invocation cheaper by making special cases - like closures and continuations - a bit more expensive. I have nothing to add to that except that this was a bit of a problem for the JIT compiler. The JIT compiler liked to keep the current frame in a non-volatile register. These are CPU registers which are supposed not to be overwritten when you call a function. This is useful because it speeds up access to frequently used values. However, if the current frame object is moved, the frame pointer in this register becomes stale. Thus, it had to go, and now we load the frame pointer from the thread context object (i.e. in memory), which never moves.

Unfortunately, that was not sufficient. Because MoarVM is an interpreter, control flow (like returning from a routine) is implemented by updating the pointer to the next instruction (and the next frame). JIT compiled code never deals with this instruction pointer. Hence, whenever this instruction pointer could have been updated - we call this invokish or throwish code - the JIT may need to return control to the interpreter, which can then figure out what to do next. Originally, this had been implemented by comparing the frame pointer of the JIT frame - stored in the non-volatile register - with the frame pointer as understood by the interpreter - i.e., in the thread context object. This check no longer worked, because a) we didn't have a permanent pointer to the current frame anymore, and b) the current frame pointer might change for two different reasons, namely control flow and object movement.

I figured out a solution to this issue when I realized that what we really needed is a way to identify (cheaply) in the JIT whether or not we have changed control flow, i.e. whether we have entered another routine or returned out of the current one. This might be achieved by comparing immutable locations, but lacking those, another method is to simply assign increasing numbers to constructed frames. Such a sequence number then identifies the current position in the control flow, and whenever it is changed the JIT knows to return control to the interpreter. This caused some issues at first when I hadn't correctly updated the code in all places where the interpreter changed the current instruction, but afterwards it worked correctly. Special thanks go to lizmat who allowed me to debug this on Mac OS X, where it was broken.

Afterwards, I've focused on improving the register allocator. The primary function of a register allocator is to ensure that the values used in a calculation are placed in (correct) registers prior to that calculation. This involves, among other things, assigning the correct registers (some operations only work on specific registers), spilling registers to memory in order to make space, loading spilled values from memory if necessary, and ensuring that values in volatile registers are spilled prior to a function call. This was rather difficult in the old design because, as it was inlined into the compilation step, it couldn't really look behind or ahead, which is a problem if you want to place something correctly for future use. Furthermore, it allowed only for a one-to-one correspondence between a value that was computed and its current location in a register. That is a problem whenever a value is copied to a different register, or stored in multiple memory locations.

So I've been, very slowly and methodically, in very small steps, moving code and structures through the JIT compiler in order to arrive at a register allocator that can handle these things. The first thing I did was remove the register allocation step out of compilation, into its own step (commit and another commit). Then I extracted the value descriptor structure - which describes in which location a value is stored - out of the expression tree (commit). I stored the tile list in a vector, in order to allow reverse and random access (commit). Because the register allocator works in a single pass and only requires temporary structures, I've 'internalized' it to its own file (commit one and commit two). Finally, I replaced the per-expression value structure with value descriptor structures (commit).

This places me in a position to replace register allocator structures (such as the active stack with an expiry heap), implement loads and stores, record register requirements per tile, implement pre-coloring, and correct allocation over basic blocks. All these things were impossible, or at least very difficult, with the old design.

What I think is interesting here is that in each of these commits, the logic of the program didn't substantially change, and the JIT continued to work just as well as it had before. Nevertheless, all of this is progress - I replaced a rather broken design assumption (online register allocation with a value state machine) with another (offline register allocation with value descriptors) that allows me to implement the necessary mechanics in a straightforward manner. And that, I think, demonstrates the power of incremental changes.

hoelzro: Combining Character Caveats

Published on 2016-06-01T01:05:26

I'm currently following the Fluent Forever methodology for learning Russian. The method suggests that when you start learning vocabulary, you should learn 625 common, concrete words that you can easily associate with pictures and feelings. The next step after this is to build your vocabulary to include the 1,000 most common words in your target language. I recently completed the foundation of 625 words, so it was time to move on to the next 1,000. There was just one small problem…


rakudo.org: Announce: Windows (64 bit) Installer for Rakudo Star 2016.04

Published by Steve Mynott on 2016-05-03T21:05:34

A Windows MSI installer is now available for x86_64 (64bit) platforms (probably Windows 7 or better) and has the JIT (Just in Time compiler) enabled.  This version was built with mingw (from Strawberry Perl) rather than the usual MSVC which may have performance implications. In order to use the module installer “panda” you will also need to install Strawberry Perl and Git for Windows. Also see http://www.perl6.org/downloads/ for Errata notices concerning panda.

Unfortunately the usual second installer targeting x86 (32bit) platforms isn’t currently available.  It’s hoped this will be available in the near future and until then the previous 2016.01 is recommended for x86 (32bit). No 32 bit versions feature the JIT. MSIs are available from http://rakudo.org/downloads/star/.

6guts: Refactoring and torture

Published by jnthnwrthngtn on 2016-04-30T15:14:59

This post covers the previous two weeks of my Perl 6 grant work. Last time I wrote here, I plotted changes to call frames in MoarVM. Or, as the wonderful Perl Weekly put it, provided way too much information about call frames. :-)

That there is so much to say about call frames reflects their rather central role. They play a big part in supporting numerous language features (function calls, recursion, closures, continuations, dynamic variables, and pseudo-packages like OUTER and CALLER). The garbage collector scans them to find live objects. Both the interpreter and JIT compiler have to relate to them in various ways. The dynamic optimizer performs transforms over them when doing OSR (On Stack Replacement) and uninlining (the deoptimization that enables us to speculatively perform inlining optimizations).

All of which makes a major refactor of call frames a rather scary prospect. While they have picked up additional bits of state as MoarVM has evolved, they have been reference counted since, well, the day I first implemented call frames, which means “before MoarVM could even run any code”. Being reference counted, rather than handled by the garbage collector, gave them a property that is easy to rely on, and rather less easy to discover reliance on: that they never move over their lifetime.

I like to move it move it

Wait, what, move? Why would they even move?

Here’s a little Perl 6 program to illustrate. It declares a class, makes an instance of it, prints its memory address, does a load of further, throwaway, memory allocations, and then again prints the address of the object instance we made.

class A { }
my $obj = A.new;
say $obj.WHERE;
A.new for ^10000;
say $obj.WHERE;

When I ran this locally just now, I got:

39391392
95466848

If you get the same number twice, just make the 10000 something bigger. What’s interesting to note here is that an object’s location in memory can change over time. This is a consequence of MoarVM’s garbage collector, which is both generational and manages its young generation using semi-space copying. (This is nothing special to MoarVM; many modern VMs do it.)

Being able to move objects relies on being able to find and update all of the references to them. And, since MoarVM is written in C, that includes those references on the C stack. Consider this bit of code, which is the (general, unoptimized) path for boxing strings:

MVMObject * MVM_repr_box_str(MVMThreadContext *tc, MVMObject *type, MVMString *val) {
    MVMObject *res;
    MVMROOT(tc, val, {
        res = MVM_repr_alloc_init(tc, type);
        MVM_repr_set_str(tc, res, val);
    });
    return res;
}

It receives val, which is a string to box. Note that strings are garbage-collectable objects in MoarVM, and so may move. It then allocates a box of the specified type (for example, Perl 6’s Str), and puts the string inside of it. Since MVM_repr_alloc_init allocates an object, it may trigger garbage collection. And that in turn may move the object pointed to by val – meaning that the val pointer needs updating. The MVMROOT macro is used in order to add the memory address of val on the C stack to the set of roots that the GC considers and updates, thus ensuring that even if the allocation of the box triggers garbage collection, this code won’t end up with an old val pointer.

Coping with moving frames

Last time, I discussed how reference counting could be eliminated in favor of a “linear” call stack for frames that don’t escape (that is, become heap referenced), and promoting those that do escape to being garbage collected. As an intermediate step there, I’ve been working to make all frames GC-managed. This means that frames can move, and that they are part of the generational scheme. Therefore, every piece of code that both holds a reference to a frame and takes a code path that can allocate would need updating with MVMROOT. Further, all assignments of frames into other objects, and other objects into frames, would need write barriers (aside from the working area, which is handled specially).

In part, this just needs a lot of care. Going through the places frames show up, updating things as needed, and so forth. But even then, would that really be enough to be confident things weren’t broken? After all, my refactor was changing the rules for one of the most important data structures in the VM.

Of course, building NQP and Rakudo and passing the spectest suite is one good way to exercise MoarVM after the changes. Doing this showed up some issues, which I fixed. But even that doesn’t offer a huge amount of confidence. A simple script, or a short test, might trigger no garbage collections at all, or just the odd one. And the collections are highly likely to be triggered on the most common code paths in the VM.

GC torture testing

When faced with something scary, a surprisingly good approach is to tackle it by doing it really often. For example, are software releases scary? If yes, then do time-based releases every month, and with time they’ll become automatic and boring enough not to be scary. Is deploying changes to a production system scary? If yes, then adopt continuous delivery, deploying lots of times in a day and with easy rollback if things don’t go well.

Garbage collection is pretty scary. I mean, we take this world of objects the program has created, move them around, throw a bunch of them out, and then have the program continue running as if nothing happened. So…let’s try doing it really often!

This is exactly what GC torture testing involves.

--- a/src/gc/collect.h
+++ b/src/gc/collect.h
@@ -1,7 +1,7 @@
 /* How big is the nursery area? Note that since it's semi-space copying, we
  * actually have double this amount allocated. Also it is per thread. (In
  * the future, we'll make this adaptive rather than a constant.) */
-#define MVM_NURSERY_SIZE 4194304
+#define MVM_NURSERY_SIZE 13000

Rather than doing a collection every 4MB worth of allocations, let’s do one every 13KB worth of allocations! That’s around 320 times more often. Combined with a few debugging checks enabled, to catch references to objects that are out of date, bugs resulting from missing MVMROOTs and write barriers can be driven out of their hiding places into the light of day.

It’s a rather effective technique. It’s also a very time-consuming one. The NQP and Rakudo builds easily take an hour between them, and running spectest this way takes over 12 hours. It’s cheap compared to shipping a MoarVM with new and nasty bugs that waste a bunch of people’s time, of course!

It’s been a while since we did such a torture test. I’ve decided we should do them more often. It found issues. So far, from the spectest run torture test results, I’ve fixed 9 bugs (I didn’t go back and count those discovered while building NQP and Rakudo). What’s interesting is that of the 9, only 3 of them were clearly attributable to my refactors, one was potentially related to them, and 5 were bugs that must have been around a good while. One of the bugs that did relate to the frames changes caused deadlocks in multi-threaded code quite reliably under torture testing, but would have likely caused them very rarely under normal use (and so been extremely frustrating to reproduce and track down if it made it into the wild). 2 of the fixed non-frame bugs exclusively affected multi-threaded programs and would have doomed them. One was in the CUnion representation, and probably was the cause of some previously unresolved occasional failures of the NativeCall union tests.

What next?

By this point, I’m reasonably confident that regressions due to the first step of the frame changes have been shaken out. The GC torture testing has, however, shed light on some other issues that will want addressing in the near future.

I intend to put those aside for a little while, and complete the frame changes, introducing the linear stack. Compared with the first step, this feels like a lower risk change, in that mistakes should be a lot easier and cheaper to detect. I’d like to try and land this in the next week or so, in order that it can get some real-world testing before it makes it into a MoarVM and Rakudo release.

Once that’s out the way, I’ll be returning to other issues turned up in GC torture testing. I’d also like to look into a way to be able to run it automatically and regularly (once a week, perhaps). It’s a good bit too intensive to be able to farm it out to Travis. The sensible solution is probably to do it in the cloud, on some elastic compute thing where it just uses a machine once a week for a day or so. The silly but fun way is to build a Raspberry Pi cluster on my desk, and hack up something to distribute the tests across them. :-)


Strangely Consistent: Trinity

Published by Carl Mäsak

Lately, driven by a real itch I wanted to scratch, I finally wrote my first Perl 6 web app since the November wiki engine. (That was back in 2008. Very early days. I distinctly recall Rakudo didn't have a does-file-exist feature yet.)

Because I'm me, naturally the web app is a game. A board game. The details don't matter for this article — if you're curious, go check out the README. If you're not, I forgive you. It's just a game with a board and stones of various colors.

Here's the order in which I wrote the web app. The first thing I made work was a model, literally the game itself.

image of the model in the middle

This is the core, the heart of the application. A Games::Nex instance models the progress of a game, and the rules of Nex are encoded in this type's "algebra". Naturally, this was developed test-first, because it would be reckless not to. This is the essence of masakism: "testing gets you far" and "keep it small and simple". Eliminate all other factors, and the one which remains must be the model at the core.

The core domain is nice, but it doesn't have any face. By design, it's just mind stuff. I wanted it to be a web app, so the next reasonable step was to add a Bailador app that could show the board and allow moves to be tried on it.

image of the app+model in the middle

Of course, when you run it — it's a web app — the result turns up in your browser, as you'd expect.

image of browser <--- app+model

And when you make a move by interacting with the web page, the browser goes "Hey, app! Yo! Do some stuff!"

image of browser ---> app+model

None of this is news to you, I'm sure. My point so far though is that we're now up to two environments. Two separate runloops. It has to be that way, because — in every single case except the crazy one where I am you — the server and the client will be two different computers. The GET and POST requests and their responses are simply client and server shouting to each other across the network.

Now my app had a face but no long-term memory. Every move was against an empty game board; you made it, and all traces of it disappeared into the great data limbo.

Time to add (you guessed it) a database backend:

image of browser <---> app+model <---> db

(Postgres is great, by the way. Highly recommended. Makes me feel happy just thinking about it. Mmm, Postgres.)

The database also sits in its own, separate runloop. Maybe — probably — on its own machine, too. So we have three environments.

It was at this point I felt that something was... well not wrong exactly, but odd...

<masak> I was struck by not just how wide apart the database world is from
        the server backend world, but also how wide apart the [browser]
        world is from the server backend world
<masak> you're writing three separate things, and making them interoperate
<masak> I almost feel like writing a blog post about that

Coke insightfully quipped:

<[Coke]> this reads like you discovered a 3-tier web app.

And yes, that is the name for it. Freakin' three-tier. Love those bloody tiers. Like a hamburger, but instead of meat, you got a tier, and not in bread as usual, but between two more tiers!

I have a subtle point I want to make with this post, and we're now getting to it. I'm well aware of how good it is to separate these three tiers. I teach a software architecture course several times yearly, and we've made a point to be clear about that: the model in the middle should be separated, nay, insulated, from both UI and DB. Why? Because UI layers come and go. First it's a desktop client, some years later it's a responsive Web 2.0 snazzle-pot of janky modernity, and surely in a few more decades we'll all be on VR and telepathy. Throughout all this, the model can stay the same, unfazed. Ditto the database layer; you go from RDBMS to NoSQL back to PffyeahSQL to an in-memory database to post-it-notes on the side of your screen.

And yet... I want to make sure I'm building a single monolithic system, not three systems which just happen to talk to each other. That's my thesis today: even in a properly factored three-tier system, it's one system. If we don't reflect that in the architecture, we have problems.

If you have three tiers I feel bad for you son / I got 99 problems but message passing ain't one

So, just to be abundantly clear: the three tiers are good — besides being a near-necessity. The web is practically dripping with laud and praise of three-tier:

By segregating an application into tiers, developers acquire the option of modifying or adding a specific layer, instead of reworking the entire application. — Wikipedia

During an application's life cycle, the three-tier approach provides benefits such as reusability, flexibility, manageability, maintainability, and scalability. — Windows Dev Center

Three-tier architecture allows any one of the three tiers to be upgraded or replaced independently. —

It's against this unisonal chorus of acclaim that I'm feeling that some kind of balancing force is missing and never talked about. Why am I writing three things again? I'm down in the database, I'm writing code in my model and my app, and I'm away writing front-end JavaScript, and they are basically unrelated. Or rather, they are only related because I tend to remember that they should be.

I think it's especially funny that I didn't really choose the three-tier model along the way. It was forced upon me: the browser is a separate thing from the server, and the database engine is a separate thing from the app. I didn't make it so! Because I didn't make that design decision, it's also not something to be proud of afterwards. "Ah, look how nice I made it." When everyone keeps repeating how good it is with a software architecture that's not a choice but a fact of the territory, it sounds a bit like this:

Potholes in the road allow us to slow down and drive more carefully, making our roads safer. — no-one, ever

Maybe we can close our eyes and imagine a parallel universe where some enlightened despot has given us the ultimate language Trinity, which encompasses all three tiers, and unites all concerns into one single conceptual runloop, one single syntax, and one single monolithic application. Sure, we still do DB and frontend stuff, but it's all the same code base, and incidentally it's so devoid of incidental complexity, that pausing to think about it will invariably make us slightly misty-eyed with gratitude.

Would that work? I honestly don't know. It might. We don't live in that universe, but I sure wouldn't mind visiting, just to see what it's like.

One very good argument that I will make at this point, before someone else makes it for me, is that the three tiers actually handle three very different concerns. Basically:

These differences go deep, to the point where each tier has a quite specialized/different view of the world. To wit:

But it doesn't end there. The three tiers also have very typical, well-defined relations to each other.

And I haven't even started talking about synchronous/asynchronous calls, timeouts, retries, optimistic UIs, user experience, contention, conflicts, events pushed from the server, message duplication, and error messages.

I guess my point is that however Trinity looks, it has a lot on its plate. I mean, it's wonderful that a Trinity system is one single application, and things that ought to be close to each other in code can be... but there's still a lot of concerns. Sometimes the abstraction leaks. You can practically see that server-code block glare in paranoid suspicion at that client-code block. And, while Trinity of course has something much more unified than SQL syntax, sometimes you'd feel a little bump in the floor when transitioning from model code to DB code.

But I could imagine liking Trinity a lot. Instead of writing one page Perl 6 and two pages JavaScript, I'd just implement the code once in Trinity. Instead of describing the model once for the server and once for the database, I'd just describe it once. Parallel universe, color me jealous.

But back to our world with three inevitable tiers. I've been thinking about what would make me feel better about the whole ui-app-db impedance mismatch. And, I haven't tried it out yet, but I've decided I want to borrow a leaf from 007 development. In 007, basically when we've found things that might go wrong during development, we've turned those things into tests. These are not testing the behavior of the system, but the source code itself. Call them consistency tests. We really like them; they make us feel ridiculous for making silly consistency mistakes, but also grateful that the test suite caught them before we pushed. (And if we accidentally push, then TravisCI notifies us that the branch is failing.)

What needs testing? As so often, the interfaces. In this case, the surface areas between frontend/app, and between app/database.

image of browser <-|-> app+model <-|-> db

Here's what I'm imagining. The frontend and app are only allowed to talk to each other in certain pre-approved ways. The tests make sure that each source file doesn't color outside of the lines. (Need to make that code analysis robust enough.) Ditto with the app and database.
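
To make that slightly more concrete, here is a rough sketch of what such a consistency test could look like. The directory layout, the endpoint whitelist, and the idea of grepping fetch() calls out of the JavaScript are all assumptions on my part, not something the game actually has yet:

use Test;

# Pre-approved ways the frontend may talk to the app.
my @approved = '/game', '/move';

for dir('public/js', test => / '.js' $ /) -> $file {
    for $file.slurp.match(:g, / 'fetch("' (<-["]>+) '"' /) -> $m {
        ok ~$m[0] (elem) @approved,
            "{$file.basename} calls pre-approved endpoint {~$m[0]}";
    }
}

done-testing;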

I figure I could use something pre-existing like Swagger for the frontend/app interaction. For the app/db interaction, actually using the database schema itself as the canonical source of truth seems like a decent idea.

Maybe it makes sense to have both static tests (which check the source code), and runtime tests (which mock the appropriate components and check that the results match in practice). Yes, that sounds about robust enough. (I'm a little unsure how to do this with the client code. Will I need to run it headless, using PhantomJS? Will that be enough? I might need to refactor just to expose the right kind of information for the tests.)

That would give me confidence that I'm not breaking anything when I'm refactoring my game. And it might make me feel a little bit better about the Trinity language being in a parallel universe and not in this one.

I might write a follow-up post after I've tried this out.

rakudo.org: Announce: Mac OS X Installer for Rakudo Star 2016.04

Published by Steve Mynott on 2016-04-27T14:13:45

A Mac OS X installer is now available. This installer has the “.dmg” file extension and is available from http://rakudo.org/downloads/star/rakudo-star-2016-04.dmg

rakudo.org: Announce: Rakudo Star Release 2016.04

Published by Steve Mynott on 2016-04-25T23:36:56

On behalf of the Rakudo and Perl 6 development teams, I’m honored to announce the April 2016 release of “Rakudo Star”, a useful and usable production distribution of Perl 6. The tarball for the April 2016 release is available from http://rakudo.org/downloads/star/.

This is the second post-Christmas (production) release of Rakudo Star and implements Perl v6.c. It comes with support for the MoarVM backend (all module tests pass on supported platforms).

Please note that this release of Rakudo Star is not fully functional with the JVM backend from the Rakudo compiler. Please use the MoarVM backend only.

In the Perl 6 world, we make a distinction between the language (“Perl 6”) and specific implementations of the language such as “Rakudo Perl”. This Star release includes release 2016.04 of the Rakudo Perl 6 compiler, version 2016.04 of MoarVM, plus various modules, documentation, and other resources collected from the Perl 6 community.

Some of the new compiler features since the last Rakudo Star release include:

Notable changes in modules shipped with Rakudo Star:

There are some key features of Perl 6 that Rakudo Star does not yet handle appropriately, although they will appear in upcoming releases. Some of the not-quite-there features include:

There is an online resource at http://perl6.org/compilers/features that lists the known implemented and missing features of Rakudo’s backends and other Perl 6 implementations.

In many places we’ve tried to make Rakudo smart enough to inform the programmer that a given feature isn’t implemented, but there are many that we’ve missed. Bug reports about missing and broken features are welcomed at .

See http://perl6.org/ for links to much more information about Perl 6, including documentation, example code, tutorials, presentations, reference materials, design documents, and other supporting resources. Some Perl 6 tutorials are available under the “docs” directory in the release tarball.

The development team thanks all of the contributors and sponsors for making Rakudo Star possible. If you would like to contribute, see http://rakudo.org/how-to-help, ask on the  mailing list, or join us on IRC #perl6 on freenode.

6guts: Framing the problem

Published by jnthnwrthngtn on 2016-04-21T17:11:49

In this post I’ll be talking a lot about call frames, also known as invocation records. Just to be clear about what they are, consider a sub:

sub mean(@values) {
    @values.sum / @values
}

Whenever we call mean, we create a call frame. This holds the storage for the incoming @values parameter. It also holds some temporary storage we use in executing the sub, holding, for example, the sum method object we get back when looking up the method, and the result of calling @values.sum, which we then pass to infix:</>. Call frames also record outer and caller references (so we can resolve lexical and dynamic variables), the place to store the return value and go to on return, and other bits. It’s important to note that call frames are not 1:1 with subs/methods/blocks. Perhaps the best way to understand why is to consider a recursive sub:

sub fac($n) {
    $n <= 1
        ?? 1
        !! $n * fac($n - 1)
}

There’s one fac sub but we need a call frame for each invocation of (that is, call to) fac, since the $n parameter will vary in each call. (Threads are another example where you’re “in” a sub multiple times at the same time.)

All complex software systems evolve from simple systems. MoarVM is no exception. Back when MoarVM started out, I knew I wanted to have invocation be cheap, and call frames be fairly lightweight. I also didn’t want them to be GC-allocated. I figured that code sat in a loop, only using native types and only calling things involving native types, should not create garbage that needed collecting. All good goals.

Fast forward a few years, and where are we? Let's start out with the easy one to assess: frames aren't GC-allocated. So that's good, right? Well, sure, in that I got the natives property that I was after. However, since things like closures and continuations exist, not to mention that you can get a first-class reference to a call frame and traverse the outer/caller chain, the lifetime of frames is interesting. They most certainly don't always just go away at the point of return. Therefore, they need to have their memory managed in some way. I went with reference counts, figuring that since we'd only need to twiddle them fairly occasionally, it'd be fairly OK. Trouble is, thanks to MoarVM supporting concurrent threads of execution, those counts need to be incremented and decremented using atomic operations. Those are CPU native, but they're still a bit costly (more so on some CPUs than others).

There’s another, more hidden, cost, however – one I didn’t really see coming. MoarVM has a generational garbage collector, as discussed in my previous post. But frames are not garbage collectable objects. They’re managed by reference counts. So what happens when a reference counted frame is referenced by a second generation object? Well, there’s no risk of the frames going away too early; the reference count won’t be decremented until the gen2 object itself is collected. The problem is about the objects the frame references. Frames, not being garbage collectable, don’t have write barriers applied on binds into them. This means that they can come at any time to point to nursery objects. We solved this by keeping all objects referencing frames in the inter-generational root set. This is perfectly safe. Unfortunately, it also greatly increases the cost of garbage collection for programs that build up large numbers of closures in memory and keep them around. Of course, since write barriers are cheap but not free, we get a performance win on all programs by not having to apply them to writes to working registers or lexical.

So, how about invocation cost? Is invocation cheap? Well, first of all let's turn off inlining:

SET MVM_SPESH_INLINE_DISABLE=1

And measure 10 million invocations passing/receiving one argument using Perl 5, NQP, and Rakudo. Perl 5 does them in 2.85s. NQP comes out a little ahead, at 2.45s. Rakudo strolls through them in an altogether too leisurely 6.14s. (Turn inlining back on, and Rakudo manages it in 3.39s.) So, if NQP is already ahead, is MoarVM really so bad? Well, it could certainly be better. On an idealized 3GHz CPU, each invocation is costing around 735 CPU cycles. That's pricey. The other issue here is that just matching Perl 5 on invocation speed isn't really enough, because tons of things that aren't invocations in Perl 5 actually are in Perl 6 (like, every array and hash index). In a "Perl 6 is implemented in Perl 6" world, we need to squeeze a good bit more out of invocation performance.
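
(The exact benchmark code isn't included in this post; a minimal Perl 6 sketch of the idea, one argument in and one value out, ten million times, might look roughly like this.)

sub identity($x) { $x }

my $start = now;
identity(42) for ^10_000_000;
say "10,000,000 invocations took {now - $start} seconds";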

And finally, what about size? An MVMFrame comes at a cost of 296 bytes. It points to a chunk of working space together with a lexical environment (both arrays). Every single closure we take also pays that fixed 296 byte cost (and, of course, the cost of the lexical environment storage, since that’s what we actually take closures for). Again, not staggeringly huge, but it adds up very quickly.

These are all areas that need improvement. In fact, they make up two of the entries in the performance section of the proposal for the grant I'm doing this work under. So, I decided it was time to start thinking about how I'll address them.

Some measurements

I was curious how many frames end up referenced by garbage collectable objects against how many never end up in this situation. So, I quickly patched MoarVM to keep track of if a frame ever came to be referenced by a GC-able object:

diff --git a/src/core/frame.c b/src/core/frame.c
index ca1a4d2..f392aca 100644
--- a/src/core/frame.c
+++ b/src/core/frame.c
@@ -114,7 +114,10 @@ MVMFrame * MVM_frame_dec_ref(MVMThreadContext *tc, MVMFrame *frame) {
      * to zero, so we look for 1 here. */
     while (MVM_decr(&frame->ref_count) == 1) {
         MVMFrame *outer_to_decr = frame->outer;
-
+if (frame->refd_by_object)
+    tc->instance->refd_frames++;
+else
+    tc->instance->non_refd_frames++;
         /* If there's a caller pointer, decrement that. */
         if (frame->caller)
             frame->caller = MVM_frame_dec_ref(tc, frame->caller);
diff --git a/src/core/instance.h b/src/core/instance.h
index b14f11d..4f61000 100644
--- a/src/core/instance.h
+++ b/src/core/instance.h
@@ -365,6 +365,9 @@ struct MVMInstance {

     /* Cached backend config hash. */
     MVMObject *cached_backend_config;
+
+MVMuint64 refd_frames;
+MVMuint64 non_refd_frames;
 };

 /* Returns a true value if we have created user threads (and so are running a

diff --git a/src/main.c b/src/main.c
index 5458912..1df4fe3 100644
--- a/src/main.c
+++ b/src/main.c
@@ -189,7 +189,9 @@ int main(int argc, char *argv[])

     if (dump) MVM_vm_dump_file(instance, input_file);
     else MVM_vm_run_file(instance, input_file);
-
+printf("Ref'd frames: %d\nNon-ref'd frames: %d\n",
+    instance->refd_frames,
+    instance->non_refd_frames);
     if (full_cleanup) {
         MVM_vm_destroy_instance(instance);
         return EXIT_SUCCESS;

And measured a few things (the names from the latter ones are benchmark names from perl6-bench):

Measured                    Ref'd       Non-ref'd       % Ref'd
========                    =====       =========       =======
NQP startup                 0           5259            0.0%
NQP regex tests             28065       1682655         1.6%
Compile Perl 6 actions      115092      6100770         1.7%
Compile Perl 6 grammar      130716      5451120         2.3%
Compile CORE.setting        2065214     55771097        3.8%
Perl 6 startup              35          12822           0.3%
Compiling Test.pm6          39639       860474          4.4%
Compiling NativeCall.pm6    145426      1887682         7.2%
while_array_set             993701      6024920         14.1%
while_hash_set              1804        2024016         0.1%
for_assign                  1654        1020831         0.2%
for_concat_2                1743        2023589         0.1%
split_string_regex          8992750     19089026        32.0%
create_and_iterate_hash_kv  14990870    40027814        27.2%
parse_json                  10660068    42364909        20.1%
rc-forest-fire              3740096     16202368        18.8%
rc-mandelbrot               89989       5523439         1.6%
rc-man-or-boy-test          791961      7091381         10%

What can we infer from this? First of all, most NQP programs have at most just a few percent of their frames referenced by GC-able objects. With the Perl 6 benchmarks, it's all over the map, with split_string_regex being the "worst" case. NQP's optimizer is much better at doing lexical-to-local lowering, and flattening away scopes that we don't really need. In Rakudo, we're pretty weak at that. Clearly, some more work on this area could benefit Rakudo (and yes, it's also on the list of things to do under my grant).

Secondly, since – even in the worst cases – the majority of frames never get themselves tied up with any "interesting" situations that cause them to become GC-referenced, a strategy that handles them differently – and hopefully far more efficiently – would give us a win.

What GC-able things reference frames?

It was fairly easy to grep through the MoarVM source and make a list. I did so to help me think through the cases:

It’s also interesting to note that a frame only ever “escapes” such that it can be touched by another thread if it becomes referenced by a GC-able object.

What makes frames take up space?

Next, I decided to go through the MVMFrame data structure and see where the space is going, and what options might exist for saving that space. What follows is an analysis of all the fields in an MVMFrame.

/* The thread that is executing, or executed, this frame. */
MVMThreadContext *tc;

Interestingly, this one gets cleared after a certain point in the frame’s life, except if it’s captured in a continuation. Exception handling uses it to know if the frame is still on the call stack, which is interesting in various cases. GC marking uses it to know if it should mark ->work (see below).

Interestingly, nothing seems to care overly much at the moment that it points to a particular thread context; they all want it for a flag. So, it’s certainly a candidate for removal. It’s also interesting to note that in every case where a frame is not referenced by an object, it is alive solely by being in a thread’s “call stack” – that is, the call chain from following the ->caller pointer from the currently executing frame of a thread. So, the flag will only matter for frames that are GC-referenced.

/* The environment for this frame, which lives beyond its execution.
* Has space for, for instance, lexicals. */
MVMRegister *env;

Relevant for frames in whatever state.

/* The temporary work space for this frame. After a call is over, this
* can be freed up. Must be NULLed out when this happens. */
MVMRegister *work;

Relevant for frames that are still executing, or that are captured by a continuation. Cross-cuts whether they are GC-referenced.

/* The args buffer. Actually a pointer into an area inside of *work, to
* decrease number of allocations. */
MVMRegister *args;

Possibly could go away through a level of indirection, but it’s performance sensitive. Used together with…

/* Callsite that indicates how the current args buffer is being used, if
* it is. */
MVMCallsite *cur_args_callsite;

…this one.

/* The outer frame, thus forming the static chain. */
MVMFrame *outer;

Pretty much everything has an outer.

/* The caller frame, thus forming the dynamic chain. */
MVMFrame *caller;

Pretty much everything has a caller too.

/* The static frame information. Holds all we statically know about
* this kind of frame, including information needed to GC-trace it. */
MVMStaticFrame *static_info;

As you might guess, this is pretty important and useful. However, it’s also possible to obtain it – at the cost of a level of indirection – through the ->code_ref below. Would need to measure carefully, since it’d increase the cost of things like lexical lookups from outer frames (and, once we get better at optimizing, that will be “most of them”).

/* The code ref object for this frame. */
MVMObject *code_ref;

The particular closure we were invoked as. Not something we can obviously lose, and needed for the lifetime of the frame in general.

/* Parameters received by this frame. */
MVMArgProcContext params;

Argument processing context. Every frame uses it to process its arguments. It’s only useful while ->work is active, however, and so could be allocated as a part of that instead, which would reduce the cost of closures.

/* Reference count for the frame. */
AO_t ref_count;

Can go away provided we stop reference counting frames.

/* Is the frame referenced by a garbage-collectable object? */
MVMint32 refd_by_object;

Could also go away provided we stop reference counting frames and have some scheme for optimizing the common, non-referenced case.

/* Address of the next op to execute if we return to this frame. */
MVMuint8 *return_address;

/* The register we should store the return value in, if any. */
MVMRegister *return_value;

/* The type of return value that is expected. */
MVMReturnType return_type;

/* The 'entry label' is a sort of indirect return address
* for the JIT */
void * jit_entry_label;

These four are only used when the frame is currently on the call stack, or may be re-instated onto the call stack by a continuation being invoked. Could also live with ->work, thus making closures cheaper.

/* If we want to invoke a special handler upon a return to this
* frame, this function pointer is set. */
MVMSpecialReturn special_return;

/* If we want to invoke a special handler upon unwinding past a
* frame, this function pointer is set. */
MVMSpecialReturn special_unwind;

/* Data slot for the special return handler function. */
void *special_return_data;

/* Flag for if special_return_data need to be GC marked. */
MVMSpecialReturnDataMark mark_special_return_data;

Used relatively occasionally (and the more common uses are candidates for spesh, the dynamic optimizer, to optimize out anyway). A candidate for hanging off an “extra stuff” pointer in a frame. Also, only used when a frame is on the call stack, with the usual continuation caveat.

/* Linked list of any continuation tags we have. */
MVMContinuationTag *continuation_tags;

Used if this frame has been tagged as a possible continuation “base” frame. Only relevant if that actually happens (which is quite rare in the scheme of things), and can only happen when a frame is on the call stack. A candidate for similar treatment to the special return stuff.

/* Linked MVMContext object, so we can track the
* serialization context and such. */
/* note: used atomically */
MVMObject *context_object;

This is used when a context goes first-class. Thus, it implies the frame is referenced by at least one GC-able object (in fact, this points to said object). That’s fairly rare. It can happen independently of whether the frame is currently executing (so, unrelated to ->work lifetime).

/* Effective bytecode for the frame (either the original bytecode or a
* specialization of it). */
MVMuint8 *effective_bytecode;

/* Effective set of frame handlers (to go with the effective bytecode). */
MVMFrameHandler *effective_handlers;

/* Effective set of spesh slots, if any. */
MVMCollectable **effective_spesh_slots;

/* The spesh candidate information, if we're in one. */
MVMSpeshCandidate *spesh_cand;

These are all related to running optimized/specialized code. Only interesting for frames currently on the call stack or captured in a continuation (so, ->work lifetime once again).

/* Effective set of spesh logging slots, if any. */
MVMCollectable **spesh_log_slots;

/* If we're in a logging spesh run, the index to log at in this
* invocation. -1 if we're not in a logging spesh run, junk if no
* spesh_cand is set in this frame at all. */
MVMint8 spesh_log_idx;

/* On Stack Replacement iteration counter; incremented in loops, and will
* trigger if the limit is hit. */
MVMuint8 osr_counter;

These 3 play a part in dynamic optimization too, though more in the stage where we're gathering information. Again, they have ->work lifetime. The top may well go away in future optimizer changes, so not worth worrying over too much now.

/* GC run sequence number that we last saw this frame during. */
AO_t gc_seq_number;

This one is certainly a candidate for going away, post-refactoring. It serves as the equivalent of a “mark bit” when doing GC.

/* Address of the last op executed that threw an exception; used just
* for error reporting. */
MVMuint8 *throw_address;

May be something we can move inside of exception objects, and have them pay for it, not every frame. Worth looking in to.

/* Cache for dynlex lookup; if the name is non-null, the cache is valid
* and the register can be accessed directly to find the contextual. */
MVMString   *dynlex_cache_name;
MVMRegister *dynlex_cache_reg;
MVMuint16    dynlex_cache_type;

These also have ->work lifetime. Give a huge speed-up on dynlex access, so (aside from re-designing that) they can stay.

/* The allocated work/env sizes. */
MVMuint16 allocd_work;
MVMuint16 allocd_env;

These exist primarily because we allocate work and env using the fixed size allocator, and so we need the sizes to free the memory.

/* Flags that the caller chain should be kept in place after return or
* unwind; used to make sure we can get a backtrace after an exception. */
MVMuint8 keep_caller;

/* Flags that the frame has been captured in a continuation, and as
* such we should keep everything in place for multiple invocations. */
MVMuint8 in_continuation;

/* Assorted frame flags. */
MVMuint8 flags;

It appears the top two could be nicely folded into flags. Also, the flags may only be relevant for currently executing frames, or those captured in a continuation, so this lot is a candidate to move to something with ->work lifetime.

Observations

Here are some things that stand out to me, and that point the way to an alternate design.

  1. An MVMFrame presently carries a bunch of things in it that aren’t relevant unless the frame is either currently on a thread’s call stack or captured in a continuation.
  2. This is an orthogonal axis to whether the frame is referenced by something that is garbage-collectable.
  3. It’s further orthogonal to one of a number of relatively rare things that can happen and need storage in the frame.
  4. Frames that are never referenced by a garbage collectable object will only ever have a reference count of 1, because they will only be alive by virtue of being either the currently executing frame of a thread, or in its caller chain.
  5. Frames only become referenced by something garbage collectable in cases where we’d end up with some other garbage-collectable allocation anyway. For example, in the closure case, we allocate the code-ref that points to the referenced outer frame.
  6. Let’s assume we were to allocate all frames using the GC, and consider the analysis that would let us known when we are able to avoid those allocations. The analysis needed would be escape analysis.

A new approach: the big picture

Taking these into account, I arrived at a way forward that should, I hope, address most of the issues at hand.

Every thread will have a chunk of memory that we’ll refer to as its “call stack”. Every new frame created during normal program execution will be allocated by making space for it, including its ->work and ->env, on this stack. This will need:

Should this frame ever become referenced by a garbage collectable object, then we will GC-allocate a frame on the garbage-collected heap – as a totally normal garbage-collectable object. The frame state will be copied into this. The work space and environment will also be allocated from the fixed-size allocator, and the data migrated there.

Since this frame is now garbage-collectable, we have to check its ->caller to see if it’s on the thread-local stack, or already been promoted to the heap. If the former, we repeat the above process for it too. This is in order to uphold the key invariant in this design: the thread-local stack may point to things in the garbage-collectable heap, but never vice-versa.

This means the reference counting and its manipulation goes away entirely, and that frames that are heap-promoted become subject to the usual generational rules. Frames that would never be heap-referenced never end up on the heap, don’t add to GC pressure, and can be cleaned up immediately and cheaply.

There are some details to care about, of course. Since generational collection involves write barriers, then binds into frames on the garbage-collectable heap will also be subject to write barriers. Is that OK? There are two cases to consider.

  1. Binding of lexicals. Since most lexicals in Perl 6 point to a Scalar, Array, or Hash in my declarations, or point directly to a read-only object if parameters, this is relatively rare (of course, write barriers apply to the Scalar itself). In NQP, loads of lexicals are lowered to locals already, and we’ll do some more of that in Rakudo too, making it rarer still. Long story short, we can afford write barriers on lexical binds.
  2. Binding of stuff in ->work, which basically means every write into the register set of the interpreter. This, we cannot afford to barrier. However, there are only two cases where a frame is promoted to the heap and has ->work. One case is when it's still executing, and so in the call chain of a thread. In this case, we can take care to always walk the objects in ->work by simply following the call chain. The second case is when a continuation is taken. But here, there are no binds to registers until the continuation is invoked again – at which point things are back in a thread's call chain.

Refactoring towards it

The thing that makes this a somewhat scary piece of work is that, in making call frames potentially collectable objects, we break an assumption that has been there since week 1 of MoarVM’s development: that call frames never move. To maximize the chances of discovering problems with this refactor, I decided that step 1 would be to always allocate every single call frame on the heap. Only when that is working would I move on to optimizing away most of those heap allocations by adding the thread-local call stack.

MoarVM currently has 3 kinds of collectable:

So, I added a fourth: call frames. As a result, MVMFrame gains an MVMCollectable at the start of the data structure – which will be present whether it’s stack or heap allocated. This will start out zeroed when a frame is born on the call stack. This does two nice things: it gives us a way to know if a frame is GC-able or not, and also means the write barrier – without modification – will do the right thing on both stack and heap frames.

There were two more easy things to do. First was to add a function to allocate a heap frame. Second was to factor out frame destruction from reference decrement, since the latter was going away.

Beyond that, there was nothing for it besides diving in, breaking the world, and then trying to put it back together again. I got a good start towards it – but the conclusion of this first step will have to wait for next week’s installment! See you then.


hoelzro: Binding to C++ With NativeCall

Published on 2016-03-29T03:03:21

A few months back, I was working on Xapian bindings for Perl 6. I got enough of the binding done for me to use it for what I wanted (using Xapian's stemmers and stoppers), but not enough for me to feel comfortable publishing it on modules.perl6.org. However, what I am comfortable publishing is what I learned about binding a C++ library to Perl 6 using NativeCall!


brrt to the future: FOSDEM and the future

Published by Bart Wiegmans on 2016-03-13T11:20:00

Hi all, I realise I haven't written one of these posts for a long time. Since November, in fact. So you could be forgiven for believing I had stopped working on the MoarVM JIT. Fortunately, that is not entirely true. I have, in fact, been very busy with a project that has nothing to do with perl6, namely SciGRID, for which I've developed GridKit. GridKit is a toolkit for extracting a power network model from OpenStreetMap, which happens to contain a large number of individual power lines and stations. Such a network can then be used to compute the flow of electric power throughout Europe, which can then be used to optimize the integration of renewable energy sources, among other things. So that was and is an exciting project and I have a lot of things to write on that yet. It is not, however, the topic of my post today.

Let's talk about the expression JIT, where it stands, how it got there, and where it needs to go. Last time I wrote, I had just finished reducing the expression JIT surface area to the point where it could just compile correct code. And that helped in making a major change, which I called tile linearisation. Tile linearisation is just one example of a major idea that I missed last summer, so it may be worthwhile to expand a bit on it.

As I've explained at some length before, the expression JIT initially creates trees of low-level operations out of high-level operations, which are then matched (tiled) to machine-level operations. The low-level operations can each be expressed by a machine-level operation, but some machine-level instructions match multiple low-level operations. The efficient and optimal matching of low-level to machine-level operations is the tiling step of the compiler, and it is where most of my efforts have been.

Initially, I had 'tagged' these tiles to the tree that had been created, relying on tree traversal to get the tiles to emit assembly code. This turned out to be a poor idea because it introduces implicit order based on the tree traversal order. This is first of all finicky - it forces the order of numbering tiles to be the same in the register allocator and the tile selection algorithm and again for the code emitter. In practice that means that the last two of these were implemented in a single online step. But more importantly and more troubling, it makes it more complex to determine exactly the extent of live ranges and of basic blocks.

The notion of basic blocks is also one that I missed. Expression trees are typically compiled for single basic blocks at a time. The definition of a basic block is a sequence of instructions that is executed without interruption. This allows for some nice simplifications, because it means that a value placed in a register at one instruction will still be there in the next. (In contrast, if it were possible to 'jump in between' the instructions, this is not so easy to ensure). However, these basic blocks are defined at the level of MoarVM instructions. Like most high-level language interpreters, MoarVM instructions are polymorphic and can check and dispatch based on operands. In other words, a single MoarVM instruction can form multiple basic blocks. For correct register allocation, it is vital that the register allocator knows about these basic blocks. But this is obscured, to say the least, by the expression 'tree' structure, which really forms a Directed Acyclic Graph, owing to the use of values by multiple consumers.

The point of tile linearisation is to provide an authoritative, explicit order for tiles - and the code sequences that they represent - so that they can be clearly and obviously placed in basic blocks. This then allows the register allocator to be extended to deal with cross-basic block compilation. (In the distant future, we might even implement some form of instruction scheduling). As a side effect, it also means that the register allocation step should be moved out of the code emitter. I've asked around and got some nice papers about that, and it seems like the implementation of one of these algorithms - I'm still biased towards linear scan - is within the range of reasonable, as soon as I have the details figured out. Part of the plan is to extract value descriptors from the tree (much like the tile state) and treat them as immutable, introducing copies as necessary (for instance for live range splitting). The current register allocator can survive as the register selector, because it has some interesting properties in that aspect.

Aside from that I've implemented a few other improvements, like:
From that last bit, I've learned that the way the JIT is currently dealing with annotations is subtly broken, because the following thing can and does happen:
I'm not yet sure how to deal with this. Jonathan implemented a fix last year that introduced a dynamic control label at the start of each basic block. Ultimately, that reinforces this 'stacking' behavior, although it already happened. Ideally, we would not need to store the current location for each basic block just for the few operations that need it. It might instead be possible to refer to the current region in some other way, which is what happens to some extent in exception handling already.

Anyway, that's all for today, and I hope next time I will be able to bring you some good news. See you!

Pawel bbkr Pabian: Running mixed Perl 5 and Perl 6 tests.

Published by Pawel bbkr Pabian on 2016-03-09T18:33:32

Those two tricks are especially useful when refactoring a big codebase from Perl 5 to Perl 6. Such a process may take weeks or even months, and you will encounter two cases:



1. Some features are still in Perl 5, some are fully refactored to Perl 6. So you want to run separate Perl 5 and Perl 6 test files in a single prove command. Prove is not very smart. It does not peek into test files to use the correct interpreter (Perl 5 is assumed) and it does not recognize the ".t6" extension some people use. But there is a solution. First create your test files.

t/perl5.t


#!/usr/bin/env perl
use v5;
use Test::Simple 'tests' => 1;
ok 1, 'Hello Perl 5';


t/perl6.t


#!/usr/bin/env perl6
use v6;
use Test;
plan 1;
ok 1, 'Hello Perl 6';


Using a shebang is crucial here because it will cause the system to use the proper interpreter. Now make the tests executable, and explicitly pass an empty interpreter to prove, so it won't enforce anything.

$ prove -e ''
t/perl5.t .. ok
t/perl6.t .. ok
All tests successful.
Files=2, Tests=2,  1 wallclock secs ( 0.02 usr  0.00 sys +  0.19 cusr  0.03 csys =  0.24 CPU)
Result: PASS


2. Some feature components are entangled. For example, you have a communication protocol with the client already refactored to Perl 6 but the server still in Perl 5. To test them you need to interpolate a Perl 6 test within a Perl 5 test and combine the test output. One of the Perl 5 core modules - TAP::Parser - has just what we need. First create your test files (we will use the Perl 6 version from the example above).

t/interpolated.t


#!/usr/bin/env perl
use v5;
use TAP::Parser;
use Test::Simple 'tests' => 3;
ok 1, 'Hello Perl 5';
my $parser = TAP::Parser->new( { 'exec' => [ 'perl6', 't/perl6.t' ] } );
while ( my $result = $parser->next ) {
    next unless $result->is_test;
    ok $result->is_ok, $result->description;
}
ok 1, 'Bye Perl 5';

TAP::Parser allows you to run any test code using any interpreter from your script, and to access those test results using a nice, OO interface. The line "ok $result->is_ok" is what makes the foreign test result your own.

$ perl t/interpolated.t
1..3
ok 1 - Hello Perl 5
ok 2 - - Hello Perl 6
ok 3 - Bye Perl 5

This is the most basic way to interpolate tests. As you may notice, the description output from Perl 6 is a bit messy; also comments, subtests and bailouts are not handled yet. However, with the excellent TAP::Parser documentation you should be able to implement more complex scenarios in no time.

Stay calm and keep testing!

hoelzro: Finding the most common n-grams in Russian using Perl 6 and HabraHabr

Published on 2016-03-05T07:10:41

I've been teaching myself Russian for some time; in fact, I would probably be a lot better at it if I spent time actually learning Russian instead of thinking of ways of hacking my language learning process…which is exactly what we'll be doing here.


hoelzro: The State of Multi-Line Input in Rakudo

Published on 2016-02-15T13:07:30

Last week, I created an experimental branch for multi-line input in the Rakudo REPL. I merged this branch on Friday, but I wanted to talk about where we stand, and where I see us going in the future.


rakudo.org: Announce: Mac OS X Installer for release 2016.01

Published by Tobias Leich on 2016-02-12T12:27:09

Thanks to Steve Mynott a Mac OS X installer is now available.
This installer has the “.dmg” file extension and is available from http://rakudo.org/downloads/star/.

Death by Perl6: Perl6 Distribution thoughts and proposals (s22)

Published by Nick Logan on 2016-01-31T17:47:00

Currently Distribution[1] is just a glorified key/value store. After years of digesting s22[2] I'm comfortable pushing for its adoption. A common complaint is that it's over-engineered or too academic. However I would guess many such complaints boil down to not putting in all the considerations of the original drafters. I say this because over the years I've gone from being confused and disliking it to promoting its implementation. Nearly every problem I've run into I end up finding a solution for in s22. Sometimes these solutions are vague, but their intentions can still be interpreted. So I will lay out my understanding of the Distribution aspect of s22.

DESIGN:

https://design.perl6.org/S22.html#Distribution
The class for installing distributions using CompUnitRepo::Local::Installation. Basically provides the API to access the META6.json file representation of a distribution. It should at least contain the following methods:

method meta
my $meta = $dist.meta;
# Return a Hash with the representation of the meta-data,
# constructed the same way as in the META6.json specification.
# Please note that an actual META6.json file does not need to
# exist, just a representation in that format.
method content
my $content = $dist.content( <provides JSON::Fast lib/JSON/Fast.pm6> );
my $content = $dist.content( <resource images fido.png> );
# Return the octet-stream as specified by the given keys,
# navigating through the META6.json hash.

INTERFACE:

https://design.perl6.org/S22.html#Distribution
(note this is an interface, not a class as currently implemented)

role Distribution {
    method meta {...}
    method content(*@_ --> IO) {...}
}

INTERFACE EXPLANATION:

method meta is simply for giving hash-like access to the META6. In most cases it would probably just be an alias to a json-from-file generated hash, but it doesn't have to be (nor does the META6 have to exist as a file). I don't think there is much confusion here.
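
As a rough sketch (mine, not from s22 or any existing implementation), a file-backed distribution's .meta really can be this boring:

use JSON::Fast;

class Distribution::Local {
    has IO::Path $.prefix is required;

    # Hash-like access to the META6 data; here it happens to come from a
    # META6.json file, but nothing in the interface requires that.
    method meta {
        from-json $!prefix.child('META6.json').slurp;
    }
}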

method content is the interesting bit.

currently: Distribution assumes everything is a file. CU::R::I concats relative paths into strings so there is no way for anything other than a file to transfer data.

proposed: Distribution doesn't care what representation the Distribution is in, as it just uses .content, so content may feed it data from a socket, a file, or even a hash

example:

$dist.content(<provides Module::XXX lib/Module/XXX.pm6>)

(*note: key names found via method .meta)

This would return the raw data of whatever that set of keys point to (in this case lib/Module/XXX.pm6) so that CU::R::I can save that data anywhere it wants on the filesystem. So when CU::R::I gets the Distribution object the Distribution does not even have to exist yet; CU::R::I is just going to get the raw data and save it to (for instance) rakudo/install/perl6/site/XXX.pm6
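
To underline that the representation really doesn't matter, here is a toy sketch (again my own, purely illustrative) where the "distribution" never touches the filesystem at all and .content just serves octets out of a hash:

class Distribution::InMemory {
    has %.meta;
    has %.blobs;   # keyed by the same relative names the META6 uses

    method content(*@keys) {
        # e.g. .content(<provides Module::XXX lib/Module/XXX.pm6>)
        # The last key names the payload; hand back its raw octets.
        %!blobs{ @keys[*-1] };
    }
}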

%?RESOURCE:

https://design.perl6.org/S22.html#%25%3FRESOURCE

"resource" : {
    "images" : [
        "fido.png", "zowie.png"
    ],
    "libraries" : {
        "inline_helper" : "build-time",
    }
}

The current implementation of resource is rather lacking. Right now it just takes an array of paths: "resources" : ["libraries/libfoo", "xxx.img"]
The s22 design would allow for:

  1. each directory is its own hash key (so not "dir/dir2/xxx.txt" but rather "dir" : ["dir2" : "xxx.txt"])
  2. each file is not directly part of a string that contains its directory (no directory separators involved)
  3. Arbitrary data can be attached on leaf nodes; if a leaf node is a hash then it's meant to be understood by a package manager and can be ignored by rakudo/compunit (as these might mean different things to specific package managers).

Let us look at the libraries example above; the arbitrary data here is build-time. This may tell a package manager something about libraries, so for this example we will say it tells the build phase we will generate a file called inline_helper that does not exist yet (so take this into account when creating a manifest/uninstall). It may also be that the package manager simply added it itself so that later it can look up that info [think dependency chain].

But it's more useful than that. A package manager could then allow a command like <pkger> build-time . to run the build hook manually (similar to how npm scripts work).
Or allow explicitly requesting that $*VM.platform-library-name be applied (or explicitly supplying a type of "-or" => ["lib.so", "lib.dll"] to say one of these will exist). Remember, CU::R::I doesn't need to understand these outer hash nodes so any code meant to interpret these would be in whatever Distribution object (or package manager).

t/ hooks/ bin/:

resources/ does not belong in this group. Why? Because t/ hooks/ bin/ are special folders whereas resources can contain special files (anything with a hash leaf is considered a special file). I will say I can not explain why these special folders are not required entries in the META. I can only guess it's because:

  1. These folders probably won't end up with any build time generated files meant to be installed (like resources)
  2. The files do not get tied to a specific name (like provides style Module::Foo => 'some/path')

So the prototype distributions I code generally include a .files method to allow access to the files in these special folders that are not included in the META6. This is ok as none of these files have special properties (like resources) nor have to associate with a name (like provides).

CompUnit::Repository::Installation.install

Currently CompUnit::Repository::Installation has a signature[3] of:

method install(Distribution $dist, %sources, %scripts?, %resources?, :$force)

We'll ignore the flaw of 2 optional positional hashes, but I do have to point out the pointlessness of passing in the 3 hashes at all. The $dist should already know all of this, and if not it should be able to generate it on the fly whenever CompUnit::Repository::Installation makes a request. So when it's time to install the sources it would call something like

for $dist.meta<provides>.pairs {
    $save-to.spurt: $dist.content(["provides", .key, .value]);
}

instead of relying on a hash that gets passed in with the exact same data. In other words we want: method install(Distribution $dist, :$force)

I ask: is it safe to dismiss s22 Distribution for something else, when the something else is probably not considering many of the things that s22 does (but the proposer may not have thought of yet)? I'm willing to assume quite a few hours were put into s22, and I'd hope anyone wanting to do something different puts in an equal amount of time to understand the problems it's meant to solve before we say "let's just rewrite it because I understand this model I've built in my head over $some-fraction of the hours that were put into s22"

(following the programming story of programmer Foo insisting to programmer Bar that a piece of code be removed, but programmer Bar denying this course of action because programmer Foo is unable to answer what its original purpose was. Foo wants to change code he does not understand into code he does understand, but Bar knows you should not change code you do not fully understand yet).

Again, I've been guilty of this as well. But I don't want to see the hack-y workarounds I've used for zef's[4] Distribution objects over the last 2 years (see the bottom of this post) somehow become the default implementation because they are easier to understand initially... they were developed as hacks until s22 was implemented after all.

A simpler interface could be used internally since the CompUnit::Repository already knows the actual location of the distribution it will load. It will also know what type of Distribution to use because the CompUnit::Repository installed it itself. This means the CompUnit::Repository can save it as any type of Distribution it wants; it may get Distribution::Tar but that does not mean CompUnit::Repository can't save it as a Distribution::Local (which means you no longer need Distribution::Tar to reinstall it when you upgrade rakudo). However, making a simpler interface the default trades away some of the flexibility s22 provides for making it only a few lines easier to install a Distribution.

To wrap this up: I think we have been avoiding some/many of these things because at the time we did not understand why they were designed that way to begin with. I would ask that instead of requiring one to explain why we should use the s22 version of Distribution, anyone who would like to see otherwise explain what is actually wrong with it and what they think the original intention was. I do not think anyone should push for alternatives unless they can explain what they feel the original intention was meant to solve and what's wrong with the design decision. If these cannot be answered then it could be inferred that you may not have considered or be aware of certain problems that will be encountered. I know I've certainly been guilty of this in the past, but years of working on zef have only shown me that most problems/features were in fact already solved in s22 in some way (and that I simply had not understood everything fully before).

Examples / Code

The first 2 examples each show multiple Distribution objects fulfilling the s22 interface (one using roles, one using classes):
Demo/prototype of Distribution::Local and Distribution::Tar

Demo/prototype of Distribution::Local and Distribution::GitHub (with a modified CompUnit::Repository::Installation to use .content and .meta methods)

A type of Distribution::Simple that can be created using the previously mentioned s22 implementation (but making this the actual interface leaves it too simple/limited to be the actual API; let the core have an API that allows slightly lower level access to the install process and leave the ::Simple style 'method per META6 root key' wrappers for distributions to implement themselves (as such implementations are simple enough to not require a core Distribution::Simple) or make a simple API available through a different CompUnit::Repository (CompUnit::Repository::Installation::Simple))
zef's Hack-y Distribution::Simple style implementation

Note: ioify is just a way to "absolutify" the relative paths contained in a META. In reality the data doesn't have to come from a file, but it may be used to absolutify a relative url or anything else that could transform a urn to an actual location.

  1. Distribution

  2. s22

  3. CompUnit::Repository::Installation.install

  4. zef

hoelzro: Anonymous State Variables And How They Work

Published on 2016-01-27T14:57:00

When debugging code, I will often add a counter variable to a loop so I can keep track of what's going on, or so that I can process a fraction of my data set while I'm iterating on a piece of code:
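
(What follows is my own stand-in illustration rather than the post's example: the bare $ is an anonymous state variable, which makes a perfectly good disposable counter.)

for <alpha beta gamma> -> $item {
    say ++$ ~ ': ' ~ $item;   # prints "1: alpha", "2: beta", "3: gamma"
}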


brrt to the future: Rumors of JITs' demise are greatly exaggerated.

Published by Bart Wiegmans on 2015-10-12T19:38:00

Earlier this week my attention was brought to an article claiming that the dusk was setting for JIT compilation. Naturally, I disagree. I usually try to steer clear of internet arguments, but this time I think I may have something to contribute. Nota bene, this is not a perl- or perl6 related argument, so if that is strictly your interest this is probably not an interesting post for you.

The main premise of the argument is that people are shifting away from JIT compilation because the technique has failed to live up to its promises. Those promises include, in various forms, high level languages running 'as fast as C', or having more optimization possibilities than ahead-of-time (AOT) compilers do. Now my perspective may be a bit unusual in that I don't actually expect momentous gains from JIT compilation per se. As I've described in the talk I gave at this years' YAPC::EU, by itself JIT compilation removes only the decoding and dispatch steps of interpretation, and - depending on the VM architecture - these may be a larger or smaller proportion of your running time. However, my thesis is that interpretation is not why high-level languages are slow, or rather, that interpretation is only one of the many sources of indirection that make high-level languages slow.

First of all, what of the evidence that JITs are actually in demise? The author provides three recent trends as evidence, none of which I hold to be decisive. First, both Windows 10 and the newest versions of Android translate .NET and Dalvik applications respectively to native code at installation time, which is properly considered ahead of time compilation. Second, high-performance javascript applications are currently often created using tools like emscripten, which compiles to asm.js, and this is in many ways more similar to object code than it is to a high-level language, implying that the difficult bit of compilation is already behind us. (I agree mostly with that assessment, but not with its conclusion). Finally, on iOS devices JIT compilation is unsupported (except for the JIT compiler in the webkit browser engine), allegedly because it is insecure.

As to the first piece, the author suggests that the main reason is JIT compilers being unpredictable in their output, at least relative to optimizing ahead-of-time compilers. I think that is nonsense; JIT compilation patterns tend to be quite reliably the same on different runs of the same program, a property I rely on heavily during e.g. debugging. The output code is also pretty much invariant, with an exception being the actual values of embedded pointers. So in my experience, what you see (as a developer) is also what you get (as a user), provided you're using the same VM. I humbly suggest that the author believes JITs to be unreliable because his work is being compiled by many different VMs using many different strategies. But I see that no differently than any other form of platform diversity. Maybe the author also refers to the fact that often optimization effectiveness and the resultant performance of JIT compiled applications is sensitive to minor and innocuous changes in the application source code. But this is true of any high-level language that relies primarily on optimizing compilers, for C as much as for python or javascript. The main difference between C and python is that any line of C implies far fewer levels of indirection and abstraction than a similar line of python.

I think I have a much simpler explanation as to why both Google and Microsoft decided to implement ahead-of-time compilation for their client platforms. The word 'client' is key here; because I think we're mostly talking about laptops, smartphones and tablets. As it turns out, hardware designers and consumers alike have decided to spend the last few years worth of chip manufacturing improvements on smaller, prettier form factors (and hopefully longer battery life) rather than computing power. Furthermore, what Qualcomm, Samsung etc. have given us, Photoshop has taken away. The result is that current generation portable devices are more portable and more powerful (and cheaper) than ever but are still memory-constrained.

JIT compilation inevitably comes with a significant memory cost from the compiled code itself (which is generally considerably larger than the interpreted code was), even when neglecting the memory usage of the compiler. Using various clever strategies one can improve on that a bit, and well-considered VM design is very important as always. But altogether it probably doesn't make a lot of sense to spend precious memory for JIT-compiled routines in a mobile setting. This is even more true when the JIT compiler in question, like Dalvik's, isn't really very good and the AOT compiler has a good chance of matching its output.

Now to the case of asm.js. As I said, I mostly agree that a significant amount of work has already been done by an ahead-of-time compiler before the browser ever sees the code. It would be a mistake, though, to think that the role of the JIT (or rather the whole runtime system) can therefore be neglected. First of all, JIT-compiled code, even asm.js code, is greatly constrained in comparison to native code, which brings some obvious security benefits. Second of all, it is ultimately the JIT compiler that allows this code to run cross-platform at high performance. I think it is mistaken to suggest that this role is trivial, and so I see asm.js as a success of, rather than evidence against, JIT compilation as a technique.

Next, the iOS restriction on JIT compilation. I think the idea that this would be for security reasons is only plausible if you accept the idea that application security is significantly threatened by dynamic generation of machine code. While I'm sure that the presence of a JIT compiler makes static analysis very difficult - not to say impossible - I don't believe that this is the primary attack vector of our times. The assertion that memory must be both writable and executable for a JIT compiler to work is only superficially true, since there is no requirement that the memory be both at the same time, and so this doesn't imply much of a threat (so-called W^X memory is becoming a standard feature of operating systems). Vtable pointers stored in the heap, and return addresses on a downward-growing stack - now those are attack vectors of note.

But more importantly, that is not how mobile users are being attacked. It is much more interesting, not to mention significantly easier, for attackers to acquire whole contact books, private location information, credentials and private conversations via phishing and other techniques than it is to corrupt a JIT compiler and possibly, hopefully, and generally unreliably gain remote execution. Most of these attack vectors are wide open indeed and should be prevented by actual security techniques like access control, rather than by outlawing entire branches of computing technology. Indeed, an observer not sympathetic to Apple could probably relate this no-JIT rule to the Californian company's general attitude towards competing platforms, but I will not go further down that path here.

Finally, I think the claim that JIT compilation can't live up to its promise can readily be disproven by a simple google search. The reason is simple: the JIT compiler, which runs at runtime, has much more information at its disposal than even the best of ahead-of-time compilers. So-called profile-guided optimization helps to offset the difference, but it is not a common technique, and a profile still captures only a small subset of the information available to a JIT compiler. The fact that many systems do not reach this level of performance (and MoarVM's JIT compiler certainly doesn't) is of course relevant, but not, in my opinion, decisive.

In conclusion, I would agree with the author that there are many cases in which JIT compilation is not suitable and in which AOT compilation is. However, I think the much stronger claim that the dusk is setting on JIT compilation is unwarranted, and that JIT compilers will remain a very important component of computing systems.

brrt to the future: Most Significant Bits

Published by Bart Wiegmans on 2015-09-20T14:12:00

This week I think I fixed irregular behavior in the x64 instruction encoding register selection of DynASM. I think it'll be a fun story to share, so I thought it'd be time to blog.

The astonishingly irregular thing about x64 instruction encoding is that it is mostly very regular. Ignoring for the moment instruction prefixes and constants, an x86 instruction consists of two bytes: one for the instruction proper, and one for its two operands. Take for example the addition of two registers:

+------+----------+
| add  | eax, ecx |
+------+----------+
| 0x01 | 0xc8     |
+------+----------+

We take the instruction byte as given. It is the second byte that concerns me, because it determines which operands to use and how. Like a good CISC architecture, the x86 supports a number of addressing modes, meaning that a register can be used as a value but also as a (part of a) pointer. One of the reasons C does pointer arithmetic so freely is that this reflects the nature of the CPUs which were current when C was designed. The (relevant) x86 addressing modes are the following. (There are more, but you shouldn't use them):

Direct (byte flag 0xc0): both operands are used as direct values.

Indirect (0x00): one of the operands is used as a memory reference; whether that is the source or the destination operand depends on the instruction.

Indirect with offset (0x80 or 0x40): one of the operands is used as a memory reference, offset by a constant which is encoded directly after the instruction.

Indexed (0x04): one operand consists of two registers, a base and an index, the latter multiplied by a scale, to provide a memory reference. Optionally also has an offset, in which case the code byte is 0x44 or 0x84, depending on whether the offset fits in a single byte or not.

Instruction-relative (0x05): one of the operands is a reference to the current location in the code, offset by a constant (and the other refers to a register as usual).

Readers who are more careful than can reasonably be expected will have noticed that in the first three addressing modes, the lowest nibble is zero, whereas it is nonzero for the last two addressing modes. This is in fact the source of irregularities in instruction encoding. To appreciate this it helps to unpack the operand byte in octal rather than hexadecimal. Octal is much closer to how x86 thinks about the world. As demonstrated in the table below, the lower two groups of 3 bits each encode the actual registers that should be used.

+------+------+-----+-----+----------------------+
| Byte | Mode | Op2 | Op1 | Meaning              |
+------+------+-----+-----+----------------------+
| 0xc0 |    3 |   0 |   0 | Direct               |
+------+------+-----+-----+----------------------+
| 0x80 |    2 |   0 |   0 | Indirect with offset |
+------+------+-----+-----+----------------------+
| 0x04 |    0 |   0 |   4 | Indexed              |
+------+------+-----+-----+----------------------+

The upshot of this is that when the operand mode isn't direct and the first operand register is either 4 or 5, the meaning of the operand byte is completely different. For register 4, x86 suddenly expects another operand byte (a so-called SIB byte) to specify which register shall be base and which shall be index.

Normally this isn't much of a problem, since the registers referred to by numbers 4 and 5 are rsp and rbp respectively, meaning the stack top and stack bottom registers. Fun fact: the x86 stack grows downward, so rbp > rsp in basically all cases. Also fun fact: because of this, writing from an rsp-relative reference upwards can overwrite the return pointer held somewhere below rbp, which is the basis of most buffer overflow attacks. You thought NULL was a billion dollar mistake? Consider how the engineers who decided the stack should grow downward must feel.

Anyway, considering that rbp and rsp take up such a pivotal role in your program, it's actually unlikely you'll encode them by mistake. So as long as you don't do that, it's safe to ignore this complexity and just 'add in' the correct bits into the operand byte. Thus, for instance, to refer to the 7th and first register respectively in direct mode, we generate:

0300 + (07 << 3) + (01) == 0371 == 0xf9
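
To make that concrete, here is the same computation as a small Perl 6 sketch (the helper name is made up for illustration; DynASM itself does this in C and Lua):

# mode is the top 2 bits, the two register fields are 3 bits each
sub modrm(Int $mode, Int $reg, Int $rm) { ($mode +< 6) + ($reg +< 3) + $rm }
say modrm(3, 7, 1).base(16);    # F9: direct mode, registers 7 and 1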

However, in the land of x64, things are not so happy. I have blogged earlier about how the x64 architecture gives you 8 extra registers on top of the 8 legacy x86 registers, and how the only difference between those registers is a prefix byte, called REX, that x64 specifies. REX byte encoding is not so difficult; the trick is to find it reliably. But because only the lower three bits of the 4-bit register number are placed in the operand byte, registers r12 and r13 look exactly like rsp and rbp to the CPU. Well, that's where the fun really starts, because it's all too easy to encode these 'accidentally'. They are, after all, perfectly regular registers.

For those not keeping score, we have two special cases to handle. First, whenever the first operand is either rsp or r12 and we're not using direct mode, an extra SIB byte needs to be encoded to specify that we are really talking about accessing rsp/r12 directly. This is done by encoding rsp as both the base and index, which the x86 understands because using rsp as an index is usually illegal. (The magic byte is thus 0044 or 0x24). Second, whenever the first operand is rbp or r13 and we're using indirect access without an offset, we need to encode indirect access with an offset instead, just with the offset at zero. This of course requires another byte. Somewhat byzantine, but manageable.
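
In code, the two special cases come down to something like this Perl 6 sketch (the helper is illustrative only, not DynASM's actual implementation; $reg and $rm are assumed to already be the low 3 bits of the register numbers):

sub encode-indirect(Int $reg, Int $rm) {
    my @bytes;
    if $rm == 4 {                                          # rsp or r12
        @bytes.append: (0 +< 6) + ($reg +< 3) + 4, 0x24;   # ModRM plus SIB byte (base = rsp, no index)
    }
    elsif $rm == 5 {                                       # rbp or r13
        @bytes.append: (1 +< 6) + ($reg +< 3) + 5, 0x00;   # indirect-with-offset form, offset zero
    }
    else {
        @bytes.append: (0 +< 6) + ($reg +< 3) + $rm;       # plain indirect access
    }
    @bytes;
}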

We are, unfortunately, not completely OK yet. It is my central hypothesis of this post that DynASM was not designed to handle register selection at runtime. Evidence for this hypothesis is that DynASM does 'weird' things like mixing data and instructions, and linking prior to encoding. Usually one encodes first and links afterwards, especially when during encoding you may need to make decisions that influence the final positions of certain segments. DynASM does it the other way around, which means that during linking we should be able to calculate exactly how much space we need for each instruction. Which is a pain, because DynASM mixes the data stream (which we need for inspection) with the instruction stream (which tells the runtime what to do with its input). It's possible to hack around this - basically by copying data into the instructions - but it's not elegant. Starting with this commit, I'm reasonably confident that this stuff works; a test case is provided here.

That almost concludes this week's madness. The only thing left is to question the method. Why should x86 use this convoluted scheme? I could do detailed historical research, but I prefer to speculate that it is caused by economy of memory. After all, in the regular case you need only 2 bytes, which is - conveniently - equal to 16 bits, the original register size of the 8086. And since that chip was designed in the 1970s, it makes sense that instructions should be as space-efficient as possible. In contrast, ARM uses 32 bit instructions with 3 operands. So space economy seems a plausible cause to me.

See you next time!

brrt to the future: Moar JIT news

Published by Bart Wiegmans on 2015-11-29T15:22:00

Hello there, I thought it high time to write to you again and update you on the world of JITs. Since last I wrote, PyPy 4.0 was released. Also in python-land, Pyston 0.4 was released, and finally Guile 2.1.1 was released and Andy Wingo wrote a nice piece about that, as is his custom. I present these links not only to give these projects the attention they deserve, but also because I think they are relevant to our own project.

In chronological order, the release of PyPy 4.0 marked the first 'production' release of that project's autovectorizer, which was developed over the course of this year's Google Summer of Code. I'd like to take this opportunity to publicly congratulate the PyPy team on this achievement. So-called 'vector' or SIMD operations perform a computation on multiple values in a single step and are an essential component of high-performance numerical computations. Autovectorizing refers to the compiler's capability to automatically use such operations without explicit work by the programmer. This is not of great importance for the average web application, but it is very significant for scientific and deep learning applications.

More recently, the Pyston project released version 0.4. Pyston is another attempt at an efficient implementation of Python, funded by Dropbox. Pyston is, or I should rather say started out, based on LLVM. Most of my readers know of LLVM; for those who don't, it is a project which has somewhat revolutionised compiler development in the last few years. Its strengths are its high-quality cross-platform code generation and its permissive license. LLVM is also the basis for such languages as rust and julia. Notable weaknesses are size, speed, and complexity. To make a long story short, many people have high expectations of LLVM for code generation, and not without reason.

There are a few things that caught my attention in the release post linked above. The first thing is that the Pyston project introduced a 'baseline' JIT compiler that skips the LLVM compilation step, so that JIT compiled code is available faster. They claim that this provides hardly a slowdown compared to the LLVM backend. The second thing is that they have stopped working on implementing LLVM-based optimisation. The third thing is that to support more esoteric python features, Pyston now resorts to calling the Python C API directly, becoming sort of a hybrid interpreter. I would not be entirely surprised if the end point for Pyston would be life as a CPython extension module, although Conway's law will probably prohibit that.

Pyston is not the first, nor the only, current JIT implementation based on LLVM. It might be important to say here that there are many projects which do obtain awesome results from using LLVM; julia being a prime example. (Julia, being very dynamic indeed, is also an excellent counterexample to the recent self-congratulatory assertion by static typing enthusiasts that 'static types have won'.) But Julia was designed to use the LLVM JIT, which probably means that tradeoffs have been made to assure performance; it is also new, so it doesn't have to run as much weird legacy code; and the team is probably highly competent. I don't know why some mileages vary so much (JavascriptCore also uses LLVM successfully, albeit as its fourth and last tier). But it seems clear that far from being a golden gun, using LLVM for dynamic language implementations is a subtle and complex affair.

Anybody willing to try building an LLVM-backed JIT compiler for MoarVM, NQP, or perl6 in general, will of course receive my full (moral) support, for whatever that may be worth.

The posts by Andy Wingo, about the future of the Guile Scheme interpreter, are also well worth reading. The second post is especially relevant as it discusses the future of the guile interpreter and ways to construct a high-performance implementation of a dynamic language; it generalizes well to other dynamic languages. To summarise, there are roughly two ways of implementing a high-performance high-level programming language, dynamic or not, and the approach of tracing JIT compilers is the correct one, but incredibly complex and expensive and - up until now - mostly suitable for big corporations with megabucks to spend. Of course, we aim to challenge this; but for perl6 in the immediate future correctness far outranks performance in priority (as it should).

That all said, I also have some news on the front of the MoarVM JIT. I've recently fixed a very longstanding and complex bug that presented itself during the compilation of returns with named arguments by rakudo. This ultimately fell out as a missing inlined frame in the JIT compiler, which was caused by MoarVM trying to look up a variable using the JIT compiler's 'current location', while the actual frame was running under the interpreter, and - this is the largest mystery - it was not deoptimized. I still do not know why that actually happened, but a very simple check fixed the bug.

I also achieved the goal of running the NQP and rakudo test suites under the new JIT compiler, albeit in a limited way; to achieve this I had to remove the templates of many complex operations, i.e. ops that call a C function or that have internal branches. The reason is that computing the flow of values past calls and branches is more complex, and trying to do it inline with a bunch of other things - as the new JIT has tried to do so far - is prone to bugs. This is true especially during tree traversal, since it may not be obvious that computations relying on values may live in a different context than the computations that generate those values.

In order to compile these more complex trees correctly, and understandably, I aim to disentangle the final phases of compilation, that is, the stages of instruction selection, register allocation, and bytecode generation. Next to that I want to make the tiler internals and interface much simpler and more user-friendly, and solve the 'implied costs problem'. The benefit of having the NQP test suite working is that I can demonstrate the effects of changes much more directly, and more importantly, demonstrate whether individual changes work or not. I hope to report some progress on these issues soon, hopefully before Christmas.

If you want to check out the progress of this work, check out the even-moar-jit branch of MoarVM. I try, but do not always succeed, to keep it up-to-date with the rapid pace of the MoarVM master branch. The new JIT only runs if you set the environment variable MVM_JIT_EXPR_ENABLE to a non-empty value. If you run into problems, please don't hesitate to report them on github or on the #moarvm or #perl6 channels on freenode. See you next time!

Strangely Consistent: Macros: what the FAQ are they?

Published by Carl Mäsak

Thank you sergot++ for eliciting these answers out of me, and prompting me to publish them.

Q: Is it common to be totally befuddled by all these macro-related concepts?

Yes! In fact, that seems to be the general state of mind not just for people who hear about these things now and then, but also for those of us who have chosen to implement a macro system! 😝

Seriously though, these are not easy ideas. When you don't deal with them every day, they naturally slip away from your attention. The brain prefers it that way.

Q: I keep seeing all these terms: quasis, unquotes, macros... I think I know what some of them are, but I'm not sure. Could you explain?

Yes. Let's take them in order.

Q: OK. What the heck is a quasi?

Quasis, or quasiquotes, are a way to express a piece of code as objects. An average program doesn't usually "program itself", inserting bits and pieces of program code using other program code. But that's exactly what quasis are for.

Quasis are not strictly necessary. You could create all those objects by hand.

quasi { say "OH HAI" }        # easy way

Q::Block.new(Q::Statement::Expr.new(Q::Postfix::Call.new(
    Q::Identifier.new("say"),
    Q::Literal::Str.new("OH HAI")
)));                          # hard way

It's easier to just write that quasi than to construct all those objects. Because when you want to structurally describe a bit of code, it turns out the easiest way to do that is usually to... write the code.

(By the way, take the Q::Block API with a huge grain of salt. It only exists for 007 so far, not for Perl 6. So the above are educated guesses.)

Q: Why would we want to create those Q objects? And in which situations?

Excellent question! Those Q objects are the "API for the structure of the language". So using it, we can query the program structure, change it during compile time, and even make our own Q objects and use them to extend the language for new things.

They are a mechanism for taking over the compiler's job when the language isn't flexible enough for you. Macros and slangs are a part of this.

Q: Do you have an example of this?

Imagine a format macro which takes a format string and some arguments:

say format "{}, {}!", "Hello", "World";

Since this is a macro, we can check things at compile time. For example, that we have the same number of directives as arguments after the format string:

say format "{}!", "Hello", "World";                        # compile-time error!

In the case of sufficient type information at compile time, we can even check that the types are right:

say format "{}, robot {:d}!", "Hello", "four-nineteen";    # compile-time error!

Q: Ooh, that's pretty cool!

I'm not hearing a question.

Q: Ooh, that's pretty... cool?

Yes! It is!

Q: Why is it called "quasiquote"? Why not "code quote" or something?

Historical reasons. In Lisp, the term is quasiquote. Perl 6's quasiquotes are not identical, but probably the nearest thing you'll get with Perl 6 still being Perl.

Traditionally, quasiquotes have unquotes in them, so let's talk about them.

Q: Right. What's an unquote?

In Java, you have to write something like "Hello, " + name + "!" when interpolating variables into a string. Java developers don't have it easy.

In Perl, you can do "Hello, $name!". This kind of thing is called "string interpolation".

Unquotes are like the $name interpolation, except that the string is a quasi instead, and $name is a Qtree that you want to insert somewhere into the quasi.

quasi {
    say "Hello," ~ {{{$name}}};
}

Just like the "Hello, $name" can be different every time (for example, if we loop over different $name from an array), unquotes make quasis potentially different every time, and therefore more flexible and more useful.

To tie it to a concrete example: every time we call the format macro at different points in the code, we can pass it different format strings and arguments. (Of course.) These could end up as unquotes in a quasi, and thus help to build different program fragments in the end.

In other words, a quasi is like a code template, and unquotes are like parametric holes in that template where you can pass in the code you want.

Q: Got it! So... macros?

Macros are very similar to subroutines. But where a sub call happens at run time, a macro call happens at compile time, when the parser sees it and knows what to send as arguments. At compile time, it's still early enough for us to be able to contribute/modify Q objects in the actual program.

So a macro in its simplest form is just a sub-like thing that says "here, insert this Qtree fragment that I just built".
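
In its simplest form it might look something like this (the same shape as the hygiene examples further down):

macro greet() {
    return quasi { say "OH HAI" };
}

greet();    # at compile time, the quasi's code gets spliced in here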

Q: So quasis are used inside of a macro?

Yes. Well, they're no more tightly tied to each other than given and when are in Perl, but they're a good fit together. Since what you want to do in a macro is return Q objects representing some code, you'd naturally reach for a quasi to do that. (Or build the Q objects yourself. Or some combination of the two.)

Q: Nice! I get it!

Also not a question.

Q: I... get it?

Yeah! You do!

Q: Ok, final question: is there something that you've omitted from the above explanation that's important?

Oh gosh, yes. Unfortunately macros are still gnarly.

The most important bit that I didn't mention is hygiene. In the best case, this will just work out naturally, and Do What You Mean. But the deeper you go with macros, the more important it becomes to actually know what's going on.

Take the original quasiquote example from the top:

quasi { say "OH HAI" }

The identifier say refers to the usual say subroutine from the setting. Well, unless you were actually doing something like this:

macro moo() {
    sub say($arg) { callwith($arg.lc) }

    return quasi { say "OH HAI" }
}

moo();    # 'oh hai' in lower-case

What we mean by hygiene is that say (or any identifier) always refers to the say in the environment where the quasi was written. Even when the code gets inserted somewhere else in the program through macro mechanisms.

And, conversely, if you did this:

macro moo() {
    return quasi { say "OH HAI" }
}

{
    sub say($arg) { callwith("ARGLEBARGLE FLOOT GROMP") }
    moo();    # 'OH HAI'
}

Then say would still refer to the setting's say.

Basically, hygiene is a way to provide the macro author with basic guarantees that wherever the macro code gets inserted, it will behave like it would in the environment of the macro.

The same is not true if we manually return Q objects from the macro:

Q::Block.new(Q::Statement::Expr.new(Q::Postfix::Call.new(
    Q::Identifier.new("say"),
    Q::Literal::Str.new("OH HAI")
)));

In this case, say will be a "detached" identifier, and the corresponding two examples above would output "OH HAI" with all-caps and "ARGLEBARGLE FLOOT GROMP".

The simple explanation to this is that code inside a quasi can have a surrounding environment (namely that which surrounds the quasi)... but a bunch of synthetically created Q objects can't.

We're planning to use this to our advantage, providing the safe/sane quasiquoting mechanisms for most things, and the synthetic Q object creation mechanism for when you want to mess around with unhygiene.

Q: Excellent! So when will all this land in Perl 6? I'm so eager to...

Ok, that was all the questions we had time for today! Thank you very much, and see you next time!

Steve Mynott: FOSDEM 2016 Perl Dev Room Lineup

Published by Steve on 2016-01-09T21:32:00

FOSDEM is a free two day conference in Brussels, Belgium on Jan 30th and 31st 2016.

The FOSDEM 2016 schedule for the Perl Dev Room on the second day (the Sunday) has now been announced at

https://fosdem.org/2016/schedule/track/perl/

From a Perl 6 perspective it includes Ovid's "Perl 6 for those who hate Perl", Daisuke Maki on "Crust -- Perl6 Port of Plack", Jeffrey Goff on Perl 6 Grammars, Bart Wiegmans on AMD64 assembly language programming and MoarVM, Stevan Little's "Perl is not dead,... it got better", and lastly Elizabeth Mattijsen finishing with "Perl 6 -- The end of the beginning".

Strangely Consistent: Strategic rebasing

Published by Carl Mäsak

Just a quick mention of a Git pattern I discovered recently, and then started using a whole lot:

  1. Realize that a commit somewhere in the commit history contained a mistake (call it commit 00fbad).

  2. Unless it's been fixed already, fix it immediately and push the fix.

  3. Then, git checkout -b test-against-mistake 00fbad, creating a branch rooted in the bad commit.

  4. Write a test against the bad thing. See it fail. Commit it.

  5. git rebase master.

  6. Re-run the test. Confirm that it now passes.

  7. Check out master, merge test-against-mistake, push, delete the branch.

There are several things I like about this pattern.

First, we're using the full power of Git's beautiful (distributed) graph theory model. Basically, we're running the branch in two different environments: one where the thing is broken and one where the thing is fixed. Git doesn't much care where the two base commits are; it just takes your work and reconstitutes it in the new place. Typically, rebasing is done to "catch up" with other people's recent work. Here, we're doing strategic rebasing, intentionally starting from an old state and then upgrading, just to confirm the difference.

Second, there's a more light-weight pattern that does this:

  1. Fix the problem.

  2. Stash the fix.

  3. Write the test. See it fail.

  4. git stash pop

  5. Confirm test now passes.

  6. Commit test and fix.

This is sometimes fully adequate and even simpler (no branches). But what I like about the full pattern is (a) it prioritizes the fix (which makes sense if I get interrupted in the middle of the job), and (b) it still works fine even if the problem was fixed long ago in Git history.

Git and TDD keep growing closer together in my development. This is yet another step along that path.

Strangely Consistent: Double-oh seven

Published by Carl Mäsak

It's now one year since I started working on 007. Dear reader, in case I haven't managed to corner you and bore you with the specifics already, here's the elevator pitch:

007 is a deliberately small language that's fun and easy to develop, where I/we can iterate quickly on all the silly mistakes needed to get good macros into Perl 6.

At this stage, I'd say we're way into "iterate quickly", and we're starting to see the "good macros" part emerge. (Unless I'm overly optimistic, and we just went from "easy" into a "silly mistakes" phase.)

Except for my incessant blabbering, we have been doing this work mostly under the radar. The overarching vision is still to give 007 the three types of macros I imagine we'll get for Perl 6. Although we have the first type (just as in Rakudo), we're not quite there yet with the other two types.

Instead of giving a dry overview of the internals of the 007 parser and runtime — maybe some other time — I thought I would share some things that I've discovered in the past year.

The AST is the program

The first week of developing 007, the language didn't even have a syntax or a parser. Consider the following simple 007 script:

say("Greetings, Mister Bond!");

In the initial tests where we just wanted to run such a program without parsing it, this program would be written as just its AST (conveniently expressed using Lisp-y syntax):

(compunit (block
  (paramlist)
  (stmtlist (stexpr (postfix:<()>
    (ident "say")
    (arglist (str "Greetings, Mister Bond!")))))))

(In the tests we helpfully auto-wrap a compunit and a block on the outermost level, since these are always the same. So in a test you can start writing at the stmtlist.)

But 007 isn't cons lists internally — that's just a convenient way to write the AST. The thing that gets built up is a Qtree, consisting of specific Q nodes for each type of program element. When I ask 007 to output what it built, it gives me this:

Q::CompUnit Q::Block {
    parameterlist: Q::ParameterList [],
    statementlist: Q::StatementList [Q::Statement::Expr Q::Postfix::Call {
        expr: Q::Identifier "say",
        argumentlist: Q::ArgumentList [Q::Literal::Str "Greetings, Mister Bond!"]
    }]
}

Yes, it's exactly the same structure, but with objects/arrays instead of lists. This is where 007 begins, and ends: the program is a bunch of objects, a hierarchy that you can access from the program itself.

I've learned a lot (and I'm still learning) in designing the Qtree API. The initial inspiration comes from IntelliJ's PSI, a similar hierarchy for describing Java programs (and for doing refactors, analysis, etc.).

The first contact people have with object-oriented design tends to be awkward and full of learning experiences. People inevitably design according to physical reality and what they see, which is usually a bad fit for the digital medium. Only by experience does one learn to play to the strengths of the object system, the data model, and the language. I find the same to be true of the Qtree design: initially I designed it according to what I could see in the program syntax. Only gradually have my eyes been opened to the fact that Qtrees are their own thing (and in 007, the primary thing!) and need to be designed differently from textual reality and what I can see.

Some examples:

Qtrees are the primary thing. This is the deep insight (and the reason for the section title). We usually think of the source text as the primary representation of the program, and I guess I've been mostly thinking about it that way too. But the source text is tied to the textual medium in subtle ways, and not a good primary format for your program. Much of what Qtrees do is separate out the essential parts of your program, and store them as objects.

(I used to be fascinated by, and follow at a distance, a project called Eidola, aiming to produce a largely representation-independent programming medium. The vision is daring and interesting, and maybe not fully realistic. Qtrees are the closest I feel I've gotten to the notion of becoming independent of the source code.)

Another thing I've learned:

Custom operators are harder to implement than macros

I thought putting macros in place would be hard and terrible. But it was surprisingly easy.

Part of why, of course, is that 007 is perfectly poised to have macros. Everything is already an AST, and so macros are basically just some clever copying-and-pasting of AST fragments. Another reason is that it's a small system, and we control all of it and can make it do what we want.

But custom operators, oh my. They're not rocket science, but there's just so many moving parts. I'm pretty sure they're the most tested feature in the language. Test after test with is looser, is tighter, is equal. Ways to do it right, ways to do it wrong. Phew! I like the end result, but... there are a few months of drastically slowed 007 activity in the early parts of 2015, where I was basically waiting for tuits to finish up the custom-operators branch. (Those tuits finally arrived during YAPC::Europe, and since then development has picked up speed again.)
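
For flavour, here is roughly what those precedence traits look like, in Perl 6 syntax (the operator itself is invented for illustration; 007's surface syntax differs in the details):

sub infix:<±>($a, $b) is looser(&infix:<+>) { ($a - $b) .. ($a + $b) }
say 1 + 10 ± 2;    # 9..13, because ± binds looser than +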

I'm just saying I was surprised by custom operators being hairier than macros! But I stand by my statement: they are. At least in 007.

(Oh, and by the way: you can combine the two features, and get custom operator macros! That was somewhere in the middle in terms of difficulty-to-implement.)

Finally:

Toy language development is fun

Developing a small language just to explore certain aspects of language design space is unexpectedly addictive, and something I hope to do lots of in the next few years. I'm learning a lot. (If you've ever found yourself thinking that it's unfortunate that curly braces ({}) are used both for blocks and objects, just wait until you're implementing a parser that has to keep those two straight.)

There's lots of little tradeoffs a language designer makes all day. Even for a toy language that's never going to see any actual production use, taking those decisions seriously leads you to interesting new places. The smallest decision can have wide-ranging consequences.

If you're curious...

...here's what to look at next.

Looking forward to where we will take 007 in 2016. We're gearing up to an (internal) v1.0.0 release, and after that we have various language/macro ideas that we want to try out. We'll see how soon we manage to make 007 bootstrap its runtime and parser.

I want to take this opportunity to thank Filip Sergot and vendethiel for commits, pull requests, and discussions. Together we form both the developer team and user community of 007. 哈哈

Death by Perl6: Yet Another Perl6 HTTP Client

Published by Nick Logan on 2015-11-08T23:08:12

I've had a few bits of Perl6 code in production for some basic tasks for a while, but with rakudo growing increasingly stable I decided to give it a little more responsibility. Its next task? Basic web scraping.

Our data source is open to scraping, but they expect certain considerations to be made. Additionally there are 1.2 million pages that need to be scraped weekly within a certain window. This ruled out the (then) current list of pure Perl6 HTTP clients, including the one I had built into Zef 1, due to the keep-alive requirement. Using curl or perl5 would certainly be the superior choice here, but this is as much an exercise as it is a task to complete. Thus Net::HTTP 2 was started. This blog post will focus on 2 of the more interesting aspects of Net::HTTP: the connection caching responsible for keep-alive, and the socket wrapper used to parse responses.

In Perl6 IO::Handle 3 provides the .get and .lines methods, but these rely on decoding the data first. Instead we will just split the Positional Blob on a specific ord sequence:

method get(Bool :$bin where True, Bool :$chomp = True) {
    my @sep      = $CRLF.contents;
    my $sep-size = +@sep;
    my $buf = buf8.new;
    loop {
        $buf ~= $.recv(1, :bin);
        last if $buf.tail($sep-size) ~~ @sep;
    }
    $ = ?$chomp ?? $buf.subbuf(0, $buf.elems - $sep-size) !! $buf;
}

The code is fairly simple: $CRLF contains the string "\r\n", and .contents extracts the ords that will match it. Then we iterate 1 byte at a time and check the end of the buffer to see if it matches our end-of-line ords. Reading 1 byte at a time may not be the most efficient method, but we only use .get(:bin) for accessing headers, so the performance hit is insignificant. The .lines(:bin) method is implemented similarly to how it already is in IO::Handle:

method lines(Bool :$bin where True) {
    gather while (my $data = $.get(:bin)).DEFINITE {
        take $data;
    }
}

Of course these are meant to be used in a very specific order... call .get(:bin) once to get your status line, followed by .lines(:bin).map({$_ or last }) to get your headers. What is that map bit for, you ask? It keeps .lines(:bin) from iterating into the message body itself. We need to read the message body with a different method, one that understands content length and chunked encoding:

method supply(:$buffer = Inf, Bool :$chunked = False) {
    my $bytes-read = 0;
    my @sep        = $CRLF.contents;
    my $sep-size   = @sep.elems;
    my $want-size  = ($chunked ?? :16(self.get(:bin).unpack('A*')) !! $buffer) || 0;
    $ = Supply.on-demand(-> $supply {
        loop {
            my $buffered-size = 0;
            if $want-size {
                loop {
                    my $bytes-needed = ($want-size - $buffered-size) || last;
                    if (my $data = $.recv($bytes-needed, :bin)).defined {
                        last unless ?$data;
                        $bytes-read    += $data.bytes;
                        $buffered-size += $data.bytes;
                        $supply.emit($data);
                    }
                    last if $buffered-size == $bytes-needed | 0;
                }
            }

            if ?$chunked {
                my @validate = $.recv($sep-size, :bin).contents;
                die "Chunked encoding error: expected separator ords '{@sep.perl}' not found (got: {@validate.perl})" unless @validate ~~ @sep;
                $bytes-read += $sep-size;
                $want-size = :16(self.get(:bin).unpack('A*'));
            }
            last if $want-size == 0 || $bytes-read >= $buffer || $buffered-size == 0;
        }

        $supply.done();
    });
}
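
One small detail in the code above: for chunked encoding, the size line is plain hexadecimal text, and the :16(...) radix form is what turns that text into a number:

say :16("1a4");    # 420, i.e. the next chunk is 420 bytes long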

The supply method above may appear more intimidating than it actually is, but it essentially just double buffers the data (the inner loop almost never needs to iterate a second time). It knows when to stop reading based on the content length sent via header, by decoding the size line of a chunked section of the message body, or by reading everything until the connection is closed. We emit our data out via a supply (for threading reasons outside the scope of this post), so we can even close the connection mid-body read if needed. Then it all gets stuffed into an IO::Socket::INET or IO::Socket::SSL object via the role IO::Socket::HTTP 4 that contains these helpers. Here is what your basic http client might look like using these components:

# Personally I like how clean this looks, but I'm a big fan of higher order functions
my $socket        = IO::Socket::INET.new(:host<google.com>, :port(80)) but IO::Socket::HTTP;
my $status-line   = $socket.get(:bin).unpack('A*');
my @header-lines  = $socket.lines(:bin).map({"$_" or last})>>.unpack('A*');
my $body          = $socket.supply.list;

With the ability to write clean HTTP interfaces, let us now look at connection caching and our keep-alive goal. We know that we can't just send an HTTP request for one host to any old socket that is open, so a simple solution is to just use a hash and index it based on host and scheme: my %connections{$host}{$scheme}. If a socket exists and is not being used, then try to reuse it. Otherwise create the socket and save it to the hash (but only if Connection: keep-alive).

method get-socket(Request $req) {
    $!lock.protect({
        my $connection;

        # Section 1
        my $scheme    = $req.url.scheme;
        my $host      = $req.header<Host>;
        my $usable   := %!connections{$*THREAD}{$host}{$scheme};

        # Section 2
        if $usable -> $conns {
            for $conns.grep(*.closing.not) -> $sock {
                # don't wait too long for a new socket before moving on
                next unless await Promise.anyof( $sock.promise, start { $ = Promise.in(3); False });
                next if $sock.promise.status ~~ Broken;
                last if $connection = $sock.init;
            }
        }

        # Section 3
        if $connection.not {
            $connection = $.dial($req) but IO::Socket::HTTP;
            $connection.init;

            $usable.append($connection) unless $req.header<Connection>.any ~~ /[:i close]/;
        }

        # Section 4
        $connection.closing = True if $req.header<Connection>.any ~~ /[:i close]/;

        $connection;
    });
}

First we lock this block of code up because our Net::HTTP::Transport 5 needs to be thread safe, and we don't want a race condition when retrieving or setting a socket in the cache (Section 1). $usable gets bound to %connections just because it's shorter to write later on. There is also the additional index on $*THREAD; this too is beyond the scope of this blog post, but just understand it needs to be there if you want to launch these in start blocks.

In Section 2 we iterate over our cache looking at the .closing attribute (an attribute in IO::Socket::HTTP that we set when a socket in the queue knows it will close the connection, i.e. the current request will be the last one sent on that socket). Because we don't want to wait for a long request to finish, we also implement a timeout of 3 seconds before trying the next socket. Next we check if a promise used in IO::Socket::HTTP is broken, which would mean the socket was closed, and move on if it is. Finally we call $connection = $sock.init, where our .init method from IO::Socket::HTTP resets the previously mentioned promise and essentially claims the socket for its own use.

We enter Section 3 if there are no reusable connections (either because this is the first connection created for a specific host, or because none allow keep-alive). .dial($req) simply returns an IO::Socket::INET or IO::Socket::SSL, and we apply our IO::Socket::HTTP role to this connection. Finally we add the connection to our cache for possible reuse, unless the server has told us it will close the connection.

Section 4 speaks for itself I hope :)

With the novel parts out of the way I still need to implement cookies, multipart form posting, and some other simple basics. But now we have a strong base for building customized client foundations, similar to golang. No, it doesn't follow the keep-alive rules sent via the header yet, but those are trivial tasks.

I'll leave one last code snippet from IO::Socket::HTTP some may find useful:

method closed {
    try {
        $.read(0);
        # if the socket is closed it will give a different error for read(0)
        CATCH { when /'Out of range'/ { return False } }
    }
}

This will let you call $socket.closed to find out if a socket is open or not... no need to access the $!PIO. You may wonder why we wouldn't use this .closed method in our get-socket method above. The answer is that it is used, but it's abstracted behind the call to .init.

  1. Zef

  2. Net::HTTP

  3. IO::Handle

  4. IO::Socket::HTTP

  5. Net::HTTP::Transport

Perl 6 Maven: Number guessing game in Perl 6

Published by szabgab

Perl 6 Maven: push vs. append on arrays in Perl 6

Published by szabgab

Perl 6 Maven: Installing Rakudo Perl 6

Published by szabgab

Perl 6 Maven: Find Perl 6 modules without Travis CI

Published by szabgab

jdv: Perl6 and CPAN: MetaCPAN Status as of 2015-10-09

Published by jdv on 2015-10-09T14:04:00

MetaCPAN, like the rest of "CPAN", was built assuming the sole context of Perl5. Which is cool, until we want to use it for Perl6 and avoid the troubles associated with different namespaces, dist mgmt, etc... To largely avoid, and more easily handle, these issues for MetaCPAN it's been suggested that we have separate instances. The existing Perl5 instance only needs to be changed to ignore Perl6 distributions. There has already been some breakage because it didn't ignore a Perl6 dist of mine which exists in the Perl5 world :( And the new Perl6 instance will do just the opposite and only look at Perl6 distributions.

In contrast, and relatedly, on CPAN we've designated a special spot for Perl6 distributions in order to keep them separate from the Perl5 dists. This reserved place is a Perl6 subdir in an author's dir (/author/id/*/*/*/Perl6/). Any dists in or under that spot on the fs will be considered a Perl6 dist; valid or invalid. So this is where the Perl6 MetaCPAN will look and the Perl5 instance will not.

Current development is being done on these temporary branches:

And the main dev instance is running on hack.p6c.org. The web end is at http://hack.p6c.org:5001 and the api is at http://hack.p6c.org:5000.

So far the idea has been to iterate on the aforementioned branches and instance until we have something that works sufficiently well. At that point we'll tidy up the branches and submit them for merging. Shortly after that time the hope is that we'll be able to stand up the official Perl6 instance.

The list of requirements for being adequately cooked is:

  1. track Perl6 CPAN dists and ignore Perl5 dists
  2. import a Perl6 distribution
  3. index a Perl6 distribution for search
  4. render pod6 documentation
  5. do Perl6 syntax highlighting

All of these have been hacked in and are at various degrees of completeness. Next up is testing and fixing bugs until nothing major is left. To that end I've recently loaded up the dev instance with all the distributions from modules.perl6.org. The dist files were generated, very hackily, with https://github.com/jdv/cpan-api/blob/master/test_p6_eco_to_p6_cpan.pl. I also just loaded them all under one user, mine, for simplicity. That load looks like it has problems of its own as well as revealing a bunch of issues. So in the coming days I hope to get that all sorted out.