Hudzilla.org - the homepage of Paul Hudson
Contents > Alternative PHP uses > Making your own language Wish List | Report Bug | About Me ]

21.5.5     Planning it out

This is NOT the latest copy of this book; click here for the latest version.

Okay, here's where the theory begins. Like I said I've planned this out in such a way that hopefully everyone should be able to follow even if they have little knowledge of compiler mechanics.

Our language, which we shall called FooLang for argument's sake, will work by scanning a line of text, terminated by a semi-colon, and executing it. The following will all be valid examples of FooLang lines:

a = 45;
b = a;
print a;
d = a + b + a;
c = "Foo bar baz";
print "Baz bar foo";

As you can see, any text outside of strings that isn't "print" is considered a variable. Thus, the first line assigns 45 to a, the second assigns a to b, etc. The four example is of particular interest as it is a complex statement - there are three operators in there, and each one needs to be handled individually.

Handling simple statements such as "a = 10;" is quite easy - match variable, match equals, and match number. However, allowing statements that can be of any arbitrary size, such as "a = 10 * 10 + 5 - 3 * 4 - 9 / 2 * 10" isn't just a matter of writing dozens of rules for each possibility. Instead, we need some recursion. Ignoring problems such as operator associativity, our long equation above can be thought of like this:

a = 10 * (10 + (50 - (3 * (4 - (9 / (2 * 10))))));

That is, we multiply the last two first, then take the result of that and use it as the divisor for 9, then take the result of that and subtract it from 4, etc. Essentially what is happening is that the last three items are being taken off the equation, executed, and the result is being put back onto the end of the equation. This is then repeated again and again like this:

a = 10 * (10 + (50 - (3 * (4 - (9 / (2 * 10))))));
a = 10 * (10 + (50 - (3 * (4 - (9 / 20)))));
a = 10 * (10 + (50 - (3 * (4 - 0.45))));
a = 10 * (10 + (50 - (3 * 3.55)));
a = 10 * (10 + (50 - 10.65));
a = 10 * (10 + 39.35);
a = 10 * 49.35;
a = 493.5

At the end we have our simple rule, VARIABLE EQUALS NUMBER, so all we have to do is write a handler for that. Previously we looked at using an array like a double-ended queue, which means adding and removing items to and from the front and back of the array. If we only add or remove items from the back, what we'd have is the "stack" data structure, and we can manipulate our array stack using array_push() to push an item onto the end of the stack and array_pop() to pop an item off the stack.

Now look at the first two lines of the expression break down again:

a = 10 * (10 + (50 - (3 * (4 - (9 / (2 * 10))))));
a = 10 * (10 + (50 - (3 * (4 - (9 / 20)))));

If we're to use a stack to store this expression, this would look something like this (actions are in bold ):

a = 10 * (10 + (50 - (3 * (4 - (9 / (2 * 10))))));
pop 10
a = 10 * (10 + (50 - (3 * (4 - (9 / (2 *)))));
pop *
a = 10 * (10 + (50 - (3 * (4 - (9 / (2)))));
pop 2
a = 10 * (10 + (50 - (3 * (4 - (9 /))));
multiply 2 * 10
push result
a = 10 * (10 + (50 - (3 * (4 - (9 / 20)))));

Thus, a stack works perfectly for our needs. However, there's one more thing to consider: given that code needs to be executed in precisely the same way it was written, how can you be sure you're executing things correctly? That is, in the example above you can see the last two operations will be 2*10 then 9/20, but, if we start at the end, how do we know where to pass the result of 2*10?

The answer is that we need to set up recursion of the calculations, and this is best done using Reverse Polish Notation. This is a fancy term meaning that operators are postfix, or appended to the two operands.

For example, 2*10 becomes 2 10 *. This actually eliminates any problems of ambiguity, because a given expression can only be evaluated in one way. Our original expression was this:

a = 10 * (10 + (50 - (3 * (4 - (9 / (2 * 10))))));

In postfix notation, this is equal to

a 10 10 50 3 4 9 2 10 * / - * - + * =

Yes, that might look complicated to a reader, but to a computer it makes perfect sense, and will be analysed like this:

a 10 10 50 3 4 9 2 10 * / - * - + * =
pop =
a 10 10 50 3 4 9 2 10 * / - * - + *
pop *
a 10 10 50 3 4 9 2 10 * / - * - +
pop + (then, to save space here, it will also pop -, *, -, and /)
a 10 10 50 3 4 9 2 10 *
pop *, 10, and 2
a 10 10 50 3 4 9
calculate 10 * 2, push result, push previous operator
a 10 10 50 3 4 9 20 /
pop /, 20, and 9
Calculate 9 / 20, push result, push previous operator
a 10 10 50 3 4 0.45 -
Etc...

Essentially the code would pop off an element, and, if it's an operator, it would pop off another, and another, etc, until it finds two operands. These operands are then calculating with the last operator, and the result is pushed back onto the stack along with the previous operator. This process is then repeated and repeated until the stack is empty.

Of course, rather than pushing data back onto the stack, it's better to use a recursive function call, but I'm getting ahead of myself - let's start at the very beginning: how to parse text into tokens





<< 21.5.4 Output   21.5.6 How to parse text into tokens >>
Table of Contents
Want to see this stuff in print? PHP in a Nutshell takes the core topics covered here, adds in thousands of edits from the editorial team and myself, and combines them to make an unbeatable reference for PHP programmers at all levels.



My latest book has hundreds more tips on how to use PHP, Apache, and MySQL, plus Perl, Python, shell scripts, performance tuning, and more!



Top-right shadow
 
Bottom-left shadow Bottom shadow

Comments from other readers
Be the first to add a comment to this chapter!



Add comment
Please note that by posting a comment here you are committing it to the public domain. This is important so that others can make use of your code themselves, and also so that I can incorporate helpful notes directly into the main text. Comments are limited to 2000 characters in length.

If you are reporting an error in the content, please tell me directly.

Your name/email address:
Your comment:
 
Now, in order to verify that you're a real person, please answer this simple question: what is nine plus three?
The answer is:
(please write in
numbers, eg 19)


Top-right shadow
 
Bottom-left shadow Bottom shadow