|
Posted by Toby A Inkster on 02/10/07 14:14
cjl wrote:
> Short of writing a parser, which is clearly beyond me, what are some
> reasonable approaches to handling user input that will be executed?
Writing a parser is the best option in the long-run. If you were to
attempt to interpret the user input some other way, like pure regular
expressions, then you would fall into a lot of traps, and your interpreter
would behave oddly in many cases.
A full parser is a much better option: it will behave far more reliably
and would be a lot easier to extend, should you feel the need to add extra
features to the language at a later date.
Although it's a lot of work, there are some fairly well established
methods on writing them. What you basically need to write is three fairly
independent components: a tokeniser, a parser and an interpreter. None of
these share any code in common, except for the definitions of a few
constants and classes.
Firstly, a tokeniser, which reads the user input and splits it into a long
list of tokens. Each token should have the form:
class Token
{
var $token_type; // integer
var $token_value; // mixed
var $line; // integer
var $char; // integer
}
Such that when you tokenize the following PHP:
echo "Foo";
You end up with something like this (though imagine the inner arrays are
actually Token objects!):
array(
array(TOKEN_BUILTIN, "echo", 1, 1),
array(TOKEN_STRING_DQUOTED, "Foo", 1, 6),
array(TOKEN_TERMINATOR, NULL, 1, 11)
);
Note the $line and $char which contain the line number and character
number where this token was found? That helps when a later stage of your
program needs to print an error message -- it can inform the user of the
exact location where the error occurred.
Writing a tokeniser is probably the easiest step. The only slightly
difficult bits are things like "dealing with strings that contain
\"special\" characters", but even they are not too difficult!
Your tokeniser then passes this list over to the parser. The parser is
probably the hardest part you have to write. You have to convert the
stream of tokens into an "abstract syntax tree".
First you need to define the classes you'll build the AST out of. PHP 5's
object oriented features will be very useful here.
abstract class AstNode
{
public $token;
final public function __construct($t)
{
$this->token = $t;
}
abstract public function evaluate($machine);
}
class AstNode_Script extents AstNode
{
public $statements;
public function evaluate($machine)
{
foreach ($this->statements as $s)
$s->evaluate($machine);
}
}
class AstNode_If extends AstNode
{
public $condition_expression;
public $execution_block;
public function evaluate()
{
if ($this->condition_expression->evaluate($machine))
$this->execution_block->evaluate($machine);
}
}
class AstNode_Constant_False extends AstNode
{
public function evaluate($machine) { return FALSE; }
}
// etc
Then write the parser itself, which takes the form:
class Parser
{
private $tokens;
public function __construct($T)
{
if (is_array($T))
$this->tokens = $T;
else
throw new Exception('Argh!');
}
public function next()
{
return array_shift($this->tokens);
}
public function peek()
{
return $this->tokens[0];
}
public function get($type, $hissy_fit=FALSE)
{
$next = $this->peek;
if ($next->token_type==$type)
return $this->next();
elseif ($hissy_fit)
throw new Exception('hissy fit');
else
return FALSE;
}
public function parseScript()
{
$ast = new AstNode_Script($this->peek());
$ast->statements = $this->parseCommand();
while ($this->peek())
{
$ast->statements = $this->parseCommand();
}
return $ast;
}
// And then you write parseCommand, which in turn probably
// calls things like parseConditional, parseExpression,
// parseFunctionCall and so forth.
}
The third part of the job is interpreting the AST, but if you look at my
AstNode_* classes above, you'll see they have the logic built into them.
All you then need to do is:
$ast->evaluate($machine);
Where machine is an object capable of keeping track of things like
variable values, function definitions and so forth.
It's quite a bit of work, but it's certainly do-able. It helps if you have
a good book on compilers -- I'd recommend Watt & Brown "Programming
Language Processors in Java". As you might guess from the title, it
teaches you to write parsers, compilers and interpreters in Java, but the
same techniques can easily be applied to any object-oriented language, and
with a little more imagination, to non-OO languages too.
A few months back, partly as an experiment, but partly because I thought
it would be useful for a project of mine, I designed my own scripting
language and wrote a tokeniser, parser and machine for it in PHP. It
supports variables (numeric, string and multi-dimensional arrays),
functions, comments, and has all the normal numeric, string and array
operators built-in. Scalar (non-array) variables, are automatically
typecast as arrays (such that they become single-element arrays) and array
variables are automatically typecast as scalars (the first value in the
array is used, the rest are discarded).
The reason I wrote it is that it would allow user-supplied code to run in a
"sandbox" environment, so that if it crashed, or tampered with variables,
or whatever, it wouldn't cause any problems for the rest of the site.
It's half-finished, the syntax is sort of crazy and it needs improving,
which is why I've not foisted it upon the general public. But if you want
a copy, I'd be happy to send you one, licensed under the GPL.
Here's an example of using it:
<?php
$p = <<<PROG
/* Function names can be arbitrary strings. No parentheses used. */
function "my concatenation function", $a, $b
{
/* Uses "let VAR := EXPR" for assignment. A bit like Turing. */
let $result := $a . $b;
/* Perlish syntax for returning function results. */
$result;
}
let $quux = call "my concatenation function", "foo", "bar";
/* Print automatically appends a new line, a la Python */
print $quux;
PROG;
$r = eval_programme($p);
?>
--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux
* = I'm getting there!
[Back to original message]
|