Posted by Toby A Inkster on 02/12/07 13:30
cjl wrote:
 > I'm wondering if there is a simpler approach...after all, I want the
 > user input to be valid php, I just want to limit what they can type to
 > a few functions I write ( circle(), line(), etc..) and a few control
 > structures.
 
 If the user input is to be valid PHP, the "obvious" solution is eval(),
 but this will totally destroy your security. You could use regular
 expressions to check for "naughty" functions (like SQL queries, file
 system manipulation, TCP sockets, etc), but then you end up:
 
 (a) playing catch-up with the features of PHP itself. As new
 functions are added to the language, you'll need to evaluate
 how naughty they are, and add them to the block list.
 
 (b) naively blocking more innocent PHP like:
 print "fopen";
 
 (It's worth mentioning that your block list will also need to include
 "naughty" functions from your own code and from any third-party libraries
 you use.)
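
 To make problem (b) concrete, here is a minimal sketch of the regex
 approach. The helper name looks_naughty() and the block list are made up
 for illustration; they are not from any library. The third call is an
 extra case beyond the two problems listed above: it shows that a name
 built at run time is not caught at all.

```php
<?php
// Hypothetical helper: flags source that mentions any blocked function name.
function looks_naughty(string $source, array $blocked): bool
{
    // Escape each name and build one alternation, e.g. /\b(?:fopen|unlink)\b/i
    $names = array_map(function ($n) { return preg_quote($n, '/'); }, $blocked);
    $pattern = '/\b(?:' . implode('|', $names) . ')\b/i';
    return preg_match($pattern, $source) === 1;
}

// Illustrative block list only; a real one would be far longer.
$blocked = ['fopen', 'unlink', 'mysql_query', 'fsockopen'];

// Genuinely naughty, and flagged.
var_dump(looks_naughty('$f = fopen("/etc/passwd", "r");', $blocked));
// Harmless string literal, flagged anyway.
var_dump(looks_naughty('print "fopen";', $blocked));
// Naughty, but the name is assembled at run time, so it is missed.
var_dump(looks_naughty('$f = "fo" . "pen"; $f("/etc/passwd", "r");', $blocked));
```

 A plain text match cannot tell a function call from a string that happens
 to contain the same letters, which is why the tokeniser idea further down
 looks more promising.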
 
 > Maybe I could create an object which includes member functions which
 > override all native php functions, and have the user input actually be
 > calls to that objects methods, and only pass through the ones that I
 > want to allow?
 
 Aye -- that is indeed the dream: the ability to have an eval() function
 that works within a single object, such that any function calls are
 silently re-written to "$this->function()", any globals to
 "$this->variable" and any constants to "self::CONSTANT".
 
 Although PHP doesn't have such a "safe eval" function built in, it
 shouldn't be too difficult to build one. As your language follows PHP
 syntax rules, you can use PHP's built-in tokeniser:
 
 $tokens = token_get_all($source);
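
 If you have not used the tokeniser before, this small sketch shows the
 kind of list that call hands back. The helper token_summary() is a
 made-up name for illustration; token_get_all() and token_name() are the
 real built-ins. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: returns [token name, text] pairs for a PHP snippet.
function token_summary(string $source): array
{
    $rows = [];
    // token_get_all() only tokenises PHP after an opening tag, so prepend one.
    foreach (token_get_all('<?php ' . $source) as $token) {
        if (is_array($token)) {
            // Most tokens arrive as [id, text, line number].
            $rows[] = [token_name($token[0]), $token[1]];
        } else {
            // Single characters such as ( ) , ; arrive as plain strings.
            $rows[] = ['(char)', $token];
        }
    }
    return $rows;
}

// Print one token per line: name on the left, source text on the right.
foreach (token_summary('circle($x, 10); print PI;') as $row) {
    printf("%-28s %s\n", $row[0], trim($row[1]));
}
```

 In the output, circle and PI both come back as T_STRING, while $x comes
 back as T_VARIABLE.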
 
 Then loop through that list, looking for all tokens of type T_VARIABLE and
 re-pointing them at object members; finding T_STRING (which, despite the
 name, is a "non-$-identifier" token, so could be either a function call or
 a constant) and heuristically re-pointing it at a class constant or object
 method (e.g. UPPERCASE is assumed to be a constant; MixedOr_lower_case is
 assumed to be a method); and finally finding T_EVAL and replacing it with
 T_ECHO. You would then need to loop through the token list again and
 re-assemble it as source code before passing it through an eval() function
 wrapper within the object.
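
 As a rough sketch of the steps just described, and nothing more than
 that, something along these lines might do. The class name Sandbox, its
 circle() method, the PI constant and the use of __get/__set/__call are
 all my assumptions for the sake of illustration. It assumes PHP 7 or
 later, leaves true/false/null alone (they are T_STRING tokens too), and
 passes every token it does not recognise straight through, so it is in no
 way hardened.

```php
<?php
// Hypothetical sandbox: rewritten user code is pointed at this object's
// methods, properties and class constants.
class Sandbox
{
    const PI = 3.14159;

    private $vars = [];   // storage for rewritten "$variables"
    public $drawn = [];   // record of drawing calls, for demonstration
    // Caveat: declared properties like these are still reachable by name.

    public function __get($name) { return $this->vars[$name] ?? null; }
    public function __set($name, $value) { $this->vars[$name] = $value; }

    public function circle($x, $y, $r) { $this->drawn[] = "circle($x,$y,$r)"; }

    // Calls to methods that do not exist land here instead of in PHP itself.
    public function __call($name, $args)
    {
        throw new RuntimeException("Function not allowed: $name()");
    }

    public function rewrite(string $userCode): string
    {
        $out = '';
        foreach (token_get_all('<?php ' . $userCode) as $token) {
            if (!is_array($token)) { $out .= $token; continue; }
            list($id, $text) = $token;
            if ($id === T_OPEN_TAG) {
                continue;                                  // drop the tag we added
            } elseif ($id === T_VARIABLE) {
                $out .= '$this->' . substr($text, 1);      // $x -> $this->x
            } elseif ($id === T_STRING) {
                if (in_array(strtolower($text), ['true', 'false', 'null'], true)) {
                    $out .= $text;                         // literals stay as-is
                } elseif (strtoupper($text) === $text) {
                    $out .= 'self::' . $text;              // UPPERCASE -> constant
                } else {
                    $out .= '$this->' . $text;             // otherwise -> method
                }
            } elseif ($id === T_EVAL) {
                $out .= 'echo';                            // defuse nested eval
            } else {
                $out .= $text;                             // everything else untouched
            }
        }
        return $out;
    }

    public function run(string $userCode)
    {
        return eval($this->rewrite($userCode));
    }
}

$box = new Sandbox();
$box->run('$r = 5; circle(10, 20, $r * 2);');
print_r($box->drawn);
echo $box->rewrite('eval("x"); print PI; fopen("/etc/passwd", "r");'), "\n";
```

 Known holes in this sketch include user code that already contains "->"
 or "::", include/require, backticks, and (on PHP 8) fully qualified names
 such as \fopen, which are no longer T_STRING tokens. A safer variant would
 probably reject any token type it has not explicitly allowed.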
 
 It sounds complicated, but it is simpler than implementing your own real
 parser and interpreter, and could probably be done in less than 50 lines
 of code.
 
 I wouldn't be happy running it on a production system though without
 substantial hack-testing!
 
 > As far as the approach you are suggesting, some googling showed:
 > http://greg.chiaraquartet.net/archives/138-PHP_ParserGenerator-and-PHP_LexerGenerator.html
 > Which maybe can help me?
 
 Quite possibly -- it does look quite good. If I'd known of its existence
 when I started my scripting language, I might not have attempted to write
 a scripting language. But I certainly learnt a lot -- especially about OO
 PHP -- from doing so, so I don't regret it.
 
 > That is the single best response ever given to a newsgroup post. Thank
 > you.
 
 No problem -- I'd guessed that nobody else in this group had been crazy
 enough to attempt a scripting language parser and interpreter in PHP, so
 if I didn't help you, nobody would!
 
 --
 Toby A Inkster BSc (Hons) ARCS
 Contact Me ~ http://tobyinkster.co.uk/contact
 Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux
 
 * = I'm getting there!