poorly formed JSON - options?
Date: 12/14/09
(PHP Community)
Hello everyone!
For a project I'm working on, I'm writing a screen scraper. Unfortunately, the material I'm trying to mine is poorly formed JSON (probably generated by proprietary code), so json_decode() returns NULL.
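Before reaching for regular expressions, it can help to confirm why json_decode() gave up. This is a minimal sketch, assuming PHP 5.3 or later (where json_last_error() was added). The sample string is a hypothetical stand-in for the scraped data:

```php
<?php
// Hypothetical sample in the style of the scraped data: an unquoted key and a
// single-quoted string, neither of which strict JSON allows.
$raw = "{name:'blah blah blah'}";

$data = json_decode($raw);
// NULL is also the legitimate result of decoding the literal null, so check
// the error code instead of relying on NULL alone.
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    // For this input the code is JSON_ERROR_SYNTAX.
    echo 'json_decode() failed with error code ', json_last_error(), "\n";
}
```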
My first thought was to parse it with regular expressions instead. Here I've run into a problem I can't seem to work out. Much of the data I'm trying to parse looks like this:
name:'blah blah blah'
So I thought perhaps I could use
/name:\'([^']*)\'/
to capture the text between the single quotes.
Unfortunately some of the data has escaped single quotes inside, like this:
name:'blah\'s blah blah'
in which case my regex returns only "blah\".
So is there a way to say, in regex terms, "stop at the next single quote, unless it is escaped with a backslash"? I haven't been able to find any reference explaining how to write a negated character class with an exception.
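A common regex idiom for a quoted string that may contain backslash escapes is to repeat an alternation: either one character that is neither a quote nor a backslash, or a backslash followed by any single character, i.e. (?:[^'\\]|\\.)*. This is a minimal PHP sketch, assuming the name:'...' format from the examples above; the function name extract_names is hypothetical:

```php
<?php
// Hypothetical helper: return every value written as name:'...' in $text,
// allowing backslash escapes such as \' inside the quotes.
function extract_names($text)
{
    // The PHP string below is the regex  /name:'((?:[^'\\]|\\.)*)'/
    //   [^'\\]  one character that is neither a quote nor a backslash
    //   \\.     a backslash plus whatever character it escapes
    $pattern = '/name:\'((?:[^\'\\\\]|\\\\.)*)\'/';
    preg_match_all($pattern, $text, $matches);
    // Group 1 is the raw text between the quotes. stripslashes() turns \' into '
    // (it also strips other backslashes, so treat it as a rough unescape).
    return array_map('stripslashes', $matches[1]);
}

// The PHP literal below is the text:  name:'blah blah blah', name:'blah\'s blah blah'
$raw = 'name:\'blah blah blah\', name:\'blah\\\'s blah blah\'';
print_r(extract_names($raw));
```

For the sample it returns the two values blah blah blah and blah's blah blah. Because the group repeats, a value containing several escaped quotes is captured whole as well.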
As an alternative, I'm wondering whether there's something like Tidy, but for JSON. So far I haven't found much.
Thanks!
ETA: After much trial and error, I found a solution:
name:\'([^']+\\\'[^']+)\'|name:\'([^']*)\'
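With the pattern above, the captured value lands in group 1 when the first branch matches and in group 2 otherwise. One option is to wrap it in a PCRE branch-reset group, (?|...), so both branches fill group 1. This is a sketch with a hypothetical sample string. Note that the first branch allows exactly one escaped quote, so a value such as a\'b\'c is cut short:

```php
<?php
// The two-branch pattern from the post inside a branch-reset group (?|...),
// so whichever branch matches, the value is in group 1.
// Regex: /(?|name:'([^']+\\'[^']+)'|name:'([^']*)')/
$pattern = '/(?|name:\'([^\']+\\\\\'[^\']+)\'|name:\'([^\']*)\')/';

// The PHP literal below is the text:  name:'blah blah blah', name:'blah\'s blah blah'
$raw = 'name:\'blah blah blah\', name:\'blah\\\'s blah blah\'';
preg_match_all($pattern, $raw, $matches);

// $matches[1] holds both values; the escaped quote is still backslash-escaped.
print_r($matches[1]);
```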
But I'd still be curious to learn if there's a program like Tidy for JSON.
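Short of a dedicated tool, one workaround is a small repair pass before json_decode(). This is a naive sketch, assuming the input only departs from JSON by using single-quoted strings and unquoted keys, that the text is UTF-8, and PHP 5.3+ for the closure. The function name naive_json_repair is hypothetical. It is not a parser: apostrophes inside double-quoted strings, or text like ", word:" inside a string, will be mangled.

```php
<?php
// Hypothetical helper: rewrite JavaScript-style object text into strict JSON
// and decode it. Naive: it does not understand the structure it rewrites.
function naive_json_repair($text)
{
    // 1. Turn each single-quoted string (with \-escapes) into a JSON string.
    //    Regex: /'((?:[^'\\]|\\.)*)'/
    $text = preg_replace_callback(
        '/\'((?:[^\'\\\\]|\\\\.)*)\'/',
        function ($m) {
            // Rough unescape (\' becomes '), then let json_encode() add the
            // double quotes and any escaping JSON requires.
            return json_encode(stripslashes($m[1]));
        },
        $text
    );
    // 2. Quote bare keys that follow { or , for example  {name:  ->  {"name":
    $text = preg_replace('/([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:/', '$1"$2":', $text);
    // Decode to associative arrays; returns NULL if the result is still invalid.
    return json_decode($text, true);
}

// The PHP literal below is the text:  {name:'blah\'s blah blah', id: 3}
var_dump(naive_json_repair("{name:'blah\\'s blah blah', id: 3}"));
```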
Source: http://php.livejournal.com/675088.html