poorly formed JSON - options?

    Date: 12/14/09 (PHP Community)

    Hello everyone!

    For a project I'm working on, I'm writing a screen scraper. Unfortunately, the material I'm trying to mine is poorly formed JSON (probably parsed with proprietary code), so json_decode() returns NULL.

    My first thought was to parse it with regular expressions, but I've run into a question I can't seem to work out. Much of the data I'm trying to parse looks like this:

    name:'blah blah blah'

    So I thought perhaps I could use
    /name:'([^']*)'/
    to grab the stuff between single quotes.

    Unfortunately some of the data has escaped single quotes inside, like this:
    name:'blah\'s blah blah'
    in which case my regex returns only "blah\".

    So is there a way to say in regex speak, "stop at the next single quote, unless it is escaped with a backslash"? I haven't been able to find any reference on how to write a regex that makes an exception to a negated character class.
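
    One common way to express "stop at the next quote unless it is escaped" is to match either a character that is neither a quote nor a backslash, or a backslash followed by any single character, repeated. The sketch below assumes the data always uses the name:'...' shape shown above. The function name extract_names() is made up for illustration, and stripslashes() is only a rough stand-in for real unescaping: it drops the backslash from every escape, so sequences such as \n would not become newlines.

```php
<?php
// Hypothetical helper: returns every single-quoted value that follows "name:".
function extract_names(string $text): array
{
    // Regex after PHP string unescaping:  ~name:'((?:[^'\\]|\\.)*)'~s
    //   [^'\\]  one character that is neither a quote nor a backslash
    //   \\.     a backslash plus whatever character follows it (e.g. \')
    $pattern = '~name:\'((?:[^\'\\\\]|\\\\.)*)\'~s';

    preg_match_all($pattern, $text, $matches);

    // Strip the escaping backslashes from each captured value.
    return array_map('stripslashes', $matches[1]);
}

$sample = "name:'blah\\'s blah blah' name:'blah blah blah'";
print_r(extract_names($sample));
```

    For the sample string this returns the two values blah's blah blah and blah blah blah, however many escaped quotes a value contains.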

    As an alternative, I'm wondering if maybe there's something like Tidy, but for JSON. So far I'm not finding much.

    Thanks!

    ETA: After much trial and error I found a solution:
    name:\'([^']+\\\'[^']+)\'|name:\'([^']*)\'

    But I'd still be curious to learn if there's a program like Tidy for JSON.
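
    Short of a Tidy-style tool, one option is to normalize the text into strict JSON and then hand it to json_decode(). The sketch below is only that, a sketch: loose_json_decode() is a hypothetical name, and it assumes the only problems are unquoted keys and single-quoted strings. It does not handle trailing commas, comments, raw newlines inside strings, or JavaScript-only escapes such as \x41.

```php
<?php
// Hypothetical helper: rewrites bare keys and single-quoted strings as
// strict JSON, then decodes the result. Returns null if it is still invalid.
function loose_json_decode(string $text)
{
    // Alternatives, tried in order at each position:
    //   1. a double-quoted string (left untouched)
    //   2. a single-quoted string (body captured in group 1)
    //   3. a bare identifier directly before a colon (group 2)
    // Matching whole strings first keeps text inside them from being
    // mistaken for keys.
    $pattern = '~"(?:[^"\\\\]|\\\\.)*"'
             . '|\'((?:[^\'\\\\]|\\\\.)*)\''
             . '|([A-Za-z_$][A-Za-z0-9_$]*)(?=\s*:)~s';

    $fixed = preg_replace_callback($pattern, function ($m) {
        if (isset($m[2])) {
            return json_encode($m[2]);      // bare key -> "key"
        }
        if ($m[0][0] !== "'") {
            return $m[0];                   // already double-quoted
        }
        // Single-quoted: turn \' into ', escape bare ", keep other escapes.
        $body = preg_replace_callback('~\\\\(.)|"~s', function ($e) {
            if ($e[0] === '"') {
                return '\\"';
            }
            return $e[1] === "'" ? "'" : $e[0];
        }, $m[1]);
        return '"' . $body . '"';
    }, $text);

    return json_decode($fixed, true);
}

$sample = '{name:\'blah\\\'s blah blah\', id:7, "tag":\'say "hi"\'}';
var_dump(loose_json_decode($sample));
```

    The advantage over per-field regexes is that the whole structure comes back as a PHP array, so nested objects and lists need no extra patterns, provided the input stays within the assumptions above.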

    Source: https://php.livejournal.com/675088.html
