Posted by comp.lang.tcl on 11/25/06 19:15
Bryan Oakley wrote:
> comp.lang.tcl wrote:
> > ... I do not understand XML parsing at all, DOM, SAX, xQuery,
> > X[whatever else], none of it makes sense to me and I cannot understand
> > the books, tutorials and online guides (not to mention the tips from
> > far smarter people than I) that has barraged me; all of it makes
> > absolutely no sense to me.
> >
> > At this point the only way I can parse XML is using PHP, because that's
> > literally the ONLY way I can do it, period!
> >
> > But if you want to see a sample of the XML I'm working with, here it
> > is:
> >
> > <?xml version="1.0" encoding="utf-8" ?><trivia><entry id="1101"
> > triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?"
> > answerID="1" correctAnswerID="1" answer="Believer"
> > expDate="1139634000"></entry><entry id="1102" triviaID="233"
> > question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2"
> > correctAnswerID="1" answer="Saviour Machine"
> > expDate="1139634000"></entry><entry id="1103" triviaID="233"
> > question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3"
> > correctAnswerID="1" answer="Seventh Avenue"
> > expDate="1139634000"></entry><entry id="1104" triviaID="233"
> > question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4"
> > correctAnswerID="1" answer="Inevitable End"
> > expDate="1139634000"></entry><entry id="1105" triviaID="233"
> > question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5"
> > correctAnswerID="1" answer="No such song existed"
> > expDate="1139634000"></entry>
>
> That data is malformed XML. For example, you are missing the closing
> </trivia> tag.
>
> Here's a solution that works with the above data. I've mentioned the
> "xml2list" proc a couple of times, but with the sample data I see your
> data will need a little extra pre-processing.
>
> Step 1: copy the proc "xml2list" from this page: http://mini.net/tcl/3919
OK, done.
>
> Step 2: enter the following, which takes the above data verbatim and
> stores it in a variable:
>
> set data {<?xml version="1.0" encoding="utf-8" ?><trivia><entry
> id="1101"
> triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?"
> answerID="1" correctAnswerID="1" answer="Believer"
> expDate="1139634000"></entry><entry id="1102" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2"
> correctAnswerID="1" answer="Saviour Machine"
> expDate="1139634000"></entry><entry id="1103" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3"
> correctAnswerID="1" answer="Seventh Avenue"
> expDate="1139634000"></entry><entry id="1104" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4"
> correctAnswerID="1" answer="Inevitable End"
> expDate="1139634000"></entry><entry id="1105" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5"
> correctAnswerID="1" answer="No such song existed"
> expDate="1139634000"></entry>}
>
> Your data is missing an ending </trivia> tag, so we have to add it for
> this specific example. I don't know if this is a problem you'll have to
> solve with your full dataset. Also, the xml2list proc doesn't like the
> leading <?xml...> stuff. So, let's modify your data:
Sorry, that was my fault: I left off the </trivia> tag when I copied and
pasted the sample here. The closing tags do exist in all of my XML files.
>
> # remove the leading <?xml...?> data
> regexp {<\?.*?\?>(.*$)} $data -- data
>
> # add a trailing </trivia> which is missing from
> # the sample data
> set data "$data</trivia>"
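Since the real files reportedly do have the closing tag, the two steps above could be wrapped so that the tag is only appended when it is actually missing. This is just a rough sketch: `prepareXml` is a made-up helper name (it is not part of xml2list), and it assumes the declaration, if any, is the very first thing in the data.

```tcl
# Hypothetical helper, not part of xml2list: strip a leading
# <?xml ... ?> declaration and close the root element if needed.
proc prepareXml {xml rootTag} {
    set xml [string trim $xml]

    # Drop the XML declaration when it starts the data.
    regsub {^<\?xml[^>]*\?>} $xml {} xml
    set xml [string trim $xml]

    # Append the closing root tag only when the data does not
    # already end with it.
    set close "</$rootTag>"
    set tailStart [expr {[string length $xml] - [string length $close]}]
    if {![string equal [string range $xml $tailStart end] $close]} {
        append xml $close
    }
    return $xml
}

set sample {<?xml version="1.0" encoding="utf-8" ?><trivia><entry id="1101"></entry>}
puts [prepareXml $sample trivia]
```

With a file that already ends in </trivia>, the same call should leave the tail alone, so it can be run on both the sample and the complete data.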
>
> And now, convert it to a list and print it out:
>
> set result [xml2list $data]
> puts $result
>
> If you didn't introduce any typos, you'll get the following output:
No, I don't. I get the following error, raised from within xml2list:
unmatched open quote in list
    while executing
"lindex $item 0"
    ("default" arm line 2)
    invoked from within
"switch -regexp -- $item {
^# {append res "{[lrange $item 0 end]} " ; #text item}
^/ {
regexp {/(.+)} $item -> ..."
    (procedure "xml2list" line 9)
This is what I did:
[TCL]
# USE xml2list PROC WITHIN THIS LIBRARY AS YOUR DEFAULT MEANS OF PARSING XML INTO TCL LIST
if {![string equal $switch -body] && [string length [info procs {xml2list}]] > 0} {
    regexp {<\?.*?\?>(.*$)} $contents -- contents
    return [xml2list $contents]
}
[/TCL]
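The trace alone does not say which part of the file is at fault, but the message itself comes from Tcl's list parser: `lindex` was handed a string that is not a well-formed list. The sketch below only reproduces that class of failure on made-up strings; `isList` is a hypothetical helper, and the `bad` value is an invented example, not a line from the real data.

```tcl
# Hypothetical helper: returns 1 if the string parses as a Tcl list.
proc isList {s} {
    expr {![catch {llength $s}]}
}

# Quotes in the middle of a word are fine for the list parser.
set good {entry id="1101" answer="Believer"}

# A word that starts with a double quote and never closes it is not.
set bad {entry answer "Saviour Machine}

puts [isList $good]
puts [isList $bad]

# The same call that fails inside xml2list, guarded with catch.
if {[catch {lindex $bad 0} msg]} {
    puts "lindex failed: $msg"
}
```

Running a check like `isList` on each `$item` inside xml2list (or printing `$item` just before the `lindex` call) might show which start tag in the real file the parser chokes on.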
>
> trivia {} {{entry {id 1101 triviaID 233 question {Who wrote
> &quot;Trilogy of Knowledge&quot;?} answerID 1 correctAnswerID 1 answer
> Believer expDate 1139634000} {}} {entry {id 1102 triviaID 233 question
> {Who wrote &quot;Trilogy of Knowledge&quot;?} answerID 2 correctAnswerID
> 1 answer {Saviour Machine} expDate 1139634000} {}} {entry {id 1103
> triviaID 233 question {Who wrote &quot;Trilogy of Knowledge&quot;?}
> answerID 3 correctAnswerID 1 answer {Seventh Avenue} expDate 1139634000}
> {}} {entry {id 1104 triviaID 233 question {Who wrote &quot;Trilogy of
> Knowledge&quot;?} answerID 4 correctAnswerID 1 answer {Inevitable End}
> expDate 1139634000} {}} {entry {id 1105 triviaID 233 question {Who wrote
> &quot;Trilogy of Knowledge&quot;?} answerID 5 correctAnswerID 1 answer
> {No such song existed} expDate 1139634000} {}}}
>
> The above is a valid Tcl list that you can now process with normal Tcl
> list-handling commands. Do *not* process this list with string
> transformations (such as converting &quot; to a quote). If you do, you
> run the risk of breaking its list-ness. Instead, loop over the data and
> do the conversion as a final step on an element-by-element basis.
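One way the element-by-element approach might look is sketched here. The `result` value is a shortened stand-in for the list printed above, and the `&quot;` entities in it are an assumption about what the real attribute values contain. The variable names are only illustrative.

```tcl
# Shortened stand-in for the xml2list output (assumed shape:
# tag, attribute list, list of child nodes).
set result {trivia {} {
    {entry {id 1101 question {Who wrote &quot;Trilogy of Knowledge&quot;?} answerID 1 answer Believer} {}}
    {entry {id 1102 question {Who wrote &quot;Trilogy of Knowledge&quot;?} answerID 2 answer {Saviour Machine}} {}}
}}

# Split the root node into its three parts.
foreach {rootTag rootAttrs entries} $result break

set answers {}
foreach entry $entries {
    foreach {tag attrs children} $entry break

    # Load this entry's attributes into a fresh array.
    array unset a
    array set a $attrs

    # Decode entities on the extracted value only, never on the
    # list as a whole.
    set question [string map {&quot; \" &amp; &} $a(question)]
    lappend answers $a(answerID) $a(answer)
}

puts $question
puts $answers
```

Because the entity decoding happens after `array set` has pulled each value out, a quote inside a question can no longer unbalance the surrounding list.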
>
> Does this help? It's not robust; the xml2list assumes you have proper
> xml with a balanced set of tags (or in the specific case in this
> message, with a missing </trivia> tag). Hopefully, though, it will at
> least get you started.
Like I said, I do have balanced XML (I just didn't reproduce it here, that's
all), but xml2list produces errors when I try to read it.
Phil