XML Parser

Original Message
Name: WCWidgets
Date: June 13, 2008 at 11:16:52 Pacific
Subject: XML Parser
OS: Mac OS X v10.4.11
CPU/Ram: 500 MHz/512 MB RAM
Model/Manufacturer: Apple/iMac G3
Comment:

I am developing a Dashboard widget and need it to parse a certain XML file online. Here is
the file's code. The values that I need are "text" (from child "post") and "avatar" and
"userid" (both from child "userinfo").

<?xml version="1.0" encoding="UTF-8"?>
<posts type="array">
<post>
<posted_at>Thu Jun 12 02:40:04 +0000 2008</posted_at>
<post_id>987654</post_id>
<text>Posting text...</text>
<post_from>Forum</post_from>
<shortened>no</shortened>
<reply_to>John Smith</reply_to>
<reply_to_id>123456</reply_to_id>
<starred>no</starred>
<userinfo>
<id>456789</id>
<real_name>John Doe</real_name>
<userid>JohnDoe</userid>
<from>Paris, France</from>
<signature>Thanks, John</signature>
<avatar>john.png</avatar>
<url>http://sample.johndoe.net</url>
<private>No</private>
<rss_subscribed>Yes</rss_subscribed>
</userinfo>
</post>

(more posts exist)...

The parser I used was originally written to parse an RSS feed, but I wanted to modify it,
since the XML file carries more information, including the avatar. Here is the original code
for the parser (this was a pre-canned script, and I did not build it):

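// xml and setHTML() are used here as globals; they are assumed to be defined
// elsewhere in the widget.
function parse(){
    var s = "";
    var items = xml.getElementsByTagName("post");

    for (var i = 0; i < items.length; i++)
    {
        var item = items[i];

        // Map each direct child's tag name to its text.
        var vals = {};
        for (var j = 0; j < item.childNodes.length; j++)
        {
            var jj = item.childNodes[j];
            vals[jj.nodeName] = txt(jj);
        }

        // Append one block of HTML per post.
        s += '<div class="post">' + vals["text"] + '</div>';
    }

    setHTML("results", s);
}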

function txt(x)
{
    // Return the text of the node's first child, or an empty string if there is none.
    if (x && x.firstChild)
        return x.firstChild.nodeValue;
    else
        return "";
}

I was able to get results simply by changing the getElementsByTagName argument to "post"
instead of the standard "item". This gives me the actual post text without a problem. However,
it does not give me the user's ID or avatar, since they are children of "userinfo", not "post".
I have tried creating a second parsing pass (using the variables "useritems" and "uservals",
and substituting all the other variables as well), but this leads to either the avatar and user
ID displaying improperly or the text displaying improperly (depending on whether I insert the
new code before or after the "item" section).

I am very new to XML parsing, so I am not sure how to fix this problem. I am sure it has to
do with child and node mix-ups, but, again, I am not sure how to fix them. Essentially, I
need the parsing function to get the values of nodes "userid" and "avatar" of child "userinfo"
AND get the values of node "text" of child "post". I know I need to re-write the parser, and I
have seen parsers such as Xparse that can parse the XML correctly, but I don't know how to
implement them.

Any assistance would be greatly appreciated.
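
One way the requirement described in this message might be met with the same DOM calls the
script already uses is to look up "userinfo" inside each "post" and read "userid" and "avatar"
from there. This is only a sketch: it assumes the document passed in is the parsed XML (as
"xml" is in parse()), and firstText and collectPosts are hypothetical helper names, not part
of the original script.

```javascript
// Sketch only: firstText and collectPosts are hypothetical helpers,
// not part of the original pre-canned script.

// Return the text of the first <tagName> descendant of parent, or "" if absent.
function firstText(parent, tagName) {
    var nodes = parent.getElementsByTagName(tagName);
    if (nodes.length > 0 && nodes[0].firstChild) {
        return nodes[0].firstChild.nodeValue;
    }
    return "";
}

// Walk every <post>, taking "text" from the post itself and
// "userid" and "avatar" from its nested <userinfo> element.
function collectPosts(doc) {
    var result = [];
    var items = doc.getElementsByTagName("post");
    for (var i = 0; i < items.length; i++) {
        var item = items[i];
        var info = item.getElementsByTagName("userinfo")[0];
        result.push({
            text: firstText(item, "text"),
            userid: info ? firstText(info, "userid") : "",
            avatar: info ? firstText(info, "avatar") : ""
        });
    }
    return result;
}
```

Each entry returned by collectPosts(xml) could then be fed into the existing HTML-building
line in parse(), using its text, userid, and avatar properties instead of vals["text"].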



Response Number 1
Name: FishMonger
Date: June 14, 2008 at 01:28:52 Pacific
Reply:

It might help if you see how it's done in another language. The link to the parser source code is at the bottom.


#!/usr/bin/perl

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

my $xml = XMLin('post.xml');

# let's print a single element
print $xml->{post}{userinfo}{userid}, "\n\n";

# let's dump the entire data
print Dumper $xml;

================================
Here's the output of that script


JohnDoe

$VAR1 = {
'post' => {
'starred' => 'no',
'reply_to_id' => '123456',
'shortened' => 'no',
'userinfo' => {
'real_name' => 'John Doe',
'userid' => 'JohnDoe',
'private' => 'No',
'avatar' => 'john.png',
'rss_subscribed' => 'Yes',
'signature' => 'Thanks, John',
'from' => 'Paris, France',
'url' => 'http://sample.johndoe.net',
'id' => '456789'
},
'posted_at' => 'Thu Jun 12 02:40:04 +0000 2008',
'post_from' => 'Forum',
'text' => 'Posting text...',
'reply_to' => 'John Smith',
'post_id' => '987654'
},
'type' => 'array'
};
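
For comparison, a nested structure similar to the dump above could be built from the DOM in
JavaScript. The following is a rough sketch, not a port of XML::Simple: toObject is a
hypothetical helper name, attributes (such as type="array") are ignored, and repeated child
tags are collected into an array.

```javascript
// Sketch only: toObject is a hypothetical helper, not part of XML::Simple
// or of the original widget script.

// Convert a DOM element into a nested plain object keyed by child tag name.
// An element with no child elements collapses to its text content.
function toObject(element) {
    var obj = {};
    var text = "";
    var sawElement = false;
    for (var i = 0; i < element.childNodes.length; i++) {
        var child = element.childNodes[i];
        if (child.nodeType === 1) {            // 1 = element node
            sawElement = true;
            var name = child.nodeName;
            var value = toObject(child);
            if (!Object.prototype.hasOwnProperty.call(obj, name)) {
                obj[name] = value;             // first occurrence of this tag
            } else if (obj[name] instanceof Array) {
                obj[name].push(value);         // third or later occurrence
            } else {
                obj[name] = [obj[name], value]; // second occurrence: start an array
            }
        } else if (child.nodeType === 3) {     // 3 = text node
            text += child.nodeValue;
        }
    }
    return sawElement ? obj : text;
}
```

With a single post, toObject(xml.documentElement).post.userinfo.userid would read the same
value as the Perl lookup; with several posts, "post" becomes an array and needs an index.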

http://search.cpan.org/~grantm/XML-...
http://search.cpan.org/src/GRANTM/X...



Response Number 2
Name: WCWidgets
Date: June 26, 2008 at 16:53:40 Pacific
Reply:

OK, thank you. I've re-implemented the parser, since Perl is a lot easier to understand and
manipulate than pre-programmed DOM functions.

