--- In searchmonkey-developers@yahoogroups.com, "Robert Crowther"
<robertc@...> wrote:
>
> I'm trying to get the text inside an option element as part of an
> XPath (example URL is http://googleblog.blogspot.com/). In XPather
> both of these give the text I want (currently 'September 2008 (51)'):
>
> //div[@id='BlogArchive1_ArchiveList']/select/option[2]
> //div[@id='BlogArchive1_ArchiveList']/select/option[2]/text()
>
> But in the "Define your page extraction rules" part of the Custom Data
> Service developer tool for either one I get the message "WARNING:
> Element meta with property = dc:title has no content."
>
> I've checked the input using the link on the page, but it looks the
> same as the code I'm seeing with XPather.
>
> Have I done something wrong or is this a bug? If I've done
> something dumb, what is the correct way to get at the text of an
> option element?
>
> Rob
>
What you have looks correct. I tried it and got the same error. I
think the problem is that Google's blog HTML is ridiculously messed
up. I mean, they are using single quotes for most of their attributes!
Check out the result of the W3C validator:
http://validator.w3.org/check?uri=http%3A%2F%2Fgoogleblog.blogspot.com%2F&charset=(detect+automatically)&doctype=Inline&group=0
356 Errors, 155 warning(s)
Even trying to just get the div you mentioned gives an error in the
dev tool:
//div[@id='BlogArchive1_ArchiveList']
I would tell you to make a simple HTML page with just the <div> and
<option>s from the Google blog, upload it somewhere, and use that page
to test your data service, but... I think all URLs must be in the
Yahoo index before you can use them in the dev tool.
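
In the meantime, one way to sanity-check the XPath itself, away from
the dev tool, is to run it locally against a stripped-down copy of the
markup. The following is only a rough sketch in Python using the
standard library's xml.etree.ElementTree. The snippet is hand-written
and hypothetical (the real page's markup may differ), the option
values and the first and third option labels are made up, and
ElementTree only supports a subset of XPath: it wants a relative path
starting with "." and has no text() step, so the text is read from the
element instead. It says nothing about how the dev tool parses the
real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written stand-in for the archive widget.
# Only the div id and the 'September 2008 (51)' label come from the
# thread; everything else here is a placeholder.
SNIPPET = """
<html>
  <body>
    <div id='BlogArchive1_ArchiveList'>
      <select>
        <option value=''>Archive</option>
        <option value='placeholder-1'>September 2008 (51)</option>
        <option value='placeholder-2'>Another month</option>
      </select>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(SNIPPET)

# ElementTree rejects absolute paths on an element, so the original
# "//div[...]" is written as ".//div[...]" here. option[2] selects the
# second <option> child of <select>, as in the original expression.
option = root.find(".//div[@id='BlogArchive1_ArchiveList']/select/option[2]")

# No text() step in ElementTree's XPath subset; use the .text attribute.
option_text = option.text if option is not None else None
print(option_text)
```

If this prints the option label for a well-formed snippet, the
expression is probably fine and the trouble is more likely in how the
page itself gets parsed.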