The correct way to validate SearchMonkey markup is to use the validator. The raw data presented by the [Cached] link is not, and has never been, guaranteed to represent the actual RDFa data stored in the system.
As for that specific product example link, it validates in the SearchMonkey validator, and the cache is now in sync:
Best,
Evan Goer
Yahoo! SearchMonkey Team
On Sep 25, 2009, at 7:48 AM, Martin Hepp (UniBW) wrote:
Dear Evan:
I was just trying to investigate why the page
http://www.heppnetz.de/searchmonkey/product.html
does not appear in the Yahoo search results with SearchMonkey enhancements.
The markup is fine when checked with the Yahoo validator.
While doing so, I found that in the CACHED version, all RDFa attribute names (and the element names) are converted to upper case, e.g.
<span typeof="gr:UnitPriceSpecification">
was turned into
<SPAN TYPEOF="gr:UnitPriceSpecification">
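Upper-casing matters because XHTML requires lower-case element and attribute names, and XML names are case-sensitive. The following is a minimal Python sketch, assuming a consumer that parses the page as XML. It is not how PyRDFa or the SearchMonkey validator is implemented. The helper `rdfa_typeof` and the sample strings are hypothetical. The cached fragment here is kept well-formed on purpose, so that only the case change is tested.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample fragments based on the example quoted above.
original = '<span typeof="gr:UnitPriceSpecification"></span>'
cached = '<SPAN TYPEOF="gr:UnitPriceSpecification"></SPAN>'


def rdfa_typeof(markup):
    """Parse markup as XML and return its lower-case 'typeof' attribute, if any."""
    element = ET.fromstring(markup)
    # XML attribute names are case-sensitive: TYPEOF is not the same as typeof.
    return element.get("typeof")


print(rdfa_typeof(original))  # gr:UnitPriceSpecification
print(rdfa_typeof(cached))    # None
```

Under this assumption, the upper-cased copy parses without error but no longer carries a `typeof` attribute that an RDFa processor would look for.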
Even worse, closing span/div tags are mostly omitted, and some of those that remain are in lower case, so the opening and closing tags don't match.
This happens in particular for elements that have no visible content, e.g.
<span property="gr:validFrom" content="2009-07-20T00:00:00Z"></span>
turns into
<SPAN PROPERTY="gr:validFrom" CONTENT="2009-07-20T00:00:00Z">
or
<SPAN PROPERTY="gr:validFrom" CONTENT="2009-07-20T00:00:00Z">
</span>
Consequently, PyRDFa cannot parse the cached document and when I submit the CACHED version to your validator, it does not find any markup (see screenshot).
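To check a saved copy of a page for the unmatched tags described above, a small stack-based check is enough. The following is a minimal sketch using Python's standard `html.parser`, which passes tag names to its handlers in lower case. Because of that, it ignores the case problem and isolates the missing or mismatched end tags. `TagBalanceChecker`, `check_tag_balance`, and the `cached` fragment are hypothetical. The fragment is assembled from the examples quoted above and is not the actual cached page.

```python
from html.parser import HTMLParser


class TagBalanceChecker(HTMLParser):
    """Track open <span>/<div> elements and record end tags that don't match."""

    def __init__(self, tags=("span", "div")):
        super().__init__()
        self.tags = set(tags)
        self.open_stack = []   # tracked elements still waiting for an end tag
        self.mismatches = []   # (expected, found) for end tags that don't match

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so SPAN and span count alike.
        if tag in self.tags:
            self.open_stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.tags:
            return
        if self.open_stack and self.open_stack[-1] == tag:
            self.open_stack.pop()
        else:
            expected = self.open_stack[-1] if self.open_stack else None
            self.mismatches.append((expected, tag))


def check_tag_balance(markup):
    """Return (unclosed_elements, mismatched_end_tags) for the given markup."""
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.open_stack, checker.mismatches


original = ('<span typeof="gr:UnitPriceSpecification">'
            '<span property="gr:validFrom" content="2009-07-20T00:00:00Z"></span>'
            '</span>')
cached = ('<SPAN TYPEOF="gr:UnitPriceSpecification">'
          '<SPAN PROPERTY="gr:validFrom" CONTENT="2009-07-20T00:00:00Z">'
          '</span>')

print(check_tag_balance(original))  # ([], [])
print(check_tag_balance(cached))    # (['span'], [])
```

A non-empty first list means some element was never closed. A non-empty second list means an end tag arrived that does not match the innermost open element.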
===> Can you please investigate this?
I think this is a really high-priority issue, because you will create a lot of frustration among early adopters if their correct markup does not appear in Yahoo due to errors in the crawler or index.
Ideally, please submit
http://www.heppnetz.de/searchmonkey/product.html
to the crawler with high priority and check the results; if it works, please let me know.
For the community, it will likely be okay to wait a couple of weeks for inclusion or an update. But it's really difficult to adopt the technology if it remains unclear whether problems on Yahoo's side are causing the observed issues, while each debugging iteration requires such a delay.
Best
Martin
--
--------------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen
e-mail: mhepp@...
phone: +49-(0)89-6004-4217
fax: +49-(0)89-6004-4620
www: http://www.unibw.de/ebusiness/ (group)
http://www.heppnetz.de/ (personal)
skype: mfhepp twitter: mfhepp
Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
Webcast:
http://www.heppnetz.de/projects/goodrelations/webcast/
Recipe for Yahoo SearchMonkey:
http://tr.im/rAbN
Talk at the Semantic Technology Conference 2009: "Semantic Web-based E-Commerce: The GoodRelations Ontology"
http://tinyurl.com/semtech-hepp
Talk at
Overview article on Semantic Universe:
http://tinyurl.com/goodrelations-universe
Project page:
http://purl.org/goodrelations/
Resources for developers:
http://www.ebusiness-unibw.org/wiki/GoodRelations
Tutorial materials:
CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey http://tr.im/grcec09
<Picture 10.png>
This page is a demo of how a small business can feed its product and offer descriptions into Yahoo! SearchMonkey and the Web of Linked Data.
Provided by the E-Business & Web Science Research Group (Prof. Hepp) at Universitaet der Bundeswehr Muenchen, www.unibw.de/ebusiness/.
Price: 34.99 USD
Product Name: Personal SCSI 16-bit SCSI Controller
Description: This low-cost, high-performance SCSI controller allows you to connect up to seven professional mass-storage devices to your computer.
EAN/UPC code: 00010363780
Article number: 10363780
Manufacturer: Hepp Computertechnik
Product Specifications
FAQ
Product Manual
Product Reviews:
Average: 4.5, lowest: 0, highest: 5 (total number of reviews: 45)