searchmonkey-developers · SearchMonkey - Developer Group
Re: Important Problem with the Yahoo Crawler?
Message #461 of 483
Hello Martin,

The correct way to validate SearchMonkey markup is by using the validator. The raw data presented by the [Cached] link is not, and has never been, guaranteed to represent the actual RDFa data stored in the system.

As for that specific product example link, it is validating in the SearchMonkey validator and the cache is now in sync:

Best,
Evan Goer
Yahoo! SearchMonkey Team



On Sep 25, 2009, at 7:48 AM, Martin Hepp (UniBW) wrote:

Dear Evan:

I was just trying to investigate why the page

http://www.heppnetz.de/searchmonkey/product.html

does not appear in the Yahoo search results with SearchMonkey enhancements.

The markup is fine when checked with the Yahoo validator.

While doing so, I found that in the CACHED version, all element and RDFa attribute names are converted to uppercase, e.g.

<span typeof="gr:UnitPriceSpecification">

was turned into

<SPAN TYPEOF="gr:UnitPriceSpecification">

Even worse, closing span/div tags are mostly omitted, and some remain in lowercase, so that the opening and closing tags don't match.

This happens in particular for elements that have no visible content, e.g.

<span property="gr:validFrom" content="2009-07-20T00:00:00Z"></span>

turns into

<SPAN PROPERTY="gr:validFrom" CONTENT="2009-07-20T00:00:00Z">

or

<SPAN PROPERTY="gr:validFrom" CONTENT="2009-07-20T00:00:00Z">
</span>

Consequently, PyRDFa cannot parse the cached document and when I submit the CACHED version to your validator, it does not find any markup (see screenshot).
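
To illustrate why the mangled markup above breaks a strict parser, here is a minimal sketch. It does not use PyRDFa; Python's standard-library XML parser stands in for any XML-based RDFa processor, and the wrapping div element and the helper name is_well_formed are assumptions added for illustration only.

```python
import xml.etree.ElementTree as ET

# The markup as authored: lowercase names, explicit closing tag.
ORIGINAL = (
    '<div><span property="gr:validFrom" '
    'content="2009-07-20T00:00:00Z"></span></div>'
)

# Cached variant 1: names uppercased and the closing SPAN tag dropped.
CACHED_UNCLOSED = (
    '<DIV><SPAN PROPERTY="gr:validFrom" '
    'CONTENT="2009-07-20T00:00:00Z"></DIV>'
)

# Cached variant 2: uppercase opening tag, lowercase closing tag.
CACHED_MIXED = (
    '<div><SPAN PROPERTY="gr:validFrom" '
    'CONTENT="2009-07-20T00:00:00Z">\n</span></div>'
)

# Names uppercased consistently: well-formed, but the attribute names change.
CACHED_UPPER = (
    '<DIV><SPAN PROPERTY="gr:validFrom" '
    'CONTENT="2009-07-20T00:00:00Z"></SPAN></DIV>'
)


def is_well_formed(fragment: str) -> bool:
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


for name, fragment in [
    ("original", ORIGINAL),
    ("cached, closing tag omitted", CACHED_UNCLOSED),
    ("cached, mixed-case tags", CACHED_MIXED),
    ("cached, consistently uppercase", CACHED_UPPER),
]:
    print(f"{name}: well-formed = {is_well_formed(fragment)}")

# XML names are case-sensitive, so even the well-formed uppercase variant
# no longer exposes a lowercase "property" attribute to a consumer.
upper_span = ET.fromstring(CACHED_UPPER)[0]
print("lowercase 'property' attribute found:", upper_span.get("property"))
```

Only the original and the consistently uppercased fragment parse. In the latter, a lookup for the lowercase attribute name finds nothing, which is consistent with the validator reporting no markup for the cached version.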

===>  Can you please investigate this?

I think this is really a high-priority issue, because you will create a lot of frustration among early adopters if their proper markup does not appear in Yahoo due to errors in the crawler or index.

Ideally, please submit

http://www.heppnetz.de/searchmonkey/product.html

to the crawler with high priority and check the results - if it works, please report.

For the community, it will likely be okay to wait a couple of weeks for inclusion or update. But it's really difficult to adopt the technology if it remains unclear whether problems on Yahoo's side are the source of the observed problems, while debugging requires such a delay between iterations.

Best

Martin


--
--------------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  mhepp@...
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
       http://www.heppnetz.de/ (personal)
skype:   mfhepp twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================

Webcast:
http://www.heppnetz.de/projects/goodrelations/webcast/

Recipe for Yahoo SearchMonkey:
http://tr.im/rAbN

Talk at the Semantic Technology Conference 2009: "Semantic Web-based E-Commerce: The GoodRelations Ontology"
http://tinyurl.com/semtech-hepp

Overview article on Semantic Universe:
http://tinyurl.com/goodrelations-universe

Project page:
http://purl.org/goodrelations/

Resources for developers:
http://www.ebusiness-unibw.org/wiki/GoodRelations

Tutorial materials:
CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey http://tr.im/grcec09

<Picture 10.png>

This page is a demo of how a small business can feed its product and offer descriptions into Yahoo! SearchMonkey and the Web of Linked Data.

Provided by the E-Business & Web Science Research Group (Prof. Hepp) at Universitaet der Bundeswehr Muenchen, www.unibw.de/ebusiness/.

Price: 34.99 USD
Product Name: Personal SCSI 16-bit SCSI Controller
Description: This low-cost, high-performance SCSI controller allows you to connect up to seven professional mass-storage devices to your computer.
EAN/UPC code: 00010363780
Article number: 10363780
Product Image
Manufacturer: Hepp Computertechnik

Product Specifications
FAQ
Product Manual
Product Reviews:

Average: 4.5, lowest: 0, highest: 5 (total number of reviews: 45)



Fri Sep 25, 2009 5:02 pm



Dear Evan: ... Okay, thanks - good to know. ... When I invoke the Yahoo validator, I still get the message "1. Cached Data Does Not Match The data stored...
Martin Hepp (UniBW)
hepp_m
Offline Send Email
Sep 27, 2009
8:22 pm