Hi All,
I have spotted a bug in the URL escaping code in the BOSS API. In the XML
fragment below, the <url>http://...</url> element contains two unescaped
backticks (`), which is not legal in a URI.
I call it a bug because http://www.ietf.org/rfc/rfc2396.txt (page 10,
section "2.4.3. Excluded US-ASCII Characters") lists the backtick among the
"unwise" characters, which must be escaped before they can appear in a URI.
<!--
<result>
<abstract><![CDATA[Press Release Summary: A scheme piloted by
World-Dating-Partner.com has recently <b>...</b> Somerset, <b>TA7 8LH</b>.
Website: http://www.world-dating-partne<wbr>r.com. Tel: <b>...</b>]]></abstract>
<clickurl>http://lrd.yahooapis.com/_ylc=X3oDMTVmYzBuMzNqBF9TAzIwMjMxNTI3MDIEYXBw\
aWQDdDBZdnZFalYzNEdpNGIwVjN6RzBsMHlLTXJ5OXNvMVdEVnBaN1R1ZVhkeTE0N1ptSWZZaEI2Y1A0\
RGNaR1ZOSVlyWEhUQ0UEY2xpZW50A2Jvc3MEc2VydmljZQNCT1NTBHNsawN0aXRsZQRzcmNwdmlkAzc2\
MXRJbUtJY3JwZ1hINDFZcTRHSHl3ZXNoRWdRRXhRVHJzQUNZTjc-/SIG=14p4eg9li/**http%3A//wa\
shington-press-release.com/41/Study%2520Addresses%2520%60Cross-Selling%60%2520Wi\
thin%2520the%2520Dating,%2520Adult%2520Dating%2520Arena.php</clickurl>
<date>2010/07/01</date>
<dispurl><![CDATA[<b>washington-press-release.com</b>/<wbr>41/Study
Addresses `Cross-Se<b>...</b>]]></dispurl>
<size>57368</size>
<title><![CDATA[Study Addresses `Cross-Selling` Within the Dating, Adult
<b>...</b>]]></title>
<url>http://washington-press-release.com/41/Study%20Addresses%20`Cross-Selling`%\
20Within%20the%20Dating,%20Adult%20Dating%20Arena.php</url></result>
-->
I guess the important part is:
<!--
<url>http://washington-press-release.com/41/Study%20Addresses%20`Cross-Selling`%\
20Within%20the%20Dating,%20Adult%20Dating%20Arena.php</url>
-->
Sadly, it is not as simple as me URL decoding and then URL encoding the field,
but I will make do with having to manually escape this character for now.
Is there a bug tracker I should be using? Or is there a list of known illegal
characters which the BOSS API returns which I can escape by hand?
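For the escaping by hand, a minimal sketch in Python follows. It assumes the
only problem characters are the "unwise" set from RFC 2396 section 2.4.3, and
it leaves existing escapes such as %20 untouched. The function name
escape_unwise is made up for illustration and is not part of the BOSS API.

```python
import re

# The "unwise" characters from RFC 2396 section 2.4.3: { } | \ ^ [ ] `
# Assumption: these are the only characters BOSS leaves unescaped.
_UNWISE = re.compile(r'[{}|\\^\[\]`]')


def escape_unwise(url):
    """Percent-encode unwise characters, leaving existing %XX escapes alone."""
    return _UNWISE.sub(lambda m: "%%%02X" % ord(m.group(0)), url)


raw = ("http://washington-press-release.com/41/"
       "Study%20Addresses%20`Cross-Selling`%20Within%20the"
       "%20Dating,%20Adult%20Dating%20Arena.php")
fixed = escape_unwise(raw)
print(fixed)
```

Because the '%' character itself is never touched, running the function twice
on the same URL gives the same result as running it once.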
I also noticed that in the <clickurl> element the backtick is correctly
encoded as %60, which is odd I guess ...
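One possible explanation, offered as a guess rather than anything documented:
the <clickurl> seems to carry the target after the "**" marker as an escaped
parameter, so the whole <url> value is encoded one more time there (%20 becomes
%2520, ` becomes %60). The Python check below takes both values from the
fragment above, with the line wraps joined, decodes the target once, and
compares it with the <url> value.

```python
from urllib.parse import unquote

# Target part of <clickurl>, i.e. everything after the "**" marker.
clickurl_target = ("http%3A//washington-press-release.com/41/"
                   "Study%2520Addresses%2520%60Cross-Selling%60%2520Within%2520the"
                   "%2520Dating,%2520Adult%2520Dating%2520Arena.php")

# Content of the <url> element, raw backticks included.
url_element = ("http://washington-press-release.com/41/"
               "Study%20Addresses%20`Cross-Selling`%20Within%20the"
               "%20Dating,%20Adult%20Dating%20Arena.php")

# Undo exactly one level of percent-encoding and compare.
decoded = unquote(clickurl_target)
matches = decoded == url_element
print(matches)
```

If that reading is right, the %60 in <clickurl> comes from the extra encoding
pass, not from a separate and correct escaper, and the raw backticks are still
present in the underlying value.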
Anyway, cheers people, keep up the good work!
Mischa