So, I was reviewing a few Sitecore log files for a customer of ours, and I kept coming across the following entries.
ManagedPoolThread #15 10:26:56 WARN The serachengine "Http://google.com/webmasters/sitemaps/ping?sitemap=sitemap.xml" returns an 404 error
ManagedPoolThread #15 10:26:57 WARN The serachengine "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=sitemap.xml" returns an 404 error
This struck me as interesting. These are calls to different search providers to tell them that your site’s sitemap file has been updated and should be read again (so that new content can be indexed). This all comes from a nice little module on Sitecore Marketplace, which we use quite a lot. However, I spotted a few issues…
Invalid sitemap path
Firstly, the sitemap paths … aren’t full paths. A bare “sitemap.xml” only tells the web services that there is a sitemap somewhere on the Internet that needs to be re-read, not which site it belongs to. The full URL should be used, something like:
http://google.com/webmasters/sitemaps/ping?sitemap=http://www.example.com/sitemap.xml
So why wasn’t that being used?
“serverUrl” setting not used for search engine ping
The SitemapXML.config file contains a setting “serverUrl” that lets you set, well, the root path for a sitemap. This is then used in the sitemap xml file itself, which is good.
Unfortunately, it is not used in the search engine ping, hence the problem above. I decompiled the SitemapXML module’s code to check, and it really isn’t used in the ping code at all.
Side-note: The search engine ping won’t happen at all unless the ‘productionEnvironment’ setting is set to ‘true’.
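To illustrate what using “serverUrl” in the ping could look like, here is a minimal sketch. It is written in Python rather than the module’s own .NET code, and the function name, parameter names and the percent-encoding step are my assumptions, not something taken from the module (the example URL above leaves the sitemap address unencoded).

```python
from urllib.parse import quote, urljoin


def build_ping_url(ping_endpoint, server_url, sitemap_file="sitemap.xml"):
    """Return the ping URL with the sitemap's absolute, encoded address appended.

    All names here are hypothetical; server_url plays the role of the
    "serverUrl" setting from SitemapXML.config.
    """
    # Make sure the base ends in "/" so urljoin appends to it
    # instead of replacing its last path segment.
    base = server_url if server_url.endswith("/") else server_url + "/"
    absolute_sitemap = urljoin(base, sitemap_file.lstrip("/"))
    # Percent-encode the sitemap URL so it travels as one query-string value.
    return ping_endpoint + quote(absolute_sitemap, safe="")


print(build_ping_url(
    "http://google.com/webmasters/sitemaps/ping?sitemap=",
    "http://www.example.com",
))
```

The point is simply that the ping needs both pieces of information: the search engine’s endpoint and the site’s own root URL.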
Error handling has an Error…
During decompilation I checked the code that actually performs the ping to the search engines. Basically, it creates an HttpWebRequest object, and then calls GetResponse(). So far, so normal…
Unfortunately, if there is an exception of ANY sort, it logs a message saying that the request returned a 404 error. This is sad: the calls to the Google ping service were actually returning a much more informative 400 (Bad Request).
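For comparison, here is a rough sketch of error handling that reports what actually happened instead of assuming a 404. Again, this is Python using the standard urllib library, not the module’s decompiled code, and the function names and message wording are made up for illustration.

```python
import urllib.error
import urllib.request


def describe_failure(url, exc):
    """Build a log message that reflects what actually went wrong."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, so report its real status code (400, 404, ...).
        return f'The search engine "{url}" returned HTTP {exc.code} ({exc.reason})'
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all: DNS failure, refused connection, and so on.
        return f'The search engine "{url}" could not be reached: {exc.reason}'
    # Anything else is logged as-is rather than disguised as a 404.
    return f'The search engine "{url}" ping failed: {exc!r}'


def ping(url, timeout=10):
    """Send the ping and return a log line describing the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f'The search engine "{url}" returned HTTP {response.status}'
    except Exception as exc:
        return describe_failure(url, exc)
```

With something along these lines, the Google failures would have shown up in the log as a 400, which points straight at the malformed sitemap parameter.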
Yahoo’s service no longer exists…
The Sitemap XML module comes pre-configured with 3 search providers that it will ping (or try to): Google, Yahoo and Live Search.
Unfortunately, Yahoo’s ping service now uses Bing, so it is redundant, and a ping for Bing would be useful. Oh, wait, the ‘Live Search’ ping is actually forwarded to Bing, so that works, if you submit the right URL. Might as well use the Bing URL of…
http://www.bing.com/webmaster/ping.aspx?siteMap=
…directly too.
The Fix…
So, to fix this:
- Remove Yahoo. You don’t need it.
- Update Live Search to use Bing.
- Update the URLs being pinged in Sitecore to include the site’s absolute path (e.g. sitemap=http://www.example.com/sitemap.xml).
- Republish all the stuff you just updated.