Awkward Sitemap XML module

So, I was reviewing a few Sitecore log files for a customer of ours, and I kept coming across the following entries.

ManagedPoolThread #15 10:26:56 WARN The serachengine "Http://google.com/webmasters/sitemaps/ping?sitemap=sitemap.xml" returns an 404 error
ManagedPoolThread #15 10:26:57 WARN The serachengine "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=sitemap.xml" returns an 404 error

This struck me as interesting. These are calls to different search providers to tell them that your site’s sitemap file has been updated and should be read again (so that new content can be indexed). This all comes from a nice little module on the Sitecore Marketplace. We use it quite a lot. However, I spotted a few issues…

Invalid sitemap path

Firstly, the sitemap paths … aren’t paths. A bare “sitemap.xml” only tells the web service that there is a sitemap somewhere on the Internet that needs updating, not which site it belongs to. The full URL should be used, something like:

http://google.com/webmasters/sitemaps/ping?sitemap=http://www.example.com/sitemap.xml
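
As a rough illustration, the ping URL can be assembled from the ping endpoint and the sitemap's absolute URL. The sketch below is in Python purely for brevity (the module itself is .NET), and build_ping_url is a hypothetical helper, not something from the module. It also percent-encodes the sitemap URL; that is my assumption about good practice for a query-string value, not something the module is known to do.

```python
from urllib.parse import quote, urljoin


def build_ping_url(ping_endpoint: str, server_url: str, sitemap_file: str) -> str:
    """Append the sitemap's absolute URL to a search engine ping endpoint.

    build_ping_url is a hypothetical helper for illustration only.
    """
    # Resolve the file name against the site root, e.g.
    # "http://www.example.com/" + "sitemap.xml".
    absolute_sitemap = urljoin(server_url.rstrip("/") + "/", sitemap_file)
    # Percent-encode so the sitemap URL is a valid query-string value.
    return ping_endpoint + quote(absolute_sitemap, safe="")


print(build_ping_url(
    "http://google.com/webmasters/sitemaps/ping?sitemap=",
    "http://www.example.com",
    "sitemap.xml",
))
```

The point is only that the site root (what “serverUrl” would hold) has to be part of the value sent to the search engine.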

So why wasn’t that being used?

“serverUrl” setting not used for search engine ping

The SitemapXML.config file contains a setting “serverUrl” that lets you set, well, the root URL for a sitemap. This is then used in the sitemap XML file itself, which is good.

 

Unfortunately, this is not used in the search engine ping, hence the problem above. I decompiled the SitemapXML module’s code to check, and it really isn’t used in the ping bit at all.

Side-note: The search engine ping won’t happen at all unless the ‘productionEnvironment’ setting is set to ‘true’.

Error handling has an Error…

During decompilation I checked the code that actually performs the ping to the search engines. This is it below. Basically, it creates an HttpWebRequest object, and then calls GetResponse(). So far, so normal…

[Screenshot: the decompiled ping code]

Unfortunately, if there is an exception OF ANY sort, it logs a message saying that the request returned a 404 error. This is sad, because the calls to the Google ping service were actually returning a much more informative 400 (Bad Request).
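
A more honest version of that error handling would log what actually went wrong. The sketch below is in Python and not the module's .NET, and describe_ping_failure is a hypothetical helper. The idea is simply to separate HTTP error responses (which carry a real status code) from everything else, instead of reporting every exception as a 404.

```python
from urllib.error import HTTPError, URLError


def describe_ping_failure(engine_url: str, exc: Exception) -> str:
    """Build a log message that reflects the real cause of a failed ping.

    describe_ping_failure is a hypothetical helper for illustration only.
    """
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, HTTPError):
        # The server answered, so report its actual status code.
        return f'The search engine "{engine_url}" returned HTTP {exc.code} ({exc.reason})'
    if isinstance(exc, URLError):
        # No HTTP response at all: DNS failure, refused connection, etc.
        return f'The search engine "{engine_url}" could not be reached: {exc.reason}'
    # Anything else is unexpected, so name the exception type.
    return f'Pinging "{engine_url}" failed: {type(exc).__name__}: {exc}'


# Simulate the 400 that the Google ping service was really returning.
ping_url = "http://google.com/webmasters/sitemaps/ping?sitemap=sitemap.xml"
bad_request = HTTPError(ping_url, 400, "Bad Request", None, None)
print(describe_ping_failure(ping_url, bad_request))
```

With something like this, the log would have said 400 straight away and pointed at the malformed sitemap parameter.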

Yahoo’s service no longer exists…

The Sitemap XML module comes pre-configured with 3 search providers that it will ping (or try to):

[Screenshot: the three pre-configured search providers]

Unfortunately, Yahoo now uses Bing for this, so its ping entry is redundant. A ping for Bing would be useful, though. Oh, wait, the ‘Live Search’ ping is actually forwarded to Bing, so that works, if you submit the right URL. Might as well use the Bing URL of…

http://www.bing.com/webmaster/ping.aspx?siteMap=

…directly too.

The Fix…

So, to fix this:

  • Remove Yahoo. You don’t need it.
  • Update Live Search to use Bing.
  • Update the URLs being pinged in Sitecore to include the site’s absolute path:

[Screenshot: the updated search engine URLs in Sitecore]

  • Republish all the stuff you just updated.