A robots.txt file tells visiting robots, such as Google’s crawler, what they should and should not crawl. It should also include a Sitemap directive.
Each Sitemap entry points to an XML file that lists all your content pages and when they were last updated. That way, the crawler can quickly and efficiently find new content.
Technically, the URL of each sitemap must be absolute. Yes, there’s no earthly reason why this must be so, but that’s what the specification says. Fortunately, Google will handle relative URLs, so a relative entry should work for Google, but it might not work for other robots. Continue reading “Don’t forget that Robots.txt Sitemap entries need to be absolute”
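To make the absolute-versus-relative point concrete, here is a minimal Python sketch that reads the Sitemap entries out of a robots.txt body and resolves any relative ones against the site root. The function name `absolutize_sitemaps` and the `www.example.com` host are made up for illustration; they are not part of any crawler or of Sitecore.

```python
from urllib.parse import urljoin


def absolutize_sitemaps(robots_txt: str, site_root: str) -> list[str]:
    """Return every Sitemap entry in robots_txt as an absolute URL.

    site_root is an assumed base such as "https://www.example.com/".
    """
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (this simple sketch ignores '#' inside URLs).
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so "https://..." stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            # urljoin leaves absolute URLs alone and resolves relative ones.
            urls.append(urljoin(site_root, value.strip()))
    return urls


robots = (
    "User-agent: *\n"
    "Disallow: /admin/\n"
    "Sitemap: /sitemap.xml\n"
    "Sitemap: https://www.example.com/news/sitemap.xml\n"
)
print(absolutize_sitemaps(robots, "https://www.example.com/"))
```

A check like this could run at build or publish time, so that the robots.txt you ship only ever contains the absolute form.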
So, I was reviewing a few Sitecore log files for a customer of ours, and I kept coming across the following entries.
ManagedPoolThread #15 10:26:56 WARN The serachengine "Http://google.com/webmasters/sitemaps/ping?sitemap=sitemap.xml" returns an 404 error
ManagedPoolThread #15 10:26:57 WARN The serachengine "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=sitemap.xml" returns an 404 error
This struck me as interesting. These are calls to different search providers to tell them that your site’s sitemap file has been updated, and should be read again (so that new content could be indexed). This is all from a nice little module on Sitecore Marketplace. We use it quite a lot. However, I spotted a few issues… Continue reading “Awkward Sitemap XML module”
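One thing that stands out in those log lines is that the `sitemap` parameter is just `sitemap.xml`, with no host. The excerpt does not say which issues were found, so treat this as an assumption, but the sitemap ping convention expects a full, URL-encoded sitemap address. A minimal Python sketch of building such a ping URL follows; `build_ping_url` and the `www.example.com` host are hypothetical, and the code only builds the string, it does not call any search engine.

```python
from urllib.parse import urlencode, urljoin


def build_ping_url(ping_endpoint: str, site_root: str, sitemap_path: str) -> str:
    """Build a ping URL whose sitemap parameter is absolute and URL-encoded."""
    # Resolve "sitemap.xml" against the (assumed) public site root.
    absolute_sitemap = urljoin(site_root, sitemap_path)
    # urlencode percent-encodes the ':' and '/' characters in the value.
    return f"{ping_endpoint}?{urlencode({'sitemap': absolute_sitemap})}"


print(
    build_ping_url(
        "http://google.com/webmasters/sitemaps/ping",  # endpoint as it appears in the log
        "https://www.example.com/",                    # hypothetical site root
        "sitemap.xml",
    )
)
```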
In a robust web-based system one would expect to have multiple servers serving content. In Sitecore, this typically means a content management server and one or more content delivery servers.
Sitecore also offers a ‘Sitemap XML module’, which is pretty neat. When you publish content it will generate an updated Sitemap.xml file – basically, a listing of pages that a web-crawler like Googlebot can use to crawl and index a site. It can also ping the major search engines, telling them that the sitemap has been updated, and that they should recrawl it at their leisure.
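For readers who have not looked inside one, a Sitemap.xml file is a small XML document: one `url` element per page, with its address and last-modified date. The Python sketch below builds one with the standard library. It is purely illustrative and is not how the Sitecore module does it; `build_sitemap` and the sample pages are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Return sitemap XML for (absolute_url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # page address
        ET.SubElement(url, "lastmod").text = lastmod  # date in YYYY-MM-DD form
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
print(
    build_sitemap(
        [
            ("https://www.example.com/", "2024-01-15"),
            ("https://www.example.com/about", "2024-01-10"),
        ]
    )
)
```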
However, things get trickier where you have content delivery servers. Continue reading “Configure Sitecore to update Sitemaps.xml on ALL servers”