Don’t forget that Robots.txt Sitemap entries need to be absolute

A robots.txt file tells visiting robots – such as Google’s crawler – what they should and should not crawl. Part of this should be a Sitemap directive:

(Image: a robots.txt Sitemap directive, using a relative URL)
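A minimal robots.txt along those lines might look like this (the relative sitemap path is purely illustrative):

User-agent: *
Disallow:

Sitemap: sitemap.xml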

The Sitemap entry (or entries) points to an XML file listing all your pages of content and when they were last updated. That way, the crawler can quickly and efficiently find new content.

Technically, the URL to the sitemap(s) must be absolute. Yes, there’s no earthly reason why this must be so, but that’s what the specification says. Fortunately, Google will handle relative URLs – so the definition shown above should work for Google – but it might not work for other robots.
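For comparison, a strictly correct entry gives the full URL of the sitemap – something like the line below, with example.com standing in for your own host:

Sitemap: http://www.example.com/sitemap.xml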


Problems with SitemapXML module for Sitecore

Previously, I’ve looked at some of the problems with the SitemapXML module for Sitecore. Well, here are another couple that have caught me a few times…


Awkward Sitemap XML module

So, I was reviewing a few Sitecore log files for a customer of ours, and I kept coming across the following entries.

ManagedPoolThread #15 10:26:56 WARN The serachengine "Http://google.com/webmasters/sitemaps/ping?sitemap=sitemap.xml" returns an 404 error
ManagedPoolThread #15 10:26:57 WARN The serachengine "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=sitemap.xml" returns an 404 error

This struck me as interesting. These are calls to different search providers to tell them that your site’s sitemap file has been updated and should be read again (so that new content can be indexed). This is all from a nice little module on the Sitecore Marketplace. We use it quite a lot. However, I spotted a few issues…
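For context, these ping endpoints expect the sitemap parameter to be the absolute, URL-encoded address of your sitemap – so a well-formed request to Google would presumably look more like the line below, with example.com as a placeholder:

http://google.com/webmasters/sitemaps/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml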


Configure Sitecore to update Sitemaps.xml on ALL servers

In a robust web-based system one would expect to have multiple servers serving content. In Sitecore, this typically means having a content management server and one or more content delivery servers.

Sitecore also offers a ‘Sitemap XML module’, which is pretty neat. When you publish content, it will generate an updated Sitemap.xml file – basically, a listing of pages that a web-crawler like Googlebot can use to crawl and index a site. It can also ping the major search engines, telling them that the sitemap has been updated, and that they should recrawl it at their leisure.


However, things get trickier when you have content delivery servers.
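As a rough sketch of the kind of configuration involved – and not necessarily the approach the full post takes – one option is to also wire the module’s refresh handler up to the publish:end:remote event, which is the event that delivery servers receive when a publish completes elsewhere. The handler type and method below are assumptions based on the module’s standard publish:end wiring, so check them against your installed version:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <event name="publish:end:remote">
        <!-- Assumed handler type/method - verify against the Sitemap XML module's own include file -->
        <handler type="Sitecore.Modules.SitemapXML.SitemapHandler, Sitemap.XML" method="RefreshSitemap" />
      </event>
    </events>
  </sitecore>
</configuration>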
