Don’t forget that Robots.txt Sitemap entries need to be absolute

Robots.txt files tell visiting robots – such as Google’s crawler – what they should and should not crawl. Part of this should be a Sitemap directive:

[Image: Sitemap]

Each Sitemap entry points to an XML file that lists all of your content pages and when each was last updated. That way, the crawler can quickly and efficiently find new content.
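To make that concrete, here is a rough sketch of how a crawler might read page URLs and last-modified dates out of such a file. The sample XML and the `example.com` addresses are made up for illustration, and `read_sitemap` is a hypothetical helper, not part of any Sitecore module.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sitemap; the URLs and date are invented for this example.
sitemap_xml = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2016-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return (url, lastmod) pairs; lastmod is None when the element is absent."""
    root = ET.fromstring(xml_text)
    pages = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        if loc is None:
            continue  # skip entries without a location
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        pages.append((loc.strip(), lastmod))
    return pages


pages = read_sitemap(sitemap_xml)
for loc, lastmod in pages:
    print(loc, lastmod)
```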

Technically, the URL of each sitemap must be absolute. Yes, there’s no earthly reason why this must be so, but that’s what the specification says. Fortunately, Google will handle relative URLs – so the definition shown above should work for Google – but it might not work for other robots.

Sadly, the SitemapXML module for Sitecore automatically adds an entry for the maps it generates – but it’s on a relative URL.
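If you want to check an existing robots.txt for this problem, something like the following minimal sketch could flag the relative entries. It assumes you already have the file contents as a string; the sample rules, the `example.com` host, and the helper names `sitemap_entries` and `is_absolute` are all made up for illustration.

```python
from urllib.parse import urlsplit


def sitemap_entries(robots_txt: str) -> list[str]:
    """Return the value of every Sitemap line in a robots.txt body."""
    entries = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "field: value" on the first colon.
        field, sep, value = line.split("#", 1)[0].partition(":")
        if sep and field.strip().lower() == "sitemap":
            entries.append(value.strip())
    return entries


def is_absolute(url: str) -> bool:
    """Treat a URL as absolute when it has both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


# Hypothetical robots.txt with one relative and one absolute entry.
robots_txt = """\
User-agent: *
Disallow: /sitecore/
Sitemap: /sitemap.xml
Sitemap: https://www.example.com/sitemap.xml
"""

relative = [u for u in sitemap_entries(robots_txt) if not is_absolute(u)]
print(relative)  # ['/sitemap.xml']
```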

The best bet is to double up: manually put your absolute URLs into the robots.txt file, and let the SitemapXML module add the (unnecessary) relative URLs too.

[Image: Sitemap fixed]

It’s less than ideal, but that should work.
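If you would rather not do the doubling up by hand, the idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: that you can post-process the robots.txt text yourself, and that you know the site’s public root URL. The function name `add_absolute_sitemaps` and the `example.com` host are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def add_absolute_sitemaps(robots_txt: str, site_root: str) -> str:
    """Append an absolute Sitemap line for each relative one, keeping the originals."""
    lines = robots_txt.splitlines()
    existing = set()   # absolute sitemap URLs already present
    relative = []      # relative sitemap URLs that need an absolute twin
    for line in lines:
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "sitemap":
            continue
        value = value.strip()
        parts = urlsplit(value)
        if parts.scheme and parts.netloc:
            existing.add(value)
        else:
            relative.append(value)

    extra = []
    for value in relative:
        # Resolve the relative path against the site root.
        absolute = urljoin(site_root, value)
        if absolute not in existing:
            existing.add(absolute)
            extra.append(f"Sitemap: {absolute}")
    return "\n".join(lines + extra) + "\n"


# Hypothetical input: a single relative entry.
robots_txt = "User-agent: *\nSitemap: /sitemap.xml\n"
result = add_absolute_sitemaps(robots_txt, "https://www.example.com/")
print(result)
```

Running it a second time on its own output should add nothing, because the absolute entry is already there.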
