Sitemaps


This option allows you to generate a plain text sitemap (compatible with Yahoo Sitemaps) and an XML sitemap (compatible with Google Sitemaps) for your website while Zoom indexes it.

Sitemaps can be useful for submitting your website to Internet-wide search engines such as Google and Yahoo, helping their spiders index your website more quickly and increasing your web presence. You may also find them useful for maintenance or other development purposes. The sitemap includes information such as the last modified date of each page and its priority relative to the other pages on the website. Note that you should only create a sitemap for an individual website (and not a sitemap that spans multiple websites).

Tip: You can use Zoom's Incremental Indexing to update your search index and your sitemap in one go, without having to re-spider your entire website.

Text file URL list (Yahoo Sitemap compatible)

When this option is selected, Zoom creates a file named "urllist.txt" in your output directory at the end of indexing. This is a simple text file containing a full list of the web pages that have been indexed. It complies with the Yahoo Sitemap requirements and can be submitted to Yahoo, or to other search engines that accept a similar sitemap format. You can also use it for your own development or maintenance purposes, to keep track of the active files on your website.

XML sitemap (Google Sitemap compatible)

When this option is selected, Zoom creates an XML sitemap that is compatible with the Google Sitemap specifications. The files are created in your output directory at the end of indexing.

As required by the Google Sitemap protocol, each XML file can only contain a maximum of 50,000 URLs. Multiple XML sitemap files are created when a website exceeds this number of pages (named "sitemap.xml", "sitemap2.xml", "sitemap3.xml" and so forth). A sitemap index file is also created which contains a list of these individual sitemap files ("sitemap_index.xml").
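The splitting rule described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme and the 50,000 URL limit, not Zoom's actual implementation; the function names sitemap_file_name and split_into_sitemaps are hypothetical.

```python
from typing import Dict, List

MAX_URLS_PER_FILE = 50_000  # per-file limit stated in the text above


def sitemap_file_name(part: int) -> str:
    """Return the file name for a 1-based part number (hypothetical helper)."""
    # The first file has no number; later files are sitemap2.xml, sitemap3.xml, ...
    return "sitemap.xml" if part == 1 else f"sitemap{part}.xml"


def split_into_sitemaps(urls: List[str],
                        limit: int = MAX_URLS_PER_FILE) -> Dict[str, List[str]]:
    """Group URLs into chunks of at most `limit`, keyed by sitemap file name."""
    files: Dict[str, List[str]] = {}
    for start in range(0, len(urls), limit):
        part = start // limit + 1
        files[sitemap_file_name(part)] = urls[start:start + limit]
    return files


if __name__ == "__main__":
    # 120,000 made-up page URLs need three sitemap files.
    urls = [f"http://www.mysite.com/page{i}.html" for i in range(120_000)]
    for name, chunk in split_into_sitemaps(urls).items():
        print(name, len(chunk))
```

The keys of the returned dictionary are the file names that a sitemap index such as "sitemap_index.xml" would need to list.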

For more information on the Google Sitemap protocol, you can visit the official web page here:
https://www.google.com/webmasters/sitemaps/docs/en/protocol.html

Sitemap Base URL

Note that a Sitemap Base URL is required to generate an XML sitemap. This is necessary for two reasons:

1. To determine the URL of your split sitemap files (e.g. "sitemap2.xml", etc.) should it be necessary to generate a sitemap index ("sitemap_index.xml") as explained above.
2. To exclude any indexed URLs which are under a different domain (or base URL) to the location of the sitemap file, as required by Google's Sitemap specifications.

 

With the default option "Include only URLs within the Sitemap Base URL" selected, the latter requirement is enforced: Google requires a sitemap file to contain only URLs which are below the base URL of the sitemap file itself. For example, you cannot normally have a URL such as "http://www.myotherdomain.com/mypage.html" included in a sitemap file located at "http://www.mysite.com/sitemap.xml".

However, Google's rules make an exception when all of the domains or sites involved have been verified with Google. In this case, you can submit a sitemap file containing URLs from multiple domains (known as "cross submits"). To create a sitemap for "cross submits" in Zoom, select the option "Include all indexed URLs". Find out more about verifying your sites with Google here:

http://www.google.com/support/webmasters/bin/answer.py?answer=35179

Note: When "Include only URLs within the Sitemap Base URL" is selected, Zoom will automatically exclude from your XML sitemap any URL that is not within the specified Sitemap Base URL, even though that URL may have been indexed (or it may be a recommended link). This means that your XML sitemap may not contain all of the indexed URLs if the Sitemap Base URL is set up incorrectly or if you are indexing with multiple base URLs. See "Notes on uploading Google XML Sitemap files" below for more information.
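To illustrate the effect of the two options, the following Python sketch filters a list of indexed URLs against a Sitemap Base URL. It assumes a simple, case-sensitive prefix comparison, which may differ from how Zoom compares URLs internally; the function names and the include_all flag are hypothetical stand-ins for the "Include all indexed URLs" option.

```python
from typing import Iterable, List


def within_base_url(url: str, base_url: str) -> bool:
    """True if `url` is at or below `base_url` (assumed prefix comparison)."""
    # Make sure the base ends with "/" so that "http://www.mysite.com"
    # does not accidentally match "http://www.mysite.com.example.org/".
    base = base_url if base_url.endswith("/") else base_url + "/"
    return url.startswith(base) or url == base.rstrip("/")


def filter_for_sitemap(urls: Iterable[str], base_url: str,
                       include_all: bool = False) -> List[str]:
    """Return the URLs that would be written to the XML sitemap."""
    if include_all:
        # Corresponds to "Include all indexed URLs" (for cross submits).
        return list(urls)
    # Corresponds to "Include only URLs within the Sitemap Base URL".
    return [u for u in urls if within_base_url(u, base_url)]


if __name__ == "__main__":
    indexed = [
        "http://www.mysite.com/index.html",
        "http://www.myotherdomain.com/mypage.html",
    ]
    print(filter_for_sitemap(indexed, "http://www.mysite.com/"))
```

With the default setting, only the first URL in this example would remain in the sitemap, even though both pages were indexed.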

By default, each XML sitemap file will contain the URL and the last modified date of the pages that were found on your website. There are also the following options available:

Use ZOOMPAGEBOOST values for Priority field in sitemap (XML only)

When this option is selected, the XML sitemap file(s) will also contain a priority value (from 0.0 to 1.0) for each page, which indicates the importance of that page relative to the rest of your website. Zoom determines this value from the ZOOMPAGEBOOST meta tag, which is also used for prioritizing pages within Zoom's index (see "Weightings (Page Boosting)" for more information).

Any page that does not specify a ZOOMPAGEBOOST value is given the default priority of 0.5. Otherwise, the priority corresponds to the ZOOMPAGEBOOST value: a page boost of -5 is equivalent to a priority of 0.0, a page boost of 0 to a priority of 0.5, and a page boost of +5 to a priority of 1.0.
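The three values given above are consistent with a straight-line mapping from page boost to priority. The Python sketch below assumes that the values in between are interpolated linearly, which this page does not state explicitly; the function name boost_to_priority is hypothetical.

```python
def boost_to_priority(boost: float = 0) -> float:
    """Map a page boost in the range -5..+5 to a sitemap priority in 0.0..1.0.

    Assumption: a linear mapping through the three documented points
    (-5 -> 0.0, 0 -> 0.5, +5 -> 1.0). A page with no boost uses the default 0.
    """
    if not -5 <= boost <= 5:
        raise ValueError("page boost must be between -5 and +5")
    # Shift the range to 0..10, scale it to 0..1, and keep one decimal place.
    return round((boost + 5) / 10, 1)


if __name__ == "__main__":
    for boost in (-5, 0, 5):
        print(boost, boost_to_priority(boost))
```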

Uploading your sitemap files

The sitemap files can also be uploaded automatically at the end of indexing, along with your search files. To do this, you will need to enable this option on the Sitemaps tab, as well as the "Automatically upload files at the end of indexing" option on the FTP tab of the Configuration window.

You can also upload your sitemap files manually using a third-party FTP client if you prefer.

Notes on uploading Google XML Sitemap files

You may need to specify a different folder or path on the server for your sitemap files, as the Google Sitemap protocol requires the files to be located at the base URL of the files which have been scanned. This means that if your sitemap files contain URLs such as:

http://www.mysite.com/index.html
http://www.mysite.com/story/page1.html
http://www.mysite.com/news/archive/test.html

Then your sitemap files must be hosted at the common base URL of the above files, which is the root directory at http://www.mysite.com/

However, if your sitemap only contains URLs such as:

http://www.mysite.com/news/index.html
http://www.mysite.com/news/page1.html
http://www.mysite.com/news/archive/test.html

Then you should have your sitemap files hosted at: http://www.mysite.com/news/ and this should be your Sitemap Base URL.
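If you are unsure which base URL a set of pages has in common, you can work it out with a short script. The Python sketch below is one way to do so and is not part of Zoom; it assumes that all of the URLs share the same scheme and host, and the function name common_base_url is hypothetical.

```python
import posixpath
from typing import List
from urllib.parse import urlsplit


def common_base_url(urls: List[str]) -> str:
    """Return the deepest folder URL that contains every URL in `urls`."""
    if not urls:
        raise ValueError("at least one URL is required")
    parts = [urlsplit(u) for u in urls]
    origins = {(p.scheme, p.netloc) for p in parts}
    if len(origins) != 1:
        # Assumption: a sitemap covers a single website only.
        raise ValueError("URLs span more than one site")
    scheme, netloc = origins.pop()
    # Folder part of each path, e.g. "/news/archive" for "/news/archive/test.html".
    folders = [posixpath.dirname(p.path) or "/" for p in parts]
    common = posixpath.commonpath(folders)
    if not common.endswith("/"):
        common += "/"
    return f"{scheme}://{netloc}{common}"


if __name__ == "__main__":
    print(common_base_url([
        "http://www.mysite.com/news/index.html",
        "http://www.mysite.com/news/page1.html",
        "http://www.mysite.com/news/archive/test.html",
    ]))
```

For the two examples above, this gives http://www.mysite.com/ for the first set of URLs and http://www.mysite.com/news/ for the second.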

Note: The folder/path specified in this window should either be a path which is relative to your home directory (e.g. "public_html/news/") or an absolute path on the server (e.g. "/usr/home/myaccount/public_html/news/"). The folder path should NOT be a URL and should NOT start with "http://".

 
For more information on the Google Sitemap protocol requirements for the location of your sitemap files, please visit the following web page:
https://www.google.com/webmasters/sitemaps/docs/en/protocol.html#sitemapLocation

Tip: Submitting your XML sitemap to search engines
If you anticipate that your website will not grow any larger than 50,000 pages, you can simply submit the URL of your "sitemap.xml" file to Google, or to other search engines that use the Google Sitemap protocol format. However, if you anticipate your website growing larger than 50,000 pages at some point, we recommend submitting the URL of your "sitemap_index.xml" file instead. Your submission will then stay valid even when your website grows beyond this limit and needs more individual sitemap files in order to list all of its pages.