
How does Google treat underscores in sitemap URLs?

@Gonzalez347

Posted in: #Seo #Sitemap #WebCrawlers

Google is currently reporting that my URLs are invalid within my sitemap. Here's an example of a document that was considered erroneous by Google's validator:



For each of the URLs in the sitemap, Google reported:


URL not allowed | This url is not allowed for a Sitemap at this location.


With a little searching, I read that Google joins underscore-separated words inside URLs, which would mean that www.example.com/foo_bar_baz would be read by the bot as www.example.com/foobarbaz, a page that may not actually exist (and would return an HTTP 404). Am I wrong? If not, is there any way to work around this issue?


@Alves908

The error "This url is not allowed for a Sitemap at this location." is actually rather self-explanatory. It has nothing to do with underscores (_), which are perfectly valid in URLs.
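To illustrate that underscores are ordinary URL characters, here is a minimal sketch using Python's standard library. It relies on RFC 3986, section 2.3, which lists the underscore among the "unreserved" characters that need no percent-encoding. The helper name `path_is_unreserved_only` and the example URL are hypothetical, and the sketch says nothing about how Google ranks or tokenizes such URLs; it only shows that a standard parser keeps the underscores intact.

```python
import string
from urllib.parse import urlsplit

# RFC 3986, section 2.3: "unreserved" characters may appear in a URI
# without percent-encoding, and the underscore is one of them.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def path_is_unreserved_only(url: str) -> bool:
    """True if every character of every path segment is unreserved."""
    path = urlsplit(url).path
    return all(ch in UNRESERVED for segment in path.split("/") for ch in segment)


url = "http://www.example.com/foo_bar_baz"
print(urlsplit(url).path)            # /foo_bar_baz (underscores kept as-is)
print(path_is_unreserved_only(url))  # True
```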

Simply put, you cannot list URLs on another domain name within your sitemap. A sitemap must refer to URLs on the domain on which it resides, because the purpose of a sitemap is to inform a crawler of your own site's URLs. In your example, it appears you are referring to BuzzFeed.com.
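A quick way to find the offending entries is to compare every `<loc>` against the sitemap's own location. The sketch below assumes the rule described in the sitemaps.org protocol's "Sitemap file location" section: each URL must share the sitemap's scheme and host and sit in the sitemap's directory or below. The function names and the example URLs are hypothetical, and the comparison is a simplified, case-sensitive string prefix check.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def allowed_prefix(sitemap_url: str) -> str:
    """Scheme + host + directory of the sitemap file.

    A sitemap at http://www.example.com/sitemap.xml may list any URL under
    http://www.example.com/; one at .../catalog/sitemap.xml only URLs under
    .../catalog/.
    """
    parts = urlsplit(sitemap_url)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def disallowed_locs(sitemap_url: str, sitemap_xml: str) -> list[str]:
    """Return every <loc> that falls outside the sitemap's allowed prefix."""
    prefix = allowed_prefix(sitemap_url)
    root = ET.fromstring(sitemap_xml)
    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    # Simplification: plain, case-sensitive string prefix comparison.
    return [loc for loc in locs if not loc.startswith(prefix)]


# Hypothetical sitemap served from http://www.example.com/sitemap.xml
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/foo_bar_baz</loc></url>
  <url><loc>http://www.example.org/foo_bar</loc></url>
</urlset>"""

print(disallowed_locs("http://www.example.com/sitemap.xml", sitemap))
# ['http://www.example.org/foo_bar']
```

Note that under this check `http://example.com/` and `http://www.example.com/` count as different hosts, which is a common source of the same error.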

From: en.wikipedia.org/wiki/Sitemaps

The Sitemaps protocol allows a webmaster to inform search engines
about URLs on a website that are available for crawling. A Sitemap
is an XML file that lists the URLs for a site. It allows
webmasters to include additional information about each URL: when it
was last updated, how often it changes, and how important it is in
relation to other URLs in the site. This allows search engines to
crawl the site more intelligently. Sitemaps are a URL inclusion
protocol and complement robots.txt, a URL exclusion protocol.


From: www.sitemaps.org/index.html

Sitemaps are an easy way for webmasters to inform search engines
about pages on their sites that are available for crawling. In its
simplest form, a Sitemap is an XML file that lists URLs for a site
along with additional metadata about each URL (when it was last
updated, how often it usually changes, and how important it is,
relative to other URLs in the site) so that search engines can more
intelligently crawl the site.
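The structure both quotes describe (a list of URLs, each with optional metadata on last update, change frequency and priority) can be generated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch: `build_sitemap` and the entry dictionary format are hypothetical, and it writes the `loc`, `lastmod`, `changefreq` and `priority` elements of the sitemaps.org namespace without validating their values.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default namespace (no prefix).
ET.register_namespace("", NS)


def build_sitemap(entries: list[dict]) -> str:
    """Build a <urlset> document; each entry needs "loc", the rest is optional."""
    urlset = ET.Element(f"{{{NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        # ElementTree escapes characters such as "&" in text automatically.
        ET.SubElement(url, f"{{{NS}}}loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, f"{{{NS}}}{tag}").text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    {"loc": "http://www.example.com/foo_bar_baz", "lastmod": "2015-01-01",
     "changefreq": "monthly", "priority": 0.8},
]))
```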


From: stackoverflow.com/questions/1702004/why-wont-google-accept-my-sitemap-xml-url-not-allowed-this-url-is-not-allowed

That error usually means that you have an URL pointing to a different
Domain from yours.


Another typical reason for the error is offering a relative URL instead of an absolute one. Each URL must include the protocol (HTTP or HTTPS), the domain name of your site, and the path to the page. The URL must also be fully valid, including a trailing slash if your web server requires one. For directory-style URLs, it is likely safer to always include the trailing slash.
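Relative entries can be caught and resolved before the sitemap is written. A minimal sketch with Python's `urllib.parse`, assuming a hypothetical site root of `http://www.example.com/`; the helper names are also hypothetical. Note that a bare `www.example.com/foo_bar_baz`, written without a scheme, is treated as relative here, just as it would be in a sitemap.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical site root, used only for illustration.
SITE_ROOT = "http://www.example.com/"


def is_absolute_http_url(url: str) -> bool:
    """True if the URL carries an http/https scheme and a host name."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_absolute(href: str, base: str = SITE_ROOT) -> str:
    """Resolve a relative reference such as "foo_bar_baz/" against the site root."""
    return urljoin(base, href)


for href in ["/foo_bar_baz/", "foo_bar_baz/", "http://www.example.com/foo_bar_baz/"]:
    loc = href if is_absolute_http_url(href) else to_absolute(href)
    print(loc)  # http://www.example.com/foo_bar_baz/ in all three cases
```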

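The Wikipedia description above does not spell out the requirement that each URL refer to the site on which the sitemap resides, but the sitemaps.org protocol does: all URLs in a sitemap must be on the same host as the sitemap, and the sitemap's location determines which URLs it may include. For example, a sitemap at http://example.com/catalog/sitemap.xml may list URLs beginning with http://example.com/catalog/ but not URLs beginning with http://example.com/images/. This follows from the purpose of a sitemap, which is to aid a crawler in accessing pages within the site it was created for. The sitemap for a domain must therefore reside on that domain, preferably (and traditionally) in the root directory so that it can cover every URL on the site, though the root is not required.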


