How does robots.txt handle links to disallowed pages?
We have affiliates that link to our site with a query string, e.g.:
<a href="http://www.mysite.com/affiliate?id=123">Affiliate Link</a>
In my robots.txt file, I have:
Disallow: /affiliate/*?*
What would a search engine do with the link? Will it hit our site, then stop because it's disallowed? I'm guessing it'll index it, but will it index it with the query string?
Ultimately, we want the link to our site to be followed, but we don't want the link with the query string to be ranked, for obvious reasons. How does this work, and how can this be achieved?
If the original link had a rel="nofollow", will google completely ignore that link?
More posts by @Bryan171
An affiliate ID used for tracking purposes can be considered similar to a session ID, in that different affiliate links can lead to the same page and content, thus resulting in duplicate content.
Therefore, to keep affiliate links from being crawled (and creating duplicate content), you can use pattern matching. From the "Pattern matching" section of Google Webmaster Tools - Block or remove pages using a robots.txt file:
The Disallow: /*? directive will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).
To target only URLs that contain ?id=, you could simply use:
Disallow: /*?id=
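To see what these two wildcard patterns match, here is a minimal Python sketch. It assumes Google's pattern rules as quoted above: a rule is compared against the URL path plus query string starting from its first character, and `*` matches any sequence of characters. The helpers `robots_pattern_to_regex` and `is_blocked` are hypothetical, not part of any library, and the sketch ignores `$`, Allow rules and rule precedence.

```python
import re

# Hypothetical helper: translate a Google-style robots.txt path pattern into a regex.
# Assumptions: '*' matches any run of characters and the rule is matched from the
# start of the path + query string. '$', Allow rules and precedence are ignored.
def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    # Escape each literal piece (so '?' and '.' stay literal), then join the pieces with '.*'
    return re.compile(".*".join(re.escape(piece) for piece in pattern.split("*")))

def is_blocked(pattern: str, path_and_query: str) -> bool:
    # re.match only matches at the start of the string, mirroring prefix matching
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None

for pattern in ["/*?", "/*?id="]:
    for url in ["/affiliate?id=123", "/search?q=shoes", "/products/widget"]:
        verdict = "blocked" if is_blocked(pattern, url) else "allowed"
        print(f"{pattern:8} {url:20} {verdict}")
```

Under these assumptions, `/*?id=` only matches when `id` comes directly after the `?`, so a URL such as `/affiliate?ref=x&id=123` would not be caught by it.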
If the affiliate links redirect to custom affiliate pages with similar content, you should specify a canonical URL in these pages to point to the preferred version of the page you want indexed. For more on this, see: Google Webmaster Tools - About rel="canonical"
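As a rough illustration, an affiliate landing page could build its canonical tag by dropping the query string from its own URL. This Python sketch assumes the preferred version is simply the same URL without `?id=...`; `canonical_link_tag` is a hypothetical helper, and the resulting tag would go in the page's `<head>`.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_link_tag(url: str) -> str:
    """Hypothetical helper: point rel="canonical" at the URL minus its query string and fragment."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop the tracking query string and any fragment
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'

print(canonical_link_tag("http://www.mysite.com/affiliate?id=123"))
```

Every affiliate variant (`?id=123`, `?id=456`, ...) then names the same preferred URL, which is what consolidates the duplicates.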
The "nofollow" value in rel="nofollow" tells search engines not to follow that specific link (a robots meta tag with "nofollow" does the same for every link on a page). This isn't necessary if you're already blocking those URLs in your robots.txt. As Google indicates, it gives webmasters more "granular control" to tell robots not to crawl specific links or pages.
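To make the per-link distinction concrete, here is a simplified Python sketch of how a crawler could separate links marked `rel="nofollow"` from ordinary links using the standard library's `html.parser`. It is an illustration under assumptions, not a description of how Google actually processes links; the class name `LinkCollector` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical collector: sorts <a href> values by whether rel contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. rel="nofollow noopener"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        (self.nofollow if "nofollow" in rel_tokens else self.followed).append(href)

page = """<a href="http://www.mysite.com/affiliate?id=123" rel="nofollow">Affiliate Link</a>
<a href="http://www.mysite.com/">Home</a>"""

collector = LinkCollector()
collector.feed(page)
print("followed:", collector.followed)
print("nofollow:", collector.nofollow)
```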
If the original link had a rel="nofollow", will google completely ignore that link?
No. It only means that you do not trust this link, so Google will not pass your site's authority to the page it points to.
Disallow: /affiliate/*?*
This directive tells Google not to crawl any URL under /affiliate/ that has additional parameters after a ?. The /affiliate/ directory itself will still be accessible to Google.
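The same kind of matcher can check which URLs `Disallow: /affiliate/*?*` covers. This Python sketch assumes Google matches the rule from the start of the URL path and treats `*` as "any characters"; `is_blocked` is a hypothetical helper. Under those assumptions the bare directory stays crawlable, and so does the question's own example URL `/affiliate?id=123`, because it has no slash after `affiliate`.

```python
import re

def is_blocked(pattern: str, path_and_query: str) -> bool:
    # Hypothetical matcher: literal pieces escaped, '*' becomes '.*',
    # and re.match anchors the comparison at the start of the path
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex, path_and_query) is not None

RULE = "/affiliate/*?*"
for url in ["/affiliate/", "/affiliate/?id=123", "/affiliate/promo?id=123", "/affiliate?id=123"]:
    verdict = "blocked" if is_blocked(RULE, url) else "allowed"
    print(f"{url:25} {verdict}")
```

Under the same assumptions, the `/*?id=` pattern from the first answer would cover `/affiliate?id=123`.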
To answer any remaining questions about your robots.txt file, use the robots.txt testing tool in Google Webmaster Tools. It lets you check whether any URL you want is allowed to be crawled.
© vmapp.org 2024. All rights reserved.