PageRank: will links pointing to pages protected by robots.txt still count?
If every single link to a given website example.com points to a page in a particular subdirectory (e.g. example.com/user/[something]), but that directory is off limits as per robots.txt, i.e.
User-agent: *
Disallow: /user/
because I don't want these pages appearing in Google search results, am I shooting myself in the foot in terms of PageRank? Does the incoming "link juice" still count if the target page is forbidden to Google in my robots.txt?
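For reference, here is a minimal sketch of what the two rules above mean to a crawler, using Python's standard `urllib.robotparser`. The helper name `is_crawl_allowed` and the sample URLs are made up for illustration. Note that it only answers "may this URL be fetched?"; it says nothing about whether the URL can be indexed or collect PageRank.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user/
"""


def is_crawl_allowed(url, user_agent="Googlebot"):
    """Return True if ROBOTS_TXT permits user_agent to fetch url.

    Hypothetical helper: it parses the rules from a string instead of
    downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/user/alice", "http://example.com/about"):
        print(url, is_crawl_allowed(url))
```

Anything under `/user/` is reported as not fetchable, while the rest of the site stays crawlable.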
3 Comments
Yes, Google does assign PageRank to URLs that are robotted, but no, you're not shooting yourself in the foot by having such URLs or by linking to them. The time you spend on tweaking your perceived PageRank flow is generally much better spent on working on your content instead.
The only reason to watch out for this is if you're using the robots.txt disallows to control duplicate content. Since the robotted URLs can collect PageRank, and since Google can't confirm that they are duplicates (as would be possible if they could be crawled), it can result in Google indexing both the robotted, uncrawled URL as well as the crawlable version with the same content. It's much better to allow crawling of duplicate content and to use one of the usual canonicalization methods (like a 301 redirect or a rel=canonical link element).
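As a rough illustration of the two canonicalization methods just mentioned, here is a small Python sketch. The `CANONICAL_PATHS` mapping, the paths in it, and both function names are assumptions made up for this example; a real site would do this in its web server or framework.

```python
from html import escape

# Hypothetical mapping from duplicate paths to the preferred (canonical) path.
CANONICAL_PATHS = {
    "/articles/42/print": "/articles/42",
    "/articles/42/index.html": "/articles/42",
}


def redirect_response(path, site="https://example.com"):
    """Return (status, headers) for a 301 if path is a known duplicate, else None."""
    target = CANONICAL_PATHS.get(path)
    if target is None:
        return None
    return 301, {"Location": site + target}


def canonical_link_element(path, site="https://example.com"):
    """Return a rel=canonical link element for the <head> of the page at path."""
    target = CANONICAL_PATHS.get(path, path)
    return '<link rel="canonical" href="%s">' % escape(site + target, quote=True)


if __name__ == "__main__":
    print(redirect_response("/articles/42/print"))
    print(canonical_link_element("/articles/42/print"))
```

Either way the duplicate stays crawlable, so a crawler can see that it points at the preferred URL.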
Even if the pages are not indexed by Google, they will still get PageRank assigned to them. This means that by linking to them you will "leak" PR, as that PR will simply be lost instead of being passed to other links. It is essentially the same as using nofollow on a link. So if you are linking to internal pages that are blocked with robots.txt, you are diluting the amount of PR you pass to the allowed pages on your site.
See this blog post for more on this.
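To make the "leak" argument concrete, here is a toy sketch in Python. It uses a simplified textbook-style PageRank iteration, not Google's actual system, and it assumes that a blocked page passes nothing on because its outgoing links are never crawled. The three-page graph and the page names are invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it links to.

    Pages with no known outgoing links pass nothing on, so their share
    of rank is simply lost in this toy model.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source, targets in links.items():
            if not targets:
                continue  # blocked or dangling page: outgoing links unknown
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# "home" links to a crawlable page and to a robots.txt-blocked page.
with_blocked_link = pagerank({"home": ["about", "blocked"], "about": ["home"], "blocked": []})
# Same three pages, but "home" links only to the crawlable page.
without_blocked_link = pagerank({"home": ["about"], "about": ["home"], "blocked": []})

if __name__ == "__main__":
    print("about, home links to blocked page:", with_blocked_link["about"])
    print("about, home does not link to it:  ", without_blocked_link["about"])
```

In this model the blocked page does accumulate rank when it is linked to, and "about" ends up with less than it would if "home" linked only to it. How closely that matches what Google really does is a separate question.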
Speaking to your question - "Does the incoming 'link-juice' still count if the target page is forbidden to Google in my robots.txt?" - I would say that PageRank is calculated even for noindex/nofollow URIs:
"While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project, can appear in Google search results."
Google Webmaster Central: Block or remove pages using a robots.txt file
Example: my "working-model.com" domain has had an all-exclusive robots.txt for as long as I can remember. However, a Google search for working-model.com (or a Yahoo or Bing search) still shows a result for the domain (probably because a domain WHOIS site links to it).