Preventing a link on a page from being indexed and followed
I have read the post about how the nofollow value affects crawling and indexing. However, the answers to that post stop at 2012.
In my case, I have a file page that is indexed by Google. On this page, however, there's a download link (ending with ?download). I am not sure whether the link was indexed as well, but I would really like to know this: how can I make sure a link on a page doesn't get indexed and followed?
Do I use the robots.txt file with a rule like this?
Disallow: /*?download$
Or do I simply put rel="nofollow" on the download link?
Also, what do I do about the potentially already-indexed download links?
Thanks!
UPDATE:
According to Google's nofollow docs:
In general, we don't follow them. This means that Google does not
transfer PageRank or anchor text across these links. Essentially,
using nofollow causes us to drop the target links from our overall
graph of the web. However, the target pages may still appear in our
index if other sites link to them without using nofollow, or if the
URLs are submitted to Google in a Sitemap.
So, I suppose that stating that the "pages may still appear in our index" suggests that links using nofollow are generally not indexed, with the exceptions mentioned in the docs.
I think that kind of settles it, but if anyone has extra information to back this up, feel free to share it.
3 Comments
You should use rel="nofollow" for external links on your page, for example links to articles on other blogs, products, etc.
And use Disallow in robots.txt for internal pages.
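As a rough illustration of how a wildcard rule such as the Disallow: /*?download$ from the question would be matched, here is a minimal Python sketch. It assumes Google-style pattern semantics, where * matches any run of characters and a trailing $ anchors the end of the URL. The helper names robots_pattern_to_regex and is_disallowed are made up for this example and are not part of any library.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters, and a
    trailing '$' means the URL must end there. Without '$', the rule
    is treated as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query, pattern):
    """Return True if the path (including query string) matches the rule."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


if __name__ == "__main__":
    rule = "/*?download$"
    for url in ("/files/report.pdf?download",
                "/files/report.pdf",
                "/files/report.pdf?download=1"):
        print(url, is_disallowed(url, rule))
```

Under these assumptions the rule only covers URLs that end exactly in ?download. Dropping the trailing $ would also cover variants such as ?download=1.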
If you really want to prevent a link from being indexed or followed, you can go to extremes as follows:
If you're using a server-side scripting language or have sufficient Apache access, modify the code so that the page you don't want indexed returns an HTTP 410 status code, meaning the page is gone for good. This will effectively cause the previously indexed page to be removed from Google's index.
In the HTML, between <head> and </head>, add <meta name="ROBOTS" content="NOINDEX,NOFOLLOW"> to instruct robots not to index the page or follow its links. Note that crawlers can only see this tag if robots.txt does not block them from fetching the page.
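The two options above can be sketched with Python's standard http.server module. This is only an illustration under stated assumptions: the paths in GONE_PATHS and NOINDEX_PATHS and the helper plan_response are hypothetical. The X-Robots-Tag response header is not mentioned in the answer. It is swapped in here as the HTTP-header counterpart of the robots meta tag, which can be useful for downloads that are not HTML pages.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, chosen only for this illustration.
GONE_PATHS = {"/old-file?download"}   # removed for good: answer with 410
NOINDEX_PATHS = {"/file?download"}    # still served, but flagged noindex


def plan_response(path):
    """Return (status, extra_headers) for a request path with query string."""
    if path in GONE_PATHS:
        return 410, {}
    if path in NOINDEX_PATHS:
        return 200, {"X-Robots-Tag": "noindex, nofollow"}
    return 200, {}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path contains the path and the query string.
        status, headers = plan_response(self.path)
        body = b"Gone\n" if status == 410 else b"OK\n"
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Start the demo server on localhost; call this manually to try it."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Keep in mind that a 410 response makes the URL unavailable to human visitors as well, so it only suits pages that really are gone.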
As for any secret pages you want to create in the future, I suggest using a form with the POST method instead of GET, with code along these lines:
<form method="POST" action="http://example.com/path/to/secret">
<input type="submit" value="button label">
</form>
That way, a server-side script can reject requests from users who simply type the secret URL into the address bar, since those requests arrive as GET rather than POST.
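A server-side check of this kind might look like the following Python sketch, built on the standard http.server module. It assumes the secret page lives at /path/to/secret, as in the form above. The helper status_for and the class SecretHandler are hypothetical names for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET_PATH = "/path/to/secret"  # matches the form's action URL above


def status_for(method, path):
    """Decide the HTTP status: only POST requests to the secret path succeed."""
    if path != SECRET_PATH:
        return 404
    return 200 if method == "POST" else 405


class SecretHandler(BaseHTTPRequestHandler):
    def _reply(self, method):
        status = status_for(method, self.path)
        body = b"secret content\n" if status == 200 else b"not available\n"
        self.send_response(status)
        if status == 405:
            # Tell the client which method is accepted for this URL.
            self.send_header("Allow", "POST")
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply("GET")

    def do_POST(self):
        self._reply("POST")


def run(port=8000):
    """Start the demo server on localhost; call this manually to try it."""
    HTTPServer(("127.0.0.1", port), SecretHandler).serve_forever()
```

This only turns away plain GET requests, which is what address-bar navigation and ordinary link crawling produce. Anyone can still send a POST, so it is not a substitute for real access control.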
Nothing in the world will make a real link undiscoverable by Google.
Even if you block example.com/page?download from crawling with robots.txt, de-index the download page with noindex, and mark up the link with nofollow, a single incoming backlink to example.com/page?download is enough for Google to discover the URL.
That's why it is better to use a button instead of a link:
<form method="get" action="file.exe">
<button type="submit">Download</button>
</form>