Should I add a "nofollow" attribute to download links, or disallow the URLs in robots.txt?
I have a download link very similar to Opera's one - it's just a script that sends the file. It doesn't have an extension and there's no obvious way to tell that it's actually a download link.
So, since I don't want robots to crawl this link, do I need to add it to robots.txt, or maybe add a "nofollow" attribute to it? I see that on Opera's website they didn't do either of these, so perhaps it's not necessary?
While I understand your question, I am curious why you wouldn't want it indexed; I have recently been doing some reading on SEO for non-HTML documents (like .pdf and .docx).
Favoring transparency, I figure that if it isn't a payment page or something of that sort (confidential info, or content not relevant to my site), then I may as well milk it for all it is worth, SEO-wise - and people love free stuff (like downloads).
Anyway, I think it is good practice to make it clear (normally visually) that download links are downloads (I am guessing from context that you already have, by styling it as a download button or something). Further, while I don't know whether it is still used by Yahoo! (or has been adopted by others), they utilized (as of 2008) a 'noindex' class for some content.
You don't need to put nofollow on links that are perfectly trustworthy. It's not needed, and there's no reason to try to sculpt your PageRank, if that's what you're aiming for.
From SEOMoz and Matt Cutts
Does the Number of Outbound Links from a Page Affect PageRank?
For instance, to conserve "link juice" and/or funnel it more discreetly, does it matter whether I have three outbound links versus two? In the original PageRank formula, yes, juice flowed out in a simple formula of passable PR divided by the number of outbound links. But nowadays, Matt says, it is a much more cyclical, iterative analysis and "it really doesn't make as much difference as people suspect." There's no need to hoard all of your link juice on your page and, in fact, there may be benefit to generously linking out (not the least of which is the link-building power of goodwill).
www.seomoz.org/blog/whiteboard-interview-googles-matt-cutts-on-redirects-trust-more
On the other hand, are you trying to hide the download file from everyone but authorized users? If so, you should go about it another way, using a script that only initiates the download once the user is authorized.
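A rough sketch of one common way to back that up - assuming the real files sit in a directory of their own that should never be served directly (the path and the Apache 2.2-style directives here are illustrative, not from the original answer) - is to refuse direct requests to that directory, so only the authorization-checking script can read the files from disk and send them:
# .htaccess inside the protected files directory: refuse direct requests;
# the download script reads the files from disk and streams them after checking authorization
Order deny,allow
Deny from all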
If the file doesn't matter, just link to it.
To stop search engines and other robots from requesting certain URLs, you need to disallow those URLs in your robots.txt file.
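For example, a minimal robots.txt rule (the /download path is just a placeholder for wherever your download script actually lives) would be:
User-agent: *
Disallow: /download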
Adding the rel="nofollow" attribute to links might not actually stop robots from following those links — it just tells search engines that those links should not be considered as endorsements of any kind, e.g. because they might have been added by untrusted third parties, and, in particular, that they should not be taken into account when calculating PageRank or other similar link metrics. However, the exact details might vary between search engines, and possibly also over time for any given search engine.
There exists also the similarly name nofollow attribute for the robots meta tag which does instruct compliant search engines not to follow any links from the tagged page. However, that tag applies to all links on the page, not just to some particular ones. Also, neither meta tags nor rel="nofollow" will stop search engines from crawling your download URLs if they find them through some link that they are allowed to follow, either because you forgot to put the tag on some page that links to the download URL, or because someone else copied the URL to another site. Thus, in this situation, robots.txt is the only reliable solution.
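To make the difference concrete, here is a minimal sketch (the /download URL is a placeholder):
<a href="/download" rel="nofollow">Download</a>  <!-- hint about this one link only -->
<meta name="robots" content="nofollow">  <!-- applies to every link on the page -->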
Neither should actually be necessary. The 'extension' at the end of a file or URL is practically meaningless - it gives a hint to the file type, but files can obviously be named anything you like. On the web, the way to specify a file type is the MIME type, given through the Content-Type HTTP header. For example, a JPEG image has the MIME type image/jpeg.
Before downloading a file, a program will first check the HEAD response from the server so it knows what to do with the content. Search engines do the same, and assuming you are sending correct headers (for example application/zip for .zip files or application/octet-stream for generic binary files), they will avoid downloading the actual content and stick to things they recognise, like web pages and images.
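As an illustration, a HEAD exchange for a hypothetical /download/setup.zip URL might look like this (the URL, host and sizes are made up):
HEAD /download/setup.zip HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Content-Type: application/zip
Content-Length: 10485760
Content-Disposition: attachment; filename="setup.zip"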
Having said all this, it probably is best to block access to the URLs, just to be on the safe side. I would say robots.txt is the simplest and best way to do this. You can block the script, or the folder the script is in, and that will apply to all future downloads you add, too.
There's no reason not to do both. You could consider requiring an HTTP Referer from within your site too - the only bot I've seen which sends referrer headers isn't actually a spider but a distributed attempt to exploit the ASP.NET ViewState padding oracle vulnerability.
I have one directory which I do this on. The .htaccess for that directory is
RewriteEngine On
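# Combine our host and the Referer into one string, then require the Referer's host
# (matched via the \1 backreference) to be our own host; anything else is redirected below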
RewriteCond %{HTTP_HOST};%{HTTP_REFERER} !^([^;]+);http://\1(/.*)?$
RewriteRule .* /foro.htm [R]
(where /foro.htm is the page which links into the directory - that way people following links from other sites aren't completely messed up).