There are two main ways to prevent search engines from indexing specific pages:
A Robots.txt file for your domain.
The Meta Robots tag on each page.
Robots.txt should be your first stop for URL patterns that match several files; see Google's robots.txt documentation for the syntax and full details. The robots.txt file must be placed in the root folder of your domain, i.e. at www.yourdomain.com/robots.txt, and it would contain something like:
User-agent: *
Disallow: /path/with-trailing-slash/
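As a slightly richer sketch of blocking several files at once (the paths here are hypothetical, and note that the * and $ wildcards are an extension honored by Google and Bing, not part of the original robots.txt standard):

```
User-agent: *
Disallow: /private/
Disallow: /*.pdf$
```

The first rule blocks everything under /private/, and the second blocks any URL ending in .pdf, for crawlers that support the wildcard extension.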
The Meta Robots tag is more flexible and capable, but must be inserted in every page you want to affect.
Again, Google has an overview of how to use Meta Robots, and of how to get pages removed from its index via Webmaster Tools (now Search Console). Wikipedia has more comprehensive documentation on Meta Robots, including the search-engine-specific variants.
If you want to prohibit Google, the Internet Archive, and other search engines from keeping a cached copy of your webpage, then you want the following tag (shown in HTML4 format):
<meta name="robots" content="noarchive">
To prevent indexing and keeping a copy:
<meta name="robots" content="noindex, noarchive">
And to prevent both of the above, as well as using links on the page to find more pages to index:
<meta name="robots" content="noindex, nofollow, noarchive">
NB 1: All three meta tags above apply to search engines alone; they do not affect HTTP proxies or browsers.
NB 2: If you already have pages indexed and archived, and you block those pages via robots.txt while also adding the meta tag to them, the robots.txt will prevent search engines from ever re-crawling the pages, so they will never see the updated meta tag.
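To check which of your URLs a given robots.txt rule actually blocks (and thus where crawlers can never reach a meta tag), you can use Python's standard-library robots.txt parser. The domain and paths below are hypothetical; the rules match the example earlier in this answer:

```python
# Check robots.txt rules against specific URLs using the Python
# standard library's robots.txt parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse the same rules shown in the robots.txt example above.
rp.parse([
    "User-agent: *",
    "Disallow: /path/with-trailing-slash/",
])

# Blocked: crawlers never fetch this page, so they never see its meta tags.
blocked = rp.can_fetch("*", "https://www.example.com/path/with-trailing-slash/page.html")
# Allowed: crawlers can fetch this page and honor its meta robots tag.
allowed = rp.can_fetch("*", "https://www.example.com/other/page.html")

print(blocked)  # False
print(allowed)  # True
```

If the first call returns False for a page you expect the meta tag to deindex, the robots.txt rule is shadowing the tag and should be lifted until the page has been re-crawled.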