Google indexing page with parameters but page is Disallowed in robots.txt
I have the following in robots.txt:
User-agent: *
Disallow: /refer.php
User-agent: NinjaBot
Allow: /
Sitemap: http://www.mysite.com/sitemap.xml
The refer.php file does various things depending on what GET parameters are passed to it.
When I do a Google search, I see tons of results for pages like this:
http://www.mysite.com/refer.php?o=23945
http://www.mysite.com/refer.php?o=39858
http://www.mysite.com/refer.php?o=9683
http://www.mysite.com/refer.php?o=10569
http://www.mysite.com/refer.php?o=58304
http://www.mysite.com/refer.php?o=69604
Is Google indexing these because I don't have an asterisk (*) after refer.php in robots.txt? Would changing it to Disallow: /refer.php* fix the problem?
3 Comments
Your robots.txt is fine. However, it may not be enough to prevent indexing entirely: the Disallow directive in robots.txt blocks crawling, but in some cases the URLs themselves will still be indexed because of links or other factors.
Robots.txt is not meant to prevent the indexing of URLs; its purpose is to prevent crawling.
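The best way to keep Google from indexing a URL is to put the following in the document head (Googlebot must be able to crawl the page to see it, so the URL must not also be disallowed in robots.txt):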
<meta name="robots" content="noindex" />
Google Help:
While Google won't crawl or index the content of pages blocked by
robots.txt, we may still index the URLs if we find them on other pages
on the web. As a result, the URL of the page and, potentially, other
publicly available information such as anchor text in links to the
site, or the title from the Open Directory Project (www.dmoz.org), can
appear in Google search results.
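To confirm that a page actually serves the noindex tag mentioned above, something like the following Python sketch could be run against the page's HTML. The names RobotsMetaParser and has_noindex are made up for this illustration, and the check only looks at the robots meta tag in the markup you pass in; it does not fetch anything or inspect HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # content may hold several comma-separated directives.
        for directive in attr.get("content", "").split(","):
            self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex" /></head></html>'
print(has_noindex(page))
```

If refer.php only redirects and never outputs an HTML head, a check like this would come back negative, which is a hint that the tag is not reaching crawlers.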
Add this to your robots.txt:
Disallow: /refer.php?*
Googlebot understands the wildcard, and this is the most explicit way to tell it not to crawl those URLs.
To work with all robots, try it without the trailing *, but do test with the Google Webmaster Tools robots.txt tester to make sure Googlebot will be blocked.
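As a rough local sanity check before using Google's tester, the wildcard-free rule can be run through Python's standard urllib.robotparser. This is only a sketch: Python's parser is not Googlebot, it does plain prefix matching and does not interpret * wildcards, and the helper name is_crawl_allowed is made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, with the wildcard-free rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /refer.php

User-agent: NinjaBot
Allow: /

Sitemap: http://www.mysite.com/sitemap.xml
"""


def is_crawl_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url (per Python's parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Googlebot", "NinjaBot"):
    allowed = is_crawl_allowed(agent, "http://www.mysite.com/refer.php?o=23945")
    print(agent, "allowed" if allowed else "blocked")
```

A "blocked" result here only says the rule is syntactically doing what you expect for a simple prefix-matching robot; Google's own tester remains the authority on how Googlebot reads the file.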
You shouldn't need an asterisk at the end: a path left open, without a trailing dollar sign, should match anything that follows it. Maybe the fact that it ends in .php is causing an issue. In that case I might try:
Disallow: /*refer.php?
Also, this may be obvious, but how long has the robots.txt been in place? I have seen Google take up to, and over, a couple of weeks before updating the SERPs to reflect robots.txt changes.
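The matching behaviour described in this thread (a plain path acts as a prefix, * matches any run of characters, a trailing $ anchors the end) can be sketched in a few lines of Python. This is an illustration of those rules as stated here, not Googlebot's actual implementation, and the function name robots_pattern_matches is made up for the example.

```python
import re


def robots_pattern_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Assumed rules: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and otherwise the pattern is a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters such as '.' and '?', then turn each '*'
    # into '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives prefix semantics.
    return re.match(regex, url_path) is not None


url = "/refer.php?o=23945"
for pattern in ("/refer.php", "/refer.php*", "/refer.php?*", "/*refer.php?", "/refer.php$"):
    print(pattern, robots_pattern_matches(pattern, url))
```

Under these assumptions, every variant suggested in the thread matches /refer.php?o=23945, and only the $-anchored form does not, which is consistent with the point that the trailing asterisk is not what decides whether the URL is blocked.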