Robots.txt to block a parameter instead of a directory
It's my understanding that URLs are of the format example.com/something/somethingelse, and that query parameters follow the URL after a question mark, e.g. example.com?l=fr_FR
My CMS website has language translations that use parameters. The example above is for the French language version of my site.
I would like to block all non-English translations from Google's index using robots.txt.
In the blocked URLs tool in GWT I tried to test this:
# robots.txt generated at www.mcanerin.com
User-agent: *
Disallow:
Disallow: /cgi-bin/
Disallow: ?l=
Against the following URL, which appeared as one of the duplicate page titles in HTML improvements.
example.com/reports/view/884?l=eu
This is my first time fiddling with this tool in GWT so I'm not sure if I'm using it right.
The test result for Googlebot says:
Allowed by line 3: Disallow:
I wanted the rule to prevent Google from indexing any URL that contains the following string:
?l=
Here are some examples of URLs I'd like to block from the index. These URLs generate duplicate titles according to GWT.
/reports/view/884?l=km
/reports/view/884?l=ne_NP
/reports/view/884?l=te
/index.php/page/index/12?l=fr_FR&l=hy_AM
/index.php/page/index/12?l=ht_HT&l=bn_BD
/index.php/page/index/12?l=hu_HU&l=hy_AM
Can I tell robots to exclude any URL that contains
?l=
You can block URLs that contain ?l= from being crawled by using the following robots.txt directive:
Disallow: /*?l=
The leading / anchors the rule at the root of the site, and * is a wildcard that matches any sequence of characters, so the rule matches any path followed by ?l= anywhere in the URL. Your original rule, Disallow: ?l=, never matched because rules are compared against the URL path, which always begins with / -- that is why the tester fell through to the empty Disallow: line and reported the URL as allowed. Note that the * wildcard is an extension honored by Googlebot and most major crawlers; it is not part of the original robots.txt standard, so minor bots may ignore it.
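If you want to sanity-check the rule offline, here is a minimal Python sketch of Googlebot-style pattern matching. This is an illustration of the documented matching rules (prefix match, * as wildcard, $ as end anchor), not Google's actual implementation, and the helper names are my own:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Googlebot-style robots.txt rule into a regex.

    '*' matches any run of characters, a trailing '$' anchors the
    end of the URL; otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        body += "$"
    return re.compile(body)

def is_blocked(rule: str, url_path: str) -> bool:
    # Rules match from the start of the path (prefix semantics).
    return robots_pattern_to_regex(rule).match(url_path) is not None

# The duplicate-title URLs from the question are caught by /*?l= :
print(is_blocked("/*?l=", "/reports/view/884?l=km"))                   # True
print(is_blocked("/*?l=", "/index.php/page/index/12?l=fr_FR&l=hy_AM")) # True
# The plain English page stays crawlable:
print(is_blocked("/*?l=", "/reports/view/884"))                        # False
# The original rule never matches, since paths always start with '/':
print(is_blocked("?l=", "/reports/view/884?l=km"))                     # False
```

Googlebot's real matcher also applies longest-match precedence between Allow and Disallow rules, which this sketch omits, but for a single Disallow pattern the behavior above is what the GWT tester should report.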