Prevent Bing from crawling thousands of essentially identical pages?
I have a web page with a dozen tables of data on it, each with half a dozen columns. Every table can be sorted by a column by clicking the relevant header, and the chosen sort columns get appended to the query string.
For example, a page with three tables sorted by columns 4 and 6, and by column 3 descending:
page.html?s1=4&s2=6&s3=-3
etc.
I have nofollow links on the column headers, and
<link rel="canonical" href="page.html">
on the page.
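The header links are ordinary anchors that carry the sort parameter, roughly like this (a simplified sketch):
<!-- header link that sorts the first table by column 4 -->
<a href="page.html?s1=4" rel="nofollow">Column 4</a>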
But Bing still crawls its way through thousands of combinations: 5772 of them yesterday!
I've marked s1/s2/s3/s4... as parameters to ignore (a long time ago), but that hasn't helped.
How can I prevent it from doing this? It's unnecessary server load for no gain.
You can tell Bing, and other web crawlers, what to crawl and what to ignore using a file called robots.txt in the root of your website. It lets you tell specific crawlers, or all of them, to ignore specific URLs.
In your case:
User-Agent: *
Disallow: /*?s1=*&s2=*&s3=*
You might need to make small changes to the Disallow line depending on the parameters used on your site.
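Note that the pattern above only matches URLs where s1, s2, and s3 all appear in that exact order. Bing supports the * wildcard in Disallow rules, so a more thorough sketch (assuming the sort parameters are named s1 through s4, as in the question) would block each parameter whether it appears as the first parameter (?s1=) or a later one (&s1=):
User-agent: *
# Block any URL containing a sort parameter, regardless of its position in the query string
Disallow: /*?s1=
Disallow: /*&s1=
Disallow: /*?s2=
Disallow: /*&s2=
Disallow: /*?s3=
Disallow: /*&s3=
Disallow: /*?s4=
Disallow: /*&s4=
Bear in mind that once these URLs are disallowed, Bing will stop fetching them and so will never see the canonical tag on those pages; since they are duplicates of page.html anyway, that is usually the desired outcome.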
More on robots.txt files here.