
Prevent Bing from crawling thousands of essentially identical pages?

@Turnbaugh106

Posted in: #Bingbot #CanonicalUrl #WebCrawlers

I have a web page with a dozen tables of data on it, each with half a dozen columns. Every table can be sorted by a column by clicking on the relevant header, and the chosen sort orders are appended to the query string.

For example, a page with its first three tables sorted by column 4, column 6, and column 3 descending:

page.html?s1=4&s2=6&s3=-3


etc.

I have nofollow links on the column headers, and

<link rel="canonical" href="page.html">


on the page.
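
For illustration, the relevant markup looks roughly like this (a simplified sketch; the real column names and paths differ):

<head>
  <link rel="canonical" href="page.html">
</head>
...
<!-- each sortable column header is a nofollow link that adds a sort parameter -->
<th><a href="page.html?s1=4" rel="nofollow">Column 4</a></th>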

But Bing still crawls its way through thousands of combinations. 5772 of them yesterday!

I marked s1/s2/s3/s4... as parameters to ignore a long time ago, but that hasn't helped.

How can I prevent it from doing this? It's unnecessary server load for no gain.





1 Comment


 

@Phylliss660

You can tell Bing, and other web crawlers, what to spider and what to ignore using a file called robots.txt in the root of your website.

You can tell specific crawlers, or all of them, to ignore specific URLs.

In your case:

User-agent: *
Disallow: /*?s1=
Disallow: /*?s2=
Disallow: /*?s3=


You might need to adjust the Disallow lines, or add more, depending on the parameters used on your site (for example, one line for each further sort parameter, s4 and so on).
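
If the sort parameters are the only query-string parameters that page ever takes (as the question suggests), a simpler sketch is a single prefix rule that blocks every query-string variant of the page; page.html here stands in for the real path:

User-agent: *
# Disallow rules are prefix matches, so this blocks page.html?s1=4,
# page.html?s2=6&s3=-3, and so on, while page.html itself stays crawlable.
Disallow: /page.html?

Unlike the wildcard rules above, this does not need updating when new sort parameters are added.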

More on robots.txt files here


