
GoogleBot is crawling my datepicker to infinity

@Pope3001725

Posted in: #Googlebot #Seo

Our website is an agenda of events taking place in our city.

We have a datepicker so users can look at events scheduled for the next week, the next month, and so on.

We recently discovered that GoogleBot is crawling this datepicker and requesting events centuries in the future. It crawls URLs like
example.com/2208-01-01/
Is there a way for us to tell GoogleBot not to look that far into the future?

I apologize if this is a silly question, but I am a developer, not a webmaster, so I'm pretty new to this.

UPDATE 1

As Stephen suggested in the comments, I should prevent both human and bot visitors from requesting events too far in the future.
I can do this in code, without having to update robots.txt periodically.
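
A minimal sketch of that in-code check, in Python. It assumes URLs of the form /YYYY-MM-DD/ as in the example above; the one-year limit (MAX_DAYS_AHEAD) and the function name status_for_path are hypothetical and not taken from the site's real code. The idea is simply to answer 404 for any date beyond the limit, so neither visitors nor bots get a valid page there.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical limit: how far ahead the agenda may be browsed.
MAX_DAYS_AHEAD = 365

# Matches paths such as /2208-01-01/ (trailing slash optional).
DATE_PATH = re.compile(r"^/(\d{4})-(\d{2})-(\d{2})/?$")


def status_for_path(path: str, today: Optional[date] = None) -> int:
    """Return the HTTP status a date URL should get: 200 or 404."""
    today = today or date.today()
    match = DATE_PATH.match(path)
    if match is None:
        return 404
    try:
        year, month, day = (int(part) for part in match.groups())
        requested = date(year, month, day)
    except ValueError:
        # Not a real calendar date, e.g. /2018-02-30/.
        return 404
    if requested > today + timedelta(days=MAX_DAYS_AHEAD):
        # Too far in the future: treat the page as nonexistent.
        return 404
    return 200
```

The datepicker itself would also need to stop rendering a "next" link once the limit is reached; otherwise crawlers keep discovering URLs that only return 404.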

One question remains, though: won't GoogleBot periodically re-request all the future URLs it has already crawled?

If so, maybe I should complement this solution with a few robots.txt rules that block requests more than, say, 10 years in the future, or even better, generate those rules dynamically in code.


2 Comments


@YK1175434

This is a more case-specific solution, but it might be useful in your case:


Indicate paginated content
<link rel="prev" href="http://www.example.com/article-part1.html">
<link rel="next" href="http://www.example.com/article-part3.html">

Use rel="next" and rel="prev" links to indicate the relationship between component URLs. This markup provides a strong hint to Google that you would like us to treat these pages as a logical sequence, thus consolidating their linking properties and usually sending searchers to the first page.


@Sherry384

If you want to prevent crawling, you have to use robots.txt.

It would make sense to go this way if 1) you need to have the pages for these future dates and 2) you want to save your server’s and/or the search engine bot’s resources.

You can decide for which years you want to prevent crawling by specifying the beginning of the corresponding URL paths:


Prevent crawling of all years after 2019:

Disallow: /202

Prevent crawling of all years after 2018:

Disallow: /2019
Disallow: /202

Prevent crawling of all years after 2022:

Disallow: /2023
Disallow: /2024
Disallow: /2025
Disallow: /2026
Disallow: /2027
Disallow: /2028
Disallow: /2029
Disallow: /203

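etc. Each Disallow value is matched as a path prefix, so /202 covers only 2020 through 2029; later decades and centuries need their own prefixes (for example /203, /21, /22).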


Make sure to remove the rules for previously blocked years when the time comes.
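
To avoid maintaining these rules by hand, they could be generated from the current year. The following Python sketch is one possible way to do it, not an established tool: the function name disallow_rules is hypothetical, and it assumes every date URL starts with a four-digit year directly after the slash. It emits the shortest set of prefixes covering every year after the last allowed one, up to 9999. Note that short prefixes such as /3 also block any unrelated path starting with that digit.

```python
from typing import List


def disallow_rules(last_allowed_year: int) -> List[str]:
    """Build Disallow lines blocking every four-digit year after the given one."""
    year = last_allowed_year + 1
    if not 1000 <= year <= 9999:
        raise ValueError("expected a four-digit year to start blocking from")
    rules = []
    while year <= 9999:
        # Pick the largest aligned block (1, 10, 100 or 1000 years)
        # that starts exactly at `year`.
        size = 1
        while size < 1000 and year % (size * 10) == 0:
            size *= 10
        # A block of 10 years shares 3 leading digits, 100 years 2, etc.
        prefix_length = 4 - (len(str(size)) - 1)
        rules.append("Disallow: /" + str(year)[:prefix_length])
        year += size
    return rules


if __name__ == "__main__":
    # The rules still need to sit under a User-agent line in robots.txt.
    print("User-agent: *")
    print("\n".join(disallow_rules(2022)))
```

For 2022 as the last allowed year, the output begins with /2023 through /2029 and /203, matching the hand-written example above, and then continues with the remaining decades, centuries, and millennia.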
