Mobile app version of vmapp.org
Carla537

Why is Google crawling non-existent URLs?

@Carla537

Posted in: #CrawlErrors #Google #Googlebot #Links

In the live traffic view of my WordPress website, I can see that Googlebot is crawling pages that don’t exist, for example:

example.gr/search/search-results/password-reset%252Fpassword-reset/password-reset%252Fpassword-reset%252F&listview=2/?pg=6&dtype=prosfata&listview=2
example.gr/search/search-results/password-reset%252F&listview=1/password-reset/search/advanced-search/tag/katigoria/gaming/?pg=15&order=lcomdate&dtype=prosfata&listview=1

I can’t figure out where Googlebot discovered these links, but there are thousands of them, and they are almost the only URLs that Google crawls.

I have added noindex, nofollow to these URLs, but the bot still crawls them. How can I stop this? Why does Google crawl only these URLs? I think they may be causing the high CPU usage.

One more question: I recently added caching to my website. Shouldn’t Google crawl the cached pages for better speed? When I use “Fetch as Google”, I can see that it fetches non-cached pages.


@BetL925

Googlebot crawls any URL that it finds:


- Links on your own and third-party websites
- Text on the page that looks like a URL
- JavaScript strings that look like they might be URLs
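
Google has not published exactly how it decides that plain text or a JavaScript string "looks like" a URL. As a rough illustration only, the hypothetical Python heuristic below treats absolute http(s) URLs and quoted strings beginning with "/" as candidate URLs. The two patterns and the function name url_like_strings are assumptions, not Googlebot's actual rules.

```python
import re

# Hypothetical heuristics; Google's real rules for "looks like a URL" are not public.
ABSOLUTE_URL = re.compile(r"https?://[^\s\"'<>]+")      # e.g. https://www.example.gr/news
QUOTED_PATH = re.compile(r"[\"'](/[^\s\"'<>]+)[\"']")  # e.g. "/search/" inside a JS string


def url_like_strings(text):
    """Return absolute URLs, then quoted root-relative paths, found in text."""
    return ABSOLUTE_URL.findall(text) + QUOTED_PATH.findall(text)


page = (
    "See https://www.example.gr/news for details. "
    '<script>var next = "/search/search-results/?pg=6";</script>'
)
print(url_like_strings(page))
```

Running something like this over your own page source can show which URL-like strings a crawler could pick up in addition to ordinary links.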


Check your own site to see whether it links to these pages. If it doesn’t, the links are probably on some other site. Google Search Console may be able to tell you which site in its crawl errors report.
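
To check your own pages, a minimal sketch using Python's standard html.parser and urllib.parse could collect every <a href> on a page, resolve relative links against the page's URL, and report those under /search/. The function name links_under, its default prefix, and the sample HTML are hypothetical; in practice you would feed it HTML fetched from your own site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_under(page_url, html, prefix="/search/"):
    """Return absolute link targets on the page whose path starts with prefix."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    hits = []
    for href in collector.hrefs:
        target = urljoin(page_url, href)  # resolve relative hrefs like a browser does
        if urlparse(target).path.startswith(prefix):
            hits.append(target)
    return hits


sample_html = '<a href="password-reset%252Fpassword-reset/">Reset</a> <a href="/about/">About</a>'
print(links_under("https://www.example.gr/search/search-results/", sample_html))
```

Because urljoin resolves a relative href against the current page, a link such as password-reset%252Fpassword-reset/ on a page under /search/search-results/ points to a longer URL under that same path.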

One thing that you can do about it is to use robots.txt to disallow crawling of whole directories. Based on your examples, /search would be a great candidate for disallow:

User-agent: *
Disallow: /search
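
Python's standard urllib.robotparser module can be used to sanity-check which URLs a rule like this blocks. The sketch assumes the rule sits in a group for all crawlers (User-agent: *), since robots.txt rules only take effect inside a User-agent group; the helper name allowed is hypothetical, and Python's parser is not Google's, so edge cases may differ.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: the Disallow rule inside a catch-all User-agent group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="Googlebot"):
    """True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


for url in (
    "https://www.example.gr/",
    "https://www.example.gr/search/search-results/password-reset%252Fpassword-reset/?pg=6",
    "https://www.example.gr/searchable-page/",
):
    print("allowed" if allowed(url) else "blocked", url)
```

Disallow values are prefix matches, so Disallow: /search also blocks any other path that starts with /search, such as /searchable-page/. If that is too broad, Disallow: /search/ limits the rule to the directory.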


It is also possible that it isn't actually Googlebot doing the crawling. It may be a bot spoofing Googlebot to try to find vulnerabilities on your website. You can verify whether or not it is actually Googlebot by checking the IP address using the procedure here: How to identify if IP address is really google's IP
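
For reference, Google's crawler-verification documentation describes a two-step DNS check: a reverse lookup of the IP address must return a hostname in googlebot.com or google.com, and a forward lookup of that hostname must return the same IP. The Python sketch below implements that check with the standard socket and ipaddress modules. The function names are hypothetical, and the lookups are passed in as parameters so the logic can be tested without network access.

```python
import ipaddress
import socket

# Hostname suffixes for Googlebot per Google's verification docs at the time of
# writing; other Google fetchers may use additional domains, so check the current list.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_dns(ip):
    """PTR lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(hostname):
    """A/AAAA lookup: hostname -> set of address strings."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_real_googlebot(ip, reverse=reverse_dns, forward=forward_dns):
    """Reverse-then-forward DNS check; any lookup failure counts as 'not Googlebot'."""
    try:
        client = ipaddress.ip_address(ip)
        hostname = reverse(ip).rstrip(".").lower()
        # Step 1: the PTR name must sit inside a Google-owned domain.
        if not hostname.endswith(GOOGLEBOT_SUFFIXES):
            return False
        # Step 2: that name must resolve back to the same IP address.
        return any(ipaddress.ip_address(addr) == client for addr in forward(hostname))
    except (OSError, ValueError):
        # socket.herror and socket.gaierror are OSError subclasses;
        # ValueError means the IP string itself was invalid.
        return False
```

With the default resolvers, a call such as is_real_googlebot("66.249.66.1") performs real DNS queries. The forward lookup is what defeats spoofing: anyone can set a PTR record for their own IP that claims a googlebot.com name, but they cannot make that name resolve back to their address.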

If it isn't actually Googlebot, you can block the IP addresses it uses in .htaccess: How to block entire IPs of a VPN server by IP
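
If you end up with a list of offending addresses, a small Python helper can turn them into .htaccess rules. The sketch below assumes Apache 2.4 (the Require syntax of mod_authz_core and mod_authz_host) and that the server's AllowOverride setting permits authorization directives in .htaccess. It uses ipaddress.collapse_addresses to merge overlapping or adjacent ranges. The function name htaccess_block and the documentation-range IPs are illustrative.

```python
import ipaddress


def htaccess_block(addresses):
    """Return an Apache 2.4 <RequireAll> block that denies the given IPs/CIDR ranges."""
    # strict=False tolerates host bits, e.g. "203.0.113.5/24" becomes 203.0.113.0/24.
    networks = [ipaddress.ip_network(a, strict=False) for a in addresses]
    # collapse_addresses accepts only one IP version per call, so merge v4 and v6 separately.
    merged = []
    for version in (4, 6):
        merged.extend(ipaddress.collapse_addresses(n for n in networks if n.version == version))
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {net.with_prefixlen}" for net in merged]
    lines.append("</RequireAll>")
    return "\n".join(lines)


print(htaccess_block(["203.0.113.5", "203.0.113.0/25", "203.0.113.128/25", "2001:db8::1"]))
```

Collapsing first keeps the rule list short: the single address and the two /25 halves above end up as one /24 entry.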


