How to tell Googlebot to crawl just URLs ending in .html

Posted in: #CrawlErrors #GoogleSearchConsole

I have a big problem in Google Webmaster Tools. The number of 404 errors in the crawl error report is increasing so fast that I now have over 1000. When I check the errors, I see that for every page, Googlebot tries to crawl the URL without .html, which produces a 404 error each time.

I have tried to find the source of these errors. Here is an example (without .html): ermagazin.com/najgora-nuklearna-katastrofa-u-americkoj-povijesti-za-koju-nikad-niste-culi It is linked from 3 sources, all of which are correct URLs. One of them is ermagazin.com/najgora-nuklearna-katastrofa-u-americkoj-povijesti-za-koju-nikad-niste-culi.html, which is the correct URL that Googlebot should be crawling instead of the first one without .html.

Is there something I can add to robots.txt to prevent Googlebot from crawling the URLs without .html?

My robots.txt is:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
Disallow: /readme.html

Sitemap: ermagazin.com/sitemap_index.xml

The sitemap index lists these sitemaps (last modified):

ermagazin.com/post-sitemap.xml 2016-02-11 08:57
ermagazin.com/page-sitemap.xml 2016-01-14 14:45
ermagazin.com/category-sitemap.xml 2016-02-11 08:57
ermagazin.com/post_tag-sitemap1.xml 2016-02-11 08:57
ermagazin.com/post_tag-sitemap2.xml 2016-02-11 08:57
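A quick way to check whether this robots.txt already blocks either form of the URL is Python's standard-library `urllib.robotparser`. This is a rough sketch only: the Sitemap line is left out because it does not affect matching, and Python's parser applies rules in file order and does not implement Google's `*`/`$` wildcard extensions, whereas Googlebot uses the most specific matching rule. For the article path from the question, both the `.html` URL and the extensionless one come back as allowed, so nothing in the current file is involved either way.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the question (Sitemap line omitted; it does not affect matching).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
Disallow: /readme.html
"""

# Article path taken from the question.
ARTICLE = "/najgora-nuklearna-katastrofa-u-americkoj-povijesti-za-koju-nikad-niste-culi"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Print one allowed/blocked verdict per path, as Python's parser sees it.
for path in (ARTICLE + ".html", ARTICLE, "/wp-admin/"):
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{verdict:8} {path}")
```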

1 Comment

@Eichhorn148

When Googlebot is crawling such a large number of bad URLs, it is almost always because your site is misconfigured and you are linking to the URLs incorrectly somewhere.

In your case, it is the "show all articles" link. For example, on this page I see the following in the HTML source code:

<a href="http://ermagazin.com/zakopao-zivu-djevojku-8-mjeseci-zbog-vjerovanja-da-ce-to-donijeti-bogatstvo-tanzanija-u-soku" class="more-articles-button">show all articles</a>

It appears that when I click on that button in a browser, I don't end up on the 404 page. You must have some JavaScript that intercepts the click and makes browsers do something else. However, Googlebot scans the HTML source code and finds that link. When it tries to follow it, it gets a 404 for the extensionless version of each and every article on your site.

You need to fix that link, and look for others like it.
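To look for other links like the "show all articles" one, a minimal Python sketch using the standard library's `html.parser` can scan a page's HTML source for same-site `<a href>` targets that don't end in `.html`. The rule for what counts as legitimate (the home page and paths ending in `/`, such as category archives) is an assumption about this site's URL structure, and `find_extensionless_links` is a hypothetical helper, so treat the output as a list to review by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ExtensionlessLinkFinder(HTMLParser):
    """Collect same-site <a href> targets whose path does not end in .html."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.site = urlsplit(page_url).netloc
        self.suspect_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they appear on.
        absolute = urljoin(self.page_url, href)
        parts = urlsplit(absolute)
        if parts.netloc != self.site:
            return  # external (or mailto:) link, not relevant here
        path = parts.path
        # Assumption about this site's URL scheme: the home page and paths ending
        # in "/" (e.g. category archives) are fine; articles should end in .html.
        if path == "" or path.endswith("/") or path.endswith(".html"):
            return
        self.suspect_links.append(absolute)


def find_extensionless_links(html, page_url):
    """Hypothetical helper: return same-site links on a page that lack .html."""
    finder = ExtensionlessLinkFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.suspect_links


if __name__ == "__main__":
    # The anchor quoted above; "some-article.html" is a hypothetical page URL.
    sample = ('<a href="http://ermagazin.com/zakopao-zivu-djevojku-8-mjeseci-zbog-'
              'vjerovanja-da-ce-to-donijeti-bogatstvo-tanzanija-u-soku" '
              'class="more-articles-button">show all articles</a>')
    for link in find_extensionless_links(sample, "http://ermagazin.com/some-article.html"):
        print(link)
```

Feeding it the saved HTML source of a few article pages should surface any template links (like the "show all articles" button) that point at the extensionless URLs.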

Another thing you can do is redirect requests for the URLs without .html to the correct versions. Since you are using WordPress, you might want to use a WordPress 404 plugin that lets you monitor and redirect 404 errors. I used to use one called "True Google 404" that ran the words in not-found URLs through the site search and automatically redirected to the proper page. Sadly, that plugin no longer appears to be available. I did a quick search, but I didn't find any plugins that redirect WordPress 404s based on URL patterns.
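WordPress itself runs on PHP, so the following is only a language-neutral illustration, in Python, of the pattern-based redirect described above, written as WSGI middleware: a request for `/slug` is answered with a 301 to `/slug.html`, but only when that `.html` path is known to exist, so a visitor or Googlebot is never sent from one 404 to another. `known_html_paths` is a hypothetical set (for example, built from the article paths in the post sitemap), and `html_redirect_path` and `redirect_extensionless` are names made up for this sketch.

```python
def html_redirect_path(path, known_html_paths):
    """Return the .html path to redirect to, or None to let the request 404."""
    if not path or path == "/" or path.endswith(".html"):
        return None
    candidate = path.rstrip("/") + ".html"
    # Only redirect when the target is known to exist, so a 404 never
    # turns into a redirect to another 404.
    return candidate if candidate in known_html_paths else None


def redirect_extensionless(app, known_html_paths):
    """Wrap a WSGI app so /slug is answered with a 301 to /slug.html."""

    def middleware(environ, start_response):
        target = html_redirect_path(environ.get("PATH_INFO", ""), known_html_paths)
        if target is None:
            # Not an extensionless article URL: pass through unchanged.
            return app(environ, start_response)
        query = environ.get("QUERY_STRING", "")
        location = target + ("?" + query if query else "")
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]

    return middleware
```

A permanent (301) redirect is the usual choice here because it tells Google the `.html` URL is the one to keep; the existence check is the design choice that matters most, since a blind "append .html" rule would also redirect URLs that never existed.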

More posts by @Gloria169