Yeniel560

Google can't crawl my rewritten URLs


Posted in: #Googlebot #GoogleSearchConsole #Htaccess #RobotsTxt #Sitemap

I have been trying to submit my robots.txt and sitemap.xml for hours now, changing directories and .htaccess rules over and over.

For some stupid reason Google can't crawl my pages and warns me that the URLs in my sitemap.xml are blocked by robots.txt, even when I allow Google to crawl my whole site with an empty Disallow: rule. My .htaccess rules work fine for my homepage.

robots.txt

User-agent: *
# allow sitemap
Allow: /sitemap.xml

# allow article page
Allow: /blog/article/

# allow article 1
Allow: /blog/article/first-article

# sitemap url (must be a full URL including the scheme)
Sitemap: http://www.my-page.de/sitemap.xml
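As a local sanity check, the rules above can be fed to Python's standard urllib.robotparser to see whether they block any of the sitemap URLs. This is only a sketch: Python's parser is not Googlebot's parser, the helper name blocked_urls is made up for illustration, and the URL list is simply copied from the sitemap.xml in this post.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt above. Comments and blank lines are
# dropped, because this parser treats a blank line as the end of a rule group.
ROBOTS_TXT = """\
User-agent: *
Allow: /sitemap.xml
Allow: /blog/article/
Allow: /blog/article/first-article
"""

# URLs copied from the sitemap.xml in this post.
SITEMAP_URLS = [
    "http://www.my-page.de",
    "http://www.my-page.de/blog/article",
    "http://www.my-page.de/blog/article/first-article",
]


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Rules from this post: only Allow lines, so nothing should be listed.
    print(blocked_urls(ROBOTS_TXT, SITEMAP_URLS))
    # For comparison: a blanket "Disallow: /" blocks every URL.
    print(blocked_urls("User-agent: *\nDisallow: /", SITEMAP_URLS))
```

With only Allow lines (or an empty Disallow:), this parser reports nothing as blocked, which suggests the warnings are not coming from the rules themselves.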

sitemap.xml

...
<url>
<loc>http://www.my-page.de</loc>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://www.my-page.de/blog/article</loc>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://www.my-page.de/blog/article/first-article</loc>
<changefreq>daily</changefreq>
</url>
...
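To test every sitemap entry against robots.txt one by one, the loc values can be pulled out with Python's standard XML parser. A minimal sketch, assuming the elided parts of the file are the usual urlset wrapper with the sitemaps.org namespace; SAMPLE_SITEMAP and sitemap_locs are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the elided parts of sitemap.xml are the standard <urlset> wrapper.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.my-page.de</loc>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>http://www.my-page.de/blog/article</loc>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>http://www.my-page.de/blog/article/first-article</loc>
    <changefreq>daily</changefreq>
  </url>
</urlset>
"""


def sitemap_locs(xml_text):
    """Return every <loc> URL found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in sitemap_locs(SAMPLE_SITEMAP):
        print(url)
```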


.htaccess

# ERROR DOCUMENTS
ErrorDocument 401 /includes/errors/404.php
ErrorDocument 403 /includes/errors/404.php
ErrorDocument 404 /includes/errors/404.php
ErrorDocument 410 /includes/errors/404.php

# ACTIVATING REWRITEENGINE
RewriteEngine on

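# TRANSLATE NON-WWW TO WWW
# (the target needs a scheme, otherwise it is treated as a local path;
# $1 keeps the requested path)
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

# TRANSLATE EVERY SUB_DOMAIN_PAGE TO DOMAIN_PAGE
# (dots escaped so they match literally; %{REQUEST_URI} keeps the requested path)
RewriteCond %{HTTP_HOST} ^www\.mypage\.de$ [NC]
RewriteRule ^ http://www.my-page.de%{REQUEST_URI} [L,R]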

# CLEAN URLS
# blog & article
RewriteRule ^(.*)blog/article.?$ article.php
RewriteRule ^(.*)first-article$ first-article.php
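To see which request paths the two clean-URL patterns actually catch, the regular expressions can be tried out in Python. This is a rough sketch, not an emulation of mod_rewrite: it assumes Python's re behaves like Apache's PCRE for patterns this simple, it strips the leading slash the way a root-level .htaccess does, and matching_rules is a name invented for the example.

```python
import re

# The two patterns from the CLEAN URLS section, keyed by an illustrative label.
PATTERNS = {
    "article": re.compile(r"^(.*)blog/article.?$"),
    "first-article": re.compile(r"^(.*)first-article$"),
}


def matching_rules(request_path):
    """Return the labels of the patterns that match a request path (hypothetical helper)."""
    # In a root-level .htaccess the pattern is tested without the leading slash.
    path = request_path.lstrip("/")
    return [name for name, pattern in PATTERNS.items() if pattern.search(path)]


if __name__ == "__main__":
    for path in ["/", "/blog/article", "/blog/article/first-article",
                 "/blog/articles", "/news/first-article"]:
        print(path, matching_rules(path))
```

Because of the leading (.*) and the trailing .?, the patterns are looser than they look: /blog/articles and /news/first-article are caught as well, so anchoring them to the exact paths may be worth considering.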


root directory

├─── ...
├─── blog
│ └─── first-article.php
├─── sitemap.xml
├─── robots.txt
├─── .htaccess
├─── article.php
├─── ...


EDIT: For some reason, Google has now managed to crawl the URLs listed in my sitemap, although I changed absolutely nothing. Maybe the errors were caused by what @closetnoc mentioned below: Google only re-checks the robots.txt file periodically.
