
Applebot not crawling sitemap.xml

@Reiling115

Posted in: #RobotsTxt #WebCrawlers

On our website we noticed that the Applebot is not crawling our sitemap.xml, so it is unaware of most of our internal webpages.

We have no robots.txt restrictions for that bot. On the contrary, we tried to whitelist it, and there is a Sitemap field containing the sitemap URL, as you can see below.

Are we doing anything wrong in robots.txt that blocks it from crawling our sitemap.xml?

Here's what our robots.txt looks like, with minor changes such as replacing our domain with example.com:

## Disallow ALL
User-agent: *
Disallow: /
Crawl-delay: 600

## but allow only important bots
User-agent: Applebot
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Mediapartners-Google
User-agent: msnbot
User-agent: msnbot-media
User-agent: Slurp
User-agent: Yahoo-Blogs
User-agent: Yahoo-MMCrawler
User-agent: Yandex

## disallow directories
Disallow: /dir1/
Disallow: /dir2/
## disallow files
Disallow: /status
Disallow: /health

## disallow some text file extensions
Disallow: /*.txt$
Disallow: /*.json$

Sitemap: example.com/sitemap.xml
Host: example.com
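One way to see how a parser groups these rules is Python's standard-library urllib.robotparser. This is a sketch under stated assumptions, not a model of Applebot: Python's parser follows the original robots.txt convention in which a blank line ends a record, so the blank line between the User-agent lines and the "## disallow directories" rules discards the Applebot group, and Applebot then falls back to the "User-agent: *" group with "Disallow: /". RFC 9309 allows blank lines inside a group, and Apple's own parser may behave either way. The sketch also checks whether the Sitemap value is an absolute URL, which the sitemaps.org protocol expects; the scheme may simply have been lost when the domain was replaced with example.com. The condensed file, the helper names check, sitemap_urls and is_absolute, and the test URL are hypothetical.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical condensed copy of the question's robots.txt
# (user-agent list shortened). Note the blank line after the User-agent lines.
QUESTION_ROBOTS = """\
## Disallow ALL
User-agent: *
Disallow: /
Crawl-delay: 600

## but allow only important bots
User-agent: Applebot
User-agent: Googlebot

## disallow directories
Disallow: /dir1/
Disallow: /dir2/

Sitemap: example.com/sitemap.xml
Host: example.com
"""

# Same file with the blank line after the User-agent lines removed.
NO_GAP_ROBOTS = QUESTION_ROBOTS.replace(
    "User-agent: Googlebot\n\n", "User-agent: Googlebot\n"
)


def check(robots_txt: str, agent: str, url: str) -> bool:
    """Ask Python's stdlib parser (not Applebot's) whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the raw Sitemap values the parser collected (Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


def is_absolute(url: str) -> bool:
    """True only if the value has both a scheme and a host."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


if __name__ == "__main__":
    page = "https://example.com/some-page"  # hypothetical internal page
    print(check(QUESTION_ROBOTS, "Applebot", page))  # False
    print(check(NO_GAP_ROBOTS, "Applebot", page))    # True
    for url in sitemap_urls(QUESTION_ROBOTS):
        print(url, is_absolute(url))  # example.com/sitemap.xml False
```

Under this parser, the "Host:" line is simply ignored, and the Sitemap line is collected regardless of where it appears in the file.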


3 Comments


@Kristi941

Keep these lines at the end of the file:

## Disallow ALL
User-agent: *
Disallow: /
Crawl-delay: 600


@Caterina187

@PlanetScale

Since the default for URLs is 'index' (for Google, at least), why include an empty Disallow: line in robots.txt at all?

I'd omit it rather than specify it.


@Jessie594

A single typo appears to be causing this issue.

Your "Disallow: /" statement is blocking all robots. It should be an empty "Disallow:" instead.

Here is a rewritten version of your robots.txt file that lets robots access your sitemap:

User-Agent: *
Disallow:
Disallow: /dir1
Disallow: /dir2
Disallow: /status
Disallow: /health

Sitemap: example.com/sitemap.xml
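To illustrate the question raised about the empty Disallow: line, here is a similar sketch, again using Python's urllib.robotparser as a stand-in for a real crawler (an assumption; Applebot's parser is not known to match it). That parser applies the first rule whose path matches, in file order, and reads an empty "Disallow:" as "allow everything", so in the rewritten file above it shadows the /dir1, /dir2, /status and /health lines. Parsers that pick the longest matching rule, as RFC 9309 specifies, would still apply those lines. The helper name allowed and the test paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The rewritten robots.txt from the answer, unchanged.
ANSWER_ROBOTS = """\
User-Agent: *
Disallow:
Disallow: /dir1
Disallow: /dir2
Disallow: /status
Disallow: /health

Sitemap: example.com/sitemap.xml
"""

# Same file with only the empty "Disallow:" line removed.
WITHOUT_EMPTY = ANSWER_ROBOTS.replace("Disallow:\n", "", 1)


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Hypothetical helper: ask Python's first-match parser about one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    print(allowed(ANSWER_ROBOTS, "Applebot", "/dir1/page"))  # True
    print(allowed(WITHOUT_EMPTY, "Applebot", "/dir1/page"))  # False
```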
