Clean up hacked site by getting Google to crawl and index only the URLs in the sitemap

@Sue5673885

Posted in: #GoogleSearchConsole #RobotsTxt #Seo #Sitemap #XmlSitemap

So recently, our website was hacked and we're trying to clean everything up right now. But when doing a "site:" search, Google still shows the cached Japanese spam pages.

So we tried playing with robots.txt, i.e.:

User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap.xml

But when I enter the bad URL in the robots.txt tester, it still shows the URL we don't want as allowed.

Is there any way to get Google to crawl only the URLs in the sitemap via robots.txt, without manually adding all the bad links to Disallow?
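For reference, the robots.txt above blocks nothing: an empty Disallow value permits crawling of everything, which is why the tester still allows the bad URL. A quick sketch with Python's standard urllib.robotparser (which applies the same matching rules crawlers use; the URL is a hypothetical placeholder) illustrates this:

```python
# Sketch: an empty "Disallow:" value allows all crawling, so no URL is
# blocked by the robots.txt shown in the question.
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow:",  # empty value = allow everything
])

# Any URL, including a hacked one, is still crawlable:
print(rules.can_fetch("Googlebot", "https://www.example.com/any-hacked-url.html"))  # True
```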


1 Comment


@Heady270

Google has never limited itself to crawling and indexing just URLs that are in the sitemap. Such functionality does not exist, and I doubt that it ever will.

Sitemaps are fairly useless. They don't help with rankings, and they rarely get Google to index pages it wouldn't otherwise index. Google really only uses them to choose preferred URLs, to discover alternate-language URLs, and to give you extra data in Search Console. See "The Sitemap Paradox".

You probably don't want to use robots.txt to disallow those URLs either: robots.txt blocks crawling, not indexing. You need Google to re-crawl the URLs and see that they are gone, so Googlebot must be able to access them.
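To illustrate the point, here is a sketch using Python's standard urllib.robotparser with a hypothetical /hacked-page/ rule: once a path is disallowed, a well-behaved crawler can never re-fetch it, so it never discovers the 404.

```python
# Sketch: why a Disallow rule backfires during cleanup. A blocked URL
# cannot be re-fetched, so the crawler never sees that it now 404s.
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /hacked-page/",  # hypothetical rule blocking the spam path
])

# Googlebot may not fetch the hacked URL, so it cannot discover the 404:
print(rules.can_fetch("Googlebot", "https://www.example.com/hacked-page/spam.html"))  # False
# The rest of the site remains crawlable:
print(rules.can_fetch("Googlebot", "https://www.example.com/about/"))  # True
```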

To clean up your hacked URLs, make sure they now return a 404 status. Google will remove each of them within about 24 hours of next crawling it. It could still take a few months to remove all the URLs, because Google may not re-crawl some of them again soon. See "Site was hacked, need to remove all URLs starting with + from Google, use robots.txt?"
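A small stdlib script can verify the cleanup before waiting on Google: it requests each formerly-hacked URL and reports whether it now returns 404. The URL in the list is a hypothetical placeholder for your own spam URLs.

```python
# Sketch: confirm each formerly-hacked URL now returns HTTP 404, so the
# next Googlebot crawl will drop it from the index.
from urllib.request import urlopen
from urllib.error import HTTPError

def status_of(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urlopen(url) as response:
            return response.status
    except HTTPError as err:
        return err.code  # urlopen raises on 4xx/5xx; the code is what we need

if __name__ == "__main__":
    hacked_urls = [
        "https://www.example.com/japanese-spam-page.html",  # hypothetical
    ]
    for url in hacked_urls:
        code = status_of(url)
        print(url, code, "gone" if code == 404 else "STILL LIVE")
```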

If there are not too many URLs, you can submit them individually through the Google Search Console Remove URLs tool. That will get Google to remove them much faster than waiting for the re-crawl, but there is no bulk-remove feature.


