Recovering from an incorrectly deployed robots.txt?
We accidentally deployed a robots.txt from our development site that disallowed all crawling. This has caused traffic to dip dramatically, and Google results to report:
A description for this result is not available because of this site's robots.txt – learn more.
We corrected the robots.txt about a week and a half ago, and you can see our robots.txt here.
However, search results still report the same robots.txt message. The same appears to be true for Bing.
We've taken the following action:
Submitted the site to be recrawled through Google Webmaster Tools
Submitted a sitemap to Google
(basically doing everything possible to say "Hey we're here! and we're crawlable!")
Indeed, a lot of crawl activity seems to have been happening lately, but descriptions still aren't showing up in the results.
I noticed this question where the problem was specific to a 303 redirect back to a disallowed path.
We are 301 redirecting to /blog, and crawling is allowed there. The redirect is due to a site redesign: WordPress paths for posts such as /2012/02/12/yadda yadda have been moved to /blog/2012/02/12. We 301 redirect old URLs to the WordPress paths under /blog to keep our Google juice. However, the sitemap we submitted might have /blog URLs; I'm not sure how much this matters. We clearly want to preserve Google juice for the /2012/02/... URLs that other sites linked to before our redesign.
So perhaps this has prevented some content from getting recrawled? How can we get all of our content, including pages linked to from both before and after the redesign, to report descriptions again? How can we resolve this problem and get our search traffic back to where it used to be?
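For reference, the old-to-new URL mapping described above can be sketched as a simple path rewrite. This is a hypothetical illustration of the redirect logic, not the actual server configuration; the `/yadda yadda` slug is stand-in text.

```python
import re
from typing import Optional

# Old WordPress permalinks look like /YYYY/MM/DD/slug; the redesign moved
# them under /blog. A 301 redirect maps the old path to the new one.
OLD_POST = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/(.*)$")

def redirect_target(path: str) -> Optional[str]:
    """Return the /blog target for an old-style post path, else None."""
    if OLD_POST.match(path) is None:
        return None  # not an old post URL: serve as-is, no redirect
    return "/blog" + path

print(redirect_target("/2012/02/12/hello-world"))  # /blog/2012/02/12/hello-world
print(redirect_target("/about"))                   # None
```

Note that a path already under /blog does not match the pattern, which is what prevents the redirect loop described in the linked question about 303 redirects back to a disallowed path.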
As you already noticed, bots are crawling your pages again, so it is only a matter of time until they recrawl more of your pages and show indexed snippets in their search results.
While it will likely not result in any problems, your current robots.txt is invalid according to the original specification, because your record doesn’t contain a Disallow line (emphasis mine):
The record starts with one or more User-agent lines, followed by one or more Disallow lines, as detailed below.
So your robots.txt should be:
User-agent: *
Disallow:
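You can verify the difference with Python's standard `urllib.robotparser`: an empty `Disallow:` line blocks nothing, while the accidentally deployed `Disallow: /` blocks everything. (The example.com URL is just a placeholder.)

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_lines, url):
    """Parse a robots.txt given as a list of lines and test one URL."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch("*", url)

# Empty Disallow: nothing is blocked.
print(can_fetch(["User-agent: *", "Disallow:"], "http://example.com/blog/"))    # True
# Disallow: / (the dev-site version that got deployed) blocks everything.
print(can_fetch(["User-agent: *", "Disallow: /"], "http://example.com/blog/"))  # False
```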
Apart from that …
As also mentioned by w3d in the comments, it would be a good practice to use the same host name for all your pages: either with the www subdomain or without it.
Your sitemap tries to load an XSL stylesheet which doesn’t exist.
Your sitemap is in the wrong location. Because it is at opensourceconnections.com/blog/sitemap.xml, it may only contain URLs that start with opensourceconnections.com/blog/. But the paths of the URLs you list don’t start with /blog/.
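That scope rule can be checked mechanically: per the sitemaps.org protocol, a sitemap may only list URLs at or below its own directory. A minimal sketch (the URLs are illustrative, mirroring the situation described above):

```python
def urls_out_of_scope(sitemap_url, urls):
    """Return the URLs that a sitemap at sitemap_url may NOT contain:
    entries must live at or below the sitemap's own directory."""
    base = sitemap_url.rsplit("/", 1)[0] + "/"  # directory of the sitemap
    return [u for u in urls if not u.startswith(base)]

bad = urls_out_of_scope(
    "http://opensourceconnections.com/blog/sitemap.xml",
    [
        "http://opensourceconnections.com/blog/2014/01/02/some-post/",  # in scope
        "http://opensourceconnections.com/2014/01/02/some-post/",       # out of scope
    ],
)
print(bad)  # ['http://opensourceconnections.com/2014/01/02/some-post/']
```

Either move the sitemap to the document root or make every listed URL start with /blog/.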
Your blog redirect doesn’t seem to work. These two URLs show the same page:
/2014/01/02/presentation-building-client-side-search-applications/
/blog/2014/01/02/presentation-building-client-side-search-applications/
I see in your robots.txt:
User-agent: *
To allow all, you should follow the user-agent line with an explicit directive.
Use:
User-agent: *
Allow: /
This will make sure Google knows that everything under the root folder is allowed to be in the index. Give it a few weeks; Google's crawler deals with many thousands of sites and does not update in a jiffy.
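You can confirm with Python's standard `urllib.robotparser` that this record permits crawling of any path (the URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Allow: /"])
# Any path should now be fetchable by any crawler.
print(rp.can_fetch("Googlebot", "http://example.com/blog/some-post/"))  # True
```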
As for the redirection - I'm not sure I got your question. It seems that you did right by prepending /blog to the old URLs with a 301. The juice should flow to the new URLs. The sitemap should reflect the new URL structure, so there seems to be no problem here either.