Recovering from an incorrectly deployed robots.txt?
We accidentally deployed a robots.txt from our development site that disallowed all crawling. This has caused traffic to dip dramatically, and Google results to report:
A description for this result is not available because of this site's robots.txt – learn more.
We corrected the robots.txt about a week and a half ago, and you can see our robots.txt here.
However, search results still report the same robots.txt message. The same appears to be true for Bing.
We've taken the following action:
Submitted the site to be recrawled through Google Webmaster Tools
Submitted a sitemap to Google
(basically doing everything possible to say "Hey we're here! and we're crawlable!")
Indeed, a lot of crawl activity seems to have been happening lately, but descriptions still aren't showing up in the results.
I noticed this question where the problem was specific to a 303 redirect back to a disallowed path.
We are 301 redirecting to /blog, and crawling is allowed there. The redirect is due to a site redesign: WordPress paths for posts such as /2012/02/12/yadda yadda have been moved to /blog/2012/02/12. We 301 redirect old URLs to the WordPress paths under /blog to keep our Google juice. However, the sitemap we submitted might have /blog URLs; I'm not sure how much this matters. We clearly want to preserve Google juice for the /2012/02/... URLs that other sites linked to before our redesign.
So perhaps this has prevented some content from getting recrawled? How can we get all of our content, including pages linked to from both before and after the redesign, to report descriptions again? How can we resolve this problem and get our search traffic back to where it used to be?
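For reference, the old-to-new URL mapping described above can be sketched as a simple path rewrite. This is a hypothetical illustration of the redirect logic, not the actual server configuration; the `/yadda yadda` slug is stand-in text.

```python
import re
from typing import Optional

# Old WordPress permalinks look like /YYYY/MM/DD/slug; the redesign moved
# them under /blog. A 301 redirect maps the old path to the new one.
OLD_POST = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/(.*)$")

def redirect_target(path: str) -> Optional[str]:
    """Return the /blog target for an old-style post path, else None."""
    if OLD_POST.match(path) is None:
        return None  # not an old post URL: serve as-is, no redirect
    return "/blog" + path

print(redirect_target("/2012/02/12/hello-world"))  # /blog/2012/02/12/hello-world
print(redirect_target("/about"))                   # None
```

Note that a path already under /blog does not match the pattern, which is what prevents the redirect loop described in the linked question about 303 redirects back to a disallowed path.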
As you already noticed, bots are crawling your pages again, so it is only a matter of time until they recrawl more of your pages and show indexed snippets in their search results.
While it will likely not result in any problems, your current robots.txt is invalid according to the original specification, because your record doesn’t contain a Disallow line (emphasis mine):
The record starts with one or more User-agent lines, followed by one or more Disallow lines, as detailed below.
So your robots.txt should be:
User-agent: *
Disallow:
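You can verify the difference with Python's standard `urllib.robotparser`: an empty `Disallow:` line blocks nothing, while the accidentally deployed `Disallow: /` blocks everything. (The example.com URL is just a placeholder.)

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_lines, url):
    """Parse a robots.txt given as a list of lines and test one URL."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch("*", url)

# Empty Disallow: nothing is blocked.
print(can_fetch(["User-agent: *", "Disallow:"], "http://example.com/blog/"))    # True
# Disallow: / (the dev-site version that got deployed) blocks everything.
print(can_fetch(["User-agent: *", "Disallow: /"], "http://example.com/blog/"))  # False
```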
Apart from that …
As also mentioned by w3d in the comments, it would be a good practice to use the same host name for all your pages: either with the www subdomain or without it.
Your sitemap tries to load an XSL stylesheet which doesn’t exist.
Your sitemap is in the wrong location. Because it is at opensourceconnections.com/blog/sitemap.xml, it may only contain URLs that start with opensourceconnections.com/blog/. But the paths of the URLs you list don’t start with /blog/.
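That scope rule can be checked mechanically: per the sitemaps.org protocol, a sitemap may only list URLs at or below its own directory. A minimal sketch (the URLs are illustrative, mirroring the situation described above):

```python
def urls_out_of_scope(sitemap_url, urls):
    """Return the URLs that a sitemap at sitemap_url may NOT contain:
    entries must live at or below the sitemap's own directory."""
    base = sitemap_url.rsplit("/", 1)[0] + "/"  # directory of the sitemap
    return [u for u in urls if not u.startswith(base)]

bad = urls_out_of_scope(
    "http://opensourceconnections.com/blog/sitemap.xml",
    [
        "http://opensourceconnections.com/blog/2014/01/02/some-post/",  # in scope
        "http://opensourceconnections.com/2014/01/02/some-post/",       # out of scope
    ],
)
print(bad)  # ['http://opensourceconnections.com/2014/01/02/some-post/']
```

Either move the sitemap to the document root or make every listed URL start with /blog/.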
Your blog redirect doesn’t seem to work. These two URLs show the same page:
/2014/01/02/presentation-building-client-side-search-applications/
/blog/2014/01/02/presentation-building-client-side-search-applications/
I see in your robots.txt:
User-agent: *
To allow all, you should follow the user-agent line with an explicit directive.
Use:
User-agent: *
Allow: /
This will make sure Google knows that everything under the root folder is allowed to be in the index. Give it a few weeks; Google's crawler deals with many thousands of sites and does not update in a jiffy.
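You can confirm with Python's standard `urllib.robotparser` that this record permits crawling of any path (the URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Allow: /"])
# Any path should now be fetchable by any crawler.
print(rp.can_fetch("Googlebot", "http://example.com/blog/some-post/"))  # True
```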
As for the redirection - I'm not sure I got your question. It seems that you did right by prepending /blog to the old URLs with a 301. The juice should flow to the new URLs. The sitemap should reflect the new URL structure, so there seems to be no problem here either.