How can I tell Googlebot that a subdirectory is now a subdomain?

@Goswami781

Posted in: #301Redirect #Googlebot #Subdomain

I had about a million pages of a catalog indexed under a subdirectory, and that content has now moved to a subdomain. Googlebot is crawling each of those pages and getting a 301 redirect to the new location. Even though the redirect rule lives in the Apache sites-enabled configuration file (i.e. Apache issues the redirect early, before PHP is even loaded), the server isn't handling the load well. Googlebot is making around 5 requests per second, and on top of my normal traffic that drives the CPU up for hours at a time.
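
The rule I have is along these lines (hostnames and paths here are hypothetical stand-ins for my real ones):

    <VirtualHost *:80>
        ServerName www.example.com
        # Permanently (301) redirect the old subdirectory to the new
        # subdomain, carrying over the rest of the path in $1; mod_alias
        # appends the original query string automatically.
        RedirectMatch permanent ^/catalog/(.*)$ http://catalog.example.com/$1
    </VirtualHost>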

I checked Webmaster Tools and the corresponding documentation for a way to let Google know that the content had moved from a subdirectory to a subdomain, but with little luck. The most helpful advice I found was simply to send 301 redirects to the new location.

How can I tell Googlebot that a subdirectory is now a subdomain? If that is not an option, how can I serve 301 redirects for a particular subdomain more efficiently?

I was thinking perhaps of using Nginx for this, but I'm not sure I can run Apache and Nginx side by side on port 80 for different subdomains.
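
What I had in mind for Nginx is something like the sketch below (same hypothetical names as above), so the 301 is answered without Apache or PHP ever being touched:

    server {
        listen 80;
        server_name www.example.com;

        # Answer the 301 directly from nginx; $1 is the captured
        # remainder of the path, and $is_args$args re-attaches any
        # query string.
        location ~ ^/catalog/(.*)$ {
            return 301 http://catalog.example.com/$1$is_args$args;
        }
    }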


2 Comments


@Goswami781

Googlebot is making around 5 requests per second, and on top of my normal traffic that drives the CPU up for hours at a time.


You can reduce the rate at which search engines crawl your site with the robots.txt Crawl-delay directive, though note that Googlebot itself ignores Crawl-delay; for Google you need the crawl rate setting in Webmaster Tools. For other crawlers, set a long delay until things cool down, e.g. Crawl-delay: 30.
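
A minimal robots.txt sketch (the 30-second value is only illustrative):

    User-agent: *
    # Ask compliant crawlers (e.g. Bing, Yandex) to wait 30 seconds
    # between successive requests. Googlebot does not honor this.
    Crawl-delay: 30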

I agree with everything PatomaS said.


@Yeniel560

You mention that you have already read the documentation and are following the right steps, so that part is done. Of course, you could post the specific techniques you are using here so we can check that you are doing everything correctly.

About the redirection, 301 is the best option: if you moved something permanently, a 301 is what you should send to clients.
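
For example, you can check what a client actually receives with a quick HEAD request (URL hypothetical):

    # -I asks only for the response headers; you should see
    # "HTTP/1.1 301 Moved Permanently" plus a Location header
    # pointing at the new subdomain.
    curl -I http://www.example.com/catalog/some-page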

You also mention the speed at which Google is crawling your site; you should control that. See Google's documentation on changing the crawl rate, and remember that you can also suggest a revisit frequency in your sitemap. That won't help with the first pass, but as soon as Googlebot finishes it, your problem should go away, and with the right schedule in your sitemap everything should be fine.
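
For example, a sitemap entry can carry the revisit hint like this (URL hypothetical; changefreq is only a hint to crawlers, not a command):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://catalog.example.com/some-page</loc>
        <changefreq>monthly</changefreq>
      </url>
    </urlset>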

You can also block parts of the site in your robots.txt; Google respects that. Then, after a few hours, edit the robots.txt to allow access to some other folders. That throttles the flow to your site, but it may delay indexing and appearance in Google's results. It's your decision.
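
For example, a temporary robots.txt could keep most of the catalog closed and be relaxed in stages (folder names hypothetical):

    User-agent: *
    # Remove one Disallow line at a time as the server copes with the load.
    Disallow: /catalog/a/
    Disallow: /catalog/b/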

Don't forget to add the subdomain to your webmaster tools account so you can control all the settings, including the crawling rate and frequency.

Check your logs to see that all the redirections are working as expected.
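
For example, on a default Apache combined log you can tally the status codes returned for the old path (log path and format are assumptions; adjust to your setup):

    # $7 is the request path, $9 the status code; you want to see 301s.
    awk '$7 ~ /^\/catalog\// {print $9}' /var/log/apache2/access.log | sort | uniq -c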

About having two web servers listening on the same port: you can't do that. Only one process can bind a given IP address and port combination, so the second server will simply fail to start with an "address already in use" error. If you want to run both Apache and Nginx, put them on different ports or different IP addresses.
