
Best way to take down a site for a long period while I develop it

@Jessie594

Posted in: #Seo #WebDevelopment

Rather foolishly, I have been developing my site on a live server (with VCS, don't worry), kind of using it as a portfolio / show-off piece. It is far from finished.

Recently, I noticed that Google has started to spider it, including some of the debug data accessible across the site because it's in 'debug' mode. I'm not sure what future effect this will have on my site when it is finally released - especially as some pages simply error out while I develop, and there is gobbledygook/dummy/blank data on a few pages.

SEO-wise, what are the best steps to take to avoid incurring any present or future Google penalty? One thing I thought of was to possibly disallow spidering in robots.txt, put up a splash page, and then set up a decent sitemap when the site is live again.
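For reference, the blanket disallow I have in mind would just be the standard two-line robots.txt (assuming the whole site should be off-limits while it's down):

    User-agent: *
    Disallow: /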

Regardless, I am going to set up a development domain somewhere, but I would like to know the safest way of taking the live site down.


2 Comments


@Odierno851

I'd do two things:


1. Verify your site in Google Webmaster Tools and request removal of your full site. Remember to undo this when you're ready :-). Keep in mind that this will only last 90 days, so you may have to re-request a site removal at that time.
2. Return 403 (use HTTP authentication) for all URLs on your development site, including the robots.txt. Returning 403 for the robots.txt will prevent the site from being crawled, so you don't need to block it in the robots.txt file (use the normal robots.txt file that you'd use when you make the site public). There's a config sketch below.


Reasons for using HTTP authentication rather than robots.txt disallows include blocking the site from all other visitors as well, and making it harder to accidentally publish a robots.txt file with a full disallow :). The latter has happened too many times, even to really large sites.
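As a minimal sketch of the HTTP-authentication approach - assuming Apache with .htaccess overrides enabled; the password-file path is just a placeholder to adjust - something like this in the dev site's root requires a login for every URL, robots.txt included. Strictly speaking, unauthenticated requests get a 401 rather than a 403, but Googlebot treats both as a signal not to crawl or index:

    # Require a login for the whole dev site, robots.txt included
    AuthType Basic
    AuthName "Development site"
    AuthUserFile /etc/apache2/.htpasswd-dev
    Require valid-user

You'd create the password file once with htpasswd -c /etc/apache2/.htpasswd-dev yourusername, then drop the -c flag when adding further users.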


@Hamm4606531

Well, first of all, as general advice, I suggest putting all the files in a subfolder that only you know about. For example, if you have the domain example.com, put all your files in example.com/private. If Google starts to crawl your website and index those pages, you'll pay a little penalty later, because some pages will no longer exist (usually the test pages).

Besides, it is good practice, as you said, to use robots.txt to exclude some (or all) pages from being indexed. That tip is more useful now than if you had used the private folder as I suggest. Keep in mind that robots.txt is a good way to exclude pages, but not the best: while "good" crawlers such as Google and Yahoo will respect your wishes, "evil" ones will use this information to grab some of your (maybe private) data.
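To illustrate that last caveat: a robots.txt that excludes specific areas is itself publicly readable, so it advertises exactly where the sensitive pages live (the paths below are hypothetical):

    User-agent: *
    Disallow: /private/
    Disallow: /debug/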
