
Blackhat - Copying competitor content before it is indexed

@Miguel251

Posted in: #Blackhat #DuplicateContent #Google #Seo

The title pretty much sums up the question. I am in a very competitive niche.

Despite having a relatively strong site, my content does not always get indexed straight away.

I recently had a case where a competitor copied my content word for word and posted the exact duplicate, yet his copy ranked for the post despite me having written it.

QUESTION:


If content is not yet indexed by Google, is it effectively free for anyone to grab and copy to their own site in the hope that the stolen copy gets indexed before the original? Does this often happen as a black hat technique?
I realize you can use Fetch as Google in Search Console and then submit to index, and usually my content gets indexed almost instantly when I do this -- BUT I don't always like doing this, preferring Google to discover content naturally. Am I wrong to want natural content discovery, or should I just submit each post manually to the index?


Any answers / suggestions welcome.


2 Comments


@Heady270

Make Your Site Harder to Scrape

Most scrapers download your RSS feed to see your fresh content. If you put the full content of your articles into your RSS feed, it is very easy for scrapers to get your full content and re-publish it. To combat this, put only an article summary in your RSS feeds, or disable RSS altogether.
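
If you want to verify what your feed currently exposes, a quick check is possible from a script. This is a minimal sketch, assuming Python with the third-party feedparser package installed and a placeholder feed URL; it simply reports whether each entry ships full content or only a summary.

    # Check whether an RSS/Atom feed exposes full article content or only summaries.
    # Assumes: pip install feedparser; the feed URL below is a placeholder.
    import feedparser

    FEED_URL = "https://example.com/feed"  # hypothetical feed URL

    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        # feedparser puts full-text bodies under "content"; summaries under "summary".
        body = entry.get("content", [{}])[0].get("value", "")
        summary = entry.get("summary", "")
        label = "FULL CONTENT" if len(body) > len(summary) else "summary only"
        print(f"{entry.get('title', '(untitled)')}: {label}")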

Another mechanism that scrapers can use is XML sitemaps. You can give Google access to your XML sitemap without showing it to potential scrapers. To do so, give the sitemap a custom name and submit it to Google via Search Console. Do not link to it or put it in your robots.txt file.
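
As an illustration, here is a minimal sketch (assuming Python and a hand-picked list of URLs, both placeholders) that writes a standard sitemap file under an unguessable name; you would then submit only that filename through Search Console and leave it out of robots.txt.

    # Write a minimal XML sitemap under a hard-to-guess filename.
    # The URL list and output location are placeholders for illustration.
    import secrets
    from xml.sax.saxutils import escape

    urls = [
        "https://example.com/new-article",
        "https://example.com/another-article",
    ]

    filename = f"sitemap-{secrets.token_hex(8)}.xml"  # e.g. sitemap-3f9c2a1b7d4e6f80.xml
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

    with open(filename, "w", encoding="utf-8") as f:
        f.write(xml)
    print(f"Submit {filename} in Search Console; do not link it or list it in robots.txt")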

Scrapers rely on bots to fetch your content, and those bots usually visit your site frequently. Look in your server logs to see if you can identify that behavior, and block IP addresses that appear to belong to content scrapers.
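
For example, a rough way to spot that pattern is to count requests per client IP in your access log. Here is a minimal sketch, assuming an Apache/Nginx-style combined log at a hypothetical path and an arbitrary threshold; anything flagged deserves a manual look before you actually block it.

    # Count requests per client IP in a combined-format access log and flag heavy hitters.
    # The log path and threshold are assumptions for illustration.
    from collections import Counter

    LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path
    THRESHOLD = 500                          # flag IPs above this many requests

    hits = Counter()
    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in log:
            ip = line.split(" ", 1)[0]  # first field in combined log format is the client IP
            hits[ip] += 1

    for ip, count in hits.most_common(20):
        flag = "  <-- possible scraper" if count > THRESHOLD else ""
        print(f"{ip}: {count}{flag}")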

Publishing your content to an unlinked URL until Google indexes it is another strategy that I've seen employed. For example, in WordPress you can use the "draft" feature that keeps the article off the home page. In the meantime, tell Google about the page and let them crawl it.

Get Google to Index Your Content Faster

We have a question about this with good answers.

Submitting pages manually to Google Search Console is one way to get your content indexed quickly, but as you mention, it is a pain. There are other methods to let Google know about your content:


Sitemap XML files
Ping services such as pingomatic.com. They notify search engines about new content, and if your site has enough reputation Google will crawl it quickly (a minimal ping call is sketched below this list).
Posting a link to your article on Twitter. Google has access to their "firehose" and usually crawls things linked on Twitter within minutes.


Many content management systems have built-in functionality or plugins to do these automatically for you.
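
For reference, the ping itself is a single XML-RPC call. This is a minimal sketch using only Python's standard library, assuming Ping-o-Matic still accepts the classic weblogUpdates.ping method at rpc.pingomatic.com; the site name and URL are placeholders.

    # Send a standard weblogUpdates.ping to a ping service via XML-RPC.
    # Endpoint, site name, and URL are assumptions/placeholders.
    import xmlrpc.client

    PING_ENDPOINT = "http://rpc.pingomatic.com/"
    SITE_NAME = "My Example Site"
    SITE_URL = "https://example.com/"

    server = xmlrpc.client.ServerProxy(PING_ENDPOINT)
    response = server.weblogUpdates.ping(SITE_NAME, SITE_URL)
    # A typical reply is a dict with "flerror" and "message" fields.
    print(response)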

React to Scraper Sites

If you do find scraper sites with your content you can sometimes get them taken down with DMCA requests.

In the past, Google has asked for reports of scraper sites outranking the original. Their submission process is now closed, however. I don't know of any place to submit that to Google currently.


@Mendez628

I'm sure this is a common problem in competitive environments, and even more so when the content covers a niche topic.


Q: If content is not yet indexed by Google, is it effectively free for anyone to grab
and copy to their own site in the hope that the stolen copy gets indexed
before the original? Does this often happen as a black hat technique?


A: There are many known and unknown factors when it comes to when and how Googlebot does its crawling. For instance, Google will crawl the https version of a page before the http version. As the content owner, you hold the sole rights to distribution, so if you have a copyright claim you may, depending on your jurisdiction, be able to take legal action.

You can contact the webmaster directly and request that the content be taken down, though this may not always prove fruitful. Alternatively, you could use a service like Who Is Hosting This to identify the hosting provider and contact them directly. More often than not, the web host will act more promptly and may even take the website completely offline until the duplicate content is removed.
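
If you prefer to script the lookup rather than use a web service, here is a rough sketch using only Python's standard library: it resolves the offending domain to an IP and sends a plain WHOIS query for that IP to ARIN, whose reply usually names the network operator and an abuse contact. The domain below is a placeholder, and IPs outside North America may answer with a referral to another regional registry instead.

    # Resolve a domain to its IP and query ARIN's WHOIS server for the network owner.
    # The domain is a placeholder; non-North-American IPs may require following a
    # referral to another regional registry (RIPE, APNIC, etc.).
    import socket

    DOMAIN = "scraper-example.com"  # hypothetical offending domain

    ip = socket.gethostbyname(DOMAIN)
    with socket.create_connection(("whois.arin.net", 43), timeout=10) as conn:
        conn.sendall((ip + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)

    print(f"WHOIS for {ip}:")
    print(b"".join(chunks).decode("utf-8", errors="replace"))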

Another option would be going the Google DMCA (Digital Millennium Copyright Act) complaint route. Taken directly from google.com:


This page will help you get to the right place to report content that
you would like removed from Google's services under applicable laws.


But ultimately, yes, this does happen in the wild; you're certainly not the only person experiencing it.

The good news is that you have a range of tools at your disposal to help deter this sort of behaviour in future:


Google Alerts: You could, for instance, paste in a portion of your article, choose which types of websites should be searched, and provide your email address so that Google can notify you of the results. (You can set up as many alerts as you'd like and adjust the settings to be notified daily, weekly, or on an "as it happens" basis.)
Copyscape: This is a paid option, but the site will search the web and let you know which sites are reposting your content.



Q: I realize you can use Fetch as Google in Search Console and then
submit to index, and usually my content gets indexed almost instantly when I do
this -- BUT I don't always like doing this, preferring Google to
discover content naturally. Am I wrong to want natural content
discovery, or should I just submit each post manually to the index?


This is more of a subjective question, as there really is no "wrong" or "right" here. At the end of the day, submitting the content directly to Google will get it indexed in a more timely fashion and, as a result, reduce the chances of ending up in this situation again.

Another option to consider is posting via Google+, which typically gets indexed almost instantly.

Ultimately, SEO is not a destination but an ongoing task that takes constant follow-up and continuous attention. This is simply one of the obstacles you'll experience along the way. The fact that you've noticed the behaviour already puts you miles ahead of the majority of your competition, so well done!
