Tag: WebCrawlers
Sorted by: Newest

: Web crawling and ethics/legality Is it illegal or unethical if I compare prices on my website and don't provide a link to that website but instead go there myself and deliver it to the customer?

: Facebook crawler with no user agent spamming our site in possible DoS attack Crawlers registered to Facebook (ipv6 ending in :face:b00c::1) were slamming our site, seeing 10s of thousands of hits

: Is there a way to get Google Search Console crawl stats for longer than 90 days? Google Search Console allows you to see how many pages Google is crawling per day on your site, but it only

: What is `/&wd=test` URL that is being requested from my site, probably by bots I'm seeing error logs on a website because something tried to access: example.com/&wd=test the HTTP_REFERER

: How to force Google and other bots to pick actual images and not thumbnails? For example, if there is an online shopping website with thousands of small thumbnails of products and when you

: Pagination and crawl depths On a website with ~40 blogs, we have recently switched on pagination, meaning blogs are on pages 1-8. With Google being less likely to crawl over 3 clicks deep,

: Do soft 404 errors on wiki sites caused by pages not yet created cause SEO problems? I host a couple of wiki based sites, so there is a lot of content at various stages of generation, and

: 404 or 302 Redirect - what to use for a url which may be used in the future but not available at the moment My site lists blogs like this example.com/?status=blog&id=number I only have

: Will a dynamic robots.txt file that disallows crawling based on the time of day hurt SEO? We have a serious traffic issue on our site and we want to eliminate crawlers as part of the problem.

: Strange usage pattern: UK user accessing 1 page X times every 14hrs – who/what can it be? For about two weeks I have seen a strange pattern in my web statistics: approximately every 14 hours a

: Website home page URL not showing in Google Search If I search like site:www.paka.tv my main homepage URL is not showing, but category pages of my site are showing in the results. How to show my website

: Why am I getting bot hits from compute-1.amazonaws.com? I have a WordPress website with AWS in use, namely the CloudFront service, to serve CSS, images and JS from the cloud. Lately, I noticed

: Webmaster tools 'URL errors' always shows errors The issue I am facing is that my webmaster tools always show an error message. The error URLs are not related to my website. The site URL mentioned

: Will creating an app that users prefer to the website reduce search engine rankings due to lower usage of the site? If all users use only the iPhone or Android app instead of using the website,

: How to find out the referrer of Googlebot's crawling URL? Googlebot crawls 100s of 404 URLs from my website. I want to know from where it gets those links? Is there anything like HTTP Referrer?

: Robots.txt isn't preventing my site from being crawled I'm having a problem with robots.txt. I put the robots.txt file in the website main directory (and also in /var/www/html - to make it work

: Keep order status private from search engines? I searched hard before posting this question. I apologize if it is a duplicate or if this is not the correct forum. We have a homegrown shopping

: Home page has a YouTube video on it, unsure how to fix Google's description A website I'm working on has a YouTube video embedded on its homepage. For some reason, Google is ignoring my meta

: Can we have Google crawl but ignore our paginated category pages and prefer our individual post pages in the index? We have a site with a massive content back-end (50.000+) with, let's say,

: How frequently should I expect Google AdsBot to visit my site? I bought paid advertising for my website through an internet advertising company. I noticed that, among all the visits coming from

: Htaccess empty referer deny "google bot" I put this rule into htaccess file to deny empty referers which is returning 403 SetEnvIfNoCase Referer "^$" bad_user Deny from env=bad_user I

: Confused with Google URL indexing When I enter my site in Google like this site:mywebsite.com I see the image below. I have 169 total URLs in my sitemap.xml file. It says 102 results

: SEO for hidden anchor tags I have a very simple question: do Google or other search engines recursively crawl hidden hyperlink tags? I googled but could not find any solution. Any experience or any

: How to stop indexing/crawling for Shop Checkout Summaries? I have a small shop checkout that uses cookies for my cart and after the payment is done it generates a unique order-id creates an

: Unverified and verified Bing crawlers from same subnet I have some traffic from Bing crawlers I'm trying to verify. I'm using the method Bing suggests, namely reverse and then forward DNS lookup,
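
The reverse-then-forward DNS check mentioned in this question can be sketched roughly as follows. This is a minimal illustration, not Bing's own tooling: the function names are hypothetical, the `.search.msn.com` suffix is the hostname pattern assumed for verified Bingbot hosts, and the default forward lookup only handles IPv4. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Assumption: verified Bingbot hosts reverse-resolve to names under this suffix.
BING_SUFFIX = ".search.msn.com"


def reverse_lookup(ip: str) -> str:
    """Return the primary hostname that the IP address reverse-resolves to."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """Return the IPv4 addresses that the hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    suffix: str = BING_SUFFIX,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the hostname suffix, then confirm the
    hostname forward-resolves back to the same IP."""
    try:
        host = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.lower().endswith(suffix):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

Calling `is_verified_crawler(ip)` with the defaults performs live DNS queries. Two addresses in the same subnet can give different results if only one of them has a matching reverse record, which is one way a "verified" and an "unverified" crawler could appear side by side.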

: What's the SEO effect of copying the same content to another blog? I am writing on my own blog, now I would like to copy the same content to another blog without deleting the old one. Could

: How to prevent CDN content URLs being indexed by Google Well, robots.txt prevents crawling and the meta robots tag in HTML (or the X-Robots-Tag HTTP header) prevents indexing (and other functionalities

: Does Google or Bing follow dynamic HATEOAS links that are not in an 'a' tag? I have an Angular 1.x SPA that uses the HATEOAS standard to manage navigation. This is further complicated by

: What are these many requests referred from Facebook with a changing s= parameter? I just recognised a peak in our access logs and I'm curious how to explain this or if my suggestion how to

: How to prevent Googlebot from doing API requests? I have a currency converter site with around 32k pages, one page for every currency pair, and every page makes 2 API requests. I started to see a huge number

: SEO impact of ajax loaded content from external site I am in need of some expert opinions before I put a lot of time into a business decision. I have a website where I write product reviews

: How do you add a rule just for a specific bot to robots.txt? I have a small website, for which the current robots.txt looks like so: User-agent: * Disallow: Sitemap: https://www.myawesomesite.com/sitemap.xml

: How to block everything from being indexed except sitemap.xml I want to block everything and index sitemap.xml file alone. So I do it as shown below: User-agent: * Disallow: / Allow: /sitemap.xml

: WordPress website not indexed in Google, no error in Google webmaster tools I just noticed my SEO score is 0 and that my WordPress website is not indexed by the Google search engine, no automatic

: When does a search engine learn about a particular user inside a website and display the profile link and/or a login link along with search results? How does the Google search engine display user

: Is the User-Agent "gce-spider" a well known scammer, a bad bot? My website has been scammed using some "scamming-web-site steals my content through a proxy and serves the stolen content from

: Prevent Googlebot from crawling "access denied" errors (403) of a private forum that are reported in Google Search Console? I'm running a website based on vBulletin. Recently I moved one of my

: Moving a site from one subdomain to another subdomain, what to do with pages which aren't "mappable"? In the process of moving a site from one subdomain to another subdomain. For a lot of

: How can I stop Google from indexing "pretty links" external redirects from my WordPress site? When I search site:[example].com in Google for my blog, the majority of the pages that are being

: Can a user or a crawler see the source of a page that has been redirected via a 301? Is it possible for a user or a web crawler to see the contents/source code of a web document that is

: Preventing/blocking crawling of a specific user control of a page Currently, Google accesses/crawls a user control from the page given below- http://articles.mercola.com/sites/articles/archive/2017/07/20/do-fidget-spinners-help-anxiety

: Repeated hits on my site from different IP addresses trying to access .aspx files using all my bandwidth I checked my raw access files after being notified that my site has been limited over

: In a robots.txt file is a Noindex: command recognised? I have come across a website and in the robots.txt file it has the following information User-agent: * Noindex: /search Disallow: /search Sitemap:

: What % can be regarded as "normal" "not viewed traffic" reported in AWStats? Recently, statistics for my site show about 50% "not viewed traffic". This seems like a lot, as many robots are blocked. What

: How can I avoid site search page duplicate title tag error in pagination of site search? Google Search Console shows me my site search page duplicate Title Tag,

: Googlebot submitting thousands of requests to our map locater and using up the API quota We have a store locater page on our customer's site. The end user enters their postcode and a search

: ASP.NET SEO and web crawlers So I am going to build a website in ASP.NET and have a few questions. I am planning on using technologies like React and EF. My worry is SEO though, I am wondering

: Google is adding "Archive" to the title of tag and category pages and not using the meta description I am wondering why Google is indexing my tags and categories as archives. Is that OK to

: Should I prevent search engines from indexing empty user profile pages? If so, how much content is enough for indexing? I'm developing a social website for book readers, with public user profile

: Would an online queueing system for a website have an effect on google's ability to crawl the site? If I was to implement a visitor queuing system for my website, where if visitor count is

: How long before Google indexes a new (to me) domain that previously had spam and virus problems? I registered a new ".com" domain, but when I added it to the Google Search Console I saw that

: Is it bad for SEO to have a URL with nothing on it? I have a WordPress site and for a very complicated reason I have to set some post type's template to an empty file. Nothing. Which means

: How to crawl a website that requires cookies for an audit? Situation: My client's website requires cookies to access it. Users should choose (language and country) to access the website. The problem is:

: Can we find all backlinks to a webpage? In Page and Brin's paper on PageRank, they say that while you are guaranteed to be able to find all links that point away from a page, the reverse

: If .htaccess is used to block my bot from accessing a particular directory, will I know this? I'm working on a research project and I have a question. Say I would like to crawl all pages

: Why would a bot submit a sign up form with fake info? I have a sign up form where people can enter their name and address, then click on a submit button. However I am getting bots entering

: How do we know the source from which Bing search indexed my webpage? - A blog or website where my website link is placed One of my web pages with sensitive information was indexed

: Where can I redirect exploit scanning bots? I get a lot of those exploit scanning bots like the ones looking for a WordPress login (which I don't have) or guessing other exploitable URLs.

: Facebot sometimes fails to parse open graph meta data, causing share failure We maintain a WordPress-based website where Yoast plugin takes care of Open Graph meta tags generation. Recently we

: Why wouldn't a website appear in Google search results after a robots.txt update to clean up a hacked site? I have a WordPress website which was hacked a couple of days ago. I have tried to add

: Re-Indexing Home Page - HTML/DOM Change (Site on WordPress) Currently, our company homepage is using images instead of divs to display our main products/solutions. As the SEO, I wanted to remove

: Are user agent names case-sensitive in robots.txt? I'm blocking various bots in robots.txt and I was wondering if their names are case-sensitive. For example: User-agent: grapeshot Disallow: /

: Could a custom crawler find unlisted web pages? Example: A website has no sitemap.xml, no robots.txt, no index of those pages. Pages are not blocked, bots and humans have access, but they

: Rule out third party scraping, but allow Google crawling How to make scraping of own content through wget, httrack etc. impossible, but allow crawling through googlebot? This should be done without

: A top directory in a URL path returns a 404 error, will its subpages still be crawled? I found a site in which one of the directories in the URL path returns 404 error, but the subpages

: Google unable to Crawl and Fetch my Website I created my website www.tribologyconference.com on April 09, 2017 and on the same day I submitted this url (with www and without) to google and

: How do spambots find and submit to an email opt-in at the end of a funnel? I have an opt-in page that's being nuked right now with spambots. A simple webpage that I use to allow subscribers into

: How should I protect "secret" links I send in Emails from being indexed by search engines? I have noticed that bing/msnbot tries to index pages that were only ever linked to in one single

: Should article preview pages be crawled and indexed by search engines? I have a page called "all articles" that loads previews of articles using AJAX. Because of that, the content won't be

: Does adding a CDN stop Google from crawling? I want to add Cloudflare's free CDN to my site but I am worried about whether Googlebot will be able to crawl my site later on or not.

: How to block only the Yandex bot Can you show me how robots.txt will look when I block only the Yandex bot, that is, allow Googlebot and block the Yandex bot?
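
One way to sanity-check a robots.txt of this kind is to parse it with Python's standard `urllib.robotparser`. The sketch below is illustrative only: the robots.txt content is a hypothetical answer shape using the `Yandex` user-agent token, and `example.com` is a placeholder. Compliant crawlers honour these rules voluntarily; the file does not enforce anything by itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow everything for Yandex, allow everyone else.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(user_agent: str, url: str) -> bool:
    """Parse ROBOTS_TXT and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Yandex", "Googlebot"):
        print(agent, can_fetch(agent, "https://example.com/page"))
```

An empty `Disallow:` line means nothing is disallowed for that group, so every crawler other than Yandex stays unrestricted.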

: How does robots.txt work with sites in subfolders? I have a single web host with a number of other parked domains/sites in sub-directories, like this: example.com is the primary site and root

: Ajax Content from Blocked Resource I have a site built in AngularJS. Most of the dynamically-loaded content comes from a WordPress back-end that is separate from the AngularJS site. In fact,

: SEO - Pages are blocked because Google failed to get the resource since it is blocked by robots.txt I would like all the pages in my Angular site to be indexed by Google. I used ngMeta in my

: How do you properly SEO-tag an app that is only a catalogue of images? I am making a little helper app that mostly links a lot of graphs and images. There is also some text explaining a

: Is it good to submit our site to website submission sites? I was wondering why we don't submit our site to website submission sites and increase our site's rank. Is it good to do this? How

: Set bot crawl delay 10 seconds EXCEPT Googlebot? Is it possible to set the crawl delay for all bots to 10 seconds except Googlebot and Bing/Yahoo which can proceed at any pace? I like being indexed

: Checking a site for meta refresh redirects How do you check or crawl a site (a couple of URLs) to find existing meta refresh redirects? Screaming Frog doesn't handle them - it indicates a page

: How to check whether infinite page scroll data is crawled and indexed by Google or not? I have a music platform website where 20 songs are visible at a time, then after

: How can search engines find my RSS feed? I have a small blog. Now, I have created an rss.xml. I put that in the root of the site, on the server. Should I do anything to make search engines

: Google Search Console: 404 errors on existing pages There is a small website, a few years old, with very few pages (~5), which were indexed and ranked by Google. A few days ago 4 of those pages

: Should I create a separate page for every ad on my website for search engine indexing purposes? I am going to create a simple ad marketing website, so I should use databases as my storing

: Which is more SEO friendly for a blogging website: dynamic pages or static pages? I am new to web development and I am developing a blogging website. While working on its architecture from

: "Is robots.txt blocking important pages?" in Search Console regarding static folder This is the contents of my robots.txt file: User-agent: * Disallow: /static/* Disallow: /templates/* Disallow: /translations/*

: Does a link rel canonical tag pointing back to the page itself cause an infinite loop that wastes crawl budget? I assume that search engine robots crawl the whole page and then the canonical

: Robots.txt for a website hosted in a subdirectory I have 2 websites hosted on shared hosting. 1st website example.com hosted in the root directory as /public_html/ 2nd website example2.com

: How is noFollow enforced on sites like Quora and Facebook? I'm curious to know how search engines like Google enforce their noFollow policy on social sites. It seems like it would be largely

: Block Yandex crawler Our site has been behaving very strangely for the last few days, lots of time outs etc. Finally think I found the cause, the Yandex bot is crawling around 10,000 pages

: Crawler says, page not found, but browser says otherwise I'm stumped. Google has been reporting increasing numbers of 404 errors for my website. But my site is static. Ok. Maybe they

: I need the subdomain in cPanel to not be followed. Help! I have little knowledge of robots.txt but I know how meta follow and nofollow tags work. The problem is I have 2 totally different

: Spider-Trap on a GitHub Site I have a GitHub site and I hate web-crawlers that disobey or ignore robots.txt. How would I set up a Spider-Trap on a GitHub site that the robots.txt disallows

: How does a webmaster perform a proper redirect from one domain to another? There are a few questions that are inherently present inside the main question. Those questions are: What is the

: Is it possible to block search engine indexing using DNS alone? As per the title, is it possible to block all search engine indexing using DNS? Most guides point towards robots.txt or meta

: My personal website was cloned in its entirety, is this a security concern? I was googling my (real) name yesterday (which is the name of my site) on a lark and discovered a website that

: Googlebot not respecting HTTP basic auth I have basic auth set up and it has always worked. Suddenly Google started crawling my pages. The auth is still there (I have checked it using different

: Do I need an umbrella services page to match the /services/ part of our URLs? From a usability perspective, this is not necessary. The mega menu dropdown on all pages makes it extremely clear

: SEO issues with Elm? Do search engines, particularly Google, render JavaScript created from transpiled Elm code when crawling? Can they follow links, even internal ones that modify the existing page?

: How does Google treat underscores in site map URLs? Google is currently reporting that my URLs are invalid within my sitemap. Here's an example of a document that was considered erroneous by

: Stopping Google from crawling my static domain I use a cookieless sub domain static.example.com to serve all images, js, and css files. This static sub domain has as its root directory the same

: Google Analytics traffic surge from China, not real visitors, Baidu? UPDATE2: Seems this all comes from Baidu crawlers, here's some output created by GoAccess analyzing our logs. We've restricted