Barnes591

How to tell how old a page is?

@Barnes591

Posted in: #GoogleIndex #GoogleSearch #Tools

I thought Google was more or less accurate at determining who posted a text first and who copied. However, when I use the "search tool: customized interval" the results are quite odd. I've found pages dating back to 2002 for a website I've had for only a couple of years.

So Google isn't accurate at finding out who copied and who wrote the original. What is?



If stackexchange.com was created in 2009 then how is this possible? hermeneutics.se is older than Stack Overflow!


2 Comments


 

@Miguel251

If you want to see how old a domain is, search Google for the Wayback Machine. This is the site you're looking for: archive.org/web/.
If you want to detect plagiarism, this link will help you: copyscape.com/signup.php?pro=0&o=f
Also, search on Google for "plagiarism checker".
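
As a rough sketch of the Wayback Machine idea, the snippet below builds a query for the oldest archived capture of a page and reads the capture date out of the reply. The endpoint and the JSON layout (a header row followed by data rows) are assumptions based on the public Wayback CDX interface, not something stated in this thread, and the function names are made up for illustration. Note that the earliest capture only shows the page existed by that date, not when it was written.

```python
import json
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine CDX interface.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def earliest_capture_url(page_url: str) -> str:
    """Build a query asking for a single capture row (the oldest comes first)."""
    query = urlencode({"url": page_url, "output": "json", "limit": 1})
    return f"{CDX_ENDPOINT}?{query}"


def parse_earliest_capture(cdx_json: str) -> Optional[datetime]:
    """Return the capture time from a CDX JSON reply, or None if no capture."""
    rows = json.loads(cdx_json)
    if len(rows) < 2:          # only a header row, or nothing at all
        return None
    record = dict(zip(rows[0], rows[1]))
    return datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")


if __name__ == "__main__":
    # Hand-written sample reply, not a real capture.
    sample = json.dumps([
        ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
        ["com,example)/", "20020120142510", "http://example.com/", "text/html", "200", "ABC", "1234"],
    ])
    print(earliest_capture_url("example.com"))
    print(parse_earliest_capture(sample))
```

To use it for real, fetch the URL returned by earliest_capture_url and pass the response body to parse_earliest_capture.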

Hope I helped.



 

@Alves908

I researched this question using Google as the example, since that is the one I have: how Google gets creation and modified dates, and which date formats it recognizes. Please understand that this information is not found on just a few pages. I had to ferret it out of a great many sources, some of which do not seem to apply directly, and piece it together. In some cases the information is derived from several sources and is not always quotable.

Google looks for page dates in this order: URL, title tag, body (content), meta tags, HTTP response header, at least as far as the Google Search Appliance is concerned. Other documents discuss the same list without documenting an order, which seems to confirm it. If you think about it, this mirrors the order in which a search engine would first discover your page (the link) and then read it from top to bottom (title, body), with the exception of the meta tag (a small detail) and the HTTP response header. Here is the list as far as the appliance is concerned: developers.google.com/search-appliance/documentation/68/admin_crawl/Preparing#docdaterule
Note: The inception date is the date that the page was first requested by Google. In the absence of a creation date, the inception date is used.

1] Any search engine can request a resource via an HTTP GET request. The web server returns the resource along with the last modified date in the response header.

2] Any search engine can request only the header information of a resource via an HTTP HEAD request. The web server returns the modified date in the response header without sending the resource itself.

3] Any search engine can ask whether a resource has been modified since a certain date by sending an HTTP GET with the If-Modified-Since header set to that date. If the resource has been modified since then, the web server responds with 200 OK and returns the resource. If it has not, the web server responds with 304 Not Modified and does not return the resource.

Google makes many requests using method #3 to save on bandwidth. You will see these in your web server log files.
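
To make the three request styles above concrete, here is a minimal sketch using only Python's standard library. It builds the requests and interprets the two responses described in #3. The function names and the example URL are made up for illustration, and nothing here is meant to show how Google's crawler is actually written.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional
from urllib.request import Request


def build_head(url: str) -> Request:
    """Method #2: ask for the headers only."""
    return Request(url, method="HEAD")


def build_conditional_get(url: str, since: datetime) -> Request:
    """Method #3: GET that is answered in full only if changed after `since`."""
    # HTTP dates are always expressed in GMT.
    stamp = format_datetime(since.astimezone(timezone.utc), usegmt=True)
    return Request(url, headers={"If-Modified-Since": stamp}, method="GET")


def describe_response(status: int, last_modified: Optional[str]) -> str:
    """Interpret the status code and Last-Modified header of a reply."""
    if status == 304:
        return "not modified since the given date"
    if status == 200:
        if last_modified:
            when = parsedate_to_datetime(last_modified)
            return f"modified; server reports {when.isoformat()}"
        return "returned, but no Last-Modified header was sent"
    return f"unexpected status {status}"


if __name__ == "__main__":
    req = build_conditional_get(
        "https://example.com/page",
        datetime(2015, 1, 1, tzinfo=timezone.utc),
    )
    print(req.get_method(), req.header_items())
    print(describe_response(304, None))
    print(describe_response(200, "Wed, 21 Oct 2015 07:28:00 GMT"))
```

When sending such a request with urllib.request.urlopen, keep in mind that a 304 reply is raised as an HTTPError, so the status has to be read from the exception's code attribute.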

Note: It is possible that a content management system (CMS) or other software does not provide a correct date in the response header.

These date examples come from the Google Search Appliance documentation, but they also appear in other places that discuss general search. I took them from the appliance documentation simply because they could be copied as a list; elsewhere they were not presented as neatly.

4] Google looks for a date within the URL. It looks for the following formats: YYYYMMDDHH - YYYY - YYYYMM.

5] Google looks for a date within the title tag. It looks for the following formats: YYYYMMDDHH - YYYY - YYYYMM, though I suspect other formats can be recognized. See below.

6] Google looks for a date within the body tag (content). It looks for the following formats: YYYYMMDDHH - YYYYMMDD - YYYYMM - YYYY - DDMMYYYY - YYMMMDD - MMDDYYYY - YYMMDD - DDMMYY - MMDDYY, though I suspect other formats can be recognized. See below.

Note: It is known that Google looks specifically for a date just under the first H1 tag. This is because blogs often put dates in this location.

7] Google looks for a meta-tag like this one. <meta http-equiv="last-modified" content="YYYY-MM-DD@hh:mm:ss TMZ" />
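
The lookup in items 4 through 7 can be sketched roughly as follows. This is only an illustration of checking several places in a fixed order (URL, then title, then the last-modified meta tag). The regular expression, class name, and function name are my own assumptions, the body and HTTP header steps are left out, and year-only dates are not handled.

```python
import re
from datetime import date
from html.parser import HTMLParser
from typing import Optional, Tuple

# Year, month, and optional day, with or without "/" or "-" separators.
DATE_PATTERN = re.compile(
    r"(?<!\d)((?:19|20)\d{2})[/-]?(0[1-9]|1[0-2])"
    r"(?:[/-]?(0[1-9]|[12]\d|3[01]))?(?!\d)"
)


class DateHintParser(HTMLParser):
    """Collects the title text and the last-modified meta tag, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta_last_modified = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("http-equiv") or "").lower() == "last-modified":
            self.meta_last_modified = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_date_hint(url: str, html: str) -> Optional[Tuple[str, date]]:
    """Return (source, date) for the first place that contains a date."""
    parser = DateHintParser()
    parser.feed(html)
    for source, text in (("url", url), ("title", parser.title),
                         ("meta", parser.meta_last_modified)):
        match = DATE_PATTERN.search(text)
        if not match:
            continue
        year, month, day = match.groups()
        try:
            return source, date(int(year), int(month), int(day or 1))
        except ValueError:      # e.g. February 31: try the next source
            continue
    return None


if __name__ == "__main__":
    page = ('<html><head><title>My post</title>'
            '<meta http-equiv="last-modified" content="2014-03-27@10:00:00 UTC" />'
            '</head><body></body></html>')
    print(find_date_hint("https://example.com/2012/05/09/post", page))
    print(find_date_hint("https://example.com/post", page))
```

With a dated URL, the URL wins; without one, the sketch falls through to the meta tag, which matches the order described earlier.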

Google is also said to recognize the following date formats.

YYYY-M-D - YYYY.M.D - YYYY/M/D - M-D-YYYY - M.D.YYYY - M/D/YYYY - YY-MM-DD - YY.MM.DD - YY/MM/DD - WK, D MON, YR - WK, MON D, YR - D MON, YR - MON YYYY - MON D, YR - MON YY - YYYY-D-M - YYYY.D.M - YYYY/D/M - D-M-YYYY - D.M.YYYY - D/M/YYYY - DD-MM-YY - MM-DD-YY - DD/MM/YY - MM/DD/YY - YYYYMMDDHH - YYYYMMDD - YYYYMM - YYYY - DDMMYYYY - MMDDYYYY - YYMMDD - DDMMYY - MMDDYY
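
As a small illustration of working with a list like this, the sketch below tries a handful of the separated and named-month patterns against a string and reports every pattern that fits. The mapping from the notation above to Python's strptime codes is my own assumption, the compact all-digit forms are left out to keep it short, and the month and weekday names assume an English locale. It also shows that one string can fit more than one pattern.

```python
from datetime import datetime
from typing import List, Tuple

# A subset of the formats listed above, written as strptime patterns.
CANDIDATE_FORMATS = [
    "%Y-%m-%d", "%Y.%m.%d", "%Y/%m/%d",    # YYYY-M-D and variants
    "%m-%d-%Y", "%m.%d.%Y", "%m/%d/%Y",    # M-D-YYYY and variants
    "%d-%m-%Y", "%d.%m.%Y", "%d/%m/%Y",    # D-M-YYYY and variants
    "%y-%m-%d", "%d-%m-%y", "%m-%d-%y",    # two-digit years
    "%a, %d %b, %Y",                       # WK, D MON, YR
    "%a, %b %d, %Y",                       # WK, MON D, YR
    "%d %b, %Y",                           # D MON, YR
    "%b %d, %Y",                           # MON D, YR
    "%b %Y",                               # MON YYYY
]


def matching_formats(text: str) -> List[Tuple[str, datetime]]:
    """Return every (pattern, parsed date) pair that accepts the text."""
    hits = []
    for pattern in CANDIDATE_FORMATS:
        try:
            hits.append((pattern, datetime.strptime(text, pattern)))
        except ValueError:
            pass    # this pattern does not fit; try the next one
    return hits


if __name__ == "__main__":
    for sample in ("2014-03-27", "03/04/2015", "Mar 27, 2014", "not a date"):
        print(sample, "->", matching_formats(sample))
```

A string such as 03/04/2015 fits both the month-first and the day-first pattern, so whatever reads it has to pick one interpretation.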

The research I found did not answer the question of time.

In the case of the examples cited, the pages do not provide date clues except within a span tag, which may be ignored. It is possible that the Stack Exchange (SE) software or web server does not return creation and modified dates in any response header.

Why and how Google derived these dates is a good question that may never be resolved. I will keep looking however.


