Can we trust site explorers?

By Emi Kawashima on Jun 13, 2014


There are many site-exploring tools and services on the Internet that provide all kinds of competitive data on any website or domain. We shouldn’t always accept their data as reliable, and here is why.

Business attention concentrates on the sites in Alexa’s top 100K by traffic. These sites are reviewed more carefully, so the data about them is supposed to be precise and reliable.

However, even this isn’t true in every case. In a test we encourage everyone to repeat, our team analysed a site ranked in the middle of the Alexa Top-100K, with a PageRank (PR) of 5 and a three-year domain history.

Three popular backlink analysis services reported the site’s backlink count as:

  • 1,000
  • 8,000
  • 19,000

As for traffic, the difference between the results was less drastic: one well-known service estimated the site’s traffic at around 20k visitors per day, while another put it at three times that number. To our surprise, the site’s admin reported that the site actually had 2,000 unique visitors per day, judging from the tcptrack utility on his server console. How can that be possible?
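
To put these gaps into numbers, here is a minimal Python sketch of the arithmetic. The figures are the ones quoted above; the service labels are placeholders (the services are not named here), and "three times that number" is taken as 60,000.

```python
# Figures quoted in the article. The service labels are placeholders,
# because the article does not name the services.
backlink_reports = {"service_a": 1_000, "service_b": 8_000, "service_c": 19_000}
traffic_estimates = {"service_x": 20_000, "service_y": 60_000}  # 60k = 3 x 20k
server_measured_visitors = 2_000  # daily uniques reported by the site's admin


def spread_ratio(values):
    """Return the largest reported value divided by the smallest one."""
    values = list(values)
    return max(values) / min(values)


def overestimate_factor(estimate, measured):
    """Return how many times an estimate exceeds the measured value."""
    return estimate / measured


backlink_spread = spread_ratio(backlink_reports.values())
traffic_factors = {
    name: overestimate_factor(value, server_measured_visitors)
    for name, value in traffic_estimates.items()
}

print(f"Backlink counts differ by a factor of {backlink_spread:.0f}")
for name, factor in traffic_factors.items():
    print(f"{name} overestimates measured traffic {factor:.0f}x")
```

The traffic estimates differ from each other only by a factor of three, yet both are at least ten times higher than the figure measured on the server.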

The site we tested is a well-known one of average popularity. When you need to investigate a site with lower traffic and popularity, data reliability becomes much worse.

Our small experiment leads us to another question: if the market’s first-rate players provide such different data on a website’s basic stats, how can we trust site explorers?

To answer this question, let’s look at how site explorers work. Most of them are pretty similar engines: a crawler with a parser, plus more or less advanced modules that add features and extend functionality by grabbing data from different sources. Some of these sources provide data for free; others are commercial.

Everyone except the market’s affluent players tries to avoid proprietary databases and to aggregate as much valuable information from free sources as possible. However, most of these free services (WHOIS servers, for example) don’t like having their servers routinely overloaded with huge volumes of automated requests. Some ban a site explorer’s crawlers for a week after just the first three to five hundred requests.

For a site explorer, this means that some of the data it shows as current might easily be several weeks old. A competing explorer that relies on different sources may hold data of a different age, so if you are sensitive to data quality, you need to know your favourite site explorers really well.
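
The crawler-and-parser core described above can be illustrated with a short Python sketch. It is not how any particular service works, only an assumed minimal version using the standard library: the class name `LinkCollector`, the helper `external_links` and the sample URLs are all made up for illustration, and a real crawler would also fetch pages, respect rate limits and store timestamps.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, value))


def external_links(html, page_url):
    """Return links that point to a different host than the page itself."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    own_host = urlparse(page_url).netloc
    return [
        link for link in collector.links
        if urlparse(link).netloc not in ("", own_host)
    ]


# Hypothetical page content, used only to show the parser at work.
sample_html = (
    '<p><a href="/about">About</a> '
    '<a href="http://other.example/page">Other</a></p>'
)
found = external_links(sample_html, "http://site.example/index.html")
print(found)
```

Every external link found this way is a backlink for the site it points to, so two services that crawl different sets of pages will inevitably build different backlink databases.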

When it comes to even more complex tasks (e.g. inbound link analysis), the situation becomes even more chaotic. Using two of the most popular backlink checking services, we received completely different lists of backlinks for the same website.
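
If you want to repeat this comparison yourself, a rough Python sketch like the one below can measure how much two exported backlink lists overlap. The function names and sample URLs are hypothetical, and the normalization is deliberately crude (lower-casing and dropping a trailing slash), so treat the result as an approximation.

```python
def normalize(url):
    """Lower-case a URL and drop a trailing slash so trivial variants match.

    This is a simplification: URL paths can be case-sensitive.
    """
    return url.strip().lower().rstrip("/")


def compare_backlinks(list_a, list_b):
    """Compare two backlink lists and report their overlap."""
    set_a = {normalize(url) for url in list_a}
    set_b = {normalize(url) for url in list_b}
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        # Jaccard index: 1.0 means identical lists, 0.0 means no overlap.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical exports from two backlink checkers for the same site.
service_a_links = [
    "http://blog.example/post",
    "http://news.example/item/",
    "http://forum.example/t/1",
]
service_b_links = [
    "http://news.example/item",
    "http://dir.example/listing",
]

result = compare_backlinks(service_a_links, service_b_links)
print(f"Shared backlinks: {result['shared']}")
print(f"Overlap (Jaccard index): {result['jaccard']:.2f}")
```

A low overlap does not tell you which service is right, only that neither list should be treated as complete.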

Stats like average daily page views and average daily reach should generally be treated as rough estimates, which are often far from the real numbers. Unless a site explorer has access to data from a widely deployed tracker (Google Analytics, for example), there is only a slim chance that this type of information is actually reliable.

It is important to point out that inexperienced Internet users tend to see published statistics of this kind as objective and reliable. Novice journalists and bloggers tend to quote the services’ findings without double-checking them, and thus both online and offline media often publish unreliable data, which only serves to misinform their audience.

The Internet is full of fake data and misinformation. It seems as though every other start-up with a fancy website shows fake achievements to impress its visitors. No matter what the numbers say, there is a very high probability that the boasting comes out of thin air. Practices like this are as common as buying fake likes on Facebook, creating fake “characters” to simulate forum activity, and leaving fake product reviews on Amazon. Marketing departments are interested in pushing the truth into the margins, where nobody would bother searching for it.

So, no matter what information from the Internet you are working with, the rule of thumb is to double- or even triple-check it against other sources, and never to give in to the temptation of treating any data as 100% true.

About Emi Kawashima

A student of computer science, Emi has a huge passion and enthusiasm for all things digital. She became a blogger to share how to use technology to enhance your life by simplifying and minimizing routine tasks, freeing time for what really matters. She believes curiosity is the cornerstone of development, and that there is an endless number of topics worth covering, each with a lot to be said about it. She tries her best to deliver tech facts in a personalized manner in order to awaken your curiosity and open up the digital world as she sees it - dynamic, majestic, futuristic.
