By James Ater on Apr 30, 2014
How objective are the Internet’s Rankings?
Today’s question is simple: can we trust Internet ratings and rankings? Or, to be even more specific: to what extent can we trust Internet statistics?
To answer these questions, we first need to delve into philosophy and the modern picture of the world. We’ve gotten used to positivist, scientific knowledge, to the idea that everything is measurable and countable. Most people expect the same from the Internet, a system created by scientists using solid mathematics. According to common perception, it’s supposed to be a system where everything, every single byte, is counted.
The roots of the problem lie in our psychology and our belief in the intellectual power and rationality of scientists, mathematicians, and programmers.
Modern cognitive science and the common perception of the Internet
Like most other beliefs, this one can be challenged by an everyday example: when you buy a 1-terabyte hard drive, do you know how many real bytes you will have available for storing your files, movies, and music? Never.
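One well-known part of that gap is plain arithmetic: drive makers count a terabyte as 10^12 bytes, while many operating systems report capacity in binary units (2^30 or 2^40 bytes). The sketch below works through that conversion only. It assumes the drive holds exactly 10^12 bytes, which real drives only approximate, and it ignores file-system overhead, which varies from setup to setup. The function name is made up for illustration.

```python
# Decimal units are what the box says; binary units are what many systems report.
ADVERTISED_BYTES = 10**12  # assumption: a "1 TB" drive holds exactly 10^12 bytes
GIB = 2**30                # one gibibyte
TIB = 2**40                # one tebibyte


def binary_capacity(advertised_bytes: int) -> dict:
    """Express an advertised byte count in binary units (illustrative helper)."""
    return {
        "GiB": advertised_bytes / GIB,
        "TiB": advertised_bytes / TIB,
    }


capacity = binary_capacity(ADVERTISED_BYTES)
# Share of a "binary terabyte" that the advertised terabyte does not cover.
shortfall_percent = (1 - ADVERTISED_BYTES / TIB) * 100

print(f"{capacity['GiB']:.2f} GiB, {capacity['TiB']:.4f} TiB")
print(f"shortfall versus 2^40 bytes: {shortfall_percent:.2f}%")
```

Under these assumptions the "1 terabyte" drive shows up as roughly 931 GiB, about 9% less than a buyer thinking in binary units would expect, and that is before any space is taken by formatting.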
A hard drive is a simple example, but what about when it comes to the entire Internet?
With every year, today’s “digital world” looks less like the positivist, everything-is-counted world and more like the world of modern quantum physics, with lots of unknown variables: crazy, moving galaxies of unknown size and placement in a Universe we know very little about.
Backlink Checking Trustworthiness
When checking the backlinks for sites, we noticed that despite how old some of them are, many of our competitors don’t see them. We tried checking the sites’ links with several other popular backlink-checking tools and found different numbers and different lists of backlinks each time.
This small case posed a serious question: how can we rely on Internet ranking data, or, in 20th-century terms, “how objective” is this data? To find the answer we need to get into the nature of data and the methodologies used to collect information. However, that only seems simple at first glance. Who said that since the Internet is “rational and digital”, everything should be easy to count?
The sphere of data-collection methodology is a mess. None of the companies on the market that use proprietary and expensive code are eager to disclose how they gather and analyze data. This is true of the majority of statistics. Even though the market leaders often show rough results, the market doesn’t complain and hungrily checks every new project in the field, hoping for it to be ideal and to “tell the Truth and nothing but the Truth”.
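A disagreement like this can be put into numbers by comparing the lists directly. The sketch below is only an illustration: the tool names and backlink URLs are invented placeholders, not real results, and the Jaccard index is just one reasonable way to measure how much two lists overlap.

```python
from itertools import combinations

# Hypothetical backlink reports for the same site from three different tools.
reports = {
    "tool_a": {"example.org/post-1", "example.net/links", "example.com/blog"},
    "tool_b": {"example.org/post-1", "example.com/blog", "example.edu/resources"},
    "tool_c": {"example.org/post-1"},
}


def jaccard(first: set, second: set) -> float:
    """Shared items divided by all distinct items (1.0 means identical lists)."""
    union = first | second
    return len(first & second) / len(union) if union else 1.0


# Overlap for every pair of tools.
pairwise = {
    (a, b): jaccard(reports[a], reports[b])
    for a, b in combinations(sorted(reports), 2)
}

seen_by_all = set.intersection(*reports.values())  # links every tool agrees on
seen_by_any = set.union(*reports.values())         # links at least one tool found

for pair, score in pairwise.items():
    print(pair, round(score, 2))
print(len(seen_by_all), "of", len(seen_by_any), "links are confirmed by every tool")
```

With these made-up lists only one link out of four is confirmed by every tool, which is the kind of picture we kept getting with real ones.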
How reliable are MoonRankings then?
At Moonsearch we do our best to make our ranking as precise as possible. We do not pretend that our “Top by Platform” ranking (or any other ranking, for that matter) is entirely “objective”.
The data we work with is gathered through complex processes. We query lots of databases, which often have limits and query restrictions. Some allow only 5 queries per day. Some whois servers let you grab data for only 300 domains per week. So the combined data presented in our tables can be of differing “freshness”. Every web-exploring crawler or piece of software has to face this problem. Nobody has 100% accurate data. Moonsearch’s data is as close to that percentage as today’s technology allows.
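To see why such limits turn into stale data, it helps to do the arithmetic. The sketch below uses the two quotas mentioned above (5 queries per day, 300 domains per week); the number of tracked domains is a hypothetical figure chosen for illustration, and the calculation assumes one record per query and a single source with no parallel access.

```python
import math


def full_refresh_periods(record_count: int, quota_per_period: int) -> int:
    """Number of quota periods needed to re-query every record once."""
    if quota_per_period <= 0:
        raise ValueError("quota_per_period must be positive")
    return math.ceil(record_count / quota_per_period)


tracked_domains = 30_000  # hypothetical size of the data set

weeks_at_300_per_week = full_refresh_periods(tracked_domains, 300)
days_at_5_per_day = full_refresh_periods(tracked_domains, 5)

print(f"300 per week: {weeks_at_300_per_week} weeks for one full pass")
print(f"5 per day:    {days_at_5_per_day} days for one full pass")
```

Even for this modest hypothetical data set, one full pass through the whois source takes 100 weeks, so at any moment some rows in a table are days old and others are many months old.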
One more good example of the objectivity problem is Moonsearch’s Top by Platform ranking.
Moonsearch analyses the data from headers, but there are no obligatory rules on the Web as to what webmasters should put into these headers. This is why the ranking’s information sometimes gives us deep insights into what competitors do, while in other cases it looks like junk.
The question transforms into another one: to what extent can we trust our Internet statistics?
The answer depends on our goals. If we need some data for showing off or to support our image as an expert in some field, anything will do; but if something serious relies on data fished out of the Internet, it should be validated against multiple independent sources.
How big is the Internet?
Another great example of the problem is an age-old question: how many sites are there on the Internet?
In 1996 we could answer more or less precisely by counting IP addresses. Today’s picture is different.
Let’s analyse Google’s first SERP for this question. Different sources give different results: according to the resources that rank highest in Google’s results, the total number of sites is either 346 million, or 759 million, or 1.19 billion.
Authoritative Netcraft (which isn’t in the Google Top-10 Search Results) gives the number 958,919,789; another “authoritative” blog citing Netcraft says the company sees “1-2 billions” pages.
There are lots of other sources giving different numbers for this one, simple question.
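The size of the disagreement is easy to quantify. The sketch below takes the four figures quoted above and computes how far apart they are; the source labels in the dictionary are placeholders, since only Netcraft is named here.

```python
from statistics import median

# Estimates of the total number of sites, as quoted in this article.
estimates = {
    "search result A": 346_000_000,
    "search result B": 759_000_000,
    "search result C": 1_190_000_000,
    "Netcraft": 958_919_789,
}

values = sorted(estimates.values())
lowest, highest = values[0], values[-1]

spread_ratio = highest / lowest   # how many times larger the top estimate is
middle = median(values)           # midpoint of the two central estimates

print(f"lowest:  {lowest:,}")
print(f"highest: {highest:,}")
print(f"highest is {spread_ratio:.2f} times the lowest")
print(f"median:  {middle:,.1f}")
```

The highest estimate is more than three times the lowest, so anyone quoting a single number without naming the source and the counting method is quoting a guess.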
It seems that the actual number of sites is impossible to get: some sites have no links at all, there are instances where multiple sites hide behind a single IP, and there are even whole networks with constantly changing IP addresses. Not to mention the large numbers of hidden military and governmental sites…
Another aspect makes things even more complicated: how do we define a “site”? Should we count the internal sites of corporations, which gain thousands of hosts each day and use cutting-edge data-management technologies, with millions and millions of pages accessible from nowhere except their companies’ inner networks? If yes, then how?
In fact, though much depends on what we define as a “site” and how we count, there’s a second level of “data distortion” in the mass media: fast-paced journalists and bloggers are not eager to go into depth, so we end up with contradictory data about even the most basic facts about the Web.
Thus, a seemingly simple statistic turns into a mystery. It seems like humanity doesn’t know the size of what it has created and only has a vague idea of the Web’s main characteristics.
About James Ater
An experienced developer, James is the one who stood behind Moonsearch.com at the dawn of the project. He knows the algorithms Moonsearch uses from the inside, and it's a pleasure for him to talk about the processes the project is based on. Inspired by technology, he is glad to give you a full understanding of what the practical purpose of Moonsearch is and how it can be used to evaluate your competitors, define your brand's position in the market and identify gaps for development. Why create technology if not to benefit from it? James knows how to do both and is happy to share his vision.