Aug 16, 2019 - 11:20 PM
You are right - tag-based analytics solutions like Google Analytics can underreport visitors for various reasons. The most reliable data comes from server logs, but those tend to be very large & hard to process. One large site I worked for required a full day of processing for each day of data. Server log processing has its own issues, too, such as counting spiders & bots as real human traffic.
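To make the bot problem concrete, here is a minimal sketch of what log-based counting involves, assuming the common "combined" access-log format. The regex and the tiny bot signature list are illustrative only; real pipelines use far larger lists and behavioral checks, since many bots fake their user-agent string.

```python
import re
from collections import Counter

# Matches the standard "combined" access-log format (an assumption;
# adjust to your server's actual log format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Crude signature list for illustration -- production filtering uses
# much larger, maintained lists of known crawlers.
BOT_SIGNATURES = ("bot", "spider", "crawler", "slurp", "curl", "wget")

def is_bot(user_agent):
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

def count_human_hits(lines):
    """Count hits per IP, skipping obvious spiders & bots."""
    hits = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m and not is_bot(m.group("agent")):
            hits[m.group("ip")] += 1
    return hits

sample = [
    '1.2.3.4 - - [16/Aug/2019:23:20:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [16/Aug/2019:23:20:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
print(count_human_hits(sample))  # only 1.2.3.4 counts; the Googlebot line is filtered
```

Even this toy version hints at the cost: every line of every log file has to be parsed and classified, which is why a busy site can take so long to process.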
Tag-based analytics solutions usually rely on a single-pixel tracking image, but if that pixel sits too far down in the page code, it may not render in time to count the visit before the user leaves.
Then there are newer issues like caching by browsers and ISPs. I've seen sites that implemented AMP pages lose data and accidentally count search engine visits as direct traffic. And now we also see an increase in people blocking tracking or ads with browser plugins or alternative browsers.
Despite these problems, I would say Alexa & Similar Web data are nowhere close to reality. I wrote about this way back in 2007, basically showing how a small site appeared to be bigger than a site that had 11,842% more traffic.
They both rely on panels with major inherent biases (for Alexa, skewing toward old-school technical people; for Similar Web, I'm guessing toward people who are susceptible to having their browsers hacked). Your own data, including Google Analytics, will always be more accurate than panel-based data.
On the other hand, an audience of busy parents looking for kid-friendly recipes might rarely use blockers. How your site is coded, how fast it is, & where the GA tag appears on the page can also affect your numbers. From what I've read, the error is likely between 5% and 30%.