Publishers’ “shadow traffic” problem: Why your traffic numbers are off by 20%

Getting the complete picture of your traffic has always been challenging. Dark traffic first popped up in 2012, when it became clear that traffic stripped of its referrer information was categorized as “direct.” Dark traffic has grown with the rise of HTTPS and the increasing prevalence of private messaging systems (Slack, WhatsApp, and the like).

While campaign tracking helps identify some dark traffic, there’s a new wrench in the gears of our analytics: shadow traffic.

What is shadow traffic?

Shadow traffic consists of visits to your site that your analytics provider does not capture.

Before you can understand shadow traffic, you need to understand how web, content, and product analytics tools work. Every major analytics tool requires you to add code to your website or app. After this code is added, your analytics provider starts reporting on traffic and events.

When someone visits your website or app, this code sends “events” to your analytics provider. Events are little packets of information that summarize a visitor’s activity on the site. You are probably most familiar with common web analytics events, such as pageview events, video start events, or email capture events. If events fail to arrive for any reason, such as being stopped by the browser or an adblocker, your analytics provider won’t record anything: not the user, not the session, not the pageview. These visits are still real traffic from real people, but you won’t have any visibility into them or their behavior. This is shadow traffic: real traffic that your analytics misses.
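To make the mechanism concrete, here is a minimal simulation, not any vendor's actual tag. The host name, the `Visit` class, and the sample visits are all hypothetical; the point is only that a blocked event leaves no trace at all on the analytics side.

```python
from dataclasses import dataclass

# Hypothetical analytics host, used only for illustration.
ANALYTICS_HOST = "events.analytics.example"


@dataclass
class Visit:
    visitor_id: str
    url: str
    # Hosts this visitor's browser or adblocker refuses to contact.
    blocked_hosts: frozenset = frozenset()


def send_pageview(visit, received):
    """Simulate the tag firing: the event is recorded only if it gets through."""
    if ANALYTICS_HOST in visit.blocked_hosts:
        return False  # the event never leaves the browser
    received.append(
        {"event": "pageview", "visitor": visit.visitor_id, "url": visit.url}
    )
    return True


def summarize(visits):
    """Compare real visits with the visits the analytics provider sees."""
    received = []
    for visit in visits:
        send_pageview(visit, received)
    real = len(visits)
    tracked = len(received)
    return {"real": real, "tracked": tracked, "shadow": real - tracked}


# Made-up sample: five real visits, two of them behind a blocker.
visits = [
    Visit("a", "/story-1"),
    Visit("b", "/story-1", frozenset({ANALYTICS_HOST})),
    Visit("c", "/story-2"),
    Visit("d", "/story-2"),
    Visit("e", "/story-3", frozenset({ANALYTICS_HOST})),
]
result = summarize(visits)
print(result)  # {'real': 5, 'tracked': 3, 'shadow': 2}
```

In this toy sample the provider reports three visits while five people actually read the site. The two missing visits are the shadow traffic.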

Figure: Shadow traffic happens when your analytics events fail to reach analytics servers.

What causes shadow traffic?

Shadow traffic is caused by adblockers, browser privacy features, and other tools that stop events from reaching your analytics provider.

The largest causes of shadow traffic on both web and mobile are adblockers and new privacy features built into browsers.

Later on in this article, we’ll discuss the motivation behind adblockers and browser privacy features, which stems from a valid data privacy concern from internet users.

Despite their name, adblockers often block first-party analytics providers, too. Firefox’s Enhanced Tracking Protection, Safari’s Intelligent Tracking Prevention (ITP), and Microsoft Edge’s Tracking Protection act as built-in blockers of some first-party analytics services. Even Google, the juggernaut of advertising, is building an adblocker into Chrome. Shadow traffic is a growing concern you can’t ignore, since it affects pretty much every major browser and platform.

Figure: The largest causes of shadow traffic are adblockers and new privacy features built into browsers. There are also less common tools which cause shadow traffic.

Some technically sophisticated users also rely on more specialized privacy tools, which are less common causes of shadow traffic. These include:

  • Network-level blocking — like Pi-hole
  • VPN-level blocking — like NordVPN
  • Device DNS blocking — like AdGuard
  • Device App-based blocking — like Wipr

Why do users adopt adblockers?

Although every user is different, there are three major motivations driving the adoption of adblockers:

1. Preventing third-party personal information leakage: Many advertising networks and vendors took advantage of their relationship with site operators to build “cross-site device graphs” of users. These device graphs link personal information from one site with behavioral information from another site. Users rightly viewed this commercial practice as an invasion of privacy and installed adblockers in response. Regulations like GDPR and CCPA were created in part to address this issue on a widespread scale. Installing an adblocker reduces the impact of this commercial practice in your browser.

2. Avoiding creepy behavioral retargeting: Many advertising networks allow their customers to “retarget” users based on a multi-site visit history. For example, they might show you ads for a product you viewed (but did not buy) on Amazon.com in the ad slots of a news/information site you visit days later. Users found these ads “creepy” and installed adblockers to break the technology that made them work.

3. Improving web page performance: This is probably the top pragmatic reason for the rise of adblockers, since blocking ads improves the end-user performance of most websites. Many sites have ten or twenty different ad networks installed, and sometimes ad networks deploy other ad networks through a “cascade” of adtech vendors. All of this dramatically slows down site performance and sometimes makes web browsing unbearable.

Why did browsers create built-in blocking technology?

The reasons listed above might seem like nuisances, but preventing personal information leaks, avoiding poor user experiences, and improving page performance are all core responsibilities of the mainstream browsers and the companies or non-profits behind them (that is, Apple, Google, Microsoft, and Mozilla, among others). Plus, several new browser alternatives, like Brave and Vivaldi, started to grow popular by differentiating on these very features via built-in blocking technology. Thus, this technology and its end-user benefits are a permanent part of the “browser wars” we have witnessed over the last several decades, as browser developers build on each other’s work in an effort to win market share.

Unlike third-party adblockers, the blocking technology inside browsers tries very hard not to break web technologies in any dramatic way. It takes a different implementation approach and tends to be more conservative in how it works. However, in the last couple of years, spurred on especially by Safari’s ITP and Mozilla’s ETP, several classes of technology have been blocked to increase user privacy and browser performance. Blocked technology includes social media trackers, cross-site tracking cookies, fingerprinters, and cryptominers. This has generally led to a better web browsing experience for the average user.

The rise of shadow traffic

In 2019, a global study by GlobalWebIndex, cited by IMPACT, concluded that 47% of today’s internet users use some type of blocker. In 2020, Parse.ly began our own research into what proportion of visitors go untracked. Participants in our early-access program found that at least 20% and up to 40% of traffic was shadow traffic, compared to traditional web analytics tracking.

Missing 20% to 40% of your visitor data leads to misguided decisions, revenue loss, and bad business outcomes. The first step to improving your data quality is addressing shadow traffic, while still fully respecting the valid privacy, user experience, and performance concerns of your audience.
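As a rough worked example of the arithmetic behind these percentages, the sketch below computes a shadow traffic share and scales a reported number back up. The function names and the 80,000 figure are illustrative assumptions, not Parse.ly data or a Parse.ly method.

```python
def shadow_share(real_visits, tracked_visits):
    """Fraction of real visits that analytics never recorded."""
    if real_visits <= 0:
        raise ValueError("real_visits must be positive")
    return (real_visits - tracked_visits) / real_visits


def estimate_real_visits(tracked_visits, share):
    """Scale a tracked count up, given an assumed shadow traffic share."""
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    return tracked_visits / (1 - share)


# Illustrative only: analytics reports 80,000 visits in some period.
tracked = 80_000
low = estimate_real_visits(tracked, 0.20)   # if 20% of traffic is shadow traffic
high = estimate_real_visits(tracked, 0.40)  # if 40% of traffic is shadow traffic
print(round(low), round(high))  # about 100,000 and about 133,333
```

Note that the share is measured against real traffic, not tracked traffic. If 20% of real visits are missing, the reported number has to grow by 25% to recover the real one.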

How to measure shadow traffic

Typical analytics tags deployed to your website or app won’t detect shadow traffic. There are workarounds to gain visibility into shadow traffic, but they are difficult to implement and have drawbacks, and all of them require heavy engineering or IT effort.

Option 1 – Consolidated edge server logs

When users access your site or app, they interact with your servers. These servers can log user interactions, and then you can visualize this data to gain some insight into your users. This approach falls apart because reconciling logs from different servers, a CDN, or even multiple sites is effectively impossible, even with server log management software. This tooling is made for developers and the information that matters to them. Content marketers and editors care about engagement and conversions, not load balancers and server health checks.
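For a sense of what this option involves even in the simplest case, here is a minimal sketch that counts page requests in access log lines. It assumes a single server writing the common "combined" log format; the health check path, the bot hints, and the sample lines are hypothetical, and a real setup would need much more filtering and deduplication.

```python
import re
from collections import Counter

# Matches the common/combined access log format (an assumption about your servers).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

NON_PAGE_SUFFIXES = (".css", ".js", ".png", ".jpg", ".ico", ".svg", ".woff2")
IGNORED_PATHS = {"/healthz"}            # hypothetical load balancer health check
BOT_HINTS = ("bot", "crawler", "spider")  # crude, illustrative bot filter


def count_pageviews(lines):
    """Count successful page requests per path, skipping assets, checks, and bots."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # unparseable line
        path = match["path"].split("?", 1)[0]  # drop the query string
        agent = (match["agent"] or "").lower()
        if match["method"] != "GET" or match["status"] != "200":
            continue
        if path in IGNORED_PATHS or path.endswith(NON_PAGE_SUFFIXES):
            continue
        if any(hint in agent for hint in BOT_HINTS):
            continue
        counts[path] += 1
    return counts


sample = [
    '203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /story-1 HTTP/1.1" 200 5120 "https://news.example/" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2020:13:55:40 +0000] "GET /story-1?utm_source=x HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2020:13:55:41 +0000] "GET /healthz HTTP/1.1" 200 2 "-" "lb-check"',
    '198.51.100.8 - - [10/Oct/2020:13:55:42 +0000] "GET /static/app.js HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '192.0.2.44 - - [10/Oct/2020:13:55:43 +0000] "GET /story-2 HTTP/1.1" 200 4100 "-" "ExampleBot/1.0"',
]
counts = count_pageviews(sample)
print(counts)  # Counter({'/story-1': 2})
```

Even this toy version has to guess at what counts as a page, a bot, or a health check, and it says nothing about engagement or conversions, which illustrates why raw logs are a poor fit for editorial questions.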

Option 2 – Server-side tracking

Blocking typically happens within the context of your user’s web browser, but ultimately, users still need to make requests against your servers in order to access your content. If you send first-party analytics events to your analytics provider from your own servers, then you can keep visibility into the user even when the browser blocks a request to an analytics service. This option is technically complex, and replicating the intricacies of a robust client-side analytics implementation is effectively impossible. You’ll likely miss out on data that a client-side tag would collect automatically.
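A minimal sketch of the idea, assuming a hypothetical HTTP collection endpoint that accepts JSON: your server builds the pageview event from the incoming request and forwards it itself. The URL, the field names, and the helper functions are all assumptions for illustration; a real provider documents its own endpoint and schema.

```python
import json
import urllib.request

# Hypothetical collection endpoint; not a real provider's URL.
COLLECT_URL = "https://collect.analytics.example/v1/events"


def build_pageview_event(path, referrer, user_agent):
    """Build a pageview event from fields your server already sees."""
    # Deliberately omits IP addresses and user identifiers in this sketch.
    return {
        "event": "pageview",
        "url": path,
        "referrer": referrer or "",
        "user_agent": user_agent or "",
    }


def build_request(event):
    """Wrap the event in a JSON POST request to the collection endpoint."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        COLLECT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event, timeout=2.0):
    """Send the event; failures are swallowed so analytics never breaks page delivery."""
    try:
        with urllib.request.urlopen(build_request(event), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


# Build (but do not send) an event for an illustrative request.
event = build_pageview_event("/story-1", "https://news.example/", "Mozilla/5.0")
request = build_request(event)
print(request.get_method(), request.full_url)
```

Everything a client-side tag measures inside the page, such as scroll depth, engaged time, or video starts, is invisible to this code, which is the gap the paragraph above describes.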

Option 3 – Leverage existing analytics services

The best option leverages your first-party analytics vendor’s managed service, but in a way that still lets you securely access privacy-safe aggregated analytics events. This should span your visible and currently-invisible (shadow) traffic, and maintain respect for a user’s privacy, experience, and performance expectations.

by Aakash Shah

Re-published with kind permission of Parse.ly, the insights company that empowers media owners to understand and improve digital audience engagement through data