Looking towards a cookieless future for publishers: Part 1


Living in a post-cookie world may seem daunting for publishers, but the advice from tech providers is unanimous: by looking into alternatives in good time and working collaboratively where appropriate, publishers can help brands get even closer to their audiences. By being prepared, remaining transparent and fostering a culture of trust and loyalty, everyone stands to benefit.

To help prepare for the changes, in December 2020, a dozen UK publishers consulted with the IAB’s Tech Lab to explore the “expected impact on ad targeting and real-time bidding within the open marketplace”. It goes without saying that privacy will also be key. As Matt White, Vice President EMEA at Quantcast, sums up: “Whatever the publisher landscape that emerges after third-party cookies, the successful solutions will be the ones that put consumer privacy first”.

Meanwhile, Hogg says that Safari and Firefox “only ever really had 20%-40% of the browser market share, so there’s always been enough inventory, whether it’s by using apps, or the Safari and Android identifiers, or whether it’s just through the main browser, which has been Chrome for a long time”. By using new technology such as Lotame, Chrome users can now be added into the pool.

Larger publishers may be able to use a walled garden approach, using their own ecosystem of registered users and contextual targeting. However, they are unlikely to compete effectively against [the largest players], with their sheer volume of registrations and email addresses, because they control both the authenticated accounts and the ad ecosystem. Moving data from an open industry standard (third-party cookies) encourages heavier reliance on proprietary systems”.

So, what solutions are available?

Fingerprinting – more harm than good?

Browser (or device) fingerprinting is a method used to track users across websites without using cookies or IDs. Instead, website code tests for the presence of certain attributes to identify a device. The ‘fingerprint’ is the combination of these probabilistic attributes (rather than a deterministic cookie attached to a browser), which can then be used to track the user.

Fingerprinting techniques fall into four categories: browser parameters (browser name and version, screen resolution, IP address, location); demographic attributes (age, gender, occupation) obtained from supposedly anonymized consumer data traded for marketing purposes; browsing history (although Apple, Google, and Mozilla have since taken steps to mitigate this); and canvas fingerprinting (instructing the web browser to “draw” a hidden image that can be turned into a unique ID).
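To make the idea concrete, here is a minimal sketch of how a handful of browser parameters can be folded into a single identifier. The attribute names and values are hypothetical, and hashing a sorted JSON serialization with SHA-256 is just one simple way to combine them. Real fingerprinting scripts collect far more signals, including canvas output.

```python
import hashlib
import json


def fingerprint(attributes: dict[str, str]) -> str:
    """Combine browser attributes into a stable hex ID (illustrative only)."""
    # Sort keys so the same attributes always serialize identically,
    # regardless of the order in which the script collected them.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values observed on three page loads.
visit_a = {"browser": "Chrome 96", "screen": "1920x1080",
           "timezone": "Europe/London", "language": "en-GB"}
visit_b = {"language": "en-GB", "timezone": "Europe/London",
           "screen": "1920x1080", "browser": "Chrome 96"}
visit_c = dict(visit_a, browser="Chrome 97")  # a browser update

print(fingerprint(visit_a) == fingerprint(visit_b))  # True: same attributes, same ID
print(fingerprint(visit_a) == fingerprint(visit_c))  # False: one changed attribute breaks the match
```

The last comparison hints at why fingerprint matches degrade: any attribute that changes, such as a browser update or a new IP address, produces a different ID.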

Despite its use on a quarter of the top 10,000 websites, fingerprint matching is thought to lose accuracy quickly over time. Compared with deterministic matching, it is around 98% accurate within 10 minutes of the attribution, 80% between 10 minutes and three hours, and 50% between three and 24 hours. There is also a distinct correlation between the decline in fingerprinting and the rise of persistent identity solutions, suggesting that fingerprinting is neither a viable nor an ethical method for publishers.
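The decay figures above can be encoded as a simple lookup, for instance to discount the value of an older fingerprint match. This sketch uses only the three bands quoted in this section. The treatment of the exact band boundaries is an assumption, and the function deliberately returns nothing beyond 24 hours, where no figure is given.

```python
from typing import Optional

# Accuracy bands relative to deterministic matching, as quoted above:
# (upper bound of the band in minutes, expected accuracy).
ACCURACY_BANDS = [
    (10, 0.98),       # within 10 minutes
    (3 * 60, 0.80),   # 10 minutes to three hours
    (24 * 60, 0.50),  # three to 24 hours
]


def expected_match_accuracy(minutes_elapsed: float) -> Optional[float]:
    """Return the quoted accuracy for a fingerprint match of a given age.

    Returns None beyond 24 hours, where no figure is available.
    """
    if minutes_elapsed < 0:
        raise ValueError("elapsed time cannot be negative")
    for upper_bound, accuracy in ACCURACY_BANDS:
        if minutes_elapsed <= upper_bound:
            return accuracy
    return None


print(expected_match_accuracy(5))        # 0.98
print(expected_match_accuracy(90))       # 0.8
print(expected_match_accuracy(12 * 60))  # 0.5
print(expected_match_accuracy(48 * 60))  # None
```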

Identity solutions – the ideal scenario?

There are two types of identity solution. ‘Authenticated web’ solutions attach data to a persistent, unique ID, such as an email address and login, to enable deterministic matching. Persistent IDs currently available include the mobile device identifiers GAID (Google Advertising ID) and Apple’s IDFA (Identifier For Advertisers), The Trade Desk’s Unified ID 2.0 (open source, rather than proprietary), ID5 (an independent solution), and the Advertising ID Consortium’s ID (an independent group of ad tech companies using LiveRamp’s IdentityLink).
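Deterministic matching on an authenticated identifier usually means joining two datasets on the same normalized, hashed email address. The sketch below assumes simple normalization (trim whitespace, lowercase) followed by SHA-256. Real schemes such as Unified ID 2.0 define their own, stricter normalization rules, so treat this as an illustration rather than any provider’s specification. The record layouts are hypothetical.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email address (simplified) and return its SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical publisher login records and advertiser CRM records.
publisher_logins = {"reader-1": "Alice@Example.com ", "reader-2": "bob@example.org"}
advertiser_crm = {"cust-9": "alice@example.com", "cust-7": "carol@example.net"}


def deterministic_matches(pub: dict[str, str], adv: dict[str, str]) -> list[tuple[str, str]]:
    """Pair publisher and advertiser IDs whose hashed emails are identical."""
    adv_by_hash = {hash_email(email): adv_id for adv_id, email in adv.items()}
    pairs = []
    for pub_id, email in pub.items():
        adv_id = adv_by_hash.get(hash_email(email))
        if adv_id is not None:
            pairs.append((pub_id, adv_id))
    return pairs


print(deterministic_matches(publisher_logins, advertiser_crm))  # [('reader-1', 'cust-9')]
```

Because both sides hash the same normalized value, a match is exact. A single typo in either address simply produces no match, which is the flip side of deterministic accuracy.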

In contrast, ‘unauthenticated’ or ‘open web’ solutions use signals and algorithms (increasingly, machine learning) to identify the same user across different devices and apps, a technique known as probabilistic matching. Device clusters (e.g. the devices a person uses from the same IP address and location, visiting the same websites) can produce fairly robust probabilistic data. Although less accurate than deterministic matching, probabilistic matching can scale. As Hogg estimates, “the open web will account for around 70% of all inventory, because, as a user, you don’t ever log into every website you use each day. For instance, you might wade through five different food sites before finding the perfect recipe to download”.
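A probabilistic approach instead scores how likely two devices are to belong to the same person. The sketch below is a deliberately simple heuristic built on the signals named above: a shared IP address, a shared coarse location, and overlap in sites visited. The weights and the linking threshold are hypothetical, hand-picked values. Production systems typically learn these from data with ML models.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceSignals:
    device_id: str
    ip_address: str
    location: str  # coarse location, e.g. a postcode district
    sites_visited: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of sites visited by either device that both devices visited."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights and threshold; a real system would learn these.
WEIGHTS = {"ip": 0.4, "location": 0.2, "sites": 0.4}
LINK_THRESHOLD = 0.6


def same_person_score(x: DeviceSignals, y: DeviceSignals) -> float:
    """Weighted evidence (0 to 1) that two devices belong to one person."""
    score = WEIGHTS["ip"] * (x.ip_address == y.ip_address)
    score += WEIGHTS["location"] * (x.location == y.location)
    score += WEIGHTS["sites"] * jaccard(x.sites_visited, y.sites_visited)
    return score


laptop = DeviceSignals("laptop", "203.0.113.7", "SW1",
                       {"recipes.example", "news.example", "cars.example"})
phone = DeviceSignals("phone", "203.0.113.7", "SW1",
                      {"recipes.example", "news.example"})
tablet = DeviceSignals("tablet", "198.51.100.2", "M1", {"games.example"})

for other in (phone, tablet):
    score = same_person_score(laptop, other)
    # Prints "phone 0.87 True", then "tablet 0.0 False".
    print(other.device_id, round(score, 2), score >= LINK_THRESHOLD)
```

The output is a likelihood rather than a certainty, which is exactly the accuracy-for-scale trade described above.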

Highlighting the importance of identity, Zara Erismann, MD Publisher Europe at LiveRamp, explains that “publishers and marketers who haven’t yet started their post-cookie journey need to focus on addressability. It is too important to be ignored and there is merit in acting with urgency. Addressability – through people-based identity – provides a direct link to audiences. Without it, a publisher’s ability to target, measure and provide detailed attribution for marketing efforts will be reduced significantly”.

Meanwhile, Quantcast is ramping up its deterministic offering by expanding its existing Consent Management Platform (CMP) to provide user permissions across multiple sites. Permisio is free for all existing publishers and focuses on the user. Early adopters can create an account and set their privacy preferences for each site, or a general profile across all sites. The benefits are twofold: publishers gain consent to collect first-party data about the user to inform targeted ads, and users see fewer permission pop-ups and have a central portal to control their data. As White explains, “consented first-party data from sources like Permisio combined with ML capabilities will provide publishers with access to great audience insights long into the future”. The challenge, however, will be gaining enough exposure, and therefore uptake, to make a meaningful impact on the industry.
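A general profile combined with per-site preferences maps naturally onto a small data structure: a default set of choices that individual sites can override. The sketch below is hypothetical. It is not Permisio’s API, and the purpose names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentProfile:
    """A user's general consent preferences plus per-site overrides (hypothetical)."""
    general: dict[str, bool] = field(default_factory=dict)
    per_site: dict[str, dict[str, bool]] = field(default_factory=dict)

    def set_general(self, purpose: str, allowed: bool) -> None:
        self.general[purpose] = allowed

    def set_for_site(self, site: str, purpose: str, allowed: bool) -> None:
        self.per_site.setdefault(site, {})[purpose] = allowed

    def allows(self, site: str, purpose: str) -> bool:
        """A site-specific choice wins; otherwise fall back to the general profile.

        Anything the user has not explicitly allowed is treated as refused.
        """
        site_prefs = self.per_site.get(site, {})
        if purpose in site_prefs:
            return site_prefs[purpose]
        return self.general.get(purpose, False)


profile = ConsentProfile()
profile.set_general("personalised_ads", True)
profile.set_for_site("recipes.example", "personalised_ads", False)

print(profile.allows("news.example", "personalised_ads"))     # True (general profile)
print(profile.allows("recipes.example", "personalised_ads"))  # False (site override)
print(profile.allows("news.example", "analytics"))            # False (never granted)
```

Defaulting to refusal reflects an opt-in model: a publisher only gets to use data for purposes the user has actually granted, either globally or for that site.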

Data enrichment solutions, such as Lotame’s Panorama ID, could be the most effective way to attain scalable quantities of privacy-compliant, pseudonymized data in the years to come, with each individual ID “carrying an average of 200+ behavioral attributes”. Because a user’s data will be spread across different IDs, Hogg advises publishers to “look at using DMPs which overlay multiple data sources to enrich inventory”. Panorama ID is a “probabilistic solution that helps publishers collect identifiers and put them into cohorts”, so three, four or five of a person’s devices may be connected together, with first-party cookies also coming into the mix.
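One way to picture several devices “connecting together” is as an identity graph. Each observation that two identifiers belong to the same user adds a link, and each connected group becomes an enriched profile once attributes from multiple data sources are overlaid. The sketch below uses a union-find structure with hypothetical identifiers and attributes. It illustrates the general idea only, not how Panorama ID or any DMP works internally.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers (device IDs, first-party cookie IDs, ...)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, node: str) -> str:
        self.parent.setdefault(node, node)
        # Path halving keeps lookups fast as the graph grows.
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def link(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


graph = IdentityGraph()
# Hypothetical evidence that identifiers belong to the same person.
graph.link("phone-1", "cookie-A")
graph.link("laptop-1", "cookie-A")
graph.link("tablet-1", "laptop-1")
graph.find("tv-9")  # seen, but not linked to anything

# Attributes from several hypothetical data sources, keyed by identifier.
sources = [
    {"phone-1": {"interest:cars"}},
    {"laptop-1": {"interest:cooking"}, "tv-9": {"interest:sport"}},
    {"cookie-A": {"region:north-west"}},
]

# Overlay every source onto the profile of the identifier's group.
profiles: dict[str, set[str]] = defaultdict(set)
for source in sources:
    for identifier, attributes in source.items():
        profiles[graph.find(identifier)] |= attributes

for root, attributes in profiles.items():
    members = sorted(n for n in graph.parent if graph.find(n) == root)
    print(members, sorted(attributes))
```

Here the phone, laptop, tablet and first-party cookie collapse into one profile carrying three attributes, while the unlinked TV keeps its own.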

Meanwhile, InfoSum offers an interesting approach to identity resolution with its privacy-by-design technology that “keeps datasets isolated, encrypted and anonymised”. Rather than the information “being moved into one pot, a decentralised network builds virtual bridges between datasets”.
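The “virtual bridge” idea can be loosely illustrated by having each party pseudonymize its own identifiers locally and then comparing only the resulting tokens, so that only an aggregate overlap count is revealed. The sketch below assumes a shared secret key agreed between the two parties and uses HMAC-SHA-256 from Python’s standard library. It is a simplified stand-in, not InfoSum’s actual technology. A production system would add stronger protections, for example performing the comparison inside an isolated environment so neither side ever sees the other’s tokens.

```python
import hashlib
import hmac


def pseudonymize(identifiers: set[str], key: bytes) -> set[str]:
    """Replace each identifier with a keyed hash; runs inside each party's own environment."""
    return {
        hmac.new(key, value.strip().lower().encode("utf-8"), hashlib.sha256).hexdigest()
        for value in identifiers
    }


def overlap_count(tokens_a: set[str], tokens_b: set[str]) -> int:
    """Return only the size of the intersection, never the matching records."""
    return len(tokens_a & tokens_b)


SHARED_KEY = b"hypothetical-key-agreed-out-of-band"  # assumption for illustration

publisher_ids = {"alice@example.com", "bob@example.org", "dan@example.co.uk"}
advertiser_ids = {"Alice@example.com", "carol@example.net", "dan@example.co.uk"}

overlap = overlap_count(
    pseudonymize(publisher_ids, SHARED_KEY),
    pseudonymize(advertiser_ids, SHARED_KEY),
)
print(overlap)  # 2
```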

The number of identity solutions on the market certainly suggests this is where the future of publishing is headed.

Should publishers focus on deterministic or probabilistic data?

The two properties of third-party cookies which made them so useful – accurate tracking and ubiquity – have been separated, hence the need for new identity solutions to fill the gap. But should these focus on deterministic or probabilistic data?

Deterministic data – which should also be authenticated data – is more likely to be compliant with privacy regulations and is more accurate. Probabilistic data is still a legal grey area. As a recent Winterberry Group report suggests, almost any user data can fall under GDPR due to “the flexible definition of personal data, where almost any attribute can be considered personal data if it may be combined with personal details at a later stage”.

On the other hand, probabilistic data can have a far wider reach, as it doesn’t rely on users being logged in to every website they visit. As Hogg mentions, by adding in probabilistic data, “publishers will pick up incremental inventory and eyeballs just from people browsing the internet”.

It’s also important to bear in mind that fresh probabilistic data can be better than stale deterministic data. Knowing with 70% accuracy that a user is car shopping today is more useful to an advertiser than knowing with 100% accuracy that they were fridge shopping two weeks ago. As White agrees, “third-party data can be useful in some instances if vetted carefully; the dangers lie in stale, poorly labelled data”.
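The car-versus-fridge comparison can be made explicit by discounting each signal’s accuracy by its age. The sketch below assumes exponential decay with a hypothetical three-day half-life. The right decay rate would depend on the product category and would need to be measured, so treat the numbers as illustrative.

```python
HALF_LIFE_DAYS = 3.0  # hypothetical: a signal loses half its value every three days


def signal_value(accuracy: float, age_days: float,
                 half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Accuracy discounted by age: the value halves every half_life_days."""
    return accuracy * 0.5 ** (age_days / half_life_days)


fresh_probabilistic = signal_value(accuracy=0.70, age_days=0)   # car shopping today
stale_deterministic = signal_value(accuracy=1.00, age_days=14)  # fridge shopping two weeks ago

print(round(fresh_probabilistic, 3))  # 0.7
print(round(stale_deterministic, 3))  # 0.039
```

Under these assumptions the fresh, less accurate signal is worth many times the stale, certain one. A shorter half-life widens the gap; a much longer one narrows it.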

Ultimately, it’s a trade-off between a safe bet (relevancy and compliance) for a small audience versus a weighted throw of the dice on a large audience. Some solutions are attempting to circumvent the issues with probabilistic tracking by combining many different sources of first-party data, which brings us to data collaboration.

Collaborative solutions (and clean rooms) come to the rescue

It seems the ability to ‘go it alone’ is increasingly unrealistic, with the exception of a small number of scalable walled gardens. As Hogg highlights, “even your first-party data has a shelf life, so potentially, you can’t use the data you collected the day before. Also, when a user logs onto their devices in the morning, they could show as having 30 different user IDs, in part due to browser erosion, where browsers are not only blocking third-party cookies, but forcing the deletion of first-party cookies on a regular cadence. Meanwhile, all third-party data was first-party somewhere along the line”. Therefore, it’s not necessarily true that all first-party data is good and all third-party data is bad. What matters most, it seems, is how the data has been handled.

Munchbach agrees: “I don’t think all third-party data needs to be considered bad, but there are too many unanswered questions out there right now. Is the data consented? What is the lineage? How old is it? I think a better approach is to start with first-party data and then judiciously assess whether third-party data is necessary, based on these questions. With a pure-play CDP, there’s very little reason to use third-party datasets”.
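Munchbach’s three questions lend themselves to a simple pre-flight check before a third-party dataset is used. The sketch below assumes hypothetical metadata fields and a 30-day maximum age. Both are illustrative choices, not an industry standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

MAX_AGE = timedelta(days=30)  # hypothetical freshness limit


@dataclass
class DatasetMetadata:
    name: str
    consented: bool          # Is the data consented?
    lineage: Optional[str]   # What is the lineage? None if unknown
    collected_on: date       # How old is it?


def vet(dataset: DatasetMetadata, today: date) -> list[str]:
    """Return the reasons to reject a dataset; an empty list means it passes."""
    problems = []
    if not dataset.consented:
        problems.append("no record of user consent")
    if not dataset.lineage:
        problems.append("unknown lineage")
    if today - dataset.collected_on > MAX_AGE:
        problems.append("older than the freshness limit")
    return problems


today = date(2021, 6, 1)
candidates = [
    DatasetMetadata("auto-intenders", True,
                    "hypothetical publisher network, consent via CMP", date(2021, 5, 25)),
    DatasetMetadata("appliance-buyers", True, None, date(2021, 3, 1)),
]
for candidate in candidates:
    print(candidate.name, vet(candidate, today) or "OK")
```

The first dataset passes; the second is flagged for unknown lineage and for being roughly three months old.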

One issue for Fosci is that “merging information creates new information. Has the user consented to merging that data with other data so that new things can be learned about them? The GDPR is very clear; it says you need to ask consent for the specific use of the data [at every stage of the journey]. Relying on first-party data means 60-70% revenue losses for publishers. And that’s taken from Safari, which has already banned cookies. This is where data collaboration comes in”.

So, which different collaborative solutions can publishers leverage?

1. Data cooperatives, which originated in the 1990s with offline data, allow multiple brands to pool their first-party data. Examples include The Abacus Alliance (Epsilon Abacus), The Alliant DataHub (Alliant), Datalogix (Oracle), Apogee (Data Axle) and DonorBase (Lake Group Media).

2. Data marketplaces and data exchanges allow third-party data to be exchanged for targeting or analysis. Publishers can search data sets and select the best audiences for activation (e.g. Adobe’s Audience Marketplace, Eyeota, LiveRamp Data Marketplace, Lotame, Oracle DMP, The Trade Desk, Tru Optik and Snowflake).

3. Data clean rooms (or technical data environments), originally designed as a way of matching offline data to a brand’s CRM, now offer publishers a compliant, accurate way of comparing their brand advertisers’ data against walled gardens’ internal identity graphs, to prevent over-serving ads to the same audiences. The data never leaves the clean room, so publishers don’t need to give up their data; instead, they share it in a way that protects each party’s proprietary assets. Typically, the larger walled gardens don’t share with each other, so advertisers still run the risk of duplication and therefore over-investment in these platforms. However, Unilever is developing a clean room in partnership with Google, Facebook, and Twitter, where advertisers should be able to see duplicated reach across these platforms. Currently, 42.9% of US brand marketers and 30.3% of UK brand marketers are leveraging clean rooms such as those offered by Lotame, Acxiom, or InfoSum.
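The over-investment problem comes down to the gap between summed per-platform reach and deduplicated reach. The sketch below computes both from hypothetical sets of pseudonymized user tokens. As a clean room typically would, it releases only aggregate counts and suppresses any below a minimum threshold. The threshold of 3 and the platform names are assumptions for illustration.

```python
from itertools import combinations

MIN_REPORTABLE = 3  # hypothetical: aggregates below this are suppressed

# Hypothetical pseudonymized users reached on each platform.
reached = {
    "platform_a": {"u1", "u2", "u3", "u4", "u5"},
    "platform_b": {"u3", "u4", "u5", "u6"},
    "platform_c": {"u5", "u7"},
}


def report(count: int) -> str:
    """Release an aggregate only if it is large enough to avoid singling anyone out."""
    return str(count) if count >= MIN_REPORTABLE else f"<{MIN_REPORTABLE}"


summed_reach = sum(len(users) for users in reached.values())
unique_reach = len(set().union(*reached.values()))

print("summed reach:", report(summed_reach))  # 11
print("unique reach:", report(unique_reach))  # 7
for (name_a, users_a), (name_b, users_b) in combinations(reached.items(), 2):
    print(f"overlap {name_a}/{name_b}:", report(len(users_a & users_b)))
```

In this example, summed reach overstates unique reach by four users. That is the duplication an advertiser would be paying for twice if each platform were measured in isolation.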

With data-sharing capabilities predicted to increase between 2021 and 2024, through a “near continuous process of testing, similar to the early days of programmatic between 2009 and 2012”, collaborative solutions such as clean rooms are definitely worth considering, especially for larger publishers.

Fosci does, however, offer a word of caution: “Often, matching data doesn’t work or isn’t very accurate. The industry average is about 40% [match rate]; there’s a big chunk of IDs that simply don’t sync well. Therefore, there are a lot of company acquisitions motivated by access to first-party data”. So, with collaborative solutions at risk of being hampered by privacy legislation, is there a way of using the data without breaching compliance? Arguably, edge computing provides a perfect solution.

Click here for Part 2 of this story.

This article is an extract from our free-to-download report, How Publishers Can Swap Out The Cookie Jar.

Hazel Broadley,
Author, How Publishers Can Swap Out The Cookie Jar