CMO

Explainer: What you need to know about data clean rooms

CMO delves into the burgeoning world of data clean room solutions and why they're more than just a media and advertising play


Amazon, Google and Meta operate them, Disney Select uses one to power advertising and match a brand’s first-party data to its own niche audience segments, and Roku has built one supporting its streaming TV platform.

But just what exactly are data clean rooms? Why are they important? Do they only apply to advertising use cases? How could these environments more broadly shape the way marketers and organisations share data with external partners securely in the face of growing privacy concerns and crackdowns? And where can we expect to see use cases evolve and technology advance?

In CMO’s latest explainer series feature, we take a deep dive into the concept of data clean rooms to find out more.

The purpose of a data clean room

By definition, a data clean room (DCR) is a secure, neutral environment where multiple data sources can be shared without having to physically exchange the data sets or compromise personally identifiable information (PII), IP identifiers or any other form of user privacy protection.

As one of the major providers of data clean rooms, InfoSum, explains it, a DCR is about providing a complete, persistent view of the customer using other first-party data sources without requiring data to be moved into a centralised location, eliminating the risk of exposure and leakage.

“The safety and security of data, combined with the power and intelligence of multi-party computation, have put the data clean room at the top of the must-have list for any organisation that handles customer data,” claims InfoSum communications manager, Brett Pinto.  

LiveRamp COO Asia-Pacific, Melanie Hoptman, notes DCR solutions should provide advanced privacy controls, including encryption, so data can’t be used inappropriately. At the same time, data scientists gain the ability to leverage data to better plan, activate and measure across the ecosystem.

“A strategic, well-executed data collaboration strategy is a route to deeper customer intelligence, which can power a personalised, omni-channel experience for customers, as well as a real competitive advantage,” she says. “Clean rooms enable this data collaboration to happen in a safer, more privacy-conscious way.”

Why we need them

The reason more and more DCR solutions are cropping up can be attributed to the paradigm shift we’re living through. Privacy has become an integral part of what people expect from digital and connected experiences, says AppsFlyer president and managing director APAC, Ronen Mense. Fuelling this macro shift are Apple’s iOS 14.5, Google’s plan to ban third-party cookies, and regulatory crackdowns locally and globally to better protect personal privacy and digitally disclosed information.

“Companies will need to look for alternative methods to collect, share and analyse data without compromising user privacy,” Mense says. “This is where privacy-preserving solutions like the data clean room come into play… It is like the Switzerland of data where all sensitive data is kept secure and private.

“With a DCR, developers can securely join and produce insights based on their first-party data together with their marketing conversion data. They have full control over what data runs in the DCR’s memory, what data they want to use during partner collaboration, and granularity level of the data itself. This is accessible while complying with all privacy regulations to achieve both objectives: Adhering to regulations and taking informed action for their marketing campaigns.”

Among the highest profile clean rooms in the media and advertising world are those run by the walled gardens - Google, Amazon and Meta. Google Ads Data Hub, for example, allows advertisers to customise campaign analysis to understand media effectiveness using aggregated Google data sets. Amazon Marketing Cloud is a data clean room solution built on Amazon Web Services allowing advertisers to match and analyse their data sets with similarly aggregated data sets delivered by Amazon Advertising events.

Outside the walled gardens are media players such as Disney and Roku, which have built clean rooms for matching advertiser data with their own audience data sets in order to target and report on campaigns running across their networks. Brands are directly building data clean rooms too. Unilever, for example, has a data clean room for measuring marketing effectiveness and using anonymised data with research firms such as Nielsen and Kantar.  

The concept of a data clean room is not just for advertising purposes, however. After all, sharing or matching up privacy-protected or sensitive data has been around for some time. Just think about credit reporting and agencies such as Experian and FICO in Australia. Matching data solutions have also been available from the likes of Oracle, Epsilon and arguably the DMPs.  

Fuelling a safe data exchange at scale  

Explaining the data clean room as a principle rather than a standalone software system, Versent partner data and advisory lead, David Hanus, sees this as the next-gen iteration of work the digital consultancy group has been doing with different customers over the last few years to share data more easily between two or more parties.

“A good example is a large loyalty business here in Australia: We helped migrate the company’s data platform from an on-premise environment to the cloud. It was a very big use case around migrating and revamping architecture to essentially support what is defined as a data clean room,” he says. “This ‘room’ was designed as an area where they could mix in aggregated or signals data and features developed around customers, together with partners’ data, to run some of their own analytics without necessarily giving away all the core data assets.

“So a data clean room is essentially a zoned area where data is anonymised, washed and standardised then mixed with a partner’s data to drive customer analytics in order to be able to draw insights. And you’re doing that with some fairly strict, rigorous protocols.”

Adding to the imperative of sharing data externally in Australia is the Consumer Data Right (CDR) framework rolling out across the financial and utilities markets to make consumer data porting easier across providers.

“All of this is starting to force fresh thinking, right across industries, around how you responsibly serve, share and use data that aligns to the social contract you have out with your customers,” Hanus says.

What’s stymied such efforts historically is a lack of products to do this seamlessly, quickly and at scale. Commercial requirements around data use and integration add further complexity.

“In every single company and commercial agreement, particularly if you have multiple vendors trying to work together, there are subtle nuances between how company A or company B needs to interact with aggregator Z,” Hanus says.

“Those things change frequently or as new clients come on board. More often than not when you're building these systems and designing foundation components, you have to consider what the role-based access framework needs to look like, what bits of data you can see, versus which bits of data are masked from certain parties or what permissions you have to mix data together. It's really hard to predict what that needs to look like. Which means you need to build frameworks that are either extremely flexible, or you can instance out reasonably quickly to support.”
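To make the idea of per-agreement visibility and masking rules concrete, here is a minimal sketch in Python. It is an illustration only, not how Versent or any vendor implements access control: the party names, field names, the `POLICIES` table and the `view_for` helper are all hypothetical.

```python
from typing import Any

# Hypothetical per-agreement policies: which fields each party may see in the
# clear, and which it may only see in masked form. Any field not listed for a
# party is withheld entirely.
POLICIES = {
    "company_a": {"visible": {"segment", "postcode"}, "masked": {"email"}},
    "company_b": {"visible": {"segment"}, "masked": {"email", "postcode"}},
}

MASK = "***"


def view_for(party: str, record: dict[str, Any]) -> dict[str, Any]:
    """Return the version of a record that a given party is allowed to see."""
    policy = POLICIES[party]
    out: dict[str, Any] = {}
    for field, value in record.items():
        if field in policy["visible"]:
            out[field] = value   # shown as-is
        elif field in policy["masked"]:
            out[field] = MASK    # present, but value hidden
        # anything else is dropped for this party
    return out


record = {"email": "jane@example.com", "segment": "loyal", "postcode": "2000", "spend": 120}
view_a = view_for("company_a", record)
view_b = view_for("company_b", record)
```

Because the rules live in a table rather than in the code path, a new commercial agreement in this sketch would mean adding one policy entry rather than rebuilding the pipeline, which is roughly the flexibility Hanus describes.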

This is where cloud computing has become such a technological asset. Today’s cloud environments allow organisations to instance data sets out more easily and implement bespoke rules for each one of those commercial agreements without having to completely rearchitect their data warehouse or infrastructure.

“Then there are tools we now work with, such as those from Databricks and Snowflake, that make this data sharing use case significantly easier again,” Hanus says. “You can create ‘virtual warehouses’ where you can share defined datasets in a really simple way, accompanied by auditing and logging details, plus rules allowing you to control how data is being used and how it’s mixed in with other data sets.”

Hanus also attributes data clean room advancement to more distributed management across an organisation to execute activities using data, and the control an organisation needs to assert over its data in conjunction with it.

“You could have teams of people running programmatic campaigns or potentially managing data relationships with other strategic partners. Operationally, you get to a point where you have junior digital marketers or other people running campaigns who need to use some of this quite sensitive data in what could otherwise be quite high-risk environments. Data clean rooms give a management and infrastructure layer to that and make it possible to do it safely operationally.”

Senior product manager at data management provider, Talend, Felipe Henao Brand, positions a data clean room as a marketplace where different parties, advertisers, brands or second-party data providers share data in a safe environment for the participants.

“But more importantly, the personas behind all those,” he says. “Historically, it [data sharing] was a bit of a wild west without much control. That's pushed marketing to actually begin to architect and develop new solutions.”

Then there’s the imperative to action data: It's one thing to collect data, but activating data from a marketing point of view is the critical step.

“How do I tap into this data, how many touch points do I have with a consumer or a prospect? That's where data clean rooms are helping organisations that have agreed to share data in a safe environment,” Henao Brand says. “You create a unique ID only known by the party sharing the data. At the same time, this approach allows brands and advertisers to use that ID to activate data, then measure on the back of that.”
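One common way to derive an ID that only the data owner can generate is a keyed hash of a normalised identifier. The sketch below, in Python, assumes HMAC-SHA256 over a lower-cased email address; the article does not say which technique any vendor uses, and the key and the `pseudonymous_id` helper are hypothetical.

```python
import hashlib
import hmac


def normalise_email(email: str) -> str:
    """Trim whitespace and lower-case so the same person always hashes the same way."""
    return email.strip().lower()


def pseudonymous_id(email: str, secret_key: bytes) -> str:
    """Derive a stable pseudonymous ID; without the key it cannot be recomputed."""
    message = normalise_email(email).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical secret held only by the party sharing the data.
key = b"hypothetical-secret-held-by-the-data-owner"

id_a = pseudonymous_id("Jane.Doe@example.com ", key)
id_b = pseudonymous_id("jane.doe@example.com", key)
id_c = pseudonymous_id("jane.doe@example.com", b"another-key")
```

Here `id_a` and `id_b` are identical because the input is normalised first, while `id_c` differs because a different key was used, so a partner holding only the IDs cannot regenerate them from raw email addresses.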

Interactive Advertising Bureau (IAB) Australia tech lead, Jonas Jaanimagi, says the safety and security of data in a DCR, combined with the power and intelligence of multi-party computation, enable companies to instantly match and analyse data across unlimited datasets in real-time while still eliminating the risk of exposure, leakage or misuse.

“These types of solutions are also known as Privacy Enhancing Technologies [PETs], a category of technologies that enable, enhance and preserve the privacy of data throughout its lifecycle, including when being shared with third parties,” he says.

It’s also worth noting DCRs as a concept stem from the older clean room philosophy of ‘zero contamination’ in manufacturing. It’s a concept over 50 years old, Jaanimagi comments, and one invented by Willis Whitfield to stop microscopic dust particles from infiltrating mechanical components in the manufacture of nuclear weapons.

“These processes came to revolutionise manufacturing in electronics and pharmaceuticals, improved safety and even enabled further space exploration. The same strict approach and mindset has been leveraged for media clients looking to work in these environments and with these tools. It’s a privacy-first and zero-risk mantra for any businesses – at all times.”

But before you move to invest, it’s important to understand that not all data clean rooms are created equal, and they often don’t solve the same needs, Pinto warns.

How DCRs match and analyse audiences

So how does a DCR actually work? InfoSum explains it as a segregated environment in which two or more partners can create a match between two or more datasets using a common identifier or key, to generate insights. How they enable the match is usually the main difference between DCR offerings.

“Many DCRs create matches by moving the data into a centralised location, merging both datasets into one, or with an ID graph, and then extracting insights from there,” Pinto explains. InfoSum’s approach is to match in a decentralised manner and without moving any data using its proprietary Safe Audience Transfer (SAT) technology. 
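As a rough illustration of matching two datasets on a common key and returning only aggregate insights, consider the Python sketch below. It shows the simpler, centralised style of match, not InfoSum's decentralised approach, and the minimum group size used to suppress small results is an assumption added for the example rather than something described in the article. All names and data are hypothetical.

```python
from collections import Counter

# Hypothetical privacy threshold: segments with fewer matched people than this
# are left out of the report so that individuals cannot be singled out.
MIN_GROUP_SIZE = 3


def overlap_report(brand_rows, publisher_keys, min_group_size=MIN_GROUP_SIZE):
    """Count matched customers per brand segment, suppressing small groups.

    brand_rows: dict mapping a shared (already pseudonymised) key to a segment.
    publisher_keys: set of the same kind of keys held by the media owner.
    """
    matched_segments = [
        segment for key, segment in brand_rows.items() if key in publisher_keys
    ]
    counts = Counter(matched_segments)
    return {segment: n for segment, n in counts.items() if n >= min_group_size}


brand = {
    "k1": "loyal", "k2": "loyal", "k3": "loyal",
    "k4": "lapsed", "k5": "lapsed", "k6": "new",
}
publisher = {"k1", "k2", "k3", "k4", "k9"}

report = overlap_report(brand, publisher)
```

Neither side's row-level data appears in the output: the report contains only segment counts, and the single matched "lapsed" customer is suppressed because it falls below the threshold.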

In LiveRamp’s case, companies take their segmented audience data and event-level data and match it to its pseudonymous, people-based identifier, RampID.

“Within our clean room, customers use RampIDs to leverage different workspaces for audience building, insights, lookalike modelling and activation, or for advanced modelling, data analysis and visualisation,” says Hoptman. “Different users can access the tools they need based on permissions set for them by the system administrator - meaning, the owner of the data.”

Snowflake’s offering, meanwhile, is along the lines of a big data lake where everything gets pulled in and anonymised, then a unique ID is created. This can be activated through many different channels. The vendor has also built tools for sharing aggregated data and reporting.

The best data clean rooms offer measurement and insight application capabilities, in addition to connections to leading activation channels, says Hoptman.

“Leading data clean rooms provide built-in and application capabilities that put their clean rooms’ capabilities on par with walled gardens’ clean rooms,” she says. “Today, clean rooms can match impression data to transactions, enable multiple partnerships for multi-touch attribution, and incorporate TV partners to better plan omnichannel campaigns, among many other things.”

In its First-Party Data Handbook, the IAB with InfoSum takes the concept much further by identifying four different types of data clean rooms: single-party clean rooms (such as a customer data platform), centralised multi-party clean rooms (such as a data warehouse), publisher data clean rooms and decentralised multi-party clean rooms. The decentralised multi-party clean room is what IAB considers to be the ‘truest’ version of a data clean room. Each has differing features and often cannot solve for the same needs, Jaanimagi notes.


Who is using DCRs right now

As noted, the earliest examples of clean rooms in action in the marketing and advertising ecosystem are those by the walled garden providers. Google’s Ads Data Hub is often quoted as the first example in this context, followed by Meta Business Suite and Amazon Marketing Cloud.

Snowflake’s US customers include NBC Universal, Roku and Disney for its advertising sales solution across Disney media, properties and outlets. In the latter’s case, segmented data from its owned data sets is split into 1000 different user segments. Advertisers come in and use those for targeting specific audiences, as well as for performance analytics. Disney has also done its own smart modelling.

In Australia, Snowflake also cites several brands and media outlets as customers. “We've certainly seen media companies trying to aggregate different data sets to be able to better position those audiences from an advertising point of view,” says Snowflake sales engineering A/NZ director, Clive Astbury.

“That was going on separately to any cookie conversation. We’ve also seen quite a few different second-party data partnerships occurring. Generally, when I've heard about those, they've come out of a martech player’s stack, so they will have customers using the same technology. And they've been able to connect those two customers together and have that second-party data sharing capability.”

For Jaanimagi, the hope is the evolution of clean rooms allows everyone to start to make effective yet responsible first-party data sharing practices much easier, more efficient, and less risky. “The major private ecosystem platforms have been running these types of solutions for a while now. But the real excitement is in how different partnerships are now launching and evolving across the open Internet and among a range of second-party partners.”

As well as being involved with Disney, partners and clients of InfoSum include Omnicom, Experian, Foursquare and CNN. Retail networks are another growing example of data clean rooms being put into action. LiveRamp has worked with retailers such as Boots and Carrefour to build such solutions.

“Clean rooms should be used by any business that wants to create better experiences for its customers, which is a lot of businesses. Our clean room customers are far-ranging as well,” Hoptman says. “We work with some of the biggest retailers in the world to power programs that drive immense value for them.

“We also work with everyone from major Australian publishers to, most recently, a major European utilities organisation that wanted to enhance its customers’ experiences.”

Common business use cases

As the list of organisations signing up to use DCRs expands, so do the many use cases. One of the most common in a marketing and advertising context is first-party data onboarding, allowing brands, media owners and data providers to undertake matching across their respective data sets. Identity resolution and leveraging multiple identity graph partners is a second example.

Another use case Pinto points to is segmentation and enrichment, to more easily connect, match and enrich first-party data with direct access and without reliance on third-party cookies or aggregated data. “This is about quickly identifying the best performing consumer attributes across existing core customers and using that intelligence to build powerful audiences without sharing your data with a third party,” he says.

Then there’s the popular use case of data activation for media planning. “Both brands and media owners can future-proof their advertising performance with first-party data matching and deliver relevant high performing experiences with little to no media waste,” Pinto says.  

“The fourth one worth mentioning is measurement and optimisation. Marketers can quickly understand and measure the effectiveness of their campaigns, audiences, and sales performance through direct collaboration.”

Hoptman cites retailers and CPG brands leveraging clean rooms to collaborate on customer and transaction data. “Retailers can leverage insights on what customers are buying, how much they are buying, and how often, and share these insights with CPG brands,” she says. “Brands achieve better targeting, reach the right audiences, and reduce their advertising waste. Retailers improve yields and deliver better customer experiences.”

A more interesting use case is between partners that may not have obvious audience overlap or association. “Partnering for data collaboration offers a chance to unlock new insights about brands’ mutual customers that they may not have otherwise discovered. For example, an automotive brand could partner with someone like a big box retailer to uncover audience overlap,” Hoptman says.

“This would then help each partner to reach their customers at different points of the customer journey, and they could curate new messaging to better reach these customers.”

Versent has transport clients wanting to be able to share data with each other to get an end-to-end view of how a particular trip is occurring.

“All of the same principles apply in that space as well, because no one wants to give away that data, necessarily, they kind of want to share it in a managed way,” Hanus says. “That means with the appropriate controls to meet privacy and data sharing obligations they have and which they’ve essentially committed to the customer around.

“We have one customer at the moment, a boutique consultancy, looking at reputation management and to analyse those sorts of reputational risks by diving into their email and groupware systems. They need to establish that data integration point to be able to enable that to happen securely. We've combined products we work with, with a kind of set of proprietary technologies to achieve that.

“We also see this within the land titles industry: A lot of land title services are looking to figure out how to commercialise their data. No one wants to build yet another integration, people want to test it and learn iteratively. Having that data sharing capability where they can essentially mix those datasets, without giving away the asset is critical.”

Key features of a data clean room

In understanding data clean room solutions and use cases, Pinto advises companies to consider scale, speed, privacy, transparency, simplicity and control as evaluation criteria. For example, privacy is the core responsibility of data collaboration and data clean room solutions. There are multiple privacy-enhancing methodologies and technologies (PETs) that can be applied to ensure complete obfuscation and anonymisation of data.

“Control is also a crucial factor. The challenge with modern data sharing solutions is one party’s data needs to leave their control and ownership to perform even the most basic use cases such as data onboarding, activation, and measurement,” Pinto says. “Data clean rooms must eliminate the need for any data to move - even enabling all parties to maintain complete control and holistic governance with granular permissions and access controls that allow each party to dictate how their data is matched, analysed and activated by each partner.”

This is why Hoptman is advising anyone looking at DCR offerings to consider partners they’re collaborating with as a key element in decision making.

“With the advent of a slew of clean room solutions, not only should companies look for clean rooms being used by their partners today, but also interoperable solutions that will allow their data to safely and securely be leveraged with more partners in the future,” Hoptman says. “Furthermore, a preferred clean room should allow companies to get to the level of detail - transaction data, or whatever it may be - to enable them to derive win-win insights from their shared data.”

Knowing if you need a DCR

So given all this, do you need to build your own DCR? The jury is out, but most experts say this is one all marketers need to watch.

“Every business that cares about their customers should and will invest in the future and in the protection of their data - and data clean rooms are part of that future,” says Pinto. “The first question they should ask themselves is whether or not they have first-party data, or want to access first-party data. If that answer is yes, then a data clean room is the right solution for them.”

For Jaanimagi, DCRs are much more than a replacement; they are a large leap forward, particularly in the context of consumer privacy, consent management and marketing efficiency and effectiveness.

“Post third-party cookies deprecation, there are three general future-proof approaches emerging for marketers, developed through the pan-industry collaborative efforts over the past 2-3 years,” he says. “For the future of both linked and unlinked first-party data, clean rooms are a significant product evolution, and we expect these privacy-by-design types of solutions to play large roles for companies that have enough scale and quality in their data, and the resources and intent to leverage them fully and completely.” 

Henao Brand believes anyone with enough of a digital footprint to activate consumers’ information is a prospective buyer of a DCR solution.

“We are going to see private data clean rooms for big conglomerates, like media, like retail, and we’ll continue to see that explosion of fragmented sort of marketplaces, which are going to make it more and more difficult for marketers, for brands, in my opinion, to measure the success of their campaigns,” he comments. “Because if you are able to exploit some data through the Channel Nine marketplace, that doesn't mean you're going to be able to link those results to whatever you do on InfoSum. So this is an interesting period for the adtech industry, with the regulation, the need for monetisation, and the changes that Google is imposing on the industry.”

Customers are already using one or more tools for the purposes of sharing data and action, depending on the dataset, says Astbury.

“This will either become another option or the way they do it. We don't know yet and we’ll see how that pans out,” he adds. “The nice thing for Snowflake as well, we're not restricted to just, say, two datasets. Because of this data sharing capability, we can have multi-party clean rooms. So three, four or five companies get together and share data. We will see where that goes and what kinds of additional use cases that brings.”