Identity Resolution in a privacy-first, cookie-less future: How to prepare tracking for analytics

Learn how we’ve been able to get 95% identity attribution to devices and modern browsers in a world of cookie consent banners and privacy-first updates.

Between Apple's new privacy protections on its devices, Safari and Firefox blocking third-party cookies by default (with Chrome planning to follow in 2022), and the proliferation of cookie consent banners driven by GDPR (not to mention the California Consumer Privacy Act), tracking users for analytics is moving from complicated to almost impossible. This is especially true at the top of the funnel, where users are almost always anonymous.

This problem isn't going anywhere: getting only 20% acceptance on your cookie-consent banner is considered good! If you can't set a consistent cookie across your users' many sessions (especially for a high-retention business like e-commerce), or your JavaScript conversion events (Google Tag Manager, for example) are being blocked, your users' historical behavior will be extremely difficult to stitch together over time.

The issue with browsers blocking JavaScript conversion pixels

This post outlines a strategy for how to attribute your data properly regardless of privacy changes (now or in the future).

The idea is simple. Once a user converts we know who they are (e.g. they've filled out a signup form and entered their email). We can then look up past anonymous data that matches and assign it to that user.

As a diagram:

Stitch anonymous data to users once they convert

For reference, we've been using the strategy presented here at Narrator, and in most cases we're able to attribute 95% of our clients' anonymous sessions to conversions. Said another way, our customers are able to stitch historical anonymous data to 95% of their converted users, and this has held even in the last few weeks, after all the Apple device and browser privacy updates.

Why attributing anonymous page views to conversions is important

Many customer interactions are easy to attribute. You'll always know who converted, who received emails, or who submitted an order.

Page views, on the other hand, are almost always anonymous: think of a landing page rather than an app where users are already logged in. Unfortunately, page views also contain the most important data for attribution and analysis: UTM sources, referral URLs, etc. We need to know what our users were doing right before they converted, and the best place to find that is page view data.

Though it's incredibly important, it's difficult to tie identifiable conversion events (purchases, emails, bookings, subscriptions) to the user's previous behavior. Most analytics tools simply don't have access to all the data. They'll have just page views (Google Analytics), just leads (your CRM), or just emails (your Email Service Provider). Even when they do have more, they rely on unreliable JavaScript conversion pixels.

Connecting your user across all their different sources and devices is difficult

What you’re missing out on by not connecting your users across all their data sources

Imagine you're the owner of a high retention e-commerce business (good for you!), and you're spending on Google and Facebook/Instagram ads to get more traffic to your site.

If you're able to filter out the RETURNING customers from your targeted advertising, you could focus your spending entirely on NEW customers.

RETURNING customers were going to convert anyway (since you have a high retention business) so spending on advertising to "retain" them is a waste vs re-allocating that spend to capture NEW customers.

The Strategy: Use unique URLs to identify your users and use a data warehouse to attribute anonymous visits

At a high level:

  • Track all page views with a service that maintains consistent anonymous identifiers across each session and subdomain. It should report individual page views per anonymous visitor, not just aggregated data.
  • Sync the page views into your data warehouse
  • Add a unique identifier to the URL every time you learn who the user is: when they log in, fill out a signup form, submit an order, etc. (e.g. mysite.com/checkout?order_id=192381923).
  • DON'T BE LAZY. I know it's tedious, but when I say "every time", I mean it.
  • Using the unique (non-PII!) identifiers in the URL, stitch the anonymous page views with your identified users using a data warehouse

Step 1 - Track all page views and sync them into a data warehouse

First, ensure that individual page views are sent to your data warehouse.

Use a page view tracking service that maintains consistent anonymous identifiers across individual sessions and, if the user allows cookies, across multiple sessions as well.

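NOTE: If you already use Google Analytics, be careful: Universal Analytics is not the same as Google Analytics 4. Check that the version you're on can deliver individual page views, not just aggregated reports, to your warehouse.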

Step 2 - Identify your user with a unique URL

When users first come to your site they're anonymous. When they convert (submit a form, buy a product, log in, etc.) you know who they are, so it's important to send that information down to your analytics system.

Some examples

Anonymous:

  • Page views

Identifiable:

  • Orders/Payments/Subscriptions services (Stripe, Shopify, etc...)
  • Emails (ESPs like Sendgrid, Mandrill, Klaviyo, etc...)
  • Bookings (Calendly)
  • Leads / Opportunities (CRMs like Salesforce, HubSpot)
  • etc...

Whenever an anonymous user tells you who they are on your site, you have two options:

  1. Add a unique, ideally non-PII, identifier to the URL
  2. Use built-in "identify" events from your existing page view tracker (GA cookies and user identification, Segment Identify)

We highly recommend option #1 because you can always see when page view tracking is off, which makes it far more consistent and reliable. We often see JavaScript-based "identify" calls break or get caught by Chrome extensions, so you lose days (sometimes weeks) of data before realizing it.

The way to do this is to find a unique identifier for the order, signup form, email opened, etc. These should be consistent — most 3rd party tools have an id available (e.g. a Shopify order id).

The identifier shouldn't directly identify the user. It should instead come from the actual action taken. In other words, don't use /orders?email=user@example.com. It's harder to manage and leaks personally identifiable information.

Here's the URL approach for a few different conversion types:

Completing an Order

  • Add the unique order_id to the thank you page in the URL: example.com/confirmation?order_id=192381923 (Shopify already does this with its unique checkout URLs)

Signing up for a Subscription

  • Add the unique subscription_id to the thank you page in the URL: example.com/confirmation?subscription_id=192381923

Joining a Newsletter

  • Add the unique contact_id from the Email Service Provider to the confirmation page in the URL: example.com/confirmation?contact_id=192381923

Clicking on a link from an email

  • Add the unique contact_id from the Email Service Provider to each link in the email: example.com/product_page?contact_id=192381923

Scheduling a meeting

  • Add the unique booking_id to the confirmation page in the URL: example.com/confirmation?booking_id=192381923
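The URL patterns above can be produced with a small helper. This is a minimal sketch in Python using only the standard library; the function name `add_identifier` and the "@" check are illustrative assumptions, not part of any tracking tool. It keeps whatever query parameters are already on the URL (UTM tags, for instance) and refuses values that look like an email address.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_identifier(url: str, key: str, value: str) -> str:
    """Return `url` with `key=value` added to its query string (hypothetical helper)."""
    # Refuse anything that looks like an email address: the identifier
    # should describe the action (an order, a booking), not the person.
    if "@" in str(value):
        raise ValueError("identifier looks like PII; use an action id instead")
    parts = urlsplit(url)
    # Keep existing parameters (e.g. UTM tags), dropping any old value of `key`.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != key
    ]
    query.append((key, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))


confirmation_url = add_identifier(
    "https://example.com/confirmation?utm_source=newsletter",
    "order_id",
    "192381923",
)
print(confirmation_url)
```

The same helper would work for subscription_id, contact_id, or booking_id; only the key changes.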

Step 3 - Attribute anonymous page views to the user!

Now you should have a warehouse with:

  1. Page view data with a consistent and unique anonymous identifier per user for a session
  2. Data from all identifiable customer activities (orders, leads, emails, etc...)

It's time to stitch them together. All you have to do is find the page views with an identifier in the URL. The trick here is that the page views with the identifier in the URL also have an anonymous id. Simply look up the user from the identifier, note the anonymous id, and replace that anonymous id with the real user everywhere it appears in the data.

Using the "Completing an Order" example from above:

  • For each page view with order_id in the URL...
  • Find the order in your data warehouse with that order_id
  • Get the customer's email from that order
  • For all page views that have the same anonymous_id as that page view (example.com/confirmation?order_id=192381923), go back in time and overwrite the anonymous user_id with that user's email
  • You've now identified all that user's page views!
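The steps above can be sketched in a few lines of Python. This is an in-memory illustration only, assuming page views and orders are already loaded as simple dictionaries; the sample rows, the field names, and the `stitch` function are hypothetical, and in practice this logic would run as a query in your warehouse.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical sample data; in practice these rows live in warehouse tables.
page_views = [
    {"anonymous_id": "anon-1", "url": "https://example.com/?utm_source=google"},
    {"anonymous_id": "anon-1", "url": "https://example.com/product_page"},
    {"anonymous_id": "anon-1", "url": "https://example.com/confirmation?order_id=192381923"},
    {"anonymous_id": "anon-2", "url": "https://example.com/pricing"},
]
orders = {"192381923": {"email": "user@example.com"}}


def stitch(page_views, orders):
    """Return page views with a `user` field filled in where it can be resolved."""
    # Pass 1: find page views carrying an order_id and note who placed that order.
    anon_to_user = {}
    for view in page_views:
        ids = parse_qs(urlsplit(view["url"]).query).get("order_id", [])
        if ids and ids[0] in orders:
            anon_to_user[view["anonymous_id"]] = orders[ids[0]]["email"]
    # Pass 2: go back over every page view and attach the user to matching ids.
    return [
        {**view, "user": anon_to_user.get(view["anonymous_id"])}
        for view in page_views
    ]


stitched = stitch(page_views, orders)
for view in stitched:
    print(view["anonymous_id"], view["user"], view["url"])
```

Note that the first page view, the one carrying utm_source, ends up attributed to the user even though it happened before the order existed. Page views from anon-2 stay anonymous because that visitor never hit a URL with an identifier.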

Remember the earlier graphic? It's this flow, just on your data warehouse:

Stitch anonymous data to users once they convert (on your warehouse)

Example SQL for folks who like queries:

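-- Map each anonymous_id to the email on the order referenced in the page URL.
-- Assumes p.search holds the query string, e.g. '?order_id=192381923'.
SELECT
  p.anonymous_id,
  o.email
FROM website.pages p
LEFT JOIN order_service."order" o  -- "order" is a reserved word, so quote it
  -- 'order_id=' is 9 characters long, so the value starts at position 10
  ON o.id::varchar = NULLIF(SUBSTRING(REGEXP_SUBSTR(LOWER(p.search::varchar), 'order_id=[^&]*'), 10), '')
WHERE p.search ILIKE '%order_id=%'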

So why does this work even if tracking cookies are blocked?

Page views will still have a unique identifier per session (since that doesn't require cookies). As long as the user did one identifiable action during that session we'll be able to attribute their page views. If the user has cookies turned on then the anonymous_id will stay consistent across sessions.

Step 4, 5, 6, etc... Implement this URL strategy EVERY time the user converts!

I know I sound like a broken record, but it's important that you have a strategy for EVERY time a user comes to your site anonymously. If you have lots of returning users (e-commerce, for example), your users will come back on their phones, tablets, computers, etc. This means an individual user will have MANY anonymous identifiers to stitch together.
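To make the "many anonymous identifiers per user" point concrete, here is a rough Python sketch that groups anonymous ids by user across several identifier types. The `LOOKUPS` table, the sample ids, and the function name are all hypothetical; the lookups stand in for joins against your orders, email, and booking tables.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookups from each identifier type to a user,
# standing in for joins against warehouse tables.
LOOKUPS = {
    "order_id": {"192381923": "user@example.com"},
    "contact_id": {"5551": "user@example.com"},
    "booking_id": {},
}


def anonymous_ids_by_user(page_views):
    """Group every anonymous_id that can be tied to a user via any URL identifier."""
    groups = defaultdict(set)
    for view in page_views:
        params = parse_qs(urlsplit(view["url"]).query)
        for key, lookup in LOOKUPS.items():
            for value in params.get(key, []):
                user = lookup.get(value)
                if user is not None:
                    groups[user].add(view["anonymous_id"])
    return dict(groups)


views = [
    {"anonymous_id": "laptop-abc", "url": "https://example.com/confirmation?order_id=192381923"},
    {"anonymous_id": "phone-xyz", "url": "https://example.com/product_page?contact_id=5551"},
    {"anonymous_id": "tablet-123", "url": "https://example.com/pricing"},
]
print(anonymous_ids_by_user(views))
```

Here the laptop session is tied to the user through an order and the phone session through an email link, so both anonymous ids land under the same person. The tablet session stays unresolved until that device hits a URL with an identifier, which is why every conversion point needs one.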

This also means that you'll need to run the user attribution queries on a regular basis. Building out a data platform that can easily manage this is outside the scope of this post. That said, data platform tools like Narrator can help. We attribute anonymous visits to users automatically and transparently.

By following the strategy above, we've seen Narrator clients achieve > 95% attribution on their anonymous page views even in the multi-touch/multi-device world we live in.
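If you want to track a number like this for your own data, one possible definition is the share of converted users who have at least one stitched page view. The sketch below assumes that definition and uses made-up sample data; it is not necessarily how the figure above was computed.

```python
def attribution_rate(converted_users, stitched_page_views):
    """Fraction of converted users with at least one stitched page view.

    This definition is an assumption made for illustration.
    """
    converted = set(converted_users)
    if not converted:
        return 0.0
    users_with_views = {v["user"] for v in stitched_page_views if v.get("user")}
    return len(converted & users_with_views) / len(converted)


# Hypothetical sample: four converted users, three of whom have stitched page views.
converted_users = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
stitched_page_views = [
    {"anonymous_id": "anon-1", "user": "a@example.com"},
    {"anonymous_id": "anon-1", "user": "a@example.com"},
    {"anonymous_id": "anon-2", "user": "b@example.com"},
    {"anonymous_id": "anon-3", "user": "c@example.com"},
    {"anonymous_id": "anon-4", "user": None},
]
rate = attribution_rate(converted_users, stitched_page_views)
print(f"{rate:.0%}")
```

A sudden drop in this rate is a useful signal that an identifier has gone missing from one of your conversion URLs.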

It all comes down to being diligent about putting the necessary identifiers in the URL query params. Once you have them, you can easily identify anonymous page views and stitch the user together with your other data sources.

