SQL Hack: How to join tables when the key is buried in a blob of text
The Situation
Text blobs are a pain, but they are commonly used with data from email platforms, Salesforce, or dynamic UTMs. In these cases, the JOIN key is probably hidden in a blob of text.
For example, say you want to join your users table using the user_id
from the Salesforce notes column. But the notes data looks like this…
Notes Column:Ahmed (user_id: 1f65e898-0f67-4d34-8ac3-ae223bb71a79) called us today
First Attempt: Regex Parsing
Initially, you can try to use regex to parse the user_id out of the value column. This is always annoying but can get close.
If it works, then a simple subquery and join should do the trick. However, like any free text column, the regex may not work perfectly.
Regex can be risky when the structure of the data changes:
Notes Column:Ahmed (user: 1f65e898-0f67-4d34-8ac3-ae223bb71a79)
(references “user” not “user_id”)
When regex doesn’t work, use this hack…
Solution: Use the "ilike" SQL hack
Use an ilike
to JOIN data by creating a dynamic key:
SELECT ...
FROM some_table a. -- structured data
JOIN other_table b -- data with the text blob
ON ( a.column ilike '%' || b.other_column || '%')
This can be a handy trick if you’re in a pinch, but it can also be slow, so use it with caution. And watch out for unwanted duplication in your JOIN if the “ilike” matches more than once!
Good Luck!