The complete workflow
Advanced Computation
In case you would like to receive access to the advanced SQL statements that give you access to the complete end to end process that we have build at Identity Graph. It includes the creation of separate CRM & Cookie ID Graphs, the device identification and the logic that establishes connections between CRM Profile & Device. In addition there is additional functionality available to distinguish between IDs that should be used in the ID merging process and IDs that should be just added to a profile after the modelling is done.
The end-to-end workflow consists of the following steps:
Short Description
Advanced Computation
Included Components
SQL Statements: Create Input Data
SQL Statements: Create List of Nodes & Edges
SQL Statements: Create Dictionary
SQL Statements: Integer Encoding
SQL Statements: Edges Main Table
SQL Statements: Complex Deduplication
SQL Statement: Create Flat Results Table
SQL Statement: Create Nested Results Table
SQL Statements: Create separate CRM & Cookie Graph
SQL Statements: CRM Profile & Device Connections
Missing Components
SQL Statements: Data Quality Reporting
SQL Statements: User ID Profile Reporting
Necessary skills
The overview of the contents
Contents of the Advanced ID Graph Package
The Identity Graph Starter Package provides SQL scripts for creating a demo Identity Graph in BigQuery. It guides users through initializing tables, loading demo data, and cleaning inputs to support CRM data modeling. The process culminates with graph modeling, anomaly detection, deduplication, and generating a final nested results table for comprehensive ID analysis.
Step 1 – Initialize Environment
Step 1.1 – Initialize ID Graph Logging
- initialize_id_graph_logging : Initializes a logging table that helps to document workflow start & end timestamps
Step 1.2 – Initialize Source Data Demo Tables
- initialize_crm_data.sql : Includes an example CRM Dataset
- initialize_event_stream.sql : Includes an example Event Stream (incl. Web Browsing Events, Purchase Events, etc.)
- initialize_cookie_consent_table.sql : Includes an example Consent Event Log (incl. Cookie Consent Opt-Ins, etc.)
Step 1.3 – Initialize Edges Tables
- initialize_list_of_crm_edges.sql : Includes all Edges with CRM Identifiers
- initialize_list_of_cookie_edges.sql : Includes all Edges with Cookie or Fingerprinting Identifiers
- initialize_list_of_auth_signals.sql : Includes all Edges created by Authentication Signals (e.g. Website Logins)
- initialize_list_of_link_only_edges.sql : Includes all Edges with IDs that should not be used for merging but should be included in the Identity Graph after the main computation
Step 1.4 – Initialize the ID Dictionary
- initialize_id_dictionary.sql : Includes the ID dictionary of all IDs mentioned in the above Edges Tables
Step 1.5 – Initialize the Integer Encoded Edges Tables
- initialize_list_of_int_crm_edges.sql : Includes all Integer Encoded Edges with CRM Identifiers
- initialize_list_of_int_cookie_edges.sql : Includes all Integer Encoded Edges with Cookie or Fingerprinting Identifiers
- initialize_list_of_int_auth_signals.sql : Includes all Integer Encoded Edges created by Authentication Signals (e.g. Website Logins)
- initialize_list_of_int_link_only_edges.sql : Includes all Integer Encoded Edges with IDs that should not be used for merging but should be included in the Identity Graph after the main computation
Step 1.6 – Initialize the Edges Main Tables
- initialize_edges_main_table_crm.sql : Includes the CRM Edges for the Profile Graph Modelling
- initialize_edges_main_table_cookies.sql : Includes the Cookie Edges for the Cookie Pool / Device Graph Modelling
- initialize_edges_link_only_table.sql : Includes the Edges that should only be linked to Profile or Device Graph
Step 1.7 – Initialize the Iteration Log Tables
- initialize_iteration_log_crm.sql : Logs the number of iterations and number of updates for the Complex Deduplication started for the CRM Edges Main Table
- initialize_iteration_log_cookie.sql : Logs the number of iterations and number of updates for the Complex Deduplication started for the Cookie Edges Main Table
Step 1.8 – Initialize the Profile to Cookie Pool Connections Table
- initialize_crm_cookie_pool_connections.sql : Logs the connections between Profile and Device including the device strength and the maximum strength of any profile to a specific device
Step 2 – Load Demo Data
Step 2.1 – Create the initial data loads
- cookie_consent_demo.sql : Loads Demo Data into the Cookie Consent Table cookie_consent
- crm_data_demo.sql : Loads Demo Data into the CRM Table crm_data
- event_stream.sql : Loads Demo Data into the Event Table event_stream
- list_of_auth_signals.sql : Loads Demo Data into the Edges Table list_of_auth_signals
- list_of_crm_edges_demo.sql : Loads Demo Data into the Edges Table list_of_crm_edges
- list_of_edges_demo.sql : Loads Demo Data into the Edges Table list_of_cookie_edges
- list_of_link_only_edges.sql : Loads Demo Data into the Edges Table list_of_link_only_edges
Step 2.2 – Clean the input data
- Update ID type descriptions in list_of_crm_edges
- Update ID type descriptions in list_of_cookie_edges
- Update ID type descriptions in list_of_link_only_edges
- Update ID type descriptions in list_of_auth_signals
- Replace whitespace with _ and convert uppercase letters to lowercase in type_1 and type_2
Step 3 – Graph Modelling
Step 3.1 – Update ID Dictionary
- Merge new IDs in the list_of_crm_edges into the id_dictionary
- Merge new IDs in the list_of_cookie_edges into the id_dictionary
- Merge new IDs in the list_of_link_only_edges into the id_dictionary
- Merge new IDs in the list_of_auth_signals into the id_dictionary
Step 3.2 – Integer Encoding of IDs
- Use the id_dictionary to integer encode the list_of_crm_edges
- Use the id_dictionary to integer encode the list_of_cookie_edges
- Use the id_dictionary to integer encode the list_of_link_only_edges
- Use the id_dictionary to integer encode the list_of_auth_signals
Step 3.3 – Set Edges Table
- Merge new edges from the list_of_int_crm_edges into the edges_main_table_crm
- Merge new edges from the list_of_int_cookie_edges into the edges_main_table_cookies
- Merge new edges from the list_of_link_only_edges into the edges_link_only_table
Step 3.4 – Detect Anomalies
For all Edges Table mentioned above:
- Update the count of edges that include the more frequently observed ID of the edge.
- Update the z_score values for each edge and mark edges as “Normal” or “Outlier”
- Reset the group_id to remove the edge identified as “Outlier” from modelling
Step 3.5 – Update Edges Table
For all Edges Table mentioned above:
- Initialize the group ID for edges without a group_id that are marked as “Normal”
- Duplicate edges and switch int_1 and int_2 as final data preparation before the Complex Deduplication
Step 3.6 – Complex Deduplication
For the CRM & Cookie Edges Table mentioned above:
- Run the Complex Deduplication in iterations until no more group_id updates can be found
For the Link Only Edges Main Table:
- Run a simpler join to connect the Edges to the Group ID mentioned in the CRM or Cookie Edges Tables
Step 4 – Establish Profile & Device Connections
Step 4.1 – Create the Profile & Device Connections Table
- Leverage the list_of_auth_signals to merge new data into the crm_cookie_pool_connections table
Step 4.2 – Update the Connection Strength Values
- Leverage the list_of_auth_signals to update the connection strength values for each Profile to Device connection
- Leverage the list_of_auth_signals to update the maximum connection strength value any Profile has to a Device
Step 5 – Save Results
Step 5.1 – Save CRM Profile Nested Results Table
- Define a list of CRM ID types logged in the id_dictionary
- Define the dynamic SQL to iterate each CRM ID type logged in the id_dictionary .
- Execute the dynamic SQL to build the CRM nested results table that includes all relevant ID types
Step 5.2 – Save Cookie Profile Nested Results Table
- Define a list of Cookie ID types logged in the id_dictionary
- Define the dynamic SQL to iterate each Cookie ID type logged in the id_dictionary
- Execute the dynamic SQL to build the Cookie nested results table that includes all relevant ID types
Step 5.3 – Save Combined Nested Results Table
- Define a list of available ID types logged in the id_dictionary
- Define the dynamic SQL to iterate each ID type logged in the id_dictionary
- Execute the dynamic SQL to build the final nested results table that includes all relevant ID types
Step 5.3 – Save Combined Flat Results Table
- Create a flat results table of CRM Group ID, Unique ID, ID Type & ID Values
- The flat results table can be used in some scenarios as it is easier to query (no unnesting is necessary)
Step 6 – Post Processing
- Add the CRM Group ID to the cookie_consent table
- Add the CRM Group ID to the event_stream table
Reviews
There are no reviews yet.