Automate your Setup
Advanced Orchestration
Those packages offer advanced capabilities that help you to scale your identity graph solution. The packages offer an end to end workflow that can be customized according to your organizations needs. The workflows can be build based on the open source Apache Airflow management solution or the Google Cloud BigQuery native Dataform service. In case you choose the Airflow package, you can deploy them on your virtual machine in your own on premise server landscape or at any cloud provider.
We believe the smart approach is doing things step by step:
Short Description
Advanced Orchestration
Pre-Requisits
Advanced Computation Pacakge
Advanced Reporting Package
Included Components
All SQL Statements: of the Advanced Computation
Dataform Configuration for each SQL statement
Airflow: DAG Configuration for each SQL Statement
Workflow Configuration Variables
Workflow Environment Variables
Missing Components
SQL Statements: Data Quality Reporting
SQL Statements: User ID Profile Reporting
Product Details
Necessary skills
The overview of the contents
Contents of the Advanced Orchestration Package
The Identity Graph Starter Package provides SQL scripts for creating a demo Identity Graph in BigQuery. It guides users through initializing tables, loading demo data, and cleaning inputs to support CRM data modeling. The process culminates with graph modeling, anomaly detection, deduplication, and generating a final nested results table for comprehensive ID analysis.
Step 1 – Initialize Environment
Step 1.1 – Initialize ID Graph Logging
- initialize_id_graph_logging : Initializes a logging table that helps to document workflow start & end timestamps
Step 1.2 – Initialize Source Data Demo Tables
- initialize_crm_data.sqlx : Includes an example CRM Dataset
- initialize_event_stream.sqlx : Includes an example Event Stream (incl. Web Browsing Events, Purchase Events, etc.)
- initialize_cookie_consent_table.sqlx : Includes an example Consent Event Log (incl. Cookie Consent Opt-Ins, etc.)
Step 1.3 – Initialize Edges Tables
- initialize_list_of_crm_edges.sqlx : Includes all Edges with CRM Identifiers
- initialize_list_of_cookie_edges.sqlx : Includes all Edges with Cookie or Fingerprinting Identifiers
- initialize_list_of_auth_signals.sqlx : Includes all Edges created by Authentication Signals (e.g. Website Logins)
- initialize_list_of_link_only_edges.sqlx : Includes all Edges with IDs that should not be used for merging but should be included in the Identity Graph after the main computation
Step 1.4 – Initialize the ID Dictionary
- initialize_id_dictionary.sqlx : Includes the ID dictionary of all IDs mentioned in the above Edges Tables
Step 1.5 – Initialize the Integer Encoded Edges Tables
- initialize_list_of_int_crm_edges.sqlx : Includes all Integer Encoded Edges with CRM Identifiers
- initialize_list_of_int_cookie_edges.sqlx : Includes all Integer Encoded Edges with Cookie or Fingerprinting Identifiers
- initialize_list_of_int_auth_signals.sqlx : Includes all Integer Encoded Edges created by Authentication Signals (e.g. Website Logins)
- initialize_list_of_int_link_only_edges.sqlx : Includes all Integer Encoded Edges with IDs that should not be used for merging but should be included in the Identity Graph after the main computation
Step 1.6 – Initialize the Edges Main Tables
- initialize_edges_main_table_crm.sqlx : Includes the CRM Edges for the Profile Graph Modelling
- initialize_edges_main_table_cookies.sqlx : Includes the Cookie Edges for the Cookie Pool / Device Graph Modelling
- initialize_edges_link_only_table.sqlx : Includes the Edges that should only be linked to Profile or Device Graph
Step 1.7 – Initialize the Iteration Log Tables
- initialize_iteration_log_crm.sqlx : Logs the number of iterations and number of updates for the Complex Deduplication started for the CRM Edges Main Table
- initialize_iteration_log_cookie.sqlx : Logs the number of iterations and number of updates for the Complex Deduplication started for the Cookie Edges Main Table
Step 1.8 – Initialize the Profile to Cookie Pool Connections Table
- initialize_crm_cookie_pool_connections.sqlx : Logs the connections between Profile and Device including the device strength and the maximum strength of any profile to a specific device
Step 1.9 – Initialize the Data Quality Reporting Tables
- initialize_table_data_monitoring.sqlx : Logs the number of unique IDs per ID type in each table for each day
- initialize_steps_data_monitoring.sqlx : Logs the number of unique IDs per ID type in each processing step for each day
Step 2 – Load Demo Data
Step 2.1 – Create the initial data loads
- cookie_consent_demo.sqlx : Loads Demo Data into the Cookie Consent Table cookie_consent
- crm_data_demo.sqlx : Loads Demo Data into the CRM Table crm_data
- event_stream.sqlx : Loads Demo Data into the Event Table event_stream
- list_of_auth_signals.sqlx : Loads Demo Data into the Edges Table list_of_auth_signals
- list_of_crm_edges_demo.sqlx : Loads Demo Data into the Edges Table list_of_crm_edges
- list_of_edges_demo.sqlx : Loads Demo Data into the Edges Table list_of_cookie_edges
- list_of_link_only_edges.sqlx : Loads Demo Data into the Edges Table list_of_link_only_edges
Step 2.2 – Clean the input data
- Update ID type descriptions in list_of_crm_edges
- Update ID type descriptions in list_of_cookie_edges
- Update ID type descriptions in list_of_link_only_edges
- Update ID type descriptions in list_of_auth_signals
- Replace whitespace with _ and convert uppercase letters to lowercase in type_1 and type_2
Step 3 – Graph Modelling
Step 3.1 – Update ID Dictionary
- Merge new IDs in the list_of_crm_edges into the id_dictionary
- Merge new IDs in the list_of_cookie_edges into the id_dictionary
- Merge new IDs in the list_of_link_only_edges into the id_dictionary
- Merge new IDs in the list_of_auth_signals into the id_dictionary
Step 3.2 – Integer Encoding of IDs
- Use the id_dictionary to integer encode the list_of_crm_edges
- Use the id_dictionary to integer encode the list_of_cookie_edges
- Use the id_dictionary to integer encode the list_of_link_only_edges
- Use the id_dictionary to integer encode the list_of_auth_signals
Step 3.3 – Set Edges Table
- Merge new edges from the list_of_int_crm_edges into the edges_main_table_crm
- Merge new edges from the list_of_int_cookie_edges into the edges_main_table_cookies
- Merge new edges from the list_of_link_only_edges into the edges_link_only_table
Step 3.4 – Detect Anomalies
For all Edges Table mentioned above:
- Update the count of edges that include the more frequently observed ID of the edge.
- Update the z_score values for each edge and mark edges as “Normal” or “Outlier”
- Reset the group_id to remove the edge identified as “Outlier” from modelling
Step 3.5 – Update Edges Table
For all Edges Table mentioned above:
- Initialize the group ID for edges without a group_id that are marked as “Normal”
- Duplicate edges and switch int_1 and int_2 as final data preparation before the Complex Deduplication
Step 3.6 – Complex Deduplication
For the CRM & Cookie Edges Table mentioned above:
- Run the Complex Deduplication in iterations until no more group_id updates can be found
For the Link Only Edges Main Table:
- Run a simpler join to connect the Edges to the Group ID mentioned in the CRM or Cookie Edges Tables
Step 4 – Establish Profile & Device Connections
Step 4.1 – Create the Profile & Device Connections Table
- Leverage the list_of_auth_signals to merge new data into the crm_cookie_pool_connections table
Step 4.2 – Update the Connection Strength Values
- Leverage the list_of_auth_signals to update the connection strength values for each Profile to Device connection
- Leverage the list_of_auth_signals to update the maximum connection strength value any Profile has to a Device
Step 5 – Save Results
Step 5.1 – Save CRM Profile Nested Results Table
- Define a list of CRM ID types logged in the id_dictionary
- Define the dynamic sqlx to iterate each CRM ID type logged in the id_dictionary .
- Execute the dynamic sqlx to build the CRM nested results table that includes all relevant ID types
Step 5.2 – Save Cookie Profile Nested Results Table
- Define a list of Cookie ID types logged in the id_dictionary
- Define the dynamic sqlx to iterate each Cookie ID type logged in the id_dictionary
- Execute the dynamic sqlx to build the Cookie nested results table that includes all relevant ID types
Step 5.3 – Save Combined Nested Results Table
- Define a list of available ID types logged in the id_dictionary
- Define the dynamic sqlx to iterate each ID type logged in the id_dictionary
- Execute the dynamic sqlx to build the final nested results table that includes all relevant ID types
Step 5.3 – Save Combined Flat Results Table
- Create a flat results table of CRM Group ID, Unique ID, ID Type & ID Values
- The flat results table can be used in some scenarios as it is easier to query (no unnesting is necessary)
Step 6 – Post Processing
- Add the CRM Group ID to the cookie_consent table
- Add the CRM Group ID to the event_stream table
Step 7 – Table Reporting
- List the number of unique IDs per ID type across all available tables
Step 8 – Process Steps Reporting
- List the number of unique IDs per ID type across all available processing steps
Reviews
There are no reviews yet.