Skip to content

Getting Started with dbt-tags package

In order to start the Tag-based masking journey in our dbt project, we have 6 essential steps as follows:

Notes

  • Before diving into steps, it's recommended to learn more on Tag-based masking polcies (skip if you knew it already 💯)
  • Also, please be noted that you will need the Snowflake Enterprise Edition as the prerequisite

Now, let's go 🏃:

1. Configure the namespace (databse, schema)

Firstly, we should let dbt know where we would like to store the Tag objects (as well as the Masking Policies' definitions).

We'll put the dbt_tags's variables in dbt_project.yml file as below:

vars:
  # dbt_tags__database: MY_DB # (optional) default to `target.database` if not specified
  dbt_tags__schema: COMMON # (optional) default to `target.schema` if not specified

Please NOTE that the schema name (and database name) will be following the dbt Custom schema. If you'd like to keep the actual names, please turn off the Opt-In flag via dbt_tags__opt_in_default_naming_config: False.

2. Decide to allow the specific tags only

ℹ️ Skip this step if all dbt tags are allowed. Otherwise, see the sample below:

vars:
  dbt_tags__allowed_tags:
    - pii_name
    - pii_email
    - ...

3. Decide to commit masking policies' definition

ℹ️ Skip this step if you decide not to use masking policies, but only tags!

Masking Policies' functions vary depending on the dbt project, so it's your responsibility to implement them in advance.

In the dbt root directory, let's create a new one within /macros folder e.g. /macros/mp-ddl.

For each tag name, we need a corresponding macro that holds the masking policy definition.

Given a sample, we have a tag named pii_name, we'll create a macro file as below:

-- File path: /macros/mp-ddl/create_masking_policy__pii_name.sql
{% macro create_masking_policy__pii_name(ns) -%}

  create masking policy if not exists {{ ns }}.pii_name as (val string)
    returns string ->
    case --/ your definition start here /--
      when is_role_in_session('ROLE_HAS_PII_ACCESS') then val
      else sha2(val)
    end;

{%- endmacro %}

{{ ns }} or ns stands for the schema namespace, let's copy the same!

ℹ️ If you want to have multiple masking policies of different data types (they must be different data types) to a single tag, follow these steps:

Given a sample, we have a tag named pii_null, we'll create a macro file as below:

-- File path: /macros/mp-ddl/create_masking_policy__pii_null.sql
{% macro create_masking_policy__pii_null(ns) -%}

  create masking policy if not exists {{ ns }}.pii_null_varchar as (val string)
    returns string ->
    case --/ your definition start here /--
      when is_role_in_session('ROLE_HAS_PII_ACCESS') then val
      else null
    end;

  create masking policy if not exists {{ ns }}.pii_null_number as (val number)
    returns number ->
    case --/ your definition start here /--
      when is_role_in_session('ROLE_HAS_PII_ACCESS') then val
      else null
    end;

{%- endmacro %}

We then must modify the optional var dbt_tags__policy_data_types in the dbt_project.yml file:

vars:
  dbt_tags__policy_data_types:
    - pii_null: ['varchar','number']

These must match the exact same data_type suffix that has been applied to the name of the masking policies in the create macro. This will then assign both of these to the single tag, rather than having to manage multiple of the same tag for the different data types.

Leaving any tags out of the dbt_tags__policy_data_types var definition means that it will expect only a single masking policy which has the exact same name as the tag.

4. Set tags on columns

To assign tags to columns, you follow the same process as you would apply dbt tags to columns normally. This is done in the model schema yaml files.

By default, this package assigns the name of the column as the value of the tag. Because of how dbt tags work, there is no out of the box way to assign values for the Snowflake tags, so a separator ("~") has been configured within dbt_tags to facilitate this.

Setting a value for a tag can be useful for Security Governance querying in Snowflake. Or it can be used within a masking policy to allow some dynamic functionality using the Snowflake function system$get_tag_on_current_column('fully.qualified.tag-name').

Looking at a model's schema yaml file:

  • If you don't need a tag value
columns:
    - name: first_name
      description: Customer's first name. PII.
      tags:
        - pii_name
  • If you do need a tag value
columns:
    - name: membership_number
      description: Customer's membership number. PII.
      tags:
        - pii_mask_last_x_characters~4

The value is then available to use either in the masking policy or in Snowflake.

ℹ️ dbt tags will only deploy tags that have been set on columns. If you have tags or masking policies which aren't assigned to columns, they won't be deployed.

5. Deploy resources (tags, masking policies)

❗We don't want to repeat this step on every dbt run(s).

Instead, let's do it as a step in the Production Release process (or manually).

Remove --args '{debug: true}' for a live run

  • Deploy the dbt tags to DWH:
dbt run-operation create_tags --args '{debug: true}'
  • Deploy the tag-based masking policy functions to DWH:

ℹ️ Skip this step if you decide not to use masking policies, but only tags!

dbt run-operation create_masking_policies --args '{debug: true}'

6. Apply tags to columns

ℹ️ Currently, only column tags are supported!

For dbt models (or any resource types that support dbt hooks), add a new post-hook to the model level:

models:
  my_project:
    post-hook:
      - > # customize below `if` condition following your need
        {% if flags.WHICH in ('run', 'build') %}
          {{ dbt_tags.apply_column_tags() }}
        {% endif %}

For dbt sources, we can leverage on-run-start (or on-run-end) project hook to run this step for all sources' columns:

on-run-start:
  - >
    {{ dbt_tags.apply_source_column_tags() }}

In this type of the resource, we implicitly control the enablement of the run by using dbt_tags__tag_source_columns variable (False by default). Therefore, for the live run, we need to run dbt with dbt_tags__tag_source_columns: true, for example:

dbt build -s ... --vars '{dbt_tags__tag_source_columns: true}'

7. Apply masking policies to tags

ℹ️ Skip this step if you decide not to use masking policies, but only tags!

Remove --args '{debug: true}' for a live run

dbt run-operation apply_mps_to_tags --args '{debug: true}'

Now, go to Snowflake and check the result! 🚀

Happy Masking 🎉