Data Experience Quick Start
This guide covers the minimal steps needed to get connected with the Data Experience. The required plugins all run by default, so they only need to be configured.
More detailed information about the Data Experience and each of its integrations can be found in the sidebar.
What you'll achieve
- Ingest data warehouse tables as APIs into Portal's Software Catalog
- Make datasets searchable alongside your software components
- Provide ownership and lifecycle information for data governance
 
Prerequisites
- Admin access to Portal's Config Manager
- Access to create credentials in your preferred integration(s)
 
Step 1: Configure Authentication
Follow the instructions below for the warehouse(s) you use: BigQuery, Redshift, Snowflake, or Databricks.

BigQuery
- Create a GCP service account with the following roles (see the CLI sketch after this list):
  - roles/bigquery.dataViewer
  - roles/bigquery.jobUser
- Download the JSON credentials file
- Navigate to Config Manager > Data Experience
- Expand the keys in the sidebar: dataExperience > registry > integrations > bigquery
- Add an item to the sources list:
  - Enter your GCP project ID
  - Paste the service account JSON into the credentials field
- Scroll to the bottom of the page and click the Save changes button
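If you prefer the command line, here is a minimal sketch of the service account setup using the gcloud CLI. The project ID my-project and the account name portal-data-experience are placeholders; substitute your own.

# Create the service account (name is illustrative)
gcloud iam service-accounts create portal-data-experience --project=my-project

# Grant the two required BigQuery roles on the project
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:portal-data-experience@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:portal-data-experience@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Download the JSON credentials file to paste into Config Manager
gcloud iam service-accounts keys create credentials.json \
  --iam-account=portal-data-experience@my-project.iam.gserviceaccount.com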
 
Redshift

- Create an IAM User with one of the following policies, depending on your needs (a CLI sketch follows the policies below).
 
Allow access to all Redshift clusters
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "redshift:DescribeClusters",
        "redshift-data:ListSchemas",
        "redshift-data:ListDatabases",
        "redshift:GetClusterCredentialsWithIAM",
        "redshift-data:ExecuteStatement",
        "redshift-data:DescribeStatement",
        "redshift-data:GetStatementResult"
      ],
      "Resource": "*"
    }
  ]
}
Allow access to limited Redshift clusters
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "redshift:DescribeClusters",
        "redshift-data:ListSchemas",
        "redshift-data:ListDatabases",
        "redshift-data:ExecuteStatement"
      ],
      "Resource": "arn:aws:redshift:YOUR_REGION:YOUR_AWS_ACCOUNT_ID:cluster:YOUR_CLUSTER_NAME"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "redshift:GetClusterCredentialsWithIAM"
      ],
      "Resource": "arn:aws:redshift:YOUR_REGION:YOUR_AWS_ACCOUNT_ID:dbname:YOUR_CLUSTER_NAME/*"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": [
        "redshift-data:DescribeStatement",
        "redshift-data:GetStatementResult"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "redshift-data:statement-owner-iam-userid": "${aws:userid}"
        }
      }
    }
  ]
}
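A minimal sketch of the IAM setup with the AWS CLI, assuming one of the policies above is saved locally as redshift-policy.json; the user and policy names are placeholders.

# Create the IAM user (name is illustrative)
aws iam create-user --user-name portal-data-experience

# Attach one of the policies above as an inline policy
aws iam put-user-policy \
  --user-name portal-data-experience \
  --policy-name portal-redshift-access \
  --policy-document file://redshift-policy.json

# Create the access key; note the AccessKeyId and SecretAccessKey in the output
aws iam create-access-key --user-name portal-data-experience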
- Create an access key for this newly created IAM User.
- Navigate to Config Manager > App and expand the aws > accounts key in the sidebar.
- Add a new item under the accounts list:
  - Enter the AWS accountId
  - Enter the IAM User's accessKeyId and secretAccessKey
  - The remaining fields may remain blank.
- Scroll to the bottom of the page and click the Save changes button
- Navigate to Config Manager > Data Experience
- Expand the keys in the sidebar: dataExperience > registry > integrations > redshift
- Add an item to the sources list:
  - Enter your AWS accountId
  - Expand the section below which applies to your situation and follow the instructions
I've enabled access to all Redshift clusters
- Ensure the Option 1 tab is selected
- Add an item to the sources list
- Enter the AWS accountId
- Add the regions in which your clusters reside to the regions list
I've enabled access to limited Redshift clusters
- Ensure the Option 2 tab is selected
- Add an item to the sources list
- Enter the AWS accountId
- Add the clusters you've granted access to in the clusters list
- Enter the identifier and region for each cluster
- Scroll to the bottom of the page and click the Save changes button
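To sanity-check the IAM user's credentials, you can list clusters with the AWS CLI using the new access key; redshift:DescribeClusters is included in both policies above. The region is a placeholder.

aws redshift describe-clusters --region us-east-1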
 
Snowflake

- Follow Snowflake's guide to generate a key-pair for authentication (a sketch follows this list). Be sure to follow the instructions to assign the public key to a Snowflake user.
- Navigate to Config Manager > Data Experience
- Expand the keys in the sidebar: dataExperience > registry > integrations > snowflake
- Under the sources key, select the Option 2 tab and add a new item to the list:
  - Enter SNOWFLAKE_JWT in the authenticator field
  - Enter the username of the user the public key was assigned to
  - Enter the privateKey
  - Enter the warehouse that should be used for executing queries. If omitted, the user's default warehouse will be used.
  - Enter the role that should be used for executing queries. If omitted, the user's default role will be used.
- Scroll to the bottom of the page and click the Save changes button
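A minimal key-pair sketch following Snowflake's documented openssl procedure; my_user is a placeholder for the Snowflake user the key is assigned to.

# Generate an unencrypted PKCS#8 private key (this is the privateKey value)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt

# Derive the public key to assign to the Snowflake user
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

# Then, in Snowflake, assign the public key (PEM header/footer and line
# breaks stripped), for example:
#   ALTER USER my_user SET RSA_PUBLIC_KEY='MIIBIjANBgkq...';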
 
Databricks

- Create a Service Principal in Databricks. This can be at either the account or workspace level.
- This service principal must be granted the USE CATALOG and USE SCHEMA privileges on each catalog and schema you wish to ingest datasets from (see the sketch after this list). See the Databricks API documentation for more details.
- Generate OAuth credentials for the service principal, and take note of the client ID and secret. Be mindful of the lifetime set for these credentials, and rotate the secrets before they expire to prevent ingestion failures.
- Back in Portal, navigate to Config Manager > Data Experience > Databricks
- For each workspace you'd like to ingest datasets from, add a new source with the workspace URL and service principal credentials. The integration will discover all catalogs, schemas, and tables which the service principal has access to.
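A minimal sketch of the grants, run in a Databricks SQL editor by a user who can manage the catalog. The catalog and schema names and the service principal's application ID are placeholders.

-- Allow the service principal to see the catalog and schema it will ingest from
GRANT USE CATALOG ON CATALOG my_catalog TO `00000000-0000-0000-0000-000000000000`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `00000000-0000-0000-0000-000000000000`;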
 
(Optional) Step 2: Configure Registry Ingest Schedule
Configure how often datasets are ingested from your sources to the data registry.
- From Config Manager > Data Experience, expand the keys in the sidebar: dataExperience > registry > integrations > defaults > schedule > frequency > cron
- Enter a valid crontab string. The default is 0 */6 * * * (every 6 hours); see the examples after this list.
- Scroll to the bottom of the page and click the Save changes button
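A few example crontab strings in the standard five-field syntax (minute, hour, day of month, month, day of week):

0 */6 * * *    # default: at minute 0 of every 6th hour
0 2 * * *      # once a day at 02:00
*/30 * * * *   # every 30 minutes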
 
Step 3: Test & Verify
- Wait for the first sync to complete; when this happens depends on the schedule you configured in step 2. You can monitor progress by visiting the Data Overview page, accessible from Portal's navigation.
- Search for your datasets in Portal's search
 
Next Steps
- Add filters to exclude test/staging tables
- Configure integrations with other data warehouses
- Integrate with dbt to bring your dbt projects into the Software Catalog
- Integrate with TechDocs to add documentation for your datasets
- Create your first check for datasets in Soundcheck
- Add new tags or labels to aid discovery of your datasets with Entity Overlays
 
Troubleshooting
- No datasets appearing? Verify service account permissions and project visibility. Consider extending the maximum entity name length in Portal.
- Missing owner/lifecycle? Verify your entity defaults are set to valid catalog entities.