Data Experience Quick Start
This guide covers the minimal steps needed to connect to the Data Experience. For more detail on the Data Experience and each integration, see the corresponding pages in the sidebar.
What you'll achieve
- Ingest data warehouse tables as APIs into Portal's Software Catalog
- Make datasets searchable alongside your software components
- Provide ownership and lifecycle information for data governance
Prerequisites
- Admin access to Portal's Config Manager
- Access to create credentials in your preferred integration(s) (Redshift, BigQuery, or Snowflake)
Step 1: Enable Data Experience
- Go to Config Manager in Portal
- Under Data Experience, click the Start plugin button.
- Under Catalog, select the Modules tab and start @spotify/backstage-plugin-catalog-backend-module-data-registry-provider
- Under Search, select the Modules tab and start @spotify/backstage-plugin-search-backend-module-data-registry
Step 2: Configure Authentication
Follow the steps for the integration you use: BigQuery, Redshift, or Snowflake.
BigQuery
- Create a GCP service account with the following roles:
  - roles/bigquery.dataViewer
  - roles/bigquery.jobUser
- Download the JSON credentials file
- Navigate to Config Manager > Data Experience
- Expand the keys on the sidebar: dataExperience > registry > integrations > bigquery
- Add an item to the sources list:
  - Enter your GCP project ID
  - Paste the service account JSON into the credentials field
- Scroll to the bottom of the page and click the Save changes button
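Before pasting the key into the credentials field, it can help to confirm that the downloaded file is a complete service account key. The sketch below is not part of Portal; it is a hypothetical local check that relies only on the field names Google uses in service account key files (type, project_id, private_key, client_email). The function name is illustrative.

```python
import json

# Fields present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw: str) -> list[str]:
    """Return a list of problems found in the key JSON (empty if none)."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A key downloaded for a service account carries type "service_account".
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


if __name__ == "__main__":
    # Deliberately incomplete sample to show the kind of report produced.
    sample = json.dumps({"type": "service_account", "project_id": "my-project"})
    for problem in check_service_account_key(sample):
        print(problem)
```

An empty list means the file has the expected shape; it does not confirm that the account holds the two roles listed above.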
Redshift
- Create an IAM User with one of the following policies, depending on your needs.
Allow access to all Redshift clusters
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "redshift-data:ListTables",
        "redshift-data:DescribeTable",
        "redshift:DescribeClusters",
        "redshift-data:ListSchemas",
        "redshift-data:ListDatabases",
        "redshift:GetClusterCredentialsWithIAM",
        "redshift-data:ExecuteStatement",
        "redshift-data:DescribeStatement",
        "redshift-data:GetStatementResult"
      ],
      "Resource": "*"
    }
  ]
}
Allow access to limited Redshift clusters
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "redshift-data:ListTables",
        "redshift-data:DescribeTable",
        "redshift:DescribeClusters",
        "redshift-data:ListSchemas",
        "redshift-data:ListDatabases",
        "redshift-data:ExecuteStatement"
      ],
      "Resource": "arn:aws:redshift:YOUR_REGION:YOUR_AWS_ACCOUNT_ID:cluster:YOUR_CLUSTER_NAME"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "redshift:GetClusterCredentialsWithIAM"
      ],
      "Resource": "arn:aws:redshift:YOUR_REGION:YOUR_AWS_ACCOUNT_ID:dbname:YOUR_CLUSTER_NAME/*"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": [
        "redshift-data:DescribeStatement",
        "redshift-data:GetStatementResult"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "redshift-data:statement-owner-iam-userid": "${aws:userid}"
        }
      }
    }
  ]
}
- Create an access key for the newly created IAM User.
- Navigate to Config Manager > App and expand the aws > accounts key in the sidebar.
- Add a new item under the accounts list:
  - Enter the AWS accountId
  - Enter the IAM User's accessKeyId and secretAccessKey
  - The remaining fields may be left blank.
- Scroll to the bottom of the page and click the Save changes button
- Navigate to Config Manager > Data Experience
- Expand the keys on the sidebar: dataExperience > registry > integrations > redshift
- Add an item to the sources list:
  - Enter your AWS accountId
  - Expand the section below which applies to your situation and follow the instructions
I've enabled access to all Redshift clusters
- Ensure the Option 1 tab is selected
- Add an item to the sources list
- Enter the AWS accountId
- Add the regions in which your clusters reside to the regions list
I've enabled access to limited Redshift clusters
- Ensure the Option 2 tab is selected
- Add an item to the sources list
- Enter the AWS accountId
- Add the clusters you've granted access to in the clusters list
- Enter the cluster identifier and region for each cluster
- Scroll to the bottom of the page and click the Save changes button
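The limited-cluster policy above contains three placeholders: YOUR_REGION, YOUR_AWS_ACCOUNT_ID, and YOUR_CLUSTER_NAME. As a minimal sketch, the helper below fills them in for one cluster and prints the resulting JSON. The function name and the sample values are illustrative only; the statements mirror the policy shown earlier.

```python
import json


def limited_redshift_policy(region: str, account_id: str, cluster: str) -> dict:
    """Build the limited-cluster IAM policy for a single Redshift cluster."""
    arn_prefix = f"arn:aws:redshift:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Metadata and query access, scoped to the named cluster.
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "redshift-data:ListTables",
                    "redshift-data:DescribeTable",
                    "redshift:DescribeClusters",
                    "redshift-data:ListSchemas",
                    "redshift-data:ListDatabases",
                    "redshift-data:ExecuteStatement",
                ],
                "Resource": f"{arn_prefix}:cluster:{cluster}",
            },
            {
                # Credentials for any database on the named cluster.
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": ["redshift:GetClusterCredentialsWithIAM"],
                "Resource": f"{arn_prefix}:dbname:{cluster}/*",
            },
            {
                # Statement results, restricted to statements this user ran.
                "Sid": "VisualEditor2",
                "Effect": "Allow",
                "Action": [
                    "redshift-data:DescribeStatement",
                    "redshift-data:GetStatementResult",
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "redshift-data:statement-owner-iam-userid": "${aws:userid}"
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    # Sample values only; substitute your own region, account ID, and cluster.
    policy = limited_redshift_policy("us-east-1", "123456789012", "analytics")
    print(json.dumps(policy, indent=2))
```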
Snowflake
- Follow Snowflake's guide to generate a key-pair for authentication. Be sure to follow the instructions to assign the public key to a Snowflake user.
- Navigate to Config Manager > Data Experience
- Expand the keys on the sidebar: dataExperience > registry > integrations > snowflake
- Under the sources key, select the Option 2 tab and add a new item to the list:
  - Enter SNOWFLAKE_JWT in the authenticator field
  - Enter the username of the user the public key was assigned to
  - Enter the privateKey
  - Enter the warehouse that should be used for executing queries. If omitted, the user's default warehouse will be used.
  - Enter the role that should be used for executing queries. If omitted, the user's default role will be used.
- Scroll to the bottom of the page and click the Save changes button
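An easy slip at this step is pasting the public key, or a truncated key, into the privateKey field. The following is a hypothetical local check (not a Portal or Snowflake API) that classifies PEM text by its header line. Whether Portal accepts passphrase-encrypted keys is not stated in this guide, so the sketch only reports what it sees.

```python
import re

# PEM header labels mapped to a short description.
PEM_KINDS = {
    "PRIVATE KEY": "unencrypted PKCS#8 private key",
    "ENCRYPTED PRIVATE KEY": "encrypted PKCS#8 private key",
    "RSA PRIVATE KEY": "PKCS#1 RSA private key",
    "PUBLIC KEY": "public key",
}

# A single PEM block whose BEGIN and END labels match.
PEM_PATTERN = re.compile(
    r"\A-----BEGIN ([A-Z0-9 ]+)-----\n(.+?)\n-----END \1-----\Z", re.DOTALL
)


def classify_pem(text: str) -> str:
    """Describe the PEM block in text, based only on its header and footer."""
    normalized = text.strip().replace("\r\n", "\n")
    match = PEM_PATTERN.match(normalized)
    if match is None:
        return "not a complete PEM block"
    label = match.group(1)
    return PEM_KINDS.get(label, f"unrecognized PEM type: {label}")


if __name__ == "__main__":
    # Placeholder body; a real key has many base64 lines here.
    sample = "-----BEGIN PUBLIC KEY-----\nAAAA\n-----END PUBLIC KEY-----\n"
    print(classify_pem(sample))
```

The check looks at the header and footer only; it does not verify that the key is valid or that it matches the public key assigned to the Snowflake user.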
(Optional) Step 3: Configure Ingest Schedule
Configure how often datasets are ingested from your sources to the data registry.
- From Config Manager > Data Experience, expand the keys on the sidebar: dataExperience > registry > integrations > defaults > schedule > frequency > cron
- Enter a valid crontab string. This is 0 */6 * * * (every 6 hours) by default.
- Scroll to the bottom of the page and click the Save changes button
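To make the default concrete: in a five-field crontab string the fields are minute, hour, day of month, month, and day of week, so 0 */6 * * * fires at minute 0 of every sixth hour. The sketch below expands a single numeric cron field using only the standard library. It handles *, */n, a-b, a-b/n, and comma-separated lists, which is a subset of cron syntax (no names such as MON or JAN). The helper name is illustrative.

```python
def expand_cron_field(field: str, low: int, high: int) -> list[int]:
    """Expand one numeric cron field into the sorted values it matches."""
    values: set[int] = set()
    for part in field.split(","):
        # Split an optional "/step" suffix off the range expression.
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"invalid step in {part!r}")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


if __name__ == "__main__":
    minute, hour, *_ = "0 */6 * * *".split()
    print(expand_cron_field(minute, 0, 59))  # [0]
    print(expand_cron_field(hour, 0, 23))    # [0, 6, 12, 18]
```

With the default, ingestion therefore starts at 00:00, 06:00, 12:00, and 18:00 in whichever time zone the scheduler uses, which this guide does not specify.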
(Optional) Step 4: Configure Catalog Sync Schedule
Configure how often datasets are replicated to the Software Catalog from the data registry.
- From Config Manager > Data Experience, expand the keys on the sidebar: dataExperience > catalog > schedule > frequency > cron
- Enter a valid crontab string. This is 0 */6 * * * (every 6 hours) by default.
- Scroll to the bottom of the page and click the Save changes button
Step 5: Test & Verify
- Wait for the first sync to complete. The timing depends on the schedules you configured in Steps 3 and 4. You can monitor progress on the Data Overview page, accessible from Portal's navigation.
- Search for your datasets in Portal's search
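Besides the search UI, ingestion can also be spot-checked through the catalog's REST API. This sketch assumes your Portal instance exposes the standard Backstage catalog endpoint at /api/catalog/entities and that you hold a valid access token; both are assumptions, as is the placeholder base URL. It filters on kind=api because this guide describes tables as being ingested as APIs.

```python
import json
import urllib.parse
import urllib.request


def catalog_query_url(base_url: str, kind: str = "api") -> str:
    """Build a Backstage catalog URL that lists entities of one kind."""
    query = urllib.parse.urlencode({"filter": f"kind={kind}"})
    return f"{base_url.rstrip('/')}/api/catalog/entities?{query}"


def fetch_entity_names(base_url: str, token: str) -> list[str]:
    """Fetch entities and return their metadata.name values (needs network access)."""
    request = urllib.request.Request(
        catalog_query_url(base_url),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        entities = json.load(response)
    return [entity["metadata"]["name"] for entity in entities]


if __name__ == "__main__":
    # Placeholder host; replace with your own Portal URL.
    print(catalog_query_url("https://portal.example.com"))
```

The filter returns every API entity, not only datasets; this guide does not say which entity type datasets receive, so narrow the filter once you know it.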
Next Steps
- Add filters to exclude test/staging tables
- Configure integrations with other data warehouses
- Integrate with dbt to bring your dbt projects into the Software Catalog
- Integrate with TechDocs to add documentation for your datasets
- Create your first check for datasets in Soundcheck
- Add new tags or labels with Entity Overlays to make your datasets easier to discover
Troubleshooting
- No datasets appearing? Verify service account permissions and project visibility. Consider extending the maximum entity name length in Portal
- Missing owner/lifecycle? Verify your entity defaults are set to valid catalog entities
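For the second symptom, one way to see which entities are affected is to scan catalog entities for empty spec.owner or spec.lifecycle values, the fields Backstage API entities use for ownership and lifecycle. The sketch works on entity dictionaries you have already fetched; the function name and sample data are invented for illustration.

```python
def missing_governance_fields(entities: list[dict]) -> dict[str, list[str]]:
    """Map entity name to the governance fields that are missing or empty."""
    report: dict[str, list[str]] = {}
    for entity in entities:
        spec = entity.get("spec") or {}
        missing = [field for field in ("owner", "lifecycle") if not spec.get(field)]
        if missing:
            name = (entity.get("metadata") or {}).get("name", "<unnamed>")
            report[name] = missing
    return report


if __name__ == "__main__":
    sample = [
        {"metadata": {"name": "orders"}, "spec": {"owner": "team-data", "lifecycle": "production"}},
        {"metadata": {"name": "events"}, "spec": {"owner": "team-data"}},
    ]
    for name, fields in missing_governance_fields(sample).items():
        print(name, "is missing", ", ".join(fields))
```

This only detects empty fields. It cannot tell whether an owner value refers to an entity that actually exists in the catalog, which is the other half of the check described above.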