Integration: Redshift

IAM Policy Requirements

Authentication with Redshift is handled with IAM User credentials. You can configure these credentials to allow access to all of your Redshift clusters or a subset of clusters.

To ingest all of your AWS account's Redshift clusters and databases into the data registry, use the following IAM policy, which grants the required actions on all resources.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "redshift-data:ListTables",
        "redshift-data:DescribeTable",
        "redshift:DescribeClusters",
        "redshift:GetClusterCredentialsWithIAM",
        "redshift-data:ListSchemas",
        "redshift-data:ListDatabases"
      ],
      "Resource": "*"
    }
  ]
}

This option simplifies configuration: you only need to configure the AWS account ID and the regions your clusters are in, and the integration discovers the rest. Use the 'option 1' array under sources:

Redshift Source All Clusters

Privileges for IAM user

By default in Redshift, information_schema (and the other cluster-generated tables and views the integration reads from) shows only the objects and rows that the connected user has privileges to see. Additionally, only superusers and an object's creator can query that object. Because of this, if you want tables or views created by a user other than the IAM service account to be ingested, grant that account the permissions described here so their metadata is pulled in.

The following commands can be executed programmatically, or in the Redshift query editor as a superuser. Wrap the exact username associated with the IAM account you configured for data registry in double quotes.

-- These two commands ensure that all existing tables in the schema 'schema_name' are ingested into data registry
GRANT USAGE ON SCHEMA schema_name TO "IAM:DataExperienceRedshiftUser";
GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO "IAM:DataExperienceRedshiftUser";

-- This command ensures that tables created in the schema from now on are accessible to, and ingestable by, data registry
-- Note: without a FOR USER clause, it applies only to tables created by the user who runs it
ALTER DEFAULT PRIVILEGES IN SCHEMA schema_name
GRANT SELECT ON TABLES TO "IAM:DataExperienceRedshiftUser";

-- These commands help confirm whether your IAM user has access to a schema or table
SHOW GRANTS ON SCHEMA schema_name;
SELECT HAS_TABLE_PRIVILEGE('IAM:DataExperienceRedshiftUser', 'schema_name.table_name', 'select');
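
When several schemas need the same grants, the statements above can be generated rather than typed by hand. The following is a minimal sketch, not part of the integration: the helper names `quote_ident` and `grant_statements` are hypothetical, and it only builds SQL strings that you would then run as a superuser. Note that it double-quotes schema names as well as the username.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(schemas, iam_user):
    """Build the three grant statements shown above for each schema.

    `schemas` is a list of schema names and `iam_user` is the verbatim
    Redshift username, e.g. "IAM:DataExperienceRedshiftUser".
    """
    user = quote_ident(iam_user)
    statements = []
    for schema in schemas:
        target = quote_ident(schema)
        statements.append(f"GRANT USAGE ON SCHEMA {target} TO {user};")
        statements.append(
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {target} TO {user};"
        )
        statements.append(
            f"ALTER DEFAULT PRIVILEGES IN SCHEMA {target} "
            f"GRANT SELECT ON TABLES TO {user};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical schema names, used for illustration only.
    for line in grant_statements(["sales", "marketing"],
                                 "IAM:DataExperienceRedshiftUser"):
        print(line)
```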

Configuration

Each source in the Redshift integration requires an accountId; make sure to set up authentication for each account. Each region in the associated config must be enabled for the account.

While Redshift doesn't natively support table or column descriptions, some organizations use the SQL COMMENT command to store relevant information. The Redshift module can optionally ingest these comments as descriptions when the pullRedshiftComments config value is set to true. The setting defaults to false because pulling comments requires the following additional permissions on the IAM policy:

"redshift-data:ExecuteStatement",
"redshift-data:DescribeStatement",
"redshift-data:GetStatementResult"

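One way to keep the IAM policy in step with the pullRedshiftComments setting is to generate the policy document from the two action lists given in this page. The sketch below is illustrative only: `build_policy` and its parameter are hypothetical names, and the `"Resource": "*"` value mirrors the all-clusters policy shown earlier.

```python
import json

# Actions required by the base integration (from the policy above).
BASE_ACTIONS = [
    "redshift-data:ListTables",
    "redshift-data:DescribeTable",
    "redshift:DescribeClusters",
    "redshift:GetClusterCredentialsWithIAM",
    "redshift-data:ListSchemas",
    "redshift-data:ListDatabases",
]

# Extra actions needed only when pullRedshiftComments is true.
COMMENT_ACTIONS = [
    "redshift-data:ExecuteStatement",
    "redshift-data:DescribeStatement",
    "redshift-data:GetStatementResult",
]


def build_policy(pull_redshift_comments: bool = False) -> dict:
    """Return an IAM policy document as a dict for the chosen setting."""
    actions = list(BASE_ACTIONS)
    if pull_redshift_comments:
        actions.extend(COMMENT_ACTIONS)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(pull_redshift_comments=True), indent=2))
```
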
Authentication

The Redshift integration uses the @backstage/integration-aws-node package to create a credential provider, which is then passed into the AWS client SDKs. Because authentication is handled by a separate plugin, it is set up under the App config, as seen below:

Redshift Auth

Note: Every accountId included in the data registry config must have an associated auth config like the one above.
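
A quick pre-flight check can catch a source whose account has no auth entry. This is a sketch under stated assumptions: it takes two plain lists of account ID strings that you have already extracted from your own config, and the function name `accounts_missing_auth` is hypothetical. It does not read any config file or reflect how the integration validates its settings.

```python
def accounts_missing_auth(source_account_ids, auth_account_ids):
    """Return the sorted account IDs used by sources but lacking auth config."""
    configured = set(auth_account_ids)
    return sorted({acct for acct in source_account_ids
                   if acct not in configured})


if __name__ == "__main__":
    # Placeholder account IDs, for illustration only.
    sources = ["111111111111", "222222222222"]
    auth = ["111111111111"]
    missing = accounts_missing_auth(sources, auth)
    if missing:
        print("No auth config for:", ", ".join(missing))
```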

Naming

The naming structure for Datasets created from Redshift is as follows: [database].[schema].[table].
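
As an illustration of that structure, the sketch below joins the three parts with dots. The function name `dataset_name` is hypothetical, and whether the integration normalizes case or escapes special characters is not stated here, so the sketch does neither.

```python
def dataset_name(database: str, schema: str, table: str) -> str:
    """Compose a Dataset name in the form database.schema.table."""
    parts = (database, schema, table)
    if any(not part for part in parts):
        raise ValueError("database, schema, and table must all be non-empty")
    return ".".join(parts)


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    print(dataset_name("dev", "public", "orders"))
```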

Tags & Labels

Tags on Redshift resources are not currently supported by the Redshift connector, because they cannot be applied to resources at the database granularity or below.

Troubleshooting