Integration: Redshift
IAM Policy Requirements
Authentication with Redshift is handled with IAM User credentials. You can configure these credentials to allow access to all of your Redshift clusters or a subset of clusters.
- Allow access to all clusters
- Limit access to specific clusters
To ingest all of your AWS account's Redshift clusters and databases into the data registry, the following IAM Policy can be used which grants the required actions on all resources.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"redshift-data:ListTables",
"redshift-data:DescribeTable",
"redshift:DescribeClusters",
"redshift:GetClusterCredentialsWithIAM",
"redshift-data:ListSchemas",
"redshift-data:ListDatabases"
],
"Resource": "*"
}
]
}
Taking advantage of this option simplifies configuration as you will only need to configure the AWS Account ID and the regions your clusters are in, and the integration will discover the rest. Use the 'option 1' array under sources:
If you wish to limit the clusters which the data registry has access to, you will need to specify multiple policy statements. For example, the following policy will grant access to only my-cluster
in the us-east-1
region along with all of its databases:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"redshift-data:ListTables",
"redshift-data:DescribeTable",
"redshift:DescribeClusters",
"redshift-data:ListSchemas",
"redshift-data:ListDatabases",
"redshift-data:ExecuteStatement",
"redshift-data:DescribeStatement",
"redshift-data:GetStatementResult"
],
"Resource": "arn:aws:redshift:us-east-1:my-account-id:cluster:my-cluster"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"redshift:GetClusterCredentialsWithIAM"
],
"Resource": "arn:aws:redshift:us-east-1:my-account-id:dbname:my-cluster/*"
}
]
}
Taking advantage of this option means you will need to configure each cluster in Config Manager manually by specifying the AWS Account ID along with a list of cluster identifier/region pairs. Use the 'option 2' array under sources:
Privileges for IAM user
By default in redshift, information_schema (and other cluster-generated tables and views the integration reads from) only shows objects/rows that the connected user has privileges to see. Additionally, only superusers and the creators of objects can query objects. Because of this, if there are tables or views created by a user other than the IAM service account that you want ingested, you can grant permissions to ensure their metadata will be pulled in.
The following commands can be executed programmatically, or in the redshift query editor as a superuser. Ensure you use double quotes around the verbatim username associated with the IAM account you configured for data registry.
-- these two commands ensure for a given schema 'schema_name', all tables are ingested into data registry
GRANT USAGE ON SCHEMA schema_name TO "IAM:DataExperienceRedshiftUser";
GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO "IAM:DataExperienceRedshiftUser";
-- this command will ensure any new tables created in the schema moving forward are accessible / ingestable into data registry
ALTER DEFAULT PRIVILEGES IN SCHEMA your_schema
GRANT SELECT ON TABLES TO "IAM:DataExperienceRedshiftUser";
-- these commands can help confirm/show if your IAM user has access to schemas or tables
SHOW GRANTS ON SCHEMA schema_name;
SELECT HAS_TABLE_PRIVILEGE('IAM:DataExperienceRedshiftUser', 'schema_name.table-name', 'select');
Configuration
Each source in the Redshift integration requires an accountId; make sure to set up authentication for each account. Each region in the associated config must be enabled for the account.
While Redshift doesn't natively support table or column descriptions, some organizations use the COMMENT sql operation to store relevant information. The Redshift module can optionally be configured to ingest these comments as descriptions, by setting the pullRedshiftComments
config value to true. The setting is optional, and defaults to false, because the additional following permissions are required on the IAM policy to pull the comments.
"redshift-data:ExecuteStatement",
"redshift-data:DescribeStatement",
"redshift-data:GetStatementResult"
Authentication
The Redshift integration uses the @backstage/integration-aws-node
package to create a credential provider which is then passed into
the AWS client SDKs. Since it is handled by a separate plugin, the set up for the authentication for it is found under the App
config as seen below:
Every accountID included in the data registry config must have an associated auth config like above.
Naming
The naming structure for Datasets created from Redshift is as follows: [database].[schema].[table]
.
Tags & Labels
Tags for redshift resources are not currently supported for the Redshift connector since they cannot be applied to resources at the database granularity or below.
Troubleshooting
- The redshift ingestion is failing to ingest anything? Double check the IAM user has the correct permissions, the client secret and token placed in the aws config belong to the IAM user, and the account ID used in the data registry config matches the one in the aws config.
- Some redshift tables are missing from data registry? Verify the IAM user has privileges on all the schemas and tables/views you want represented in data registry. Use sql statements here to check.