What you’ll achieve
- Ingest data warehouse tables as APIs into Portal’s Software Catalog
- Make datasets searchable alongside your software components
- Provide ownership and lifecycle information for data governance
Prerequisites
- Admin access to Portal
- Access to create credentials in your preferred integration(s)
Step 1: Configure Authentication
- BigQuery
- Redshift
- Snowflake
- Databricks
- Create a GCP service account with the following roles:
  - roles/bigquery.dataViewer
  - roles/bigquery.jobUser
- Download the JSON credentials file
- Navigate to Plugins and open the Data Experience plugin
- Expand the keys on the sidebar: dataExperience > registry > integrations > bigquery
- Add an item to the sources list:
  - Enter your GCP project ID
  - Paste the service account JSON into the credentials field
- Scroll to the bottom of the page and click the Save changes button
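Before pasting the service account JSON into the credentials field, it can help to confirm that the downloaded file is intact. The following is a minimal sketch, not part of Portal: the function name check_service_account_json is hypothetical, and the field list reflects the usual layout of a GCP service account key file rather than anything Portal is known to validate.

```python
import json

# Fields normally present in a GCP service account key file.
# Assumption: Portal may require more or fewer fields than these.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(raw: str) -> list[str]:
    """Return a list of problems found in the key text; an empty list means none were found."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # A user credential or other key type would not be a service account key.
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']!r}")
    return problems
```

Running it over the contents of the downloaded file catches truncated pastes and wrong key types early. It does not check that the roles listed above were actually granted.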
(Optional) Step 2: Configure Registry Ingest Schedule
Configure how often datasets are ingested from your sources to the data registry.
- From Plugins > Data Experience, expand the keys on the sidebar: dataExperience > registry > integrations > defaults > schedule > frequency > cron
- Enter a valid crontab string. The default is 0 */6 * * * (every 6 hours)
- Scroll to the bottom of the page and click the Save changes button
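To sanity-check a crontab string before saving it, a small parser can expand each field into the values it matches. This is a sketch under stated assumptions: it handles only numeric five-field expressions (no month or weekday names), the helper names are hypothetical, and the exact syntax Portal accepts may differ.

```python
# Field names and inclusive numeric ranges of a standard five-field crontab entry.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one numeric field; supports *, n, a-b, comma lists, and /step."""
    values: set[int] = set()
    for part in field.split(","):
        base, slash, step_text = part.partition("/")
        step = int(step_text) if slash else 1
        if step < 1:
            raise ValueError(f"step must be positive in {part!r}")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(base)
        if not low <= start <= end <= high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def parse_crontab(expr: str) -> dict[str, list[int]]:
    """Map each field name to the sorted values it matches; raise ValueError if malformed."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return {
        name: expand_field(text, low, high)
        for text, (name, low, high) in zip(fields, FIELD_RANGES)
    }
```

For the default schedule, parse_crontab("0 */6 * * *") gives minute [0] and hour [0, 6, 12, 18], which matches the "every 6 hours" description.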
Step 3: Test & Verify
- Wait for the first sync to complete. When this happens depends on the schedule you configured in Step 2. You can monitor progress on the Data Overview page, accessible from Portal’s navigation.
- Search for your datasets in Portal’s search
Next Steps
- Add filters to exclude test/staging tables
- Configure integrations with other data warehouses
- Integrate with dbt to bring your dbt projects into the Software Catalog
- Integrate with TechDocs to add documentation for your datasets
- Create your first check for datasets in Soundcheck
- Add new tags or labels with Entity Overlays to make your datasets easier to discover
Troubleshooting
- No datasets appearing? Verify service account permissions and project visibility. Consider extending the maximum entity name length in Portal
- Missing owner/lifecycle? Verify your entity defaults are set to valid catalog entities
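If the entity name length is the suspect, a quick check over your dataset names can show which ones would exceed the limit. This is a rough sketch: the function find_overlong_names is hypothetical, the default of 63 characters is the name limit documented for Backstage catalog entities and may not match your Portal setting, and how Portal derives entity names from table names is not covered here, so pass in the names as they would appear in the catalog.

```python
def find_overlong_names(names: list[str], max_length: int = 63) -> list[str]:
    """Return the names longer than max_length, in their original order.

    Assumption: 63 is the Backstage default; use your Portal value if it differs.
    """
    return [name for name in names if len(name) > max_length]
```

An empty result suggests that name length is not the cause, and that permissions or project visibility are the more likely culprits.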