Releases: datahub-project/datahub
v0.8.42
Highlights
User Experience
- Improved Search Experience - preview cards now display usage and freshness information
- Schema History update - based on community feedback, we removed the “Blame” terminology
- Improved UI-Based Ingestion - configure metadata ingestion from Snowflake, BigQuery, Looker, and Tableau through an easy-to-follow form; YAML is still supported!
Developer/Community Experience
- Python 3.6 is no longer supported for ingestion – we expect this to impact fewer than 1% of DataHub users (based on PyPI download stats). Please upgrade to Python 3.7 or newer
- GitHub issue management update - issues are marked as “Inactive” after 30 days without activity and are automatically closed after a further 30 days of inactivity
- We’ve updated our Slack Guidelines! Read them here
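A minimal sketch of the issue timeline in Python, assuming both 30-day windows are counted in whole days since the last activity on an issue. The function and status names are hypothetical; the actual automation (see #5482) may count days differently.

```python
from datetime import date, timedelta

# Thresholds from the release note: 30 days to "Inactive", then a further 30 days to closure.
# Assumption: both are measured in whole days since the issue's last activity.
INACTIVE_AFTER_DAYS = 30
CLOSE_AFTER_ADDITIONAL_DAYS = 30


def issue_status(last_activity: date, today: date) -> str:
    """Return a hypothetical status label for an issue under the inactivity policy."""
    idle_days = (today - last_activity).days
    if idle_days >= INACTIVE_AFTER_DAYS + CLOSE_AFTER_ADDITIONAL_DAYS:
        return "closed"
    if idle_days >= INACTIVE_AFTER_DAYS:
        return "inactive"
    return "open"


if __name__ == "__main__":
    start = date(2022, 7, 1)
    for offset in (10, 30, 59, 60):
        print(offset, issue_status(start, start + timedelta(days=offset)))
```

Under these assumptions an issue stays open for 29 idle days, is marked inactive on day 30, and is closed on day 60; any new activity would reset the count.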
Metadata Ingestion
- You can now test your Snowflake connection via the CLI and UI-based Ingestion to confirm you have the access levels required for general ingestion, profiling, and usage. We will be expanding this functionality to other cloud-based ingestion sources in upcoming cycles.
- Hard delete will now discover and remove soft-deleted entities
- Resolved an assertion error in dbt stateful ingestion
Full Commit Log
What's Changed
- feat(quickstart,docs): updates for v0.8.41 by @anshbansal in #5409
- fix(ingest): ensure upgrade checks run async by @shirshanka in #5383
- fix(ingest): pass transport options to usage history looker api calls by @mayurinehate in #5417
- feat(quickstart): moving to official confluent images for m1 by @shirshanka in #5416
- fix(documentation) Fix erratic cursor in documentation editor bug by @chriscollins3456 in #5411
- feat(ui): Supporting enriched search preview + misc improvements by @jjoyce0510 in #5419
- chore: remove unnecessary modules from codebase by @shirshanka in #5420
- fix(ingest): extract usage for dashboards allowed by pattern by @mayurinehate in #5424
- fix(docker): fix kafka-setup command to support same capabilities as … by @shirshanka in #5428
- fix(protobuf): ownership fixes by @leifker in #5425
- fix(ui): add dataset qualifiedName parameter to lineage query by @alexey-kravtsov in #5427
- fix(glossary) Fix dropdown where disabled buttons are still clickable by @chriscollins3456 in #5430
- docs(bigquery): add changelog and unittest for profiling limits by @MugdhaHardikar-GSLab in #5407
- fix(siblings): fixing lineage fetching for siblings & sources by @gabe-lyons in #5415
- fix(ui): Fixing unreleased search preview bugs by @jjoyce0510 in #5432
- feat(ui): Adding Statistics Summary to Dataset + Dashboard Profiles by @jjoyce0510 in #5440
- feat(ingest): add test source connection feature, structured report file by @shirshanka in #5442
- fix(ingest/glue): handle error when generating s3 tags for virtual view tables by @timcosta in #5398
- feat(ingest): model - adding a small extension to support communicati… by @shirshanka in #5429
- fix(bigquery-usage): fix dataset name for sharded table by @MugdhaHardikar-GSLab in #5412
- feat(ingestion) Add new endpoint to test an ingestion connection by @chriscollins3456 in #5438
- feat(cli,build): remove deprecated variables GMS_HOST/_PORT by @anshbansal in #5451
- fix(search): make filters by default an empty list if null by @aditya-radhakrishnan in #5454
- fix(hive): add column comment as a column description by @MugdhaHardikar-GSLab in #5449
- feat(groups): add native groups concept to DataHub by @aditya-radhakrishnan in #5443
- fix(ingest): fix serialization of report to handle nesting by @shirshanka in #5455
- fix(tableau): fix tableau db error, add more logs by @mayurinehate in #5423
- build(deps): bump terser from 5.9.0 to 5.14.2 in /docs-website by @dependabot in #5448
- feat(doc): spark-lineage - Adding spark lineage configuration doc for Amazon EMR by @treff7es in #5459
- feat(schema-history): remove blame language for the schema history feature by @aditya-radhakrishnan in #5457
- Search header: Menu icon alignment by @Ankit-Keshari-Vituity in #5458
- build(deps): bump terser from 4.8.0 to 4.8.1 in /datahub-web-react by @dependabot in #5446
- feat(ingest): snowflake - basic test connection capability by @shirshanka in #5464
- fix(ingest/trino): Avoid exception if $properties table empty or not readable by @glinmac in #5447
- feat(ingest): preflight - Add way to check/upgrade brew package version in preflight if needed by @treff7es in #5435
- fix(build): add base image with gradle wrapper cached by @anshbansal in #5467
- doc(bigquery): groups grants by requirements by @sgomezvillamor in #5468
- fix(docs,build): remove base image not needed, cleanup docs by @anshbansal in #5469
- feat(ui): Partial support for Chart usage by @jjoyce0510 in #5473
- fix(ingest): bigquery: multiproject profiling fix by @treff7es in #5474
- fix(ingest): kafka - revert deps back to < 1.9.0 by @shirshanka in #5476
- feat(docker): support multiplatform image for datahub-upgrade by @shirshanka in #5477
- feat(quickstart): experimental support for backup restore for quickstart by @shirshanka in #5418
- feat(dbt): updating source lineage logic by @gabe-lyons in #5414
- Ingestion: Added form in Big Query type to edit the queries. by @Ankit-Keshari-Vituity in #5431
- docs: fix docsearch config by @hsheth2 in #5479
- Search Results: Added checkbox option to select multiple results at once. by @Ankit-Keshari-Vituity in #5422
- feat(delete): hard delete deletes soft deleted entities by @anshbansal in #5478
- fix(docs): add missing closing marker for note section by @shirshanka in #5480
- fix(build): intermittent failure in github actions by @anshbansal in #5452
- feat(model, ingest): add user email in dashboard user usage counts by @mayurinehate in #5471
- feat(ingest): add support for capability report in snowflake test connection by @mayurinehate in #5472
- feat(build): automatically mark issues as stale to close inactive issues by @anshbansal in #5482
- fix(ingest): loosen confluent-kafka dep requirement by @hsheth2 in #5489
- refactor(ingest): cleanup importlib.import_module calls by @hsheth2 in #5490
- build(ingest): make gradle build less chatty by @hsheth2 in #5491
- fix(ingest): Fixing dbt trino datatypes by @aezomz in #5379
- refactor(ci): use custom action for checking codegen status by @hsheth2 in #5493
- feat(spark-lineage): Support ssl cert disable functionality by @MugdhaHardikar-GSLab in #5488
- docs(auth): fix link to point to new doc by @anshbansal in #5501
- docs(updating-datahub): add note for breaking change in looker usage … by @mayurinehate in #5499
- fix(ingest): cleanup unused flake8 noqa statements by @hsheth2 in #5492
- refactor(ci): refactor Docker build-and-push workflows by @hsheth2 in #5494
- docs(slack) Update to Slack guidelines by @maggiehays in #5504
- feat(cli): dele...
v0.8.41
Highlights
User Experience
- Performance improvements in the UI
- CSV connector improvements for easier ingestion: added support for description, ownership, and domain
- UI form for Snowflake Managed Ingestion so you don't have to make changes in YAML
- Viewing Siblings
Developer Experience
- Ability to stop quickstart instead of nuking it
- Customizing mapped ports in quickstart
- New models for dashboard usage
- Circuit breaker and python api for Assertion and Operation
Metadata Ingestion
- BigQuery connector improvements: profiling can now be limited to a subset of tables (selection by size, row count, or last-modified time)
- Fixed intermittent 401 errors during ingestion
- New Salesforce connector
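The idea behind limiting BigQuery profiling to some tables (#5329) can be sketched as a simple filter. This is a hypothetical illustration, not the connector's code or configuration: the `TableStats` type, the `select_for_profiling` function, and its parameters are invented here, and last-modified selection is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableStats:
    # Hypothetical per-table statistics used only for this illustration.
    name: str
    size_bytes: int
    row_count: int


def select_for_profiling(
    tables: List[TableStats],
    max_size_bytes: Optional[int] = None,
    max_row_count: Optional[int] = None,
) -> List[str]:
    """Keep tables within every configured limit; a limit of None is not applied."""
    selected = []
    for t in tables:
        if max_size_bytes is not None and t.size_bytes > max_size_bytes:
            continue  # skip tables that are too large to profile cheaply
        if max_row_count is not None and t.row_count > max_row_count:
            continue  # skip tables with too many rows
        selected.append(t.name)
    return selected


if __name__ == "__main__":
    tables = [
        TableStats("events", size_bytes=5 * 1024**3, row_count=90_000_000),
        TableStats("customers", size_bytes=200 * 1024**2, row_count=1_500_000),
        TableStats("countries", size_bytes=64 * 1024, row_count=250),
    ]
    # Only the two smaller tables pass a 1 GiB / 10M-row limit.
    print(select_for_profiling(tables, max_size_bytes=1024**3, max_row_count=10_000_000))
```

Treating `None` as "no limit" keeps the default behaviour (profile everything) unchanged unless a threshold is set.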
What's Changed
- fix(test): add cleanup in tests, make urls configurable by @anshbansal in #5287
- fix(docs,quickstart): release related changes for 0.8.40 by @anshbansal in #5299
- [Deployment]: fix config typo on confluent cloud by @tengis in #5293
- fix(cli): suppress secrets in stacktraces by @anshbansal in #5302
- refactor(ui): Fix settings page divider by @jjoyce0510 in #5292
- fix(cli): timeline - category should be owner not ownership by @shirshanka in #5304
- perf(siblings): reduce data fetched by siblings in lineage by @gabe-lyons in #5308
- fix(ingest): bigquery - Fix for bigquery error when there was no bigquery catalog specified by @treff7es in #5303
- fix(ui) Fix entity profile sidebar width issues by @chriscollins3456 in #5305
- perf(search): Improve search default performance by @jjoyce0510 in #5311
- perf(ui): Performance improvements and misc refactorings in the UI by @jjoyce0510 in #5310
- Modified the drop down of Menu Items by @Ankit-Keshari-Vituity in #5301
- fix(validation) Fail validation error silently instead of crashing by @chriscollins3456 in #5314
- feat(docs) Add documentation on authorization & authentication by @pedro93 in #5265
- fix(ui) Make profile icon clickable to expand header menu by @chriscollins3456 in #5317
- refactor(ui): Extract searchable page into its own component (perf + ux) by @jjoyce0510 in #5318
- fix(gms) Remove auto-creating status aspect if not present when ingesting by @pedro93 in #5315
- fix(ui): Add missing SearchRoutes component by @jjoyce0510 in #5321
- feat(ingest): Ingest Looker dashboard create/update/delete timestamps by @mayurinehate in #5312
- fix(ui): Fix pipeline tasks list loading by @jjoyce0510 in #5332
- feat(ingest): lookml - adding support for only emitting reachable vie… by @shirshanka in #5333
- fix(ingest): omit schema fields when name is absent by @mayurinehate in #5275
- fix(siblings) Combine siblings data but remove duplicate data by @chriscollins3456 in #5337
- Fix typo in metadata-ingestion.md by @dougpm in #5338
- fix(me) Cache the me query for performance reasons by @chriscollins3456 in #5316
- fix(tokens) Adds non-admin tests for access tokens by @pedro93 in #5174
- feat(bigquery): support size, rowcount, lastmodified based table selection for profiling by @MugdhaHardikar-GSLab in #5329
- chore: Refactor Python Codebase by @koconder in #5113
- docs(bigquery): profiling report enhancement by @MugdhaHardikar-GSLab in #5342
- feat(ingest): update CSV source to support description and ownership type by @aditya-radhakrishnan in #5346
- Fixed UI issue: Tags list going outside the container by @Ankit-Keshari-Vituity in #5341
- feat(ingest): add salesforce connector by @mayurinehate in #5104
- feat(bootstrap): create abstract class UpgradeStep to abstract away upgrade logic by @aditya-radhakrishnan in #5349
- fix(bigquery-usage): dataset name fix for sharded tables by @MugdhaHardikar-GSLab in #5347
- docs(features): update grammar on Features overview by @maggiehays in #5350
- fix(ci): fix mysql and kafka-connect ingestion test by @shirshanka in #5352
- feat(ui): add copy function for stats table sample value by @ngamanda in #5331
- fix(ui) Correct show/hide tabs in Settings based on privileges by @chriscollins3456 in #5355
- fix(siblings): add useMutationUrn to domain section by @gabe-lyons in #5270
- feat(schema) Show last observed timestamp in the schema tab by @chriscollins3456 in #5348
- fix(glossary) Fixes a bug for yaml ingested terms without source_url by @chriscollins3456 in #5356
- feat(lineage) Add Lineage tab to Chart and Dashboard entity profiles by @chriscollins3456 in #5357
- fix(cassandra): fix Cassandra queries used by IngestDataPlatformInstancesStep by @justinas-marozas in #5199
- refactor(ui): Use createTag mutation for creating new tags from the UI by @jjoyce0510 in #5359
- Added recommendation on group modal by @Ankit-Keshari-Vituity in #5362
- refactor(ui): Remove unnecessary fields in GraphQL queries by @jjoyce0510 in #5358
- feat(ingest) - add audit actor urn to auditStamp by @neojunjie in #5264
- feat(ingest): Domain ingestion usability by @shirshanka in #5366
- fix(config): fixes config key in DataHubAuthorizerFactory by @sgomezvillamor in #5371
- fix(ingest): domains - check whether urn based domain exists during r… by @shirshanka in #5373
- feat(quickstart): Adding env variables and cli options for customizing mapped ports in quickstart by @NavinSharma13 in #5353
- fix(build): tweak ingestion build by @anshbansal in #5374
- feat(query) Add get_entity_v2 to python package by @aezomz in #5255
- fix(airflow): Fix for failing serialisation when Param was specified + support for external task sensor by @treff7es in #5368
- fix(users): fix to not get invite token unless the invite token modal is visible by @aditya-radhakrishnan in #5380
- fix(gms): Propagate token cache error by @pedro93 in #5381
- fix(bootstrap): skip ingesting data platforms that already exist by @aditya-radhakrishnan in #5382
- fix(cli): respect server telemetry settings correctly by @treff7es in #5384
- fix(ingest): bigquery - Graceful bq partition id date parsing failure by @treff7es in #5386
- feat(airflow): Circuit breaker and python api for Assertion and Operation by @treff7es in #5196
- feat(kafka-setup): add options for sasl_plaintext by @abiwill in #5385
- fix(bigquery): multi-project GCP setup run query through correct project by @anshbansal in #5393
- fix(bigquery): add storage project name by @anshbansal in #5395
- Add Changes to support smoke test on Datahub deployed on kubernetes Cluster by @NavinSharma13 in #5334
- fix(PlayCookie) PLAY_TOKEN cookie rejected because userprofile exceeds 4096 chars by @neojunjie in #5114
- feat(dashboards): add datasets field to DashboardInfo aspect by @Masterchen09 in #5188
- feat(siblings): allow viewing siblings separately by @gabe-lyons in #5390
- Added Cursor pointer to tags by @Ankit-Keshari-Vituity in #5389
- feat(GMS): Adding Dashboard Usage Models by @jjoyce0510 in #5399
- fix(q...
v0.8.40
Highlights
Fixes a bug in v0.8.39 that prevented standalone MAE consumers from being deployed.
User Experience
- Support for deleting Tags and Domains via the UI
- Support for editing Domain names via the UI
- Visualize Glossary Term source on the Glossary Term Entity Page
Developer Experience
- Fixed an issue where standalone MAE consumers could not be deployed
Metadata Ingestion
- Script to re-index sibling associations for dbt nodes ingested before v0.8.39
What's Changed
- feat(search) Allow users to update the number of search results per page by @chriscollins3456 in #5212
- feat(build): add base image for ingest by @anshbansal in #5243
- feat(ingest): working with multiple bigquery projects by @anshbansal in #5240
- fix(build): missing libs by @anshbansal in #5254
- fix(build): use correct creds by @anshbansal in #5261
- feat(ingest): redshift - Option to define path spec for Redshift lineage generation by @treff7es in #5256
- fix(ui): Enable previews properly when browsing for DataJob by @MikeSchlosser16 in #5250
- fix(docs): Fix acronym on mxe docs by @MikeSchlosser16 in #5249
- fix(ui): Support deleting references to glossary terms / nodes, users, assertions, and groups by @jjoyce0510 in #5248
- feat(docs) add links in quickstart for adding users by @pedro93 in #5267
- fix(siblings) Display sibling assertions in Validations tab by @chriscollins3456 in #5268
- Feat(domain) Add ability to edit a Domain name from the UI by @chriscollins3456 in #5266
- Delta lake base by @MugdhaHardikar-GSLab in #5259
- fix(siblings) Update the names of siblings utils args for readability by @chriscollins3456 in #5269
- docs(adopters): add showroomprive and n26 as DataHub adopters by @maggiehays in #5271
- feat(glossary) Add Source section to sidebar for Glossary Terms by @chriscollins3456 in #5262
- fix(delta-lake): fix dependency issue for snowflake due to s3_util by @MugdhaHardikar-GSLab in #5274
- fix(ingest): s3 - Remove unneeded methods from s3_util by @MugdhaHardikar-GSLab in #5276
- Selector recommendations in Owner, Tag and Domain Modal by @Ankit-Keshari-Vituity in #5197
- fix(security) Sanitize rich text before sending to backend or rendering on frontend by @chriscollins3456 in #5278
- feat(GraphQL): Support for Deleting Domains, Tags via GraphQL API by @jjoyce0510 in #5272
- feat(build): reduce build time for ingestion image by @anshbansal in #5225
- fix(ingestion): profiling - Fixing partitioned table profiling in BQ by @treff7es in #5283
- fix(ingest) redshift: Adding missing dependencies and relaxing sqlalchemy dependency by @treff7es in #5284
- fix(ingestion): Reverting sqlalchemy upgrade because it caused issues with mssql and redshift-usage by @treff7es in #5289
- fix(Siblings): Have sibling hook use entity client by @gabe-lyons in #5279
- Show message when related glossary terms are empty. by @Ankit-Keshari-Vituity in #5285
- docs(adopter): add Digital Turbine as DataHub adopter by @maggiehays in #5290
- Update schema-registry docker.env by @liyuhui666 in #5231
- feat(siblings): index sibling aspects for historical dbt metadata by @gabe-lyons in #5291
- feat(ui) Adding support for deleting Tags and Domains via the UI by @jjoyce0510 in #5280
Full Changelog: v0.8.39...v0.8.40
v0.8.39
Release Highlights
Known Issues
This release does not work with stand-alone MAE consumers (mae-consumer-job); the issue is resolved in v0.8.40.
User Experience
- NEW: surface the outcomes of dbt tests on dataset entity pages (see it in action here)
- NEW: Improved navigation of dbt resources: dbt models and their associated warehouse tables are now merged into a unified entity (see it here). This will automatically be enabled for all newly ingested entities. To view this for entities you have already ingested, you will need to run a restore indices job.
- Improvement to Impact Analysis: when viewing the Lineage tab, you can now easily toggle between “Upstream” and “Downstream” entities (try it out here)
Developer Experience
- NEW: Java Kafka Emitter – Use this when you want to decouple your metadata producer from the uptime of your DataHub metadata server by using Kafka as a highly available message bus
Metadata Ingestion
- NEW: Make bulk edits to your metadata via CSV (read more)
- Snowflake ingestion improvements: configure profiling to run only on tables that have been updated within the prior N days
- Managed ingestion update: removed the need for a sink block
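The "updated within the prior N days" rule for Snowflake profiling amounts to a cutoff comparison. A minimal sketch, assuming each table comes with a timezone-aware last-change timestamp; `tables_to_profile` is a hypothetical helper, not DataHub's implementation. One plausible source for such timestamps is the LAST_ALTERED column of Snowflake's INFORMATION_SCHEMA.TABLES view, though the connector may use something else.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def tables_to_profile(last_altered: Dict[str, datetime], now: datetime, days: int) -> List[str]:
    """Return tables whose last change falls within the prior `days` days (hypothetical helper)."""
    cutoff = now - timedelta(days=days)
    # A table changed exactly at the cutoff is included; output is sorted for determinism.
    return sorted(name for name, ts in last_altered.items() if ts >= cutoff)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last_altered = {
        "orders": now - timedelta(days=1),
        "archive_2019": now - timedelta(days=900),
    }
    print(tables_to_profile(last_altered, now, days=7))  # ['orders']
```

Skipping tables that have not changed since the last run avoids re-profiling data whose statistics cannot have moved.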
What's Changed
- fix(ui-ingestion): update looker ingestion warning banner by @aditya-radhakrishnan in #5142
- chore: Bump Default UI Ingestion Version 0.8.38 by @jjoyce0510 in #5145
- feat(schema): support rendering schemas with "." in field names by @gabe-lyons in #5141
- feat(dbt): Platform instances for target platform by @skrydal in #5129
- feat(ingest): snowflake profile tables only if they have been updates… by @mayurinehate in #5132
- fix(airflow): fixes DeprecationWarning with hook-class-names by @sayakmaity in #5143
- feat(frontend): Parse JWT access token claims by @chen4119 in #5138
- fix(tokens): Using keyword search filters for ListAccessTokensResolver by @jjoyce0510 in #5154
- feat(ui) Update the max text length of Terms/Term Groups by @chriscollins3456 in #5162
- docs(policies): add info about Manage User Credentials by @aditya-radhakrishnan in #5157
- fix(restore-indices): Do not fail on MAE row count diff by @dexter-mh-lee in #5165
- fix(Kafka-setup): Make sure it doesn't fail when the new envs are not set by @dexter-mh-lee in #5168
- chore(deps): Bump Nimbus Jose JWT dependency by @pedro93 in #5158
- fix(recs): Verify that an entity exists before recommending by @jjoyce0510 in #5163
- fix(business glossary): setting properties to be empty if the node has no properties aspect by @gabe-lyons in #5166
- refactor(ui): Misc improvements to Dataset Assertions UI by @jjoyce0510 in #5155
- chore(guava): force version of guava in client jars per #5134 by @RyanHolstien in #5153
- feat(boot): Make Glossary Term Upgrade Async by @jjoyce0510 in #5164
- fix(frontend): Add iam auth jar to frontend by @dexter-mh-lee in #5171
- docs(features): update & clean up Features page by @maggiehays in #5175
- fix(glue): fix glue profiling config option by @kangseonghyun in #5178
- feat(upgrade) Check version when determining to run RestoreGlossaryIndices step by @chriscollins3456 in #5182
- fix(jaas): fixed auth.jaas.enabled option parsing by @alexey-kravtsov in #5179
- feat(ingestion): bigquery - Option to send usage queries as well as Operational metadata by @treff7es in #5151
- feat(build): changes to decrease build time, cancel runs in case of multiple commits by @anshbansal in #5187
- refactor(docs): Update Metadata Events Docs by @jjoyce0510 in #5173
- fix(ingest): If there is no manager for a LDAP user (example: system account) by @bda618 in #5180
- bug(ingest): correct case of sys views for mssql description populati… by @BALyons in #5186
- refactor(configs): Simplify Kafka Topic name configurations + docs by @jjoyce0510 in #5198
- feat(ingest): dbt - adding support for dbt tests by @shirshanka in #5201
- fix(cli): correct handling of env variables by @anshbansal in #5203
- feat(ci): split integration tests to reduce run time by @anshbansal in #5205
- feat(datahub-client): add java kafka emitter by @MugdhaHardikar-GSLab in #5074
- feat(graphql): add metrics capturing for graphql latency by @RyanHolstien in #5200
- test(ingestion): bigquery-usage - Adding tests for bigquery usage filters by @treff7es in #5195
- fix(ui): load monaco-editor as a dependency and not from a third party CDN by @Masterchen09 in #5189
- feat(cli): Add token parameter for sample ingestion by @pedro93 in #5160
- feat(lineage) Update Lineage tab and Impact Analysis feature by @chriscollins3456 in #5121
- fix(ingest): add missing ownership types by @afghori in #5209
- feat(ingestion) ldap: make ldap attrs keys configurable by @atulsaurav in #4682
- Remove unnecessary space from application.yml of GMS by @mmmeeedddsss in #5216
- fix(upgrade): fix upgrade when s3 path has = by @RyanHolstien in #5220
- feat(docs) Add and update docs for the new Glossary experience by @chriscollins3456 in #5211
- feat(glossary) Add empty state for the Business Glossary home page by @chriscollins3456 in #5217
- feat(bootstrap): add bootstrap step to clear out unknown aspect rows from the database by @RyanHolstien in #5148
- feat(ingest): adds csv enricher ingestion source by @aditya-radhakrishnan in #5221
- fix(build): pin confluent kafka dependency by @anshbansal in #5224
- fix(ingest): databricks - ingest structs correctly through hive by @shirshanka in #5223
- feat(dbt): add sibling association logic to associate dbt elements with their target systems by @gabe-lyons in #5190
- feat(tableau): use pagination for all connection queries by @mayurinehate in #5204
- Handling 404 page not found by @Ankit-Keshari-Vituity in #5227
- refactor(UI): Refactor Dataset Health Status by @jjoyce0510 in #5222
- fix(dbt-test): Inconsistency in assertions by @Santhin in #5214
- feat(ingest): remove need for sink block in UI based ingestion by @anshbansal in #5208
- fix(ingest): bigquery - Grouping date named tables at bigquery by @treff7es in #5230
- Add check for 0 rows when profiling datasets from s3 by @Jiafi in #5219
- [bug fix]: disabled create buttons by @xiphl in #5234
- fix(ingest): bigquery - Handling gracefully sql parser error in bq lineage by @treff7es in #5238
- fix(ingest): do not dump password by @anshbansal in #5235
- feat(ingest): dbt - improving dbt_meta mapping by @shirshanka in https://github.com/datahub-project/data...
[!] DataHub v0.8.38
Notice: There is a known issue in this release. Listing access tokens for a user may not return the correct results in the UI due to an unreliable query to DataHub's search backend. This will be resolved in v0.8.39. This does not mean that access tokens are broken or compromised in any way: generating and using access tokens is not impacted.
The below release notes are copied from v0.8.37 release notes.
Highlights
User Experience
This release is packed with new features and updates.
- NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
- NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
- NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary.
- UPDATE - Rename “Manage” navigation item to “Govern”
- [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
- [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
- FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
- Minor fixes & improvements to UI for adding policy users + groups.
Metadata Ingestion
- Support for Snowflake ingestion via OAuth
- Misc fixes and improvements to existing ingestion sources
Disclaimers:
With this upgrade, we've added a new mechanism for authenticating users: native authentication. It is enabled by default, which allows Admins to create new users and those users to log in.
If you were previously disabling BOTH JaaS (via AUTH_JAAS_ENABLED = false) AND OIDC, and you still do not want to require a username + password to log in, you'll need to add a new environment variable to the datahub-frontend-react container: AUTH_NATIVE_ENABLED=false.
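The upgrade rule above reduces to a single condition. A minimal Python sketch, in which the function name and its boolean parameters are hypothetical; only the AUTH_NATIVE_ENABLED and AUTH_JAAS_ENABLED variable names come from this note.

```python
from typing import Dict


def extra_frontend_env(jaas_enabled: bool, oidc_enabled: bool, want_password_login: bool) -> Dict[str, str]:
    """Hypothetical helper: extra env vars for datahub-frontend-react after this upgrade.

    Native authentication is on by default, so the only case that needs a change
    is when BOTH JaaS (AUTH_JAAS_ENABLED=false) and OIDC were disabled and you
    still do not want a username + password login.
    """
    if not jaas_enabled and not oidc_enabled and not want_password_login:
        return {"AUTH_NATIVE_ENABLED": "false"}
    return {}  # keep the default (native authentication enabled)


if __name__ == "__main__":
    print(extra_frontend_env(jaas_enabled=False, oidc_enabled=False, want_password_login=False))
```

In every other combination no new variable is needed, because native authentication stays enabled alongside whatever JaaS or OIDC setup you already have.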
What's Changed
- feat(docs): auto-open config section for ingestion sources by @shirshanka in #5075
- feat(spark-lineage): coalesce spark jobs by @MugdhaHardikar-GSLab in #5077
- refactor(ui): UI Navigation Refactoring by @jjoyce0510 in #5076
- Update docs to alert users to restore indices for their Glossary by @chriscollins3456 in #5082
- fix(restore-indices): Do not fail while working with each row by @dexter-mh-lee in #5084
- fix(ingestion): looker - Handling gracefully invalid json in query dynamic field by @treff7es in #5083
- feat(docs): ingest - add tab for config json schema by @shirshanka in #5086
- chore(dep): upgrade json-smart by @RyanHolstien in #5081
- feat(ingest): rest_emitter - Adding option to rest emitter to disable ssl verification by @treff7es in #5042
- feat(cli): suggest upgrades when appropriate by @shirshanka in #5091
- feat(doc): Generating json schema for ingestion recipes by @treff7es in #5092
- feat(ingest): snowflake using oauth by @saxo-lalrishav in #4647
- fix(ui): do not show copy URN buttons when Clipboard API is not available by @Masterchen09 in #5087
- feat(kafka): use a thread pool executor for kafka for thread reuse by @RyanHolstien in #5079
- Manage Access Tokens by @Ankit-Keshari-Vituity in #5067
- tests(lookml): adding tests for model deny patterns by @gabe-lyons in #4934
- feat(model): Add optional context field to tag/term association by @dexter-mh-lee in #5085
- fix(glossary) Two quick followup fixes around the new Glossary updates by @chriscollins3456 in #5065
- chore(deps): bump eventsource from 1.1.0 to 1.1.1 in /docs-website by @dependabot in #5057
- feat(oidc): add configurable read timeout by @RyanHolstien in #5088
- feat(glossary) Display Incoming 'IsA' Glossary related entities by @chriscollins3456 in #5063
- fix(profiling): don't stop if some steps fail by @anshbansal in #5095
- feat(upgrades) Create new DataHubUpgrade + Restore Glossary Entities Bootstrap step by @chriscollins3456 in #5099
- fix(deps): ingest - moving packaging to framework_common by @shirshanka in #5096
- feat(frontend) Allow overriding akka-max-header-value-length by @karoliskascenas in #5094
- refactor(graphql): Migrate Visual Config into the Configuration Provider by @jjoyce0510 in #4780
- chore(akka): upgrade akka http for vuln by @RyanHolstien in #5100
- fix(build): reduce time taken for resolution by @anshbansal in #5106
- fix(build): remove dependencies added for compatibility by @anshbansal in #5108
- fix(ci): pin google-cloud-logging to avoid pip backtracking by @shirshanka in #5109
- Policies page issue by @Ankit-Keshari-Vituity in #5107
- chore(deps): Bump spring to 5.3.20 for vuln fix by @pedro93 in #5110
- fix(cli): Bumping avro-gen3 to 0.7.4 by @jjoyce0510 in #5098
- feat(docs): Updating example files with the new ingestion recipe suffix by @treff7es in #5103
- feat(graphql): add graphql endpoint to check whether an entity exists by @aditya-radhakrishnan in #5102
- feat(looker): ensure explore name matches looker's display name by @shirshanka in #5111
- fix(ui): Fixing missing homescreen logo by @jjoyce0510 in #5112
- fix(dbt): final fix of dbt platform instance issues by @gabe-lyons in #5115
- feat(ingestion): bigquery-usage - Collect stats from read event reasons by @treff7es in #5118
- feat(terms) Add ability to Add and Remove Related Terms to Glossary Terms by @chriscollins3456 in #5120
- Fixed Issue : Add Members Modal by @Ankit-Keshari-Vituity in #5117
- fix(bigquery): handling of empty partitioned tables, improve report message by @anshbansal in #5122
- feat(glossary) Hide self and children from select when moving a GlossaryNode by @chriscollins3456 in #5123
- fix(ingestion): bigquery-usage - Removing filtering at queryevents by @treff7es in #5124
- feat(users): add ability to add native users from the UI by @aditya-radhakrishnan in #5097
- fix(ingestion): Looker original view name should be used for explore_joins by @sebkim in #4928
- fix(iceberg): Change how MapType are mapped to Avro to support complex Map key type by @cccs-eric in #5060
- fix(ingestion): bigquery-usage - Only send operational metadata for allowed tables by @treff7es in #5127
- fix(dbt): Validator error fix by @BoyuanZhangDE in #5125
- feat(settings): skip calling graphql hooks if user does not have the right permissions by @aditya-radhakrishnan in #5136
- fix(ingest): fix table urn for athena connectionType by @mayurinehate in #5135
- Fixed the UI issue on Deprecated Pop-Up issue by @Ankit-Keshari-Vituity in #5130
- fix(ui-ingestion): show warning banner when configuring looker ui-ingestion for the first time by @aditya-radhakrishnan in #5139
- fix(tokens): Fix stale cache problem, reduce cache timeout for access tokens + fix listing owner tokens by @jjoyce0510 in #5140
Full Changelog: v0.8.37...v0.8.38
[!] DataHub v0.8.37
Notice! This version has a few known bugs regarding revocable access tokens. Specifically, the UI for listing access tokens does not work properly unless you have a specific platform privilege. Additionally, there is a 6-hour delay in revoking access tokens. We recommend that you skip this version and upgrade directly to v0.8.38.
Highlights
User Experience
This release is packed with new features and updates.
- NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
- NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
- NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary.
- UPDATE - Rename “Manage” navigation item to “Govern”
- [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
- [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
- FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
- Minor fixes & improvements to UI for adding policy users + groups.
Metadata Ingestion
- Support for Snowflake ingestion via OAuth
- Misc fixes and improvements to existing ingestion sources
What's Changed
- feat(docs): auto-open config section for ingestion sources by @shirshanka in #5075
- feat(spark-lineage): coalesce spark jobs by @MugdhaHardikar-GSLab in #5077
- refactor(ui): UI Navigation Refactoring by @jjoyce0510 in #5076
- Update docs to alert users to restore indices for their Glossary by @chriscollins3456 in #5082
- fix(restore-indices): Do not fail while working with each row by @dexter-mh-lee in #5084
- fix(ingestion): looker - Handling gracefully invalid json in query dynamic field by @treff7es in #5083
- feat(docs): ingest - add tab for config json schema by @shirshanka in #5086
- chore(dep): upgrade json-smart by @RyanHolstien in #5081
- feat(ingest): rest_emitter - Adding option to rest emitter to disable ssl verification by @treff7es in #5042
- feat(cli): suggest upgrades when appropriate by @shirshanka in #5091
- feat(doc): Generating json schema for ingestion recipes by @treff7es in #5092
- feat(ingest): snowflake using oauth by @saxo-lalrishav in #4647
- fix(ui): do not show copy URN buttons when Clipboard API is not available by @Masterchen09 in #5087
- feat(kafka): use a thread pool executor for kafka for thread reuse by @RyanHolstien in #5079
- Manage Access Tokens by @Ankit-Keshari-Vituity in #5067
- tests(lookml): adding tests for model deny patterns by @gabe-lyons in #4934
- feat(model): Add optional context field to tag/term association by @dexter-mh-lee in #5085
- fix(glossary) Two quick followup fixes around the new Glossary updates by @chriscollins3456 in #5065
- chore(deps): bump eventsource from 1.1.0 to 1.1.1 in /docs-website by @dependabot in #5057
- feat(oidc): add configurable read timeout by @RyanHolstien in #5088
- feat(glossary) Display Incoming 'IsA' Glossary related entities by @chriscollins3456 in #5063
- fix(profiling): don't stop if some steps fail by @anshbansal in #5095
- feat(upgrades) Create new DataHubUpgrade + Restore Glossary Entities Bootstrap step by @chriscollins3456 in #5099
- fix(deps): ingest - moving packaging to framework_common by @shirshanka in #5096
- feat(frontend) Allow overriding akka-max-header-value-length by @karoliskascenas in #5094
- refactor(graphql): Migrate Visual Config into the Configuration Provider by @jjoyce0510 in #4780
- chore(akka): upgrade akka http for vuln by @RyanHolstien in #5100
- fix(build): reduce time taken for resolution by @anshbansal in #5106
- fix(build): remove dependencies added for compatibility by @anshbansal in #5108
- fix(ci): pin google-cloud-logging to avoid pip backtracking by @shirshanka in #5109
- Policies page issue by @Ankit-Keshari-Vituity in #5107
- chore(deps): Bump spring to 5.3.20 for vuln fix by @pedro93 in #5110
- fix(cli): Bumping avro-gen3 to 0.7.4 by @jjoyce0510 in #5098
- feat(docs): Updating example files with the new ingestion recipe suffix by @treff7es in #5103
- feat(graphql): add graphql endpoint to check whether an entity exists by @aditya-radhakrishnan in #5102
- feat(looker): ensure explore name matches looker's display name by @shirshanka in #5111
- fix(ui): Fixing missing homescreen logo by @jjoyce0510 in #5112
- fix(dbt): final fix of dbt platform instance issues by @gabe-lyons in #5115
- feat(ingestion): bigquery-usage - Collect stats from read event reasons by @treff7es in #5118
- feat(terms) Add ability to Add and Remove Related Terms to Glossary Terms by @chriscollins3456 in #5120
- Fixed Issue : Add Members Modal by @Ankit-Keshari-Vituity in #5117
- fix(bigquery): handling of empty partitioned tables, improve report message by @anshbansal in #5122
- feat(glossary) Hide self and children from select when moving a GlossaryNode by @chriscollins3456 in #5123
- fix(ingestion): bigquery-usage - Removing filtering at queryevents by @treff7es in #5124
- feat(users): add ability to add native users from the UI by @aditya-radhakrishnan in #5097
- fix(ingestion): Looker original view name should be used for explore_joins by @sebkim in #4928
- fix(iceberg): Change how MapType are mapped to Avro to support complex Map key type by @cccs-eric in #5060
- fix(ingestion): bigquery-usage - Only send operational metadata for allowed tables by @treff7es in #5127
- fix(dbt): Validator error fix by @BoyuanZhangDE in #5125
- feat(settings): skip calling graphql hooks if user does not have the right permissions by @aditya-radhakrishnan in #5136
Full Changelog: v0.8.36...v0.8.37
DataHub v0.8.36
v0.8.36
Highlights
User Experience
NEW – Manage Glossary Terms via the DataHub UI! Delivering on our Q2’22 Roadmap item, end users can now create, edit, move, delete, and deprecate Glossary Terms via the UI! This new experience introduces a new way of indexing data so that you can view and traverse the different levels of your Glossary. As a result, if you already have an existing Glossary, you must restore your indices for the new Glossary experience to work. If this is your first time using DataHub Glossaries, you're all set!
Ability to add multiple Owners, Tags, Terms
Developer Experience
The new Revokable Token API supports a new type of Access Token which can be revoked & queried, allowing admins to easily delete tokens for operational & security reasons. Read all about it in the Access Token Management Usage Guide.
Ingestion Updates
This release includes 3 new Metadata Sources:
- Iceberg
- Vertica
- SAP HANA
📣 Massive shoutout to DataHub Community members @cccs-eric, @eburairu, and @buggythepirate for driving these contributions! 📣
These sources are currently marked as “Testing” - we encourage you to try them out & provide feedback in the DataHub #ingestion Slack channel!
We’ve rolled out the following ingestion-related improvements:
- AWS Glue - data profiling is now supported
- S3 ingestion speed-up
- Various bug fixes
Full Commit Log
- #5071 @dexter-mh-lee fix(docker): Fix mysql setup bug
- #5066 @jjoyce0510 refactor(docs): Rename metadata modeling ingestion sidebar titles
- #5036 @mmmeeedddsss fix(mysql-setup-job): add mysql default port override support
- #5056 @nj7 fix: ES Rest Client Creation for non ssl authenticated connection
- #5053 @ShubhamThakre fix(ui): ui bug fix for datasets sidebar stats section
- #5061 @anshbansal feat(redash): add parallelism support for ingestion
- #5017 @anshbansal feat(model): new chart types
- #5047 @RyanHolstien fix(datahub-upgrade): exclude unnecessary configuration from standalone applications
- #5052 @shirshanka feat(ci): datahub-client - add workflow, fix build
- #5054 @jjoyce0510 docs(actions): Adding DataHub Actions to docs website
- #5031 @piyushn-stripe feat(frontend): Allow overriding frontend with a custom akka http server
- #5050 @dexter-mh-lee Remove exception on ingest policies
- #5043 @Masterchen09 fix(docs): hana - rename SAP HANA source and data platform
- #5051 @shirshanka fix(ingest): fix build breakage due to traitlets 5.2.2 bug
- #5045 @anshbansal fix(redash): fix bug with names, add option for page size, debugging info
- #5022 @jjoyce0510 fix(restore): Add RESTATE ChangeType to MCL / MCP to permit restore indices
- #5041 @anshbansal doc(bigquery): fix missing permissions
- #5030 @endeesa fix(doc) - Specify docker-compose version to avoid compatibility issues
- #4879 @BoyuanZhangDE feat(ingest): glue - enable profiling
- #5035 @treff7es fix(profiling): bigquery - Fix for Bigquery temp table creation on GE >= 0.15.3
- #5040 @shirshanka fix(build): m1 build fails to install hdb-cli
- #5026 @chriscollins3456 feat(glossary) Business Glossary updates
- #4940 @MugdhaHardikar-GSLab fix(spark-lineage): remove need for sparksession.stop call
- #5023 @rslanka fix(ingest): common - fix nullability determination for the AVRO fixed type.
- #5012 @anshbansal fix(cli): don't use env for container, add example
- #5021 @maggiehays docs(townhall): update townhall rsvp link and add may townhall detail
- #5038 @shirshanka fix(build): docgen should fail if plugin is not loadable
- #5033 @RyanHolstien fix(timelineAPI): fix issue with semantic versioning
- #5034 @RyanHolstien fix(telemetry): exclude configuration from standalone apps
- #5029 @RyanHolstien feat: telemetry improvements
- #5028 @gabe-lyons dont set platform instances for sources
- #5027 @anshbansal fix(parsing): incorrect parsing for commas
- #4938 @Ankit-Keshari-Vituity refactor(ui): UI Integration to add multiple tags, terms and owners
- #5025 @anshbansal fix(parsing): improve sql parsing, some debugging redash
- #5024 @rslanka fix(ingestion): Remove hana from base_dev_requirements to unblock m1 users
- #5014 @anshbansal fix(bigquery): reduce number of calls for details of partitioning
- #5016 @ShubhamThakre fix(ui): arrow click position update
- #5019 @rslanka fix(build): fix for hana build failure for aarch64.
- #5020 @jjoyce0510 feat(Tests): Make DataHub Tests Feature configurable via env variable
- #5005 @hsheth2 test(ingestion): change class names to avoid unittest warnings
- #5006 @hsheth2 fix(ingestion): use raw strings for regexes
- #5010 @rslanka feat(ingestion): Add Iceberg source
- #5001 @PatrickfBraz fix(bigquery-usage): fix audit metadata query template
- #4997 @anshbansal fix(redash): improve logging for debugging, add validation for dataset urn, some refactoring
- #4376 @buggythepirate feat(ingest): Added new ingestion source SAP HANA
- #5011 @rslanka Fix pulsar source docs.
- #4555 @eburairu feat(ingest): Add Source from Vertica
- #5008 @anshbansal fix(dbt): missing aws dependency
- #5007 @anshbansal fix(bigquery): restrict protobuf version
- #5004 @pedro93 fix(gms): Fix incorrect StatefulTokenService init
- #5002 @ShubhamThakre fix(ui): ui bug fix - fixing search card vertical margin
- #4994 @anshbansal doc(delete): add example for dataflow and datajob
- #4988 @jjoyce0510 feat(DataHub Operations): Adding GraphQL mutation for reporting Dataset operations
- #4998 @shirshanka fix(cli): timeline - adjust for timeline API changes on server
- #5000 @pedro93 fix(docs): Fixes token docs
- #4989 @jjoyce0510 feat(Tests): Metadata Tests Models + APIs + UI (Part 1)
- #4995 @treff7es fix(airflow): Fix for Airflow 1 support
- #4993 @shirshanka chore(deps): upgrade gson version
- #4935 @BoyuanZhangDE feat(dbt): enable dbt read artifacts from s3
- #4833 @treff7es feat(airflow): Airflow lineage ingestion plugin
- #4931 @mayurinehate fix(ingest): tableau - fix chart custom properties None key error, update docs
- #4943 @mayurinehate feat(model): add created, lastModified auditstamps to SchemaField
- #4991 @anshbansal refactor(redash): emit charts first and try with id based dashboard API first
- #4942 @mohdsiddique metabase chart are missing from dashboard
- #4992 @anshbansal doc(ingest): update golden file command
- #4927 @treff7es feat(ingest): s3 - speeding up ingestion with sampling
- #4979 @pedro93 fix(smoke-tests) Increases sleep timeout in rollback test to prevent flakiness
- #4964 @dexter-mh-lee feat(run): Create a describe run endpoint for fetching aspects created by the ingestion run
- #4169 @claudio-benfatto feat(ingestion): optionally disable some kafka schema warnings
- #4972 @mayurinehate feat(great-expectations): allow DATAHUB_DEBUG env var to enable debug logs in GE Action
- #4957 @justinas-marozas refactor(metadata-io): introduce a storage-independent in-memory entity aspect model
- #4982 @jjoyce0510 feat(authorization): Adding AuthorizerContext + ResourceSpecResolver to context
- #4984 @anshbansal doc(ingestion): default boolean fix, broken bigquery docgen
- #4970 @pedro93 feat(graphql) Add new Revokable Token API
- #4987 @anshbansal fix(ingest): remove new schema field usage
- #4985 @anshbansal fix(redash): use dashboard id if slug does not work
- #4986 @pedro93 chore(deps): upgrade datastax libs version
- #4981 @RyanHolstien fix(metadata-service): telemetry - fix hardcoded aspect name, suppress errors when producing MAE
- #4983 @shirshanka fix(ingest): mode - dashboards without creator info fails to process
- #4975 @chriscollins3456 fix(UI) Fix multiple UI usability issues
- #4977 @maggiehays docs(townhall): update invite links and townhall history
- #4980 @MugdhaHardikar-GSLab feat(spark-lineage): support for persist API
- #4974 @anshbansal feat(bigquery): add partition key tag
- #4967 @anshbansal fix(bigquery): add rate limiting for api calls made
- #4971 @shirshanka fix(cli): graph - get_aspect_v2 method fails to deserialize aspects correctly
- #4958 @anshbansal doc(ingest): mysql - describe required grants
- #4969 @RyanHolstien doc(telemetry): fix telemetry doc
- #4878 @MugdhaHardikar-GSLab fix(datahub-client): support utf8 encoding
- #4961 @anshbansal feat(bigquery): reduce logging
- #4909 @ShubhamThakre fix(ui): policy outside modal click issue update
- #4968 @jeffmerrick docs(website): Remove banner and nav item for metadata day 2022
- #4965 @mmmeeedddsss docs(datahub-kafka-sink): add topic_routes config to doc of datahub-kafka-sink
- #4966 @liyuhui666 fix(data platforms): Update data_platforms.json
- #4922 @mayurinehate feat(cli): raise error if get entity api fails
- #4963 @Masterchen09 fix(ui): do not show copy URN buttons when Clipboard API is not available
- #4962 @RyanHolstien feat(release): update CLI version
- #4960 @RyanHolstien feat: updates for 0.8.35
- #4945 @treff7es Revert "feat(spark-lineage): add support for iceberg and cache based plans (#4882)"
- #4954 @dexter-mh-lee fix(ci): remove scheduled artifact deletion run to avoid api rate limiting
-...
[!] DataHub v0.8.35
Notice: Deploying this release creates an incorrectly named aspect entry in the database. As a result, some upgrade jobs may fail to perform full scans of the database. To fix this, either upgrade to a release later than v0.8.38, OR pull the latest DataHub Upgrade Docker image and run the following upgrade:
./datahub-upgrade.sh -u RemoveUnknownAspects
v0.8.35
Highlights
- Reduced vulnerability count across the project
- Various bug fixes
- New streamlined Docker workflow
Full Commit Log
- #4937 @RyanHolstien fix(env): provide default for unset telemetry variable
- #4926 @gabe-lyons feat(dbt): enable data platform instance on dbt
- #4933 @anshbansal fix(lint): lint failure due to mypy upgrade
- #4925 @RyanHolstien feat(telemetry): add server side telemetry
- #4917 @jjoyce0510 feat(graphql): Adding resolvers for adding multiple tags, terms, and owners
- #4924 @chen4119 fix(kafka-setup): Check if keystore/truststore location env variables are set
- #4919 @jjoyce0510 feat(ui): Adding Search Bar to all List Views (groups, users, domains, policies, ingestion)
- #4923 @chen4119 fix(kafka-setup): Add ssl.keystore.type and ssl.truststore.type
- #4882 @maggie-zhu feat(spark-lineage): add support for iceberg and cache based plans
- #4918 @RyanHolstien fix(idea): change location of coercer to make intellij not complain about classes
- #4916 @chriscollins3456 fix(ui) Fix some spacing issues on the search card
- #4914 @anshbansal docs(ingest): remove incorrectly annotated lineage capability
- #4912 @mayurinehate docs(transformer): update custom transform example to add missing super init
- #4903 @jjoyce0510 refactor(actions): Migrate to use new datahub-actions container
- #4869 @jjoyce0510 refactor(API): Add "Filter" support for Assertion Run Events, Dataset Profiles, Dataset Operations
- #4860 @anshbansal fix(doc): update doc url to generated docs
- #4910 @chriscollins3456 feat(containers) Get and display all parent containers in header and search
- #4791 @pedro93 feat(gms): Add support for deleting reference pointers when deleting by urn
- #4911 @RyanHolstien docs(frontend): update build command for partial build
- #4839 @BoyuanZhangDE feat(ingestion): For all usage connectors, allow exclusion of top_n_queries from ingestion via a config param.
- #4908 @jeffmerrick fix(docs): Metadata day 2022: Fix year
- #4859 @anshbansal doc(biqquery): add caveat for materialized view
- #4906 @jeffmerrick docs(website): add banner and nav item for metadata day 2022
- #4905 @anshbansal fix(build): Fix breaking changes from GE 0.15.3
- #4884 @shirshanka fix(deps): reduce frontend dependency
- #4902 @anshbansal doc(ingestion): add note for UI ingestion & custom sources
- #4901 @anshbansal revert(bigquery-usage): dataset allow filter impl
- #4824 @gabe-lyons fix(usage): pull usage from environment source rather than args
- #4899 @SagarTiwari24 fix(docs): Update developing.md to mention directory context
- #4892 @gabe-lyons fix(ui): fix side panel resize css
- #4890 @justinas-marozas fix(mxe-consumer): exclude CassandraAutoConfiguration from consumer boot
- #4853 @sebkim fix(ingestion): ElasticSearch when no properties from elastic_mappings, gracefully continue
- #4865 @dependabot chore(deps): bump axios from 0.21.1 to 0.21.4 in /datahub-web-react
- #4898 @treff7es fix(ingestion): bigquery-usage: Fix biquery usage table deny pattern template
- #4893 @shirshanka fix(ci): remove logging statement
- #4891 @RyanHolstien chore(deps): play - upgrade for CVEs
- #4889 @shirshanka fix(ci): clean up docker workflow for multi-tags
- #4875 @shirshanka fix(ingest): lookml - add view definitions for all views
- #4887 @shirshanka fix(ci): docker - either load or push, don't do both
- #4885 @shirshanka fix(ci): remove buildx and qemu for non multi-platform images
- #4862 @anshbansal fix(sql-parsing): improve error handling
- #4883 @shirshanka fix(ci): remove multiplatform builds from containers that don't support it
- #4881 @shirshanka feat(ci): docker actions simplify, add vulnerability scanner, simplify smoke-tests
- #4867 @chriscollins3456 feat(dataPlatformInstance) - Resolve and display dataPlatformInstance on entities
- #4880 @shirshanka fix(docs): ingest - sort modules, fix small typos
- #4866 @ShubhamThakre fix(ui): search filter entity ui update
- #4855 @treff7es fix(ingestion): dependencies - Downgrading typing-extension dependency to work with Airflow 2.0.2
- #4600 @pedro93 Use ingest proposal to submit status updates
- #4868 @RyanHolstien Revert "chore(deps): upgrade play to remove CVEs (#4864)"
- #4857 @RyanHolstien chore(jetty): upgrade jetty to 9.4.46 for CVE
- #4776 @tha23rd fix(bigquery-usage): dataset allow filter impl
- #4864 @RyanHolstien chore(deps): upgrade play to remove CVEs
- #4843 @cristiancalugaru ssl configuration support for elasticsearch source
- #4861 @RyanHolstien Revert "chore(deps): upgrade play dependencies to remove CVE vulnerabilities (#4820)"
- #4846 @dependabot chore(deps): bump async from 2.6.3 to 2.6.4 in /docs-website
- #4847 @dependabot chore(deps): bump minimist from 1.2.5 to 1.2.6 in /docs-website
- #4820 @RyanHolstien chore(deps): upgrade play dependencies to remove CVE vulnerabilities
- #4842 @rslanka fix(ingestion): Allow profiling of only those tables that are allowed by the table_pattern.
- #4844 @RyanHolstien Revert "fix(jetty): upgrade jetty dependency for CVE (#4838)"
- #4838 @RyanHolstien fix(jetty): upgrade jetty dependency for CVE
- #4840 @rslanka chore(deps): upgrade dependency io.netty:netty-all to address vulnerability
- #4841 @RyanHolstien fix(policies): change order of operations for policies bootstrap step to update index after database
- #4837 @RyanHolstien chore(deps): move from velocity 1.7 to 2.3
- #4821 @ShubhamThakre feat(ui): entity profile add copy url option update
- #4817 @aditya-radhakrishnan docs(schema-history): add usage guide for schema history
- #4835 @gabe-lyons hide soft deleted entities in lineage
- #4836 @shirshanka refactor(metadata-service): remove redundant file
- #4826 @jjoyce0510 chore(deps): pinning jackson dataformat cbor
- #4777 @treff7es feat(ingest): s3 - add support for multiple pathspecs in one recipe
- #4807 @eclaassen-pb chore(deps): upgrade spring and parquet dependencies
- #4813 @pedro93 fix(docs): Adds access policy documentation
- #4832 @mayurinehate feat(ingest): great-expectations - add more logs
v0.8.34
Release Highlights
Developer Experience
- DataHub Actions Framework is LIVE! The Actions Framework makes responding to real-time changes in your Metadata Graph easy, enabling you to seamlessly integrate DataHub into a broader events-based architecture. Check out the repo here
- This release also introduces OpenAPI endpoints to post, get, and delete entities. Check out the usage guide here
- Metadata Ingestion Source docs have a new look! The documentation is now code-generated, so every source has a consistent format and content.
User Experience
- New! The Dataset Schema page now supports a “Blame View” to quickly understand how a field has evolved over semantic schema versions. You can find more info about how we compute versions here.
Ingestion Improvements
- New! Now incubating the Apache Pulsar source
- Update to Feast connector to support v0.18
- Ongoing improvements to Snowflake external table support
- Improvements to handling BigQuery audit log SQL queries
- Miscellaneous Tableau fixes for lineage, browse paths, and non-embedded datasets
What's Changed
- fix(cypress) - enable retries for failed tests to minimize flaking by @aditya-radhakrishnan in #4680
- Deprecate an entity by @Ankit-Keshari-Vituity in #4633
- fix(timeline): enhance schema field name change and removal support by @RyanHolstien in #4603
- fix(cli): rest emitter should override config and env variables by @anshbansal in #4622
- fix(docs): elasticsearch secret reference by @felixb in #4314
- fix(mcl-processor): Remove unnecessary log.info by @dexter-mh-lee in #4686
- fix(datahub-client): avoid parallel execution of metadat-io:test by @MugdhaHardikar-GSLab in #4685
- docs(metadata-models-custom): add example script to show producing cu… by @shirshanka in #4681
- fix(gms): Ensure Ordering by version when fetching next version by @arunvasudevan in #4696
- fix(docker): Fix issue #4683 by @jjoyce0510 in #4697
- feat(vulnerability): Upgrade spring libraries to latest version by @dexter-mh-lee in #4698
- refactor(gms): EbeanAspectDao - make the orderBy clause explicitly ascending in getNextVersions by @jjoyce0510 in #4699
- feat(gms): Entity change events v1 (Platform Event) by @jjoyce0510 in #4687
- Redesign the login page by @Ankit-Keshari-Vituity in #4684
- fix(snowflake): remove extra lineage edges in reports, change badly named config variable by @anshbansal in #4595
- fix(bigquery): error due to not handling data properly by @anshbansal in #4702
- fix(looker): Fix for Pydantic validation error for Looker TransportOptions on python 3.8 by @treff7es in #4705
- fix(ingest) bigquery: Moving bigquery temporary credential deletion to atexit by @treff7es in #4701
- fix(lineage): Fix lineage entity drawer height UI bug by @chriscollins3456 in #4707
- feat(ingest) - update identity sources to add flags for masking sensitive work units by @aditya-radhakrishnan in #4711
- fix(snowflake): deprecate config, update examples by @anshbansal in #4644
- fix(glue): delete CatalogId parameter from get_jobs api call by @BoyuanZhangDE in #4646
- fix(ui): Show deprecate button only for specific entity pages. by @jjoyce0510 in #4712
- feat(ml): show custom properties for MLFeatureTable in UI by @maaaikoool in #4706
- fix(glue): fix error for custom connector if ignore_unsupported_conne… by @mayurinehate in #4667
- feat(ingest): add decimal128 custom type for mysql by @kevinhu in #4624
- fix(policy): Use search to fetch all policies by @dexter-mh-lee in #4713
- fix(transformers): add snapshot aspects from dataset into base_transf… by @shirshanka in #4719
- Revert "fix(policy): Use search to fetch all policies" by @dexter-mh-lee in #4725
- minor fix(metadata-ingestion): Add new schemas to python codegen by @jjoyce0510 in #4726
- fix(ui): Display warning in UI when metadata service auth is disabled. by @jjoyce0510 in #4728
- fix(timelineCli): fix naming for timeline cli by @RyanHolstien in #4729
- fix(entity header): Fixes two issues in the EntityHeader - update UI and remove link by @chriscollins3456 in #4720
- Revert "fix(timelineCli): fix naming for timeline cli (#4729)" by @jjoyce0510 in #4731
- feat(cli): suppress stacktrace printing on configuration errors by @shirshanka in #4718
- fix(cli): align default sink env variables across ingest and other cl… by @shirshanka in #4739
- feat(ingest) dbt: Dbt query tag mapping and match template by @treff7es in #4744
- fix(cli): telemetry - make config file processing more robust by @shirshanka in #4738
- feat(react theming): stop homepage flicker for env-var based logos by @gabe-lyons in #4730
- feat(Cassandra): add Cassandra implementation of EntityService by @xdl in #3286
- fix(policies): Re-revert the policies fix + ingest documents directly to search by @dexter-mh-lee in #4733
- feat(cli): Eagerly load datahub actions CLI commands by @jjoyce0510 in #4748
- fix(ingest) bigquery: Fix BigQuery Datetime/Timestamp type column partition table profile bug by @sebkim in #4658
- docs: add missing PR numbers by @anshbansal in #4742
- fix(azure_ad): silently discard other Azure AD object types (#4693) by @cccs-eric in #4704
- fix(datahub-frontend): OIDC discovery URL will not have NONE as auth_methods_supported by @chen4119 in #4710
- fix(docs): fix links by @daha in #4703
- feat(ingest): add Feast repository source by @danilopeixoto in #4094
- feat(soft deletes): rephrasing soft delete banner by @gabe-lyons in #4753
- feat(ebeans): Add metrics to track connection pool by @dexter-mh-lee in #4755
- fix(AWS) When using aws_profile, grab temporary credentials from the session. by @Jiafi in #4751
- feat(metadata-ingestion): Custom endpoint url and proxies in S3. by @pawel3275 in #4708
- fix(tableau): miscellaneous tableau fixes for lineage, browse path, non-embedded datasets by @mayurinehate in #4724
- doc: add warning for JDK by @anshbansal in #4761
- fix(ui): fix expandedName for dataset by @mayurinehate in #4762
- fix(ui): Users and Groups UI bug fixes by @ShubhamThakre in #4746
- fix(azure_ad): make redirect and graph_url optional parameters and update docs by @aditya-radhakrishnan in #4754
- docs(glue): clarify that table regex patterns should be fully-qualified by @aditya-radhakrishnan in #4747
- fix(ml models): fix features tab by @gabe-lyons in #4769
- fix(lint): lib upgrade caused by @anshbansal in #4773
- fix(lineage) Filter dataset -> dataset lineage edges if data is transformed by @chriscollins3456 in #4732
- fix(build): Fix breaking changes from GE 0.15.3 that are affecting our Python3.6 smoke_tests by @rslanka in #4779
- fix(ingestion): Fixing how we eagerly import DataHub actions by @jjoyce0510 in #4784
- fix(ingest): fwk - datahub_api should be initialized by datahub-rest … by @shirshanka in #4786
...
DataHub v0.8.33
Release Highlights
User Experience
- Refreshed the ML entity pages to match the look and feel of all other entity types, and improved ML lineage functionality
Ingestion Improvements
- Airflow improvements, as demoed in the March Town Hall:
  - Added support for capturing Airflow execution runs from the lineage backend
  - Introduced a new high-level API for generating DataFlow, DataJob, and DataProcessInstance entities
- MS SQL ingestion now captures table & column descriptions
- Trino platform support for Great Expectations
- New Presto-on-Hive ingestion source
- BigQuery ingestion now supports extraction of usage info from audit logs
- Fix to Looker ingestion to extract Explore Views from join names
- Fix to Tableau ingestion to avoid duplicating schema in URNs for upstream tables
- Simplify & annotate Redshift Usage source
Full Commit Log
- feat(gms): Expose kafka listener concurrency as a GMS setting by @jjoyce0510 in #4536
- feat(ingest): add option for external Spark cluster by @kevinhu in #4571
- fix(upgrade): Renaming kafka producer since it clashes with spring-internal by @dexter-mh-lee in #4573
- feat(GraphQL): Add data platform query to GraphQL API by @jjoyce0510 in #4574
- build(ui): Fix Windows UI lint by @mattmatravers in #4556
- doc: make note prominent on quickstart by @anshbansal in #4558
- fix(protobuf) minor bugfixes for protobuf by @leifker in #4553
- feat(docs) Improves docs around developing datahub, removes deprecated docs on building metadata service by @pedro93 in #4552
- chore: cleanup extra file by @anshbansal in #4541
- feat(snowflake): reduce permissions provisioned by default by @anshbansal in #4543
- fix(ingestion): Redshift usage refactoring - simplify, annotate, fix bugs by @rslanka in #4572
- fix(graphql): Adding PRE FabricType to GraphQL by @jjoyce0510 in #4582
- feat(search) - add DATETIME FieldType by @aditya-radhakrishnan in #4407
- fix(tableau): fix for incorrect schema returned by tableau api for sn… by @mayurinehate in #4577
- chore: update default cli for managed ingestion by @anshbansal in #4581
- feat(okta) - add support for filtering/searching when ingesting Okta groups and users by @aditya-radhakrishnan in #4586
- doc(snowflake): add example of table pattern by @anshbansal in #4580
- fix(doc): try to fix broken link by @daha in #4593
- fix(bigquery): incorrect lineage when views are present by @anshbansal in #4568
- feat(metadata-service): Supporting a configurable Authorizer Chain by @jjoyce0510 in #4584
- fix(search): Make sure home page and search pages are consistent by @dexter-mh-lee in #4588
- fix(browse): Reduce browse aggregation size by @dexter-mh-lee in #4601
- doc: add page for handling deprecations, breaking changes etc. by @anshbansal in #4590
- docs(GraphQL): fix typo by @Falci in #4605
- feat(search): Add SearchScore annotation to use fields for search ranking by @dexter-mh-lee in #4596
- feat(ingestion): Redshift Usage Source - simplify OperationalStats workunit generation. by @rslanka in #4585
- feat(tableau): add some logic to normalize table names in tableau by @gabe-lyons in #4609
- fix: urlencode slash in urns too by @daha in #4527
- fix(bigquery): fix lineage bug, improve docs, add dataset filter config by @anshbansal in #4607
- fix(protobuf) fix test instabilitity by @leifker in #4612
- fix(ui): Fix dashboard tags display by @jjoyce0510 in #4611
- feat(ui): Adding GraphQL queries to fetch entity deprecation status by @jjoyce0510 in #4614
- feat(ingest): enable connection string for all sqlalchemy datasources by @ms32035 in #4508
- fix(docs): add grant statements for redshift-ingestion by @Abhiram98 in #4559
- chore: fix lint and remove incorrect integration mark from unit tests by @anshbansal in #4621
- feat: adding gradle, pip cache via gh cache, docker cache via dockerhub by @anshbansal in #4387
- doc(scheduling): make it easier to find ui ingestion by @anshbansal in #4610
- feat(glue): add CatalogId parameter for cross-account access by @BoyuanZhangDE in #4608
- doc(cli): add env variables and options for ingest command by @anshbansal in #4598
- fix(ingest): Restricting pytest docker version to <0.12 by @treff7es in #4639
- fix(cypress) - add waits for cypress search test to remove flakiness by @aditya-radhakrishnan in #4640
- Revert "feat: adding gradle, pip cache via gh cache, docker cache via dockerhub" by @dexter-mh-lee in #4637
- feat(search): Only reindex if the mappings for an existing field changed by @dexter-mh-lee in #4629
- feat: add presto-on-hive metadata ingestion source by @jchen0824 in #4625
- feat(ingest): add trino platform for great expectations by @ms32035 in #4594
- fix(kafka): Stop overriding kafka registry props with empty values by @jsotelo in #4604
- [model]: Dataprocess instance entity to model datajob/jobflow runs by @treff7es in #4459
- feat(ingest): add Urn python library for DataJob, DataFlow, Domain and Tag by @tc350981 in #4618
- fix(ingestion): ensure source/sink reports are always logged by @anshbansal in #4592
- fix(ingestion): extract explore views from join name in Looker by @dyanarose in #4627
- feat(ingestion): Enable lower-casing of the name part of dataset urn if env variable is set. by @rslanka in #4649
- feat: Enable the ingestion of bigquery audit logs to parse usage info… by @tha23rd in #4441
- fix(ingest): Fix snowflake KEY_PAIR auth by @mkamalas in #4638
- fix(home): Fix issue where some browse cards are missing by @dexter-mh-lee in #4652
- fix(tableau): avoid duplicate schema in URNs for upstream tables by @maaaikoool in #4645
- feat(ingest): capture MSSQL table+column descriptions by @kevinhu in #4579
- feat(ml): bringing ml screens up to date w/ the modern ui layout & improving ml lineage by @gabe-lyons in #4651
- (feat:airflow) Add support to capture airflow executions + high level dataflow/jobs api by @treff7es in #4615
- fix(ingestion): add missing workunit ids by @anshbansal in #4657
- fix(ingestion): Adding missing init.py by @anshbansal in #4659
- fix(bigquery-usage): missing dependency by @anshbansal in #4661
- feat(cypress) - add cypress dashboard view to CI by @aditya-radhakrishnan in #4654
- feat(autocomplete): show fully qualified name in autocomplete by @gabe-lyons in #4663
- feat(ingestion) dbt: Fixing issue with strip_user_ids_from_email and adding owner_naming_pattern by @arunvasudevan in #4587
- fix(sqlparser): fix sqlparser breaking due to # sign by @anshbansal in #4662
- fix(ingestion): validate datasource in Tableau connector, before creating its upstream by @nandacamargo in #4613
- Added Relative Routing on the Users & Groups screen by @Ankit-Keshari-Vituity in #4664
- fix(airflow): Not importing emitters directly to eliminate unneeded dependency by @treff7es in #4668
- docs:...