
Releases: datahub-project/datahub

v0.8.42

03 Aug 21:49
f1abdc9

v0.8.42

Highlights

User Experience

  • Improved Search Experience - preview cards now display usage and freshness information
  • Update to Schema History - incorporated Community feedback to remove “Blame” terminology
  • Improved UI-Based Ingestion - easily configure metadata ingestion from Snowflake, BigQuery, Looker, and Tableau with an easy-to-follow form; YAML is still supported!

Developer/Community Experience

  • Python 3.6 is no longer supported for ingestion – we expect this to impact fewer than 1% of DataHub users (based on PyPI download stats). Please upgrade to Python 3.7 or newer.
  • Update to GitHub Issue management - issues will be marked as “Inactive” after 30 days of no activity and will be automatically closed following an additional 30 days of inactivity
  • We’ve updated our Slack Guidelines! Read them here
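
For the Python 3.6 deprecation noted above, a minimal sketch of a pre-flight check you could run before upgrading the ingestion package. The function name `is_supported_python` is hypothetical and is not part of the DataHub CLI; only the 3.7 minimum comes from the release notes.

```python
import sys
from typing import Optional, Sequence

# Minimum interpreter version for ingestion, as stated in the v0.8.42 notes.
MIN_PYTHON = (3, 7)


def is_supported_python(version_info: Optional[Sequence[int]] = None) -> bool:
    """Return True if the given (major, minor, ...) version is 3.7 or newer.

    Hypothetical helper: defaults to the running interpreter's version.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not is_supported_python():
        raise SystemExit("Python 3.7 or newer is required for ingestion.")
    print("Python version is supported for ingestion.")
```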

Metadata Ingestion

  • You can now test your Snowflake connection via the CLI and UI-based Ingestion to ensure you have proper access levels required for general ingestion, profiling, and usage. We will be expanding this functionality to other cloud-based ingestion sources in upcoming cycles.
  • Hard delete will now discover and remove soft deleted entities
  • Resolved issue of assertion error with dbt stateful ingestion

Full Commit Log

What's Changed


v0.8.41

15 Jul 15:05
6e07ec5

Highlights

User Experience

  • Performance improvements in the UI
  • Improvements in CSV connector for easier ingestion - description, ownership, domain support added
  • UI form for Snowflake Managed Ingestion so you don't have to make changes in YAML
  • Viewing Siblings

Developer Experience

  • Ability to stop quickstart instead of nuking
  • Customizing mapped ports in quickstart
  • New models for dashboard usage
  • Circuit breaker and Python API for Assertion and Operation

Metadata Ingestion

  • Improvements to the BigQuery connector so that only selected tables are profiled
  • Intermittent 401 errors during ingestion fixed
  • New Salesforce connector

What's Changed


v0.8.40

30 Jun 02:59
11356e3

Highlights

Fixes bug in 0.8.39 that prevented standalone MAE consumers from being deployed.

User Experience

  • Support for deleting Tags and Domains via the UI
  • Support for editing Domain name via the UI
  • Visualize Glossary Term source on the Glossary Term Entity Page

Developer Experience

  • Fix for issue where standalone MAE consumers could not be deployed

Metadata Ingestion

  • Script to re-index sibling associations for dbt nodes that had already been ingested before 0.8.39

What's Changed

Full Changelog: v0.8.39...v0.8.40

v0.8.39

24 Jun 22:28
68762a2

Release Highlights

Known Issues

When using stand-alone MAE consumers (mae-consumer-job) this release will not work; this has been resolved in v0.8.40.

User Experience

  • NEW: support for surfacing outcomes of dbt Tests in dataset entity pages (see it in action here)
  • NEW: Improved navigation of dbt resources: dbt models and their associated warehouse tables are now merged into a unified entity (see it here). This will automatically be enabled for all newly ingested entities. To view this for entities you have already ingested, you will need to run a restore indices job.
  • Improvement to Impact Analysis: When looking at the Lineage tab, you can now easily toggle between “Upstream” and “Downstream” entities (try it out here)

Developer Experience

  • NEW: Java Kafka Emitter – Use this when you want to decouple your metadata producer from the uptime of your datahub metadata server by utilizing Kafka as a highly available message bus

Metadata Ingestion

  • NEW: Make bulk edits to your metadata via CSV (read more)
  • Snowflake ingestion improvements: configure profiling to run only on tables that have been updated within the prior N days
  • Managed ingestion update: removed need for sink block
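
The "updated within the prior N days" profiling filter can be pictured with a small sketch. This is an illustration only, not the Snowflake connector's implementation or configuration: the function `tables_to_profile`, its parameters, and the table names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def tables_to_profile(
    last_altered: Dict[str, datetime],
    max_age_days: int,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the tables whose last update falls within the prior N days.

    Hypothetical helper: `last_altered` maps a table name to its
    timezone-aware last-modified timestamp.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    # Keep only tables modified at or after the cutoff; sort for stable output.
    return sorted(name for name, ts in last_altered.items() if ts >= cutoff)


if __name__ == "__main__":
    reference = datetime(2022, 6, 24, tzinfo=timezone.utc)
    tables = {
        "db.schema.orders": datetime(2022, 6, 23, tzinfo=timezone.utc),
        "db.schema.archive": datetime(2022, 1, 1, tzinfo=timezone.utc),
    }
    print(tables_to_profile(tables, max_age_days=7, now=reference))
```

Skipping stale tables this way trades completeness for speed: tables that have not changed since the last run are unlikely to yield different profile statistics.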

What's Changed


[!] DataHub v0.8.38

09 Jun 22:44
d05cd08

Notice: There is a known issue in this release. Listing access tokens for a user may not return the correct results to the UI due to an unreliable query to DataHub's search backend. This will be resolved in v0.8.39. Note that this does not mean that access tokens will not work or are in any way compromised - the functionality of generating and using access tokens is not impacted.

The below release notes are copied from v0.8.37 release notes.

Highlights

User Experience

This release comes packed full of new features and updates.

  • NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
  • NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
  • NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary
  • UPDATE - Rename “Manage” navigation item to “Govern”
  • [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
  • [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
  • FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
  • Minor fixes & improvements to UI for adding policy users + groups.

Metadata Ingestion

  • Support Snowflake ingestion via OAuth
  • Misc fixes and improvements to existing ingestion sources

Disclaimers:

With this upgrade, we've added a new mechanism for authenticating users: native authentication. It is enabled by default, which allows Admins to create new users and allows those users to log in.

If you previously disabled BOTH JAAS (via AUTH_JAAS_ENABLED = false) AND OIDC, and you still do not want to require a username and password to log in, you'll need to add a new environment variable to the datahub-frontend-react container: AUTH_NATIVE_ENABLED=false.

What's Changed

Full Changelog: v0.8.37...v0.8.38

[!] DataHub v0.8.37

09 Jun 17:37
f2304c3

Notice! This version has a few known bugs regarding revocable access tokens. Specifically, the UI for listing access tokens does not work properly unless you have a specific platform privilege. Additionally, there is a delay in revoking access tokens of 6 hours. We recommend that you skip this version and upgrade directly to v0.8.38.

Highlights

User Experience

This release comes packed full of new features and updates.

  • NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
  • NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
  • NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary
  • UPDATE - Rename “Manage” navigation item to “Govern”
  • [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
  • [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
  • FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
  • Minor fixes & improvements to UI for adding policy users + groups.

Metadata Ingestion

  • Support Snowflake ingestion via OAuth
  • Misc fixes and improvements to existing ingestion sources

What's Changed

Full Changelog: v0.8.36...v0.8.37

DataHub V0.8.36

02 Jun 08:55
d31c009

V0.8.36

Highlights

User Experience

NEW: Manage Glossary Terms via the DataHub UI! Delivering on our Q2’22 Roadmap item, end users can now create, edit, move, delete, and deprecate Glossary Terms via the UI! With this new experience comes some new ways of indexing data in order to make viewing and traversing the different levels of your Glossary possible. Therefore, you will have to restore your indices in order for the new Glossary experience to work for users that already have existing Glossaries. If this is your first time using DataHub Glossaries, you're all set!

Ability to add multiple Owners, Tags, Terms

Developer Experience

The new Revokable Token API supports a new type of Access Token which can be revoked & queried, allowing admins to easily delete tokens for operational & security reasons. Read all about it in the Access Token Management Usage Guide.

Ingestion Updates

This release includes 3 new Metadata Sources:

  • Iceberg
  • Vertica
  • SAP HANA

📣 Massive shoutout to DataHub Community members @cccs-eric, @eburairu, and @buggythepirate for driving these contributions! 📣

These sources are currently marked as “Testing” - we encourage you to try them out & provide feedback in the DataHub #ingestion Slack channel!

We’ve rolled out the following ingestion-related improvements:

  • AWS Glue - data profiling is now supported
  • S3 ingestion speed-up
  • Various bug fixes

Full Commit Log


[!] DataHub v0.8.35

18 May 17:28
bb341f7

Notice: Deploying this release will result in an incorrectly named aspect entry existing in the database. The impact is that some upgrade jobs may fail to perform full scans of the database. This will be fixed by upgrading to > v0.8.38 OR by pulling the latest DataHub Upgrade docker image and executing the following upgrade: ./datahub-upgrade.sh -u RemoveUnknownAspects

v0.8.35

Highlights

  • Reduced vulnerability counts in project
  • Various bug fixes
  • New streamlined docker workflow

Full Commit Log

v0.8.34

04 May 16:19
c22d52d

Release Highlights

Developer Experience

  • DataHub Actions Framework is LIVE! The Actions Framework makes responding to real-time changes in your Metadata Graph easy, enabling you to seamlessly integrate DataHub into a broader events-based architecture. Check out the repo here
  • This release also introduces OpenAPI endpoints to post, get, and delete entities. Check out the usage guide here
  • Metadata Ingestion Source docs have a new look! We now have code-generated documentation to apply consistency in format and contents

User Experience

  • New! The Dataset Schema page now supports a “Blame View” to quickly understand how a field has evolved over semantic schema versions. You can find more info about how we compute versions here.

Ingestion Improvements

  • New! Now incubating the Apache Pulsar source
  • Update to Feast connector to support v0.18
  • Ongoing improvements to Snowflake external table support
  • Improvements to handling BigQuery audit log SQL queries
  • Miscellaneous Tableau fixes for lineage, browse path, non-embedded datasets

What's Changed


DataHub v0.8.33

15 Apr 18:46
72046bf

Release Highlights

User Experience

Refreshed the ML Entity page to match the feel of all other entity types; improved ML lineage functionality

Ingestion Improvements

  • Airflow Improvements - as demoed in March Town Hall
    • Add support to capture Airflow execution runs from lineage backend
    • Introduce new High level API for generating dataflow/job/dataprocessinstance
  • MS SQL ingestion now captures table & column descriptions
  • Trino platform support for Great Expectations
  • New Presto-on-Hive ingestion source
  • BigQuery ingestion now supports extraction of usage info from audit logs
  • Fix to Looker ingestion to extract Explore Views from join names
  • Fix to Tableau ingestion to avoid duplicating schema in URNs for upstream tables
  • Simplify & annotate Redshift Usage source

Full Commit Log

  • feat(gms): Expose kafka listener concurrency as a GMS setting by @jjoyce0510 in #4536
  • feat(ingest): add option for external Spark cluster by @kevinhu in #4571
  • fix(upgrade): Renaming kafka producer since it clashes with spring-internal by @dexter-mh-lee in #4573
  • feat(GraphQL): Add data platform query to GraphQL API by @jjoyce0510 in #4574
  • build(ui): Fix Windows UI lint by @mattmatravers in #4556
  • doc: make note prominent on quickstart by @anshbansal in #4558
  • fix(protobuf) minor bugfixes for protobuf by @leifker in #4553
  • feat(docs) Improves docs around developing datahub, removes deprecated docs on building metadata service by @pedro93 in #4552
  • chore: cleanup extra file by @anshbansal in #4541
  • feat(snowflake): reduce permissions provisioned by default by @anshbansal in #4543
  • fix(ingestion): Redshift usage refactoring - simplify, annotate, fix bugs by @rslanka in #4572
  • fix(graphql): Adding PRE FabricType to GraphQL by @jjoyce0510 in #4582
  • feat(search) - add DATETIME FieldType by @aditya-radhakrishnan in #4407
  • fix(tableau): fix for incorrect schema returned by tableau api for sn… by @mayurinehate in #4577
  • chore: update default cli for managed ingestion by @anshbansal in #4581
  • feat(okta) - add support for filtering/searching when ingesting Okta groups and users by @aditya-radhakrishnan in #4586
  • doc(snowflake): add example of table pattern by @anshbansal in #4580
  • fix(doc): try to fix broken link by @daha in #4593
  • fix(bigquery): incorrect lineage when views are present by @anshbansal in #4568
  • feat(metadata-service): Supporting a configurable Authorizer Chain by @jjoyce0510 in #4584
  • fix(search): Make sure home page and search pages are consistent by @dexter-mh-lee in #4588
  • fix(browse): Reduce browse aggregation size by @dexter-mh-lee in #4601
  • doc: add page for handling deprecations, breaking changes etc. by @anshbansal in #4590
  • docs(GraphQL): fix typo by @Falci in #4605
  • feat(search): Add SearchScore annotation to use fields for search ranking by @dexter-mh-lee in #4596
  • feat(ingestion): Redshift Usage Source - simplify OperationalStats workunit generation. by @rslanka in #4585
  • feat(tableau): add some logic to normalize table names in tableau by @gabe-lyons in #4609
  • fix: urlencode slash in urns too by @daha in #4527
  • fix(bigquery): fix lineage bug, improve docs, add dataset filter config by @anshbansal in #4607
  • fix(protobuf) fix test instabilitity by @leifker in #4612
  • fix(ui): Fix dashboard tags display by @jjoyce0510 in #4611
  • feat(ui): Adding GraphQL queries to fetch entity deprecation status by @jjoyce0510 in #4614
  • feat(ingest): enable connection string for all sqlalchemy datasources by @ms32035 in #4508
  • fix(docs): add grant statements for redshift-ingestion by @Abhiram98 in #4559
  • chore: fix lint and remove incorrect integration mark from unit tests by @anshbansal in #4621
  • feat: adding gradle, pip cache via gh cache, docker cache via dockerhub by @anshbansal in #4387
  • doc(scheduling): make it easier to find ui ingestion by @anshbansal in #4610
  • feat(glue): add CatalogId parameter for cross-account access by @BoyuanZhangDE in #4608
  • doc(cli): add env variables and options for ingest command by @anshbansal in #4598
  • fix(ingest): Restricting pytest docker version to <0.12 by @treff7es in #4639
  • fix(cypress) - add waits for cypress search test to remove flakiness by @aditya-radhakrishnan in #4640
  • Revert "feat: adding gradle, pip cache via gh cache, docker cache via dockerhub" by @dexter-mh-lee in #4637
  • feat(search): Only reindex if the mappings for an existing field changed by @dexter-mh-lee in #4629
  • feat: add presto-on-hive metadata ingestion source by @jchen0824 in #4625
  • feat(ingest): add trino platform for great expectations by @ms32035 in #4594
  • fix(kafka): Stop overriding kafka registry props with empty values by @jsotelo in #4604
  • [model]: Dataprocess instance entity to model datajob/jobflow runs by @treff7es in #4459
  • feat(ingest): add Urn python library for DataJob, DataFlow, Domain and Tag by @tc350981 in #4618
  • fix(ingestion): ensure source/sink reports are always logged by @anshbansal in #4592
  • fix(ingestion): extract explore views from join name in Looker by @dyanarose in #4627
  • feat(ingestion): Enable lower-casing of the name part of dataset urn if env variable is set. by @rslanka in #4649
  • feat: Enable the ingestion of bigquery audit logs to parse usage info… by @tha23rd in #4441
  • fix(ingest): Fix snowflake KEY_PAIR auth by @mkamalas in #4638
  • fix(home): Fix issue where some browse cards are missing by @dexter-mh-lee in #4652
  • fix(tableau): avoid duplicate schema in URNs for upstream tables by @maaaikoool in #4645
  • feat(ingest): capture MSSQL table+column descriptions by @kevinhu in #4579
  • feat(ml): bringing ml screens up to date w/ the modern ui layout & improving ml lineage by @gabe-lyons in #4651
  • (feat:airflow) Add support to capture airflow executions + high level dataflow/jobs api by @treff7es in #4615
  • fix(ingestion): add missing workunit ids by @anshbansal in #4657
  • fix(ingestion): Adding missing init.py by @anshbansal in #4659
  • fix(bigquery-usage): missing dependency by @anshbansal in #4661
  • feat(cypress) - add cypress dashboard view to CI by @aditya-radhakrishnan in #4654
  • feat(autocomplete): show fully qualified name in autocomplete by @gabe-lyons in #4663
  • feat(ingestion) dbt: Fixing issue with strip_user_ids_from_email and adding owner_naming_pattern by @arunvasudevan in #4587
  • fix(sqlparser): fix sqlparser breaking due to # sign by @anshbansal in #4662
  • fix(ingestion): validate datasource in Tableau connector, before creating its upstream by @nandacamargo in #4613
  • Added Relative Routing on the Users & Groups screen by @Ankit-Keshari-Vituity in #4664
  • fix(airflow): Not importing emitters directly to eliminate unneeded dependency by @treff7es in #4668
  • docs:...