
Releases: datahub-project/datahub

DataHub v0.8.32

04 Apr 21:27
ede6547

Release Highlights

User Experience

We're excited to announce View-based RBAC Policies! You can now create and apply view-only permissions to your DataHub end-users, providing more robust access controls.

We've also included some small (but impactful!) improvements to UX, including:

  • Display recent search terms when beginning the search flow
  • Consistently display entity subtypes for dbt, Looker, Kafka, & more. For example, Kafka entities are now displayed as "topics" instead of "datasets"

Ingestion Highlights

  • New! Protobuf ingestion (shoutout to @leifker for this Community-led contribution!)
  • Initial work to support a "Notebook" entity (shoutout to @tc350981 for spearheading this work!!)
  • Stateful ingestion for dbt is now supported
  • Ongoing improvements to our Tableau ingestion source from @nandacamargo & @cuong-pham
  • Improvements to handling database aliases for Redshift ingestion
  • Improvements to S3 source:
    • Add containers for datasets
    • Support platform_instance
    • Support for folder level datasets
    • Increased flexibility to specify dataset paths
  • Ingestion Fixes:
    • Snowflake Usage - log a warning instead of erroring out, plus other error-handling improvements
    • Snowflake allow/deny patterns
    • Examples of allow/deny patterns added to docs
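
To make the allow/deny idea concrete, here is a minimal sketch of regex-based filtering in plain Python. It is not the DataHub implementation: the helper names (`is_allowed`, `filter_names`) and the sample table names are hypothetical, and the rule that deny patterns win over allow patterns is an assumption of this sketch.

```python
import re
from typing import Iterable, List


def is_allowed(name: str, allow: Iterable[str], deny: Iterable[str]) -> bool:
    """Return True if `name` matches an allow pattern and no deny pattern."""
    # Assumption of this sketch: a deny match always overrides an allow match.
    if any(re.match(pattern, name) for pattern in deny):
        return False
    return any(re.match(pattern, name) for pattern in allow)


def filter_names(names: Iterable[str], allow: List[str], deny: List[str]) -> List[str]:
    """Keep only the names that pass the allow/deny check."""
    return [name for name in names if is_allowed(name, allow, deny)]


# Hypothetical fully qualified table names.
tables = ["sales.public.orders", "sales.public.tmp_orders", "hr.public.salaries"]
kept = filter_names(tables, allow=[r"sales\..*"], deny=[r".*\.tmp_.*"])
print(kept)
```

Here only `sales.public.orders` is kept: the `tmp_` table is denied and the `hr` table never matches an allow pattern.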

Full Commit Log

DataHub v0.8.31

17 Mar 23:22
2f078c9

Bugfix release to prevent reindexing failures of the system metadata index in Elasticsearch.

Full Commit Log

  • #4440 @pedro93 fix(cli) Makes filtered search deletes include BOTH removed and non-removed
  • #4444 @pedro93 fix(cli) Adds elasticsearch mapping
  • #4432 @leifker feat(protobuf): Gradle protobuf example project

DataHub v0.8.30

17 Mar 13:55
2d82531

v0.8.30

Release Highlights

  • Fix for OIDC encryption bug from v0.8.29
  • Adds platform instance id to the container id generation, and support for migrating the old container ids to the new ones via the datahub migrate CLI.
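
As a rough illustration of why existing container ids need migrating, the sketch below derives an id by hashing a key made of the container's identifying fields. The function name `container_guid`, the key fields, and the use of MD5 over sorted JSON are assumptions for illustration only, not a description of DataHub's actual id generation.

```python
import hashlib
import json
from typing import Dict, Optional


def container_guid(platform: str, database: str, instance: Optional[str] = None) -> str:
    """Hash the identifying fields of a container into a stable id (hypothetical scheme)."""
    key: Dict[str, str] = {"platform": platform, "database": database}
    if instance is not None:
        # Adding the platform instance changes the hashed key, and so the id.
        key["instance"] = instance
    serialized = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


old_id = container_guid("snowflake", "analytics")
new_id = container_guid("snowflake", "analytics", instance="prod_us")
print(old_id != new_id)
```

Because the instance is now part of the hashed key, the same container gets a different id than before, which is why ids generated the old way have to be migrated to the new ones.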

Notable UI-Based Features

  • Showing recent searches in autocomplete.

What's Changed

  • fix(ui): some small ui fixes for lineage by @gabe-lyons in #4381
  • fix(docs): change cabify link by @maaaikoool in #4373
  • Fixed Bug: Alpha slider doesn’t move, only the color slider is movabe in tag color picker by @Ankit-Keshari-Vituity in #4359
  • feat(GE): add option to disable sql parsing, use default parser by @mayurinehate in #4377
  • fix(removed): Make sure removed entities do not appear on recommendations by @dexter-mh-lee in #4353
  • fix(browse): fix browse double click issue by @gabe-lyons in #4382
  • fix(oidc): Update group membership each login (and make group extraction disabled by default) by @jjoyce0510 in #4380
  • feat(ingestion): add java protobuf schema ingestion by @leifker in #4178
  • Docs/update docs by @RyanHolstien in #4393
  • Revert "Fixed Bug: Alpha slider doesn’t move, only the color slider is movabe in tag color picker" by @gabe-lyons in #4390
  • feat(ingestion): improve logging, docs for bigquery, snowflake, redshift by @anshbansal in #4344
  • fix(ingest) Azure AD: support nested groups (#4367) by @cccs-eric in #4368
  • fix: add missing logo by @anshbansal in #4386
  • feat(spark-lineage): add support to custom env and platform_instance by @MugdhaHardikar-GSLab in #4208
  • fix(containers) - configure domain resolver for containers by @aditya-radhakrishnan in #4404
  • feat(*): Support setting owner type when assigning ownership by @jjoyce0510 in #4354
  • fix: telemetry failure should not cause CLI failure by @anshbansal in #4406
  • feat(autocomplete): Show recent searches + improved autocomplete by @jjoyce0510 in #4400
  • fix(ingestion): Fix mypy error stateful committable & restore mypy version. by @rslanka in #4408
  • build(markupsafe): update markupsafe pinning for Airflow compatibility by @set5think in #4388
  • feat(search): Add flag to enable caching on search service by @dexter-mh-lee in #4335
  • fix(query_combiner): add try block to handle queries of type str by @WaStCo in #4397
  • fix(ingestion): read all tables from redshift by @Abhiram98 in #4345
  • fix(ingestion): Invoke SqlLineageSQLParser's implementation in a separate process by @rslanka in #4391
  • fix(ingest): handle endpoints without 200 response in openapi by @JorgenEvens in #4332
  • feat(ingestion): Add the ability to query the latest timeseries aspect value via the get_cli. by @rslanka in #4395
  • Refactoring the quries into a single one to get the search results on Home Page by @Ankit-Keshari-Vituity in #4372
  • feat(lineage): hide soft deleted nodes in lineage & adds banner in entity page by @gabe-lyons in #4410
  • fix(lineage): Move lineage registry to entity-registry module by @dexter-mh-lee in #4412
  • feat(cli) Changes rollback behaviour to apply soft deletes by default by @pedro93 in #4358
  • fix(looker): various looker fixes by @gabe-lyons in #4394
  • fix(oidc): Fixing OIDC encryption bug in v0.8.29 by @jjoyce0510 in #4418
  • feat(oidc): Adding support for extracting single string groups claim by @jjoyce0510 in #4419
  • fix: change log levels to debug by @anshbansal in #4411
  • tests(cypress): reduce cypress flakiness by retrying login on failure by @gabe-lyons in #4423
  • fix(ingest): extract redshift platform correctly from sqlalchemy uri by @mayurinehate in #4421
  • build: Fix line endings for Windows check-out by @mattmatravers in #4370
  • feat(gql): make gql layer resistant to unresolvable relationships by @gabe-lyons in #4424
  • fix(ingestion) containers: Adding platform instance to container keys by @treff7es in #4279
  • fix: don't set None default by @anshbansal in #4422
  • Flexible search on soft delete by @pedro93 in #4405
  • fix(no-code metadata models in ui): fixes bug with rendering renderSpec aspects by @gabe-lyons in #4430

Full Changelog: v0.8.29...v0.8.30

DataHub v0.8.29

10 Mar 19:15
d474387

v0.8.29

NOTICE

This version is affected by an OIDC (SSO) related issue with the following stack trace:

datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend Caused by: java.security.InvalidKeyException: Invalid AES key length: 30 bytes
datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend 	at com.sun.crypto.provider.AESCrypt.init(AESCrypt.java:87)

The DataHub core team is working to address this. For now, we recommend staying on v0.8.28 if you actively use OIDC.
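
For context on the error message: AES accepts only 16-, 24-, or 32-byte keys (AES-128, AES-192, AES-256), so a 30-byte key is rejected. The following is a small, generic sketch of that length check; the helper name `is_valid_aes_key` is hypothetical and is not part of DataHub.

```python
# Key sizes accepted by AES: 128, 192, and 256 bits.
VALID_AES_KEY_LENGTHS = (16, 24, 32)


def is_valid_aes_key(key: bytes) -> bool:
    """Return True if the key has one of the lengths AES accepts."""
    return len(key) in VALID_AES_KEY_LENGTHS


print(is_valid_aes_key(b"x" * 30))  # a 30-byte key, as in the stack trace
print(is_valid_aes_key(b"x" * 32))  # a 32-byte (AES-256) key
```

The first call prints `False` and the second prints `True`.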

Release Highlights

  • Fix for MAE & MCE consumer healthcheck
  • Upgrade to Java 11 and Gradle 6

Full Commit Log

DataHub v0.8.28

07 Mar 23:57
beb51eb

Release Highlights

Notable UI-Based Features

Quickly view, search, and filter the downstream dependencies of any Entity! By using the Impact Analysis Lineage view, you can now see the full set of downstream entities that may be impacted by a change to a given entity. You can also search, filter, and export the list of entities to CSV; try it for yourself here.

View Dataset- and Column-Level Data Validation outcomes in DataHub. We now support surfacing outcomes from Great Expectations validations in Dataset Entities! Easily view the full history of validation outcomes to understand the trustworthiness of your data.

User Groups, Policies, and Tags have a new look!

  • The User Group page has a new look, allowing you to assign an email address, Slack Channel, Group Owner, and more. Easily add/remove Group Members from the UI - test it out here.
  • We refreshed the Policies Page, allowing you to see Policy membership and status at a glance.
  • The Tag Details page has been overhauled! You can now edit the definition, assigned owners, and tag color via the UI (try it here).

Notable Metadata Model & Ingestion-Based Features

First Milestone: Column-Level Lineage is complete! The Metadata Model now supports “fine-grained” lineage for Datasets; see documentation here for details, including adding fine-grained lineage to a dataset or a datajob.

Define Dataset-to-Dataset lineage via YAML. As demonstrated in the February 2022 Town Hall, you can now set Dataset-level lineage via YAML. This is great for teams that have more bespoke lineage needs that cannot be auto-extracted by the current set of supported ingestion sources.

Track all changes to entities using the Timeline API. This unified timeline of changes to entities in the metadata graph provides a robust picture of how your metadata has evolved over time. Upcoming work will support surfacing this detail via the DataHub UI. See the overview from Town Hall here.

Miscellaneous Metadata Ingestion Updates:

  • Incubating: PowerBI Ingestion Source
  • BigQuery Profiling: ability to disable profiling by partition
  • Tableau improvements: Workbooks are now modeled as “Containers”

DataHub Release Candidate v0.8.28 (rc1)

05 Mar 00:53
18dd5b6
Pre-release

DataHub v0.8.28 Release Candidate 1

Full Changelog: v0.8.27...v0.8.28rc1

Release Candidate v0.8.28

05 Mar 00:14
18dd5b6
Pre-release

Release Candidate for Version 0.8.28.

Full Changelog: v0.8.27...RC-v0.8.28

DataHub v0.8.27

23 Feb 19:44
49a8ece

Release Highlights

Notable UI-Based Features

  • The User Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see details of which Groups the User belongs to. See it in action here.

  • Search for Entities by Owner - Easily filter search results by User/Group Owner

  • Edit existing Glossary Terms - you can now edit/update Glossary Term descriptions via the UI. Future work will allow creating Terms from the UI as well - stay tuned!

  • Improved Metadata Analytics - keep tabs on your DataHub entities across Domains, Platforms, Glossary Terms, Environments, & more. Check out the new & improved Analytics tab!

Notable Metadata Model & Ingestion-Based Features

  • ClickHouse integration is now incubating! This is a 100% Community-led integration - huge shoutout to @ne1r0n & @havramar for pushing initial code & moving this work through!

  • Kafka Stateful Ingestion - shoutout to @claudio-benfatto for building this out!

  • Extract Airflow Task Description - big thanks to @guidoturtu for the contrib!

  • BigQuery: profile latest Partition/Shard - We know that Data Profiling can be computationally expensive for partitioned/sharded BQ instances. We now support profiling only the latest partition/shard to minimize processing load.
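
As an illustration of the "latest shard" idea, the sketch below picks the most recent table per base name from date-sharded tables that use the common `_YYYYMMDD` suffix. The function name `latest_shards` and the sample table names are hypothetical; this is not how the BigQuery source itself is implemented.

```python
import re
from typing import Dict, Iterable

# Matches date-sharded names such as "events_20220215".
SHARD_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<date>\d{8})$")


def latest_shards(table_names: Iterable[str]) -> Dict[str, str]:
    """Map each sharded base name to its most recent shard; skip unsharded tables."""
    latest: Dict[str, str] = {}
    for name in table_names:
        match = SHARD_SUFFIX.match(name)
        if match is None:
            continue  # not a date-sharded table
        base = match.group("base")
        # Same base and fixed-width dates, so string order equals date order.
        if base not in latest or name > latest[base]:
            latest[base] = name
    return latest


names = ["events_20220101", "events_20220215", "users", "clicks_20211231"]
print(latest_shards(names))
```

Profiling only the tables returned here (one per base name) avoids scanning every historical shard.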

Notable Docs Updates

  • NEW! Tips for Searching within DataHub - Ever wondered how to make the most of Searching within DataHub? Check out this doc put together by @xiphl

  • Improvements to Metadata Model Docs - This is a huge win for the Community - we’re taking a big step toward providing auto-generated & curated docs related to the Metadata Model - take a look here.

DataHub v0.8.26

08 Feb 23:22
3668de8

This is a bugfix release that addresses the issue in v0.8.25 that prevented adding Glossary Terms to Dataset fields.

Release Highlights

  • Fixes a bug in the previous release (v0.8.25) where Glossary Terms could not be added to Dataset fields.

DataHub v0.8.25

07 Feb 22:32
ec062b6

Known Issues

  • Adding Glossary Terms to schema fields does not work with this version due to a bug. Upgrade to v0.8.26 for the fix.

Release Highlights

Buckle up, folks! v0.8.25 brings some very exciting (and highly-requested!) updates.

Notable UI-Based Features

  • UI-based Ingestion - as demoed in the December Town Hall, we now support creating, configuring, scheduling, & executing batch metadata ingestion from the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.
  • Data Domains - DataHub now supports grouping data assets into logical collections called Domains. Domains are curated, top-level folders or categories where related assets can be explicitly grouped. Read the guide here!
  • Data Containers are now supported! A Container is a physical grouping of entities, e.g. a Schema is a container of 1 or more Datasets; a Dashboard is a container of 1 or more Charts.

Notable Metadata Model & Ingestion-Based Features

  • Data Quality test results are now supported in the DataHub metadata model. This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here). Future releases will include a Great Expectations integration & UI support - we’re on track to complete this in Q1 as planned.
  • Avro files are now supported in the Data Lake File ingestion source
  • Ingest metadata from multiple instances of the same platform type. This has been a very common use case within the Community - you can now differentiate multiple instances of the same platform type! If you already have pre-existing entries, use the datahub migrate command to migrate them over to platform instances.
  • Ignore users from Top Users calculation
    • feat(ingestion): Adding ability to ignore users from top users calculation by @treff7es in #3735
  • BigQuery - Data Profiling on only the latest partition/shard
    • feat(ingestion) bigquery: Profiling only the latest partition/shard on bigquery by @treff7es in #3930
  • (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813

Notable Fixes

  • Fix to support View in Looker
    • feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
  • fix(graphql): support group display name in ownership by @thomasplarsson in #3979
  • fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
  • fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926

DataHub Usage Guides
