DataHub v0.8.0
Notable Highlights
- Product Analytics: Understand how your users are interacting with DataHub
- Product Improvements: Auto-complete across types, Task list view under Pipelines
- Features: Business Glossary (incubating)
- Integration improvements: Looker, dbt, Hive, Redshift, Glue, MongoDB
- Kafka Connect (incubating)
And finally,
No Code Metadata
This release introduces a major refactor that permits extension of DataHub’s metadata model without writing any imperative code.
Highlights:
- Removed strongly-typed, entity-specific DAOs. Added more generic services.
- Introduced Elasticsearch settings and mappings generation, plus dynamic index registration and evolution
- Decoupled persistence layer from Pegasus + Java by removing fully-qualified class names (aspects, relationships)
- Introduced declarative, annotation-based mechanisms for defining indexed fields, foreign key fields, entities & aspects
- Added an in-place upgrade CLI (datahub-upgrade) to ease adoption of this release
For more information, see:
- The PR: #2629
- Technical Overview
- The DataHub Metadata Model
- Extending the Metadata Model
- No Code Upgrade Guide
ChangeLog
- #2629 @jjoyce0510 feat: No Code Metadata Modeling
- #2617 @shirshanka docs: update roadmap with accomplished items
- #2635 @hsheth2 fix(ingest): improve redshift ingestion performance
- #2599 @shubham49 feat(react): replace user urn with username
- #2623 @gabe-lyons fix(react): url encoding urns and tag profile fix
- #2634 @hsheth2 fix(ingest): include urn as key for kafka emitter
- #2636 @dexter-mh-lee fix(ci): update trigger to always generate docker images
- #2622 @RickardCardell feat(react): custom properties are now sortable by name in the UI
- #2626 @thomasplarsson fix(ingestion): improve robustness of glue ingestion source
- #2619 @topwebtek7 fix(react): update ispartofbuilderfromdataflow, update ui in datajob header
- #2620 @jjoyce0510 feat(analytics): support configuration of Kafka SSL
- #2618 @topwebtek7 feat(react): eliminate noise in react build, test and cleanup, get rid of warnings
- #2616 @dexter-mh-lee docs: added AWS deployment guide
- #2615 @gabe-lyons fix(react): fixing tags autocomplete bug
- #2590 @saxo-lalrishav feat(react): business glossary and user - tab based profile page
- #2612 @hsheth2 docs: update homepage text
- #2614 @hsheth2 fix(ingest): fail gracefully when lookml used on old python versions
- #2603 @topwebtek7 feat(react): update collectionname in datajob, update tabs ui/ux
- #2604 @topwebtek7 feat(graphql): redesign autocomplete to search for all entity types, show suggestions grouped by entities
- #2606 @hsheth2 feat(ingest): populate inputDatajobs field in airflow integration
- #2602 @topwebtek7 feat(react): add parent flow link on datajob page
- #2596 @remisalmon fix(ingest): fix lineage after dbt metadata ingestion when table name and identifier differ
- #2600 @topwebtek7 feat(react): add topological sort feature, update graphql, add tests
- #2607 @hsheth2 feat(ingest): update bigquery demo data
- #2609 @hsheth2 fix(docs): various fixes and additions
- #2601 @frsann feat(ingestion): Fix looker test
- #2577 @shubham49 feat(react): add glossary term to dataset preview
- #2585 @afranzi fix(ingest): incorrect implementation of the allow pattern in looker dashboards
- #2591 @martha feat(react): add optional subtitle to home page
- #2598 @kevinhu fix(ingest): default values for env
- #2575 @hsheth2 docs(ingest): add a guide for writing sources
- #2594 @topwebtek7 feat(react): add nativeDataType with tooltip over icon in schema
- #2595 @havramar docs: Add Plum Research to POC adoption section in README.md
- #2589 @martha feat(react): prevent logo distortion
- #2586 @gabe-lyons fix(react): fix tag autocomplete after creating a new tag
- #2583 @bboylen feat(react): Add label to edited dataset descriptions
- #2579 @topwebtek7 feat(dataflow): update dataflow to have datajobs in new tab
- #2584 @john-bodley fix(docs): Fix Superset typo in README
- #2574 @zack3241 fix(helm charts): remove connection tests from helm charts
- #2582 @hsheth2 build(ingest): show diff upon lint failures
- #2516 @taufiqibrahim feat(ingest): kafka connect metadata ingestion
- #2580 @hsheth2 feat(ingest): add dataset tag transformer
- #2573 @hsheth2 test(ingest): use different mysql test port
- #2549 @shubham49 feat(react): link glossary term to dataset page
- #2572 @hsheth2 test(ingest): ensure transformer registry works for aliases
- #2571 @hsheth2 fix(ingest): better active directory LDAP support
- #2483 @luck02 fix(dbt): set target platform and load schema
- #2563 @afranzi feat(ingest): add AWS IAM Roles Support to the Glue Source
- #2566 @saxo-lalrishav fix(react): Update raw schema view to support non json schemas
- #2570 @saxo-lalrishav fix(react): Removing a user with multiple roles from the owner tab also removes the other roles associated with that user
- #2562 @sunkickr docs: Add Sphinx Docstrings to Airflow Modules
- #2560 @hsheth2 fix(cli): prevent click from suppressing errors
- #2559 @hsheth2 docs: include license in the readme
- #2561 @hsheth2 fix(ingest): check mypy types for test helpers
- #2558 @shirshanka docs: town-hall updates and some badges
- #2557 @hsheth2 feat(ingest): add options for Airflow lineage backend
- #2467 @pedro93 feat(k8s): generalizes CronJob metadata ingestion resource for custom logic
- #2546 @kevinhu feat(ingest): MongoDB schema inference
- #2556 @gabe-lyons fix(search): have search bar ignore blank searches
- #2553 @gabe-lyons fix(owner): fixing ownership routing
- #2555 @gabe-lyons feat(business glossary): hiding business glossary until all features completed
- #2493 @frsann feat(ingest): Looker view and dashboard ingestion
- #2538 @saxo-lalrishav feat(business glossary): search, browse and entity page for business glossary terms
- #2543 @hsheth2 fix(ingest): register custom Hive types
- #2544 @hsheth2 docs(ingest): improve kafka schema registry config docs
- #2545 @G-nther fix(analytics): use separate env variable for tracking topic in MAE-Consumer
- #2547 @hsheth2 ci(docker): disable GitHub Docker registry
- #2521 @hsheth2 refactor(ingest): move Airflow into datahub_provider module
- #2539 @dexter-mh-lee fix(analytics): add support for AWS ES
- #2540 @afranzi feat(ingest): define Redshift as a Postgres Source
- #2541 @jjoyce0510 fix(react): disable analytics link display
- #2542 @topwebtek7 fix(react): fix type issue with adding new in ownership
- #2531 @hsheth2 build(ingest): use gradle in commands + docs
- #2536 @hsheth2 fix(ingest): remove mce.json file from root
- #2535 @gabe-lyons fix(react): fixing import issue
- #2534 @kevinhu docs: autoplay and navigation for source logos carousel
- #2519 @topwebtek7 feat(usergroup): implement corpgroup in graphql, refactor avatars and ownership in react
- #2532 @hsheth2 feat(ingest): add a transformer for adding ownership
- #2485 @shubham49 feat(graphql): add graphql types for business glossary
- #2533 @dexter-mh-lee fix(k8s): Fix helm charts for supporting analytics
- #2499 @jjoyce0510 feat(Product Analytics): Introducing In-App Analytics Beta
- #2529 @hsheth2 docs: enable better syntax highlighting
- #2528 @kevinhu docs: Use carousel layout for ingestion source logos
- #2530 @jjoyce0510 fix(model): removing reference to go link in SchemaFieldPath model
- #2515 @hsheth2 docs: update docusaurus
- #2527 @dexter-mh-lee fix(k8s): change defaultMode for certs volume
- #2503 @hsheth2 feat(ingest): check in generated schema files
- #2526 @nickwu241 fix(k8s): fix kafka-setup-job.yml datahub-certs-dir mountPath
- #2512 @dexter-mh-lee fix(k8s): comment out minikube specific settings
- #2522 @hsheth2 fix(ingest): generate Airflow tags correctly
- #2523 @topwebtek7 feat(react): set fixed height for dataset preview
- #2525 @kevinhu docs: Add ingestion source logos grid
- #2524 @hsheth2 fix(ingest): add support for custom postgres types
- #2228 @shakti-garg feat(business_glossary): add new entity business term and its relationship with dataset and its fields
- #2520 @hsheth2 fix(build): only check for src/ and tests/ directories for lint checks
- #2514 @hsheth2 docs: update Wolt logo
- #2513 @hsheth2 build(ingest): include package data in sdist
- #2510 @hsheth2 build(ingest): add metadata-ingestion to gradle build
- #2509 @hsheth2 docs: improve airflow explanations and examples
- #2508 @hsheth2 fix(ingest): remove double edges from Airflow lineage backend
- #2505 @vlavorini docs: fixed MCE file recipe example
- #2500 @dexter-mh-lee docs(k8s): Update readme with helm prerequisite
- #2501 @gabe-lyons feat(lineage): removing dataset<>dataset edge in job index builder
- #2502 @hsheth2 ci(ingest): ensure datahub imports work
- #2497 @hsheth2 feat(ingest): capture table properties if available
- #2498 @hsheth2 fix(ingest): replace ImportError with ModuleNotFoundError
- #2491 @dexter-mh-lee feat(search): Add search for field level description and tags
- #2496 @gabe-lyons fix(react): fixing layout adjustments in lineage viz
- #2495 @shirshanka docs: Update roadmap with accomplished items
- #2489 @hsheth2 fix(ingest): support https connections with cookies in Hive ingestion
- #2486 @hsheth2 feat(ingest): support hive over http
- #2478 @remisalmon feat(ingest): add support for Looker view built from SQL-based derived tables
- #2484 @jjoyce0510 refactor(react): Use useGetAuthenticatedUser hook to get the logged in user.
- #2473 @saxo-lalrishav fix(react): Fetch userUrn from cookie while uploading documentation links
- #2449 @hsheth2 fix(ingest): remove datahub.metadata import shortcut
- #2479 @hsheth2 docs(ingest): add gradle build step to developing setup as well
- #2480 @hsheth2 build: ensure files are not changed in CI
- #2481 @hsheth2 test(ingest): rename TestSource -> FakeSource
- #2474 @bboylen feat(react): makes user profile select first ownership item automatically
- #2464 @hsheth2 test: add smoke test
- #2475 @hsheth2 fix(ingest): guess hook type from name
- #2476 @hsheth2 test(ingest): add test names and IDs using pytest
- #2477 @gabe-lyons fix(lineage): fixing batch requests
- #2462 @gabe-lyons fix(lineage): Fix issue where large lineage fetches trigger 414 URI too long
- #2472 @hsheth2 fix(ingest): use postgres data platform urn
- #2468 @gabe-lyons fix(react): removing non-functional page size change controls on results page
- #2469 @RickardCardell feat(react): add custom properties tab on dashboard profile page (#2439)
- #2470 @topwebtek7 fix(bootstrap): update bootstrap data with more realistic nested schemas
- #2466 @gabe-lyons docs(ingest): Update README.md to add superset source
- #2455 @hsheth2 fix(ingest): support Airflow 1.10.x style lineage in Airflow 2
- #2465 @hsheth2 test(ingest): fix mypy issue in schema util test
- #2458 @topwebtek7 feat(react): add dashboards tab in charts entity
- #2463 @hsheth2 feat(ingest): capture default values in Avro schemas
- #2461 @hsheth2 fix(ingest): fields with defaults should be optional
- #2451 @hsheth2 feat(ingest): setup scaffolding for tox testing
- #2456 @hsheth2 docs(ingest): clarify options field and fix bigquery sample config
- #2459 @dexter-mh-lee fix(docker): Nuke ingestion containers when calling docker/nuke.sh
- #2457 @hsheth2 fix(cli): check docker setup containers
- #2453 @bboylen docs: Corrected typo on docs/docker/development.md
- #2460 @hsheth2 fix(build): remove bintray deps
- #2442 @dexter-mh-lee feat(search): Support search terms that are dataset platform names
- #2446 @hsheth2 fix(ingest): setup pyproject.toml
- #2445 @hsheth2 fix(ingest): various updates to datahub rest sink
- #2444 @hsheth2 fix(ingest): add snowflake warehouse and role to config
- #2443 @topwebtek7 feat(react): update schema table hierarchy
- #2448 @jjoyce0510 feat(React): Adding Tags to Previews & Fixing Dashboard / Charts Ownership Updates