Releases: datahub-project/datahub
DataHub v0.8.4
Release Highlights
- Dataset Popularity and Recent Queries, powered by usage logs (supported for Snowflake and BigQuery)
- Markdown descriptions and editing
- New Integrations: Glue Jobs, Feast
- Versioned API for metadata GETs
- No Neo4j requirement: Elasticsearch can now serve as the graph backend
- Docker image hardening
- Improved logging
- GCP Deployment Guide
Changelog
- #2773 @jjoyce0510 feat(logs): add thresholding, misc cleanup
- #2771 @topwebtek7 fix(react): update platform text in dataset profile header
- #2772 @dexter-mh-lee fix(nocode): removing service PDL
- #2761 @jjoyce0510 feat(logs): improve logging in GMS and datahub-frontend
- #2769 @topwebtek7 fix(react): fix graphql apollo cache update issue cause of usagestats
- #2768 @dexter-mh-lee feat(k8s): add GCP deploy recipe
- #2767 @hsheth2 fix(react): update the query frequency text label
- #2766 @hsheth2 docs(ingest): move usage stats docs into the "sources" section
- #2728 @jjoyce0510 fix(datahub-upgrade): removing the CleanupStep from datahub upgrade
- #2763 @dexter-mh-lee fix(docker): Fix dependency vulnerability
- #2765 @hsheth2 fix(react): move percent sign after number and update meta tag
- #2695 @thomasplarsson fix(frontend): auth session ttl specified in hours instead of days. M…
- #2764 @hsheth2 docs: update for Jun townhall
- #2762 @hsheth2 feat: usage stats (part 2)
- #2750 @hsheth2 feat: usage stats (part 1)
- #2759 @topwebtek7 fix(react): reverse result of topological sort, autofocus in add tag modal
- #2753 @gabe-lyons feat(elastic-as-graph): defaulting to elastic in quickstart
- #2760 @hsheth2 fix(docker): use head tag for datahub-ingestion
- #2754 @gabe-lyons Revert "fix(gms): add rest.li validation in gms (#2745)"
- #2751 @dexter-mh-lee fix(browse): sort by doc count descending
- #2752 @gabe-lyons fix(elastic-as-graph): adding elasticsearch setup back in
- #2739 @jjoyce0510 fix(gms): make get return entity type
- #2749 @remisalmon feat(ingest): add option to specify source platform database in lookml ingestion
- #2729 @kevinhu feat(ingest): ingest last-modified from dbt sources.json
- #2746 @dexter-mh-lee fix(docker): modernize docker images and fix vulnerabilities
- #2740 @jjoyce0510 fix(datahub-upgrade): add support for postgres migration
- #2745 @jjoyce0510 fix(gms): add rest.li validation in gms
- #2748 @jjoyce0510 feat(quickstart): remove orphaned docker containers on quickstart through cli
- #2744 @jjoyce0510 feat(docker): reduce quickstart footprint
- #2747 @kevinhu fix(ci): increase wait-for-it timeout to fix flaky feast test
- #2743 @kevinhu feat(ingest): print docker logs on timeout
- #2726 @gabe-lyons feat(graph): support using elasticsearch as graph backend.
- #2723 @gabe-lyons fix(gms): fixes for version aspect fetching
- #2741 @hsheth2 fix(ingest): update lookml test
- #2742 @kevinhu fix(ingest): fix lookml platform URN
- #2687 @kevinhu feat(ingest): add support for Glue ETL jobs
- #2716 @kevinhu fix(ingest): types for dbt
- #2722 @kevinhu fix(ci): increase Feast docker setup timeout
- #2737 @remisalmon fix(looker): fix invalid URN syntax error
- #2730 @dexter-mh-lee fix(docker): update default tags to head
- #2727 @gabe-lyons fix(docs): update extending-the-metadata-model.md
- #2721 @gabe-lyons fixing docs sidebar
- #2719 @dexter-mh-lee Update version for helm
DataHub v0.8.3
Release Notes
Bug fix release that fixes the editable descriptions bug from the previous release.
Previous version release notes: https://github.com/linkedin/datahub/releases/tag/v0.8.2
Changelog
#2718 @topwebtek7 fix(react): update schema description edit behavior
DataHub v0.8.2
Release Notes
Bug fix release that fixes installation, upgrade and usability issues with v0.8.1, specifically around product analytics.
Read the release notes for v0.8.1 here.
The full list of improvements follows.
Changelog
- #2711 @jjoyce0510 fix(noCode): Improving efficiency of EntityService "listLatestAspects" API
- #2715 @hsheth2 feat(quickstart): use versioned docker images when on a release tag
- #2710 @gabe-lyons fix(editable descriptions): adding indexing for editable descriptions
- #2705 @vijayan-nallasami-curve added node_type_pattern in dbt yaml file
- #2702 @vijayan-nallasami-curve Removed key not in catalog filter from dbt and added node filter using AllowDenyPattern
- #2712 @hsheth2 feat(ingest): expose additional types to Python via codegen
- #2674 @kuntalkumarbasu feat: Adding annotation option to frontend service
- #2699 @hsheth2 test(ingest): simplify docker cleanup commands
- #2708 @hsheth2 fix(ingest): use looker data platform
- #2709 @kevinhu feat(ingest): Add test case and docs for SQL view ingestion
- #2707 @topwebtek7 feat(entities): add markdown description update/viewer feature in dataset, datajob, dataflow, chart and dashboard, update ui/ux
- #2677 @gabe-lyons feat(aspects): support fetching of versioned aspects
- #2691 @kwark Add support for different oidc client authentication methods
- #2701 @JeffSkinner feat(gms): support basic auth header when connecting to elasticSearch
- #2697 @dexter-mh-lee fix(NoCode): Fix product analytics
- #2690 @jjoyce0510 feat(gms): Merge MAE, MCE consumers into GMS
- #2689 @jjoyce0510 feat(datahub cli): DataHub CLI Quickstart
- #2672 @kevinhu feat(docs): Docs for S3 ingestion with AWS Glue
- #2686 @remisalmon Only work with dbt catalog data if load_catalog is False
- #2688 @gabe-lyons removing whitespace from service aspect
- #2663 @vlavorini feat(sql_views): added views as datasets for SQLAlchemy DBs
- #2676 @hsheth2 feat(ingest): support Oracle service names
- #2684 @hsheth2 fix(ingest): upgrade acryl-pyhive to use sasl3 instead of sasl
- #2632 @RickardCardell Fix 2592 neo4j connection options
- #2670 @hsheth2 fix(ingest): pin to new mypy version
- #2637 @kevinhu feat(ingest): headers for codegen Python scripts
- #2675 @thomasplarsson fix(analytics): Support sasl authentication to kafka
- #2671 @zack3241 feat(k8s): Add imagePullSecrets to all K8's jobs
- #2667 @zack3241 Add get_identifier to hive source in metadata ingestion
- #2669 @gabe-lyons fix(no-code): Adding Chart input relationship annotations
- #2605 @kevinhu feat(ingest): Feast ingestion integration
- #2666 @kevinhu fix(ingest): fix MyPy stubs
- #2661 @hsheth2 build(docker): test docker builds in pull request CI
- #2664 @dexter-mh-lee docs(nocode): fix typo in autocomplete annotation
- #2662 @gabe-lyons docs(nocode): adding documentation to handle backwards incompatibility issues
- #2660 @hsheth2 fix(ingest): exclude mssql-odbc from "all" extra
- #2655 @dexter-mh-lee fix(NoCode): Update snapshot json to latest
- #2657 @hsheth2 fix(ingest): support mssql encryption via ODBC
- #2658 @hsheth2 fix(docker): use debug tag in local dev images
- #2659 @hsheth2 revert: "fix(docker): pin containers to golden hash for release (#2654)"
- #2656 @gabe-lyons docs(nocode): Adding documentation for no-migration upgrade option
- #2643 @dexter-mh-lee feat(nocode): Add datahub-upgrade job to helm chart and set version to v0.8.1
DataHub v0.8.1
Release Notes
- Bug fix release that fixes installation and upgrade issues with v0.8.0.
- Read the release notes for v0.8.0 here.
Changelog
- #2654 @hsheth2 fix(docker): pin containers to golden hash for release
- #2653 @jjoyce0510 feat(datahub-upgrade): improve no code upgrade logging
- #2651 @hsheth2 docs(nocode): fix links to datahub-upgrade
- #2646 @jjoyce0510 fix(gms): Return empty snapshots when one does not exist
- #2652 @jjoyce0510 fix(datahub-upgrade): fixing mis-spelled schema registry url
- #2650 @hsheth2 fix: use head tag by default in quickstart
- #2649 @dexter-mh-lee docs(nocode): Update migration docs with workaround for stuck upgrade
- #2648 @dexter-mh-lee fix(nocode): Fix docker image tag for run_upgrade script
- #2647 @dexter-mh-lee fix(analytics): fix index names
- #2645 @hsheth2 fix(build): don't purge ingestion codegen files on gradle clean
- #2640 @jjoyce0510 fix(datahub-upgrade): remove debug flag
- #2641 @jjoyce0510 feat(datahub-upgrade): Use the "head" image tag.
- #2642 @gabe-lyons docs(no-code): removing unused snapshot annotations
DataHub v0.8.0
Notable Highlights
- Product Analytics : Understand how your users are interacting with DataHub
- Product Improvements: Auto-complete across types, Task list view under Pipelines
- Features: Business Glossary (incubating)
- Integration improvements
  - Looker, dbt, Hive, Redshift, Glue, MongoDB
  - Kafka Connect (incubating)
and finally,
NoCodeMetadata
This release introduces a major refactor that permits extension of DataHub’s metadata model without writing any imperative code.
Highlights:
- Removed strongly-typed, entity-specific DAOs. Added more generic services.
- Introduced Elastic settings & mappings generation, dynamic index registration & evolution
- Decoupled persistence layer from Pegasus + Java by removing fully-qualified class names (aspects, relationships)
- Introduced declarative, annotation-based mechanisms for defining indexed fields, foreign key fields, entities & aspects
- In-place upgrade CLI to aid in adopting this upgrade (datahub-upgrade)
For more information, see:
- The PR: #2629
- Technical Overview
- The DataHub Metadata Model
- Extending the Metadata Model
- No Code Upgrade Guide
Changelog
- #2629 @jjoyce0510 feat: No Code Metadata Modeling
- #2617 @shirshanka docs: update roadmap with accomplished items
- #2635 @hsheth2 fix(ingest): improve redshift ingestion performance
- #2599 @shubham49 feat(react): replace user urn with username
- #2623 @gabe-lyons fix(react): url encoding urns and tag profile fix
- #2634 @hsheth2 fix(ingest): include urn as key for kafka emitter
- #2636 @dexter-mh-lee fix(ci): update trigger to always generate docker images
- #2622 @RickardCardell feat(react): custom properties are now sortable by name in the UI
- #2626 @thomasplarsson fix(ingestion): improve robustness of glue ingestion source
- #2619 @topwebtek7 fix(react): update ispartofbuilderfromdataflow, update ui in datajob header
- #2620 @jjoyce0510 feat(analytics): support configuration of Kafka SSL
- #2618 @topwebtek7 feat(react): eliminate noises in react build, test and cleanup, get rid of warnings
- #2616 @dexter-mh-lee docs: added AWS deployment guide
- #2615 @gabe-lyons fix(react): fixing tags autocomplete bug
- #2590 @saxo-lalrishav feat(react): business glossary and user - tab based profile page
- #2612 @hsheth2 docs: update homepage text
- #2614 @hsheth2 fix(ingest): fail gracefully when lookml used on old python versions
- #2603 @topwebtek7 feat(react): update collectionname in datajob, update tabs ui/ux
- #2604 @topwebtek7 feat(graphql): redesign autocomplete to search for all entity types, show suggestions grouped by entities
- #2606 @hsheth2 feat(ingest): populate inputDatajobs field in airflow integration
- #2602 @topwebtek7 feat(react): add parent flow link on datajob page
- #2596 @remisalmon fix(ingest): fix lineage after dbt metadata ingestion when tables name and identifier differ
- #2600 @topwebtek7 feat(react): add topological sort feature, update graphql, add tests
- #2607 @hsheth2 feat(ingest): update bigquery demo data
- #2609 @hsheth2 fix(docs): various fixes and additions
- #2601 @frsann feat(ingestion): Fix looker test
- #2577 @shubham49 feat(react): add glossary term to dataset preview
- #2585 @afranzi fix(ingest): incorrect implementation of the allow pattern in looker dashboards
- #2591 @martha feat(react): add optional subtitle to home page
- #2598 @kevinhu fix(ingest): default values for env
- #2575 @hsheth2 docs(ingest): add a guide for writing sources
- #2594 @topwebtek7 feat(react): add nativeDataType with tooltip over icon in schema
- #2595 @havramar docs: Add Plum Research to POC adoption section in README.md
- #2589 @martha feat(react): prevent logo distortion
- #2586 @gabe-lyons fix(react): fix tag autocomplete after creating a new tag
- #2583 @bboylen feat(react): Add label to edited dataset descriptions
- #2579 @topwebtek7 feat(dataflow): update dataflow to have datajobs in new tab
- #2584 @john-bodley fix(docs): Fix Superset typo in README
- #2574 @zack3241 fix(helm charts): remove connection tests from helm charts
- #2582 @hsheth2 build(ingest): show diff upon lint failures
- #2516 @taufiqibrahim feat(ingest): kafka connect metadata ingestion
- #2580 @hsheth2 feat(ingest): add dataset tag transformer
- #2573 @hsheth2 test(ingest): use different mysql test port
- #2549 @shubham49 feat(react): link glossary term to dataset page
- #2572 @hsheth2 test(ingest): ensure transformer registry works for aliases
- #2571 @hsheth2 fix(ingest): better active directory LDAP support
- #2483 @luck02 fix(dbt): set target platform and load schema
- #2563 @afranzi feat(ingest): add AWS IAM Roles Support to the Glue Source
- #2566 @saxo-lalrishav fix(react): Update raw schema view to support non json schemas
- #2570 @saxo-lalrishav fix(react): Removing a user having multiple role from owner tab also remove the other roles associated to that user
- #2562 @sunkickr docs: Add Sphinx Docstrings to Airflow Modules
- #2560 @hsheth2 fix(cli): prevent click from suppressing errors
- #2559 @hsheth2 docs: include license in the readme
- #2561 @hsheth2 fix(ingest): check mypy types for test helpers
- #2558 @shirshanka docs: town-hall updates and some badges
- #2557 @hsheth2 feat(ingest): add options for Airflow lineage backend
- #2467 @pedro93 feat(k8s): generalizes CronJob metadata ingestion resource for custom logic
- #2546 @kevinhu feat(ingest): MongoDB schema inference
- #2556 @gabe-lyons fix(search): have search bar ignore blank searches
- #2553 @gabe-lyons fix(owner): fixing ownership routing
- #2555 @gabe-lyons feat(business glossary): hiding business glossary until all features completed
- #2493 @frsann feat(ingest): Looker view and dashboard ingestion
- #2538 @saxo-lalrishav feat(business glossary): search, browse and entity page for business glossary terms
- #2543 @hsheth2 fix(ingest): register custom Hive types
- #2544 @hsheth2 docs(ingest): improve kafka schema registry config docs
- #2545 @G-nther fix(analytics): use separate env variable for tracking topic in MAE-Consumer
- #2547 @hsheth2 ci(docker): disable GitHub Docker registry
- #2521 @hsheth2 refactor(ingest): move Airflow into datahub_provider module
- #2539 @dexter-mh-lee fix(analytics): add support for AWS ES
- #2540 @afranzi feat(ingest): define Redshift as a Postgres Source
- #2541 @jjoyce0510 fix(react): disable analytics link display
- #2542 @topwebtek7 fix(react): fix type issue with adding new in ownership
- #2531 @hsheth2 build(ingest): use gradle in commands + docs
- #2536 @hsheth2 fix(ingest): remove mce.json file from root
- #2535 @gabe-lyons fix(react): fixing import issue
- #2534 @kevinhu docs: autoplay and navigation for source logos carousel
- #2519 @topwebtek7 feat(usergroup): implement corpgroup in graphql, refactor avatars and ownership in react
- #2532 @hsheth2 feat(ingest): add a transformer for adding ownership
- #2485 @shubham49 feat(graphql): add graphql types for business glossary
- #2533 @dexter-mh-lee fix(k8s): Fix helm charts for supporting analytics
- #2499 @jjoyce0510 feat(Product Analytics): Introducing In-App Analytics Beta
- #2529 @hsheth2 docs: enable better syntax highlighting
- #2528 @kevinhu docs: Use carousel layout for ingestion source logos
- #2530 @jjoyce0510 fix(model): removing reference to go link in SchemaFieldPath model
- #2515 @hsheth2 docs: update docusaurus
- #2527 @dexter-mh-lee fix(k8s): change defaultMode for certs volume
- #2503 @hsheth2 feat(ingest): check in generated schema files
- #2526 @nickwu241 fix(k8s): fix kafka-setup-job.yml datahub-certs-dir mountPath
- #2512 @dexter-mh-lee fix(k8s): comment out minikube specific settings
- #2522 @hsheth2 fix(ingest): generate Airflow tags correctly
- #2523 @topwebtek7 feat(react): set fixed height for dataset preview
- #2525 @kevinhu docs: Add ingestion source logos grid
- #2524 @hsheth2 fix(ingest): add support for custom postgres types
- #2228 @shakti-garg feat(business_glossary): add new entity business term and its relationship with dataset and its fields
- #2520 @hsheth2 fix(build): only check for src/ and tests/ directories for lint checks
- #2514 @hsheth2 docs: update Wolt logo
- #2513 @hsheth2 build(ingest): include package data in sdist
- #2510 @hsheth2 build(ingest): add metadata-ingestion to gradle build
- #2509 @hsheth2 docs: improve airflow explanations and examples
- #2508 @hsheth2 fix(ingest): remove double edges from Airflow lineage backend
- #2505 @vlavorini docs: fixed MCE file recipe example
- #2500 @dexter-mh-lee docs(k8s): Update readme with helm prerequisite
- #2501 @gabe-lyons feat(lineage): removing dataset<>dataset edge in job...
DataHub v0.7.1
Notable Highlights
- Lineage Visualization
- Pipelines and Tasks, Flows and Jobs
- Airflow Lineage
- Editable Field Descriptions
- Nested Schema Viz
- Search Improvements
- datahub CLI
- Official PyPi packages
- Production-quality Helm scripts
- New Integrations
- Officially-supported Sources: Airflow, AWS Glue, dbt, Druid, Superset, MongoDB, Oracle
Changelog
- #2440 @dexter-mh-lee feat(k8s): Move helm charts out of contrib
- #2397 @gabe-lyons feat(lineage): implement support for datasets, charts and dashboards downstream lineage fetching in a generic way
- #2434 @adriaanslechten feat(ingest) LDAP groups ingestion
- #2438 @hsheth2 fix(ingest): use entrypoints lib instead of pkg_resources
- #2425 @gabe-lyons feat(ingest): adding superset ingestion source
- #2433 @topwebtek7 fix(react): fix lineage sidebar buttons
- #2436 @hsheth2 fix(ingest): support custom snowflake types
- #2419 @topwebtek7 feat(react): add dataJob, dataFlow entity pages, refactor with fragments
- #2418 @frsann Fix(search): fix datajob and dataflow search mappings
- #2429 @hsheth2 fix(ingest): fix chart type enum serialization and add tests for rest emitter
- #2431 @shirshanka docs: Update agenda for Apr 23 townhall
- #2427 @hsheth2 fix(ingest): ensure upstreams in airflow lineage emission are entities
- #2426 @hsheth2 fix(ingest): include database info for snowflake
- #2424 @hsheth2 feat: add s3 data platform and logo
- #2423 @topwebtek7 feat(react): schema visualization add support for nested structs
- #2422 @topwebtek7 fix(react): lineage sidebar buttons should refer to the selected entity
- #2421 @dexter-mh-lee fix(kafka-setup): Fix start script for kafka setup
- #2417 @topwebtek7 feat(react): update dataset entity default svg icon
- #2411 @thomasplarsson feature(ingestion): Adding the concept of transformers
- #2415 @dexter-mh-lee fix(k8s): Add credentials to kafka-setup job and clean up
- #2412 @hsheth2 feat(ingest): add Kafka-based emitter example
- #2413 @gabe-lyons fix(lineage): allow lineage viz to handle circular dependencies
- #2414 @dexter-mh-lee fix(kafka-setup): Add the correct context to the git workflow for pushing kafka-setup image
- #2403 @hsheth2 fix(ingest): bump avro-gen3
- #2406 @topwebtek7 feat(react): use default entity icon if lineageentity has no icon
- #2408 @hsheth2 fix(ingest): properly handle fieldDiscriminator with restli
- #2409 @hsheth2 fix(ingest): add sqlalchemy extra
- #2398 @G-nther feat(kafka-setup): add option for SSL and topic partition config via environment
- #2404 @dexter-mh-lee feat(k8s): add extraEnvs to setup jobs
- #2407 @topwebtek7 feat(react): add footer buttons in lineage sidebar
- #2405 @thomasplarsson feature(ingestion): Make origin/fabric_type configurable
- #2384 @topwebtek7 feat(react): add padding between tags and description on datasets profile page
- #2396 @gabe-lyons feat(sample): adding sample mces for dataflows and datajobs
- #2400 @hsheth2 fix(ingest): streamline codegen init methods
- #2382 @topwebtek7 feat(react): update schema table to have fixed description column, set line break with max description width
- #2402 @dexter-mh-lee fix: Fix env variable setup for kafka, mysql-setup docker containers
- #2401 @hsheth2 fix(ingest): add db name to postgres URNs
- #2393 @hsheth2 fix(ingest): enable mypy disallow_incomplete_defs and disallow_untyped_decorators
- #2395 @gabe-lyons fix(react): fix access to pictureLink in charts and dashboards
- #2399 @gabe-lyons fix(tags): check description existence on tags
- #2383 @topwebtek7 feat(react): fix long descriptions overflow issue in lineage side panel
- #2392 @hsheth2 refactor(ingest): update test harness to use a compose file per test
- #2391 @topwebtek7 feat(react): fix browse link of last breadcrumb linked to unknown page
- #2385 @dexter-mh-lee feat(mysql-setup): Add the ability to specify database name for mysql-setup
- #2389 @hsheth2 feat(ingest): add generic sqlalchemy source
- #2390 @dexter-mh-lee feat(k8s): Add ability to add service accounts to setup jobs
- #2387 @dexter-mh-lee fix(kafka-topic-convention): Fix DAOs that do not refer to TopicConvention
- #2386 @dexter-mh-lee feat(index): Add index naming convention for elasticsearch
- #2388 @hsheth2 fix(ingest): report correct version status in dev mode
- #2368 @hsheth2 feat(ingest): add Airflow lineage backend
- #2380 @OddCN fix(docs): fix config example for file sink
- #2362 @dexter-mh-lee feat(k8s): Update pods with correct probes and remove unnecessary dependencies
- #2372 @thomasplarsson fix(ingestion): dont crash on non-RecordSchema topics
- #2360 @hsheth2 docs(ingestion): remove outdated data-source-onboarding.md docs
- #2376 @topwebtek7 feat(react): hide Owned By label in card if no owners
- #2373 @shubham49 fix(react): ownership rendering
- #2377 @topwebtek7 feat(react): add null state indicator in user profile when no entities
- #2379 @topwebtek7 feat(react): update avatar to use initial if no image, refactor all avatars with custom one
- #2369 @gabe-lyons feat(lineage): support arbitrary entity types in lineage viz
- #2364 @thomasplarsson fix(ingestion): Support mapping from avro "boolean" and "map" types t…
- #2343 @thomasplarsson fix(ingestion): properly detect optional fields in avro schemas
- #2370 @topwebtek7 feat(react): add empty state UI for browse when no entities
- #2242 @frsann feat(datajob): Datajob graphql query
- #2367 @topwebtek7 feat(react): add dropdown menu links, menu styling, removed warnings
- #2365 @frsann chore(dependabot): Update pyyaml version
- #2366 @topwebtek7 feat(react): add icons on entities, updated styling in LineageViz
- #2351 @hsheth2 fix(ingest): add test for avro serialization and deserialization
- #2361 @hsheth2 feat(cli): Add support for checking docker memory usage
- #2358 @topwebtek7 feat(react): original description shows in edit modal even when the description has been updated
- #2357 @gabe-lyons feat(react): improving error logging on dataset entity
- #2356 @dexter-mh-lee fix(elasticsearch): Fix inconsistencies between documents and elasticsearch mappings
- #2359 @hsheth2 fix(ingest): support python3 -m datahub
- #2353 @hsheth2 chore(ingest): remove unused plugin_requirements.txt file
- #2352 @hsheth2 fix(ingest): bump pybigquery version
- #2350 @hsheth2 fix(ingest): support datahub --version
- #2349 @gabe-lyons feat(lineage): improve lineage re-focus experience
- #2341 @frsann feat(tags): Add tag graph builder
- #2348 @jjoyce0510 fix(Ember App): Allow ember build (disabled by default)
- #2345 @hsheth2 fix(cli): add --verbose flag for datahub check plugins
- #2346 @gabe-lyons fix(lineage): add upstream arrows back in
- #2347 @hsheth2 feat(ingest): add Oracle db support
- #2336 @topwebtek7 feat(react): add description edit behavior along with modal
- #2340 @gabe-lyons feat(lineage): adding ghost edges indicating hidden dependencies
- #2331 @hsheth2 feat(ingest): start airflow integration + metadata builders
- #2339 @hsheth2 fix(ingest): add support for database and table patterns to glue source
- #2338 @hsheth2 fix(docker): remove restart: always from docker-compose for consistency
- #2335 @gabe-lyons feat(lineage): adding directionality to lineage edges to make the visualization more clear
- #2337 @gabe-lyons fix(lineage): fixing lineage layout bugs
- #2319 @amonkhouse feat(ingest): adding support for AWS Glue
- #2312 @shakti-garg feat(es-setup): add logic in elasticsearch setup to compare-and-update index if already exists
- #2333 @gabe-lyons feat(lineage): expandable lineage visualization for dataset <> dataset lineage
- #2332 @hsheth2 docs: add wolt logo to frontpage
- #2315 @grantatspothero feat(ingest): adds experimental support for ingesting Looker metadata
- #2330 @luck02 fix(test): dbt-manifest files
- #2329 @topwebtek7 feat(react): moving filter panel from modal to drawer
- #2328 @hsheth2 build: remove deprecated ember app from build
- #2327 @hsheth2 feat(ingest): verify dynamic registry types at runtime
- #2316 @joemirizio feat(ingest): dynamically register plugins
- #2325 @hsheth2 fix(ingest): remove outdated metadata-ingestion scripts
- #2313 @shakti-garg fix(k8s): make es-setup job parameters more contextual
- #2322 @gabe-lyons docs(theme): making yarn start instructions more explicit
- #2317 @hsheth2 doc: update slack links to https
- #2324 @frsann fix(datajob): Fix URI templates for datajob and dataflow
- #2320 @frsann fix(tags): Support creating tags with MCE
- #2323 @arunvasudevan fix(docs): Update metadata-serving.md
- #2318 @dexter-mh-lee fix(docker): Fix issue in gms start.sh
- #2321 @shirshanka docs: Update next townhall details, fixup links and misc docs
- #2251 @bernardino feat(Kubernetes): Add JMX exporter containers to all DataHub components
- #2308 @dexter-mh-lee fix(search): Fix styling for column match snippet
- #2302 @shakti-garg feat(k8s): Add k8s hook in datahub helm chart for setting up elasticsearch
- #2298 @dexter-mh-lee feat(docker): Add dockerfile for initializing an existing mysql server
- #2297 @shakti-garg feat(kafka-config): add variable KAFKA_CONSUMER_GROUP_ID to ove...
DataHub v0.7.0
Notable Highlights
- New React Application re-written from the ground up
- Support for GraphQL
- New Metadata Ingestion Framework (Python)
- Officially-supported Sources: Kafka, MySQL, SQL Server, Hive, Postgres, Snowflake, BigQuery, AWS Athena, Druid, LDAP
- New Homepage and Hosted Docs redesign at datahubproject.io
- Product Features: SSO (OIDC), Tags, Themes, Dashboards
- Metadata Backend Implementations: MLModel ecosystem, DataFlow ecosystem
- Move to Elasticsearch 7. Migration guide from 5.x here
Changelog
- #2263 @jplaisted feat(search) BREAKING Support ElasticSearch 7, drop ES5
- #2260 @gabe-lyons fix(tags): fixing margins on tags for long descriptions
- #2259 @hsheth2 docs: update roadmap progress
- #2258 @dexter-mh-lee refactor(demo): Add empty global tags to BigQuery demo data
- #2255 @jjoyce0510 feat(react): Adding shadow and deeper linear gradient
- #2254 @gabe-lyons feat(tags): improving elastic search templates for tags
- #2253 @gabe-lyons fix(tags): fix ownership on tag create
- #2256 @hsheth2 fix: update slack links
- #2248 @gabe-lyons feat(tags): editing tags from react client on datasets, schemas, charts & dashboards
- #2252 @jjoyce0510 refactor(react): React as the default UI
- #2246 @hsheth2 feat(ingest): various minor fixes
- #2245 @jjoyce0510 feat(react): Adding big query logo
- #2249 @gabe-lyons fix(react): enabling charts and dashboards to be supported by theme config
- #2235 @pedro93 feat(ingest): Add support for druid
- #2244 @gabe-lyons feat(react): moving schema tab to be default
- #2243 @shirshanka docs: adding mar-19 townhall agenda
- #2240 @dexter-mh-lee feat(tags): Enable search for datasets by tags
- #2236 @pedro93 feat(k8s): Add metadata-ingestion as a Helm component
- #2241 @shirshanka docs: Improving architecture docs
- #2239 @hsheth2 feat(docs): use gradle for building docs
- #2232 @hsheth2 fix(ingest): various avro codegen fixes
- #2237 @gabe-lyons fix(dataflow): fixing browse dao access
- #2166 @arunvasudevan feat: MLmodel Graphql Query
- #2197 @frsann feat(datajob): Backend implementation
- #2233 @jjoyce0510 refactor(react): All entity search UI + misc improvements
- #2234 @jjoyce0510 docs(react): Oidc React Doc Updates
- #2231 @dexter-mh-lee fix(docker): start issue when there are multiple kafka brokers in bootstrap config
- #2227 @jjoyce0510 refactor(React): Misc UI improvements
- #2230 @hsheth2 fix(ingest): pin version of avro-gen3
- #2226 @hsheth2 fix(ingest): use python extras in docker image
- #2224 @hsheth2 feat(ingest): use plugin system based on Python extras
- #2190 @jjoyce0510 feat(react): SSO support simple OIDC authentication
- #2223 @dexter-mh-lee Added images to es/kafka-setup
- #2222 @dexter-mh-lee fix(ci): rename file to match git workflow needs
- #2220 @dexter-mh-lee fix(ci): remove paths_ignore from workflow files
- #2219 @thomasplarsson refactor(ingest): improve athena source api and documentation
- #2221 @gabe-lyons fix(ci): setting CI to false for builds
- #2218 @gabe-lyons feat(react): hiding raw schema button when no raw schema exists
- #2216 @dexter-mh-lee fix(es-setup): Add git workflows to upload docker for elasticsearch and kafka setup
- #2213 @thomasplarsson feat(ingest): add aws athena ingestion source
- #2217 @gabe-lyons fix(ci): fail CI on react build errors
- #2215 @gabe-lyons fix(react): fix theming test in react and simplifying api
- #2209 @thomasplarsson feat(ingest): add option for optimized skipping of schemas
- #2212 @hsheth2 fix(ingestion): nullable types and timestamp precision
- #2207 @hsheth2 feat(ingest): standalone metadata emitters
- #2205 @dexter-mh-lee fix(ci): Fix github package path
- #2204 @dexter-mh-lee feat(ci): Add SHA based tagging before pushing to docker registries
- #2203 @gabe-lyons feat(tag): adding search for tags in gms layer
- #2193 @gabe-lyons feat(react): adding ability to support theming of datahub, with two themes included
- #2201 @hsheth2 feat: add date and time types to SQL model
- #2202 @thomasplarsson feat(mae-consumer): enable mae-consumer to use ssl when communicating with elasticsearch
- #2199 @thomasplarsson fix(mae-consumer): mae-consumer needs sslcontext bean
- #2181 @shirshanka chore: renaming business_glossary rfc directory with pull request number
- #2182 @shirshanka chore: renaming graphql_frontend rfc directory with pull request number
- #2183 @shirshanka chore: renaming react-app rfc directory with pull request number
- #2196 @shirshanka docs(roadmap): update project roadmap
- #2195 @jjoyce0510 fix(graphql): Add "fixed" SchemaFieldDataType mapping
- #2194 @gabe-lyons feat(tags): Enriching sample data for tags
- #2191 @hsheth2 feat(docs): automatically populate sidebar with RFCs
- #2192 @jplaisted (feat) Simple python script to carry over ES indices from 5 to 7.
- #2173 @brendansun93 feat(React): Ownership component of user profile
- #2189 @thomasplarsson feat(gms): add elasticsearch SSL support
- #2112 @frsann feat(tags): RFC for tags
- #2187 @gabe-lyons fix(react): fixing test issues that arose from ill-timed merges
- #2164 @gabe-lyons feat(tags): adding support for read/write of tags in gms & read-only in react datahub-frontend.
- #2185 @jjoyce0510 feat(graphql): More forgiving for unknown data platforms during reads
- #2184 @jjoyce0510 test(React): Home page tests
- #2186 @hsheth2 fix(docs): fix broken links
- #2179 @gabe-lyons feat(react): adding raw schema view option for table schemas
- #2178 @hsheth2 feat(ingest): bigquery sample data
- #2176 @hsheth2 docs: point to hosted docs site
- #2177 @hsheth2 docs(ingest): clarify setuptools requirement
- #2175 @hsheth2 build(docs): only deploy docs on main repo
- #2174 @hsheth2 docs: hosted documentation website
- #2167 @jjoyce0510 feat(React): Impl browse UI for Dashboards and Charts
- #2168 @jjoyce0510 fix(React): Fix Browse Pagination Bug
- #2172 @hsheth2 fix(ingest): loosen Kafka broker validation
- #2165 @jjoyce0510 feat(DataPlatform Logos): Adding server driven logos
- #2171 @hsheth2 docs(ingest): clarify Kafka connection config
- #2169 @shirshanka doc(townhall): Add links for Feb 19, upcoming townhall on Mar 19
- #2161 @hsheth2 fix(ingest): bigquery source and dataset naming fixes
- #2163 @jjoyce0510 fix(graphql): Bubbling up exceptions logged in GraphQL resolvers
- #2159 @hsheth2 build(ingest): use multi-stage docker build for datahub-ingestion
- #2157 @hsheth2 feat(ingest): capture table descriptions
- #2158 @hsheth2 feat(ingest): switch quickstart to Python ingestion
- #2156 @pedro93 feat(ingest): support alternative authentication in sql ingestion
- #2152 @gabe-lyons fix(react): fixing format we propagate filters to graphql in
- #2154 @gabe-lyons feat(react): Redirecting /assets to index
- #2151 @hsheth2 build(docker): add large generated directories to dockerignore
- #2150 @hsheth2 ci(ingest): setup docker container for metadata ingestion
- #2145 @RickardCardell feat: neo4j Bolt TLS support (#2100)
- #2143 @dexter-mh-lee feat(dashboards): Add browse end point for charts and dashboards
- #2144 @RickardCardell feat: neo4j https support (#2101)
- #2147 @gabe-lyons docs(frontend): Update docs to clarify running local frontend w/ local react app
- #2148 @jjoyce0510 feat(gms): Add optional data platform display name
- #2149 @jplaisted Switch GMA dep from bintray to artifactory.
- #2146 @jjoyce0510 Fixing required audit stamps bug
- #2140 @jjoyce0510 feat(React): Search page UI improvements, 'all' entity search.
- #2133 @thomasplarsson feat(datahub-dao): enable services to access gms over https
- #2136 @hsheth2 feat(ingest): support Postgres PostGIS extensions
- #2139 @gabe-lyons docs(Ownership): making lack of support for ownergroups in frontend explicit in pdl
- #2137 @dexter-mh-lee refactor(docker-dev): set up elasticsearch using local mapping on docker-compose.dev
- #2135 @hsheth2 ci(ingest): run apt update
- #2134 @hsheth2 refactor(ingest): cleanup configuration models
- #2130 @jjoyce0510 feat(React UI): SearchPage and SearchResultsPage
- #2132 @jjoyce0510 Add URL to dashboard / chart page
- #2131 @gabe-lyons fix(React): Adding test coverage for search page & fixing filter select bug
- #2128 @jjoyce0510 fix(react): Fix authenticated user profile
- #2125 @hsheth2 fix(ingest): gracefully handle unknown types
- #2127 @jjoyce0510 feat: Introducing optional DataPlatform logo url
- #2124 @hsheth2 fix(ingest): update sample MCEs based on MLModel changes
- #2126 @jjoyce0510 fix(gms): fix getAllDataPlatforms bug
- #2123 @hsheth2 docs(ingest): add solutions for common install issues
- #2122 @hsheth2 feat(ingest): add support for LDAP ingestion
- #2120 @hsheth2 test(ingest): verify the output of mssql
- #2119 @jjoyce0510 feat(React): Adding basic chart + dashboard UI
- #2115 @brendansun93 feat(React): Avatar dropdown menu and logout function
- #2121 @hsheth2 feat(ingest): improve error reporting for pipelines
- #2117 @jjoyce0510 feat(GraphQL API): GQL implementation of Charts + Dashboards
- #2118 @...
DataHub v0.6.1
Added
- #2021 Add a CODEOWNERS file @jplaisted
- #1884 feat(dashboard): Dashboards backend implementation @keremsahin1
- #2001 feat(dataset): Enable search of datasets by field names @nagarjunakanamarlapudi
- #1986 feat: enable SCSI for datasets @jywadhwani
- #1936 feat(field-level-lineage): Add models for field level lineage @nagarjunakanamarlapudi
- #1842 feat(business-glossary):RFC for Business Glossary @pmsrao
- #1985 add LocalDAOStorageConfigFactory for SCSI @jywadhwani
- #1978 add SCSI bootstrap script for datasets @jywadhwani
Changed
- #2027 fix: ingestion docker image @jplaisted
- #2022 Fix dataset index creation issue @nagarjunakanamarlapudi
- #2008 feat(models): Add DataFlow and DataJob models @hshahoss
- #2009 fix/docs(frontend): Syncs UI with internal frontend @cptran777
- #2016 docs: upload updated deck @mars-lan
- #2015 docs: update links @mars-lan
- #2011 Townhall agenda for December 4 @nagarjunakanamarlapudi
- #2007 Bump GMA to latest @jplaisted
- #2005 feat(kubernetes): Add pod-level annotations to the datahub helm charts @shakti-garg-saxo
- #2004 1995 | fix indentation value in helm deployment templates @shakti-garg-saxo
- #1999 Update doc for configuring topic names @shakti-garg-saxo
- #1979 refactor(gms): use BaseLocalDAO as the interface in factories & rest.li resources @mars-lan
- #1932 feat(dashboard): Dashboard models update @keremsahin1
- #1991 fix: fix build definition of DatasetFieldUrn @jplaisted
- #1977 [Breaking] Update to GMA 0.2.0 and fix Urn definitions. @jplaisted
- #1989 2020-10-10 Synchronizing datahub-web {COMMIT-SYNC:7f757e3a514fdeff1de922112f182386bd322228} @igbopie
- #1981 1604086049622-ui-sync @igbopie
- #1988 Updates to town hall history and next town hall @nagarjunakanamarlapudi
- #1987 docs: update UI credential requirement for Quickstart @shakti-garg-saxo
- #1982 docs: update agenda of town hall @nagarjunakanamarlapudi
DataHub v0.6.0
Added
- #1940 add aspects to VALUE model of datasets @jywadhwani
- #1820 feat(Azkaban entities): RFC for Azkaban Flows and Jobs @hshahoss
- #1841 feat(field-level-lineage): RFC for field-level-lineage @nagarjunakanamarlapudi
Changed
- #1972 refactor search index builder to store urn parts efficiently @jywadhwani
- #1971 test: improve test coverage for DatasetIndexBuilder. @jplaisted
- #1969 feat: enable default restli documentation @mars-lan
- #1968 fix: add placeholder for logging call parameter @claudio-benfatto
- #1955 refactor: move code to linkedin/datahub-gma. @jplaisted
- #1931 Bump to datahub-gma 0.1.0 @keremsahin1
- #1962 Update faq.md @pardhugunnam
- #1960 Upgrade neo4j to 4.0 @keremsahin1
- #1958 fix: validate entity type for an urn @jywadhwani
- #1950 fix(login): Fix login error when corp user editable information is not present. Fixes #1948 @nagarjunakanamarlapudi
- #1949 Moves remaining references to non-inclusive language @cptran777
- #1947 Catch up fe to internal - includes module consolidations for faster build times @cptran777
- #1944 docs: update links @mars-lan
- #1933 feat(frontend): Catchup frontend for internal development changes @cptran777
- #1939 datasets client to extend browsable client @jywadhwani
- #1938 Change favicon and logo to be datahub instead of linkedin @cptran777
- #1937 refactor search index builder to store urn parts efficiently @jywadhwani
- #1913 Update tab.ts @andrewkantor
- #1935 Fixes issue where user avatar reaches internal page and improves aspects fetching from UI @cptran777
- #1934 docs: correct search over new field docs @shubhamg931
- #1929 build(docker): use community version of ES & Kibana in quickstart @mars-lan
Deleted
- #1973 get rid of search mock utils @jywadhwani
- #1964 refactor: drop unused models to prevent drifts @mars-lan
DataHub v0.5.0
Added
- #1775 feat(dashboard): Dashboard metadata models @ksahin
- #1818 doc(rfc): Add requirements / non requirements section to RFC. @jplaisted
- #1805 Start adding java ETL examples, starting with kafka etl. @jplaisted
- #1812 feat(ML models): RFC for ML models @jywadhwani
- #1721 feat: add ML models @arunvasudevan
- #1859 feat(platform): add "postgres" as a supported data platform @mars-lan
- #1844 feat(frontend): Module consolidation for some test modules and reduces errors from unsupported API calls @catran
- #1837 feat: add MCE ingestion support for CorpGroup @mars-lan
- #1821 feat(frontend): Module consolidation - clean up for OS logic - init virtual assistant @catran
Changed
- #1927 Announce DataHub's participation in Hacktoberfest @nagarjunakanamarlapudi
- #1924 Update next townhall meeting id @nagarjunakanamarlapudi
- #1916 refactor(gms): reorganize GMS factory namespace @mars-lan
- #1921 Update of townhall schedule for the next quarter @nagarjunakanamarlapudi
- #1918 fix(metadata-ingestion): Fix auditStamp unix timestamp format in sql etl ingestion @grantatspothero
- #1914 docker: Run as non-root user in docker @frsann
- #1912 doc: update search-over-new-field.md @ibona
- #1905 Adds UI support for custom dataset properties @catran
- #1909 docs: Update for topic name configuration @jplaisted
- #1904 frontend code migration, unused code removal, font update and minor improvements @catran
- #1894 Add new spring factories to customize metadata event topic names. @jplaisted
- #1903 docs: update links @mars-lan
- #1901 docs: add Budapest talk @mars-lan
- #1900 build: fix build by adding zookeeper dependency explicitly @mars-lan
- #1898 Bump up kafkaAvroSerde to support SSL for Schema Registry @themightylaz
- #1899 fix(docker): update MAE and MCE consumer images to include a glibc compat layer, allowing the consumer jobs to handle Snappy-compressed Kafka topics when running on Alpine Linux @grantatspothero
- #1895 [BREAKING] Break dependency of ebean-dao on metadata-models. @jplaisted
- #1897 docs: update town hall history @mars-lan
- #1893 add default KAFKA_BOOTSTRAP_SERVER @liangjun-jiang
- #1871 feat: Port mce-cli to Java. @jplaisted
- #1889 fix (docker): Fix install of Chrome in frontend Dockerimage @frsann
- #1873 build: add failure notification on push @mars-lan
- #1881 Adds ability for midtier to serve custom dataset properties from aspect @catran
- #1880 Fixes current user entity not being populated correctly @catran
- #1874 fix (frontend): Partially fixes lineage issues and dataset API handling @catran
- #1872 build: fix build @mars-lan
- #1868 Small fixes to mce_cli @jplaisted
- #1863 fix(gms): update kafka client libraries to a newer version to support schema registry basic auth + SSL @grantatspothero
- #1857 1849 support SSL in mce_cli.py @fabiofilz
- #1839 fix(ingestion): set schema registry URL correctly for FMCE producer @mars-lan
- #1838 build(node): replace broken & unmaintained gradle node plugin @mars-lan
- #1835 Pushing internal consolidation of modules to open source @catran
- #1828 docs: add external link @mars-lan
Removed
- #1925 remove CorpUsersClient file @jywadhwani