Releases: datahub-project/datahub
DataHub v0.5.0-beta
Changed
- #1806 Updated the frontend code, which was more than 6 months behind the internal frontend code. We're not fully caught up yet, hence the BETA release, but this closes much of the gap. Major refactorings are included.
DataHub v0.4.3
Added
- #1782 improve security of k8s / helm charts
- #1791 Add description of dataset to the search index
- #1803 Add an example crawler for MS SQL
- #1811 Sync our internal backend code externally to HEAD (we're caught up now!)
- Added ESBulkWriterDAO to bulk write to Elasticsearch. Planned usage is for integration tests.
- Add Strongly Consistent Secondary Index (SCSI) Implementation for MySQL.
- Start adding code to generate aspect-entity specific metadata events, rather than our current single event approach.
- Add support in the GMS to ask for no aspects on entities by setting the aspectNames param to null (omitting the param is still treated as asking for all aspects). Useful when checking whether an entity exists, since it avoids a large response (e.g. performing a search to get only URNs back, and nothing else).
- Added
Changed
- #1777 Add docker files for development
Fixed
- #1808 Clear dataset description from search index when cleared in source
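The aspectNames behavior noted under v0.4.3 distinguishes three cases: param omitted (all aspects), param set to null (no aspects), and param set to a list (only those aspects). The sketch below illustrates that distinction only; it is not the GMS implementation. The in-memory `STORE`, the `get_entity` function, the `_OMITTED` sentinel, and the simplified URN string are all hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Optional

# Sentinel used to tell "param omitted" apart from "param explicitly null".
# (Hypothetical: the real GMS handles this at the REST layer.)
_OMITTED = object()

# Hypothetical, simplified URN and in-memory store: URN -> {aspect name -> value}.
EXAMPLE_URN = "urn:li:dataset:example"
STORE: Dict[str, Dict[str, Any]] = {
    EXAMPLE_URN: {
        "ownership": {"owners": ["jdoe"]},
        "status": {"removed": False},
    },
}


def get_entity(urn: str, aspect_names: Any = _OMITTED) -> Optional[Dict[str, Any]]:
    """Return the entity with the requested aspects, or None if it does not exist."""
    aspects = STORE.get(urn)
    if aspects is None:
        return None
    if aspect_names is _OMITTED:
        # Param omitted: still treated as asking for all aspects.
        selected = dict(aspects)
    elif aspect_names is None:
        # Param explicitly null: return no aspects, only the URN.
        selected = {}
    else:
        # Explicit list: return only the named aspects that exist.
        selected = {name: aspects[name] for name in aspect_names if name in aspects}
    return {"urn": urn, "aspects": selected}


if __name__ == "__main__":
    # Existence check with a small response: URN only, no aspects.
    print(get_entity(EXAMPLE_URN, aspect_names=None))
    # Omitting the param returns every aspect.
    print(get_entity(EXAMPLE_URN))
```

A sentinel default is one way to keep "omitted" and "null" separate in Python, where both would otherwise collapse to `None`.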
DataHub v0.4.2
Added
- #1711 feature(ingest): add bigquery ETL script @mars-lan
- #1712 feat(ingest): add PostgreSQL ETL script @mars-lan
- #1713 feat(ingest): replace custom hive-etl with sql-based ETL @mars-lan
- #1714 feat(ingest): add snowflake ETL script @mars-lan
- #1706 Implemented data process search feature @liangjun-jiang
- #1742 feat(gms): add postgres & mariadb supports to GMS @mars-lan
- #1752 build: build GitHub Pages from /docs directory @mars-lan
- #1745 feat(kafka-config): Add ability to configure other Kafka props @jsotelo
- #1754 Add documentation around the DataHub RFC process @jplaisted
Changed
- #1710 Refactor all ETL scripts to using Python 3 exclusively @mars-lan
- #1733 refactor(models): remove internal cluster model @hshahoss
- #1756 metadata-models 72.0.8 -> 80.0.0 @jywadhwani
- #1757 docs: add a sequence diagram and a description @liangjun-jiang
Removed
Fixed
- #1716 fix(py3): Bump ingestion Docker py dependency to 3.6 @keremsahin1
- #1726 fix: modify the etl script dependency @cobolbaby
- #1727 fix: correct the way to catch the exception @cobolbaby
- #1758 fix(ingestions): align the default kafka topics with PR @RealChrisL
DataHub v0.4.1
Added
- #1680 Data process entity @liangjun-jiang
- #1695 Implement data process graph feature
- #1708 feature(etl): add SQLAlchemy-based ingestion script @mars-lan
- #1707 Support for volta in web client @cptran777
- bbf7545 build: parallelize docker image builds @mars-lan
Changed
- #1700 Add missing updates from recent internal push @keremsahin1
- #1693 metadata-models 62.0.3 -> 72.0.8 @jywadhwani
- #1687 build(docker): refactor docker build scripts @mars-lan
- #1690 build(docker): refactor ingestion docker build script @mars-lan
- #1691 upgrade the version of neo4j @jywadhwani
- #1685 move the gradle plugin version to top level build.gradle @jywadhwani
- 63943a1 build: update workflows to build version-tagged docker images upon new release @mars-lan
Fixed
- #1697 fix: remove helm container command @jsotelo
- #1698 fix: add missing neo4j.host helm var @jsotelo
- #1709 [fix] load default picture link if not present @jywadhwani
- #1704 fix-DatasetSearchConfig class ref @geosmart
- f79b2c9 fix(ingestion): Fix sample MCE for data process @keremsahin1
- 867dbd0 fix: use tuple notations for union types @mars-lan
DataHub v0.4.0
Added
- #1568 Allow to store Quickstart dockers data in a folder for persistence @afranzi
- #1602 feat: support for Kubernetes-based deployment @bharatak
- #1608 add lineage hive @clojurians-org
- #1609 add support for kubernetes helm packaging @bharatak
- #1611 init jdbc generator @clojurians-org
- #1613 add oracle driver @clojurians-org
- #1629 feat: Converting MCE to a Spring boot Application @arunvasudevan
- #1635 feat: convert MAE application to springboot @arunvasudevan
- #1637 add postgresql support and force utf8 encode on non-utf8 locale @clojurians-org
- #1647 Add openldap-etl script and instruction @loftyet
- #1673 add DataProcess Urn @loftyet
- #1678 refactor(pdl): convert all pdsc to pdl @mars-lan
- #1677 feat(urn): add AzkabanFlow and AzkabanJob urn @hshahoss
Changed
- #1601 build: bypass testing datahub-web when running idea gradle task @mars-lan
- 6ab2ab6 build(mysql): Change mysql dependency from latest to 5.7 @keremsahin1
- #1610 metadata-models 54.0.1 -> 58.0.1 @jywadhwani
- #1616 metadata-models 58.0.1 -> 62.0.3 @jywadhwani
- #1619 refactor(gms): move gms restli resources @jywadhwani
- #1624 build(gms): rename JettyRunWar task to run @mars-lan
- #1626 refactor(frontend): fails loudly to help debug gms issue @mars-lan
- #1633 add field for ui and parser reference @clojurians-org
- #1641 migrate hive generator @clojurians-org
- #1662 style: add checkstyle and IDEA code style config @mars-lan
- #1664 build: update pegasus to v28 to add PDL support @mars-lan
- #1667 refactor: change the default log location @mars-lan
- #1669 refactor: use named volume instead of bind mount in quickstart @mars-lan
Deprecated
Removed
Fixed
- #1605 specify explicit avro lib for compatibility issue @jhsenjaliya
- d1cf628 Fix: Docker Quickstart - Sample Data Loading Error @RealChrisL
- ba33c7a Specify python version in mce-cli requirement.txt @RealChrisL
- #1621 fix: elasticsearch not starting on Mac @mars-lan
- #1622 build: pegasus plugin doesn't work well with gradle caching @mars-lan
- #1625 fix(gms): unable to find registered resources @mars-lan
- #1630 fix: Reduce gms & frontend docker image sizes @keremsahin1
- #1631 fix(Docker): Fixing 'dockerize not found' issue while starting @keremsahin1
- #1632 fix: Reduce mae-consumer & mce-consumer docker image sizes @bharatak
- #1646 fix(metadata-ingestion): pass schema_record to mce-cli consumer @RealChrisL
- #1657 fix(quickstart): set utf8mb4 for mysql @e11it
- #1661 fix(urn): Move UrnCoercer into corresponding Urn class @mars-lan
- #1665 fix: use semantic instead of literal comparison in DefaultEqualityTester @mars-lan
- #1670 build: start enforcing checkstyle and fix all violations @mars-lan
- #1672 fix(frontend): Extract lastModified field from downstream/upstream aspect @keremsahin1
DataHub v0.3.1
Added
- 3765c1d Enable parallel Gradle build @keremsahin1
- #1575 Enable Failed Metadata Change Event for MCE Processor @arunvasudevan
- #1570 Use pictureLink property to show person picture @afranzi
- #1569 Show Dataset description in Dataset view @afranzi
- #1597 Ingestion tool to load JSON data to DataHub (in /contrib) @clojurians-org
- #1585 Nix sandbox (in /contrib) @clojurians-org
- 71f2d14 Added EventUtilsTest @keremsahin1
Changed
- 36a5d23 Migrate to getSnapshot API & remove dataset snapshot @keremsahin1
- b17b91f Bump gradle to 5.6.4 and pegasus to 27.7.18 @keremsahin1
- Documentation
Removed
- #1581 Drop LinkedIn internal fabrics @mars-lan
- 1fff6c9 Cleanup unused snapshot resources for corp users & groups @keremsahin1
Fixed
- #1590 Gradle Build Fails When Run in Parallel @RyanHolstien
- #1574 Fix typo and watchman error @clojurians-org
- #1564 Allow dashes in user urn @ben5448
- 3d64c45 Fix browse result pagination @keremsahin1
- fba5cd8 Handle optional aspects/fields for CorpUser gracefully @keremsahin1
DataHub v0.3.0
- Onboarded people as a top level entity
- Enabled people search
- Created Docker image for running ingestion pipeline
- Misc bug fixes
- Documentation updates
- Code cleanup
DataHub v0.2.0-alpha
- Added Neo4j graph indexing/querying pipeline
- Dataset downstream lineage is now powered by graph
- Added MySQL ETL example
- Updated docker-compose settings for low resource environments
- Misc bug fixes
DataHub v0.1.1-alpha
- Added Kafka crawler sample
- Added support for surfacing downstream dataset lineage using search. This is a stop-gap solution until Neo4j support is added.
Data Hub v0.1.0-alpha
First official release of Data Hub:
- Leveraging GMA architecture
- Backend: GMS implementation - support for dataset & user entities
- Frontend: Data Hub Web Application
- Pub-sub: Kafka
- Stream processing: MXE consumer jobs using Kafka Streams
- Generic modeling layer with CRUD on MySQL
- Search support using Elasticsearch
- Supported metadata sources: LDAP and Hive