Home
wjohnson edited this page Sep 10, 2021 · 6 revisions
Welcome to the pyapacheatlas wiki!
The purpose of this package is to make it easy to work with the Apache Atlas REST API without having to learn too much about its nuances. In addition, the package can read an Excel file and extract entities, lineage, column mappings, and type definitions, so you can get metadata into your data catalog without digging into the internals of Atlas.
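To show what the package spares you from, here is a minimal sketch of the JSON body you would otherwise assemble by hand for the Apache Atlas v2 bulk entity endpoint (`POST /api/atlas/v2/entity/bulk`). It uses only the standard library; the helpers `entity`, `object_id`, and `process` are hypothetical and not part of pyapacheatlas, and the `demo://` qualified names are made up. Negative GUIDs act as placeholders so a process can reference tables created in the same request. The `AtlasEntity` and `AtlasProcess` classes build a similar structure for you.

```python
import json
from itertools import count

# Atlas treats negative GUIDs as placeholders for entities created in the same request.
_placeholder_guids = count(-1, -1)


def entity(type_name, name, qualified_name, **attributes):
    """Build an Atlas v2 entity dict (hypothetical helper, not a pyapacheatlas API)."""
    return {
        "typeName": type_name,
        "guid": str(next(_placeholder_guids)),
        "attributes": {"name": name, "qualifiedName": qualified_name, **attributes},
    }


def object_id(ent):
    """Reference another entity by its (placeholder) GUID and type."""
    return {"guid": ent["guid"], "typeName": ent["typeName"]}


def process(name, qualified_name, inputs, outputs, type_name="Process"):
    """Build a Process entity whose inputs and outputs point at other entities."""
    return entity(
        type_name,
        name,
        qualified_name,
        inputs=[object_id(e) for e in inputs],
        outputs=[object_id(e) for e in outputs],
    )


source = entity("DataSet", "raw_sales", "demo://raw_sales")
target = entity("DataSet", "clean_sales", "demo://clean_sales")
job = process("clean_job", "demo://clean_job", inputs=[source], outputs=[target])

# Request body for POST /api/atlas/v2/entity/bulk
payload = {"entities": [source, target, job]}
print(json.dumps(payload, indent=2))
```

Keeping track of placeholder GUIDs and object references by hand is exactly the kind of nuance the client and entity classes are meant to hide.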
The package is broken up into several submodules:
- `auth`
  - Provides azure-identity (Managed Identity, Azure CLI), ServicePrincipal, and Basic authentication (for Apache Atlas) support.
- `core`
  - Provides an `AtlasClient` or `PurviewClient` for connecting to your Apache Atlas backed service.
  - Provides `AtlasEntity` and `AtlasProcess` classes to make it easier to work with Entity and Process types.
  - Provides Entity and Relationship TypeDef support.
  - Provides a "What If" validator to help check whether your entities are valid against a provided set of type defs.
- `readers`
  - A reader aids in extracting entities and types from standardized formats. Currently, the `ExcelReader` is the only provided reader. However, the `Reader` base class could be extended to support other formats you need.
  - A reader has a few standardized methods that take in a template you have filled in and produce a batch of entities, custom lineage, column mappings, or type definitions.
  - The `parse_update_lineage` function reads an Excel file's UpdateLineage tab, extracts your processes, and prepares the metadata to be uploaded to Atlas or Purview.
  - The `parse_bulk_entities` function lets you define entities with attributes and their relationships to other entities (e.g. define a table, columns, and the connection between them).
  - The `parse_entity_defs` and `parse_classification_defs` functions extract entity and classification definitions, respectively.
  - You can generate an Excel template with the required headers by running `python -m pyapacheatlas --make-template ./template.xlsx` in the terminal.
- `scaffolding`
  - Creates a type definition "payload" that provides the table, column, table lineage process, column lineage process, table to column relationship, and table lineage to column lineage relationship. Import it with `from pyapacheatlas.scaffolding import column_lineage_scaffolding`.
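The `readers` bullet notes that the `Reader` base class could be extended for other formats. The sketch below illustrates the idea with a standalone, hypothetical `CsvEntityReader` that turns CSV rows into Atlas entity dicts. It does not subclass the real `Reader` (whose interface is not described on this page), and the column names `typeName`, `name`, and `qualifiedName` are assumptions; any other non-empty column is kept as an extra attribute.

```python
import csv
import io
from itertools import count


class CsvEntityReader:
    """Hypothetical reader that maps each CSV row to one Atlas entity dict.

    Not part of pyapacheatlas; it only illustrates the idea of a custom reader.
    """

    # Assumed column names, mirroring the fields every Atlas entity needs.
    REQUIRED = ("typeName", "name", "qualifiedName")

    def __init__(self):
        # Negative GUIDs are placeholders until Atlas assigns real ones.
        self._guids = count(-1, -1)

    def parse_entities(self, stream):
        """Read a CSV file-like object and return a list of entity dicts."""
        entities = []
        for row_no, row in enumerate(csv.DictReader(stream), start=1):
            missing = [col for col in self.REQUIRED if not row.get(col)]
            if missing:
                raise ValueError(f"data row {row_no} is missing {missing}")
            # Any other non-empty column becomes an extra attribute.
            extra = {k: v for k, v in row.items() if k not in self.REQUIRED and v}
            entities.append({
                "typeName": row["typeName"],
                "guid": str(next(self._guids)),
                "attributes": {
                    "name": row["name"],
                    "qualifiedName": row["qualifiedName"],
                    **extra,
                },
            })
        return entities


sample = (
    "typeName,name,qualifiedName,description\n"
    "DataSet,raw_sales,demo://raw_sales,Landing zone copy\n"
    "DataSet,clean_sales,demo://clean_sales,\n"
)
entities = CsvEntityReader().parse_entities(io.StringIO(sample))
for ent in entities:
    print(ent["guid"], ent["attributes"])
```

The reader only produces a batch of entities and never talks to Atlas, which keeps parsing separate from uploading in the same way the reader and client submodules are split.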
Thank you for your interest in using PyApacheAtlas! Please be sure to take a look at the more detailed pages in the wiki to get more specific information on the Excel Reader and Azure Purview Tips.