Merge pull request #122 from Insight-Services-APAC/main
Sync changes from main
KRodriguez-Insight authored Jul 30, 2024
2 parents 98d8bfe + 9b2215f commit 093f9c1
Showing 17 changed files with 148 additions and 82 deletions.
4 changes: 2 additions & 2 deletions dbt_wrapper/fabric_api.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,10 +40,10 @@ def remove_last_line(self, py_fabric_file: str):

# Generate py files for api update
def IPYNBtoFabricPYFile(self, dbt_project_dir, progress, task_id):
progress.update(task_id=task_id, description=f"Converting notebooks to Fabric PY format")
target_dir = str(Path(dbt_project_dir) / Path("target"))
notebooks_dir = str(Path(target_dir) / Path("notebooks"))
notebooks_fabric_py_dir = str(Path(target_dir) / Path("notebooks_fabric_py"))
os.makedirs(notebooks_fabric_py_dir, exist_ok=True)
list_of_notebooks = os.listdir(notebooks_dir)
for filename in list_of_notebooks:
Expand Down
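The hunk above builds a `target/notebooks*` directory layout with `pathlib` before converting notebooks. Below is a minimal, self-contained sketch of that directory setup; the function name `fabric_py_dirs` is an invention for illustration, and only the `target/notebooks` and `target/notebooks_fabric_py` layout comes from the diff:

```python
import os
from pathlib import Path


def fabric_py_dirs(dbt_project_dir: str) -> tuple[str, str]:
    """Return (source, destination) directories for converting .ipynb
    notebooks to Fabric PY format, mirroring the layout in the diff above."""
    target_dir = Path(dbt_project_dir) / "target"
    notebooks_dir = target_dir / "notebooks"
    notebooks_fabric_py_dir = target_dir / "notebooks_fabric_py"
    # The destination may not exist yet on a fresh build.
    os.makedirs(notebooks_fabric_py_dir, exist_ok=True)
    return str(notebooks_dir), str(notebooks_fabric_py_dir)
```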
Binary file added docs/assets/images/dbt_wrapper_run_all.png
6 changes: 4 additions & 2 deletions docs/assets/stylesheets/extra.css
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,12 @@
--md-accent-fg-color: #B01C87;
--md-accent-fg-color--transparent: #57B5E6;
--md-accent-bg-color: #fff;
--md-accent-bg-color--light: #ffffffb3
--md-accent-bg-color--light: #ffffffb3;

.md-button {
color: var(--md-typeset-a-color)
background-color: var(--md-typeset-a-color);
border-color: var(--md-typeset-a-color);
color: hsla(var(--md-hue),0%,100%,1)
}

.md-button--primary {
Expand Down
7 changes: 5 additions & 2 deletions docs/developer_guide/index.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
---

weight: 2
weight: 1

---

# Developer Guide

!!! danger
The developer guide is a work in progress. More details to follow soon. Please check back later.
36 changes: 0 additions & 36 deletions docs/developer_guide/initial_setup.md

This file was deleted.

17 changes: 16 additions & 1 deletion docs/documentation_guide/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,20 @@
# Documentation Guide

## Building your environment
Documentation for this project is built using [mkdocs-material](https://squidfunk.github.io/mkdocs-material/). To contribute to the documentation you will need to create a separate python environment. I suggest that you call this `.env_mkdocs` to avoid confusion with the dbt environment. Create your environment and install the required packages as shown below:
Documentation for this project is built using [mkdocs-material](https://squidfunk.github.io/mkdocs-material/). To contribute to the documentation, you will need to create a separate Python environment. I suggest that you call this `.env_mkdocs` to avoid confusion with the dbt environment.

!!! important
The commands below assume that you have already performed the `Core Tools Installation` steps in the [User Guide](../user_guide/initial_setup/#core-tools-installation). If you have not done this yet, please do so before proceeding. Note that you **ONLY** have to install `core tools`; it is not necessary to move on to the `other tools` section.


Before creating the environment you will need to clone the repository. You can do this by running the command below:

``` powershell title="clone the repository"
git clone https://github.com/Insight-Services-APAC/APAC-Capability-DAI-DbtFabricSparkNb.git MyDocsProject
```
This will clone the repository into a directory called ==MyDocsProject==. You can rename this directory to whatever you like. Navigate into this new directory and then run the commands below.

``` powershell title="Create and activate the Python environment"
# Create the Python environment
python -m venv .env_mkdocs
Expand All @@ -23,6 +32,12 @@ pip install -r ./requirements_mkdocs.txt
```

These commands will create a new Python environment and install the required packages for building the documentation. To launch the new environment, run the command below:

``` powershell title="Activate the Python environment"
.\.env_mkdocs\Scripts\activate.ps1
```

## Updating the documentation
The documentation source is held in the `docs` directory. To update the documentation, edit the markdown files in this directory. To understand the markdown syntax used, be sure to review the reference section for [mkdocs-material](https://squidfunk.github.io/mkdocs-material/reference/). Once you have made your changes, you can build the documentation using the command below:

Expand Down
11 changes: 0 additions & 11 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -75,14 +75,3 @@ Consequently, to use this adapter, you will need to install the [dbt-fabrickspar
### Branching
When creating a branch to work from, please use the branch name format `feature/YourBranchName`. The case of `feature/` matters, so please make sure to keep it lower case. Pull requests are to be made into the "dev" branch only. Any pull requests made into "main" will be removed and not merged.

## Community

### Logging to Delta

Logging was previously done to a log file saved in the lakehouse in JSON format. This has been changed to log to a Delta table in the lakehouse.

It works using two tables, *"batch"* and *"execution_log"*. At the start of the ETL, the Prepare step checks whether the tables exist and creates them if they do not. It then checks for an *"open"* batch; if a batch is still open, the run will fail.

If you need to close the batch manually, this code is available at the end of the master notebook.

If this check passes, a new batch is opened. Each numbered master notebook includes steps that use the open batch to check for failures in previous notebook runs, so that previous ETL executions with failures are not picked up and do not cause false stops on the current execution.
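The batch open/close behaviour described above can be sketched as follows. This is an illustrative model only, using sqlite as a stand-in for the lakehouse Delta tables; the table names *batch* and *execution_log* come from the text, while the column names, statuses and function names are assumptions:

```python
import sqlite3


def open_batch(conn: sqlite3.Connection) -> int:
    """Create the logging tables if missing, fail if a batch is still open,
    otherwise open a new batch and return its id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch (batch_id INTEGER PRIMARY KEY, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS execution_log (batch_id INTEGER, message TEXT)"
    )
    # Fail if a previous batch was never closed.
    row = conn.execute("SELECT batch_id FROM batch WHERE status = 'open'").fetchone()
    if row is not None:
        raise RuntimeError(f"Batch {row[0]} is still open; close it before a new run")
    cur = conn.execute("INSERT INTO batch (status) VALUES ('open')")
    return cur.lastrowid


def close_batch(conn: sqlite3.Connection, batch_id: int) -> None:
    """Mark a batch as closed, as the code at the end of the master notebook does."""
    conn.execute("UPDATE batch SET status = 'closed' WHERE batch_id = ?", (batch_id,))
```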
32 changes: 22 additions & 10 deletions docs/user_guide/dbt_build_process.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,6 @@

---

# Dbt Build Process

## Dbt Build Process & Dbt_Wrapper
The dbt-fabricksparknb package includes a console application that will allow you to build your dbt project and generate a series of notebooks that can be run in a Fabric workspace. This application is called `dbt_wrapper` and is a python script that is run from the command line. You can invoke the application and view information about it by running the following command in a terminal.

Expand All @@ -31,38 +29,52 @@ dbt_wrapper --help

To build your dbt project and publish your notebook to your Fabric workspace you can run the command below:

!!! Note
Be sure to replace ==my_project== with the name of your dbt project folder.
!!! note
Be sure to replace ==my_project== with the name of your dbt project folder.

!!! important
Before running the dbt_wrapper make sure you're logged into your tenant in the PowerShell terminal using both `az login` and `azcopy login`. See the examples below and replace the tenant id with your own.
```powershell
az login --tenant 73738727-cfc1-4875-90c2-2a7a1149ed3d
azcopy login --tenant-id 73738727-cfc1-4875-90c2-2a7a1149ed3d
```

```powershell
dbt_wrapper run-all my_project
```

The command above will carry out all of the necessary "stages" required to fully build your dbt project and generate the notebooks that can be run in a Fabric workspace. When run successfully, you should see output similar to the image below.
![notebooks](../assets/images/dbt_wrapper_output.png)

## Toggling Build Stages Off and On
There are times when you may not wish to run ALL of the build steps. In such circumstances you can toggle off specific stages by using the options built in to the dbt_wrapper application. To view all of the options available to you run the command below:
![dbt_wrapper run-all output](../assets/images/dbt_wrapper_run_all.png)


## Toggling Build Stages Off and On

There are times when you may not wish to run ALL of the build steps. In such circumstances you can toggle off specific stages by using the options built in to the `dbt_wrapper` application. To view all of the options available to you, run the command below:

```powershell
dbt_wrapper run-all --help
```

For example, should you wish to run all stages except for the upload of the generated notebooks to your Fabric workspace you can run the command below:

```powershell
dbt_wrapper run-all my_project --no-upload-notebooks-via-api
```

Alternatively, you might want to make use of some additional "helper" commands that we have included in the application. For example, you can run the `run-all-local` command to run all stages except for those that require a live Fabric connection. This is useful when you are testing the build process locally. To do this, run the command below:


```powershell
dbt_wrapper run-all-local my_project
```

Review all of the commands available to you by using the help option as shown below:

```powershell
dbt_wrapper --help
```

!!! info
You are now ready to move to the next step in which you gain an understanding of the various kinds of notebooks generated by the adapter. Follow the [Understanding the Generated Notebooks](./generated_notebooks.md) guide.



1 change: 0 additions & 1 deletion docs/user_guide/dbt_project_setup.md
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,6 @@ my_project:
retry_all: true
```


!!! info
You are now ready to move to the next step in which you will build your dbt project. Follow the [Dbt Build Process](./dbt_build_process.md) guide.

12 changes: 4 additions & 8 deletions docs/user_guide/development_workflow.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,24 +13,20 @@
- [x] Native Fabric Notebooks Generated and Deployed
1. Non dbt users able to view notebooks and business logic
2. Monitoring and debugging of loads directly in Fabric without the need for a separate tool
- [x] Recurring loads achieved using native Fabric scheduling
- [x] Simplified code promotion process using native Fabric Git integration
- [X] No need for dbt hosted in a virtual machine
1. No need for service account
2. **No need for Azure Landing Zone**
3. No need for secure network connectivity between Azure VM and Fabric
- [x] Allows for disconnected development environment providing
1. Faster DBT build times
2. Greater developer flexibility
- [x] Simplified code promotion Process using native Fabric Git integration
1. Single, native promotion process for all Fabric artifacts including non-dbt ones

!!! failure "Disadvantages"
- Requires Additional Steps
1. Meta Data Extract
2. Notebook Upload
3. Notebook Import

- Requires additional steps to extract metadata and generate notebooks, but this is mitigated by our wrapper script, which automates them.

</div>

Expand Down
62 changes: 62 additions & 0 deletions docs/user_guide/notebooks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
---

title: "Understanding the Generated Notebooks"
excerpt: "This provides you with an understanding of the notebooks that are generated by the dbt-fabricsparknb package."
sidebar_label: "Generated Notebooks"
slug: /user_guide/generated_notebooks
weight: 4

---


## Understanding the Notebooks Generated


When you run this build script successfully, you will see a series of notebooks generated in your ==my_project==/target/notebooks directory. This is the `"special sauce"` of this dbt-adapter that allows you to run your dbt project natively as notebooks in a Fabric workspace. The image below shows a sample listing of generated notebooks. Your specific notebooks will contain the name of your dbt project and may differ depending on the models and tests that you have defined in your dbt project.

#### Sample listing of Generated Notebooks
![notebooks](/assets/images/notebooks.png)

If you study the files shown above you will notice that there is a naming convention and that the notebooks are prefixed with a specific string. The following table explains at a high level the naming convention and the purpose of each type of notebook.

| Notebook Prefix | Description |
| --------------- | --------------------------|
| model. | These are dbt **model** notebooks. A notebook will be generated for each dbt **model** that you define. You will be able to run, debug and monitor execution of these notebooks directly in the Fabric portal independently of dbt.|
| test. | These are dbt **test** notebooks. A notebook will be generated for each dbt **test** that you define. You will be able to run, debug and monitor execution of these notebooks directly in the Fabric portal independently of dbt. |
| seed. | These are dbt **seed** notebooks. A notebook will be generated for each dbt **seed** that you define. You will be able to run, debug and monitor execution of these notebooks directly in the Fabric portal independently of dbt.|
| master_ | These are **execution orchestration** notebooks. They allow your models, tests and seeds to run in parallel and in the correct order. They are what allow you to run your transformation pipelines independently of dbt as an orchestrator. To run your project, simply schedule master.{project_name}.notebook.ipynb using Fabric's native scheduling functionality. |
| import_ | This is a helper notebook that facilitates the import of generated notebooks into the workspace. |
| metadata_ | This is a helper notebook that facilitates the generation of workspace metadata JSON files. |
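As a quick illustration, the prefixes in the table above can be used to bucket a directory listing of generated notebooks. This is a sketch only; any filenames used with it are hypothetical examples, not actual adapter output:

```python
# Prefixes taken from the naming-convention table above.
PREFIXES = ("model.", "test.", "seed.", "master_", "import_", "metadata_")


def group_by_prefix(filenames):
    """Group generated notebook filenames by their naming-convention prefix.

    Files that match no known prefix are ignored.
    """
    groups = {p: [] for p in PREFIXES}
    for name in filenames:
        for prefix in PREFIXES:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
    return groups
```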


!!! important
The green panels below provide a more detailed discussion of each type of notebook. Take a moment to expand each panel by clicking on it and read the detailed explanation of each type of notebook.

??? Question "Notebooks with the Prefix `"model."`"
These are dbt **model** notebooks. A notebook will be generated for each dbt **model** that you define. You will be able to run, debug and monitor execution of these notebooks directly in the Fabric portal independently of dbt.

![alt text](./assets/images/model_notebook0.png)

![alt text](./assets/images/model_notebook1.png)

??? Question "Notebooks with the Prefix `"test."`"
These are dbt **test** notebooks. A notebook will be generated for each dbt **test** that you define. You will be able to run, debug and monitor execution of these notebooks directly in the Fabric portal independently of dbt.

??? Question "Notebooks with the Prefix `"seed."`"
These are dbt **seed** notebooks. A notebook will be generated for each dbt **seed** that you define. You will be able to run, debug and monitor execution of these notebooks directly in the Fabric portal independently of dbt.

??? Question "Notebooks with the Prefix `"master_"`"
These are **execution orchestration** notebooks. They allow your models, tests and seeds to run in parallel and in the correct order. They are what allow you to run your transformation pipelines independently of dbt as an orchestrator. To run your project, simply schedule master.{project_name}.notebook.ipynb using Fabric's native scheduling functionality.

??? Question "Notebooks with the Prefix `"import_"`"
This is a helper notebook that facilitates the import of generated notebooks into the workspace.

??? Question "Notebooks with the Prefix `"metadata_"`"
This is a helper notebook that facilitates the generation of workspace metadata JSON files.


## Notebooks in your Fabric Workspace
If you log in to your Fabric workspace and navigate to the notebooks section, you will see that the generated notebooks have been uploaded to your workspace.

!!! tip
I suggest that you move your notebooks into a folder that matches the name of your dbt project.
File renamed without changes.
File renamed without changes.
5 changes: 0 additions & 5 deletions docs/zzz_archive/index.md

This file was deleted.

28 changes: 27 additions & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -80,4 +80,30 @@ extra:
site_url: https://legendary-adventure-22vkokv.pages.github.io
client_organisation: "AdventureWorks"
project_start: "2024-07-01"

analytics:
provider: google
property: G-K2F0EYYZSP
feedback:
title: Was this page helpful?
ratings:
- icon: material/emoticon-happy-outline
name: This page was helpful
data: 1
note: >-
Thanks for your feedback!
- icon: material/emoticon-sad-outline
name: This page could be improved
data: 0
note: >-
Thanks for your feedback! Help us improve this page by
using our <a href="..." target="_blank" rel="noopener">feedback form</a>.
consent:
title: Cookie consent
description: >-
We use cookies to recognize your repeated visits and preferences, as well
as to measure the effectiveness of our documentation and whether users
find what they're searching for. With your consent, you're helping us to
make our documentation better.
copyright: >
By Insight Enterprises –
<a href="#__consent">Change cookie settings</a>
4 changes: 3 additions & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,4 +9,6 @@ azure-identity>=1.13.0
azure-core>=1.26.4
requests==2.31.0
typer>=0.12
azure-storage-file-datalake
setuptools>=72.1.0
pip-system-certs
5 changes: 3 additions & 2 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,8 +80,9 @@ def _get_dbt_core_version():
"azure-core>=1.26.4",
"requests==2.31.0",
"typer>=0.12.3",
"setuptools>=71.0.4",
"azure-storage-file-datalake"
"setuptools>=72.1.0",
"azure-storage-file-datalake",
"pip-system-certs"

],
zip_safe=False,
Expand Down
