This is a Singer tap that produces JSON-formatted data following the Singer spec.
This tap:
- Pulls LookML files from the GitHub v3 API and extracts LookML components using the lkml parser.
- Extracts the following resources:
  - Model Files: Git API Search with filename and extension filters for model and lkml
  - Models: Parse items (connection, includes, datagroups, explores, joins, etc.) using lkml
  - View Files: Git API Search with filename and extension filters for view and lkml
  - Views: Parse items (derived table, dimensions, measures, filters, parameters, sets, etc.) using lkml
- Outputs the schema for each resource
- Incrementally pulls data based on the input state (file last-modified in GitHub)
model_files
- Search Endpoint (ALL Model Files): https://api.github.com/search/code?q=filename:.model.+extension:lkml+repo:[GIT_OWNER]/[GIT_REPOSITORY]
- File Endpoint: https://api.github.com/repos/[GIT_OWNER]/[GIT_REPOSITORY]/contents/[GIT_FILE_PATH]
- Primary key fields: git_owner, git_repository, path
- Foreign key fields: repository_id
- Replication strategy: INCREMENTAL (Search ALL, filter results)
- Bookmark field: last_modified
- Transformations: Remove _links node, remove content node, add repository name, path, folder, and repository ID
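The endpoint template and transformations listed above can be sketched roughly as follows. This is a minimal illustration, not the tap's actual code: the function names `build_search_url` and `transform_file_record` and the output field name `folder` are assumptions, and the repository ID is passed in rather than looked up.

```python
import posixpath


def build_search_url(git_owner, git_repository, file_kind):
    """Fill in the search endpoint template for 'model' or 'view' files."""
    query = "filename:.{}.+extension:lkml+repo:{}/{}".format(
        file_kind, git_owner, git_repository)
    return "https://api.github.com/search/code?q=" + query


def transform_file_record(record, git_owner, git_repository, repository_id):
    """Drop the _links and content nodes and add repository context."""
    out = {k: v for k, v in record.items() if k not in ("_links", "content")}
    out["git_owner"] = git_owner
    out["git_repository"] = git_repository
    out["repository_id"] = repository_id
    # Assumption: 'folder' is the directory portion of the file path.
    out["folder"] = posixpath.dirname(out.get("path", ""))
    return out
```

The same two helpers would apply to view_files by passing `"view"` as `file_kind`.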
models
- Primary key fields: git_owner, git_repository, path
- Replication strategy: FULL_TABLE (ALL for each model_file)
- Transformations: Decode, parse model_file content and convert to JSON
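The "decode" step can be illustrated with a short sketch. The GitHub contents endpoint returns file content base64-encoded; the helper name `decode_file_content` and the sample payload below are hypothetical, and the tap's own implementation may differ.

```python
import base64


def decode_file_content(file_response):
    """Return the LookML text from a GitHub contents API response."""
    encoding = file_response.get("encoding")
    if encoding != "base64":
        raise ValueError("unexpected encoding: {!r}".format(encoding))
    # b64decode ignores the newlines GitHub inserts into the payload.
    return base64.b64decode(file_response["content"]).decode("utf-8")


# Hypothetical sample standing in for a real API response.
sample = {
    "encoding": "base64",
    "content": base64.b64encode(b'connection: "warehouse"\n').decode("ascii"),
}
lookml_text = decode_file_content(sample)
# With the lkml package installed, lkml.load(lookml_text) would then
# parse this text into a Python dict ready for JSON serialization.
```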
view_files
- Search Endpoint (ALL View Files): https://api.github.com/search/code?q=filename:.view.+extension:lkml+repo:[GIT_OWNER]/[GIT_REPOSITORY]
- File Endpoint: https://api.github.com/repos/[GIT_OWNER]/[GIT_REPOSITORY]/contents/[GIT_FILE_PATH]
- Primary key fields: git_owner, git_repository, path
- Foreign key fields: repository_id
- Replication strategy: INCREMENTAL (Search ALL, filter results)
- Bookmark field: last_modified
- Transformations: Remove _links node, remove content node, add repository name, path, folder, and repository ID
views
- Primary key fields: git_owner, git_repository, path
- Replication strategy: FULL_TABLE (ALL for each view_file)
- Transformations: Decode, parse view_file content and convert to JSON
Install
Clone this repository, and then install using setup.py. We recommend using a virtualenv:
> virtualenv -p python3 venv
> source venv/bin/activate
> python setup.py install
OR
> cd .../tap-lookml
> pip install .
Dependent libraries
The following dependent libraries were installed.
> pip install singer-python
> pip install singer-tools
> pip install target-stitch
> pip install target-json
Create your tap's config.json file. This tap connects to GitHub with a GitHub OAuth2 Token, which may be a Personal Access Token or an authorization created for an App. Each tap connects to a single Looker/LookML Git Repository (where the LookML code for your Looker Project is hosted); provide the names of the git_repositories, delimited by commas (spaces are ignored), and the git_owner of those repositories (which can be a User or Organization).
{
  "api_token": "YOUR_GITHUB_API_TOKEN",
  "git_owner": "YOUR_GITHUB_ORGANIZATION_OR_USER",
  "git_repositories": "LOOKER_GIT_REPO_1, LOOKER_GIT_REPO_2, ...",
  "start_date": "2019-01-01T00:00:00Z",
  "user_agent": "tap-lookml <api_user_email@your_company.com>"
}
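As a rough illustration of how such a config could be read, the sketch below loads the JSON and splits git_repositories on commas while ignoring spaces. The function name `parse_config` is hypothetical, and treating all five keys as required is an assumption, not something this README states.

```python
import json

# Assumption: every key shown in the sample config is required.
REQUIRED_KEYS = ["api_token", "git_owner", "git_repositories",
                 "start_date", "user_agent"]


def parse_config(raw_json):
    """Parse config JSON text and normalize the repository list."""
    config = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    # Split on commas; surrounding spaces and empty entries are dropped.
    config["git_repositories"] = [
        name.strip()
        for name in config["git_repositories"].split(",")
        if name.strip()
    ]
    return config
```

Calling `parse_config(open("config.json").read())` would return the config with `git_repositories` as a list of repository names.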
Optionally, also create a state.json file. currently_syncing is an optional attribute used for identifying the last object to be synced in case the job is interrupted mid-stream. The next run would begin where the last job left off.
{
  "currently_syncing": "users",
  "bookmarks": {
    "model_files": "2019-10-13T19:53:36.000000Z",
    "view_files": "2019-10-13T18:50:11.000000Z"
  }
}
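The way a bookmark drives an incremental sync can be sketched as follows. This is an illustration under assumptions rather than the tap's code: the helper names `parse_ts` and `filter_new_files` are hypothetical, falling back to start_date when no bookmark exists is assumed, and so is the strict "newer than" comparison.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse timestamps such as 2019-10-13T19:53:36.000000Z or 2019-01-01T00:00:00Z."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: " + value)


def filter_new_files(files, state, stream, start_date):
    """Keep only files modified after the stream's bookmark."""
    # Assumption: start_date is used when the stream has no bookmark yet.
    bookmark = state.get("bookmarks", {}).get(stream, start_date)
    cutoff = parse_ts(bookmark)
    return [f for f in files if parse_ts(f["last_modified"]) > cutoff]
```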
Run the Tap in Discovery Mode
This creates a catalog.json for selecting objects/fields to integrate:
> tap-lookml --config config.json --discover > catalog.json
See the Singer docs on discovery mode.
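Selecting streams in the discovered catalog is usually done by marking the stream-level metadata entry (the one with an empty breadcrumb) as selected. The sketch below shows one way to do that; the helper name `select_streams` is hypothetical, and the exact catalog contents this tap emits are not shown in this README.

```python
def select_streams(catalog, stream_names):
    """Mark the named streams as selected in a Singer catalog dict."""
    for stream in catalog.get("streams", []):
        if stream.get("tap_stream_id") not in stream_names:
            continue
        for entry in stream.setdefault("metadata", []):
            if entry.get("breadcrumb") == []:
                # Existing stream-level entry: flag it as selected.
                entry.setdefault("metadata", {})["selected"] = True
                break
        else:
            # No stream-level entry yet, so add one.
            stream["metadata"].append(
                {"breadcrumb": [], "metadata": {"selected": True}})
    return catalog
```

Loading catalog.json with `json.load`, passing it through this helper, and writing it back with `json.dump` would produce a catalog ready for sync mode.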
Run the Tap in Sync Mode (with catalog) and write out to state file
For Sync mode:
> tap-lookml --config tap_config.json --catalog catalog.json > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
To load to JSON files to verify outputs:
> tap-lookml --config tap_config.json --catalog catalog.json | target-json > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
To pseudo-load to Stitch Import API with dry run:
> tap-lookml --config tap_config.json --catalog catalog.json | target-stitch --config target_config.json --dry-run > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
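The commands above keep only the last line of output as the new state file. When inspecting raw tap output by hand, a small helper like the one below can pull out the value of the final STATE message instead. It is a sketch only: the name `last_state_value` is hypothetical, and it assumes one JSON message per line with `type` and `value` fields, as in the Singer spec.

```python
import json


def last_state_value(lines):
    """Return the value of the last STATE message, or None if there is none."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("type") == "STATE":
            state = message["value"]
    return state
```

For example, `last_state_value(open("tap_output.jsonl"))` would return the bookmarks dict from the final STATE message, where `tap_output.jsonl` is a placeholder file name.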
Test the Tap
While developing the lookml tap, the following utilities were run in accordance with Singer.io best practices:
Pylint to improve code quality:
> pylint tap_lookml -d missing-docstring -d logging-format-interpolation -d too-many-locals -d too-many-arguments
Pylint test resulted in the following score:
Your code has been rated at 9.68/10
To check the tap and verify working:
> tap-lookml --config tap_config.json --catalog catalog.json | singer-check-tap > state.json
> tail -1 state.json > state.json.tmp && mv state.json.tmp state.json
Check tap resulted in the following:
The output is valid.
It contained 58 messages for 4 streams.

      4 schema messages
     48 record messages
      6 state messages

Details by stream:
+-------------+---------+---------+
| stream      | records | schemas |
+-------------+---------+---------+
| model_files | 2       | 1       |
| models      | 2       | 1       |
| view_files  | 17      | 1       |
| views       | 27      | 1       |
+-------------+---------+---------+
Copyright © 2019 Stitch