470 IngestionMediator class #593
Open: jkwening wants to merge 12 commits into focusconsulting:dev from jkwening:470-cron-job
Started implementing the mediator design pattern for ingestion objects and workflow. This required:
- creating an ingestion_mediator module that houses the IngestionMediator class
- creating a base Colleague class to be inherited by all objects that communicate with the mediator
- starting to modify the LoadData and Manifest classes to inherit from Colleague so that they can communicate with the mediator and the mediator with them
- adding a 'dependency' field to manifest.csv that tracks dependencies across unique_data_ids, to prevent them from being reloaded without the presence of their respective dependent unique_data_id

TODO:
- refactor the LoadData, Manifest, and Cleaner modules to decouple them from each other and have the IngestionMediator class coordinate their activities
- implement a solution for the double-adding projects issue based on the 'dependency' field in manifest
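For readers unfamiliar with the pattern, the arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class names IngestionMediator, Colleague, LoadData, and Manifest come from the commit message, but every method name, the in-memory manifest rows, and the returned strings are assumptions made for the example.

```python
class Colleague:
    """Base class for objects that communicate through the mediator.

    The setter name below is hypothetical, not taken from the PR.
    """

    def __init__(self):
        self._ingestion_mediator = None

    def set_ingestion_mediator(self, mediator):
        self._ingestion_mediator = mediator


class Manifest(Colleague):
    def __init__(self, rows):
        super().__init__()
        # Stand-in for parsed manifest.csv rows (assumed shape: list of dicts).
        self._rows = rows

    def get_row(self, unique_data_id):
        for row in self._rows:
            if row["unique_data_id"] == unique_data_id:
                return row
        return None


class LoadData(Colleague):
    def load(self, unique_data_id):
        # LoadData holds no reference to Manifest; it asks the mediator.
        row = self._ingestion_mediator.get_manifest_row(unique_data_id)
        if row is None:
            return "skipped {}: not in manifest".format(unique_data_id)
        return "loaded {} from {}".format(unique_data_id, row["filepath"])


class IngestionMediator:
    """Coordinates colleagues so they stay decoupled from each other."""

    def __init__(self):
        self._manifest = None
        self._load_data = None

    def set_manifest(self, manifest):
        self._manifest = manifest
        manifest.set_ingestion_mediator(self)

    def set_load_data(self, load_data):
        self._load_data = load_data
        load_data.set_ingestion_mediator(self)

    def get_manifest_row(self, unique_data_id):
        return self._manifest.get_row(unique_data_id)

    def load(self, unique_data_id):
        return self._load_data.load(unique_data_id)
```

The point of the design is that LoadData and Manifest can each be tested or replaced without touching the other, since the only object either one knows about is the mediator.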
…coupled and moved into IngestionMediator and any other needed refactoring
…ild activity into ingestion mediator to mirror current workflow of LoadData before beginning decoupling from Manifest and then Cleaner.
…ted Meta.py and additional refactoring. Todo - finish up coordinating writing clean psv file into db
…ance variable to Colleague class, and related methods to IngestionMediator class to allow loading directly from cleaned psv file without having to reprocess and clean raw data files prior to loading to db. TODO - write unit tests to confirm code works appropriately
Completed additional decoupling refactoring for LoadData and SQLWriter, needed to separate loading via raw data from loading via a clean psv file. Current code passes unit tests that verify that both methods work.

TODO:
- add remaining load data activities, specifically the zone_facts table
- add activities related to get_api_data.py and incorporate the dependency workflow
Passed unit tests; can now also load the zone_facts table.

TODO:
- implement a solution for the double-adding projects issue by refactoring to utilize the 'dependency' field in manifest
GetApiData class was added with three methods:
- get all api files
- get files by modules
- get files by unique_data_id

It's much clearer what is going on, and it additionally honors the mediator design pattern for our code ingestion.
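As a rough picture of how three such entry points might relate to each other, here is a minimal sketch. Only the class name GetApiData comes from the commit message. The method names, the module-to-id mapping, and the choice to return ids instead of making network calls are all assumptions made for illustration.

```python
class GetApiData:
    """Illustrative stand-in; the real class fetches data from APIs."""

    def __init__(self, modules):
        # Assumed shape: {module_name: [unique_data_id, ...]}
        self._modules = modules

    def get_all_files(self):
        # Hypothetical name: process every known module, in sorted order.
        return self.get_files_by_modules(sorted(self._modules))

    def get_files_by_modules(self, module_names):
        # Hypothetical name: collect the ids handled by the named modules.
        processed = []
        for name in module_names:
            processed.extend(self._modules.get(name, []))
        return processed

    def get_files_by_data_ids(self, unique_data_ids):
        # Hypothetical name: keep only the requested ids that some module handles.
        wanted = set(unique_data_ids)
        return [
            data_id
            for ids in self._modules.values()
            for data_id in ids
            if data_id in wanted
        ]
```

Having the "all" case delegate to the "by modules" case keeps a single code path for the actual work, which is one way to get the clarity the commit message mentions.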
Can now load data and trigger loading of dependent unique data ids. Refactored code passed unit tests. This code is not stable enough for PR review and push to dev branch.

TODO:
- add reverse dependency lookup: for a unique data id, check whether the data it depends on is loaded into the db. If not, load that first, and recursively whatever it depends on, before loading the originally requested data id.
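The reverse dependency lookup described in the TODO amounts to a depth-first walk over the 'dependency' field. A minimal sketch, assuming the manifest's dependencies have already been read into a dict that maps each unique data id to the ids it depends on; the function name, the dict shape, and the cycle check are assumptions, not code from this PR:

```python
def resolve_load_order(unique_data_id, depends_on, loaded=()):
    """Return the ids to load, prerequisites first, ending with the request.

    depends_on: hypothetical mapping {data_id: [ids it depends on]}.
    loaded: ids already present in the db, which are skipped.
    """
    order = []
    done = set(loaded)
    in_progress = set()

    def visit(data_id):
        if data_id in done:
            return
        if data_id in in_progress:
            raise ValueError(
                "circular dependency involving {}".format(data_id))
        in_progress.add(data_id)
        # Load whatever this id depends on before the id itself.
        for parent in depends_on.get(data_id, []):
            visit(parent)
        in_progress.discard(data_id)
        done.add(data_id)
        order.append(data_id)

    visit(unique_data_id)
    return order
```

For example, with `depends_on = {"c": ["b"], "b": ["a"]}`, requesting `"c"` yields `["a", "b", "c"]`, and if `"a"` is already loaded it yields `["b", "c"]`. The cycle check guards against a manifest in which two ids list each other as dependencies.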
Includes methods for on-demand requests by the user in services.py, along with a command line interface similar to load_data.py. Moving forward, this replaces load_data.py as the point of user access for ingestion and processing workflows. Additionally, refactored get_api_data.py -> GetApiData.py, now encapsulated as a class instead of a scripting module. It has been moved into the 'python/housinginsights/ingestion' file path.
26 tasks
Note
This is an overhaul of our ingestion process. This PR is now stable and passes all included unit tests. services.py is now the point of access for user requests related to the ingestion workflow, the weekly_update process, and ancillary features. The following modules are now obsolete:
load_data.py: replaced by services.py
functions.py: its methods have been refactored into Meta.py, a class encapsulation of meta.csv
What's In This
Implemented mediator design pattern for ingestion objects and work flow. This required:
TODO
from each other and have IngestionMediator coordinate their activities
'dependency' field in manifest
Error logging and cron jobs #470