-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add ML models #1721
feat: add ML models #1721
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Arun for the extensive PR. Left a few overall comments on the models.
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/IntendedUse.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/aspect/ModelAspect.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/IntendedUse.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/ModelProperties.pdl
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the initiative @arunvasudevan
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/CaveatsAndRecommendations.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/EthicalConsiderations.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/ModelProperties.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/EvaluationData.pdl
Outdated
Show resolved
Hide resolved
@hshahoss @jywadhwani Thanks for your detailed review. I have addressed your comments. |
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/IntendedUse.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/QuantitativeAnalyses.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/QuantitativeAnalyses.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/ModelProperties.pdl
Outdated
Show resolved
Hide resolved
/* | ||
* Cost Details for an Entity | ||
*/ | ||
record Cost { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option might be to define an enum for type of Cost called something like CostType. It could be an enum to identify the different type of costs that can be attached. To start with, we can have ORG_COST_CODE as one of the types. Then, we can define cost value as a object which can be a union of primitive/complex cost objects types. Something similar to a model like below:
/*
* Cost Details for an Entity
*/
record Cost {
costType: CostType
cost: CostValue
}
enum CostType {
/**
* Org Cost Code to which the Cost of this entity should be attributed to
*/
ORG_COST_CODE
// additional cost types here
}
typeref CostValue = union[
numericalCost: double
stringCost: string
customCost: customCostModel
....
]
Referencing the above model, we can now have an array of Cost aspect associated with the Model entity. The above model will provide an option for others to expand it to include their own cost definitions.
I'd encourage for us to think of all common models in this way so that they are extensible to other use cases. Let me know what you think.
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/QuantitativeAnalyses.pdl
Outdated
Show resolved
Hide resolved
li-utils/src/main/java/com/linkedin/common/urn/MlFeatureUrn.java
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/IntendedUse.pdl
Outdated
Show resolved
Hide resolved
/** | ||
* Fabric type where ML Model belongs to or where it was generated. | ||
*/ | ||
origin: FabricType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this Fabric refer to data centers? Just for my understanding, would models be trained per fabric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FabricType
refers to the origin or environment.
In most cases i don't think someone would train in every environment but they can attach the trained datasets (DatasetUrn
) per platform, Model and Environment.
li-utils/src/main/java/com/linkedin/common/urn/MlFeatureUrn.java
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/IntendedUserType.pdl
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/IntendedUse.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/EvaluationData.pdl
Show resolved
Hide resolved
@hshahoss @zimengy @jywadhwani Thanks a lot for your review comments. |
metadata-models/src/main/pegasus/com/linkedin/metadata/snapshot/ModelSnapshot.pdl
Outdated
Show resolved
Hide resolved
@jywadhwani Updated all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, Looks good to me. Let's try to close on the open questions and get this in?
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLModelFactors.pdl
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While working on the RFC, came across some questions that I have added in my review comments.
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLModelFactors.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLModelFactors.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/TrainingData.pdl
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/TrainingData.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/EthicalConsiderations.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/CaveatsAndRecommendations.pdl
Outdated
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/CaveatsAndRecommendations.pdl
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/MLModelProperties.pdl
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/QuantitativeAnalyses.pdl
Show resolved
Hide resolved
metadata-models/src/main/pegasus/com/linkedin/ml/metadata/TrainingData.pdl
Show resolved
Hide resolved
@jywadhwani @hshahoss Appreciate your time in the review process. |
fix: Update defaults of aspectNames params. The last PR to sync internal code broke the external GMS, as code was now expected aspectNames to be null rather than empty by default. This preventing me logging into DataHub as the corp user request would fail (it assumed I asked for no aspects rather than all aspects). TESTED: Built locally, launched with docker/dev.sh (so used latest frontend, but whatever). Verified I can now log into DataHub, browse and search for datasets, and view my profile.
…atahub-project#1782) * 1747 | remove obsolete yaml files * 1747 | remove configmap and its hardcoded references * 1747 | add missing input parameter of neo4j.host * 1747 | remove obsolete secrets and parameterize the rest * 1747 | auto-generate gms secret * 1747 | remove fullName overrides * 1747 | fix parameters in subchart's values.yaml * 1747 | remove hardcoding from parameters for gms host and port * 1747 | upgrade chart version * 1747 | update helm docs * 1747 | add extraEnv, extraVolume and extraMounts * 1747 | Alters pull policy of images to 'always' for ldh Co-authored-by: shakti-garg <[email protected]>
…-tier support (datahub-project#1817) * feat(data-platforms): Adding rest resource for /dataPlatforms and mid-tier support * Removed data platforms which are Linkedin internal
* Copy NOTICE from wherehows Copies the file from the wherehows branch. * Update notice.
* feature(dashboards): RFC for dashboards * Change directory structure * Create goals & non-goals sections * Removing alternatives section
Also list apache projects specifically.
* Releases updated version of datahub-web client UI code * Fix typo in yarn lock * Change yarn lock to match yarn registry directories * Previous commit missed some paths * Even more changes to yarnlock missing in previous commit * Include codegen file for typings * Add files to get parity for datahub-web and current OS datahub-midtier * Add in typo fix from previous commit - change to proper license * Implement proper OS fix for person entity picture url * Workarounds for open source DH issues * Fixes institutional memory api and removes unopensourced tabs for datasets * Fixes search dataset deprecation and user search issue as a result of changes * Remove internal only options in the avatar menu
* Update README.md * Update links.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hard work! Excited to see this come to fruition.
Great work @arunvasudevan! Thanks for the unyielding will to see this through. |
ML Models schema is inspired from Google's Model cards Paper- https://arxiv.org/pdf/1810.03993.pdf
Checklist