feat(ingestion) ldap: make ldap attrs keys configurable #4682
…ject#4599) * Code changes to consume attrs_mapping from ldap recipe * Doc changes outlining how to use different configurations in recipe
As discussed at office hours, let's move forward by duplicating the LDAP source and taking the caching approach you suggested:
@@ -34,6 +34,24 @@ source:
# Options
base_dn: "dc=example,dc=org"

# Optional attribute mapping to allow ldap config differences across orgs
attrs_mapping:
@atulsaurav I'm wondering if this should really map "DataHub" concepts to specific attributes?
Like this:
urn_id:
email
firstName
lastName
displayName
Instead of mapping LDAP -> LDAP concepts
Yes, good idea! Let me make changes and send you an update later in the evening. Thanks!
The `drop_missing_first_last_name` should be set to true if you've got many "headless" user LDAP accounts
for devices or services should be excluded when they do not contain a first and last name. This will only
impact the ingestion of LDAP users, while LDAP groups will be unaffected by this config option.

### Configurable LDAP

Every organization may implement LDAP slightly differently based on their needs. The makes a standard LDAP recipe ineffecvtive due to missing data during LDAP ingestion. For instance, LDAP recipe assumes department information for a CorpUser would be present in the `departmentNumber` attribute. If an organization chose not to implement that attribute or rather capture similar imformation in the `department` attribute, that information can be missed during LDAP ingestion (even though the information may be present in LDAP in a slightly different form). LDAP source provides flexibility to provide optional mapping for such variations to be reperesented under attrs_mapping. So if an organization represented `departmentNumber` as `department` and `mail` as `email`, the recipe can be adapted to customiza that mapping based on need. An example is show below. If `attrs_mapping` section is not provided, the default mapping will apply.
minor typo:
ineffecvtive -> ineffective
typo:
customiza -> customize
Thanks for catching these!
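For readers following the thread, the "default mapping applies unless overridden" behavior described in the doc paragraph above could be sketched roughly as below. This is only an illustration, not the code from this PR: `DEFAULT_ATTRS_MAPPING` and `resolve_attrs_mapping` are hypothetical names, and the default entries shown are a small assumed subset built from attributes mentioned in this conversation.

```python
from typing import Dict, Optional

# Assumed subset of defaults, for illustration only. The keys are DataHub-side
# names and the values are LDAP attribute names mentioned in this thread.
DEFAULT_ATTRS_MAPPING: Dict[str, str] = {
    "urn": "sAMAccountName",
    "email": "mail",
    "departmentId": "departmentNumber",
}


def resolve_attrs_mapping(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the defaults with any recipe-provided overrides applied on top."""
    merged = dict(DEFAULT_ATTRS_MAPPING)  # copy so the defaults are never mutated
    merged.update(overrides or {})
    return merged


# An organization that stores the department in `department` and the email
# address in `email` only overrides those two keys; `urn` keeps its default.
mapping = resolve_attrs_mapping({"departmentId": "department", "email": "email"})
```

Copying the defaults before updating keeps the fallback intact for recipes that omit the mapping section entirely.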
members = parse_from_attrs(attrs, "uniqueMember")
email = attrs["mail"][0].decode() if "mail" in attrs else full_name
owners = parse_from_attrs(attrs, self.config.attrs_mapping["owner"])
members = parse_from_attrs(attrs, self.config.attrs_mapping["uniqueMember"])
Remember -> this members field is deprecated. Instead, we should be populating the "GroupMembership" aspect of the user object. Were you intending to do this as a followup?
yes, as a part of #3335
Wonderful! Thanks for the update
Looking great so far. Once we do the "datahub concept" mapping piece we are good to go!
# default mapping for attrs
# general attrs
attrs_mapping: Dict[str, Any] = {}
attrs_mapping["urn"] = "sAMAccountName"
So this is the attribute used to construct the urn?
yes, thanks for clarifying that in the docs!
Overall looks good. As a followup we definitely should fix the LDAP group extraction to use the GroupMembership MCEs!
…hub-project-master
…atahub into configurable-ldap-ingestion
group_urn: cn
admins: owner
members: uniqueMember
displayName: name
Won't this overwrite the display name attribute defined above?
This makes me think we do need 2 distinct mappings:
userMappings
groupMappings
I should have called this out earlier - my apologies for that. What do you think?
No worries, done!
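To make the collision concrete, here is a small sketch of why a single shared dict loses the user-side `displayName` entry and how two separate mappings avoid it. The user-side value `"displayName"` and the name `group_attrs_map` are assumptions for illustration; `user_attrs_map` appears later in this thread, and the group entries are taken from the diff above.

```python
from typing import Dict

# One shared mapping: the second assignment silently replaces the first,
# so the user display name attribute is lost.
attrs_mapping: Dict[str, str] = {}
attrs_mapping["displayName"] = "displayName"  # user entry (assumed value)
attrs_mapping["displayName"] = "name"         # group entry from the diff

# Two distinct mappings: each keeps its own `displayName` entry.
user_attrs_map: Dict[str, str] = {
    "urn": "sAMAccountName",
    "displayName": "displayName",  # assumed value
}
group_attrs_map: Dict[str, str] = {  # hypothetical name
    "group_urn": "cn",
    "admins": "owner",
    "members": "uniqueMember",
    "displayName": "name",
}
```

With separate dicts, user and group extraction can each look up `displayName` without caring what the other side configured.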
attrs_mapping["group_urn"] = "cn"
attrs_mapping["admins"] = "owner"
attrs_mapping["members"] = "uniqueMember"
attrs_mapping["displayName"] = "name"
This overwrites line 36
…atahub into configurable-ldap-ingestion
Thank you so much for improving this connector @atulsaurav !
int(attrs[self.config.attrs_mapping["departmentId"]][0].decode())
The line above is actually failing for me with the following error:
line 320, in build_corp_user_mce
int(attrs[self.config.user_attrs_map["departmentId"]][0].decode())
ValueError: invalid literal for int() with base 10: 'USE_NEW_GL_FIELDS'
It appears that "departmentId" has been mapped to "departmentNumber" which we don't have at our corp ldap. Thus, it is null. I think you can reproduce this error by rolling back changes at the test case to null.
I have temporarily applied the fix below:
attrs[self.config.attrs_mapping["departmentId"]][0].decode()
It partly fixed the problem, but some users were not ingested and ended up with an error like "Failed to extract some records due to: source produced an invalid metadata work unit"
Yes, this is an oddity with the way the LDAP connector was written. DH has 2 fields for department -
I have just done what you suggested and that fixed the problem. Here is the remapping from my recipe:
source:
  type: "ldap"
  config:
    # Map from a default mapping as departmentNumber to departmentId
    user_attrs_map: {"departmentId":"departmentId"}
The original source of the problem with our departmentNumber is that it is not a number, but a string.
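As an aside, one way a connector could tolerate non-numeric department values like the one in the traceback above is to fall back to `None` instead of raising. The helper below is a hypothetical sketch, not the connector's actual code; it assumes LDAP attributes arrive as a dict of attribute name to a list of `bytes` values, as the `[0].decode()` calls in this thread suggest.

```python
from typing import Dict, List, Optional


def parse_department_id(attrs: Dict[str, List[bytes]], ldap_attr: str) -> Optional[int]:
    """Return the department id as an int, or None if missing or non-numeric."""
    values = attrs.get(ldap_attr)
    if not values:
        return None  # attribute absent or empty for this entry
    try:
        return int(values[0].decode())
    except ValueError:
        # Non-numeric strings such as 'USE_NEW_GL_FIELDS' land here.
        return None


numeric = parse_department_id({"departmentNumber": [b"42"]}, "departmentNumber")
text = parse_department_id({"departmentNumber": [b"USE_NEW_GL_FIELDS"]}, "departmentNumber")
missing = parse_department_id({}, "departmentNumber")
```

Whether skipping the value silently is acceptable, or whether a warning should be reported, is a design choice left to the connector.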
attrs_mapping from ldap recipe
closes #4599
Checklist