-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion) ldap: make ldap attrs keys configurable #4682
Changes from 14 commits
cab247d
5e50969
6f5fbab
d0099e5
69dfbb3
3b5c051
0e0946a
7680041
bfcc13b
ff8b3c3
015a517
b736547
5132e75
7560c03
f485349
3955951
e077971
474ac7c
4df5568
74175e0
68add91
44eebfe
e9aa404
72fad69
0fbb294
49f14ee
26aa509
d9de05c
9ffb3af
ec943a4
d3c69b5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,6 +24,28 @@ | |
CorpUserSnapshotClass, | ||
) | ||
|
||
# default mapping for attrs | ||
# general attrs | ||
attrs_mapping: Dict[str, Any] = {} | ||
attrs_mapping["urn"] = "sAMAccountName" | ||
|
||
# user related attrs | ||
attrs_mapping["fullName"] = "cn" | ||
attrs_mapping["lastName"] = "sn" | ||
attrs_mapping["firstName"] = "givenName" | ||
attrs_mapping["displayName"] = "displayName" | ||
attrs_mapping["manager"] = "manager" | ||
attrs_mapping["mail"] = "mail" | ||
attrs_mapping["departmentNumber"] = "departmentNumber" | ||
attrs_mapping["title"] = "title" | ||
|
||
# group related attrs | ||
attrs_mapping["group_urn"] = "cn" | ||
attrs_mapping["owner"] = "owner" | ||
attrs_mapping["managedBy"] = "managedBy" | ||
attrs_mapping["uniqueMember"] = "uniqueMember" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are these attributes? I'm not familiar with these or how they can be mapped to DataHub concepts. DataHub groups have
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we just call uniqueMember as "member"? It's not clear what the difference between "member" and "uniqueMember" are - do you know? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In my company we use "member" only. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sorry, you are correct! I had somehow missed these. I have corrected this now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am calling this as @bda618 thanks for checking! we also just use |
||
attrs_mapping["member"] = "member" | ||
|
||
|
||
def create_controls(pagesize: int) -> SimplePagedResultsControl: | ||
""" | ||
|
@@ -56,15 +78,6 @@ def set_cookie( | |
return bool(cookie) | ||
|
||
|
||
def guess_person_ldap(attrs: Dict[str, Any]) -> Optional[str]: | ||
"""Determine the user's LDAP based on the DN and attributes.""" | ||
if "sAMAccountName" in attrs: | ||
return attrs["sAMAccountName"][0].decode() | ||
if "uid" in attrs: | ||
return attrs["uid"][0].decode() | ||
return None | ||
|
||
|
||
class LDAPSourceConfig(ConfigModel): | ||
"""Config used by the LDAP Source.""" | ||
|
||
|
@@ -87,6 +100,21 @@ class LDAPSourceConfig(ConfigModel): | |
default=20, description="Size of each page to fetch when extracting metadata." | ||
) | ||
|
||
# default mapping for attrs | ||
attrs_mapping: Dict[str, Any] = {} | ||
|
||
|
||
def guess_person_ldap(attrs: Dict[str, Any], config: LDAPSourceConfig) -> Optional[str]: | ||
"""Determine the user's LDAP based on the DN and attributes.""" | ||
if config.attrs_mapping["urn"] in attrs: | ||
jjoyce0510 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return attrs[config.attrs_mapping["urn"]][0].decode() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Minor: Any chance this will be None? What then? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So since we default |
||
else: # for backward compatiblity | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's add a log line / warning here that we could not find the configured attribute mapping, falling back to X. I'm a bit torn on this - we might want to simply skip the user instead of using attributes the user doesn't want to use. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have the else part in place to retain existing behavior and not introduce any breaking changes.. I have taken care of adding warning when this happens. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Got it - thanks for explanation |
||
if "sAMAccountName" in attrs: | ||
return attrs["sAMAccountName"][0].decode() | ||
if "uid" in attrs: | ||
return attrs["uid"][0].decode() | ||
return None | ||
|
||
|
||
@dataclasses.dataclass | ||
class LDAPSourceReport(SourceReport): | ||
|
@@ -116,6 +144,11 @@ def __init__(self, ctx: PipelineContext, config: LDAPSourceConfig): | |
"""Constructor.""" | ||
super().__init__(ctx) | ||
self.config = config | ||
# ensure prior defaults are in place | ||
for k in attrs_mapping: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nice! |
||
if k not in self.config.attrs_mapping: | ||
self.config.attrs_mapping[k] = attrs_mapping[k] | ||
|
||
self.report = LDAPSourceReport() | ||
|
||
ldap.set_option(ldap.OPT_X_TLS_REQUIRE_CERT, ldap.OPT_X_TLS_ALLOW) | ||
|
@@ -198,17 +231,17 @@ def handle_user(self, dn: str, attrs: Dict[str, Any]) -> Iterable[MetadataWorkUn | |
work unit based on the information. | ||
""" | ||
manager_ldap = None | ||
if "manager" in attrs: | ||
if self.config.attrs_mapping["manager"] in attrs: | ||
try: | ||
m_cn = attrs["manager"][0].decode() | ||
m_cn = attrs[self.config.attrs_mapping["manager"]][0].decode() | ||
manager_msgid = self.ldap_client.search_ext( | ||
m_cn, | ||
ldap.SCOPE_BASE, | ||
self.config.filter, | ||
serverctrls=[self.lc], | ||
) | ||
_m_dn, m_attrs = self.ldap_client.result3(manager_msgid)[1][0] | ||
manager_ldap = guess_person_ldap(m_attrs) | ||
manager_ldap = guess_person_ldap(m_attrs, self.config) | ||
except ldap.LDAPError as e: | ||
self.report.report_warning( | ||
dn, "manager LDAP search failed: {}".format(e) | ||
|
@@ -241,26 +274,37 @@ def build_corp_user_mce( | |
""" | ||
Create the MetadataChangeEvent via DN and attributes. | ||
""" | ||
ldap_user = guess_person_ldap(attrs) | ||
ldap_user = guess_person_ldap(attrs, self.config) | ||
|
||
if self.config.drop_missing_first_last_name and ( | ||
"givenName" not in attrs or "sn" not in attrs | ||
self.config.attrs_mapping["firstName"] not in attrs | ||
jjoyce0510 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
or self.config.attrs_mapping["lastName"] not in attrs | ||
): | ||
return None | ||
full_name = attrs["cn"][0].decode() | ||
first_name = attrs["givenName"][0].decode() | ||
last_name = attrs["sn"][0].decode() | ||
|
||
email = (attrs["mail"][0]).decode() if "mail" in attrs else ldap_user | ||
full_name = attrs[self.config.attrs_mapping["fullName"]][0].decode() | ||
first_name = attrs[self.config.attrs_mapping["firstName"]][0].decode() | ||
last_name = attrs[self.config.attrs_mapping["lastName"]][0].decode() | ||
|
||
email = ( | ||
(attrs[self.config.attrs_mapping["mail"]][0]).decode() | ||
if self.config.attrs_mapping["mail"] in attrs | ||
else ldap_user | ||
) | ||
display_name = ( | ||
(attrs["displayName"][0]).decode() if "displayName" in attrs else full_name | ||
(attrs[self.config.attrs_mapping["displayName"]][0]).decode() | ||
if self.config.attrs_mapping["displayName"] in attrs | ||
else full_name | ||
) | ||
department = ( | ||
(attrs["departmentNumber"][0]).decode() | ||
if "departmentNumber" in attrs | ||
(attrs[self.config.attrs_mapping["departmentNumber"]][0]).decode() | ||
if self.config.attrs_mapping["departmentNumber"] in attrs | ||
else None | ||
) | ||
title = ( | ||
attrs[self.config.attrs_mapping["title"]][0].decode() | ||
if self.config.attrs_mapping["title"] in attrs | ||
else None | ||
) | ||
title = attrs["title"][0].decode() if "title" in attrs else None | ||
manager_urn = f"urn:li:corpuser:{manager_ldap}" if manager_ldap else None | ||
|
||
return MetadataChangeEvent( | ||
|
@@ -284,12 +328,16 @@ def build_corp_user_mce( | |
|
||
def build_corp_group_mce(self, attrs: dict) -> Optional[MetadataChangeEvent]: | ||
"""Creates a MetadataChangeEvent for LDAP groups.""" | ||
cn = attrs.get("cn") | ||
cn = attrs.get(self.config.attrs_mapping["group_urn"]) | ||
if cn: | ||
full_name = cn[0].decode() | ||
owners = parse_from_attrs(attrs, "owner") | ||
members = parse_from_attrs(attrs, "uniqueMember") | ||
email = attrs["mail"][0].decode() if "mail" in attrs else full_name | ||
owners = parse_from_attrs(attrs, self.config.attrs_mapping["owner"]) | ||
members = parse_from_attrs(attrs, self.config.attrs_mapping["uniqueMember"]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remember -> this members field is deprecated. instead, we should be populating the "GroupMembership" aspect of the user object.. Were you intending to do this as a followup? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, as a part of #3335 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wonderful! Thanks for the update |
||
email = ( | ||
attrs[self.config.attrs_mapping["mail"]][0].decode() | ||
if self.config.attrs_mapping["mail"] in attrs | ||
else full_name | ||
) | ||
|
||
return MetadataChangeEvent( | ||
proposedSnapshot=CorpGroupSnapshotClass( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the attribute used to construct the urn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks for clarifying that in the docs!