feat(ingestion) ldap: make ldap atttrs keys configurable (datahub-project#4599)

* Code changes to consume attrs_mapping from ldap recipe

* Doc changes outlining how to use different configurations in recipe
atulsaurav committed Apr 17, 2022
1 parent e572af6 commit cab247d
Showing 2 changed files with 112 additions and 33 deletions.
44 changes: 44 additions & 0 deletions metadata-ingestion/source_docs/ldap.md
@@ -34,6 +34,24 @@ source:
```yaml
    # Options
    base_dn: "dc=example,dc=org"

    # Optional attribute mapping to allow ldap config differences across orgs
    attrs_mapping:
      sAMAccountName: sAMAccountName
      uid: uid
      objectClass: objectClass
      manager: manager
      givenName: givenName
      sn: sn
      cn: cn
      mail: mail
      displayName: displayName
      departmentNumber: departmentNumber
      title: title
      owner: owner
      managedBy: managedBy
      uniqueMember: uniqueMember
      member: member

sink:
  # sink configs
```
@@ -51,11 +69,37 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
| `filter` | | `"(objectClass=*)"` | LDAP extractor filter. |
| `drop_missing_first_last_name` | | `True` | If set to true, any users without first and last names will be dropped. |
| `page_size` | | `20` | Size of each page to fetch when extracting metadata. |
| `attrs_mapping.sAMAccountName` | | `sAMAccountName` | Alternate attrs key representing same information as sAMAccountName in the organization. |
| `attrs_mapping.uid` | | `uid` | Alternate attrs key representing same information as uid in the organization. |
| `attrs_mapping.objectClass` | | `objectClass` | Alternate attrs key representing same information as objectClass in the organization. |
| `attrs_mapping.manager` | | `manager` | Alternate attrs key representing same information as manager in the organization. |
| `attrs_mapping.givenName` | | `givenName` | Alternate attrs key representing same information as givenName in the organization. |
| `attrs_mapping.sn` | | `sn` | Alternate attrs key representing same information as sn in the organization. |
| `attrs_mapping.cn` | | `cn` | Alternate attrs key representing same information as cn in the organization. |
| `attrs_mapping.mail` | | `mail` | Alternate attrs key representing same information as mail in the organization. |
| `attrs_mapping.displayName` | | `displayName` | Alternate attrs key representing same information as displayName in the organization. |
| `attrs_mapping.departmentNumber`| | `departmentNumber` | Alternate attrs key representing same information as departmentNumber in the organization. |
| `attrs_mapping.title` | | `title` | Alternate attrs key representing same information as title in the organization. |
| `attrs_mapping.owner` | | `owner` | Alternate attrs key representing same information as owner in the organization. |
| `attrs_mapping.managedBy` | | `managedBy` | Alternate attrs key representing same information as managedBy in the organization. |
| `attrs_mapping.uniqueMember` | | `uniqueMember` | Alternate attrs key representing same information as uniqueMember in the organization. |
| `attrs_mapping.member` | | `member` | Alternate attrs key representing same information as member in the organization. |

Set `drop_missing_first_last_name` to true if you have many "headless" LDAP user accounts
for devices or services that should be excluded because they do not contain a first and last name. This only
affects the ingestion of LDAP users; LDAP groups are unaffected by this config option.
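
The rule described above can be pictured as a small predicate. This is only a sketch of the documented behavior, not the source's code: `should_drop_user` and its parameters are hypothetical names, and the entry dictionaries are made up to resemble what python-ldap returns (attribute names mapped to lists of bytes).

```python
from typing import Any, Dict


def should_drop_user(
    attrs: Dict[str, Any],
    drop_missing_first_last_name: bool = True,
    given_name_attr: str = "givenName",  # assumed attribute name for the first name
    surname_attr: str = "sn",  # assumed attribute name for the last name
) -> bool:
    """Return True when a user entry should be skipped during ingestion."""
    if not drop_missing_first_last_name:
        return False
    # An entry is dropped when either name attribute is absent.
    return given_name_attr not in attrs or surname_attr not in attrs


# Made-up entries: a service account without names, and a regular person.
service_account = {"cn": [b"printer-01"], "uid": [b"printer-01"]}
person = {"cn": [b"Jane Doe"], "givenName": [b"Jane"], "sn": [b"Doe"]}

drop_service = should_drop_user(service_account)
drop_person = should_drop_user(person)
keep_service = should_drop_user(service_account, drop_missing_first_last_name=False)
```

Group entries never pass through a check like this, which matches the note that groups are unaffected.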

### Configurable LDAP

Organizations often implement LDAP slightly differently, which can make a standard LDAP recipe ineffective because data goes missing during ingestion. For instance, the LDAP recipe assumes that department information for a CorpUser is present in the `departmentNumber` attribute. If an organization does not implement that attribute, or captures similar information in a `department` attribute instead, that information can be missed during LDAP ingestion even though it is present in LDAP in a slightly different form. The LDAP source lets you declare such variations through an optional `attrs_mapping` section. So if an organization represents `departmentNumber` as `department` and `mail` as `email`, the recipe can be adapted to customize that mapping, as in the example below. If the `attrs_mapping` section is not provided, the default mapping applies.

```yaml
# in config section
attrs_mapping:
  departmentNumber: department
  mail: email
```
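
To make the effect of this mapping concrete, here is a minimal Python sketch of the lookup it implies. It is an illustration under stated assumptions rather than the source's implementation: `read_attr` and the sample `entry` are hypothetical, and the fallback to the canonical key for unmapped attributes is a choice made for this sketch.

```python
from typing import Any, Dict, Optional

# The two overrides from the recipe snippet above.
attrs_mapping: Dict[str, str] = {"departmentNumber": "department", "mail": "email"}


def read_attr(entry: Dict[str, Any], key: str) -> Optional[str]:
    """Read the first value of a canonical key, honoring the mapping."""
    # Fall back to the canonical name when the key has no override (sketch-only choice).
    ldap_name = attrs_mapping.get(key, key)
    values = entry.get(ldap_name)
    return values[0].decode() if values else None


# Made-up entry from a directory that stores `department` and `email`.
entry = {
    "department": [b"Engineering"],
    "email": [b"jdoe@example.org"],
    "title": [b"Engineer"],
}

department = read_attr(entry, "departmentNumber")
mail = read_attr(entry, "mail")
title = read_attr(entry, "title")
```

Without the mapping, a lookup for `departmentNumber` would find nothing in this entry even though the department is present under another name.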
## Compatibility
Coming soon!
101 changes: 68 additions & 33 deletions metadata-ingestion/src/datahub/ingestion/source/ldap.py
@@ -49,15 +49,6 @@ def set_cookie(
return bool(cookie)


def guess_person_ldap(attrs: Dict[str, Any]) -> Optional[str]:
"""Determine the user's LDAP based on the DN and attributes."""
if "sAMAccountName" in attrs:
return attrs["sAMAccountName"][0].decode()
if "uid" in attrs:
return attrs["uid"][0].decode()
return None


class LDAPSourceConfig(ConfigModel):
"""Config used by the LDAP Source."""

@@ -75,6 +66,33 @@ class LDAPSourceConfig(ConfigModel):

page_size: int = 20

# default mapping for attrs
attrs_mapping: Dict[str, Any] = {}
attrs_mapping["sAMAccountName"] = "sAMAccountName"
attrs_mapping["uid"] = "uid"
attrs_mapping["objectClass"] = "objectClass"
attrs_mapping["manager"] = "manager"
attrs_mapping["givenName"] = "givenName"
attrs_mapping["sn"] = "sn"
attrs_mapping["cn"] = "cn"
attrs_mapping["mail"] = "mail"
attrs_mapping["displayName"] = "displayName"
attrs_mapping["departmentNumber"] = "departmentNumber"
attrs_mapping["title"] = "title"
attrs_mapping["owner"] = "owner"
attrs_mapping["managedBy"] = "managedBy"
attrs_mapping["uniqueMember"] = "uniqueMember"
attrs_mapping["member"] = "member"


def guess_person_ldap(attrs: Dict[str, Any], config: LDAPSourceConfig) -> Optional[str]:
"""Determine the user's LDAP based on the DN and attributes."""
if config.attrs_mapping["sAMAccountName"] in attrs:
return attrs[config.attrs_mapping["sAMAccountName"]][0].decode()
if config.attrs_mapping["uid"] in attrs:
return attrs[config.attrs_mapping["uid"]][0].decode()
return None


@dataclasses.dataclass
class LDAPSourceReport(SourceReport):
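
The hunk above does not show how a recipe that overrides only some keys is combined with these defaults. If the supplied dictionary simply replaced the default one, lookups such as `config.attrs_mapping["sAMAccountName"]` would raise `KeyError` for every key the recipe omitted. One possible way to guard against that, sketched here with hypothetical names (`merged_mapping`, `guess_person_ldap_sketch`) and only a subset of the defaults, is to merge the overrides over the defaults before any lookup:

```python
from typing import Any, Dict, Optional

# Subset of the default mapping from the diff, for illustration only.
DEFAULT_ATTRS_MAPPING: Dict[str, str] = {
    "sAMAccountName": "sAMAccountName",
    "uid": "uid",
}


def merged_mapping(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Overlay recipe overrides on the defaults so every key stays defined."""
    return {**DEFAULT_ATTRS_MAPPING, **(overrides or {})}


def guess_person_ldap_sketch(
    attrs: Dict[str, Any], mapping: Dict[str, str]
) -> Optional[str]:
    """Mirror the lookup order in the diff: sAMAccountName first, then uid."""
    for key in ("sAMAccountName", "uid"):
        ldap_name = mapping[key]
        if ldap_name in attrs:
            return attrs[ldap_name][0].decode()
    return None


# A recipe that only remaps `uid` to a made-up attribute name.
mapping = merged_mapping({"uid": "userId"})

from_uid = guess_person_ldap_sketch({"userId": [b"jdoe"]}, mapping)
from_sam = guess_person_ldap_sketch(
    {"sAMAccountName": [b"jd"], "userId": [b"jdoe"]}, mapping
)
no_match = guess_person_ldap_sketch({"cn": [b"John Doe"]}, mapping)
```

Because the merge keeps `sAMAccountName` defined, the partial recipe works without listing all fifteen keys.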
@@ -148,15 +166,17 @@ def get_workunits(self) -> Iterable[MetadataWorkUnit]:
continue

if (
b"inetOrgPerson" in attrs["objectClass"]
or b"posixAccount" in attrs["objectClass"]
or b"person" in attrs["objectClass"]
b"inetOrgPerson" in attrs[self.config.attrs_mapping["objectClass"]]
or b"posixAccount"
in attrs[self.config.attrs_mapping["objectClass"]]
or b"person" in attrs[self.config.attrs_mapping["objectClass"]]
):
yield from self.handle_user(dn, attrs)
elif (
b"posixGroup" in attrs["objectClass"]
or b"organizationalUnit" in attrs["objectClass"]
or b"group" in attrs["objectClass"]
b"posixGroup" in attrs[self.config.attrs_mapping["objectClass"]]
or b"organizationalUnit"
in attrs[self.config.attrs_mapping["objectClass"]]
or b"group" in attrs[self.config.attrs_mapping["objectClass"]]
):
yield from self.handle_group(dn, attrs)
else:
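
The branching in this hunk repeats the same mapped lookup six times. As a sketch of the same decision in compact form, the check can be written against two tuples of object classes. `classify_entry` is a hypothetical helper, and unlike the diff it uses `.get`, so an entry without the object-class attribute is classified as "other" instead of raising `KeyError`.

```python
from typing import Any, Dict

# Object classes taken from the conditions in the diff.
USER_CLASSES = (b"inetOrgPerson", b"posixAccount", b"person")
GROUP_CLASSES = (b"posixGroup", b"organizationalUnit", b"group")


def classify_entry(attrs: Dict[str, Any], object_class_attr: str = "objectClass") -> str:
    """Classify an LDAP entry as 'user', 'group', or 'other'."""
    object_classes = attrs.get(object_class_attr, [])
    # Users are checked first, matching the if/elif order in the diff.
    if any(cls in object_classes for cls in USER_CLASSES):
        return "user"
    if any(cls in object_classes for cls in GROUP_CLASSES):
        return "group"
    return "other"


# Made-up entries; values are lists of bytes, as python-ldap returns them.
user_kind = classify_entry({"objectClass": [b"top", b"inetOrgPerson"]})
group_kind = classify_entry({"objectClass": [b"top", b"posixGroup"]})
other_kind = classify_entry({"objectClass": [b"top", b"device"]})
remapped_kind = classify_entry({"structuralClass": [b"person"]}, "structuralClass")
```

The `object_class_attr` argument stands in for `self.config.attrs_mapping["objectClass"]`; the name `structuralClass` is invented for the example.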
@@ -177,17 +197,17 @@ def handle_user(self, dn: str, attrs: Dict[str, Any]) -> Iterable[MetadataWorkUn
work unit based on the information.
"""
manager_ldap = None
if "manager" in attrs:
if self.config.attrs_mapping["manager"] in attrs:
try:
m_cn = attrs["manager"][0].decode()
m_cn = attrs[self.config.attrs_mapping["manager"]][0].decode()
manager_msgid = self.ldap_client.search_ext(
m_cn,
ldap.SCOPE_BASE,
self.config.filter,
serverctrls=[self.lc],
)
_m_dn, m_attrs = self.ldap_client.result3(manager_msgid)[1][0]
manager_ldap = guess_person_ldap(m_attrs)
manager_ldap = guess_person_ldap(m_attrs, self.config)
except ldap.LDAPError as e:
self.report.report_warning(
dn, "manager LDAP search failed: {}".format(e)
@@ -220,26 +240,37 @@ def build_corp_user_mce(
"""
Create the MetadataChangeEvent via DN and attributes.
"""
ldap_user = guess_person_ldap(attrs)
ldap_user = guess_person_ldap(attrs, self.config)

if self.config.drop_missing_first_last_name and (
"givenName" not in attrs or "sn" not in attrs
self.config.attrs_mapping["givenName"] not in attrs
or self.config.attrs_mapping["sn"] not in attrs
):
return None
full_name = attrs["cn"][0].decode()
first_name = attrs["givenName"][0].decode()
last_name = attrs["sn"][0].decode()

email = (attrs["mail"][0]).decode() if "mail" in attrs else ldap_user
full_name = attrs[self.config.attrs_mapping["cn"]][0].decode()
first_name = attrs[self.config.attrs_mapping["givenName"]][0].decode()
last_name = attrs[self.config.attrs_mapping["sn"]][0].decode()

email = (
(attrs[self.config.attrs_mapping["mail"]][0]).decode()
if self.config.attrs_mapping["mail"] in attrs
else ldap_user
)
display_name = (
(attrs["displayName"][0]).decode() if "displayName" in attrs else full_name
(attrs[self.config.attrs_mapping["displayName"]][0]).decode()
if self.config.attrs_mapping["displayName"] in attrs
else full_name
)
department = (
(attrs["departmentNumber"][0]).decode()
if "departmentNumber" in attrs
(attrs[self.config.attrs_mapping["departmentNumber"]][0]).decode()
if self.config.attrs_mapping["departmentNumber"] in attrs
else None
)
title = (
attrs[self.config.attrs_mapping["title"]][0].decode()
if self.config.attrs_mapping["title"] in attrs
else None
)
title = attrs["title"][0].decode() if "title" in attrs else None
manager_urn = f"urn:li:corpuser:{manager_ldap}" if manager_ldap else None

return MetadataChangeEvent(
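
Each optional field in this hunk follows one pattern: translate the canonical key through the mapping, take the first value if the attribute exists, otherwise use a fallback. A small helper could express that once. This is a sketch with hypothetical names (`first_value`, the sample `mapping` and `attrs`), not code from the commit.

```python
from typing import Any, Dict, Optional


def first_value(
    attrs: Dict[str, Any],
    mapping: Dict[str, str],
    key: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Decode the first value of the mapped attribute, or return the default."""
    ldap_name = mapping[key]
    return attrs[ldap_name][0].decode() if ldap_name in attrs else default


# Made-up mapping and entry for a directory that uses `email` and `department`.
mapping = {
    "cn": "cn",
    "mail": "email",
    "displayName": "displayName",
    "departmentNumber": "department",
    "title": "title",
}
attrs = {
    "cn": [b"Jane Doe"],
    "email": [b"jane@example.org"],
    "department": [b"Finance"],
}
ldap_user = "jdoe"

full_name = first_value(attrs, mapping, "cn")
# Same fallbacks as in the diff: email -> ldap_user, displayName -> full_name.
email = first_value(attrs, mapping, "mail", default=ldap_user)
display_name = first_value(attrs, mapping, "displayName", default=full_name)
department = first_value(attrs, mapping, "departmentNumber")
title = first_value(attrs, mapping, "title")
```

The fallbacks match the ones visible in the hunk, while the repeated `self.config.attrs_mapping[...]` expressions collapse into a single call per field.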
@@ -263,12 +294,16 @@ def build_corp_user_mce(

def build_corp_group_mce(self, attrs: dict) -> Optional[MetadataChangeEvent]:
"""Creates a MetadataChangeEvent for LDAP groups."""
cn = attrs.get("cn")
cn = attrs.get(self.config.attrs_mapping["cn"])
if cn:
full_name = cn[0].decode()
owners = parse_from_attrs(attrs, "owner")
members = parse_from_attrs(attrs, "uniqueMember")
email = attrs["mail"][0].decode() if "mail" in attrs else full_name
owners = parse_from_attrs(attrs, self.config.attrs_mapping["owner"])
members = parse_from_attrs(attrs, self.config.attrs_mapping["uniqueMember"])
email = (
attrs[self.config.attrs_mapping["mail"]][0].decode()
if self.config.attrs_mapping["mail"] in attrs
else full_name
)

return MetadataChangeEvent(
proposedSnapshot=CorpGroupSnapshotClass(