Commit

refactor(pdl): convert all pdsc to pdl
Use the automated tool in https://linkedin.github.io/rest.li/pdl_migration
Also update all relevant docs
mars-lan committed May 21, 2020
1 parent 2d06e1d commit 7a9cf19
Showing 353 changed files with 3,640 additions and 4,068 deletions.
47 changes: 20 additions & 27 deletions docs/what/aspect.md
````diff
@@ -1,6 +1,6 @@
 # What is a metadata aspect?
 
-A metadata aspect is a structured document, or more precisely a `record` in [PDSC](https://linkedin.github.io/rest.li/DATA-Data-Schema-and-Templates),
+A metadata aspect is a structured document, or more precisely a `record` in [PDL](https://linkedin.github.io/rest.li/pdl_schema),
 that represents a specific kind of metadata (e.g. ownership, schema, statistics, upstreams).
 A metadata aspect on its own has no meaning (e.g. ownership for what?) and must be associated with a particular entity (e.g. ownership for PageViewEvent).
 We purposely do not impose any model requirement on metadata aspects, as each aspect is expected to differ significantly.
````
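The rule that an aspect has no meaning until it is associated with an entity can be made concrete with a minimal Python sketch. It assumes a hypothetical in-memory store keyed first by entity URN and then by aspect name, so an aspect is only reachable through the entity it describes. The store, the function names and the URN strings are illustrative placeholders and are not part of the documented data model.

```python
from typing import Any, Dict, Optional

# Hypothetical store: entity URN -> aspect name -> aspect document.
# An aspect is never stored "on its own", only under the entity it describes.
AspectStore = Dict[str, Dict[str, Dict[str, Any]]]


def put_aspect(store: AspectStore, entity_urn: str, aspect_name: str,
               aspect: Dict[str, Any]) -> None:
    """Attach (or overwrite) one aspect of one entity."""
    store.setdefault(entity_urn, {})[aspect_name] = aspect


def get_aspect(store: AspectStore, entity_urn: str,
               aspect_name: str) -> Optional[Dict[str, Any]]:
    """Return the entity's aspect, or None if it was never attached."""
    return store.get(entity_urn, {}).get(aspect_name)


store: AspectStore = {}
page_view_event = "urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)"
put_aspect(store, page_view_event, "Ownership", {"owners": ["urn:li:corpuser:jdoe"]})
print(get_aspect(store, page_view_event, "Ownership"))
```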
````diff
@@ -22,31 +22,24 @@ Here’s an example metadata aspect. Note that the `admin` and `members` fields
 It’s very natural to save such relationships as URNs in a metadata aspect.
 The [relationship](relationship.md) section explains how this relationship can be explicitly extracted and modelled.
 
-```json
-{
-  "type": "record",
-  "name": "Membership",
-  "namespace": "com.linkedin.group",
-  "doc": "The membership metadata for a group",
-  "fields": [
-    {
-      "name": "auditStamp",
-      "type": "com.linkedin.common.AuditStamp",
-      "doc": "Audit stamp for the last change"
-    },
-    {
-      "name": "admin",
-      "type": "com.linkedin.common.CorpuserUrn",
-      "doc": "Admin of the group"
-    },
-    {
-      "name": "members",
-      "type": {
-        "type": "array",
-        "items": "com.linkedin.common.CorpuserUrn"
-      },
-      "doc": "Members of the group, ordered in descending importance"
-    }
-  ]
+```
+namespace com.linkedin.group
+import com.linkedin.common.AuditStamp
+import com.linkedin.common.CorpuserUrn
+/**
+ * The membership metadata for a group
+ */
+record Membership {
+  /** Audit stamp for the last change */
+  auditStamp: AuditStamp
+  /** Admin of the group */
+  admin: CorpuserUrn
+  /** Members of the group, ordered in descending importance */
+  members: array[CorpuserUrn]
 }
 ```
````
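According to the commit message, the conversion was produced with the automated rest.li migration tool. The toy Python converter below only illustrates the mapping visible in the `Membership` change: `doc` becomes a `/** */` comment, `{"type": "array", "items": T}` becomes `array[T]`, and fully qualified types from other namespaces become `import` lines. It also maps `optional`, `default` and `include` in the way the `BaseEntity` and `DatasetEntity` schemas in this commit do. It is a hypothetical sketch, not the rest.li tool. It handles only flat record schemas and ignores unions, maps, enums, typerefs, custom properties and clashing short names.

```python
import json
from typing import Any, Dict, List, Set


def _type_ref(t: Any, namespace: str, imports: Set[str]) -> str:
    """Render a PDSC type reference as PDL, collecting any imports it needs."""
    if isinstance(t, dict) and t.get("type") == "array":
        return "array[" + _type_ref(t["items"], namespace, imports) + "]"
    if isinstance(t, str) and "." in t:
        pkg, short = t.rsplit(".", 1)
        if pkg != namespace:
            imports.add(t)  # types from other namespaces become imports
        return short
    return t  # primitives (int, string, boolean, ...) or same-namespace names


def pdsc_to_pdl(schema: Dict[str, Any]) -> str:
    """Toy conversion of a flat PDSC record schema (a dict) into PDL text."""
    ns = schema.get("namespace", "")
    imports: Set[str] = set()
    body: List[str] = []
    for field in schema.get("fields", []):
        if "doc" in field:
            body.append(f"  /** {field['doc']} */")
        line = f"  {field['name']}: "
        if field.get("optional"):
            line += "optional "
        line += _type_ref(field["type"], ns, imports)
        if "default" in field:
            line += " = " + json.dumps(field["default"])  # False -> false
        body.append(line)
    includes = [_type_ref(i, ns, imports) for i in schema.get("include", [])]

    out: List[str] = [f"namespace {ns}"] if ns else []
    out += [f"import {name}" for name in sorted(imports)]
    if "doc" in schema:
        out += ["/**", f" * {schema['doc']}", " */"]
    header = f"record {schema['name']}"
    if includes:
        header += " includes " + ", ".join(includes)
    out.append(header + " {")
    out += body
    out.append("}")
    return "\n".join(out)


# The Membership record from aspect.md, as PDSC JSON.
membership = {
    "type": "record",
    "name": "Membership",
    "namespace": "com.linkedin.group",
    "doc": "The membership metadata for a group",
    "fields": [
        {"name": "auditStamp", "type": "com.linkedin.common.AuditStamp",
         "doc": "Audit stamp for the last change"},
        {"name": "admin", "type": "com.linkedin.common.CorpuserUrn",
         "doc": "Admin of the group"},
        {"name": "members",
         "type": {"type": "array", "items": "com.linkedin.common.CorpuserUrn"},
         "doc": "Members of the group, ordered in descending importance"},
    ],
}
print(pdsc_to_pdl(membership))
```

Its output matches the PDL `Membership` record added in this commit field for field. The real tool may lay out whitespace and comments differently.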
84 changes: 34 additions & 50 deletions docs/what/entity.md
````diff
@@ -17,7 +17,7 @@ There’s no need to explicitly create or destroy entity instances. An entity in
 Each entity has a special boolean attribute `removed`, which is used to mark the entity as "soft deleted",
 without destroying existing relationships and attached metadata. This is useful for quickly reviving an incorrectly deleted entity instance without losing valuable metadata, e.g. human authored content.
 
-An example [PDSC](https://linkedin.github.io/rest.li/pdsc_syntax) schema for the `Dataset` entity is shown below. Note that:
+An example [PDL](https://linkedin.github.io/rest.li/pdl_schema) schema for the `Dataset` entity is shown below. Note that:
 1. Each entity is expected to have a `urn` field with an entity-specific URN type.
 2. The optional `removed` field is captured in BaseEntity, which is expected to be included by all entities.
 3. All other fields are expected to be of primitive types or enum only.
````
````diff
@@ -26,58 +26,42 @@ this mostly depends on the underlying indexing system. For simplicity, we only a
 4. The `urn` field is non-optional, while all other fields must be optional.
 This supports "partial updates", where only a subset of attributes needs to be altered.
 
-```json
-{
-  "type": "record",
-  "name": "BaseEntity",
-  "namespace": "com.linkedin.metadata.entity",
-  "doc": "Common fields that apply to all entities",
-  "fields": [
-    {
-      "name": "removed",
-      "type": "boolean",
-      "doc": "Whether the entity has been removed or not",
-      "optional": true,
-      "default": false
-    }
-  ]
+```
+namespace com.linkedin.metadata.entity
+/**
+ * Common fields that apply to all entities
+ */
+record BaseEntity {
+  /** Whether the entity has been removed or not */
+  removed: optional boolean = false
 }
 ```
 
-```json
-{
-  "type": "record",
-  "name": "DatasetEntity",
-  "namespace": "com.linkedin.metadata.entity",
-  "doc": "Data model for a dataset entity",
-  "include": [
-    "BaseEntity"
-  ],
-  "fields": [
-    {
-      "name": "urn",
-      "type": "com.linkedin.common.DatasetUrn",
-      "doc": "Urn of the dataset"
-    },
-    {
-      "name": "name",
-      "type": "string",
-      "doc": "Dataset native name",
-      "optional": true
-    },
-    {
-      "name": "platform",
-      "type": "com.linkedin.common.DataPlatformUrn",
-      "doc": "Platform urn for the dataset.",
-      "optional": true
-    },
-    {
-      "name": "fabric",
-      "type": "com.linkedin.common.FabricType",
-      "doc": "Fabric type where dataset belongs to.",
-      "optional": true
-    }
-  ]
+```
+namespace com.linkedin.metadata.entity
+import com.linkedin.common.DataPlatformUrn
+import com.linkedin.common.DatasetUrn
+import com.linkedin.common.FabricType
+/**
+ * Data model for a dataset entity
+ */
+record DatasetEntity includes BaseEntity {
+  /** Urn of the dataset */
+  urn: DatasetUrn
+  /** Dataset native name */
+  name: optional string
+  /** Platform urn for the dataset */
+  platform: optional DataPlatformUrn
+  /** Fabric type the dataset belongs to */
+  origin: optional FabricType
 }
 ```
````
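Two rules work together here. Only `urn` is required, which makes partial updates possible, and the `removed` flag lets a soft delete leave every other attribute in place. The minimal Python sketch below shows both. It assumes a hypothetical in-memory index of `DatasetEntity` attribute rows in which `None` means "not set in this update". The index, the function and the URN string are illustrative and not part of the documented API. Field names follow the PDL `DatasetEntity` in this commit.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class DatasetEntity:
    """Mirror of the DatasetEntity schema; None means "not set in this update"."""
    urn: str                        # the only required attribute
    name: Optional[str] = None
    platform: Optional[str] = None
    origin: Optional[str] = None
    removed: Optional[bool] = None  # soft-delete flag inherited from BaseEntity


def apply_partial_update(index: Dict[str, DatasetEntity],
                         update: DatasetEntity) -> DatasetEntity:
    """Overwrite only the attributes that are set in `update`, keyed by URN."""
    current = index.get(update.urn, DatasetEntity(urn=update.urn))
    changes = {f.name: getattr(update, f.name)
               for f in fields(update)
               if f.name != "urn" and getattr(update, f.name) is not None}
    merged = replace(current, **changes)
    index[update.urn] = merged
    return merged


index: Dict[str, DatasetEntity] = {}
urn = "urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)"
apply_partial_update(index, DatasetEntity(urn, name="PageViewEvent", origin="PROD"))
apply_partial_update(index, DatasetEntity(urn, removed=True))  # soft delete
print(index[urn])  # name and origin survive the soft delete
```

Using `None` as "unset" is a simplification for the sketch. With a real Pegasus record, you would check whether the optional field is present rather than compare its value, since `removed` has a default of `false`.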

124 changes: 56 additions & 68 deletions docs/what/relationship.md
````diff
@@ -16,88 +16,76 @@ Similar to an entity, a relationship can also be associated with optional attrib
 For example, from the `Membership` metadata aspect shown below, we’re able to derive the `HasMember` relationship that links a specific `Group` to a specific `User`. We can also include additional attributes in the relationship, e.g. importance, which corresponds to the position of the specific member in the original membership array. This allows complex graph queries that traverse only relationships matching certain criteria, e.g. "return only the top-5 most important members of this group."
 Similar to the entity attributes, relationship attributes should only be added based on the expected query patterns to reduce the indexing cost.
 
-```json
-{
-  "type": "record",
-  "name": "Membership",
-  "namespace": "com.linkedin.group",
-  "doc": "The membership metadata for a group",
-  "fields": [
-    {
-      "name": "auditStamp",
-      "type": "com.linkedin.common.AuditStamp",
-      "doc": "Audit stamp for the last change"
-    },
-    {
-      "name": "admin",
-      "type": "com.linkedin.common.CorpuserUrn",
-      "doc": "Admin of the group"
-    },
-    {
-      "name": "members",
-      "type": {
-        "type": "array",
-        "items": "com.linkedin.common.CorpuserUrn"
-      },
-      "doc": "Members of the group, ordered in descending importance"
-    }
-  ]
+```
+namespace com.linkedin.group
+import com.linkedin.common.AuditStamp
+import com.linkedin.common.CorpuserUrn
+/**
+ * The membership metadata for a group
+ */
+record Membership {
+  /** Audit stamp for the last change */
+  modified: AuditStamp
+  /** Admin of the group */
+  admin: CorpuserUrn
+  /** Members of the group, ordered in descending importance */
+  members: array[CorpuserUrn]
 }
 ```
````
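The `HasMember` derivation described above could look roughly like the following minimal Python sketch. It emits one edge per entry in `members` and takes `importance` from the entry's position. Two assumptions are built in. First, the group's URN comes from the entity the `Membership` aspect is attached to, because the aspect itself does not contain it. Second, position 0 is the most important member. The class, the functions and the URN strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HasMember:
    source: str       # group URN (the entity the Membership aspect belongs to)
    destination: str  # member's corpuser URN
    importance: int   # position in `members`; 0 = most important (assumed)


def derive_has_member(group_urn: str, members: List[str]) -> List[HasMember]:
    """Emit one HasMember edge per member, keeping the array order as importance."""
    return [HasMember(source=group_urn, destination=member, importance=i)
            for i, member in enumerate(members)]


def top_members(edges: List[HasMember], k: int) -> List[str]:
    """Answer "return only the top-k most important members of this group"."""
    ranked = sorted(edges, key=lambda e: e.importance)
    return [e.destination for e in ranked[:k]]


edges = derive_has_member(
    "urn:li:corpGroup:data-eng",
    ["urn:li:corpuser:alice", "urn:li:corpuser:bob", "urn:li:corpuser:carol"],
)
print(top_members(edges, 2))  # ['urn:li:corpuser:alice', 'urn:li:corpuser:bob']
```

Storing the position as an edge attribute is what lets a graph query filter on `importance` directly. It does not need to reload and re-sort the original `members` array.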

````diff
 Relationships are meant to be "entity-neutral". In other words, one would expect to use the same `OwnedBy` relationship to link a `Dataset` to a `User` and to link a `Dashboard` to a `User`. As Pegasus doesn’t allow typing a field using multiple URNs (because they’re all essentially strings), we resort to using a generic URN type for the source and destination.
-We also introduce a non-standard property pairings to limit the allowed source and destination URN types.
+We also introduce a `@pairings` [annotation](https://linkedin.github.io/rest.li/pdl_migration#shorthand-for-custom-properties) to limit the allowed source and destination URN types.
 
 While it’s possible to model relationships in rest.li as [association resources](https://linkedin.github.io/rest.li/modeling/modeling#association), which often get stored as mapping tables, it is far more common to model them as "foreign key" fields in a metadata aspect. For instance, the `Ownership` aspect is likely to contain an array of owners’ corpuser URNs.
 
-Below is an example of how a relationship is modeled in PDSC. Note that:
+Below is an example of how a relationship is modeled in PDL. Note that:
 1. As the `source` and `destination` are of generic URN type, we’re able to factor them out to a common `BaseRelationship` model.
-2. Each model is expected to have a pairings property that is an array of all allowed source-destination URN pairs.
+2. Each model is expected to have a `@pairings` annotation that is an array of all allowed source-destination URN pairs.
 3. Unlike entity attributes, there’s no requirement on making all relationship attributes optional since relationships do not support partial updates.
 
-```json
-{
-  "type": "record",
-  "name": "BaseRelationship",
-  "namespace": "com.linkedin.metadata.relationship",
-  "doc": "Common fields that apply to all relationships",
-  "fields": [
-    {
-      "name": "source",
-      "type": "com.linkedin.common.Urn",
-      "doc": "Urn for the source of the relationship"
-    },
-    {
-      "name": "destination",
-      "type": "com.linkedin.common.Urn",
-      "doc": "Urn for the destination of the relationship"
-    }
-  ]
+```
+namespace com.linkedin.metadata.relationship
+import com.linkedin.common.Urn
+/**
+ * Common fields that apply to all relationships
+ */
+record BaseRelationship {
+  /**
+   * Urn for the source of the relationship
+   */
+  source: Urn
+  /**
+   * Urn for the destination of the relationship
+   */
+  destination: Urn
 }
 ```
 
-```json
+```
+namespace com.linkedin.metadata.relationship
+/**
+ * Data model for a has-member relationship
+ */
+@pairings = [ {
+  "destination" : "com.linkedin.common.urn.CorpGroupUrn",
+  "source" : "com.linkedin.common.urn.CorpUserUrn"
+} ]
+record HasMembership includes BaseRelationship
 {
-  "type": "record",
-  "name": "HasMember",
-  "namespace": "com.linkedin.metadata.relationship",
-  "doc": "Data model for a has-member relationship",
-  "include": [
-    "BaseRelationship"
-  ],
-  "pairings": [
-    {
-      "source": "com.linkedin.common.urn.CorpGroupUrn",
-      "destination": "com.linkedin.common.urn.CorpUserUrn"
-    }
-  ],
-  "fields": [
-    {
-      "name": "importance",
-      "type": "int",
-      "doc": "The importance of the membership"
-    }
-  ]
+  /**
+   * The importance of the membership
+   */
+  importance: int
 }
 ```
````
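The docs do not describe how `@pairings` is enforced. As a hypothetical illustration of what the annotation expresses, the Python sketch below checks a source/destination pair against an allowed list for the entity-neutral `OwnedBy` example (`Dataset` or `Dashboard` to `User`). To stay self-contained, it compares the entity-type segment of each URN (`urn:li:<type>:...`) instead of the URN class names that the real annotation lists. The pairings list, the helpers and the URN strings are all assumptions.

```python
from typing import Dict, List

# Hypothetical pairings for an entity-neutral OwnedBy relationship, in the same
# source/destination shape as @pairings, but keyed by URN entity type (the
# segment after "urn:li:") instead of URN class names.
OWNED_BY_PAIRINGS: List[Dict[str, str]] = [
    {"source": "dataset", "destination": "corpuser"},
    {"source": "dashboard", "destination": "corpuser"},
]


def entity_type(urn: str) -> str:
    """Extract the entity type from a URN such as 'urn:li:corpuser:jdoe'."""
    parts = urn.split(":", 3)
    if len(parts) < 4 or parts[0] != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return parts[2]


def is_allowed(source: str, destination: str,
               pairings: List[Dict[str, str]]) -> bool:
    """True if the (source, destination) entity types match an allowed pairing."""
    pair = {"source": entity_type(source), "destination": entity_type(destination)}
    return pair in pairings


dataset = "urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)"
user = "urn:li:corpuser:jdoe"
print(is_allowed(dataset, user, OWNED_BY_PAIRINGS))  # True
print(is_allowed(user, dataset, OWNED_BY_PAIRINGS))  # False: reversed direction
```

Because `source` and `destination` are both plain `Urn`s in `BaseRelationship`, a check like this is the only place the direction of the relationship is constrained.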
