Support arbitrary tuples #34

Closed
wants to merge 2 commits into from

Contributor

@regexident regexident commented Jul 17, 2021

This PR aims to make datafrog less dependent on (Key, Value) tuples, lessening the burden of having to create lots of intermediary variables that do little more than move an element to the front of the tuple so that it can be used as the key.

Before

Currently you have to introduce auxiliary variables and relations of (Key, Value) tuples:

let variable: Variable<(f32, i32, bool)> = …;
let relation: Relation<(u8, i32)> = …;

// Auxiliary variables/relations:
let auxiliary_variable: Variable<(i32, (f32, bool))> = …;
let auxiliary_relation: Relation<(i32, u8)> = …;

let result_variable = iteration.variable::<(u8, bool)>("(u8, bool)");

while iteration.changed() {
    // Additional maintenance for auxiliary variables:
    auxiliary_variable.from_map(&variable, |tuple|
        // Move the i32 key to the front; the rest becomes the value.
        (tuple.1, (tuple.0, tuple.2))
    );
    auxiliary_relation.from_map(&relation, |tuple|
        (tuple.1, tuple.0)
    );

    result_variable.from_join(
        &auxiliary_variable,
        &auxiliary_relation,
        |_, &(_, boolean1), &byte2| (byte2, boolean1),
    );
}

After

… with this PR you simply extract the key from an arbitrary tuple index using a closure:

let variable: Variable<(f32, i32, bool)> = …;
let relation: Relation<(u8, i32)> = …;

let result_variable = iteration.variable::<(u8, bool)>("(u8, bool)");

while iteration.changed() {
    result_variable.from_join_by(
        &variable,
        &relation,
        |(_, integer, _)| integer,
        |(_, integer)| integer,
        |(_, _, boolean1), (byte2, _)| (byte2, boolean1),
    );
}

This cuts down the maintenance burden of keeping several additional auxiliary variables and relations up to date, reduces the mental overhead, and brings your datafrog Rust code a little bit closer to its corresponding Datalog rules.

@frankmcsherry @nikomatsakis I hope that these changes are not in conflict with the soundness of the API of datafrog or the semantics of Datalog?

Performance

I did some performance testing with criterion and found no consistent, noteworthy performance regressions for the existing APIs; the overhead of the …_by method variants was minimal.

Known limitations

Unfortunately this PR only allows lenses to return references to arbitrary elements of the tuple (via Accessor1: Fn(&Tuple1) -> &Key), but no owned ad-hoc values.

I had initially hoped to allow arbitrary Key values, such as ad-hoc tuples or other computed values. This would have allowed returning (&X, &Y) as the key for an (X, Y, Z) tuple, rather than just a single element of it. Support for arbitrary tuples as keys would have been very convenient for scenarios with complex n-ary keys such as borrow_check.rs.

Unfortunately, changing Accessor1: Fn(&Tuple1) -> &Key to Accessor1: Fn(&Tuple1) -> Key causes accessors that return a reference to the tuple or to one of its elements, such as |tuple| &tuple or |tuple| &tuple.0, to trigger this language limitation. Those are the most commonly used accessors, and I'm not aware of a workaround that would allow both arbitrary keys and borrowed tuple-element keys.

Minimal reproducible snippet
fn dummy<'a, T, K: 'a, F>(value: &'a T, f: F) -> K
where
    F: Fn(&T) -> K
{
    f(value)
}

fn main() {
    let tuple = (1, 2, 3);
    let _ = dummy(&tuple, |tuple| &tuple.0);
}

Error:

10 |     let _ = dummy(&tuple, |tuple| &tuple.0);
   |                            ------ ^^^^^^^^ returning this value requires that `'1` must outlive `'2`
   |                            |    |
   |                            |    return type of closure is &'2 i32
   |                            has type `&'1 (i32, i32, i32)`
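
For contrast, here is a minimal sketch of the reference-returning shape that does compile, mirroring the Accessor1: Fn(&Tuple1) -> &Key bound described above. The name project is hypothetical and only used for illustration; it is not part of datafrog or of this PR.

```rust
// Hypothetical helper (not part of datafrog). Because the closure's return
// type is spelled `&K`, lifetime elision ties the returned reference to the
// closure's argument, so borrowing a tuple element is accepted.
fn project<T, K, F>(value: &T, f: F) -> &K
where
    F: Fn(&T) -> &K,
{
    f(value)
}

fn main() {
    let tuple = (1, 2, 3);
    // The same accessor that was rejected by the `-> K` variant.
    let first = project(&tuple, |tuple| &tuple.0);
    assert_eq!(*first, 1);
}
```

The trade-off is the one described under "Known limitations": with this bound the accessor can only hand out a borrow of something inside the tuple, not an owned or ad-hoc key.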

This is part #3 of a stacked pull-request and builds upon #33:

└── Split-up\ lib.rs (#32)
    └── Cleanup\ generics (#33)
        └── Support arbitrary tuples 👈🏻

  Ordering::Less => {
-     slice1 = gallop(slice1, |x| x.0 < slice2[0].0);
+     slice1 = gallop(slice1, |x| accessor1(x) < accessor2(&slice2[0]));
Contributor

@ecstatic-morse ecstatic-morse Aug 4, 2021

To join efficiently, tuples need to be in sorted order. This is a precondition of gallop/binary_search. In the current implementation, both slice1 and slice2 are in sorted order because they come from a Relation, but the mapped values (once accessor is applied) may not be.

It would be possible to make this a contract of the accessors (probably checked at runtime?), but that would be a pretty big footgun, and you would still have to re-index variables when their elements are in the wrong order. Accessors only help when tuple elements aren't grouped properly (e.g. (A, B, C) vs. ((A, B), C)). However, there are type-safe ways of handling that particular case.
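
A small sketch of the problem, assuming nothing beyond the standard library: tuples that are sorted as whole tuples stay sorted when projected onto their first element, but not necessarily when projected onto a later one. The helper is_sorted_by_key is a hypothetical free function written for this illustration, not a datafrog API.

```rust
// Hypothetical check (not part of datafrog): true if the projected keys are
// in non-decreasing order across the slice.
fn is_sorted_by_key<T, K: Ord>(tuples: &[T], key: impl Fn(&T) -> &K) -> bool {
    tuples.windows(2).all(|pair| key(&pair[0]) <= key(&pair[1]))
}

fn main() {
    let mut tuples = vec![(2, 1), (1, 3), (3, 2)];
    // Lexicographic order, as for the elements of a Relation:
    // [(1, 3), (2, 1), (3, 2)]
    tuples.sort();

    // Projecting onto the leading element preserves the order...
    assert!(is_sorted_by_key(&tuples, |t| &t.0));
    // ...but projecting onto the second element yields 3, 1, 2.
    assert!(!is_sorted_by_key(&tuples, |t| &t.1));
}
```

A runtime check of this kind is roughly what enforcing the accessor contract would involve, and a failing check would still require re-indexing the data.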

Contributor Author

Oh no, I feared I'd be missing something like that. 😟

I'm familiar with h-lists, but not exactly sure how one would apply them here? 🤔

Contributor

@ecstatic-morse ecstatic-morse Aug 14, 2021

datafrog expects leapers to use an interface like (Key, Value), which becomes awkward when Key or Value contain multiple elements themselves. That's why you see variables with types like ((Origin, Point), Origin) or ((Loan, Point), ()) over in Polonius. However, the only prerequisite for an efficient join is that variables/relations have a common prefix: You should be able to join (Loan, Origin, X) with (Loan, Origin, Y, Z) directly, without having to re-index them as ((Loan, Origin), ...).

I think it would be simplest to express this constraint on top of h-lists (with the typical ordering reversed, so (((A), B), C) instead of (A, (B, (C)))), since you can take a reference to a valid h-list representing any prefix of that type. You could also do something similar with extension traits on top of tuples (impl Prefix<(A, B)> for (A, B, C)), but everything would have to be Copy. This is fine for Polonius I suppose.
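
A rough sketch of the extension-trait variant, under the Copy restriction mentioned above. The Prefix trait and join_on_prefix function are hypothetical names invented for this illustration; they are not part of datafrog, and the merge loop is a plain linear merge rather than datafrog's galloping join.

```rust
use std::cmp::Ordering;

// Hypothetical trait (not in datafrog): yields the leading elements of a
// tuple as an owned key, which is why those elements must be Copy.
trait Prefix<P> {
    fn prefix(&self) -> P;
}

impl<A: Copy, B: Copy, C> Prefix<(A, B)> for (A, B, C) {
    fn prefix(&self) -> (A, B) {
        (self.0, self.1)
    }
}

impl<A: Copy, B: Copy, C, D> Prefix<(A, B)> for (A, B, C, D) {
    fn prefix(&self) -> (A, B) {
        (self.0, self.1)
    }
}

// Merge-joins two slices that are each sorted lexicographically, and hence
// sorted by their common prefix `P`.
fn join_on_prefix<P, T1, T2>(left: &[T1], right: &[T2]) -> Vec<(T1, T2)>
where
    P: Ord,
    T1: Prefix<P> + Copy,
    T2: Prefix<P> + Copy,
{
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        let key = left[i].prefix();
        match key.cmp(&right[j].prefix()) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Pair this left tuple with the whole run of equal keys.
                for r in right[j..].iter().take_while(|r| r.prefix() == key) {
                    out.push((left[i], *r));
                }
                i += 1;
            }
        }
    }
    out
}

fn main() {
    let left: Vec<(u32, u32, char)> = vec![(1, 10, 'a'), (1, 20, 'b'), (2, 10, 'c')];
    let right: Vec<(u32, u32, u8, bool)> =
        vec![(1, 10, 7, false), (1, 10, 8, true), (2, 10, 9, false)];

    // The prefix type is named explicitly; the tuple types are inferred.
    let joined = join_on_prefix::<(u32, u32), _, _>(&left, &right);
    assert_eq!(joined.len(), 3);
}
```

No re-indexing into ((A, B), ...) is needed here because both inputs are already sorted by the shared leading elements.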

Contributor Author

Thanks for the detailed explanation!
