Support arbitrary tuples #34

regexident · 2021-07-17T20:08:50Z

This PR aims to make datafrog less dependent on (Key, Value) tuples. It lessens the burden of creating lots of intermediary variables that do little more than move an element to the front of the tuple so that it can be used as the key.

Before

Currently you have to introduce auxiliary variables of (Key, Value) tuples:

let variable: Variable<(f32, i32, bool)> = …;
let relation: Relation<(u8, i32)> = …;

// Auxiliary variables/relations:
let auxiliary_variable: Variable<(i32, (f32, bool))> = …;
let auxiliary_relation: Relation<(i32, u8)> = …;

let result_variable = iteration.variable::<(u8, bool)>("(u8, bool)");

while iteration.changed() {
    // Additional maintenance for auxiliary variables:
    // Move the i32 key to the front and keep the remaining elements as the value:
    auxiliary_variable.from_map(&variable, |tuple|
        (tuple.1, (tuple.0, tuple.2))
    );
    auxiliary_relation.from_map(&relation, |tuple|
        (tuple.1, tuple.0)
    );

    result_variable.from_join(
        &auxiliary_variable,
        &auxiliary_relation,
        // Arguments are (&key, &value1, &value2), i.e. (&i32, &(f32, bool), &u8):
        |_, &(_, boolean1), &byte2| (byte2, boolean1),
    );
}

After

… with this PR you simply extract the key from an arbitrary tuple index using a closure:

let variable: Variable<(f32, i32, bool)> = …;
let relation: Relation<(u8, i32)> = …;

let result_variable = iteration.variable::<(u8, bool)>("(u8, bool)");

while iteration.changed() {
    result_variable.from_join_by(
        &variable,
        &relation,
        |(_, integer, _)| integer,
        |(_, integer)| integer,
        |(_, _, boolean1), (byte2, _)| (byte2, boolean1),
    );
}

This cuts down the maintenance burden of keeping several additional auxiliary variables and relations up to date, reduces the mental overhead, and brings your datafrog Rust code a little closer to the corresponding Datalog rules.

@frankmcsherry @nikomatsakis I hope that these changes are not in conflict with the soundness of the API of datafrog or the semantics of Datalog?

Performance

I did some performance testing with criterion and found no consistent, noteworthy performance regressions for the existing APIs. The overhead of the …_by method variants was minimal.

Known limitations

Unfortunately this PR only allows lenses to return references to arbitrary elements of the tuple (via Accessor1: Fn(&Tuple1) -> &Key), but no owned ad-hoc values.

I had initially hoped to allow arbitrary Key values, such as ad-hoc tuples or other computed values. This would have allowed returning (&X, &Y) as the key for an (X, Y, Z) tuple, rather than just a single element of it. Support for arbitrary tuples as keys would have been very convenient for scenarios with complex n-ary keys, such as borrow_check.rs.

Changing Accessor1: Fn(&Tuple1) -> &Key to Accessor1: Fn(&Tuple1) -> Key unfortunately causes accessors that return a reference to the tuple or to elements of it, such as |tuple| &tuple or |tuple| &tuple.0, to trigger this language limitation. Those are the most commonly used accessors, and I'm not aware of a workaround that would allow both arbitrary keys and borrowed tuple element keys.

Minimal reproducible snippet

fn dummy<'a, T, K: 'a, F>(value: &'a T, f: F) -> K
where
    F: Fn(&T) -> K
{
    f(value)
}

fn main() {
    let tuple = (1, 2, 3);
    let _ = dummy(&tuple, |tuple| &tuple.0);
}

Error:

10 |     let _ = dummy(&tuple, |tuple| &tuple.0);
   |                            ------ ^^^^^^^^ returning this value requires that `'1` must outlive `'2`
   |                            |    |
   |                            |    return type of closure is &'2 i32
   |                            has type `&'1 (i32, i32, i32)`
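For contrast, here is a minimal sketch of the reference-returning shape that the PR settles on (Fn(&Tuple1) -> &Key). The names `project` and `demo` are hypothetical and not part of datafrog. With `-> &K`, lifetime elision ties the returned reference to the borrowed argument, so the closure is accepted. With `-> K`, the type K has to be fixed outside the closure's per-call lifetime and therefore cannot name a borrow of the argument.

```rust
// Hypothetical helper, not datafrog API: the accessor returns a reference.
// Elision expands the bound to `for<'a> Fn(&'a T) -> &'a K`, so the output
// borrow is tied to the input borrow and the closure below type-checks.
fn project<T, K, F>(value: &T, f: F) -> &K
where
    F: Fn(&T) -> &K,
{
    f(value)
}

// Hypothetical usage: borrow the first element of a tuple.
fn demo() -> i32 {
    let tuple = (1, 2, 3);
    *project(&tuple, |tuple| &tuple.0)
}
```

The trade-off is the one described under "Known limitations": this signature cannot return an owned, computed key such as (&X, &Y).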

This is part 3 of a stacked pull request and builds upon #33:

└── Split-up lib.rs (#32)
    └── Cleanup generics (#33)
        └── Support arbitrary tuples 👈🏻

ecstatic-morse · 2021-08-04T21:00:18Z

src/join.rs

            Ordering::Less => {
-                slice1 = gallop(slice1, |x| x.0 < slice2[0].0);
+                slice1 = gallop(slice1, |x| accessor1(x) < accessor2(&slice2[0]));


To join efficiently, tuples need to be in sorted order. This is a precondition of gallop/binary_search. In the current implementation, both slice1 and slice2 are in sorted order because they come from a Relation, but the mapped values (once the accessor is applied) may not be.
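A small sketch of the problem, using plain slices rather than datafrog types. The helper `is_sorted_by_accessor` is hypothetical and only illustrates the precondition: a slice sorted as whole tuples is sorted by its first element, but not necessarily by any other element.

```rust
// Hypothetical helper, not datafrog API: checks whether the keys produced by
// `accessor` appear in non-decreasing order, which is what gallop and
// binary search would need in order to be correct.
fn is_sorted_by_accessor<T, K, F>(tuples: &[T], accessor: F) -> bool
where
    K: Ord,
    F: Fn(&T) -> &K,
{
    tuples
        .windows(2)
        .all(|pair| accessor(&pair[0]) <= accessor(&pair[1]))
}
```

For example, the sorted tuples (1, 9), (2, 3), (3, 5) pass the check for the accessor `|t| &t.0` but fail it for `|t| &t.1`, because the projected keys 9, 3, 5 are out of order.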

It would be possible to make this a contract of the accessors (probably checked at runtime?), but that would be a pretty big footgun, and you would still have to re-index variables when their elements are in the wrong order. Accessors only help when tuple elements aren't grouped properly (e.g. (A, B, C) vs. ((A, B), C)). However, there are type-safe ways of handling that particular case.

regexident · 2021-08-14

Oh no, I feared I'd be missing something like that. 😟

I'm familiar with h-lists, but not exactly sure how one would apply them here? 🤔

ecstatic-morse · 2021-08-14

datafrog expects leapers to use an interface like (Key, Value), which becomes awkward when Key or Value contain multiple elements themselves. That's why you see variables with types like ((Origin, Point), Origin) or ((Loan, Point), ()) over in Polonius. However, the only prerequisite for an efficient join is that variables/relations have a common prefix: You should be able to join (Loan, Origin, X) with (Loan, Origin, Y, Z) directly, without having to re-index them as ((Loan, Origin), ...).
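To illustrate the common-prefix idea, here is a rough sketch of a sort-merge join over plain sorted slices. It is not datafrog's implementation: `join_on_prefix` and the `Loan`/`Origin` aliases are hypothetical, and it advances linearly where datafrog would gallop. It relies only on both inputs being sorted as whole tuples, which implies they are sorted by their shared (Loan, Origin) prefix.

```rust
use std::cmp::Ordering;

// Hypothetical stand-ins for the Polonius types mentioned above.
type Loan = u32;
type Origin = u32;

// Joins two slices on their common (Loan, Origin) prefix.
// Assumes both slices are sorted in ascending tuple order.
fn join_on_prefix<X: Copy, Y: Copy, Z: Copy>(
    left: &[(Loan, Origin, X)],
    right: &[(Loan, Origin, Y, Z)],
) -> Vec<(Loan, Origin, X, Y, Z)> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        let lk = (left[i].0, left[i].1);
        let rk = (right[j].0, right[j].1);
        match lk.cmp(&rk) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Find the run of tuples sharing this prefix on each side.
                let i_end = i + left[i..].iter().take_while(|t| (t.0, t.1) == lk).count();
                let j_end = j + right[j..].iter().take_while(|t| (t.0, t.1) == rk).count();
                // Emit the cross product of the two runs.
                for l in &left[i..i_end] {
                    for r in &right[j..j_end] {
                        out.push((l.0, l.1, l.2, r.2, r.3));
                    }
                }
                i = i_end;
                j = j_end;
            }
        }
    }
    out
}
```

No re-indexing into ((Loan, Origin), ...) is needed, because the prefix comparison reads the leading elements in place.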

I think it would be simplest to express this constraint on top of h-lists (with the typical ordering reversed, so (((A), B), C) instead of (A, (B, (C)))), since you can take a reference to a valid h-list representing any prefix of that type. You could also do something similar with extension traits on top of tuples (impl Prefix<(A, B)> for (A, B, C)), but everything would have to be Copy. This is fine for Polonius I suppose.
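The extension-trait variant could look roughly like the following sketch. The `Prefix` trait, its impls, and `same_prefix` are hypothetical names chosen for illustration; nothing like them exists in datafrog today. Because the prefix is returned by value, every element has to be Copy, as noted above.

```rust
// Hypothetical trait: `Self` is a tuple whose leading elements form `P`.
trait Prefix<P> {
    fn prefix(&self) -> P;
}

// A 3-tuple has a 1-element and a 2-element prefix.
impl<A: Copy, B: Copy, C: Copy> Prefix<(A,)> for (A, B, C) {
    fn prefix(&self) -> (A,) {
        (self.0,)
    }
}

impl<A: Copy, B: Copy, C: Copy> Prefix<(A, B)> for (A, B, C) {
    fn prefix(&self) -> (A, B) {
        (self.0, self.1)
    }
}

// A 4-tuple shares the 2-element prefix with the 3-tuple above.
impl<A: Copy, B: Copy, C: Copy, D: Copy> Prefix<(A, B)> for (A, B, C, D) {
    fn prefix(&self) -> (A, B) {
        (self.0, self.1)
    }
}

// Two tuples of different arity can be compared on a shared prefix type.
fn same_prefix<P: Eq, L: Prefix<P>, R: Prefix<P>>(left: &L, right: &R) -> bool {
    left.prefix() == right.prefix()
}
```

A join could then be generic over any two tuple types that implement Prefix for the same P. In practice the impls would likely be generated by a macro for each arity.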

regexident · 2021-08-14

Thanks for the detailed explanation!

ecstatic-morse reviewed Aug 4, 2021

regexident added 2 commits August 14, 2021 15:58

Unify generic argument naming scheme

c33506a

Support arbitrary tuples via key accessors

ccf897c

ecstatic-morse mentioned this pull request Aug 14, 2021

Clean up generics for a more consistent naming scheme #33

Closed

regexident closed this Aug 14, 2021
