-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#1395] Marshalling Set and HashSet #1405
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good to me. We should definitely offer these Type
s.
If however we decide to add variants that don't accept duplicates, I think we need to discuss the naming of the Type
s, and the question of which variant the Interpret
instances should use.
@@ -1659,6 +1690,9 @@ instance Inject a => Inject (Vector a) where | |||
instance Inject a => Inject (Data.Set.Set a) where | |||
injectWith = fmap (contramap Data.Set.toList) injectWith | |||
|
|||
instance Inject a => Inject (Data.HashSet.HashSet a) where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we should sort the elements… 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In math, sets have no notion of ordering so I would tend towards not sorting it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having second thoughts, maybe we should sort it... Extra opinion @Gabriel439?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, this is something that we could change later on. For now, maybe just point out that there's no sorting, and add a doctest that demonstrates it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've done one for HashSet
(not pushed yet), but what about Set
? Data.Set. toList
and Data.Set.toAscList
are both O(n)
, I might as well use toAscList
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Data.Set.toList
is defined via toAscList
. toAscList
is more explicit though. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. Good.
Co-Authored-By: Simon Jakobi <[email protected]>
Co-Authored-By: Simon Jakobi <[email protected]>
I added In my opinion the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on the idea of just comparing the numbers of elements!
dhall/src/Dhall.hs
Outdated
> Non-distinct elements in fromList [NaturalLit 1,NaturalLit 1,NaturalLit 3] | ||
> | ||
-} | ||
distinctList :: (Ord a) => Type a -> Type (Data.Set.Set a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to call this setFromDistinctList
.
dhall/src/Dhall.hs
Outdated
distinctList (Type extractIn expectedIn) = Type extractOut expectedOut | ||
where | ||
extractOut (ListLit _ es) | ||
| length es == length (seqToSet es) = seqToSet <$> traverse extractIn es |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would fail to catch cases where distinct Dhall Expr
s map to the same Haskell value. So I believe we should check the size of the resulting Set
instead of the size of the Set
of Expr
s.
dhall/src/Dhall.hs
Outdated
distinctList (Type extractIn expectedIn) = Type extractOut expectedOut | ||
where | ||
extractOut (ListLit _ es) | ||
| length es == length (seqToSet es) = seqToSet <$> traverse extractIn es |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've previously had little bugs where, after a refactoring, null
was applied to an Either
instead of a []
or so.
Would you mind using the respective size
functions instead of length
?
dhall/src/Dhall.hs
Outdated
seqToSet :: (Ord a, Foldable t) => t a -> Data.Set.Set a | ||
seqToSet = Data.Set.fromList . Data.Foldable.toList | ||
|
||
err = "Non-distinct elements in " <> Data.Text.pack (show es) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message could get pretty huge. Maybe we could try to find the first duplicate (Haskell) element and just report that?
Next iteration, unfortunately it became more complex... If I want to show the Haskell duplicate elements, I need to apply |
dhall/src/Dhall.hs
Outdated
esList = Data.Foldable.toList esSeq | ||
esSet = Data.Set.fromList esList | ||
sameSize = Data.Set.size esSet == Data.Sequence.length esSeq | ||
duplicates = esList Data.List.\\ Data.Foldable.toList esSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're serious about the possibility of duplicates due to distinct Dhall terms having the same Haskell representation, this isn't sufficient. deleteBy (\x y -> extractIn x == extractIn y)
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this stage, esSet
and esList
are Haskell values already because we are inside of the case traverse extractIn es of
.
Maybe the es
notation isn't helping, to be honest I don't even know what es
stands for, I copied it from other functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, my misunderstanding!
es
is for "expressions", I guess. You could use values
or vs
for the Haskell values.
dhall/src/Dhall.hs
Outdated
Duplicate elements are ignored. | ||
-} | ||
set :: (Ord a) => Type a -> Type (Data.Set.Set a) | ||
set = fmap Data.Set.fromList . list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be the same as setFromDistinctList
. My reasoning is that by default we should fail loudly if we detect anything wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's maybe keep this definition as ~setIgnoringDuplicates
or setAllowingDuplicates
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that seems reasonable to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, apart from a few more comments.
I'd be interested in feedback from @Gabriel439 and @neongreen though.
@@ -935,6 +1064,12 @@ instance Interpret a => Interpret [a] where | |||
instance Interpret a => Interpret (Vector a) where | |||
autoWith opts = vector (autoWith opts) | |||
|
|||
instance (Interpret a, Ord a, Show a) => Interpret (Data.Set.Set a) where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document how this instance handles duplicates? Maybe just reference setFromDistinctList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can't be any duplicates here because the source is a Set
which turns into a Dhall List
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take that as an indication that we should find better names for Interpret
and Inject
. ;)
(Take a look at the definition on the line below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my God, don't I feel silly!
Thank you for taking some of the blame away from me :)
instance (Interpret a, Ord a, Show a) => Interpret (Data.Set.Set a) where | ||
autoWith opts = setFromDistinctList (autoWith opts) | ||
|
||
instance (Interpret a, Hashable a, Ord a, Show a) => Interpret (Data.HashSet.HashSet a) where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
dhall/src/Dhall.hs
Outdated
extractOut (ListLit _ es) = case traverse extractIn es of | ||
Success vSeq | ||
| sameSize -> Success vSet | ||
| length duplicates == 1 -> extractError err1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like this would make it cheaper to distinguish Success
and Failure
:
| length duplicates == 1 -> extractError err1 | |
| otherwise -> extractError err | |
where err | length duplicates == 1 … |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it be cheaper?
I don't mind, but err1
and errN
take up a few lines, I think the structure code may be harder to read...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current situation it seems that we need to compute length duplicates
to get the Failure
constructor. I believe that if we move the length duplicates
check into the computation of the error message, the caller can determine the constructor without computing length duplicates
.
Maybe GHC
is smart enough, but it's better not to rely on it too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But Failure
is only accessed if it's not a Success
. If it's not a success, it doesn't go in the conditions.
f x = case x of
Just x | error "whoops" -> 1
Nothing -> 0
ghci> f Nothing
0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure what you mean. extractError
results in Failure
too.
I'm assuming usage like
case toMonadic (extract (setFromDistinctList elementType) expr) of
Right s -> …
Left _ -> … -- handle failure but ignore the error message
I'm arguing that even if expr
contains duplicates, it shouldn't be necessary to force length duplicates
, but the current code looks like it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I understand what you mean now, users should be able to pattern match on Failure
without extra computation. That's a good point and sorry for being so dense ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. I wasn't very clear!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! Feel free to merge whenever you are ready (or add the merge-me
label)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, apart from one more wibble. :)
Co-Authored-By: Simon Jakobi <[email protected]>
As discussed in #1395.
So far I have only implemented a version where duplicates are ignored,
Data.Set
doesn't provide tools to check for duplicates, and the notion of set doesn't really exist in dhall so I'm still a bit unsure about that...