Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1395] Marshalling Set and HashSet #1405

Merged
merged 11 commits into from
Oct 10, 2019
1 change: 1 addition & 0 deletions dhall/dhall.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -407,6 +407,7 @@ Library
exceptions >= 0.8.3 && < 0.11,
filepath >= 1.4 && < 1.5 ,
haskeline >= 0.7.2.1 && < 0.8 ,
hashable >= 1.2 && < 1.3 ,
lens-family-core >= 1.0.0 && < 2.1 ,
megaparsec >= 6.5.0 && < 7.1 ,
memory >= 0.14 && < 0.16,
Expand Down
97 changes: 92 additions & 5 deletions dhall/src/Dhall.hs
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,9 @@ module Dhall
, sequence
, list
, vector
, set
, setFromDistinctList
, hashSet
, Dhall.map
, pairFromMapEntry
, unit
Expand Down Expand Up @@ -119,6 +122,7 @@ import Data.Either.Validation (Validation(..), ealt, eitherToValidation, validat
import Data.Fix (Fix(..))
import Data.Functor.Contravariant (Contravariant(..), (>$<), Op(..))
import Data.Functor.Contravariant.Divisible (Divisible(..), divided)
import Data.Hashable (Hashable)
import Data.List.NonEmpty (NonEmpty (..))
import Data.Map (Map)
import Data.Monoid ((<>))
Expand All @@ -136,7 +140,7 @@ import Dhall.Import (Imported(..))
import Dhall.Parser (Src(..))
import Dhall.TypeCheck (DetailedTypeError(..), TypeError, X)
import GHC.Generics
import Lens.Family (LensLike', set, view)
import Lens.Family (LensLike', view)
import Numeric.Natural (Natural)
import Prelude hiding (maybe, sequence)
import System.FilePath (takeDirectory)
Expand All @@ -149,11 +153,13 @@ import qualified Data.Functor.Compose
import qualified Data.Functor.Product
import qualified Data.Map
import qualified Data.Maybe
import qualified Data.List
import qualified Data.List.NonEmpty
import qualified Data.Semigroup
import qualified Data.Scientific
import qualified Data.Sequence
import qualified Data.Set
import qualified Data.HashSet
import qualified Data.Text
import qualified Data.Text.IO
import qualified Data.Text.Lazy
Expand All @@ -167,6 +173,7 @@ import qualified Dhall.Parser
import qualified Dhall.Pretty.Internal
import qualified Dhall.TypeCheck
import qualified Dhall.Util
import qualified Lens.Family

-- $setup
-- >>> :set -XOverloadedStrings
Expand Down Expand Up @@ -405,8 +412,8 @@ inputWithSettings settings (Type {..}) txt = do
let EvaluateSettings {..} = _evaluateSettings

let transform =
set Dhall.Import.normalizer _normalizer
. set Dhall.Import.startingContext _startingContext
Lens.Family.set Dhall.Import.normalizer _normalizer
. Lens.Family.set Dhall.Import.startingContext _startingContext

let status = transform (Dhall.Import.emptyStatus _rootDirectory)

Expand Down Expand Up @@ -499,8 +506,8 @@ inputExprWithSettings settings txt = do
let EvaluateSettings {..} = _evaluateSettings

let transform =
set Dhall.Import.normalizer _normalizer
. set Dhall.Import.startingContext _startingContext
Lens.Family.set Dhall.Import.normalizer _normalizer
. Lens.Family.set Dhall.Import.startingContext _startingContext

let status = transform (Dhall.Import.emptyStatus _rootDirectory)

Expand Down Expand Up @@ -791,6 +798,77 @@ list = fmap Data.Foldable.toList . sequence
vector :: Type a -> Type (Vector a)
vector = fmap Data.Vector.fromList . list

{-| Decode a `Set` from a `List`

>>> input (set natural) "[1, 2, 3]"
fromList [1,2,3]

Duplicate elements are ignored.
-}
set :: (Ord a) => Type a -> Type (Data.Set.Set a)
set = fmap Data.Set.fromList . list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the same as setFromDistinctList. My reasoning is that by default we should fail loudly if we detect anything wrong

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's maybe keep this definition as ~setIgnoringDuplicates or setAllowingDuplicates?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that seems reasonable to me


{-| Decode a `HashSet` from a `List`

>>> input (hashSet natural) "[1, 2, 3]"
fromList [1,2,3]

Duplicate elements are ignored.
-}
hashSet :: (Hashable a, Ord a) => Type a -> Type (Data.HashSet.HashSet a)
hashSet = fmap Data.HashSet.fromList . list

{-| Decode a `Set` from a `List` with distinct elements

>>> input (setFromDistinctList natural) "[1, 2, 3]"
fromList [1,2,3]

An error is thrown if the list contains duplicates.

> >>> input (setFromDistinctList natural) "[1, 1, 3]"
> *** Exception: Error: Failed extraction
>
> The expression type-checked successfully but the transformation to the target
> type failed with the following error:
>
> One duplicate element in the list: 1
>

> >>> input (setFromDistinctList natural) "[1, 1, 3, 3]"
> *** Exception: Error: Failed extraction
>
> The expression type-checked successfully but the transformation to the target
> type failed with the following error:
>
> 2 duplicates were found in the list, including 1
>

-}
setFromDistinctList :: (Ord a, Show a) => Type a -> Type (Data.Set.Set a)
setFromDistinctList (Type extractIn expectedIn) = Type extractOut expectedOut
where
extractOut (ListLit _ es) = case traverse extractIn es of
Success esSeq
| sameSize -> Success esSet
| length duplicates == 1 -> extractError err1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like this would make it cheaper to distinguish Success and Failure:

Suggested change
| length duplicates == 1 -> extractError err1
| otherwise -> extractError err
where err | length duplicates == 1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be cheaper?
I don't mind, but err1 and errN take up a few lines, I think the structure code may be harder to read...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current situation it seems that we need to compute length duplicates to get the Failure constructor. I believe that if we move the length duplicates check into the computation of the error message, the caller can determine the constructor without computing length duplicates.

Maybe GHC is smart enough, but it's better not to rely on it too much.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Failure is only accessed if it's not a Success. If it's not a success, it doesn't go in the conditions.

f x = case x of
    Just x | error "whoops" -> 1
    Nothing -> 0

ghci> f Nothing
0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what you mean. extractError results in Failure too.

I'm assuming usage like

case toMonadic (extract (setFromDistinctList elementType) expr) of
  Right s -> …
  Left _ -> … -- handle failure but ignore the error message

I'm arguing that even if expr contains duplicates, it shouldn't be necessary to force length duplicates, but the current code looks like it does.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand what you mean now, users should be able to pattern match on Failure without extra computation. That's a good point and sorry for being so dense ^^

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I wasn't very clear!

| otherwise -> extractError errN
where
esList = Data.Foldable.toList esSeq
esSet = Data.Set.fromList esList
sameSize = Data.Set.size esSet == Data.Sequence.length esSeq
duplicates = esList Data.List.\\ Data.Foldable.toList esSet
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're serious about the possibility of duplicates due to distinct Dhall terms having the same Haskell representation, this isn't sufficient. deleteBy (\x y -> extractIn x == extractIn y) should work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage, esSet and esList are Haskell values already because we are inside of the case traverse extractIn es of.
Maybe the es notation isn't helping, to be honest I don't even know what es stands for, I copied it from other functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, my misunderstanding!

es is for "expressions", I guess. You could use values or vs for the Haskell values.

err1 = "One duplicate element in the list: "
<> (Data.Text.pack $ show $ head duplicates)
errN = Data.Text.pack $ unwords
[ show $ length duplicates
, "duplicates were found in the list, including"
, show $ head duplicates
]
Failure f -> Failure f
extractOut expr = typeError expectedOut expr

expectedOut = App List expectedIn

{-| Decode a `Map` from a @toMap@ expression or generally a @Prelude.Map.Type@

>>> input (Dhall.map strictText bool) "toMap { a = True, b = False }"
Expand Down Expand Up @@ -935,6 +1013,12 @@ instance Interpret a => Interpret [a] where
instance Interpret a => Interpret (Vector a) where
autoWith opts = vector (autoWith opts)

instance (Interpret a, Ord a) => Interpret (Data.Set.Set a) where
autoWith opts = set (autoWith opts)

instance (Interpret a, Hashable a, Ord a) => Interpret (Data.HashSet.HashSet a) where
autoWith opts = hashSet (autoWith opts)

instance (Ord k, Interpret k, Interpret v) => Interpret (Map k v) where
autoWith opts = Dhall.map (autoWith opts) (autoWith opts)

Expand Down Expand Up @@ -1659,6 +1743,9 @@ instance Inject a => Inject (Vector a) where
instance Inject a => Inject (Data.Set.Set a) where
injectWith = fmap (contramap Data.Set.toList) injectWith

instance Inject a => Inject (Data.HashSet.HashSet a) where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should sort the elements… 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In math, sets have no notion of ordering so I would tend towards not sorting it...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having second thoughts, maybe we should sort it... Extra opinion @Gabriel439?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, this is something that we could change later on. For now, maybe just point out that there's no sorting, and add a doctest that demonstrates it.

Copy link
Collaborator Author

@jiegillet jiegillet Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've done one for HashSet (not pushed yet), but what about Set? Data.Set. toList and Data.Set.toAscList are both O(n), I might as well use toAscList right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Data.Set.toList is defined via toAscList. toAscList is more explicit though. 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Good.

injectWith = fmap (contramap Data.HashSet.toList) injectWith

instance (Inject a, Inject b) => Inject (a, b)

{-| This is the underlying class that powers the `Interpret` class's support
Expand Down