Implement floating point conversion with ryu #365

Merged on Dec 4, 2021 (45 commits; changes shown from 36 commits)

Commits
ba61414
Implement floating point conversion with ryu
la-wu Feb 22, 2021
7d80321
Use manual strictness
la-wu Feb 22, 2021
356ed6c
Use checked shifts
la-wu Feb 23, 2021
c0a0583
Use builtin float-to-word conversion functions
la-wu Feb 25, 2021
ff2210d
Use builtin conversion to Bool
la-wu Feb 25, 2021
0bb5a01
Remove dependency on array package
la-wu Feb 25, 2021
de1d174
Handle non-exhaustive patterns
la-wu Feb 25, 2021
8dd7f16
Try using prim conversions directly
la-wu Feb 25, 2021
23d5cfe
Revert "Try using prim conversions directly"
la-wu Feb 28, 2021
755f58f
Dispatch to slow cast when builtin unavailable
la-wu Feb 28, 2021
4635e2b
Try bumping min version to 8.4.x
la-wu Feb 28, 2021
76b5e2e
Fix log10pow5 approximation and add unit test
la-wu Aug 8, 2021
b5f7086
Re-export floatDec and doubleDec to maintain public API
la-wu Aug 8, 2021
648bfae
Improve documentation and fixes for initial code review
la-wu Aug 8, 2021
d172abf
Improve table generation documentation and clean-up
la-wu Aug 8, 2021
5f3dce5
Improve documentation of f2s and d2s and cleanup
la-wu Aug 8, 2021
f1c6275
Use stricter integral types and annotate fromIntegral usages
la-wu Sep 18, 2021
c2c2c87
Add module descriptions and fix typos
la-wu Sep 18, 2021
f6497c2
Use internal FloatFormat instead of GHC.Float.FFFormat
la-wu Sep 18, 2021
60db980
Use monomorphic helpers for remaining integral conversions used by Re…
la-wu Sep 23, 2021
6ec7e2d
Remove usage of TemplateHaskell in RealFloat
la-wu Sep 23, 2021
a73c84e
Fix LUT usage on big-endian systems
la-wu Sep 24, 2021
3ccfd47
Add header for endianness detection
la-wu Sep 27, 2021
6b3eaaa
Fix big-endian word16 packing in fast digit formatting
la-wu Sep 29, 2021
abf6e04
Fix big-endian word128 read from raw addr
la-wu Sep 29, 2021
f771cd5
Clean up unused functions
la-wu Oct 5, 2021
b394896
Fix incorrect reciprocal function usage
la-wu Oct 5, 2021
9815597
Add more test coverage and fix doc example
la-wu Oct 5, 2021
c0648bb
Use quickcheck equality property in tests
la-wu Oct 6, 2021
d87d3ae
Format haddock headers more similarly to existing ones
la-wu Oct 6, 2021
5500d59
Use simpler reciprocal math for 32-bit words
la-wu Oct 7, 2021
7d7d7fa
Use boxed arithmetic in logic flow
la-wu Oct 12, 2021
906d6db
Support ghc 9.2 prim word changes
la-wu Oct 12, 2021
046a42b
Fix 32-bit support
la-wu Oct 12, 2021
8fafed4
Skip conversion to Double before fixed Float formatting
la-wu Nov 7, 2021
dde95e2
Tweak doc wording and add examples
la-wu Nov 7, 2021
415ac6f
Rename FExponent to FScientific
la-wu Nov 15, 2021
0474332
Use an opaque FloatFormat type for compatibility
la-wu Nov 15, 2021
9be8170
Rename float fixed-format to standard-format and other naming tweaks
la-wu Nov 17, 2021
f67df50
Encourage inlining by removing partial application
la-wu Nov 17, 2021
a01cb00
Fix some haddock links and accidental monospacing
la-wu Nov 22, 2021
0cc5417
Add explanation about difference between implementation and reference…
la-wu Nov 22, 2021
12436a2
Clarify default precision
la-wu Nov 22, 2021
d8dac2a
Point to ryu paper for more details
la-wu Nov 22, 2021
b70918b
Fix non-exhaustive warning for ghc 9.2
la-wu Nov 28, 2021
2 changes: 2 additions & 0 deletions Data/ByteString/Builder.hs
@@ -252,6 +252,7 @@ module Data.ByteString.Builder
, stringUtf8

, module Data.ByteString.Builder.ASCII
, module Data.ByteString.Builder.RealFloat

) where

@@ -261,6 +262,7 @@ import Data.ByteString.Builder.Internal
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Lazy.Internal as L
import Data.ByteString.Builder.ASCII
import Data.ByteString.Builder.RealFloat

import Data.String (IsString(..))
import System.IO (Handle, IOMode(..), withBinaryFile)
27 changes: 1 addition & 26 deletions Data/ByteString/Builder/ASCII.hs
@@ -81,6 +81,7 @@ import Data.ByteString.Lazy as L
import Data.ByteString.Builder.Internal (Builder)
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Builder.Prim.Internal as P
import Data.ByteString.Builder.RealFloat (floatDec, doubleDec)

import Foreign
import Foreign.C.Types
@@ -89,16 +90,6 @@ import Foreign.C.Types
-- Decimal Encoding
------------------------------------------------------------------------------


-- | Encode a 'String' using 'P.char7'.
{-# INLINE string7 #-}
string7 :: String -> Builder
string7 = P.primMapListFixed P.char7

------------------------------------------------------------------------------
-- Decimal Encoding
------------------------------------------------------------------------------

-- Signed integers
------------------

@@ -163,22 +154,6 @@ wordDec :: Word -> Builder
wordDec = P.primBounded P.wordDec


-- Floating point numbers
-------------------------

-- TODO: Use Bryan O'Sullivan's double-conversion package to speed it up.

-- | /Currently slow./ Decimal encoding of an IEEE 'Float'.
{-# INLINE floatDec #-}
floatDec :: Float -> Builder
floatDec = string7 . show

-- | /Currently slow./ Decimal encoding of an IEEE 'Double'.
{-# INLINE doubleDec #-}
doubleDec :: Double -> Builder
doubleDec = string7 . show


------------------------------------------------------------------------------
-- Hexadecimal Encoding
------------------------------------------------------------------------------
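
The removed definitions delegated to `show`, so a replacement has to reproduce how `show` chooses between fixed and scientific notation. A minimal sketch of that rule using only `base` might look as follows; `usesFixed` and `agreesWithShow` are hypothetical helper names, and the rule is only meant for finite, non-zero values.

```haskell
import Numeric (floatToDigits)

-- | Hypothetical helper: True when 'show' renders a finite, non-zero
-- 'Double' in fixed notation. 'floatToDigits' returns digits ds and an
-- exponent e such that the value equals 0.ds * 10^e.
usesFixed :: Double -> Bool
usesFixed x =
  let (_, e) = floatToDigits 10 (abs x)
  in e >= 0 && e <= 7

-- | Compare the rule against 'show' itself: scientific output is the
-- only form that contains the character 'e'.
agreesWithShow :: Double -> Bool
agreesWithShow x = usesFixed x == notElem 'e' (show x)
```

This is the same range test as the `e' >= 0 && e' <= 7` condition in the `FGeneric` branch of `formatFloat` and `formatDouble` in `RealFloat.hs`.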
200 changes: 200 additions & 0 deletions Data/ByteString/Builder/RealFloat.hs
@@ -0,0 +1,200 @@
-- |
-- Module : Data.ByteString.Builder.RealFloat
-- Copyright : (c) Lawrence Wu 2021
-- License : BSD-style
-- Maintainer : [email protected]
--
-- Floating point formatting for Bytestring.Builder
--
-- This module primarily exposes `floatDec` and `doubleDec` which do the
-- equivalent of converting through `string7 . show`.
--
-- It also exposes `formatFloat` and `formatDouble` with a similar API as
-- `GHC.Float.formatRealFloat`.
--
-- NB: this implementation matches `show`'s output (specifically with respect
-- to default rounding and length). In particular, there are boundary cases
-- where the closest and 'shortest' string representations are not used.
-- Mentions of 'shortest' in the docs below are with this caveat.
Member

This paragraph is a bit hard to understand I think. An example would be very helpful. Please also clarify what "this implementation" refers to. Maybe create a named chunk and refer to it from the relevant places.

Also note that single quotes (') are used for marking identifiers in Haddock. Escaping the quotes might work?! E.g. \'shortest\'.

Contributor Author

I added an example and a hand-wavy explanation of the rounding and boundaries, and pointed to the paper for details. TBH, at this point I am far enough from writing this code that I would need to re-digest the exact semantics.

WRT quotes, thanks for pointing that out. I'm used to normal MD.
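
The claim discussed in this thread, that the output matches `show`, can be spot-checked on sample values. The sketch below assumes a `bytestring` version that exports `floatDec` and `doubleDec` from `Data.ByteString.Builder`; `render`, `matchesShowDouble` and `matchesShowFloat` are hypothetical names, and passing on a handful of samples says nothing about the boundary cases mentioned in the module header.

```haskell
import Data.ByteString.Builder (Builder, doubleDec, floatDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as L

-- | Render a 'Builder' to a 'String' so it can be compared with 'show'.
render :: Builder -> String
render = L.unpack . toLazyByteString

-- | Does 'doubleDec' agree with 'show' on this value?
matchesShowDouble :: Double -> Bool
matchesShowDouble x = render (doubleDec x) == show x

-- | Does 'floatDec' agree with 'show' on this value?
matchesShowFloat :: Float -> Bool
matchesShowFloat x = render (floatDec x) == show x
```

A QuickCheck property over arbitrary values, as the commit list suggests the test suite uses, would give much broader coverage than fixed samples.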


module Data.ByteString.Builder.RealFloat
  ( FloatFormat(..)
  , floatDec
  , doubleDec
  , formatFloat
  , formatDouble
  ) where

import Data.ByteString.Builder.Internal (Builder)
import qualified Data.ByteString.Builder.RealFloat.Internal as R
import qualified Data.ByteString.Builder.RealFloat.F2S as RF
import qualified Data.ByteString.Builder.RealFloat.D2S as RD
import qualified Data.ByteString.Builder.Prim as BP
import GHC.Float (roundTo)
import GHC.Word (Word64)
import GHC.Show (intToDigit)

-- | Returns a rendered Float. Matches `show` in displaying in fixed or
-- scientific notation
{-# INLINABLE floatDec #-}
floatDec :: Float -> Builder
floatDec = formatFloat FGeneric Nothing

-- | Returns a rendered Double. Matches `show` in displaying in fixed or
-- scientific notation
{-# INLINABLE doubleDec #-}
doubleDec :: Double -> Builder
doubleDec = formatDouble FGeneric Nothing

-- | ByteString float-to-string format
data FloatFormat
  = FExponent    -- ^ scientific notation
  | FFixed       -- ^ fixed precision with `Maybe Int` digits after the decimal
  | FGeneric     -- ^ dispatches to fixed or exponent based on the exponent
  deriving Show

-- TODO: support precision argument for FGeneric and FExponent
-- | Returns a rendered Float. Matches the API of `formatRealFloat` but does
-- not currently handle the precision argument in scientific notation.
--
-- The precision argument is used to truncate (or extend with 0s) the
-- 'shortest' rendered Float. A precision of 'Nothing' does no such
-- modifications and will return as many decimal places as the representation
-- demands.
--
-- e.g
--
-- >>> formatFloat FFixed (Just 1) 1.2345e-2
-- "0.0"
-- >>> formatFloat FFixed (Just 5) 1.2345e-2
-- "0.01234"
-- >>> formatFloat FFixed (Just 10) 1.2345e-2
-- "0.0123450000"
-- >>> formatFloat FFixed Nothing 1.2345e-2
-- "0.012345"
-- >>> formatFloat FExponent Nothing 12.345
-- "1.2345e1"
-- >>> formatFloat FGeneric Nothing 12.345
-- "12.345"
{-# INLINABLE formatFloat #-}
formatFloat :: FloatFormat-> Maybe Int -> Float -> Builder
formatFloat fmt prec f =
  case fmt of
    FGeneric ->
      case specialStr f of
        Just b -> b
        Nothing ->
          if e' >= 0 && e' <= 7
             then sign f `mappend` showFixed (R.word32ToWord64 m) e' prec
             else BP.primBounded (R.toCharsScientific (f < 0) m e) ()
    FExponent -> RF.f2s f
    FFixed ->
      case specialStr f of
        Just b -> b
        Nothing -> sign f `mappend` showFixed (R.word32ToWord64 m) e' prec
  where (RF.FloatingDecimal m e) = RF.f2Intermediate f
        e' = R.int32ToInt e + R.decimalLength9 m

-- TODO: support precision argument for FGeneric and FExponent
-- | Returns a rendered Double. Matches the API of `formatRealFloat` but does
-- not currently handle the precision argument in scientific notation
--
-- The precision argument is used to truncate (or extend with 0s) the
-- 'shortest' rendered Double. A precision of 'Nothing' does no such
-- modifications and will return as many decimal places as the representation
-- demands.
--
-- e.g
--
-- >>> formatDouble FFixed (Just 1) 1.2345e-2
-- "0.0"
-- >>> formatDouble FFixed (Just 5) 1.2345e-2
-- "0.01234"
-- >>> formatDouble FFixed (Just 10) 1.2345e-2
-- "0.0123450000"
-- >>> formatDouble FFixed Nothing 1.2345e-2
-- "0.012345"
-- >>> formatDouble FExponent Nothing 12.345
-- "1.2345e1"
-- >>> formatDouble FGeneric Nothing 12.345
-- "12.345"
{-# INLINABLE formatDouble #-}
formatDouble :: FloatFormat-> Maybe Int -> Double -> Builder
formatDouble fmt prec f =
  case fmt of
    FGeneric ->
      case specialStr f of
        Just b -> b
        Nothing ->
          if e' >= 0 && e' <= 7
             then sign f `mappend` showFixed m e' prec
             else BP.primBounded (R.toCharsScientific (f < 0) m e) ()
    FExponent -> RD.d2s f
    FFixed ->
      case specialStr f of
        Just b -> b
        Nothing -> sign f `mappend` showFixed m e' prec
  where (RD.FloatingDecimal m e) = RD.d2Intermediate f
        e' = R.int32ToInt e + R.decimalLength17 m

-- | Char7 encode a 'Char'.
{-# INLINE char7 #-}
char7 :: Char -> Builder
char7 = BP.primFixed BP.char7

-- | Char7 encode a 'String'.
{-# INLINE string7 #-}
string7 :: String -> Builder
string7 = BP.primMapListFixed BP.char7

-- | Encodes a `-` if input is negative
sign :: RealFloat a => a -> Builder
sign f = if f < 0 then char7 '-' else mempty

-- | Special rendering for NaN, Infinity, and 0. See
-- RealFloat.Internal.NonNumbersAndZero
specialStr :: RealFloat a => a -> Maybe Builder
specialStr f
  | isNaN f          = Just $ string7 "NaN"
  | isInfinite f     = Just $ sign f `mappend` string7 "Infinity"
  | isNegativeZero f = Just $ string7 "-0.0"
  | f == 0           = Just $ string7 "0.0"
  | otherwise        = Nothing

-- | Returns a list of decimal digits in a Word64
digits :: Word64 -> [Int]
digits w = go [] w
  where go ds 0 = ds
        go ds c = let (q, r) = R.dquotRem10 c
                  in go ((R.word64ToInt r) : ds) q

-- | Show a floating point value in fixed point. Based on GHC.Float.showFloat
showFixed :: Word64 -> Int -> Maybe Int -> Builder
showFixed m e prec =
  case prec of
    Nothing
      | e <= 0 -> char7 '0'
               `mappend` char7 '.'
               `mappend` string7 (replicate (-e) '0')
               `mappend` mconcat (digitsToBuilder ds)
      | otherwise ->
          let f 0 s rs = mk0 (reverse s) `mappend` char7 '.' `mappend` mk0 rs
              f n s [] = f (n-1) (char7 '0':s) []
              f n s (r:rs) = f (n-1) (r:s) rs
           in f e [] (digitsToBuilder ds)
    Just p
      | e >= 0 ->
          let (ei, is') = roundTo 10 (p' + e) ds
              (ls, rs) = splitAt (e + ei) (digitsToBuilder is')
           in mk0 ls `mappend` mkDot rs
      | otherwise ->
          let (ei, is') = roundTo 10 p' (replicate (-e) 0 ++ ds)
              (b:bs) = digitsToBuilder (if ei > 0 then is' else 0:is')
Contributor
@Bodigrim, Nov 22, 2021


This raises a "Pattern match(es) are non-exhaustive" warning with GHC 9.2.

Contributor Author


Hm, is there a way to convince GHC that the list is non-empty? Looking at the implementation of roundTo, the non-throwing return values are (0, _) and (1, 1:_). Before the call to digitsToBuilder, we prepend 0 to the former case's snd value, so AFAICT this is 'correct'.

Contributor Author


We could also just mimic the case above or add some redundant matches, e.g.:

diff --git a/Data/ByteString/Builder/RealFloat.hs b/Data/ByteString/Builder/RealFloat.hs
index cd0d39d..0dadf45 100644
--- a/Data/ByteString/Builder/RealFloat.hs
+++ b/Data/ByteString/Builder/RealFloat.hs
@@ -258,8 +258,8 @@ showStandard m e prec =
            in mk0 ls `mappend` mkDot rs
       | otherwise ->
           let (ei, is') = roundTo 10 p' (replicate (-e) 0 ++ ds)
-              (b:bs) = digitsToBuilder (if ei > 0 then is' else 0:is')
-           in b `mappend` mkDot bs
+              (ls, rs) = splitAt 1 $ digitsToBuilder (if ei > 0 then is' else 0:is')
+           in mk0 ls `mappend` mkDot rs
           where p' = max p 0
   where
     mk0 ls = case ls of [] -> char7 '0'; _ -> mconcat ls

Contributor


I'm not saying that your code is wrong, I'm saying that we need to get rid of a warning. Boot packages cannot have warnings. Throwing an error for an empty list would be fine, for instance.

Member


I agree that this warning needs to be silenced. Maybe this would be a good fit for Data.List.NonEmpty?!

Contributor Author


I added the snippet above. A lot of the code in GHC.Float makes the same assumption, though, so do they just turn the warning off?

Member


Yes. That module uses the pragma

{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
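
Two of the pragma-free options raised in this thread can be compared side by side. The sketch below is a simplified illustration that works on plain `String`s instead of `Builder`s; `renderSplit` and `renderNonEmpty` are hypothetical names and not part of the PR.

```haskell
import Data.Char (intToDigit)
import Data.List.NonEmpty (NonEmpty(..))

-- | Prefix a non-empty fractional part with a decimal point.
mkDot :: String -> String
mkDot rs = if null rs then "" else '.' : rs

-- | Total variant via 'splitAt', in the spirit of the snippet above:
-- an empty leading part is rendered as "0", so no partial pattern is needed.
renderSplit :: [Int] -> String
renderSplit ds =
  let (ls, rs) = splitAt 1 (map intToDigit ds)
  in (if null ls then "0" else ls) ++ mkDot rs

-- | Variant that moves the non-emptiness guarantee into the type, so the
-- pattern match on the head is exhaustive by construction.
renderNonEmpty :: NonEmpty Int -> String
renderNonEmpty (d :| rest) = intToDigit d : mkDot (map intToDigit rest)
```

The `splitAt` form needs no change to how the digit list is produced, whereas the `NonEmpty` form would require the caller to construct a `NonEmpty` value from the result of `roundTo`.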

           in b `mappend` mkDot bs
          where p' = max p 0
  where
    mk0 ls = case ls of [] -> char7 '0'; _ -> mconcat ls
    mkDot rs = if null rs then mempty else char7 '.' `mappend` mconcat rs
    ds = digits m
    digitsToBuilder = fmap (char7 . intToDigit)
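
To make the precision handling easier to follow, here is a simplified, `base`-only sketch of the `Just p` branch for a non-negative exponent. It produces a `String` instead of a `Builder`, uses `quotRem` in place of `R.dquotRem10`, and `fixedWithPrecision` is a hypothetical name; as in `showFixed`, the value is read as `0.d1d2... * 10^e`, so `e` counts the digits before the decimal point.

```haskell
import Data.Char (intToDigit)
import Data.Word (Word64)
import GHC.Float (roundTo)

-- | Decimal digits of a word, most significant first (empty for 0).
digits :: Word64 -> [Int]
digits = go []
  where
    go :: [Int] -> Word64 -> [Int]
    go ds 0 = ds
    go ds c = let (q, r) = c `quotRem` 10
              in go (fromIntegral r : ds) q

-- | Render the digits of m with e integer digits (e >= 0) and exactly
-- p digits after the decimal point, rounding with 'roundTo'.
fixedWithPrecision :: Word64 -> Int -> Int -> String
fixedWithPrecision m e p =
  let p' = max p 0
      -- ei is 1 when rounding carried into a new leading digit
      (ei, is') = roundTo 10 (p' + e) (digits m)
      (ls, rs) = splitAt (e + ei) (map intToDigit is')
  in mk0 ls ++ mkDot rs
  where
    mk0 xs = if null xs then "0" else xs
    mkDot xs = if null xs then "" else '.' : xs
```

For example, `fixedWithPrecision 999 1 1` corresponds to rendering 9.99 with one decimal place: rounding carries into a new leading digit, which is why the split position is `e + ei` and not `e`.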
