Fix json encoding of NaN/infinity #623

Merged
WardBrian merged 1 commit into develop from fix/json-encoding
Sep 23, 2022

Conversation

WardBrian
Member

Submission Checklist

  • Run unit tests
  • Declare copyright holder and open-source license: see below

Summary

We implemented our JSON encoding based on the spec given in the CmdStan User's Guide, which was incorrect. In particular, the value "inf" does not parse. The default values Python serializes these to (NaN, Infinity, and -Infinity) do parse, so we should just leave them as they are.
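For reference, a minimal sketch of what Python's built-in `json` module does with non-finite floats. It only illustrates the standard library defaults mentioned above; it is not the cmdstanpy code, and the dictionary keys are made up for the example.

```python
import json

# Hypothetical data containing the three non-finite float values.
data = {"x": float("nan"), "y": float("inf"), "z": float("-inf")}

# With the default allow_nan=True, the standard library writes the
# literals NaN, Infinity and -Infinity rather than "nan" or "inf".
encoded = json.dumps(data)
print(encoded)  # {"x": NaN, "y": Infinity, "z": -Infinity}
```

Passing `allow_nan=False` instead makes `json.dumps` raise a `ValueError` on these values, which is the strictly spec-compliant behavior.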

I also removed ujson as part of this fix. ujson's handling of infinity differed from Python's, which meant round-tripping values was not possible. This was first noted in April of 2013 and was not fixed until ujson 5.5.0, released last week. So the options were either to pin >=5.5.0 or to remove it.
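A small sketch of the round-trip property in question, checked here against the standard `json` module only. The helper name `round_trips` is hypothetical and not part of cmdstanpy; the same check could be pointed at another library's `dumps`/`loads` pair to compare behavior.

```python
import json
import math


def round_trips(value: float) -> bool:
    """Return True if value survives json.dumps followed by json.loads."""
    decoded = json.loads(json.dumps(value))
    if math.isnan(value):
        # NaN never compares equal to itself, so test with isnan instead.
        return math.isnan(decoded)
    return decoded == value


# Check the edge cases discussed above plus one ordinary value.
results = {name: round_trips(float(name)) for name in ("inf", "-inf", "nan", "1.5")}
print(results)
```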

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

Simons Foundation

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@ahartikainen
Contributor

Do we read or write any large json files in the package?

@WardBrian
Member Author

We will write large files if the user has large data they're creating in Python.

@ahartikainen
Contributor

Then maybe we should at least have some library that can write those in reasonable time?

There are also other possibilities (not ujson)

@WardBrian
Member Author

I would be interested in doing some benchmarking, but my guess is that the preprocessing we need to do (such as turning all numpy/pandas objects into lists) is probably nearly as expensive as the serialization.

It's probably worth looking into options other than ujson. I'd like at least:

  1. Minimal/no dependencies
  2. Matches python on these edge cases
  3. Supports custom encoding like the built-in json does. Previously, if the user's data contained a complex value anywhere, we called ujson first and, if that failed, fell back to the standard json module with our custom complex encoding.
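To illustrate the third requirement, here is a minimal sketch of custom encoding through the built-in `json` module's `default` hook. The function name `encode_complex` and the choice to write a complex number as a `[real, imag]` pair are assumptions for the example, not necessarily what cmdstanpy does.

```python
import json


def encode_complex(obj):
    """Fallback called by json.dumps for objects it cannot serialize."""
    if isinstance(obj, complex):
        # Assumed representation: a two-element [real, imag] list.
        return [obj.real, obj.imag]
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical data mixing a complex value with a non-finite float.
encoded = json.dumps({"z": 1 + 2j, "y": float("inf")}, default=encode_complex)
print(encoded)  # {"z": [1.0, 2.0], "y": Infinity}
```

Because the hook is only invoked for types the encoder does not already handle, the non-finite floats still go through Python's default path. A replacement library would need an equivalent hook to avoid the two-pass fallback described above.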

But I propose that the search happen separately.

@codecov-commenter

codecov-commenter commented Sep 23, 2022

Codecov Report

Merging #623 (92f5ab3) into develop (c2bab85) will decrease coverage by 0.10%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #623      +/-   ##
===========================================
- Coverage    80.29%   80.18%   -0.11%     
===========================================
  Files           69       69              
  Lines        10386    10326      -60     
===========================================
- Hits          8339     8280      -59     
+ Misses        2047     2046       -1     
| Impacted Files | Coverage Δ |
|---|---|
| a/cmdstanpy/cmdstanpy/cmdstanpy/model.py | 89.46% <0.00%> (-1.00%) ⬇️ |
| cmdstanpy/cmdstanpy/model.py | 88.66% <0.00%> (ø) |
| cmdstanpy/cmdstanpy/utils/stancsv.py | 96.91% <0.00%> (ø) |
| a/cmdstanpy/cmdstanpy/cmdstanpy/utils/stancsv.py | 96.91% <0.00%> (ø) |
| ...runner/work/cmdstanpy/cmdstanpy/cmdstanpy/model.py | 88.66% <0.00%> (ø) |
| ...ork/cmdstanpy/cmdstanpy/cmdstanpy/utils/stancsv.py | 96.91% <0.00%> (ø) |
| cmdstanpy/cmdstanpy/utils/json.py | 96.29% <0.00%> (+2.67%) ⬆️ |
| a/cmdstanpy/cmdstanpy/cmdstanpy/utils/json.py | 96.29% <0.00%> (+2.67%) ⬆️ |
| ...r/work/cmdstanpy/cmdstanpy/cmdstanpy/utils/json.py | 96.29% <0.00%> (+2.67%) ⬆️ |


@WardBrian WardBrian merged commit fb3dfe2 into develop Sep 23, 2022
@WardBrian WardBrian deleted the fix/json-encoding branch September 23, 2022 16:25
@WardBrian WardBrian mentioned this pull request Sep 23, 2022