-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multiple top level JSON arrays for UnwrapArray #438
Support multiple top level JSON arrays for UnwrapArray #438
Conversation
b862137
to
2f10867
Compare
Yeah this is wildly tricky. I'm not sure whether or not this is the right approach to it. Definitely works for this case, but in general you're talking about broadly unwinding and incrementally flattening nested data structures, and there are a lot more cases than just this one. It's also not clear that it's always the right thing to do to just recursively and dynamically flatten in this fashion. The right™ solution is probably to have some more controllable output layers and fewer assumptions in those layers around finity of the structures in question. This is basically the Tectonic approach to the problem. This would require some significant (and breaking) changes to the facade though, and it's also not entirely clear that people necessarily want the API that it would generate. So that brings us back to this PR. Honestly I'm not sure what the right answer is. Like I said, this works but it hard-codes a very specific semantic and it's a bit philosophically odd. It's hard for me to say for certain whether it would be better to focus more on a "Jawn 2.0" which solves some of these issues, or to simply go with more incremental fixes like this PR. |
@djspiewak Thanks for the reply! Would you be happy if I adjusted the PR so that this behaviour is enabled via an explicit parameter that by default is false? In regards to your points about Jawn 2.0, I definitely agree and there is an argument that we can also generalise the solution to use JSON path selectors like jsurfer does so you can even selected a nested JSON array without having to rely on it being top level. For now though if appropriate I would push for an incremental solution. Not having this is actually blocking/causing other issues for my use case which I explained earlier. |
2f10867
to
fc8e8c6
Compare
I just rebased the PR, the only change was adding a type annotation since CI complained about it. |
@djspiewak Is there any update on this? |
Pinging |
The readme https://github.com/typelevel/jawn#parsing currently says:
so I guess the new parameter should at least be documented. Or better yet, would it make sense to add a new mode like |
Yes agreed, I will update the PR to try and add the extra parameter and document it appropriately, if that's not possible I will make a new mode but to me the extra parameter sounds better. |
fc8e8c6
to
2beb73f
Compare
@eed3si9n In the end your suggestion of Let me know if anything else is needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending CI
So mima has failed, i.e.
Is this an issue, I think it may be due to |
Given that the constructor is protected it should be ok to ignore this, but I guess you can provide an overload constructor if you want to be more safe? |
I am trying this now but due to the fact that def apply[J](var state: Int, ...): AsyncParser[J] or def this[J](var state: Int, ...): AsyncParser[J] Is not valid Scala and I can't think of another way to do this (honestly this should be Should I add a mima |
72af0e0
to
16d31af
Compare
So in addition to rebasing off the latest |
fe9f6c3
to
766806f
Compare
So the latest bincompat issue from mima came from the Its a bit silly and there is code repetition but at least the mima check should pass now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused why a secondary constructor with the old signature cannot be added. There is no need to use var
in the constructor arguments; they can simply be unqualified arguments (like a method signature) that are passed to the primary constructor.
766806f
to
cac2888
Compare
Indeed you are correct, I didn't know about Scala's unqualified arguments. I added a second |
sealed abstract class Mode(val start: Int, val value: Int) | ||
case object UnwrapArray extends Mode(-5, 1) | ||
case object UnwrapMultiArray extends Mode(-5, 1) | ||
case object ValueStream extends Mode(-1, 0) | ||
case object SingleValue extends Mode(-1, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this change is not backwards compatible, since it adds a new member to a sealed hierarchy. This will break any previous total pattern matches since they will not be accounting for this new case.
See:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what is your recommendation? When reading lightbend-labs/mima#200 (comment) there is an argument that this is out of scope for mima because its a compilation error, not a linking error which is historically what mima was responsible for.
I also agree with the general sentiment there which is that warning in this case seems sensible however treating it the same as the other mima errors is out of scope of mima.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it is not a linking error, but it can still crash at runtime. If I wrote a pattern match that checks for UnwrapArray
, ValueStream
, and SingleValue
, and assumes those are the only 3 possible cases (as promised by sealed
) then that code is in big trouble if it encounters an instance of UnwrapMultiArray
.
My recommendation is to find a way to make this change backwards-compatibly :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So afaik the only way to make this binary compatible while still making sure multiValue
is explicit (i.e. not changing current Jawn behavior) is the initial implementation which was actually changed because design wise its worse, i.e.
sealed abstract class Mode(val start: Int, val value: Int)
case object UnwrapArray extends Mode(-5, 1)
case object ValueStream extends Mode(-1, 0)
case object SingleValue extends Mode(-1, -1)
def apply[J](mode: Mode = SingleValue): AsyncParser[J] =
new AsyncParser(
state = mode.start,
curr = 0,
context = null,
stack = Nil,
data = new Array[Byte](131072),
len = 0,
allocated = 131072,
offset = 0,
done = false,
streamMode = mode.value,
multiValue = false
)
def apply[J](mode: Mode, multiValue: Boolean): AsyncParser[J] =
new AsyncParser(
state = mode.start,
curr = 0,
context = null,
stack = Nil,
data = new Array[Byte](131072),
len = 0,
allocated = 131072,
offset = 0,
done = false,
streamMode = mode.value,
multiValue = multiValue
)
}
The reason why this is worse is because multiValue
has no meaning for other Mode
's that is not ValueStream
so its misleading. Furthermore due to how overloading works you cannot have a duplicate default parameter for mode: Mode
(which is why the second new apply
is missing the = SingleValue
default param).
There was also another idea which is the making ValueStream
a case class
that takes a multiValue
parameter but afaik even with default parameters that is not binary compatible at all (in the traditional/"correct" sense).
If you have any other ideas/suggestions that would be greatly appreciated because otherwise I think I am kind of stuck here? I guess I can add something like AsyncParser.unwrapMultiArray
function although it still kind of looks weird.
So I just added another commit which implements the secondary apply for the |
9c09847
to
ca3c0cf
Compare
@armanbilge Is there anything else needed for this PR? |
I'm not really a maintainer here :) I believe as-written it is backwards compatible.
I'm glad it solves your problem, but do you mind if I ask: do you believe that this is a good change for this library overall? I thought Daniel's review in #438 (comment) raised some important points.
Since there are many possible behaviors here, we must be careful not to box ourselves with compatibility constraints. |
Ah okay, I thought you are a maintainer.
Oh definitely I agree with Daniel however as he also said I don't think it's possible to solve these problems in a principled way without either breaking binary compatibility or introducing an API which Daniel contemplated would be better solved for a Jawn 2.0 (and I also suggested RFCs that have standards on how to deal with nesting of JSON structures). My intention with this PR is to unblock the specific problem I have in a way that is still reasonable, I don't have any problems with a new API/better design but that dramatically increases the scope (and judging from Daniel's comment earlier he also seems to be on the same page and is open to a quick fix as long as its broadly sensible). |
@rossabaker @eed3si9n Do you mind reviewing/re-reviewing the PR since I believe you are the currently active maintainers? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I am only nominally a maintainer, and haven't been active, but overall I think this change is fine. |
8550e0d
to
a8f809f
Compare
a8f809f
to
d2493b4
Compare
@eed3si9n Thanks for the review, just rebased and pushed the PR with your changes @rossabaker Would it be possible to look/review this PR. You had a brief look at it previously when doing the new release and the issues back then have been resolved |
Thanks for merging this through, really appreciate it! |
Currently Jawn's
AsyncParser
inUnwrapArray
(which is designed to stream individual JSON elements from a top level JSON array) only supports having a single top level JSON array in the byte stream.While although on the surface this may appear to be sensible it ends up causing issues when you use Jawn's
AsyncParser
in actual streaming scenarios. In a lot of streaming cases there isn't really an "end", you are just fed with a constant stream of bytes and in some cases (depending on the streaming library you use) you may end up combining multiple JSON arrays into a single stream (i.e. you have a single JSON array per file but you want to stream one file after another, streaming libraries will typically combine this into a single massive byte stream).As a comparison, Akka's Alpakka JSON streaming which uses jsurfer underneath doesn't exhibit this limitation (see akka/alpakka#2830 and https://discuss.lightbend.com/t/working-with-eof-for-a-source-flow/9481).
Originally I tried to work around this in my own library https://github.com/mdedetrich/akka-streams-json however since we are dealing with read only
ByteBuffer
's its not really possible/sensible to peek into the byte array ahead of time to see if you are going to hit another[
(indicating multiple top level JSON arrays).So I ended up creating this PR which covers this exception case only if you happen to be using
AsyncParser.UnwrapArray
. Currently the PR hardcodes this exception however there is also an argument that you can addmultiValue
as a parameter toAsyncParser.apply
if requested (albeit its a bit complicated if you want to use default parameters because afaik default parameters can't depend on another value in the parameter list and ideally such a default parameter would bemode == UnwrapArray
however one can argue that you can just be explicit with a default value offalse
).Property based tests have also been added to
SyntaxCheck
to verify this behavior.