Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add unknown field support #775

Open
pavelstudeny opened this issue Apr 24, 2017 · 20 comments · May be fixed by #779
Open

add unknown field support #775

pavelstudeny opened this issue Apr 24, 2017 · 20 comments · May be fixed by #779

Comments

@pavelstudeny
Copy link

protobuf.js version: all, up to the current 6.8.0

proto2 preserve unknown fields for future serialization. This is also planned for proto3 in the near future: https://docs.google.com/document/d/1KMRX-G91Aa-Y2FkEaHeeviLRRNblgIahbsk4wA14gRk/edit#heading=h.w8dtggryroj4

I'm working on a merge request that will add unknown fields when decoding as

_unknownFields: { <id|wireType>: }

This should be doable by add read_type_bytes() to Reader and using it instead of skipType. The _unknownField can be later serialized again or discarded by a new method discardUnknownFields().

Tests:

var proto = "message Simple_v1 {\
  optional string name  = 1;\
  optional string value = 3;\
}\
message Simple_v2 {\
  optional string name  = 1;\
  optional int32  flags = 2;\
  optional string value = 3;\
}";

var msg = { inner: [{}, {}, {}] };

tape.test("unknown fields", function (test) {
    var root = protobuf.parse(proto).root,
        Simple_v1 = root.lookup("Simple_v1");
        Simple_v2 = root.lookup("Simple_v2");

    var s2 = Simple_v2.create({ name: "v2", flags: 2, value: "dummy" });
    var s1 = Simple_v1.decode(Simple_v2.encode(s2).finish());

    var restored = Simple_v2.decode(Simple_v1.encode(s1).finish());

    test.equal(s2.name, restored.name, "assert: even known fields are missing");

    test.equal(2, restored.flags, "are preserved by default");
    test.end();
});

tape.test("discarded unknown fields", function (test) {
    var root = protobuf.parse(proto).root,
        Simple_v1 = root.lookup("Simple_v1");
    Simple_v2 = root.lookup("Simple_v2");

    var s2 = Simple_v2.create({ name: "v2", flags: 2, value: "dummy" });
    var s1 = Simple_v1.decode(Simple_v2.encode(s2).finish());

    try {
        s1.discardUnknownFields();
    }
    catch (ex) {
        test.end("discardUnknownFields() exception: " + ex);
        return;
    }

    var restored = Simple_v2.decode(Simple_v1.encode(s1).finish());

    test.equal(undefined, restored.flags, "are removed from the message");
    test.end();
});

Does it sound OK?

@dcodeIO
Copy link
Member

dcodeIO commented Apr 24, 2017

I have a feeling that this will come at quite a performance hit. For example, encoders serialize fields in ascending field id order, as of the spec. Making this work with unknown fields might be quite a challenge without hurting performance significantly.

We'd probably be better off ignoring order of unknown fields completely, simply appending uninterpreted buffer slices to the final buffer again. skipType could, for example, in addition to skipping over the data, just return the respective buffer slice. These slices would then be stored in an array without any id/wiretype association required, for the encoder to append.

@pavelstudeny
Copy link
Author

pavelstudeny commented Apr 25, 2017

Yes, implementations for other languages also place unknown fields at the end, so protobuf.js could follow the habit. This should also remove any performance impact, I hope.

May be there should anyway be a global option to ignore the unknown fields, in addition to an explicit discard?

Does it sound OK that the discardUnknownFields method would be on the decoded data instance (s1/s2 above) or should it rather stay at the type descriptor (Simple_v1/Simple_v2)?

@dcodeIO
Copy link
Member

dcodeIO commented Apr 25, 2017

May be there should anyway be a global option to ignore the unknown fields, in addition to an explicit discard?

Yeah, why not.

Does it sound OK that the discardUnknownFields method would be on the decoded data instance (s1/s2 above) or should it rather stay at the type descriptor (Simple_v1/Simple_v2)?

Instances don't have any methods on them by default (except .toJSON, which must be on the instance). But this could also be just a global helper function, as it basically just does a delete someMessage._someProperty.

@pavelstudeny pavelstudeny linked a pull request Apr 26, 2017 that will close this issue
@pavelstudeny
Copy link
Author

How about #779?

I hope the CODECLIMATE_REPO_TOKEN is a problem on the test machine.

I had to remove the first variant of writeBytes in writer.js. It looked like a mere optimization, that was however never called and was failing on nodes 4-. It was also ignored by test coverage - that's why the existing tests couldn't find the problem.

@pavelstudeny
Copy link
Author

Just wondering about a general opinion, whether the solution looks acceptable as it is or needs any changes?

@dcodeIO
Copy link
Member

dcodeIO commented May 10, 2017

Sorry, haven't yet got around to check. Being a bit lazy here because it's not in the official implementation yet, but I appreciate your efforts!

@pavelstudeny
Copy link
Author

Fair enough for proto3, if the upcoming specification it's going to define API calls for this feature. This, however, also fulfills the requirement for proto2, that is also handled by protobuf.js.

@niicojs
Copy link

niicojs commented Aug 11, 2017

I would need this also. Is your PR usable already?

@pavelstudeny
Copy link
Author

It's complete and usable, although I'm not using it myself, as dcodeIO may decide to modify it or come up with a different solution.

@robin-anil
Copy link
Contributor

I would love to +1 this. Currently, the parse fails with unknown fields in the byte stream. this is a hallmark feature of protobufs in other languages that allows developers to push different applications without breaking.

@chakmingli
Copy link

It's been a while. Will this change be accepted?

@pavelstudeny
Copy link
Author

Protobuf 3 preserves unknown fields by default since release 3.5.0 from 2017-11-13:

Unknown fields are now preserved in proto3 for most of the language implementations for proto3 by default.

This should be a reason to accept pull request #779 or point out what shoul be improved.

@jcgertig
Copy link

BUMP this is very needed

@akshah123
Copy link

This would be very useful feature. Are there any plans to have this implemented or a decision to not implement it?

@akshah123
Copy link

@dcodeIO thoughts on what the plan is regarding this feature request?

@alexander-fenster
Copy link
Contributor

@akshah123 Just FYI, we at Google Cloud client libraries will likely need this implemented sooner rather than later, so we'll likely spend some time in December looking into this.

@jcgertig
Copy link

jcgertig commented Nov 16, 2020 via email

@Asafb26
Copy link

Asafb26 commented Oct 17, 2021

Anything new here? Is this PR will ever get merged?

2 similar comments
@yjqiang
Copy link

yjqiang commented Jan 26, 2022

Anything new here? Is this PR will ever get merged?

@bogdan-khitsenko
Copy link

Anything new here? Is this PR will ever get merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.