Added libtest_json_output #2234
Conversation
Would there be a way to determine which file a test is coming from? I'd like to display errors inline, but just having the test name doesn't seem to be enough information to do that. |
Currently, this is something the test infrastructure does not provide, and I'm not sure if it can be easily added. |
After our discussion at alternate-libtest-output-format/6121 I looked at the implementation to see how difficult it would be to add filename/line number information. I don't have any real experience with the Rust internals, so it's difficult for me to say for sure, but it looks like it shouldn't be too difficult to extend TestDesc to include that information. I'm not sure if extending this is outside of the scope for this RFC. However, my concern is that the "name" parameter is not unique. If we add file/line later, would every JSON message duplicate that information? As an aside, adding line number information would also help with implementing rust-lang/rust#45765. |
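To make that concrete, here is a rough sketch (illustrative only; `source_file` and `start_line` are made-up names, not existing libtest fields) of how a `TestDesc`-like structure could carry source location, filled in at the `#[test]` definition site via `file!()`/`line!()`:

```rust
// Illustrative sketch of a TestDesc-like struct extended with source location.
// The added fields are hypothetical; they are not part of libtest or this RFC.
pub struct TestDesc {
    pub name: String, // e.g. "parser::tests::empty_input"
    pub ignore: bool,
    pub should_panic: bool,
    // Hypothetical additions, filled in at the #[test] definition site:
    pub source_file: &'static str, // via file!()
    pub start_line: u32,           // via line!()
}

fn main() {
    let desc = TestDesc {
        name: "tests::it_works".to_string(),
        ignore: false,
        should_panic: false,
        source_file: file!(),
        start_line: line!(),
    };
    // A JSON reporter could then emit the location alongside the test name.
    println!("{} defined at {}:{}", desc.name, desc.source_file, desc.start_line);
}
```

Whether every JSON message would then repeat the location, or only the suite header or the test-start event would carry it, is exactly the duplication question raised above.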
I think it's a great idea, which I would love to take care of, but this is a bit outside the scope of this RFC. I'm not an expert on APIs, but I think adding another field to the JSON output later should not be a breaking change. |
@Gilnaa, what is the proposed format based on? Is it the same as some already established test framework? If yes, which one? If not, why not? |
In this RFC, don't forget to account for the needs of crates providing property based testing such as proptest and |
@jan-hudec I admit I did not do a lot of research, but I can't say I know of a testing framework using JSON. Mostly I found that gtest can output XML, and I know of TAP. The proposed format is mostly based on suggestions from other RFCs. @Centril If I understand correctly, it is currently impossible to do what you suggest, since I have no indication of when an assert succeeds. |
I think this needs a formal and full specification of the used JSON schema.
Using this flag, the format can be selected from one of the following:
- `pretty`: The default setting; print test results in a detailed manner.
- `terse`: Equivalent to `-q`; display one character per test instead of one line.
Is this a new addition as well?
Not really, it's just the old -q that libtest already has.
It was present in the PR as well.
Each JSON object starts with the "type" property, which is one of the following:
- `suite`: Events regarding the whole testing suite; consists of a header specifying the amount of tests to be performed, |
There can be multiple test suites, right? It seems like currently we run unit tests, integration tests, and doc tests separately.
Not as far as I'm aware.
In the context of libtest there's nothing really called a "testing suite"; there are just slices of function pointers.
Maybe the name is misleading.
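To illustrate what "just slices of function pointers" means, here is a simplified sketch; these are not the real libtest types (the real descriptors have more fields and the runner does much more), only the general shape the generated test `main` hands to the runner:

```rust
// Simplified, illustrative types: the harness's view of a test run is a flat
// list of (descriptor, function pointer) pairs; there is no nested suite tree.
struct TestDesc {
    name: &'static str,
    ignore: bool,
}

struct TestDescAndFn {
    desc: TestDesc,
    testfn: fn(), // a plain function pointer in the simplest case
}

fn passing_test() {}

fn arithmetic_test() {
    assert_eq!(2 + 2, 4);
}

fn main() {
    let tests = [
        TestDescAndFn { desc: TestDesc { name: "tests::passing_test", ignore: false }, testfn: passing_test },
        TestDescAndFn { desc: TestDesc { name: "tests::arithmetic_test", ignore: false }, testfn: arithmetic_test },
    ];
    // A minimal "runner": walk the slice and report one event per test.
    for t in &tests {
        if t.desc.ignore {
            println!("test {} ... ignored", t.desc.name);
        } else {
            (t.testfn)();
            println!("test {} ... ok", t.desc.name);
        }
    }
}
```

Seen this way, the "suite" events in the proposed format really describe one run of one test binary rather than any structure libtest itself knows about.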
- `suite`: Events regarding the whole testing suite; consists of a header specifying the amount of tests to be performed,
and a footer with results summary.
- `test`: A change in a test's state, along with its name. Can be one of the following:
What is "its name" here? A unique identifier for the test? We have one of these, right? I assume it only needs to be unique in the current suite.
Essentially it's the test's name+path.
I'm pretty sure they're unique, as Rust paths are mostly unique, but it requires a second check.
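As a concrete example (module and function names are made up), the "name" in question is the full module path of the test function, which is why collisions within one test binary are unlikely:

```rust
// Built with `rustc --test` (or as part of `cargo test`), the test below is
// reported as "parser::tests::empty_input" - the module path plus the
// function name, not just "empty_input".
pub mod parser {
    pub fn tokenize(s: &str) -> Vec<&str> {
        s.split_whitespace().collect()
    }

    #[cfg(test)]
    mod tests {
        use super::tokenize;

        #[test]
        fn empty_input() {
            assert!(tokenize("").is_empty());
        }
    }
}
```

The same function name can appear in two different modules, but the path keeps the reported names distinct; separate test binaries (unit vs. integration tests) are a different matter, since each produces its own output stream.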
text/0000-libtest-json-output.md
Outdated
- `allowed_failure`
- `timeout`
- `bench`: Benchmark results, specifying the median time and the standard deviation
- `test_output`: The stdout of a failed test, if available.
This was part of a lengthy discussion in today's dev-tools meeting. I'll write a top-level comment on this later.
- This proposal adds a new API to which the toolchain must adhere, increasing the chance of accidental breakage in the future.
# Rationale and alternatives |
Can you also add "Prior Art" with links to all the stuff that other people did? (Can be collected from comments here!) #1284 mentioned https://github.com/rubyworks/tapout/wiki/TAP-Y-J-Specification for example
Some prior art links: TAP: http://testanything.org/tap-specification.html JSON format from Dart: https://github.com/dart-lang/test/blob/master/doc/json_reporter.md |
Hm, and of course #1284 :) |
As mentioned in #2234 (comment) there was a bit of a discussion around how best to capture a test's output (to stdout/stderr). The use case is that the user wants to see immediately (in their IDE, which consumes the JSON-based test output) when a test outputs something, so they don't need to wait for the whole (possibly very slow) test to run to see if there was an error. Basically, printf debugging of tests. Technically, each test needs to run in its own process, so it can have its own stdout/stderr buffers (threads share stdout/stderr!). I believe this is already the case (as cargo test is able to capture stdout/stderr). Depending on our goals we can either capture this and output the buffer contents after the test is finished, or stream the output. I propose the following:
Another interesting issue that was brought up was: What if a test starts a thread, and uses @matklad what do you think? |
@killercup reading the source of libtest reveals a couple of problems:
@killercup 's suggestion looks excellent, if we can implement it :) |
@matklad Interesting. (I'll leave it at that…) As these are limitations of the test runner, and this RFC is more about the format (or, well, it should be :)), do you think we can live with them for now? Dart's JSON format looks really great, by the way. I don't think we need all of it, but it may be worthwhile to try to keep our schema compatible with theirs. Their streaming output schema only differentiates between "print" and "skip message", so we'll need to expand that. |
I am afraid that with these limitations IntelliJ won't be able to take advantage of the JSON format at all :( So, this might be useful for someone, but not for us. Actually, if not for these limitations, we would have parsed the current text-based output: it's not as pretty as parsing JSON, but certainly doable. I do think that we need to find at least one client (not necessarily IntelliJ) for this feature before stabilizing it. |
Wait, which limitation exactly? Does the thread-local-stdout-capture workaround not work properly enough to allow streaming, or is the "test started" thing messing everything up? I do think this is independent of the RFC, though, and should actually be fixed regardless of what happens here. Let me pick up something from the dev-tools team meeting: I proposed for this to become an experimental RFC (or eRFC), because, while the format itself is not decided on, the motivation and the need for machine-readable test output are there. We settled on landing this behind an unstable feature flag, iterating on the design, and then using this RFC to stabilize it. If we need to refactor libtest to do this, this will take longer than I'd like, but it would still fit into that plan.
I agree. A good first demo might be a CLI tool that reads the JSON test output and converts it to human readable stuff similar to what we currently output, but demo CLI tools are of course not the primary targets of this! I also think IntelliJ could use this feature before output streaming is available, but I'm of course in no position to tell you what to do :) (My only argument: Displaying output after the test run is still strictly better than not supporting running tests at all.) |
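A sketch of what that demo tool could look like, assuming the line-delimited `"type"`/`"event"`/`"name"` shape drafted in this RFC (the exact field names are not settled) and the `serde_json` crate as a dependency:

```rust
// Reads JSON test events, one per line, on stdin and re-renders them in the
// familiar human-readable style. The field names here follow the draft format
// in this RFC and are assumptions, not a stabilized schema.
use std::io::{self, BufRead};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line?;
        let event: serde_json::Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue, // skip anything that isn't a JSON event
        };
        match event["type"].as_str() {
            Some("suite") => {
                if let Some(n) = event["test_count"].as_u64() {
                    println!("running {} tests", n);
                }
            }
            Some("test") => {
                let name = event["name"].as_str().unwrap_or("<unknown>");
                match event["event"].as_str() {
                    Some("ok") => println!("test {} ... ok", name),
                    Some("failed") => println!("test {} ... FAILED", name),
                    Some("ignored") => println!("test {} ... ignored", name),
                    _ => {} // started, timeouts, benches, etc. omitted here
                }
            }
            _ => {}
        }
    }
    Ok(())
}
```

With the `--format json` flag proposed here, it would be used as something like `my_test_binary --format json | pretty-printer`.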
This one really throws a wrench into test reporting. I might be misreading the code though.
We do support running tests, we just can't generate nice reports and show which tests are in progress, which are done and which have not started yet. |
Wait, "printing test started after the test is done" is actually only relevant for human reporter (because you want |
Cool! "Suboptimal" means we can try this out and iterate on it :)
A lot of the issues mentioned here and on the PR actually touch a core problem with libtest. It seems like this RFC should not be merged until there's an effort to redesign libtest. @killercup I'm all for putting the output of a test inline. I'm not sure why I didn't think of that. Maybe I just followed the pretty printing too closely. About serde: I wrote a quick trial of libtest using serde, but didn't manage to compile it. Something about serde_derive not being able to use/find the proc_macro crate. (Sounds like something to do with the compiler's stages, IDK.) @Centril I'm not familiar with the code proptest generates, but if it boils down to generating #[test]-decorated functions, I'm not sure how libtest will be able to implement what you're suggesting currently. Nonetheless, we might want the JSON API to support it, so alternative test runners that do support this can output more information. I'll try to update according to your comments soon (+demo implementation), but I'm traveling a lot this week and so am laptop-less. |
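For what it's worth, the messages proposed here are flat enough that they can also be emitted without serde at all, which sidesteps the serde_derive/proc_macro problem inside libtest. A minimal, illustrative sketch (the escaping is deliberately simplistic and not complete):

```rust
// Hand-rolled JSON emission for flat test events; no serde involved.
// Only the escapes that commonly occur in test names and output are handled.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

fn test_event(event: &str, name: &str) -> String {
    format!(
        r#"{{ "type": "test", "event": "{}", "name": "{}" }}"#,
        escape(event),
        escape(name)
    )
}

fn main() {
    println!("{}", test_event("ok", "parser::tests::empty_input"));
    println!("{}", test_event("failed", "tests::will_fail"));
}
```

Whether that is nicer than pulling serde into libtest is a separate question, but it removes the bootstrapping constraint from the format discussion.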
It seems some of the problems of libtest stem from the fact that it uses panics (mainly through asserts) to indicate failures, which is problematic for a few reasons. Among other things, it prevents the existence of "soft asserts" (I think Catch/gTest calls these REQUIRE) or any kind of test-specific indication. Not to say libtest shouldn't try to catch panics (it should), but it's not the best way to communicate. (I can attest that I'm not the best communicator when I'm panicking.) It's not the prettiest solution, but maybe we can add an option for a #[test] function that receives some kind of struct/callback from libtest as a parameter.
But this is also problematic, since libtest is not stable at all (and currently test functions are effectively blind to their context). |
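Purely to make the struct/callback idea concrete, a hypothetical sketch; none of these names exist in libtest, and how the harness would detect that a test wants the parameter is left open:

```rust
// Hypothetical "soft assert" context handed to a test by the runner.
struct TestContext {
    failures: Vec<String>,
}

impl TestContext {
    fn new() -> Self {
        TestContext { failures: Vec::new() }
    }

    /// A soft assert: record the failure but keep executing the test.
    fn check(&mut self, cond: bool, msg: &str) {
        if !cond {
            self.failures.push(msg.to_string());
        }
    }
}

// What a test taking such a context might look like.
fn my_test(ctx: &mut TestContext) {
    ctx.check(1 + 1 == 2, "arithmetic still works");
    ctx.check("abc".len() == 3, "length of abc");
}

fn main() {
    // Stand-in for the runner: call the test, then report every recorded check.
    let mut ctx = TestContext::new();
    my_test(&mut ctx);
    if ctx.failures.is_empty() {
        println!("test my_test ... ok");
    } else {
        println!("test my_test ... FAILED ({} failed checks)", ctx.failures.len());
        for f in &ctx.failures {
            println!("  check failed: {}", f);
        }
    }
}
```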
@Gilnaa That is what … For PBT, I think this would be useful (perhaps too much):

```json
{
    "type": "test_output",
    "name": "reverse_reverse_eq",
    "output": "thread 'will_fail' panicked at 'assertion failed: false', f.rs:12:1\nnote: Run with `RUST_BACKTRACE=1` for a backtrace.\n",
    "runner": "proptest",
    "runner_type": "property_based_testing",
    "minimal_input": "Point { x: 1, y: 0 }",
    "passed": 78,
    "ignored": 10,
    "required": 100
}
```

And some events:

```json
{ "type": "test", "runner": "proptest", "runner_type": "property_based_testing", "event": "shrinking", "name": "complicated" }
{ "type": "test", "runner": "proptest", "runner_type": "property_based_testing", "event": "shrinking", "name": "simplified" }
{ "type": "test", "runner": "proptest", "runner_type": "property_based_testing", "event": "shrinking", "name": "found_minimal" }
```

Also, on an unrelated note, I think it would be helpful if the trait introduced in the reference-level explanation had more motivation and explanation - you motivate that there's a problem to solve, but not why the proposed trait solves it in the best way possible - the rationale should also talk about this. |
Yeah, I'll add an explanation. The trait is basically modeled after libtest's current behavior and design, which is described mostly by the following callback: (and then some) If we keep the current design of libtest (which we do not have to, since the library is unstable), this trait is the most proper solution, but if there's a major refactoring, the technical details of this RFC are slightly wrong. |
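For readers without the RFC text open, the general shape under discussion is roughly the following; this is an approximation of a formatter-style trait, not the exact trait or signatures from the RFC or from libtest:

```rust
// Approximate shape of a per-format output trait: the runner drives it, and
// each output mode (pretty, terse, JSON) is one implementation.
use std::io::{self, Write};

struct TestDesc {
    name: String,
}

enum TestResult {
    Ok,
    Failed(String),
    Ignored,
}

trait OutputFormatter {
    fn write_run_start(&mut self, test_count: usize) -> io::Result<()>;
    fn write_test_start(&mut self, desc: &TestDesc) -> io::Result<()>;
    fn write_result(&mut self, desc: &TestDesc, result: &TestResult) -> io::Result<()>;
    fn write_run_finish(&mut self, failed: usize) -> io::Result<()>;
}

struct JsonFormatter<W: Write>(W);

impl<W: Write> OutputFormatter for JsonFormatter<W> {
    fn write_run_start(&mut self, test_count: usize) -> io::Result<()> {
        writeln!(self.0, r#"{{ "type": "suite", "event": "started", "test_count": {} }}"#, test_count)
    }
    fn write_test_start(&mut self, desc: &TestDesc) -> io::Result<()> {
        // No string escaping here; a real formatter would need it.
        writeln!(self.0, r#"{{ "type": "test", "event": "started", "name": "{}" }}"#, desc.name)
    }
    fn write_result(&mut self, desc: &TestDesc, result: &TestResult) -> io::Result<()> {
        let event = match result {
            TestResult::Ok => "ok",
            TestResult::Failed(_) => "failed",
            TestResult::Ignored => "ignored",
        };
        writeln!(self.0, r#"{{ "type": "test", "event": "{}", "name": "{}" }}"#, event, desc.name)
    }
    fn write_run_finish(&mut self, failed: usize) -> io::Result<()> {
        writeln!(self.0, r#"{{ "type": "suite", "event": "finished", "failed": {} }}"#, failed)
    }
}

fn main() -> io::Result<()> {
    let mut fmt = JsonFormatter(io::stdout());
    fmt.write_run_start(1)?;
    let desc = TestDesc { name: "tests::it_works".to_string() };
    fmt.write_test_start(&desc)?;
    fmt.write_result(&desc, &TestResult::Ok)?;
    fmt.write_run_finish(0)
}
```

Whether something of this shape gets stabilized, or stays an internal detail behind a stable JSON schema, is the forward-compatibility question discussed below.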
@Gilnaa As long as the changes you propose are reasonably forward-compatible so that it could support the extra information I listed in the future, I think that's fine. Having |
I'll admit I'm not sure what these are for or what the IDE should do with them, can you elaborate? |
@Gilnaa Given
or some string with useful information that we can't think of right now. The IDE or the command line should dump these bits to the user in some nice readable way.
Without these statistics, your testing may be slow but you have no way to improve things because you lack the relevant information. |
And are they supposed to be free-form fields or a closed set? |
@Gilnaa I was thinking the former, as different runners may provide different sets of information... though it could be a closed set of optional fields so you can range between a minimal set and a maximal set. A set of free-form fields may be more expressive but harm the consistency of analysis. |
Not sure there's a need to specify it for each test, right? You can add it to the first (header-like) JSON line. |
@Centril I'd rather punt on the needs of property-based tests for now: even if we support this in the JSON protocol, I doubt there will be a lot of clients that would understand this information. Certainly, there's no UI in IntelliJ where we could show this information, so the best that a property test can do is to print this information in a human-readable form to stdout. And it's always possible to add more fields to existing messages or to add new types of messages. Disclaimer: I am myself addicted to property-based testing and I find hypothesis a godsend, but I want to have some form of machine-readable output sooner rather than later, because this was blocked on custom everything forever. |
@Gilnaa To be explicit/clear, each @matklad I'm down with that so long as we are forward compatible with adding the things I enumerated. I don't wish to be in the way of stabilization =) |
@Centril yeah, ofc extras could be "anywhere", I talked about the runner type. Edit: |
Would the following be amenable to you?

```rust
#[nonexhaustive]
pub enum RunnerClassification {
    Unit,
    PBT,
    //...
}

pub struct RunnerIdentifier {
    identifier: String,
    type: RunnerClassification // Or perhaps just String.
}

pub struct TestDesc {
    pub name: TestName,
    pub ignore: bool,
    pub should_panic: ShouldPanic,
    pub allow_fail: bool,
    pub runner: Option<RunnerIdentifier>, // <-- Added, rest is the same for this type.
}

pub struct TestResult {
    classification: TestResultClassification,
    extras: HashMap<String, String> // Or some similar type for free-form key-value extras.
}

pub enum TestResultClassification {
    TrOk,
    TrFailed,
    TrFailedMsg(String),
    TrIgnored,
    TrAllowedFail,
    TrMetrics(MetricMap),
    TrBench(BenchSamples),
}

// OutputFormatter remains as is.
```

If the trait is stabilized as specified in the RFC currently, then we are not forward-compatible, since changes to … Also - re-checking the trait, I'm not sure how it supports the "event-notifications" you are proposing in the guide-level explanation. |
The context could be in thread-local storage. That would avoid the need for passing an object around. It would not allow easily doing checks from threads spawned by the test, but neither do panics in the current state. |
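A sketch of the thread-local variant (again with made-up names), which keeps the `#[test]` signature unchanged and also shows the spawned-thread limitation mentioned here:

```rust
// The runner would install a per-test context in thread-local storage; a free
// function looks it up, so the test function signature stays as it is today.
use std::cell::RefCell;

thread_local! {
    static FAILED_CHECKS: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Soft check recorded in the current thread's context. Note the limitation
/// from the comment above: a thread spawned by the test has its own, empty
/// slot, so checks made there would not be seen by the runner.
fn check(cond: bool, msg: &str) {
    if !cond {
        FAILED_CHECKS.with(|f| f.borrow_mut().push(msg.to_string()));
    }
}

fn the_test() {
    check(2 + 2 == 4, "addition");
    check("ab".starts_with('a'), "starts_with");
}

fn main() {
    // Stand-in for the runner: run the test, then inspect its context.
    the_test();
    FAILED_CHECKS.with(|f| {
        let failures = f.borrow();
        if failures.is_empty() {
            println!("test the_test ... ok");
        } else {
            println!("test the_test ... FAILED");
            for msg in failures.iter() {
                println!("  check failed: {}", msg);
            }
        }
    });
}
```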
@Centril the trait is mostly an implementation detail and should not be relevant to the resulting JSON; it is not public, and thus it is not an API that can be broken. The events are logged mostly using … @jan-hudec Yeah, that's a great idea. |
Oh I see. I thought it was generic for all runners and all formats, hence my confusion. Can you make this clearer in the text? |
Yes, just to clarify: this RFC is intended only for the simple, plain libtest to produce simple JSON. |
We discussed this yesterday in the dev-tools meeting. We thought that the associated Rust PR should land without an RFC. Long-term, machine-readable output should be part of the larger plan for pluggable test runners. We will probably want to specify some kind of standardised output or API and this RFC is probably a good starting point. However, I think we should get some experience with the initial implementation and work out the larger story around pluggable test runners. Therefore, I'm closing this RFC for now. |
Rendered
Proposed implementation:
rust-lang/rust#46450 (comment)