C++: Constant type-bounds in the new range analysis #13783

MathiasVP · 2023-07-20T15:13:53Z

This PR adds type-based bounds in the new range analysis library. This allows us to deduce that fewer things overflow, and thus allows us exclude fewer bounds.

I'll test this by locally rebasing this branch onto #12505 and running DCA on that.

geoffw0 · 2023-07-20T15:39:15Z

cpp/ql/test/library-tests/ir/range-analysis/SimpleRangeAnalysis_tests.cpp

@@ -936,7 +936,7 @@ void two_bounds_from_one_test(short ss, unsigned short us) {
    range(ss); // -32768 .. 32767
  }

-  if (ss + 1 < sizeof(int)) {  // $ overflow=+
+  if (ss + 1 < sizeof(int)) { // $ overflow=-


👍 verified experimentally

MathiasVP · 2023-07-22T08:16:35Z

I've verified the results and they LGTM. ~~I'm still a bit unsure about whether we should suppress constant bounds arising purely from types such as:~~

~~void f(unsigned int ui) { unsigned long long ull = ui; range(ull); // we infer that `ull <= UINT_MAX` here. Is that what we want? }~~

~~In any case, I think we can push this feature to a future PR. I've noted this down in our internal issue.~~

Edit: We now let each query do that using the TypeReason mechanism added in #13880. 9807c0b makes use of this in this PR for the MCTV queries.

geoffw0 · 2023-07-24T13:07:25Z

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

jketema

Some small comments, just to further my understanding of the code.

cpp/ql/lib/semmle/code/cpp/rangeanalysis/new/internal/semantic/SemanticExprSpecific.qll

cpp/ql/test/library-tests/ir/range-analysis/SimpleRangeAnalysis_tests.cpp

MathiasVP · 2023-07-24T13:19:50Z

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

Because it's not a very precise bound, and many users have been confused by false positives caused by such very-large-but-sound bounds obtained from type-information only. For example, this change was made because we wanted to be able to distinguish between precise bounds found by bounds from guards, and less precise bounds from type-information, for a high precision version of the cpp/overrunning-write query.

And then there's this beauty from an external contributor who had a similar problem: https://github.com/github/codeql/blob/main/cpp/ql/src/experimental/Security/CWE/CWE-561/FindIncorrectlyUsedSwitch.ql#L21

jketema · 2023-07-24T15:04:14Z

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

Because it's not a very precise bound, and many users have been confused by false positives caused by such very-large-but-sound bounds obtained from type-information only. For example, this change was made because we wanted to be able to distinguish between precise bounds found by bounds from guards, and less precise bounds from type-information, for a high precision version of the cpp/overrunning-write query.

And then there's this beauty from an external contributor who had a similar problem: https://github.com/github/codeql/blob/main/cpp/ql/src/experimental/Security/CWE/CWE-561/FindIncorrectlyUsedSwitch.ql#L21

Not as part of this PR, but as each bound comes with a reason, we might want to explore whether we can add a new reason for the type bounds that are introduced here.

jketema · 2023-07-25T10:34:16Z

The DCA alert changes do not make any kind of sense to me. Any clue what was going on there?

MathiasVP · 2023-07-25T11:33:58Z

The cpp/uncontrolled-arithmetic ones in Samate definitely LGTM since they're all in functions annotated as TPs.
- I think the lost result on abseil-cpp-linux is correct assuming MultType resolves to uint128. If it resolves to uint64_t I'm less sure. Will need to check that locally, I think.
Most of the results on cpp/constant-comparison for ImageMagick/ImageMagick also LGTM, but I'm still not sure about all of them.
The cpp/integer-multiplication-cast-to-long results LGTM. We're now able to infer more bounds, so we can infer more bounds for the operands of multiplications.
The cpp/overrunning-write results looks like FP results that comes from inferring upper bounds purely from the type. That's also how the query behaves on main, though, so that's not an issue.
The cpp/unbounded-write results LGTM. All the lost results are because we correctly infer a smaller upper bound based on a type promotion.
Same story for cpp/uncontrolled-allocation-size.
The new result on cpp/very-likely-overrunning-write for lubomyr/bochs is a TP, I think: pname has space for 6 chars, but port%d can have length strlen("port") plus the size of max(i + 1), and i is upper bounded by hub.n_ports (which is an unsigned char). So max(i + 1) = 256. So it does seem like the sprintf requires 4 + 3 + 1 = 8 (for the null terminator) to be safe.
- The new result on vim is absolutely horrible 😂. I can't parse that sprintf in my head, but I'm very sure the new results comes because we're able to infer that key_name[0] and key_name[1] can each be at most CHAR_MAX.

jketema · 2023-07-25T12:05:54Z

Thanks. I would be fine with having this merged, assuming DCA against main for both the nightly suite and the MCTV suite look fine. I'm not sure what @geoffw0 thinks?

MathiasVP · 2023-07-25T12:11:24Z

Good point. I actually didn't test this against main (I only tested it against #12505), but I'll start such a run right away.

MathiasVP · 2023-07-25T14:56:36Z

Uh oh. Since we now infer many many more constant bounds the cpp/constant-array-overflow query suddenly has a lot more results now. There's also two new FPs on cpp/invalid-pointer-deref with an alert message that reads very much like we're getting a type-based bound:

This read might be out of bounds, as the pointer might be equal to call to malloc + position + 65534.

I think that's another good reason to do what Jeroen said here:

Why wouldn't we want to deduce that ull <= UINT_MAX in the above example?

Because it's not a very precise bound, and many users have been confused by false positives caused by such very-large-but-sound bounds obtained from type-information only. For example, this change was made because we wanted to be able to distinguish between precise bounds found by bounds from guards, and less precise bounds from type-information, for a high precision version of the cpp/overrunning-write query.
And then there's this beauty from an external contributor who had a similar problem: https://github.com/github/codeql/blob/main/cpp/ql/src/experimental/Security/CWE/CWE-561/FindIncorrectlyUsedSwitch.ql#L21

Not as part of this PR, but as each bound comes with a reason, we might want to explore whether we can add a new reason for the type bounds that are introduced here.

But since this is causing FPs on cpp/invalid-pointer-deref it may be worth including the necessary changes to SemReason so that we can exclude these cases in the query.

MathiasVP · 2023-07-31T15:52:40Z

I've pushed two temporary commits to this PR: 54b903c and 59b7d3e.

Together, these filter out type-based bounds from the two MCTV queries that DCA showed had new FPs as a result of these changes. If DCA comes back happy I'll move those commits commits to another PR so that we can merge that one before continuing with this PR.

geoffw0 · 2023-08-02T15:08:35Z

I'm not sure what @geoffw0 thinks?

Sorry I didn't see the notification for this until now. I'll have a quick look at the DCA alert differences on the latest run, particularly for cpp/overrunning-write and cpp/unbounded-write...

MathiasVP · 2023-08-02T15:10:34Z

I'm not sure what @geoffw0 thinks?

Sorry I didn't see the notification for this until now. I'll have a quick look at the DCA alert differences on the latest run, particularly for cpp/overrunning-write and cpp/unbounded-write...

Please don't do so yet. I'm still fixing things. I'll move this to a draft to make this explicit.

MathiasVP · 2023-08-03T13:12:02Z

My long series of TEMP commits seems to have removed all the type-related FPs, and we're now left with four new results. I thought these were FPs similar to the other ones I've fixed in the earlier commits, but these turn out to be genuine (I mean ... they're still a FP wrt. the query, but the range analyses reported are all correct).

For example, one of the four new alerts is on this line where we report that the back_r used as the index into png_sRGB_table may have the value 65535, while the array is only 255 entries long (giving us an "off-by-65280" alert because 65535 - 255 = 65280). This very much looked like a type-bound was still somehow getting through to the query even though I've removed type-based bounds from the query in a7194c9 and 03b72eb.

However! After a long and painful debugging session I found this line which is the source of this reported 65535. So the result is "correct", although the path that would allow for this overflow is infeasible.

TLDR: It seems like things work now 👍. I'll cleanup this PR and take it out of draft later today.

MathiasVP · 2023-08-03T13:29:48Z

I've moved all the TEMP commits to a fresh PR: #13880. Once that PR is merged I can rebase this PR and pull this out of draft 🎉

…pper bound (and similarly for lower bounds).

MathiasVP · 2023-08-07T09:51:39Z

DCA looks good. We have the same set of results as the very first DCA result (all of which we verified). There was a slowdown on Wireshark which I could reproduce locally, and a subsequent DCA run didn't show any slowdown either. So I think this PR is good to go!

jketema

One question, otherwise LGTM.

jketema · 2023-08-07T09:57:34Z

...mmle/code/cpp/rangeanalysis/new/internal/semantic/analysis/RangeAnalysisConstantSpecific.qll

-  predicate hasConstantBound(SemExpr e, float bound, boolean upper) { none() }
+  predicate hasConstantBound(SemExpr e, float bound, boolean upper, SemReason reason) {
+    semHasConstantBoundConstantSpecific(e, bound, upper) and
+    reason instanceof SemTypeReason


I've trouble following this. Why os this restricted to SemTypeReason only?

Because semHasConstantBoundConstantSpecific only infers bounds based on type information. It's probably possible to push this condition into semHasConstantBoundConstantSpecific, but that then requires that semHasConstantBoundConstantSpecific knows about the SemReason type which it currently doesn't since that predicate is defined outside the CppLangImplConstant module (which instantiates those types).

Does that make sense?

So zero bounds are not constant?

semHasConstantBoundConstantSpecific just forwards to hasConstantBoundConstantSpecific (which is the predicate added here). The only thing that adds is the type bounds inferred from type conversions.

We briefly discussed this face-to-face. This is the correct thing to do here because as Mathias writes:

semHasConstantBoundConstantSpecific only infers bounds based on type information

We could extend this later and push down the reason argument, but that is not of immediate concern here.

…r-new-range-analysis" This reverts commit e9750af, reversing changes made to 37a5462.

github-actions bot added the C++ label Jul 20, 2023

geoffw0 reviewed Jul 20, 2023

View reviewed changes

MathiasVP changed the title ~~C++: Constant type bounds in the new range analysis~~ C++: Constant type-bounds in the new range analysis Jul 21, 2023

MathiasVP marked this pull request as ready for review July 22, 2023 08:13

MathiasVP requested a review from a team as a code owner July 22, 2023 08:13

MathiasVP added the no-change-note-required This PR does not need a change note label Jul 22, 2023

jketema reviewed Jul 24, 2023

View reviewed changes

MathiasVP force-pushed the type-bounds-for-new-range-analysis branch from 44a88b8 to 36ffa12 Compare August 1, 2023 12:41

MathiasVP marked this pull request as draft August 2, 2023 15:10

MathiasVP mentioned this pull request Aug 3, 2023

C++: Add a type-based SemReason. #13880

Merged

MathiasVP added 4 commits August 4, 2023 16:03

C++: Infer a constant upper bound whenever we convert to a 'larger' u…

050a563

…pper bound (and similarly for lower bounds).

C++: Let 'hasConstantBound' determine a reason.

b14db86

C++: Filter type-based reasons out of MCTV queries.

9807c0b

C++: Accept test changes.

2d832db

MathiasVP force-pushed the type-bounds-for-new-range-analysis branch from 1822a67 to 2d832db Compare August 4, 2023 14:12

MathiasVP marked this pull request as ready for review August 5, 2023 18:31

jketema reviewed Aug 7, 2023

View reviewed changes

jketema approved these changes Aug 7, 2023

View reviewed changes

MathiasVP merged commit e9750af into github:main Aug 7, 2023

MathiasVP added a commit to MathiasVP/ql that referenced this pull request Aug 9, 2023

Revert "Merge pull request github#13783 from MathiasVP/type-bounds-fo…

cb1076c

…r-new-range-analysis" This reverts commit e9750af, reversing changes made to 37a5462.

MathiasVP mentioned this pull request Aug 9, 2023

C++: Revert constant bounds for new range analysis #13931

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C++: Constant type-bounds in the new range analysis #13783

C++: Constant type-bounds in the new range analysis #13783

MathiasVP commented Jul 20, 2023 •

edited

Loading

geoffw0 Jul 20, 2023

MathiasVP commented Jul 22, 2023 •

edited

Loading

geoffw0 commented Jul 24, 2023

jketema left a comment

MathiasVP commented Jul 24, 2023

jketema commented Jul 24, 2023

jketema commented Jul 25, 2023

MathiasVP commented Jul 25, 2023 •

edited

Loading

jketema commented Jul 25, 2023

MathiasVP commented Jul 25, 2023

MathiasVP commented Jul 25, 2023

MathiasVP commented Jul 31, 2023

geoffw0 commented Aug 2, 2023

MathiasVP commented Aug 2, 2023

MathiasVP commented Aug 3, 2023

MathiasVP commented Aug 3, 2023

MathiasVP commented Aug 7, 2023

jketema left a comment

jketema Aug 7, 2023

MathiasVP Aug 7, 2023

jketema Aug 7, 2023

MathiasVP Aug 7, 2023

jketema Aug 7, 2023

C++: Constant type-bounds in the new range analysis #13783

C++: Constant type-bounds in the new range analysis #13783

Conversation

MathiasVP commented Jul 20, 2023 • edited Loading

geoffw0 Jul 20, 2023

Choose a reason for hiding this comment

MathiasVP commented Jul 22, 2023 • edited Loading

geoffw0 commented Jul 24, 2023

jketema left a comment

Choose a reason for hiding this comment

MathiasVP commented Jul 24, 2023

jketema commented Jul 24, 2023

jketema commented Jul 25, 2023

MathiasVP commented Jul 25, 2023 • edited Loading

jketema commented Jul 25, 2023

MathiasVP commented Jul 25, 2023

MathiasVP commented Jul 25, 2023

MathiasVP commented Jul 31, 2023

geoffw0 commented Aug 2, 2023

MathiasVP commented Aug 2, 2023

MathiasVP commented Aug 3, 2023

MathiasVP commented Aug 3, 2023

MathiasVP commented Aug 7, 2023

jketema left a comment

Choose a reason for hiding this comment

jketema Aug 7, 2023

Choose a reason for hiding this comment

MathiasVP Aug 7, 2023

Choose a reason for hiding this comment

jketema Aug 7, 2023

Choose a reason for hiding this comment

MathiasVP Aug 7, 2023

Choose a reason for hiding this comment

jketema Aug 7, 2023

Choose a reason for hiding this comment

MathiasVP commented Jul 20, 2023 •

edited

Loading

MathiasVP commented Jul 22, 2023 •

edited

Loading

MathiasVP commented Jul 25, 2023 •

edited

Loading