Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Deploy preview] Add a function list #5233

Draft
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

mstange
Copy link
Contributor

@mstange mstange commented Nov 28, 2024

WIP implementing the approach outlined in #15 (comment)

This just adds a new interface that we can hang functionality off of
which is specific to the inverted tree. No functional changes.
This is the main new concept in this PR that allows us to make
the inverted tree fast. See the comment above CallNodeInfoInverted in
src/types/profile-derived.js for details.

The PR is structured as follows:
 - Implement the suffix order in a brute force manner (this commit).
 - Use the suffix order to re-implement everything that was using the
   inverted call node table.
 - Once nothing is using the inverted call node table directly anymore,
   make it fast. We make it fast by rewriting the computation of the
   inverted call node table and of the suffix order so that we only
   materialize inverted call nodes that are displayed in the call tree,
   and not for every sample. And we only compute the suffix order to the
   level of precision needed to have correct ranges for all materialized
   inverted call nodes.
This function is used by getNativeSymbolsForCallNodeInverted,
getStackAddressInfoForCallNodeInverted, and
getStackLineInfoForCallNodeInverted.

This replaces a call to getStackIndexToCallNodeIndex() with a call to
getStackIndexToNonInvertedCallNodeIndex(). It also mostly removes the
use of the inverted call node table for this code. (There's still a
place that accesses callNodeInfo.getCallNodeTable().depth, but this will
be fixed in a later commit.)

We want to eliminate all callers to getStackIndexToCallNodeIndex() because
we don't want to compute a mapping from non-inverted stack index to
inverted call node index upfront.
… order.

This replaces a call to getStackIndexToCallNodeIndex() with a call to
getStackIndexToNonInvertedCallNodeIndex(). It also removes a call to
getCallNodeTable(). And it replaces a SampleIndexToCallNodeIndex
mapping with a SampleIndexToNonInvertedCallNodeIndex mapping.
This replaces a call to getStackIndexToCallNodeIndex() with a call to
getStackIndexToNonInvertedCallNodeIndex(). It also removes a call to
getCallNodeTable().
…ted call nodes.

This function is used when hovering or clicking the activity graph.

This commit replaces a SampleIndexToCallNodeIndex mapping with a
SampleIndexToNonInvertedCallNodeIndex mapping.
The stack chart is always non-inverted, so this commit is functionally
neutral.

This lets us remove the now-unused function
getSampleIndexToCallNodeIndexForFilteredThread.
This removes a few more uses of getCallNodeTable().
This replaces lots of uses of getCallNodeTable() with uses of
getNonInvertedCallNodeTable().
It also replaces lots of uses of getStackIndexToCallNodeIndex() with
uses of getStackIndexToNonInvertedCallNodeIndex().

We now compute the call tree timings quite differently for inverted mode
compared to non-inverted mode. There's one part of the work that's shared:
The getCallNodeLeafAndSummary computes the self time for each non-inverted
node, and the result is used for both the inverted and the non-inverted
call tree timings.
The CallTreeTimings Flow type is turned into an enum, with a different
type for CallTreeTimingsNonInverted and for CallTreeTimingsInverted.
A new implementation for the CallTreeInternal interface is added.
All these places now deal with non-inverted call nodes, and for those,
what we meant by "leaf" and by "self" was always the same thing.
And I prefer the word "self" because "leaf" usually means "has no children"
and that's not the case here.
We still use the word "leaf" in many parts of the documentation.
Whether a function recurses (directly or indirectly) is the same
in the inverted call node table and in the non-inverted call node
table.
This just stops exposing it from the interface.

The way we compute it will change in the next commit.
This is the main commit of this PR. Now that nothing is relying on having
an inverted call node for each sample, or on having a fully-computed
inverted call node table, we can make it so that we only add entries to
the inverted call node table when we actually need a node, for example
because it was revealed in the call tree. This makes it a lot faster
to click the "Invert call stack" checkbox - before this commit, we were
computing a lot of inverted call nodes that were never shown to the user.

After this commit, CallNodeInfoInvertedImpl no longer inherits from
CallNodeInfoImpl - it is now a fully separate implementation.

This commit reduces the time spent in `getInvertedCallNodeInfo` on an
example profile (https://share.firefox.dev/411Vg2T) from 11 seconds to 718 ms.
Before: https://share.firefox.dev/3CTNApp
After: https://share.firefox.dev/492F7wl (15x faster)
The new structure gives us a nice guarantee about roots
of the inverted tree: There is an inverted root for every
func, and their indexes are identical. This makes it really
cheap to translate between the call node index and the func index
(no conversion or lookup is necessary) and also makes it cheap
to check if a node is a root.

This commit also replaces a few maps and sets with typed arrays
for performance. This is easier now that the root indexes are
all contiguous.
This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable,
because we'll now only be comparing integers. This speeds up
_createInvertedRootCallNodeTable by almost 2x.
…ted.

This saves a lot of work upfront that's not needed. At any given time,
we just need the suffix order to be accurate enough so that the "suffix
order index range" for every existing inverted call node is correct.

This commit reduces the time spent in `getInvertedCallNodeInfo` + `getChildren`
on an example profile (https://share.firefox.dev/411Vg2T) from 721 ms to 20 ms.
Before: https://share.firefox.dev/40Wdi6S
After: https://share.firefox.dev/3AZjbpg (35x faster)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant