# [TT-1741] performance comparison tool #1424
**`.github/workflows/generate-go-docs.yaml`**

```diff
@@ -31,7 +31,7 @@
   GOPRIVATE: github.com/smartcontractkit/generate-go-function-docs
 run: |
   git config --global url."https://x-access-token:${{ steps.setup-github-token-read.outputs.access-token }}@github.com/".insteadOf "https://github.com/"
-  go install github.com/smartcontractkit/[email protected].1
+  go install github.com/smartcontractkit/[email protected].2
   go install github.com/jmank88/[email protected]
   go install golang.org/x/tools/gopls@latest
@@ -111,7 +111,7 @@
   shell: bash
   env:
     OPENAI_API_KEY: ${{ secrets.OPENAI_DOC_GEN_API_KEY }}
   run: |
     # Add go binary to PATH
     PATH=$PATH:$(go env GOPATH)/bin
     export PATH
```

> GitHub Actions / actionlint reported a check failure on line 114 of `.github/workflows/generate-go-docs.yaml`.
**New workflow file** (`@@ -0,0 +1,27 @@`):

```yaml
name: WASP's BenchSpy Go Tests
on: [push]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  test:
    defaults:
      run:
        working-directory: wasp
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: dorny/paths-filter@v3
        id: changes
        with:
          filters: |
            src:
              - 'wasp/benchspy/**'
      - uses: cachix/install-nix-action@08dcb3a5e62fa31e2da3d490afc4176ef55ecd72 # v30
        if: steps.changes.outputs.src == 'true'
        with:
          nix_path: nixpkgs=channel:nixos-unstable
      - name: Run tests
        if: steps.changes.outputs.src == 'true'
        run: |-
          nix develop -c make test_benchspy_race
```
# BenchSpy - Your First Test

Let's start with the simplest case, which doesn't require any part of the observability stack—only `WASP` and the application you are testing.
`BenchSpy` comes with built-in `QueryExecutors`, each of which also has predefined metrics that you can use. One of these executors is the `DirectQueryExecutor`, which fetches metrics directly from `WASP` generators, which means you can run it without Loki.

> [!NOTE]
> Not sure whether to use `Loki` or `Direct` query executors? [Read this!](./loki_dillema.md)

## Test Overview

Our first test will follow this logic:
- Run a simple load test.
- Generate a performance report and store it.
- Run the load test again.
- Generate a new report and compare it to the previous one.

We'll use very simplified assertions for this example and expect the performance to remain unchanged.

### Step 1: Define and Run a Generator

Let's start by defining and running a generator that uses a mocked service:

```go
gen, err := wasp.NewGenerator(&wasp.Config{
	T:           t,
	GenName:     "vu",
	CallTimeout: 100 * time.Millisecond,
	LoadType:    wasp.VU,
	Schedule:    wasp.Plain(10, 15*time.Second),
	VU: wasp.NewMockVU(&wasp.MockVirtualUserConfig{
		CallSleep: 50 * time.Millisecond,
	}),
})
require.NoError(t, err)
gen.Run(true)
```

### Step 2: Generate a Baseline Performance Report

With load data available, let's generate a baseline performance report and store it in local storage:

```go
baseLineReport, err := benchspy.NewStandardReport(
	// a placeholder; this should be the commit or tag of the Application Under Test (AUT)
	"v1.0.0",
	// use built-in queries for an executor that fetches data directly from the WASP generator
	benchspy.WithStandardQueries(benchspy.StandardQueryExecutor_Direct),
	// WASP generators
	benchspy.WithGenerators(gen),
)
require.NoError(t, err, "failed to create baseline report")

fetchCtx, cancelFn := context.WithTimeout(context.Background(), 60*time.Second)
defer cancelFn()

fetchErr := baseLineReport.FetchData(fetchCtx)
require.NoError(t, fetchErr, "failed to fetch data for baseline report")

path, storeErr := baseLineReport.Store()
require.NoError(t, storeErr, "failed to store baseline report", path)
```

> [!NOTE]
> There's a lot to unpack here, and you're encouraged to read more about the built-in `QueryExecutors` and the standard metrics they provide, as well as about the `StandardReport`, [here](./reports/standard_report.md).
>
> For now, it's enough to know that the standard metrics provided by `StandardQueryExecutor_Direct` include:
> - Median latency
> - P95 latency (95th percentile)
> - Max latency
> - Error rate

### Step 3: Run the Test Again and Compare Reports

With the baseline report ready, let's run the load test again. This time, we'll use a wrapper function to automatically load the previous report, generate a new one, and ensure the two are comparable.

```go
// define a new generator using the same config values
newGen, err := wasp.NewGenerator(&wasp.Config{
	T:           t,
	GenName:     "vu",
	CallTimeout: 100 * time.Millisecond,
	LoadType:    wasp.VU,
	Schedule:    wasp.Plain(10, 15*time.Second),
	VU: wasp.NewMockVU(&wasp.MockVirtualUserConfig{
		CallSleep: 50 * time.Millisecond,
	}),
})
require.NoError(t, err)

// run the load
newGen.Run(true)

fetchCtx, cancelFn = context.WithTimeout(context.Background(), 60*time.Second)
defer cancelFn()

// currentReport is the newly generated report; previousReport is the baseline (baseLineReport) loaded from storage
currentReport, previousReport, err := benchspy.FetchNewStandardReportAndLoadLatestPrevious(
	fetchCtx,
	// commit or tag of the new application version
	"v2.0.0",
	benchspy.WithStandardQueries(benchspy.StandardQueryExecutor_Direct),
	benchspy.WithGenerators(newGen),
)
require.NoError(t, err, "failed to fetch current report or load the previous one")
```

> [!NOTE]
> In a real-world case, once you've generated the first report, you should only need to use the `benchspy.FetchNewStandardReportAndLoadLatestPrevious` function.

### What's Next?

Now that we have two reports, how do we ensure that the application's performance meets expectations?
Find out in the [next chapter](./simplest_metrics.md).
# BenchSpy - Getting Started

The following examples assume you have access to these applications:
- Grafana
- Loki
- Prometheus

> [!NOTE]
> The easiest way to run these locally is by using CTFv2's [observability stack](../../../framework/observability/observability_stack.md).
> Be sure to install the `CTF CLI` first, as described in the [CTFv2 Getting Started](../../../framework/getting_started.md) guide.

Since BenchSpy is tightly coupled with WASP, we highly recommend that you [get familiar with it first](../overview.md) if you haven't already.

Ready? [Let's get started!](./first_test.md)
# BenchSpy - Custom Loki Metrics

In this chapter, we'll explore how to use custom `LogQL` queries in the performance report. For this more advanced use case, we'll manually compose the performance report.

The load generation part is the same as in the standard Loki metrics example and will be skipped.

## Defining Custom Metrics

Let's define two illustrative metrics:
- **`vu_over_time`**: The rate of virtual users generated by WASP, using a 10-second window.
- **`responses_over_time`**: The number of the AUT's responses, using a 1-second window.

```go
lokiQueryExecutor := benchspy.NewLokiQueryExecutor(
	map[string]string{
		"vu_over_time":        fmt.Sprintf("max_over_time({branch=~\"%s\", commit=~\"%s\", go_test_name=~\"%s\", test_data_type=~\"stats\", gen_name=~\"%s\"} | json | unwrap current_instances [10s]) by (node_id, go_test_name, gen_name)", label, label, t.Name(), gen.Cfg.GenName),
		"responses_over_time": fmt.Sprintf("sum(count_over_time({branch=~\"%s\", commit=~\"%s\", go_test_name=~\"%s\", test_data_type=~\"responses\", gen_name=~\"%s\"} [1s])) by (node_id, go_test_name, gen_name)", label, label, t.Name(), gen.Cfg.GenName),
	},
	gen.Cfg.LokiConfig,
)
```

> [!NOTE]
> These `LogQL` queries use the standard labels that `WASP` applies when sending data to Loki.

## Creating a `StandardReport` with Custom Queries

Now, let's create a `StandardReport` using our custom queries:

```go
baseLineReport, err := benchspy.NewStandardReport(
	"v1.0.0",
	// notice the different functional option used to pass the Loki executor with custom queries
	benchspy.WithQueryExecutors(lokiQueryExecutor),
	benchspy.WithGenerators(gen),
)
require.NoError(t, err, "failed to create baseline report")
```

## Wrapping Up

The rest of the code remains unchanged, except for the names of the metrics being asserted. You can find the full example [here](...).
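
As an illustration only, here is a rough sketch of what asserting on the custom metric names could look like. The helper below is hypothetical: it assumes you have already extracted the per-query values from the current and previous reports into `map[string][]string` keyed by the query names defined above (see the standard report chapter for the exact accessors).

```go
package benchspy_custom_metrics_example // hypothetical package name for this sketch

import (
	"testing"

	"github.com/stretchr/testify/require"
)

// assertCustomMetrics is a hypothetical helper: it assumes the values for each
// custom Loki query have already been extracted from the current and previous
// reports into maps keyed by the query names used in lokiQueryExecutor.
func assertCustomMetrics(t *testing.T, current, previous map[string][]string) {
	for _, metric := range []string{"vu_over_time", "responses_over_time"} {
		require.NotEmpty(t, current[metric], "no current values for %s", metric)
		require.NotEmpty(t, previous[metric], "no previous values for %s", metric)
	}
}
```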

Now it's time to look at the last of the bundled `QueryExecutors`. Proceed to the [next chapter to read about Prometheus](./prometheus_std.md).

> [!NOTE]
> You can find the full example [here](https://github.com/smartcontractkit/chainlink-testing-framework/tree/main/wasp/examples/benchspy/loki_query_executor/loki_query_executor_test.go).

**Review comment:** 404 https://github.com/smartcontractkit/chainlink-testing-framework/tree/main/wasp/examples/benchspy/loki_query_executor/loki_query_executor_test.go — probably because it's added here but you point to `main`, so it will work after this PR is merged?

**Reply:** Exactly, it will work only after this PR has been merged (this way I don't have to update it later).
# BenchSpy - To Loki or Not to Loki?

You might be wondering whether to use the `Loki` or `Direct` query executor if all you need are basic latency metrics.

## Rule of Thumb

You should opt for the `Direct` query executor if all you need is a single number, such as the median latency or error rate, and you're not interested in:
- Comparing time series directly,
- Examining minimum or maximum values over time, or
- Performing advanced calculations on raw data.

## Why Choose `Direct`?

The `Direct` executor returns a single value for each standard metric using the same raw data that Loki would use. It accesses data stored in the `WASP` generator, which is later pushed to Loki.

This means you can:
- Run your load test without a Loki instance.
- Avoid calculating metrics like the median, 95th percentile latency, or error ratio yourself.

By using `Direct`, you save resources and simplify the process when advanced analysis isn't required.

> [!WARNING]
> Metrics calculated by the two query executors may differ slightly due to differences in their data processing and calculation methods:
>
> - **`Direct` QueryExecutor**: This method processes all individual data points from the raw dataset, ensuring that every value is taken into account for calculations like averages, percentiles, or other statistics. It provides the most granular and precise results but may also be more sensitive to outliers and noise in the data.
> - **`Loki` QueryExecutor**: This method aggregates data using a default window size of 10 seconds. Within each window, multiple raw data points are combined (e.g., through averaging, summing, or other aggregation functions), which reduces the granularity of the dataset. While this approach can improve performance and reduce noise, it also smooths the data, which may obscure outliers or small-scale variability.
>
> #### Why This Matters for Percentiles
> Percentiles, such as the 95th percentile (p95), are particularly sensitive to the granularity of the input data:
> - In the **`Direct` QueryExecutor**, the p95 is calculated across all raw data points, capturing the true variability of the dataset, including any extreme values or spikes.
> - In the **`Loki` QueryExecutor**, the p95 is calculated over aggregated data (i.e., using the 10-second window). As a result, the raw values within each window are smoothed into a single representative value, potentially lowering or altering the calculated p95. For example, an outlier that would significantly affect the p95 in the `Direct` calculation might be averaged out in the `Loki` window, leading to a slightly lower percentile value.
>
> #### `Direct` Caveats
> - **Buffer limitations:** `WASP` generators use a [StringBuffer](https://github.com/smartcontractkit/chainlink-testing-framework/blob/main/wasp/buffer.go) with a fixed size to store the responses. Once full capacity is reached, the oldest entries are replaced with incoming ones. The size of the buffer can be set in the generator's config. By default, it is limited to 50k entries to lower resource consumption and avoid potential OOMs.
> - **Sampling:** `WASP` generators support optional sampling of successful responses. It is disabled by default, but if you do enable it, the calculations are no longer done over the full dataset.
>
> #### Key Takeaway
> The difference arises because `Direct` prioritizes precision by using raw data, while `Loki` prioritizes efficiency and scalability by using aggregated data. When interpreting results, it's essential to consider how the smoothing effect of `Loki` might impact the representation of variability or extremes in the dataset. This is especially important for metrics like percentiles, where such details can significantly influence the outcome.
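
To make the smoothing effect concrete, here is a small, self-contained Go sketch (an illustration only, not BenchSpy or WASP code). It computes a 95th percentile once over raw latencies and once over the same latencies averaged in 10-second windows; the short latency spike dominates the raw p95 but is largely averaged away in the windowed series. The sample values and window size are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// p95 returns the 95th percentile of the given latencies (nearest-rank method).
func p95(latencies []time.Duration) time.Duration {
	sorted := append([]time.Duration(nil), latencies...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := int(float64(len(sorted))*0.95) - 1
	if idx < 0 {
		idx = 0
	}
	return sorted[idx]
}

func main() {
	// 60s of synthetic raw latencies, one sample per 100ms: mostly 50ms,
	// with a ~4-second burst of 500ms outliers around the 30-second mark.
	var raw []time.Duration
	for i := 0; i < 600; i++ {
		d := 50 * time.Millisecond
		if i >= 280 && i < 320 {
			d = 500 * time.Millisecond
		}
		raw = append(raw, d)
	}

	// "Loki-like" view: average the raw samples in 10-second windows
	// (100 samples each), then compute the percentile over the 6 window means.
	var windowed []time.Duration
	for start := 0; start < len(raw); start += 100 {
		var sum time.Duration
		for _, d := range raw[start : start+100] {
			sum += d
		}
		windowed = append(windowed, sum/100)
	}

	fmt.Println("p95 over raw samples:     ", p95(raw))      // the 500ms burst shows up
	fmt.Println("p95 over 10s window means:", p95(windowed)) // the burst is averaged down
}
```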
**Review comment:** Any reason it's not P99? E.g., do we consider p99 too noisy? It will show "worst cases".

**Reply:** I'd say let's better add max latency to get extreme outliers. 95th and 99th are usually product requirements, but you can rely on one or another and discuss it with stakeholders. MAX, on the other hand, shows you extreme outliers, and if you have a strict SLA that, for example, "0 transactions are delivered after 2 minutes", you'll detect it with MAX and can miss it even with the 99th.