
YARP has a higher cpu usage than Nginx #2427

Open
doddgu opened this issue Mar 4, 2024 · 23 comments
Labels
needs-author-action An issue or pull request that requires more info or actions from the author. Type: Bug Something isn't working

@doddgu

doddgu commented Mar 4, 2024

Sorry, I don't know whether this is a bug.

Describe the bug

I deployed 3 Nginx instances in Hong Kong and 3 YARP instances in Hangzhou.

Client -> Nginx -> YARP -> Service

Nginx forwards several services, and YARP forwards one of them.

[Screenshot: Nginx CPU]

[Screenshot: YARP CPU]

[Screenshot: YARP other metrics]

[Screenshot: htop (Cat.Service.dll is based on YARP)]

I tried to analyze the CPU usage in Visual Studio.

[Screenshot: Top functions]

[Screenshot: Module view]

To Reproduce

No exception.

Further technical details

  • Include the version of the packages you are using
    2.1.0
  • The platform (Linux/macOS/Windows)
    Linux

All machines are 4 vCPU / 8 GB RAM; YARP runs on Ubuntu 22.04 and Nginx on CentOS.
YARP 2.1.0 runs on .NET 8.

@doddgu doddgu added the Type: Bug Something isn't working label Mar 4, 2024
@Tratcher
Member

Tratcher commented Mar 4, 2024

How does the load / RPS compare?

@doddgu
Author

doddgu commented Mar 5, 2024

How does the load / RPS compare?

Each YARP instance handles almost 4,000 RPS:
[Screenshot]

@doddgu doddgu changed the title Yapr has a higher cpu usage than Nginx Yarp has a higher cpu usage than Nginx Mar 5, 2024
@doddgu doddgu changed the title Yarp has a higher cpu usage than Nginx YARP has a higher cpu usage than Nginx Mar 5, 2024
@doddgu
Author

doddgu commented Mar 5, 2024

I loaded the PDBs.
I found that time is spent in the thread pool's WorkerThreadStart method, where Thread.CurrentThread.SetThreadPoolWorkerThreadName() takes up a lot of CPU.

I don't know why WorkerThreadStart has to be called so many times.

[Three screenshots]

@doddgu
Author

doddgu commented Mar 6, 2024

Analyzing with the YARP source code, I found that YARP itself does not have high CPU usage.

[Screenshot]

@doddgu
Author

doddgu commented Mar 8, 2024

Hi @MihaZupan, any news?

@doddgu
Author

doddgu commented Mar 11, 2024

Is it related to dotnet/runtime#70098?
I see there's a PR to fix it.

@MihaZupan MihaZupan added this to the Backlog milestone Apr 9, 2024
@doddgu
Author

doddgu commented Aug 12, 2024

@MihaZupan Hi, is there any news?
In my case, I have a service that handles 120,000 QPS. It needs only 3 Nginx instances but 40 YARP instances. This troubles me.
I tried .NET 9 and saw a performance improvement of about 20%, but the gap is still large.
Are there any temporary workarounds I could try? I'm happy to test them.

@zhenlei520

The performance gap is so obvious. Is there any room for improvement?

@zhenlei520

How does the load / RPS compare?

Is there any news about this issue?
Over the past few days we observed that when the response time of downstream services fluctuates, the proxy comes under heavy pressure. Put simply, requests that normally need 100 threads to process need more threads when the downstream slows down. Requests pile up, and the thread pool quickly starts more threads to handle them. This rapid change in thread count over a short period causes noticeable CPU spikes, and once the downstream stabilizes, threads that have been idle for a while are destroyed. As a result, downstream fluctuations have a large impact on the proxy. We set the minimum number of threads, but that does not stop the thread pool from retiring threads later; it only lets more threads start quickly. We would like to keep these threads alive permanently so that frequent thread creation does not cause large CPU fluctuations.

 ThreadPool.SetMinThreads(500, 500);

@Tratcher @MihaZupan
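Before raising the minimum thread count, the churn described above could be confirmed by sampling the runtime's thread pool counters over time. A minimal sketch, assuming .NET 8 or later: `ThreadPool.ThreadCount`, `ThreadPool.PendingWorkItemCount`, `ThreadPool.CompletedWorkItemCount`, and `ThreadPool.GetMinThreads` are existing APIs, while `ThreadPoolSnapshot` and `Capture` are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of .NET or YARP): a point-in-time view of the thread pool.
public readonly record struct ThreadPoolSnapshot(
    int ThreadCount,
    long PendingWorkItems,
    long CompletedWorkItems,
    int MinWorkerThreads,
    int MinIoThreads)
{
    public static ThreadPoolSnapshot Capture()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        return new ThreadPoolSnapshot(
            ThreadPool.ThreadCount,            // threads currently in the pool
            ThreadPool.PendingWorkItemCount,   // queued work items not yet started
            ThreadPool.CompletedWorkItemCount, // cumulative count since process start
            minWorker,
            minIo);
    }

    public override string ToString() =>
        $"threads={ThreadCount} pending={PendingWorkItems} completed={CompletedWorkItems} min={MinWorkerThreads}/{MinIoThreads}";
}
```

Logging `ThreadPoolSnapshot.Capture()` once per second while the downstream slows down would show whether the thread count really climbs into the hundreds on a 4-core machine. The `dotnet-counters` tool reports the same thread pool counters from outside the process, without code changes.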

@zhenlei520

[Three screenshots]

@doddgu
Author

doddgu commented Sep 10, 2024

We upgraded from .NET 8 to the .NET 9 preview and set some environment variables.

The most obvious improvement in .NET 9 is that memory usage is halved.

DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT = 500
DOTNET_ThreadPool_UnfairSemaphoreSpinLimit = 0
DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS = 1

Overall, it does consume less CPU (around 30% less), and there are no longer minute-long stalls causing widespread timeouts when the number of current requests suddenly increases. However, a small fraction of requests still time out, and CPU fluctuations have become much more frequent. We traced these timeouts: the downstream service responds quickly, yet requests occasionally time out because of YARP. Because the QPS is relatively high, these timeouts are not visible on the dashboard. We have another upstream service that is particularly sensitive to failed requests, and there we see these occasional timeouts showing up very frequently.

First, here is YARP's performance, which has indeed improved:
[Screenshot]

These are the failed requests detected upstream, all of which are SocketExceptions:
[Screenshot]

In summary: setting thread-related parameters can reduce CPU usage but introduces more instability, and there is still a significant gap compared to Nginx.

@zhenlei520

Later we adjusted the configuration as follows:

<PropertyGroup>
  <TargetFramework>net9.0</TargetFramework>
  <GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>
</PropertyGroup>

Environment variable:

DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0

After turning off spinning, CPU usage dropped by nearly 40%, which is a big improvement. According to the data, however, it may affect QPS; we have not yet added distributed tracing, so the impact on QPS is not yet known. From the perspective of upstream callers, the average response time is not significantly affected.

[Screenshot]

However, compared with Nginx, YARP still has a lot of room for improvement. We hope to use it in place of other reverse proxy products.

@doddgu
Author

doddgu commented Sep 18, 2024

@Tratcher @MihaZupan
Sorry to ask for your help again. CPU usage still does not meet our expectations, so we may end up choosing another reverse proxy. This is a tough decision, as we are all .NET developers and had high hopes for YARP. We don't need YARP to match Nginx's performance exactly. However, if three Nginx servers can handle all the traffic stably, I cannot convince our team to choose YARP when it needs over 40 servers to run stably. I hope the .NET team sees this message and responds. Our biggest concern right now is not knowing when this will be resolved; even just prioritizing a fix would be very helpful. Thank you.

@bxjg1987

bxjg1987 commented Oct 7, 2024

We are also choosing between Nginx and YARP. As .NET developers, we prefer YARP. Has there been any progress on this issue?

@karelz
Member

karelz commented Oct 9, 2024

We do not see such huge differences between YARP and NGINX in our perf lab. The ratio is currently about 2:3, I believe. @MihaZupan can link our public perf dashboard.
That said, we are interested in learning why you see such a different ratio. However, it will require some deep digging and collaboration. Are you willing to help us understand the root cause and potentially help us improve YARP?

@MihaZupan
Member

MihaZupan commented Oct 9, 2024

Is it related to the dotnet/runtime#70098

Only in the sense that this PR is improving HttpClient (and therefore also YARP) performance.
The change is about lowering contention in the connection pool, which shouldn't be a huge factor at the several thousand RPS/machine that you're looking at. So while it may save you some CPU cycles, I wouldn't expect it to make a meaningful difference in this case.

requests that originally required 100 threads to process require more threads to process these requests due to downstream fluctuations

This is a surprisingly high number of threads to see on a 4-core machine if everything is fully async (you're not doing "sync over async").
Were you seeing these numbers before you started modifying ThreadPool settings?

ThreadPool.SetMinThreads(500, 500);
DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT = 500

Settings like these seem excessive for the machine size and are more likely to hurt performance than improve it. I'd recommend removing them unless you have real evidence that they're improving things.
The thread pool should be able to adjust the number of threads to adapt to different load levels.

<GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>

How come you're disabling this? I was under the impression that you were worried about the memory footprint (#2527) after load spikes without this functionality.

It only need 3 nginx, but used 40 yarp services

How did you arrive at the 40 number?
What happens if you e.g. use 10 instead? Is request latency meaningfully impacted?

DOTNET_ThreadPool_UnfairSemaphoreSpinLimit

While this may reduce the CPU usage reported for a process, it may negatively impact throughput while under load.
If you experiment with reducing the number of YARP instances, such that the per-instance load is higher, does removing this environment variable (leaving the default behavior) make a difference?


There may be other factors impacting the performance between Nginx and YARP.
As Karel mentioned, we're aware of the performance differences, but the numbers we're seeing in the automated performance runs (select the "Proxies" tab on the bottom) are much closer than what you're seeing, nowhere near the 3:40 ratio.

Are both proxies using the same HTTP protocol version? Both between client-proxy and proxy-backend (e.g. YARP will default to HTTP/2 if the backend supports it)?
I saw you've posted previous questions in the repo about e.g. injecting Connection: Close headers. Can you share what your YARP configuration looks like, and how it differs from the Nginx setup?

@zhenlei520

zhenlei520 commented Oct 23, 2024

@MihaZupan

Thanks for your reply. After confirming with the business team, the actual ratio is 3:11, close to 1:4. No special optimization has been done on Nginx. On YARP we have made only two changes:

  1. Weighted load balancing across destinations
using System.Collections.Generic;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;
using Yarp.ReverseProxy.LoadBalancing;
using Yarp.ReverseProxy.Model;

public class WeightingRoundLoadBalancingPolicy : ILoadBalancingPolicy
{
    private readonly ILogger<WeightingRoundLoadBalancingPolicy> _logger;

    public string Name => "WeightingRound";

    public WeightingRoundLoadBalancingPolicy(ILogger<WeightingRoundLoadBalancingPolicy> logger)
    {
        _logger = logger;
    }

    public DestinationState? PickDestination(HttpContext context, ClusterState cluster, IReadOnlyList<DestinationState> availableDestinations)
    {
        // Weighting.WeightedClusterWeights is our own per-cluster weight cache, not a YARP type.
        if (!Weighting.WeightedClusterWeights.TryGetValue(cluster.ClusterId, out var weightedWeights))
        {
            // Message templates avoid string formatting when the log level is disabled.
            _logger.LogInformation("PickDestination Error: Can not get [{ClusterId}] cluster", cluster.ClusterId);
            return null;
        }

        if (weightedWeights is null)
        {
            _logger.LogInformation("PickDestination Error: Can not get [{ClusterId}] cluster weightedWeights", cluster.ClusterId);
            return null;
        }

        if (weightedWeights.DestinationIds is null)
        {
            _logger.LogInformation("PickDestination Error: Can not get [{ClusterId}] destination, DestinationIds is null", cluster.ClusterId);
            return null;
        }

        var index = WeightingHelper.GetIndexByRandomWeight(weightedWeights.DestinationWeightedWeights, weightedWeights.DestinationWeights, weightedWeights.TotalWeights ?? 1D);
        var destinationId = weightedWeights.DestinationIds[index];

        // A plain loop avoids allocating a LINQ closure on every request.
        for (var i = 0; i < availableDestinations.Count; i++)
        {
            if (availableDestinations[i].DestinationId == destinationId)
            {
                return availableDestinations[i];
            }
        }

        return null;
    }
}

public class WeightingHelper
{
    // Converts per-server weights into running (cumulative) totals for binary search.
    public static (double[]? Weights, double? TotalWeight) GetWeightedWeights(double[] weights)
    {
        if (weights.Length == 0) return (null, null);
        if (weights.Length == 1) return ([.. weights], weights[0]);

        var totalWeight = 0D;
        var newWeights = new double[weights.Length];

        for (int i = 0; i < weights.Length; i++)
        {
            totalWeight += weights[i];
            newWeights[i] = totalWeight;
        }

        return (newWeights, totalWeight);
    }

    public static int GetIndexByRandomWeight(ReadOnlySpan<double> weightedWeights, ReadOnlySpan<double> weights, double totalWeight)
    {
        if (weightedWeights.IsEmpty)
            throw new ArgumentException("No servers to choose from.", nameof(weightedWeights));

        // Ignore weight when only one server
        if (weightedWeights.Length == 1) return 0;

        var randomWeight = Random.Shared.NextDouble() * totalWeight;
        var index = weightedWeights.BinarySearch(randomWeight);

        // No exact match: BinarySearch returns the complement of the first larger element's index.
        // Exact match on a running total: that boundary starts the next server's range.
        index = index < 0 ? ~index : index + 1;

        // The number of servers decreased (or rounding hit the top edge): pick again
        if (index >= weightedWeights.Length)
            return GetIndexByRandomWeight(weightedWeights, weights, totalWeight);

        // The weight of the server is 0: pick again (still unbounded if every weight is 0)
        if (weights[index] == 0D)
            return GetIndexByRandomWeight(weightedWeights, weights, totalWeight);

        return index;
    }
}
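The retries in `GetIndexByRandomWeight` recurse until a valid index comes up, and never terminate if every weight is 0. A minimal non-recursive alternative, assuming weights are non-negative and known when the cluster configuration is loaded: build the cumulative array once, leave zero-weight servers out of it entirely, and pick with a single binary search. `CumulativeWeightPicker` is a hypothetical name for illustration, not a YARP API.

```csharp
using System;

// Hypothetical helper (not part of YARP): weighted random selection over fixed weights.
public sealed class CumulativeWeightPicker
{
    private readonly double[] _cumulative; // running totals, strictly increasing
    private readonly int[] _indices;       // original server index for each cumulative entry

    public CumulativeWeightPicker(ReadOnlySpan<double> weights)
    {
        var cumulative = new double[weights.Length];
        var indices = new int[weights.Length];
        int count = 0;
        double total = 0;

        for (int i = 0; i < weights.Length; i++)
        {
            if (weights[i] <= 0) continue; // non-positive weights are never selectable
            total += weights[i];
            cumulative[count] = total;
            indices[count] = i;
            count++;
        }

        if (count == 0)
            throw new ArgumentException("At least one weight must be positive.", nameof(weights));

        _cumulative = cumulative[..count];
        _indices = indices[..count];
    }

    public double TotalWeight => _cumulative[^1];

    // r is expected in [0, TotalWeight); entry k owns [cumulative[k-1], cumulative[k]).
    public int Pick(double r)
    {
        int pos = Array.BinarySearch(_cumulative, r);
        // Exact hit on a boundary belongs to the next entry; a miss returns ~(first larger element).
        pos = pos >= 0 ? pos + 1 : ~pos;
        // Clamp in case floating-point rounding produced r == TotalWeight.
        if (pos >= _cumulative.Length) pos = _cumulative.Length - 1;
        return _indices[pos];
    }

    public int Pick(Random random) => Pick(random.NextDouble() * TotalWeight);
}
```

Building the picker whenever the cluster configuration changes, and storing it next to the destination IDs, would keep the per-request work to one `NextDouble` call and one binary search with no retry path.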
  2. Request logging
using System.Diagnostics;
using System.Threading.Channels;
using Microsoft.AspNetCore.Http;

public class LoggingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogFormatter _logFormatter;
    private static readonly Channel<byte[]> _logChannel = Channel.CreateUnbounded<byte[]>();
    private const int BatchSize = 500;
    private static DateTime _lastWriteTime;

    public LoggingMiddleware(RequestDelegate next, ILogFormatter logFormatter)
    {
        _next = next;
        _logFormatter = logFormatter;

        // Convention-based middleware is constructed once per pipeline, so this starts one background consumer.
        _ = Task.Run(ProcessLogsAsync);
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Timestamp arithmetic avoids allocating a Stopwatch for every request.
        var startTimestamp = Stopwatch.GetTimestamp();

        try
        {
            await _next(context);
        }
        finally
        {
            var elapsedMs = (long)Stopwatch.GetElapsedTime(startTimestamp).TotalMilliseconds;
            var logEntry = _logFormatter.Format(context, new LogFormatterAdditionalInfo(elapsedMs));

            // TryWrite on an unbounded channel only fails if the writer has been completed.
            _logChannel.Writer.TryWrite(logEntry);
        }
    }

    private static async Task ProcessLogsAsync()
    {
        var batch = new List<byte[]>(BatchSize);

        await foreach (var logEntry in _logChannel.Reader.ReadAllAsync())
        {
            batch.Add(logEntry);

            // The one-second condition is only checked when a new entry arrives.
            if (batch.Count >= BatchSize || (DateTime.UtcNow - _lastWriteTime).TotalSeconds >= 1)
            {
                var fs = LoggingHelper.FileStream;
                if (fs is null)
                {
                    // No log file available yet; keep buffering.
                    continue;
                }

                _lastWriteTime = DateTime.UtcNow;
                await WriteBatchAsync(fs, batch);
                batch.Clear();
            }
        }
    }

    private static async Task WriteBatchAsync(FileStream fs, List<byte[]> batch)
    {
        foreach (var logEntry in batch)
        {
            await fs.WriteAsync(logEntry);
        }

        await fs.FlushAsync();
    }
}
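The consumer above checks the one-second flush condition only when a new entry arrives, so a quiet period can leave a partial batch sitting in memory. A common alternative pattern with `System.Threading.Channels` is to wait until data is available, drain whatever is already queued (up to a batch limit) with `TryRead`, and write that batch immediately. This is a minimal sketch; `BatchDrainer` and `DrainAsync` are hypothetical names, and the `writeBatch` delegate stands in for the file write.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical helper: drains a channel in batches without per-item timestamp checks.
public static class BatchDrainer
{
    public static async Task DrainAsync<T>(ChannelReader<T> reader, int maxBatch, Func<List<T>, Task> writeBatch)
    {
        var batch = new List<T>(maxBatch);

        // WaitToReadAsync returns false once the writer is completed and the channel is empty.
        while (await reader.WaitToReadAsync())
        {
            // Take everything already queued, up to maxBatch, without awaiting again.
            while (batch.Count < maxBatch && reader.TryRead(out var item))
            {
                batch.Add(item);
            }

            if (batch.Count > 0)
            {
                // The list is reused; writeBatch must not keep a reference to it.
                await writeBatch(batch);
                batch.Clear();
            }
        }
    }
}
```

Because the drain step never waits once data is available, the delay between a request finishing and its log line reaching the sink is bounded by one write rather than by the arrival of the next request.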

In our scenario, responses take longer and the number of concurrent (in-flight) requests is higher.

@zhenlei520

zhenlei520 commented Oct 23, 2024

nginx

worker_processes 4;
worker_rlimit_nofile 655350;

events {
  worker_connections 655350;
}


http {
  include mime.types;

  default_type application/octet-stream;

  log_format main '$remote_addr'
  '|$http_x_forwarded_for'
  '|$time_local'
  '|$request_method'
  '|$uri'
  '|$status'
  '|$upstream_status'
  '|$body_bytes_sent'
  '|$request_time'
  '|$upstream_response_time'
  '|$upstream_http_RequestType'
  '|$upstream_http_ClientID'
  '|$upstream_http_SessionID'
  '|$upstream_http_ise'
  '|$upstream_http_sid'
  '|$upstream_http_ist'
  '|$upstream_http_Accept_Encoding'
  '|$upstream_http_errocde'
  '|$upstream_http_biztype';


  sendfile on;
  
  keepalive_timeout 300;

  #gzip  on;

  upstream XXXProjectA-Intranet {
    server xxx.com; # custom internal domain
    keepalive 32;
  }


  server {
    listen 80;
    server_name XXXProjectA.com;
    http2_max_concurrent_streams 256;

    location / {
      client_max_body_size 30m; 
      client_body_buffer_size 128k; 

      proxy_pass http://XXXProjectA-Intranet;
      proxy_set_header XXX-Real-IP $remote_addr;
      proxy_set_header XXX-X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $host;
      proxy_set_header XXX-X-IsSSL "true";

      proxy_connect_timeout 300;
      proxy_send_timeout 300;
      proxy_read_timeout 300;
      proxy_ignore_client_abort on;
      proxy_buffer_size 64k;
      proxy_buffers 4 128k;
      proxy_busy_buffers_size 256k;
      proxy_temp_file_write_size 256k;

      keepalive_requests 1000;
      proxy_http_version 1.1;
      proxy_set_header Connection "";
    }
  }
}

yarp

{
  "Logging": {
    "LogLevel": {
      "Default": "Warning",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "AllowedHosts": "*",
  "Cat": {
    "ListenUrls": [
      "http://*:80"
    ],
    "Routes": [
      {
        "RouteId": "XXXProjectRoute",
        "Match": {
          "Path": "{**catch-all}"
        },
        "ClusterId": "XXXProjectCluster",
        "Transforms": [
          {
            "RequestHeaderOriginalHost": "true"
          },
          {
            "X-Forwarded": "Set",
            "For": "Off"
          },
          {
            "ResponseHeader": "Connection",
            "Append": "close"
          }
        ]
      }
    ],
    "Clusters": [
      {
        "ClusterId": "XXXProjectCluster",
        "ClusterConfigTemplate": {
          "LoadBalancingPolicy": "WeightingRound",
          "HttpRequest": {
            "ActivityTimeout": "00:03:00"
          }
        },
        "DestinationAddressTemplate": "http://{0}:5000",
        "Destinations": [
          {
            "IPAddress": "192.168.1.1",
            "Weight": 100
          },
          {
            "IPAddress": "192.168.1.2",
            "Weight": 100
          }
        ]
      }
    ]
  }
}

nginx version: nginx/1.20.1
Yarp.ReverseProxy version: 2.1.0

@MihaZupan
Member

"ResponseHeader": "Connection",
"Append": "close"

Why are you forcing connections from the client to the proxy to never be reused?
I don't see you doing so with Nginx, which could explain the massive perf differences.

@zhenlei520

zhenlei520 commented Oct 24, 2024

"ResponseHeader": "Connection",
"Append": "close"

Why are you forcing connections from the client to the proxy to never be reused? I don't see you doing so with Nginx, which could explain the massive perf differences.

Thank you very much. It has been too long, and we can't track down why these two settings were added. We need some time to run verification tests.

@zhenlei520

"ResponseHeader": "Connection",
"Append": "close"

Why are you forcing connections from the client to the proxy to never be reused? I don't see you doing so with Nginx, which could explain the massive perf differences.

After adjusting the configuration, the Nginx-to-YARP ratio is about 1:2.5. We removed the Connection: close transform. Our testing also showed that turning off spinning gives lower CPU usage than the default.

@MihaZupan
Member

MihaZupan commented Dec 9, 2024

Were you able to look at other questions from #2427 (comment)?
Particularly how that ratio is determined.

You've opened several issues across repos, what does your configuration look like now? Are you changing thread counts, changing thread pool environment variables, etc.?

What do your request and responses look like (e.g. size, duration)?
Have you tried remeasuring performance on .NET 9.0?

@zhenlei520

Were you able to look at other questions from #2427 (comment)? Particularly how that ratio is determined.

You've opened several issues across repos, what does your configuration look like now? Are you changing thread counts, changing thread pool environment variables, etc.?

What do your request and responses look like (e.g. size, duration)? Have you tried remeasuring performance on .NET 9.0?

In our testing, CPU usage is about 30% lower with spinning turned off than with the default, but this is not universal: it depends on QPS and throughput. For workloads with high throughput and low QPS, the effect is small.

.NET 9.0 performs better. We have upgraded to .NET 9.0 and removed Connection: close, but the ratio is still only 1:2.5.

We also noticed something strange. We deployed one of our business services on both Windows Server and Ubuntu Server, and the .NET service on Windows had about 50% better CPU usage than on Linux, but about 100% worse memory usage. We have not deployed the reverse proxy on Windows, though. Have you compared the reverse proxy's performance on these two operating systems?

In our experience, the reverse proxy is CPU-heavy but uses relatively little memory. If Windows performs better, we can find time to deploy it on Windows Server and see the effect.


6 participants