
v2ray's DS may have performance issues & overall v2ray performance improvements #373

Closed
RPRX opened this issue Oct 31, 2020 · 14 comments
Labels: help wanted, Stale, Welcome PR

Comments

@RPRX
Contributor

RPRX commented Oct 31, 2020

https://github.com/badO1a5A90/v2ray-doc/blob/master/performance_test/DS/20201030.md

I don't know whether other Go programs behave this way, but v2ray always has.


Update, 2020/11/01:

Thanks to a pointer from @xiaokangwang, I found that v2ray does have a read performance optimization for plain TCP, which I hadn't noticed before:

_, isFile := reader.(*os.File)
if !isFile && useReadv {
	if sc, ok := reader.(syscall.Conn); ok {
		rawConn, err := sc.SyscallConn()
		if err != nil {
			newError("failed to get sysconn").Base(err).WriteToLog()
		} else {
			return NewReadVReader(reader, rawConn)
		}
	}
}

Here the reader is asserted to syscall.Conn; with the resulting ReadVReader, the kernel writes directly into memory designated by v2ray, saving one memory copy.
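
For reference, here is a minimal compilable sketch (my own illustration, not v2ray's actual code; it assumes the golang.org/x/sys/unix package) of what such a readv-based read looks like. The kernel scatters incoming bytes directly into buffers the caller allocated, which is exactly the copy being saved:

package readvsketch

import (
	"fmt"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// readvInto fills caller-allocated buffers straight from the socket, so no
// intermediate staging buffer (and no extra copy) is involved.
func readvInto(conn net.Conn, bufs [][]byte) (int, error) {
	sc, ok := conn.(syscall.Conn) // the same assertion as in the snippet above
	if !ok {
		return 0, fmt.Errorf("connection does not expose a raw fd")
	}
	rawConn, err := sc.SyscallConn()
	if err != nil {
		return 0, err
	}
	var n int
	var rerr error
	// rawConn.Read invokes the callback with the raw fd; returning false on
	// EAGAIN tells the runtime poller to wait for readability and retry.
	err = rawConn.Read(func(fd uintptr) bool {
		n, rerr = unix.Readv(int(fd), bufs)
		return rerr != unix.EAGAIN
	})
	if err != nil {
		return n, err
	}
	return n, rerr
}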

However, because of the fix 47660bf, DS had this optimization removed. (Actually, the code could check runtime.GOOS first.)

Note, though, that as soon as you add one more layer, e.g. HTTP obfuscation, WebSocket, TLS, or the PROXY protocol, this optimization no longer kicks in...
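
A quick hypothetical check (illustrative only; example.com and the helper hasRawFD are mine) shows why: wrapper types such as *tls.Conn do not implement syscall.Conn, so the assertion in the snippet above fails and the plain copy path is taken instead:

package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net"
	"syscall"
)

// hasRawFD reports whether the readv fast path could apply to r.
func hasRawFD(r io.Reader) bool {
	_, ok := r.(syscall.Conn)
	return ok
}

func main() {
	tcpConn, err := net.Dial("tcp", "example.com:443")
	if err != nil {
		panic(err)
	}
	defer tcpConn.Close()

	fmt.Println(hasRawFD(tcpConn)) // true: *net.TCPConn exposes its fd

	// Wrapping the connection hides the fd: *tls.Conn does not implement
	// syscall.Conn, so the readv optimization is skipped.
	tlsConn := tls.Client(tcpConn, &tls.Config{ServerName: "example.com"})
	fmt.Println(hasRawFD(tlsConn)) // false
}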

So I will adjust XTLS accordingly to achieve even better performance.

@RPRX added the help wanted label Oct 31, 2020
@RPRX
Contributor Author

RPRX commented Oct 31, 2020

Also, v2ray's overall performance... we need to figure out where the problem lies and optimize CPU and memory usage (I have some guesses; I'll add them later).

@lucifer9
Member

It seems other Go programs don't behave this way...
https://github.com/lucifer9/goben
You can run this to check.
In my tests on an i3-8100, DS is roughly twice as fast as TCP over 127.0.0.1.
Whether the socket is abstract or not makes essentially no difference.

@RPRX
Contributor Author

RPRX commented Oct 31, 2020

@lucifer9

It might be due to this fix 47660bf.

Try rolling it back and benchmarking on Linux.

@ghost

ghost commented Oct 31, 2020

Following the tests at https://github.com/badO1a5A90/v2ray-doc/blob/master/performance_test/DS/20201030.md, I found that domain sockets have a huge impact on performance. I have run many similar tests myself and confirmed this.
I then experimented further, mainly comparing how nginx and v2ray handle domain sockets. Findings:
1. Whether v2ray receives on a domain socket (listen) or outputs to one (fallbacks), performance is worse than TCP: output is less than a quarter of TCP, and receive is about half.
2. When nginx outputs to a domain socket (proxy_pass), performance is basically on par with TCP. When nginx receives on a domain socket (listen), the results are rather strange, so I won't comment on them here.

Here is my experiment:

nginx config:

server {
    listen 81;
    listen unix:/dev/shm/test.sock;
    root /dev/shm/nginx;
}
server {
    listen 82;
    location / {
        proxy_pass http://127.0.0.1:81;
    }
}
server {
    listen 83;
    location / {
        proxy_pass http://unix:/dev/shm/test.sock;
    }
}
server {
    listen 86;
    location / {
        proxy_pass http://127.0.0.1:998;
    }
}
server {
    listen 998;
    location / {
        proxy_pass http://127.0.0.1:81;
    }
}
server {
    listen 87;
    location / {
        proxy_pass http://unix:/dev/shm/test2.sock;
    }
}
server {
    listen unix:/dev/shm/test2.sock;
    location / {
        proxy_pass http://127.0.0.1:81;
    }
}
server {
    listen 88;
    location / {
        proxy_pass http://unix:/dev/shm/test3.sock;
    }
}
server {
    listen unix:/dev/shm/test3.sock;
    location / {
        proxy_pass http://unix:/dev/shm/test.sock;
    }
}
server {
    listen 89;
    location / {
        proxy_pass http://127.0.0.1:999;
    }
}
server {
    listen 90;
    location / {
        proxy_pass http://unix:/dev/shm/test4.sock;
    }
}
server {
    listen 91;
    location / {
        proxy_pass http://unix:/dev/shm/test5.sock;
    }
}

v2ray config:

{
    "log": {
        "loglevel": "none"
    },
    "inbounds": [
        {
            "port": 84,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": 81,
                        "xver": 0
                    }
                ]
            }
        },
        {
            "port": 85,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": "/dev/shm/test.sock",
                        "xver": 0
                    }
                ]
            }
        },
        {
            "port": 999,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": 81,
                        "xver": 0
                    }
                ]
            }
        },
        {
            "listen": "/dev/shm/test4.sock",
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": 81,
                        "xver": 0
                    }
                ]
            }
        },
        {
            "listen": "/dev/shm/test5.sock",
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": "/dev/shm/test.sock",
                        "xver": 0
                    }
                ]
            }
        }
    ]
}

/dev/shm/nginx contains a 300 MB file named file (because I ran out of memory; my machine only has 1 GB of RAM), leaving about 200 MB free in /dev/shm.

Run each of the following:

# test direct TCP speed
wget 127.0.0.1:81/file -O /dev/null

# compare nginx and v2ray domain socket output speed
	# nginx tcp-> nginx
wget 127.0.0.1:82/file -O /dev/null
	# nginx ds-> nginx
wget 127.0.0.1:83/file -O /dev/null
	# v2ray tcp-> nginx
wget 127.0.0.1:84/file -O /dev/null
	# v2ray ds-> nginx
wget 127.0.0.1:85/file -O /dev/null

# compare nginx and v2ray domain socket receive speed
	# nginx tcp-> nginx tcp-> nginx
wget 127.0.0.1:86/file -O /dev/null
	# nginx ds-> nginx tcp-> nginx
wget 127.0.0.1:87/file -O /dev/null
	# nginx ds-> nginx ds-> nginx
wget 127.0.0.1:88/file -O /dev/null
	# nginx tcp-> v2ray tcp-> nginx
wget 127.0.0.1:89/file -O /dev/null
	# nginx ds-> v2ray tcp-> nginx
wget 127.0.0.1:90/file -O /dev/null
	# nginx ds-> v2ray ds-> nginx
wget 127.0.0.1:91/file -O /dev/null

Each command was run several times, using wget's built-in speed display to measure, like this:

# wget 127.0.0.1:81/file -O /dev/null
--2020-10-31 20:05:39--  http://127.0.0.1:81/file
Connecting to 127.0.0.1:81... connected.
HTTP request sent, awaiting response... 200 OK
Length: 314572800 (300M) [application/octet-stream]
Saving to: ‘/dev/null’

/dev/null                            100%[===================================================================>] 300.00M  1.05GB/s    in 0.3s    

2020-10-31 20:05:39 (1.05 GB/s) - ‘/dev/null’ saved [314572800/314572800]

Test results:
Command 1 averaged 1.02 GB/s.

Command 2: 510 MB/s, exactly half of command 1.
Command 3: almost identical to command 2.
Command 4: 480 MB/s.
Command 5: 110 MB/s, not even a quarter of the TCP fallback speed.

Command 6: 370 MB/s.
Command 7: 120 MB/s. This one is odd: it starts very slowly, under 20 MB/s, and only speeds up later; every run behaves this way.
Command 8: 230 MB/s.
Command 9: 300 MB/s.
Command 10: 170 MB/s.
Command 11: 140 MB/s. Strangely, this command is actually faster than command 5, which makes no sense.

Other information:
1. The loopback limit measured with iperf on my machine is 20 Gbps.
2. OS: Ubuntu 21.04
3. Kernel: 5.10.0-051000rc1-generic
4. nginx version: 1.19.4
5. v2ray version: 4.32.0

@lucifer9
Member

@lucifer9

It might be due to this fix 47660bf.

Try rolling it back and benchmarking on Linux.

No real change. The v2ray test config is below; it mainly tests the performance difference between TCP fallback and DS fallback.

{
    "log": {
        "loglevel": "none"
    },
    "inbounds": [
        {
            "port": 10004,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": 10001,
                        "xver": 0
                    }
                ]
            },
            "streamSettings": {
                "network": "tcp"
            }
        },
        {
            "port": 10005,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": "/tmp/test.sock",
                        "xver": 0
                    }
                ]
            }
        }
    ]
}

iperf3 listening on port 10000:

iperf3 -s -p 10000

Forward TCP and the unix domain socket with socat:

socat -b 81920000 TCP-LISTEN:10001,reuseaddr,fork TCP:127.0.0.1:10000
socat -b 81920000 UNIX-LISTEN:/tmp/test.sock,reuseaddr,fork TCP:127.0.0.1:10000

TCP fallback (port 10004):

iperf3 -c 127.0.0.1 -p 10004
Connecting to host 127.0.0.1, port 10004
[  5] local 127.0.0.1 port 6178 connected to 127.0.0.1 port 10004
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.07 GBytes  9.21 Gbits/sec    2    767 KBytes
[  5]   1.00-2.00   sec  1.13 GBytes  9.74 Gbits/sec    1    895 KBytes
[  5]   2.00-3.00   sec  1.10 GBytes  9.49 Gbits/sec    1   1.12 MBytes
[  5]   3.00-4.00   sec  1.35 GBytes  11.6 Gbits/sec    0    767 KBytes
[  5]   4.00-5.00   sec  1.04 GBytes  8.92 Gbits/sec    0    767 KBytes
[  5]   5.00-6.00   sec   856 MBytes  7.19 Gbits/sec    0    767 KBytes
[  5]   6.00-7.00   sec  1.20 GBytes  10.3 Gbits/sec    0    767 KBytes
[  5]   7.00-8.00   sec  1.08 GBytes  9.24 Gbits/sec    1    767 KBytes
[  5]   8.00-9.00   sec   891 MBytes  7.48 Gbits/sec    1    895 KBytes
[  5]   9.00-10.00  sec  1.12 GBytes  9.60 Gbits/sec    1    767 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec    7             sender
[  5]   0.00-10.00  sec  10.8 GBytes  9.27 Gbits/sec                  receiver

iperf Done.

DS fallback (port 10005):

iperf3 -c 127.0.0.1 -p 10005
Connecting to host 127.0.0.1, port 10005
[  5] local 127.0.0.1 port 39040 connected to 127.0.0.1 port 10005
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  63.8 MBytes   535 Mbits/sec    1   1023 KBytes
[  5]   1.00-2.00   sec  47.1 MBytes   395 Mbits/sec    5   1023 KBytes
[  5]   2.00-3.00   sec  43.0 MBytes   361 Mbits/sec    7   1023 KBytes
[  5]   3.00-4.00   sec  42.0 MBytes   352 Mbits/sec    6   1023 KBytes
[  5]   4.00-5.00   sec  50.7 MBytes   426 Mbits/sec    8   1.12 MBytes
[  5]   5.00-6.00   sec  56.4 MBytes   473 Mbits/sec    4    895 KBytes
[  5]   6.00-7.00   sec  45.2 MBytes   379 Mbits/sec    5   1023 KBytes
[  5]   7.00-8.00   sec  43.0 MBytes   361 Mbits/sec    4   1023 KBytes
[  5]   8.00-9.00   sec  47.9 MBytes   402 Mbits/sec    4   1023 KBytes
[  5]   9.00-10.00  sec  43.2 MBytes   362 Mbits/sec    4   1023 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   482 MBytes   405 Mbits/sec   48             sender
[  5]   0.00-10.00  sec   475 MBytes   398 Mbits/sec                  receiver

iperf Done.

The above was tested after the rollback.
Below is a test from before the rollback (on master):

Connecting to host 127.0.0.1, port 10005
[  5] local 127.0.0.1 port 65104 connected to 127.0.0.1 port 10005
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  59.1 MBytes   495 Mbits/sec    4   1023 KBytes       
[  5]   1.00-2.00   sec  43.5 MBytes   365 Mbits/sec    3   1023 KBytes       
[  5]   2.00-3.00   sec  45.3 MBytes   380 Mbits/sec    4   1023 KBytes       
[  5]   3.00-4.00   sec  43.9 MBytes   368 Mbits/sec   12    895 KBytes       
[  5]   4.00-5.00   sec  44.6 MBytes   374 Mbits/sec    5    895 KBytes       
[  5]   5.00-6.00   sec  53.6 MBytes   450 Mbits/sec    4    895 KBytes       
[  5]   6.00-7.00   sec  46.8 MBytes   392 Mbits/sec    3    895 KBytes       
[  5]   7.00-8.00   sec  48.8 MBytes   409 Mbits/sec    4    895 KBytes       
[  5]   8.00-9.00   sec  45.4 MBytes   380 Mbits/sec    4    895 KBytes       
[  5]   9.00-10.00  sec  46.8 MBytes   393 Mbits/sec    5    895 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   478 MBytes   401 Mbits/sec   48             sender
[  5]   0.00-10.00  sec   469 MBytes   394 Mbits/sec                  receiver

iperf Done.

The results are very close.

@lucifer9
Member

Tested on macOS 10.15.7 with the same config:
TCP fallback is 12.4 Gb/s, DS fallback is 3.66 Gb/s.
The gap doesn't look as large as on Linux.

@RPRX
Contributor Author

RPRX commented Oct 31, 2020

@xiaokangwang says TCP has a special optimization, and DS originally had it too (perhaps the fallback DS path is outside that mechanism?)

@lucifer9
Member

lucifer9 commented Oct 31, 2020

@xiaokangwang says TCP has a special optimization, and DS originally had it too (perhaps the fallback DS path is outside that mechanism?)

How is it optimized? I tested with the following config:

{
    "log": {
        "loglevel": "none"
    },
    "inbounds": [
        {
            "port": 10000,
            "listen": "127.0.0.1",
            "protocol": "dokodemo-door",
            "settings": {
                "address": "127.0.0.1",
                "port": 10001,
                "network": "tcp"
            }
        }
    ],
    "outbounds": [
        {
            "protocol": "freedom",
            "settings": {}
        }
    ]
}

The result is even slower than the fallback:

iperf3 -c 127.0.0.1 -p 10000
Connecting to host 127.0.0.1, port 10000
[  5] local 127.0.0.1 port 43650 connected to 127.0.0.1 port 10000
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   551 MBytes  4.62 Gbits/sec   10    895 KBytes       
[  5]   1.00-2.00   sec   343 MBytes  2.88 Gbits/sec   11    895 KBytes       
[  5]   2.00-3.00   sec   367 MBytes  3.08 Gbits/sec    5    895 KBytes       
[  5]   3.00-4.00   sec   330 MBytes  2.77 Gbits/sec   11   1023 KBytes       
[  5]   4.00-5.00   sec   331 MBytes  2.77 Gbits/sec    9    895 KBytes       
[  5]   5.00-6.00   sec   451 MBytes  3.78 Gbits/sec    3   1.12 MBytes       
[  5]   6.00-7.00   sec   285 MBytes  2.39 Gbits/sec   10   1023 KBytes       
[  5]   7.00-8.00   sec   403 MBytes  3.38 Gbits/sec   15   1023 KBytes       
[  5]   8.00-9.00   sec   372 MBytes  3.12 Gbits/sec   17    767 KBytes       
[  5]   9.00-10.00  sec   467 MBytes  3.92 Gbits/sec   11    895 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.81 GBytes  3.27 Gbits/sec  102             sender
[  5]   0.00-10.00  sec  3.80 GBytes  3.27 Gbits/sec                  receiver

iperf Done.

Direct connection speed:

iperf3 -c 127.0.0.1 -p 10001
Connecting to host 127.0.0.1, port 10001
[  5] local 127.0.0.1 port 10144 connected to 127.0.0.1 port 10001
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.33 GBytes  37.2 Gbits/sec    0    767 KBytes       
[  5]   1.00-2.00   sec  4.44 GBytes  38.1 Gbits/sec    0    767 KBytes       
[  5]   2.00-3.00   sec  4.41 GBytes  37.8 Gbits/sec    0    895 KBytes       
[  5]   3.00-4.00   sec  4.46 GBytes  38.3 Gbits/sec    0    895 KBytes       
[  5]   4.00-5.00   sec  4.30 GBytes  36.9 Gbits/sec    0    895 KBytes       
[  5]   5.00-6.00   sec  4.21 GBytes  36.2 Gbits/sec    0    767 KBytes       
[  5]   6.00-7.00   sec  4.50 GBytes  38.6 Gbits/sec    0    767 KBytes       
[  5]   7.00-8.00   sec  4.67 GBytes  40.1 Gbits/sec    0    767 KBytes       
[  5]   8.00-9.00   sec  4.26 GBytes  36.6 Gbits/sec    0    767 KBytes       
[  5]   9.00-10.00  sec  4.40 GBytes  37.8 Gbits/sec    0    767 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  44.0 GBytes  37.8 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  44.0 GBytes  37.8 Gbits/sec                  receiver

iperf Done.

With the same config on macOS, the direct connection is about twice as fast as going through v2ray. Not as extreme as on Linux.

@RPRX
Contributor Author

RPRX commented Nov 2, 2020

@lucifer9 The TG group reports that in v4.32.0 the source ports in inbound logs are all 0; presumably a problem introduced by the DS changes.

@lucifer9
Member

lucifer9 commented Nov 2, 2020

@lucifer9 The TG group reports that in v4.32.0 the source ports in inbound logs are all 0; presumably a problem introduced by the DS changes.

If ws or h2 is in use, this behavior is expected.
The old code used the X-Forwarded-For IP as the source IP in the log, but the actual remote_address port as the source port. That is actually wrong: if a problem ever came up, you couldn't take that IP:port pair to the upstream proxy's access log and find the corresponding record.
Requiring the real port would need support from the upstream proxy, which is a hard requirement to satisfy. Most proxies can be configured for it, and there are standards for it, but it is unfriendly to users.
Now we only record X-Forwarded-For and ignore the port, so users won't be misled when looking up records on the upstream proxy. Of course, if someone really insists on seeing ip:port, we could also generate a random 5-digit port 😃

@badO1a5A90
Contributor

Note, though, that as soon as you add one more layer, e.g. HTTP obfuscation, WebSocket, TLS, or the PROXY protocol, this optimization no longer kicks in...

So does that mean that, for example with TLS, DS should outperform TCP? But some past tests didn't seem to show that. DS has always been slower than TCP.

@RPRX
Contributor Author

RPRX commented Nov 5, 2020

The VLESS XTLS Direct Mode ReadV Experiment was very successful: saving one more memory copy nearly doubled performance again. If the fallback DS path had ReadV, it should improve the performance of reading returned data. But with variables controlled, the comparison against TCP is still strange. Could it be related to how v2 internally handles data forwarding?

Also, fallback bypasses routing and the pipe mechanism, so its data forwarding path is very simple, which is why it can even outperform Nginx at high throughput. That pipe mechanism takes a big lock on both read and write, and may be the core of v2's current performance problems. The long-term stability of fallback also proves that the pipe mechanism is not strictly necessary and can be bypassed, i.e. the reader and writer can be wired together directly.

https://github.com/v2fly/v2ray-core/blob/master/transport/pipe/impl.go
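
As a rough sketch of the contrast described above (my own illustration, not v2ray's actual pipe code), compare a pipe whose Read and Write both contend on a single mutex and stage an extra copy against wiring the reader directly to the writer:

package pipesketch

import (
	"io"
	"sync"
)

// lockedPipe queues buffers behind one mutex; every Read and Write contends
// on mu, and data is copied in on Write and out again on Read.
type lockedPipe struct {
	mu   sync.Mutex
	cond *sync.Cond
	data [][]byte
}

func newLockedPipe() *lockedPipe {
	p := &lockedPipe{}
	p.cond = sync.NewCond(&p.mu)
	return p
}

func (p *lockedPipe) Write(b []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	buf := make([]byte, len(b)) // staging copy into the queue
	copy(buf, b)
	p.data = append(p.data, buf)
	p.cond.Signal()
	return len(b), nil
}

func (p *lockedPipe) Read(b []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.data) == 0 {
		p.cond.Wait()
	}
	n := copy(b, p.data[0]) // copy back out of the queue
	p.data[0] = p.data[0][n:]
	if len(p.data[0]) == 0 {
		p.data = p.data[1:]
	}
	return n, nil
}

// direct bypasses the queue entirely: buffers flow from src to dst with no
// shared lock and no staging copy, which is what "connecting the reader and
// writer directly" means here.
func direct(dst io.Writer, src io.Reader) error {
	_, err := io.Copy(dst, src)
	return err
}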

@RPRX changed the title from "v2ray's DS may have performance issues" to "v2ray's DS may have performance issues & v2ray performance improvements" Nov 5, 2020
@RPRX changed the title to "v2ray's DS may have performance issues & overall v2ray performance improvements" Nov 5, 2020
@RPRX
Contributor Author

RPRX commented Nov 5, 2020

A note for the record: the XTLS Direct Mode ReadV Experiment also proved that ReadV works on arm and mips, not just on desktop platforms. Once Friday's new release brings in more samples, if no problems surface, v2's ReadV will be opened up to all platforms, which should substantially improve the performance of bare VMess and Shadowsocks.

@github-actions
Contributor

github-actions bot commented Mar 6, 2021

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days
