dfget fails to pull from HDFS when falling back to source #939

Closed
wciq1208 opened this issue Dec 16, 2021 · 1 comment · Fixed by #940

Comments

@wciq1208

Bug report:

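When dfget fails to pull a file from HDFS for some other reason,
https://github.com/dragonflyoss/Dragonfly2/blob/v2.0.1/client/dfget/dfget.go#L156 triggers a back-to-source download and writes the stream to the output.
Because io.Copy delegates to the source's WriteTo method when the source implements io.WriterTo, it ends up calling the function at https://github.com/dragonflyoss/Dragonfly2/blob/v2.0.1/pkg/source/hdfsprotocol/hdfs_source_client.go#L286 exactly once.
That function performs only a single write of at most the buffer size and then returns, so the final file is only 512 bytes and the program wrongly treats the download as successful.
I changed the function to: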

func (rc *hdfsFileReaderClose) WriteTo(w io.Writer) (n int64, err error) {
	return io.Copy(w, rc.limited)
}

After this change, the back-to-source download succeeds.

One more note: the download failure that preceded the back-to-source fallback was caused by the following setting:

storage:
  # Disk GC threshold: once cached data exceeds it, the oldest cached data is cleaned up
  diskGCThreshold: 50Gi

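This value was set too small. It may be worth reconsidering whether it is reasonable for GC to remove a task that is still in progress.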

Expected behavior:

How to reproduce it:

Reproducible when downloading a 60G file from HDFS with the GC threshold set to either 10G or 50G.

Environment:

  • Dragonfly version: v2.0.1
  • OS: Linux shylf-magi-01 4.9.0-0.bpo.7-amd64 #1 SMP Debian 4.9.110-3+deb9u2~deb8u1 (2018-08-14) x86_64 GNU/Linux
  • Kernel (e.g. uname -a): 4.9.0
  • Others:
@jim3ma
Member

jim3ma commented Dec 16, 2021

Thanks for the feedback.

Could you open a PR with the code change? We will take care of the review.

@jim3ma jim3ma mentioned this issue Dec 17, 2021
11 tasks