
After updating the code, re-running finetune.sh fails: TypeError: init_process_group() got multiple values for keyword argument 'backend' #112

Open
alisyzhu opened this issue Apr 26, 2023 · 7 comments


@alisyzhu

After pulling the latest git code yesterday, running finetune.sh again makes torchrun error out.
[Initial environment]
A100 * 1
accelerate 0.18.0
bitsandbytes 0.37.2
transformers 4.29.0.dev0
[Environment change v1]
Ran pip install transformers==4.28.1
Result: still fails
[Environment change v2]
Ran pip install git+https://github.com/huggingface/transformers@ff20f9cf3615a8638023bc82925573cb9d0f3560
Result: still fails, with the error below:
[Three screenshots of the traceback, ending in: TypeError: init_process_group() got multiple values for keyword argument 'backend']

@Facico
Owner

Facico commented Apr 26, 2023

finetune.py hasn't been changed in a month; this is an old problem. On a single GPU, skip torchrun and run the script directly with python.
(When you hit an error, try searching for it in our repo first.)
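For context, this class of TypeError arises when `backend` reaches `torch.distributed.init_process_group` both positionally and as a keyword (for example, a launcher-driven code path supplies one and user code supplies the other). A minimal stand-alone sketch of the mechanism, using a hypothetical stub rather than the real torch API:

```python
# Hypothetical stub mimicking the signature shape of
# torch.distributed.init_process_group(backend, ...).
def init_process_group(backend=None, **kwargs):
    return backend

args = ("nccl",)              # positional value, e.g. from a wrapper
kwargs = {"backend": "gloo"}  # keyword value, e.g. from user code

try:
    init_process_group(*args, **kwargs)
except TypeError as e:
    # Python itself raises "got multiple values for argument 'backend'"
    print("TypeError:", e)
```

Running under plain python on a single GPU sidesteps the distributed code path entirely, so the duplicated argument never occurs.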

@dizhenx

dizhenx commented Apr 27, 2023

finetune.py hasn't been changed in a month; this is an old problem. On a single GPU, skip torchrun and run the script directly with python. (When you hit an error, try searching for it in our repo first.)

I'm now getting the same error running bash finetune_others_continue.sh; it occurs at line 237 of finetune.py.

@Facico
Owner

Facico commented May 4, 2023

@dizhenx This has nothing to do with which script you use. Use torchrun for multi-GPU (our scripts default to multi-GPU); on a single GPU don't use it, just run with python directly.
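That advice can be summarized in a small helper (the function name and structure here are my own illustration, not from the repo): pick the launcher based on GPU count, since torchrun's distributed setup is what triggers the conflict on a single card.

```python
# Hypothetical helper (not from the repo): choose a launch command
# based on how many GPUs the run will use.
def launch_command(script: str, n_gpus: int) -> str:
    if n_gpus > 1:
        # Multi-GPU: the repo's scripts default to torchrun.
        return f"torchrun --nproc_per_node={n_gpus} {script}"
    # Single GPU: plain python avoids torch.distributed entirely.
    return f"python {script}"

print(launch_command("finetune.py", 1))  # → python finetune.py
print(launch_command("finetune.py", 4))  # → torchrun --nproc_per_node=4 finetune.py
```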

@wangrui6
Copy link

wangrui6 commented May 19, 2023

@Facico Do you have any experience with good training parameter combinations for an A100?
```python
# optimized for RTX 4090. for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
```
Especially the first few.
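As a sanity check on how those first few values interact (my own arithmetic, not from the thread): the effective batch size is MICRO_BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS, so the accumulation steps are derived from the target batch size.

```python
MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128  # target effective batch size
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE

print(GRADIENT_ACCUMULATION_STEPS)  # → 32

# On a larger GPU such as an A100 you could raise MICRO_BATCH_SIZE
# (and let the accumulation steps shrink) while keeping the
# effective batch size at 128.
assert MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS == BATCH_SIZE
```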

@wangrui6

@Facico Also, how fast is finetuning? Roughly how long does finetuning vicuna13b on a 4090 take with about 100K samples? Are there any reference numbers?

@benjamin555

vicuna13b

Does this framework support finetuning vicuna13b?

@Facico
Owner

Facico commented Jun 29, 2023

@wangrui6 Generally you only need to adjust CUTOFF_LEN to fit your hardware. I don't remember exactly; it was something like a few hundred thousand samples taking about 200h.
@benjamin555 Any model with a llama base is supported.
