/device ascend (Huawei 910B accelerator card)
Software Environment:
MindSpore version: 2.3.1 and 2.3.0-rc1 (both versions were tried)
Python version: 3.8.20 and 3.9.11 (both versions were tried)
OS platform and distribution: Ubuntu 22.04.5 LTS
GCC/Compiler version: gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0
Hardware Environment: Ascend
Describe the current behavior
The error output is shown below:
(md_py3911) root@6163cbabedd4:/home/data/jupyter/ai_project/scripts/workcloth# python RunTrainCommand.py
/home/data/jupyter/ai_project/scripts/workcloth
2024-11-28 09:43:46,562 [INFO] parse_args:
2024-11-28 09:43:46,562 [INFO] task detect
2024-11-28 09:43:46,562 [INFO] device_target Ascend
2024-11-28 09:43:46,562 [INFO] save_dir ./runs/2024.11.28-09.43.46
2024-11-28 09:43:46,562 [INFO] log_level INFO
2024-11-28 09:43:46,562 [INFO] is_parallel False
2024-11-28 09:43:46,562 [INFO] ms_mode 0
2024-11-28 09:43:46,562 [INFO] ms_amp_level O0
2024-11-28 09:43:46,562 [INFO] keep_loss_fp32 True
2024-11-28 09:43:46,562 [INFO] anchor_base True
2024-11-28 09:43:46,562 [INFO] ms_loss_scaler static
2024-11-28 09:43:46,562 [INFO] ms_loss_scaler_value 1024.0
2024-11-28 09:43:46,562 [INFO] ms_jit True
2024-11-28 09:43:46,562 [INFO] ms_enable_graph_kernel False
2024-11-28 09:43:46,562 [INFO] ms_datasink False
2024-11-28 09:43:46,562 [INFO] overflow_still_update True
2024-11-28 09:43:46,562 [INFO] clip_grad False
2024-11-28 09:43:46,562 [INFO] clip_grad_value 10.0
2024-11-28 09:43:46,562 [INFO] ema True
2024-11-28 09:43:46,562 [INFO] weight ../../models/yolov7-tiny_300e.ckpt
2024-11-28 09:43:46,562 [INFO] ema_weight
2024-11-28 09:43:46,562 [INFO] freeze []
2024-11-28 09:43:46,562 [INFO] epochs 2
2024-11-28 09:43:46,562 [INFO] per_batch_size 8
2024-11-28 09:43:46,562 [INFO] img_size 640
2024-11-28 09:43:46,562 [INFO] nbs 64
2024-11-28 09:43:46,562 [INFO] accumulate 1
2024-11-28 09:43:46,562 [INFO] auto_accumulate False
2024-11-28 09:43:46,562 [INFO] log_interval 10
2024-11-28 09:43:46,562 [INFO] single_cls False
2024-11-28 09:43:46,562 [INFO] sync_bn False
2024-11-28 09:43:46,562 [INFO] keep_checkpoint_max 100
2024-11-28 09:43:46,562 [INFO] run_eval False
2024-11-28 09:43:46,562 [INFO] conf_thres 0.001
2024-11-28 09:43:46,562 [INFO] iou_thres 0.65
2024-11-28 09:43:46,562 [INFO] conf_free False
2024-11-28 09:43:46,562 [INFO] rect False
2024-11-28 09:43:46,562 [INFO] nms_time_limit 20.0
2024-11-28 09:43:46,562 [INFO] recompute False
2024-11-28 09:43:46,562 [INFO] recompute_layers 0
2024-11-28 09:43:46,562 [INFO] seed 2
2024-11-28 09:43:46,562 [INFO] summary True
2024-11-28 09:43:46,562 [INFO] profiler False
2024-11-28 09:43:46,562 [INFO] profiler_step_num 1
2024-11-28 09:43:46,562 [INFO] opencv_threads_num 2
2024-11-28 09:43:46,562 [INFO] strict_load False
2024-11-28 09:43:46,562 [INFO] enable_modelarts False
2024-11-28 09:43:46,562 [INFO] data_url
2024-11-28 09:43:46,562 [INFO] ckpt_url
2024-11-28 09:43:46,562 [INFO] multi_data_url
2024-11-28 09:43:46,562 [INFO] pretrain_url
2024-11-28 09:43:46,562 [INFO] train_url
2024-11-28 09:43:46,562 [INFO] data_dir /cache/data/
2024-11-28 09:43:46,562 [INFO] ckpt_dir /cache/pretrain_ckpt/
2024-11-28 09:43:46,562 [INFO] data.dataset_name WorkCloth
2024-11-28 09:43:46,562 [INFO] data.train_set ../../dataset/WorkCloth/train.txt
2024-11-28 09:43:46,562 [INFO] data.val_set ../../dataset/WorkCloth/val.txt
2024-11-28 09:43:46,562 [INFO] data.test_set ../../dataset/WorkCloth/test.txt
2024-11-28 09:43:46,562 [INFO] data.nc 2
2024-11-28 09:43:46,562 [INFO] data.names ['work_clothes', '']
2024-11-28 09:43:46,562 [INFO] data.num_parallel_workers 4
2024-11-28 09:43:46,562 [INFO] data.train_transforms [{'func_name': 'mosaic', 'prob': 1.0, 'mosaic9_prob': 0.2}, {'func_name': 'resample_segments'}, {'func_name': 'random_perspective', 'prob': 1.0, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0}, {'func_name': 'mixup', 'alpha': 8.0, 'beta': 8.0, 'prob': 0.05, 'pre_transform': [{'func_name': 'mosaic', 'prob': 1.0, 'mosaic9_prob': 0.2}, {'func_name': 'resample_segments'}, {'func_name': 'random_perspective', 'prob': 1.0, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0}]}, {'func_name': 'hsv_augment', 'prob': 1.0, 'hgain': 0.015, 'sgain': 0.7, 'vgain': 0.4}, {'func_name': 'pastein', 'prob': 0.05, 'num_sample': 30}, {'func_name': 'fliplr', 'prob': 0.5}, {'func_name': 'label_norm', 'xyxy2xywh_': True}, {'func_name': 'label_pad', 'padding_size': 160, 'padding_value': -1}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}]
2024-11-28 09:43:46,562 [INFO] data.test_transforms [{'func_name': 'letterbox', 'scaleup': False, 'only_image': True}, {'func_name': 'image_norm', 'scale': 255.0}, {'func_name': 'image_transpose', 'bgr2rgb': True, 'hwc2chw': True}]
2024-11-28 09:43:46,562 [INFO] optimizer.lr_init 0.01
2024-11-28 09:43:46,562 [INFO] optimizer.optimizer momentum
2024-11-28 09:43:46,562 [INFO] optimizer.momentum 0.937
2024-11-28 09:43:46,562 [INFO] optimizer.nesterov True
2024-11-28 09:43:46,562 [INFO] optimizer.loss_scale 1.0
2024-11-28 09:43:46,562 [INFO] optimizer.warmup_epochs 3
2024-11-28 09:43:46,562 [INFO] optimizer.warmup_momentum 0.8
2024-11-28 09:43:46,562 [INFO] optimizer.warmup_bias_lr 0.1
2024-11-28 09:43:46,562 [INFO] optimizer.min_warmup_step 1000
2024-11-28 09:43:46,562 [INFO] optimizer.group_param yolov7
2024-11-28 09:43:46,562 [INFO] optimizer.gp_weight_decay 0.0005
2024-11-28 09:43:46,562 [INFO] optimizer.start_factor 1.0
2024-11-28 09:43:46,562 [INFO] optimizer.end_factor 0.01
2024-11-28 09:43:46,562 [INFO] optimizer.epochs 2
2024-11-28 09:43:46,562 [INFO] optimizer.nbs 64
2024-11-28 09:43:46,562 [INFO] optimizer.accumulate 1
2024-11-28 09:43:46,562 [INFO] optimizer.total_batch_size 8
2024-11-28 09:43:46,562 [INFO] loss.name YOLOv7Loss
2024-11-28 09:43:46,562 [INFO] loss.box 0.05
2024-11-28 09:43:46,562 [INFO] loss.cls 0.5
2024-11-28 09:43:46,562 [INFO] loss.cls_pw 1.0
2024-11-28 09:43:46,562 [INFO] loss.obj 1.0
2024-11-28 09:43:46,562 [INFO] loss.obj_pw 1.0
2024-11-28 09:43:46,562 [INFO] loss.fl_gamma 0.0
2024-11-28 09:43:46,562 [INFO] loss.anchor_t 4.0
2024-11-28 09:43:46,562 [INFO] loss.label_smoothing 0.0
2024-11-28 09:43:46,562 [INFO] network.model_name yolov7
2024-11-28 09:43:46,562 [INFO] network.depth_multiple 1.0
2024-11-28 09:43:46,562 [INFO] network.width_multiple 1.0
2024-11-28 09:43:46,562 [INFO] network.stride [8, 16, 32]
2024-11-28 09:43:46,562 [INFO] network.anchors [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]]
2024-11-28 09:43:46,562 [INFO] network.backbone [[-1, 1, 'ConvNormAct', [32, 3, 2, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [64, 3, 2, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [32, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [32, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [32, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [32, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'MP', []], [-1, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [64, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [64, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'MP', []], [-1, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [128, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [128, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [256, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'MP', []], [-1, 1, 'ConvNormAct', [256, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [256, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [256, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [256, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [512, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']]]
2024-11-28 09:43:46,562 [INFO] network.head [[-1, 1, 'ConvNormAct', [256, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [256, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'SP', [5]], [-2, 1, 'SP', [9]], [-3, 1, 'SP', [13]], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [256, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -7], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [256, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'Upsample', ['None', 2, 'nearest']], [21, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [64, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [64, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'Upsample', ['None', 2, 'nearest']], [14, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [32, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [32, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [32, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [32, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [128, 3, 2, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, 47], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [64, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [64, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], 
[-1, 1, 'ConvNormAct', [64, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [256, 3, 2, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, 37], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-2, 1, 'ConvNormAct', [128, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [128, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [-1, 1, 'ConvNormAct', [128, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[-1, -2, -3, -4], 1, 'Concat', [1]], [-1, 1, 'ConvNormAct', [256, 1, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [57, 1, 'ConvNormAct', [128, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [65, 1, 'ConvNormAct', [256, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [73, 1, 'ConvNormAct', [512, 3, 1, 'None', 1, 1, 'nn.LeakyReLU(0.1)']], [[74, 75, 76], 1, 'YOLOv7Head', ['nc', 'anchors', 'stride']]]
2024-11-28 09:43:46,562 [INFO] config ../../configs/yolov7/yolov7-tiny_xyz_workcloth.yaml
2024-11-28 09:43:46,562 [INFO] rank 0
2024-11-28 09:43:46,562 [INFO] rank_size 1
2024-11-28 09:43:46,562 [INFO] total_batch_size 8
2024-11-28 09:43:46,562 [INFO] callback []
2024-11-28 09:43:46,562 [INFO]
2024-11-28 09:43:46,564 [INFO] Please check the above information for the configurations
2024-11-28 09:43:54,359 [WARNING] Parse Model, args: nearest, keep str type
2024-11-28 09:43:54,468 [WARNING] Parse Model, args: nearest, keep str type
2024-11-28 09:43:54,882 [INFO] number of network params, total: 6.032533M, trainable: 6.017694M
2024-11-28 09:43:56,393 [WARNING] Parse Model, args: nearest, keep str type
2024-11-28 09:43:56,495 [WARNING] Parse Model, args: nearest, keep str type
2024-11-28 09:43:56,964 [INFO] number of network params, total: 6.032533M, trainable: 6.017694M
2024-11-28 09:43:57,925 [WARNING] Dropping checkpoint parameter
model.model.77.m.0.weight
with shape(255, 128, 1, 1)
, which is inconsistent with cell shape(21, 128, 1, 1)
2024-11-28 09:43:57,925 [WARNING] Dropping checkpoint parameter
model.model.77.m.0.bias
with shape(255,)
, which is inconsistent with cell shape(21,)
2024-11-28 09:43:57,925 [WARNING] Dropping checkpoint parameter
model.model.77.m.1.weight
with shape(255, 256, 1, 1)
, which is inconsistent with cell shape(21, 256, 1, 1)
2024-11-28 09:43:57,926 [WARNING] Dropping checkpoint parameter
model.model.77.m.1.bias
with shape(255,)
, which is inconsistent with cell shape(21,)
2024-11-28 09:43:57,926 [WARNING] Dropping checkpoint parameter
model.model.77.m.2.weight
with shape(255, 512, 1, 1)
, which is inconsistent with cell shape(21, 512, 1, 1)
2024-11-28 09:43:57,926 [WARNING] Dropping checkpoint parameter
model.model.77.m.2.bias
with shape(255,)
, which is inconsistent with cell shape(21,)
2024-11-28 09:43:57,926 [WARNING] Dropping checkpoint parameter
model.model.77.im.0.implicit
with shape(1, 255, 1, 1)
, which is inconsistent with cell shape(1, 21, 1, 1)
2024-11-28 09:43:57,926 [WARNING] Dropping checkpoint parameter
model.model.77.im.1.implicit
with shape(1, 255, 1, 1)
, which is inconsistent with cell shape(1, 21, 1, 1)
2024-11-28 09:43:57,926 [WARNING] Dropping checkpoint parameter
model.model.77.im.2.implicit
with shape(1, 255, 1, 1)
, which is inconsistent with cell shape(1, 21, 1, 1)
[WARNING] ME(337942:281469431465216,MainProcess):2024-11-28-09:43:57.945.768 [mindspore/train/serialization.py:1560] For 'load_param_into_net', 9 parameters in the 'net' are not loaded, because they are not in the 'parameter_dict', please check whether the network structure is consistent when training and loading checkpoint.
[WARNING] ME(337942:281469431465216,MainProcess):2024-11-28-09:43:57.945.909 [mindspore/train/serialization.py:1564] ['model.model.77.m.0.weight', 'model.model.77.m.0.bias', 'model.model.77.m.1.weight', 'model.model.77.m.1.bias', 'model.model.77.m.2.weight', 'model.model.77.m.2.bias', 'model.model.77.im.0.implicit', 'model.model.77.im.1.implicit', 'model.model.77.im.2.implicit'] are not loaded.
2024-11-28 09:43:57,946 [INFO] Pretrain model load from "../../models/yolov7-tiny_300e.ckpt" success.
2024-11-28 09:44:10,634 [INFO] ema_weight not exist, default pretrain weight is currently used.
2024-11-28 09:44:10,693 [INFO] Dataset Cache file hash/version check success.
2024-11-28 09:44:10,694 [INFO] Load dataset cache from [../../dataset/WorkCloth/train.cache.npy] success.
Scanning '../../dataset/WorkCloth/train.cache.npy' images and labels... 2395 found, 0 missing, 0 empty, 0 corrupted: 100%|█| 2395/2395 [00:00<?, ?it/s]
2024-11-28 09:44:10,724 [INFO] Dataloader num parallel workers: [4]
2024-11-28 09:44:12,185 [INFO] Registry(name=callback, total=4)
2024-11-28 09:44:12,185 [INFO] (0): YoloxSwitchTrain in mindyolo/utils/callback.py
2024-11-28 09:44:12,185 [INFO] (1): EvalWhileTrain in mindyolo/utils/callback.py
2024-11-28 09:44:12,185 [INFO] (2): SummaryCallback in mindyolo/utils/callback.py
2024-11-28 09:44:12,185 [INFO] (3): ProfilerCallback in mindyolo/utils/callback.py
2024-11-28 09:44:12,185 [INFO]
2024-11-28 09:44:12,299 [INFO] got 1 active callback as follows:
2024-11-28 09:44:12,300 [INFO] SummaryCallback()
2024-11-28 09:44:12,300 [WARNING] The first epoch will be compiled for the graph, which may take a long time; You can come back later :).
Warning: tiling offset out of range, index: 32
Warning: tiling offset out of range, index: 32
Warning: tiling offset out of range, index: 32
Warning: tiling offset out of range, index: 32
Warning: tiling offset out of range, index: 32
[ERROR] DEVICE(337942,fffcf347f1a0,python):2024-11-28-09:53:18.692.462 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:232] TaskExceptionCallback] Run Task failed, task_id: 0, stream_id: 548, tid: 338667, device_id: 0, retcode: 507018 (aicpu exception)
[ERROR] GE_ADPT(337942,fffcf347f1a0,python):2024-11-28-09:53:18.795.720 [mindspore/ccsrc/transform/graph_ir/graph_runner.cc:371] RunGraphWithStreamAsync] Call GE RunGraphWithStreamAsync Failed, ret is: 4294967295
Traceback (most recent call last):
File "../../mindyolo-master/train.py", line 330, in <module>
train(args)
File "../../mindyolo-master/train.py", line 285, in train
trainer.train(
File "/home/data/jupyter/ai_project/mindyolo-master/mindyolo/utils/trainer_factory.py", line 170, in train
run_context.loss, run_context.lr = self.train_step(imgs, labels, segments,
File "/home/data/jupyter/ai_project/mindyolo-master/mindyolo/utils/trainer_factory.py", line 366, in train_step
loss, loss_item, _, grads_finite = self.train_step_fn(imgs, labels, True)
File "/usr/local/python3.8/lib/python3.8/site-packages/mindspore/common/api.py", line 941, in staging_specialize
out = _MindsporeFunctionExecutor(func, hash_obj, dyn_args, process_obj, jit_config)(*args, **kwargs)
File "/usr/local/python3.8/lib/python3.8/site-packages/mindspore/common/api.py", line 185, in wrapper
results = fn(*arg, **kwargs)
File "/usr/local/python3.8/lib/python3.8/site-packages/mindspore/common/api.py", line 572, in __call__
output = self._graph_executor(tuple(new_inputs), phase)
RuntimeError: Exec graph failed
EZ9999: Inner Error!
EZ9999: 2024-11-28-09:53:18.691.987 Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1776][THREAD:338667]
TraceBack (most recent call last):
Aicpu kernel execute failed, device_id=0, stream_id=548, task_id=0, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1579][THREAD:338667]
AICPU Kernel task happen error, retCode=0x2a.[FUNC:GetError][FILE:stream.cc][LINE:1512][THREAD:338667]
Aicpu kernel execute failed, device_id=0, stream_id=548, task_id=0, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1512][THREAD:338667]
rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53][THREAD:338667]
Call rtStreamSynchronize(stream) fail, ret: 0x7BC8A[FUNC:LaunchKernelCustAicpuSo][FILE:model_manager.cc][LINE:1698][THREAD:338667]
GraphManager RunGrapWithStreamhAsync failed,session id = 0, graph id = 2, stream = 0xaaad7b705a70.[FUNC:RunGraphWithStreamAsync][FILE:inner_session.cc][LINE:513][THREAD:338667]
[Run][Graph]Run graph with stream asyn failed, error code = 507018, session id = 0,graph id = 2, stream = 0xaaad7b705a70.[FUNC:RunGraphWithStreamAsync][FILE:ge_api.cc][LINE:800][THREAD:338667]
(Please search "CANN Common Error Analysis" at https://www.mindspore.cn for error code description)
mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_graph_executor.cc:1332 RunGraphRefMode
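A note on the "Dropping checkpoint parameter" warnings earlier in the log: the 255 vs. 21 shape mismatch is consistent with the usual anchor-based YOLO head layout, where each anchor predicts 4 box values, 1 objectness score, and one score per class. The sketch below only reproduces that arithmetic. It assumes the pretrained checkpoint was trained with 80 classes (the log does not say so), and `head_out_channels` is a hypothetical helper, not a MindYOLO function.

```python
# Sketch: output channels of an anchor-based YOLO detection head.
# Assumption: every anchor predicts 4 box values + 1 objectness score + nc class scores.

def head_out_channels(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Hypothetical helper: channels produced by one detection scale."""
    return anchors_per_scale * (num_classes + 5)

# network.anchors from the log: 6 numbers per scale, i.e. 3 (width, height) pairs.
anchors = [
    [10, 13, 16, 30, 33, 23],
    [30, 61, 62, 45, 59, 119],
    [116, 90, 156, 198, 373, 326],
]
anchors_per_scale = len(anchors[0]) // 2

# Assumed: the pretrained checkpoint was trained with 80 classes.
pretrained_channels = head_out_channels(80, anchors_per_scale)
# data.nc is 2 in this run.
custom_channels = head_out_channels(2, anchors_per_scale)

print(pretrained_channels, custom_channels)
```

Under that assumption the two values match the shapes in the warnings, which would mean the nine head parameters were dropped because of the class-count change, separately from the later AICPU exception.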
Describe the expected behavior
The error is resolved and the model trains normally.
Steps to reproduce the issue
3. The kernel error is triggered.
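The log does not name the operator that failed, so the following is only a sanity check and not a diagnosis. The configuration printed above shows `data.nc 2` while `data.names` contains an empty second entry, so it may be worth ruling out label rows whose class id falls outside `[0, nc)`. This is a minimal sketch, assuming YOLO-style txt labels (`class_id x_center y_center width height`, normalized to `[0, 1]`). `find_bad_labels` and `find_empty_names` are hypothetical helpers, not part of MindYOLO.

```python
from typing import Iterable, List, Tuple


def find_bad_labels(lines: Iterable[str], num_classes: int) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for each label row that looks invalid.

    Assumes the YOLO txt format: class_id x_center y_center width height,
    with the four coordinates normalized to [0, 1].
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 5:
            problems.append((lineno, "expected at least 5 fields"))
            continue
        try:
            class_id = int(fields[0])
            coords = [float(v) for v in fields[1:5]]
        except ValueError:
            problems.append((lineno, "non-numeric field"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((lineno, f"class id {class_id} outside [0, {num_classes})"))
        elif any(c < 0.0 or c > 1.0 for c in coords):
            problems.append((lineno, "coordinate outside [0, 1]"))
    return problems


def find_empty_names(names: List[str]) -> List[int]:
    """Return the indices of class names that are empty or whitespace-only."""
    return [i for i, name in enumerate(names) if not name.strip()]


# Made-up sample rows for illustration; real rows would come from the dataset's label files.
sample = ["0 0.5 0.5 0.2 0.3", "2 0.1 0.1 0.05 0.05", "1 1.2 0.5 0.1 0.1"]
bad = find_bad_labels(sample, num_classes=2)
empty = find_empty_names(["work_clothes", ""])
print(bad, empty)
```

Running the same check over each label file referenced by `train.txt` would show whether any row needs fixing before the training run is repeated.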
Related log / screenshot
Special notes for this issue