pytorch-bug-index

This post records some bugs I have run into while using PyTorch.

[error] RuntimeError: Couldn't export Python operator <python_value>

During PyTorch training, the model is traced with torch.jit.ScriptModule, and saving it with net.save(path) fails with the following error:

Traceback (most recent call last):
  File "train.py", line 69, in <module>
    train(opt, data_loader, model, visualizer)
  File "train.py", line 42, in train
    model.save('latest')
  File "/opt/ml/job/models/conditional_gan_model.py", line 132, in save
    self.save_network(self.netG, 'G', label, self.gpu_ids)
  File "/opt/ml/job/models/base_model.py", line 49, in save_network
    network.save(save_path)
RuntimeError: Couldn't export Python operator <python_value>
Defined at:
@torch.jit.script_method
def forward(self, x):
    output = self.model(x)
             ~~~~~~~~~~ <--- HERE
    return output
===============End of job stdout&stderr==============

[error] Import Error

After installing PyTorch, running import torch fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/maxliu/anaconda3/lib/python3.7/site-packages/torch/__init__.py", line 84, in <module>
    from torch._C import *
ImportError: /home/maxliu/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZNK3c104Type11isSubtypeOfESt10shared_ptrIS0_E

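I was certain that the NVIDIA driver, CUDA, and cuDNN were all installed correctly, so where did the problem come from? Start by demangling the missing symbol: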

c++filt _ZNK3c104Type11isSubtypeOfESt10shared_ptrIS0_E

This gives the following function:

c10::Type::isSubtypeOf(std::shared_ptr<c10::Type>) const
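
The same demangling step can be scripted when several symbols need checking. The following is only a minimal sketch: the helper name demangle is hypothetical, and it assumes the c++filt binary from binutils is on PATH (it returns None otherwise).

```python
import shutil
import subprocess
from typing import Optional


def demangle(symbol: str) -> Optional[str]:
    """Demangle a C++ symbol with c++filt; return None if the tool is missing."""
    tool = shutil.which("c++filt")
    if tool is None:
        return None
    # c++filt prints the demangled name (or the input unchanged) on stdout.
    result = subprocess.run([tool, symbol], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(demangle("_ZNK3c104Type11isSubtypeOfESt10shared_ptrIS0_E"))
```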

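My suspicion was that the wrong version of the c10 library was being picked up at load time: the PyTorch I had just installed was the latest release, but I had previously used an older libtorch build and added it to LD_LIBRARY_PATH. Check the current value of LD_LIBRARY_PATH:

echo $LD_LIBRARY_PATH

It prints: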

/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/data/source/libtorch/lib:

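Here /data/source/libtorch/lib is the old library directory I had added earlier. import torch loaded the old libraries from it, which do not define the symbol, hence the error. Removing this directory from LD_LIBRARY_PATH fixes the problem.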
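
The check and the fix described above can be sketched in Python. This is an illustration under assumptions rather than a definitive tool: the function names and the filename patterns in LIB_PATTERNS are hypothetical choices, and the script only prints a cleaned value instead of changing the environment.

```python
import glob
import os
from typing import List, Sequence

SEP = ":"  # LD_LIBRARY_PATH separator on Linux
# Assumed filename patterns for libtorch's shared libraries.
LIB_PATTERNS = ("libc10*.so*", "libtorch*.so*")


def split_entries(path_value: str) -> List[str]:
    """Split the path string, dropping empty and duplicate entries."""
    entries: List[str] = []
    for entry in path_value.split(SEP):
        if entry and entry not in entries:
            entries.append(entry)
    return entries


def find_torch_dirs(path_value: str, patterns: Sequence[str] = LIB_PATTERNS) -> List[str]:
    """Return the entries that contain a file matching any pattern."""
    hits = []
    for entry in split_entries(path_value):
        base = glob.escape(entry)  # treat the directory name literally
        if any(glob.glob(os.path.join(base, p)) for p in patterns):
            hits.append(entry)
    return hits


def drop_entries(path_value: str, unwanted: Sequence[str]) -> str:
    """Rebuild the path string without the unwanted directories."""
    blocked = {os.path.normpath(u) for u in unwanted}
    kept = [e for e in split_entries(path_value) if os.path.normpath(e) not in blocked]
    return SEP.join(kept)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    # Review this list by hand before removing anything.
    print("entries containing libtorch libraries:", find_torch_dirs(current))
    print(drop_entries(current, ["/data/source/libtorch/lib"]))
```

Applied to the value shown earlier, drop_entries also collapses the repeated /usr/local/cuda/lib64 entries; the printed result can then be exported in the shell or written to the shell startup file.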
