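I am training a deep learning model with TensorFlow 1.4.0 + CUDA 8.0 + cuDNN 6.0. When training reaches the end of the first epoch, the Jupyter service restarts. Following an earlier blog post, I limited the GPU memory usage, but that made no difference. I checked nvidia-smi and it shows the GPU is being used normally. I'm confused: CUDA is installed and the versions should be correct. Any help would be appreciated.
Code used to limit GPU memory usage: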

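import os

import tensorflow as tf
import keras.backend.tensorflow_backend as ktf

# Expose only the first GPU to TensorFlow.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

conf = tf.ConfigProto()
# Cap this process at 50% of the GPU's memory.
conf.gpu_options.per_process_gpu_memory_fraction = 0.5
# Allocate GPU memory on demand instead of reserving it all up front.
conf.gpu_options.allow_growth = True
sess = tf.Session(config=conf)
# Make Keras use this session.
ktf.set_session(sess)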

nvidia-smi output:

Output after running one epoch:

The error message is below:

Exception in thread Thread-6:
Traceback (most recent call last):
 File "e:\anaconda3\envs\tensorflow\lib\threading.py", line 916, in _bootstrap_inner
   self.run()
 File "e:\anaconda3\envs\tensorflow\lib\threading.py", line 864, in run
   self._target(*self._args, **self._kwargs)
 File "e:\anaconda3\envs\tensorflow\lib\site-packages\keras\utils\data_utils.py", line 568, in data_generator_task
   generator_output = next(self._generator)
 File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 155, in data_generator
   skip_blank=skip_blank, permute=permute)
 File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 210, in add_data
   data, truth = get_data_from_file(data_file, index, patch_shape=patch_shape)
 File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 234, in get_data_from_file
   data, truth = get_data_from_file(data_file, index, patch_shape=None)
 File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 238, in get_data_from_file
   x, y = data_file.root.data[index], data_file.root.truth[index, 0]
 File "e:\anaconda3\envs\tensorflow\lib\site-packages\tables\array.py", line 658, in __getitem__
   arr = self._read_slice(startl, stopl, stepl, shape)
 File "e:\anaconda3\envs\tensorflow\lib\site-packages\tables\array.py", line 762, in _read_slice
   self._g_read_slice(startl, stopl, stepl, nparr)
 File "tables\hdf5extension.pyx", line 1585, in tables.hdf5extension.Array._g_read_slice
tables.exceptions.HDF5ExtError: HDF5 error back trace

 File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dio.c", line 199, in H5Dread
   can't read data
 File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dio.c", line 601, in H5D__read
   can't read data
 File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dchunk.c", line 2282, in H5D__chunk_read
   chunked read failed
 File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dselect.c", line 283, in H5D__select_read
   read error
 File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dselect.c", line 118, in H5D__select_io
   can't retrieve I/O vector size
 File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5CX.c", line 1341, in H5CX_get_vec_size
   can't get default dataset transfer property list

End of HDF5 error back trace

Problems reading the array data.
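
One observation on the traceback: the failure is raised inside a PyTables read (`data_file.root.data[index]`) running in a Keras worker thread (`Thread-6`), not in GPU code. HDF5 is not thread-safe unless it was built that way, so one possible cause is that two generator threads (for example training and validation, once the first epoch ends) read from HDF5 at the same time. This is an assumption and is not confirmed by the post. A minimal sketch of serializing generator access with a shared lock follows; `ThreadSafeIterator`, `_SHARED_READ_LOCK`, and `fake_batches` are hypothetical names, and `fake_batches` only stands in for the project's real generator.

```python
import threading

# Hypothetical module-level lock shared by every wrapped generator, so that
# reads from different generators never overlap.
_SHARED_READ_LOCK = threading.Lock()


class ThreadSafeIterator:
    """Wrap an iterator so only one thread at a time can advance it."""

    def __init__(self, iterator, lock=_SHARED_READ_LOCK):
        self._iterator = iter(iterator)
        self._lock = lock

    def __iter__(self):
        return self

    def __next__(self):
        # The wrapped generator body (including any file read it performs)
        # runs while the lock is held.
        with self._lock:
            return next(self._iterator)


def fake_batches(n):
    """Hypothetical stand-in for a data generator that reads from a file."""
    for i in range(n):
        yield i


safe_batches = ThreadSafeIterator(fake_batches(1000))
```

If the assumption holds, both the training and the validation generator would be wrapped this way before being passed to Keras. Running with a single worker and checking that the HDF5 file can be read in the main thread are cheaper first tests of the same hypothesis.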


  • Caroline    2020-04-29 10:20:46