[Update] This issue has since been fixed
The problem was resolved once Windows 11 Insider Preview Build 22000.51 was released. I tested in the environment below, so try updating your apt packages and driver.
- OS: Windows 11 Insider Preview Build 22000.51
- Driver: NVIDIA 470.76
- APT Package Version List
Inst libnvidia-container1 (1.4.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
Inst libnvidia-container-tools (1.4.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
Inst nvidia-container-toolkit (1.5.1-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
Inst nvidia-container-runtime (3.5.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
Inst nvidia-docker2 (2.6.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [all])
Conf libnvidia-container1 (1.4.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
Conf libnvidia-container-tools (1.4.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
Conf nvidia-container-toolkit (1.5.1-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
Conf nvidia-container-runtime (3.5.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [amd64])
Conf nvidia-docker2 (2.6.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [all])
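To compare your own machine against the list above, it can help to reduce the simulation output to plain "package version" pairs. The sketch below assumes the list was produced by apt-get install -s (simulate), which prints Inst and Conf lines in this shape. The helper name inst_versions is made up for illustration.

```shell
#!/usr/bin/env bash
# Read `apt-get install -s` output on stdin and print "package version"
# for every "Inst" line. The new version is the field that opens with "(".
inst_versions() {
  awk '$1 == "Inst" {
    for (i = 3; i <= NF; i++) {
      if (substr($i, 1, 1) == "(") { print $2, substr($i, 2); break }
    }
  }'
}

# Demo on one line copied from the list above.
printf '%s\n' \
  'Inst nvidia-docker2 (2.6.0-1 NVIDIA CORPORATION [email protected]:1.0/bionic [all])' \
  | inst_versions
```

On a real system the same function could be fed from apt-get install -s nvidia-docker2, which only simulates the install and changes nothing.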
Environment: Windows 10 Insider Preview Build 21376.1
WSL2 Ubuntu release: 20.04
NVIDIA Forum Links: https://forums.developer.nvidia.com/t/nvidia-container-cli-error/177403/2?u=ji5489
Problem
If I run sudo nvidia-container-cli -k -d /dev/tty info, I get the error below.
Where is the 'devicesetgpcclkvfoffset' symbol?
-- WARNING, the following logs are for debugging purposes only --
I0509 06:20:58.858216 1009 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0509 06:20:58.858281 1009 nvc.c:346] using root /
I0509 06:20:58.858287 1009 nvc.c:347] using ldcache /etc/ld.so.cache
I0509 06:20:58.858293 1009 nvc.c:348] using unprivileged user 65534:65534
I0509 06:20:58.858311 1009 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0509 06:20:58.891385 1009 dxcore.c:226] Creating a new WDDM Adapter for hAdapter:40000000 luid:1e946dc
I0509 06:20:58.917427 1009 dxcore.c:267] Adding new adapter via dxcore hAdapter:40000000 luid:1e946dc wddm version:3000
I0509 06:20:58.917515 1009 dxcore.c:325] dxcore layer initialized successfully
W0509 06:20:58.918854 1009 nvc.c:397] skipping kernel modules load on WSL
I0509 06:20:58.919404 1010 driver.c:101] starting driver service
E0509 06:20:58.950931 1010 driver.c:168] could not start driver service: load library failed: /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1: undefined symbol: devicesetgpcclkvfoffset
I0509 06:20:58.951399 1009 driver.c:203] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
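Each log line above starts with a severity letter (I, W or E) followed by the date, so the actual failure can be isolated by keeping only the E lines. The following is a minimal sketch. The function name nvc_errors is hypothetical, and the file name in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# Print only error-severity lines: an "E" followed by four date digits.
# Reads the files given as arguments, or stdin when there are none.
nvc_errors() {
  grep -E '^E[0-9]{4} ' "$@"
}

# Demo on two lines copied from the log above.
sample='W0509 06:20:58.918854 1009 nvc.c:397] skipping kernel modules load on WSL
E0509 06:20:58.950931 1010 driver.c:168] could not start driver service: load library failed: /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1: undefined symbol: devicesetgpcclkvfoffset'
printf '%s\n' "$sample" | nvc_errors
```

Assuming -d accepts a regular file path the same way it accepts /dev/tty, the log could be written to a file such as nvc.log and then inspected with nvc_errors nvc.log.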
Solution
I am installing WSL2 Ubuntu 20.04 on Windows 10 Insider Preview Build 21376.1 and ran into the same issue, shown below.
# sudo nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0512 07:05:58.485519 5592 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0512 07:05:58.485581 5592 nvc.c:346] using root /
I0512 07:05:58.485601 5592 nvc.c:347] using ldcache /etc/ld.so.cache
I0512 07:05:58.485604 5592 nvc.c:348] using unprivileged user 65534:65534
I0512 07:05:58.485659 5592 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0512 07:05:58.502183 5592 dxcore.c:226] Creating a new WDDM Adapter for hAdapter:40000000 luid:185673b
I0512 07:05:58.512924 5592 dxcore.c:267] Adding new adapter via dxcore hAdapter:40000000 luid:185673b wddm version:3000
I0512 07:05:58.512956 5592 dxcore.c:325] dxcore layer initialized successfully
W0512 07:05:58.513266 5592 nvc.c:397] skipping kernel modules load on WSL
I0512 07:05:58.513404 5593 driver.c:101] starting driver service
E0512 07:05:58.521931 5593 driver.c:168] could not start driver service: load library failed: /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1: undefined symbol: devicesetgpcclkvfoffset
I0512 07:05:58.522047 5592 driver.c:203] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
I found that the apt experimental and stable repositories are mixed together, so the WSL2 package (which is available only in the experimental repository) does not get installed. When the stable package was released, a matching experimental (= WSL2) package should have been released as well, but it was not. See the apt-cache madison result below.
# apt-cache madison libnvidia-container1
libnvidia-container1 | 1.4.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 Packages
libnvidia-container1 | 1.3.3-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 Packages
libnvidia-container1 | 1.3.3~rc.2-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64 Packages
libnvidia-container1 | 1.3.3~rc.1-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64 Packages
libnvidia-container1 | 1.3.2-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 Packages
libnvidia-container1 | 1.3.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 Packages
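Picking the experimental entries out of the madison listing by eye is error-prone, so here is a small sketch that filters the output by repository channel. It assumes the three-column "package | version | source" layout shown above and that the channel name appears as a path segment in the source URL. The function name madison_versions is made up for illustration.

```shell
#!/usr/bin/env bash
# Read `apt-cache madison` output on stdin and print the versions whose
# source URL contains /<channel>/ (for example "experimental" or "stable").
madison_versions() {
  local channel=$1
  awk -F'|' -v ch="/$channel/" 'index($3, ch) { gsub(/ /, "", $2); print $2 }'
}

# Demo on two lines copied from the listing above.
madison_sample='libnvidia-container1 | 1.4.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 Packages
libnvidia-container1 | 1.3.3~rc.2-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64 Packages'
printf '%s\n' "$madison_sample" | madison_versions experimental
```

In the listing above the rows appear newest first, so piping the result through head -n 1 would give the most recent experimental version.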
The point is that we should install libnvidia-container1=1.3.3~rc.2-1 rather than libnvidia-container1=1.4.0-1. A plain apt-get install nvidia-docker2 installs 1.4.0-1, which is likely to fail in a WSL2 environment.
I successfully installed the older (WSL2-only) packages with the apt-get install command below:
apt-get install \
  libnvidia-container1=1.3.3~rc.2-1 \
  libnvidia-container-tools=1.3.3~rc.2-1 \
  nvidia-container-toolkit=1.4.1-1 \
  nvidia-container-runtime=3.4.1-1 \
  nvidia-docker2=2.5.0-1
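One thing the post does not cover: a later apt-get upgrade may move these packages back to 1.4.0-1. A possible safeguard, which is my assumption and not part of the original answer, is to hold them with apt-mark hold. The sketch below keeps the pinned versions in one array and only prints the commands so they can be reviewed before running. The names PINS and pin_names are hypothetical.

```shell
#!/usr/bin/env bash
# Versions copied from the apt-get install command above.
PINS=(
  "libnvidia-container1=1.3.3~rc.2-1"
  "libnvidia-container-tools=1.3.3~rc.2-1"
  "nvidia-container-toolkit=1.4.1-1"
  "nvidia-container-runtime=3.4.1-1"
  "nvidia-docker2=2.5.0-1"
)

# Strip "=version" from each pin to get the bare package names.
pin_names() {
  local pin
  for pin in "${PINS[@]}"; do
    printf '%s\n' "${pin%%=*}"
  done
}

# Dry run: print the commands instead of executing them.
echo "sudo apt-get install ${PINS[*]}"
echo "sudo apt-mark hold $(pin_names | paste -sd ' ' -)"
```

Once a fixed driver and package set is available, the hold can be released again with apt-mark unhold followed by the same package names.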
This command installs the older packages. After sudo service docker stop and sudo service docker start, CUDA works inside Docker.
So cool!