This post describes how to set up an environment for 3D visualization on Linux.




NVIDIA Hardware-Accelerated OpenGL in Docker


If the previous post was a record of all my trial and error, think of this one as the cleaned-up summary.





Setting Up the Host Environment




Host environment

64-bit

Ubuntu 16.04 / CentOS 7.6

NVIDIA Tesla M40 GPU



Reference:

https://github.com/agisoft-llc/cloud-scripts



The scripts below are new ones that I wrote using the site above as a reference.

I fixed some bugs and changed the setup so that it targets a Docker service instead of TurboVNC.



[For Ubuntu]

configure_ubuntu.sh

#!/bin/bash

NVIDIA_DRIVER=410.79
VIRTUAL_GL=2.6.1
VDISPLAY_RESOLUSION=1920x1080

set -e

sudo apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get upgrade -yq

# Prepare for NVidia drivers install
sudo apt-get install -y gcc make pkg-config xserver-xorg-dev linux-headers-$(uname -r) xterm
# xterm is needed for xinit

# Install Lubuntu/Xubuntu/anything
sudo apt-get install -y lubuntu-desktop

# Installing NVidia driver
wget -O NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run --no-check-certificate http://us.download.nvidia.com/tesla/${NVIDIA_DRIVER}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run
chmod +x NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run
sudo ./NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run --no-questions --accept-license --no-precompiled-interface --ui=none
echo ""
echo "************************************************************************************************"
echo "*                                                                                              *"
echo "* You may see this warning above:                                                              *"
echo "*  - WARNING: Unable to find a suitable destination to install 32-bit compatibility libraries. *"
echo "* This is OK.                                                                                  *"
echo "*                                                                                              *"
echo "************************************************************************************************"
echo ""
rm NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run

# Preparation for virtualgl like in https://virtualgl.org/Documentation/HeadlessNV
sudo nvidia-xconfig -a --use-display-device=None --virtual=${VDISPLAY_RESOLUSION}

echo ""
echo "********************************************************************************"
echo "*                                                                              *"
echo "* You may see this warning above:                                              *"
echo "*  - WARNING: Unable to locate/open X configuration file.                      *"
echo "* This is OK.                                                                  *"
echo "*                                                                              *"
echo "********************************************************************************"
echo ""

# Fix /etc/X11/xorg.conf:
# 1. Add line with BusID in section Device (taken from output of lspci | egrep -h "VGA|3D controller")
# For EC2 g3 and p3 also:
# 2. Delete whole section ServerLayout (comment it with # symbol)
# 3. Delete whole section Screen (comment it with # symbol)
sudo /usr/bin/python2.7 fix_xorg_conf.py /etc/X11/xorg.conf

# Install VirtualGL
wget https://sourceforge.net/projects/virtualgl/files/${VIRTUAL_GL}/virtualgl_${VIRTUAL_GL}_amd64.deb/download -O virtualgl_${VIRTUAL_GL}_amd64.deb
sudo dpkg -i virtualgl*.deb
rm virtualgl*.deb

# Configure VirtualGL
sudo service lightdm stop
sudo /opt/VirtualGL/bin/vglserver_config -config +s +f -t

echo ""
echo "********************************************************************************"
echo "*                                                                              *"
echo "* You may see these lines above:                                               *"
echo "*  - rmmod: ERROR: Module nvidia is in use by: nvidia_modeset                  *"
echo "*  - IMPORTANT NOTE: Your system uses modprobe.d to set device permissions.    *"
echo "* This is OK - just means that reboot required.                                *"
echo "*                                                                              *"
echo "********************************************************************************"
echo ""


# Add Global Environment Variables
sudo sed -i '$ a export DISPLAY=:0.0' /etc/profile


echo ""
echo "******************************************************************"
echo "*                                                                *"
echo "* Rebooting for changes to take effect!                          *"
echo "*                                                                *"
echo "******************************************************************"
echo ""

sudo reboot



[For CentOS]

#!/bin/bash

NVIDIA_DRIVER=410.79
VIRTUAL_GL=2.6.1
VDISPLAY_RESOLUSION=1920x1080

set -e

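sudo yum update -y

# Prepare for NVidia drivers install
# (CentOS package names; kernel-devel must match the running kernel,
#  and pciutils provides the lspci command used by fix_xorg_conf.py)
sudo yum install -y gcc make pkgconfig xorg-x11-server-devel kernel-devel-$(uname -r) wget pciutils xterm
# xterm is needed for xinit

# Install the X Window System and a display manager
# (on CentOS 7 lightdm usually comes from the EPEL repository;
#  install epel-release first if the package is not found)
sudo yum -y groupinstall "X Window System"
sudo yum -y install lightdm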

# Installing NVidia driver
wget -O NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run --no-check-certificate http://us.download.nvidia.com/tesla/${NVIDIA_DRIVER}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run
chmod +x NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run
sudo ./NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run --no-questions --accept-license --no-precompiled-interface --ui=none --glvnd-egl-config-path=/usr/share/glvnd/egl_vendor.d
echo ""
echo "************************************************************************************************"
echo "*                                                                                              *"
echo "* You may see this warning above:                                                              *"
echo "*  - WARNING: Unable to find a suitable destination to install 32-bit compatibility libraries. *"
echo "* This is OK.                                                                                  *"
echo "*                                                                                              *"
echo "************************************************************************************************"
echo ""
rm NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run


# Preparation for virtualgl like in https://virtualgl.org/Documentation/HeadlessNV
sudo nvidia-xconfig -a --use-display-device=None --virtual=${VDISPLAY_RESOLUSION}

echo ""
echo "********************************************************************************"
echo "*                                                                              *"
echo "* You may see this warning above:                                              *"
echo "*  - WARNING: Unable to locate/open X configuration file.                      *"
echo "* This is OK.                                                                  *"
echo "*                                                                              *"
echo "********************************************************************************"
echo ""

# Fix /etc/X11/xorg.conf:
# 1. Add line with BusID in section Device (taken from output of lspci | egrep -h "VGA|3D controller")
# For EC2 g3 and p3 also:
# 2. Delete whole section ServerLayout (comment it with # symbol)
# 3. Delete whole section Screen (comment it with # symbol)
sudo /usr/bin/python2.7 fix_xorg_conf.py /etc/X11/xorg.conf

# Install VirtualGL
wget -O virtualgl.rpm https://downloads.sourceforge.net/project/virtualgl/${VIRTUAL_GL}/VirtualGL-${VIRTUAL_GL}.x86_64.rpm
sudo yum -y install virtualgl.rpm
rm virtualgl.rpm

# Configure VirtualGL
sudo service lightdm stop
sudo systemctl enable lightdm
sudo /opt/VirtualGL/bin/vglserver_config -config +s +f -t

echo ""
echo "********************************************************************************"
echo "*                                                                              *"
echo "* You may see these lines above:                                               *"
echo "*  - rmmod: ERROR: Module nvidia is in use by: nvidia_modeset                  *"
echo "*  - IMPORTANT NOTE: Your system uses modprobe.d to set device permissions.    *"
echo "* This is OK - just means that reboot required.                                *"
echo "*                                                                              *"
echo "********************************************************************************"
echo ""


# Add Global Environment Variables
sudo sed -i '$ a export DISPLAY=:0.0' /etc/profile


echo ""
echo "******************************************************************"
echo "*                                                                *"
echo "* Rebooting for changes to take effect!                          *"
echo "* Don't forget to execute below command after reboot.            *"
echo "* xauth merge /etc/opt/VirtualGL/vgl_xauth_key                   *"
echo "*                                                                *"
echo "******************************************************************"
echo ""

sudo reboot



[Common to Ubuntu and CentOS]

fix_xorg_conf.py

import os
import sys
import subprocess

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Required argument: <path to xorg.conf>")
        sys.exit(1)

    xorg_config = sys.argv[1]

    lspci_p = subprocess.Popen(['lspci'], stdout=subprocess.PIPE)
    lspci_vga_p = subprocess.Popen(['egrep', '-h', 'VGA|3D controller'], stdin=lspci_p.stdout, stdout=subprocess.PIPE)
    lspci_p.stdout.close()

    vga_devices = lspci_vga_p.communicate()[0]

    gpus = []

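    # Default to False so the checks below do not raise NameError
    # when the detected GPU is not one of the models listed here.
    needToDeleteSecsion = False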

    for line in vga_devices.split('\n'):
        if len(line) == 0:
            continue
        if "Cirrus" in line:
            continue
        if "NVIDIA Corporation" in line:
            bus_id_hex = line.split(' ')[0]
            bus_id0, bus_id12 = bus_id_hex.split(':')[0], bus_id_hex.split(':')[1]
            bus_id1, bus_id2 = bus_id12.split('.')
            bus_id_decimal = "{}:{}:{}".format(int(bus_id0, 16), int(bus_id1, 16), int(bus_id2, 16))
            gpus.append((line, bus_id_hex, bus_id_decimal))
            if "GRID K520" in line:
                needToDeleteSecsion = False
            elif "Tesla M60" in line:
                needToDeleteSecsion = True
            elif "Tesla K80" in line:
                needToDeleteSecsion = False
            elif "NVIDIA Corporation Device 1db1 (rev a1)" in line:
                needToDeleteSecsion = True
            elif "NVIDIA Corporation Device 15f8 (rev a1)" in line:
                needToDeleteSecsion = True
            elif "Tesla M40" in line:
                needToDeleteSecsion = True
            elif "Tesla P100" in line:
                needToDeleteSecsion = True


    if len(gpus) == 0:
        print("No GPUs detected with 'lspci | egrep -h \"VGA|3D controller\"'!")
        sys.exit(1)

    if needToDeleteSecsion is True:
        print("Need To Delete 'ServerLayout', 'Screen' Section.")

    print("{} GPUs detected:".format(len(gpus)))
    print("  {: <10s} {: <10s} {}".format("BusID hex", "BusID dec", "lspci output"))
    for line, bus_id_hex, bus_id_decimal in gpus:
        print("  {: <10s} {: <10s} {}".format(bus_id_hex, bus_id_decimal, line))

    print("Fixing xorg.conf {}...".format(xorg_config))
    xorg_config_backup = xorg_config + ".backup"
    xorg_config_new = xorg_config + ".fixed.tmp"

    with open(xorg_config, 'r') as config:
        lines = config.readlines()

    # 1. Add line with BusID in section Device (taken from output of lspci | egrep -h "VGA|3D controller")
    # For M60, M40, v100 and for P100 PCIE also:
    # 2. Delete whole section ServerLayout (comment it with # symbol)
    # 3. Delete whole section Screen (comment it with # symbol)
    #
    # On M60, M40, v100 and for P100 PCIE steps 2 and 3 to fix this error in /var/log/Xorg.0.log:
    # (EE) NVIDIA(GPU-0): UseDisplayDevice "None" is not supported with GRID
    # (EE) NVIDIA(GPU-0):     displayless
    # (EE) NVIDIA(GPU-0): Failed to select a display subsystem.
    section_start = "Section \""
    section_end   = "EndSection\n"
    sections_to_delete = []
    if needToDeleteSecsion is True:
        sections_to_delete = ["ServerLayout", "Screen"]

    sections_deleted = []

    device_index = 0

    print("  Writing fixed xorg.conf to {}".format(xorg_config_new))
    with open(xorg_config_new, 'w') as updated:
        current_section = None
        for line in lines:
            removed = False

            if current_section is None and section_start in line:
                current_section = line[len(section_start):-2].replace('"', '').strip()
                if current_section in sections_to_delete:
                    print("  Section {} deleted!".format(current_section))
                    sections_deleted.append(current_section)

            if current_section in sections_to_delete:
                removed = True

            if current_section is not None and line == section_end:
                if current_section == "Device":
                    _, _, bus_id_decimal = gpus[device_index]
                    print("  BusID {} added!".format(bus_id_decimal))
                    updated.write("    BusID          \"PCI:{}\"\n".format(bus_id_decimal))
                    device_index += 1
                current_section = None

            if removed:
                updated.write("#{}".format(line))
            else:
                updated.write("{}".format(line))

    if device_index == 0:
        print("Section \"Device\" was not found!")
        sys.exit(1)
    for section in sections_to_delete:
        if section not in sections_deleted:
            print("Section \"{}\" was not found!".format(section))
            sys.exit(1)

    os.rename(xorg_config, xorg_config_backup)
    print("  Backup saved to {}".format(xorg_config_backup))

    os.rename(xorg_config_new, xorg_config)
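
The BusID conversion is the part of this script that is easiest to get wrong by hand: lspci prints the address in hexadecimal as bus:device.function, while xorg.conf expects decimal values in the form PCI:bus:device:function. The minimal sketch below isolates just that conversion. The function name lspci_to_xorg_bus_id is made up for illustration and is not part of the script above, and it assumes the short address form without a PCI domain prefix.

```python
def lspci_to_xorg_bus_id(lspci_address):
    """Convert an lspci address such as '00:1e.0' to Xorg's 'PCI:0:30:0'.

    Assumption: the address uses the short 'bus:device.function' form
    (no leading PCI domain such as '0000:').
    """
    bus_hex, device_function = lspci_address.split(":")
    device_hex, function_hex = device_function.split(".")
    # Each field is hexadecimal in lspci output and decimal in xorg.conf.
    return "PCI:{}:{}:{}".format(
        int(bus_hex, 16), int(device_hex, 16), int(function_hex, 16)
    )


if __name__ == "__main__":
    print(lspci_to_xorg_bus_id("00:1e.0"))  # PCI:0:30:0
    print(lspci_to_xorg_bus_id("3b:00.0"))  # PCI:59:0:0
```

If the BusID line in /etc/X11/xorg.conf ever has to be repaired manually, this is the value to put in the Device section.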


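To set things up, first copy the configure script for your distribution (for example configure_ubuntu.sh) together with fix_xorg_conf.py to the host.

Then run the configure script.


When the setup finishes, the system reboots.

After the reboot, check whether the Xorg service is running. If it is not, start it as shown below.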


sudo service lightdm restart



Then set the DISPLAY environment variable.

DISPLAY=:0.0

export DISPLAY


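Next, check that OpenGL is really running on the NVIDIA GPU. In the glxinfo output, the OpenGL vendor string should read NVIDIA Corporation.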

glxinfo

glxinfo | grep -i OpenGL
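
If you want to automate this check, for example in a provisioning script, something like the following could work. It is only a sketch: the function names are my own, and it assumes the usual "OpenGL vendor string: ..." line that glxinfo prints.

```python
import re


def opengl_vendor(glxinfo_output):
    """Return the value of the 'OpenGL vendor string' line, or None if absent."""
    match = re.search(
        r"^OpenGL vendor string:\s*(.+)$", glxinfo_output, re.MULTILINE
    )
    return match.group(1).strip() if match else None


def is_nvidia_accelerated(glxinfo_output):
    """True when the vendor line exists and names NVIDIA."""
    vendor = opengl_vendor(glxinfo_output)
    return vendor is not None and "NVIDIA" in vendor
```

Feed it the text captured from glxinfo, for example with subprocess.check_output(["glxinfo"]).decode(). If the vendor line names anything other than NVIDIA, the X server is most likely falling back to software rendering.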







Building the Docker Base Image




The Docker image needs a few settings as well.

The important point is that the image must install the same NVIDIA driver version as the host.



Dockerfile.txt

[For Ubuntu]

FROM ubuntu:16.04
MAINTAINER sw0826.kim@snowcorp.com 

ENV NVIDIA_DRIVER=410.79 
RUN apt-get update 
RUN apt-get install -yq --no-install-recommends mesa-utils wget module-init-tools
RUN wget -O NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run --no-check-certificate http://us.download.nvidia.com/tesla/${NVIDIA_DRIVER}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run 
RUN chmod +x NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run
RUN sh ./NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run --no-questions --accept-license --no-precompiled-interface --ui=none -a -N --no-kernel-module 
RUN rm NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run

[For CentOS]

FROM centos:centos7.6.1810
MAINTAINER sw0826.kim@snowcorp.com

ENV NVIDIA_DRIVER=410.79
RUN yum update -y
RUN yum install -y wget kmod

RUN wget -O NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run --no-check-certificate http://us.download.nvidia.com/tesla/${NVIDIA_DRIVER}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run
RUN chmod +x NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run
RUN sh ./NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run --no-questions --accept-license --no-precompiled-interface --ui=none --glvnd-egl-config-path=/usr/share/glvnd/egl_vendor.d -a -N --no-kernel-module
RUN rm NVIDIA-Linux-x86_64-${NVIDIA_DRIVER}.run

RUN yum install -y glx-utils
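
Because the driver version in the image has to match the host, it can be worth comparing the two before you build. The sketch below reads the version from text in the format of /proc/driver/nvidia/version on the host and from the ENV NVIDIA_DRIVER line of the Dockerfile. The function names are hypothetical, and the regular expression assumes the "Kernel Module  <version>" wording that drivers of this generation write to that file.

```python
import re


def host_driver_version(proc_version_text):
    """Extract e.g. '410.79' from the contents of /proc/driver/nvidia/version.

    Assumption: the NVRM line contains 'Kernel Module' followed by the version.
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_version_text)
    return match.group(1) if match else None


def dockerfile_driver_version(dockerfile_text):
    """Extract the value of 'ENV NVIDIA_DRIVER=...' from a Dockerfile."""
    match = re.search(
        r"^ENV\s+NVIDIA_DRIVER=(\S+)", dockerfile_text, re.MULTILINE
    )
    return match.group(1) if match else None


def versions_match(proc_version_text, dockerfile_text):
    """True only when both versions were found and are identical."""
    host = host_driver_version(proc_version_text)
    image = dockerfile_driver_version(dockerfile_text)
    return host is not None and host == image
```

On the host you would pass in open("/proc/driver/nvidia/version").read() and the text of your Dockerfile. A False result means you should change NVIDIA_DRIVER in the Dockerfile before building.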


Build the image roughly like this (Docker requires the image name to be lowercase):

docker build -t opengl-docker:base .




Run it as follows.

nvidia-docker must of course be installed.


docker run --runtime=nvidia --privileged -e "DISPLAY=unix:0.0" -it -v="/tmp/.X11-unix:/tmp/.X11-unix:rw" opengl-docker:base glxinfo


That is all you need to run it.





Troubleshooting



If you get an error like the following:


name of display: :0.0

X Error of failed request:  BadValue (integer parameter out of range for operation)

  Major opcode of failed request:  154 (GLX)

  Minor opcode of failed request:  24 (X_GLXCreateNewContext)

  Value in failed request:  0x0

  Serial number of failed request:  37

  Current serial number in output stream:  38


Check that /etc/X11/xorg.conf exists and that it is configured correctly, including the BusID.

Occasionally the settings get wiped.
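
A quick way to spot a wiped configuration is to look for Device sections that have no BusID line. The following is a minimal sketch of such a check, not part of the scripts above. The function name is my own, and it only understands the plain Section "Device" ... EndSection layout that nvidia-xconfig generates.

```python
def device_sections_missing_bus_id(xorg_conf_text):
    """Return the 0-based indexes of Device sections that lack a BusID line."""
    missing = []
    in_device = False
    has_bus_id = False
    device_index = -1
    for raw_line in xorg_conf_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # commented-out lines (e.g. removed sections) are ignored
        lowered = line.lower()
        if lowered.startswith("section") and '"device"' in lowered:
            # The quotes keep Section "InputDevice" from matching here.
            in_device = True
            has_bus_id = False
            device_index += 1
        elif in_device and lowered.startswith("busid"):
            has_bus_id = True
        elif in_device and lowered.startswith("endsection"):
            if not has_bus_id:
                missing.append(device_index)
            in_device = False
    return missing
```

An empty list means every Device section found has a BusID. Note that a file with no Device section at all also gives an empty list, so check that the file exists and is not empty first. If anything is reported, rerunning fix_xorg_conf.py on a freshly generated xorg.conf is probably the simplest repair.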




