Setting up my own deep learning rig

Posted Jul 28, 2023 Updated Dec 23, 2023

By yxlow

10 min read

Personal Note

Currently I am pursuing a online masters in computer science from Georgia Tech. Part of the coursework (that I intend to take) such as Deep Learning and Reinforcement Learning recommends a CUDA compatible GPU. In addition, based on my personal experience, the downside of using cloud technologies is that it can be a hassle to start/shut down instances (or if you forgot to!), which introduces a barrier when trying simple experiments. Furthermore, colab recently switched to a credits model instead of a subscription model.

This, combined with the recent surge of interest in generative AI, made me decide to invest in my own deep learning rig. In addition I thought it would be cool and educational about maintaining my own “private” server.

I decided to mimic my workplace setup, which is a remote desktop called Google’s Cloudtop with Goobuntu installed, and later switched over to gLinux. For developmental work, employees would ssh to their remote desktop . This is quite similar to how you spin up an EC2 or GCE instance.

Dualboot ubuntu

This post highlights the main reasons why I decided to use dualboot, I tried WSL2 but found the experience subpar (which is partly motivated by my workplace design). Another huge advantage of dual boot is the ability to add multiple users (each with a new environment) with it’s own installations.

I bought my PC from aftershock with Windows 11 pro. I mainly got windows for edge cases (or insurance?) in the event that ubuntu is not compatiable with certain software required for my masters. Also, as recommended by the ubuntu website, windows should be installed first.

This guide is the one I followed to install dual boot ubuntu. I download the ubuntu desktop version instead of the server function, just incase I ever need the UI.

Tips

You need to prepare a thumbdrive to function as an ISO image.

I set my ubuntu to be

Default OS

Login automatically.

This is so that I can place my desktop anywhere in the house, and I just need to turn the power on. For heavy debugging tasks I will connect it to a monitor . For shutting down or rebooting, it can be done with sudo reboot and sudo shutdown now.

Configuring users

These are the commands I found useful when configuring users / deleting users.

Add user
- 1 sudo adduser <username>
Add user to sudoers group (Giving new users admin/root access):
- Online Guide on how to addd sudoers in ubuntu
- 1 usermod -aG sudo <username>

Delete user

askubuntu.com - How to delete a user & its home folder safetly

  
sudo userdel username
# Or to delete the home directory as well
sudo deluser --remove-home user

Configuring ssh

To setup your ubuntu desktop to allow ssh is incredibily easy:

  
sudo apt update && sudo apt install build-essential
sudo apt install openssh-server

Verify that ssh has been installed correctly:

sudo systemctl status ssh

Allow SSH from another device.

To allow ssh, you need to configure your router. In my case I am using the linksys velop network and the guide can be found here. I used single port forwarding and both external and internal port should be set to 22 which is the default port.

To figure out your Device IP address, just type this on your ubuntu Desktop:

Hostname -I

It should come out as 192.168.x.y and just populate this ip on your router settings. Afterwards, you should be able to ssh by doing this on your mac’s terminal:

ssh <username>@192.168.x.y -p 22

If you know your own ip address, it can also be done with ssh <username>@<your ip address> -p 22.

The following sections can now be done from your mac (or external machine)!

❯ ssh <myusername>@<my ip> -p 22
<myusername>@<my ip>'s password:
Welcome to Ubuntu 22.04.2 LTS (GNU/Linux 5.19.0-50-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

 * Introducing Expanded Security Maintenance for Applications.
   Receive updates to over 25,000 software packages with your
   Ubuntu Pro subscription. Free for personal use.

     https://ubuntu.com/pro

Expanded Security Maintenance for Applications is not enabled.

0 updates can be applied immediately.

Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status

Last login: Sat Jul 29 16:57:03 2023 from 192.168.3.112
lowyx:~$

Configuring Python

I followed this guide to install anaconda, as it is my preferred method to manage different python environments locally. The instructions are pretty straight forward:

Using Docker while remote ssh!

Turns out using docker in such an environment is pretty straight forward.

It is exactly the same from the terminal point of view, and if you are using vscode, the plugins handles majority of the complexities.

More information can be found at the remote ide section.

Look at the official anaconda repo site and select the version you need. I happen to be using the intel x64 architecture:

curl --output anaconda.sh https://repo.anaconda.com/archive/Anaconda3-2023.07-1-Linux-x86_64.sh
bash anaconda.sh

Then, reload your terminal (by exiting and ssh in again), you should be able to run conda:

conda env list

To create a new conda environment:

  
conda create -n pyt python=3.10
conda activate pyt

Configuring GPU for Pytorch

In my opinion, configuring Cuda and Nvidia for deep learning is a little tricky, I watched the following youtube video (below) but I found certain steps to be missing but it should give you a high level idea of the steps required:

Install nvidia drivers (nvidia-smi)
Configure Cuda
Install pytorch

Nvidia-smi

First check if your desktop has a nvidia gpu and then install the driver:

  
lowyx:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2684 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22ba (rev a1)

Then, check the available drivers avaliable:

  


lowyx:~$ ubuntu-drivers list
nvidia-driver-535-open, (kernel modules provided by linux-modules-nvidia-535-open-generic-hwe-22.04)
nvidia-driver-525-server, (kernel modules provided by linux-modules-nvidia-525-server-generic-hwe-22.04)
nvidia-driver-525-open, (kernel modules provided by linux-modules-nvidia-525-open-generic-hwe-22.04)
nvidia-driver-525, (kernel modules provided by linux-modules-nvidia-525-generic-hwe-22.04)
nvidia-driver-535, (kernel modules provided by linux-modules-nvidia-535-generic-hwe-22.04)
nvidia-driver-535-server, (kernel modules provided by linux-modules-nvidia-535-server-generic-hwe-22.04)
nvidia-driver-535-server-open, (kernel modules provided by linux-modules-nvidia-535-server-open-generic-hwe-22.04)

I installed the 525 version, do note that you need to reboot afterwards.

  
sudo apt-get install nvidia-driver-525
sudo reboot

Then, on the terminal run nvidia-smi and this should be the output:

  


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  Off |
|  0%   36C    P8    17W / 450W |    216MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1556      G   /usr/lib/xorg/Xorg                 64MiB |
|    0   N/A  N/A      1808      G   /usr/bin/gnome-shell              134MiB |
|    0   N/A  N/A      2165      G   ...bexec/gnome-initial-setup        6MiB |
+-----------------------------------------------------------------------------+

Cuda toolkit

If you see the nvidia-smi output above, you will notice that the CUDA Version is version 12. However, it is important for you to check the latest distribution of pytorch cuda version, for example in my case the stable build only supports up till 11.8.

So, it is important that we install the cuda 11.8 version. I just searched Google with nvidia driver for cuda 11.8 and it brought me to the nvidia developer page here.

Select the required OS, architecture, distribution and version, runfile as installer type which will give you the instructions:

wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

When you run the bash scripts, it warns you that you already have an earlier of driver installed, click Continue:

Accept the terms and conditions:

And unselect the driver installation:

After this step, reboot your desktop (sudo reboot). It will show you an output that your cuda has been added to /usr/local/cuda directory.

To make sure everything is working correctly:

Navigate to /usr/local and look for cuda-11.8 (Or the version you are using):
Navigate to cuda-11.8/bin, you should see a nvcc binary

  


lowyx:/usr/local/cuda-11.8/bin$ ./nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Once this is verified, add this to your bashrc script:

  
export CUDA_HOME=/usr/local/cuda-11.8   # Modify the path if necessary
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

Now, reboot your bash script/terminal and you should be able to run nvcc --version:

  


lowyx:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Pytorch

As mentioned, the pytorch installation instructions can be found in pytorch.org website. Since I am using linux with Conda, this are my installation steps provided:

  
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

To check if your pytorch is able to access the gpu, I found this stackoverflow to be a good guide:

  


lowyx:~$ ipython
Python 3.11.4 (main, Jul  5 2023, 13:45:01) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch
torch
In [2]: torch.cuda.is_available()
Out[2]: True

Delete Driver + Cuda

During my process, I found myself running wrong steps and had to attempt to reinstall certain drivers. This stackoverflow provided me with the details required to remove cuda from ubuntu.

  
sudo apt-get purge nvidia*
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*

Tensorflow

Installing tensorflow is pretty easy, I just followed the official documentation. As compared against the other tutorials I have found online, I was not required to install cuDNN. (It seems like I also do not need to do so for pytorch, perhaps if anyone is reading this and knows why, please let me know)

Remote IDE

Use vscode remote development. All you got to do is to connect to ssh and vscode will figure out your filesystem for you.

Remote IDE with docker

I followed this tutorial to install docker.

Similarly, you can create a Dockerfile and .devcontainer.json, install the required extensions and develop on top of docker in your remote Desktop. I may upload an example in the future if there is demand for it.

Useful Tips

Check remaining space:

df -h

Check size of current directory:

du -sh

Check size of each folder in current directory:

  
du -sh ./*

To list all available drives:

lsblk

To have a shared drive and share them among dual boot, I found this guide to be useful.

For my own linux, I created /mnt/shared and added this line to /etc/fstab:

/dev/sda2 /mnt/shared ntfs-3g defaults,auto,umask=000 0 0

Knowledge, Engineering

This post is licensed under CC BY 4.0 by the author.