
Computing Environment Overview

At a minimum, you’ll need the following to do this project:

Unless you are very patient, you’ll also need one more important resource: GPU-accelerated compute. Training a model on your laptop could be 100x - 1000x slower than training that model on a decent CUDA-capable graphics card (GPU).

Rose CSSE has two GPU servers: Gebru and Gus.

Gebru is first-come, first-served. Gus is part of our new SLURM compute cluster, and you can’t log in to it directly. The instructions here cover Gebru, which is the easier of the two to use.

Alternative workflows

You’re welcome to try any of the following approaches, but they aren’t officially supported by the course:

Outline of the official workflow

The rest of this document will walk you through the following workflow:

Part 1: Installing VS Code

TL;DR: VS Code is a free code editor from Microsoft. Install it on your local computer. You’ll also need three VS Code extensions: Python, Jupyter, and Remote - SSH.


1. Download the Installer:

Go to the official VS Code download page: https://code.visualstudio.com/download. Download the installer for your OS (on Windows, choose the “System Installer”).

2. Run the Installer:

Open the downloaded installer (a .exe file on Windows). Accept the license agreement and click “Next”.

3. Install and Launch

Click “Next” and then “Install”. Once it’s finished, launch VS Code. You should be greeted with the welcome screen.

4. Install VS Code Extensions

  1. Click on the “Extensions” icon in the Activity Bar on the left (it looks like four squares).
  2. In the search bar, type Jupyter. Find the extension named Jupyter (by Microsoft) and click the “Install” button.
  3. Repeat with Python to find and install the official Python extension by Microsoft.
  4. Repeat again with Remote - SSH (also from Microsoft)

Part 2: Connecting to Gebru

Access to Rose CSSE servers is restricted to the Rose-Hulman campus network. If you’re off campus you’ll need to first connect to the Rose-Hulman VPN. This EIT knowledge base article explains how to do this.

  1. Once you’ve installed the Remote - SSH VS Code extension, the Remote Explorer icon will appear in your Activity Bar (on the far left, near the Extensions button). Click it.
  2. Make sure SSH is selected in the dropdown at the top.
  3. Click the ‘+’ icon to add a new SSH connection.
  4. Enter the connection command in the following format, replacing username with your Rose-Hulman user name: ssh username@gebru.csse.rose-hulman.edu
  5. If it asks you which OS the server is using, select Linux.
  6. Press Enter. It will ask you to select an SSH configuration file; the default option is fine.
  7. You will now see your connection listed in the Remote Explorer. Hover over it and click the “Connect to Host in New Window” icon (it looks like a folder with a plus sign).
  8. A new VS Code window will open. It will ask for your password. Enter it and press Enter.
  9. Look at the green or blue tab in the bottom-left corner. If it says SSH: server.address, you are successfully connected!
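Behind the scenes, the Remote - SSH extension saves each host as an entry in your SSH configuration file (usually ~/.ssh/config). A typical entry looks roughly like this (the hostname is the real server; username is a placeholder for your Rose-Hulman user name):

```
Host gebru
    HostName gebru.csse.rose-hulman.edu
    User username
```

With an entry like this saved, you can also connect from any ordinary terminal by just typing ssh gebru.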

From now on, any file you open or terminal you create is on the remote server, not your local machine. Your VS Code window is (not quite literally, but basically) running on the server.

Part 3: Setting up Gebru for this project

All the commands in this section assume that you’ve opened VS Code, connected to the compute server, and have opened a terminal in VS Code. This terminal lets you run commands directly on the server (not on your local computer).

We’ll manage our Python environment with a slick new tool called uv. You can read more about installing uv in the uv documentation. On Linux, just run this command from any folder:

# Install uv on Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

Next, clone the project starter code from GitHub. The starter code includes a uv configuration file called pyproject.toml. Read this file to get a sense of which packages will be installed.

Install all of these project dependencies with one command:

# install all of the standard Python dependencies for this project
uv sync

We also need to install a couple of trickier dependencies: PyTorch and torchvision, which ship hardware-specific builds. The command below chooses the correct versions based on your hardware and operating system. It’s normal for this to take a little while.

# auto-detect and install appropriate pytorch version
uv pip install torch torchvision --torch-backend=auto

Finally, we need to register this Python environment as a Python kernel that we can use in VS Code. This is a one-time command. You won’t need to run it again unless you change your Python environment.

# Install this Python environment as a Python kernel (one-time setup)
uv run python -m ipykernel install --user --name=p3-depth --display-name="P3 Depth"

Part 4: Training a Model

This section assumes that you have VS Code installed, you connected VS Code to the GPU server via SSH, you cloned the starter code to the server, and you finished setting up your Python environment on the server as described above.

Now it’s time to start working on your project.

  1. Open the Project Folder: In your SSH-connected VS Code window, choose File > Open Folder and select the project folder you cloned on the server.
  2. Select the Kernel: The “kernel” is the Python engine that runs your code. Open the project notebook, click “Select Kernel” in the top-right corner of the notebook editor, and choose the “P3 Depth” kernel you registered earlier.
  3. Edit and Run the Code: You can now proceed with the assignment. Good luck!

Server Etiquette

There are more project teams than GPUs. We’re going to have to share. GPUs have two main resources: (1) cores, and (2) memory.

Generally, when you over-use the cores available on a GPU, all the jobs running on that card will run slowly. If you over-use the memory, usually all of the jobs on that GPU will crash. So it’s your job to check GPU usage before you start a job.

Class Rule: If you crash a classmate’s training run you owe them a cookie.

To check the status of the GPUs on your server, run this command:

nvidia-smi

You’ll get a snapshot report of GPU usage on the server. If you’d prefer a continuously updating dashboard, you can try nvtop instead of nvidia-smi.

Every time you start working on the GPU server, check GPU usage with nvidia-smi or nvtop and choose a card that isn’t full. There’s a spot in your code notebook for you to type your chosen card number. Then restart your Python kernel and start training. (The restart is important! You have to restart your Python kernel to switch from one GPU to another.)
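The usual way a notebook pins itself to one card is by setting the CUDA_VISIBLE_DEVICES environment variable before PyTorch is imported. A minimal sketch (assuming you picked card 1 from nvidia-smi; your notebook’s actual cell may differ):

```python
import os

# Pin this process to GPU card 1 (pick your card number from nvidia-smi).
# This must run BEFORE `import torch`, or it has no effect —
# which is also why you must restart the kernel to switch cards.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# From here on, PyTorch sees only that card, exposed as device "cuda:0".
```

Because PyTorch reads this variable once at import time, changing it in a running kernel does nothing; restarting the kernel is what makes the new value take effect.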

You can make your jobs smaller to save space. Smaller batch sizes and less-complicated models reduce both core usage and memory usage.
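To see why batch size matters, here’s a rough back-of-the-envelope estimate of input-batch memory (a sketch with made-up image dimensions, not your project’s actual numbers):

```python
def batch_megabytes(batch_size, channels=3, height=256, width=256):
    """Rough GPU memory for one float32 input batch:
    batch_size x channels x height x width x 4 bytes per float."""
    bytes_needed = batch_size * channels * height * width * 4
    return bytes_needed / 2**20  # convert bytes to MB

print(batch_megabytes(32))  # 32 images -> 24.0 MB
print(batch_megabytes(8))   # 8 images  ->  6.0 MB
```

The real footprint is several times larger once activations, gradients, and optimizer state are included, but the proportionality holds: halving the batch size roughly halves the activation memory.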

Getting Help

If you have a question about using the GPU servers, please ask me! Email and Teams are best if you can’t ask in-person.

If you think you need some system administration support, like installing system-wide software, or troubleshooting a server that has gone offline, please email both me and the CSSE department sysadmin, Darryl Mouck.