Beehive Documentation

Last updated: 09/09/2025, 11:53:54 AM

Apptainer Containers

Apptainer is a container technology designed for HPC environments that allows you to create portable, reproducible computing environments. This guide shows how to build a PyTorch GPU container for machine learning on the Beehive cluster.

Building a PyTorch GPU Container

Step 1: Get a GPU Node and Set Up Environment

# Request an interactive GPU node
srun --pty -p gpu -G1 bash

# Set up working directory on scratch filesystem (avoid filling home directory)
export MYDIR=/scratch/$USER/apptainer && mkdir -p $MYDIR && cd $MYDIR

# Configure Apptainer to use scratch for downloads and temporary files
export APPTAINER_CACHEDIR=$MYDIR/apptainer_cache && mkdir -p $APPTAINER_CACHEDIR
export TMPDIR=$MYDIR/tmp && mkdir -p $TMPDIR
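If you rebuild containers often, the setup above can be collected into a small shell helper. This is only a convenience sketch; SCRATCH_ROOT is an illustrative override (not a Beehive convention) so the function defaults to /scratch/$USER:

```shell
# Create the Apptainer working directories under a configurable root.
# SCRATCH_ROOT is an illustrative override; it defaults to /scratch/$USER.
setup_apptainer_dirs() {
    local root="${SCRATCH_ROOT:-/scratch/$USER}"
    export MYDIR="$root/apptainer"
    export APPTAINER_CACHEDIR="$MYDIR/apptainer_cache"
    export TMPDIR="$MYDIR/tmp"
    mkdir -p "$MYDIR" "$APPTAINER_CACHEDIR" "$TMPDIR"
}
```

Call setup_apptainer_dirs && cd "$MYDIR" at the start of a build session.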

Step 2: Build Sandbox from CUDA Base Image

# Create sandbox directory directly from NVIDIA CUDA Docker image
apptainer build --sandbox pytorch_env docker://nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

The --sandbox flag creates a writable directory structure instead of a read-only .sif file, allowing you to customize the container.

Step 3: Install Python and PyTorch

# Enter the container with GPU support and write permissions
apptainer shell --nv --fakeroot --writable --no-home pytorch_env/

Key flags explained:

  • --nv: Enable NVIDIA GPU support (access to CUDA drivers and libraries)
  • --fakeroot: Gain root-like privileges inside container for package installation
  • --writable: Allow modifications to the sandbox container
  • --no-home: Don't mount your home directory (cleaner environment)

You'll see several warnings when entering the container; these are normal and can be ignored:

WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-smi [files]: /usr/bin/nvidia-smi doesn't exist in container
...

# Inside container: install system packages
apt-get update && apt-get install -y python3 python3-pip python3-venv git
apt-get clean

# Create and activate Python virtual environment
python3 -m venv /opt/venv
source /opt/venv/bin/activate

# Install PyTorch
pip install --upgrade pip --no-cache-dir
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 --no-cache-dir

The --no-cache-dir flag tells pip not to cache downloaded packages, saving space in your container.

# Test PyTorch installation inside container
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"

# Exit container
exit

# Test PyTorch installation from outside the container
apptainer exec --nv pytorch_env /opt/venv/bin/python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"

Step 4: Build Final Container and Test

# Build final immutable container from sandbox
apptainer build pytorch.sif pytorch_env/

# Test GPU access in final container
apptainer exec --nv pytorch.sif /opt/venv/bin/python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"

Step 5: Copy to Persistent Storage and Cleanup

# Copy the container to your home directory or project space
cp pytorch.sif ~/pytorch.sif

# Clean up the scratch directory
cd && rm -rf $MYDIR

Using Your Container

Interactive Use

# Enter container with GPU support
apptainer shell --nv ~/pytorch.sif
source /opt/venv/bin/activate
python  # Now you can use PyTorch with GPU

Running Scripts

# Run a Python script
apptainer exec --nv ~/pytorch.sif /opt/venv/bin/python your_script.py

Batch Jobs

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00

apptainer exec --nv ~/pytorch.sif /opt/venv/bin/python /path/to/your/script.py

Save the script (for example as pytorch_job.sh) and submit it with sbatch pytorch_job.sh.

Key Points

  • Always use --nv when running the container (apptainer shell, exec, or run) so the GPU drivers and CUDA libraries are visible inside
  • Use scratch filesystem (/scratch/$USER/apptainer) to avoid filling home directory
  • Use --no-cache-dir with pip to save space
  • Virtual environment path is /opt/venv/bin/python
  • Build on GPU nodes, never on the head node
  • Test in both sandbox and final container to ensure everything works
  • Copy container to persistent storage (home directory or project space)
  • Clean up scratch directory after copying container
  • Ignore warnings about missing NVIDIA files when entering the writable sandbox; these are normal

Advanced Options

Definition Files

For reproducible builds, use a definition file instead of the interactive sandbox method. See the Apptainer documentation.
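As a rough, untested sketch, a definition file mirroring the manual steps above might look like this (pytorch.def is a hypothetical filename):

```
Bootstrap: docker
From: nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

%post
    apt-get update && apt-get install -y python3 python3-pip python3-venv git
    apt-get clean
    python3 -m venv /opt/venv
    /opt/venv/bin/pip install --upgrade pip --no-cache-dir
    /opt/venv/bin/pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
        --index-url https://download.pytorch.org/whl/cu124 --no-cache-dir

%environment
    export PATH=/opt/venv/bin:$PATH
```

Build it on a GPU node with apptainer build pytorch.sif pytorch.def.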

Binding Directories

# Mount scratch directory in container (access host filesystem)
apptainer exec --nv --bind /scratch:/scratch ~/pytorch.sif /opt/venv/bin/python script.py

The --bind flag mounts host directories inside the container, allowing access to datasets and output directories.

Environment Variables

# Pass custom environment variables to the container
apptainer exec --nv --env MY_VARIABLE=value ~/pytorch.sif /opt/venv/bin/python script.py

The --env flag sets environment variables inside the container.
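The variable is then visible to whatever the container runs. A minimal sketch of a consumer (the function name and fallback value here are illustrative, not part of Apptainer):

```shell
# Echo MY_VARIABLE from the environment, or a fallback default if unset.
print_my_variable() {
    echo "${MY_VARIABLE:-value}"
}
```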