Apptainer Containers
Apptainer is a container technology designed for HPC environments that allows you to create portable, reproducible computing environments. This guide shows how to build a PyTorch GPU container for machine learning on the Beehive cluster.
Building a PyTorch GPU Container
Step 1: Get a GPU Node and Set Up Environment
# Request an interactive GPU node
srun --pty -p gpu -G1 bash
# Set up working directory on scratch filesystem (avoid filling home directory)
export MYDIR=/scratch/$USER/apptainer && mkdir -p $MYDIR && cd $MYDIR
# Configure Apptainer to use scratch for downloads and temporary files
export APPTAINER_CACHEDIR=$MYDIR/apptainer_cache && mkdir -p $APPTAINER_CACHEDIR
export TMPDIR=$MYDIR/tmp && mkdir -p $TMPDIR
Step 2: Build Sandbox from CUDA Base Image
# Create sandbox directory directly from NVIDIA CUDA Docker image
apptainer build --sandbox pytorch_env docker://nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
The --sandbox flag creates a writable directory structure instead of a read-only .sif file, allowing you to customize the container.
Step 3: Install Python and PyTorch
# Enter the container with GPU support and write permissions
apptainer shell --nv --fakeroot --writable --no-home pytorch_env/
Key flags explained:
- --nv: Enable NVIDIA GPU support (access to CUDA drivers and libraries)
- --fakeroot: Gain root-like privileges inside the container for package installation
- --writable: Allow modifications to the sandbox container
- --no-home: Don't mount your home directory (cleaner environment)
You'll see several warnings when entering the container; these are normal and can be ignored:
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-smi [files]: /usr/bin/nvidia-smi doesn't exist in container
...
# Inside container: install system packages
apt-get update && apt-get install -y python3 python3-pip python3-venv git
apt-get clean
# Create and activate Python virtual environment
python3 -m venv /opt/venv
source /opt/venv/bin/activate
# Install PyTorch
pip install --upgrade pip --no-cache-dir
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 --no-cache-dir
The --no-cache-dir flag tells pip not to cache downloaded packages, saving space in your container.
# Test PyTorch installation inside container
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"
# Exit container
exit
# Test PyTorch installation from outside the container
apptainer exec --nv pytorch_env /opt/venv/bin/python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"
Step 4: Build Final Container and Test
# Build final immutable container from sandbox
apptainer build pytorch.sif pytorch_env/
# Test GPU access in final container
apptainer exec --nv pytorch.sif /opt/venv/bin/python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"
Step 5: Copy to Persistent Storage and Cleanup
# Copy the container to your home directory or project space
cp pytorch.sif ~/pytorch.sif
# Clean up the scratch directory
cd && rm -rf $MYDIR
Using Your Container
Interactive Use
# Enter container with GPU support
apptainer shell --nv ~/pytorch.sif
source /opt/venv/bin/activate
python # Now you can use PyTorch with GPU
Running Scripts
# Run a Python script
apptainer exec --nv ~/pytorch.sif /opt/venv/bin/python your_script.py
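Here `your_script.py` is whatever you want to run. As a sketch, a minimal hypothetical script that picks the GPU when available (and falls back to CPU, so it also runs on non-GPU nodes) might look like:

```python
# train_sketch.py -- hypothetical example script, not part of the container build
import torch

# Use the GPU when the --nv flag has made it visible, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(8, 4, device=device)
w = torch.randn(4, 2, device=device)
y = x @ w

print(f"device: {device}")
print(f"output shape: {tuple(y.shape)}")
```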
Batch Jobs
#!/bin/bash
#SBATCH --partition=gpu --gres=gpu:1 --time=04:00:00
#SBATCH --output=pytorch_%j.log
apptainer exec --nv ~/pytorch.sif /opt/venv/bin/python /path/to/your/script.py
Key Points
- Always use --nv for GPU access in both building and running
- Use the scratch filesystem (/scratch/$USER/apptainer) to avoid filling your home directory
- Use --no-cache-dir with pip to save space
- The virtual environment's Python is at /opt/venv/bin/python
- Build on GPU nodes, never on the head node
- Test in both the sandbox and the final container to ensure everything works
- Copy the container to persistent storage (home directory or project space)
- Clean up the scratch directory after copying the container
- Ignore warnings about missing NVIDIA files when building - these are normal
Advanced Options
Definition Files
For reproducible builds, use definition files instead of the sandbox method. See the Apptainer documentation for the full syntax.
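As a sketch, a definition file reproducing the sandbox steps above might look like this (same base image and package versions as in this guide; adjust to your needs):

```
Bootstrap: docker
From: nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

%post
    apt-get update && apt-get install -y python3 python3-pip python3-venv git
    apt-get clean
    python3 -m venv /opt/venv
    /opt/venv/bin/pip install --upgrade pip --no-cache-dir
    /opt/venv/bin/pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
        --index-url https://download.pytorch.org/whl/cu124 --no-cache-dir

%environment
    export PATH=/opt/venv/bin:$PATH
```

Build it in one step with: apptainer build pytorch.sif pytorch.def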
Binding Directories
# Mount scratch directory in container (access host filesystem)
apptainer exec --nv --bind /scratch:/scratch ~/pytorch.sif /opt/venv/bin/python script.py
The --bind flag mounts host directories inside the container using source:destination pairs (multiple pairs can be separated by commas), giving your jobs access to datasets and output directories.
Environment Variables
# Pass custom environment variables to the container
apptainer exec --nv --env MY_VARIABLE=value ~/pytorch.sif /opt/venv/bin/python script.py
The --env flag sets environment variables inside the container.
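Inside the container, the variable is visible to your script through the ordinary process environment. A small sketch using the variable name from the example above (the default value is hypothetical):

```python
import os

# MY_VARIABLE arrives via `apptainer exec --env MY_VARIABLE=value ...`;
# a fallback default keeps the script usable outside the container too
value = os.environ.get("MY_VARIABLE", "unset")
print(f"MY_VARIABLE = {value}")
```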