Beehive Documentation

Last updated: 09/09/2025, 11:53:54 AM

Apptainer Containers

Apptainer is a container technology designed for HPC environments that allows you to create portable, reproducible computing environments. This guide shows how to build a PyTorch GPU container for machine learning on the Beehive cluster.

Building a PyTorch GPU Container

Step 1: Get a GPU Node and Set Up Environment

# Request an interactive GPU node
srun --pty -p gpu -G1 bash

# Set up working directory on scratch filesystem (avoid filling home directory)
export MYDIR=/scratch/$USER/apptainer && mkdir -p $MYDIR && cd $MYDIR

# Configure Apptainer to use scratch for downloads and temporary files
export APPTAINER_CACHEDIR=$MYDIR/apptainer_cache && mkdir -p $APPTAINER_CACHEDIR
export TMPDIR=$MYDIR/tmp && mkdir -p $TMPDIR
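If you rebuild containers often, the setup above can be collected into a small shell helper. This is only a convenience sketch; SCRATCH_ROOT is an illustrative override (not a Beehive convention) so the function defaults to /scratch/$USER:

```shell
# Create the Apptainer working directories under a configurable root.
# SCRATCH_ROOT is an illustrative override; it defaults to /scratch/$USER.
setup_apptainer_dirs() {
    local root="${SCRATCH_ROOT:-/scratch/$USER}"
    export MYDIR="$root/apptainer"
    export APPTAINER_CACHEDIR="$MYDIR/apptainer_cache"
    export TMPDIR="$MYDIR/tmp"
    mkdir -p "$MYDIR" "$APPTAINER_CACHEDIR" "$TMPDIR"
}
```

Call setup_apptainer_dirs && cd "$MYDIR" at the start of a build session.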

Step 2: Build Sandbox from CUDA Base Image

# Create sandbox directory directly from NVIDIA CUDA Docker image
apptainer build --sandbox pytorch_env docker://nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

The --sandbox flag creates a writable directory structure instead of a read-only .sif file, allowing you to customize the container.

Step 3: Install Python and PyTorch

# Enter the container with GPU support and write permissions
apptainer shell --nv --fakeroot --writable --no-home pytorch_env/

Key flags explained:

  • --nv: Enable NVIDIA GPU support (access to CUDA drivers and libraries)
  • --fakeroot: Gain root-like privileges inside container for package installation
  • --writable: Allow modifications to the sandbox container
  • --no-home: Don't mount your home directory (cleaner environment)

You'll see several warnings when entering the container; these are normal and can be ignored:

WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-smi [files]: /usr/bin/nvidia-smi doesn't exist in container
...

# Inside container: install system packages
apt-get update && apt-get install -y python3 python3-pip python3-venv git
apt-get clean

# Create and activate Python virtual environment
python3 -m venv /opt/venv
source /opt/venv/bin/activate

# Install PyTorch
pip install --upgrade pip --no-cache-dir
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 --no-cache-dir

The --no-cache-dir flag tells pip not to cache downloaded packages, saving space in your container.

# Test PyTorch installation inside container
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"

# Exit container
exit

# Test PyTorch installation from outside the container
apptainer exec --nv pytorch_env /opt/venv/bin/python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"

Step 4: Build Final Container and Test

# Build final immutable container from sandbox
apptainer build pytorch.sif pytorch_env/

# Test GPU access in final container
apptainer exec --nv pytorch.sif /opt/venv/bin/python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"

Step 5: Copy to Persistent Storage and Cleanup

# Copy the container to your home directory or project space
cp pytorch.sif ~/pytorch.sif

# Clean up the scratch directory
cd && rm -rf $MYDIR

Using Your Container

Interactive Use

# Enter container with GPU support
apptainer shell --nv ~/pytorch.sif
source /opt/venv/bin/activate
python  # Now you can use PyTorch with GPU

Running Scripts

# Run a Python script
apptainer exec --nv ~/pytorch.sif /opt/venv/bin/python your_script.py

Batch Jobs

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00

apptainer exec --nv ~/pytorch.sif /opt/venv/bin/python /path/to/your/script.py

Save the script (for example as pytorch_job.sh) and submit it with sbatch pytorch_job.sh.

Key Points

  • Always use --nv when running the container (apptainer shell, exec, or run) so the GPU drivers and CUDA libraries are visible inside
  • Use scratch filesystem (/scratch/$USER/apptainer) to avoid filling home directory
  • Use --no-cache-dir with pip to save space
  • Virtual environment path is /opt/venv/bin/python
  • Build on GPU nodes, never on the head node
  • Test in both sandbox and final container to ensure everything works
  • Copy container to persistent storage (home directory or project space)
  • Clean up scratch directory after copying container
  • Ignore warnings about missing NVIDIA files when entering the writable sandbox; these are normal

Advanced Options

Definition Files

For reproducible builds, use a definition file instead of the interactive sandbox method. See the Apptainer documentation.
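As a rough, untested sketch, a definition file mirroring the manual steps above might look like this (pytorch.def is a hypothetical filename):

```
Bootstrap: docker
From: nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

%post
    apt-get update && apt-get install -y python3 python3-pip python3-venv git
    apt-get clean
    python3 -m venv /opt/venv
    /opt/venv/bin/pip install --upgrade pip --no-cache-dir
    /opt/venv/bin/pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
        --index-url https://download.pytorch.org/whl/cu124 --no-cache-dir

%environment
    export PATH=/opt/venv/bin:$PATH
```

Build it on a GPU node with apptainer build pytorch.sif pytorch.def.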

Binding Directories

# Mount scratch directory in container (access host filesystem)
apptainer exec --nv --bind /scratch:/scratch ~/pytorch.sif /opt/venv/bin/python script.py

The --bind flag mounts host directories inside the container, allowing access to datasets and output directories.

Environment Variables

# Pass custom environment variables to the container
apptainer exec --nv --env MY_VARIABLE=value ~/pytorch.sif /opt/venv/bin/python script.py

The --env flag sets environment variables inside the container.
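The variable is then visible to whatever the container runs. A minimal sketch of a consumer (the function name and fallback value here are illustrative, not part of Apptainer):

```shell
# Echo MY_VARIABLE from the environment, or a fallback default if unset.
print_my_variable() {
    echo "${MY_VARIABLE:-value}"
}
```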