Beehive Documentation

Last updated: 09/08/2025, 01:20:55 PM

Software Setup

⚠️ Important: Never install software on the head node. Always request an interactive job on a compute node before installing packages or setting up environments.


Python Virtual Environments

Python virtual environments provide isolated Python installations for each project, preventing dependency conflicts and ensuring reproducible environments. This is the recommended approach for Python development on Beehive.

First, get an interactive job on a compute node:

srun --pty bash
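Software must not be installed on the head node, so a small guard can catch the mistake before pip runs. The sketch below assumes that Slurm sets SLURM_JOB_ID inside a job started with srun but not on the head node. The function name require_compute_node is hypothetical.

```shell
# Hypothetical helper: refuse to continue unless this shell is inside a Slurm job.
# SLURM_JOB_ID is set by Slurm inside jobs and job steps, not on the head node.
require_compute_node() {
  if [ -z "${SLURM_JOB_ID:-}" ]; then
    echo "Not inside a Slurm job on $(hostname); run 'srun --pty bash' first." >&2
    return 1
  fi
  echo "Inside Slurm job $SLURM_JOB_ID on $(hostname)"
}
```

For example, `require_compute_node && pip install numpy --no-cache-dir` only runs pip when you are inside a job.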

For better organization and isolation, create a separate virtual environment for each project. You have two location options:

Option 1: Home directory (persistent across nodes)

export PROJECT_DIR=$HOME/my_project
mkdir -p $PROJECT_DIR && cd $PROJECT_DIR

Option 2: Scratch space (faster I/O, node-specific)

export PROJECT_DIR=/scratch/$USER/my_project
mkdir -p $PROJECT_DIR && cd $PROJECT_DIR

Note: /scratch is local to each compute node and provides faster I/O, but data is not shared between nodes. For persistent storage across all nodes, use your home directory.
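Because /scratch is node-local, a project created there is not visible from other nodes. The hypothetical check_project_dir function below is one way to confirm that the project directory exists on the node you were given before you try to activate anything.

```shell
# Hypothetical helper: confirm that a project directory exists on this node.
# A /scratch project created on another node will show up as missing here.
check_project_dir() {
  local dir="$1"
  if [ -d "$dir" ]; then
    echo "Found $dir on $(hostname)"
  else
    echo "Missing $dir on $(hostname) (/scratch is local to each node)" >&2
    return 1
  fi
}
```

For example: `check_project_dir /scratch/$USER/my_project`.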

  1. Create a virtual environment for this project:
python3 -m venv $PROJECT_DIR/venv
  2. Create a project environment file (the escaped \$PYTHONPATH is expanded when the file is sourced, not when it is written):
cat > my_project.env << EOF
export PROJECT_DIR=$PROJECT_DIR
source $PROJECT_DIR/venv/bin/activate
export PYTHONPATH=$PROJECT_DIR\${PYTHONPATH:+:\$PYTHONPATH}
echo "Activated virtual environment for my_project"
EOF
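To see exactly what such a heredoc writes, you can generate a throwaway copy and inspect it. With an unquoted delimiter (<< EOF), $PROJECT_DIR is replaced by its value at write time, while a backslash-escaped variable is copied literally into the file. The form ${VAR:+...} expands to its text only when VAR is set and non-empty, which avoids a trailing colon in PYTHONPATH. The sketch below assumes a temporary directory from mktemp rather than your real project.

```shell
# Minimal sketch: inspect what an unquoted heredoc writes, in a throwaway directory.
demo_dir=$(mktemp -d)
PROJECT_DIR=$demo_dir/my_project
mkdir -p "$PROJECT_DIR"

# $PROJECT_DIR expands now; \$PYTHONPATH is escaped, so it expands at source time.
# The ${PYTHONPATH:+...} form appends ":<old value>" only if PYTHONPATH is non-empty.
cat > "$PROJECT_DIR/my_project.env" << EOF
export PROJECT_DIR=$PROJECT_DIR
export PYTHONPATH=$PROJECT_DIR\${PYTHONPATH:+:\$PYTHONPATH}
EOF

# Shows the literal project path and the still-unexpanded PYTHONPATH logic
cat "$PROJECT_DIR/my_project.env"
```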

Using Your Project Environment

  1. Activate the environment (run this from your project directory, or give the full path to the .env file):
source my_project.env
  2. Update pip to the latest version:
pip install --upgrade pip --no-cache-dir
  3. Install packages using pip (--no-cache-dir keeps pip from storing downloads in ~/.cache/pip, which saves home directory space):
# Install individual packages
pip install numpy pandas matplotlib --no-cache-dir

# Or install from a requirements file (if you have an existing project)
pip install -r requirements.txt --no-cache-dir
  4. Save your environment:
# Record the installed packages and their exact versions
pip freeze > requirements.txt
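To confirm that activation worked, you can compare sys.prefix with sys.base_prefix: inside a virtual environment they differ. The sketch below uses a throwaway venv created with --without-pip so that it runs quickly and offline; your real project venv should keep pip.

```shell
# Minimal sketch: create and activate a throwaway venv, then check which Python is active.
demo_dir=$(mktemp -d)
python3 -m venv --without-pip "$demo_dir/venv"
source "$demo_dir/venv/bin/activate"

# Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix
python -c 'import sys; print("venv active:", sys.prefix != sys.base_prefix)'
command -v python    # should resolve to .../venv/bin/python

deactivate           # leave the venv (function defined by the activate script)
```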

This approach has several advantages:

  • Each project has its own isolated Python environment
  • Uses Python's built-in virtual environment system
  • Dependencies are tracked in requirements.txt
  • Easy to share environment specifications with collaborators
  • No conflicts between different projects' dependencies
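Because pip freeze pins exact versions, the same environment can be rebuilt elsewhere, for example by a collaborator or on a new node when the original lived in node-local /scratch. The recreate_venv function below is a hypothetical sketch that assumes the layout used above ($PROJECT_DIR/venv and $PROJECT_DIR/requirements.txt). Like any install step, run it on a compute node.

```shell
# Hypothetical helper: rebuild a project venv from a saved requirements.txt.
recreate_venv() {
  local project_dir="$1"
  local reqs="$project_dir/requirements.txt"
  if [ ! -f "$reqs" ]; then
    echo "No requirements file at $reqs" >&2
    return 1
  fi
  # Create the venv, update pip, then install the pinned packages
  python3 -m venv "$project_dir/venv" || return 1
  "$project_dir/venv/bin/pip" install --upgrade pip --no-cache-dir || return 1
  "$project_dir/venv/bin/pip" install -r "$reqs" --no-cache-dir
}
```

For example: `recreate_venv /scratch/$USER/my_project` after copying requirements.txt into that directory.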

PyTorch Installation

PyTorch is a popular deep learning framework with excellent GPU support. This guide will help you install PyTorch with CUDA support on the Beehive cluster using the Python virtual environment approach described above.

Prerequisites

This guide follows the Python Virtual Environments section above, using a project named pytorch_project.

Installation Steps

  1. Get an interactive job on a GPU node:
srun -p gpu -G1 --pty bash
  2. Set up your PyTorch project using the virtual environment approach:
# For persistent storage (recommended for most users)
export PROJECT_DIR=$HOME/pytorch_project
# Or for faster I/O during training (node-specific)
# export PROJECT_DIR=/scratch/$USER/pytorch_project
mkdir -p $PROJECT_DIR && cd $PROJECT_DIR
python3 -m venv $PROJECT_DIR/venv
# Create project environment file
cat > pytorch_project.env << EOF
export PROJECT_DIR=$PROJECT_DIR
source $PROJECT_DIR/venv/bin/activate
export PYTHONPATH=$PROJECT_DIR\${PYTHONPATH:+:\$PYTHONPATH}
echo "Activated PyTorch environment"
EOF
  3. Activate the environment and install PyTorch:
source pytorch_project.env
pip install --upgrade pip --no-cache-dir
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 --no-cache-dir
  4. Test your installation:
python -c "import torch; print(torch.cuda.is_available())"

This should print True if PyTorch can access the GPU.
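If the one-liner prints False, a slightly longer check can help narrow down the cause: a CPU-only wheel reports no CUDA version, while a CUDA build that sees no devices often means the job has no GPU allocated. The script below is a sketch; check_gpu.py is a hypothetical name, and it assumes Slurm exports CUDA_VISIBLE_DEVICES for GPU jobs, which is common but depends on the cluster's configuration.

```shell
# Minimal sketch: write a reusable GPU check script (hypothetical filename).
cat > check_gpu.py << 'EOF'
import os
import sys

try:
    import torch
except ImportError:
    print("torch is not installed in the active environment")
    sys.exit(1)

print("torch version:", torch.__version__)
print("CUDA build:", torch.version.cuda)  # None for CPU-only wheels
print("CUDA available:", torch.cuda.is_available())
# GPUs assigned to the job, if Slurm exported CUDA_VISIBLE_DEVICES
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", "unset"))
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
EOF

# Run it with the PyTorch environment activated
python3 check_gpu.py
```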

Using PyTorch in Future Sessions

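In future sessions, you only need to activate your virtual environment. Use the full path to the environment file (adjust it if your project is in /scratch), or run the command from your project directory:

source $HOME/pytorch_project/pytorch_project.env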
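If you work on several projects, a small shell function can save typing, and you could add it to your ~/.bashrc. The hypothetical activate_project below assumes the layout used in this guide, where the environment file for a project named <name> lives at $HOME/<name>/<name>.env; it does not cover projects kept in /scratch.

```shell
# Hypothetical helper: activate a project by name, assuming $HOME/<name>/<name>.env.
activate_project() {
  local name="$1"
  local env_file="$HOME/$name/$name.env"
  if [ ! -f "$env_file" ]; then
    echo "No environment file at $env_file" >&2
    return 1
  fi
  source "$env_file"
}
```

For example: `activate_project pytorch_project`.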

Running PyTorch Jobs

When submitting batch jobs that use PyTorch, make sure to include the activation command in your job script:

#!/bin/bash
#SBATCH --job-name=pytorch_job
#SBATCH --output=pytorch_%j.log
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4

# Activate the virtual environment with PyTorch (use the correct path to your project)
source $HOME/pytorch_project/pytorch_project.env

# Run your PyTorch script
python train_model.py
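If the environment file cannot be found, the script above keeps going after the failed source and fails later with a less obvious error, such as a missing torch module. The variant below, written to a hypothetical pytorch_job.sbatch, stops early with a clear message and records the node and assigned GPUs in the log. SLURM_JOB_ID is set by Slurm inside jobs; CUDA_VISIBLE_DEVICES is assumed to be exported for GPU jobs.

```shell
# Write a more defensive version of the job script (hypothetical filename).
cat > pytorch_job.sbatch << 'EOF'
#!/bin/bash
#SBATCH --job-name=pytorch_job
#SBATCH --output=pytorch_%j.log
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4

ENV_FILE=$HOME/pytorch_project/pytorch_project.env

# Stop immediately if the environment file is missing on this node
if [ ! -f "$ENV_FILE" ]; then
    echo "Environment file not found: $ENV_FILE (node $(hostname))" >&2
    exit 1
fi
source "$ENV_FILE"

# Record where the job ran and which GPUs were assigned
echo "Job ${SLURM_JOB_ID:-unknown} on $(hostname), CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"

python train_model.py
EOF
```

Submit it with sbatch pytorch_job.sbatch as usual.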

Jupyter Notebook Usage

Jupyter Notebooks provide an interactive environment for code development, data exploration, and visualization. This guide shows how to run Jupyter on the Beehive cluster using the Python virtual environment approach described above.

Prerequisites

This guide follows the Python Virtual Environments section above, using a project named jupyter_project.

Setup and Installation

  1. Get an interactive job on a node:
srun -p dev-cpu --pty bash
  2. Set up your Jupyter project using the virtual environment approach:
# For persistent storage (recommended for most users)
export PROJECT_DIR=$HOME/jupyter_project
# Or for faster I/O (node-specific)
# export PROJECT_DIR=/scratch/$USER/jupyter_project
mkdir -p $PROJECT_DIR && cd $PROJECT_DIR
python3 -m venv $PROJECT_DIR/venv
# Create project environment file
cat > jupyter_project.env << EOF
export PROJECT_DIR=$PROJECT_DIR
source $PROJECT_DIR/venv/bin/activate
export PYTHONPATH=$PROJECT_DIR\${PYTHONPATH:+:\$PYTHONPATH}
echo "Activated Jupyter environment"
EOF
  3. Activate the environment and install Jupyter:
source jupyter_project.env
pip install --upgrade pip --no-cache-dir
pip install notebook --no-cache-dir

Running Jupyter Notebook

You can run Jupyter Notebook as either an interactive or batch job.

Interactive Method

  1. Start an interactive job:
srun --pty bash
  2. Set up your environment:
source $HOME/jupyter_project/jupyter_project.env   # Activate virtual env (adjust the path if needed)
export NODEIP=$(hostname -i)     # Get the IP address of your node
export NODEPORT=$(( $RANDOM + 1024 ))  # Pick a random port from 1024 to 33791
echo $NODEIP:$NODEPORT           # Note these values for the SSH tunnel
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser
  3. In a new terminal on your local machine, create an SSH tunnel:
ssh -N -L 8888:$NODEIP:$NODEPORT username@beehive.ttic.edu

(These variables are not set on your local machine, so replace $NODEIP, $NODEPORT, and username with the actual values.)

  4. Open your local browser and visit http://localhost:8888. The login token is displayed in the output of the jupyter-notebook command.
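Picking a port with $RANDOM can occasionally collide with a port already in use on the node. An alternative, sketched below, is to let the operating system choose: binding a socket to port 0 returns a port that is free at that moment. There is a short window between closing the socket and Jupyter binding the port, so this reduces collisions rather than ruling them out. The one-liner uses only Python's standard socket module.

```shell
# Ask the OS for a currently free TCP port (bind to port 0, read back the port, close)
NODEPORT=$(python3 -c 'import socket; s = socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
export NODEPORT
echo "Using port $NODEPORT"
```

Use it in place of the $RANDOM line in step 2, then start jupyter-notebook with --port=$NODEPORT as before.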

Batch Method

  1. Create a batch job script named jupyter-notebook.sbatch:
#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --output=jupyter_%j.log
#SBATCH --partition=cpu
#SBATCH --cpus-per-task=2

NODEIP=$(hostname -i)
NODEPORT=$(( $RANDOM + 1024))
echo "ssh command: ssh -N -L 8888:$NODEIP:$NODEPORT $(whoami)@beehive.ttic.edu"

source $HOME/jupyter_project/jupyter_project.env
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser
  2. Submit the batch job:
sbatch jupyter-notebook.sbatch
  3. Check the job output file (jupyter_<jobid>.log) to find the SSH command to use when accessing your notebook.
  4. On your local machine, create an SSH tunnel as instructed in the output:
ssh -N -L 8888:###.###.###.###:#### username@beehive.ttic.edu
  5. Open your local browser and visit http://localhost:8888. The login token appears in the same job output file.
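The batch script prints a line beginning with "ssh command:" to its log, so the tunnel command can be extracted instead of copied by hand. The tunnel_cmd function below is a hypothetical helper; it assumes the log name pattern jupyter_<jobid>.log from the script's --output setting.

```shell
# Hypothetical helper: print the tunnel command from a Jupyter batch job log.
# Usage: tunnel_cmd jupyter_<jobid>.log
tunnel_cmd() {
  local log="$1"
  # Take the first "ssh command:" line and strip the label
  grep -m1 '^ssh command:' "$log" | sed 's/^ssh command: //'
}
```

Run it on Beehive after the job has started (squeue -u $USER shows its state), then paste the output into a terminal on your local machine. When you are finished, stop the server with scancel <jobid>; otherwise it keeps running until the job's time limit.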

Troubleshooting

If you're having problems with the token or password:

  1. Stop the notebook server
  2. Remove the runtime files:
rm -rf ~/.local/share/jupyter/runtime
  3. Restart the server
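The path above is Jupyter's usual default runtime directory on Linux. If JUPYTER_RUNTIME_DIR or another setting has moved it, jupyter --runtime-dir prints the location actually in use. The sketch below asks Jupyter when the command is available (for example, with your Jupyter environment activated) and falls back to the default path otherwise.

```shell
# Prefer the location Jupyter reports; fall back to the usual default path
if command -v jupyter > /dev/null 2>&1; then
  RUNTIME_DIR=$(jupyter --runtime-dir)
else
  RUNTIME_DIR=$HOME/.local/share/jupyter/runtime
fi

# Guard against an empty value before deleting anything
if [ -n "$RUNTIME_DIR" ]; then
  echo "Removing $RUNTIME_DIR"
  rm -rf "$RUNTIME_DIR"
fi
```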