Beehive: Usage & Guidelines

Current Status: [Docs] | [Job Process Info] | [Job Queue] | [Cluster Usage & Load]

The TTIC cluster beehive is a pool of machines, many with modern GPUs, to which users may submit compute jobs through the slurm scheduler.

General Guidelines

Much of the cluster infrastructure relies on users monitoring their own jobs and usage, and being careful to adhere to policy, rather than on automated scripts that kill jobs or block accounts for infractions. We request that you respect this collegial arrangement, and make sure that your usage of the cluster adheres to the following guidelines:

  1. Use the head node (i.e., beehive.ttic.edu) only to submit and monitor jobs. Do not run any computationally intensive processes (including installing software). You should use an interactive job for such things.

  2. We generally discourage the use of interactive jobs, but recognize that they are necessary in some workflows (for example, for compilation and initial testing of programs). However, we find that with interactive jobs, users often lose track of which machine they are on, and either (a) mistake the head node for a compute node and start running their jobs on the head node, which slows it down and makes it difficult or impossible for other users to submit their jobs; or (b) mistake a compute node for the head node and use it to submit jobs, taking up a slot on the compute node that then sits idle. If you do use interactive jobs, please keep track of which machine you are on!

  3. Scratch space: if your jobs need to repeatedly read and write large files, we ask that you use the fast temporary local storage (4 TB SSD) on the compute nodes, and not your NFS-shared home directories. Scratch space is available in /scratch on all compute nodes. We also request that you organize your temporary files in a subdirectory named after your user or group, and delete them when you are done with them.

    However, if there is some dataset that you expect to use multiple times, you should leave it in the temporary directory rather than transferring it at the beginning of every job. Your job could check for the presence of the dataset and copy it from a central location only if it is absent, as in the sketch below. Optionally, if this is a large dataset that you expect to use over a period of time, you can ask the IT Director to place it on all (or a subset of) compute nodes.
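
For example, a job script might stage data into local scratch like this (the dataset name and paths here are hypothetical):

# Copy a dataset to the node-local SSD only if it is not already there.
DATASET=/share/data/mygroup/my_dataset.tar   # central NFS copy (hypothetical path)
SCRATCH_DIR=/scratch/$USER
mkdir -p "$SCRATCH_DIR"
if [ ! -e "$SCRATCH_DIR/my_dataset.tar" ]; then
    cp "$DATASET" "$SCRATCH_DIR/"
fi
# ... run the job against $SCRATCH_DIR/my_dataset.tar, and clean up when done ...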

Submitting jobs

All jobs are submitted by logging in via ssh to the head node beehive.ttic.edu which, along with all compute nodes, mounts your NFS home directory. Jobs are run by submitting them to the slurm scheduler, which then executes them on one of the compute nodes.

In this section, we provide information to get you started with using the scheduler, and about details of the TTIC cluster setup. For complete documentation, see the man pages on the slurm website.

Understanding Partitions

All jobs in slurm are submitted to a partition---which defines whether the submission is a GPU or CPU job, the set of nodes it can be run on, and the priority it will have in the queue. Different users will have access to different partitions, based on their group's contributions to the cluster.

You can run the sinfo command to see which partitions you have access to. Please consult with your faculty adviser (or the IT Director) if you need access to other partitions.

All of the above partitions have a strict time limit of 8 hours per job, and jobs that do not finish in this time will be killed.
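
For example, to see the partitions available to you along with their time limits and nodes (standard sinfo invocations):

sinfo                      # partitions you can submit to, with node states
sinfo -o "%P %l %D %N"     # partition, time limit, node count, and node list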

Batch Jobs

The primary way to submit jobs is through the command sbatch. In this regime, you write the commands you want executed into a script file (typically, a bash script). It is important that the first line of this file is a shebang line to the script interpreter: in most cases, you will want to use #!/bin/bash.

The sbatch command also takes a number of options (amongst others): -p/--partition to select the partition, -c/--cpus-per-task to request CPU cores, -G/--gpus to request GPUs, -J/--job-name to name the job, and -o/--output to choose the file that captures the job's output.

There are many other options you can pass to sbatch to customize your jobs (for example, to submit array jobs, to use MPI, etc.). See the sbatch man page.
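
As a minimal sketch (the script name, resource numbers, and command below are only illustrative), a batch script might look like:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --job-name=example
#SBATCH --cpus-per-task=2
#SBATCH --gpus=1
#SBATCH --output=example-%j.out

python train.py    # replace with your own command(s)

You would then submit this with sbatch example.sh.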

Array Jobs

You can submit array jobs using the python script below. You must provide an input file containing the commands you wish to run (one command per line) and a partition on which to run the job. The script will then write batch-commands-*.txt and sbatch-script-*.txt files, splitting your input into batches of 5000 commands if necessary. You then submit each batch by running sbatch on the corresponding sbatch-script-*.txt file. You can also optionally supply a job name and constraint with -J and -C respectively.

#!/usr/bin/env python3

import argparse

parser = argparse.ArgumentParser(description='TTIC SLURM sbatch script creator')
parser.add_argument('INPUT_FILE', help='Input file with list of commands to run (one per line)')
parser.add_argument('PARTITION', help='Name of partition to use')
parser.add_argument('-C', '--constraint', help='Constraint to use')
parser.add_argument('-J', '--job-name', help='Name of the job')

args = parser.parse_args()

def gen_sbatch_end(constraint, job_name):
  # Build the optional --constraint/--job-name flags for the #SBATCH line.
  sbatch_end = ''
  if constraint:
    sbatch_end += ' --constraint=' + constraint
  if job_name:
    sbatch_end += ' --job-name=' + job_name
  return sbatch_end

def write_batch(index, commands):
  # Write one batch of commands, plus the sbatch script that runs it as an array job.
  file_out = open('batch-commands-' + str(index) + '.txt', 'w')
  for command in commands:
    file_out.write(command.strip() + '\n')
  file_out.close()
  file_out = open('sbatch-script-' + str(index) + '.txt', 'w')
  file_out.write('#!/bin/bash\n')
  sbatch_end = gen_sbatch_end(args.constraint, args.job_name)
  file_out.write('#SBATCH --partition=' + args.PARTITION + ' --cpus-per-task=1 --array=1-' + str(len(commands)) + sbatch_end + '\n')
  # Each array task extracts its own line from the command file and runs it.
  file_out.write('bash -c "`sed "${SLURM_ARRAY_TASK_ID}q;d" batch-commands-' + str(index) + '.txt`"')
  file_out.close()

file_in = open(args.INPUT_FILE, 'r')
lines = file_in.readlines()
file_in.close()

count = 0
commands = []
while count < len(lines):
  if count % 5000 == 0 and count > 0:
    index = count // 5000  # integer division, so filenames are batch-commands-1.txt, -2.txt, ...
    write_batch(index, commands)
    commands = []
  commands.append(lines[count])
  count += 1

write_batch('last', commands)
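
If you save the script above as, say, make_array_job.py (the name is arbitrary), usage looks like:

python3 make_array_job.py commands.txt gpu -J my_array -C ampere
sbatch sbatch-script-last.txt    # plus sbatch-script-1.txt, -2.txt, ... if your input was split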

Interactive jobs

While we recommend that you use batch jobs for the majority of tasks submitted to the cluster, it may occasionally be necessary to run programs interactively, for example to set up your experiments for the first time. You can use the srun command to request an interactive shell on a compute node.

Call srun with the same options as sbatch above to specify partition, number of cores, etc., followed by the option --pty bash. For example, to request a shell with access to a single gpu on the gpu partition, run

srun -p gpu -G 1 --pty bash

Note that interactive jobs are subject to the same time limits and priority as batch jobs, which means that you might have to wait for your job to be scheduled, and that your shell will be automatically killed after the time limit expires.

Job Sequences for Dealing with Time limits

Let's say you have split up your job into a series of three script files called: optimize_1.sh, optimize_2.sh, optimize_3.sh --- each of which runs under the cluster's time limit, and picks up from where the last left off. You can request that they be executed as separate jobs in sequence on the cluster.

Pick a unique "job name" for the sequence (let's say "series_A"). Then, just submit the three batch jobs in series using sbatch, with the additional command parameters -J series_A -d singleton. For example:

sbatch -p gpu -c1 -J series_A -d singleton optimize_1.sh
sbatch -p gpu -c1 -J series_A -d singleton optimize_2.sh
sbatch -p gpu -c1 -J series_A -d singleton optimize_3.sh

All three jobs will be immediately added to the queue, and if there are slots free, optimize_1.sh will start running. But optimize_2.sh will NOT start until the first job is done, and similarly, optimize_3.sh will only be started after the other two jobs have ended. Note that there is no guarantee that they will start on the same machine.

The singleton dependency essentially requires that previously submitted jobs with the same name (by the same user) have finished. There is a caveat, however---the subsequent job will be started even if the previous job failed or was killed (for example, because it overshot the time limit). So your scripts should be robust to the possibility that the previous job may have failed (see the sketch below).
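
One common way to achieve this (a rough sketch; the checkpoint path and training command are hypothetical) is to have each script in the sequence resume from the latest checkpoint if one exists:

#!/bin/bash
# Resume from a checkpoint if an earlier job in the sequence wrote one.
CKPT=checkpoints/latest.pt
if [ -f "$CKPT" ]; then
    python train.py --resume "$CKPT"
else
    python train.py
fi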

Note that you can have multiple such sequences running in parallel by giving them different names.

Monitoring your usage

Once your jobs have been scheduled, you can keep an eye on them using command line tools on the login host, as well as the cluster website http://slurm.ttic.edu/. At the very least, you should monitor your jobs to ensure that their processor usage does not exceed what you requested when submitting them.

The cluster website provides you with a listing of scheduled and waiting jobs in the cluster queue, shows you statistics of load on the cluster, as well as provides details (from the output of ps and nvidia-smi) of processes corresponding to jobs running on the cluster.

You can also use the slurm command line tool squeue to get a list of jobs in the queue (remember to call it with the -a option to see all jobs, including those in other groups' partitions that you may not have access to). To get a listing like the output on the website, which organizes job sequences into single entries, you can run xqueue.py.

Finally, use the scancel command to cancel any of your running or submitted jobs. See the scancel man page for details on how to call this command. In particular, if you are using job sequences, you can use the -n series_name option to cancel all jobs in a sequence.
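
For example (the job id and sequence name below are placeholders):

squeue -a              # all jobs in the queue, including other groups' partitions
squeue -u $USER        # just your own jobs
scancel 1234567        # cancel a single job by id
scancel -n series_A    # cancel every job in a named sequence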

List of Node Names & Features

Node-name Public Cores RAM GPU(s) GPU Type Feature labels
test-c0 Y 12 64G - -
test-g0 Y 16 128G 2 nvidia_rtx_a6000 ampere, 48g
c0 Y 128 1024G - -
c1 Y 64 256G - -
c2 Y 12 128G - -
c3 Y 20 128G - -
c4 Y 12 128G - -
c5 Y 8 48G - -
c6 Y 8 48G - -
c7 Y 8 48G - -
c8 Y 8 48G - -
g0 Y 20 256G 4 nvidia_geforce_rtx_2080_ti turing, 11g
g1 Y 16 256G 4 nvidia_titan_v volta, 12g
g2 Y 20 192G 8 nvidia_geforce_rtx_2080_ti turing, 11g
g3 Y 20 192G 8 nvidia_rtx_5000_ada_generation ada, 32g
g4 Y 48 1024G 8 nvidia_rtx_a4000 ampere, 16g
g5 Y 48 1024G 8 nvidia_rtx_6000_ada_generation ada, 48g
g6 Y 20 192G 8 nvidia_rtx_a6000 ampere, 48g
g7 Y 20 256G 10 nvidia_rtx_a4000 ampere, 16g
g8 Y 16 256G 4 nvidia_rtx_a4000 ampere, 16g
g9 Y 20 384G 8 nvidia_geforce_rtx_2080_ti turing, 11g
g10 Y 20 192G 8 nvidia_geforce_rtx_2080_ti turing, 11g
g11 Y 24 192G 4 nvidia_geforce_rtx_2080_ti turing, 11g
g12 Y 24 192G 4 nvidia_geforce_rtx_2080_ti turing, 11g
g13 Y 20 192G 8 nvidia_geforce_rtx_2080_ti turing, 11g
g14 Y 20 384G 8 nvidia_rtx_a6000 ampere, 48g
g15 Y 20 384G 8 nvidia_rtx_a6000 ampere, 48g
g16 Y 48 1024G 8 nvidia_rtx_6000_ada_generation ada, 48g
g17 Y 20 192G 4 nvidia_rtx_a4000 ampere, 16g
g18 Y 20 192G 4 nvidia_rtx_a4000 ampere, 16g
g19 Y 48 1024G 8 nvidia_rtx_a6000 ampere, 48g
g20 Y 48 1536G 8 nvidia_rtx_6000_ada_generation ada, 48g
g21 Y 48 2304G 8 nvidia_l40s ada, 48g
priv-g0 N 8 128G 1 nvidia_rtx_a5500 ampere, 24g
priv-g1 N 4 64G 2 quadro_rtx_8000 turing, 48g
priv-g2 N 48 1536G 8 nvidia_rtx_a6000 ampere, 48g
priv-g3 N 20 384G 8 nvidia_rtx_a6000 ampere, 48g
priv-g4 N 4 64G 2 nvidia_rtx_a5000 ampere, 24g
priv-g5 N 4 64G 2 nvidia_rtx_a5000 ampere, 24g
priv-g6 N 12 256G 2 nvidia_geforce_rtx_2080_ti turing, 11g
priv-g7 N 8 64G 2 quadro_rtx_6000 turing, 24g
priv-g8 N 8 128G 2 nvidia_geforce_rtx_2080_ti turing, 11g
priv-g9 N 12 32G 1 nvidia_geforce_rtx_2080_ti turing, 11g
priv-g10 N 8 56G 1 nvidia_rtx_4000_ada_generation ada, 20g
priv-g11 N 8 128G 2 quadro_rtx_6000 turing, 24g
priv-g12 N 4 64G 2 nvidia_geforce_gtx_1080_ti pascal, 11g
priv-g13 N 12 128G 1 nvidia_rtx_6000_ada_generation ada, 48g

Software Tips

Apptainer (See also this writeup - [care of David Yunis, thanks!])

Building an image

First, start an interactive job on a node with a GPU (srun -p gpu -G1 --pty bash), and then set the following environment variables.

export MYDIR=/scratch/$USER && mkdir -p $MYDIR && cd $MYDIR
export APPTAINER_CACHEDIR=$MYDIR/apptainer_cache
export TMPDIR=$MYDIR/tmp && mkdir -p $TMPDIR

We use /scratch to reduce network I/O and to get faster I/O from the SSD that is local to the node.

Now we create a definition file. Please see the documentation here for more examples and in-depth explanations of the various components of the definition file.

This file will be used to build the image, and it can contain all of the necessary software and dependencies for your environment.

For example, to build an image with PyTorch 2.2.0 and CUDA 12.3, along with some system dependencies, copy the following into a file named image.def:

Bootstrap: docker
From: nvcr.io/nvidia/pytorch:24.01-py3

%environment
    DEBIAN_FRONTEND=noninteractive
    TZ=America/Chicago
    export DEBIAN_FRONTEND TZ

%post
    # Install system dependencies
    apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        bzip2 \
        ca-certificates \
        cmake \
        curl \
        ffmpeg \
        g++ \
        git \
        imagemagick \
        libegl1 \
        libegl1-mesa-dev \
        libgl1 \
        libgl1-mesa-dev \
        libgles2 \
        libgles2-mesa-dev \
        libglvnd-dev \
        libglvnd0 \
        libglx0 \
        libnss3-dev \
        libopenexr-dev \
        libx264-dev \
        tmux \
        unzip \
        vim \
        wget \
        htop

    # Update pip
    pip install --upgrade pip

We now build the image using the --sandbox option to build a sandbox rather than a single SIF file. This will let us both modify and test the image before deploying it to the cluster.

apptainer build --sandbox YOUR_IMAGE image.def

Ideally we would specify all of the software to install and configure in the image.def file, but sometimes that isn't possible.

To modify the image, use --writable, which allows you to change the image filesystem; --no-home prevents the container from mounting your home directory. Only install the packages necessary for the environment here; the apptainer will still have access to the host filesystem and all of your external data. For example, to install some Python packages:

apptainer shell --writable --no-home YOUR_IMAGE
pip install numpy matplotlib pandas

Press Ctrl+D to exit the shell and return to the host. If you want to access the GPU to install packages with CUDA dependencies, pass --nv:

apptainer shell --writable --no-home --nv YOUR_IMAGE
pip install tensorflow-gpu

You can also run commands in the image without entering the shell by storing them in a bash script and passing it to apptainer exec:

echo "pip install numpy matplotlib pandas" > install.sh
apptainer exec --writable --no-home YOUR_IMAGE /bin/bash install.sh

If you want to install packages in the container as the root user, you can use the --fakeroot option. For example, to install apache2 as root, use the following command:

apptainer exec --writable --no-home --fakeroot YOUR_IMAGE /bin/bash -c "apt-get install -y --no-install-recommends apache2"

Once you are satisfied with the image, build it as an immutable SIF file. You will no longer be able to modify the image filesystem, but a single immutable file is both easier and safer to deploy to the cluster, since no changes to its filesystem are possible.

apptainer build FINAL_IMAGE.sif YOUR_IMAGE

Move the file to the cluster and remove the temporary directory:

export SAVEDIR=/path/to/your/save/directory
mv FINAL_IMAGE.sif $SAVEDIR
cd $HOME && rm -rf $MYDIR

Using the image

The apptainer acts as a virtual environment like conda and has access to the host filesystem, so you can access your data and external code/software from within the container. For example, if you have a Python script that consumes data and saves some information to disk, you can do something like this:

export DATADIR=/path/to/your/data
export DISCLOC=/path/to/your/disc/save/location && mkdir -p $DISCLOC && cd $DISCLOC
apptainer exec $SAVEDIR/FINAL_IMAGE.sif /bin/bash -c "python /path/to/your/script.py --data $DATADIR"

Once the apptainer has executed the script, the results will be saved to $DISCLOC on the host filesystem. You can then access the results directly from the host without needing to copy them out of the container (as the container filesystem cannot be modified).

Submitting jobs using the image

To submit a job to the cluster, you can use the apptainer exec command in a job script. For example, to run a Python script that consumes data and saves some information to disk, you can create a job script like the one below. Replace the $JOB_NAME, $PARTITION, $NUM_GPUS, and $CONSTRAINTS placeholders with actual values: #SBATCH lines are comments as far as the shell is concerned, so they are not variable-expanded.

#!/usr/bin/env bash

#SBATCH --job-name=$JOB_NAME
#SBATCH -d singleton

#SBATCH --partition=$PARTITION
#SBATCH -G $NUM_GPUS
#SBATCH -C $CONSTRAINTS

#SBATCH --output=slurm.out
#SBATCH --open-mode=append

#SBATCH --export=ALL,IS_REMOTE=1

cd $DISCLOC

apptainer exec --nv --no-home $SAVEDIR/FINAL_IMAGE.sif /bin/bash -c "python /path/to/your/script.py --data $DATADIR"

Assuming the above file is named job.sh, you can submit the job to the cluster with the command sbatch job.sh. Extending this to multiple jobs is as straightforward as creating multiple job scripts and submitting them all at once. The beauty of the apptainer approach is that all of these jobs can run simultaneously and access the same apptainer without interfering with each other or the host filesystem, because the apptainer itself is immutable.
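
For instance (with hypothetical script names):

for j in job_a.sh job_b.sh job_c.sh; do
    sbatch "$j"
done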

Binds and Mounts

By default apptainer will bind some locations, but often you will want to specify your own bind with something like --bind /path/on/host:/path/in/container. For simplicity we recommend binding at the root level of the share and using the same path on the host and in the container: --bind /share/data/speech:/share/data/speech.
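
Putting it together, a typical invocation (the paths and script name are illustrative) might be:

apptainer exec --nv --bind /share/data/speech:/share/data/speech \
    $SAVEDIR/FINAL_IMAGE.sif /bin/bash -c "python /path/to/your/script.py --data /share/data/speech"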

Anaconda

Because users are limited to 20G home directories, Miniconda is preferred.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/mc3
rm Miniconda3-latest-Linux-x86_64.sh
eval "$($HOME/mc3/bin/conda 'shell.bash' 'hook')"

You will want to run the last command every time you start working within miniconda. Installing this way (and skipping the .bashrc auto-initialization) keeps your logins quick by not loading and scanning files unnecessarily.
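
After activation, you can create and use environments as usual (the environment name and packages below are just examples):

conda create -n myenv python=3.11
conda activate myenv
conda install numpy scipy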

Jupyter Notebook

First you will need to install jupyter notebook. Here are a couple of options; the examples below use the virtualenv option.

  1. Get an interactive job on a node: srun -p dev-cpu --pty bash
  2. Install Anaconda (Miniconda is preferred) OR Create a python virtualenv (see below)
python3 -m venv ~/myenv # create the virtualenv
source ~/myenv/bin/activate # activate the env
pip install --upgrade pip # it's always a good idea to update pip
pip install notebook # install jupyter-notebook
pip cache clear # empty cache to save home directory space

You can run the jupyter notebook as either an interactive or batch job.

Interactive

srun --pty bash # run an interactive job
source ~/myenv/bin/activate # activate virtual env
export NODEIP=$(hostname -i) # get the ip address of the node you are using
export NODEPORT=$(( $RANDOM + 1024 )) # get a random port above 1024
echo $NODEIP:$NODEPORT # echo the env var values to use later
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser # start the jupyter notebook

Make a new ssh connection with a tunnel to access your notebook

ssh -N -L 8888:$NODEIP:$NODEPORT user@beehive.ttic.edu

substituting the values for the variables.

This will make an ssh tunnel on your local machine that forwards traffic sent to localhost:8888 to $NODEIP:$NODEPORT. The command will appear to hang, since the -N option tells ssh not to run any commands (including a shell) on the remote machine.

Open your local browser and visit http://localhost:8888. The token to log in is available in the output of the jupyter-notebook command above.

Batch

The process for a batch job is very similar.

jupyter-notebook.sbatch

#!/bin/bash
NODEIP=$(hostname -i)
NODEPORT=$(( $RANDOM + 1024))
echo "ssh command: ssh -N -L 8888:$NODEIP:$NODEPORT `whoami`@beehive.ttic.edu"
source ~/myenv/bin/activate
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser

Check the output of your job to find the ssh command to use when accessing your notebook.

Make a new ssh connection to tunnel your traffic. The format will be something like:

ssh -N -L 8888:###.###.###.###:#### user@beehive.ttic.edu

This command will appear to hang, since the -N option tells ssh not to run any commands (including a shell) on the remote machine.

Open your local browser and visit: http://localhost:8888

Token/Password Problems

If you are having problems finding the token or password, try stopping the notebook server, running rm -rf ~/.local/share/jupyter/runtime, and restarting the server.

PyTorch

These are the commands to install the current stable version of PyTorch. In this example we are using /scratch, though in practice you may want to install it in a network location. The total install is around 11G, which means that installing it in your home directory is not recommended.

# getting an interactive job on a gpu node
srun -p contrib-gpu -G1 --pty bash

# creating a place to work
export MYDIR=/scratch/$USER/pytorch
mkdir -p $MYDIR && cd $MYDIR

# installing miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $MYDIR/mc3
rm Miniconda3-latest-Linux-x86_64.sh

# activating the miniconda base environment (you will need to run this before using pytorch in future sessions).
eval "$($MYDIR/mc3/bin/conda 'shell.bash' 'hook')"

# install pytorch, with cuda 11.8
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# test (should return True)
python -c "import torch; print(torch.cuda.is_available())"

Tensorflow

When using tensorflow, it will not respect the common environment variables that restrict the number of threads in use. If you add the following code to your tensorflow setup, it will respect the number of threads requested with the -c option.

import os
import tensorflow as tf

# OMP_NUM_THREADS should match the core count requested with -c.
NUM_THREADS = int(os.environ['OMP_NUM_THREADS'])
sess = tf.Session(config=tf.ConfigProto(
    intra_op_parallelism_threads=NUM_THREADS,
    inter_op_parallelism_threads=NUM_THREADS))
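
Note that tf.Session and tf.ConfigProto are the TensorFlow 1.x API. On TensorFlow 2.x, a sketch of the same idea sets the thread pools via tf.config before doing any other TensorFlow work:

import os
import tensorflow as tf

NUM_THREADS = int(os.environ['OMP_NUM_THREADS'])
tf.config.threading.set_intra_op_parallelism_threads(NUM_THREADS)
tf.config.threading.set_inter_op_parallelism_threads(NUM_THREADS)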

[For Faculty] Contributing to the Cluster

If you are a faculty member at TTIC, we would like to invite you to contribute machines or hardware to the cluster. A rather high-level outline of what it would mean to you is below.

The basic principle we intend to follow is that the setup should provide people, on average, with access to more resources than they have on their own, and to manage these pooled resources to maximize efficiency and throughput.

If you contribute hardware, you will gain access to the entire cluster (see description of partitions above), and be given a choice between two options for high-priority access:

  1. You (which means you and any users you designate as being in your "group") will be guaranteed access to your machines within a specified time window from when you request it (the window is 4 hours, i.e., the time-limit for any other users' job that may be running on your machine). Once you get this access, you can keep it as long as you need it. Effectively, you decide when you want to let others use your machines, with a limited waiting period when you want them back.

  2. You give up the ability to guarantee on-demand access to your specific machines, in exchange for a higher priority for your jobs on the entire cluster. You can still reserve your machines for up to four weeks per year (possibly in installments of at least a week each time).

In either case, as noted above, the priority of one's jobs is affected by a weighted combination of waiting time, user priority (higher for members of a group that has contributed more equipment), and fair share (lower for users who have recently made heavy use of the cluster).