The TTIC cluster beehive is a pool of machines, many with modern GPUs, to which users may submit compute jobs through the slurm scheduler.
Much of the cluster infrastructure relies on users monitoring their own jobs and usage while being careful about adhering to policy, rather than automated scripts that kill jobs or block accounts for infractions. We request that you respect this collegial arrangement, and be careful that your usage of the cluster adheres to the following set of guidelines:
Use the head node (i.e., beehive.ttic.edu) only to submit and monitor jobs. Do not run any computationally intensive processes (including installing software). You should use an interactive job for such things.
We generally discourage the use of interactive jobs, but recognize that they are necessary in some workflows (for example, for compilation and initial testing of programs). However, we find that with interactive jobs, users often confuse which machine they are on, and either (a) confuse the head node for a compute node and start running their jobs on the head node, which slows it down and makes it difficult or impossible for other users to submit their jobs; or (b) confuse compute nodes for the head node and use them to submit jobs, and take up a slot on the compute node that remains idle. If you do use interactive jobs, please keep track of which machine you are on!
Scratch space: if your jobs need to repeatedly read and write large files from disk, we ask that you use the fast temporary local storage (4T SSD) on the compute nodes, and not your NFS-shared home directories. Scratch space is available in /scratch on all compute nodes. We also request that you delete all temporary files when you are done with them, and that you organize them in a subdirectory named after your user or group.
However, if there is a dataset that you expect to use multiple times, you should leave it in the temporary directory rather than transferring it at the beginning of every job. Your job can check for the presence of the dataset and copy it from a central location only if it is absent. Optionally, if this is a large dataset that you expect to use over a period of time, you can ask the IT Director to place it on all (or a subset of) compute nodes.
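As a rough sketch (the dataset and central-copy paths here are hypothetical placeholders), a job script might stage data like this:
# Hypothetical paths -- adjust to your own dataset and group storage location
DATASET=/scratch/$USER/my_dataset
CENTRAL=/share/data/mygroup/my_dataset
if [ ! -d "$DATASET" ]; then
    # Copy the dataset to local scratch only if it is not already there
    mkdir -p /scratch/$USER
    cp -r "$CENTRAL" "$DATASET"
fi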
All jobs are submitted by logging in via ssh to the head node beehive.ttic.edu, which, along with all compute nodes, mounts your NFS home directory. Jobs are run by submitting them to the slurm scheduler, which then executes them on one of the compute nodes.
In this section, we provide information to get you started with using the scheduler, and about details of the TTIC cluster setup. For complete documentation, see the man pages on the slurm website.
All jobs in slurm are submitted to a partition---which defines whether the submission is a GPU or CPU job, the set of nodes it can be run on, and the priority it will have in the queue. Different users will have access to different partitions (based on the group's contributions to the cluster) as noted below:
- dev-cpu, dev-gpu: These partitions are for development/testing and have a 1 hour time limit.
- cpu, gpu: The public partitions for CPU and GPU jobs that are available to all users. These partitions only contain a subset of all machines in the cluster.
- contrib-gpu: The partition available to members of contributing groups; it contains all nodes in the cluster at a baseline priority level.
- <group>-gpu: Partitions associated with a particular group, with access to all nodes at an enhanced priority level, based on the arrangement under which that group contributed resources to the cluster.
You can run the sinfo command to see which partitions you have access to. Please consult with your faculty adviser (or the IT Director) if you need access to other partitions.
All of the above partitions have a strict time limit of 8 hours per job, and jobs that do not finish in this time will be killed.
The primary way to submit jobs is through the sbatch command. In this regime, you write the commands you want executed into a script file (typically, a bash script). It is important that the first line of this file is a shebang line to the script interpreter: in most cases, you will want to use #!/bin/bash.
The sbatch command also takes the following options (amongst others):
- -p name: Partition name, e.g., -p contrib-gpu.
- -cN: Number N of CPU cores that the job will use, e.g., -c1.
- -GN: Number N of GPUs that the job will use, e.g., -G2.
- --gpus=[type:]<number>: Type and number of GPUs the job will use, e.g., --gpus=nvidia_rtx_a6000:2.
- -NN: Number N of nodes that the job will use; if omitted, this is set to 1, e.g., -N2.
- -C feature: Optional parameter specifying that you want your job only to run on nodes with specific features. In our current setup, this is used for GPU jobs to request a minimum amount of GPU memory, or the microarchitecture. See the complete listing of nodes and features for details.
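For example, a minimal batch script might look like the sketch below (the script name my_job.sh and the command it runs are placeholders for your own):
#!/bin/bash
# my_job.sh -- hypothetical example; replace the command below with your own program
python my_experiment.py
which you could then submit to, say, the cpu partition with four cores:
sbatch -p cpu -c4 my_job.sh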
There are many other options you can pass to sbatch to customize your jobs (for example, to submit array jobs, to use MPI, etc.). See the sbatch man page.
You can submit array jobs using the python script below. You must provide an input file that contains the commands you wish to run (one command per line, one line per job) and a partition on which to run them. The script will then write out batch-commands-$.txt and sbatch-script-$.txt files, splitting your input file into batches of 5000 if necessary. You then submit the job by running sbatch sbatch-script-$.txt. You can also optionally supply a job name and constraint with -J and -C respectively.
#!/usr/bin/env python
import argparse

parser = argparse.ArgumentParser(description='TTIC SLURM sbatch script creator')
parser.add_argument('INPUT_FILE', help='Input file with list of commands to run')
parser.add_argument('PARTITION', help='Name of partition to use')
parser.add_argument('-C', '--constraint', help='Constraint to use')
parser.add_argument('-J', '--job-name', help='Name of the job')
args = parser.parse_args()

def gen_sbatch_end(constraint, job_name):
    # Build the trailing sbatch options for the optional constraint and job name
    if constraint and job_name:
        sbatch_end = ' --constraint=' + args.constraint + ' --job-name=' + args.job_name
    elif constraint:
        sbatch_end = ' --constraint=' + args.constraint
    elif job_name:
        sbatch_end = ' --job-name=' + args.job_name
    else:
        sbatch_end = ''
    return sbatch_end

file_in = open(args.INPUT_FILE, 'r')
lines = file_in.readlines()
file_in.close()

count = 0
commands = []
while count < len(lines):
    if count % 5000 == 0 and count > 0:
        # Write out a full batch of 5000 commands and its matching sbatch script
        index = count // 5000
        file_out = open('batch-commands-' + str(index) + '.txt', 'w')
        for i in commands:
            file_out.write(i.strip() + '\n')
        file_out.close()
        file_out = open('sbatch-script-' + str(index) + '.txt', 'w')
        file_out.write('#!/bin/bash\n')
        sbatch_end = gen_sbatch_end(args.constraint, args.job_name)
        file_out.write('#SBATCH --partition=' + args.PARTITION + ' --array=1-' + str(len(commands)) + sbatch_end + '\n')
        file_out.write('bash -c "`sed "${SLURM_ARRAY_TASK_ID}q;d" ' + 'batch-commands-' + str(index) + '.txt' + '`"')
        file_out.close()
        commands = []
    commands.append(lines[count])
    count += 1

# Write out the final (possibly partial) batch
file_out = open('batch-commands-last.txt', 'w')
for i in commands:
    file_out.write(i.strip() + '\n')
file_out.close()
file_out = open('sbatch-script-last.txt', 'w')
file_out.write('#!/bin/bash\n')
sbatch_end = gen_sbatch_end(args.constraint, args.job_name)
file_out.write('#SBATCH --partition=' + args.PARTITION + ' --cpus-per-task=1 --array=1-' + str(len(commands)) + sbatch_end + '\n')
file_out.write('bash -c "`sed "${SLURM_ARRAY_TASK_ID}q;d" ' + 'batch-commands-last.txt' + '`"')
file_out.close()
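For example, assuming you save the script above as make_sbatch.py (the file name is arbitrary) and have a file commands.txt with one command per line, you might run:
python make_sbatch.py commands.txt contrib-gpu -J my_array -C 48g
sbatch sbatch-script-last.txt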
While we recommend that you use batch jobs for the majority of tasks submitted to the cluster, it may occasionally be necessary to run programs interactively to set up your experiments for the first time. You can use the srun command to request an interactive shell on a compute node.
Call srun with the same options as sbatch above to specify partition, number of cores, etc., followed by the option --pty bash. For example, to request a shell with access to a single GPU on the gpu partition, run:
srun -p gpu -G 1 --pty bash
Note that interactive jobs are subject to the same time limits and priority as batch jobs, which means that you might have to wait for your job to be scheduled, and that your shell will be automatically killed after the time limit expires.
Let's say you have split up your job into a series of three script files called: optimize_1.sh, optimize_2.sh, optimize_3.sh --- each of which runs under the cluster's time limit, and picks up from where the last left off. You can request that they be executed as separate jobs in sequence on the cluster.
Pick a unique "job name" for the sequence (let's say "series_A"). Then, just submit the three batch jobs in series using sbatch, with the additional command parameters -J series_A -d singleton. For example:
sbatch -p gpu -c1 -J series_A -d singleton optimize_1.sh
sbatch -p gpu -c1 -J series_A -d singleton optimize_2.sh
sbatch -p gpu -c1 -J series_A -d singleton optimize_3.sh
All three jobs will be immediately added to the queue, and if there are slots free, optimize_1.sh will start running. But optimize_2.sh will NOT start until the first job is done, and similarly, optimize_3.sh will only be started after the other two jobs have ended. Note that there is no guarantee that they will start on the same machine.
The singleton dependency essentially requires that previously submitted jobs with the same name (by the same user) have finished. There is a caveat, however: the subsequent job will be started even if the previous job failed or was killed (for example, because it overshot the time limit). So your scripts should be robust to the possibility that the previous job may have failed.
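One way to achieve this robustness, sketched below with hypothetical paths and commands, is to have each script in the sequence resume from the latest checkpoint if one exists:
#!/bin/bash
# optimize_N.sh (hypothetical sketch): resume from a checkpoint written by the previous job.
# The checkpoint lives in your NFS home so it is visible from whichever node the job lands on.
CKPT=$HOME/series_A/checkpoint.pt
if [ -f "$CKPT" ]; then
    python optimize.py --resume "$CKPT"
else
    python optimize.py
fi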
Note that you can have multiple such sequences running in parallel by giving them different names.
Once your jobs have been scheduled, you can keep an eye on them using both command line tools on the login host and the cluster website http://slurm.ttic.edu/. At the very least, you should monitor your jobs to ensure that their processor usage is not exceeding what you requested when submitting them.
The cluster website provides a listing of scheduled and waiting jobs in the cluster queue, shows statistics of load on the cluster, and provides details (from the output of ps and nvidia-smi) of the processes corresponding to jobs running on the cluster.
You can also use the slurm command line tool squeue to get a list of jobs in the queue (remember to call it with the -a option to see all jobs, including those in other groups' partitions that you may not have access to). To get a listing like the output on the website, which organizes job sequences into single entries, you can run xqueue.py.
Finally, use the scancel command to cancel any of your running or submitted jobs. See the scancel man page for details on how to call this command. In particular, if you are using job sequences, you can use the -n series_name option to cancel all jobs in a sequence.
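For example (the job ID and sequence name below are placeholders):
squeue -a                   # list all jobs in the queue, across all partitions
squeue -u $USER             # list only your own jobs
scancel 123456              # cancel a single job by its job ID
scancel -n series_A         # cancel all jobs in the sequence named series_A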
Nodes that are public are available to all users; other nodes are only available to groups who have contributed resources to the cluster
Nodes test-c0 and test-g0 are development nodes with a 1 hour time limit
Nodes priv-g[0-13] are exclusive access nodes that are not part of any other partitions
Node feature labels for the amount of GPU memory are inclusive (i.e., if a node has the label 48g, it also has all smaller amounts; these are omitted below for clarity).
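For example, to restrict a GPU job to nodes with at least 48G of GPU memory, or to a particular microarchitecture (using feature labels from the table below and the hypothetical my_job.sh script from earlier):
sbatch -p gpu -G1 -C 48g my_job.sh       # any node whose GPUs have 48G or more of memory
sbatch -p gpu -G1 -C ampere my_job.sh    # any node with Ampere-generation GPUs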
Node-name | Public | Cores | RAM | GPU(s) | GPU Type | Feature labels |
---|---|---|---|---|---|---|
test-c0 | Y | 12 | 64G | - | - | |
test-g0 | Y | 16 | 128G | 2 | nvidia_rtx_a6000 | ampere, 48g |
c0 | Y | 128 | 1024G | - | - | |
c1 | Y | 64 | 256G | - | - | |
c2 | Y | 12 | 128G | - | - | |
c3 | Y | 20 | 128G | - | - | |
c4 | Y | 12 | 128G | - | - | |
c5 | Y | 8 | 48G | - | - | |
c6 | Y | 8 | 48G | - | - | |
c7 | Y | 8 | 48G | - | - | |
c8 | Y | 8 | 48G | - | - | |
g0 | Y | 20 | 256G | 4 | nvidia_geforce_rtx_2080_ti | turing, 11g |
g1 | Y | 16 | 256G | 4 | nvidia_titan_v | volta, 12g |
g2 | Y | 20 | 192G | 8 | nvidia_geforce_rtx_2080_ti | turing, 11g |
g3 | Y | 20 | 192G | 8 | nvidia_rtx_5000_ada_generation | ada, 32g |
g4 | Y | 48 | 1024G | 8 | nvidia_rtx_a4000 | ampere, 16g |
g5 | Y | 48 | 1024G | 8 | nvidia_rtx_6000_ada_generation | ada, 48g |
g6 | Y | 20 | 192G | 8 | nvidia_rtx_a6000 | ampere, 48g |
g7 | Y | 20 | 256G | 10 | nvidia_rtx_a4000 | ampere, 16g |
g8 | Y | 16 | 256G | 4 | nvidia_rtx_a4000 | ampere, 16g |
g9 | Y | 20 | 384G | 8 | nvidia_geforce_rtx_2080_ti | turing, 11g |
g10 | Y | 20 | 192G | 8 | nvidia_geforce_rtx_2080_ti | turing, 11g |
g11 | Y | 24 | 192G | 4 | nvidia_geforce_rtx_2080_ti | turing, 11g |
g12 | Y | 24 | 192G | 4 | nvidia_geforce_rtx_2080_ti | turing, 11g |
g13 | Y | 20 | 192G | 8 | nvidia_geforce_rtx_2080_ti | turing, 11g |
g14 | Y | 20 | 384G | 8 | nvidia_rtx_a6000 | ampere, 48g |
g15 | Y | 20 | 384G | 8 | nvidia_rtx_a6000 | ampere, 48g |
g16 | Y | 48 | 1024G | 8 | nvidia_rtx_6000_ada_generation | ada, 48g |
g17 | Y | 20 | 192G | 4 | nvidia_rtx_a4000 | ampere, 16g |
g18 | Y | 20 | 192G | 4 | nvidia_rtx_a4000 | ampere, 16g |
g19 | Y | 48 | 1024G | 8 | nvidia_rtx_a6000 | ampere, 48g |
g20 | Y | 48 | 1536G | 8 | nvidia_rtx_6000_ada_generation | ada, 48g |
g21 | Y | 48 | 2304G | 8 | nvidia_l40s | ada, 48g |
priv-g0 | N | 8 | 128G | 1 | nvidia_rtx_a5500 | ampere, 24g |
priv-g1 | N | 4 | 64G | 2 | quadro_rtx_8000 | turing, 48g |
priv-g2 | N | 48 | 1536G | 8 | nvidia_rtx_a6000 | ampere, 48g |
priv-g3 | N | 20 | 384G | 8 | nvidia_rtx_a6000 | ampere, 48g |
priv-g4 | N | 4 | 64G | 2 | nvidia_rtx_a5000 | ampere, 24g |
priv-g5 | N | 4 | 64G | 2 | nvidia_rtx_a5000 | ampere, 24g |
priv-g6 | N | 12 | 256G | 2 | nvidia_geforce_rtx_2080_ti | turing, 11g |
priv-g7 | N | 8 | 64G | 2 | quadro_rtx_6000 | turing, 24g |
priv-g8 | N | 8 | 128G | 2 | nvidia_geforce_rtx_2080_ti | turing, 11g |
priv-g9 | N | 12 | 32G | 1 | nvidia_geforce_rtx_2080_ti | turing, 11g |
priv-g10 | N | 8 | 56G | 1 | nvidia_rtx_4000_ada_generation | ada, 20g |
priv-g11 | N | 8 | 128G | 2 | quadro_rtx_6000 | turing, 24g |
priv-g12 | N | 4 | 64G | 2 | nvidia_geforce_gtx_1080_ti | pascal, 11g |
priv-g13 | N | 12 | 128G | 1 | nvidia_rtx_6000_ada_generation | ada, 48g |
First we will start an interactive job on a node with a GPU (srun -p gpu -G1 --pty bash) and then set the following environment variables.
export MYDIR=/scratch/$USER && mkdir -p $MYDIR && cd $MYDIR
export APPTAINER_CACHEDIR=$MYDIR/apptainer_cache
export TMPDIR=$MYDIR/tmp && mkdir -p $TMPDIR
We are using /scratch to reduce network I/O and gain faster I/O via an SSD local to the node.
Now we create a definition file. Please see the Apptainer documentation for more examples and in-depth explanations of the various components of the definition file. This file will be used to build the image, and it can contain all of the necessary software and dependencies for your environment. For example, to build an image with PyTorch 2.2.0 and CUDA 12.3, along with some system dependencies, copy the following into a file named image.def:
Bootstrap: docker
From: nvcr.io/nvidia/pytorch:24.01-py3
%environment
DEBIAN_FRONTEND=noninteractive
TZ=America/Chicago
export DEBIAN_FRONTEND TZ
%post
# Install system dependencies
apt-get update && apt-get install -y --no-install-recommends \
build-essential \
bzip2 \
ca-certificates \
cmake \
curl \
ffmpeg \
g++ \
git \
imagemagick \
libegl1 \
libegl1-mesa-dev \
libgl1 \
libgl1-mesa-dev \
libgles2 \
libgles2-mesa-dev \
libglvnd-dev \
libglvnd0 \
libglx0 \
libnss3-dev \
libopenexr-dev \
libx264-dev \
tmux \
unzip \
vim \
wget \
htop
# Update pip
pip install --upgrade pip
We now build the image using the --sandbox option to build a sandbox rather than a single SIF file. This will let us both modify and test the image before deploying it to the cluster.
apptainer build --sandbox YOUR_IMAGE image.def
Ideally we would specify all of the software to install and configure in the image.def file, but sometimes that isn't possible. To modify the image, use --writable, which will allow you to modify the image filesystem, and --no-home, which will prevent the image from mounting your home directory. Only install the packages necessary for the environment here; the apptainer will still have access to the host filesystem and all of your external data. For example, to install some Python packages:
apptainer shell --writable --no-home YOUR_IMAGE
pip install numpy matplotlib pandas
Press Ctrl+D to exit the shell and return to the host. If you want to access the GPU to install packages with CUDA dependencies, pass --nv:
apptainer shell --writable --no-home --nv YOUR_IMAGE
pip install tensorflow-gpu
You can also run commands in the image without entering the shell by storing them in a bash script and passing it to apptainer exec:
echo "pip install numpy matplotlib pandas" > install.sh
apptainer exec --writable --no-home YOUR_IMAGE /bin/bash install.sh
If you want to install packages in the container as the root user, you can use the --fakeroot option (sudo is not needed, since --fakeroot already makes you appear as root inside the container). For example, to install apache2 as root, use the following command:
apptainer exec --writable --no-home --fakeroot YOUR_IMAGE /bin/bash -c "apt-get install -y --no-install-recommends apache2"
Once you are satisfied with the image, build it as an immutable SIF file. This will prevent you from modifying the image filesystem in the future, but it will also make it both easier and safer to deploy to the cluster, as the file will not allow any changes to its filesystem.
apptainer build FINAL_IMAGE.sif YOUR_IMAGE
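As a quick sanity check (assuming the PyTorch base image from the definition file above), you can run a short command against the finished SIF before moving it:
apptainer exec --nv FINAL_IMAGE.sif python -c "import torch; print(torch.cuda.is_available())"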
Move the file to the cluster and remove the temporary directory:
export SAVEDIR=/path/to/your/save/directory
mv FINAL_IMAGE.sif $SAVEDIR
cd $HOME && rm -rf $MYDIR
The apptainer acts as a virtual environment like conda and has access to the host filesystem, so you can access your data and external code/software from within the container. For example, if you have a Python script that consumes data and saves some information to disk, you can do something like this:
export DATADIR=/path/to/your/data
export DISCLOC=/path/to/your/disc/save/location && mkdir -p $DISCLOC && cd $DISCLOC
apptainer exec $SAVEDIR/FINAL_IMAGE.sif /bin/bash -c "python /path/to/your/script.py --data $DATADIR"
Once the apptainer has executed the script, the results will be saved to $DISCLOC on the host filesystem. You can then access the results directly from the host without needing to copy them out of the container (as the container filesystem cannot be modified).
To submit a job to the cluster, you can use the apptainer exec command in a job script. For example, to run a Python script that consumes data and saves some information to disk, you can create a job script like this:
#!/usr/bin/env bash
#SBATCH --job-name=$JOB_NAME
#SBATCH -d singleton
#SBATCH --partition=$PARTITION
#SBATCH -G $NUM_GPUS
#SBATCH -C $CONSTRAINTS
#SBATCH --output=slurm.out
#SBATCH --open-mode=append
#SBATCH --export=ALL,IS_REMOTE=1
cd $DISCLOC
apptainer exec --nv --no-home $SAVEDIR/FINAL_IMAGE.sif /bin/bash -c "python /path/to/your/script.py --data $DATADIR"
Assuming the above file is named job.sh, you can submit the job to the cluster with the following command: sbatch job.sh. Extending this to multiple jobs is as straightforward as creating multiple job scripts and submitting them all at once. The beauty of the apptainer is that all of these jobs can run simultaneously and access the same apptainer without interfering with each other or the host filesystem, because the apptainer itself is immutable.
By default, apptainer will bind some locations, but you will often want to specify your own bind with something like --bind /path/on/host:/path/in/container. For simplicity, we recommend binding at the root level of the share and using the same path on the host and in the container: --bind /share/data/speech:/share/data/speech.
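For example (the script path and dataset directory below are placeholders for your own):
apptainer exec --nv --bind /share/data/speech:/share/data/speech $SAVEDIR/FINAL_IMAGE.sif \
    /bin/bash -c "python /path/to/your/script.py --data /share/data/speech/my_dataset"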
Because users are limited to 20G home directories, Miniconda is preferred over the full Anaconda distribution.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/mc3
rm Miniconda3-latest-Linux-x86_64.sh
eval "$($HOME/mc3/bin/conda 'shell.bash' 'hook')"
You will want to run the last command every time you want to start working within miniconda. Installing this way (skipping a bashrc auto initialization) will keep your logins quick by not loading and scanning files unnecessarily.
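After running the hook, you can create and activate environments as usual; for example (the environment name and Python version here are arbitrary):
conda create -n myenv python=3.11 -y
conda activate myenv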
First you will need to install jupyter notebook. Here are a couple of options; the examples below use the virtualenv option.
srun -p dev-cpu --pty bash
python3 -m venv ~/myenv # create the virtualenv
source ~/myenv/bin/activate # activate the env
pip install --upgrade pip # it's always a good idea to update pip
pip install notebook # install jupyter-notebook
pip cache clear # empty cache to save home directory space
You can run the jupyter notebook as either an interactive or batch job.
srun --pty bash # run an interactive job
source ~/myenv/bin/activate # activate the virtual env
export NODEIP=$(hostname -i) # get the ip address of the node you are using
export NODEPORT=$(( $RANDOM + 1024 )) # get a random port above 1024
echo $NODEIP:$NODEPORT # echo the env var values to use later
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser # start the jupyter notebook
Make a new ssh connection with a tunnel to access your notebook:
ssh -N -L 8888:$NODEIP:$NODEPORT user@beehive.ttic.edu
substituting the values for the variables. This will make an ssh tunnel on your local machine that forwards traffic sent to localhost:8888 to $NODEIP:$NODEPORT via the ssh tunnel. This command will appear to hang, since we are using the -N option, which tells ssh not to run any commands (including a shell) on the remote machine.
Open your local browser and visit http://localhost:8888; the token to log in is available in the output of the jupyter-notebook command above.
The process for a batch job is very similar.
jupyter-notebook.sbatch
#!/bin/bash
NODEIP=$(hostname -i)
NODEPORT=$(( $RANDOM + 1024))
echo "ssh command: ssh -N -L 8888:$NODEIP:$NODEPORT `whoami`@beehive.ttic.edu"
source ~/myenv/bin/activate
jupyter-notebook --ip=$NODEIP --port=$NODEPORT --no-browser
Check the output of your job to find the ssh command to use when accessing your notebook.
Make a new ssh connection to tunnel your traffic. The format will be something like:
ssh -N -L 8888:###.###.###.###:#### user@beehive.ttic.edu
This command will appear to hang, since we are using the -N option, which tells ssh not to run any commands (including a shell) on the remote machine.
Open your local browser and visit: http://localhost:8888
If you are having problems finding the token or password, try stopping the notebook server, running rm -rf ~/.local/share/jupyter/runtime, and restarting the server.
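If the server is still running, you may also be able to recover the token by listing the running servers from another shell in the same environment (this reads the runtime files in your shared home directory):
source ~/myenv/bin/activate
jupyter notebook list      # prints running servers along with their access tokens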
These are the commands to install the current stable version of pytorch. In this example we are using /scratch, though in practice you may want to install it in a network location. The total install is 11G, which means that installing it in your home directory is not recommended.
# getting an interactive job on a gpu node
srun -p contrib-gpu --pty bash
# creating a place to work
export MYDIR=/scratch/$USER/pytorch
mkdir -p $MYDIR && cd $MYDIR
# installing miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $MYDIR/mc3
rm Miniconda3-latest-Linux-x86_64.sh
# activating the miniconda base environment (you will need to run this before using pytorch in future sessions).
eval "$($MYDIR/mc3/bin/conda 'shell.bash' 'hook')"
# install pytorch, with cuda 11.8
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# test (should return True)
python -c "import torch; print(torch.cuda.is_available())"
When using tensorflow, it will not respect the common environment variables that restrict the number of threads in use. If you add the following code to your tensorflow setup, it will respect the number of threads requested with the -c option.
import os
import tensorflow as tf

# Read the thread count from the environment and limit TensorFlow's parallelism accordingly
NUM_THREADS = int(os.environ['OMP_NUM_THREADS'])
sess = tf.Session(config=tf.ConfigProto(
    intra_op_parallelism_threads=NUM_THREADS,
    inter_op_parallelism_threads=NUM_THREADS))
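Note that this snippet assumes OMP_NUM_THREADS is set in the job's environment; one way to ensure that (sketched here with a hypothetical script name) is to export it from the core count Slurm allocated:
#!/bin/bash
#SBATCH -p cpu -c4
# Match the thread count to the cores requested with -c
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
python my_tf_script.py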
If you are a faculty member at TTIC, we would like to invite you to contribute machines or hardware to the cluster. A rather high-level outline of what it would mean to you is below.
The basic principle we intend to follow is that the setup should provide people, on average, with access to more resources than they have on their own, and to manage these pooled resources to maximize efficiency and throughput.
If you contribute hardware, you will gain access to the entire cluster (see description of partitions above), and be given a choice between two options for high-priority access:
You (which means you and any users you designate as being in your "group") will be guaranteed access to your machines within a specified time window from when you request it (the window is 4 hours, i.e., the time-limit for any other users' job that may be running on your machine). Once you get this access, you can keep it as long as you need it. Effectively, you decide when you want to let others use your machines, with a limited waiting period when you want them back.
You give up the ability to guarantee on-demand access to your specific machines, in exchange for a higher priority for your jobs on the entire cluster. You still can reserve your machines for up to four weeks per year (possibly in installments of at least a week each time).
As noted above, the priority of one's jobs is affected by a weighted combination of waiting time, user priority (higher for members of a group that has contributed more equipment), and fair share (lower for users who have recently made heavy use of the cluster).