Zest Cluster Specifications
Technical specifications for the Zest High-Performance Computing (HPC) cluster.
Overview
Scheduler: Slurm
Type: High-Performance Computing (HPC)
Best For: Parallel jobs, MPI, GPU acceleration, long-running workloads
Access: SSH via its-zest-loginX.syr.edu (X = your assigned login node)
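For example, a user assigned login node 1 would connect with (netid and the node number are placeholders for your own):
ssh netid@its-zest-login1.syr.edu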
Partitions
Zest has multiple partitions optimized for different workloads. If no partition is specified, jobs use the normal partition by default.
| Partition | Purpose | Max Runtime | Default |
|---|---|---|---|
| normal | CPU-intensive workloads | 20 days | ✓ Yes |
| compute_zone2 | CPU-intensive workloads | 20 days | |
| longjobs | Extended runtime workloads | 40 days | |
| gpu | GPU computations | 20 days | |
| gpu_zone2 | GPU computations | 20 days | |
Specifying Partitions in Job Scripts
# Single partition
#SBATCH --partition=gpu
# Multiple partitions (Slurm runs the job in whichever partition can start it earliest)
#SBATCH --partition=gpu_zone2,gpu
# CPU partitions
#SBATCH --partition=compute_zone2,normal
GPU Resources
Available GPU Models
- NVIDIA A40 - Primary GPU (most common)
- Other models may be available
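To check which GPU types the scheduler currently advertises, you can list each GPU node's generic resources (GRES) and feature tags; this uses standard sinfo format flags and the partition names from the table above:
# Node name, GRES (GPU type and count), and feature tags for the GPU partitions
sinfo -p gpu,gpu_zone2 -o "%n %G %f"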
Requesting GPUs
Basic GPU request:
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:1
Request specific GPU model (if required by your code):
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:1
#SBATCH --constraint=gpu_type:A40
Multiple GPUs:
#SBATCH --gres=gpu:2
Best Practice: Only specify GPU model if your code requires it. Leaving it unspecified allows the scheduler to assign any available GPU, which can get your job running faster.
CPU and Memory
Requesting Resources
# Number of nodes
#SBATCH --nodes=1
# Tasks per node
#SBATCH --ntasks-per-node=20
# CPUs per task
#SBATCH --cpus-per-task=1
# Memory (specify M for megabytes or G for gigabytes)
# Note: --mem and --mem-per-cpu are mutually exclusive; use one or the other
#SBATCH --mem=4G # Total memory per node
#SBATCH --mem-per-cpu=2G # Memory per CPU (alternative to --mem)
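As a worked example of how these flags combine (values are illustrative): 20 tasks per node at 1 CPU per task with 2G per CPU requests 20 x 1 x 2G = 40G of memory on each node.
# Illustrative stanza: 20 tasks x 1 CPU x 2G per CPU = 40G per node
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G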
Resource Limits
Check current node availability and specifications:
sinfo # Basic node information
sinfo -o "%n %c %m %f" # Detailed: node, CPUs, memory, features
Storage
Home Directory
- Path: /home/netid/
- Type: NetApp storage, mounted on all nodes at job start
- Accessibility: Available on login nodes and automatically mounted on compute nodes when your job runs
- Use for: Scripts, code, data files, conda environments, results
Temporary Storage During Jobs
- Recommended: Create scratch directories within your home directory
- Example: /home/netid/scratch/ or /home/netid/job_temp/
- Why? Your home directory is already mounted on compute nodes, so there is no need for separate temporary storage
Best Practices
- Store all your data in /home/ - it's accessible everywhere
- Create subdirectories for organization (data, scripts, results, scratch)
- For large temporary files during computation, create a scratch directory in your home (see the sketch after this list)
- Clean up large temporary files after jobs complete to manage your quota
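A minimal sketch of the per-job scratch pattern in a batch script (the directory layout, program name, and --tmpdir flag are placeholders, not site requirements):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# Per-job scratch directory under $HOME, named by Slurm's job ID
SCRATCH="$HOME/scratch/$SLURM_JOB_ID"
mkdir -p "$SCRATCH"
# Run the workload, pointing temporary files at the scratch directory
my_program --tmpdir "$SCRATCH"
# Remove temporaries afterward to stay within quota
rm -rf "$SCRATCH"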
Available Software
Using Lmod Modules
Zest uses Lmod for managing software environments.
# List all available modules
module avail
# Search for specific module
module spider python
# Load a module
module load anaconda3
# List currently loaded modules
module list
# Unload a module
module unload anaconda3
# Unload all modules
module purge
Commonly Used Modules
| Module | Version | Description |
|---|---|---|
| anaconda3 | 2023.9 | Python environment (Conda) |
| cuda | 12.3 | NVIDIA CUDA libraries |
| openmpi4 | 4.1.6 | MPI implementation |
| gromacs | 2023.2 | Molecular dynamics |
| singularity | 3.7.1 | Container runtime |
| gnu12 | 12.3.0 | GNU compiler family |
For a complete, up-to-date list, run module spider on the cluster.
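If several versions of a package are installed, Lmod can load a specific one by appending the version to the module name (this assumes the usual name/version module layout; the version shown comes from the table above):
# Show every available version of a package and how to load it
module spider gromacs
# Load a specific version rather than the default
module load gromacs/2023.2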
Job Submission Limits
Runtime Limits
- normal, compute_zone2, gpu, gpu_zone2: 20 days maximum
- longjobs: 40 days maximum
Resource Recommendations
- Start with conservative estimates
- Monitor your first few jobs to optimize later requests (see the sacct example below)
- Under-requesting memory will cause job failures
- Over-requesting resources means longer queue times
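A practical way to tune requests is to compare what a finished job actually used with what it asked for; these are standard sacct format fields (replace <jobid> with your own):
# Requested vs. actual usage for a completed job
sacct -j <jobid> --format=JobID,Elapsed,ReqMem,MaxRSS,State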
Checking Cluster Status
# View partition and node information
sinfo
# View your jobs
squeue -u netid
# View all jobs in a partition
squeue -p gpu
# Detailed node information
scontrol show nodes
# Job accounting information
sacct
# Detailed job information
scontrol show job <jobid>
Common Job Patterns
Basic CPU Job
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
module load anaconda3
python my_script.py
GPU Job
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=32G
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:1
#SBATCH --time=1-00:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
module load cuda
module load anaconda3
python train_model.py
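To confirm that a GPU was actually allocated, it can help to log the device near the top of the script, before the Python step (purely diagnostic; nvidia-smi is the standard NVIDIA utility):
# Print the allocated GPU(s) into the job log
nvidia-smi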
MPI Parallel Job
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=1
#SBATCH --time=12:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
module load openmpi4
mpirun my_parallel_program
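Here 4 nodes at 20 tasks per node launches 80 MPI ranks; an Open MPI build with Slurm support typically detects the allocation automatically, so no -np flag or hostfile is needed.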
External Resources
- Slurm Quick Start Guide
- Slurm Command Cheat Sheet (PDF)
- Slurm Job Array Guide
- ZestExamples Repository
Getting Help
Questions about Zest specifications or optimal job configuration?
📧 Email researchcomputing@syr.edu