Zest Cluster Specifications

Technical specifications for the Zest High-Performance Computing (HPC) cluster.


Overview

Scheduler: Slurm
Type: High-Performance Computing (HPC)
Best For: Parallel jobs, MPI, GPU acceleration, long-running workloads
Access: SSH via its-zest-loginX.syr.edu (X = your assigned login node)


Partitions

Zest has multiple partitions optimized for different workloads. If no partition is specified, jobs use the normal partition by default.

Partition Purpose Max Runtime Default
normal CPU-intensive workloads 20 days ✓ Yes
compute_zone2 CPU-intensive workloads 20 days  
longjobs Extended runtime workloads 40 days  
gpu GPU computations 20 days  
gpu_zone2 GPU computations 20 days  

Specifying Partitions in Job Scripts

# Single partition
#SBATCH --partition=gpu

# Multiple partitions (job will use first available)
#SBATCH --partition=gpu_zone2,gpu

# CPU partitions
#SBATCH --partition=compute_zone2,normal

GPU Resources

Available GPU Models

  • NVIDIA A40 - Primary GPU (most common)
  • Other models may be available

Requesting GPUs

Basic GPU request:

#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:1

Request specific GPU model (if required by your code):

#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:1
#SBATCH --constraint=gpu_type:A40

Multiple GPUs:

#SBATCH --gres=gpu:2

Best Practice: Only specify GPU model if your code requires it. Leaving it unspecified allows the scheduler to assign any available GPU, which can get your job running faster.


CPU and Memory

Requesting Resources

# Number of nodes
#SBATCH --nodes=1

# Tasks per node
#SBATCH --ntasks-per-node=20

# CPUs per task
#SBATCH --cpus-per-task=1

# Memory (specify M for megabytes or G for gigabytes)
#SBATCH --mem=4G              # Total memory per node
#SBATCH --mem-per-cpu=2G      # Memory per CPU

Resource Limits

Check current node availability and specifications:

sinfo                    # Basic node information
sinfo -o "%n %c %m %f"  # Detailed: node, CPUs, memory, features

Storage

Home Directory

  • Path: /home/netid/
  • Type: NetApp storage, mounted on all nodes at job start
  • Accessibility: Available on login nodes and automatically mounted on compute nodes when your job runs
  • Use for: Scripts, code, data files, conda environments, results

Temporary Storage During Jobs

  • Recommended: Create scratch directories within your home directory
  • Example: /home/netid/scratch/ or /home/netid/job_temp/
  • Why? Your home directory is already mounted on compute nodes, no need for separate temp storage

Best Practices

  • Store all your data in /home/ - it’s accessible everywhere
  • Create subdirectories for organization (data, scripts, results, scratch)
  • For large temporary files during computation, create a scratch directory in your home
  • Clean up large temporary files after jobs complete to manage your quota

Available Software

Using Lmod Modules

Zest uses Lmod for managing software environments.

# List all available modules
module avail

# Search for specific module
module spider python

# Load a module
module load anaconda3

# List currently loaded modules
module list

# Unload a module
module unload anaconda3

# Unload all modules
module purge

Commonly Used Modules

Module Version Description
anaconda3 2023.9 Python environment (Conda)
cuda 12.3 NVIDIA CUDA libraries
openmpi4 4.1.6 MPI implementation
gromacs 2023.2 Molecular dynamics
singularity 3.7.1 Container runtime
gnu12 12.3.0 GNU compiler family

For a complete up-to-date list, use module spider on the cluster.


Job Submission Limits

Runtime Limits

  • normal, compute_zone2, gpu, gpu_zone2: 20 days maximum
  • longjobs: 40 days maximum

Resource Recommendations

  • Start with conservative estimates
  • Monitor first few jobs to optimize requests
  • Under-requesting memory will cause job failures
  • Over-requesting resources means longer queue times

Checking Cluster Status

# View partition and node information
sinfo

# View your jobs
squeue -u netid

# View all jobs in a partition
squeue -p gpu

# Detailed node information
scontrol show nodes

# Job accounting information
sacct

# Detailed job information
scontrol show job <jobid>

Common Job Patterns

Basic CPU Job

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu

module load anaconda3
python my_script.py

GPU Job

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=32G
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:1
#SBATCH --time=1-00:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu

module load cuda
module load anaconda3
python train_model.py

MPI Parallel Job

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=1
#SBATCH --time=12:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu

module load openmpi4
mpirun my_parallel_program

External Resources


Getting Help

Questions about Zest specifications or optimal job configuration?

📧 Email researchcomputing@syr.edu