Aurora (ALCF)
The Aurora supercomputer is located at the Argonne Leadership Computing Facility (ALCF).
Introduction
If you are new to this system, please see the following resources:
- Batch system: PBS
- /lus/flare/projects/$proj/: shared with all members of a project (Lustre)
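If you want to verify your access to the project space after logging in, a quick check like the following can help (a sketch; replace proj_name with your actual project short name):
# check that the shared project space on the flare Lustre filesystem exists and report its usage
ls -ld /lus/flare/projects/proj_name
df -h /lus/flare/projects/proj_name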
Preparation
Use the following commands to download the WarpX source code:
git clone https://github.com/BLAST-WarpX/warpx.git $HOME/src/warpx
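If you prefer to build a tagged release rather than the latest development branch, you can check one out after cloning (a sketch; pick a tag that actually exists in the repository):
# list available release tags, then check one out (replace <tag> with a real tag name)
git -C $HOME/src/warpx tag --list
git -C $HOME/src/warpx checkout <tag>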
We use system software modules, and we add environment hints and further dependencies via the file $HOME/aurora_warpx.profile.
Create it now:
cp $HOME/src/warpx/Tools/machines/aurora-alcf/aurora_warpx.profile.example $HOME/aurora_warpx.profile
Edit the 2nd line of this script, which sets the export proj="" variable.
For example, if you are a member of the project proj_name, then run nano $HOME/aurora_warpx.profile and edit line 2 to read:
export proj="proj_name"
Exit the nano editor with Ctrl + O (save) and then Ctrl + X (exit).
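Alternatively, if you prefer a non-interactive edit, a sed one-liner achieves the same thing (a sketch; it assumes line 2 contains the export proj= assignment described above):
# set the project name without opening an editor, then verify the change
sed -i 's/^export proj=.*/export proj="proj_name"/' $HOME/aurora_warpx.profile
grep -n '^export proj=' $HOME/aurora_warpx.profile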
Important
Now, and as the first step on future logins to Aurora, activate these environment settings:
source $HOME/aurora_warpx.profile
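A quick way to confirm that the profile was sourced correctly is to check that the project variable is set and which modules are loaded (a sketch; the exact module list depends on the profile contents):
# should print your project name and the currently loaded modules
echo "proj=${proj}"
module list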
Finally, since Aurora does not yet provide software modules for some of our dependencies, install them once:
bash $HOME/src/warpx/Tools/machines/aurora-alcf/install_dependencies.sh
source /home/${USER}/sw/aurora/gpu/venvs/warpx-aurora/bin/activate
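To confirm that the virtual environment is active, you can check which Python interpreter is picked up (a sketch; the venv path follows from the install script above):
# the interpreter should resolve to the warpx-aurora virtual environment
which python3
python3 --version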
Compilation
Use the following cmake commands to compile the application executables:
cd $HOME/src/warpx
rm -rf build_aurora
cmake -S . -B build_aurora -DWarpX_COMPUTE=SYCL -DWarpX_FFT=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_aurora -j 16
The WarpX application executables are now in $HOME/src/warpx/build_aurora/bin/.
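You can confirm that the build succeeded by listing the executables (a sketch; the exact file names depend on the dimensions you enabled, e.g. warpx.3d):
# expect one executable per enabled dimensionality
ls $HOME/src/warpx/build_aurora/bin/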
Additionally, the following commands will install WarpX as a Python module:
cd $HOME/src/warpx
rm -rf build_aurora_py
cmake -S . -B build_aurora_py -DWarpX_COMPUTE=SYCL -DWarpX_FFT=OFF -DWarpX_QED_TABLE_GEN=ON -DWarpX_APP=OFF -DWarpX_PYTHON=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_aurora_py -j 16 --target pip_install
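A quick import test confirms that the Python module was installed into the active virtual environment (a sketch):
# should print the install location of the pywarpx package inside the venv
python3 -c "import pywarpx; print(pywarpx.__file__)"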
Now, you can submit Aurora compute jobs for WarpX Python (PICMI) scripts (example scripts).
Or, you can use the WarpX executables to submit Aurora jobs (example inputs).
For executables, you can reference their location in your job script or copy them to a location in /lus/flare/projects/$proj/.
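For example, to stage an executable in the project space (a sketch; warpx.3d is an assumed executable name and the destination directory is hypothetical):
# copy the 3D executable next to your input files on the project filesystem
mkdir -p /lus/flare/projects/$proj/$USER/my_run
cp $HOME/src/warpx/build_aurora/bin/warpx.3d /lus/flare/projects/$proj/$USER/my_run/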
Update WarpX & Dependencies
If you already installed WarpX in the past and want to update it, start by getting the latest source code:
cd $HOME/src/warpx
# read the output of this command - does it look ok?
git status
# get the latest WarpX source code
git fetch
git pull
# read the output of these commands - do they look ok?
git status
git log # press q to exit
And, if needed,
- update the $HOME/aurora_warpx.profile file,
- log out and back into the system, then activate the updated environment profile as usual,
- re-run the dependency install script.
As a last step, clean the build directories with rm -rf $HOME/src/warpx/build_aurora* and rebuild WarpX.
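In commands, that last step looks like the following (a sketch; repeat the configure and build lines from the Compilation section for the Python build directory as well if you use the Python module):
# remove stale build trees, then reconfigure and rebuild
rm -rf $HOME/src/warpx/build_aurora*
cd $HOME/src/warpx
cmake -S . -B build_aurora -DWarpX_COMPUTE=SYCL -DWarpX_FFT=ON -DWarpX_QED_TABLE_GEN=ON -DWarpX_DIMS="1;2;RZ;3"
cmake --build build_aurora -j 16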
Running
The batch script below can be used to run a WarpX simulation on multiple nodes (change <NODES> accordingly) on the supercomputer Aurora at ALCF.
Replace descriptions between chevrons <> with relevant values; for instance, <input file> could be plasma_mirror_inputs.
Note that we run one MPI rank per GPU tile: each Aurora node has six PVC GPUs with two tiles (stacks) each, i.e. twelve ranks per node.
#!/bin/bash -l
#PBS -A <proj>
#PBS -l select=<NODES>:system=aurora
#PBS -W run_count=17
#PBS -l walltime=0:10:00
#PBS -l filesystems=home:flare
#PBS -q debug
#PBS -N test_warpx
# Set required environment variables
# uncomment to enable GPU-aware MPI
# export MPIR_CVAR_ENABLE_GPU=1
# Change to working directory
echo Working directory is $PBS_O_WORKDIR
cd ${PBS_O_WORKDIR}
echo Jobid: $PBS_JOBID
echo Running on host `hostname`
echo Running on nodes `cat $PBS_NODEFILE`
# On Aurora, must load module environment in job script:
source $HOME/aurora_warpx.profile
# executable & inputs file or python interpreter & PICMI script here
EXE=./warpx
INPUTS=input1d
# MPI and OpenMP settings
NNODES=`wc -l < $PBS_NODEFILE`
NRANKS_PER_NODE=12 # 1 rank per PVC tile (a.k.a. stack; like a gcd on an AMD GPU)
NTHREADS=1
# Avoid core 0 on socket 0 and core 52 on socket 1, where OS threads run:
CPU_RANK_BIND="--cpu-bind=list:1-8:9-16:17-24:25-32:33-40:41-48:53-60:61-68:69-76:77-84:85-92:93-100"
# Convention: gpuNumber.tileNumber
GPU_RANK_BIND="--gpu-bind=list:0.0:0.1:1.0:1.1:2.0:2.1:3.0:3.1:4.0:4.1:5.0:5.1"
NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
mpiexec --np ${NTOTRANKS} -ppn ${NRANKS_PER_NODE} ${CPU_RANK_BIND} ${GPU_RANK_BIND} -envall ${EXE} ${INPUTS} > output.txt
To run a simulation, copy the lines above to a file aurora.pbs and run
qsub aurora.pbs
to submit the job.
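After submission, you can monitor the job with standard PBS commands (a sketch):
# check the job's state (Q = queued, R = running); delete it if needed
qstat -u $USER
# qdel <job id>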