slurmstepd: error: Detected 1 oom-kill event(s) in step 3475229.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
# Error:
slurmstepd: error: Detected 1 oom-kill event(s) in step 3475229.batch
cgroup. Some of your processes may have been killed by the cgroup
out-of-memory handler.
# Explanation:
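# When a Linux system, or a cgroup (control group) with a memory limit, runs
# out of memory, the kernel will "oom-kill" a process to free some up and keep
# critical processes running. Here slurmstepd reports that the kill happened
# inside the job step's cgroup, which most likely means your job used more
# memory than Slurm had allocated to it.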
# Solution:
# For me, adding this line to my Slurm submission script fixed the issue
# (--mem sets the memory requested per node):
#SBATCH --mem=40G
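# The source also mentions the following alternative, but I didn't test it.
# It requests memory per allocated CPU rather than per node, and Slurm treats
# it as mutually exclusive with --mem, so use one or the other: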
#SBATCH --mem-per-cpu=<more memory than you've previously requested>
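# Sketch (my addition, not from the source): to choose a value for --mem, it
# can help to look at what the killed job actually used. sacct can report the
# requested memory (ReqMem) and the peak resident memory per step (MaxRSS),
# assuming accounting is enabled on the cluster. The helper name suggest_mem
# and the 20% headroom are assumptions made for illustration only. For an
# oom-killed job, MaxRSS is only a lower bound, because the job was stopped
# when it reached its limit.

```shell
#!/bin/bash
# Hypothetical helper: take a peak memory figure in KiB (for example sacct's
# MaxRSS value "31457280K" with the trailing K removed) and print a --mem
# value in whole GiB, adding 20% headroom (arbitrary) and rounding up.
suggest_mem() {
    local kib=$1
    local with_headroom=$(( kib * 12 / 10 ))
    local gib=$(( (with_headroom + 1048575) / 1048576 ))
    echo "${gib}G"
}

# On the cluster, query the job from the error message:
#   sacct -j 3475229 --format=JobID,ReqMem,MaxRSS,State
# Then feed the largest MaxRSS (in KiB) to the helper:
#   suggest_mem 31457280
```

# The printed value can then be used as the argument to #SBATCH --mem=. If the
# job is killed again, the real requirement is higher than MaxRSS suggested,
# so raise the request further.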