Answers for "slurmstepd: error: Detected 1 oom-kill event(s) in step 3475229.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler."



# Error:
slurmstepd: error: Detected 1 oom-kill event(s) in step 3475229.batch 
cgroup. Some of your processes may have been killed by the cgroup 
out-of-memory handler.

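# Explanation:
# On clusters that enforce memory limits with cgroups, Slurm runs each job
# step inside a cgroup whose memory limit matches what the job requested
# (or the cluster default if nothing was requested). When the processes in
# that cgroup exceed the limit, the kernel's out-of-memory (OOM) handler
# kills one of them, and slurmstepd reports the kill. In short: the job
# used more memory than it was allocated.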
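
# To see how much memory the job actually used before picking a new limit,
# Slurm's accounting tool can help. This is a minimal sketch, assuming job
# accounting is enabled on your cluster and that sacct is on your PATH; the
# job ID is the one from the error message above. MaxRSS is sampled
# periodically, so it can miss a short spike just before the kill.

```shell
#!/bin/bash
# Job ID taken from the step name "3475229.batch" in the error message.
job_id="3475229"

# ReqMem is the requested memory; MaxRSS is the peak resident memory
# recorded for each step of the job.
fields="JobID,JobName,ReqMem,MaxRSS,State"

if command -v sacct >/dev/null 2>&1; then
    sacct -j "$job_id" --format="$fields"
else
    # Assumption: you are not on a machine with the Slurm client tools.
    echo "sacct not found; run this on a cluster login node" >&2
fi
```

# If MaxRSS is close to ReqMem, the job most likely hit its limit and
# needs a larger request.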

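# Solution:
# Request more memory for the job. For me, adding this line to my Slurm
# submission script fixed the issue. --mem sets the memory per node; 40G
# worked for my job, so pick a value that fits yours:
#SBATCH --mem=40G

# The source also mentions requesting memory per allocated CPU instead, but
# I didn't test it. Use one or the other, since --mem and --mem-per-cpu are
# mutually exclusive:
#SBATCH --mem-per-cpu=<more memory than you've previously requested>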
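
# The difference between the two directives above is easiest to see with
# numbers. The sketch below is a hypothetical submission script, not the
# poster's own: the job name, CPU count, and per-CPU value are illustrative
# assumptions, chosen so that the per-CPU request adds up to the same 40G.

```shell
#!/bin/bash
#SBATCH --job-name=oom-demo   # hypothetical job name
#SBATCH --ntasks=1            # one task, so all CPUs are on one node
#SBATCH --cpus-per-task=4     # illustrative CPU count
#SBATCH --mem-per-cpu=10G     # per-CPU request; do not combine with --mem

# Illustrative arithmetic only: the total memory implied by --mem-per-cpu
# is the per-CPU amount multiplied by the number of allocated CPUs.
cpus=4
mem_per_cpu_gb=10
total_gb=$(( cpus * mem_per_cpu_gb ))
echo "Total memory requested: ${total_gb}G"
```

# With --mem-per-cpu the total grows whenever you ask for more CPUs, while
# --mem stays fixed per node regardless of the CPU count.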
Posted by: Guest on June-26-2021
