Step-by-Step Guide: Resolving “Out of Disk Space” Issues with Picard Tools
January 10, 2025
Running bioinformatics tools like Picard can sometimes lead to “out of disk space” errors, even when you have ample storage. This guide walks you through the steps to resolve such issues, including tips, tricks, and scripts to manage disk space effectively.
1. Identify the Problem
The error java.io.IOException: No space left on device typically occurs when the temporary directory used by Java (or Picard) is on a partition with insufficient space. This is common in high-performance computing environments where /tmp is small.
2. Check Disk Space
Before proceeding, verify the available disk space on your system.
Unix Command:
df -h
This displays the disk usage and available space on all mounted partitions. Look for the partition where /tmp is located (usually /).
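The same check can be done from a script before a job is launched. The following is a minimal sketch using Python's standard shutil.disk_usage; the helper name free_space_gib is hypothetical, and checking tempfile.gettempdir() is only an approximation, since Java picks its own default temporary directory (normally /tmp on Linux).

```python
import shutil
import tempfile


def free_space_gib(path):
    """Return the free space, in GiB, on the partition that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 1024**3


if __name__ == "__main__":
    # Python's temp dir honours TMPDIR and usually falls back to /tmp.
    tmp = tempfile.gettempdir()
    print(f"{tmp}: {free_space_gib(tmp):.1f} GiB free")
```

Calling free_space_gib on the directory you plan to pass as the temporary directory tells you whether it sits on a partition with enough room.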
3. Create a Custom Temporary Directory
To avoid using the default /tmp directory, create a custom temporary directory in a location with sufficient space.
Unix Command:
mkdir -p /path/to/large/disk/tmp
Replace /path/to/large/disk/tmp with a directory path on a partition with ample space.
4. Run Picard Tools with a Custom Temporary Directory
Use the -Djava.io.tmpdir Java option to specify the custom temporary directory. Additionally, Picard Tools often supports a TMP_DIR parameter.
Example Command:
java -Xmx2g -Djava.io.tmpdir=/path/to/large/disk/tmp -jar SortSam.jar \
    SORT_ORDER=coordinate \
    INPUT=input.bam \
    OUTPUT=output.sorted.bam \
    TMP_DIR=/path/to/large/disk/tmp
Explanation:
- -Xmx2g: Sets the maximum Java heap size to 2 GB.
- -Djava.io.tmpdir=/path/to/large/disk/tmp: Sets the Java temporary directory.
- TMP_DIR=/path/to/large/disk/tmp: Ensures Picard uses the specified directory for temporary files.
5. Handle “Too Many Open Files” Error
If you encounter the error Too many open files, the process has hit its limit for open file handles. Increase the limit temporarily or permanently.
Temporary Fix (for current session):
ulimit -n 65536
Permanent Fix (requires root access):
Edit /etc/security/limits.conf and add the following lines:
* hard nofile 65536
* soft nofile 65536
Then, restart your session or reboot the system.
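If Picard is launched from a Python wrapper, the soft limit can also be raised inside the wrapper, because child processes inherit the parent's resource limits. This is a minimal sketch, assuming a Unix system where Python's resource module is available; raise_open_file_limit is a hypothetical helper name, and an unprivileged process cannot go above the hard limit.

```python
import resource


def raise_open_file_limit(target=65536):
    """Raise the soft open-file limit toward `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= target:
        return soft  # already high enough
    # Without root, the soft limit can only be raised up to the hard limit.
    new_soft = target if hard == unlimited else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    print("soft open-file limit:", raise_open_file_limit())
```

If the value returned is still lower than you need, the hard limit is the bottleneck and the limits.conf change above is required.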
6. Using GATK with Picard Tools
If you’re running Picard Tools via GATK, the syntax for specifying the temporary directory is slightly different.
Example Command:
./gatk --java-options "-Djava.io.tmpdir=/path/to/large/disk/tmp" SortSam \
    -I input.sam \
    -O output.sorted.bam \
    -SO coordinate \
    --TMP_DIR /path/to/large/disk/tmp
Explanation:
- --java-options: Passes Java-specific options.
- --TMP_DIR: Specifies the temporary directory for GATK/Picard.
7. Automate with Scripts
To streamline the process, you can create a script to handle temporary directory creation and tool execution.
Bash Script:
#!/bin/bash

# Set variables
INPUT_BAM="input.bam"
OUTPUT_BAM="output.sorted.bam"
TMP_DIR="/path/to/large/disk/tmp"

# Create temporary directory (quoted so paths with spaces work)
mkdir -p "$TMP_DIR"

# Run Picard SortSam
java -Xmx2g -Djava.io.tmpdir="$TMP_DIR" -jar SortSam.jar \
    SORT_ORDER=coordinate \
    INPUT="$INPUT_BAM" \
    OUTPUT="$OUTPUT_BAM" \
    TMP_DIR="$TMP_DIR"
Python Script:
import os
import subprocess

# Set variables
input_bam = "input.bam"
output_bam = "output.sorted.bam"
tmp_dir = "/path/to/large/disk/tmp"

# Create temporary directory
os.makedirs(tmp_dir, exist_ok=True)

# Run Picard SortSam
command = [
    "java", "-Xmx2g", f"-Djava.io.tmpdir={tmp_dir}",
    "-jar", "SortSam.jar",
    "SORT_ORDER=coordinate",
    f"INPUT={input_bam}",
    f"OUTPUT={output_bam}",
    f"TMP_DIR={tmp_dir}",
]
# check=True raises CalledProcessError if Picard exits with a non-zero status
subprocess.run(command, check=True)
8. Monitor Disk Usage
Regularly monitor disk usage to avoid future issues.
Unix Command:
du -sh /path/to/large/disk/tmp
This will show the total size of the temporary directory.
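For monitoring from within a pipeline script, the total can also be computed in Python. The sketch below is an approximation, not a replacement for du: it sums apparent file sizes, whereas du reports allocated disk blocks, so the two numbers can differ. The function name directory_size_bytes is hypothetical.

```python
import os


def directory_size_bytes(root):
    """Sum the apparent sizes of all files under `root`, recursively."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so linked files are not counted twice
                total += os.lstat(path).st_size
            except FileNotFoundError:
                # A temporary file may vanish between listing and stat; skip it.
                continue
    return total


if __name__ == "__main__":
    size = directory_size_bytes(".")
    print(f"{size / 1024**2:.1f} MiB")
```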
9. Clean Up Temporary Files
After the job completes, clean up the temporary directory to free up space.
Unix Command:
rm -rf /path/to/large/disk/tmp/*
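An alternative to a manual rm -rf is to give each job its own subdirectory that is removed automatically when the job ends, which also avoids deleting files that belong to another job still running. This is a minimal sketch built on Python's tempfile.TemporaryDirectory; the helper name job_tmpdir and the picard_ prefix are assumptions made for illustration.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def job_tmpdir(base_dir, prefix="picard_"):
    """Yield a fresh scratch directory under `base_dir`; delete it on exit.

    Cleanup happens even if the body raises an exception.
    """
    os.makedirs(base_dir, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix=prefix, dir=base_dir) as path:
        yield path


# Example use (paths are placeholders):
#
# with job_tmpdir("/path/to/large/disk/tmp") as tmp:
#     subprocess.run(["java", f"-Djava.io.tmpdir={tmp}", "-jar", "SortSam.jar",
#                     "INPUT=input.bam", "OUTPUT=output.sorted.bam",
#                     "SORT_ORDER=coordinate", f"TMP_DIR={tmp}"], check=True)
```

Only the per-job subdirectory is removed; the base directory itself is left in place for later runs.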
10. Additional Tips
- Compress Intermediate Files: Use tools like gzip or bzip2 to compress intermediate files if disk space is a concern.
- Use Cloud Storage: If working in a cloud environment, consider using high-capacity storage buckets for temporary files.
- Optimize Memory Usage: Adjust -Xmx (maximum memory) based on your system’s RAM to avoid excessive disk swapping.
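As an illustration of the last tip, the -Xmx value can be derived from the machine's RAM instead of being hard-coded. The sketch below is only a heuristic: the 50% default fraction is an assumption rather than a Picard recommendation, both function names are hypothetical, and on a shared cluster the memory granted by the scheduler matters more than the node's physical RAM.

```python
import os


def xmx_flag(total_ram_bytes, fraction=0.5):
    """Build a -Xmx flag that gives Java `fraction` of the given RAM, in MiB."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    heap_mib = int(total_ram_bytes * fraction) // (1024 * 1024)
    return f"-Xmx{max(heap_mib, 1)}m"


def physical_ram_bytes():
    """Total physical RAM as reported by sysconf (Linux and most Unix systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    print(xmx_flag(physical_ram_bytes()))
```

For a machine with 16 GiB of RAM, xmx_flag(16 * 1024**3) yields -Xmx8192m.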
By following these steps, you can effectively manage disk space issues when running Picard Tools and ensure smooth execution of your bioinformatics workflows.