Convert VCF File to PLINK PED/MAP Format While Filtering SNPs

January 3, 2025 By admin

This guide provides step-by-step instructions to convert a VCF file to PLINK PED/MAP format while filtering SNPs based on Minor Allele Frequency (MAF). You can use either vcftools or PLINK for this task.


Step 1: Install Required Tools

Ensure you have the following tools installed:

  1. vcftools: For VCF file manipulation.
  2. PLINK: For handling genetic data in PED/MAP or binary BED/BIM/FAM formats.

Install them using:

bash
# Install vcftools (Debian/Ubuntu)
sudo apt-get install vcftools

# Install PLINK 1.9 (the commands in this guide use PLINK 1.9 syntax)
wget https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20231009.zip
unzip plink_linux_x86_64_20231009.zip
sudo mv plink /usr/local/bin/
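
Before moving on, it can help to confirm that both tools are actually reachable from your shell. The following is a minimal sketch; the helper name check_tool is made up for illustration and is not part of either tool.

```shell
# Report whether a command is available on PATH.
# Returns 0 if found, 1 otherwise. (check_tool is a hypothetical helper name.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check both tools used in this guide and print a hint for any that is missing.
for tool in vcftools plink; do
    check_tool "$tool" || echo "install $tool before continuing" >&2
done
```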

Step 2: Convert VCF to PLINK PED/MAP Format with MAF Filtering

Option 1: Using vcftools

  1. Convert VCF to PLINK format and filter SNPs by MAF in one step:
    bash
    vcftools --vcf myvcf.vcf --plink --maf 0.05 --out myplink
    • --vcf: Input VCF file.
    • --plink: Output in PLINK PED/MAP format (vcftools writes only biallelic sites).
    • --maf 0.05: Keep only SNPs with MAF ≥ 0.05.
    • --out: Output file prefix.
  2. Output Files:
    • myplink.ped: PLINK PED file.
    • myplink.map: PLINK MAP file.
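
If your VCF is gzip-compressed, vcftools expects --gzvcf instead of --vcf. A small sketch of picking the flag from the file extension is shown here; the function name vcftools_input_flag and the file name myvcf.vcf.gz are assumptions for illustration only.

```shell
# Print the vcftools input flag that matches the file extension:
# --gzvcf for gzip-compressed VCFs (*.vcf.gz), --vcf otherwise.
# (vcftools_input_flag is a hypothetical helper name.)
vcftools_input_flag() {
    case "$1" in
        *.vcf.gz) echo "--gzvcf" ;;
        *)        echo "--vcf" ;;
    esac
}

# Build the conversion command for a given input and print it for review.
# Remove the leading "echo" to actually run vcftools.
input="myvcf.vcf.gz"   # assumed file name for illustration
echo vcftools "$(vcftools_input_flag "$input")" "$input" --plink --maf 0.05 --out myplink
```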

Option 2: Using PLINK

  1. Convert VCF to PLINK PED/MAP format and filter SNPs by MAF in one step:
    bash
    plink --vcf myvcf.vcf --maf 0.05 --recode --out myplink
    • --vcf: Input VCF file.
    • --maf 0.05: Keep only SNPs with MAF ≥ 0.05.
    • --recode: Output in PLINK PED/MAP format.
    • --out: Output file prefix.
  2. Output Files:
    • myplink.ped: PLINK PED file.
    • myplink.map: PLINK MAP file.
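
To make the MAF threshold concrete, the sketch below computes MAF by hand for a tiny made-up VCF and marks which sites a 0.05 cutoff would keep. It is a simplified illustration (ALT allele count divided by called alleles, then the smaller of that frequency and its complement), not the implementation used by vcftools or PLINK; the function name maf_table and the toy data are assumptions.

```shell
# Create a tiny VCF with 3 biallelic sites and 4 diploid samples (made-up data).
tmp_vcf=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n'
    printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/0\n'
    printf '1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0\t0/0\t0/0\n'
    printf '1\t300\trs3\tG\tA\t.\tPASS\t.\tGT\t1/1\t1/1\t0/1\t./.\n'
} > "$tmp_vcf"

# Print "ID <tab> MAF <tab> keep|drop" per site for a given threshold.
# (maf_table is a hypothetical helper name.)
maf_table() {
    awk -F'\t' -v thr="$2" '
        /^#/ { next }                        # skip header lines
        {
            alt = 0; called = 0
            for (i = 10; i <= NF; i++) {     # sample columns start at field 10
                split($i, f, ":")            # GT is the first FORMAT subfield
                n = split(f[1], a, "[/|]")   # unphased "/" or phased "|"
                for (j = 1; j <= n; j++) {
                    if (a[j] == ".") continue    # missing allele: not counted
                    called++
                    if (a[j] != "0") alt++
                }
            }
            af  = (called > 0) ? alt / called : 0
            maf = (af > 0.5) ? 1 - af : af
            printf "%s\t%.3f\t%s\n", $3, maf, (maf >= thr ? "keep" : "drop")
        }' "$1"
}

result=$(maf_table "$tmp_vcf" 0.05)
echo "$result"
rm -f "$tmp_vcf"
```

With this toy data, rs1 (MAF 0.375) and rs3 (MAF 0.167) are kept, while the monomorphic site rs2 (MAF 0.000) is dropped.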

Step 3: Verify Output

Check the generated PED and MAP files to ensure the conversion and filtering were successful:

bash
head myplink.ped
head myplink.map
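
Beyond eyeballing the first lines, you can check that the two files agree with each other: each PED row should have 6 leading columns plus 2 allele columns per variant listed in the MAP file. The sketch below does this with awk; check_ped_map and the toy files are hypothetical names created only for the demonstration.

```shell
# Check that every PED row has 6 + 2 * (number of MAP rows) fields.
# Prints a summary and returns non-zero if any row has a different count.
# (check_ped_map is a hypothetical helper name.)
check_ped_map() {
    prefix=$1
    n_variants=$(awk 'NF > 0' "${prefix}.map" | wc -l)
    awk -v nv="$n_variants" '
        NF != 6 + 2 * nv { bad++; printf "line %d: %d fields\n", NR, NF }
        END {
            printf "%d samples, %d variants, %d malformed rows\n", NR, nv, bad
            exit (bad + 0 > 0)
        }' "${prefix}.ped"
}

# Demonstration on a toy fileset with 2 samples and 2 variants (made-up data).
demo=$(mktemp -d)
printf '1\trs1\t0\t100\n1\trs3\t0\t300\n' > "$demo/toy.map"
printf 'S1 S1 0 0 0 -9 A A G G\nS2 S2 0 0 0 -9 A G G A\n' > "$demo/toy.ped"
check_ped_map "$demo/toy"
rm -rf "$demo"
```

On your own data, the equivalent call would be check_ped_map myplink.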

Step 4: Additional Notes

  • Binary Format (BED/BIM/FAM): If you prefer the binary format for faster processing, replace --recode with --make-bed in the PLINK command:
    bash
    plink --vcf myvcf.vcf --maf 0.05 --make-bed --out myplink

    This will generate myplink.bed, myplink.bim, and myplink.fam files.

  • Handling Missing Sample Information: If your VCF file lacks sample information, you may need to manually create a .fam file or use --allow-no-samples (though this limits some analyses).
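
To find out whether a VCF carries sample columns at all, you can count the fields after the nine fixed columns on the #CHROM header line. This is a minimal sketch that assumes an uncompressed, tab-delimited VCF; count_vcf_samples is a hypothetical helper name.

```shell
# Count sample columns in a VCF: fields after the 9 fixed columns
# (CHROM .. FORMAT) on the #CHROM header line. Prints 0 for a sites-only VCF.
# (count_vcf_samples is a hypothetical helper name.)
count_vcf_samples() {
    awk -F'\t' '/^#CHROM/ { n = NF - 9; print (n > 0 ? n : 0); exit }' "$1"
}

# Demonstration on a toy header with two samples (made-up data).
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' > "$demo_vcf"
echo "samples: $(count_vcf_samples "$demo_vcf")"
rm -f "$demo_vcf"
```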

Online Tools

For users who prefer a graphical interface or online tools:

  1. PLINK website: https://zzz.bwh.harvard.edu/plink/ (original PLINK documentation and downloads).
  2. Galaxy Project: https://usegalaxy.org/ (supports VCF to PLINK conversion with filtering).

Example Script

Here’s a complete script to automate the process:

bash
#!/bin/bash
# Stop on the first error, on unset variables, and on failed pipeline stages
set -euo pipefail

# Input VCF file
VCF="myvcf.vcf"

# Output prefix
OUTPUT="myplink"

# MAF threshold
MAF=0.05

# Step 1: Convert VCF to PLINK format with MAF filtering
plink --vcf "$VCF" --maf "$MAF" --recode --out "$OUTPUT"

# Step 2: Verify output
echo "PED file:"
head "${OUTPUT}.ped"

echo "MAP file:"
head "${OUTPUT}.map"

Save the script as convert_vcf_to_plink.sh, make it executable, and run it:

bash
chmod +x convert_vcf_to_plink.sh
./convert_vcf_to_plink.sh
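
If you would rather pass the input file, output prefix, and threshold on the command line, one possible variation is sketched below. It also rejects thresholds outside 0 to 0.5, since a minor allele frequency never exceeds 0.5. The helper valid_maf and the argument order are assumptions for illustration, and the PLINK command is only printed so you can review it first.

```shell
# Return 0 if the argument is a plain decimal number between 0 and 0.5.
# (valid_maf is a hypothetical helper name.)
valid_maf() {
    awk -v m="$1" 'BEGIN {
        ok = (m ~ /^[0-9]*\.?[0-9]+$/) && (m + 0 <= 0.5)
        exit !ok
    }'
}

# Hypothetical command-line handling: input VCF, output prefix, MAF threshold,
# each falling back to the values used elsewhere in this guide.
vcf=${1:-myvcf.vcf}
out=${2:-myplink}
maf=${3:-0.05}

if valid_maf "$maf"; then
    # Printed for review; remove "echo" to run the conversion.
    echo plink --vcf "$vcf" --maf "$maf" --recode --out "$out"
else
    echo "MAF threshold must be a number between 0 and 0.5, got: $maf" >&2
fi
```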

This guide provides a straightforward method to convert and filter VCF files into PLINK PED/MAP format. Use the approach that best fits your workflow.