Convert VCF File to PLINK PED/MAP Format While Filtering SNPs
January 3, 2025
This guide provides step-by-step instructions to convert a VCF file to PLINK PED/MAP format while filtering SNPs based on Minor Allele Frequency (MAF). You can use either vcftools or PLINK for this task.
Step 1: Install Required Tools
Ensure you have the following tools installed:
- vcftools: For VCF file manipulation.
- PLINK: For handling genetic data in PED/MAP or binary BED/BIM/FAM formats.
Install them using:
# Install vcftools
sudo apt-get install vcftools
# Install PLINK 1.9 (the commands in this guide use PLINK 1.9 syntax)
wget http://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20231009.zip
unzip plink_linux_x86_64_20231009.zip
sudo mv plink /usr/local/bin/
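Before moving on, it can help to confirm that both tools are actually on your PATH. The snippet below is a minimal sketch; `require_tool` is a hypothetical helper name introduced here for illustration, not part of vcftools or PLINK.

```shell
# require_tool NAME: succeed if NAME is on PATH, otherwise print a hint and fail.
# (require_tool is a hypothetical helper, not shipped with vcftools or PLINK.)
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 -> $(command -v "$1")"
  else
    echo "missing: $1 (install it before continuing)" >&2
    return 1
  fi
}

# Report on both tools without stopping at the first missing one.
for tool in vcftools plink; do
  require_tool "$tool" || true
done
```

If either tool is reported as missing, repeat the corresponding install command above and check that /usr/local/bin is on your PATH.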
Step 2: Convert VCF to PLINK PED/MAP Format with MAF Filtering
Option 1: Using vcftools
- Convert VCF to PLINK format and filter SNPs by MAF in one step:
vcftools --vcf myvcf.vcf --plink --maf 0.05 --out myplink
--vcf: Input VCF file.
--plink: Output in PLINK PED/MAP format.
--maf 0.05: Keep only SNPs with MAF ≥ 0.05.
--out: Output file prefix.
- Output Files:
myplink.ped: PLINK PED file.
myplink.map: PLINK MAP file.
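To make the MAF threshold concrete, here is a rough sketch that computes a per-site minor allele frequency directly from the genotype columns of a small VCF. It assumes an uncompressed, tab-separated VCF with biallelic sites and GT as the first FORMAT subfield; `maf_per_site` is a hypothetical helper, and the tools' own calculations may differ in details such as how missing genotypes are handled.

```shell
# maf_per_site FILE: print CHROM, POS and minor allele frequency for each site.
# Assumptions: tab-separated VCF, biallelic sites, GT is the first FORMAT subfield.
maf_per_site() {
  awk -F'\t' '
    /^#/ { next }                              # skip header lines
    {
      alt = 0; total = 0
      for (i = 10; i <= NF; i++) {             # sample columns start at field 10
        split($i, parts, ":")                  # GT is the part before the first ":"
        n = split(parts[1], allele, "[/|]")    # unphased "/" or phased "|"
        for (j = 1; j <= n; j++) {
          if (allele[j] == ".") continue       # missing allele: not counted
          total++
          if (allele[j] != "0") alt++
        }
      }
      if (total == 0) next                     # no called genotypes at this site
      af = alt / total
      maf = (af > 0.5) ? 1 - af : af           # the minor allele is the rarer one
      printf "%s\t%s\t%.4f\n", $1, $2, maf
    }' "$1"
}

# Tiny made-up example: four samples, two sites.
demo_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n'
  printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/0\n'
  printf '1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0\t0/0\t0/0\n'
} > "$demo_vcf"
maf_per_site "$demo_vcf"
rm -f "$demo_vcf"
```

In this made-up example the first site has 3 alternate alleles out of 8 (MAF 0.375) and would pass a 0.05 threshold, while the second site is monomorphic (MAF 0) and would be dropped.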
Option 2: Using PLINK
- Convert VCF to PLINK PED/MAP format and filter SNPs by MAF in one step:
plink --vcf myvcf.vcf --maf 0.05 --recode --out myplink
--vcf: Input VCF file.
--maf 0.05: Keep only SNPs with MAF ≥ 0.05.
--recode: Output in PLINK PED/MAP format.
--out: Output file prefix.
- Output Files:
myplink.ped: PLINK PED file.
myplink.map: PLINK MAP file.
Step 3: Verify Output
Check the generated PED and MAP files to ensure the conversion and filtering were successful:
head myplink.ped
head myplink.map
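Beyond eyeballing the first lines, a simple structural check is possible: in the standard PED layout each line has 6 leading columns followed by two allele columns per variant, so the field count should equal 6 + 2 × (number of lines in the MAP file). The sketch below assumes that standard whitespace-delimited layout; `check_ped_map` is a hypothetical helper name.

```shell
# check_ped_map PREFIX: compare PREFIX.ped field counts with PREFIX.map line count.
# Assumes the standard PED layout: 6 leading columns + 2 allele columns per variant.
check_ped_map() {
  prefix=$1
  n_variants=$(awk 'END { print NR }' "$prefix.map") || return 1
  awk -v want="$((6 + 2 * n_variants))" '
    NF != want { bad = bad " " NR }            # remember lines with the wrong width
    END {
      if (bad != "") {
        print "PED lines with unexpected field count:" bad
        exit 1
      }
      print "ok: " NR " samples, " ((want - 6) / 2) " variants"
    }' "$prefix.ped"
}
```

Calling `check_ped_map myplink` after either conversion option should report the number of samples and variants, or list the PED line numbers whose width does not match the MAP file.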
Step 4: Additional Notes
- Binary Format (BED/BIM/FAM): If you prefer the binary format for faster processing, replace --recode with --make-bed in the PLINK command:
plink --vcf myvcf.vcf --maf 0.05 --make-bed --out myplink
This will generate myplink.bed, myplink.bim, and myplink.fam files.
- Handling Missing Sample Information: If your VCF file lacks sample information, you may need to manually create a .fam file or use --allow-no-samples (though this limits some analyses).
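One possible reading of "lacks sample information" is a sites-only VCF, that is, one whose header line has no sample columns. The sketch below counts the sample columns on the #CHROM header line so you can spot this case before converting. It assumes an uncompressed, tab-separated VCF; `vcf_sample_count` is a hypothetical helper name.

```shell
# vcf_sample_count FILE: print the number of sample columns in an uncompressed VCF.
# Fails (exit status 1) if no #CHROM header line is found.
vcf_sample_count() {
  awk -F'\t' '
    /^#CHROM/ {
      n = NF - 9                 # 8 fixed columns plus FORMAT precede the samples
      print (n > 0 ? n : 0)      # a sites-only header has only the 8 fixed columns
      found = 1
      exit
    }
    END { if (!found) exit 1 }' "$1"
}
```

For example, `vcf_sample_count myvcf.vcf` printing 0 would indicate that there are no genotypes to compute a MAF from, so the MAF filter in this guide would not be meaningful for that file.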
Online Tools
For users who prefer a graphical interface or online tools:
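- PLINK 1.07 website: https://zzz.bwh.harvard.edu/plink/ (documentation and downloads for the older PLINK release; this is a reference site rather than an online conversion service).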
- Galaxy Project: https://usegalaxy.org/ (Supports VCF to PLINK conversion with filtering).
Example Script
Here’s a complete script to automate the process:
#!/bin/bash
# Input VCF file
VCF="myvcf.vcf"
# Output prefix
OUTPUT="myplink"
# MAF threshold
MAF=0.05
# Step 1: Convert VCF to PLINK format with MAF filtering
plink --vcf "$VCF" --maf "$MAF" --recode --out "$OUTPUT"
# Step 2: Verify output
echo "PED file:"
head "${OUTPUT}.ped"
echo "MAP file:"
head "${OUTPUT}.map"
Save the script as convert_vcf_to_plink.sh, make it executable, and run it:
chmod +x convert_vcf_to_plink.sh
./convert_vcf_to_plink.sh
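If you want to reuse the script for different files, one option is to wrap the PLINK call in a function that takes the input, output prefix, and MAF threshold as arguments and checks them first. This is a sketch rather than a definitive implementation; `convert_vcf` and its exit codes are assumptions introduced here, while the PLINK command itself is the one used throughout this guide.

```shell
# convert_vcf VCF OUT_PREFIX [MAF]: validate inputs, then run the PLINK conversion.
# (convert_vcf and its exit codes are hypothetical; MAF defaults to 0.05.)
convert_vcf() {
  if [ "$#" -lt 2 ]; then
    echo "usage: convert_vcf VCF OUT_PREFIX [MAF]" >&2
    return 2
  fi
  vcf=$1
  out=$2
  maf=${3:-0.05}
  if [ ! -r "$vcf" ]; then
    echo "error: cannot read VCF file: $vcf" >&2
    return 1
  fi
  if ! command -v plink >/dev/null 2>&1; then
    echo "error: plink not found on PATH" >&2
    return 127
  fi
  # Same command as in the script above, with the arguments substituted.
  plink --vcf "$vcf" --maf "$maf" --recode --out "$out"
}
```

A call such as `convert_vcf myvcf.vcf myplink 0.01` would then run the conversion with a 1% MAF threshold, and a missing input file is reported before PLINK is started.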
This guide provides a straightforward method to convert and filter VCF files into PLINK PED/MAP format. Use the approach that best fits your workflow.