Step-by-Step Guide to Minor Allele Frequency (MAF) Calculation

December 28, 2024 By admin

Introduction to Minor Allele Frequency

Minor allele frequency (MAF) is the frequency of the less common allele at a variable site in a given population. It is widely used to separate common variants from rare ones, for example with a cutoff such as MAF < 0.05.


Step 1: Understand the Basics of Allele Frequencies

  • Each individual in a diploid organism has two alleles for a SNP (Single Nucleotide Polymorphism).
  • If the population size is $N$:
    • Total number of alleles = $2 \times N$.
  • For a biallelic SNP, if the major allele (e.g., A) has frequency $0.6$, the minor allele (e.g., G) has frequency $1 - 0.6 = 0.4$.
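
The relationships in the list above can be written compactly. This is only a notational sketch: the symbols $n_A$ and $n_G$ (observed counts of each allele) and $p$, $q$ (their frequencies) are introduced here for illustration and assume a biallelic SNP in $N$ diploid individuals.

```latex
\[
p = \frac{n_A}{2N}, \qquad
q = \frac{n_G}{2N}, \qquad
p + q = 1, \qquad
\mathrm{MAF} = \min(p, q)
\]
```

Taking the minimum matters because which allele counts as "minor" depends on the population being studied, not on how the alleles are labelled.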

Step 2: Manual Calculation

Suppose you have a SNP with the following data:

  • Population size = 100 individuals.
  • Major allele (A) frequency = 0.6.
  • Minor allele (G) frequency = 0.4.

Steps:

  1. Total alleles = $100 \times 2 = 200$.
  2. Number of G alleles = $200 \times 0.4 = 80$.
  3. Verify:
    • A alleles = $200 \times 0.6 = 120$.
    • Total alleles = $80 + 120 = 200$.

For a minor allele frequency of $0.05$:

  • Number of minor alleles = $200 \times 0.05 = 10$.
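
In practice the starting point is often genotype counts rather than allele frequencies. The sketch below shows one way to go from genotype counts to allele counts and frequencies for a biallelic SNP in diploids. The function names and the genotype counts (36 AA, 48 AG, 16 GG) are hypothetical; the counts were chosen only because they reproduce the 120 A and 80 G alleles of the example above.

```python
def allele_counts_from_genotypes(n_hom_a, n_het, n_hom_b):
    """Return (count_a, count_b) for a biallelic SNP in diploid individuals.

    Each homozygote contributes two copies of its allele and each
    heterozygote contributes one copy of each allele.
    """
    count_a = 2 * n_hom_a + n_het
    count_b = 2 * n_hom_b + n_het
    return count_a, count_b


def minor_allele_frequency(count_a, count_b):
    """Return the frequency of whichever allele is rarer."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no alleles observed")
    return min(count_a, count_b) / total


# Hypothetical genotype counts for 100 individuals: 36 AA, 48 AG, 16 GG
a_count, g_count = allele_counts_from_genotypes(36, 48, 16)
maf = minor_allele_frequency(a_count, g_count)

print(a_count, g_count)  # 120 80
print(maf)               # 0.4
```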

Step 3: Using Databases

For larger datasets or known SNPs:

  1. NCBI dbSNP: Provides allele frequencies across populations.
  2. 1000 Genomes Project: Offers population-based MAF data.

Step 4: Automating the Calculation with UNIX and Perl

UNIX Approach

If you have a VCF (Variant Call Format) file:

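  1. Extract Allele Frequencies:
    bash
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/AF\n' input.vcf > allele_frequencies.txt
    • AF is the frequency of the alternate (ALT) allele, and it is only reported if the VCF carries an AF tag in its INFO column. It equals the MAF only when ALT is the rarer allele; otherwise MAF = 1 − AF.
  2. Filter for MAF < 0.05 (biallelic sites; missing AF values are skipped):
    bash
    awk -F'\t' '$5 != "." { maf = ($5 > 0.5) ? 1 - $5 : $5; if (maf < 0.05) print }' allele_frequencies.txt > rare_variants.txt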
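
Because bcftools reports AF for the ALT allele, an AF of 0.97 corresponds to a MAF of 0.03, and a filter on AF alone would miss that site. The following is a minimal Python sketch of the same filtering step with the frequency "folded" to the minor allele. It assumes the five-column, tab-delimited layout produced by the query above; the function names and the sample rows are hypothetical.

```python
import io


def fold_to_maf(alt_af):
    """Convert an ALT allele frequency into a minor allele frequency."""
    return min(alt_af, 1.0 - alt_af)


def rare_variants(lines, threshold=0.05):
    """Yield (chrom, pos, ref, alt, maf) for sites with MAF below threshold."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed or empty line
        chrom, pos, ref, alt, af_text = fields[:5]
        try:
            alt_af = float(af_text)
        except ValueError:
            # Missing values (".") and multiallelic sites ("0.1,0.2") are skipped
            continue
        maf = fold_to_maf(alt_af)
        if maf < threshold:
            yield chrom, int(pos), ref, alt, maf


# Hypothetical rows in the layout of allele_frequencies.txt
sample = (
    "chr1\t100\tA\tG\t0.02\n"
    "chr1\t200\tC\tT\t0.97\n"
    "chr1\t300\tG\tA\t0.40\n"
    "chr1\t400\tT\tC\t.\n"
)
results = list(rare_variants(io.StringIO(sample)))
print([row[1] for row in results])  # [100, 200]
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` object.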

Perl Script

If you have a tab-delimited file with allele counts:

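#!/usr/bin/perl
use strict;
use warnings;

# Input file format (tab-delimited): Chromosome  Position  Ref_Count  Alt_Count
my $input_file = 'allele_counts.txt';
open(my $fh, '<', $input_file) or die "Cannot open $input_file: $!";

while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*(#|$)/;    # skip comment and blank lines
    my ($chrom, $pos, $ref, $alt) = split /\t/, $line;

    # Skip header or malformed rows whose counts are not non-negative integers
    next unless defined $alt && $ref =~ /^\d+$/ && $alt =~ /^\d+$/;

    my $total = $ref + $alt;
    next if $total == 0;             # no observed alleles: avoid division by zero

    # The minor allele is whichever of the two is rarer, not necessarily ALT
    my $minor = $ref < $alt ? $ref : $alt;
    my $maf   = $minor / $total;
    print "Chromosome: $chrom, Position: $pos, MAF: $maf\n" if $maf < 0.05;
}
close($fh);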

  1. Save the file as calculate_maf.pl.
  2. Run the script:
    bash
    perl calculate_maf.pl
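
The script reads per-site allele counts, which may not exist yet if only genotype calls are available. As a rough sketch, the counts could be derived from VCF-style GT strings as shown below. The helper names are hypothetical, and the code assumes biallelic sites where `0` denotes REF and `1` denotes ALT; any other allele code is ignored.

```python
import re


def count_alleles(genotypes):
    """Count REF (0) and ALT (1) alleles across GT strings like '0/1' or '1|1'."""
    ref = alt = 0
    for gt in genotypes:
        # Alleles are separated by "/" (unphased) or "|" (phased)
        for allele in re.split(r"[/|]", gt):
            if allele == "0":
                ref += 1
            elif allele == "1":
                alt += 1
            # Missing calls (".") and other allele codes are not counted
    return ref, alt


def format_row(chrom, pos, genotypes):
    """Build one tab-delimited line: Chromosome, Position, Ref_Count, Alt_Count."""
    ref, alt = count_alleles(genotypes)
    return f"{chrom}\t{pos}\t{ref}\t{alt}"


# Hypothetical genotype calls for five samples at one site
gts = ["0/0", "0/1", "1|1", "./.", "0/0"]
print(count_alleles(gts))  # (5, 3)
```

Writing one `format_row` line per site yields a file in the layout the Perl script expects.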

Step 5: Validation

Verify results using population genetics tools:

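  • PLINK: Compute per-variant allele frequencies; with PLINK 1.9 the command below writes them to allele_frequencies.frq. Adding --maf 0.05 to a PLINK command removes variants whose MAF is below 0.05.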
    bash
    plink --vcf input.vcf --freq --out allele_frequencies
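
One way to make the cross-check concrete is to compare your own MAF values against the PLINK report. The sketch below assumes the whitespace-delimited PLINK 1.9 `.frq` layout with a header containing `SNP` and `MAF` columns. The function names, the tolerance, and the sample rows are hypothetical.

```python
def parse_frq(lines):
    """Map SNP ID -> MAF from the lines of a PLINK 1.9 .frq report."""
    rows = iter(lines)
    header = next(rows).split()
    snp_col, maf_col = header.index("SNP"), header.index("MAF")
    mafs = {}
    for line in rows:
        fields = line.split()
        if len(fields) <= max(snp_col, maf_col):
            continue  # blank or truncated line
        if fields[maf_col] == "NA":
            continue  # no frequency reported for this variant
        mafs[fields[snp_col]] = float(fields[maf_col])
    return mafs


def disagreements(expected, observed, tolerance=1e-3):
    """Return SNP IDs whose MAF is missing or differs by more than tolerance."""
    return sorted(
        snp for snp, maf in expected.items()
        if snp not in observed or abs(observed[snp] - maf) > tolerance
    )


# Hypothetical report content
frq_text = """ CHR  SNP   A1   A2   MAF   NCHROBS
   1  rs1    G    A   0.4   200
   1  rs2    T    C   0.05  200
   1  rs3    A    G   NA    0
"""
observed = parse_frq(frq_text.splitlines())
print(disagreements({"rs1": 0.4, "rs2": 0.10}, observed))  # ['rs2']
```

An empty list means every hand-computed value matched the report within the chosen tolerance.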

Conclusion

By understanding and automating MAF calculations, researchers can efficiently analyze SNP data, for example to separate rare variants from common ones. Tools like bcftools, awk, Perl, and PLINK, combined with manual validation, provide robust workflows for both small and large datasets.
