
A Comprehensive Overview of High-Throughput Sequencing Tools

October 10, 2024, by admin

Introduction

High-throughput sequencing (HTS) has revolutionized the field of genomics, enabling researchers to generate massive amounts of sequence data at unprecedented speed and scale. This advancement has opened new avenues for exploring complex biological questions, ranging from gene expression analysis to genomic variation in disease. To harness the potential of HTS, various software tools have been developed to facilitate data analysis, visualization, and interpretation. This overview highlights a selection of notable tools designed to support different aspects of high-throughput sequencing workflows, catering to diverse applications in genomics and functional genomics.

Overview of Tools

  1. SAMtools: A fundamental suite of programs for interacting with sequencing data, SAMtools allows users to manipulate alignments in SAM/BAM/CRAM formats, index reference sequences, and extract subsequences. Its versatility makes it a cornerstone in many HTS analyses.
  2. CRISPR-AnalyzeR for pooled screens (CaRpools): This R package provides an accessible platform for exploratory data analysis and CRISPR/Cas9 screen analysis, integrating documentation and generating standardized analysis reports, making it suitable for both novice and experienced users.
  3. Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK): MAGeCK excels in identifying positively and negatively selected sgRNAs and genes in genome-scale CRISPR/Cas9 knockout experiments. Its robust workflow controls the false discovery rate, enhancing the reliability of results.
  4. ScreenBEAM: An R package for gene-level meta-analysis, ScreenBEAM focuses on high-throughput functional genomics data from RNAi or CRISPR screens, facilitating the identification of significant hit genes.
  5. HiTSelect: This comprehensive analysis pipeline rigorously selects screen hits while addressing off-target effects and controlling for variance, helping researchers identify functionally relevant genes and pathways with integrated metadata.
  6. CRISPRs: This tool enables the detection of CRISPRs in locally produced datasets while consulting a database of known CRISPR sequences, aiding in the characterization of CRISPR-related genes.
  7. EBARDenovo: Designed for RNA-Seq, EBARDenovo offers highly accurate de novo assembly with effective chimera detection, ensuring the integrity of the assembled sequences.
  8. Blast2GO: A powerful tool for sequence annotation and data mining, Blast2GO utilizes gene ontology (GO) vocabulary to optimize function transfer from homologous sequences while providing visualization and statistical analysis capabilities.
  9. CodingQuarry: A self-training generalized hidden Markov model (GHMM) gene predictor for fungi, CodingQuarry is tailored for high-quality fungal transcript assemblies, enabling accurate gene predictions despite potential assembly issues.
  10. HOMER: This suite of tools encompasses motif discovery and various next-gen sequencing analyses, making it a versatile choice for examining diverse functional genomics datasets.
  11. NGS-QC: Focusing on quality control, NGS-QC infers quality indicators from the distribution of sequenced reads, helping researchers assess the reliability of their sequencing data.
  12. kmerHMM: Optimized for motif discovery in Protein Binding Microarray data, kmerHMM employs Hidden Markov Models and Belief Propagation for efficient analysis.
  13. CexoR: This tool enables strand-specific peak-pair calling in ChIP-exo replicates, estimating irreproducible discovery rates for overlapping peak-pairs, enhancing the accuracy of peak detection.
  14. HATSEQ: HATSEQ identifies functional regions of interest on the genome, providing visualizations and statistical summaries that connect detected regions to gene pathways and motif analysis.
  15. plasmidSPAdes: Specifically for plasmid assembly, plasmidSPAdes uses SPAdes and ExSPAnder to resolve repeats and generate plasmidic contigs from whole genome sequencing data.
  16. PlasmidFinder 1.3: This tool identifies plasmids in sequenced bacterial isolates, detecting a variety of plasmids often linked to antimicrobial resistance, crucial for understanding bacterial pathogenicity.
  17. OrfM: OrfM rapidly identifies open reading frames (ORFs) in sequence data by applying the Aho-Corasick algorithm to find regions uninterrupted by stop codons. It is platform-agnostic but best suited to large, high-quality datasets such as those produced by Illumina sequencers.
  18. IGV (Integrative Genomics Viewer): A powerful visualization tool, IGV allows researchers to explore large integrated genomic datasets and supports various data types, including next-generation sequencing data and genomic annotations.
  19. FastQC: A widely used quality control tool, FastQC assesses the quality of high-throughput sequence data, providing essential insights for downstream analysis.
  20. Picard: This set of command-line tools facilitates manipulation of high-throughput sequencing data in SAM/BAM/CRAM and VCF formats, aiding in various data processing tasks.
  21. CLC Workbench: A commercial tool, CLC Workbench offers advanced features for analyzing DNA, RNA, and protein sequence data, integrating tools for primer design, assembly, and gene expression analysis.
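
To make the ORF detection idea from item 17 concrete, the sketch below scans the three forward reading frames of a DNA string and reports stretches that contain no stop codon. It is a naive illustration only: it does not use the Aho-Corasick algorithm, it ignores the reverse strand, and the function name `find_orfs` and the `min_codons` parameter are hypothetical and not part of OrfM.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (frame, start, end) tuples for stop-free stretches.

    Coordinates are 0-based, end-exclusive, on the forward strand only.
    A stretch is reported when it spans at least `min_codons` codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame
        # Walk the frame codon by codon and cut at every stop codon.
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos))
                start = pos + 3
        # Handle the stretch that runs to the end of the sequence.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            orfs.append((frame, start, end))
    return orfs

if __name__ == "__main__":
    for frame, start, end in find_orfs("ATGAAACCCTAAGGG"):
        print(frame, start, end)
```

A dedicated tool searches all stop codons in a single pass over the sequence, which is where a multi-pattern matcher such as Aho-Corasick pays off on large read sets; the per-frame loop here trades that speed for readability.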

Tools for High-throughput Sequencing

| Software / Tool | Category | Free / Free Trial | Tool Description |
|---|---|---|---|
| SAMtools | High-throughput Sequencing | yes | SAMtools is a suite of programs for interacting with high-throughput sequencing data. It can manipulate alignments in the SAM/BAM/CRAM formats: reading, writing, editing, indexing, viewing, and converting. It can also index reference sequences in FASTA format and extract subsequences from an indexed reference. |
| CRISPR-AnalyzeR for pooled screens (caRpools) | High-throughput Sequencing | yes | CaRpools is an R package for exploratory data analysis and CRISPR/Cas9 screen analysis. It integrates screening documentation and the generation of standardized analysis reports. Its open virtual appliance allows analysis without prior programming knowledge, so it suits both novice and expert users. |
| Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) | High-throughput Sequencing | yes | MAGeCK identifies positively and negatively selected sgRNAs and genes in genome-scale CRISPR/Cas9 knockout experiments. The workflow has four steps: read count normalization, mean-variance modeling, sgRNA ranking, and gene ranking. MAGeCK is reported to outperform existing computational methods in its control of the false discovery rate (FDR) and its sensitivity. |
| ScreenBEAM | High-throughput Sequencing | yes | Screening Bayesian Evaluation and Analysis Method (ScreenBEAM) is an R package for gene-level meta-analysis of high-throughput functional genomics RNAi or CRISPR screening data. |
| HiTSelect | High-throughput Sequencing | yes | A comprehensive analysis pipeline for rigorously selecting screen hits and identifying functionally relevant genes and pathways by addressing off-target effects, controlling for variance in both gene silencing efficiency and sequencing depth of coverage, and integrating relevant metadata. HiTSelect is implemented as an open-source package with a user-friendly interface for data visualization and pathway exploration. |
| CRISPRs | High-throughput Sequencing | yes | CRISPRs enables the detection of CRISPRs in locally produced data and the consultation of CRISPRs present in the database. If CRISPR-associated (cas) genes are annotated, the program shows them as well. |
| EBARDenovo | High-throughput Sequencing | yes | EBARDenovo is a highly accurate de novo assembler for RNA-Seq data with efficient chimera detection. |
| Blast2GO | High-throughput Sequencing | yes | Blast2GO is specialized for the annotation of sequences and data mining on the resulting annotations, primarily based on the gene ontology (GO) vocabulary. Using an algorithm that considers similarity, the extension of the homology, the database of choice, the GO hierarchy, and the quality of the original annotations, Blast2GO optimizes function transfer from homologous sequences. The tool includes numerous functions for the visualization, management, and statistical analysis of annotation results, including gene set enrichment analysis. The application supports InterPro, enzyme codes, KEGG pathways, GO directed acyclic graphs (DAGs), and GOSlim. |
| CodingQuarry | High-throughput Sequencing | yes | CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. Predictions are made directly from transcript sequences, which is possible because of the high quality of fungal transcript assemblies. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. |
| HOMER | High-throughput Sequencing | yes | HOMER is a suite of tools for motif discovery and next-generation sequencing analysis. It contains many useful tools for analyzing ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C, and numerous other types of functional genomics sequencing data sets. |
| NGS-QC | High-throughput Sequencing | yes | NGS-QC (Next Generation Sequencing Quality Control Generator) is a computational approach that infers quality indicators from the distribution of sequenced reads associated with a particular NGS profile. |
| kmerHMM | High-throughput Sequencing | yes | kmerHMM is suited to motif discovery on Protein Binding Microarray (PBM) data using a Hidden Markov Model and Belief Propagation. |
| CexoR | High-throughput Sequencing | yes | CexoR enables strand-specific peak-pair calling in ChIP-exo replicates. The cumulative Skellam distribution function is used to detect significant normalised count differences of opposed sign at each DNA strand (peak-pairs). The irreproducible discovery rate for overlapping peak-pairs across biological replicates is estimated using the package 'idr'. |
| HATSEQ | High-throughput Sequencing | yes | HATSEQ identifies functional regions of interest (ROIs) on the genome where a genomic signal significantly deviates from the general genome-wide behavior. The program provides different visualizations and statistical summaries for the detected ROIs and includes a number of built-in post-analyses that attach biological meaning to the detected ROIs in terms of gene pathways and de novo motif analysis. No knowledge of scripting languages is required. |
| plasmidSPAdes | High-throughput Sequencing | yes | plasmidSPAdes assembles plasmids from whole genome sequencing data. It uses SPAdes to transform the de Bruijn graph into the assembly graph and finds a subgraph of the assembly graph, referred to as the plasmid graph. It then uses ExSPAnder for repeat resolution in the plasmid graph using paired reads and generates plasmidic contigs. |
| PlasmidFinder 1.3 | High-throughput Sequencing | yes | PlasmidFinder 1.3 identifies plasmids in totally or partially sequenced isolates of bacteria. It can be used for replicon sequence analysis of raw, contig group, or completely assembled and closed plasmid sequencing data. The current database consists of 116 replicon sequences that match, with at least 80% nucleotide identity, all replicon sequences identified in the 559 fully sequenced plasmids. The program detects a broad variety of plasmids that are often associated with antimicrobial resistance in clinically relevant bacterial pathogens. |
| OrfM | High-throughput Sequencing | yes | OrfM rapidly identifies open reading frames (ORFs) in sequence data by applying the Aho-Corasick algorithm to find regions uninterrupted by stop codons. It is up to five times faster than comparable tools such as 'GetOrf' and 'Translate'. While OrfM is sequencing platform-agnostic, it is best suited to large, high-quality datasets such as those produced by Illumina sequencers. |
| IGV | High-throughput Sequencing | yes | The Integrative Genomics Viewer (IGV) can be used to explore and visualize large integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. |
| FastQC | High-throughput Sequencing | yes | A quality control tool for high-throughput sequence data. |
| Picard | High-throughput Sequencing | yes | A set of command-line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. |
| CLC Workbench | High-throughput Sequencing | yes | The CLC Workbench combines many useful features for DNA, RNA, and protein sequence data analysis. The features include an editor for graphically and algorithmically advanced primer design; assembly of DNA sequence data; molecular cloning; advanced RNA structure prediction and editing; integrated and advanced gene expression analysis; and an integrated 3D molecule view. |
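
The first step in the MAGeCK workflow listed in the table, read count normalization, can be illustrated with a small median-of-ratios sketch. This is one common way to adjust sgRNA counts for differences in sequencing depth between samples; that it matches MAGeCK's internal implementation in every detail is an assumption, and the function names `median_ratio_size_factors` and `normalize_counts` are hypothetical.

```python
import math
from statistics import median

def median_ratio_size_factors(counts):
    """Estimate one size factor per sample.

    `counts` is a list of rows, one row per sgRNA, each row holding the
    raw read counts of that sgRNA in every sample.
    """
    n_samples = len(counts[0])
    # Only sgRNAs observed in every sample have a defined geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no sgRNA has nonzero counts in all samples")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each count to the sgRNA's geometric mean across samples.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize_counts(counts):
    """Divide every count by the size factor of its sample."""
    factors = median_ratio_size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]

if __name__ == "__main__":
    raw = [[10, 20], [100, 200], [50, 100]]  # sample 2 sequenced twice as deep
    print(median_ratio_size_factors(raw))
    print(normalize_counts(raw))
```

Taking the median of the ratios, instead of dividing by total read count, keeps a handful of strongly selected sgRNAs from dominating the scaling factor.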

 

Conclusion

The landscape of high-throughput sequencing is continually evolving, driven by advancements in sequencing technologies and the increasing complexity of biological data. The tools highlighted in this overview represent a fraction of the diverse software ecosystem available to researchers, each designed to tackle specific challenges associated with data analysis and interpretation. By leveraging these tools, scientists can gain deeper insights into genomic functions, uncover novel biomarkers, and advance our understanding of biological processes, ultimately contributing to the fields of genomics, biotechnology, and personalized medicine.
