Ronald Robertson

Sanjay Singh

Scientist/Writer
  • Emailsanjaysingh765@gmail.com
  • Socail@lampatlex
  • VisitorSince 1982
  • LocationKentucky, USA



How to perform parallel BLAST





BLAST can be time-consuming especially when it includes a large number of the query sequence. Therefore, parallel BLAST can be useful. This Bash script can perform parallel BLASt in three common steps



  • Split fasta file into files with a given number of sequences 

  • BLAST them in parallel fashion

  • Combine their result



Since it doesn't include any additional software except BLAST+, therefore, it is easy to use.  If you want to parse your BLAST result you can always use NCBI BLAST parser from here.


USES


This bash script require following commands

script_name: Name of given script

blast_type: What kind of BLAST you want to perform e.g. blastn

query_file: Name of query file

database_name: Name of the database

number_of_sequence_each_file: how many sequences you want per query file

SCRIPT



#!/bin/bash

########################################################################################################################
#
# split fasta file into files with given number of sequences
# blast them in parallel fashion
# combine their result
########################################################################################################################

prog="$1"
query="$2"
database="$3"
num_seq="$4"


if [[ $# -lt 3 ]] ; then
printf "\033[1;31mGive me a proper command\033[0m\n"
printf "\033[1;31mUsage: script_name blast_type query_file database name number_of_sequence_each_file\033[0m\n\n"

exit;


else

start=`date +%s`

#split fasta sequences in given number of sequences per file
awk -v a_seq=$num_seq 'BEGIN {n_seq=0;} /^>/ {if(n_seq%a_seq==0){file=sprintf("myseq%d.fa",n_seq);} print >> file; n_seq++; next;} { print >> file; }' < $query



#run blast
ls *.fa | parallel -a - $prog -query {} -db $database -out {.}.out -evalue 0.001 -num_descriptions 1 -num_alignments 1 -num_threads 8

cat *.out >$query.blast.result


while true; do
read -p "Do you want to delete intermediate files?" yn
case $yn in
[Yy]* ) rm -f *.out *.fa; break;;
[Nn]* ) exit;;
* ) echo "Please answer yes or no.";;

esac
done

runtime=$((end-start))
printf "\033[1;31mYour analysis was done in $((($(date +%s)-$start)/60)) minutes\033[0m\n"${reset}
printf "\033[1;31mTHANKS\033[0m\n"${reset}

fi






Comments