BLAST can be time-consuming especially when it includes a large number of the query sequence. Therefore, parallel BLAST can be useful. This Bash script can perform parallel BLASt in three common steps
- Split fasta file into files with a given number of sequences
- BLAST them in parallel fashion
- Combine their result
Since it doesn't include any additional software except BLAST+, therefore, it is easy to use. If you want to parse your BLAST result you can always use NCBI BLAST parser from here.
USES
This bash script require following commands
script_name: Name of given script
blast_type: What kind of BLAST you want to perform e.g. blastn
query_file: Name of query file
database_name: Name of the database
number_of_sequence_each_file: how many sequences you want per query file
SCRIPT
#!/bin/bash
########################################################################################################################
#
# split fasta file into files with given number of sequences
# blast them in parallel fashion
# combine their result
########################################################################################################################
prog="$1"
query="$2"
database="$3"
num_seq="$4"
if [[ $# -lt 3 ]] ; then
printf "\033[1;31mGive me a proper command\033[0m\n"
printf "\033[1;31mUsage: script_name blast_type query_file database name number_of_sequence_each_file\033[0m\n\n"
exit;
else
start=`date +%s`
#split fasta sequences in given number of sequences per file
awk -v a_seq=$num_seq 'BEGIN {n_seq=0;} /^>/ {if(n_seq%a_seq==0){file=sprintf("myseq%d.fa",n_seq);} print >> file; n_seq++; next;} { print >> file; }' < $query
#run blast
ls *.fa | parallel -a - $prog -query {} -db $database -out {.}.out -evalue 0.001 -num_descriptions 1 -num_alignments 1 -num_threads 8
cat *.out >$query.blast.result
while true; do
read -p "Do you want to delete intermediate files?" yn
case $yn in
[Yy]* ) rm -f *.out *.fa; break;;
[Nn]* ) exit;;
* ) echo "Please answer yes or no.";;
esac
done
runtime=$((end-start))
printf "\033[1;31mYour analysis was done in $((($(date +%s)-$start)/60)) minutes\033[0m\n"${reset}
printf "\033[1;31mTHANKS\033[0m\n"${reset}
fi
Post a Comment