Ronald Robertson

Sanjay Singh

Scientist/Writer
  • Email: sanjaysingh765@gmail.com
  • Social: @lampatlex
  • Visitor: Since 1982
  • Location: Kentucky, USA



KEGG Sequence Downloader: retrieve gene sequences in FASTA format from the KEGG database



I wanted to download the gene sequences of tobacco from NCBI. Because NCBI also includes isoforms and other unwanted entries, I chose to get them from KEGG instead. KEGGREST is a wonderful R package for retrieving data from KEGG, but it limits how much can be retrieved at once. The following bash script downloads thousands of sequences in a single run without that limitation. It is a crude solution, and there is probably a more efficient way to do it, but it worked for me. The script works in three steps:



  • Split the IDs into chunks of a given size

  • Download the FASTA sequences for each chunk as an HTML file

  • Clean the HTML files and save the result
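
The three steps can be sketched roughly as follows. This is a minimal illustration, not the actual KEGG_sequence_downloader.sh. It assumes the query file holds one KEGG gene ID per line, and that KEGG's DBGET page returns nucleotide FASTA wrapped in HTML for a list of IDs joined by "+". The URL prefix, the function names (chunk_ids, download_chunk, clean_html, main) and the output file names are all hypothetical choices made for this sketch, so check the URL against the current KEGG pages before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the three steps; not the author's script.

# ASSUMPTION: DBGET URL pattern for nucleotide FASTA; verify before use.
KEGG_URL_PREFIX="${KEGG_URL_PREFIX:-https://www.genome.jp/dbget-bin/www_bget?-f+-n+n+}"

# Step 1: read one ID per line from stdin and print one chunk per line,
# each chunk holding at most $1 IDs joined by "+".
chunk_ids() {
    awk -v size="$1" '
        { gsub(/[ \t\r]/, "") }            # drop stray whitespace
        $0 == "" { next }                  # skip blank lines
        { line = (line == "" ? $0 : line "+" $0); n++ }
        n == size { print line; line = ""; n = 0 }
        END { if (line != "") print line }'
}

# Step 2: download one chunk of IDs ($1) as an HTML file ($2).
download_chunk() {
    curl --silent --show-error --fail --output "$2" "${KEGG_URL_PREFIX}$1"
}

# Step 3: strip HTML tags, decode the few entities FASTA needs, and
# keep only header lines (">...") and the sequence lines after them.
clean_html() {
    tr -d '\r' |
    sed -e 's/<[^>]*>//g' \
        -e 's/&gt;/>/g' -e 's/&lt;/</g' -e 's/&amp;/\&/g' |
    awk '/^>/ { in_seq = 1; print; next }
         in_seq && /^[A-Za-z*-]+$/ { print; next }
         { in_seq = 0 }'
}

# Glue: main QUERY_FILE [CHUNK_SIZE] writes sequences.fasta
# (a hypothetical output name) in the current directory.
main() {
    local query_file="$1" size="${2:-10}" n=0
    : > sequences.fasta
    chunk_ids "$size" < "$query_file" |
    while IFS= read -r chunk; do
        n=$((n + 1))
        download_chunk "$chunk" "chunk_$n.html" || continue
        clean_html < "chunk_$n.html" >> sequences.fasta
        sleep 1    # pause between requests to go easy on the server
    done
}

# Example call (uncomment to run; needs network access):
# main ids.txt 10
```

The cleaning step is deliberately simple: it handles tags that open and close on the same line, which is enough for a FASTA block inside a pre element but not for arbitrary HTML. Failed downloads are skipped rather than retried, so it is worth comparing the number of headers in the output with the number of IDs in the query file.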




Usage


bash KEGG_sequence_downloader.sh query_file number_of_sequence


Related: how to download only Viridiplantae miRNA from miRBase HERE


Script















Script name Download
KEGG_sequence_downloader.sh



