Ronald Robertson

Sanjay Singh

Scientist/Writer
  • Emailsanjaysingh765@gmail.com
  • Socail@lampatlex
  • VisitorSince 1982
  • LocationKentucky, USA



How to rename fasta headers according to a matching name list



FaBox has several utilities to manipulate the FASTA sequence. I wanted to replace the FASTA header with the new header or description which are saved in a file. Although I can do it with FaBox, but it handles difficult when the number of sequences is huge. This PERL script will rename the fasta sequence as per store in another file.


Header


Header and new FASTA header should be separated by TAB


M54089d protein1
M54089c protein2
M54089b protein3
M54089a protein4




Sequence 


FASTA should be in one line




Convert Multi line Fasta file into a Single line FASTA File HERE

>M54089d
MEQCRQGSRQNGSVTSGKGLALRAGHGGPSPEPVGCRWTARAAPAARAGRRVPAGGRTGNGSFGGLPRASHSQLRTGTDKGNPTV
>M54089c
MINFDHLFACLHGHYGEVENKLKCILHYFGRICSSMPLGYVSFERKVLSLECTPSCIPYPKEKAWSQSNISLCPIEITISGLIEDQSREAIEVDFANMYLGGGALVRGCVQQEEIRFMINPELIAGMLFLPCMADNEAVEIVGTERFSSYTGRLTKHFVASWINSSVISINSFSKMMASWDFNMIKMLKTPVEGPLLIFCRLVILQLHLKKLRKHRKTS
>M54089b
MIGRADIEGSKSNVAMNAWLHKPVIPVVTFLTPLASNSEGLKIVRPRFHGSYSYWKSESNELLPSVPHEISVRVELILGHLRYLLTDVPPQPNSPPDNVFRRIGLQASLGSKKRGSAPLPLHGISKITLEVVVFHFRLSAPTYTTPLKSFTKSD
>M54089a
MNGLTRFHCPCLLSSETTAKGTGLAESAGKEDPVELDSSRLCEMT




Script 


This PERL script will ask for header list and FASTA sequences (file format given above) and save the FASTA file with new header in result.fasta





If you are working with unix based system, then this AWK one-liner will be very useful
awk 'FNR==NR{  a[">"$1]=$2;next}$1 in a{  sub(/>/,">"a[$1]"|",$1)}1' header_list.txt sequence.fasta




Comments