Extract Part of a FASTA Sequence by Position

Update: 5.29.2018

Bedtools is another nice tool for extracting defined regions of sequence from a FASTA file. Install Bedtools on your Ubuntu machine using these commands:

sudo apt-get update
sudo apt-get install bedtools

Then extract the sequences with this command:

bedtools getfasta -fi input_fasta -bed id_file

The formats of the FASTA file and the ID file are the same as described below.
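Bedtools reads the ID file as a BED file, which is tab-delimited, with a 0-based start and an end that is not included in the interval. If your ID file uses spaces between columns, bedtools may not accept it. The snippet below is a minimal sketch of one way to normalise such a file; the function name ids_to_bed is hypothetical and not part of bedtools.

```python
def ids_to_bed(lines):
    """Yield tab-delimited BED lines from whitespace-separated 'id start end' rows.

    Hypothetical helper: it only rewrites the separators and checks that the
    coordinates are sensible; it does not shift the coordinates in any way.
    """
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected 'id start end', got {line!r}")
        seq_id, start, end = fields[0], int(fields[1]), int(fields[2])
        if not 0 <= start < end:
            raise ValueError(f"line {line_number}: start must be >= 0 and smaller than end")
        yield f"{seq_id}\t{start}\t{end}"


example = ["AT1G01250   45  102", "AT1G03800   65  109"]
bed_lines = list(ids_to_bed(example))
print("\n".join(bed_lines))
```

Writing the yielded lines to a new file gives an ID file that can be passed to the -bed option.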

=======================================================================

I have hundreds of protein sequences, and I have identified the conserved domain in each of them. Now that I have the location of every domain, I want to extract the exact sequence at those locations. This is easy for a single sequence with one or more domain locations, but it is tedious to extract the domain sequences from many proteins using only the domain coordinates. I found a simple Python script for extracting FASTA sequences based on position. I have also shared an online program, originally written by Dr Pierre Lindenbaum, here.

Example FASTA file with protein sequences


>AT1G01250 
MSPQRMKLSSPPVTNNEPTATASAVKSCGGGGKETSSSTTRHPVYHGVRKRRWGKWVSEIREPRKKSRIWLGSFPVPEMAAKAYDVAAFCLKGRKAQLNFPEEIEDLPRPSTCTPRDIQVAAAKAANAVKIIKMGDDDVAGIDDGDDFWEGIELPELMMSGGGWSPEPFVAGDDATWLVDGDLYQYQFMACL

>AT1G03800 
MTTEKENVTTAVAVKDGGEKSKEVSDKGVKKRKNVTKALAVNDGGEKSKEVRYRGVRRRPWGRYAAEIRDPVKKKRVWLGSFNTGEEAARAYDSAAIRFRGSKATTNFPLIGYYGISSATPVNNNLSETVSDGNANLPLVGDDGNALASPVNNTLSETARDGTLPSDCHDMLSPGVAEAVAGFFLDLPEVIALKEELDRVCPDQFESIDMGLTIGPQTAVEEPETSSAVDCKLRMEPDLDLNASP

Example ID file with domain locations

AT1G01250   45  102
AT1G03800   65  109

Script name: domainseq.py (available for download)

Usage

python domainseq.py input.fasta ids.txt > result.fasta

Results

>AT1G01250:45-102
IREPRKKSRIWLGSFPVPEMAAKAYDVAAFCLKGRKAQLNFPEEIEDLPRPSTCTPR
>AT1G03800:65-109
AEIRDPVKKKRVWLGSFNTGEEAARAYDSAAIRFRGSKATTNFP
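
For readers who want to see the idea behind this kind of extraction, here is a minimal sketch in Python. It is not the original domainseq.py; the function names read_fasta and extract_regions are hypothetical. It assumes the coordinates are a 0-based start and an exclusive end (a plain Python slice), which is consistent with the AT1G03800 result shown above, and it takes the sequence ID to be the first word of each header line.

```python
def read_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    parts_by_id = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            parts_by_id[current_id] = []
        elif current_id is not None:
            parts_by_id[current_id].append(line)  # sequences may span several lines
    return {seq_id: "".join(parts) for seq_id, parts in parts_by_id.items()}


def extract_regions(sequences, id_lines):
    """Yield (header, subsequence) for each 'id start end' row.

    Assumption: start is 0-based and end is exclusive, so the region is
    simply sequence[start:end]. IDs missing from the FASTA file are skipped.
    """
    for line in id_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete rows
        seq_id, start, end = fields[0], int(fields[1]), int(fields[2])
        if seq_id in sequences:
            yield f">{seq_id}:{start}-{end}", sequences[seq_id][start:end]


def write_regions(fasta_path, ids_path):
    """Print the extracted regions in FASTA format (paths are placeholders)."""
    with open(fasta_path) as fasta, open(ids_path) as ids:
        sequences = read_fasta(fasta)
        for header, region in extract_regions(sequences, ids):
            print(header)
            print(region)
```

Calling write_regions("input.fasta", "ids.txt") prints the records to standard output, which can be redirected to a file in the same way as in the usage line above. If your coordinates are 1-based and inclusive instead, the slice would need to be sequence[start - 1:end].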


Sanjay Singh
