So you have many FASTA sequences in a file, you have translated those nucleotide sequences, and now you want to extract the region with the longest gap between two stop codons. I have already shared a couple of tools for translating many DNA sequences in one go, so you can translate several DNA sequences easily.
How to translate multiple DNA FASTA sequences HERE
Input
>Seq1
ASKAENM-SRSHFEKLTF-VSVSKFNRMYLRQ-LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-FCIKS-KCVVSKSFREIDVLSFCFQIQTDVSSPIIVRS-NFFQHYEALFTYFDPASKAENM-SRSHFEKLEFLSTL-SPFYIF-FCIKS-KYVVSKSFRN-RFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-LCIKS-KYVVSKSFREIDV
>seq2
ASKAENM-SRSHFEKLTF-VSVSKFNRMYLRQ-LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-FCIKS-KCVVSKSFREIDVLSFCFQIQTDVSSPIIVRS-NFFQHYEALFTYFDPASKAENM-SRSHFEKLEFLSTL-SPFYIF-FCIKS-KYVVSKSFRN-RFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-LCIKS-KYVVSKSFREIDV
Script 1 : Long1.pl
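#!/usr/bin/perl
use strict;
use warnings;

# Read one FASTA record at a time: records are separated by "\n>".
$/ = "\n>";
while (<>) {
    s/>//g;                                # strip the '>' record markers
    my ($id, @seq) = split (/\n/, $_);     # first line is the header
    my $seq = join "", @seq;               # rejoin wrapped sequence lines
    # Split at stop codons; the -1 limit keeps a trailing empty field, so a
    # region that ends at the final stop codon is not lost.
    my @orfs = split (/\-/, $seq, -1);
    # Drop the first and last pieces: they are not flanked by two stops.
    shift @orfs; pop @orfs;
    next unless @orfs;                     # fewer than two stop codons
    my $sel = shift @orfs;
    foreach my $next (@orfs) {
        $sel = $next if ((length $sel) < (length $next));
    }
    print ">$id\n$sel\n";
}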
Usage
perl Long1.pl input.txt > result.txt
Result
>Seq1
LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL
>seq2
LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL
If the stop codon is depicted as '*' instead of '-', replace the '-' in the split pattern (the line that builds @orfs) with '*', keeping the backslash in front of it, and this Perl script will work just fine.
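If your files mix both conventions, the same idea can be written so that either symbol counts as a stop. The following is only a minimal sketch, not part of Long1.pl: the helper name longest_between_stops and the demo sequence are made up for illustration, and it assumes that '-' and '*' are the only stop symbols in your translations.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Long1.pl): returns the longest fragment
# that lies between two stop symbols, where a stop is either '-' or '*'.
# Returns undef when the sequence contains fewer than two stops.
sub longest_between_stops {
    my ($seq) = @_;
    # The -1 limit keeps trailing empty fields, so a stop at the very end
    # of the sequence still counts as a boundary.
    my @parts = split /[-*]/, $seq, -1;
    shift @parts;    # piece before the first stop: not enclosed
    pop @parts;      # piece after the last stop: not enclosed
    return unless @parts;
    my $best = shift @parts;
    for my $next (@parts) {
        $best = $next if length($next) > length($best);
    }
    return $best;
}

# Made-up demo sequence that uses both stop symbols.
my $demo = longest_between_stops("MKT*AAAA-GGGGGGG*CC");
print defined $demo ? "$demo\n" : "no region found\n";
```

When two regions have the same length, this sketch keeps the first one, as the original script does.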