Ronald Robertson

Sanjay Singh

Scientist/Writer
  • Email: sanjaysingh765@gmail.com
  • Social: @lampatlex
  • Visitor: Since 1982
  • Location: Kentucky, USA



How to Extract the Longest Region Between Stop Codons in Translated DNA Sequences






Suppose you have a file with many FASTA sequences. You have translated those nucleotide sequences, and now you want to extract the region with the longest stretch between two stop codons in each one. I have already shared a couple of tools that translate many DNA sequences in one go, so translating several DNA sequences is easy.



See: How to translate multiple DNA FASTA sequences
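If you only need a quick translation to produce input like the example below, here is a minimal sketch. It is not one of the tools mentioned above. It translates forward reading frame 1 with the standard genetic code (NCBI table 1) and writes stop codons as '-' to match this post. The subroutine name `translate_frame1` is hypothetical. Treating codons with ambiguous bases (such as N) as 'X' is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1): the 64 codons enumerated in TCAG order.
my $bases = "TCAG";
my $aas   = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
my %codon;
my $i = 0;
for my $b1 (split //, $bases) {
    for my $b2 (split //, $bases) {
        for my $b3 (split //, $bases) {
            $codon{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Hypothetical helper: translate frame 1 only, writing each stop codon as $stop.
# A trailing partial codon is ignored; codons not in the table become 'X'.
sub translate_frame1 {
    my ($dna, $stop) = @_;
    $dna = uc $dna;
    $dna =~ tr/U/T/;                              # accept RNA input as well
    my $protein = "";
    for (my $p = 0; $p + 3 <= length $dna; $p += 3) {
        my $aa = $codon{ substr($dna, $p, 3) } // "X";
        $protein .= $aa eq "*" ? $stop : $aa;
    }
    return $protein;
}

print translate_frame1("ATGGCCTGAAAATAG", "-"), "\n";   # prints MA-K-
```

A real pipeline would usually translate all six frames. This sketch keeps to one frame so the codon lookup stays easy to follow.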


Input


>Seq1
ASKAENM-SRSHFEKLTF-VSVSKFNRMYLRQ-LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-FCIKS-KCVVSKSFREIDVLSFCFQIQTDVSSPIIVRS-NFFQHYEALFTYFDPASKAENM-SRSHFEKLEFLSTL-SPFYIF-FCIKS-KYVVSKSFRN-RFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-LCIKS-KYVVSKSFREIDV
>seq2
ASKAENM-SRSHFEKLTF-VSVSKFNRMYLRQ-LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-FCIKS-KCVVSKSFREIDVLSFCFQIQTDVSSPIIVRS-NFFQHYEALFTYFDPASKAENM-SRSHFEKLEFLSTL-SPFYIF-FCIKS-KYVVSKSFRN-RFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-LCIKS-KYVVSKSFREIDV


Script 1: Long1.pl


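#!/usr/bin/perl

use strict;
use warnings;

$/ = "\n>";                                # read one FASTA record at a time
while (<>) {
    s/>//g;                                # drop the '>' marker(s) left in the record
    my ($id, @seq) = split (/\n/, $_);     # first line is the ID, the rest is sequence
    my $seq = join "", @seq;               # join multi-line sequences into one string
    my @orfs = split (/\-/, $seq, -1);     # cut at every stop; -1 keeps a trailing empty piece
    shift @orfs; pop @orfs;                # drop the pieces before the first and after the last stop
    next unless @orfs;                     # fewer than two stops: nothing to report
    my $sel = shift @orfs;
    foreach my $next (@orfs) {
        $sel = $next if ((length $sel) < (length $next));   # keep the longest (first one on ties)
    }
    print ">$id\n$sel\n";
}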
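Stripped of the FASTA handling, the core of the script can be sketched as a small subroutine that is easy to test on toy strings. The name `longest_between_stops` is hypothetical.

The sketch shows two things:
- Why the first and last pieces are discarded: each has a stop on one side only.
- Why a negative limit on `split` matters: by default `split` drops trailing empty fields. For a sequence that ends with a stop, the `pop` would then remove a genuine between-stops region instead of the empty tail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the longest piece of $seq that has a stop ('-')
# on both sides, or undef when the sequence contains fewer than two stops.
sub longest_between_stops {
    my ($seq) = @_;
    my @pieces = split /-/, $seq, -1;   # -1: keep an empty piece after a final stop
    return undef if @pieces < 3;        # at least two stops are needed
    shift @pieces;                      # before the first stop: no stop on its left
    pop @pieces;                        # after the last stop: no stop on its right
    my $best = shift @pieces;
    for my $piece (@pieces) {
        $best = $piece if length($piece) > length($best);   # ties keep the earlier piece
    }
    return $best;
}

print longest_between_stops("AB-CDE-FG-HIJK-L"), "\n";   # prints HIJK
```

With "AB-CD-" the sketch returns "CD". If the `-1` were removed, `split` would return only ("AB", "CD"), and shifting and popping would leave nothing.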


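Usage


The script reads the input file(s) named on the command line (or standard input) and prints to standard output, so redirect the output to save it:

perl Long1.pl input.txt > result.txt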


Result


>Seq1
LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL
>seq2
LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL


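If your translation marks stop codons with '*' instead of '-', replace `\-` with `\*` in the `split` pattern on line 11, and the Perl script works just as well. Keep the backslash: a bare `*` is a regex quantifier, and `split (/*/, ...)` will not compile.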
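A slightly more general sketch can take the stop symbol as an argument and escape it with `\Q...\E`, so the same code handles translations that use either '-' or '*'. As an addition that is not part of the original script, it also returns the 1-based start and end positions of the region within the protein sequence. The subroutine name `longest_region_with_coords` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: longest region flanked by $stop on both sides, plus its
# 1-based start and end positions in $seq. Returns an empty list when there
# are fewer than two stops.
sub longest_region_with_coords {
    my ($seq, $stop) = @_;
    my @pieces = split /\Q$stop\E/, $seq, -1;    # \Q...\E: match '*' or '-' literally
    return () if @pieces < 3;
    my ($best, $best_start);
    my $offset = length($pieces[0]) + length($stop);   # 0-based start of piece 1
    for my $k (1 .. $#pieces - 1) {                     # skip the two flanking pieces
        my $piece = $pieces[$k];
        if (!defined $best || length($piece) > length($best)) {
            ($best, $best_start) = ($piece, $offset);   # ties keep the earlier piece
        }
        $offset += length($piece) + length($stop);
    }
    return ($best, $best_start + 1, $best_start + length($best));
}

my ($region, $start, $end) = longest_region_with_coords("MA*KLMN*PQ*R", "*");
print "$region $start-$end\n";   # prints KLMN 4-7
```

The coordinates refer to the translated sequence. Mapping them back to nucleotide positions would also depend on the reading frame used for translation, which this sketch does not know.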

