screenshot of original drawing from Tank

Bioinformatics Algorithm Tutorial Lesson1



WHERE IN THE GENOME DOES DNA REPLICATION BEGIN?

Problem Definition

Genome replication is one of the most important tasks carried out in the cell. Replication begins in a genomic region called the replication origin (denoted oriC) and is performed by molecular copy machines called DNA polymerases.

In the following problem, we assume that a genome has a single oriC and is represented as a DNA string, or a string of nucleotides from the four-letter alphabet {A, C, G, T}.
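
To make the "DNA string" representation concrete, here is a minimal sketch of a check that a string uses only the four-letter alphabet. The names DNA_ALPHABET and is_dna_string are hypothetical, and the choices to accept only uppercase letters and to reject the empty string are assumptions made for illustration, not part of the problem statement.

```python
# Hypothetical helper: the alphabet {A, C, G, T} from the text above.
DNA_ALPHABET = frozenset("ACGT")


def is_dna_string(s: str) -> bool:
    """Return True if s is non-empty and every character is A, C, G or T.

    Assumption: only uppercase letters count, and the empty string is rejected.
    """
    return len(s) > 0 and all(ch in DNA_ALPHABET for ch in s)


print(is_dna_string("ACGTTGCA"))  # True
print(is_dna_string("ACGU"))      # False, U is not in the alphabet
```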


Finding Origin of Replication Problem:
Input: A DNA string Genome.
Output: The location of oriC in Genome.


STOP and Think: Does this biological problem represent a clearly stated computational problem?

Hidden Messages in the Replication Origin

PATTERNCOUNT(Text, Pattern)
  count = 0
  for i = 0 to |Text| - |Pattern|
    if Text(i, |Pattern|) == Pattern
      count = count + 1
  return count
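
The pseudocode above can be sketched in Python as follows. This assumes that Text(i, |Pattern|) means the substring of Text of length |Pattern| starting at position i, and that positions are 0-based; the function name pattern_count is chosen here for illustration.

```python
def pattern_count(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    # Slide a window of len(pattern) across text, as in the pseudocode.
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i + len(pattern)] == pattern:
            count += 1
    return count


print(pattern_count("GCGCG", "GCG"))  # 2 (matches at positions 0 and 2)
```

Note that Python's built-in str.count would not be a drop-in replacement here, because it counts only non-overlapping occurrences.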

Most Frequent Word

import sys

#read the input file (text and k, separated by whitespace) and close it when done
with open(sys.argv[1]) as fin:
    filedata = fin.read().split()

def mostFreq(text,k):
    #given a DNA string text and an integer k, find all most frequent k-mers in text
    
    #generate list of all kmers in text
    kmerList = []
    for i in range(len(text)-k+1):
        kmerList.append(text[i:i+k])

    #get the kmer counts
    kmerCounts = {}
    for kmer in kmerList:
        kmerCounts[kmer] = kmerCounts.get(kmer,0) + 1

    #identify most frequent kmers (there are none if k is larger than len(text))
    if not kmerCounts:
        return []
    maxCount = max(kmerCounts.values())
    mostFreqKmers = [kmer for kmer,val in kmerCounts.items() if val == maxCount]

    return mostFreqKmers

text = filedata[0]
k = int(filedata[1])

mostFreqKmers = mostFreq(text,k)

#write the output to a new file, then open it
fnew = 'ANS_'+sys.argv[1]
with open(fnew,'w') as fh:
    fh.write(' '.join(mostFreqKmers))

import webbrowser
webbrowser.open(fnew)
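
For experimenting without an input file, the same idea can be sketched with collections.Counter from the standard library. This is an alternative formulation, not the script above; the name most_frequent_kmers and the choice to return the k-mers in sorted order are assumptions made for illustration.

```python
from collections import Counter


def most_frequent_kmers(text: str, k: int) -> list:
    """Return all k-mers that occur most often in text, sorted alphabetically."""
    # No k-mers exist if k is non-positive or longer than the text.
    if k <= 0 or k > len(text):
        return []
    # Count every length-k window, overlapping windows included.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    max_count = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == max_count)


print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
# ['CATG', 'GCAT'] (each occurs 3 times)
```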

Reverse Complementary Algorithm

import sys

#read the sequence, dropping all whitespace, and close the file when done
with open(sys.argv[1]) as fin:
    DNAseq = ''.join(fin.read().split())

def reverseComplement(sequence):
    #given a DNA string, find the reverse complement

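    #DNA complement dict; complements are lowercase so that a base that has
    #already been replaced is not replaced again by a later substitution
    #(this assumes the input sequence is uppercase)
    complements = {'A':'t','C':'g','G':'c','T':'a'}

    #reverse the sequence for the output and then replace nuc's with their complements
    revCompSeq = sequence[::-1]
    for nuc,comp in complements.items():
        revCompSeq = revCompSeq.replace(nuc,comp)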

    return revCompSeq.upper()

revCompSeq = reverseComplement(DNAseq)

#write the output to a new file, then open it
fnew = 'ANS_'+sys.argv[1]
with open(fnew,'w') as fh:
    fh.write(revCompSeq)

import webbrowser
webbrowser.open(fnew)
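
As an alternative to the repeated replace calls, the reverse complement can be sketched with str.maketrans and str.translate, which substitute every base in a single pass. The names COMPLEMENT and reverse_complement are hypothetical, and preserving the case of lowercase input is an assumption that differs from the script above, which always returns uppercase.

```python
# Translation table mapping each base to its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse the resulting string."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```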
Tank (Xiao-Ning Zhang)
PhD Student @ Data Miner & Coder

I’m a PhD student majoring in Bioinformatics and Biostatistics who loves programming in languages such as C(++), Java, Python and R.