screenshot of original drawing from Tank

Bioinformatics Algorithm Tutorial Lesson 1

Bioinformatics, Statistics, Algorithm

Date

Sep 16, 2019

12:00 AM

Event

WHERE IN THE GENOME DOES DNA REPLICATION BEGIN?

Problem Definition

Genome replication is one of the most important tasks carried out in the cell. Replication begins in a genomic region called the replication origin (denoted oriC) and is performed by molecular copy machines called DNA polymerases.

In the following problem, we assume that a genome has a single oriC and is represented as a DNA string, or a string of nucleotides from the four-letter alphabet {A, C, G, T}.

Finding Origin of Replication Problem:
Input: A DNA string Genome.
Output: The location of oriC in Genome.

STOP and Think: Does this biological problem represent a clearly stated computational problem?

Hidden Messages in the Replication Origin

PATTERNCOUNT(Text, Pattern)
  count = 0
  for i = 0 to |Text| - |Pattern|
    if Text(i, |Pattern|) == Pattern
      count = count + 1
  return count
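
The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not part of the original lesson: the name `pattern_count` is hypothetical, and `Text(i, |Pattern|)` is assumed to mean the substring of length |Pattern| starting at position i.

```python
def pattern_count(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    k = len(pattern)
    # slide a window of length k over every possible start position
    for i in range(len(text) - k + 1):
        if text[i:i + k] == pattern:
            count += 1
    return count


print(pattern_count("GCGCG", "GCG"))  # 2
```

The explicit window is needed because overlapping matches must be counted: the built-in `"GCGCG".count("GCG")` returns 1, since `str.count` only counts non-overlapping occurrences.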

Most Frequent Word

import sys

#read the input file: the first token is the DNA string, the second is k
with open(sys.argv[1]) as fh:
    filedata = fh.read().split()

def mostFreq(text,k):
    #given a DNA string text and an integer k, find all most frequent k-mers in text
    
    #generate list of all kmers in text
    kmerList = []
    for i in range(len(text)-k+1):
        kmerList.append(text[i:i+k])

    #get the kmer counts
    kmerCounts = {}
    for kmer in kmerList:
        kmerCounts[kmer] = kmerCounts.get(kmer,0) + 1

    #identify most frequent kmers (there are none when k > len(text))
    if not kmerCounts:
        return []
    maxCount = max(kmerCounts.values())
    mostFreqKmers = [kmer for kmer,val in kmerCounts.items() if val == maxCount]

    return mostFreqKmers

text = filedata[0]
k = int(filedata[1])

mostFreqKmers = mostFreq(text,k)

#write output to a new file and open it
fnew = 'ANS_'+sys.argv[1]
with open(fnew,'w') as fh:
    fh.write(' '.join(mostFreqKmers))

import webbrowser
webbrowser.open(fnew)
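
For quick experiments without an input file, the same idea can be sketched as a self-contained function. This is an alternative illustration rather than the script above: the name `most_frequent_kmers` is hypothetical, and `collections.Counter` stands in for the hand-built dictionary.

```python
from collections import Counter


def most_frequent_kmers(text, k):
    """Return all k-mers that occur most often in text, sorted alphabetically."""
    # no k-mers exist when k is non-positive or longer than the text
    if k <= 0 or k > len(text):
        return []
    # count every length-k window, overlapping windows included
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    max_count = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == max_count)


print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['CATG', 'GCAT']
```

Both `CATG` and `GCAT` appear three times in this string, so both are returned. Sorting the result is a choice made here to keep the output order deterministic.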

Reverse Complement Algorithm

import sys

#read the sequence and strip all whitespace and line breaks
with open(sys.argv[1]) as fh:
    DNAseq = ''.join(fh.read().split())

def reverseComplement(sequence):
    #given a DNA string, find the reverse complement

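    #DNA complement dict; the values are lowercase so that a base which has
    #already been replaced is not swapped back by a later replace() call
    complements = {'A':'t','C':'g','G':'c','T':'a'}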

    #reverse the sequence for the output and then replace nuc's with their complements
    revCompSeq = sequence[::-1]
    for nuc,comp in complements.items():
        revCompSeq = revCompSeq.replace(nuc,comp)

    return revCompSeq.upper()

revCompSeq = reverseComplement(DNAseq)

#write output to a new file and open it
fnew = 'ANS_'+sys.argv[1]
with open(fnew,'w') as fh:
    fh.write(revCompSeq)

import webbrowser
webbrowser.open(fnew)
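
The reverse complement can also be computed in a single pass with a translation table, which avoids the lowercase placeholder step. The sketch below is an alternative illustration, not the script above: `reverse_complement` and `COMPLEMENT` are hypothetical names, and the input is assumed to contain only A, C, G, and T in either case.

```python
# translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA string."""
    # uppercase first so lowercase input is complemented too, then reverse
    return sequence.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```

Characters outside the four-letter alphabet pass through `translate` unchanged, so input validation would have to be added separately if it is needed.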

course tutorial

Tank (Xiao-Ning Zhang)

PhD Student @ Data Miner & Coder

I’m a PhD student majoring in Bioinformatics and Biostatistics who loves programming in languages such as C/C++, Java, Python, and R.