Python For Bioinformatics

What Will You Learn?

Using Python In Bioinformatics

Ask

Feel free to interrupt & ask during the session.

What Is Python?

A general-purpose, high-level programming language.

Why Use Python?

  • Easy to learn.
  • Readability.
  • Develop applications quickly.
  • Powerful standard library.
  • Scalability from very small to very large programs.

Getting Started

  • Python runs on almost all modern operating systems.

Using the Interpreter

  • interpreter - reads each line, evaluates it, and returns the result.
  • variables - no need to declare a variable's type.
  • print() - a function that prints its argument to standard output.

Code

>>> 2
2
>>> 2 + 2
4
>>> name = 'anand'
>>> name
'anand'
>>> print(name)
anand
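The "no type declaration" point can be sketched as follows. This is a minimal illustration, and the name `value` is made up for the example: the same name is bound first to a number and then to a string, and the built-in type() reports what it currently holds.

```python
# A name is bound to a value; no type declaration is needed.
value = 42
first_type = type(value).__name__    # 'int'

# The same name can later refer to a value of another type.
value = 'anand'
second_type = type(value).__name__   # 'str'

print(first_type, second_type)       # int str
```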

Numbers

  • General arithmetic
  • Swapping two numbers

Code

>>> 21 / 5
4.2
>>> 21 // 5
4
>>> 2 * 3
6
>>> 2 ** 3
8
>>> 2 + 3 * 2
8
>>> (2 + 3) * 2
10
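The "swapping two numbers" item has no code on the slide, so here is one possible sketch. It assumes the usual Python idiom of tuple unpacking; the names `a` and `b` are only for illustration.

```python
a = 2
b = 3

# Tuple unpacking: the right-hand side is evaluated first,
# then both names are rebound, so no temporary variable is needed.
a, b = b, a

print(a, b)  # prints: 3 2
```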

Strings

  • single quotes ('bio') or double quotes ("bio").
  • triple single or double quotes for multi-line strings.
  • strings are an immutable data type.

Code

>>> 'dna'
'dna'
>>> "dna"
'dna'
>>> """
... long
... dna
... seq
... """
'\nlong\ndna\nseq\n'
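Immutability is easier to see with a small experiment. The sketch below is only an illustration (the names `changed_in_place` and `new_sequence` are made up): assigning to a position in a string raises TypeError, so a modified copy has to be built instead.

```python
sequence = 'dna'

# Item assignment is not allowed on a string.
try:
    sequence[0] = 'r'
    changed_in_place = True
except TypeError:
    changed_in_place = False

# Build a new string instead; the original is untouched.
new_sequence = 'r' + sequence[1:]

print(changed_in_place)        # False
print(sequence, new_sequence)  # dna rna
```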

String operations

>>> sequence = 'actgaaattaaa'
>>> sequence.upper()
'ACTGAAATTAAA'
>>> sequence.count('a')
7
>>> sequence.replace('a', 'c')
'cctgcccttccc'
>>> len(sequence)
12

Substrings

>>> sequence.find('aaa')
4
>>> sequence[3]
'g'
>>> sequence[3:5]
'ga'
>>> sequence[-1]
'a'
>>> sequence[-3:-2]
'a'

Exercise

  • Calculate the GC content of 'cctgccactataccc'

Booleans

  • True and False are Boolean values in Python.
  • Python uses the keywords and, or, and not for Boolean operations.
  • Truth table
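The truth table itself is not reproduced on the slide. One way to generate it is sketched below; it uses a for loop, which is introduced later in the session, and the name `rows` is only for illustration.

```python
# Build the truth table for and / or over every pair of Boolean values.
rows = []
for a in (True, False):
    for b in (True, False):
        rows.append((a, b, a and b, a or b))

# Columns: a, b, a and b, a or b
for row in rows:
    print(row)

# not simply inverts a single value.
print(not True, not False)  # False True
```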

Conditionals

  • Python uses if, elif, else for branching.
  • == checks whether two values are equal.
  • != checks whether two values are not equal.

Code

>>> seq1 = 'atc'
>>> seq2 = 'atc'
>>> seq1 == seq2
True
>>> if seq1 == seq2:
...   print('both are equal')
...
both are equal
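A fuller branch using if, elif, and else together with both == and != might look like the sketch below. The sequences and the `result` messages are made up for the example.

```python
seq1 = 'atcg'
seq2 = 'atc'

if seq1 == seq2:
    result = 'both are equal'
elif len(seq1) != len(seq2):
    result = 'lengths differ'
else:
    # Reached only when the lengths match but the contents do not.
    result = 'same length, different bases'

print(result)  # lengths differ
```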

Lists

  • A list is a collection of items, which can be of different (heterogeneous) types.
  • A list is similar to an array in other languages.
  • A list can grow or shrink as the program runs.

Code

>>> data = ['homosapiens', 2015, 3, 'life exists']
>>> data[0]
'homosapiens'
>>> len(data)
4
>>> data[-1]
'life exists'
>>> print(data)
['homosapiens', 2015, 3, 'life exists']
>>> more_data = [data, 'valid data']
>>> print(more_data)
[['homosapiens', 2015, 3, 'life exists'], 'valid data']
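How a list changes size is not shown above. A minimal sketch, using the standard list methods append(), extend(), and pop() on made-up values:

```python
data = ['homosapiens', 2015]

data.append('life exists')      # add one item at the end
data.extend(['atc', 'ggta'])    # add several items at once
print(len(data))                # 5

removed = data.pop()            # remove and return the last item
print(removed)                  # ggta
print(data)                     # ['homosapiens', 2015, 'life exists', 'atc']
```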

For loop

A for loop can iterate over anything that is iterable.

Code

>>> for i in "bioinformatics":
...     print(i)

>>> for i in ['sequence', 12, 'ggta', 'actgc']:
...   print(i)
...
sequence
12
ggta
actgc

Files

Open the sequence file.

Loop over the file to read.

Close the file.

Code

>>> f = open('sample.fasta')
>>> for line in f:
...   print(line.rstrip())

>>> f.close()
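The open, loop, close steps can also be written with a with block, which closes the file automatically. The sketch below is one possible version under stated assumptions: the FASTA content is made up and written to a temporary file so the example runs on its own, and the names `header` and `sequence_lines` are only for illustration.

```python
import os
import tempfile

# Hypothetical FASTA content, written to a temporary file so the
# example is self-contained; use the path of a real file in practice.
fasta_text = '>seq1 example record\nactgaaat\ntaaa\n'
with tempfile.NamedTemporaryFile('w', suffix='.fasta', delete=False) as tmp:
    tmp.write(fasta_text)
    path = tmp.name

header = None
sequence_lines = []

# `with` closes the file automatically, even if an error occurs.
with open(path) as f:
    for line in f:
        line = line.rstrip('\n')      # drop the trailing newline
        if line.startswith('>'):
            header = line[1:]         # header line, without the '>'
        else:
            sequence_lines.append(line)

sequence = ''.join(sequence_lines)
os.remove(path)                       # clean up the temporary file

print(header)    # seq1 example record
print(sequence)  # actgaaattaaa
```

Joining the non-header lines gives a single string, so the string operations shown earlier (count, len, slicing) apply to it directly.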

Exercise

Open a fasta file & count nucleotides.

Resources

http://python.org/

http://interactivepython.org/

http://learnpythonthehardway.org/

Bioinformatics Programming Using Python, by Mitchell L. Model

Questions?