Learn Bioinformatics
March 27, 2018
This list consists of about 10 tutorials for learning bioinformatics. You can think of it as a “Free Online Nano Book”. We’ll cover important bioinformatics topics, learning (a) the biological significance of each problem, and (b) the computational algorithms used to solve it. Everything is 100% free.
What is Bioinformatics?
Computers are everywhere, and biology is no exception. Bioinformatics is computer science applied to biology. In particular, bioinformatics often deals with the application of algorithms, data structures and machine learning methods to the analysis of DNA, proteins and evolutionary history. More detail on each topic is included in the sections below.
Get started with the first article below.
What is DNA? What is a Protein?
This section (a single tutorial) provides the biological context for the rest of this course. It describes what DNA and proteins are, and how we can model each of them as a sequence of characters.
Conserved Sequences and Regulatory Motifs with Brute-Force Algorithms and Randomized Algorithms
Conserved sequences and regulatory motifs are short sequences (say, of length 15-30) which occur frequently in the genome. These sequences serve a variety of functions, such as regulating gene expression (and hence how much of a protein is produced) and indicating where genes begin. We’ll see how to find these sequences using brute-force algorithms and randomized algorithms.
- Conserved Sequences: Their Biological Significance, and the K-mer Finding problem
- Sequence Motifs, Consensus Sequences and The Motif Finding Problem
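As a rough illustration of the k-mer finding problem mentioned above, here is a minimal Python sketch that slides a window of width k across a DNA string and reports the most frequent substrings. The function name `most_frequent_kmers` and the toy input are assumptions made for this example, not code from the tutorials, and a real motif search would also have to allow for mismatches.

```python
from collections import Counter


def most_frequent_kmers(genome: str, k: int) -> list[str]:
    """Return every k-mer that occurs the maximum number of times in genome.

    Hypothetical helper for illustration: exact matches only.
    """
    if k <= 0 or k > len(genome):
        return []
    # Slide a window of width k across the genome and count each substring.
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    best = max(counts.values())
    # Several k-mers may tie for the top count, so return all of them, sorted.
    return sorted(kmer for kmer, n in counts.items() if n == best)


if __name__ == "__main__":
    # "AT" occurs 3 times in this toy string, "TA" only twice.
    print(most_frequent_kmers("ATATAT", 2))
```

Counting every window takes time proportional to the genome length times k, which is fine for exact matches; the harder motif-finding variants are where brute-force and randomized search come in.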
Genome Assembly and Sequence Alignment with Graph Algorithms and Dynamic Programming
Recently, we’ve figured out cost-effective ways of sequencing a human genome, i.e. taking a human genome and reading the sequence of roughly 3 billion nucleotides (A, C, G and T) that it comprises. In this section, we’ll see how this is achieved. We’ll also see how we can find similar regions in two different genomes, which allows us to do things like infer evolutionary history and predict protein function.
- Introduction to Genome Assembly
- Sequence alignment using Longest Common Subsequence algorithm
- Synteny Blocks, Genetic Rearrangements and Synteny Block Construction
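To give a flavor of the dynamic programming used for sequence alignment, the sketch below computes a longest common subsequence (LCS) of two strings. It is a textbook formulation under simple assumptions (no gap or mismatch scoring); the function name `longest_common_subsequence` is chosen for this example and is not taken from the tutorials.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner to recover the characters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

The table has (m + 1) × (n + 1) entries, so time and memory both grow with the product of the two sequence lengths. That is practical for genes but not for whole genomes without further tricks.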
Constructing Evolutionary Trees with Greedy Algorithms and Dynamic Programming
An evolutionary tree, or “tree of life”, is a representation of how life evolved on our planet. It shows us which animals are more closely related to each other (dogs and wolves, humans and chimpanzees), and which ones are not. In this section, we’ll see methods to infer evolutionary trees, given parts of the DNA from different species.
- Evolutionary Trees
- Evolutionary Tree Construction: Neighbor-Joining Algorithm
- Character Based Evolutionary Tree Construction
Gene Analysis with Clustering Algorithms, and Detecting Disease Causing Mutations with Suffix Trees
In the first tutorial, we’ll see how we can find similar genes using clustering algorithms such as K-means clustering and hierarchical clustering. In the second tutorial, we’ll see how we can detect disease-causing mutations by mapping genes from an affected person to a reference human genome. For this, we’ll see how to perform exact and inexact string matching efficiently using data structures like tries and suffix trees.
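The exact-matching idea can be sketched with a trie built from the patterns: starting at each position of the text, we walk down the trie and report a hit whenever a complete pattern is spelled out. This is a minimal sketch under assumptions, not the tutorial's implementation: `build_trie` and `find_matches` are hypothetical names, nested dictionaries stand in for trie nodes, and the `"$"` end marker assumes that character never appears in the sequences.

```python
def build_trie(patterns):
    """Build a trie as nested dicts; the "$" key marks the end of a pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node["$"] = pattern  # assumption: "$" never occurs in a sequence
    return root


def find_matches(text, patterns):
    """Return (start_index, pattern) for every exact occurrence in text."""
    trie = build_trie(patterns)
    hits = []
    for start in range(len(text)):
        node = trie
        # Follow the text down the trie until no edge matches.
        for pos in range(start, len(text)):
            ch = text[pos]
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                hits.append((start, node["$"]))
    return hits


if __name__ == "__main__":
    for start, pattern in find_matches("AATCGGGTTCAATCGGGGT", ["ATCG", "GGGT"]):
        print(start, pattern)
```

Each starting position costs at most the length of the longest pattern, regardless of how many patterns there are. A suffix tree turns this around by indexing the text instead, which suits the case of one fixed reference genome and many query sequences.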
Advanced Topics: Protein Structure Prediction
This is a bonus tutorial. It covers the advanced topic of protein structure prediction, which is an area of active research with many open problems.