
Hidden Markov Models for Bioinformatics, T. Koski (ISBN 9789401006132)


_feature_div" class="feature" data-feature-name="bookDescription">

Hidden Markov Models for Bioinformatics T Koski 9789401006132 Books

The field of computational biology has expanded greatly in the last decade, mainly due to the increasing role of bioinformatics in the genome sequencing projects. This book outlines a particular class of probabilistic models, called hidden Markov models, that are used frequently in genetic sequence search routines. The book is primarily for mathematicians who want to move into bioinformatics, but it could be read by a biologist with a strong mathematical background. The book is detailed in some places, sparse in others, and reads like a literature survey at times, but many references are given, and there are very interesting exercises at the end of each chapter section. In fact it is imperative that the reader work some of these exercises, as the author proves some of the results stated in the main text via the exercises.
Some of the highlights of the book include:

1. An overview of the probability theory to be used in the book. The material is fairly standard, including a review of continuous and discrete random variables from the measure-theoretic point of view, i.e. the author introduces them via a probability space, which is a set together with its sigma field and a probability measure on this field. The weight matrix, or "profile" as it is sometimes called, is defined; this has many applications in bioinformatics. Bayesian learning is also discussed, and the author introduces what he calls the "missing information principle", which is fundamental to the probabilistic modeling of biological sequences. Applications of probability theory to DNA analysis are discussed, including shotgun assembly and the distribution of fragment lengths from restriction digests. A collection of interesting exercises is included at the end of the chapter, particularly the one on the null model for pairwise alignments.
2. An introduction to information theory and the relative entropy or "Kullback distance", the latter of which is used to learn sequence models from data. The author defines the mutual information between two probability distributions and the entropy, and calculates the latter for random DNA. He also proves some of the Shannon source coding theorems, one being the convergence to the entropy for independent, identically distributed random variables. The Kullback distance is then defined as a distance between probability distributions, with the caution that it is not a metric because it lacks symmetry. (A small sketch of these two quantities appears after this list.)
3. The overview of probabilistic learning theory, where 'learning from data' is defined as the process of inferring a general principle from observations of instances.
4. The very detailed treatment of the EM algorithm, including the discussion of a model for fragments with motifs.
5. The discussion of alignment and scoring, especially that of global similarity. Local alignment is treated in the exercises.
6. The discussion of the learning of Markov chains via Bayesian modeling, applied to a training sequence via a family of Markov models. Frame-dependent Markov chains are discussed in the context of Markovian models for DNA sequences.
7. The discussion of influence diagrams and nonstandard hidden Markov models, in particular the excellent diagrams drawn to illustrate the main properties; an excellent discussion is also given of an "HMM with duration" in the context of the functional units of a eukaryotic gene. This is important in the GeneMark.hmm software.
8. The treatment of motif-based HMMs, in particular the discussion of the approximate common substring problem.
9. The discussion of the "quasi-stationary" property of some chains and the connection with the "Yaglom limit".
10. The treatment of Derin's formula for the smoothing posterior probability of a standard HMM. The author shows in detail that the probability of a finite-length emitted sequence, conditioned on a state sequence of the HMM, depends only on a subsequence of the state sequence.
11. The treatment of the lumping of Markov chains, i.e. the question of whether a function of a Markov chain is another Markov chain.
12. The very detailed treatment of the forward-backward algorithm and the Viterbi algorithm.
13. The discussion of the learning problem via the quasi-log-likelihood function for HMMs.
14. The discussion of the limit points of the Baum-Welch algorithm.
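As a concrete illustration of the quantities in item 2 above, here is a minimal sketch, not taken from the book, that computes the Shannon entropy of an i.i.d. DNA model and shows the asymmetry that keeps the Kullback distance from being a metric. The GC-rich composition below is a made-up example.

```python
import math

def entropy(p):
    """Shannon entropy in bits: H(p) = -sum_i p_i log2 p_i."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kullback(p, q):
    """Relative entropy D(p || q) in bits; note it is not symmetric in p and q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # "random DNA": A, C, G, T equiprobable
gc_rich = [0.15, 0.35, 0.35, 0.15]   # hypothetical GC-rich base composition

print(entropy(uniform))              # 2.0 bits per base for random DNA
print(kullback(gc_rich, uniform))    # D(p || q) ...
print(kullback(uniform, gc_rich))    # ... differs from D(q || p): no symmetry
```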
Since the Baum-Welch algorithm is the iteration of a map, its convergence can be studied by finding the fixed points of this map. These fixed points are in fact the stationary points of the likelihood function, and they can be related to the convergence of the algorithm via the Zangwill theory of algorithms. Unfortunately the author does not give the details of the Zangwill theory, but instead relegates it to the references (via an exercise). The Zangwill theory can be discussed in the context of nonlinear programming, with generalizations of it occurring in the field of nonlinear functional analysis. It would be interesting to investigate whether the properties of hidden Markov models, especially their rigorous statistical properties, can all be treated in the context of nonlinear functional analysis.
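To make the fixed-point picture concrete, here is a self-contained numerical sketch, assuming a toy two-state HMM over a binary alphabet with made-up observations (none of this is from the book): each Baum-Welch step applies the EM re-estimation map, and the loop stops when the parameters stop moving, i.e. at an approximate fixed point, which is a stationary point of the likelihood.

```python
import numpy as np

def forward_backward(obs, A, B, pi):
    """Scaled forward-backward pass; returns state posteriors and log-likelihood."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N)); beta = np.zeros((T, N)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                 # P(state at time t | observations)
    xi = np.zeros((T - 1, N, N))         # P(transition i -> j at time t | observations)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / c[t + 1]
    return gamma, xi, np.log(c).sum()    # log-likelihood recovered from the scalings

def baum_welch_step(obs, A, B, pi):
    """One application of the EM map: re-estimate (A, B, pi) from the posteriors."""
    gamma, xi, ll = forward_backward(obs, A, B, pi)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[np.array(obs) == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return A_new, B_new, gamma[0], ll

# Made-up binary observation sequence and an arbitrary starting point.
obs = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1]
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.6, 0.4], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
for it in range(500):
    A_new, B_new, pi_new, ll = baum_welch_step(obs, A, B, pi)
    delta = max(np.abs(A_new - A).max(), np.abs(B_new - B).max())
    A, B, pi = A_new, B_new, pi_new
    if delta < 1e-10:    # parameters no longer move: an approximate fixed point
        break
print(it, ll)            # iterations used and the last computed log-likelihood
```

The stopping rule checks how far the EM map moves the parameters; in Zangwill's framework this is precisely the test that the iteration has reached its fixed-point set.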

Product details

  • Paperback: 412 pages
  • Publisher: Springer (May 13, 2011)
  • Language: English
  • ISBN-10: 940100613X


Hidden Markov Models for Bioinformatics, T. Koski (9789401006132): Reviews


The intended audience of this book is mathematicians. To understand it, you should have prior coursework experience in at least several upper-division undergraduate courses in mathematical statistics and probability theory. The structure of this book is also that of a typical math book: full of propositions, corollaries, lemmas, etc., with very limited use of illustrations (e.g., there is not a single figure up to chapter 6).
I wanted a book with a mathematical sophistication similar to Durbin's book, but this book goes well beyond that. On the other hand, I showed this book to a mathematics graduate student and she said it is perfect for her. So I guess this book is written by a mathematician only for mathematicians.
The book gives outstanding coverage of all that goes into building HMMs - one of the most important tools in genome analysis and structure prediction. It covers the field in extreme depth. More depth, in fact, than needed for building useful HMM systems. It not only presents the forward and backward algorithms leading up to Baum-Welch, it presents all the extras - convergence, etc.
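For readers who want to see what one of these dynamic-programming routines looks like, here is a minimal Viterbi sketch in the same toy two-state setting as the Baum-Welch sketch above; it is an illustration under those assumptions, not the book's presentation.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most probable hidden-state path, computed in log space."""
    T, N = len(obs), A.shape[0]
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = np.zeros((T, N)); psi = np.zeros((T, N), dtype=int)
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # score of reaching state j from each i
        psi[t] = scores.argmax(axis=0)          # best predecessor of each state
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):               # backtrack through the pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()          # state path and its log-probability

obs = [0, 1, 1, 0, 1]
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.6, 0.4], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi(obs, A, B, pi))
```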
This additional depth of coverage may go beyond many readers' needs. It is very helpful, though, for people who need more than the usual algorithms. By giving the background in such detail, a persistent reader can follow to a certain point, then create modifications with a clear idea of where the new algorithm actually comes from.
Regarding the current practice of HMM usage, I found the book a bit thin. Widely known tools based on HMMs are mentioned only occasionally and in passing, and HMM-based alignment is discussed only briefly. Well, this book isn't for the tool user. Perhaps more importantly, I found scant mention of scoring with respect to some background probability model (the "null" model, as it's called here).
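The null-model scoring the reviewer has in mind usually amounts to a log-odds score: the HMM log-likelihood of a sequence minus its log-likelihood under an i.i.d. background. A minimal sketch, assuming a uniform background and taking the HMM log-likelihood from a forward pass such as the one sketched earlier; the numbers are hypothetical.

```python
import math

def null_log_likelihood(obs, background):
    """log P(obs | null) for an i.i.d. background distribution over the alphabet."""
    return sum(math.log(background[x]) for x in obs)

def log_odds(hmm_loglik, obs, background):
    """Positive scores favour the HMM over the background model."""
    return hmm_loglik - null_log_likelihood(obs, background)

# Hypothetical inputs: a 5-symbol binary sequence and an assumed HMM log-likelihood.
obs = [0, 1, 1, 0, 1]
print(log_odds(-3.2, obs, background=[0.5, 0.5]))
```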
My one real complaint, and this is truly minor, is the quality of illustration. The line-drawings look like Word pictures - not necessarily a bad thing, if done well. These aren't particularly professional-looking, though, and oddly stretched or squashed in many cases. Still, they're readable enough and make all the needed points.
A lesser point, and not the author's fault, is the editorial implication that this book introduces probabilistic models in general. It does not. This is strictly about HMMs, not Bayesian nets, bootstrap techniques, or any of the dozens of other probabilistic models used in bioinformatics. That is not a flaw of the book, just a flaw in how it's represented.
If you are dedicated to becoming an expert in HMM construction and application, you must have this book. It's a bit much, though, for people who just want the results that HMMs give.
"Hidden Markov Models of Bioinformatics" is an excellent exploration of the subject matter appropriate coverage, well written, and engaging. Hidden Markov Models are a rather broad class of probabilistic models useful for sequential processes. Their use in the modeling and abstraction of motifs in, for example, gene and protein families is a specialization that bears a thorough description, and this book does so very well. This is a book for understanding the theory and core ideas underlying profile HMMs, and if the term Expectation Maximization doesn't sound familiar or interesting to you, this is probably not the book you're looking for. Personally I found it clearer in some ways than the standard reference by Durbin, Eddy, Krogh, and Mitchison, but actually the two complement each other very nicely. If you are interested in constructing an HMM for your favorite protein family you probably want to look at the HMMER or SAM documentation instead; if you want to understand where HMMs come from or how you might architect one, there's probably no better book than this one.