Transcript to flux methods

Last year, Daniel Machado and Marcus Herrgård published a great paper [1] evaluating different methods for mapping from transcript data to flux patterns. Flux predictions for eighteen methods were compared to three E.coli and yeast datasets.

The methods varied widely their requirements: continuous vs. discrete levels (105/cell vs. high/low) and absolute vs. relative expression (105/cell vs. twice as much as a reference condition). I wouldn’t say that Daaaaave (rather boringly called “Lee-12” above) is the best algorithm out there — indeed the paper finds that all the methods perform equally well (badly). But I do think that it lies in the right part of the graph, using continuous, absolute data.

One of the successes of Machado’s paper is making their code and datasets available via github. I’ve now forked it into my own repository, allowing us to test new methods using their suite.

My first test was to attempt to address the problem that

Methods that do not make any assumptions regarding a biological objective (iMAT, Lee–12 and RELATCH*) … incorrectly predicted a zero growth rate in all cases

by enriching the reconstructions: associating the genes encoding the main DNA polymerases with growth. It didn’t make any difference, and I shouldn’t have been surprised. This gene association is one data point amongst thousands, and the cell’s growing would require a major rerouting of flux, thereby moving other many data points away from their best fit.

Back to the drawing-board.


  1. Daniel Machado and Marcus Herrgård (2014) “Systematic evaluation of methods for integration of transcriptomic data into constraint-based models of metabolism” PLoS Computational Biology 10:e1003580.

Diabetic neuropathy

For the next couple of months I’ll be working on a project alongside Natalie Gardiner from Life Sciences, with co-conspirators @neilswainston and @porld. Natalie studies diabetic neuropathy: this debilitating nerve damage affects around half of diabetes patients, and can lead to pain or loss of sensation. The causes of the condition are not well-understood, though elevated blood glucose is known to be a key factor.

Natalie’s lab has amassed comprehensive proteomic and metabolomic data sets from nerve cells in diabetic rats. This is a great opportunity for us to put to work some of the tools we’ve developed in the MCISB. ‘Omics measurements can tell us what changes occur in a disease, but not which of these changes are important to its development; for this we require knowledge of the underlying network organisation. We shall apply our Daaaaave algorithm [1, see also “From genes to fluxes”] to rat-nerve specific derivations of Recon 2 [2, “Hambo” see also “Striking a balance with Recon 2.1”], to derive cell–level behaviour from Natalie’s gene-level data.

Watch this space for news on “Super-Daaaaave” and “Rambo”.


  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
  2. Thiele I, Swainston N, Fleming RM, Hoppe A, Sahoo S, Aurich MK, Haraldsdottir H, Mo ML, Rolfsson O, Stobbe MD, Thorleifsson SG, Agren R, Bölling C, Bordel S, Chavali AK, Dobson P, Dunn WB, Endler L, Hala D, Hucka M, Hull D, Jameson D, Jamshidi N, Jonsson JJ, Juty N, Keating S, Nookaew I, Le Novère N, Malys N, Mazein A, Papin JA, Price ND, Selkov E Sr, Sigurdsson MI, Simeonidis E, Sonnenschein N, Smallbone K, Sorokin A, van Beek JH, Weichart D, Goryanin I, Nielsen J, Westerhoff HV, Kell DB, Mendes P, Every Man, His Dog, Palsson BØ (2013) “A community-driven global reconstruction of human metabolism” Nature Biotechnology 31:419-425.

Unfinished Business #4: modelling β-oxidation

I have a long-running collaboration with Bernard Corfe, modelling metabolism of dietary fibres in the gut. These – otherwise indigestible – carbohydrates are first fermented by bacteria in the colon to form short-chain fatty acids: acetate (C2), propionate (C3) and butyrate (C4). The fatty acids are then taken up by colonocytes, and processed in their mitochondria using β-oxidation. It is a major source of energy in humans, and deficiencies in dietary fibre metabolism cause a number of diseases.

Of course we’d like to model this by fully characterising all the enzymes in colonocytes, as Manchester did for yeast glycolysis and PPP. But, for those not fortunate enough to have a £6.4M grant, we can get a long way through enriching the model structure (below) with typical parameter values.

You can grab an SBML version of this model from have a play. Having a kinetic model means we can run analyses such as MCA. And running such analyses allows us to find out that butyrate uptake is the key controller of both β-oxidation and glycolysis.

Interaction between β-oxidation and glycolysis in colonocytes, presented in SBGN format.
Competition occurs between the short-chain fatty acids and lactate for transporter use.

Unfinished Business #3: evolving metabolic networks

Over the years, we’ve developed metabolic reconstructions for a number of different organisms. It’s a slow process, with each reaction the result of a human reading and interpreting a scientific paper. Reducing these networks to graphs (collections of nodes/metabolites, connected by edges/reactions) allows us to describe their structure using familiar graph-theoretic language like “scale-free” and “small-world” [1,2]. But the networks are more than graphs. Biochemical reactions can have multiple reactants and products, with varying stoichiometry. Rewriting the reaction

S1 + S2 ⟺ 2 P1 + P2

as a collection of edges

S1 ⟺ P1, S1 ⟺ P2, S2 ⟺ P1, S2 ⟺ P2

loses important information.

It would be incredibly useful to have an algorithm that generates a metabolic network structure for a given number of metabolites. The easy way to do this is to take a pan-organism super-network like Kegg or MetaCyc, and to extract a connected, functional sub-network of the right size. This method would ensure that the result has the correct structural properties, but would give us no insight as to how this network evolved.

A better, harder approach, is to “grow” the network, adding new metabolites and reactions according to certain rules that characterise the evolutionary pressures unto which the network is subjected. This approach has again been used with graphical representations [2,3], but not stoichiometric models.

What rules would we use to evolve these networks? Unlike graphs, these metabolic models must satisfy mass-balancing, which can be written as a matrix equation [4]. I anticipate that growing metabolic networks will require our choosing solutions from this equation, by applying probabilistic attachment ideas used to grow other (signalling, social) networks.

Can we synthesise a metabolic network that looks like Kegg?


  1. H Jeong, B Tombor, Réka Albert, ZN Oltvai, and Albert-László Barabási (2000) “The large-scale organization of metabolic networks” Nature 407:651-654. doi:10.1038/35036627
  2. Henry Dorrian, Kieran Smallbone, and Jon Borresen (2012) “Size dependent growth in metabolic networks” arXiv:1210.2550
  3. Albert-László Barabási and Réka Albert (1999) “Emergence of scaling in random networks” Science 286: 509-512. doi:10.1126/science.286.5439.509
  4. Albert Gevorgyan, Mark Poolman, and David Fell (2008) “Detection of stoichiometric inconsistencies in biomolecular models” Bioinformatics 24:2245-2251. doi:10.1093/bioinformatics/btn425

“Typical parameter values”

My approach to referencing.

An aside from this week’s Unfinished Business.

Whilst writing yesterday’s post, I realised how often I (and I imagine many other modellers) resort to using “typical parameter values”, when no experimentally-determined numbers are available. If you read one of my papers, you’ll probably find a table listing these typical values and their provenance. But if you dig a little deeper, and follow these references back (all self-references, of course), you’ll eventually return to the paper you started with. Strange.

No, I don’t know where those numbers came from either. Still, we can derive them now using Brenda: a huge database of enzyme kinetic parameters. Database-level statistics in Brenda are presented as histograms; for example, there are over 105 experimentally-determined Michaelis-Menten constants KM, and we can use their statistics page to generate a histogram of the distribution of log10 KM:

We can calculate the mean and standard deviation of log10 KM from this curve, and then use a Taylor expansion to approximate the mean μ and standard deviation σ of KM. This process can be applied to all the functional parameters in Brenda:

parameter unit μ σ
KM mM 0.96 0.52
KI mM 0.077 0.028
IC50 mM 0.015 0.0068
pI 6.0 1.6
kcat 1/s 27 14
specific activity umol/min/mg 0.34 1.9
pH optimum 7.3 1.2
temperature optimum °C 39 12

I’m pleased (OK: surprised) to see that my typical, typical parameter values of KM = KI = 0.1 mM, and kcat = 10/s are close to the mark, though an order of magnitude estimate of KM = 1 mM might be a better choice in the future.

Unfinished Business #2: Who’s afraid of the big bad model?

As well as developing and maintaining metabolic reconstructions of yeast, E.coli and Cho cells, Pedro Mendes and I created an algorithm [1] for turning these static networks into dynamic kinetic models. We include any known parameters, and guesstimate the remainder.

But how — the (frequentist) statisticians amongst you may well ask — can we create a model with thousands of parameters, when we only have measurements for tens? There are two justifications for this. Firstly, dynamics are mostly driven by the structure (which we know), rather than the parameters (which we guess). Secondly, on a more philosophical level …

… is it better to model the well-characterised pathway in isolation, or as part of a worse-characterised wrapper?

The real issue with these big metabolic models, with their thousands of variables, is actually numerics. Yunjiao Wang and I found [2] that, even for the simplest linear pathway model, we should expect instability, stiffness and even chaos as the chain increases in length. In our full model of metabolism, the stiffness ratio was found to be around 1011. Given the different timescales over which metabolism operates, this is not so surprising. But it means that today’s systems biology tools just cannot robustly simulate such models: of this size, and with these numerical instabilities.

Whole-cell models are of increasing importance to the community. Jonathan Karr’s model of M. genitalium [3, also the focus of a recent summer school] avoids the issue of dynamically modelling metabolism through using FBA. We would, of course, much prefer the use of full kinetic models. But, right now, these models are not usable, and they are not useful. The onus to address their utility shouldn’t be on the software developers; instead, we modellers need to look into alternative ways to represent metabolism that will allow for its dynamic simulation at the whole-cell level.


  1. Kieran Smallbone and Pedro Mendes (2013) “Large-scale metabolic models: From reconstruction to differential equations” Industrial Biotechnology 9:179-184.
  2. Kieran Smallbone and Yunjiao Wang (2015) “Mathematising biology” MIMS EPrints 2015.21
  3. Jonathan Karr and friends (2012) “A whole-cell computational model predicts phenotype from genotype” Cell: 150:389-401 doi:10.1016/j.cell.2012.05.044

Unfinished Business #1: bucket lists

obsessive listologist David Hilbert

The end of this month sees the end of my current grant BioPreDyn. Or, to give it its full title, “From Data to Models: New Bioinformatics Methods and Tools for Data-Driven, Predictive Dynamic Modelling in Biotechnological Applications”.

It’s a time for reflection. Not only on what was achieved (my travelling round Europe by boat and train), but on what wasn’t achieved. Those ideas, both revolutionary and rubbish, that remain unfinished and unstarted, because you were too busy delivering deliverables, or answering emails.

Rather than leave these ideas to gather dust in my mental filing cabinet, I’ll spend this final week setting them out, for you to rubbish, or to run off with and claim as your own.

The first idea is by far the most frivolous. Little boys like to collect stamps. Big boys like to make Bucket Lists. German mathematician David Hilbert was no exception, and in 1900 compiled his list of 23 unsolved problems that were to shape the course of mathematics for the next century. One hundred years later, the Clay Institute set out seven unsolved maths problems, each carrying a $1,000,000 prize for their solution.

Which led me to ponder: can we follow Hilbert’s lead, and compile a list of the most important open problems in Mathematical Biology? Such a list could focus the field towards a common goal, just as it did back then. In 2008, Darpa announced 23 mathematical challenges, some of which had a biological flavour. The difficulty we face in constructing our list is highlighted by comparing Hilbert’s first problem:

Is there a set whose cardinality is strictly between that of the integers and the real numbers?

with Darpa’s first challenge:

Develop a mathematical theory to build a functional model of the brain.

Hilbert’s question is very well-defined (though, it turns out, is impossible to answer). By contrast, Mathematical Biology problems are, and should be, application-driven, and are hence necessarily fluffier. Darpa’s challenge, for example, is vague, and we could never tell if an answer amounted to a solution.

Our task then is to set out not only the big open problems in Mathematical Biology, but those concrete enough to be answered.