Category Archives: daaaaave

Transcript to flux methods

Last year, Daniel Machado and Marcus Herrgård published a great paper [1] evaluating different methods for mapping from transcript data to flux patterns. Flux predictions for eighteen methods were compared to three E.coli and yeast datasets.

The methods varied widely their requirements: continuous vs. discrete levels (105/cell vs. high/low) and absolute vs. relative expression (105/cell vs. twice as much as a reference condition). I wouldn’t say that Daaaaave (rather boringly called “Lee-12” above) is the best algorithm out there — indeed the paper finds that all the methods perform equally well (badly). But I do think that it lies in the right part of the graph, using continuous, absolute data.

One of the successes of Machado’s paper is making their code and datasets available via github. I’ve now forked it into my own repository, allowing us to test new methods using their suite.

My first test was to attempt to address the problem that

Methods that do not make any assumptions regarding a biological objective (iMAT, Lee–12 and RELATCH*) … incorrectly predicted a zero growth rate in all cases

by enriching the reconstructions: associating the genes encoding the main DNA polymerases with growth. It didn’t make any difference, and I shouldn’t have been surprised. This gene association is one data point amongst thousands, and the cell’s growing would require a major rerouting of flux, thereby moving other many data points away from their best fit.

Back to the drawing-board.


  1. Daniel Machado and Marcus Herrgård (2014) “Systematic evaluation of methods for integration of transcriptomic data into constraint-based models of metabolism” PLoS Computational Biology 10:e1003580.

Diabetic neuropathy

For the next couple of months I’ll be working on a project alongside Natalie Gardiner from Life Sciences, with co-conspirators @neilswainston and @porld. Natalie studies diabetic neuropathy: this debilitating nerve damage affects around half of diabetes patients, and can lead to pain or loss of sensation. The causes of the condition are not well-understood, though elevated blood glucose is known to be a key factor.

Natalie’s lab has amassed comprehensive proteomic and metabolomic data sets from nerve cells in diabetic rats. This is a great opportunity for us to put to work some of the tools we’ve developed in the MCISB. ‘Omics measurements can tell us what changes occur in a disease, but not which of these changes are important to its development; for this we require knowledge of the underlying network organisation. We shall apply our Daaaaave algorithm [1, see also “From genes to fluxes”] to rat-nerve specific derivations of Recon 2 [2, “Hambo” see also “Striking a balance with Recon 2.1”], to derive cell–level behaviour from Natalie’s gene-level data.

Watch this space for news on “Super-Daaaaave” and “Rambo”.


  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
  2. Thiele I, Swainston N, Fleming RM, Hoppe A, Sahoo S, Aurich MK, Haraldsdottir H, Mo ML, Rolfsson O, Stobbe MD, Thorleifsson SG, Agren R, Bölling C, Bordel S, Chavali AK, Dobson P, Dunn WB, Endler L, Hala D, Hucka M, Hull D, Jameson D, Jamshidi N, Jonsson JJ, Juty N, Keating S, Nookaew I, Le Novère N, Malys N, Mazein A, Papin JA, Price ND, Selkov E Sr, Sigurdsson MI, Simeonidis E, Sonnenschein N, Smallbone K, Sorokin A, van Beek JH, Weichart D, Goryanin I, Nielsen J, Westerhoff HV, Kell DB, Mendes P, Every Man, His Dog, Palsson BØ (2013) “A community-driven global reconstruction of human metabolism” Nature Biotechnology 31:419-425.

From genes to fluxes #2

Last time I talked about the issue with the method used by Daaaaave [1] to map from enzyme-level data to reaction-level data. Given enzyme (or gene) levels A = 4, B = 3 and C = 2 units, we find:

reaction GPR Daaaaave Gimme
4 (A and B) or (A and C) min(4,3) + min(4,2) = 5 max(min(4,3), min(4,2)) = 3
5 A and (B or C) min(4,3 + 2) = 4 min(4, max(3,2)) = 3

The problem with applying the min/plus rule to GPRs is that reactions 4 and 5 are the same (albeit differently bracketed), but Daaaaave assigns them different values. As Nikos pointed out, the min/max rule used by Gimme [2] doesn’t make this mistake. However, I think we really should be adding the activities of alternative catalysts; indeed some networks — such as “Yeast 1” [3] — use separate reactions in place of “or” statements. Any mapping must be robust to equivalent representations.

Let’s step back a bit. Reaction 4 is catalysed by alternative complexes, A:B and A:C.

r4 → (A and B) or (A and C)

There is less of A (4) than the total amount of B (3) and C (2), so there must be some B or C “wasted” when forming the two complexes. There are an infinite number of arrangements here — we could have A:B/A:C = 3/1, 2/2, 2½/1½, … — but their maximum total activity is 4 units. This value of 4 is overestimated by Daaaaave, but underestimated by Gimme.

We can frame our verbal reasoning above mathematically. Each GPR mention of an enzyme across the network is really a separate entity

r4 → (A1 and B1) or (A2 and C1)

that together make up the total enzyme level

A1 + A2 + … = A = 4.

We can substitute “and” relationships by introducing new variables Xi ≥ 0 that represent complexes

r4 → X1 or X2

whose activities can be no more than any of their parts

X1 ≤ A1, X1 ≤ B1.

We can also substitute “or” relationships by introducing new variables Yi that represent alternative catalysts

r4 = Y1

whose activities are the sum of their parts

Y1 = X1 + X2.

Finally, we want there to be as little wastage as possible, and one way to achieve this is through maximising the total activity

maximise: r1 + r2 + ….

This optimisation is an LP problem and can be easily solved for networks of any size. Indeed, running an FBA over the network would be of the same computational complexity. Most importantly, this mapping makes the most of the available data.


  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
  2. Becker SA, Palsson BØ (2008) “Context-specific metabolic networks are consistent with experiments” PLoS Comp Biol 4:e1000082.
  3. Herrgård MJ, Swainston N, Dobson P, Dunn WB, Arga KY, Arvas M, Blüthgen N, Borger S, Costenoble R, Heinemann M, Hucka M, Le Novère N, Li P, Liebermeister W, Mo ML, Oliveira AP, Petranovic D, Pettifer S, Simeonidis E, Smallbone K, Spasić I, Weichart D, Brent R, Broomhead DS, Westerhoff HV, Kirdar B, Penttilä M, Klipp E, Palsson BØ, Sauer U, Oliver SG, Mendes P, Nielsen J, Kell DB (2008) “A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology” Nat Biotechnol 26:1155-1160.

From genes to fluxes

In 2012, we set out a pipeline [1] for using predicting flux through a metabolic network, using genetic or proteomic data.

Two inputs are required:

  • a genome-scale metabolic network containing gene-protein-reaction (GPR) associations, such as human or yeast
  • quantitative estimates of their enzyme levels, via proteomics or absolute transcriptomic data

To predict metabolic flux, two steps are taken:

  • enzyme-level data are combined with GPR associations, to create reaction-level data
  • reaction-level data are constrained by mass-balancing, to create system-level flux data

A number of methods have been proposed to perform this second mapping, maximising the correlation between reaction data and system flux, including Daaaaave [1, developed in Manchester], Gimme [2] and iMat [3]. I won’t discuss their relative merits here, but will instead focus on the first step.

The mapping isn’t quite as easy as it seems, due to the many-to-many relationship between enzymes and metabolic reactions. Suppose we have determined enzyme levels of A = 4, B = 3 and C = 2. How should we interpret their various combinations as GPRs?

reaction GPR level
1 A 4
2 A or B 4 + 3 = 7
3 A and B min(4,3) = 3
4 (A and B) or (A and C) min(4,3) + min(4,2) = 5
5 A and (B or C) min(4,3 + 2) = 4

Given the simple mapping in reaction 1, we may use the enzyme level directly. The “or” relationship in reaction 2 allows for alternative catalysts, so its total level is given by the sum of its components. The “and” relationship in reaction 3 means it is catalysed by a complex, whose maximum possible concentration is given by the minimum level of its components. These two min/plus rules may be combined for more complex GPRs such as in reactions 4 and 5.

Here’s the rub: the Boolean logic used in 4 and 5 are the same (just bracketed differently), yet the levels output by the mapping are different. The existing approach is badly-defined and we need to go back to the drawing-board. One option — proposed by Brandon Barker — is to insist that GPRs are always written in a consistent manner. For example, we could use the disjunctive normal form (DNF), meaning that GPRs are expressed as an “or of ands”: a list of alternative complexes, as per reaction 4. I think that having consistency like this is an excellent idea. But I also think that our mapping should be robust to alternative bracketing; I’ll show you how, next time.


  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
  2. Becker SA, Palsson BØ (2008) “Context-specific metabolic networks are consistent with experiments” PLoS Comp Biol 4:e1000082.
  3. Shlomi T, Cabili MN, Herrgård MJ, Palsson BØ, Ruppin E (2008) “Network-based prediction of human tissue-specific metabolism” Nature Biotechnology 26:1003–1010.