Last time I talked about the issue with the method used by Daaaaave to map from enzyme-level data to reaction-level data. Given enzyme (or gene) levels A = 4, B = 3 and C = 2 units, we find:
| Reaction | GPR | min/plus (Daaaaave) | min/max (Gimme) |
|---|---|---|---|
| r4 | (A and B) or (A and C) | min(4,3) + min(4,2) = 5 | max(min(4,3), min(4,2)) = 3 |
| r5 | A and (B or C) | min(4,3 + 2) = 4 | min(4, max(3,2)) = 3 |
The problem with applying the min/plus rule to GPRs is that reactions 4 and 5 are logically the same (they differ only in bracketing), yet Daaaaave assigns them different values. As Nikos pointed out, the min/max rule used by Gimme doesn’t make this mistake. However, I think we really should be adding the activities of alternative catalysts; indeed some networks, such as “Yeast 1”, use separate reactions in place of “or” statements. Any mapping must be robust to equivalent representations.
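The two rules can be compared directly with a small sketch. The nested-tuple encoding of a GPR and the function name `evaluate` are assumptions made for illustration; they are not taken from Daaaaave or Gimme.

```python
# Enzyme levels from the example above.
LEVELS = {"A": 4, "B": 3, "C": 2}

# GPRs encoded as nested tuples: (operator, operand, operand, ...).
# This encoding is an illustrative assumption, not a standard format.
R4 = ("or", ("and", "A", "B"), ("and", "A", "C"))
R5 = ("and", "A", ("or", "B", "C"))


def evaluate(gpr, levels, or_rule):
    """Evaluate a GPR using min for "and" and or_rule (sum or max) for "or"."""
    if isinstance(gpr, str):
        return levels[gpr]
    op, *operands = gpr
    values = [evaluate(operand, levels, or_rule) for operand in operands]
    if op == "and":
        return min(values)
    if op == "or":
        return or_rule(values)
    raise ValueError(f"unknown operator: {op!r}")


for name, gpr in (("r4", R4), ("r5", R5)):
    print(name,
          "min/plus:", evaluate(gpr, LEVELS, sum),
          "min/max:", evaluate(gpr, LEVELS, max))
```

With `sum` as the “or” rule the two equivalent bracketings disagree (5 against 4); with `max` they agree, but at the lower value of 3.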
Let’s step back a bit. Reaction 4 is catalysed by alternative complexes, A:B and A:C.
r4 → (A and B) or (A and C)
There is less A (4 units) than B and C combined (3 + 2 = 5 units), so some B or C must be “wasted” when the two complexes form. There are infinitely many possible arrangements (we could have A:B/A:C = 3/1, 2/2, 2½/1½, …), but their maximum total activity is 4 units. Daaaaave overestimates this value (5), while Gimme underestimates it (3).
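A quick numerical check of this argument is to scan over ways of dividing A between the two complexes. The sketch below assumes half-unit steps purely for illustration; the names `total_activity` and `splits` are hypothetical.

```python
A, B, C = 4, 3, 2


def total_activity(a_in_ab):
    """Total activity of A:B plus A:C when a_in_ab units of A go to A:B."""
    a_in_ac = A - a_in_ab
    # Each complex is limited by its scarcer component.
    return min(a_in_ab, B) + min(a_in_ac, C)


# Candidate amounts of A assigned to A:B: 0, 0.5, 1, ..., 4.
splits = [i / 2 for i in range(2 * A + 1)]
best = max(total_activity(a) for a in splits)
best_splits = [a for a in splits if total_activity(a) == best]

print("maximum total activity:", best)
print("A assigned to A:B at the maximum:", best_splits)
```

On this grid, the maximum of 4 is reached whenever between 2 and 3 units of A go to A:B, which covers the 3/1, 2/2 and 2½/1½ arrangements mentioned above.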
We can frame our verbal reasoning above mathematically. Each GPR mention of an enzyme across the network is really a separate entity
r4 → (A1 and B1) or (A2 and C1)
that together make up the total enzyme level
A1 + A2 + … = A = 4.
We can substitute “and” relationships by introducing new variables Xi ≥ 0 that represent complexes
r4 → X1 or X2
whose activities can be no more than any of their parts
X1 ≤ A1, X1 ≤ B1.
We can also substitute “or” relationships by introducing new variables Yi that represent alternative catalysts
r4 = Y1
whose activities are the sum of their parts
Y1 = X1 + X2.
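The substitutions above are mechanical, so they can be generated by walking the GPR. The following is a minimal sketch, assuming a GPR is encoded as nested tuples; `build_constraints` is a hypothetical helper that only writes the constraints out as text and does not solve anything.

```python
from collections import defaultdict
from itertools import count


def build_constraints(gpr):
    """Return (root variable, constraint strings) for a nested-tuple GPR."""
    copies = defaultdict(list)        # enzyme name -> its per-mention copies
    x_ids, y_ids = count(1), count(1)
    constraints = []

    def visit(node):
        if isinstance(node, str):     # each mention becomes a separate entity
            name = f"{node}{len(copies[node]) + 1}"
            copies[node].append(name)
            return name
        op, *children = node
        parts = [visit(child) for child in children]
        if op == "and":               # complex: no more active than any part
            var = f"X{next(x_ids)}"
            constraints.extend(f"{var} <= {part}" for part in parts)
        elif op == "or":              # alternatives: activities add up
            var = f"Y{next(y_ids)}"
            constraints.append(f"{var} = {' + '.join(parts)}")
        else:
            raise ValueError(f"unknown operator: {op!r}")
        return var

    root = visit(gpr)
    # The copies of each enzyme together make up its total level.
    for enzyme, names in sorted(copies.items()):
        constraints.append(f"{' + '.join(names)} = {enzyme}")
    return root, constraints


r4 = ("or", ("and", "A", "B"), ("and", "A", "C"))
root, constraints = build_constraints(r4)
print("r4 =", root)
for line in constraints:
    print(line)
```

For a whole network the copy counters would need to be shared across reactions, so that every mention of A in any GPR draws on the same total; this sketch handles a single reaction only.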
Finally, we want there to be as little wastage as possible, and one way to achieve this is through maximising the total activity
maximise: r1 + r2 + ….
This optimisation is a linear programming (LP) problem and can be easily solved for networks of any size. Indeed, it has the same computational complexity as running an FBA over the network, which is itself an LP. Most importantly, this mapping makes the most of the available data.
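As a worked instance, the LP for r4 alone can be written out in full. This is a sketch that assumes B and C are mentioned nowhere else in the network, so that B1 and C1 carry the whole of B and C:

```latex
\begin{aligned}
\text{maximise}\quad   & r_4 = Y_1 = X_1 + X_2 \\
\text{subject to}\quad & X_1 \le A_1,\quad X_1 \le B_1, \\
                       & X_2 \le A_2,\quad X_2 \le C_1, \\
                       & A_1 + A_2 = 4,\quad B_1 = 3,\quad C_1 = 2, \\
                       & X_1,\ X_2,\ A_1,\ A_2 \ge 0.
\end{aligned}
```

Since X1 + X2 ≤ A1 + A2 = 4, the objective cannot exceed 4, and A1 = A2 = 2 with X1 = X2 = 2 attains it. Applying the same construction to A and (B or C) gives X1 ≤ A1 = 4 and X1 ≤ Y1 = B1 + C1 = 5, so again 4: the two bracketings now receive the same value.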
- Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
- Becker SA, Palsson BØ (2008) “Context-specific metabolic networks are consistent with experiments” PLoS Comp Biol 4:e1000082.
- Herrgård MJ, Swainston N, Dobson P, Dunn WB, Arga KY, Arvas M, Blüthgen N, Borger S, Costenoble R, Heinemann M, Hucka M, Le Novère N, Li P, Liebermeister W, Mo ML, Oliveira AP, Petranovic D, Pettifer S, Simeonidis E, Smallbone K, Spasić I, Weichart D, Brent R, Broomhead DS, Westerhoff HV, Kirdar B, Penttilä M, Klipp E, Palsson BØ, Sauer U, Oliver SG, Mendes P, Nielsen J, Kell DB (2008) “A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology” Nat Biotechnol 26:1155-1160.