You Git

In 2007, I published my second paper [2]. It was a cellular automaton model of carcinogenesis and, in my opinion, my second best paper *. In 2014, Alexandre Sarmento (from Brazil) asked me for the code. Of course I’d lost it. But, amazingly, I’d emailed it to Ornella Cominetti (Chile) back in 2008, and she still had our correspondence.

I considered emailing all my code to Ornella, but instead futureproofed my work by setting up a GitHub account. There you can reproduce models such as this

showing the expansion and somatic evolution of neoplastic epithelial tissue.

* My best paper is, in my opinion, my first [1]. According to Google Scholar, I’ve now written 52 papers. I’ll leave the reader to perform their own extrapolation.


  1. Smallbone K, Gavaghan DJ, Gatenby RA, Maini PK (2005) “The role of acidity in solid tumour growth and invasion” J Theor Biol 235:476-484.
  2. Smallbone K, Gatenby RA, Gillies RJ, Maini PK, Gavaghan DJ (2007) “Metabolic changes during carcinogenesis: potential impact on invasiveness” J Theor Biol 244:703-713.


I’ve written previously about the Daaaaave algorithm for using ‘omics data to predict fluxes through a metabolic network. We originally implemented the algorithm in MatLab, and released the code as supplementary material to the paper [1]. But the journal (BMC Systems Biology) messed up: changing the file names, and breaking the code.

We’ve tried to get them to fix this. Unsuccessfully.

So instead, I’ve put the code on sourceforge github [2]. And since MatLab sits uncomfortably with my cardigan-wearing, open source approach to science, I’ve reimplemented the algorithm in python there, too.


  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.

A python


Dietary fibres are fermented by bacteria in the colon to form short-chain fatty acids: acetate (C2), propionate (C3) and butyrate (C4). Fatty acids are taken up by colonocytes, and processed in the mitochondria using β-oxidation. This metabolism of — otherwise indigestible — carbohydrates via gut flora provides a major source of energy in humans.

Those of you interested in SCFA metabolism would do well to follow the work of Barbara Bakker [1-3]. In a recent paper with Karen van Eunen and friends, Babs created a mathematical model of β-ox in the rat liver. The model considers competition between acyl-CoAs with differing chain lengths for the same enzymes, which can lead to pathway saturation. I’ve created SBML versions of their model: have a play.

overview of β-ox [1]


  1. van Eunen K, Simons SM, Gerding A, Bleeker A, den Besten G, Touw CM, Houten SM, Groen BK, Krab K, Reijngoud DJ, Bakker BM (2013) “Biochemical competition makes fatty-acid β-oxidation vulnerable to substrate overload” PLoS Computational Biology 9:e1003186.
  2. den Besten G, van Eunen K, Groen AK, Venema K, Reijngoud DJ, Bakker BM (2013) “The role of short-chain fatty acids in the interplay between diet, gut microbiota, and host energy metabolism” Journal of Lipid Research 54:2325-2340.
  3. den Besten G, Lange K, Havinga R, van Dijk TH, Gerding A, van Eunen K, Müller M, Groen AK, Hooiveld GJ, Bakker BM, Reijngoud DJ (2013) “Gut-derived short-chain fatty acids are vividly assimilated into host carbohydrates and lipids” American Journal of Physiology – Gastrointestinal and Liver Physiology 305:G900-910.

A Baltic Circuit

Last week I took the train and ferry to Sønderborg, in Denmark. Whilst there, U+003F junior went to his first gig; it was Efterklang’s last.

I like trains. I like the Baltics. I like ferries and Efterklang too, but let’s not complicate matters. You can [almost] circumnavigate the Baltic Sea by train. Here’s how:

  • Day 1: starting at my little brother’s place in Göteborg, take the sleeper at 18:25, arriving into Luleå in the north of Sweden at 11:30 the next day. Cost: 873kr.
  • Day 2: take the bus [boo!] across the border into Finland, leaving at 12:45 and arriving to Tornio at 15:15. Cost: free! Then a sleeper to Helsinki, leaving at 20:31 and arriving at 10:00. Cost: €105.
  • Day 3: travel to Moscow on the Tolstoi, leaving at 17:23, arriving into Moscow Leningradski at 08:24. This is €158, but you’ll also need to get a Russian visa well in advance (£116). Spend the night in Moscow.
  • Day 5: take the Trans-European Express to Berlin (18000₽), leaving Moscow Byelorruski at 07:44, arriving 06:53. This one runs through Belarus, so you’ll need another visa (£50).
  • Day 6: The Berlin Night Express is a unique train that travels aboard a ferry to cross the Baltic. 799kr. Leaves Berlin at 22:30, arrives into Malmö at 12:55. It’s a short hop back to Göteborg, leaving at 13:08, arriving 16:20 on day 7. 343kr.

Total time: just under six days. Total cost: just over £900, based on two sharing etc etc, plus a Moscow hotel. To get back where you started. Brilliant.

The Baltic Sea

From genes to fluxes #2

Last time I talked about the issue with the method used by Daaaaave [1] to map from enzyme-level data to reaction-level data. Given enzyme (or gene) levels A = 4, B = 3 and C = 2 units, we find:

reaction GPR Daaaaave Gimme
4 (A and B) or (A and C) min(4,3) + min(4,2) = 5 max(min(4,3), min(4,2)) = 3
5 A and (B or C) min(4,3 + 2) = 4 min(4, max(3,2)) = 3

The problem with applying the min/plus rule to GPRs is that reactions 4 and 5 are the same (albeit differently bracketed), but Daaaaave assigns them different values. As Nikos pointed out, the min/max rule used by Gimme [2] doesn’t make this mistake. However, I think we really should be adding the activities of alternative catalysts; indeed some networks — such as “Yeast 1” [3] — use separate reactions in place of “or” statements. Any mapping must be robust to equivalent representations.

Let’s step back a bit. Reaction 4 is catalysed by alternative complexes, A:B and A:C.

r4 → (A and B) or (A and C)

There is less of A (4) than the total amount of B (3) and C (2), so there must be some B or C “wasted” when forming the two complexes. There are an infinite number of arrangements here — we could have A:B/A:C = 3/1, 2/2, 2½/1½, … — but their maximum total activity is 4 units. This value of 4 is overestimated by Daaaaave, but underestimated by Gimme.

We can frame our verbal reasoning above mathematically. Each GPR mention of an enzyme across the network is really a separate entity

r4 → (A1 and B1) or (A2 and C1)

that together make up the total enzyme level

A1 + A2 + … = A = 4.

We can substitute “and” relationships by introducing new variables Xi ≥ 0 that represent complexes

r4 → X1 or X2

whose activities can be no more than any of their parts

X1 ≤ A1, X1 ≤ B1.

We can also substitute “or” relationships by introducing new variables Yi that represent alternative catalysts

r4 = Y1

whose activities are the sum of their parts

Y1 = X1 + X2.

Finally, we want there to be as little wastage as possible, and one way to achieve this is through maximising the total activity

maximise: r1 + r2 + ….

This optimisation is an LP problem and can be easily solved for networks of any size. Indeed, running an FBA over the network would be of the same computational complexity. Most importantly, this mapping makes the most of the available data.


  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
  2. Becker SA, Palsson BØ (2008) “Context-specific metabolic networks are consistent with experiments” PLoS Comp Biol 4:e1000082.
  3. Herrgård MJ, Swainston N, Dobson P, Dunn WB, Arga KY, Arvas M, Blüthgen N, Borger S, Costenoble R, Heinemann M, Hucka M, Le Novère N, Li P, Liebermeister W, Mo ML, Oliveira AP, Petranovic D, Pettifer S, Simeonidis E, Smallbone K, Spasić I, Weichart D, Brent R, Broomhead DS, Westerhoff HV, Kirdar B, Penttilä M, Klipp E, Palsson BØ, Sauer U, Oliver SG, Mendes P, Nielsen J, Kell DB (2008) “A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology” Nat Biotechnol 26:1155-1160.

From genes to fluxes

In 2012, we set out a pipeline [1] for using predicting flux through a metabolic network, using genetic or proteomic data.

Two inputs are required:

  • a genome-scale metabolic network containing gene-protein-reaction (GPR) associations, such as human or yeast
  • quantitative estimates of their enzyme levels, via proteomics or absolute transcriptomic data

To predict metabolic flux, two steps are taken:

  • enzyme-level data are combined with GPR associations, to create reaction-level data
  • reaction-level data are constrained by mass-balancing, to create system-level flux data

A number of methods have been proposed to perform this second mapping, maximising the correlation between reaction data and system flux, including Daaaaave [1, developed in Manchester], Gimme [2] and iMat [3]. I won’t discuss their relative merits here, but will instead focus on the first step.

The mapping isn’t quite as easy as it seems, due to the many-to-many relationship between enzymes and metabolic reactions. Suppose we have determined enzyme levels of A = 4, B = 3 and C = 2. How should we interpret their various combinations as GPRs?

reaction GPR level
1 A 4
2 A or B 4 + 3 = 7
3 A and B min(4,3) = 3
4 (A and B) or (A and C) min(4,3) + min(4,2) = 5
5 A and (B or C) min(4,3 + 2) = 4

Given the simple mapping in reaction 1, we may use the enzyme level directly. The “or” relationship in reaction 2 allows for alternative catalysts, so its total level is given by the sum of its components. The “and” relationship in reaction 3 means it is catalysed by a complex, whose maximum possible concentration is given by the minimum level of its components. These two min/plus rules may be combined for more complex GPRs such as in reactions 4 and 5.

Here’s the rub: the Boolean logic used in 4 and 5 are the same (just bracketed differently), yet the levels output by the mapping are different. The existing approach is badly-defined and we need to go back to the drawing-board. One option — proposed by Brandon Barker — is to insist that GPRs are always written in a consistent manner. For example, we could use the disjunctive normal form (DNF), meaning that GPRs are expressed as an “or of ands”: a list of alternative complexes, as per reaction 4. I think that having consistency like this is an excellent idea. But I also think that our mapping should be robust to alternative bracketing; I’ll show you how, next time.


  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
  2. Becker SA, Palsson BØ (2008) “Context-specific metabolic networks are consistent with experiments” PLoS Comp Biol 4:e1000082.
  3. Shlomi T, Cabili MN, Herrgård MJ, Palsson BØ, Ruppin E (2008) “Network-based prediction of human tissue-specific metabolism” Nature Biotechnology 26:1003–1010.

Striking a balance with Recon 2.1

In 2013, after years of work, a comprehensive reconstruction of human metabolism, called “Recon 2“, was released to much fanfare [1].

But, as with any many-cooked project, some errors slipped through. The biggest issue I encountered involves unbalanced reactions and generic metabolites. Here are three reactions from the model:

  1. generic fatty acid R-group transfer from CoA (C21) to DHAP (C3)

This reaction provides a template to be followed by all fatty acid side groups. However, to function as a model, there need to be connections between these generic molecules and their specific counterparts:

  1. specific to generic reaction, converting C14–CoA to R–CoA
  2. generic-to-specific reaction, converting R–CoA to C16–CoA

Taken together, reactions 2 and 3 are elementally-unbalanced, providing a mechanism for generating two carbons for free, and meaning any analyses with Recon 2 examining carbon flux  will be unreliable.

We can, however, modify the network to ensure that all reactions are elementally-balanced. Generic molecules must be replaced by their specific counterparts, creating a new reaction for each possible R-group, and removing any specific-to-generic interconversions:

Given n possible R-groups, there are n3 possible triglycerides. Replacing all generic metabolites as above thus leads to a much larger network, “Recon 2.1”.

Recon 2 Recon 2.1
reactions 7440 71159
variables 5463 13940

The number of unique metabolites still falls well short of the forty thousand defined in HMDB, but is a step in the right direction.

The new model is available in SBML format from (MODEL1311110001). A more detailed description (and further tinkerings) can be found on arXiv [2].


  1. Thiele I, Swainston N, Fleming RM, Hoppe A, Sahoo S, Aurich MK, Haraldsdottir H, Mo ML, Rolfsson O, Stobbe MD, Thorleifsson SG, Agren R, Bölling C, Bordel S, Chavali AK, Dobson P, Dunn WB, Endler L, Hala D, Hucka M, Hull D, Jameson D, Jamshidi N, Jonsson JJ, Juty N, Keating S, Nookaew I, Le Novère N, Malys N, Mazein A, Papin JA, Price ND, Selkov E Sr, Sigurdsson MI, Simeonidis E, Sonnenschein N, Smallbone K, Sorokin A, van Beek JH, Weichart D, Goryanin I, Nielsen J, Westerhoff HV, Kell DB, Mendes P, Every Man, His Dog, Palsson BØ (2013) “A community-driven global reconstruction of human metabolism” Nature Biotechnology 31:419-425.
  2. Smallbone K (2013) “Striking a balance with Recon 2.1” arXiv:1311.5696