A Baltic Circuit

Last week I took the train and ferry to Sønderborg, in Denmark. Whilst there, U+003F junior went to his first gig; it was Efterklang’s last.

I like trains. I like the Baltics. I like ferries and Efterklang too, but let’s not complicate matters. You can [almost] circumnavigate the Baltic Sea by train. Here’s how:

  • Day 1: starting at my little brother’s place in Göteborg, take the sleeper at 18:25, arriving into Luleå in the north of Sweden at 11:30 the next day. Cost: 873kr.
  • Day 2: take the bus [boo!] across the border into Finland, leaving at 12:45 and arriving to Tornio at 15:15. Cost: free! Then a sleeper to Helsinki, leaving at 20:31 and arriving at 10:00. Cost: €105.
  • Day 3: travel to Moscow on the Tolstoi, leaving at 17:23, arriving into Moscow Leningradski at 08:24. This is €158, but you’ll also need to get a Russian visa well in advance (£116). Spend the night in Moscow.
  • Day 5: take the Trans-European Express to Berlin (18000₽), leaving Moscow Byelorruski at 07:44, arriving 06:53. This one runs through Belarus, so you’ll need another visa (£50).
  • Day 6: The Berlin Night Express is a unique train that travels aboard a ferry to cross the Baltic. 799kr. Leaves Berlin at 22:30, arrives into Malmö at 12:55. It’s a short hop back to Göteborg, leaving at 13:08, arriving 16:20 on day 7. 343kr.

Total time: just under six days. Total cost: just over £900, based on two sharing etc etc, plus a Moscow hotel. To get back where you started. Brilliant.


The Baltic Sea

From genes to fluxes #2

Last time I talked about the issue with the method used by Daaaaave [1] to map from enzyme-level data to reaction-level data. Given enzyme (or gene) levels A = 4, B = 3 and C = 2 units, we find:

reaction GPR Daaaaave Gimme
4 (A and B) or (A and C) min(4,3) + min(4,2) = 5 max(min(4,3), min(4,2)) = 3
5 A and (B or C) min(4,3 + 2) = 4 min(4, max(3,2)) = 3
.

The problem with applying the min/plus rule to GPRs is that reactions 4 and 5 are the same (albeit differently bracketed), but Daaaaave assigns them different values. As Nikos pointed out, the min/max rule used by Gimme [2] doesn’t make this mistake. However, I think we really should be adding the activities of alternative catalysts; indeed some networks — such as “Yeast 1” [3] — use separate reactions in place of “or” statements. Any mapping must be robust to equivalent representations.

Let’s step back a bit. Reaction 4 is catalysed by alternative complexes, A:B and A:C.

r4 → (A and B) or (A and C)

There is less of A (4) than the total amount of B (3) and C (2), so there must be some B or C “wasted” when forming the two complexes. There are an infinite number of arrangements here — we could have A:B/A:C = 3/1, 2/2, 2½/1½, … — but their maximum total activity is 4 units. This value of 4 is overestimated by Daaaaave, but underestimated by Gimme.

We can frame our verbal reasoning above mathematically. Each GPR mention of an enzyme across the network is really a separate entity

r4 → (A1 and B1) or (A2 and C1)

that together make up the total enzyme level

A1 + A2 + … = A = 4.

We can substitute “and” relationships by introducing new variables Xi ≥ 0 that represent complexes

r4 → X1 or X2

whose activities can be no more than any of their parts

X1 ≤ A1, X1 ≤ B1.

We can also substitute “or” relationships by introducing new variables Yi that represent alternative catalysts

r4 = Y1

whose activities are the sum of their parts

Y1 = X1 + X2.

Finally, we want there to be as little wastage as possible, and one way to achieve this is through maximising the total activity

maximise: r1 + r2 + ….

This optimisation is an LP problem and can be easily solved for networks of any size. Indeed, running an FBA over the network would be of the same computational complexity. Most importantly, this mapping makes the most of the available data.

References

  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
    doi:10.1186/1752-0509-6-73
  2. Becker SA, Palsson BØ (2008) “Context-specific metabolic networks are consistent with experiments” PLoS Comp Biol 4:e1000082.
    doi:10.1371/journal.pcbi.1000082
  3. Herrgård MJ, Swainston N, Dobson P, Dunn WB, Arga KY, Arvas M, Blüthgen N, Borger S, Costenoble R, Heinemann M, Hucka M, Le Novère N, Li P, Liebermeister W, Mo ML, Oliveira AP, Petranovic D, Pettifer S, Simeonidis E, Smallbone K, Spasić I, Weichart D, Brent R, Broomhead DS, Westerhoff HV, Kirdar B, Penttilä M, Klipp E, Palsson BØ, Sauer U, Oliver SG, Mendes P, Nielsen J, Kell DB (2008) “A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology” Nat Biotechnol 26:1155-1160.
    doi:10.1038/nbt1492

From genes to fluxes

In 2012, we set out a pipeline [1] for using predicting flux through a metabolic network, using genetic or proteomic data.

Two inputs are required:

  • a genome-scale metabolic network containing gene-protein-reaction (GPR) associations, such as human or yeast
  • quantitative estimates of their enzyme levels, via proteomics or absolute transcriptomic data

To predict metabolic flux, two steps are taken:

  • enzyme-level data are combined with GPR associations, to create reaction-level data
  • reaction-level data are constrained by mass-balancing, to create system-level flux data

A number of methods have been proposed to perform this second mapping, maximising the correlation between reaction data and system flux, including Daaaaave [1, developed in Manchester], Gimme [2] and iMat [3]. I won’t discuss their relative merits here, but will instead focus on the first step.

The mapping isn’t quite as easy as it seems, due to the many-to-many relationship between enzymes and metabolic reactions. Suppose we have determined enzyme levels of A = 4, B = 3 and C = 2. How should we interpret their various combinations as GPRs?

reaction GPR level
1 A 4
2 A or B 4 + 3 = 7
3 A and B min(4,3) = 3
4 (A and B) or (A and C) min(4,3) + min(4,2) = 5
5 A and (B or C) min(4,3 + 2) = 4
.

Given the simple mapping in reaction 1, we may use the enzyme level directly. The “or” relationship in reaction 2 allows for alternative catalysts, so its total level is given by the sum of its components. The “and” relationship in reaction 3 means it is catalysed by a complex, whose maximum possible concentration is given by the minimum level of its components. These two min/plus rules may be combined for more complex GPRs such as in reactions 4 and 5.

Here’s the rub: the Boolean logic used in 4 and 5 are the same (just bracketed differently), yet the levels output by the mapping are different. The existing approach is badly-defined and we need to go back to the drawing-board. One option — proposed by Brandon Barker — is to insist that GPRs are always written in a consistent manner. For example, we could use the disjunctive normal form (DNF), meaning that GPRs are expressed as an “or of ands”: a list of alternative complexes, as per reaction 4. I think that having consistency like this is an excellent idea. But I also think that our mapping should be robust to alternative bracketing; I’ll show you how, next time.

References

  1. Lee D, Smallbone K, Dunn WB, Murabito E, Winder CL, Kell DB, Mendes P, Swainston N (2012) “Improving metabolic flux predictions using absolute gene expression data” BMC Systems Biology 6:73.
    doi:10.1186/1752-0509-6-73
  2. Becker SA, Palsson BØ (2008) “Context-specific metabolic networks are consistent with experiments” PLoS Comp Biol 4:e1000082.
    doi:10.1371/journal.pcbi.1000082
  3. Shlomi T, Cabili MN, Herrgård MJ, Palsson BØ, Ruppin E (2008) “Network-based prediction of human tissue-specific metabolism” Nature Biotechnology 26:1003–1010.
    doi:10.1038/nbt.1487

Striking a balance with Recon 2.1

In 2013, after years of work, a comprehensive reconstruction of human metabolism, called “Recon 2“, was released to much fanfare [1].

But, as with any many-cooked project, some errors slipped through. The biggest issue I encountered involves unbalanced reactions and generic metabolites. Here are three reactions from the model:

  1. generic fatty acid R-group transfer from CoA (C21) to DHAP (C3)

This reaction provides a template to be followed by all fatty acid side groups. However, to function as a model, there need to be connections between these generic molecules and their specific counterparts:

  1. specific to generic reaction, converting C14–CoA to R–CoA
  2. generic-to-specific reaction, converting R–CoA to C16–CoA

Taken together, reactions 2 and 3 are elementally-unbalanced, providing a mechanism for generating two carbons for free, and meaning any analyses with Recon 2 examining carbon flux  will be unreliable.

We can, however, modify the network to ensure that all reactions are elementally-balanced. Generic molecules must be replaced by their specific counterparts, creating a new reaction for each possible R-group, and removing any specific-to-generic interconversions:

Given n possible R-groups, there are n3 possible triglycerides. Replacing all generic metabolites as above thus leads to a much larger network, “Recon 2.1”.

Recon 2 Recon 2.1
reactions 7440 71159
variables 5463 13940
.

The number of unique metabolites still falls well short of the forty thousand defined in HMDB, but is a step in the right direction.

The new model is available in SBML format from BioModels.net (MODEL1311110001). A more detailed description (and further tinkerings) can be found on arXiv [2].

References

  1. Thiele I, Swainston N, Fleming RM, Hoppe A, Sahoo S, Aurich MK, Haraldsdottir H, Mo ML, Rolfsson O, Stobbe MD, Thorleifsson SG, Agren R, Bölling C, Bordel S, Chavali AK, Dobson P, Dunn WB, Endler L, Hala D, Hucka M, Hull D, Jameson D, Jamshidi N, Jonsson JJ, Juty N, Keating S, Nookaew I, Le Novère N, Malys N, Mazein A, Papin JA, Price ND, Selkov E Sr, Sigurdsson MI, Simeonidis E, Sonnenschein N, Smallbone K, Sorokin A, van Beek JH, Weichart D, Goryanin I, Nielsen J, Westerhoff HV, Kell DB, Mendes P, Every Man, His Dog, Palsson BØ (2013) “A community-driven global reconstruction of human metabolism” Nature Biotechnology 31:419-425.
    doi:10.1038/nbt.2488
  2. Smallbone K (2013) “Striking a balance with Recon 2.1” arXiv:1311.5696

Surreal goings-on at MMU

Surreal numbers are a construction developed by British mathematician John Conway (who also brought us the Game of Life). They are defined recursively: given two surreal numbers xL less than xR, we can define a new number X = (xL|xR) that bisects the two. Starting with the empty set and using this rule, we can generate all the real numbers, and more.

But, of course, those of you from Manchester knew all this already, as their definition may (surreally) be found on Chester Road, at the entrance to the MMU car park.


X = (x_L | x_R) {\rm ~iff~} \nexists x_R \leq x_L


X \geq Y {\rm ~iff~} \nexists x_R \leq Y \& X \leq y_L
X = Y {\rm ~iff~} X \leq Y \& Y \leq X


X+Y=(x_L+Y,X+y_L | x_R+Y,X+y_R)


\emptyset = {\rm void}
(\emptyset,\emptyset) = 0
(0, \emptyset) = 1
(\emptyset, 0) = -1


(0,1) = \frac{1}{2} \rightarrow \infty
{\rm JHWH}

This final panel is my favourite. It seems to say that starting from nothing (∅) we can produce everything (∞),  thus proving the existence a higher power (JHWH, the Tetragrammaton). In fact it is being deliberately ambiguous; “JHWH” is also how American computer scientist Donald Knuth referred to Conway, writing in 1979:

In the beginning, everything was void, and J.H.W.H.Conway began to create numbers. Conway said, “Let there be two rules which bring forth all numbers large and small. This shall be the first rule: Every number corresponds to two sets of previously created numbers, such that no member of the left set is greater than or equal to any member of the right set. And the second rule shall be this: One number is less than or equal to another number if and only if no member of the first number’s left set is greater than or equal to the second number, and no member of the second number’s right set is less than or equal to the first number.” And Conway examined these two rules he had made, and behold! they were very good.

The mystery remains as to how this mathematical memorial found its way to such an unexpected location. Answers on a postcard.

Thanks to my collaborator and lunch-mate Jon Borresen for pointing out the gates.

In the beginning there was #postaday

The days of heading to your University library to pore over the latest issue of Acta Arithmetica are a thing of the past. It’s a shame, but Generation Z want their research delivered electronically, and they want it delivered immediately. They don’t have the patience for polishing, for peer review, or for paper. For the academic, blogging has a role to play in this new world order, finding a niche in the immediacy:detail spectrum somewhere between twitter (one sentence critiques) and arXiv (unreviewed reports).

I first entered the blogoshpere in 2010. Inspired by gepasi‘s photo-a-day project 365, I embarked on a blogging #postaday challenge. Of course I failed. To write a decent blog post every day, you’d have to be highly imaginative, highly driven, and highly unemployed. I remain none of the above. But I’m back, armed with four years’ material and an 86% less demanding task. Wish me luck!

#postaweek

Enthought and SBML

{First posted 3 January 2012 at u003f.com, saved from oblivion by the internet archive}

A long time ago, I resolved to replace Matlab with python as my programming language of choice. I was recommended the Enthought python distribution (EPD), which has various packages installed around pylab and is free to academics.

I had some issues getting EPD to talk to SBML, until a young man with a weird surname helped me out. The three steps to overcoming your ophidiophobia are:

1. Link EPD’s unusual directory to the normal place …
sudo ln -s /Library/Frameworks/EPD64.framework /Library/Frameworks/Python.framework/

2. … configure libSBML 5.3.0 there …
./configure --with-python=/Library/Frameworks/Python.framework/Versions/Current

3. … and change ~/.bash_profile to
export PATH="/Library/Frameworks/Python.framework/Versions/Current/bin:${PATH}"
export PYTHONPATH=/usr/local/lib/python2.7/site-packages

A python

Wikipubdate

{First posted 13 June 2011 at u003f.com, saved from oblivion by the internet archive}

Some time ago I promised to rewrite the Wikipedia Systems Biology article and I thought I should update you as to how far I’ve got.

Nowhere.

But I’m pleased to announce that other projects are going very well indeed. You see the small, insignificant green plus sign at the top right of the History of Tranmere Rovers F.C. page? I did that. It means Good Article. It’s like getting a B at GCSE Wikipedia authoring. Go me!

The Computational biology wikiproject has really kicked into gear too; a “bunch of geeks” from Hinxton are driving activity, with the backing of the ISCB. Come and get involved!

[24 of 222]

MCA revisited [2]

{First posted 8 June 2011 at u003f.com, saved from oblivion by the internet archive}

As a base example, we’ll use the model presented in [1], a 1986 paper describing METAMOD – modelling software for your BBC micro. How I’d love to have a play with METAMOD now.

The model is called SEQFB, a branched pathway with sequential feedbacks. The program is used to find the model’s steady state for metabolite concentrations

Met Conc Copasi
A 2.2099E1 22.0992
B 4.6395E0 4.63950
C 3.3577E-1 0.335772
D 4.3999E-1 0.439992
E 2.7777E-1 0.277773
F 2.6500E-1 0.264999

and reaction rates.

Enz Rate Copasi
1 1.4261E0 1.42613
2 1.4261E0 1.42613
3 6.9765E-1 0.69753
4 6.9765E-1 0.69753
5 6.9765E-1 0.69753
6 7.2847E-1 0.728472
7 7.2847E-1 0.728472
8 7.2847E-1 0.728472

It’s good to see that loading the model [2] into fancy new software [3] gives identical results. Who needs technology eh? OK, so maybe it runs slightly faster today – half a blink of an eye, rather than just under two minutes.

More tomorrow.

[21 of 222]

References

  1. Hofmeyr JH, & van der Merwe KJ (1986). METAMOD: software for steady-state modelling and control analysis of metabolic pathways on the BBC microcomputer. Computer applications in the biosciences: CABIOS, 2 (4), 243-9 PMID: 3450367
  2. Model v.1 SBML
  3. Copasi http://www.copasi.org/

MCA revisited [1]

{First posted 7 June 2011 at u003f.com, saved from oblivion by the internet archive}

Quite some time ago I promised to spend some time going over Metabolic Control Analysis (MCA). It’s one of the main reasons I decided to do this silly #postaday business, but have been rather hesitant as I just don’t know if what I hope to do is possible.

He who dares, Rodders.

The landmark paper “The control of flux” was written in 1973 by Henrik Kacser and Jim Burns [1], describing how the rates of metabolic pathways were affected by changes in the amounts or activities of pathway enzymes. Their work was given a sound mathematical basis by Christine Reder in 1988 [2].

Given there are excellent webpages devoted to the topic [3,4] along with countless review articles, you might reasonably ask why on earth I’m rerevisiting old ground.

In the paper, Reder sets out criteria under which her analysis holds. Which means of course that there are situations where it doesn’t hold. These are generally ignored – probably because the paper is so “mathsy” – with software tools typically blindly applying the algorithm. I shall show, by example, where the theory breaks down, and (this is where I’m on shaky ground) how it can be tweaked to encompass these cases.

[20 of 222]

References

  1. Kacser H, & Burns JA (1973). The control of flux. Symposia of the Society for Experimental Biology, 27, 65-104 PMID: 4148886
  2. Reder C (1988). Metabolic control theory: a structural approach. Journal of theoretical biology, 135 (2), 175-201 PMID: 3267767
  3. Athel Cornish-Bowden. Metabolic Control Analysis
  4. Pedro Mendes. MCA web