Math makes many people anxious, bored, and unhappy — but even for the math-averse, it has a morbid fascination. For the math-philic, it’s as ever-present and powerful as the Force. This conversation is the collision of one math lover, one math hater, and a biochemistry problem.
I’ve been frustrated by math for as long as I can remember. I switched to a Montessori school halfway through second grade and have really vivid memories of sitting in front of an abacus with zero idea about how to move forward. I remember my teacher waiting for me and then snapping, “Mallory! Come on, it’s not that hard. FOCUS!” I know she was probably exhausted from teaching fifty or so students but her impatience with me found its way inside and I decided, then and there, it was better to pretend to get through the lesson than really understand. I felt humiliated and absolutely certain the deficit was within me. She’d said it “wasn’t that hard,” and if I wasn’t getting it that could only mean one thing: I wasn’t good at math.
That can’t be a comfortable feeling when you’re working in tech, right?
Not at all. I’m confronted pretty regularly with my “bad at math” narrative. I’m surrounded by people who are adept at math, logic, etc — analytical thinking of any kind, really. I wanna be clear that math and logic aren’t the only ways to think analytically; and analytical thinking certainly isn’t the source of self-worth; but I would like the key to enjoying math. I’ve had moments of it but I feel left out and discouraged when I don’t understand it regularly. I feel like a lot of cool questions get answered with math and I don’t like missing out on that experience.
Math can be just as exhilarating as it can be frustrating. In a hunt for the exhilarating parts, let’s attack a problem that I faced on my very first day on the job as a programmer in a biochemistry lab; and even though it’s going to start out sounding like chemistry, its going to end up looking like math.
Ok. I think the only way that seems to work for me is the “Toddler Method.” Basically, I’m just gonna keep asking “why?” like a 3 year old until you say something that makes sense to me. I think most people feel too vulnerable to continuously admit they don’t understand something yet — I know it’s really difficult for me — so I’m sure things are gonna get awkward. You ready for that?
Ok, explain the problem.
Alright, it has to do with RNA folding.
OK, I have to ask you a question already: What exactly is RNA again? I don’t remember the specific differences between RNA and DNA.
RNA, like DNA, is one of the three main bio-molecules that make up all known forms of life. Every living creature or organism has RNA. Unlike DNA, which is one long twisty helix, RNA can fold back on itself and have complicated shapes.
Got it, got it. Thanks. Continue.
RNA is a long strand of repeating parts — like beads on a string. There are four kinds of beads — and we give them single-letter identifiers:
Each bead is called a “base”. The string is flexible; and you can bend it around so that two bases from different parts of the strand touch each other. If the circumstances are right, they can click together and form a “base pair”. Mostly,
A pairs with
G pairs with
G can pair with
Here’s a strand of RNA to play with. If you click on a G and then a C (for instance) they’ll form a pair. You can make as many pairs as you like, as long as it’s
GU, and there are at least three bases in-between (the strand can only bend so much). You can break a pair by double clicking on either of the bases in it:
RNA frequently forms many consecutive pairs. A group of consecutive pairs are called a helix (because the pairs twist gently in 3 dimensions when this happens; we won’t show that twisting, but it’s what makes it a helix). Here’s a helix that’s 13 pairs long:
It’s easy to find out the sequence of a bit of RNA. Sometimes we can find out some information about the final shape experimentally — but the longer the strand gets, the more staggeringly large the number of possible shapes. Given just the sequence it’s incredibly hard to figure out exactly which bases pair in the real world — so it’s very hard to guess the final shape.
Roger that. I’m following you so far. Can you explain more of the problem?
We’ve got a particular bit of RNA, that’s just over 1000 bases long. Here’s the sequence:
And we know it folds up into something like an icosahedron:
We’re pretty sure our sequence makes two helices on each edge. 30 edges means 60 helices; and we know that each helix needs to be around 9 pairs long.
Wait, how do we know that it’s shaped like an icosahedron? Is this just a given?
No, not at all. It took scientists some time to come to this conclusion (using a technique called “X-ray crystallography”) and we only know because they wrote papers about it.
Gotcha. I just wanted to make sure it wasn’t information I was assumed to have. I remember math, or math class at least, always being a bit sneaky like that. There always seemed to be a bit of secret knowledge I needed in order to complete the problem and it was usually a theorem from 4 years prior I hadn’t thought about since. Math class always assumed a fluency I didn’t really have.
I totally understand what you mean. In this case, the information was given to us by the research team so we had it outright. So we knew something about the shape overall.
What we didn’t know was which bases ended up where. We didn’t know how to fold the sequence — the string of 1000 bases — up into that final shape.
How do you even find it if you don’t know exactly what it looks like?
I don’t know, to be honest. That’s the sort of problem the wet-lab chemists solved before the problem ever arrived on my plate. We can look it up, if you want.
Do we need it to understand the problem you had to solve?
I don’t think so. What’s important is that we know something about the overall shape of this RNA, but we don’t know the details.
Ok, so we’re a little in the dark. Can you explain exactly what we’re trying to solve?
So the problem is this: can we find a way for the sequence to fold that includes 60 perfect helices of 9 pairs each?
It turns out to be pretty easy to make a list of ALL the possible perfect 9 base pair helices in this sequence. There are around 300 of them, if I remember correctly. But most of them conflict with one another — they use the same bases to make different helices. A given base can pair with only one other base at a time; so some helices are mutually exclusive. You can make one, or the other, but not both.
Can you be clearer about what helices are exactly?
When a bunch of pairs stack up like a ladder, one right after another, in 3D they twist a little bit — it makes a shape sort of like a spiral staircase, or a spiral pasta. That’s very structurally stable. If there’s a mismatch somewhere in the ladder of pairs, you get a bulge. When you look at folded RNA, a lot of what you talk about is the number, length, and composition of the helices. If two helices each want to make use of the same base, they’re mutually exclusive.
Ok, makes sense.
So in that list of 300 possible perfect helices, the question is: are there 60 that can exist at the same time? We need 60 helices to build that icosahedron shape. But can they all be perfect? The answer might be no — it might be that, in reality, some of the helices are too short, have a bulge, or are missing completely. But first we want to know: could it be done perfectly?
OK, I think I need something hands on. I’m having trouble picturing exactly what you mean.
Smart! OK, let’s play with string and sticky notes and build some physical toys in order to better understand the problem.
So that’s the question: How do we find a set of 60 helices that don’t conflict?
One way a programmer might approach the problem is to say “what’s the stupidest way to do this?”
Well, I guess you could just try 60 after 60 after 60…
Right! We could try every single combination of 60, and wait for one that worked. And here’s a moment that a little math can help — how many different combinations of 60 would we have to try?
Oh god. Hmmmmm. I guess I’m going to google words you’ve said?
That is a valid technique, and one that we’ll use again before we’re done.
So is… is “Combinatorial mathematics” what we want?
Bingo! That’s the field of mathematics, and the particular operator we’re after is called “choose” — the number of different combinations of 60 out of a choice of 300 is “300 choose 60”. There’s a big equation for it
But who cares? If you type “300 choose 60” into google and the answer pops right out. If you know the right language, you can often skip a lot of the symbol-manipulating algebra-and-arithmetic part.
So. What’s the answer? How many combinations of 60 are there from our 300 helices?
Google says: slightly more than 9,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
How big was your budget? Just kidding, that’s too many.
It’s way too many.
Math has already saved us some embarrassment — if we’d written code to solve the problem this way, the Earth would be swallowed by the Sun before we’d get an answer.
So what do we do?
We translate. Math is made up of a lot of languages; and if we pick the right language, we can turn our very specific problem into a totally abstract, general one — one that (hopefully) someone else has already solved.
I’m going to cheat and skip ahead a bit — because you don’t have lots of mathematical language already banging around in your head, I’m going to give you the language but have you do the translation.
In this particular case, the language we want is graph theory. Graphs are made of nodes that are connected by edges:
So lets make each helix a node; and we’ll connect helices that can coexist — so we’ll draw a line between any two helices if they don’t share any bases.
So the first question is: how can we tell, just by looking at a graph, if we can fold all the helices at the same time?
[NOTE: We went back and forth for a while, and drew some pictures on a whiteboard, and then Mallory hazarded a guess.]
Every… every helix — would be connected to every other helix?
Right! But use the graph theory words — say node, not helix!
Every node would be connected to every other node?
Yes. And there’s a phrase for that: it’s called a “completely connected” graph.
So what if we drew the whole set of 300 helices as a graph?
Well, it wouldn’t be “completely connected”. And there’d be too many helices — we only want 60.
Right. But we also have a word for parts of graphs. If you only look at some of the nodes in a big graph, that’s a “subgraph”.
And now you have all the words you need.
Remember, we’re no longer asking how to solve the question — just how to translate our original question into this new language. Don’t say helix — say node.
Ok, ok… What was our original question?
“Can we fold this sequence of RNA in such a way that there are 60 helices of 9 pairs each?”
We’ve already modified it some, to make it “Is there any group of 60 of these 300 helices that don’t overlap?”
So how do we take that question and say it using only graph theory words?
… Can we get 60… nodes that don’t overlap?
Right, but there’s no ‘overlapping’ here (in graph theory)…
Right. Can we get 60 nodes that are…connected?
Yep, they are connected. How are they connected?
[wipes the sweat from her brow] Completely connected?
Ok, we’re almost there; those 60 nodes are part of a larger graph of 300…
Right. Can we get a… subgraph of 60 nodes that are completely connected?
Yaaaaaaaaay. That brings me lots of joy. I like math.
[It’s worth listening to Mallory working this out. Hear the thrill of triumph:]
So let’s take one more step — there’s still this ‘60’ here, that’s very specific to our problem. Let’s just ask for the ‘largest completely connected subgraph’ — if it’s got more than 60 nodes, we win! And if it’s got less than 60 nodes, we know that 60 is impossible.
And that’s it. We now have a universal way of describing our problem: “what’s the largest completely connected subgraph?”. And if you google that…
With no specifics?
With NO specifics at all. That’s what this whole journey was about. Stripping away all the specifics so that we can ask abstractly about all problems that are shaped like this one.
And look at the second result — It tells us that “maximal clique” is a phrase that means “the largest completely-connected subgraph.” Our whole problem reduced to just two words! And look, down even further in that article is a link to actual pseudocode for solving the problem.
We went from no ideas to a complete solution — just by translating our question into the right language. And translating back is easy: remember, if the maximal clique has 60 or more nodes, that means there can be 60 perfect helices all folded on the same strand at the same time. And if it’s got less than 60 nodes, we know that 60 is impossible.
[MAL and SAM high five]
That was fun. It reminded me of the rush you get the first time you hear a foreign language and know what they’re saying.
[Note: it still took some considerable effort to actually get results for this problem; the algorithm we found is 30 orders of magnitude more efficient than the naive approach, but that’s still not quite fast enough to get answers in a reasonable period when the graph has 300+ nodes.]
To me, the real miracle is that Bron and Kerbosch, who created the algorithm we found, had no idea about RNA, or helices, or anything else about our problem. They were solving what looked like a completely different problem. But because they could use the language of mathematics, and we can use the language of mathematics, we were able to transform out problem into theirs.
It honestly gives me shivers.
So let me ask you a question: if you didn’t know to do this, to use the language of mathematics — would you have been able to solve the problem?
Oh, probably. Eventually, you could. But you’d have to be utterly brilliant, and it would take a lot of time, and you might never be certain it was correct. If you don’t have the ability to prove it afterwards, you just think you’re right.
And you knew all this language from classes you took?
Oh, some; but remember, I studied fine art, mostly. I got lucky — math happens to be a place where I’m not afraid to say “I don’t know” — and that makes it so much easier to learn. As a culture we’re pretty good at getting people scared of not knowing, and math is worse than many subjects because we’re taught that there’s one Right Answer and no chance to explore.
I asked questions, and poked around just long enough to understand the vocabulary. I wasn’t proving theorems, I wasn’t trying to be a great mathematician — I was just trying to understand the language so that I could ask great mathematicians the right questions.
It seems like math might be even more important at a consultancy.
Yeah, translating between problem domains is something we’re doing all the time — seeing the way one client’s problem is similar to another’s, recognizing when we can save a client’s time and money by applying existing solutions to “new” problems.
Translating your problem into mathematical language forces you to strip away everything but the very core; it often lets you express the problem in a single sentence, or — like in our case — two words.
It’s not always graph theory; there are lots of branches of math to sift through to on a problem by problem basis. But translation is always at the heart of it.
Ok, so now I’m mad.
Yeah, I’m really angry that math was never presented to me in this way. I wish my classes had been more interesting and that I hadn’t wasted so much time believing I was bad. I’m not bad at math; I’m just not practiced. I’d like to quit saying “math is hard” or “math is overvalued” (which, at times, it is) and just work at breaking down the stereotypes around it so I can take a full seat at the table.
If math and science graduates are so essential to the economy then we better step up our freaking game. Our STEM educations are so damn boring. They’re designed for those who are already inclined, not for those who are learning to be interested. Math is fascinating but no one SHOWED me that when I was in school; or at least not it a way that was very convincing.