k1lib.mo module¶

This module is quite dope. It essentially allows you to construct, explore and simulate molecules (hence the "mo") quite easily. I suggest opening the official docs in another tab for reference. Let's get started.

Making molecules¶

So the basis for everything is the Atom class. There are several builtin substances:

In [2]:

mo.substances()[:10]

Out[2]:

['H', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Na', 'Mg']

And there are some complicated substances too:

In [3]:

mo.substances()[-6:]

Out[3]:

['glucose', 'cyclohexane', 'benzene', 'adenine', 'ribose', 'adenosine']

You can create a new substance really easily:

In [4]:

c = mo.C; c

Out[4]:

<Atom C (6), 0 bonds, 4/8 valence electrons, 2 electron clouds, 0 free (radical) electrons>

That's it. Remember that each time you're accessing the substance, it'll create a new Atom entirely:

In [5]:

c == mo.C # different objects each time you request one!

Out[5]:

False

For complex substances, it will still return a single Atom, not some other weird data structures:

In [6]:

mo.benzene

Out[6]:

<Atom C (6), 4 bonds, 8/8 valence electrons, 0 electron clouds, 0 free (radical) electrons>

The highlighted circle is the current Atom we're showing the molecule from, and the arrows indicate the backbone's direction. To form a bond, you can do something like this:

Forming new bonds is sort of "safe". This means you don't have to pay too much attention to detail and it will still work. Let's bond a lone C to methane:

As you can see, the methane automatically disconnects 1 Hydrogen, to make place for our bond. This means you can create complex molecules effortlessly. Here's ethanol:

a.bond(b) is really the same as a(b), but a.bond(b) will return b, and a(b) will return a instead. There are several "convenience" methods to create molecules, like mo.alkane() and mo.alcohol():

In [11]:

ethanol.empirical(), mo.alcohol(2).empirical()

Out[11]:

('C2H6O', 'C2H6O')

For complex substances, you can choose not to display Hydrogens, to clear things up a bit:

You can also traverse the molecule quite easily:

So when you call .next(), it will return the next molecule, which is indicated by the arrows. Meaning, if you call .next() on the oxygen, it will return the alpha Carbon, instead of the Hydrogen:

In [14]:

oxygen = ethanol.next().next(); oxygen

Out[14]:

<Atom O (8), 2 bonds, 8/8 valence electrons, 2 electron clouds, 0 free (radical) electrons>

In [15]:

oxygen.next()

Out[15]:

<Atom C (6), 4 bonds, 8/8 valence electrons, 0 electron clouds, 0 free (radical) electrons>

If you really wish to get the Hydrogen, you can do something like this:

In [16]:

oxygen.next(1)

Out[16]:

<Atom H (1), 1 bonds, 2/2 valence electrons, 0 electron clouds, 0 free (radical) electrons>

Let's create tert-butanol:

This all looks fine, however, notice the center carbon points to the CH2OH group instead. This might not be desirable, as you may want to navigate (using .next()) through the main propane chain. So you can do something slightly different:

Now, all the arrows are pointing correctly, so you can think of this molecule as "propane, with methanol attached at 2nd carbon", instead of "propanol, with methane attached at 2nd carbon".

There's also this really dope way to get molecules, and that is by just parsing it:

Experts among you might notice that "perfluoro-2,3-dimethyl-1-chloropropanol" doesn't exactly comply with IUPAC rules, so the good news is that the parser is quite lenient about this.

My parser usually can handle lots of substances, but not all of them. If there's a list of molecules that you'd wish to "just work", you can send me an email at 157239q@gmail.com. But tbh, shouldn't be that hard to construct any molecule that you want.

Simulation¶

Now that you know how to construct molecules, let's talk about how you can simulate and view them. You need to first construct a System:

In [20]:

s = mo.parse("cyclohexane").system(); s

Out[20]:

<k1lib.mo.System at 0x7ff4250b8130>

If you were to plot it right away, it looks terrible (graph in picometers btw):

In [21]:

#%matplotlib widget # uncomment this in a notebook cell, if you want to pan around and look at the molecule in 3d
s.plot();

This is because the atom's position are randomly initialized. So you need to do a short simulation first:

It looks wonderful now. .simulate() method returns a list of locations:

In [24]:

type(xs), type(xs[0]), xs[0].shape

Out[24]:

(list, torch.Tensor, torch.Size([18, 3]))

This is quite useful if you want to see an animation of it:

Notice how you can see cyclohexane in the chair configuration (70% of the time) or the boat configuration (30% of the time). Everytime I run this it's gonna be different, but one of those 2. This sort of indicates that the simulator at least got some stuff right, and that you can rely on it to explore small molecules.

Simulation speed analysis¶

How big of a molecule can the simulator handle? Let's try adenosine, a relatively complex molecule:

In [26]:

%%time
mo.adenosine.system().simulate(1000); mo.adenosine.empirical()

CPU times: user 394 ms, sys: 80 µs, total: 395 ms
Wall time: 394 ms

Out[26]:

'C10H13O5N5'

So, 33-atom molecule, 1000 timesteps should take 250ms. Cool thing is, because the simulator is based on PyTorch, so if you have a GPU, then it can use that too:

In [28]:

%%time
mo.adenosine.system().simulate(1000, cuda=True); pass

CPU times: user 475 ms, sys: 159 µs, total: 475 ms
Wall time: 474 ms

Relatively same speeds. For a small molecule like adenosine, the performance gain isn't worth the overhead of using the GPU. Let's try attaching a bunch of adenosines together:

In [29]:

def ad(): return mo.adenosine.next() # do this cause main atom is the ribose's oxygen, not one of the carbons
c1, c2, c3, c4 = mo.alkane(4).nexts(4)
c1(ad())(ad())(ad())
c2(ad())(ad())
c3(ad())(ad())
c4(ad())(ad())(ad())
c1.empirical()

Out[29]:

'C104H120O50N50'

324-atom molecule, pretty big. This is about 15 amino acid residues btw.

In [30]:

%%time
c1.system().simulate(1000); pass

CPU times: user 42.6 s, sys: 347 ms, total: 43 s
Wall time: 5.55 s

In [31]:

%%time
c1.system().simulate(1000, cuda=True); pass

CPU times: user 3.71 s, sys: 47.7 ms, total: 3.76 s
Wall time: 966 ms

So we increased our atom count by 10x, CPU times also increase 10x, but amazingly, GPU times don't increase at all. So a general rule of thumb is, use CPU if your molecule have less than 30-40 atoms, and use GPU otherwise. However realistically, the simulator is there for small molecules only. It's quite worthless for proteins.