trace cli

This is a demo of a single cli tool, trace. It lets you inspect the stream at every point in a (likely tangled) chain of cli operations. Let's get into it.

In [1]:
from k1lib.imports import *

Let's start simple. Say you have an operation that calculates the squares of a list of numbers, like this:

In [2]:
range(1, 5) | apply(op()**2) | deref()
Out[2]:
[1, 4, 9, 16]

For more complex operations, it's easy to lose track of what's what, and what the stream should look like at a particular point. So, you can just slide a trace() into the pipeline:

In [3]:
range(1, 5) | trace() | apply(lambda x: x**2) | deref()
[trace graph] <start> --(4,)--> apply --(4,)--> deref --(4,)--> <end>
Out[3]:
<trace object>

With no arguments, trace() displays the shape of the stream at each point. You can, of course, pass in any other display function you like. I typically just use iden():

In [4]:
range(1, 5) | trace(iden()) | apply(lambda x: x**2)
[trace graph] <start> --[1, 2, 3, 4]--> apply --[1, 4, 9, 16]--> <end>
Out[4]:
<trace object>
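
To make the role of the display function concrete, here is a minimal plain-Python sketch of the idea. It is not k1lib's implementation: the names `trace_stages`, `shape_of` and `identity` are hypothetical, and the sketch assumes a tracer simply materializes the stream around each stage and hands every snapshot to the display function.

```python
# Hypothetical sketch: NOT k1lib's implementation, only an illustration of
# "record the stream around each stage using a display function".

def shape_of(value):
    """Rough shape of a flat list, e.g. [1, 2, 3, 4] -> (4,)."""
    return (len(value),)

def identity(value):
    """Show the value itself, similar in spirit to passing iden()."""
    return value

def trace_stages(data, stages, display=shape_of):
    """Run each stage in turn, recording display(snapshot) around every stage."""
    snapshot = list(data)                 # materialize the input stream
    records = [display(snapshot)]
    for stage in stages:
        snapshot = list(stage(snapshot))  # materialize this stage's output
        records.append(display(snapshot))
    return records

square = lambda xs: (x**2 for x in xs)

shapes = trace_stages(range(1, 5), [square])            # [(4,), (4,)]
values = trace_stages(range(1, 5), [square], identity)  # [[1, 2, 3, 4], [1, 4, 9, 16]]
print(shapes)
print(values)
```

Swapping the display function changes only what is written on each edge of the graph, not how the pipeline runs.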

Notice how I left out the last deref(). It's only there to display the final value to us, and we already get that data from trace(), so we don't need it anymore. Let's transform the stream one step further:

In [5]:
range(1, 5) | trace(iden()) | apply(lambda x: x**2) | apply(lambda x: x - 6)
[trace graph] <start> --[1, 2, 3, 4]--> apply --[1, 4, 9, 16]--> apply --[-5, -2, 3, 10]--> <end>
Out[5]:
<trace object>

Sweet. Let's combine the operations first, before feeding them input:

In [6]:
range(1, 5) | trace(iden()) | (apply(lambda x: x**2) | apply(lambda x: x - 6)) | apply(lambda x: x*3)
[trace graph, with a box labeled "|, serial"] <start> --[1, 2, 3, 4]--> apply --[1, 4, 9, 16]--> apply --[-5, -2, 3, 10]--> apply --[-15, -6, 9, 30]--> <end>
Out[6]:
<trace object>

trace() notices this and automatically boxes the combined operations in. You can also put trace() wherever you like:

In [7]:
range(1, 5) | (apply(lambda x: x**2) | apply(lambda x: x - 6)) | trace(iden()) | apply(lambda x: x*3)
[trace graph] <start> --[-5, -2, 3, 10]--> apply --[-15, -6, 9, 30]--> <end>
Out[7]:
<trace object>

You can go wild with this. This is an example with .all(), |, and op():

In [8]:
range(1,5) | trace(iden()) | (op()*2 | repeat(3)).all()
[trace graph, with boxes labeled ".all(), manyToMany, apply" and "|, serial"] <start> --[1, 2, 3, 4]--> * --1--> op --2--> repeat --[2, 2, 2]--> * --[[2, 2, 2], [4, 4, 4], [6, 6, 6], [8, 8, 8]]--> <end>
Out[8]:
<trace object>

You can limit the depth of the graph, like this:

In [9]:
range(1,5) | trace(iden(), 1) | (op()*2 | repeat(3)).all()
[trace graph, with a box labeled ".all(), manyToMany, apply"] <start> --[1, 2, 3, 4]--> * --1--> serial --[2, 2, 2]--> * --[[2, 2, 2], [4, 4, 4], [6, 6, 6], [8, 8, 8]]--> <end>
Out[9]:
<trace object>
In [10]:
range(1,5) | trace(iden(), 0) | (op()*2 | repeat(3)).all()
[trace graph] <start> --[1, 2, 3, 4]--> manyToMany --[[2, 2, 2], [4, 4, 4], [6, 6, 6], [8, 8, 8]]--> <end>
Out[10]:
<trace object>
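
The effect of the depth argument in the three graphs above can be mimicked with a small plain-Python sketch. This is only an illustration, not how k1lib does it: the tree encoding and the `visible_nodes` helper are hypothetical, and the sketch assumes that an operation at the depth limit is drawn as a single node instead of being expanded.

```python
# Hypothetical sketch: NOT k1lib's implementation. Each operation is a
# (name, children) pair; leaf operations have no children.

def visible_nodes(op, depth=None):
    """Names of the nodes drawn when nesting is expanded at most `depth` levels.

    depth=None means "expand everything".
    """
    name, children = op
    if not children or depth == 0:
        return [name]                    # draw this operation as one node
    next_depth = None if depth is None else depth - 1
    nodes = []
    for child in children:
        nodes.extend(visible_nodes(child, next_depth))
    return nodes

# (op()*2 | repeat(3)).all(), written as a nested tree
pipeline = ("manyToMany", [("serial", [("op", []), ("repeat", [])])])

full = visible_nodes(pipeline)       # ['op', 'repeat']
depth1 = visible_nodes(pipeline, 1)  # ['serial']
depth0 = visible_nodes(pipeline, 0)  # ['manyToMany']
print(full, depth1, depth0)
```

The three results line up with the innermost nodes shown in the full, depth 1 and depth 0 graphs.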

If the graphs are too small/big, you can adjust the settings, like this:

In [11]:
settings.svgScale = 0.7 # multiple of graphviz's default size; defaults to 0.7

Let's see some more complex examples:

In [12]:
range(1,5) | trace(iden()) | (shape() & (op()*2 | op()**2).all())
[trace graph, with boxes labeled "&, oneToMany", ".all(), manyToMany, apply" and "|, serial"]
  <start> --[1, 2, 3, 4]--> * (fork)
  branch 1: * (fork) --[1, 2, 3, 4]--> size --[4]--> * (join)
  branch 2: * (fork) --[1, 2, 3, 4]--> * --1--> op --2--> op --4--> * --[4, 16, 36, 64]--> * (join)
  * (join) --[[4], [4, 16, 36, 64]]--> <end>
Out[12]:
<trace object>
In [13]:
[range(1, 5), range(3, 7)] | trace(iden()) | (filt(op() % 2 == 0) + filt(op() % 2 == 1))
[trace graph, with a box labeled "+, manyToManySpecific" and two boxes labeled "filt (column: None)"]
  <start> --[[1, 2, 3, 4], [3, 4, 5, 6]]--> * (fork)
  branch 1: * (fork) --[1, 2, 3, 4]--> filt --[2, 4]--> * (join), with filt --1--> op --False--> filt
  branch 2: * (fork) --[3, 4, 5, 6]--> filt --[3, 5]--> * (join), with filt --3--> op --True--> filt
  * (join) --[[2, 4], [3, 5]]--> <end>
Out[13]:
<trace object>
In [14]:
[range(1, 5), range(3, 7)] | trace(iden()) | ((shape() & (op()**2).all())).all() | deref()
[trace graph, with boxes labeled ".all(), manyToMany, apply", "&, oneToMany" and ".all(), manyToMany, apply"]
  <start> --[[1, 2, 3, 4], [3, 4, 5, 6]]--> * --[1, 2, 3, 4]--> * (fork)
  branch 1: * (fork) --[1, 2, 3, 4]--> size --[4]--> * (join)
  branch 2: * (fork) --[1, 2, 3, 4]--> * --1--> op --1--> * --[1, 4, 9, 16]--> * (join)
  * (join) --[[4], [1, 4, 9, 16]]--> * --[[[4], [1, 4, 9, 16]], [[4], [9, 16, 25, 36]]]--> deref --[[[4], [1, 4, 9, 16]], [[4], [9, 16, 25, 36]]]--> <end>
Out[14]:
<trace object>
In [15]:
a = torch.randn(2, 3, 4) | deref(igT=False)
a | trace() | transpose() | unsqueeze(2) | deref() | toTensor() | op().squeeze()
[trace graph, with two boxes labeled ".all(), manyToMany, apply"] <start> --(2, 3, 4)--> transpose --(3, 2, 4)--> * --(2, 4)--> * --(4,)--> wrapList --(1, 4)--> * --(2, 1, 4)--> * --(3, 2, 1, 4)--> deref --(3, 2, 1, 4)--> toTensor --(3, 2, 1, 4)--> op --(3, 2, 4)--> <end>
Out[15]:
<trace object>

As you can see, this can be extremely useful for debugging.

Infinity and nested traces

The way trace() works is by deref-ing every part of the stream. This means it can't really handle infinite streams. There is a workaround for this though. Say we have this infinite stream:

In [16]:
range(1, 5) | repeatFrom() | apply(op()**2) | head(10) | deref()
Out[16]:
[1, 4, 9, 16, 1, 4, 9, 16, 1, 4]

So, we can set what infinity means inside settings.cli, which will be picked up by clis that can potentially produce infinite streams and limit them:

In [17]:
with settings.cli.context(inf=21):
    range(1, 5) | trace() | repeatFrom() | apply(lambda x: x**2) | head(10) | deref()

Because a trace object only displays its graph when __repr__ is called on it, the code above doesn't display anything: it sits inside a with block, so its value is never shown as the cell's output. To get the graph, do this:

In [18]:
trace.last
[trace graph] <start> --(4,)--> repeatFrom --(84,)--> apply --(84,)--> head --(10,)--> deref --(10,)--> <end>
Out[18]:
<trace object>

This is sort of messy, I get it, but there doesn't seem to be a way to robustly track infinite streams.
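
To see why a cap is needed when every stage gets materialized, here is a plain-Python sketch using only itertools. It does not use k1lib: `repeat_from` is a hypothetical stand-in for repeatFrom(), and I'm assuming that inf=21 means "at most 21 passes over the input", which is consistent with the (84,) shape above (21 passes over 4 elements).

```python
from itertools import chain, islice, repeat

def repeat_from(items, limit=None):
    """Cycle through `items`; endless unless `limit` caps the number of passes."""
    items = list(items)
    passes = repeat(items) if limit is None else repeat(items, limit)
    return chain.from_iterable(passes)

INF = 21  # assumed meaning of inf=21: at most 21 passes over the input

capped = list(repeat_from(range(1, 5), limit=INF))  # finite, safe to materialize
squares = [x**2 for x in capped]
first_ten = squares[:10]

print(len(capped))  # 84, i.e. 21 passes * 4 elements
print(first_ten)    # [1, 4, 9, 16, 1, 4, 9, 16, 1, 4]

# Without a cap, only a lazy consumer such as islice can handle the stream;
# calling list() directly on repeat_from(range(1, 5)) would never finish.
lazy_ten = list(islice((x**2 for x in repeat_from(range(1, 5))), 10))
```

The capped version does more work than the lazy one, which is the price of being able to inspect every intermediate stage.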

You can also put trace() inside of a relatively complex block. Let's grab an example from before:

In [19]:
range(1,5) | trace(None) | (shape() & (op()*2 | op()**2).all()) | deref()
[trace graph, with boxes labeled "&, oneToMany", ".all(), manyToMany, apply" and "|, serial"]
  <start> --[1, 2, 3, 4]--> * (fork)
  branch 1: * (fork) --[1, 2, 3, 4]--> size --[4]--> * (join)
  branch 2: * (fork) --[1, 2, 3, 4]--> * --1--> op --2--> op --4--> * --[4, 16, 36, 64]--> * (join)
  * (join) --[[4], [4, 16, 36, 64]]--> deref --[[4], [4, 16, 36, 64]]--> <end>
Out[19]:
<trace object>

Now let's try to intercept in the middle:

In [20]:
range(1,5) | (shape() & (trace(None) | op()*2 | op()**2).all()) | deref()
[trace graph] <start> --1--> op --2--> op --4--> <end>
Out[20]:
[[4], [<trace object>, 16, 36, 64]]

It works! If it errors out for some reason, you can always check trace.last to see the best trace attempt. This is rare, and I haven't been able to make it error out, but logic says it can in weird circumstances. Also, notice how, unlike trace() at the top level, a nested trace() is part of the derefed object. This means that without actually forcing execution to happen, you won't get any trace:

In [21]:
range(1,5) | (shape() & (trace(None) | op()*2 | op()**2).all())
Out[21]:
<generator object oneToMany.__ror__ at 0x7f0007f4fd60>
In [22]:
trace.last
[trace graph] <start> --None--> <end>
Out[22]:
<trace object>

Therefore, my recommendation is to always end with deref(), ignore its output (because it can be long), then look at the last trace:

In [23]:
range(1,5) | (shape() & (trace(None) | op()*2 | op()**2).all()) | deref(); trace.last
[trace graph] <start> --1--> op --2--> op --4--> <end>
Out[23]:
<trace object>
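
The need to force execution is ordinary lazy-generator behavior, which this plain-Python sketch shows. It is not k1lib code: `Recorder` and `recording_stage` are hypothetical names, with `Recorder.last` only loosely analogous to trace.last.

```python
# Hypothetical sketch: NOT k1lib's implementation. A generator body only
# runs when something consumes it, so any recording inside it is deferred.

class Recorder:
    last = None  # most recent record, loosely analogous to trace.last

def recording_stage(xs):
    """Square each element and remember what passed through."""
    seen = []
    for x in xs:
        seen.append(x)
        yield x**2
    Recorder.last = seen  # reached only once the generator is exhausted

lazy = recording_stage(range(1, 5))
before = Recorder.last    # None: nothing has run yet

result = list(lazy)       # forcing execution, like a final deref()
after = Recorder.last     # [1, 2, 3, 4]

print(before, result, after)
```

Until `list(lazy)` runs, nothing has been recorded, just as trace.last shows an empty graph before deref() is applied.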

Also, very surprisingly, this works too, even though by my reasoning it shouldn't:

In [24]:
[torch.randn(2, 3), torch.randn(4, 5)] | ((trace(None) | op().shape) + iden()) | deref()
[trace graph] <start> --tensor([[ 1.6021,  0.1544, -1.0622], [ 2.2491, -0.0042, -0.1873]])--> op --[2, 3]--> <end>
Out[24]:
[<trace object>,
 tensor([[-0.4524, -0.7593,  0.6081,  0.3310,  0.6469],
         [ 1.0787,  0.2763, -0.3129, -0.9712,  0.4768],
         [ 1.0288, -0.7199, -0.2076,  0.3306, -1.5635],
         [-1.1242,  1.6919,  0.7447, -0.1250, -0.1216]])]

No idea how that works, but nice.

Gotchas

There aren't any other gotchas, as far as I'm aware. Sometimes a cli will be replaced by a slightly different version that is guaranteed to return the same result, to make the tracing code simpler. For example, applyMp may change into apply (this will not work if trace() is nested inside applyMp though, and there be dragons if you try). But you need not worry about this too much.
