k1lib.cli module¶

The main idea of this package is to emulate the terminal (hence “cli”, or “command line interface”), but entirely inside Python. So this bash statement:

cat file.txt | head -5 > headerFile.txt

Turns into this statement:

cat("file.txt") | head(5) > file("headerFile.txt")

You can even integrate with existing shell commands:

ls("~") | cmd("grep so")

Here, “ls” lists the files inside the home directory and pipes them into regular grep on Linux, whose output is then piped back into Python as a list of strings. So it’s equivalent to this bash statement:

ls ~ | grep so
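To make the round trip through an external command concrete, here is a minimal standard-library sketch. It is not how cmd is implemented; `pipe_through` and `FAKE_GREP` are hypothetical names, and a child Python process stands in for grep so the sketch runs on any platform.

```python
import subprocess
import sys

def pipe_through(lines, argv):
    """Send lines to an external command's stdin and return its stdout as a list of strings."""
    result = subprocess.run(
        argv,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=False,  # grep-like tools exit non-zero when nothing matches
    )
    return result.stdout.splitlines()

# Stand-in for `grep so`: a child Python process that keeps lines containing "so".
FAKE_GREP = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'so' in line:\n"
    "        sys.stdout.write(line)\n",
]

names = ["notes.txt", "song.mp3", "resources", "data.csv"]
matches = pipe_through(names, FAKE_GREP)
print(matches)
```

On a Linux machine, `pipe_through(names, ["grep", "so"])` should behave the same way.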

“cat”, “head”, “file”, “ls” and “cmd” are all classes extended from BaseCli. All of them implement the “reverse or” operation, or __ror__. So essentially, these 2 statements are equivalent:

3 | obj
obj.__ror__(3)
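The mechanism can be sketched in a few lines of plain Python. The classes below (`MiniCli`, `Chain`, `Head`, `ToList`) are hypothetical stand-ins written for illustration, not k1lib's BaseCli or its tools; they only show how `__ror__` lets an ordinary iterable sit on the left of `|`, and how `__or__` can combine two tools into one.

```python
from itertools import islice

class MiniCli:
    """Hypothetical stand-in for a cli base class (not k1lib's BaseCli)."""
    def __ror__(self, it):
        # Python calls this for `it | self` when `it` does not handle `|` itself.
        raise NotImplementedError
    def __or__(self, other):
        # `toolA | toolB` builds a pipeline that runs toolA, then toolB.
        return Chain(self, other)

class Chain(MiniCli):
    def __init__(self, first, second):
        self.first, self.second = first, second
    def __ror__(self, it):
        return self.second.__ror__(self.first.__ror__(it))

class Head(MiniCli):
    """Keeps only the first n elements, lazily."""
    def __init__(self, n):
        self.n = n
    def __ror__(self, it):
        return islice(it, self.n)

class ToList(MiniCli):
    """Materializes the incoming iterator into a list."""
    def __ror__(self, it):
        return list(it)

# range has no `|` of its own, so Python falls back to Head.__ror__
first_three = range(10) | Head(3) | ToList()

# two tools piped together form a reusable pipeline
pipeline = Head(2) | ToList()
first_two = "abc" | pipeline
```

Here `first_three` is `[0, 1, 2]` and `first_two` is `['a', 'b']`.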

Also, a lot of these tools (like apply and filt) assume that we are operating on a table. So this table:

col1	col2	col3
1	2	3
4	5	6

Is equivalent to this list:

[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
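As a rough illustration of what "operating on a table" means, the sketch below performs a column-wise apply, a row filter and a transpose on that same list using only built-in Python. It shows the concept, not the implementation or exact semantics of apply, filt or transpose.

```python
table = [["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
header, *rows = table

# Apply a function to one column (index 1), leaving the other columns alone.
doubled = [[v * 2 if i == 1 else v for i, v in enumerate(row)] for row in rows]

# Keep only the rows whose first column is greater than 1.
kept = [row for row in rows if row[0] > 1]

# Swap rows and columns.
transposed = [list(col) for col in zip(*table)]
```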

transpose and mtmS provide more flexible ways to transform a table structure (but usually involve more code).

Also, the expected way to use these tools is to import everything directly into the current environment, like this:

from k1lib.imports import *

If you just want clis without other baggage, you can do this:

from k1lib.cli import *

Because there are a lot of clis, you may sometimes unintentionally overwrite an exposed cli tool. No worries, every tool is also available under the cli object, so you can use either deref() or cli.deref().

Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check over this:

All cli tools should work fine with torch.Tensor, numpy.ndarray and pandas.core.series.Series, but k1lib actually modifies NumPy arrays and pandas series deep down for this to work. You can still do a normal bitwise or with a numpy float value, and everything works fine in all regression tests that I have, but you might encounter strange bugs. You can disable the patch manually by changing settings.startup.or_patch. If you choose to do this, you have to be careful and use these workarounds:

# returns (2, 3, 5), works fine
torch.randn(2, 3, 5) | shape()
# will not work, returns weird numpy array of shape (2, 3, 5)
np.random.randn(2, 3, 5) | shape()
# returns (2, 3, 5), mitigation strategy #1
shape()(np.random.randn(2, 3, 5))
# returns (2, 3, 5), mitigation strategy #2
[np.random.randn(2, 3, 5)] | (item() | shape())
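The reason the unpatched case misbehaves can be shown without NumPy: when the left operand defines its own `|`, Python never reaches the tool's `__ror__`. The sketch below uses hypothetical stand-ins (`Tool`, `Chain`, `Size`, `First`, and `GreedyArray` as a toy array type that applies `|` element by element); it mirrors the two mitigation strategies above but is not k1lib or NumPy code.

```python
class Tool:
    """Hypothetical cli-like base: `x | tool` and `tool(x)` both run the tool."""
    def run(self, x):
        raise NotImplementedError
    def __ror__(self, x):
        return self.run(x)
    def __call__(self, x):
        return self.run(x)
    def __or__(self, other):
        return Chain(self, other)

class Chain(Tool):
    """Runs tool a, then tool b."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def run(self, x):
        return self.b.run(self.a.run(x))

class Size(Tool):
    def run(self, x):
        return len(x)

class First(Tool):
    def run(self, x):
        return next(iter(x))

class GreedyArray:
    """Toy array type whose own `|` is applied to every element."""
    def __init__(self, values):
        self.values = list(values)
    def __len__(self):
        return len(self.values)
    def __or__(self, other):
        # Handles `|` itself, so Python never tries other.__ror__(self).
        return GreedyArray(v | other for v in self.values)

arr = GreedyArray(["ab", "cde"])

hijacked = arr | Size()               # element-wise result, not the size of arr
direct = Size()(arr)                  # strategy 1: call the tool directly
wrapped = [arr] | (First() | Size())  # strategy 2: the array is never the left operand of `|`
```

In this sketch `hijacked` is another `GreedyArray` holding `[2, 3]`, while `direct` and `wrapped` are both `2`.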

All cli-related settings are at settings.cli.

Where to start?¶

Core clis include:

apply, aS, op, grep
filt, head, rows, cut
deref, item, shape
transpose, joinStreams, batched, count
cat(), ls(), file, stdout

Then other important, not necessarily core clis include:

applyMp, sort, randomize
wrapList, ignore, cmd
repeat and friends, groupBy

So start by reading over what these do, as those alone cover roughly 95% of what the cli workflow has to offer. Then skim over the basic conversions in module conv. While you’re doing that, check out trace(), a quite powerful debugging tool.

There are several written tutorials about cli here, and I also made some video tutorials, so go check those out.

For every example you find in the tutorials, it may help to follow these debugging steps to see how everything works:

# assume there's this piece of code:
A | B | C | D
# do this instead:
A | deref()
# once you understand it, do this:
A | B | deref()

# assume there's this piece of code:
A | B.all() | C
# do this instead:
A | item() | B | deref()
# once you understand it, you can move on:
A | B.all() | deref()

# assume there's this piece of code:
A | (B & C)
# do this instead:
A | B | deref()

# assume there's this piece of code:
A | (B + C)
# do these instead:
A | deref() | op()[0] | B | deref()
A | deref() | op()[1] | C | deref()
# there are alternatives to that:
A | item() | B | deref()
A | rows(1) | item() | C | deref()
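The idea behind these steps is that intermediate stages are usually lazy, so you have to materialize them before you can look at them. Here is a minimal plain-Python sketch of that workflow, assuming nothing about k1lib internals: `materialize`, `make_A` and `B` are hypothetical names, with `materialize` playing a role loosely similar to deref().

```python
from collections.abc import Iterable

def materialize(x):
    """Recursively turn iterables into lists, leaving strings, bytes and scalars as they are."""
    if isinstance(x, (str, bytes)) or not isinstance(x, Iterable):
        return x
    return [materialize(item) for item in x]

def make_A():
    """A lazy stream of lazy streams: range(1), range(2), range(3)."""
    return (range(i) for i in range(1, 4))

def B(stream):
    """One stage that works on a single stream."""
    return (v * 10 for v in stream)

# Step 1: look at the raw input.
raw = materialize(make_A())

# Step 2: try the stage on a single element first.
one_element = materialize(B(next(iter(make_A()))))

# Step 3: once that makes sense, apply it to every element.
all_elements = materialize(B(s) for s in make_A())
```

A fresh stream is created for each step because generators can only be consumed once.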

Finally, you can read over the summary below, see what catches your eye and check that cli out.

Summary¶

structural	conv	utils	typehint	filt
`transpose`	`toStr`	`size`	`tBase`	`filt`
`reshape`	`toTensor`	`shape`	`tAny`	`inSet()`
`insert`	`toList`	`item`	`tList`	`contains()`
`splitW`	`toSet`	`iden`	`tIter`	`empty`
`joinStreams`	`toIter`	`join`	`tSet`	`isNumeric()`
`joinStreamsRandom`	`toRange`	`wrapList`	`tCollection`	`instanceOf()`
`activeSamples`	`toSum`	`equals`	`tExpand`	`head`
`table()`	`toProd`	`reverse`	`tNpArray`	`tail()`
`batched`	`toAvg`	`ignore`	`tTensor`	`columns`
`window`	`toMean`	`rateLimit`	`tListIterSet()`	`cut`
`groupBy`	`toMax`	`timeLimit`	`tListSet()`	`rows`
`insertColumn`	`toMin`	`tab()`	`tListIter()`	`intersection`
`insertIdColumn()`	`toPIL`	`indent()`	`tArrayTypes()`	`union`
`expandE`	`toImg`	`clipboard`	`inferType()`	`unique`
`unsqueeze()`	`toRgb`	`deref`	`TypeHintException`	`breakIf`
`count`	`toRgba`	`bindec`	`tLowest()`	`mask`
`permute`	`toBin`	`smooth`	`tCheck`
`accumulate`	`toIdx`	`disassemble()`	`tOpt`
`AA_`	`toDict`	`tree()`
`peek`	`toDictF`	`lookup`
`peekF`	`toFloat`	`dictFields`
`repeat`	`toInt`
`repeatF()`
`repeatFrom`
`oneHot`

modifier	init	output	inp	kxml
`applyS`	`BaseCli`	`stdout`	`cat()`	`node`
`aS`	`Table`	`tee`	`curl()`	`maxDepth`
`apply`	`T()`	`file`	`wget()`	`tags`
`applyMp`	`fastF()`	`pretty`	`ls()`	`pretty`
`parallel`	`yieldT()`	`display()`	`cmd`	`display`
`applyTh`	`serial`	`headOut()`	`requireCli()`
`applySerial`	`oneToMany`	`intercept`
`sort`	`mtmS`	`split`
`sortF`
`consume`
`randomize`
`stagger`
`op`
`integrate`

nb	grep	kcsv	trace	optimizations
`cells()`	`grep`	`cat()`	`trace`	`dummy()`
`pretty`	`grepTemplate`
`execute`

Biology-related clis¶

I separated these out because they might not be interesting to the majority of users.

bio	sam	gb	cif	mgi
`go()`	`cat()`	`feats`	`tables()`	`batch`
`quality()`	`header`	`origin`
`longFa()`	`flag`
`idx`
`transcribe`
`complement`
`translate`
`medAa`
`longAa`

bio module¶

This is for functions that are actually biology-related.

k1lib.cli.bio.go(term: int)[source]¶

Looks up a GO term.

k1lib.cli.bio.quality(log=True)[source]¶

Get numeric quality of sequence. Example:

# returns [2, 2, 5, 30]
"##&?" | quality() | deref()

Parameters: log – whether to use log scale (0 -> 40), or linear scale (1 -> 0.0001)
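The example output above is consistent with the common Phred+33 encoding, where each character's ASCII code minus 33 is the quality score Q, and the linear scale is the error probability 10^(-Q/10). The sketch below assumes that encoding; `phred_quality` and its `offset` parameter are hypothetical and not part of k1lib.

```python
def phred_quality(seq, log=True, offset=33):
    """Decode a quality string, assuming Phred+33 encoding (an assumption, not k1lib code)."""
    scores = [ord(ch) - offset for ch in seq]
    if log:
        return scores                          # log scale, typically 0 to 40
    return [10 ** (-q / 10) for q in scores]   # linear scale: error probability

log_scores = phred_quality("##&?")
print(log_scores)  # [2, 2, 5, 30]

# '!' encodes Q0 and 'I' encodes Q40, the two ends of the usual range
linear_scores = phred_quality("!I", log=False)
```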

k1lib.cli.bio.longFa()[source]¶

Takes in a fasta file and puts each sequence on 1 line. File “gene.fa”:

>AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATC
TTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
>something other gene
CGGACACACAAAAAGAAAGAAGA
TTTTGTGTGCGAATAACTATGAG

Code:

cat("gene.fa") | bio.longFa() | cli.headOut()

Prints out:

>AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
>something other gene
CGGACACACAAAAAGAAAGAAGATTTTGTGTGCGAATAACTATGAG
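For readers who want to see the transformation spelled out, here is a plain-Python sketch that produces the same output for the file above. `join_fasta_lines` is a hypothetical helper written for illustration, not the longFa implementation.

```python
def join_fasta_lines(lines):
    """Yield each header line, followed by its sequence joined onto a single line."""
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if chunks:
                # a new record starts, so flush the previous sequence
                yield "".join(chunks)
                chunks = []
            yield line
        elif line:
            chunks.append(line)
    if chunks:
        # flush the last sequence in the file
        yield "".join(chunks)

gene_fa = [
    ">AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome",
    "CGGACACACAAAAAGAAAGAAGAATTTTTAGGATC",
    "TTTTGTGTGCGAATAACTATGAGGAAGATTAATAA",
    ">something other gene",
    "CGGACACACAAAAAGAAAGAAGA",
    "TTTTGTGTGCGAATAACTATGAG",
]

flattened = list(join_fasta_lines(gene_fa))
for line in flattened:
    print(line)
```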

class k1lib.cli.bio.idx(fs: list = [])[source]¶
