k1lib.cli module¶
The main idea of this package is to emulate the terminal (hence “cli”, or “command line interface”), but doing all of that inside Python itself. So this bash statement:
cat file.txt | head -5 > headerFile.txt
Turns into this statement:
cat("file.txt") | head(5) > file("headerFile.txt")
You can even integrate with existing shell commands:
ls("~") | cmd("grep so")
Here, “ls” lists out files inside the home directory, then pipes them into the regular grep on Linux, whose output is then piped back into Python as a list of strings. So it’s equivalent to this bash statement:
ls | grep so
“cat”, “head”, “file”, “ls” and “cmd” are all classes extended from
BaseCli
. All of them implement the “reverse or” operation, or
__ror__. So essentially, these 2 statements are equivalent:
3 | obj
obj.__ror__(3)
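To illustrate, here's a minimal sketch (plain Python, not actual k1lib code) of a class that implements __ror__ so it can sit on the right-hand side of a pipe:
# any object implementing __ror__ can be placed on the right side of "|"
class double:
    def __ror__(self, it):
        return [x * 2 for x in it]
# returns [2, 4, 6], because Python falls back to double.__ror__([1, 2, 3])
[1, 2, 3] | double()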
Also, a lot of these tools (like apply
and filt
)
assume that we are operating on a table. So this table:
col1 | col2 | col3
---- | ---- | ----
1    | 2    | 3
4    | 5    | 6
Is equivalent to this list:
[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
transpose
and mtmS
provide more flexible ways
to transform a table structure (but usually involve more code).
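For instance, piping the table above through transpose (documented later on this page) swaps rows and columns:
# returns [['col1', 1, 4], ['col2', 2, 5], ['col3', 3, 6]]
[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]] | transpose() | deref()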
Also, the expected way to use these tools is to import everything directly into the current environment, like this:
from k1lib.imports import *
Because there are a lot of clis, you may sometimes unintentionally overwrite an
exposed cli tool. No worries, every tool is also under the cli
object, meaning
you can use deref()
or cli.deref()
.
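For example, a small sketch, assuming you've accidentally shadowed deref with a variable of your own:
from k1lib.imports import *
deref = "oops, shadowed the cli tool"
# still works, returns [0, 1, 2]
range(3) | cli.deref()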
Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check out the tutorials mentioned below.
All cli tools should work fine with torch.Tensor
, numpy.ndarray
and pandas.core.series.Series
,
but k1lib actually patches Numpy arrays and Pandas series deep down for this to work.
This means that although you can still do a normal bitwise or with a numpy float value,
and it works fine in all regression tests that I have, you might still encounter strange bugs.
You can disable the patch manually by changing settings
.startup.or_patch. If you
choose to do this, you have to be careful and use these workarounds:
# returns (2, 3, 5), works fine
torch.randn(2, 3, 5) | shape()
# will not work, returns weird numpy array of shape (2, 3, 5)
np.random.randn(2, 3, 5) | shape()
# returns (2, 3, 5), mitigation strategy #1
shape()(np.random.randn(2, 3, 5))
# returns (2, 3, 5), mitigation strategy #2
[np.random.randn(2, 3, 5)] | (item() | shape())
All cli-related settings are at settings
.cli.
Where to start?¶
Core clis include apply
, applyS
(its
multiprocessing cousin applyMp
is great too), op
,
filt
, deref
, item
, shape
,
iden
, cmd
, so start reading there first. Then, skim over
everything to know what you can do with this collection of tools. While you’re doing
that, check out trace()
, a quite powerful debugging tool.
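As a quick taste, here's a small pipeline combining a few of the core clis mentioned above (behavior follows the examples documented later on this page):
# keeps even numbers, squares them, then materializes the result
# returns [0, 4, 16, 36, 64]
range(10) | filt(op() % 2 == 0) | apply(op() ** 2) | deref()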
There are several written tutorials about cli here, and I also made some video tutorials as well, so go check those out.
bio module¶
This is for functions that are actually biology-related
-
k1lib.cli.bio.
quality
(log=True)[source]¶ Get numeric quality of sequence. Example:
# returns [2, 2, 5, 30]
"##&?" | quality() | deref()
- Parameters
log – whether to use log scale (0 -> 40), or linear scale (1 -> 0.0001)
-
k1lib.cli.bio.
longFa
()[source]¶ Takes in a fasta file and put each sequence on 1 line. File “gene.fa”:
>AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATC
TTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
>something other gene
CGGACACACAAAAAGAAAGAAGA
TTTTGTGTGCGAATAACTATGAG
Code:
cat("gene.fa") | bio.longFa() | cli.headOut()
Prints out:
>AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
>something other gene
CGGACACACAAAAAGAAAGAAGATTTTGTGTGCGAATAACTATGAG
-
class
k1lib.cli.bio.
idx
(fs: list = [])[source]¶ Bases:
k1lib.cli.init.BaseCli
Indexes files with various formats.
-
static
blast
(fileName: Optional[str] = None, dbtype: Optional[str] = None)[source]¶ Uses
makeblastdb
to create a blast database from a fasta file. Example:
"file.fa" | bio.idx.blast()
bio.idx.blast("file.fa")
-
-
class
k1lib.cli.bio.
transcribe
(fs: list = [])[source]¶ Bases:
k1lib.cli.init.BaseCli
Transcribes (DNA -> RNA) incoming rows. Example:
# returns "AUCG" "ATCG" | transcribe() # returns ["AUCG"] ["ATCG"] | transcribe() | deref()
-
class
k1lib.cli.bio.
complement
(fs: list = [])[source]¶ Bases:
k1lib.cli.init.BaseCli
Get the reverse complement of DNA. Example:
# returns "TAGC" "ATCG" | bio.complement() # returns ["TAGC"] ["ATCG"] | bio.complement() | deref()
-
class
k1lib.cli.bio.
translate
(length: int = 0)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.bio.
medAa
(fs: list = [])[source]¶ Bases:
k1lib.cli.init.BaseCli
Converts a short amino acid sequence to a medium one
conv module¶
-
class
k1lib.cli.conv.
toStr
(column: Optional[int] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toNumpy
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toTensor
(dtype=torch.float32)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(dtype=torch.float32)[source]¶ Converts generator to
torch.Tensor
. Essentially torch.tensor(list(it)). Also checks if the input is a PIL Image; if so, turns it into a
torch.Tensor
and returns that.
-
__ror__
(it: Iterator[float]) → torch.Tensor[source]¶
-
-
class
k1lib.cli.conv.
toList
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toSet
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toIter
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toRange
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toSum
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Calculates the sum of list of numbers. Can pipe in
torch.Tensor
or numpy.ndarray
. Example:
# returns 45
range(10) | toSum()
-
-
class
k1lib.cli.conv.
toProd
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Calculates the product of a list of numbers. Can pipe in
torch.Tensor
or numpy.ndarray
. Example:
# returns 362880
range(1,10) | toProd()
-
-
class
k1lib.cli.conv.
toAvg
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Calculates average of list of numbers. Can pipe in
torch.Tensor
or numpy.ndarray
. Example:
# returns 4.5
range(10) | toAvg()
# returns nan
[] | toAvg()
-
-
k1lib.cli.conv.
toMean
¶ alias of
k1lib.cli.conv.toAvg
-
class
k1lib.cli.conv.
toMax
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Calculates the max of a bunch of numbers. Can pipe in
torch.Tensor
or numpy.ndarray
. Example:
# returns 6
[2, 5, 6, 1, 2] | toMax()
-
-
class
k1lib.cli.conv.
toMin
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Calculates the min of a bunch of numbers. Can pipe in
torch.Tensor
or numpy.ndarray
. Example:
# returns 1
[2, 5, 6, 1, 2] | toMin()
-
-
class
k1lib.cli.conv.
toPIL
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Converts a path to a PIL image. Example:
ls(".") | toPIL().all() | item() # get first image
-
__ror__
(path) → PIL.Image.Image[source]¶
-
-
k1lib.cli.conv.
toImg
¶ alias of
k1lib.cli.conv.toPIL
-
class
k1lib.cli.conv.
toRgb
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toRgba
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toBin
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toIdx
(chars: str)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
lengths
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
k1lib.cli.conv.
toLens
¶ alias of
k1lib.cli.conv.lengths
-
class
k1lib.cli.conv.
toDict
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.conv.
toDictF
(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Transform an incoming stream into a dict using a function for values. Example:
names = ["wanda", "vision", "loki", "mobius"] names | toDictF(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...} names | toDictF(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}
-
entrez module¶
This module is not really fleshed out, not that useful/elegant, and I just use
cmd
instead
mgi module¶
All tools related to the MGI database. Expected to use behind the “mgi” module name, like this:
from k1lib.imports import *
["SOD1", "AMPK"] | mgi.batch()
filt module¶
This is for functions that cut out specific parts of the table
-
class
k1lib.cli.filt.
filt
(predicate: Callable[[T], bool], column: Optional[int] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(predicate: Callable[[T], bool], column: Optional[int] = None)[source]¶ Filters out lines. Examples:
# returns [2, 6]
[2, 3, 5, 6] | filt(lambda x: x%2 == 0) | deref()
# returns [3, 5]
[2, 3, 5, 6] | ~filt(lambda x: x%2 == 0) | deref()
# returns [[2, 'a'], [6, 'c']]
[[2, "a"], [3, "b"], [5, "a"], [6, "c"]] | filt(lambda x: x%2 == 0, 0) | deref()
You can also pass in
op
, for extra intuitiveness:
# returns [2, 6]
[2, 3, 5, 6] | filt(op() % 2 == 0) | deref()
# returns ['abc', 'a12']
["abc", "def", "a12"] | filt(op().startswith("a")) | deref()
# returns ['abcd', '2bcr']
["abcd", "0123", "2bcr"] | filt("bc" in op()) | deref()
# returns [2, 3]
range(5) | filt(op() in [2, 8, 3]) | deref()
# returns [0, 1, 4]. Does not support `filt(op() not in [2, 8, 3])`. Use inverted filt instead!
range(5) | ~filt(op() in [2, 8, 3]) | deref()
- Parameters
column –
if integer, then predicate(row[column])
if None, then predicate(row)
-
-
k1lib.cli.filt.
inSet
(values: Set[Any], column: Optional[int] = None) → k1lib.cli.filt.filt[source]¶ Filters out lines that are not in the specified set. Example:
# returns [2, 3]
range(5) | inSet([2, 8, 3]) | deref()
# returns [0, 1, 4]
range(5) | ~inSet([2, 8, 3]) | deref()
You can also use
op
like this, so you don’t have to remember this cli:
# returns [2, 3]
range(5) | filt(op() in [2, 8, 3]) | deref()
# returns [0, 1, 4]
range(5) | ~filt(op() in [2, 8, 3]) | deref()
However, this feature is very experimental
-
k1lib.cli.filt.
contains
(s: str, column: Optional[int] = None) → k1lib.cli.filt.filt[source]¶ Filters out lines that don’t contain the specified substring. Sort of similar to
grep
, but this is simpler, and can be inverted. Example:
# returns ['abcd', '2bcr']
["abcd", "0123", "2bcr"] | contains("bc") | deref()
You can also use
op
like this:
# returns ['abcd', '2bcr']
["abcd", "0123", "2bcr"] | filt("bc" in op()) | deref()
-
class
k1lib.cli.filt.
empty
(reverse=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(reverse=False)[source]¶ Filters out streams that are not empty. Almost always used inverted, but “empty” is a short, sweet name that’s easy to remember. Example:
# returns [[1, 2], ['a']] [[], [1, 2], [], ["a"]] | ~empty() | deref()
- Parameters
reverse – not intended to be used by the end user. Do
~empty()
instead.
-
-
k1lib.cli.filt.
isNumeric
(column: Optional[int] = None) → k1lib.cli.filt.filt[source]¶ Filters out a line if that column is not a number. Example:
# returns [0, 2, '3']
[0, 2, "3", "a"] | isNumeric() | deref()
-
k1lib.cli.filt.
instanceOf
(cls: Union[type, Tuple[type]], column: Optional[int] = None) → k1lib.cli.filt.filt[source]¶ Filters out lines that are not an instance of the given type. Example:
# returns [2]
[2, 2.3, "a"] | instanceOf(int) | deref()
# returns [2, 2.3]
[2, 2.3, "a"] | instanceOf((int, float)) | deref()
-
k1lib.cli.filt.
inRange
(min: float = - inf, max: float = inf, column: Optional[int] = None) → k1lib.cli.filt.filt[source]¶ Checks whether a column is in range or not. Example:
# returns [-2, 3, 6]
[-2, -8, 3, 6] | inRange(-3, 10) | deref()
# returns [-8]
[-2, -8, 3, 6] | ~inRange(-3, 10) | deref()
If you wish to just check against 1 bound, then use filt directly, like this:
# returns [3, 4]
range(5) | filt(op() >= 3) | deref()
-
class
k1lib.cli.filt.
head
(n: int = 10)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(n: int = 10)[source]¶ Only outputs first
n
lines. You can also negate it (like ~head(5)
), which then only outputs after the first n
lines. Examples:
"abcde" | head(2) | deref() # returns ["a", "b"]
"abcde" | ~head(2) | deref() # returns ["c", "d", "e"]
"0123456" | head(-3) | deref() # returns ['0', '1', '2', '3']
"0123456" | ~head(-3) | deref() # returns ['4', '5', '6']
"012" | head(None) | deref() # returns ['0', '1', '2']
"012" | ~head(None) | deref() # returns []
Also works well and fast with
numpy.ndarray
or torch.Tensor
:
# returns (10,)
np.linspace(1, 3) | head(10) | shape()
-
-
k1lib.cli.filt.
tail
(n: int = 10)[source]¶ Basically an inverted
head
. Examples:
range(10) | tail(3) | deref() # returns [7, 8, 9]
-
class
k1lib.cli.filt.
columns
(*columns: List[int])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*columns: List[int])[source]¶ Cuts out specific columns, sliceable. Examples:
["0123456789"] | cut(5, 8) | deref() # returns [['5', '8']] ["0123456789"] | cut(2) | deref() # returns ['2'] ["0123456789"] | cut(5, 8) | deref() # returns [['5', '8']] ["0123456789"] | ~cut()[:7:2] | deref() # returns [['1', '3', '5', '7', '8', '9']]
If you’re selecting only 1 column, then Iterator[T] will be returned, not Table[T].
-
-
k1lib.cli.filt.
cut
¶ alias of
k1lib.cli.filt.columns
-
class
k1lib.cli.filt.
rows
(*rows: List[int])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*rows: List[int])[source]¶ Cuts out specific rows. Space complexity O(1) as a list is not constructed (unless you’re using some really weird slices).
- Parameters
rows – ints for the row indices
Example:
"0123456789" | rows(2) | deref() # returns ["2"] "0123456789" | rows(5, 8) | deref() # returns ["5", "8"] "0123456789" | rows()[2:5] | deref() # returns ["2", "3", "4"] "0123456789" | ~rows()[2:5] | deref() # returns ["0", "1", "5", "6", "7", "8", "9"] "0123456789" | ~rows()[:7:2] | deref() # returns ['1', '3', '5', '7', '8', '9'] "0123456789" | rows()[:-4] | deref() # returns ['0', '1', '2', '3', '4', '5'] "0123456789" | ~rows()[:-4] | deref() # returns ['6', '7', '8', '9']
-
-
class
k1lib.cli.filt.
intersection
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.filt.
union
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.filt.
unique
(column: int)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(column: int)[source]¶ Filters out non-unique row elements. Example:
# returns [[1, "a"], [2, "a"]] [[1, "a"], [2, "a"], [1, "b"]] | unique(0) | deref()
- Parameters
column – doesn’t have the default case of None, because you can always use
k1lib.cli.conv.toSet
-
-
class
k1lib.cli.filt.
breakIf
(f)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.filt.
mask
(mask: Iterator[bool])[source]¶ Bases:
k1lib.cli.init.BaseCli
gb module¶
All tools related to GenBank file format. Expected to use behind the “gb” module name, like this:
from k1lib.imports import *
cat("abc.gb") | gb.feats()
-
class
k1lib.cli.gb.
feats
(fs: list = [])[source]¶ Bases:
k1lib.cli.init.BaseCli
Fetches features, each on a separate stream
-
static
filt
(*terms: str) → k1lib.cli.init.BaseCli[source]¶ Filters for specific terms in all of the features’ texts. If there are multiple terms, then it filters for the first term, then the second, then the third, so the terms’ order might matter to you.
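A hypothetical usage sketch (the terms here are made up; the piping pattern follows the gb module example above):
# keep only features whose text contains "CDS", then filter those for "gene"
cat("abc.gb") | gb.feats() | gb.feats.filt("CDS", "gene")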
-
static
tag
(tag: str) → k1lib.cli.init.BaseCli[source]¶ Gets a single tag out. Applies this on a single feature only
-
grep module¶
-
class
k1lib.cli.grep.
grep
(pattern: str, before: int = 0, after: int = 0, N: int = inf, sep: bool = False, col: Optional[int] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(pattern: str, before: int = 0, after: int = 0, N: int = inf, sep: bool = False, col: Optional[int] = None)[source]¶ Finds lines that have the specified pattern. Example:
# returns ['d', 'd']
"abcde12d34" | grep("d") | deref()
# returns ['c', 'd', '2', 'd'], 2 sections of ['c', 'd'] and ['2', 'd']
"abcde12d34" | grep("d", 1) | deref()
# returns ['c', 'd']
"abcde12d34" | grep("d", 1, N=1) | deref()
# returns ['d', 'e', 'd', '3', '4'], 2 sections of ['d', 'e'] and ['d', '3', '4']
"abcde12d34" | grep("d", 0, 3).till("e") | deref()
# returns [['0', '1', '2'], ['3', '1', '4']]
"0123145" | grep("1", 2, 1, sep=True) | deref()
You can also separate out the sections:
# returns [['c', 'd'], ['2', 'd']]
"abcde12d34" | grep("d", 1, sep=True) | deref()
# returns [['c', 'd']]
"abcde12d34" | grep("d", 1, N=1, sep=True) | deref()
# returns [['1', '2', '3'], ['1', '4', '5']]
"0123145" | grep("1", sep=True).till() | deref()
See also:
groupBy
- Parameters
pattern – regex pattern to search for in a line
before – lines before the hit. Outputs independent lines
after – lines after the hit. Outputs independent lines
N – max sections to output
sep – whether to separate out the sections as lists
col – searches for pattern in a specific column
-
till
(pattern: Optional[str] = None)[source]¶ Greps until some other pattern appear. Inclusive, so you might want to trim the last line. Example:
# returns ['5', '6', '7', '8'], includes last item
range(10) | join("") | grep("5").till("8") | deref()
# returns ['d', 'e', 'd', '3', '4']
"abcde12d34" | grep("d").till("e") | deref()
# returns ['d', 'e']
"abcde12d34" | grep("d", N=1).till("e") | deref()
If the initial pattern and the till pattern are the same, then you don’t have to use this method at all. Instead, do something like this:
# returns ['1', '2', '3']
"0123145" | grep("1", after=1e9, N=1) | deref()
-
init module¶
-
class
k1lib.cli.init.
BaseCli
(fs: list = [])[source]¶ Bases:
object
A base class for all the cli stuff. You can definitely create new cli tools that have the same feel without extending from this class, but advanced stream operations (like
+
,&
,.all()
,|
) won’t work. At the moment, you don’t have to call super().__init__() and super().__ror__(), as __init__’s only job right now is to solidify any
op
passed to it, and __ror__ does nothing.-
__init__
(fs: list = [])[source]¶ Not expected to be instantiated by the end user.
fs param
Expected to use it like this:
class A(BaseCli):
    def __init__(self, f):
        fs = [f]; super().__init__(fs); self.f = fs[0]
Where
f
is some (potentially exotic) function. This will replace f with a “normal” function that’s executable. See the source code of filt
for an example of why this is useful. Currently, its main job is to solidify any op passed in.
-
__and__
(cli: k1lib.cli.init.BaseCli) → k1lib.cli.init.oneToMany[source]¶ Duplicates input stream to multiple joined clis. Example:
# returns [[5], [0, 1, 2, 3, 4]]
range(5) | (shape() & iden()) | deref()
Kinda like
apply
. There’re just multiple ways of doing this. This, I think, is more intuitive, and apply
is more for lambdas and columns mode. Performance is pretty much identical.
-
__add__
(cli: k1lib.cli.init.BaseCli) → k1lib.cli.init.mtmS[source]¶ Parallel pass multiple streams to multiple clis. Example:
# returns [8, 15]
[2, 3] | ((op() * 4) + (op() * 5)) | deref()
-
all
(n: int = 1) → k1lib.cli.init.BaseCli[source]¶ Applies this cli to all incoming streams. Example:
# returns (3,)
torch.randn(3, 4) | toMean().all() | shape()
# returns (3, 4)
torch.randn(3, 4, 5) | toMean().all(2) | shape()
- Parameters
n – how many times should I chain
.all()
?
-
__or__
(cli) → k1lib.cli.init.serial[source]¶ Joins clis end-to-end. Example:
c = apply(op() ** 2) | deref()
# returns [0, 1, 4, 9, 16]
range(5) | c
-
f
() → k1lib.cli.init.Table[k1lib.cli.init.Table[int]][source]¶ Creates a normal function \(f(x)\) which is equivalent to
x | self
.
-
-
k1lib.cli.init.
fastF
(c, x=None)[source]¶ Tries to figure out what’s going on, is it a normal function, or an applyS, or a BaseCli, etc., and returns a really fast function for execution. Example:
# both returns 16, fastF returns "lambda x: x**2", so it's really fast
fastF(op()**2)(4)
fastF(applyS(lambda x: x**2))(4)
At the moment, parameter
x
does nothing, but potentially in the future, you can pass in an example input to the cli, so that this returns an optimized, C-compiled version.
- Parameters
x – sample data for the cli
-
class
k1lib.cli.init.
serial
(*clis: List[k1lib.cli.init.BaseCli])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.cli.init.BaseCli])[source]¶ Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:
[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist
-
-
class
k1lib.cli.init.
oneToMany
(*clis: List[k1lib.cli.init.BaseCli])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.cli.init.BaseCli])[source]¶ Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator. See also:
BaseCli.__and__()
-
-
class
k1lib.cli.init.
manyToMany
(cli: k1lib.cli.init.BaseCli)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(cli: k1lib.cli.init.BaseCli)[source]¶ Applies multiple streams to a single cli. Used in the
BaseCli.all()
operator.
-
-
class
k1lib.cli.init.
mtmS
(*clis: List[k1lib.cli.init.BaseCli])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.cli.init.BaseCli])[source]¶ Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator. See also:
BaseCli.__add__()
. The weird name is actually a shorthand for “many to many specific”.
-
static
f
(f, i: int, n: int = 100)[source]¶ Convenience method, so that this:
mtmS(iden(), op()**2, iden(), iden(), iden())
# also the same as this btw:
(iden() + op()**2 + iden() + iden() + iden())
is the same as this:
mtmS.f(op()**2, 1, 5)
Example:
# returns [5, 36, 7, 8, 9]
range(5, 10) | mtmS.f(op()**2, 1, 5) | deref()
- Parameters
i – where should I put the function?
n – how many clis in total? Defaulted to 100
-
inp module¶
This module is for tools that will likely start the processing stream.
-
k1lib.cli.inp.
cat
(fileName: Optional[str] = None, text: bool = True, _all=False)[source]¶ Reads a file line by line. Example:
# display first 10 lines of file
cat("file.txt") | headOut()
# piping in also works
"file.txt" | cat() | headOut()
# rename file
cat("img.png", False) | file("img2.png", False)
- Parameters
fileName – if None, then return a
BaseCli
that accepts a file name and outputs Iterator[str]
text – if True, read text file, else read binary file
_all – if True, read the entire file at once, instead of reading line-by-line. Faster, but uses more memory. Only works with text mode; binary mode always reads the entire file
-
k1lib.cli.inp.
curl
(url: str) → Iterator[str][source]¶ Gets file from url. File can’t be a binary blob. Example:
# prints out first 10 lines of the website
curl("https://k1lib.github.io/") | headOut()
-
k1lib.cli.inp.
wget
(url: str, fileName: Optional[str] = None)[source]¶ Downloads a file. Also returns the file name, in case you want to pipe it to something else.
- Parameters
url – The url of the file
fileName – if None, then tries to infer it from the url
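A small sketch based on the description above (the URL is just a placeholder); since the file name is returned, you can pipe it onwards:
# downloads the file, then reads it back line by line
wget("https://k1lib.github.io/index.html") | cat() | headOut()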
-
k1lib.cli.inp.
ls
(folder: Optional[str] = None)[source]¶ List every file and folder inside the specified folder. Example:
# returns List[str]
ls("/home")
# same as above
"/home" | ls()
# only outputs files, not folders
ls("/home") | filt(os.path.isfile)
-
class
k1lib.cli.inp.
cmd
(cmd: str, mode: int = 1, text=True, block=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(cmd: str, mode: int = 1, text=True, block=False)[source]¶ Runs a command, and returns the output line by line. Can pipe in some inputs. If no inputs then have to pipe in
None
. Example:
# return detailed list of files
None | cmd("ls -la")
# return list of files that ends with "ipynb"
None | cmd("ls -la") | cmd('grep ipynb$')
It might be tiresome to pipe in
None
all the time. So, you can use the “>” operator to yield values right away:
# prints out first 10 lines of list of files
cmd("ls -la") > headOut()
If you’re using Jupyter notebook/lab, then if you were to display a
cmd
object, it will print out the outputs. So, a single command cmd("mkdir")
displayed at the end of a cell is enough to trigger creating the directory. Reminder that the “>” operator in here sort of has a different meaning to that of
BaseCli
. So you kinda have to be careful about this:
# returns a serial cli, cmd not executed
cmd("ls -la") | deref()
# executes cmd with no input stream and pipes output to deref
cmd("ls -la") > deref()
# returns a serial cli
cmd("ls -la") > grep("txt") > headOut()
# executes pipeline
cmd("ls -la") > grep("txt") | headOut()
General advice is, right after a
cmd
, use “>”, and use “|” everywhere else.
Let’s see a few more exotic examples. File
a.sh
:
#!/bin/bash
echo 1; sleep 0.5
echo This message goes to stderr >&2
echo 2; sleep 0.5
echo $(</dev/stdin)
sleep 0.5; echo 3
Examples:
# returns [b'1', b'2', b'45', b'3'] and prints out the error message
"45" | cmd("./a.sh", text=False) | deref()
# returns [b'This message goes to stderr']
"45" | cmd("./a.sh", mode=2, text=False) | deref()
# returns [[b'1', b'2', b'45', b'3'], [b'This message goes to stderr']]
"45" | cmd("./a.sh", mode=0, text=False) | deref()
Performance-wise, stdout and stderr will yield values right away as soon as the process outputs it, so you get real time feedback. However, this will convert the entire input into a
bytes
object, and not feed it bit by bit lazily, so if you have a humongous input, it might slow you down a little.
Settings:
- cli.quiet: if True, won’t display errors in mode 1
-
kcsv module¶
All tools related to csv file format. Expected to use behind the “kcsv” module name, like this:
from k1lib.imports import *
kcsv.cat("file.csv") | display()
kxml module¶
All tools related to xml file format. Expected to use behind the “kxml” module name, like this:
from k1lib.imports import *
cat("abc.xml") | kxml.node() | kxml.display()
-
class
k1lib.cli.kxml.
node
(fs: list = [])[source]¶ Bases:
k1lib.cli.init.BaseCli
Turns lines into a single node. Example:
s = """ <html> <head> <style></style> </head> <body> <div></div> </body> </html>""" s | kxml.node() # returns root node
-
__ror__
(it: Iterator[str]) → xml.etree.ElementTree.Element[source]¶
-
-
class
k1lib.cli.kxml.
maxDepth
(depth: Optional[int] = None, copy: bool = True)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(depth: Optional[int] = None, copy: bool = True)[source]¶ Filters out too deep nodes. Example:
# returns root node, but prunes children deeper than the specified depth
s | kxml.node() | kxml.maxDepth()
- Parameters
depth – max depth to include in
copy – whether to limit the nodes itself, or limit a copy
-
__ror__
(node: xml.etree.ElementTree.Element) → xml.etree.ElementTree.Element[source]¶
-
Bases:
k1lib.cli.init.BaseCli
Finds all tags that have a particular name. Example:
# returns a list of "Pool" tags (with 2 elements) that are 2 levels deep
s | kxml.node() | kxml.tags("Pool") | toList()
# returns list with 2 tags
s | kxml.node() | kxml.tags("EXPERIMENT_PACKAGE")
# returns list with 3 tags
s | kxml.node() | kxml.tags("EXPERIMENT_PACKAGE", nested=True)
- Parameters
nested – whether to search for “div” tag inside of another “div” tag
-
class
k1lib.cli.kxml.
pretty
(indent: Optional[str] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(indent: Optional[str] = None)[source]¶ Converts the element into a list of xml strings, and makes them pretty. Example:
# prints out the element
s | kxml.node() | kxml.pretty() | stdout()
-
__ror__
(it: xml.etree.ElementTree.Element) → Iterator[str][source]¶
-
-
class
k1lib.cli.kxml.
display
(depth: int = 3, lines: int = 20)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(depth: int = 3, lines: int = 20)[source]¶ Convenience method for getting the head, making it pretty and printing it out. Example:
# prints out the element
s | kxml.node() | kxml.display()
- Parameters
depth – prune tags deeper than the specified depth. Put “None” to not prune at all
lines – max number of lines to print out. Put “None” if you want to display everything
-
__ror__
(it: xml.etree.ElementTree.Element, lines=10)[source]¶
-
modifier module¶
This is for quick modifiers, think of them as changing formats
-
class
k1lib.cli.modifier.
applyS
(f: Callable[[T], T], **kwargs)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f: Callable[[T], T], **kwargs)[source]¶ Like
apply
, but much simpler, just operating on the entire input object, essentially. The “S” stands for “single”. There’s also an alias shorthand for this called aS
. Example:
# returns 5
3 | aS(lambda x: x+2)
Like
apply
, you can also use this as a decorator like this:
@aS
def f(x): return x+2
# returns 5
3 | f
This also decorates the returned object so that it has same qualname, docstring and whatnot.
- Parameters
f – the function to be executed
kwargs – other keyword arguments to pass to the function
-
-
k1lib.cli.modifier.
aS
¶ alias of
k1lib.cli.modifier.applyS
-
class
k1lib.cli.modifier.
apply
(f: Callable[[T], T], column: Optional[int] = None, cache: int = 0)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f: Callable[[T], T], column: Optional[int] = None, cache: int = 0)[source]¶ Applies a function f to every line. Example:
# returns [0, 1, 4, 9, 16]
range(5) | apply(lambda x: x**2) | deref()
# returns [[3.0, 1.0, 1.0], [3.0, 1.0, 1.0]]
torch.ones(2, 3) | apply(lambda x: x+2, 0) | deref()
You can also use this as a decorator, like this:
@apply
def f(x): return x**2
# returns [0, 1, 4, 9, 16]
range(5) | f | deref()
You can also add a cache, like this:
def calc(i): time.sleep(0.5); return i**2
# takes 2.5s
range(5) | repeatFrom(2) | apply(calc, cache=10) | deref()
# takes 5s
range(5) | repeatFrom(2) | apply(calc) | deref()
- Parameters
column – if not None, then applies the function to that column only
cache – if specified, then caches this much number of values
-
-
class
k1lib.cli.modifier.
applyMp
(f: Callable[[T], T], prefetch: Optional[int] = None, timeout: float = 8, utilization: float = 0.8, bs: int = 1, **kwargs)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f: Callable[[T], T], prefetch: Optional[int] = None, timeout: float = 8, utilization: float = 0.8, bs: int = 1, **kwargs)[source]¶ Like
apply
, but executes f(row)
of each row in multiple processes. Example:
# returns [3, 2]
["abc", "de"] | applyMp(lambda s: len(s)) | deref()
# returns [5, 6, 9]
range(3) | applyMp(lambda x, bias: x**2+bias, bias=5) | deref()
# returns [[1, 2, 3], [1, 2, 3]], demonstrating outside vars work
someList = [1, 2, 3]
["abc", "de"] | applyMp(lambda s: someList) | deref()
Internally, this will continuously spawn new jobs up until 80% of all CPU cores are utilized. On posix systems, the default multiprocessing start method is
fork()
. This sort of means that all the variables in memory will be copied over. This might be expensive (might also not, with copy-on-write), so you might have to think about that. On windows and macos, the default start method is spawn
, meaning each child process is a completely new interpreter, so you have to pass in all required variables and reimport all dependencies. Read more at https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
If you don’t wish to schedule all jobs at once, you can specify a
prefetch
amount, and it will only schedule that many jobs ahead of time. Example:
range(10000) | applyMp(lambda x: x**2) | head() | deref() # 700ms
range(10000) | applyMp(lambda x: x**2, 5) | head() | deref() # 300ms
# demonstrating there're no huge penalties even if we want all results at the same time
range(10000) | applyMp(lambda x: x**2) | deref() # 900ms
range(10000) | applyMp(lambda x: x**2, 5) | deref() # 1000ms
The first line will schedule all jobs at once, and thus will require more RAM and compute power, even though we discard most of the results anyway (the
head
cli). The second line only schedules 5 jobs ahead of time, and thus will be much more efficient if you don’t need all results right away.
Note
Remember that every
BaseCli
is also a function, meaning that you can do stuff like:
# returns [['ab', 'ac']]
[["ab", "cd", "ac"]] | applyMp(filt(op().startswith("a")) | deref()) | deref()
Also remember that the return result of
f
should not be a generator. That’s why in the example above, there’s a deref()
inside f.
Most of the time, you would probably want to specify
bs
to something bigger than 1 (maybe 32 or something like that). This will execute f
multiple times in a single job, instead of executing f
only once per job. It should reduce the overhead of process creation dramatically.
If you encounter strange errors not seen on
apply
, you can try to clear all pools (using clearPools()
), to terminate all child processes and thus free resources. On earlier versions, you had to do this manually before exiting, but now applyMp
is much more robust.
Also, you should not immediately assume that
applyMp
will always be faster than apply
. Remember that applyMp
will create new processes, serialize and transfer data to them, execute it, then transfer data back. If your code transfers a lot of data back and forth (compared to the amount of computation done), or the child processes don’t have a lot of stuff to do before returning, it may very well be a lot slower than apply
.
- Parameters
prefetch – if not specified, schedules all jobs at the same time. If specified, schedules jobs so that there’ll only be a specified amount of jobs, and will only schedule more if results are actually being used.
timeout – seconds to wait for job before raising an error
utilization – how many percent cores are we running? 0 for no cores, 1 for all the cores. Defaulted to 0.8
bs – if specified, groups
bs
number of transforms into 1 job to be more efficient.
kwargs – extra arguments to be passed to the function.
args
not included as there’re a couple of options you can pass for this cli.
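Putting prefetch and bs together, a small sketch (slowSquare is a made-up function for illustration):
import time
def slowSquare(x): time.sleep(0.01); return x**2
# schedule only 4 jobs ahead of time, and group 32 rows per job to cut process-creation overhead
range(1000) | applyMp(slowSquare, prefetch=4, bs=32) | deref()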
-
-
k1lib.cli.modifier.
parallel
¶ alias of
k1lib.cli.modifier.applyMp
-
class
k1lib.cli.modifier.
applyTh
(f, prefetch: int = 2, timeout: float = 5, bs: int = 1)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f, prefetch: int = 2, timeout: float = 5, bs: int = 1)[source]¶ Kinda the same as
applyMp
, but executesf
on multiple threads, instead of on multiple processes. Advantages:
Relatively low overhead for thread creation
Fast, if
f
is io-bound
Does not have to serialize and deserialize the result, meaning iterators can be exchanged
Disadvantages:
Still has thread creation overhead, so it’s still recommended to specify
bs
Is slow if
f
has to obtain the GIL to be able to do anything
All examples from
applyMp
should work perfectly here.
-
-
class
k1lib.cli.modifier.
applySerial
(f, includeFirst=False, unpack=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f, includeFirst=False, unpack=False)[source]¶ Applies a function repeatedly. First yields input iterator
x
. Then yieldsf(x)
, thenf(f(x))
, thenf(f(f(x)))
and so on. Example:
# returns [4, 8, 16, 32, 64]
2 | applySerial(op()*2) | head(5) | deref()
If the result of your operation is an iterator, you might want to
deref
it, like this:rs = iter(range(8)) | applySerial(rows()[::2]) # returns [0, 2, 4, 6] next(rs) | deref() # returns []. This is because all the elements are taken by the previous deref() next(rs) | deref() # returns [[10, -6], [4, 16], [20, -12]] [2, 8] | applySerial(lambda a, b: (a + b, a - b), unpack=True) | head(3) | deref() rs = iter(range(8)) | applySerial(rows()[::2] | deref()) # returns [0, 2, 4, 6] next(rs) # returns [0, 4] next(rs) # returns [0] next(rs)
- Parameters
f – function to apply repeatedly
includeFirst – whether to include the raw input value or not
unpack – whether to unpack values into the function or not for aesthetic purposes
-
-
class
k1lib.cli.modifier.
toFloat
(*columns, mode=2)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*columns, mode=2)[source]¶ Converts every row into a float. Example:
# returns [1, 3, -2.3]
["1", "3", "-2.3"] | toFloat() | deref()
# returns [[1.0, 'a'], [2.3, 'b'], [8.0, 'c']]
[["1", "a"], ["2.3", "b"], [8, "c"]] | toFloat(0) | deref()
With weird rows:
# returns [[1.0, 'a'], [8.0, 'c']]
[["1", "a"], ["c", "b"], [8, "c"]] | toFloat(0) | deref()
# returns [[1.0, 'a'], [0.0, 'b'], [8.0, 'c']]
[["1", "a"], ["c", "b"], [8, "c"]] | toFloat(0, force=True) | deref()
This also works well with
torch.Tensor
and numpy.ndarray
, as they will not be broken up into an iterator:
# returns a numpy array, instead of an iterator
np.array(range(10)) | toFloat()
- Parameters
columns – if nothing, then will convert each row. If available, then convert all the specified columns
mode – different conversion styles:
- 0: simple float() function, fastest, but will throw errors if it can’t be parsed
- 1: if there are errors, then replace it with zero
- 2: if there are errors, then eliminate the row
-
-
class
k1lib.cli.modifier.
toInt
(*columns, mode=2)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*columns, mode=2)[source]¶ Converts every row into an integer. Example:
# returns [1, 3, -2]
["1", "3", "-2.3"] | toInt() | deref()
- Parameters
columns – if nothing, then will convert each row. If available, then convert all the specified columns
mode – different conversion styles:
- 0: simple float() function, fastest, but will throw errors if it can’t be parsed
- 1: if there are errors, then replace it with zero
- 2: if there are errors, then eliminate the row
See also:
toFloat()
-
-
class
k1lib.cli.modifier.
sort
(column: int = 0, numeric=True, reverse=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(column: int = 0, numeric=True, reverse=False)[source]¶ Sorts all lines based on a specific column. Example:
# returns [[5, 'a'], [1, 'b']]
[[1, "b"], [5, "a"]] | ~sort(0) | deref()
# returns [[2, 3]]
[[1, "b"], [5, "a"], [2, 3]] | ~sort(1) | deref()
# errors out, as you can't really compare str with int
[[1, "b"], [2, 3], [5, "a"]] | sort(1, False) | deref()
# returns [-1, 2, 3, 5, 8]
[2, 5, 3, -1, 8] | sort(None) | deref()
- Parameters
column – if None, sort rows based on themselves and not an element
numeric – whether to convert column to float
reverse – False for smaller to bigger, True for bigger to smaller. Use
__invert__()
to quickly reverse the order instead of using this param
-
-
class
k1lib.cli.modifier.
sortF
(f: Callable[[T], float], reverse=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f: Callable[[T], float], reverse=False)[source]¶ Sorts rows using a function. Example:
# returns ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
["a", "aaa", "aaaaa", "aa", "aaaa"] | sortF(lambda r: len(r)) | deref()
# returns ['aaaaa', 'aaaa', 'aaa', 'aa', 'a']
["a", "aaa", "aaaaa", "aa", "aaaa"] | ~sortF(lambda r: len(r)) | deref()
-
__invert__
() → k1lib.cli.modifier.sortF[source]¶
-
-
class
k1lib.cli.modifier.
consume
(f: Union[k1lib.cli.init.BaseCli, Callable[[T], None]])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f: Union[k1lib.cli.init.BaseCli, Callable[[T], None]])[source]¶ Consumes the iterator in a side stream. Returns the iterator. Kinda like the bash command
tee
. Example:
# prints "0\n1\n2" and returns [0, 1, 2]
range(3) | consume(headOut()) | toList()
# prints "range(0, 3)" and returns [0, 1, 2]
range(3) | consume(lambda it: print(it)) | toList()
This is useful whenever you want to mutate something, but don’t want to include the function result into the main stream.
-
-
class
k1lib.cli.modifier.
randomize
(bs=100, seed=None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(bs=100, seed=None)[source]¶ Randomizes the input stream. In order to be efficient, this does not convert the input iterator to a giant list and yield random values from that. Instead, this fetches
bs
items at a time, randomizes them, returns and fetch anotherbs
items. If you want to do the giant list, then just pass infloat("inf")
, orNone
. Example:
# returns [0, 1, 2, 3, 4], effectively no randomize at all
range(5) | randomize(1) | deref()
# returns something like this: [1, 0, 2, 3, 5, 4, 6, 8, 7, 9]. You can clearly see the batches
range(10) | randomize(3) | deref()
# returns something like this: [7, 0, 5, 2, 4, 9, 6, 3, 1, 8]
range(10) | randomize(float("inf")) | deref()
# same as above
range(10) | randomize(None) | deref()
# returns True, as the seed is the same
range(10) | randomize(seed=4) | deref() == range(10) | randomize(seed=4) | deref()
-
-
class
k1lib.cli.modifier.
stagger
(every: int)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(every: int)[source]¶ Staggers input stream into multiple stream “windows” placed serially. Best explained with an example:
o = range(10) | stagger(3)
o | deref() # returns [0, 1, 2], 1st "window"
o | deref() # returns [3, 4, 5], 2nd "window"
o | deref() # returns [6, 7, 8]
o | deref() # returns [9]
o | deref() # returns []
This might be useful when you’re constructing a data loader:
dataset = [range(20), range(30, 50)] | transpose()
dl = dataset | batched(3) | (transpose() | toTensor()).all() | stagger(4)
for epoch in range(3):
    for xb, yb in dl: # looping over a window
        print(epoch) # then something like: model(xb)
The above code will print 6 lines. 4 of them are “0” (because we stagger every 4 batches), and xb’s shape will be (3,) (because we batched every 3 samples).
You should also keep in mind that this doesn’t really change the property of the stream itself. Essentially, treat these pairs of statement as being the same thing:
o = range(11, 100)
# both returns 11
o | stagger(20) | item()
o | item()
# both returns [11, 12, ..., 20]
o | head(10) | deref()
o | stagger(20) | head(10) | deref()
Lastly, multiple iterators might be getting values from the same stream window, meaning:
o = range(11, 100) | stagger(10)
it1 = iter(o); it2 = iter(o)
next(it1) # returns 11
next(it2) # returns 12
This may or may not be desirable. Also this should be obvious, but I want to mention this in case it’s not clear to you.
-
-
class
k1lib.cli.modifier.
op
[source]¶ Bases:
k1lib._baseClasses.Absorber
,k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Absorbs operations done on it and applies it on the stream. Based on
Absorber
. Example:
t = torch.tensor([[1, 2, 3], [4, 5, 6.0]])
# returns [torch.tensor([[4., 5., 6., 7., 8., 9.]])]
[t] | (op() + 3).view(1, -1).all() | deref()
Basically, you can treat
op()
as the input tensor. Tbh, you can do the same thing with this:
[t] | applyS(lambda t: (t+3).view(1, -1)).all() | deref()
But that’s kinda long and may not be obvious. This can be surprisingly resilient, as you can still combine with other cli tools as usual, for example:
# returns [2, 3], demonstrating "&" operator
torch.randn(2, 3) | (op().shape & iden()) | deref() | item()
a = torch.tensor([[1, 2, 3], [7, 8, 9]])
# returns torch.tensor([4, 5, 6]), demonstrating "+" operator for clis and not clis
(a | op() + 3 + iden() | item() == torch.tensor([4, 5, 6])).all()
# returns [[3], [3]], demonstrating .all() and "|" serial chaining
torch.randn(2, 3) | (op().shape.all() | deref())
# returns [[8, 18], [9, 19]], demonstrating you can treat `op()` as a regular function
[range(10), range(10, 20)] | transpose() | filt(op() > 7, 0) | deref()
This can only deal with simple operations only. For complex operations, resort to the longer version
applyS(lambda x: ...)
instead!
Performance-wise, there is some degradation, but not a lot, so don’t worry about it. Simple operations execute pretty much on par with native lambdas:
n = 10_000_000
# takes 1.48s
for i in range(n): i**2
# takes 1.89s, 1.28x worse than for loop
range(n) | apply(lambda x: x**2) | ignore()
# takes 1.86s, 1.26x worse than for loop
range(n) | apply(op()**2) | ignore()
# takes 1.86s
range(n) | (op()**2).all() | ignore()
More complex operations can take more of a hit:
# takes 1.66s
for i in range(n): i**2-3
# takes 2.02s, 1.22x worse than for loop
range(n) | apply(lambda x: x**2-3) | ignore()
# takes 2.81s, 1.69x worse than for loop
range(n) | apply(op()**2-3) | ignore()
Experimental features:
# returns [2, 3, 4]
range(10) | filt(op() in range(2, 5)) | deref()
# returns [0, 1, 5, 6, 7, 8, 9]
range(10) | ~filt(op() in range(2, 5)) | deref()
# will not work, so if your set potentially does not have any element, then don't use op()
range(10) | filt(op() in []) | deref()
Reserved operations that are not absorbed are:
all
__ror__ (__or__ still works!)
op_solidify
-
-
class
k1lib.cli.modifier.
integrate
(dt=1)[source]¶ Bases:
k1lib.cli.init.BaseCli
nb module¶
This is for everything related to ipython notebooks. Expected to use behind the “nb” module name, like this:
from k1lib.imports import *
nb.execute("file.ipynb")
-
k1lib.cli.nb.
cells
(fileName, outputs=False)[source]¶ Gets simplified notebook cells from file source, including fields
cell_type
and source
only. Example:
nb.cells("file.ipynb")
-
class
k1lib.cli.nb.
pretty
(magics: bool = False, whitelist: List[str] = [], blacklist: List[str] = [])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(magics: bool = False, whitelist: List[str] = [], blacklist: List[str] = [])[source]¶ Makes the cells prettier. Cell 1 in file.ipynb:
#notest, export
a = 3
Cell 2 in file.ipynb:
b = 6
Code:
# only cell 2 gets chosen
nb.cells("file.ipynb") | nb.pretty(blacklist=["notest"])
# only cell 1 gets chosen
nb.cells("file.ipynb") | nb.pretty(whitelist=["export"])
- Parameters
magics – if False, then lines with detected magics (‘!’, ‘%%’ symbols) are removed from the cell’s source
whitelist – every cell that doesn’t have any of these properties will be filtered out
blacklist – every cell that has any of these properties will be filtered out
-
-
class
k1lib.cli.nb.
execute
(fileName=None, _globals: Optional[dict] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(fileName=None, _globals: Optional[dict] = None)[source]¶ Executes cells. Example:
nb.cells("file.ipynb") | nb.execute("nb.ipynb")
Most of the time, you’d want to pass cells through
pretty
first, to make sure everything is nice and clean
- Parameters
fileName – not actually used to read the file. If specified, then changes the current working directory to that of the file
_globals – optional dict of global variables
-
output module¶
For operations that feel like the termination
-
class
k1lib.cli.output.
stdout
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.output.
tee
(f=<function tee.<lambda>>, s=None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f=<function tee.<lambda>>, s=None)[source]¶ Like the Linux
tee
command, this prints the elements to another specified stream, while yielding the elements. Example:
# prints "0\n1\n2\n3\n4\n" and returns [0, 1, 4, 9, 16]
range(5) | tee() | apply(op() ** 2) | deref()
- Parameters
f – element transform function. Defaults to just adding a new line at the end
s – stream to write to. Defaults to
sys.stdout
-
-
class
k1lib.cli.output.
file
(fileName: Optional[str] = None, text: bool = True)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(fileName: Optional[str] = None, text: bool = True)[source]¶ Opens a new file for writing. Example:
# writes "0\n1\n2\n" to file range(3) | file("test/f.txt") # same as above, but (maybe?) more familiar range(3) > file("text/f.txt") # returns ['0', '1', '2'] cat("folder/f.txt") | deref() # writes bytes to file b'5643' | file("test/a.bin", False) # returns ['5643'] cat("test/a.bin") | deref()
You can create temporary files on the fly:
# creates temporary file
url = range(3) > file()
# returns ['0', '1', '2']
cat(url) | deref()
This can be especially useful when integrating with shell scripts that want to read in a file:
seq1 = "CCAAACCCCCCCTCCCCCGCTTC"
seq2 = "CCAAACCCCCCCCTCCCCCCGCTTC"
# use "needle" program to locally align 2 sequences
None | cmd(f"needle {[seq1] > file()} {[seq2] > file()} -filter")
You can also append to file with the “>>” operator:
url = range(3) > file()
# appended to file
range(10, 13) >> file(url)
# returns ['0', '1', '2', '10', '11', '12']
cat(url) | deref()
- Parameters
fileName – if not specified, create new temporary file and returns the url when pipes into it
text – if True, accepts Iterator[str], and prints out each string on a new line. Else accepts bytes and write in 1 go.
-
-
class
k1lib.cli.output.
pretty
(delim='')[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.output.
intercept
(raiseError: bool = True)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.output.
split
(n=10, baseFolder='/tmp')[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(n=10, baseFolder='/tmp')[source]¶ Splits a large file into multiple fragments, and returns the path to those files. Example:
# returns a list of 20 files
"big-file.csv" | split(20)
This uses the underlying
split
linux cli tool. This also means that it’s not guaranteed to work on macos or windows.
Over time, there will be lots of split files after a session, so be sure to clean them up to reduce disk size:
split.clear()
- Parameters
n – Number of files to split into
baseFolder – Base folder where all the split files are
-
sam module¶
This is for functions that are .sam or .bam related
-
k1lib.cli.sam.
cat
(bamFile: Optional[str] = None, header: bool = True)[source]¶ Get sam file outputs from bam file. Example:
sam.cat("file.bam") | display() "file.bam" | sam.cat(header=False) | display()
- Parameters
header – whether to include headers or not
-
class
k1lib.cli.sam.
header
(long=True)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.sam.
flag
(f=None)[source]¶ Bases:
k1lib.cli.utils.bindec
-
__init__
(f=None)[source]¶ Decodes flags attribute. Example:
# returns ['PAIRED', 'UNMAP']
5 | flag()
# returns 'PAIRED, UNMAP'
5 | flag(cli.join(", "))
You’ll mostly use this in this format:
sam.cat("file.bam", False) | apply(sam.flag(), 1) | display()
You can change the flag labels like this:
settings.cli.sam.flags = ["paired", ...]
- Parameters
f – transform function fed into
bindec
, defaulted to join(“, “)
-
structural module¶
This is for functions that sort of change the table structure in a dramatic way. They’re the core transformations
-
k1lib.cli.structural.
yieldSentinel
¶ Object that can be yielded in a stream to ignore this stream for the moment in
joinStreamsRandom
. It will also stop deref
early.
-
class
k1lib.cli.structural.
joinStreamsRandom
[source]¶ -
__init__
()[source]¶ Joins multiple streams randomly. If any stream runs out, then it quits. If any stream yields
yieldSentinel
, then just ignores that result and continue. Could be useful in active learning. Example:# could return [0, 1, 10, 2, 11, 12, 13, ...], with max length 20, typical length 18 [range(0, 10), range(10, 20)] | joinStreamsRandom() | deref() stream2 = [[-5, yieldSentinel, -4, -3], yieldSentinel | repeat()] | joinStreams() # could return [-5, -4, 0, -3, 1, 2, 3, 4, 5, 6], demonstrating yieldSentinel [range(7), stream2] | joinStreamsRandom() | deref()
-
-
class
k1lib.cli.structural.
transpose
(dim1: int = 0, dim2: int = 1, fill=None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(dim1: int = 0, dim2: int = 1, fill=None)[source]¶ Join multiple columns and loop through all rows. Aka transpose. Example:
# returns [[1, 4], [2, 5], [3, 6]]
[[1, 2, 3], [4, 5, 6]] | transpose() | deref()
# returns [[1, 4], [2, 5], [3, 6], [0, 7]]
[[1, 2, 3], [4, 5, 6, 7]] | transpose(fill=0) | deref()
Multidimensional transpose works just like
torch.transpose()
too:
# returns (2, 7, 5, 3), but detected Tensor, so it will use the builtin torch.transpose
torch.randn(2, 3, 5, 7) | transpose(3, 1) | shape()
# also returns (2, 7, 5, 3), but actually does every required computation. Can be slow if shape is huge
torch.randn(2, 3, 5, 7) | deref(igT=False) | transpose(3, 1) | shape()
Can also work with numpy arrays (although has to be passed in like a function and can’t be piped in):
# returns (5, 3, 2)
transpose(0, 2)(np.random.randn(2, 3, 5)).shape
Be careful with infinite streams, as transposing stream of shape (inf, 5) will hang this operation! Either don’t do it, or temporarily limit all infinite streams like this:
with settings.cli.context(inf=21):
    # returns (3, 21)
    [2, 1, 3] | repeat() | transpose() | shape()
Also be careful with empty streams, as you might not get any results at all:
# returns [], as the last stream has no elements
[[1, 2], [3, 4], []] | transpose() | deref()
# returns [[1, 3, 0], [2, 4, 0]]
[[1, 2], [3, 4], []] | transpose(fill=0) | deref()
- Parameters
fill – if not None, then will try to zip longest with this fill value
-
static
fill
(fill='', dim1: int = 0, dim2: int = 1)[source]¶ Convenience method to fill in missing elements of a table. Example:
# returns [[1, 2, 3], [4, 5, 0]]
[[1, 2, 3], [4, 5]] | transpose.fill(0) | deref()
# also returns [[1, 2, 3], [4, 5, 0]], demonstrating how it works underneath
[[1, 2, 3], [4, 5]] | transpose(fill=0) | transpose(fill=0) | deref()
-
static
wrap
(f, dim1: int = 0, dim2: int = 1, fill=None)[source]¶ Wraps
f
around 2transpose
, can be useful in combination withk1lib.cli.init.mtmS
. Example:
# returns [[1, 4, 3, 4], [8, 81, 10, 11]]
[range(1, 5), range(8, 12)] | transpose.wrap(mtmS.f(apply(op()**2), 1)) | deref()
# also returns [[1, 4, 3, 4], [8, 81, 10, 11]], demonstrating the typical way to do this
[range(1, 5), range(8, 12)] | apply(op()**2, 1) | deref()
The example given is sort of to demonstrate this only. Most of the time, just use
apply
with columns instead. But sometimes you need direct access to a column, so this is how you can do it.
-
-
class
k1lib.cli.structural.
reshape
(*dims)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*dims)[source]¶ Reshapes the input stream into the desired shape. Example:
# returns [[0, 1, 2], [3, 4, 5]]
range(6) | reshape(2, 3) | deref()
# returns [[0, 1], [2, 3], [4, 5]]
range(6) | reshape(3, 2) | deref()
# returns [[0, 1], [2, 3], [4, 5]], stopped early
range(100) | reshape(3, 2) | deref()
# returns [[0, 1, 2], [3, 4, 5]], can leave out first dimension
range(6) | reshape(-1, 3) | deref()
# returns [[0, 1, 2]], won't include 2nd element, as it ran out of elements
range(5) | reshape(-1, 3) | deref()
# throws error, as it ran out of elements and can't fulfill the request
range(6) | reshape(3, 3) | deref()
Unlike
torch.reshape()
, the input piped into this has to be a simple iterator. If you have a complex data structure with multiple dimensions, turn that into a simple iterator with joinStreams
first, like this:
# returns [[[0, 1, 2]], [[3, 4, 5]]]
[[[0], [1]], [[2], [3]], [[4], [5]]] | joinStreams(2) | reshape(2, 1, 3) | deref()
-
-
class
k1lib.cli.structural.
insert
(*elements, begin=True)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*elements, begin=True)[source]¶ Join element into list. Example:
# returns [5, 2, 6, 8]
[5, [2, 6, 8]] | insert() | deref()
# returns [5, 2, 6, 8]
[2, 6, 8] | insert(5) | deref()
# returns [2, 6, 8, 5]
[2, 6, 8] | insert(5, begin=False) | deref()
# returns [[3, 1], 2, 6, 8]
[2, 6, 8] | insert([3, 1]) | deref()
# returns [[3, 1], 2, 6, 8]
[2, 6, 8] | ~insert(3, 1) | deref()
# returns [[3, 1], 2, 6, 8]
[[3, 1], [2, 6, 8]] | ~insert() | deref()
- Parameters
element – the element to insert. If None, then takes the input [e, […]], else takes the input […] as usual
-
-
class
k1lib.cli.structural.
splitW
(*weights: List[float])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*weights: List[float])[source]¶ Splits elements into multiple weighted lists. If no weights are provided, then automatically defaults to [0.8, 0.2]. Example:
# returns [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9]]
range(10) | splitW(0.8, 0.2) | deref()
# same as the above
range(10) | splitW() | deref()
-
-
class
k1lib.cli.structural.
joinStreams
(dims=1)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(dims=1)[source]¶ Joins multiple streams. Example:
# returns [1, 2, 3, 4, 5]
[[1, 2, 3], [4, 5]] | joinStreams() | deref()
# returns [[0, 1], [2], [3, 4, 5], [6, 7, 8], [], [9, 10]]
[[[0, 1], [2], [3, 4, 5]], [[6, 7, 8], [], [9, 10]]] | joinStreams() | deref()
# returns [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[[[0, 1], [2], [3, 4, 5]], [[6, 7, 8], [], [9, 10]]] | joinStreams(2) | deref()
Sometimes, you may want to impose some dimensional structure after joining all streams together, which
reshape
does.
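For example (a small sketch using the behaviors documented in this section):
# flatten everything, then re-impose a (3, 2) structure
# returns [[1, 2], [3, 4], [5, 6]]
[[1, 2], [3, 4], [5, 6]] | joinStreams() | reshape(3, 2) | deref()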
-
-
class
k1lib.cli.structural.
activeSamples
(limit: int = 100, p: float = 0.95)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(limit: int = 100, p: float = 0.95)[source]¶ Yields active learning samples. Example:
o = activeSamples()
ds = range(10) # normal dataset
ds = [o, ds] | joinStreamsRandom() # dataset with active learning capability
next(ds) # returns 0
next(ds) # returns 1
next(ds) # returns 2
o.append(20)
next(ds) # can return 3 or 20
next(ds) # can return (4 or 20) or 4
So the point of this is to be a generator of samples. You can define your dataset as a mix of active learning samples and standard samples. Whenever there’s a data point that you want to focus on, you can add it to
o
and it will eventually yield it.
Warning
It might not be a good idea to set param
limit
to numbers higher than 100. This is because the network might still not understand a wrong sample after being shown it multiple times, so it will keep adding that wrong sample back in, distracting it from other samples and reducing the network's accuracy once active learning is removed.

If
limit
is low enough (from my testing, 30-100 should be fine), then old wrong samples will be kicked out, allowing a fresh stream of wrong samples to come in and preventing the problem above. If you find that removing active learning makes the accuracy drop dramatically, then try decreasing the limit.
- Parameters
limit – max number of active samples. Discards samples if number of samples is over this.
p – probability of actually adding the samples in
-
-
k1lib.cli.structural.
table
(delim: Optional[str] = None)[source]¶ Basically
op().split(delim).all()
. This exists because it is used quite a lot in bioinformatics. Example:
# returns [['a', 'bd'], ['1', '2', '3']]
["a|bd", "1|2|3"] | table("|") | deref()
-
class
k1lib.cli.structural.
batched
(bs=32, includeLast=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(bs=32, includeLast=False)[source]¶ Batches the input stream. Example:
# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
range(11) | batched(3) | deref()
# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
range(11) | batched(3, True) | deref()
# returns [[0, 1, 2, 3, 4]]
range(5) | batched(float("inf"), True) | deref()
# returns []
range(5) | batched(float("inf"), False) | deref()
Can work well and fast with
torch.Tensor
and numpy.ndarray
:
# both return a torch.Tensor of shape (2, 3, 4, 5)
torch.randn(6, 4, 5) | batched(3)
torch.randn(7, 4, 5) | batched(3)
Also, if input is a
range
, then to save time, smaller ranges will be returned instead of lists, for performance:
# returns [range(0, 3), range(3, 6), range(6, 9)]
range(11) | batched(3) | toList()
-
-
class
k1lib.cli.structural.
window
(n, newList=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(n, newList=False)[source]¶ Slides a window of size n forward and yields the windows. Example:
# returns [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
range(5) | window(3) | deref()
If you are doing strange transformations to the result, like transposing it, then it might complain that the internal deque (double-ended queue) mutated during iteration. In that case, set
newList
to True. It's not True by default because multiple lists would be created, all of which need memory allocation, which is slower:
# takes 15ms
range(100000) | window(100) | ignore()
# takes 48ms, because of allocating the extra lists
range(100000) | window(100, True) | ignore()
-
-
class
k1lib.cli.structural.
groupBy
(column: int, hashable=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(column: int, hashable=False)[source]¶ Groups table by some column. Example:
[[2.3, 5], [3.4, 2], [4.5, 2], [5.6, 5], [6.7, 1]] | groupBy(1) | deref()
This returns:
[[[2.3, 5], [5.6, 5]], [[3.4, 2], [4.5, 2]], [[6.7, 1]]]
By default,
hashable
param is False, which has O(n^2) time complexity but can handle everything. If you benchmark your pipeline and find that this step is the bottleneck, then you can set hashable
to True, which has O(n) time complexity. However, you have to be sure that the column's elements are actually hashable, so that they can be put into a set internally. For example, an unhashable class is torch.Tensor
:
# returns False
torch.tensor(2) in set([torch.tensor(2), torch.tensor(3)])
# returns True
torch.tensor(2) == torch.tensor(2)
So, you have to convert the 0-d tensors to single ints/floats first, in order to use the fast mode.
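As a hedged sketch of that workflow (assuming apply here can target a single column, and with tbl as a hypothetical table whose column 1 holds 0-d tensors):
# convert the grouping column to plain Python ints first, then group in fast mode
tbl | apply(lambda x: int(x), 1) | groupBy(1, hashable=True) | deref()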
See also:
grep
- Parameters
column – which column to group by
hashable – whether the selected column is hashable or not
-
-
k1lib.cli.structural.
collate
()[source]¶ Puts individual columns into a tensor. Example:
# returns [tensor([ 0, 10, 20]), tensor([ 1, 11, 21]), tensor([ 2, 12, 22])]
[range(0, 3), range(10, 13), range(20, 23)] | collate() | toList()
-
class
k1lib.cli.structural.
insertColumn
(*columns, begin=True, fill='')[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*columns, begin=True, fill='')[source]¶ Inserts a column at beginning or end. Example:
# returns [['a', 1, 2], ['b', 3, 4]]
[[1, 2], [3, 4]] | ~insertColumn("a", "b") | deref()
# returns [['a', 1, 2], ['b', 3, 4]]
[[1, 2], [3, 4]] | insertColumn(["a", "b"]) | deref()
# returns [[1, 2, 'a'], [3, 4, 'b']]
[[1, 2], [3, 4]] | ~insertColumn("a", "b", begin=False) | deref()
-
-
k1lib.cli.structural.
insertIdColumn
(table=False, begin=True, fill='')[source]¶ Inserts an id column at the beginning (or end). Example:
# returns [[0, 'a', 2], [1, 'b', 4]]
[["a", 2], ["b", 4]] | insertIdColumn(True) | deref()
# returns [[0, 'a'], [1, 'b']]
"ab" | insertIdColumn()
- Parameters
table – if False, then inserts the column into an Iterator[str], else treats the input as a full-fledged table
-
class
k1lib.cli.structural.
expandE
(f: Callable[[T], List[T]], column: int)[source]¶ Bases:
k1lib.cli.init.BaseCli
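No docstring is rendered here, but judging from the signature, f presumably maps the element at column into a list whose items then replace that column. A hypothetical sketch, not verified against the library:
# might return [['ab', 'cd', 5], ['ef', 'gh', 6]]
[["ab.cd", 5], ["ef.gh", 6]] | expandE(op().split("."), 0) | deref()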
-
k1lib.cli.structural.
unsqueeze
(dim: int = 0)[source]¶ Unsqueezes the input iterator. Example:
t = [[1, 2], [3, 4], [5, 6]]
# returns (3, 2)
t | shape()
# returns (1, 3, 2)
t | unsqueeze(0) | shape()
# returns (3, 1, 2)
t | unsqueeze(1) | shape()
# returns (3, 2, 1)
t | unsqueeze(2) | shape()
Behind the scenes, it’s really just
wrapList().all(dim)
, but the “unsqueeze” name is a lot more familiar. Also note that the inverse operation “squeeze” is sort of item().all(dim)
, if you’re sure that this is desirable:
t = [[1, 2], [3, 4], [5, 6]]
# returns (3, 2)
t | unsqueeze(1) | item().all(1) | shape()
-
class
k1lib.cli.structural.
count
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
class
k1lib.cli.structural.
permute
(*permutations: List[int])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*permutations: List[int])[source]¶ Permutes the columns. Acts kinda like
torch.Tensor.permute()
. Example:
# returns [['b', 'a'], ['d', 'c']]
["ab", "cd"] | permute(1, 0) | deref()
-
-
class
k1lib.cli.structural.
accumulate
(columnIdx: int = 0, avg=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(columnIdx: int = 0, avg=False)[source]¶ Groups lines that have the same row[columnIdx], and adds together all other columns, assuming they're numbers. Example:
# returns [['a', 10.5, 9.5, 14.5], ['b', 1.1, 2.2, 3.3]]
[["a", 1.1, 2.2, 3.4], ["a", 1.1, 2.2, 7.8], ["a", 8.3, 5.1, 3.3], ["b", 1.1, 2.2, 3.3]] | accumulate(0) | deref()
- Parameters
columnIdx – common column index to accumulate
avg – calculate average values instead of sum
-
-
class
k1lib.cli.structural.
AA_
(*idxs: List[int], wraps=False)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(*idxs: List[int], wraps=False)[source]¶ Returns 2 streams, one that has the selected element, and the other the rest. Example:
# returns [5, [1, 6, 3, 7]]
[1, 5, 6, 3, 7] | AA_(1)
# returns [[5, [1, 6, 3, 7]]]
[1, 5, 6, 3, 7] | AA_(1, wraps=True)
You can also put multiple indexes through:
# returns [[1, [5, 6]], [6, [1, 5]]]
[1, 5, 6] | AA_(0, 2)
If you don’t specify anything, then all indexes will be sliced:
# returns [[1, [5, 6]], [5, [1, 6]], [6, [1, 5]]]
[1, 5, 6] | AA_()
As for the strange name, think of this operation as “AĀ”. In statistics, if you have a set “A”, then “not A” is commonly written as A with an overline, “Ā”. So “AA_” represents “AĀ”: it first returns the selection A, then the rest Ā.
- Parameters
wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā
-
-
class
k1lib.cli.structural.
peek
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Returns (firstRow, iterator). This sort of peeks at the first row, to potentially gain some insights about the internal formats. The returned iterator is not tampered with. Example:
e, it = iter([[1, 2, 3], [1, 2]]) | peek()
print(e) # prints "[1, 2, 3]"
s = 0
for e in it: s += len(e)
print(s) # prints "5", the total length of the 2 lists
You kinda have to be careful about handling the
firstRow
, because you might inadvertently alter the iterator:
e, it = iter([iter(range(3)), range(4), range(2)]) | peek()
e = list(e) # e is [0, 1, 2]
list(next(it)) # supposed to be the same as `e`, but is [] instead
The example happens because you have already consumed all elements of the first row, and thus there aren’t any left when you try to call
next(it)
.
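One hedged workaround is to materialize the rows first (for example with deref), so that consuming the peeked row can't touch the underlying iterators:
rows = iter([iter(range(3)), range(4), range(2)]) | deref(2)
e, it = rows | peek()
e = list(e) # e is [0, 1, 2]
list(next(it)) # now also [0, 1, 2], independent of `e`
Of course, this trades laziness for safety, since everything gets loaded into memory up front.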
-
-
class
k1lib.cli.structural.
peekF
(f: Union[k1lib.cli.init.BaseCli, Callable[[T], T]])[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f: Union[k1lib.cli.init.BaseCli, Callable[[T], T]])[source]¶ Similar to
peek
, but will execute f(row)
and return the input iterator, which is not tampered with. Example:
it = lambda: iter([[1, 2, 3], [1, 2]])
# prints "[1, 2, 3]" and returns [[1, 2, 3], [1, 2]]
it() | peekF(lambda x: print(x)) | deref()
# prints "1\n2\n3"
it() | peekF(headOut()) | deref()
-
-
class
k1lib.cli.structural.
repeat
(limit: Optional[int] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(limit: Optional[int] = None)[source]¶ Yields the passed-in object a specified number of times. If you intend to pass in an iterator, then make a list out of it first, because a second pass over the iterator probably won't work, as you will already have exhausted it the first time. Example:
# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3] | repeat(3) | toList()
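For the iterator caveat above, a minimal sketch is to materialize with toList first:
# should return [[0, 1, 2], [0, 1, 2]]
iter(range(3)) | toList() | repeat(2) | deref()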
- Parameters
limit – if None, then repeats indefinitely
-
-
k1lib.cli.structural.
repeatF
(f, limit: Optional[int] = None, **kwargs)[source]¶ Yields a specified number of values generated by a specified function. Example:
# returns [4, 4, 4]
repeatF(lambda: 4, 3) | toList()
# returns 10
repeatF(lambda: 4) | head() | shape(0)
f = lambda a: a+2
# returns [8, 8, 8]
repeatF(f, 3, a=6) | toList()
- Parameters
limit – if None, then repeats indefinitely
kwargs – extra keyword arguments that you can pass into the function
See also:
repeatFrom
-
class
k1lib.cli.structural.
repeatFrom
(limit: Optional[int] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(limit: Optional[int] = None)[source]¶ Yields from a list. When it runs out of elements, it starts over from the beginning, for a total of
limit
passes. Example:
# returns [1, 2, 3, 1, 2]
[1, 2, 3] | repeatFrom() | head(5) | deref()
# returns [1, 2, 3, 1, 2, 3]
[1, 2, 3] | repeatFrom(2) | deref()
- Parameters
limit – if None, then repeats indefinitely
-
trace module¶
-
class
k1lib.cli.trace.
trace
(f=<k1lib.cli.utils.size object>, maxDepth=inf)[source]¶ Bases:
k1lib.cli.trace._trace
-
last
= None¶ Last instantiated trace object. Access this to view the previous (possibly nested) trace.
-
__init__
(f=<k1lib.cli.utils.size object>, maxDepth=inf)[source]¶ Traces out how the data stream is transformed through complex cli tools. Example:
# returns [1, 4, 9, 16], normal command
range(1, 5) | apply(lambda x: x**2) | deref()
# traced command, will display how the shapes evolve through cli tools
range(1, 5) | trace() | apply(lambda x: x**2) | deref()
There are a lot more instructions and code examples over in the tutorial section. Go check it out!
-
utils module¶
This is for all short utilities that have a boilerplate feeling to them. Conversion clis
might feel they have different styles, as toFloat
converts object iterator to
float iterator, while toPIL
converts single image url to single PIL image,
whereas toSum
converts float iterator into a single float value.
The general convention is, if the intended operation sounds simple (convert to floats, strings, types, …), then most likely it will convert iterator to iterator, as you can always use the function directly if you only want to apply it on 1 object.
If it sounds complicated (convert to PIL image, tensor, …), then most likely it will convert object to object. Lastly, there are some where it just feels right to take in an iterator and output a single object (like getting max, min, std, or mean values).
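As a hedged illustration of this convention, using the conversion clis named above (exact return types may differ slightly):
# sounds simple, so it's iterator in, iterator out; should return [1.0, 2.5]
["1", "2.5"] | toFloat() | deref()
# feels like a reduction, so it's iterator in, single object out; should return 6
[1, 2, 3] | toSum()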
-
class
k1lib.cli.utils.
size
(idx=None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(idx=None)[source]¶ Returns number of rows and columns in the input. Example:
# returns (3, 2)
[[2, 3], [4, 5, 6], [3]] | size()
# returns 3
[[2, 3], [4, 5, 6], [3]] | size(0)
# returns 2
[[2, 3], [4, 5, 6], [3]] | size(1)
# returns (2, 0)
[[], [2, 3]] | size()
# returns (3,)
[2, 3, 5] | size()
# returns 3
[2, 3, 5] | size(0)
# returns (3, 2, 2)
[[[2, 1], [0, 6, 7]], 3, 5] | size()
# returns (1, 3)
["abc"] | size()
# returns (1, 2, 3)
[torch.randn(2, 3)] | size()
# returns (2, 3, 5)
size()(np.random.randn(2, 3, 5))
There’s also
lengths
, which is sort of a simplified/faster version of this, but only use it if you are sure that len(it)
can be called.

If this encounters PyTorch tensors or Numpy arrays, then it will just get the shape instead of actually looping over them.
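A hedged sketch of the difference, assuming lengths simply calls len() on every row:
# size() loops over rows and columns; returns (3, 2)
[[2, 3], [4, 5, 6], [3]] | size()
# lengths presumably just yields each row's length; likely [2, 3, 1]
[[2, 3], [4, 5, 6], [3]] | lengths() | deref()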
- Parameters
idx – if idx is None return (rows, columns). If 0 or 1, then rows or columns
-
-
k1lib.cli.utils.
shape
¶ alias of
k1lib.cli.utils.size
-
class
k1lib.cli.utils.
item
(amt: int = 1, fill=<object object>)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(amt: int = 1, fill=<object object>)[source]¶ Returns the first row. Example:
# returns 0
iter(range(5)) | item()
# returns torch.Size([5])
torch.randn(3,4,5) | item(2) | shape()
# returns 3
[] | item(fill=3)
- Parameters
amt – how many times do you want to call item() back to back?
fill – if iterator length is 0, return this
-
-
class
k1lib.cli.utils.
iden
[source]¶ Bases:
k1lib.cli.init.BaseCli
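No docstring is rendered here, but as the name suggests, this should be the identity cli, passing the stream through untouched; a minimal sketch:
# should return [0, 1, 2]
range(3) | iden() | deref()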
-
class
k1lib.cli.utils.
join
(delim: Optional[str] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(delim: Optional[str] = None)[source]¶ Merges all strings into 1, with delim in the middle. Basically
str.join()
. Example:
# returns '2\na'
[2, "a"] | join("\n")
-
-
class
k1lib.cli.utils.
wrapList
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
()[source]¶ Wraps inputs inside a list. There’s a more advanced cli tool built from this, which is
unsqueeze()
.
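A minimal sketch, consistent with unsqueeze() being described as wrapList().all(dim):
# should return [[1, 2, 3]]: the whole input becomes the sole element of a new list
[1, 2, 3] | wrapList() | deref()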
-
-
class
k1lib.cli.utils.
reverse
[source]¶ Bases:
k1lib.cli.init.BaseCli
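No docstring is rendered here; presumably this reverses the incoming stream. A hypothetical sketch:
# likely returns [3, 2, 1]
[1, 2, 3] | reverse() | deref()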
-
class
k1lib.cli.utils.
ignore
[source]¶ Bases:
k1lib.cli.init.BaseCli
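No docstring is rendered here, but judging from its use in the window() example above, it simply runs through the stream and discards the output, which is handy for timing or for triggering side effects; a minimal sketch:
# consumes the stream, returns nothing useful
range(100000) | window(100) | ignore()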
-
class
k1lib.cli.utils.
rateLimit
(f, delay=0.1)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(f, delay=0.1)[source]¶ Limits the execution flow rate upon a condition. Example:
s = 0; semaphore = 0
def heavyAsyncOperation(i):
    global semaphore, s
    semaphore += 1
    s += i; time.sleep(1)
    semaphore -= 1; return i**2
# returns (20,), takes 1s to run
range(20) | applyTh(heavyAsyncOperation, 100) | shape()
# returns (20,), takes 4s to run (20/5 = 4)
range(20) | rateLimit(lambda: semaphore < 5) | applyTh(heavyAsyncOperation, 100) | shape()
The first test case is not rate-limited, so it will run all 20 threads at the same time, and all of them will finish after 1 second.
The second test case is rate-limited, so that there can only be 5 concurrently executing threads because of the semaphore count check. Therefore this takes around 4 seconds to run.
- Parameters
f – checking function. Should return true if execution is allowed
delay – delay in seconds between calling
f()
-
-
class
k1lib.cli.utils.
timeLimit
(t)[source]¶ Bases:
k1lib.cli.init.BaseCli
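No docstring is rendered here; presumably t is a time budget in seconds, and elements are let through until it runs out. A hypothetical sketch, not verified against the library:
# counts how many values an endless generator produced within roughly 1 second
repeatF(lambda: 4) | timeLimit(1) | shape(0)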
-
k1lib.cli.utils.
tab
(pad: str = ' ')[source]¶ Indents incoming string iterator. Example:
# prints out indented 0 to 9
range(10) | tab() | headOut()
-
k1lib.cli.utils.
indent
(pad: str = ' ')¶ Indents incoming string iterator. Example:
# prints out indented 0 to 9
range(10) | tab() | headOut()
-
class
k1lib.cli.utils.
clipboard
[source]¶ Bases:
k1lib.cli.init.BaseCli
-
k1lib.cli.utils.
headerIdx
()[source]¶ Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know a column's index so you can cut it out. Also sets the context variable “header”, in case you need it later. Example:
# returns [[0, 'a'], [1, 'b'], [2, 'c']]
["abc"] | headerIdx() | deref()
-
class
k1lib.cli.utils.
deref
(maxDepth=inf, igT=True)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(maxDepth=inf, igT=True)[source]¶ Recursively converts any iterator into a list. Only
str
, numbers.Number
and Module
are not converted. Example:
# returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5))
# returns [0, 1, 2, 3, 4]
iter(range(5)) | deref()
# returns [2, 3], yieldSentinel stops things early
[2, 3, yieldSentinel, 6] | deref()
You can also specify a
maxDepth
:
# returns something like "<list_iterator at 0x7f810cf0fdc0>"
iter([range(3)]) | deref(0)
# returns [range(3)]
iter([range(3)]) | deref(1)
# returns [[0, 1, 2]]
iter([range(3)]) | deref(2)
There are a few classes/types that are considered atomic, and
deref
will never try to iterate over them. If you wish to change this, do something like:
settings.cli.atomic.deref = (int, float, ...)
- Parameters
maxDepth – maximum depth to dereference. Starts at 0 for not doing anything at all
igT – short for “ignore tensor”. If True, then don’t loop over
torch.Tensor
and numpy.ndarray
internals
-
__invert__
() → k1lib.cli.init.BaseCli[source]¶ Returns a
BaseCli
that makes everything an iterator. Not entirely sure when this comes in handy, but it’s there.
-
-
class
k1lib.cli.utils.
bindec
(cats: List[Any], f=None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(cats: List[Any], f=None)[source]¶ Binary decodes the input: the set bits of the incoming number select the corresponding categories (5 is 0b101, so positions 0 and 2 are selected). Example:
# returns ['a', 'c']
5 | bindec("abcdef")
# returns 'a,c'
5 | bindec("abcdef", join(","))
- Parameters
cats – categories
f – transformation function of the selected elements. Defaulted to
toList
, but others like join
are useful too
-
-
class
k1lib.cli.utils.
smooth
(consecutives=None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(consecutives=None)[source]¶ Smoothes out the input stream. Literally just a shortcut for:
batched(consecutives) | toMean().all()
Example:
# returns [4.5, 14.5, 24.5]
range(30) | smooth(10) | deref()
Smoothing over
torch.Tensor
or numpy.ndarray
will be much faster, and produce high-dimensional results:
# returns torch.Tensor with shape (2, 3, 4)
torch.randn(10, 3, 4) | smooth(4)
The default consecutive value is in
settings.cli.smooth
. This is useful if you are smoothing over multiple lists at the same time, like this:
# can change a single smooth value temporarily here, and all sequences will be smoothed in the same way
with settings.cli.context(smooth=5):
    x = list(np.linspace(-2, 2, 50))
    y = x | apply(op()**2) | deref()
    plt.plot(x | smooth() | deref(), y | smooth() | deref())
- Parameters
consecutives – if not defined, then uses the value inside
settings.cli.smooth
-
-
k1lib.cli.utils.
disassemble
(f=None)[source]¶ Disassembles anything piped into it. Normal usage:
def f(a, b): return a**2 + b
# both of these print out disassembled info
f | disassemble()
disassemble(f)
# you can pass in lambdas
disassemble(lambda x: x + 3)
# or even raw code
"lambda x: x + 3" | disassemble()
-
k1lib.cli.utils.
tree
(fL=10, dL=10, depth=inf, ff: Callable[[str], bool] = <function <lambda>>, df: Callable[[str], bool] = <function <lambda>>)[source]¶ Recursively gets all files and folders. Output format might be a bit strange, so this is mainly for visualization. Example:
"." | tree() | deref()
- Parameters
fL – max number of files per directory included in the output
dL – max number of child directories per directory included in output
depth – explore depth
ff – optional file filter function
df – optional directory filter function
-
class
k1lib.cli.utils.
lookup
(d: dict, col: Optional[int] = None)[source]¶ Bases:
k1lib.cli.init.BaseCli
-
__init__
(d: dict, col: Optional[int] = None)[source]¶ Looks up items from a dictionary/object. Example:
d = {"a": 3, "b": 5, "c": 52} # returns [3, 5, 52, 52, 3] "abcca" | lookup(d) | deref() # returns [[0, 3], [1, 5], [2, 52], [3, 52], [4, 3]] [range(5), "abcca"] | transpose() | lookup(d, 1) | deref()
- Parameters
d – any object that can be sliced with the inputs
col – if None, lookup on each row, else lookup a specific column only
-
others module¶
This is for pretty random clis that are scattered everywhere.
Elsewhere in the library¶
There might still be more cli tools scattered around the library. These are pretty rare and quite dynamic, and they are most likely cool extra features rather than core functionality, so they aren't worth listing here individually. Anyway, execute this:
cli.scatteredClis()
to get a list of them.