k1lib.cli module

The main idea of this package is to emulate the terminal (hence “cli”, or “command line interface”), but do all of that inside Python itself. So this bash statement:

cat file.txt | head -5 > headerFile.txt

Turns into this statement:

cat("file.txt") | head(5) > file("headerFile.txt")
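For comparison, here is roughly what that pipeline does, written in plain Python with no k1lib involved. This is only a sketch for intuition; `headToFile` is a hypothetical helper name, not part of k1lib.

```python
from itertools import islice

def headToFile(src: str, dst: str, n: int = 5) -> None:
    """Hypothetical helper: plain-Python equivalent of `cat src | head -n > dst`."""
    with open(src) as fin, open(dst, "w") as fout:
        # islice lazily takes only the first n lines instead of loading the whole file
        fout.writelines(islice(fin, n))

# Usage: headToFile("file.txt", "headerFile.txt")
```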

You can even integrate with existing shell commands:

ls("~") | cmd("grep so")

Here, “ls” lists the files inside the home directory and pipes them into the regular grep on Linux, whose output is then piped back into Python as a list of strings. So it’s equivalent to this bash statement:

ls ~ | grep so
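Conceptually, piping into a shell command means feeding lines to the command's stdin and splitting its stdout back into lines. A minimal sketch of that idea using the standard subprocess module follows; `runCmd` is a hypothetical name, this is not how cmd is actually implemented, and it assumes grep is available on the system.

```python
import subprocess
from typing import Iterable, List

def runCmd(lines: Iterable[str], command: str) -> List[str]:
    """Hypothetical stand-in for cmd(): feed lines to a shell command, return its output lines."""
    result = subprocess.run(
        command,
        shell=True,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
    )
    # No check=True: grep exits with status 1 when nothing matches, which is not an error here
    return result.stdout.splitlines()

print(runCmd(["soup", "salad", "also"], "grep so"))  # ['soup', 'also']
```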

“cat”, “head”, “file”, “ls” and “cmd” are all classes that extend BaseCli. All of them implement the “reverse or” operation, __ror__. So essentially, these two statements are equivalent:

3 | obj
obj.__ror__(3)
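To see why this works, here is a minimal sketch of cli-like classes. `BaseCliSketch`, `headSketch` and `doubleSketch` are hypothetical stand-ins, not k1lib's actual classes. For `a | b`, Python first tries `a.__or__(b)`; ranges and lists don't define `__or__`, so Python falls back to `b.__ror__(a)`.

```python
from itertools import islice

class BaseCliSketch:
    """Hypothetical stand-in for BaseCli: subclasses override __ror__."""
    def __ror__(self, it):
        raise NotImplementedError

class headSketch(BaseCliSketch):
    """Hypothetical: keeps the first n elements, similar in spirit to head(n)."""
    def __init__(self, n: int = 10):
        self.n = n
    def __ror__(self, it):
        return list(islice(it, self.n))

class doubleSketch(BaseCliSketch):
    """Hypothetical: doubles every element."""
    def __ror__(self, it):
        return [2 * x for x in it]

h = headSketch(3)
print(range(10) | h)                   # [0, 1, 2]
print(h.__ror__(range(10)))            # [0, 1, 2], the same call spelled out
# | is left-associative, so this is (range(10) | h) | doubleSketch()
print(range(10) | h | doubleSketch())  # [0, 2, 4]
```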

Also, many of these tools (like apply and filt) assume that they are operating on a table. So this table:

col1	col2	col3
1	2	3
4	5	6

Is equivalent to this list:

[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]

transpose and mtmS provide more flexible ways to transform a table structure (but usually involve more code).
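In plain Python terms, treating the list as a table means each inner list is a row. The sketch below shows the kinds of transformations these tools perform, written without k1lib: `zip(*table)` transposes rows into columns, and `applyToColumn` is a hypothetical helper (not k1lib's apply) that changes one column of every row.

```python
table = [["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]

# Transpose: rows become columns
columns = [list(col) for col in zip(*table)]
print(columns)  # [['col1', 1, 4], ['col2', 2, 5], ['col3', 3, 6]]

def applyToColumn(rows, f, column):
    """Hypothetical helper: returns new rows with f applied to each row's given column."""
    return [[f(v) if j == column else v for j, v in enumerate(row)] for row in rows]

# Skip the header row, then multiply column 1 by 10
print(applyToColumn(table[1:], lambda x: x * 10, 1))  # [[1, 20, 3], [4, 50, 6]]

# Keep only rows whose first column is greater than 2
print([row for row in table[1:] if row[0] > 2])  # [[4, 5, 6]]
```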

The expected way to use these tools is to import everything directly into the current environment, like this:

from k1lib.imports import *

Because there are a lot of clis, you may sometimes unintentionally overwrite an exposed cli tool. No worries: every tool is also available under the cli object, so if you have overwritten deref(), you can still use cli.deref().

Besides operating on plain string iterators, this package can also go extra meta and operate on streams of strings, or even streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check out this tutorial:

Streams tutorial
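As a rough illustration of a stream of streams in plain Python: a generator can yield other iterators, and each inner one can then be processed on its own. `chunks` is a hypothetical helper written for this sketch, not a k1lib function.

```python
from itertools import islice
from typing import Iterable, Iterator

def chunks(it: Iterable, n: int) -> Iterator[Iterator]:
    """Hypothetical helper: splits one stream into a stream of streams, each up to n long."""
    it = iter(it)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield iter(chunk)

# Take the first 2 elements of every inner stream
print([list(islice(s, 2)) for s in chunks(range(10), 4)])  # [[0, 1], [4, 5], [8, 9]]
```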

All cli tools should work fine with PyTorch tensors, but not with numpy arrays. This is because numpy arrays implement the __or__ operator themselves, and Python tries the left operand’s __or__ before the cli tool’s __ror__, so the cli tool never receives the whole array. Workarounds look like this:

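import numpy as np
import torch
from k1lib.imports import *

# returns (2, 3, 5), works fine
torch.randn(2, 3, 5) | shape()
# will not work: numpy's own __or__ runs first and returns a weird numpy array of shape (2, 3, 5)
np.random.randn(2, 3, 5) | shape()
# returns (2, 3, 5), mitigation strategy #1: call the cli directly
shape()(np.random.randn(2, 3, 5))
# returns (2, 3, 5), mitigation strategy #2: wrap the array in a list, which has no __or__
[np.random.randn(2, 3, 5)] | (item() | shape())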
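The underlying rule is plain Python: for `a | b`, Python calls `a.__or__(b)` first and only falls back to `b.__ror__(a)` if that returns NotImplemented. The sketch below uses hypothetical classes (not numpy or k1lib) to show the difference between a left operand that claims every `|` and one that declines.

```python
class Greedy:
    """Hypothetical left operand whose __or__ accepts anything, as numpy arrays effectively do."""
    def __or__(self, other):
        return "Greedy.__or__ won"

class Polite:
    """Hypothetical left operand whose __or__ declines unknown types."""
    def __or__(self, other):
        return NotImplemented

class Tool:
    """Hypothetical cli-like object that only implements __ror__."""
    def __ror__(self, other):
        return "Tool.__ror__ won"

print(Greedy() | Tool())    # Greedy.__or__ won
print(Polite() | Tool())    # Tool.__ror__ won
print([Greedy()] | Tool())  # Tool.__ror__ won, because list has no __or__
```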

All settings for this module are in settings, under the name “cli”.

Where to start?

Core clis include apply (its multiprocessing cousin applyMp is great too), applyS, op, filt, deref, item, shape, iden and cmd, so start reading there. Then skim over everything else to learn what this collection of tools can do. While you’re doing that, check out trace(), a quite powerful debugging tool.

There are several written tutorials about cli here, and I also made some video tutorials, so go check those out.

bio module

This is for functions that are actually biology-related.

k1lib.cli.bio.go(term: int)

Looks up a GO (Gene Ontology) term.

k1lib.cli.bio.quality(log=True)

Gets the numeric quality scores of a sequence's quality string. Example:

# returns [2, 2, 5, 30]
"##&?" | quality() | deref()

Parameters: log – whether to use log scale (0 -> 40), or linear scale (1 -> 0.0001)
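The example numbers match the common Phred+33 encoding used in FASTQ files, where each character's ASCII code minus 33 is the quality score Q, and the linear scale is the error probability 10^(-Q/10). Assuming that encoding, a hypothetical re-implementation (`phredQuality` is not k1lib's actual code) might look like:

```python
from typing import List

def phredQuality(qualStr: str, log: bool = True) -> List[float]:
    """Hypothetical re-implementation, assuming Phred+33 encoding."""
    # '#' (ASCII 35) -> 2, '&' (ASCII 38) -> 5, '?' (ASCII 63) -> 30
    scores = [ord(c) - 33 for c in qualStr]
    if log:
        return scores
    # Linear scale: error probability 10^(-Q/10), so Q=0 -> 1 and Q=40 -> 0.0001
    return [10 ** (-q / 10) for q in scores]

print(phredQuality("##&?"))  # [2, 2, 5, 30]
print(phredQuality("!I", log=False))  # '!' is Q=0 and 'I' is Q=40
```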

k1lib.cli.bio.longFa()

Takes in a fasta file and puts each sequence on one line. Given the file “gene.fa”:

>AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATC
TTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
>something other gene
CGGACACACAAAAAGAAAGAAGA
TTTTGTGTGCGAATAACTATGAG

Code:

cat("gene.fa") | bio.longFa() | cli.headOut()

Prints out:

>AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome
CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
>something other gene
CGGACACACAAAAAGAAAGAAGATTTTGTGTGCGAATAACTATGAG
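The reshaping itself is simple: keep every header line, and join the sequence lines that follow it. A plain-Python sketch of that logic, where `joinFasta` is a hypothetical name and not k1lib's implementation:

```python
from typing import Iterable, Iterator, List

def joinFasta(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical sketch: yields each header, then its whole sequence on one line."""
    seq: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header ends the previous record, so flush its sequence first
            if seq:
                yield "".join(seq)
                seq = []
            yield line
        elif line:
            seq.append(line)
    if seq:
        yield "".join(seq)

# Usage: with open("gene.fa") as f: print("\n".join(joinFasta(f)))
```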

class k1lib.cli.bio.idx(fs: list = [])
