k1lib.cli module¶

The main idea of this package is to emulate the terminal, but to do all of that inside Python itself. So this bash statement:

cat file.txt | head -5 > headerFile.txt

Turns into this statement:

cat("file.txt") | head(5) > file("headerFile.txt")

Here, “cat”, “head” and “file” are all classes extended from BaseCli. All of them implement the “reverse or” operation, __ror__. Essentially, these 2 statements are equivalent:

3 | obj
obj.__ror__(3)
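The dispatch can be seen with a minimal sketch. The Double class below is a hypothetical stand-in, not part of k1lib; it only shows that Python falls back to the right operand's __ror__ when the left operand cannot handle the | operator:

```python
class Double:
    """Hypothetical cli-like object, used only to illustrate __ror__."""

    def __ror__(self, value):
        # Reached for `value | Double()` because int.__or__ returns
        # NotImplemented when the right operand is not an int.
        return value * 2


obj = Double()
piped = 3 | obj           # operator form
direct = obj.__ror__(3)   # explicit method call
print(piped, direct)
```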

Also, a lot of these tools assume that we are operating on a table. So this table:

col1	col2	col3
1	2	3
4	5	6

Is equivalent to this list:

[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
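As a rough illustration in plain Python (no k1lib calls involved), a tab-separated text can be turned into that nested list like this. The variable names are made up for the example:

```python
text = "col1\tcol2\tcol3\n1\t2\t3\n4\t5\t6"

# Split into rows on newlines, then into cells on tabs.
rows = [line.split("\t") for line in text.splitlines()]

# Keep the header as strings and convert the remaining cells to int.
table = [rows[0]] + [[int(cell) for cell in row] for row in rows[1:]]
print(table)
```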

Also, the expected way to use these tools is to import everything directly into the current environment, like this:

from k1lib.imports import *

Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check this out:

Streams tutorial

Core clis include apply, applyS (its multiprocessing cousins applyMp and applyMpBatched are great too), op and deref, so start reading there first. Then, skim over everything to know what you can do with this collection of tools.

bio module¶

This is for functions that are actually biology-related

k1lib.cli.bio.go(term: int)[source]¶: Looks up a GO term

class k1lib.cli.bio.transcribe(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Transcribes (DNA -> RNA) incoming rows

__ror__(it: Union[Iterator[str], str])[source]¶

class k1lib.cli.bio.complement(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

__ror__(it: Union[Iterator[str], str])[source]¶

class k1lib.cli.bio.translate(length: int = 0)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(length: int = 0)[source]¶

Translates incoming rows.

Parameters: length – 0 for short (L), 1 for med (Leu), 2 for long (Leucine)

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.bio.medAa(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Converts short aa sequence to medium one

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.bio.longAa(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Converts short aa sequence to long one

__ror__(it: Iterator[str])[source]¶

entrez module¶

This module is not really fleshed out and not that useful or elegant, so I just use cmd instead

k1lib.cli.entrez.esearch(db: str = 'nucleotide', query: str = 'PRJNA257197')[source]¶

k1lib.cli.entrez.efetch(db: Optional[str] = None, ids: Optional[Union[str, List[str]]] = None, format: Optional[str] = None)[source]¶

mgi module¶

All tools related to the MGI database. Expected to be used behind the “mgi” module name, like this:

from k1lib.imports import *
["SOD1", "AMPK"] | mgi.batch()

class k1lib.cli.mgi.batch(headless=True)[source]¶

Bases: k1lib.cli.init.BaseCli

Queries the MGI database, and converts a list of genes to MGI ids

__init__(headless=True)[source]¶

Parameters: headless – whether to run this operation headless, or actually display the browser

__ror__(it: List[str])[source]¶

filt module¶

This is for functions that cut out specific parts of the table

class k1lib.cli.filt.filt(predicate: Callable[[T], bool], column: Optional[int] = None)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(predicate: Callable[[T], bool], column: Optional[int] = None)[source]¶

Filters out lines that don’t satisfy the predicate. Examples:

# returns [2, 6]
[2, 3, 5, 6] | filt(lambda x: x%2 == 0) | deref()
# returns [3, 5]
[2, 3, 5, 6] | ~filt(lambda x: x%2 == 0) | deref()
# returns [[2, 'a'], [6, 'c']]
[[2, "a"], [3, "b"], [5, "a"], [6, "c"]] | filt(lambda x: x%2 == 0, 0) | deref()

Parameters

column –

if integer, then predicate(row[column])
if None, then predicate(row)

__ror__(it: Iterator[T]) → Iterator[T][source]¶

__invert__()[source]¶: Negate the condition
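The column handling and the ~ negation described above can be sketched in a few lines. This is an illustrative stand-in under my own assumptions (the Filt class and its negate argument are hypothetical), not the library's implementation:

```python
class Filt:
    """Hypothetical filter: keeps rows whose (column) value passes the predicate."""

    def __init__(self, predicate, column=None, negate=False):
        self.predicate = predicate
        self.column = column
        self.negate = negate

    def __ror__(self, it):
        for row in it:
            # column=None tests the row itself, otherwise row[column]
            value = row if self.column is None else row[self.column]
            if bool(self.predicate(value)) != self.negate:
                yield row

    def __invert__(self):
        # ~Filt(...) returns a copy with the condition flipped
        return Filt(self.predicate, self.column, not self.negate)


evens = list([2, 3, 5, 6] | Filt(lambda x: x % 2 == 0))
odds = list([2, 3, 5, 6] | ~Filt(lambda x: x % 2 == 0))
by_column = list([[2, "a"], [3, "b"], [5, "a"], [6, "c"]] | Filt(lambda x: x % 2 == 0, 0))
print(evens, odds, by_column)
```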

k1lib.cli.filt.isValue(value, column: Optional[int] = None) → k1lib.cli.filt.filt [source]¶

Filters out lines that are different from the given value. Example:

# returns [2, 2]
[1, 2, 3, 2, 1] | isValue(2) | deref()
# returns [1, 3, 1]
[1, 2, 3, 2, 1] | ~isValue(2) | deref()
# returns [[1, 2]]
[[1, 2], [2, 1], [3, 4]] | isValue(2, 1) | deref()

k1lib.cli.filt.isFile() → k1lib.cli.filt.filt [source]¶

Filters out non-files. Example:

# returns ["a.py", "b.py"], if those files really do exist
["a.py", "hg/", "b.py"] | isFile()

k1lib.cli.filt.inSet(values: Set[Any], column: Optional[int] = None) → k1lib.cli.filt.filt [source]¶

Filters out lines that are not in the specified set. Example:

# returns [2, 3]
range(5) | inSet([2, 8, 3]) | deref()
# returns [0, 1, 4]
range(5) | ~inSet([2, 8, 3]) | deref()

k1lib.cli.filt.contains(s: str, column: Optional[int] = None) → k1lib.cli.filt.filt [source]¶

Filters out lines that don’t contain the specified substring. Sort of similar to grep, but this is simpler, and can be inverted. Example:

# returns ['abcd', '2bcr']
["abcd", "0123", "2bcr"] | contains("bc") | deref()

class k1lib.cli.filt.empty(reverse=False)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(reverse=False)[source]¶

Filters out streams that are not empty. Almost always used inverted, but “empty” is a short, sweet name that is easy to remember. Example:

# returns [[1, 2], ['a']]
[[], [1, 2], [], ["a"]] | ~empty() | deref()

Parameters: reverse – not intended to be used by the end user. Do ~empty() instead.

__ror__(streams: Iterator[Iterator[T]]) → Iterator[Iterator[T]][source]¶

__invert__()[source]¶

k1lib.cli.filt.startswith(s: str, column: Optional[int] = None) → k1lib.cli.filt.filt [source]¶

Filters out lines that don’t start with s. Example:

# returns ['ab', 'ac']
["ab", "cd", "ac"] | startswith("a") | deref()
# returns ['cd']
["ab", "cd", "ac"] | ~startswith("a") | deref()

k1lib.cli.filt.endswith(s: str, column: Optional[int] = None) → k1lib.cli.filt.filt [source]¶: Filters out lines that don’t end with s. See also: startswith()

k1lib.cli.filt.isNumeric(column: Optional[int] = None) → k1lib.cli.filt.filt [source]¶: Filters out a line if that column is not a number. Example:

# returns [0, 2, '3']
[0, 2, "3", "a"] | isNumeric() | deref()

k1lib.cli.filt.instanceOf(cls: Union[type, Tuple[type]], column: Optional[int] = None) → k1lib.cli.filt.filt [source]¶

Filters out lines that are not instances of the given type. Example:

# returns [2]
[2, 2.3, "a"] | instanceOf(int) | deref()
# returns [2, 2.3]
[2, 2.3, "a"] | instanceOf((int, float)) | deref()

k1lib.cli.filt.inRange(min: float = - inf, max: float = inf, column: Optional[int] = None) → k1lib.cli.filt.filt [source]¶

Checks whether a column is in range or not. Example:

# returns [-2, 3, 6]
[-2, -8, 3, 6] | inRange(min=-3) | deref()
# returns [-8]
[-2, -8, 3, 6] | ~inRange(min=-3) | deref()

class k1lib.cli.filt.head(n: int = 10)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(n: int = 10)[source]¶

Only outputs the first n lines. You can also negate it (like ~head(5)), which then outputs everything after the first n lines. Examples:

"abcde" | head(2) | deref() # returns ["a", "b"]
"abcde" | ~head(2) | deref() # returns ["c", "d", "e"]
"0123456" | head(-3) | deref() # returns ['0', '1', '2', '3']
"0123456" | ~head(-3) | deref() # returns ['4', '5', '6']

__ror__(it: Iterator[T]) → Iterator[T][source]¶

__invert__()[source]¶
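The four cases above (positive or negative n, plain or inverted) can be reproduced with a small plain-Python sketch. The take_head function and its invert flag are hypothetical names for illustration, and unlike a streaming implementation this version materializes the input when n is negative:

```python
from itertools import islice


def take_head(it, n=10, invert=False):
    """Hypothetical helper mirroring head(n) and ~head(n) on any iterable."""
    if n >= 0:
        it = iter(it)
        first = list(islice(it, n))   # consumes the first n items
        return list(it) if invert else first
    items = list(it)                  # negative n needs the total length
    return items[n:] if invert else items[:n]


print(take_head("abcde", 2))
print(take_head("abcde", 2, invert=True))
print(take_head("0123456", -3))
print(take_head("0123456", -3, invert=True))
```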

class k1lib.cli.filt.columns(*columns: List[int])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(*columns: List[int])[source]¶

Cuts out specific columns, sliceable. Examples:

["0123456789"] | cut(5, 8) | deref() # returns [['5', '8']]
["0123456789"] | cut(2) | deref() # returns ['2']
["0123456789"] | ~cut()[:7:2] | deref() # returns [['1', '3', '5', '7', '8', '9']]

If you’re selecting only 1 column, then Iterator[T] will be returned, not Table[T].

__ror__(it: k1lib.cli.init.Table[T]) → k1lib.cli.init.Table[T][source]¶

__invert__()[source]¶

k1lib.cli.filt.cut¶: alias of k1lib.cli.filt.columns

class k1lib.cli.filt.rows(*rows: List[int])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(*rows: List[int])[source]¶

Cuts out specific rows. Space complexity O(1) as a list is not constructed (unless you’re using some really weird slices).

Parameters: rows – ints for the row indices

Example:

"0123456789" | rows(2) | deref() # returns ["2"]
"0123456789" | rows(5, 8) | deref() # returns ["5", "8"]
"0123456789" | rows()[2:5] | deref() # returns ["2", "3", "4"]
"0123456789" | ~rows()[2:5] | deref() # returns ["0", "1", "5", "6", "7", "8", "9"]
"0123456789" | ~rows()[:7:2] | deref() # returns ['1', '3', '5', '7', '8', '9']
"0123456789" | rows()[:-4] | deref() # returns ['0', '1', '2', '3', '4', '5']
"0123456789" | ~rows()[:-4] | deref() # returns ['6', '7', '8', '9']

__invert__()[source]¶

__ror__(it: Iterator[str])[source]¶
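To make the slice and inversion semantics concrete, here is a rough sketch in plain Python. select_rows is a hypothetical helper, and it builds a list of the input for simplicity, so it does not have the O(1) space behaviour described above:

```python
def select_rows(it, selector, invert=False):
    """Hypothetical helper: selector is a slice or a collection of indices."""
    items = list(it)
    if isinstance(selector, slice):
        # Slicing a range resolves negative bounds and steps for us.
        chosen = set(range(len(items))[selector])
    else:
        chosen = set(selector)
    # Output follows the input order, whatever order the indices were given in.
    return [x for i, x in enumerate(items) if (i in chosen) != invert]


digits = "0123456789"
print(select_rows(digits, [5, 8]))
print(select_rows(digits, slice(2, 5), invert=True))
print(select_rows(digits, slice(None, 7, 2), invert=True))
print(select_rows(digits, slice(None, -4)))
```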

class k1lib.cli.filt.intersection(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Returns the intersection of multiple streams. Example:

# returns set([2, 4, 5])
[[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]] | intersection()

__ror__(its: Iterator[Iterator[Any]]) → Set[Any][source]¶

class k1lib.cli.filt.union(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Returns the union of multiple streams. Example:

# returns {0, 1, 2, 10, 11, 12, 13, 14}
[range(3), range(10, 15)] | union()

__ror__(its: Iterator[Iterator[Any]]) → Set[Any][source]¶
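For comparison, the same two results can be obtained with built-in sets. This is only a plain-Python equivalent of the examples above, not how intersection and union are implemented:

```python
from functools import reduce

streams = [[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]]
# Fold set.intersection over every stream, left to right.
common = reduce(set.intersection, map(set, streams))

# set.union accepts any number of iterables at once.
combined = set().union(*[range(3), range(10, 15)])
print(common, combined)
```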

class k1lib.cli.filt.unique(column: int)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(column: int)[source]¶

Filters out non-unique row elements. Example:

# returns [[1, "a"], [2, "a"]]
[[1, "a"], [2, "a"], [1, "b"]] | unique(0) | deref()

Parameters: column – doesn’t have the default case of None, because you can always use k1lib.cli.utils.toSet

__ror__(it: k1lib.cli.init.Table[T]) → k1lib.cli.init.Table[T][source]¶

class k1lib.cli.filt.breakIf(f)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(f)[source]¶

Breaks the input iterator if a condition is met. Example:

# returns [0, 1, 2, 3, 4, 5]
[*range(10), 2, 3] | breakIf(lambda x: x > 5) | deref()

__ror__(it: Iterator[T]) → Iterator[T][source]¶

class k1lib.cli.filt.mask(mask: Iterator[bool])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(mask: Iterator[bool])[source]¶

Masks the input stream. Example:

# returns [0, 1, 3]
range(5) | mask([True, True, False, True, False]) | deref()

__ror__(it)[source]¶
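Both breakIf and mask have close analogues in the standard library, which may help when reasoning about their behaviour. The sketch below reproduces the two examples with itertools; it says nothing about how the clis are written internally:

```python
from itertools import compress, takewhile

stream = [*range(10), 2, 3]
# Stop at the first element for which the break condition holds.
before_break = list(takewhile(lambda x: not x > 5, stream))

# Keep only the elements whose mask entry is truthy.
masked = list(compress(range(5), [True, True, False, True, False]))
print(before_break, masked)
```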

gb module¶

All tools related to the GenBank file format. Expected to be used behind the “gb” module name, like this:

from k1lib.imports import *
cat("abc.gb") | gb.feats()

class k1lib.cli.gb.feats(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Fetches features, each on a separate stream

__ror__(it)[source]¶

static filt(*terms: str) → k1lib.cli.init.BaseCli [source]¶: Filters for specific terms in all the feature texts. If there are multiple terms, then filters for the first term, then the second, then the third, so the order of the terms might matter to you

static tag(tag: str) → k1lib.cli.init.BaseCli [source]¶: Gets a single tag out. Applies this on a single feature only

class k1lib.cli.gb.origin(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Returns the origin section of the GenBank file

__ror__(it)[source]¶

grep module¶

class k1lib.cli.grep.grep(pattern: str, before: int = 0, after: int = 0)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(pattern: str, before: int = 0, after: int = 0)[source]¶

Finds lines that have the specified pattern. Example:

# returns ['c', 'd', '2', 'd']
"abcde12d34" | grep("d", 1) | deref()
# returns ['d', 'e', 'd', '3', '4']
"abcde12d34" | grep("d", 0, 3).till("e") | deref()

Parameters

pattern – regex pattern to search for in a line
before – lines before the hit. Outputs independent lines
after – lines after the hit. Outputs independent lines

till(pattern: str)[source]¶: Greps until some other pattern appears. Before lines will be honored, but after lines will be set to inf. Inclusive.

__ror__(it: Iterator[str]) → Iterator[str][source]¶
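The before/after context logic can be sketched with a bounded buffer. The grep_lines function below is a hypothetical, simplified version written for illustration (it ignores till() and may treat overlapping hits differently from the real cli):

```python
import re
from collections import deque


def grep_lines(lines, pattern, before=0, after=0):
    """Yield matching lines plus `before`/`after` lines of context."""
    regex = re.compile(pattern)
    history = deque(maxlen=before)   # most recent non-emitted lines
    remaining_after = 0
    for line in lines:
        if regex.search(line):
            yield from history       # context before the hit
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line               # context after the hit
            remaining_after -= 1
        else:
            history.append(line)


with_before = list(grep_lines("abcde12d34", "d", before=1))
with_after = list(grep_lines("abcde12d34", "d", after=1))
print(with_before, with_after)
```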

class k1lib.cli.grep.grepToTable(pattern: str, before: int = 0, after: int = 0)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(pattern: str, before: int = 0, after: int = 0)[source]¶

Searches for a pattern. If found, then puts all the before and after lines in different columns. Example:

# returns [['2', 'b'], ['5', 'b']]
"1a\n 2b\n 3c\n 4d\n 5b\n 6c\n f" | grepToTable("b", 1) | deref()

__ror__(it: Iterator[str]) → k1lib.cli.init.Table[str][source]¶

class k1lib.cli.grep.grepTemplate(pattern: str, template: str)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(pattern: str, template: str)[source]¶: Searches over all lines, picks out the match, expands it to the template, and yields it

__ror__(it: Iterator[str])[source]¶

init module¶

cli.cliSettings = {'defaultDelim': '\t', 'defaultIndent': ' ', 'lookupImgs': True, 'oboFile': None, 'strict': False}¶

Main settings of k1lib.cli. When using:

from k1lib.cli import *

…you can just set the settings like this:

cliSettings["defaultIndent"] = "\t"

There are a few settings:

defaultDelim: default delimiter used in-between columns when creating tables

defaultIndent: default indent used for displaying nested structures

lookupImgs: whether to automatically look up images when exploring something

oboFile: gene ontology obo file location

strict: whether strict mode is on. Turning it on can help you debug stuff, but could also be a pain to work with

class k1lib.cli.init.BaseCli(fs=[])[source]¶

Bases: object

A base class for all the cli stuff. You can definitely create new cli tools that have the same feel without extending from this class, but advanced stream operations (like +, &, .all(), |) won’t work.

At the moment, you don’t have to call super().__init__() and super().__ror__(), as __init__’s only job right now is to solidify any op passed to it, and __ror__ does nothing.

__init__(fs=[])[source]¶

Not expected to be instantiated by the end user.

Parameters: fs – if a function in here is actually an op, then solidifies it (makes it not absorb __call__ anymore)

__and__(cli: k1lib.cli.init.BaseCli) → k1lib.cli.init.oneToMany [source]¶: Duplicates input stream to multiple joined clis.

__add__(cli: k1lib.cli.init.BaseCli) → k1lib.cli.init.manyToManySpecific [source]¶: Parallel pass multiple streams to multiple clis.

all() → k1lib.cli.init.BaseCli [source]¶: Applies this cli to all incoming streams

__or__(it) → k1lib.cli.init.serial [source]¶: Joins clis end-to-end

__ror__(it)[source]¶

f() → k1lib.cli.init.Table[k1lib.cli.init.Table[int]][source]¶: Creates a normal function f(x) which is equivalent to x | self.

__lt__(it)[source]¶: Default backup join symbol >, in case it implements __ror__()

__call__(it, *args)[source]¶

Another way to do it | cli. If multiple arguments are fed, then the argument list is passed to cli instead of just the first element. Example:

@applyS
def f(it):
    return it
f(2) # returns 2
f(2, 3) # returns [2, 3]

class k1lib.cli.init.serial(*clis: List[k1lib.cli.init.BaseCli])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(*clis: List[k1lib.cli.init.BaseCli])[source]¶

Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:

[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist

__ror__(it: Iterator[Any]) → Iterator[Any][source]¶

class k1lib.cli.init.oneToMany(*clis: List[k1lib.cli.init.BaseCli])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(*clis: List[k1lib.cli.init.BaseCli])[source]¶: Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator

__ror__(it: Iterator[Any]) → Iterator[Iterator[Any]][source]¶

class k1lib.cli.init.manyToMany(cli: k1lib.cli.init.BaseCli)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(cli: k1lib.cli.init.BaseCli)[source]¶: Applies multiple streams to a single cli. Used in the “a.all()” operator. Note that this operation will use a different copy of the cli for each of the streams.

__ror__(it: Iterator[Iterator[Any]]) → Iterator[Iterator[Any]][source]¶

class k1lib.cli.init.manyToManySpecific(*clis: List[k1lib.cli.init.BaseCli])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(*clis: List[k1lib.cli.init.BaseCli])[source]¶: Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator

__ror__(its: Iterator[Any]) → Iterator[Any][source]¶
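The four joining patterns (serial, oneToMany, manyToMany, manyToManySpecific) can be summarized with plain functions. This is a conceptual sketch only: the snake_case helpers are hypothetical, they work on ordinary callables rather than BaseCli objects, and they return lists for readability:

```python
from itertools import tee


def serial(*clis):
    """a | b: feed the output of each cli into the next one."""
    def run(it):
        for cli in clis:
            it = cli(it)
        return it
    return run


def one_to_many(*clis):
    """a & b: duplicate one stream, one copy per cli."""
    def run(it):
        copies = tee(it, len(clis))
        return [cli(copy) for cli, copy in zip(clis, copies)]
    return run


def many_to_many(cli):
    """a.all(): apply the same cli to every incoming stream."""
    return lambda streams: [cli(s) for s in streams]


def many_to_many_specific(*clis):
    """a + b: pair the i-th stream with the i-th cli."""
    return lambda streams: [cli(s) for cli, s in zip(clis, streams)]


chained = serial(sorted, sum)([3, 1, 2])
duplicated = one_to_many(sum, max)(iter([1, 2, 3]))
broadcast = many_to_many(sum)([[1, 2], [3, 4]])
paired = many_to_many_specific(sum, max)([[1, 2], [3, 4]])
print(chained, duplicated, broadcast, paired)
```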

inp module¶

This module is for tools that will likely start the processing stream.

k1lib.cli.inp.cat(fileName: Optional[str] = None, text: bool = True)[source]¶

Reads a file line by line. Example:

# display first 10 lines of file
cat("file.txt") | headOut()
# piping in also works
"file.txt" | cat() | headOut()

# copy a file in binary mode
cat("img.png", False) | file("img2.png", False)

Parameters

fileName – if None, then return a BaseCli that accepts a file name and outputs Iterator[str]
text – if True, read text file, else read binary file

k1lib.cli.inp.curl(url: str) → Iterator[str][source]¶

Gets file from url. File can’t be a binary blob. Example:

# prints out first 10 lines of the website
curl("https://k1lib.github.io/") | headOut()

k1lib.cli.inp.wget(url: str, fileName: Optional[str] = None)[source]¶

Downloads a file

Parameters

url – The url of the file
fileName – if None, then tries to infer it from the url

k1lib.cli.inp.ls(folder: Optional[str] = None)[source]¶

List every file and folder inside the specified folder. Example:

# returns List[str]
ls("/home")
# same as above
"/home" | ls()
# only outputs files, not folders
ls("/home") | isFile()

See also: isFile()

class k1lib.cli.inp.cmd(cmd: str)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(cmd: str)[source]¶

Runs a command, and returns the output line by line. Example:

# return detailed list of files
None | cmd("ls -la")
# return list of files that ends with "ipynb"
None | cmd("ls -la") | cmd('grep ipynb$')

property err¶: Error from the last command

__ror__(it: Optional[Iterator[str]]) → Iterator[str][source]¶: Pipes in lines of input, or if there’s nothing to pass, then pass None

k1lib.cli.inp.requireCli(cliTool: str)[source]¶: Searches for a particular cli tool (eg. “ls”), throws ImportError if not found, else does nothing

class k1lib.cli.inp.toPIL[source]¶

Bases: k1lib.cli.init.BaseCli

__init__()[source]¶

Converts a path to a PIL image. Example:

ls(".") | toPIL().all() | item() # get first image

__ror__(path) → PIL.Image.Image [source]¶

kcsv module¶

All tools related to the csv file format. Expected to be used behind the “kcsv” module name, like this:

from k1lib.imports import *
kcsv.cat("file.csv") | display()

k1lib.cli.kcsv.cat(file: str) → k1lib.cli.init.Table[str][source]¶: Opens a csv file, and turns it into nice row elements

kxml module¶

All tools related to the xml file format. Expected to be used behind the “kxml” module name, like this:

from k1lib.imports import *
cat("abc.xml") | kxml.node() | kxml.display()

class k1lib.cli.kxml.node(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Turns lines into a single node

__ror__(it: Iterator[str]) → Iterator[xml.etree.ElementTree.Element][source]¶

class k1lib.cli.kxml.maxDepth(depth: Optional[int] = None, copy: bool = True)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(depth: Optional[int] = None, copy: bool = True)[source]¶

Filters out nodes that are too deep

Parameters

depth – max depth to include
copy – whether to limit the nodes themselves, or limit a copy

__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶

class k1lib.cli.kxml.tag(tag: str)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(tag: str)[source]¶: Finds all tags that have a particular name. If found, then don’t search deeper

__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶

class k1lib.cli.kxml.pretty(indent: Optional[str] = None)[source]¶

Bases: k1lib.cli.init.BaseCli

__ror__(it: Iterator[xml.etree.ElementTree.Element]) → Iterator[str][source]¶

class k1lib.cli.kxml.display(depth: int = 3, lines: int = 20)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(depth: int = 3, lines: int = 20)[source]¶: Convenience method for getting the head, making it pretty, and printing it out

__ror__(it: Iterator[xml.etree.ElementTree.Element], lines=10)[source]¶

modifier module¶

This is for quick modifiers, think of them as changing formats

class k1lib.cli.modifier.apply(f: Callable[[str], str], column: Optional[int] = None)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(f: Callable[[str], str], column: Optional[int] = None)[source]¶

Applies a function f to every line. Example:

# returns [0, 1, 4, 9, 16]
range(5) | apply(lambda x: x**2) | deref()
# returns [[3.0, 1.0, 1.0], [3.0, 1.0, 1.0]]
torch.ones(2, 3) | apply(lambda x: x+2, 0) | deref()

You can also use this as a decorator, like this:

@apply
def f(x):
    return x**2
# returns [0, 1, 4, 9, 16]
range(5) | f | deref()

Parameters: column – if not None, then applies the function to that column only

__ror__(it: Iterator[str])[source]¶
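The effect of the column parameter can be illustrated without torch. apply_column below is a hypothetical plain-Python helper that mirrors the two examples above, using nested lists in place of a tensor:

```python
def apply_column(f, rows, column=None):
    """Apply f to each row, or only to row[column] when column is given."""
    for row in rows:
        if column is None:
            yield f(row)
        else:
            row = list(row)              # copy so the input is left untouched
            row[column] = f(row[column])
            yield row


squares = list(apply_column(lambda x: x**2, range(5)))
ones = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
first_column = list(apply_column(lambda x: x + 2, ones, 0))
print(squares, first_column)
```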

class k1lib.cli.modifier.applyMp(f: Callable[[T], T], prefetch: Optional[int] = None, timeout: float = 2, **kwargs)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(f: Callable[[T], T], prefetch: Optional[int] = None, timeout: float = 2, **kwargs)[source]¶

Like apply, but execute f(row) of each row in multiple processes. Example:

# returns [3, 2]
["abc", "de"] | applyMp(lambda s: len(s)) | deref()
# returns [5, 6, 9]
range(3) | applyMp(lambda x, bias: x**2+bias, bias=5) | deref()

# returns [[1, 2, 3], [1, 2, 3]], demonstrating outside vars work
someList = [1, 2, 3]
["abc", "de"] | applyMp(lambda s: someList) | deref()

Internally, this will continuously spawn new jobs up until 80% of all CPU cores are utilized. On POSIX systems other than macOS, the default multiprocessing start method has historically been fork(). This sort of means that all the variables in memory will be copied over. This might be expensive (or it might not, thanks to copy-on-write), so you might have to think about that. On Windows and macOS, the default start method is spawn, meaning each child process is a completely new interpreter, so you have to pass in all required variables and reimport every dependency. Read more at https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

If you don’t wish to schedule all jobs at once, you can specify a prefetch amount, and it will only schedule that many jobs ahead of time. Example:

range(10000) | applyMp(lambda x: x**2)    | head() | deref() # 700ms
range(10000) | applyMp(lambda x: x**2, 5) | head() | deref() # 300ms

# demonstrating there're no huge penalties even if we want all results at the same time
range(10000) | applyMp(lambda x: x**2)    | deref() # 900ms
range(10000) | applyMp(lambda x: x**2, 5) | deref() # 1000ms

The first line will schedule all jobs at once, and thus will require more RAM and compute power, even though we discard most of the results anyway (the head cli). The second line only schedules 5 jobs ahead of time, and thus will be much more efficient if you don’t need all results right away.
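The idea behind prefetch can be sketched as a bounded queue of pending jobs that is topped up only when a result is consumed. The example below is an assumption-laden illustration: apply_prefetch is a hypothetical function, and it uses threads from concurrent.futures instead of processes so that it stays self-contained and can run lambdas and closures:

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def apply_prefetch(f, it, prefetch=5, workers=4):
    """Yield f(x) in order, keeping at most `prefetch` jobs scheduled ahead."""
    it = iter(it)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque(pool.submit(f, x) for x in islice(it, prefetch))
        while pending:
            result = pending.popleft().result()
            for x in islice(it, 1):          # top up with one new job, if any
                pending.append(pool.submit(f, x))
            yield result


calls = []

def square(x):
    calls.append(x)      # record which inputs were actually processed
    return x * x

gen = apply_prefetch(square, range(10_000), prefetch=5)
first = list(islice(gen, 10))
gen.close()              # leaves the `with` block and shuts the pool down
print(first, len(calls))
```

Only a handful of the 10,000 inputs are ever submitted here, which is the saving the timings above point at.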

Note

Remember that every BaseCli is also a function, meaning that you can do stuff like:

# returns [['ab', 'ac']]
[["ab", "cd", "ac"]] | applyMp(startswith("a") | deref()) | deref()

Also remember that the return result of f should not be a generator. That’s why in the example above, there’s a deref() inside f.

Most of the time, you’d probably want to use applyMpBatched instead. That cli tool has the same look and feel as this, but executes f multiple times in a single job, instead of executing f only 1 time per job here, so should dramatically improve performance for most workloads.

One last thing. Remember to close all pools (using clearPools()) before exiting the script so that all child processes are terminated, and that resources are freed. For example, if you use CUDA tensors, but have not closed all pools yet, then it is possible that CUDA memory is not freed. I learned this the hard way. I’ve tried to use atexit to close pools automatically, but it doesn’t seem to work with notebooks.

Parameters

prefetch – if not specified, schedules all jobs at the same time. If specified, keeps only that many jobs scheduled ahead of time, and only schedules more as results are actually being used.
timeout – seconds to wait for job before raising an error
kwargs – extra arguments to be passed to the function. args not included as there’re a couple of options you can pass for this cli.

__ror__(it: Iterator[T]) → Iterator[T][source]¶

static clearPools()[source]¶: Terminate all existing pools. Do this before restarting/quitting the script/notebook to make sure all resources (like GPU) are freed.

static pools()[source]¶: Get set of all pools. Meant for debugging purposes only.

k1lib.cli.modifier.applyMpBatched(f, bs=32, prefetch=2, timeout=5)[source]¶: Pretty much the same as applyMp and has the same feel to it too. Iterator[A] goes in, Iterator[B] goes out, and you specify f(A) -> B. However, this will launch jobs that will execute multiple f(), instead of 1 job per execution. All examples from applyMp should work perfectly here.

class k1lib.cli.modifier.applyS(f: Callable[[T], T])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(f: Callable[[T], T])[source]¶

Like apply, but much simpler, just operating on the entire input object, essentially. The “S” stands for “single”. Example:

# returns 5
3 | applyS(lambda x: x+2)

Like apply, you can also use this as a decorator like this:

@applyS
def f(x):
    return x+2
# returns 5
3 | f

This also decorates the returned object so that it has same qualname, docstring and whatnot.
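One way such a decorator could carry over the name and docstring is functools.update_wrapper. The ApplyS class below is a hypothetical sketch of that idea, not the library's code:

```python
import functools


class ApplyS:
    """Hypothetical wrapper: applies f to the whole piped-in object."""

    def __init__(self, f):
        self.f = f
        # Copy __name__, __qualname__, __doc__, etc. from f onto this instance.
        functools.update_wrapper(self, f)

    def __ror__(self, it):
        return self.f(it)

    def __call__(self, it):
        return self.f(it)


@ApplyS
def add_two(x):
    """Adds two to x."""
    return x + 2


piped_result = 3 | add_two
called_result = add_two(3)
print(piped_result, called_result, add_two.__name__, add_two.__doc__)
```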

__ror__(it: T) → T[source]¶

all()[source]¶: Applies this cli to all incoming streams

k1lib.cli.modifier.replace(s: str, target: Optional[str] = None, column: Optional[int] = None)[source]¶

Replaces substring s with target for each line. Example:

# returns ['104', 'ab0c']
["1234", "ab23c"] | replace("23", "0") | deref()

Parameters: target – if not specified, then use the default delimiter specified in cliSettings

k1lib.cli.modifier.remove(s: str, column: Optional[int] = None)[source]¶: Removes a specific substring in each line.

k1lib.cli.modifier.toFloat(*columns: List[int], force=False)[source]¶

Converts every row into a float. Example:

# returns [1.0, 3.0, -2.3]
["1", "3", "-2.3"] | toFloat() | deref()
# returns [[1.0, 'a'], [2.3, 'b'], [8.0, 'c']]
[["1", "a"], ["2.3", "b"], [8, "c"]] | toFloat(0) | deref()

With weird rows:

# returns [[1.0, 'a'], [8.0, 'c']]
[["1", "a"], ["c", "b"], [8, "c"]] | toFloat(0) | deref()
# returns [[1.0, 'a'], [0.0, 'b'], [8.0, 'c']]
[["1", "a"], ["c", "b"], [8, "c"]] | toFloat(0, force=True) | deref()

Parameters

columns – if nothing, then will convert each row. If available, then convert all the specified columns
force – if True, forces weird values to 0.0, else filters out all weird rows
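The difference between filtering and forcing weird rows can be sketched as follows. to_float_column is a hypothetical helper written for this illustration, covering only the single-column case:

```python
def to_float_column(rows, column, force=False):
    """Convert row[column] to float; drop or zero rows that fail to convert."""
    for row in rows:
        row = list(row)
        try:
            row[column] = float(row[column])
        except (TypeError, ValueError):
            if not force:
                continue          # default: filter the weird row out
            row[column] = 0.0     # force=True: keep the row with 0.0
        yield row


data = [["1", "a"], ["c", "b"], [8, "c"]]
filtered = list(to_float_column(data, 0))
forced = list(to_float_column(data, 0, force=True))
print(filtered, forced)
```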

k1lib.cli.modifier.toInt(*columns: List[int], force=False)[source]¶

Converts every row into an integer. Example:

# returns [1, 3, -2]
["1", "3", "-2.3"] | toInt() | deref()

Parameters

columns – if nothing, then will convert each row. If available, then convert all the specified columns
force – if True, forces weird values to 0, else filters out all weird rows

See also: toFloat()

class k1lib.cli.modifier.sort(column: int = 0, numeric=True, reverse=False)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(column: int = 0, numeric=True, reverse=False)[source]¶

Sorts all lines based on a specific column. Example:

# returns [[5, 'a'], [1, 'b']]
[[1, "b"], [5, "a"]] | ~sort(0) | deref()
# returns [[2, 3]]
[[1, "b"], [5, "a"], [2, 3]] | ~sort(1) | deref()
# errors out, as you can't really compare str with int
[[1, "b"], [2, 3], [5, "a"]] | sort(1, False) | deref()

Parameters

column – if None, sort rows based on themselves and not an element
numeric – whether to convert column to float
reverse – False for smaller to bigger, True for bigger to smaller. Use __invert__() to quickly reverse the order instead of using this param

__ror__(it: Iterator[str])[source]¶

__invert__()[source]¶: Creates a clone that has the opposite sort order

class k1lib.cli.modifier.sortF(f: Callable[[T], float], reverse=False)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(f: Callable[[T], float], reverse=False)[source]¶

Sorts rows using a function. Example:

# returns ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
["a", "aaa", "aaaaa", "aa", "aaaa"] | sortF(lambda r: len(r)) | deref()
# returns ['aaaaa', 'aaaa', 'aaa', 'aa', 'a']
["a", "aaa", "aaaaa", "aa", "aaaa"] | ~sortF(lambda r: len(r)) | deref()

__ror__(it: Iterator[T]) → Iterator[T][source]¶

__invert__() → k1lib.cli.modifier.sortF [source]¶

class k1lib.cli.modifier.consume(f: Union[k1lib.cli.init.BaseCli, Callable[[T], None]])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(f: Union[k1lib.cli.init.BaseCli, Callable[[T], None]])[source]¶

Consumes the iterator in a side stream. Returns the iterator. Kinda like the bash command tee. Example:

# prints "0\n1\n2" and returns [0, 1, 2]
range(3) | consume(headOut()) | toList()
# prints "range(0, 3)" and returns [0, 1, 2]
range(3) | consume(lambda it: print(it)) | toList()

This is useful whenever you want to mutate something, but don’t want to include the function result into the main stream.

__ror__(it: T) → T[source]¶

class k1lib.cli.modifier.randomize(bs=100)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(bs=100)[source]¶

Randomizes the input stream. In order to be efficient, this does not convert the input iterator to a giant list and yield random values from that. Instead, this fetches bs items at a time, randomizes them, returns them, then fetches another bs items. If you want to do the giant list, then just pass in float("inf"), or None. Example:

# returns [0, 1, 2, 3, 4], effectively no randomize at all
range(5) | randomize(1) | deref()
# returns something like this: [1, 0, 2, 3, 5, 4, 6, 8, 7, 9]. You can clearly see the batches
range(10) | randomize(3) | deref()
# returns something like this: [7, 0, 5, 2, 4, 9, 6, 3, 1, 8]
range(10) | randomize(float("inf")) | deref()
# same as above
range(10) | randomize(None) | deref()

__ror__(it: Iterator[T]) → Iterator[T][source]¶
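The batch-wise shuffling described above can be sketched like this. shuffle_batches and its seed argument are hypothetical and exist only to make the illustration reproducible:

```python
import random
from itertools import islice


def shuffle_batches(it, bs=100, seed=None):
    """Shuffle a stream bs items at a time; bs=None or inf shuffles everything."""
    rng = random.Random(seed)
    it = iter(it)
    whole = bs is None or bs == float("inf")
    while True:
        batch = list(it) if whole else list(islice(it, bs))
        if not batch:
            return
        rng.shuffle(batch)
        yield from batch


unchanged = list(shuffle_batches(range(5), 1))
batched = list(shuffle_batches(range(10), 3, seed=0))
everything = list(shuffle_batches(range(10), None, seed=0))
print(unchanged, batched, everything)
```

With bs=3, the elements 0 to 2 always come out before 3 to 5, and so on, which is why the batches are visible in the output.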

class k1lib.cli.modifier.stagger(every: int)[source]¶

Bases: k1lib.cli.init.BaseCli

Staggers input stream into multiple stream “windows” placed serially. Best explained with an example:

o = range(10) | stagger(3)
o | deref() # returns [0, 1, 2], 1st "window"
o | deref() # returns [3, 4, 5], 2nd "window"
o | deref() # returns [6, 7, 8]
o | deref() # returns [9]
o | deref() # returns []

This might be useful when you’re constructing a data loader:

dataset = [range(20), range(30, 50)] | transpose()
dl = dataset | batched(3) | (transpose() | toTensor()).all() | stagger(4)
for epoch in range(3):
    for xb, yb in dl: # looping over a window
        print(epoch)
        # then something like: model(xb)

The above code will print 6 lines. 4 of them are “0” (because we stagger every 4 batches), and xb’s shape will be (3,) (because we batched every 3 samples).

You should also keep in mind that this doesn’t really change the property of the stream itself. Essentially, treat these pairs of statements as being the same thing:

o = range(11, 100)

# both return 11
o | stagger(20) | item()
o | item()

# both return [11, 12, ..., 20]
o | head(10) | deref()
o | stagger(20) | head(10) | deref()

Lastly, multiple iterators might be getting values from the same stream window, meaning:

o = range(11, 100) | stagger(10)
it1 = iter(o); it2 = iter(o)
next(it1) # returns 11
next(it2) # returns 12

This may or may not be desirable. Also this should be obvious, but I want to mention this in case it’s not clear to you.
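The window behaviour, including the shared underlying stream, can be imitated with a tiny class. Staggered here is a hypothetical sketch; in particular, each iter() call gets its own budget of `every` items, which may differ from how the real StaggeredStream counts a window:

```python
from itertools import islice


class Staggered:
    """Hypothetical windowed view over a single shared iterator."""

    def __init__(self, it, every):
        self.it = iter(it)     # one underlying stream shared by all windows
        self.every = every

    def __iter__(self):
        # Every new loop continues where the previous one stopped.
        return islice(self.it, self.every)


o = Staggered(range(10), 3)
windows = [list(o) for _ in range(5)]

shared = Staggered(range(11, 100), 10)
it1 = iter(shared)
it2 = iter(shared)
a = next(it1)
b = next(it2)
print(windows, a, b)
```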

__ror__(it: Iterator[T]) → k1lib.cli.modifier.StaggeredStream[source]¶

class k1lib.cli.modifier.op[source]¶

Bases: k1lib._baseClasses.Absorber, k1lib.cli.init.BaseCli

Absorbs operations done on it and applies it on the stream. Based on Absorber. Example:

t = torch.tensor([[1, 2, 3], [4, 5, 6.0]])
# returns [torch.tensor([[4., 5., 6., 7., 8., 9.]])]
[t] | (op() + 3).view(1, -1).all() | deref()

Basically, you can treat op() as the input tensor. Tbh, you can do the same thing with this:

[t] | applyS(lambda t: (t+3).view(1, -1)).all() | deref()

But that’s kinda long and may not be obvious. This can be surprisingly resilient, as you can still combine with other cli tools as usual, for example:

# returns [2, 3], demonstrating "&" operator
torch.randn(2, 3) | (op().shape & identity()) | deref() | item()

a = torch.tensor([[1, 2, 3], [7, 8, 9]])
# returns torch.tensor([4, 5, 6]), demonstrating "+" operator for clis and not clis
(a | op() + 3 + identity() | item() == torch.tensor([4, 5, 6])).all()

# returns [[3], [3]], demonstrating .all() and "|" serial chaining
torch.randn(2, 3) | (op().shape.all() | deref())

# returns [[8, 18], [9, 19]], demonstrating you can treat `op()` as a regular function
[range(10), range(10, 20)] | transpose() | filt(op() > 7, 0) | deref()
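If the "absorbing" idea feels magical, here is a minimal sketch of how such an object could work, in plain Python. This is not the real Absorber or op: the class name `MiniOp` is hypothetical, it only absorbs `+`, `**`, `>` and attribute access, and it has no `.all()` or `op_solidify()`.

```python
class MiniOp:
    """Hypothetical, much-simplified absorber: records the operations applied
    to it and replays them, in order, on whatever is piped in."""
    def __init__(self, steps=()):
        self._steps = tuple(steps)  # recorded one-argument functions

    def _absorb(self, f):
        # return a new object, so partially built pipelines can be reused
        return MiniOp(self._steps + (f,))

    def __add__(self, other): return self._absorb(lambda x: x + other)
    def __pow__(self, other): return self._absorb(lambda x: x ** other)
    def __gt__(self, other):  return self._absorb(lambda x: x > other)

    def __getattr__(self, name):
        # only reached for attributes that MiniOp itself does not define
        if name.startswith("_"):
            raise AttributeError(name)
        return self._absorb(lambda x: getattr(x, name))

    def __ror__(self, value):
        # `value | miniOp` lands here and replays the recorded steps
        for f in self._steps:
            value = f(value)
        return value

ten = 3 | (MiniOp() ** 2 + 1)                       # (3 ** 2) + 1
imag = (3 + 4j) | MiniOp().imag                     # attribute access is absorbed too
big = [x for x in range(10) if x | (MiniOp() > 7)]  # usable as a predicate
print(ten, imag, big)  # 10 4.0 [8, 9]
```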

Performance-wise, there is some degradation, but not a lot, so don’t worry about it:

n = 10_000_000
# takes 1.6s
for i in range(n): i**2
# takes 1.8s, 1.125x worse than for loop
range(n) | apply(lambda x: x**2) | ignore()
# takes 2.7s, 1.7x worse than for loop
range(n) | apply(op()**2) | ignore()
# takes 2.7s
range(n) | (op()**2).all() | ignore()

Reserved operations that are not absorbed are:

all
__ror__ (__or__ still works!)
op_solidify

op_solidify()[source]¶

Use this to stop absorbing __call__ operations, which makes it feel like a regular function (it still absorbs other operations though):

f = op()**2
3 | f # returns 9, but maybe you don't want to pipe it in
f.op_solidify()
f(3)  # returns 9

__ror__(it)[source]¶

output module¶

For operations that feel like the termination

class k1lib.cli.output.stdout(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Prints out all lines. If the input is not iterable, then prints it out raw

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.output.file(fileName: str, text: bool = True)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(fileName: str, text: bool = True)[source]¶

Opens a new file for writing.

Parameters: text – if True, accepts Iterator[str], and prints out each string on a new line. Else accepts bytes and writes them in 1 go.

__ror__(it: Iterator[str]) → None [source]¶

class k1lib.cli.output.pretty(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Pretty prints a table

__ror__(it: k1lib.cli.init.Table[Any]) → Iterator[str][source]¶

k1lib.cli.output.display(lines: int = 10)[source]¶: Convenience method for displaying a table

k1lib.cli.output.headOut(lines: int = 10)[source]¶: Convenience method for head() | stdout()

class k1lib.cli.output.intercept(raiseError: bool = True)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(raiseError: bool = True)[source]¶

Intercepts flow at a particular point, analyzes the object piped in, and raises an error to stop flow. Example:

3 | intercept()

Parameters: raiseError – whether to raise error when executed or not.

__ror__(s)[source]¶

sam module¶

This is for functions that are .sam or .bam related

k1lib.cli.sam.cat(bamFile: str)[source]¶: Get sam file outputs from bam file

class k1lib.cli.sam.header(long=True)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(long=True)[source]¶

Adds a header to the table.

Parameters: long – whether to use a long descriptive header, or a short one

__ror__(it)[source]¶

class k1lib.cli.sam.quality(log=True)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(log=True)[source]¶

Get numeric quality of sequence.

Parameters: log – whether to use log scale (0 -> 40), or linear scale (1 -> 0.0001)

__ror__(line)[source]¶

structural module¶

This is for functions that sort of change the table structure in a dramatic way. They’re the core transformations

k1lib.cli.structural.yieldSentinel¶: Object that can be yielded in a stream to ignore this stream for the moment in joinStreamsRandom.

class k1lib.cli.structural.joinStreamsRandom(fs=[])[source]¶

Joins multiple streams randomly. If any stream runs out, then quits. If any stream yields yieldSentinel, then just ignores that result and continues. Could be useful in active learning. Example:

# could return [0, 1, 10, 2, 11, 12, 13, ...], with max length 20, typical length 18
[range(0, 10), range(10, 20)] | joinStreamsRandom() | deref()

stream2 = [[-5, yieldSentinel, -4, -3], yieldSentinel | repeat()] | joinStreams()
# could return [-5, -4, 0, -3, 1, 2, 3, 4, 5, 6], demonstrating yieldSentinel
[range(7), stream2] | joinStreamsRandom() | deref()
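The rules above (pick a stream at random, quit when one runs out, skip sentinels) can be sketched in plain Python. This is a minimal sketch and not k1lib's implementation: `SENTINEL` and `join_streams_random_sketch` are hypothetical names, and it assumes "runs out" means that the randomly picked stream has nothing left.

```python
import random
from itertools import repeat

SENTINEL = object()  # hypothetical stand-in for yieldSentinel

def join_streams_random_sketch(streams, rng=random):
    """Picks a random stream at every step and stops as soon as the picked
    stream is exhausted. Sentinel values are skipped, not yielded."""
    its = [iter(s) for s in streams]
    while True:
        it = rng.choice(its)
        try:
            value = next(it)
        except StopIteration:
            return
        if value is SENTINEL:
            continue  # ignore this stream for the moment
        yield value

rng = random.Random(0)
out = list(join_streams_random_sketch([range(0, 10), range(10, 20)], rng))
# each stream keeps its own order, and at most 20 elements come out
print(len(out) <= 20)

# a stream made only of sentinels never contributes anything
skipped = list(join_streams_random_sketch([range(3), repeat(SENTINEL)], rng))
print(skipped)  # [0, 1, 2]
```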

__ror__(streams: Iterator[Iterator[T]]) → Iterator[T][source]¶

class k1lib.cli.structural.transpose(fillValue=None)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(fillValue=None)[source]¶

Join multiple columns and loop through all rows. Aka transpose.

Parameters: fillValue – if not None, then will try to zip longest with this fill value

Example:

# returns [[1, 4], [2, 5], [3, 6]]
[[1, 2, 3], [4, 5, 6]] | transpose() | deref()
# returns [[1, 4], [2, 5], [3, 6], [0, 7]]
[[1, 2, 3], [4, 5, 6, 7]] | transpose(0) | deref()
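For intuition, the two examples correspond to `zip` and `itertools.zip_longest` from the standard library. A minimal sketch, assuming that without a fill value the result is cut to the shortest row (`transpose_sketch` is a hypothetical name, not part of k1lib):

```python
from itertools import zip_longest

def transpose_sketch(rows, fillValue=None):
    """Plain zip when no fill value is given, zip_longest otherwise."""
    if fillValue is None:
        return [list(col) for col in zip(*rows)]
    return [list(col) for col in zip_longest(*rows, fillvalue=fillValue)]

a = transpose_sketch([[1, 2, 3], [4, 5, 6]])
b = transpose_sketch([[1, 2, 3], [4, 5, 6, 7]], 0)
c = transpose_sketch([[1, 2, 3], [4, 5, 6, 7]])  # assumed: extra 7 is dropped
print(a)  # [[1, 4], [2, 5], [3, 6]]
print(b)  # [[1, 4], [2, 5], [3, 6], [0, 7]]
print(c)  # [[1, 4], [2, 5], [3, 6]]
```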

__ror__(it: Iterator[Iterator[T]]) → k1lib.cli.init.Table[T][source]¶

class k1lib.cli.structural.joinList(element=None, begin=True)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(element=None, begin=True)[source]¶

Join element into list.

Parameters: element – the element to insert. If None, then takes the input [e, […]], else takes the input […] as usual

Example:

# returns [5, 2, 6, 8]
[5, [2, 6, 8]] | joinList() | deref()
# also returns [5, 2, 6, 8]
[2, 6, 8] | joinList(5) | deref()

__ror__(it: Tuple[T, Iterator[T]]) → Iterator[T][source]¶

class k1lib.cli.structural.splitList(*weights: List[float])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(*weights: List[float])[source]¶

Splits list of elements into multiple lists. If no weights are provided, then automatically defaults to [0.8, 0.2]. Example:

# returns [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9]]
range(10) | splitList(0.8, 0.2) | deref()
# same as the above
range(10) | splitList() | deref()
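A minimal sketch of weight-based splitting, in plain Python. The function name `split_list_sketch` is hypothetical, and the rounding of the cut points is an assumption; the real splitList may place boundaries differently when the weights don't divide the list evenly.

```python
def split_list_sketch(items, *weights):
    """Cuts `items` into consecutive chunks whose sizes follow `weights`."""
    items = list(items)
    weights = weights or (0.8, 0.2)   # documented default
    total = sum(weights)
    out, start, acc = [], 0, 0.0
    for w in weights:
        acc += w
        end = round(len(items) * acc / total)  # cumulative cut point (assumed rounding)
        out.append(items[start:end])
        start = end
    return out

explicit = split_list_sketch(range(10), 0.8, 0.2)
default = split_list_sketch(range(10))
print(explicit)  # [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9]]
```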

__ror__(it)[source]¶

class k1lib.cli.structural.joinStreams(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Joins multiple streams. Example:

# returns [1, 2, 3, 4, 5]
[[1, 2, 3], [4, 5]] | joinStreams() | deref()

__ror__(streams: Iterator[Iterator[T]]) → Iterator[T][source]¶

class k1lib.cli.structural.activeSamples(limit: int = inf)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(limit: int = inf)[source]¶

Yields active learning samples. Example:

o = activeSamples()
ds = range(10) # normal dataset
ds = [o, ds] | joinStreamsRandom() # dataset with active learning capability
next(ds) # returns 0
next(ds) # returns 1
next(ds) # returns 2
o.append(20)
next(ds) # can return     3     or 20
next(ds) # can return (4 or 20) or 4

So the point of this is to be a generator of samples. You can define your dataset as a mix of active learning samples and standard samples. Whenever there’s a data point that you want to focus on, you can add it to o and it will eventually yield it.

Parameters: limit – max number of active samples. Discards samples if number of samples is over this.

append(item)[source]¶

class k1lib.cli.structural.batched(bs=32, includeLast=False)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(bs=32, includeLast=False)[source]¶

Batches the input stream. Example:

# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
range(11) | batched(3) | deref()
# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
range(11) | batched(3, True) | deref()
# returns [[0, 1, 2, 3, 4]]
range(5) | batched(float("inf"), True) | deref()
# returns []
range(5) | batched(float("inf"), False) | deref()
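The four examples above, including the infinite batch size, can be reproduced with a small generator. This is a minimal sketch in plain Python and not the library's code; `batched_sketch` is a hypothetical name.

```python
from itertools import islice

def batched_sketch(it, bs=32, includeLast=False):
    """Yields lists of `bs` elements. The final, shorter batch is only
    yielded when `includeLast` is True."""
    it = iter(it)
    n = None if bs == float("inf") else bs  # islice needs an int or None
    while True:
        batch = list(islice(it, n))
        if len(batch) == bs:
            yield batch
        else:
            if batch and includeLast:
                yield batch
            return

full = list(batched_sketch(range(11), 3))
withLast = list(batched_sketch(range(11), 3, True))
allInOne = list(batched_sketch(range(5), float("inf"), True))
nothing = list(batched_sketch(range(5), float("inf"), False))
print(full)      # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(withLast)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
```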

__ror__(it)[source]¶

k1lib.cli.structural.collate()[source]¶

Puts individual columns into a tensor. Example:

# returns [tensor([ 0, 10, 20]), tensor([ 1, 11, 21]), tensor([ 2, 12, 22])]
[range(0, 3), range(10, 13), range(20, 23)] | collate() | toList()

k1lib.cli.structural.insertRow(*row: List[T])[source]¶: Inserts a row right before all other rows. See also: joinList().

k1lib.cli.structural.insertColumn(*column, begin=True, fillValue='')[source]¶

Inserts a column at beginning or end. Example:

# returns [['a', 1, 2], ['b', 3, 4]]
[[1, 2], [3, 4]] | insertColumn("a", "b") | deref()

k1lib.cli.structural.insertIdColumn(table=False, begin=True, fillValue='')[source]¶

Inserts an id column at the beginning (or end). Example:

# returns [[0, 'a', 2], [1, 'b', 4]]
[["a", 2], ["b", 4]] | insertIdColumn(True) | deref()
# returns [[0, 'a'], [1, 'b']]
"ab" | insertIdColumn()

Parameters: table – if False, then inserts the column into an Iterator[str], else treats the input as a full-fledged table

class k1lib.cli.structural.toDict[source]¶

Bases: k1lib.cli.init.BaseCli

__init__()[source]¶

Converts 2 Iterators, 1 key, 1 value into a dictionary. Example:

# returns {1: 3, 2: 4}
[[1, 2], [3, 4]] | toDict()

__ror__(it: Tuple[Iterator[T], Iterator[T]]) → dict [source]¶

class k1lib.cli.structural.toDictF(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶

Transform an incoming stream into a dict using a function for values. Example:

names = ["wanda", "vision", "loki", "mobius"]
names | toDictF(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...}
names | toDictF(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}

__ror__(keys: Iterator[Any]) → Dict[Any, Any][source]¶

class k1lib.cli.structural.expandE(f: Callable[[T], List[T]], column: int)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(f: Callable[[T], List[T]], column: int)[source]¶

Expands table element to multiple columns. Example:

# returns [['abc', 3, -2], ['de', 2, -5]]
[["abc", -2], ["de", -5]] | expandE(lambda e: (e, len(e)), 0) | deref()

Parameters: f – Function that transforms 1 row element to multiple elements

__ror__(it)[source]¶

k1lib.cli.structural.unsqueeze(dim: int = 0)[source]¶

Unsqueeze input iterator. Example:

t = [[1, 2], [3, 4], [5, 6]]
# returns torch.Size([3, 2])
torch.tensor(t).shape
# returns torch.Size([1, 3, 2])
torch.tensor(t | unsqueeze(0) | deref()).shape
# returns torch.Size([3, 1, 2])
torch.tensor(t | unsqueeze(1) | deref()).shape
# returns torch.Size([3, 2, 1])
torch.tensor(t | unsqueeze(2) | deref()).shape

class k1lib.cli.structural.count(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Finds unique elements and returns a table with [frequency, value, percent] columns. Example:

# returns [[1, 'a', '33%'], [2, 'b', '67%']]
['a', 'b', 'b'] | count() | deref()
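The same table can be built with `collections.Counter` from the standard library. A minimal sketch, where `count_sketch` is a hypothetical name and both the row order (first appearance) and the percent rounding are assumptions:

```python
from collections import Counter

def count_sketch(it):
    """Returns [frequency, value, percent] rows, one per unique element."""
    c = Counter(it)
    total = sum(c.values())
    # Counter keeps first-seen order; percent is rounded to a whole number
    return [[n, v, f"{round(100 * n / total)}%"] for v, n in c.items()]

table = count_sketch(['a', 'b', 'b'])
print(table)  # [[1, 'a', '33%'], [2, 'b', '67%']]
```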

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.structural.permute(*permutations: List[int])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(*permutations: List[int])[source]¶

Permutes the columns. Acts kinda like torch.Tensor.permute(). Example:

# returns [['b', 'a'], ['d', 'c']]
["ab", "cd"] | permute(1, 0) | deref()

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.structural.accumulate(columnIdx: int = 0, avg=False)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(columnIdx: int = 0, avg=False)[source]¶

Groups lines that have the same row[columnIdx], and adds together all other columns, assuming they’re numbers

Parameters

columnIdx – common column index to accumulate
avg – calculate average values instead of sum
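Since there is no example here, this is a rough sketch of what grouping and summing by a key column could look like in plain Python. It is not the library's implementation: `accumulate_sketch` is a hypothetical name, and the output shape (one row per key, key kept in its original column, rows in first-seen order) is an assumption.

```python
def accumulate_sketch(rows, columnIdx=0, avg=False):
    """Groups rows by row[columnIdx] and sums (or averages) the other columns."""
    sums, counts = {}, {}
    for row in rows:
        row = list(row)             # copy, so the input is not mutated
        key = row[columnIdx]
        if key in sums:
            acc = sums[key]
            for i, v in enumerate(row):
                if i != columnIdx:
                    acc[i] += v
            counts[key] += 1
        else:
            sums[key], counts[key] = row, 1
    if avg:
        for key, acc in sums.items():
            for i in range(len(acc)):
                if i != columnIdx:
                    acc[i] /= counts[key]
    return list(sums.values())

table = [["a", 1, 2], ["b", 3, 4], ["a", 5, 6]]
summed = accumulate_sketch(table)
averaged = accumulate_sketch(table, avg=True)
print(summed)    # [['a', 6, 8], ['b', 3, 4]]
print(averaged)  # [['a', 3.0, 4.0], ['b', 3.0, 4.0]]
```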

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.structural.AA_(*idxs: List[int], wraps=False)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(*idxs: List[int], wraps=False)[source]¶

Returns 2 streams, one that has the selected element, and the other the rest. Example:

# returns [5, [1, 6, 3, 7]]
[1, 5, 6, 3, 7] | AA_(1)
# returns [[5, [1, 6, 3, 7]]]
[1, 5, 6, 3, 7] | AA_(1, wraps=True)

You can also put multiple indexes through:

# returns [[1, [5, 6]], [6, [1, 5]]]
[1, 5, 6] | AA_(0, 2)

If you don’t specify anything, then all indexes will be sliced:

# returns [[1, [5, 6]], [5, [1, 6]], [6, [1, 5]]]
[1, 5, 6] | AA_()

As for why the strange name, think of this operation as “AĀ”. In statistics, say you have a set “A”, then “not A” is commonly written as A with an overline “Ā”. So “AA_” represents “AĀ”, and that it first returns the selection A.

Parameters: wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā
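The selection logic can be sketched with list slicing. This is a minimal sketch that reproduces the examples above, not the library's code; `aa_sketch` is a hypothetical name.

```python
def aa_sketch(items, *idxs, wraps=False):
    """Returns [selected, rest] pairs: A first, then "not A"."""
    items = list(items)

    def pair(i):
        return [items[i], items[:i] + items[i + 1:]]

    if len(idxs) == 1 and not wraps:
        return pair(idxs[0])              # single index: unwrapped pair
    idxs = idxs or range(len(items))      # no index given: slice all of them
    return [pair(i) for i in idxs]

single = aa_sketch([1, 5, 6, 3, 7], 1)
wrapped = aa_sketch([1, 5, 6, 3, 7], 1, wraps=True)
multiple = aa_sketch([1, 5, 6], 0, 2)
everything = aa_sketch([1, 5, 6])
print(single)   # [5, [1, 6, 3, 7]]
print(wrapped)  # [[5, [1, 6, 3, 7]]]
```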

__ror__(it: List[Any]) → List[List[List[Any]]][source]¶

class k1lib.cli.structural.peek(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Returns (firstRow, iterator). This sort of peeks at the first row, to potentially gain some insights about the internal formats. The returned iterator is not tampered with. Example:

e, it = iter([[1, 2, 3], [1, 2]]) | peek()
print(e) # prints "[1, 2, 3]"
s = 0
for e in it: s += len(e)
print(s) # prints "5", or length of 2 lists

You kinda have to be careful about handling the firstRow, because you might inadvertently alter the iterator:

e, it = iter([iter(range(3)), range(4), range(2)]) | peek()
e = list(e) # e is [0, 1, 2]
list(next(it)) # supposed to be the same as `e`, but is [] instead

This happens because you have already consumed all elements of the first row, and thus there aren’t any left when you try to call next(it).
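A common way to peek without losing the first row is to take it off and chain it back in front. A minimal sketch, assuming a non-empty input (`peek_sketch` is a hypothetical name, not k1lib's implementation), which also reproduces the pitfall described above:

```python
from itertools import chain

def peek_sketch(it):
    """Returns (firstRow, iterator), where the iterator still starts at firstRow."""
    it = iter(it)
    first = next(it)                  # raises StopIteration on empty input
    return first, chain([first], it)  # put the same object back in front

e, rest = peek_sketch(iter([[1, 2, 3], [1, 2]]))
total = sum(len(row) for row in rest)
print(e, total)  # [1, 2, 3] 5

# the pitfall: the first row is itself an iterator, and consuming it
# also consumes the copy that will come out of `rest2`
e2, rest2 = peek_sketch(iter([iter(range(3)), range(4), range(2)]))
consumed = list(e2)
leftover = list(next(rest2))
print(consumed, leftover)  # [0, 1, 2] []
```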

__ror__(it: Iterator[T]) → Tuple[T, Iterator[T]][source]¶

class k1lib.cli.structural.peekF(f: Union[k1lib.cli.init.BaseCli, Callable[[T], T]])[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(f: Union[k1lib.cli.init.BaseCli, Callable[[T], T]])[source]¶

Similar to peek, but will execute f(row) and return the input Iterator, which is not tampered with. Example:

it = lambda: iter([[1, 2, 3], [1, 2]])
# prints "[1, 2, 3]" and returns [[1, 2, 3], [1, 2]]
it() | peekF(lambda x: print(x)) | deref()
# prints "1\n2\n3"
it() | peekF(headOut()) | deref()

__ror__(it: Iterator[T]) → Iterator[T][source]¶

class k1lib.cli.structural.repeat(limit: Optional[int] = None)[source]¶

Bases: k1lib.cli.init.BaseCli

Yields a specified amount of the passed in object. If you intend to pass in an iterator, then make a list out of it first, as the second copy of the iterator probably won’t work, because it will already have been consumed the first time. Example:

# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3] | repeat(3) | toList()

Parameters: limit – if None, then repeats indefinitely

__ror__(o: T) → Iterator[T][source]¶

k1lib.cli.structural.repeatF(f, limit: Optional[int] = None)[source]¶

Yields a specified amount generated by a specified function. Example:

# returns [4, 4, 4]
repeatF(lambda: 4, 3) | toList()
# returns 10
repeatF(lambda: 4) | head() | shape(0)

Parameters: limit – if None, then repeats indefinitely

See also: repeatFrom

class k1lib.cli.structural.repeatFrom(limit: Optional[int] = None)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(limit: Optional[int] = None)[source]¶

Yields from a list. If it runs out of elements, then starts over, going through the list limit times in total. Example:

# returns [1, 2, 3, 1, 2]
[1, 2, 3] | repeatFrom() | head(5) | deref()
# returns [1, 2, 3, 1, 2, 3]
[1, 2, 3] | repeatFrom(2) | deref()

Parameters: limit – if None, then repeats indefinitely
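In standard-library terms, this behaves like `itertools.cycle` when there is no limit, and like chaining the list a fixed number of times otherwise. A minimal sketch (`repeat_from_sketch` is a hypothetical name; it materializes the input into a list first, which may differ from what the library does):

```python
from itertools import chain, cycle, islice, repeat

def repeat_from_sketch(items, limit=None):
    """Loops over `items` forever, or `limit` times if a limit is given."""
    items = list(items)
    if limit is None:
        return cycle(items)
    return chain.from_iterable(repeat(items, limit))

endless = list(islice(repeat_from_sketch([1, 2, 3]), 5))
twice = list(repeat_from_sketch([1, 2, 3], 2))
print(endless)  # [1, 2, 3, 1, 2]
print(twice)    # [1, 2, 3, 1, 2, 3]
```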

__ror__(it: Iterator[T]) → Iterator[T][source]¶

utils module¶

This is for all short utilities that have the boilerplate feeling

class k1lib.cli.utils.size(idx=None)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(idx=None)[source]¶

Returns number of rows and columns in the input. Example:

# returns (3, 2)
[[2, 3], [4, 5, 6], [3]] | size()
# returns 3
[[2, 3], [4, 5, 6], [3]] | size(0)
# returns 2
[[2, 3], [4, 5, 6], [3]] | size(1)
# returns (2, 0)
[[], [2, 3]] | size()
# returns (3, None)
[2, 3, 5] | size()
# returns 3
[2, 3, 5] | size(0)

You can also pipe in a torch.Tensor, and it will just return its shape:

# returns torch.Size([3, 4])
torch.randn(3, 4) | size()

Parameters: idx – if idx is None return (rows, columns). If 0 or 1, then rows or columns
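The list examples above follow a simple rule: rows are counted directly, and columns are taken from the first row only. A minimal sketch of that rule in plain Python (`size_sketch` is a hypothetical name; tensor inputs are not handled here):

```python
def size_sketch(rows, idx=None):
    """Returns (rows, columns), where columns is len() of the first row,
    or None if the first row has no length (or there are no rows)."""
    rows = list(rows)
    try:
        cols = len(rows[0])
    except (IndexError, TypeError):
        cols = None
    shape = (len(rows), cols)
    return shape if idx is None else shape[idx]

ragged = size_sketch([[2, 3], [4, 5, 6], [3]])
emptyFirst = size_sketch([[], [2, 3]])
flat = size_sketch([2, 3, 5])
print(ragged, emptyFirst, flat)  # (3, 2) (2, 0) (3, None)
```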

__ror__(it: Iterator[str])[source]¶

k1lib.cli.utils.shape¶: alias of k1lib.cli.utils.size

class k1lib.cli.utils.item(amt: int = 1)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(amt: int = 1)[source]¶

Returns the first row. Example:

# returns 0
iter(range(5)) | item()
# returns torch.Size([5])
torch.randn(3,4,5) | item(2) | shape()

Parameters: amt – how many times do you want to call item() back to back?

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.utils.identity(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Yields whatever the input is. Useful for multiple streams. Example:

# returns range(5)
range(5) | identity()

__ror__(it: Iterator[Any])[source]¶

class k1lib.cli.utils.toStr(column: Optional[int] = None)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(column: Optional[int] = None)[source]¶

Converts every line to a string. Example:

# returns ['2', 'a']
[2, "a"] | toStr() | deref()
# returns [[2, 'a'], [3, '5']]
[[2, "a"], [3, 5]] | toStr(1) | deref()

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.utils.join(delim: Optional[str] = None)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(delim: Optional[str] = None)[source]¶

Merges all strings into 1, with delim in the middle. Basically str.join(). Example:

# returns '2\na'
[2, "a"] | join("\n")

__ror__(it: Iterator[str])[source]¶

class k1lib.cli.utils.toNumpy(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Converts generator to numpy array. Essentially np.array(list(it))

__ror__(it: Iterator[float]) → numpy.array[source]¶

class k1lib.cli.utils.toTensor(dtype=torch.float32)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(dtype=torch.float32)[source]¶

Converts generator to torch.Tensor. Essentially torch.tensor(list(it)).

Also checks if input is a PIL Image. If yes, turn it into a torch.Tensor and return.

__ror__(it: Iterator[float]) → torch.Tensor [source]¶

class k1lib.cli.utils.toList(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Converts generator to list. list would do the same, but this is just to maintain the style

__ror__(it: Iterator[Any]) → List[Any][source]¶

class k1lib.cli.utils.wrapList(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Wraps inputs inside a list. There’s a more advanced cli tool built from this, which is unsqueeze().

__ror__(it: T) → List[T][source]¶

class k1lib.cli.utils.toSet(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Converts generator to set. set would do the same, but this is just to maintain the style

__ror__(it: Iterator[T]) → Set[T][source]¶

class k1lib.cli.utils.toIter(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Converts object to iterator. iter() would do the same, but this is just to maintain the style

__ror__(it: List[T]) → Iterator[T][source]¶

class k1lib.cli.utils.toRange(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Returns iter(range(len(it))), effectively

__ror__(it: Iterator[Any]) → Iterator[int][source]¶

class k1lib.cli.utils.equals[source]¶

Bases: object

Checks if all incoming columns/streams are identical

__ror__(streams: Iterator[Iterator[str]])[source]¶

class k1lib.cli.utils.reverse(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Reverses incoming list. Example:

# returns [3, 5, 2]
[2, 5, 3] | reverse() | deref()

__ror__(it: Iterator[str]) → List[str][source]¶

class k1lib.cli.utils.ignore(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Just loops through everything, ignoring the output. Example:

# will just return an iterator, and not print anything
[2, 3] | apply(lambda x: print(x))
# will print "2\n3"
[2, 3] | apply(lambda x: print(x)) | ignore()

__ror__(it: Iterator[Any])[source]¶

class k1lib.cli.utils.toSum(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Calculates the sum of list of numbers. Can pipe in torch.Tensor. Example:

# returns 45
range(10) | toSum()

__ror__(it: Iterator[float])[source]¶

class k1lib.cli.utils.toAvg(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Calculates average of list of numbers. Can pipe in torch.Tensor. Example:

# returns 4.5
range(10) | toAvg()
# returns nan
[] | toAvg()

__ror__(it: Iterator[float])[source]¶

k1lib.cli.utils.toMean¶: alias of k1lib.cli.utils.toAvg

class k1lib.cli.utils.toMax(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Calculates the max of a bunch of numbers. Can pipe in torch.Tensor. Example:

# returns 6
[2, 5, 6, 1, 2] | toMax()

__ror__(it: Iterator[float]) → float [source]¶

class k1lib.cli.utils.toMin(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Calculates the min of a bunch of numbers. Can pipe in torch.Tensor. Example:

# returns 1
[2, 5, 6, 1, 2] | toMin()

__ror__(it: Iterator[float]) → float [source]¶

class k1lib.cli.utils.lengths(fs=[])[source]¶

Bases: k1lib.cli.init.BaseCli

Returns the lengths of each row. Example:

# returns [5, 10]
[range(5), range(10)] | lengths() | deref()

__ror__(it: Iterator[List[Any]]) → Iterator[int][source]¶

k1lib.cli.utils.headerIdx()[source]¶

Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know what your column’s index is to cut it out. Also sets the context variable “header”, in case you need it later. Example:

# returns [[0, 'a'], [1, 'b'], [2, 'c']]
["abc"] | headerIdx() | deref()

class k1lib.cli.utils.deref(ignoreTensors=True, maxDepth=inf)[source]¶

Bases: k1lib.cli.init.BaseCli

__init__(ignoreTensors=True, maxDepth=inf)[source]¶

Recursively converts any iterator into a list. Only str, numbers.Number and Module are not converted. Example:

# returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5))
# returns [0, 1, 2, 3, 4]
iter(range(5)) | deref()

You can also specify a maxDepth:

# returns something like "<list_iterator at 0x7f810cf0fdc0>"
iter([range(3)]) | deref(maxDepth=0)
# returns [range(3)]
iter([range(3)]) | deref(maxDepth=1)
# returns [[0, 1, 2]]
iter([range(3)]) | deref(maxDepth=2)

Parameters

ignoreTensors – if True, then don’t loop over torch.Tensor internals
maxDepth – maximum depth to dereference. Starts at 0 for not doing anything at all
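The maxDepth examples above can be reproduced with a short recursive function. This is a minimal sketch, not the library's implementation: `deref_sketch` is a hypothetical name, and it leaves out the tensor and Module handling entirely.

```python
from collections.abc import Iterable
from numbers import Number

def deref_sketch(x, maxDepth=float("inf"), depth=0):
    """Recursively turns iterables into lists, down to `maxDepth` levels.
    Strings and numbers are returned as they are."""
    if depth >= maxDepth or isinstance(x, (str, Number)) or not isinstance(x, Iterable):
        return x
    return [deref_sketch(e, maxDepth, depth + 1) for e in x]

flat = deref_sketch(iter(range(5)))
untouched = iter([range(3)])
same = deref_sketch(untouched, maxDepth=0)          # depth 0: nothing happens
oneLevel = deref_sketch(iter([range(3)]), maxDepth=1)
twoLevels = deref_sketch(iter([range(3)]), maxDepth=2)
print(flat)       # [0, 1, 2, 3, 4]
print(oneLevel)   # [range(0, 3)]
print(twoLevels)  # [[0, 1, 2]]
```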

Warning

Can work well with PyTorch Tensors, but not NumPy arrays, as they screw things up with the __ror__ operator, so do torch.from_numpy(…) first. Don’t worry about unnecessary copying, as numpy and torch both utilize the buffer protocol.

__ror__(it: Iterator[T]) → List[T][source]¶

__invert__() → k1lib.cli.init.BaseCli [source]¶: Returns a BaseCli that makes everything an iterator.

others module¶

This is for pretty random clis that are scattered everywhere.

k1lib.cli.others.crissCross()[source]¶

Like the monkey-patched function torch.crissCross(). Example:

# returns another Tensor
[torch.randn(3, 3), torch.randn(3)] | crissCross()

There are a couple monkey-patched clis:

torch.stack()¶: Stacks tensors together

Elsewhere in the library¶

There might still be more cli tools scattered around the library. These are pretty rare, quite dynamic and most likely a cool extra feature rather than core functionality, so they are either not worth mentioning or can’t be mentioned here. Anyway, execute this:

cli.scatteredClis()

to get a list of them.