cli package

The main idea of this package is to emulate the terminal, but to do all of that inside Python itself. So this bash statement:

cat file.txt | head -5 > headerFile.txt

Turns into this statement:

cat("file.txt") | head(5) > file("headerFile.txt")

Here, “cat”, “head” and “file” are all classes extended from BaseCli. All of them implement the “reverse or” operation, or __ror__. Essentially, these 2 statements are equivalent:

3 | obj
obj.__ror__(3)
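
To make the mechanism concrete, here is a minimal sketch of a class that implements __ror__. The Head class below is a hypothetical stand-in written only for this example; it is not the package's actual head implementation.

```python
from itertools import islice

class Head:
    """Hypothetical stand-in for a cli tool; not k1lib's own code."""
    def __init__(self, n):
        self.n = n

    def __ror__(self, it):
        # Python falls back to this for `it | Head(n)` when `it` cannot handle `|` itself
        return list(islice(it, self.n))

lines = ["a", "b", "c", "d"]
piped = lines | Head(2)             # list does not know Head, so Head.__ror__ runs
explicit = Head(2).__ror__(lines)   # the same call, written out by hand
```

Both expressions give the first two elements, which is all the pipe syntax really is: a method call on the object to the right of the bar.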

Also, a lot of these tools assume that we are operating on a table. So this table:

col1  col2  col3
1     2     3
4     5     6

Is equivalent to this list:

[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]

Also, the expected way to use these tools is to import everything directly into the current environment, like this:

from k1lib.bioinfo.cli import *

Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check over this:

Submodules

bio module

This is for functions that are actually biology-related

k1lib.bioinfo.cli.bio.go(term: int)[source]

Looks up a GO term

class k1lib.bioinfo.cli.bio.transcribe[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Transcribes (DNA -> RNA) incoming rows

__ror__(it: Union[Iterator[str], str])[source]
class k1lib.bioinfo.cli.bio.complement[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__ror__(it: Union[Iterator[str], str])[source]
class k1lib.bioinfo.cli.bio.translate(length: int = 0)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(length: int = 0)[source]

Translates incoming rows.

Parameters

length – 0 for short (L), 1 for med (Leu), 2 for long (Leucine)

__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.bio.medAa[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts short aa sequence to medium one

__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.bio.longAa[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts short aa sequence to long one

__ror__(it: Iterator[str])[source]

ctx module

class k1lib.bioinfo.cli.ctx.Promise(ctx: str)

Bases: object

__init__(ctx: str)[source]

A delayed variable that represents a value in the current context. Not intended to be instantiated by the end user. Use __call__() to get the actual value (aka “dereferencing”).

This delayed variable just loves to be dereferenced. A lot of operations that you do with it will dereference it right away, like this:

from k1lib.bioinfo.cli import *
ctx["a"] = 4
f"value: {ctx['a']}" # returns string "value: 4"
ctx['a'] + 5 # returns 9
ctx['a'] / 5 # returns 0.8

If a Promise attribute is set in a BaseCli subclass, then it will automagically be dereferenced at __ror__ of BaseCli.

If you don’t interact with it directly like the above operations, but just pass it around, then it won’t dereference. You can then force it to do so like this:

[ctx['a'], 5] | ctx.dereference() # returns an iterator, with the first variable dereferenced
[ctx['a'], 5] | ctx.dereference() | toList() # returns [4, 5]
[ctx['a'], 5] | dereference() # returns [4, 5]
static strip(o)[source]

If o is a Promise, then returns the value in context, else returns o.

class k1lib.bioinfo.cli.ctx.consume(ctx: str, **kwargs)

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(ctx: str, **kwargs)[source]

Consumes the input, dereferences it and stores it in context. Example:

# returns [2, 3, 4, 5, 6]
range(5) | ctx.consume('a') | apply(lambda x: x+2) | toList()
# returns [0, 1, 2, 3, 4]
ctx['a']()
Parameters

kwargs – args to pass to dereference.

__ror__(it: T) → T[source]
k1lib.bioinfo.cli.ctx.ctx()

Returns the internal context dictionary. Only use this if you want to write your own context-manipulating Callbacks

class k1lib.bioinfo.cli.ctx.dereference

Bases: k1lib.bioinfo.cli.init.BaseCli

If a Promise is encountered, replaces it with its value. It’s important to note that k1lib.bioinfo.cli.utils.dereference already replaces every Promise, so you don’t have to pass through this cli beforehand if you intend to dereference. Example:

ctx.setC('a', 4)
# returns [4]
[ctx.Promise('a')] | ctx.dereference() | toList()
__ror__(it)[source]
class k1lib.bioinfo.cli.ctx.enum(ctx: str)

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(ctx: str)[source]

Saves the list index to context. Example:

# returns [['abc', 0], ['def', 1]]
["abc", "def"] | ctx.enum("a") | apply(lambda r: [r, ctx['a']]) | dereference()
__ror__(it: Iterator[T]) → Iterator[T][source]
class k1lib.bioinfo.cli.ctx.f(ctx: str, f: Optional[Callable[[T], T]] = None)

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(ctx: str, f: Optional[Callable[[T], T]] = None)[source]

Saves the f-transformed list element to context. Example:

# returns [['abc', 3], ['ab', 2]]
["abc", "ab"] | ctx.f('a', lambda s: len(s)) | apply(lambda r: [r, ctx['a']]) | dereference()
Parameters

f – if not specified, then just save the object as-is

__ror__(it: Iterator[T]) → Iterator[T][source]
k1lib.bioinfo.cli.ctx.getC(ctx: str) → k1lib.bioinfo.cli.ctx.Promise

Gets the context variable. Shortcut available like this:

ctx["a"] = 4
ctx["a"] # returns a Promise that will dereference to 4
k1lib.bioinfo.cli.ctx.setC(ctx: str, value)

Sets the context variable. Shortcut available like this:

ctx["a"] = 3 # instead of ctx.setC("a", 3)

entrez module

This module is not really fleshed out, not that useful/elegant, and I just use cmd instead

k1lib.bioinfo.cli.entrez.esearch(db: str = 'nucleotide', query: str = 'PRJNA257197')[source]
k1lib.bioinfo.cli.entrez.efetch(db: Optional[str] = None, ids: Optional[Union[str, List[str]]] = None, format: Optional[str] = None)[source]

mgi module

All tools related to the MGI database. Expected to be used behind the “mgi” module name, like this:

from k1lib.bioinfo.cli import *
["SOD1", "AMPK"] | mgi.batch()
class k1lib.bioinfo.cli.mgi.batch(headless=True)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Queries the MGI database, converting a list of genes to MGI ids

__init__(headless=True)[source]
Parameters

headless – whether to run this operation headless, or actually display the browser

__ror__(it: List[str])[source]

filt module

This is for functions that cut out specific parts of the table

class k1lib.bioinfo.cli.filt.filt(predicate: Callable[[T], bool], column: Optional[int] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(predicate: Callable[[T], bool], column: Optional[int] = None)[source]

Filters out lines.

Parameters

column

  • if integer, then predicate(row[column])

  • if None, then predicate(row)

__ror__(it: Iterator[T]) → Iterator[T][source]
__invert__()[source]

Negate the condition
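
As a sketch of how the predicate, the column parameter and negation might fit together, here is a small hypothetical re-implementation. It assumes that rows for which the predicate is true are the ones kept; the Filt class is written for this example and is not k1lib's code.

```python
class Filt:
    """Hypothetical re-implementation for illustration; not k1lib's code."""
    def __init__(self, predicate, column=None, negate=False):
        self.predicate, self.column, self.negate = predicate, column, negate

    def __ror__(self, it):
        for row in it:
            # column=None tests the whole row, an integer tests row[column]
            target = row if self.column is None else row[self.column]
            if bool(self.predicate(target)) != self.negate:
                yield row

    def __invert__(self):
        # ~Filt(...) returns a clone with the condition flipped
        return Filt(self.predicate, self.column, not self.negate)

table = [["a", 1], ["b", 5], ["c", 9]]
big = list(table | Filt(lambda x: x > 3, column=1))      # [["b", 5], ["c", 9]]
small = list(table | ~Filt(lambda x: x > 3, column=1))   # [["a", 1]]
```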

k1lib.bioinfo.cli.filt.isValue(value, column: Optional[int] = None)[source]

Filters out lines that are different from the given value

k1lib.bioinfo.cli.filt.inSet(values: Set[Any], column: Optional[int] = None)[source]

Filters out lines that are not in the specified set

k1lib.bioinfo.cli.filt.contains(s: str, column: Optional[int] = None)[source]

Filters out lines that don’t contain the specified substring

class k1lib.bioinfo.cli.filt.nonEmptyStream[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Filters out streams that have no rows

__ror__(streams: Iterator[Iterator[Any]]) → Iterator[Iterator[Any]][source]
k1lib.bioinfo.cli.filt.startswith(s: str, column: Optional[int] = None)[source]

Filters out lines that don’t start with s

k1lib.bioinfo.cli.filt.endswith(s: str, column: Optional[int] = None)[source]

Filters out lines that don’t end with s

k1lib.bioinfo.cli.filt.isNumeric(column: Optional[int] = None)[source]

Filters out a line if that column is not a number

k1lib.bioinfo.cli.filt.instanceOf(cls: Union[type, Tuple[type]], column: Optional[int] = None)[source]

Filters out lines that are not instances of the given type

k1lib.bioinfo.cli.filt.inRange(min: Optional[float] = None, max: Optional[float] = None, column: Optional[int] = None)[source]

Checks whether a column is in range or not

class k1lib.bioinfo.cli.filt.head(n: int = 10)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(n: int = 10)[source]

Only outputs first n lines. You can also negate it (like ~head(5)), which then only outputs after first n lines. Examples:

"abcde" | head(2) | dereference() # returns ["a", "b"]
"abcde" | ~head(2) | dereference() # returns ["c", "d", "e"]
"0123456" | head(-3) | dereference() # returns ['0', '1', '2', '3']
"0123456" | ~head(-3) | dereference() # returns ['4', '5', '6']
__ror__(it: Iterator[T]) → Iterator[T][source]
__invert__()[source]
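
The four cases above line up with plain Python slicing. This sketch assumes a finite input and materializes it into a list, which the real cli may well avoid; the invert parameter is a hypothetical stand-in for the ~ operator.

```python
def head(rows, n=10, invert=False):
    """Sketch of the documented head semantics on a finite input (not k1lib's code)."""
    rows = list(rows)   # materialized for simplicity; assumes finite input
    return rows[n:] if invert else rows[:n]

first_two  = head("abcde", 2)                   # ['a', 'b']
rest       = head("abcde", 2, invert=True)      # ['c', 'd', 'e']
drop_last3 = head("0123456", -3)                # ['0', '1', '2', '3']
last3      = head("0123456", -3, invert=True)   # ['4', '5', '6']
```
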
class k1lib.bioinfo.cli.filt.columns(*columns: List[int])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*columns: List[int])[source]

Cuts out specific columns, sliceable. Examples:

["0123456789"] | cut(5, 8) | dereference() # returns [['5', '8']]
["0123456789"] | cut(2) | dereference() # returns ['2']
["0123456789"] | ~cut()[:7:2] | dereference() # returns [['1', '3', '5', '7', '8', '9']]

If you’re selecting only 1 column, then Iterator[T] will be returned, not Table[T].

__ror__(it: k1lib.bioinfo.cli.init.Table) → k1lib.bioinfo.cli.init.Table[source]
__invert__()[source]
k1lib.bioinfo.cli.filt.cut

alias of k1lib.bioinfo.cli.filt.columns

class k1lib.bioinfo.cli.filt.rows(*rows: List[int])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*rows: List[int])[source]

Cuts out specific rows. Space complexity O(1) as a list is not constructed (unless you’re using some really weird slices).

Parameters

rows – ints for the row indices

Example:

"0123456789" | rows(2) | dereference() # returns ["2"]
"0123456789" | rows(5, 8) | dereference() # returns ["5", "8"]
"0123456789" | rows()[2:5] | dereference() # returns ["2", "3", "4"]
"0123456789" | ~rows()[2:5] | dereference() # returns ["0", "1", "5", "6", "7", "8", "9"]
"0123456789" | ~rows()[:7:2] | dereference() # returns ['1', '3', '5', '7', '8', '9']
"0123456789" | rows()[:-4] | dereference() # returns ['0', '1', '2', '3', '4', '5']
"0123456789" | ~rows()[:-4] | dereference() # returns ['6', '7', '8', '9']
__invert__()[source]
__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.filt.intersection[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns the intersection of multiple streams. Example:

# returns set([2, 4, 5])
[[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]] | intersection()
__ror__(its: Iterator[Iterator[Any]]) → Set[Any][source]
class k1lib.bioinfo.cli.filt.union[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns the union of multiple streams. Example:

# returns {0, 1, 2, 10, 11, 12, 13, 14}
[range(3), range(10, 15)] | union()
__ror__(its: Iterator[Iterator[Any]]) → Set[Any][source]
class k1lib.bioinfo.cli.filt.notIn(s: Iterator[T])[source]

Bases: object

__init__(s: Iterator[T])[source]

Returns elements that are not in the specified list. Example:

# returns [-5, -4, -3, -2, -1, 10, 11]
range(-5, 12) | notIn(range(10)) | dereference()
__ror__(it: Iterator[T]) → Iterator[T][source]
class k1lib.bioinfo.cli.filt.unique(column: int)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(column: int)[source]

Filters out non-unique row elements. Example:

# returns [[1, "a"], [2, "a"]]
[[1, "a"], [2, "a"], [1, "b"]] | unique(0) | dereference()
__ror__(it)[source]
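
One way to get the result shown above is to keep the first row seen for each value in the chosen column. This is a sketch under that assumption, written as a plain generator rather than a BaseCli subclass.

```python
def unique(table, column):
    """Sketch: keep only the first row for each distinct value in `column` (an assumption)."""
    seen = set()
    for row in table:
        key = row[column]
        if key not in seen:
            seen.add(key)
            yield row

kept = list(unique([[1, "a"], [2, "a"], [1, "b"]], 0))   # [[1, "a"], [2, "a"]]
```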

gb module

All tools related to GenBank file format. Expected to be used behind the “gb” module name, like this:

from k1lib.bioinfo.cli import *
cat("abc.gb") | gb.feats()
class k1lib.bioinfo.cli.gb.feats[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Fetches features, each on a separate stream

__ror__(it)[source]
static filt(*terms: str) → k1lib.bioinfo.cli.init.BaseCli[source]

Filters for specific terms in all the feature texts. If there are multiple terms, then filters for first term, then second, then third, so the term’s order might matter to you

static tag(tag: str) → k1lib.bioinfo.cli.init.BaseCli[source]

Gets a single tag out. Applies this on a single feature only

class k1lib.bioinfo.cli.gb.origin[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns the origin section of the genbank file

__ror__(it)[source]

grep module

class k1lib.bioinfo.cli.grep.grep(pattern: str, before: int = 0, after: int = 0)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(pattern: str, before: int = 0, after: int = 0)[source]

Finds lines that have the specified pattern. Example:

# returns ['c', 'd', '2', 'd']
"abcde12d34" | grep("d", 1) | dereference()
# returns ['d', 'e', 'd', '3', '4']
"abcde12d34" | grep("d", 0, 3).till("e") | dereference()
Parameters
  • pattern – regex pattern to search for in a line

  • before – lines before the hit. Outputs independent lines

  • after – lines after the hit. Outputs independent lines

till(pattern: str)[source]

Greps until some other pattern appears. Before lines will be honored, but after lines will be set to inf. Inclusive.

__ror__(it: Iterator[str]) → Iterator[str][source]
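
To show how the before and after parameters can produce the output in the first example, here is a small sketch built on a bounded deque. It is an assumption about the behavior, not k1lib's implementation, and it leaves out till().

```python
import re
from collections import deque

def grep(lines, pattern, before=0, after=0):
    """Sketch of context-aware matching; an assumption, not k1lib's implementation."""
    regex = re.compile(pattern)
    history = deque(maxlen=before)   # most recent lines not yet emitted
    remaining = 0                    # "after" lines still owed
    for line in lines:
        if regex.search(line):
            yield from history       # context before the hit
            history.clear()
            yield line
            remaining = after
        elif remaining > 0:
            yield line
            remaining -= 1
        else:
            history.append(line)

with_before = list(grep("abcde12d34", "d", before=1))   # ['c', 'd', '2', 'd']
with_after = list(grep("abcde12d34", "d", after=1))     # ['d', 'e', 'd', '3']
```
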
class k1lib.bioinfo.cli.grep.grepToTable(pattern: str, before: int = 0, after: int = 0)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(pattern: str, before: int = 0, after: int = 0)[source]

Searches for a pattern. If found, then put all the before and after lines in different columns. Example:

# returns [['2', 'b'], ['5', 'b']]
"1a\n 2b\n 3c\n 4d\n 5b\n 6c\n f" | grepToTable("b", 1) | dereference()
__ror__(it: Iterator[str]) → k1lib.bioinfo.cli.init.Table[source]
class k1lib.bioinfo.cli.grep.grepTemplate(pattern: str, template: str)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(pattern: str, template: str)[source]

Searches over all lines, picks out the match, expands it into the template, and yields the result

__ror__(it: Iterator[str])[source]

init module

cli.bioinfoSettings = {'defaultDelim': '\t', 'defaultIndent': '  ', 'lookupImgs': True, 'oboFile': None, 'strict': False}

Main settings of k1lib.bioinfo.cli. When using:

from k1lib.bioinfo.cli import *

…you can just set the settings like this:

bioinfoSettings["defaultIndent"] = "\t"

There are a few settings:

  • defaultDelim: default delimiter used in-between columns when creating tables

  • defaultIndent: default indent used for displaying nested structures

  • lookupImgs: whether to automatically look up images when exploring something

  • oboFile: gene ontology obo file location

  • strict: whether strict mode is on. Turning it on can help you debug stuff, but could also be a pain to work with

class k1lib.bioinfo.cli.init.BaseCli[source]

Bases: object

__init__()[source]

Not expected to be instantiated by the end user.

__and__(cli: k1lib.bioinfo.cli.init.BaseCli) → k1lib.bioinfo.cli.init.oneToMany[source]

Duplicates input stream to multiple joined clis.

__add__(cli: k1lib.bioinfo.cli.init.BaseCli) → k1lib.bioinfo.cli.init.manyToManySpecific[source]

Parallel pass multiple streams to multiple clis.

all() → k1lib.bioinfo.cli.init.BaseCli[source]

Applies this cli to all incoming streams

__or__(it) → k1lib.bioinfo.cli.init.serial[source]

Joins clis end-to-end

__ror__(it)[source]
f()[source]

Creates a normal function f(x) which is equivalent to x | self.

__lt__(it)[source]

Default backup join symbol >, in case it implements __ror__()

__call__(it)[source]

Another way to do it | cli

class k1lib.bioinfo.cli.init.serial(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]

Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:

[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist
__ror__(it: Iterator[Any]) → Iterator[Any][source]
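
The reason a class like this is needed shows up in a small sketch. The Cli, Apply and Serial classes below are hypothetical and written only for this example; they are not k1lib's BaseCli or serial.

```python
class Cli:
    """Hypothetical base class for this sketch; not k1lib's BaseCli."""
    def __or__(self, other):
        # cli | cli: there is no data yet, so just remember both stages
        return Serial(self, other)

class Apply(Cli):
    def __init__(self, f): self.f = f
    def __ror__(self, it): return [self.f(x) for x in it]

class Serial(Cli):
    def __init__(self, *clis): self.clis = clis
    def __ror__(self, it):
        # feed the output of each stage into the next one
        for cli in self.clis:
            it = it | cli
        return it

direct = [1, 2] | Apply(lambda x: x + 1) | Apply(lambda x: x * 10)   # [20, 30]
pipeline = Apply(lambda x: x + 1) | Apply(lambda x: x * 10)          # a Serial, built without data
reused = [1, 2] | pipeline                                           # [20, 30]
```

In the first line the data is always on the left, so only __ror__ is involved. In the second line both operands are clis, so __or__ has to return something that can receive data later.
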
class k1lib.bioinfo.cli.init.oneToMany(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]

Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator

__ror__(it: Iterator[Any]) → Iterator[Iterator[Any]][source]
class k1lib.bioinfo.cli.init.manyToMany(cli: k1lib.bioinfo.cli.init.BaseCli)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(cli: k1lib.bioinfo.cli.init.BaseCli)[source]

Applies multiple streams to a single cli. Used in the “a.all()” operator. Note that this operation will use a different copy of the cli for each of the streams.

__ror__(it: Iterator[Iterator[Any]]) → Iterator[Iterator[Any]][source]
class k1lib.bioinfo.cli.init.manyToManySpecific(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]

Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator

__ror__(its: Iterator[Any]) → Iterator[Any][source]
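
The three stream-routing classes above differ only in how streams get paired with clis. This sketch uses plain functions in place of clis, and the snake_case names are hypothetical, chosen for this example.

```python
from itertools import tee

def one_to_many(it, *fs):
    # "a & b": duplicate one stream and give each copy to its own function
    copies = tee(it, len(fs))
    return [f(c) for f, c in zip(fs, copies)]

def many_to_many(streams, f):
    # "a.all()": apply the same function to every incoming stream
    return [f(s) for s in streams]

def many_to_many_specific(streams, *fs):
    # "a + b": pair the i-th stream with the i-th function
    return [f(s) for f, s in zip(fs, streams)]

dup = one_to_many(range(5), sum, max)                             # [10, 4]
every = many_to_many([range(3), range(10)], sum)                  # [3, 45]
paired = many_to_many_specific([range(3), range(10)], sum, max)   # [3, 9]
```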

inp module

This module is for tools that will likely start the processing stream.

k1lib.bioinfo.cli.inp.cat(fileName: Optional[str] = None)[source]

Reads a file line by line. Example:

# display first 10 lines of file
cat("file.txt") | headOut()
# piping in also works
"file.txt" | cat() | headOut()
Parameters

fileName – if None, then return a BaseCli that accepts a file name and outputs Iterator[str]

class k1lib.bioinfo.cli.inp.cats[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Like cat(), but opens multiple files at once, returning streams. Looks something like this:

apply(lambda s: cat(s))

Example:

# prints out first 10 lines of 2 files
["file1.txt", "file2.txt"] | cats() | headOut().all() | ignore()
__ror__(fileNames: Iterator[str]) → Iterator[Iterator[str]][source]
k1lib.bioinfo.cli.inp.curl(url: str) → Iterator[str][source]

Gets file from url. File can’t be a binary blob. Example:

# prints out first 10 lines of the website
curl("https://k1lib.github.io/") | cli.headOut()
k1lib.bioinfo.cli.inp.wget(url: str, fileName: Optional[str] = None)[source]

Downloads a file

Parameters
  • url – The url of the file

  • fileName – if None, then tries to infer it from the url

k1lib.bioinfo.cli.inp.ls(folder: Optional[str] = None, dirs=True, files=True)[source]

List every file and folder inside the specified folder. Example:

# returns List[str]
ls("/home")
# same as above
"/home" | ls()
# only outputs files, not folders
ls("/home", dirs=False)
# same as above
"/home" | ls(dirs=False)
class k1lib.bioinfo.cli.inp.cmd(cmd: str)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(cmd: str)[source]

Runs a command, and returns the output line by line. Example:

# return detailed list of files
None | cmd("ls -la")
# return list of files that ends with "ipynb"
None | cmd("ls -la") | cmd('grep ipynb$')
property err

Error from the last command

__ror__(it: Optional[Iterator[str]]) → Iterator[str][source]

Pipes in lines of input, or if there’s nothing to pass, then pass None

k1lib.bioinfo.cli.inp.requireCli(cliTool: str)[source]

Searches for a particular cli tool (eg. “ls”), throws ImportError if not found, else does nothing

k1lib.bioinfo.cli.inp.infiniteF(f)[source]

Essentially just while True: yield f().
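
Written out, that one-liner is a generator that never ends, so it has to be bounded by whatever consumes it. A minimal sketch, using itertools.islice as the bound:

```python
from itertools import count, islice

def infiniteF(f):
    # Endless generator: calls f() again for every item requested
    while True:
        yield f()

counter = count(1)
first_three = list(islice(infiniteF(lambda: next(counter)), 3))   # [1, 2, 3]
```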

kcsv module

All tools related to csv file format. Expected to be used behind the “kcsv” module name, like this:

from k1lib.bioinfo.cli import *
kcsv.cat("file.csv") | display()
k1lib.bioinfo.cli.kcsv.cat(file: str) → Iterator[str][source]

Opens a csv file, and turns its lines into nice row elements

kxml module

All tools related to xml file format. Expected to be used behind the “kxml” module name, like this:

from k1lib.bioinfo.cli import *
cat("abc.xml") | kxml.node() | kxml.display()
class k1lib.bioinfo.cli.kxml.node[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Turns lines into a single node

__ror__(it: Iterator[str]) → Iterator[xml.etree.ElementTree.Element][source]
class k1lib.bioinfo.cli.kxml.maxDepth(depth: Optional[int] = None, copy: bool = True)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(depth: Optional[int] = None, copy: bool = True)[source]

Filters out too deep nodes

Parameters
  • depth – max depth to include

  • copy – whether to limit the nodes themselves, or limit a copy

__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]
class k1lib.bioinfo.cli.kxml.tag(tag: str)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(tag: str)[source]

Finds all tags that have a particular name. If found, then don’t search deeper

__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]
class k1lib.bioinfo.cli.kxml.pretty(indent: Optional[str] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__ror__(it: Iterator[xml.etree.ElementTree.Element]) → Iterator[str][source]
class k1lib.bioinfo.cli.kxml.display(depth: int = 3, lines: int = 20)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(depth: int = 3, lines: int = 20)[source]

Convenience method for getting the head, making it pretty and printing it out

__ror__(it: Iterator[xml.etree.ElementTree.Element], lines=10)[source]

modifier module

This is for quick modifiers, think of them as changing formats

class k1lib.bioinfo.cli.modifier.apply(f: Callable[[str], str], column: Optional[int] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Callable[[str], str], column: Optional[int] = None)[source]

Applies a function f to every line

Parameters

column – if not None, then applies the function to that column only

__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.modifier.applyMp(f: Callable[[T], T], *args, **kwargs)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Callable[[T], T], *args, **kwargs)[source]

Like apply, but executes f(row) of each row in multiple processes. Example:

# returns [3, 2]
["abc", "de"] | applyMp(lambda s: len(s)) | dereference()
# returns [5, 6, 9]
range(3) | applyMp(lambda x, bias: x**2+bias, bias=5) | dereference()

Internally, this will continuously spawn new jobs up until 80% of all CPU cores are utilized. As the new processes will not share the same memory space as the main process, you should pass all dependencies in the arguments

Parameters

args – arguments to be passed to the function. kwargs too

__ror__(it: Iterator[T]) → Iterator[T][source]
class k1lib.bioinfo.cli.modifier.applySingle(f: Callable[[T], T])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Callable[[T], T])[source]

Like apply, but much simpler, just operating on the entire input object, essentially

__ror__(it: T) → T[source]
k1lib.bioinfo.cli.modifier.applyS

alias of k1lib.bioinfo.cli.modifier.applySingle

k1lib.bioinfo.cli.modifier.lstrip(column: Optional[int] = None, char: Optional[str] = None)[source]

Strips left of every line

k1lib.bioinfo.cli.modifier.rstrip(column: Optional[int] = None, char: Optional[str] = None)[source]

Strips right of every line

k1lib.bioinfo.cli.modifier.strip(column: Optional[int] = None, char: Optional[str] = None)[source]

Strips both sides of every line

k1lib.bioinfo.cli.modifier.upper(column: Optional[int] = None)[source]

Make all characters uppercase

k1lib.bioinfo.cli.modifier.lower(column: Optional[int] = None)[source]

Make all characters lowercase

k1lib.bioinfo.cli.modifier.replace(s: str, target: Optional[str] = None, column: Optional[int] = None)[source]

Replaces substring s with target for each line.

k1lib.bioinfo.cli.modifier.remove(s: str, column: Optional[int] = None)[source]

Removes a specific substring in each line.

k1lib.bioinfo.cli.modifier.toFloat(*columns: List[int])[source]

Converts every row into a float. Excludes non numbers if not in strict mode. Example:

# returns [1.0, 3.0, -2.3]
["1", "3", "-2.3"] | toFloat() | dereference()
# returns [[1.0, 'a'], [2.3, 'b'], [8.0, 'c']]
[["1", "a"], ["2.3", "b"], [8, "c"]] | toFloat(0) | dereference()
Parameters

columns – if nothing, then will convert each row. If available, then convert all the specified columns

k1lib.bioinfo.cli.modifier.toInt(*columns: List[int])[source]

Converts every row into an integer. Excludes non numbers if not in strict mode. Example:

# returns [1, 3, -2]
["1", "3", "-2.3"] | toInt() | dereference()
Parameters

columns – if nothing, then will convert each row. If available, then convert all the specified columns

See also: toFloat()

class k1lib.bioinfo.cli.modifier.sort(column: int = 0, numeric=True, reverse=False)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(column: int = 0, numeric=True, reverse=False)[source]

Sorts all lines based on a specific column.

Parameters
  • column – if None, sort rows based on themselves and not an element

  • numeric – whether to treat column as float

  • reverse – False for smaller to bigger, True for bigger to smaller. Use __invert__() to quickly reverse the order instead of using this param

__ror__(it: Iterator[str])[source]
__invert__()[source]

Creates a clone that has the opposite sort order

class k1lib.bioinfo.cli.modifier.sortF(f: Callable[[T], float], reverse=False)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Callable[[T], float], reverse=False)[source]

Sorts rows using a function. Example:

# returns ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
["a", "aaa", "aaaaa", "aa", "aaaa"] | sortF(lambda r: len(r)) | dereference()
__ror__(it)[source]
__invert__()[source]
class k1lib.bioinfo.cli.modifier.consume(f: Union[k1lib.bioinfo.cli.init.BaseCli, Callable[[T], None]])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Union[k1lib.bioinfo.cli.init.BaseCli, Callable[[T], None]])[source]

Consumes the iterator in a side stream. Returns the iterator. Kinda like the bash command tee. Example:

# prints "0\n1\n2" and returns [0, 1, 2]
range(3) | consume(headOut()) | toList()
# prints "range(0, 3)" and returns [0, 1, 2]
range(3) | consume(lambda it: print(it)) | toList()

This is useful whenever you want to mutate something, but don’t want to include the function result into the main stream.

__ror__(it: T) → T[source]
class k1lib.bioinfo.cli.modifier.randomize(bs=100)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(bs=100)[source]

Randomizes the input stream. In order to be efficient, this does not convert the input iterator to a giant list and yield random values from that. Instead, this fetches bs items at a time, randomizes them, returns them, and then fetches another bs items. If you want to do the giant list, then just pass in float("inf"). Example:

# returns [0, 1, 2, 3, 4], effectively no randomize at all
range(5) | randomize(1) | dereference()
# returns something like this: [1, 0, 2, 3, 5, 4, 6, 8, 7, 9]. You can clearly see the batches
range(10) | randomize(3) | dereference()
# returns something like this: [7, 0, 5, 2, 4, 9, 6, 3, 1, 8]
range(10) | randomize(float("inf")) | dereference()
__ror__(it: Iterator[T]) → Iterator[T][source]
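
The batch-wise shuffling described above can be sketched as a generator that only ever holds bs items in memory. This is an illustration of the idea, not k1lib's implementation.

```python
import random

def randomize(it, bs=100):
    """Sketch of batch-wise shuffling as described above; not k1lib's code."""
    batch = []
    for item in it:
        batch.append(item)
        if len(batch) >= bs:
            random.shuffle(batch)
            yield from batch
            batch = []
    # leftover items (or everything, when bs is float("inf"))
    random.shuffle(batch)
    yield from batch

identity = list(randomize(range(5), 1))           # always [0, 1, 2, 3, 4]
local = list(randomize(range(10), 3))             # shuffled only within groups of 3
full = list(randomize(range(10), float("inf")))   # shuffled across the whole input
```

With a small bs, an element can never move further than its own batch, which is why the batches stay visible in the output.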

output module

For operations that feel like the termination

class k1lib.bioinfo.cli.output.stdout[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Prints out all lines. If not iterable, then print out the input raw

__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.output.file(fileName: str)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__ror__(it: Iterator[str]) → None[source]
class k1lib.bioinfo.cli.output.pretty[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Pretty prints a table

__ror__(it: k1lib.bioinfo.cli.init.Table) → Iterator[str][source]
k1lib.bioinfo.cli.output.display(lines: int = 10)[source]

Convenience method for displaying a table

k1lib.bioinfo.cli.output.headOut(lines: int = 10)[source]

Convenience method for head() | stdout()

sam module

This is for functions that are .sam or .bam related

k1lib.bioinfo.cli.sam.cat(bamFile: str)[source]

Get sam file outputs from bam file

class k1lib.bioinfo.cli.sam.header(long=True)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(long=True)[source]

Adds a header to the table.

Parameters

long – whether to use a long descriptive header, or a short one

__ror__(it)[source]
class k1lib.bioinfo.cli.sam.quality(log=True)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(log=True)[source]

Get numeric quality of sequence.

Parameters

log – whether to use log scale (0 -> 40), or linear scale (1 -> 0.0001)

__ror__(line)[source]

structural module

This is for functions that sort of change the table structure in a dramatic way. They’re the core transformations

class k1lib.bioinfo.cli.structural.joinColumns(fillValue=None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(fillValue=None)[source]

Join multiple columns and loop through all rows. Aka transpose.

Parameters

fillValue – if not None, then will try to zip longest with this fill value

Example:

# returns [[1, 4], [2, 5], [3, 6]]
[[1, 2, 3], [4, 5, 6]] | joinColumns() | dereference()
# returns [[1, 4], [2, 5], [3, 6], [0, 7]]
[[1, 2, 3], [4, 5, 6, 7]] | joinColumns(0) | dereference()
__ror__(it: Iterator[Iterator[T]]) → k1lib.bioinfo.cli.init.Table[source]
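
In plain Python, the transpose corresponds to zip, and the fill-value variant to itertools.zip_longest. The join_columns function here is a hypothetical sketch for illustration, not k1lib's code.

```python
from itertools import zip_longest

def join_columns(streams, fill_value=None):
    # plain zip stops at the shortest stream; zip_longest pads with fill_value
    if fill_value is None:
        return [list(row) for row in zip(*streams)]
    return [list(row) for row in zip_longest(*streams, fillvalue=fill_value)]

square = join_columns([[1, 2, 3], [4, 5, 6]])        # [[1, 4], [2, 5], [3, 6]]
padded = join_columns([[1, 2, 3], [4, 5, 6, 7]], 0)  # [[1, 4], [2, 5], [3, 6], [0, 7]]
back = join_columns(square)                          # [[1, 2, 3], [4, 5, 6]]
```

Transposing twice gives back the original table, so joining columns and splitting columns are the same operation.
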
k1lib.bioinfo.cli.structural.transpose

alias of k1lib.bioinfo.cli.structural.joinColumns

k1lib.bioinfo.cli.structural.splitColumns

alias of k1lib.bioinfo.cli.structural.joinColumns

class k1lib.bioinfo.cli.structural.joinList(element=None, begin=True)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(element=None, begin=True)[source]

Join element into list.

Parameters

element – the element to insert. If None, then takes the input [e, […]], else takes the input […] as usual

Example:

# returns [5, 2, 6, 8]
[5, [2, 6, 8]] | joinList()
# also returns [5, 2, 6, 8]
[2, 6, 8] | joinList(5)
__ror__(it: Tuple[T, Iterator[T]]) → Iterator[T][source]
class k1lib.bioinfo.cli.structural.splitList(weights: List[float] = [0.8, 0.2])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(weights: List[float] = [0.8, 0.2])[source]

Splits list of elements into multiple lists. Example:

# returns [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9]]
range(10) | splitList([0.8, 0.2]) | dereference()
__ror__(it)[source]
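
One way to turn weights into split points is to cut the list at cumulative weight boundaries. The rounding rule used here is an assumption of this sketch, not necessarily what k1lib does.

```python
def split_list(it, weights=(0.8, 0.2)):
    """Sketch: cut a finite sequence at cumulative weight boundaries (an assumption)."""
    items = list(it)
    total = sum(weights)
    parts, start, acc = [], 0, 0.0
    for w in weights:
        acc += w
        end = round(len(items) * acc / total)   # boundary index for this part
        parts.append(items[start:end])
        start = end
    return parts

train, valid = split_list(range(10), [0.8, 0.2])   # [0, ..., 7] and [8, 9]
```
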
class k1lib.bioinfo.cli.structural.joinStreams[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Join multiple streams. Example:

# returns [1, 2, 3, 4, 5]
[[1, 2, 3], [4, 5]] | joinStreams() | dereference()
__ror__(streams: Iterator[Iterator[T]]) → Iterator[T][source]
class k1lib.bioinfo.cli.structural.joinStreamsRandom[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Join multiple streams randomly. If any stream runs out, then quits. Example:

# could return [0, 1, 10, 2, 11, 12, 13, ...], with max length 20, typical length 18
[range(0, 10), range(10, 20)] | joinStreamsRandom() | dereference()
__ror__(streams: Iterator[Iterator[T]]) → Iterator[T][source]
class k1lib.bioinfo.cli.structural.batched(bs=32, includeLast=False)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(bs=32, includeLast=False)[source]

Batches the input stream. Example:

# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
range(11) | batched(3) | dereference()
# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
range(11) | batched(3, True) | dereference()
# returns [[0, 1, 2, 3, 4]]
range(5) | batched(float("inf"), True) | dereference()
# returns []
range(5) | batched(float("inf"), False) | dereference()
__ror__(it)[source]
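
The four cases above follow from a simple accumulate-and-flush loop. This is a sketch of that logic as a plain generator; the include_last name mirrors includeLast, and the code is not k1lib's implementation.

```python
def batched(it, bs=32, include_last=False):
    """Sketch of the documented batching behavior; not k1lib's implementation."""
    batch = []
    for item in it:
        batch.append(item)
        if len(batch) >= bs:
            yield batch
            batch = []
    # a trailing partial batch is only emitted on request
    if include_last and batch:
        yield batch

full_only = list(batched(range(11), 3))                 # drops [9, 10]
with_tail = list(batched(range(11), 3, True))           # keeps [9, 10]
one_big = list(batched(range(5), float("inf"), True))   # [[0, 1, 2, 3, 4]]
nothing = list(batched(range(5), float("inf"), False))  # []
```
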
k1lib.bioinfo.cli.structural.collate()[source]

Puts individual columns into a tensor. Example:

# returns [tensor([ 0, 10, 20]), tensor([ 1, 11, 21]), tensor([ 2, 12, 22])]
[range(0, 3), range(10, 13), range(20, 23)] | collate() | toList()
k1lib.bioinfo.cli.structural.insertRow(*row: List[T])[source]

Inserts a row right before every other row. See also: joinList().

k1lib.bioinfo.cli.structural.insertColumn(*column, begin=True, fillValue='')[source]

Inserts a column at beginning or end. Example:

# returns [['a', 1, 2], ['b', 3, 4]]
[[1, 2], [3, 4]] | insertColumn("a", "b") | dereference()
k1lib.bioinfo.cli.structural.insertIdColumn(table=False, begin=True, fillValue='')[source]

Inserts an id column at the beginning (or end). Example:

# returns [[0, 'a', 2], [1, 'b', 4]]
[["a", 2], ["b", 4]] | insertIdColumn(True) | dereference()
# returns [[0, 'a'], [1, 'b']]
"ab" | insertIdColumn()
Parameters

table – if False, then insert column to an Iterator[str], else treat input as a full-fledged table
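
The difference between the two modes of the table parameter can be sketched in plain Python. The function below is a hypothetical stand-in; it ignores fillValue and is not the library's implementation.

```python
from typing import Any, Iterable, Iterator, List

def insert_id_column(it: Iterable[Any], table: bool = False,
                     begin: bool = True) -> Iterator[List[Any]]:
    """Hypothetical stand-in: numbers each row, starting from 0."""
    for i, row in enumerate(it):
        # table=True: the row is already a list of cells.
        # table=False: the row is a single element that gets wrapped first.
        cells = list(row) if table else [row]
        yield [i, *cells] if begin else [*cells, i]

print(list(insert_id_column([["a", 2], ["b", 4]], table=True)))  # [[0, 'a', 2], [1, 'b', 4]]
print(list(insert_id_column("ab")))                              # [[0, 'a'], [1, 'b']]
```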

class k1lib.bioinfo.cli.structural.toDict(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]

Transform an incoming stream into a dict using a function for values. Example:

names = ["wanda", "vision", "loki", "mobius"]
names | toDict(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...}
names | toDict(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}
__ror__(keys: Iterator[Any])Dict[Any, Any][source]
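
In plain Python, the two examples above amount to a dict comprehension. This sketch assumes that a missing keyF or valueF falls back to the identity function, which matches the first example but is otherwise an assumption.

```python
from typing import Any, Callable, Dict, Iterable, Optional

def to_dict(keys: Iterable[Any],
            key_f: Optional[Callable[[Any], Any]] = None,
            value_f: Optional[Callable[[Any], Any]] = None) -> Dict[Any, Any]:
    """Hypothetical stand-in: maps key_f(element) to value_f(element)."""
    key_f = key_f or (lambda e: e)      # assumed default: identity
    value_f = value_f or (lambda e: e)  # assumed default: identity
    return {key_f(e): value_f(e) for e in keys}

names = ["wanda", "vision", "loki", "mobius"]
print(to_dict(names, value_f=len))
# {'wanda': 5, 'vision': 6, 'loki': 4, 'mobius': 6}
```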
class k1lib.bioinfo.cli.structural.split(delim: Optional[str] = None, idx: Optional[int] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(delim: Optional[str] = None, idx: Optional[int] = None)[source]

Splits each line using a delimiter, and outputs the parts as separate lines. Example:

# returns ["a", "b", "d", "e"]
["a,b", "d,e"] | split(",") | dereference()
# returns ['b', 'e']
["a,b", "d,e"] | split(",", 1) | dereference()
Parameters

idx – if available, only outputs the element at that index

__ror__(it: Iterator[str])[source]
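
The effect of idx is easiest to see in a plain-Python sketch. This is a hypothetical stand-in built on str.split, not the library's code.

```python
from typing import Iterable, Iterator, Optional

def split(lines: Iterable[str], delim: Optional[str] = None,
          idx: Optional[int] = None) -> Iterator[str]:
    """Hypothetical stand-in: flattens the parts of every line into one stream."""
    for line in lines:
        parts = line.split(delim)  # delim=None splits on whitespace
        if idx is None:
            yield from parts       # every part becomes its own line
        else:
            yield parts[idx]       # only the part at that index survives

print(list(split(["a,b", "d,e"], ",")))     # ['a', 'b', 'd', 'e']
print(list(split(["a,b", "d,e"], ",", 1)))  # ['b', 'e']
```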
class k1lib.bioinfo.cli.structural.expandE(f: Callable[[T], List[T]], column: int)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Callable[[T], List[T]], column: int)[source]

Expands table element to multiple columns. Example:

# returns [['abc', 3, -2], ['de', 2, -5]]
[["abc", -2], ["de", -5]] | expandE(lambda e: (e, len(e)), 0) | dereference()
Parameters

f – Function that transforms 1 row element to multiple

__ror__(it)[source]
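
A plain-Python sketch of the same expansion, assuming the new elements replace the original one in place. The name expand_e is hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, List, Sequence

def expand_e(rows: Iterable[Sequence[Any]],
             f: Callable[[Any], Sequence[Any]],
             column: int) -> Iterator[List[Any]]:
    """Hypothetical stand-in: replaces row[column] with all elements of f(row[column])."""
    for row in rows:
        row = list(row)
        yield [*row[:column], *f(row[column]), *row[column + 1:]]

rows = [["abc", -2], ["de", -5]]
print(list(expand_e(rows, lambda e: (e, len(e)), 0)))
# [['abc', 3, -2], ['de', 2, -5]]
```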
class k1lib.bioinfo.cli.structural.table(delim: Optional[str] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(delim: Optional[str] = None)[source]

Splits lines to rows (List[str]) using a delimiter. Example:

# returns [['a', 'bd'], ['1', '2', '3']]
["a|bd", "1|2|3"] | table("|") | dereference()
__ror__(it: Iterator[str])k1lib.bioinfo.cli.init.Table[source]
class k1lib.bioinfo.cli.structural.stitch(delim: Optional[str] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(delim: Optional[str] = None)[source]

Stitches elements in a row together, so they become a simple string. See also: pretty. Example:

# returns ['1|2', '3|4', '5|6']
[[1, 2], [3, 4], [5, 6]] | stitch("|") | dereference()
__ror__(it: k1lib.bioinfo.cli.init.Table)Iterator[str][source]
k1lib.bioinfo.cli.structural.listToTable()[source]

Turns Iterator[T] into Table[T]

k1lib.bioinfo.cli.structural.tableFromList()

Turns Iterator[T] into Table[T]

class k1lib.bioinfo.cli.structural.count[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Finds unique elements and returns a table with [frequency, value, percent] columns. Example:

# returns [[1, 'a', '33%'], [2, 'b', '67%']]
['a', 'b', 'b'] | count() | dereference()
__ror__(it: Iterator[str])[source]
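
The [frequency, value, percent] layout can be reproduced with collections.Counter. This sketch assumes that rows come out in first-seen order and that percentages are rounded to whole numbers; both are guesses based on the example above.

```python
from collections import Counter
from typing import Any, Iterable, List

def count(it: Iterable[Any]) -> List[List[Any]]:
    """Hypothetical stand-in: one [frequency, value, percent] row per unique value."""
    counter = Counter(it)  # keeps first-seen order of the values
    total = sum(counter.values())
    return [[n, value, f"{round(100 * n / total)}%"] for value, n in counter.items()]

print(count(["a", "b", "b"]))  # [[1, 'a', '33%'], [2, 'b', '67%']]
```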
class k1lib.bioinfo.cli.structural.permute(*permutations: List[int])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*permutations: List[int])[source]

Permutes the columns. Acts kinda like torch.Tensor.permute(). Example:

# returns [['b', 'a'], ['d', 'c']]
["ab", "cd"] | permute(1, 0) | dereference()
__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.structural.accumulate(columnIdx: int = 0, avg=False)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(columnIdx: int = 0, avg=False)[source]

Groups lines that have the same row[columnIdx], and adds together all other columns, assuming they’re numbers

Parameters
  • columnIdx – common column index to accumulate

  • avg – calculate average values instead of sum

__ror__(it: Iterator[str])[source]
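
Since no example is given, here is a rough plain-Python sketch of the grouping described above. The output layout (key first, then the summed columns) and the conversion to float are assumptions, not documented behavior.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence

def accumulate(rows: Iterable[Sequence[Any]], column_idx: int = 0,
               avg: bool = False) -> Iterator[List[Any]]:
    """Hypothetical stand-in: sums (or averages) the other columns per key."""
    sums: Dict[Any, List[float]] = {}
    counts: Dict[Any, int] = {}
    for row in rows:
        key = row[column_idx]
        values = [float(e) for i, e in enumerate(row) if i != column_idx]
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], values)]
            counts[key] += 1
        else:
            sums[key], counts[key] = values, 1
    for key, totals in sums.items():
        if avg:
            totals = [t / counts[key] for t in totals]
        yield [key, *totals]  # assumed layout: key first, then the numbers

rows = [["a", 1, 2], ["b", 3, 4], ["a", 5, 6]]
print(list(accumulate(rows)))            # [['a', 6.0, 8.0], ['b', 3.0, 4.0]]
print(list(accumulate(rows, avg=True)))  # [['a', 3.0, 4.0], ['b', 3.0, 4.0]]
```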
class k1lib.bioinfo.cli.structural.AA_(*idxs: List[int], wraps=False)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*idxs: List[int], wraps=False)[source]

Returns 2 streams, one that has the selected element, and the other the rest. Example:

[1, 5, 6, 3, 7] | AA_(1) # will return [5, [1, 6, 3, 7]]

You can also put multiple indexes through:

[1, 5, 6] | AA_(0, 2) # will return [[1, [5, 6]], [6, [1, 5]]]

If you put None in, then all indexes will be sliced:

[1, 5, 6] | AA_(None)

# will return:
# [[1, [5, 6]],
#  [5, [1, 6]],
#  [6, [1, 5]]]

As for why the strange name, think of this operation as “AĀ”. In statistics, say you have a set “A”, then “not A” is commonly written as A with an overline, “Ā”. So “AA_” represents “AĀ”, and the order indicates that the selection A is returned first.

Parameters

wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā

__ror__(it: List[Any])List[List[List[Any]]][source]
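
The selection logic can be sketched in plain Python. The function name aa_ is hypothetical, and treating None as "all indexes" follows the description above rather than the library source.

```python
from typing import Any, List, Optional, Sequence

def aa_(items: Sequence[Any], *idxs: Optional[int], wraps: bool = False) -> List[Any]:
    """Hypothetical stand-in: returns [A, rest] for every selected index."""
    items = list(items)
    if not idxs or None in idxs:  # assumed: None selects every index
        positions = list(range(len(items)))
    else:
        positions = [range(len(items))[i] for i in idxs]  # normalizes negative indexes
    pairs = [[items[i], items[:i] + items[i + 1:]] for i in positions]
    # A single index returns the bare pair, unless wraps asks for a uniform shape
    if len(idxs) == 1 and idxs[0] is not None and not wraps:
        return pairs[0]
    return pairs

print(aa_([1, 5, 6, 3, 7], 1))  # [5, [1, 6, 3, 7]]
print(aa_([1, 5, 6], 0, 2))     # [[1, [5, 6]], [6, [1, 5]]]
```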
class k1lib.bioinfo.cli.structural.peek[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns (firstRow, iterator). This sort of peeks at the first row, to potentially gain some insights about the internal formats. Example:

e, it = iter([[1, 2, 3], [1, 2]]) | peek()
print(e) # prints "[1, 2, 3]"
s = 0
for e in it: s += len(e)
print(s) # prints "5", or length of 2 lists
__ror__(it: Iterator[T])Tuple[T, Iterator[T]][source]
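
Note that in the example the returned iterator still contains the first row (the lengths add up to 5). One way to get that behavior in plain Python is to put the consumed row back with itertools.chain. This is a sketch, not the library's implementation, and it raises StopIteration on an empty input.

```python
from itertools import chain
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

def peek(it: Iterable[T]) -> Tuple[T, Iterator[T]]:
    """Hypothetical stand-in: returns the first row plus an iterator over all rows."""
    iterator = iter(it)
    first = next(iterator)                   # consumes the first row
    return first, chain([first], iterator)   # then puts it back in front

first, rest = peek(iter([[1, 2, 3], [1, 2]]))
print(first)                       # [1, 2, 3]
total = sum(len(row) for row in rest)
print(total)                       # 5
```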
class k1lib.bioinfo.cli.structural.peekF(f: Union[k1lib.bioinfo.cli.init.BaseCli, Callable[[T], T]])[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Union[k1lib.bioinfo.cli.init.BaseCli, Callable[[T], T]])[source]

Similar to peek, but will execute f(row) and return the input Iterator. Example:

it = lambda: iter([[1, 2, 3], [1, 2]])
# prints "[1, 2, 3]" and returns [[1, 2, 3], [1, 2]]
it() | peekF(lambda x: print(x)) | dereference()
# prints "1\n2\n3"
it() | peekF(headOut()) | dereference()
__ror__(it: Iterator[T])Iterator[T][source]
class k1lib.bioinfo.cli.structural.repeat(limit: Optional[int] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Yields the passed-in object a specified number of times. Example:

# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3] | repeat(3) | toList()
Parameters

limit – if None, then repeats indefinitely

__ror__(o: T)Iterator[T][source]
class k1lib.bioinfo.cli.structural.infiniteFrom[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Yields from a list. If it runs out of elements, then starts over from the beginning. Example:

# returns [1, 2, 3, 1, 2]
[1, 2, 3] | infiniteFrom() | head(5) | dereference()
__ror__(it: Iterator[T])Iterator[T][source]
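
A plain-Python sketch of the same cycling behavior. It assumes the input can be stored in a list so it can be replayed; the name infinite_from is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def infinite_from(items: Iterable[T]) -> Iterator[T]:
    """Hypothetical stand-in: replays the elements forever."""
    stored = list(items)  # keep a copy so the elements can be replayed
    if not stored:
        return            # nothing to repeat, avoid an endless empty loop
    while True:
        yield from stored

first_five = list(islice(infinite_from([1, 2, 3]), 5))
print(first_five)  # [1, 2, 3, 1, 2]
```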

utils module

This is for all short utilities that have the boilerplate feeling

class k1lib.bioinfo.cli.utils.size(idx=None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(idx=None)[source]

Returns number of rows and columns in the input.

Parameters

idx – if idx is None return (rows, columns). If 0 or 1, then rows or columns

__ror__(it: Iterator[str])[source]
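
The role of idx can be illustrated with a small plain-Python sketch. Taking the column count from the first row is an assumption; the library may handle ragged tables differently.

```python
from typing import Any, Iterable, Optional, Sequence, Tuple, Union

def size(rows: Iterable[Sequence[Any]],
         idx: Optional[int] = None) -> Union[int, Tuple[int, int]]:
    """Hypothetical stand-in: returns (rows, columns), or just one of them."""
    table = [list(row) for row in rows]
    n_columns = len(table[0]) if table else 0  # assumed: first row decides
    shape = (len(table), n_columns)
    return shape if idx is None else shape[idx]

table = [["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
print(size(table))     # (3, 3)
print(size(table, 0))  # 3
```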
k1lib.bioinfo.cli.utils.shape

alias of k1lib.bioinfo.cli.utils.size

class k1lib.bioinfo.cli.utils.item[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns the first row

__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.utils.identity[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Yields whatever the input is. Useful for multiple streams

__ror__(it: Iterator[Any])[source]
class k1lib.bioinfo.cli.utils.toStr[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts every line (possibly just a number) to a string.

__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.utils.to1Str(delim: Optional[str] = None)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(delim: Optional[str] = None)[source]

Merges all strings into 1, with delim in the middle

__ror__(it: Iterator[str])[source]
class k1lib.bioinfo.cli.utils.toNumpy[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts generator to numpy array

__ror__(it: Iterator[float])[source]
class k1lib.bioinfo.cli.utils.toTensor[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts generator to torch.Tensor

__ror__(it)[source]
class k1lib.bioinfo.cli.utils.toList[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts generator to list. list would do the same, but this is just to maintain the style

__ror__(it: Iterator[Any])List[Any][source]
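
To show why such a thin wrapper is enough to keep the pipe style, here is a self-contained sketch of a class that only defines __ror__. The class name ToList is hypothetical and BaseCli is deliberately not reproduced.

```python
from typing import Any, Iterable, List

class ToList:
    """Hypothetical stand-in for a pipeline stage that ends in a list."""
    def __ror__(self, it: Iterable[Any]) -> List[Any]:
        # Python calls this for `it | ToList()` when `it` itself
        # does not know how to handle the | operator.
        return list(it)

result = range(3) | ToList()
print(result)  # [0, 1, 2]
```

This only works when the left operand has no matching __or__ of its own; otherwise the left operand handles the operator first.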
class k1lib.bioinfo.cli.utils.wrapList[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Wraps inputs inside a list

__ror__(it: Any)List[Any][source]
class k1lib.bioinfo.cli.utils.toSet[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts generator to set. set would do the same, but this is just to maintain the style

__ror__(it: Iterator[Any])Set[Any][source]
class k1lib.bioinfo.cli.utils.toIter[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts object to iterator. iter() would do the same, but this is just to maintain the style

__ror__(it)Iterator[Any][source]
class k1lib.bioinfo.cli.utils.toRange[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns iter(range(len(it))), effectively

__ror__(it: Iterator[Any])Iterator[int][source]
class k1lib.bioinfo.cli.utils.equals[source]

Bases: object

Checks if all incoming columns/streams are identical

__ror__(streams: Iterator[Iterator[str]])[source]
class k1lib.bioinfo.cli.utils.reverse[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Prints last line first, first line last

__ror__(it: Iterator[str])List[str][source]
class k1lib.bioinfo.cli.utils.ignore[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Just executes everything, ignoring the output

__ror__(it: Iterator[Any])[source]
class k1lib.bioinfo.cli.utils.toSum[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Calculates the sum of list of numbers

__ror__(it: Iterator[float])[source]
class k1lib.bioinfo.cli.utils.toAvg[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Calculates average of list of numbers

__ror__(it: Iterator[float])[source]
class k1lib.bioinfo.cli.utils.toMax[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Calculates the max of a bunch of numbers

__ror__(it: Iterator[float])float[source]
class k1lib.bioinfo.cli.utils.toMin[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Calculates the min of a bunch of numbers

__ror__(it: Iterator[float])float[source]
class k1lib.bioinfo.cli.utils.lengths[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns the lengths of each row.

__ror__(it: Iterator[List[Any]])Iterator[int][source]
k1lib.bioinfo.cli.utils.headerIdx()[source]

Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know what your column’s index is to cut it out. Also sets the context variable “header”, in case you need it later. Example:

# returns [[0, 'a'], [1, 'b'], [2, 'c']]
["abc"] | headerIdx() | dereference()
# returns "abc"
ctx["header"]()
class k1lib.bioinfo.cli.utils.dereference(ignoreTensors=False, maxDepth=inf)[source]

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(ignoreTensors=False, maxDepth=inf)[source]

Recursively converts any iterator into a list. Only str, numbers.Number are not converted. Example:

# returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5))
# returns [0, 1, 2, 3, 4]
iter(range(5)) | dereference()

You can also specify a maxDepth:

# returns something like "<list_iterator at 0x7f810cf0fdc0>"
iter([range(3)]) | dereference(maxDepth=0)
# returns [range(3)]
iter([range(3)]) | dereference(maxDepth=1)
# returns [[0, 1, 2]]
iter([range(3)]) | dereference(maxDepth=2)
Parameters

ignoreTensors – if True, then don’t loop over torch.Tensor internals

Warning

Works well with PyTorch tensors, but not with NumPy arrays, as they interfere with the __ror__ operator, so do torch.from_numpy(…) first.

__ror__(it: Iterator[Any])List[Any][source]
__invert__()k1lib.bioinfo.cli.init.BaseCli[source]

Returns a BaseCli that makes everything an iterator.
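
The maxDepth examples for dereference can be reproduced with a short recursive function. This is a plain-Python sketch that ignores tensors entirely; the private _depth parameter is an illustration detail, not part of the library.

```python
from collections.abc import Iterable
from numbers import Number
from typing import Any

def dereference(obj: Any, max_depth: float = float("inf"), _depth: int = 0) -> Any:
    """Hypothetical stand-in: turns nested iterables into nested lists."""
    if isinstance(obj, (str, Number)):  # leaves: never converted
        return obj
    if _depth >= max_depth:             # depth budget used up: leave as-is
        return obj
    if isinstance(obj, Iterable):
        return [dereference(e, max_depth, _depth + 1) for e in obj]
    return obj

print(dereference(iter(range(5))))                 # [0, 1, 2, 3, 4]
print(dereference(iter([range(3)]), max_depth=1))  # [range(0, 3)]
print(dereference(iter([range(3)]), max_depth=2))  # [[0, 1, 2]]
```

Checking for str before Iterable matters: a one-character string iterates to itself, so recursing into it would never terminate.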