cli package¶

The main idea of this package is to emulate the terminal, but to do all of that inside Python itself. So this bash statement:

cat file.txt | head -5 > headerFile.txt

Turns into this statement:

cat("file.txt") | head(5) > file("headerFile.txt")

Here, “cat”, “head” and “file” are all classes extended from BaseCli. All of them implement the “reverse or” operation, or __ror__. Essentially, these 2 statements are equivalent:

3 | obj
obj.__ror__(3)
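
To make the mechanism concrete, here is a minimal sketch of a pipe-able class. HeadSketch is a made-up stand-in written for this example, not one of the package's own classes; it only shows how Python falls back to __ror__ when the left operand has no suitable | operator of its own.

```python
class HeadSketch:
    """Hypothetical stand-in for a cli tool; not the package's own class."""

    def __init__(self, n):
        self.n = n

    def __ror__(self, it):
        # Python calls this for `it | self` when `it` cannot handle `|` itself
        return list(it)[:self.n]


first_two = "abcde" | HeadSketch(2)           # operator form
same_thing = HeadSketch(2).__ror__("abcde")   # explicit method call
print(first_two, same_thing)
```

Note that this fallback only happens when the left-hand object does not implement | for the right-hand type, which is why plain strings, lists and iterators work well as inputs.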

Also, a lot of these tools assume that we are operating on a table. So this table:

col1	col2	col3
1	2	3
4	5	6

Is equivalent to this list:

[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
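
As a rough illustration of that equivalence, the sketch below turns tab-separated text into the list-of-rows form using plain Python only. The variable names are made up for this example, and the integer conversion is an assumption about how you would want the body cells typed.

```python
# Tab-separated text, one table row per line
text = "col1\tcol2\tcol3\n1\t2\t3\n4\t5\t6"

# Splitting gives a list of rows, each row being a list of string cells
table = [line.split("\t") for line in text.splitlines()]

# Cells stay strings until converted; here the header is kept as-is
header, *body = table
rows = [header] + [[int(cell) for cell in row] for row in body]
print(rows)
```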

Also, the expected way to use these tools is to import everything directly into the current environment, like this:

from k1lib.bioinfo.cli import *

Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check this out:

Cli streams tutorial

Submodules¶

bio module¶

This is for functions that are actually biology-related

k1lib.bioinfo.cli.bio.go(term: int)[source]¶: Looks up a GO term

class k1lib.bioinfo.cli.bio.transcribe[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Transcribes (DNA -> RNA) incoming rows

__ror__(it)¶
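
A minimal sketch of what transcription could look like on a stream of sequences. This assumes each row is a coding-strand DNA string, so transcription reduces to swapping T for U; the function name and the uppercasing are choices made for this example, not the package's implementation.

```python
def transcribe_sketch(rows):
    """Swap T for U in every incoming sequence (coding-strand assumption)."""
    for row in rows:
        # Uppercasing first is a choice of this sketch, so "t" is handled too
        yield row.upper().replace("T", "U")


rna = list(transcribe_sketch(["ATGC", "ttaa"]))
print(rna)
```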

class k1lib.bioinfo.cli.bio.complement[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__ror__(it)¶

class k1lib.bioinfo.cli.bio.translate(length: int = 0)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(length: int = 0)[source]¶

Translates incoming rows.

Parameters: length – 0 for short (L), 1 for med (Leu), 2 for long (Leucine)

__ror__(it)¶

class k1lib.bioinfo.cli.bio.medAa[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts short aa sequence to medium one

__ror__(it)¶

class k1lib.bioinfo.cli.bio.longAa[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts short aa sequence to long one

__ror__(it)¶

ctx module¶

All tools related to context variables. Expected to be used behind the “ctx” module name, like this:

from k1lib.bioinfo.cli import *
ctx.set("country", 3)

class k1lib.bioinfo.cli._ctx.Promise(ctx: str)[source]¶

Bases: object

Not intended to be used by the end user. Use __call__() to get the actual value

static strip(o)[source]¶: If o is a Promise, then gets its value

k1lib.bioinfo.cli._ctx.getC(ctx: str)[source]¶: Gets the context variable

k1lib.bioinfo.cli._ctx.setC(ctx: str, value)[source]¶: Sets the context variable

class k1lib.bioinfo.cli._ctx.enum(ctx: str)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(ctx: str)[source]¶: Saves the list index to context.

__ror__(it)[source]¶

class k1lib.bioinfo.cli._ctx.identity(ctx: str)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(ctx: str)[source]¶: Saves the list element to context.

__ror__(it)[source]¶

entrez module¶

This module is not really fleshed out, not that useful/elegant, and I just use cmd instead

k1lib.bioinfo.cli.entrez.esearch(db: str = 'nucleotide', query: str = 'PRJNA257197')[source]¶

k1lib.bioinfo.cli.entrez.efetch(db: Optional[str] = None, ids: Optional[Union[str, List[str]]] = None, format: Optional[str] = None)[source]¶

mgi module¶

All tools related to the MGI database. Expected to be used behind the “mgi” module name, like this:

from k1lib.bioinfo.cli import *
["SOD1", "AMPK"] | mgi.batch()

class k1lib.bioinfo.cli.mgi.batch(headless=True)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Queries the MGI database, converting a list of genes to MGI ids

__init__(headless=True)[source]¶

Parameters: headless – whether to run this operation headless, or actually display the browser

__ror__(it: List[str])[source]¶

filt module¶

This is for functions that cut out specific parts of the table

class k1lib.bioinfo.cli.filt.filt(predicate: Callable[[str], bool], column: Optional[int] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(predicate: Callable[[str], bool], column: Optional[int] = None)[source]¶

Filters out lines.

Parameters

column –

if integer, then predicate(row[column])
if None, then predicate(line)

__ror__(it)¶

__invert__()[source]¶: Negate the condition
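
The predicate/column/negation behaviour described above can be sketched like this. FiltSketch is a hypothetical class written for illustration; it assumes rows are indexable when a column is given, and is not the package's actual code.

```python
class FiltSketch:
    """Hypothetical filter: keeps rows whose value passes the predicate."""

    def __init__(self, predicate, column=None, negate=False):
        self.predicate = predicate
        self.column = column
        self.negate = negate

    def __ror__(self, it):
        for row in it:
            # column=None tests the whole line, otherwise only row[column]
            value = row if self.column is None else row[self.column]
            if bool(self.predicate(value)) != self.negate:
                yield row

    def __invert__(self):
        # ~filt returns a new filter with the condition flipped
        return FiltSketch(self.predicate, self.column, not self.negate)


rows = [["a", "1"], ["b", "x"], ["c", "3"]]
kept = list(rows | FiltSketch(str.isdigit, 1))
dropped = list(rows | ~FiltSketch(str.isdigit, 1))
print(kept, dropped)
```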

k1lib.bioinfo.cli.filt.isValue(value, column: Optional[int] = None)[source]¶: Filters out lines that are different from the given value

k1lib.bioinfo.cli.filt.inSet(values: Set[Any], column: Optional[int] = None)[source]¶: Filters out lines that are not in the specified set

k1lib.bioinfo.cli.filt.contains(s: str, column: Optional[int] = None)[source]¶: Filters out lines that don’t contain the specified substring

class k1lib.bioinfo.cli.filt.nonEmptyStream[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Filters out streams that have no rows

__ror__(it)¶

k1lib.bioinfo.cli.filt.startswith(s: str, column: Optional[int] = None)[source]¶: Filters out lines that don’t start with s

k1lib.bioinfo.cli.filt.endswith(s: str, column: Optional[int] = None)[source]¶: Filters out lines that don’t end with s

k1lib.bioinfo.cli.filt.isNumeric(column: Optional[int] = None)[source]¶: Filters out a line if that column is not a number

k1lib.bioinfo.cli.filt.inRange(min: Optional[float] = None, max: Optional[float] = None, column: Optional[int] = None)[source]¶: Checks whether a column is in range or not

class k1lib.bioinfo.cli.filt.head(n: int = 10)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(n: int = 10)[source]¶

Only outputs the first n lines. You can also negate it (like ~head(5)), which then only outputs the lines after the first n. Examples:

"abcde" | head(2) | dereference() # returns ["a", "b"]
"abcde" | ~head(2) | dereference() # returns ["c", "d", "e"]
"0123456" | head(-3) | dereference() # returns ['0', '1', '2', '3']
"0123456" | ~head(-3) | dereference() # returns ['4', '5', '6']

__ror__(it)¶

__invert__()[source]¶
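
The four examples above behave like ordinary Python slicing, which the following sketch makes explicit. HeadSlice is a made-up name, and the sketch materialises the input into a list for simplicity, which the real tool may well avoid.

```python
class HeadSlice:
    """Hypothetical head: items[:n], or items[n:] when inverted."""

    def __init__(self, n=10, inverted=False):
        self.n = n
        self.inverted = inverted

    def __ror__(self, it):
        items = list(it)  # simplification: reads the whole input at once
        return items[self.n:] if self.inverted else items[:self.n]

    def __invert__(self):
        return HeadSlice(self.n, not self.inverted)


print("abcde" | HeadSlice(2))
print("abcde" | ~HeadSlice(2))
print("0123456" | HeadSlice(-3))
print("0123456" | ~HeadSlice(-3))
```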

class k1lib.bioinfo.cli.filt.columns(*columns: List[int])[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*columns: List[int])[source]¶

Cuts out specific columns, sliceable. Examples:

["0123456789"] | cut(5, 8) | dereference() # returns [['5', '8']]
["0123456789"] | cut(2) | dereference() # returns ['2']
["0123456789"] | ~cut()[:7:2] | dereference() # returns [['1', '3', '5', '7', '8', '9']]

If you’re selecting only 1 column, then Iterator[T] will be returned, not Table[T].

__ror__(it)¶

__invert__()[source]¶

k1lib.bioinfo.cli.filt.cut¶: alias of k1lib.bioinfo.cli.filt.columns

class k1lib.bioinfo.cli.filt.rows(*rows: List[int])[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*rows: List[int])[source]¶

Cuts out specific rows. Space complexity O(1) as a list is not constructed (unless you’re using some really weird slices).

Parameters: rows – ints for the row indices

Example:

"0123456789" | rows(2) | dereference() # returns ["2"]
"0123456789" | rows(5, 8) | dereference() # returns ["5", "8"]
"0123456789" | rows()[2:5] | dereference() # returns ["2", "3", "4"]
"0123456789" | ~rows()[2:5] | dereference() # returns ["0", "1", "5", "6", "7", "8", "9"]
"0123456789" | ~rows()[:7:2] | dereference() # returns ['1', '3', '5', '7', '8', '9']
"0123456789" | rows()[:-4] | dereference() # returns ['0', '1', '2', '3', '4', '5']
"0123456789" | ~rows()[:-4] | dereference() # returns ['6', '7', '8', '9']

__invert__()[source]¶

__ror__(it)¶

class k1lib.bioinfo.cli.filt.intersection[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns the intersection of multiple streams. Example:

[[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]] | intersection() # will return set([2, 4, 5])

__ror__(it)¶
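
In plain Python, the same result can be had by folding set intersection over the streams. This is only a sketch of the idea; the helper name is made up, and it assumes at least one stream is passed in.

```python
from functools import reduce


def intersection_sketch(streams):
    """Intersect any number of streams; assumes at least one stream."""
    return reduce(set.intersection, (set(s) for s in streams))


common = intersection_sketch([[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]])
print(common)
```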

gb module¶

All tools related to the GenBank file format. Expected to be used behind the “gb” module name, like this:

from k1lib.bioinfo.cli import *
cat("abc.gb") | gb.feats()

class k1lib.bioinfo.cli.gb.feats[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Fetches features, each on a separate stream

__ror__(it)[source]¶

static filt(*terms: str) → k1lib.bioinfo.cli.init.BaseCli [source]¶: Filters for specific terms in all the features texts. If there are multiple terms, then filters for first term, then second, then third, so the term’s order might matter to you

static tag(tag: str) → k1lib.bioinfo.cli.init.BaseCli [source]¶: Gets a single tag out. Applies this on a single feature only

class k1lib.bioinfo.cli.gb.origin[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Return the origin section of the genbank file

__ror__(it)[source]¶

grep module¶

class k1lib.bioinfo.cli.grep.grep(pattern: str, before: int = 0, after: int = 0)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(pattern: str, before: int = 0, after: int = 0)[source]¶

Finds lines that have the specified pattern. Example:

# returns ['c', 'd', '2', 'd']
"abcde12d34" | grep("d", 1) | dereference()
# returns ['d', 'e', 'd', '3', '4']
"abcde12d34" | grep("d", 0, 3).till("e") | dereference()

Parameters

pattern – regex pattern to search for in a line
before – lines before the hit. Outputs independent lines
after – lines after the hit. Outputs independent lines

till(pattern: str)[source]¶: Greps until some other pattern appears. Before lines will be honored, but after lines will be set to inf. Inclusive.

__ror__(it)¶
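
To show how the before and after parameters could work, here is a small generator that keeps a bounded history of recent lines. It is a sketch under assumptions: the function name is made up, till() is not covered, and how overlapping contexts are handled here is a guess rather than the package's documented behaviour.

```python
import re
from collections import deque


def grep_sketch(lines, pattern, before=0, after=0):
    """Yield matching lines plus `before`/`after` lines of context."""
    rx = re.compile(pattern)
    history = deque(maxlen=before)  # maxlen=0 simply keeps nothing
    remaining = 0                   # "after" lines still owed
    for line in lines:
        if rx.search(line):
            yield from history      # context that came before the hit
            history.clear()
            yield line
            remaining = after
        elif remaining > 0:
            yield line
            remaining -= 1
        else:
            history.append(line)


print(list(grep_sketch("abcde12d34", "d", before=1)))
print(list(grep_sketch("abcde12d34", "d", after=1)))
```

Iterating over a string yields single characters, so each character plays the role of a line here, just like in the examples above.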

class k1lib.bioinfo.cli.grep.grepToTable(pattern: str, before: int = 0, after: int = 0)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(pattern: str, before: int = 0, after: int = 0)[source]¶

Searches for a pattern. If found, then put all the before and after lines in different columns. Example:

# returns [['2', 'b'], ['5', 'b']]
"1a\n 2b\n 3c\n 4d\n 5b\n 6c\n f" | grepToTable("b", 1) | dereference()

__ror__(it)¶

class k1lib.bioinfo.cli.grep.grepTemplate(pattern: str, template: str)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(pattern: str, template: str)[source]¶: Searches over all lines, picks out the match, expands it to the template and yields it

__ror__(it)¶

init module¶

cli.bioinfoSettings = {'defaultDelim': '\t', 'defaultIndent': ' ', 'lookupImgs': True, 'oboFile': None, 'strict': False}¶

Main settings of k1lib.bioinfo.cli. When using:

from k1lib.bioinfo.cli import *

…you can just set the settings like this:

bioinfoSettings["defaultIndent"] = "\t"

There are a few settings:

defaultDelim: default delimiter used in-between columns when creating tables

defaultIndent: default indent used for displaying nested structures

lookupImgs: whether to automatically look up images when exploring something

oboFile: gene ontology obo file location

strict: whether strict mode is on. Turning it on can help you debug stuff, but could also be a pain to work with

class k1lib.bioinfo.cli.init.BaseCli[source]¶

Bases: object

all() → k1lib.bioinfo.cli.init.BaseCli [source]¶: Applies this BaseCli to all incoming streams

__ror__(it)¶

f()[source]¶: Creates a normal function \(f(x)\) which is equivalent to x | self.

class k1lib.bioinfo.cli.init.serial(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶

Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:

[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist

__ror__(it)¶
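
One way such chaining could be wired up is sketched below. Everything here (BaseCliSketch, SerialSketch, Double, AddOne) is hypothetical and written for this example; in particular, routing `cli | cli` through __or__ is an assumption about the mechanism, not a description of the package's source.

```python
class BaseCliSketch:
    def __or__(self, other):
        # cli | cli has no data yet, so bundle both into one pipeline
        if isinstance(other, BaseCliSketch):
            return SerialSketch(self, other)
        return NotImplemented

    def __ror__(self, it):
        raise NotImplementedError


class SerialSketch(BaseCliSketch):
    def __init__(self, *clis):
        self.clis = clis

    def __ror__(self, it):
        # Feed the output of each cli into the next one
        for cli in self.clis:
            it = it | cli
        return it


class Double(BaseCliSketch):
    def __ror__(self, it):
        return [x * 2 for x in it]


class AddOne(BaseCliSketch):
    def __ror__(self, it):
        return [x + 1 for x in it]


direct = [1, 2] | Double() | AddOne()   # data first, evaluated left to right
pipeline = Double() | AddOne()          # no data yet, builds a SerialSketch
deferred = [1, 2] | pipeline
print(direct, deferred)
```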

class k1lib.bioinfo.cli.init.oneToMany(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶: Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator

__ror__(it)¶

class k1lib.bioinfo.cli.init.manyToMany(cli)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(cli)[source]¶: Applies multiple streams to a single cli. Used in the “a.all()” operator.

__ror__(it)¶

class k1lib.bioinfo.cli.init.manyToManySpecific(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶: Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator

__ror__(it)¶

inp module¶

k1lib.bioinfo.cli.inp.cat(fileName: Optional[str] = None)[source]¶

Reads a file line by line.

Parameters: fileName – if None, then return a BaseCli that accepts a file name and outputs Iterator[str]

class k1lib.bioinfo.cli.inp.cats[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Like cat(), but opens multiple files at once, returning streams. Looks something like this:

apply(lambda s: cat(s))

__ror__(it)¶

k1lib.bioinfo.cli.inp.curl(url: str) → Iterator[str][source]¶: Gets file from url

k1lib.bioinfo.cli.inp.wget(url: str, fileName: Optional[str] = None)[source]¶

Downloads a file

Parameters

url – The url of the file
fileName – if None, then tries to infer it from the url

class k1lib.bioinfo.cli.inp.cmd(cmd: str)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(cmd: str)[source]¶: Runs a command, and returns the output line by line.

property err¶: Error from the last command

__ror__(it)¶

k1lib.bioinfo.cli.inp.requireCli(cliTool: str)[source]¶: Searches for a particular cli tool (eg. “ls”), throws ImportError if not found, else does nothing

k1lib.bioinfo.cli.inp.infiniteF(f)[source]¶: Essentially just while True: yield f().
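
Spelled out as a generator, that one-liner looks like the sketch below. The name infinite_f and the counter used to drive it are made up for the example; since the stream never ends, something like itertools.islice is needed to take a finite piece.

```python
from itertools import count, islice


def infinite_f(f):
    """Call f() forever, yielding each result."""
    while True:
        yield f()


ticks = count()                                   # 0, 1, 2, ...
first_three = list(islice(infinite_f(lambda: next(ticks)), 3))
print(first_three)
```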

kcsv module¶

All tools related to the csv file format. Expected to be used behind the “kcsv” module name, like this:

from k1lib.bioinfo.cli import *
kcsv.cat("file.csv") | display()

k1lib.bioinfo.cli.kcsv.cat(file: str) → Iterator[str][source]¶: Opens a csv file, and turns its lines into nice row elements

kxml module¶

All tools related to the xml file format. Expected to be used behind the “kxml” module name, like this:

from k1lib.bioinfo.cli import *
cat("abc.xml") | kxml.node() | kxml.display()

class k1lib.bioinfo.cli.kxml.node[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Turns lines into a single node

__ror__(it: Iterator[str]) → Iterator[xml.etree.ElementTree.Element][source]¶

class k1lib.bioinfo.cli.kxml.maxDepth(depth: Optional[int] = None, copy: bool = True)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(depth: Optional[int] = None, copy: bool = True)[source]¶

Filters out nodes that are too deep

Parameters

depth – max depth to include
copy – whether to limit the nodes themselves, or limit a copy

__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶

class k1lib.bioinfo.cli.kxml.tag(tag: str)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(tag: str)[source]¶: Finds all tags that have a particular name. If found, then don’t search deeper

__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶

class k1lib.bioinfo.cli.kxml.pretty(indent: Optional[str] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__ror__(it: Iterator[xml.etree.ElementTree.Element]) → Iterator[str][source]¶

class k1lib.bioinfo.cli.kxml.display(depth: int = 3, lines: int = 20)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(depth: int = 3, lines: int = 20)[source]¶: Convenience method for getting the head, making it pretty and printing it out

__ror__(it: Iterator[xml.etree.ElementTree.Element], lines=10)[source]¶

modifier module¶

This is for quick modifiers, think of them as changing formats

class k1lib.bioinfo.cli.modifier.apply(f: Callable[[str], str], column: Optional[int] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Callable[[str], str], column: Optional[int] = None)[source]¶

Applies a function f to every line

Parameters: column – if not None, then applies the function to that column only

__ror__(it)¶

class k1lib.bioinfo.cli.modifier.applySingle(f: Callable[[Any], Any])[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(f: Callable[[Any], Any])[source]¶: Like apply, but much simpler, just operating on the entire input object, essentially

__ror__(it)¶

k1lib.bioinfo.cli.modifier.applyS¶: alias of k1lib.bioinfo.cli.modifier.applySingle

k1lib.bioinfo.cli.modifier.lstrip(column: Optional[int] = None, char: Optional[str] = None)[source]¶: Strips left of every line

k1lib.bioinfo.cli.modifier.rstrip(column: Optional[int] = None, char: Optional[str] = None)[source]¶: Strips right of every line

k1lib.bioinfo.cli.modifier.strip(column: Optional[int] = None, char: Optional[str] = None)[source]¶: Strips both sides of every line

k1lib.bioinfo.cli.modifier.upper(column: Optional[int] = None)[source]¶: Make all characters uppercase

k1lib.bioinfo.cli.modifier.lower(column: Optional[int] = None)[source]¶: Make all characters lowercase

k1lib.bioinfo.cli.modifier.replace(s: str, target: Optional[str] = None, column: Optional[int] = None)[source]¶: Replaces substring s with target for each line.

k1lib.bioinfo.cli.modifier.remove(s: str, column: Optional[int] = None)[source]¶: Removes a specific substring in each line.

k1lib.bioinfo.cli.modifier.toFloat(column: Optional[int] = None)[source]¶: Converts every row into a float. Excludes non numbers if not in strict mode.

k1lib.bioinfo.cli.modifier.toInt(column: Optional[int] = None)[source]¶: Converts every row into an integer. Excludes non numbers if not in strict mode.

class k1lib.bioinfo.cli.modifier.sort(column: int = 0, numeric=True, reverse=False)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(column: int = 0, numeric=True, reverse=False)[source]¶

Sorts all lines based on a specific column.

Parameters

numeric – whether to treat column as float
reverse – False for smaller to bigger, True for bigger to smaller. Use __invert__() to quickly reverse the order instead of using this param

__ror__(it)¶

__invert__()[source]¶: Creates a clone that has the opposite sort order

output module¶

For operations that feel like the end of a pipeline

class k1lib.bioinfo.cli.output.file(fileName: str)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__ror__(it)¶

class k1lib.bioinfo.cli.output.pretty[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Pretty prints a table

__ror__(it)¶

k1lib.bioinfo.cli.output.display(lines: int = 10)[source]¶: Convenience method for displaying a table

k1lib.bioinfo.cli.output.headOut(lines: int = 10)[source]¶: Convenience method for head() | stdout

sam module¶

This is for functions that are .sam or .bam related

k1lib.bioinfo.cli.sam.cat(bamFile: str)[source]¶: Get sam file outputs from bam file

class k1lib.bioinfo.cli.sam.header(long=True)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(long=True)[source]¶

Adds a header to the table.

Parameters: long – whether to use a long descriptive header, or a short one

__ror__(it)[source]¶

class k1lib.bioinfo.cli.sam.quality(log=True)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(log=True)[source]¶

Get numeric quality of sequence.

Parameters: log – whether to use log scale (0 -> 40), or linear scale (1 -> 0.0001)

__ror__(line)[source]¶
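
The two scales mentioned above match the usual Phred convention, so here is a sketch under that assumption: each character of the quality string encodes a score Q as ASCII code minus 33, and the linear value is the error probability 10^(-Q/10). The function name and the offset parameter are made up for this example.

```python
def phred_scores(quality, log=True, offset=33):
    """Decode a quality string, assuming Phred+33 encoding."""
    scores = [ord(ch) - offset for ch in quality]
    if log:
        return scores                          # e.g. 0 .. 40
    return [10 ** (-q / 10) for q in scores]   # e.g. 1 .. 0.0001


print(phred_scores("!+I"))             # log scale
print(phred_scores("!I", log=False))   # linear scale (error probabilities)
```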

structural module¶

This is for functions that change the table structure in a dramatic way. They’re the core transformations

class k1lib.bioinfo.cli.structural.joinColumns(fillValue=None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(fillValue=None)[source]¶

Join multiple columns and loop through all rows. Aka transpose.

Parameters: fillValue – if not None, then will try to zip longest with this fill value

Example:

# returns [[1, 4], [2, 5], [3, 6]]
[[1, 2, 3], [4, 5, 6]] | joinColumns() | dereference()
# returns [[1, 4], [2, 5], [3, 6], [0, 7]]
[[1, 2, 3], [4, 5, 6, 7]] | joinColumns(0) | dereference()

__ror__(it)¶
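
Both examples above can be reproduced with zip and itertools.zip_longest, as in this sketch. The helper name is made up, and unlike the real tool it returns a fully built list instead of a lazy stream.

```python
from itertools import zip_longest


def join_columns_sketch(table, fill_value=None):
    """Transpose a table; pad short rows only when fill_value is given."""
    if fill_value is None:
        return [list(col) for col in zip(*table)]
    return [list(col) for col in zip_longest(*table, fillvalue=fill_value)]


print(join_columns_sketch([[1, 2, 3], [4, 5, 6]]))
print(join_columns_sketch([[1, 2, 3], [4, 5, 6, 7]], 0))
```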

k1lib.bioinfo.cli.structural.transpose¶: alias of k1lib.bioinfo.cli.structural.joinColumns

k1lib.bioinfo.cli.structural.splitColumns¶: alias of k1lib.bioinfo.cli.structural.joinColumns

class k1lib.bioinfo.cli.structural.joinList(element=None, begin=True)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(element=None, begin=True)[source]¶

Join element into list.

Parameters: element – the element to insert. If None, then takes the input [e, […]], else takes the input […] as usual

Example:

# returns [5, 2, 6, 8]
[5, [2, 6, 8]] | joinList()
# also returns [5, 2, 6, 8]
[2, 6, 8] | joinList(5)

__ror__(it)¶

class k1lib.bioinfo.cli.structural.splitList(weights: List[float] = [0.8, 0.2])[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(weights: List[float] = [0.8, 0.2])[source]¶

Splits list of elements into multiple lists. Example:

# returns [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9]]
range(10) | splitList([0.8, 0.2]) | dereference()

__ror__(it)¶
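
A possible way to split by weights is to turn the cumulative weights into cut points, as sketched here. The helper name is made up, and rounding the cut points is an assumption of this sketch; the package may decide boundaries differently when the split is not exact.

```python
def split_list_sketch(items, weights=(0.8, 0.2)):
    """Split items into consecutive chunks sized proportionally to weights."""
    items = list(items)
    total = sum(weights)
    chunks, start, cumulative = [], 0, 0.0
    for w in weights:
        cumulative += w
        end = round(len(items) * cumulative / total)  # cut point for this chunk
        chunks.append(items[start:end])
        start = end
    return chunks


print(split_list_sketch(range(10), [0.8, 0.2]))
```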

class k1lib.bioinfo.cli.structural.joinStreams[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Join multiple streams. Example:

# returns [1, 2, 3, 4, 5]
[[1, 2, 3], [4, 5]] | joinStreams() | dereference()

__ror__(it)¶

class k1lib.bioinfo.cli.structural.joinStreamsRandom[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Join multiple streams randomly. If any stream runs out, then quits. Example:

# could return [0, 1, 10, 2, 11, 12, 13, ...], with max length 20, typical length 18
[range(0, 10), range(10, 20)] | joinStreamsRandom() | dereference()

__ror__(it)¶
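
The "quit as soon as a stream runs out" behaviour could be sketched as follows: pick a stream at random each step, and stop the first time the chosen one is exhausted. The function name and the rng parameter are made up for this example.

```python
import random


def join_streams_random_sketch(streams, rng=random):
    """Interleave streams at random until a chosen stream is exhausted."""
    iterators = [iter(s) for s in streams]
    while True:
        chosen = rng.choice(iterators)
        try:
            yield next(chosen)
        except StopIteration:
            return  # one stream ran dry, so the whole thing stops


mixed = list(join_streams_random_sketch([range(0, 10), range(10, 20)]))
print(len(mixed), mixed)
```

Each stream keeps its own internal order; only the interleaving is random, which is why the output length varies from run to run.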

class k1lib.bioinfo.cli.structural.batched(bs=32)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(bs=32)[source]¶

Batches the input stream. Ignores the last batch. Example:

# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
range(11) | batched(3) | dereference()

__ror__(it)¶
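
A minimal generator with the same "drop the incomplete last batch" behaviour might look like this. The name batched_sketch is made up; it is an illustration, not the package's implementation.

```python
def batched_sketch(it, bs=32):
    """Yield full batches of size bs; an incomplete final batch is dropped."""
    batch = []
    for item in it:
        batch.append(item)
        if len(batch) == bs:
            yield batch
            batch = []
    # anything left in `batch` here is smaller than bs and is ignored


print(list(batched_sketch(range(11), 3)))
```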

k1lib.bioinfo.cli.structural.collate()[source]¶

Puts individual columns into a tensor. Example:

# returns [tensor([ 0, 10, 20]), tensor([ 1, 11, 21]), tensor([ 2, 12, 22])]
[range(0, 3), range(10, 13), range(20, 23)] | collate() | toList()

k1lib.bioinfo.cli.structural.insertRow(*row: List[T])[source]¶: Inserts a row right before every other row. See also: joinList().

k1lib.bioinfo.cli.structural.insertColumn(*column, begin=True, fillValue='')[source]¶

Inserts a column at beginning or end. Example:

# returns [['a', 1, 2], ['b', 3, 4]]
[[1, 2], [3, 4]] | insertColumn("a", "b") | dereference()

k1lib.bioinfo.cli.structural.insertIdColumn(table=False, begin=True, fillValue='')[source]¶

Inserts an id column at the beginning (or end). Example:

# returns [[0, 'a', 2], [1, 'b', 4]]
[["a", 2], ["b", 4]] | insertIdColumn(True) | dereference()
# returns [[0, 'a'], [1, 'b']]
"ab" | insertIdColumn()

Parameters: table – if False, then insert column to an Iterator[str], else treat input as a full fledged table

class k1lib.bioinfo.cli.structural.toDict(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶

Transform an incoming stream into a dict using a function for values. Example:

names = ["wanda", "vision", "loki", "mobius"]
names | toDict(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...}
names | toDict(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}

__ror__(it)¶
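
In plain Python this is a dict comprehension with optional key and value functions, as in the sketch below. The helper name is made up, and treating a missing function as the identity is an assumption drawn from the two examples above.

```python
def to_dict_sketch(it, key_f=None, value_f=None):
    """Build a dict from a stream; missing functions default to identity."""
    key_f = key_f or (lambda x: x)
    value_f = value_f or (lambda x: x)
    return {key_f(x): value_f(x) for x in it}


names = ["wanda", "vision", "loki", "mobius"]
print(to_dict_sketch(names, value_f=len))
print(to_dict_sketch(names, str.title, len))
```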

class k1lib.bioinfo.cli.structural.split(delim: Optional[str] = None, idx: Optional[int] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(delim: Optional[str] = None, idx: Optional[int] = None)[source]¶

Splits each line using a delimiter, and outputs the parts as a separate line.

Parameters: idx – if available, only outputs the element at that index

__ror__(it)¶

class k1lib.bioinfo.cli.structural.table(delim: Optional[str] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(delim: Optional[str] = None)[source]¶

Splits lines to rows (List[str]) using a delimiter. Example:

# returns [['a', 'bd'], ['1', '2', '3']]
["a|bd", "1|2|3"] | table("|") | dereference()

__ror__(it)¶

class k1lib.bioinfo.cli.structural.stitch(delim: Optional[str] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(delim: Optional[str] = None)[source]¶

Stitches elements in a row together, so they become a simple string. See also: pretty. Example:

# returns ['1|2', '3|4', '5|6']
[[1, 2], [3, 4], [5, 6]] | stitch("|") | dereference()

__ror__(it)¶

k1lib.bioinfo.cli.structural.listToTable()[source]¶: Turns Iterator[T] into Table[T]

k1lib.bioinfo.cli.structural.tableFromList()¶: Turns Iterator[T] into Table[T]

class k1lib.bioinfo.cli.structural.count[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Finds unique elements and returns a table with [frequency, value, percent] columns. Example:

# returns [[1, 'a', '33%'], [2, 'b', '67%']]
['a', 'b', 'b'] | count() | dereference()

__ror__(it)¶
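
The [frequency, value, percent] table can be approximated with collections.Counter, as sketched here. The helper name is made up; rows come out in first-appearance order and percentages are rounded to whole numbers, both of which are assumptions that happen to match the example above.

```python
from collections import Counter


def count_sketch(it):
    """Return [frequency, value, percent] rows for each unique element."""
    counts = Counter(it)
    total = sum(counts.values())
    return [[n, value, f"{round(100 * n / total)}%"]
            for value, n in counts.items()]


print(count_sketch(["a", "b", "b"]))
```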

class k1lib.bioinfo.cli.structural.permute(*permutations: List[int])[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*permutations: List[int])[source]¶: Permutes the columns. Acts kinda like torch.Tensor.permute()

__ror__(it)¶

class k1lib.bioinfo.cli.structural.accumulate(columnIdx: int = 0, avg=False)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(columnIdx: int = 0, avg=False)[source]¶

Groups lines that have the same row[columnIdx], and add together all other columns, assuming they’re numbers

Parameters

columnIdx – common column index to accumulate
avg – calculate average values instead of sum

__ror__(it)¶
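
A rough sketch of the grouping logic follows. The function name is made up, and the output layout (key first, then the summed or averaged columns as floats) is an assumption of this example, since the layout is not spelled out above.

```python
def accumulate_sketch(rows, column_idx=0, avg=False):
    """Group rows by row[column_idx] and sum (or average) the other columns."""
    sums, counts = {}, {}
    for row in rows:
        key = row[column_idx]
        rest = [float(v) for i, v in enumerate(row) if i != column_idx]
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], rest)]
        else:
            sums[key] = rest
        counts[key] = counts.get(key, 0) + 1
    result = []
    for key, totals in sums.items():
        if avg:
            totals = [t / counts[key] for t in totals]
        result.append([key] + totals)  # assumed layout: key column first
    return result


table = [["a", 1, 2], ["b", 3, 4], ["a", 5, 6]]
print(accumulate_sketch(table))
print(accumulate_sketch(table, avg=True))
```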

class k1lib.bioinfo.cli.structural.AA_(*idxs: List[int], wraps=False)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(*idxs: List[int], wraps=False)[source]¶

Returns 2 streams, one that has the selected element, and the other the rest. Example:

[1, 5, 6, 3, 7] | AA_(1) # will return [5, [1, 6, 3, 7]]

You can also put multiple indexes through:

[1, 5, 6] | AA_(0, 2) # will return [[1, [5, 6]], [6, [1, 5]]]

If you put None in, then all indexes will be sliced:

[1, 5, 6] | AA_()

# will return:
# [[1, [5, 6]],
#  [5, [1, 6]],
#  [6, [1, 5]]]

As for why the strange name, think of this operation as “AĀ”. In statistics, say you have a set “A”, then “not A” is commonly written as A with an overline “Ā”. So “AA_” represents “AĀ”, and that it first returns the selection A.

Parameters: wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā

__ror__(it)¶
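
The selection/rest split can be sketched with list slicing. The function aa_sketch is hypothetical; it treats "no index given" as "use every index", and unwraps the result when there is exactly one pair and wraps is False, which is how this sketch reads the description above.

```python
def aa_sketch(items, *idxs, wraps=False):
    """Return [selected, rest] pairs for each requested index."""
    items = list(items)
    if not idxs:
        idxs = range(len(items))  # assumption: no index means all indexes
    pairs = [[items[i], items[:i] + items[i + 1:]] for i in idxs]
    if len(pairs) == 1 and not wraps:
        return pairs[0]           # single selection: no outer list
    return pairs


print(aa_sketch([1, 5, 6, 3, 7], 1))
print(aa_sketch([1, 5, 6], 0, 2))
print(aa_sketch([1, 5, 6]))
print(aa_sketch([1, 5, 6, 3, 7], 1, wraps=True))
```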

class k1lib.bioinfo.cli.structural.peek[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns (firstRow, iterator). This sort of peeks at the first row, to potentially gain some insights about the internal formats. Example:

e, it = iter([[1, 2, 3], [1, 2]]) | peek()
print(e) # prints "[1, 2, 3]"
s = 0
for e in it: s += len(e)
print(s) # prints "5", or length of 2 lists

__ror__(it)¶
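
Peeking without losing the first row is commonly done by pulling one element and chaining it back in front, as in this sketch. The function name is made up, and an empty input raising StopIteration is a limitation of the sketch rather than a statement about the package.

```python
from itertools import chain


def peek_sketch(it):
    """Return (first_row, iterator that still starts with first_row)."""
    it = iter(it)
    first = next(it)               # raises StopIteration on empty input
    return first, chain([first], it)


first_row, stream = peek_sketch(iter([[1, 2, 3], [1, 2]]))
total_cells = sum(len(row) for row in stream)
print(first_row, total_cells)
```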

class k1lib.bioinfo.cli.structural.repeat(limit: Optional[int] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Yields a specified amount of the passed in object. Example:

# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3] | repeat(3) | toList()

Parameters: limit – if None, then repeats indefinitely

__ror__(it)¶

class k1lib.bioinfo.cli.structural.infiniteFrom[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Yields from a list. If it runs out of elements, then starts over from the beginning. Example:

# returns [1, 2, 3, 1, 2]
[1, 2, 3] | infiniteFrom() | head(5) | dereference()

__ror__(it)¶
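
The standard library already has a close analogue in itertools.cycle, shown here as a sketch of the same idea. Whether the package uses cycle internally is not something this example claims.

```python
from itertools import cycle, islice

# cycle() restarts from the first element once the input is exhausted
looped = list(islice(cycle([1, 2, 3]), 5))
print(looped)
```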

utils module¶

This is for all short utilities that have the boilerplate feeling

class k1lib.bioinfo.cli.utils.size(idx=None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(idx=None)[source]¶

Returns number of rows and columns in the input.

Parameters: idx – if idx is None return (rows, columns). If 0 or 1, then rows or columns

__ror__(it)¶

k1lib.bioinfo.cli.utils.shape¶: alias of k1lib.bioinfo.cli.utils.size

class k1lib.bioinfo.cli.utils.item[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns the first row

__ror__(it)¶

class k1lib.bioinfo.cli.utils.identity[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Yields whatever the input is. Useful for multiple streams

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toStr[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts every line (possibly just a number) to a string.

__ror__(it)¶

class k1lib.bioinfo.cli.utils.to1Str(delim: Optional[str] = None)[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

__init__(delim: Optional[str] = None)[source]¶: Merges all strings into 1, with delim in the middle

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toNumpy[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts generator to numpy array

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toTensor[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts generator to torch.Tensor

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toList[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts generator to list. list would do the same, but this is just to maintain the style

__ror__(it)¶

class k1lib.bioinfo.cli.utils.wrapList[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Wraps inputs inside a list

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toSet[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts generator to set. set would do the same, but this is just to maintain the style

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toIter[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Converts object to iterator. iter() would do the same, but this is just to maintain the style

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toRange[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns iter(range(len(it))), effectively

__ror__(it)¶

class k1lib.bioinfo.cli.utils.equals[source]¶

Bases: object

Checks if all incoming columns/streams are identical

__ror__(streams: Iterator[Iterator[str]])[source]¶

class k1lib.bioinfo.cli.utils.reverse[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Prints last line first, first line last

__ror__(it)¶

class k1lib.bioinfo.cli.utils.ignore[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Just executes everything, ignoring the output

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toSum[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Calculates the sum of list of numbers

__ror__(it)¶

class k1lib.bioinfo.cli.utils.toAvg[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Calculates average of list of numbers

__ror__(it)¶

class k1lib.bioinfo.cli.utils.lengths[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Returns the lengths of each row.

__ror__(it)¶

k1lib.bioinfo.cli.utils.headerIdx()[source]¶

Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know what your column’s index is to cut it out. Example:

# returns [[0, 'a'], [1, 'b'], [2, 'c']]
["abc"] | headerIdx() | dereference()

class k1lib.bioinfo.cli.utils.dereference[source]¶

Bases: k1lib.bioinfo.cli.init.BaseCli

Recursively converts any iterator into a list. Only str, numbers.Number are not converted. Example:

iter(range(5)) # returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5)) | dereference() # returns [0, 1, 2, 3, 4]

Warning

Works well with PyTorch tensors, but not with NumPy arrays, as they interfere with the __ror__ operator, so call torch.from_numpy(…) first.

__ror__(it)¶
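
The recursive conversion can be sketched in a few lines of plain Python. The function below is hypothetical and only covers built-in iterables; strings and numbers are returned untouched, as described above, and anything that is not iterable is passed through as-is.

```python
from numbers import Number


def dereference_sketch(obj):
    """Recursively turn iterators/iterables into lists."""
    if isinstance(obj, (str, Number)):
        return obj                  # leaves: never expanded
    try:
        it = iter(obj)
    except TypeError:
        return obj                  # not iterable, keep as-is
    return [dereference_sketch(x) for x in it]


print(dereference_sketch(iter(range(5))))
print(dereference_sketch([range(2), "ab", (3, [4.5])]))
```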