cli package
The main idea of this package is to emulate the terminal, but do all of that inside Python itself. So this bash statement:
cat file.txt | head -5 > headerFile.txt
Turns into this statement:
cat("file.txt") | head(5) > file("headerFile.txt")
Here, “cat”, “head” and “file” are all classes extended from BaseCli. All of them implement the “reverse or” operation, or __ror__.
Essentially, these 2 statements are equivalent:
3 | obj
obj.__ror__(3)
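To make the equivalence concrete, here is a minimal sketch of the mechanism in plain Python. The `Head` class below is a hypothetical stand-in written for illustration, not k1lib's actual `head`; it only assumes that the left-hand operand (here a list) does not know how to `|` with it, so Python falls back to `__ror__`.

```python
class Head:
    """Hypothetical stand-in for a cli tool (not k1lib's real `head`)."""

    def __init__(self, n):
        self.n = n

    def __ror__(self, it):
        # Python calls this for `it | Head(n)` because `it` (a list)
        # has no `|` operation that accepts a Head instance.
        return list(it)[:self.n]


result = [1, 2, 3, 4] | Head(2)
print(result)  # [1, 2]
```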
Also, a lot of these tools assume that we are operating on a table. So this table:
| col1 | col2 | col3 |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |
Is equivalent to this list:
["col1\tcol2\tcol3", "1\t2\t3", "4\t5\t6"]
Essentially, each row is a single string, and elements in a row are separated by a
delimiter. You can set the default delimiter using
bioinfoSettings like this:
bioinfoSettings["defaultDelim"] = ","
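As a rough illustration of the row format, the sketch below splits such a list back into cells using plain Python only. The variable names are made up for this example, and `delim` is simply assumed to mirror the default tab delimiter; none of this calls into the package itself.

```python
rows = ["col1\tcol2\tcol3", "1\t2\t3", "4\t5\t6"]
delim = "\t"  # assumed to mirror the default delimiter

# Each row is a single string, so splitting on the delimiter recovers the cells
table = [row.split(delim) for row in rows]

# Picking out one column is then just indexing into every row
second_column = [cells[1] for cells in table]
print(second_column)  # ['col2', '2', '5']
```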
Also, the expected way to use these tools is to import everything directly into the current environment, like this:
from k1lib.bioinfo.cli import *
Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check over this:
Submodules
bio module
This is for functions that are actually biology-related
- 
class 
k1lib.bioinfo.cli.bio.transcribe[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Transcribes (DNA -> RNA) incoming rows
- 
class 
k1lib.bioinfo.cli.bio.translate(length: int = 0)[source]¶ 
- 
class 
k1lib.bioinfo.cli.bio.medAa[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts short aa sequence to medium one
entrez module
This module is not really fleshed out, not that useful/elegant, and I just use
cmd instead
mgi module
- 
class 
k1lib.bioinfo.cli.mgi.batch(headless=True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Queries the MGI database, converting a list of genes to MGI ids
filt module
This is for functions that cut out specific parts of the table
- 
class 
k1lib.bioinfo.cli.filt.filt(predicate: Callable[[str], bool], column: int = 0, delim: Optional[str] = None)[source]¶ 
- 
class 
k1lib.bioinfo.cli.filt.isValue(value, column: int = 0, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.filt.filt
- 
class 
k1lib.bioinfo.cli.filt.inside(values: Set[Any], column: int = 0, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.filt.filt
- 
class 
k1lib.bioinfo.cli.filt.nonEmptyStream[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Filters out streams that have no rows
- 
class 
k1lib.bioinfo.cli.filt.startswith(s: str, column: int = 0, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.filt.filt
- 
class 
k1lib.bioinfo.cli.filt.endswith(s: str, column: int = 0, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.filt.filt
- 
class 
k1lib.bioinfo.cli.filt.isNumeric(column: Optional[int] = None, delim: Optional[str] = None)[source]¶ 
- 
class 
k1lib.bioinfo.cli.filt.inRange(min: Optional[float] = None, max: Optional[float] = None, column: Optional[int] = None, delim: Optional[str] = None)[source]¶ 
- 
class 
k1lib.bioinfo.cli.filt.columns(*columns: Union[int, slice, List[int]], delim: Optional[str] = None)[source]¶ 
- 
k1lib.bioinfo.cli.filt.cut¶ alias of
k1lib.bioinfo.cli.filt.columns
- 
class 
k1lib.bioinfo.cli.filt.every(length: int, offset: int = 0)[source]¶ 
- 
class 
k1lib.bioinfo.cli.filt.intersection[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the intersection of multiple streams. Example:
[[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]] | intersection() # will return set([2, 4, 5])
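For comparison, the same result can be sketched in plain Python. `intersection_sketch` is a hypothetical helper written for this illustration; it is only assumed to match the behaviour described above, and is not the package's implementation.

```python
def intersection_sketch(streams):
    """Return the elements common to every stream (illustrative only)."""
    sets = [set(stream) for stream in streams]
    # With no streams at all there is nothing to intersect
    return set.intersection(*sets) if sets else set()


common = intersection_sketch([[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]])
print(common == {2, 4, 5})  # True
```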
grep module
- 
class 
k1lib.bioinfo.cli.grep.grep(pattern: str, before: int = 0, after: int = 0, singleLine: bool = False, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(pattern: str, before: int = 0, after: int = 0, singleLine: bool = False, delim: Optional[str] = None)[source]¶ Finds lines that contain the specified pattern.
- Parameters
 pattern – regex pattern to search for in a line
before – number of lines to include before each hit. These are output as independent lines
after – number of lines to include after each hit. These are output as independent lines
singleLine – set to True to bunch the before and after lines into a single line
delim – the delimiter placed between sections if singleLine is True
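The before/after context idea can be sketched in plain Python as below. `grep_sketch` is a hypothetical function for illustration; how the real class handles overlapping context windows is not stated here, so this sketch simply assumes each line is emitted at most once, and it leaves out the singleLine mode entirely.

```python
import re
from typing import Iterable, Iterator


def grep_sketch(lines: Iterable[str], pattern: str,
                before: int = 0, after: int = 0) -> Iterator[str]:
    """Yield matching lines plus surrounding context (illustrative only)."""
    lines = list(lines)
    regex = re.compile(pattern)
    emitted = set()  # assumption: never emit the same line twice
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            for j in range(start, stop):
                if j not in emitted:
                    emitted.add(j)
                    yield lines[j]


hits = list(grep_sketch(["a", "b", "hit", "c", "d"], "hit", before=1, after=1))
print(hits)  # ['b', 'hit', 'c']
```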
- 
 
init module
- 
cli.bioinfoSettings = {'defaultDelim': '\t', 'defaultIndent': ' ', 'lookupImgs': True, 'oboFile': None, 'strict': False}¶ Main settings of k1lib.bioinfo.cli. When using:
from k1lib.bioinfo.cli import *
…you can just set the settings like this:
bioinfoSettings["defaultIndent"] = "\t"
There are a few settings:
defaultDelim: default delimiter used in-between columns when creating tables
defaultIndent: default indent used for displaying nested structures
lookupImgs: whether to automatically look up images when exploring something
oboFile: gene ontology obo file location
strict: whether strict mode is on. Turning it on can help you debug stuff, but could also be a pain to work with
- 
k1lib.bioinfo.cli.init.patchDefaultDelim(s: str)[source]¶ - Parameters
 s – if not None, returns s itself; otherwise returns the default delimiter in settings
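The fallback behaviour described here amounts to a one-line check. A minimal sketch, where `settings` is a hypothetical stand-in for bioinfoSettings and `patch_default_delim` is a made-up name for illustration:

```python
# Hypothetical stand-in for bioinfoSettings, used only in this sketch
settings = {"defaultDelim": "\t"}


def patch_default_delim(s):
    """Return `s` unless it is None, in which case use the configured default."""
    return settings["defaultDelim"] if s is None else s


print(repr(patch_default_delim(None)))  # '\t'
print(repr(patch_default_delim(",")))   # ','
```

Checking against None rather than truthiness means an empty string would still be passed through unchanged in this sketch.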
- 
k1lib.bioinfo.cli.init.patchDefaultIndent(s: str)[source]¶ - Parameters
 s – if not None, returns s itself; otherwise returns the default indent character in settings
- 
class 
k1lib.bioinfo.cli.init.BaseCli[source]¶ Bases:
object
- 
all() → k1lib.bioinfo.cli.init.BaseCli[source]¶ Applies this BaseCli to all incoming streams
- 
 
- 
class 
k1lib.bioinfo.cli.init.serial(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:
[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist
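One way this kind of chaining could be wired up is sketched below in plain Python. The `Cli`, `Serial`, `Double` and `AddOne` classes are hypothetical and written only for illustration; the sketch assumes that `cli | cli` builds a combined object through `__or__`, which may differ from the actual implementation.

```python
class Cli:
    """Hypothetical base class, standing in for BaseCli in this sketch."""

    def __ror__(self, it):
        raise NotImplementedError

    def __or__(self, other):
        # `a | b` with no data on the left: defer the work by merging both
        return Serial(self, other)


class Serial(Cli):
    def __init__(self, *clis):
        self.clis = clis

    def __ror__(self, it):
        # Feed the output of each cli into the next one, end to end
        for cli in self.clis:
            it = cli.__ror__(it)
        return it


class Double(Cli):
    def __ror__(self, it):
        return [x * 2 for x in it]


class AddOne(Cli):
    def __ror__(self, it):
        return [x + 1 for x in it]


c = Double() | AddOne()  # no data yet, just a merged pipeline
print([1, 2] | c)        # [3, 5]
```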
- 
 
- 
class 
k1lib.bioinfo.cli.init.oneToMany(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator
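The duplication step can be sketched with `itertools.tee` from the standard library. Everything below (`Cli`, `OneToMany`, `Total`, `Count`) is hypothetical and for illustration only; it assumes the “a & b” operator maps onto `__and__` and that the results come back as a list, neither of which is confirmed here.

```python
import itertools


class Cli:
    """Hypothetical base class, standing in for BaseCli in this sketch."""

    def __and__(self, other):
        # `a & b`: both clis should receive the same incoming stream
        return OneToMany(self, other)


class OneToMany(Cli):
    def __init__(self, *clis):
        self.clis = clis

    def __ror__(self, it):
        # tee() gives every cli its own independent copy of the stream
        copies = itertools.tee(it, len(self.clis))
        return [cli.__ror__(copy) for cli, copy in zip(self.clis, copies)]


class Total(Cli):
    def __ror__(self, it):
        return sum(it)


class Count(Cli):
    def __ror__(self, it):
        return sum(1 for _ in it)


result = [1, 2, 3] | (Total() & Count())
print(result)  # [6, 3]
```

Using `tee` matters when the input is a one-shot iterator: without it, the first cli would exhaust the stream and the second would see nothing.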
- 
 
- 
class 
k1lib.bioinfo.cli.init.manyToManySpecific(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator
- 
 
input module
- 
k1lib.bioinfo.cli.input.cat(fileName: Optional[str] = None)[source]¶ Reads a file line by line.
- Parameters
 fileName – if None, then return a BaseCli that accepts a file name and outputs Iterator[str]
- 
class 
k1lib.bioinfo.cli.input.cats[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Like cat(), but opens multiple files at once, returning streams. Looks something like this:
apply(lambda s: cat(s))
- 
k1lib.bioinfo.cli.input.wget(url: str, fileName: Optional[str] = None)[source]¶ Downloads a file
- Parameters
 url – The url of the file
fileName – if None, then tries to infer it from the url
- 
class 
k1lib.bioinfo.cli.input.cmd(cmd: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
property 
err¶ Error from the last command
- 
property 
 
kcsv module
This module is for dealing with csv stuff
kxml module
This module is for dealing with xml stuff
- 
class 
k1lib.bioinfo.cli.kxml.node[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Turns lines into a single node
- 
__ror__(it: Iterator[str]) → Iterator[xml.etree.ElementTree.Element][source]¶ 
- 
 
- 
class 
k1lib.bioinfo.cli.kxml.maxDepth(depth: Optional[int] = None, copy: bool = True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(depth: Optional[int] = None, copy: bool = True)[source]¶ Filters out nodes that are too deep
- Parameters
 depth – max depth to include
copy – whether to limit the nodes themselves, or limit a copy
- 
__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶ 
- 
 
- 
class 
k1lib.bioinfo.cli.kxml.tag(tag: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(tag: str)[source]¶ Finds all tags that have a particular name. If found, then don’t search deeper
- 
__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶ 
- 
 
- 
class 
k1lib.bioinfo.cli.kxml.pretty(indent: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__ror__(it: Iterator[xml.etree.ElementTree.Element]) → Iterator[str][source]¶ 
- 
 
- 
class 
k1lib.bioinfo.cli.kxml.display(depth: int = 3, lines: int = 20)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(depth: int = 3, lines: int = 20)[source]¶ Convenience method for getting the head, making it pretty, and printing it out
- 
__ror__(it: Iterator[xml.etree.ElementTree.Element], lines=10)[source]¶ 
- 
 
modifier module
This is for quick modifiers, think of them as changing formats
- 
class 
k1lib.bioinfo.cli.modifier.lstrip(char: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Strips left of every line
- 
class 
k1lib.bioinfo.cli.modifier.rstrip(char: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Strips right of every line
- 
class 
k1lib.bioinfo.cli.modifier.strip(char: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Strips both sides of every line
- 
class 
k1lib.bioinfo.cli.modifier.upper[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Makes all characters uppercase
- 
class 
k1lib.bioinfo.cli.modifier.lower[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Makes all characters lowercase
- 
class 
k1lib.bioinfo.cli.modifier.replace(s: str, target: Optional[str] = None)[source]¶ 
- 
class 
k1lib.bioinfo.cli.modifier.sort(column: int = 0, reverse=False, numeric=True, delim: Optional[str] = None)[source]¶ 
output module
For operations that feel like the termination
sam module
This is for functions that are .sam or .bam related
- 
class 
k1lib.bioinfo.cli.sam.header(long=True)[source]¶ 
structural module
This is for functions that change the table structure in a dramatic way. They’re the core transformations
- 
class 
k1lib.bioinfo.cli.structural.joinColumns(delim: Optional[str] = None, sep: bool = False)[source]¶ 
- 
class 
k1lib.bioinfo.cli.structural.joinRows[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Joins multiple streams of rows
- 
k1lib.bioinfo.cli.structural.joinStreams¶ 
- 
class 
k1lib.bioinfo.cli.structural.splitColumns(delim: Optional[str] = None)[source]¶ 
- 
class 
k1lib.bioinfo.cli.structural.insertRow(*columns: Union[List[str], str], delim: Optional[str] = None)[source]¶ 
- 
k1lib.bioinfo.cli.structural.insertIdColumn(begin=True, delim: Optional[str] = None)[source]¶ Inserts an id column at the beginning (or end)
- 
class 
k1lib.bioinfo.cli.structural.toDict(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Transform an incoming stream into a dict using a function for values. Example:
names = ["wanda", "vision", "loki", "mobius"]
names | toDict(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...}
names | toDict(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}
- 
 
- 
class 
k1lib.bioinfo.cli.structural.split(delim: Optional[str] = None, idx: Optional[int] = None)[source]¶ 
- 
class 
k1lib.bioinfo.cli.structural.count(delim: Optional[str] = None)[source]¶ 
- 
class 
k1lib.bioinfo.cli.structural.permute(permutations: List[int], delim: Optional[str] = None)[source]¶ 
- 
class 
k1lib.bioinfo.cli.structural.accumulate(column: int = 0, avg=False, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(column: int = 0, avg=False, delim: Optional[str] = None)[source]¶ Groups lines that have the same line.split(delim)[column], and adds together all other columns, assuming they’re floats
- Parameters
 column – common column to accumulate
avg – calculate average values instead of sum
delim – specify delimiter between columns
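The grouping described above can be sketched in plain Python as follows. `accumulate_sketch` is a hypothetical function for illustration; the output layout (key first, then the summed columns as floats) and the output ordering are assumptions of this sketch, not documented behaviour.

```python
def accumulate_sketch(lines, column=0, avg=False, delim="\t"):
    """Group rows by one column and sum (or average) the others."""
    sums = {}    # key -> running totals of the other columns
    counts = {}  # key -> number of rows seen for that key
    for line in lines:
        cells = line.split(delim)
        key = cells[column]
        values = [float(c) for i, c in enumerate(cells) if i != column]
        if key not in sums:
            sums[key] = values
            counts[key] = 1
        else:
            sums[key] = [a + b for a, b in zip(sums[key], values)]
            counts[key] += 1
    for key, values in sums.items():
        if avg:
            values = [v / counts[key] for v in values]
        # Assumed output layout: the key first, then the accumulated columns
        yield delim.join([key] + [str(v) for v in values])


rows = ["a\t1\t2", "b\t3\t4", "a\t5\t6"]
summed = list(accumulate_sketch(rows))
print(summed)  # ['a\t6.0\t8.0', 'b\t3.0\t4.0']
```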
- 
 
- 
class 
k1lib.bioinfo.cli.structural.AA_(*idxs: List[int], wraps=False)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
- 
__init__(*idxs: List[int], wraps=False)[source]¶ Returns 2 streams, one that has the selected element, and the other the rest. Example:
[1, 5, 6, 3, 7] | AA_(1) # will return [5, [1, 6, 3, 7]]
You can also put multiple indexes through:
[1, 5, 6] | AA_(0, 2) # will return [[1, [5, 6]], [6, [1, 5]]]
If you put None in, then all indexes will be sliced:
[1, 5, 6] | AA_(0, 2) # will return: # [[1, [5, 6]], # [5, [1, 6]], # [6, [1, 5]]]
As for why the strange name, think of this operation as “AĀ”. In statistics, say you have a set “A”, then “not A” is commonly written as A with an overline “Ā”. So “AA_” represents “AĀ”, and that it first returns the selection A first.
- Parameters
 wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā
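The selection logic can be sketched in plain Python as below. `aa_sketch` is a hypothetical function for illustration; it assumes that passing None (or, in this sketch only, no index at all) selects every index, and that a single index is returned unwrapped unless wraps is True.

```python
def aa_sketch(items, *idxs, wraps=False):
    """Split `items` into [selected, rest] pairs (illustrative only)."""
    items = list(items)
    # Assumption: None (or nothing at all) means "every index"
    if not idxs or idxs == (None,):
        idxs = range(len(items))
    pairs = [[items[i], items[:i] + items[i + 1:]] for i in idxs]
    # A single selection is returned bare unless `wraps` asks otherwise
    if len(pairs) == 1 and not wraps:
        return pairs[0]
    return pairs


print(aa_sketch([1, 5, 6, 3, 7], 1))              # [5, [1, 6, 3, 7]]
print(aa_sketch([1, 5, 6], 0, 2))                 # [[1, [5, 6]], [6, [1, 5]]]
print(aa_sketch([1, 5, 6, 3, 7], 1, wraps=True))  # [[5, [1, 6, 3, 7]]]
```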
- 
 
- 
class 
k1lib.bioinfo.cli.structural.infinite[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Takes in a stream and yields that stream an infinite number of times. Example:
# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3] | infinite() | head(3) | toList()
utils module
This is for all short utilities that have a boilerplate feeling
- 
class 
k1lib.bioinfo.cli.utils.size(idx=None, delim: Optional[str] = None)[source]¶ 
- 
k1lib.bioinfo.cli.utils.shape¶ alias of
k1lib.bioinfo.cli.utils.size
- 
class 
k1lib.bioinfo.cli.utils.item[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the first row
- 
class 
k1lib.bioinfo.cli.utils.identity[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Yields whatever the input is. Useful for multiple streams
- 
class 
k1lib.bioinfo.cli.utils.toInt[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts every row into an integer. Excludes non-numbers if not in strict mode.
- 
class 
k1lib.bioinfo.cli.utils.toFloat[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts every row into a float. Excludes non-numbers if not in strict mode.
- 
class 
k1lib.bioinfo.cli.utils.toNumpy[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to numpy array
- 
class 
k1lib.bioinfo.cli.utils.toList[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to list. list would do the same, but this is just to maintain the style
- 
class 
k1lib.bioinfo.cli.utils.wrapList[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Wraps inputs inside a list
- 
class 
k1lib.bioinfo.cli.utils.toSet[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to set. set would do the same, but this is just to maintain the style
- 
class 
k1lib.bioinfo.cli.utils.toIter[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts object to iterator. iter() would do the same, but this is just to maintain the style
- 
class 
k1lib.bioinfo.cli.utils.toRange[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns iter(range(len(it))), effectively
- 
class 
k1lib.bioinfo.cli.utils.equals[source]¶ Bases:
object
Checks if all incoming columns/streams are identical
- 
class 
k1lib.bioinfo.cli.utils.reverse[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Prints last line first, first line last
- 
class 
k1lib.bioinfo.cli.utils.ignore[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Just executes everything, ignoring the output
- 
class 
k1lib.bioinfo.cli.utils.avg[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Calculates the average of a list of numbers
- 
k1lib.bioinfo.cli.utils.headerIdx(delim: Optional[str] = None)[source]¶ Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know a column’s index so that you can cut it out.
- 
class 
k1lib.bioinfo.cli.utils.dereference[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Recursively converts any iterator into a list. Only str and numbers.Number are not converted. Example:
iter(range(5)) # returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5)) | dereference() # returns [0, 1, 2, 3, 4]
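The recursion can be sketched in plain Python as below. `dereference_sketch` is a hypothetical function for illustration; it follows the rule stated above (strings and numbers are left alone, every other iterable becomes a list) and makes no claim about how the actual class is written.

```python
import numbers
from collections.abc import Iterable


def dereference_sketch(obj):
    """Recursively turn iterators/iterables into lists (illustrative only)."""
    # Strings are iterable too, so they must be excluded explicitly
    if isinstance(obj, (str, numbers.Number)):
        return obj
    if isinstance(obj, Iterable):
        return [dereference_sketch(element) for element in obj]
    return obj  # anything else is passed through untouched


flat = dereference_sketch(iter(range(5)))
nested = dereference_sketch(iter([iter([1, 2]), "ab", 3.5]))
print(flat)    # [0, 1, 2, 3, 4]
print(nested)  # [[1, 2], 'ab', 3.5]
```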