cli package
The main idea of this package is to emulate the terminal, but entirely inside Python. So this bash statement:
cat file.txt | head -5 > headerFile.txt
Turns into this statement:
cat("file.txt") | head(5) > file("headerFile.txt")
Here, “cat”, “head” and “file” are all classes extended from BaseCli. All of them implement the “reverse or” operation, or __ror__.
Essentially, these 2 statements are equivalent:
3 | obj
obj.__ror__(3)
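To make the mechanism concrete, here is a minimal sketch of how a pipe stage can hook into `|` through `__ror__`. The names `Stage`, `take` and `upper_all` are hypothetical stand-ins invented for this illustration; they are not part of k1lib, and the real BaseCli may well be implemented differently.

```python
from itertools import islice

class Stage:
    """Hypothetical stand-in for a BaseCli-style pipe stage (not k1lib code)."""
    def __ror__(self, it):
        # Python calls this for `it | stage` when `it` itself cannot handle `|`
        raise NotImplementedError

class take(Stage):
    """Keeps only the first n items, loosely like `head`."""
    def __init__(self, n):
        self.n = n
    def __ror__(self, it):
        return islice(iter(it), self.n)

class upper_all(Stage):
    """Uppercases every incoming string."""
    def __ror__(self, it):
        return (s.upper() for s in it)

lines = ["a", "b", "c", "d"]
result = list(lines | take(2) | upper_all())
print(result)  # ['A', 'B']
```

Because plain lists and iterators do not define `|` themselves, Python falls back to the right-hand operand's `__ror__`, which is what lets ordinary data sit at the start of the pipeline.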
Also, a lot of these tools assume that we are operating on a table. So this table:
col1 | col2 | col3
---|---|---
1 | 2 | 3
4 | 5 | 6
Is equivalent to this list:
[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
Also, the expected way to use these tools is to import everything directly into the current environment, like this:
from k1lib.bioinfo.cli import *
Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check over this:
Submodules
bio module
This is for functions that are actually biology-related

class k1lib.bioinfo.cli.bio.transcribe
Bases: k1lib.bioinfo.cli.init.BaseCli
Transcribes (DNA -> RNA) incoming rows

class k1lib.bioinfo.cli.bio.translate(length: int = 0)

class k1lib.bioinfo.cli.bio.medAa
Bases: k1lib.bioinfo.cli.init.BaseCli
Converts a short aa sequence to a medium one

entrez module
This module is not really fleshed out, not that useful/elegant, and I just use cmd instead

mgi module

class k1lib.bioinfo.cli.mgi.batch(headless=True)
Bases: k1lib.bioinfo.cli.init.BaseCli
Queries the MGI database, converting a list of genes to MGI ids
filt module
This is for functions that cut out specific parts of the table

class k1lib.bioinfo.cli.filt.filt(predicate: Callable[[str], bool], column: Optional[int] = None)

k1lib.bioinfo.cli.filt.isValue(value, column: Optional[int] = None)
Filters out lines that are different from the given value

k1lib.bioinfo.cli.filt.inSet(values: Set[Any], column: Optional[int] = None)
Filters out lines that are not in the specified set

k1lib.bioinfo.cli.filt.contains(s: str, column: Optional[int] = None)
Filters out lines that don’t contain the specified substring

class k1lib.bioinfo.cli.filt.nonEmptyStream
Bases: k1lib.bioinfo.cli.init.BaseCli
Filters out streams that have no rows

k1lib.bioinfo.cli.filt.startswith(s: str, column: Optional[int] = None)
Filters out lines that don’t start with s

k1lib.bioinfo.cli.filt.endswith(s: str, column: Optional[int] = None)
Filters out lines that don’t end with s

k1lib.bioinfo.cli.filt.isNumeric(column: Optional[int] = None)
Filters out a line if that column is not a number

k1lib.bioinfo.cli.filt.inRange(min: Optional[float] = None, max: Optional[float] = None, column: Optional[int] = None)
Checks whether a column is in range or not
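The filters above all follow the same pattern: test either the whole line or one column against a predicate, and keep the rows that pass. A rough sketch of that idea follows. The helper names `filt_rows` and `in_range` are hypothetical, and treating both range bounds as inclusive is an assumption, since the docs do not say.

```python
def filt_rows(rows, predicate, column=None):
    """Keeps rows for which predicate holds (hypothetical helper, not k1lib).

    column=None tests the whole row; otherwise only row[column] is tested.
    """
    for row in rows:
        value = row if column is None else row[column]
        if predicate(value):
            yield row

def in_range(rows, lo=None, hi=None, column=None):
    """Range filter built on filt_rows. Assumption: both bounds are inclusive."""
    def check(value):
        x = float(value)
        return (lo is None or x >= lo) and (hi is None or x <= hi)
    return filt_rows(rows, check, column)

table = [["a", "1"], ["b", "5"], ["c", "12"]]
kept = list(in_range(table, lo=2, hi=10, column=1))
print(kept)  # [['b', '5']]

starts = list(filt_rows(["apple", "banana", "avocado"], lambda s: s.startswith("a")))
print(starts)  # ['apple', 'avocado']
```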
k1lib.bioinfo.cli.filt.cut
alias of k1lib.bioinfo.cli.filt.columns

class k1lib.bioinfo.cli.filt.every(length: int, offset: int = 0)

class k1lib.bioinfo.cli.filt.intersection
Bases: k1lib.bioinfo.cli.init.BaseCli
Returns the intersection of multiple streams. Example:
[[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]] | intersection() # will return set([2, 4, 5])

grep module

class k1lib.bioinfo.cli.grep.grep(pattern: str, before: int = 0, after: int = 0)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(pattern: str, before: int = 0, after: int = 0)
Finds lines that have the specified pattern. Example:
# returns ['c', 'd', '2', 'd']
"abcde12d34" | grep("d", 1) | dereference()
Parameters
pattern – regex pattern to search for in a line
before – number of lines to include before the hit. Outputs independent lines
after – number of lines to include after the hit. Outputs independent lines
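The before/after context behaviour can be sketched with a bounded buffer of recent lines. This is only one way to reproduce the documented example; `grep_lines` is a hypothetical function, and how the real grep handles overlapping context windows is not stated in the docs.

```python
import re
from collections import deque

def grep_lines(lines, pattern, before=0, after=0):
    """Yields matching lines plus context as a flat stream (hypothetical sketch)."""
    regex = re.compile(pattern)
    history = deque(maxlen=before)  # holds the last `before` unmatched lines
    remaining_after = 0             # context lines still owed after a hit
    for line in lines:
        if regex.search(line):
            yield from history      # emit buffered "before" lines first
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line
            remaining_after -= 1
        else:
            history.append(line)

hits = list(grep_lines("abcde12d34", "d", before=1))
print(hits)  # ['c', 'd', '2', 'd']
```

Iterating over a string yields single characters, which is why each character plays the role of a line here, just as in the documented example.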
class k1lib.bioinfo.cli.grep.grepToTable(pattern: str, before: int = 0, after: int = 0)

init module

cli.bioinfoSettings = {'defaultDelim': '\t', 'defaultIndent': ' ', 'lookupImgs': True, 'oboFile': None, 'strict': False}
Main settings of k1lib.bioinfo.cli. When using:
from k1lib.bioinfo.cli import *
…you can just set the settings like this:
bioinfoSettings["defaultIndent"] = "\t"
There are a few settings:
defaultDelim: default delimiter used in-between columns when creating tables
defaultIndent: default indent used for displaying nested structures
lookupImgs: whether to automatically look up images when exploring something
oboFile: gene ontology obo file location
strict: whether strict mode is on. Turning it on can help you debug stuff, but could also be a pain to work with

k1lib.bioinfo.cli.init.patchDefaultDelim(s: str)
Parameters
s – if not None, returns s itself; otherwise returns the default delimiter in settings

k1lib.bioinfo.cli.init.patchDefaultIndent(s: str)
Parameters
s – if not None, returns s itself; otherwise returns the default indent character in settings
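The two patch functions describe a simple "explicit value wins, otherwise fall back to settings" rule. A minimal sketch of that rule, assuming a plain dict stands in for bioinfoSettings (the names `settings`, `patch_default_delim` and `patch_default_indent` are hypothetical):

```python
# Hypothetical stand-in for bioinfoSettings, reduced to the two relevant keys
settings = {"defaultDelim": "\t", "defaultIndent": "  "}

def patch_default_delim(s=None):
    """Returns s if given, else the delimiter currently stored in settings."""
    return settings["defaultDelim"] if s is None else s

def patch_default_indent(s=None):
    """Returns s if given, else the indent currently stored in settings."""
    return settings["defaultIndent"] if s is None else s

print(repr(patch_default_delim(",")))   # ','
print(repr(patch_default_delim(None)))  # '\t'
```

Reading the setting at call time, rather than capturing it once, means a later change to the settings dict is picked up by every tool that passes None.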
k1lib.bioinfo.cli.init.newTypeHint(name, docs='')
Creates a new type hint that can be sliced and yet still looks fine in sphinx. Crudely written, based on my poor understanding of Python’s metaclasses. Seriously, this shit is bonkers; read over it at https://stackoverflow.com/questions/100003/what-are-metaclasses-in-python
Example:
Table = newTypeHint("Table", "some docs")
Table[int] # prints out as "Table[int]", and sphinx fell for it too
Table[Table[str], float] # prints out as "Table[Table[str], float]"
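For readers curious how a class can be "sliced" like this, one possible approach is a metaclass that defines `__getitem__` and returns a freshly named subclass. The sketch below is an assumption about how such a helper could work, not the actual k1lib source; `SliceableMeta` and `new_type_hint` are hypothetical names.

```python
class SliceableMeta(type):
    """Metaclass whose classes can be indexed, e.g. Table[int] (hypothetical)."""
    def __getitem__(cls, params):
        if not isinstance(params, tuple):
            params = (params,)
        inner = ", ".join(getattr(p, "__name__", repr(p)) for p in params)
        # Build a subclass whose name records the parameters
        return SliceableMeta(f"{cls.__name__}[{inner}]", (cls,), {})
    def __repr__(cls):
        return cls.__name__

def new_type_hint(name, docs=""):
    """Creates a sliceable class carrying the given docstring."""
    return SliceableMeta(name, (), {"__doc__": docs})

Table = new_type_hint("Table", "some docs")
print(Table[int])                # Table[int]
print(Table[Table[str], float])  # Table[Table[str], float]
```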
class k1lib.bioinfo.cli.init.Table
Bases: object
Essentially just Iterator[List[T]]. This class is just here so that I can generate the docs with nicely formatted types like “Table[str]”.

class k1lib.bioinfo.cli.init.Row(iterable=(), /)
Bases: list
Not really used currently. Just here for a potential future feature

class k1lib.bioinfo.cli.init.BaseCli
Bases: object
all() → k1lib.bioinfo.cli.init.BaseCli
Applies this BaseCli to all incoming streams

class k1lib.bioinfo.cli.init.serial(*clis: List[k1lib.bioinfo.cli.init.BaseCli])
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])
Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:
[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist
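The reason the second form needs extra machinery is that `a() | b()` has no data on its left, so Python calls `a().__or__(b())` rather than any `__ror__`. A sketch of how a serial-style wrapper can cover that case, under the assumption that the base class defines `__or__`; all class names here (`Stage`, `Serial`, `add`, `double`) are hypothetical, not k1lib code:

```python
class Stage:
    """Hypothetical BaseCli-like base class."""
    def __ror__(self, it):
        raise NotImplementedError
    def __or__(self, other):
        # `stage | stage`: no data yet, so remember both stages for later
        return Serial(self, other)

class Serial(Stage):
    """Feeds the input through each stage in order."""
    def __init__(self, *stages):
        self.stages = stages
    def __ror__(self, it):
        for stage in self.stages:
            it = stage.__ror__(it)
        return it

class add(Stage):
    def __init__(self, n):
        self.n = n
    def __ror__(self, it):
        return (x + self.n for x in it)

class double(Stage):
    def __ror__(self, it):
        return (x * 2 for x in it)

direct = list([1, 2] | add(1) | double())  # data first: only __ror__ is used
c = add(1) | double()                      # no data: __or__ builds a Serial
deferred = list([1, 2] | c)
print(direct, deferred)  # [4, 6] [4, 6]
```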
class k1lib.bioinfo.cli.init.oneToMany(*clis: List[k1lib.bioinfo.cli.init.BaseCli])
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])
Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator

class k1lib.bioinfo.cli.init.manyToManySpecific(*clis: List[k1lib.bioinfo.cli.init.BaseCli])
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(*clis: List[k1lib.bioinfo.cli.init.BaseCli])
Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator
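The difference between the two joining operators is easiest to see side by side. In the sketch below, `&` copies one stream to every stage, while `+` pairs the i-th stream with the i-th stage. The classes are hypothetical illustrations, and returning a plain list of results is an assumption about the output shape.

```python
from itertools import tee

class Stage:
    """Hypothetical stage wrapping a plain function over an iterable."""
    def __init__(self, f):
        self.f = f
    def __ror__(self, it):
        return self.f(it)
    def __and__(self, other):
        return OneToMany(self, other)
    def __add__(self, other):
        return ManyToManySpecific(self, other)

class OneToMany(Stage):
    """`a & b`: every stage receives its own copy of the single input stream."""
    def __init__(self, *stages):
        self.stages = stages
    def __ror__(self, it):
        copies = tee(it, len(self.stages))
        return [stage.__ror__(copy) for stage, copy in zip(self.stages, copies)]

class ManyToManySpecific(Stage):
    """`a + b`: the i-th input stream goes to the i-th stage."""
    def __init__(self, *stages):
        self.stages = stages
    def __ror__(self, streams):
        return [stage.__ror__(s) for stage, s in zip(self.stages, streams)]

total = Stage(sum)
biggest = Stage(max)

fan_out = [1, 2, 3] | (total & biggest)
paired = [[1, 2], [10, 20]] | (total + biggest)
print(fan_out, paired)  # [6, 3] [3, 20]
```

`itertools.tee` is used so that the single input can be consumed once per stage even when it is a one-shot iterator.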
input module

k1lib.bioinfo.cli.input.cat(fileName: Optional[str] = None)
Reads a file line by line.
Parameters
fileName – if None, then return a BaseCli that accepts a file name and outputs Iterator[str]
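The two calling modes of cat (file name given now, or supplied later) can be sketched as follows. `cat_lines` is a hypothetical function that returns a plain callable in the deferred case instead of a BaseCli, and stripping the trailing newline from each line is an assumption.

```python
import os
import tempfile

def cat_lines(file_name=None):
    """Reads a file lazily, line by line (hypothetical sketch, not k1lib code)."""
    def read(path):
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n")  # assumption: newlines are stripped
    if file_name is None:
        return read           # deferred mode: caller supplies the path later
    return read(file_name)    # eager mode: path known up front

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file.txt")
    with open(path, "w") as f:
        f.write("first\nsecond\nthird\n")
    eager = list(cat_lines(path))
    deferred = list(cat_lines()(path))

print(eager)  # ['first', 'second', 'third']
```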
class k1lib.bioinfo.cli.input.cats
Bases: k1lib.bioinfo.cli.init.BaseCli
Like cat(), but opens multiple files at once, returning streams. Looks something like this:
apply(lambda s: cat(s))

k1lib.bioinfo.cli.input.wget(url: str, fileName: Optional[str] = None)
Downloads a file
Parameters
url – The url of the file
fileName – if None, then tries to infer it from the url

class k1lib.bioinfo.cli.input.cmd(cmd: str)
Bases: k1lib.bioinfo.cli.init.BaseCli
property err
Error from the last command
-
property
kcsv module
This module is for dealing with csv stuff

kxml module
This module is for dealing with xml stuff

class k1lib.bioinfo.cli.kxml.node
Bases: k1lib.bioinfo.cli.init.BaseCli
Turns lines into a single node
__ror__(it: Iterator[str]) → Iterator[xml.etree.ElementTree.Element]

class k1lib.bioinfo.cli.kxml.maxDepth(depth: Optional[int] = None, copy: bool = True)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(depth: Optional[int] = None, copy: bool = True)
Filters out nodes that are too deep
Parameters
depth – max depth to include
copy – whether to limit the nodes themselves, or limit a copy
__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element]

class k1lib.bioinfo.cli.kxml.tag(tag: str)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(tag: str)
Finds all tags that have a particular name. Once a matching tag is found, it doesn’t search deeper inside it
__ror__(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element]
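The "stop at the first match on each branch" behaviour of tag can be sketched with a short recursive generator over the standard library's ElementTree. `find_tag` is a hypothetical function written for this illustration, not the library's implementation.

```python
import xml.etree.ElementTree as ET

def find_tag(nodes, tag):
    """Yields nodes named `tag`, without descending into a node once it matches."""
    for node in nodes:
        if node.tag == tag:
            yield node                       # match: do not look inside it
        else:
            yield from find_tag(node, tag)   # iterating an Element gives its children

root = ET.fromstring("<a><b id='1'><b id='2'/></b><c><b id='3'/></c></a>")
found = [n.get("id") for n in find_tag([root], "b")]
print(found)  # ['1', '3']
```

The nested `<b id='2'/>` is skipped because its parent already matched, while `<b id='3'/>` is still found because its parent `<c>` did not.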
class k1lib.bioinfo.cli.kxml.pretty(indent: Optional[str] = None)
Bases: k1lib.bioinfo.cli.init.BaseCli
__ror__(it: Iterator[xml.etree.ElementTree.Element]) → Iterator[str]

class k1lib.bioinfo.cli.kxml.display(depth: int = 3, lines: int = 20)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(depth: int = 3, lines: int = 20)
Convenience method for getting the head, making it pretty and printing it out
__ror__(it: Iterator[xml.etree.ElementTree.Element], lines=10)

modifier module
This is for quick modifiers, think of them as changing formats

class k1lib.bioinfo.cli.modifier.apply(f: Callable[[str], str], column: Optional[int] = None)

k1lib.bioinfo.cli.modifier.lstrip(column: Optional[int] = None, char: Optional[str] = None)
Strips the left side of every line

k1lib.bioinfo.cli.modifier.rstrip(column: Optional[int] = None, char: Optional[str] = None)
Strips the right side of every line

k1lib.bioinfo.cli.modifier.strip(column: Optional[int] = None, char: Optional[str] = None)
Strips both sides of every line

k1lib.bioinfo.cli.modifier.upper(column: Optional[int] = None)
Makes all characters uppercase

k1lib.bioinfo.cli.modifier.lower(column: Optional[int] = None)
Makes all characters lowercase

k1lib.bioinfo.cli.modifier.replace(s: str, target: Optional[str] = None, column: Optional[int] = None)
Replaces substring s with target for each line.

k1lib.bioinfo.cli.modifier.remove(s: str, column: Optional[int] = None)
Removes a specific substring in each line.

k1lib.bioinfo.cli.modifier.toFloat(column: Optional[int] = None)
Converts every row into a float. Excludes non-numbers if not in strict mode.

k1lib.bioinfo.cli.modifier.toInt(column: Optional[int] = None)
Converts every row into an integer. Excludes non-numbers if not in strict mode.

class k1lib.bioinfo.cli.modifier.sort(column: int = 0, numeric=True, reverse=False)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(column: int = 0, numeric=True, reverse=False)
Sorts all lines based on a specific column.
Parameters
numeric – whether to treat column as float
reverse – False for smaller to bigger, True for bigger to smaller. Use __invert__() to quickly reverse the order instead of using this param
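The `__invert__()` shortcut means `~sort(...)` can stand in for `reverse=True`. A sketch of how that could be wired up is below; `sort_rows` is a hypothetical class, and returning a new flipped instance (rather than mutating in place) is an assumption.

```python
class sort_rows:
    """Hypothetical sketch of a column sorter with `~` as a reverse shortcut."""
    def __init__(self, column=0, numeric=True, reverse=False):
        self.column = column
        self.numeric = numeric
        self.reverse = reverse
    def __invert__(self):
        # `~sorter` gives a sorter with the opposite direction
        return sort_rows(self.column, self.numeric, not self.reverse)
    def __ror__(self, rows):
        if self.numeric:
            key = lambda row: float(row[self.column])
        else:
            key = lambda row: row[self.column]
        return sorted(rows, key=key, reverse=self.reverse)

table = [["b", "10"], ["a", "9"], ["c", "100"]]
ascending = table | sort_rows(1)
descending = table | ~sort_rows(1)
textual = table | sort_rows(1, numeric=False)
print(ascending)   # [['a', '9'], ['b', '10'], ['c', '100']]
print(descending)  # [['c', '100'], ['b', '10'], ['a', '9']]
print(textual)     # [['b', '10'], ['c', '100'], ['a', '9']]
```

The last line shows why the numeric flag matters: compared as text, "9" sorts after "100".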
output module
For operations that feel like the end of a pipeline

class k1lib.bioinfo.cli.output.pretty
Bases: k1lib.bioinfo.cli.init.BaseCli
Pretty prints a table

sam module
This is for functions that are .sam or .bam related

class k1lib.bioinfo.cli.sam.header(long=True)

structural module
This is for functions that change the table structure in a dramatic way. They’re the core transformations

class k1lib.bioinfo.cli.structural.joinColumns(fillValue=None)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(fillValue=None)
Joins multiple columns and loops through all rows. Aka transpose.
Parameters
fillValue – if not None, then will try to zip longest with this fill value
Example:
# returns [[1, 4], [2, 5], [3, 6]]
[[1, 2, 3], [4, 5, 6]] | joinColumns() | dereference()
# returns [[1, 4], [2, 5], [3, 6], [0, 7]]
[[1, 2, 3], [4, 5, 6, 7]] | joinColumns(0) | dereference()
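In plain Python, the transpose described here corresponds to `zip`, and the fill-value variant to `itertools.zip_longest`. The helper below, `join_columns`, is a hypothetical sketch that reproduces the two documented examples.

```python
from itertools import zip_longest

def join_columns(rows, fill_value=None):
    """Transposes rows; pads short rows with fill_value when one is given."""
    if fill_value is None:
        return [list(col) for col in zip(*rows)]  # stops at the shortest row
    return [list(col) for col in zip_longest(*rows, fillvalue=fill_value)]

print(join_columns([[1, 2, 3], [4, 5, 6]]))
# [[1, 4], [2, 5], [3, 6]]
print(join_columns([[1, 2, 3], [4, 5, 6, 7]], 0))
# [[1, 4], [2, 5], [3, 6], [0, 7]]
```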
k1lib.bioinfo.cli.structural.transpose

k1lib.bioinfo.cli.structural.splitColumns

class k1lib.bioinfo.cli.structural.joinList(element=None, begin=True)

class k1lib.bioinfo.cli.structural.joinStreams
Bases: k1lib.bioinfo.cli.init.BaseCli
Joins multiple streams. Example:
# returns [1, 2, 3, 4, 5]
[[1, 2, 3], [4, 5]] | joinStreams() | dereference()

k1lib.bioinfo.cli.structural.insertRow(*row: List[T])
Inserts a row right before all other rows. See also: joinList().

k1lib.bioinfo.cli.structural.insertColumn(*column, begin=True)
Inserts a column at beginning or end. Example:
# returns [['a', 1, 2], ['b', 3, 4]]
[[1, 2], [3, 4]] | insertColumn("a", "b") | dereference()

k1lib.bioinfo.cli.structural.insertIdColumn(table=False, begin=True)
Inserts an id column at the beginning (or end).
Parameters
table – if False, then insert column to an Iterator[str], else treat input as a full-fledged table
Example:
# returns [[0, 'a', 2], [1, 'b', 4]]
[["a", 2], ["b", 4]] | insertIdColumn(True) | dereference()
# returns [[0, 'a'], [1, 'b']]
"ab" | insertIdColumn()

class k1lib.bioinfo.cli.structural.toDict(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)
Transforms an incoming stream into a dict using a function for values. Example:
names = ["wanda", "vision", "loki", "mobius"]
names | toDict(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...}
names | toDict(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}
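The toDict examples amount to a dict comprehension with optional key and value functions. The sketch below uses a hypothetical `to_dict` helper and assumes that a missing function means "use the element unchanged", which is consistent with the first example.

```python
def to_dict(items, key_f=None, value_f=None):
    """Builds a dict from a stream (hypothetical sketch).

    Assumption: a missing key_f or value_f leaves the element as is.
    """
    identity = lambda x: x
    key_f = key_f or identity
    value_f = value_f or identity
    return {key_f(x): value_f(x) for x in items}

names = ["wanda", "vision", "loki", "mobius"]
lengths = to_dict(names, value_f=len)
titled = to_dict(names, lambda s: s.title(), len)
print(lengths)  # {'wanda': 5, 'vision': 6, 'loki': 4, 'mobius': 6}
print(titled)   # {'Wanda': 5, 'Vision': 6, 'Loki': 4, 'Mobius': 6}
```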
class k1lib.bioinfo.cli.structural.split(delim: Optional[str] = None, idx: Optional[int] = None)

class k1lib.bioinfo.cli.structural.table(delim: Optional[str] = None)

class k1lib.bioinfo.cli.structural.stitch(delim: Optional[str] = None)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(delim: Optional[str] = None)
Stitches elements in a row together, so they become a simple string. See also: k1lib.bioinfo.cli.output.pretty

class k1lib.bioinfo.cli.structural.count
Bases: k1lib.bioinfo.cli.init.BaseCli
Finds unique elements and returns a table with [frequency, value, percent] columns. Example:
# returns [[1, 'a', '33%'], [2, 'b', '67%']]
['a', 'b', 'b'] | count() | dereference()
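A frequency table like this can be sketched with `collections.Counter`. The `count_values` helper is hypothetical, and rounding the percentage to a whole number is an assumption inferred from the '33%' and '67%' in the example.

```python
from collections import Counter

def count_values(items):
    """Returns [frequency, value, percent] rows in first-seen order (sketch)."""
    counts = Counter(items)
    total = sum(counts.values())
    return [[n, value, f"{round(100 * n / total)}%"] for value, n in counts.items()]

print(count_values(["a", "b", "b"]))  # [[1, 'a', '33%'], [2, 'b', '67%']]
```

Because each percentage is rounded on its own, the column is not guaranteed to add up to exactly 100%.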
class k1lib.bioinfo.cli.structural.accumulate(columnIdx: int = 0, avg=False)

class k1lib.bioinfo.cli.structural.AA_(*idxs: List[int], wraps=False)
Bases: k1lib.bioinfo.cli.init.BaseCli
__init__(*idxs: List[int], wraps=False)
Returns 2 streams, one that has the selected element, and the other the rest. Example:
[1, 5, 6, 3, 7] | AA_(1) # will return [5, [1, 6, 3, 7]]
You can also put multiple indexes through:
[1, 5, 6] | AA_(0, 2) # will return [[1, [5, 6]], [6, [1, 5]]]
If you put None in, then all indexes will be sliced:
[1, 5, 6] | AA_(None) # will return:
# [[1, [5, 6]],
#  [5, [1, 6]],
#  [6, [1, 5]]]
As for why the strange name, think of this operation as “AĀ”. In statistics, say you have a set “A”, then “not A” is commonly written as A with an overline, “Ā”. So “AA_” represents “AĀ”, and the selection A is returned first.
Parameters
wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā
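The "selected element, then the rest" split can be sketched with list slicing. `aa_split` is a hypothetical function; in particular, treating a lone None as "every index" is an assumption based on the prose above, not a confirmed detail of the real signature.

```python
def aa_split(items, *idxs, wraps=False):
    """Returns [selected, rest] pairs for the given indexes (hypothetical sketch)."""
    items = list(items)
    select_all = idxs == (None,)   # assumption: a lone None means all indexes
    if select_all:
        idxs = tuple(range(len(items)))
    pairs = [[items[i], items[:i] + items[i + 1:]] for i in idxs]
    if len(idxs) == 1 and not select_all and not wraps:
        return pairs[0]            # single index: return the bare pair
    return pairs

print(aa_split([1, 5, 6, 3, 7], 1))              # [5, [1, 6, 3, 7]]
print(aa_split([1, 5, 6], 0, 2))                 # [[1, [5, 6]], [6, [1, 5]]]
print(aa_split([1, 5, 6, 3, 7], 1, wraps=True))  # [[5, [1, 6, 3, 7]]]
```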
class k1lib.bioinfo.cli.structural.infinite
Bases: k1lib.bioinfo.cli.init.BaseCli
Yields an infinite amount of the passed in object. Example:
# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3] | infinite() | head(3) | toList()
utils module
This is for all short utilities that have the boilerplate feeling

class k1lib.bioinfo.cli.utils.size(idx=None)

k1lib.bioinfo.cli.utils.shape
alias of k1lib.bioinfo.cli.utils.size

class k1lib.bioinfo.cli.utils.item
Bases: k1lib.bioinfo.cli.init.BaseCli
Returns the first row

class k1lib.bioinfo.cli.utils.identity
Bases: k1lib.bioinfo.cli.init.BaseCli
Yields whatever the input is. Useful for multiple streams

class k1lib.bioinfo.cli.utils.toNumpy
Bases: k1lib.bioinfo.cli.init.BaseCli
Converts generator to numpy array

class k1lib.bioinfo.cli.utils.toList
Bases: k1lib.bioinfo.cli.init.BaseCli
Converts generator to list. list would do the same, but this is just to maintain the style

class k1lib.bioinfo.cli.utils.wrapList
Bases: k1lib.bioinfo.cli.init.BaseCli
Wraps inputs inside a list

class k1lib.bioinfo.cli.utils.toSet
Bases: k1lib.bioinfo.cli.init.BaseCli
Converts generator to set. set would do the same, but this is just to maintain the style

class k1lib.bioinfo.cli.utils.toIter
Bases: k1lib.bioinfo.cli.init.BaseCli
Converts object to iterator. iter() would do the same, but this is just to maintain the style

class k1lib.bioinfo.cli.utils.toRange
Bases: k1lib.bioinfo.cli.init.BaseCli
Returns iter(range(len(it))), effectively

class k1lib.bioinfo.cli.utils.equals
Bases: object
Checks if all incoming columns/streams are identical

class k1lib.bioinfo.cli.utils.reverse
Bases: k1lib.bioinfo.cli.init.BaseCli
Prints last line first, first line last

class k1lib.bioinfo.cli.utils.ignore
Bases: k1lib.bioinfo.cli.init.BaseCli
Just executes everything, ignoring the output

class k1lib.bioinfo.cli.utils.avg
Bases: k1lib.bioinfo.cli.init.BaseCli
Calculates the average of a list of numbers

k1lib.bioinfo.cli.utils.headerIdx()
Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know what your column’s index is to cut it out. Example:
# returns [[0, 'a'], [1, 'b'], [2, 'c']]
["abc"] | headerIdx() | dereference()

class k1lib.bioinfo.cli.utils.dereference
Bases: k1lib.bioinfo.cli.init.BaseCli
Recursively converts any iterator into a list. Only str and numbers.Number are not converted. Example:
iter(range(5)) # returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5)) | dereference() # returns [0, 1, 2, 3, 4]
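The recursive conversion can be sketched in a few lines. `deref` is a hypothetical function; it follows the stated rule that strings and numbers are left alone, and it assumes every other iterable (tuples, generators, and so on) is turned into a list.

```python
import numbers
from collections.abc import Iterable

def deref(x):
    """Recursively turns iterables into lists, leaving str and numbers as is."""
    if isinstance(x, (str, numbers.Number)):
        return x                    # checked first: str is itself iterable
    if isinstance(x, Iterable):
        return [deref(item) for item in x]
    return x                        # anything else is passed through unchanged

print(deref(iter(range(5))))              # [0, 1, 2, 3, 4]
print(deref([iter([1, 2]), "ab", (3, 4.5)]))  # [[1, 2], 'ab', [3, 4.5]]
```

Checking for str before the generic iterable case matters, since iterating a one-character string yields the same string again and would otherwise recurse forever.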