cli package¶
The main idea of this package is to emulate the terminal, but doing all of that inside Python itself. So this bash statement:
cat file.txt | head -5 > headerFile.txt
Turns into this statement:
cat("file.txt") | head(5) > file("headerFile.txt")
Here, “cat”, “head” and “file” are all classes extended from BaseCli. All of them implement the “reverse or” operation, or __ror__.
Essentially, these 2 statements are equivalent:
3 | obj
obj.__ror__(3)
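A minimal sketch of how `__ror__` makes the pipe syntax work. The class `Head` below is a hypothetical stand-in written for illustration; it is not the package's own `head`, and it returns a plain list to keep things simple:

```python
from itertools import islice

class Head:
    """Hypothetical stand-in for a cli tool: keeps the first n items."""
    def __init__(self, n):
        self.n = n

    def __ror__(self, it):
        # Called for `it | Head(n)` because a list does not handle `|` itself
        return list(islice(it, self.n))

rows = ["a", "b", "c", "d"]
piped = rows | Head(2)            # Python falls back to Head.__ror__
explicit = Head(2).__ror__(rows)  # the same call, spelled out
print(piped, explicit)
```

Python tries the left operand's `__or__` first and only then the right operand's `__ror__`, which is why any object sitting on the right of the pipe can take over.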
Also, a lot of these tools assume that we are operating on a table. So this table:
col1 | col2 | col3 |
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
Is equivalent to this list:
["col1\tcol2\tcol3", "1\t2\t3", "4\t5\t6"]
Essentially, each row is a single string, and elements in a row are separated by a delimiter. You can set the default delimiter using bioinfoSettings like this:
bioinfoSettings["defaultDelim"] = ","
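To make the row format concrete, here is a plain-Python sketch that uses no k1lib calls. It splits rows into cells and joins them back; `delim` is a local variable standing in for the default delimiter:

```python
delim = "\t"  # stand-in for the package's default delimiter
rows = ["col1\tcol2\tcol3", "1\t2\t3", "4\t5\t6"]

# Split every row into its cells to recover the table structure
table = [row.split(delim) for row in rows]
header, body = table[0], table[1:]

# Pick the second column (index 1) of every body row
second_column = [cells[1] for cells in body]

# Joining the cells with the delimiter gives back the original rows
rebuilt = [delim.join(cells) for cells in table]
print(header, second_column, rebuilt == rows)
```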
Also, the expected way to use these tools is to import everything directly into the current environment, like this:
from k1lib.bioinfo.cli import *
Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check over this:
Submodules¶
bio module¶
This is for functions that are actually biology-related
-
class
k1lib.bioinfo.cli.bio.
transcribe
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Transcribes (DNA -> RNA) incoming rows
-
class
k1lib.bioinfo.cli.bio.
translate
(length: int = 0)[source]¶
-
class
k1lib.bioinfo.cli.bio.
medAa
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts a short amino acid (aa) sequence to a medium one
entrez module¶
This module is not really fleshed out and not that useful or elegant, so I just use cmd instead
mgi module¶
-
class
k1lib.bioinfo.cli.mgi.
batch
(headless=True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Queries the MGI database, converting a list of genes to MGI ids
filt module¶
This is for functions that cut out specific parts of the table
-
class
k1lib.bioinfo.cli.filt.
filt
(predicate: Callable[[str], bool], column: int = 0, delim: Optional[str] = None)[source]¶
-
class
k1lib.bioinfo.cli.filt.
isValue
(value, column: int = 0, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.filt.filt
-
class
k1lib.bioinfo.cli.filt.
inside
(values: Set[Any], column: int = 0, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.filt.filt
-
class
k1lib.bioinfo.cli.filt.
nonEmptyStream
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Filters out streams that have no rows
-
class
k1lib.bioinfo.cli.filt.
startswith
(s: str, column: int = 0, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.filt.filt
-
class
k1lib.bioinfo.cli.filt.
endswith
(s: str, column: int = 0, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.filt.filt
-
class
k1lib.bioinfo.cli.filt.
isNumeric
(column: Optional[int] = None, delim: Optional[str] = None)[source]¶
-
class
k1lib.bioinfo.cli.filt.
inRange
(min: Optional[float] = None, max: Optional[float] = None, column: Optional[int] = None, delim: Optional[str] = None)[source]¶
-
class
k1lib.bioinfo.cli.filt.
columns
(*columns: Union[int, slice, List[int]], delim: Optional[str] = None)[source]¶
-
k1lib.bioinfo.cli.filt.
cut
¶ alias of
k1lib.bioinfo.cli.filt.columns
-
class
k1lib.bioinfo.cli.filt.
every
(length: int, offset: int = 0)[source]¶
-
class
k1lib.bioinfo.cli.filt.
intersection
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the intersection of multiple streams. Example:
[[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]] | intersection() # will return set([2, 4, 5])
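For comparison, the same result can be sketched in plain Python. The function name `intersection_sketch` is made up for this illustration and says nothing about how the class is implemented internally:

```python
from functools import reduce

def intersection_sketch(streams):
    """Return the elements common to every stream, as a set."""
    sets = [set(stream) for stream in streams]
    # Intersect the sets pairwise; no streams at all gives an empty set
    return reduce(lambda a, b: a & b, sets) if sets else set()

common = intersection_sketch([[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]])
print(common)
```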
grep module¶
-
class
k1lib.bioinfo.cli.grep.
grep
(pattern: str, before: int = 0, after: int = 0, singleLine: bool = False, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(pattern: str, before: int = 0, after: int = 0, singleLine: bool = False, delim: Optional[str] = None)[source]¶ Finds lines that have the specified pattern.
- Parameters
pattern – regex pattern to search for in a line
before – lines before the hit. Outputs independent lines
after – lines after the hit. Outputs independent lines
singleLine – if True, bunches the before and after lines together with the hit into a single line
delim – the delimiter in between sections if singleLine is True
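The `before` and `after` parameters can be pictured with the plain-Python sketch below. `grep_sketch` is a hypothetical helper, not the class itself; it leaves out `singleLine`, and its handling of overlapping context (each line is emitted at most once) is an assumption of this sketch:

```python
import re
from collections import deque

def grep_sketch(lines, pattern, before=0, after=0):
    """Yield lines matching `pattern`, plus context lines around each hit."""
    regex = re.compile(pattern)
    history = deque(maxlen=before)  # most recent lines not yet emitted
    remaining_after = 0             # trailing context lines still owed
    for line in lines:
        if regex.search(line):
            yield from history      # leading context
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line              # trailing context
            remaining_after -= 1
        else:
            history.append(line)

text = ["a", "b", "hit 1", "c", "d", "e"]
with_context = list(grep_sketch(text, r"hit \d", before=1, after=1))
hits_only = list(grep_sketch(text, r"hit \d"))
print(with_context, hits_only)
```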
-
init module¶
-
cli.
bioinfoSettings
= {'defaultDelim': '\t', 'defaultIndent': ' ', 'lookupImgs': True, 'oboFile': None, 'strict': False}¶ Main settings of
k1lib.bioinfo.cli. When using:
from k1lib.bioinfo.cli import *
…you can just set the settings like this:
bioinfoSettings["defaultIndent"] = "\t"
There are a few settings:
defaultDelim: default delimiter used in-between columns when creating tables
defaultIndent: default indent used for displaying nested structures
lookupImgs: whether to automatically look up images when exploring something
oboFile: gene ontology obo file location
strict: whether strict mode is on. Turning it on can help you debug stuff, but could also be a pain to work with
-
k1lib.bioinfo.cli.init.
patchDefaultDelim
(s: str)[source]¶ - Parameters
s – if not None, returns s itself; otherwise returns the default delimiter in settings
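The fallback behaviour described above can be sketched as follows. The names `settings` and `patch_default_delim` are stand-ins defined only for this example, so it runs without the package installed:

```python
from typing import Optional

# Stand-in for the package's settings dict, with the documented default
settings = {"defaultDelim": "\t"}

def patch_default_delim(s: Optional[str]) -> str:
    """Return s when given, otherwise the configured default delimiter."""
    return settings["defaultDelim"] if s is None else s

explicit = patch_default_delim(",")   # an explicit value wins
fallback = patch_default_delim(None)  # falls back to the default
settings["defaultDelim"] = ";"
changed = patch_default_delim(None)   # picks up the new default
print(repr(explicit), repr(fallback), repr(changed))
```

Looking the default up at call time, rather than baking it into a function signature, is what lets a later change to the settings take effect.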
-
k1lib.bioinfo.cli.init.
patchDefaultIndent
(s: str)[source]¶ - Parameters
s – if not None, returns s itself; otherwise returns the default indent character in settings
-
class
k1lib.bioinfo.cli.init.
BaseCli
[source]¶ Bases:
object
-
all
() → k1lib.bioinfo.cli.init.BaseCli[source]¶ Applies this BaseCli to all incoming streams
-
-
class
k1lib.bioinfo.cli.init.
serial
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:
[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist
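One way such chaining can work is sketched below. All class names here (`Cli`, `Serial`, `AddOne`, `Double`) are hypothetical and written for illustration; the real classes may differ in detail:

```python
class Cli:
    """Hypothetical base class, standing in for BaseCli."""
    def __or__(self, other):
        # cli | cli: there is no data yet, so remember both stages for later
        return Serial(self, other)

class Serial(Cli):
    def __init__(self, *clis):
        self.clis = clis

    def __ror__(self, it):
        # Feed the data through every stage, end to end
        for cli in self.clis:
            it = it | cli
        return it

class AddOne(Cli):
    def __ror__(self, it):
        return [x + 1 for x in it]

class Double(Cli):
    def __ror__(self, it):
        return [x * 2 for x in it]

direct = [1, 2] | AddOne() | Double()  # data first, evaluated left to right
combined = AddOne() | Double()         # no data yet: builds a Serial
deferred = [1, 2] | combined           # data arrives later
print(direct, deferred)
```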
-
-
class
k1lib.bioinfo.cli.init.
oneToMany
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator
-
-
class
k1lib.bioinfo.cli.init.
manyToManySpecific
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator
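The two joining operators, “a & b” and “a + b”, can be pictured with the sketch below. Every class in it is a hypothetical stand-in, and returning plain lists is an assumption made to keep the example short:

```python
from itertools import tee

class Cli:
    """Hypothetical base class, standing in for BaseCli."""
    def __and__(self, other):
        return OneToMany(self, other)

    def __add__(self, other):
        return ManyToManySpecific(self, other)

class OneToMany(Cli):
    def __init__(self, *clis):
        self.clis = clis

    def __ror__(self, it):
        # Duplicate the single stream, one copy per cli
        copies = tee(it, len(self.clis))
        return [copy | cli for copy, cli in zip(copies, self.clis)]

class ManyToManySpecific(Cli):
    def __init__(self, *clis):
        self.clis = clis

    def __ror__(self, streams):
        # Pair the i-th stream with the i-th cli
        return [stream | cli for stream, cli in zip(streams, self.clis)]

class Total(Cli):
    def __ror__(self, it):
        return sum(it)

class Count(Cli):
    def __ror__(self, it):
        return sum(1 for _ in it)

same_stream = [1, 2, 3] | (Total() & Count())            # one stream, two clis
own_streams = [[1, 2], [3, 4, 5]] | (Total() + Count())  # one stream per cli
print(same_stream, own_streams)
```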
-
input module¶
-
k1lib.bioinfo.cli.input.
cat
(fileName: Optional[str] = None)[source]¶ Reads a file line by line.
- Parameters
fileName – if None, then return a
BaseCli
that accepts a file name and outputs Iterator[str]
-
class
k1lib.bioinfo.cli.input.
cats
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Like cat(), but opens multiple files at once, returning streams. Looks something like this:
apply(lambda s: cat(s))
-
k1lib.bioinfo.cli.input.
wget
(url: str, fileName: Optional[str] = None)[source]¶ Downloads a file
- Parameters
url – The url of the file
fileName – if None, then tries to infer it from the url
-
class
k1lib.bioinfo.cli.input.
cmd
(cmd: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
property
err
¶ Error from the last command
-
property
kcsv module¶
This module is for dealing with csv stuff
kxml module¶
This module is for dealing with xml stuff
-
class
k1lib.bioinfo.cli.kxml.
node
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Turns lines into a single node
-
__ror__
(it: Iterator[str]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
maxDepth
(depth: Optional[int] = None, copy: bool = True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(depth: Optional[int] = None, copy: bool = True)[source]¶ Filters out nodes that are too deep
- Parameters
depth – max depth to include
copy – whether to limit the nodes themselves, or limit a copy
-
__ror__
(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
tag
(tag: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(tag: str)[source]¶ Finds all tags that have a particular name. If found, then don’t search deeper
-
__ror__
(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
pretty
(indent: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__ror__
(it: Iterator[xml.etree.ElementTree.Element]) → Iterator[str][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
display
(depth: int = 3, lines: int = 20)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(depth: int = 3, lines: int = 20)[source]¶ Convenience method for getting the head, making it pretty and printing it out
-
__ror__
(it: Iterator[xml.etree.ElementTree.Element], lines=10)[source]¶
-
modifier module¶
This is for quick modifiers, think of them as changing formats
-
class
k1lib.bioinfo.cli.modifier.
lstrip
(char: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Strips left of every line
-
class
k1lib.bioinfo.cli.modifier.
rstrip
(char: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Strips right of every line
-
class
k1lib.bioinfo.cli.modifier.
strip
(char: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Strips both sides of every line
-
class
k1lib.bioinfo.cli.modifier.
upper
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Make all characters uppercase
-
class
k1lib.bioinfo.cli.modifier.
lower
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Make all characters lowercase
-
class
k1lib.bioinfo.cli.modifier.
replace
(s: str, target: Optional[str] = None)[source]¶
-
class
k1lib.bioinfo.cli.modifier.
sort
(column: int = 0, reverse=False, numeric=True, delim: Optional[str] = None)[source]¶
output module¶
For operations that feel like the termination of a pipeline
sam module¶
This is for functions that are .sam or .bam related
-
class
k1lib.bioinfo.cli.sam.
header
(long=True)[source]¶
structural module¶
This is for functions that sort of change the table structure in a dramatic way. They’re the core transformations
-
class
k1lib.bioinfo.cli.structural.
joinColumns
(delim: Optional[str] = None, sep: bool = False)[source]¶
-
class
k1lib.bioinfo.cli.structural.
joinRows
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Joins multiple streams of rows
-
k1lib.bioinfo.cli.structural.
joinStreams
¶
-
class
k1lib.bioinfo.cli.structural.
splitColumns
(delim: Optional[str] = None)[source]¶
-
class
k1lib.bioinfo.cli.structural.
insertRow
(*columns: Union[List[str], str], delim: Optional[str] = None)[source]¶
-
k1lib.bioinfo.cli.structural.
insertIdColumn
(begin=True, delim: Optional[str] = None)[source]¶ Inserts an id column at the beginning (or end)
-
class
k1lib.bioinfo.cli.structural.
toDict
(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Transform an incoming stream into a dict using a function for values. Example:
names = ["wanda", "vision", "loki", "mobius"]
names | toDict(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...}
names | toDict(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}
-
-
class
k1lib.bioinfo.cli.structural.
split
(delim: Optional[str] = None, idx: Optional[int] = None)[source]¶
-
class
k1lib.bioinfo.cli.structural.
count
(delim: Optional[str] = None)[source]¶
-
class
k1lib.bioinfo.cli.structural.
permute
(permutations: List[int], delim: Optional[str] = None)[source]¶
-
class
k1lib.bioinfo.cli.structural.
accumulate
(column: int = 0, avg=False, delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(column: int = 0, avg=False, delim: Optional[str] = None)[source]¶ Groups lines that have the same line.split(delim)[column], and add together all other columns, assuming they’re floats
- Parameters
column – common column to accumulate
avg – calculate average values instead of sum
delim – specify delimiter between columns
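The grouping described above can be sketched in plain Python. `accumulate_sketch` is a hypothetical function, and returning a dict keyed by the common column is an assumption of this sketch; the class may emit rows in another form:

```python
def accumulate_sketch(lines, column=0, avg=False, delim="\t"):
    """Group rows by one column and sum (or average) the other columns."""
    totals, counts = {}, {}
    for line in lines:
        cells = line.split(delim)
        key = cells[column]
        # Every column except the key is assumed to hold a float
        values = [float(c) for i, c in enumerate(cells) if i != column]
        if key in totals:
            totals[key] = [t + v for t, v in zip(totals[key], values)]
        else:
            totals[key] = values
        counts[key] = counts.get(key, 0) + 1
    if avg:
        return {k: [s / counts[k] for s in sums] for k, sums in totals.items()}
    return totals

rows = ["a\t1\t2", "b\t10\t20", "a\t3\t4"]
sums = accumulate_sketch(rows)
means = accumulate_sketch(rows, avg=True)
print(sums, means)
```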
-
-
class
k1lib.bioinfo.cli.structural.
AA_
(*idxs: List[int], wraps=False)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*idxs: List[int], wraps=False)[source]¶ Returns 2 streams, one that has the selected element, and the other the rest. Example:
[1, 5, 6, 3, 7] | AA_(1) # will return [5, [1, 6, 3, 7]]
You can also put multiple indexes through:
[1, 5, 6] | AA_(0, 2) # will return [[1, [5, 6]], [6, [1, 5]]]
If you don't put any index in, then all indexes will be sliced:
[1, 5, 6] | AA_() # will return:
# [[1, [5, 6]],
# [5, [1, 6]],
# [6, [1, 5]]]
As for why the strange name, think of this operation as “AĀ”. In statistics, say you have a set “A”, then “not A” is commonly written as A with an overline, “Ā”. So “AA_” represents “AĀ”, and it returns the selection A first.
- Parameters
wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā
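The selection logic can be sketched in plain Python as below. `aa_sketch` is a hypothetical function, and treating "no index given" as "every index" is an assumption of this sketch:

```python
def aa_sketch(items, *idxs, wraps=False):
    """Pair each selected element with a list of all the other elements."""
    items = list(items)
    single = len(idxs) == 1          # exactly one index was asked for
    if not idxs:                     # assumption: no index means every index
        idxs = range(len(items))
    pairs = [[items[i], items[:i] + items[i + 1:]] for i in idxs]
    # A lone selection is returned unwrapped unless wraps=True
    return pairs[0] if single and not wraps else pairs

one = aa_sketch([1, 5, 6, 3, 7], 1)
wrapped = aa_sketch([1, 5, 6, 3, 7], 1, wraps=True)
two = aa_sketch([1, 5, 6], 0, 2)
everything = aa_sketch([1, 5, 6])
print(one, wrapped, two, everything)
```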
-
-
class
k1lib.bioinfo.cli.structural.
infinite
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Takes in a stream and yields an infinite amount of them. Example:
# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3] | infinite() | head(3) | toList()
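The same idea as a plain-Python generator, for illustration. `infinite_sketch` is a hypothetical name, and materializing the input once before repeating it is an assumption of this sketch:

```python
from itertools import islice

def infinite_sketch(stream):
    """Yield a fresh copy of the stream's contents, forever."""
    items = list(stream)   # materialize once so it can be repeated
    while True:
        yield list(items)  # a new list each time, so callers can mutate it

# islice plays the role of head(3): it stops the infinite generator
first_three = list(islice(infinite_sketch([1, 2, 3]), 3))
print(first_three)
```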
utils module¶
This is for all short utilities that have the boilerplate feeling
-
class
k1lib.bioinfo.cli.utils.
size
(idx=None, delim: Optional[str] = None)[source]¶
-
k1lib.bioinfo.cli.utils.
shape
¶ alias of
k1lib.bioinfo.cli.utils.size
-
class
k1lib.bioinfo.cli.utils.
item
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the first row
-
class
k1lib.bioinfo.cli.utils.
identity
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Yields whatever the input is. Useful for multiple streams
-
class
k1lib.bioinfo.cli.utils.
toInt
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts every row into an integer. Excludes non numbers if not in strict mode.
-
class
k1lib.bioinfo.cli.utils.
toFloat
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts every row into a float. Excludes non numbers if not in strict mode.
-
class
k1lib.bioinfo.cli.utils.
toNumpy
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to numpy array
-
class
k1lib.bioinfo.cli.utils.
toList
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to list. list would do the same, but this is just to maintain the style
-
class
k1lib.bioinfo.cli.utils.
wrapList
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Wraps inputs inside a list
-
class
k1lib.bioinfo.cli.utils.
toSet
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to set. set would do the same, but this is just to maintain the style
-
class
k1lib.bioinfo.cli.utils.
toIter
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts object to iterator. iter() would do the same, but this is just to maintain the style
-
class
k1lib.bioinfo.cli.utils.
toRange
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns iter(range(len(it))), effectively
-
class
k1lib.bioinfo.cli.utils.
equals
[source]¶ Bases:
object
Checks if all incoming columns/streams are identical
-
class
k1lib.bioinfo.cli.utils.
reverse
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Prints last line first, first line last
-
class
k1lib.bioinfo.cli.utils.
ignore
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Just executes everything, ignoring the output
-
class
k1lib.bioinfo.cli.utils.
avg
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Calculates average of list of numbers
-
k1lib.bioinfo.cli.utils.
headerIdx
(delim: Optional[str] = None)[source]¶ Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know what your column’s index is to cut it out.
-
class
k1lib.bioinfo.cli.utils.
dereference
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Recursively converts any iterator into a list. Only str and numbers.Number are not converted. Example:
iter(range(5)) # returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5)) | dereference() # returns [0, 1, 2, 3, 4]
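A recursive conversion of this kind can be sketched in plain Python as follows. `dereference_sketch` is a hypothetical function, and treating every non-str, non-number iterable as something to expand is an assumption of this sketch:

```python
from collections.abc import Iterable
from numbers import Number

def dereference_sketch(obj):
    """Recursively turn iterables into lists, leaving str and numbers alone."""
    if isinstance(obj, (str, Number)):
        return obj
    if isinstance(obj, Iterable):
        return [dereference_sketch(item) for item in obj]
    return obj  # anything else is passed through untouched

nested = iter([iter(range(3)), "abc", 4.5, (ch for ch in "xy")])
result = dereference_sketch(nested)
print(result)
```

Checking for `str` before checking for `Iterable` matters: a string is itself iterable, and expanding it would split it into characters.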