cli package¶
The main idea of this package is to emulate the terminal, but doing all of that inside Python itself. So this bash statement:
cat file.txt | head -5 > headerFile.txt
Turns into this statement:
cat("file.txt") | head(5) > file("headerFile.txt")
Here, “cat”, “head” and “file” are all classes extended from BaseCli. All of them implement the “reverse or” operation, or __ror__.
Essentially, these 2 statements are equivalent:
3 | obj
obj.__ror__(3)
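To make the mechanism concrete, here is a minimal sketch of how a pipeable class could be built on `__ror__`. The names `MiniCli` and `Head` are hypothetical stand-ins written for this illustration, not the package's actual classes.

```python
from itertools import islice

class MiniCli:
    """Hypothetical stand-in for BaseCli: the value left of `|` is handed to __ror__."""
    def __ror__(self, it):
        raise NotImplementedError

class Head(MiniCli):
    """Keeps only the first n elements of the incoming iterable."""
    def __init__(self, n=10):
        self.n = n

    def __ror__(self, it):
        return islice(it, self.n)

# list has no `|` operator of its own, so Python falls back to Head(2).__ror__([...])
first_two = list(["a", "b", "c"] | Head(2))
print(first_two)  # ['a', 'b']
```

The left operand can be any object that does not itself handle `|` for this operand, which is why plain lists, strings and iterators can start a pipeline.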
Also, a lot of these tools assume that we are operating on a table. So this table:
col1 | col2 | col3 |
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
Is equivalent to this list:
[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
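As a small illustration of the table convention, the sketch below treats the first row as the header and reads one column out by name. This is plain Python written for this example and does not use any of the package's tools.

```python
# A table is a list of rows; here the first row holds the column names
table = [["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]

header, *rows = table
# Look up the column position by name, then read it from every data row
idx = header.index("col2")
col2 = [row[idx] for row in rows]
print(col2)  # [2, 5]
```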
Also, the expected way to use these tools is to import everything directly into the current environment, like this:
from k1lib.bioinfo.cli import *
Besides operating on string iterators alone, this package can also go one level more meta and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check over this:
Submodules¶
bio module¶
This is for functions that are actually biology-related
-
class
k1lib.bioinfo.cli.bio.
transcribe
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Transcribes (DNA -> RNA) incoming rows
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.bio.
complement
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.bio.
translate
(length: int = 0)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(length: int = 0)[source]¶ Translates incoming rows.
- Parameters
length – 0 for short (L), 1 for med (Leu), 2 for long (Leucine)
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.bio.
medAa
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts short aa sequence to medium one
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.bio.
longAa
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts short aa sequence to long one
-
__ror__
(it)¶
-
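The per-row work behind the transcribe and complement tools can be sketched in plain Python. This is an illustration under assumptions: sequences are uppercase DNA strings over A, C, G, T, and the functions below are hypothetical helpers, not the package's implementation (which may, for example, handle other symbols differently).

```python
# Assumption: each row is an uppercase DNA string over A, C, G, T
def transcribe(seq: str) -> str:
    """DNA -> RNA: thymine (T) becomes uracil (U)."""
    return seq.replace("T", "U")

_PAIRS = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Swaps each base with its pairing partner (A<->T, C<->G), without reversing."""
    return seq.translate(_PAIRS)

rows = ["ATGC", "TTAA"]
rna = [transcribe(row) for row in rows]
comp = [complement(row) for row in rows]
print(rna)   # ['AUGC', 'UUAA']
print(comp)  # ['TACG', 'AATT']
```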
ctx module¶
All tools related to context variables. Intended to be used behind the “ctx” module name, like this:
from k1lib.bioinfo.cli import *
ctx.set("country", 3)
-
class
k1lib.bioinfo.cli._ctx.
Promise
(ctx: str)[source]¶ Bases:
object
Not intended to be used by the end user. Use __call__() to get the actual value
entrez module¶
This module is not really fleshed out and not that useful or elegant, so I just use cmd instead
mgi module¶
All tools related to the MGI database. Intended to be used behind the “mgi” module name, like this:
from k1lib.bioinfo.cli import *
["SOD1", "AMPK"] | mgi.batch()
-
class
k1lib.bioinfo.cli.mgi.
batch
(headless=True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Queries the MGI database, converting a list of genes to MGI ids
filt module¶
This is for functions that cut out specific parts of the table
-
class
k1lib.bioinfo.cli.filt.
filt
(predicate: Callable[[str], bool], column: Optional[int] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(predicate: Callable[[str], bool], column: Optional[int] = None)[source]¶ Filters out lines.
- Parameters
column –
if integer, then predicate(row[column])
if None, then predicate(line)
-
__ror__
(it)¶
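The column rule described above can be sketched as a plain generator. This is a hypothetical re-implementation for illustration only; it assumes that rows for which the predicate returns True are the ones kept.

```python
def filt(rows, predicate, column=None):
    """Keeps rows for which the predicate holds (assumed semantics).
    column=None tests the whole line; an int tests row[column] only."""
    for row in rows:
        value = row if column is None else row[column]
        if predicate(value):
            yield row

table = [["a", "1"], ["b", "22"], ["c", "3"]]
long_second = list(filt(table, lambda s: len(s) > 1, column=1))
has_c = list(filt(["abc", "xyz"], lambda line: "c" in line))
print(long_second)  # [['b', '22']]
print(has_c)        # ['abc']
```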
-
-
k1lib.bioinfo.cli.filt.
isValue
(value, column: Optional[int] = None)[source]¶ Filters out lines that are different from the given value
-
k1lib.bioinfo.cli.filt.
inSet
(values: Set[Any], column: Optional[int] = None)[source]¶ Filters out lines that are not in the specified set
-
k1lib.bioinfo.cli.filt.
contains
(s: str, column: Optional[int] = None)[source]¶ Filters out lines that don’t contain the specified substring
-
class
k1lib.bioinfo.cli.filt.
nonEmptyStream
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Filters out streams that have no rows
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.filt.
startswith
(s: str, column: Optional[int] = None)[source]¶ Filters out lines that don’t start with s
-
k1lib.bioinfo.cli.filt.
endswith
(s: str, column: Optional[int] = None)[source]¶ Filters out lines that don’t end with s
-
k1lib.bioinfo.cli.filt.
isNumeric
(column: Optional[int] = None)[source]¶ Filters out a line if that column is not a number
-
k1lib.bioinfo.cli.filt.
inRange
(min: Optional[float] = None, max: Optional[float] = None, column: Optional[int] = None)[source]¶ Checks whether a column is in range or not
-
class
k1lib.bioinfo.cli.filt.
head
(n: int = 10)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(n: int = 10)[source]¶ Only outputs the first n lines. You can also negate it (like ~head(5)), which then only outputs the lines after the first n. Examples:
"abcde" | head(2) | dereference() # returns ["a", "b"]
"abcde" | ~head(2) | dereference() # returns ["c", "d", "e"]
"0123456" | head(-3) | dereference() # returns ['0', '1', '2', '3']
"0123456" | ~head(-3) | dereference() # returns ['4', '5', '6']
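One way the negation could work is through `__invert__`, sketched below with a hypothetical `Head` class. This sketch only covers non-negative `n`; the negative case shown in the examples would need buffering and is left out.

```python
from itertools import islice

class Head:
    """Hypothetical re-implementation, for non-negative n only."""
    def __init__(self, n=10, inverted=False):
        self.n, self.inverted = n, inverted

    def __invert__(self):
        # `~Head(n)` flips between "first n" and "everything after the first n"
        return Head(self.n, not self.inverted)

    def __ror__(self, it):
        if self.inverted:
            return islice(it, self.n, None)
        return islice(it, self.n)

first = list("abcde" | Head(2))
rest = list("abcde" | ~Head(2))
print(first)  # ['a', 'b']
print(rest)   # ['c', 'd', 'e']
```

Because `~` binds more tightly than `|`, the inverted object is built before the pipe runs, so no parentheses are needed.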
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.filt.
columns
(*columns: List[int])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*columns: List[int])[source]¶ Cuts out specific columns, sliceable. Examples:
["0123456789"] | cut(5, 8) | dereference() # returns [['5', '8']]
["0123456789"] | cut(2) | dereference() # returns ['2']
["0123456789"] | ~cut()[:7:2] | dereference() # returns [['1', '3', '5', '7', '8', '9']]
If you’re selecting only 1 column, then Iterator[T] will be returned, not Table[T].
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.filt.
cut
¶ alias of
k1lib.bioinfo.cli.filt.columns
-
class
k1lib.bioinfo.cli.filt.
rows
(*rows: List[int])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*rows: List[int])[source]¶ Cuts out specific rows. Space complexity O(1) as a list is not constructed (unless you’re using some really weird slices).
- Parameters
rows – ints for the row indices
Example:
"0123456789" | rows(2) | dereference() # returns ["2"]
"0123456789" | rows(5, 8) | dereference() # returns ["5", "8"]
"0123456789" | rows()[2:5] | dereference() # returns ["2", "3", "4"]
"0123456789" | ~rows()[2:5] | dereference() # returns ["0", "1", "5", "6", "7", "8", "9"]
"0123456789" | ~rows()[:7:2] | dereference() # returns ['1', '3', '5', '7', '8', '9']
"0123456789" | rows()[:-4] | dereference() # returns ['0', '1', '2', '3', '4', '5']
"0123456789" | ~rows()[:-4] | dereference() # returns ['6', '7', '8', '9']
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.filt.
intersection
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the intersection of multiple streams. Example:
[[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]] | intersection() # will return set([2, 4, 5])
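The same result can be reproduced with the built-in set type. The helper below is a hypothetical sketch, not the package's code, and it assumes every element is hashable.

```python
def intersection(streams):
    """Returns the elements present in every stream (elements must be hashable)."""
    sets = [set(stream) for stream in streams]
    return set.intersection(*sets) if sets else set()

common = intersection([[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]])
print(common == {2, 4, 5})  # True
```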
-
__ror__
(it)¶
-
gb module¶
All tools related to the GenBank file format. Intended to be used behind the “gb” module name, like this:
from k1lib.bioinfo.cli import *
cat("abc.gb") | gb.feats()
-
class
k1lib.bioinfo.cli.gb.
feats
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Fetches features, each on a separate stream
-
static
filt
(*terms: str) → k1lib.bioinfo.cli.init.BaseCli[source]¶ Filters for specific terms in all the feature texts. If there are multiple terms, then filters for the first term, then the second, then the third, so the order of the terms might matter to you
-
static
tag
(tag: str) → k1lib.bioinfo.cli.init.BaseCli[source]¶ Gets a single tag out. Applies this on a single feature only
-
static
-
class
k1lib.bioinfo.cli.gb.
origin
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the origin section of the GenBank file
grep module¶
-
class
k1lib.bioinfo.cli.grep.
grep
(pattern: str, before: int = 0, after: int = 0)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(pattern: str, before: int = 0, after: int = 0)[source]¶ Finds lines that have the specified pattern. Example:
# returns ['c', 'd', '2', 'd']
"abcde12d34" | grep("d", 1) | dereference()
# returns ['d', 'e', 'd', '3', '4']
"abcde12d34" | grep("d", 0, 3).till("e") | dereference()
- Parameters
pattern – regex pattern to search for in a line
before – lines before the hit. Outputs independent lines
after – lines after the hit. Outputs independent lines
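The `before` behaviour can be sketched with a bounded buffer of recent lines. This is a hypothetical simplification that ignores `after` and `till`, and it assumes a context line is never emitted twice.

```python
import re
from collections import deque

def grep(lines, pattern, before=0):
    """Yields each matching line, preceded by up to `before` lines of context."""
    history = deque(maxlen=before)  # maxlen=0 keeps nothing
    for line in lines:
        if re.search(pattern, line):
            yield from history   # emit buffered context first
            history.clear()      # assumption: context is not repeated
            yield line
        else:
            history.append(line)

# Iterating over a string yields one character per "line"
hits = list(grep("abcde12d34", "d", before=1))
print(hits)  # ['c', 'd', '2', 'd']
```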
-
till
(pattern: str)[source]¶ Greps until some other pattern appears. Before lines will be honored, but after lines will be set to inf. Inclusive.
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.grep.
grepToTable
(pattern: str, before: int = 0, after: int = 0)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(pattern: str, before: int = 0, after: int = 0)[source]¶ Searches for a pattern. If found, then puts all the before and after lines in different columns. Example:
# returns [['2', 'b'], ['5', 'b']]
"1a\n 2b\n 3c\n 4d\n 5b\n 6c\n f" | grepToTable("b", 1) | dereference()
-
__ror__
(it)¶
-
init module¶
-
cli.
bioinfoSettings
= {'defaultDelim': '\t', 'defaultIndent': ' ', 'lookupImgs': True, 'oboFile': None, 'strict': False}¶ Main settings of k1lib.bioinfo.cli. When using:
from k1lib.bioinfo.cli import *
…you can just set the settings like this:
bioinfoSettings["defaultIndent"] = "\t"
There are a few settings:
defaultDelim: default delimiter used in-between columns when creating tables
defaultIndent: default indent used for displaying nested structures
lookupImgs: whether to automatically look up images when exploring something
oboFile: gene ontology obo file location
strict: whether strict mode is on. Turning it on can help you debug stuff, but could also be a pain to work with
-
class
k1lib.bioinfo.cli.init.
BaseCli
[source]¶ Bases:
object
-
all
() → k1lib.bioinfo.cli.init.BaseCli[source]¶ Applies this BaseCli to all incoming streams
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.init.
serial
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:
[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist
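A minimal sketch of why a dedicated chaining class is needed: when both sides of `|` are clis, there is no data to process yet, so the operator has to return a new object that remembers both steps. The classes `Cli`, `Serial`, `Double` and `AddOne` below are hypothetical and written for this illustration only.

```python
class Cli:
    """Hypothetical base: `value | cli` runs it, `cli | cli` chains them."""
    def __ror__(self, it):
        raise NotImplementedError

    def __or__(self, other):
        # No data yet: remember both steps instead of running them
        if isinstance(other, Cli):
            return Serial(self, other)
        return NotImplemented

class Serial(Cli):
    def __init__(self, *clis):
        self.clis = clis

    def __ror__(self, it):
        # Feed the output of each cli into the next one
        for cli in self.clis:
            it = it | cli
        return it

class Double(Cli):
    def __ror__(self, it):
        return [x * 2 for x in it]

class AddOne(Cli):
    def __ror__(self, it):
        return [x + 1 for x in it]

direct = [1, 2] | Double() | AddOne()
c = Double() | AddOne()   # a Serial, built without any input
chained = [1, 2] | c
print(direct, chained)  # [3, 5] [3, 5]
```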
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.init.
oneToMany
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.init.
manyToMany
(cli)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.init.
manyToManySpecific
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator
-
__ror__
(it)¶
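The difference between the two joining styles described in this module can be sketched with plain functions. Both helpers are hypothetical: `one_to_many` mirrors the “a & b” idea (one stream, copied to every consumer) and `many_to_many_specific` mirrors the “a + b” idea (stream i goes to consumer i). Using `itertools.tee` for the copying is an assumption of this sketch.

```python
from itertools import tee

def one_to_many(stream, *fns):
    """Gives every function its own independent copy of the same stream."""
    copies = tee(stream, len(fns))
    return [fn(copy) for fn, copy in zip(fns, copies)]

def many_to_many_specific(streams, *fns):
    """Pairs the i-th stream with the i-th function."""
    return [fn(stream) for fn, stream in zip(fns, streams)]

total, biggest = one_to_many(iter([3, 1, 2]), sum, max)
per_stream = many_to_many_specific([[1, 2, 3], [4, 5]], len, sum)
print(total, biggest)  # 6 3
print(per_stream)      # [3, 9]
```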
-
inp module¶
-
k1lib.bioinfo.cli.inp.
cat
(fileName: Optional[str] = None)[source]¶ Reads a file line by line.
- Parameters
fileName – if None, then return a
BaseCli
that accepts a file name and outputs Iterator[str]
-
class
k1lib.bioinfo.cli.inp.
cats
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Like cat(), but opens multiple files at once, returning streams. Looks something like this:
apply(lambda s: cat(s))
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.inp.
wget
(url: str, fileName: Optional[str] = None)[source]¶ Downloads a file
- Parameters
url – The url of the file
fileName – if None, then tries to infer it from the url
-
class
k1lib.bioinfo.cli.inp.
cmd
(cmd: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
property
err
¶ Error from the last command
-
__ror__
(it)¶
-
property
kcsv module¶
All tools related to the csv file format. Intended to be used behind the “kcsv” module name, like this:
from k1lib.bioinfo.cli import *
kcsv.cat("file.csv") | display()
kxml module¶
All tools related to the xml file format. Intended to be used behind the “kxml” module name, like this:
from k1lib.bioinfo.cli import *
cat("abc.xml") | kxml.node() | kxml.display()
-
class
k1lib.bioinfo.cli.kxml.
node
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Turns lines into a single node
-
__ror__
(it: Iterator[str]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
maxDepth
(depth: Optional[int] = None, copy: bool = True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(depth: Optional[int] = None, copy: bool = True)[source]¶ Filters out nodes that are too deep
- Parameters
depth – max depth to include
copy – whether to limit the nodes themselves, or limit a copy
-
__ror__
(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
tag
(tag: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(tag: str)[source]¶ Finds all tags that have a particular name. If found, then don’t search deeper
-
__ror__
(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
pretty
(indent: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__ror__
(it: Iterator[xml.etree.ElementTree.Element]) → Iterator[str][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
display
(depth: int = 3, lines: int = 20)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(depth: int = 3, lines: int = 20)[source]¶ Convenience method that takes the head, makes it pretty and prints it out
-
__ror__
(it: Iterator[xml.etree.ElementTree.Element], lines=10)[source]¶
-
modifier module¶
This is for quick modifiers, think of them as changing formats
-
class
k1lib.bioinfo.cli.modifier.
apply
(f: Callable[[str], str], column: Optional[int] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(f: Callable[[str], str], column: Optional[int] = None)[source]¶ Applies a function f to every line
- Parameters
column – if not None, then applies the function to that column only
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.modifier.
applySingle
(f: Callable[[Any], Any])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(f: Callable[[Any], Any])[source]¶ Like apply, but much simpler, just operating on the entire input object, essentially
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.modifier.
applyS
¶
-
k1lib.bioinfo.cli.modifier.
lstrip
(column: Optional[int] = None, char: Optional[str] = None)[source]¶ Strips left of every line
-
k1lib.bioinfo.cli.modifier.
rstrip
(column: Optional[int] = None, char: Optional[str] = None)[source]¶ Strips right of every line
-
k1lib.bioinfo.cli.modifier.
strip
(column: Optional[int] = None, char: Optional[str] = None)[source]¶ Strips both sides of every line
-
k1lib.bioinfo.cli.modifier.
upper
(column: Optional[int] = None)[source]¶ Make all characters uppercase
-
k1lib.bioinfo.cli.modifier.
lower
(column: Optional[int] = None)[source]¶ Make all characters lowercase
-
k1lib.bioinfo.cli.modifier.
replace
(s: str, target: Optional[str] = None, column: Optional[int] = None)[source]¶ Replaces substring s with target for each line.
-
k1lib.bioinfo.cli.modifier.
remove
(s: str, column: Optional[int] = None)[source]¶ Removes a specific substring in each line.
-
k1lib.bioinfo.cli.modifier.
toFloat
(column: Optional[int] = None)[source]¶ Converts every row into a float. Excludes non numbers if not in strict mode.
-
k1lib.bioinfo.cli.modifier.
toInt
(column: Optional[int] = None)[source]¶ Converts every row into an integer. Excludes non numbers if not in strict mode.
-
class
k1lib.bioinfo.cli.modifier.
sort
(column: int = 0, numeric=True, reverse=False)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(column: int = 0, numeric=True, reverse=False)[source]¶ Sorts all lines based on a specific column.
- Parameters
numeric – whether to treat column as float
reverse – False for smaller to bigger, True for bigger to smaller. Use __invert__() to quickly reverse the order instead of using this param
-
__ror__
(it)¶
-
output module¶
For operations that feel like the end of a pipeline
-
class
k1lib.bioinfo.cli.output.
file
(fileName: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.output.
pretty
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Pretty prints a table
-
__ror__
(it)¶
-
sam module¶
This is for functions that are .sam or .bam related
-
class
k1lib.bioinfo.cli.sam.
header
(long=True)[source]¶
structural module¶
This is for functions that change the table structure in a dramatic way. They’re the core transformations
-
class
k1lib.bioinfo.cli.structural.
joinColumns
(fillValue=None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(fillValue=None)[source]¶ Join multiple columns and loop through all rows. Aka transpose.
- Parameters
fillValue – if not None, then will try to zip longest with this fill value
Example:
# returns [[1, 4], [2, 5], [3, 6]]
[[1, 2, 3], [4, 5, 6]] | joinColumns() | dereference()
# returns [[1, 4], [2, 5], [3, 6], [0, 7]]
[[1, 2, 3], [4, 5, 6, 7]] | joinColumns(0) | dereference()
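Both behaviours map closely onto `zip` and `itertools.zip_longest` from the standard library. The function below is a hypothetical sketch of that mapping, not the package's code.

```python
from itertools import zip_longest

def join_columns(table, fill_value=None):
    """Transposes a table; with a fill value, short rows are padded."""
    if fill_value is None:
        # zip stops at the shortest row
        return [list(col) for col in zip(*table)]
    return [list(col) for col in zip_longest(*table, fillvalue=fill_value)]

square = join_columns([[1, 2, 3], [4, 5, 6]])
ragged = join_columns([[1, 2, 3], [4, 5, 6, 7]], 0)
print(square)  # [[1, 4], [2, 5], [3, 6]]
print(ragged)  # [[1, 4], [2, 5], [3, 6], [0, 7]]
```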
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.structural.
transpose
¶
-
k1lib.bioinfo.cli.structural.
splitColumns
¶
-
class
k1lib.bioinfo.cli.structural.
joinList
(element=None, begin=True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(element=None, begin=True)[source]¶ Join element into list.
- Parameters
element – the element to insert. If None, then takes the input [e, […]], else takes the input […] as usual
Example:
# returns [5, 2, 6, 8]
[5, [2, 6, 8]] | joinList()
# also returns [5, 2, 6, 8]
[2, 6, 8] | joinList(5)
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
splitList
(weights: List[float] = [0.8, 0.2])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(weights: List[float] = [0.8, 0.2])[source]¶ Splits list of elements into multiple lists. Example:
# returns [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9]]
range(10) | splitList([0.8, 0.2]) | dereference()
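A weighted split like this can be sketched by turning the weights into cumulative cut points. The helper is hypothetical, and the choice to round the cut points is an assumption; the package may place boundaries differently for lengths that do not divide evenly.

```python
from itertools import accumulate

def split_list(items, weights=(0.8, 0.2)):
    """Splits items into consecutive chunks sized proportionally to weights."""
    items = list(items)
    total = sum(weights)
    # Cumulative cut points, e.g. [8, 10] for 10 items and weights (0.8, 0.2)
    bounds = [round(len(items) * w / total) for w in accumulate(weights)]
    starts = [0] + bounds[:-1]
    return [items[a:b] for a, b in zip(starts, bounds)]

parts = split_list(range(10), [0.8, 0.2])
print(parts)  # [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9]]
```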
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
joinStreams
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Join multiple streams. Example:
# returns [1, 2, 3, 4, 5]
[[1, 2, 3], [4, 5]] | joinStreams() | dereference()
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
joinStreamsRandom
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Join multiple streams randomly. If any stream runs out, then quits. Example:
# could return [0, 1, 10, 2, 11, 12, 13, ...], with max length 20, typical length 18
[range(0, 10), range(10, 20)] | joinStreamsRandom() | dereference()
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
batched
(bs=32)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(bs=32)[source]¶ Batches the input stream. Ignores the last batch. Example:
# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
range(11) | batched(3) | dereference()
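The batching rule, including the dropped final batch, can be sketched with `itertools.islice`. This generator is a hypothetical illustration rather than the package's implementation.

```python
from itertools import islice

def batched(stream, bs=32):
    """Yields lists of exactly bs elements; an incomplete last batch is dropped."""
    it = iter(stream)
    while True:
        batch = list(islice(it, bs))
        if len(batch) < bs:
            return  # incomplete (or empty) final batch is discarded
        yield batch

batches = list(batched(range(11), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```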
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.structural.
collate
()[source]¶ Puts individual columns into a tensor. Example:
# returns [tensor([ 0, 10, 20]), tensor([ 1, 11, 21]), tensor([ 2, 12, 22])]
[range(0, 3), range(10, 13), range(20, 23)] | collate() | toList()
-
k1lib.bioinfo.cli.structural.
insertRow
(*row: List[T])[source]¶ Inserts a row right before all other rows. See also: joinList().
-
k1lib.bioinfo.cli.structural.
insertColumn
(*column, begin=True, fillValue='')[source]¶ Inserts a column at beginning or end. Example:
# returns [['a', 1, 2], ['b', 3, 4]]
[[1, 2], [3, 4]] | insertColumn("a", "b") | dereference()
-
k1lib.bioinfo.cli.structural.
insertIdColumn
(table=False, begin=True, fillValue='')[source]¶ Inserts an id column at the beginning (or end). Example:
# returns [[0, 'a', 2], [1, 'b', 4]]
[["a", 2], ["b", 4]] | insertIdColumn(True) | dereference()
# returns [[0, 'a'], [1, 'b']]
"ab" | insertIdColumn()
- Parameters
table – if False, then insert column to an Iterator[str], else treat input as a full fledged table
-
class
k1lib.bioinfo.cli.structural.
toDict
(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Transform an incoming stream into a dict using a function for values. Example:
names = ["wanda", "vision", "loki", "mobius"]
names | toDict(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...}
names | toDict(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}
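In plain Python the same transformation is a dict comprehension with optional key and value functions. The helper below is a hypothetical sketch; it assumes that a missing function means the element is used unchanged.

```python
def to_dict(stream, key_f=None, value_f=None):
    """Builds a dict from a stream; a missing function leaves the element as is."""
    identity = lambda x: x
    key_f = key_f or identity
    value_f = value_f or identity
    return {key_f(e): value_f(e) for e in stream}

names = ["wanda", "vision", "loki", "mobius"]
lengths = to_dict(names, value_f=len)
titled = to_dict(names, str.title, len)
print(lengths)  # {'wanda': 5, 'vision': 6, 'loki': 4, 'mobius': 6}
print(titled)   # {'Wanda': 5, 'Vision': 6, 'Loki': 4, 'Mobius': 6}
```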
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
split
(delim: Optional[str] = None, idx: Optional[int] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(delim: Optional[str] = None, idx: Optional[int] = None)[source]¶ Splits each line using a delimiter, and outputs the parts as a separate line.
- Parameters
idx – if available, only outputs the element at that index
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
table
(delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(delim: Optional[str] = None)[source]¶ Splits lines to rows (List[str]) using a delimiter. Example:
# returns [['a', 'bd'], ['1', '2', '3']]
["a|bd", "1|2|3"] | table("|") | dereference()
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
stitch
(delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(delim: Optional[str] = None)[source]¶ Stitches elements in a row together, so they become a simple string. See also: pretty. Example:
# returns ['1|2', '3|4', '5|6']
[[1, 2], [3, 4], [5, 6]] | stitch("|") | dereference()
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.structural.
tableFromList
()¶ Turns Iterator[T] into Table[T]
-
class
k1lib.bioinfo.cli.structural.
count
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Finds unique elements and returns a table with [frequency, value, percent] columns. Example:
# returns [[1, 'a', '33%'], [2, 'b', '67%']]
['a', 'b', 'b'] | count() | dereference()
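A frequency table of this shape can be sketched with `collections.Counter`. The helper is hypothetical; rounding percentages to whole numbers and keeping first-seen order are assumptions chosen to match the example.

```python
from collections import Counter

def count(stream):
    """Returns [frequency, value, percent] rows, in first-seen order."""
    counts = Counter(stream)
    total = sum(counts.values())
    return [[n, value, f"{round(100 * n / total)}%"] for value, n in counts.items()]

freq = count(["a", "b", "b"])
print(freq)  # [[1, 'a', '33%'], [2, 'b', '67%']]
```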
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
permute
(*permutations: List[int])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*permutations: List[int])[source]¶ Permutes the columns. Acts kinda like
torch.Tensor.permute()
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
accumulate
(columnIdx: int = 0, avg=False)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(columnIdx: int = 0, avg=False)[source]¶ Groups lines that have the same row[columnIdx], and adds together all other columns, assuming they’re numbers
- Parameters
columnIdx – common column index to accumulate
avg – calculate average values instead of sum
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
AA_
(*idxs: List[int], wraps=False)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*idxs: List[int], wraps=False)[source]¶ Returns 2 streams, one that has the selected element, and the other the rest. Example:
[1, 5, 6, 3, 7] | AA_(1) # will return [5, [1, 6, 3, 7]]
You can also put multiple indexes through:
[1, 5, 6] | AA_(0, 2) # will return [[1, [5, 6]], [6, [1, 5]]]
If you don’t pass any indexes, then all indexes will be sliced:
[1, 5, 6] | AA_() # will return:
# [[1, [5, 6]],
# [5, [1, 6]],
# [6, [1, 5]]]
As for why the strange name, think of this operation as “AĀ”. In statistics, say you have a set “A”, then “not A” is commonly written as A with an overline “Ā”. So “AA_” represents “AĀ”, and that it first returns the selection A.
- Parameters
wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā
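The selection logic can be sketched as a plain function. This is a hypothetical re-implementation that assumes calling it with no indexes selects every index in turn, and that a single selection is returned unwrapped unless `wraps` is set.

```python
def aa_(items, *idxs, wraps=False):
    """For each index, returns [selected element, list of all the others]."""
    items = list(items)
    if not idxs:
        idxs = range(len(items))  # assumption: no indexes means every index
    pairs = [[items[i], items[:i] + items[i + 1:]] for i in idxs]
    if len(pairs) == 1 and not wraps:
        return pairs[0]
    return pairs

single = aa_([1, 5, 6, 3, 7], 1)
several = aa_([1, 5, 6], 0, 2)
everything = aa_([1, 5, 6])
print(single)      # [5, [1, 6, 3, 7]]
print(several)     # [[1, [5, 6]], [6, [1, 5]]]
print(everything)  # [[1, [5, 6]], [5, [1, 6]], [6, [1, 5]]]
```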
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
peek
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns (firstRow, iterator). This sort of peeks at the first row, to potentially gain some insights about the internal formats. Example:
e, it = iter([[1, 2, 3], [1, 2]]) | peek()
print(e) # prints "[1, 2, 3]"
s = 0
for e in it: s += len(e)
print(s) # prints "5", or length of 2 lists
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.structural.
repeat
(limit: Optional[int] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Yields a specified amount of the passed in object. Example:
# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]] [1, 2, 3] | repeat(3) | toList()
- Parameters
limit – if None, then repeats indefinitely
-
__ror__
(it)¶
-
class
k1lib.bioinfo.cli.structural.
infiniteFrom
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Yields from a list. If runs out of elements, then do it again. Example:
# returns [1, 2, 3, 1, 2]
[1, 2, 3] | infiniteFrom() | head(5) | dereference()
-
__ror__
(it)¶
-
utils module¶
This is for all short utilities that have a boilerplate feel
-
class
k1lib.bioinfo.cli.utils.
size
(idx=None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(idx=None)[source]¶ Returns number of rows and columns in the input.
- Parameters
idx – if idx is None return (rows, columns). If 0 or 1, then rows or columns
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.utils.
shape
¶ alias of
k1lib.bioinfo.cli.utils.size
-
class
k1lib.bioinfo.cli.utils.
item
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the first row
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
identity
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Yields whatever the input is. Useful for multiple streams
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toStr
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts every line (possibly just a number) to a string.
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
to1Str
(delim: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toNumpy
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to numpy array
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toTensor
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to
torch.Tensor
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toList
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to list. list would do the same, but this is just to maintain the style
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
wrapList
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Wraps inputs inside a list
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toSet
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to set. set would do the same, but this is just to maintain the style
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toIter
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts object to iterator. iter() would do the same, but this is just to maintain the style
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toRange
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns iter(range(len(it))), effectively
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
equals
[source]¶ Bases:
object
Checks if all incoming columns/streams are identical
-
class
k1lib.bioinfo.cli.utils.
reverse
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Prints last line first, first line last
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
ignore
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Just executes everything, ignoring the output
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toSum
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Calculates the sum of list of numbers
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
toAvg
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Calculates average of list of numbers
-
__ror__
(it)¶
-
-
class
k1lib.bioinfo.cli.utils.
lengths
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the lengths of each row.
-
__ror__
(it)¶
-
-
k1lib.bioinfo.cli.utils.
headerIdx
()[source]¶ Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know what your column’s index is to cut it out. Example:
# returns [[0, 'a'], [1, 'b'], [2, 'c']]
["abc"] | headerIdx() | dereference()
-
class
k1lib.bioinfo.cli.utils.
dereference
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Recursively converts any iterator into a list. Only str and numbers.Number are not converted. Example:
iter(range(5)) # returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5)) | dereference() # returns [0, 1, 2, 3, 4]
Warning
Works well with PyTorch tensors, but not with NumPy arrays, as they interfere with the __ror__ operator, so do torch.from_numpy(…) first.
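The recursive conversion can be sketched in a few lines of plain Python. This is a hypothetical re-implementation that leaves strings, numbers and non-iterable objects untouched; it makes no attempt to handle tensors or arrays.

```python
from numbers import Number

def dereference(obj):
    """Recursively turns iterables into lists; str and numbers are left alone."""
    if isinstance(obj, (str, Number)):
        return obj
    try:
        it = iter(obj)
    except TypeError:
        return obj  # not iterable, return unchanged
    return [dereference(e) for e in it]

flat = dereference(iter(range(5)))
nested = dereference(iter([iter([1, 2]), "ab", 3.5]))
print(flat)    # [0, 1, 2, 3, 4]
print(nested)  # [[1, 2], 'ab', 3.5]
```

Strings are checked first because iterating a one-character string yields the same string again, which would otherwise recurse forever.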
-
__ror__
(it)¶
-