cli package¶
The main idea of this package is to emulate the terminal, but doing all of that inside Python itself. So this bash statement:
cat file.txt | head -5 > headerFile.txt
Turns into this statement:
cat("file.txt") | head(5) > file("headerFile.txt")
Here, “cat”, “head” and “file” are all classes extended from BaseCli. All of them implement the “reverse or” operation, or __ror__.
Essentially, these 2 statements are equivalent:
3 | obj
obj.__ror__(3)
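To make the equivalence concrete, here is a minimal sketch of a pipe stage. The class name `Double` is made up for illustration and is not part of the package; it only shows how Python falls back to `__ror__` when the left operand (an int here) doesn't know what to do with `|`.

```python
class Double:
    """Hypothetical pipe stage: doubles whatever is piped into it."""

    def __ror__(self, left):
        # Python calls this for `left | self` once int.__or__ gives up
        return left * 2


obj = Double()
piped = 3 | obj            # operator form
explicit = obj.__ror__(3)  # what Python ends up calling
print(piped, explicit)     # 6 6
```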
Also, a lot of these tools assume that we are operating on a table. So this table:
col1 | col2 | col3 |
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
Is equivalent to this list:
[["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]
Also, the expected way to use these tools is to import everything directly into the current environment, like this:
from k1lib.bioinfo.cli import *
Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. If this interests you, check over this:
Submodules¶
bio module¶
This is for functions that are actually biology-related
-
class
k1lib.bioinfo.cli.bio.
transcribe
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Transcribes (DNA -> RNA) incoming rows
-
class
k1lib.bioinfo.cli.bio.
translate
(length: int = 0)[source]¶
-
class
k1lib.bioinfo.cli.bio.
medAa
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts short aa sequence to medium one
ctx module¶
-
class
k1lib.bioinfo.cli.ctx.
Promise
(ctx: str)¶ Bases:
object
-
__init__
(ctx: str)[source]¶ A delayed variable that represents a value in the current context. Not intended to be instantiated by the end user. Use __call__() to get the actual value (aka “dereferencing”).
This delayed variable just loves to be dereferenced. A lot of operations that you do with it will dereference it right away, like this:
from k1lib.bioinfo.cli import *
ctx["a"] = 4
f"value: {ctx['a']}" # returns string "value: 4"
ctx['a'] + 5 # returns 9
ctx['a'] / 5 # returns 0.8
If a Promise attribute is set in a BaseCli subclass, then it will automagically be dereferenced at __ror__ of BaseCli.
If you don’t interact with it directly like the above operations, but just pass it around, then it won’t dereference. You can then force it to do so like this:
[ctx['a'], 5] | ctx.dereference() # returns an iterator, with the first variable dereferenced
[ctx['a'], 5] | ctx.dereference() | toList() # returns [4, 5]
[ctx['a'], 5] | dereference() # returns [4, 5]
-
-
class
k1lib.bioinfo.cli.ctx.
consume
(ctx: str, **kwargs)¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(ctx: str, **kwargs)[source]¶ Consumes the input, dereferences it and stores it in context. Example:
# returns [2, 3, 4, 5, 6]
range(5) | ctx.consume('a') | apply(lambda x: x+2) | toList()
# returns [0, 1, 2, 3, 4]
ctx['a']()
- Parameters
kwargs – args to pass to
dereference
.
-
-
k1lib.bioinfo.cli.ctx.
ctx
()¶ Returns the internal context dictionary. Only use this if you want to write your own context-manipulating Callbacks
-
class
k1lib.bioinfo.cli.ctx.
dereference
¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
If a Promise is encountered, then replaces it with the value. It’s important to note that k1lib.bioinfo.cli.utils.dereference already replaces every Promise, so you don’t have to pass through this cli beforehand if you intend to dereference. Example:
ctx.setC('a', 4)
# returns [4]
[ctx.Promise('a')] | ctx.dereference() | toList()
-
class
k1lib.bioinfo.cli.ctx.
enum
(ctx: str)¶
-
class
k1lib.bioinfo.cli.ctx.
f
(ctx: str, f: Optional[Callable[[T], T]] = None)¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(ctx: str, f: Optional[Callable[[T], T]] = None)[source]¶ Saves the f-transformed list element to context. Example:
# returns [['abc', 3], ['ab', 2]]
["abc", "ab"] | ctx.f('a', lambda s: len(s)) | apply(lambda r: [r, ctx['a']]) | dereference()
- Parameters
f – if not specified, then just save the object as-is
-
-
k1lib.bioinfo.cli.ctx.
getC
(ctx: str) → k1lib.bioinfo.cli.ctx.Promise¶ Gets the context variable. Shortcut available like this:
ctx["a"] = 4
ctx["a"] # returns a Promise that will dereference to 4
entrez module¶
This module is not really fleshed out, not that useful/elegant, and I just use cmd instead
mgi module¶
All tools related to the MGI database. Expected to be used behind the “mgi” module name, like this:
from k1lib.bioinfo.cli import *
["SOD1", "AMPK"] | mgi.batch()
-
class
k1lib.bioinfo.cli.mgi.
batch
(headless=True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Queries the MGI database, converting a list of genes to MGI ids
filt module¶
This is for functions that cut out specific parts of the table
-
class
k1lib.bioinfo.cli.filt.
filt
(predicate: Callable[[T], bool], column: Optional[int] = None)[source]¶
-
k1lib.bioinfo.cli.filt.
isValue
(value, column: Optional[int] = None)[source]¶ Filters out lines that are different from the given value
-
k1lib.bioinfo.cli.filt.
inSet
(values: Set[Any], column: Optional[int] = None)[source]¶ Filters out lines that are not in the specified set
-
k1lib.bioinfo.cli.filt.
contains
(s: str, column: Optional[int] = None)[source]¶ Filters out lines that don’t contain the specified substring
-
class
k1lib.bioinfo.cli.filt.
nonEmptyStream
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Filters out streams that have no rows
-
k1lib.bioinfo.cli.filt.
startswith
(s: str, column: Optional[int] = None)[source]¶ Filters out lines that don’t start with s
-
k1lib.bioinfo.cli.filt.
endswith
(s: str, column: Optional[int] = None)[source]¶ Filters out lines that don’t end with s
-
k1lib.bioinfo.cli.filt.
isNumeric
(column: Optional[int] = None)[source]¶ Filters out a line if that column is not a number
-
k1lib.bioinfo.cli.filt.
instanceOf
(cls: Union[type, Tuple[type]], column: Optional[int] = None)[source]¶ Filters out lines that are not instances of the given type
-
k1lib.bioinfo.cli.filt.
inRange
(min: Optional[float] = None, max: Optional[float] = None, column: Optional[int] = None)[source]¶ Checks whether a column is in range or not
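Most of the filters above share one shape: a predicate, optionally applied to a single column. Here is a rough sketch of that shape as plain generator functions. The names `filt` and `in_range` mirror the documented ones, but the bodies are guesses; in particular, treating the range bounds as inclusive is an assumption.

```python
def filt(rows, predicate, column=None):
    """Keep rows for which `predicate` is true.

    With column=None the predicate sees the whole row, otherwise only
    row[column] (assumed behaviour, not confirmed by the source).
    """
    for row in rows:
        value = row if column is None else row[column]
        if predicate(value):
            yield row


def in_range(rows, min=None, max=None, column=None):
    """Hypothetical range filter built on `filt`; bounds assumed inclusive."""
    def ok(v):
        return (min is None or v >= min) and (max is None or v <= max)
    return filt(rows, ok, column)


table = [["a", 1], ["b", 7], ["c", 4]]
big = list(filt(table, lambda v: v > 3, column=1))
mid = list(in_range(table, min=2, max=5, column=1))
only_a = list(filt("abcab", lambda s: s == "a"))
print(big)     # [['b', 7], ['c', 4]]
print(mid)     # [['c', 4]]
print(only_a)  # ['a', 'a']
```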
-
class
k1lib.bioinfo.cli.filt.
head
(n: int = 10)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(n: int = 10)[source]¶ Only outputs the first n lines. You can also negate it (like ~head(5)), which then only outputs the lines after the first n. Examples:
"abcde" | head(2) | dereference() # returns ["a", "b"]
"abcde" | ~head(2) | dereference() # returns ["c", "d", "e"]
"0123456" | head(-3) | dereference() # returns ['0', '1', '2', '3']
"0123456" | ~head(-3) | dereference() # returns ['4', '5', '6']
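The four cases above can be reproduced with `itertools.islice` plus ordinary list slicing. This is only a sketch: the function name and the `inverted` flag (standing in for `~head(...)`) are made up here, and the real class may handle negative `n` differently.

```python
from itertools import islice


def head(lines, n=10, inverted=False):
    """First `n` items, or everything after them when `inverted` is True.

    A negative `n` counts from the end, which forces the whole input
    into memory before slicing.
    """
    if n >= 0:
        # islice works on any iterable without building the full list first
        picked = islice(lines, n, None) if inverted else islice(lines, n)
        return list(picked)
    items = list(lines)
    return items[n:] if inverted else items[:n]


print(head("abcde", 2))                    # ['a', 'b']
print(head("abcde", 2, inverted=True))     # ['c', 'd', 'e']
print(head("0123456", -3))                 # ['0', '1', '2', '3']
print(head("0123456", -3, inverted=True))  # ['4', '5', '6']
```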
-
-
class
k1lib.bioinfo.cli.filt.
columns
(*columns: List[int])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*columns: List[int])[source]¶ Cuts out specific columns, sliceable. Examples:
["0123456789"] | cut(5, 8) | dereference() # returns [['5', '8']]
["0123456789"] | cut(2) | dereference() # returns ['2']
["0123456789"] | ~cut()[:7:2] | dereference() # returns [['1', '3', '5', '7', '8', '9']]
If you’re selecting only 1 column, then Iterator[T] will be returned, not Table[T].
-
-
k1lib.bioinfo.cli.filt.
cut
¶ alias of
k1lib.bioinfo.cli.filt.columns
-
class
k1lib.bioinfo.cli.filt.
rows
(*rows: List[int])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*rows: List[int])[source]¶ Cuts out specific rows. Space complexity O(1) as a list is not constructed (unless you’re using some really weird slices).
- Parameters
rows – ints for the row indices
Example:
"0123456789" | rows(2) | dereference() # returns ["2"]
"0123456789" | rows(5, 8) | dereference() # returns ["5", "8"]
"0123456789" | rows()[2:5] | dereference() # returns ["2", "3", "4"]
"0123456789" | ~rows()[2:5] | dereference() # returns ["0", "1", "5", "6", "7", "8", "9"]
"0123456789" | ~rows()[:7:2] | dereference() # returns ['1', '3', '5', '7', '8', '9']
"0123456789" | rows()[:-4] | dereference() # returns ['0', '1', '2', '3', '4', '5']
"0123456789" | ~rows()[:-4] | dereference() # returns ['6', '7', '8', '9']
-
-
class
k1lib.bioinfo.cli.filt.
intersection
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the intersection of multiple streams. Example:
# returns set([2, 4, 5])
[[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]] | intersection()
-
class
k1lib.bioinfo.cli.filt.
union
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the union of multiple streams. Example:
# returns {0, 1, 2, 10, 11, 12, 13, 14}
[range(3), range(10, 15)] | union()
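Both set operations boil down to Python's built-in `set` methods applied across all incoming streams. A small sketch, written as plain functions rather than clis, and assuming at least one stream is passed to `intersection`:

```python
def intersection(streams):
    """Elements present in every stream (needs at least one stream)."""
    sets = [set(s) for s in streams]
    return set.intersection(*sets)


def union(streams):
    """Elements present in any stream."""
    return set().union(*streams)


common = intersection([[1, 2, 3, 4, 5], [7, 2, 4, 6, 5]])
merged = union([range(3), range(10, 15)])
print(common)  # {2, 4, 5}
print(merged)  # {0, 1, 2, 10, 11, 12, 13, 14}
```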
gb module¶
All tools related to the GenBank file format. Expected to be used behind the “gb” module name, like this:
from k1lib.bioinfo.cli import *
cat("abc.gb") | gb.feats()
-
class
k1lib.bioinfo.cli.gb.
feats
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Fetches features, each on a separate stream
-
static
filt
(*terms: str) → k1lib.bioinfo.cli.init.BaseCli[source]¶ Filters for specific terms in all the features texts. If there are multiple terms, then filters for first term, then second, then third, so the term’s order might matter to you
-
static
tag
(tag: str) → k1lib.bioinfo.cli.init.BaseCli[source]¶ Gets a single tag out. Applies this on a single feature only
-
static
-
class
k1lib.bioinfo.cli.gb.
origin
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Return the origin section of the genbank file
grep module¶
-
class
k1lib.bioinfo.cli.grep.
grep
(pattern: str, before: int = 0, after: int = 0)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(pattern: str, before: int = 0, after: int = 0)[source]¶ Finds lines that have the specified pattern. Example:
# returns ['c', 'd', '2', 'd']
"abcde12d34" | grep("d", 1) | dereference()
# returns ['d', 'e', 'd', '3', '4']
"abcde12d34" | grep("d", 0, 3).till("e") | dereference()
- Parameters
pattern – regex pattern to search for in a line
before – lines before the hit. Outputs independent lines
after – lines after the hit. Outputs independent lines
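The before/after context can be sketched with a bounded `collections.deque` that remembers the last few non-matching lines. This is an illustration under assumptions: `grep_lines` is a made-up name, the handling of overlapping contexts is a guess, and `.till()` is not covered.

```python
import re
from collections import deque


def grep_lines(lines, pattern, before=0, after=0):
    """Yield matching lines plus `before`/`after` lines of context."""
    regex = re.compile(pattern)
    history = deque(maxlen=before)  # last `before` lines not yet emitted
    remaining_after = 0
    for line in lines:
        if regex.search(line):
            yield from history      # context that came before the hit
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line              # context that follows a hit
            remaining_after -= 1
        else:
            history.append(line)


with_before = list(grep_lines("abcde12d34", "d", before=1))
with_after = list(grep_lines("abcde12d34", "d", after=3))
print(with_before)  # ['c', 'd', '2', 'd']
print(with_after)   # ['d', 'e', '1', '2', 'd', '3', '4']
```

Each character of the string plays the role of a line here, same as in the example above.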
-
-
class
k1lib.bioinfo.cli.grep.
grepToTable
(pattern: str, before: int = 0, after: int = 0)[source]¶
init module¶
-
cli.
bioinfoSettings
= {'defaultDelim': '\t', 'defaultIndent': ' ', 'lookupImgs': True, 'oboFile': None, 'strict': False}¶ Main settings of k1lib.bioinfo.cli. When using:
from k1lib.bioinfo.cli import *
…you can just set the settings like this:
bioinfoSettings["defaultIndent"] = "\t"
There are a few settings:
defaultDelim: default delimiter used in-between columns when creating tables
defaultIndent: default indent used for displaying nested structures
lookupImgs: whether to automatically look up images when exploring something
oboFile: gene ontology obo file location
strict: whether strict mode is on. Turning it on can help you debug stuff, but could also be a pain to work with
-
class
k1lib.bioinfo.cli.init.
BaseCli
[source]¶ Bases:
object
-
__and__
(cli: k1lib.bioinfo.cli.init.BaseCli) → k1lib.bioinfo.cli.init.oneToMany[source]¶ Duplicates input stream to multiple joined clis.
-
__add__
(cli: k1lib.bioinfo.cli.init.BaseCli) → k1lib.bioinfo.cli.init.manyToManySpecific[source]¶ Parallel pass multiple streams to multiple clis.
-
all
() → k1lib.bioinfo.cli.init.BaseCli[source]¶ Applies this cli to all incoming streams
-
__or__
(it) → k1lib.bioinfo.cli.init.serial[source]¶ Joins clis end-to-end
-
-
class
k1lib.bioinfo.cli.init.
serial
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Merges clis into 1, feeding end to end. Used in chaining clis together without a prime iterator. Meaning, without this, stuff like this fails to run:
[1, 2] | a() | b() # runs
c = a() | b(); [1, 2] | c # doesn't run if this class doesn't exist
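Why a separate class is needed becomes clearer with a toy version. In `a() | b()` there is no data yet, so `__or__` has to return an object that remembers both stages and runs them later. The classes below (`Serial`, `AddOne`, `Double`) are hypothetical and only mimic the idea, not the package's code.

```python
class BaseCli:
    """Hypothetical base class: subclasses implement __ror__ to consume input."""

    def __ror__(self, it):
        raise NotImplementedError

    def __or__(self, other):
        # cli | cli has no data yet, so defer execution by building a pipeline
        if isinstance(other, BaseCli):
            return Serial(self, other)
        return NotImplemented


class Serial(BaseCli):
    def __init__(self, *clis):
        self.clis = clis

    def __ror__(self, it):
        for cli in self.clis:  # feed each stage's output into the next
            it = cli.__ror__(it)
        return it


class AddOne(BaseCli):
    def __ror__(self, it):
        return [x + 1 for x in it]


class Double(BaseCli):
    def __ror__(self, it):
        return [x * 2 for x in it]


direct = [1, 2] | AddOne() | Double()  # data flows stage by stage
pipeline = AddOne() | Double()         # no data yet: builds a Serial
deferred = [1, 2] | pipeline
print(direct, deferred)                # [4, 6] [4, 6]
```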
-
-
class
k1lib.bioinfo.cli.init.
oneToMany
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Duplicates 1 stream into multiple streams, each for a cli in the list. Used in the “a & b” joining operator
-
-
class
k1lib.bioinfo.cli.init.
manyToMany
(cli: k1lib.bioinfo.cli.init.BaseCli)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(cli: k1lib.bioinfo.cli.init.BaseCli)[source]¶ Applies multiple streams to a single cli. Used in the “a.all()” operator. Note that this operation will use a different copy of the cli for each of the streams.
-
-
class
k1lib.bioinfo.cli.init.
manyToManySpecific
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*clis: List[k1lib.bioinfo.cli.init.BaseCli])[source]¶ Applies multiple streams to multiple clis independently. Used in the “a + b” joining operator
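The three fan-out styles differ only in how streams are matched to clis. Here is a sketch using plain callables in place of clis (that substitution, and all function names, are assumptions made for brevity):

```python
from itertools import tee


def one_to_many(stream, *stages):
    """Like `a & b`: every stage receives its own copy of the same stream."""
    copies = tee(stream, len(stages))
    return [stage(copy) for stage, copy in zip(stages, copies)]


def many_to_many_specific(streams, *stages):
    """Like `a + b`: the i-th stream goes to the i-th stage."""
    return [stage(s) for stage, s in zip(stages, streams)]


def many_to_many(streams, stage):
    """Like `a.all()`: the same stage is applied to each stream."""
    return [stage(s) for s in streams]


fanned = one_to_many(range(3), sum, list)
paired = many_to_many_specific([[1, 2], [3, 4]], sum, max)
mapped = many_to_many([[1, 2], [3, 4]], sum)
print(fanned)  # [3, [0, 1, 2]]
print(paired)  # [3, 4]
print(mapped)  # [3, 7]
```

`itertools.tee` buffers items that one copy has read and another hasn't, so duplicating a very long stream can cost memory if the stages consume at different speeds.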
-
inp module¶
This module is for tools that will likely start the processing stream.
-
k1lib.bioinfo.cli.inp.
cat
(fileName: Optional[str] = None)[source]¶ Reads a file line by line. Example:
# display first 10 lines of file
cat("file.txt") | headOut()
# piping in also works
"file.txt" | cat() | headOut()
- Parameters
fileName – if None, then return a
BaseCli
that accepts a file name and outputs Iterator[str]
-
class
k1lib.bioinfo.cli.inp.
cats
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Like cat(), but opens multiple files at once, returning streams. Looks something like this:
apply(lambda s: cat(s))
Example:
# prints out first 10 lines of 2 files
["file1.txt", "file2.txt"] | cats() | headOut().all() | ignore()
-
k1lib.bioinfo.cli.inp.
curl
(url: str) → Iterator[str][source]¶ Gets file from url. File can’t be a binary blob. Example:
# prints out first 10 lines of the website
curl("https://k1lib.github.io/") | cli.headOut()
-
k1lib.bioinfo.cli.inp.
wget
(url: str, fileName: Optional[str] = None)[source]¶ Downloads a file
- Parameters
url – The url of the file
fileName – if None, then tries to infer it from the url
-
k1lib.bioinfo.cli.inp.
ls
(folder: Optional[str] = None, dirs=True, files=True)[source]¶ List every file and folder inside the specified folder. Example:
# returns List[str]
ls("/home")
# same as above
"/home" | ls()
# only outputs files, not folders
ls("/home", dirs=False)
# same as above
"/home" | ls(dirs=False)
-
class
k1lib.bioinfo.cli.inp.
cmd
(cmd: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(cmd: str)[source]¶ Runs a command, and returns the output line by line. Example:
# return detailed list of files
None | cmd("ls -la")
# return list of files that ends with "ipynb"
None | cmd("ls -la") | cmd('grep ipynb$')
-
property
err
¶ Error from the last command
-
kcsv module¶
All tools related to the csv file format. Expected to be used behind the “kcsv” module name, like this:
from k1lib.bioinfo.cli import *
kcsv.cat("file.csv") | display()
kxml module¶
All tools related to the xml file format. Expected to be used behind the “kxml” module name, like this:
from k1lib.bioinfo.cli import *
cat("abc.xml") | kxml.node() | kxml.display()
-
class
k1lib.bioinfo.cli.kxml.
node
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Turns lines into a single node
-
__ror__
(it: Iterator[str]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
maxDepth
(depth: Optional[int] = None, copy: bool = True)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(depth: Optional[int] = None, copy: bool = True)[source]¶ Filters out too deep nodes
- Parameters
depth – max depth to include in
copy – whether to limit the nodes itself, or limit a copy
-
__ror__
(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
tag
(tag: str)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(tag: str)[source]¶ Finds all tags that have a particular name. If found, then don’t search deeper
-
__ror__
(nodes: Iterator[xml.etree.ElementTree.Element]) → Iterator[xml.etree.ElementTree.Element][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
pretty
(indent: Optional[str] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__ror__
(it: Iterator[xml.etree.ElementTree.Element]) → Iterator[str][source]¶
-
-
class
k1lib.bioinfo.cli.kxml.
display
(depth: int = 3, lines: int = 20)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(depth: int = 3, lines: int = 20)[source]¶ Convenience method that takes the head, makes it pretty and prints it out
-
__ror__
(it: Iterator[xml.etree.ElementTree.Element], lines=10)[source]¶
-
modifier module¶
This is for quick modifiers, think of them as changing formats
-
class
k1lib.bioinfo.cli.modifier.
apply
(f: Callable[[str], str], column: Optional[int] = None)[source]¶
-
class
k1lib.bioinfo.cli.modifier.
applyMp
(f: Callable[[T], T], *args, **kwargs)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(f: Callable[[T], T], *args, **kwargs)[source]¶ Like apply, but executes f(row) of each row in multiple processes. Example:
# returns [3, 2]
["abc", "de"] | applyMp(lambda s: len(s)) | dereference()
# returns [5, 6, 9]
range(3) | applyMp(lambda x, bias: x**2+bias, bias=5) | dereference()
Internally, this will continuously spawn new jobs up until 80% of all CPU cores are utilized. As the new processes will not share the same memory space as the main process, you should pass all dependencies in the arguments
- Parameters
args – arguments to be passed to the function. kwargs too
-
-
class
k1lib.bioinfo.cli.modifier.
applySingle
(f: Callable[[T], T])[source]¶
-
k1lib.bioinfo.cli.modifier.
applyS
¶
-
k1lib.bioinfo.cli.modifier.
lstrip
(column: Optional[int] = None, char: Optional[str] = None)[source]¶ Strips left of every line
-
k1lib.bioinfo.cli.modifier.
rstrip
(column: Optional[int] = None, char: Optional[str] = None)[source]¶ Strips right of every line
-
k1lib.bioinfo.cli.modifier.
strip
(column: Optional[int] = None, char: Optional[str] = None)[source]¶ Strips both sides of every line
-
k1lib.bioinfo.cli.modifier.
upper
(column: Optional[int] = None)[source]¶ Make all characters uppercase
-
k1lib.bioinfo.cli.modifier.
lower
(column: Optional[int] = None)[source]¶ Make all characters lowercase
-
k1lib.bioinfo.cli.modifier.
replace
(s: str, target: Optional[str] = None, column: Optional[int] = None)[source]¶ Replaces substring s with target for each line.
-
k1lib.bioinfo.cli.modifier.
remove
(s: str, column: Optional[int] = None)[source]¶ Removes a specific substring in each line.
-
k1lib.bioinfo.cli.modifier.
toFloat
(*columns: List[int])[source]¶ Converts every row into a float. Excludes non numbers if not in strict mode. Example:
# returns [1, 3, -2.3]
["1", "3", "-2.3"] | toFloat() | dereference()
# returns [[1.0, 'a'], [2.3, 'b'], [8.0, 'c']]
[["1", "a"], ["2.3", "b"], [8, "c"]] | toFloat(0) | dereference()
- Parameters
columns – if nothing, then will convert each row. If available, then convert all the specified columns
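The “excludes non numbers if not in strict mode” behaviour can be pictured as a try/except around `float()`. This sketch covers only the no-column case; `to_float` and its `strict` argument are illustrative names, whereas the real function reads strictness from the package settings.

```python
def to_float(rows, strict=False):
    """Convert each row to float; skip (or raise on) rows that aren't numbers."""
    for row in rows:
        try:
            yield float(row)
        except (TypeError, ValueError):
            if strict:
                raise  # strict mode surfaces bad data instead of hiding it


cleaned = list(to_float(["1", "3", "-2.3", "n/a"]))
print(cleaned)  # [1.0, 3.0, -2.3]
```

With `strict=True`, the same input raises `ValueError` at `"n/a"`, which is handy while debugging a pipeline but annoying on messy data.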
-
k1lib.bioinfo.cli.modifier.
toInt
(*columns: List[int])[source]¶ Converts every row into an integer. Excludes non numbers if not in strict mode. Example:
# returns [1, 3, -2]
["1", "3", "-2.3"] | toInt() | dereference()
- Parameters
columns – if nothing, then will convert each row. If available, then convert all the specified columns
See also:
toFloat()
-
class
k1lib.bioinfo.cli.modifier.
sort
(column: int = 0, numeric=True, reverse=False)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(column: int = 0, numeric=True, reverse=False)[source]¶ Sorts all lines based on a specific column.
- Parameters
column – if None, sort rows based on themselves and not an element
numeric – whether to treat column as float
reverse – False for smaller to bigger, True for bigger to smaller. Use __invert__() to quickly reverse the order instead of using this param
-
-
class
k1lib.bioinfo.cli.modifier.
sortF
(f: Callable[[T], float], reverse=False)[source]¶
-
class
k1lib.bioinfo.cli.modifier.
consume
(f: Union[k1lib.bioinfo.cli.init.BaseCli, Callable[[T], None]])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(f: Union[k1lib.bioinfo.cli.init.BaseCli, Callable[[T], None]])[source]¶ Consumes the iterator in a side stream. Returns the iterator. Kinda like the bash command tee. Example:
# prints "0\n1\n2" and returns [0, 1, 2]
range(3) | consume(headOut()) | toList()
# prints "range(0, 3)" and returns [0, 1, 2]
range(3) | consume(lambda it: print(it)) | toList()
This is useful whenever you want to mutate something, but don’t want to include the function result into the main stream.
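A side stream like this can be approximated with `itertools.tee`: one copy goes to the side effect, the other continues down the main pipeline. The function below is a sketch with a made-up signature, and unlike the second example above it always hands the side effect an iterator copy rather than the original object.

```python
import itertools


def consume(it, side_effect):
    """Run `side_effect` on a copy of the stream, return the untouched copy."""
    main, side = itertools.tee(it)
    side_effect(side)  # may read the whole side copy
    return main


seen = []
result = list(consume(range(3), lambda s: seen.extend(s)))
print(seen, result)  # [0, 1, 2] [0, 1, 2]
```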
-
-
class
k1lib.bioinfo.cli.modifier.
randomize
(bs=100)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(bs=100)[source]¶ Randomizes the input stream. In order to be efficient, this does not convert the input iterator to a giant list and yield random values from that. Instead, this fetches bs items at a time, randomizes them, returns them and fetches another bs items. If you want to do the giant list, then just pass in float("inf"). Example:
# returns [0, 1, 2, 3, 4], effectively no randomize at all
range(5) | randomize(1) | dereference()
# returns something like this: [1, 0, 2, 3, 5, 4, 6, 8, 7, 9]. You can clearly see the batches
range(10) | randomize(3) | dereference()
# returns something like this: [7, 0, 5, 2, 4, 9, 6, 3, 1, 8]
range(10) | randomize(float("inf")) | dereference()
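The batch-wise shuffle described above could look roughly like this. It is a sketch, not the package's code: the function signature is made up, and `float("inf")` is special-cased because `itertools.islice` only accepts integers.

```python
import itertools
import math
import random


def randomize(it, bs=100):
    """Shuffle a stream `bs` items at a time instead of all at once."""
    it = iter(it)
    while True:
        # Pull one batch; an infinite batch size means "take everything"
        batch = list(it) if math.isinf(bs) else list(itertools.islice(it, bs))
        if not batch:
            return
        random.shuffle(batch)
        yield from batch


unchanged = list(randomize(range(5), 1))  # batches of 1 can't be reordered
batched_shuffle = list(randomize(range(10), 3))
full_shuffle = list(randomize(range(10), float("inf")))
print(unchanged)  # [0, 1, 2, 3, 4]
```

Items never leave their batch, so memory stays bounded by `bs`, at the price of a shuffle that is only local.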
-
output module¶
For operations that feel like the termination
-
class
k1lib.bioinfo.cli.output.
stdout
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Prints out all lines. If not iterable, then print out the input raw
-
class
k1lib.bioinfo.cli.output.
pretty
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Pretty prints a table
sam module¶
This is for functions that are .sam or .bam related
-
class
k1lib.bioinfo.cli.sam.
header
(long=True)[source]¶
structural module¶
This is for functions that sort of change the table structure in a dramatic way. They’re the core transformations
-
class
k1lib.bioinfo.cli.structural.
joinColumns
(fillValue=None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(fillValue=None)[source]¶ Join multiple columns and loop through all rows. Aka transpose.
- Parameters
fillValue – if not None, then will try to zip longest with this fill value
Example:
# returns [[1, 4], [2, 5], [3, 6]]
[[1, 2, 3], [4, 5, 6]] | joinColumns() | dereference()
# returns [[1, 4], [2, 5], [3, 6], [0, 7]]
[[1, 2, 3], [4, 5, 6, 7]] | joinColumns(0) | dereference()
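In plain Python, the transpose with and without a fill value maps onto `zip` and `itertools.zip_longest`. A sketch with an illustrative function name:

```python
from itertools import zip_longest


def join_columns(table, fill_value=None):
    """Transpose a table; pad short columns when a fill value is given."""
    if fill_value is None:
        return [list(col) for col in zip(*table)]  # stops at the shortest
    return [list(col) for col in zip_longest(*table, fillvalue=fill_value)]


plain = join_columns([[1, 2, 3], [4, 5, 6]])
padded = join_columns([[1, 2, 3], [4, 5, 6, 7]], 0)
print(plain)   # [[1, 4], [2, 5], [3, 6]]
print(padded)  # [[1, 4], [2, 5], [3, 6], [0, 7]]
```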
-
-
k1lib.bioinfo.cli.structural.
transpose
¶
-
k1lib.bioinfo.cli.structural.
splitColumns
¶
-
class
k1lib.bioinfo.cli.structural.
joinList
(element=None, begin=True)[source]¶
-
class
k1lib.bioinfo.cli.structural.
splitList
(weights: List[float] = [0.8, 0.2])[source]¶
-
class
k1lib.bioinfo.cli.structural.
joinStreams
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Join multiple streams. Example:
# returns [1, 2, 3, 4, 5]
[[1, 2, 3], [4, 5]] | joinStreams() | dereference()
-
class
k1lib.bioinfo.cli.structural.
joinStreamsRandom
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Join multiple streams randomly. If any stream runs out, then quits. Example:
# could return [0, 1, 10, 2, 11, 12, 13, ...], with max length 20, typical length 18
[range(0, 10), range(10, 20)] | joinStreamsRandom() | dereference()
-
class
k1lib.bioinfo.cli.structural.
batched
(bs=32, includeLast=False)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(bs=32, includeLast=False)[source]¶ Batches the input stream. Example:
# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
range(11) | batched(3) | dereference()
# returns [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
range(11) | batched(3, True) | dereference()
# returns [[0, 1, 2, 3, 4]]
range(5) | batched(float("inf"), True) | dereference()
# returns []
range(5) | batched(float("inf"), False) | dereference()
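The four cases above fall out of one loop that pulls `bs` items at a time and decides what to do with a short final batch. A sketch, with the function and argument names chosen for illustration:

```python
import itertools
import math


def batched(it, bs=32, include_last=False):
    """Group a stream into lists of `bs` items; optionally keep the short tail."""
    it = iter(it)
    while True:
        batch = list(it) if math.isinf(bs) else list(itertools.islice(it, bs))
        if len(batch) == bs:
            yield batch       # full batch, keep going
            continue
        if include_last and batch:
            yield batch       # short (or "infinite") final batch
        return


print(list(batched(range(11), 3)))                    # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(list(batched(range(11), 3, True))[-1])          # [9, 10]
print(list(batched(range(5), float("inf"), True)))    # [[0, 1, 2, 3, 4]]
print(list(batched(range(5), float("inf"), False)))   # []
```

An infinite batch size can never be “full”, which is why it only produces output when the last batch is included.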
-
-
k1lib.bioinfo.cli.structural.
collate
()[source]¶ Puts individual columns into a tensor. Example:
# returns [tensor([ 0, 10, 20]), tensor([ 1, 11, 21]), tensor([ 2, 12, 22])]
[range(0, 3), range(10, 13), range(20, 23)] | collate() | toList()
-
k1lib.bioinfo.cli.structural.
insertRow
(*row: List[T])[source]¶ Inserts a row right before every other row. See also: joinList().
-
k1lib.bioinfo.cli.structural.
insertColumn
(*column, begin=True, fillValue='')[source]¶ Inserts a column at beginning or end. Example:
# returns [['a', 1, 2], ['b', 3, 4]]
[[1, 2], [3, 4]] | insertColumn("a", "b") | dereference()
-
k1lib.bioinfo.cli.structural.
insertIdColumn
(table=False, begin=True, fillValue='')[source]¶ Inserts an id column at the beginning (or end). Example:
# returns [[0, 'a', 2], [1, 'b', 4]]
[["a", 2], ["b", 4]] | insertIdColumn(True) | dereference()
# returns [[0, 'a'], [1, 'b']]
"ab" | insertIdColumn()
- Parameters
table – if False, then insert column to an Iterator[str], else treat input as a full fledged table
-
class
k1lib.bioinfo.cli.structural.
toDict
(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(keyF: Optional[Callable[[Any], str]] = None, valueF: Optional[Callable[[Any], Any]] = None)[source]¶ Transform an incoming stream into a dict using a function for values. Example:
names = ["wanda", "vision", "loki", "mobius"]
names | toDict(valueF=lambda s: len(s)) # will return {"wanda": 5, "vision": 6, ...}
names | toDict(lambda s: s.title(), lambda s: len(s)) # will return {"Wanda": 5, "Vision": 6, ...}
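The key/value functions can be read as a dict comprehension where a missing function defaults to the identity. A sketch using snake_case names that are not the package's:

```python
def to_dict(items, key_f=None, value_f=None):
    """Build a dict from a stream; missing functions default to identity."""
    key_f = key_f or (lambda x: x)
    value_f = value_f or (lambda x: x)
    return {key_f(x): value_f(x) for x in items}


names = ["wanda", "vision", "loki", "mobius"]
by_length = to_dict(names, value_f=len)
titled = to_dict(names, lambda s: s.title(), len)
print(by_length)  # {'wanda': 5, 'vision': 6, 'loki': 4, 'mobius': 6}
print(titled)     # {'Wanda': 5, 'Vision': 6, 'Loki': 4, 'Mobius': 6}
```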
-
-
class
k1lib.bioinfo.cli.structural.
split
(delim: Optional[str] = None, idx: Optional[int] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(delim: Optional[str] = None, idx: Optional[int] = None)[source]¶ Splits each line using a delimiter, and outputs the parts as a separate line. Example:
# returns ["a", "b", "d", "e"]
["a,b", "d,e"] | split(",") | dereference()
# returns ['b', 'e']
["a,b", "d,e"] | split(",", 1) | dereference()
- Parameters
idx – if available, only outputs the element at that index
-
-
class
k1lib.bioinfo.cli.structural.
expandE
(f: Callable[[T], List[T]], column: int)[source]¶
-
class
k1lib.bioinfo.cli.structural.
table
(delim: Optional[str] = None)[source]¶
-
class
k1lib.bioinfo.cli.structural.
stitch
(delim: Optional[str] = None)[source]¶
-
k1lib.bioinfo.cli.structural.
tableFromList
()¶ Turns Iterator[T] into Table[T]
-
class
k1lib.bioinfo.cli.structural.
count
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Finds unique elements and returns a table with [frequency, value, percent] columns. Example:
# returns [[1, 'a', '33%'], [2, 'b', '67%']]
['a', 'b', 'b'] | count() | dereference()
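A frequency table like this maps neatly onto `collections.Counter`. In the sketch below, the row order (first-seen) and the percent rounding are assumptions, since the source only shows one small example.

```python
from collections import Counter


def count(items):
    """Return [frequency, value, percent] rows for each unique value."""
    counts = Counter(items)  # keeps values in first-seen order
    total = sum(counts.values())
    return [[c, value, f"{round(100 * c / total)}%"]
            for value, c in counts.items()]


table = count(["a", "b", "b"])
print(table)  # [[1, 'a', '33%'], [2, 'b', '67%']]
```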
-
class
k1lib.bioinfo.cli.structural.
permute
(*permutations: List[int])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*permutations: List[int])[source]¶ Permutes the columns. Acts kinda like torch.Tensor.permute(). Example:
# returns [['b', 'a'], ['d', 'c']]
["ab", "cd"] | permute(1, 0) | dereference()
-
-
class
k1lib.bioinfo.cli.structural.
accumulate
(columnIdx: int = 0, avg=False)[source]¶
-
class
k1lib.bioinfo.cli.structural.
AA_
(*idxs: List[int], wraps=False)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(*idxs: List[int], wraps=False)[source]¶ Returns 2 streams, one that has the selected element, and the other the rest. Example:
[1, 5, 6, 3, 7] | AA_(1) # will return [5, [1, 6, 3, 7]]
You can also put multiple indexes through:
[1, 5, 6] | AA_(0, 2) # will return [[1, [5, 6]], [6, [1, 5]]]
If you put nothing in, then all indexes will be sliced:
[1, 5, 6] | AA_() # will return:
# [[1, [5, 6]],
# [5, [1, 6]],
# [6, [1, 5]]]
As for why the strange name, think of this operation as “AĀ”. In statistics, say you have a set “A”, then “not A” is commonly written as A with an overline “Ā”. So “AA_” represents “AĀ”, and that it first returns the selection A.
- Parameters
wraps – if True, then the first example will return [[5, [1, 6, 3, 7]]] instead, so that A has the same signature as Ā
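The select-one-and-return-the-rest pattern can be sketched with list slicing. `aa_` below is an illustrative function, not the class itself, and treating “no indexes given” as “use every index” is an assumption based on the three-row example above.

```python
def aa_(seq, *idxs, wraps=False):
    """Split `seq` into [selected element, everything else] per index."""
    seq = list(seq)
    if not idxs:
        idxs = range(len(seq))  # assumed meaning of passing no index
    out = [[seq[i], seq[:i] + seq[i + 1:]] for i in idxs]
    if len(out) == 1 and not wraps:
        return out[0]           # single selection comes back unwrapped
    return out


print(aa_([1, 5, 6, 3, 7], 1))              # [5, [1, 6, 3, 7]]
print(aa_([1, 5, 6], 0, 2))                 # [[1, [5, 6]], [6, [1, 5]]]
print(aa_([1, 5, 6]))                       # [[1, [5, 6]], [5, [1, 6]], [6, [1, 5]]]
print(aa_([1, 5, 6, 3, 7], 1, wraps=True))  # [[5, [1, 6, 3, 7]]]
```

This is handy for leave-one-out style loops, where each element takes a turn as the held-out item.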
-
-
class
k1lib.bioinfo.cli.structural.
peek
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns (firstRow, iterator). This sort of peeks at the first row, to potentially gain some insights about the internal formats. Example:
e, it = iter([[1, 2, 3], [1, 2]]) | peek()
print(e) # prints "[1, 2, 3]"
s = 0
for e in it: s += len(e)
print(s) # prints "5", or length of 2 lists
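Since the loop in the example still sees the first row (the lengths add up to 5), the returned iterator must put that row back. One way to sketch this is with `itertools.chain`; the function form and the choice to let empty input raise are assumptions.

```python
from itertools import chain


def peek(it):
    """Return (first_row, iterator); the iterator still starts at the first row."""
    it = iter(it)
    first = next(it)  # raises StopIteration on empty input
    return first, chain([first], it)


first, rest = peek(iter([[1, 2, 3], [1, 2]]))
total = sum(len(row) for row in rest)
print(first, total)  # [1, 2, 3] 5
```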
-
class
k1lib.bioinfo.cli.structural.
peekF
(f: Union[k1lib.bioinfo.cli.init.BaseCli, Callable[[T], T]])[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(f: Union[k1lib.bioinfo.cli.init.BaseCli, Callable[[T], T]])[source]¶ Similar to peek, but will execute f(row) and return the input Iterator. Example:
it = lambda: iter([[1, 2, 3], [1, 2]])
# prints "[1, 2, 3]" and returns [[1, 2, 3], [1, 2]]
it() | peekF(lambda x: print(x)) | dereference()
# prints "1\n2\n3"
it() | peekF(headOut()) | dereference()
-
-
class
k1lib.bioinfo.cli.structural.
repeat
(limit: Optional[int] = None)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Yields a specified amount of the passed in object. Example:
# returns [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[1, 2, 3] | repeat(3) | toList()
- Parameters
limit – if None, then repeats indefinitely
-
class
k1lib.bioinfo.cli.structural.
infiniteFrom
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Yields from a list. If it runs out of elements, then starts over from the beginning. Example:
# returns [1, 2, 3, 1, 2]
[1, 2, 3] | infiniteFrom() | head(5) | dereference()
utils module¶
This is for all short utilities that have the boilerplate feeling
-
class
k1lib.bioinfo.cli.utils.
size
(idx=None)[source]¶
-
k1lib.bioinfo.cli.utils.
shape
¶ alias of
k1lib.bioinfo.cli.utils.size
-
class
k1lib.bioinfo.cli.utils.
item
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the first row
-
class
k1lib.bioinfo.cli.utils.
identity
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Yields whatever the input is. Useful for multiple streams
-
class
k1lib.bioinfo.cli.utils.
toStr
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts every line (possibly just a number) to a string.
-
class
k1lib.bioinfo.cli.utils.
toNumpy
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to numpy array
-
class
k1lib.bioinfo.cli.utils.
toTensor
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to torch.Tensor
-
class
k1lib.bioinfo.cli.utils.
toList
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to list. list would do the same, but this is just to maintain the style
-
class
k1lib.bioinfo.cli.utils.
wrapList
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Wraps inputs inside a list
-
class
k1lib.bioinfo.cli.utils.
toSet
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts generator to set. set would do the same, but this is just to maintain the style
-
class
k1lib.bioinfo.cli.utils.
toIter
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Converts object to iterator. iter() would do the same, but this is just to maintain the style
-
class
k1lib.bioinfo.cli.utils.
toRange
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns iter(range(len(it))), effectively
-
class
k1lib.bioinfo.cli.utils.
equals
[source]¶ Bases:
object
Checks if all incoming columns/streams are identical
-
class
k1lib.bioinfo.cli.utils.
reverse
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Prints last line first, first line last
-
class
k1lib.bioinfo.cli.utils.
ignore
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Just executes everything, ignoring the output
-
class
k1lib.bioinfo.cli.utils.
toSum
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Calculates the sum of list of numbers
-
class
k1lib.bioinfo.cli.utils.
toAvg
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Calculates average of list of numbers
-
class
k1lib.bioinfo.cli.utils.
toMax
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Calculates the max of a bunch of numbers
-
class
k1lib.bioinfo.cli.utils.
toMin
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Calculates the min of a bunch of numbers
-
class
k1lib.bioinfo.cli.utils.
lengths
[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
Returns the lengths of each row.
-
k1lib.bioinfo.cli.utils.
headerIdx
()[source]¶ Cuts out the first line, puts an index column next to it, and prints it out. Useful when you want to know what your column’s index is to cut it out. Also sets the context variable “header”, in case you need it later. Example:
# returns [[0, 'a'], [1, 'b'], [2, 'c']]
["abc"] | headerIdx() | dereference()
# returns "abc"
ctx["header"]()
-
class
k1lib.bioinfo.cli.utils.
dereference
(ignoreTensors=False, maxDepth=inf)[source]¶ Bases:
k1lib.bioinfo.cli.init.BaseCli
-
__init__
(ignoreTensors=False, maxDepth=inf)[source]¶ Recursively converts any iterator into a list. Only str and numbers.Number are not converted. Example:
# returns something like "<range_iterator at 0x7fa8c52ca870>"
iter(range(5))
# returns [0, 1, 2, 3, 4]
iter(range(5)) | dereference()
You can also specify a maxDepth:
# returns something like "<list_iterator at 0x7f810cf0fdc0>"
iter([range(3)]) | dereference(maxDepth=0)
# returns [range(3)]
iter([range(3)]) | dereference(maxDepth=1)
# returns [[0, 1, 2]]
iter([range(3)]) | dereference(maxDepth=2)
- Parameters
ignoreTensors – if True, then don’t loop over torch.Tensor internals
Warning
Can work well with PyTorch Tensors, but not Numpy’s array as they screw things up with the __ror__ operator, so do torch.from_numpy(…) first.
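The recursive conversion with a depth limit can be sketched without any tensor handling. The function below is an illustration only: it ignores Promise objects and `ignoreTensors`, and its depth bookkeeping is inferred from the three maxDepth examples above.

```python
import math
import numbers


def dereference(obj, max_depth=math.inf, _depth=0):
    """Recursively turn iterables into lists, down to `max_depth` levels."""
    if isinstance(obj, (str, numbers.Number)) or _depth >= max_depth:
        return obj  # leaves and too-deep objects pass through untouched
    try:
        it = iter(obj)
    except TypeError:
        return obj  # not iterable, nothing to expand
    return [dereference(e, max_depth, _depth + 1) for e in it]


print(dereference(iter(range(5))))                  # [0, 1, 2, 3, 4]
print(dereference(iter([range(3)]), max_depth=1))   # [range(0, 3)]
print(dereference(iter([range(3)]), max_depth=2))   # [[0, 1, 2]]
```

With `max_depth=0` the input object comes back as-is, which matches the first maxDepth example returning the raw list iterator.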
-
__invert__
() → k1lib.bioinfo.cli.init.BaseCli[source]¶ Returns a BaseCli that makes everything an iterator.
-