.. module:: k1lib.cli

k1lib.cli module
================

Setup
-----

To install the library, run this in a terminal:

.. code-block:: console

   pip install k1lib[all]

If you don't want to install the extra dependencies (not recommended), you can do this instead:

.. code-block:: console

   pip install k1lib

To use it in a Python file or a notebook, do this::

    from k1lib.imports import *

Because the library has a lot of functions with common names, your own functions or classes with the same names will override the ones from the library. If you still want the library's version, you can use ``cli.sort()`` instead of ``sort()``, for example.

Intro
-----

The main idea of this package is to emulate the terminal (hence "cli", or "command line interface"), but doing all of that inside Python itself. So this bash statement:

.. code-block:: console

   cat file.txt | head -5 > headerFile.txt

Turns into this statement::

    cat("file.txt") | head(5) > file("headerFile.txt")

Let's step back a little bit. In the bash statement, "cat" and "head" are actual programs accessible through the terminal, and "|" pipes the output of one program into another. ``cat file.txt`` reads a file and returns all of the lines in it, which are then piped into ``head -5``, which returns only the first 5 lines. Finally, ``> headerFile.txt`` redirects the output to the "headerFile.txt" file. See this video for more: https://www.youtube.com/watch?v=bKzonnwoR2I

On the Python side, "cat", "head" and "file" are Python classes extended from :class:`~init.BaseCli`. ``cat("file.txt")`` reads the file line by line and returns a list of all the lines. ``head(5)`` takes in that list and returns a list with only the first 5 lines. Finally, ``> file("headerFile.txt")`` takes that in and writes it to a file.
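For readers who think better in plain Python, the pipeline above does roughly what the following standard-library sketch does. It does not use k1lib at all: the helper names ``read_lines``, ``first_n`` and ``write_lines`` are made up for this illustration and are not part of the library, and the real ``cat``, ``head`` and ``file`` clis may differ in details such as newline handling.

```python
import itertools
import os
import tempfile

def read_lines(path):
    """Hypothetical stand-in for cat(path): yields lines without trailing newlines."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def first_n(lines, n):
    """Hypothetical stand-in for head(n): keeps only the first n elements."""
    return itertools.islice(lines, n)

def write_lines(lines, path):
    """Hypothetical stand-in for `> file(path)`: writes one element per line."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")

# build a small 8-line input file in a temporary directory
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "file.txt")
dst = os.path.join(tmp_dir, "headerFile.txt")
with open(src, "w") as f:
    f.write("\n".join(f"line {i}" for i in range(8)) + "\n")

# same shape as: cat("file.txt") | head(5) > file("headerFile.txt")
write_lines(first_n(read_lines(src), 5), dst)

# read the result back to check what was written
with open(dst) as f:
    result = f.read().splitlines()
print(result)
```

The nesting reads inside-out here, which is exactly what the pipe syntax avoids: with ``|``, the steps appear in the order they run.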
You can even integrate with existing shell commands::

    ls("~") | cmd("grep *.so")

Here, "ls" lists the files inside the home directory and pipes them into the regular grep on Linux, whose output is then piped back into Python as a list of strings. So it's equivalent to this bash statement:

.. code-block:: console

   ls ~ | grep *.so

Let's see a really basic example::

    # just a normal function
    f = lambda x: x**2
    # returns 9, no surprises here
    f(3)

    # f is now a cli tool
    f = aS(lambda x: x**2)
    # returns 9, demonstrating that they act like normal functions
    f(3)
    # returns 9, demonstrating that you can also pipe into them
    3 | f

You can think of the flow of these clis in terms of 2 phases: the first is configuring what you want the cli to do, and the second is actually executing it. Let's say you want to take a list of numbers and square each of them::

    # configuration stage. You provide a function to `apply` to tell it what to apply to each element in the list
    f = apply(lambda x: x**2)
    # initialize the input
    x = range(5)

    # execution stage, normal style, returns [0, 1, 4, 9, 16]
    list(f(x))
    # execution stage, pipe style, returns [0, 1, 4, 9, 16]
    list(x | f)

    # typical usage: combining configuration stage and execution stage, returns [0, 1, 4, 9, 16]
    list(range(5) | apply(lambda x: x**2))
    # refactor converting to list so that it uses pipes, returns [0, 1, 4, 9, 16]
    range(5) | apply(lambda x: x**2) | aS(list)

You may wonder why we have to turn it into a list. That's because all cli tools execute things lazily, so they return iterators instead of lists. Here's how iterators work::

    def gen(): # this is a generator. It generates elements
        yield 3
        print("after yielding 3")
        yield 2
        yield 5
    for e in gen():
        print(e)

It will print this out:

.. code-block:: text

   3
   after yielding 3
   2
   5

So, iterators feel like lists. In fact, lists, ``range(5)``, numpy arrays and strings can all be iterated over in the same way (strictly speaking, Python calls these "iterables"). Basically, anything that you can loop through with a for loop works here.
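As a quick sanity check of that claim, here's a plain-Python sketch (no k1lib involved) showing that lists, ranges, strings and generators can all be consumed the same way. The function name ``squares`` is made up for this example. It also shows one way generators differ from lists: they are one-shot, so once consumed they are empty.

```python
def squares(n):
    """Made-up generator for this example: lazily yields 0, 1, 4, ..."""
    for i in range(n):
        yield i * i

# all of these can be looped over, or turned into a list, the same way
from_list = list([0, 1, 4])
from_range = list(range(3))
from_str = list("abc")
from_gen = list(squares(3))

# elements can also be requested one at a time with iter() and next()
it = iter([10, 20])
a = next(it)
b = next(it)

# unlike a list, a generator can only be consumed once
g = squares(3)
first_pass = list(g)
second_pass = list(g)  # empty, because g is already exhausted

print(from_list, from_range, from_str, from_gen)
print(a, b, first_pass, second_pass)
```

The one-shot behavior matters in practice: if a lazy result needs to be read more than once, convert it to a list first.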
The ``gen()`` function above is a little special, as it's specifically called a "generator". Generators are a really cool aspect of Python, in that they execute code lazily, meaning ``gen()`` won't run all the way when you call it. In fact, it doesn't run at all. The function only runs once you request new elements by iterating over it. All cli tools utilize this fact, in that they will not actually execute anything unless you force them to::

    # returns a generator object at 0x7f7ae48e4d60, not a list
    range(5) | apply(lambda x: x**2)
    # you can iterate through it directly:
    for element in range(5) | apply(lambda x: x**2):
        print(element)
    # returns [0, 1, 4, 9, 16], in case you want it in a list
    list(range(5) | apply(lambda x: x**2))
    # returns [0, 1, 4, 9, 16], demonstrating deref
    range(5) | apply(lambda x: x**2) | deref()

The first statement returns a generator instead of a normal list, as nothing has actually been executed. You can still iterate through generators using for loops as usual, or you can convert them into lists. When you get more advanced and have iterators nested within iterators within iterators, you can use :class:`~utils.deref` to turn all of them into lists.

Also, a lot of these tools (like :class:`~modifier.apply` and :class:`~filt.filt`) sometimes assume that we are operating on a table. So this table:

+------+------+------+
| col1 | col2 | col3 |
+======+======+======+
| 1    | 2    | 3    |
+------+------+------+
| 4    | 5    | 6    |
+------+------+------+

Is equivalent to this list::

    [["col1", "col2", "col3"], [1, 2, 3], [4, 5, 6]]

:class:`~structural.transpose` and :class:`~init.mtmS` provide more flexible ways to transform a table structure (but usually involve more code).

Besides operating on string iterators alone, this package can also be extra meta, and operate on streams of strings, or streams of streams of anything. I think this is one of the most powerful concepts of the cli workflow. Check it out here:

.. toctree::
   :maxdepth: 1

   streams

All cli tools should work fine with :class:`torch.Tensor`, :class:`numpy.ndarray` and :class:`pandas.core.series.Series`, but k1lib actually modifies Numpy arrays and Pandas series deep down for this to work. You can still do a normal bitwise or with a numpy float value, and everything works fine in all regression tests that I have, but you might encounter strange bugs. You can disable the patch manually by changing :attr:`~k1lib.settings`.startup.or_patch. If you choose to do this, you have to be careful and use these workarounds::

    # returns (2, 3, 5), works fine
    torch.randn(2, 3, 5) | shape()
    # will not work, returns weird numpy array of shape (2, 3, 5)
    np.random.randn(2, 3, 5) | shape()
    # returns (2, 3, 5), mitigation strategy #1
    shape()(np.random.randn(2, 3, 5))
    # returns (2, 3, 5), mitigation strategy #2
    [np.random.randn(2, 3, 5)] | (item() | shape())

All cli-related settings are at :attr:`~k1lib.settings`.cli.

Where to start?
-------------------------

Core clis include:

- :class:`~modifier.apply`, :class:`~modifier.aS`, :class:`~modifier.op`, :class:`~grep.grep`
- :class:`~filt.filt`, :class:`~filt.head`, :class:`~filt.rows`, :class:`~filt.cut`
- :class:`~utils.deref`, :class:`~utils.item`, :class:`~utils.shape`
- :class:`~structural.transpose`, :class:`~structural.joinStreams`, :class:`~structural.batched`, :class:`~structural.count`
- :meth:`~inp.cat`, :meth:`~inp.ls`, :class:`~output.file`, :class:`~output.stdout`

These clis are pretty important and are used all the time, so look over them to see what the library can do. Whenever you find a cli you have not encountered before, you can just search for it in the search bar at the top left of the page.
Other important, but not necessarily core, clis include:

- :class:`~modifier.applyMp`, :class:`~modifier.sort`, :class:`~modifier.randomize`
- :class:`~utils.wrapList`, :class:`~utils.ignore`, :class:`~inp.cmd`
- :class:`~structural.repeat` and friends, :class:`~structural.groupBy`

So, start by reading over what these do, as you can get roughly 95% of what the cli workflow has to offer with those alone. Then skim over the basic conversions in module :mod:`~k1lib.cli.conv`. While you're doing that, check out :meth:`~trace.trace`, a quite powerful debugging tool.

There are several `written tutorials <../tutorials.html>`_ about cli here, and I also made some video tutorials, so go check those out. For every example you find in the tutorials, you might find it useful to follow these debugging steps to see how everything works::

    # assume there's this piece of code:
    A | B | C | D
    # do this instead:
    A | deref()
    # once you understand it, do this:
    A | B | deref()

    # assume there's this piece of code:
    A | B.all() | C
    # do this instead:
    A | item() | B | deref()
    # once you understand it, you can move on:
    A | B.all() | deref()

    # assume there's this piece of code:
    A | (B & C)
    # do this instead:
    A | B | deref()

    # assume there's this piece of code:
    A | (B + C)
    # do these instead:
    A | deref() | op()[0] | B | deref()
    A | deref() | op()[1] | C | deref()
    # there are alternatives to that:
    A | item() | B | deref()
    A | rows(1) | item() | C | deref()

Finally, you can read over the summary below, see what catches your eye and check that cli out.

Summary
-------------------------

.. include:: ../literals/cli-tables.rst

Under the hood
-------------------------

How it works underneath is pretty simple. All cli tools implement the "reverse or" operation, or ``__ror__``. So essentially, these 2 statements are equivalent::

    3 | obj
    obj.__ror__(3)

There are several other operations that certain clis can override, like ">" or ">>".
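To make that concrete, here's a minimal sketch of how a pipeable object can be built on ``__ror__`` in plain Python. This is not k1lib's actual implementation: the class name ``Pipeable`` is hypothetical and only illustrates the mechanism, including how two tools could be chained with ``|`` before any input arrives.

```python
class Pipeable:
    """Hypothetical wrapper that makes a function usable as `x | tool`."""
    def __init__(self, f):
        self.f = f

    def __call__(self, x):
        # normal function-call style
        return self.f(x)

    def __ror__(self, x):
        # called for `x | self` when x's own __or__ does not handle it
        return self.f(x)

    def __or__(self, other):
        # `tool1 | tool2` builds a new tool that runs both in order
        if isinstance(other, Pipeable):
            return Pipeable(lambda x: other.f(self.f(x)))
        return NotImplemented

square = Pipeable(lambda x: x**2)
addOne = Pipeable(lambda x: x + 1)

r1 = 3 | square             # int.__or__ gives up, so square.__ror__(3) runs
r2 = square.__ror__(3)      # the explicit form of the line above
r3 = 3 | square | addOne    # evaluated left to right: (3 | square) | addOne
r4 = 3 | (square | addOne)  # composed first, then executed
print(r1, r2, r3, r4)
```

Python always tries the left operand's ``__or__`` first and only falls back to the right operand's ``__ror__`` when that returns ``NotImplemented``. This hints at why types that define their own ``__or__``, such as numpy arrays, need the special handling mentioned earlier.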
Also, if you're an advanced user, there's an optimizer that looks like LLVM, so you can implement optimization passes to speed everything up by a lot:

.. toctree::
   :maxdepth: 1

   llvm

Biology-related clis
***********************

I separated these out because they might not be interesting to the majority of users.

.. include:: ../literals/cli-bio-tables.rst

bio module
-------------------------

.. automodule:: k1lib.cli.bio
   :members:
   :undoc-members:
   :show-inheritance:

cif module
-------------------------

.. automodule:: k1lib.cli.cif
   :members:
   :undoc-members:
   :show-inheritance:

conv module
-------------------------

.. automodule:: k1lib.cli.conv
   :members:
   :undoc-members:
   :show-inheritance:

mgi module
-------------------------

.. automodule:: k1lib.cli.mgi
   :members:
   :undoc-members:
   :show-inheritance:

filt module
-------------------------

.. automodule:: k1lib.cli.filt
   :members:
   :undoc-members:
   :show-inheritance:

gb module
-------------------------

.. automodule:: k1lib.cli.gb
   :members:
   :undoc-members:
   :show-inheritance:

grep module
-------------------------

.. automodule:: k1lib.cli.grep
   :members:
   :undoc-members:
   :show-inheritance:

init module
-------------------------

.. autoclass:: k1lib.cli.init.BaseCli
   :members:
   :undoc-members:
   :special-members: __and__, __add__, __or__, __ror__, __lt__, __call__
   :show-inheritance:

.. automodule:: k1lib.cli.init
   :members: serial, oneToMany, mtmS, fastF, patchNumpy, patchDict, patchPandas
   :undoc-members:
   :show-inheritance:

.. attribute:: yieldT

   Object often used as a sentinel, or an identifying token, in lots of clis, including :class:`~k1lib.cli.structural.joinStreamsRandom` (where it can be yielded in a stream to ignore that stream for the moment), :class:`~k1lib.cli.utils.deref`, :class:`~k1lib.cli.typehint.tCheck` and :class:`~k1lib.cli.typehint.tOpt`

inp module
-------------------------

.. automodule:: k1lib.cli.inp
   :members:
   :undoc-members:
   :show-inheritance:

.. automethod:: k1lib.cli.inp.cat.pickle

kcsv module
-------------------------

.. automodule:: k1lib.cli.kcsv
   :members:
   :undoc-members:
   :show-inheritance:

kxml module
-------------------------

.. automodule:: k1lib.cli.kxml
   :members:
   :undoc-members:
   :show-inheritance:

modifier module
-------------------------

.. automodule:: k1lib.cli.modifier
   :members:
   :undoc-members:
   :show-inheritance:

nb module
-------------------------

.. automodule:: k1lib.cli.nb
   :members:
   :undoc-members:
   :show-inheritance:

output module
-------------------------

.. automodule:: k1lib.cli.output
   :members:
   :undoc-members:
   :show-inheritance:

sam module
-------------------------

.. automodule:: k1lib.cli.sam
   :members:
   :undoc-members:
   :show-inheritance:

structural module
-------------------------

.. currentmodule:: k1lib.cli.structural

.. automodule:: k1lib.cli.structural
   :members:
   :exclude-members: joinStreamsRandom
   :undoc-members:
   :show-inheritance:

.. autoclass:: joinStreamsRandom
   :members:

trace module
-------------------------

.. automodule:: k1lib.cli.trace
   :members:
   :undoc-members:
   :show-inheritance:

utils module
-------------------------

.. automodule:: k1lib.cli.utils
   :members:
   :undoc-members:
   :show-inheritance:

typehint module
-------------------------

.. automodule:: k1lib.cli.typehint
   :members:
   :undoc-members:
   :show-inheritance:

optimizations module
-------------------------

.. automodule:: k1lib.cli.optimizations
   :members:
   :undoc-members:
   :show-inheritance:

Elsewhere in the library
-------------------------

There might still be more cli tools scattered around the library. These are pretty rare, quite dynamic, and most likely cool extra features rather than core functionality, so they are not worth listing (or can't be listed) here. Anyway, execute this::

    cli.scatteredClis()

to get a list of them.