Collections – Reference Guide

This guide is for versions 1.0 Release Candidate 0+

Class CollectionTabular

    A "tabular collection" is a Pandas dataframe
    built up from a sequence of "snapshots" of data, each in the form of a Python dictionary
    (mapping names to their corresponding values),
    such as the state of the system or of parts thereof.

    Each data "snapshot" is taken at a different time,
    or results from varying some parameter.

    Each snapshot, including its parameter value and optional caption,
    will constitute a "row" in a tabular format.

    MAIN DATA STRUCTURE for "tabular" collections:
        A Pandas dataframe
    
name | arguments | returns
__init__ | parameter_name="SYSTEM TIME" |
        :param parameter_name:  A label explaining what the snapshot parameter is.
                                Typically it's "SYSTEM TIME" (default), but could be anything
                                Used as the Pandas column name for the
                                parameter value attached to the various snapshot captures
        
name | arguments | returns
__len__ | self | int
        Return the number of snapshots comprising the collection
        :return:    An integer
        
name | arguments | returns
__str__ | self |

name | arguments | returns
store | par, data_snapshot: dict, caption="" | None
        Save up the given data snapshot, alongside the specified parameter value and optional caption

        EXAMPLE :
                store(par=8., data_snapshot={"A": 1., "B": 2.}, caption="State immediately before injection of 2nd reactant")

        :param par:             Typically, the System Time - but could be any value that parametrizes the snapshots
        :param data_snapshot:   A dict of data to preserve for later use;
                                    it's acceptable to contain new fields not used in previous calls
                                    (in that case, the dataframe will add new columns automatically - and NaN values
                                     will appear in earlier rows)
        :param caption:         [OPTIONAL] String to describe the snapshot.  Use None to avoid including that column
        :return:                None (the object variable "self.movie" will get updated)
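
The automatic addition of columns described above can be illustrated with plain Pandas. This is only a minimal sketch, not the class's actual implementation: the helper `store_sketch`, the `rows` list and the column name "caption" are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the internal storage; NOT the library's code
rows = []

def store_sketch(par, data_snapshot, caption=""):
    """Record one snapshot as a row: the parameter, the data fields, then the caption."""
    row = {"SYSTEM TIME": par}      # assumed default parameter name
    row.update(data_snapshot)
    if caption is not None:         # None means: leave the caption column out
        row["caption"] = caption    # the column name "caption" is an assumption
    rows.append(row)

store_sketch(par=0., data_snapshot={"A": 1., "B": 2.})
store_sketch(par=8., data_snapshot={"A": 3., "B": 4., "C": 5.},
             caption="State immediately before injection of 2nd reactant")

# Building a dataframe from dicts with differing keys fills the gaps with NaN
df = pd.DataFrame(rows)
print(df)
```

The field "C" only appears in the second snapshot, so the first row holds NaN in that column.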
        
name | arguments | returns
get_dataframe | head=None, tail=None, val_start=None, val_end=None, search_col=None, search_val=None, return_copy=True | pd.DataFrame
        Return the main data structure (a Pandas dataframe),
        or a part thereof.

        Optionally, limit the dataframe to a specified number of rows at the start or at the end,
        filter it by a range of values in a column,
        or just return the row(s) corresponding to specific search value(s) in the specified column,
        i.e. the row(s) with the CLOSEST value to the requested one(s);
        in the case of a search, a column named "search_value" is inserted to the left.

        IMPORTANT:  if multiple options to restrict the dataset are present, only one is carried out;
                    the priority is:  1) head,  2) tail,  3) filtering,  4) search

        :param head:        [OPTIONAL] Integer.  If provided, only show the first several rows;
                                as many as specified by that number.
        :param tail:        [OPTIONAL] Integer.  If provided, only show the last several rows;
                                as many as specified by that number.
                                If the "head" argument is passed, this argument will get ignored

        :param val_start:   [OPTIONAL] Perform a FILTERING using the start value in the specified column
                                - ASSUMING the dataframe is ordered by that value (e.g. a system time)
        :param val_end:     [OPTIONAL] FILTER by end value.
                                Either one or both of start/end values may be provided

        :param search_col:  [OPTIONAL] String with the name of a column in the dataframe,
                                against which to match the value below
        :param search_val:  [OPTIONAL] Number, or list/tuple of numbers, with value(s)
                                to search in the above column

        :param return_copy: [OPTIONAL] If True (default), the returned dataframe is guaranteed to be a (deep) copy -
                                so that modifying it won't affect the internal dataframe

        :return:            A Pandas dataframe, with all or some of the rows
                                that were stored in the main data structure.
                                If a search was requested, insert a column named "search_value" to the left
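
The "closest value" search described above could be approximated in plain Pandas as follows. This is a sketch under stated assumptions, not the method's actual code: the function `closest_rows` and the sample dataframe are hypothetical, and ties are resolved however `idxmin` resolves them.

```python
import pandas as pd

df = pd.DataFrame({"SYSTEM TIME": [0., 2., 4., 6.],
                   "A": [10., 8., 6., 4.]})

def closest_rows(df, search_col, search_val):
    """Return a copy of the row(s) whose value in search_col is closest to each requested value."""
    if isinstance(search_val, (list, tuple)):
        values = list(search_val)
    else:
        values = [search_val]
    # For each requested value, locate the index label of the nearest entry
    labels = [(df[search_col] - v).abs().idxmin() for v in values]
    result = df.loc[labels].copy()      # copy, so the original stays untouched
    # Record the requested value(s) in a new leftmost column
    result.insert(0, "search_value", values)
    return result

hits = closest_rows(df, search_col="SYSTEM TIME", search_val=[1.8, 5.1])
print(hits)
```

Because the result is a copy, inserting the "search_value" column does not alter the dataframe being searched.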
        
name | arguments | returns
clear_dataframe | self | None
        Do a clean start: reset the dataframe, discarding all the snapshots stored so far

        :return:    None
        
name | arguments | returns
set_caption_last_snapshot | caption: str | None
        Set the caption field of the last (most recent) snapshot to the given value.
        Any previous value gets over-written

        :param caption: String containing a caption to write into the last (most recent) snapshot
        :return:        None
        
name | arguments | returns
set_field_last_snapshot | field_name: str, field_value | None
        Set the specified field of the last (most recent) snapshot to the given value.
        Any previous value gets over-written.
        If the specified field name is not already one of the columns in the underlying
        data frame, a new column by that name gets added; any previous rows will have the value NaN
        assigned to that column

        :param field_name:  Name of field of interest
        :param field_value: Value to write into the above field for the last (most recent) snapshot
        :return:            None
        
name | arguments | returns
update_last_snapshot | update_values: dict | None
        Set some fields of the last (most recent) snapshot to the given values.
        Any previous value gets over-written.
        If any field name is not already among the columns in the underlying
        data frame, a new column by that name gets added; any previous rows will have the value NaN
        assigned to that column

        :param update_values:   Dict whose keys are the names of the columns to update
        :return:                None
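
The behavior shared by the "last snapshot" methods above (overwrite existing values, create missing columns with NaN in earlier rows) matches what Pandas does when assigning through `.loc`. The sketch below shows that idea only; the helper `update_last_row` and the sample data are hypothetical, not the library's code.

```python
import pandas as pd

df = pd.DataFrame({"SYSTEM TIME": [0., 8.],
                   "A": [1., 3.]})

def update_last_row(df, update_values):
    """Write each key/value pair into the last row, creating columns as needed."""
    last = df.index[-1]
    for field_name, field_value in update_values.items():
        # Assigning through .loc to a missing column label adds that column;
        # all the other rows receive NaN in it
        df.loc[last, field_name] = field_value

update_last_row(df, {"A": 99., "B": 7.})
print(df)
```

Here "A" is overwritten in the last row only, while the new column "B" holds NaN in the earlier row.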
        



Class CollectionArray

    Use this structure if your "snapshots" (data to add to the cumulative collection) are Numpy arrays
    of any shape, as long as every snapshot has that same shape.

    Usually, the snapshots will be a dump of the entire system state, or of parts thereof, but they could be anything.
    Typically, each snapshot is taken at a different time (for example, to create a history), but it could also
    be the result of varying some parameter(s).

    DATA STRUCTURE:
        A Numpy array with one more dimension than the snapshots

        EXAMPLE: if the snapshots are the 1-d numpy arrays [1., 2., 3.] and [10., 20., 30.]
                        then the internal structure will be the matrix
                        [[1., 2., 3.],
                         [10., 20., 30.]]
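
The example above can be reproduced with `np.stack`, which joins equally-shaped arrays along a new leading axis. This only illustrates the resulting layout; whether the class itself uses `np.stack` internally is not stated in this guide.

```python
import numpy as np

snapshots = [np.array([1., 2., 3.]),
             np.array([10., 20., 30.])]

# Stack along a new first axis: one "layer" per snapshot
stacked = np.stack(snapshots)

print(stacked.shape)    # one more dimension than each snapshot
print(stacked)
```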
    
name | arguments | returns
__init__ | parameter_name="SYSTEM TIME" |
        :param parameter_name:  A label explaining what the snapshot parameter is.
                                Typically it's "SYSTEM TIME" (default), but could be anything
        
name | arguments | returns
__len__ | self | int
        Return the number of snapshots comprising the collection
        :return:    An integer
        
name | arguments | returns
__str__ | self |

name | arguments | returns
store | par, data_snapshot: np.array, caption = "" | None
        Save up the given data snapshot, and its associated parameters and optional caption

        EXAMPLES:
                store(par = 8., data_snapshot = np.array([1., 2., 3.]), caption = "State after injection of 2nd reagent")
                store(par = {"a": 4., "b": 12.3}, data_snapshot = np.array([1., 2., 3.]))

        :param par:             Typically, the System Time - but could be anything that parametrizes the snapshots
                                    (e.g., a dictionary, or any desired data structure.)
                                    It doesn't have to remain consistent, but it's probably good practice to keep it so
        :param data_snapshot:   A Numpy array, of any shape - but must keep that same shape across snapshots
        :param caption:         OPTIONAL string to describe the snapshot
        :return:                None
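
The requirement that every snapshot keep the same shape can be sketched with a small stand-in class. Everything below is hypothetical and written for illustration: the class name `ArrayCollectionSketch`, its attributes, and the choice to raise `ValueError` on a mismatch are assumptions, not the documented behavior of `CollectionArray`.

```python
import numpy as np

class ArrayCollectionSketch:
    """Hypothetical stand-in for illustration; NOT the actual CollectionArray code."""

    def __init__(self):
        self.data = None        # will hold one more dimension than the snapshots
        self.parameters = []    # one entry per snapshot
        self.captions = []      # one entry per snapshot

    def store(self, par, data_snapshot, caption=""):
        new_layer = data_snapshot[np.newaxis, ...]   # add a leading axis of length 1
        if self.data is None:
            self.data = new_layer.copy()             # first snapshot fixes the shape
        elif data_snapshot.shape != self.data.shape[1:]:
            raise ValueError(f"expected shape {self.data.shape[1:]}, "
                             f"got {data_snapshot.shape}")
        else:
            self.data = np.concatenate([self.data, new_layer])
        self.parameters.append(par)
        self.captions.append(caption)

    def __len__(self):
        return len(self.parameters)


c = ArrayCollectionSketch()
c.store(par=8., data_snapshot=np.array([1., 2., 3.]),
        caption="State after injection of 2nd reagent")
c.store(par={"a": 4., "b": 12.3}, data_snapshot=np.array([10., 20., 30.]))
print(len(c), c.data.shape)
```

Keeping the parameters and captions in lists parallel to the array means that position i in each list describes layer i of the stacked data.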
        
name | arguments | returns
get_array | self | np.array
        Return the main data structure - the Numpy Array

        :return:    A Numpy Array with the main data structure
        
name | arguments | returns
get_parameters | self | list
        Return all the parameter values

        :return:    A list with the parameter values
        
name | arguments | returns
get_captions | self | [str]
        Return all the captions

        :return:    A list with the captions
        
name | arguments | returns
get_shape | self | tuple

        :return:    A tuple with the shape of the snapshots
        



Class Collection

    A "Collection" is a list of snapshots that the user wants to preserve,
    such as the state of the entire system, or of parts thereof,
    either taken at different times,
    or resulting from varying some parameter(s)

    This class accepts data in arbitrary formats.

    MAIN DATA STRUCTURE:
        A list of triplets.
        Each triplet is of the form (parameter value, snapshot_data, caption)
            1) The "parameter" is typically time, but could be anything
               (a descriptive meaning of this parameter is stored in the object attribute "parameter_name").
            2) "snapshot_data" can be anything of interest, typically a clone of some data element.
            3) "caption" is just a string with an optional label.

        If the "parameter" is time, it's assumed to be in increasing order

        EXAMPLE:
            [
                (0., DATA_STRUCTURE_1, "Initial state"),
                (8., DATA_STRUCTURE_2, "State immediately after injection of 2nd reagent")
            ]
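
A list of triplets like the one in the example above can be built and taken apart with ordinary Python. This is a minimal sketch that follows the element order shown in that example; the variable names and the sample data are hypothetical, and the real class may organize its getters differently.

```python
# Hypothetical stand-in for the internal list; NOT the library's code
collection = []

collection.append((0., {"c1": 10., "c2": 20.}, "Initial state"))
collection.append((8., {"c1": 4., "c2": 26.},
                   "State immediately after injection of 2nd reagent"))

# "Unzip" the triplets into three parallel sequences
parameters, data, captions = zip(*collection)

print(list(parameters))
print(list(captions))
```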
    
name | arguments | returns
__init__ | parameter_name="SYSTEM TIME" |
        :param parameter_name:  A label explaining what the snapshot parameter is.
                                Typically it's "SYSTEM TIME" (default), but could be anything
        
name | arguments | returns
__len__ | self |

name | arguments | returns
__str__ | self |

name | arguments | returns
store | par, data_snapshot, caption = "" | None
        Save up the given data snapshot

        EXAMPLE:
                store(par = 8.,
                      data_snapshot = {"c1": 10., "c2": 20.},
                      caption = "State immediately before injection of 2nd reagent")

                store(par = {"a": 4., "b": 12.3},
                     data_snapshot = [999., 111.],
                     caption = "Parameter is a dict; data is a list")

        IMPORTANT:  if passing a variable pointing to an existing mutable structure (such as a list, dict, object),
                    make sure to first *clone* it, to preserve it as it is at the time of the call!

        :param par:             Typically, the System Time - but could be anything that parametrizes the snapshots
                                    (e.g., a dictionary, or any desired data structure.)
                                    It doesn't have to remain consistent, but it's probably good practice to keep it so
        :param data_snapshot:   Data in any format (such as a Numpy array, or an object)
        :param caption:         OPTIONAL string to describe the snapshot
        :return:                None
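
The warning above about cloning mutable data can be demonstrated with the standard `copy` module. The two lists below are hypothetical stand-ins for a collection, used only to contrast storing a reference with storing a clone.

```python
import copy

state = {"c1": 10., "c2": 20.}

aliased = []    # stores a reference to the live object
cloned = []     # stores an independent deep copy

aliased.append((8., state, "stored without cloning"))
cloned.append((8., copy.deepcopy(state), "stored after cloning"))

# The system keeps evolving after the snapshot was taken
state["c1"] = 0.

print(aliased[0][1]["c1"])   # reflects the later change
print(cloned[0][1]["c1"])    # still the value at snapshot time
```

A shallow copy would be enough for a flat dict of numbers; `copy.deepcopy` is the safer choice when the data contains nested mutable structures.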
        
name | arguments | returns
get_collection | self | list
        Return the main data structure - the list of snapshots, with their attributes

        :return:    A list of triplets, one per snapshot, as described under MAIN DATA STRUCTURE
        
name | arguments | returns
get_data | self | list
        Return a list of all the snapshots

        :return:    A list of all the snapshots
        
name | arguments | returns
get_parameters | self | list
        Return all the parameter values

        :return:    A list with all the parameter values
        
name | arguments | returns
get_captions | self | [str]
        Return all the captions

        :return:    A list with all the captions