# pickle tutorial - security and alternate workflows

The purpose of this notebook is to demonstrate some conventions for saving the `postprocess_dict` dictionary, which is returned from an `automunge(.)` call and may be used to productionize a model with preprocessing built through the Automunge library.

Note that our tutorials so far have relied on the pickle library, a native python module used to serialize and download python dictionaries. Serialization converts the dictionary into a non-encrypted byte stream, and the same pickle module may then be used to upload and reinitialize that dictionary in a new notebook or environment. In some cases the serialized form of a `postprocess_dict` may be beneficial from a storage standpoint, particularly when preprocessing includes forms of ML infill, which may store in the dictionary a trained model specific to each tabular feature. We understand that there are other options for downloading a python dictionary in text-based formats (e.g. libraries like JSON, YAML, etc.); however, the pickle module supports serializing additional data types populated in a dictionary that JSON-style formats may not support. The most relevant of these are the trained machine learning models returned by Automunge library components for e.g. ML infill, PCA, and feature selection, as well as any custom data transformation functions that a user may have initialized for purposes of custom transformations integrated into the family tree primitives API.

The pickle library, although widely used in data science and other circles, has a known security vulnerability: loading (unpickling) a serialized object that has been altered from its original intended form can execute arbitrary code. We note in our readme that python's documentation has suggested a mitigation for this vulnerability that relies on a second python module called [hmac](https://docs.python.org/3/library/hmac.html#module-hmac), which derives a keyed signature for the original downloaded pickle object. That signature can then be compared against an uploaded object received from a potentially insecure channel to validate that the uploaded form matches the original. In the demonstrations of this notebook the hmac signature will be derived and then affixed to the pickled object for comparison prior to loading. An alternative and even more secure approach may be to share the signature through a separate, more secure channel as an additional means of redundancy. This notebook is our documentation's first demonstration of incorporating the hmac tool into a workflow, which is one of the notebook's agendas.
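To make the contrast between pickle and JSON concrete, here is a minimal sketch using only the standard library. The dictionary and the `custom_transform` function are hypothetical stand-ins for a real `postprocess_dict` and a user-defined transformation function; no Automunge calls are made here.

```python
import json
import pickle


def custom_transform(column):
    """Hypothetical stand-in for a user-defined transformation function."""
    return [value * 2 for value in column]


# Hypothetical stand-in for a postprocess_dict (a real one is returned by automunge(.))
postprocess_dict = {
    "version": 1,
    "columns": ["feature_a", "feature_b"],
    "custom_transform": custom_transform,
}

# pickle serializes the whole dictionary, including the function reference, to bytes
pickled = pickle.dumps(postprocess_dict)
restored = pickle.loads(pickled)
print(restored["custom_transform"]([1, 2, 3]))  # [2, 4, 6]

# json has no representation for a function object and raises TypeError
try:
    json.dumps(postprocess_dict)
    json_supported = True
except TypeError as err:
    json_supported = False
    print("json.dumps failed:", err)
```

One caveat worth knowing: pickle stores functions by reference (module and qualified name) rather than by their code, so a custom function must also be defined or importable in the environment where the dictionary is loaded. Writing to disk works the same way with `pickle.dump` / `pickle.load` on a file opened in binary mode.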
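The signing workflow described above can be sketched with the standard library as follows. This is a minimal sketch, assuming an HMAC-SHA256 signature prefixed to the pickled bytes and a secret key known only to the saving and loading parties; the helper names `sign_and_pickle` and `verify_and_unpickle` are hypothetical and not part of the Automunge API, and the notebook's own demonstration may differ in its details.

```python
import hashlib
import hmac
import pickle
import secrets

# Assumption: a secret key shared only between the party saving and the party loading
SECRET_KEY = secrets.token_bytes(32)
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_and_pickle(obj, key):
    """Hypothetical helper: pickle obj and prefix the bytes with an HMAC-SHA256 signature."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload


def verify_and_unpickle(blob, key):
    """Hypothetical helper: check the affixed signature before unpickling; raise on mismatch."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest performs a constant-time comparison to avoid timing side channels
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    # pickle.loads is only reached once the signature has been validated
    return pickle.loads(payload)


postprocess_dict = {"columns": ["feature_a", "feature_b"]}  # hypothetical stand-in
blob = sign_and_pickle(postprocess_dict, SECRET_KEY)
print(verify_and_unpickle(blob, SECRET_KEY))

# Flipping a single bit in the payload causes verification to fail before loading
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
try:
    verify_and_unpickle(tampered, SECRET_KEY)
except ValueError as err:
    print(err)
```

The protection depends entirely on the key staying secret: anyone holding the key can produce a valid signature for altered data. For the separate-channel variant, the signature would be stored and transmitted apart from the payload and passed to the verification step as its own argument instead of being sliced off the front of the blob.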