UNIQORN -- The Universal Neural-network Interface for Quantum Observable Readout from N-body wavefunctions



— VERSION 0.7 BETA —

Introduction

The purpose of UNIQORN is to implement the inference of observables from measurements of quantum states as a machine learning task; see this preprint for a description. UNIQORN is a repository of python and bash scripts that implement classification or regression tasks using the TensorFlow library. The code trains artificial neural networks on data for various observables obtained from MCTDH-X simulations.

Currently, only single-shot images are supported as input data; support for correlation functions as input will be added in the future.

The quantities that can be analyzed so far are fragmentation, particle number, density distributions, the potential, one- and two-body density matrices, and correlation functions.

Minimal single-shot datasets are included in the repository for testing purposes. For use beyond testing, please download the complete double-well and triple-well datasets.

The folder from which the data is read is configured – together with other properties of the task to perform – in the Input.py file. The double-well dataset contains 3000 and the triple-well dataset contains 3840 random ground states (randomized parameters: barrier heights and widths, interparticle interactions, particle number).

Prerequisites

As prerequisites to run the python scripts in this repository you will need (at least) the following; a possible installation command is given after the list:

  • TensorFlow 2 (for building and training ML models)
  • TensorFlow Addons (for using the interpolation loss)
  • Matplotlib (for visualizing the results)
  • NumPy (for numerical manipulations)
  • Jupyter (optional, for executing the notebook)
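
All of these packages are available from PyPI. Assuming a standard Python environment, they can be installed for example via the command below (the TensorFlow package may need to be adapted to your setup, e.g. for GPU support):

    pip install tensorflow tensorflow-addons matplotlib numpy jupyter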

Structure of the repository

Please refer to the flowchart "workflow.pdf" for a graphical depiction of the structure of the modularized code. The UNIQORN python modules related to machine learning tasks are mostly stored in the "source" directory, and the python modules and files related to data generation with MCTDH-X are stored in the directory "MCTDHX-data_generation".

Calculations can be done using the Jupyter notebook UNIQORN.ipynb or the python script UNIQORN.py. For some observables, an evaluation of the error via a formula is also possible; for this purpose, the python script Error_from_formula.py can be executed. To perform the (lengthy!) check of how the neural-network-based regression of observables depends on the number of single-shot observations per input dataset, the python script Regression_Loop_NShots.py can be executed. The corresponding commands are listed below.
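
All three entry points are plain python scripts and are started in the same way, typically from a run directory that contains a suitable Input.py (cf. the quickstart below):

    python UNIQORN.py
    python Error_from_formula.py
    python Regression_Loop_NShots.py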

UNIQORN's directories contain other python modules that have different purposes.

PYTHON MODULES:

  • Input.py: This python class contains all the parameters of the algorithm: which observable to reconstruct, how to batch the input data, what to visualize, etc.
  • Output.py: This module contains all necessary functionality to generate the UNIQORN run summary and to support a cohesive, structured storage of results, including TensorBoard logs and Visualization.py plots. This is achieved through run-wide consistent, centrally stored time stamps, with all results collected in a directory called "./output".
  • DataPreprocessing.py: This module performs runtime checks for input data consistency, and extracts and manipulates the input data via the module DataLoading.py, depending on the given task (i.e. supervised or unsupervised regression/classification) and on the given observable to fit (e.g. fragmentation, density, ...). It also splits the input data into training, validation, and test sets.
  • DataLoading.py: This module imports the input data (single shots or correlation functions) and chooses which observables to use as labels (classification) or true values (regression).
  • DataGenerator.py: This module implements a class that dynamically loads the data on demand, i.e., batch by batch, while training, validating, and testing the model; a generic sketch of this idea is shown after this list.
  • ModelTrainingAndValidation.py: This module constructs or loads from a library the type of neural network selected in the input file, compiles it, trains it on the training set, validates it with the validation set, and tests it on a holdout data set.
  • Models.py: This module is a collection of python functions that return various types of neural networks and other Keras model objects to be used in ModelTrainingAndValidation.py. There are default models depending on the task to be performed, an archive of tested models, and customizable models.
  • ModelEvaluation.py: This module executes the validation of an already trained model with the test set.
  • Visualization.py: This module generates and saves as .png files all the results of model training and testing (e.g. accuracies as a function of the epochs), as well as visual comparisons between the test set and the results obtained with the neural networks.
  • functions.py: This module contains various useful functions to perform sub-tasks.
  • generate_random_DW_states.py: This python script generates random relaxations with MCTDH-X software to produce the single-shot input data.
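
As an illustration of the batch-wise, on-demand loading strategy used by DataGenerator.py, the following is a minimal, generic sketch based on tf.keras.utils.Sequence. The class name, file format, and array shapes are hypothetical and do not reproduce the actual implementation:

    import numpy as np
    import tensorflow as tf

    class SingleShotSequence(tf.keras.utils.Sequence):
        """Loads stacks of single-shot images batch by batch (illustrative only)."""

        def __init__(self, file_list, labels, batch_size=32):
            self.file_list = file_list      # paths to single-shot data files
            self.labels = labels            # observable values used as regression targets
            self.batch_size = batch_size

        def __len__(self):
            # number of batches per epoch
            return int(np.ceil(len(self.file_list) / self.batch_size))

        def __getitem__(self, idx):
            # load only the files that belong to batch 'idx'
            batch_files = self.file_list[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_x = np.stack([np.load(f) for f in batch_files])
            batch_y = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
            return batch_x, batch_y

Such a Sequence object can be passed directly to Keras' model.fit, so that only the data of the current batch resides in memory.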

BASH SCRIPTS (mainly used to generate or import data):

  • check_convergence.sh: called by generate_random_DW_states.py. Checks whether a random relaxation has converged and restarts it if not.
  • run_anal.sh: called by generate_random_DW_states.py. If a random relaxation has converged, this script runs the MCTDH-X analysis program on it.
  • MLG.sh: can be called to search all python files for a certain string, e.g. ./MLG.sh UNIQORN .

Currently the code supports only supervised regression tasks. The tasks can be implemented via a multilayer perceptron (MLP) or a convolutional neural network (CNN). Some default and also some customizable models are defined in the file Models.py. Note that certain quantities such as fragmentation can only be inferred from multiple (and not single) single-shot images. DataPreprocessing.py therefore assembles the input data into stacks of multiple single-shot images; the sketch below illustrates this idea.
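
As a rough, self-contained illustration of this idea (the shapes, layer sizes, and the convention of stacking single shots along the channel axis are assumptions for this sketch, not the repository's actual configuration), a stack of one-dimensional single shots can be fed to a small Keras CNN for regression like so:

    import numpy as np
    import tensorflow as tf

    n_samples, n_shots, n_pixels = 100, 10, 64   # hypothetical dataset dimensions

    # n_shots one-dimensional single-shot images per sample,
    # stacked along the channel axis of a 1D convolutional network
    x = np.random.rand(n_samples, n_pixels, n_shots).astype("float32")
    y = np.random.rand(n_samples).astype("float32")  # e.g. one fragmentation value per sample

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_pixels, n_shots)),
        tf.keras.layers.Conv1D(16, kernel_size=5, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),                # single regression output
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x, y, batch_size=16, epochs=2, verbose=0)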

Quickstart tutorial

  • go to the directory example_run_mini
  • open Input.py, change the "sourceDir" flag to match the absolute path of your "source" directory, and change the "DirName" flag to match the absolute path of your data (see the example after this list)
  • run python UNIQORN.py
  • check the results in the subdirectory "output"
  • the run summary is in output/run_<date>_<time>.out
  • change "Input.py" and rerun "python UNIQORN.py" to see the effect of your change

Running the code for a single set of hyperparameters

A good start is the Jupyter notebook UNIQORN.ipynb, which calls all the different modules that perform various tasks.

You can check it out by typing

jupyter notebook

in your shell. This should open a window in your web browser, from which you can navigate to the file UNIQORN.ipynb and execute it cell by cell. The notebook goes through the workflow explained above, i.e.

data loading -> data processing -> model choice, training and validation -> visualization.

The notebook will automatically call and run other modules such as DataPreprocessing.py, ModelTrainingAndValidation.py etc. These files should be modified only if you are a developer implementing new machine learning tasks. Moreover, UNIQORN.ipynb is to be seen as a starting point that trains, evaluates and visualizes a model for a single set of model parameters.

To choose which machine learning task to perform (e.g. switching from a regression of the particle number from single shots to a classification of fragmented/non-fragmented states from correlation functions), you need to modify the input file (a python class) Input.py. This file contains all the different knobs and variables for the machine learning algorithms, including hyperparameters such as the batch size or the number of epochs. Here you can also select which quantity to fit and how, whether to load a pre-trained neural network or train one yourself, whether to visualize the results, etc. The input file and the role of each variable therein should be self-explanatory; an illustrative excerpt is sketched below.
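
As an illustration only: apart from the path flags shown in the quickstart, an Input.py configuration contains knobs of the following kind. The exact variable names below (other than sourceDir and DirName, which are mentioned above) are hypothetical placeholders, not the actual flag names used in the file:

    # hypothetical excerpt -- variable names are placeholders except sourceDir/DirName
    sourceDir = "/home/username/UNIQORN/source"   # location of the UNIQORN modules
    DirName   = "/home/username/UNIQORN/data"     # location of the input data
    Quantity  = "NPAR"     # which observable to fit (e.g. the particle number)
    Task      = "REGR"     # supervised regression
    BatchSize = 32         # hyperparameter: batch size
    Epochs    = 20         # hyperparameter: number of training epochs
    LoadModel = False      # train a new network instead of loading a pre-trained one
    Visualize = True       # produce plots of the results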

Note that the notebook displays results inline while being executed, but the corresponding figures are also saved in various folders inside the main folder "plots" for later retrieval. The paths to these files and the files themselves are named after the quantity being fitted. For example, a plot of the accuracy of the regression of the particle number from single shots in real space during 20 epochs will be saved in the folder "plots/NPAR/accuracies" with the name "Accuracies-for-REGR-of-NPAR-from-SSS-in-x-space-during-20-epochs.png".

Performing a hyperparameter optimization with HpBandSter

It is a tough task to optimally configure all the possible parameters of deep learning models. However, since hyperparameter optimization is itself an optimization task, it can be automated. One library that provides out-of-the-box hyperparameter optimization is HpBandSter; see this link for details about HpBandSter. Naturally, the HpBandSter library is a prerequisite for running this part of the code. A generic sketch of the HpBandSter workflow is shown below.
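
For orientation, the following is a minimal, generic HpBandSter sketch following the library's standard usage pattern; it is not the repository's actual optimization script, and the search space, budgets, and dummy loss are placeholders:

    import ConfigSpace as CS
    import hpbandster.core.nameserver as hpns
    from hpbandster.core.worker import Worker
    from hpbandster.optimizers import BOHB

    class DummyWorker(Worker):
        """Evaluates one hyperparameter configuration (here: a dummy loss)."""

        def compute(self, config, budget, **kwargs):
            # in a real run, train a model for 'budget' epochs with 'config'
            # and return its validation loss
            loss = (config['learning_rate'] - 0.01) ** 2
            return {'loss': loss, 'info': {}}

        @staticmethod
        def get_configspace():
            cs = CS.ConfigurationSpace()
            cs.add_hyperparameter(CS.UniformFloatHyperparameter(
                'learning_rate', lower=1e-4, upper=1e-1, log=True))
            return cs

    # local name server through which optimizer and workers communicate
    ns = hpns.NameServer(run_id='uniqorn_demo', host='127.0.0.1', port=None)
    ns.start()

    worker = DummyWorker(nameserver='127.0.0.1', run_id='uniqorn_demo')
    worker.run(background=True)

    bohb = BOHB(configspace=DummyWorker.get_configspace(),
                run_id='uniqorn_demo', nameserver='127.0.0.1',
                min_budget=1, max_budget=9)
    result = bohb.run(n_iterations=4)

    bohb.shutdown(shutdown_workers=True)
    ns.shutdown()

    # best configuration found during the run
    incumbent = result.get_incumbent_id()
    print(result.get_id2config_mapping()[incumbent]['config'])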

Currently, we only provide an HpBandSter implementation for optimizing convolutional neural networks (set Model='custom' and ConvNet=True in Input.py). Running the corresponding optimization script performs the hyperparameter search; a second script then visualizes the results by opening a plot of the optimization run. By clicking on the point in the plot with the lowest loss, you can read off the optimal set of hyperparameters, i.e., the result of the optimization.
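
In Input.py, the two flags mentioned above are set as follows (the rest of the file is configured as for a regular run):

    # enable the customizable CNN used by the HpBandSter optimization
    Model = 'custom'
    ConvNet = True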