5-minute tuning tutorial

We assume that the input data for tuning has already been generated.

Preparing input data with app-yoda2h5 and app-datadirtojson

Loading input data directly from YODA files is supported but can be time-consuming. We therefore suggest converting the hierarchical directory structure to HDF5.

# Convert yoda files found in INPUTDIR to hdf5. Parameter information is
# extracted from files matching params.dat
mpirun -np 4 app-yoda2h5 INPUTDIR --pname params.dat -o inputdata.h5

# Convert all yoda files recursively found in DATADIR containing /REF data to json
app-datadirtojson DATADIR -o data.json

Data loading from YODA files in directories, as was done in Professor, is supported. This holds for the parameterisation inputs as well as for the reference data files.

Training approximations with app-build

Note that in contrast to Professor, approximations of the bin contents and bin errors are trained separately.

mpirun -np 4 python3 app-build inputdata.h5  --order 3,0 -o val_30.json
mpirun -np 4 python3 app-build inputdata.h5  --order 2,0 -o err_20.json --errs

# NERSC, i.e. slurm --- this example computes rational approximation with slsqp
srun -n 1000 python3 app-build inputdata.h5  --order 4,1 --mode sip -o val_41.json

Envelope plotting

To see how the inputs to the approximation compare with the experimental data, we provide the script app-yodaenvelope. It takes the extreme input values per bin and stores that information as two separate YODA files, suitable for plotting with e.g. rivet-mkhtml. This makes it easy to check quickly whether, for instance, the chosen parameter space is suitable for minimising a goodness-of-fit measure given the data.

If the inputs do not envelope the data, neither will the approximation. By default, the tuning stage discards bins where this is the case. If such bins are not filtered out, the minimisation is in danger of producing untrustworthy results because it operates in a regime of extrapolation.

app-yodaenvelope val_30.json -o mytune/envelope
Envelope plot example.
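The envelope check itself is simple: for each bin, take the minimum and maximum over all MC inputs and test whether the data value falls inside. The sketch below illustrates the idea with hypothetical stand-in arrays, not the tool's actual implementation.

```python
import numpy as np

# hypothetical stand-ins: 3 MC input runs, 2 bins, plus reference data
inputs = np.array([[1.0, 2.0],
                   [1.5, 2.5],
                   [0.8, 3.0]])
data = np.array([1.2, 3.5])

lo, hi = inputs.min(axis=0), inputs.max(axis=0)  # per-bin extreme input values
enveloped = (data >= lo) & (data <= hi)          # True where the data is enveloped
```

Bins where the flag is False would, by default, be discarded in the tuning stage.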

Inspecting approximations with app-ls

A summary of the built approximations can be obtained with app-ls. The script can also produce a standard weight file that is later used as input for the optimisation.

app-ls val_30.json
app-ls val_30.json -w
app-ls val_30.json -w > myweights.txt
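The weight file is a plain-text list assigning a weight to each observable; larger weights make an observable count more in the objective. As a purely hypothetical illustration of such a file (the paths and values are invented, assuming a Professor-style path-and-weight layout):

```
# hypothetical weight file: observable path followed by its weight
/MY_ANALYSIS/d01-x01-y01   1.0
/MY_ANALYSIS/d02-x01-y01   0.5
```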

Optimisation with app-tune2

This script loads approximations, experimentally observed data and a weight file to define an objective, which is then minimised numerically. All output is written to a folder specified with -o. The outputs vary depending on the options and the available packages. If YODA is found on the system, the approximations evaluated at the best fit point are stored as a YODA file for convenient plotting with e.g. rivet-mkhtml.

app-tune2 myweights.txt data.json val_30.json -o tune_no_errs

# With additional error term
app-tune2 myweights.txt data.json val_30.json -e err_20.json -o tune_w_errs
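Conceptually, the objective is a weighted least-squares comparison of the approximations with the data. A minimal sketch, with a hypothetical `approx` callable standing in for the trained approximations:

```python
import numpy as np

def objective(params, approx, data, errs, weights):
    """Weighted least squares: sum_b w_b * (f_b(p) - d_b)^2 / e_b^2."""
    pred = approx(params)
    return float(np.sum(weights * (pred - data) ** 2 / errs ** 2))

# toy example: a linear "approximation" in one parameter
approx = lambda p: np.array([p[0], 2.0 * p[0]])
val = objective([1.0], approx, np.array([1.0, 2.0]), np.ones(2), np.ones(2))  # 0.0 at the true point
```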

A typical output looks like this:

# Objective value at best fit point: 2515.45 (2515.45 without weights)
# Degrees of freedom: 297
# phi2/ndf: 8.470
# Minimisation took 14.35996127128601 seconds
# Command line: app-tune2 allweights data.json val_30.json -s 1000 -r 10 -o test
# Best fit point:
#
# PNAME        	 PVALUE              #    COMMENT    [  PLOW          ...   PHIGH      ]
#
maxSD         	50.30800237383366    #               [ 50.00647       ...  59.99427    ]
maxXX         	48.24390530879254    #               [ 10.00196       ...  64.98471    ]
sigma         	 0.23612976466256186 #               [  0.2000041     ...   0.3998176  ]
mMin          	 0.26658171719175056 #               [  0.1006284     ...   0.4999751  ]
mResMax       	 0.4660927817026157  #               [  0.1009738     ...   1.998968   ]
SaSepsilon    	-0.09491377747130421 #               [ -0.1982309     ...   0.1998691  ]
lowMEnhance   	 4.043974446846514   #               [  1.00916       ...   5.988157   ]
pickQuarkNorm 	 0.29360584582092153 #               [  0.0001173527  ...   9.997025   ]
pickQuarkPower	 1.561158636963411   #               [  0.5015452     ...   1.498905   ]
sigmaRefPomP  	11.49378439673652    #               [  5.007572      ...  14.99748    ]
mPowPomP      	 0.05749000056977163 #               [  7.39663e-05   ...   0.09989857 ]
mMinPert      	 7.303711183546774   #               [  5.00034       ...   9.993299   ]

The base file name of all outputs is automatically generated from the minimisation options.

The output contains information about the objective function at the found minimum and, of course, the best fit point. Further, a comment is added if e.g. a parameter was fixed or ended up at a domain boundary.

Selecting a minimisation algorithm

The default minimiser is a truncated Newton method (scipy.optimize "tnc"). The command line option for choosing an algorithm is -a; the following arguments are valid:

  • tnc (default)

  • lbfgsb (scipy.optimize "lbfgsb")

  • ncg (scipy.optimize "ncg")

  • trust (scipy.optimize "trust-exact")

ncg and trust have no concept of domain limits; both, however, make use of second-order information.

app-tune2 myweights.txt data.json val_30.json -o tune_no_errs -a ncg

Note that by default a check is performed to test whether the minimisation ended up in a saddle point. If that is detected, the minimisation is restarted, up to 10 times. To override this behaviour, use the option --no-check.
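The saddle-point test can be illustrated with the eigenvalues of the Hessian at the critical point: at a true minimum they are all positive, while mixed signs indicate a saddle. This is a sketch of the idea, not the tool's actual implementation:

```python
import numpy as np

def is_saddle(hessian, tol=1e-9):
    # mixed-sign eigenvalues at a critical point indicate a saddle
    eig = np.linalg.eigvalsh(hessian)
    return bool(eig.min() < -tol and eig.max() > tol)

H_minimum = np.array([[2.0, 0.0], [0.0, 1.0]])   # positive definite: a minimum
H_saddle  = np.array([[2.0, 0.0], [0.0, -1.0]])  # one negative eigenvalue: a saddle
```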

Multistart options

The strategy for selecting a good start point for the minimiser is to evaluate the objective on a randomly selected set of points. The size of this survey can be adjusted with the command line option -s.

By default, the minimisation runs once. To increase the number of restarts (that is, separate minimisations, each starting from a different start point), use the command line option -r.

app-tune2 myweights.txt data.json val_30.json -o tune_no_errs -s 10000 -r 10

For each restart, a new random survey is performed to select a start point. The final result of the optimisation is the best result from all restarts.
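The survey step can be sketched as follows: draw random points in the domain, evaluate the objective on each, and use the best point as the start of a local minimisation. Here a toy quadratic stands in for the real objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def best_start(objective, lo, hi, n_survey):
    # evaluate the objective on a random survey and return the best point
    pts = rng.uniform(lo, hi, size=(n_survey, len(lo)))
    vals = [objective(p) for p in pts]
    return pts[int(np.argmin(vals))]

f = lambda p: float(np.sum((p - 0.5) ** 2))  # toy objective, minimum at (0.5, 0.5)
start = best_start(f, np.zeros(2), np.ones(2), n_survey=1000)
```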

Limits and fixed parameters

By default, the domain of the approximation is used to set bounds for the minimisation (except for ncg and trust, which do not support bounds to begin with). Manual limits can be supplied via a simple text file with the command line option -l.

In the following example, the parameter "PARAM_A" is bound to values between 3 and 5.

# Parameter limit file, comments and empty lines are ignored
PARAM_A  3   5
# PARAM_B 5 8

Specifying limits overrides the default bounds --- but only for the parameters listed in the file, i.e. the domain bounds for all other dimensions stay at their defaults.

app-tune2 myweights.txt data.json val_30.json -o tune_no_errs -l mylimits.txt

To fix individual parameters, the same option -l can be used. Fixing parameters and setting manual limits can be mixed.

# Parameter limit file, comments and empty lines are ignored
# A string followed by two numbers is interpreted as bounds
PARAM_A  3   5
# A string followed by a single number is interpreted as a fixed value
PARAM_B  3.145
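A parser for this format is straightforward; the sketch below (an illustration, not the tool's code) treats a name followed by two numbers as bounds and a name followed by one number as a fixed value:

```python
def parse_limits(lines):
    bounds, fixed = {}, {}
    for line in lines:
        line = line.split("#")[0].strip()  # strip comments and blank lines
        if not line:
            continue
        parts = line.split()
        if len(parts) == 3:                # NAME LO HI -> bounds
            bounds[parts[0]] = (float(parts[1]), float(parts[2]))
        elif len(parts) == 2:              # NAME VALUE -> fixed parameter
            fixed[parts[0]] = float(parts[1])
    return bounds, fixed

bounds, fixed = parse_limits(["PARAM_A 3 5", "# a comment", "PARAM_B 3.145"])
```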

Plot options

By default, the correlation of the parameters is inferred from the inverse of the Hessian at the minimum. A colour-map plot is stored in the output folder.

Parameter correlations

Example visualisation of correlation matrix.
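In the Gaussian approximation, the covariance is the inverse of the Hessian at the minimum, and the correlation matrix follows by normalising with the standard deviations. A sketch with a hypothetical 2x2 Hessian:

```python
import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 2.0]])          # hypothetical Hessian at the minimum
cov = np.linalg.inv(H)              # covariance in the Gaussian approximation
sig = np.sqrt(np.diag(cov))
corr = cov / np.outer(sig, sig)     # correlation matrix, unit diagonal
```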

Histogram plotting

If YODA is available, the predictions at the found minimum are written to a YODA file. The latter can be plotted using e.g. rivet-mkhtml (see https://rivet.hepforge.org/trac/wiki/RivetHistogramming).

Profiles of objective

To produce profiles, that is 1D projections of the objective function onto the parameter axes, the command line switch -p can be used.

app-tune2 myweights.txt data.json val_30.json -o tune_no_errs -p
Scan of the objective in one direction ("pickQuarkNorm"); all other parameters are fixed at their best fit values.
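A profile is simply a scan of the objective along one parameter axis with all other parameters held at their best-fit values. A minimal sketch with a toy quadratic objective:

```python
import numpy as np

def profile(objective, best, axis, lo, hi, n=51):
    # scan one parameter; all others stay fixed at the best-fit point
    xs = np.linspace(lo, hi, n)
    ys = []
    for x in xs:
        p = np.array(best, dtype=float)
        p[axis] = x
        ys.append(objective(p))
    return xs, np.array(ys)

f = lambda p: (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2  # toy objective
xs, ys = profile(f, best=[1.0, 2.0], axis=0, lo=0.0, hi=2.0)
```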

Optimisation with app-nest

Instead of numerical minimisation, we can use MultiNest (https://github.com/JohannesBuchner/PyMultiNest) to sample the domain and use Bayesian inference to learn about best fit points. MultiNest can be pip-installed but requires libmultinest.so to be built. The documentation on their webpage is excellent. All MultiNest options are available as options for app-nest. It further supports setting limits and fixing parameters in the same way as app-tune2.

app-nest allweights data.json val_30.json -e err_30.json -o nestout

# If mpi4py is available and libmultinest is built with MPI support
mpirun -np 4 app-nest allweights data.json val_30.json -e err_30.json -o nestout
Cornerplot made from output of app-nest.
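Nested sampling explores the parameter box through a prior transform that maps the unit hypercube onto the domain. Assuming uniform priors over the approximation domain (an illustration, not necessarily app-nest's exact implementation), this is a simple affine map:

```python
def prior_transform(u, lo, hi):
    # map unit-cube coordinates u to the parameter box [lo, hi] (uniform priors)
    return [l + (h - l) * x for x, l, h in zip(u, lo, hi)]

point = prior_transform([0.0, 0.5, 1.0], [0.0, 10.0, -1.0], [1.0, 20.0, 1.0])
```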
