contur.factories package

Submodules

contur.factories.test_observable module

class contur.factories.test_observable.Observable(ana_obj, xsec, nev, sm=None)[source]

Bases: object

Processes and decorates a YODA.AnalysisObject into a testable format

Parameters:
  • ana_obj (YODA.AnalysisObject) – YODA AO to dress, containing signal info.

  • xsec (YODA.Scatter1D) – _XSEC scatter recording generator cross section in YODA file (contained in all Rivet run outputs)

  • nev (YODA.Scatter1D) – _EVTCOUNT scatter recording total generated events in YODA file (contained in all Rivet run outputs)

  • sm (SMPrediction) – Standard Model prediction for this observable
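
A hedged construction sketch in Python (the file name and analysis-object path are illustrative; /_XSEC and /_EVTCOUNT are the Rivet bookkeeping scatters described above):

    import yoda
    from contur.factories.test_observable import Observable

    aos = yoda.read("runpoint_0001.yoda")              # hypothetical file name
    xsec = aos["/_XSEC"]                               # generator cross-section scatter
    nev = aos["/_EVTCOUNT"]                            # generated-event count scatter
    signal = aos["/ATLAS_2016_I1469071/d01-x01-y01"]   # hypothetical histogram path
    obs = Observable(signal, xsec, nev)                # the sm prediction is optional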

property data_scale

Scale factor applied to the refdata histogram/scatter

type (float)

doPlot()[source]

Public member function to build yoda plot members for interactive runs

These are only for display, they are not used in any of the statistics calculations.

get_sm_pval()[source]

Calculate the p-value compatibility (using the chi-square survival function) for the SM prediction and this measurement

property has_theory

Bool indicating whether a theory prediction was found for the input signal

type (bool)

property likelihood

The instance of Likelihood derived from this histogram

type (Likelihood)

property ref

Reference data, observed numbers input to test, scaled if required

type (YODA.Scatter2D)

property refplot

Reference data for plotting

type (YODA.Scatter2D)

property scaled

Bool indicating whether additional scaling is applied on top of luminosity

type (bool)

property signal_scale

Scale factor applied to the signal histogram/scatter, derived generally from input nEv and xs

type (float)

property sigplot

Signal for plotting

type (YODA.Scatter2D)

property stack_databg

Stacked, unscaled Signal+background for plotting (data as background)

type (YODA.Scatter2D)

property stack_smbg

Stacked, unscaled Signal+background for plotting (SM as background)

type (YODA.Scatter2D)

property thy

Reference SM theory data, scaled if required

type (YODA.Scatter2D)

property thyplot

Theory for plotting

type (YODA.Scatter2D)

class contur.factories.test_observable.ObservableValues(bin_widths=None, central_values=None, err_breakdown=None, covariance_matrix=None, diagonal_matrix=None, isref=False)[source]

Bases: object

A book-keeping class to contain all the numerical info (central values, err_breakdown, covariance) for a given binned observable.

contur.factories.likelihood module

This module contains the implementation of the likelihood calculation, and various functions to manipulate test statistics.

Abstracted from the underlying YODA objects, this module defines two ways to construct likelihood functions from numerical types:

  • Likelihood – The base likelihood building blocks, representing the information extracted from an underlying histogram of potentially correlated observables.

  • CombinedLikelihood – A shell to combine Likelihood blocks into a full likelihood; automatically encodes the assumption that the included blocks are uncorrelated.

class contur.factories.likelihood.CombinedLikelihood(stat_type='all')[source]

Bases: Likelihood

Shell to combine Likelihood blocks

This class is used to extract the relevant test statistic from each individual Likelihood and combine them. It is initialised with no arguments, since it is just a shell to combine the individual components, and it automatically encodes the fact that each block is uncorrelated with the others.

Two use cases:

  1. Combining subpool likelihoods, where statistics are combined for all stat types.

  2. Building the full likelihood, which is done separately for each stat type, because different histograms can provide the best exclusion in a pool depending on the stat type.

Note

Technically this could be constructed by building a Likelihood with a master covariance matrix formed from the block diagonals of each individual component. Avoiding this is faster but less rigorous.
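
A minimal usage sketch for the subpool case, assuming blocks is a list of already-computed Likelihood instances:

    from contur.factories.likelihood import CombinedLikelihood

    combined = CombinedLikelihood()
    for block in blocks:              # blocks: assumed list of Likelihood objects
        combined.add_likelihood(block)
    combined.calc_cls()               # CLs from the summed test statistics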

add_likelihood(likelihood)[source]

Add a Likelihood block to this combination likelihood

Parameters:

likelihood (Likelihood) – Instance of computed Likelihood

calc_cls()[source]

Call the calculation of the CLs confidence interval

Triggers the parent class calculation of the CLs interval based on the sum of test statistics added with the add_likelihood() method

combine_spey_models()[source]

Combines a list of spey models into a single one. Assumes models are statistically uncorrelated

getCLs(stat_type)[source]

CLs hypothesis test value (ratio of p_s_b and p_b)

stat_type (str)

get_mu_hat(stat_type)[source]

Maximum likelihood estimator of the signal strength parameter.

type (float)

get_mu_upper_limit(stat_type)[source]

Upper limit on the signal strength parameter mu at 95% CLs

type (float)

get_ts_b(stat_type)[source]

Test statistic of the b-only hypothesis

stat_type (str)

get_ts_s_b(stat_type)[source]

Test statistic of the s+b hypothesis

stat_type (str)

set_ts_b(stat_type, value)[source]

class contur.factories.likelihood.Likelihood(calculate=False, ratio=False, profile=False, lumi=1.0, lumi_fb=1.0, sxsec=None, bxsec=None, tags='', sm_values=None, measured_values=None, bsm_values=None, expected_values=None)[source]

Bases: object

Fundamental likelihood-block class and confidence-interval calculator

This class defines the structure of a series of observables to be constructed into a hypothesis test

Keyword Arguments:
  • calculate (bool) – Perform the statistics calculation (otherwise the input variables are just set up)

  • ratio (bool) – Flag if data is derived from a ratio measurement (not general! caveat emptor!)

  • profile (bool) – Flag if data is derived from a profile histogram measurement

  • lumi (float) – the integrated luminosity in the units of the measurement. Used to calculate the expected statistical uncertainty on the signal.

  • lumi_fb (float) – the integrated luminosity for the measurement in fb. Used to scale for the HL-LHC.

  • sxsec (float) – Signal cross-section in picobarns, if non-null

  • tags (string) – names of the histograms this is associated with

  • sm_values (Observable_Values) – All the numbers for the SM prediction

  • measured_values (Observable_Values) – All the numbers for the measurement

  • bsm_values (Observable_Values) – All the numbers for the signal

  • expected_values (Observable_Values) – The SM prediction with data uncertainties.

build_spey_models()[source]

Function to build Spey statistical models for hypothesis testing. For each stat type, the model constructed is a multivariate Gaussian with one parameter, the signal strength.

calculate(stat_type)[source]

Default mode: Calculates the CLs exclusion for this histogram (and this stat type)

Spey mode: Calculates several statistics using the spey statistical models, namely:

  • CLs exclusion

  • 95% CLs upper limit on the signal strength parameter mu

  • Maximum likelihood estimator of the signal strength parameter, muhat
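
A sketch of the default mode, assuming lh is an already-constructed Likelihood block; "smbg" is one of the stat-type strings listed under sort_blocks() below:

    lh.calculate("smbg")          # lh: assumed Likelihood instance
    cls_val = lh.getCLs("smbg")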

calculate_max_mu(stat_type)[source]

Calculate the 95% CLs upper limit on the signal strength parameter mu.

calculate_mu_hat(stat_type)[source]

Calculate maximum likelihood estimator of the signal strength parameter, mu_hat.

cleanup_model_list()[source]

Delete the single-bin models; this can be done after the bin with the highest exclusion power is found

find_dominant_bin(stat_type)[source]

Function to find the bin that gives the highest CLs for cases with no covariance matrix (either the matrix has no inverse or has not been successfully built)

getCLs(type)[source]

CLs hypothesis test value (ratio of p_s_b and p_b)

type (float)

get_mu_hat(type)[source]

Maximum likelihood estimator of the signal strength parameter.

type (float)

get_mu_upper_limit(type)[source]

Upper limit on the signal strength parameter mu at 95% CLs

type (float)

get_ndof(type)[source]

Estimate the number of degrees of freedom for this plot

get_sm_pval()[source]

Calculate the p-value compatibility (using the chi-square survival function) for the SM prediction and this measurement

get_ts_b(type)[source]

Test statistic of the b-only hypothesis

type (float)

get_ts_s_b(type)[source]

Test statistic of the s+b hypothesis

type (float)

property pools

Pool that the test belongs to

settable parameter

type (string)

set_ts_b(type, value)[source]

set_ts_s_b(type, value)[source]

spey_calculate_CLs(stat_type)[source]

Use the statistical model to calculate the CLs exclusion. This is calculated from the profile likelihood ratio

property subpools

Subpool the test belongs to

settable parameter

type (string)

property tags

Name(s) of source histograms for this block

settable parameter

type (string)

contur.factories.likelihood.build_full_likelihood(sorted_blocks, stat_type)[source]

Function to build the full likelihood representing an entire YODA file

This function takes the sorted_likelihood_blocks and combines them, as statistically uncorrelated diagonal contributions, into a CombinedLikelihood instance, which is stored as the likelihood attribute

Keyword Arguments:
  • stat_type (string) – Stat type to build full likelihood for
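
A usage sketch, assuming sorted_blocks came from sort_blocks() below and that the function returns the CombinedLikelihood it builds:

    from contur.factories.likelihood import build_full_likelihood

    full = build_full_likelihood(sorted_blocks, "smbg")
    cls_val = full.getCLs("smbg")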

contur.factories.likelihood.combine_subpool_likelihoods(likelihood_blocks)[source]

build combined likelihoods for any active subpools, and add them to the list of likelihood blocks.

contur.factories.likelihood.likelihood_blocks_find_dominant_ts(likelihood_blocks, stat_type)[source]

Function that finds the chi-square test statistic that gives the maximum confidence level for each likelihood block for which we don’t have a valid covariance matrix (either the matrix has no inverse or has not been successfully built)

contur.factories.likelihood.likelihood_blocks_ts_to_cls(likelihood_blocks, stat_type)[source]

Function that calculates the confidence level for each likelihood block extracted from the YODA file using the signal and background test statistic for the block

contur.factories.likelihood.pval_to_cls(pval_tuple)[source]

Function to calculate a cls when passed background and signal p values.

Notes: we are not actually varying a parameter of interest (mu), just checking mu=0 vs mu=1.

The tail we integrate to get a p-value depends on whether you’re looking for signal-like or background-like tails. For the signal-like p-value we integrate over all the probability density less signal-like than was observed, i.e. to the right of the observed test stat.

For the background-like p-value we should integrate over the less background-like stuff, i.e. from -infty to t_obs… which is 1 - the t-obs…infty integral.

So CLs is the ratio of the two right-going integrals, which is nice and simple and symmetric, but looks asymmetric when written in terms of the p-values because they contain complementary definitions of the integral limits

The code has implemented them both as right-going integrals, so does look symmetric, hence this comment to hopefully avoid future confusion.

Parameters:

pval_tuple (Tuple of floats) – Tuple, first element p-value of signal hypothesis, second p-value of background

Returns:

CLs float – Confidence level in the CLs formalism
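
A numerical illustration of the note above (the numbers are invented, and this is plain arithmetic rather than the library call): with a right-going signal-like integral of 0.05 and a right-going background-like integral of 0.50,

    cls = 0.05 / 0.50   # = 0.1, the ratio of the two right-going integrals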

contur.factories.likelihood.sort_blocks(likelihood_blocks, stat_type, omitted_pools='')[source]

Function that sorts the list of likelihood blocks extracted from the YODA file

This function implements the sorting algorithm to sort the list of all extracted Likelihood blocks in the likelihood_blocks list, storing the reduced list in the sorted_blocks list

Keyword Arguments:
  • stat_type (string) – Which statistic (default, smbg, expected, hlexpected) to sort on.
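
A usage sketch, assuming the function returns the reduced, sorted list it builds:

    from contur.factories.likelihood import sort_blocks

    sorted_blocks = sort_blocks(likelihood_blocks, "smbg")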

contur.factories.likelihood.ts_to_cls(ts_tuple_list, tags)[source]

Method to directly cast a list of tuples of test statistics (tuple contains background and signal test stats) into a list of CLs values

Notes: we are not actually varying a parameter of interest (mu), just checking mu=0 vs mu=1.

The tail we integrate to get a p-value depends on whether you’re looking for signal-like or background-like tails. For the signal-like p-value we integrate over all the probability density less signal-like than was observed, i.e. to the right of the observed test stat.

For the background-like p-value we should integrate over the less background-like stuff, i.e. from -infty to t_obs… which is 1 - the t-obs…infty integral.

So CLs is the ratio of the two right-going integrals, which is nice and simple and symmetric, but looks asymmetric when written in terms of the p-values because they contain complementary definitions of the integral limits

The code has implemented them both as right-going integrals, so does look symmetric, hence this comment to hopefully avoid future confusion.

Parameters:

ts_tuple_list (list) – list of tuples of test statistics (tuples of the form (test stat background, test stat signal))

Returns:

CLs list – List of confidence levels in the CLs formalism

contur.factories.likelihood.ts_to_pval(ts)[source]

Method to convert test statistic to log pval

Parameters:

ts (float, assuming n DoF=1) – Single value or numpy array of test statistics to convert to p-values with a Gaussian

Returns:

p-value float
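
An equivalent conversion sketched with scipy (not the library's own code), using the chi-square survival function for one degree of freedom:

    from scipy.stats import chi2

    pval = chi2.sf(3.84, df=1)   # ~0.05: the familiar 95% point for one DoF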

contur.factories.likelihood_point module

class contur.factories.likelihood_point.LikelihoodPoint(paramPoint={}, yodaFactory=None)[source]

Bases: object

Save the statistical information about a model parameter point in a run. These can then be manipulated to sort the points, and to calculate a full likelihood result, exclusion results, and test-b and test-s+b results, each with the related stat_type and a parameter point dictionary.

If instantiated with a valid parameter dictionary, this will be added as a property. If instantiated with a valid YodaFactory, its likelihood blocks will be associated with this likelihood point.

If these are not provided, a blank point will be created which can be populated later (e.g. from a results database)

Note that in general those likelihood blocks (i.e. the lists of likelihood objects) will not be present, since a results database does not store them. The statistics info can be retrieved from the relevant dictionaries, but not recalculated from scratch, since the signal/background info won’t be available.
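
A construction sketch; the file path and parameter name/value are illustrative:

    from contur.factories.yoda_factories import YodaFactory
    from contur.factories.likelihood_point import LikelihoodPoint

    yf = YodaFactory("runpoint_0001.yoda")                    # hypothetical path
    lp = LikelihoodPoint(paramPoint={"mXd": 100.0}, yodaFactory=yf)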

fill_pool_dict(stat_type)[source]

get_dominant_analysis(stat_type, poolid=None, cls_cut=0.0)[source]

return the analysis object which has the biggest exclusion for this point.

get_dominant_pool(stat_type)[source]

return the name of the dominant pool for this point

get_full_likelihood(stat_type=None)[source]

The full likelihood representing the result file in its entirety.

If stat_type is specified, return the entry for it; else return the dict of all of them.

type (CombinedLikelihood)

get_run_point()[source]

get_sorted_likelihood_blocks(stat_type=None)[source]

The list of reduced component likelihood blocks extracted from the result file, sorted according to the test statistic of type stat_type. If stat_type is None, return the whole dictionary.

type ( list [ Likelihood ])

property likelihood_blocks

The list of all component likelihood blocks extracted from the result file

This attribute holds the total information in the result file, but does not account for potential correlation/overlap between the members of the list

type ( list [ Likelihood ])

recalculate_CLs(stat_type, omitted_pools='')[source]

recalculate the combined exclusion after excluding the omitted pools

Parameters:

omitted_pools (string) – the name of the pool to ignore

resort_blocks(stat_type, omitted_pools='')[source]

Function to sort the sorted_likelihood_blocks list. Used exclusively for resorting after a merge.

Keyword Arguments:
  • stat_type (string) – which statistic type (default, SM background, expected or hlexpected) is being sorted by.

set_full_likelihood(stat_type, value)[source]

set_run_point(run_point)[source]

set_sorted_likelihood_blocks(value, stat_type)[source]
store_param_point(paramPoint)[source]

Parameters:

paramPoint (dict) – key string param name : value float

store_point_info(statType, combinedExclusion, poolExclusion, poolHistos, poolTestb, poolTestsb, obs_excl_dict, yoda_files)[source]

Parameters:
  • statType (string) – represents the stat type

  • combinedExclusion (float) – combined exclusion from the full likelihood for a parameter point

  • poolExclusion (dict) – key string pool name : value double

  • poolHistos (dict) – key string pool name : value string

  • poolTestb (dict) – key string pool name : value double

  • poolTestsb (dict) – key string pool name : value double

contur.factories.depot module

The Depot module contains the Depot class. This is intended to be the high-level analysis control; most user access methods should be implemented at this level

class contur.factories.depot.Depot[source]

Bases: object

Parent analysis class to initialise

This can be initialised as a blank canvas; the desired workflow is then to add parameter space points to the Depot using the add_point() method, which appends each considered point to the object's internal points list. To load points from a results database into the Depot, use the add_points_from_db() method.

Path for writing out objects is determined by cfg.plot_dir
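
A minimal workflow sketch; the file name and parameter dictionary are illustrative:

    from contur.factories.depot import Depot

    depot = Depot()
    depot.add_point("runpoint_0001.yoda", {"mXd": 100.0})
    print(depot.frame)   # pandas.DataFrame of CLs per point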

add_point(yodafile, param_dict)[source]

Add yoda file and the corresponding parameter point into the depot

add_points_from_db(file_path, runpoint=None)[source]

Get the info of model points from the result database into the depot class

@TODO write a “get from DB” method for likelihood_point?

export(path, include_dominant_pools=True, include_per_pool_cls=False)[source]

property frame

A pandas.DataFrame representing the CLs interval for each point in points

type (pandas.DataFrame)

merge(depot)[source]

Function to merge this conturDepot instance with another.

Points with identical parameters will be combined. If a point from the input Depot is not present in this Depot, it will be added.

Parameters:

depot (contur.conturDepot) – Additional conturDepot instance to merge with this one
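
A merge sketch, assuming depot_a and depot_b are two populated Depot instances:

    depot_a.merge(depot_b)       # combine matching points, append new ones
    depot_a.resort_points()      # re-run the sorting after the merge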

property points

The master list of LikelihoodPoint instances added to the Depot instance

type ( list [ LikelihoodPoint ])

resort_points()[source]

Function to trigger rerunning of the sorting algorithm on all items in the depot, typically after the list has been modified by a merge via contur.depot.merge()

write(outDir, args, yodafile=None, include_dominant_pools=True, include_per_pool_cls=True)[source]

Function to write depot information to disk

Writes a results database file to outDir. If cfg.csvfile is not None, also writes out a csv file containing the data.

Parameters:

outDir (string) – String of filesystem location to write out the pickle of this instance to

write_summary_dict(output_opts)[source]

Write a brief text summary of a conturDepot to a (returned) dictionary, intended for use with yoda stream input.

Parameters:

output_opts – list of requested outputs to put in the summary dict

contur.factories.depot.ts_to_cls(ts_tuple_list)[source]

calculate the final CLs value

contur.factories.yoda_factories module

The yoda_factories module contains three main components in the middle of the data flow, sitting between the high-level steering in the contur.factories.Depot class and the lower-level statistics in the contur.factories.Likelihood class

class contur.factories.yoda_factories.YodaFactory(yodaFilePath)[source]

Bases: object

Class controlling Contur's YODA file processing ability

This class is initialised from an os path to a YODA file and dresses it by iterating through each analysis object (ao), wrapping each in an instance of Observable, which encapsulates the YODA analysis object and derives the required Likelihood block for it. This class then contains the aggregated information for all of these instances across the entire YODA file.

Parameters:

yodaFilePath (string) – Valid os.path filesystem YODA file location

contur.factories.yoda_factories.load_bg_data(path, sm_id='A', smtest=False)[source]

load the background (THY) and data (REF) for all the observables associated with this rivet analysis. If smtest, read all SM predictions; if not, only read the default/selected one.

contur.factories.yoda_factories.load_ref_ao(path, orig_ao, aos)[source]

Load the ao with the given path into memory, as a THY or REF object

contur.factories.yoda_factories.load_ref_aos(f, analysis)[source]

Load the relevant analysis objects (REF or THY) from the file f.

contur.factories.yoda_factories.load_sm_ao(path, orig_ao, sm)[source]

Load the ao with the given path into memory, as a THY or REF object

contur.factories.yoda_factories.load_sm_aos(sm)[source]

Load the relevant analysis objects (REF or THY) for the SM prediction sm.

contur.factories.yoda_factories.root_n_errors(ao, is_evcount, nx=0.0, lumi=1.0, replace=False)[source]

Function to include root(number of expected events) errors in the uncertainties of a 2D scatter.

The uncertainty is based on the expected events for the relevant integrated luminosity. This is not about MC statistics!

The minimum uncertainty is one event… we are not doing proper low-stat treatment in tails, so this is a conservative fudge.

Parameters:
  • ao (YODA.AnalysisObject) – The YODA analysis object to be manipulated.

  • nx (float) – factor needed to convert to number of events for non-uniform bin widths (<0: not used; ==0: do nothing).

  • is_evcount (bool) – True if the plot is in event numbers. Otherwise assumed to be a differential cross section.

  • lumi (float) – Integrated luminosity used to get event counts from differential cross sections.

  • replace (bool) – If True, replace the uncertainties. If False (default), add them in quadrature.
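
A hedged call sketch, assuming ao is a YODA 2D scatter for a differential cross-section (the luminosity value is illustrative):

    from contur.factories.yoda_factories import root_n_errors

    root_n_errors(ao, False, lumi=139.0)   # add root(N) errors in quadrature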

Module contents