Method
======
This section outlines the considerations involved in combining the datasets and defining the statistical test.
.. include:: toolchain.rst
Strategy
--------
The current approach considers BSM models in the light of
existing measurements which have already been shown to agree with SM
expectations. Thus this is inherently an exercise in limit-setting
rather than discovery. The assumption is that a generic,
measurement-based approach such as this will not be competitive in terms
of sensitivity, or speed of discovery, with a dedicated search for a
specific BSM final-state signature. However, it will have the advantage
of breadth of coverage, and will make a valuable contribution to physics
at the energy frontier whether or not new signatures are discovered at
the LHC.
In the case of a new discovery, many models will be put forward
to explain the data (as was seen for example :cite:`PhysRevLett.116.150001` after the 750 GeV
diphoton anomaly reported by ATLAS and CMS at the end of 2015 and start
of 2016 :cite:`ATLAS-CONF-2016-018,CMS-PAS-EXO-16-018`).
Checking these models for consistency with existing measurements will be
vital for unravelling whatever the data might be telling us.
Models designed to explain one signature
may have somewhat unexpected consequences in different final states,
some of which have already been precisely measured.
If no BSM signatures are in the end confirmed at the LHC, Contur offers
potentially the broadest and most generic constraint on new physics,
and motivates precise, model-independent measurements
over a wide range of final states, giving the best chance of an indirect
pointer to the eventual scale of new physics.
Dynamical data selection
------------------------
We define a procedure to combine exclusion limits from different
measured distributions. The data used for comparison come in the form
of histograms (or 2D scatter plots), some of which carry information about the correlations
between systematic uncertainties.
There are also overlaps between event samples used in many different measurements,
which lead to non-trivial correlations in the statistical uncertainties.
To avoid spuriously high exclusion rates due to
multiply-counting what might be the same exclusion against several
datasets, we take the following approach:
#. Divide the measurements into groups that have no overlap in the event
samples used, and hence no statistical correlation between them.
These groups, referred to as *pools*, are defined crudely by final state,
experiment, and beam energy (see :doc:`data listing <../datasets/data-list>`).
#. Scan within each group for the most significant deviation between
BSM+SM and SM. This is done distribution-by-distribution and
bin-by-bin within distributions. Use only the measurement with the most significant
deviation, and disregard the rest. Although the selection of the most
significant deviation sounds intuitively suspect, in this case it is
a conservative approach, since we are setting limits, and discarding
the less-significant deviations simply reduces sensitivity. If correlations are
not used (or unavailable) the single most discrepant bin from the most
discrepant measurement is used, removing the dominant effect of
highly correlated systematic uncertainties within a single
measurement. Where a number of statistically independent measurements
exist *within* a pool, their likelihoods may be combined to give a
single likelihood ratio from the pool.
#. Combine the likelihood ratios of the different groups to give a
single exclusion limit.
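
As an illustration of the three steps above, the following minimal Python sketch selects the most significant bin in each pool and then combines the pools. It is not Contur's implementation: the pool and measurement names are hypothetical, each bin is assumed to carry a test-statistic value of the form :math:`-2\ln(\text{likelihood ratio})`, and the pools are assumed to be statistically independent, so that their likelihood ratios multiply and these values add.

.. code-block:: python

   def most_significant(pool):
       """Return (measurement, bin index, q) for the largest q in one pool.

       `pool` maps a (hypothetical) measurement name to a list of per-bin
       test-statistic values q = -2 ln(likelihood ratio).
       """
       candidates = (
           (name, index, q)
           for name, bins in pool.items()
           for index, q in enumerate(bins)
       )
       return max(candidates, key=lambda item: item[2])


   def combine_pools(pools):
       """Keep one bin per pool, then sum q over the independent pools."""
       selected = {name: most_significant(pool) for name, pool in pools.items()}
       total_q = sum(q for _, _, q in selected.values())
       return selected, total_q


   # Illustrative numbers only; these are not real measurements.
   pools = {
       "pool_A": {"distribution_1": [0.1, 2.5, 0.7], "distribution_2": [0.3, 1.9]},
       "pool_B": {"distribution_3": [0.0, 0.4, 3.1]},
   }
   selected, total_q = combine_pools(pools)

Because only one bin per pool enters the sum, discarding the remaining bins can only reduce the combined sensitivity, which is the conservative direction when setting limits.
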
Statistical Method
------------------
The question we wish to ask of any given BSM proposal is *‘at what
significance do existing measurements, which agree with the SM, already
exclude this’*. For all the measurements considered, comparisons to SM
calculations have shown consistency between them and the data. Thus as a
starting point, we take the data as our “null signal”, and we superimpose
onto them the contribution from the BSM scenario under consideration.
The uncertainties on the data will define the allowed space for these
extra BSM contributions.
Since unfolded measurements generally have reasonably high statistics, a simple :math:`\chi^2` method is appropriate and is
used for most of these results, for speed and simplicity. This method has been validated against the
more sophisticated likelihood method described below.
Taking each bin of each distribution considered as a separate statistic
to be tested, a likelihood function for each bin can be constructed as
follows,
.. math::
:label: likely
\begin{aligned}
L(\mu, {b}, {\sigma}_{b}, {s}) = { \frac{(\mu s + b)^{n}}{n!} \exp\big(-(\mu s + b)\big) \times \frac{1}{\sqrt{2 \pi} \sigma_{b}} \exp\left(-\frac{(m - b)^{2}}{2 \sigma_{b}^{2}}\right)} \times \frac{(\tau s)^{k}}{k!}\exp\big(-\tau s\big)\,,\end{aligned}
where the three factors are:
- A Poisson event count. The measurements considered are
differential cross sections, so they are
multiplied by the integrated luminosity taken from the
experimental paper behind each analysis to convert them to an event count
in each bin (and likewise for the additional events that the new
physics would have contributed to the measurement). This term in
each tested bin comprises:
- :math:`s`, the parameter defining the BSM signal event count.
- :math:`b`, the parameter defining the background event count.
- :math:`n`, the observed event count.
- :math:`\mu`, the signal strength parameter modulating the strength
of the signal hypothesis tested, thus :math:`\mu=0` corresponds to
the background-only hypothesis and :math:`\mu=1` the full signal
strength hypothesis;
- A convolution with a Gaussian defining the distribution of the
background count, where the following additional components are
identified:
- :math:`m`, the background count. The expectation value of this
count, which is used to construct the test, is taken as the
central value of the measured data point.
- :math:`\sigma_{b}`, the uncertainty in the background event count,
taken from the data as the 1 :math:`\sigma` error on a Gaussian
(the statistical and systematic uncertainties combined
in quadrature; typically the systematic
uncertainty dominates).
- An additional Poisson term describing the Monte Carlo error on the
simulated BSM signal count with :math:`k` being the actual number of
generated BSM events. The expectation value of :math:`k` is related
to :math:`s` by a factor :math:`\tau`, which is the ratio of the
generated MC luminosity to the experimental luminosity.
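
To make the three factors concrete, the sketch below evaluates the logarithm of the single-bin likelihood in :eq:`likely` using only the Python standard library. The function name and argument order are illustrative assumptions, not part of Contur's API.

.. code-block:: python

   import math


   def log_likelihood(mu, b, s, n, m, k, sigma_b, tau):
       """Log of the single-bin likelihood: Poisson x Gaussian x Poisson.

       lgamma(x + 1) stands in for ln(x!) so that non-integer counts,
       such as Asimov values, are also accepted.
       """
       expected = mu * s + b
       # Poisson term for the observed count n
       poisson_data = n * math.log(expected) - expected - math.lgamma(n + 1)
       # Gaussian constraint on the background, centred on b with width sigma_b
       gauss_background = (
           -0.5 * math.log(2.0 * math.pi)
           - math.log(sigma_b)
           - (m - b) ** 2 / (2.0 * sigma_b ** 2)
       )
       # Poisson term for the number of generated signal events k
       poisson_mc = k * math.log(tau * s) - tau * s - math.lgamma(k + 1)
       return poisson_data + gauss_background + poisson_mc

Working with the logarithm turns the product of factors into a sum, which is the form differentiated in the next step.
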
This likelihood is then used to construct a test statistic based on
the profile likelihood ratio, following the arguments laid out in
Ref. :cite:`Cowan:2010js`. In particular, the
:math:`\tilde{q}_{\mu}` test statistic is constructed. This enables
the setting of a one-sided upper limit on the confidence in the
strength parameter hypothesis, :math:`\mu`, desirable since in the
situation that the observed strength parameter exceeds the tested
hypothesis, agreement with the hypothesis should not diminish. In
addition this construction places a lower limit on the strength
parameter, where any observed fluctuations below the background-only
hypothesis are said to agree with the background-only
hypothesis [3]_. The required information then is the sampling
distribution of this test statistic. This can be evaluated
either using the so-called Asimov data set to build an approximate
distribution of the considered test statistic, or explicitly using
multiple Monte Carlo ‘toy model’ tests [4]_.
The information needed to build the approximate sampling distributions
is contained in the inverse covariance matrix, which is built from the second
derivatives of the log of the likelihood given in equation :eq:`likely`
with respect to the parameters (:math:`\mu, b` and :math:`s`).
They are as follows:
.. math::
:label: likely2
\begin{aligned}
\mu \mu :& &\frac{\partial^2{\text{ln}L}}{\partial{\mu^2}} = & \frac{-ns^2}{(\mu s + b)^2} \\
b b :& &\frac{\partial^2{\text{ln}L}}{\partial{b^2}} = & \frac{-n}{(\mu s + b)^2} - \frac{1}{\sigma_b^2} \\
s s :& &\frac{\partial^2{\text{ln}L}}{\partial{s^2}} = & \frac{-n\mu^2}{(\mu s + b)^2} - \frac{k}{s^2} \\
\mu s = s \mu :& &\frac{\partial^2{\text{ln}L}}{\partial{\mu \partial s}} = & \frac{nb}{(\mu s + b)^2} - 1 \\
\mu b = b \mu :& &\frac{\partial^2{\text{ln}L}}{\partial{\mu \partial b}} = &\frac{-ns}{(\mu s + b)^2} \\
b s = sb :& &\frac{\partial^2{\text{ln}L}}{\partial{s \partial b}} =& \frac{-n\mu}{(\mu s + b)^2}.\end{aligned}
These are arranged in the inverse covariance matrix, with :math:`E` denoting the expectation value, as follows:
.. math::
:label: covar
\begin{aligned}
V^{-1} = - E
\begin{bmatrix}
\mu\mu & \mu s & \mu b \\
s \mu & s s & s b \\
b \mu & b s & b b
\end{bmatrix}
\end{aligned}
The variance of :math:`\mu` is then the :math:`\mu\mu` element of the inverse of the matrix
given in :eq:`covar`:
.. math:: \sigma_\mu^{2} = V_{\mu,\mu}
In order to evaluate this, the counting parameters (:math:`n, m` and
:math:`k`) are evaluated at their Asimov values, following arguments
detailed in Ref. :cite:`Cowan:2010js`. These are taken as
follows,
- :math:`n_{A} = E[n] = \mu' s + b`. The total count under the assumed
signal strength, :math:`\mu'`, which for the purposes of this
argument is equal to 1.
- :math:`m_{A}=E[m] = b`. The background count is defined as following
a Gaussian distribution with a mean of :math:`b`.
- :math:`k_{A} = E[k] = \tau s`. The signal count is defined as following
a Poisson distribution with a mean of :math:`\tau s`.
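
The following sketch shows one way the variance of :math:`\mu` could be evaluated from these ingredients: it fills the matrix of :eq:`covar` with the negated second derivatives of :eq:`likely2` at the Asimov values and takes the :math:`\mu\mu` element of its inverse. The function name and the input numbers are illustrative assumptions, and this is not Contur's own code.

.. code-block:: python

   def asimov_variance_mu(s, b, sigma_b, tau, mu=1.0):
       """Variance of mu from the inverse of the 3x3 matrix in (mu, s, b)."""
       expected = mu * s + b
       n_a = expected   # Asimov value of the observed count n
       k_a = tau * s    # Asimov value of the generated signal count k

       # Negated second derivatives of ln L evaluated at the Asimov values
       a_mm = n_a * s ** 2 / expected ** 2
       a_ss = n_a * mu ** 2 / expected ** 2 + k_a / s ** 2
       a_bb = n_a / expected ** 2 + 1.0 / sigma_b ** 2
       a_ms = 1.0 - n_a * b / expected ** 2
       a_mb = n_a * s / expected ** 2
       a_sb = n_a * mu / expected ** 2

       # (mu, mu) element of the inverse = cofactor / determinant
       cofactor = a_ss * a_bb - a_sb ** 2
       determinant = (
           a_mm * cofactor
           - a_ms * (a_ms * a_bb - a_sb * a_mb)
           + a_mb * (a_ms * a_sb - a_ss * a_mb)
       )
       return cofactor / determinant


   # Illustrative inputs only
   variance = asimov_variance_mu(s=20.0, b=100.0, sigma_b=10.0, tau=10.0)

For these inputs the result is approximately 0.555, which agrees with the simple error-propagation estimate :math:`(\mu s + b + \sigma_b^2)/s^2 + \mu^2/(\tau s)`.
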
Using this data set the variance of the strength parameter, :math:`\mu`,
under the assumption of a hypothesised value, :math:`\mu'`, can be
found. This is then taken to define the distribution of the
:math:`\tilde{q}_{\mu}` statistic, and consequently the size of test
corresponding to the observed value of the count. The size of the test
can be quoted as a :math:`p`-value, or equivalently as the confidence level,
which is the complement of the size of the test (one minus the :math:`p`-value). As is conventional in the
particle physics community, the final measure of statistical agreement
is presented in terms of what is known as the CL\ :math:`_{s}`
method :cite:`Junk:1999kv,Read:2002hq`. Then, for a given
distribution, CL\ :math:`_{s}` can be evaluated separately for each bin,
where the bin with the largest CL\ :math:`_{s}` value (and
correspondingly smallest :math:`p_{s+b}` value) is taken to represent
the sensitivity measure used to evaluate each distribution, a process
outlined in the `Dynamical data selection`_ section.
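
As a hedged illustration of how a per-bin value might be obtained in the asymptotic limit, the sketch below applies the standard large-sample formulae of Ref. :cite:`Cowan:2010js` to a fitted signal strength and its standard deviation. The function names are assumptions, the fitted strength is restricted to be non-negative, and the function returns the conventional ratio :math:`p_{s+b}/(1-p_{b})`, so the corresponding exclusion confidence is one minus the returned value. Contur's actual implementation may differ in detail.

.. code-block:: python

   import math


   def normal_cdf(x):
       """Cumulative distribution function of the standard normal."""
       return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


   def asymptotic_cls(mu_hat, sigma_mu, mu=1.0):
       """Asymptotic CLs ratio for testing signal strength mu.

       mu_hat is the fitted signal strength (0 when the data are taken to
       equal the SM) and sigma_mu is its standard deviation.
       """
       if mu_hat < 0.0:
           raise ValueError("this sketch only covers mu_hat >= 0")
       # One-sided statistic: no penalty when the fit exceeds the tested mu
       sqrt_q = (mu - mu_hat) / sigma_mu if mu_hat <= mu else 0.0
       # Value of the statistic for the background-only Asimov data set
       sqrt_q_asimov = mu / sigma_mu
       p_sb = 1.0 - normal_cdf(sqrt_q)
       one_minus_p_b = normal_cdf(sqrt_q_asimov - sqrt_q)
       return p_sb / one_minus_p_b


   # Data taken as identical to the SM (mu_hat = 0), illustrative sigma_mu
   cls_value = asymptotic_cls(mu_hat=0.0, sigma_mu=1.0)
   exclusion = 1.0 - cls_value

A smaller :math:`\sigma_\mu` gives a smaller ratio and hence a stronger exclusion, which is why the most sensitive bin is the one carried forward.
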
Armed then with a list of selected sensitive distributions with minimal
correlations, a total combined CL\ :math:`_{s}` across all considered
channels can then be constructed from the product of the likelihoods.
This leaves the core of the methodology presented here unchanged; the
effect is simply to extend the covariance matrix. The overall result
gives a probability, for each tested parameter set, that the observed
counts :math:`n_{i}`, across all the measurement bins considered, are
compatible with the full signal strength hypothesis.
Finally it is noted that this methodology has been designed to simply
profile BSM contributions against data taken. This can be extended to
incorporate a separate background simulation or include correlation
between bins where available.
.. [3]
This is not unexpected: the construction up to this point has been
designed to look at smoothly falling, well-measured processes at
energies that the LHC is designed to probe. It is, however, a result
that should be monitored when considering different models.
.. [4]
For the cases considered here the results were found to be equivalent, implying that the tested parameter space values fall into
the asymptotic, or large sample, limit, and so the Asimov approach is used.
Limitations
-----------
Most of the limitations come from the fact that (in its default mode) Contur assumes the data are identically equal to the SM. This is
an assumption that is reasonable for distributions where the uncertainties on the SM prediction are not larger than the
uncertainties on the data. It is also the assumption made in the control regions of many searches,
where the background evaluation is "data driven".
Because of this, Contur as currently implemented is best adapted to identifying kinematic features (mass peaks, kinematic edges) and may be
less reliable for smooth deviations in normalisation. In particular, since we currently take the data to be identically equal to the
SM expectation, we will be insensitive to a signal which might in principle arise as the cumulative effect of a number of
statistically insignificant deviations across a range of experimental measurements.
To do this properly requires an extensive evaluation of the theoretical uncertainties on the SM predictions for each channel. These predictions
and uncertainties are gradually being added to Contur and can be tried out using a command-line option (see :cite:`Brooijmans:2020yij` for a first
demonstration of this).
Additionally, in low-statistics regions, outlying events in the tails of the data will not lead to a weakening of the limit,
as would be the case in a search. However, measurements unfolded to the particle level are typically performed in bins with a
requirement of a minimum number of events in any given bin, reducing the impact of this effect (and also weakening the exclusion
limits).
Although some searches are available in Rivet and can be used by Contur (since :cite:`Brooijmans:2020yij`) by selecting the appropriate command-line option,
our limits generally focus on the impact of high precision measurements on the BSM model, in which systematic uncertainties typically
dominate.
For these reasons, the limits derived are best described as expected limits, seen as delineating regions where the
measurements are sensitive and deviations are disfavoured. In regions where the confidence level is high, they do
represent a real exclusion.