All content in this area was uploaded by Brendan Marsh on Aug 11, 2016

Multivariate Analysis of the Vector Boson Fusion Higgs Boson

Brendan Marsh University of Missouri August 8, 2016

Ph.D. Student Supervisor: Antonio De Maria

Supervisor: Prof. Dr. Arnulf Quadt

Abstract

A multivariate analysis is presented for the study of the vector boson

fusion (VBF) Higgs boson decaying to a pair of tau leptons. While the VBF

production mechanism of the Higgs is roughly an order of magnitude lower

in cross section than the dominant gluon-gluon fusion mechanism, it is

shown that VBF produces a distinctive signature that is well suited for

detection by multivariate analyses. A number of discriminant variables are

explored in addition to a direct comparison of different machine learning

toolkits. Ultimately, a statistical significance of 7.9 is achieved for detection

of the VBF Higgs boson in this truth level study.

Contents

1. Motivation and Background

1.1 The Higgs Boson

1.2 Vector Boson Fusion

1.3 Fully Hadronic Decay Mode

1.4 Background Processes

2. Multivariate Analysis

2.1 Monte Carlo Samples

2.2 Preselection Cuts

2.3 Cut Based Analysis

2.4 Decision Trees

2.5 Adaptive Boosting

2.6 Discriminant Variables

2.6.1 Collinear Approximation

2.6.2 Tau Centrality Product

2.6.3 η Variables

2.6.4 Tau-Jet Angular Correlations

2.6.5 Fox-Wolfram Moments

2.6.6 MVA Variables

2.7 TMVA Multivariate Analysis

2.8 Scikit Learn Multivariate Analysis

3. Conclusions

3.1 Outlook for VBF Higgs Analysis

3.2 Suggestions for Future Studies

3.3 Thanks!

References

1. Motivation and Background

1.1 The Higgs Boson

Within the context of the Standard Model (SM),

the Higgs mechanism is necessary for the mass

generation of the W and Z gauge bosons. By

invoking a break in electroweak symmetry, the

Higgs mechanism implies the existence of a spin

zero, neutral particle; we know this particle as the

Higgs boson.

For many years, the Higgs remained elusive in

particle detectors. It was not until July 4, 2012 that

CERN announced that both the CMS and ATLAS

experiments at the Large Hadron Collider (LHC) met

the 5σ discovery benchmark for a new boson with a

mass of roughly 125 GeV that was consistent with

a Higgs boson. It seems the Higgs has finally been

found!

Many studies of the Higgs boson are ongoing as Run II of the LHC is currently approaching an

online integrated luminosity of 20 inverse femtobarns. As our studies of the Higgs progress, the vector

boson fusion production mechanism becomes increasingly important as a detection pathway, in CP

violation studies [1], and in other areas.

1.2 Vector Boson Fusion

A Standard Model Higgs boson may be produced via one of four production mechanisms at the
LHC. In the vector boson fusion (VBF) mechanism, two incoming quarks each radiate a W or Z
(vector) boson. This pair of vector bosons then fuses to produce a low mass Higgs boson, while
the two scattered quarks continue into the final state.

Figure 2 Left: Feynman diagrams of the four Higgs production mechanisms at the LHC, with vector boson

fusion highlighted in red. Right: Corresponding cross section for Higgs production mechanisms.

Figure 1 The elementary particles of the Standard Model, labelled with their mass, charge, and spin.

One can see from the cross sections that gluon-gluon fusion is roughly an order of magnitude
larger in cross section than VBF for a Higgs of mass 125 GeV [2]. However, the addition of the
two quarks into the final state, visible as highly energetic jets, produces a distinctive
signature that is lacking in gluon-gluon fusion. In terms of measurable quantities, VBF events
may be recognized by the following characteristics:

• Jets highly separated in η
• Jets in opposite hemispheres
• High invariant mass of jets
• No central jets above a certain $p_T$

1.3 Fully Hadronic Decay Mode

The 125 GeV Higgs boson most often decays into a bb̄ pair; however, this decay mode is not easily
recovered in a sea of tt̄ background [3]. The Higgs may additionally decay into a τ⁺τ⁻ pair; this
is the decay mode studied in this analysis. Specifically, I investigate the “fully hadronic”
decay mode in which both tau leptons subsequently decay into a tau neutrino and a number of
pions, which accounts for roughly 41% of ditau decays [2]. A Feynman diagram of the signal
process is given below.

Figure 3 The Feynman diagram of the signal process of this study: Higgs boson production via
vector boson fusion, with the Higgs subsequently decaying into a pair of tau leptons, each of
which decays into a tau neutrino and pions.

1.4 Background Processes

A bit like searching for a needle in a haystack, the VBF Higgs process is a rare event that is
drowned out by background processes with similar event characteristics and much higher cross
sections. To detect a small signal in a sea of background, one’s goal is to remove as much of the
background as possible while retaining as many signal events as possible. Thus, understanding
the background processes that compete with the signal is just as important as understanding the
signal process itself. The main background processes relevant to this study are the Z→ττ and tt̄
processes.

Z→ττ + jets

According to the Particle Data Group [2], the Z boson decays into a pair of tau leptons with a
branching ratio of roughly 3.4%. As Z bosons are produced in excess at the LHC, this channel
introduces a large background with the same final state, a pair of tau leptons. Fortunately,
there are features of VBF that we expect to differ in the case of Z→ττ. Foremost, the invariant
mass of the reconstructed taus should reflect the mass of the particle from which they came,
although mass reconstruction can be difficult (section 2.6.1). For VBF taus we expect to see the
mass of the Higgs, roughly 125 GeV, while for the Z→ττ channel we expect a peak around 91 GeV.
Additionally, the distinctive jet topology of VBF is not expected in the Z→ττ channel.

tt̄

Top quarks almost always decay into a W boson and a b quark, and the W boson may then decay into
a tau lepton and a neutrino. Thus, given two top quarks it is possible to have two taus in the
final state, and tt̄ production, which is also abundant at the LHC, poses another background.
However, a number of features of the tt̄ background make it quite easy to eliminate. Very often
the final state of the tt̄ background contains jets originating from b quarks, while this is rare
for VBF final states. Fortunately, there exist “b-tagging” algorithms capable of labelling jets
in the detector that most likely arise from b quarks. Thus, we may cut out events with b jets,
leaving Z→ττ as the irreducible background. Additionally, in tt̄ events we do not expect to find
any correlations between the tau decay products and the missing transverse energy, unlike VBF in
which they are heavily correlated.

2. Multivariate Analysis

The basic goal of any multivariate analysis (MVA) is to separate signal events from background
events as efficiently as possible, given some input variables for each event. Most MVAs take a
number of input variables and return a single measure of “signal-likeness”; an event is
classified as signal only if this measure exceeds a chosen threshold.

Before diving into the multivariate techniques used for this analysis, the training samples used to

develop and test the analysis will be described, along with the traditional cut based analysis for VBF

and reasons why it can be improved using a multivariate analysis.

2.1 Monte Carlo Samples

Monte Carlo simulations provide a powerful tool for studying stochastic processes. Here, the
Powheg and Pythia 8 Monte Carlo generators were used to simulate truth level events for both VBF
and the relevant background processes at a centre-of-mass energy of √s = 13 TeV. Using these
simulated events, one may train a multivariate analysis method to be applied to real data. The
Monte Carlo samples used for this study are given below.

It is important to note that this was a truth level study only; no reconstruction or trigger
level effects have been incorporated. These effects are non-negligible and should be incorporated
in future studies.

2.2 Preselection Cuts

A number of cuts may be applied to the events before any classifier is used. Some of these cuts

correspond to limitations of the ATLAS detector (corresponding to events that would not be well

reconstructed in practice) while others are made specifically to remove background events. The

preselection cuts used for this analysis are given below. If any event does not fulfill the criteria, it is

discarded from the analysis.

• $\tau_{\text{vis}}$ $p_T$ > 20 GeV: The transverse momentum of both tau leptons must be at least 20 GeV for them to be detected and reconstructed by tau reconstruction algorithms.
• $|\eta_\tau|$ < 2.5: The absolute value of η, the pseudorapidity, of each tau lepton must be less than 2.5 for good reconstruction in the tracker.
• MET > 20 GeV: The missing transverse energy should be greater than 20 GeV, as we expect missing energy from neutrinos in the final state.
• $p_T^{\text{lead}}$, $p_T^{\text{sublead}}$ > 20 GeV: The transverse momentum of the leading and subleading jet should be greater than 20 GeV to be detected.
• No b-tagged jets: B-tagging algorithms can identify jets originating from b quarks, thus b-tagged jets can be cut to eliminate tt̄ background. In truth level studies, one uses the PDG (Particle Data Group) ID to identify and cut b-jets.
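The preselection above amounts to a handful of boolean requirements per event. A minimal sketch follows, assuming a hypothetical event layout (a dictionary with `taus`, `jets` and `met` keys) that is not taken from the original analysis code; only the thresholds come from the list above.

```python
def passes_preselection(event):
    """Return True if a truth-level event survives the preselection cuts.

    Assumed (hypothetical) layout: event["taus"] and event["jets"] are lists
    of dicts with "pt" [GeV] and "eta"; jets also carry a truth-level "is_b"
    flag and are sorted by descending pt; event["met"] is in GeV.
    """
    taus, jets = event["taus"], event["jets"]
    if len(taus) != 2 or len(jets) < 2:
        return False
    # Both visible taus: pT > 20 GeV and |eta| < 2.5
    if any(tau["pt"] <= 20.0 or abs(tau["eta"]) >= 2.5 for tau in taus):
        return False
    # Missing transverse energy from the neutrinos
    if event["met"] <= 20.0:
        return False
    # Leading and subleading jets: pT > 20 GeV
    if jets[0]["pt"] <= 20.0 or jets[1]["pt"] <= 20.0:
        return False
    # Veto any b-tagged jet to suppress the top-pair background
    return not any(jet["is_b"] for jet in jets)


example = {
    "taus": [{"pt": 45.0, "eta": 0.4}, {"pt": 30.0, "eta": -1.1}],
    "jets": [
        {"pt": 80.0, "eta": 2.8, "is_b": False},
        {"pt": 50.0, "eta": -2.1, "is_b": False},
    ],
    "met": 35.0,
}
print(passes_preselection(example))
```

Events failing any single requirement are discarded before the classifier is trained or applied.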

2.3 Cut Based Analysis

The most basic form of classifier, and the one that is often used due to its simplicity and physical

motivation, is a simple cut based analysis. This entails requiring a candidate event to pass a series of

univariate “cuts” which are motivated by knowledge of the signal process. The traditional cuts used to

identify VBF events over background events are given below [4].

• $p_T^{\text{lead}}$ > 40 GeV: VBF produces highly energetic quark jets in the final state, so we expect to see a leading jet with high transverse momentum.
• $p_T^{\text{sublead}}$ > 30 GeV: There are two quark jets in the final state, thus the subleading jet should also have high transverse momentum.
• $|\eta_{\text{lead}} - \eta_{\text{sublead}}|$ > 3: The jets of VBF have characteristically high separation in pseudorapidity.
• $\eta_{\text{lead}} \times \eta_{\text{sublead}}$ < 0: The VBF topology exhibits jets that are back-to-back, in opposite hemispheres of the detector.
• $m_{j_{\text{lead}}+j_{\text{sublead}}}$ > 300 GeV: The highly energetic jets show a high invariant mass.
• Jets-Taus Centrality: The tau leptons should be detected in the central part of the detector in comparison to the jets. Explicitly, the pseudorapidity of the taus should lie within the range spanned by the jets.

The cut based analysis has its advantages; it is very simple to implement, requires no “training”
like the multivariate methods, and the rationale for each of the cuts is grounded in physics.
However, while it excels in its understandability, it often lacks the classification power
required to recover rare processes like the VBF Higgs.

The weakness of the cut based analysis lies in the assumption that each variable can be cut upon
independently of the others when, in fact, the best cut to make on one variable may depend on
another, or even many others. That is, correlations cannot be accounted for. This issue is
addressed by multivariate classification methods like decision trees.

2.4 Decision Trees

Decision trees, like cut based analyses, split events into groups by setting a threshold on some
variable. However, while the cut based analysis only makes a single round of cuts, decision trees
continue to further subdivide groups, separating signal from background more and more at each
step by making the most efficient cut possible. Additionally, the most efficient cuts are
calculated algorithmically from a set of data used to “train” the decision tree.
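For contrast with the trained splits of a decision tree, the fixed cuts of section 2.3 can be written as a single round of hard-coded thresholds. The sketch below assumes hypothetical jet dictionaries with `pt`, `eta` and `phi` keys and treats the jets as massless when computing the dijet mass; neither choice comes from the original analysis code.

```python
import math


def dijet_mass(jet1, jet2):
    """Invariant mass [GeV] of two jets, treated as massless (an approximation)."""
    d_eta = jet1["eta"] - jet2["eta"]
    d_phi = jet1["phi"] - jet2["phi"]
    m2 = 2.0 * jet1["pt"] * jet2["pt"] * (math.cosh(d_eta) - math.cos(d_phi))
    return math.sqrt(max(m2, 0.0))


def passes_vbf_cuts(lead, sublead, tau_etas):
    """Apply the traditional VBF cuts to the two leading jets and the taus."""
    eta_lo, eta_hi = sorted((lead["eta"], sublead["eta"]))
    return (
        lead["pt"] > 40.0
        and sublead["pt"] > 30.0
        and abs(lead["eta"] - sublead["eta"]) > 3.0   # well separated in eta
        and lead["eta"] * sublead["eta"] < 0.0         # opposite hemispheres
        and dijet_mass(lead, sublead) > 300.0
        and all(eta_lo < eta < eta_hi for eta in tau_etas)  # taus between jets
    )


lead = {"pt": 100.0, "eta": 2.5, "phi": 0.0}
sublead = {"pt": 80.0, "eta": -2.0, "phi": 3.0}
print(passes_vbf_cuts(lead, sublead, tau_etas=[0.3, -0.5]))
```

Every threshold here is applied independently of the others, which is exactly the limitation that a trained tree removes.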

Figure 4 A simple decision tree. Here orange represents VBF events while blue represents background events.

At each stage, groups become more purely signal or background by splitting on some variable.

The metric that is normally minimized for each split is the Gini impurity of the current group of

events. It is defined as the probability of incorrectly labelling a random event in the group based on

the known distribution of signal and background within the group. For a binary classification problem,

the Gini impurity for a group of events is given by the following formula:

$$G_I = p_{\text{sig}} \left(1 - p_{\text{sig}}\right) + p_{\text{bg}} \left(1 - p_{\text{bg}}\right)$$
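As an illustration of how this impurity drives a split, the sketch below evaluates the Gini impurity of a group and scans one variable for the threshold that minimizes the size-weighted impurity of the two resulting groups. The function names and the brute-force scan are illustrative assumptions, not the procedure used internally by TMVA or Scikit Learn.

```python
def gini_impurity(n_sig, n_bkg):
    """Gini impurity of a group with n_sig signal and n_bkg background events."""
    total = n_sig + n_bkg
    if total == 0:
        return 0.0
    p_sig = n_sig / total
    p_bkg = n_bkg / total
    return p_sig * (1.0 - p_sig) + p_bkg * (1.0 - p_bkg)


def best_split(values, is_signal):
    """Scan thresholds on one variable; return (threshold, weighted impurity)."""
    pairs = sorted(zip(values, is_signal))
    n = len(pairs)
    best = None
    for k in range(1, n):
        low, high = pairs[k - 1][0], pairs[k][0]
        if low == high:
            continue  # no threshold fits between identical values
        left, right = pairs[:k], pairs[k:]
        sig_left = sum(1 for _, sig in left if sig)
        sig_right = sum(1 for _, sig in right if sig)
        impurity = (
            len(left) * gini_impurity(sig_left, len(left) - sig_left)
            + len(right) * gini_impurity(sig_right, len(right) - sig_right)
        ) / n
        if best is None or impurity < best[1]:
            best = (0.5 * (low + high), impurity)
    return best


print(gini_impurity(50, 50))
print(best_split([1.0, 2.0, 3.0, 4.0], [False, False, True, True]))
```

A pure group has zero impurity, while an even mixture of signal and background gives the maximum value of 0.5.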

Unlike a cut based analysis, which can only form rectangular signal regions in the variable phase

space, decision trees can be grown to approximate arbitrarily complex decision functions. However,

decision trees, too, are not without their flaws. The intuition of a cut based analysis is lost
since the splits are generated algorithmically. Additionally, it is very easy to grow a tree so
deep that it begins to recognize individual points in the training data, becoming artificially
complex. This phenomenon is well known in the field of machine learning, and is commonly known as
“overtraining”. To address this issue, a technique known as boosting is used here, as opposed to
older “pruning” methods which grow full decision trees and then backtrack and discard
unimportant splits.

2.5 Adaptive Boosting

Adaptive boosting, or AdaBoost, is a general method that can be applied to a number of

classifiers, such as decision trees, to improve reliability, performance, and resistance to

overtraining. In the context of adaptive boosting of decision trees, the single decision tree is replaced

by a “forest” consisting of hundreds of decision trees which are restricted to only a few levels, such

as the one above. As a whole, this forest of decision trees is called a boosted decision tree (BDT),

and the output of the BDT is a weighted sum of the outputs of each individual tree.

Each individual decision tree is called a “weak learner” in the sense that, being so shallow, it

classifies only modestly well on its own and is only one of many classifiers in the forest. Here is where the adaptive boosting comes in; each weak learner is trained

iteratively to improve upon the previous one. The first weak learner is trained as a normal decision

tree from the training data. However, the results of the first weak learner are then used to weight the

importance of the training data for the next weak learner; points that were classified correctly receive

small weights while incorrectly classified points receive large weights. In this way, the next weak

learner is trained focusing on points that have not been classified well by the previous weak learner.

This process continues such that each weak learner focuses on correcting mistakes of the last,

improving at each step. The process is visualised below.

Figure 6 A view of the transverse plane depicting the collinear approximation. The tau neutrinos
travel collinearly with the tau leptons such that their sum matches the missing transverse energy.

Figure 5 Training of an AdaBoost classifier. The first classifier trains on unweighted data, then
reweights the data for the next, and so on, to produce the final classifier.
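The reweighting loop can be made concrete with a small, dependency-free sketch. It uses the textbook discrete AdaBoost update with one-cut “stumps” as weak learners and labels of +1 (signal) and -1 (background); this is an illustrative assumption and not the exact boosting variant or tree depth configured in TMVA for this study.

```python
import math


def train_stump(X, y, w):
    """Find the one-variable cut with the lowest weighted error.

    Returns (error, variable index, threshold, sign); the stump predicts
    `sign` for events above the threshold and `-sign` otherwise.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (+1, -1):
                err = sum(
                    wi for wi, row, yi in zip(w, X, y)
                    if (sign if row[j] > thr else -sign) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign


def adaboost(X, y, n_trees=10):
    """Train a forest of stumps; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # the first weak learner sees unweighted data
    forest = []
    for _ in range(n_trees):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-12), 1.0 - 1e-12)  # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)     # weight of this learner
        forest.append((alpha, stump))
        # Misclassified events gain weight, correctly classified ones lose it
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, row))
            for wi, yi, row in zip(w, y, X)
        ]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return forest


def bdt_score(forest, row):
    """Weighted sum of the weak learner outputs."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in forest)


# Toy data that no single cut can separate: signal at both ends, background in the middle
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [+1, +1, -1, -1, +1, +1]
forest = adaboost(X, y, n_trees=3)
print([round(bdt_score(forest, row), 3) for row in X])
```

On this toy sample a single stump misclassifies two events, while the weighted sum of three stumps separates all six, which is the behaviour the figure describes.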

2.6 Discriminant Variables

When training a BDT, a balance should be found between the number of variable inputs to the

BDT and the performance of the BDT. Additionally, while BDTs are known to handle correlated

variables quite well, it is superfluous to include two strongly correlated variables, only one of which

adds discriminatory power to the classification.

Much of my work this summer was spent investigating variables, both common and newly

devised, to search for new discriminating variables for use in a multivariate analysis. The most

important in the analysis was the ditau mass, calculated via the collinear approximation.

2.6.1 Collinear Approximation

In the case of VBF, the mass of the ditau should correspond to the mass of the Higgs, for Z→ττ

the mass of the Z boson, and for tt̄ we expect no clear peak. Thus, there are good physical motivations

for the use of the ditau mass in our MVA. However, in order to fully reconstruct the ditau one needs

the missing neutrinos. The collinear approximation accounts for the missing neutrinos by making the

following assumptions.

1. The tau neutrinos are perfectly collinear with their associated tau lepton.

2. The missing transverse energy is entirely due to the tau neutrinos.

Under these approximations, the magnitude

of the neutrino momenta becomes completely

determined by the missing transverse energy.

One is then left with a simple matter of

constructing the neutrinos collinearly with the

taus such that the sum of the neutrinos is

precisely the missing transverse energy.

The collinear approximation is not always applicable; when the tau leptons are emitted back to
back in the transverse (φ) plane, the missing transverse energy cannot be uniquely divided
between the two neutrinos and the reconstruction fails. This leads to a simple constraint
between taus:

$$\cos\Delta\phi > -0.99$$
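The two assumptions translate into a 2×2 linear system: the missing transverse energy is written as a sum of two vectors, each parallel to one visible tau, and the two scale factors fix the neutrino momenta. The sketch below is one way to do this, assuming four-vectors given as plain `(E, px, py, pz)` tuples in GeV; the rejection of negative solutions and of a vanishing determinant are additional guards chosen here, not requirements stated in the text.

```python
import math


def collinear_mass(tau1, tau2, met, min_cos_dphi=-0.99):
    """Ditau mass [GeV] in the collinear approximation, or None if not applicable.

    tau1, tau2: visible tau four-vectors (E, px, py, pz); met: (mex, mey).
    """
    e1, px1, py1, pz1 = tau1
    e2, px2, py2, pz2 = tau2
    mex, mey = met
    pt1, pt2 = math.hypot(px1, py1), math.hypot(px2, py2)
    if pt1 == 0.0 or pt2 == 0.0:
        return None
    # Constraint from the text: reject (nearly) back-to-back taus
    cos_dphi = (px1 * px2 + py1 * py2) / (pt1 * pt2)
    if cos_dphi <= min_cos_dphi:
        return None
    # Solve a * pT(tau1) + b * pT(tau2) = MET for the neutrino scale factors
    det = px1 * py2 - py1 * px2
    if abs(det) < 1e-9 * pt1 * pt2:
        return None  # taus parallel in the transverse plane: no unique solution
    a = (mex * py2 - mey * px2) / det
    b = (px1 * mey - py1 * mex) / det
    if a < 0.0 or b < 0.0:
        return None  # a neutrino pointing against its tau is unphysical
    # Full tau = visible tau + collinear neutrino = (1 + scale) * visible tau
    s1, s2 = 1.0 + a, 1.0 + b
    energy = s1 * e1 + s2 * e2
    px, py, pz = s1 * px1 + s2 * px2, s1 * py1 + s2 * py2, s1 * pz1 + s2 * pz2
    m2 = energy**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0))


# Two massless visible taus at right angles, with neutrinos carrying
# 0.5 and 0.25 times the visible momenta respectively
tau1 = (50.0, 50.0, 0.0, 0.0)
tau2 = (40.0, 0.0, 40.0, 0.0)
met = (25.0, 10.0)
print(collinear_mass(tau1, tau2, met))
```

Returning `None` rather than a number keeps events outside the validity region of the approximation from polluting the mass distribution.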

Historically, the collinear approximation has relied upon using the charged decay products of the

tau leptons, be it either 1-prong or 3-prong decays. However, the decay products may also include a

neutral pion. Recently, tau substructure algorithms have become available that allow for reconstruction

of the entire visible (charged + neutral) tau [5]. One of my first studies was on the marked improvement

in the collinear approximation as a result of using the entire visible tau.

Figure 7 The collinear approximation using the charged tau leptons (left) and the full visible tau leptons (right).

The blue histograms represent VBF and red represents combined backgrounds scaled appropriately. All

distributions normalized to unity, and units are in GeV.

As you can see, there is a remarkable improvement using tau substructure techniques to

reconstruct the visible tau. In future studies, I suggest applying smearing of the transverse momentum

or otherwise modelling imprecision in the detector to see if the collinear approximation remains as

robust as it is in this truth study. Needless to say, this variable made it to the final MVA.

2.6.2 Tau Centrality Product

In the context of VBF topology, centrality has been used as a flag indicating whether or not a tau

lepton is centrally located in the detector with respect to the jets. Explicitly, a tau lepton is central if

its pseudorapidity lies in the range spanned by the leading and subleading jet. To generalize this

binary variable to a continuous variable, which is more powerful in multivariate analyses, the

following definition has been suggested [6].

$$C_\tau = \exp\left[-\left(\frac{\eta_\tau - \eta_{\text{avg}}}{\Delta\eta}\right)^2\right] \quad \text{where} \quad \eta_{\text{avg}} = \frac{\eta_{\text{lead}} + \eta_{\text{sublead}}}{2}, \quad \Delta\eta = \eta_{\text{lead}} - \eta_{\text{sublead}}$$

A perfectly central tau lepton (with exactly the average η of the jets) will have a centrality of
one, while a tau lepton far from the average η of the jets will have centrality close to zero.
Note that if the jets are not well separated in η, the centrality also approaches zero.

The authors of this continuous centrality variable used the centrality of the two taus as independent

variables. However, I found the two variables to have an 88% positive correlation for VBF. By taking

the product of the two tau centralities, a single uncorrelated variable is achieved with greater

separation power than either of the individual centralities.

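$$C_{\text{prod}} = C_{\tau_1} \times C_{\tau_2} = \exp\left[-\left(\frac{\eta_{\tau_1} - \eta_{\text{avg}}}{\Delta\eta}\right)^2 - \left(\frac{\eta_{\tau_2} - \eta_{\text{avg}}}{\Delta\eta}\right)^2\right]$$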
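A short sketch of the continuous centrality and its product, following the exponential form written above. The function names are illustrative, and returning zero when the two jets have identical η is a limiting-case choice made here to match the remark that poorly separated jets give a centrality approaching zero.

```python
import math


def tau_centrality(eta_tau, eta_lead, eta_sublead):
    """Continuous centrality of one tau with respect to the two leading jets."""
    eta_avg = 0.5 * (eta_lead + eta_sublead)
    d_eta = eta_lead - eta_sublead
    if d_eta == 0.0:
        return 0.0  # limiting case: jets not separated in eta
    return math.exp(-(((eta_tau - eta_avg) / d_eta) ** 2))


def centrality_product(eta_tau1, eta_tau2, eta_lead, eta_sublead):
    """Product of the two tau centralities, used as a single MVA input."""
    return (tau_centrality(eta_tau1, eta_lead, eta_sublead)
            * tau_centrality(eta_tau2, eta_lead, eta_sublead))


print(tau_centrality(0.0, 2.0, -2.0))           # tau exactly between the jets
print(centrality_product(0.5, -0.5, 2.0, -2.0))
```

Because both factors lie between zero and one, the product is large only when both taus sit between the jets.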

Figure 8 The centrality of the individual tau leptons (left and centre) vs. the product of tau centrality (right).

Given the redundancy of the correlated variables and increased separation power of the product

variable, it was the centrality product variable that made it to the final multivariate analysis.

2.6.3 η Variables

Variables explicitly related to the pseudorapidity of the leading and subleading jets are common in

analyses of the VBF Higgs, including the cut based analysis already presented. On the surface, these

variables seem well suited to multivariate analysis as well given their separation power. However, I

found that these traditional VBF variables are highly correlated with the invariant mass of the jets.

Figure 9 Δη (centre) and $\eta_{\text{lead}} \times \eta_{\text{sublead}}$ (right) of the leading and subleading jets,
along with their correlations to the invariant mass of the jets (left).

Given the strong correlations within this group of variables, I was not surprised to find that
eliminating Δη and $\eta_{\text{lead}} \times \eta_{\text{sublead}}$ from the MVA led to no decrease in performance of the
BDT. The invariant mass of the jets displayed the greatest separation power (see figure 11);
thus, despite their prevalence in traditional VBF studies, I have chosen to exclude Δη and
$\eta_{\text{lead}} \times \eta_{\text{sublead}}$ from the final analysis.

2.6.4 Tau-Jet Angular Correlations

The Higgs boson is a spin 0 particle; Z bosons are spin 1 particles. My Ph.D. supervisor and I were

interested in whether or not this difference in spin quantum number manifests itself in angular

correlations between the tau leptons themselves or between tau leptons and the leading and

subleading jet. A number of variables were investigated, boosted into different reference frames,

probing any angular correlations.

Selected Angular Variables

• Taus ΔR: The ΔR separation of the two tau leptons.
• Taus φ Centrality: The same as the continuous tau centrality variable, but in φ instead of η.
• Jets-Taus Plane Angle: Total angle between the two planes formed by the tau leptons and the jets.
• Jets Plane η: η of the normal vector to the plane formed by the two jets.

The angular relationships amongst the tau leptons and jets, beyond the expected VBF jet topology,
seem to be subtle if they exist at all. While the ΔR of the taus above shows modest separation,
inclusion in the MVA yielded no improvement, and unfortunately the angle between the tau plane
and jet plane seems indistinguishable between VBF and background. Boosting to various
centre-of-mass reference frames generally had little effect on separation power.

2.6.5 Fox-Wolfram Moments

The Fox-Wolfram moments are a set of event descriptors that are currently under investigation as
more advanced replacements for traditional cuts [7]. The moments arise from superpositions of
spherical harmonics, defined as follows.

$$H_\ell^x = \sum_{i,j} W_{i,j}^x \, P_\ell(\cos\Omega_{i,j})$$

Above, the sum goes over any number of objects in the event (such as the leading and subleading
jet for the VBF topology), $\Omega_{i,j}$ corresponds to the total angle between the i’th and j’th
objects, and $P_\ell$ are the Legendre polynomials. The weight term $W_{i,j}^x$ may take many forms,
such as the unit weight and the transverse momentum weight used below.
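A sketch of how such moments might be evaluated for a handful of objects is given below. The explicit weight definitions are assumptions, since they are not spelled out in the text: the unit weight is taken as 1/N² and the transverse momentum weight as p_T,i p_T,j / (Σ p_T)², with the double sum including the i = j terms. Objects are assumed to be plain `(px, py, pz)` tuples.

```python
import math


def legendre(order, x):
    """Legendre polynomial P_order(x) via Bonnet's recursion."""
    if order == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, order):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def fox_wolfram(objects, order, weight="unit"):
    """Fox-Wolfram moment of the given order for a list of (px, py, pz) momenta.

    weight="unit": W_ij = 1 / N^2 (assumed form).
    weight="pt":   W_ij = pT_i * pT_j / (sum of pT)^2 (assumed form).
    """
    n = len(objects)
    norms = [math.sqrt(px * px + py * py + pz * pz) for px, py, pz in objects]
    pts = [math.hypot(px, py) for px, py, _ in objects]
    sum_pt = sum(pts)
    total = 0.0
    for i, a in enumerate(objects):
        for j, b in enumerate(objects):
            dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
            cos_omega = max(-1.0, min(1.0, dot / (norms[i] * norms[j])))
            if weight == "unit":
                w = 1.0 / n**2
            else:
                w = pts[i] * pts[j] / sum_pt**2
            total += w * legendre(order, cos_omega)
    return total


# Two back-to-back jets: odd moments vanish, even moments are maximal
jets = [(30.0, 0.0, 100.0), (-30.0, 0.0, -100.0)]
print([round(fox_wolfram(jets, order), 6) for order in range(4)])
```

With either weight normalized this way the zeroth moment equals one, which gives a quick sanity check on any implementation.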

A preliminary study of the Fox-Wolfram moments in the analysis of VBF has shown that the moments
display considerable separation power; however, including them in the multivariate analysis has
not improved the classification efficiency. Included below are plots of two sets of Fox-Wolfram
moments. On the left, only the leading and subleading jets were considered, and the best weight
was found to be the unit weight. On the right, both tau leptons are also included as objects in
the moment calculations, for which the transverse momentum weighting scheme was found to be best.

Figure 10 The first four Fox-Wolfram moments considering only jets, with a unit weighting (left).
The first four Fox-Wolfram moments considering jets and tau leptons, with transverse momentum
weight (right).

While only the first four moments are displayed here for brevity, the odd and even moments are
highly correlated though distinct. Unfortunately, my time has run short to fully investigate the
Fox-Wolfram moments as potentially useful discriminating variables in the multivariate analysis.
For future studies, I would suggest exploring the “modified” Fox-Wolfram moments, which are
invariant to Lorentz boosts, and any correlations that may exist between the moments and the MVA
variables already in use.

2.6.6 MVA Variables

The final list of variables for use in the multivariate analysis was pruned down starting with
roughly ten variables that showed the strongest separation power. After identifying correlations
and removing variables that led to no improvement in classification efficiency, the following
variables remain in the final analysis.

• $m_{\tau_1+\tau_2}$: The invariant mass of the ditau, reconstructed via the collinear approximation using the full visible tau leptons.
• $m_{j_{\text{lead}}+j_{\text{sublead}}}$: The invariant mass of the leading and subleading jets.
• $C_{\tau_1} \times C_{\tau_2}$: The product of the centrality of the two tau leptons.
• $p_T^{\text{lead}} + p_T^{\text{sublead}}$: The scalar sum of the transverse momenta of the leading and subleading jets.
• $p_T^{\text{lead}+\text{sublead}}$: The transverse momentum of the vector sum of the leading and subleading jets.

[Figure: linear correlation matrices (coefficients in %) of the MVA variables ditauMass, mjj, sumPT, PTsum, and tausCentrality, shown separately for signal and background.]

Figure 11 Discriminatory variables for the multivariate analysis. The blue histograms represent VBF and red

represents combined backgrounds scaled appropriately. All distributions normalized to unity, masses and

momenta are in units of GeV.

2.7 TMVA Multivariate Analysis

This multivariate analysis was performed at a centre-of-mass energy of √s = 13 TeV and at an
integrated luminosity of 20 inverse femtobarns, corresponding roughly to current Run II
conditions at the LHC. The ROOT analysis framework (or my preference, the Python adaptation
PyROOT) provides a toolkit for multivariate analysis known as TMVA [8]. This toolkit was utilized
to train a boosted decision tree using the discriminant variables presented in section 2.6.6. I
was interested in comparing the performance of TMVA with the well-known Python machine learning
library Scikit Learn. To this end, a boosted decision tree was optimized in TMVA and compared
with an identically parameterized boosted decision tree trained in Scikit Learn.

Optimization of the BDT parameters in TMVA was carried out through single scans over parameters
like the number of trees or tree depth. A full multivariate sweep over parameter settings and
variables was simply too computationally expensive and out of the scope of this project. Should
one like to take this analysis to the next step, I would recommend performing such a multivariate
sweep over BDT parameter settings. The final configuration of the BDT parameters that were found
to be important is given in the accompanying table.

When training and testing any multivariate method, one must be careful to weight the training
data correctly; while we have a similar amount of training data for both the VBF and background
processes, in reality the number of background events is much larger than the number of signal
events. Thus a weight needs to be applied to events from each process to correct for their
relative abundance.

$$w = \frac{N_{\text{exp.}}}{\mathcal{L}_{\text{MC}}\,\sigma} = \frac{\mathcal{L}_{\text{exp.}}\,\sigma}{\mathcal{L}_{\text{MC}}\,\sigma}$$

where $\mathcal{L}_{\text{MC}}\,\sigma$ is provided by the Monte Carlo sample.

Cross sections were determined for each Monte Carlo sample from the TWiki cross section

summaries of the MC15 samples for Run II analyses. Given these cross sections and an integrated

luminosity of 20 inverse femtobarns, the expected number of events may be calculated. Additionally,

the percentage of events that pass the preselection criteria presented earlier may be calculated per

sample, and then applied to determine the expected number of events after preselection.

Process    Cross Section (pb)    Events at ℒ = 20 fb⁻¹    Events (Preselected)
VBF        9.993941 × 10⁻²       1,999                    398
Z→ττ       1.950632 × 10³        39,012,642               1,148,098
tt̄         4.515915 × 10²        9,031,830                26,394
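The arithmetic behind the event counts is a product of cross section and integrated luminosity, and the per-event weight is the ratio of expected to simulated events. The sketch below reproduces the expected yields from the cross sections quoted in the table; the function names are illustrative, and the generated-event count passed to `event_weight` is a hypothetical number, since the sample sizes are not given in the text.

```python
LUMI_PB = 20.0 * 1000.0  # 20 inverse femtobarns expressed in inverse picobarns

# (cross section in pb, events after preselection), as quoted in the table
SAMPLES = {
    "VBF": (9.993941e-2, 398),
    "Z->tautau": (1.950632e3, 1148098),
    "ttbar": (4.515915e2, 26394),
}


def expected_events(xsec_pb, lumi_pb=LUMI_PB):
    """Expected number of events: cross section times integrated luminosity."""
    return xsec_pb * lumi_pb


def event_weight(n_expected, n_generated):
    """Per-event weight that scales a simulated sample to its expected yield."""
    return n_expected / n_generated


for name, (xsec_pb, n_presel) in SAMPLES.items():
    n_exp = expected_events(xsec_pb)
    print(f"{name:10s} expected {n_exp:12.0f}  preselection efficiency {n_presel / n_exp:.4f}")

n_bkg = SAMPLES["Z->tautau"][1] + SAMPLES["ttbar"][1]
print(f"signal : background = 1 : {n_bkg / SAMPLES['VBF'][1]:.0f}")

# Hypothetical sample size, for illustration only
print(event_weight(n_expected=398, n_generated=100000))
```

The same weights must be used consistently in training and in the significance calculation, otherwise the classifier is optimized for the wrong signal to background mixture.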

As was expected from eliminating b-tagged jets, the tt̄ background is more than decimated, leaving
Z→ττ as the main background. Roughly speaking, the signal to combined background ratio is a
staggering 1:3000!

The metric for defining the optimal cut value of the classifier is the statistical significance defined

as follows, where “s” is the number of signal events and “b” the number of background events. For a

Poisson random variable, the standard deviation is the square root of the expected number of

events, $\sqrt{s+b}$. The following statistical significance therefore measures the number of signal events relative

to one standard deviation.

Statistical Significance $= \frac{s}{\sqrt{s+b}} \approx \frac{s}{\sqrt{b}}$ for $b \gg s$

Thus, this definition of the statistical significance can either be interpreted as the number of signal

events relative to one standard deviation or, if b is much larger than s, as is usual, the number of signal

events over the background fluctuation level.
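
As a small worked example, the definition and its large-background approximation can be written as two Python functions. The function names are hypothetical; the event counts are the 398 signal and 1174098 background events quoted by TMVA before any cut on the classifier output.

```python
import math


def significance(s, b):
    """Statistical significance s / sqrt(s + b)."""
    return s / math.sqrt(s + b)


def significance_large_b(s, b):
    """Approximation s / sqrt(b), valid when b is much larger than s."""
    return s / math.sqrt(b)


# Event counts before any cut on the classifier output.
s, b = 398, 1_174_098
exact = significance(s, b)
approx = significance_large_b(s, b)
print(f"s/sqrt(s+b) = {exact:.4f}, s/sqrt(b) = {approx:.4f}")
```

With b this much larger than s the two expressions agree to better than one part in a thousand, and both are well below 1 before the classifier cut is applied.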

The TMVA output classifier along with the optimal cut value after training a boosted decision tree

using the parameters given above is shown below.

[Figure: TMVA overtraining check for classifier BDT. Normalized distributions, (1/N) dN/dx, of the BDT response for signal and background in the training and test samples. Kolmogorov-Smirnov test: signal (background) probability = 0.008 (0.016).]

[Figure: Cut efficiencies and optimal cut value. Signal efficiency, background efficiency, signal purity, signal efficiency*purity, and significance $S/\sqrt{S+B}$ as a function of the cut value applied on the BDT output. For 398 signal and 1174098 background events, the maximum $S/\sqrt{S+B}$ is 7.9024 when cutting at 0.1453.]

The final statistical significance of the classifier reaches 7.9, although the significance curve becomes

noisy, most likely because of statistical fluctuations among such heavily weighted background events. Even under

a conservative reading, the statistical significance can be said to be roughly 6 at minimum. The full interpretation

of the outcome will be discussed in the conclusion.

2.8 Scikit Learn Multivariate Analysis

Scikit Learn (SKL) is a free, general machine learning library for Python [9]. Given its popularity

and ease of use, I was interested to see how SKL compares to TMVA in terms of final classifier

efficiency, ease of use, and configurability.

SKL supports all of the machine learning methods implemented by TMVA and many more, and in

the case of boosted decision trees supports many of the same configuration options. However,

unlike TMVA, SKL does not directly provide the user with plots (classifier output distributions,

optimum cuts, correlation matrices) via a nice GUI. Code had to be written to randomize the training and

test samples, to view the output classifier distribution, to calculate the maximum statistical

significance, and for other tasks.
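
The kind of helper code described here can be sketched with the Python standard library alone. This is a minimal illustration, not the code written for the study: the function names, the toy Gaussian scores standing in for classifier outputs, and the per-class weights are all assumptions.

```python
import math
import random


def train_test_split(events, train_fraction=0.5, seed=0):
    """Shuffle a copy of the events and split it into training and test lists."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def max_significance(sig_scores, bkg_scores, sig_weight=1.0, bkg_weight=1.0):
    """Scan cuts on the classifier output; return (best significance, best cut).

    Events with score >= cut are kept. Every event of a class carries the same
    weight, which scales the sample to its expected yield.
    """
    best_z, best_cut = 0.0, None
    for cut in sorted(set(sig_scores) | set(bkg_scores)):
        s = sig_weight * sum(1 for x in sig_scores if x >= cut)
        b = bkg_weight * sum(1 for x in bkg_scores if x >= cut)
        if s + b > 0:
            z = s / math.sqrt(s + b)
            if z > best_z:
                best_z, best_cut = z, cut
    return best_z, best_cut


# Toy classifier outputs standing in for real BDT scores (purely synthetic).
rng = random.Random(1)
signal = [rng.gauss(0.10, 0.05) for _ in range(1000)]
background = [rng.gauss(-0.05, 0.05) for _ in range(1000)]
sig_train, sig_test = train_test_split(signal)
bkg_train, bkg_test = train_test_split(background)

# Illustrative weights only: expected yield divided by number of test events.
z, cut = max_significance(sig_test, bkg_test, sig_weight=0.8, bkg_weight=2300.0)
print(f"max significance {z:.2f} at cut {cut:.3f}")
```

Because each background event carries a large weight, a single background event passing a tight cut changes b by a full weight, so the scanned significance can jump sharply between neighbouring cut values.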

For a direct comparison of TMVA and SKL, a boosted decision tree was trained in SKL with

the same parameters as were used for TMVA. The resulting output classifier is given below.

Max. Statistical Significance: 3.5

SKL performed worse in many regards. As

can be seen from the shape of the output

classifiers, there is much more overlap

between signal and background even when

trained identically to TMVA, leading to roughly

half the statistical significance achieved

by TMVA (the significance is shown as the green

line, not to scale). Additionally, SKL took almost five

times longer to train the BDT.

3. Conclusions

3.1 Outlook for VBF Higgs Analysis

Overall, the development of a multivariate analysis for the detection of a VBF Higgs boson

decaying to a pair of tau leptons with subsequent hadronic decays was quite successful. A theoretical

basis was developed to understand the signal process and main backgrounds at play. With only a few

basic preselection cuts, the vast majority of $t\bar{t}$ background was eliminated, leaving the $Z \to \tau\tau$ process

as the main background. From knowledge of the underlying physics, a number of candidate

discriminant variables were explored for use in the multivariate analysis. Deserving of special attention

is the reconstructed ditau mass using the collinear approximation, which has shown very promising

improvements in mass resolution with the introduction of the tau substructure reconstruction algorithm.

Some of the variables typically associated with vector boson fusion, such as the distinctively large

separation in pseudorapidity of the leading and subleading jet, were found to be highly correlated and

did not make it into the final analysis. Both TMVA and Scikit Learn were used to train boosted decision

trees; TMVA provided faster results with better classification power, and a convenient interface for

producing plots. The final statistical significance of the VBF signal reached 7.9.

Many aspects of the study, including the final statistical significance, must be kept in context. First

and foremost, the entire study was performed purely at truth level: no trigger-level effects

were accounted for, no detector effects beyond simple preselection cuts on pseudorapidity ranges

were included, and no reconstruction-level effects were considered. These effects may be important

and should be taken into account in further analyses. Additionally, every algorithm, in particular

the b-tagging, tau ID, and tau substructure algorithms, has an associated efficiency. At truth level, these

efficiencies are not modelled, and they will further decrease performance at reconstruction level.

Nevertheless, I hope that this multivariate analysis serves as a useful proof of concept for a full scale

multivariate analysis in which all of the above issues are addressed. Finally, I hope this study has

provided insight into the nature of the vector boson fusion production pathway of the Higgs and into

associated variables that may be used in the analysis.

3.2 Suggestions for Future Studies

The collinear approximation performed surprisingly, perhaps suspiciously, well once the entire

visible tau was used as opposed to the charged tau products. It is possible that the collinear

approximation is in fact valid much of the time; however, I have strong suspicions that

it will not work as well on reconstructed data. One way this could be studied within a truth-level study is

by “smearing” the transverse momentum of all objects in the

event (adding zero-mean Gaussian noise) to simulate reconstruction inaccuracy and observing how well the collinear approximation holds

up. Additionally, one could test explicitly how collinear the neutrinos are with their respective tau leptons

by studying the angular separation between the neutrino and tau on the truth level.
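
The smearing idea can be sketched as follows. This is a hypothetical illustration: the function name, the choice of a noise width proportional to the transverse momentum, the 10% resolution, and the example pT values are all assumptions rather than anything fixed by the study.

```python
import random


def smear_pt(pt, relative_resolution, rng):
    """Return pt with zero-mean Gaussian noise of width relative_resolution * pt added."""
    smeared = pt + rng.gauss(0.0, relative_resolution * pt)
    return max(smeared, 0.0)  # a transverse momentum cannot be negative


rng = random.Random(42)
truth_pts = [45.0, 38.0, 120.0, 60.0]  # hypothetical truth-level pT values in GeV
smeared_pts = [smear_pt(pt, 0.10, rng) for pt in truth_pts]
for truth, smeared in zip(truth_pts, smeared_pts):
    print(f"truth pT = {truth:6.1f} GeV, smeared pT = {smeared:6.1f} GeV")
```

The collinear mass would then be recomputed from the smeared objects and compared with the truth-level result for several values of the assumed resolution.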

While there were over 750,000 $Z \to \tau\tau$ events and over 6,000,000 $t\bar{t}$ events in the Monte Carlo

samples, only about 40,000 background events in total survived the preselection cuts, and only half of those

events were used to train the boosted decision tree, while the other half was used for testing. In

comparison, over 300,000 VBF events made it past preselection to the multivariate analysis stage.

Although the initial number of events is very large for the background processes, I could have actually

used far more while training the BDT. For further Monte Carlo studies, I would suggest increasing the

statistics, at least for the $Z \to \tau\tau$ background, to at least a couple of million events to ensure that enough

events make it past preselection to the BDT training.
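
A rough estimate of how many generated events are needed follows from the preselection efficiencies implied by the expected-event table. The sketch below is illustrative only: the names are hypothetical, the sample sizes are the lower bounds quoted in the text, and it assumes the efficiency does not depend on sample size.

```python
# Expected yields at 20 fb^-1 before and after preselection, from the expected-event table.
EXPECTED = {"Ztautau": 39_012_642, "ttbar": 9_031_830}
PRESELECTED = {"Ztautau": 1_148_098, "ttbar": 26_394}

# Approximate Monte Carlo sample sizes (lower bounds quoted in the text).
GENERATED = {"Ztautau": 750_000, "ttbar": 6_000_000}


def preselection_efficiency(process):
    """Fraction of events of a process that pass the preselection."""
    return PRESELECTED[process] / EXPECTED[process]


def surviving_events(process, n_generated):
    """Simulated events expected to pass preselection out of n_generated."""
    return n_generated * preselection_efficiency(process)


def generated_events_needed(process, n_target):
    """Simulated events required so that n_target pass preselection."""
    return n_target / preselection_efficiency(process)


total_surviving = sum(surviving_events(p, n) for p, n in GENERATED.items())
needed = generated_events_needed("Ztautau", 300_000)
print(f"background events surviving preselection: {total_surviving:.0f}")
print(f"Z->tautau events needed for 300,000 survivors: {needed:.0f}")
```

Under these assumptions the estimate reproduces the roughly 40,000 surviving background events, and matching the 300,000 preselected VBF events with the main background alone would take on the order of ten million generated events.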

The Fox-Wolfram moments have shown promising separation power, and may be very powerful

given a correct tuning to the VBF topology. In this study, moments calculated using just the leading

and subleading jet were experimented with in addition to a few studies using both the jets and the two

tau leptons. Further analyses may explore different combinations of objects to use in the moments,

perhaps even a third jet or no jets at all, in addition to finding the optimal weighting term to use.

Additionally, there exist modified Fox-Wolfram moments that are invariant to Lorentz boosts which

may provide clearer results. In any case, it will need to be demonstrated that the Fox-Wolfram moments

provide new information about the event that is not contained in the five variables presented for the

analysis in this study if they are to be useful in a multivariate analysis.
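
For experimenting with different object combinations, a Fox-Wolfram moment can be computed in a few lines. The sketch below assumes the form $H_l = \sum_{i,j} W_{ij} P_l(\cos\Omega_{ij})$ with a transverse-momentum weight $W_{ij} = p_{T,i}\, p_{T,j} / (\sum_k p_{T,k})^2$, which is only one possible choice and not necessarily the weighting used in this study; the function names and the example jets are hypothetical.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x) via the Bonnet recurrence."""
    p_prev, p_curr = 1.0, x
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return p_curr


def cos_opening_angle(eta1, phi1, eta2, phi2):
    """Cosine of the angle between two directions given by (eta, phi)."""
    # Uses cos(theta) = tanh(eta) and sin(theta) = 1 / cosh(eta).
    cos_omega = (math.tanh(eta1) * math.tanh(eta2)
                 + math.cos(phi1 - phi2) / (math.cosh(eta1) * math.cosh(eta2)))
    return max(-1.0, min(1.0, cos_omega))  # guard against rounding


def fox_wolfram_moment(l, objects):
    """H_l with weights pT_i * pT_j / (sum pT)^2; objects are (pt, eta, phi) tuples."""
    pt_sum = sum(pt for pt, _, _ in objects)
    total = 0.0
    for pt_i, eta_i, phi_i in objects:
        for pt_j, eta_j, phi_j in objects:
            cos_omega = cos_opening_angle(eta_i, phi_i, eta_j, phi_j)
            total += pt_i * pt_j * legendre(l, cos_omega)
    return total / pt_sum ** 2


# Two hypothetical jets in opposite hemispheres, back to back in phi.
jets = [(80.0, 2.0, 0.0), (80.0, -2.0, math.pi)]
moments = [fox_wolfram_moment(l, jets) for l in range(5)]
print([round(h, 6) for h in moments])
```

Passing a longer list of objects, for example the two jets plus the two tau leptons, changes only the `objects` argument, which makes it straightforward to compare combinations.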

3.3 Thanks!

I can’t express my gratitude enough for the opportunity to study here in Göttingen for the

summer; it has been an eye-opening and truly enjoyable experience to live abroad and get a taste of

particle physics. To everyone within the institute, thank you for your kindness and help over the

summer; you’re all brilliant physicists and even better people. Finally, I have to thank my Ph.D.

student supervisor Antonio De Maria for organizing a great project for me to work on, for his help

whenever it was needed, and his fantastic taste in music.

References

[1] “Test of CP Invariance in vector-boson fusion production of the Higgs boson using the Optimal

Observable method in the ditau decay channel with the ATLAS detector”.

arXiv:1602.04516v1

[2] K.A. Olive et al. (Particle Data Group), Chin. Phys. C, 38, 090001 (2014).

[3] “Search for the $b\bar{b}$ decay of the Standard Model Higgs boson in associated (W/Z)H

production with the ATLAS detector”. arXiv:1409.6212v2

[4] “Prospects for the Search for a Standard Model Higgs Boson in ATLAS using Vector Boson

Fusion”. arXiv:hep-ph/0402254v1

[5] “Reconstruction of hadronic decay products of tau leptons with the ATLAS experiment”.

arXiv:1512.05955

[6] “Evidence for the Higgs-boson Yukawa coupling to tau leptons with the ATLAS detector”.

arXiv:1501.04943

[7] “Fox-Wolfram Moments in Higgs Physics”. arXiv:1212.4436

[8] A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. von Toerne, and H. Voss, TMVA -

Toolkit for Multivariate Data Analysis, PoS ACAT 040 (2007), arXiv:physics/0703039

[9] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.