
Authors:
Presentation Prepared by Saba Awan, February 14, 2022
Presentation Edited by Saba Awan (FA19-RSE-041)
Department of Computer Science
(Date: 14/2/2022)
Under the Supervision of
Department of Computer Science
Introduction
surfaceto separate measurements of two or more classes of objects or events. It is a more
general version of thelinear classifier.
Statistical classification considers a set of vectors of observations x of an object or event, each of which has a known type y. This set is referred to as the training set. The problem is then to determine, for a given new observation vector, what the best class should be.
The standard form of a quadratic is y = ax^2 + bx + c.
Quadratic discriminant analysis allows the classifier to assess non-linear relationships. QDA assumes that each class has its own covariance matrix: the predictor variables are not assumed to have common variance across each of the k levels (classes) of Y (the label).
Presentaon Prepared by Saba Awan, February 14, 2022
2
QDA Derivation (1/12)
Presentaon Prepared by Saba Awan, February 14, 2022
3
Step 1:
Given a training dataset of N input variables x with corresponding target variables t, the QDA model assumes that the class-conditional densities have a Gaussian distribution:

p(x | t = c) = N(x | μ_c, Σ_c) = (2π)^{-D/2} |Σ_c|^{-1/2} exp( -(1/2)(x - μ_c)^T Σ_c^{-1} (x - μ_c) )    (1)

where μ_c is the class-specific mean vector, Σ_c is the class-specific covariance matrix, and D is the dimensionality of x.
Step 2:
Calculate the posterior by using Bayes' theorem,

P(t = c | x) = π_c N(x | μ_c, Σ_c) / ∑_k π_k N(x | μ_k, Σ_k)    (2)

and classify x into the class with the highest posterior probability,

c* = argmax_c P(t = c | x)    (3)
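As a minimal sketch of Eqs (2) and (3) — all numbers below are illustrative toy values, not from the slides — the posterior can be computed from class priors and class-conditional Gaussian densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy 2-class setup (illustrative parameters, assumed known here).
priors = np.array([0.6, 0.4])                            # pi_c
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]     # mu_c
covs = [np.eye(2), np.array([[2.0, 0.3], [0.3, 1.0]])]   # Sigma_c

def posterior(x):
    """Eq (2): P(t=c | x) is proportional to pi_c * N(x | mu_c, Sigma_c)."""
    joint = np.array([p * multivariate_normal.pdf(x, m, S)
                      for p, m, S in zip(priors, means, covs)])
    return joint / joint.sum()

def classify(x):
    """Eq (3): pick the class with the highest posterior."""
    return int(np.argmax(posterior(x)))

print(classify(np.array([0.1, -0.2])))  # a point near class 0's mean -> 0
```

Note that the denominator of Eq (2) cancels in the argmax, which is why the fitted classifier in Eq (14) later drops it.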
QDA Derivation (2/12)
Presentaon Prepared by Saba Awan, February 14, 2022
4
Step 3:
Assuming the data points are drawn independently, the likelihood (conditional probability) function can be defined as

P(t | θ) = ∏_{n=1}^{N} ∏_{c=1}^{C} [ π_c N(x_n | μ_c, Σ_c) ]^{t_nc}    (4)

where t denotes all the target variables, π_c is the prior for class c, and t_nc is 1 if data point n belongs to class c and 0 otherwise. To simplify, let θ collect all class priors, class-specific mean vectors, and covariance matrices:

θ = { π_c, μ_c, Σ_c : c = 1, …, C }    (5)

Maximizing the likelihood is equivalent to maximizing the log-likelihood, so Eq (4) becomes

ln P(t | θ) = ∑_{n=1}^{N} ∑_{c=1}^{C} t_nc ln [ π_c N(x_n | μ_c, Σ_c) ]    (6)

QDA Derivation (3/12)
Presentaon Prepared by Saba Awan, February 14, 2022
5
The natural-log rules used to simplify the log-likelihood are: ln(ab) = ln a + ln b, ln(a/b) = ln a - ln b, and ln(a^b) = b ln a.
QDA Derivation (4/12)
Presentaon Prepared by Saba Awan, February 14, 2022
6
Step 4:
Applying the log rules explained on the previous slide to Eq (6) splits the log of the product into a sum:

ln P(t | θ) = ∑_{n=1}^{N} ∑_{c=1}^{C} t_nc [ ln π_c + ln N(x_n | μ_c, Σ_c) ]    (7)
QDA Derivation (5/12)
Presentaon Prepared by Saba Awan, February 14, 2022
7
Step 5:
Expanding Eq (7) using the Gaussian density of Eq (1):

ln P(t | θ) = ∑_{n=1}^{N} ∑_{c=1}^{C} t_nc [ ln π_c - (D/2) ln(2π) - (1/2) ln|Σ_c| - (1/2)(x_n - μ_c)^T Σ_c^{-1} (x_n - μ_c) ]    (8)

Step 6:
The goal is to find the maximum likelihood solution for the class-specific priors, means, and covariance matrices, so the derivative of Eq (8) will be solved for each in turn, starting with the class-specific prior.
QDA Derivation (6/12)
Presentaon Prepared by Saba Awan, February 14, 2022
8
Constraint: the class priors must sum to one, ∑_c π_c = 1.
To maximize the likelihood under this constrained optimization, a Lagrange multiplier will help:

L(θ, λ) = ln P(t | θ) + λ( 1 - ∑_{c=1}^{C} π_c )    (9)

Step 7:
Putting the value of ln P(t | θ) from Eq (8) into Eq (9) and taking the derivative with respect to the class-specific prior π_c:

∂L/∂π_c = ∑_{n=1}^{N} t_nc / π_c - λ = N_c / π_c - λ

where N_c = ∑_n t_nc is the number of data points in class c.

QDA Derivation (7/12)
Presentaon Prepared by Saba Awan, February 14, 2022
9
Step 8:
Setting the derivative from Step 7 to zero gives

π_c = N_c / λ    (10)

Applying the constraint ∑_c π_c = 1 gives λ = N. Substituting λ = N back into (10) gives us

π_c = N_c / N    (11)

Eq (11) tells us that the class prior is simply the proportion of data points that belong to the class.
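Eq (11) is a one-liner in practice. A minimal sketch, with hypothetical labels that are not from the slides:

```python
import numpy as np

# Hypothetical labels for N = 8 data points across C = 3 classes.
t = np.array([0, 0, 1, 2, 1, 0, 0, 2])

N = len(t)
classes, N_c = np.unique(t, return_counts=True)  # N_c per class
priors = N_c / N                                 # Eq (11): pi_c = N_c / N

print(dict(zip(classes.tolist(), priors.tolist())))
```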

QDA Derivation (8/12)
Presentaon Prepared by Saba Awan, February 14, 2022
10

Step 8:
Now maximize the log-likelihood with respect to the class-specific mean μ_c. To solve the derivative, the identity ∂(x^T A x)/∂x = 2Ax (for symmetric A) is used:

∑_{n=1}^{N} t_nc Σ_c^{-1} (x_n - μ_c) = 0  ⟹  μ_c = (1/N_c) ∑_{n=1}^{N} t_nc x_n    (12)
QDA Derivation (9/12)
Presentaon Prepared by Saba Awan, February 14, 2022
11

Step 8 (continued):
t_nc is only equal to 1 for the data points that belong to class c, so:
1. The sum in Eq (12) only includes the input variables x_n that belong to class c.
2. Dividing that sum of vectors by the number of data points in the class, N_c, is the same as taking the average of the vectors.
3. The class-specific mean vector is just the mean of the vectors of the class.
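The three observations above reduce Eq (12) to a per-class average. A minimal sketch, with hypothetical data that is not from the slides:

```python
import numpy as np

# Hypothetical 2-D inputs and their class labels.
X = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [2.0, 2.0]])
t = np.array([0, 0, 1, 1])

# Eq (12): mu_c = (1/N_c) * sum of the x_n that belong to class c,
# i.e. simply the mean of the class-c rows.
mu = {c: X[t == c].mean(axis=0) for c in np.unique(t)}

print(mu[0])  # mean of [1,2] and [3,4] -> [2. 3.]
```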
QDA Derivation (10/12)
Presentaon Prepared by Saba Awan, February 14, 2022
12
Step 9:
To maximize the log-likelihood with respect to the class-specific covariance matrix, take the derivative of Eq (8) w.r.t. Σ_c, using:
1. The log-determinant derivative identity: ∂ ln|Σ| / ∂Σ = Σ^{-1} (for symmetric Σ).
2. The property of a trace product: (x - μ)^T Σ^{-1} (x - μ) = tr( Σ^{-1} (x - μ)(x - μ)^T ).
3. The matrix calculus identity: ∂ tr(Σ^{-1} B) / ∂Σ = -Σ^{-1} B Σ^{-1}.
QDA Derivation (11/12)
Presentaon Prepared by Saba Awan, February 14, 2022
13
Setting the derivative to zero and solving gives

Σ_c = (1/N_c) ∑_{n=1}^{N} t_nc (x_n - μ_c)(x_n - μ_c)^T    (13)

The class-specific covariance matrix is just the covariance of the vectors of the class.
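A minimal sketch of Eq (13) with hypothetical data (not from the slides). Note the 1/N_c maximum-likelihood normalisation, rather than the unbiased 1/(N_c - 1):

```python
import numpy as np

# Hypothetical 2-D inputs and labels.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0],
              [0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
t = np.array([0, 0, 0, 1, 1, 1])

def class_cov(X, t, c):
    """Eq (13): Sigma_c = (1/N_c) * sum of outer products of the
    centred class-c vectors (maximum-likelihood, i.e. biased, estimate)."""
    Xc = X[t == c]
    d = Xc - Xc.mean(axis=0)   # centre using mu_c from Eq (12)
    return d.T @ d / len(Xc)

print(class_cov(X, t, 0))
```

This matches NumPy's built-in estimator with `np.cov(Xc.T, bias=True)`.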
QDA Derivation (12/12)
Presentaon Prepared by Saba Awan, February 14, 2022
14
Step 10:
Lastly, combining Eqs (11), (12) and (13) with the classification rule of Eq (3), a new observation x is assigned to the class

c* = argmax_c [ ln π_c - (1/2) ln|Σ_c| - (1/2)(x - μ_c)^T Σ_c^{-1} (x - μ_c) ]    (14)

where argmax is an operation that finds the argument that gives the maximum value of a target function.
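The whole derivation can be sketched as a fit-and-predict loop. The toy data below is illustrative, and the small ridge term added to each covariance for numerical invertibility is an implementation detail, not part of the derivation:

```python
import numpy as np

# Toy 2-class training data (illustrative only).
X = np.array([[0.0, 0.0], [0.2, -0.1], [-0.1, 0.3],
              [3.0, 3.0], [3.2, 2.8], [2.9, 3.1]])
t = np.array([0, 0, 0, 1, 1, 1])

classes = np.unique(t)
N = len(t)
params = {}
for c in classes:
    Xc = X[t == c]
    mu = Xc.mean(axis=0)                                    # Eq (12)
    d = Xc - mu
    Sigma = d.T @ d / len(Xc) + 1e-6 * np.eye(X.shape[1])   # Eq (13) + ridge
    params[c] = (len(Xc) / N, mu, Sigma)                    # Eq (11)

def discriminant(x, prior, mu, Sigma):
    """Eq (14): ln pi_c - 0.5 ln|Sigma_c| - 0.5 (x-mu_c)^T Sigma_c^{-1} (x-mu_c)."""
    diff = x - mu
    return (np.log(prior) - 0.5 * np.linalg.slogdet(Sigma)[1]
            - 0.5 * diff @ np.linalg.solve(Sigma, diff))

def predict(x):
    """Assign x to the class whose quadratic discriminant is largest."""
    return max(classes, key=lambda c: discriminant(x, *params[c]))

print(predict(np.array([0.1, 0.1])), predict(np.array([3.1, 3.0])))
```

Because each class keeps its own Σ_c, the decision boundary implied by Eq (14) is quadratic in x, which is exactly the non-linearity QDA adds over LDA.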
Conclusion
QDA assumes that each class follows a Gaussian distribution.
The class-specific prior is simply the proportion of data points that belong to the class.
The class-specific mean vector is the average of the input variables that belong to the class.
The class-specific covariance matrix is just the covariance of the vectors that belong to the class.