All content in this area was uploaded by Adel. Salah Khatab on Dec 07, 2018
Principles of statistical analysis
Prof. Adel Salah Khattab, Ph.D
Prof. of Animal Breeding, Faculty of Agriculture, Tanta University, Egypt
Email: adelkhattab@yahoo.com
What is the aim of using a personal computer (PC) in statistical analysis?
•1- Input data (results of a Master's or Ph.D. thesis, or of a scientific paper) and save these results by using a program such as Excel.
•2- Analyze these data by using statistical programs such as:
•Minitab
•Matlab
•SPSS
•SAS (Statistical Analysis System).
Introduction
In these lectures, I will give an idea about the basic principles of statistics, such as:
•Measures of central tendency
•Measures of variability
•Correlation
A – Measures of central tendency
1- The arithmetic mean (ȳ)
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{y_1 + y_2 + y_3 + \dots + y_n}{n}$$
Important properties of the arithmetic mean are:
The sum of deviations from the arithmetic mean is equal to zero:
$$\sum_i (y_i - \bar{y}) = 0$$
The sum of squared deviations from the arithmetic mean is smaller than the sum of squared deviations from any other value:
$$\sum_i (y_i - \bar{y})^2 = \text{Min.}$$
The arithmetic mean for grouped data, where $w_i$ is the weight (frequency) of the value $y_i$, is
$$\bar{y} = \frac{\sum_i w_i y_i}{\sum_i w_i}$$
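The two properties and the grouped mean can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the small sample, the shifts that are tried, and the helper name `sum_sq_dev` are illustrative assumptions, and the grouped data are the daily salaries of Example 1 tabulated by frequency.

```python
# Illustrative sample (an assumption, chosen only to keep the arithmetic small).
y = [6.0, 8.0, 7.0, 9.0]
mean = sum(y) / len(y)


def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((v - center) ** 2 for v in values)


# Property 1: deviations from the arithmetic mean add up to zero.
sum_dev = sum(v - mean for v in y)

# Property 2: the sum of squares is smallest when the center is the mean.
ss_at_mean = sum_sq_dev(y, mean)
ss_elsewhere = [sum_sq_dev(y, mean + shift) for shift in (-1.0, -0.5, 0.5, 1.0)]

# Grouped data: each distinct value y_i with its frequency w_i.
values = [45, 50, 55, 60]
freqs = [1, 3, 2, 1]
grouped_mean = sum(w * v for v, w in zip(values, freqs)) / sum(freqs)

print(sum_dev, ss_at_mean, ss_elsewhere, round(grouped_mean, 2))
```

Every entry of `ss_elsewhere` is larger than `ss_at_mean`, which is what the second property states for these particular shifts.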
2- The mode of a sample of n observations y1, y2, ..., yn is the value among the observations that has the highest frequency.
3- The median
The median is the middle value of the sample after the observations are arranged from the lowest to the highest or from the highest to the lowest.
If the number of observations n is odd, the median is the observation at position (n + 1)/2, and if n is even, the median lies midway between the observations at positions n/2 and (n + 2)/2.
4- The geometric mean (G) is the n-th root of the product of the sample values:
$$G = \sqrt[n]{y_1 \, y_2 \, y_3 \cdots y_n}$$
Example 1. The following are the daily salaries of 7 persons in Egyptian pounds (LE). Calculate the mean, mode and median.
45 50 50 50 55 55 60
Mean
$$\bar{y} = \frac{\sum_i y_i}{n} = \frac{365}{7} = 52.14 \text{ LE}$$
Mode = 50
Median
Arrange the sample from the lowest to the highest:
45-50-50-50-55-55-60
Position of the median = (n + 1)/2 = (7 + 1)/2 = 4
The median is the 4th value, 50.
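Example 1 can be cross-checked with Python's standard `statistics` module. This is a minimal sketch rather than the lecture's own procedure; the variable names are illustrative, and `statistics.geometric_mean` assumes Python 3.8 or later.

```python
import statistics

# Daily salaries (LE) of the 7 persons in Example 1.
salaries = [45, 50, 50, 50, 55, 55, 60]

mean = statistics.mean(salaries)      # arithmetic mean, 365 / 7
mode = statistics.mode(salaries)      # most frequent value
median = statistics.median(salaries)  # middle value of the sorted sample

# Position of the median for an odd sample size: (n + 1) / 2.
n = len(salaries)
median_position = (n + 1) // 2

# Geometric mean: n-th root of the product of the values.
geo_mean = statistics.geometric_mean(salaries)

print(round(mean, 2), mode, median, median_position, round(geo_mean, 2))
```

For positive data that are not all equal, the geometric mean is smaller than the arithmetic mean, which gives a quick sanity check on `geo_mean`.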
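B – Measures of variability
1- The range is the difference between the highest and the lowest value in the sample.
2- The variance (S²) is the sum of squared deviations from the arithmetic mean divided by n - 1:
$$S^2 = \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \dots + (y_n - \bar{y})^2}{n - 1} = \frac{\sum_i (y_i - \bar{y})^2}{n - 1}$$
The sum of squares can also be computed as
$$SS_{yy} = \sum_i (y_i - \bar{y})^2 = \sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}$$
3- The standard deviation (S) is the square root of the variance:
$$S = \sqrt{S^2}$$
4- The standard error (SE)
SE = S/√n
5- The coefficient of variability (CV%)
$$CV = \frac{S}{\bar{y}} \times 100$$
The coefficient of variability is used to compare sets of data that have different units, and also to evaluate the results of an experiment.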
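Example 2: The salaries per hour (L.E.) of four persons are shown in the following table. Calculate the mean, minimum, maximum, median, range, variance, standard deviation, standard error and the coefficient of variability.
Person   y_i   y_i²   y_i - ȳ            (y_i - ȳ)²
A        6     36     6 - 7.5 = -1.5     2.25
B        8     64     8 - 7.5 = 0.5      0.25
C        7     49     7 - 7.5 = -0.5     0.25
D        9     81     9 - 7.5 = 1.5      2.25
Total    30    230    0                  5
Mean = 30/4 = 7.5 L.E.
A – Range = 9 - 6 = 3 L.E.
Sum of squares
$$SS_{yy} = \sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n} = 230 - \frac{(30)^2}{4} = 230 - 225 = 5$$
$$SS_{yy} = \sum_i (y_i - \bar{y})^2 = 5$$
B – Variance (S²)
$$S^2 = \frac{\sum_i (y_i - \bar{y})^2}{n - 1} = \frac{5}{4 - 1} = 1.67$$
$$S^2 = \frac{\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}}{n - 1} = \frac{230 - \frac{(30)^2}{4}}{4 - 1} = 1.67$$
C – Standard deviation (SD)
$$SD = \sqrt{S^2} = \sqrt{1.67} = 1.29 \text{ L.E.}$$
D – Standard error (SE)
$$SE = \frac{SD}{\sqrt{n}} = \frac{1.29}{\sqrt{4}} = 0.645$$
E – Coefficient of variability (CV)
$$CV = \frac{SD}{\bar{y}} \times 100 = \frac{1.29}{7.5} \times 100 = 17.2\%$$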
SAS program
DATA A;
INPUT salary @@;
CARDS;
6 8 7 9
;
PROC MEANS N MEAN MIN MAX VAR STD STDERR CV;
VAR salary;
RUN;
The SAS System 11:36 Saturday, March 1, 1997 1
The MEANS Procedure
Analysis Variable : salary
N Mean Min Max Variance Std Stderr coeff. of var.
4 7.500 6.00 9.00 1.67 1.29 0.65 17.21
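The figures in the SAS output for Example 2 can be reproduced by hand in a few lines. The following Python sketch is an illustrative cross-check and not the lecture's own program; the variable names are assumptions, and the formulas are the ones given in section B.

```python
import math

# Salaries per hour (L.E.) of persons A, B, C, D in Example 2.
salary = [6, 8, 7, 9]
n = len(salary)

mean = sum(salary) / n
value_range = max(salary) - min(salary)

# Sum of squares via the computational formula: sum(y^2) - (sum(y))^2 / n.
ss = sum(v * v for v in salary) - sum(salary) ** 2 / n

variance = ss / (n - 1)   # S^2
sd = math.sqrt(variance)  # standard deviation
se = sd / math.sqrt(n)    # standard error of the mean
cv = sd / mean * 100      # coefficient of variability, in percent

print(mean, value_range, ss, round(variance, 2), round(sd, 2),
      round(se, 2), round(cv, 2))
```

Rounded to two decimals, these values agree with the Mean, Variance, Std, Stderr and coefficient of variation columns of the PROC MEANS output.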
Correlation
The items
•Simple correlation
• Partial correlation
•Rank correlation
a-Simple correlation coefficient or Pearson correlation (r)
The coefficient of correlation measures the strength of the linear relationship between two variables. Neither variable is treated as independent or dependent. Correlation is used when there is interest in determining the degree of association among variables that cannot easily be defined as dependent or independent. For example, we may wish to determine the relationship between weight and height, without considering one to be dependent on the other.
The coefficient of correlation (r) is defined as:
$$r_{xy} = \frac{\sigma_{xy}}{\sqrt{\sigma_x^2 \, \sigma_y^2}}$$
Where:
σ_xy = covariance between X and Y
σ²_x = variance of X
σ²_y = variance of Y
•Values of the coefficient of correlation range between -1 and 1. A positive correlation means that as values of one variable increase, increasing values of the other variable are observed. A negative correlation means that as values of one variable increase, decreasing values of the other variable are observed.
•In Fig. 2.1, a positive correlation between x and y is evident in a), and a negative correlation in b). There is no clear association between x and y in c), and there is an association in d) but it is clearly not linear.
Fig. 2.1 (a) positive correlation, (b) negative correlation, (c) no clear association between x and y, and (d) an association that is not linear.
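Estimation of the coefficient of correlation and tests of hypotheses
2.1 Sample coefficient of correlation
The coefficient of correlation is estimated from a random sample by the sample coefficient of correlation (r):
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
Where:
SS_xy = ∑(x - x̄)(y - ȳ) = sum of products of x and y
SS_xx = ∑(x - x̄)² = sum of squares of x
SS_yy = ∑(y - ȳ)² = sum of squares of y
(Kaps and Lamberson, 2009)
The standard error of the coefficient of correlation is
$$S_r = \sqrt{\frac{1 - r^2}{n - 2}}$$
(Kaps and Lamberson, 2009)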
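Test of significance of the simple correlation
The null and alternative hypotheses are:
H0: ρ = 0
H1: ρ ≠ 0
The test statistic, with n - 2 degrees of freedom, is
$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$$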
Example: Is there a linear association between weight and height in a sample of five children? Weight was measured in kg and height in cm. Calculate the simple correlation (r).
Child   Height, cm   Weight, kg
1       130          35
2       140          40
3       125          30
4       120          30
5       135          35
Taking X = weight (kg) and Y = height (cm):
n = 5
∑X = 170      ∑Y = 650
∑X² = 5850    ∑Y² = 84750
∑XY = 22225
1- Calculate the simple correlation between X and Y and estimate the standard error of the simple correlation.
2- Calculate the t value for the test of significance.
3- Recalculate the simple correlation by using the SAS program.
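1- Estimate of correlation (r)
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
$$r = \frac{22225 - \frac{(170)(650)}{5}}{\sqrt{\left(5850 - \frac{(170)^2}{5}\right)\left(84750 - \frac{(650)^2}{5}\right)}} = \frac{22225 - 22100}{\sqrt{(5850 - 5780)(84750 - 84500)}} = \frac{125}{\sqrt{(70)(250)}} = \frac{125}{132.29} = 0.9449$$
2- Estimate of standard error of (r)
$$S_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{1 - 0.8929}{5 - 2}} = \sqrt{\frac{0.1071}{3}} = \sqrt{0.0357} = 0.189$$
t test
$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.9449 \sqrt{3}}{\sqrt{1 - (0.9449)^2}} = 5.00$$
The calculated t (5.00) is greater than the tabulated t values with 3 degrees of freedom at the 5% level (2.35) and at the 1% level (4.54), both one-tailed values, so the correlation between weight and height is significant.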
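The hand calculation of r, its standard error and the t statistic can be repeated in Python. This is a minimal sketch using the sums-of-squares formulas of section 2.1; the variable names are illustrative assumptions, and the data are the five children of the example.

```python
import math

# X = weight (kg), Y = height (cm) for the five children in the example.
x = [35, 40, 30, 30, 35]
y = [130, 140, 125, 120, 135]
n = len(x)

# Corrected sums of squares and products.
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 22225 - 22100
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n                 # 5850 - 5780
ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n                 # 84750 - 84500

r = ss_xy / math.sqrt(ss_xx * ss_yy)       # sample correlation
se_r = math.sqrt((1 - r * r) / (n - 2))    # standard error of r
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)  # t statistic, n - 2 df

print(round(r, 5), round(se_r, 4), round(t, 2))
```

Note that `t` equals `r / se_r`, so the t test is simply the estimate divided by its standard error.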
3- Calculate (r) by using the SAS program
DATA A;
INPUT height weight @@;
CARDS;
130 35 140 40 125 30 120 30 135 35
;
PROC CORR;
VAR height weight;
RUN;
SAS output
The SAS System
17:12 Saturday, March 1, 1997 1
The CORR Procedure
2 Variables: height weight
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
height 5 130.00000 7.90569 650.00000 120.00000 140.00000
weight 5 34.00000 4.18330 170.00000 30.00000 40.00000
Pearson Correlation Coefficients, N = 5
Prob > |r| under H0: Rho=0
height weight
height 1.00000 0.94491
0.0154
weight 0.94491 1.00000
0.0154
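Explanation:
The simple correlation coefficient (r) is 0.945. The probability P = 0.015 is less than 0.05, so the correlation between X and Y is significant at the 5% level.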
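b – Partial correlation
The partial correlation measures the linear relationship between two variables after removing the effect of a third variable. For variables x and y, adjusted for z, it is calculated as:
$$r_{xy.z} = \frac{r_{xy} - r_{xz} \, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$
(Kaps and Lamberson, 2009)
Where: r_xy, r_xz and r_yz are the simple correlations between the pairs of variables.
Test of significance
The test statistic for the partial correlation r_p, with n - 3 degrees of freedom, is
$$t = \frac{r_p \sqrt{n - 3}}{\sqrt{1 - r_p^2}}$$
H0: ρ = 0
H1: ρ ≠ 0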
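The partial correlation formula and its t statistic are easy to wrap in two small functions. The following Python sketch is illustrative only: the function names and the three input correlations (all 0.5, with n = 12) are made-up values chosen so the arithmetic is easy to follow, not results from the lecture's data.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, adjusted for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_t(r_p, n):
    """t statistic for a first-order partial correlation (n - 3 df)."""
    return r_p * math.sqrt(n - 3) / math.sqrt(1 - r_p ** 2)


# Hypothetical pairwise correlations (assumed values for illustration only).
r_p = partial_corr(0.5, 0.5, 0.5)  # (0.5 - 0.25) / 0.75
t_value = partial_t(r_p, n=12)     # hypothetical sample size

# If z is uncorrelated with both x and y, adjusting for z changes nothing.
unchanged = partial_corr(0.5, 0.0, 0.0)

print(round(r_p, 4), round(t_value, 3), unchanged)
```

With these inputs the adjustment lowers the correlation from 0.5 to about 0.33, which shows how part of a simple correlation can be due to a shared third variable.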
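Example 2. Calculate the partial correlation between height and weight of the five children after removing the effect of age.
N   Height, cm   Weight, kg   Age, mo
1   130          35           100
2   140          40           105
3   125          30           95
4   120          30           96
5   135          35           102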
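SAS Program
DATA A;
/* height in cm, weight in kg, age in months */
INPUT height weight age;
CARDS;
130 35 100
140 40 105
125 30 95
120 30 96
135 35 102
;
PROC CORR;
VAR height weight;
RUN;
PROC CORR;
VAR height weight;
PARTIAL age;
RUN;
SAS output
The SAS System 08:32 Saturday, March 1, 1997 1
The CORR Procedure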
2 Variables: height weight
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
height 5 130.00000 7.90569 650.00000 120.00000 140.00000
weight 5 34.00000 4.18330 170.00000 30.00000 40.00000
Pearson Correlation Coefficients, N = 5
Prob > |r| under H0: Rho=0
height weight
height 1.00000 0.94491
0.0154
weight 0.94491 1.00000
0.0154
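The SAS System 08:32 Saturday, March 1, 1997 2
The CORR Procedure
1 Partial Variables: age
2 Variables: height weight
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Partial Variance Partial Std Dev
age 5 95.40000 7.46994 477.00000 85.00000 102.00000
height 5 130.00000 7.90569 650.00000 120.00000 140.00000 11.05137 3.32436
weight 5 34.00000 4.18330 170.00000 30.00000 40.00000 4.59976 2.14471
Pearson Partial Correlation Coefficients, N = 5
Prob > |r| under H0: Partial Rho=0
height weight
height 1.00000 0.68285
0.3171
weight 0.68285 1.00000
0.3171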
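Explanation: The simple correlation between height and weight (0.944) is significant, with P (0.015) less than 0.05. The partial correlation between height and weight after removing the effect of age (0.68) is lower than the simple correlation and is not significant, with P (0.3171) greater than 0.05.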
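C-Spearman correlation or Rank correlation (r_s)
The rank correlation measures the association between the ranks of two variables (Henderson, 1967):
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
Where: d = difference between the ranks of x and y, and n = number of pairs.
Example: The scores of five students in statistics (X) and economics (Y) were as follows. Calculate the rank correlation.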
Student          1    2    3    4    5
Statistics (X)   2    10   7    8    5
Economics (Y)    5    8    6    9    7
N   Statistics (x)   Rank (x)   Economics (y)   Rank (y)   Difference d = rank (x) - rank (y)   d²
1   2                1          5               1          0                                    0
2   5                2          7               3          -1                                   1
3   7                3          6               2          1                                    1
4   8                4          9               5          -1                                   1
5   10               5          8               4          1                                    1
For the ranks: ∑x = ∑y = 15, ∑xy = 53, ∑x² = ∑y² = 55, ∑d = 0, ∑d² = 4
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(4)}{5(5^2 - 1)} = 1 - \frac{24}{120} = 0.80$$
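Another solution
Apply the simple correlation formula to the ranks:
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}} \sqrt{SS_{yy}}}$$
SS_xx = SS_yy = 55 - (15)²/5 = 55 - 45 = 10
SS_xy = 53 - 45 = 8
r = 8/(√10 × √10) = 0.80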
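Both solutions can be verified with a short Python sketch. It is an illustration rather than the lecture's own code: the helper `ranks` is a hypothetical name and assumes there are no tied scores, which holds for this example.

```python
import math

# Scores of the five students, ordered as in the working table.
statistics_score = [2, 5, 7, 8, 10]
economics_score = [5, 7, 6, 9, 8]
n = len(statistics_score)


def ranks(values):
    """Rank from lowest (1) to highest (n); assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rx = ranks(statistics_score)
ry = ranks(economics_score)

# Solution 1: Spearman formula from the squared rank differences.
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))

# Solution 2: simple (Pearson) correlation computed on the ranks.
ss_xx = sum(a * a for a in rx) - sum(rx) ** 2 / n
ss_yy = sum(b * b for b in ry) - sum(ry) ** 2 / n
ss_xy = sum(a * b for a, b in zip(rx, ry)) - sum(rx) * sum(ry) / n
r_on_ranks = ss_xy / math.sqrt(ss_xx * ss_yy)

print(rx, ry, sum_d2, round(r_s, 2), round(r_on_ranks, 2))
```

The two results coincide here because there are no ties; with tied ranks the shortcut formula is only approximate and the correlation of the ranks is the safer choice.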
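SAS Program
The scores in statistics (X) and economics (Y) are read in pairs.
DATA A;
INPUT X Y @@;
DATALINES;
2 5 5 7 7 6 8 9 10 8
;
PROC CORR DATA=A SPEARMAN;
VAR X Y;
RUN;
Note: The SPEARMAN option computes the rank correlation. The VAR statement defines the variables between which the correlation is computed.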
The SAS System 08:09 Tuesday, February 25, 1997 1
The CORR Procedure
2 Variables: x y
Simple Statistics
Variable N Mean Std Dev Median Minimum Maximum
x 5 6.40000 3.04959 7.00000 2.00000 10.00000
y 5 7.00000 1.58114 7.00000 5.00000 9.00000
Spearman Correlation Coefficients, N = 5
Prob > |r| under H0: Rho=0
x y
x 1.00000 0.80000
0.1041
y 0.80000 1.00000
0.1041
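•Explanation: The rank correlation (r_s) is 0.80. The probability p = 0.104 is greater than 0.05, so the rank correlation is not significant.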
Thank You