Crafting a Comprehensive Research Methodology Chapter of a Thesis
or Dissertation with Secondary Data
Dr Priyabrata Panda1
Dr Kishore Kumar Das2
Dr Malay Kumar Mohanty3
Abstract
The research methodology chapter is an important component of every research thesis, project or article. This segment lays down the road map for the research. In this part, data are analysed and econometric and statistical tools are applied to different variables. The process of research is summarised here. The present study proposes a comprehensive research methodology process for social science research. A topic is selected and its research methodology is outlined. This study will help other social science researchers in developing their methodology.
Keywords: Research Methodology, Variables, Social Science.
1. Assistant Professor, School of Commerce, Gangadhar Meher University, Amruta
Vihar, Sambalpur, Odisha, India.
2. Associate Professor, School of Commerce, Ravenshaw University, Cuttack, Odisha,
India.
3. Ex Registrar, Ravenshaw University, Cuttack, Odisha, India.
INTRODUCTION.
The research methodology chapter is an important component of every research thesis, project or article. This segment lays down the road map for the research. In this part, data are analysed and econometric and statistical tools are applied to different variables. The process of research is summarised here, and the researcher proceeds along the planned road map. It includes the method of data collection, the sampling plan, the selection of tools, assumption checking, etc. The present article proposes a comprehensive research methodology for a social science research topic, i.e., Impact of Direct Tax Reform, Economic Policy Reform and Direct Tax Administrative Reform on Government Revenue.
The proposed objectives are as follows:
1. To study the impact of economic reform on total revenue and direct tax revenue.
2. To assess the impact of direct tax policy reform on total revenue and direct tax revenue.
3. To canvass the impact of direct tax administration reform on total revenue and direct tax revenue.
Different components of research methodology are discussed below.
1. RESEARCH DESIGN.
This research is descriptive in type. Descriptive research studies the characteristics of a phenomenon or population. It does not answer why those characteristics of the population or phenomenon occur. In this research, the impact of direct tax reform on the total revenue of the government is studied.
2. NATURE OF DATA.
The purpose of the research, the research problem and the research hypotheses determine the nature of data, and the researcher is guided by these factors in deciding it. As the study centres on government policy reform and its impact, secondary data are used for this purpose after a careful review of several pieces of literature.
3. SOURCE OF DATA.
The required data are identified after deciding the objectives. Expert opinions are also obtained on the validity and reliability of the secondary data sources.
3.1 Publications of Government of India:
Annual Financial Statements1 of different years, All India Income Tax Statistics, Indian Public Finance Statistics, Reports of the Comptroller and Auditor General of India, Finance Acts of different years, Income Tax Administrative Handbooks of several periods, Budget speeches of different finance ministers and Reports of the Central Board of Direct Taxes are consulted to obtain the composition of tax revenue in general and direct tax revenue in particular. These resources also assisted in canvassing the amendments in the direct tax regime, which are the nitty-gritty of the study.
Annual Reports of the Reserve Bank of India for the years concerned are pivotal in this regard. The Economic Survey of India, etc. are analysed to find GDP, national income, gross domestic savings, gross capital formation, tax-GDP ratio, etc., which are treated as independent variables.
3.2 Various Committee Reports on Tax Reforms.
The Government of India appointed different committees for both direct tax and indirect tax reform. Reforms in the direct tax regime took place only after the Government of India acted upon these committee reports. So it is highly necessary to examine these reports, which are as follows.
The India Taxation Enquiry Committee, 1924-25.
Taxation Enquiry Commission, 1953-54.
Indian Tax Reforms- A Survey, 1956.
Rationalization and Simplification of the Tax Structure, 1968.
Direct Tax Enquiry Committee Report, 1971.
Direct Tax Laws Committee, 1978.
Economic Administrative Reforms Commission, 1983
Tax Reforms Committee, 1991.
Report of the Task Forces on Direct Taxes, 2002,
The Direct Tax Code Bill, 2010.
3.3 Other Publications:
Many books on tax reform, direct tax reform, taxation in ancient India, philosophy of public finance, etc. are cited for better results. Research articles are collected from diverse databases. Doctoral theses on similar studies are amassed for this research work. Newspaper articles, blogs, essays, etc. are also garnered to arrive at a better conclusion of the research.
1 The relevant data from 1990 to 2017 are collected from www.indiabudget.nic.in, and the data from 1960 to 1990 are collected from hard copies of the Finance Acts at the Central Secretariat Library, New Delhi.
4. SAMPLE DESIGN.
Sample design is a framework which serves as the basis for the selection of the sample. It focuses on determining the sample size. The sampling design should specify the probability of drawing every unit. This process answers questions such as what the relevant population is, what the sampling units are, what the sampling frame is, what the standard error shall be, and how to reduce such error.
The Income Tax Act was framed for the first time in India in 1860. The time period from 1860 to 2017 spans 157 years; hence the population is 157. But the population period is selected from the year of independence, so the population becomes 70.
4.1 Sample Size.
The sample size is selected by using the power method and the sample selection table framed by Krejcie and Morgan (1970).
The power method suggested a formula for sample size determination, which is as follows:
n = N / (1 + N·e²)
where n is the sample size, N is the population and the value of e is 0.05.
The sample size works out to 59 by using the above formula. The Krejcie and Morgan table also suggests a sample size of 59 when the population is 70. But the sample size is taken as 57, because the comprehensive Income Tax Act came into force in India in 1961 and revenue was generated under that Act only from 1961. Thus the sample period starts from 1961, and the sample size becomes 57, covering 1961 to 2017.
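As a quick check of the arithmetic, a minimal sketch (not the authors' code) reproducing the sample-size formula quoted above, with N = 70 and e = 0.05 as stated in the text, is given below.

```python
# Sample-size formula used in the text: n = N / (1 + N * e^2).

def sample_size(population: int, margin_of_error: float = 0.05) -> float:
    """Return n = N / (1 + N * e**2)."""
    return population / (1 + population * margin_of_error ** 2)

n = sample_size(70)
print(round(n, 2))   # ~59.57, reported as 59, matching the Krejcie and Morgan table for N = 70
```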
5. TIME PERIOD OF THE STUDY.
The time period of the study starts from 1960-61, when a comprehensive Income Tax Act was passed for the first time in independent India. The end point of the data period is 2016-17, as the data for 2017-18 were not available at the time of the study.
6. DATA CLEANING.
Data cleaning is one of the important processes for checking the data set. The data set is prepared by entering different units into the data analysis software, and omissions may occur while putting inputs into the software. Data cleaning not only digs out missing values but also detects outliers and unengaged responses. Here the process covers only missing values and outliers, because secondary data do not require the detection of unengaged responses.
i. Missing Values.
Different data sets are designed in accordance with the purpose of the study. Missing values
are detected with the help of SPSS software.
Table 1: Missing Values in Variables.
Sl No Variables N Missing Values
1 Total Revenue 53 0
2 Direct Tax Revenue 53 0
3 Indirect Tax Revenue 53 0
4 Non Tax Revenue 53 0
5 Corporate Tax Revenue 53 0
6 Personal Tax Revenue 53 0
7 Wealth Tax Revenue 53 0
8 Other Tax Revenue 53 0
9 Gross Domestic Saving 53 0
10 Gross Capital Formation 53 0
11 Gross National Income 53 0
Source: Compiled with SPSS Software.
The above data set is prepared to analyse the impact of direct tax administration reform, policy reform and economic policy reform on total revenue and direct tax revenue. The data period covers 53 years and the output reveals that there are no missing values.
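A hedged pandas illustration of the missing-value check reported in Table 1 is given below. The authors used SPSS; the file name "revenue_data.csv" and its columns are assumptions made only for demonstration.

```python
import pandas as pd

df = pd.read_csv("revenue_data.csv")              # hypothetical data file
summary = pd.DataFrame({
    "N": df.notna().sum(),                        # valid observations per variable
    "Missing Values": df.isna().sum(),            # missing observations per variable
})
print(summary)
```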
Table 2: Missing Values in Variables.
Sl No Variables N Missing Values
1 Total Revenue Buoyancy 53 0
2 Direct Tax Buoyancy 53 0
3 Indirect Tax Buoyancy 53 0
4 Non Tax Buoyancy 53 0
5 National Income Buoyancy 53 0
6 Domestic Saving Buoyancy 53 0
7 Capital Formation Buoyancy 53 0
8 Agricultural Sector Buoyancy 53 0
9 Manufacture Sector Buoyancy 53 0
10 Trade, Hotels & Transport Buoyancy 53 0
11 Financing, Insurance & Real Estate Buoyancy 53 0
12 Community, Social & Personal Services Buoyancy 53 0
13 Export Sector Buoyancy 53 0
14 Import Sector Buoyancy 53 0
Source: Compiled with SPSS Software.
Table 2 shows the missing values in the second data set which is designed to study the
buoyancy impact on total revenue buoyancy and direct tax buoyancy. Fourteen variables are
processed for such purpose. The output shows that there are no missing values.
Table 3: Missing Values in Variables.
Sl No Variables N Missing Values
1 Growth of Direct Tax Revenue 16 0
2 Growth of TDS 16 0
3 Growth of Advance Tax 16 0
4 Growth of Self Assessment Tax 16 0
5 Growth of Regular Assessment Tax 16 0
6 Growth of Other Receipts 16 0
7 Growth of Collection Expenditure 16 0
8 Growth of Work Load 16 0
9 Growth of Work Disposal 16 0
10 Growth of Number of Effective Assessees 16 0
11 Growth of Number of Warrants 16 0
12 Growth of Value of Assets Seized 16 0
13 Growth of Pre Assessment Tax 16 0
14 Growth of Post Assessment Tax 16 0
Source: Compiled with SPSS Software.
The above data set compares pre-assessment and post-assessment collections. Fourteen variables are processed for this purpose. There are also no missing values in this data set.
Table 4: Missing Values in Variables.
Sl No Variables N Missing Values
1 Economic Reform 53 0
2 Administrative Reform 53 0
3 Policy Reform 53 0
4 Hotel Receipt Act 53 0
5 Interest Act 53 0
6 FBT/STT 53 0
7 BCTT 53 0
8 Gift Tax 53 0
9 MAT 53 0
10 CDS Act 1974 53 0
11 VD Scheme 53 0
Total: 11 Variables
Source: Compiled with SPSS Software
FBT: Fringe Benefit Tax, STT: Securities Transaction Tax, BCTT: Banking Cash Transaction Tax, MAT: Minimum Alternate Tax, CDS Act 1974: Compulsory Deposit Scheme Act 1974, VD Scheme: Voluntary Disclosure Scheme.
Table 4 depicts the dummy variables used in forming different models. These variables are categorical; thus, any missing value could change the output and interpretation. The above table shows that there is no missing value in the compilation of the eleven dummy variables.
ii. Checking Outliers.
Outliers are extreme values. The presence of these values significantly affects the R square and adjusted R square statistics, and thus the whole model as well. Outliers must be identified and removed before processing the data for a pre-defined model.
Table 5: Checking Outliers with Standardised Value.
Variables Minimum Standardised Value Maximum Standardised Value Desired Level
Total Revenue -1.80341 1.55301 -3 to +3
Direct Tax Revenue -1.52003 1.65757
Indirect Tax Revenue -1.80727 1.60988
Non Tax Revenue -2.02396 1.36864
Personal Tax Revenue -1.29577 -1.61043
Corporate Tax Revenue 1.74485 1.61597
Source: Compiled with SPSS Software
There are six metric variables which are used as dependent and independent variables in different situations. The minimum and maximum standardised values should lie within -3 to +3. Thus, there are no outliers. The standardised value is otherwise called the Z score, which is obtained by using the following formula:
z = (x − μ) / σ
where μ is the mean and σ is the standard deviation of the population. For the average of a sample of size n drawn from that population, z = (x̄ − μ) / (σ/√n).
The interpretation of the Z score is mentioned below.
A z-score less than 0 represents an element less than the mean.
A z-score greater than 0 represents an element greater than the mean.
A z-score equal to 0 represents an element equal to the mean.
A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.
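A small sketch of the outlier screening summarised in Table 5 is shown below: each metric variable is standardised and observations whose z-score falls outside -3 to +3 are flagged. The file name and columns are assumptions for demonstration.

```python
import pandas as pd

df = pd.read_csv("revenue_data.csv")                          # hypothetical data file

for column in df.select_dtypes("number").columns:
    z = (df[column] - df[column].mean()) / df[column].std()  # z = (x - mean) / sd
    print(column, round(z.min(), 5), round(z.max(), 5),
          "outliers:", list(df.index[z.abs() > 3]))
```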
The Z score can be summarised in the following table.
Table 6: Z Score, T Statistics and Confidence Level.
Z Score T Statistics Confidence Level P Value/Significance Value
-1 to +1 ±1.645 90% 0.10
-2 to +2 ±1.96 95% 0.05
-3 to +3 ±2.58 99% 0.01
Source: Compiled Data.
The table above shows the comparative relationship among the Z score, T statistics, confidence level and significance level. The null hypothesis is rejected at the 10% level of significance when the T value exceeds ±1.645. If the T value is more than ±1.96, the null hypothesis can be rejected at the 5% level of significance, and it can be rejected at the 1% level of significance when the T statistic exceeds ±2.58. The Z score at this level lies within ±3.
Exhibit 1: Normal Distribution Graph with T Statistics and Z score.
Source :https://en.wikipedia.org/wiki/Standard_score#/media/
The Exhibit 1 portrays graphically the T value, Z score and confidence level.
7. TESTING THE ASSUMPTIONS.
Assumption checking is a vital step in any research. It measures the normality and validity of the data, and parametric statistical tools cannot be applied without testing the assumptions. It also reduces the standard error. The data need proper examination and thorough verification. Correct assumption checking generates robust results, and thus better conclusions can be derived. Hence assumption checking must be conducted before the analysis of data. The normality assumption applies to all parametric tools. Homogeneity, collinearity and multi-collinearity tests are also conducted. The test of homogeneity is a vital condition for the application of ANOVA. Collinearity between the dependent variable and the independent variables is checked for regression analysis, and the multi-collinearity test is conducted among the independent variables for regression analysis. The T test is applied after checking homogeneity as well.
7.1: Normality.
A normality test measures whether a random variable is normally distributed or not. Many tests rely heavily on the normality assumption. The normality test can be carried out both graphically and statistically.
Table 7: Tools for Normality Tests
Graphical Techniques: Q-Q probability plots, cumulative frequency (P-P) plots, histogram.
Mathematical Techniques: Jarque-Bera test, Shapiro-Wilk test, Kolmogorov-Smirnov test, D'Agostino test, Cramér–von Mises criterion, Lilliefors test.
Source: Data Compiled
EViews software applies the Jarque-Bera test; SPSS software provides the Shapiro-Wilk and Kolmogorov-Smirnov tests; Gretl software applies the Jarque-Bera, Shapiro-Wilk, Kolmogorov-Smirnov and Lilliefors tests.
All statistical software packages also provide graphical techniques to show the normality of data. The P-P plot and histogram are used to show normality graphically in this research work, and the Kolmogorov-Smirnov test is applied for the same purpose as well.
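A minimal scipy counterpart of the statistical normality checks mentioned above is sketched below (the chapter itself relies on SPSS). The file name and the column "Total_Revenue" are assumptions for demonstration.

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("revenue_data.csv")                 # hypothetical data file
x = df["Total_Revenue"].dropna()

sw_stat, sw_p = stats.shapiro(x)                                     # Shapiro-Wilk test
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std()))   # Kolmogorov-Smirnov test
print(f"Shapiro-Wilk: stat={sw_stat:.3f}, p={sw_p:.3f}")
print(f"Kolmogorov-Smirnov: stat={ks_stat:.3f}, p={ks_p:.3f}")
# p > 0.05: fail to reject H0, data treated as normal (as interpreted in the text)
```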
Table 8: Normal and Non Normal Data.
Normally Distributed Data: Kolmogorov-Smirnov Statistic = .261, df = 50, Sig. = .200
Non Normally Distributed Data: Kolmogorov-Smirnov Statistic = .096, df = 50, Sig. = .000
Source: Data Compiled with the help of SPSS.
The table above shows both normal and non-normal data. New researchers are often confused about the hypothesis of normality. The hypothesis is as follows.
Ho: The sample data are not significantly different from a normal population.
Ha: The sample data are significantly different from a normal population.
Some researchers also use the following hypotheses for the normality assumption:
Ho: Data are normal.
Ha: Data are not normal.
Both sets of hypotheses have the same meaning.
Here the null hypothesis is to be accepted; thus, the P value or significance value should be more than .05. The first set of data in Table 8 carries a significance value of .20, which is more than .05. Thus, the null hypothesis is accepted; hence the data are said to be normal, or the sample data are not significantly different from a normal population.
The table also comprises data whose P value is .00. It means the null hypothesis is rejected at the 1% level of significance and the alternative hypothesis is accepted. Thus, the data are non-normal.
Table 9: Normality of Data.
Variables Kolmogorov-Smirnov Result
Statistics N Sig.
Total Revenue .096 53 .200 Normal
Direct Tax Revenue .114 53 .081 Normal
Indirect Tax Revenue .103 53 .200 Normal
Non Tax Revenue .124 53 .061 Normal
Personal Tax Revenue .162 53 .001 Not Normal
Corporate Tax Revenue .096 53 .200 Normal
Source: Data Compiled in SPSS.
The above table shows six metric variables. The variables are normal except Personal Tax Revenue, whose P value is .001.
Exhibit 2: Normality of Data in Histogram.
1. Normally Distributed Data 2. Non Normally Distributed Data
Source: Data Compiled with the help of SPSS.
The picture above shows the normality of data with histograms. In the first picture almost all the data lie inside the normal curve, which is not the case in the second picture. The first picture relates to the residuals of a regression model of this research work.
Exhibit 3: Normality of Data in P-P Plot and Q-Q Plot.
1. Normally Distributed Data 2. Non-Normally Distributed
Data
Source: Data Compiled with the help of SPSS.
If the data points are near the straight line in a P-P plot or Q-Q plot, the data are said to be normally distributed. In the first picture, the plots are near the line; hence the data are normally distributed. In the second picture, the data points are not closely associated with the straight line; hence those data are not normal. The first picture relates to a regression model of this research work.
7.2 Test of Homogeneity.
Homogeneity of groups is an essential precondition of tests like ANOVA and the independent T test; the ANOVA family also needs the homogeneity assumption. Homogeneity in groups refers to the uniform composition or structure of the samples, i.e., the presence of equal variance among the groups. Homogeneity is assessed alongside the T statistics and F statistics in the T test and ANOVA respectively. Different tests are available for this purpose, such as Hartley's Fmax, Cochran's, Levene's and Bartlett's tests. Levene's test (1960) is widely used by researchers, as the other tests are too sensitive to non-normal data. Levene's test has been applied in this research also.
The hypothesis for this test is similar to the hypothesis constructed for normality, which is as follows.
Ho: There is no significant difference among the variances of the sample groups.
Ha: There is a significant difference among the variances of the sample groups.
The above hypotheses can also be stated in a different way, which is as follows.
Ho: Samples are homogeneous.
Ha: Samples are not homogeneous.
The null hypothesis should be accepted; it means the P value or significance value should be more than 0.05 in order to accept the null hypothesis. A non-significant Levene's statistic (P value above 0.05) therefore indicates homogeneity.
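A hedged sketch of Levene's test via scipy follows. The three groups below stand for the reform-period splits used in the chapter; the numbers are illustrative placeholders only, not the study's data.

```python
from scipy import stats

pre_reform  = [12.1, 13.4, 11.8, 12.9, 13.0]   # hypothetical group observations
mid_reform  = [14.2, 15.1, 13.8, 14.9, 15.3]
post_reform = [16.0, 17.2, 16.8, 17.5, 16.4]

stat, p = stats.levene(pre_reform, mid_reform, post_reform)
print(f"Levene statistic = {stat:.3f}, p = {p:.3f}")
# p > 0.05: variances treated as equal (homogeneous), as required before ANOVA
```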
Table 10: Levene's Test for Homogeneity.
Variables Levene's Statistic df1 df2 Sig. Result
Total Revenue 2.353 2 50 .106 Homogeneous
Direct Tax Revenue 17.605 2 50 .000 Heterogeneous
Indirect Tax Revenue 1.287 2 50 .285 Homogeneous
Non Tax Revenue 2.643 2 50 .081 Homogeneous
Corporate Tax Revenue 1.947 7 44 .085 Homogeneous
Personal Tax Revenue 6.095 7 44 .000 Heterogeneous
Source: Test Statistics through SPSS Software.
The dependent variable must be metric and the independent variable must be categorical. In this research work the dependent variables are Total Revenue, Direct Tax Revenue, Corporate Tax Revenue and Personal Tax Revenue. The categorical, independent variables are economic policy reform, direct tax policy reform and direct tax administrative reform. Table 10 shows the result of the homogeneity test for different variables of this research work.
The P value or significance value of Total Revenue, Indirect Tax Revenue, Non Tax Revenue and Corporate Tax Revenue is more than .05. Thus, the null hypothesis is accepted and the group variances of these variables are treated as equal. The significance values of Direct Tax Revenue and Personal Tax Revenue are .000, which rejects the null hypothesis. It means those samples are not homogeneous, so the ANOVA technique cannot be applied to them. Non-parametric tests like the Friedman test and the Kruskal-Wallis test are popular in the place of ANOVA.
7.3 Test of Multi-Collinearity.
The absence of multi-collinearity is an important prerequisite for multiple regression analysis. Multi-collinearity is a situation where a number of independent variables are highly correlated with each other; it means one predictor variable can be used to predict another. In other words, the multi-collinearity test finds a high degree of inter-correlation or inter-association among the independent variables.
Multi-collinearity can be assessed by applying the correlation technique. If the coefficient of correlation is +1 or -1, perfect multi-collinearity is present.
Multi-collinearity is measured by two values, the tolerance level and the variance inflation factor (VIF). The tolerance value ranges between 0 and 1. Any tolerance value above 0.7 is acceptable, and social science researchers accept tolerance values of more than 0.6 as well. The VIF is the reciprocal of the tolerance value; it has a lower bound of 1 and no highest value. The upper bound should be within 3, though social science researchers also accept 10 as the upper limit. If the VIF exceeds 10, there is a multi-collinearity problem.
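A minimal sketch of reproducing tolerance and VIF outside SPSS with statsmodels is given below. The file name and predictor columns are assumptions based loosely on Table 11, shown only for demonstration.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("revenue_data.csv")                                       # hypothetical data file
predictors = ["Corporate_Tax", "Personal_Tax", "Gross_Domestic_Saving"]    # assumed column names
X = sm.add_constant(df[predictors].dropna())

for i, name in enumerate(predictors, start=1):                             # index 0 is the constant
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.3f}, Tolerance = {1 / vif:.3f}")
```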
Table 11: Result of Multi-Collinearity (Dependent Variable: Direct Tax Revenue).
Sl No Variables Tolerance VIF
1 Corporate Tax Revenue .975 1.026
2 Personal Tax Revenue .975 1.026
3 Wealth Tax Revenue .865 1.156
4 Other Tax Revenue .947 1.056
5 Gross Domestic Saving .915 1.093
6 Gross Capital Formation .976 1.025
7 Gross National Income .945 1.059
8 Policy Reform .747 1.338
9 Economic Reform .763 1.310
Source: Data Compiled in SPSS.
The above table shows the multi-collinearity statistics of a regression model where Direct Tax Revenue is processed with nine independent variables. The tolerance level of all independent variables is more than 0.7, and the variance inflation factor is also within 3. The same process is followed to check the multi-collinearity of the other regression models taking Direct Tax Revenue as the dependent variable.
Table 12: Result of Multi-Collinearity (Dependent Variable: Total Revenue).
Sl No Variables Tolerance VIF
1 Direct Tax Revenue .825 1.212
2 Indirect Tax Revenue .748 1.337
3 Non Tax Revenue .696 1.437
Source: Data Compiled in SPSS.
The above table exhibits the multi-collinearity statistics of a regression model taking Total Revenue as the dependent variable. The tolerance level and VIF are within the limits as well. The multi-collinearity of the other models is checked in the similar way.
Table 13: Result of Multi-Collinearity (Dependent Variable: Direct Tax Buoyancy).
Sl No Variables Tolerance VIF
1 Corporate Tax Buoyancy .931 1.074
2 Personal Tax Buoyancy .982 1.018
3 Wealth Tax buoyancy .953 1.049
4 Other Direct Tax Buoyancy .991 1.009
Source: Data Compiled in SPSS.
The above table portrays the multi-collinearity statistics of a regression model taking Direct Tax Buoyancy as the predictand and four components of direct tax revenue as predictors. The tolerance level and variance inflation factor are within the limits.
Table 14: Result of Multi-Collinearity (Dependent Variable: Direct Tax Buoyancy).
Sl No Variables Tolerance VIF
1 Gagr/Ggdp .623 1.604
2 Gtra/Ggdp .639 1.566
3 Gfin/Ggdp .921 1.086
4 Gcsp/Ggdp .476 2.103
5 Gimp/Ggdp .644 1.553
6 Gexp/Ggdp .657 1.522
Source: Data Compiled in SPSS.
The above model measures the impact of sectoral buoyancy on Direct Tax Buoyancy. The tolerance level of Gcsp/Ggdp is 0.476, so that variable may be removed from the model. The tolerance level of the other variables is more than 0.60, which is also acceptable.
Table 15: Result of Multi-Collinearity (Predictand: Total Revenue Buoyancy).
Sl No Variables Tolerance VIF
1 Direct Tax Buoyancy .872 1.147
2 Indirect Tax Buoyancy .801 1.248
3 Non Tax Buoyancy .803 1.245
Source: Data Compiled in SPSS.
The above table measures the impact of Direct Tax Buoyancy, Indirect Tax Buoyancy and
Non-Tax Buoyancy on Total Revenue Buoyancy. Tolerance level is more than 0.70 and VIF
is also within 3.
Table 16: Result of Multi-Collinearity (Predictand: Total Revenue Buoyancy).
Sl No Variables Tolerance VIF
1 Direct Tax Buoyancy .782 1.279
2 Gagr/Ggdp .381 2.623
3 Gman/Ggdp .468 2.139
4 Gtra/Ggdp .523 1.913
5 Gfin/Ggdp .920 1.087
6 Gcsp/Ggdp .497 2.014
7 Gimp/Ggdp .648 1.542
8 Gexp/Ggdp .551 1.814
Source: Data Compiled in SPSS.
The above table measures the impact of sectoral buoyancy on Total Revenue Buoyancy. The tolerance levels of Gagr/Ggdp and Gcsp/Ggdp are 0.381 and 0.497, so these variables may be removed from the model. The multi-collinearity of the other variables is checked in the similar way.
7.4 R Square and Adjusted R Square.
R square and adjusted R square measure the degree of explanation made by the variables in the model. R square explains how closely the data fit the regression line; it is also called the coefficient of determination and is the square of the coefficient of correlation. It shows the percentage of variation in the Y variable which is explained by all the X variables together. However, researchers have opined that model fit cannot be judged by considering only the R square value: a low or a high R square value does not necessarily indicate the badness or goodness of the model, and significant variables can be interpreted even when the R square value is low. This limitation of R square is explained below.
Limitations of R Square.
“R-squared cannot determine whether the coefficient estimates and predictions are biased,
which is why you must assess the residual plots. R-squared does not indicate whether a
regression model is adequate. You can have a low R-squared value for a good model or a high
R-squared value for a model that does not fit the data”1.
A high R-squared value may also result from a multi-collinearity problem.
Exhibit 4: Analysis of R Square.
1. Regression Fit Plots with R Square 98.5. 2.Residual Plots
Source: http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis.
The regression fit plot shows an R square value of 98.5%, which sounds good, but the points lie below and above the line. The residual plots form a clear pattern, which indicates that there is no constant variance among the residuals. Thus, the model is not a good fit even though the R square is above 90%.
The difference between R square and Adjusted R square is explained below.
“R-squared measures the proportion of the variation in your dependent variable (Y) explained
by your independent variables (X) for a linear regression model. Adjusted R-squared adjusts
the statistic based on the number of independent variables in the model”2.
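For reference, the standard adjustment (assuming n observations and k independent variables) is:
Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)
so the adjusted value falls when predictors that add little explanatory power are included.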
Thus, adjusted R square is recommended over R square. Both R square and adjusted R square are calculated for the ANOVA and regression analyses. A few models with their R square and adjusted R square values are mentioned below.
1.http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-
squared-and-assess-the-goodness-of-fit.
2.https://www.quora.com/What-is-the-difference-between-R-squared-and-Adjusted-R-squared.
Table 17: R square and Adjusted R square
Model No Technique R square Adjusted R Square
6.1.2 Factorial Anova .975 .971
6.1.3 Factorial Anova .977 .974
6.1.7 Ancova .99 .99
6.1.13 Manova .975 .971
6.2.1 Multiple Regression .684 .677
6.2.2 Multiple Regression .944 .939
6.2.3 Multiple Regression .970 .967
6.2.7 Multiple Regression .667 .616
Source: Data Compiled in SPSS.
The table above shows the R square and adjusted R square value of some Anova and Regression
Models. The said value of other models is also considered in the similar way.
7.5 Homoscedasticity: Constant Variance of Residuals.
Regression analysis carries one of the important assumptions that there must be constant variance among the residuals. If the plots are scattered randomly over the whole area without forming a pattern, it is said that there is constant variance of the residuals.
Exhibit 5: Consistency of Residuals.
Source: Data Compiled in SPSS.
The above scatter plot is taken from a regression model which is used in this research work.
The plots are spread across the area; thus, it can be interpreted that there is constant variance
among residuals.
Independence of Observations: Durbin-Watson Statistics.
Durbin and Watson (1951) first applied such a statistic to the residuals of least squares regression analysis. They developed the null hypothesis that the errors are serially uncorrelated at lag 1. Other researchers formed a similar type of hypothesis, which is as follows.
H0 = There is no first order autocorrelation.
H1 = There is first order autocorrelation.
Both sets of hypotheses interpret the result in a similar way, that there is no autocorrelation in the residuals. Such a test can be applied when the errors are normally distributed, which is also a pre-condition of regression analysis. The statistic calculated by SPSS is
d = Σ(et − et−1)² / Σet²
where et is the residual at time t. The value of the statistic varies from 0 to 4, but a rule of thumb is that test statistic values in the range of 1.5 to 2.5 are relatively normal3 & 4. A value near 0 suggests a high degree of positive autocorrelation and a value near 4 suggests a high degree of negative autocorrelation. In other words, the said tool measures the independence of the observations.
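A short statsmodels illustration of obtaining the Durbin-Watson statistic from regression residuals is given below, as an alternative to the SPSS output used in the chapter. The file and column names are assumptions for demonstration.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("revenue_data.csv")                                 # hypothetical data file
y = df["Total_Revenue"]
X = sm.add_constant(df[["Direct_Tax", "Indirect_Tax", "Non_Tax"]])   # assumed predictors

model = sm.OLS(y, X, missing="drop").fit()
print(f"Durbin-Watson = {durbin_watson(model.resid):.3f}")           # ~1.5 to 2.5 desired per the text
```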
Table 18: Durbin-Watson Statistics.
Model No Number of Predictors Durbin-Watson Statistics Desired Value
6.2.1 7 1.711 1.5 to 2.5
6.2.3 11 1.810
6.2.5 3 1.556
Source: Data Compiled in SPSS.
The values in the above table are compiled using SPSS 23 data analysis software, which is widely used by researchers.
Table 18 exhibits the Durbin-Watson statistics of a few regression models designed for this research work. The values are 1.711, 1.810 and 1.556, which lie within the prescribed limit. The corresponding values of the other models are also within the desired limit.
3. http://www.statisticshowto.com/durbin-watson-test-coefficient/.
4. https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php.
7.6 Model Fit with Anova.
The ANOVA table tests the linear relationship between the variables, which is one of the important pre-conditions of linear regression analysis. The result is reported in the ANOVA table with the F statistic. The hypotheses for such a test are mentioned below.
H0 = There is no linear relationship between the variables.
H1 = There is a linear relationship between the variables.
The null hypothesis should be rejected at the lowest possible P value.
Table 19: Anova Statistics.
Model No F Statistics Sig/P Value
6.2.4 269.254 .000
6.2.8 734.174 .000
6.2.9 5.427 .000
Source: Data Compiled in SPSS.
The above table portrays the F statistics and P values. The P value is 0.000, which rejects the null hypothesis at the 1% level of significance. Thus, there is a linear relationship between the variables. Similar interpretations are derived for the other regression models.
8. RESEARCH TOOLS AND SOFTWARES USED.
Different statistical tools have been used with the help of various data analysis software packages to dissect the data. The data set is prepared first and the statistical tools are selected on the basis of the data set; expert opinion was taken in this regard. Different software packages like Microsoft Excel, IBM SPSS 23 and STATA 10 have been used for data analysis. The tools and techniques used in this study are as follows.
1 Tabular Analysis.
2 Graphical Analysis.
3 Ratios and Percentages.
4 Averages.
5 Independent T Test.
6 Analysis of Variance (One Way Anova).
7 Factorial Anova.
8 Ancova.
9 Mancova.
10 Multiple Linear Regression Analysis.
11 Correlation and Canonical Correlation.
12 Non-Parametric Tests.
13 Multiple Discriminant Analysis.
14 NVIVO Plus Software for Finding Research Gap.
15 Mendeley Software for References.
1. Tabular Analysis.
Different tables are drawn for better presentation of the data. Three-year averages of Total Revenue, Direct Tax Revenue, Corporate Tax Revenue and Personal Tax Revenue have been presented in different tables for comparison. The buoyancy of different taxes is also presented in the same way. Summary assessment collections, in the form of pre-assessment and post-assessment, are also shown in tables. Moreover, the number of warrants, work load and work load disposal are portrayed in several tables.
2. Graphical Analysis
Tabular data have also been shown in different graphs for better presentation. The impact of the reform periods is shown for Total Revenue, Direct Tax Revenue, etc. The comparison of micro economic factors with Total Revenue is shown in line charts, and the comparison between the growth of Direct Tax Revenue and the Number of Effective Assessees is presented in line charts as well. The graphical analysis is extended to other similar presentations. Microsoft Excel is used for this purpose.
3. Ratio and Percentages
Ratios and percentages are often used for organising the data. These tools are simple and useful. The contribution of Direct Tax Revenue, Indirect Tax Revenue and Non-Tax Revenue to Total Revenue is calculated with the help of this technique. A similar method is applied to measure the contribution of Corporate Tax Revenue, Personal Tax Revenue, Wealth Tax Revenue and Other Direct Tax Revenue to Total Direct Tax Revenue. The growth of different taxes has been calculated with the help of percentages, and the buoyancy of different taxes with the help of ratios. Microsoft Excel is used for this purpose.
4. Averages.
The average used here is the simple mean. The mean is used directly, and other tools use the mean to arrive at their results. Three-year averages of Total Revenue, Direct Tax Revenue, Corporate Tax Revenue and Personal Tax Revenue are calculated.
The buoyancy of different taxes is also calculated in the same way: three-year average buoyancies of Total Revenue, Direct Tax Revenue, Corporate Tax Revenue and Personal Tax Revenue are also measured.
5. Independent T Test
The independent T test has been applied to pre- and post-assessment collections and to study the impact of compulsory e-filing and the establishment of the tax information network on advance tax, TDS, regular assessment tax, etc. The t value and P value are used to draw appropriate conclusions about the difference between the groups. SPSS 23 is used to apply this test.
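A hedged scipy sketch of the independent-samples T test, mirroring the pre/post assessment comparison described above, is shown below. The growth figures are illustrative placeholders, not the study's data.

```python
from scipy import stats

pre_assessment  = [8.2, 9.1, 7.5, 8.8, 9.4, 8.0]       # hypothetical growth rates
post_assessment = [11.3, 10.8, 12.1, 11.7, 10.9, 12.4]

t_stat, p_value = stats.ttest_ind(pre_assessment, post_assessment, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")           # p < 0.05 indicates a significant difference
```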
6. One Way Anova
One-way ANOVA is an extension of the T test; the comparison is made among three or more groups. For the ANOVA family the dependent variable is metric and the independent variables are categorical. In this research work, the impact of economic reform on Total Revenue, Indirect Tax Revenue and Non-Tax Revenue is measured. Tests of normality and homogeneity are essential before applying this technique; the normality and homogeneity statistics are given in Tables 9 and 10 respectively. The test is conducted with the help of IBM SPSS 23.
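A minimal scipy counterpart of the one-way ANOVA run in SPSS is sketched below: a metric variable is compared across three hypothetical reform-period groups (illustrative numbers only).

```python
from scipy import stats

group_1 = [12.1, 13.4, 11.8, 12.9]   # hypothetical group observations
group_2 = [14.2, 15.1, 13.8, 14.9]
group_3 = [16.0, 17.2, 16.8, 17.5]

f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```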
7. Factorial Anova
Factorial ANOVA considers one dependent variable and more than one categorical independent variable. In this research work Total Revenue is considered as the dependent variable, and economic reform, direct tax policy reform and administrative reform are considered as independent variables. IBM SPSS 23 is used for this analysis. A few models that ran factorial ANOVA are as follows.
Table 20: Factorial Anova.
Model No Dependent Variable Independent Variable
6.1.6 Total Revenue Reform Measuresa
6.1.7 Direct Tax Revenue Reform Measuresa
Source: Compiled Data.
a. Reform measures are Economic Reform, Direct Tax Policy Reform and Administrative Reform.
The other models are also processed in the software in this way.
8. Ancova
ANCOVA shows the impact of the independent variable on the dependent variable after eliminating the effect of a covariate; that is, the influence of one variable is neutralised. The covariate must be a metric variable.
The homogeneity assumption is followed here as well. Various ANCOVA models are designed in this research work; a few are shown below in Table 21.
Table 21: Ancova Model.
Model No Dependent Variable Independent Variable Covariate
6.1.11 Total Revenue Policy Reform Indirect Tax Revenue
6.1.15 Total Revenue Administrative Reform Indirect Tax Revenue
Source: Compiled Data.
The above table shows two models. In the first model, impact of policy reform on Total
Revenue is studied by eliminating the effect of Indirect Tax Revenue. The second model
measures the impact of administrative reform on Total Revenue by eliminating the impact of
Indirect Tax Revenue. IBM SPSS 23 is used for such purpose.
9. Manova
The MANOVA model uses multiple dependent variables with multiple factors. In this research work the following models are designed for the application of MANOVA.
Table 22: Manova Model.
Model No Dependent Variables Independent Variables
6.1.17 Total Revenue, Direct Tax Revenue Economic Reform, Policy Reform, Administrative Reform
6.1.18 Corporate Tax Revenue, Personal Tax Revenue Economic Reform, Policy Reform, Administrative Reform
Source: Compiled Data.
The above table portrays that two dependent variables are processed with three independent variables in each model. IBM SPSS software is used for this purpose.
10. Multiple Regression Analysis.
Multiple regression analysis is applied to measure the impact of direct tax reform on total revenue and direct tax revenue. The assumptions of this technique have already been discussed earlier in this segment. A few models are shown below.
Model 1
Direct Tax Revenue = α + β1·ctr + β2·ptr + β3·wtr + β4·odtr + β5·gni + β6·pci + β7·gds + β8·gcf + E
Model 2
Direct Tax Revenue = α + β1·ctr + β2·ptr + β3·wtr + β4·odtr + β5·gni + β6·pci + β7·gds + β8·gcf + β9·Administrative Reform + β10·Policy Reform + β11·Economic Reform + E
Model 3
Total Revenue = α + β1·dtr + β2·itr + β3·ntr + E
In the similar way other regression models are designed and tested.
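A minimal statsmodels sketch corresponding to Model 3 above (Total Revenue regressed on Direct Tax, Indirect Tax and Non-Tax Revenue) is given below. The file and column names are assumptions for demonstration.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("revenue_data.csv")                                 # hypothetical data file
y = df["Total_Revenue"]
X = sm.add_constant(df[["Direct_Tax", "Indirect_Tax", "Non_Tax"]])   # assumed column names

model = sm.OLS(y, X, missing="drop").fit()
print(model.summary())   # coefficients, R-squared, adjusted R-squared, F statistic, Durbin-Watson
```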
11. Correlation and Canonical Correlation.
Pearson's correlation technique is used to correlate pre- and post-assessment data. This technique relates one individual variable with another by forming a correlation matrix, and it also reports the P value for the level of significance. The variables are usually metric.
On the other hand, the canonical correlation technique is applied to find the correlation between sets of variables. The sets of variables may be categorical or metric. This technique has also been applied to correlate pre- and post-assessment collections.
12. Multiple Discriminant Analysis
This technique takes the dependent variable as a categorical variable and the independent variables as metric variables, like logistic regression and unlike ANOVA and linear regression. The technique also studies the accuracy of grouping through group centroids, the eigenvalue, Wilks' lambda, canonical correlation, the structure matrix, etc.
The eigenvalue explains the variance of the discriminant function; the structure matrix shows the relationship between the independent variables and the discriminant function; the chi-square value tests the equality of the group means; and Wilks' lambda shows the accuracy of the segmentation. The extent of correct classification is also revealed by this tool.
13. Non-Parametric Tests.
Different non-parametric tests are available in place of the T test and ANOVA. A non-parametric test is applied when the normality assumption is not satisfied; rigid pre-conditions are not required for such tools. However, inferences from the sample to the population are weaker, as the normal distribution condition is not satisfied.
Table 23: Tools for Non-Parametric Tests.
Quantitative Data
Parametric Test / Non-Parametric Test
1 Sample T test / Sign Test
Paired Sample T test (Related Samples) / Wilcoxon Signed Rank Test
2 Sample T test (Independent Samples) / Mann Whitney U Test, Wilcoxon Rank Sum Test
One Way ANOVA (Independent Samples) / Kruskal Wallis Test
One Way ANOVA (Related Samples) / Friedman Test
Qualitative Data
For Independent Samples / Chi Square Test, Fisher's Exact Test
For Matched Case-Control Samples / McNemar Test
Bivariate Correlation (Quantitative Data)
Normality assumptions satisfied / Pearson's Correlation
Normality assumptions not satisfied / Spearman's Correlation
Agreement Analysis
Quantitative data / Bland-Altman Plots
Qualitative data / Kappa Estimates
Source: SPSS Software.
The above table portrays different non-parametric tests which can be used in place of the T test and ANOVA. In this research work, the Kruskal-Wallis test has been applied to Personal Tax Revenue, which is neither normal nor homogeneous.
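A short scipy illustration of the Kruskal-Wallis test, applied when (as with Personal Tax Revenue here) the normality and homogeneity assumptions fail, is shown below. The group values are illustrative placeholders for reform-period splits, not the study's data.

```python
from scipy import stats

group_1 = [2.3, 2.8, 2.1, 2.6]   # hypothetical group observations
group_2 = [3.4, 3.9, 3.1, 3.7]
group_3 = [5.2, 5.8, 5.5, 6.1]

h_stat, p_value = stats.kruskal(group_1, group_2, group_3)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```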
14. Application of NVIVO Plus Software for Finding the Research Gap.
Word Cloud of Literature.
NVIVO Plus software is used to find the word frequency and word cloud. The word frequency reveals how often different words are used in the literature, while the word cloud gives a pictorial presentation of the frequency of the various words.
The word cloud shows the frequency of words used in different pieces of literature; the highest-frequency words are more prominent. In the picture, tax and income are used a greater number of times than other words. The words which have little use can be used for forming statements under the research gap.
15. Mendeley Software.
Mendeley software has been used for reference writing. The articles are loaded into the software and linked to the word file for citation and referencing purposes. Some sample references in the APA style of referencing are given below.
Ahmed, Q. M., & Mohammed, S. D. (2010). Determinant of Tax Buoyancy : Empirical
Evidence from Developing Countries. European Journal of Social Sciences, 13(3), 408–
414.
Ashraf, M., & Sarwar, D. S. (2016). Institutional Determinants Of Tax Buoyancy In
Developing Nations. Journal of Emerging Economies and Islamic Research, 4(1), 1–12.
Azémar, C., & Delios, A. (2008). Tax competition and FDI: The special case of developing
countries. Journal of the Japanese and International Economies, 22(1), 85–108.
https://doi.org/10.1016/j.jjie.2007.02.001
Bagchi, A. (1973). Priorities for a Tax Programme. Economic & Political Weekly, 8(8),
435,437-439,441-443.
CONCLUSION.
The research methodology chapter is pivotal so far as the research work is concerned. This part shows the road map to the researcher, and the progress of the research work depends upon the strength of this segment. The methodology, tools, procedures, etc. are discussed in this chapter, along with the pre-testing and data cleaning carried out by the researcher.
REFERENCES.
1. https://journals.sagepub.com/doi/abs/10.1177/001316447003000308#
2. https://en.wikipedia.org/wiki/Standard_score#/media/
3. https://www.itl.nist.gov/div898/handbook/eda/section3/eda35a.htm
4. https://influentialpoints.com/Training/hartleys-and-cochrans_tests.htm
5. http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-
interpret-r-squared-and-assess-the-goodness-of-fit.
6. https://www.quora.com/What-is-the-difference-between-R-squared-and-Adjusted-R-
squared.
7. http://www.statisticshowto.com/durbin-watson-test-coefficient/.
8. https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php