Helge Toutenburg’s research while affiliated with Ludwig-Maximilians-Universität in Munich and other places


Publications (163)


Figure 1. The condition number as a function of f.
Figure 2. Condition numbers vs. different values of p for y = β₀ + β₁X₁ + β₂D + ε.
Table 2. Regression analysis output for y = 23 + 1.5X₁ + 3X₂ + 0.5D + ε.
Figure 3. Boxplot of 100 condition numbers for p = 0.95.
Figure 4. Results for the different values of p for y = β₀ + β₁X₁ + β₂D_A + β₃D_B + ε.

Role of categorical variables in multicollinearity in linear regression model
  • Chapter
  • Full-text available

January 2014 · 1,311 Reads · 59 Citations

M. Wissmann · H. Toutenburg

The present article discusses the role of categorical variables in the problem of multicollinearity in the linear regression model. It extends the diagnostic tool of the condition number to linear regression models with categorical explanatory variables and analyzes, both analytically and numerically, how the dummy variables and the choice of reference category can affect the degree of multicollinearity.
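As a rough numerical illustration of the abstract's point (with simulated data; the factor levels and degree of unbalance are illustrative, not taken from the article), the condition number of the design matrix can be computed for each possible choice of reference category of a three-level dummy-coded factor:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
# Unbalanced three-level factor: 90 observations in "A", 5 each in "B" and "C"
cat = np.array(["A"] * 90 + ["B"] * 5 + ["C"] * 5)

def condition_number(reference):
    """Condition number of the design matrix when `reference` is the omitted category."""
    levels = [lv for lv in ["A", "B", "C"] if lv != reference]
    dummies = np.column_stack([(cat == lv).astype(float) for lv in levels])
    X = np.column_stack([np.ones(n), x, dummies])  # intercept, continuous x, dummies
    s = np.linalg.svd(X, compute_uv=False)         # singular values, descending
    return s[0] / s[-1]

for ref in ["A", "B", "C"]:
    print(ref, round(condition_number(ref), 2))
```

The condition number differs across reference categories, illustrating that the coding of the categorical variable itself affects the measured degree of multicollinearity.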


Models for Categorical Response Variables

January 2010 · 29 Reads · 2 Citations

Generalized linear models (GLMs) are a generalization of the classical linear models of regression analysis and analysis of variance, which model the relationship between the expectation of a response variable and known predictor variables according to

E(y_i) = x_{i1}β_1 + … + x_{ip}β_p = x_i′β.  (8.1)

The parameters are estimated according to the principle of least squares and are optimal according to minimum dispersion theory or, in the case of a normal distribution, according to ML theory (cf. Chapter 3).
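A minimal numerical sketch of the classical model (8.1) and its least-squares fit (simulated data; the coefficients are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
# Design matrix: intercept plus two predictor variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least-squares estimate, i.e. the solution of min ||y - X beta||^2
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat.round(2))
```

With small disturbances the least-squares estimate recovers the true coefficients closely, as the minimum dispersion property suggests.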


Stein-Rule Estimation under an Extended Balanced Loss Function

October 2009 · 416 Reads · 16 Citations

Journal of Statistical Computation and Simulation

This paper extends the balanced loss function to a more general setup. The ordinary least squares and Stein-rule estimators are studied under this general loss function with a quadratic loss structure in a linear regression model. Their risks are derived when the disturbances in the linear regression model are not necessarily normally distributed. The dominance of the ordinary least squares and Stein-rule estimators over each other, and the effect of departures from the normality assumption on the risk properties, are studied.
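A sketch of one standard form of the Stein-rule estimator, which shrinks the OLS estimate toward zero by a data-dependent factor (simulated data; the shrinkage constant c below is one common choice, not necessarily the one used in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta = np.ones(p)
y = X @ beta + rng.normal(size=n)

# Ordinary least squares estimate and its residuals
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b_ols

# Stein-rule estimator: shrink b_ols by a factor driven by the residual sum of squares
c = (p - 2) / (n - p + 2)  # a common choice of shrinkage constant
shrink = 1 - c * (resid @ resid) / (b_ols @ (X.T @ X) @ b_ols)
b_stein = shrink * b_ols
```

The shrinkage factor lies between 0 and 1 here, so the Stein-rule estimate is uniformly shorter than the OLS estimate; the risk comparison between the two is exactly what the paper analyzes under the extended balanced loss.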


Multifactor Experiments

September 2009 · 11 Reads · 1 Citation

In practice, for most designed experiments it can be assumed that the response Y depends not on a single variable alone but on a whole group of prognostic factors. If these variables are continuous, their influence on the response is taken into account by so-called factor levels. These are ranges (e.g., low, medium, high) that classify the continuous variables as ordinal variables. In Sections 1.7 and 1.8, we have already cited examples of designed experiments in which the dependence of a response on two factors was to be examined.
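The classification of a continuous variable into ordinal factor levels can be sketched as follows (illustrative data; tercile cut points are one possible choice of ranges):

```python
import numpy as np

rng = np.random.default_rng(3)
temperature = rng.uniform(20, 100, size=12)  # a continuous prognostic factor

# Classify into ordinal factor levels "low"/"medium"/"high" via tercile cut points
cuts = np.quantile(temperature, [1 / 3, 2 / 3])
labels = np.array(["low", "medium", "high"])
levels = labels[np.searchsorted(cuts, temperature)]
print(levels)
```

The continuous factor is thereby replaced by an ordinal variable with three levels, which is the form a multifactor experiment works with.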


The Linear Regression Model

September 2009 · 34 Reads · 5 Citations

The main focus of this chapter will be the linear regression model and its basic principle of estimation. We introduce the fundamental method of least squares by looking at the least squares geometry and discussing some of its algebraic properties. In empirical work, it is quite often appropriate to specify the relationship between two sets of data by a simple linear function. For example, we model the influence of advertising time on the number of positive reactions from the public. From the scatterplot in Figure 3.1 one could suspect a linear relationship between advertising time (x-axis) and the number of positive reactions (y-axis). The study was conducted on 66 people in order to investigate the impact and cognition of advertising on TV.
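A minimal least-squares fit in the spirit of the advertising example (the numbers below are illustrative; the actual 66-person data set is not reproduced here):

```python
import numpy as np

# Illustrative data: advertising time (minutes) and number of positive reactions
x = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([12.0, 18.0, 25.0, 29.0, 38.0, 41.0])

# Closed-form least-squares slope and intercept for the simple linear model
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
```

The fitted line minimizes the sum of squared vertical distances to the points, which is the geometric picture of least squares discussed in the chapter.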


Comparison of Two Samples

September 2009 · 15 Reads · 24 Citations

Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation or may be the outcome of a controlled experiment. In the latter case, randomization plays a crucial role in gaining information about possible differences in the samples which may be due to a specific factor. Full unrestricted randomization means, for example, that in a controlled clinical trial there is a constant chance of every patient receiving a specific treatment. The idea of a blind, double-blind, or even triple-blind setup of the experiment is that neither patient, nor clinician, nor statistician knows what treatment has been given. This should exclude possible biases in the response variable which would be induced by such knowledge. It becomes clear that careful planning is indispensable to achieve valid results. Another problem in the framework of a clinical trial may be a systematic effect on a subgroup of patients, e.g., males and females. If such a situation is to be expected, one should stratify the sample into homogeneous subgroups. Such a strategy proves to be useful in planned experiments as well as in observational studies.
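The role of randomization in a two-sample comparison can be illustrated with a permutation test, which re-randomizes the group labels to judge whether an observed difference could have arisen by chance (hypothetical data, not from the book):

```python
import numpy as np

rng = np.random.default_rng(4)
treatment = np.array([5.1, 6.0, 5.8, 6.3, 5.5])  # hypothetical responses, group A
control = np.array([4.2, 4.8, 5.0, 4.5, 4.9])    # hypothetical responses, group B

observed = treatment.mean() - control.mean()
pooled = np.concatenate([treatment, control])

# Permutation test: re-randomize the group labels many times and count how
# often the re-randomized difference is at least as extreme as the observed one
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[:5].mean() - perm[5:].mean()
    if abs(diff) >= abs(observed):
        count += 1
p_value = count / n_perm
print(observed, p_value)
```

A small p-value indicates that the observed group difference is unlikely under pure chance assignment, which is exactly the inference randomization licenses.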


Incomplete Block Designs

September 2009 · 36 Reads · 1 Citation

In many situations the number of treatments to be compared is large. We then need a large number of blocks to accommodate all the treatments and, in turn, more experimental material, which may increase the cost of experimentation in terms of money, labor, time, etc. The completely randomized design and the randomized block design may not be suitable in such situations because they require a large number of experimental units. When a sufficient number of homogeneous experimental units is not available to accommodate all the treatments in a block, incomplete block designs are used, in which each block receives only some, and not all, of the treatments to be compared.

Sometimes the available blocks can handle only a limited number of treatments. For example, suppose the effect of twenty medicines for a rare disease from different companies is to be tested on patients. These medicines can be treated as treatments. It may be difficult to find enough patients with the disease to conduct a complete block experiment. A possible solution is then to have fewer than twenty patients in each block, so that not all twenty medicines can be administered in every block; instead, a few medicines are administered to the patients in one block and the remaining medicines to the patients in other blocks. Incomplete block designs can be used in this setup. As another example, medical companies and biological experimenters need animals to conduct experiments on the development of a new drug. Usually an ethics commission studies the whole project and decides how many animals may be sacrificed in the experiment, and the limits it prescribes are generally not sufficient for a complete block experiment.
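A classical balanced incomplete block design can be written down and checked directly. The sketch below uses a well-known design with 7 treatments in 7 blocks of size 3, where every treatment appears r = 3 times and every pair of treatments meets in exactly λ = 1 block:

```python
import numpy as np
from itertools import combinations

# Balanced incomplete block design: 7 treatments (0..6) in 7 blocks of size 3
blocks = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
          (1, 4, 6), (2, 3, 6), (2, 4, 5)]

# Incidence matrix: N[t, b] = 1 if treatment t occurs in block b
N = np.zeros((7, 7), dtype=int)
for b, trts in enumerate(blocks):
    for t in trts:
        N[t, b] = 1

r = N.sum(axis=1)  # replication count of each treatment
# Number of blocks in which each pair of treatments occurs together
lam = {pair: int(N[pair[0]] @ N[pair[1]]) for pair in combinations(range(7), 2)}
print(r, set(lam.values()))
```

Because every pair of treatments meets equally often, all pairwise treatment comparisons are estimated with the same precision even though no block contains all treatments.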


Statistical Analysis of Incomplete Data

September 2009 · 33 Reads · 7 Citations

A basic problem in the statistical analysis of data sets is the loss of single observations, of variables, or of single values. Rubin (1976) can be regarded as the pioneer of the modern theory of Nonresponse in Sample Surveys. Little and Rubin (1987) and Rubin (1987) have discussed fundamental concepts for handling missing data based on decision theory and models for the mechanism of nonresponse.


Single–Factor Experiments with Fixed and Random Effects

September 2009 · 21 Reads · 1 Citation

The analysis of variance, which was originally developed by R.A. Fisher for field experiments, is one of the most widely used and most general statistical procedures for testing and analyzing data. These procedures require a large amount of computation, especially in the case of complicated classifications, and are for this reason implemented in statistical software.
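The basic single-factor (one-way) ANOVA computation can be sketched by hand (hypothetical data for three fixed treatment groups):

```python
import numpy as np

# Hypothetical responses for three treatment groups (fixed effects)
groups = [np.array([4.1, 4.5, 4.3]),
          np.array([5.2, 5.6, 5.0]),
          np.array([6.1, 5.9, 6.4])]

all_y = np.concatenate(groups)
grand = all_y.mean()

# Decompose total variation into between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_b, df_w = len(groups) - 1, len(all_y) - len(groups)

# F statistic: ratio of between-group to within-group mean squares
F = (ss_between / df_b) / (ss_within / df_w)
print(round(F, 2))
```

A large F value relative to the F(df_b, df_w) distribution indicates that the treatment means differ by more than the within-group noise would explain.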


Repeated Measures Model

September 2009 · 15 Reads

In contrast to the previous chapters, we now assume that instead of having only one observation per object/subject (e.g., patient), we have repeated observations. These repeated measurements are collected at precisely defined time points. The principal idea is that these observations give information about the development of a response Y. This response might, for instance, be the blood pressure (measured every hour) for a fixed therapy (treatment A), the blood sugar level (measured every day of the week), or the monthly training performance of sprinters for training method A, i.e., variables which change with time (or with a different scale of measurement). The aim of such a design is not so much the description of the average behavior of a group (with a fixed treatment), but rather the comparison of two or more treatments and their effects across the scale of measurement (e.g., time), i.e., the treatment or therapy comparison. Before we deal with this interesting question, let us first introduce the model for one treatment, i.e., for one sample from one population.


Citations (27)


... A subsequent statistical test on the significance of the difference between the two distributions allows us to score the importance of the tested model term by a p-value (Toutenburg and Heumann, 2008). If predictions of the complete model are significantly better than the reduced model – measured, for example, by a paired parametric t-test (Toutenburg and Heumann, 2008), or a non-parametric Cox-Wilcoxon test (Toutenburg and Heumann, 2008) – the tested model term will be retained. If the reduced model outperforms the complete model, or does not differ significantly from the full model, the model term can be dropped. ...

Reference:

Analysing spatio-temporal patterns of the global NO₂ distribution retrieved from GOME satellite observations using a generalized additive model
Einführung in SPSS
  • Citing Chapter
  • January 2009

... Another guiding principle, adopted from methods of Six Sigma and statistical process control (Toutenburg & Knöfel, 2009), was the Pareto concept, which suggests that a large part of problems may be caused by a small fraction of root causes, which can be detected experimentally with only a few data points. This assumption of pivotal root causes (also known as red-X or key drivers) as important components in the data turned out to be quite useful. ...

Six Sigma: Methoden und Statistik für die Praxis
  • Citing Book
  • January 2009

... The authors also acknowledge that there exist no specific guidelines on identifying necessary domain information of a manufacturing system and translating them to OPC UA models. Therefore, the authors used SIPOC (Supplier, Input, Process, Output, Customer) analysis to identify the data points which are converted to OPC UA Objects (Toutenburg and Knöfel, 2008). Regarding integrating existing standards, the authors mentioned that there are some missing requirements which can't be covered by existing information models which is why company-specific models are developed. ...

Six Sigma: Methoden und Statistik für die Praxis
  • Citing Book
  • January 2008

... If, for example, only level one was significantly associated with SRA, we would conclude that levels two and three did not differ. This approach is supported by Wissmann et al. (2007) and Anderson et al. (2014) as appropriate for categorical values such as these implemented for SLP levels. The following paragraph details the steps taken for each of the 24 models of interest. ...

Role of categorical variables in multicollinearity in linear regression model

... Thus, the W r matrix for r = 7 can be expressed as: PB 6,6,3,3;2,1 is the incidence matrix of a Partially Balanced Incomplete Block Design PBIBD with 6 treatments in 6 blocks and 3 treatments per block with each treatment occurring in 3 blocks with λ 1 = 2 for the 1st association and λ 2 = 1 for the 2nd association, with the 0 entries replaced with − 1's. Details on how to construct the necessary PBIBD matrices can be found in Street and Street (2006); Toutenburg et al. (2009). This design was first constructed by Williamson (1946), and it gives a D-efficiency of 87.82% for choice pairs when only testing the main effects. ...

Incomplete Block Designs
  • Citing Chapter
  • September 2009

... Assuming  to be small is justifiable from the fact that if it is large, the model in (2.1) will not be well explained by the explanatory variables. This technique was first introduced by Kadane[16] and later used by many researchers, for instance, see Shalabh and Toutenburg[17], Dube and Chandra[13], and Qain and Giles[18]. The performance of the estimators is analyzed with respect to the criterion of risk under the LINEX loss function. ...

Estimation of linear regression models with missing data: The role of stochastic linear constraints
  • Citing Article
  • January 2005

Communications in Statistics - Theory and Methods

... Finally, the responses of the companies surveyed are analysed using simple and multiple linear regression analyses (Frost, 2018; Toutenburg et al., 2008) and Pearson correlation calculations (Kuckartz et al., 2013) and compared with the results of the literature analysis and the author's assumptions. ...

Descriptive statistics. An introduction to methods and applications with R and SPSS. With contributions by Michael Schomaker and Malte Wißmann. 6th revised and extended ed
  • Citing Article
  • January 2008

... Generalization of reliability measures for correlated observations was discussed by Wang and Chen (1994), Schaffrin (1997), and Ou (1999). Schaffrin et al. (2003) investigated the impact of missing values on the traditional reliability measures. The reliability in constrained Gauss-Markov models and its photogrammetric applications were addressed by Cothren (2005). ...

ON THE IMPACT OF MISSING VALUES ON THE RELIABILITY MEASURES IN A LINEAR MODEL

... Both approaches offer distinct advantages and disadvantages, depending on the specific requirements and nature of the problem under consideration. In some cases, integrating these methodologies can lead to optimal results [13][14][15]. This study employs Artificial Neural Networks (ANNs) and Response Surface Methodology (RSM) to develop models aimed at optimizing laser-cutting parameters to achieve superior fabric-cutting quality and enhanced production efficiency. ...

Statistical Analysis of Designed Experiments, Third Edition
  • Citing Book
  • January 2009