Preprint (PDF available)

Data Analysis Using Statistical Methods: Case Study of Categorizing the Species of Penguin

Authors:

Abstract

Statistical analysis is a scientific tool for collecting and analyzing large amounts of data, identifying common patterns and trends, and turning them into meaningful information. The goal of this project is to apply various statistical methods to a penguin dataset in order to predict the species of each penguin accurately. Using accuracy as the evaluation metric, the project achieves 98.5% accuracy in categorizing the data.
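The abstract reports its result as an accuracy figure without defining it. The sketch below assumes the usual definition: the fraction of penguins whose predicted species matches the true species. The function name `accuracy` and the species labels are hypothetical and for illustration only; they are not the paper's code or data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the true labels.

    Assumes the plain (unweighted) accuracy definition; the paper does not
    state which variant it uses.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted species equals the true species.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for illustration only (not the paper's data).
true_species = ["Adelie", "Gentoo", "Chinstrap", "Adelie"]
predicted_species = ["Adelie", "Gentoo", "Adelie", "Adelie"]
print(f"accuracy = {accuracy(true_species, predicted_species):.1%}")  # accuracy = 75.0%
```

For instance, a hypothetical test set of 200 penguins with 197 correct predictions would give 197/200 = 98.5%. The abstract does not report the size of the evaluation set or how it was split from the training data.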