Figure 3 - uploaded by Suruswadee Nanglae
Source publication
This study aims to identify factors that affect the distribution of wild tea in the upper north of Thailand and to build mathematical models that predict wild tea distribution. We obtained wild tea data from the Tea Institute at Mae Fah Luang University. We used 3 climatic factors (rainfall, humidity and temperature) and 5 geographic factors (s...
Context in source publication
... of wild tea in the upper north of Thailand and build mathematical models, known as “species distribution models” (SDMs). SDMs are an important tool for conservation and evolutionary studies. They describe the relationship between species ranges and environmental parameters. SDMs fall into two groups. The first group, called “profile techniques”, requires presence-only data; examples are the bioclimatic analysis and prediction system (BIOCLIM), a flexible modeling procedure for mapping potential distributions of plants and animals (DOMAIN), and ecological niche factor analysis (ENFA). The second group, called “group discrimination techniques”, requires both presence and absence data; examples are generalized linear models (GLMs) and generalized additive models (GAMs). Both groups are based on statistical models. Elith et al. [2] compared various SDMs and found that group discrimination techniques tend to perform better than profile techniques, so group discrimination techniques are increasingly used. The main problem with group discrimination techniques is that absence data are unavailable, so many studies have used pseudo-absence data in place of real absence data [3]. Approaches for generating pseudo-absence data fall into two main categories. The first selects pseudo-absence points at random, including background, across the whole study area [4-5]. The second works in two steps: first estimate the suitable area with a profile technique, then select pseudo-absence points outside that area [3, 5]. In this study, we built SDMs with a group discrimination technique, generalized linear models (GLMs). We assumed that the area inside a circle drawn around each presence point is suitable for wild tea.
The pseudo-absence points were therefore randomly selected outside circles of each radius (5 km, 10 km, 15 km, 20 km, 25 km, 30 km, 35 km, 40 km, 45 km and 50 km) around the presence points, and the SDMs obtained at each radius were then compared.

Species and environmental data

We used 41 points of tea data together with three climatic factors (rainfall, humidity and temperature) and five geographic factors (soil series, slope, digital elevation model (DEM), distance from the main river and aspect). The tea data were obtained from the Tea Institute at Mae Fah Luang University. Rainfall, humidity, temperature and river data were obtained from Remote Sensing and GIS at the Asian Institute of Technology; the DEM was obtained from the Land Development Department; and soil series data were obtained from the Center for Information Technology Services at Mae Fah Luang University. We calculated slope and aspect from the DEM, and distance from the main river from the river data. The study area was the upper north of Thailand, covering eight provinces: Chiang Mai, Chiang Rai, Mae Hong Son, Nan, Phayao, Phrae, Lampang and Lamphun. The tea data points and the study area are shown in Figure 1.

Generating pseudo-absence points

We generated pseudo-absence points by random selection outside circles of each radius (5 km, 10 km, 15 km, 20 km, 25 km, 30 km, 35 km, 40 km, 45 km and 50 km) around the presence points. As Figure 2 shows, we overlaid two maps in R. On the first map, we drew a circle of one radius around each presence point and then randomly selected 410 points on the map of the study area (red points). On the second map, we built 5 km × 5 km grid cells across the study area, with a center point for each grid cell. When the two maps were overlaid, the points outside the circles (black points) were selected as pseudo-absence points.
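The overlay procedure described above can be sketched in a few lines of Python. This is only one reading of the text, not the authors' R code: it assumes planar coordinates in kilometres, uses a rectangle as a stand-in for the study area, and treats the center of a grid cell as a candidate only when that cell contains at least one random point. The function name `pseudo_absences` and all of its parameters are hypothetical.

```python
import math
import random

def pseudo_absences(presence, radius_km, bounds, n_random=410, cell_km=5.0, seed=0):
    """Return grid-cell centers usable as pseudo-absence points.

    presence : list of (x, y) presence coordinates in km (planar, assumed)
    bounds   : (xmin, ymin, xmax, ymax), a rectangular stand-in study area
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    nx = max(1, math.ceil((xmax - xmin) / cell_km))
    ny = max(1, math.ceil((ymax - ymin) / cell_km))

    # Step 1: scatter random points over the study area (the "red points").
    random_pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
                  for _ in range(n_random)]

    # Step 2: record which grid cells contain at least one random point.
    # Indices are clamped so a point on the upper edge stays in the last cell.
    occupied = {(min(int((x - xmin) // cell_km), nx - 1),
                 min(int((y - ymin) // cell_km), ny - 1))
                for x, y in random_pts}

    # Step 3: keep the center of each occupied cell if it lies outside
    # every circle of the given radius around a presence point.
    selected = []
    for i, j in sorted(occupied):
        cx = xmin + (i + 0.5) * cell_km
        cy = ymin + (j + 0.5) * cell_km
        if all(math.hypot(cx - px, cy - py) > radius_km for px, py in presence):
            selected.append((cx, cy))
    return selected

# Made-up presence points on a 200 km x 200 km area, 15 km radius.
presence = [(50.0, 50.0), (120.0, 80.0)]
points = pseudo_absences(presence, radius_km=15, bounds=(0, 0, 200, 200))
print(len(points), "pseudo-absence points selected")
```

Repeating the call with different `seed` values gives repeated pseudo-absence data sets for one radius, and looping over `radius_km` gives one collection per radius.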
Although the blue points were also outside the circles, they were not selected as pseudo-absence points because their grid cells did not contain any random points (red points).

Modeling experiment

Generalized linear models (GLMs) were introduced by Nelder and Wedderburn [10] in 1972. In GLMs, a link function transforms the non-linear relationship for a response from the exponential family into a linear form. Estimation and inference are based on the theory of maximum likelihood estimation. We used logistic regression, a type of GLM for binary (presence/absence) data, to predict species distribution. Let $Y = 1$ or $0$ denote the presence or absence of a species, respectively, and let $p(x) = P(Y = 1 \mid X = x)$ be the probability that the species is present when $X = x$. The resulting presence-absence response curve is

$$p(x) = \frac{e^{g(u)}}{1 + e^{g(u)}},$$

where $g(u)$ is the link function,

$$g(u) = \ln\frac{p(x)}{1 - p(x)} = \beta_0 + \beta^{T} x.$$

In this study, we randomly generated 20 pseudo-absence data sets at each radius and used the area under the curve (AUC) to evaluate each model statistically. We fitted GLMs and selected variables by the stepwise method; in R, the criterion for stepwise variable selection was the Akaike information criterion (AIC). In Figure 3, we compared these models by AUC using box plots. The box plot shows the median, the distribution and the outliers of AUC at each radius. The line in each box indicates the median AUC. The median AUC increased from 5 to 10 km, was fairly constant from 15 to 30 km, and fluctuated from 35 to 50 km. The points beyond the ends of the vertical lines are outliers; the outliers of AUC at each radius were not considered. The mean plot shows the mean AUC and its 95% confidence interval at each radius. Most of the mean values were about 0.85 to 0.90, so these models were considered to show excellent discrimination.
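The logistic response curve and the AUC criterion used in the modeling experiment above can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the function names, the intercept and coefficient arguments, and the example scores are all hypothetical, and the AUC is computed with the standard pairwise (rank) definition rather than any particular R package.

```python
import math

def logistic_response(x, intercept, coefs):
    """Presence probability p(x) = e^g / (1 + e^g), written as 1 / (1 + e^-g),
    with linear predictor g = intercept + coefs . x."""
    g = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-g))

def auc(scores, labels):
    """AUC as the fraction of (presence, absence) pairs in which the presence
    point gets the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for two presence points (label 1)
# and two pseudo-absence points (label 0).
scores = [0.9, 0.4, 0.5, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))
```

In this toy case three of the four presence/pseudo-absence pairs are ranked correctly, so the AUC is 0.75. Computing `auc` once per pseudo-absence data set and collecting the 20 values per radius would give the kind of distribution summarized in the box plot.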
After removing the outliers at each radius, we fitted a cubic curve to compare AUC across radii and to find a suitable radius for generating pseudo-absence data sets. Figure 4 shows the fitted cubic curve. AUC was fairly constant in the range from 15 km to 40 km. We therefore selected the smallest radii in this range, 15 km and 20 km, which still gave excellent AUC, to pursue a suitable model. Table 1 summarizes the 20 randomly generated pseudo-absence data sets at each radius, with the outliers of AUC excluded. The mean AUC values at the two radii were close. Although the mean at 20 km was higher than at 15 km, the coefficient of variation (CV) at 20 km was also higher. This indicates that AUC varied more at 20 km than at 15 km, so we could not say that the model at radius 20 km was better than the model at radius 15 km. We then selected, for each radius, the model whose AUC was closest to the mean value as its representative, and compared the models and their predictive maps. Table 2 shows the factors selected for each model by the stepwise method. At 15 km, four factors affected the model (DEM, rainfall, humidity and distance). DEM had the strongest effect, followed by rainfall, humidity and distance, in that order. At 20 km, five factors affected the model (DEM, rainfall, humidity, distance and aspect). As at 15 km, DEM had the strongest effect, followed by rainfall, humidity, distance and aspect, in that order. Figure 5 shows the estimated presence and absence areas; we compared the two estimation maps for radii 15 km and 20 km, and the red points on the maps are presence points.
Both maps in Figure 5 estimated the presence area somewhat similarly, but the map at radius 20 km appeared to have more presence area than the map at radius 15 km, and AUC at radius 20 km was also higher than at radius 15 km. However, the map with the larger presence area could not be judged better than the other, because AUC at radius 20 km varied more than AUC at radius 15 km. We therefore could not select the best model from this study, but we could analyze the factors that affected the wild tea distribution. In a future study, we will analyze the factors with GLMs using another method of generating pseudo-absence points and try to improve the model by reducing the variance of AUC at each radius.

Many studies have demonstrated that the way pseudo-absence points are generated affects the resulting models [6]. In this study, we generated pseudo-absence points by random selection outside circles of each radius around the presence points. The results showed that species distribution models built with GLMs depended on the radius size. If the chosen radius was too small (less than 15 km), the selected pseudo-absence points probably shared the same geographical area or climate as the presence points, whereas if the radius was too large (more than 35 km), the selected pseudo-absence points would certainly lie in very different geographical areas. Although our results were fairly effective, the radius size would need to be tuned using other statistical techniques, together with sufficient data, to improve the model.

We also thank the Tea Institute and the Center for Information Technology Services at Mae Fah Luang University, Remote Sensing and GIS at the Asian Institute of Technology and the Land Development Department for all data in this ...