Songsak Sriboonchitta’s research while affiliated with Chiang Mai University and other places


Publications (122)


Clustering mixed numerical and categorical data with missing values
  • Article

September 2021 · 128 Reads · 89 Citations

Information Sciences

Songsak Sriboonchitta

This paper proposes a novel framework for clustering mixed numerical and categorical data with missing values. It integrates the imputation and clustering steps into a single process, resulting in an algorithm named Clustering Mixed Numerical and Categorical Data with Missing Values (k-CMM). The algorithm consists of three phases. The initialization phase splits the input dataset into two parts based on the missing values in objects and the attribute types. The imputation phase uses a decision-tree-based method to find the set of correlated data objects. The clustering phase uses mean-based and kernel-based methods to form cluster centers for numerical and categorical attributes, respectively. The algorithm also uses the squared Euclidean distance and an information-theoretic dissimilarity measure to compute the distances between objects and cluster centers. An extensive experimental evaluation was conducted on real-life datasets to compare the clustering quality of k-CMM with that of state-of-the-art clustering algorithms. The execution time, memory usage, and scalability of k-CMM were also evaluated for various numbers of clusters and data sizes. Experimental results show that k-CMM can efficiently cluster mixed datasets with missing values and outperforms the other algorithms as the number of missing values in the datasets increases.
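To make the two-part distance concrete, here is a minimal Python sketch of a dissimilarity between a mixed object and a cluster center: squared Euclidean distance to the numerical means plus a categorical term computed against per-attribute category frequencies. The helper names (`numeric_center`, `categorical_center`, `mixed_dissimilarity`) are hypothetical, the categorical term (1 minus the relative frequency of the object's category) is a simplified stand-in rather than the paper's information-theoretic measure, and the decision-tree imputation phase is not covered.

```python
from collections import Counter
from typing import Sequence


def numeric_center(rows: list[Sequence[float]]) -> list[float]:
    """Mean of each numerical attribute over the cluster members."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def categorical_center(rows: list[Sequence[str]]) -> list[dict[str, float]]:
    """Relative frequency of every category, per categorical attribute."""
    n = len(rows)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*rows)]


def mixed_dissimilarity(num, cat, num_center, cat_center) -> float:
    """Distance of one mixed object (num, cat) to a cluster center."""
    # Squared Euclidean distance on the numerical part.
    d_num = sum((x - m) ** 2 for x, m in zip(num, num_center))
    # Simplified categorical term (assumption, not the paper's measure):
    # 1 minus the frequency of the object's category in the cluster.
    d_cat = sum(1.0 - freqs.get(v, 0.0) for v, freqs in zip(cat, cat_center))
    return d_num + d_cat
```

For a cluster with numerical rows [1.0, 2.0] and [3.0, 4.0] and categorical rows ("a", "x") and ("a", "y"), the object ([2.0, 3.0], ["a", "x"]) gets dissimilarity 0.5: the numerical part matches the means exactly, and only the second categorical attribute contributes.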


The role of weather conditions on tourists’ decision-making process: a theoretical framework and an application to China’s inbound visitors

February 2021 · 143 Reads · 10 Citations

This paper explores the role of weather conditions in tourists’ decision-making. For this purpose, it develops a theoretical framework that integrates how individual decisions are made at the micro level into a dynamic macro-level model with three decision stages: before booking, before departure, and during the trip. The model is empirically tested by applying chi-square analysis, logistic regression, and multiple correspondence analysis to a sample of inbound tourists visiting China. The results indicate that weather variations affect tourists’ decisions, mainly regarding activities. The specific weather conditions that affect visitors’ behaviour differ at each decision-making stage. Additionally, the influence of weather diminishes as the stages advance. The empirical results also indicate that the effect of weather depends on tourists’ characteristics. Finally, the results are used to provide recommendations for incorporating weather considerations into the design and management of tourism products.

50 free online copies: https://www.tandfonline.com/eprint/NDXJKURAMNNN7UEQWICU/full?target=10.1080/13683500.2021.1883555


A method for k-means-like clustering of categorical data
  • Article

September 2019 · 633 Reads · 38 Citations

Journal of Ambient Intelligence and Humanized Computing

Despite recent efforts, clustering categorical and mixed data in the context of big data remains challenging, owing to the lack of an inherently meaningful similarity measure between categorical objects and the high computational complexity of existing clustering techniques. While the k-means method is well known for its efficiency in clustering large data sets, it works only on numerical data, which prevents it from being applied to categorical data. In this paper, we develop a novel extension of the k-means method for clustering categorical data, using an information-theoretic dissimilarity measure and a kernel-based method for representing the cluster means of categorical objects. This approach allows us to formulate the problem of clustering categorical data in a fashion similar to k-means clustering, while the kernel-based definition of centers also yields an interpretation of cluster means that is consistent with the statistical interpretation of cluster means for numerical data. To demonstrate the performance of the new clustering method, a series of experiments is conducted on real datasets from the UCI Machine Learning Repository, and the results are compared with those of several previously developed algorithms for clustering categorical data.
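The k-means-like loop described in this abstract can be sketched in Python under simplifying assumptions: each cluster center is the per-attribute distribution of categories among its members (a plain frequency-based center in place of the paper's kernel-based one), and the distance is the sum over attributes of 1 minus the probability of the object's category in the cluster, in the spirit of the k-representatives algorithm rather than the paper's information-theoretic measure. All function names are hypothetical.

```python
import random
from collections import Counter


def frequency_center(rows):
    """Per-attribute relative frequencies of the categories among cluster members."""
    n = len(rows)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*rows)]


def distance(obj, center):
    """Sum over attributes of 1 - P(object's category | cluster); a simplified stand-in."""
    return sum(1.0 - freqs.get(v, 0.0) for v, freqs in zip(obj, center))


def kmeans_like_categorical(data, k, max_iter=100, seed=0):
    """Cluster tuples of categorical values; returns (labels, centers)."""
    distinct = sorted(set(data))
    if k > len(distinct):
        raise ValueError("need at least k distinct objects")
    rng = random.Random(seed)
    # Initialise each center from a different randomly chosen distinct object.
    centers = [frequency_center([obj]) for obj in rng.sample(distinct, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: distance(obj, centers[j])) for obj in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: recompute the frequency center of every non-empty cluster.
        for j in range(k):
            members = [obj for obj, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = frequency_center(members)
    return labels, centers
```

As in k-means, each iteration alternates an assignment step and a center update and stops once assignments no longer change, so for fixed k and number of attributes the cost per iteration grows linearly with the number of objects.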


A Novel Hybrid Autoregressive Integrated Moving Average and Artificial Neural Network Model for Cassava Export Forecasting

September 2019 · 425 Reads · 10 Citations

International Journal of Computational Intelligence Systems

This paper proposes a novel hybrid forecasting model that combines an autoregressive integrated moving average (ARIMA) model and an artificial neural network (ANN), incorporating a moving average and an annual seasonal index, for Thailand's cassava exports (i.e., native starch, modified starch, and sago). Comprehensive experiments are conducted to determine appropriate parameters for the proposed model and for the other forecasting models compared. In particular, the proposed model is compared experimentally with the ARIMA, the ANN, and other hybrid models using three popular prediction accuracy measures: mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The empirical results show that the proposed model gives the lowest error on all three measures for native starch and modified starch, the major exported cassava products (98% of total export volume). However, Khashei and Bijari's model is the best model for sago (2% of total export volume). Therefore, stakeholders making decisions in international cassava trading can use the proposed model as an alternative forecasting method to predict future exports of native starch and modified starch, which make up the majority of total exports, more accurately.
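The three accuracy measures used for the comparison have standard definitions; a minimal Python sketch follows. The function names are illustrative, and reporting MAPE in percent is an assumption about the paper's convention.

```python
def _check(actual, forecast):
    """Reject empty or mismatched inputs."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")


def mse(actual, forecast):
    """Mean square error."""
    _check(actual, forecast)
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    _check(actual, forecast)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```

For actual values [100, 200, 400] and forecasts [110, 190, 400], MAE is 20/3 while MAPE is 5%: the same absolute error of 10 counts twice as much, in relative terms, on the smaller observation.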


Type 2 constacyclic codes over [Formula presented] of oddly even length

February 2019 · 6,850 Reads · 5 Citations

Discrete Mathematics

Fang-Wei Fu · [...] · Songsak Sriboonchitta

Let [Formula presented] be a finite field of cardinality [Formula presented], [Formula presented] be an odd positive integer, and denote [Formula presented]. Let [Formula presented]. Then [Formula presented]-constacyclic codes over [Formula presented] are called constacyclic codes over [Formula presented] of Type 2. In this paper, an explicit representation and a complete description for all distinct [Formula presented]-constacyclic codes over [Formula presented] of length [Formula presented] and their dual codes are given. Moreover, explicit formulas for the number of codewords in each code and the number of all such codes are provided respectively. In particular, all distinct self-dual [Formula presented]-constacyclic codes over [Formula presented] of length [Formula presented] are presented precisely. In addition, a complement to a result in Cao et al. (2017) is given.


A class of repeated-root constacyclic codes over $\mathbb{F}_{p^m}[u]/\langle u^e\rangle$ of Type 2

October 2018 · 7,108 Reads

Let $\mathbb{F}_{p^m}$ be a finite field of cardinality $p^m$, where $p$ is an odd prime, let $n$ be a positive integer satisfying $\gcd(n, p) = 1$, and denote $R = \mathbb{F}_{p^m}[u]/\langle u^e\rangle$, where $e \geq 4$ is an even integer. Let $\delta, \alpha \in \mathbb{F}_{p^m}^{\times}$. Then the class of $(\delta + \alpha u^2)$-constacyclic codes over $R$ is a significant subclass of constacyclic codes over $R$ of Type 2. For any integer $k \geq 1$, an explicit representation and a complete description of all distinct $(\delta + \alpha u^2)$-constacyclic codes over $R$ of length $np^k$ and their dual codes are given. Moreover, formulas for the number of codewords in each code and for the number of all such codes are provided. In particular, all distinct $(\delta + \alpha u^2)$-constacyclic codes over $\mathbb{F}_{p^m}[u]/\langle u^e\rangle$ of length $p^k$ and their dual codes are presented precisely.
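For the constacyclic shift by $\delta + \alpha u^2$ to make sense, this element must be a unit of $R$. Because $u$ is nilpotent, its inverse is a finite geometric series, $(\delta + \alpha u^2)^{-1} = \delta^{-1}\sum_{k \ge 0} (-\alpha\delta^{-1}u^2)^k$, which stops once $2k \ge e$. The Python sketch below checks this only in the simplest setting $m = 1$ (so $\mathbb{F}_{p^m} = \mathbb{F}_p$), representing elements as coefficient lists of length $e$; the function names are hypothetical and extension fields are not handled.

```python
def mul(a, b, p, e):
    """Multiply two elements of F_p[u]/<u^e>, given as coefficient lists of length e."""
    out = [0] * e
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < e:  # u^e = 0, so higher powers vanish
                out[i + j] = (out[i + j] + ai * bj) % p
    return out


def inverse_of_delta_plus_alpha_u2(delta, alpha, p, e):
    """Inverse of delta + alpha*u^2 (delta nonzero mod p) via the finite geometric
    series delta^{-1} * sum_k (-alpha*delta^{-1}*u^2)^k, which ends because u^e = 0."""
    delta_inv = pow(delta, -1, p)  # modular inverse, Python 3.8+
    t = [0] * e
    if e > 2:
        t[2] = (-alpha * delta_inv) % p  # t = -alpha*delta^{-1}*u^2 is nilpotent
    term = [delta_inv] + [0] * (e - 1)
    total = [0] * e
    while any(term):  # each multiplication by t raises the u-degree by 2
        total = [(x + y) % p for x, y in zip(total, term)]
        term = mul(term, t, p, e)
    return total
```

With $p = 3$, $e = 4$, $\delta = 1$, $\alpha = 2$, the sketch returns [1, 0, 1, 0], i.e. $1 + u^2$, and indeed $(1 + 2u^2)(1 + u^2) = 1 + 3u^2 + 2u^4 = 1$ in $\mathbb{F}_3[u]/\langle u^4\rangle$.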


An Ensemble Model of Arima and Ann With Restricted Boltzmann Machine Based on Decomposition of Discrete Wavelet Transform for Time Series Forecasting

October 2018 · 95 Reads · 16 Citations

Journal of Systems Science and Systems Engineering

Time series forecasting research mainly focuses on developing effective forecasting models to improve prediction accuracy. This paper presents an ensemble model composed of autoregressive integrated moving average (ARIMA), artificial neural network (ANN), restricted Boltzmann machine (RBM), and discrete wavelet transform (DWT) components. In the proposed model, DWT first decomposes a time series into an approximation and a detail. Khashei and Bijari's model, an ensemble of ARIMA and ANN, is then applied to the approximation and the detail to extract both their linear and nonlinear components and to fit the relationship between these components as a function rather than as an additive relationship. Furthermore, RBM is used for pre-training, generating the initial weights and biases of the ANN from the input features. Finally, the forecasted approximation and detail are combined to obtain the final forecast. The forecasting capability of the proposed model is tested on three well-known time series: sunspot, Canadian lynx, and exchange rate. Its prediction performance is compared with that of six other forecasting models. The results indicate that the proposed model performs best on all three data sets and all three measures (i.e., MSE, MAE, and MAPE).
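As an illustration of the decomposition step, the sketch below performs a one-level Haar decomposition in additive form, so that approximation plus detail reproduces the series and component forecasts can simply be summed. The Haar wavelet, the single level, and the function name `haar_level1` are assumptions for illustration; the abstract does not state which wavelet or decomposition level the paper uses, and the per-component ARIMA/ANN/RBM modelling is not shown.

```python
def haar_level1(x):
    """Split an even-length series into approximation A and detail D with A + D == x."""
    if len(x) % 2:
        raise ValueError("series length must be even")
    approx, detail = [], []
    for i in range(0, len(x), 2):
        m = (x[i] + x[i + 1]) / 2  # local average (Haar scaling part)
        h = (x[i] - x[i + 1]) / 2  # local half-difference (Haar wavelet part)
        approx += [m, m]           # x[i] = m + h, x[i + 1] = m - h
        detail += [h, -h]
    return approx, detail
```

For [4, 6, 10, 12, 8, 6] this gives approximation [5.0, 5.0, 11.0, 11.0, 7.0, 7.0] and detail [-1.0, 1.0, -1.0, 1.0, 1.0, -1.0]; in an ensemble of this kind, each component would be forecast separately and the two forecasts added.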


Training Attractive Attribute Classifiers Based on Opinion Features Extracted from Review Data

October 2018 · 30 Reads · 16 Citations

Electronic Commerce Research and Applications

Researchers have proposed statistical regression models that analyse online review data to identify the attractive attributes of a product or service. This research has the same aim but takes an approach based on machine learning rather than statistical models. The proposed approach first extracts attribute-level sentiments from the review text using natural language processing techniques, then derives features from these sentiments that reflect the non-linear relations between attribute performance and customer satisfaction. The non-linear features are fed to a Support Vector Machine (SVM) model to train predictive attractive-attribute classifiers. The approach is evaluated on a hotel review dataset crawled from TripAdvisor. The experimental results indicate that the classifiers reach a precision of 79.3% and outperform existing statistical models by a margin of over 10%.


Expectile Kink Regression: An Application to Service Sector Output

July 2018 · 26 Reads

Varith Pipitpojanakarn · Worapon Yamaka · [...]

In this study, we propose a non-linear model for explaining the relationship between the dependent and independent variables beyond the conditional mean. We extend the kink approach to expectile regression, so that the model provides a more flexible means of explaining the non-linear relationship across different expectile levels. We also introduce a sup-F test for the existence of a kink effect at each expectile. Simulation and application studies are conducted to examine the performance of the model. We apply the methodology to study the input factors affecting service-sector growth in Asian economies. This model allows us to identify and explore the non-linear effect of labour on service output. In this application, both a labour effect and a kink effect are found over a range of expectiles of service output.
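A minimal sketch of expectile kink regression, assuming the common kink specification $y = \beta_0 + \beta_1 (x - c)^- + \beta_2 (x - c)^+$ with $(z)^- = \min(z, 0)$ and $(z)^+ = \max(z, 0)$, estimated at expectile level $\tau$ by iteratively reweighted least squares (weight $\tau$ on positive residuals and $1 - \tau$ on the rest) plus a grid search over candidate kink points. The estimation procedure, the single-regressor setup, and all function names are assumptions for illustration; the sup-F test is not implemented.

```python
def asym_sq_loss(residuals, tau):
    """Expectile (asymmetric least squares) loss."""
    return sum((tau if r > 0 else 1 - tau) * r * r for r in residuals)


def kink_design(x, c):
    """Rows [1, min(x - c, 0), max(x - c, 0)]: separate slopes below and above kink c."""
    return [[1.0, min(xi - c, 0.0), max(xi - c, 0.0)] for xi in x]


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    beta = [0.0] * 3
    for r in reversed(range(3)):
        beta[r] = (m[r][3] - sum(m[r][k] * beta[k] for k in range(r + 1, 3))) / m[r][r]
    return beta


def fit_expectile_kink(x, y, c, tau, max_iter=50):
    """IRLS for a fixed kink point c; returns (beta, expectile loss)."""
    X = kink_design(x, c)
    w = [0.5] * len(y)  # start from ordinary least squares
    for _ in range(max_iter):
        xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, X)) for j in range(3)]
                for i in range(3)]
        xtwy = [sum(wi * row[i] * yi for wi, row, yi in zip(w, X, y)) for i in range(3)]
        beta = solve3(xtwx, xtwy)
        res = [yi - sum(b * v for b, v in zip(beta, row)) for yi, row in zip(y, X)]
        new_w = [tau if r > 0 else 1 - tau for r in res]
        if new_w == w:
            break  # residual signs stable: weights converged
        w = new_w
    return beta, asym_sq_loss(res, tau)


def grid_search_kink(x, y, tau, candidates):
    """Pick the kink point with the smallest expectile loss."""
    fits = [(fit_expectile_kink(x, y, c, tau), c) for c in candidates]
    (beta, loss), c = min(fits, key=lambda t: t[0][1])
    return c, beta, loss
```

With noise-free data generated from a kink at $c = 5$ and $\tau = 0.5$ (where the expectile fit reduces to least squares), a grid search over [3, 4, 5, 6, 7] recovers $c = 5$ and coefficients close to the generating values; other values of $\tau$ trace out how the slopes change across expectiles.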


Negacyclic codes of length $4p^s$ over $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$ and their duals

April 2018 · 9,522 Reads · 21 Citations

Discrete Mathematics

For any odd prime $p$, negacyclic codes of length $4p^s$ over the finite commutative chain ring $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$ are investigated. The algebraic structures of such codes are classified and completely determined. As an application, the number of codewords in, and the dual of, each negacyclic code are obtained. A simpler structure of cyclic codes of length $4p^s$ over $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$ is also noted. Among other results, some self-dual negacyclic and cyclic codes of length $4p^s$ over $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$ are provided.
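A negacyclic code of length $n$ is an ideal of $A[x]/\langle x^n + 1\rangle$, that is, a linear code closed under the shift $(c_0, \dots, c_{n-1}) \mapsto (-c_{n-1}, c_0, \dots, c_{n-2})$. The Python sketch below checks this in a much simpler setting than the paper's: over the field $\mathbb{F}_3$ rather than the chain ring $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$, with length $4 = 4p^s$ for $s = 0$. It uses the factorization $x^4 + 1 = (x^2 + x + 2)(x^2 + 2x + 2)$ over $\mathbb{F}_3$ and takes $g(x) = x^2 + x + 2$ as the generator; the names are hypothetical.

```python
from itertools import product

P, N = 3, 4         # field F_3, code length 4 (= 4p^s with s = 0)
G = [2, 1, 1, 0]    # g(x) = 2 + x + x^2, coefficients in increasing degree


def mul_mod_negacyclic(f, g):
    """Product of f and g in F_P[x]/<x^N + 1>, using x^N = -1."""
    out = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < N:
                out[k] = (out[k] + fi * gj) % P
            else:
                out[k - N] = (out[k - N] - fi * gj) % P
    return out


def negacyclic_shift(c):
    """(c_0, ..., c_{N-1}) -> (-c_{N-1}, c_0, ..., c_{N-2}), i.e. multiplication by x."""
    return [(-c[-1]) % P] + c[:-1]


# The code is the ideal generated by g: all products f*g with deg f < N - deg g = 2.
code = {tuple(mul_mod_negacyclic(list(f) + [0, 0], G)) for f in product(range(P), repeat=2)}
```

The resulting code has $3^{4-2} = 9$ codewords, and every negacyclic shift of a codeword is again a codeword, because the shift is multiplication by $x$ and $g$ divides $x^4 + 1$.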


Citations (62)


... Numerical data are quantitative measurements that can be mathematically manipulated (e.g., age, income). Categorical data represent qualitative characteristics and are typically represented by distinct categories and cannot be subjected to mathematical operations (e.g., gender, blood type) [19]. ...

Reference:

Optimizing data privacy: an RFD-based approach to anonymization strategy selection
Clustering mixed numerical and categorical data with missing values
  • Citing Article
  • September 2021

Information Sciences

... norm, self-concepts, and return) (Moran et al., 2018), as well as cognitive biases due to dispositions and emotions (Wattanacharoensil & La-ornual, 2019). Behavioural literature takes a relatively general perspective, from internal characteristics to external conditions impacting the decision-making process, including individual-specific socio-economic attributes (Li, Lee, & Yang, 2019), supply-side attractiveness (Chen, Li, Wu, Wu, & Xin, 2022), weather conditions (Tang, Wu, Ramos, & Sriboonchitta, 2021), environmental risk and uncertainty (Karl, 2018) and so on. While relating to all aspects may be beyond the scope of a single article, this study integrates the functional factors as comprehensively as possible into the functional subgraph of constructed tourism-KG for disentangling the tourist decision-making process. ...

The role of weather conditions on tourists’ decision-making process: a theoretical framework and an application to China’s inbound visitors
  • Citing Article
  • February 2021

... The seventh study found that the application can determine the optimal use of the perimeter value [9]. The eighth study combined a moving average and an annual seasonal index for Thailand's cassava exports, namely native starch, modified starch, and sago [10]. The ninth study found that the performance obtained with the Moving Average and Exponential Smoothing methods was better than that of the two models used for comparison: the Category Profile (or naive) model, which is mostly used by companies for such forecasts, and the KmDt model, which is based on clustering and decision-tree techniques. ...

A Novel Hybrid Autoregressive Integrated Moving Average and Artificial Neural Network Model for Cassava Export Forecasting

International Journal of Computational Intelligence Systems

... Clustering is an unsupervised learning method in which objects are grouped based on their inherent similarity [42]. Clustering methods can be used for data processing, such as motion segmentation and target recognition [43], and can also provide preprocessing and feature extraction for other machine learning algorithms [44]. ...

A method for k-means-like clustering of categorical data

Journal of Ambient Intelligence and Humanized Computing

... The text mining literature has mainly focused on tourists' opinions on service attributes, such as destination attributes (e.g., Jiang et al., 2021; Marine-Roig & Huertas, 2020), hotel attributes (e.g., Bi et al., 2019; Ou et al., 2018; Wang et al., 2020), and attractions (e.g., Y. Sun et al., 2017), ignoring the richer information that can be obtained from tourists' evaluations of service employees' role performance. In experience-based industries, most tangible product attributes and intangible service encounters are intertwined (Kara et al., 2005), and frontline service employees play a key role in product/service delivery and customer-personnel interactions. ...

Training Attractive Attribute Classifiers Based on Opinion Features Extracted from Review Data
  • Citing Article
  • October 2018

Electronic Commerce Research and Applications

... Their approach, which aimed at refining accuracy by simplifying complex raw data into manageable components, was validated with large-scale energy datasets and showed substantial improvements in forecasting precision. Pannakkong et al. [159] tackled significant wave height forecasting with a Bayesian decomposition-based ensemble method, integrating decomposition with Bayesian networks. This method proved to reduce prediction errors significantly, particularly beneficial in maritime operations where high-risk scenarios are prevalent. ...

An Ensemble Model of Arima and Ann With Restricted Boltzmann Machine Based on Decomposition of Discrete Wavelet Transform for Time Series Forecasting
  • Citing Article
  • October 2018

Journal of Systems Science and Systems Engineering

... For λ = 2, many authors studied them (see, e.g., [9][10][11][12][13][14][15][16][17][18]). In particular, for a prime power length, their structure and their symbol-pair distance were completely established in [12,19]. ...

Constacyclic codes of length $np^s$ over $\mathbb{F}_{p^m} + u\mathbb{F}_{p^m}$

Advances in Mathematics of Communications

... Oztas et al. [24] constructed a new family of polynomials over the finite field GF(16), which generates reversible codes over this field. More studies of cyclic DNA codes over different rings can be found in [25][26][27][28][29][30]. ...

DNA Cyclic Codes Over The Ring $\mathbb{F}_2[u,v]/\langle u^2-1, v^3-v, uv - vu \rangle$
  • Citing Article
  • March 2018

International Journal of Biomathematics

... This presentation is part of the results in [5], in which we provided the duals of each $\lambda$-constacyclic code as a particular case of our research. The units of $R$ are of two types, namely $\lambda \in \mathbb{F}_{p^m}\setminus\{0\}$ and $\alpha + u\beta$, where $\alpha, \beta$ are nonzero. ...

On a class of constacyclic codes of length $4p^s$ over $\mathbb{F}_{p^m} + u\mathbb{F}_{p^m}$
  • Citing Article
  • February 2018

Journal of Algebra and Its Applications

... In 2012, Dinh et al. [8] gave the algebraic structures of constacyclic codes of length $2p^s$ over $\mathbb{F}_{p^m} + u\mathbb{F}_{p^m}$ and their dual codes. In 2018, Dinh et al. [11] investigated the algebraic structures of negacyclic codes of length $4p^s$ over $\mathbb{F}_{p^m} + u\mathbb{F}_{p^m}$ and their dual codes. In addition, constacyclic codes of length $4p^s$ over $\mathbb{F}_{p^m} + u\mathbb{F}_{p^m}$ are investigated in [12] and [13]. ...

Negacyclic codes of length $4p^s$ over $\mathbb{F}_{p^m}+u\mathbb{F}_{p^m}$ and their duals
  • Citing Article
  • April 2018

Discrete Mathematics