About
19
Publications
12,273
Reads
161
Citations
Introduction
I am a data scientist in FinTech who applies mathematics and machine learning to improve business metrics. I'm interested in credit scoring and fraud.
Additional affiliations
August 2020 - present
Amar Bank Indonesia
Position
- Analyst
Description
- I work as a Data Scientist at Amar Bank Indonesia. I have hands-on experience in applied mathematics, data science, and machine learning, with a passion for helping the business run effectively and efficiently, including continuously improving north star metrics.
September 2018 - June 2019
Bukalapak
Position
- Analyst
Description
- Applied machine learning at Mitra Bukalapak.
February 2018 - August 2018
Ilmuone Data
Position
- Analyst
Description
- Discovering new opportunities to optimise the business through analytics, mathematical models, and machine learning.
Education
September 2013 - February 2017
Publications
Publications (19)
The development of financial technology (Fintech) in emerging economies such as Indonesia has been rapid in the last few years, opening a great potential for loan businesses, from venture capital to micro and personal loans. To survive in such competitive markets, new companies need a robust credit-scoring model. However, building a reliable model...
Abstract: Inflation is one of the indicators used to measure a nation's development. Uncontrolled inflation has many negative effects on the people of a country. There are many ways to control inflation, one of which is forecasting. Forecasting is an activity for finding out what will happen in the future...
Inflation is one of the indicators used to measure a nation's development. Uncontrolled inflation has many negative effects on the people of a country. There are many ways to control inflation, one of which is forecasting. Forecasting is an activity for finding out what will happen in the...
The vehicle routing problem with time windows (VRPTW) is an NP-hard problem. Multi-trip is an approach to solving the VRPTW that looks for the trip schedule giving the best result. Although various algorithms exist for the problem, there is an opportunity to improve them in order to obtain a better result. In this research, genetic algori...
The vehicle routing problem with time windows (VRPTW) is an NP-hard problem. Multi-trip is an approach to solving the VRPTW that looks for the trip schedule giving the best result. Although various algorithms exist for the problem, there is an opportunity to improve them in order to obtain a better result. In this research, genetic algori...
Jatropha is a plant that has many functions, but it can be attacked by various diseases. Expert systems can be applied to identification, helping both farmers and extension workers identify the disease. One method that can be used is the Extreme Learning Machine. The Extreme Learning Machine is a learning method for neural networks w...
Jatropha curcas is a plant that can be used as a substitute for diesel fuel. Farmers' lack of knowledge and the limited number of experts and extension agents available to deal with the plant's diseases result in lower-quality Jatropha curcas. The Dempster-Shafer method can be a solution for decision making, based on previous research. The difference i...
In this paper, we propose a new forecasting method based on automatic-optimized fuzzy time series to forecast the Indonesia Inflation Rate (IIR). First, we propose the forecasting model of two-factor high-order fuzzy-trend logical relationships groups (THFLGs) for predicting the IIR. Second, we propose the interval optimization using automatic clus...
Inflation is a benchmark of a country's economic development. Inflation strongly influences many things, so forecasting upcoming inflation has a positive impact. Various methods are used for forecasting, one of which is fuzzy time series forecasting with maximum results. Fuzzy logical relationships (FLR)...
Distribution is an important aspect of industrial activity, serving customers on time with minimal operational cost. It is therefore necessary to design a quick and accurate distribution route. One way is to design the distribution route using the k-means method and genetic algorithms. This research will combine k-means m...
Tempe is a perishable food with a shelf life of 2 to 3 days. Home-based tempe producers must plan their production carefully in order to avoid losses. Suitable planning and forecasting can determine how the production process is carried out. Previously, regression analysis was used as the method to improve the process. This rese...
Jatropha curcas is a plant that has many functions and everyday uses, such as biodiesel and beauty tools, but it is not free from disease. Expert systems can be applied to identification, helping both farmers and extension workers identify disease. One of the methods that can be used is the method...
Jatropha curcas is an important commodity for farmers. Farmers must be aware of diseases caused by pests or viruses in order to preserve the plant and its benefits. The main obstacle is farmers' lack of knowledge about diseases, so a system that utilizes plant expert knowledge is needed. This paper proposes a Fuzzy Neural Network (FNN) method to...
Jatropha curcas is a plant that has many functions and uses, but it can also be attacked by disease. Expert systems can be applied to identification, helping both farmers and extension workers identify the disease. In this paper, the method used for identification is a Fuzzy Neural Network (FNN), with an accuracy of 30...
In the field of dentistry, there are several types of dental diseases. This wide variety makes it difficult for doctors and medical students to identify a disease. Existing decision support systems that help doctors or medical students determine the disease using fuzzy logic are still not op...
Teeth are part of the human digestive system and break down food for easier digestion. Diseases that attack the teeth can inhibit this activity and cannot be identified quickly by young dentists. This problem can be solved with an identification system. This identification system uses the Tsukamoto fuzzy inference system (FIS) for its analysis. Subjective function of the mem...
Abstract
Credit is the largest source of income for a bank. However, a bank must be selective in deciding which customers may receive credit. The problem becomes more complex because granting credit to the wrong customer can cause losses, and because many parameters are involved in deciding which customers receive credit. Clustering...
Dental disease detection is needed because the majority of Indonesians have never experienced dental disease. According to Basic Health Research 2013, three areas are affected by dental disease: South Sulawesi, West Sulawesi, and South Kalimantan. Obtaining accurate detection is difficult because it requires expert observations and interviews in or...
Questions
Questions (2)
I have a large dataset with more than 400 features. I have done pre-processing and feature engineering; selecting features by random forest importance gave the best result. I also compared many models. To measure the result, I use K-fold validation. I then tried two models on new data for validation. In the training step, logistic regression gives an F1 score of 0.76 and random forest gives 0.97. I checked each fold, and in every fold the model gives a result close to the average.
But when I test on 100 new data points, logistic regression and random forest give F1 scores of 0.715 and 0.657, respectively.
Note: I use Python with sklearn.
Do you have any suggestions for making the model robust on new data?
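The scores quoted above can be compared directly. The snippet below is a minimal sketch, not the asker's code: it only does arithmetic on the four F1 values from the question, and the names `REPORTED` and `generalization_gap` are made up for illustration. It computes how far each model's F1 falls between K-fold validation and the 100 new data points.

```python
# F1 scores as reported in the question:
# (K-fold validation average, score on the 100 new data points).
REPORTED = {
    "logistic_regression": (0.76, 0.715),
    "random_forest": (0.97, 0.657),
}


def generalization_gap(cv_f1: float, new_data_f1: float) -> float:
    """Return the drop in F1 from cross-validation to new data.

    A positive value means the model did worse on the new data.
    """
    return round(cv_f1 - new_data_f1, 3)


# Hypothetical helper dict: one gap per model.
gaps = {name: generalization_gap(cv, new) for name, (cv, new) in REPORTED.items()}

for name, gap in gaps.items():
    print(f"{name}: F1 drops by {gap}")
```

The random forest loses far more than logistic regression (0.313 against 0.045). A gap of that size is commonly a sign of overfitting, of information leaking between folds (for example, if features were selected on the full dataset before splitting), or of new data that differs from the training data; the question does not say which applies here. An F1 score estimated on only 100 points is also noisy in its own right.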
I have a large dataset (more than 200K data points with more than 30 features). I tried clustering models such as K-means, Gaussian mixture, and agglomerative hierarchical clustering. Overall, they all give the same result. With fewer than 25K data points the models still produce varied clusters, but with more than 30K data points, 99% of the data falls into one cluster. The number of clusters is 6, which is the best value according to the elbow method.
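One way to make the "99% in one cluster" observation concrete is to compute each cluster's share of the data from the label array that any of these models returns. The sketch below uses only the Python standard library; `cluster_shares`, `dominant_cluster`, the 0.9 threshold, and the toy `labels` list are all hypothetical and stand in for real model output.

```python
from collections import Counter


def cluster_shares(labels):
    """Map each cluster label to the fraction of points assigned to it."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def dominant_cluster(labels, threshold=0.9):
    """Return the label holding at least `threshold` of the points, or None."""
    shares = cluster_shares(labels)
    if not shares:
        return None
    label, share = max(shares.items(), key=lambda item: item[1])
    return label if share >= threshold else None


# Toy labels (made up): 6 clusters, with 990 of 1000 points in cluster 0.
labels = [0] * 990 + [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

shares = cluster_shares(labels)
print(shares[0])                 # share of cluster 0
print(dominant_cluster(labels))  # label of the dominant cluster, if any
```

Running the same check on subsamples of increasing size (for example around 25K and 30K points, as in the question) would show at what size the clusters start collapsing into one.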