Trends in Data Mining in 2020

A CASE STUDY OF PROCESS ENGINEERING OF OPERATIONS IN WORKING SITES THROUGH DATA MINING AND AUGMENTED REALITY

Alessandro Massaro¹, Angelo Galiano¹, Antonio Mustich¹, Daniele Convertini¹, Vincenzo Maritati¹, Antonia Colonna¹, Nicola Savino¹, Angela Pace², Leo Iaquinta²
¹Dyrecta Lab, Via Vescovo Simplicio, Italy, ²SO.CO.IN. SYSTEM srl, Italy

ABSTRACT

This paper analyzes the design of a software platform for a process engineering case study that simultaneously adopts data digitization, Data Mining (DM) processing, and Augmented Reality (AR). Specifically, it discusses a platform design able to upgrade the Knowledge Base (KB), enabling production process optimization in working sites. The KB is gained by following the 'Frascati' research guidelines, which address the possible ways to achieve Knowledge Gain (KG). Technologies such as AR and a data entry mobile app are tailored in order to apply innovative data mining algorithms. The first part of the paper comments on the preliminary project specifications; the second part shows the use cases, the Unified Modeling Language (UML) models, and the mobile app mockups enabling KG. The proposed work discusses preliminary results of an industry project.

KEYWORDS

Frascati Guideline, Knowledge Base Gain, Data Mining, Augmented Reality

For More Details :
http://aircconline.com/ijdkp/V9N5/9519ijdkp01.pdf

APPLICATION OF DATA MINING TECHNIQUE TO PREDICT LANDSLIDES IN SRI LANKA

Karunanayake K.B.A.A.M and Wijayanayake W.M.J.I, University of Kelaniya, Sri Lanka

ABSTRACT

Landslides are a major natural disaster in the hill country of Sri Lanka, causing severe economic and ecological damage. Fast detection is therefore important. Currently, landslides in Sri Lanka are predicted using a map-reading approach. But a map is limited to a specific point in time and does not take current conditions into account. Developing a model or tool that can deal efficiently with the current situation is therefore important. In this study, prediction models were developed using the Decision Tree and Neural Network data mining techniques, based on data from the Badulla and Nuwara Eliya districts. The selected Decision Tree model achieved an accuracy of 96.2963% for Badulla district and 100% for Nuwara Eliya district. Although the Decision Tree models performed better, the Neural Network models also achieved above 90% accuracy. It can therefore be concluded that both data mining techniques are suitable for developing landslide prediction models for Sri Lanka.

KEYWORDS

Landslide, Data mining, Predictive analysis, Plan-Do-Check-Act, Decision tree

For More Details :
http://aircconline.com/ijdkp/V9N4/9419ijdkp04.pdf

ACCESS AND CONNECTION VIA TECH DATA AS AN ENABLER OF A THIN OR NONEXISTENT MARKET

Bathabile S C Amirchand, Founder, Gropeedy App, South Africa

ABSTRACT

This study provides a prototype of a real estate listing mobile application that can organize, store, maintain and search data from a mobile device running Android or iOS. The application helps household owners list their properties at no cost. The system comprises a mobile app, a central database, a satellite database and an offline database system. It consists of software that connects one or more servers to the computer as well as to the mobile user, making it both a mobile app and a web browser application. The Apple Mac OS X operating system can also be used to build this system. This suite of tools comprises graphical user interface (GUI) based applications, command-line tools, and documentation to help in the software development process. This will pave the way to formalize the much-neglected Former Homelands (Village Sector) and facilitate the development of inclusive real estate evaluation data, thus enabling access to areas where there has been no base price until now.

KEYWORDS

Data, Application, Real-estate, Technology, Connection and System

For More Details :
http://aircconline.com/ijdkp/V9N5/9519ijdkp02.pdf

COST-SENSITIVE TOPICAL DATA ACQUISITION FROM THE WEB

Mahdi Naghibi¹, Reza Anvari¹, Ali Forghani¹ and Behrouz², ¹Malek-Ashtar University of Technology, Iran, ²Iran University of Science and Technology, Iran

ABSTRACT

The cost of acquiring training data instances for the induction of data mining models is one of the main concerns in real-world problems. The web is a comprehensive source for many types of data that can be used for data mining tasks. But the distributed and dynamic nature of the web dictates the use of solutions that can handle these characteristics. In this paper, we introduce an automatic method for topical data acquisition from the web. We propose a new type of topical crawler that uses a hybrid link context extraction method to acquire on-topic web pages with minimum bandwidth usage and at the lowest cost. The new link context extraction method, called Block Text Window (BTW), combines a text window method with a block-based method and overcomes the challenges of each of these methods using the advantages of the other. Experimental results, based on standard metrics, show that BTW outperforms state-of-the-art automatic topical web data acquisition methods.
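As a rough illustration of the text window idea mentioned in the abstract above, the sketch below takes a fixed number of tokens on each side of a link and scores them against a set of topic terms. It covers only a plain text window, not the block-based part or the BTW combination, which the abstract does not specify. The function names, the `LINK` placeholder token, the window size and the scoring rule are all assumptions made for this example.

```python
from typing import List, Set


def text_window_context(tokens: List[str], anchor_index: int, window: int = 3) -> List[str]:
    """Return the tokens within `window` positions of the link (hypothetical helper)."""
    start = max(0, anchor_index - window)
    end = min(len(tokens), anchor_index + window + 1)
    return tokens[start:end]


def topical_score(context: List[str], topic_terms: Set[str]) -> float:
    """Fraction of context tokens that are topic terms (an assumed, simplistic score)."""
    if not context:
        return 0.0
    hits = sum(1 for token in context if token.lower() in topic_terms)
    return hits / len(context)


# Made-up page text; "LINK" marks where the hyperlink sits in the token stream.
page = "our lab publishes papers on data mining see LINK for crawler source code and unrelated cooking recipes"
tokens = page.split()
anchor_index = tokens.index("LINK")

context = text_window_context(tokens, anchor_index, window=3)
score = topical_score(context, {"data", "mining", "crawler"})
```

A crawler built along these lines would follow high-scoring links first, which is how a link context method can save bandwidth on off-topic pages.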

KEYWORDS

Cost-Sensitive Learning, Data acquisition, Topical Crawler, Link Context, Web Data Mining

For More Details :
http://aircconline.com/ijdkp/V9N3/9319ijdkp04.pdf

CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTION

Mariam Moustafa Reda, Mohammad Nassef and Akram Salah, Cairo University, Egypt

ABSTRACT

Many classification algorithms are available in the area of data mining for solving the same kind of problem, but there is little guidance for recommending the most appropriate algorithm, that is, the one that gives the best results for the dataset at hand. As a way of optimizing the chances of recommending the most appropriate classification algorithm for a dataset, this paper focuses on the different factors considered by data miners and researchers in different studies when selecting the classification algorithms that will yield the desired knowledge for the dataset at hand. The paper divides the factors affecting classification algorithm recommendation into business and technical factors. The technical factors proposed are measurable and can be exploited by recommendation software tools.

KEYWORDS

Classification, Algorithm selection, Factors, Meta-learning, Landmarking

For More Details :
http://aircconline.com/ijdkp/V9N4/9419ijdkp01.pdf

IMPLEMENTATION OF RISK ANALYZER MODEL FOR UNDERTAKING THE RISK ANALYSIS OF PROPOSED BUILDING PROJECTS FOR A SELECTED CLIENT

Ibrahim Yakubu, Balewa University, Nigeria

ABSTRACT

The RISK ANALYZER model was implemented as a Knowledge-Based System for the purpose of undertaking risk analysis for proposed construction projects in a selected domain. The Fuzzy Decision Variables (FDVs) that cause differences between the initial and final contract sums of building projects were identified, the likelihood of the occurrence of the risks was determined, and a Knowledge-Based System that would rank the risks was constructed using the JAVA programming language and a Graphic User Interface. The Knowledge-Based System is composed of a Knowledge Base for storing data, an Inference Engine for controlling and directing the use of knowledge for problem solving, and a User Interface that helps the user retrieve, use and alter data in the Knowledge Base. The developed Knowledge-Based System was compiled, implemented and validated with data from previously completed projects. The client could utilize the Knowledge-Based System to undertake proposed building projects.

KEYWORDS

RISK ANALYZER, Risk analysis, Knowledge-Based Systems, JAVA, Graphic User Interface

For More Details :
http://aircconline.com/ijdkp/V9N4/9419ijdkp03.pdf

SENTIMENT ANALYSIS FOR MOVIES REVIEWS DATASET USING DEEP LEARNING MODELS

Nehal Mohamed Ali, Marwa Mostafa Abd El Hamid and Aliaa Youssif, Arab Academy for Science Technology and Maritime, Egypt

ABSTRACT

Due to the enormous amount of data and opinions being produced, shared and transferred every day across the internet and other media, sentiment analysis has become vital for developing opinion mining systems. This paper introduces a sentiment classification approach developed using deep learning networks and presents comparative results for different deep learning networks. A Multilayer Perceptron (MLP) was developed as a baseline for the other networks' results. A Long Short-Term Memory (LSTM) recurrent neural network, a Convolutional Neural Network (CNN), and a hybrid model of LSTM and CNN were developed and applied to the IMDB dataset, which consists of 50K movie review files. The dataset was divided into 50% positive reviews and 50% negative reviews. The data was initially pre-processed using Word2Vec, and word embedding was applied accordingly. The results show that the hybrid CNN_LSTM model outperformed the MLP and the standalone CNN and LSTM networks. CNN_LSTM reported an accuracy of 89.2% and CNN an accuracy of 87.7%, while MLP and LSTM reported accuracies of 86.74% and 86.64%, respectively. Moreover, the results show that the proposed deep learning models also outperformed SVM, Naïve Bayes and RNTN results published in other works using English datasets.

KEYWORDS

Deep learning, LSTM, CNN, Sentiment Analysis, Movies Reviews, Binary Classification

For More Details :
http://aircconline.com/ijdkp/V9N3/9319ijdkp02.pdf

DEEP LEARNING BASED MULTIPLE REGRESSION TO PREDICT TOTAL COLUMN WATER VAPOR (TCWV) FROM PHYSICAL PARAMETERS IN WEST AFRICA BY USING KERAS LIBRARY

Daouda DIOUF¹, Awa Niang¹ and Sylvie Thiria², ¹Université Cheikh Anta Diop, Sénégal, ²Université Pierre et Marie Curie, France

ABSTRACT

Total column water vapor (TCWV) is an important factor for weather and climate. This study applies deep learning based multiple regression to map the TCWV with elements that can improve spatiotemporal prediction. In this study, we predict the TCWV using ERA5, the fifth generation ECMWF atmospheric reanalysis of the global climate. We use an appropriate deep learning based multiple regression algorithm, built with the Keras library, to improve nonlinear prediction between total column water vapor and predictors such as mean sea level pressure, surface pressure, sea surface temperature, 100 metre U wind component, 100 metre V wind component, 10 metre U wind component, 10 metre V wind component, 2 metre dew point temperature, and 2 metre temperature. The results obtained make it possible to build a predictor that models TCWV with a mean absolute error (MAE) of 3.60 kg/m² and a coefficient of determination R² of 0.90.
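For readers unfamiliar with the two metrics quoted in the abstract above, the following minimal sketch computes a mean absolute error and a coefficient of determination with their standard textbook definitions. The sample values are made up for illustration and are not taken from the paper, and the function names are assumptions of this example rather than anything from the authors' Keras code.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1.0 - ss_res / ss_tot


# Made-up TCWV values in kg/m^2, for illustration only.
observed = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 41.0, 49.0]

mae = mean_absolute_error(observed, predicted)  # same units as the target (kg/m^2)
r2 = r_squared(observed, predicted)             # unitless; 1.0 would be a perfect fit
```

MAE is expressed in the units of the target, so it reads directly as a typical error in kg/m², whereas R² describes the share of variance in the observations that the predictions account for.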

For More Details :
http://aircconline.com/ijdkp/V9N6/9619ijdkp02.pdf

INSOLVENCY PREDICTION ANALYSIS OF ITALIAN SMALL FIRMS BY DEEP LEARNING

Agostino Di Ciaccio¹ and Giovanni Cialone², ¹University of Rome, Italy, ²Senior partner of Kairos Advisory srl., Italy

ABSTRACT

To improve credit risk management, there is a lot of interest in bankruptcy prediction models. Academic research has mainly used traditional statistical techniques, but interest in the capability of machine learning methods is growing. This Italian case study pursues the goal of developing a commercial firms insolvency prediction model. In compliance with the Basel II Accords, the major objective of the model is to estimate the probability of default over a given time horizon, typically one year. The collected dataset consists of absolute values as well as financial ratios collected from the balance sheets of 14,966 Italian micro-small firms, 13,846 ongoing and 1,120 bankrupted, with 82 observed variables. The volume of data processed places the research on a scale like that used by Moody's in the development of its rating model for public and private companies, RiskCalc™. The study was conducted using Gradient Boosting, Random Forests, Logistic Regression and some deep learning techniques: Convolutional Neural Networks and Recurrent Neural Networks. The results were compared with respect to predictive performance on a test set, considering accuracy, sensitivity and AUC. The results obtained show that the choice of the variables was very effective, since all the models perform well, better than those obtained in previous works. Gradient Boosting was the preferred model, although an increase in observation times would probably favour Recurrent Neural Networks.
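The three evaluation measures named in the abstract above can be sketched as follows, using their standard definitions. With 1,120 bankrupt firms out of 14,966 (about 7.5%), accuracy alone can look good for a model that rarely flags a default, which is one reason to report sensitivity and AUC as well. The labels, scores, threshold and function names below are made up for this example and are not from the study.

```python
from typing import Sequence, Tuple


def accuracy_and_sensitivity(labels: Sequence[int], scores: Sequence[float],
                             threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and sensitivity (recall on the positive class) at a score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    return (tp + tn) / len(labels), tp / positives


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up test set: 1 = bankrupt, 0 = ongoing; scores are predicted default probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

acc, sens = accuracy_and_sensitivity(labels, scores)
area = auc(labels, scores)
```

The pairwise AUC shown here is quadratic in the number of examples, which is fine for a small illustration; a rank-based computation would be the usual choice on a dataset of the size described.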

KEYWORDS

Credit risk, Bankruptcy prediction, Deep learning

For More Details :
http://aircconline.com/ijdkp/V9N6/9619ijdkp01.pdf

A BUSINESS INTELLIGENCE PLATFORM IMPLEMENTED IN A BIG DATA SYSTEM EMBEDDING DATA MINING: A CASE OF STUDY

Alessandro Massaro, Valeria Vitti, Palo Lisco, Angelo Galiano and Nicola Savino, Dyrecta Lab, IT Research Laboratory, Italy

ABSTRACT

This work discusses a case study of a business intelligence (BI) platform developed within the framework of an industry project by following the 'Frascati' research and development (R&D) guidelines. The proposed results are part of the output of different joint projects enabling BI for the company ACI Global, which works mainly in roadside assistance services. The main project goal is to upgrade the information system, the knowledge base (KB) and the industry processes by activating data mining algorithms and big data systems able to provide a gain of knowledge. The proposed work concerns the development of a high-performance Cassandra big data system collecting data from two industry locations. Data are processed by data mining algorithms in order to formulate a decision-making system oriented toward call center human resources optimization and customer service improvement. Correlation Matrix, Decision Tree and Random Forest Decision Tree algorithms were applied to test the prototype system, yielding good accuracy of the output solutions. The Rapid Miner tool was adopted for the data processing. The work describes all the system architectures adopted for the design and testing phases, providing information about Cassandra performance and showing some results of data mining processes matching industry BI strategies.
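To make the correlation matrix step mentioned in the abstract above concrete, here is a small sketch that computes pairwise Pearson correlations between numeric columns. The paper itself used the Rapid Miner tool; this plain Python version is only an illustration of the concept, and the column names and values are hypothetical call center figures invented for the example.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric columns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("columns must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Nested dict holding the correlation of every column with every other column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical call center attributes, made up for illustration.
data = {
    "calls_received": [10.0, 20.0, 30.0, 40.0],
    "operators_on_shift": [2.0, 4.0, 6.0, 8.0],
    "avg_wait_minutes": [8.0, 6.0, 4.0, 2.0],
}

matrix = correlation_matrix(data)
```

Values near 1 or -1 flag attribute pairs that move together or in opposite directions, which is the kind of signal a decision-making system for staffing could then examine further.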

KEYWORDS

Big Data Systems, Cassandra Big Data, Data Mining, Correlation Matrix, Decision Tree, Frascati Guideline

For More Details :
http://aircconline.com/ijdkp/V9N1/9119ijdkp01.pdf
