Top Data Mining & Knowledge Management Research articles of 2019

DETECTION OF HATE SPEECH IN SOCIAL NETWORKS: A SURVEY ON MULTILINGUAL CORPUS

Areej Al-Hassan and Hmood Al-Dossari, King Saud University, Riyadh, Saudi Arabia

ABSTRACT

On social media platforms, hate speech can be a cause of “cyber conflict”, which can affect social life at both the individual and the country level. Hateful and antagonistic content propagated via social networks has the potential to cause harm and suffering on an individual basis and lead to social tension and disorder beyond cyberspace. However, social networks cannot control all the content that users post. For this reason, there is a demand for automatic detection of hate speech. This demand rises particularly when the content is written in complex languages (e.g. Arabic). Arabic text is known for its challenges, its complexity and the scarcity of its resources. This paper presents a background on hate speech and its related detection approaches. In addition, recent contributions on hate speech and related anti-social behaviour topics are reviewed. Finally, challenges and recommendations for the Arabic hate speech detection problem are presented.

KEYWORDS

Text Mining, Social Networks, Hate Speech, Natural Language Processing, Arabic NLP

For More Details :
https://aircconline.com/csit/csit902.pdf

5th International Conference on Data Mining and Applications (DMA 2019)

CONSENT BASED ACCESS POLICY FRAMEWORK

Geetha Madadevaiah¹, RV Prasad¹, Amogh Hiremath¹, Michel Dumontier², Andre Dekker³; ¹Philips Research, Philips Innovation Campus, Bangalore; ²,³Maastricht University, Maastricht, Netherlands

ABSTRACT

In this paper, we use Semantic Web technologies to store and share sensitive medical data in a secure manner. The framework builds on the advantages of Semantic Web technologies and makes them secure and robust for sharing sensitive information in a controlled environment. It uses a combination of Role-Based and Rule-Based Access Policies to secure a medical data repository. To support the framework, we built a lightweight ontology to collect consent from users, indicating which part of their data they want to share with another user having a particular role. We consider the scenario in which the owner of the data, say the patient, shares medical data with relevant people such as physicians, researchers, pharmacists, etc. We developed a prototype, which is validated using Sesame OpenRDF Workbench with 202,908 triples and a consent graph stating consents per patient.
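The combination of a role-based check and a per-patient consent rule described above can be illustrated with a minimal sketch. It is plain Python rather than RDF/SPARQL, and every name in it (`Consent`, `ROLE_PERMISSIONS`, `is_allowed`, the category and patient labels) is hypothetical and chosen for illustration only; it is not the authors' ontology or prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    """One consent statement: a patient shares one data category with one role."""
    patient: str
    category: str  # hypothetical label, e.g. "lab_results"
    role: str      # hypothetical role name, e.g. "physician"


# Role-based part (assumed): which actions each role may perform at all.
ROLE_PERMISSIONS = {
    "physician": {"read"},
    "researcher": {"read"},
    "pharmacist": {"read"},
}


def is_allowed(consents, patient, category, role, action="read"):
    """Grant access only if the role permits the action AND the patient consented."""
    # Role-based check: unknown roles or unsupported actions are rejected.
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False
    # Rule-based check: a matching consent statement must exist for this patient.
    return Consent(patient, category, role) in consents


# A toy "consent graph" for one hypothetical patient.
consent_graph = {
    Consent("patient-1", "lab_results", "physician"),
    Consent("patient-1", "medication", "pharmacist"),
}

print(is_allowed(consent_graph, "patient-1", "lab_results", "physician"))
print(is_allowed(consent_graph, "patient-1", "lab_results", "researcher"))
```

Keeping the two checks separate mirrors the split between role policies and consent rules: a role can be granted or revoked globally without touching any patient's consent statements.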

KEYWORDS

Access Policies, Semantic Web, RDF/SPARQL, Role Based, Rule Based

For More Details :
https://aircconline.com/csit/csit905.pdf

7th International Conference on Database and Data Mining (DBDM 2019)

ATTRIBUTE REDUCTION AND DECISION TREE PRUNING TO SIMPLIFY LIVER FIBROSIS PREDICTION ALGORITHMS: A COHORT STUDY

Mahasen Mabrouk¹, Abubakr Awad², Hend Shousha¹, Wafaa Alakel¹,³, Ahmed Salama, Tahany Awad¹; ¹Faculty of Medicine, Cairo University, Egypt; ²University of Aberdeen, UK; ³Ministry of Health and Population, Egypt

ABSTRACT

Background: Assessment of liver fibrosis is vital for therapeutic decisions and prognostic evaluation in chronic hepatitis. Liver biopsy is considered the definitive investigation for assessing the stage of liver fibrosis, but it has several limitations. FIB-4 and APRI also have limited accuracy. The National Committee for Control of Viral Hepatitis (NCCVH) in Egypt has supplied a valuable pool of electronic patient data that data mining techniques can analyze to disclose hidden patterns and trends, leading to the development of predictive algorithms.
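For orientation, the two indices named above can be computed from routine laboratory values. The abstract does not state the formulas; the sketch below uses the commonly published definitions, and the function and parameter names are illustrative assumptions.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


def apri(ast_u_per_l, ast_upper_limit_u_per_l, platelets_1e9_per_l):
    """APRI = (AST / upper limit of normal AST) x 100 / platelet count."""
    return (ast_u_per_l / ast_upper_limit_u_per_l) * 100.0 / platelets_1e9_per_l


# Synthetic values for illustration only, not patient data.
print(fib4(age_years=50, ast_u_per_l=40, alt_u_per_l=25, platelets_1e9_per_l=200))
print(apri(ast_u_per_l=80, ast_upper_limit_u_per_l=40, platelets_1e9_per_l=200))
```

Both scores use only age, transaminases and platelet count, which is why they are attractive in low-resource settings despite the limited accuracy noted above.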

Aim: To collaborate with physicians to develop a novel, reliable, easy-to-comprehend noninvasive model that predicts the stage of liver fibrosis from routine workup, without imposing extra costs for additional examinations, especially in areas with limited resources such as Egypt.

Methods: This multicenter retrospective study included baseline demographic, laboratory, and histopathological data of 69106 patients with chronic hepatitis C. We started with data collection, preprocessing, cleansing and formatting for knowledge discovery from Electronic Health Records (EHRs). Data mining was used to build a decision tree (Reduced Error Pruning tree, REP tree) with 10-fold internal cross-validation. Histopathology results were used to assess accuracy for fibrosis stages. Machine learning feature selection and reduction (CfsSubsetEval / BestFirst) reduced the initial number of input features (N=15) to the most relevant ones (N=6) for developing the prediction model.
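The idea behind correlation-based feature subset selection, as named in the Methods, can be sketched as follows. This is a simplified illustration, not the Weka implementation: it assumes numeric features, uses absolute Pearson correlation as the association measure, and replaces best-first search with a plain greedy forward search. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(features, target, subset):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean feature-target correlation and r_ff the mean
    feature-feature correlation within the subset.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, target, tol=1e-12):
    """Add the feature that most improves the merit until nothing improves it."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        score, name = max(
            (cfs_merit(features, target, selected + [f]), f) for f in remaining
        )
        if score <= best + tol:
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best


# Synthetic toy data: "a_copy" duplicates "a", so it adds no merit.
toy = {"a": [1, 2, 3, 4, 5, 6], "a_copy": [1, 2, 3, 4, 5, 6], "noise": [3, 1, 4, 1, 5, 9]}
print(greedy_cfs(toy, [2, 4, 6, 8, 10, 12]))
```

The merit rewards features that correlate with the target and penalizes features that correlate with each other, which is how a 15-attribute input can shrink to a handful of attributes with little loss in predictive value.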

Results: In this study, 32419 patients had F(0-1), 25073 had F(2) and 11615 had F(3-4). Revalidation of FIB-4 and APRI in our study showed low accuracy and high discordance with biopsy results, with overall AUC of 0.68 and 0.58 respectively. Out of 15 attributes, machine learning selected age, AFP, AST, glucose, albumin, and platelets as the most relevant. The REP tree reached an overall classification accuracy of up to 70% and an ROC area of 0.74, which was almost unaffected by attribute reduction and pruning. However, attribute reduction and tree pruning yielded a simpler model that is easier for physicians to understand and takes less time to execute.

Conclusion: In this study we had the chance to study a large cohort of 69106 chronic hepatitis patients with available liver biopsy results to revise and validate the accuracy of FIB-4 and APRI. The study represents a collaboration between computer scientists and hepatologists to provide clinicians with an accurate, novel and reliable noninvasive model to predict the stage of liver fibrosis.

KEYWORDS

Liver Fibrosis, Data Mining, Weka, Decision Tree, Attribute Reduction, Tree Pruning

For More Details :
https://aircconline.com/csit/papers/vol9/csit90927.pdf

7th International Conference on Data Mining & Knowledge Management Process (DKMP 2019)

NONNEGATIVE MATRIX FACTORIZATION UNDER ADVERSARIAL NOISE

Peter Ballen, University of Pennsylvania, Philadelphia, USA

ABSTRACT

Nonnegative Matrix Factorization (NMF) is a popular tool to estimate the missing entries of a dataset under the assumption that the true data has a low-dimensional factorization. One example of such a matrix is found in movie recommendation settings, where NMF corresponds to predicting how a user would rate a movie. Traditional NMF algorithms assume the input data is generated from the underlying representation plus mean-zero independent Gaussian noise. However, this simplistic assumption does not hold in real-world settings that contain more complex or adversarial noise. We provide a new NMF algorithm that is more robust towards these nonstandard noise patterns. Our algorithm outperforms existing algorithms on movie rating datasets, where adversarial noise corresponds to a group of adversarial users attempting to review-bomb a movie.
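The baseline setting described above, NMF used to fill in missing ratings, can be sketched with standard masked multiplicative updates. This is the conventional squared-error formulation that the abstract contrasts against, not the robust algorithm the paper proposes; the function names and the toy ratings are assumptions made for illustration.

```python
import random


def masked_nmf(X, mask, rank, iters=200, seed=0, eps=1e-9):
    """Factor X ~ W @ H with W, H >= 0, fitting only entries where mask is 1."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def product():
        return [[sum(W[i][k] * H[k][j] for k in range(rank)) for j in range(m)]
                for i in range(n)]

    for _ in range(iters):
        # W <- W * ((mask * X) H^T) / ((mask * WH) H^T), using the old WH throughout.
        P = product()
        for i in range(n):
            for k in range(rank):
                num = sum(mask[i][j] * X[i][j] * H[k][j] for j in range(m))
                den = sum(mask[i][j] * P[i][j] * H[k][j] for j in range(m))
                W[i][k] *= num / (den + eps)
        # H <- H * (W^T (mask * X)) / (W^T (mask * WH)), with WH recomputed.
        P = product()
        for k in range(rank):
            for j in range(m):
                num = sum(mask[i][j] * X[i][j] * W[i][k] for i in range(n))
                den = sum(mask[i][j] * P[i][j] * W[i][k] for i in range(n))
                H[k][j] *= num / (den + eps)
    return W, H


def masked_loss(X, mask, W, H):
    """Sum of squared errors over the observed entries only."""
    rank = len(H)
    return sum(
        mask[i][j] * (X[i][j] - sum(W[i][k] * H[k][j] for k in range(rank))) ** 2
        for i in range(len(X)) for j in range(len(X[0]))
    )


# Synthetic user-by-movie ratings; 0 in the mask marks a missing rating.
ratings = [[5, 4, 1], [4, 5, 1], [1, 1, 5], [5, 0, 1]]
observed = [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]]
W, H = masked_nmf(ratings, observed, rank=2)
predicted = sum(W[3][k] * H[k][1] for k in range(2))  # estimate for the missing entry
print(round(masked_loss(ratings, observed, W, H), 4), round(predicted, 2))
```

Because the objective is a squared error, a few extreme ratings can pull the factors strongly toward themselves, which is the sensitivity to adversarial noise that motivates a more robust algorithm.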

KEYWORDS

Nonnegative Matrix Factorization, Matrix Completion, Recommendation, Adversarial Noise, Outlier Detection, Linear Model

For More Details :
https://aircconline.com/csit/papers/vol9/csit91601.pdf

5th International Conference on Data Mining and Applications (DMAP 2019)

DATA MODEL FOR BIGDEEPEXAMINATOR

Janusz Bobulski and Mariusz Kubanek, Czestochowa University of Technology, Poland

ABSTRACT

Big Data is a term for data sets that are simultaneously characterized by high volume, diversity, real-time stream inflow, variability and complexity, and that require innovative technologies, tools and methods in order to extract new and useful knowledge from them. Big Data brings new challenges and new information possibilities. The effective acquisition and processing of data will play a key role in the global and local economy as well as in social policy and large corporations. This article continues research and development work on the design of a data analysis system using artificial intelligence, and presents a data model for this system.

KEYWORDS

Big data, intelligent systems, data processing, multi-data processing.

For More Details :
https://aircconline.com/csit/papers/vol9/csit91601.pdf

5th International Conference on Data Mining and Applications (DMAP 2019)
