DETECTION OF HATE SPEECH IN SOCIAL NETWORKS: A SURVEY ON MULTILINGUAL CORPUS
Areej Al-Hassan and Hmood Al-Dossari, King Saud University, Riyadh, Saudi Arabia
On social media platforms, hate speech can be a cause of “cyber conflict” that affects social life at both the individual and the country level. Hateful and antagonistic content propagated via social networks has the potential to cause harm and suffering on an individual basis and lead to social tension and disorder beyond cyberspace. However, social networks cannot control all the content that users post. For this reason, there is a demand for automatic detection of hate speech. This demand rises particularly when the content is written in complex languages (e.g. Arabic). Arabic text is known for its challenges, its complexity, and the scarcity of its resources. This paper presents background on hate speech and the approaches used to detect it. In addition, it reviews recent contributions on hate speech and related anti-social behaviour topics. Finally, it presents challenges and recommendations for the Arabic hate speech detection problem.
Text Mining, Social Networks, Hate Speech, Natural Language Processing, Arabic NLP
CONSENT BASED ACCESS POLICY FRAMEWORK
Geetha Madadevaiah1, RV Prasad1, Amogh Hiremath1, Michel Dumontier2, Andre Dekker3; 1Philips Research, Philips Innovation Campus, Bangalore; 2,3Maastricht University, Maastricht, Netherlands
In this paper, we use Semantic Web technologies to store and share sensitive medical data in a secure manner. The framework builds on the advantages of Semantic Web technologies and adds security and robustness for sharing sensitive information in a controlled environment. It uses a combination of Role-Based and Rule-Based Access Policies to secure a medical data repository. To support the framework, we built a lightweight ontology to collect consent from users, indicating which part of their data they want to share with another user who has a particular role. We consider the scenario in which the owner of the data, say the patient, shares medical data with relevant people such as physicians, researchers, and pharmacists. We developed a prototype, which was validated using Sesame Open RDF Workbench with 202,908 triples and a consent graph stating consents per patient.
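The combination of role-based and rule-based checks described above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation (which is built on RDF and SPARQL). The role names, data categories, patient identifiers, and the `ROLE_PERMISSIONS`, `Consent`, and `is_access_allowed` names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical role-based policy: the data categories each role may ever see.
ROLE_PERMISSIONS = {
    "physician": {"demographics", "lab_results", "medication"},
    "researcher": {"lab_results"},
    "pharmacist": {"medication"},
}


@dataclass(frozen=True)
class Consent:
    """One consent statement: a patient shares one data category with one role."""
    patient_id: str
    role: str
    category: str


def is_access_allowed(consents, patient_id, role, category):
    """Grant access only if the role permits the category (role-based check)
    and the patient has consented to share it with that role (rule-based check)."""
    role_ok = category in ROLE_PERMISSIONS.get(role, set())
    consent_ok = Consent(patient_id, role, category) in consents
    return role_ok and consent_ok


# Illustrative consent set for a single made-up patient.
consents = {
    Consent("patient-1", "physician", "lab_results"),
    Consent("patient-1", "researcher", "medication"),
}
```

In this sketch a request is denied unless both checks pass, so a consent given to a role that is never permitted to see that category has no effect.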
Access Policies, Semantic Web, RDF/SPARQL, Role Based, Rule Based
ATTRIBUTE REDUCTION AND DECISION TREE PRUNING TO SIMPLIFY LIVER FIBROSIS PREDICTION ALGORITHMS: A COHORT STUDY
Mahasen Mabrouk1, Abubakr Awad2, Hend Shousha1, Wafaa Alakel1,3, Ahmed Salama, Tahany Awad1; 1Faculty of Medicine, Cairo University, Egypt; 2University of Aberdeen, UK; 3Ministry of Health and Population, Egypt
Background: Assessment of liver fibrosis is vital for therapeutic decisions and prognostic evaluation in chronic hepatitis. Liver biopsy is considered the definitive investigation for assessing the stage of liver fibrosis, but it carries several limitations. FIB-4 and APRI also have limited accuracy. The National Committee for Control of Viral Hepatitis (NCCVH) in Egypt has supplied a valuable pool of electronic patient data that data mining techniques can analyze to disclose hidden patterns and trends, leading to the development of predictive algorithms.
Aim: To collaborate with physicians to develop a novel, reliable, easy-to-comprehend noninvasive model that predicts the stage of liver fibrosis from routine workup, without imposing extra costs for additional examinations, especially in areas with limited resources such as Egypt.
Methods: This multicenter retrospective study included baseline demographic, laboratory, and histopathological data of 69106 patients with chronic hepatitis C. We started with data collection, preprocessing, cleansing, and formatting to enable knowledge discovery from Electronic Health Records (EHRs). Data mining was used to build a decision tree (Reduced Error Pruning tree, REP tree) with 10-fold internal cross-validation. Histopathology results were used to assess accuracy for fibrosis stages. Machine learning feature selection and reduction (CfsSubsetEval / BestFirst) reduced the initial number of input features (N=15) to the most relevant ones (N=6) for developing the prediction model.
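As a rough illustration of the 10-fold cross-validation mentioned in the Methods, the sketch below splits sample indices into ten nearly equal folds, each serving once as the test set. It is a plain, unstratified split written in standard-library Python under our own assumptions. It is not the procedure the authors ran in Weka, and the function name `k_fold_indices` and its parameters are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

A model would be trained on each `train` list and scored on the matching `test` list, and the ten scores would then be averaged.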
Results: In this study, 32419 patients had F(0-1), 25073 had F(2), and 11615 had F(3-4). Revalidation of FIB-4 and APRI in our study showed low accuracy and high discordance with biopsy results, with overall AUC of 0.68 and 0.58, respectively. Out of 15 attributes, machine learning selected age, AFP, AST, glucose, albumin, and platelets as the most relevant. The REP tree achieved an overall classification accuracy of up to 70% and an ROC area of 0.74, which were barely affected by attribute reduction and pruning. However, attribute reduction and tree pruning yielded a simpler model that is easier for physicians to understand and takes less time to execute.
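For readers unfamiliar with the two scores revalidated above, the sketch below computes FIB-4 and APRI from routine laboratory values. The abstract does not state the formulas, so the code uses the commonly published definitions. The function and parameter names are hypothetical, and the upper limit of normal for AST varies by laboratory and must be supplied by the caller.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


def apri(ast_u_per_l, ast_upper_limit_u_per_l, platelets_1e9_per_l):
    """APRI = (AST / upper limit of normal AST) x 100 / platelet count."""
    return (ast_u_per_l / ast_upper_limit_u_per_l) * 100.0 / platelets_1e9_per_l
```

Both functions expect AST and ALT in U/L and the platelet count in units of 10^9/L. Any cut-off used to map a score to a fibrosis stage is outside the scope of this sketch.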
Conclusion: In this study, we had the chance to examine a large cohort of 69106 chronic hepatitis patients with available liver biopsy results in order to revise and validate the accuracy of FIB-4 and APRI. The study represents a collaboration between computer scientists and hepatologists to provide clinicians with an accurate, novel, and reliable noninvasive model to predict the stage of liver fibrosis.
Liver Fibrosis, Data Mining, Weka, Decision Tree, Attribute Reduction, Tree Pruning
NONNEGATIVE MATRIX FACTORIZATION UNDER ADVERSARIAL NOISE
Peter Ballen, University of Pennsylvania, Philadelphia, USA
Nonnegative Matrix Factorization (NMF) is a popular tool to estimate the missing entries of a dataset under the assumption that the true data has a low-dimensional factorization. One example of such a matrix is found in movie recommendation settings, where NMF corresponds to predicting how a user would rate a movie. Traditional NMF algorithms assume the input data is generated from the underlying representation plus mean-zero independent Gaussian noise. However, this simplistic assumption does not hold in real-world settings that contain more complex or adversarial noise. We provide a new NMF algorithm that is more robust towards these nonstandard noise patterns. Our algorithm outperforms existing algorithms on movie rating datasets, where adversarial noise corresponds to a group of adversarial users attempting to review-bomb a movie.
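To make the matrix-completion setting concrete, the sketch below fits a nonnegative factorization to the observed entries of a small ratings matrix using standard multiplicative updates restricted by a binary mask. This is only the traditional squared-error baseline, not the robust algorithm proposed in the paper, whose details the abstract does not give. The function names `masked_nmf` and `masked_error` and all parameters are hypothetical.

```python
import random


def masked_nmf(V, mask, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V ~= W H with W, H >= 0, fitting only entries where mask is 1."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialisation keeps the updates well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def product():
        return [[sum(W[i][k] * H[k][j] for k in range(rank)) for j in range(m)]
                for i in range(n)]

    for _ in range(n_iter):
        # Update W with H fixed, using the current reconstruction.
        WH = product()
        for i in range(n):
            for k in range(rank):
                num = sum(mask[i][j] * V[i][j] * H[k][j] for j in range(m))
                den = sum(mask[i][j] * WH[i][j] * H[k][j] for j in range(m))
                W[i][k] *= num / (den + eps)
        # Update H with the new W fixed, after recomputing the reconstruction.
        WH = product()
        for k in range(rank):
            for j in range(m):
                num = sum(mask[i][j] * V[i][j] * W[i][k] for i in range(n))
                den = sum(mask[i][j] * WH[i][j] * W[i][k] for i in range(n))
                H[k][j] *= num / (den + eps)
    return W, H


def masked_error(V, mask, W, H):
    """Sum of squared errors over observed entries only."""
    rank = len(H)
    total = 0.0
    for i in range(len(V)):
        for j in range(len(V[0])):
            if mask[i][j]:
                pred = sum(W[i][k] * H[k][j] for k in range(rank))
                total += (V[i][j] - pred) ** 2
    return total
```

A missing rating for user `i` and movie `j` would then be predicted as the dot product of row `i` of `W` with column `j` of `H`. Because every observed entry is weighted equally under squared error, a group of extreme ratings can pull the factors toward them, which is the weakness the paper targets.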
Nonnegative Matrix Factorization, Matrix Completion, Recommendation, Adversarial Noise, Outlier Detection, Linear Model
DATA MODEL FOR BIGDEEPEXAMINATOR
Janusz Bobulski and Mariusz Kubanek, Czestochowa University of Technology, Poland
Big Data is a term used for data sets that are simultaneously characterized by high volume, diversity, real-time stream inflow, variability, and complexity, and that require innovative technologies, tools, and methods in order to extract new and useful knowledge from them. Big Data presents both a new challenge and new information possibilities. The effective acquisition and processing of data will play a key role in the global and local economy, as well as in social policy and in large corporations. This article continues research and development work on the design of a data analysis system using artificial intelligence and presents a data model for that system.
Big data, intelligent systems, data processing, multi-data processing.