#BREXIT VS. #STOPBREXIT: WHAT IS TRENDIER? AN NLP ANALYSIS
Marco A. Palomino1 and Adithya Murali2 1 School of Computing, Electronics and Mathematics, University of Plymouth, Drake Circus, Plymouth, PL4 8AA, United Kingdom 2 School of Computing Science and Engineering, Vellore Institute of Technology, Vellore - 632 014, Tamil Nadu, India
Online trends have established themselves as a new method of information propagation that is reshaping journalism in the digital age. We argue that sentiment analysis—the classification of human emotion expressed in text—can enhance existing algorithms for trend discovery. By highlighting topics that are polarised, sentiment analysis can offer insight into the influence of users who are involved in a trend, and how other users adopt such a trend. As a case study, we have investigated a highly topical subject: Brexit, the withdrawal of the United Kingdom from the European Union. We retrieved an experimental corpus of publicly available tweets referring to Brexit and used them to test a proposed algorithm to identify trends. We validate the efficiency of the algorithm and gauge the sentiment expressed on the captured trends to confirm that highly polarised data ensures the emergence of trends.
Twitter; sentiment analysis; word clouds; text mining; information retrieval
For More Details :
https://aircconline.com/csit/papers/vol9/csit91203.pdf
INCLUDING NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING INTO INFORMATION RETRIEVAL
Piotr Malak and Artur Ogurek Institute of Information Science and Book Studies, University of Wrocław, Poland
In the current paper we discuss the results of preliminary, but promising, research on including some Natural Language Processing (NLP) and Machine Learning (ML) approaches into Information Retrieval (IR). Classical IR uses indexing and term weighting in order to increase the pertinence of answers given to users' queries. Such an approach allows for matching the meaning, i.e. matching all keywords of the same or very similar meaning as expressed in the user query. In most cases this approach is sufficient to fulfil user information needs. However, indexing and retrieving information over professional language texts brings new challenges as well as new possibilities. One of the challenges is different grammar, which creates the need to adjust NLP tools for a given professiolect. One of the possibilities is detecting the context in which an indexed term occurs in the text. In our research we attempted to answer the question of whether an NLP approach combined with supervised ML is capable of detecting contextual features of professional language texts.
Enhanced Information Retrieval, Contextual IR, NLP, Machine Learning
For More Details :
https://aircconline.com/csit/papers/vol9/csit91202.pdf
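The classical indexing and term weighting mentioned in the abstract above can be illustrated with a minimal TF-IDF sketch. This is not the authors' system: the toy documents, the function name `tf_idf`, and the choice of the plain `tf * log(N / df)` weighting are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus, not data from the paper.
docs = [
    "patient shows acute symptoms",
    "patient history shows no symptoms",
    "acute infection confirmed",
]
weights = tf_idf(docs)
```

Terms that occur in fewer documents receive higher weights, which is why such a scheme matches keywords well but carries no information about the context in which a term occurs.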
FLEXIBLE LOG FILE PARSING USING HIDDEN MARKOV MODELS
Nadine Kuhnert and Andreas Maier Pattern Recognition, Friedrich-Alexander University, Erlangen-Nuremberg, Germany
We aim to model unknown file processing. As the content of log files often evolves over time, we established a dynamic statistical model which learns and adapts processing and parsing rules. First, we limit the amount of unstructured text by focusing only on those frequent patterns which lead to the desired output table, similar to Vaarandi [10]. Second, we transform the found frequent patterns and the output stating the parsed table into a Hidden Markov Model (HMM). We use this HMM as a specific yet flexible representation of a pattern for log file processing. When changes in the raw log file distort learned patterns, we aim for the model to adapt automatically in order to maintain high-quality output. After training our model on one system type and applying the model and the resulting parsing rule to a different system with slightly different log file patterns, we achieve an accuracy of over 99%.
Hidden Markov Models, Parameter Extraction, Parsing, Text Mining, Information Retrieval
For More Details :
https://aircconline.com/csit/papers/vol9/csit91201.pdf
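To illustrate how an HMM can assign field labels to the tokens of a log line, the following is a minimal Viterbi decoding sketch. The abstract above does not specify the model structure, so the states (`timestamp`, `level`, `message`), the token classes (`num`, `upper`, `word`), and every probability below are hypothetical values chosen for illustration, not the authors' trained parameters.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical model: all names and numbers are assumptions.
states = ["timestamp", "level", "message"]
start_p = {"timestamp": 0.8, "level": 0.1, "message": 0.1}
trans_p = {
    "timestamp": {"timestamp": 0.1, "level": 0.8, "message": 0.1},
    "level": {"timestamp": 0.05, "level": 0.05, "message": 0.9},
    "message": {"timestamp": 0.05, "level": 0.05, "message": 0.9},
}
emit_p = {
    "timestamp": {"num": 0.9, "upper": 0.05, "word": 0.05},
    "level": {"num": 0.05, "upper": 0.9, "word": 0.05},
    "message": {"num": 0.2, "upper": 0.1, "word": 0.7},
}
# Token classes of a line such as "1563 ERROR disk full".
labels = viterbi(["num", "upper", "word", "word"], states, start_p, trans_p, emit_p)
```

The sketch multiplies raw probabilities for readability; for long lines, summing log-probabilities would avoid numerical underflow.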
CHEMCONNECT: AN ONTOLOGY-BASED REPOSITORY OF EXPERIMENTAL DEVICES AND OBSERVATIONS
Edward S. Blurock Blurock Consulting AB, Lund, Sweden
CHEMCONNECT is an ontology-based, cloud-based repository of experimental, theoretical and computational data for the experimental sciences domain. Currently, the emphasis is on the chemical combustion community, but in future work (in collaboration with domain experts) the domain will be expanded. CHEMCONNECT goes beyond traditional meta-data annotated scientific result repositories in that the data is parsed and analysed with respect to an extensive chemical and combustion knowledge base. The parsed data is then inter-linked, allowing for efficient searching and comparison. The goal is to link all data associated with experiments, including the device description, the intermediate data (both computed and measured), the associated interpretations, the procedures and methodologies used to produce the data, and the final published results and references. Having published data linked to its dependent measurements and constants, devices, subsystems, sensors and even people and laboratories provides effective accountability and more confidence in the data. Data entry and availability can range from private users, to user-defined consortia, to the general public. These concepts are implemented at http://www.connectedsmartdata.info.
Case Study, Ontology, Repository, Database, Experimental Devices, Experimental Results
For More Details :
https://aircconline.com/csit/papers/vol9/csit90709.pdf
A SURVEY ON THE DIFFERENT IMPLEMENTED CAPTCHAS
Shadi Khawandi, Firas Abdallah and Anis Ismail Faculty of Technology, Lebanese University, Lebanon
CAPTCHA is almost a standard security technology, and has found widespread application in commercial websites. There are two types: labeling and image based CAPTCHAs. To date, almost all CAPTCHA designs are labeling based. Labeling based CAPTCHAs are those that make a judgment based on whether the question "what is it?" has been correctly answered. Essentially, in Artificial Intelligence (AI), this means the judgment depends on whether the new label provided by the user side matches the label already known to the server. Labeling based CAPTCHA designs have some common weaknesses that can be exploited by attackers. First, the label set, i.e., the number of classes, is small and fixed. Due to deformation and noise in CAPTCHAs, the classes have to be further reduced to avoid confusion. Second, clean segmentation in current designs, in particular character labeling based CAPTCHAs, is feasible. The state of the art of CAPTCHA design suggests that the robustness of character labeling schemes should rely on the difficulty of finding where the character is (segmentation), rather than which character it is (recognition). However, the shapes of alphabet letters and numbers have very limited geometric characteristics that humans can use to tell them apart, and they easily become indistinct. Image recognition CAPTCHAs face many potential problems which have not been fully studied. It is difficult for a small site to acquire a large dictionary of images to which an attacker does not have access, and without a means of automatically acquiring new labeled images, an image based challenge does not usually meet the definition of a CAPTCHA. They are either unusable or prone to attacks. In this paper, we present the different types of CAPTCHAs that try to defeat advanced computer programs or bots, discussing the limitations and drawbacks of each.
CAPTCHAs, Labeling, Segmentation, Image recognition
For More Details :
https://airccj.org/CSCP/vol9/csit90101.pdf
EVALUATION OF DIFFERENT IMAGE SEGMENTATION METHODS WITH RESPECT TO COMPUTATIONAL SYSTEMS
Ms. Mehak Saini1 and Prof. (Dr.) K. K. Saini2 1Electronics & Communication Engineering, Lovely Professional University, Jullunder, India 2Director, IIMT College of Engineering, Greater Noida, UP
Image segmentation is a fundamental step in modern computational vision systems, and its goal is to produce a simpler and more meaningful representation of the image, making it easier to analyze. Image segmentation is a subcategory of digital image processing and, basically, it divides a given image into two parts: the object(s) of interest and the background. Image segmentation is typically used to locate objects and boundaries in images, and its applicability extends to other methods such as classification, feature extraction and pattern recognition. Most methods are based on histogram analysis, edge detection and region growing. Currently, other approaches are presented, such as segmentation by graph partition, using genetic algorithms and genetic programming. This paper presents a review of this area, starting with a taxonomy of the methods followed by a discussion of the most relevant ones.
Image segmentation, histogram analysis, edge detectors
For More Details :
https://airccj.org/CSCP/vol9/csit90306.pdf
IMAGE SEGMENTATION BASED ON MULTIPLEX NETWORKS AND SUPER PIXELS
Ivo S. M. de Oliveira1,2, Oscar A. C. Linares1, Ary H. M. de Oliveira3, Glenda M. Botelho3 and João Batista Neto1 1 Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Brazil 2 Instituto Federal do Tocantins, Campus de Paraíso do Tocantins, Paraíso do Tocantins, Brazil 3 Universidade Federal do Tocantins, Palmas, Brazil
Despite the large number of techniques and applications in the field of image segmentation, it is still an open research field. A recent trend in image segmentation is the usage of graph theory. This work proposes an approach which combines community detection in multiplex networks, in which a layer represents a certain image feature, with super pixels. There are approaches for the segmentation of images of good quality that use a single feature, or a combination of several image features forming a single graph, for the detection of communities and the segmentation. However, with multiplex networks it is possible to use more than one image feature without the need for mathematical operations that can lead to the loss of information about the image features during the generation of the graphs. The experiments presented in this work indicate that such a method can offer high-quality and robust segmentations.
community detection; complex networks; image segmentation; multiplex networks; super pixels
For More Details :
https://airccj.org/CSCP/vol9/csit90304.pdf
OCCLUSION HANDLED BLOCK-BASED STEREO MATCHING WITH IMAGE SEGMENTATION
Jisu Kim, Cheolhyeong Park, Ju O Kim and Deokwoo Lee Department of Computer Engineering, Keimyung University, Daegu 42601, Republic of Korea
This paper chiefly deals with techniques of stereo vision, focusing in particular on the procedure of stereo matching. In addition, the proposed approach deals with the detection of regions of occlusion. Prior to carrying out stereo matching, image segmentation is conducted in order to achieve precise matching results. In practice, matching algorithms in stereo vision sometimes suffer from insufficient accuracy if occlusion is inherent in the scene of interest. The search for matching regions is based on cross correlation and on finding the region with the minimum mean square error of the difference between the areas of interest defined in the matching window. The Middlebury dataset is used for the experiments and for comparison with existing results, and the proposed algorithm shows better performance than the existing matching algorithms. To evaluate the proposed algorithm, we compare the resulting disparity to the existing ones.
Occlusion, Stereo vision, Segmentation, Matching
For More Details :
https://airccj.org/CSCP/vol9/csit90303.pdf
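The window-based search for the minimum mean square error described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: it omits image segmentation, occlusion detection and cross correlation, assumes rectified images stored as lists of rows, and uses hypothetical function names and toy pixel values.

```python
def block_mse(left, right, row, col, disp, half):
    """Mean squared error between a left block and the right block shifted by disp."""
    total, count = 0.0, 0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            diff = left[r][c] - right[r][c - disp]
            total += diff * diff
            count += 1
    return total / count


def best_disparity(left, right, row, col, max_disp, half=1):
    """Return the disparity in [0, max_disp] with the lowest block MSE."""
    # Keep the shifted window inside the image on the left-hand side.
    limit = min(max_disp, col - half)
    return min(range(limit + 1),
               key=lambda d: block_mse(left, right, row, col, d, half))


# Hypothetical toy pair: the left image is the right image shifted by 2 pixels.
right_row = [5, 9, 40, 7, 63, 22, 81, 14]
left_row = [0, 0, 5, 9, 40, 7, 63, 22]
right = [list(right_row) for _ in range(3)]
left = [list(left_row) for _ in range(3)]
disparity = best_disparity(left, right, row=1, col=4, max_disp=3)
```

The caller is expected to pick `row` and `col` so that the window of half-width `half` lies inside the image; the sketch does not check this.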
A SURVEY OF VISIBLE IRIS RECOGNITION
Yali Song1 , Yongzhong He1,2 and Jin Zhang1 1 Beijing Jiaotong University, China, 2Science and Technology on Electronic Information Control Laboratory, Chengdu, China
In recent years, research on iris recognition in near-infrared has made great progress. However, many devices, such as most mobile phones, have no embedded near-infrared device. In order to use iris recognition in these devices, iris recognition in visible light is needed, but visible iris recognition has many problems, including a low recognition rate and poor robustness. In this paper, we first clarify the challenges in visible iris recognition. We evaluate the effectiveness of three traditional iris recognition methods on iris images collected from smart phones in visible light. The results show that traditional methods achieve an accuracy not exceeding 60% at best. Then we summarize the recent advances in visible iris recognition in three aspects: iris image acquisition, iris preprocessing and iris feature extraction methods. In the end, we list future research directions in visible iris recognition.
visible iris recognition, mobile phones, iris image acquisition, feature extraction
For More Details :
https://airccj.org/CSCP/vol9/csit90302.pdf
A COMPARISON OF ACTIVE CONTOUR PRIOR SHAPE SEGMENTATION METHODS: APPLICATION TO DIABETIC PLANTAR FOOT THERMAL IMAGES
Asma Bougrine, Rachid Harba, Raphael Canals, Roger Ledee, Meryem Jabloun PRISME Laboratory - University of Orleans – France
The segmentation of diabetic plantar foot thermal images that are taken with no constraining setup is a challenging problem. The present paper is dedicated to the comparison of three active contour-based methods with prior shape information that are well suited to the given problem. The first method was recently proposed by the present authors. It is based on the Kass et al. method and on a new extra term that minimizes the difference between the curvature of the active contour and that of the prior shape. The second method, by Ahmed et al., is a Fourier-based method with prior shape matching. The third was suggested by Chen et al., where a geodesic snake is associated with a prior shape energy function. Using a database of 50 plantar foot thermal images, results show that our proposed method outperforms the two others with a root-mean-square error (RMSE) equal to 5.12 pixels and a Dice Similarity Coefficient (DSC) score of 93.9%. In addition, our method is robust to initial contour variations and fast, and is therefore suitable for smartphone applications in the context of the diabetic foot problem.
Prior shape-based segmentation, active contours, plantar foot thermal images, diabetic foot.
For More Details :
https://aircconline.com/csit/papers/vol9/csit90404.pdf
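The two evaluation measures named in the abstract above, the Dice Similarity Coefficient and the RMSE, can be computed as in the sketch below. The abstract does not state how contour points are paired for the RMSE, so pairing points by index is an assumption, as are the function names and the toy masks and points.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """Dice similarity, 2 * overlap / (size_a + size_b), for two binary masks."""
    overlap = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            overlap += a * b
            size_a += a
            size_b += b
    if size_a + size_b == 0:
        return 1.0  # Two empty masks are treated as identical.
    return 2 * overlap / (size_a + size_b)


def rmse(points_a, points_b):
    """Root-mean-square Euclidean distance between index-paired contour points."""
    squared = [
        (xa - xb) ** 2 + (ya - yb) ** 2
        for (xa, ya), (xb, yb) in zip(points_a, points_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data, not results from the paper.
segmented = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
reference = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
dsc = dice_coefficient(segmented, reference)
error = rmse([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```

A DSC of 1 means the segmented region and the reference coincide exactly, while the RMSE is expressed in pixels and is lower for closer contours.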
SEGMENTATION OF SINGLE AND OVERLAPPING LEAVES BY EXTRACTING APPROPRIATE CONTOURS
Rafflesia Khan and Rameswar Debnath Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh
Leaf detection and segmentation is a complex image segmentation problem, as leaves are most often found in groups against a natural background. The edges of leaves cannot be clearly defined from an image because of their color similarities. Also, separating every single as well as overlapping leaf individually is even more challenging, as leaves share almost the same color, texture and shape. In this paper, we propose a new automatic approach for leaf segmentation from an image. Our leaf segmentation process uses efficient techniques for processing an image to obtain the contours of every individual object. Then, it selects the most appropriate connected contours that represent the region of every leaf appearing in the image. Our model achieves an overall segmentation rate of 90.46%, where the segmentation rates for single and overlapping leaves are 95.34% and 86.73%, respectively.
image processing, leaf object segmentation, overlapping leaves, connected contour, object boundary detection
For More Details :
https://aircconline.com/csit/papers/vol9/csit91323.pdf
AMHARIC-ARABIC NEURAL MACHINE TRANSLATION
Ibrahim Gashaw and H L Shashirekha Mangalore University, Department of Computer Science, Mangalagangotri, Mangalore
Much work on automatic translation has addressed major European language pairs by taking advantage of large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair due to the scarcity of parallel data. Two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), are developed using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. In order to perform the experiment, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent Amharic translation, both available on Tanzile. The LSTM and GRU based NMT models and the Google Translation system are compared, and the LSTM based OpenNMT model is found to outperform the GRU based OpenNMT model and the Google Translation system, with BLEU scores of 12%, 11%, and 6%, respectively.
Amharic, Arabic, Neural Machine Translation, OpenNMT
For More Details :
https://aircconline.com/csit/papers/vol9/csit91606.pdf
TOWARD MULTI-LABEL CLASSIFICATION USING AN ONTOLOGY FOR WEB PAGE CLASSIFICATION
Yaya Traoré1, Sadouanouan Malo2, Bassolé Didier1 and Séré Abdoulaye2 1University Joseph KI-ZERBO, Ouagadougou, BURKINA FASO 2University Nazi Boni, Bobo-Dioulasso, BURKINA FASO
Automatic categorization of web pages has become more significant in helping search engines to provide users with relevant and quick retrieval results. In this paper, we propose a method based on Multi-label Classification (ML) using an ontology, which allows the prediction of the categories of a newly created and tagged web page. It uses the ontology in the learning phase as well as in the prediction phase. In the learning phase, the ontology is used to build the training set. In the prediction phase, the ontology is used to place the new tagged pages in the most specific categories. The experimental evaluation indicates that our proposal yields substantial results.
Multi-label classification (ML), ontology, categorization, prediction
For More Details :
https://aircconline.com/csit/papers/vol9/csit91815.pdf