Top NLP Research articles of 2019

#BREXIT VS. #STOPBREXIT: WHAT IS TRENDIER? AN NLP ANALYSIS

Marco A. Palomino¹ and Adithya Murali² ¹ School of Computing, Electronics and Mathematics, University of Plymouth, Drake Circus, Plymouth, PL4 8AA, United Kingdom ² School of Computing Science and Engineering, Vellore Institute of Technology, Vellore - 632 014, Tamil Nadu, India

ABSTRACT

Online trends have established themselves as a new method of information propagation that is reshaping journalism in the digital age. We argue that sentiment analysis—the classification of human emotion expressed in text—can enhance existing algorithms for trend discovery. By highlighting topics that are polarised, sentiment analysis can offer insight into the influence of users who are involved in a trend, and how other users adopt such a trend. As a case study, we have investigated a highly topical subject: Brexit, the withdrawal of the United Kingdom from the European Union. We retrieved an experimental corpus of publicly available tweets referring to Brexit and used them to test a proposed algorithm to identify trends. We validate the efficiency of the algorithm and gauge the sentiment expressed on the captured trends to confirm that highly polarised data ensures the emergence of trends.

KEYWORDS

Twitter; sentiment analysis; word clouds; text mining; information retrieval

For More Details :
https://aircconline.com/csit/papers/vol9/csit91203.pdf

8th International Conference on Natural Language Processing (NLP 2019)

INCLUDING NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING INTO INFORMATION RETRIEVAL

Piotr Malak and Artur Ogurek Institute of Information Science and Book Studies, University of Wrocław, Poland

ABSTRACT

In the current paper we discuss the results of preliminary but promising research on including some Natural Language Processing (NLP) and Machine Learning (ML) approaches in Information Retrieval (IR). Classical IR uses indexing and term weighting in order to increase the pertinence of answers given to users' queries. Such an approach allows for matching the meaning, i.e. matching all keywords of the same or very similar meaning as expressed in the user query. In most cases this approach is sufficient to fulfil users' information needs. However, indexing and retrieving information over professional language texts brings new challenges as well as new possibilities. One of the challenges is different grammar, which creates the need to adjust NLP tools for a given professiolect. One of the possibilities is detecting the context of occurrence of an indexed term in the text. In our research we attempted to answer the question of whether a Natural Language Processing approach combined with supervised Machine Learning is capable of detecting contextual features of professional language texts.

KEYWORDS

Enhanced Information Retrieval, Contextual IR, NLP, Machine Learning

For More Details :
https://aircconline.com/csit/papers/vol9/csit91202.pdf

8th International Conference on Natural Language Processing (NLP 2019)

FLEXIBLE LOG FILE PARSING USING HIDDEN MARKOV MODELS

Nadine Kuhnert and Andreas Maier Pattern Recognition, Friedrich-Alexander University, Erlangen-Nuremberg, Germany

ABSTRACT

We aim to model unknown file processing. As the content of log files often evolves over time, we established a dynamic statistical model which learns and adapts processing and parsing rules. First, we limit the amount of unstructured text by focusing only on those frequent patterns which lead to the desired output table, similar to Vaarandi [10]. Second, we transform the frequent patterns found and the output stating the parsed table into a Hidden Markov Model (HMM). We use this HMM as a specific yet flexible representation of a pattern for log file processing. When changes in the raw log file distort learned patterns, the model is intended to adapt automatically in order to maintain high-quality output. After training our model on one system type and applying the model and the resulting parsing rule to a different system with slightly different log file patterns, we achieve an accuracy of over 99%.
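
As a rough illustration of how an HMM can assign the tokens of a log line to output table columns, the sketch below runs Viterbi decoding over a toy model. The state names, token classes and probabilities are hypothetical and are not taken from the paper; the paper's own model and training procedure may differ.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    floor = 1e-12  # stand-in probability for unseen events, avoids log(0)

    def lg(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s at step t, previous state)
    first = observations[0]
    best = [{s: (lg(start_p[s]) + lg(emit_p[s].get(first, 0.0)), None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + lg(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda item: item[1],
            )
            row[s] = (score + lg(emit_p[s].get(obs, 0.0)), prev)
        best.append(row)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: hidden states are table columns,
# observations are coarse token classes of one log line.
STATES = ["TIMESTAMP", "LEVEL", "MESSAGE"]
START_P = {"TIMESTAMP": 0.9, "LEVEL": 0.05, "MESSAGE": 0.05}
TRANS_P = {
    "TIMESTAMP": {"LEVEL": 0.9, "MESSAGE": 0.1},
    "LEVEL": {"MESSAGE": 1.0},
    "MESSAGE": {"MESSAGE": 1.0},
}
EMIT_P = {
    "TIMESTAMP": {"time": 0.95, "word": 0.05},
    "LEVEL": {"upper": 0.9, "word": 0.1},
    "MESSAGE": {"word": 0.8, "upper": 0.2},
}

decoded = viterbi(["time", "upper", "word", "word"], STATES, START_P, TRANS_P, EMIT_P)
print(decoded)
```

Because decoding is probabilistic, a line whose tokens deviate slightly from the learned pattern can still be mapped to the most plausible columns, which is the kind of flexibility the abstract describes.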

KEYWORDS

Hidden Markov Models, Parameter Extraction, Parsing, Text Mining, Information Retrieval

For More Details :
https://aircconline.com/csit/papers/vol9/csit91201.pdf

8th International Conference on Natural Language Processing (NLP 2019)

CHEMCONNECT: AN ONTOLOGY-BASED REPOSITORY OF EXPERIMENTAL DEVICES AND OBSERVATIONS

Edward S. Blurock Blurock Consulting AB, Lund, Sweden

ABSTRACT

CHEMCONNECT is an ontology-based cloud repository of experimental, theoretical and computational data for the experimental sciences domain. Currently, the emphasis is on the chemical combustion community, but in future work (in collaboration with domain experts) the domain will be expanded. CHEMCONNECT goes beyond traditional meta-data annotated scientific result repositories in that the data is parsed and analysed with respect to an extensive chemical and combustion knowledge base. The parsed data is then inter-linked, allowing for efficient searching and comparison. The goal is to link all data associated with experiments, including the device description, the intermediate data (both computed and measured), the associated interpretations, the procedures and methodologies used to produce the data, and the final published results and references. Having published data linked to its dependent measurements and constants, devices, subsystems, sensors and even people and laboratories provides effective accountability and more confidence in the data. Data entry and availability can range from private users, to user-defined consortia, to the general public. These concepts are implemented at http://www.connectedsmartdata.info.

KEYWORDS

Case Study, Ontology, Repository, Database, Experimental Devices, Experimental Results

For More Details :
https://aircconline.com/csit/papers/vol9/csit90709.pdf

8th International Conference on Soft Computing, Artificial Intelligence and Applications (SAI 2019)

A SURVEY ON THE DIFFERENT IMPLEMENTED CAPTCHAS

Shadi Khawandi, Firas Abdallah and Anis Ismail Faculty of Technology, Lebanese University, Lebanon

ABSTRACT

CAPTCHA is almost a standard security technology and has found widespread application in commercial websites. There are two types: labeling based and image based CAPTCHAs. To date, almost all CAPTCHA designs are labeling based. Labeling based CAPTCHAs are those that make a judgment based on whether the question "what is it?" has been correctly answered. Essentially, in Artificial Intelligence (AI), this means the judgment depends on whether the new label provided by the user side matches the label already known to the server. Labeling based CAPTCHA designs have some common weaknesses that attackers can take advantage of. First, the label set, i.e., the number of classes, is small and fixed. Due to deformation and noise in CAPTCHAs, the classes have to be further reduced to avoid confusion. Second, clean segmentation in current designs, in particular character labeling based CAPTCHAs, is feasible. The state of the art of CAPTCHA design suggests that the robustness of character labeling schemes should rely on the difficulty of finding where the character is (segmentation), rather than which character it is (recognition). However, the shapes of letters and numbers have very limited geometric characteristics that humans can use to tell them apart, and these shapes easily become indistinct. Image recognition CAPTCHAs face many potential problems which have not been fully studied. It is difficult for a small site to acquire a large dictionary of images which an attacker does not have access to, and without a means of automatically acquiring new labeled images, an image based challenge does not usually meet the definition of a CAPTCHA. They are either unusable or prone to attacks. In this paper, we present the different types of CAPTCHAs that try to defeat advanced computer programs or bots, discussing the limitations and drawbacks of each.

KEYWORDS

CAPTCHAs, Labeling, Segmentation, Image recognition

For More Details :
https://airccj.org/CSCP/vol9/csit90101.pdf

3rd International Conference on Computer Science and Information Technology (COMIT 2019)

EVALUATION OF DIFFERENT IMAGE SEGMENTATION METHODS WITH RESPECT TO COMPUTATIONAL SYSTEMS

¹Ms. Mehak Saini and ²Prof. (Dr.) K. K. Saini ¹Electronics & Communication Engineering, Lovely Professional University, Jullunder, India ²Director, IIMT College of Engineering, Greater Noida, UP

ABSTRACT

Image segmentation is a fundamental step in modern computational vision systems, and its goal is to produce a simpler and more meaningful representation of the image, making it easier to analyze. Image segmentation is a subcategory of digital image processing and, basically, it divides a given image into two parts: the object(s) of interest and the background. Image segmentation is typically used to locate objects and boundaries in images, and its applicability extends to other methods such as classification, feature extraction and pattern recognition. Most methods are based on histogram analysis, edge detection and region growing. Currently, other approaches are presented, such as segmentation by graph partition, using genetic algorithms and genetic programming. This paper presents a review of this area, starting with a taxonomy of the methods followed by a discussion of the most relevant ones.
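
To make the idea of histogram-based segmentation concrete, the sketch below implements Otsu's thresholding, a standard histogram technique chosen here purely for illustration; the abstract does not name it, and the function name and toy pixel data are assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises the between-class variance.

    Pixels with value <= t are treated as background, the rest as object.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_b = 0          # number of background pixels so far
    sum_b = 0        # sum of background grey levels so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_b += hist[t]
        sum_b += t * hist[t]
        if w_b == 0:
            continue  # no background pixels yet
        w_f = total - w_b
        if w_f == 0:
            break     # every pixel is already background
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between_var = w_b * w_f * (mean_b - mean_f) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Toy image flattened to a list: a dark background and a bright object.
toy_pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_pixels)
object_mask = [p > threshold for p in toy_pixels]
print(threshold, sum(object_mask))
```

The resulting mask splits the image into object and background, matching the two-part division described in the abstract.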

KEYWORDS

Image segmentation, histogram analysis, edge detectors

For More Details :
https://airccj.org/CSCP/vol9/csit90306.pdf

7th International Conference on Signal Image Processing and Multimedia (SIPM 2019)

IMAGE SEGMENTATION BASED ON MULTIPLEX NETWORKS AND SUPER PIXELS

Ivo S. M. de Oliveira¹,², Oscar A. C. Linares¹, Ary H. M. de Oliveira³, Glenda M. Botelho³ and João Batista Neto¹ ¹ Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Brazil ² Instituto Federal do Tocantins, Campus de Paraíso do Tocantins, Paraíso do Tocantins, Brazil ³ Universidade Federal do Tocantins, Palmas, Brazil

ABSTRACT

Despite the large number of techniques and applications in the field of image segmentation, it is still an open research field. A recent trend in image segmentation is the use of graph theory. This work proposes an approach which combines community detection in multiplex networks, in which a layer represents a certain image feature, with super pixels. There are approaches for the segmentation of good-quality images that use a single feature, or a combination of several image features forming a single graph, for the detection of communities and the segmentation. However, with multiplex networks it is possible to use more than one image feature without the need for mathematical operations that can lead to the loss of image feature information during the generation of the graphs. The experiments presented in this work show that such a method can offer high-quality and robust segmentations.

KEYWORDS

community detection; complex networks; image segmentation; multiplex networks; super pixels

For More Details :
https://airccj.org/CSCP/vol9/csit90304.pdf

7th International Conference on Signal Image Processing and Multimedia (SIPM 2019)

OCCLUSION HANDLED BLOCK-BASED STEREO MATCHING WITH IMAGE SEGMENTATION

Jisu Kim, Cheolhyeong Park, Ju O Kim and Deokwoo Lee Department of Computer Engineering, Keimyung University, Daegu 42601, Republic of Korea

ABSTRACT

This paper deals chiefly with stereo vision techniques, focusing in particular on the stereo matching procedure. In addition, the proposed approach deals with the detection of occluded regions. Prior to stereo matching, image segmentation is conducted in order to achieve precise matching results. In practice, stereo matching algorithms sometimes suffer from insufficient accuracy if occlusion is inherent in the scene of interest. The search for matching regions is based on cross correlation and on finding the region with the minimum mean square error of the difference between the areas of interest defined in the matching window. The Middlebury dataset is used for experiments and for comparison with existing results, and the proposed algorithm shows better performance than existing matching algorithms. To evaluate the proposed algorithm, we compare the resulting disparity to that of existing methods.
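
The minimum mean square error search described above can be sketched on a single scanline as follows. This is a simplified illustration under assumed parameters (`window`, `max_disp`) and toy data; it omits the segmentation and occlusion handling that the paper adds, and it is not the authors' implementation.

```python
def mse(a, b):
    """Mean square error between two equally sized patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def disparity_row(left, right, window=3, max_disp=4):
    """Estimate per-pixel disparity along one scanline by block matching.

    For each window in the left row, search leftwards in the right row
    and keep the shift with the smallest mean square error.
    """
    half = window // 2
    disparity = [0] * len(left)
    for x in range(half, len(left) - half):
        patch = left[x - half : x + half + 1]
        best_d, best_err = 0, float("inf")
        for d in range(max_disp + 1):
            xr = x - d
            if xr - half < 0:
                break  # candidate window would leave the image
            err = mse(patch, right[xr - half : xr + half + 1])
            if err < best_err:
                best_d, best_err = d, err
        disparity[x] = best_d
    return disparity

# Toy scanlines: the textured object appears 2 pixels further left in the right view.
left_row = [0, 0, 0, 0, 9, 5, 7, 0, 0, 0]
right_row = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]
disp = disparity_row(left_row, right_row)
print(disp[4:7])
```

In textureless areas (the runs of zeros here) many shifts give the same error, which is one reason plain block matching benefits from the segmentation and occlusion steps the abstract mentions.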

KEYWORDS

Occlusion, Stereo vision, Segmentation, Matching

For More Details :
https://airccj.org/CSCP/vol9/csit90303.pdf

7th International Conference on Signal Image Processing and Multimedia (SIPM 2019)

A SURVEY OF VISIBLE IRIS RECOGNITION

Yali Song¹, Yongzhong He¹,² and Jin Zhang¹ ¹ Beijing Jiaotong University, China ² Science and Technology on Electronic Information Control Laboratory, Chengdu, China

ABSTRACT

In recent years, research on near-infrared iris recognition has made great progress. However, many devices, such as most mobile phones, have no embedded near-infrared sensor. In order to use iris recognition on these devices, iris recognition in visible light is needed, but visible iris recognition has many problems, including a low recognition rate and poor robustness. In this paper, we first clarify the challenges in visible iris recognition. We evaluate the effectiveness of three traditional iris recognition methods on iris images collected from smart phones in visible light. The results show that traditional methods achieve an accuracy of no more than 60%. Then we summarize the recent advances in visible iris recognition in three aspects: iris image acquisition, iris preprocessing and iris feature extraction methods. Finally, we list future research directions in visible iris recognition.

KEYWORDS

visible iris recognition, mobile phones, iris image acquisition, feature extraction

For More Details :
https://airccj.org/CSCP/vol9/csit90302.pdf

7th International Conference on Signal Image Processing and Multimedia (SIPM 2019)

A COMPARISON OF ACTIVE CONTOUR PRIOR SHAPE SEGMENTATION METHODS: APPLICATION TO DIABETIC PLANTAR FOOT THERMAL IMAGES

Asma Bougrine, Rachid Harba, Raphael Canals, Roger Ledee, Meryem Jabloun PRISME Laboratory - University of Orleans – France

ABSTRACT

The segmentation of diabetic plantar foot thermal images that are taken with no constraining setup is a challenging problem. The present paper is dedicated to the comparison of three active contour-based methods with prior shape information that are well suited to the given problem. The first method was recently proposed by the present authors. It is based on the Kass et al. method and on a new extra term that minimizes the difference between the curve curvature of the active contour and the prior shape one. The second method is the Ahmed et al. one, a Fourier-based method with prior shape matching. The third one was suggested by Chen et al. where a geodesic snake is associated with a prior shape energy function. Using a database of 50 plantar foot thermal images, results show that our proposed method outperforms the two others with a root-mean-square error (RMSE) equal to 5.12 pixels and a Dice Similarity Coefficient (DSC) score of 93.9%. In addition, our method is robust to initial contour variations and fast, therefore suitable for smartphone application in the context of diabetic foot problem.
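
For readers unfamiliar with the two reported metrics, the sketch below computes a Dice Similarity Coefficient between two binary masks and an RMSE over point-to-point contour distances. How the paper pairs contour points for the RMSE is not stated in the abstract, so the distance list here is an assumption, as are the function names and toy data.

```python
import math

def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient of two binary masks given as nested lists."""
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))

def rmse(distances):
    """Root-mean-square error of a list of contour distances (in pixels)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Toy example: a 2x2 reference mask and a segmentation that misses one pixel.
reference = [[1, 1], [0, 0]]
segmented = [[1, 0], [0, 0]]
dsc = dice_coefficient(reference, segmented)
err = rmse([3.0, 4.0])
print(round(dsc, 4), round(err, 4))
```

A DSC of 1 means perfect overlap with the reference segmentation, while a lower RMSE means the detected contour lies closer to the reference contour.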

KEYWORDS

Prior shape-based segmentation, active contours, plantar foot thermal images, diabetic foot.

For More Details :
https://aircconline.com/csit/papers/vol9/csit90404.pdf

8th International Conference on Advanced Computer Science and Information Technology (ICAIT 2019)

SEGMENTATION OF SINGLE AND OVERLAPPING LEAVES BY EXTRACTING APPROPRIATE CONTOURS

Rafflesia Khan and Rameswar Debnath Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh

ABSTRACT

Leaf detection and segmentation is a complex image segmentation problem, as leaves are most often found in groups against a natural background. Edges of leaves cannot be clearly defined from an image because of their color similarities. Also, separating every single and overlapping leaf individually is even more challenging, as leaves share almost the same color, texture and shape. In this paper, we propose a new automatic approach for leaf segmentation from an image. Our leaf segmentation process uses efficient techniques for processing an image to obtain the contours of every individual object. Then, it selects the most appropriate connected contours that represent the region of every leaf appearing in an image. Our model achieves an overall 90.46% segmentation rate, where segmentation rates for single and overlapping leaves are 95.34% and 86.73%, respectively.

KEYWORDS

image processing, leaf object segmentation, overlapping leaves, connected contour, object boundary detection

For More Details :
https://aircconline.com/csit/papers/vol9/csit91323.pdf

6th International Conference on Computer Science, Engineering and Information Technology (CSEIT-2019)

AMHARIC-ARABIC NEURAL MACHINE TRANSLATION

Ibrahim Gashaw and H L Shashirekha Mangalore University, Department of Computer Science, Mangalagangotri, Mangalore

ABSTRACT

Much work on automatic translation has addressed major European language pairs by taking advantage of large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair due to its scarcity of parallel data. Two Neural Machine Translation (NMT) models, one based on Long Short-Term Memory (LSTM) and one on Gated Recurrent Units (GRU), are developed using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. In order to perform the experiment, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent Amharic translation text corpora available on Tanzile. The LSTM and GRU based NMT models and the Google Translation system are compared, and the LSTM based OpenNMT is found to outperform the GRU based OpenNMT and the Google Translation system, with BLEU scores of 12%, 11%, and 6% respectively.

KEYWORDS

Amharic, Arabic, Neural Machine Translation, OpenNMT

For More Details :
https://aircconline.com/csit/papers/vol9/csit91606.pdf

5th International Conference on Data Mining and Applications (DMAP 2019)

TOWARD MULTI-LABEL CLASSIFICATION USING AN ONTOLOGY FOR WEB PAGE CLASSIFICATION

Yaya Traoré¹, Sadouanouan Malo², Bassolé Didier¹ and Séré Abdoulaye² ¹University Joseph KI-ZERBO, Ouagadougou, BURKINA FASO ²University Nazi Boni, Bobo-Dioulasso, BURKINA FASO

ABSTRACT

Automatic categorization of web pages has become more significant in helping search engines provide users with relevant and quick retrieval results. In this paper, we propose a method based on Multi-label Classification (ML) using an ontology, which allows the prediction of the categories of a newly created and tagged web page. It uses the ontology in the learning phase as well as in the prediction phase. In the learning phase, the ontology is used to build the training set. In the prediction phase, the ontology is used to place the new tagged pages in the most specific categories. The experimental evaluation demonstrates that our proposal yields substantial results.

KEYWORDS

Multi-label classification (ML), ontology, categorization, prediction

For More Details :
https://aircconline.com/csit/papers/vol9/csit91815.pdf

9th International Conference on Computer Science, Engineering and Applications (ICCSEA 2019)
