Top 10 Natural Language Processing Trends in 2020

FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON

    Omar Abdullah Batarfi, Mohamed Y. Dahab and Muazzam A. Siddiqui, King Abdulaziz University, Jeddah, KSA

    ABSTRACT

    In English, abundant lexical resources accelerate and simplify sentiment analysis. In Arabic, there are few resources, and those that exist are not comprehensive. Most current research efforts to construct an Arabic Sentiment Lexicon (ASL) depend on a large number of lexical entities. However, all Arabic sentiment expressions can be covered using refined regular expressions rather than a large number of lexical entities. This paper presents an ASL that is more comprehensive than existing lexicons, covering many expressions in different dialects, including Franco-Arabic, while also being more compact. The paper also shows how to integrate different lexicons and refine them. To enrich the lexical entries with robust morphological and syntactic information, each entry has been augmented with regular expressions, a sentiment-polarity weight, and n-gram terms.
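    The idea of covering many surface forms with one regular expression per entry can be sketched in Python as follows. This is a minimal illustration, not the authors' lexicon: the entries, the Franco-Arabic spellings and the weights are hypothetical. Multi-word (n-gram) entries are matched and consumed before single words, so a negated phrase is not also counted as positive.

```python
import re

# Hypothetical n-gram entries: (compiled pattern, polarity weight).
# "مش كويس" is Egyptian Arabic for "not good".
NGRAMS = [
    (re.compile(r"\bمش\s+كويس(?:ة|ه)?\b"), -1.0),
]

# Hypothetical single-word entries. One pattern covers several surface forms:
# optional "و" (and) and "ال" (the) prefixes, feminine endings (ة or ه),
# and Franco-Arabic (Latin-script) spellings.
UNIGRAMS = [
    (re.compile(r"\b(?:و)?(?:ال)?جميل(?:ة|ه)?\b"), 1.0),
    (re.compile(r"\b(?:gameel|gamil|jameel|jamil)a?\b", re.IGNORECASE), 1.0),
    (re.compile(r"\b(?:ال)?كويس(?:ة|ه)?\b"), 0.5),
    (re.compile(r"\b(?:ال)?سيء(?:ة)?\b"), -1.0),
]


def polarity(text: str) -> float:
    """Sum the weights of all lexicon matches in `text`."""
    total = 0.0
    for pattern, weight in NGRAMS:
        total += weight * len(pattern.findall(text))
        text = pattern.sub(" ", text)  # consume n-gram matches first
    for pattern, weight in UNIGRAMS:
        total += weight * len(pattern.findall(text))
    return total


print(polarity("الاكل مش كويس"))  # -1.0: the n-gram wins over "كويس"
```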

    KEYWORDS

    Arabic Natural Language Processing, Arabic Sentiment Lexicon, Sentiment Analysis, Text Mining


    For More Details :
    http://aircconline.com/ijnlc/V8N6/8619ijnlc01.pdf



BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)

    Md. Kowsher¹, Imran Hossen² and Sk Shohorab Ahmed²
    ¹Noakhali Science and Technology University, Bangladesh, ²University of Rajshahi, Bangladesh

    ABSTRACT

    An information retrieval system is an effective process that helps users locate relevant information using Natural Language Processing (NLP). In this research paper, we present an algorithmic Bengali Information Retrieval System (BIRS) that is grounded mathematically and statistically. The paper demonstrates two algorithms for lemmatizing Bengali words, Trie and Dictionary Based Search by Removing Affix (DBSRA), and compares them with Edit Distance for exact lemmatization. We also present a Bengali anaphora resolution system based on Hobbs' algorithm to obtain the correct expression of information. For question answering, TF-IDF and cosine similarity are used to find the accurate answer in the documents. In this study, we introduce a Bengali Language Toolkit (BLTK) and Bengali Language Expression (BRE), which simplify the implementation of our task. We also developed corpora of Bengali root words, synonyms and stop words, and gathered 672 articles from the popular Bengali newspaper 'The Daily Prothom Alo', which form the system's input information. To test the system, we created 19,335 questions from this information and obtained 97.22% accurate answers.
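    A minimal sketch of the TF-IDF and cosine-similarity step, assuming whitespace tokenization and English sample text for readability. The function names and the smoothed IDF variant are choices made here, not taken from the paper. The sketch returns the index of the document that best matches a question.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    # Plain whitespace tokenization; a Bengali pipeline would also remove
    # stop words and lemmatize, as the abstract describes.
    return text.lower().split()


def build_idf(docs: List[List[str]]) -> Vector:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (log(N/df) + 1) so terms found in every document keep some weight.
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tf_idf(tokens: List[str], idf: Vector) -> Vector:
    counts = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items()} if total else {}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_document(question: str, documents: List[str]) -> int:
    """Return the index of the document most similar to the question."""
    docs = [tokenize(d) for d in documents]
    idf = build_idf(docs)
    q_vec = tf_idf(tokenize(question), idf)
    scores = [cosine(q_vec, tf_idf(doc, idf)) for doc in docs]
    return max(range(len(documents)), key=scores.__getitem__)


docs = [
    "the river flows through the city",
    "the cricket team won the match",
    "rice is the main crop",
]
print(best_document("who won the cricket match", docs))  # 1
```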

    KEYWORDS

    Bangla language Processing, Information retrieval, Corpus, Mathematics, and Statistics


    For More Details :
    http://aircconline.com/ijnlc/V8N5/8519ijnlc01.pdf



PRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGE

    Martin J Wheatman, Yagadi Ltd, United Kingdom

    ABSTRACT

    A value-based approach to Natural Language Understanding, in particular to the disambiguation of pronouns, is illustrated with a solution to a typical example from the Winograd Schema Challenge. The worked example uses a language engine, Enguage, to articulate the concepts of advocating and fearing violence. The example illustrates the indexical nature of pronouns and how their values, their referent objects, change because they are set by contextual data. Note that Enguage is not a suitable candidate for addressing the Winograd Schema Challenge, because it is an interactive tool, whereas the Challenge requires a preconfigured, unattended program.
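    The point that a pronoun's value is set by contextual data can be illustrated with a toy lookup. This is not Enguage's interface or the paper's method: the `context` dictionary is hypothetical and stands in for knowledge that Enguage would acquire interactively. The sentence is the classic Winograd example about city councilmen and demonstrators.

```python
from typing import Dict, List, Optional


def resolve_pronoun(candidates: List[str], predicate: str,
                    context: Dict[str, str]) -> Optional[str]:
    """Return the candidate that the context associates with `predicate`."""
    referent = context.get(predicate)
    return referent if referent in candidates else None


# Hypothetical contextual data; in Enguage this would be articulated by the
# user during the conversation rather than hard-coded.
context = {
    "feared violence": "councilmen",
    "advocated violence": "demonstrators",
}
candidates = ["councilmen", "demonstrators"]

# "The city councilmen refused the demonstrators a permit because they ... violence."
print(resolve_pronoun(candidates, "feared violence", context))     # councilmen
print(resolve_pronoun(candidates, "advocated violence", context))  # demonstrators
```

    Swapping the two entries in `context` swaps the referents, which is the indexical behavior the abstract describes.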

    KEYWORDS

    Natural Language Understanding, Winograd Schema Challenge, Enguage, Interactive Computation, Peircean Semiotics


    For More Details :
    http://aircconline.com/ijnlc/V8N5/8519ijnlc02.pdf



AUTO CORRECTION OF SETSWANA REAL-WORD ERRORS

    Gabofetswe Malema, Boago Okgetheng, Moffat Motlhanka and Goaletsa Rammidi, University of Botswana, Botswana

    ABSTRACT

    Spell checkers are used to detect and, where possible, correct spelling errors. Errors are classified as non-word errors and real-word errors. Detecting and correcting real-word errors requires considering the context of the sentence. The Setswana language has several commonly used words that are often misspelled by either separating or merging them, and these misspellings result in real-word errors. In this paper we propose contextual rules that look at neighboring words to determine whether the correct word is written as two separate words or merged as one word. For some words the rules require the part-of-speech category of the neighboring words, whereas others depend on specific neighboring words or on position in the sentence. The implemented rules are highly consistent, with an 88% success rate. Because our tool only looks at neighboring words, it does not consider the context of the whole sentence; hence, our rules fail for words that require whole-sentence context to disambiguate correctly. This module can be incorporated into a spell checker to detect and correct real-word errors for some words, that is, to help users determine the correct orthography of certain words.
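    The shape of such a neighbor-word rule can be sketched as below. To avoid inventing Setswana rules, the sketch uses an English analogue ("everyday" versus "every day"), where the merged form is correct only before a noun. The `MergeRule` class, the tag names and the (word, tag) input format are assumptions. Like the rules described above, it sees only the next word, so it cannot handle cases that need whole-sentence context.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]   # (word, part-of-speech tag)
END: Token = ("", "END")  # sentinel meaning "no following word"


@dataclass
class MergeRule:
    merged: str                          # one-word spelling, e.g. "everyday"
    parts: Tuple[str, str]               # two-word spelling, e.g. ("every", "day")
    use_merged: Callable[[Token], bool]  # decides from the word that follows


# English analogue: "everyday" is an adjective before a noun ("an everyday task");
# otherwise the correct spelling is "every day".
RULES = [MergeRule("everyday", ("every", "day"), lambda nxt: nxt[1] == "NOUN")]


def correct(tokens: List[Token]) -> List[str]:
    out: List[str] = []
    i = 0
    while i < len(tokens):
        word = tokens[i][0].lower()
        following: Optional[str] = tokens[i + 1][0].lower() if i + 1 < len(tokens) else None
        for rule in RULES:
            if word == rule.merged:
                span = 1          # merged spelling found
            elif (word, following) == rule.parts:
                span = 2          # split spelling found
            else:
                continue
            nxt = tokens[i + span] if i + span < len(tokens) else END
            out.extend([rule.merged] if rule.use_merged(nxt) else list(rule.parts))
            i += span
            break
        else:  # no rule applied: copy the word unchanged
            out.append(tokens[i][0])
            i += 1
    return out


print(correct([("an", "DET"), ("every", "DET"), ("day", "NOUN"), ("task", "NOUN")]))
# ['an', 'everyday', 'task']
```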

    KEYWORDS

    Spell checker, real-word errors, dictionary.


    For More Details :
    http://aircconline.com/ijnlc/V8N5/8519ijnlc05.pdf



HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH

    Namrata G Kharate¹ and Dr. Varsha H. Patil²
    ¹VIIT, Pune, Maharashtra, ²Nashik, Maharashtra, India

    ABSTRACT

    Researchers have been working on machine translation for a long time. However, a flawless machine translator is still a dream, and only a small number of researchers have focused on translating Marathi text into English. Perfect machine translation systems have not yet been built, because languages differ syntactically as well as morphologically. Most researchers have opted for statistical machine translation, whereas this paper addresses the challenges of rule-based machine translation. The paper describes the major divergences observed between Marathi and English, the many challenges encountered while building a rule-based machine translation system from Marathi to English, and the rules used to handle these challenges. Because the rules have exceptions and maintaining a knowledge base is feasible only up to a point, practical machine translation from Marathi to English remains a complex task.
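    Two typical Marathi-English divergences, verb-final word order and postpositions, can be handled by reordering rules like the sketch below. It assumes the input has already been glossed word by word into English and tagged with hypothetical role tags (`SUBJ`, `OBJ`, `NOUN`, `VERB`, `POSTP`); it is an illustration, not the paper's rule set.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (English gloss, role tag)


def postpositions_to_prepositions(tokens: List[Token]) -> List[Token]:
    """Move each postposition in front of the word it follows."""
    out: List[Token] = []
    for word, tag in tokens:
        if tag == "POSTP" and out:
            out.insert(len(out) - 1, (word, "PREP"))
        else:
            out.append((word, tag))
    return out


def sov_to_svo(tokens: List[Token]) -> List[Token]:
    """Reorder subject-object-verb into subject-verb-object."""
    subject = [t for t in tokens if t[1] == "SUBJ"]
    verb = [t for t in tokens if t[1] == "VERB"]
    rest = [t for t in tokens if t[1] not in ("SUBJ", "VERB")]
    return subject + verb + rest


def translate(tokens: List[Token]) -> str:
    reordered = sov_to_svo(postpositions_to_prepositions(tokens))
    return " ".join(word for word, _ in reordered)


# Glossed Marathi order: "Ram school to goes"
print(translate([("Ram", "SUBJ"), ("school", "NOUN"), ("to", "POSTP"), ("goes", "VERB")]))
# Ram goes to school
```

    Other divergences, such as inserting English articles ("eats a mango"), would need further rules, which is where the exceptions and knowledge-base costs mentioned above come in.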

    KEYWORDS

    NLP; Machine Translation; English; Marathi; grammar


    For More Details :
    http://aircconline.com/ijnlc/V8N4/8419ijnlc04.pdf



SENTIMENT ANALYSIS ON PRODUCT FEATURES BASED ON LEXICON APPROACH USING NATURAL LANGUAGE PROCESSING

    Ameya Yerpude, Akshay Phirke, Ayush Agrawal and Atharva Deshmukh, RCOEM, Nagpur, India

    ABSTRACT

    Sentiment analysis has played an important role in identifying what other people think and how they behave. Text can be analyzed for sentiment and classified as positive, negative or neutral. Applying sentiment analysis to product reviews on e-markets helps not only customers but also industry in making decisions. This paper discusses a method that provides sentiment analysis of an individual product's features. It presents the use of Natural Language Processing and SentiWordNet, in Python, in two applications: (1) sentiment analysis of product reviews [Domain: Electronic] and (2) sentiment analysis of the product features present in the reviews [Sub Domain: Mobile Phones]. It uses a lexicon-based approach in which text is tokenized to calculate the sentiment of product reviews on an e-market. The first part of the paper presents a sentiment analyzer that classifies the sentiment in product reviews as positive, negative or neutral depending on polarity. The second part extends the first: customer reviews containing product features are segregated, and these separated reviews are then classified as positive, negative or neutral using sentiment analysis. Mobile phones are used as the product, with features such as screens, processors, etc. This provides a business solution that helps users and industries make effective product decisions.
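    A minimal sketch of the two stages in Python, assuming a toy lexicon whose scores only imitate SentiWordNet's positive and negative scores, and a hypothetical feature list. Each review is labeled by the sign of its summed scores; the second function first segregates reviews by the features they mention, then classifies them.

```python
import re
from typing import Dict, List, Tuple

# Toy lexicon: word -> (positive score, negative score).
# Placeholder numbers in the style of SentiWordNet, NOT real SentiWordNet values.
LEXICON: Dict[str, Tuple[float, float]] = {
    "great": (0.75, 0.0),
    "good": (0.625, 0.0),
    "slow": (0.0, 0.5),
    "bad": (0.0, 0.625),
}
FEATURES = ["screen", "processor", "battery", "camera"]  # hypothetical feature list


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def classify(text: str) -> str:
    """Label a review by the sign of its summed (positive - negative) scores."""
    score = 0.0
    for token in tokenize(text):
        pos, neg = LEXICON.get(token, (0.0, 0.0))
        score += pos - neg
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_by_feature(reviews: List[str]) -> Dict[str, List[str]]:
    """Segregate reviews by the features they mention, then classify each one."""
    result: Dict[str, List[str]] = {f: [] for f in FEATURES}
    for review in reviews:
        tokens = tokenize(review)
        for feature in FEATURES:
            # Prefix match so that "processors" also counts as "processor".
            if any(t.startswith(feature) for t in tokens):
                result[feature].append(classify(review))
    return result


print(classify_by_feature(["Great screen", "The processors are slow"]))
```

    In a real setup the scores could be read from SentiWordNet, for example through NLTK's `sentiwordnet` corpus reader (`senti_synsets()` with `pos_score()` and `neg_score()`), which needs the corpus data downloaded first.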

    KEYWORDS

    Sentiment Analysis, Natural Language Processing, SentiWordNet, lexicon based approach


    For More Details :
    http://aircconline.com/ijnlc/V8N3/8319ijnlc01.pdf



ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR TO ENGLISH LANGUAGE PAIR

    Yi Mon Shwe Sin and Khin Mar Soe, University of Computer Studies, Yangon, Myanmar

    ABSTRACT

    Neural machine translation is a new approach to machine translation that has shown effective results for high-resource languages. Recently, attention-based neural machine translation trained on large-scale parallel corpora has played an important role in achieving high translation performance. In this research, a parallel corpus for the Myanmar-English language pair is prepared, and attention-based neural machine translation models are introduced at the word-to-word, character-to-word, and syllable-to-word levels. We experiment with the proposed models on translating long sentences and on addressing morphological problems. To mitigate the low-resource problem, source-side monolingual data are also used. This work thus investigates how to improve Myanmar-to-English neural machine translation. The experimental results show that the syllable-to-word level neural machine translation model obtains an improvement over the baseline systems.
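    The attention step can be illustrated with dot-product scoring over encoder states, as in Luong-style attention. This is a generic sketch in plain Python, not necessarily the attention variant used in the paper; in a syllable-to-word model, each encoder state would correspond to one source syllable.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query: List[float],
                          encoder_states: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoder step."""
    # Score each source position by the dot product with the decoder query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


weights, context = dot_product_attention([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, context)  # uniform weights when the query matches nothing
```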

    KEYWORDS

    Attention-based NMT, Syllable to word level NMT, Low resource language, Myanmar language


    For More Details :
    http://aircconline.com/ijnlc/V8N2/8219ijnlc01.pdf



BOOTSTRAPPING METHOD FOR DEVELOPING PART-OF-SPEECH TAGGED CORPUS IN LOW RESOURCE LANGUAGES TAGSET - A FOCUS ON AN AFRICAN IGBO

    Onyenwe Ikechukwu E¹, Onyedinma Ebele G², Aniegwu Godwin E² and Ezeani Ignatius M³
    ¹Nnamdi Azikiwe University, Nigeria, ²Federal College of Education, Nigeria, ³University of Sheffield, United Kingdom

    ABSTRACT

    Most languages, especially in Africa, have few or no established part-of-speech (POS) tagged corpora. However, a POS-tagged corpus is essential for natural language processing (NLP) to support advanced research such as machine translation and speech recognition. Even where no POS-tagged corpus exists, parallel texts are available online for some languages. POS tagging a new language corpus with a new tagset usually faces a bootstrapping problem in the initial stages of annotation: without automatic taggers to help the human annotator, quickly producing adequate amounts of POS-tagged corpus for advanced NLP research and for training taggers appears infeasible. In this paper, we demonstrate the efficacy of a POS annotation method that uses two automatic approaches, cross-lingual and monolingual POS tag projection, to assist POS-tagged corpus creation for a language new to NLP. We used cross-lingual projection to automatically create an initial 'errorful' tagged corpus for a target language via word alignment, using resources derived from a source language rich in NLP resources. A monolingual method is then applied, via an alignment process, to clean the induced noise and to transform the source language tags into target language tags. We used English and Igbo as our case study, which is possible because parallel texts exist between English and Igbo and the source language, English, has NLP resources available. The experimental results show a steady improvement in accuracy and in the rate of tag transformation, with scores ranging from 6.13% to 83.79% and from 8.67% to 98.37%, respectively. The rate of tag transformation measures the rate at which source language tags are translated into target language tags.
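    The projection and tag-transformation steps can be sketched as follows. The alignment format, the first-link-wins policy for conflicting links, and the `TAG_MAP` from English tags to a target tagset are all assumptions for illustration; the rate function reads the abstract's definition as the share of projected tags that were successfully mapped into the target tagset. The noise-cleaning step is not shown.

```python
from typing import Dict, List, Optional, Tuple

Alignment = List[Tuple[int, int]]  # (source index, target index) pairs


def project_tags(src_tags: List[str], alignment: Alignment,
                 tgt_len: int) -> List[Optional[str]]:
    """Copy each source tag to its aligned target word (first link wins)."""
    projected: List[Optional[str]] = [None] * tgt_len  # None = unaligned word
    for s, t in alignment:
        if projected[t] is None:
            projected[t] = src_tags[s]
    return projected


# Hypothetical mapping from source (English) tags to a target tagset.
TAG_MAP: Dict[str, str] = {"DT": "DET", "NN": "N", "NNS": "N", "VBD": "V", "JJ": "ADJ"}


def transform(tags: List[Optional[str]]) -> List[Optional[str]]:
    """Map projected source tags to target tags; unknown tags become None."""
    return [TAG_MAP.get(t) if t is not None else None for t in tags]


def transformation_rate(projected: List[Optional[str]],
                        transformed: List[Optional[str]]) -> float:
    """Share of projected tags that were mapped into the target tagset."""
    n_projected = sum(t is not None for t in projected)
    n_mapped = sum(t is not None for t in transformed)
    return n_mapped / n_projected if n_projected else 0.0


projected = project_tags(["DT", "NN", "VBD"], [(1, 0), (2, 1)], 3)
print(projected, transform(projected))  # ['NN', 'VBD', None] ['N', 'V', None]
```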

    KEYWORDS

    Languages, Africa, Part-of-Speech, Corpus, Natural Language Processing, Tagset, Igbo, Bootstrapping


    For More Details :
    http://aircconline.com/ijnlc/V8N1/8119ijnlc02.pdf



ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LANGUAGE TOOLS

    Suriyah M, Aarthy Anandan, Anitha Narasimhan and Madhan Karky, Karky Research Foundation, India

    ABSTRACT

    With the advent of social media, the amount of text available for processing across different natural languages has become enormous. In the past few decades, there has been a tremendous increase in the number of language processing applications. Tools for natural language computing differ greatly from language to language because each language has its own set of grammatical rules. This paper focuses on identifying the basic inflectional principles of the Tamil language at the word level. Three levels of word inflection concepts are considered: Patterns, Rules and Exceptions. The paper focuses on how the grammatical principles for word inflection in Tamil can be grouped into these three levels and applied to obtain different word forms. These principles can be used in a wide variety of natural language applications, such as morphological analysis, morphological generation, word-level translation, spelling and grammar checking, and information extraction. Tools using these rules will allow faster operation and better implementation, in NLP applications, of the Tamil grammatical rules drawn from [தொல்காப்பியம் | tholgaappiyam] and [நன்னூல் | nannool].
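    The precedence between the three levels can be sketched as: exceptions first, then specific rules, then the default pattern. To avoid inventing Tamil rules, the sketch uses English plural formation as a stand-in; the data structures and names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Exceptions: irregular forms listed explicitly (checked first).
EXCEPTIONS: Dict[str, str] = {"child": "children", "mouse": "mice"}

# Rules: (condition regex, number of final characters to strip, suffix to add).
RULES: List[Tuple[str, int, str]] = [
    (r"[^aeiou]y$", 1, "ies"),       # city -> cities
    (r"(?:s|x|z|ch|sh)$", 0, "es"),  # box -> boxes
]

# Pattern: the default inflection when nothing more specific applies.
DEFAULT_SUFFIX = "s"


def pluralize(word: str) -> str:
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for condition, strip, suffix in RULES:
        if re.search(condition, word):
            return word[: len(word) - strip] + suffix
    return word + DEFAULT_SUFFIX


print([pluralize(w) for w in ["child", "city", "day", "box", "cat"]])
# ['children', 'cities', 'days', 'boxes', 'cats']
```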

    KEYWORDS

    Natural language processing, Rule based approach, word level rules, Tamil tool, language tools


    For More Details :
    http://aircconline.com/ijnlc/V8N1/8119ijnlc03.pdf



ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD ALIGNMENT

    Nway Nway Han and Aye Thida, University of Computer Studies, Mandalay, Myanmar

    ABSTRACT

    A reference corpus for word alignment is an important resource for developing and evaluating word alignment methods. For the Myanmar-English language pair, there is no reference corpus for evaluating word alignment tasks. We therefore created guidelines for Myanmar-English word alignment annotation based on contrastive learning between the two languages, and built a Myanmar-English reference corpus of verified alignments from the Myanmar ALT of the Asian Language Treebank (ALT). The reference corpus labels word alignments with confidence levels, sure (S) and possible (P), which are used to evaluate word alignment tasks. We discuss the most common linking ambiguities in order to define consistent and systematic instructions for manual word alignment. We evaluated inter-annotator agreement on our reference corpus in terms of alignment error rate (AER) and discuss word relationships in terms of BLEU scores.
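    AER has a standard definition (Och and Ney) in terms of the predicted links A, sure links S and possible links P: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where S is a subset of P. A minimal Python sketch, assuming links are represented as (source index, target index) pairs:

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (Myanmar word index, English word index)


def alignment_error_rate(predicted: Set[Link], sure: Set[Link],
                         possible: Set[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    possible = possible | sure  # by definition every sure link is also possible
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        return 0.0
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / denominator


print(alignment_error_rate({(0, 0), (2, 2)}, {(0, 0), (1, 1)}, set()))  # 0.5
```

    For agreement, one annotator's links can be scored as A against the reference built from the other annotations, though the paper's exact protocol may differ.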

    KEYWORDS

    Annotation Guidelines, Alignment, Agreement, Reference Corpus, Treebank


    For More Details :
    http://aircconline.com/ijnlc/V8N4/8419ijnlc03.pdf







Reach Us

secretary@cseij.org


cseijsecretary@yahoo.com
