FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
Omar Abdullah Batarfi, Mohamed Y. Dahab and Muazzam A. Siddiqui, King Abdulaziz University, Jeddah, KSA
The wide availability of lexical resources accelerates and simplifies sentiment analysis in English. In Arabic, there are few resources, and these resources are not comprehensive. Most current research efforts to construct an Arabic Sentiment Lexicon (ASL) depend on a large number of lexical entities. However, all Arabic sentiment expressions can be covered using refined regular expressions rather than a large number of lexical entities. This paper presents an ASL that is more comprehensive than the existing lexicons, covering many expressions in different dialects including Franco-Arabic, and at the same time more compact. The paper also shows how to integrate different lexicons and refine them. To enrich lexical entries with robust morphological and syntactic information, regular expressions, sentiment polarity weights and n-gram terms have been added to each entry.
Arabic Natural Language Processing, Arabic Sentiment Lexicon, Sentiment Analysis, Text Mining
For More Details :
http://aircconline.com/ijnlc/V8N6/8619ijnlc01.pdf
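The idea in the abstract above, covering many surface forms with one regular expression that carries a polarity weight, can be sketched as follows. This is a minimal illustration and not the paper's lexicon: the patterns, the weights and the `score` function are assumptions chosen for the example.

```python
import re

# Hypothetical lexicon entries: (compiled pattern, polarity weight in [-1, 1]).
# Patterns and weights are illustrative assumptions, not taken from the paper.
LEXICON = [
    # Franco-Arabic "gamil" (beautiful), tolerating letter elongation.
    (re.compile(r"\bgam+i+l+a?\b", re.IGNORECASE), 0.8),
    # Franco-Arabic "we7esh" (bad), with optional vowels.
    (re.compile(r"\bwe?7e?sh+a?\b", re.IGNORECASE), -0.7),
    # Arabic-script "جميل" (beautiful), tolerating elongation of the long vowel.
    (re.compile(r"جمي+ل"), 0.8),
]


def score(text):
    """Sum the polarity weights of every lexicon pattern found in the text."""
    total = 0.0
    for pattern, weight in LEXICON:
        total += weight * len(pattern.findall(text))
    return total
```

A single pattern here stands in for what would otherwise be several separate lexical entries (for example "gamil", "gamiil", "gamila"), which is how a regex-based lexicon can stay compact.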
BENGALI INFORMATION RETRIEVAL SYSTEM (BIRS)
Md. Kowsher1, Imran Hossen2 and Sk Shohorab Ahmed2
1Noakhali Science and Technology University, Bangladesh
2University of Rajshahi, Bangladesh
An Information Retrieval System helps a user find relevant information by means of Natural Language Processing (NLP). In this paper, we present an algorithmic Bengali Information Retrieval System (BIRS) that is mathematically and statistically significant. We demonstrate two algorithms for lemmatizing Bengali words, Trie and Dictionary Based Search by Removing Affix (DBSRA), and compare them with Edit Distance for exact lemmatization. We present a Bengali anaphora resolution system that uses Hobbs’ algorithm to obtain the correct referring expression. For question answering, TF-IDF and Cosine Similarity are used to find the accurate answer in the documents. We also introduce a Bengali Language Toolkit (BLTK) and Bengali Language Expression (BRE), which simplify the implementation of our task. We have also developed a Bengali root word corpus, a synonym corpus and a stop word corpus, and gathered 672 articles from the popular Bengali newspaper ‘The Daily Prothom Alo’ as our input information. To test the system, we created 19335 questions from this information and obtained 97.22% accurate answers.
Bangla language Processing, Information retrieval, Corpus, Mathematics, and Statistics
For More Details :
http://aircconline.com/ijnlc/V8N5/8519ijnlc01.pdf
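The retrieval step named in the BIRS abstract, TF-IDF weighting followed by cosine similarity, can be sketched in plain Python. This is a generic illustration, not the BIRS or BLTK implementation: the whitespace tokenizer, the smoothed IDF formula and the function names are assumptions, and the toy documents are in English for readability.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumed variant; other formulas are equally valid).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * idf[t] for t in tf})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(question, docs):
    """Return the index of the document most similar to the question."""
    vectors, idf = tfidf_vectors(docs)
    q_tokens = question.lower().split()
    # Terms unseen in the documents carry no weight.
    q_tf = Counter(t for t in q_tokens if t in idf)
    q_vec = {t: (c / len(q_tokens)) * idf[t] for t, c in q_tf.items()}
    scores = [cosine(q_vec, v) for v in vectors]
    return max(range(len(docs)), key=scores.__getitem__)
```

In a pipeline like the one the abstract describes, lemmatization and stop word removal would run before this step, so that inflected forms of the same word map to one term.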
PRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGE
Martin J Wheatman, Yagadi Ltd, United Kingdom
A value-based approach to Natural Language Understanding, in particular, the disambiguation of pronouns, is illustrated with a solution to a typical example from the Winograd Schema Challenge. The worked example uses a language engine, Enguage, to support the articulation of the advocation and fearing of violence. The example illustrates the indexical nature of pronouns, and how their values, their referent objects, change because they are set by contextual data. It must be noted that Enguage is not a suitable candidate for addressing the Winograd Schema Challenge as it is an interactive tool, whereas the Challenge requires a preconfigured, unattended program.
Natural Language Understanding, Winograd Schema Challenge, Enguage, Interactive Computation, Peircean Semiotics
For More Details :
http://aircconline.com/ijnlc/V8N5/8519ijnlc02.pdf
AUTO CORRECTION OF SETSWANA REAL-WORD ERRORS
Gabofetswe Malema, Boago Okgetheng, Moffat Motlhanka and Goaletsa Rammidi, University of Botswana, Botswana
Spell checkers are used to detect and, where possible, correct spelling errors. Errors are classified as non-word errors and real-word errors. Detecting and correcting real-word errors requires considering the context of the sentence. The Setswana language has several commonly used words that are often misspelled by either separating or merging them. The misspelling results in real-word errors. In this paper we propose contextual rules that look at neighbor words to determine whether the correct word is written as two separate words or merged as one word. For some words the rules require that the part-of-speech category of neighbor words be determined, whereas others depend on specific neighbor words or on position in a sentence. The implemented rules are consistent, with an 88% success rate. Our tool only looks at neighbor words and therefore does not look at the context of the whole sentence. Hence, our rules fail for words that require the context of the whole sentence to disambiguate correctly. This module can be incorporated into a spell checker to detect and correct real-word errors for some words, that is, to help users determine the correct orthography of certain words.
Spell checker, real-word errors, dictionary.
For More Details :
http://aircconline.com/ijnlc/V8N5/8519ijnlc05.pdf
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
Namrata G Kharate1 and Dr. Varsha H. Patil2
1VIIT, Pune, Maharashtra
2Nashik, Maharashtra, India
Researchers have worked on machine translation for a long time. However, a flawless machine translator remains a dream, and only a small number of researchers have focused on translating Marathi text to English. Perfect machine translation systems have not yet been built because languages differ syntactically as well as morphologically. The majority of researchers have opted for statistical machine translation, whereas this paper addresses the challenges of rule-based machine translation. The paper describes the major divergences observed between Marathi and English, the challenges encountered while building a machine translation system from Marathi to English using a rule-based approach, and the rules that handle these challenges. Because there are exceptions to the rules and limits to the feasibility of maintaining a knowledge base, practical machine translation from Marathi to English is a complex task.
NLP; Machine Translation; English; Marathi; grammar
For More Details :
http://aircconline.com/ijnlc/V8N4/8419ijnlc04.pdf
SENTIMENT ANALYSIS ON PRODUCT FEATURES BASED ON LEXICON APPROACH USING NATURAL LANGUAGE PROCESSING
Ameya Yerpude, Akshay Phirke, Ayush Agrawal and Atharva Deshmukh, RCOEM, Nagpur, India
Sentiment analysis plays an important role in identifying what other people think and how they behave. Text can be analyzed for sentiment and classified as positive, negative or neutral. Applying sentiment analysis to product reviews on an e-market helps not only customers but also industry in making decisions. This paper discusses a method that provides sentiment analysis of an individual product’s features. It presents the use of Natural Language Processing and SentiWordNet, in Python, for two tasks: 1. sentiment analysis of product reviews [Domain: Electronic]; 2. sentiment analysis of the product features mentioned in the product reviews [Sub Domain: Mobile Phones]. It uses a lexicon based approach in which the text is tokenized to calculate the sentiment of product reviews on an e-market. The first part of the paper presents a sentiment analyzer that classifies the sentiment in product reviews as positive, negative or neutral depending on the polarity. The second part extends the first: customer reviews that mention product features are segregated, and these separated reviews are then classified as positive, negative or neutral using sentiment analysis. Here, mobile phones are used as the product, with features such as screen, processor, etc. This gives a business solution for users and industries for effective product decisions.
Sentiment Analysis, Natural Language Processing, SentiWordNet, lexicon based approach
For More Details :
http://aircconline.com/ijnlc/V8N3/8319ijnlc01.pdf
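The two-part pipeline in the abstract above (classify a review by lexicon polarity, then classify only the sentences that mention a product feature) can be sketched as follows. This is a simplified illustration under stated assumptions: the small `POLARITY` table stands in for SentiWordNet scores with made-up values, and `FEATURES`, `tokenize`, `classify` and `feature_sentiment` are hypothetical names, not the authors' code.

```python
import re

# Toy polarity scores standing in for SentiWordNet (hypothetical values).
POLARITY = {
    "great": 0.75, "good": 0.5, "sharp": 0.4,
    "slow": -0.5, "poor": -0.6, "bad": -0.6,
}

# Assumed feature vocabulary for the mobile phone sub-domain.
FEATURES = {"screen", "battery", "processor", "camera"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def classify(tokens):
    """Label a token list by the sign of its summed polarity."""
    total = sum(POLARITY.get(t, 0.0) for t in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def feature_sentiment(review):
    """Map each feature mentioned in the review to the sentiment of its sentence."""
    result = {}
    for sentence in re.split(r"[.!?]+", review):
        tokens = tokenize(sentence)
        for feature in FEATURES.intersection(tokens):
            result[feature] = classify(tokens)
    return result
```

Scoring at sentence level is a deliberate simplification: a sentence that mentions two features with opposite opinions would assign both the same label, which a finer-grained window around each feature word would avoid.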
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR TO ENGLISH LANGUAGE PAIR
Yi Mon Shwe Sin1 and Khin Mar Soe2, University of Computer Studies, Yangon, Myanmar
Neural machine translation is a new approach to machine translation that has shown effective results for high-resource languages. Recently, attention-based neural machine translation with large-scale parallel corpora has played an important role in achieving high translation performance. In this research, a parallel corpus for the Myanmar-English language pair is prepared, and attention-based neural machine translation models are introduced at word to word level, character to word level, and syllable to word level. We experiment with the proposed models to translate long sentences and to address morphological problems. To reduce the low-resource problem, source-side monolingual data are also used. This work thus investigates how to improve a Myanmar to English neural machine translation system. The experimental results show that the syllable to word level neural machine translation model obtains an improvement over the baseline systems.
Attention-based NMT, Syllable to word level NMT, Low resource language, Myanmar language
For More Details :
http://aircconline.com/ijnlc/V8N2/8219ijnlc01.pdf
BOOTSTRAPPING METHOD FOR DEVELOPING PART-OF-SPEECH TAGGED CORPUS IN LOW RESOURCE LANGUAGES TAGSET- A FOCUS ON AN AFRICAN IGBO
Onyenwe Ikechukwu E1, Onyedinma Ebele G2, Aniegwu Godwin E2 and Ezeani Ignatius M3
1Nnamdi Azikiwe University, Nigeria
2Federal College of Education, Nigeria
3University of Sheffield, United Kingdom
Most languages, especially in Africa, have few or no established part-of-speech (POS) tagged corpora. However, a POS tagged corpus is essential for natural language processing (NLP) to support advanced research such as machine translation, speech recognition, etc. Even in cases where there is no POS tagged corpus, there are some languages for which parallel texts are available online. The task of POS tagging a new language corpus with a new tagset usually faces a bootstrapping problem at the initial stages of the annotation process. Without automatic taggers to help the human annotator, it appears infeasible to quickly produce adequate amounts of POS tagged corpus for advanced NLP research and for training the taggers. In this paper, we demonstrate the efficacy of a POS annotation method that employs two automatic approaches to assist POS tagged corpus creation for a language that is new to NLP. The two approaches are cross-lingual and monolingual POS tag projection. We used the cross-lingual approach to automatically create an initial ‘errorful’ tagged corpus for a target language via word alignment. The resources for creating this are derived from a source language rich in NLP resources. A monolingual method is applied to clean the noise induced by the alignment process and to transform the source language tags to the target language tags. We used English and Igbo as our case study. This is possible because parallel texts exist between English and Igbo, and the source language English has available NLP resources. The results of the experiment show a steady improvement in accuracy and rate of tag transformation, with score ranges of 6.13% to 83.79% and 8.67% to 98.37% respectively. The rate of tag transformation evaluates the rate at which source language tags are translated to target language tags.
Languages, Africa, Part-of-Speech, Corpus, Natural Language Processing, Tagset, Igbo, Bootstrapping
For More Details :
http://aircconline.com/ijnlc/V8N1/8119ijnlc02.pdf
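The cross-lingual step described in the abstract above, projecting source-language tags onto target tokens through word alignments and then transforming them to the target tagset, can be sketched as follows. This is a minimal sketch, not the authors' method: the function name `project_tags`, the alignment format and the tag mapping are assumptions, and the mapping shown in the usage note uses generic tags rather than the Igbo tagset.

```python
def project_tags(source_tags, target_length, alignment, tag_map):
    """Project POS tags from source tokens to target tokens.

    source_tags   -- list of tags, one per source token
    target_length -- number of tokens in the target sentence
    alignment     -- iterable of (source_index, target_index) pairs
    tag_map       -- dict from source tagset to target tagset (assumed given)

    Unaligned target tokens get None; source tags missing from tag_map are
    kept unchanged so that a later cleaning step can find them.
    """
    projected = [None] * target_length
    for src_idx, tgt_idx in alignment:
        source_tag = source_tags[src_idx]
        projected[tgt_idx] = tag_map.get(source_tag, source_tag)
    return projected


def transformation_rate(projected, target_tagset):
    """Share of tagged tokens whose tag already belongs to the target tagset."""
    tagged = [t for t in projected if t is not None]
    if not tagged:
        return 0.0
    return sum(t in target_tagset for t in tagged) / len(tagged)
```

For example, with source tags `["DT", "NN", "VBZ"]`, a two-token target sentence, alignment `[(1, 0), (2, 1)]` and the mapping `{"NN": "NOUN", "VBZ": "VERB"}`, the projection yields `["NOUN", "VERB"]`. The second function is one plausible reading of the "rate of tags transformation" that the abstract reports; the paper's exact definition may differ.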
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LANGUAGE TOOLS
Suriyah M, Aarthy Anandan, Anitha Narasimhan and Madhan Karky, Karky Research Foundation, India
With the advent of social media, the amount of text available for processing across different natural languages has become enormous. In the past few decades, there has been a tremendous increase in the number of language processing applications. The tools for natural language computing differ widely across languages because each language has its own set of grammatical rules. This paper focuses on identifying the basic inflectional principles of the Tamil language at word level. Three levels of word inflection concepts are considered: Patterns, Rules and Exceptions. The focus of this paper is how grammatical principles for word inflection in Tamil can be grouped into these three levels and applied to obtain different word forms. These can be used in a wide variety of natural language applications such as morphological analysis, morphological generation, word level translation, spelling and grammar checking, information extraction, etc. Tools using these rules will operate faster and better implement, in NLP applications, the Tamil grammatical rules referred from [தொல்காப்பியம் | tholgaappiyam] and [நன்னூல் | nannool].
Natural language processing, Rule based approach, word level rules, Tamil tool, language tools
For More Details :
http://aircconline.com/ijnlc/V8N1/8119ijnlc03.pdf
ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD ALIGNMENT
Nway Nway Han and Aye Thida, AI Research Lab, University of Computer Studies, Mandalay, Myanmar
A reference corpus for word alignment is an important resource for developing and evaluating word alignment methods. For the Myanmar-English language pair, there is no reference corpus for evaluating word alignment tasks. Therefore, we created guidelines for Myanmar-English word alignment annotation between the two languages based on contrastive learning, and built a Myanmar-English reference corpus consisting of verified alignments from the Myanmar portion of the Asian Language Treebank (ALT). This reference corpus contains the confidence labels sure (S) and possible (P) for word alignments, which are used to evaluate word alignment tasks. We discuss the most common linking ambiguities in order to define consistent and systematic instructions for aligning words manually. We evaluated annotator agreement using our reference corpus in terms of alignment error rate (AER) in word alignment tasks, and we discuss the word relationships in terms of BLEU scores.
Annotation Guidelines, Alignment, Agreement, Reference Corpus, Treebank
For More Details :
http://aircconline.com/ijnlc/V8N4/8419ijnlc03.pdf
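The alignment error rate (AER) mentioned in the abstract above is computed against sure (S) and possible (P) reference links. The sketch below uses the standard definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of predicted links and S is treated as a subset of P. The function name and the representation of links as (source_index, target_index) pairs are assumptions for illustration.

```python
def alignment_error_rate(predicted, sure, possible):
    """Compute AER for a set of predicted alignment links.

    Each link is a (source_index, target_index) pair. Sure links are
    also counted as possible links, following the usual convention.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s
    if not a and not s:
        # Nothing predicted and nothing required: no error.
        return 0.0
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
```

A lower AER is better: predicting exactly the sure links gives 0.0, while a prediction that shares no link with the reference gives 1.0.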