ASSAMESE-ENGLISH BILINGUAL MACHINE TRANSLATION
Kalyanee Kanchan Baruah1, Pranjal Das2, Abdul Hannan1 and Shikhar Kr Sarma1, 1Department of Information Technology, Gauhati University, Guwahati, Assam
Machine translation is the process of translating text from one language to another. In this paper, Statistical Machine Translation (SMT) is performed between Assamese and English using a parallel corpus of the two languages. The statistical phrase-based translation toolkit Moses is used, together with two other tools, IRSTLM and GIZA, which build the language model and align the words, respectively. The BLEU score is used to evaluate how well the translation system performs. The BLEU scores differ between Assamese-to-English and English-to-Assamese translation: because Indian languages are morphologically very rich, translation from English to Assamese is relatively harder, which results in a lower BLEU score. A statistical transliteration system is also integrated into the translation system, mainly to handle proper nouns and out-of-vocabulary (OOV) words that are not present in the corpus.
Assamese, Machine translation, Moses, Corpus, BLEU
For More Details :
https://airccse.org/journal/ijnlc/papers/3314ijnlc07.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol3.html
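The abstract above evaluates the Assamese-English system with BLEU but does not say which implementation was used. As a rough illustration of what the score measures, the following is a minimal sentence-level sketch with a single reference and no smoothing: clipped n-gram precisions for n = 1 to 4 are combined by a geometric mean and multiplied by a brevity penalty. Standard BLEU is computed at the corpus level, so this simplified function is an assumption for illustration, not the authors' evaluation script.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with one reference and no smoothing (illustrative sketch)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # one zero precision makes the geometric mean zero
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

For example, `bleu("the cat sat on the".split(), "the cat sat on the mat".split())` equals `math.exp(-0.2)` (about 0.819): every n-gram precision is 1, so only the brevity penalty lowers the score. This also hints at why a morphologically rich target side tends to score lower, since an inflection mismatch breaks every n-gram that contains the word.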
RESUME INFORMATION EXTRACTION WITH A NOVEL TEXT BLOCK SEGMENTATION ALGORITHM
Shicheng Zu and Xiulai Wang, Post-doctoral Scientific Research Station in East War District General Hospital, Nanjing, Jiangsu 210000, China
In recent years, we have witnessed the rapid development of deep neural networks and distributed representations in natural language processing. However, the application of neural networks to resume parsing lacks systematic investigation. In this study, we propose an end-to-end pipeline for resume parsing based on neural-network classifiers and distributed embeddings. The pipeline leverages the position-wise line information and the integrated meaning of each text block. Coordinated line classification by both a line type classifier and a line label classifier effectively segments a resume into predefined text blocks. The pipeline joins text block segmentation with the identification of resume facts, in which various sequence labelling classifiers perform named entity recognition within the labelled text blocks. A comparative evaluation of four sequence labelling classifiers confirmed the superiority of BLSTM-CNNs-CRF in the named entity recognition task. A further comparison with three publicly available resume parsers also demonstrated the effectiveness of our text block classification method.
Resume Parsing, Word Embeddings, Named Entity Recognition, Text Classifier, Neural Networks
For More Details :
https://aircconline.com/ijnlc/V8N5/8519ijnlc03.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol8.html
GRAMMAR CHECKERS FOR NATURAL LANGUAGES: A REVIEW
Nivedita S. Bhirud1, R.P. Bhavsar2, B.V. Pawar3, 1Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India, 2,3School of Computer Sciences, North Maharashtra University, Jalgaon, India
Natural Language Processing is an interdisciplinary branch of linguistics and computer science, studied under Artificial Intelligence (AI), that gave birth to an allied area called ‘Computational Linguistics’, which focuses on processing natural languages on computational devices. A natural language consists of many sentences, which are meaningful linguistic units involving one or more words linked together in accordance with a set of predefined rules called ‘grammar’. Grammar checking is a fundamental task in the formal world that validates sentences syntactically as well as semantically, and the grammar checker is a prominent tool within language engineering. Our review draws on the development of natural language grammar checkers to date to look at their past, present and future. It covers common grammatical errors, an overview of the grammar checking process, and grammar checkers for various languages, examining their approaches, methodologies and performance evaluation, which should be of great help in developing new tools and systems. The survey concludes with a discussion of the features included in existing grammar checkers for foreign languages as well as a few Indian languages.
Natural Language Processing, Computational Linguistics, Writing errors, Grammatical mistakes, Grammar Checker
For More Details :
https://aircconline.com/ijnlc/V6N4/6417ijnlc01.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol6.html
Named Entity Recognition using Hidden Markov Model (HMM)
Sudha Morwal1, Nusrat Jahan2 and Deepti Chopra3, 1Associate Professor, Banasthali University, Jaipur, Rajasthan-302001, 2M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001, 3M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), a branch of artificial intelligence. It has many applications, notably in machine translation, text-to-speech synthesis, natural language understanding, information extraction, information retrieval and question answering. The aim of NER is to classify words into predefined categories such as location name, person name, organization name, date and time. In this paper we describe in detail a machine learning approach based on the Hidden Markov Model (HMM) for identifying named entities. The main motivation for using an HMM to build the NER system is that the approach is language-independent and can be applied to any language domain. In our NER system the states are not fixed: they are dynamic, so users can define them according to their needs. The corpus used by our NER system is also not domain-specific.
Named Entity Recognition (NER), Natural Language Processing (NLP), Hidden Markov Model (HMM).
For More Details :
https://airccse.org/journal/ijnlc/papers/1412ijnlc02.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol1.html
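The abstract above describes an HMM-based NER system whose states (the entity tags) are not fixed. A minimal Viterbi decoding sketch under that view is shown below. The tag set and the start, transition and emission probabilities are hypothetical toy values, and giving unseen words a small constant emission probability is an assumption, since the abstract does not describe the paper's smoothing. In a real system these probabilities would be estimated from a tagged corpus.

```python
import math

# Hypothetical toy model; a real system estimates these from a tagged corpus.
STATES = ["PER", "LOC", "O"]
START_P = {"PER": 0.3, "LOC": 0.2, "O": 0.5}
TRANS_P = {
    "PER": {"PER": 0.2, "LOC": 0.1, "O": 0.7},
    "LOC": {"PER": 0.1, "LOC": 0.2, "O": 0.7},
    "O": {"PER": 0.2, "LOC": 0.2, "O": 0.6},
}
EMIT_P = {
    "PER": {"Ram": 0.6, "Sita": 0.4},
    "LOC": {"Jaipur": 0.7, "Delhi": 0.3},
    "O": {"lives": 0.3, "in": 0.4, "went": 0.3},
}


def viterbi(words, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under an HMM."""
    def emit(state, word):
        return emit_p[state].get(word, unk)  # assumed constant for unseen words

    # best[t][s] = (log-prob of best path ending in state s at position t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit(s, words[0])), None) for s in states}]
    for t in range(1, len(words)):
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit(s, words[t])), prev)
        best.append(column)
    # Trace back from the highest-scoring final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return list(reversed(path))
```

With this toy model, `viterbi(["Ram", "lives", "in", "Jaipur"], STATES, START_P, TRANS_P, EMIT_P)` returns `["PER", "O", "O", "LOC"]`. Because the states are passed in as data, the tag set can be changed without touching the decoder, which matches the "dynamic states" idea in the abstract.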
SURVEY OF MACHINE TRANSLATION SYSTEMS IN INDIA
G V Garje1 and G K Kharate2, 1Department of Computer Engineering and Information Technology, PVG’s College of Engineering and Technology, Pune, India, 2Principal, Matoshri College of Engineering and Research Centre, Nashik, India
Work in the area of machine translation has been going on for the last few decades, but promising translation work began in the early 1990s owing to advances in research in Artificial Intelligence and Computational Linguistics. India is a multilingual and multicultural country with a population of over 1.25 billion and 22 constitutionally recognized languages, which are written in 12 different scripts. This necessitates automated machine translation systems from English to Indian languages and among Indian languages, so that people can exchange information in their local language. Many usable machine translation systems have been developed, and others are under development, in India and around the world. The paper focuses on the different approaches used in developing machine translation systems and also briefly describes some of these systems along with their features, domains and limitations.
Machine Translation, Example-based MT, Transfer-based MT, Interlingua-based MT
For More Details :
https://airccse.org/journal/ijnlc/papers/2513ijnlc04.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol2.html
MACHINE TRANSLATION DEVELOPMENT FOR INDIAN LANGUAGES AND ITS APPROACHES
Amruta Godase1 and Sharvari Govilkar2, 1Department of Information Technology (AI & Robotics), PIIT, Mumbai University, India, 2Department of Computer Engineering, PIIT, Mumbai University, India
This paper presents a survey of machine translation systems for Indian regional languages. Machine translation (henceforth referred to as MT) is one of the central areas of Natural Language Processing (NLP). MT is important for breaking the language barrier and facilitating interlingual communication. A multilingual country like India, the largest democracy in the world, has a strong need for automatic machine translation systems. With the advent of information technology, many documents and web pages are being published in local languages, so good MT systems are needed to establish proper communication between the state and union governments and to exchange information among the people of different states. This paper focuses on the different machine translation projects carried out in India, along with their features and domains.
Machine translation, computational linguistics, Indian Languages, Rule-based, Statistical, Empirical MT, Principle-based, Knowledge-based, Hybrid
For More Details :
https://airccse.org/journal/ijnlc/papers/4215ijnlc05.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol4.html
SURVEY ON MACHINE TRANSLITERATION AND MACHINE LEARNING MODELS
M L Dhore1, R M Dhore2, P H Rathod3, 1,3Vishwakarma Institute of Technology, Savitribai Phule Pune University, India, 2Pune Vidhyarthi Girha’s College of Engineering and Technology, SPPU, India
Globalization and the growth in the number of Internet users demand that almost all Internet-based applications support local languages. Such support can be provided by means of machine transliteration and machine translation. This paper provides a thorough survey of machine transliteration models and of the machine learning approaches used for machine transliteration over more than two decades, for internationally used languages as well as Indian languages. The survey shows that linguistic approaches give better results for closely related languages, while probability-based statistical approaches work well when one of the languages is phonetic and the other is non-phonetic. Better accuracy can be achieved only by using hybrid and combined models.
CRF, Grapheme, HMM, Machine Transliteration, Machine Learning, NCM, Phoneme, SVM
For More Details :
https://airccse.org/journal/ijnlc/papers/4215ijnlc02.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol4.html
Algorithm For Text To Graph Conversion And Summarizing Using NLP: A New Approach For Business Solutions
Prajakta Yerpude, Rashmi Jakhotiya and Manoj Chandak, Department of Computer Science and Engineering, RCOEM, Nagpur
Text can be analysed by splitting it and extracting keywords. These may be represented as summaries, tables, graphs and images. The need to handle the large amount of information available in textual form has led to research on extracting text and transforming it from an unstructured to a structured format. The paper presents the importance of Natural Language Processing (NLP) and two interesting applications of it in the Python language: 1. Automatic text summarization [Domain: Newspaper Articles] 2. Text to Graph Conversion [Domain: Stock news]. The main challenge in NLP is natural language understanding, i.e. deriving meaning from human or natural language input, which here is done using regular expressions, artificial intelligence and database concepts. The automatic summarization tool converts newspaper articles into a summary based on the frequency of words in the text. The text-to-graph converter takes a stock article as input, tokenizes it on index values (points and percent) and time, and then maps the tokens to a graph. This paper proposes a business solution that helps users manage their time effectively.
NLP, Automatic Summarizer, Text to Graph Converter, Data Visualization, Regular Expression, Artificial Intelligence
For More Details :
https://airccse.org/journal/ijnlc/papers/4415ijnlc03.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol4.html
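The summarization tool in the abstract above is described only as working "on the basis of frequency of words in the text". A minimal sketch of that idea follows, assuming a simple scheme: frequencies of non-stop-words are normalised by the most frequent word, each sentence is scored by the sum of its word weights, and the top sentences are returned in their original order. The stop-word list and the regular expressions for splitting sentences and words are simplifying assumptions, not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "was", "for", "it", "by"}


def summarize(text, num_sentences=2):
    """Frequency-based extractive summary: keep top-scoring sentences in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)
    if not freq:
        return ""
    top = max(freq.values())
    weights = {w: c / top for w, c in freq.items()}  # normalise weights to (0, 1]

    def score(sentence):
        # Sum the weights of the sentence's words; stop words contribute 0.
        return sum(weights.get(w, 0.0) for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

For instance, `summarize("The market rose today. Investors cheered the market rally. Rain fell in the city.", 1)` returns `"Investors cheered the market rally."`, because "market" is the most frequent content word. Summing weights favours longer sentences; dividing each score by the sentence's word count is a common alternative.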
SENTIMENT ANALYSIS ON PRODUCT FEATURES BASED ON LEXICON APPROACH USING NATURAL LANGUAGE PROCESSING
Ameya Yerpude, Akshay Phirke, Ayush Agrawal and Atharva Deshmukh, Department of Computer Science and Engineering, RCOEM, Nagpur, India
Sentiment analysis has played an important role in identifying what other people think and how they behave. Text can be analysed for sentiment and classified as positive, negative or neutral. Applying sentiment analysis to product reviews on e-markets helps not only customers but also industry in making decisions. This paper discusses a method that provides sentiment analysis of individual product features. It presents the use of Natural Language Processing and SentiWordNet in this interesting application in Python: 1. Sentiment analysis on product reviews [Domain: Electronic] 2. Sentiment analysis of the product features mentioned in product reviews [Sub Domain: Mobile Phones]. It uses a lexicon-based approach in which text is tokenized to calculate the sentiment of product reviews on an e-market. The first part of the paper presents a sentiment analyzer that classifies the sentiment in product reviews as positive, negative or neutral depending on polarity. The second part extends the first: customer reviews that mention product features are separated out, and these reviews are then classified as positive, negative or neutral using sentiment analysis. Here, mobile phones are used as the product, with features such as the screen and processor. This provides a business solution that helps users and industries make effective product decisions.
Sentiment Analysis, Natural Language Processing, SentiWordNet, lexicon based approach
For More Details :
https://aircconline.com/ijnlc/V8N3/8319ijnlc01.pdf
Volume Link :
https://airccse.org/journal/ijnlc/vol8.html
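The abstract above describes a lexicon-based classifier built on SentiWordNet, followed by a feature-level pass over reviews that mention features such as the screen or processor. The sketch below mimics that two-step flow under stated assumptions: a tiny hypothetical lexicon of (positive, negative) scores stands in for SentiWordNet, a hypothetical keyword set stands in for the product features, and a review's polarity is the sign of its summed positive-minus-negative scores. It does not reproduce the paper's tokenization or its SentiWordNet lookup.

```python
import re

# Hypothetical toy lexicon standing in for SentiWordNet: word -> (positive, negative).
LEXICON = {
    "good": (0.75, 0.0), "great": (0.75, 0.0), "bright": (0.5, 0.0), "fast": (0.5, 0.0),
    "bad": (0.0, 0.625), "poor": (0.0, 0.5), "slow": (0.0, 0.5), "dim": (0.0, 0.375),
}
FEATURES = {"screen", "processor", "battery"}  # hypothetical feature keywords


def polarity(review):
    """Step 1: sum (positive - negative) over lexicon words and map the total to a label."""
    tokens = re.findall(r"[a-z]+", review.lower())
    total = sum(LEXICON[t][0] - LEXICON[t][1] for t in tokens if t in LEXICON)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def feature_sentiment(reviews):
    """Step 2: group reviews by the features they mention and label each one."""
    result = {}
    for review in reviews:
        tokens = set(re.findall(r"[a-z]+", review.lower()))
        for feature in sorted(FEATURES & tokens):
            result.setdefault(feature, []).append(polarity(review))
    return result
```

For example, `feature_sentiment(["The screen is bright", "The processor is slow"])` returns `{"screen": ["positive"], "processor": ["negative"]}`. A review that mentions several features gives each of them the same whole-review label; scoring only the words near each feature mention would be a finer-grained alternative.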