Top 10 read research articles in the field of Natural Language Computing @ 2024

AN IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES

    Mohammed Al-Maolegi1 and Bassam Arkok2, Computer Science, Jordan University of Science and Technology, Irbid, Jordan

    ABSTRACT

    Several algorithms exist for mining association rules. One of the most popular is Apriori, which extracts frequent itemsets from a large database and derives association rules from them for knowledge discovery. This paper identifies a limitation of the original Apriori algorithm, namely the time it wastes scanning the whole database in search of frequent itemsets, and presents an improvement that reduces this wasted time by scanning only some of the transactions. Experiments with several groups of transactions and several values of minimum support, run on both the original Apriori and our implementation of the improved Apriori, show that the improved version reduces the time consumed by 67.38% compared with the original, making the algorithm more efficient.
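
    The idea of counting a candidate against only some transactions can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each item keeps the set of transaction IDs containing it, and that a candidate is counted only in the transactions of its rarest item. The function name `improved_apriori` and this particular selection rule are assumptions made for the example.

```python
from itertools import combinations


def improved_apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    `transactions` is a list of sets; `min_support` is an absolute count.
    """
    # One full pass: record which transaction IDs contain each single item.
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)

    frequent = {frozenset([i]): len(t) for i, t in tids.items()
                if len(t) >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        frequent = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset must itself be frequent.
            if any(frozenset(s) not in result for s in combinations(cand, k - 1)):
                continue
            # Assumed shortcut: scan only the transactions of the rarest item.
            rarest = min(cand, key=lambda i: len(tids[i]))
            count = sum(1 for tid in tids[rarest] if cand <= transactions[tid])
            if count >= min_support:
                frequent[cand] = count
        result.update(frequent)
        k += 1
    return result
```

    A candidate can only occur in transactions that contain every one of its items, so restricting the scan to the shortest transaction-ID list gives the same support count as a full scan while touching fewer transactions.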

    KEYWORDS

    Apriori, Improved Apriori, Frequent itemset, Support, Candidate itemset, Time consuming.


    For More Details :
    https://airccse.org/journal/ijnlc/papers/3114ijnlc03.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol3.html




NAMED ENTITY RECOGNITION USING HIDDEN MARKOV MODEL (HMM)

    Sudha Morwal1, Nusrat Jahan2 and Deepti Chopra3, 1Associate Professor, Banasthali University, Jaipur, Rajasthan-302001 2M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001 3M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001

    ABSTRACT

    Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), which is a branch of artificial intelligence. It has many applications, mainly in machine translation, text-to-speech synthesis, natural language understanding, information extraction, information retrieval and question answering. The aim of NER is to classify words into predefined categories such as location name, person name, organization name, date and time. In this paper we describe in detail a machine learning approach based on the Hidden Markov Model (HMM) for identifying named entities. The main idea behind using an HMM to build an NER system is that it is language independent, so the system can be applied to any language domain. In our NER system the states are not fixed: they are dynamic, and users can define them according to their interest. The corpus used by our NER system is also not domain specific.
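
    As an illustration of HMM-based tagging, the sketch below decodes the most likely tag sequence with the Viterbi algorithm. It is a toy under stated assumptions: the tag set, the probabilities and the floor value for unseen words are invented for the example and are not taken from the paper. The tag set is passed in as a parameter, which reflects the idea that the states are not fixed.

```python
import math

# Hypothetical toy model; every number below is made up for illustration.
TAGS = ["PER", "LOC", "O"]
START = {"PER": 0.3, "LOC": 0.1, "O": 0.6}
TRANS = {
    "PER": {"PER": 0.4, "LOC": 0.05, "O": 0.55},
    "LOC": {"PER": 0.05, "LOC": 0.3, "O": 0.65},
    "O": {"PER": 0.15, "LOC": 0.15, "O": 0.7},
}
EMIT = {
    "PER": {"asha": 0.5, "ravi": 0.5},
    "LOC": {"jaipur": 0.6, "rajasthan": 0.4},
    "O": {"lives": 0.4, "in": 0.6},
}


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely tag sequence for `words` under a first-order HMM."""
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk)), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            emit = math.log(emit_p[t].get(word, unk))
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit, prev)
        best.append(column)
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]
```

    With the toy model above, `viterbi(["asha", "lives", "in", "jaipur"], TAGS, START, TRANS, EMIT)` returns `["PER", "O", "O", "LOC"]`.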

    KEYWORDS

    Named Entity Recognition (NER), Natural Language Processing (NLP), Hidden Markov Model (HMM).


    For More Details :
    https://airccse.org/journal/ijnlc/papers/1412ijnlc02.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol1.html




SENTIMENT ANALYSIS FOR MODERN STANDARD ARABIC AND COLLOQUIAL

    Hossam S. Ibrahim1, Sherif M. Abdou2 and Mervat Gheith1, 1Computer Science Department, Institute of Statistical Studies and Research (ISSR), Cairo University, Egypt 2Information Technology Department, Faculty of Computers and Information, Cairo University, Egypt

    ABSTRACT

    The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations; many are therefore now turning to the field of sentiment analysis. In this paper, we present a feature-based, sentence-level approach to Arabic sentiment analysis. Our approach uses a lexicon of Arabic idioms and sayings as a key resource for improving the detection of sentiment polarity in Arabic sentences, together with a novel and rich set of linguistically motivated features (contextual intensifiers, contextual shifters and negation handling) and syntactic features for conflicting phrases, which enhance sentiment classification accuracy. Furthermore, we introduce an automatically expandable, wide-coverage polarity lexicon of Arabic sentiment words. The lexicon is built from a seed of gold-standard sentiment words that were manually collected and annotated; it then expands automatically, detecting the sentiment orientation of new sentiment words using a synset aggregation technique and free online Arabic lexicons and thesauruses. Our data focus on modern standard Arabic (MSA) and Egyptian dialectal Arabic tweets and microblogs (hotel reservations, product reviews, etc.). The experimental results using our resources and techniques with an SVM classifier indicate high performance levels, with accuracies of over 95%.
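
    A rough sketch of how negation and intensifier handling could feed sentence-level features is given below. It is not the authors' feature set: the tiny English placeholder lexicon, the weights and the rule that a modifier applies only to the next sentiment word are all assumptions made for illustration. In a full system, features like these would be passed to a classifier such as an SVM.

```python
# Hypothetical placeholder lexicons; a real system would use Arabic resources.
POLARITY = {"good": 1.0, "excellent": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATORS = {"not", "never"}


def sentence_features(tokens):
    """Return (positive_score, negative_score, negation_count) for one sentence."""
    pos = neg = 0.0
    negations = 0
    scale, flip = 1.0, False
    for tok in tokens:
        if tok in NEGATORS:
            flip = True
            negations += 1
        elif tok in INTENSIFIERS:
            scale *= INTENSIFIERS[tok]
        elif tok in POLARITY:
            score = POLARITY[tok] * scale
            if flip:
                score = -score  # a negator reverses the polarity
            if score > 0:
                pos += score
            else:
                neg += -score
            # Assumed rule: modifiers apply to the next sentiment word only.
            scale, flip = 1.0, False
    return pos, neg, negations
```

    For example, `sentence_features("the room was not very good".split())` returns `(0.0, 1.5, 1)`: the intensifier scales the word's polarity and the negator then flips it.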

    KEYWORDS

    Sentiment Analysis, opinion mining, social network, sentiment lexicon, modern standard Arabic, colloquial, natural language processing.


    For More Details :
    https://airccse.org/journal/ijnlc/papers/4215ijnlc07.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol4.html




SURVEY OF MACHINE TRANSLATION SYSTEMS IN INDIA

    G V Garje1 and G K Kharate2, 1Department of Computer Engineering and Information Technology, PVG’s College of Engineering and Technology, Pune, India 2Principal, Matoshri College of Engineering and Research Centre, Nashik, India

    ABSTRACT

    Work on machine translation has been going on for the last few decades, but promising translation work began in the early 1990s thanks to advanced research in Artificial Intelligence and Computational Linguistics. India is a multilingual and multicultural country with a population of over 1.25 billion and 22 constitutionally recognized languages written in 12 different scripts. This necessitates automated machine translation systems from English to Indian languages and among Indian languages, so that people can exchange information in their local language. Many usable machine translation systems have been developed, and more are under development, in India and around the world. The paper focuses on the different approaches used in developing machine translation systems and briefly describes some of these systems along with their features, domains and limitations.

    KEYWORDS

    Machine Translation, Example-based MT, Transfer-based MT, Interlingua-based MT.


    For More Details :
    https://airccse.org/journal/ijnlc/papers/2513ijnlc04.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol2.html




RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI

    Deepti Bhalla1, Nisheeth Joshi2 and Iti Mathur3, 1,2,3Apaji Institute, Banasthali University, Rajasthan, India

    ABSTRACT

    Machine transliteration has emerged as a very important research area in the field of machine translation. Transliteration aims to preserve the phonological structure of words. Proper transliteration of named entities plays a significant role in improving the quality of machine translation. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We have constructed rules for syllabification, the process of extracting or separating the syllables of a word. We calculate probabilities for named entities (proper names and locations). For words that are not named entities, separate probabilities are calculated from relative frequencies using the statistical machine translation toolkit MOSES. Using these probabilities we transliterate the input text from English to Punjabi.

    KEYWORDS

    Machine Translation, Machine Transliteration, Name entity recognition, Syllabification.


    For More Details :
    https://airccse.org/journal/ijnlc/papers/2213ijnlc07.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol2.html




HYBRID APPROACHES FOR AUTOMATIC VOWELIZATION OF ARABIC TEXTS

    Mohamed Bebah1, Chennoufi Amine2, Mazroui Azzeddine3 and Lakhouaja Abdelhak4, 1Arab Center for Research and Policy Studies, Doha, Qatar 2Faculty of Sciences/University Mohamed I, Oujda, Morocco 3Faculty of Sciences/University Mohamed I, Oujda, Morocco 4Faculty of Sciences/University Mohamed I, Oujda, Morocco

    ABSTRACT

    Hybrid approaches for automatic vowelization of Arabic texts are presented in this article. The process is made up of two modules. In the first, a morphological analysis of the words of the text is performed using the open source morphological analyzer AlKhalil Morpho Sys. For each word, analyzed out of context, the output is the set of its possible vowelizations. Integrating this analyzer into our vowelization system required the addition of a lexical database containing the most frequent words in the Arabic language. The second module aims to eliminate the ambiguities using a statistical approach based on two hidden Markov models (HMMs). For the first HMM, the unvowelized Arabic words are the observed states and the vowelized words are the hidden states. The observed states of the second HMM are identical to those of the first, but the hidden states are the lists of possible diacritics of the word without its Arabic letters. Our system uses the Viterbi algorithm to select the optimal path among the solutions proposed by AlKhalil Morpho Sys. Our approach opens an important way to improve the performance of automatic vowelization of Arabic texts for other uses in automatic natural language processing.
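
    The selection of an optimal path among candidate vowelizations can be sketched as Viterbi decoding over a lattice in which each word contributes only the forms proposed for it. This is a simplified illustration of the first HMM only. It assumes an emission probability of 1 for every proposed candidate (each vowelized form maps back to exactly one unvowelized word), and the Latin-transliterated forms, probabilities and floor value are hypothetical.

```python
import math


def best_vowelization(candidates, start_p, trans_p, floor=1e-6):
    """Pick one vowelized form per word so that the whole path is most probable.

    `candidates` holds, for each word in order, the list of vowelized forms
    proposed for it; `trans_p` maps (previous_form, form) pairs to probabilities.
    """
    # scores[i][form] = (best log-probability of a path ending in form, previous form)
    scores = [{f: (math.log(start_p.get(f, floor)), None) for f in candidates[0]}]
    for forms in candidates[1:]:
        column = {}
        for f in forms:
            prev, score = max(
                ((p, s + math.log(trans_p.get((p, f), floor)))
                 for p, (s, _) in scores[-1].items()),
                key=lambda pair: pair[1],
            )
            column[f] = (score, prev)
        scores.append(column)
    # Backtrack from the best final form.
    form = max(scores[-1], key=lambda f: scores[-1][f][0])
    path = [form]
    for column in reversed(scores[1:]):
        form = column[form][1]
        path.append(form)
    return path[::-1]


# Hypothetical toy lattice for a two-word input.
CANDIDATES = [["kataba", "kutub"], ["alwaladu", "alwalada"]]
START_P = {"kataba": 0.6, "kutub": 0.4}
TRANS_P = {
    ("kataba", "alwaladu"): 0.7,
    ("kataba", "alwalada"): 0.1,
    ("kutub", "alwaladu"): 0.05,
    ("kutub", "alwalada"): 0.3,
}
```

    Restricting the hidden states at each position to the analyzer's candidates keeps the search small: with the toy values above, `best_vowelization(CANDIDATES, START_P, TRANS_P)` returns `["kataba", "alwaladu"]`.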

    KEYWORDS

    Arabic language, Automatic vowelization, morphological analysis, hidden Markov model, corpus.


    For More Details :
    https://airccse.org/journal/ijnlc/papers/3414ijnlc04.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol3.html




HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM

    P H Rathod1, M L Dhore2 and R M Dhore3, 1,2Department of Computer Engineering, Vishwakarma Institute of Technology, Pune 3Pune Vidhyarthi Griha’s College of Engineering and Technology, Pune

    ABSTRACT

    Language transliteration is one of the important areas in NLP. Transliteration is very useful for converting named entities (NEs) written in one script to another script in NLP applications such as Cross Lingual Information Retrieval (CLIR), multilingual voice chat applications and real-time Machine Translation (MT). The most important requirement of a transliteration system is to preserve the phonetic properties of the source language after transliteration into the target language. In this paper, we propose named entity transliteration for the Hindi to English and Marathi to English language pairs using a Support Vector Machine (SVM). In the proposed approach, the source named entity is segmented into transliteration units; hence the transliteration problem can be viewed as a sequence labeling problem. The phonetic units are classified using the polynomial kernel function of the SVM. The proposed approach uses the phonetics of the source language and n-grams as two features for transliteration.
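
    For reference, the standard polynomial kernel compares two feature vectors as (gamma * <x, y> + coef0) ** degree. The sketch below computes it in plain Python. The abstract does not state the kernel parameters, so the default values for `degree`, `gamma` and `coef0` are placeholders, and the parameter names simply follow common convention.

```python
def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree
```

    With binary indicator features (for example, one position per n-gram), the dot product counts the features two units share, so `polynomial_kernel([1, 0, 1], [1, 1, 0])` evaluates to `4.0`.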

    KEYWORDS

    Machine Transliteration, n-gram, Support Vector Machine, Syllabification.


    For More Details :
    https://airccse.org/journal/ijnlc/papers/2413ijnlc04.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol2.html




AN UNSUPERVISED APPROACH TO DEVELOP STEMMER

    Mohd. Shahid Husain, Department of Information Technology, Integral University, Lucknow

    ABSTRACT

    This paper presents an unsupervised approach for the development of a stemmer (for the case of the Urdu and Marathi languages). During the last few years in particular, a wide range of information in Indian regional languages has been made available on the web in the form of e-data. But access to these data repositories is very low because efficient search engines and retrieval systems supporting these languages are very limited. Hence automatic information processing and retrieval has become an urgent requirement. To train the system, training datasets taken from CRULP [22] and the Marathi corpus [23] are used. For generating suffix rules, two different approaches are proposed: frequency-based stripping and length-based stripping. The evaluation was made on 1200 words extracted from the Emille corpus. The experimental results show that for Urdu the frequency-based suffix generation approach gives the maximum accuracy of 85.36%, whereas the length-based suffix stripping algorithm gives a maximum accuracy of 79.76%. For Marathi the system gives 63.5% accuracy with frequency-based stripping and achieves a maximum accuracy of 82.5% with the length-based suffix stripping algorithm.
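
    One way to read the two stripping strategies is sketched below. Suffixes are learned from an unannotated word list by counting endings whose removal leaves another word in the list. A word is then stemmed by stripping either the most frequent or the longest matching suffix. The learning rule, the thresholds and the English placeholder words are assumptions for illustration, not the paper's procedure or data.

```python
from collections import Counter


def learn_suffixes(words, max_len=3, min_count=2, min_stem=2):
    """Count endings whose removal leaves a stem that is itself in the word list."""
    vocab = set(words)
    counts = Counter()
    for w in vocab:
        for n in range(1, max_len + 1):
            stem_part, suffix = w[:-n], w[-n:]
            if len(stem_part) >= min_stem and stem_part in vocab:
                counts[suffix] += 1
    # Keep only suffixes seen often enough to be trusted.
    return {s: c for s, c in counts.items() if c >= min_count}


def stem(word, suffixes, by="frequency", min_stem=2):
    """Strip one learned suffix, chosen by corpus frequency or by length."""
    matches = [s for s in suffixes
               if word.endswith(s) and len(word) - len(s) >= min_stem]
    if not matches:
        return word
    if by == "frequency":
        best = max(matches, key=lambda s: (suffixes[s], len(s)))
    else:  # length-based stripping prefers the longest matching suffix
        best = max(matches, key=lambda s: (len(s), suffixes[s]))
    return word[: len(word) - len(best)]


# Placeholder word list standing in for an unannotated training corpus.
WORDS = ["walk", "walks", "walked", "talk", "talks", "talked", "jump", "jumps"]
```

    On this word list, `learn_suffixes(WORDS)` yields `{"s": 3, "ed": 2}`, and `stem("jumped", learn_suffixes(WORDS))` returns `"jump"` even though "jumped" never appeared in the list.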

    KEYWORDS

    Stemming, Morphology, Urdu stemmer, Marathi stemmer, Information retrieval.


    For More Details :
    https://airccse.org/journal/ijnlc/papers/1212ijnlc02.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol1.html




WORD SENSE DISAMBIGUATION USING WSD SPECIFIC WORDNET OF POLYSEMY WORDS

    Udaya Raj Dhungana1, Subarna Shakya2, Kabita Baral3 and Bharat Sharma4, 1,2,4Department of Electronics and Computer Engineering, Central Campus, IOE, Tribhuvan University, Lalitpur, Nepal 3Department of Computer Science, GBS, Lamachaur, Kaski, Nepal

    ABSTRACT

    This paper presents a new model of WordNet that is used to disambiguate the correct sense of a polysemy word based on clue words. The words related to each sense of a polysemy word, as well as to a single-sense word, are referred to as clue words. The conventional WordNet organizes nouns, verbs, adjectives and adverbs into sets of synonyms called synsets, each expressing a different concept. In contrast to this structure, we developed a new model of WordNet that organizes the different senses of polysemy words, as well as single-sense words, based on the clue words. These clue words are used to disambiguate the correct meaning of a polysemy word in a given context using knowledge-based Word Sense Disambiguation (WSD) algorithms. A clue word can be a noun, verb, adjective or adverb.
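
    A clue-word lookup of this kind can be sketched as a simple overlap count between the context and each sense's clue words. This is a minimal illustration of a knowledge-based approach, not the paper's algorithm: the `CLUES` table, its sense labels and the overlap scoring rule are hypothetical.

```python
# Hypothetical clue-word table: word -> sense label -> clue words.
CLUES = {
    "bank": {
        "financial institution": {"money", "deposit", "loan", "account"},
        "river side": {"river", "water", "shore", "fishing"},
    }
}


def disambiguate(word, context, clues=CLUES):
    """Return the sense whose clue words overlap most with the context, or None."""
    senses = clues.get(word)
    if not senses:
        return None
    ctx = {w.lower() for w in context}
    best_sense, best_overlap = None, 0
    for sense, clue_words in senses.items():
        overlap = len(ctx & clue_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

    For instance, `disambiguate("bank", "he sat on the bank of the river fishing".split())` returns `"river side"`, while a context with no clue words returns `None` rather than guessing.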

    KEYWORDS

    Word Sense Disambiguation, WordNet, Polysemy Words, Synset, Hypernymy, Context word, Clue Words.


    For More Details :
    https://airccse.org/journal/ijnlc/papers/3414ijnlc05.pdf


    Volume Link :
    https://airccse.org/journal/ijnlc/vol3.html







