HSRA: Hindi stopword removal algorithm

Vandana, Jha. and Manjunath, M. and Deepa Shenoy, P. and Venugopal, K.R. (2016) HSRA: Hindi stopword removal algorithm. In: 2016 International Conference on Microelectronics, Computing and Communications (MicroCom), 23-25 Jan. 2016.

jha2016.pdf - Published Version

Official URL: https://doi.org/10.1109/MicroCom.2016.7522593

Abstract

In the last few years, electronic documents have been the main source of data in many research areas such as Web Mining, Information Retrieval, Artificial Intelligence, and Natural Language Processing. Text processing plays a vital role in handling structured or unstructured data from the web, and preprocessing is the main step in any text processing system. One significant preprocessing technique is the elimination of functional words, also known as stopwords, which affects the performance of text processing tasks. An efficient stopword removal technique is therefore required in all text processing tasks. In this paper, we propose a stopword removal algorithm for the Hindi language that uses the concept of a Deterministic Finite Automaton (DFA). Many existing stopword removal techniques are based on a dictionary containing a stopword list: a pattern matching technique is applied, and each matched pattern, which is a stopword, is removed from the document. This is time consuming because the search process takes a long time, which makes the method inefficient and very expensive. In comparison, our algorithm has been tested on 200 documents and achieved 99% accuracy while also being time efficient.
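
The abstract does not spell out how the DFA is constructed, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a trie-shaped DFA built from a small sample stopword list, whitespace tokenization, and hypothetical names (`SAMPLE_STOPWORDS`, `build_dfa`, `is_stopword`, `remove_stopwords`) that do not come from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative subset of common Hindi stopwords (an assumption for this
# sketch; it is NOT the stopword list used in the paper).
SAMPLE_STOPWORDS = ["का", "के", "की", "है", "में", "और", "से", "को", "पर"]

Transitions = List[Dict[str, int]]


def build_dfa(words: Iterable[str]) -> Tuple[Transitions, Set[int]]:
    """Build a trie-shaped DFA over characters; state 0 is the start state."""
    transitions: Transitions = [{}]
    accepting: Set[int] = set()
    for word in words:
        state = 0
        for ch in word:
            nxt = transitions[state].get(ch)
            if nxt is None:
                # Create a fresh state for an unseen (state, character) pair.
                transitions.append({})
                nxt = len(transitions) - 1
                transitions[state][ch] = nxt
            state = nxt
        # The state reached at the end of a stopword is accepting.
        accepting.add(state)
    return transitions, accepting


def is_stopword(token: str, transitions: Transitions, accepting: Set[int]) -> bool:
    """Run the DFA on one token; a missing transition means rejection."""
    state = 0
    for ch in token:
        nxt = transitions[state].get(ch)
        if nxt is None:
            return False
        state = nxt
    return state in accepting


def remove_stopwords(text: str, transitions: Transitions, accepting: Set[int]) -> str:
    """Drop every whitespace-separated token that the DFA accepts."""
    kept = [t for t in text.split() if not is_stopword(t, transitions, accepting)]
    return " ".join(kept)


if __name__ == "__main__":
    dfa, final_states = build_dfa(SAMPLE_STOPWORDS)
    print(remove_stopwords("राम और श्याम स्कूल में है", dfa, final_states))
```

Each token is checked in time proportional to its own length, independent of how many stopwords are in the list, which is the kind of saving over repeated list scanning that the abstract alludes to. In practice the input and the stopword list would likely need the same Unicode normalization (for example NFC) before matching; that step is omitted here.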

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Text processing, Dictionaries, Feature extraction, Pattern matching, Entropy, Cleaning
Subjects: Faculty of Engineering > Computer Science & Information Science Engineering
Divisions: University Visvesvarayya College of Engineering > Department of Computer Science and Information Science Engineering
Depositing User: Mr. Prashantkumar Jaloji
Date Deposited: 20 Oct 2016 06:52
Last Modified: 20 Oct 2016 06:52
URI: http://eprints-bangaloreuniversity.in/id/eprint/6637
