Oral Presentations
Location
Schewel 232
Access Type
Campus Access Only
Entry Number
12
Start Date
4-6-2022 2:15 PM
End Date
4-6-2022 2:30 PM
Department
Computer Science
Abstract
Spell checking is a function vital to word-based applications and search engines, as it can greatly assist the user by reducing typing effort, time spent proofreading, and wasted search time. The main goal of the spell checker is to identify misspelled words and provide the user with correct suggestions for them. The first step in creating the spell checker was to build a program that could gather large amounts of text from the internet, known as a web crawler. This was done using a ranked list of N-grams based on text provided by gutenberg.org [4], news websites, and other sources, in order to give the data a variety of English words. The web crawler checked that the ranked N-grams in text gathered from web pages fell within a specified standard deviation of the pre-gathered text, thus determining whether the text was English. This data was then used to generate a list of words ranked by how often they appeared, so that the more often a word appeared in the text overall, the more likely it was to be recommended as a potential spelling correction. This ranked list of words was combined with NLTK's dictionary [7] to identify misspelled words and then provide a list of possible corrections based on their rank and Levenshtein distance from the misspelled word. The resulting spell checker could correctly identify and provide the correct spelling for a list of 400 misspelled words with a 90 percent accuracy rate. N-grams and N-gram combinations were also tried for detecting misspelled words, but the results varied widely and were often inconsistent: the N-grams could be fooled by words accidentally run together and by non-English words that happen to contain N-grams common in English. This work also highlighted the difficulty of identifying and correcting proper nouns that do not appear in the dictionary.
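The correction step described above (ranking dictionary words by Levenshtein distance, with corpus frequency as a tie-breaker) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code; the function names, the frequency dictionary, and the edit-distance cutoff are all assumptions.

```python
# Hypothetical sketch: suggest corrections by Levenshtein distance,
# breaking ties with corpus frequency (higher frequency ranks first).

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def suggest(word, dictionary, freq, max_dist=2, top_n=5):
    """Return up to top_n candidates: closest first, most frequent on ties."""
    scored = [(levenshtein(word, w), -freq.get(w, 0), w) for w in dictionary]
    scored = [s for s in scored if s[0] <= max_dist]
    return [w for _, _, w in sorted(scored)[:top_n]]

freq = {"hello": 120, "help": 300, "hell": 40, "held": 80}
print(suggest("helo", freq.keys(), freq))
```

In practice the dictionary would be NLTK's word list and the frequencies would come from the crawled corpus; the toy `freq` dictionary here only demonstrates the tie-breaking behaviour.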
Additionally, to improve the accuracy of the word recommendations, research was done and tests were run to determine how words are commonly misspelled. Finding that most misspelled words retain the correct first and last letters, and incorporating this into the recommendation program, increased accuracy by 5 percent. This program shows the importance of spell checking in word-based applications and search engine optimization.
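The first/last-letter observation above amounts to a pre-filter on the candidate set before any distance ranking. A minimal sketch, assuming a plain list of dictionary words (the function name is illustrative):

```python
# Illustrative sketch of the first/last-letter heuristic: keep only
# dictionary words that share the misspelling's first and last letters,
# pruning the candidate pool before edit-distance ranking.

def filter_candidates(misspelled: str, dictionary) -> list:
    """Keep words whose first and last letters match the misspelling's."""
    first, last = misspelled[0], misspelled[-1]
    return [w for w in dictionary if w and w[0] == first and w[-1] == last]

print(filter_candidates("recieve", ["receive", "relieve", "remove", "believe"]))
```

Because the filter discards most of the dictionary cheaply, it both speeds up the distance computation and, per the result above, biases suggestions toward the corrections users actually intend.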
Faculty Mentor(s)
Dr. Zakaria Kurdi
Rights Statement
The right to download or print any portion of this material is granted by the copyright owner only for personal or educational use. The author/creator retains all proprietary rights, including copyright ownership. Any editing, other reproduction or other use of this material by any means requires the express written permission of the copyright owner. Except as provided above, or for any other use that is allowed by fair use (Title 17, §107 U.S.C.), you may not reproduce, republish, post, transmit or distribute any material from this web site in any physical or digital form without the permission of the copyright owner of the material.
Comparing Spell Checker Models