Year of Publication

2017

Season of Publication

Fall

Paper Type

Master's Thesis

College

College of Computing, Engineering & Construction

Degree Name

Master of Science in Computer and Information Sciences (MS)

Department

Computing

NACO controlled Corporate Body

University of North Florida. School of Computing

First Advisor

Dr. Sherif A. Elfayoumy

Second Advisor

Dr. Robert F. Roggio

Third Advisor

Dr. Karthikeyan Umapathy

Department Chair

Dr. Sherif A. Elfayoumy

College Dean

Dr. Mark Tumeo

Abstract

This thesis studies how to improve the relevance of computerized search results. Information search tools return lists of documents ranked by their relevance to the user-supplied search query. Because users represent complex ideas and concepts with a small number of words and phrases, search queries are information sparse. This sparsity makes it difficult for search tools to locate relevant documents. A review of the challenges in information search identifies the problems and offers suggestions for improving current search tools. Using the suggestions put forth by the Strategic Workshop on Information Retrieval in Lorne (SWIRL), a composite scoring approach (Composite Scorer) is developed. The Composite Scorer considers various aspects of information needs to improve the ranked results of a search by returning records relevant to the user’s information need.

The Florida Fusion Center (FFC), a local law enforcement agency, needs a more effective information search tool. Each day, the agency processes a large number of police reports, typically written as text documents. Current information search methods require inordinate amounts of time and skill to identify relevant reports within the agency's large collection.

In an experiment, FFC investigators compared the composite scoring approach against a common search scoring approach, term frequency-inverse document frequency (TF/IDF). The investigators used a custom-built software interface to work through several use case scenarios, searching for documents related to various criminal investigations. These expert users then judged the relevance of the top ten ranked documents returned by each scorer. The evaluations were collected and used to measure the performance of the two scorers. A search that returns many irrelevant documents costs users time and, potentially, unsolved crimes. A cost function contrasted the cost of the two scoring methods across the use cases. Mean Average Precision (MAP), a common measure of the performance of ranked search results, was computed for both scoring methods to provide a numeric value representing how accurately each scorer returns relevant documents in the top ten of a ranked list.
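As an illustration of the MAP measure described above, the following is a minimal sketch and not the thesis's own implementation. It assumes binary relevance judgments for each ranked list and normalizes average precision by the number of relevant documents found within the top k; other normalizations exist. The function names are hypothetical.

```python
from typing import Sequence


def average_precision_at_k(relevance: Sequence[bool], k: int = 10) -> float:
    """Average precision over the top-k results of one ranked list.

    relevance[i] is True if the document at rank i + 1 was judged relevant.
    Assumption: the sum is normalized by the number of relevant documents
    found in the top k, not by the total number of relevant documents.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant documents so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(judgments: Sequence[Sequence[bool]], k: int = 10) -> float:
    """Mean of the per-query average precision values."""
    if not judgments:
        return 0.0
    return sum(average_precision_at_k(j, k) for j in judgments) / len(judgments)
```

Calling `mean_average_precision` once per scorer, with one list of judgments per use case, would give a single comparable value for each scoring method.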

The purpose of this study is to determine whether a composite scoring approach to ranked lists that considers multiple aspects of a user’s search can improve search quality by returning more relevant documents during an information search. This research contributes to the understanding of composite scoring methods for improving search results. Understanding the value of composite scoring methods allows researchers to evaluate, explore, and possibly extend the approach by incorporating other information aspects, such as word and document meaning.
