Year

2005

Paper Type

Master's Thesis

College

College of Computing, Engineering & Construction

Degree Name

Master of Science in Computer and Information Sciences (MS)

Department

Computing

Committee Chairperson

Dr. Sherif A. Elfayoumy

Second Advisor

Dr. Judith L. Solano

Rights Statement

http://rightsstatements.org/vocab/InC/1.0/

Third Advisor

Dr. Charles N. Winton

Abstract

One problem that exists in today's document management arena is the issue of retrieving information from electronic documents such as images, Microsoft Office documents, and e-mail. Specific data entities must be extracted from these documents so that the data can be searched and queried. This study presents a unique approach to extracting these entities: using Extensible Stylesheet Language Transformations (XSLT) to match patterns in text. Because XSLT is processed at run time, new XSLT templates can be created and used without having to recompile and redeploy the application. The specific implementation addressed in this project extracts entities from an image file. The data in the image file is converted to Extensible Markup Language (XML) text via optical character recognition (OCR), and then this XML text is transformed into an organized, well-formed XML output file using an XSLT template. We show that this approach can accurately retrieve the correct data and that the method can be extended to other electronic document sources.
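
The pipeline described in the abstract (OCR text as XML, then an XSLT template that matches patterns and emits well-formed XML) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the `<page>`/`<line>` layout of the OCR output, the invoice-style sample data, the output element names, and the class name `OcrEntityExtractor`. Java is used only because its standard library includes an XSLT 1.0 processor. The stylesheet is held in a string for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OcrEntityExtractor {

    // Hypothetical OCR output: one <line> element per recognized line of text.
    static final String SAMPLE_OCR_XML =
        "<page>"
        + "<line>ACME Supplies</line>"
        + "<line>Invoice No: INV-1042</line>"
        + "<line>Total Due: 318.50</line>"
        + "</page>";

    // Hypothetical template. Lines that start with a known label are turned
    // into named entities; the final catch-all template discards other lines.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/page'>"
        + "<entities><xsl:apply-templates select='line'/></entities>"
        + "</xsl:template>"
        + "<xsl:template match=\"line[starts-with(., 'Invoice No:')]\">"
        + "<invoiceNumber>"
        + "<xsl:value-of select=\"normalize-space(substring-after(., ':'))\"/>"
        + "</invoiceNumber>"
        + "</xsl:template>"
        + "<xsl:template match=\"line[starts-with(., 'Total Due:')]\">"
        + "<totalDue>"
        + "<xsl:value-of select=\"normalize-space(substring-after(., ':'))\"/>"
        + "</totalDue>"
        + "</xsl:template>"
        + "<xsl:template match='line'/>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet when called and applies it to the OCR XML.
    // Because compilation happens at run time, a different stylesheet can be
    // supplied without rebuilding this class.
    static String extract(String ocrXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(ocrXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE_OCR_XML, SAMPLE_XSLT));
    }
}
```

In a deployed system the stylesheet would more plausibly be read from a file or database at run time, so supporting a new document layout would mean adding a template rather than changing the Java code.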

Suggested Citation

McManigal, Chris A., "Towards More Comprehensive Information Retrieval Systems: Entity Extraction Using XSLT" (2005). UNF Graduate Theses and Dissertations. 222.
https://digitalcommons.unf.edu/etd/222

Included in

Computer Sciences Commons

Accessibility Statement

This item was created or digitized before April 24, 2027, or is a reproduction of legacy material created before that date. It is preserved in its original, unmodified state specifically for research, reference, or historical recordkeeping. In accordance with the ADA Title II Final Rule, the Library provides accessible versions of archival materials by request. If you are experiencing difficulty accessing the information on the site due to a disability, please submit a request through the following form for assistance.

UNF Graduate Theses and Dissertations

Towards More Comprehensive Information Retrieval Systems: Entity Extraction Using XSLT
