TPAC: Technology Policy and Assessment Center

Factor Analysis Optimization: Applied on Natural Language Knowledge Discovery

Robert J. Watts, Alan L. Porter, Ph.D., Donghua Zhu, Ph.D.

Abstract: The Technology Opportunities Analysis of Scientific Information System (Tech OASIS) automates the identification and visualization of relationships inherent in sets of hundreds or thousands of literature abstracts. An automated Tech OASIS algorithm applies principal components analysis (PCA), multi-dimensional scaling (MDS), and a path-erasing algorithm to elicit and display clusters of related concepts. However, the cluster groupings and visual representations for a given set of abstracts are not unique: the user's selection of the items to be clustered and of the number of factors to be considered generates alternative cluster solutions and relationship displays. The research documented here seeks to identify, and automate the selection of, a "best" PCA factor analysis solution for a set of literature abstracts. How can such a "best" solution be identified? Research on quality measures for factor/cluster groups indicates that term/factor selections based on entropy, F-measure, and cohesiveness appear promising. Our approach applies a composite metric that strives to minimize the factor grouping entropy and F-measure and to maximize each group's cohesiveness, while also considering set coverage. We apply this approach to automatically map conceptual (term) relationships for 1202 abstracts concerning "natural language knowledge discovery."
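To make the idea of a composite metric concrete, the following is a minimal sketch in Python with NumPy. It is not the Tech OASIS algorithm, and every function name, the loading threshold, and the equal weighting of the terms are assumptions made for illustration. It groups terms by their strongest PCA loading, then scores a grouping as cohesiveness plus coverage minus a label-free entropy proxy. The F-measure is left out because it requires reference classes, which this sketch does not assume are available.

```python
import numpy as np

def term_groups(X, k, threshold=0.4):
    """Group terms by strongest PCA loading (hypothetical helper).

    X is a documents x terms occurrence matrix (float). Returns one group
    label per term, or -1 for terms whose loadings are all below threshold.
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing by zero
    Z = (X - X.mean(axis=0)) / std           # standardize each term column
    corr = Z.T @ Z / X.shape[0]              # term-term correlation matrix
    vals, vecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]         # indices of the k largest
    loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))
    strength = np.abs(loadings)
    labels = strength.argmax(axis=1)
    labels[strength.max(axis=1) < threshold] = -1
    return labels

def doc_entropy(X, labels):
    """Assumed entropy proxy: how evenly each document's grouped terms
    spread across groups, normalized to [0, 1] (0 = one group per doc)."""
    groups = [g for g in np.unique(labels) if g >= 0]
    if len(groups) < 2:
        return 0.0
    mass = np.stack([X[:, labels == g].sum(axis=1) for g in groups], axis=1)
    mass = mass[mass.sum(axis=1) > 0]        # drop documents with no grouped term
    if mass.size == 0:
        return 0.0
    p = mass / mass.sum(axis=1, keepdims=True)
    logp = np.log(p, where=p > 0, out=np.zeros_like(p))
    return float((-(p * logp).sum(axis=1)).mean() / np.log(len(groups)))

def cohesion(X, labels):
    """Mean pairwise cosine similarity between terms sharing a group."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    U = X / norms
    sims = []
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        if g < 0 or len(idx) < 2:
            continue
        S = U[:, idx].T @ U[:, idx]
        sims.append(S[np.triu_indices(len(idx), k=1)].mean())
    return float(np.mean(sims)) if sims else 0.0

def coverage(X, labels):
    """Fraction of documents containing at least one grouped term."""
    grouped = labels >= 0
    if not grouped.any():
        return 0.0
    return float((X[:, grouped].sum(axis=1) > 0).mean())

def composite_score(X, labels):
    """Assumed equal-weight composite: higher is better."""
    return cohesion(X, labels) + coverage(X, labels) - doc_entropy(X, labels)

def best_k(X, candidates):
    """Score each candidate number of factors and return the best one."""
    scores = {k: composite_score(X, term_groups(X, k)) for k in candidates}
    return max(scores, key=scores.get), scores
```

Calling best_k(X, range(2, 10)) on a documents-by-terms matrix returns the candidate number of factors with the highest score together with all scores. In practice the weights of the three terms would need tuning, and a different entropy definition could change which solution is selected.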
