Technology Mapping --- An Application on the Internet Domain
Donghua Zhu, Alan Porter, TPAC, 02/07/99
Figure B Team Players or Individual Efforts? Comparison in Four Domains
We have developed a partly
automated approach to generate a family of "technology maps." This enables
us to extract, then visualize, relationship patterns based on topical
searches conducted in scientific and technological abstract databases.
The "Bigmap" program, which combines
several new mapping algorithms, can automatically extract and transform
"co-occurrence" information from large sets of abstract records.
Co-occurrence is usually based on which terms occur together in the
same documents. Terms might be "keywords" (subject index terms)
or noun phrases generated from titles or abstracts using our natural language
processing (NLP) routine. Bigmap provides six different relationship maps.
The method also provides
visualization tools that create the maps in Microsoft Word or PowerPoint
files using Visual Basic macros.
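The co-occurrence idea described above can be illustrated with a minimal sketch. It assumes each abstract record has already been reduced to a set of terms (keywords or NLP noun phrases) and simply counts how many records contain each pair of terms. This is not Bigmap's code; the function name `cooccurrence_counts` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records: list[set[str]]) -> Counter:
    """Count how many records contain each unordered pair of terms.

    Each record is the set of terms attached to one abstract record.
    """
    pairs: Counter = Counter()
    for terms in records:
        # Sort so each pair has one canonical orientation, e.g. ("a", "b").
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy records, not drawn from the INSPEC data set.
records = [
    {"transport protocols", "client-server systems", "security of data"},
    {"transport protocols", "security of data"},
    {"client-server systems", "object-oriented languages"},
]
counts = cooccurrence_counts(records)
print(counts[("security of data", "transport protocols")])  # → 2
```

A pair count like this is the raw input a relationship map would be built from; any further weighting or clustering Bigmap applies is not shown here.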
Here is an example that applies
Bigmap to the topic of the "internet." Determining and visualizing the
latent relationships and relative importance of different elements can
support technology management. These results are based on a collection
of 2871 abstract records retrieved by searching for the term "internet"
and the year 1998 in the INSPEC database. INSPEC is a large compilation
of abstracts of journal and conference papers in the general areas of
electrical engineering, computing, and the physical sciences. It is
produced by the IEE and is available in various ways (e.g., through
"Dialog" or by subscription).
In these six example maps,
we used 1) the top 217 terms, excluding the search term "internet"; 2)
15 factors; 3) the top 20 keywords in the terms map; 4) the top 15 affiliations;
5) the 15 most prolific authors; 6) the 15 leading countries; and 7) the
top 16 sources. [The internet data set is "inter98" in Donghua's tpac account.]
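The "top N" selections above (217 terms, 15 affiliations, 15 authors, and so on) amount to ranking items by frequency. A minimal sketch follows, assuming "top" means the items that appear in the most records; the text does not say exactly how Bigmap ranks items, and `top_terms` is a hypothetical helper.

```python
from collections import Counter


def top_terms(records, n, exclude=()):
    """Return the n items that appear in the most records.

    Each item counts at most once per record (document frequency), and
    anything listed in `exclude` (e.g. the search term) is skipped.
    """
    excluded = set(exclude)
    df = Counter()
    for terms in records:
        # dict.fromkeys drops repeats within one record but keeps their order,
        # so ties are broken by first appearance, deterministically.
        df.update(t for t in dict.fromkeys(terms) if t not in excluded)
    return [term for term, _ in df.most_common(n)]


# Hypothetical toy records, not drawn from the INSPEC data set.
records = [
    ["internet", "world wide web", "security of data"],
    ["internet", "world wide web"],
    ["internet", "transport protocols", "world wide web", "world wide web"],
]
print(top_terms(records, 2, exclude=["internet"]))
# → ['world wide web', 'security of data']
```

The same helper would apply unchanged to affiliation, author, country, or source fields, since each is just a list of values per record.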
The six example technology
maps were made by extracting and representing co-occurrence and correlation
information in the data set. Other technology information resides both within
and outside the search data set. We believe that mining this additional
information can produce further insightful representations.
A test program called "IM" (Indicator Mapping) has run successfully on
the GT Unix machine. It can automatically extract information from GTEL
and produce an indicator, "Na.m." Na.m is a domain's or term's normalized
association measure with a given data set over a certain period. It measures
the domain's or term's relative association with a given data set (see our
report "TOA in Data Mining"). Na.m is the ratio of a keyword's frequency
of occurrence in the user's data set to the keyword's occurrences in the
overall source database. A higher Na.m implies that the technical area
is relatively particular to the user's data set. A lower Na.m implies
that the term is a "universal" term, which may be treated as a noise term.
This is the first TOA innovation indicator to be generated automatically.
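The Na.m ratio defined above is simple to compute once the two counts are available. The sketch below follows that definition literally. How IM obtains the database-wide counts from GTEL is not described here, so both counts are passed in directly; the function name `na_m` and the term counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def na_m(freq_in_dataset: int, freq_in_database: int) -> float:
    """Na.m as defined in the text: a term's frequency in the user's
    data set divided by its occurrences in the whole source database."""
    if freq_in_database <= 0:
        raise ValueError("term must occur in the source database")
    return freq_in_dataset / freq_in_database


# Hypothetical counts for illustration only (not INSPEC figures).
dataset_counts = {"term_a": 40, "term_b": 40}
database_counts = {"term_a": 50, "term_b": 4000}
scores = {t: na_m(dataset_counts[t], database_counts[t]) for t in dataset_counts}
print(scores)  # → {'term_a': 0.8, 'term_b': 0.01}
```

Both terms occur equally often in the data set, but `term_a` scores far higher because it is rare in the database as a whole, which is the "particular" versus "universal" contrast the text describes.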
Recently we found that plotting
log(Na.m) against log(Domain Size) yields an informative
two-dimensional map. This can help distinguish noise terms from more valuable
terms in the user's data set. Figure 7 shows an example map for the "internet" data.
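To build a two-dimensional map of this kind, each term needs a pair of log coordinates. The sketch below assumes base-10 logarithms and treats "Domain Size" as a positive number supplied per term, since the text does not define it precisely. The helper `log_log_points` and the values are hypothetical, and plotting is left to any charting tool.

```python
import math


def log_log_points(stats):
    """Map each term to (log10(domain size), log10(Na.m)).

    `stats` maps term -> (domain_size, na_m); both values must be positive.
    The x axis is log(Domain Size) and the y axis is log(Na.m).
    """
    return {
        term: (math.log10(size), math.log10(score))
        for term, (size, score) in stats.items()
    }


# Hypothetical values for illustration only.
points = log_log_points({"term_a": (100, 0.8), "term_b": (1000, 0.01)})
for term, (x, y) in points.items():
    print(f"{term}: x={x:.2f}, y={y:.2f}")
```

Taking logs keeps terms whose sizes and ratios span several orders of magnitude on one readable chart.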
In Figure 7, terms that lie below the "diagonal" and are very large
in size, such as "information technology" or "software tools," can
be taken as noise terms from a semantic point of view. On the other hand, terms
that lie above the "diagonal" and exhibit higher Na.m, such
as "online front-ends," "transport protocols," "client-server systems,"
"security of data," or "object-oriented languages," are relatively particular
to the data set. We can treat them as particularly important terms for depicting
research activity in the "internet" data set.
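The reading rule above can be expressed as a small classifier. The sketch assumes the "diagonal" is a straight line y = slope * x + intercept in the log-log plane (x = log10 of Domain Size, y = log10 of Na.m) and that "very large in size" means at or above a cutoff. The text specifies neither the line nor the cutoff, so `classify` and its default `slope`, `intercept`, and `large_size` values are hypothetical and would need to be chosen by the analyst.

```python
import math


def classify(domain_size, na_m, slope=-1.0, intercept=0.0, large_size=1000):
    """Label a term relative to an assumed straight 'diagonal' in log-log space.

    x = log10(domain size), y = log10(Na.m). The default slope, intercept,
    and large_size are hypothetical placeholders, not values from the text.
    """
    x, y = math.log10(domain_size), math.log10(na_m)
    line_y = slope * x + intercept
    if y > line_y:
        # Above the diagonal: relatively particular to the data set.
        return "particular"
    if domain_size >= large_size:
        # Below the diagonal and very large: candidate noise term.
        return "possible noise"
    # Below the diagonal but small: the text gives no rule for these.
    return "unclassified"


print(classify(100, 0.8))      # → particular
print(classify(1000, 0.0001))  # → possible noise
print(classify(10, 0.01))      # → unclassified
```

Keeping an explicit "unclassified" outcome avoids labeling small, below-diagonal terms as noise, since the text only flags terms that are both below the diagonal and very large.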
We are developing other innovation
indicators. We believe that important latent patterns in users' data sets can
be discovered from such indicator maps. In the near future, we expect
that an analyst will be able to create a set of innovation indicator maps, such as
the figures in our "KDD/Data Mining" and "Fuel Cell" reports, with just
a few keystrokes and choices.