3. Technology Opportunities and Competitive Analysis
Examination of the set of
36 technical areas prominently associated with KDD/DM articles yields
additional insights. First, there is a general trend for those topics
engaging industry researchers to be fast growing (see Figure 18). Two
groups of technical areas deviate from this general trend. The following
areas show relatively slow growth with considerable industry involvement
(Pc.p.p): data analysis, information retrieval, relational databases,
expert systems, and knowledge-based systems (Group I domains, from Figure
18). This suggests that these fields may be maturing or that research on
them is slowing, although some still show commercial promise. The second
deviant group (Group II, from Figure 18) shows a relatively high growth rate
but lacks strong industry participation. (Within this group, "rough
sets" is still small, so it might well attract industry attention if
activity increases.)
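The grouping described above can be illustrated with a minimal sketch. It assumes each domain is summarized by two numbers, a growth rate and a share of industry-authored articles, and that "high" and "low" are decided by a median split. The median split, the function name, and the toy values are assumptions made for illustration; they are not the procedure or data behind Figure 18.

```python
from statistics import median

def label_domains(domains):
    """Label each domain by growth and industry involvement.

    domains: {name: (growth_rate, industry_share)}
    The median split used here is an assumption, not the study's method.
    """
    growth_cut = median(g for g, _ in domains.values())
    industry_cut = median(s for _, s in domains.values())
    labels = {}
    for name, (growth, share) in domains.items():
        fast = growth > growth_cut
        industrial = share > industry_cut
        if fast == industrial:
            # High/high or low/low: consistent with the general trend.
            labels[name] = "follows trend"
        elif industrial:
            labels[name] = "Group I: slow growth, high industry involvement"
        else:
            labels[name] = "Group II: high growth, low industry involvement"
    return labels

# Hypothetical values, not taken from INSPEC or COMP.
toy_domains = {
    "domain A": (0.9, 0.8),
    "domain B": (0.1, 0.7),
    "domain C": (0.8, 0.1),
    "domain D": (0.2, 0.2),
}
toy_labels = label_domains(toy_domains)
```

With real counts, the two cutoffs would need to be chosen to match whatever definition of "fast growing" and "industry involvement" the analysis actually uses.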
Figure 18 The Correlation of Company Participation and Domain Growth Rate
Figure 19 The Development Patterns of the Domains with High Growth Rate
Focusing on these 12 terms
that show particularly compelling growth, we now divide them according
to the extent of industry involvement in R&D for each. The rationale
is that increasing industry involvement reflects increasing commercialization
opportunity. Figure 19 locates each technical area by its growth rate
(Y axis) and industrial involvement (X axis). We observe:
Figure 20 locates the 36 technical
areas, as just discussed, in terms of relative industry involvement (Y
axis) and relative emphasis in KDD/DM (X axis). The two technical areas
most concentrated in the KDD/DM domain -- "association rules"
and "very large databases" -- both show especially strong industry
participation. IBM is notably active in publishing on "association
rules"; a number of companies are publishing aggressively on "very
large databases" (IBM, AT&T, Microsoft, Thinking Machines, SAS
Institute, Oracle, MCC). It's interesting that the remaining KDD/DM technical
areas are mainly academic, with the exception of "business data processing"
and the striking outlier, "data warehouse." This suggests that
many of the basic KDD/DM approaches and techniques are still being
addressed predominantly in academia. Industry might want to track developments
in these domains with special attention to identify early opportunities
for commercial application.
Figure 20 Relative Industry Involvement and KDD/DM Concentration
Figures 21 and 22 portray
the percentage of documents in INSPEC (R&D database) and COMP (trade
database) that mention "product(s) or software" -- a candidate
indicator of commercial readiness -- along the X axis. The number of documents
noting a given topic is shown both by circle size and Y axis position
-- redundant for emphasis. For example, the very large number of articles
in COMP on "relational databases," "distributed databases,"
and "object-oriented databases" is reinforced by the extensive
mention of "product(s) or software" in those articles -- i.e.,
they appear in the upper right quadrant of Figure 22.
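As a rough illustration of the indicator just described, the sketch below computes, for each topic, the number of documents and the percentage whose text mentions "product(s)" or "software". The regular expression, the function name, and the toy abstracts are assumptions for illustration; the actual matching rules applied to INSPEC and COMP records are not specified here.

```python
import re

# Matches "product", "products", or "software" as whole words (case-insensitive).
# This pattern is an assumed reading of the phrase "product(s) or software".
COMMERCIAL_PATTERN = re.compile(r"\b(products?|software)\b", re.IGNORECASE)

def commercial_mention_share(abstracts_by_topic):
    """Return {topic: (document_count, percent_mentioning_product_or_software)}."""
    result = {}
    for topic, abstracts in abstracts_by_topic.items():
        total = len(abstracts)
        hits = sum(1 for text in abstracts if COMMERCIAL_PATTERN.search(text))
        percent = 100.0 * hits / total if total else 0.0
        result[topic] = (total, percent)
    return result

# Hypothetical abstracts, not drawn from INSPEC or COMP.
toy_abstracts = {
    "topic A": [
        "A new software package for querying large tables",
        "Theory of query optimization",
        "Products compared on load performance",
        "An indexing algorithm",
    ],
    "topic B": ["A proof of termination", "A lemma on closure"],
}
toy_shares = commercial_mention_share(toy_abstracts)
```

The document count would drive circle size and Y position in a plot like Figures 21 and 22, and the percentage would give the X position.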
Figure 21 Domains' size vs. Commercial Development (from INSPEC)
"Visual databases" illustrates
the potential of these depictions. From Figure 21 we note that a lot of
R&D is being directed to this topic. From Figure 22, we note that
while current trade publication attention to "visual databases"
is limited, what there is strongly points to interest in "product(s)
or software."
Figure 22 Domains' size vs. Commercial Development (from COMP)
"Deductive databases"
presents another intriguing case. In Figure 21, the INSPEC R&D data,
it profiles much like "visual databases" -- a lot of activity
with moderate pointers toward "product(s) or software." But
in Figure 22, COMP shows minimal attention to the topic, implying a lack
of commercial interest.
3.2 Data Mining tools from the WWW
The number of host computers
on the Internet has leapt from about 200 in 1980 to over 10 million in
1996. The challenge for us is whether we can mine "innovation forecasting"
knowledge from this huge source of information.
Table G in the appendix lists
140 KDD/DM tools found on the WWW. In INSPEC and COMP we have fewer than
700 research articles abstracted on KDD/DM, but we have identified 140
KDD/DM tools on the WWW! (Most are from "Knowledge Discovery Nuggets,"
a well-known web page in KDD/DM.) We infer from this that many companies
that have developed data mining tools may not contribute significantly
to the research literature.
Figure 23 partitions tool-oriented
domains into industrially emphasized and academic. The tools relating
to "multiple discovery tasks," "classification," and
"visualization" form the mainstream of commercial development.
Tools for "link analysis," "clustering," and "summarization"
are heavily academic. Figure 23 suggests that the tools for "text
mining" and "summarization" are still at a rudimentary
stage of development.
Figure 23 KDD/DM Tools: Commercial vs. Non-profit
Figure 24 looks into the techniques
on which classification tools are based. We found that the "neural
network" and "decision tree" approaches are the main bases
of commercial tools in classification, and the tools based on "fuzzy
logic" and "rough set" approaches are still primarily academic.
Figure 24 The approaches on which "Classification" tools are based: Company-developed vs. Non-profit-developed
Figure 25 surveys the application
fields for which the KDD/DM tools are developed. "Marketing,"
"banking," and "scientific research" are the main
application domains for KDD/DM tools.
Figure 25 KDD/DM tools for different application domains
3.3 Competition Analysis
The COMP search set identifies
159 companies associated with the "data mining" articles. Figure
26 shows the 18 most active companies. Of these, IBM, Microsoft, and Silicon
Graphics also publish actively on KDD/DM (INSPEC search results -- Table
E in the appendix).
The companies associated with
data mining appear to fit three categories: