An Industry View
For over 30 years, substantial efforts have been made at 3M to transfer statistical skills and thinking from professional statisticians to engineers, scientists, and business people. Our goal is to decentralize statistical knowledge and services, and our perspective on statistical software reflects this goal.
Statistical software plays a key role in the dissemination process. In training, software allows the students to concentrate on concepts rather than calculations. In application, software provides a link to what was learned in class and a fast means to apply it.
The author is a member of a central group, with consulting, training, and software responsibilities, that acts as a catalyst for this process. Knowledge of the (internal) clients' needs comes from consulting and training interactions and from formal surveys. Until recently this central group was also active in internal software development. Such experience imparts an awareness of the formidable challenges that software providers face in serving the statistical needs of industrial users. This presentation focuses on requirements and critical success factors for statistical software intended for the non-statistician, industrial user.
Requirements for “Industrial-Grade” Software
The requirements for “industrial-grade” software are fairly demanding. In addition to such givens as acceptable cost and core statistical capabilities, it is critical that the software be available for a wide variety of hardware environments. Near-identical versions must be available on commonly used microcomputers and mainframes to facilitate transparent data connectivity, communications on technical matters, and training. The alternative is another Tower of Babel.
The nature of the users themselves imposes further requirements. A wide range of user skill levels must be accommodated, from novices to near-experts. The basic user can be easily overwhelmed by too many options, while the advanced user can be frustrated by inflexibility.
The non-statistician sees statistical design and analysis as part of a larger task of product development, process control, or process management. Such a user expects the software to perform the statistical tasks properly and not to inhibit performance of the larger task. For this reason, it is important that the software interface with commonly used database, spreadsheet, and word processing programs. Graphical display of results in presentation-quality output (both on screen and as hard copy) is also important. Beyond these, the user will be delighted by other features, such as the automatic creation of data entry forms or uncoded response surface plots, that go the extra mile to aid the solution of the larger task.
Strong prejudices regarding statistical software are apparent even among non-statistician users, but this is more than just bullheadedness. In an atmosphere of stiff business competition where time to market is vital to success, there is limited time available to learn new support tools. The industrial user will be pleased by new capabilities but desires upwardly compatible releases.
Other software requirements include availability of support (either locally or by the vendor), complete and readable documentation, and, in an age of increasingly global businesses, suitability for persons with limited English-language skills.
But the most crucial requirement for “industrial-grade” statistical software, beyond core capabilities, is ease of use for infrequent users! This overriding need is suggested by an internal company survey that showed that, with the exception of basic statistical summaries, plots, and charts, most statistical methods were used by an engineer or scientist on a once-per-month or once-per-quarter basis. From a human factors perspective, this suggests that most industrial users cannot be expected to remember a complex set of commands, protocols, or movements in order to use the statistical software. Appropriate software must rely on recognition (e.g., of icons or application-oriented menus) rather than recall (e.g., of commands or algorithm-oriented menus) to satisfy these users.
Of course, ease of use means more than just recognition aids for the infrequent user. Other important facets are:
well-planned display of information;
understandable and consistent terminology;
reasonable response time; and
helpful error handling and correction.
But the author wants to stress sensitivity to the needs of the infrequent user, which are so often overlooked. While a visually attractive package will sell some copies, only those that genuinely meet the requirements of the infrequent user will survive in the long run.
Implications for Richness
Given the wide range of user skill levels that must be accommodated by statistical software, creative methods must be employed to ensure the appropriate richness for each user. It is unacceptable to provide software that is designed to meet the needs of the advanced user and assume that the basic user can simply ignore any options or outputs that are not needed. Experience indicates that such a strategy will scare off many basic users. On the other hand, providing different software for basic and advanced users induces an artificial barrier to learning and communication. An alternative is to provide software that can “grow” as the user's skills grow. On output, for instance, this means that different layers of output can be selected reflecting increasing degrees of richness.
While configurable output is already available in many statistical packages, another kind of configuration has yet to be widely exploited. This is the concept of configurable options. In a menu-driven program, this means that the user (or local expert) can configure the menus to exclude certain options that are not needed and only add “clutter” to the program. An example of a non-statistical package employing this feature is Microsoft Word for Windows®. With configurable options, the user or local expert could eliminate unused or undesired options of specific output (e.g., Durbin-Watson statistic) or of whole methods (e.g., nonlinear regression).
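The idea of configurable options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Menu` class, its methods, and the option names are hypothetical and do not describe any existing statistical package. The sketch shows a menu from which the user or local expert hides unneeded entries without removing the underlying capability.

```python
from dataclasses import dataclass, field


@dataclass
class Menu:
    """Hypothetical menu whose entries can be hidden by the user or a local expert."""
    name: str
    options: list[str]
    hidden: set[str] = field(default_factory=set)

    def hide(self, option: str) -> None:
        # Hiding affects only what is displayed; the capability itself stays installed.
        if option not in self.options:
            raise ValueError(f"unknown option: {option}")
        self.hidden.add(option)

    def show(self, option: str) -> None:
        # Restore a hidden option, for example as the user's skills grow.
        self.hidden.discard(option)

    def visible(self) -> list[str]:
        # Preserve the original menu order for the options that remain.
        return [o for o in self.options if o not in self.hidden]


regression = Menu(
    "Regression",
    [
        "Linear regression",
        "Nonlinear regression",
        "Durbin-Watson statistic",
        "Residual plots",
    ],
)
regression.hide("Nonlinear regression")
regression.hide("Durbin-Watson statistic")
print(regression.visible())
```

Because hiding is reversible, the same package can serve the basic and the advanced user; only the configuration differs.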
Another aspect of richness that is important for the advanced user is the ability to add frequently used capabilities via some sort of “macro” command sequence. As mentioned earlier, strong prejudices exist among statistical software users. A new package may well be rejected by a user or group of users because it lacks only one or two capabilities that the user(s) have come to depend on. The capability to build macros helps overcome such barriers. Even better is the capability to incorporate such macros into the normal program flow as menu items or icons, a capability also available in Word for Windows. This is the “flip side” of configurable options.
With configurable options, statistical software can avoid the “one-size-fits-all” fallacy, without inducing artificial barriers to learning or communications.
Implications for Guidance
The novice user, limited by time constraints and capacity to recall, benefits greatly from appropriate guidance. Such a user is not primarily concerned with guidance as to what the software is doing at a computational level; he or she will generally rely on the trainer or local expert to vouch for its validity and appropriateness. The novice user is more concerned about guidance on how to progress through a reasonable design or analysis from start to finish and how to interpret the output.
Clearly it is easier to provide guidance for progressing through a specific analysis (e.g., regression model fitting) than for a general exploratory analysis. The same can be said for software providing design capabilities: guidance on the particulars of a class of designs (e.g., Plackett-Burman designs) will be easier to provide than guidance on the selection of an appropriate class of designs. For guidance through these more specific kinds of design or analysis problems, it is helpful to have the software options organized by user task rather than by statistical algorithm. For example, it is preferable that algorithms for residual analysis be available within both the linear regression and ANOVA options rather than requiring the user to move to a separate option for residual analysis from each starting point.
Another useful tool for guidance on a specific design or analysis is on-line help for interpreting the output. It is interesting that on-line help for command specification or option selection has been available for years, yet on-line help for output is much less widely used. It seems that many statisticians feel uncomfortable about the idea of a canned interpretation of the results of an ANOVA or regression analysis. Yet even a modest degree of memory-jogging would be enormously helpful. Why should a non-statistician be expected to remember that in a regression analysis, a small p-value on the lack-of-fit test is “bad” (i.e., evidence of lack of fit), while a small p-value on the test for the model is “good” (i.e., evidence of a meaningful model)? Will professional sensibilities be offended if an on-line help statement simply reminds the user (upon request) of the definition of a p-value and the implications of a p-value close to 0 for that particular test?
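A modest, memory-jogging form of output help might look like the following Python sketch. It is an assumption-laden illustration, not a description of any package: the function name `explain_p_value`, the test keys, and the 0.05 threshold for "small" are all hypothetical choices. The help text only restates the definition of a p-value and what a small value implies for the particular test, and it is produced only when the user calls for it.

```python
# Hypothetical reminder text; the wording is illustrative only.
P_VALUE_DEFINITION = (
    "A p-value is the probability, computed assuming the null hypothesis is true, "
    "of a result at least as extreme as the one observed."
)

# What a small p-value means depends on which test produced it.
SMALL_P_MEANING = {
    "lack_of_fit": "evidence of lack of fit (the fitted model may be inadequate)",
    "model": "evidence of a meaningful model",
}


def explain_p_value(test: str, p_value: float, threshold: float = 0.05) -> str:
    """Return a short reminder for one test; threshold is an assumed cutoff for 'small'."""
    if test not in SMALL_P_MEANING:
        raise ValueError(f"no help text for test: {test}")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < threshold:
        verdict = f"A p-value of {p_value:.3f} is small: {SMALL_P_MEANING[test]}."
    else:
        verdict = (
            f"A p-value of {p_value:.3f} is not small, so this test gives "
            "no strong evidence against its null hypothesis."
        )
    return f"{P_VALUE_DEFINITION}\n{verdict}"


print(explain_p_value("lack_of_fit", 0.01))
print(explain_p_value("model", 0.001))
```

The two calls return opposite practical messages for similarly small p-values, which is exactly the distinction the infrequent user cannot be expected to recall.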
The author, too, is wary of excessive guidance that would encourage a “black box” attitude toward design or analysis, but feels that statisticians and other software developers must overcome a case of scruples in this regard. A middle ground really is possible.
Guidance on the more general kinds of design and analysis issues invites the development of knowledge-based (expert) systems. A major difficulty here is the nature of statistical knowledge: it is a generic methodology applied to diverse subject-matter areas. A knowledge-based system for, say, medical diagnosis could be relatively stand-alone, but effective application of statistical methods requires the integration of subject-matter considerations. Any stand-alone system for statistical methods risks segregating statistics from subject-matter knowledge in a dangerous way.
For example, in a drying process, oven temperature and product mass can be controlled. Without some incorporation of the basic physics regarding the multiplicative effect of these variables on heat transferred, the stand-alone program might erroneously recommend a 2 × 2 design (where the Ab and aB conditions, each with one factor high and the other low, are identical for heat transfer) or suggest fitting an additive model.
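The arithmetic behind the drying example can be checked with a short Python sketch. The factor levels and the `heat_index` function are hypothetical: the response is assumed to be exactly the product of the two factors, in arbitrary units, and the levels are chosen so that both factors have the same high-to-low ratio. Under those assumptions the two mixed conditions (one factor high, the other low) coincide, and the additive interaction contrast vanishes only on the log scale.

```python
import math

# Hypothetical levels; both factors have a high/low ratio of 2.
temperature = {"low": 100.0, "high": 200.0}
mass = {"low": 1.0, "high": 2.0}


def heat_index(temp_level: str, mass_level: str) -> float:
    # Assumed purely multiplicative response, in arbitrary units.
    return temperature[temp_level] * mass[mass_level]


levels = ("low", "high")
cells = {(t, m): heat_index(t, m) for t in levels for m in levels}


def interaction_contrast(values: dict) -> float:
    # (high, high) - (high, low) - (low, high) + (low, low);
    # this is zero exactly when the two factors act additively.
    return (
        values[("high", "high")]
        - values[("high", "low")]
        - values[("low", "high")]
        + values[("low", "low")]
    )


distinct_conditions = len(set(cells.values()))
raw_interaction = interaction_contrast(cells)
log_interaction = interaction_contrast({k: math.log(v) for k, v in cells.items()})

print("distinct heat conditions among 4 runs:", distinct_conditions)
print("interaction contrast, raw scale:", raw_interaction)
print("interaction contrast, log scale:", round(log_interaction, 12))
```

With these levels the four runs supply only three distinct heat conditions, and an additive model on the raw scale leaves a nonzero interaction, which is the subject-matter information a stand-alone program would lack.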
One option is to create statistical expert systems to which subject-matter knowledge can be appended. Short of that, statistical knowledge-based systems can exhibit an appropriate degree of modesty by proscribing rather than prescribing: pointing out options that are untenable and suggesting plausible options to explore further, rather than trying to identify one or a few solutions that are “the best.” The system can identify appropriate memory-jogging questions from a catalog of questions such as the following:
Has an appropriate degree of replication been included in the design?
Was this data collected in a completely randomized fashion (as the type of analysis might suggest), or was it collected in blocks?
Needless to say, all of these features would be available upon the user's request rather than imposed without recourse by the program.
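A catalog of memory-jogging questions, offered only on request, could be sketched in Python as follows. The `Question` class, the task labels, and the `memory_joggers` function are hypothetical names introduced for illustration; the two questions are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    applies_to: frozenset  # hypothetical task labels such as "design" or "analysis"


CATALOG = [
    Question(
        "Has an appropriate degree of replication been included in the design?",
        frozenset({"design"}),
    ),
    Question(
        "Was this data collected in a completely randomized fashion "
        "(as the type of analysis might suggest), or was it collected in blocks?",
        frozenset({"analysis"}),
    ),
]


def memory_joggers(task: str, requested: bool = False) -> list[str]:
    # Nothing is imposed: questions appear only when the user asks for them.
    if not requested:
        return []
    return [q.text for q in CATALOG if task in q.applies_to]


print(memory_joggers("design"))
print(memory_joggers("design", requested=True))
```

The questions prompt the user to supply subject-matter judgment rather than having the program prescribe a single "best" answer.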
Far too many software development dollars are being devoted to adding new capabilities and far too few to enhancing ease of use. It might shock some software providers to realize that the biggest competitor for the recommended statistical software at 3M is not another statistics package, but Lotus 1-2-3®!
Menu-driven software and windowed software for non-statistical applications have raised the ease-of-use expectations of statistical software users. A substantial fraction of potential users in industry will not “buy in” to a statistical software solution that does not combine state-of-the-art ease of use with core capabilities, acceptable cost, and multiple hardware availability.
The future of statistical software is not just a technical issue; it is also a business issue. The providers of most statistical software are private, profit-making companies. These firms often rely on new releases with added capabilities to produce current revenue. In the industrial market, far more revenue is available from providing enhanced ease of use than from adding non-core capabilities. It is hoped that future competition among software products will be more on the basis of guidance and configurability than on the basis of additional non-core capabilities.