Pre-Analysis of Superlarge Industrial Data Sets

Summary: Successful analysis of superlarge data sets requires statistical procedures that automatically clean the data and uncover simple structure. The protocol described applies to multivariate industrial data from continuous manufacturing processes with feedback and feedforward control. Our methods form a twelve-step sequence that edits and re-lags the time series and applies diagnostics to look for subtle data flaws. At different stages, the protocol rejects data, imputes data, re-lags the time series, flags categories of suspicious data, and divides the data set into more homogeneous subsets. The result is a clean data set ready for analysis using standard statistical packages or software tools. Although there is no guarantee that every corruption has been caught and corrected, the output data set has been examined more thoroughly than traditional human-intensive methods can achieve. To assist in this preliminary analysis, four graphical methods are described; they were developed in studies of glass manufacture at PPG Industries' production plants and of sheet aluminum production at Alcoa.
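
To make the reject, impute, flag, and re-lag operations named in the summary concrete, here is a minimal Python sketch. It is not the authors' twelve-step protocol, which the summary does not spell out. The function names, the fixed validity range, the linear interpolation, and the choice of lag by maximum cross-correlation are all assumptions made for illustration.

```python
from statistics import mean


def reject_and_impute(series, lo, hi):
    """Flag missing or out-of-range values, then fill them by linear
    interpolation between the nearest valid neighbors (an assumed rule)."""
    flagged = [v is None or not (lo <= v <= hi) for v in series]
    good = [i for i, bad in enumerate(flagged) if not bad]
    if not good:
        raise ValueError("no valid observations to impute from")
    out = list(series)
    for i, bad in enumerate(flagged):
        if not bad:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:          # leading gap: copy first valid value
            out[i] = series[right]
        elif right is None:       # trailing gap: copy last valid value
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out, flagged


def _corr(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0


def best_lag(x, y, max_lag):
    """Return the lag k in [0, max_lag] that maximizes corr(x[t], y[t + k])."""
    scores = {k: _corr(x[:len(x) - k], y[k:]) for k in range(max_lag + 1)}
    return max(scores, key=scores.get)


def relag(x, y, k):
    """Align the two series so that x[t] is paired with y[t + k]."""
    return x[:len(x) - k], y[k:]


# Hypothetical data: the downstream series y repeats x two samples later.
x = [0, 1, 0, 2, 0, 3, 0, 1, 0, 2]
y = [9, 9, 0, 1, 0, 2, 0, 3, 0, 1]
k = best_lag(x, y, max_lag=3)
x_aligned, y_aligned = relag(x, y, k)
```

In a real plant data set the validity range and the maximum lag would come from process knowledge, such as sensor limits and transport delays between stations, rather than from fixed constants as in this sketch.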

  • Topics: Data Quality
  • Keywords: Data collection, Multivariate time series, Statistical methods, Graphics
  • Authors: Banks, David; Parmigiani, Giovanni
  • Journal: Journal of Quality Technology