
Professor Matt Wand

Biography

Professor Matt Wand is a Distinguished Professor of Statistics at the University of Technology, Sydney.

His latest research outputs are available on the website: http://matt-wand.utsacademics.info

He has held faculty appointments at Harvard University, Rice University, Texas A&M University, University of New South Wales and University of Wollongong. In 2008 Professor Wand became an elected Fellow of the Australian Academy of Science. He also has been awarded the two Australian Academy of Science honorific awards for statistical research: the Moran Medal in 1997 for outstanding research by scientists under the age of 40 and the Hannan Medal in 2013 for career research in statistical science. In 2013 he was awarded the University of Technology, Sydney, Chancellor's Medal for Exceptional Research. He received the 2013 Pitman Medal from the Statistical Society of Australia in recognition of outstanding achievement in, and contribution to, the discipline of Statistics. Professor Wand is an elected fellow of the American Statistical Association and the Institute of Mathematical Statistics.

Professor Wand has co-authored two books and more than 100 papers in statistics journals. He has six packages in the R language on the Comprehensive R Archive Network.

In 2002 Professor Wand was ranked 23rd among highly cited authors in mathematics and statistics for the period 1991–2001. He is also a member of the ‘ISI Highly Cited Researchers’ list. Since 2000 Professor Wand has been principal investigator on seven major grants. A recent one, an Australian Research Council Discovery Project, is titled ‘Semiparametric Regression for Streaming Data’ and will run for the years 2015–2017. Another is the ‘Centre of Excellence for Mathematical and Statistical Frontiers of Big Data, Big Models, New Insights’, which is running during 2014–2020.

For more information visit his personal website matt-wand.utsacademics.info.

Professional

Matt serves as an associate editor for the Statistics journal: Australian and New Zealand Journal of Statistics.

He has previously served as an associate editor for the Journal of the American Statistical Association, Biometrika and Statistica Sinica.

He also participates in committee work within the Australian Academy of Science.

Distinguished Professor, School of Mathematical Sciences
B Mathematics (Hons Class 1), PhD
 
Phone
+61 2 9514 2240
Room
CB01.15.26

Research Interests

Professor Wand is chiefly interested in the development of statistical methodology for finding useful structure in large multivariate data sets.

Currently, Matt’s specific interests include: variational approximate methods, statistical methods for streaming data, generalised linear mixed models, semiparametric regression, spatial statistics, multivariate density estimation and feature significance.

He is also very interested in Statistical Computing and contributes to the field's main software repository — the ‘Comprehensive R Archive Network’.

Recent research by Wand and co-authors on real-time semiparametric regression is described on the Real-time Semiparametric Regression website.

Can supervise: Yes
Matt Wand is currently supervising: Marianne Menictas, PhD candidate; Cathy (Yuen Yi) Lee, PhD candidate; Andy (Sang Il) Kim, PhD candidate.

35393 Seminar (Statistics)

Books

Ruppert, D., Wand, M. & Carroll, R.J. 2003, Semiparametric Regression, 1st edn, Cambridge University Press, New York.
Wand, M. & Jones, M.C. 1995, Kernel Smoothing, 1st edn, Chapman and Hall, London.

Conferences

Neville, S.E. & Wand, M. 2011, 'Generalised Extreme Value geoadditive model analysis via variational Bayes', Procedia Environmental Sciences, Elsevier, The Netherlands, pp. 8-13.
We devise a variational Bayes algorithm for fast approximate inference in Bayesian Generalized Extreme Value additive model analysis. Such models are useful for flexibly assessing the impact of continuous predictor variables on sample extremes. The new methodology allows large Bayesian models to be fitted and assessed without the significant computing costs of Monte Carlo methods.

Journal articles

Gloag, E.S., Turnbull, L., Huang, A., Vallotton, P., Wang, H., Nolan, L.M., Mililli, L., Hunt, C., Lu, J., Osvath, S.R., Monahan, L.G., Cavaliere, R., Charles, I.G., Wand, M.P., Gee, M.L., Prabhakar, R. & Whitchurch, C.B. 2013, 'Self-organization of bacterial biofilms is facilitated by extracellular DNA', Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 28, pp. 11541-11546.
Twitching motility-mediated biofilm expansion is a complex, multicellular behavior that enables the active colonization of surfaces by many species of bacteria. In this study we have explored the emergence of intricate network patterns of interconnected trails that form in actively expanding biofilms of Pseudomonas aeruginosa. We have used high-resolution, phase-contrast time-lapse microscopy and developed sophisticated computer vision algorithms to track and analyze individual cell movements during expansion of P. aeruginosa biofilms. We have also used atomic force microscopy to examine the topography of the substrate underneath the expanding biofilm. Our analyses reveal that at the leading edge of the biofilm, highly coherent groups of bacteria migrate across the surface of the semisolid media and in doing so create furrows along which following cells preferentially migrate. This leads to the emergence of a network of trails that guide mass transit toward the leading edges of the biofilm. We have also determined that extracellular DNA (eDNA) facilitates efficient traffic flow throughout the furrow network by maintaining coherent cell alignments, thereby avoiding traffic jams and ensuring an efficient supply of cells to the migrating front. Our analyses reveal that eDNA also coordinates the movements of cells in the leading edge vanguard rafts and is required for the assembly of cells into the "bulldozer" aggregates that forge the interconnecting furrows. Our observations have revealed that large-scale self-organization of cells in actively expanding biofilms of P. aeruginosa occurs through construction of an intricate network of furrows that is facilitated by eDNA.
Huang, A. & Wand, M. 2013, 'Simple marginally noninformative prior distributions for covariance matrices', Bayesian Analysis, vol. 8, no. 2, pp. 439-452.
A family of prior distributions for covariance matrices is studied. Members of the family possess the attractive property of all standard deviation and correlation parameters being marginally noninformative for particular hyper-parameter choices. Moreove
Menictas, M. & Wand, M. 2013, 'Variational inference for marginal longitudinal semiparametric regression', Stat, vol. 2, no. 1, pp. 61-71.
We derive a variational inference procedure for approximate Bayesian inference in marginal longitudinal semiparametric regression. Fitting and inference is much faster than existing Markov chain Monte Carlo approaches. Numerical studies indicate that the new methodology is very accurate for the class of models under consideration.
Wand, M., Ormerod, J.T. & Pham, T. 2013, 'Mean field variational Bayesian inference for nonparametric regression with measurement error', Computational Statistics and Data Analysis, vol. 68, no. 1, pp. 375-387.
A fast mean field variational Bayes (MFVB) approach to nonparametric regression when the predictors are subject to classical measurement error is investigated. It is shown that the use of such technology to the measurement error setting achieves reasonable accuracy. In tandem with the methodological development, a customized Markov chain Monte Carlo method is developed to facilitate the evaluation of accuracy of the MFVB method.
Ormerod, J.T. & Wand, M. 2012, 'Gaussian Variational Approximate Inference For Generalized Linear Mixed Models', Journal of Computational and Graphical Statistics, vol. 21, no. 1, pp. 2-17.
Variational approximation methods have become a mainstay of contemporary machine learning methodology, but currently have little presence in statistics. We devise an effective variational approximation strategy for fitting generalized linear mixed models
Wand, M. & Ormerod, J.T. 2012, 'Continued fraction enhancement of Bayesian computing', Stat, vol. 1, no. 1, pp. 31-41.
The age-old number theoretic concept of continued fractions can enhance certain Bayesian computations. The crux of this claim is due to continued fraction representations of numerically challenging special function ratios that arise in Bayesian computing. Continued fraction approximation via Lentz's Algorithm often leads to efficient and stable computation of such quantities.
Hall, P., Ormerod, J.T. & Wand, M. 2011, 'Theory of Gaussian variational approximation for a Poisson mixed model', Statistica Sinica, vol. 21, no. 1, Special Issue, pp. 369-389.
Likelihood-based inference for the parameters of generalized linear mixed models is hindered by the presence of intractable integrals. Gaussian variational approximation provides a fast and effective means of approximate inference. We provide some theory for this type of approximation for a simple Poisson mixed model. In particular, we establish consistency at rate $m^{-1/2} + n^{-1}$, where $m$ is the number of groups and $n$ is the number of repeated measurements.
Chacon, J.E., Duong, T. & Wand, M. 2011, 'Asymptotics for general multivariate kernel density derivative estimators', Statistica Sinica, vol. 21, pp. 807-840.
We investigate kernel estimators of multivariate density derivative functions using general (or unconstrained) bandwidth matrix selectors. These density derivative estimators have been relatively less well researched than their density estimator analogues. A major obstacle for progress has been the intractability of the matrix analysis when treating higher order multivariate derivatives. With an alternative vectorization of these higher order derivatives, mathematical intractabilities are surmounted in an elegant and unified framework. The finite sample and asymptotic analysis of squared errors for density estimators are generalized to density derivative estimators. Moreover, we are able to exhibit a closed form expression for a normal scale bandwidth matrix for density derivative estimators. These normal scale bandwidths are employed in a numerical study to demonstrate the gain in performance of unconstrained selectors over their constrained counterparts.
Goldsmith, J., Wand, M.P. & Crainiceanu, C. 2011, 'Functional regression via variational Bayes', Electronic Journal of Statistics, vol. 5, pp. 572-602.
We introduce variational Bayes methods for fast approximate inference in functional regression analysis. Both the standard cross-sectional and the increasingly common longitudinal settings are treated. The methodology allows Bayesian functional regression analyses to be conducted without the computational overhead of Monte Carlo methods. Confidence intervals of the model parameters are obtained both using the approximate variational approach and nonparametric resampling of clusters. The latter approach is possible because our variational Bayes functional regression approach is computationally efficient. A simulation study indicates that variational Bayes is highly accurate in estimating the parameters of interest and in approximating the Markov chain Monte Carlo-sampled joint posterior distribution of the model parameters. The methods apply generally, but are motivated by a longitudinal neuroimaging study of multiple sclerosis patients. Code used in simulations is made available as a web-supplement.
Faes, C., Ormerod, J.T. & Wand, M. 2011, 'Variational Bayesian inference for parametric and nonparametric regression with missing data', Journal of the American Statistical Association, vol. 105, no. 495, pp. 959-971.
Bayesian hierarchical models are attractive structures for conducting regression analyses when the data are subject to missingness. However, the requisite probability calculus is challenging and Monte Carlo methods typically are employed. We develop an alternative approach based on deterministic variational Bayes approximations. Both parametric and nonparametric regression are considered. Attention is restricted to the more challenging case of missing predictor data. We demonstrate that variational Bayes can achieve good accuracy, but with considerably less computational overhead. The main ramification is fast approximate Bayesian inference in parametric and nonparametric regression models with missing data.
Wang, S.S. & Wand, M. 2011, 'Using Infer.NET for statistical analyses', The American Statistician, vol. 65, pp. 115-126.
We demonstrate and critique the new Bayesian inference package Infer.NET in terms of its capacity for statistical analyses. Infer.NET differs from the well-known BUGS Bayesian inference packages in that its main engine is the variational Bayes family of deterministic approximation algorithms rather than Markov chain Monte Carlo. The underlying rationale is that such deterministic algorithms can handle bigger problems due to their increased speed, despite some loss of accuracy. We find that Infer.NET is a well-designed computational framework and offers significant speed advantages over BUGS. Nevertheless, the current release is limited in terms of the breadth of models it can handle, and its inference is sometimes inaccurate. Supplemental materials accompany the online version of this article.
Wand, M., Ormerod, J.T., Padoan, S.A. & Fruhwirth, R. 2011, 'Mean field variational Bayes for elaborate distributions', Bayesian Analysis, vol. 6, no. 4, pp. 847-900.
We develop strategies for mean field variational Bayes approximate inference for Bayesian hierarchical models containing elaborate distributions. We loosely define elaborate distributions to be those having more complicated forms compared with common distributions such as those in the Normal and Gamma families. Examples are Asymmetric Laplace, Skew Normal and Generalized Extreme Value distributions. Such models suffer from the difficulty that the parameter updates do not admit closed form solutions. We circumvent this problem through a combination of (a) specially tailored auxiliary variables, (b) univariate quadrature schemes and (c) finite mixture approximations of troublesome density functions. An accuracy assessment is conducted and the new methodology is illustrated in an application
Wand, M. & Ormerod, J.T. 2011, 'Penalized wavelets: Embedding wavelets into semiparametric regression', Electronic Journal of Statistics, vol. 5, no. 1, pp. 1654-1717.
We introduce the concept of penalized wavelets to facilitate seamless embedding of wavelets into semiparametric regression models. In particular, we show that penalized wavelets are analogous to penalized splines; the latter being the established approach to function estimation in semiparametric regression. They differ only in the type of penalization that is appropriate. This fact is not borne out by the existing wavelet literature, where the regression modelling and fitting issues are overshadowed by computational issues such as efficiency gains afforded by the Discrete Wavelet Transform and partially obscured by a tendency to work in the wavelet coefficient space. With penalized wavelet structure in place, we then show that fitting and inference can be achieved via the same general approaches used for penalized splines: penalized least squares, maximum likelihood and best prediction within a frequentist mixed model framework, and Markov chain Monte Carlo and mean field variational Bayes within a Bayesian framework. Penalized wavelets are also shown to have a close relationship with wide data (p ≫ n) regression and benefit from ongoing research on that topic.
Neville, S., Palmer, M. & Wand, M. 2011, 'Generalized Extreme Value Additive Model Analysis Via Mean Field Variational Bayes', Australian & New Zealand Journal of Statistics, vol. 53, no. 3, pp. 305-330.
We develop Mean Field Variational Bayes methodology for fast approximate inference in Bayesian Generalized Extreme Value additive model analysis. Such models are useful for flexibly assessing the impact of continuous predictor variables on sample extreme
Hall, P., Pham, T., Wand, M. & Wang, S.S. 2011, 'Asymptotic normality and valid inference for Gaussian variational approximation', Annals of Statistics, vol. 39, no. 1, pp. 2502-2532.
We derive the precise asymptotic distributional behavior of Gaussian variational approximate estimators of the parameters in a single-predictor Poisson mixed model. These results are the deepest yet obtained concerning the statistical properties of a variational approximation method. Moreover, they give rise to asymptotically valid statistical inference. A simulation study demonstrates that Gaussian variational approximate confidence intervals possess good to excellent coverage properties, and have a similar precision to their exact likelihood counterparts.
Samworth, R.J. & Wand, M.P. 2010, 'Asymptotics and optimal bandwidth selection for highest density region estimation', Annals of Statistics, vol. 38, no. 3, pp. 1767-1792.
We study kernel estimation of highest-density regions (HDR). Our main contributions are two-fold. First, we derive a uniform-in-bandwidth asymptotic approximation to a risk that is appropriate for HDR estimation. This approximation is then used to derive a bandwidth selection rule for HDR estimation possessing attractive asymptotic properties. We also present the results of numerical studies that illustrate the benefits of our theory and methodology.
Ormerod, J.T. & Wand, M. 2010, 'Explaining variational approximations', The American Statistician, vol. 64, no. 2, pp. 140-153.
Variational approximations facilitate approximate inference for the parameters in complex statistical models and provide fast, deterministic alternatives to Monte Carlo methods. However, much of the contemporary literature on variational approximations is in Computer Science rather than Statistics, and uses terminology, notation, and examples from the former field. In this article we explain variational approximation in statistical terms. In particular, we illustrate the ideas of variational approximation using examples that are familiar to statisticians.
Kadiri, M.A., Carroll, R.J. & Wand, M.P. 2010, 'Marginal longitudinal semiparametric regression via penalized splines', Statistics & Probability Letters, vol. 80, no. 15-16, pp. 1242-1252.
We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been proposed, a relative simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.
Marley, J.K. & Wand, M. 2010, 'Non-standard semiparametric regression via BRugs', Journal of Statistical Software, vol. 37, no. 5, pp. 1-30.
We provide several illustrations of Bayesian semiparametric regression analyses in the BRugs package. BRugs facilitates use of the BUGS inference engine from the R computing environment and allows analyses to be managed using scripts. The examples are chosen to represent an array of non-standard situations, for which mixed model software is not viable. The situations include: the response variable being outside of the one-parameter exponential family, data subject to missingness, data subject to measurement error and parameters entering the model via an index.
Kauermann, G., Ormerod, J.T. & Wand, M. 2010, 'Parsimonious classification via generalised linear mixed models', Journal of Classification, vol. 27, no. 1, pp. 89-110.
We devise a classification algorithm based on generalized linear mixed model (GLMM) technology. The algorithm incorporates spline smoothing, additive model-type structures and model selection. For reasons of speed we employ the Laplace approximation, rather than Monte Carlo methods. Tests on real and simulated data show the algorithm to have good classification performance. Moreover, the resulting classifiers are generally interpretable and parsimonious.
Naumann, U., Luta, G. & Wand, M.P. 2010, 'The curvHDR method for gating flow cytometry samples', BMC Bioinformatics, vol. 11, p. 44.
High-throughput flow cytometry experiments produce hundreds of large multivariate samples of cellular characteristics. These samples require specialized processing to obtain clinically meaningful measurements. A major component of this processing is a form of cell subsetting known as gating. Manual gating is time-consuming and subjective. Good automatic and semi-automatic gating algorithms are very beneficial to high-throughput flow cytometry.
Pearce, N.D. & Wand, M. 2009, 'Explicit connections between longitudinal data analysis and kernel machines', Electronic Journal of Statistics, vol. 3, pp. 797-823.
Two areas of research, longitudinal data analysis and kernel machines, have large, but mostly distinct, literatures. This article shows explicitly that both fields have much in common with each other. In particular, many popular longitudinal data fitting procedures are special types of kernel machines. These connections have the potential to provide fruitful cross-fertilization between longitudinal data analytic and kernel machine methodology.
Naumann, U. & Wand, M.P. 2009, 'Automation in high-content flow cytometry screening', Cytometry Part A, vol. 75, no. 9, pp. 789-797.
High-content flow cytometric screening (FC-HCS) is a 21st Century technology that combines robotic fluid handling, flow cytometric instrumentation, and bioinformatics software, so that relatively large numbers of flow cytometric samples can be processed and analysed in a short period of time. We revisit a recent application of FC-HCS to the problem of cellular signature definition for acute graft-versus-host-disease. Our focus is on automation of the data processing steps using recent advances in statistical methodology. We demonstrate that effective results, on par with those obtained via manual processing, can be achieved using our automatic techniques. Such automation of FC-HCS has the potential to drastically improve diagnosis and biomarker identification.
Duong, T., Koch, I. & Wand, M.P. 2009, 'Highest density difference region estimation with application to flow cytometric data', Biometrical Journal, vol. 51, no. 3, pp. 504-521.
Motivated by the needs of scientists using flow cytometry, we study the problem of estimating the region where two multivariate samples differ in density. We call this problem highest density difference region estimation and recognise it as a two-sample analogue of highest density region or excess set estimation. Flow cytometry samples are typically in the order of 10,000 and 100,000 and with dimension ranging from about 3 to 20. The industry standard for the problem being studied is called Frequency Difference Gating, due to Roederer and Hardy (2001). After couching the problem in a formal statistical framework we devise an alternative estimator that draws upon recent statistical developments such as patient rule induction methods. Improved performance is illustrated in simulations. While motivated by flow cytometry, the methodology is suitable for general multivariate random samples where density difference regions are of interest.
Staudenmayer, J., Lake, E.E. & Wand, M. 2009, 'Robustness for general design mixed models using the t-distribution', Statistical Modelling, vol. 9, no. 3, pp. 235-255.
The t-distribution allows the incorporation of outlier robustness into statistical models while retaining the elegance of likelihood-based inference. In this paper, we develop and implement a linear mixed model for the general design of the linear mixed model using the univariate t-distribution. This general design allows a considerably richer class of models to be fit than is possible with existing methods. Included in this class are semi-parametric regression and smoothing and spatial models.
Ruppert, D., Wand, M.P. & Carroll, R.J. 2009, 'Semiparametric regression during 2003-2007', Electronic Journal of Statistics, vol. 3, pp. 1193-1256.
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology - thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.
Wand, M. 2009, 'Semiparametric regression and graphical models', Australian & New Zealand Journal of Statistics, vol. 51, no. 1, pp. 9-41.
Semiparametric regression models that use spline basis functions with penalization have graphical model representations. This link is more powerful than previously established mixed model representations of semiparametric regression, as a larger class of models can be accommodated. Complications such as missingness and measurement error are more naturally handled within the graphical model architecture. Directed acyclic graphs, also known as Bayesian networks, play a prominent role. Graphical model-based Bayesian ‘inference engines’, such as BUGS and VIBES, facilitate fitting and inference. Underlying these are Markov chain Monte Carlo schemes and recent developments in variational approximation theory and methodology.
Duong, T., Cowling, A., Koch, I. & Wand, M. 2008, 'Feature significance for multivariate kernel density estimation', Computational Statistics and Data Analysis, vol. 52, no. 9, pp. 4225-4242.
Multivariate kernel density estimation provides information about structure in data. Feature significance is a technique for deciding whether features, such as local extrema, are statistically significant. This paper proposes a framework for feature significance in d-dimensional data which combines kernel density derivative estimators and hypothesis tests for modal regions. For the gradient and curvature estimators distributional properties are given, and pointwise test statistics are derived. The hypothesis tests extend the two-dimensional feature significance ideas of Godtliebsen et al. [Godtliebsen, F., Marron, J.S., Chaudhuri, P., 2002. Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics 11, 121]. The theoretical framework is complemented by novel visualization for three-dimensional data. Applications to real data sets show that tests based on the kernel curvature estimators perform well in identifying modal regions. These results can be enhanced by corresponding tests with kernel gradient estimators.
Fan, Y., Leslie, D.S. & Wand, M.P. 2008, 'Generalised linear mixed model analysis via sequential Monte Carlo sampling', Electronic Journal of Statistics, vol. 2, no. 9, pp. 6-938.
We present a sequential Monte Carlo sampler algorithm for the Bayesian analysis of generalised linear mixed models (GLMMs). These models support a variety of interesting regression-type analyses, but performing inference is often extremely difficult, even when using the Bayesian approach combined with Markov chain Monte Carlo (MCMC). The Sequential Monte Carlo sampler (SMC) is a new and general method for producing samples from posterior distributions. In this article we demonstrate use of the SMC method for performing inference for GLMMs. We demonstrate the effectiveness of the method on both simulated and real data, and find that sequential Monte Carlo is a competitive alternative to the available MCMC techniques.
Padoan, S.A. & Wand, M. 2008, 'Mixed model-based additive models for sample extremes', Statistics & Probability Letters, vol. 78, no. 17, pp. 2850-2858.
We consider additive models fitting and inference when the response variable is a sample extreme. Non-linear covariate effects are handled using the mixed model representation of penalised splines. A fitting algorithm based on likelihood approximations is derived. The efficacy of the resulting methodology is demonstrated via application to simulated and real data.
Kuo, F.Y., Dunsmuir, W.T., Sloan, I.H., Wand, M. & Womersley, R.S. 2008, 'Quasi-Monte Carlo for highly structured generalised response models', Methodology and Computing in Applied Probability, vol. 10, no. 2, pp. 239-275.
Highly structured generalised response models, such as generalised linear mixed models and generalised linear models for time series regression, have become an indispensable vehicle for data analysis and inference in many areas of application. However, their use in practice is hindered by high-dimensional intractable integrals. Quasi-Monte Carlo (QMC) is a dynamic research area in the general problem of high-dimensional numerical integration, although its potential for statistical applications is yet to be fully explored. We survey recent research in QMC, particularly lattice rules, and report on its application to highly structured generalised response models. New challenges for QMC are identified and new methodologies are developed. QMC methods are seen to provide significant improvements compared with ordinary Monte Carlo methods.
Smith, A.D. & Wand, M.P. 2008, 'Streamlined variance calculations for semiparametric mixed models', Statistics in Medicine, vol. 27, no. 3, pp. 435-448.
Semiparametric mixed model analysis benefits from variability estimates such as standard errors of effect estimates and variability bars to accompany curve estimates. We show how the underlying variance calculations can be done extremely efficiently compared with the direct naive approach. These streamlined calculations are linear in the number of subjects, representing a two orders of magnitude improvement.
Ormerod, J.T., Wand, M. & Koch, I. 2008, 'Penalised spline support vector classifiers: computational issues', Computational Statistics, vol. 23, no. 4, pp. 623-641.
We study computational issues for support vector classification with penalised spline kernels. We show that, compared with traditional kernels, computational times can be drastically reduced in large problems making such problems feasible for sample sizes as large as ~10^6. The optimisation technology known as interior point methods plays a central role. Penalised spline kernels are also shown to allow simple incorporation of low-dimensional structure such as additivity. This can aid both interpretability and performance.
Ganguli, B. & Wand, M. 2007, 'Feature significance in generalized additive models', Statistics and Computing, vol. 17, no. 2, pp. 179-192.
This paper develops inference for the significance of features such as peaks and valleys observed in additive modeling through an extension of the SiZer-type methodology of Chaudhuri and Marron (1999) and Godtliebsen et al. (2002, 2004) to the case where the outcome is discrete. We consider the problem of determining the significance of features such as peaks or valleys in observed covariate effects both for the case of additive modeling where the main predictor of interest is univariate as well as the problem of studying the significance of features such as peaks, inclines, ridges and valleys when the main predictor of interest is geographical location. We work with low rank radial spline smoothers to allow the handling of sparse designs and large sample sizes. Reducing the problem to a Generalised Linear Mixed Model (GLMM) framework enables derivation of simulation-based critical value approximations and guards against the problem of multiple inferences over a range of predictor values. Such a reduction also allows for easy adjustment for confounders including those which have an unknown or complex effect on the outcome. A simulation study indicates that our method has satisfactory power. Finally, we illustrate our methodology on several data sets.
Wand, M. 2007, 'Fisher information for generalised linear mixed models', Journal of Multivariate Analysis, vol. 98, no. 7, pp. 1412-1416.
The Fisher information for the canonical link exponential family generalised linear mixed model is derived. The contribution from the fixed effects parameters is shown to have a particularly simple form.
Wand, M.P. & Ormerod, J.T. 2007, 'On semiparametric regression with O'Sullivan penalised splines'.
This is an exposé on the use of O'Sullivan penalised splines in contemporary semiparametric regression, including mixed model and Bayesian formulations. O'Sullivan penalised splines are similar to P-splines, but have an advantage of being a direct generalisation of smoothing splines. Exact expressions for the O'Sullivan penalty matrix are obtained. Comparison between the two reveals that O'Sullivan penalised splines more closely mimic the natural boundary behaviour of smoothing splines. Implementation in modern computing environments such as Matlab, R and BUGS is discussed.
Oakes, S.R., Robertson, F.G., Kench, J.G., Gardiner-Garden, M., Wand, M.P., Green, J.E. & Ormandy, C.J. 2007, 'Loss of mammary epithelial prolactin receptor delays tumor formation by reducing cell proliferation in low-grade preinvasive lesions.', Oncogene, vol. 26, no. 4, pp. 543-553.
Top quartile serum prolactin levels confer a twofold increase in the relative risk of developing breast cancer. Prolactin exerts this effect at an ill-defined point in the carcinogenic process, via mechanisms involving direct action via prolactin receptors within mammary epithelium and/or indirect action through regulation of other hormones such as estrogen and progesterone. We have addressed these questions by examining mammary carcinogenesis in transplants of mouse mammary epithelium expressing the SV40T oncogene, with or without the prolactin receptor, using host animals with a normal endocrine system. In prolactin receptor knockout transplants the area of neoplasia was significantly smaller (7 versus 17%; P < 0.001 at 22 weeks and 7 versus 14%; P = 0.009 at 32 weeks). Low-grade neoplastic lesions displayed reduced BrdU incorporation rate (11.3 versus 17%; P = 0.003) but no change in apoptosis rate. Tumor latency increased (289 days versus 236 days, P < 0.001). Tumor frequency, growth rate, morphology, cell proliferation and apoptosis were not altered. Thus, prolactin acts directly on the mammary epithelial cells to increase cell proliferation in preinvasive lesions, resulting in more neoplasia and acceleration of the transition to invasive carcinoma. Targeting of mammary prolactin signaling thus provides a strategy to prevent the early progression of neoplasia to invasive carcinoma.
Ganguli, B. & Wand, M.P. 2006, 'Additive models for geo-referenced failure time data.', Stat Med, vol. 25, no. 14, pp. 2469-2482.
Asthma researchers have found some evidence that geographical variations in susceptibility to asthma could reflect the effect of community level factors such as exposure to violence. Our methodology was motivated by a study of age at onset of asthma among children of inner-city neighbourhoods in East Boston. Cox's proportional hazards model was not appropriate since there was not enough information about the nature of geographical variations so as to impose a parametric relationship. In addition, some of the known risk factors were believed to have non-linear log-hazard ratios. We extend the geoadditive models of Kammann and Wand to the case where the outcome measure is a possibly censored time to event. We reduce the problem to one of fitting a Poisson mixed model by using Poisson approximations in conjunction with a mixed model formulation of generalized additive modelling. Our method allows for low-rank additive modelling, provides likelihood-based estimation of all parameters including the amount of smoothing and can be implemented using standard software. We illustrate our method on the East Boston data.
Zhao, Y., Staudenmayer, J., Coull, B.A. & Wand, M.P. 2006, 'General Design Bayesian Generalized Linear Mixed Models', Statistical Science, vol. 21, no. 1, pp. 35-51.
Linear mixed models are able to handle an extraordinary range of complications in regression-type analyses. Their most common use is to account for within-subject correlation in longitudinal data analysis. They are also the standard vehicle for smoothing spatial count data. However, when treated in full generality, mixed models can also handle spline-type smoothing and closely approximate kriging. This allows for nonparametric regression models (e.g., additive models and varying coefficient models) to be handled within the mixed model framework. The key is to allow the random effects design matrix to have general structure; hence our label general design. For continuous response data, particularly when Gaussianity of the response is reasonably assumed, computation is now quite mature and supported by the R, SAS and S-PLUS packages. Such is not the case for binary and count responses, where generalized linear mixed models (GLMMs) are required, but are hindered by the presence of intractable multivariate integrals. Software known to us supports special cases of the GLMM (e.g., PROC NLMIXED in SAS or glmmML in R) or relies on the sometimes crude Laplace-type approximation of integrals (e.g., the SAS macro glimmix or glmmPQL in R). This paper describes the fitting of general design generalized linear mixed models. A Bayesian approach is taken and Markov chain Monte Carlo (MCMC) is used for estimation and inference. In this generalized setting, MCMC requires sampling from nonstandard distributions. In this article, we demonstrate that the MCMC package WinBUGS facilitates sound fitting of general design Bayesian generalized linear mixed models in practice.
Werneck, G.L., Costa, C.H., Walker, A.M., David, J.R., Wand, M. & Maguire, J.H. 2006, 'Multilevel modelling of the incidence of visceral leishmaniasis in Teresina, Brazil', Epidemiology and Infection, vol. 135, no. 2, pp. 195-201.
Epidemics of visceral leishmaniasis (VL) in major Brazilian cities are new phenomena since 1980. As determinants of transmission in urban settings probably operate at different geographic scales, and information is not available for each scale, a multilevel approach was used to examine the effect of canine infection and environmental and socio-economic factors on the spatial variability of incidence rates of VL in the city of Teresina. Details on an outbreak of more than 1200 cases of VL in Teresina during 1993–1996 were available at two hierarchical levels: census tracts (socio-economic characteristics, incidence rates of human VL) and districts, which encompass census tracts (prevalence of canine infection). Remotely sensed data obtained by satellite generated environmental information at both levels. Data from census tracts and districts were analysed simultaneously by multilevel modelling. Poor socio-economic conditions and increased vegetation were associated with a high incidence of human VL. Increasing prevalence of canine infection also predicted a high incidence of human VL, as did high prevalence of canine infection before and during the epidemic. Poor socio-economic conditions had an amplifying effect on the association between canine infection and the incidence of human VL. Focusing interventions on areas with characteristics identified by multilevel analysis could be a cost-effective strategy for controlling VL. Because risk factors for infectious diseases operate simultaneously at several levels and ecological data usually are available at different geographical scales, multilevel modelling is a valuable tool for epidemiological investigation of disease transmission.
Wand, M. 2006, 'Support vector machine classification', Parabola, vol. 42, no. 2, pp. 21-37.
Pearce, N.D. & Wand, M. 2006, 'Penalized splines and reproducing kernel methods', The American Statistician, vol. 60, no. 3, pp. 233-240.
Two data analytic research areas, penalized splines and reproducing kernel methods, have become very vibrant since the mid-1990s. This article shows how the former can be embedded in the latter via theory for reproducing kernel Hilbert spaces. This connection facilitates cross-fertilization between the two bodies of research. In particular, connections between support vector machines and penalized splines are established. These allow for significant reductions in computational complexity, and easier incorporation of special structure such as additivity.
Salganik, M.P., Hardie, D.L., Swart, B., Dandie, G.W., Zola, H., Shaw, S., Shapiro, H., Tinckam, K., Milford, E.L. & Wand, M.P. 2005, 'Detecting antibodies with similar reactivity patterns in the HLDA8 blind panel of flow cytometry data.', J Immunol Methods, vol. 305, no. 1, pp. 67-74.
The blind panel collected for the 8th Human Leucocyte Differentiation Antigens Workshop (HLDA8) included 49 antibodies of known CD specificities and 76 antibodies of unknown specificity. We have identified groups of antibodies showing similar patterns of reactivity that need to be investigated by biochemical methods to evaluate whether the antibodies within these groups are reacting with the same molecule. Our approach to data analysis was based on the work of Salganik et al. (in press) [Salganik, M.P., Milford E.L., Hardie D.L., Shaw, S., Wand, M.P., in press. Classifying antibodies using flow cytometry data: class prediction and class discovery. Biometrical Journal].
Salganik, M.P., Milford, E.L., Hardie, D.L., Shaw, S. & Wand, M. 2005, 'Classifying antibodies using flow cytometry data: class prediction and class discovery', Biometrical Journal, vol. 47, no. 5, pp. 740-754.
Classifying monoclonal antibodies, based on the similarity of their binding to the proteins (antigens) on the surface of blood cells, is essential for progress in immunology, hematology and clinical medicine. The collaborative efforts of researchers from many countries have led to the classification of thousands of antibodies into 247 clusters of differentiation (CD). Classification is based on flow cytometry and biochemical data. In preliminary classifications of antibodies based on flow cytometry data, the object requiring classification (an antibody) is described by a set of random samples from unknown densities of fluorescence intensity. An individual sample is collected in the experiment, where a population of cells of a certain type is stained by the identical fluorescently marked replicates of the antibody of interest. Samples are collected for multiple cell types. The classification problems of interest include identifying new CDs (class discovery or unsupervised learning) and assigning new antibodies to the known CD clusters (class prediction or supervised learning). These problems have attracted limited attention from statisticians. We recommend a novel approach to the classification process in which a computer algorithm suggests to the analyst the subset of the most appropriate classifications of an antibody in class prediction problems or the most similar pairs/groups of antibodies in class discovery problems. The suggested algorithm speeds up the analysis of flow cytometry data by a factor of 10–20. This allows the analyst to focus on the interpretation of the automatically suggested preliminary classification solutions and on planning the subsequent biochemical experiments.
Crainiceanu, C.M., Ruppert, D., Claeskens, G. & Wand, M. 2005, 'Exact likelihood ratio tests for penalised splines', Biometrika, vol. 92, no. 1, pp. 91-103.
Penalised-spline-based additive models allow a simple mixed model representation where the variance components control departures from linear models. The smoothing parameter is the ratio of the random-coefficient and error variances and tests for linear regression reduce to tests for zero random-coefficient variances. We propose exact likelihood and restricted likelihood ratio tests for testing polynomial regression versus a general alternative modelled by penalised splines. Their spectral decompositions are used as the basis of fast simulation algorithms. We derive the asymptotic local power properties of the tests under weak conditions. In particular we characterise the local alternatives that are detected with asymptotic probability one. Confidence intervals for the smoothing parameter are obtained by inverting the tests for a fixed smoothing parameter versus a general alternative. We discuss F and R tests and show that ignoring the variability in the smoothing parameter estimator can have a dramatic effect on their null distributions. The powers of several known tests are investigated and a small set of tests with good power properties is identified. The restricted likelihood ratio test is among the best in terms of power.
Swart, B., Salganik, M.P., Wand, M.P., Tinckam, K., Milford, E.L., Drbal, K., Angelisova, P., Horejsi, V., Macardle, P., Bailey, S., Hollemweguer, E., Hodge, G., Nairn, J., Millard, D., Dagdeviren, A., Dandie, G.W., Zola, H. & HLDA8 blind panel 2005, 'The HLDA8 blind panel: findings and conclusions.', J Immunol Methods, vol. 305, no. 1, pp. 75-83.
There were over 600 antibodies submitted to HLDA8, with many of unknown specificity. Of these, 101 antibodies were selected for a blind panel study that also included 5 negative controls and 27 positive controls of known CD specificity making a total of 133 antibodies in the final panel. Of the 101 unknowns, 31 antibodies were identified during the course of this blind panel study as being specific for known molecules and included some specific for MHC class II antigens, CD45 isoforms and the Dombrock antigen. Several antibody pairs among those in the blind panel were found to have very similar staining patterns and were therefore compared by immunohistochemical and/or Western blot analyses for identity.
Ganguli, B., Staudenmayer, J. & Wand, M. 2005, 'Additive models with predictors subject to measurement error', Australian & New Zealand Journal of Statistics, vol. 47, no. 2, pp. 193-202.
This paper develops a likelihood-based method for fitting additive models in the presence of measurement error. It formulates the additive model using the linear mixed model representation of penalized splines. In the presence of a structural measurement error model, the resulting likelihood involves intractable integrals, and a Monte Carlo expectation maximization strategy is developed for obtaining estimates. The method's performance is illustrated with a simulation study.
Crainiceanu, C.M., Ruppert, D. & Wand, M. 2005, 'Bayesian analysis for penalized spline regression using WinBUGS', Journal of Statistical Software, vol. 14, no. 14.
Penalized splines can be viewed as BLUPs in a mixed model framework, which allows the use of mixed model software for smoothing. Thus, software originally developed for Bayesian analysis of mixed models can be used for penalized spline regression. Bayesian inference for nonparametric models enjoys the flexibility of nonparametric models and the exact inference provided by the Bayesian inferential machinery. This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS. Good mixing properties of the MCMC chains are obtained by using low-rank thin-plate splines, while simulation times per iteration are reduced by employing WinBUGS-specific computational tricks.
Durbán, M., Harezlak, J., Wand, M.P. & Carroll, R.J. 2005, 'Simple fitting of subject-specific curves for longitudinal data', Stat Med, vol. 24, no. 8, pp. 1153-1167.
We present a simple semiparametric model for fitting subject-specific curves for longitudinal data. Individual curves are modelled as penalized splines with random coefficients. This model has a mixed model representation, and it is easily implemented in standard statistical software. We conduct an analysis of the long-term effect of radiation therapy on the height of children suffering from acute lymphoblastic leukaemia using penalized splines in the framework of semiparametric mixed effects models. The analysis revealed significant differences between therapies and showed that the growth rate of girls in the study cannot be fully explained by the group-average curve and that individual curves are necessary to reflect the individual response to treatment. We also show how to implement these models in S-PLUS and R in the appendix.
Wright, R., Finn, P., Contreras, J.P., Cohen, S., Wright, R.O., Staudenmayer, J., Wand, M., Perkins, D., Weiss, S. & Gold, D.R. 2004, 'Chronic caregiver stress and IgE expression, allergen-induced proliferation, and cytokine profiles in a birth cohort predisposed to atopy', Journal Of Allergy And Clinical Immunology, vol. 113, no. 6, pp. 1051-1057.
Myatt, T.A., Johnston, S.J., Zhengfa, Z., Wand, M., Kebadze, T., Rudnick, S. & Milton, D.K. 2004, 'Detection of airborne rhinovirus and its relation to outdoor air supply in office environments', American Journal of Respiratory and Critical Care Medicine, vol. 169, pp. 1187-1190.
Ganguli, B. & Wand, M. 2004, 'Feature significance in Geostatistics', Journal of Computational and Graphical Statistics, vol. 13, no. 4, pp. 954-973.
French, J.L. & Wand, M.P. 2004, 'Generalized additive models for cancer mapping with incomplete covariates.', Biostatistics, vol. 5, no. 2, pp. 177-191.
Maps depicting cancer incidence rates have become useful tools in public health research, giving valuable information about the spatial variation in rates of disease. Typically, these maps are generated using count data aggregated over areas such as counties or census blocks. However, with the proliferation of geographic information systems and related databases, it is becoming easier to obtain exact spatial locations for the cancer cases and suitable control subjects. The use of such point data allows us to adjust for individual-level covariates, such as age and smoking status, when estimating the spatial variation in disease risk. Unfortunately, such covariate information is often subject to missingness. We propose a method for mapping cancer risk when covariates are not completely observed. We model these data using a logistic generalized additive model. Estimates of the linear and non-linear effects are obtained using a mixed effects model representation. We develop an EM algorithm to account for missing data and the random effects. Since the expectation step involves an intractable integral, we estimate the E-step with a Laplace approximation. This framework provides a general method for handling missing covariate values when fitting generalized additive models. We illustrate our method through an analysis of cancer incidence data from Cape Cod, Massachusetts. These analyses demonstrate that standard complete-case methods can yield biased estimates of the spatial variation of cancer risk.
Ngo, L. & Wand, M. 2004, 'Smoothing with mixed model software', Journal of Statistical Software, vol. 9, no. 1.
Salganik, M.P., Wand, M. & Lange, N. 2004, 'Comparison of feature significance quantile approximations', Australian & New Zealand Journal of Statistics, vol. 46, pp. 569-581.
Kammann, E.E. & Wand, M. 2003, 'Geoadditive models', Journal of the Royal Statistical Society Series C: Applied Statistics, vol. 52, no. 1, pp. 1-18.
Wand, M. 2003, 'Smoothing and mixed models', Computational Statistics, vol. 18, pp. 223-249.
Kim, J.Y., Hauser, R., Wand, M.P., Herrick, R.F., Houk, R.S., Aeschliman, D.B., Woodin, M.A. & Christiani, D.C. 2003, 'Association of expired nitric oxide with urinary metal concentrations in boilermakers exposed to residual oil fly ash.', Am J Ind Med, vol. 44, no. 5, pp. 458-466.
Exposure to metal-containing particulate matter has been associated with adverse pulmonary responses. Metals in particulate matter are soluble, hence are readily recovered in urine of exposed individuals. This study investigated the association between urinary metal concentrations and the fractional concentration of expired nitric oxide (F(E)NO) in boilermakers (N = 32) exposed to residual oil fly ash (ROFA).
Kim, J.Y., Hauser, R., Wand, M.P., Herrick, R.F., Amarasiriwardena, C.J. & Christiani, D.C. 2003, 'The association of expired nitric oxide with occupational particulate metal exposure.', Environ Res, vol. 93, no. 2, pp. 158-166.
Toxicologic studies have shown that soluble transition metals in residual oil fly ash (ROFA) can induce pulmonary injury. In this study, we investigated the association between the fractional concentration of expired nitric oxide (FENO) and exposure to metal constituents of particulate matter with an aerodynamic mass median diameter ≤ 2.5 μm (PM2.5) in boilermakers exposed to ROFA and metal fume. Metals investigated included vanadium, chromium, manganese, nickel, copper, and lead. Subjects were monitored for 5 consecutive days during boiler repair overhauls in 1999 (n=20) and 2000 (n=14). In 1999, we found a significant inverse association between log-transformed FENO and PM2.5 metal concentrations. LogFENO changed by -0.03 (95% CI: -0.04, -0.01), -0.56 (95% CI: -0.88, -0.24), -0.09 (95% CI: -0.16, -0.02), and -0.04 (95% CI: -0.07, -0.02) per μg/m³ of PM2.5 vanadium, chromium, manganese, and nickel, respectively. In 2000, no significant associations were observed, most likely due to exposure misclassification resulting from the use of respirators. The inverse association between PM2.5 metal exposure and FENO in subjects with limited respirator usage suggests that soluble transition metals might be partially responsible for the adverse pulmonary responses seen in workers exposed to ROFA.
Hauser, R., Rice, T.M., Krishha Murthy, G.G., Wand, M., Lewis, D., Bledsoe, T. & Paulauskis, J. 2003, 'The upper airway response to pollen is enhanced by exposure to combustion particulates: A pilot human experimental challenge study', Environmental Health Perspectives, vol. 111, no. 5, pp. 676-680.
Kim, J.Y., Wand, M., Hauser, R., Mukherjee, S., Herrick, R.F. & Christiani, D.C. 2003, 'Association of expired nitric oxide with occupational particulate exposure', Environmental Health Perspectives, vol. 111, no. 4, pp. 472-477.
Cai, T., Hyndman, R.J. & Wand, M. 2002, 'Mixed model-based hazard estimation', Journal of Computational and Graphical Statistics, vol. 11, no. 4, pp. 784-798.
Aerts, M., Claeskens, G. & Wand, M. 2002, 'Some theory for penalized spline generalized additive models', Journal of Statistical Planning and Inference, vol. 103, no. 1-2, pp. 455-470.
Wand, M. 2002, 'Vector differential calculus in statistics', The American Statistician, vol. 56, no. 1, pp. 55-62.
Betensky, R.A., Lindsey, J.C., Ryan, L.M. & Wand, M.P. 2002, 'A local likelihood proportional hazards model for interval censored data.', Stat Med, vol. 21, no. 2, pp. 263-275.
We discuss the use of local likelihood methods to fit proportional hazards regression models to right and interval censored data. The assumed model allows for an arbitrary, smoothed baseline hazard on which a vector of covariates operates in a proportional manner, and thus produces an interpretable baseline hazard function along with estimates of global covariate effects. For estimation, we extend the modified EM algorithm suggested by Betensky, Lindsey, Ryan and Wand. We illustrate the method with data on times to deterioration of breast cosmeses and HIV-1 infection rates among haemophiliacs.
Werneck, G.L., Costa, C.H., Walker, A.M., David, J.R., Wand, M. & Maguire, J.H. 2002, 'The urban spread of visceral leishmaniasis: Clues from spatial analysis', Epidemiology, vol. 13, no. 3, pp. 364-367.
Background. The pattern of spread of visceral leishmaniasis in Brazilian cities is poorly understood. Methods. We used geographic information systems and spatial statistics to evaluate the distribution of 1061 cases of visceral leishmaniasis in Teresina, Brazil, in 1993 through 1996. Results. A locally weighted (LOESS) regression model, which was fit as a smoothed function of spatial coordinates, demonstrated large-scale variation, with high incidence rates in peripheral neighborhoods that bordered forest land and pastures. Moran's I indicated small-scale variation and clustering up to 300 m, roughly the flight range of the sand fly vector. Conclusions. Spatial analytical techniques can identify high-risk areas for targeting control interventions.
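For readers unfamiliar with the Moran's I statistic named in this abstract, a minimal sketch of its standard global form follows. The toy data and the simple "adjacent neighbours on a line" weight matrix are assumptions for illustration only; the study itself used distance-based spatial structure that is not reproduced here, and the helper names are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij d_i d_j / sum_i d_i^2,
    where d_i is the deviation of value i from the mean and W is the
    sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / total_weight) * (cross / denom)


def chain_weights(n):
    """Hypothetical weight matrix: sites on a line, neighbours are adjacent."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


w = chain_weights(4)
trend = morans_i([1.0, 2.0, 3.0, 4.0], w)          # similar neighbours
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # dissimilar neighbours
print(trend, alternating)
```

Positive values indicate that neighbouring sites tend to have similar values (clustering), and negative values indicate that neighbours tend to differ.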
Mammen, E., Marron, J.S., Turlach, B.A. & Wand, M. 2001, 'A general projection framework for constrained smoothing', Statistical Science, vol. 16, no. 3, pp. 232-248.
Parise, H., Wand, M., Ruppert, D. & Ryan, L.M. 2001, 'Incorporation of historical controls using semiparametric mixed models', Journal of the Royal Statistical Society Series C: Applied Statistics, vol. 50, no. 1, pp. 31-42.
Coull, B.A., Schwartz, J. & Wand, M.P. 2001, 'Respiratory health and air pollution: additive mixed model analyses.', Biostatistics, vol. 2, no. 3, pp. 337-349.
We conduct a reanalysis of data from the Utah Valley respiratory health/air pollution study of Pope and co-workers (Pope et al., 1991) using additive mixed models. A relatively recent statistical development (e.g. Wang, 1998; Verbyla et al., 1999; Lin and Zhang, 1999), the methods allow for smooth functional relationships, subject-specific effects and time series error structure. All three of these are apparent in the Utah Valley data.
Coull, B.A., Ruppert, D. & Wand, M.P. 2001, 'Simple incorporation of interactions into additive models.', Biometrics, vol. 57, no. 2, pp. 539-545.
Often, the functional form of covariate effects in an additive model varies across groups defined by levels of a categorical variable. This structure represents a factor-by-curve interaction. This article presents penalized spline models that incorporate factor-by-curve interactions into additive models. A mixed model formulation for penalized splines allows for straightforward model fitting and smoothing parameter selection. We illustrate the proposed model by applying it to pollen ragweed data in which seasonal trends vary by year.
Moore, P.E., Laporte, J.D., Abraham, J.H., Schwartzman, I.N., Yandava, C.N., Silverman, E.S., Drazen, J.M., Wand, M.P., Panettieri, R.A. & Shore, S.A. 2000, 'Polymorphism of the beta(2)-adrenergic receptor gene and desensitization in human airway smooth muscle.', Am J Respir Crit Care Med, vol. 162, no. 6, pp. 2117-2124.
We examined the influence of two common polymorphic forms of the beta(2)-adrenergic receptor (beta(2)AR): the Gly16 and Glu27 alleles, on acute and long-term beta(2)AR desensitization in human airway smooth muscle (HASM) cells. In cells from 15 individuals, considered without respect to genotype, pretreatment with Isoproterenol (ISO) at 10^-7 M for 1 h or 24 h caused approximately 25% and 64% decreases in the ability of subsequent ISO (10^-6 M) stimulation to reduce HASM cell stiffness as measured by magnetic twisting cytometry. Similar results were obtained with ISO-induced cyclic adenosine monophosphate (cAMP) as the outcome indicator. Data were then stratified post hoc by genotype. Cells containing at least one Glu27 allele (equivalent to presence of the Gly16Glu27 haplotype) showed significantly greater acute desensitization than did cells with no Glu27 allele, whether ISO-induced cell stiffness (34% versus 19%, p < 0.03) or cAMP formation (58% versus 11%, p < 0.02) was measured. Likewise, cells with any Glu27 allele showed greater long-term desensitization of cell stiffness and cAMP formation responses than did cells without the Glu27 allele. The distribution of genotypes limited direct conclusions about the influence of the Gly16 allele. However, presence of the Gly16Gln27 haplotype was associated with less acute and long-term desensitization of ISO-induced cAMP formation than was seen in cells without the Gly16Gln27 haplotype (14% versus 47%, p < 0.09 for short-term desensitization; 32% versus 84%, p < 0.01 for long-term desensitization), suggesting that the influence of Glu27 is not through its association with Gly16. The Glu27 allele was in strong linkage disequilibrium with the Arg19 allele, a polymorphic form of the beta(2)AR upstream peptide of the 5'-leader cistron of the beta(2)AR, and this polymorphism in the beta(2)AR 5'-flanking region may explain the effects of the Glu27 allele. Cells with any Arg19 allele showed significantly greater acute and long-term desensitization of ISO-induced cAMP formation than did cells without the Arg19 allele (54% versus 2%, p < 0.01 for short-term desensitization; 73% versus 35%, p < 0.05 for long-term desensitization). Similar results were obtained for ISO-induced changes in cell stiffness. Thus, the presence of the Glu27 allele is associated with increased acute and long-term desensitization in HASM.
Wand, M. 2000, 'A comparison of regression spline smoothing procedures', Computational Statistics, vol. 15, no. 4, pp. 443-462.
Zanobetti, A., Wand, M.P., Schwartz, J. & Ryan, L.M. 2000, 'Generalized additive distributed lag models: quantifying mortality displacement.', Biostatistics, vol. 1, no. 3, pp. 279-292.
There are a number of applied settings where a response is measured repeatedly over time, and the impact of a stimulus at one time is distributed over several subsequent response measures. In the motivating application the stimulus is an air pollutant such as airborne particulate matter and the response is mortality. However, several other variables (e.g. daily temperature) impact the response in a possibly non-linear fashion. To quantify the effect of the stimulus in the presence of covariate data we combine two established regression techniques: generalized additive models and distributed lag models. Generalized additive models extend multiple linear regression by allowing for continuous covariates to be modeled as smooth, but otherwise unspecified, functions. Distributed lag models aim to relate the outcome variable to lagged values of a time-dependent predictor in a parsimonious fashion. The resultant, which we call generalized additive distributed lag models, are seen to effectively quantify the so-called 'mortality displacement effect' in environmental epidemiology, as illustrated through air pollution/mortality data from Milan, Italy.
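The idea of a distributed lag, in which a stimulus at one time contributes to the response over several later times, can be sketched as follows. This is only an illustration under stated assumptions: the lag coefficients are constrained to a polynomial in the lag (one common parsimonious choice, assumed here rather than taken from the abstract), the function names are hypothetical, and the smooth covariate terms of the generalized additive part are omitted entirely.

```python
def polynomial_lag_coefficients(gamma, max_lag):
    """Lag coefficients beta_l = sum_j gamma[j] * l**j for l = 0..max_lag.

    A few polynomial parameters (gamma) determine all lag coefficients,
    which is what makes the model parsimonious.
    """
    return [
        sum(g * lag ** j for j, g in enumerate(gamma))
        for lag in range(max_lag + 1)
    ]


def distributed_lag_effect(x, beta):
    """Contribution sum_l beta[l] * x[t - l] at each time t.

    Lags that would reach before the start of the series are skipped.
    """
    effect = []
    for t in range(len(x)):
        total = 0.0
        for lag, b in enumerate(beta):
            if t - lag >= 0:
                total += b * x[t - lag]
        effect.append(total)
    return effect


# Linear decay over lags 0..3 (illustrative numbers, not estimates).
beta = polynomial_lag_coefficients([1.0, -0.25], max_lag=3)
# A single unit pulse of the stimulus at time 2.
pulse = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
effect = distributed_lag_effect(pulse, beta)
print(beta)
print(effect)
```

Summing the effect of a unit pulse over time recovers the sum of the lag coefficients, which is the total effect of the stimulus across all lags.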
Wechsler, M.E., Grasemann, H., Deykin, A., Silverman, E.K., Yandava, C.N., Israel, E., Wand, M. & Drazen, J.M. 2000, 'Exhaled nitric oxide in patients with asthma: Association with NOS1 genotype', American Journal of Respiratory and Critical Care Medicine, vol. 162, pp. 2043-2047.
Thurston, S.W., Wand, M.P. & Wiencke, J.K. 2000, 'Negative binomial additive models', Biometrics, vol. 56, no. 1, pp. 139-144.
The generalized additive model is extended to handle negative binomial responses. The extension is complicated by the fact that the negative binomial distribution has two parameters and is not in the exponential family. The methodology is applied to data involving DNA adduct counts and smoking variables among ex-smokers with lung cancer. A more detailed investigation is made of the parametric relationship between the number of adducts and years since quitting while retaining a smooth relationship between adducts and the other covariates.
Wand, M. 1999, 'A central limit theorem for local polynomial backfitting estimators', Journal of Multivariate Analysis, vol. 70, pp. 57-65.
Opsomer, J.D., Ruppert, D., Wand, M.P., Holst, U. & Hössjer, O. 1999, 'Kriging with nonparametric variance function estimation', Biometrics, vol. 55, no. 3, pp. 704-710.
A method for fitting regression models to data that exhibit spatial correlation and heteroskedasticity is proposed. It is well known that ignoring a nonconstant variance does not bias least-squares estimates of regression parameters; thus, data analysts are easily led to the false belief that moderate heteroskedasticity can generally be ignored. Unfortunately, ignoring nonconstant variance when fitting variograms can seriously bias estimated correlation functions. By modeling heteroskedasticity and standardizing by estimated standard deviations, our approach eliminates this bias in the correlations. A combination of parametric and nonparametric regression techniques is used to iteratively estimate the various components of the model. The approach is demonstrated on a large data set of predicted nitrogen runoff from agricultural lands in the Midwest and Northern Plains regions of the U.S.A. For this data set, the model comprises three main components: (1) the mean function, which includes farming practice variables, local soil and climate characteristics, and the nitrogen application treatment, is assumed to be linear in the parameters and is fitted by generalized least squares; (2) the variance function, which contains a local and a spatial component whose shapes are left unspecified, is estimated by local linear regression; and (3) the spatial correlation function is estimated by fitting a parametric variogram model to the standardized residuals, with the standardization adjusting the variogram for the presence of heteroskedasticity. The fitting of these three components is iterated until convergence. The model provides an improved fit to the data compared with a previous model that ignored the heteroskedasticity and the spatial correlation.
Wand, M. 1999, 'On the optimal amount of smoothing in penalised spline regression', Biometrika, vol. 86, no. 4, pp. 936-940.
Gijbels, I., Pope, A. & Wand, M. 1999, 'Understanding exponential smoothing via kernel regression', Journal of The Royal Statistical Society Series B-methodological, vol. 61, no. 1, pp. 39-50.
Betensky, R.A., Lindsey, J.C., Ryan, L.M. & Wand, M.P. 1999, 'Local EM estimation of the hazard function for interval-censored data', Biometrics, vol. 55, no. 1, pp. 238-245.
We propose a smooth hazard estimator for interval-censored survival data using the method of local likelihood. The model is fit using a local EM algorithm. The estimator is more descriptive than traditional empirical estimates in regions of concentrated information and takes on a parametric flavor in regions of sparse information. We derive two different standard error estimates for the smooth curve, one based on asymptotic theory and the other on the bootstrap. We illustrate the local EM method for times to breast cosmesis deterioration (Finkelstein, 1986, Biometrics 42, 845-854) and for times to HIV-1 infection for individuals with hemophilia (Kroner et al., 1994, Journal of AIDS 7, 279-286). Our hazard estimates for each of these data sets show interesting structures that would not be found using a standard parametric hazard model or empirical survivorship estimates.
Augustyns, I. & Wand, M. 1998, 'Bandwidth selection for local polynomial smoothing of multinomial data', Computational Statistics, vol. 13, no. 4, pp. 447-461.
Wand, M. 1998, 'Finite sample performance of deconvolving density estimators', Statistics & Probability Letters, vol. 37, pp. 131-139.
Hyndman, R.J. & Wand, M. 1997, 'Nonparametric autocovariance function estimation', Australian Journal of Statistics, vol. 39, pp. 313-325.
Wand, M. 1997, 'Data-based choice of histogram bin width', The American Statistician, vol. 51, no. 1, pp. 59-64.
Wand, M. & Gutierrez, R.G. 1997, 'Exact risk approaches to smoothing parameter selection', Journal of Nonparametric Statistics, vol. 8, no. 4, pp. 337-354.
Carroll, R.J., Fan, J., Gijbels, I. & Wand, M. 1997, 'Generalized partially linear single-index models', Journal of the American Statistical Association, vol. 92, no. 438, pp. 477-489.
A semiparametric version of the generalized linear model for regression response was developed by replacing the linear combination with nonparametric components. The generalized partially linear single-index models were formed by combining simpler, conventional models such as single-index and partially linear models. Furthermore, the asymptotic distributions of the linear combination involving unknown parameters and an unknown function were obtained by using local linear methods.
Ruppert, D., Wand, M., Holst, U. & Hössjer, O. 1997, 'Local polynomial variance-function estimation', Technometrics, vol. 39, no. 3, pp. 262-273.
Turlach, B.A. & Wand, M. 1996, 'Fast computation of auxiliary quantities in local polynomial regression', Journal of Computational and Graphical Statistics, vol. 5, no. 4, pp. 337-350.
We investigate the extension of binning methodology to fast computation of several auxiliary quantities that arise in local polynomial smoothing. Examples include degrees of freedom measures, cross-validation functions, variance estimates, and exact measures of error. It is shown that the computational effort required for such approximations is of the same order of magnitude as that required for a binned local polynomial smooth.
Hall, P. & Wand, M. 1996, 'On the accuracy of binned kernel density estimators', Journal of Multivariate Analysis, vol. 56, pp. 165-184.
Gonzalez-Manteiga, W., Sanchez-Sellero, C. & Wand, M. 1996, 'Accuracy of binned kernel functional approximations', Computational Statistics and Data Analysis, vol. 22, no. 1, pp. 1-16.
Virtually all common bandwidth selection algorithms are based on a certain type of kernel functional estimator. Such estimators can be computationally very expensive, so in practice they are often replaced by fast binned approximations. This is especially worthwhile when the bandwidth selection method involves iteration. Results for the accuracy of these approximations are derived and then used to provide an understanding of the number of binning grid points required to achieve a given level of accuracy. Our results apply to both univariate and multivariate settings. Multivariate contexts are of particular interest since the cost due to having a higher number of grid points can be quite significant.
Herrmann, E., Wand, M., Engel, J. & Gasser, T. 1995, 'A bandwidth selector for bivariate kernel regression', Journal of The Royal Statistical Society Series B-methodological, vol. 57, no. 1, pp. 171-180.
Ruppert, D., Sheather, S.J. & Wand, M. 1995, 'An effective bandwidth selector for local least squares regression', Journal of the American Statistical Association, vol. 90, no. 432, pp. 1257-1270.
Local least squares kernel regression provides an appealing solution to the nonparametric regression, or "scatterplot smoothing," problem, as demonstrated by Fan, for example. The practical implementation of any scatterplot smoother is greatly enhanced by the availability of a reliable rule for automatic selection of the smoothing parameter. In this article we apply the ideas of plug-in bandwidth selection to develop strategies for choosing the smoothing parameter of local least squares kernel estimators. Our results are applicable to odd-degree local polynomial fits and can be extended to other settings, such as derivative estimation and multiple nonparametric regression. An implementation in the important case of local linear fits with univariate predictors is shown to perform well in practice. A by-product of our work is the development of a class of nonparametric variance estimators, based on local least squares ideas, and plug-in rules for their implementation.
Aldershof, B., Marron, J.S., Park, B.U. & Wand, M. 1995, 'Facts about the Gaussian probability density function', Applicable Analysis, vol. 59, no. 1, pp. 289-306.
Fan, J., Heckman, N.E. & Wand, M. 1995, 'Local polynomial kernel regression for generalized linear models and quasi-likelihood functions', Journal of the American Statistical Association, vol. 90, no. 429, pp. 141-150.
Wand, M. 1994, 'Fast computation of multivariate kernel estimators', Journal of Computational and Graphical Statistics, vol. 3, no. 4, pp. 433-445.
Ruppert, D. & Wand, M. 1994, 'Multivariate locally weighted least squares regression', Annals of Statistics, vol. 22, no. 3, pp. 1346-1370.
Wand, M. & Jones, M.C. 1994, 'Multivariate plug-in bandwidth selection', Computational Statistics, vol. 9, pp. 97-116.
Wand, M. & Jones, M.C. 1993, 'Comparison of smoothing parameterizations in bivariate kernel density estimation', Journal of the American Statistical Association, vol. 88, no. 422, pp. 520-528.
Wand, M. & Devroye, L. 1993, 'How easy is a given density to estimate?', Computational Statistics and Data Analysis, vol. 16, pp. 311-323.
Devroye, L. & Wand, M. 1993, 'On the influence of the density on the kernel estimate', Statistics, vol. 24, pp. 215-233.
Jones, M.C. & Wand, M. 1992, 'Asymptotic effectiveness of some higher order kernels', Journal of Statistical Planning and Inference, vol. 31, pp. 15-21.
Marron, J.S. & Wand, M. 1992, 'Exact mean integrated squared error', Annals of Statistics, vol. 20, no. 2, pp. 712-736.
Wand, M. 1992, 'Finite sample performance of density estimators under moving average dependence', Statistics & Probability Letters, vol. 13, pp. 109-115.
Wand, M. 1992, 'Error analyses for general multivariate kernel estimators', Journal of Nonparametric Statistics, vol. 2, pp. 1-15.
Ruppert, D. & Wand, M. 1992, 'Correcting for kurtosis in density estimation', Australian Journal of Statistics, vol. 34, pp. 19-29.
Wand, M., Marron, J.S. & Ruppert, D. 1991, 'Transformations in density estimation', Journal of the American Statistical Association, vol. 86, no. 414, pp. 343-353.
Carroll, R.J. & Wand, M. 1991, 'Semiparametric estimation in logistic measurement error models', Journal of The Royal Statistical Society Series B-methodological, vol. 53, no. 3, pp. 573-585.
Scott, D.W. & Wand, M. 1991, 'Feasibility of multivariate density estimates', Biometrika, vol. 78, no. 1, pp. 197-205.
Wand, M. 1990, 'On exact L1 rates of convergence in non-parametric kernel regression', Scandinavian Journal of Statistics, vol. 17, no. 3, pp. 251-256.
Härdle, W., Marron, J.S. & Wand, M. 1990, 'Bandwidth choice for density derivatives', Journal of The Royal Statistical Society Series B-methodological, vol. 52, no. 1, pp. 223-232.
Wand, M. & Schucany, W.R. 1990, 'Gaussian-based kernels', Canadian Journal of Statistics, vol. 18, no. 3, pp. 197-204.
Hall, P. & Wand, M. 1988, 'Minimizing L1 distance in nonparametric density estimation', Journal of Multivariate Analysis, vol. 26, pp. 59-88.
Hall, P. & Wand, M. 1988, 'On nonparametric discrimination using density differences', Biometrika, vol. 75, no. 3, pp. 541-547.
Hall, P. & Wand, M. 1988, 'On the minimization of absolute distance in kernel density estimation', Statistics & Probability Letters, vol. 6, pp. 311-314.