\documentclass[12pt]{article}
\usepackage{amssymb}
\usepackage{amsmath}
\setcounter{MaxMatrixCols}{10}

\begin{document}

\textbf{\large 1 The Bootstrap} \vspace{0.7cm}\newline
\textbf{\large 1.1 The Bootstrap Concept} \vspace{0.5cm}\newline
To start with we shall just consider samples \textbf{$Y$ }$=(Y_{1},$ $Y_{2},...,$ $Y_{n})$ and statistics $s(\mathbf{Y})$
calculated from them. The $Y_{i}$ may themselves be vectors, and they are not necessarily independent, though often they
will be. We shall assume initially, however, that the $Y_{i}$ are mutually independent so that \textbf{$Y$ }is a random
sample. (The case of several samples is considered in Section (\ref{Comparisons}) and the case when the $Y_{i}$ are
dependent will be discussed in Section (\ref{Timeseries}).)\vspace{0.3cm}\newline
Bootstrapping is simply a numerical way of answering the question: `What is $G(.),$ the probability distribution of a
statistic $s($\textbf{$Y$}$),$ calculated from a sample \textbf{$Y$}$=(Y_{1},$ $Y_{2},...,$ $Y_{n})$?'\vspace{0.3cm}\newline
If we knew $F(.),$ the distribution of $Y$, the above question, `What is $G(.)$?', would be easily answered numerically.
We simply generate a number, $B,$ of independent samples of \textbf{$Y$}$:$ $\mathbf{y}_{1},$ $\mathbf{y}_{2},$ $...,$
$\mathbf{y}_{B},$ and calculate the statistic $s_{j}=s(\mathbf{y}_{j})$ from each sample. Then the \emph{empirical
distribution function} (EDF) of the sample $\mathbf{s}=(s_{1},s_{2},...,s_{B})$,
\begin{equation*}
G_{n}(s\text{ }|\text{ }\mathbf{s})=\frac{\#\text{ of }s_{j}\leq s}{B},
\end{equation*}
will converge pointwise with probability one to $G(s).$ [Here and in what follows we shall use the notation $G_{n}(.)$
to denote the EDF of a sample of size $n$ drawn from the distribution $G(.).$ Where the precise sample $\mathbf{s}$ used
to construct $G_{n}$ needs to be made explicit we use the notation $G_{n}(.|$ $\mathbf{s})$ or $G_{n}(s|$ $\mathbf{s}).$]
The basic sampling process is thus: \newpage
\textbf{The Basic Sampling Process}\vspace{0.2cm}\newline
For $j=1$ to $B$ \vspace{0cm}\newline
\_For $i=1$ to $n$ \vspace{0cm}\newline
\_\_Draw $y_{ij}$ from $F(.)$ \vspace{0cm}\newline
\_Next $i$ \vspace{0cm}\newline
\_Calculate $s_{j}=s(\mathbf{y}_{j})$ \vspace{0cm}\newline
Next $j$ \vspace{0cm}\newline
Form $G_{n}(.|$ $\mathbf{s})$

\noindent The problem is that usually we do not know $F(.).$ However we do have the EDF, $F_{n}(.|$ $\mathbf{y}),$
formed from the sample $\mathbf{y}.$ The bootstrap idea is to use $F_{n}(.|$ $\mathbf{y})$ instead of $F(.)$ in the
above basic process. To obtain a sample we draw values, not from $F(.),$ but from $F_{n}(.|$ $\mathbf{y}).$ This is
equivalent to drawing a sample of the same size $n,$ \emph{with replacement,} from the original set of $y$'s.
We call such a sample a \emph{bootstrap sample} to distinguish it from the original sample, and write it as
$\mathbf{y}^{\ast }=(y_{1}^{\ast },$ $y_{2}^{\ast },$ $...,y_{n}^{\ast }).$ As in the basic process, $B$ such bootstrap
samples are drawn, and the statistic is calculated from each of these samples: $s_{j}^{\ast }=s(\mathbf{y}_{j}^{\ast })$.
The EDF, $G_{n}(.$ $|$ $\mathbf{s}^{\ast }),$ of these bootstrap statistics $s_{j}^{\ast },$ is our estimate of $G(.).$
The bootstrap process is as follows.\vspace{0.2cm}\newline
\textbf{The Bootstrap Sampling Process}\vspace{0.2cm}\newline
Given a random sample $\mathbf{y}=(y_{1},y_{2},...,y_{n})$ from $F(.)$\vspace{0.2cm}\newline
Form the EDF $F_{n}(.|\mathbf{y})$\vspace{0.2cm}\newline
For $j=1$ to $B$\vspace{0.2cm}\newline
\_For $i=1$ to $n$\vspace{0.2cm}\newline
\_\_Draw $y_{ij}^{\ast }$ from $F_{n}(.|\mathbf{y})$\vspace{0.2cm}\newline
\_Next $i$\vspace{0.2cm}\newline
\_Calculate $s_{j}^{\ast }=s(\mathbf{y}_{j}^{\ast })$\vspace{0.2cm}\newline
Next $j$\vspace{0.2cm}\newline
Form $G_{n}(.|\mathbf{s}^{\ast })$\vspace{0.2cm}\newline
\noindent The underlying hope and expectation is that the bootstrap process will under many conditions reproduce the
behaviour of the original process. \pagebreak

\textbf{\large 1.2 Basic Method} \vspace{0.5cm}\newline
We shall assume that the objective is to use the random sample $\mathbf{y=}$ $\{y_{1},$ $y_{2},$ $...,y_{n}\}$ to
estimate a parameter $\eta =\eta (F(.))$ that describes some quantity of interest of the distribution of $Y.$ Typical
examples are the mean
\begin{equation*}
\eta =\int_{-\infty }^{\infty }ydF(y)
\end{equation*}
or some other moment of $Y,$ or a quantile $\eta =q(p),$ defined by
\begin{equation*}
\int_{-\infty }^{q(p)}dF(y)=p.
\end{equation*}
The statistic $s($\textbf{$Y$}$)$ can then be regarded as any appropriate quantity that we may care to use for
estimating $\eta $. We shall emphasise this view by using the alternative but entirely equivalent notation
$\hat{\eta}($\textbf{$Y$}$)\equiv s($\textbf{$Y$}$),$ with the `hat' indicating an estimated quantity. Thus, when
$\eta $ is the mean, an obvious statistic is the sample mean
\begin{equation*}
s(\mathbf{Y})=\bar{Y}.
\end{equation*}
It is important to realise that we do not have to use the sample version as the estimator, assuming we can calculate it
at all. In the present example we might use a trimmed sample mean, or even the median, rather than the sample mean.
What we \emph{shall} do is regard the sample version of the parameter as being the quantity of interest in the
bootstrap process, i.e. the parameter of interest in the bootstrap process is
\begin{equation*}
\eta ^{\ast }=\eta (F_{n}(.)).
\end{equation*}
Thus we think of the bootstrap process as one where we are generating bootstrap samples $\mathbf{y}_{j}^{\ast }$ from
which bootstrap estimates $s_{j}^{\ast }=s(\mathbf{y}_{j}^{\ast })$ are obtained that estimate the bootstrap parameter
$\eta ^{\ast }.$ With this viewpoint, the bootstrap process is a prime example of the so-called \emph{plug-in approach},
being a precise analogue of the original process with the only difference being that the known $F_{n}(.)$ is plugged in
for, i.e. replaces, the unknown $F(.)$. The \emph{bootstrap principle} is that the bootstrap process reproduces the
properties of the original basic process.
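To make the bootstrap sampling process concrete, here is a minimal sketch in Python (assuming the \texttt{numpy}
library is available; the exponential sample, the choice of the sample mean as the statistic, and the value of $B$ are
purely illustrative):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=30)   # illustrative original sample
B = 2000                                  # number of bootstrap samples

def s(sample):
    """Statistic of interest; here the sample mean, purely for illustration."""
    return sample.mean()

# The bootstrap sampling process: draw with replacement from the EDF F_n(.|y)
s_star = np.empty(B)
for j in range(B):
    y_star = rng.choice(y, size=y.size, replace=True)
    s_star[j] = s(y_star)

# EDF G_n(.|s*) of the bootstrap statistics, evaluated at a point s0
def G_n(s0):
    return np.mean(s_star <= s0)

print(G_n(s(y)))   # proportion of bootstrap statistics <= the original estimate
\end{verbatim}
Here the function \texttt{G\_n} plays the role of $G_{n}(.|\mathbf{s}^{\ast })$ in the pseudocode above.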
Useful quantities that are obtained from the bootstrap process are the sample mean and variance of the bootstrap
estimates:
\begin{equation*}
\bar{s}^{\ast }=\frac{1}{B}\sum_{j=1}^{B}s_{j}^{\ast },
\end{equation*}
\begin{equation*}
\sigma ^{\ast 2}=\frac{1}{B-1}\sum_{j=1}^{B}(s_{j}^{\ast }-\bar{s}^{\ast })^{2}.
\end{equation*}
We also have a bootstrap estimate of the bias of $\hat{\eta}$: $b^{\ast }=\bar{s}^{\ast }-\eta ^{\ast }.$ A
\emph{bootstrap bias adjusted estimate} for $\eta $ is thus $\breve{\eta}=\hat{\eta}-b^{\ast }.$ If the bootstrap
expectation satisfies $E^{\ast }(s_{j}^{\ast })=\hat{\eta}$ (here $E^{\ast }$ denotes the conditional expectation
$E[s_{j}^{\ast }|$ $F_{n}(.|\mathbf{y})],$ where $F_{n}(.)$ is treated as being the parent distribution), then
$\bar{s}^{\ast }\rightarrow \hat{\eta}$ in probability as $B\rightarrow \infty ,$ and we have
\begin{equation*}
\breve{\eta}=\hat{\eta}-b^{\ast }=\hat{\eta}-(\bar{s}^{\ast }-\eta ^{\ast })\rightarrow \hat{\eta}-(\hat{\eta}-\eta ^{\ast })=\eta ^{\ast }.
\end{equation*}
Thus, in this case, adjusting the original estimator for bias using the bootstrap bias is equivalent to using the
bootstrap parameter $\eta ^{\ast }$ as the initial estimate.

\textbf{\large 1.3 The Double Bootstrap \& Bias Correction} \vspace{0.5cm}\newline
The bootstrap bias adjustment might not remove all the bias. At the expense of significantly increased computation, a
second application of the bootstrap can be made. This is known as \emph{the double bootstrap}, and consists of an outer
bootstrap that has exactly the structure of the initial bootstrap, but where each step within this outer bootstrap is
itself an (inner) bootstrap. The precise calculation is as follows.\vspace{0.2cm}\newline
\textbf{The Double Bootstrap (for Bias Correction)}\vspace{0.2cm}\newline
Given a random sample $\mathbf{y}=(y_{1},y_{2},...,y_{n})$ from $F(.)$\vspace{0.2cm}\newline
\textbf{Outer Bootstrap:}\vspace{0.2cm}\newline
For $j=1$ to $B$ \vspace{0.2cm}\newline
\_For $i=1$ to $n$ \vspace{0.2cm}\newline
\_\_Draw $y_{ij}^{\ast }$ from $F_{n}(.|\mathbf{y})$ \vspace{0.2cm}\newline
\_Next $i$ \vspace{0.2cm}\newline
\_Let $\hat{\eta}_{j}^{\ast }=s(\mathbf{y}_{j}^{\ast })$ \vspace{0.2cm}\newline
\_Let $\eta _{j}^{\ast \ast }=\eta (F_{n}(.|\mathbf{y}_{j}^{\ast }))$\vspace{0.2cm}\newline
\_\textbf{Inner Bootstrap:}\vspace{0.2cm}\newline
\_For $k=1$ to $B$ \vspace{0.2cm}\newline
\_\_For $i=1$ to $n$ \vspace{0.2cm}\newline
\_\_\_Draw $y_{ijk}^{\ast \ast }$ from $F_{n}(.|\mathbf{y}_{j}^{\ast })$ \vspace{0.2cm}\newline
\_\_Next $i$ \vspace{0.2cm}\newline
\_\_Calculate $s_{jk}^{\ast \ast }=s(\mathbf{y}_{jk}^{\ast \ast })$ \vspace{0.2cm}\newline
\_Next $k$ \vspace{0.2cm}\newline
\_Let $\bar{s}_{j\cdot }^{\ast \ast }=\frac{1}{B}\sum_{k=1}^{B}s_{jk}^{\ast \ast }$ \vspace{0.2cm}\newline
\_Let $b_{j}^{\ast \ast }=\bar{s}_{j\cdot }^{\ast \ast }-\eta _{j}^{\ast \ast }$ \vspace{0.2cm}\newline
\_Let $\breve{\eta}_{j}^{\ast }=\hat{\eta}_{j}^{\ast }-b_{j}^{\ast \ast }$ \vspace{0.2cm}\newline
\textbf{\_End of Inner Bootstrap} \vspace{0.2cm}\newline
Next $j$ \vspace{0.2cm}\newline
Let $\bar{\breve{\eta}}_{\cdot }^{\ast }=\frac{1}{B}\sum_{j=1}^{B}\breve{\eta}_{j}^{\ast }$ \vspace{0.2cm}\newline
Let $b^{\ast }=\bar{\breve{\eta}}_{\cdot }^{\ast }-\eta ^{\ast }$ \vspace{0.2cm}\newline
Let $\breve{\eta}=\hat{\eta}-b^{\ast }$ \vspace{0.2cm}\newline
\textbf{End of Outer Bootstrap} \vspace{0.2cm}\newline
\noindent Whilst this can lead to a reduction in the bias, the effect on the variance of the estimator is not so clear.
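To make the computational structure explicit, the double bootstrap for bias correction can be sketched as follows
(Python with \texttt{numpy}; the sample, the plug-in statistic and the value of $B$ are illustrative assumptions).
Note that the work grows as $B^{2}$:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=30)   # illustrative sample
B = 200                                   # kept small: the cost is O(B^2)

s = np.std          # plug-in statistic; also plays the role of eta(F_n)

eta_hat = s(y)                            # original estimate
eta_breve_star = np.empty(B)

for j in range(B):                        # outer bootstrap
    y_star = rng.choice(y, size=y.size, replace=True)
    eta_star_star_j = s(y_star)           # eta_j** = eta(F_n(.|y_j*))
    # inner bootstrap: bias of the estimator when y_j* plays the role of the data
    s_star_star = np.array([s(rng.choice(y_star, size=y_star.size, replace=True))
                            for _ in range(B)])
    b_star_star_j = s_star_star.mean() - eta_star_star_j
    eta_breve_star[j] = s(y_star) - b_star_star_j   # bias-adjusted eta_j*

eta_star = s(y)                           # bootstrap parameter eta* = eta(F_n)
b_star = eta_breve_star.mean() - eta_star
eta_breve = eta_hat - b_star              # double-bootstrap bias-adjusted estimate
print(eta_hat, eta_breve)
\end{verbatim}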
The most obvious penalty is the increase in computing cost. If a full numerical method is used to carry out the double
bootstrap then the computation increases quadratically with $B.$ The full double bootstrap is not always necessary. It
may be possible to avoid the inner bootstrap by using an estimator of $\eta $ that is already bias adjusted. For example,
if it is thought that the original estimator might be seriously biased then one might replace it immediately with $\eta
^{\ast }$ and then use bootstrap bias adjustment on $\eta ^{\ast }$ using the single bootstrap. The single bootstrap
yields an estimate of the variance of $\hat{\eta}^{\ast }.$ When the bias correction is small this same variance will of
course estimate the variance of $\breve{\eta}$ as well. However when the bias correction is large then the double
bootstrap has to be used.

\textbf{{\large 1.4 \label{Parametric}Parametric Bootstrap}} \vspace{0.5cm}\newline
We now consider parametric bootstrapping. Again we focus on a random sample $\mathbf{y}=(y_{1},$ $y_{2},$ $...,$
$y_{n})$ drawn from a distribution $F(y,$ $\mathbf{\theta }),$ with the difference that the form of $F$ is known but
that it depends on a vector $\mathbf{\theta }$ of parameters that is unknown. We denote the unknown true value of
$\mathbf{\theta }$ by $\mathbf{\theta }_{0}.$ It is then natural to carry out bootstrap sampling, not from the EDF
$F_{n}(y$ $|$ $\mathbf{y}),$ but from the distribution $F(y,$ $\mathbf{\hat{\theta}}),$ where $\mathbf{\hat{\theta}}$ is
an estimate of $\mathbf{\theta }_{0}.$ An obvious choice for $\mathbf{\hat{\theta}}$ is the \emph{maximum likelihood}
(ML) estimator because it has an asymptotic distribution that is easy to calculate. We define the log-likelihood by
\begin{equation}
L(\mathbf{\theta },\text{ }\mathbf{y})=\sum_{i=1}^{n}\log f(y_{i};\mathbf{\theta }),  \label{1}
\end{equation}
where $\mathbf{y}=(y_{1},$ $y_{2},$ $...,$ $y_{n}).$ The method of ML is to estimate $\mathbf{\theta }_{0}$ by
maximizing $L$, treated as a function of $\mathbf{\theta },$ subject to $\mathbf{\theta }\in \mathbf{\Theta },$ where
$\mathbf{\Theta }$ is some allowed region of the parameter space. For the remainder of this subsection
$\mathbf{\hat{\theta}}$ will represent the ML estimator. Regularity conditions which ensure that standard asymptotic
theory applies are given in Wilks (1962) or Cox and Hinkley (1974) for example. When they apply,
$\mathbf{\hat{\theta}}$ can be found by solving
$\left. \frac{\partial L(\mathbf{\theta })}{\partial \mathbf{\theta }}\right\vert _{\mathbf{\hat{\theta}}}=\mathbf{0}$.
Moreover
\begin{equation}
-n^{-1}\left. \frac{\partial ^{2}L(\mathbf{\theta })}{\partial \mathbf{\theta }^{2}}\right\vert _{\mathbf{\theta }_{0}}\rightarrow I(\mathbf{\theta }_{0}),  \label{lik2}
\end{equation}
\noindent a positive definite matrix, and the asymptotic distribution of $\mathbf{\hat{\theta}},$ as
$n\rightarrow \infty $, is
\begin{equation}
n^{1/2}(\mathbf{\hat{\theta}}-\mathbf{\theta }_{0})\text{ is }N(\mathbf{0},V(\mathbf{\theta }_{0}))  \label{Distnoftheta}
\end{equation}
where $V(\mathbf{\theta }_{0})=[I(\mathbf{\theta }_{0})]^{-1}.$ From (\ref{lik2}) a reasonable approximation for
$I(\mathbf{\theta }_{0})$ is
\begin{equation}
I(\mathbf{\hat{\theta}})\simeq -n^{-1}\left. \frac{\partial ^{2}L(\mathbf{\theta })}{\partial \mathbf{\theta }^{2}}\right\vert _{\mathbf{\hat{\theta}}}.
\label{Ihat}
\end{equation}
More generally suppose that $\eta $ is a smooth function of $\mathbf{\theta }$, i.e. $\eta =\eta (\mathbf{\theta }).$
The ML estimate of $\eta $ is then simply
\begin{equation*}
\hat{\eta}=\eta (\mathbf{\hat{\theta}}),
\end{equation*}
where $\mathbf{\hat{\theta}}$ is the ML estimate of $\mathbf{\theta }$. The asymptotic normality of $\hat{\eta}$
follows from that of $\mathbf{\hat{\theta}}$, assuming that $\eta (\mathbf{\theta })$ can be approximated by a linear
Taylor series in $\mathbf{\theta }.$ This method of obtaining the asymptotic behaviour of $\hat{\eta}$ is called the
\emph{delta method}. We have
\begin{equation}
n^{1/2}(\hat{\eta}-\eta _{0})\sim N(0,\sigma ^{2}(\mathbf{\theta }_{0}))  \label{etahatdist}
\end{equation}
with
\begin{equation}
\sigma ^{2}(\mathbf{\theta }_{0})=\mathbf{g}^{T}(\mathbf{\theta }_{0})V(\mathbf{\theta }_{0})\mathbf{g}(\mathbf{\theta }_{0})  \label{sigsq}
\end{equation}
where
\begin{equation}
\mathbf{g}(\mathbf{\theta }_{0})=\left. \frac{\partial \eta (\mathbf{\theta })}{\partial \mathbf{\theta }}\right\vert _{\mathbf{\theta }_{0}}.  \label{gradeta}
\end{equation}
An approximation for the distribution of $\hat{\eta}$ is given by (\ref{etahatdist}) with $\mathbf{\hat{\theta}}$ used
in place of the unknown $\mathbf{\theta }_{0}.$ The above asymptotic theory is firmly established and much used in
practice. For this reason parametric bootstrapping is not especially needed when parametric models are fitted to large
samples. However, for small samples Monte Carlo simulation is a well known method for examining the behaviour of
estimators and fitted distributions in parametric problems. Such methods are actually parametric bootstrapping under a
different name and as such have been known and used for a long time. In this sense parametric bootstrapping
significantly predates the direct non-parametric bootstrap. Parametric bootstrapping lacks the computational novelty of
direct sampling in that resampling is not involved. Perhaps it is because of this that the significance of parametric
bootstrapping was not fully appreciated, and its theoretical importance not recognized, until the landmark paper of
Efron (1979). If we view bootstrapping as a numerical sampling approach involving models that may include parameters,
then its potential is significantly extended.

{\large \noindent }\textbf{\large 2 Percentiles and Confidence Intervals} \vspace{0.5cm}\newline
\textbf{\large 2.1 Percentiles} \vspace{0.5cm}\newline
We shall denote the estimate of a percentile $q_{p}$, with percentile probability $p,$ by $\hat{q}_{p}(\mathbf{y}).$
The obvious choice for estimating a percentile non-parametrically is to use the corresponding percentile of the EDF.
There is a certain ambiguity in doing this as the EDF is a step function. Thus all percentiles in the interval
$(\frac{k-1}{n},\frac{k}{n})$ are estimated by the observed order statistic $y_{(k)}.$ Conversely the percentile
$\frac{k}{n}$ is estimated by any point $y$ for which $F_{n}(y)=k/n.$ Such ambiguities are removed if the EDF is
smoothed. One way is to use a kernel estimator of the density, but in most simulations this is perhaps unnecessarily
elaborate.
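The simple interpolation-based smoothing described next is straightforward to implement; the following minimal sketch
(Python with \texttt{numpy}; the sample and the probability $p$ are illustrative) interpolates the points
$(y_{(k)},k/(n+1))$ and inverts the interpolant to estimate a quantile:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
y = np.sort(rng.exponential(scale=2.0, size=50))   # ordered sample y_(1),...,y_(n)
n = y.size
p = 0.9                                            # quantile probability of interest

# Smoothed EDF: value k/(n+1) at the k-th order statistic, linearly interpolated
probs = np.arange(1, n + 1) / (n + 1)

# Invert by interpolating y as a function of the probability levels
q_hat = np.interp(p, probs, y)
print(q_hat)
\end{verbatim}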
The simplest smoothing procedure is to note that if $Y_{(k)}$ is the $k$th order statistic (as opposed to the observed
value $y_{(k)}$) then $E[F(Y_{(k)})]=k/(n+1),$ and to use this value to estimate the value of $F$ at the observed order
statistic $y_{(k)}.$ We can now simply interpolate between these estimated points $(y_{(k)},$ $k/(n+1))$ for
$k=1,2,...,n.$ The range can be extended to $(0,0)$ (using the line segment joining $(0,0)$ and $(y_{(1)},$ $1/(n+1))$)
if $Y$ is known to be a positive random variable. If we denote this smoothed version of $F_{n}(.)$ by $\tilde{F}_{n}(.)$
then an obvious estimator of $q_{p}$ is
\begin{equation*}
\hat{q}_{p}(\mathbf{y})=\tilde{F}_{n}^{-1}(p).
\end{equation*}
Estimating $F(.)$ in the range $(y_{(n)},\infty ),$ or equivalently $q_{p}$ for $p>n/(n+1),$ is not advisable unless
the tail behaviour of $F$ is known. The bootstrap analogue of $\hat{q}_{p}(\mathbf{y})$ is obtained in the usual way.
We draw $\mathbf{y}^{\ast }$, a bootstrap sample from $\mathbf{y},$ and construct $\tilde{F}_{n}(.|$ $\mathbf{y}^{\ast }),$
the smoothed version of the bootstrap EDF. [Where there is no ambiguity we shall write $F_{n}^{\ast }(.)$ for $F_{n}(.|$
$\mathbf{y}^{\ast })$ and $\tilde{F}_{n}^{\ast }(.)$ for $\tilde{F}_{n}(.|$ $\mathbf{y}^{\ast }).$] We can now calculate
$\hat{q}_{p}^{\ast }$ as $\hat{q}_{p}^{\ast }=\tilde{F}_{n}^{\ast -1}(p).$ The bootstrap parameter here is $\eta ^{\ast
}=\hat{q}_{p}(\mathbf{y}).$ In the parametric case, things are much easier. Quantiles are simply estimated from the
fitted distribution $F(.,\mathbf{\hat{\theta}}).$

\textbf{\large 2.2 Confidence Intervals by Direct Bootstrapping} \vspace{0.5cm}\newline
We now consider the construction of confidence intervals for a parameter of interest, $\eta ,$ and consider to what
extent the corresponding bootstrap process can be used to supply a confidence interval for this original parameter
$\eta .$ The key requirement is that the distribution of the bootstrap difference $\hat{\eta}^{\ast }-\eta ^{\ast }$
should be close to that of $\hat{\eta}-\eta .$ Roughly speaking we need
\begin{equation}
P^{\ast }(\sqrt{n}(\hat{\eta}^{\ast }-\eta ^{\ast })\leq y)-P(\sqrt{n}(\hat{\eta}-\eta )\leq y)\rightarrow 0  \label{p*-p}
\end{equation}
for any $y$ as $n\rightarrow \infty .$ (As before with $E^{\ast }$, $P^{\ast }$ denotes the conditional probability
evaluation treating $F_{n}(.|$ $\mathbf{y})$ as the parent distribution.) More precise statements of this convergence
(\ref{p*-p}) and other results are given in Section (\ref{Theory}). We can proceed as follows. We generate $B$
bootstrap samples $\mathbf{y}_{j}^{\ast },$ $j=1,2,...,B,$ and corresponding bootstrap estimates $\hat{\eta}_{j}^{\ast },$
$j=1,2,...,B.$ Then we select an appropriate confidence level $(1-2\alpha )$ (we use $2\alpha $ rather than $\alpha $
so that $\alpha $ corresponds to each of the two tail probabilities in two-sided intervals) and find values $a^{\ast }$
and $b^{\ast }$ for which
\begin{equation*}
P^{\ast }(a^{\ast }\leq \hat{\eta}^{\ast }-\eta ^{\ast }\leq b^{\ast })\simeq 1-2\alpha .
\end{equation*}
We then appeal to (\ref{p*-p}) to replace $\hat{\eta}^{\ast }-\eta ^{\ast }$ by $\hat{\eta}-\eta $ and obtain
\begin{equation*}
P(a^{\ast }\leq \hat{\eta}-\eta \leq b^{\ast })\simeq 1-2\alpha
\end{equation*}
which on inversion gives the approximate $(1-2\alpha )$ confidence interval
\begin{equation}
\hat{\eta}-b^{\ast }\leq \eta \leq \hat{\eta}-a^{\ast }  \label{basicCI}
\end{equation}
for $\eta .$ One way of obtaining $a^{\ast }$ and $b^{\ast }$ is as follows. Let the bootstrap estimates of the
$\alpha $ and $(1-\alpha )$ quantiles obtained from the smoothed EDF of the $\hat{\eta}_{j}^{\ast }$'s be
$\hat{q}_{\alpha }(\mathbf{\hat{\eta}}^{\ast })$ and $\hat{q}_{1-\alpha }(\mathbf{\hat{\eta}}^{\ast }).$ These are
estimates of the quantiles of the distribution of $\hat{\eta}^{\ast };$ so the corresponding quantiles of
$\hat{\eta}^{\ast }-\eta ^{\ast }$ are
\begin{equation}
a^{\ast }=\hat{q}_{\alpha }(\mathbf{\hat{\eta}}^{\ast })-\eta ^{\ast }\text{ and }b^{\ast }=\hat{q}_{1-\alpha }(\mathbf{\hat{\eta}}^{\ast })-\eta ^{\ast }.  \label{a*b*}
\end{equation}
Substituting these into (\ref{basicCI}) and noting that $\eta ^{\ast }=\hat{\eta}$ gives what is normally called the
\emph{basic bootstrap confidence interval}:
\begin{equation}
2\hat{\eta}-\hat{q}_{1-\alpha }(\mathbf{\hat{\eta}}^{\ast })\leq \eta \leq 2\hat{\eta}-\hat{q}_{\alpha }(\mathbf{\hat{\eta}}^{\ast }).  \label{basicBSCI}
\end{equation}
This confidence interval is also known as the \emph{hybrid bootstrap confidence interval} as we have essentially
replaced percentage points of the unknown original distribution by those of the smoothed bootstrap EDF. There is
substantial evidence that this basic bootstrap can be significantly improved (in terms of giving coverage that is
closer to the nominal level) if we use a \emph{pivotal} quantity, like a studentized mean, rather than focusing just on
the difference $\hat{\eta}-\eta $. The reason is that a pivotal quantity is less dependent (ideally not dependent at
all) on the form of the original distribution. This would mean that the confidence interval is less influenced by any
difference there may be in the distributions of $\hat{\eta}^{\ast }-\eta ^{\ast }$ and $\hat{\eta}-\eta .$ We consider
this next.

\textbf{\large 2.3 Studentization} \vspace{0.5cm}\newline
Suppose that $\hat{\sigma}^{2}(\mathbf{y}),$ an estimate of the variance of $\hat{\eta},$ can be calculated from the
sample $\mathbf{y}$. This will be the case for $\hat{\eta}=\bar{y}$ say, when we have
$\hat{\sigma}^{2}(\mathbf{y})=s^{2}/n,$ where $s^{2}$ is the sample variance. As before suppose we can find $a^{\ast }$
and $b^{\ast }$ such that
\begin{equation*}
P^{\ast }(a^{\ast }\leq (\hat{\eta}^{\ast }-\eta ^{\ast })/\hat{\sigma}^{\ast }\leq b^{\ast })\simeq 1-2\alpha .
\end{equation*}
If we then appeal to a similar result to (\ref{p*-p}) and replace $(\hat{\eta}^{\ast }-\eta ^{\ast })/\hat{\sigma}^{\ast }$
by $(\hat{\eta}-\eta )/\hat{\sigma},$ we obtain
\begin{equation*}
P(\hat{\sigma}a^{\ast }\leq \hat{\eta}-\eta \leq \hat{\sigma}b^{\ast })\simeq 1-2\alpha
\end{equation*}
which on inversion gives the approximate $(1-2\alpha )$ confidence interval
\begin{equation*}
\hat{\eta}-\hat{\sigma}b^{\ast }\leq \eta \leq \hat{\eta}-\hat{\sigma}a^{\ast }.
\end{equation*}
We can now calculate $a^{\ast }$ and $b^{\ast }$ by drawing $B$ bootstrap versions of $z=$ $(\hat{\eta}-\eta )/\hat{\sigma}:$
$z_{j}^{\ast }=(\hat{\eta}_{j}^{\ast }-\eta ^{\ast })/\hat{\sigma}_{j}^{\ast },$ $j=1,2,...,B.$ Let
$\hat{q}_{\alpha }(\mathbf{z}^{\ast })$ and $\hat{q}_{1-\alpha }(\mathbf{z}^{\ast })$ be the quantiles obtained from the
EDF of these $z_{j}^{\ast }.$ The confidence interval for $\eta $ is now
\begin{equation}
(\hat{\eta}_{\alpha }^{Stud},\text{ }\hat{\eta}_{1-\alpha }^{Stud})=(\hat{\eta}-\hat{\sigma}\hat{q}_{1-\alpha }(\mathbf{z}^{\ast }),\text{ }\hat{\eta}-\hat{\sigma}\hat{q}_{\alpha }(\mathbf{z}^{\ast })).  \label{studBSCI}
\end{equation}
This is usually called the \emph{studentized bootstrap confidence interval}. Studentized bootstrap intervals can be
readily used with the parametric model $F(y,\mathbf{\theta })$ of Section (\ref{Parametric}). Suppose that $\mathbf{y}$
is a random sample of size $n$ drawn from the distribution $F(y,\mathbf{\theta }_{0}),$ and that $\eta _{0}=\eta
(\mathbf{\theta }_{0})$ is the quantity of interest. We can estimate $\eta _{0}$ using $\hat{\eta}=\eta
(\mathbf{\hat{\theta}})$ where $\mathbf{\hat{\theta}}$ is the MLE of $\mathbf{\theta }_{0}.$ When $n$ is large we can
use the asymptotic approximation
\begin{equation*}
n^{1/2}(\hat{\eta}-\eta _{0})/\sigma (\mathbf{\hat{\theta}})\sim N(0,1).
\end{equation*}
When $n$ is not large it is worth employing bootstrapping. We draw $B$ bootstrap samples $\mathbf{y}_{j}^{\ast },$
$j=1,2,...,B,$ from the fitted distribution $F(y,\mathbf{\hat{\theta}}).$ From each sample we obtain the bootstrap ML
estimator $\mathbf{\hat{\theta}}_{j}^{\ast }$ and bootstrap studentized quantity $z_{j}^{\ast
}=n^{1/2}(\hat{\eta}_{j}^{\ast }-\eta ^{\ast })/\sigma (\mathbf{\hat{\theta}}_{j}^{\ast }).$ The quantiles
$\hat{q}_{\alpha }(\mathbf{z}^{\ast })$ and $\hat{q}_{1-\alpha }(\mathbf{z}^{\ast })$ can be obtained from the EDF of
these $z_{j}^{\ast }$ and the studentized interval (\ref{studBSCI}) constructed. In the non-parametric case an estimate
of the variance of $\hat{\eta}$ may not be immediately available. One possibility is to use the double bootstrap as
given previously (where $s=\hat{\eta})$ with the inner loop taking the form \vspace{0.2cm}\newline
\_\textbf{Inner Bootstrap:} \vspace{0.2cm}\newline
\_For $k=1$ to $B$ \vspace{0.2cm}\newline
\_\_For $i=1$ to $n$ \vspace{0.2cm}\newline
\_\_\_Draw $y_{ijk}^{\ast \ast }$ from $F_{n}(.|\mathbf{y}_{j}^{\ast })$ \vspace{0.2cm}\newline
\_\_Next $i$ \vspace{0.2cm}\newline
\_\_Let $s_{jk}^{\ast \ast }=s(\mathbf{y}_{jk}^{\ast \ast })$ \vspace{0.2cm}\newline
\_Next $k$ \vspace{0.2cm}\newline
\_Let $\hat{\eta}_{j}^{\ast }=s(\mathbf{y}_{j}^{\ast })$ \vspace{0.2cm}\newline
\_Let $\bar{s}_{j\cdot }^{\ast \ast }=\frac{1}{B}\sum_{k=1}^{B}s_{jk}^{\ast \ast }$ \vspace{0.2cm}\newline
\_Let $v_{j}^{\ast \ast }=\frac{1}{B-1}\sum_{k=1}^{B}(s_{jk}^{\ast \ast }-\bar{s}_{j\cdot }^{\ast \ast })^{2}$ \vspace{0.2cm}\newline
\_Let $z_{j}^{\ast }=(\hat{\eta}_{j}^{\ast }-\eta ^{\ast })/(v_{j}^{\ast \ast })^{1/2}$ \vspace{0.2cm}\newline
\textbf{\_End of Inner Loop} \vspace{0.2cm}\newline
\noindent If the double bootstrap is considered computationally too expensive, then an alternative using
\emph{influence functions} can be used provided our statistic is expressible as a functional of the EDF, i.e.
$s(\mathbf{y})=\eta \lbrack F_{n}(.|\mathbf{y})].$ Such statistics are termed \emph{statistical functions}, and were
introduced by von Mises (1947).
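Before turning to influence functions, here is a minimal sketch of the studentized interval (\ref{studBSCI}) with the
standard error of each bootstrap estimate obtained from an inner bootstrap, much as in the loop above (Python with
\texttt{numpy}; the sample, the statistic, $B$ and $\alpha $ are illustrative assumptions, and all quantities are kept
on the standard-error scale):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=30)   # illustrative sample
B, alpha = 200, 0.025
n = y.size
s = np.mean                               # statistic of interest

eta_hat = s(y)                            # also the bootstrap parameter eta*
sigma_hat = y.std(ddof=1) / np.sqrt(n)    # standard error estimate for the mean

z_star = np.empty(B)
for j in range(B):
    y_star = rng.choice(y, size=n, replace=True)
    # inner bootstrap to estimate the standard error of s(y_star)
    inner = np.array([s(rng.choice(y_star, size=n, replace=True)) for _ in range(B)])
    z_star[j] = (s(y_star) - eta_hat) / inner.std(ddof=1)

q_lo, q_hi = np.quantile(z_star, [alpha, 1 - alpha])
ci = (eta_hat - sigma_hat * q_hi, eta_hat - sigma_hat * q_lo)
print(ci)
\end{verbatim}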
We then assume that the relevant functional $\eta =\eta \lbrack F(.)]$ has a linearised form (akin to a first order
linear generalisation of a Taylor series): $\eta (G(.))$ $\simeq \eta (F(.))$ $+$ $\int L_{\eta }(y|F(.))dG(y)$. Here
\begin{eqnarray}
L_{\eta }(y|F(.)) &=&\left. \frac{\partial \eta \{(1-\varepsilon )F(.)+\varepsilon H_{y}(.)\}}{\partial \varepsilon }\right\vert _{\varepsilon =0}  \label{influencefn} \\
&=&\lim_{\varepsilon \rightarrow 0}\varepsilon ^{-1}[\eta \{(1-\varepsilon )F(.)+\varepsilon H_{y}(.)\}  \notag \\
&&-\eta (F(.))],  \notag
\end{eqnarray}
the derivative of $\eta $ at $F,$ is called the \emph{influence function} of $\eta ,$ and $H_{y}(x)$ is the Heaviside
unit step function with jump from $0$ to $1$ at $x=y.$ The sample approximation
\begin{equation}
l(y|\mathbf{y})=L_{\eta }(y|F_{n}(.|\mathbf{y}))  \label{EmpInfluenceFn}
\end{equation}
is called the \emph{empirical influence function}. An analogous argument to that used in the delta method yields the
non-parametric analogue of (\ref{sigsq}) as
\begin{equation*}
\mathrm{Var}[\eta \{F(.)\}]=n^{-1}\int L_{\eta }^{2}\{y|F(.)\}dF(y),
\end{equation*}
with sample version
\begin{equation*}
\mathrm{Var}[\eta \{F_{n}(.|\mathbf{y})\}]=n^{-2}\sum_{j=1}^{n}l^{2}(y_{j}|\mathbf{y}),
\end{equation*}
where the $l(y_{j}|\mathbf{y})$ are the empirical influence values evaluated at the observed values $y_{j}.$ In
practice these values have to be evaluated numerically using (\ref{influencefn}) with, typically,
$\varepsilon =(100n)^{-1}.$ The ease with which the empirical influence function can be evaluated is rather problem
dependent, and the potential complexity of this calculation detracts from this approach. One simple approximation is
the jackknife approximation
\begin{equation*}
l_{jack}(y_{j}|\mathbf{y})=(n-1)(\hat{\eta}(\mathbf{y})-\hat{\eta}(\mathbf{y}_{\smallsetminus j}))
\end{equation*}
where $\hat{\eta}(\mathbf{y}_{\smallsetminus j})$ is the estimate calculated from the sample $\mathbf{y}$ but with the
observation $y_{j}$ omitted. (See Davison and Hinkley 1997, problem 2.18.)

\textbf{\large 2.4 Percentile Methods} \vspace{0.5cm}\newline
Suppose now that the distribution of $\hat{\eta}$ is symmetric or, more generally, that there is some transformation
$w=h(\hat{\eta})$ for which the distribution is symmetric. Examination of the form of the bootstrap percentiles
(\ref{a*b*}) (see for example Hjorth, 1994, Section 6.6.2) shows that they do not depend on the explicit form of $h(.)$
at all. Instead we find that they can, with a change of sign, be swapped, i.e. $a^{\ast }$ is replaced by $-b^{\ast }$
and $b^{\ast }$ by $-a^{\ast }.$ Then (\ref{basicCI}) becomes
\begin{equation}
(\hat{\eta}_{\alpha }^{Per},\text{ }\hat{\eta}_{1-\alpha }^{Per})=(\hat{q}_{\alpha }(\mathbf{\hat{\eta}}^{\ast }),\text{ }\hat{q}_{1-\alpha }(\mathbf{\hat{\eta}}^{\ast })).  \label{PercentileCI}
\end{equation}
This $(1-2\alpha )$ confidence interval is known as the \emph{bootstrap percentile interval}. This confidence interval
is easy to implement but can be improved. We give two variants called the \emph{bias corrected} (BC) and
\emph{accelerated bias corrected} (BCa) \emph{percentile methods}. The basic idea in the BCa method is that there is a
studentized transformation $g$ for which
\begin{equation}
\frac{g(\hat{\eta})-g(\eta )}{1+ag(\eta )}+b\thicksim N(0,1).  \label{BCaAssumption}
\end{equation}
The BC method is simply the special case where $a=0$.
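For comparison, the basic interval (\ref{basicBSCI}) and the percentile interval (\ref{PercentileCI}) can both be
computed from a single set of bootstrap estimates; a minimal sketch (Python with \texttt{numpy}; the sample, the
statistic, $B$ and $\alpha $ are again illustrative choices):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=30)    # illustrative sample
B, alpha = 2000, 0.025
s = np.median                              # statistic of interest

eta_hat = s(y)
eta_star = np.array([s(rng.choice(y, size=y.size, replace=True)) for _ in range(B)])
q_lo, q_hi = np.quantile(eta_star, [alpha, 1 - alpha])

basic      = (2 * eta_hat - q_hi, 2 * eta_hat - q_lo)   # basic bootstrap interval
percentile = (q_lo, q_hi)                               # percentile interval
print(basic, percentile)
\end{verbatim}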
If we calculate a one-sided confidence interval for $\eta $ under the assumption (\ref{BCaAssumption}) we find that if
we set
\begin{equation*}
\beta _{1}=\Phi \left( b+\frac{b+z_{\alpha }}{1-a(b+z_{\alpha })}\right)
\end{equation*}
where $\Phi (.)$ is the standard normal distribution function, then with confidence $(1-\alpha )$
\begin{equation*}
\hat{q}_{\beta _{1}}(\mathbf{\hat{\eta}}^{\ast })<\eta
\end{equation*}
where $\hat{q}_{\beta }(\mathbf{\hat{\eta}}^{\ast })$ is the bootstrap percentile of the $\mathbf{\hat{\eta}}^{\ast }$
with probability $\beta .$ In a similar way a two-sided $(1-2\alpha )$ confidence interval is
\begin{equation*}
\hat{q}_{\beta _{1}}(\mathbf{\hat{\eta}}^{\ast })<\eta <\hat{q}_{\beta _{2}}(\mathbf{\hat{\eta}}^{\ast })
\end{equation*}
where
\begin{equation*}
\beta _{2}=\Phi \left( b+\frac{b+z_{1-\alpha }}{1-a(b+z_{1-\alpha })}\right) .
\end{equation*}
The bias parameter $b$ is obtained as
\begin{equation*}
b=\Phi ^{-1}\{\tilde{F}_{n}(\hat{\eta}\text{ }|\text{ }\mathbf{\hat{\eta}}^{\ast })\}
\end{equation*}
where $\tilde{F}_{n}(\hat{\eta}$ $|$ $\mathbf{\hat{\eta}}^{\ast })$ is the smoothed EDF of the bootstrap sample
$\mathbf{\hat{\eta}}^{\ast }$ evaluated at the original estimated value $\hat{\eta}.$ If we now set $a=0$ this gives
the BC method. We denote the resulting two-sided version of the confidence interval by $(\hat{\eta}_{\alpha }^{BC},$
$\hat{\eta}_{1-\alpha }^{BC}).$ The acceleration parameter $a$ is arguably less easily calculated. Efron (1987)
suggests the approximation (which is one-sixth the estimated standard skewness of the linear approximation to $\eta $):
\begin{equation*}
a=\frac{1}{6}\frac{\sum_{i=1}^{n}l^{3}(y_{i}|\mathbf{y})}{[\sum_{i=1}^{n}l^{2}(y_{i}|\mathbf{y})]^{3/2}}
\end{equation*}
where $l(y_{i}|\mathbf{y})$ are the values of the empirical influence function (\ref{EmpInfluenceFn}) evaluated at the
observations $y_{i}.$ We denote the two-sided version of the confidence interval given by this BCa method by
$(\hat{\eta}_{\alpha }^{BCa},$ $\hat{\eta}_{1-\alpha }^{BCa}).$

\textbf{{\large 3 \label{Theory}Theory}} \vspace{0.5cm}\newline
Like many statistical methods, understanding of the practical usefulness of the bootstrap, as well as of its
limitations, has been built up gradually with experience through applications. This practical experience has been
underpinned by a growing asymptotic theory which provides a basis for understanding when bootstrapping will work and
when it will not. A truly general theory rapidly becomes very technical and is still incomplete. A detailed treatment
is not attempted here, but some of the more accessible and useful results are summarized. Bootstrapping relies on
sampling from the EDF $F_{n}(.|\mathbf{y})$ reproducing the behaviour of sampling from the original distribution
$F(.).$ We therefore need convergence of $F_{n}(.|\mathbf{y})$ to $F(.).$ This is confirmed by the Glivenko-Cantelli
theorem which guarantees strong convergence, i.e.
\begin{equation*}
\sup_{y}\left\vert F_{n}(y|\mathbf{y})-F(y)\right\vert \rightarrow 0\text{ with probability 1}
\end{equation*}
as $n\rightarrow \infty .$ Though reassuring, this does not throw any direct light on the bootstrap process.
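This convergence is easy to observe numerically; the following sketch (Python with \texttt{numpy}; the choice of the
standard normal as $F$ and the sample sizes are illustrative) evaluates
$\sup_{y}\left\vert F_{n}(y|\mathbf{y})-F(y)\right\vert $ for increasing $n$:
\begin{verbatim}
import numpy as np
from math import erf, sqrt

def F(y):
    """Standard normal distribution function (the illustrative true F)."""
    return 0.5 * (1.0 + erf(y / sqrt(2.0)))

rng = np.random.default_rng(1)
for n in (10, 100, 1000, 10000):
    y = np.sort(rng.standard_normal(n))
    cdf_vals = np.vectorize(F)(y)
    # sup |F_n - F| is attained at an order statistic, just before or after the jump
    upper = np.arange(1, n + 1) / n - cdf_vals
    lower = cdf_vals - np.arange(0, n) / n
    print(n, max(upper.max(), lower.max()))
\end{verbatim}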
To investigate the bootstrap process we look instead at the equally well-known result that the process
\begin{equation*}
Z_{n}(y)=\sqrt{n}(F_{n}(y)-F(y))
\end{equation*}
converges in distribution to the Brownian bridge $B(F(y))$ as $n\rightarrow \infty .$ [A Brownian bridge $B(t),$
$0\leq t\leq 1,$ is a Gaussian process with mean zero and covariance function $\mathrm{Cov}(B(s),B(t))=s(1-t)$ for
$s\leq t.$] For a statistic $\hat{\eta}$ estimating a parameter $\eta ,$ write
$H_{n}(y)=P(\sqrt{n}(\hat{\eta}-\eta )\leq y)$ for its sampling distribution and
$H_{n}^{\ast }(y)=P^{\ast }(\sqrt{n}(\hat{\eta}^{\ast }-\eta ^{\ast })\leq y)$ for the corresponding bootstrap version.
For the percentile estimator $\hat{\eta}=\hat{q}_{p}$ it can be shown that, provided $F$ is sufficiently smooth at
$\eta =q_{p}$ and the density there satisfies $f(\eta )>0,$ then
\begin{equation*}
\limsup_{n\rightarrow \infty }\frac{n^{1/4}\sup_{y}\left\vert H_{n}^{\ast }(y)-H_{n}(y)\right\vert }{\sqrt{\log \log n}}=c_{p,F}\text{ with prob. 1.}
\end{equation*}
This result shows that $H_{n}^{\ast }(y)$ converges to $H_{n}(y)$ at rate $O(n^{-1/4}\sqrt{\log \log n}).$ Now for a
percentile estimator we have from the Berry-Ess\'{e}en theorem
\begin{equation*}
\sup_{y}\left\vert H_{n}(y)-\Phi (\frac{y}{\tau })\right\vert =O(n^{-1/2})
\end{equation*}
where $\tau =\sqrt{p(1-p)}/f(\eta ).$ Thus if we were to use a normal approximation for $H_{n}(y)$ we would use
$\Phi (y/\hat{\tau})$ where $\hat{\tau}$ is an estimate of $\tau .$ Whether this is better than $H_{n}^{\ast }(y)$ thus
depends on the convergence rate of $\hat{\tau},$ and the matter is not clear-cut.

\textbf{\large 3.2 Asymptotic Accuracy of EDF's}\vspace{0.5cm}\newline
Edgeworth expansions are asymptotic series aimed at improving the normal approximation by introducing additional terms
that try to correct for the effects of skewness, kurtosis and higher moments which slow the rate of convergence to
normality. For fixed $n,$ asymptotic series usually diverge as more and more terms are included. However for a fixed
number of terms, $k$ say, the series converges as $n\rightarrow \infty .$ The rate of convergence is usually of
smaller order than the last included term. We shall only consider the special case $\hat{\eta}=\bar{Y}_{n}.$ Here, the
general Edgeworth expansion is a power series in $n^{-1/2}$ and has the form:
\begin{eqnarray}
\tilde{H}_{n}(y) &=&\Phi (y)+n^{-1/2}p^{(1)}(y)\phi (y)+\cdots  \label{Edgeworth} \\
&&+n^{-k/2}p^{(k)}(y)\phi (y)+o(n^{-k/2})  \notag
\end{eqnarray}
where $\phi (y)=(2\pi )^{-1/2}\exp (-y^{2}/2)$ is the standard normal density, and $p^{(j)}$ is a polynomial of degree
$3j-1.$ We have explicitly
\begin{equation*}
p^{(1)}(y)=-\frac{1}{6}\kappa _{3}(y^{2}-1)
\end{equation*}
and
\begin{equation*}
p^{(2)}(y)=-\{\frac{1}{24}\kappa _{4}(y^{2}-3)+\frac{1}{72}\kappa _{3}^{2}(y^{4}-10y^{2}+15)\}
\end{equation*}
where $\kappa _{3}$ and $\kappa _{4}$ are the skewness and kurtosis of $F(.)$. The term involving $p^{(1)}$ corrects
for the main effect of skewness, the term involving $p^{(2)}$ corrects for the main effect of kurtosis and for the
secondary effect of skewness. Often the remainder term $o(n^{-k/2})$ can be replaced by $O(n^{-(k+1)/2}),$ when the
Edgeworth expansion is said to be ($k+1)$th order accurate. Usually inclusion of more than one or two correction terms
becomes counter-productive as the coefficients associated with the powers of $n^{-1/2}$ rapidly become large with $k$.
When $E(|Y|^{3})<\infty $ and $Y$ is non-lattice then one-term Edgeworth expansions for both $H_{n}(y)$ and
$H_{n}^{\ast }(y)$ exist and a comparison (see Shao and Tu, 1995, Section 3.3.3) shows that $H_{n}^{\ast }(y)$ has
smaller asymptotic mean square error than $\Phi (y/\hat{\sigma}_{n})$ unless the skewness is zero.
In fact comparison of $H_{n}^{\ast }(y)$ and the one-term Edgeworth expansion estimator
\begin{equation*}
H_{n}^{EDG}(y)=\Phi (z)+n^{-1/2}p_{n}^{(1)}(z\text{ }|\text{ }\mathbf{y}_{n})\phi (z)
\end{equation*}
where $z=\hat{\sigma}_{n}^{-1}y,$ and $p_{n}^{(1)}(z$ $|$ $\mathbf{y}_{n})$ is the polynomial $p^{(1)}(z)$ with the
moments of $F(.)$ replaced by the sample moments calculated from $\mathbf{y}_{n},$ shows that both have the same
asymptotic mean square error. If we consider studentized versions the bootstrap does even better. Under appropriate
moment conditions both $\breve{H}_{n}(y)$ and $\breve{H}_{n}^{\ast }(y)$ have two-term Edgeworth expansions and a
comparison of these shows that $\breve{H}_{n}^{\ast }(y)$ has a smaller asymptotic mean square error than the
corresponding one-term Edgeworth estimator, though they have the same convergence rate. The normal, bootstrap and
one-term Edgeworth expansion estimators of the standardized distribution have been compared by Hall (1988) using the
criterion of asymptotic relative error. When $E(|Y|^{3})<\infty $ the bootstrap does better than the normal
approximation; however it may or may not do better than the one-term Edgeworth expansion (see Shao and Tu, 1995). When
$E(|Y|^{3})=\infty ,$ the situation is more complicated and depends on the tail behaviour of $F(.).$ When the tail is
thin the bootstrap can be worse than the normal approximation. In estimating tail behaviour the bootstrap is comparable
to the one-term Edgeworth expansion except in the extreme of the tail.

\textbf{\large 3.3 Asymptotic Accuracy of Confidence Intervals} \vspace{0.5cm}\newline
The above analysis focuses on distribution functions, and does not give the whole picture. It is helpful to consider
also the coverage accuracy of confidence intervals. We shall write the basic confidence limit that we seek as
$\hat{\eta}_{\alpha }$ defined by
\begin{equation*}
\Pr (\eta \leq \hat{\eta}_{\alpha })=\alpha
\end{equation*}
and the normal, basic, percentile, studentized bootstrap, BC and BCa approximations as $\hat{\eta}_{\alpha }^{Norm},$
$\hat{\eta}_{\alpha }^{Boot},$ $\hat{\eta}_{\alpha }^{Per},$ $\hat{\eta}_{\alpha }^{Stu},$ $\hat{\eta}_{\alpha }^{BC}$
and $\hat{\eta}_{\alpha }^{BCa}$ respectively. We summarise the results given in Shao and Tu (1995). These apply when
the parameter of interest is a smooth function of the mean, $\eta =\eta (\bar{y}_{n})$. Then an analysis analogous to
that used for EDF's can be carried out, but now relying on an expansion of the quantile, that is, the inverse of the
Edgeworth series, called the \emph{Cornish-Fisher expansion}. Let $\Phi (z_{\alpha })=\alpha .$ Then, under appropriate
moment conditions, $\hat{\eta}_{\alpha }$ has an expansion of the form
\begin{equation*}
q(\alpha )=z_{\alpha }+n^{-1/2}q^{(1)}(z_{\alpha })+n^{-1}q^{(2)}(z_{\alpha })+o(n^{-1})
\end{equation*}
where comparison with the Edgeworth expansion shows that
\begin{equation*}
q^{(1)}(y)=-p^{(1)}(y)
\end{equation*}
and
\begin{equation*}
q^{(2)}(y)=p^{(1)}(y)p^{(1)\prime }(y)-\frac{1}{2}p^{(1)}(y)^{2}-p^{(2)}(y).
\end{equation*}
Under appropriate conditions we find that analogous expansions exist for the quantile approximations listed above. We
find in particular that
\begin{equation*}
\hat{\eta}_{\alpha }^{Boot}-\hat{\eta}_{\alpha }=O_{p}(n^{-1})
\end{equation*}
and in general
\begin{equation*}
\Pr (\hat{\eta}_{\alpha }^{Boot}\leq \eta )=1-\alpha +O(n^{-1/2}).
\end{equation*}
However the two-tailed version is second-order accurate:
\begin{equation*}
\Pr (\hat{\eta}_{\alpha }^{Boot}\leq \eta \leq \hat{\eta}_{1-\alpha }^{Boot})=1-2\alpha +O(n^{-1}).
\end{equation*}
For symmetric distributions, these results apply to the percentile approximation as well, for example:
\begin{equation*}
\hat{\eta}_{\alpha }^{Per}-\hat{\eta}_{\alpha }=O_{p}(n^{-1}).
\end{equation*}
The bootstrap BC method turns out to perform no better than the percentile limit, in terms of convergence rate, but
the constant factor is smaller so it is marginally to be preferred. Studentization is definitely better, with
\begin{equation*}
\hat{\eta}_{\alpha }^{Stu}-\hat{\eta}_{\alpha }=O_{p}(n^{-3/2})\text{ and }\hat{\eta}_{\alpha }^{BCa}-\hat{\eta}_{\alpha }=O_{p}(n^{-3/2})
\end{equation*}
and
\begin{eqnarray*}
\Pr (\hat{\eta}_{\alpha }^{Stu} &\leq &\eta )=1-\alpha +O(n^{-1})\text{ } \\
&&\text{and} \\
\Pr (\hat{\eta}_{\alpha }^{BCa} &\leq &\eta )=1-\alpha +O(n^{-1}).
\end{eqnarray*}
It follows that the two-sided intervals for both limits are also both second-order accurate. Finally we note that
\begin{equation*}
\hat{\eta}_{\alpha }^{Boot}-\hat{\eta}_{\alpha }^{Norm}=O_{p}(n^{-1})
\end{equation*}
so that the usual normal approximation and the basic bootstrap behave similarly. \pagebreak

\textbf{{\large 3.4 \label{Failure}Failure of Bootstrapping}} \vspace{0.5cm}\newline
It should be clear from the previous three subsections that, for bootstrapping to work well, regularity conditions are
required on both the distribution $F$ and also on the statistic of interest. More explicitly, bootstrapping is
sensitive to the tail behaviour of $F$; convergence of $H_{n}^{\ast }$ usually requires moment conditions on $F$ that
are more stringent than those needed for convergence of $H_{n}.$ Also the statistic $s(\mathbf{y})$ has to be suitably
smooth in an appropriate sense. Finally it is possible for convergence of the bootstrap to be sensitive to the method
used in carrying out the bootstrapping. An example of the first situation is inconsistency of the bootstrap estimate of
the variance of $s(\mathbf{y}),$ even when the asymptotic variance is finite (see Ghosh \emph{et al.}, 1984). This can
occur if $F(.)$ is fat-tailed and does not have appropriate moments. The problem then arises because the bootstrap
$s(\mathbf{y}^{\ast })$ can take exceptionally large values. An example of the second situation is where $\mathbf{y}$
is a random sample, from $U(0,b)$ say, and we wish to consider $y_{(n)}$, the largest order statistic. Then a natural
statistic to consider is $s(\mathbf{y})=$ $n(b-y_{(n)})/b,$ which has a limiting standard exponential distribution as
$n\rightarrow \infty $. The bootstrap version is then $s(\mathbf{y}^{\ast }\ |\ \mathbf{y})=$
$n(y_{(n)}-y_{(n)}^{\ast })/y_{(n)}.$ But
\begin{eqnarray*}
P^{\ast }(s(\mathbf{y}^{\ast }\ |\ \mathbf{y}) &=&0)=P^{\ast }(y_{(n)}^{\ast }=y_{(n)}) \\
&=&1-(1-n^{-1})^{n}\rightarrow 1-e^{-1}.
\end{eqnarray*}
Thus $H_{n}^{\ast }$ does not tend to $H_{n}$ as $n\rightarrow \infty .$ This result applies to any given order
statistic $y_{(n-k)}$ where $k$ is fixed. However the problem does not arise for a given quantile $y_{(pn)}$ where $p$
is fixed with $0<p<1.$ Suppose now that, in a fitted regression metamodel, interest focuses on those coefficients
$\beta _{i}$ for which $\left\vert \beta _{i}\right\vert >\beta _{0i},$ where the $\beta _{0i}>0$ are given constants.
Let $S=\{\beta _{i}:\left\vert \beta _{i}\right\vert >\beta _{0i}\}$ denote the \emph{important set of coefficients}.
The obvious estimate is to select those coefficients $\beta _{i}$ for which $\left\vert \hat{\beta}_{i}\right\vert
>\beta _{0i},$ i.e. $\hat{S}=\{\beta _{i}:\left\vert \hat{\beta}_{i}\right\vert >\beta _{0i}\}$. Bootstrapping is a
simple way of assessing the stability of this choice. We generate $B$ bootstrap samples $(y_{j}^{(k)\ast
},\mathbf{x}_{j}^{(k)\ast }),$ $j=1,2,...,n,$ $k=1,2,...,B,$ using either residual sampling or case sampling, say, and
fit the regression model to each bootstrap sample to give bootstrap estimates $\mathbf{\hat{\beta}}^{(k)\ast },\
k=1,2,...,B.$ We then calculate $\hat{S}^{(k)\ast }=\{\beta _{i}:\left\vert \hat{\beta}_{i}^{(k)\ast }\right\vert
>\beta _{0i}\},$ $k=1,2,...,B.$ Now assume that $B$ is sufficiently large so that each distinct bootstrap important set
that has been obtained occurs a reasonable number of times. Then a $(1-\alpha )$ confidence region for the unknown true
important set can be constructed by selecting bootstrap important sets in decreasing order of their observed frequency
of occurrence until $(1-\alpha )100\%$ of the $\hat{S}^{(k)\ast }$ have been chosen. Related to identifying important
factors in metamodels is the rather more difficult problem of metamodel selection. We consider this in the next
section.

\textbf{{\large 4.6 \label{Metamodelselect}Metamodel Comparison and Selection}} \vspace{0.5cm}\newline
Metamodel comparison and selection is a difficult subject. The main trouble is that the models that we wish to compare
may be of different functional complexity. The statistical properties associated with quantities derived from
different models may therefore be hard to put on a common scale on which they can be compared. A reasonably
satisfactory approach is based on the use of the \emph{cross-validation} technique. We suppose that the initial data
has the form
\begin{equation*}
S=\{(y_{j},\mathbf{x}_{j}),\ \ j=1,2,...,n\}
\end{equation*}
with
\begin{equation*}
y_{j}=\eta (\mathbf{x}_{j},\mathbf{\beta })+\varepsilon _{j},\ \ j=1,2,...,n.
\end{equation*}
Suppose that our fitted regression is
\begin{equation*}
\hat{y}(\mathbf{x})=\eta (\mathbf{x},\mathbf{\hat{\beta}}).
\end{equation*}
In its basic form cross-validation is used to assess how effective the fitted regression is for predicting the
response at some new design point $\mathbf{x}_{new}.$ Rather than explicitly choosing such a new point, a simple idea
is the \emph{leave-one-out} method where we drop an observed point $(y_{j},\mathbf{x}_{j})$ from the set of all
observed points, fit the metamodel to the remaining points $S_{-j}=$ $S\smallsetminus (y_{j},\mathbf{x}_{j})$ giving
the fitted regression as
\begin{equation*}
\hat{y}_{-j}(\mathbf{x})=\eta (\mathbf{x},\mathbf{\hat{\beta}}_{-j}),
\end{equation*}
and then look at the squared difference between the omitted $y_{j}$ and the value of $y$ at $\mathbf{x}_{j}$, as
predicted by the fitted model; i.e.
\begin{equation*}
(y_{j}-\hat{y}_{-j}(\mathbf{x}_{j}))^{2}.
\end{equation*}
If we do this even-handedly by leaving out each observation in turn we have as an \emph{estimate of cross-validation
prediction error}:
\begin{equation}
\hat{L}_{CV}=n^{-1}\sum_{j=1}^{n}(y_{j}-\hat{y}_{-j}(\mathbf{x}_{j}))^{2}.
\label{Lcv} \end{equation}% It turns out that, for the linear regression model, this formula simplifies elegantly to one where we only need comparisons with the one model fitted to all the original observations: \begin{equation*} \hat{L}_{CV}=n^{-1}\sum_{j=1}^{n}\frac{(y_{j}-\hat{y}(\mathbf{x}_{j}))^{2}}{% 1-h_{jj}} \end{equation*}% where \begin{equation*} \hat{y}(\mathbf{x}_{j})=\mathbf{x}_{j}^{T}\mathbf{\hat{\beta}}, \end{equation*}% and $h_{jj}$ is the $j$th main diagonal entry in the hat-matrix (\ref% {Hatmatrix}), as before. The distributional property of $\hat{L}_{CV}$ can be obtained in the usual way by bootstrapping to get $B$ bootstrap samples. If we use case resampling say then \begin{equation*} S^{(i)\ast }=\{(y_{j}^{(i)\ast },\mathbf{x}_{j}^{(i)\ast }),\ \ j=1,2,...,n\} \end{equation*}% where each $(y_{j}^{(i)\ast },\mathbf{x}_{j}^{(i)\ast })$ is an observation drawn at random from the set $S.$ We turn now to model selection. For simplicity we shall only consider the linear case where $y=\mathbf{x}^{T}% \mathbf{\beta }+\varepsilon $, though with an obvious adjustment, the discussion does generalize. Let $\mathbf{\beta }$ be of dimension $p$. If we are uncertain as to which design variables are really needed in the model, then we would have in principle up to $2^{p}$ models to choose from with $% \binom{p}{q}$ (sub)models where there were precisely $q$ covariates selected. We denote a typical submodel by $M$. The estimate of prediction error using (\ref{Lcv}) is \begin{equation*} \hat{L}_{CV}(M)=n^{-1}\sum_{j=1}^{n}\frac{(y_{j}-\hat{y}_{M}(\mathbf{x}% _{j}))^{2}}{1-h_{Mjj}}. \end{equation*}% It turns out that this measure does not work as well as it might even when $% n $ is large. (This is an example previously referred to in subsection (\ref% {Failure}) where inconsistency arises not in the problem itself but in the bootstrap sampling method.) A much better variant is not to leave out just one observation at a time but instead to split the observations into two: a \emph{training} set, and an \emph{assessment} set with respectively $% n_{t}=n-m$ and $n_{a}=m$ observations in each. We shall moreover select not one but $K$ such pairs and denote the sets as $S_{t,k}$ and $S_{a,k}$, $% k=1,2,...,K.$ We shall write $\mathbf{\hat{\beta}}_{Mk}$ for the coefficients of model $M$ fitted to the $k$th training set, and write $\hat{y% }_{Mjk}=\mathbf{x}_{Mj}^{T}\mathbf{\hat{\beta}}_{Mk}$ for the value of $% \mathbf{x}_{j}^{T}\mathbf{\beta }$ predicted by this model $M$ at $\mathbf{x}% _{j}.$ Then $\hat{L}_{CV}(M)$ becomes \begin{equation} \hat{L}_{CV}(M)=K^{-1}\sum_{k=1}^{K}m^{-1}\sum_{j\in S_{a,k}}(y_{j}-\hat{y}% _{Mjk})^{2}. \label{Lcvmhat} \end{equation}% We use the same set of $K$ pairs for all models being compared. \ Provided $% n-m\rightarrow \infty $ and $m/n\rightarrow 1$ as $n\rightarrow \infty $ then selecting $M$ to minimize (\ref{Lcvmhat}) will yield a consistent estimator for the correct model. When $n$ is small it may not be possible to select $m$ large enough in which case Davison and Hinkley (1997) suggest taking $m/n\simeq 2/3.$ The variability of $\hat{L}_{CV}(M)$ can be estimated by bootstrapping in the usual way. \textbf{{\large 5 \label{Comparisons}Bootstrap Comparisons}} \vspace{0.5cm}% \newline In this Section we consider problems where there are a number of different samples to compare. 
We suppose that we have $m$ data sets
\begin{equation}
\mathbf{y}_{i}=(y_{i1},y_{i2},...,y_{i,n_{i}}),\ \ i=1,2,...,m  \label{Severalys}
\end{equation}
the $i$th comprising $n_{i}$ runs. This situation covers two different scenarios in particular. The first is where we
are considering the question: Does the output of the simulation model accurately represent the output of the system
being modelled? Here we have just two data sets, so that $m=2,$ and one data set comprises observations of a real
system whilst the other comprises the observed output from a simulation model. Thus this is a question of validation.
The second is when we have a number of different simulation models to assess. Here validation against real data is not
the issue. The different models simply represent different systems, and their validity is not in doubt. Instead we are
just interested in assessing how differently the systems behave, as represented by the output of their simulation
models. Typically we might be interested in which system produces the greatest average output, or least average delay.
Alternatively we might wish to compare output variability. A similar methodology can be applied to either problem. We
discuss this in the remainder of this section.

\textbf{\large 5.1 Goodness-of-Fit and Validation} \vspace{0.5cm}\newline
A discussion of validation involving trace-driven simulation is described in Kleijnen \emph{et al.} (2000, 2001). The
basic idea used there can be generalized. We treat the problem as essentially one of goodness-of-fit. A simple starting
point is where we have $n$ simulation outputs $y_{j},$ $j=1,2,...,n,$ to compare with $m$ observations
$y_{j}^{\text{\emph{Real}}},$ $j=1,2,...,m,$ of a real system, and we wish to know if the two samples are identically
distributed. A suitable goodness-of-fit statistic is the two-sample Cram\'{e}r-von Mises statistic (see Anderson,
1962):
\begin{equation}
W^{2}=\int [F_{n}(y)-F_{m}^{\text{\emph{Real}}}(y)]^{2}dH_{n+m}(y)  \label{Cvm}
\end{equation}
where $F_{n},$ $F_{m}^{\text{\emph{Real}}}$ and $H_{n+m}$ are respectively the EDF's of the simulated, real and
combined samples. The asymptotic distribution of $W^{2}$ is known when the samples are drawn from continuous
distributions, but it is easy to set up a bootstrap version of the test that can handle discrete distributions just as
well. We can use a bootstrapping method as follows. We obtain $B$ pairs of bootstrap samples $(\mathbf{y}^{(i)\ast },$
$\mathbf{y}^{\text{\emph{Real}}(i)\ast })$, $i=1,2,...,B$, each sample in the pair being obtained by resampling with
replacement from $\mathbf{y}$ and $\mathbf{y}^{\text{\emph{Real}}}$ combined, but with $\mathbf{y}^{(i)\ast }$ the same
sample size as $\mathbf{y}$, and $\mathbf{y}^{\text{\emph{Real}}(i)\ast }$ the same sample size as
$\mathbf{y}^{\text{\emph{Real}}}$. Denote the EDFs of the samples of each pair by $F_{n}^{(i)\ast }$ and
$F_{m}^{\text{\emph{Real}}(i)\ast },$ $i=1,2,...,B$. We can now calculate $B$ bootstrap two-sample Cram\'{e}r-von Mises
statistics from each pair:
\begin{equation}
W^{(i)\ast 2}=\int [F_{n}^{(i)\ast }(y)-F_{m}^{\text{\emph{Real}}(i)\ast }(y)]^{2}dH_{n+m}^{(i)\ast }(y).  \label{Cvmboot}
\end{equation}
Under the null assumption that $\mathbf{y}$ and $\mathbf{y}^{\text{\emph{Real}}}$ are drawn from the same distribution,
it follows that each $W^{(i)\ast 2}$ is just a bootstrap version of $W^{2}$. We can thus obtain a critical value, for a
given significance level, from the EDF of the $W^{(i)\ast 2},$ $i=1,2,...,B$.
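A minimal sketch of this bootstrap test is given below (Python with \texttt{numpy}; the two samples shown are
illustrative stand-ins for the simulated and real outputs, and the $5\%$ significance level is an arbitrary choice):
\begin{verbatim}
import numpy as np

def edf(sample, t):
    """EDF of `sample` evaluated at the points t."""
    return np.searchsorted(np.sort(sample), t, side="right") / sample.size

def cramer_von_mises(y_sim, y_real):
    """Two-sample statistic: integrate [F_n - F_m]^2 against the pooled EDF."""
    pooled = np.concatenate([y_sim, y_real])
    # dH_{n+m} puts mass 1/(n+m) on each pooled observation
    return np.mean((edf(y_sim, pooled) - edf(y_real, pooled)) ** 2)

rng = np.random.default_rng(1)
y_sim  = rng.exponential(scale=2.0, size=60)   # illustrative simulation output
y_real = rng.exponential(scale=2.2, size=40)   # illustrative real observations

w2 = cramer_von_mises(y_sim, y_real)

B, pooled = 1000, np.concatenate([y_sim, y_real])
w2_star = np.empty(B)
for i in range(B):   # resample both samples from the pooled data (the null EDF)
    w2_star[i] = cramer_von_mises(rng.choice(pooled, size=y_sim.size, replace=True),
                                  rng.choice(pooled, size=y_real.size, replace=True))

crit = np.quantile(w2_star, 0.95)        # 5%-level critical value
p_value = np.mean(w2_star >= w2)         # bootstrap p-value of the observed W^2
print(w2, crit, p_value)
\end{verbatim}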
The null hypothesis that $\mathbf{y}$ and $\mathbf{y}^{\text{\emph{Real}}}$ are drawn from the same distribution can
then be tested simply by checking whether or not the original $W^{2}$ exceeds this critical value. The validation
method just described is easily extended to allow the representational accuracy of a number of different competing
simulation models to be compared against (the same) real data. We order the Cram\'{e}r-von Mises goodness-of-fit
statistics (\ref{Cvm}) obtained from comparing the real data with the simulated output of each of the models. We can
assess the reliability of the comparison by bootstrapping as in (\ref{Cvmboot}) to obtain bootstrap distributions of
each such Cram\'{e}r-von Mises statistic and looking at the degree of overlap amongst these distributions. When the
$W^{2}$ statistic shows the data samples $\mathbf{y}_{j}$ to be significantly different, there is an interesting
decomposition of $W^{2}$ that allows the nature of the differences to be more clearly identified. Durbin and Knott
(1972) show that in the one-sample case where $F_{n}(.)$ is being compared with a uniform null distribution, so that
$W^{2}=n$ $\int [F_{n}(y)-y]^{2}dy,$ then $W^{2}$ has the orthogonal representation
\begin{equation*}
W^{2}=\sum_{j=1}^{\infty }(j\pi )^{-2}z_{nj}^{2}
\end{equation*}
where the $z_{nj}$ are scaled versions of the coefficients in the Fourier decomposition of the process
$\sqrt{n}(F_{n}(y)-y).$ The $z_{nj}$ are stochastic of course, depending on $F_{n}(.),$ but they are independent and,
under the null, have identical distributions. It can be shown that $z_{n1},$ $z_{n2}$ and $z_{n3}$ depend essentially
exclusively on deviations of the mean, variance and skewness respectively of $F(.)$ from those of the null. In other
words these components $z_{n1},$ $z_{n2}$ and $z_{n3}$ can be used to provide convenient statistical tests for these
differences. Cheng and Jones (2000, 2004) describe a generalisation of the results of Durbin and Knott (1972), which
they term \emph{EDFIT statistics}, suitable for comparison of data of the form (\ref{Severalys}) arising in the
simulation context. Their formulation uses ranks rather than the original observations. Though this leads to some loss
of power, the method does then have the advantage of enabling critical values of tests to be easily obtained by
bootstrapping, and moreover in an exact way.

\textbf{\large 5.2 Comparison of Different Systems} \vspace{0.5cm}\newline
We now consider the situation of (\ref{Severalys}) where the data sets are outputs from different simulation models
only, and no comparison is made with real data. Here we are interested simply in making comparisons between different
models. The discussion of the previous subsection goes through with no essential change. The only difference is that
all samples have the same status; there is no sample being singled out as special by being drawn from a real system.
The EDFIT approach can therefore be used directly, with bootstrapping providing critical null values. A more direct
approach, not using goodness-of-fit ideas, is possible. We focus on examining differences between the means,
$\bar{y}_{i},$ $i=1,2,..,m,$ of the $m$ samples of (\ref{Severalys}). Comparison of other sample statistics can be
handled in the same way. If we suppose that largest is best, we can rank the means as
$\bar{y}_{(1)}>\bar{y}_{(2)}>...>\bar{y}_{(m)}.$ The question then is how stable this ordering is. Bootstrapping
provides an easy answer.
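A minimal sketch of the resampling scheme described in the next paragraph (Python with \texttt{numpy}; the three normal
samples are illustrative) counts how often each sample's mean comes out on top and builds the two-way table of pairwise
comparisons:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
# illustrative data sets y_1, ..., y_m (here m = 3, with different sizes)
samples = [rng.normal(10.0, 2.0, size=40),
           rng.normal(10.5, 2.0, size=50),
           rng.normal(11.0, 2.0, size=30)]
m, B = len(samples), 2000

top_count = np.zeros(m, dtype=int)        # times sample i has the largest mean
pairwise = np.zeros((m, m), dtype=int)    # times mean_i* > mean_j*

for _ in range(B):
    means = np.array([rng.choice(y, size=y.size, replace=True).mean()
                      for y in samples])
    top_count[means.argmax()] += 1
    pairwise += means[:, None] > means[None, :]

print(top_count / B)      # relative frequency with which each system ranks first
print(pairwise / B)       # estimated P(mean_i* > mean_j*)
\end{verbatim}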
The parametric form of the bootstrapping procedure is equally easy to implement, though it is not clear that there is much advantage to be had in doing so. Suppose for example that the $i$th sample $\mathbf{y}_{i}$ is drawn from the distribution $F^{(i)}(.,\mathbf{\theta }_{i}).$ Usually the $F^{(i)}(.,\mathbf{\theta }_{i})$ will have the same functional form, for example all normal distributions, but the procedure works equally well when the $F^{(i)}$ are functionally different. Let the mean of the $F^{(i)}$ distribution be $\mu ^{(i)}(\mathbf{\theta }_{i}).$ Then we can estimate, by ML say, the parameters of each distribution from the corresponding sample. Let these estimates be $\mathbf{\hat{\theta}}_{i}.$ We can then carry out parametric bootstrapping by forming (\ref{Severalyboots}), but this time with each sample $\mathbf{y}_{i}^{(k)\ast },\ k=1,2,...,B,$ obtained by sampling from the fitted distribution $F^{(i)}(.,\mathbf{\hat{\theta}}_{i}).$ The analysis then proceeds as before.
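For instance, if each $F^{(i)}$ were taken to be normal, the resampling step in the preceding sketch could simply be replaced by sampling from the fitted distributions, as in the following illustrative fragment:
\begin{verbatim}
import numpy as np

def parametric_resample(samples, rng):
    # One parametric bootstrap set: fit a normal distribution to each
    # sample by ML (mean and standard deviation) and draw a new sample
    # of the same size from the fitted distribution.
    out = []
    for y in samples:
        mu_hat, sigma_hat = np.mean(y), np.std(y)   # ML estimates
        out.append(rng.normal(mu_hat, sigma_hat, size=len(y)))
    return out
\end{verbatim}
\noindent The bootstrap means are then ranked and tabulated exactly as in the nonparametric case.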
\textbf{\large 6 Bayesian Models} \vspace{0.5cm}\newline
Much of the current popularity of Bayesian methods comes about because increased computing power makes Bayesian analysis possible by numerical procedures, most notably Markov chain Monte Carlo (MCMC); see Gilks \emph{et al.} (1996). This allows numerical sampling to be carried out in much the same way as is done in bootstrap and other resampling procedures. We indicate the commonality by considering one or two situations where Bayesian methods and resampling overlap. We again focus on a random sample $\mathbf{w}=(w_{1},$ $w_{2},$ $...,$ $w_{n})$ drawn from a distribution $F(w,$ $\mathbf{\theta }).$ As in the situation where we considered ML estimation, the form of $F$ is known but it depends on a vector $\mathbf{\theta }$ of parameters. In the ML estimation setting $\mathbf{\theta }_{0}$, the true value of $\mathbf{\theta }$, was assumed unknown. In the Bayesian case a prior distribution for $\mathbf{\theta }$ is assumed known. We consider just the continuous case and denote the pdf of the prior by $\pi (\mathbf{\theta }).$ The main step in Bayesian analysis is to construct the posterior distribution $\pi (\mathbf{\theta }\ |\ \mathbf{w})$, which shows how the sample $\mathbf{w}$, whose distribution depends on $\mathbf{\theta }$, has modified the prior. Bayes' formula gives
\begin{equation}
\pi (\mathbf{\theta }\ |\ \mathbf{w})=\frac{p(\mathbf{w\ }|\ \mathbf{\theta })\pi (\mathbf{\theta })}{\int p(\mathbf{w\ }|\ \mathbf{\theta })\pi (\mathbf{\theta })d\mathbf{\theta }}.  \label{Bayestheorem}
\end{equation}%
The difficult part is the evaluation of the normalizing integral $\int p(\mathbf{w\ }|\ \mathbf{\theta })\pi (\mathbf{\theta })d\mathbf{\theta }.$ MCMC has proved remarkably successful in providing a powerful numerical means of doing this. In the context of simulation a Bayesian approach is useful in assessing uncertainty concerning input parameter values used in a simulation model. We can adopt the viewpoint given in (\ref{ysim}) and regard the runs as taking the form
\begin{equation}
y_{j}=y(\mathbf{u}_{j},\mathbf{v}_{j},\mathbf{\theta },\mathbf{x}_{j}),\text{\ \ }j=1,2,...,n,  \label{ysim2}
\end{equation}%
where now $\mathbf{\theta }$ is assumed to have a Bayesian prior distribution $\pi (\mathbf{\theta })$. Note that in this situation we do \emph{not} have the equivalent of data $\mathbf{w}$ from which to calculate a posterior distribution for $\mathbf{\theta }$. What we can do is to treat the prior $\pi (\mathbf{\theta })$ as \emph{inducing a prior distribution} on $y=y(\mathbf{u},\mathbf{v},\mathbf{\theta },\mathbf{x}).$ In this sense we can think of $y$ itself as having a prior, which can be estimated by sampling $\mathbf{\theta }$ from its prior $\pi (\mathbf{\theta }),$ yielding values $\mathbf{\theta }_{j},$ $j=1,2,...,n,$ and then running the model to produce a set of observations
\begin{equation}
y_{j}=y(\mathbf{u}_{j},\mathbf{v}_{j},\mathbf{\theta }_{j},\mathbf{x}_{j}),\text{\ \ }j=1,2,...,n.
\end{equation}%
The EDF of these $y_{j}$ then estimates this prior distribution of $y.$ This process is clearly the analogue of the classical situation where $\mathbf{\theta }$ is estimated from data $\mathbf{w}$, as is supposed in (\ref{ysim}), and we then allowed for this variability in the $y_{j}$ by replacing $\mathbf{\hat{\theta}}(\mathbf{w})$ by $\mathbf{\theta }_{j}^{\ast }$ calculated from (\ref{Thetastarnormal}) or (\ref{Thetastarboot}). More interestingly, we can take this a step further if there are real observations available of the $y$ process itself. We do not attempt a fully general formulation but give an example to indicate the possibilities. Suppose that
\begin{equation*}
y_{j}^{\text{\emph{Real}}}=y^{\text{\emph{Real}}}(\mathbf{x}_{j}),\ \ \ \ j=1,2,...,n
\end{equation*}%
comprise $n$ observations on the real system being modelled by the simulation. The precise relationship between the simulation output $y=y(\mathbf{u},\mathbf{v},\mathbf{\theta },\mathbf{x})$ and $y^{\text{\emph{Real}}}(\mathbf{x})$ is not known. However we might assume that
\begin{equation*}
y(\mathbf{u},\mathbf{v},\mathbf{\theta },\mathbf{x})\sim N(y^{\text{\emph{Real}}}(\mathbf{x}),\ \sigma ^{2})
\end{equation*}%
where $\sigma ^{2}$ is an additional parameter that is also treated in a Bayesian way, in the sense of having a prior. We can express vague prior knowledge about $\sigma $ by assuming a \emph{reference prior distribution}, $\rho (\sigma ),$ for it. Then, by (\ref{Bayestheorem}), the posterior distribution of $(\mathbf{\theta },\sigma )$ is proportional to
\begin{equation*}
\sigma ^{-n}\pi (\mathbf{\theta })\rho (\sigma )\exp \{-\frac{1}{2\sigma ^{2}}\sum_{j=1}^{n}[y(\mathbf{u},\mathbf{v},\mathbf{\theta },\mathbf{x}_{j})-y^{\text{\emph{Real}}}(\mathbf{x}_{j})]^{2}\}.
\end{equation*}%
The posterior distribution can thus be obtained by MCMC methods, for example.
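As an illustration, the following Python sketch implements a simple random-walk Metropolis sampler for this posterior. It assumes that the simulation response at the design points can be evaluated as a deterministic function of $\mathbf{\theta }$ (for example by using common random numbers across runs), and that log densities of the priors $\pi $ and $\rho $ are supplied by the user; all function names are illustrative only.
\begin{verbatim}
import numpy as np

def metropolis_posterior(sim_response, y_real, log_prior_theta,
                         log_prior_sigma, theta0, sigma0,
                         steps=10000, step_size=0.1, rng=None):
    # Random-walk Metropolis sketch for the posterior of (theta, sigma),
    # proportional to
    #   sigma^(-n) pi(theta) rho(sigma)
    #     * exp{-sum_j [y(theta, x_j) - y_real(x_j)]^2 / (2 sigma^2)}.
    # sim_response(theta) returns the vector of simulation outputs at the
    # design points x_1,...,x_n (run with common random numbers).
    rng = np.random.default_rng() if rng is None else rng
    n = len(y_real)

    def log_post(theta, sigma):
        resid = sim_response(theta) - y_real
        return (log_prior_theta(theta) + log_prior_sigma(sigma)
                - n * np.log(sigma) - 0.5 * np.sum(resid**2) / sigma**2)

    theta, sigma = np.asarray(theta0, dtype=float), float(sigma0)
    lp = log_post(theta, sigma)
    draws = []
    for _ in range(steps):
        theta_new = theta + step_size * rng.standard_normal(theta.shape)
        sigma_new = abs(sigma + step_size * rng.standard_normal())
        lp_new = log_post(theta_new, sigma_new)
        if np.log(rng.random()) < lp_new - lp:    # accept/reject step
            theta, sigma, lp = theta_new, sigma_new, lp_new
        draws.append((theta.copy(), sigma))
    return draws
\end{verbatim}
\noindent Reflecting the proposal for $\sigma $ at zero keeps it positive while leaving the proposal symmetric; in practice an initial burn-in portion of the draws would also be discarded.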
An interesting application of this approach occurs in epidemiological modelling. Suppose that $y$ is a measure of the progress of an epidemic that depends on factors $\mathbf{\theta }$ for which there are previous measurements or expert information. A Bayesian approach is then a natural way of incorporating this prior information. We also need a good epidemic model for producing a simulated $y$. The reverse use of simulation indicated above then allows this prior information about $\mathbf{\theta }$ to be updated in the light of observations on the real epidemic.

\textbf{{\large 7 \label{Timeseries}Time Series Output}} \vspace{0.5cm}\newline
Bootstrapping of time series is a well-studied problem. In simulation the most likely use of such procedures is to generate correlated input for a model. As usual, the parametric form is relatively easy to explain and implement, and we discuss this first.

\textbf{\large 7.1 Residual Sampling} \vspace{0.5cm}\newline
We consider the case where the time series is an autoregressive model. Here the residual sampling method used to construct a bootstrap metamodel applies with little change. Suppose we have the autoregressive model
\begin{equation*}
Y_{t}=\sum_{j=1}^{p}a_{j}Y_{t-j}+\varepsilon _{t}
\end{equation*}%
where the $\varepsilon _{t}$ are independently and identically distributed, and are commonly called the \emph{innovations}. Suppose that $y_{1},y_{2},...,y_{n}$ is a series drawn from this model. Then we can estimate the $a_{j}$, by least squares say, yielding the estimates $\hat{a}_{j},$ $j=1,2,...,p,$ and form the residuals
\begin{equation*}
r_{t}=y_{t}-\sum_{j=1}^{p}\hat{a}_{j}y_{t-j},\;\;t=p+1,p+2,...,n.
\end{equation*}%
We can then form a bootstrap time series as
\begin{equation*}
y_{t}^{\ast }=\sum_{j=1}^{p}\hat{a}_{j}y_{t-j}^{\ast }+r_{t}^{\ast },\;\;t=1,2,...,n
\end{equation*}%
by sampling the $r_{t}^{\ast }$ from the EDF of the residuals $\{r_{t}\}.$ We need $y_{1}^{\ast },y_{2}^{\ast },...,y_{p}^{\ast }$ to initiate the process, but if we assume that the observed series is stationary, it is probably easiest to initiate the series with some arbitrary starting values, possibly the original $y_{1},y_{2},...,y_{p},$ then run the bootstrapping sufficiently long for the initial effects to become negligible, and collect the actual $y_{t}^{\ast }$ from that point on. Freedman (1984) gives conditions for the asymptotic validity of this procedure, with Basawa \emph{et al.} (1989) extending these results to the case of nonlinear time series.
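A minimal Python sketch of this residual-sampling scheme is given below; it is illustrative only, fitting the AR($p$) coefficients by least squares with \texttt{numpy} and using a burn-in period so that the arbitrary starting values are forgotten.
\begin{verbatim}
import numpy as np

def ar_residual_bootstrap(y, p, burn_in=200, rng=None):
    # Fit an AR(p) model by least squares, then generate one bootstrap
    # series of the same length by recursing with the fitted coefficients
    # and innovations resampled from the EDF of the residuals.
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Row t of the design matrix is (y_{t-1}, ..., y_{t-p}).
    X = np.column_stack([y[p - j:n - j] for j in range(1, p + 1)])
    a_hat, *_ = np.linalg.lstsq(X, y[p:], rcond=None)   # a_1,...,a_p
    resid = y[p:] - X @ a_hat                           # residuals r_t
    # Start from the first p original observations, run past the burn-in
    # period, and keep the last n values as the bootstrap series.
    series = list(y[:p])
    for _ in range(burn_in + n):
        lags = series[-1:-p - 1:-1]        # (y*_{t-1}, ..., y*_{t-p})
        r_star = rng.choice(resid)         # resampled innovation
        series.append(float(np.dot(a_hat, lags)) + r_star)
    return np.array(series[-n:])
\end{verbatim}
\noindent Repeating this $B$ times and recomputing the statistic of interest from each bootstrap series gives its bootstrap distribution, exactly as in the independent case.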
\textbf{\large 7.2 Block Sampling} \vspace{0.5cm}\newline
For time series the analogue of case sampling is \emph{block sampling}. We cannot resample individual observations $y_{t}$, because this loses the correlation between observations. If the series is long enough we can take $n=bl$ and think of the series as comprising $b$ blocks, each of length $l.$ We write the $i$th block as $\mathbf{y}_{i}=(y_{l(i-1)+1},y_{l(i-1)+2},...,y_{li}),$ $i=1,2,...,b.$ Bootstrapping is done by sampling blocks with replacement from this set of $b$ blocks, retaining the order of the observations within each block when writing down the individual observations of the bootstrapped series. A balance needs to be struck between having blocks long enough to retain the correlation properties between neighbouring observations, and having enough blocks to capture the inherent variation of the series. A typical compromise is to take, say, $b=l=n^{1/2},$ so that both quantities tend to infinity as $n\rightarrow \infty .$ A major weakness of block sampling is the loss of correlation incurred by the random sampling of blocks; this loss of correlation is called \emph{whitening}. It is especially serious when the statistic of interest involves correlations of high lag. The crude block sampling just described may be quite ineffective if the blocks are not large enough, because the calculation then involves quantities that straddle blocks, and these are insufficiently correlated because of the whitening. There are many variants of block sampling aimed at reducing the effect of whitening in specific situations. A good example is estimation of the lag $m$ covariance
\begin{equation*}
c_{m}=\frac{1}{n-m}\sum_{t=1}^{n-m}(y_{t}-\bar{y})(y_{t+m}-\bar{y}).
\end{equation*}%
Here we can define a two-dimensional process
\begin{equation*}
\mathbf{z}_{t}=\left( \begin{array}{c} z_{1t} \\ z_{2t} \end{array} \right) =\left( \begin{array}{c} y_{t} \\ y_{t+m} \end{array} \right) ,\;\;t=1,2,...,n-m
\end{equation*}%
with $\bar{z}_{i}=(n-m)^{-1}\sum_{t=1}^{n-m}z_{it},$ and think of $c_{m}$ as essentially a statistic of this process:
\begin{equation*}
c_{m}=\frac{1}{n-m}\sum_{t=1}^{n-m}(z_{1t}-\bar{z}_{1})(z_{2t}-\bar{z}_{2}).
\end{equation*}%
We can then obtain bootstrap observations $\mathbf{z}_{t}^{\ast }$ by sampling with replacement from the set $\{\mathbf{z}_{t}\}.$ The bootstrap lag $m$ covariance then largely retains the covariance structure of the original series, as we have, in effect, bootstrap sampled the terms $(y_{t}-\bar{y})(y_{t+m}-\bar{y})$ appearing in the formula for $c_{m}.$ Generalizations of this technique are known as \emph{blocks of blocks sampling}.

\textbf{\large 7.3 Spectral Resampling} \vspace{0.5cm}\newline
Residual and block sampling are time-domain techniques. An alternative approach is to sample in the frequency domain. A big advantage is that spectral increments are uncorrelated, and for Gaussian processes this strengthens to independence. Suppose we have $n=2m+1$ observations
\begin{equation*}
y_{t},\;t=-m,-m+1,...,m-1,m
\end{equation*}%
for which there is a continuous spectral density $S(\omega ),$ and define the frequencies $\omega _{k}=2\pi k/n,$ $-m\leq k\leq m.$ Then the observations have the spectral representation
\begin{equation*}
y_{t}=\sum_{k=-m}^{m}a_{k}e^{i\omega _{k}t}
\end{equation*}%
where
\begin{equation*}
a_{k}=n^{-1}\sum_{t=-m}^{m}y_{t}e^{-i\omega _{k}t}.
\end{equation*}%
In this section $i=\sqrt{-1}.$ For a Gaussian process the real and imaginary components of the $a_{k}$ are independent, and the $a_{k}$ themselves are asymptotically independent. Norgaard (1992) gives two possible ways of obtaining bootstrap samples of the $a_{k}.$ The simplest version is to draw $a_{k}^{\ast }$ at random from one of the twelve elements
\begin{equation}
(\pm a_{k-1},\;\pm a_{k},\;\pm a_{k+1},\;\pm ia_{k-1},\;\pm ia_{k},\;\pm ia_{k+1})
\end{equation}%
when $0