\beta _{0i}$
where $\beta _{0i}>0$ are given constants. Let $S=\{\beta _{i}:\left\vert
\beta _{i}\right\vert >\beta _{0i}\}$ denote the \emph{important set of
coefficients}. The obvious estimate is to select those coefficients $\beta
_{i}$ for which $\left\vert \hat{\beta}_{i}\right\vert >\beta _{0i}$, i.e. $%
\hat{S}=\{\beta _{i}:\left\vert \hat{\beta}_{i}\right\vert >\beta _{0i}\}$.
Bootstrapping is a simple way of assessing the stability of this choice. We
generate $B$ bootstrap samples $(y_{j}^{(k)\ast },\mathbf{x}_{j}^{(k)\ast })$
$j=1,2,...,n,$ $k=1,2,...,B,$ using either residual sampling or case
sampling say and fit the regression model to each bootstrap sample to give
bootstrap estimates $\mathbf{\hat{\beta}}^{(k)\ast },\ k=1,2,...,B.$ We then
calculate $\hat{S}^{(k)\ast }=\{\beta _{i}:\left\vert \hat{\beta}%
_{i}^{(k)\ast }\right\vert >\beta _{0i}\},$ $k=1,2,...,B.$ Now assume that
$B$ is sufficiently large so that each distinct bootstrap important set that
has been obtained occurs a reasonable number of times. Then a $(1-\alpha )$
confidence region for the unknown true important set can be constructed by
selecting bootstrap important sets in decreasing order of their observed
frequency of occurrence until $(1-\alpha )100\%$ of the $\hat{S}^{(k)\ast }$
have been chosen. Related to identifying important factors in metamodels is
the rather more difficult problem of metamodel selection. We consider this
in the next section.
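Before moving on, the following is a minimal sketch in Python (using only \texttt{numpy}, with hypothetical data, thresholds and sample sizes) of the bootstrap important-set procedure described above, based on case resampling:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def important_set(beta, beta0):
    # Indices i with |beta_i| > beta0_i, stored as a hashable tuple.
    return tuple(np.flatnonzero(np.abs(beta) > beta0))

# Hypothetical data: n runs, p design variables, thresholds beta0.
n, p, B, alpha = 50, 4, 1000, 0.10
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.1, -1.5, 0.0]) + rng.normal(size=n)
beta0 = np.full(p, 0.5)

counts = {}
for _ in range(B):
    idx = rng.integers(0, n, size=n)                      # case resampling
    beta_star = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    S_star = important_set(beta_star, beta0)
    counts[S_star] = counts.get(S_star, 0) + 1

# Add bootstrap important sets in decreasing order of observed frequency
# until (1 - alpha)100% of the B replicates have been covered.
region, covered = [], 0
for S_star, c in sorted(counts.items(), key=lambda kv: -kv[1]):
    region.append(S_star)
    covered += c
    if covered >= (1 - alpha) * B:
        break
print(region)
\end{verbatim}
The dictionary \texttt{counts} records the frequency of each distinct bootstrap important set, and the confidence region is then read off in decreasing order of frequency.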
\textbf{{\large 4.6 \label{Metamodelselect}Metamodel Comparison and
Selection}} \vspace{0.5cm}\newline
Metamodel comparison and selection is a difficult subject. The main trouble
is that models that we wish to compare may be of different functional
complexity. The statistical properties associated with quantities derived
from different models may therefore be hard to put on a common scale on
which they can be compared. A reasonably satisfactory approach is based on
the \emph{cross-validation} technique. We suppose that the initial
data has the form
\begin{equation*}
S=\{(y_{j},\mathbf{x}_{j}),\ \ j=1,2,...,n\}
\end{equation*}%
with
\begin{equation*}
y_{j}=\eta (\mathbf{x}_{j},\mathbf{\beta })+\varepsilon _{j},\ \ j=1,2,...,n.
\end{equation*}%
Suppose that our fitted regression is
\begin{equation*}
\hat{y}(\mathbf{x})=\eta (\mathbf{x},\mathbf{\hat{\beta}}).
\end{equation*}%
In its basic form cross-validation is used to assess how effective the
fitted regression is for predicting the response at some new design point $%
\mathbf{x}_{new}.$ Rather than explicitly choosing such a new point, a
simple idea is the \emph{leave-one-out} method where we drop an observed
point $(y_{j},\mathbf{x}_{j})$ from the set of all observed points, fit the
metamodel to the remaining points $S_{-j}=S\smallsetminus \{(y_{j},\mathbf{x}_{j})\}$, giving the fitted regression as
\begin{equation*}
\hat{y}_{-j}(\mathbf{x})=\eta (\mathbf{x},\mathbf{\hat{\beta}}_{-j}),
\end{equation*}%
and then look at the squared difference between the omitted $y_{j}$ and the
value of $y$ at $\mathbf{x}_{j}$, as predicted by the fitted model; i.e.
\begin{equation*}
(y_{j}-\hat{y}_{-j}(\mathbf{x}_{j}))^{2}.
\end{equation*}%
If we do this even handedly by leaving out each observation in turn we have
as an \emph{estimate of cross-validation prediction error}:
\begin{equation}
\hat{L}_{CV}=n^{-1}\sum_{j=1}^{n}(y_{j}-\hat{y}_{-j}(\mathbf{x}_{j}))^{2}.
\label{Lcv}
\end{equation}%
It turns out that, for the linear regression model, this formula simplifies
elegantly to one where we only need comparisons with the one model fitted to
all the original observations:
\begin{equation*}
\hat{L}_{CV}=n^{-1}\sum_{j=1}^{n}\frac{(y_{j}-\hat{y}(\mathbf{x}_{j}))^{2}}{%
1-h_{jj}}
\end{equation*}%
where
\begin{equation*}
\hat{y}(\mathbf{x}_{j})=\mathbf{x}_{j}^{T}\mathbf{\hat{\beta}},
\end{equation*}%
and $h_{jj}$ is the $j$th main diagonal entry in the hat-matrix (\ref%
{Hatmatrix}), as before. The distributional property of $\hat{L}_{CV}$ can
be obtained in the usual way by bootstrapping to get $B$ bootstrap samples.
If we use case resampling say then
\begin{equation*}
S^{(i)\ast }=\{(y_{j}^{(i)\ast },\mathbf{x}_{j}^{(i)\ast }),\ \ j=1,2,...,n\}
\end{equation*}%
where each $(y_{j}^{(i)\ast },\mathbf{x}_{j}^{(i)\ast })$ is an observation
drawn at random from the set $S.$ We turn now to model selection. For
simplicity we shall only consider the linear case where $y=\mathbf{x}^{T}%
\mathbf{\beta }+\varepsilon $, though with an obvious adjustment, the
discussion does generalize. Let $\mathbf{\beta }$ be of dimension $p$. If we
are uncertain as to which design variables are really needed in the model,
then we would have in principle up to $2^{p}$ models to choose from with $%
\binom{p}{q}$ (sub)models where there were precisely $q$ covariates
selected. We denote a typical submodel by $M$. The estimate of prediction
error using (\ref{Lcv}) is
\begin{equation*}
\hat{L}_{CV}(M)=n^{-1}\sum_{j=1}^{n}\frac{(y_{j}-\hat{y}_{M}(\mathbf{x}%
_{j}))^{2}}{1-h_{Mjj}}.
\end{equation*}%
It turns out that this measure does not work as well as it might even when $%
n $ is large. (This is an example previously referred to in subsection (\ref%
{Failure}) where inconsistency arises not in the problem itself but in the
bootstrap sampling method.) A much better variant is not to leave out just
one observation at a time but instead to split the observations into two: a
\emph{training} set, and an \emph{assessment} set with respectively $%
n_{t}=n-m$ and $n_{a}=m$ observations in each. We shall moreover select not
one but $K$ such pairs and denote the sets as $S_{t,k}$ and $S_{a,k}$, $%
k=1,2,...,K.$ We shall write $\mathbf{\hat{\beta}}_{Mk}$ for the
coefficients of model $M$ fitted to the $k$th training set, and write $\hat{y%
}_{Mjk}=\mathbf{x}_{Mj}^{T}\mathbf{\hat{\beta}}_{Mk}$ for the value of $%
\mathbf{x}_{j}^{T}\mathbf{\beta }$ predicted by this model $M$ at $\mathbf{x}%
_{j}.$ Then $\hat{L}_{CV}(M)$ becomes
\begin{equation}
\hat{L}_{CV}(M)=K^{-1}\sum_{k=1}^{K}m^{-1}\sum_{j\in S_{a,k}}(y_{j}-\hat{y}%
_{Mjk})^{2}. \label{Lcvmhat}
\end{equation}%
We use the same set of $K$ pairs for all models being compared. Provided $%
n-m\rightarrow \infty $ and $m/n\rightarrow 1$ as $n\rightarrow \infty $
then selecting $M$ to minimize (\ref{Lcvmhat}) will yield a consistent
estimator for the correct model. When $n$ is small it may not be possible to
select $m$ large enough in which case Davison and Hinkley (1997) suggest
taking $m/n\simeq 2/3.$ The variability of $\hat{L}_{CV}(M)$ can be
estimated by bootstrapping in the usual way.
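The following is a minimal sketch in Python (using only \texttt{numpy}, with hypothetical data and candidate submodels) of the split-sample selection criterion (\ref{Lcvmhat}), in which the same $K$ training/assessment splits are reused for every submodel $M$:
\begin{verbatim}
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p, K = 60, 3, 50
m = int(round(2 * n / 3))                 # assessment-set size, m/n ~ 2/3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n)

# A fixed set of K training/assessment splits, reused for every model M.
splits = []
for _ in range(K):
    perm = rng.permutation(n)
    splits.append((perm[m:], perm[:m]))   # (training, assessment) indices

def L_cv(cols):
    # cols: indices of the design-variable columns included in submodel M.
    total = 0.0
    for train, assess in splits:
        Xt, Xa = X[np.ix_(train, cols)], X[np.ix_(assess, cols)]
        beta_hat = np.linalg.lstsq(Xt, y[train], rcond=None)[0]
        total += np.mean((y[assess] - Xa @ beta_hat) ** 2)
    return total / K

# The 2^p submodels (the intercept, column 0, is always included).
models = [(0,) + c for q in range(p + 1)
          for c in combinations(range(1, p + 1), q)]
best = min(models, key=L_cv)
print("selected columns:", best, "with L_cv =", L_cv(best))
\end{verbatim}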
\textbf{{\large 5 \label{Comparisons}Bootstrap Comparisons}} \vspace{0.5cm}%
\newline
In this Section we consider problems where there are a number of different
samples to compare. We suppose that we have $m$ data sets%
\begin{equation}
\mathbf{y}_{i}=(y_{i1},y_{i2},...,y_{i,n_{i}}),\ \ i=1,2,...,m
\label{Severalys}
\end{equation}%
the $i$th consisting of $n_{i}$ runs. This situation covers two different
scenarios in particular. The first is where we are considering the question:
Does the output of the simulation model accurately represent the output of
the system being modelled? Here we have just two data sets, so that $m=2,$
and one data set comprises observations of a real system whilst the other
comprises the observed output from a simulation model. Thus this is a
question of validation. The second is when we have a number of different
simulation models to assess. Here validation against real data is not the
issue. The different models simply represent different systems, and their
validity is not in doubt. Instead we are just interested in assessing how
differently the systems behave, as represented by the output of their
simulation models. Typically we might be interested in which system produces
the greatest average output, or least average delay. Alternatively we might
wish to compare output variability. A similar methodology can be applied to
either problem. We discuss this in the remainder of this section.
\textbf{\large 5.1 Goodness-of-Fit and Validation} \vspace{0.5cm}\newline
A discussion of validation involving trace-driven simulation is described in
Kleijnen \emph{et al.} (2000, 2001). The basic idea used there can be
generalized. We treat the problem as essentially one of goodness-of-fit. A
simple starting point is where we have $n$ simulation outputs $y_{j},$ $%
j=1,2,...,n,$ to compare with $m$ observations $y_{j}^{\text{\emph{Real}}},$
$j=1,2,...,m$ of a real system and we wish to know if the two samples are
identically distributed. A good goodness-of-fit statistic is the two-sample
Cram\'{e}r-von Mises statistic (see Anderson, 1962):
\begin{equation}
W^{2}=\int [F_{n}(y)-F_{m}^{\text{\emph{Real}}}(y)]^{2}dH_{n+m}(y)
\label{Cvm}
\end{equation}%
where $F_{n},$ $F_{m}^{\text{\emph{Real}}}$ and $H_{n+m}$ are respectively
the EDF's of the simulated, real and combined samples. The asymptotic
distribution of $W^{2}$ is known when the samples are drawn from continuous
distributions, but it is easy to set up a bootstrap version of the test that
can handle discrete distributions just as well. We can use a bootstrapping
method as follows. We obtain $B$ pairs of bootstrap samples $(\mathbf{y}%
^{(i)\ast },$ $\mathbf{y}^{\text{\emph{Real}}(i)\ast })$, $i=1,2,...,B$,
each sample in the pair being obtained by resampling with replacement from $%
\mathbf{y}$ and $\mathbf{y}^{\text{\emph{Real}}}$ combined but with $\mathbf{%
y}^{(i)\ast }$ the same sample size as $\mathbf{y}$, and $\mathbf{y}^{\text{%
\emph{Real}}(i)\ast }$ the same sample size as $\mathbf{y}^{\text{\emph{Real}%
}}$. Denote the EDFs of the samples of each pair by $F_{n}^{(i)\ast }$ and $%
F_{n}^{\text{\emph{Real}}(i)\ast },$ $i=1,2,...,B$. We can now calculate $B$
bootstrap two-sample Cram\'{e}r-von Mises statistics from each pair:
\begin{equation}
W^{(i)\ast 2}=\int [F_{n}^{(i)\ast }(y)-F_{m}^{\text{\emph{Real}}(i)\ast
}(y)]^{2}dH_{n+m}^{(i)\ast }(y). \label{Cvmboot}
\end{equation}%
Under the null assumption that $\mathbf{y}$ and $\mathbf{y}^{\text{\emph{Real%
}}}$ are drawn from the same distribution it follows that each $W^{(i)\ast
2}$ is just a bootstrap version of $W^{2}$. We can thus obtain a critical value, for any given significance level, from the EDF of the $W^{(i)\ast 2},$ $i=1,2,...,B$. The null hypothesis that $\mathbf{y}$ and $\mathbf{y}^{\text{\emph{Real}}}$ are drawn from the same distribution can then be tested simply by checking whether the original $W^{2}$ exceeds this critical value. The validation method just
described is easily extended to allow the representational accuracy of a
number of different competing simulation models to be compared against (the
same) real data. We order the Cram\'{e}r-von Mises goodness-of-fit statistics (%
\ref{Cvm}) obtained from comparing the real data with the simulated output
of each of the models. We can assess the reliability of the comparison by
bootstrapping as in (\ref{Cvmboot}) to obtain bootstrap distributions of
each such Cram\'{e}r-von Mises statistic and looking at the degree of overlap
amongst these distributions. When the $W^{2}$ statistic shows the data
samples $\mathbf{y}_{j}$ to be significantly different, there is an
interesting decomposition of $W^{2}$ that allows the nature of the
differences to be more clearly identified. Durbin and Knott (1972) show
that in the one sample case where $F_{n}(.)$ is being compared with a
uniform null distribution, so that $W^{2}=n\int [F_{n}(y)-y]^{2}dy$, then
$W^{2}$ has the orthogonal representation%
\begin{equation*}
W^{2}=\sum_{j=1}^{\infty }(j\pi )^{-2}z_{nj}^{2}
\end{equation*}%
where the $z_{nj}$ are scaled versions of the coefficients in the Fourier
decomposition of the process $\sqrt{n}(F_{n}(y)-y)$. The $z_{nj}$ are
stochastic of course, depending on $F_{n}(.),$ but they are independent and,
under the null, have identical distributions. It can be shown that $z_{n1}$, $z_{n2}$ and $z_{n3}$ depend essentially only on deviations of, respectively, the mean, variance and skewness of $F(\cdot )$ from those of the null. In other words these components $z_{n1},$ $z_{n2}$ and $z_{n3}$ can be
used to provide convenient statistical tests for these differences. Cheng
and Jones (2000, 2004) describe a generalisation of the results of Durbin
and Knott (1972), which they term \emph{EDFIT statistics,} suitable for
comparison of data of the form (\ref{Severalys}) arising in the simulation
context. Their formulation uses ranks rather than the original observations. Though this
leads to some loss of power the method does then have the advantage of
enabling critical values of tests to be easily obtained by bootstrapping,
and moreover in an exact way.
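For the basic two-sample case, the following is a minimal sketch in Python (using only \texttt{numpy}, with hypothetical samples) of the bootstrap Cram\'{e}r-von Mises test of (\ref{Cvm}) and (\ref{Cvmboot}), with both bootstrap samples drawn from the combined data as described above:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

def edf(sample, points):
    # Empirical distribution function of `sample` evaluated at `points`.
    return np.searchsorted(np.sort(sample), points, side="right") / len(sample)

def cvm(sim, real):
    # W^2 = integral of [F_n - F_m^Real]^2 dH_{n+m}; H_{n+m} puts mass
    # 1/(n+m) on every point of the combined sample.
    combined = np.concatenate([sim, real])
    return np.mean((edf(sim, combined) - edf(real, combined)) ** 2)

y_sim = rng.exponential(1.0, size=200)        # simulation output
y_real = rng.exponential(1.1, size=80)        # real-system observations
w2 = cvm(y_sim, y_real)

B, pooled = 1000, np.concatenate([y_sim, y_real])
w2_boot = np.empty(B)
for i in range(B):
    # Resample both samples from the pooled data (the null hypothesis).
    star = rng.choice(pooled, size=pooled.size, replace=True)
    w2_boot[i] = cvm(star[:y_sim.size], star[y_sim.size:])

# Reject the null at level alpha if W^2 exceeds the bootstrap critical value.
alpha = 0.05
print(w2, np.quantile(w2_boot, 1 - alpha), np.mean(w2_boot >= w2))
\end{verbatim}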
\textbf{\large 5.2 Comparison of Different Systems} \vspace{0.5cm}\newline
We now consider the situation of (\ref{Severalys}) where the data sets are
outputs from different simulation models only, and no comparison is made
with real data. Here we are interested simply in making comparisons between
different models. The discussion of the previous subsection goes through
with no essential change. The only difference is that all samples have the
same status; there is no sample being singled out as special by being drawn
from a real system. The EDFIT approach can therefore be used directly, with
bootstrapping providing critical null values. A more direct approach, not
using goodness of fit ideas, is possible. We focus on examining differences
between the means, $\bar{y}_{i},$ $i=1,2,..,m,$ of the $m$ samples of (\ref%
{Severalys}). Comparison of other sample statistics can be handled in the same
way. If we suppose that largest is best we can rank the means as $\bar{y}%
_{(1)}>\bar{y}_{(2)}>...>\bar{y}_{(m)}.$ The question then is how stable is
this order? Bootstrapping provides an easy answer. We generate $B$ bootstrap
sets of observations%
\begin{eqnarray}
\mathbf{y}_{i}^{(k)\ast } &=&(y_{i1}^{(k)\ast },y_{i2}^{(k)\ast
},...,y_{i,n_{i}}^{(k)\ast }),\ \ \label{Severalyboots} \\
i &=&1,2,...,m,\ \ k=1,2,...,B \notag
\end{eqnarray}%
where each sample $\mathbf{y}_{i}^{(k)\ast },\ k=1,2,...,B,$ is obtained by
sampling with replacement from $\mathbf{y}_{i}.$ We then order the means in
each bootstrapped set: $\bar{y}_{(1)}^{(k)\ast }>\bar{y}_{(2)}^{(k)\ast
}>...>\bar{y}_{(m)}^{(k)\ast }$, $k=1,2,...,B.$ The frequency count of how
many times the mean of a given sample comes out on top is a measure of its
relative merit. A more comprehensive picture is provided by setting up a
two-way table showing the number of times $\bar{y}_{i}^{(k)\ast }>\bar{y}%
_{j}^{(k)\ast }$ out of the $B$ bootstrapped sets, for each possible pair $%
1\leq i,$ $j\leq m$. The parametric form of the bootstrapping procedure is
equally easy to implement, though it is not clear there is much advantage to
be had in doing so. Suppose for example that the $i$th sample $\mathbf{y}%
_{i} $ is drawn from the distribution $F^{(i)}(.,\mathbf{\theta }_{i}).$
Usually the $F^{(i)}(.,\mathbf{\theta }_{i})$ will have the same functional
form, for example all normal distributions, but the procedure works equally
well when the $F^{(i)}$ are functionally different. Let the mean of the $%
F^{(i)}$ distribution be $\mu ^{(i)}(\mathbf{\theta }_{i}).$ Then we can
estimate, by ML say, the parameters of each distribution from their
corresponding sample. Let these estimates be $\mathbf{\hat{\theta}}_{i}.$ We
can then carry out parametric bootstrapping by forming (\ref{Severalyboots})
but this time with each sample $\mathbf{y}_{i}^{(k)\ast },\ k=1,2,...,B,$
obtained by sampling from the fitted distribution $F^{(i)}(.,\mathbf{\hat{%
\theta}}_{i}).$ The analysis then proceeds as before.
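The following is a minimal sketch in Python (using only \texttt{numpy}, with hypothetical samples) of the nonparametric version of this procedure, recording both how often each sample mean comes out on top and the two-way table of pairwise comparisons:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
samples = [rng.normal(loc, 1.0, size=40) for loc in (5.0, 4.8, 4.0)]  # m = 3 systems
m, B = len(samples), 2000

top_count = np.zeros(m, dtype=int)       # how often each system has the largest mean
pairwise = np.zeros((m, m), dtype=int)   # pairwise[i, j]: times mean_i* > mean_j*

for _ in range(B):
    means = np.array([rng.choice(s, size=s.size, replace=True).mean()
                      for s in samples])
    top_count[means.argmax()] += 1
    for i in range(m):
        for j in range(m):
            if i != j and means[i] > means[j]:
                pairwise[i, j] += 1

print(top_count / B)   # stability of "largest is best"
print(pairwise / B)    # two-way table of pairwise comparisons
\end{verbatim}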
\textbf{\large 6 Bayesian Models} \vspace{0.5cm}\newline
Much of the current popularity of Bayesian methods comes about because
increased computing power makes possible Bayesian analysis by numerical
procedures, most notably Markov Chain Monte Carlo (MCMC); see Gilks \emph{et al.} (1996). This allows numerical sampling to be carried out in much the same way
as is done in bootstrap and other resampling procedures. We indicate the
commonality by considering just one or two situations where Bayesian methods
and resampling overlap. We again focus on a random sample $\mathbf{w}=(w_{1},w_{2},...,w_{n})$ drawn from a distribution $F(w,\mathbf{\theta })$. As in the situation where we considered ML estimation,
the form of $F$ is known but it depends on a vector $\mathbf{\theta }$ of
parameters. In the ML estimation situation $\mathbf{\theta }_{0},$ the true
value of $\mathbf{\theta },$ was assumed unknown. In the Bayesian case a
prior distribution is assumed known. We consider just the continuous case
and denote the pdf of the prior by $\pi (\mathbf{\theta }).$ The main step
in Bayesian analysis is to construct the posterior distribution $\pi (%
\mathbf{\theta }$\ $|\ \mathbf{w})$ which shows how the sample $\mathbf{w}$, which depends on $\mathbf{\theta }$, has modified the prior. The Bayesian
formula is
\begin{subequations}
\begin{equation}
\pi (\mathbf{\theta }\ |\ \mathbf{w})=\frac{p(\mathbf{w\ }|\ \mathbf{\theta }%
)\pi (\mathbf{\theta })}{\int p(\mathbf{w\ }|\ \mathbf{\theta })\pi (\mathbf{%
\theta })d\mathbf{\theta }}. \label{Bayestheorem}
\end{equation}%
The difficult part is the evaluation of the normalizing integral $\int p(%
\mathbf{w\ }|\ \mathbf{\theta })\pi (\mathbf{\theta })d\mathbf{\theta }.$
MCMC has proved remarkably successful in providing a powerful numerical
means of doing this. In the context of simulation a Bayesian approach is
useful in assessing uncertainty concerning input parameter values used in a
simulation model. We can adopt the viewpoint given in (\ref{ysim}) and
regard the runs as taking the form:
\end{subequations}
\begin{equation}
y_{j}=y(\mathbf{u}_{j},\mathbf{v}_{j},\mathbf{\theta },\mathbf{x}_{j})\text{%
\ \ }j=1,2,...,n. \label{ysim2}
\end{equation}%
The difference now is that $\mathbf{\theta }$ is assumed to have a Bayesian prior distribution
$\pi (\mathbf{\theta })$. Note that in this situation we do \emph{not} have
the equivalent of data $\mathbf{w}$ from which to calculate a posterior
distribution for $\mathbf{\theta }$. What we can do is to treat the prior $%
\pi (\mathbf{\theta })$ as \emph{inducing a prior distribution} on $y=y(%
\mathbf{u},\mathbf{v},\mathbf{\theta },\mathbf{x}).$ In this sense we can
think of $y$ itself as having a prior which can then be estimated by
sampling $\mathbf{\theta }$ from its prior $\pi (\mathbf{\theta }),$
yielding values $\mathbf{\theta }_{j}$ $j=1,2,...,n,$ and then running the
model to produce a set of observations
\begin{equation}
y_{j}=y(\mathbf{u}_{j},\mathbf{v}_{j},\mathbf{\theta }_{j},\mathbf{x}_{j})%
\text{\ \ }j=1,2,...,n.
\end{equation}%
The EDF of these $y_{j}$ then estimates this prior distribution of $y.$ This
process is clearly the analogue of the classical situation where $\mathbf{%
\theta }$ is estimated from data $\mathbf{w,}$ as is supposed in (\ref{ysim}%
), and we then allowed for this variability in the $y_{j}$ by replacing $%
\mathbf{\hat{\theta}}(\mathbf{w})$ by $\mathbf{\theta }_{j}^{\ast }$
calculated from (\ref{Thetastarnormal}) or (\ref{Thetastarboot}). More
interestingly we can take this a step further if there are real observations
available of the $y$ process itself. We do not attempt a fully general
formulation but give an example to indicate the possibilities. Suppose that
\begin{equation*}
y_{j}^{\text{\emph{Real}}}=y^{\text{\emph{Real}}}(\mathbf{x}_{j}),\ \ \ \
j=1,2,...,n
\end{equation*}%
comprise $n$ observations on the real system being modelled by the
simulation. The precise relationship between the simulation output $y=y(%
\mathbf{u},\mathbf{v},\mathbf{\theta },\mathbf{x})$ and $y^{\text{\emph{Real}%
}}(\mathbf{x})$ is not known. However we might assume that
\begin{equation*}
y(\mathbf{u},\mathbf{v},\mathbf{\theta },\mathbf{x})\sim N(y^{\text{\emph{%
Real}}}(\mathbf{x}),\ \sigma ^{2})
\end{equation*}%
where $\sigma ^{2}$ is an additional parameter that will also be treated as
Bayesian in the sense of having a prior. We can express our great prior
uncertainty about $\sigma $ by assuming a \emph{reference prior distribution}, $\rho (\sigma )$, for it. Then, by (\ref{Bayestheorem}), the
posterior distribution of $(\mathbf{\theta },\sigma )$ is proportional to
\begin{equation*}
\sigma ^{-n}\pi (\mathbf{\theta })\rho (\sigma )\exp \{-\frac{1}{2\sigma ^{2}%
}\sum_{j=1}^{n}[y(\mathbf{u},\mathbf{v},\mathbf{\theta },\mathbf{x}_{j})-y^{%
\text{\emph{Real}}}(\mathbf{x}_{j})]^{2}\}.
\end{equation*}%
The posterior distribution can thus be obtained by MCMC methods for example.
An interesting application of this problem occurs in epidemiological
modelling. Suppose that $y$ is a measure of the progress of an epidemic that
is dependent on factors $\mathbf{\theta }$ for which there are previous
measurements or for which there is expert information. A Bayesian approach
is then a natural way of incorporating this prior information. We need also
to have a good epidemic model for producing a simulated $y$. Thus reverse
use of simulation as indicated above has allowed this prior information to
be updated.
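As a simple illustration of the forward step only, the following is a minimal sketch in Python (the one-parameter simulation model and the gamma prior are purely hypothetical stand-ins) of estimating the prior distribution induced on $y$ by sampling $\mathbf{\theta }$ from $\pi (\mathbf{\theta })$ and running the model:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)

def simulate(theta, x, rng):
    # Stand-in for one simulation run y = y(u, v, theta, x); here simply
    # the design point x plus an exponential delay with rate theta.
    return x + rng.exponential(1.0 / theta)

n = 1000
theta = rng.gamma(shape=4.0, scale=0.5, size=n)   # draws from the prior pi(theta)
x = 1.0                                           # a fixed design point
y = np.array([simulate(th, x, rng) for th in theta])

# The EDF of these y_j estimates the prior distribution induced on y.
print(np.quantile(y, [0.05, 0.5, 0.95]))
\end{verbatim}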
\textbf{{\large 7 \label{Timeseries}Time Series Output}} \vspace{0.5cm}%
\newline
Bootstrapping of time series is a well studied problem. In simulation the
most likely use of such procedures is to generate correlated input for a
model. As usual the parametric form is relatively easy to explain and
implement and we discuss this first.
\textbf{\large 7.1 Residual Sampling} \vspace{0.5cm}\newline
We consider the case where the time-series is an autoregressive model. Here
the residual sampling method used to construct a bootstrap metamodel applies
with little change. Suppose we have the autoregressive model
\begin{equation*}
Y_{t}=\sum_{j=1}^{p}a_{j}Y_{t-j}+\varepsilon _{t}
\end{equation*}%
where the $\varepsilon _{t}$ are independently and identically distributed,
commonly called the \emph{innovations}. Suppose that $%
y_{1},y_{2},...,y_{n}$ are a series drawn from this model. Then we can
estimate the $a_{j}$ by least squares say, yielding the estimates $\hat{a}%
_{j},$ $j=1,2,...,p,$ and form the residuals
\begin{equation*}
r_{t}=y_{t}-\sum_{j=1}^{p}\hat{a}_{j}y_{t-j},\;\;t=p+1,p+2,...,n.
\end{equation*}%
We can then form the bootstrap time-series as
\begin{equation*}
y_{t}^{\ast }=\sum_{j=1}^{p}\hat{a}_{j}y_{t-j}^{\ast }+r_{t}^{\ast
},\;\;t=1,2,...,n
\end{equation*}%
by sampling the $r_{t}^{\ast }$ from the EDF of the residuals $\{r_{t}\}.$
We need $y_{1}^{\ast },y_{2}^{\ast },...,y_{p}^{\ast }$ to initiate the
process, but if we assume that the observed series is stationary, it is
probably easiest to simply initiate the series with some arbitrary starting
values, possibly the original $y_{1},y_{2},...,y_{p},$ then run the
bootstrapping sufficiently long until initial effects are negligible, and
collect the actual $y_{t}^{\ast }$ from that point on. Freedman (1984) gives
conditions for the asymptotic validity of this procedure, and Basawa \emph{et al.} (1989) extend these results to the case of nonlinear time-series.
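The following is a minimal sketch in Python (using only \texttt{numpy}, with a hypothetical AR(2) series) of this residual-sampling scheme, including a burn-in period so that the arbitrary starting values have a negligible effect:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)

# A simulated "observed" series from a stationary AR(2) model.
n, a_true = 500, np.array([0.5, -0.3])
y = np.zeros(n)
for t in range(2, n):
    y[t] = a_true @ y[t - 2:t][::-1] + rng.normal()

p = 2
# Least-squares fit: regress y_t on (y_{t-1}, ..., y_{t-p}).
X = np.column_stack([y[p - j - 1:n - j - 1] for j in range(p)])
a_hat = np.linalg.lstsq(X, y[p:], rcond=None)[0]
resid = y[p:] - X @ a_hat

def bootstrap_series(n_out, burn_in=200):
    # Start from arbitrary values (here the first p observations) and
    # discard a burn-in before collecting the bootstrap series.
    z = list(y[:p])
    for _ in range(burn_in + n_out):
        z.append(a_hat @ np.array(z[-p:])[::-1] + rng.choice(resid))
    return np.array(z[-n_out:])

y_star = bootstrap_series(n)
print(y_star[:5])
\end{verbatim}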
\textbf{\large 7.2 Block Sampling} \vspace{0.5cm}\newline
For time-series the analogue of case sampling is \emph{block sampling.} We
cannot sample individual observations $y_{t}$ because this loses the
correlation between observations. If the series is long enough then we can
take $n=bl$ and think of the series as comprising $b$ blocks each of length $%
l.$ We write the $i$th block as $\mathbf{y}%
_{i}=(y_{l(i-1)+1},y_{l(i-1)+2},...,y_{li}),$ $i=1,2,...,b.$ Bootstrapping
is done by sampling blocks with replacement from this set of $b$ blocks,
retaining the order of the observations in each block when writing down the
individual observations of the bootstrapped series. A balance needs to be
struck between having block lengths long enough to retain the correlation
properties between neighboring observations, and having enough blocks to
measure the inherent variation of the series. A typical compromise is to use
say $b=l=n^{1/2}$ so that both quantities tend to infinity as $n\rightarrow
\infty .$ A major weakness of block sampling is the loss of correlation
incurred by the random sampling of blocks. This loss of correlation is
called \emph{whitening}. It is especially serious when the statistic of
interest involves correlations of high lag. The crude block sampling just
described may be quite ineffective if the size of blocks is not large
enough, because calculation involves quantities which straddle blocks and
which are then not sufficiently correlated because of the whitening. There
are many variants of block sampling aimed at reducing the effect of
whitening in specific situations. A good example is estimation of the lag $m$
covariance
\begin{equation*}
c_{m}=\frac{1}{n-m}\sum_{t=1}^{n-m}(y_{t}-\bar{y})(y_{t+m}-\bar{y}).
\end{equation*}%
Here we can define a two-dimensional process
\begin{equation*}
\mathbf{z}_{t}=\left(
\begin{array}{c}
z_{1t} \\
z_{2t}%
\end{array}%
\right) =\left(
\begin{array}{c}
y_{t} \\
y_{t+m}%
\end{array}%
\right) ,\;\;t=1,2,...,n-m
\end{equation*}%
with $\bar{z}_{i}=(n-m)^{-1}\sum_{t=1}^{n-m}z_{it}$ and think of $c_{m}$ as
a statistic of this process:
\begin{equation*}
c_{m}=\frac{1}{n-m}\sum_{t=1}^{n-m}(z_{1t}-\bar{z}_{1})(z_{2t}-\bar{z}_{2}).
\end{equation*}%
We can then obtain bootstrap $\mathbf{z}_{t}^{\ast }$ by sampling with
replacement from the set $\{\mathbf{z}_{t}\}.$ The bootstrap lag $m$
covariance then substantially retains the covariance structure of the original
series as we have, in effect, bootstrap sampled the terms $(y_{t}-\bar{y}%
)(y_{t+m}-\bar{y})$ appearing in the formula giving $c_{m}.$ Generalizations
of this technique are known as \emph{block of blocks sampling}.
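The following is a minimal sketch in Python (using only \texttt{numpy}, with a hypothetical correlated series) of crude block sampling with $b=l\approx n^{1/2}$, together with resampling of the pairs $\mathbf{z}_{t}$ for the lag $m$ covariance as just described:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
# A hypothetical correlated series of length n.
n = 400
eps = rng.normal(size=n)
y = np.empty(n); y[0] = eps[0]
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + eps[t]

l = int(np.sqrt(n)); b = n // l              # block length and number of blocks
blocks = y[:b * l].reshape(b, l)             # block i holds y_{l(i-1)+1},...,y_{li}

def block_bootstrap():
    # Sample b blocks with replacement, keeping the within-block order.
    return blocks[rng.integers(0, b, size=b)].ravel()

# Resample the pairs z_t = (y_t, y_{t+m}) for the lag m covariance, so that
# each bootstrapped term retains the original within-pair correlation.
m_lag = 2
z = np.column_stack([y[:n - m_lag], y[m_lag:]])

def c_m_star():
    z_star = z[rng.integers(0, z.shape[0], size=z.shape[0])]
    zbar = z_star.mean(axis=0)
    return np.mean((z_star[:, 0] - zbar[0]) * (z_star[:, 1] - zbar[1]))

print(block_bootstrap()[:5])
print(c_m_star())
\end{verbatim}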
\textbf{\large 7.3 Spectral Resampling} \vspace{0.5cm}\newline
Residual and block sampling are time domain techniques. An alternative
approach is to sample in the frequency domain. A big advantage is that
spectral increments are uncorrelated, and for Gaussian processes this
strengthens to independent increments. Suppose we have $n=2m+1$ observations
\begin{equation*}
y_{t},\;t=-m,-m+1,...,m-1,m
\end{equation*}%
for which there is a continuous spectral density $S(\omega )$ and define the
frequencies $\omega _{k}=2\pi k/n,$ $-m\leq k\leq m.$ Then the observations
have the spectral representation
\begin{equation*}
y_{t}=\sum_{k=-m}^{m}a_{k}e^{i\omega _{k}t}
\end{equation*}%
where
\begin{equation*}
a_{k}=n^{-1}\sum_{t=-m}^{m}y_{t}e^{-i\omega _{k}t}.
\end{equation*}%
In this section $i=\sqrt{-1}.$ For a Gaussian process the real and purely
imaginary components of the $a_{k}$ are independent, and the $a_{k}$ are
asymptotically so. Norgaard (1992) gives two possible ways of obtaining
bootstrap samples of the $a_{k}.$ The simplest version is to draw $%
a_{k}^{\ast }$ at random from one of the twelve elements
\begin{equation}
(\pm a_{k-1},\;\pm a_{k},\;\pm a_{k+1},\;\pm ia_{k-1},\;\pm ia_{k},\;\pm
ia_{k+1})
\end{equation}%
when $0