Munich Personal RePEc Archive
Volatility Proxies for Discrete Time Models
Robin G. de Vilder and Marcel P. Visser
Korteweg-de Vries Institute for Mathematics, University of Amsterdam
14. September 2007
MPRA Paper No. 4917, posted 14. September 2007 / 11:40

Abstract
Discrete time volatility models typically employ a latent scale factor to represent
volatility. High frequency data may be used to construct proxies for these scale factors.
Examples are the intraday high-low range and the realized volatility. This paper develops
a method for ranking and optimizing volatility proxies. It is possible to outperform
the quadratic variation as a proxy for the discrete time scale factor. For the S&P 500
index data over the years 1988-2006 this is achieved by a proxy which puts, among
other things, more weight on the highs than on the lows over intraday intervals.
JEL classification: C22,C52,C65.
Keywords: volatility proxy, realized volatility, quadratic variation, scale factor,
arch/garch/stochastic volatility, intraday seasonality.
PSE (CNRS-EHESS-ENPC-ENS), Paris, France and Korteweg-de Vries Institute for Mathematics,
University of Amsterdam, The Netherlands.
Corresponding author, Korteweg-de Vries Institute for Mathematics, University of Amsterdam. Plantage
Muidergracht 24, 1018 TV Amsterdam, The Netherlands. Tel. +31 20 5255861. Email: m.p.visser@uva.nl.
1 Introduction
Volatility refers to the degree to which financial prices tend to fluctuate. It changes over
time, and financial time series display periods of low and high volatility. A large and widely
used body of discrete time models for dealing with time varying volatility has a product
structure
$$r_n = s_n Z_n. \tag{1}$$
The observed financial return $r_n$ is modelled as the product of an iid innovation $Z_n$ and a
positive scale factor $s_n$. The scale factor is generally called the volatility. Attention typically
centers on the volatility process $(s_n)$, specifications for which include Arch/Garch, stochastic
volatility, long memory, and Markov switching.
Since only the returns $r_n$ are observed, the values of the scale factors $s_n$ have to be
estimated. High frequency data may be used to construct proxies for the scale factor $s_n$,
such as the intraday high-low range or the realized volatility. Proxies serve as measurements
of the scale factor. Good proxies are needed for efficient parameter estimation (see, e.g.,
Alizadeh, Brandt, and Diebold (2002)), they are essential for volatility forecast evaluation
(see, for instance, Andersen and Bollerslev (1998), and Hansen and Lunde (2006a)), and
good proxies will help to improve the economic understanding of the volatility process.
There is no straightforward way to tell, for any two proxies, which one is better, since
the scale factors $s_n$ are unobservable. Much of the current research focuses on efficiently
estimating the quadratic variation, which is (the limit of) the sums of squared intraday
returns, see, for instance, Barndorff-Nielsen and Shephard (2002), and Andersen, Bollerslev,
Diebold and Labys (2003). However, we shall see that the quadratic variation does not
necessarily lead to the most efficient proxy for the discrete time scale factor $s_n$.
This paper proposes a way to rank and optimize proxies for the daily scale factor $s_n$,
based on intraday high frequency data. We take three steps: extend the discrete time models
(1) to a generic continuous time model with intraday seasonality; fix a large class of volatility
proxies; and develop tools for ranking and optimizing those proxies.
We integrate high frequency data into the discrete time models (1), by replacing the iid
innovations $(Z_n)$ by iid stochastic processes $(\Psi_n)$ on the unit time interval:
$$R_n(\vartheta) = s_n \Psi_n(\vartheta), \qquad \vartheta \in [0, 1]. \tag{2}$$
The continuous time models describe, for each day, the daily log return process $R_n$ from the
opening until closing, starting from the overnight return at the opening. The process $\Psi_n$
may be any stochastic process, and is scaled by $s_n$, resulting in a bridge between the discrete
time, close-to-close returns $r_n$ in (1). These continuous time models are consistent with both
the persistence of volatility captured by discrete time volatility models, and with widely
observed empirical facts such as intraday seasonalities, leverage effects, and jumps. As in
the discrete time model, $s_n$ is a state of the market representing the level of the day-specific
price sensitivity. The actual fluctuations in the day $n$ return process $R_n$ are the result of
the local fluctuations in the process $\Psi_n$, scaled by the day-specific factor $s_n$. The structure
in (2) is not a model of constant intraday volatility: depending on $\Psi_n$ it is possible to have
a hectic day (for instance, a large quadratic variation) when $s_n$ is low, and vice versa. For
discrete time models the term volatility tends to have a slightly different meaning than it
has for continuous time models. To avoid confusion we will, from now on, refer to $s_n$ as the
daily scale factor.
We proceed by formalizing the notion of a proxy in the spirit of Alizadeh, Brandt, and
Diebold (2002). The definition covers popular proxies such as the absolute return, the
intraday high-low range, and the realized volatility. For many applications it is helpful to
have proxies with small measurement variance. We develop some easy-to-implement tools
for ranking and improving proxies. In particular, proxies may be ranked by the variance of
their logarithm and they may be combined into a more efficient proxy by a technique similar
to the Markowitz (1952) optimization procedure. By the end of Section 3, we will have at
our disposal a largely model free methodology for ranking and improving proxies: optimality
of a proxy for $s_n$ does not depend on the particular discrete time model, as long as (1) holds.
Empirical analysis of intraday S&P 500 index futures market data from January 1988
to mid 2006 shows that the techniques provide a good proxy for the S&P 500 data. In
terms of the variance of the logarithm, the additional use of the high-low range over intraday
intervals yields proxies that substantially outperform the realized volatility, which uses
only squared intraday returns. Martens and van Dijk (2007) arrive at a similar finding in
the context of estimating the quadratic variation. Moreover, our empirical results suggest
that the optimized proxy is more efficient than the (square root of) the quadratic variation.
Interestingly, the optimized proxy based on the highs, the lows, and the absolute returns
over ten-minute intervals, puts more weight on the highs than on the lows. From this point
of view, the upward price movements are more informative than the downward movements,
when proxying volatility.
The discrete time model class (1) provides a flexible way for dealing with time varying
volatility. These models are important tools for the practice of risk management and asset
allocation, and they enable academic researchers to investigate fundamental questions in
finance, such as the tradeoff between risk and return, see, for instance, Engle, Lilien, and
Robins (1987).
Many discrete time volatility models were developed when high frequency data were
not readily available, and were initially applied to daily, or lower frequency returns. The
continuous time models in (2) offer a natural way to introduce high frequency data into the
discrete time models and to construct proxies for the scale factors.
The literature contains several approaches to dealing with high frequency data in discrete
time volatility models. The first is to apply the model to a five-minute grid, or even to
study the continuous time limit, and to infer the implications for the daily or weekly sampling
frequency, see, for example, Drost and Nijman (1993), Drost and Werker (1996), and Meddahi
and Renault (2004). In empirical applications this approach leads to serious time aggregation
problems: the parameters estimated using very short intervals typically turn out to be
internally inconsistent with those obtained from daily or lower frequencies. These problems
have often led to the impression that the discrete time model structure (1) is fundamentally
flawed and cannot be made consistent with the data. Andersen and Bollerslev (1997) argue,
on the other hand, that intraday seasonality causes a daily barrier below which stationary
models should not be applied. Their daily barrier agrees with our choice for the business day
as a unit of time in the models (1) and (2). They propose a model for five-minute returns,
which explicitly takes into account intraday periodicities on top of the daily volatility $s_n$.
Alizadeh, Brandt, and Diebold (2002) assume an intraday Brownian motion, and use the
daily high-low range for quasi maximum likelihood estimation of a discrete time stochastic
volatility model. They obtain improved estimators due to the high-low range being superior
to absolute or squared daily returns as a proxy for the volatility $s_n$.
The remainder of the paper is organized as follows. Section 2 provides a continuous time
extension of the discrete time models. Section 3 introduces proxies, and develops tools for
ranking and optimizing them. Section 4 constructs a good proxy for the S&P 500 data.
In Section 5 we conclude. Appendices A, B, and C contain a description of the data, the
empirical technique of prescaling and a number of proofs, respectively.
2 The Continuous Time Models
In the classical Black Scholes world the log return process is a continuous time Brownian
motion $\sigma B_t$ with constant volatility $\sigma > 0$. This leads to discrete time returns
$$r_n = \sigma Z_n, \tag{3}$$
where the $Z_n$ are iid standard Gaussian random variables. The Black Scholes model has
been extended to deal with time varying volatility. In the continuous time extensions the
term volatility refers to either the instantaneous diffusion coefficient, or the (square root of)
the quadratic variation over a given time period. The discrete time generalizations replace
the constant $\sigma$ in (3) by a sequence $(s_n)$ of strictly positive random variables:
$$r_n = s_n Z_n. \tag{1}$$
The innovations $(Z_n)$ are iid, and the innovation $Z_n$ is independent of $s_n$.$^1$ The term
volatility now refers to the scale factor $s_n$, which generally differs from the continuous time
concept of daily volatility.
In order to study intraday based proxies for the scale factor $s_n$, we integrate high frequency
data into the discrete time models by replacing the iid innovations $(Z_n)$ by iid stochastic
processes $(\Psi_n)$:
$$R_n(\vartheta) = s_n \Psi_n(\vartheta), \qquad \vartheta \in [0, 1]. \tag{2}$$
The day $n$ return process $R_n$ is the log return process with respect to the previous day’s
close. The return process $R_n$ is the product of the daily scale factor $s_n$ and a stochastic
process $\Psi_n$. The process $\Psi_n$ is a cadlag process$^2$ representing the intraday price pattern.
The time of day $\vartheta$ is in $[0, 1]$. The processes $\Psi_n$ are iid. The sequence of scale factors $(s_n)$
may be any strictly positive stochastic process, as long as the scale factor $s_n$ is independent
of all $\Psi_k$ for $k \geq n$. In particular, the scale factor $s_n$ and the process $\Psi_n$ are
independent, for each day $n$. The continuous time model structure (2) will be referred to as
the scaling hypothesis. A model that satisfies the scaling hypothesis leads to the well-known
product structure (1) for the daily close-to-close returns,
$$r_n = R_n(1) = s_n \Psi_n(1).$$
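To make the scaling hypothesis concrete, the following is a minimal simulation sketch, not the authors' code. The AR(1) dynamics for $\log s_n$, the U-shaped intraday pattern `v`, the 78-point grid, and the omission of the overnight return are all assumptions chosen for illustration; only the product structure $R_n = s_n \Psi_n$ is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_steps = 500, 78            # 78 intraday steps: an illustrative choice

# Hypothetical scale-factor dynamics: log(s_n) follows an AR(1). Any strictly
# positive process independent of the current and future Psi would do.
log_s = np.zeros(n_days)
for n in range(1, n_days):
    log_s[n] = 0.95 * log_s[n - 1] + 0.2 * rng.standard_normal()
s = np.exp(log_s)

# Psi_n: iid intraday paths on a grid of [0, 1], here Brownian increments with
# a deterministic U-shaped volatility pattern v (an assumed form of seasonality).
theta = (np.arange(n_steps) + 0.5) / n_steps
v = 1.0 + 2.0 * (theta - 0.5) ** 2
increments = v * np.sqrt(1.0 / n_steps) * rng.standard_normal((n_days, n_steps))
psi = np.cumsum(increments, axis=1)  # Psi_n sampled on the grid

R = s[:, None] * psi                 # scaling hypothesis (2)
r = R[:, -1]                         # close-to-close returns r_n = R_n(1)
Z = psi[:, -1]                       # innovations Z_n = Psi_n(1)
```

By construction `r` equals `s * Z`, which is the product structure (1); the innovations `Z` are iid but not standardized.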
The continuous time models have the following interpretation. The discrete time scale
factor $s_n$ describes the state of the market on day $n$, and represents the level of overall price
sensitivity. The factor $s_n$ may be determined by the process over the past days,
$(s_{n-k}, \Psi_{n-k})$, $k \geq 1$, for instance like a Garch model. It may also include an external
source of randomness. The actual fluctuations in the process $\Psi_n$ determine the pattern of the
intraday return process, such as up or down days, quiet or hectic days. The process $\Psi_n$ offers
a flexible way of taking into account intraday seasonality. It may also have, for instance,
leverage effects, jumps, and stochastic spot volatility.

$^1$ For identification, the innovations are typically assumed mean zero, unit variance. Such identification
assumptions will not be necessary in this paper.
$^2$ The sample paths are right continuous and have left limits.
As an example, consider the case that $\Psi$ is a diffusion with stochastic instantaneous
volatility $v(\vartheta)$,
$$d\Psi(\vartheta) = v(\vartheta)\,dB(\vartheta),$$
which implies for the $n$-th day return process $R_n$,
$$dR_n(\vartheta) = s_n v_n(\vartheta)\,dB_n(\vartheta), \qquad \vartheta \in [0, 1]. \tag{4}$$
So, under the scaling hypothesis, in the often used diffusive model for the log price process
$p(t)$,
$$dp(t) = \sigma(t)\,dB(t),$$
the instantaneous volatility $\sigma(t)$ allows a decomposition into a daily scale factor $s_n$ and an
independent intraday stochastic volatility component $v_n(\vartheta)$, modelling intraday effects such
as seasonalities. The intraday volatility processes $(v_n)$ are iid and independent of events up
to the close of the $(n-1)$-st day.$^3$
Let us provide the remaining formal details for the continuous time process $R_n$. The
trading hours are normalized to cover the interval $[0, 1]$. Time advances only during trading
hours. Let $\Psi$ be a cadlag process on the closed interval $[0, 1]$, left continuous in 1. The
processes $(\Psi_n)$ are independent copies of the process $\Psi$. In order to describe the dependence
structure, it is convenient to introduce the discrete time model filtration $(G_n)$. The σ-field
$G_n$ includes the history of $(s_n, \Psi_n)$ extended with $s_{n+1}$. So,
$G_n = \sigma\{(\Psi_i)_{i \leq n}, (s_i)_{i \leq n+1}\}$. The σ-field $G_n$ represents the model
information at the start of day $n+1$. The model information includes the observable day $n$
return process, as $R_n = s_n \Psi_n$. The process $\Psi_{n+1}$ is independent of $G_n$, and the
processes $(\Psi_n)$ are identically distributed.
Note that $\Psi(1)$ is not standardized.$^4$ More generally, processes that satisfy the scaling
hypothesis do not have a unique representation. Consider a representation $(s_n, \Psi_n)$ for $(R_n)$
that satisfies the scaling hypothesis. Set $\tilde{s}_n = s_n/2$ and $\tilde{\Psi}_n = 2\Psi_n$. Then
$$\tilde{R}_n \equiv R_n, \tag{5}$$
where $\tilde{R}_n = \tilde{s}_n \tilde{\Psi}_n$.
As we will see later, identification of $s_n$ and $\Psi_n$ is not necessary for the study of proxies.

$^3$ Andersen and Bollerslev (1997) propose to model the seasonal intraday volatility in five-minute returns
$r_{n,\vartheta_i}$ on day $n$ as the product of a daily scale factor $s_n$ and deterministic intraday components
$(v_{\vartheta_i})$: $r_{n,\vartheta_i} = s_n \cdot v_{\vartheta_i} \cdot \varepsilon_{n,\vartheta_i}$, $\vartheta_i = i/K$, $i = 1, \ldots, K$,
where $\varepsilon_{n,\vartheta_i}$ is an iid mean zero, unit variance error. Their
model is a special case of the continuous time model (4), discretized to five-minute returns.
$^4$ If $\Psi(1)$ is not standardized, then the discrete time returns satisfy $r_n = a\sigma_n + \sigma_n \varepsilon_n$,
where the $\varepsilon_n$ are iid(0,1). So the continuous time model (2) also embeds a simple version of the
ARCH-M model introduced by Engle, Lilien, and Robins (1987).
3 Proxies
The daily scale factor $s_n$ in the continuous time model (2) is difficult to estimate, even if we
observe the full sample path of the asset price. This stands in contrast to the situation of
obtaining a perfect estimate for the diffusion coefficient using the full sample path, as Merton
(1980) discusses. We will work with (ex post) statistics of the sample path to estimate $s_n$.
Let us introduce the concept of volatility proxy in the spirit of Alizadeh, Brandt, and Diebold
(2002).
Consider volatility proxies that are positive functionals $H(R_n)$ of the day $n$ return process
$R_n$. If the functional $H$ is positively homogeneous,
$$H(\alpha R_n) = \alpha H(R_n), \qquad \alpha \in [0, \infty),$$
then a proxy $H(R_n)$ is linear in $s_n$:
$$H(R_n) = s_n H(\Psi_n).$$
Applying logarithms leads to a measurement equation: the log of a proxy is the sum of two
independent terms, the log of the scale factor and a measurement error $U_n = \log(H(\Psi_n))$:
$$\log(H(R_n)) = \log(s_n) + U_n. \tag{6}$$
The measurement errors $U_n$ form an iid sequence. Naturally the quality of the proxy is
related to the bias and the variance of the measurement error $U_n$. Rewriting equation (6),
$$\log(H(R_n)) = \log(s_n) + EU_n + (U_n - EU_n),$$
makes clear that for a given functional $H$ the measurement error introduces a constant bias,
$EU_n = \mu$. In applications where the proxy is used as a variable in a regression, the bias will
be corrected by the regression parameters. It is also possible to rescale the proxy, replacing
$H$ by $aH$, in order to obtain a bias-corrected version. So, the key determinant of the quality
is the measurement variance $\lambda^2$,
$$\lambda^2 = \mathrm{var}(U_n). \tag{7}$$
An optimal proxy functional $H^*$ will satisfy
$$\mathrm{var}(\log(H^*(\Psi))) = \inf_H \mathrm{var}(\log(H(\Psi))).$$
For a proxy functional $H$, the measurement error $U_n$ only depends on the process $\Psi_n$. This
means that the optimality of a proxy functional is independent of the particular discrete
time model for the scale factors $(s_n)$.
3.1 Definition and Basic Properties
Let us now provide the formal details for volatility proxies. Recall from Section 2 that the
process Ψ is cadlag on [0, 1]. Let D[0, 1] denote the Skorohod space of cadlag functions on
[0, 1], which are left continuous in 1. Endow D[0, 1] with the Skorohod topology. The space
D[0, 1] is a separable, complete metric space (see Billingsley (1999)). The space C[0, 1] of
continuous functions on the unit interval is a linear subspace of D[0, 1].
From now on we assume the scaling hypothesis for the day $n$ return process:
$R_n(\vartheta) = s_n \Psi_n(\vartheta)$. A proxy is the result of applying a certain estimator, the
proxy functional $H$, to the day $n$ return process $R_n$. We restrict attention to positively
homogeneous functionals in order to ensure that the decomposition in (6) holds.
Definition 3.1. Let $H$ be a measurable, positively homogeneous functional $D \to [0, \infty)$, on
a linear subspace $D$ of $D[0, 1]$. Assume $\Psi \in D$ a.s., and $H(\Psi) > 0$ a.s. Then $H$ is a proxy
functional. The random variable $H(R_n)$ is a proxy.
Remark 1. Proxy functionals are nonlinear: $H(\Psi) + H(-\Psi) > 0$ a.s., but $H(\Psi - \Psi) =
H(0) = 0$; proxy functionals need not be symmetric: $H(\Psi) \neq H(-\Psi)$, in general.
Example 3.1.1. Here are some examples of proxies: the absolute daily return; the absolute
overnight return; the high-low range; realized volatility, defined for any grid as the square
root of the sum of the squared returns over the grid; maximal absolute two-minute return;
the square root of the daily increment in quadratic variation. In these cases, if Ψ(1) has a
density, the almost sure positivity of H(Ψ) is ensured.
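A few of these proxies can be sketched on a discretely sampled path. The function names, and the convention that `path` holds $R_n$ on an equally spaced grid with the opening value first, are assumptions of this illustration rather than anything prescribed by the paper.

```python
import numpy as np

def abs_return(path):
    """Absolute value of the last grid value, i.e. |R_n(1)|."""
    return abs(path[-1])

def high_low(path):
    """High-low range of the sampled path."""
    return path.max() - path.min()

def realized_vol(path, step=1):
    """Root of the sum of squared returns over every `step`-th grid point.

    Assumes len(path) - 1 is a multiple of `step`, so the close is included.
    """
    return np.sqrt(np.sum(np.diff(path[::step]) ** 2))

path = np.array([0.001, 0.004, -0.002, 0.003])   # made-up values of R_n
alpha = 2.5
for H in (abs_return, high_low, realized_vol):
    # positive homogeneity: H(alpha * f) = alpha * H(f)
    assert np.isclose(H(alpha * path), alpha * H(path))
```

The loop checks positive homogeneity numerically, which is the property that makes each of these statistics linear in $s_n$.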
Combining different proxies through a positively homogeneous function generates a new
proxy. Write $x \gg 0$ if $\min_i x_i > 0$.
Proposition 3.2. Let $H^{(i)}$, $i = 1, \ldots, d$, be proxy functionals on $D$. Let
$G : [0, \infty)^d \to [0, \infty)$ be a measurable, positively homogeneous function. Moreover,
assume $G(x) > 0$ for $x \gg 0$. Then the functional
$H : f \mapsto G(H^{(1)}(f), \ldots, H^{(d)}(f))$ is a proxy functional on $D$.
Proof. Since every $H^{(i)}(\Psi) > 0$ a.s. we have $(H^{(1)}(\Psi), \ldots, H^{(d)}(\Psi)) \gg 0$ a.s.,
so $H(\Psi) > 0$ a.s. Homogeneity holds, since
$$H(\alpha f) = G(H^{(1)}(\alpha f), \ldots, H^{(d)}(\alpha f)) = G(\alpha H^{(1)}(f), \ldots, \alpha H^{(d)}(f)) = \alpha H(f).$$
As a consequence, given two proxy functionals $H^{(1)}$ and $H^{(2)}$, it is possible to obtain
a new proxy functional, for example, by scaling one of them with a positive number, by
adding them, or by taking the maximum or the minimum. One may also take a geometric
combination:
$$H(R_n) \equiv (H^{(1)}(R_n))^{w_1} (H^{(2)}(R_n))^{w_2}, \qquad w_1, w_2 \in \mathbb{R},\; w_1 + w_2 = 1. \tag{8}$$
Scaling a proxy with a positive number $a > 0$ does not change the measurement variance
$\lambda^2$:
$$\mathrm{var}(\log(aH(\Psi))) = \mathrm{var}(\log(H(\Psi))) = \lambda^2. \tag{9}$$
Assume $0 < \mathrm{var}(\log(s_n)) < \infty$. It then follows, using the decomposition (6), that
$$\mathrm{corr}(\log(H(R_n)), \log(s_n)) = \left(1 + \frac{\lambda^2}{\mathrm{var}(\log(s_n))}\right)^{-1/2}. \tag{10}$$
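For completeness, (10) can be checked in a few lines from (6). The sketch below uses only the independence of $s_n$ and $\Psi_n$ and the notation introduced above:

```latex
\begin{align*}
\mathrm{cov}\bigl(\log H(R_n), \log s_n\bigr)
  &= \mathrm{cov}\bigl(\log s_n + U_n, \log s_n\bigr) = \mathrm{var}(\log s_n), \\
\mathrm{var}\bigl(\log H(R_n)\bigr)
  &= \mathrm{var}(\log s_n) + \lambda^2, \\
\mathrm{corr}\bigl(\log H(R_n), \log s_n\bigr)
  &= \frac{\mathrm{var}(\log s_n)}
          {\sqrt{\mathrm{var}(\log s_n)\,\bigl(\mathrm{var}(\log s_n) + \lambda^2\bigr)}}
   = \Bigl(1 + \frac{\lambda^2}{\mathrm{var}(\log s_n)}\Bigr)^{-1/2}.
\end{align*}
```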
This means that proxies with smaller measurement variance $\lambda^2$ have larger correlation with
$\log(s_n)$. The ideal situation of zero measurement variance gives perfect correlation. The
proxy is then a perfect proxy:
$$H(R_n) = c s_n,$$
for a certain constant $c > 0$. Such perfect proxies do exist in special cases, for example
if $\Psi$ is the standard Brownian motion on $[0, 1]$. The quadratic variation of $\Psi$ then equals
one, and $s_n$ is the square root of the quadratic variation: $\sqrt{QV_n} = s_n$. If a proxy has zero
measurement variance, then one knows the value of $s_n$, without knowing the model for the
sequence $(s_n)$. The following example shows that the square root of the quadratic variation
is not necessarily the most efficient proxy for $s_n$.
Example 3.1.2. Consider the case that $\Psi$ is a diffusion as in (4):
$$d\Psi(\vartheta) = v(\vartheta)\,dB(\vartheta),$$
where $B$ denotes standard Brownian motion. Let the instantaneous volatility process $v(\vartheta)$ be
deterministic at the opening and stochastic for the rest of the day. More specifically, suppose
$v(\vartheta)$ equals 1 before time of day $\vartheta_0 = 1/2$, and $v(\vartheta)$ equals either $c_1$ or $c_2$ after
$\vartheta_0$, both with probability 1/2. The square root of the truncated quadratic variation over
$[0, 1/2]$ equals $s_n$ times a constant, and as such is perfect. The square root of the quadratic
variation of $R_n$ is the product of $s_n$ and a random variable with positive variance.
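The two claims of the example can be made explicit. The worked sketch below assumes that the value $c \in \{c_1, c_2\}$ is drawn once per day and held fixed after $\vartheta_0$, and that $c_1^2 \neq c_2^2$ (otherwise the full-day proxy would be perfect as well):

```latex
\begin{align*}
\sqrt{QV_n[0, 1/2]} &= s_n \Bigl(\int_0^{1/2} 1 \, d\vartheta\Bigr)^{1/2} = \frac{s_n}{\sqrt{2}}
  && \text{(zero measurement variance)}, \\
\sqrt{QV_n[0, 1]} &= s_n \Bigl(\frac{1 + c^2}{2}\Bigr)^{1/2},
  && c \in \{c_1, c_2\} \text{ with probability } 1/2 \text{ each}, \\
\lambda^2 &= \mathrm{var}\Bigl(\tfrac{1}{2}\log\frac{1 + c^2}{2}\Bigr)
  = \frac{1}{16}\Bigl(\log\frac{1 + c_1^2}{1 + c_2^2}\Bigr)^2 > 0.
\end{align*}
```

The last equality uses that a variable taking two values $a$ and $b$ with probability 1/2 each has variance $(a - b)^2/4$.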
Estimating the daily scale factor $s_n$ is a different exercise from estimating (the square
root of) the quadratic variation. The quadratic variation is a measure for the price fluctuations
that actually occurred during the whole trading day. As an estimator of the daily
scale factor $s_n$, the square root of the quadratic variation is merely one out of many possible
candidates.
Let us return to the issue of identification. The following proposition states that different
representations $(s_n, \Psi_n)$ for the scaling hypothesis result in the same ordering for proxy
functionals. So, for our purposes identification plays no role.
Proposition 3.3. Suppose $H^{(1)}$ and $H^{(2)}$ are proxy functionals. Moreover, assume $(s_n, \Psi_n)$
and $(\tilde{s}_n, \tilde{\Psi}_n)$ both satisfy the scaling hypothesis for $R_n$. If $H^{(1)}$ is better than
$H^{(2)}$ for $\Psi$, then $H^{(1)}$ is also better than $H^{(2)}$ for $\tilde{\Psi}$.
Proof. See appendix C.
3.2 Existence of Optimal Proxies
Recall that an optimal proxy functional $H^*$ satisfies
$$\mathrm{var}(\log(H^*(\Psi))) = \inf_H \mathrm{var}(\log(H(\Psi))).$$
Theorem 3.4. If there exists a proxy functional with finite measurement variance, then
there exists an optimal proxy functional.
Proof. See appendix C.
The next proposition states that optimal proxies are scaled versions of one another, except
possibly on a set of measure zero.
Proposition 3.5. Suppose $H^{(1)}$ and $H^{(2)}$ are two optimal proxy functionals. Then there
exists a constant $a > 0$, such that $H^{(1)}(\Psi) \overset{\text{a.s.}}{=} aH^{(2)}(\Psi)$.
Proof. See appendix C.
3.3 Empirically Ranking Proxies
This section provides a practical way to compare proxies. By definition, a proxy functional
$H^{(1)}$ is more efficient than $H^{(2)}$ if it has smaller measurement variance:
$$(\lambda^{(1)})^2 \leq (\lambda^{(2)})^2.$$
Comparison by measurement variances is infeasible in an empirical situation: the measurement
variances cannot be estimated, since the processes $(\Psi_n)$ are not observed. However,
taking variances on both sides of the decomposition (6) gives
$$\mathrm{var}(\log(H^{(i)}(R_n))) = (\lambda^{(i)})^2 + \mathrm{var}(\log(s_n)). \tag{11}$$
There is no covariance term by the independence of $s_n$ and $\Psi_n$. Equation (11) shows that the
variances of the log proxies all have the common term $\mathrm{var}(\log(s_n))$. It follows that if the
variance of the log proxy is smaller, then the measurement variance must be smaller. So, proxies
may be ranked by the variances of the log proxies. Let us summarize this in a proposition.
Proposition 3.6. Let $H^{(1)}$ and $H^{(2)}$ be two proxy functionals. Assume $U^{(i)} = \log(H^{(i)}(\Psi))$,
$i = 1, 2$, and $\log(s_n)$ have finite variances. Assume $R_n$ satisfies the scaling hypothesis. Then
$$\mathrm{var}(\log(H^{(1)}(R_n))) - \mathrm{var}(\log(H^{(2)}(R_n))) = (\lambda^{(1)})^2 - (\lambda^{(2)})^2.$$
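The ranking rule can be illustrated on simulated data, where the unobservable measurement errors are available for comparison. This is a sketch under assumed dynamics: iid lognormal $s_n$, Brownian $\Psi_n$ on a 78-point grid, and paths starting at zero (no overnight return). None of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_steps = 2000, 78

s = np.exp(0.5 * rng.standard_normal(n_days))    # hypothetical iid lognormal s_n
psi = np.cumsum(rng.standard_normal((n_days, n_steps)), axis=1) / np.sqrt(n_steps)
R = s[:, None] * psi                             # scaling hypothesis

def proxies(path):
    """Three proxies per day for paths that start at 0."""
    full = np.hstack([np.zeros((path.shape[0], 1)), path])
    incr = np.diff(full, axis=1)
    return {
        "abs-r": np.abs(full[:, -1]),
        "hl": full.max(axis=1) - full.min(axis=1),
        "RV": np.sqrt((incr ** 2).sum(axis=1)),
    }

# Feasible: variance of the log proxy, computed from the observable R_n.
var_log_R = {k: np.var(np.log(h)) for k, h in proxies(R).items()}
# Infeasible in practice: measurement variance, computed from Psi_n.
var_log_psi = {k: np.var(np.log(h)) for k, h in proxies(psi).items()}

ranking_R = sorted(var_log_R, key=var_log_R.get)
ranking_psi = sorted(var_log_psi, key=var_log_psi.get)
```

In this setup both rankings agree, with realized volatility ahead of the high-low range and the absolute return last. The sample variances differ from their population counterparts by estimation error, so very close proxies could still swap places in a finite sample.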
The autocorrelation function is an additional indicator of the quality of a proxy, see Figure
2 below. Let $\rho_{\cdot}(j)$ denote $j$-th order autocorrelation. If the process $(s_n, \Psi_n)$ is stationary,
then, using (6),
$$\begin{aligned}
\rho_{\log(H(R_0))}(j) &= a_\lambda \cdot \rho_{\log(s_0)}(j) + e_\lambda(j) \\
&\approx a_\lambda \cdot \rho_{\log(s_0)}(j),
\end{aligned} \tag{12}$$
where $a_\lambda \in (0, 1]$, see Appendix C. The proportion factor $a_\lambda$ is close to unity if the
measurement variance $\lambda^2$ is small. In practice, the correction term $e_\lambda(j)$ vanishes for
$j \to \infty$, and the autocorrelations $\rho_{\log(s_0)}(j)$ are large and decay slowly. Under suitable
assumptions the ratio of the autocorrelation functions for different proxies converges,
$$\frac{\rho_{\log(H^{(1)}(R_0))}(j)}{\rho_{\log(H^{(2)}(R_0))}(j)} \to
\frac{\mathrm{var}(\log(s_0)) + (\lambda^{(2)})^2}{\mathrm{var}(\log(s_0)) + (\lambda^{(1)})^2}, \qquad j \to \infty. \tag{13}$$
This means that better proxies have larger autocorrelations. See Appendix C for details.
3.4 Improving Proxies
This section provides a way to combine given proxy functionals $H^{(1)}, \ldots, H^{(d)}$ into a more
efficient proxy functional. Consider the geometric combination of these functionals,
$$H^{(w)}(R_n) = \prod_{i=1}^{d} (H^{(i)}(R_n))^{w_i}, \qquad w_1 + \ldots + w_d = 1,\; w_i \in \mathbb{R}. \tag{14}$$
Here, the column vector $w$ is the $d$-dimensional coefficient vector. The restriction $\sum w_i = 1$
is needed to obtain a proxy functional, but the coefficients are not restricted to the interval
$[0, 1]$. Let $\Lambda$ denote the covariance matrix of the measurement errors $U^{(i)} = \log(H^{(i)}(\Psi))$:
$$\Lambda = \mathrm{cov}([U^{(1)}, \ldots, U^{(d)}]'). \tag{15}$$
The measurement variance $\lambda_w^2$ of the geometric combination $U^{(w)}$ in (14) is
$\lambda_w^2 = w'\Lambda w$ and, as for the minimal variance portfolio in Markowitz portfolio theory,
$\lambda_w^2$ is minimal for
$$w^* = \frac{\Lambda^{-1}\iota}{\iota'\Lambda^{-1}\iota}, \qquad \iota = (1, \ldots, 1)', \tag{16}$$
with optimal variance $\lambda_{w^*}^2 = \frac{1}{\iota'\Lambda^{-1}\iota}$.
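Formula (16) is straightforward to evaluate numerically. The sketch below uses a made-up $2 \times 2$ matrix in place of $\Lambda$; the function name `min_variance_weights` and the numbers are illustrative assumptions.

```python
import numpy as np

def min_variance_weights(cov):
    """Weights cov^{-1} iota / (iota' cov^{-1} iota); they sum to one."""
    iota = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, iota)   # cov^{-1} iota without forming the inverse
    return x / (iota @ x)

# Hypothetical covariance matrix of two measurement errors (illustrative values).
Lam = np.array([[0.06, 0.03],
                [0.03, 0.05]])
w = min_variance_weights(Lam)        # approximately [0.4, 0.6]
lam2_opt = w @ Lam @ w               # approximately 0.042
```

Here the combined measurement variance of about 0.042 is below both diagonal entries, so the geometric combination improves on each individual proxy.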
This solution is empirically infeasible: it is impossible to estimate the variance matrix $\Lambda$,
since the measurement errors $U_n^{(i)}$ are not observed. Let $\Lambda_{p,n}$ denote the covariance matrix
of the log proxies:
$$\Lambda_{p,n} = \mathrm{cov}([\log(H^{(1)}(R_n)), \ldots, \log(H^{(d)}(R_n))]'). \tag{17}$$
The covariance matrix $\Lambda_{p,n}$ is the covariance matrix $\Lambda$ with a common noise term
$\mathrm{var}(\log(s_n))$ added to each element:
$$\Lambda_{p,n} = \Lambda + \mathrm{var}(\log(s_n))\,\iota\iota'. \tag{18}$$
Assuming stationarity for $(s_n)$, the covariance matrix $\Lambda_p = \Lambda_{p,n}$ may be estimated from the
data. Formula (19) in Theorem 3.7 below shows how to obtain the optimal coefficients $w^*$
in (16) from the covariance matrix $\Lambda_{p,n}$.
Theorem 3.7. Let $R_n$ satisfy the scaling hypothesis. Assume $\mathrm{var}(\log(H^{(i)}(\Psi))) < \infty$ for
$i = 1, \ldots, d$. Let the covariance matrices $\Lambda$ and $\Lambda_{p,n}$ be defined by (15) and (17),
respectively. The optimal coefficient vector $w^*$ in (16) does not depend on the form of the
process $(s_n)$ and may be expressed as
$$w^* = \frac{\Lambda_{p,n}^{-1}\iota}{\iota'\Lambda_{p,n}^{-1}\iota}. \tag{19}$$
Let $\lambda_{w^*}^2 = \frac{1}{\iota'\Lambda^{-1}\iota}$. The variance of the logarithm of the optimal geometric proxy is
$$\mathrm{var}(\log(H^{(w^*)}(R_n))) = \lambda_{w^*}^2 + \mathrm{var}(\log(s_n)).$$
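Proof. The optimal coefficient $w^*$ does not depend on $(s_n)$: it follows from equation (18)
and the restriction $w'\iota = 1$, which gives $w'\iota\iota'w = 1$, that
$$\arg\min_w w'\Lambda_{p,n}w = \arg\min_w \left(w'\Lambda w + \mathrm{var}(\log(s_n))\right) = \arg\min_w w'\Lambda w. \tag{20}$$
Define the Lagrangian $w'\Lambda_{p,n}w + \mu(1 - w'\iota)$. Differentiating the Lagrangian with respect
to $w$ yields $2\Lambda_{p,n}w - \mu\iota = 0$, hence $w = \frac{1}{2}\Lambda_{p,n}^{-1}\mu\iota$. By $\iota'w = 1$, this yields
$\mu = 2/(\iota'\Lambda_{p,n}^{-1}\iota)$ and $w = \Lambda_{p,n}^{-1}\iota/(\iota'\Lambda_{p,n}^{-1}\iota)$. Since $w'\Lambda_{p,n}w$ is convex in $w$
and there is a unique solution to the first order condition, it is the optimum.
Use (20) to obtain the equalities
$w^* = \Lambda_{p,n}^{-1}\iota/(\iota'\Lambda_{p,n}^{-1}\iota) = \Lambda^{-1}\iota/(\iota'\Lambda^{-1}\iota)$, which imply
$$\mathrm{var}(\log(H^{(w^*)}(R_n))) = \lambda_{w^*}^2 + \mathrm{var}(\log(s_n)).$$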
Remark 2. In empirical applications one uses estimates of the variances. In order to reduce
estimation error, we shall use the technique of prescaling, see Appendix B.
Assuming stationarity for the process $(s_n, \Psi_n)$, the covariance matrix $\Lambda_p$ is consistently
estimated by the sample covariance matrix of the log of the proxies, thereby providing
coefficients $\hat{w}$ that are consistent for $w^*$. However, this estimator for $w^*$ may remain consistent
while allowing, for example, for structural breaks in the scale factors $(s_n)$. See the consistency
condition (22) in Appendix C for more details.
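The plug-in estimator can be sketched as follows: estimate $\Lambda_p$ by the sample covariance matrix of the log proxies, insert it into (19), and form the geometric combination (14). The simulated data (iid lognormal $s_n$, Brownian $\Psi_n$, paths starting at zero) and the choice of three proxies are assumptions of this illustration, and prescaling is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_steps = 1500, 78

s = np.exp(0.5 * rng.standard_normal(n_days))            # hypothetical s_n
psi = np.cumsum(rng.standard_normal((n_days, n_steps)), axis=1) / np.sqrt(n_steps)
R = np.hstack([np.zeros((n_days, 1)), s[:, None] * psi]) # paths start at 0

incr = np.diff(R, axis=1)
log_proxies = np.log(np.column_stack([
    np.sqrt((incr ** 2).sum(axis=1)),     # realized volatility
    np.abs(incr).sum(axis=1),             # sum of absolute returns
    R.max(axis=1) - R.min(axis=1),        # high-low range
]))

S = np.cov(log_proxies, rowvar=False)     # sample estimate of Lambda_p
iota = np.ones(S.shape[0])
x = np.linalg.solve(S, iota)
w_hat = x / (iota @ x)                    # plug-in version of (19)

log_combined = log_proxies @ w_hat        # log of the geometric combination (14)
var_single = log_proxies.var(axis=0, ddof=1)
var_combined = log_combined.var(ddof=1)
```

In sample, `var_combined` cannot exceed the smallest entry of `var_single`, because each single proxy corresponds to a feasible weight vector. Out of sample the gain depends on how well `w_hat` is estimated.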
4 A good proxy for the S&P 500
This section applies the techniques of Section 3 to the S&P 500 futures tick data. Appendix
A describes the data.
4.1 Microstructure Noise Barrier
On small time scales financial prices are subject to market microstructure effects, such as
the bid-ask bounce, price discreteness, and asynchronous trading, see, for instance, Zhang,
Mykland, and Aït-Sahalia (2005), Oomen (2006), and Hansen and Lunde (2006b). These
effects may invalidate the model assumptions. Microstructure effects may be avoided by
sampling at sufficiently wide intervals.
In this paper the measure of comparison is the variance of the logarithm. The standard
realized volatility RV and the high-low based realized volatility RVHL (see Table 1) depend
on the sampling interval $\Delta\vartheta$. Figure 1 shows the graph of
$\Delta\vartheta \mapsto \widehat{\mathrm{var}}(\log(H_{\Delta\vartheta}(R_n)))$, for
$\Delta\vartheta$ ranging from zero to sixty minutes. These curves suggest that a qualitative change of
behaviour occurs for $\Delta\vartheta \approx$ five minutes for realized volatility, and $\Delta\vartheta \approx$ eight minutes for
realized high-low. From now on realized volatility is based on five-minute sampling intervals
or larger. For realized high-low our minimal sampling interval will be ten minutes.
[Figure 1: two panels. (a) Realized volatility: var(log(RV)) against interval length (min.).
(b) Realized high-low: var(log(RVHL)) against interval length (min.).]
Figure 1: Plots of the sample variance of the log of a proxy with $\Delta\vartheta$ ranging from zero to 60 minutes (zero
is tick per tick). (a) Realized volatility. (b) Realized high-low.
4.2 Ranking Proxies
Table 1 compares twelve simple proxies constructed from the data. We emphasize that these
proxies are a set of twelve out of many possible proxies. They are of no special importance
themselves.
name       full PV   1st PV   2nd PV   3rd PV   4th PV
RV5         0.064    0.070    0.070    0.073    0.042
RV10        0.080    0.085    0.093    0.090    0.052
RV15        0.089    0.096    0.105    0.093    0.061
RV20        0.100    0.110    0.117    0.103    0.071
RV30        0.117    0.133    0.134    0.113    0.087
abs-r       0.611    0.683    0.550    0.635    0.568
hl          0.161    0.179    0.176    0.160    0.130
maxar2      0.118    0.134    0.124    0.118    0.088
RAV5        0.058    0.060    0.065    0.066    0.040
RAV10       0.072    0.072    0.085    0.082    0.049
RVHL10      0.053    0.057    0.061    0.061    0.034
RAVHL10     0.047    0.048    0.055    0.054    0.031
Table 1: Performance of twelve proxies. The full sample is split into four subsamples. Prescaling by
EWMA(0.7) predictor for RV5. The following proxies are included.
RV5: root of sum of squared 5 min
returns; RV10: root of sum of squared 10 min returns; RV15: root of sum of squared 15 min returns; RV20:
root of sum of squared 20 min returns; RV30: root of sum of squared 30 min returns; abs-r: absolute close-
to-close return; hl: high-low of the intraday return process; maxar2: maximum of the absolute 2-minute
returns; RAV5: sum of absolute 5-minute intraday returns; RAV10: sum of absolute 10-minute intraday
returns; RVHL10: root of sum of 10-minute squared high-lows; RAVHL10: sum of 10-minute high-lows.
name         full PV   1st PV   2nd PV   3rd PV   4th PV
RV5-up        0.066    0.070    0.070    0.073    0.051
RV5-down      0.094    0.103    0.101    0.104    0.068
RV10-up       0.091    0.094    0.101    0.095    0.074
RV10-down     0.133    0.138    0.146    0.149    0.100
RAV5-up       0.064    0.066    0.070    0.067    0.051
RAV5-down     0.097    0.099    0.101    0.110    0.077
RAV10-up      0.093    0.093    0.102    0.097    0.081
RAV10-down    0.147    0.145    0.155    0.168    0.120
RAV10HIGH     0.053    0.055    0.061    0.054    0.041
RAV10LOW      0.081    0.082    0.086    0.090    0.064
Table 2: Performance of upward/downward decomposed proxies. The table contains proxies from Table
1 split according to upward and downward price movements. For example, RV5-up is root of sum of
squared 5 min positive returns, RAV10HIGH is the sum of 10-minute highs, and RAV10LOW is the sum of
10-minute absolute lows.
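The upward and downward components can be sketched as below. One point is an assumption of this illustration and is not stated in the caption: the interval high and low are measured relative to the value at the start of each interval. The function names and the toy path are hypothetical.

```python
import numpy as np

def interval_high_low(path, k):
    """Per-interval high and low, measured from each interval's opening value.

    `path` holds the return process on an equally spaced grid (opening value
    first); `k` is the number of grid steps per interval.
    """
    n_int = (len(path) - 1) // k
    highs, lows = np.empty(n_int), np.empty(n_int)
    for j in range(n_int):
        seg = path[j * k:(j + 1) * k + 1]      # includes both end points
        highs[j] = seg.max() - seg[0]          # >= 0
        lows[j] = seg.min() - seg[0]           # <= 0
    return highs, lows

def up_down_proxies(path, k):
    """Upward/downward components; assumes len(path) - 1 is a multiple of k."""
    highs, lows = interval_high_low(path, k)
    returns = np.diff(path[::k])
    return {
        "RV-up": np.sqrt(np.sum(np.clip(returns, 0, None) ** 2)),
        "RV-down": np.sqrt(np.sum(np.clip(returns, None, 0) ** 2)),
        "RAV-HIGH": highs.sum(),
        "RAV-LOW": np.abs(lows).sum(),
        "RAVHL": (highs - lows).sum(),
    }

toy_path = np.array([0.0, 1.0, -1.0, 2.0, 0.5])   # made-up values
result = up_down_proxies(toy_path, k=2)
```

With this convention the sum of interval high-low ranges is the sum of the HIGH and LOW components, which makes the decomposition in the table easy to interpret.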
For each proxy a measure of comparison is given for five samples: first the full sample (days 2 to 4575) and then for four subsamples spanning the full sample (2:1144, 1145:2287, 2288:3431, 3432:4575). The measure of comparison is PV (prescaled variance), which is the variance of the logarithm of a proxy after prescaling, $\mathrm{PV} = \operatorname{var}(\log(H(R_n)/p_n))$; see Section 3.3 and Appendix B. The first observation cannot be prescaled and is left out of the variance computations. Smaller variances correspond to more efficient proxies. For the prescaling sequence $(p_n)$ we take an exponentially weighted moving average predictor of five-minute realized volatility with smoothing parameter $\beta = 0.7$, yielding a prescaling sequence $p_n = 0.7\, p_{n-1} + 0.3\, \mathrm{RV5}_{n-1}$. We have set the smoothing parameter so that the sample variance of the logarithm of prescaled five-minute realized volatility is minimal. The prescaled variance ranks proxies, but its value is not in itself a measure of the quality of a proxy.
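The prescaling recursion and the PV measure can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code: the function names, the toy numbers, the grid of smoothing parameters, and the start value of the recursion (here $p_2 = \mathrm{RV5}_1$) are assumptions, since the text does not state how the recursion is initialized.

```python
import math
from statistics import variance


def ewma_prescaling(rv5, beta=0.7):
    """Prescaling factors p_2, ..., p_N from daily values RV5_1, ..., RV5_N.

    Implements p_n = beta * p_{n-1} + (1 - beta) * RV5_{n-1}.
    Assumption: the recursion starts at p_2 = RV5_1.
    """
    p = [rv5[0]]                          # p_2
    for rv_prev in rv5[1:-1]:             # RV5_2, ..., RV5_{N-1}
        p.append(beta * p[-1] + (1.0 - beta) * rv_prev)
    return p                              # p[k] belongs to day k + 2


def prescaled_variance(proxy, p):
    """Sample variance of log(H(R_n) / p_n) over days 2, ..., N."""
    logs = [math.log(h / pn) for h, pn in zip(proxy[1:], p)]
    return variance(logs)


def best_beta(rv5, grid):
    """Smoothing parameter on the grid that minimizes the PV of RV5 itself."""
    return min(grid, key=lambda b: prescaled_variance(rv5, ewma_prescaling(rv5, b)))


# Toy daily values, purely illustrative (not taken from the S&P 500 data).
rv5 = [0.010, 0.012, 0.009, 0.011, 0.015, 0.013, 0.010, 0.008]
p = ewma_prescaling(rv5)
pv_rv5 = prescaled_variance(rv5, p)
beta_hat = best_beta(rv5, [0.5, 0.6, 0.7, 0.8, 0.9])
```

Because PV is a variance of logarithms, multiplying a proxy by a positive constant leaves its PV unchanged, so proxies measured on different scales can be compared directly.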
The first column of Table 1 shows that the quality of the realized volatility RV improves if one increases the sampling frequency from 30 minutes to 5 minutes. The prescaled variance is maximal for the absolute close-to-close returns, confirming the fact that absolute or squared daily returns are poor proxies (see footnote 5). Note that the maximal absolute two-minute return is better than high minus low, which tends to use returns based on much longer time spans. Overall, we find that sums of absolute values lead to more efficient proxies than sums of squared values. This observation relates to a finding of Barndorff-Nielsen and Shephard (2003), whose simulations indicate that absolute power variation, based on the sum of absolute returns, has better finite sample behaviour than realized quadratic variation. The best performing proxy in Table 1 is RAVHL10, the sum of the ten-minute high-low ranges. The remaining columns of Table 1 show that the ranking of the different proxies in the various subsamples is the same as in the full sample, with one exception in the second subsample for RV30 and maxar2.
Table 2 provides the prescaled variances for the upward and downward components of a number of proxies. For instance, the five-minute realized volatility is decomposed according to upward and downward price movements:

$$\mathrm{RV5}^2 = \sum_{i=1}^{K} r_{n,\vartheta_i}^2 = \sum_{i=1}^{K} r_{n,\vartheta_i}^2\, I_{\{r_{n,\vartheta_i} > 0\}} + \sum_{i=1}^{K} r_{n,\vartheta_i}^2\, I_{\{r_{n,\vartheta_i} < 0\}} = (\text{RV5-up})^2 + (\text{RV5-down})^2 .$$

Note that RV5 (PV = 0.064) is better than RV5-up (PV = 0.066) and RV5-down (PV = 0.094). The upward proxies are consistently more efficient than their downward counterparts. This difference suggests that, when proxying the scale factor $s_n$, one should put more weight on the positive returns.
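The decomposition can be illustrated with a short Python sketch. With RV defined as the root of the sum of squared returns (Table 1), the squares of the upward and downward components add up to the square of RV, while for sums of absolute returns the components add up directly. The function names and the sample returns are hypothetical, and treating RAV-up and RAV-down as the sums of positive and of absolute negative returns is an assumption based on the caption of Table 2.

```python
import math


def rv_components(returns):
    """Return (RV, RV-up, RV-down) for one day's intraday returns."""
    rv = math.sqrt(sum(r * r for r in returns))
    rv_up = math.sqrt(sum(r * r for r in returns if r > 0))
    rv_down = math.sqrt(sum(r * r for r in returns if r < 0))
    return rv, rv_up, rv_down


def rav_components(returns):
    """Return (RAV, RAV-up, RAV-down): sums of absolute returns by sign."""
    rav_up = sum(r for r in returns if r > 0)
    rav_down = sum(-r for r in returns if r < 0)
    return rav_up + rav_down, rav_up, rav_down


# Hypothetical five-minute returns for one day (illustrative values only).
returns = [0.003, -0.004, 0.0, 0.001]
rv, rv_up, rv_down = rv_components(returns)
rav, rav_up, rav_down = rav_components(returns)
```

Zero returns contribute to neither component, which matches the strict inequalities in the indicator functions.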
Footnote 5. We use only the absolute returns larger than 0.001, or 10 basis points, in order to avoid taking the log of zero. This leaves 4079 daily returns.
4.3 Optimized Proxy
Let us now combine the proxies in Tables 1 and 2 into a more efficient one using the technique of Section 3.4. The five-minute realized volatility RV5 has PV = 0.064. Let us improve upon this value. First, by using the high-low range over intraday intervals, RVHL10 has PV = 0.053, see Table 1. It is even better to use absolute values: RAVHL10 has PV = 0.047. Now use the theory of Section 3.4 to combine the high-low ranges in RAVHL10 with the absolute returns in RAV10: inserting the covariance matrix $\tilde{\Lambda}_{p,n}$ of the log of these two prescaled proxies into formula (19) yields the proxy

$$H(R_n) = (\mathrm{RAVHL10}_n)^{1.82}\, (\mathrm{RAV10}_n)^{-0.82} \qquad (\mathrm{PV} = 0.041).$$

Decomposing RAVHL10 into its upward and downward components we obtain the proxy

$$H^{(w)}(R_n) = (\mathrm{RAV10HIGH}_n)^{1.04}\, (\mathrm{RAV10LOW}_n)^{0.72}\, (\mathrm{RAV10}_n)^{-0.76} \qquad (\mathrm{PV} = 0.038). \qquad (21)$$

Of course, one may also apply the optimal coefficient formula (19) to all twenty-one proxies in Tables 1 and 2 at once (see footnote 6). The full combination yields a proxy with PV = 0.037, which only marginally outperforms the proxy obtained in (21). We prefer to work with the simpler proxy in (21) and we will refer to it as the optimized proxy $H^{(w)}(R_n)$.
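How a covariance matrix of log proxies turns into exponents can be sketched as follows. Formula (19) is not reproduced in this section, so the sketch assumes it has the usual minimum-variance form: minimize $w'\Lambda w$ subject to the weights summing to one, which gives $w = \Lambda^{-1}\iota / (\iota'\Lambda^{-1}\iota)$. The diagonal entries below are the PV values of RAVHL10 and RAV10 from Table 1; the off-diagonal entry 0.0548 is not reported in the paper and is a hypothetical value chosen so that the example lands near the reported exponents. All function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def min_variance_weights(cov):
    """Weights minimizing w' cov w subject to sum(w) = 1 (assumed form of (19))."""
    y = solve(cov, [1.0] * len(cov))
    total = sum(y)
    return [yi / total for yi in y]


def combined_variance(cov, w):
    """Variance w' cov w of the weighted combination of log proxies."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


# Diagonal: PV of RAVHL10 and RAV10 (Table 1). Off-diagonal: hypothetical.
cov = [[0.047, 0.0548],
       [0.0548, 0.072]]
w = min_variance_weights(cov)
pv_combined = combined_variance(cov, w)
```

With these inputs the first weight comes out somewhat above 1.8, the second is negative, and the combined variance is close to 0.04, which is in line with the two-proxy result reported above. The geometric combination is then formed by raising each proxy to its weight and multiplying.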
The optimized proxy easily outperforms all proxies in Tables 1 and 2. If one extrapolates
the full sample prescaled variances of the realized volatilities of Table 1 to a time interval
of length zero (corresponding to the limiting case of the quadratic variation), one obtains a
value between 0.050 and 0.060. The value PV = 0.038 for the optimized proxy is well below
these values, suggesting that it is a more efficient proxy for the daily scale factor than the
square root of the quadratic variation.
Observe that the coefficient for RAV10 in the optimized proxy is negative ($w_3 = -0.76$). In geometrical terms this negative coefficient may be explained as follows. The log proxies are vectors in an affine space. The proxies are highly related, since they all approximate the same daily scale factor $s_n$. The optimal proxy is not in the convex hull of the proxies in Tables 1 and 2. The original proxies do not completely reflect the direction of the optimal proxy. The coefficients outside [0, 1] correct the direction.
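The sign of a coefficient can be made concrete in the two-proxy case. The worked equation below assumes that the weights are constrained to sum to one, as the exponents of the proxies in this section suggest. The symbols $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$, for the variances and the covariance of the two log proxies, are introduced here for illustration only.

```latex
\min_{w}\; w^2\sigma_1^2 + 2w(1-w)\,\sigma_{12} + (1-w)^2\sigma_2^2
\quad\Longrightarrow\quad
w = \frac{\sigma_2^2-\sigma_{12}}{\sigma_1^2+\sigma_2^2-2\sigma_{12}},
\qquad
1-w = \frac{\sigma_1^2-\sigma_{12}}{\sigma_1^2+\sigma_2^2-2\sigma_{12}} .
```

The denominator is the variance of the difference of the two log proxies and is positive unless they differ by a constant. The second weight is therefore negative exactly when $\sigma_{12} > \sigma_1^2$, that is, when the two proxies are so strongly related that their covariance exceeds the variance of the better one.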
Table 3 investigates the stability of the optimized proxy $H^{(w)}(R_n)$. Similarly to Table 1 it reports performance measures for the full sample and for four subsamples. The first row reports the performance of $H^{(w)}(R_n)$ in the different subsamples; comparison with Tables 1 and 2 shows that $H^{(w)}(R_n)$ outperforms all those proxies in every subsample. The proxy $H^{(w,i)}(R_n)$ is constructed using the coefficients that are optimal for the i-th subsample. In the first subsample the performance of the globally optimized $H^{(w)}(R_n)$ (PV = 0.039) is not substantially improved by $H^{(w,1)}(R_n)$ (PV = 0.039). A similar statement holds for the other subsamples. Moreover, proxies based on a particular subsample are close to optimality in all other subsamples. For instance, the proxy optimized for the first subsample (the years 1988–1992) is nearly optimal for the years 2002–2006. We conclude that the optimality of $H^{(w)}(R_n)$ is stable.

name         full    1st     2nd     3rd     4th
             PV      PV      PV      PV      PV
H^(w)        0.038   0.039   0.043   0.043   0.028
H^(w,1)      0.038   0.039   0.043   0.043   0.028
H^(w,2)      0.039   0.039   0.043   0.043   0.029
H^(w,3)      0.039   0.040   0.044   0.042   0.029
H^(w,4)      0.039   0.040   0.045   0.044   0.027

Table 3: Geometric proxies, optimized for different subsamples: performance and stability.

Figure 2: Autocorrelations of log proxies. Lags 1 to 50 days (horizontal axis: lag in days, 0 to 50; vertical axis: autocorrelation, 0 to 1). From bottom to top: RV30, RV15, RV10, RV5, $H^{(w)}$.

Footnote 6. We exclude the absolute close-to-close return in performing this calculation.
Figure 2 shows the autocorrelations of $\log(H^{(w)}(R_n))$ and the log of four different realized volatilities, RV30, RV15, RV10, and RV5. The autocorrelations of the different realized volatilities increase as the sampling interval decreases. The autocorrelations for the optimized proxy are substantially larger than those for five-minute realized volatility. Hence, formula (13) provides an additional indication that $H^{(w)}$ is the most efficient of these proxies.
Table 4 explores the quality of the optimized proxy in a heuristic way. It gives the coefficient of determination, $R^2$, of a linear regression of the logarithm of a proxy on the logarithm of another proxy lagged one day:

$$\log(H^{(j)}(R_n)) = \alpha + \beta \log(H^{(i)}(R_{n-1})) + \varepsilon_n .$$

Large $R^2$'s in a particular column mean that the proxy in that column is largely predictable, suggesting that it is a good proxy for daily volatility. The $R^2$'s attain their maximum at the optimized proxy, in the rightmost column (see footnote 7).
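The regression behind each cell of Table 4 can be sketched as follows. For a regression with an intercept and a single regressor, $R^2$ equals the squared sample correlation between the dependent variable and the lagged regressor, which is what the hypothetical helper `lagged_r2` computes. The two toy series are illustrative and not taken from the data.

```python
def lagged_r2(log_y, log_x):
    """R^2 of regressing log_y[n] on log_x[n-1] with an intercept."""
    y = log_y[1:]            # dependent variable, days 2..N
    x = log_x[:-1]           # regressor lagged one day, days 1..N-1
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Toy log proxies (illustrative values only).
log_x = [0.0, 1.0, 2.0, 3.0, 9.0]
log_y_exact = [5.0, 1.0, 3.0, 5.0, 7.0]   # log_y[n] = 1 + 2 * log_x[n-1]
log_y_noisy = [5.0, 1.0, 0.0, 1.0, 0.0]

r2_exact = lagged_r2(log_y_exact, log_x)
r2_noisy = lagged_r2(log_y_noisy, log_x)
```

Applying the helper to every ordered pair of the six log proxies would fill a table with the same layout as Table 4, with the lagged regressor in the rows and the dependent variable in the columns.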
                RV30    RV20    RV15    RV10    RV5     H^(w)
RV30(-1)        0.35    0.39    0.42    0.46    0.50    0.58
RV20(-1)        0.38    0.42    0.45    0.49    0.54    0.61
RV15(-1)        0.39    0.44    0.47    0.50    0.55    0.63
RV10(-1)        0.41    0.45    0.48    0.52    0.57    0.66
RV5(-1)         0.43    0.48    0.51    0.55    0.60    0.69
H^(w)(-1)       0.44    0.48    0.51    0.54    0.60    0.71

Table 4: $R^2$ of the regression $\log(H^{(j)}(R_n)) = \alpha + \beta \log(H^{(i)}(R_{n-1})) + \varepsilon_n$, for $i, j = 1, \ldots, 6$, and $n = 2, \ldots, 4575$. Rows give the lagged regressor, columns the dependent variable.

Footnote 7. The optimized proxy is constructed as an optimized proxy for $s_n$, not as an optimal predictor for $s_{n+1}$. Even so, the $R^2$'s attained in the row $H^{(w)}(-1)$ are large.

Finally, Figure 3 shows the time series graphs of four different proxies. The proxies were standardized to have mean one, by dividing them by their mean. From top to bottom the curves become 'less erratic', suggesting a decrease in the measurement errors $U_n$. Each step shows a marked improvement.
5 Conclusions
This paper provides a theoretical basis for ranking and optimizing volatility proxies, based
on intraday data. The theory is founded on a natural class of continuous time extensions of
discrete time, daily volatility models, taking into account intraday seasonality. Good proxies
are needed for parameter estimation and forecast evaluation of discrete time volatility models.
In this paper a volatility proxy is the result of applying a positively homogeneous functional to the intraday return process. By definition, a good proxy has small measurement variance. We show that optimal proxies exist and provide easy-to-implement tools for ranking and improving proxies. The approach is to a large extent model free: an optimal proxy for the scaling factor $s_n$ is an optimal proxy under all possible discrete time models of the form $r_n = s_n Z_n$, where the $Z_n$ are iid innovations, and the innovation $Z_n$ is independent of the scale factor $s_n$.
For the S&P 500 data a combination of the highs, the lows, and the absolute returns over ten-minute intervals yields a good proxy. One should put more weight on the highs than on the lows, when proxying volatility. The empirical results indicate that the optimized proxy, although it uses only a finite sampling grid, is more efficient for the scale factor $s_n$ than (the square root of) the quadratic variation, which is based on continuous sampling.

This paper has addressed the problem of ranking and optimizing proxies for today's scale factor $s_n$. We see opportunities to use proxies for the specification of the daily volatility process $(s_n)$, and accordingly use proxies to forecast future volatility. We aim to pursue these ideas in future research.
Figure 3: Time series of four standardized proxies, $H(R_n)/\overline{H(R_n)}$, 1988 to 2006 (vertical axis from 0 to 6). Panels: (a) Absolute close-to-close returns; (b) Intraday high-low range; (c) Five-minute realized volatility RV5; (d) Optimized proxy from formula (21).
A Data
Our data set is the U.S. Standard & Poor's 500 stock index future, traded on the Chicago Mercantile Exchange (CME), for the period from January 1, 1988 until May 31, 2006. The data were obtained from Nexa Technologies Inc. (www.tickdata.com). The futures trade from 8:30 A.M. until 3:15 P.M. Central Standard Time. Each record in the set contains a timestamp (with one second precision) and a transaction price. The tick size is $0.05 for the first part of the data and $0.10 from 1997-11-01. The data set consists of 4655 trading days. We removed sixty-four days for which the closing hour was 12:15 P.M. (early closing hours occur on days before a holiday). Sixteen more days were removed, either because of too late first ticks, too early last ticks, or a suspiciously long intraday no-tick period. These removals leave us with a data set of 4575 days with nearly 14 million price ticks, on average more than 3 thousand price ticks per day, or 7.5 price ticks per minute.
There are four expiration months: March, June, September, and December. We use the most actively traded contract: we roll to the next expiration when the tick volume for the next expiration is larger than for the current expiration.
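The roll rule can be sketched as follows. The data layout (one dictionary of tick counts per day), the contract labels, and the function name are hypothetical. Two details are assumptions that the text does not settle: the roll takes effect on the first day on which the next contract has more ticks, and the series never rolls back.

```python
def active_contract_per_day(daily_tick_counts, expirations):
    """Pick the contract to use on each day with a one-way roll rule.

    daily_tick_counts: list of dicts mapping contract label -> number of ticks.
    expirations: contract labels in order of expiration.
    """
    current = 0
    chosen = []
    for counts in daily_tick_counts:
        nxt = current + 1
        if nxt < len(expirations) and (
            counts.get(expirations[nxt], 0) > counts.get(expirations[current], 0)
        ):
            current = nxt                 # roll to the next expiration
        chosen.append(expirations[current])
    return chosen


# Hypothetical tick counts around a March-to-June roll.
days = [
    {"MAR": 3200, "JUN": 150},
    {"MAR": 2900, "JUN": 1400},
    {"MAR": 1200, "JUN": 3100},
    {"MAR": 1500, "JUN": 1300},           # March busier again: no roll back
    {"JUN": 3300, "SEP": 90},
]
chosen = active_contract_per_day(days, ["MAR", "JUN", "SEP"])
```

In this example the series switches from the March to the June contract on the third day and stays there.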
An advantage of using futures data rather than the S&P 500 cash index is the absence of non-synchronous trading effects, which cause positive autocorrelation between successive observations, see Dacorogna et al. (2001). As in the cash index, there are bid-ask effects in the futures prices, which induce negative autocorrelation between successive observations. We deal with these effects by taking large enough time intervals, see Section 4.1. Since we study a very liquid asset, the error term due to microstructure effects is relatively small.
B Prescaling
The methods of comparing proxies in Proposition 3.6 and of improving proxies in Theorem 3.7 are formulated in terms of population variances and covariances. In practical situations, one has to work with the sample counterparts of these quantities, which introduces sampling error. To reduce the sampling error caused by the scale factors $(s_n)$, we propose the technique of prescaling. The idea is to stabilize the sequence $(s_n)$, by scaling it by a predictable sequence of random variables $(p_n)$. Let $\mathcal{F}_n = \sigma(R_i, i \le n)$ denote the observable information up until day $n$.

Definition B.1. A prescaling sequence $(p_n)$ is an $(\mathcal{F}_{n-1})$ adapted sequence of strictly positive random variables.
The prescaling factors $p_n$ will be used to define adjusted scale factors

$$\tilde{s}_n = s_n / p_n .$$

Proposition B.2. Assume the processes $(R_n)$ satisfy the scaling hypothesis. Prescale the scale factors $(s_n)$ to obtain the sequence $(\tilde{s}_n)$ above. The corresponding processes $(\tilde{R}_n)$, where $\tilde{R}_n = \tilde{s}_n \Psi_n$, satisfy the scaling hypothesis.

Proof. The variables $(\tilde{s}_n)$ are positive. Both $p_{n+1}$ and $s_{n+1}$ are $\mathcal{G}_n$-measurable, hence so is $\tilde{s}_{n+1}$. Therefore $\tilde{R}_n$ satisfies the scaling hypothesis.
As a result one may define proxies for $\tilde{s}_n$. These are prescaled proxies:

$$H(\tilde{R}_n) = H(R_n)/p_n .$$

The proxy $H(\tilde{R}_n)$ for $\tilde{s}_n$ has the same measurement error as the proxy $H(R_n)$ for $s_n$:

$$\tilde{U}_n = \log(H(\tilde{R}_n)) - \log(\tilde{s}_n) = \log(H(R_n)) - \log(p_n) - (\log(s_n) - \log(p_n)) = U_n .$$
So, ranking and optimizing proxies before and after prescaling are equivalent in terms of population statistics; therefore one may replace $s_n$ by $\tilde{s}_n$, and $H(R_n)$ by $H(\tilde{R}_n)$ in Proposition 3.6 and Theorem 3.7. As a consequence, the population value of the noise term $\operatorname{var}(\log(s_n))$ in equations (11) and (18) changes into

$$\operatorname{var}(\log(\tilde{s}_n)) = \operatorname{var}(\log(s_n/p_n)) .$$

A good predictor $p_n$ of $s_n$ results in a small term $\operatorname{var}(\log(\tilde{s}_n))$.
C Proofs
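Proof of Proposition 3.3. By assumption $s_n \Psi_n = s'_n \Psi'_n$. Independence of $s_n$ and $\Psi_n$ (and of $s'_n$ and $\Psi'_n$) implies

$$\begin{aligned}
\operatorname{var}(\log(s'_n)) + \operatorname{var}(\log(H^{(1)}(\Psi'_n))) &= \operatorname{var}(\log(s_n)) + \operatorname{var}(\log(H^{(1)}(\Psi_n))) \\
&\le \operatorname{var}(\log(s_n)) + \operatorname{var}(\log(H^{(2)}(\Psi_n))) \\
&= \operatorname{var}(\log(s'_n)) + \operatorname{var}(\log(H^{(2)}(\Psi'_n))) .
\end{aligned}$$

Hence $\operatorname{var}(\log(H^{(1)}(\Psi'_n))) \le \operatorname{var}(\log(H^{(2)}(\Psi'_n)))$.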
Proof of Theorem 3.4. We have to show that there exists a measurable, positively homogeneous functional $H^* : D \to [0, \infty)$, with $H^*(\Psi) > 0$ a.s., and $\operatorname{var}(\log(H^*(\Psi))) \le \operatorname{var}(\log(H(\Psi)))$ for all proxy functionals $H$.

For a proxy functional $H$, write $U = \log(H)$. Define $\lambda_H^2 = \operatorname{var}(\log(H(\Psi)))$. Let $\mathcal{U}$ denote the space of all log proxy functionals with $\lambda_H^2 < \infty$. The space $\mathcal{U}$ is not empty, by assumption. If $E\,U(\Psi) = a \ne 0$, then $e^{-a}H$ is an equally good proxy functional for which $E\log(e^{-a}H(\Psi)) = 0$. Therefore we may restrict attention to the subspace $\mathcal{U}_0$ of $\mathcal{U}$ of centered functionals. The space $\mathcal{U}_0$ is affine: if $U_1, U_2 \in \mathcal{U}_0$, and $w \in \mathbb{R}$, then $wU_1 + (1 - w)U_2 \in \mathcal{U}_0$, since $(H^{(1)})^w (H^{(2)})^{(1-w)}$ is a proxy functional, see equation (8).

Define $\lambda_{\inf}^2 = \inf_{H : \log(H) \in \mathcal{U}_0} \{\lambda_H^2\}$. Consider the space $L^2(D, \mathcal{B})$ of equivalence classes $[U]$ of log proxy functionals $U$, with inner product $\langle [U^{(1)}], [U^{(2)}] \rangle = E(U^{(1)}(\Psi)\,U^{(2)}(\Psi))$. Here, $\mathcal{B}$ denotes the Borel sigma-field for $D$. Notice that $\mathcal{U}_0$ is a subset of $L^2$ and that $\lambda$ coincides with the $L^2$-norm $\|\cdot\|$ on $\mathcal{U}_0$. Let $U_1, U_2, \ldots \in \mathcal{U}_0$ be a sequence for which $\|U_i\| \to \lambda_{\inf}$. Then $[U_1], [U_2], \ldots$ is a Cauchy sequence in $L^2$: apply the parallelogram law to obtain

$$0 \le \|U_m - U_n\|^2 \le -4\left\|\frac{U_m + U_n}{2}\right\|^2 + 2\|U_m\|^2 + 2\|U_n\|^2 .$$

Since $\mathcal{U}_0$ is affine, $(U_m + U_n)/2 \in \mathcal{U}_0$, hence $\left\|\frac{U_m + U_n}{2}\right\|^2 \ge \lambda_{\inf}^2$. Therefore, with $\lambda_m = \|U_m\|$, $\|U_m - U_n\|^2 \le -4\lambda_{\inf}^2 + 2\lambda_m^2 + 2\lambda_n^2 \to 0$ for $m, n \to \infty$.

By completeness of $L^2$ the sequence $[U_1], [U_2], \ldots$ converges to an element $[U_0]$ in $L^2$ and by continuity of the norm $\lambda_0^2 = \lambda_{\inf}^2$. Pick a functional $U_0 \in \mathcal{U}_0$ from $[U_0]$. Let us use $U_0$ to construct a functional $H^*$ that satisfies the conditions stated at the start of the proof. For every $L^2$ convergent sequence there exists a subsequence that converges almost surely. Let $U_{i_k}(f) = \log(H^{(i_k)}(f)) \to U_0(f)$ on a set $C$ almost everywhere in $D$. Define on the convergence set $C$: $H^*(f) = \lim H^{(i_k)}(f)$. For $\{\alpha f : f \in C,\ \alpha f \notin C,\ \alpha \in [0, \infty)\}$, define $H^*(\alpha f) = \alpha H^*(f)$. For remaining $f \in D$ define $H^*(f) \equiv 0$. The functional $H^*$ assigns a single value to each $f \in D$: consider $f_1, f_2 \in C$, $\alpha_1, \alpha_2 > 0$, and $f = \alpha_1 f_1 = \alpha_2 f_2$. Then $H^*(\alpha_1 f_1) \equiv \alpha_1 H^*(f_1) = \alpha_1 H^*\!\left(\frac{\alpha_2}{\alpha_1} f_2\right)$. By homogeneity of $H^*$ on $C$ this equals $\alpha_2 H^*(f_2) \equiv H^*(\alpha_2 f_2)$.

Being the result of a limit, the functional $H^*$ is measurable. Positive homogeneity follows by construction. Moreover, $H^*(\Psi) > 0$ almost surely, since $U_0(\Psi) \overset{a.s.}{=} \log(H^*(\Psi))$ and $\operatorname{var}(U_0(\Psi)) = \lambda_0^2 < \infty$. Finally, $\operatorname{var}(\log(H^*(\Psi))) = \lambda_{\inf}^2 \le \lambda_H^2$ for all $H$.
Lemma C.1. If $H^*$ is an optimal proxy functional, and $H$ is a proxy functional, then $\operatorname{cov}(\log(H^*(\Psi)), \log(H(\Psi))) = (\lambda^*)^2$.
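Proof of Lemma C.1. Consider the proxy functional $(H^*(f))^w (H(f))^{1-w}$, with measurement variance

$$\lambda_w^2 = w^2 (\lambda^*)^2 + 2w(1 - w) \operatorname{cov}(\log(H^*(\Psi)), \log(H(\Psi))) + (1 - w)^2 \lambda^2 ,$$

where $\lambda^2$ is the measurement variance of $H$. Since $H^*$ is optimal, $\partial \lambda_w^2 / \partial w \,|_{w=1} = 0$. Hence $\operatorname{cov}(\log(H^*(\Psi)), \log(H(\Psi))) = (\lambda^*)^2$.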
Proof of Proposition 3.5. Both proxy functionals have measurement variance $(\lambda^*)^2$. Let $H_0$ denote the centered proxy: $H_0 = \exp(-E\log(H(\Psi)))\, H$, with $E\log(H_0(\Psi)) = 0$. Consider the covariance of the centered log proxies: $\operatorname{cov}(\log(H_0^{(1)}(\Psi)), \log(H_0^{(2)}(\Psi)))$. By Lemma C.1 this covariance equals $(\lambda^*)^2$. By Cauchy-Schwarz this equality holds if and only if $H_0^{(1)}(\Psi) \overset{a.s.}{=} H_0^{(2)}(\Psi)$. In other words, if and only if $H^{(1)}(\Psi) \overset{a.s.}{=} a H^{(2)}(\Psi)$, for certain $a > 0$.
Autocorrelation formulas (12) and (13). Correlation calculations yield for $a_\lambda$ and $e_\lambda(j)$:

$$a_\lambda = \frac{\operatorname{var}(\log(s_0))}{\operatorname{var}(\log(s_0)) + \lambda^2}, \quad \text{and} \quad e_\lambda(j) = \operatorname{corr}(\log(s_0), U_{-j}) \cdot \frac{\sqrt{\operatorname{var}(\log(s_0))}\; \lambda}{\operatorname{var}(\log(s_0)) + \lambda^2} .$$

The proportion factor $a_\lambda \in (0, 1]$ is large when the measurement variance $\lambda^2$ is small. The correction term $e_\lambda(j)$ depends on the impact of $U_{-j} = \log(H(\Psi_{-j}))$ on $\log(s_0)$. The factor to the right hand side of the multiplication dot is in [0, 1]. The ratio of two autocorrelation functions equals

$$\frac{\rho_{\log(H^{(1)}(R_0))}(j)}{\rho_{\log(H^{(2)}(R_0))}(j)} = \frac{a_{\lambda^{(1)}} \cdot \rho_{\log(s_0)}(j) + e_{\lambda^{(1)}}(j)}{a_{\lambda^{(2)}} \cdot \rho_{\log(s_0)}(j) + e_{\lambda^{(2)}}(j)} .$$

Assuming that the impact of $U_{-j}$ is small compared to the autocorrelation in $\log(s_0)$:

$$\operatorname{corr}(U^{(i)}_{-j}, \log(s_0)) / \operatorname{corr}(\log(s_{-j}), \log(s_0)) \to 0, \qquad j \to \infty,$$

gives, upon dividing both numerator and denominator by $\rho_{\log(s_0)}(j)$ (assuming $\rho_{\log(s_0)}(j) \ne 0$ for all $j > 0$),

$$\frac{\rho_{\log(H^{(1)}(R_0))}(j)}{\rho_{\log(H^{(2)}(R_0))}(j)} \to \frac{a_{\lambda^{(1)}}}{a_{\lambda^{(2)}}} = \frac{\operatorname{var}(\log(s_0)) + (\lambda^{(2)})^2}{\operatorname{var}(\log(s_0)) + (\lambda^{(1)})^2}, \qquad j \to \infty. \qquad (13)$$
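A small simulation can illustrate the limit in (13). The sketch below is built on assumptions that are not in the paper: $\log(s_n)$ follows a stationary Gaussian AR(1) with variance 0.25, and two log proxies are formed by adding independent iid Gaussian measurement errors with standard deviations 0.2 and 0.5. With independent errors the correction terms $e_\lambda(j)$ vanish, so the ratio of the autocorrelations should be close to $a_{\lambda^{(1)}} / a_{\lambda^{(2)}}$ at any lag.

```python
import math
import random


def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def a_factor(var_log_s, lam):
    """Proportion factor a_lambda = var(log s) / (var(log s) + lambda^2)."""
    return var_log_s / (var_log_s + lam ** 2)


rng = random.Random(0)
phi, var_log_s, n_days = 0.95, 0.25, 20000
sigma = math.sqrt(var_log_s * (1.0 - phi ** 2))     # AR(1) innovation sd

# Hypothetical model for log s_n: stationary Gaussian AR(1).
log_s = [rng.gauss(0.0, math.sqrt(var_log_s))]
for _ in range(n_days - 1):
    log_s.append(phi * log_s[-1] + rng.gauss(0.0, sigma))

# Two log proxies with independent iid measurement errors.
lam1, lam2 = 0.2, 0.5
log_h1 = [v + rng.gauss(0.0, lam1) for v in log_s]
log_h2 = [v + rng.gauss(0.0, lam2) for v in log_s]

sample_ratio = autocorr(log_h1, 10) / autocorr(log_h2, 10)
limit_ratio = a_factor(var_log_s, lam1) / a_factor(var_log_s, lam2)
```

The proxy with the smaller measurement variance has the larger autocorrelations, which is the pattern visible in Figure 2.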
Consistently estimating the optimal coefficients. First some notation. Let $(X_n)_{n \in 1 \ldots N}$ be a series of vectors. Let $\widehat{\operatorname{var}}(X_n)$ and $\widehat{\operatorname{cov}}(X_n)$ denote the standard empirical variance and covariance matrices of the series $(X_n)$, respectively. Let $H(R_n)$ be shorthand for the d-dimensional column vector of proxies $H^{(i)}(R_n)$, and let $U_n$ denote the accompanying measurement errors. Let $\log(H(R_n))$ denote the element wise logarithms. So, $\log(H(R_n)) = \log(s_n) \cdot \iota + U_n$.
The standard formula for the sample variance of the sum of random vectors gives:

$$\widehat{\operatorname{var}}(\log(H(R_n))) = \widehat{\operatorname{var}}(\log(s_n))\, \iota\iota' + \widehat{\operatorname{var}}(U_n) + 2 \cdot \widehat{\operatorname{cov}}(U_n, \log(s_n) \cdot \iota) .$$

The estimator $\hat{w}$ is given by $\hat{w} = \arg\min_w w'\, \widehat{\operatorname{var}}(\log(H(R_n)))\, w$. As in the proof of Theorem 3.7, the variance of $\log(s_n)$ drops out:

$$\hat{w} = \arg\min_w\; w' \left( \widehat{\operatorname{var}}(U_n) + 2 \cdot \widehat{\operatorname{cov}}(U_n, \log(s_n) \cdot \iota) \right) w .$$

If $\hat{w}$ is consistent for $w^*$, then asymptotically it should solve $\arg\min_w w' \Lambda w$. The term $\widehat{\operatorname{var}}(U_n)$ converges to $\Lambda$ for increasing sample sizes, since the measurement error vectors $U_n$ are iid. So, the consistency of $\hat{w}$ comes down to the consistency condition that the sample covariance converges to zero in probability:

$$\widehat{\operatorname{cov}}(U_n, \log(s_n) \cdot \iota) \overset{P}{\to} 0, \qquad N \to \infty. \qquad (22)$$

In addition to existence of second moments and the independence of $U_n$ and $\log(s_n)$, the stationarity for $(s_n, \Psi_n)$ is a sufficient, but not necessary, condition which ensures that the consistency condition (22) holds.
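The consistency argument can be checked numerically in a two-proxy setting. The sketch below rests on several assumptions that are not part of the paper: $\log(s_n)$ is a stationary Gaussian AR(1), the measurement errors are iid Gaussian with a chosen covariance matrix $\Lambda$, and the weights are constrained to sum to one, so that the minimizer has the closed form used in `two_proxy_weight`. All names are illustrative.

```python
import math
import random


def sample_cov(x, y):
    """Standard empirical covariance of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def two_proxy_weight(v11, v22, v12):
    """First weight of the minimizer of w' V w subject to w1 + w2 = 1."""
    return (v22 - v12) / (v11 + v22 - 2.0 * v12)


rng = random.Random(1)
n_days, phi = 20000, 0.95

# Hypothetical stationary AR(1) for log s_n with variance 0.25.
log_s = [rng.gauss(0.0, 0.5)]
for _ in range(n_days - 1):
    log_s.append(phi * log_s[-1] + rng.gauss(0.0, 0.5 * math.sqrt(1.0 - phi ** 2)))

# iid measurement errors with Lambda = [[0.04, 0.03], [0.03, 0.09]].
u1, u2 = [], []
for _ in range(n_days):
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u1.append(0.2 * e1)
    u2.append(0.15 * e1 + math.sqrt(0.09 - 0.15 ** 2) * e2)

log_h1 = [s + u for s, u in zip(log_s, u1)]
log_h2 = [s + u for s, u in zip(log_s, u2)]

# Estimated weight from the observable log proxies only.
w_hat = two_proxy_weight(sample_cov(log_h1, log_h1),
                         sample_cov(log_h2, log_h2),
                         sample_cov(log_h1, log_h2))
w_star = two_proxy_weight(0.04, 0.09, 0.03)   # population optimum
cross = sample_cov(u1, log_s)                 # one entry of the term in (22)
```

In this setup the estimated weight is computed without observing $s_n$, yet it lands close to the population optimum of 6/7, and the sample cross covariance is close to zero as condition (22) requires.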
References
Alizadeh, S., Brandt, M.W. and Diebold, F.X. (2002). Range-based estimation of stochastic
volatility models. Journal of Finance, 57, number 3, 1047–1091.
Andersen, T.G. and Bollerslev, T. (1997). Intraday periodicity and volatility persistence in
financial markets. Journal of Empirical Finance, 4, 115–158.
Andersen, T.G. and Bollerslev, T. (1998). Answering the skeptics: Yes, standard volatility
models do provide accurate forecasts. International Economic Review, 39, number 4,
885–905.
Andersen, T.G., Bollerslev, T., Diebold, F.X. and Labys, P. (2003). Modeling and forecasting
realized volatility. Econometrica, 71, number 2, 579–625.
Barndorff-Nielsen, O.E. and Shephard, N. (2002). Estimating quadratic variation using
realized variance. Journal of Applied Econometrics, 17, number 5, 457–477.
Barndorff-Nielsen, O.E. and Shephard, N. (2003). Realized power variation and stochastic
volatility models. Bernoulli, 9, number 2, 243–65 and 1109–1111.
Billingsley, P. (1999). Convergence of Probability Measures, 2nd edn. New York: John Wiley
& Sons, Inc.
Dacorogna, M.M., Gençay, R., Müller, U., Olsen, R.B. and Pictet, O.V. (2001). An Introduction to High-Frequency Finance. London: Academic Press.
Drost, F.C. and Nijman, T.E. (1993). Temporal aggregation of GARCH processes. Econometrica, 61, number 4, 909–927.
Drost, F.C. and Werker, B.J.M. (1996). Closing the GARCH gap: Continuous time GARCH
modeling. Journal of Econometrics, 74, number 1, 31–57.
Engle, R.F., Lilien, D.M. and Robins, R.P. (1987). Estimating time varying risk premia in
the term structure: The ARCH-M model. Econometrica, 55, number 2, 391–407.
Hansen, P.R. and Lunde, A. (2006a). Consistent ranking of volatility models. Journal of
Econometrics, 131, number 1-2, 97–121.
Hansen, P.R. and Lunde, A. (2006b). Realized variance and market microstructure noise.
Journal of Business & Economic Statistics, 24, number 2, 127–161.
Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7, number 1, 77–91.
Martens, M. and van Dijk, D. (2007). Measuring volatility with the realized range. Journal
of Econometrics, 138, number 1, 181–207.
Meddahi, N. and Renault, E. (2004). Temporal aggregation of volatility models. Journal of
Econometrics, 119, number 2, 355–377.
Merton, R.C. (1980). On estimating the expected return on the market: an exploratory
investigation. Journal of Financial Economics, 8, 323–361.
Oomen, R.C.A. (2006). Properties of realized variance under alternative sampling schemes.
Journal of Business & Economic Statistics, 24, number 2, 219–237.
Zhang, L., Mykland, P.A. and Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100, number 472, 1394–1411.