Title: | Spatial Price Level Comparisons |
---|---|
Description: | Price comparisons within or between countries provide an overall measure of the relative difference in prices, often denoted as price levels. This package provides index number methods for such price comparisons (e.g., The World Bank, 2011, <doi:10.1596/978-0-8213-9728-2>). Moreover, it contains functions for sampling and characterizing price data. |
Authors: | Sebastian Weinand [aut, cre] |
Maintainer: | Sebastian Weinand <[email protected]> |
License: | EUPL |
Version: | 1.3.0 |
Built: | 2024-11-22 04:21:32 UTC |
Source: | https://github.com/sweinand/pricelevels |
Calculation of bilateral price indices. Currently, the following ones are implemented (see below in alphabetic order).
    banerjee(p, r, n, q, base=NULL, settings=list())
    bmw(p, r, n, base=NULL, settings=list())
    carli(p, r, n, base=NULL, settings=list())
    cswd(p, r, n, base=NULL, settings=list())
    davies(p, r, n, q, base=NULL, settings=list())
    drobisch(p, r, n, q, w=NULL, base=NULL, settings=list())
    dutot(p, r, n, base=NULL, settings=list())
    fisher(p, r, n, q, w=NULL, base=NULL, settings=list())
    geolaspeyres(p, r, n, q, w=NULL, base=NULL, settings=list())
    geopaasche(p, r, n, q, w=NULL, base=NULL, settings=list())
    geowalsh(p, r, n, q, w=NULL, base=NULL, settings=list())
    harmonic(p, r, n, base=NULL, settings=list())
    jevons(p, r, n, base=NULL, settings=list())
    laspeyres(p, r, n, q, w=NULL, base=NULL, settings=list())
    lehr(p, r, n, q, base=NULL, settings=list())
    lowe(p, r, n, q, base=NULL, settings=list())
    medgeworth(p, r, n, q, base=NULL, settings=list())
    paasche(p, r, n, q, w=NULL, base=NULL, settings=list())
    palgrave(p, r, n, q, w=NULL, base=NULL, settings=list())
    svartia(p, r, n, q, w=NULL, base=NULL, settings=list())
    toernqvist(p, r, n, q, w=NULL, base=NULL, settings=list())
    theil(p, r, n, q, w=NULL, base=NULL, settings=list())
    uvalue(p, r, n, q, base=NULL, settings=list())
    walsh(p, r, n, q, w=NULL, base=NULL, settings=list())
    young(p, r, n, q, base=NULL, settings=list())
p | A numeric vector of prices. |
r, n | A character vector or factor of regional entities |
q, w | A numeric vector of non-negative quantities |
base | A character specifying the base region to which all price levels are expressed. If |
settings | A list of control settings to be used. The following settings are supported: |
Before calculations start, missing values are excluded and duplicated observations for `r` and `n` are aggregated, that is, duplicated prices `p` and weights `w` are averaged and duplicated quantities `q` are added up.
The weights `w` must represent expenditure shares defined as $w_i^r = p_i^r q_i^r / \sum_{j=1}^{N} p_j^r q_j^r$. They are internally (re-)normalized such that they add up to 1 for each region `r`.
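As a rough illustration of what a bilateral index computes, the following base R sketch derives a few of the listed indices by hand for two regions with complete data. It does not use the package's functions; the data frame `d` and all object names are made up for this example, and the package's own implementation may differ in details such as aggregation and normalization.

```r
# Toy data: 2 regions, 3 products, complete prices and quantities
# (all values are made up for illustration).
d <- data.frame(
  region   = rep(c("a", "b"), each = 3),
  product  = rep(c("x", "y", "z"), times = 2),
  price    = c(1, 2, 4, 2, 2, 8),
  quantity = c(10, 5, 2, 8, 6, 1)
)

# Split by region; the products appear in the same order in both regions.
reg_a <- d[d$region == "a", ]   # base region
reg_b <- d[d$region == "b", ]   # comparison region
stopifnot(identical(reg_a$product, reg_b$product))

ratio <- reg_b$price / reg_a$price   # price relatives of b against a

# Unweighted indices
jevons_ab <- exp(mean(log(ratio)))                    # geometric mean of relatives
carli_ab  <- mean(ratio)                              # arithmetic mean of relatives
dutot_ab  <- mean(reg_b$price) / mean(reg_a$price)    # ratio of average prices

# Quantity-weighted indices
laspeyres_ab <- sum(reg_b$price * reg_a$quantity) / sum(reg_a$price * reg_a$quantity)
paasche_ab   <- sum(reg_b$price * reg_b$quantity) / sum(reg_a$price * reg_b$quantity)
fisher_ab    <- sqrt(laspeyres_ab * paasche_ab)       # geometric mean of both

round(c(jevons = jevons_ab, carli = carli_ab, dutot = dutot_ab,
        laspeyres = laspeyres_ab, paasche = paasche_ab, fisher = fisher_ab), 4)
```

The sketch fixes region "a" as the base, which corresponds to passing `base="a"` to the package functions.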
A named vector of price levels.
Sebastian Weinand
ILO, IMF, OECD, UNECE, Eurostat and World Bank (2020). Consumer Price Index Manual: Concepts and Methods. Washington DC: International Monetary Fund.
    library(pricelevels)

    # sample complete price data:
    set.seed(123)
    dt1 <- rdata(R=3, B=1, N=5)

    # compute jevons and toernqvist index:
    dt1[, jevons(p=price, r=region, n=product, base="1")]
    dt1[, toernqvist(p=price, r=region, n=product, q=quantity, base="1")]

    # compute lowe index using quantities of region 2:
    dt1[, lowe(p=price, r=region, n=product, q=quantity, base="1",
               settings=list(qbase="2"))]

    # add price data:
    dt2 <- rdata(R=4, B=1, N=4)
    dt2[, "region":=factor(region, labels=4:7)]
    dt2[, "product":=factor(product, labels=6:9)]
    dt <- rbind(dt1, dt2)
    dt[, is.connected(r=region, n=product)] # non-connected now

    # compute jevons index:
    dt[, jevons(p=price, r=region, n=product, base="1")]

    # change base region:
    dt[, jevons(p=price, r=region, n=product, base="4")]
Function `cpd()` estimates regional price levels by the Country-Product-Dummy (CPD) method, originally developed by Summers (1973). Auer and Weinand (2022) recently proposed a generalization of the CPD method. This nonlinear CPD method (NLCPD method) is implemented in function `nlcpd()`.
    cpd(p, r, n, q=NULL, w=NULL, base=NULL, simplify=TRUE, settings=list())
    nlcpd(p, r, n, q=NULL, w=NULL, base=NULL, simplify=TRUE, settings=list(), ...)
p | A numeric vector of prices. |
r, n | A character vector or factor of regional entities |
q, w | A numeric vector of non-negative quantities |
base | A character specifying the base to which the estimated logarithmic regional price levels are expressed. When |
simplify | A logical indicating whether the full regression-object should be provided ( |
settings | A list of control settings to be used. The following settings are supported: |
... | Further arguments passed to |
The CPD method is a linear regression model that explains the logarithmic price of product $i$ in region $r$, $\ln p_i^r$, by the general product price, $\ln \pi_i$, and the overall price level, $\ln P^r$:

$$\ln p_i^r = \ln \pi_i + \ln P^r + \ln \epsilon_i^r$$

The NLCPD method inflates the CPD model by product-specific elasticities $\delta_i$:

$$\ln p_i^r = \ln \pi_i + \delta_i \ln P^r + \ln \epsilon_i^r$$

Note that both the CPD and the NLCPD method require a normalization of the estimated price levels to avoid multicollinearity. If `base=NULL`, normalization $\sum_{r=1}^{R} \ln P^r = 0$ is used in both functions; otherwise, one (logarithmic) price level is set to 0. The NLCPD method additionally imposes the restriction $\sum_{i=1}^{N} w_i \delta_i = 1$, where the weights $w_i$ can be defined by `settings$w.delta`. In `nlcpd()`, it is always the parameter $\delta_1$ that is derived residually from this restriction.
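To make the linear CPD regression concrete, here is a minimal base R sketch using `lm()` with product and region dummies. It is not the package's `cpd()` implementation; the toy data and object names are assumptions made for illustration. Dropping the intercept lets the product dummies play the role of the general product prices, while R's default treatment contrasts set the logarithmic price level of the first region to 0.

```r
# Toy data: 3 regions, 3 products, complete prices (made-up values).
d <- data.frame(
  region  = factor(rep(c("a", "b", "c"), each = 3)),
  product = factor(rep(c("x", "y", "z"), times = 3)),
  price   = c(1, 2, 4, 2, 2, 8, 3, 5, 9)
)

# Dummy regression without intercept: the product dummies estimate the
# logarithmic general product prices, the region dummies the logarithmic
# price levels. The first region level ("a") is dropped by the default
# treatment contrasts, i.e. its logarithmic price level is fixed at 0.
fit <- lm(log(price) ~ 0 + product + region, data = d)

lnP <- c(0, coef(fit)[c("regionb", "regionc")])
names(lnP) <- levels(d$region)

P_base_a <- exp(lnP)               # price levels relative to region "a"
P_avg    <- exp(lnP - mean(lnP))   # normalization: log price levels sum to 0

P_base_a
P_avg
```

For complete and unweighted data such as this toy example, the estimated price levels coincide with the bilateral Jevons index against the base region, which is a convenient sanity check.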
Before calculations start, missing values are excluded and duplicated observations for `r` and `n` are aggregated, that is, duplicated prices `p` and weights `w` are averaged and duplicated quantities `q` are added up.
If `q` is provided, expenditure shares are derived as $w_i^r = p_i^r q_i^r / \sum_{j=1}^{N} p_j^r q_j^r$ and used as weights in the regression. If only `w` is provided, the weights `w` are (re-)normalized by default. If the weights `w` do not represent expenditure shares, the (re-)normalization can be turned off by `settings=list(norm.weights=FALSE)`.
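The derivation of expenditure shares and their use as regression weights can be sketched in base R as follows. This is an illustration under the assumption that the weighting amounts to weighted least squares via the `weights` argument of `lm()`; the data and object names are hypothetical, and the package's internal handling may differ.

```r
# Toy data: 2 regions, 3 products, prices and quantities (made-up values).
d <- data.frame(
  region   = factor(rep(c("a", "b"), each = 3)),
  product  = factor(rep(c("x", "y", "z"), times = 2)),
  price    = c(1, 2, 4, 2, 2, 8),
  quantity = c(10, 5, 2, 8, 6, 1)
)

# Expenditure shares within each region: p*q divided by the regional total.
expenditure <- d$price * d$quantity
d$share <- expenditure / ave(expenditure, d$region, FUN = sum)

# The shares add up to 1 for each region.
tapply(d$share, d$region, sum)

# Weighted least squares version of the CPD dummy regression
# (assumption: shares enter as observation weights).
fit_w <- lm(log(price) ~ 0 + product + region, data = d, weights = share)
P_w <- exp(c(a = 0, b = unname(coef(fit_w)["regionb"])))
P_w   # price levels relative to region "a"
```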
For `simplify=TRUE`, a named vector of (unlogged) regional price levels. Otherwise, for `cpd()`, a `lm`-object containing the full regression output, and for `nlcpd()` the full output of `nls.lm()` plus element `w.delta`.
Sebastian Weinand
Auer, L. v. and Weinand, S. (2022). A Nonlinear Generalization of the Country-Product-Dummy Method. Discussion Paper 2022/45, Deutsche Bundesbank.
Summers, R. (1973). International Price Comparisons based upon Incomplete Data. Review of Income and Wealth, 19 (1), 1-16.
    library(pricelevels)

    # sample complete price data:
    set.seed(123)
    R <- 3 # number of regions
    B <- 1 # number of product groups
    N <- 5 # number of products
    dt1 <- rdata(R=R, B=B, N=N)

    # compute expenditure share weighted cpd and nlcpd index:
    dt1[, cpd(p=price, r=region, n=product, q=quantity)]
    dt1[, nlcpd(p=price, r=region, n=product, q=quantity)]

    # set individual start values in nlcpd():
    par.init <- list("lnP"=setNames(rep(0, R), 1:R),
                     "pi"=setNames(rep(2, N), 1:N),
                     "delta"=setNames(rep(1, N), 1:N))
    dt1[, nlcpd(p=price, r=region, n=product, q=quantity, par=par.init)]

    # use lower and upper bounds on parameters:
    dt1[, nlcpd(p=price, r=region, n=product, q=quantity,
                lower=unlist(par.init)-0.1, upper=unlist(par.init)+0.1)]

    # change internal calculation of start values:
    dt1[, nlcpd(p=price, r=region, n=product, q=quantity,
                settings=list(self.start="s2"))]

    # add price data:
    dt2 <- rdata(R=4, B=1, N=4)
    dt2[, "region":=factor(region, labels=4:7)]
    dt2[, "product":=factor(product, labels=6:9)]
    dt <- rbind(dt1, dt2)
    dt[, is.connected(r=region, n=product)] # non-connected now

    # compute expenditure share weighted cpd and nlcpd index:
    dt[, cpd(p=price, r=region, n=product, q=quantity, base="1")]
    dt[, nlcpd(p=price, r=region, n=product, q=quantity, base="1")]

    # compare with toernqvist index:
    dt[, toernqvist(p=price, r=region, n=product, q=quantity, base="1")]

    # computational speed in nlcpd() usually increases if use.jac=TRUE:
    set.seed(123)
    dt3 <- rdata(R=20, B=1, N=30)
    system.time(m1 <- dt3[, nlcpd(p=price, r=region, n=product, q=quantity,
                                  settings=list(use.jac=FALSE), simplify=FALSE,
                                  control=minpack.lm::nls.lm.control("maxiter"=200))])
    system.time(m2 <- dt3[, nlcpd(p=price, r=region, n=product, q=quantity,
                                  settings=list(use.jac=TRUE), simplify=FALSE,
                                  control=minpack.lm::nls.lm.control("maxiter"=200))])
    all.equal(m1$par, m2$par, tolerance=1e-05)
Function `index.pairs()` computes bilateral index numbers for all pairs of regions. Based on that, function `geks()` derives regional price levels using the GEKS method proposed by Gini (1924, 1931), Elteto and Koves (1964), and Szulc (1964).
    index.pairs(p, r, n, q=NULL, w=NULL, settings=list())
    geks(p, r, n, q=NULL, w=NULL, base=NULL, simplify=TRUE, settings=list())
p | A numeric vector of prices. |
r, n | A character vector or factor of regional entities |
q, w | A numeric vector of non-negative quantities |
base | A character specifying the base region to which all price levels are expressed. When |
simplify | A logical indicating whether the full regression-object should be provided ( |
settings | A list of control settings to be used. The following settings are supported: |
The GEKS index is a two-step approach. First, prices are aggregated into bilateral index numbers using the index given in `type`. This is done for all pairs of regions via function `index.pairs()`. Second, these bilateral index numbers are transformed into a set of multilateral, transitive index numbers.

Note that the quantities `q` or weights `w` are used within the aggregation of prices into index numbers (first stage), while the subsequent transformation of these index numbers (second stage) usually does not rely on any weights (but can if specified in `settings$wmethod`).
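The two steps can be sketched in base R as follows. This is a simplified, unweighted illustration with a bilateral Jevons index, not the package's `geks()` code; the toy data and the helper names `jevons_pair()` and `geks_pair()` are hypothetical.

```r
# Toy data with one gap: product "z" is not priced in region "c".
d <- data.frame(
  region  = c("a", "a", "a", "b", "b", "b", "c", "c"),
  product = c("x", "y", "z", "x", "y", "z", "x", "y"),
  price   = c(1, 2, 4, 2, 2, 8, 3, 5)
)
regions <- sort(unique(d$region))

# Bilateral Jevons index of region `r` relative to base `b`,
# computed on the products priced in both regions.
jevons_pair <- function(b, r) {
  m <- merge(d[d$region == b, ], d[d$region == r, ], by = "product")
  exp(mean(log(m$price.y / m$price.x)))
}

# Step 1: matrix of all bilateral indices (rows = base, columns = region).
P_bil <- outer(regions, regions, Vectorize(jevons_pair))
dimnames(P_bil) <- list(base = regions, region = regions)

# Step 2: unweighted GEKS. The index of r relative to b is the geometric
# mean over all bridging regions k of P_bil[b, k] * P_bil[k, r].
geks_pair <- function(b, r) exp(mean(log(P_bil[b, ] * P_bil[, r])))
P_geks <- outer(regions, regions, Vectorize(geks_pair))
dimnames(P_geks) <- dimnames(P_bil)

P_geks["a", ]   # multilateral price levels with base region "a"

# The bilateral indices are not transitive because of the gap,
# while the GEKS indices are:
c(bilateral = P_bil["a", "b"] * P_bil["b", "c"] - P_bil["a", "c"],
  geks      = P_geks["a", "b"] * P_geks["b", "c"] - P_geks["a", "c"])
```

The transitivity in the second step relies on the bilateral index satisfying the country-reversal test, which holds for the Jevons index used here.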
Before calculations start, missing values are excluded and duplicated observations for `r` and `n` are aggregated, that is, duplicated prices `p` and weights `w` are averaged and duplicated quantities `q` are added up.
The weights `w` must represent expenditure shares defined as $w_i^r = p_i^r q_i^r / \sum_{j=1}^{N} p_j^r q_j^r$. They are internally (re-)normalized such that they add up to 1 for each region `r`.
For `index.pairs()`, a data.table with variables `base` (the base region), `region` (the comparison region), and `eval(settings$type)` (the price level between the two regions).

For `geks()`, a named vector or matrix of (unlogged) regional price levels if `simplify=TRUE`. Otherwise, for `simplify=FALSE`, a `lm`-object containing the full regression output.
Sebastian Weinand
Gini, C. (1924). Quelques Considerations au Sujet de la Construction des Nombres Indices des Prix et des Questions Analogues. Metron, 4 (1), 3-162.
Gini, C. (1931). On the Circular Test of Index Numbers. International Statistical Review, 9 (2), 3-25.
Elteto, O. and Koves, P. (1964). On a Problem of Index Number Computation Relating to International Comparison. Statisztikai Szemle, 42, 507-518.
Szulc, B. J. (1964). Indices for Multiregional Comparisons. Przeglad Statystyczny, 3, 239-254.
    library(pricelevels)

    # example data:
    set.seed(123)
    dt1 <- rdata(R=3, B=1, N=5)

    ### Index pairs

    # matrix of bilateral index numbers:
    Pje <- dt1[, index.pairs(p=price, r=region, n=product,
                             settings=list(type="jevons"))]

    # if the underlying index satisfies the country-reversal
    # test (like the Jevons index), the price index numbers of
    # the upper-right triangle are the same as the inverse of
    # the price index numbers of the lower-left triangle.
    all.equal(Pje$jevons[3], 1/Pje$jevons[7]) # true

    # hence, one could set all.pairs=FALSE without losing any
    # information. however, this is no longer true for indices
    # that do not satisfy this test (like the Carli index):
    Pca <- dt1[, index.pairs(p=price, r=region, n=product,
                             settings=list(type="carli"))]
    all.equal(Pca$carli[3], 1/Pca$carli[7]) # false

    ### GEKS method

    # for complete price data (no gaps), the jevons index is transitive.
    # hence, no adjustment is needed by the geks approach, which is
    # why the index numbers are the same:
    all.equal(
      dt1[, geks(p=price, r=region, n=product, base="1", settings=list(type="jevons"))],
      dt1[, jevons(p=price, r=region, n=product, base="1")]
    ) # true

    # this is no longer true when there are gaps in the data:
    dt1.gaps <- dt1[!rgaps(region, product, amount=0.25), ]
    all.equal(
      dt1.gaps[, geks(p=price, r=region, n=product, base="1", settings=list(type="jevons"))],
      dt1.gaps[, jevons(p=price, r=region, n=product, base="1")]
    ) # now, differences

    # weighting at the second step of GEKS can be done with respect
    # to the intersection of products for each pair of regions:
    dt1.gaps[, geks(p=price, r=region, n=product, base="1",
                    settings=list(type="jevons", wmethod="obs"))]

    # add price data:
    dt2 <- rdata(R=4, B=1, N=4)
    dt2[, "region":=factor(region, labels=4:7)]
    dt2[, "product":=factor(product, labels=6:9)]
    dt <- rbind(dt1, dt2)
    dt[, is.connected(r=region, n=product)] # non-connected now

    # compute all index pairs and geks:
    require(data.table)
    as.matrix(dcast(
      data=dt[, index.pairs(p=price, r=region, n=product)],
      formula=base~region, value.var="jevons"), rownames="base")
    dt[, geks(p=price, r=region, n=product, base="1", settings=list(type="jevons"))]
Calculation of regional price levels using the multilateral Gerardi index (Eurostat, 1978).
    gerardi(p, r, n, q, w=NULL, base=NULL, simplify=TRUE, settings=list())
p | A numeric vector of prices. |
r, n | A character vector or factor of regional entities |
q, w | A numeric vector of non-negative quantities |
base | A character specifying the base region to which all price levels are expressed. When |
simplify | A logical indicating whether a named vector of estimated regional price levels ( |
settings | A list of control settings to be used. The following settings are supported: |
Before calculations start, missing values are excluded and duplicated observations for `r` and `n` are aggregated, that is, duplicated prices `p` and weights `w` are averaged and duplicated quantities `q` are added up.
The weights `w` must represent expenditure shares defined as $w_i^r = p_i^r q_i^r / \sum_{j=1}^{N} p_j^r q_j^r$. They are internally (re-)normalized such that they add up to 1 for each region `r`.
For `simplify=TRUE`, a named vector of regional price levels. Otherwise, for `simplify=FALSE`, a list containing the named vector of international product prices and regional price levels.
Sebastian Weinand
Balk, B. M. (1996). A comparison of ten methods for multilateral international price and volume comparisons. Journal of Official Statistics, 12 (1), 199-222.
Eurostat (1978), Comparison in real values of the aggregates of ESA 1975, Publications Office, Luxembourg.
    library(pricelevels)
    require(data.table)

    # example data:
    set.seed(123)
    dt1 <- rdata(R=3, B=1, N=5)

    # Gerardi price index:
    dt1[, gerardi(p=price, q=quantity, r=region, n=product)]

    # add price data:
    dt2 <- rdata(R=4, B=1, N=4)
    dt2[, "region":=factor(region, labels=4:7)]
    dt2[, "product":=factor(product, labels=6:9)]
    dt <- rbind(dt1, dt2)
    dt[, is.connected(r=region, n=product)] # non-connected now

    # compute expenditure share weights:
    dt[, "share" := price*quantity/sum(price*quantity), by="region"]

    # Gerardi index with quantities or expenditure share weights:
    dt[, gerardi(p=price, q=quantity, r=region, n=product)]
    dt[, gerardi(p=price, w=share, r=region, n=product)]
Calculation of regional price levels using the

- Geary-Khamis method (Geary, 1958; Khamis, 1972): `gkhamis()`
- Iklé method (Ikle, 1972; Dikhanov, 1994; Balk, 1996): `ikle()`
- Rao system (Rao, 1990): `rao()`
- Rao-Hajargasht method (Rao and Hajargasht, 2016): `rhajargasht()`
All methods have in common that they set up a system of interrelated equations of international product prices and price levels, which must be solved iteratively. Only the definition of the international product prices and price levels differs between the methods (see package vignette).
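The iterative solution of such a system can be sketched in base R for the Geary-Khamis case, using its textbook definition: international prices are quantity-weighted averages of deflated regional prices, and price levels relate regional expenditure to expenditure valued at international prices. The function `gk_iterate()`, its arguments, and the toy data are hypothetical and only illustrate the idea; the package's `gkhamis()` may be implemented differently.

```r
# Prices and quantities as products x regions matrices (made-up values).
p <- matrix(c(1, 2, 4,   2, 2, 8,   3, 5, 9), nrow = 3,
            dimnames = list(product = c("x", "y", "z"),
                            region  = c("a", "b", "c")))
q <- matrix(c(10, 5, 2,   8, 6, 1,   4, 7, 3), nrow = 3,
            dimnames = dimnames(p))

gk_iterate <- function(p, q, base = 1, tol = 1e-10, max_iter = 1000) {
  P <- rep(1, ncol(p))                     # start values for the price levels
  for (iter in seq_len(max_iter)) {
    # international prices: quantity-weighted average of deflated prices
    pi_int <- rowSums(sweep(p * q, 2, P, "/")) / rowSums(q)
    # price levels: expenditure at regional prices relative to
    # expenditure valued at international prices
    P_new <- colSums(p * q) / colSums(pi_int * q)
    P_new <- P_new / P_new[base]           # normalize to the base region
    done <- max(abs(P_new - P)) < tol
    P <- P_new
    if (done) break
  }
  list(price_levels = P, intl_prices = pi_int, iterations = iter)
}

res <- gk_iterate(p, q)
res$price_levels   # price levels relative to region "a"
res$iterations     # number of iterations used
```

If all regions have identical quantities, the denominator is the same for every region and the result reduces to a fixed-basket comparison, which makes a handy plausibility check.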
    gkhamis(p, r, n, q=NULL, base=NULL, simplify=TRUE, settings=list())
    ikle(p, r, n, q=NULL, w=NULL, base=NULL, simplify=TRUE, settings=list())
    rao(p, r, n, q=NULL, w=NULL, base=NULL, simplify=TRUE, settings=list())
    rhajargasht(p, r, n, q=NULL, w=NULL, base=NULL, simplify=TRUE, settings=list())
p | A numeric vector of prices. |
r, n | A character vector or factor of regional entities |
q, w | A numeric vector of non-negative quantities |
base | A character specifying the base region to which all price levels are expressed. When |
simplify | A logical indicating whether a named vector of estimated regional price levels ( |
settings | A list of control settings to be used. The following settings are supported: |
In their original form, the above index methods use quantities (or weights). However, Rao and Hajargasht (2016, p. 417) have shown that similar solutions exist for the unweighted definitions of international product prices and price levels. This is implemented in the functions where

- `gkhamis(q=NULL)` corresponds to a multilateral Dutot index;
- `ikle(q=NULL, w=NULL)` to a multilateral Harmonic mean index;
- `rao(q=NULL, w=NULL)` to a multilateral Jevons index;
- `rhajargasht(q=NULL, w=NULL)` to a multilateral Carli index.
Before calculations start, missing values are excluded and duplicated observations for `r` and `n` are aggregated, that is, duplicated prices `p` and weights `w` are averaged and duplicated quantities `q` are added up.
The weights `w` must represent expenditure shares defined as $w_i^r = p_i^r q_i^r / \sum_{j=1}^{N} p_j^r q_j^r$. They are internally (re-)normalized such that they add up to 1 for each region `r`.
For `simplify=TRUE`, a named vector of regional price levels. Otherwise, for `simplify=FALSE`, a list containing the named vector of international product prices and regional price levels, the number of iterations until convergence, and the achieved difference at convergence.
Sebastian Weinand
Balk, B. M. (1996). A comparison of ten methods for multilateral international price and volume comparisons. Journal of Official Statistics, 12 (1), 199-222.
Diewert, W. E. (1999). Axiomatic and Economic Approaches to International Comparisons. In: International and Interarea Comparisons of Income, Output and Prices, edited by A. Heston and R. E Lipsey. Chicago: The University of Chicago Press.
Dikhanov, Y. (1994). Sensitivity of PPP-based income estimates to the choice of aggregation procedures. The World Bank, Washington D.C., June 10, paper presented at 23rd General Conference of the International Association for Research in Income and Wealth, St. Andrews, Canada.
Geary, R. C. (1958). A Note on the Comparison of Exchange Rates and Purchasing Power Between Countries. Journal of the Royal Statistical Society. Series A (General), 121 (1), 97–99.
Ikle, D. M. (1972). A new approach to the index number problem. The Quarterly Journal of Economics, 86 (2), 188-211.
Khamis, S. H. (1972). A New System of Index Numbers for National and International Purposes. Journal of the Royal Statistical Society. Series A (General), 135 (1), 96–121.
Rao, D. S. P. (1990). A system of log-change index numbers for multilateral comparisons. In: Comparisons of prices and real products in Latin America. Contributions to Economic Analysis Series, edited by Salazar-Carrillo and Rao. Amsterdam: North-Holland Publishing Company.
Rao, D. S. P. and G. Hajargasht (2016). Stochastic approach to computation of purchasing power parities in the International Comparison Program. Journal of Econometrics, 191 (2016), 414-425.
    library(pricelevels)
    require(data.table)

    # example data:
    set.seed(123)
    dt1 <- rdata(R=3, B=1, N=5)

    # Geary-Khamis price index can be obtained in two ways:
    dt1[, gkhamis(p=price, q=quantity, r=region, n=product, settings=list(solve="iterative"))]
    dt1[, gkhamis(p=price, q=quantity, r=region, n=product, settings=list(solve="matrix"))]

    # gkhamis(), ikle() and gerardi() yield the same results
    # if the quantities are the same:
    dt1[, "quantity2" := 1000*rleidv(product)]
    dt1[, gkhamis(p=price, r=region, n=product, q=quantity2)]
    dt1[, gerardi(p=price, r=region, n=product, q=quantity2)]
    dt1[, ikle(p=price, r=region, n=product, q=quantity2)]
    dt1[, "quantity2":=NULL]

    # add price data:
    dt2 <- rdata(R=4, B=1, N=4)
    dt2[, "region":=factor(region, labels=4:7)]
    dt2[, "product":=factor(product, labels=6:9)]
    dt <- rbind(dt1, dt2)
    dt[, is.connected(r=region, n=product)] # non-connected now

    # compute expenditure share weights:
    dt[, "share" := price*quantity/sum(price*quantity), by="region"]

    # Ikle index with quantities or expenditure share weights:
    dt[, ikle(p=price, q=quantity, r=region, n=product)]
    dt[, ikle(p=price, w=share, r=region, n=product)]
A price matrix or price tableau typically consists of prices for multiple products and regions.
Function `is.connected()` checks if all regions in the price matrix are connected either directly or indirectly by some bridging region. `neighbors()` divides the regions into groups of connected regions. `connect()` is a simple wrapper of `neighbors()`, connecting some price data by relying on the group with the maximum number of observations. `comparisons()` derives the number of bilateral (or pairwise) comparisons that could be computed for each of those groups of regions. `sparsity()` indicates the sparsity of the price matrix.
    is.connected(r, n)
    neighbors(r, n, simplify=FALSE)
    connect(r, n)
    comparisons(r, n, ngbs=NULL)
    sparsity(r, n, useable=FALSE)
r, n | A character vector or factor of regional entities |
simplify | A logical indicating whether the results should be simplified to a factor of group identifiers ( |
ngbs | Either |
useable | A logical indicating whether only observations should be taken into account that could be used for interregional comparisons ( |
Following World Bank (2013, p. 98), a "price tableau is said to be connected if the price data are such that it is not possible to place the countries in two groups in which no item priced by any country in one group is priced by any other country in the second group".
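The definition above can be turned into a simple graph search: two regions are directly linked if they priced a common product, and groups of connected regions are the connected components of this graph. The following base R sketch illustrates the idea; the function `connected_groups()` is hypothetical and is not the algorithm used inside `is.connected()` or `neighbors()`.

```r
# Group regions that are linked directly or indirectly via common products.
connected_groups <- function(r, n) {
  ok <- !is.na(r) & !is.na(n)              # drop incomplete observations
  r <- as.character(r[ok])
  n <- as.character(n[ok])
  regions <- unique(r)
  group <- setNames(rep(NA_integer_, length(regions)), regions)
  g <- 0L
  for (start in regions) {
    if (!is.na(group[start])) next         # region already assigned
    g <- g + 1L
    queue <- start
    while (length(queue) > 0) {
      cur <- queue[1]
      queue <- queue[-1]
      if (!is.na(group[cur])) next
      group[cur] <- g
      # regions that priced at least one product also priced by `cur`
      nb <- unique(r[n %in% n[r == cur]])
      queue <- c(queue, nb[is.na(group[nb])])
    }
  }
  group
}

r <- c("a", "a", "b", "b", "c", "d", "d")
n <- c("1", "2", "2", "3", "3", "4", "5")
grp <- connected_groups(r, n)
grp                        # a, b and c form one group; d is on its own
length(unique(grp)) == 1   # FALSE: the price data are not connected
```

Region "c" shares no product with "a" but is linked to it through the bridging region "b", which is the indirect connection mentioned above.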
Function `is.connected()` returns a single logical indicating the connectedness, while `connect()` returns a logical vector of the same length as the input vectors. `neighbors()` gives a list or vector of connected regions. `sparsity()` returns a single numeric showing the sparsity of the price matrix. `comparisons()` outputs a data.table with the following variables:
group_id | group identifier |
group_members | regions belonging to that group |
group_size | number of regions belonging to that group |
total | number of (non-redundant) regional pairs that could be computed, following the formula $R(R-1)/2$, where $R$ is the group size |
direct | number of regional pairs that trace back to direct connections, e.g. when two regions have priced the same product |
indirect | number of regional pairs that trace back to indirect connections, e.g. when two regions are connected via a third bridging region |
n_obs | number of observations containing interregional information |
Sebastian Weinand
World Bank (2013). Measuring the Real Size of the World Economy: The Framework, Methodology, and Results of the International Comparison Program. Washington, D.C.: World Bank.
    library(pricelevels)

    ### connected price data:
    set.seed(123)
    dt1 <- rdata(R=4, B=1, N=3)

    dt1[, sparsity(r = region, n = product)]
    dt1[, is.connected(r = region, n = product)] # true
    dt1[, neighbors(r = region, n = product, simplify = TRUE)]
    dt1[, comparisons(r = region, n = product)]

    ### non-connected price data:
    dt2 <- data.table::data.table(
      "region" = c("a","a","h","b","a","a","c","c","d","e","e","f",NA),
      "product" = c(1,1,"bla",1,2,3,3,4,4,5,6,6,7),
      "price" = runif(13,5,6),
      stringsAsFactors = TRUE)

    dt2[, is.connected(r = region, n = product)] # false
    with(dt2, neighbors(r=region, n=product))
    dt2[, comparisons(region, product)]
    # note that the region-product-combination [NA,7] is dropped
    # from the output, while [a,2] and [e,5] are not included in
    # the calculation of 'n_obs' as both are not useable in terms
    # of regional price comparisons. also sparsity() takes this
    # into account, if wanted:
    dt2[, sparsity(region, product, useable=TRUE)]
    dt2[, sparsity(region, product)]

    # connect the price data:
    dt2[connect(r=region, n=product),]
Calculation of multiple spatial price indices at once.
# list all available price indices:
list.indices()

# compute all price indices:
pricelevels(p, r, n, q=NULL, w=NULL, base=NULL, settings=list())
p | A numeric vector of prices. |
r, n | A character vector or factor of regional entities (r) and products (n), respectively. |
q, w | A numeric vector of non-negative quantities (q) or expenditure share weights (w). |
base | A character specifying the base region to which all price levels are expressed. If |
settings | A list of control settings to be used. The following settings are supported: |
Before calculations start, missing values are excluded and duplicated observations for r and n are aggregated, that is, duplicated prices p and weights w are averaged and duplicated quantities q are added up.

The weights w must represent expenditure shares defined as $w_i^r = p_i^r q_i^r / \sum_{j=1}^{N} p_j^r q_j^r$. They are internally (re-)normalized such that they add up to 1 for each region r.
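The aggregation of duplicates and the normalization of expenditure shares described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the toy data frame d is hypothetical, and a simple unweighted mean is assumed for averaging duplicated prices.

```r
# Toy data with a duplicated observation for region "a" and product "1".
d <- data.frame(
  r = c("a", "a", "a", "b", "b"),
  n = c("1", "1", "2", "1", "2"),
  p = c(2, 4, 5, 3, 6),
  q = c(10, 20, 4, 8, 2)
)

# Average duplicated prices (assumption: unweighted mean) and
# add up duplicated quantities per region-product combination.
agg <- merge(
  aggregate(p ~ r + n, data = d, FUN = mean),
  aggregate(q ~ r + n, data = d, FUN = sum),
  by = c("r", "n")
)

# Expenditure shares: expenditure p*q divided by total regional expenditure,
# so that the shares add up to 1 within each region.
agg$w <- with(agg, p * q / ave(p * q, r, FUN = sum))
```

After this step, the duplicated entry for region "a" and product "1" is collapsed into a single row with price 3 and quantity 30, and the column w sums to 1 in both regions.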
A matrix of price levels where the rows contain the index methods and the columns the regions.
Sebastian Weinand
# sample complete price data:
set.seed(123)
dt1 <- rdata(R=3, B=1, N=5)

# compute unweighted indices:
dt1[, pricelevels(p=price, r=region, n=product, base="1")]

# compute all indices relying on quantities:
dt1[, pricelevels(p=price, r=region, n=product, q=quantity, base="1")]

# add price data:
dt2 <- rdata(R=4, B=1, N=4)
dt2[, "region":=factor(region, labels=4:7)]
dt2[, "product":=factor(product, labels=6:9)]
dt <- rbind(dt1, dt2)
dt[, is.connected(r=region, n=product)] # non-connected now

# compute all unweighted indices:
dt[, pricelevels(p=price, r=region, n=product, base="1")]

# change base region:
dt[, pricelevels(p=price, r=region, n=product, base="4")]
Calculation of regional price ratios per product with flexible setting of base prices.
ratios(p, r, n, base=NULL, static=FALSE, settings=list())
p | A numeric vector of prices. |
r, n | A character vector or factor of regional entities (r) and products (n), respectively. |
base | A character specifying the base region to be used for the calculation of price ratios, i.e., |
static | A logical indicating whether the |
settings | A list of control settings to be used. The following settings are supported: |
If base is not available for a specific product and static=FALSE, another base region is used instead. This is particularly important in cases of missing prices. Otherwise, for static=TRUE, computation is not possible and gives NA.

If there are duplicated observations, only one of these duplicates is used as the base price. For example, if two prices are available for a product in the base region, ratios() divides both prices by the first one.

A numeric vector of the same length as p. If base has been adjusted for some products, the attribute attr("base") is added to the output, providing the respective base region.
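The behaviour of a static versus a flexible base can be mimicked with a short base-R sketch. The function price_ratios() is hypothetical and not the package's ratios(). In particular, it assumes that the fallback base is simply the first observed region of a product, which the documentation does not specify.

```r
# Hypothetical re-implementation for illustration only (not pricelevels::ratios()).
price_ratios <- function(p, r, n, base, static = FALSE) {
  out <- rep(NA_real_, length(p))
  for (prod in unique(n)) {
    idx <- which(n == prod)          # observations of this product
    b <- idx[r[idx] == base]         # observations in the base region
    # Assumption: if the base region is missing and the base is not static,
    # fall back to the first observed region for this product.
    if (length(b) == 0 && !static) b <- idx
    # With duplicates in the base region, only the first price is used.
    if (length(b) > 0) out[idx] <- p[idx] / p[b[1]]
  }
  out
}

p <- c(2, 4, 3, 6)
r <- c("a", "b", "a", "c")
n <- c(1, 1, 2, 2)

# Product 2 has no price in region "b", so another base region is used:
res <- price_ratios(p, r, n, base = "b")

# With a static base, the ratios of product 2 cannot be computed:
res_static <- price_ratios(p, r, n, base = "b", static = TRUE)
```

Here res contains the ratios 0.5, 1, 1, 2, while res_static is NA for both observations of product 2.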
Sebastian Weinand
### (1) unique price observations; no missings
set.seed(123)
dt1 <- rdata(R=3, B=1, N=4)
levels(dt1$region) <- c("a","b","c")

# calculate price ratios by product:
dt1[, ratios(p=price, r=region, n=product, base="b")]

### (2) unique price observations; missings

# drop two observations:
dt2 <- dt1[-c(5,10), ]

# now, region 'a' is base for product 2:
(pr <- dt2[, ratios(p=price, r=region, n=product, base="b")])

# base region prices are stored in attributes:
attr(pr, "base")

# with static base, NAs are produced:
dt2[, ratios(p=price, r=region, n=product, base="b", static=TRUE)]

### (3) treatment of duplicates and missing prices (not NAs):

# insert duplicates and missings:
dt3 <- rbind(dt1[2,], dt1[-c(1,10),])
dt3[1, "price" := dt1[2,price]+abs(rnorm(1))]
anyDuplicated(dt3, by=c("region","product"))

# duplicated prices are divided by the first base price:
dt3[, ratios(p=price, r=region, n=product, base="b")]
Simulate random price and quantity data for a specified number of regions (R), product groups (B), and individual products (N) using function rdata().

The sampling of prices relies on the NLCPD model (see nlcpd()), while expenditure weights for product groups are sampled using function rweights(). Purchased quantities are assigned to individual products. Moreover, random sales and gaps (using function rgaps()) can be introduced in the sampled data.
rgaps(r, n, amount=0, prob=NULL, pairs=FALSE, exclude=NULL)

rweights(r, b, type=~1)

rdata(R, B, N, gaps=0, weights=~b+r, sales=0, settings=list())
r, n, b | A character vector or factor of regional entities (r), individual products (n), and product groups (b), respectively. |
R, B, N | A single integer specifying the number of regions |
weights, type | A formula specifying the sampling of expenditure weights for product groups. If |
gaps, sales, amount | Proportion of gaps and sales (between 0 and 1), respectively, to be introduced in the data. |
prob | A vector of probability weights, see also |
pairs | A logical indicating if gaps should be introduced such that there are always at least two observations per product available ( |
exclude | Data.frame of two (character) variables |
settings | A list of control settings to be used. The following settings are supported: |
Function rgaps() ensures that gaps do not lead to non-connected price data (see is.connected()). Therefore, the amount of gaps specified in rgaps() may be met only approximately, in particular when certain regions and/or products are additionally excluded from exhibiting gaps via exclude.
If rgaps(pairs=FALSE), the minimum number of observations for a connected data set is $R+N-1$. Otherwise, for rgaps(pairs=TRUE), this number is defined by $2N+\max(0, R-N-1)$.
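The minimum number of observations can be reasoned about by viewing the price data as a graph in which regions and products are nodes and each observation is an edge. The following sketch is derived from that argument and is not taken from the package source; the function min_obs() is hypothetical. A connected graph with R + N nodes needs at least R + N - 1 edges. If every product must additionally be observed at least twice, 2N observations chain together at most N + 1 regions, and each further region requires one more observation.

```r
# Hypothetical helper (not part of pricelevels): lower bound on the number of
# observations needed for connected price data with R regions and N products.
min_obs <- function(R, N, pairs = FALSE) {
  if (pairs) {
    # Each product is priced at least twice; regions beyond the N + 1 that
    # can be chained by the N products need one extra observation each.
    2 * N + max(0, R - N - 1)
  } else {
    # Spanning tree of a graph with R + N nodes.
    R + N - 1
  }
}

min_obs(R = 3, N = 5)                # 7
min_obs(R = 3, N = 5, pairs = TRUE)  # 10
min_obs(R = 10, N = 3, pairs = TRUE) # 12
```

With many regions and few products, both bounds coincide, because the regions rather than the pairing requirement determine the number of observations needed.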
Note that setting sales>0 in function rdata() distorts the initial price generating process. Consequently, parameter estimates may deviate more strongly from their true values. Note also that the sampled expenditure weights weight represent the relevance of product groups as (often) derived from national accounts and other data sources. Therefore, they cannot be derived from the sampled prices and quantities in the data, which would represent the expenditure shares of available products.

Function rgaps() returns a logical vector of the same length as r where TRUEs indicate gaps and FALSEs no gaps.

Function rweights() returns a numeric vector of (non-negative) expenditure share weights of the same length as r.
Function rdata() returns a data.table with the following variables:
group | product group identifier (factor) |
weight | expenditure weight of product groups (numeric) |
region | region identifier (factor) |
product | product identifier (factor) |
sale | whether prices and quantities are affected by sales (logical) |
price | sampled price (numeric) |
quantity | consumed quantity (numeric) |
share | expenditure share weights (numeric) |
or a list with the sampled data and its underlying parameter values, if settings=list(par.add=TRUE).
Sebastian Weinand
# sample price data for ten regions and five product groups
# containing three individual products each:
set.seed(1)
dt <- rdata(R=10, B=5, N=3)
boxplot(price~paste(group, product, sep=":"), data=dt)

# sample price data for ten regions and five product groups
# containing one to five individual products:
set.seed(1)
dt <- rdata(R=10, B=5, N=c(1,2,3,4,5))
boxplot(price~paste(group, product, sep=":"), data=dt)

# sample price data for three product groups (with one product each) in four regions:
dt <- rdata(R=4, B=3, N=1)

# add expenditure share weights:
dt[, "w1" := rweights(r=region, b=group, type=~1)] # constant
dt[, "w2" := rweights(r=region, b=group, type=~b)] # product-specific
dt[, "w3" := rweights(r=region, b=group, type=~b+r)] # product-region-specific

# weights add up to 1:
dt[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="region"]

# introduce 25% random gaps:
dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.25), ]

# weights no longer add up to 1 in each region:
dt.gaps[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="region"]

# approx. 25% random gaps, but keep observation for product "n2"
# in region "r1" and all observations in region "r2":
no_gaps <- data.frame(r=c("r1","r2"), n=c("n2",NA))

# apply to data:
dt[!rgaps(r=region, n=product, amount=0.25, exclude=no_gaps), ]

# or, directly, in one step:
dt <- rdata(R=4, B=3, N=1, gaps=0.25, settings=list("gaps.exclude"=no_gaps))

# introduce systematic gaps:
dt <- rdata(R=15, B=1, N=10)
dt[, "prob" := data.table::rleidv(product)] # probability for gaps increases per product
dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.25, prob=prob), ]
plot(table(dt.gaps$product), type="l")
# sample price data for ten regions and five product groups # containing three individual products each: set.seed(1) dt <- rdata(R=10, B=5, N=3) boxplot(price~paste(group, product, sep=":"), data=dt) # sample price data for ten regions and five product groups # containing one to five individual products: set.seed(1) dt <- rdata(R=10, B=5, N=c(1,2,3,4,5)) boxplot(price~paste(group, product, sep=":"), data=dt) # sample price data for three product groups (with one product each) in four regions: dt <- rdata(R=4, B=3, N=1) # add expenditure share weights: dt[, "w1" := rweights(r=region, b=group, type=~1)] # constant dt[, "w2" := rweights(r=region, b=group, type=~b)] # product-specific dt[, "w3" := rweights(r=region, b=group, type=~b+r)] # product-region-specific # weights add up to 1: dt[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="region"] # introduce 25% random gaps: dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.25), ] # weights no longer add up to 1 in each region: dt.gaps[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="region"] # approx. 25% random gaps, but keep observation for product "n2" # in region "r1" and all observations in region "r2": no_gaps <- data.frame(r=c("r1","r2"), n=c("n2",NA)) # apply to data: dt[!rgaps(r=region, n=product, amount=0.25, exclude=no_gaps), ] # or, directly, in one step: dt <- rdata(R=4, B=3, N=1, gaps=0.25, settings=list("gaps.exclude"=no_gaps)) # introduce systematic gaps: dt <- rdata(R=15, B=1, N=10) dt[, "prob" := data.table::rleidv(product)] # probability for gaps increases per product dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.25, prob=prob), ] plot(table(dt.gaps$product), type="l")