
A unified framework for calculating aggregate commodity prices from a census dataset

Abstract

Economic data collection from commodities producers in the United States typically consists of revenues and quantities. While the data collected in some sectors, such as fisheries, are a census of the population, features of the population, such as prices, must be calculated. Unit values are widely used as a price measure to impose a single price in place of dispersed ratios of revenue to quantity from individual producers, but alternatives exist. In this paper, different linear aggregation procedures are used to calculate price measures, such as ratio-based calculations (e.g., ratio-of-means, mean-of-ratios) or estimation by ordinary least squares. There are non-trivial differences in the calculated prices depending on the procedure. This paper proposes a unified framework, including Bayesian estimation, for considering the tradeoffs inherent in the different methods commonly employed.

1. Introduction

Price dispersion, which represents deviations from the Law of One Price, is common in transaction data (see Kaplan and Menzio [1]). With variability in the ratio of revenues to quantities among producers, or sellers, what is the appropriate ‘price’ of a commodity such as frozen king crab? The answer to this specific question is important to fishermen because fishery managers collect fees in the Bering Sea based on a standard price, which is a unit value. In this case, data on production and revenues are collected from every commercial fisher and processor that operates in Bering Sea fisheries, and standard prices represent a census of commercial operators in these fisheries. This paper develops alternatives to standard prices to examine effects of price dispersion.

Given cross-sectional data with quantity units and dollar values for any commodity, a first objective is usually to determine a price. But how should this price be calculated? This problem is endemic to low frequency data collection over time or space that includes many transactions with dispersed prices, because collection efforts typically focus on aggregates that can be summed, two common examples being production quantities and revenue.¹ For example, import and export data from the Census Bureau report only value and volume, not prices of individual transactions. In some cases, collecting prices over an interval of time may be burdensome because they occur at a transaction level and would be distributed at irregular times over the interval in which the data are collected. However, medium and high frequency data face similar aggregation problems (e.g., Triplett [2]), and the problem of price dispersion applies to the basic level of aggregation used to construct price indices. Baye [3] formally incorporated price dispersion into price index theory, and his numerical results show the use of mean prices in an index leads to bias.

In general, index number methods break down at some level of disaggregation. Diewert [4] recommended unit value as a suitable price aggregate if data on quantities are available. The practice of using unit values in place of observed prices is widespread (e.g., Reinsdorf [5]; Nevo [6]), but encounters problems if conditions for homogeneity are not met (e.g., Balk [7]).² Deaton [8] analyzed the bias that arises from using unit values to proxy prices in a demand system. Silver and Webb [9] found evidence of bias in the unit value index when using scanner data to compare effects of different aggregation levels on a consumer price index. According to Bradley [10, p. 41], “Even though it is often used throughout the economics profession and statistical agencies, little research has been done on the use of the unit value as an aggregate price measure or the use of the unit value index as a lower level subgroup price index.” Moreover, Bradley reports on studies of scanner data that use a unit value as a “plugged-in” price measure to impose a single price for an item on an area-month basis, and provides evidence this practice induces specification error and bias.

Mills [11] began his chapter on price dispersion with a description of price relatives varying around a mean value. His description can easily be extended to price levels, with dispersion around a mean price. Hence this paper starts from a related premise where the price level can be viewed as a statistical parameter, or an equivalent price aggregate, such as unit value. Viewed either as a statistical parameter or an astatistical variable, the estimate, or aggregate, gives a scalar that relates quantities to dollar values for the entire population.

This paper follows the example of Balk [7] in questioning when a set of economic transactions is sufficiently homogeneous to warrant the use of unit values. We have production data with revenues and accurate quantities collected annually, i.e., longer than a Hicksian week (Hicks [12, p. 122]), and considerable variability exists in annual revenue to quantity ratios among producers. The methods presented in this paper apply to this type of price dispersion by treating unit values as a type of average price within a family of mean price estimators, with an associated price distribution. We explore this family of estimators, and broaden it to include Bayesian estimation with a prior based on unit value. The application in the paper is to an interesting commodity, frozen Alaskan king crab, but methods in the paper could apply to any commodity for which data on quantity and value are available.

Unit value is the standard method of calculating an aggregate price over an interval of time; in technical terms, it is a linear aggregation procedure based on a quantity-weighted average of per-unit value (i.e., price) for each production unit. This method intentionally integrates out information about the distribution of values among production units (e.g., vessels, processors) to calculate a single number, and the standard method is equivalent to using a ratio-of-means (RoM) (or total value divided by total quantity) to define an aggregate price, which is equal to the unit value. Sometimes, an alternative price is used that is an average per-unit value, which is a mean-of-ratios (MoR).³ If quantity and value data are available for all production units in a population, then both standard and alternative methods calculate ratios that can be interpreted as defensible aggregators for the population of prices. However, the RoM and MoR are not equal unless perfect symmetry prevails and production volume is equal for all units, which is unrealistic. Furthermore, in practice the difference between the two means can be substantial. When the dataset is a census of the population, what does the application of these ratio-based calculations (and others) imply about the relationship between aggregate value and quantity? These and other estimators can be compared within the context of a statistical framework to determine what parameters are being estimated by these different aggregation methods.

This process implies price is an unknown that is calculated by dividing value by quantity, or equivalently, value is the product of an unknown price and quantity.⁴ The form of this relationship in a statistical framework suggests value is the response variable, quantity is the explanatory variable, and price is the unknown parameter. The use of regression analysis to estimate prices is a natural outcome of a statistical framework for price estimation, and in this paper, we investigate regression estimates of price relative to the RoM or the MoR within the family of linear aggregation. Under linear regression, the coefficient estimate in the regression of value on quantity is a price estimate, which opens up new possibilities for alternative price estimates that control for exogenous factors, including seasonality. However, even a simple regression poses potential problems. For example, ordinary least squares (OLS) estimates are biased if an explanatory variable is correlated with the error term. In this paper, we assume quantity variables are exogenous explanatory variables. This assumption is potentially justifiable in resource extraction and agricultural settings where supply changes are environmentally driven. In the paper, we present some evidence in support of this assumption for our empirical application to frozen king crab, and outline a companion theory where revenues are the explanatory variable and causality runs from revenues to quantities.

In particular, the inverse demand approach of Barten and Bettendorf [14] is generally accepted for fish and other perishable commodities (e.g., Park et al. [15], Lee and Thunberg [16]). A clear justification often exists for an inverse demand specification that treats quantity as an independent variable in fish price-quantity relationships because the total catch is regulated. However, justification for an inverse demand specification can extend beyond management constraints. For perishable goods, Barten and Bettendorf [14, p. 1510] explain “supply is very inelastic in the short run and the producers are virtually price takers” and “price-taking producers and price-taking consumers are linked by traders who select a price which they expect clears the market” from which the conclusion is “causality goes from quantity to price.” For empirical support, Eales et al. [17] compared models of Japanese demand for fish and found the inverse demand specification “dominated” other demand specifications based on testing and forecasting performance. In addition, Hospital and Pan [18, Appendix A-7] describe conditions where price could be exogenous and quantity endogenous. They applied exogeneity tests to price-quantity relationships in this setting and found strong support for the inverse demand specification.

The statistical framework we propose applies to datasets that represent a cross-sectional census of the population and where the objective is to find an accurate measure relating value and quantity (as opposed to finding an optimal index for examining price changes over time). When the data are a census of the population, the data contain all features of the population distribution. Because there is no sampling, there is no underlying probabilistic mechanism for which population inference must be used.⁵ If we want average revenues for the population, then we can simply calculate the average. We can also calculate the variance of the population. However, the average in this context has no variance; it is simply the average. For price, an aggregate is less obvious because price is a rate variable that relates the flow from the quantity produced to revenues of a particular market transaction. The aggregation of the rate can depend on the importance or significance of the transaction (e.g., prices from high volume transactions may be more important in the aggregate price), which in turn relates to the aggregation methods used on quantities and revenue. Conceptually related is the property of factor reversal in economic indices, which is satisfied when the same index method is used to create price and quantity indices that capture changes in the ratio of values (see Coelli et al. [19]).

We can still conceptually impose a statistical model on census data by hypothesizing a superpopulation. Within a statistical framework, a RoM and a MoR can be compared if they are treated as two different estimators of an unknown superpopulation parameter (see Godambe and Thompson [20]). Within the superpopulation model, an OLS regression of values on quantities can also be considered as an estimate of the aggregate price. From a statistical standpoint, if the assumptions for OLS are satisfied then the coefficient on quantity is an unbiased estimate of the marginal change in revenues with respect to quantity; that is, the price. In addition to OLS, this paper proposes Bayesian linear regression, which recognizes that prices have a distribution and allows the researcher to incorporate prior beliefs about the distribution of prices.

There are critical distinctions between the aggregate price formulae discussed here and price indices, which have been the focus of much of the price index literature. The aggregate price formulae presented here (which relate value and quantity) are not price indices (which relate prices over time/space). The construction of an aggregate price can be thought of as analogous to the construction of an index. As with indices, there are numerous formulae available to choose from, even for a basic object such as quantity. However, an aggregate price and a price index differ in some fundamentally important ways. The most trivial difference is that an aggregate price retains the scale of dollars per unit quantity, while indices are unit-less by design. More importantly, indices are designed to compare the prices in one state relative to another (generally one time period to another), while an aggregate price summarizes prices in a single state (i.e., at a point in, or interval of, time) for the purpose of obtaining an objective measure relating aggregate quantity and value. Because of this, when the objective of analysis is price change (e.g., over time/space), price indices are the preferred metric. Moreover, price indices may rely on aggregate prices calculated using methods in this paper.

To demonstrate the potential differences associated with alternative aggregation procedures, we present three price estimators in a case study of the first-wholesale price of Alaska red king crab (Paralithodes camtschaticus). This case is interesting because the price of red king crab is historically variable, and the Alaska red king crab fishery is a commercially important industry to both Alaska and Washington state (Garber-Yonts and Lee [13]). The red king crab fishery is one of the largest crab fisheries in Alaska, comprising roughly 20% of the total economic value from crab harvest. The Bristol Bay red king crab (BBRKC) stock constitutes roughly 95% of the red king crab harvest and is the focus of the empirical analysis. In particular, the first-wholesale price of red king crab is estimated from Alaska Department of Fish and Game (ADFG) Commercial Operator’s Annual Reports (COAR), an annual census of commercial fish and shellfish processors in Alaska that records annual data on production and first-wholesale revenues.

Although the fishery is open from October to January, the BBRKC fishery primarily harvests in November of each year.⁶ The majority of sales occur in November in anticipation of the holiday season. King crab is more of a seasonal specialty good than a high volume commodity, such as whitefish fillets, which can contribute to price dispersion. The main product after processing is packaged sections (e.g., whole crab legs) of cooked and frozen king crab meat in the shell. The flavor and texture of king crab meat cooked and frozen in the shell is reported to deteriorate after about 6 months in cold storage (Dassow [21]).

Cooked and frozen king crab sections may be exported for finishing or finished domestically. Though the share varies annually, exports typically constitute roughly half of the production volume. The largest importer is Japan, followed by Korea, Canada, China, and Western Europe (Dalton and Lee [22]). Smaller but steady markets exist in tourist destinations including Mexico, the Caribbean, and Southeast Asia; in Indonesia, some additional processing occurs for export back to the U.S. The U.S. imports much more king crab than it consumes from domestic production. In global markets, the BBRKC fishery competes mainly with Russian red king crab fisheries in the Bering Sea and in the Barents Sea. Historically, Russia has been the largest global supplier of red king crab. Norway is also a global supplier.

The BBRKC fishery was closed in 1994 and 1995, but recovered before rationalization in 2005 (Kruse et al. [23]). In 2011–13, the BBRKC fishery produced an annual average of 5.9 million pounds of finished products, and over this period, was estimated to have generated average real first-wholesale revenues (i.e., revenues received by processors at the first sale of finished products) of about $78.5 million per year (Garber-Yonts and Lee [13, Table 1]). The BBRKC stock was included in the BSAI Crab Rationalization Program.⁷ The fishery for BBRKC is managed cooperatively by the NMFS, the North Pacific Fishery Management Council (NPFMC), and the State of Alaska. Management of BBRKC is based on a stock assessment model that is used to estimate an Overfishing Limit (OFL), the level of catch that corresponds to a proxy for the fishing mortality rate which achieves Maximum Sustainable Yield (MSY).⁸ The Acceptable Biological Catch is set below the OFL by a buffer that accounts for uncertainty and ensures the probability of exceeding the OFL is less than 50%; it is used as an upper bound for the total allowable catch (TAC), which in turn determines the market supply of BBRKC.

The outline of this paper is as follows: Section 2 describes a superpopulation model and estimation methods based on linear aggregation that relate production quantities and values. Section 3 presents a Bayesian estimation method and compares its results with those based on linear aggregation. Section 4 concludes with a discussion of results.

2. Linear aggregation and regression

In practice, ex-vessel and first-wholesale prices for fish are calculated with each cross-sectional year treated as a separate (and independent) sample. This paper adopts that practice, and to save notation, the subscript for time $t$ is suppressed in the equations presented in this section. For each $t$, we assume a finite population $i = 1, \ldots, N$. For each single observation, we take as given that price satisfies $v_i = p_i \times q_i$; hence, individual prices can be calculated as the ratio of value and quantity, $p_i = v_i / q_i$. This ratio gives rise to the ratio-based price aggregates, the RoM and the MoR. These are considered first in the case where the analyst is trying to obtain an aggregate measure of price, which is referred to as an astatistical framework. We then consider estimates of price under the assumption of a superpopulation, where the estimators are considered in a statistical framework along with the regression estimators.

2.1 Astatistical calculations of an aggregate price

Even in a framework that does not rely on a statistical foundation, a calculation must be constructed to meet some objective to be meaningful. By extension from the observation-by-observation case, a natural objective is to construct an aggregate scalar price $p^*$ such that the relationship between value, price, and quantity is conserved in aggregate:

(1)
$$f(\mathbf{v}) - p^* \times f(\mathbf{q}) = 0,$$

where $f$ is some aggregator function (e.g., summation), which should be monotonically increasing, and $\mathbf{v}$ and $\mathbf{q}$ are vectors of the $N$ value and quantity observations. Conservation of this relationship ensures that hypothetical changes in aggregate quantity are captured in aggregate value.

As an example, consider price calculated by using summation for the aggregator function $f$. Price is calculated as

$$\sum_{i=1}^{N} v_i - p^* \times \sum_{i=1}^{N} q_i = 0 \;\;\Longrightarrow\;\; p^* = \frac{\frac{1}{N}\sum_{i=1}^{N} v_i}{\frac{1}{N}\sum_{i=1}^{N} q_i}.$$

This calculation is the ratio of average value to average quantity, the RoM. The RoM can also be written as a quantity-weighted arithmetic average price,

(2)
$$p^* = \sum_{i=1}^{N} \frac{v_i}{q_i}\,\frac{q_i}{Q} = \sum_{i=1}^{N} p_i\,\frac{q_i}{Q},$$

where $Q = \sum_{i=1}^{N} q_i$. This measure can be interpreted as using a market-share-adjusted average of price to characterize the prevailing price in the market. Additionally, it ensures that “abnormal” prices that might be associated with the sale of small quantities of a commodity are not over-represented in the average.⁹ Furthermore, the weighted average is hierarchically consistent in the sense that if the observations in the dataset are actually an aggregate of multiple transactions, then the formula used to construct an aggregate price from the observations will be the same as the implicit formula over transactions calculated when creating observation-level prices.
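To make these identities concrete, the following minimal Python sketch (with hypothetical toy data, not the COAR data used later) verifies that the unit value, total value over total quantity, coincides with the quantity-weighted average of individual prices in Eq. (2), while the simple average of prices (the MoR) generally differs.

```python
import numpy as np

# Hypothetical toy data: values v (dollars) and quantities q (lb) for N = 3 producers.
v = np.array([9.0, 30.0, 44.0])
q = np.array([6.0, 15.0, 20.0])

unit_value = v.sum() / q.sum()               # RoM: total value / total quantity
weighted = np.sum((v / q) * (q / q.sum()))   # Eq. (2): quantity-weighted mean of prices
mean_of_ratios = np.mean(v / q)              # MoR: simple average of prices

assert np.isclose(unit_value, weighted)      # the two RoM forms are identical
print(unit_value, mean_of_ratios)            # about 2.02 vs. 1.90: RoM and MoR differ
```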

2.1.1 Generalized astatistical methods

The formulation proposed in Eq. (1) reduces the problem to choosing an aggregator function for prices and quantities. The weighted generalized mean, or power mean, $(\sum_{i=1}^{N} \omega_i x_i^{\rho})^{1/\rho}$, is a common function that can capture common measures of aggregate price. Reducing the problem to the selection of a parameter $\rho$ and a set of weights provides a unifying framework through which different methods can be compared. The parameter $\rho$ controls the degree to which a large observation can influence the mean. For larger values of $\rho$, observations which are larger in magnitude will have greater influence on the mean and will produce larger value and quantity aggregates. Selecting $\rho = 1$ results in the arithmetic mean, $\rho = 2$ the quadratic mean, and $\lim_{\rho \to \infty} (\sum_{i=1}^{N} x_i^{\rho})^{1/\rho} = \max(x_i)$. The geometric and harmonic means correspond to the limit $\rho \to 0$ and $\rho = -1$, respectively. The weights, $\omega_i$, represent our preference for, or weight attached to, observations when aggregating value and quantity. The summation is a mean if $\sum_i \omega_i = 1$.

Conservation of the value-price-quantity relationship (Eq. (1)) under the weighted generalized mean aggregator results in the implicit formula for price:

$$\left(\sum_{i=1}^{N} \omega_i v_i^{\rho}\right)^{1/\rho} - p^* \times \left(\sum_{i=1}^{N} \omega_i q_i^{\rho}\right)^{1/\rho} = 0.$$

The generalized aggregate price function is the ratio of the weighted generalized means of value and quantity,

(3)
$$p^*(\omega, \rho) = \frac{\left(\sum_{i=1}^{N} \omega_i v_i^{\rho}\right)^{1/\rho}}{\left(\sum_{i=1}^{N} \omega_i q_i^{\rho}\right)^{1/\rho}}.$$

Thus, treating all observed values and quantities equally, $\omega_i = 1/N$, and aggregating linearly, $\rho = 1$, yields the RoM calculation of aggregate price (Eq. (2)). Note that Eq. (3) is similar to a productivity index: the generalized pricing function is a ratio of (quasi-) distance functions which characterizes industries’ ability to transform inputs (production) into outputs (revenues). Giving preference to observations with smaller quantities by using $\omega_i = 1/q_i$ and aggregating linearly gives the common arithmetic average of prices, or MoR,¹⁰

(4)
$$p^*\left(\frac{1}{q_i}, 1\right) = \frac{1}{N}\sum_{i=1}^{N} \frac{v_i}{q_i} = \frac{1}{N}\sum_{i=1}^{N} p_i.$$

Alternatively, giving preference to observations with smaller values by using $\omega_i = 1/v_i$ and aggregating linearly gives the (unweighted) harmonic average of prices,

$$p^*\left(\frac{1}{v_i}, 1\right) = \frac{N}{\sum_{i=1}^{N} \frac{q_i}{v_i}} = \frac{N}{\sum_{i=1}^{N} \frac{1}{p_i}}.$$

Taking the unweighted geometric mean of value and quantity, $\omega_i = 1/N$ and $\rho = 0$, produces a geometric mean of prices,¹¹

$$p^*\left(\frac{1}{N}, 0\right) = \prod_{i=1}^{N} p_i^{1/N},$$

which is sometimes used in productivity analysis and in instances where the log-linear arithmetic average of prices might be preferred.

Arranging the generalized aggregate price function to isolate observation level prices shows that it can be generically cast as either a quantity-weighted generalized mean or a value-weighted generalized harmonic mean,

(5)
$$p^*(\omega, \rho) = \left(\frac{\sum_{i=1}^{N} p_i^{\rho}\,\omega_i q_i^{\rho}}{\sum_{i=1}^{N} \omega_i q_i^{\rho}}\right)^{1/\rho} = \left(\frac{\sum_{i=1}^{N} p_i^{-\rho}\,\omega_i v_i^{\rho}}{\sum_{i=1}^{N} \omega_i v_i^{\rho}}\right)^{-1/\rho}.$$

The quantity-weighted generalized mean representation is a more economically intuitive functional form than the harmonic mean. Furthermore, value is typically viewed as the result of an interaction between price and quantity. Because of this, if weights are constructed as functions of the data, then basing weights on quantity should be preferred to basing them on value. In practice, weights can be based on anything that can be reasonably justified to meet the objectives of the analysis for which the price is being constructed. However, for all but the most common calculations of an aggregate mean, the choice of weights should be made explicit.

The generalized aggregate price function and consideration of its parameters shows that the most objective calculation of the price is the ratio of means, $p^*(\frac{1}{N}, 1) = \sum_{i=1}^{N} v_i / \sum_{i=1}^{N} q_i$, where observations are weighted equally, $\omega_i = 1/N$, and where the influence of the magnitude of the variables is determined by their natural scale, $\rho = 1$.
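As a sketch of how Eq. (3) unifies these cases, the Python function below (our own illustration, with hypothetical data) implements the generalized aggregate price $p^*(\omega, \rho)$ and recovers the RoM, MoR, harmonic, and geometric means from the stated weights and values of $\rho$.

```python
import numpy as np

def p_star(v, q, w, rho):
    """Generalized aggregate price of Eq. (3): ratio of weighted power means.
    The geometric-mean case is handled as the limit rho -> 0; weights need not
    sum to one (see note 10)."""
    v, q, w = map(np.asarray, (v, q, w))
    if np.isclose(rho, 0.0):
        # limit rho -> 0: ratio of weighted geometric means of v and q
        return np.exp((w @ np.log(v) - w @ np.log(q)) / w.sum())
    return (w @ v**rho) ** (1 / rho) / (w @ q**rho) ** (1 / rho)

v = np.array([9.0, 30.0, 44.0])   # hypothetical values
q = np.array([6.0, 15.0, 20.0])   # hypothetical quantities
n = len(v)

assert np.isclose(p_star(v, q, np.ones(n) / n, 1), v.sum() / q.sum())  # RoM, Eq. (2)
assert np.isclose(p_star(v, q, 1 / q, 1), np.mean(v / q))              # MoR, Eq. (4)
assert np.isclose(p_star(v, q, 1 / v, 1), n / np.sum(q / v))           # harmonic mean
assert np.isclose(p_star(v, q, np.ones(n) / n, 0),
                  np.exp(np.mean(np.log(v / q))))                      # geometric mean
```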

2.2 Statistical estimation of an aggregate price

A statistical framework is used when there is some underlying probability mechanism producing the observed data. When estimating population parameters, sampling is often the probabilistic mechanism. There are many tools within statistics that allow researchers to ask interesting hypothetical questions about the population. When the data are a census of the population, statistics can still be employed by assuming that the observed population is a sample from some larger superpopulation. The statistical methods employed can depend upon the assumed source of random variation and the framework under which the problem is approached.

2.2.1 Direct estimation

If we take as given that $p_i = v_i/q_i$ is the actual observed price, then conceptually we might view price as random as opposed to constant. This would also seem natural from the perspective of the agents (e.g., processors) that negotiate the price within the market.

Consider the model for price where $p_i$ is an independent observation from the random variable $P \sim F(\gamma, \sigma)$, with a continuous distribution $F(\cdot)$ whose support is $\mathbb{R}^{+}$ and which has constant mean and variance, $E(P) = \gamma$ and $\mathrm{Var}(P) = \sigma^2$. Like any random variable, under most reasonable assumed distributions (e.g., lognormal), if the variance is finite and constant then the optimal unbiased estimator of $\gamma$ will be the sample mean, $\bar{p} = \frac{1}{N}\sum_{i=1}^{N} p_i$, which is the MoR. This is true for both quadratic loss and maximum-likelihood estimators of the mean. Under absolute loss, the median is the optimal estimator.

When the variance is heteroskedastic, a weighted average is the optimal (minimum variance and MLE) unbiased estimator of the mean of the price distribution. The weights are the inverse of the variances,

(6)
$$\bar{p}_w = \sum_{i}^{N} \omega_i p_i \quad \text{where} \quad \omega_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2}.$$

This may be justified in the case where smaller trades may be associated with a higher variance, as in a commodities market. One way of capturing this relationship is by hypothesizing that the variance is proportional to inverse powers of quantity, $\sigma_i^2(x) \propto q_i^{-x}$, where $x$ is chosen by the analyst to control the influence of quantity on the variance. When values of $x$ are large, larger quantities will have a greater influence (i.e., larger weight) than smaller quantities,

$$\omega_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2} = \frac{q_i^{x}}{\sum_j q_j^{x}}.$$

Thus, in the statistical case the choice of estimator can be represented as the analyst’s belief about the relationship between the variance and the quantity. When the variance is constant, $x = 0$, the unweighted sample mean is the best unbiased estimate. If the analyst believes that the variance is related to quantity by the simple inverse, $x = 1$, then the estimator of price is the simple quantity-weighted average of prices, which is the RoM,

(7)
$$\bar{p}_w = \sum_{i}^{N} \omega_i p_i = \frac{\sum_{i}^{N} p_i q_i}{\sum_j q_j} = \frac{\sum_i v_i}{\sum_j q_j}.$$

Other estimators can be devised as well through different choices of $x$. Another alternative is that the variance could be made to depend functionally on quantity (or value) in a different way to support some of the more esoteric estimators of the mean, like the harmonic mean. In general, though, there is little support for such alternative estimators.
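A brief sketch of this mapping from $x$ to familiar estimators, again with hypothetical data:

```python
import numpy as np

def power_weighted_price(v, q, x):
    """Weighted mean of prices with weights q_i^x / sum_j q_j^x, i.e., the
    inverse-variance weights implied by sigma_i^2 proportional to q_i^(-x)."""
    p = v / q
    w = q**x
    return np.sum(w * p) / w.sum()

v = np.array([9.0, 30.0, 44.0])
q = np.array([6.0, 15.0, 20.0])

assert np.isclose(power_weighted_price(v, q, 0), np.mean(v / q))      # x = 0: MoR
assert np.isclose(power_weighted_price(v, q, 1), v.sum() / q.sum())   # x = 1: RoM
```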

2.2.2 Estimation by regression

When the object of analysis is the prediction of value for hypothetical quantities, regression is a more appropriate statistical tool. The notion that the actual price paid in the market is random with a distribution is more consistent with a Bayesian framework (in contrast to classical regression, where quantity is related to value through a constant but unknown price),

$$v_i = p q_i + \varepsilon_i$$
$$P(v \mid q, p, \tau^2, \sigma^2)$$
$$\pi(p \mid \tau^2, \sigma^2),$$

where $P(\cdot)$ is the likelihood function for value, $\tau^2$ is the variance of the likelihood distribution, and $\pi(\cdot)$ is the prior distribution for price with variance $\sigma^2$. For simplicity we assume here that the error $\varepsilon_i$ is independent across production units, $i$, and that it is exogenous, $\varepsilon \perp q$. The assumption $\varepsilon \perp q$ is consistent with the microeconomic principle that production units are price-takers and that the quantity of production is fixed prior to determining price in the market. In many plausible scenarios the assumption that quantity is independent of shocks to value could be violated, in which case a suitable instrument for quantity is needed for identification (Kleibergen and Zivot [24]). The exogeneity assumption, explained in Section 3, is maintained in this paper to keep focus on the price calculation formulas. In the unified framework of this paper, a dual theory exists in which a regression coefficient on value $v_i$, call it $r$, is estimated by OLS; this extends to dual astatistical versions of the RoM, which is simply one over the original RoM for prices, and of the MoR with value weights, which is the unweighted harmonic average of prices.

When the normal distribution is used for the likelihood and prior distributions with a known variance, and the prior is diffuse or the sample size is sufficiently large, the estimate of the posterior mean of the price is the OLS estimator,¹²

$$\hat{p}_{\mathrm{OLS}} = E(p \mid v, q, \tau^2, \sigma^2) = \frac{\sum_i v_i q_i}{\sum_j q_j^2} = \frac{\sum_{i}^{N} p_i q_i^2}{\sum_j q_j^2}.$$

Interestingly, the OLS estimator itself is also a quantity-weighted average of price with quadratic weight on quantity. Equation (8) shows a connection between the direct statistical estimation of price in Section 2.2.1 and the regression estimator. A general representation of the statistical estimators is the power weighted average:

(8)
$$\hat{p} = \frac{\sum_{i}^{N} p_i q_i^{x}}{\sum_j q_j^{x}}.$$

By analogy to the weighted average, heteroskedasticity (e.g., $\sigma_i^2 \propto q_i^{-1}$) can also be accommodated in the regression model. In Bayesian analysis, a posterior simulator must be used if heteroskedasticity is introduced into the model. There is no analog in the classical regression context because price is modeled as constant (but unknown) and hence has no variance. However, in classical regression the price can be efficiently estimated in the presence of heteroskedasticity of the regression error, $\mathrm{Var}(\varepsilon_i) = \tau_i^2$, through weighted or generalized least squares.
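The correspondence between the regression estimate and the power-weighted family can be checked directly; a minimal sketch with hypothetical data, where OLS through the origin reproduces Eq. (8) with $x = 2$:

```python
import numpy as np

v = np.array([9.0, 30.0, 44.0])   # hypothetical values
q = np.array([6.0, 15.0, 20.0])   # hypothetical quantities

# OLS through the origin: regress value on quantity with no intercept.
p_ols = np.linalg.lstsq(q[:, None], v, rcond=None)[0][0]   # = sum(v*q) / sum(q^2)

# Eq. (8) with x = 2: quantity-weighted average of prices with quadratic weights.
p_x2 = np.sum((v / q) * q**2) / np.sum(q**2)

assert np.isclose(p_ols, p_x2)
```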

3. Empirical comparisons of aggregate price estimates

Each year the National Marine Fisheries Service (NMFS) is responsible for publishing standard prices to determine ex-vessel value for the assessment of cost-recovery fees in various catch share programs.¹³ For example, NMFS standard prices are used to assess fees for observer coverage in North Pacific groundfish fisheries (77 FR 70062), and these standard prices are based on the RoM using 3-year moving averages of volume and value (80 FR 77606). The use of 3-year moving averages is intended to dampen inter-annual variability in standard prices. As stated in the introduction, our empirical application uses data for red king crab from the BBRKC fishery. Although NMFS does not currently publish standard prices for Alaska crab stocks, the question posed at the beginning of the paper about which price estimate to use for the standard price is relevant and informed by comparing price estimates in this paper.
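As an illustration of the moving-average RoM construction described above, a minimal sketch with invented annual totals (the trailing-window convention here is our assumption):

```python
import numpy as np

# Invented fleet-level annual totals: first-wholesale value ($) and volume (lb).
value = np.array([88e6, 75e6, 68e6, 80e6, 72e6])
volume = np.array([5.9e6, 5.6e6, 5.2e6, 5.8e6, 5.5e6])

# Standard price for year t as a RoM over a trailing 3-year window:
# (3-year total value) / (3-year total volume), which equals the ratio of
# the 3-year averages of value and volume.
std_price = [value[t-2:t+1].sum() / volume[t-2:t+1].sum()
             for t in range(2, len(value))]
```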

First-wholesale data on aggregate annual production quantity and revenue for the BBRKC fishery were obtained from the Alaska Department of Fish and Game Commercial Operator’s Annual Reports (COAR). The COAR is collected annually from processors in Alaska. These annual data comprise sales from between 15 and 21 processors processing crab between 2010 and 2014. Using these data, we compare nine different price estimates in five cross-sections (across processors) of wholesale value and production data within the years 2010–14. As explained in the introduction, we assume quantity variables are exogenous explanatory variables, and follow Hospital and Pan [18] by applying exogeneity tests to price-quantity relationships.¹⁴

The first three estimates are the ratio estimators and OLS (Table 1). The differences among these price estimators are larger in some years than others. In 2011, the MoR and OLS differed by less than 1%, and slightly more than 10%, respectively, relative to the standard RoM. In addition, OLS differed by more than 6% in 2014. Relative differences are less than 5% for OLS in other years. The largest relative difference for MoR is 7% in 2012, with relative differences of less than 5% in other years.

Table 1

Comparison of price estimates by ratio-of-means (RoM), mean-of-ratios (MoR), and ordinary least squares (OLS), for Alaskan red king crab, 2010–2014 ($/lb)

Year    RoM      MoR      OLS
2010    13.58    13.53    13.27
2011    17.62    17.72    15.84
2012    14.88    13.84    15.51
2013    12.59    12.02    13.06
2014    11.59    11.14    12.34

The other six price estimators come from Bayesian estimation: three from the normal regression model and three from the lognormal model. The three estimates are the maximum likelihood, the maximum (mode) of the posterior pdf, and the posterior mean. The Bayesian price estimates treat the error variance as a nuisance parameter in order to concentrate the analysis on price. The maximum of the prior probability density function (pdf) was set to the RoM for all estimates. The first model is the linear model $v_i = p q_i + \varepsilon_i$ with a prior on prices, $g(p)$, which has a normal distribution $N(\alpha, \beta)$. The parameter $\alpha$ is set equal to the RoM for a dataset that consists of pairs $x = \{q_i, v_i\}_{i=1}^{n}$. In practice, we assume a value for $\beta$ that is concentrated on the RoM. The error process for each data point is independent and identically distributed with $\varepsilon_i \sim N(0, \sigma^2)$. Therefore, the likelihood function for the sampling distribution is

(9)
$$L(p, \sigma \mid x) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(v_i - p q_i)^2}.$$

Inserting the maximum likelihood estimate of the error variance $\sigma_{\mathrm{MLE}}$ using the formula $\hat{\sigma}^2(p \mid x) = \frac{1}{n}\sum_{i=1}^{n}(v_i - p q_i)^2$ produces a concentrated likelihood function with sampling distribution $\hat{L}(p \mid x) = f(x \mid p)$.

Extending the estimation method to a fuller treatment of uncertainty in the error variance can be simple depending on the nature of the data and assumptions. The posterior distribution is proportional to the concentrated likelihood times the prior pdf, $f(x \mid p)\,g(p)$, which may also be seen as a likelihood function, $h(p \mid x) = f(x \mid p)\,g(p)$. The mode of $h(p \mid x)$ is a Bayesian estimate from the posterior distribution that is directly comparable to the MLE at the mode of $\hat{L}(p \mid x)$. The mean of the posterior distribution, $E(p \mid x) = \int p\, h(p \mid x)\, dp$, is another Bayesian estimate. The second model uses individual ratios $p_i = v_i/q_i$, and is log-linear, $p_i = e^{\mu + \varepsilon_i}$ with $\varepsilon_i \sim N(0, \sigma^2)$. The likelihood function for the lognormal model is

(10)
$$L(\mu, \sigma \mid x) = \prod_{i=1}^{n} \frac{1}{p_i \sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(\ln(p_i) - \mu)^2}.$$

The error variance formula in the concentrated likelihood is $\hat{\sigma}^2(\mu \mid x) = \frac{1}{n}\sum_{i=1}^{n}(\ln(p_i) - \mu)^2$. The prior on prices in the second model is distributed $N(\alpha, \beta)$ with $\alpha$ set equal to the natural log of the RoM.¹⁵
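For the normal model, the mechanics can be sketched as follows (in Python rather than the authors' Mathematica; the data are toy values and the prior standard deviation $\beta$ is an assumed number): the concentrated likelihood is maximized for MAXLIKE, the unnormalized posterior $h(p \mid x)$ is maximized for MAXPOST, and EXPOST is obtained by numerical integration.

```python
import numpy as np
from scipy import optimize, integrate

# Hypothetical processor-level cross-section (not the confidential COAR data).
q = np.array([1.2e5, 3.4e5, 2.1e5, 5.6e5, 0.8e5])     # quantities (lb)
v = 13.5 * q + np.array([8e4, -5e4, 6e4, -9e4, 2e4])  # values ($), near $13.50/lb
n = len(q)

rom = v.sum() / q.sum()   # prior is centered on the RoM
beta = 0.5                # assumed prior std dev, "concentrated on the RoM"

def log_conc_lik(p):
    """Concentrated log-likelihood: sigma^2 replaced by its MLE at each p."""
    sigma2 = np.mean((v - p * q) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)

def log_post(p):
    """Log of h(p|x) = f(x|p) g(p), up to an additive constant."""
    return log_conc_lik(p) - 0.5 * ((p - rom) / beta) ** 2

maxlike = optimize.minimize_scalar(lambda p: -log_conc_lik(p),
                                   bounds=(0.5 * rom, 2.0 * rom), method="bounded").x
maxpost = optimize.minimize_scalar(lambda p: -log_post(p),
                                   bounds=(0.5 * rom, 2.0 * rom), method="bounded").x

# Posterior mean by numerical integration over the effective support of the prior;
# subtracting the log-posterior maximum avoids underflow in exp().
lo, hi = rom - 8 * beta, rom + 8 * beta
shift = log_post(maxpost)
z, _ = integrate.quad(lambda p: np.exp(log_post(p) - shift), lo, hi)
expost, _ = integrate.quad(lambda p: p * np.exp(log_post(p) - shift) / z, lo, hi)
```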

Table 2 presents point estimates from the regressions with normal and lognormal models. Attention is focused on the point estimates in Table 2 for direct comparison with the astatistical interpretation of unit value calculations for a census dataset in Table 1.

Table 2

Comparison of price estimates by normal and lognormal regression models based on the maximum prior (= RoM) (MAXPRI), maximum likelihood (MAXLIKE), maximum posterior (MAXPOST), and the posterior mean (EXPOST), for Alaskan red king crab, 2010–2014 ($/lb)

Year    MAXPRI    MAXLIKE    MAXPOST    EXPOST
Normal
2010    13.58     14.32      13.74      13.73
2011    17.62     17.53      17.60      17.60
2012    14.88     16.27      15.14      15.14
2013    12.59     13.73      12.89      12.88
2014    11.59     12.82      11.92      11.91
Lognormal
2010    13.58     13.15      13.23      13.24
2011    17.62     17.40      17.43      17.48
2012    14.88     13.60      13.76      13.78
2013    12.59     11.70      11.84      11.86
2014    11.59     10.78      10.95      10.97

Table 3

Summary statistics of the price estimates, for Alaskan red king crab 2010–2014 (prices in $/lb)

Year    Processor count    Min      Max      Average    Range    Standard deviation    Coefficient of variation
2010    15                 13.15    14.32    13.53      1.17     0.370                 0.027
2011    21                 15.84    17.72    17.36      1.88     0.578                 0.033
2012    17                 13.60    16.27    14.66      2.67     0.949                 0.065
2013    19                 11.70    13.73    12.51      2.03     0.693                 0.055
2014    17                 10.78    12.82    11.60      2.04     0.702                 0.060

Figure 1. Normalized histograms and estimated pdfs for normal and lognormal models of red king crab price estimates 2010–2014 (prior = dashed; likelihood = dotted; posterior = solid).

Figure 2. Comparison of price estimates by RoM, MoR, and OLS, with normal and lognormal regression model estimators MAXLIKE, MAXPOST, and EXPOST, for Alaskan red king crab 2010–2014.

Figure 1 summarizes results of the Bayesian estimation by showing probability-normalized histograms of the cross-section for each year, and presents uncertainty with plots of pdfs for the likelihoods, priors, and posteriors. Results of the Bayesian estimation, and other numerical results in the paper, such as the OLS regressions and endogeneity tests in Wooldridge [25, Ch.15], were computed in Mathematica 8 for Microsoft Windows (64-bit) running on an Intel core i7 CPU at 2.67 GHz and 6 GB of RAM. Mathematica’s NMinimize and NMaximize functions were used for least-squares and maximum likelihood operations, respectively, and NIntegrate was used to evaluate integrals that form the mean of the Bayesian posterior distributions.

In all cases, differences between the maximum posterior and posterior mean are relatively small and almost negligible for the normal regression model. The OLS and maximum likelihood estimates are not equal because of the concentrated likelihood. The maximum likelihood price estimates for the normal regression model are larger than the other estimates in most years. The biggest discrepancy in price estimates is OLS in 2011, which is substantially less than the other estimates, which are relatively close together. The low OLS price estimate in 2011 is due to the bimodal mass (evident in the histograms presented in Fig. 1) amplified by the quadratic weight on quantity in the OLS formula.

In general, differences between the maximum likelihood and maximum posterior estimates were larger for the normal regression model than the lognormal model. An interesting result is that the maximum posterior for the normal regression model is closest to the RoM, and the maximum posterior for the lognormal model is closest to the MoR. This relationship provides an interpretation for each (i.e., normal and lognormal) Bayesian estimate. For this particular dataset, it is never the case that the choice of method results in a change in sign for the year-over-year changes in price (Fig. 2). In theory, the choice of weights associated with different methods could produce this result. Tables 1 and 2 show that the range of estimates for price is quite large.

Summary statistics for the price estimates are presented in Table 3. For example in 2010, the difference between the smallest and largest price estimates was $1.17/lb (a 9% change), and in 2012, it was $2.67/lb (a 20% change). Dispersion across the nine estimated prices, measured by the coefficient of variation, was largest in 2012. The maximum likelihood methods tended to produce estimates of the extremes, with the normal tending to produce the largest price estimate and the log-normal the smallest. Excluding these, the range of price estimates in 2010 is $0.31/lb (a 2% change) and in 2012 is $1.67/lb (an 11% change) (Table 1).

4. Conclusion

The method used to calculate an aggregate price from quantity and value data of a cross section is not trivial. Different methods, each defensible based on context, can lead to substantially different estimates of the price. Aggregate price differences such as these can potentially have considerable impact on estimated welfare effects and/or price-induced incentives, and could affect cost recovery fees collected by NMFS for observer coverage. This paper has considered unifying frameworks for astatistical and statistical calculation of an aggregate price which make explicit the choices made when selecting a method. Selected methods for calculating price are compared empirically, displaying the marked differences in price that the different methods yield, which highlights the importance of thinking through the method used to calculate an aggregate price.

Within the astatistical framework (when the data are a census of the population considered) for calculating an aggregate price, a generalized unified pricing function is used to reduce the problem of selecting a method to that of selecting a set of weights on value and quantity and an aggregation parameter. The unified framework shows that the most objective method for calculating a price is the RoM (unit value), in the sense that quantity and value are weighted equally and aggregation is linear. The choice of weights and whether to use a ratio of means, mean of ratios, geometric mean, or some other alternative is ultimately a decision for the analyst and can be justified based on context. The unified framework in this paper facilitates comparison among standard and alternative price estimates, making these decisions explicit by deriving everything from first principles.

The statistical framework is appropriate when a probability distribution is a real (e.g., data are sampled) or imposed (e.g., hypotheses are being tested) feature of the data. A unified analysis of statistical methods for aggregating prices shows there are potentially important differences among the statistical averages and regression-based price estimates. The quadratic weight on quantity in the OLS price estimates is a potentially unappealing property because of the influence of outliers which can occur for non-informative reasons (e.g., database errors). Furthermore, OLS does not explicitly account for the distribution of individual prices within a dataset. The Bayesian estimation method explicitly models the distribution of prices and can therefore better represent uncertainty. Bayesian estimation delivers pdfs that summarize the individual prices, and these pdfs can be used to derive probabilistic interpretations for the utility or loss associated with different outcomes.

Fundamentally, the purpose of this paper is to recognize that the calculation of a seemingly elementary aggregate price is not a priori elementary, that different aggregation methods can produce substantially different results, and to motivate a moment’s pause before calculating an aggregate price. Furthermore, we seek to inform the questions that served as the impetus for some of the comments by Diewert [4] and Balk [7], which we summarize as, “What aggregation formula should we use to calculate an aggregate price?” While certain estimates of an aggregate price may be justified based on context, defensible methods can differ depending on whether the aggregate price sought is a summary statistic of the data, a parameter of a probability distribution, or the marginal effect on value from quantity. There will arise contexts where it is not clear which aggregation is appropriate, leaving the researcher considerable liberty to consider and defend the use of a given estimator. A purpose of this article has been to provide a framework that lays bare the tradeoffs made by different estimators to help inform their choice.

Notes

1 Low frequency refers to an interval of time or space over which multiple transactions take place at significantly different prices such that the Law of One Price is plausibly violated and as a consequence price dispersion exists.

2 Balk [7] showed that the unit value index fails the identity test unless both base period and reference period quantities are equal, and that it fails the dimensionality test and can be sensitive to the choice of units. Hence the unit value index is not a true price index.

3 For example, see Garber-Yonts and Lee [13, Table 4.5] where both ratio-of-means (RoM) and mean-of-ratios (MoR) statistics (including the standard deviation for the MoR as a simple measure of distributional variation) are presented.

4 Depending on the data collection, value and quantities may not represent a single transaction; they could represent multiple transactions within an interval of time with differing prices.

5 This also assumes that the data collection methodology itself does not introduce uncertainty (e.g., measurement error) or that it is negligible. Furthermore, uncertainty arising from the data collection methodology should generally be addressed explicitly and may be dealt with differently than uncertainty from sampling.

7 A voluntary cooperative IFQ program was implemented in 2005 to allocate BSAI crab resources among harvesters, processors, and coastal communities (www.npfmc.org/crabrationalization).

8 See Amendment 24 to the Fishery Management Plan for BSAI King and Tanner Crabs. Current guidelines from the National Marine Fisheries Service for National Standard 1 state “The Magnuson-Stevens Act establishes MSY as the basis for fishery management” (50 C.F.R. §600.310(b)(2)(i)). See also definitions of optimum yield, overfished, and overfishing, in Magnuson-Stevens Fishery Conservation and Management Act, 16 U.S.C. §1802.

9 As an extreme example, consider two companies one with 90% of the market share and goods priced at $0.50 and the other with 10% market share and goods priced at $1.00. The weighted average of prices is $0.55, and the simple average is $0.75.

10 Note that because the weights appear in both the numerator and denominator and the generalized mean is homogeneous of degree 1, normalization of the weights so that $\sum_i \omega_i = 1$ is unnecessary.

11 The geometric mean is obtained from Eq. (3) in the limit as $\rho \to 0$.

12 The variance can also be given a distribution and estimated like the mean. Under similar assumptions the same results hold. The current setup maintains the focus on the formula for the mean.

13 For example, see 50 CFR §679.45(c)(2)(i) for halibut and sablefish programs, 70 FR 10174 for the Bering Sea Aleutian Islands Crab Rationalization Program, and other programs (https://alaskafisheries.noaa.gov/fisheries/cost-recovery-fee-programs).

14 The endogeneity test in Wooldridge [25, Ch. 15], which is based on the t-statistic for an estimated coefficient on residuals from an auxiliary regression with lagged quantity as the instrument, was applied to the data used in this paper. The test was applied to the specification used in this paper, with quantity as the explanatory variable and value as the response, and also to the inverse specification that treats value as the explanatory variable and quantity as the response. Test results for 2010–2014 reject (at the 5% significance level) the null of exogeneity for quantity in 2011 and 2012. For the data used in this paper, the conclusion from these tests is that endogeneity in quantity variables is not a problem in most years. With these mixed empirical results in mind, microeconomic theory favors the inverse demand approach of Barten and Bettendorf [14], where producers and consumers are price takers and traders choose prices to clear the market, so that causality runs from quantity to price.

15 A lognormal prior $LN(\alpha, \beta)$ could be useful in some cases. In this case, $\alpha$ is set equal to the natural log of the RoM plus an adjustment for the assumed value of $\beta$, because the maximum (i.e., mode) of the pdf occurs at $e^{\alpha - \beta^2}$. However, we encountered numerical problems trying to implement that approach.

Acknowledgments

We thank the editor and an anonymous referee for helpful comments. The findings and conclusions in the paper are those of the authors and do not necessarily represent the views of the National Marine Fisheries Service, NOAA.

References

[1] Kaplan, G. and Menzio, G. The morphology of price dispersion. International Economic Review. (2015); 56: 1-42.

[2] Triplett, J.E. Using scanner data in consumer price indexes: some neglected conceptual considerations. In: Feenstra, R.C. and Shapiro, M.D. (eds). Scanner Data and Price Indexes. University of Chicago Press; (2003).

[3] Baye, M.R. Price dispersion and functional price indices. Econometrica. (1985); 53: 217-224.

[4] Diewert, W.E. Axiomatic and economic approaches to elementary price indexes. NBER Working Paper No. 5104; (1995).

[5] Reinsdorf, M. Using scanner data to construct CPI basic components indexes. Journal of Business and Economic Statistics. (1999); 17: 152-160.

[6] Nevo, A. Measuring market power in the ready-to-eat cereal industry. Econometrica. (2001); 69: 307-342.

[7] Balk, B.M. On the use of unit value indices as consumer price subindices. In: Lane, W. (ed). Proceedings of the Fourth Meeting of the International Working Group on Price Indices. Washington, D.C.: U.S. Department of Labor; (1999).

[8] Deaton, A. Quality, quantity, and spatial variation of price. American Economic Review. (1988); 78: 418-430.

[9] Silver, M. and Webb, B. The measurement of inflation: aggregation at the basic level. Journal of Economic and Social Measurement. (2002); 28: 21-35.

[10] Bradley, R. Pitfalls of using unit values as a price measure or price index. Journal of Economic and Social Measurement. (2005); 30: 39-61.

[11] Mills, F.C. The Behavior of Prices. National Bureau of Economic Research (NBER); (1927).

[12] Hicks, J.R. Value and Capital, Second Edition. Oxford: Clarendon Press; (1946).

[13] Garber-Yonts, B. and Lee, J. Stock Assessment and Fishery Evaluation Report for King and Tanner Crab Fisheries of the Bering Sea and Aleutian Islands Regions: Economic Status of the BSAI Crab Fisheries 2015. North Pacific Fishery Management Council; January (2016).

[14] Barten, A.P. and Bettendorf, L.J. Price formation of fish: an application of an inverse demand system. European Economic Review. (1989); 33: 1509-1525.

[15] Park, H., Thurman, W.N. and Easley, J.E. Modeling inverse demands for fish: empirical welfare measurement in Gulf and South Atlantic fisheries. Marine Resource Economics. (2004); 19: 333-351.

[16] Lee, M. and Thunberg, E. An inverse demand system for New England groundfish: welfare analysis of the transition to catch share management. American Journal of Agricultural Economics. (2013); 95: 1178-1195.

[17] Eales, J., Durham, C. and Wessells, C.R. Generalized models of Japanese demand for fish. American Journal of Agricultural Economics. (1997); 79: 1153-1163.

[18] Hospital, J. and Pan, M. Demand for Hawaii bottomfish revisited: incorporating economics into total allowable catch management. U.S. Department of Commerce, NOAA Tech. Memo. NOAA-TM-NMFS-PIFSC-20; (2009).

[19] Coelli, T.J., Rao, D.S.P., O’Donnell, C.J. and Battese, G.E. An Introduction to Efficiency and Productivity Analysis. Springer Science & Business Media; (2005).

[20] Godambe, V.P. and Thompson, M.E. Parameters of superpopulation and survey population: their relationships and estimation. International Statistical Review. (1986); 54: 127-138.

[21] Dassow, J.A. Freezing and canning king crab. Fishery Leaflet 374. U.S. Fish and Wildlife Service; (1950).

[22] Dalton, M. and Lee, J. Alaska fisheries and global trade: red king crab, Paralithodes camtschaticus; sockeye salmon, Oncorhynchus nerka; and walleye pollock, Gadus chalcogrammus. Marine Fisheries Review. (2015); 77(3): 84-90.

[23] Kruse, G.H., Zheng, J. and Stram, D.L. Recovery of the Bristol Bay stock of red king crabs under a rebuilding plan. ICES Journal of Marine Science. (2010); 67: 1866-1874.

[24] Kleibergen, F. and Zivot, E. Bayesian and classical approaches to instrumental variable regression. Journal of Econometrics. (2003); 114: 29-72.

[25] Wooldridge, J.M. Introductory Econometrics: A Modern Approach. Mason, OH: Thomson/South-Western; (2006).