futarchy_discuss · Discussion about futarchy
How BTS might help measure GDP+
Message #43 of 65
A month ago we talked about how GDP+ might be measured (in "Measuring
Welfare"). I suggested that part of GDP+ could be the sum of
individual satisfaction reports. To reduce or eliminate insincere
reporting, I suggested using Drazen Prelec's "Bayesian Truth Serum"
mechanism (Prelec, Science 2004, vol 306, "A Bayesian Truth Serum for
subjective data"). Prelec promises that "Truthful answers maximize
expected score even for respondents who believe that their answer
represents a minority view."

Of course it's easy to wave my hands and say "Use Prelec's algorithm"
without giving details. In this message I'm going to try to pin down
the details of applying BTS to this.


An overview of the BTS algorithm:

There is a question that has K possible answers. Each respondent is
asked the question and replies with his answer (x) and a prediction of
how others will answer (y).

The answers are notated x[r][i], which means:

1 if respondent R has chosen answer I
0 otherwise

The predictions are notated y[r][i], which means the proportion of
respondents that respondent R says will choose answer I. To avoid
infinite logarithms, y[r][i] actually ranges between epsilon and 1,
not 0 and 1. It sums to 1 across any given respondent R.

Let <x[i]> be the average x[r][i] over respondents.

Let <y[i]> be the _geometric_ average of y[r][i] for a
prediction I over respondents.

Let u[r][i] := x[r][i] log(<x[i]>/ <y[i]>)
+ a <x[i]> log(y[r][i]/<x[i]>)

where "a" is a positive scalar, a free parameter.

Let u[r] be the sum of u[r][i] over all answers for a respondent R.
u[r] is called the BTS score of a respondent R.

Payoff:

A respondent R earns a reward proportional to u[r]. It can be
negative, and in fact if a = 1 the rewards are zero-sum over all
respondents.
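
The overview above can be sketched in Python roughly as follows. This is only an illustration of the formulas as written in this message, not Prelec's reference implementation. The function name bts_scores, the default epsilon, the clamping of predictions without renormalizing, and the convention that an answer nobody chose contributes zero are all my assumptions.

```python
import math

def bts_scores(x, y, a=1.0, eps=1e-4):
    """Return the list of BTS scores u[r].

    x[r][i] is 1 if respondent r chose answer i, else 0.
    y[r][i] is respondent r's predicted share for answer i.
    """
    n = len(x)       # number of respondents
    k = len(x[0])    # number of possible answers
    # Assumption: clamp predictions to [eps, 1] to avoid infinite logs.
    y = [[min(1.0, max(eps, p)) for p in row] for row in y]
    # <x[i]>: arithmetic mean of the answers.
    x_bar = [sum(x[r][i] for r in range(n)) / n for i in range(k)]
    # log of <y[i]>: the log of a geometric mean is the mean of the logs.
    log_y_bar = [sum(math.log(y[r][i]) for r in range(n)) / n for i in range(k)]
    scores = []
    for r in range(n):
        u = 0.0
        for i in range(k):
            if x_bar[i] == 0:
                continue  # assumption: an unchosen answer contributes zero
            info = x[r][i] * (math.log(x_bar[i]) - log_y_bar[i])
            pred = a * x_bar[i] * (math.log(y[r][i]) - math.log(x_bar[i]))
            u += info + pred
        scores.append(u)
    return scores

# Three respondents, three answers (made-up numbers).
x = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
y = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
scores = bts_scores(x, y, a=1.0)
```

With a = 1 the scores in this example sum to zero up to rounding, which is the zero-sum property mentioned above.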


Several questions immediately arise:

BTS wants a discrete set of answers. What is a reasonable such set
for our purpose?

How should we aggregate this measure suitably for use as a term in
GDP+?

If different answers are not equally informative, does it make sense
to weight their contribution?

Since respondents can earn negative "reward", how do we avoid
uncollectable losses, which would make some of the incentives
meaningless?

Can a respondent who's willing to accept a loss break the system? Ie,
cause problems out of proportion to his negative reward?

Taking these questions one by one:

* BTS wants a discrete set of answers. What is a reasonable such set
for our purpose?

I propose that it simply be a 5-point Likert scale (ie, "much better
off", "somewhat better off" etc).

It's tempting to try to preserve more information than that, perhaps
on a fine-grained numerical scale. I believe that would be a mistake,
especially if done naively. The gain in precision would be illusory.
For example, say a respondent said he was "53% better off" or
"slightly more than somewhat better off". He doesn't necessarily
mean the same thing as other respondents who also say they are "53%
better off". That doesn't help, and there may be a temptation to
treat the illusory extra precision as real.

There's plenty of room for more nuance in the BTS model, but I'm not
sure it makes sense to use it.

FWIW, I suspect BTS is easily extended to continuous scales. ISTM all
that would be required is data-smoothing so that <x[i]> and <y[i]>
don't behave oddly for values of I where the data is sparse. Prelec's
proofs treated the answer set as discrete, but seemed easy to
generalize to a continuous scale. (In fact, he wrote a key proof step
using integration rather than summation, which may have been an
accident but makes no real difference).

* How should we aggregate this measure suitably for use as a term in
GDP+?

Having proposed a Likert scale, that still leaves the question of how
to sum responses. A Likert scale is an ordinal Guttman scale, but we
need an interval Guttman scale.

Fortunately, some work has been done on converting a Likert scale to
an interval scale, using the polytomous Rasch model. I'm not very
familiar with it but I know it exists.

After converting to an interval scale, it's straightforward to compute
the mean and use that as a weighted term in GDP+.

If we use a 5-point scale, there usually won't be outliers. Even if
there are, eg if a few respondents are "very much worse off" while
everyone else is neutral or better, it's not clear that we should
discard them as outliers. So ISTM we wouldn't need or want a robust
estimator.

FWIW, in a follow-up paper, Prelec provides a method for finding a
single preferred answer, but it doesn't help us. (Prelec 2006, "An
algorithm that finds truth even if most people are wrong").
Basically, select the answer I that maximizes average u[r][i] over
respondents. This presupposes that there exists an answer I that is
correct for all respondents (though they may not all know it). For
our purposes, the right answer I varies over respondents, so this
condition isn't met.


* If different answers are not equally informative, does it make sense
to weight their contribution?

By "informative", I mean the log ratio of actual to predicted
frequencies:

log(<x[i]>/ <y[i]>)

I suspect the answer is yes, but I haven't convinced myself of it.
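
To make the quantity concrete, here is a small sketch that computes this log ratio per answer. The function name informativeness and the sample numbers are made up for illustration, and it assumes every prediction is already at least epsilon.

```python
import math

def informativeness(x, y):
    """Return log(<x[i]> / <y[i]>) for each answer i.

    <x[i]> is the arithmetic mean of x[r][i] over respondents;
    <y[i]> is the geometric mean of y[r][i] over respondents.
    """
    n = len(x)
    k = len(x[0])
    result = []
    for i in range(k):
        x_bar = sum(row[i] for row in x) / n
        log_y_bar = sum(math.log(row[i]) for row in y) / n
        if x_bar > 0:
            result.append(math.log(x_bar) - log_y_bar)
        else:
            result.append(float("-inf"))  # nobody chose this answer
    return result

# Four respondents split evenly, but all predicted an 80/20 split.
x = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [[0.8, 0.2]] * 4
info = informativeness(x, y)
```

Here the second answer is "surprisingly common" (actual share 0.5 against a predicted 0.2), so its log ratio is positive, while the first answer's is negative.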

* Since respondents can earn negative "reward", how do we avoid
uncollectable losses, which would make some of the incentives
meaningless?

First, it is possible for a respondent to accrue a negative reward by
predicting an extremely small y[r][i] where the actual frequency
<x[i]> is moderate or larger. This will give him a penalty roughly
proportional to log(y[r][i]). He can do this for K-1 of the K
possible answers, since the predictions must sum to 1.

This might happen with respondents who strongly believe everyone else
shares their view. This is not uncommon for certain topics in my
experience. It may be the case that our simple satisfaction questions
are not tempting for this situation. If so, good, but let's look at
the situation pessimistically.

Happily, the log operation keeps the situation from becoming
inordinately expensive for our respondent. If epsilon were 10^-20, he
gave his prediction its minimum value of epsilon, and he made the
mistake the maximum 4 times, and each of the 4 answers was actually
moderately popular, he would accrue a penalty of about 180 times the
unit payoff. That's the worst case.
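
A quick arithmetic check of that figure, as a sketch. The "about 180" matches 4 x -log(10^-20), which is roughly 184 when each low-balled answer is counted at full weight. In the scoring formula as written above, each prediction term is also multiplied by a <x[i]>, so under an assumed even split of answers (my assumption, as are the function name and a = 1) the penalty comes out smaller. That would make 180 a safe upper bound rather than a tight one.

```python
import math

def prediction_penalty(x_bar, y_r, a=1.0):
    """Prediction part of u[r]: a * sum_i <x[i]> * log(y[r][i] / <x[i]>)."""
    return a * sum(xb * math.log(p / xb) for xb, p in zip(x_bar, y_r) if xb > 0)

eps = 1e-20

# Rough count: four low-balled answers, each costing -log(eps).
unweighted = 4 * -math.log(eps)

# Same mistake scored with the <x[i]> weights, assuming all five
# answers are equally popular (hypothetical distribution).
x_bar = [0.2] * 5
y_r = [eps] * 4 + [1 - 4 * eps]
weighted = prediction_penalty(x_bar, y_r)
```

Under these assumptions unweighted is a little over 184 and weighted is a bit worse than -35, so the log keeps the damage bounded either way.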

I propose a two-part solution:

.* Limit losses. Either make epsilon fairly large, say 10^-4, or
disallow respondents from answering in such a way that they risk a
large penalty in aggregate. The first is simpler, the second lets us
have both more flexibility and more protection.

.* Arrange the payoffs so that money is always flowing towards the
respondent. This might be done along similar lines to tax
withholding.


* Can a respondent who's willing to accept a loss break the system?
Ie, cause problems out of proportion to his negative reward?

I don't want to say "no" for sure, because there's always the
possibility that a clever attacker may see things that I don't. I
will say, not in the obvious ways.

For instance, by reporting an insincerely low y for his preferred
answer, he might hope to boost the reward to others who choose it. If
we weight answers by informativeness, he might hope to boost its
effect by making it appear to be a surprisingly common answer. The
question is not whether he can do this at all (anyone can), but
whether he can achieve an effect that's larger than his losses.

I'll analyze this in the simpler loss-limit approach above, which
merely limits all y to values of epsilon or greater.

To make a single low-ball prediction, he pays a penalty of
approximately -log(epsilon). The lowest value of <y[i]>' that he can
achieve by this maneuver alone is:

<y[i]>' = exp((NR log<y[i]> + log(epsilon))/(NR+1))
= <y[i]>^(NR/(NR+1)) * epsilon^(1/(NR+1))

where NR is the number of other respondents.

The new informativeness of that answer is:

log(<x[i]>/ <y[i]>')
= log(<x[i]>) - log(<y[i]>')
= log(<x[i]>) - (NR/(NR+1))log(<y[i]>) - (1/(NR+1))log(epsilon)
= log(<x[i]>/<y[i]>) + (1/(NR+1))(log(<y[i]>) - log(epsilon))

(holding <x[i]> fixed). That is, he affects its weight in inverse
proportion to the population and in direct proportion to his losses.
No gain for him there.

(The "log(<y[i]>)" in the last term just expresses the dilution of the
influence of the original situation. This is always going to occur
when another respondent is added.)

The new payoff for others is:

u[r][i]'= x[r][i] log(<x[i]>/ <y[i]>') + a <x[i]> log(y[r][i]/<x[i]>)
= u[r][i] + (1/(NR+1)) x[r][i] (log(<y[i]>) - log(epsilon))

So he affects their payoff just if they chose answer I, in inverse
proportion to the population and in direct proportion to his losses.
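
As a sanity check on the size of this effect, the sketch below computes the change directly from the definition of the geometric mean, holding <x[i]> fixed (an assumption). The helper names and the numbers (99 other respondents who all predict 0.3, epsilon = 10^-4) are made up for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean, computed as exp of the mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def lowball_shift(others_y, eps):
    """Effect of one extra respondent predicting eps for answer i.

    Returns (<y[i]> before, <y[i]>' after, increase in log(<x[i]>/<y[i]>)),
    treating <x[i]> as unchanged.
    """
    before = geometric_mean(others_y)
    after = geometric_mean(others_y + [eps])
    return before, after, math.log(before / after)

others = [0.3] * 99   # NR = 99 other respondents
eps = 1e-4
before, after, shift = lowball_shift(others, eps)
```

With these numbers the shift is log(0.3/10^-4)/100, about 0.08, while the low-baller's own penalty scales with -log(epsilon), about 9.2. So each affected respondent sees roughly 1/(NR+1) of the manipulation.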

However, his action has an effect on each respondent. This acts as a
multiplier. It is conceivable that a group of low-ballers,
advertising their action in advance, could have an effect out of
proportion to their numbers. This is worrisome.

There are two effects that seem to help here:

.* If the responses are kept secret, the conspirators play an N-person
prisoner's dilemma. Presumably most of them betray. Issues like
vote-selling and ballot secrecy apply here.

.* If he lets others know beforehand what he's going to do, they not
only have an incentive to favor answer I, they have incentive to raise
their prediction of y[][i], which counterbalances the original
manipulation. Of course if he doesn't let anyone know he's going to
do it, he doesn't influence their choice, he just gives them an
unexpected payoff and affects nothing.

Comments? Any other questions that need to be asked about using BTS
for this?

Tom Breton (Tehom)





Fri May 23, 2008 2:56 am

tehom2000