vuids · Voice User Interface Designers
Message #2346 of 2526
RE: [vuids] Re: Recognition-Problem

P.S. When you say "but I see it as managing errors rather than reducing them"
 
 - I see it as the opposite.
An SLM's main aim is to drive the absolute error count as low as possible - i.e. to reduce errors.
If the best way to do that is to make one word/phrase not work at all, then the SLM may well end up doing that.
There are certainly many observable cases of, say, one strange word occurring just 1 or 2 times in 100,000 cases - and the SLM may not work AT ALL on that word.
 - Is that really so bad?
 
 
If you want to 'manage' errors - saying "this error is 3 times as bad as this other error", or whatever - then that is managing errors, and it uses information the SLM can't really know.
 - That's where your design and tweaking comes in...
 
 
But the SLM is all about trying to minimise the absolute error count. Anything more 'wise' than that is up to you.
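
A minimal sketch of the difference between the two views, assuming invented error counts and an invented cost table (the names `errors_a`, `errors_b`, `cost`, `absolute_errors` and `weighted_errors` are hypothetical, not from any recogniser toolkit). The absolute count is what frequency-based training pushes down; the cost-weighted count needs a judgment only the designer can supply.

```python
# Hypothetical error counts per 10,000 calls for two candidate weightings.
# All numbers are invented for illustration.
errors_a = {"Sales": 100, "Sal Less": 40}  # tuned for the lowest absolute count
errors_b = {"Sales": 150, "Sal Less": 10}  # protects the rare item

# Assumed business cost of one error on each item ("3 times as bad").
cost = {"Sales": 1.0, "Sal Less": 3.0}


def absolute_errors(errors):
    """Plain error count: every misrecognition counts the same."""
    return sum(errors.values())


def weighted_errors(errors, cost):
    """Error count scaled by a per-item cost that the designer supplies."""
    return sum(count * cost[item] for item, count in errors.items())


for name, errors in (("A", errors_a), ("B", errors_b)):
    print(name, absolute_errors(errors), weighted_errors(errors, cost))
```

With these made-up numbers, weighting A wins on absolute errors while weighting B wins once the 3x cost is applied, which is exactly the kind of call the SLM cannot make for you.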
 


From: vuids@yahoogroups.com [mailto:vuids@yahoogroups.com] On Behalf Of Peter Nann
Sent: Friday, 3 July 2009 9:51 AM
To: vuids@yahoogroups.com
Subject: RE: [vuids] Re: Recognition-Problem

Everything you say is true.
 
Like I said, you have to consider what you are trying to do, and it is critical to consider how important it is to you to be able to get the 'rare' ones right.
 
You might sacrifice 3 "New Yorks" to make 1 "New Yerk" work - That is your call.
You might sacrifice 10 "Sales" calls for 1 "Sal Less" call.
But you are probably pushing things if you sacrifice 100 "New Yorks" for 1 "New Yerk".
(Weighting 2 similar sounding things like this is basically an exercise in sacrificing one for the other)
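
One rough way to read those ratios: each trade implies how much more a rare request has to be worth than a common one before the sacrifice pays off. The sketch below only restates the 3:1, 10:1 and 100:1 examples; the function name `break_even_value` is hypothetical.

```python
def break_even_value(common_lost, rare_gained):
    """Return how many times more valuable a rare request must be than a
    common one before trading `common_lost` for `rare_gained` breaks even."""
    if rare_gained <= 0:
        raise ValueError("rare_gained must be positive")
    return common_lost / rare_gained


trades = [
    ("New York for New Yerk", 3, 1),
    ("Sales for Sal Less", 10, 1),
    ("New York for New Yerk", 100, 1),
]
for label, lost, gained in trades:
    ratio = break_even_value(lost, gained)
    print(f"{label}: rare request must be worth more than {ratio:g}x a common one")
```

Whether a CEO call really is worth ten Sales calls is the design decision; the arithmetic just makes the implied price explicit.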
 
And disambiguation strategies? - Sure, that affects what your performance targets should be in the steps prior to the disambiguation.
But we weren't talking about that. That could change everything.
 
 
Also like I said - Your examples are why I stated that starting with weights based on population is really only a very rough start.
The risk with doing this blindly is exactly the sort of thing you are worried about - Low runners might be completely clobbered and you might not know about it unless you test it properly.
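
A minimal sketch of what "testing it properly" could look like: break the results down per item rather than looking only at the overall figure. It assumes the test results are available as (spoken, recognized) pairs; the function names and the 0.5 floor are hypothetical choices, and the sample data is invented.

```python
from collections import defaultdict


def per_item_accuracy(results):
    """results: iterable of (spoken_item, recognized_item) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for spoken, recognized in results:
        totals[spoken] += 1
        if spoken == recognized:
            correct[spoken] += 1
    return {item: correct[item] / totals[item] for item in totals}


def clobbered_items(results, floor=0.5):
    """Items whose own accuracy falls below `floor`, however good the total looks."""
    accuracy = per_item_accuracy(results)
    return sorted(item for item, acc in accuracy.items() if acc < floor)


# Invented test data: 100 "New York" utterances and 5 "New Yerk" utterances.
results = (
    [("New York", "New York")] * 99
    + [("New York", "New Yerk")]
    + [("New Yerk", "New York")] * 4
    + [("New Yerk", "New Yerk")]
)
print(per_item_accuracy(results))
print(clobbered_items(results))
```

Overall accuracy on this invented set is about 95%, yet the per-item view shows the low runner being recognised only one time in five.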
 
 
But the harsh reality is:
Do you want to put out a system that gets 10% of requests wrong, or 5% of requests wrong?
 
Probably the latter, and that means sacrificing (performance on) the minority for the good of the majority.
 
 
And as usual, there is no single answer. You have to consider exactly what you are trying to do and design accordingly.
 
 

From: vuids@yahoogroups.com [mailto:vuids@yahoogroups.com] On Behalf Of Bruce Papazian
Sent: Friday, 3 July 2009 3:21 AM
To: vuids@yahoogroups.com
Subject: RE: [vuids] Re: Recognition-Problem

At 07:29 PM 7/1/2009, Peter Nann wrote:


>> Interesting paper.  I've always wondered if such SLMs truly improve accuracy
 
 - Yes they do. 

Hi Peter,

This may be true, and I'll have to take your word for it. I would certainly agree that these techniques can lead to a greater percentage of calls being processed, but I see it as managing errors rather than reducing them (which is what I would call improved accuracy), and I think this can be more or less appropriate, depending on the application.  I see the process as trading off the probability of recognizing some items correctly against others to increase automation rates, but the result may not be acceptable from an individual user's perspective if he/she can't get the app to work for things that are uncommon but still valuable.

Let me offer an example from a call director application perspective to make my point.  Suppose an app is set up to transfer calls to people and departments within a company.  All departments and employee names are in the grammar, and the prompt is something like "say the name of the person or department you want to reach."  I would say it is reasonable for the caller to expect that he/she should be able to reach all people and departments. 

Now suppose "Sales" is a department and the CEO's name is "Sal Less", and you want to make recognition work better, so you decide to weight the grammar using some real data. In the real data there are many more calls to Sales than to Sal Less, so through this process a new grammar is built where "Sales" is weighted more heavily than "Sal Less", and it is deployed.  Now Sales is recognized correctly more often, and the automation rate is up, but fewer calls get through to the CEO than used to.  I can see (and have seen) where some constituents would not see this change as an improvement, even though the automation rate has gone up.

Taken to the extreme, let's repeat the process a few more times.  As things that were infrequently requested to start with get requested less often because people learn that they don't work very well, they get weighted lower and lower, which, in the limit, effectively takes them out of the grammar.  You may now have a great automation rate, but you've changed the problem you are solving, and the job you are doing for the users, and the app is no longer meeting expectations.
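
That feedback loop can be sketched with a toy model. Everything in it is an assumption made for illustration: the rare item's weight is set to its observed request share on each retune, its accuracy is modelled as weight / (weight + confusability), and callers who are not recognized stop asking. The function name and all parameter values are hypothetical, and the share is not renormalised against the common item.

```python
def simulate_retuning(true_share=0.05, confusability=0.05, rounds=5):
    """Toy model of repeatedly retuning weights on observed request data.

    Returns the rare item's observed request share after each round.
    """
    share = true_share
    history = [share]
    for _ in range(rounds):
        weight = share  # retune: weight follows the observed request share
        accuracy = weight / (weight + confusability)  # assumed accuracy model
        share = share * accuracy  # callers who fail stop asking
        history.append(share)
    return history


for round_number, share in enumerate(simulate_retuning()):
    print(f"round {round_number}: rare item share = {share:.6f}")
```

Under these made-up assumptions the rare item's share falls toward zero within a few rounds, even though the underlying demand for it never changed.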

Sorry to belabor this point, but I think it is worth noting when these techniques are being considered.  A long time ago I was involved with a project where city names had to be recognized.  Performance data showed there were confusable pairs, like Austin and Boston, that we had to deal with.  Rather than tuning via a weighting strategy, we chose to disambiguate via a dialog change: we would ask for the state name whenever one of the confusable pairs was recognized, and we put the city with the state in the grammar so that it would recognize properly if people decided to say the city and state after hearing the disambiguation prompt.

The dialog went something like this:
>What city?
Boston
>Was that Boston Massachusetts, or Austin Texas?
Boston Massachusetts

This made things work better without biasing towards the more frequent request, and it worked for confusable pairs that were equivalent in frequency of request. 
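
A minimal sketch of that dialog logic, assuming the confusable pairs are kept in a simple table. Only the Boston/Austin pair and the prompt wording come from the example above; the names `CONFUSABLE_PAIRS`, `disambiguation_prompt` and `followup_grammar` are hypothetical and not tied to any particular platform.

```python
# Confusable (city, state) pairs found in performance data.
CONFUSABLE_PAIRS = [
    (("Boston", "Massachusetts"), ("Austin", "Texas")),
]


def find_pair(city):
    """Return the confusable pair containing `city`, or None."""
    for pair in CONFUSABLE_PAIRS:
        if any(city == name for name, _state in pair):
            return pair
    return None


def disambiguation_prompt(city):
    """Follow-up prompt to play when a confusable city is recognized."""
    pair = find_pair(city)
    if pair is None:
        return None  # not confusable: accept the result as-is
    (city1, state1), (city2, state2) = pair
    return f"Was that {city1} {state1}, or {city2} {state2}?"


def followup_grammar(city):
    """City-plus-state phrases to accept after the disambiguation prompt."""
    pair = find_pair(city)
    if pair is None:
        return []
    return [f"{name} {state}" for name, state in pair]


print(disambiguation_prompt("Boston"))
print(followup_grammar("Boston"))
```

Because the prompt is triggered by either member of the pair, neither city is favoured, whatever their relative request frequency.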

Interesting discussion.




>> or just bias the app to the more frequently requested items.
 
- Yes it certainly does bias, that is exactly how the higher accuracy is achieved.
 
>> If a test set is biased the same way as the models, then I can see why system results will look better,
 
- Yep. That's why real data is your friend, fake data can be your enemy. There's no point optimising a system toward fake data, but real data is another story...
 
>> but if the test set represents all possible items equally does it get better?
 
- That is a 'fake' test set. Performance would be guaranteed WORSE with the weighting, on such a fake test set. See the above point. Optimising on fake data is folly...
 
>>  And if you look at individual items, does the performance on the low frequency items go down using such methods?
 
- Yes it would go down. Possibly a lot for 'rare items' that are similar to 'hugely common' items.
That's why weighting based on things like population is a start, but you really want to then know how the whole solution performs _in_the_real_world_
 
Is it acceptable that some minor cities might be really hard to recognise?
 
If it's really important to you that the town of "New Yerk" with population 50 is recognised at all, then you had better carefully consider the weight of it and other similar sounding cities, and test with real data.
 
You might test it like this:
A) Get 100 people saying "New York"
B) Get 100 people saying "New Yerk"
 
 - Adjust the weightings until they are both, say, 95% right.
 If you did this, you _WOULD_BE_CRAZY_.
 
It is far, far, FAR more important that "New York" performs better.
So you might adjust weights such that "New York" was 99.5% right (if you are lucky), and "New Yerk" might be only 70% right,
 - But the OVERALL performance on REAL DATA, would almost certainly be better like this.
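
The arithmetic behind that claim can be sketched as follows. The per-item accuracies (95%, 99.5%, 70%) are the ones quoted above; the 999-to-1 request mix is an invented assumption, as are the names `overall_accuracy`, `real_mix` and `uniform_mix`.

```python
def overall_accuracy(per_item_accuracy, request_share):
    """Accuracy over a test set in which items occur in `request_share` proportions."""
    total = sum(request_share.values())
    hits = sum(per_item_accuracy[item] * share for item, share in request_share.items())
    return hits / total


balanced = {"New York": 0.95, "New Yerk": 0.95}   # both tuned to 95%
skewed = {"New York": 0.995, "New Yerk": 0.70}    # tuned toward the common item

real_mix = {"New York": 999, "New Yerk": 1}       # assumed real-world request mix
uniform_mix = {"New York": 1, "New Yerk": 1}      # 'fake' test set, equal counts

for label, mix in (("real mix", real_mix), ("uniform mix", uniform_mix)):
    print(
        f"{label}: balanced={overall_accuracy(balanced, mix):.4f} "
        f"skewed={overall_accuracy(skewed, mix):.4f}"
    )
```

On the assumed real mix the skewed weighting comes out at roughly 0.995 against 0.95 for the balanced one, while on the uniform set it drops to about 0.85 against 0.95. Same weights, opposite verdicts, depending only on which test set you believe.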
 
 
It's a numbers game. And it's a harsh reality.
But you can't really argue against the numbers...
 
 
 
Now if you were REALLY SMART, the grammars would not just be weighted on 'population', but preferably by some other factors...
For example, biased more toward cities near your current location, because fairly short trips (?) are probably much more likely than long trips... Depending on the app.
 - That's the sort of thing Google love to use to improve their apps...
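
A rough sketch of such a location-biased weight, assuming the distance from the caller to each city is already known. The exponential decay, the 200 km scale, the city names and all figures are invented for illustration; nothing here describes how any real product does it.

```python
import math


def location_biased_weight(population, distance_km, scale_km=200.0):
    """Population prior damped by distance from the caller (assumed decay shape)."""
    return population * math.exp(-distance_km / scale_km)


def normalise(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical cities: (population, distance from the caller in km).
cities = {
    "Bigville": (1_000_000, 2000.0),
    "Smalltown": (5_000, 10.0),
}

weights = normalise(
    {name: location_biased_weight(pop, dist) for name, (pop, dist) in cities.items()}
)
print(weights)
```

With these made-up numbers the nearby small town ends up weighted well above the distant big city, the reverse of a pure population weighting. Whether that is the right bias depends, as noted, on the app.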
 


From: vuids@yahoogroups.com [mailto:vuids@yahoogroups.com] On Behalf Of Bruce Papazian
Sent: Thursday, 2 July 2009 2:30 AM
To: vuids@yahoogroups.com
Subject: Re: [vuids] Re: Recognition-Problem

Interesting paper.  I've always wondered if such SLMs truly improve accuracy or just bias the app to the more frequently requested items.  If a test set is biased the same way as the models, then I can see why system results will look better, but if the test set represents all possible items equally does it get better?  And if you look at individual items, does the performance on the low frequency items go down using such methods?

At 01:55 PM 6/30/2009, you wrote:


Here is a pointer to a paper that describes how to weight the grammar by population:

http://phil.shinn.googlepages.com/DesigningLanguageModelsforVoicePorta.pdf
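
As a starting point only, and not taken from the linked paper, population-proportional weights could be sketched like this. The function name, the optional `floor` parameter and the population figures are all hypothetical.

```python
def population_weights(populations, floor=0.0):
    """Weights proportional to population, with an optional minimum share.

    `floor` is applied before the final renormalisation, so very small
    places keep at least some weight instead of vanishing.
    """
    total = sum(populations.values())
    raw = {city: max(pop / total, floor) for city, pop in populations.items()}
    norm = sum(raw.values())
    return {city: weight / norm for city, weight in raw.items()}


# Invented population figures for illustration.
populations = {"New York": 8_000_000, "New Yerk": 50}

print(population_weights(populations))
print(population_weights(populations, floor=0.01))
```

The floor is one simple guard against low runners being weighted into oblivion; how a given recogniser actually consumes such weights is platform-specific.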

--- In vuids@yahoogroups.com, "vuiwoz" <vuiwoz@...> wrote:
>
> Hi,
> I have an issue with recognition of cities (without state):
> There are about 50.000 cities to recognize (including synonyms) - and in my experience recognition should be at least about 70-75% without any fine-tuning (except the lexicon) - but so far it's far less. The lexical transcriptions we use are hand-crafted, so this shouldn't be the problem.
>
> Does anyone have experience with this and know which parameters can be set to improve recognition? (Maybe preprocessing - sample frequency, volume)
>
> Thanks in advance for any hints...
>
BPIdesign
6 Stonecutters Path
Harvard, MA 01451
brucepapazian@...
978-835-3124





Fri Jul 3, 2009 12:25 am

pnann

Hi, I have an issue with recognition of cities (without state): There are about 50.000 cities to recognize (including synonyms) - and according to my...
vuiwoz
Jun 22, 2009
4:32 pm

... I think it would help us to answer you if you could tell us what kinds of misrecognitions you are getting in your nBest list. Are they phonetically...
Sarah Wayland
aktbar
Jun 22, 2009
9:45 pm

What recognition engine? What platform? Is this telephony? Maybe not since you mention a possible choice of sample frequency... What context is the caller in?...
Peter Nann
pnann
Jun 23, 2009
12:20 am

Thanks for your ideas! I'm sorry, I forgot to mention the detailled settings: We use a Vocon-Recognizer, the platform is proprietary. So it's not telephony, as...
vuiwoz
Jun 23, 2009
11:33 am

Oh. Well, I have no clue what the Vocon recogniser is like... Non-telephony situations often have difficulties with variations of microphone type, microphone...
Peter Nann
pnann
Jun 23, 2009
12:45 pm

Hi, regarding weighting, totally agree with Peter below. And the number of inhabitants is a good enough guess to start with (even if the best source would be...
ariane_nabeth
Jun 23, 2009
8:39 pm

You could also weigh by number of addresses in your database if population is unavailable......
oakieoaktree
Jun 24, 2009
1:52 pm

One more thing on weights that you need to consider is the difference between real data and fake test data. If your 'test data' is 5 examples of every city (or...
Peter Nann
pnann
Jun 24, 2009
11:40 pm

Is it required that you do not collect the state that the city is in? This could cut down on the size of the city grammar and improve recognition accuracy. ...
Bruce Papazian
brucepapazian
Jun 25, 2009
2:15 pm

Here is a pointer to a paper that describes how to weight the grammar by population: http://phil.shinn.googlepages.com/DesigningLanguageModelsforVoicePorta.pdf...
philshinn
Jun 30, 2009
5:56 pm

Interesting paper. I've always wondered if such SLMs truly improve accuracy or just bias the app to the more frequently requested items. If a test set is...
Bruce Papazian
brucepapazian
Jul 1, 2009
4:30 pm

There are more factors than frequency and weighting, yes? Accuracy is one aspect up for trade-off among many. Latency is another very important consideration....
Phillip Hunter
phillipwhunter
Jul 1, 2009
4:40 pm

... accuracy - Yes they do. ... - Yes it certainly does bias, that is exactly how the higher accuracy is achieved. ... why system results will look better, -...
Peter Nann
pnann
Jul 1, 2009
11:30 pm

... Hi Peter, This may be true, I'll have to take your word for it, but I would certainly agree that these techniques can lead to a greater percentage of calls...
Bruce Papazian
brucepapazian
Jul 2, 2009
5:21 pm

Everything you say is true. Like I said, you have to consider what you are trying to do, and it is critical to consider how important it is to you to be able...
Peter Nann
pnann
Jul 2, 2009
11:59 pm

P.S. When you say "but I see it as managing errors rather than reducing them" - I see it as the opposite. An SLM's main aim is to reduce the absolute error...
Peter Nann
pnann
Jul 3, 2009
1:47 am