VoiceCoder · Forum for discussing using VR for the pu
Re: [VoiceCoder] getting close to a linux solution


> Other than some serious redraw problems with the Windows environment,
> it's not handling "scratch that" properly. If you give the command, it
> only deletes one character and then thinks it's done. It's probably
> something due to the NX environment doing its magic, but is there
> any way to fool NaturallySpeaking into emitting the right number of
> backspaces?

The short answer is no. The long answer follows.
- Mark


Return-Path:
<sentto-311578-5899-1240088612-mark.lillibridge=hp.com@...>
To: <VoiceCoder@yahoogroups.com>
From: Mark Lillibridge <mark.lillibridge@...>
Delivered-To: mailing list VoiceCoder@yahoogroups.com
Date: Sat, 18 Apr 2009 14:03:16 -0700
Subject: [VoiceCoder] Dictating into non-select-and-say windows
Reply-To: <VoiceCoder@yahoogroups.com>


I recently upgraded to DNS 9.5; in the process I discovered that
DNS's handling of non-select-and-say windows got considerably worse,
requiring new and more complicated workarounds. I wrote the following
series of posts to speechcomputing.com about what I learned in the
process; they will probably be of interest to this list as well.

- Mark


======================================================================

Dictating into non-select-and-say windows I

The Dragon NaturallySpeaking documentation, if I remember correctly,
doesn't say much about dictating into non-select-and-say windows other
than that if you have trouble, you should use the dictation box. It is
implied that you are basically dictating blind, without any selection or
correction ability.

In fact, Dragon's handling of these windows is considerably more
sophisticated than this. What follows is my mental model of what Dragon
does; like most mental models, it's probably not entirely correct but is
still good enough to make use of.

Dragon has a buffer for each non-select-and-say application
(window?), which I call the utterance buffer; the application's
utterance buffer remembers the dictated text from the last several
utterances directed at that application. (Utterance = speech between
pauses) When you manually type a key or a macro sends a character to
that application, the buffer is emptied. I don't know how big the
buffer is, but it seems to be capable of holding several utterances
and/or a sentence or so of text. Note that a macro that switches to
another application does not affect the original application's buffer.

The cool thing is that because Dragon knows the text of the last N
characters of your application before the cursor (i.e., the contents
of the utterance buffer), it can provide what amounts to select-and-say
functionality for that limited text. Available commands include:

* select X
* select X through Y
* correct that
* correct Z
* compound that

and so on. Dragon updates the utterance buffer based on any corrections
made so that you can correct your correction and the like. Of course,
if your application treats text weirdly -- e.g., a credit card field
that eats "-"s -- the utterance buffer will not correspond correctly to
the application's state, and problems will ensue.
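
The mental model above can be sketched in a few lines of Python. This only illustrates the model as described, not Dragon's actual implementation; the class name, the method names, and the capacity of four utterances are assumptions made up for the sketch.

```python
from collections import defaultdict, deque


class UtteranceBuffer:
    """Toy model of the per-application buffer described above."""

    def __init__(self, max_utterances=4):  # capacity is a guess; the real size is unknown
        self.utterances = deque(maxlen=max_utterances)

    def dictate(self, text):
        """Remember one dictated utterance, dropping the oldest if full."""
        self.utterances.append(text)

    def key_sent(self):
        """A typed key or macro keystroke reached the application: forget everything."""
        self.utterances.clear()

    def can_select(self, phrase):
        """'select X' can only work if X is still in the remembered text."""
        return phrase in "".join(self.utterances)


# One buffer per application (hypothetical application names).
buffers = defaultdict(UtteranceBuffer)
buffers["xterm"].dictate("Jon walked north")
print(buffers["xterm"].can_select("walked"))   # True
buffers["notepad"].dictate("hello")            # other application: xterm's buffer is untouched
print(buffers["xterm"].can_select("walked"))   # True
buffers["xterm"].key_sent()                    # a manually typed key empties the buffer
print(buffers["xterm"].can_select("walked"))   # False
```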

The other catch is that Dragon uses standard Windows editing
keyboard shortcuts like shift left, backspace, and the like to edit the
text to make corrections. If your application is not a native Windows
application (e.g., xterm or emacs), Dragon's edits will make a mess of
things unless you teach your application to understand them (tricky).

[to be continued]

======================================================================

How Dragon edits text in non-select-and-say windows: DNS 8.1

The easiest way to understand this is by an example:

<code>
Jon_walked_north ; "Jon walked north"
```````````````` ; "correct John"
>>>>
<<<< ; "choose <option for John>"
''''
~~~~
John_''''''''''''
</code>

What I spoke is to the right and the keys typed are to the left, where I
have lined up the keys to make it clear where the cursor is.

<code>
Key:
` = left arrow key
' = right arrow key
< = shifted left arrow key
> = shifted right arrow key
~ = backspace key
_ = a space
</code>

As you can see, DNS 8.1 first uses the arrow keys to move to the
start of the selection (here "Jon_"), then uses shift right to highlight
the selection. Once it has a replacement (here "John_"), it uses shift
left to unhighlight the selection, uses the arrow keys to move just
beyond the selection, and then uses repeated backspaces to remove it.
Finally, it types the replacement and uses the arrow keys again to
return the cursor to where it started.

Here are some more examples:

<code>
John_walked_north ; "John walked north"
````````````````` ; "correct that"
>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<< ; "Sally ran south"
'''''''''''''''''
~~~~~~~~~~~~~~~~~
Sally_ran_south
</code>


<code>
John_walked_north ; "John walked north"
     ```````````` ; "select walked"
     >>>>>>>
     <<<<<<<      ; "climbed"
     '''''''
    ~~~~~~~~
    _climbed_
</code>


Although these examples don't happen to show it, sometimes DNS 8.1
types a control C after it has finished highlighting a selection (e.g.,
after the last >). I don't understand the logic of when this is done,
but DNS doesn't seem to care whether or not anything actually gets
copied to the clipboard.
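
To check that a key sequence written in this notation really produces the intended edit, it can be replayed against a toy model of a native text field. The sketch below rests on assumptions about how such a field treats selections: shifted arrows extend the selection, plain arrows drop it, and backspace or typing removes a non-empty selection as a unit. `EditControl` and `press` are names made up for the illustration, and underscores are kept as literal characters rather than converted to spaces.

```python
class EditControl:
    """Toy model of a native text field: text, a cursor, and a selection anchor."""

    def __init__(self):
        self.text = ""
        self.cursor = 0
        self.anchor = None  # fixed end of the selection, or None

    def _selection(self):
        if self.anchor is None or self.anchor == self.cursor:
            return None
        return min(self.anchor, self.cursor), max(self.anchor, self.cursor)

    def _move(self, delta, extend):
        if extend and self.anchor is None:
            self.anchor = self.cursor      # a shifted arrow starts a selection here
        if not extend:
            self.anchor = None             # a plain arrow drops any selection
        self.cursor = max(0, min(len(self.text), self.cursor + delta))

    def _delete(self, start, end):
        self.text = self.text[:start] + self.text[end:]
        self.cursor = start

    def press(self, keys):
        """Replay keys written in the notation of the Key table; return the text."""
        for key in keys:
            if key in "`'":
                self._move(-1 if key == "`" else 1, extend=False)
            elif key in "<>":
                self._move(-1 if key == "<" else 1, extend=True)
            else:
                selection = self._selection()
                self.anchor = None
                if selection:
                    self._delete(*selection)   # backspace or typing removes the selection
                    if key == "~":
                        continue
                if key == "~":
                    if self.cursor > 0:
                        self._delete(self.cursor - 1, self.cursor)
                else:
                    self.text = self.text[:self.cursor] + key + self.text[self.cursor:]
                    self.cursor += 1
        return self.text


first_example = ("Jon_walked_north" + "`" * 16 + ">" * 4 + "<" * 4
                 + "'" * 4 + "~" * 4 + "John_" + "'" * 12)
print(EditControl().press(first_example))   # John_walked_north
```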

======================================================================

Making xterms play nicely with DNS 8.1

Attempting to use a standard xterm with DNS 8.1 will result in
garbage like ";2C;2D;2C;2C;2C;2CJohn"... when you correct because xterm
turns shift right into a sequence like (escape)[1;2C and shift left into
(escape)[1;2D (exact characters depend on the mode). Moreover, if Dragon
types a control C, you will probably kill the program you were running
in the xterm or at least discard the line of input you are working on.

However, if you examine the DNS 8.1 examples from the last post
carefully, you will see that it suffices to either ignore shifted arrow
keys or treat them as normal arrow keys. The last example:

<code>
John_walked_north ; "John walked north"
     ```````````` ; "select walked"
     >>>>>>>
     <<<<<<<      ; "climbed"
     '''''''
    ~~~~~~~~
    _climbed_
</code>

then becomes:

<code>
John_walked_north ; "John walked north"
     ```````````` ; "select walked"
     '''''''      ; "climbed"
    ~~~~~~~~
    _climbed_
</code>

or

<code>
John_walked_north ; "John walked north"
     ```````````` ; "select walked"
     '''''''
     ```````      ; "climbed"
     '''''''
    ~~~~~~~~
    _climbed_
</code>

Either way, most of the arrow keys cancel out, giving:

<code>
John_walked_north
            `````
    ~~~~~~~~
    _climbed_
</code>

which produces exactly the desired edit. The other examples behave
similarly.
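
The cancellation argument can be checked mechanically. The following is a small sketch, not anything Dragon or xterm does itself: `simplify` is a made-up helper that either drops the shifted arrows or turns them into plain arrows, then cancels adjacent opposite arrow presses.

```python
def simplify(keys, ignore_shifted=False):
    """Reduce a key sequence (in the notation above) to its net cursor moves.

    Shifted arrows are either dropped or treated as plain arrows, then any
    left arrow next to a right arrow is cancelled. Other keys act as barriers.
    """
    if ignore_shifted:
        keys = keys.replace("<", "").replace(">", "")
    else:
        keys = keys.translate(str.maketrans("<>", "`'"))
    out = []
    for key in keys:
        if out and {out[-1], key} == {"`", "'"}:
            out.pop()          # a left and a right arrow cancel each other
        else:
            out.append(key)
    return "".join(out)


# The keys Dragon sent for "select walked" followed by "climbed":
edit = "`" * 12 + ">" * 7 + "<" * 7 + "'" * 7 + "~" * 8
print(simplify(edit))                       # `````~~~~~~~~
print(simplify(edit, ignore_shifted=True))  # `````~~~~~~~~
```

Both variants leave five left arrows followed by eight backspaces, matching the reduced sequence shown above.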


Adding the following lines to your .Xdefaults file will have the
effect of converting shift arrow keys to arrow keys:

<code>
XTerm.VT100.Translations: #override \n\
Shift<Key>Left: string(0x1b) string("[D") \n\
Shift<Key>Right: string(0x1b) string("[C") \n
</code>

("(escape)[C" is what xterm returns when a right arrow key is typed.)

Of course, this still leaves the control C to deal with. Any number
of methods are possible. The simplest is to change the interrupt
character to a different character and translate control C to nothing.
A more sophisticated one is to use keymaps (see the forthcoming
solution for DNS 9.5) to map control C to nothing only when it follows a
shift arrow key.

Unfortunately, DNS 9.5, as we shall see, uses an editing strategy
that is much harder to cope with and cannot be dealt with so easily.

======================================================================

How Dragon edits text in non-select-and-say windows: DNS 9.5

Sometimes, DNS 9.5 edits text almost the same way as DNS 8.1; for example,

<code>
_John_walked_north ; "John walked north"
 ````````````````` ; "correct John"
 >
 <
 >>>>>             ; "choose <choice for Jon>"
 <<<<<
 '''''
~~~~~~
_Jon_''''''''''''
</code>

<code>
Key:
` = left arrow key
' = right arrow key
< = shifted left arrow key
> = shifted right arrow key
~ = backspace key
_ = a space
^C = a control C is sent
</code>

Two noteworthy differences here: first, DNS inserts an additional
space at the beginning of my dictation. More on this in a later post, as
this technically isn't an edit problem. Second, there is an extra single
shift-right followed by a single shift-left before the selection is
highlighted. This doesn't do anything, so I assume this is the result
of some bug. The strategy of ignoring shifted arrows or treating them
as un-shifted arrows still works for this case.

Unfortunately, DNS 9.5 has another method of editing text, shown by
the following example, which that strategy can't cope with:

<code>
Jon_walked_north ; "Jon walked north"
```````````````` ; "correct that"
>
<
>>>>>>>>>>>>>>>>^C
~ ; "Celli ran south" (a misrecognition)
Celli_ran_south
</code>

Here things start as before other than not adding the extra space (I
believe this is unrelated to the edit method). The optional control C
is normal for DNS 8.1, but a different method of erasing the selection
is being used. Rather than unhighlight the selection then erase it
using multiple backspaces as DNS 8.1 would, DNS 9.5 simply erases the
highlighted selection with a single backspace.

Handling this edit method requires either keeping count of how long
the selection is or remembering where the selection started. No simple
key mapping can handle it. In this sense, DNS 9.5 is a distinct
downgrade from DNS 8.1. (To be fair, the second edit method is probably
faster for many native Windows applications.)
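
To see concretely why the simple mapping breaks, the sequence can be replayed through a toy line editor that treats shifted arrows as plain arrows. This is a sketch under that assumption only; `replay_naive` is a made-up name, and the control C is left out of the input because the simple mapping would discard it anyway.

```python
def replay_naive(keys):
    """Toy line editor: shifted arrows act as plain arrows, backspace erases one character."""
    text, cursor = [], 0
    for key in keys:
        if key in "`<":
            cursor = max(0, cursor - 1)
        elif key in "'>":
            cursor = min(len(text), cursor + 1)
        elif key == "~":
            if cursor > 0:
                cursor -= 1
                del text[cursor]
        else:
            text.insert(cursor, key)
            cursor += 1
    return "".join(text)


# DNS 8.1 style correction: unhighlight, then one backspace per character.
dns81 = ("Jon_walked_north" + "`" * 16 + ">" * 4 + "<" * 4
         + "'" * 4 + "~" * 4 + "John_" + "'" * 12)
print(replay_naive(dns81))   # John_walked_north

# DNS 9.5 second method: a single backspace is meant to erase the whole selection.
dns95 = ("Jon_walked_north" + "`" * 16 + "><" + ">" * 16
         + "~" + "Celli_ran_south")
print(replay_naive(dns95))   # Jon_walked_nortCelli_ran_south
```

The single backspace removes only the final "h", so the replacement is appended to the old text instead of replacing it.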

======================================================================

Making xterms play nicely with DNS 9.5 (and 8.1)

As discussed in the last post, for DNS 9.5 we need to either keep
count of how long the current selection is or remember where the
selection started. The second of these strategies is the easier to
implement.

Most line editors (e.g., tcsh, readline, irb, and bc) under UNIX
support a fairly extensive set of Emacs key bindings. In particular,
they support setting the mark to the current cursor position via
{ctrl+@} and erasing the text between the cursor and the mark via
{ctrl+w}. These two commands will suffice to handle remembering the
start of the selection and then optionally erasing the entire
selection. (Remember the selection is the text between the current
cursor position and the start of the selection.)

Adding the following lines to your .Xdefaults file will make all
such line editors handle DNS 8.1/9.5 corrections properly:

<code>
!
! Keymap to handle Dragon NaturallySpeaking corrections for editors
! that understand ^@ (set-mark=0x00) and ^w (kill-to-mark=0x17)
!
! Side effect: sets mark when a Dragon selection/correction is done
!
XTerm.VT100.Translations: #override \n\
<Key>BackSpace: string(0x08)\n\
\
Shift<Key>Left: string(0x00) string(0x1b) string("[D") \
keymap(selection) \n\
Shift<Key>Right: string(0x00) string(0x1b) string("[C") \
keymap(selection) \n

XTerm.VT100.selectionKeymap.translations: \
Shift<Key>Left: string(0x1b) string("[D") \n\
Shift<Key>Right: string(0x1b) string("[C") \n\
\
~Shift<Key>Right: string(0x1b) string("[C") keymap(None) \n\
~Shift<Key>Left: string(0x1b) string("[D") keymap(None) \n\
\
Ctrl<Key>c: ignore() \n\
<Key>BackSpace: string(0x17) keymap(None) \n\
\
<Key>Shift_L: ignore() \n\
<Key>Shift_R: ignore() \n\
<Key>Control_L: ignore() \n\
<Key>Control_R: ignore() \n\
<Key>: insert() keymap(None) \n
</code>

The first entry overrides the standard xterm keymap handling of the
shifted arrows. When the first such shifted arrow is seen, the mark is
set ({ctrl+@} corresponds to character 0x00), the corresponding
unshifted arrow key is typed, and then the current keymap is changed to
a selection keymap, which is defined by the second entry.

As long as the selection might exist, the current keymap remains the
selection one. Additional shifted arrows continue to move the cursor by
typing the unshifted equivalent, control C is ignored, and a backspace
erases the entire selection via a {ctrl+w}. We return to the original
keymap when anything other than control C or a shifted arrow is typed.
Except for backspace, which we have redefined, such characters are typed
as themselves. Thus, typing while a selection exists does not replace
the selection; it merely unselects it and then types. It is possible to
implement such behavior, but I have not done so as it is not needed to
handle DNS corrections.
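
The two keymaps amount to a two-state machine, which can be simulated to convince yourself that both DNS 9.5 edit methods come out right. The Python below is a sketch of that logic, not of xterm itself: `translate` and `run_editor` are made-up names, the symbolic commands stand in for the bytes xterm would send, and the toy editor assumes {ctrl+w} kills the text between mark and cursor as described above (some line-editor configurations bind {ctrl+w} to a word-erase command instead, in which case this would not apply).

```python
LEFT, RIGHT, MARK, KILL, BS = "LEFT", "RIGHT", "MARK", "KILL", "BS"


def translate(keys):
    """Mimic the two keymaps: return what the line editor would receive."""
    out, selecting = [], False
    for key in keys:
        if key in "<>":                       # shifted arrow
            if not selecting:
                out.append(MARK)              # first one sets the mark (ctrl+@)
                selecting = True
            out.append(LEFT if key == "<" else RIGHT)
        elif selecting and key == "\x03":     # control C is ignored while selecting
            continue
        elif selecting and key == "~":        # backspace erases the selection (ctrl+w)
            out.append(KILL)
            selecting = False
        else:                                 # anything else returns to the normal keymap
            selecting = False
            out.append({"`": LEFT, "'": RIGHT, "~": BS}.get(key, key))
    return out


def run_editor(commands):
    """Toy line editor with a mark, in the style of Emacs key bindings."""
    text, cursor, mark = [], 0, 0
    for cmd in commands:
        if cmd == LEFT:
            cursor = max(0, cursor - 1)
        elif cmd == RIGHT:
            cursor = min(len(text), cursor + 1)
        elif cmd == MARK:
            mark = cursor
        elif cmd == KILL:
            start, end = sorted((mark, cursor))
            del text[start:end]
            cursor = mark = start
        elif cmd == BS:
            if cursor > 0:
                cursor -= 1
                del text[cursor]
        else:
            text.insert(cursor, cmd)
            cursor += 1
    return "".join(text)


second_method = ("Jon_walked_north" + "`" * 16 + "><" + ">" * 16
                 + "\x03" + "~" + "Celli_ran_south")
print(run_editor(translate(second_method)))   # Celli_ran_south
```

Modifier keys such as Shift_L are not modeled; in the real keymap they are ignored so that they do not end the selection state.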


The one drawback of this approach is that it does not work with line
editors that do not support the commands mentioned above. I haven't
run into any yet, but they probably exist. (Note that some form of line
editing is required for corrections because the right arrow key needs to
be handled correctly.)

I believe it is possible to implement the other strategy as well by
creating a lot of key maps, one per possible selection size and
direction. This allows the selection to be erased by typing the
correct number of backspaces. That strategy only requires a line editor
able to properly handle the un-shifted arrow keys, but I don't know how
well it would perform in practice.
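
The counting strategy is easier to see with an explicit counter than with one keymap per selection size. The sketch below is untested against a real xterm and only models the idea: `translate_counting` and `replay` are made-up names, and the integer `offset` plays the role that the family of keymaps would play.

```python
def translate_counting(keys):
    """Rewrite Dragon's keys using only plain arrows and backspaces.

    offset is the cursor position minus the position where the selection
    started, i.e. the signed length of the current selection.
    """
    out, offset = [], 0
    for key in keys:
        if key == ">":
            offset += 1
            out.append("'")
        elif key == "<":
            offset -= 1
            out.append("`")
        elif key == "\x03" and offset != 0:
            continue                        # control C right after highlighting: drop it
        elif key == "~" and offset != 0:
            if offset < 0:
                out.append("'" * -offset)   # cursor is left of the selection: move past it
            out.append("~" * abs(offset))   # erase the selection one character at a time
            offset = 0
        else:
            offset = 0                      # any other key ends the selection
            out.append(key)
    return "".join(out)


def replay(keys):
    """Minimal line editor that understands only plain arrows and backspace."""
    text, cursor = [], 0
    for key in keys:
        if key == "`":
            cursor = max(0, cursor - 1)
        elif key == "'":
            cursor = min(len(text), cursor + 1)
        elif key == "~":
            if cursor > 0:
                cursor -= 1
                del text[cursor]
        else:
            text.insert(cursor, key)
            cursor += 1
    return "".join(text)


second_method = ("Jon_walked_north" + "`" * 16 + "><" + ">" * 16
                 + "\x03" + "~" + "Celli_ran_south")
print(replay(translate_counting(second_method)))   # Celli_ran_south
```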

======================================================================

How Dragon edits text in non-select-and-say windows: DNS 10

I confess, I have no idea. I don't have DNS 10 yet. Perhaps one of
you with DNS 10 who uses xterm can help in the meantime.

If you add the following lines to your .Xdefaults file:

<code>
!
! Keymap for making what keys Dragon NaturallySpeaking sends visible.
!
! Use by "xterm -class visible"
!
visible.VT100.Translations: #override \n\
<Key>BackSpace: string(~)\n\
<Key>space: string(_)\n\
\
None<Key>Left: string(`) \n\
None<Key>Right: string(') \n\
Shift<Key>Left: string(<) \n\
Shift<Key>Right: string(>) \n\
\
Ctrl<Key>c: string(^C) \n
</code>

then do

<code>
xrdb -load ~/.Xdefaults
xterm -class visible
</code>

a new xterm will pop up, in which the various keys Dragon has used up to
this point to make corrections will show up as printable characters as
in my examples. Try out some corrections and post the results.
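
Long runs of arrows are hard to count by eye, so before posting it may help to summarize a captured trace as run lengths. The helper below is a hypothetical convenience, not part of the keymap: `summarize` is a made-up name, and it assumes the trace uses exactly the characters produced by the visible keymap above (so a literal "^C" in dictated text would be misread as a control C).

```python
import re
from itertools import groupby

KEY_NAMES = {
    "`": "left",
    "'": "right",
    "<": "shift-left",
    ">": "shift-right",
    "~": "backspace",
    "^C": "control-C",
}
TOKEN = re.compile(r"\^C|.", re.DOTALL)   # "^C" is one token; otherwise one character


def summarize(trace):
    """Turn a visible-xterm trace into a list of (key name, repeat count) pairs."""
    names = [KEY_NAMES.get(token, "text") for token in TOKEN.findall(trace)]
    return [(name, len(list(group))) for name, group in groupby(names)]


trace = "Jon_walked_north" + "`" * 16 + "><" + ">" * 16 + "^C~" + "Celli_ran_south"
for name, count in summarize(trace):
    print(f"{count:3d} x {name}")
```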

I'm hoping that DNS 10 just uses some subset of the strategies used by
DNS 8.1 and DNS 9.5 so that I don't need to develop new correction
handling code.

======================================================================

Dictating into non-select-and-say windows II

I left out one important aspect in my first post about dictating
into non-select-and-say windows: what about formatting properties?

So long as the utterance buffer is nonempty, things seem to work the
same as with select-and-say windows. Thus, if I dictate "he walked
south" after dictating "Fred walked north period", I will get "Fred
walked north. He walked south". Clearly Dragon maintains some state
about what spacing and capitalization to use for the beginning of the
next dictated phrase.

But what about when the utterance buffer is empty? In DNS 8.1, I
believe one always got no extra spacing or capitalization before the
next dictated phrase. This is predictable, and I've gotten quite good
at adding "space-bar" at the beginning of phrases when necessary.

In DNS 9.5, things are less predictable, or maybe I just don't
understand the rule yet. A lot of the time, but not always, I get an
extra space. I haven't noticed extra capitalization, though. Maybe I
haven't completely adjusted yet, but I find this rule more annoying
because the more common case is that a phrase doesn't need a leading
space.

Does anyone know what DNS 10 does?




Wed Jul 15, 2009 6:14 pm

markdlillibr...

