Software Patent of the Week: Speech Recognition

February 6, 2007 · 10 comments

Every week, I look at a software patent that’s been in the news. You can see previous installments in the series here. There haven’t been any major patent controversies this week, so our patent of the week comes from last November. VoiceSignal Technologies has sued Nuance Communications over its voice patent. I’ll discuss the patent below the fold.


Here’s a key description of how the patent works:

The speech segment data processing part 2203 analyzes the speech segment data generated by the speech segment data generating part 2202 and recognizes the meaning. Conventional methods for recognizing the meaning are specifically described in “Digital Speech Process” (by S. Furui, Tokai University Publishing Association)(published in English translation as “Digital Speech Processing Synthesis and Recognition” (Marcel Dekker 1989)). Generally, a speech recognition dictionary storing part 2204 includes a phoneme dictionary and a word dictionary as dictionaries for speech recognition. In the speech recognition process, a phoneme is recognized based on the distance or the similarity between the short time spectrum of an input speech and that of the reference pattern, and the meaning of the speech is identified by a word matching the recognized phoneme sequence in the word dictionary.

However, conventional speech recognizers posed the problem that it is not easy to correct an error in the recognition of speech data.

More specifically, in the actual speech recognition made by humans, the speech data that was not recognized correctly at first can be corrected later in the context of the conversation for understanding, and the action of the people who are talking is also corrected accordingly. However, in conventional speech recognizers, it is not easy to correct the meaning of the speech once it has been recognized wrongly. Therefore, for example, in an apparatus in which a command is input by speech, it is difficult to correct an operation in the case that an erroneous command was input due to an error in the recognition of the speech data. Thus, the range for the application of speech recognizers is limited.
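To make the conventional approach in that excerpt concrete, here is a minimal sketch of it in Python. It is my own toy illustration, not code from the patent or from either company. The three-number "spectra", the phoneme labels, and the two-word dictionary are all made-up assumptions, and the function names are hypothetical. Each input frame is matched to the nearest reference pattern by Euclidean distance, and the resulting phoneme sequence is looked up in a word dictionary.

```python
import math

# Hypothetical reference patterns: one toy "short-time spectrum" per phoneme.
# Real systems use far richer features; these numbers are invented.
REFERENCE_PATTERNS = {
    "k": [0.9, 0.1, 0.0],
    "ae": [0.1, 0.9, 0.2],
    "t": [0.0, 0.2, 0.9],
    "b": [0.7, 0.0, 0.6],
}

# Hypothetical word dictionary: phoneme sequence -> word.
WORD_DICTIONARY = {
    ("k", "ae", "t"): "cat",
    ("b", "ae", "t"): "bat",
}


def spectral_distance(a, b):
    """Euclidean distance between two toy spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_phoneme(frame):
    """Return the phoneme whose reference pattern is closest to the frame."""
    return min(
        REFERENCE_PATTERNS,
        key=lambda p: spectral_distance(frame, REFERENCE_PATTERNS[p]),
    )


def recognize_word(frames):
    """Map each frame to a phoneme, then look the sequence up as a word.

    Returns None when the phoneme sequence is not in the dictionary.
    """
    phonemes = tuple(recognize_phoneme(f) for f in frames)
    return WORD_DICTIONARY.get(phonemes)


clean = [[0.8, 0.2, 0.1], [0.2, 0.8, 0.1], [0.1, 0.1, 0.8]]
# Same utterance, but the first frame is distorted (an assumed noise case).
noisy = [[0.75, 0.05, 0.5], [0.2, 0.8, 0.1], [0.1, 0.1, 0.8]]

print(recognize_word(clean))
print(recognize_word(noisy))
```

With these made-up numbers, the distorted first frame lands closer to the "b" pattern than to "k", so the second call commits to the wrong word. Nothing in this pipeline ever revisits that decision, which is the limitation the excerpt complains about.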

I think it’s worth conceding that this patent seems to have more merit than most of the ones I’ve looked at to date. At a minimum, the detailed description contains a lot of technical detail that could add up to an original and innovative speech recognition system, although I don’t know the relevant literature well enough to judge that for sure. But it’s also worth noting how broad the resulting claim is. Because they’ve come up with one mechanism for re-evaluating previously recognized speech, they believe they’re entitled to a legal monopoly on any software product that includes a re-evaluation mechanism. This strikes me as unreasonably broad; they’re claiming a general software strategy, not a particular invention.
