Forum Archive

Speech/speech activation

rex_noctis

So I have been having a problem for a while.
I am trying to make a program that whilst running will listen for a key word (for example ‘Apple’) and then will activate a piece of code upon hearing it.
I have tried creating a loop that records 2 seconds of audio and then performs speech recognition on it, but much of the time the word is said while the speech recognition is running (and nothing is being recorded), so it is not picked up.
How can I get it so that it will keep listening and the only margin of error is the quality of the speech recognition? I can’t think of anything.

cvp

@rex_noctis Try this. Each file records for 10 seconds (you can change that, max 60 seconds) plus a 3-second warning period. The label is green while recording and turns orange to warn you that the file will change in 3 seconds. The files alternate (flip/flop) between 0test.m4a and 1test.m4a. To stop, say "stop" and pray it works 😂
Of course, it is not perfect: a word can be cut if the file changes while it is being pronounced...

import threading
import speech
import sound
import time
import os
import ui

class my_thread(threading.Thread):
    # Recognizes one recorded file in the background while the main loop keeps recording
    def __init__(self,view,file):
        threading.Thread.__init__(self)
        self.file = file
        self.view = view
    def run(self):
        def pr(l):
            # append one line to the TextView
            self.view['tv'].text = self.view['tv'].text + l + '\n'
        #pr('recognize ' +self.file)
        try:
            # 'fr' is the recognition language, change it to match the spoken language
            t = speech.recognize(self.file,'fr')
            for m in t:
                # m[0] is the recognized text
                if 'stop' in m[0].lower():
                    self.view.stop = True   # the main loop ends after the current file
                    break
                pr(m[0])
        except RuntimeError as e:
            pr('Speech recognition failed: '+str(e))
        os.remove(self.file)
        #pr(self.file+' deleted')

w,h = ui.get_screen_size()
v = ui.View()
v.background_color = 'white'
l = ui.Label()
l.frame = (10,10,100,20)
v.add_subview(l)
tv = ui.TextView(name='tv')
tv.frame = (10,50,w-20,h-50-10)
v.add_subview(tv)
v.present('full_screen')
j = 0
v.stop = False
while not v.stop:
    file = str(j)+'test.m4a'        
    l.text = file
    recorder = sound.Recorder(file)
    recorder.record()
    l.background_color = 'green'
    time.sleep(10)          # record with the green label for 10 seconds
    l.background_color = 'orange'
    time.sleep(3)           # keep recording 3 more seconds (13 seconds per file in total)
    l.background_color = 'red'
    recorder.stop()
    s = my_thread(v,file)
    s.start()
    j = 1 - j

mikael

@rex_noctis, maybe @JonB and @Mederic could figure out a way to use some kind of circular input audio buffer and continuous recording for this. Based on this thread they can go pretty deep on audio stuff, unfortunately in the wrong direction in that thread.
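
A circular buffer of this kind can be sketched in plain Python. This is only an illustration of the idea, assuming the audio arrives as fixed-size chunks; `RingBuffer` and its methods are hypothetical names, not part of Pythonista's `sound` module, and getting live audio chunks on iOS is a separate problem that this does not solve.

```python
from collections import deque

class RingBuffer:
    """Keep only the most recent `max_chunks` audio chunks (hypothetical helper)."""
    def __init__(self, max_chunks):
        # A deque with maxlen drops the oldest item automatically when full
        self.chunks = deque(maxlen=max_chunks)

    def push(self, chunk):
        self.chunks.append(chunk)

    def snapshot(self):
        # Copy of the buffered audio, oldest first, ready to hand to a recognizer
        return list(self.chunks)

buf = RingBuffer(max_chunks=3)
for chunk in ['c0', 'c1', 'c2', 'c3', 'c4']:   # stand-ins for real audio data
    buf.push(chunk)
print(buf.snapshot())   # the three most recent chunks
```

The buffer always holds the last few chunks, so a recognizer can be run on a snapshot at any time without stopping the input.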

cvp

@mikael I know, but the problem, I think, is that speech recognition needs a file. I know that my little script is not the right solution; it is only meant to show him that another thread can do the recognition while you keep recording.
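
The "another thread does the job" idea can also be sketched with a single worker thread fed by a queue, using only the standard library. `recognize_stub` is a hypothetical stand-in for a real recognizer such as `speech.recognize`, and the file names are just the ones from the script above; no audio is actually recorded here.

```python
import queue
import threading

def recognize_stub(path):
    # Stand-in for a real recognizer (assumption, not a real API)
    return 'text from ' + path

def worker(jobs, results):
    # Runs in the background so the recording loop is never blocked
    while True:
        path = jobs.get()
        if path is None:        # sentinel: no more files to process
            break
        results.append(recognize_stub(path))

jobs = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(jobs, results))
t.start()
# The recording loop would put each finished file on the queue
for path in ['0test.m4a', '1test.m4a']:
    jobs.put(path)
jobs.put(None)
t.join()
print(results)
```

Compared with starting one thread per file, a single worker processes the files in the order they were recorded.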

JonB

Fwiw, you can overlap ping-ponging recorders so you never have gaps.

cvp

@JonB I agree, but that doesn't avoid the risk of cutting words.

JonB

https://gist.github.com/b732076dc521c3c130a865924b6731d5

This is what I mean. You would add your processing to the callback, but basically there are always two files recording and one processing, so words will never be cut off.

In other words file 1 might cut off "App", but file 2 started a little later, so would get the whole "Apple".
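
The timing argument can be checked with a small calculation. This is not the code from the gist, just a sketch with assumed numbers (10-second recordings, each starting 5 seconds after the previous one); `window_starts` and `windows_containing` are hypothetical helper names.

```python
def window_starts(n_windows, length, overlap):
    """Start times of recordings that each last `length` seconds and
    overlap the previous recording by `overlap` seconds."""
    step = length - overlap
    return [i * step for i in range(n_windows)]

def windows_containing(word_start, word_end, starts, length):
    """Indices of the recordings that contain the whole word."""
    return [i for i, s in enumerate(starts)
            if s <= word_start and word_end <= s + length]

starts = window_starts(n_windows=4, length=10, overlap=5)
# A word spoken from t=9.5 to t=10.5 is cut off by recording 0 (which ends at t=10)
# but lies entirely inside recording 1 (t=5 to t=15)
print(windows_containing(9.5, 10.5, starts, length=10))
```

With these numbers, any word shorter than the 5-second overlap ends up whole in at least one recording.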

Obviously, there would have to be other logic that switches to continuous recording once the wake phrase is detected.
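
That switching logic could be as simple as a two-state machine driven by the transcripts. A minimal sketch, assuming the wake word is "apple" (from the original question) and that a "stop" word returns to waiting (borrowed from cvp's script); `next_mode` and the mode names are hypothetical.

```python
def next_mode(mode, transcript, wake_word='apple', stop_word='stop'):
    """Return the new mode after hearing one transcript."""
    text = transcript.lower()
    if mode == 'waiting' and wake_word in text:
        return 'continuous'     # wake phrase heard: start continuous recording
    if mode == 'continuous' and stop_word in text:
        return 'waiting'        # go back to listening for the wake phrase
    return mode

mode = 'waiting'
history = []
for transcript in ['hello there', 'An Apple a day', 'play some music', 'stop now']:
    mode = next_mode(mode, transcript)
    history.append(mode)
print(history)
```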