This may be obvious, but be sure to set the frameLength before appending the buffer to the recognition request; otherwise the recognizer will keep getting duplicate data.
What happens, I think, is that the buffer contains all of the samples delivered so far, including the initial ~0.375 s (or whatever the tap size gives you). If you change the frame length to 1024, you are telling the engine how many samples you consumed; it wants to keep that buffer the same size and never skip, so it calls you sooner next time, with everything shifted left and new samples appended at the end. Those end samples have the least latency. This takes the latency down from about 0.375 s to maybe 20-30 ms for me. Here is the tap handler:
import time

import numpy as np
from objc_util import ObjCClass, ObjCInstance

AVAudioTime = ObjCClass('AVAudioTime')

def handler(_cmd, buffer_ptr, samptime_ptr):
    if buffer_ptr:
        buffer = ObjCInstance(buffer_ptr)
        # A way to get the host time (in seconds) of the start of the buffer,
        # comparable to time.perf_counter(). You can difference the two to see
        # the latency to the start of the buffer.
        hostTimeSec = AVAudioTime.secondsForHostTime_(ObjCInstance(samptime_ptr).hostTime())
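        # (sketch, not in the original post) e.g., estimate that latency:
        latency_sec = time.perf_counter() - hostTimeSec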
        # You can also check for skips by looking at sampleTime(), which should
        # always increment by whatever you set the frameLength to; if it jumps
        # by more than that, your other processing is taking too long.
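        # (sketch, not in the original post) a concrete version of that check;
        # assumes a module-level last_sample_time = None has been defined
        global last_sample_time
        sample_time = ObjCInstance(samptime_ptr).sampleTime()
        if last_sample_time is not None and sample_time - last_sample_time > 1024:
            print('skipped samples; the handler is falling behind')
        last_sample_time = sample_time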
        # This just sets up pointers that numpy can read; no actual copy yet.
        data = buffer.floatChannelData().contents
        data_np = np.ctypeslib.as_array(data, shape=(buffer.frameLength(),))
        # Take the LAST N samples for visualization, i.e. the most recent ones,
        # with the least latency.
        update_path(data_np[-1024:])
        # This tells the engine how many samples we consumed; next time, we will
        # get this buffer's samples [1024:], with 1024 new samples appended.
        buffer.setFrameLength_(1024)
        # Be sure to append the buffer AFTER setting the frameLength, otherwise
        # you will keep feeding the recognizer repeated portions of the data.
        # (requestBuffer is the SFSpeechAudioBufferRecognitionRequest; using the
        # full selector name here for clarity.)
        requestBuffer.appendAudioPCMBuffer_(buffer)
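For context, here is roughly how a handler like this gets wired up with objc_util. Treat it as a sketch, not exact working code: the 16384-sample tap size, the variable names, and the choice of the input node are assumptions on my part, and starting the actual SFSpeechRecognizer recognition task is omitted.

from ctypes import c_void_p
from objc_util import ObjCClass, ObjCBlock

# module-level state used by the handler's skip check above
last_sample_time = None

# the recognition request that the handler appends buffers to
requestBuffer = ObjCClass('SFSpeechAudioBufferRecognitionRequest').alloc().init()

engine = ObjCClass('AVAudioEngine').new()
inp = engine.inputNode()
fmt = inp.outputFormatForBus_(0)

# Wrap the Python handler (defined above) as an ObjC block; the first argtype
# is the implicit block pointer, then the AVAudioPCMBuffer and AVAudioTime.
# Keep a reference to the block so it isn't garbage-collected.
tap_block = ObjCBlock(handler, restype=None,
                      argtypes=[c_void_p, c_void_p, c_void_p])

# Tap size is an assumption: 16384 samples is ~0.37 s at 44.1 kHz, in line
# with the initial latency mentioned above. The engine fills the whole buffer
# the first time, then calls back once frameLength (1024) new samples arrive.
inp.installTapOnBus_bufferSize_format_block_(0, 16384, fmt, tap_block)

engine.prepare()
engine.startAndReturnError_(None)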