Forum Archive

Amazon Lex using AudioRecorder

jovica

I am trying to capture audio on my iPad with AVAudioRecorder and submit it to an AWS Lex bot. This is the code that records the audio:

settings = {ns('AVFormatIDKey'):ns(1633772320), ns('AVSampleRateKey'):ns(16000), ns('AVNumberOfChannelsKey'):ns(2), ns('AVLinearPCMBitDepthKey'):ns(16), ns('AVLinearPCMIsBigEndianKey'):ns(0),ns('AVLinearPCMIsFloatKey'):ns(0)}

output_path = os.path.abspath(FileName)
out_url = NSURL.fileURLWithPath_(ns(output_path))
recorder = AVAudioRecorder.alloc().initWithURL_settings_error_(out_url, settings, None)

However, the Lex bot does not "understand" the audio captured with the above code.

I understand that Lex needs linear PCM, but I am unsure what settings to use in the above code to achieve that.

Can somebody point me in the right direction?

JonB

Try just using sound.Recorder('somefile.wav')
This takes care of all the objc for you, and records in linear PCM (a.k.a. a WAV file).

ccc

Undocumented? http://omz-software.com/pythonista/docs/ios/sound.html

jovica

@JonB thank you for your prompt response. What you suggested works: A wav file gets created.

Unfortunately, AWS Lex still does not seem to understand it once I submit it.

Is there a way to capture MPEG audio? That works: I receive an MPEG response from the Lex bot, and when I submitted it right back to the bot it worked like a charm. The bot can "understand" it, and the response shows that in the inputTranscript text that gets returned.

cvp

@jovica Try sound.Recorder('file.m4a'), which is MPEG-4.

sammachin

@jovica Linear PCM is basically WAV. Technically, a WAV file is a short header that describes the format, followed by the linear PCM data. You should be fine sending a WAV file to Lex, as it will just treat the header as a small glitch of data.
The important thing is that Lex needs the audio in 16-bit, 16 kHz format, and it looks like you have that in the settings above.
I'm pretty new to Pythonista on iOS, but I've done a fair bit with Lex (and Alexa) in Python. For some code that submits audio to Lex using Python and requests, have a look at https://github.com/Nexmo/lex-connector/blob/master/server.py#L102-L109
That's part of a larger application, but it should give you some pointers.
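
To make the header-plus-samples layout concrete, here is a minimal sketch using only Python's standard wave module. It writes one second of silence in the 16-bit, 16 kHz, mono format mentioned above and shows that the result is the raw PCM bytes preceded by a short header. The helper name make_wav_bytes is hypothetical, and the 44-byte header size applies to the simple header the wave module writes; files from other tools may carry extra chunks.

```python
import io
import wave

def make_wav_bytes(pcm, rate=16000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM bytes in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()

pcm = b'\x00\x00' * 16000          # one second of 16-bit mono silence at 16 kHz
wav = make_wav_bytes(pcm)
header_size = len(wav) - len(pcm)  # bytes that are not sample data

print(header_size)                 # 44
print(wav[:4], wav[8:12])          # b'RIFF' b'WAVE'
print(wav[header_size:] == pcm)    # True: the rest is the untouched PCM
```

At 16 kHz, 44 header bytes correspond to 22 samples, which is under 1.5 ms of audio.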

zrzka

Hi,

You're using the wrong AVFormatIDKey value. The correct one for linear PCM is 1819304813 (yours, 1633772320, is MPEG-4 AAC). Here's working code for AWS Lex and Pythonista (I installed awscli and boto3 with pip via StaSh).

from objc_util import *
import boto3
import os
import sound
import console
import time
import uuid

def record(file_name):
    AVAudioSession = ObjCClass('AVAudioSession')
    NSURL = ObjCClass('NSURL')
    AVAudioRecorder = ObjCClass('AVAudioRecorder')
    shared_session = AVAudioSession.sharedInstance()
    category_set = shared_session.setCategory_error_(ns('AVAudioSessionCategoryPlayAndRecord'), None)

    # 16-bit, 16 kHz, mono, little-endian, integer linear PCM.
    # 1819304813 is the four-character code 'lpcm'.
    settings = {
        ns('AVFormatIDKey'): ns(1819304813),
        ns('AVSampleRateKey'): ns(16000.0),
        ns('AVNumberOfChannelsKey'): ns(1),
        ns('AVLinearPCMBitDepthKey'): ns(16),
        ns('AVLinearPCMIsFloatKey'): ns(False),
        ns('AVLinearPCMIsBigEndianKey'): ns(False)
    }

    output_path = os.path.abspath(file_name)
    out_url = NSURL.fileURLWithPath_(ns(output_path))
    recorder = AVAudioRecorder.alloc().initWithURL_settings_error_(out_url, settings, None)
    if recorder is None:
        console.alert('Failed to initialize recorder')
        return None

    # Do not wait for a stop request if recording never started.
    if not recorder.record():
        print('Failed to start recording')
        recorder.release()
        return None

    print('Recording started, press the "stop script" button to end recording...')
    try:
        # Sleep instead of spinning so the wait does not burn CPU.
        while True:
            time.sleep(0.1)
    except KeyboardInterrupt:
        print('Stopping...')
        recorder.stop()
        recorder.release()
        print('Stopped recording.')
    return output_path


def main(): 
    console.clear()

    path = record("{}.pcm".format(uuid.uuid4().hex))

    if path is None:
        print('Nothing recorded')
        return

    sound.play_effect(path)

    # Send the raw PCM to Lex; the file is closed when the block exits.
    with open(path, 'rb') as recording:
        session = boto3.Session(profile_name='lex')
        client = session.client('lex-runtime')

        r = client.post_content(botName='BookTrip', botAlias='$LATEST', userId=uuid.uuid4().hex,
            contentType='audio/l16; rate=16000; channels=1',
            accept='text/plain; charset=utf-8',
            inputStream=recording)
    print(r)

    os.remove(path)

if __name__ == '__main__':
    main()

And here's the console output when I said "book a car".

Recording started, press the "stop script" button to end recording...
Stopping...
Stopped recording.
{'slots': {'PickUpDate': None, 'DriverAge': None, 'ReturnDate': None, 'PickUpCity': None, 'CarType': None}, 'intentName': 'BookCar', 'slotToElicit': 'PickUpCity', 'dialogState': 'ElicitSlot', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HTTPHeaders': {'x-amzn-requestid': '33a2a3f2-6c5b-11e7-a3f3-d1ffbce58259', 'connection': 'keep-alive', 'x-amz-lex-slots': 'eyJQaWNrVXBEYXRlIjpudWxsLCJSZXR1cm5EYXRlIjpudWxsLCJEcml2ZXJBZ2UiOm51bGwsIkNhclR5cGUiOm51bGwsIlBpY2tVcENpdHkiOm51bGx9', 'date': 'Wed, 19 Jul 2017 08:21:02 GMT', 'x-amz-lex-input-transcript': 'book a car', 'content-length': '0', 'x-amz-lex-message': 'In what city do you need to rent a car?', 'content-type': 'text/plain;charset=utf-8', 'x-amz-lex-intent-name': 'BookCar', 'x-amz-lex-slot-to-elicit': 'PickUpCity', 'x-amz-lex-dialog-state': 'ElicitSlot'}, 'RequestId': '33a2a3f2-6c5b-11e7-a3f3-d1ffbce58259'}, 'contentType': 'text/plain;charset=utf-8', 'message': 'In what city do you need to rent a car?', 'inputTranscript': 'book a car', 'audioStream': <botocore.response.StreamingBody object at 0x108a4d4a8>}
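
As a side note on reading that output: the 'x-amz-lex-slots' header value decodes as base64 to the same slot data that boto3 already exposes under 'slots'. The sketch below only illustrates that, using the header string copied from the output above; the variable names are mine.

```python
import base64
import json

# Value copied from the 'x-amz-lex-slots' header in the console output above.
slots_header = 'eyJQaWNrVXBEYXRlIjpudWxsLCJSZXR1cm5EYXRlIjpudWxsLCJEcml2ZXJBZ2UiOm51bGwsIkNhclR5cGUiOm51bGwsIlBpY2tVcENpdHkiOm51bGx9'

# base64 -> UTF-8 JSON text -> Python dict (JSON null becomes None)
slots = json.loads(base64.b64decode(slots_header).decode('utf-8'))

for name, value in sorted(slots.items()):
    print(name, value)
```

In normal use there is no need to decode the header by hand, since r['slots'] carries the same information.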

The predefined Lex BookTrip bot is used in this case.

HTH,
Zrzka
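
For readers wondering where the two AVFormatIDKey numbers in this thread come from: they are four-character codes packed into a 32-bit big-endian integer. 'lpcm' is Core Audio's kAudioFormatLinearPCM and 'aac ' (with a trailing space) is kAudioFormatMPEG4AAC. A small sketch, assuming Python 3; the function names fourcc and fourcc_name are hypothetical helpers, not part of objc_util or Pythonista.

```python
def fourcc(code):
    """Return the 32-bit big-endian integer for a four-character code."""
    raw = code.encode('ascii')
    if len(raw) != 4:
        raise ValueError('expected exactly four ASCII characters')
    return int.from_bytes(raw, 'big')

def fourcc_name(value):
    """Return the four-character code for a 32-bit integer."""
    return value.to_bytes(4, 'big').decode('ascii')

print(fourcc('lpcm'))                   # 1819304813
print(repr(fourcc_name(1633772320)))    # 'aac '
```

Decoding an unfamiliar format ID this way is a quick check of which codec a settings dictionary actually requests.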

Max.Shih

I am having the same problem as @jovica.
I have tested with .raw, .wav, and .pcm files.
In each file I say a valid sample utterance for my bot.
The Lex console recognizes what I say every time (so I don't think the issue is my pronunciation), but the response from boto3's post_content suggests it doesn't know what I said.
(The WAV file is me saying "go to the kitchen", but the returned 'inputTranscript' is 'a a allen'.)
Can someone tell me what I've done wrong? Thanks.
My code is the following.

import boto3

client = boto3.client('lex-runtime')

WAVE_OUTPUT_FILENAME = "File.wav"
with open(WAVE_OUTPUT_FILENAME, 'rb') as f:
    lex_response = client.post_content(
        botName='ProtoBot',
        botAlias='ProtoBotFeb',
        userId="12345678910",
        inputStream=f,
        accept='text/plain; charset=utf-8',
        contentType="audio/l16; rate=16000; channels=1"
    )
print(lex_response)
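
The thread does not settle what went wrong here. One thing worth ruling out (an assumption, not a diagnosis) is that the file's real sample rate, channel count, or bit depth differs from what the contentType string declares, since the contentType only labels the data and does not convert it. The sketch below uses the standard wave module to check a WAV file against the declared 16 kHz, mono, 16-bit format and return the raw frames without the header. pcm_for_lex and demo_wav are hypothetical helpers, and the in-memory files stand in for a real recording.

```python
import io
import wave

def pcm_for_lex(wav_file, rate=16000, channels=1, sample_width=2):
    """Return the raw PCM frames of a WAV file, or raise if the format differs."""
    with wave.open(wav_file, 'rb') as w:
        actual = (w.getframerate(), w.getnchannels(), w.getsampwidth())
        expected = (rate, channels, sample_width)
        if actual != expected:
            raise ValueError(
                'WAV is (rate, channels, bytes/sample) = {}, expected {}'.format(actual, expected))
        return w.readframes(w.getnframes())

def demo_wav(rate):
    """Build a short silent mono 16-bit WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b'\x00\x00' * 160)
    buf.seek(0)
    return buf

pcm = pcm_for_lex(demo_wav(16000))
print(len(pcm))  # 320 bytes: 160 frames of 2 bytes each

try:
    pcm_for_lex(demo_wav(44100))  # a common default rate that does not match
except ValueError as err:
    print('rejected:', err)
```

With a real file, pcm_for_lex('File.wav') would take the place of the demo call, and the returned bytes could be passed as inputStream.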

Yilia

It seems that you are a recording expert who knows a lot about programming. In my spare time, I also make recordings using one of the reliable screen recording tools for Mac. It is powerful enough to skip unneeded parts, cut recordings, specify the configuration, highlight the cursor, and so on. However, knowing more about codecs also seems necessary to do this professionally.