Forum Archive

Real time audio buffer synth/Real time image smudge tool

Mederic

This topic is about investigating ways to achieve real-time performance in Pythonista in situations that depend heavily on it.
These situations generally come up in two fields: audio (synths, filters, sound processing) and image (smudge tools). What makes them difficult is that, unlike usual real-time applications, the computer doesn't just need your UI input and a few variables to update the program's internal state and play/display whatever it should; it needs the actual data being played/displayed (the screen image or the audio output). That is a lot more data than a few hidden state variables. So while you're hearing the audio/seeing the screen and deciding what to do next, the computer is also working directly on the current audio/screen data to compute the next thing you will hear/see. The challenge is to handle these often large amounts of data fast enough to deliver them to you seamlessly.

Here are the advances we made in the audio department (see the end of the post for the image part)
Real time audio processing in Pythonista:
At the lowest level, real-time audio processing is achieved with a circular (ring) audio buffer, a kind of array. Basically, the code iteratively fills this buffer with the next bit of sound and sends it to your ears when you're finished hearing the current one.
Most audio processing effects (filters, for instance) need the very data you're currently hearing in order to compute the next bit of sound. So before sending it to your ears, they copy that data and start refilling the buffer with the next part, before you're done with the current one and come back for more sound.
@JonB's AudioRenderer wrapper (see below) is a great way to get access to such an audio buffer in Pythonista.
https://gist.github.com/0db690ed392ce35ec05fdb45bb2b3306

Here are my current modifications of @JonB's files to get an antialiased sawtooth instead of a sine wave, a 4-pole filter you can control, unison, vibrato, chords (with several fingers) and a delay, all in one buffer/render method:

https://gist.github.com/medericmotte/d8e81b7e0961006d7026f16cc195682c

It's set up to play one chord whose number of notes equals the number of fingers on the screen (up to 4). You can control the filter by moving the first finger that touched the screen up and down.

It's an inefficient implementation, because all the work is done in strict real time, as if the control parameters were changing on a sample-by-sample basis. It's at the edge of glitching (to see it, just set filtersNumber to 16 and notice the glitches when doing filter sweeps). The solution is to compute the audio elsewhere, in bigger anticipated chunks corresponding to, say, a 60 Hz rate (because that's pretty much the rate at which touch_moved events are received anyway), and then progressively fill the circular buffer with the computed data.
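
Here is a minimal sketch of that fill-ahead idea (illustrative names only, not the actual AudioRenderer API): a producer pushes precomputed chunks at roughly 60 Hz, and the audio callback pops exactly the number of frames iOS asks for, getting silence on underrun.

import numpy as np

# Minimal ring buffer sketch: a producer pushes precomputed chunks (e.g. at
# ~60 Hz) and the audio callback pops however many frames iOS asks for.
# If the producer falls behind, pop() returns zeros (silence) instead of
# blocking. Names are illustrative, not the actual AudioRenderer API.
class RingBuffer:
    def __init__(self, capacity):
        self.data = np.zeros(capacity, dtype=np.float32)
        self.capacity = capacity
        self.write = 0   # total samples written so far
        self.read = 0    # total samples read so far

    def push(self, chunk):
        # no overwrite protection here; a real version should check for overrun
        for x in chunk:
            self.data[self.write % self.capacity] = x
            self.write += 1

    def pop(self, n):
        out = np.zeros(n, dtype=np.float32)
        avail = min(n, self.write - self.read)   # underrun leaves trailing zeros
        for i in range(avail):
            out[i] = self.data[(self.read + i) % self.capacity]
        self.read += avail
        return out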

Real time image smudge tool in Pythonista:
A real-time smudge tool (see also my second post) works similarly to a filter or a reverb, except that it processes regions of an image along a brush stroke rather than bits of sound along an audio stream. @JonB's IOSurfaceWrapper (see below, and thanks to him) made it easy for me to code a real-time smudging tool (lots of comments in there as well):

https://gist.github.com/medericmotte/37e43e477782ce086880e18f5dbefcc8

It can be interesting to take a look at my previous approach, especially to compare their speeds:

https://gist.github.com/medericmotte/a570381ca8adfcec6149da2510e81da2

The difference, on my device at least, seems small at first glance, but when you smudge very quickly around a small circle, you will notice that with my previous approach the blue cursor can't keep up and ends up on the opposite side of your finger's circular motion, while with the current approach the cursor always stays perfectly in line with your finger.

JonB

so, it actually is possible to use the underlying AudioUnit parts of coreaudio, using ctypes.
https://gist.github.com/0db690ed392ce35ec05fdb45bb2b3306

This has been something I have wanted to do for a long time.
It is a very rough first cut, but you can override the render method of AudioRenderer to do what you want. iOS calls this method at a high rate (on my iPad, about 40 fps); you are given a buffer pointer and a number of samples (iOS decides how long it should be), and you fill in the samples.

in this example, i create tones based on finger location, and despite horribly inefficient code, it manages to keep up in real time and provide gapless audio.

Mederic

Thanks a lot! I am very impressed and I had no idea it was possible!

Just in case, as I mentioned real time image in my post, I actually coded a smudge tool in Pythonista and it is kind of real time but still laggy.

Basically I use a 2D numpy array representation of my image and constantly "blend" the portion around the cursor's prev_location onto the portion around the cursor's location. Then, at a given rate, I update the image of my ImageView by converting the numpy array to a ui.Image.
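
Schematically, that blend step is something like this (a simplified sketch; the patch size and blend weight here are just illustrative):

import numpy as np

# Simplified sketch of the blend step: mix the patch around the previous
# cursor position into the patch around the current one. The radius and
# alpha values are illustrative, and edge cases near the borders are skipped.
def smudge_step(img, prev_xy, cur_xy, radius=16, alpha=0.5):
    def patch(xy):
        x, y = int(xy[0]), int(xy[1])
        return (slice(max(y - radius, 0), y + radius),
                slice(max(x - radius, 0), x + radius))
    src = img[patch(prev_xy)]
    dst = img[patch(cur_xy)]
    if src.shape == dst.shape:   # only blend when both patches are full-sized
        img[patch(cur_xy)] = (1.0 - alpha) * dst + alpha * src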

From what I tested the lag seems to mostly come from the conversion.

I tried doing that with PIL Images instead of numpy but it wasn’t really faster. I also tried directly doing it in an ImageContext and it was actually slower.

My question is, is there a way to use Metal (or some other GPU computing API) with objc_util and ctypes to do that? And would it be faster?

JonB

what were you using to render the image?

I have never found a great way to do realtime image updates. It may be possible with some low level library, but up until now iOS hasn't really allowed direct access to screen buffers, except maybe in some low level video libraries (where you are passed a buffer and expected to fill it). That deserves another look.

A few things I have found:
1) Converting to low res jpg is a lot faster than, say, bit accurate png. I use that method in my first attempt at a matplotlib pinch/pan view:
https://github.com/jsbain/objc_hacks/blob/master/MPLView.py (see updateplt). In that method, I have a thread that updates the ImageView using a reused BytesIO (reusing it saves a little time). updateplt runs in a thread, has some locks to know when the conversion is complete, and basically tosses other calls, so that it updates at as fast a rate as possible without affecting UI responsivity. Also, for the updates while moving, I use a very low resolution jpg for rendering, which is much, much faster than doing bit accurate png. That allows for a pretty responsive feel, maybe 20 fps or so, I forget. Then it renders the full dpi after touch ends. (A minimal sketch of this jpg trick follows at the end of this list.)

2) While working on porting an Apple II simulator, I experimented with several ways of rendering to a ui.Image from a numpy array, very similar to what you want. Here's an example speed comparison:
https://gist.github.com/jsbain/1df982ee81e78ae8958b073fa7194a9c

At the time, I think I found that matplotlib.imsave was faster than PIL Image.fromarray, which is what is implemented in screen.py -- though I think in the latest Pythonista version, Pillow's fromarray is much faster. In the speed test, a 300x300 array is rendered to a ui.Image at about 12 fps on my old iPad 3 with Image.fromarray, versus 6 fps using matplotlib.imsave. I used a similar system that basically renders as fast as possible, and calls to update get queued/grouped if there is already an update in progress (you may be able to use screen.py directly, though you would probably want to switch over to the faster method). I am actually using a custom view draw rather than an ImageView, though I forget if there was a good reason for that.

3) For your smudge application, one thing you might consider is to have an ImageView with a custom view on top (similar to the applepy screen above) that only renders the portion of the view that has been touched. I.e. you keep track of which pixels are dirty, and only render the bounding box of those dirty pixels. You would keep track of the corner of that bounding box, so that you can then use .draw() with the right pixel offsets. Then, in the background, you would render the "big" image, maybe when the finger lifts, and reset the dirty pixels.

A variation on this would be to divide the image up into small chunks, and render a ui.Image for a chunk only when that section has been affected; then your draw() method would always ui.Image.draw all of the chunks at their proper locations. That avoids ever having to render the big image.
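
For reference, the low-res jpg trick from 1) boils down to something like this (a rough sketch; the quality value and conversion details are illustrative, and the real updateplt also does the threading and locking):

import io
import ui

# Rough sketch of the low-res jpg preview trick: render the PIL image into a
# reused BytesIO as JPEG and convert it with ui.Image.from_data. The quality
# value is illustrative; a real version would also thread/lock this.
_buf = io.BytesIO()

def fast_preview(pil_img, quality=40):
    _buf.seek(0)
    _buf.truncate()
    pil_img.convert('RGB').save(_buf, format='JPEG', quality=quality)
    return ui.Image.from_data(_buf.getvalue())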

JonB

Hmm, looks like IOSurface backing a CALayer might be an easy way to do what we want here, without going through an intermediate image... that will be tonight's experiments.

Mederic

I will clean my code a little bit and post a link later.

I had already tried that trick with the small imageView around the cursor. The thing is, when doing big strokes with the smudge tool, the small imageView isn't big enough (and making it bigger over time ends up causing lag, as when there is no small imageView), so I had to detect when the cursor leaves the small imageView area during the stroke, update the big image when that happens, and then move the small imageView back to the cursor. For some reason, it didn't really improve anything compared to not using a small imageView. Somehow, the big image updates were still expensive, and although I could make them happen less often by making the small imageView bigger, the small imageView updates would then cost more, so in the end there was no real improvement.

However, instead of a small imageView, directly using a custom view allowed me to ask the code to “convert the dirty portion of the numpy array and draw it” in one line in the draw def. Somehow it improved things a lot, but still caused time glitches/lag when the big image updates were needed.

Now I am experimenting with several small custom views, basically relaying each other when the cursor leaves their respective areas, so that I only have to update the big image when the last small view has been used. I am using 8 views and it's almost perfect.

I will try your variation though. It could definitely be perfect as well.

Btw, I use fromarray to render the image ;)

And I don't know anything about IOSurface and CALayer; I am (kind of) new to these kinds of libraries.

Mederic

So I cleaned up my code. I did my best but probably didn’t respect some conventions...
I wrote a lot of comments and explanations though.

On my iPad Pro 12.9 it is close to real time, although not as reactive as the Procreate smudge tool, which I find fantastic; but in my opinion (and taste) it is still better than a lot of smudge tools I have tried in different apps, so I am kind of happy with it :)

You can use the Apple Pencil by setting applePencil=True
You can see the debug mode by setting debug=True

https://gist.github.com/medericmotte/a570381ca8adfcec6149da2510e81da2

By the way, I tried the method where you split the canvas into several subviews arranged in a grid; it seems that having too many views at the same time also causes lag.

enceladus

Maybe try to use scene and shader.

Mederic

It might work, but then I'd still have to reload the texture as the numpy array changes. But maybe it would be faster.

To avoid the constant reloading I would have to compute the smudge effect directly in the OpenGL code, but that has two issues for me:
- The texture would have to be stored with float data, because smudging int8 causes some ugly spots around the white areas.
- I don't know how I would change the texture in real time directly within the OpenGL code. Do you know a way to do that? I thought textures were read-only here, but I do remember hearing about OpenGL image buffers; is that possible in Pythonista?

Mederic

There might be a simple way to do it with the render_to_texture function. I don’t know how fast it would be but I am gonna give it a try today.

enceladus

Look at Examples/games/BrickBreaker.py (particularly wavy option)

enceladus

FWIW my GitHub directory contains a few basic examples on scene and shader. https://github.com/encela95dus/ios_pythonista_examples

Mederic

Yeah, I'll try that, but again, it's the speed of render_to_texture() that will tell whether it's enough for real time. Because a function like wavy needs a texture of the image at frame n to display the image at frame n+1, but then I need to render that image to a texture so that the shader can process it and display the image at frame n+2, etc.

Mederic

Actually, now that I think about it, the problem is that scene and shaders compute their display only at 60 fps, and I think that's not enough, because for fast strokes you need to compute more often than that (otherwise you will have holes or irregularities between the smudge spots).

In my code I use a while(True) loop to compute the smudging (outside of the ui class), and its rate is only limited by the (very short) time numpy takes to add arrays.

By the way, I know it's not good to use while(True) loops that way, but I don't know what the good practice is to achieve the equivalent at the same speed. Because of that loop, for example, right now when I close the ui window it doesn't stop the code, and I need to stop it manually with the cross in the editor. What should I do about that?

Mederic

@JonB :

So, back to the topic of real-time audio: I modified your code to have a sawtooth instead of a sine, and then implemented a simple lowpass filter. There is an unwanted vibrato sound happening in the background at high frequencies, which is probably an aliasing behavior due to the program's inability to keep a perfect rate? I am not sure. If I set the sampleRate to 44100, the vibrato seems less important (which kind of supports my aliasing assumption? Again, not sure) but is still noticeable. Interestingly, I tried sampleRate=88200 and the unwanted vibrato was gone. The thing is, when you change the sampleRate, the filter actually behaves differently. Basically, a higher sampleRate with the same filter algorithm will tend to push its cutoff higher, so, for the comparison to be "fair", with an 88200 sampleRate I replaced the 0.9 in the render method below by 0.95, and unfortunately the unwanted vibrato was back :(

I also thought maybe it was a problem with the data precision and error accumulation so I tried scaling up the data in the render method and renormalizing it in the end for the buffer but that didn’t fix the issue.

To hear the unwanted vibrato with an 11000 sampleRate, all you need to do is add an attribute

self.z=[0,0]

in the AudioRenderer class and then change the render method this way (to have a filtered sawtooth):

def render(self, buffer, numFrames, sampleTime):
    '''override this with a method that fills buffer with numFrames'''
    # print(self.sounds, self.theta, v.touches)
    # The scale factor was to try to win some precision with the data. scale=1 means it doesn't scale.
    scale = 1
    z = self.z
    for frame in range(numFrames):
        b = 0
        for t in self.sounds:
            f, a = self.sounds[t]
            theta = self.theta[t]
            # dTheta = 2*math.pi*f/self.sampleRate
            dTheta = (f * scale) / self.sampleRate
            # b += math.sin(theta) * a
            b += ((theta % scale) * 2 - scale) * a
            theta += dTheta
            # self.theta[t] = theta % (2*math.pi)
            self.theta[t] = theta % scale
        z[0] = 0.9 * z[0] + 0.1 * b
        z[1] = 0.9 * z[1] + 0.1 * z[0]
        buffer[frame] = self.z[1] / scale
    self.z = z
    return 0

JonB

@Mederic Re: rendering numpy arrays, iosurface/calayer is amazingly fast:

Here is an iosurface wrapper that exposes a numpy array (w x h x 4 channels) and a ui.View:
https://gist.github.com/87d9292b238c8f7169f1f2dcffd170c8

See the notes regarding using .Lock context manager, which is required.
Just manipulate the array inside a with s.Lock(), and it works just like you would hope.

On my crappy ipad3, I get > 100 fps when updating a 50x50 region, which is probably plenty fast.

edit: i see you are using float arrays. conversion from float to uint8 is kinda slow, so that is a problem.

JonB

@Mederic regarding while True:

doing while v.on_screen:
(or at least checking on_screen inside the loop) is a good way to kill a loop once the view is closed.
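
For example (a minimal sketch; do_one_step stands in for the actual smudge computation):

import threading
import time

# Minimal sketch: the worker loop checks the view's on_screen flag, so closing
# the ui window stops the thread. do_one_step is a placeholder for the real work.
def do_one_step():
    pass

def worker(v):
    while v.on_screen:
        do_one_step()
        time.sleep(0)   # yield so the UI stays responsive

# threading.Thread(target=worker, args=(v,), daemon=True).start()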

Mederic

Ok thank you.

I ran your code and it is very fast but I have a question (and as I am still not familiar with the libraries you use, it might take a while to figure out the answer on my own):

The printed fps is around 1000 on my iPad Pro.

Now, I computed the fps of my PythoniSmudge code and I realize it’s important to have two fps data here:

  • The computation fps of my while(True) loop was around 300
  • The fps of my Views (computed by incrementing an N every time a draw function is over) was 40

That is important because the first fps ensures that the smudge tool is computed often enough internally to avoid irregularities and holes along the path on the final image (nothing to do with lag), which is the case with a computation fps of 300; and the second fps ensures that my eye doesn't see lag on the screen, which is the case as soon as the view fps is above 30.

My question is, what does your fps=1000 measure exactly? It seems to only be the computation fps, but maybe I am wrong and it somehow includes the view fps as a part of it. I would really need to isolate the view fps, because that is really what causes the sensation of lag.

If 1000 really IS the view fps, then it's more than enough.

JonB

I believe it is the actual view FPS but you might want to increase N to get better timing. The redraw method should effectively block while data is copied over.

What you would do is have a single view, from the iosurface. You could try s.array[:,:,0]=imageArray, but that may be slow since it must copy the entire image.

Better would be to determine the affected box each touch_moved, then only copy those:

with s.Lock():
    s.array[rows, cols, 0] = imageArray[rows, cols]

(where rows and cols are indexes to the affected pixels)

To keep monochrome, you would want your imageArray to be sized (r, c, 1)
to allow broadcasting to work:

with s.Lock():
    s.array[rows, cols, 0:3] = imageArray[rows, cols]

This way you only copy over and convert the changed pixels each move.
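
Putting the pieces together, the per-move update might look roughly like this (s is the IOSurface wrapper from the gist, imageArray an (h, w) float image with values already in 0-255, and the brush radius is illustrative):

import numpy as np

# Rough sketch of the per-move update: compute the bounding box of the stroke
# segment (padded by the brush radius) and copy only that region into the
# surface array, broadcasting the monochrome values into the RGB channels.
def copy_dirty_region(s, imageArray, prev_xy, cur_xy, brush=16):
    h, w = imageArray.shape[:2]
    x0 = max(int(min(prev_xy[0], cur_xy[0])) - brush, 0)
    x1 = min(int(max(prev_xy[0], cur_xy[0])) + brush, w)
    y0 = max(int(min(prev_xy[1], cur_xy[1])) - brush, 0)
    y1 = min(int(max(prev_xy[1], cur_xy[1])) + brush, h)
    rows, cols = slice(y0, y1), slice(x0, x1)
    with s.Lock():
        s.array[rows, cols, 0:3] = imageArray[rows, cols][:, :, None]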

JonB

By the way... You might get acceptable performance with your original code if you use pil2ui with a jpeg instead of png format during touch_moved, then switch over to the png during touch_ended.
Also, you might eke out some performance by using a single overlay view, but rendering N ui.Images that are drawn during the view's draw method. That way you don't have the overhead of multiple views moving around. You would keep track of the pixel locations. See ui.Image.draw, which lets you draw into an image context. I think draw itself is fast, if you have the ui.Images already created.

That said, the iosurface approach should beat the pants off these methods.

Mederic

Ok, I am going to give it a try. Regarding the N ui.Images method, I actually did that before and it was lagging. I think that's because at every frame it dynamically draws the N ui.Images, as opposed to my current approach, where at each frame the set_needs_display() method is called for only one miniview and the other ones are just "inactive" or "frozen".

Also, I got a big improvement by only sending a set_needs_display() request every 3 touch_moved.

Mederic

Regarding the use of a float array: it's kind of necessary for the smudge to look good; otherwise, with int8, you get ugly, persistent spots around the white areas. What causes that is that if, for instance, you have a pixel of value 254 next to a pixel of value 255 and smudge over them, then at the first frame the 254 pixel will try to become, say, 254.2, but as it is an integer it will stay equal to 254, and the same thing happens at the second frame, the third frame, etc. It keeps trying to move toward 255 but fails and gets absorbed back to 254. In the end, the smudge won't have affected it at all, and worse: it will stay equal to 254 no matter how many strokes you make over it. On the other hand, if you use floats, then at the first frame the 254.0 pixel becomes 254.2 (and gets rounded to 254 for display, but stays 254.2 in the array), at the second frame it becomes, say, 254.4, then maybe 254.55, which is displayed as a 255 pixel, so the smudge really has affected it correctly.
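
A tiny demonstration of that trap, with an illustrative 0.1 blend weight toward a 255 neighbour:

import numpy as np

# The int8 pixel keeps getting truncated back to 254, while the float pixel
# slowly accumulates the change and eventually rounds up to 255 for display.
p_int = np.uint8(254)
p_float = 254.0
for _ in range(20):
    p_int = np.uint8(0.9 * p_int + 0.1 * 255)   # 254.1 truncates back to 254
    p_float = 0.9 * p_float + 0.1 * 255         # creeps toward 255
print(p_int, round(p_float, 2))                 # 254 vs ~254.88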

Mederic

I tried with IOSurface, and it’s really extremely fast!

I didn't have to change my code too much, so I will take a few minutes to clean things up and post a link!

Thank you!!!

Mederic

Here it is!!!

https://gist.github.com/medericmotte/37e43e477782ce086880e18f5dbefcc8

It made the code so much simpler and faster!

Thank you so much!

PS: Have you seen my post above about the “aliasing” vibrato in the real time audio buffer code? I don’t want to take too much of your time but now that one problem has definitely been fixed, I kind of hope the same for audio :)

JonB

I have not run the audio code yet.. but two possibilities:
1) Precision issue. The samples are float32, not double. For filtering you probably want to work in doubles before writing.
2) Overrun -- if your code falls behind, iOS will skip frames. There are some fields in the timecode structure that help tell you the time at which the buffer will start, etc., but I haven't dug into them.
Going to a high sample rate means your code has less time to produce the same number of samples, increasing the chance of overrun. You could compare the time that render takes to numFrames/sampleRate -- render time should be less than, say, 80% of the actual audio time (a rough sketch of that check is at the end of this post). That's why I started with a low sample rate.

I tried speeding things up with numpy, but got bad results.. care needs to be taken with how time is treated. Since frequency and amplitude change discretely, there might be a better design that ensures continuity of samples.

3) Have you tried writing your samples to a wave file and then playing it back? I.e., are your filter and logic set up correctly?

Also, for sawtooth, I would think scaling the amplitude correctly is super important, because the signal must stay between -1 and +1, otherwise you saturate and that will produce harmonics. I haven't really looked at your code, but it might be worth mocking up the code to write to wave and see.
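
Something like this rough sketch (attribute names follow the AudioRenderer example loosely) would show how close render is to that budget:

import time

# Rough sketch: time one render() call and compare it to the amount of audio
# time it produces; warn when it uses more than ~80% of that budget.
def timed_render(self, buffer, numFrames, sampleTime):
    start = time.perf_counter()
    result = self.render(buffer, numFrames, sampleTime)
    elapsed = time.perf_counter() - start
    budget = numFrames / self.sampleRate
    if elapsed > 0.8 * budget:
        print('render used %.0f%% of its %.1f ms budget'
              % (100 * elapsed / budget, 1000 * budget))
    return result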

Mederic

1) The point of scaling the data was precisely to gain precision during the computation to take advantage of the exponent part of float32. But I will try with doubles.
2) I thought of overrun as well. Regarding numpy, I actually tried it when I was building a non-real-time synth, and I agree, it shouldn’t be used, even in non-real-time, because the computation is extremely non-parallel (numpy was very good in a parallelized version of my synth though, but in the end equivalent to the standard version with standard arrays)
3) The filter logic is ok (I've been using it for a long time in other apps, and also in my non-real-time Pythonista synth).
It's simply two one-pole filters in series, with the formula y[n] = 0.9*y[n-1] + 0.1*x[n]. And as you can see it's a safe barycentric (convex) combination, so as long as y[n-1] and x[n] are between -scale and scale, y[n] will be too, and it's then divided by scale, so it ends up between -1 and 1. I tried dividing by 1.2*scale just in case, but it didn't really improve things.

The only difference in my non-real-time synth was that, after having computed floats between -32768 and 32767, I didn't have to divide by 32768 at the end, since .wav wants that int16 range (I would then use the int() function to convert).

I mentioned an unwanted vibrato, but actually it's a vibrato only if you keep moving the finger on the ui toward the high frequencies; if the finger stays still, what you hear is an unwanted additional frequency (one that isn't a harmonic, and whose frequency oscillates when you keep moving the finger). That's why I thought about aliasing, because that's typically what it does.

Mederic

@JonB Ok, now I realize the issue was actually there even without the filter. I went back to my non-real-time synth, took out all the effects, played a simple high-pitched sawtooth, and the unwanted vibrato/"ghost frequency" was there. I am pretty sure now it is aliasing. I solved the problem using the PolyBLEP method; see Phelan Kane's article and the code sample at the end:
http://metafunction.co.uk/all-about-digital-oscillators-part-2-blits-bleps/

Here are my current modifications of @JonB audio processing buffer files, that you can get at:
https://gist.github.com/87d9292b238c8f7169f1f2dcffd170c8

Here are the attributes to add in the AudioRenderer class:

filtersNumber = 16
self.filtersBuffers = [0.0] * filtersNumber
# cutoffParam=1 will mean no filtering
self.cutoffParam = 1

Here is the anti-aliased sawtooth and the 16 filters in the render method:

def render(self, buffer, numFrames, sampleTime):
    '''override this with a method that fills buffer with numFrames'''
    # print(self.sounds, self.theta, v.touches)
    fb = self.filtersBuffers
    for frame in range(numFrames):
        b = 0.0
        cut = self.cutoffParam
        for touch in self.sounds:
            f, a = self.sounds[touch]
            t = self.theta[touch]
            # replace 110.0 by f if you want to control the frequency and see there is no aliasing
            dt = 110.0 / self.sampleRate
            t += dt
            t = t % 1
            saw = 2 * t - 1
            self.theta[touch] = t
            if t < dt:
                t /= dt
                saw -= t + t - t * t - 1.0
            elif t > 1.0 - dt:
                t = (t - 1.0) / dt
                saw -= t * t + t + t + 1.0
            b += saw * 1
            # the a control (from 0 to 1) is used to change the cutoff by setting:
            lerpfact = 0.2
            cut = (1 - lerpfact) * cut + lerpfact * a
        # set the first filter input to b = sawtooth wave
        input = b
        # loop over the filters:
        for f in range(len(fb)):
            fb[f] = (1 - cut) * fb[f] + cut * input
            input = fb[f]
        buffer[frame] = input
    self.filtersBuffers = fb
    self.cutoffParam = cut
    return 0

It's set up to have a fat filtered 110 Hz saw bass (showing the filter works in real time, despite a few glitches), but if you replace 110 by f to control the frequency, you can check that there is no aliasing even at high frequencies.

You can find the modified files here:
https://gist.github.com/medericmotte/b523acbc1c446ca889e7471afa5a9b2f

Mederic

@JonB How would I go about getting a stereo buffer with your code? Most of my sound is mono, but at the very end I like to add two delay effects, one per ear, with different parameters to give a sensation of space, so I would like to end the render loop with something like
buffer[frame] = [out1, out2] or something similar.

JonB

here is a cleaned up idea, whereby there are different generators. Each generator has an amplitude and base frequency, and an internal numpy buffer, which is used as a circular buffer. So we can make use of numpy rather than filling one sample at a time, which should be more performant.

Samples get buffered based on a time stamp -- and get generated in touch_moved or update. But phase always increments on a per-sample basis -- so when we fill the buffer, we just fill what we can. The render method of the audio unit then simply pops the number of requested samples out of the circular buffer and writes them into the audio unit buffer, so that part should never overrun.

I still get some dropouts, though you can pre-charge the buffer more in touch_began. The debug text is showing number of buffer underruns, overruns, and current number of buffered samples.

This approach may result in some latency, but it's better at preventing tearing.

I offer a sine wave, sawtooth and triangle wave; my plan was then to implement a filter decorator/function that lets you define filter coeffs, but that is not done yet.
https://gist.github.com/jsbain/ed6a6956c43f3d8fd40092e93e49a007

The buffer, filter and mixing are all done as doubles; conversion to float happens at the end. I try to maintain phase continuity (though in retrospect I might be doing it wrong).

JonB

@Mederic for stereo,
streamFormat.mChannelsPerFrame = 1;

Would be set to 2 instead of 1. Some of the other fields would then get multiplied by 2, see
https://developer.apple.com/documentation/coreaudio/audiostreambasicdescription

In my more recent code, above, buffer in render_callback would be cast to a pointer to (c_float * 2 * inNumberFrames), in which case, when converting to an array, b in render could be accessed as b[idxChannel,:] for one channel.

You could either create the generators in stereo, or have different generators filling each channel.

Mederic

@JonB Thanks. I noticed you haven’t antialiased your sawtooth, hence I still hear the unwanted vibrato/ghost frequency in the high frequencies (although maybe less dominant).

Here is my previous code with only the antialiased sawtooth (you will notice that the ghost frequency is gone):

https://gist.github.com/medericmotte/d99357919ce0ed658e5fa6e3b9d82121

And here is my current code with antialiased sawtooth, vibrato, unison, chords, filter, and delay all in one render method:

https://gist.github.com/medericmotte/d8e81b7e0961006d7026f16cc195682c

With this inefficient "all computation in the render method" implementation, on my device I am able to play one LFO vibrato, 4 notes at a time (4 fingers), each triggering 4 detuned unison voices (so 16 antialiased sawtooths in total, plus one LFO sine, at the same time), plus 4 one-pole IIR filters in series, and a 1 second delay with feedback, all in one render method, with sampleRate=44100.

It works glitchless with this setting, but if I set the filtersNumber attribute to, say, an unusually high 16 poles, some glitches can be heard while changing the cutoff.

My initial idea, when creating this topic, was to fill the circular buffer with the numbers computed by my already programmed non-real-time synth, as they come (by 60Hz chunks). As you said, it may or may not introduce a bit of latency depending on the complexity of the synth, but it will at least prevent the glitching as long as I set the latency high enough. That’s what I am going to do next.

Regarding the use of numpy in this context: before I created this thread I had actually implemented two versions of my non-real-time synth, one standard serialized, and the other using numpy in parallel.

For the parallel one, the saw/chord/unison part was easy to parallelize, but the trick was to get a parallel algorithm for the IIR filters and feedback delays. The problem is that their output depends on its own past values, so they can't be parallelized as is. I managed to get a parallel algorithm by truncating the infinite expansion of their transfer function, for instance by approximating 1/(1-az) = 1 + az + a^2z^2 + a^3z^3 + ... + a^nz^n + ... by 1 + az + a^2z^2 + a^3z^3 + ... + a^10z^10. Then it becomes a (10th order) FIR filter and can be implemented in parallel with numpy. A fast implementation of this approximation is the classic:

for i in range(approximationOrder):
    OutputVector = 1 + a*InputVector
    InputVector = OutputVector
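
In numpy terms, the idea looks roughly like this (a sketch using the 0.9/0.1 one-pole filter from above and an illustrative input chunk):

import numpy as np

# Sketch: the one-pole IIR y[n] = 0.9*y[n-1] + 0.1*x[n] has impulse response
# 0.1*0.9**k, so truncating it at order 10 turns it into an FIR filter that
# numpy can apply in one vectorized convolution.
a, order = 0.9, 10
h = 0.1 * a ** np.arange(order + 1)    # truncated impulse response

x = np.random.randn(1024)              # illustrative input chunk
y_fir = np.convolve(h, x)[:len(x)]     # parallel (vectorized) approximation

y_iir = np.empty_like(x)               # exact serial reference, for comparison
acc = 0.0
for n, xn in enumerate(x):
    acc = 0.9 * acc + 0.1 * xn
    y_iir[n] = acc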

Here is a test code for my serialized synth (the saw hasn’t been antialiased yet in this code):

https://gist.github.com/medericmotte/5330028059e3e94198a14b4f87a9189e

Here is the equivalent for my parallelized synth:

https://gist.github.com/medericmotte/85b0d3f9eb7bb30c03f87e5c0eb16322

On my device, to compute a 5 second sound (3*4 sawtooths + 16 one-pole filters), the serialized synth takes 2.9 seconds and the parallelized synth takes 1.2 seconds (with a 10th order FIR approximation of each IIR).

So they are kind of faster than real time with this simple setup, which makes me think that latency may not even be a problem in most cases.

Mederic

@JonB My last post has changed quite a lot during the day so I hope you read the last version :) Sorry about that.

JonB

Your previous anti-aliased sawtooth wouldn't even play on my old device, whereas my numpy version basically plays with very few glitches (I need to push an updated version; I forgot to reset the buffer in touch_ended, and the version I posted might have been playing only one buffer instead of adding). You could also do numpy within the render method; it sounds like maybe that is what you have done.

I have not tried it, but I was thinking it might be acceptable to over sample, then use a half-band type FIR filter (convolve should be the fastest way to implement filters in numpy), then decimate. You'd have to store enough old samples to pre charge the filter.

In my mental model, one would have generators, filters, mixers, that do the work on their own circular buffers, then output samples on demand.

JonB

By the way, numpy arrays have a roll method that I think does the same as your shift, but should be a bit faster because it is done in C. Both make a complete copy, which might be inefficient.

I think y = numpy.convolve(h, x, 'valid'), where h has the FIR filter coeffs, should be a lot faster than a python loop. The filter would need to internally keep len(h) original samples from the end of the previous signal. Also, depending on size, using fft might beat convolve, though maybe not for these really small chunks.
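
A rough sketch of that, keeping len(h)-1 samples of state so the output stays continuous across chunks (assumes len(h) > 1):

import numpy as np

# Rough sketch of block FIR filtering with convolve in 'valid' mode: keep the
# last len(h)-1 input samples from the previous chunk so each output chunk is
# exactly len(x) samples and continuous with the previous one.
class BlockFIR:
    def __init__(self, h):
        self.h = np.asarray(h, dtype=float)
        self.tail = np.zeros(len(self.h) - 1)

    def process(self, x):
        padded = np.concatenate([self.tail, x])
        y = np.convolve(self.h, padded, 'valid')   # len(y) == len(x)
        self.tail = padded[-(len(self.h) - 1):]
        return y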

Mederic

Ok, I downloaded my code and ran it, and it still works on my device, so it could be a matter of device, but I would be very surprised, because it's a very small additional computation. In my code I set sampleRate=44100; maybe that's the issue on your device. Just to check, could you take your working audiounittest.py file and, keeping sampleRate=11000, just replace

dTheta=2*math.pi*f/self.sampleRate
b+=math.sin(theta) * a
theta += dTheta
self.theta[t]=theta %(2*math.pi)

By

dTheta = 2*f/self.sampleRate
theta += dTheta
theta = theta % 1
saw = 2*theta - 1
self.theta[t] = theta
if theta < dTheta:
    theta /= dTheta
    saw -= theta + theta - theta*theta - 1.0
elif theta > 1.0 - dTheta:
    theta = (theta - 1.0)/dTheta
    saw -= theta*theta + theta + theta + 1.0
b += saw*a

because that is the only real difference. With an 11000 Hz sample rate the sawtooth will miss high harmonics and sound filtered, but there shouldn't be additional inharmonic frequencies due to aliasing.

Also, the aliasing issue isn't the same thing as the glitch issue. It's really a second, inharmonic frequency playing in the background, but only at high frequencies, and it has nothing to do with overhead, since I could hear it in my non-real-time synth (which first records my finger's motion on the screen in an array, and only then computes the whole sound at once).

Yeah I thought about fft and I agree, especially for very high order filters and large audio chunks. Regarding convolution, I also agree, I don’t know numpy very well and wasn’t aware of this numpy.convolve method.

Mederic

Your mental model is the modular environment model and also the one used by Apple’s Core Audio high level audio processing system.

One of my goals was to kind of bypass this model, to have the feeling of working directly on the audio data. But now that I think about it, what really bothered me initially was that in visual modular environments you can't easily have aliases and you can't do for loops. However, implementing your model would totally allow loops and aliases in the code when creating and patching the modules, so I think I would dig it.
Also, it's even more interesting since there are no text-based environments like ChucK or SuperCollider on the iPad, only visual modular environments.

About the numpy way. I tested my two versions with a smaller duration. Instead of 5 seconds I used 1/60s. Here are the average results

  • serialized: 0.00995s
  • numpy parallel, order 10 approximation (without « convolve » for the moment): 0.00814s

So on small chunks, the numpy approach still beats the serialized one even when convolve isn’t used.

JonB

no, your code works; my point was that on old hardware it is much, much slower, such that with all of the filtering and no buffering it was constantly overrunning (though it might also just have been my Pythonista acting up). When you were getting 300-1000 fps in the graphics, I was getting <60.

In my current approach, I am thinking it might also be a good idea to only compute samples to fill the buffer inside update, so that we can use as-long-as-possible numpy buffers, minimizing the overhead. Though you could also do this in the render of the audiounit -- there are ways, I think, to specify the minimum and maximum number of samples that the audiounit can ask for.

Mederic

Ok I was more referring to the antialiasing algorithm itself rather than the way I did it in the render method, I mean that if you use this algorithm in your numpy approach it would probably kill the aliasing ;)

Edit: About numpy, with an order 10 approximation (truncating the transfer function at the z^10 term), it still beats the serialized approach by 0.002s on a 1/60 s chunk (see above). But with order 30 it loses (without the convolve or roll methods, which I will try).

Mederic

I think that for pure oscillator modules it might be simpler to store a wavetable in the AudioRenderer and have the buffer loop around it, skipping ahead by self.freq indices in the table. The filter (only for this oscillator) could then simply be the program updating the wavetable (using numpy, for instance).

JonB

I have not run your latest stuff.. will try this weekend.

A few thoughts. First, you might consider generating the sawtooth via a weighted sum of sines, up to Nyquist (using coefficients from the Fourier expansion). Thus, you can perfectly bandlimit, with a small number of additions and without the headache or distortion of filtering. I.e., for a 440 Hz fundamental, summing through the 6th harmonic gets you to 14480; the 7th would be 28960, which would fold over if sampling at 44100. So you get a pure tone by effectively adding 6 numbers, probably faster than trying to do a 20-tap filter, though at the expense of 6x the number of sin calls.

I suspect much of the aliasing you are getting is this sort that you can't filter after sampling. When you go to 88kHz, I'm guessing the hardware has limits in the 40kHz range, so your filtering does help aliasing.

A low frequency sawtooth obviously requires more terms than a high frequency one, so this is not a constant-time operation, but that's probably ok. You can do this as a list comprehension, which accelerates it, or you could generate t as a matrix, then multiply the row vector of amplitude weights by sin(matrix), which does the weighting and summing in one efficient numpy step.
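
In numpy, that matrix version might look roughly like this (the weights are the standard sawtooth Fourier series coefficients; everything else is illustrative):

import numpy as np

# Rough sketch: bandlimited sawtooth as a weighted sum of harmonics up to
# Nyquist, done as one matrix step. Uses the standard sawtooth Fourier series
# (2/pi) * sum_k (-1)**(k+1) * sin(2*pi*k*f0*t) / k.
def bandlimited_saw(f0, sample_rate, num_samples):
    t = np.arange(num_samples) / sample_rate
    n_harm = int(sample_rate / (2 * f0))             # highest harmonic below Nyquist
    k = np.arange(1, n_harm + 1)[:, None]            # harmonics as a column vector
    weights = (2 / np.pi) * ((-1.0) ** (k + 1)) / k  # Fourier coefficients
    return (weights * np.sin(2 * np.pi * k * f0 * t)).sum(axis=0)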

If you are then trying to modulate (multiplying by some low frequency tone), there might be other considerations.

For arbitrary signals that you get out of nonlinear operations, one thought is you might be able to fine tune some filtering by oversampling, say 2 or 4x. Since numpy is efficient, that might still be small compared to filter overhead. And this would eliminate any real aliasing for signals higher than Nyquist of the output audio.

With numpy.convolve you'd end up throwing away samples, so you could do a custom convolve using a downsampled Toeplitz matrix. For oversample factor q, and FIR filter coeffs in h:

T = np.zeros((N, N*q))
for i in range(N):
    T[i, q*i:q*i + len(h)] = h
xf = T.dot(xs)

(With plain numpy arrays, * is elementwise, so use .dot() or the @ operator for the matrix multiply.)

Thus, the filtering op can be expressed as a Nx(qN) times qNx1 matrix multiply which should be efficient and vectorized in numpy. Your modified toeplitz can be precomputed if the number of samples is fixed, which might be possible to force the audiounit to do.

You have to manage the filter transients by saving len(h) samples from the end of the previous signal.

I think there is a rule of thumb that Ntaps = A/(22×B), where B = (fstop-fpass)/fs. So in this case, if we wanted 60 dB attenuation with a transition band of fs/4, that would be roughly 60/(22×0.25) ≈ 11 taps.

Mederic

@JonB Sorry, not sure if you have seen my latest post since you posted at the same time:

I think that for pure oscillator modules it might be simpler to store a wave table in the AudioRenderer and have the buffer loop around it by skipping every self.freq index in the table. The filter (only for this oscillator) could then simply be the program updating the whole wave table (using numpy for instance).

JonB

I thought about that.. you would be limited to specific frequencies, f=M/N*fs for integers M,N. If M is high enough, then yeah you eliminate all the sin calls.

Mederic

I like the sum of sines method because this was my first understanding of sound synthesis and for a long time I wanted to implement everything that way and with gpu computing. Then I learned about DSP and the whole poles and zeros thing and kind of fell in love with it. But I might try the additive thing again with numpy.

But as you said, it is a lot of sines for low frequencies, especially when you realize that sawtooths are the most beautiful in this range (for bass sounds etc.); and with unison it multiplies the number of sines to add together, and with chords it's even worse.

JonB

Ok, I see what you mean.. yes each generator can produce an antialiased signal, either by direct synthesis (sum of sin's) or by precomputing a filtered oversampled version that can work at multiple fixed freqs.

For linear mixing, just adding signals, that would be enough to produce clean sound, though maybe one has to watch out for clipping.

For x(t)*y(t) type mixing, you get frequency multiplication, so you need to either prefilter the antialiased signals again, or oversample, mix, then filter/decimate. Or some combo... If you antialias first, you may miss real content: for instance, multiplying a 44000 Hz sine with a 44444 Hz sine produces real tones at 444 and 88444, but if you antialias first you would get silence.

Mederic

Also, I am not totally sure sum of sines is equivalent to DSP filters. I mean, in a stationary state it IS equivalent, but when you modulate the cutoff the output signal is in a non stationary state and I don’t think the additive synthesis is then equivalent to the DSP.

I am sure I wrote that but I must have deleted it:

The polyBLEP method can really be seen as adding to a hard-angled sawtooth the right residue to kill the hard angle and put a soft angle in its place. That is also what filtering does, but here it's not really that: it's an analytic method, so it can be computed formally/analytically for any frequency without storing anything. It's just a 2nd order polynomial added to the sawtooth.

Saw being the hard angle sawtooth, theta being its phase, and dTheta the frequency/sampleRate:

if theta < dTheta:
    theta /= dTheta
    saw -= theta + theta - theta*theta - 1.0
elif theta > 1.0 - dTheta:
    theta = (theta - 1.0)/dTheta
    saw -= theta*theta + theta + theta + 1.0

And you’re right about the x(t)*y(t) part. But it’s kind of an extreme case you took. Most of the time, this kind of multiplication happens between an osc and an lfo.

Personally, for my usage, I’d like to stay minimalistic so not complicate the code too much to take into account extreme/rare situations. My goal is to tweak the code creatively during the music composing process, so I need a base-code that is simple and tweakable.

I will focus on the numpy pre-buffering idea and the convolve method. Right now I am worried about the fact that I would need to compute the impulse response each time the filter parameters are changed...

If I use my 16-pole filter with a 10th order approximation of each one-pole IIR in it, it means I have to compute a 160-coefficient impulse response every time a parameter is changed. I wonder if that's really manageable for fast filter sweeps. One solution might be to precompute these 160 coefficients for each value of the filter's parameters, so a 160xN matrix if only cutoff, but a 160xNxM matrix if cutoff+resonance...

I’ll probably end up not using convolve because of that. But I am planning to add an additive synthesis functionality so that a special additive generator can emulate a sawtooth+filter. Even if it’s not mathematically equivalent when the cutoff is modulated, it’s probably musically interesting ;)

Mederic

And regarding what you said about only having frequencies of the form M/N * fs: a solution is to keep track of the phase theta (as a precise and continuous number, not necessarily a multiple of 1/fs), and then have the render method look at the wavetable either at the closest index to theta (so [theta*fs]), or better, by interpolating between this index and the following value in the table. Either way, keeping track of the exact phase should give you any frequency; the table is just a lookup table.
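
A sketch of that lookup (the table size and the naive sawtooth table here are just placeholders):

import numpy as np

# Sketch: wavetable oscillator with a phase accumulator and linear
# interpolation, so any frequency is reachable, not just fs/N multiples.
table_size = 4096
table = 2.0 * np.arange(table_size) / table_size - 1.0   # naive sawtooth table

def render_chunk(phase, freq, sample_rate, num_frames):
    # phase is kept in [0, 1) and advances by freq/sample_rate per sample
    phases = (phase + np.arange(num_frames) * freq / sample_rate) % 1.0
    pos = phases * table_size
    i0 = pos.astype(int) % table_size
    i1 = (i0 + 1) % table_size
    frac = pos - np.floor(pos)
    out = (1.0 - frac) * table[i0] + frac * table[i1]    # linear interpolation
    new_phase = (phase + num_frames * freq / sample_rate) % 1.0
    return out, new_phase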

JonB

Maybe I misunderstood your filter.. are you high pass filtering or lowpass?

I am not a digital music person, but I see now you are going for "supersaw" type effects, as opposed to "pure" sawtooth.

http://www.ghostfact.com/jp-8000-supersaw/
So high-pass filtering to keep the low frequencies clean is what you are after -- meaning you need to change the filter as the frequency changes (or have a bank of filters that you select depending on frequency). BLEP is more about shaping things in the time domain?

Still, doing things vectorised, an antialiased supersaw would be 7 tones × maybe 10 octaves at most; it might still be okay if you do it as a matrix operation (no loops).

Another approach would be to use GPU processing. It wouldn't be too hard to write a shader that computes hundreds of filters in parallel, and you just select the one that you need, though moving data in/out now becomes the limiter.

Mederic

I am lowpass filtering. Not a huge fan of high frequencies in general. Love the warm low ends.

What might be confusing in my code is that I don’t use standard algorithms. I have experimented a lot to find my own preferences in filtering and everything :) So my algo probably doesn’t correspond to standard stuff, but is still based on the zeros/poles theory.

There are several components in my personal synth:
- A pure antialiased sawtooth.
- A unison: basically duplicating the mentioned sawtooth 4 or 6 times and detuning/dephasing the copies.
- A filter. I actually like the unconventional 16 poles because of the way you can get back an almost perfect sine by setting a low cutoff.
- A slight vibrato applied to every frequency.
- A stereo delay which, combined with the vibrato, creates a very minimalist but beautiful and pure reverb effect (I don’t like convolution reverbs, too resonant).

I am not necessarily going for a supersaw like the ones you hear in techno. I tend to play some parts with no delay/unison and just a dry low sawtooth with filter cutoff modulation, and some other parts with delay+unison to get the feeling of a big space and ensemble.

Not a fan of complex patches. I like the raw, minimalistic sensation of electric tension vibrating in wires. The sawtooth provides that feeling. The filter allows me to stay on the edge between "too soft" and "too aggressive" in a human, expressive way, to get that "breaking through the air" sound and that tension :)

So basically I don't want to make the sound more interesting by the way it's built, but by the way I play it and modulate it with expressivity. That's why real time is important to me.

My musical genre is closer to movie music than techno, so not aggressive, lots of low ends and long chords. Kind of an orchestral synth feeling.

Actually, I've been doing sound synthesis in modular environments for a long time and I know now what I like. The only part I was really missing was access to a circular buffer. And now I am just trying to polish the process to have as few glitches as possible and no aliasing, given what I like. I think it will come down to a bit of precomputing/latency and well-placed numpy (or GPU) computing, with the kind of ideas you mentioned.

There is something beautiful in the zeros/poles theory of DSP. I like to think about filters in terms of placing the zeros and poles rather than in terms of actual cutoff frequency and resonance. In that approach, I am not really interested in the actual cutoff frequency, because I tend to modulate it a lot anyway... so not a huge fan of standardized lowpass filter algorithms.

But I do like the idea of defining the shape of the frequency response I want and summing sines according to it, because it’s another kind of freedom and because I also like additive synthesis anyway (organ, pads stuffs).

JonB

@Mederic I realized that in my last post I was an idiot and confused harmonic vs octave. So, obviously, 10 tones is insufficient for a 20 Hz sawtooth...

Haven't read your latest post yet, but I just realized this...

JonB

Take a look at the generator code, you could still do all of your work there in one place, to take advantage of buffering.

One thing I don't quite get in your approach.. if you're lowpass filtering after sampling, that should not affect aliasing -- you are already aliased with a sawtooth, and that will roll over and produce low frequency garbage. If you added noise before filtering, I suppose it would help with quantization artifacts, but for 32 bit those are reasonably low, I think. The sawtooth smoothing with the polynomial would help with actual sampling aliasing. The filter is certainly shaping your sound, but it is not antialiasing... Am I missing something?

Mederic

@JonB I never said I used the filter for antialiasing. Maybe I was being confusing, because I started talking about the aliasing problem at the same time as I talked about the filter, but they have nothing to do with each other in my code.

The only thing I use to antialias is the polyBLEP method (the polynomial residue), and it doesn't just help: it works really well (although it doesn't technically kill frequencies above Nyquist, it weakens them enough to be inaudible).

That being said, theoretically speaking, there is actually one (theoretically perfect) filter that WOULD antialias a sound with base frequency freq: it's, by definition, the one with frequency response g(n*freq) = (n*freq < Nyquist). The corresponding signal the filter would need to convolve its input with is then

IR(t)=sin(freq*t)+sin(2*freq*t)+sin(3*freq*t)+...+sin(Nyquist*t)

(Here I assumed that Nyquist is a multiple of freq, and I omitted the 2pi factors) This function can be computed using sin(x)=Im(exp(ix)) and the geometric sum formula. I am gonna omit the 2pi factors here again.

IR(t)=sin((freq+Nyquist)*t/2)*sin(Nyquist*t/2)/sin(freq*t/2) 

Anyway, it is a continuous-time function. In other words, you would have to convolve the filter's input with a continuous-time function, as opposed to the discrete impulse response that DSP filters use. That's why DSP filters can't do (perfect) antialiasing, and that's why I am not counting on them; even if oversampling made them more time-continuous, they would still be time-discrete by nature (also, the form of their transfer functions shows that their frequency response loops around the unit circle, so any lowpass DSP filter ends up not killing some frequencies if you go high enough).

Mederic

And regarding the sum of sines: for a 100 Hz note you would need to go up to Nyquist = 22050, so approximately 220 sines to sum just for one note. If you want 7 detuned saws playing one note in unison, you need 7*220 = 1540 sines. If you want to play a 4-note chord with it in that range, you need approximately 1540*4 = 6160 sines to sum at the same time, each multiplied element-wise by some frequency response function evaluated in parallel on a 6160-sized vector (the frequencies), all that knowing that one might play notes quickly, so you can't compute things too far in advance. Do you think it could work? You know more than me when it comes to computational cost (I have a mathematical background and have done a lot of programming, but only as a hobby, so I still have a lot to learn).

Mederic

@JonB I think I need to take a step back and put into practice all these ideas we mentioned. There is a lot of information here, and I have a lot to learn (ctypes, objc_util, more about numpy, Apple Core Audio) and a lot of ideas to try (standard DSP, parallel approximation, convolution, sum of sines, pre-buffering), and I feel like I need to catch up with all that before we keep thinking about new ideas (the more ideas we get, the more work it will take to try them all) ;)

Also, I will try to code it first myself (as a learning exercise) before looking more at your code (as a reference/solution). I learn better that way.

I will get back to you when all of this is done. I am under the impression that you are also interested in coding a synth with Pythonista (correct me if I am wrong), but it seems that your vision is to have a complete modular environment, while I am (right now) really trying to do something minimalistic and customized for my personal (and subjective) preferences, so we will probably end up with two different codebases :)

JonB

I am not coding a synth. Mostly I have been interested in pushing the boundaries of high performance on pythonista, as a fun exercise, and trying to understand these various libraries. Making the iPad bleep is also kinda fun ;)

Mederic

@JonB Coming back sooner than I thought :)

I made an interesting experiment with your code, and there is a phenomenon I am not sure I understand correctly.

If you take your audiounittest.py code (the very first one) and play the sine while modulating the frequency very fast over the whole range, you will notice quantized modulation. Now take your other code, audiounittest2.py, set it to a sine, and do the same thing: it's a perfectly smooth frequency modulation. Do you hear the difference? It's not really audio glitches like those caused by overheading; it's really "quantized" frequencies.

Here is what I tried to fight that. In the render() method of audiounittest.py, I stored the last used frequency in the previous render() call in a self.previous_frequency attribute.

Then, at the beginning of render(), I assign self.previous_frequency to a prev_freq variable and the frequency from self.sounds[touch_id] to a current_freq variable. Then, during the "for frame in range(numFrames)" loop, I interpolate between prev_freq and current_freq. That killed the frequency quantization effect; it's the only way I could get the perfectly smooth modulation I was hearing in audiounittest2.py.
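
Reduced to a single voice, the idea is roughly this (names are illustrative, not the exact render code):

# Sketch of the per-chunk smoothing, reduced to a single voice. prev_freq is
# the frequency used at the end of the previous render() call, cur_freq the
# latest value from touch_moved; phase is kept in [0, 1).
def fill_chunk(buffer, numFrames, prev_freq, cur_freq, phase, sampleRate):
    for frame in range(numFrames):
        t = frame / numFrames
        f = (1.0 - t) * prev_freq + t * cur_freq   # linear ramp across the chunk
        phase = (phase + f / sampleRate) % 1.0
        buffer[frame] = 2.0 * phase - 1.0          # naive sawtooth for brevity
    return phase                                   # carry the phase into the next call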

The problem was solved, but I still wanted to understand the issue. I first thought it had to do with the touch_moved() method getting slowed down in audiounittest.py by the less efficient implementation compared to audiounittest2.py, but I just timed them and both touch_moved rates are around 110 Hz. In other words, the sounds[touch_id] attribute changes as often in audiounittest.py as in audiounittest2.py. So that doesn't explain the quantization.

Here is my current guess.

In audiounittest2.py, you compute the samples perfectly without missing any frequency values because the touch_moved() method is actually calling the generator and sending to it the frequency as a parameter. So no frequency is missed by the generator.

On the other hand, in audiounittest.py, the render() method generally fills the buffer faster than the duration of the buffer itself. In other words, there is always a time during which the render() method "waits" before filling the buffer again. The consequence is that when it fills the buffer, it only accesses the first few frequency values occurring during that very short time and uses them to fill the whole buffer. Then, during the waiting time, it misses all the other frequency values. Btw, I don't think this falls under the scope of overheading. To me overheading is the opposite (the render() method being too slow), but here it would be kind of too fast.

What do you think about this? Am I understanding the issue correctly?

JonB

Check how many samples render is asking for (display numFrames). In my version, it asks for 1024 at a time for 44100 sample rate. So, if you move frequency over 8000 Hz in half a sec, (22000 samples), it will be quantized into 8000/22 Hz chunks.

The buffered approach updates the buffer (up to present time, at least using time.perf_counter) for every touch moved, and within update. So over 0.5 sec you might get several hundred touch events, and updates every few msec.

https://gist.github.com/6ccd9ad8ba95c373ec7d76ceaf9061bc has some minor corrections, adds some diagnostic printouts, and pulls all the ctypes garbage into a separate file.

So, even if you don't use the modular generator approach, you might consider using a custom generator (i.e subclass ToneGenerator and handle the logic within the buffer filling methods.

It is also theoretically possible to force the audiounit to ask for data more frequently. https://developer.apple.com/documentation/audiotoolbox/1534199-generic_audio_unit_properties/kaudiounitproperty_maximumframesperslice?language=objc

maxFrames = c_uint32(256)
err = AudioUnitSetProperty(toneUnit, kAudioUnitProperty_MaximumFramesPerSlice,
                           kAudioUnitScope_Global, 0, byref(maxFrames), sizeof(maxFrames))

however, this didn't work when I tried it.

Mederic

1024 as well for me. Please refer to audiounittest.py or audiounittest2.py because they are both your version :)

I agree with you about the 8000/22Hz computation but it’s only true if my assumption (the render() method filling the buffer way faster than the buffer’s duration) is true.

1) If the render() method took 0.02 s to fill a 0.02 s buffer, then, as it accesses the sounds[touch_id] attribute at each iteration of the "for frame in range(numFrames)" loop, it would be accessing the correct frequency values in real time and at the right time, and fill a 0.02 s buffer with frequency values corresponding to 0.02 s of modulation, so no quantizing should occur, at least not more than in audiounittest2.py, knowing that touch_moved updates occur at 110 Hz even for the fastest moves in both codes, so the quantization would happen at a 110 Hz rate, which is not noticeable and sounds smooth. (Even if you remove the computation in the update part of audiounittest2.py you won't notice quantization.)
2) If, on the contrary, the render() method takes 0.001 s to fill a 0.02 s buffer, then, even if it accesses the sounds[touch_id] attribute at each iteration of the "for frame in range(numFrames)" loop, it does so during a 0.001 s window, thus accessing frequency values corresponding to that very small period of time and using them to fill the whole 0.02 s buffer. So it is almost as if it only took the very first frequency of the 0.02 s and used it for the whole chunk, resulting in a 1/0.02 = 50 Hz quantization. Somehow that is noticeable on fast modulation (like 30 fps vs 15 fps in graphics when there is fast motion, I guess).

Sorry for the redundancy but I wanted to be more precise in what I meant.

Also, I do intend to have a prebuffered approach in the future, I am just studying the differences in order to learn and figure out more exactly how the render() thread works.

Mederic

@JonB, just to keep you up to date: I finally took my initial non-real-time synth and merged it with audiounittest. It didn't take long, because my synth was actually already computing 60 Hz chunks of sound data in the update method (of a Scene) and appending them to an out array. I had done that in the past, back when I was trying to achieve real time by continuously writing to a .wav file at each update (that's when I realized it couldn't really be done right and came here to create this thread about the audio circular buffer functionality).

So it was already all set up for real time and the circular buffer! The only thing I had to do was to not write to a .wav file, place the out array in the AudioRenderer's attributes, and have the render() method start reading it after waiting a fixed latency attribute. Weirdly, latency=0 actually works glitchlessly for me :) I guess that's because the first render() call doesn't produce anything, in which case the first audio chunk is kept null by default, and then the render() thread naturally stays one chunk behind the update() method.

Of course, later I will have to use some kind of deque or circular array for the out attribute, because right now it is basically recording the whole performance without deleting anything, but I am already really happy to have been able to take my non-real-time synth code from before and almost just copy/paste it into audiounittest.py! When you think about it, it's very similar to what I had to do with the smudge tool: I almost didn't change my code and basically merged it with your IOSurface wrapper! It's interesting how one can just plug a pure Python script into these features (the IOSurface wrapper and AudioRenderer) and have it work perfectly :)

And I still intend to take advantage of numpy in multiple places.

JonB

@Mederic
I remembered reading about the Accelerate framework. Might have some useful bits for you

https://developer.apple.com/documentation/accelerate/vdsp?language=objc

In particular: vector polynomial evaluation (though I imagine numpy is competitive here), IIR filters in biquad form, and some other goodness that would be in scipy but not numpy.

Mederic

@JonB Sorry for the late answer. Thank you for this. I won’t have the time to look at it for a while because I have put other stuff on pause for too long for this project and now I have to catch up.

Right now everything will just feel like a bonus/improvement to my current code because it’s already working very well on my device.

Btw, as an experiment I tried using numpy to compute sawtooths as sums of sines (in real time), and it quickly gets hard for my device to keep up. A few notes at the same time and there were overruns. To be honest I didn't try every possibility. I basically summed the sines with the appropriate weights by stacking them as rows in a matrix and left-multiplying this matrix by a row vector of the weights.

I think the only reasonable way to use the sum of sines decomposition is to precompute wavetables and then use them.

ClywdSlade

Hi...I had an issue with one of their first plugins in Tracktion - Micah was very helpful, and we got to the root of the issue. I'm sure if you contact him he'd appreciate the chance to investigate and fix, especially if there's an issue with a DAW as popular as S1.