[oe] [PATCH 04/10] fluidsynth: performance improvements

Fri Dec 1 17:52:33 UTC 2017

On Fri, Dec 1, 2017 at 9:35 AM, Andreas Müller <schnitzeltony at gmail.com> wrote:
> On Fri, Dec 1, 2017 at 3:49 PM, Khem Raj <raj.khem at gmail.com> wrote:
>>
>> On Fri, Dec 1, 2017 at 3:08 AM, Andreas Müller <schnitzeltony at gmail.com> wrote:
>> > * Use floats instead of doubles for sound calculations. This improves
>> >   performance notably and was the default for version 1.1.6 using autotools.
>> > * Fix buffer overrun when using floats
>> > * Make use of ARM NEON when multithreading is enabled
>> >
>> > Performance and sound correctness were tested with qtractor and a reworked
>> > version of fluidsynth-dssi [1-2]. Tests were performed with both single-
>> > and multithreading enabled.
>> >
>> > [1]
>> > https://github.com/schnitzeltony/fluidsynth-dssi/commit/bad09c6f5c5508c5f5330aa5188510f975e50c50
>> > [2]
>> > https://github.com/schnitzeltony/meta-qt5-extra/blob/master/recipes-misc/recipes-multimedia/fluidsynth/fluidsynth-dssi_1.0.0.bb
>> >
>> > Signed-off-by: Andreas Müller <schnitzeltony at gmail.com>
>> > ---
>> >  ...uffer-overrun-in-fluid_synth_nwrite_float.patch | 32 +++++++++
>> >  ...N-accelaration-for-float-multithreaded-se.patch | 76
>> > ++++++++++++++++++++++
>> >  .../fluidsynth/fluidsynth_1.1.8.bb                 |  8 ++-
>> >  3 files changed, 114 insertions(+), 2 deletions(-)
>> >  create mode 100644
>> > meta-multimedia/recipes-multimedia/fluidsynth/files/0001-avoid-buffer-overrun-in-fluid_synth_nwrite_float.patch
>> >  create mode 100644
>> > meta-multimedia/recipes-multimedia/fluidsynth/files/0002-Use-ARM-NEON-accelaration-for-float-multithreaded-se.patch
>> >
>> > diff --git
>> > a/meta-multimedia/recipes-multimedia/fluidsynth/files/0001-avoid-buffer-overrun-in-fluid_synth_nwrite_float.patch
>> > b/meta-multimedia/recipes-multimedia/fluidsynth/files/0001-avoid-buffer-overrun-in-fluid_synth_nwrite_float.patch
>> > new file mode 100644
>> > index 0000000..dda76cf
>> > --- /dev/null
>> > +++
>> > b/meta-multimedia/recipes-multimedia/fluidsynth/files/0001-avoid-buffer-overrun-in-fluid_synth_nwrite_float.patch
>> > @@ -0,0 +1,32 @@
>> > +From a13cb63103aa56b5e8bad816c7d13d6e01c0cd9f Mon Sep 17 00:00:00 2001
>> > +From: derselbst <tom.mbrt at googlemail.com>
>> > +Date: Sun, 26 Nov 2017 22:12:12 +0100
>> > +Subject: [PATCH 1/2] avoid buffer overrun in fluid_synth_nwrite_float()
>> > +
>> > +Upstream-Status: Backport [1]
>> > +
>> > +[1]
>> > https://github.com/FluidSynth/fluidsynth/commit/a13cb63103aa56b5e8bad816c7d13d6e01c0cd9f
>> > +---
>> > + src/synth/fluid_synth.c | 4 ++--
>> > + 1 file changed, 2 insertions(+), 2 deletions(-)
>> > +
>> > +diff --git a/src/synth/fluid_synth.c b/src/synth/fluid_synth.c
>> > +index 266d759..14f6b21 100644
>> > +--- a/src/synth/fluid_synth.c
>> > ++++ b/src/synth/fluid_synth.c
>> > +@@ -2752,10 +2752,10 @@ fluid_synth_nwrite_float(fluid_synth_t* synth,
>> > int len,
>> > +     {
>> > + #ifdef WITH_FLOAT
>> > +       if(fx_left != NULL)
>> > +-        FLUID_MEMCPY(fx_left[i + count], fx_left_in[i], bytes);
>> > ++        FLUID_MEMCPY(fx_left[i] + count, fx_left_in[i], bytes);
>> > +
>> > +       if(fx_right != NULL)
>> > +-        FLUID_MEMCPY(fx_right[i + count], fx_right_in[i], bytes);
>> > ++        FLUID_MEMCPY(fx_right[i] + count, fx_right_in[i], bytes);
>> > + #else //WITH_FLOAT
>> > +       int j;
>> > +       if(fx_left != NULL) {
>> > +--
>> > +2.9.5
>> > +
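To spell out what the one-token change in the backported fix does: assuming
fx_left is an array of per-channel buffer pointers (as the fixed line
suggests), fx_left[i + count] selects a different channel pointer, possibly
past the end of the pointer array, while fx_left[i] + count stays in channel
i and advances count samples into it. A minimal sketch; append_block() and
its parameters are hypothetical names for illustration, not fluidsynth API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n samples from src into channel ch of dst,
 * starting count samples into that channel's buffer. */
static void append_block(float **dst, int ch, int count,
                         const float *src, int n)
{
    /* dst[ch] + count : pointer into channel ch, offset by count samples.
     * dst[ch + count] : would pick another channel's pointer instead,
     *                   which is the overrun the patch removes. */
    memcpy(dst[ch] + count, src, (size_t)n * sizeof(float));
}
```

Called with ch = 1 and count = 4 on two 8-sample channels, this writes
samples 4..7 of channel 1 and leaves channel 0 untouched.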
>> > diff --git
>> > a/meta-multimedia/recipes-multimedia/fluidsynth/files/0002-Use-ARM-NEON-accelaration-for-float-multithreaded-se.patch
>> > b/meta-multimedia/recipes-multimedia/fluidsynth/files/0002-Use-ARM-NEON-accelaration-for-float-multithreaded-se.patch
>> > new file mode 100644
>> > index 0000000..0e1846e
>> > --- /dev/null
>> > +++
>> > b/meta-multimedia/recipes-multimedia/fluidsynth/files/0002-Use-ARM-NEON-accelaration-for-float-multithreaded-se.patch
>> > @@ -0,0 +1,76 @@
>> > +From 2de7e128fbdf528716b500cf27ed9a4358c931c9 Mon Sep 17 00:00:00 2001
>> > +From: =?UTF-8?q?Andreas=20M=C3=BCller?= <schnitzeltony at gmail.com>
>> > +Date: Fri, 24 Nov 2017 00:05:35 +0100
>> > +Subject: [PATCH 2/2] Use ARM-NEON accelaration for float-multithreaded
>> > setups
>> > +MIME-Version: 1.0
>> > +Content-Type: text/plain; charset=UTF-8
>> > +Content-Transfer-Encoding: 8bit
>> > +
>> > +Upstream-Status: Pending
>> > +
>> > +Signed-off-by: Andreas Müller <schnitzeltony at gmail.com>
>> > +---
>> > + src/rvoice/fluid_rvoice_mixer.c | 26 ++++++++++++++++++++++++++
>> > + 1 file changed, 26 insertions(+)
>> > +
>> > +diff --git a/src/rvoice/fluid_rvoice_mixer.c
>> > b/src/rvoice/fluid_rvoice_mixer.c
>> > +index 9616518..dbf8057 100644
>> > +--- a/src/rvoice/fluid_rvoice_mixer.c
>> > ++++ b/src/rvoice/fluid_rvoice_mixer.c
>> > +@@ -27,6 +27,10 @@
>> > + #include "fluid_ladspa.h"
>> > + #include "fluid_synth.h"
>> > +
>> > ++#if defined(__ARM_NEON__)
>> > ++#include "arm_neon.h"
>> > ++#endif
>> > ++
>> > +
>> > + #define ENABLE_MIXER_THREADS 1
>> > +
>> > +@@ -794,20 +798,42 @@ fluid_mixer_buffers_mix(fluid_mixer_buffers_t*
>> > dest, fluid_mixer_buffers_t* src)
>> > +   if (minbuf > src->buf_count)
>> > +     minbuf = src->buf_count;
>> > +   for (i=0; i < minbuf; i++) {
>> > ++#if defined(__ARM_NEON__) && defined(WITH_FLOAT)
>> > ++    for (j=0; j < scount; j+=4) {
>> > ++        float32x4_t vleft = vld1q_f32(&dest->left_buf[i][j]);
>> > ++        float32x4_t vright = vld1q_f32(&dest->right_buf[i][j]);
>> > ++        vleft = vaddq_f32(vleft, vld1q_f32(&src->left_buf[i][j]));
>> > ++        vright = vaddq_f32(vright, vld1q_f32(&src->right_buf[i][j]));
>> > ++        vst1q_f32(&dest->left_buf[i][j], vleft);
>> > ++        vst1q_f32(&dest->right_buf[i][j], vright);
>> > ++    }
>>
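For reference, the quoted loop steps j by 4, so it appears to rely on scount
being a multiple of 4; I have not checked what fluidsynth guarantees there.
Below is a rough sketch of the same add-and-store pattern with a scalar tail,
so that any length is handled. mix_add_f32() is a hypothetical name, and the
extra check for __ARM_NEON is my addition (it is the macro newer compilers
define). On non-NEON targets only the scalar loop is compiled:

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dest[k] += src[k] for k in [0, n). */
static void mix_add_f32(float *dest, const float *src, size_t n)
{
    size_t k = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats per iteration while a full vector remains. */
    for (; k + 4 <= n; k += 4) {
        float32x4_t d = vld1q_f32(dest + k);
        d = vaddq_f32(d, vld1q_f32(src + k));
        vst1q_f32(dest + k, d);
    }
#endif
    /* Scalar tail (or the whole range when NEON is unavailable). */
    for (; k < n; k++)
        dest[k] += src[k];
}
```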
>> I wonder whether gcc could be made to vectorize this code on its own.
>> Have you tried setting command-line options or maybe pragma hints?
>>
>
> I've sent patches like these to some projects. If I remember correctly, the
> first one was jack2 [1]. For that I wrote some performance tests [2] and also
> tested compiler vectorization, but with disappointing results. That was about
> a year ago, and I think intrinsics are still the better-performing option.
>
> [1]
> https://github.com/jackaudio/jack2/commit/77bb8be12e0d856fbc004cf57185ab36a2df04c2
> [2]
> https://github.com/jackaudio/jack2/commit/c32b823860bc8e887774ab10bc3e9bd76e85e3f3

Interesting. Does that mean gcc's vectorization performs poorly on ARM
but works OK on other architectures?
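
To make the question concrete, this is roughly the kind of plain loop I would
expect the vectorizer to handle. It is only a sketch: mix_add_scalar() is a
hypothetical name, and the restrict qualifiers plus the ivdep pragma are the
hints I had in mind, not something taken from the patch:

```c
#include <stddef.h>

/* Hypothetical scalar mixing step: dest[k] += src[k].
 * restrict promises the compiler that the two buffers do not overlap. */
void mix_add_scalar(float *restrict dest, const float *restrict src, size_t n)
{
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep   /* gcc-only hint: assume no loop-carried dependencies */
#endif
    for (size_t k = 0; k < n; k++)
        dest[k] += src[k];
}
```

Building with -O3 (or -O2 -ftree-vectorize) and -fopt-info-vec should report
whether the loop was vectorized. One thing that may be worth checking on
32-bit ARM: the gcc documentation says float operations are not generated by
the auto-vectorizer for NEON unless -funsafe-math-optimizations is also given,
because NEON does not fully implement IEEE 754 (denormals are treated as
zero). That might explain part of the difference.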