[OE-core] Hash Equiv Server experiment results

Joshua Watt jpewhacker at gmail.com
Thu Aug 22 17:29:34 UTC 2019


On Thu, Aug 22, 2019, 9:20 AM Richard Purdie <
richard.purdie at linuxfoundation.org> wrote:

> I wanted to summarise what my local tests with the hash server
> concluded.
>
> a) the opendb() changes I made in:
>
> http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=ca04aaf7b51e3ee2bb04da970d5f20f2c9982cb8
>    broke things as the database is opened for each request. I have
>    local patches to fix that but it only helps by about 10%.
>
> b) the overhead from the separate receive and handling threads is not
>    worth the overhead:
>
> http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=d40d7e43856f176c45cf515644b5f211c708e237
>    This probably halves throughput.
>
> c) I moved the database writes to their own thread with a queue but it
>    doesn't seem to help much other than allowing other threads to
>    handle requests in parallel.
>
> d) the ThreadedMixin regresses performance further and is the worst
>    change I've tested.
>
> e) Using ThreadPoolExecutor along the lines of:
>    self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=10)
>    self.executor.submit(self.process_request_thread, request,
> client_address)
>    doesn't help speed. It's faster than ThreadedMixin but slower than no
> threads.
>
> f) The profile data I have suggests we spend a lot of time in TCP/HTTP
>    header overhead and connection setup which confirms Joshua's
>    thoughts.
>

Based on the profiling I did, the overhead of the HTTP requests is also
substantial compared to the amount of actual body data sent.


> g) The most optimal setup is therefore the original server with no
>    threading.
>

Yeah, threading is *bad*. I don't know exactly why...


> h) The autobuilder would need to cope with 9000*40 requests in under a
>    minute, preferably faster. The current server does not have
>    anywhere near that speed.
>

I have tried a few improvements to the client and server model and was able
to get a 5x improvement in performance. On my desktop, this went from less
than 3,000 requests/sec to 15,000 requests/sec (both tests used 40
simultaneous clients). I believe this would allow even my underpowered
desktop to process these requests in under 30 seconds, let alone the
autobuilder server.


> My conclusion based on this is that we need to rewrite the way runqueue
> makes the hash computations, perhaps seeding the cache in advance so we
> minimise the number of single calls. We can make one query for all 9000
> entries to the server on a single connection/request, or batch in
> blocks of 1000 or similar.
>

Would that work? Aren't the hash calculations dependent on each other? My
concern here is that we might end up making many more than 9000 requests as
unihashes are changed by the server.
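For illustration, seeding a cache in blocks along the lines Richard
describes might look something like this (the client API and names here are
my assumptions, not the actual bitbake code):

```python
def batch(items, size):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def prefetch_unihashes(client, queries, block_size=1000):
    """Seed a local cache with unihashes for (method, taskhash) pairs.

    `client` is a hypothetical stand-in for the hash server client; a
    real implementation would send each block to the server in one
    round trip rather than making one call per pair.
    """
    cache = {}
    for block in batch(list(queries), block_size):
        for method, taskhash in block:
            cache[(method, taskhash)] = client.get_unihash(method, taskhash)
    return cache
```

That said, the dependency concern still applies: if a returned unihash
changes downstream taskhashes, some entries would need to be re-queried.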


> The other option would be a custom server/protocol but I think what
> we're using is fine, we just need to change how.
>

I've recently been investigating how to optimize the protocol for the heavy
request load experienced during signature generation. I have thought of a
few ways this could be done.

1) Use the HTTP Upgrade header to "convert" an HTTP connection into an
optimized byte-stream-based hash lookup protocol (this is effectively how
WebSockets work). This protocol would be a simple persistent connection
where the client sends a line to the server and the server replies with a
single line containing the unihash.

2) The same hash lookup protocol as before, but over an actual WebSocket.

3) A custom TCP protocol, perhaps some sort of JSON-based message protocol.
This protocol would also be able to enter a "stream" mode, just like the
previous two, for optimal transfer speed.

The trick here, of course, is that if you leave persistent connections
open, you need a way to service the streams fairly between clients, and you
can't use threads, since we've seen that they do not scale well. Python's
asynchronous API (async/await) can solve this and run everything on a
single thread.
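A minimal single-threaded sketch of that idea, using asyncio's stream API
(the one-line protocol and in-memory lookup table are illustrative
assumptions, not the real hashserv code):

```python
import asyncio

HASHES = {}  # taskhash -> unihash; stands in for the sqlite database

async def handle_client(reader, writer):
    # Each connection is a coroutine; the event loop interleaves all
    # open streams fairly on a single thread, so no locks are needed.
    while True:
        line = await reader.readline()
        if not line:
            break  # client closed the connection
        taskhash = line.decode("utf-8").strip()
        unihash = HASHES.get(taskhash, taskhash)
        writer.write((unihash + "\n").encode("utf-8"))
        await writer.drain()
    writer.close()

async def main(host="127.0.0.1", port=8686):
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```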

The three proposed solutions have their tradeoffs. Options 1 & 2 would
require third-party modules, as there is no asynchronous HTTP server in
vanilla Python. Option 3 can be done with vanilla Python (3.5 or later),
but loses the advantages we get from using HTTP, such as proxy traversal,
HTTPS negotiation, and others. I'm not sure how important these things are
anyway.
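For option 3, one vanilla-Python possibility is newline-delimited JSON
framing; the message fields below are made up purely for illustration:

```python
import json

def encode_msg(msg):
    """Frame one message as a single JSON object followed by a newline."""
    return (json.dumps(msg) + "\n").encode("utf-8")

def decode_msgs(buf):
    """Split a receive buffer into complete messages plus any partial tail.

    The tail (bytes after the last newline) should be kept and prepended
    to the next recv() so messages can span TCP segments.
    """
    *lines, rest = buf.split(b"\n")
    return [json.loads(line) for line in lines if line], rest
```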

One advantage here is that the runqueue changes are relatively simple and
isolated to the hash equivalence mixin class.


> I have some profiling code for the hashserver but its doing profiling
> per thread so doesn't integrate well until we decide what form the
> codebase should have. Simpler is looking to perform better.
>
> I haven't had a chance to work on these patches, nor will I over the
> next few days as I'm taking a break but I think I know what we need to
> do now.
>

Thanks a lot for looking into this, Richard. Have a good break. We can pick
it back up next week and try to figure out the best path.

Joshua Watt




> Cheers,
>
> Richard
>
>
>