Re: performance problem

From: Laurent Deniau <Laurent.Deniau@xxxxxxx>
To: "<luajit@xxxxxxxxxxxxx>" <luajit@xxxxxxxxxxxxx>
Date: Sat, 16 May 2015 09:11:48 +0000

On May 16, 2015, at 10:38 AM, Geoff Leyland <geoff_leyland@xxxxxxxxxxx> wrote:

On 16/05/2015, at 10:37 am, Laurent Deniau <Laurent.Deniau@xxxxxxx> wrote:

On May 15, 2015, at 8:30 PM, Geoff Leyland <geoff_leyland@xxxxxxxxxxx> wrote:

But why does the speed of expression creation matter so much?

Because we have to evaluate "track" routines millions of times, especially
when the expressions involve a mix of scalars and very small polynomials. I
really thought that the JIT would generate specialised code after some
iterations. But yes, if I do not succeed in influencing the JIT to optimise
better, I will have to think about some memoisation of the expressions and
placeholders as a backdoor.

Well, I’m not sure I’ll be that much help, but I’m intrigued: do you really
have millions of *different* expressions you need to construct *and* evaluate
millions of times, or just a large number of expressions each of which you
construct once and then evaluate millions of times? I’m having trouble
imagining, well, anything, that would generate millions of different
expressions, so it’d be interesting to hear more.

Second case. We have a few thousand different expressions that I need to
build once and evaluate millions of times. The expression leaves have stable
types during the evaluation, but the input values change all the time, so I
would need to build them with placeholders and memoize them. That is not a
problem in Lua, except that I will have to evaluate a tree even if all the
input parameters are scalars (a common case). But if I have to go this way, it
means that the JIT failed to optimise the code, including sinking, type and
ops specialisation, inlining, control flow specialisation, strength reduction,
fusion, etc., and therefore the resulting code will be way too slow (e.g. take
days to run).
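
For illustration, a minimal sketch of the "build once with placeholders, then
evaluate many times" idea, assuming plain scalar inputs. The names var, const,
add, mul and compile are hypothetical and not taken from the actual code under
discussion. The tree is turned into nested closures once, so the evaluation
loop does not dispatch on node types. Whether LuaJIT compiles such closures
well would have to be measured.

```lua
-- Hypothetical node constructors (illustrative names only).
local function var(name)  return { op = "var", name = name } end
local function const(v)   return { op = "const", value = v } end
local function add(a, b)  return { op = "add", a, b } end
local function mul(a, b)  return { op = "mul", a, b } end

-- Walk the tree once and return a closure that evaluates it.
-- Placeholders (var nodes) are looked up in the env table at call time.
local function compile(node)
  if node.op == "var" then
    local name = node.name
    return function(env) return env[name] end
  elseif node.op == "const" then
    local v = node.value
    return function() return v end
  end
  local fa, fb = compile(node[1]), compile(node[2])
  if node.op == "add" then
    return function(env) return fa(env) + fb(env) end
  elseif node.op == "mul" then
    return function(env) return fa(env) * fb(env) end
  end
  error("unknown op: " .. tostring(node.op))
end

-- Build and compile once: x * y + 1
local f = compile(add(mul(var("x"), var("y")), const(1)))

-- Evaluate many times with changing input values.
local env = { x = 0, y = 2 }
local sum = 0
for i = 1, 1000 do
  env.x = i
  sum = sum + f(env)
end
print(sum)
```

Since the closures only use + and *, the same compiled expression would also
accept non-scalar values whose metatables define __add and __mul, at the cost
of the arithmetic no longer being specialised to numbers.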

I had a look at various versions of rima, and no, I didn’t re-use nodes when
building expressions. I did do a fair bit of work trying to minimise garbage
while evaluating (rima supports partial expression evaluation, so a sum over
two indices might become a list of sums over one index or something). I
wouldn’t be sure that I gained a lot from that, but it was a while ago.

Reusing temporaries during evaluation of the expression is easy (with some
pitfalls), but we were talking about reusing nodes during building, which
means detecting subexpressions in a fast way. I don't think that it helps
much, so I haven't given it a try.
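
As a rough sketch of what reusing nodes during building could look like, one
common technique is hash-consing: each node gets a structural key, and a table
maps keys to the single shared node. All names below (nodes, new_node, var,
binop) are hypothetical. The key is built by string concatenation, which has
its own cost on every construction, so this is not necessarily a win.

```lua
local nodes = {}   -- structural key -> unique shared node
local count = 0    -- number of distinct nodes created so far

-- Register a freshly created node under its structural key.
local function new_node(key, node)
  count = count + 1
  node.id = count
  nodes[key] = node
  return node
end

-- Leaf: one shared node per variable name.
local function var(name)
  local key = "var:" .. name
  return nodes[key] or new_node(key, { op = "var", name = name })
end

-- Inner node: the key is built from the operator and the ids of the
-- (already unique) children, so equal subexpressions map to the same key.
local function binop(op, a, b)
  local key = op .. "(" .. a.id .. "," .. b.id .. ")"
  return nodes[key] or new_node(key, { op = op, a, b })
end

local x, y = var("x"), var("y")
local e1 = binop("add", binop("mul", x, y), x)
local e2 = binop("add", binop("mul", x, y), x)

-- e1 and e2 are the same table; only four distinct nodes exist.
print(e1 == e2, count)
```

Note that this sketch treats add(x, y) and add(y, x) as different
expressions; recognising commutativity would need a canonical ordering of the
child ids.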

Best,
Laurent.

--
Laurent Deniau http://cern.ch/mad
Accelerators Beam Physics mad@xxxxxxx
CERN, CH-1211 Geneva 23 Tel: +41 (0) 22 767 4647
