[Scip] Timing issues

Sun Jul 14 15:17:23 MEST 2013

Hey Giacomo (and SCIPing people),

you know, I think that's reasonable behavior, and probably solver
independent. Some months ago I ran some experiments precisely to get a grip
on "how unstable computing times become when forgetting that an X-core
machine differs from X independent ones".

My setup was: 27 miplib3 instances (each solved in under a minute, since I
needed quick results), CPLEX 12.x, and a 4-core machine (with both
TurboBoost and HyperThreading disabled). I solved each instance 100 times
using either 1, 2, 3, or 4 cores at the same time, then averaged
everything. I have some tabulated data, but I guess that might be too much
for the mailing list.
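
A measurement of this kind can be sketched as follows. This is only a
minimal illustration, not the script used for the experiments above: the
helper name run_concurrently and the dummy command are assumptions, and it
is Unix-only because it relies on os.wait4 to read each child's own user
CPU time.

```python
import os
import subprocess
import sys


def run_concurrently(cmd, copies):
    """Start `copies` identical processes at once; return each one's user CPU seconds.

    Hypothetical helper for illustration. Requires Unix (os.wait4) and
    Python 3.9+ (os.waitstatus_to_exitcode).
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
        for _ in range(copies)
    ]
    user_times = []
    for proc in procs:
        # wait4 reaps this specific child and returns its own resource usage,
        # so concurrent children are not lumped together.
        _, status, usage = os.wait4(proc.pid, 0)
        # Tell the Popen object the child is already reaped.
        proc.returncode = os.waitstatus_to_exitcode(status)
        user_times.append(usage.ru_utime)
    return user_times


if __name__ == "__main__":
    # Placeholder workload; a real test would invoke the solver here.
    dummy = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    for copies in (1, 2, 3, 4):
        print(copies, run_concurrently(dummy, copies))
```

Repeating each configuration many times and keeping the per-run values,
rather than only their mean, makes occasional slow runs visible.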

Overall, the maximum solving time, averaged over all instances, can be up
to 32% larger with 4 cores in use than the average solving time using a
single core. My figures are, roughly, +22% for 2 cores, +24% for 3, and
+32% for 4.
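
One possible reading of that statistic, written out as code. This is a
sketch under assumptions: the function name relative_overhead is made up,
the numbers are invented for illustration (they are not the measured
data), and it takes the ratio of the two averages, which is only one way
to interpret the description above.

```python
from statistics import mean


def relative_overhead(single_core, multi_core):
    """Relative increase of the average worst-case time over the single-core average.

    Both arguments map an instance name to a list of measured times.
    Hypothetical helper; assumes a ratio of averages, not an average of ratios.
    """
    # Average single-core time, averaged again over all instances.
    baseline = mean(mean(times) for times in single_core.values())
    # Maximum time per instance under concurrent load, averaged over instances.
    worst = mean(max(times) for times in multi_core.values())
    return worst / baseline - 1.0


# Invented example data, in seconds.
single = {"inst_a": [10.0, 10.0], "inst_b": [20.0, 20.0]}
loaded = {"inst_a": [11.0, 13.0], "inst_b": [22.0, 26.0]}
overhead = relative_overhead(single, loaded)
```

With these invented numbers the baseline is 15 seconds and the averaged
maximum is 19.5 seconds, so the overhead is 0.3, i.e. +30%.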

That +40% (200 over 500) seems roughly in line with this to me, although I
have never run experiments on larger instances with longer solution times,
and I'm interested in the outcome there.

To me, the tagline is: "regardless of how high X is, an X-core machine
should run a single process at a time, as much as possible".

-Stefano

PS: did this earn me any beers? ;)

On Sun, Jul 14, 2013 at 2:18 PM, Giacomo Nannicini <giacomo.n at gmail.com> wrote:

> Dear Mr SCIP,
> we are getting some strange readings when running the same process
> multiple times. A (very smart) student who's working with me reports
> that if he runs the same job (same input, parameters etc) multiple
> times, using 4 parallel jobs on a machine that has 4 distinct CPUs and
> no other load, some jobs may take considerably longer. Number of
> iterations, nodes etc. is exactly the same of course.
>
> For example, almost all runs out of 30 runs take ~500 seconds with
> tiny variance as expected, and a couple of runs take an extra 200
> seconds (!!!). The process is not swapping and the only shared
> resources are HD and RAM: each CPU has its own cache. This happens on
> a few problem instances. Note that typically, we do not have this
> issue and all jobs take almost exactly the same time, as expected.
>
> We are measuring user CPU time, clocktype = 1.
>
> I could not replicate this, but the logs and settings look correct and
> I have no idea where the discrepancy comes from. Did anybody
> experience a similar issue? Is it a hardware related thing (TurboBoost
> is disabled, but who knows...)? And finally, is there any way to get a
> more stable reading of the CPU time (I see that SCIP uses times(),
> while I am more familiar with getrusage())?
>
> Any ideas are welcome because I am completely in the dark. I am
> willing to pay a beer for a hint in the right direction.
>
> Giacomo