[SCIP] About variances in solving time

Pierre Le Bodic pierre.lebodic at monash.edu
Thu Nov 28 02:18:05 CET 2024


Hi Selin,

Depending on what you are working on, you may be able to use the total
number of LP iterations or B&B nodes as a surrogate measure for time. These
are unaffected by cluster issues.

As a start, you could determine how closely related to time these measures
are on your single-thread experiments, using a simple linear regression for
example.
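
A minimal sketch of that check (Python; it assumes the per-instance times,
LP iteration counts and node counts from your single-thread runs have been
collected into a CSV file, whose name and column layout here are
placeholders):

import numpy as np

# Hypothetical results file, one row per instance:
# columns = solving time (s), total LP iterations, B&B nodes.
times, lp_iters, nodes = np.loadtxt("single_thread_results.csv",
                                    delimiter=",", unpack=True)

# Simple least-squares fit of time against LP iterations.
slope, intercept = np.polyfit(lp_iters, times, 1)
predicted = slope * lp_iters + intercept

# R^2 as a rough indicator of how well LP iterations explain time.
ss_res = np.sum((times - predicted) ** 2)
ss_tot = np.sum((times - np.mean(times)) ** 2)
print("R^2 (time vs. LP iterations): %.3f" % (1.0 - ss_res / ss_tot))

The same fit can be repeated with node counts in place of LP iterations.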

Depending on how related these measures are, you may be able to keep using
the cluster with 48 concurrent jobs and track performance using these
surrogate measures for some or all of your results (except perhaps the
final ones).

I hope that makes sense.

Kind regards
Pierre

On Thu, 28 Nov 2024 at 11:12, Bayramoglu, Selin <sbayramoglu3 at gatech.edu>
wrote:

> Hi Sascha,
>
>
>
> Many thanks for your detailed response! I will definitely experiment with
> allocating a fraction of the cores for the runs.
>
>
>
> Best wishes,
>
>
>
> *Selin Bayramoglu*
>
> Ph.D. Student
>
> *H. Milton Stewart School of Industrial and Systems Engineering*
> Georgia Institute of Technology
>
> sbayramoglu3 at gatech.edu
>
>
>
> *From: *s schnug <sascha.schnug at gmail.com>
> *Date: *Tuesday, 26 November 2024 at 18:04
> *To: *Bayramoglu, Selin <sbayramoglu3 at gatech.edu>
> *Cc: *scip at zib.de <scip at zib.de>
> *Subject: *Re: [SCIP] About variances in solving time
>
> Hi,
>
> I think this is a very complex, multi-dimensional question.
>
> Here are some of those questions and my personal opinions (I'm not a SCIP dev):
>
> 1: What influences solving time with "deterministic tasks" (here meaning:
> same decisions/path -> but potentially earlier/later finishing)?
> - 1a: Thread starvation (e.g. overcommitment of CPU cores)
> - 1b: Memory limit (leading to memory thrashing)
> - 1c: Memory-bandwidth limit
> - 1d: Cache invalidation
> - 1e: I/O bottleneck
>
> 1a is related to question 3.
> 1b should be easy to estimate / check -> care has to be taken, as
> memory thrashing is catastrophic with regard to solving time (probably not
> your issue, as it's usually worse than only a 70% slow-down).
> 1c can be a real issue (imho) with many LP/MILP solvers running
> concurrently, as some subroutines (e.g. the simplex) are quite
> memory-bandwidth hungry -> tuning recommended (see "recommendation").
> 1d: not much one can do without removing concurrent workers / noisy
> neighbors -> hopefully not that important, as it's mostly an L3-cache issue
> (which is slow anyway) until threads are highly overcommitted (many
> context switches).
> 1e is only relevant when all your 48 tasks read their input at the same
> time -> not sure if it's relevant, but easy to solve if needed => add a
> randomized sleep before doing I/O that does not count towards solving time
> (a sketch follows after the conclusion below).
>
> Conclusion: 1a would be a real problem (see 3); 1b should be analyzed and
> 1c might need tuning. 1d we ignore, and 1e might be analyzed and approached
> if nothing else works.
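>
> A minimal sketch of the randomized start-up delay from 1e (Python; the
> command line below is only a placeholder for however a single job invokes
> SCIP):
>
> import random
> import subprocess
> import time
>
> # Stagger I/O: sleep a random amount *before* the timed solve starts,
> # so that 48 jobs do not hit the file system at the same moment.
> time.sleep(random.uniform(0.0, 30.0))
>
> # Placeholder for the actual job, e.g. solving one instance with SCIP.
> subprocess.run(["scip", "-f", "instance.mps", "-q"], check=True)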
>
> 2: "Determinism" (again: decisions/path)
> - 2a: Slow-down while using a different solver-path (different decisions)
> - 2b: Slow-down while not using a different solver-path (same path, but
> faster or slower)
>
> Non-deterministic throughput (OS scheduling and all the topics above) is
> bad in itself, but it's probably even worse when the algorithm works in a
> time-dependent manner. In that case, 2a, any bad effect can become much,
> much worse.
>
> I'm somewhat afraid *and this is a question for the SCIP devs* that SCIP
> (as opposed to some other solvers based on cooperative scheduling /
> task interleaving) does not necessarily follow the same path before being
> cut off by some time limit (or terminating with a solution). I read source
> code like "#define DEFAULT_HEURTIMELIMIT    60.0 /**< time limit for a
> single heuristic run */" as an indication that the effects/influences from
> 1 can make two solver runs "diverge" in their decisions (run A can work
> with a solution the heuristic found in time; run B cannot) => the bad thing
> here is that such decisions can have an exponentially bad influence on
> solving time (compare with "performance variability in mixed-integer
> programming").
>
> Conclusion: Probably not much one can do except reduce the effects of 1.
>
> 3: CPU-core overcommitment (self-scheduled tasks + "noisy neighbors" =
> other people's scheduled tasks):
> - 3a: Scheduling 100 tasks on 48 cores leads to <= 48 tasks running
> concurrently (bounded pool)
> - 3b: Scheduling 100 tasks on 48 cores leads to 49 or more tasks running
> concurrently (unbounded pool)
>
> This is related to 1a.
>
> Conclusion: 3b is bad in any case, but even worse when other people's
> tasks are not controlled by the scheduler. I assume 3a is the case. *If
> not, it's not a good environment/setup for benchmarking.*
>
> 4: Single-core performance variability
> - 4a: Turbo-boosted CPUs
> - 4b: More "stable" CPUs
>
> Desktop CPUs in particular are hard to use for benchmarks due to turbo
> boosting and the like. Some libraries like "google benchmark" even try to
> detect this and output strong warnings.
> There are similar (inverse) topics related to noisy neighbors (CPU-clock
> throttling based on AVX-512 instruction counting).
>
> A terrible situation for everyone doing benchmarks. BUT hopefully your VM
> is a classic server-like setup where turbo boosting and the like are absent
> and things are much more symmetric.
>
> *If not: again... a bad environment/setup.*
>
> The AVX-512 topic can hurt a lot (~30% lower clock speed), but pretty much
> no software uses it. I guess this includes simplex-based MILP solvers
> (especially SCIP). Not so sure about IPMs, though.
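>
> If in doubt, a quick Linux-specific check of whether frequency scaling is
> active (assuming the VM exposes the usual cpufreq sysfs interface) could
> look like this:
>
> import glob
>
> # Print the frequency-scaling governor of every CPU that exposes cpufreq.
> # "performance" everywhere is what you want on a benchmarking box;
> # "powersave"/"ondemand" means clock speeds may move around under load
> # (inside a VM the cpufreq directory may be missing entirely, in which
> # case the guest simply cannot see the host's scaling).
> for path in sorted(glob.glob(
>         "/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor")):
>     with open(path) as f:
>         print(path, f.read().strip())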
>
> -----
>
> Long story short:
>
> Assuming that it's a server-like environment ("stable CPU cores" + high
> memory bandwidth), I have only one real recommendation:
>
> - Don't let the scheduler run with a pool size of N = cpu-cores = 48, but
> with M << N.
> - Tune M by doing experiments: shrinking M towards 1 should make the
> performance variability shrink towards that of your serial experiments, and
> you should decide for yourself whether 1%, 5% or 10% deviation is ok (when
> 70% is not) -> pick the M achieving the desired percentage (see the sketch
> after this list).
> - This should help with all the topics in 1, 2, and 3.
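>
> As an illustration of the bounded pool, a sketch that sidesteps HTCondor
> and assumes the 100 solver calls can be driven by one Python script on the
> 48-core machine (instance names and the SCIP command line are
> placeholders):
>
> import subprocess
> from concurrent.futures import ProcessPoolExecutor
>
> M = 8  # tuned pool size, M << 48
>
> def solve(instance):
>     # Placeholder command line: solve one instance and write a log file.
>     subprocess.run(["scip", "-f", instance, "-l", instance + ".log", "-q"],
>                    check=True)
>
> if __name__ == "__main__":
>     instances = ["instance%03d.mps" % i for i in range(100)]
>     with ProcessPoolExecutor(max_workers=M) as pool:
>         # At most M solver processes run concurrently; the rest wait.
>         list(pool.map(solve, instances))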
>
> ***** Hopefully, you have some kind of control over the scheduler to do
> this! (maybe there is an admin to contact and talk to) *****
> Best case: the bound M includes other people's tasks (probably a strong no
> from your admin due to economic factors)
> Maybe okay: M includes only your tasks (which are now very CPU- and
> memory-bandwidth hungry)
>
>
>
> (Without such control there might be some merit in scheduling dummy tasks
> that we do control, to limit the negative impact. But in most cases this
> assumes something about the scheduler's decisions and is a rather hacky
> endeavour.)
>
> Greetings,
> Sascha
>
>
>
> PS: There is, of course, also statistics. Maybe it makes sense to replace
> the serial runs with *repeated parallel runs (several per instance) and
> smoothing the results* (including an analysis of variance).
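>
> A small sketch of that kind of smoothing (assuming a hypothetical CSV of
> solving times with one row per instance and one column per replicate
> parallel run):
>
> import numpy as np
>
> # Hypothetical matrix: times[i, r] = time of instance i in replicate r.
> times = np.loadtxt("parallel_replicates.csv", delimiter=",")
>
> per_instance_mean = times.mean(axis=1)
> per_instance_cv = times.std(axis=1) / per_instance_mean  # spread per instance
>
> # Shifted geometric mean over instances, a common MIP benchmarking summary.
> shift = 10.0  # seconds
> sgm = np.exp(np.mean(np.log(per_instance_mean + shift))) - shift
>
> print("median coefficient of variation:", np.median(per_instance_cv))
> print("shifted geometric mean of mean times:", sgm)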
>
>
>
> Am Di., 26. Nov. 2024 um 21:48 Uhr schrieb Bayramoglu, Selin <
> sbayramoglu3 at gatech.edu>:
>
> Hello,
>
>
>
> I have a question regarding the timing of algorithms. I am recording the
> solving times of SCIP on 100 problems. My end goal is to compare SCIP under
> different configurations. I am running my experiments on a dedicated
> 48-core virtual machine. I have run these problems in two settings:
>
>
>
> 1. Submit a single job which solves the problems sequentially on a single
> core. I did not submit any other job, but the machine has regular
> processes going on in the background.
>
> 2. Submit 100 jobs and use all cores of the machine. The jobs get assigned
> to different cores of the machine and run in parallel.
>
>
>
> I use HTCondor for scheduling jobs. I observed that the solving time can
> go up by up to 70% in case 2 compared to case 1. In addition, running case
> 1 several times gives very consistent runtimes, but the runtimes vary a lot
> more in case 2.
>
>
>
> I was wondering if there are certain measures to take so that timing
> values are stable (one can guarantee getting a similar solving time for
> every run on the same problem) and short. Even though the results from
> case 1 are satisfactory, it is highly time-consuming to run all jobs on a
> single core and keep the remaining 47 cores idle.
>
>
>
> I would be happy to take any suggestions on this matter.
>
>
>
> Thanks.
>
>
>
> *Selin Bayramoglu*
>
> Ph.D. Student
>
> *H. Milton Stewart School of Industrial and Systems Engineering*
> Georgia Institute of Technology
>
> sbayramoglu3 at gatech.edu
>
> _______________________________________________
> Scip mailing list
> Scip at zib.de
> https://listserv.zib.de/mailman/listinfo/scip
>