[SCIP] About variances in solving time
Bayramoglu, Selin
sbayramoglu3 at gatech.edu
Thu Nov 28 01:09:20 CET 2024
Hi Sascha,
Many thanks for your detailed response! I will definitely experiment with allocating a fraction of the cores for the runs.
Best wishes,
Selin Bayramoglu
Ph.D. Student
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
sbayramoglu3 at gatech.edu
From: s schnug <sascha.schnug at gmail.com>
Date: Tuesday, 26 November 2024 at 18:04
To: Bayramoglu, Selin <sbayramoglu3 at gatech.edu>
Cc: scip at zib.de <scip at zib.de>
Subject: Re: [SCIP] About variances in solving time
Hi,
I think this is a very complex, multi-dimensional question.
Some of those sub-questions, with my personal opinions (I'm no SCIP dev):
1: What influences solving-time with "deterministic tasks" (here meaning: same decisions/path -> but potentially earlier/later finishing)
- 1a: Thread starvation (e.g. overcommitment of cpu-cores)
- 1b: memory limit (leading to memory thrashing)
- 1c: memory-bandwidth limit
- 1d: cache invalidation
- 1e: IO-bottleneck
1a is related to question 3
1b should be easy to estimate/check -> care has to be taken, as memory thrashing is catastrophic for solving time (probably not your issue, as it is usually worse than only a 70% slow-down)
1c can be a real issue (imho) with many LP/MILP solvers running concurrently, as some subroutines (e.g. simplex) are quite memory-bandwidth hungry -> tuning recommended (see "recommendation")
1d not much one can do without removing concurrent workers / noisy neighbors -> hopefully not that important, as it is mostly an L3-cache issue (which is slow anyway) until threads are highly overcommitted (many context switches)
1e only relevant when all your 48 tasks read their input at the same time -> not sure if relevant, but easy to solve if needed => add a randomized sleep before doing IO, which does not count towards solving time
Conclusion: 1a would be a real problem (see 3); 1b should be analyzed and 1c might need tuning. 1d we ignore, and 1e can be analyzed and addressed if nothing else works (a small sketch of the staggered-IO idea follows below).
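For illustration, a minimal sketch of the staggered-IO idea (this assumes PySCIPOpt is available; "instance.mps" and the 30-second window are placeholders). The sleep and the read happen before SCIP's solving clock starts, so only the solve itself is reported:

    import random
    import time

    from pyscipopt import Model

    # staggered start so 48 concurrent jobs do not hit the file system at once;
    # this sleep is outside of SCIP's solving clock
    time.sleep(random.uniform(0.0, 30.0))

    model = Model()
    model.readProblem("instance.mps")   # IO happens after the stagger, before solving
    model.optimize()

    # SCIP's internal clock covers only the solve, not the sleep or the read
    print("solving time [s]:", model.getSolvingTime())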
2: "Determinism" (again: decisions/path)
- 2a: Slow-down while using a different solver-path (different decisions)
- 2b: Slow-down while not using a different solver-path (same path, but faster or slower)
Non-deterministic throughput (OS scheduling and all the topics above) is bad in itself, but it is probably even worse when the algorithm works in a time-dependent manner. In that case (2a), any bad effect can become much, much worse.
I'm somewhat afraid (and this is a question for the SCIP devs) that SCIP, as opposed to some other solvers based on cooperative scheduling / task interleaving, does not necessarily follow the same path before being cut off by some time limit (or terminating with a solution). I read source code like "#define DEFAULT_HEURTIMELIMIT 60.0 /**< time limit for a single heuristic run */" as an indication that the effects from 1 can make two solver runs "diverge" in their decisions (run A can work with a solution found by a heuristic; run B cannot) => the bad thing here is that such decisions can have an exponentially bad influence on solving time (compare with "performance variability in mixed-integer programming").
Conclusion: probably not much one can do except reduce the effects of 1. (A quick way to check whether your runs actually diverge is sketched below.)
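To tell 2a and 2b apart on your own instances, a minimal sketch (PySCIPOpt assumed; the file name and the number of repetitions are placeholders, and the instance is assumed to finish within whatever limits you set). Identical node counts with different times point to pure throughput noise (2b); different node counts mean the runs really diverged (2a):

    from pyscipopt import Model

    def solve_once(path):
        model = Model()
        model.hideOutput()          # keep the log quiet for repeated runs
        model.readProblem(path)
        model.optimize()
        return model.getNNodes(), model.getSolvingTime()

    runs = [solve_once("instance.mps") for _ in range(3)]
    for nodes, seconds in runs:
        print(f"nodes={nodes}  time={seconds:.2f}s")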
3: "CPU-core overcommitment (self-scheduled + "noisy neighbors" = other peoples scheduled tasks):
- 3a: Scheduling 100 tasks on 48 cores leads to <= 48 tasks running concurrently (bounded pool)
- 3b: Scheduling 100 tasks on 48 cores leads to 49 or more tasks running concurrently (unbounded pool)
This is related to 1a.
Conclusion: 3b is bad in any case, but even worse when other people's tasks are not controlled by the scheduler. I assume 3a is the case. If not, it's not a good environment/setup for benchmarking.
4: Single-core performance variability
- 4a: Turbo-boosted CPUs
- 4b: More "stable" CPUs
Desktop CPUs in particular are hard to use for benchmarks due to turbo boosting and the like. Some libraries such as "google benchmark" even try to detect this and output strong warnings.
There are similar (inverse) topics related to noisy neighbors (AVX-512 instruction-count-based CPU clock throttling).
It's a terrible situation for everyone doing benchmarks. BUT hopefully your VM is a classic server-like setup where turbo boosting and the like are absent and things are much more symmetric.
If not: again... bad environment/setup.
The AVX-512 topic can hurt a lot (~30% lower clock speed), but pretty much no software uses it. I guess this includes simplex-based MILP solvers (especially SCIP). Not so sure about IPMs, though.
-----
Long story short:
Assuming that it's a server-like environment ("stable CPU cores" + high memory bandwidth), I have only one real recommendation:
- Don't let the scheduler run with a pool size of N = cpu-cores = 48, but with M << N (a minimal sketch of such a bounded pool follows below)
- Tune M by doing experiments: as M shrinks, the performance variability should go toward 0 (~ your serial experiments); decide for yourself whether 1%, 5% or 10% deviation is ok (when 70% is not) -> pick the M that achieves the desired %
- This should help with all topics in 1, 2, and 3
***** Hopefully you have some kind of control over the scheduler to do this! (Maybe there is an admin to contact and talk to.) *****
Best case: M includes other people's tasks (probably a strong no from your admin due to economic factors)
Maybe okay: M includes only your own tasks (which are very CPU/memory-bandwidth hungry)
(Without such control there might be some merit in scheduling dummy tasks that we do control, to limit the negative impact. But in most cases this assumes something about the scheduler's decisions and is a rather hacky endeavour.)
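As a minimal sketch of the bounded-pool recommendation (the `scip` binary on PATH, the instance file names, and M = 12 are placeholders; with HTCondor you would instead cap the number of simultaneously running jobs):

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    M = 12                                       # pool size, deliberately << 48 cores
    INSTANCES = ["inst001.mps", "inst002.mps"]   # ... your 100 problems

    def run(instance):
        # one SCIP process per instance; "-f" loads the problem, solves it, and quits
        with open(instance + ".log", "w") as log:
            subprocess.run(["scip", "-f", instance], check=True, stdout=log)

    # at most M solver processes run concurrently; the rest wait in the queue
    with ThreadPoolExecutor(max_workers=M) as pool:
        list(pool.map(run, INSTANCES))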
Greetings,
Sascha
PS: There are, of course, also statistics. Maybe it makes sense to replace the serial runs with multiple repetitions of the parallel runs (per instance) and smooth the results (including an analysis of variance); a small sketch is below.
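A minimal sketch of that idea (the timing values are placeholders): with k repeated parallel runs per instance, report the mean and the coefficient of variation, so the noise itself is quantified instead of trusting a single measurement:

    from statistics import mean, stdev

    # times[instance] = solving times from k repeated parallel runs (placeholder values)
    times = {"inst001": [12.3, 14.1, 13.0], "inst002": [55.2, 71.8, 60.4]}

    for name, samples in sorted(times.items()):
        m = mean(samples)
        cv = stdev(samples) / m if len(samples) > 1 else 0.0
        print(f"{name}: mean={m:.1f}s  CV={cv:.0%}")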
On Tue, 26 Nov 2024 at 21:48, Bayramoglu, Selin <sbayramoglu3 at gatech.edu> wrote:
Hello,
I have a question regarding the timing of algorithms. I am recording the solving times of SCIP on 100 problems. My end goal is to compare SCIP under different configurations. I am running my experiments on a dedicated 48-core virtual machine. I have run these problems in two settings:
1. Submit a single job which solves the problems sequentially on a single core. I did not submit any other job, but the machine has regular processes going on in the background.
2. Submit 100 jobs and use all cores of the machine. The jobs get assigned to different cores of the machine and run in parallel.
I use HTCondor for scheduling jobs. I observed that the solving time can go up by up to 70% in case 2 compared to case 1. In addition, running case 1 several times gives very consistent runtimes, but the runtimes vary a lot more in case 2.
I was wondering if there are certain measures to take so that timing values are stable (one can guarantee getting a similar solving time for every run on the same problem) and short. Even though the results from case 1 are satisfactory, it is highly time-consuming to run all jobs on a single core and keep the remaining 47 cores idle.
I would be happy to take any suggestions on this matter.
Thanks.
Selin Bayramoglu
Ph.D. Student
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
sbayramoglu3 at gatech.edu
_______________________________________________
Scip mailing list
Scip at zib.de
https://listserv.zib.de/mailman/listinfo/scip