
Thursday, March 7, 2019

Comparative performance of QC programs, or Why you should [not] buy commercial software

Sometimes (e.g. when designing Your laboratory or planning Your project budget) You face the difficult choice of which program to buy for Your computations. Or should You maybe just use freeware, or something with an academic licence?

This is not such an easy question, because the answer depends on what You are going to do. There are four main traits by which any computational code can be judged:
  • performance
  • availability of different methods
  • computational stability
  • ease of use.
Commercial codes tend to excel in performance and stability, because their main clients are labs and research institutions for which quantum chemical computations are something auxiliary; they may host some computationalists, but those are not the priority. All the good journals nowadays require comparison with quantum chemical/mechanical calculations, be it in chemistry or in molecular/solid-state physics; hence these researchers need to get their results
  • as soon as possible
  • as easily as possible.
What they usually do not care much about is
  • the quality of results (except when they disagree with the experimental ones).
Ease of use is less important if they have the aforementioned computationalists.
Hence, among the four main traits of QC software, two or three are important for them – and these are the ones commercial packages usually excel in (performance and computational stability).

I checked the performance of three programs (two free for academic use and one commercial) on the following case:
Molecule: C34H32N2O4, mostly unsaturated, with no symmetry elements (72 atoms, 282 electrons, 210 degrees of freedom)
Computational model: B3LYP/6-31G(d,p), with B3LYP defined as in Gaussian and no solvent in the model (760 basis functions)
Computational tasks: geometry optimization and thermochemistry of the ground state.
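For reference, an input deck for such a job can be generated along these lines. This is a minimal sketch, not the actual input used in the benchmark: the XYZ file name, the resource directives, and the neutral-singlet charge/multiplicity are my assumptions, since only the formula and the model are given above.

    # Sketch: build a Gaussian input for the benchmark job
    # (geometry optimization + frequencies at B3LYP/6-31G(d,p), no solvent).
    # "molecule.xyz" is a hypothetical file holding the 72 atomic coordinates,
    # which are not published in this post.

    with open("molecule.xyz") as f:
        xyz = f.read().splitlines()
    coords = "\n".join(xyz[2:])          # drop the XYZ header (atom count + comment)

    gjf = "\n".join([
        "%NProcShared=4",                # assumed resource settings
        "%Mem=4GB",
        "#P Opt Freq B3LYP/6-31G(d,p)",  # Gaussian's definition of B3LYP
        "",
        "C34H32N2O4 benchmark",
        "",
        "0 1",                           # assumed: neutral, singlet
        coords,
        "",                              # Gaussian inputs end with a blank line
    ])

    with open("benchmark.gjf", "w") as f:
        f.write(gjf + "\n")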

The results are as follows:

               Geometry optimization time   Gibbs free energy time
Commercial 1   1374                         3209
Academic 1     3815                         143394
Academic 2     36488                        –


               Final SCF energy, Hartree   Final Gibbs free energy, Hartree
Commercial 1   -1725.348479                -1724.833152
Academic 1     -1725.314384                -1724.788763
Academic 2     -1725.314514                –


Here are the same data as charts:

[charts not reproduced here]
Looks impressive, doesn't it? It was at this moment that I realized why those commercial suites cost that much.
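Two back-of-the-envelope numbers from the tables above make the gap concrete (my own arithmetic, not from the benchmark itself; 627.509 kcal/mol per Hartree is the standard conversion factor):

    # Back-of-the-envelope comparison of the benchmark numbers above.

    HARTREE_TO_KCAL = 627.509   # standard conversion factor

    # Final SCF energies (Hartree) from the second table
    e_commercial1 = -1725.348479
    e_academic1   = -1725.314384
    gap = (e_academic1 - e_commercial1) * HARTREE_TO_KCAL
    print(f"SCF energy difference: {gap:.1f} kcal/mol")   # ~21.4 kcal/mol

    # Gibbs free energy job times from the first table
    t_commercial1 = 3209
    t_academic1   = 143394
    print(f"Thermochemistry time ratio: {t_academic1 / t_commercial1:.0f}x")  # ~45x

A difference of over 20 kcal/mol in the final energy for nominally the same model is itself telling – the optimizations may simply have converged to different minima.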

 So, should we use only commercial software? Again, it depends.
  • If You are doing many calculations routinely and either do not care about selecting the best method or want to use one of the community standards, then You should probably consider buying a commercial package.
  • If You are interested in using new methods out of the box, consider academic freeware, because such packages usually contain more new methods.

Why are there usually more new methods in the freeware? The reason is simple: the developers don't need to spend time and/or people on customer support, on refining the robustness of the code, or on ease of use. These programs are written by people who understand what's going on for people who understand what's going on, hence both the authors and the users have less urge to communicate.

The trade-off is, again, not that clear-cut, because a new method may be computationally more efficient even if the commonly used algorithms are less efficient than in some commercial code which does not have this new method.

What about the customer service? Aren't there forums from which You can get support? Well, there are, but, again, for academic/freeware packages the community-based approach is invaluable. Commercial software may have a few (2–4) paid people who reply to every message they get; for freeware, anyone who knows the answer will answer. When he or she has time. And if they know how to solve this particular problem, which they probably haven't run into yet. Therefore, both cases have their advantages and disadvantages.

 These are my thoughts, so feel free to disagree :-)

Thursday, March 31, 2016

Gaussian performance on Windows and on Linux

As our Institute has licenses for Gaussian™ 09, Revision D.01 on both Windows™ and Linux®, it was interesting to compare its performance on a single machine with both operating systems installed. The machine has an Intel® i3 2-core CPU (4 threads), 4 GB of RAM and a Samsung ST500 hard drive – a standard desktop PC, actually. Usually people test computational software on a single CPU core, but we ran both variants. Various calculation types were tested on some of our molecules.

At first glance, there is a big inconsistency in the results: in the single-core runs Gaussian 09 performs significantly better on Linux than on Windows™, but in the four-core test the situation seems to be the opposite. However, if we divide the Linux "Job cpu time" values by the number of CPU threads, the picture looks more logical... At least the numbers become comparable.
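This normalization is easy to script; a minimal sketch, assuming the standard "Job cpu time" line that Gaussian 09 prints at the end of its log file (the log file name is hypothetical):

    import re

    # Sketch: pull the "Job cpu time" line out of a Gaussian 09 log
    # and divide it by the number of CPU threads used for the run.
    # The line looks like:
    #   Job cpu time:  0 days  3 hours 33 minutes 27.3 seconds.

    NPROC = 4   # threads used in the run

    with open("benchmark.log") as f:     # hypothetical log file name
        m = re.search(r"Job cpu time:\s*(\d+) days\s*(\d+) hours\s*"
                      r"(\d+) minutes\s*([\d.]+) seconds", f.read())
    d, h, mins, s = map(float, m.groups())
    cpu_seconds = ((d * 24 + h) * 60 + mins) * 60 + s

    print(f"Job cpu time: {cpu_seconds:.1f} s")
    print(f"Per thread:   {cpu_seconds / NPROC:.1f} s")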


Windows 7 (all times h:mm:ss)

Job type       Single CPU core   4 CPU cores   4 CPU cores, normed to single core
Opt            2:30:40           1:21:55       0:20:28.75
Freq           6:56:41           3:38:08       0:54:32
Stable=Opt     1:23:33           0:33:00       0:08:15
Polar          1:28:37           0:43:47       0:10:56.75
Polar + SCRF   1:42:00           0:50:03       0:12:30.75
Total:         14:01:31          7:06:53       1:46:43.25
               = 50,491 s        = 25,613 s    = 6,403.25 s



Debian GNU/Linux 8.1 (all times h:mm:ss)

Job type       Single CPU core   4 CPU cores   4 CPU cores, normed to single core
Opt            1:30:37.7         3:33:27.3     0:53:21.825
Freq           4:18:13.8         11:29:34      2:52:23.5
Stable=Opt     0:36:55.1         1:18:14.8     0:19:33.7
Polar          1:06:27.1         2:21:04.9     0:35:16.225
Polar + SCRF   1:13:41           2:34:58.3     0:38:44.575
Total:         8:45:54.7         21:17:19.3    5:19:19.825
               = 31,554.7 s      = 76,639.3 s  = 19,159.83 s


My personal conclusion is that on Linux, Gaussian™ reports the calculation time as if a single CPU core had been used – that is, the CPU time summed over all cores – while on Windows™ the actual elapsed computation time is reported. This was confirmed simply by comparing the creation time with the last-modified time of each output file: on Linux this span was far shorter than the sum of all the "Job cpu time" values would suggest. Therefore, the third column of the first table and the second column of the second table do not correspond to reality.
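This kind of timestamp check can be scripted along these lines (a sketch; the file names, and the assumption that the input file is written immediately before the run starts, are mine):

    import os

    # Sketch: approximate the wall-clock duration of a run as the span between
    # the last modification of the input file (assumed written immediately
    # before the run started) and of the log file (written when it finished).

    t_start = os.path.getmtime("benchmark.gjf")   # ~ run start
    t_end   = os.path.getmtime("benchmark.log")   # ~ run end
    print(f"Wall-clock span: {t_end - t_start:.0f} s")
    # Compare this with the "Job cpu time" parsed as in the earlier sketch:
    # on Linux, for a 4-core run, the reported time clearly exceeds this
    # span; on Windows the two are comparable.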

A practical conclusion is that on Linux, Gaussian™ runs faster than on Windows™. A technical conclusion is that, clearly, not that much of the computation can actually be parallelized. Also, parallelization helps much more on Windows than on Linux (though the Windows times are still worse in absolute terms).
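To put a rough number on "not that much can be parallelized" (my own back-of-the-envelope estimate, not from the measurements themselves): taking the Linux single-core total and the per-core-normed four-core total as wall times, Amdahl's law suggests that only about half of the work runs in parallel.

    # Rough Amdahl's-law estimate from the Linux totals above.
    # Speedup S on n cores satisfies S = 1 / ((1 - p) + p / n),
    # where p is the parallelizable fraction of the work.

    t1 = 31554.7     # single-core total, s
    t4 = 19159.83    # four-core total normed to one core (~wall time), s
    n  = 4

    S = t1 / t4                      # measured speedup, ~1.65
    p = (1 - 1 / S) / (1 - 1 / n)    # Amdahl's law solved for p
    print(f"Speedup: {S:.2f}; parallel fraction: {p:.0%}")   # ~52%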

We also tried using the specific proprietary CPU firmware on Linux (Debian package i3fw). However, the results generally became slightly worse (for almost all jobs). We won't pass judgement on this, because the "slightly" is a really marginal difference. However, if You are a free-software freak, I think these findings will warm Your heart :)