Part of LOM

This blog is part of the web activities of the Laboratory of Organic Materials (LOM) of the Institute of Solid State Physics of the University of Latvia.

Friday, June 30, 2017

Small bash code to split a batch .OUT file into parts

This is trivial, but in any case…

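Output files of batch jobs can grow large, which makes them awkward to read. This script splits such a file into smaller chunks, each holding a chosen number of finished jobs, so that the chunks can be read automatically later. Chunk file names are derived from the original file name by adding "_L1", "_L2", etc. before the file extension. Note that the original file is edited in place: after the run it keeps only what follows the last complete chunk.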

$ ./asplout split-by-how-many filename
112230: Normal termination of Gaussian 09 at Thu Apr 27 00:30:13 2017.
391832: Normal termination of Gaussian 09 at Fri May 5 03:24:39 2017.
454985: Normal termination of Gaussian 09 at Fri May 5 20:41:34 2017.
531521: Normal termination of Gaussian 09 at Sat May 6 17:43:29 2017.
629500: Normal termination of Gaussian 09 at Mon May 8 02:16:22 2017.
682260: Normal termination of Gaussian 09 at Mon May 8 16:48:18 2017.
773911: Normal termination of Gaussian 09 at Tue May 9 19:28:10 2017.


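#!/bin/bash
[ "$1" == "-h" ] && echo "First argument: # of jobs in single split. Second argument: .OUT file name." && exit 0
irk=0; pesk=0          # irk: finished jobs seen so far; pesk: lines already moved into chunks
cp "$2" bakabaka.out   # untouched copy, so that the grep line numbers stay valid
while read -r menschutk; do
  ((irk++))
  # every $1-th finished job closes a chunk
  if [ $((irk % $1)) -eq 0 ]; then
    moose=$(echo "$menschutk" | cut -d : -f 1)   # line number of this termination in the copy
    sed -n "1,$((moose-pesk))p" "$2" > "${2%.out}_L$((irk/$1)).out" && echo "$menschutk"
    sed -i "1,$((moose-pesk))d" "$2"             # remove the written lines from the original file
    pesk=$moose
  fi
done < <(grep -n 'Normal termination' bakabaka.out)

rm bakabaka.out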
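
For those who would rather not touch the original file, here is a minimal sketch of the same idea done in a single awk pass. It is not the script above: the function name split_out is made up for illustration, and, unlike the script, it writes the leftover jobs after the last complete chunk into one more "_L" file instead of leaving them in the original.

```shell
#!/bin/bash
# split_out: hypothetical helper, not the script from this post.
# Usage: split_out JOBS_PER_CHUNK FILE.out
# Writes FILE_L1.out, FILE_L2.out, ... and leaves FILE.out unchanged.
split_out() {
  local per=$1 file=$2
  awk -v per="$per" -v base="${file%.out}" '
    BEGIN { chunk = 1; jobs = 0 }
    { print > (base "_L" chunk ".out") }     # every line goes to the current chunk
    /Normal termination/ {
      if (++jobs % per == 0) {               # chunk is full: close it and start the next one
        close(base "_L" chunk ".out")
        chunk++
      }
    }
  ' "$file"
}
```

It is called the same way as the script, e.g. split_out 2 batch.out, after sourcing the function.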

Some limitations of G09 / G16 that you might overlook in the Reference

Lately I have run into the fact that Gaussian does not support 3rd derivatives for some functionals. This means that the following job types are unavailable with them:
  • Polar=DCSHG (hyperpolarizabilities)
  • SCRF=ExternalIteration (state-specific solvation)
  • Freq=Raman (Raman intensities) *
  • Freq=Anharmonic (anharmonic corrections) *
  • maybe there are more, but I have never run into them…

* I have never experienced these myself, but these methods definitely involve 3rd derivatives.

No blame on Gaussian, Inc.: everything is in the Reference. In the printed Reference, however, it sits at the end of the description of DFT methods, so it is easy to overlook (I did); on the web page it is under the "Availability" tab, which is probably more convenient.

Affected functionals for G09 are:

Exchange (x): Gill96, P, BRx, PKZB, TPSS, wPBEh, PBEh
Correlation (c): PKZB, TPSS
Hybrid: HSE1PBE, HSE2PBE

For G16:

Exchange (x): G96, P86, PKZB, wPBEh, PBEh
Correlation (c): PKZB
Hybrid: OHSE1PBE, OHSE2PBE

Of course, all higher-rung functionals constructed from these are also affected, e.g., TPSSh in G09. Note that TPSS is no longer on the G16 list, so in G16 it does have 3rd derivatives: another reason to buy the new version!

When a job dies because of this, one of the following messages is printed several times (probably once for each second derivative that would have been differentiated again):

Invalid value of MaxDer in TPSSx

No func 3rd derivs with HSE (interestingly, I got it while using LC-wPBE).

The mention of HSE is interesting because it is the HSE03 exchange that is used for the short range in LC-wPBE, not the PBE one, both in the original 2006 paper by Vydrov and Scuseria and as implemented in Gaussian. This is far from obvious, but that is how it is. Some renowned authors (for example, Takao Tsuneda in his 2014 monograph on DFT; see p. 129) still erroneously assume it to be PBE.
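
To find out which jobs in a pile of outputs died this way, the two messages quoted above can simply be searched for. The following is only a sketch: the function name find_3rd_deriv_failures is made up, and it assumes the messages appear in the output files exactly as quoted in this post (only the fixed beginning of each message is matched).

```shell
#!/bin/bash
# find_3rd_deriv_failures: hypothetical helper name.
# For every file given, prints "file: count" if either message occurs in it.
find_3rd_deriv_failures() {
  local f n
  for f in "$@"; do
    # count the lines that contain one of the two messages
    n=$(grep -c -e 'Invalid value of MaxDer' -e 'No func 3rd derivs' "$f")
    [ "$n" -gt 0 ] && echo "$f: $n"
  done
  return 0
}
```

Typical use would be find_3rd_deriv_failures *.out in the directory with the batch outputs.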

Friday, June 23, 2017

Small bash code to check how many calculations in a batch have been done (updated 20.09.17.)

Maybe this will be useful. The output also contains the last modification time of the .out file.
The script makes three assumptions:
  1. the title section is marked by an initial ::: (we usually fill the title section with certain keyword fields for automated processing of files and loading them into the database);
  2. the output file has the same name as the input file, only with a different extension (.out instead of .gjf);
  3. the working directory is specified in the third line of the script or within the file name; alternatively, you can leave out the lines that remember the present directory, change to the working directory, and go back at the end.

Arguments are names of the output files.

#!/bin/bash
iamhere=$(pwd) # this line can be left out
cd /your/working/directory # this line can be left out

dos2unix "$@"
for werqt in "$@"; do
  wass=$(grep -c 'Normal termin' "$werqt")       # jobs finished so far
  shouldd=$(grep -c '#' "${werqt%.out}.gjf")     # jobs requested: lines containing '#' in the input
  echo -e "$werqt: $(tput bold)$wass / $shouldd -> $((shouldd-wass))$(tput sgr0)\t$(ls -l "$werqt" | xargs | cut -f 6-8 -d ' ')"
  # last route and title lines of the output, plus the lines that follow them
  tac "$werqt" | grep -m 2 -B 3 -e '^ #' -e '^ :::' | tac > sneg.foo
  routeklis=$(sed -n '/^ #/ {/\\/!{:bulka N; s/\n //g;/---/!b bulka; s/-//g;p}}' sneg.foo)
  echo -e "Last route/title is:\n$routeklis\n$(sed -n '/^ :::/ {/\\/!{:bulka N; s/\n //g;/---/!b bulka; s/-//g;p}}' sneg.foo)"
  # for optimizations: count the steps taken since the last finished job
  echo "$routeklis" | grep -q -e ' Opt' -e ' opt' && echo "As for now, step $(tac "$werqt" | sed -n '1,/Normal termin/p' | grep -c 'Step number') is being done."

  echo
done
rm sneg.foo
cd "$iamhere" # this line can be left out

An example result (the script is named howresults):

$ howresults IN_436_M062X_plus_DL2_2_U1.out
IN_436_M062X_plus_DL2_2_U1.out: 0 / 164 -> 164 Jun 23 19:17
Last route/title is:
  #N MaxDisk=250GB M062X/6311G(d,p) Opt=(CalcAll,NoRaman,MaxStep=1) Volume=Tight Guess=Mix scrf=(cpcm,solvent=TetraHydroFuran)
  ::: IN17_436_M062X_ufg_6311_thf IN17_436_M062X_6311_thf GeometryOptimization GeomOpt GeometryOptimization 2

As for now, step 10 is being done.
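
The counting at the heart of the script can also be tried on its own. This is a minimal sketch with a made-up function name, batch_progress; it relies on the same assumptions as the script (the .gjf file sits next to the .out file, and '#' appears in the input only on route lines).

```shell
#!/bin/bash
# batch_progress: hypothetical helper; prints "done total remaining" for one .out file.
batch_progress() {
  local out=$1 done_jobs total
  done_jobs=$(grep -c 'Normal termin' "$out")   # finished jobs in the output
  total=$(grep -c '#' "${out%.out}.gjf")        # lines containing '#' in the input
  echo "$done_jobs $total $((total - done_jobs))"
}
```

The three numbers correspond to the "0 / 164 -> 164" part of the example result above.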