RMC++
new, RMC++_multi
|
RMC++ latest versions, RMC++_new and RMC++_multi |
|
|
These versions used the united
atomic-molecular version as a starting point; all the changes are
compared to that.
|
Changes concerning the user |
There are a
standard and a multi-threaded version of the
program, having different code for some of the classes, so they have to be kept
separately in different directories (RMC_new and RMC_multi presently). The
functionality of the two versions is the same. See some remarks about the speed of the versions.
The new versions
use gridding of the simulation box, can handle EXAFS, multiple neighbour type
coordination constraint, atomic swaps and distribution of the bond angle
constraint.
The usage of the programme changed slightly.
1. Starting in interactive mode (recommended
for WINDOWS): start the executable without any command line arguments, will ask
for file name, at the end wait for a character input before collapsing the
window.
2. Starting in the background (recommended for
Linux): start the executable with two command line arguments, first the file
name, second any character and redirecting the screen output to a file. This
way does not try to read from the keyboard, and at the end of the program it
terminates without waiting. (If it is started just with the first argument,
then at the and of the program it goes into a waiting state to read a character
from the keyboard (which does not come), and had to be terminated by the user!)
The results of the simulation and file contents are not influenced by this.
There is no need anymore to make any alteration in the altern.h
or altern.cpp, the program can be run both using Linux and Windows.
There are however some option switches located in the
altern.h file.
Using the old type LINUX Makefile you had to
edit the altern.h: if you wanted to choose an option, the #define Option had to
be left in the code, for switching it OFF it had to be commented it out
(//#define Option).
Using the new Makefile
for Linux there is no need editing the altern.h, command line arguments passed
to make can control the building of the code. Typing ‘make help’ will give
information about the command line options.
Compiling on Windows platform edit the altern.h for switching the
appropriate options!
The available options are:
1.
#define _TEST_MODE: this has to be disabled,
if the program is used in normal running mode
#define LAST_GEN 1000000
This is mainly for the testing of the program, in _TEST_MODE the random number
generator will be initialised with the same value, so the random numbers will
follow the same pattern at each run to make testing easier. The run will stop
at ngenerated = LAST_GEN regardless the value of time limit.
2.
#define _SUM_PPCF: the ppcf-s will be summed
at each saving, and the average is calculated and saved too.
3.
#define _ATLAS//use the ATLAS libraries for
matrix-vector operations
The libraries have to be installed separately (see the ATLAS web site) , only use this option if they are installed! The installed ATLAS libraries are using the BLAS routines, optimised for the given platform, and can increase the speed of the vector-vector, matrix-vector and matrix-matrix operations. In RMC ATLAS is applied for the S(Q) and F(Q) Fourier-transformation, and in case of g(r) constraint for the binàdr conversion (matrix-vector multiplication). Pass the AT=0 command line option when compiling on Linux platform to make to switch on the ATLAS option and include the necessary include files and libraries. There are some further remarks about ATLAS later.
4.
#define _OLD_HEADER: if the old style header
(name.h) are used, THE NEW STYLE headers are used by default. This
option has to be commented out normally.
5.
#define
_CODE_WARRIOR_MAC //DEFAULT is the PC or UNIX version, this option has to be
commented out normally
The recent RMC++ code was never tested on
MAC, there is no guarantee, that it can be compiled without any change, this was
only kept from historical reasons!!!
6.
#define _OLDFORMAT_OUT: additional to the normal output (*.fit), RMCA format *.out file is created containing the PPCF-s, partial S(Q), F(Q) and E(Q) data, then the calculated and (renormalized) experimental total g(r), S(Q), F(Q) and E(Q).
7.
#define _Q_SWITCH_SQ_FQ: if it is on, then for all the S(Q) and F(Q) data sets the normally calculated S(Q) and F(Q) will be multiplied by Q, so you have to fit to Q*S(Q) and/or Q*(F(Q).
8.
#define _NEI : if it is on, saving the neighbour list into the *.nei file.
#define _NEIE : if it is on, saving the neighbour list and also the squared neighbour distances and vector components into the *.nei file.
_NEI,_NEI2 only has effect, if cosine distribution constraint is applied
9.
#define _AV_MOVE: if it is on, then at every run status screen display the average distance the atoms moved so far from their starting positions is displayed.
As the new features described below made it
necessary to change the *.dat file, an example file will be included in
the download package.
The average distance so far the atoms moved from their starting positions is displayed on screen at every run status screen display, if the code was compiled with _AV_MOVE.
Introducing frequency for configuration
collecting.
A new parameter (coll_frequency) can be specified in the *.dat file in
the same line after the number of configuration to collect, a configuration is
collected at every coll_frequency save. If 0 is given for the frequeny or for
the number of collected configuration, than no configuration is collected. The
collecting starts after the timelimit was reached, as before. If the
coll_frequency is not present after the number of configurations, than it is
set to 1, and a configuration is collected at each save, as before, so older
format *.dat files can be used as well.
Introducing the calculation of Rw
for the g(r), S(Q), F(Q) and E(k) data series,
according to the formula

where x=r, b=c=d=0 for g(r) data; x=Q for
X-ray and neutron scattering data; x=k, c=d=0 for EXAFS data, AEi,j
is the experimental, ACi,j is the calculated
data. Rw is
printed on screen together with c2 by RunParams::PrintStatus, and given in the *.hst file for the
end of the run.
Cache padding was introduced for RMC++_multi to achieve good performance on p>2
multi-processor platforms (see here
for details). This normally does not
concern the user, especially, if you do not have more than 2 processors in a
machine. A few new parameters were introduced in the units.h file to regulate
cache padding:
NUMBER_OF_CACHE_LINES_TO_FETCH 1: Number of cache lines to cache in the
same time (prefetch), default is 1.
CACHE_LINE_SIZE 64: Byte. Size of the L1 cache, needed in some cases to
optimise cache usage. False sharing between threads has to be prevented. This
can happen, when although different threads are writing different memory
addresses, but the addresses are so close to each other, that they would be
cached together into the same cache line (are inside the same CACHE_ALIGNMENT
block). Because of this, if one part of a cache line is modified, the whole
cache line is written back to memory, so different threads may want to write
the same part of the memory holding back each other causing if this happens too
often to slow the performance down.
CACHE_PADDING =NUMBER_OF_CACHE_LINES_TO_FETCH*CACHE_LINE_SIZE: the
thread segments of some arrays have to be separated at least with
CACHE_PADDING-size(data_type) amount of bytes have to be kept between the
threads segment data.
If your computer have different cache line size, or number of cache
lines to fetch, modify these values, and recompile the program!
The *.dat file changed
(2007, version 1.1). In case of S(Q) and F(Q) data,
there are two extra lines after the renormalization constant containing
the switches whether linear and quadratic
renormalization should be used. The renormalization itself was changed, in
the original RMC++ the calculated data was renormalized, which was
theoretically incorrect, now it was changed to the renormalization of the
experimental data. For the sake of
clarity the c2 calculation
and the renormalization of the data sets is given by
,
where Npi
is the number of data points for data set i,
is the experimental,
is the
calculated data, for neutron and
X-ray fitting xij=Qi,j is the jth
data point of the ith set, ai , bi ,
ci and di
are the renormalization coefficients,
controlled by the vary amplitude, constant, linear and quadratic switch
in the *.dat file, and displayed during the simulation and written to
the *.fit file. The same formula is used for E(k) fitting, except
that xij=kij and only the ai
and bi coefficients used and featured in the *.dat
file, and for g(r) fitting xij=kij
and only ai exists.
The program determines the coefficients, where the difference between
the calculated and the experimental sets is the minimum.
The *.add file containing the additional parameters is not needed any more, the content of it is appended to the end of the *.dat file. One extra line after the last entry came after the appended lines of the *.add file is the option switch, whether the histogram (and the coordination number constraints, and too close atoms) should be loaded, if they are available. After this there are a lines specifying the desired number of atoms in a grid cell. The next line contains the swap fraction and for each mixed partial 1 (if the given partial can be involved in swap), 0 (not involved in swap), see details later. The last line is the total number of threads, red only by the multi-threaded version. The same *.dat file can be used for both the standard and the multi-threaded version, if number of threads line is present. Boolean type parameters are accepted in both the .true. or .false. or the 1 or 0 format. The download package contains an example *.dat file including all the possible constraints to demonstrate the format.
The precision of the text type configuration file (*.cfg)
was increased. Be aware, that even with the increased precision of the *.cfg
file, after a long run there can be a difference between the reloaded
histogram, and the histogram newly calculated from the reloaded *.cfg
due to the conversion from binary to decimal system during the saving. That is
the reasons of the introduction of the binary type
configuration file, (*.bcf). The program can work with either of
them. If both present, they are compared and if they differ more, than the
TOLERANCE given in the units.h, the *.cfg file will be used, as the
content of this can be checked easily by the user. It is recommended to use the
load histogram option, only if the binary configuration file is available.
Order of
the partials was changed back,
as it was in RMCA. Indexing follows row-continuously the order of the partials
in an upper diagonal matrix: (for example for 3 types: 1-1, 1-2, 1-3, 2-2, 2-3,
3-3). This effects only the output, as in the *.dat file
the order of the partial properties (cutoffs, coefficients…) even in the
original RMC++ was identical to RMCA.
Output was
changed. The old format,
RMCA-type *.out file is only created, if the _OLDFORMAT_OUT
compiler option is chosen. The new type *.fit output file will contain only the calculated and the
experimental totals. The partial g(r)-s at the middle of the histogram
bins (referred to as ppcf-s) will be saved in the *.ppcf file.
The partial calculated data {g(r) at the experimental data points in
case of g(r) fitting; S(Q), F(Q), EXAFS (referred
to as E(Q))} will be saved in the *.pgr, *.psq, *.pfq and *.peq
respectively.
The moveout option was
corrected, as the updating of the list containing the atoms below cutoff
could be incorrect. The basic principle of the original moveout option allowing
only one too close” atom among the moved atoms has been changed too. Now there
can be more than one “too close” atoms among the moved atoms (both for atomic
moves and for the example molecular move for CCl4) with the maximum equalling
the number of moved atoms + 1, if there is swap. This will make the simulation
even faster, as there is no need to make sure, that there is only one “too
close” atoms among the moved atoms, and search for atoms satisfying this
criterion. The indices of the “too close atoms” are saved into the *.tca file.
It can be reloaded together with the histogram, if the load the histogram
option is chosen, and the recalculation of the histogram is not necessary.
The coordination number constraint was generalized, so
not only one, but also a specified number of neighbour types can exist for a
constraint. For example in case of 3 atom types it is possible to define a
constraint that the central atom of type1 should have 3 atoms belonging to
type2 and type3 between rmin[type2] à rmax[type2] and rmin[type3]
à rmax[type3]
respectively, which means, that there are several possibilities satisfying this
constraint (type2-type2-type2, type2-type2-type3, type2-type3-type3 and
type3-type3-type3).
The concept of sub-constraints was also introduced. More than one sub-constraint means, that more than one desired coordination number (each with its own desired fraction and sigma) can be specified for a constraint to make the calculation quicker and use less memory and disk space, if constraints should only differ in their desired coordination number. In the original concept each constraint had only one sub-constraint. The introduction of sub-constraint will not result in additional functionality.
The *.dat file should contain the constraint the following way, for example for 3 constraint, the first having 1 neighbour type and 1 sub-constraint, the second 1 neighbour type and 2 sub-constraints and the third 2 neighbour type and 1 sub-constraint:
3 1 1 2 1 2 1 ! no of coordination constraints, number of neighbour types for each constraint, number of sub-constraint for each constraint
1 2
2.2 3.05 2 1.0 0.00025 ! central type, neighbour type(s), rmin[first_neightype]…
rmin[last_neightype], rmax[first_neightype]…
rmax[last_neightype], desired coordination
number[first_subconst]...desired coordination number[last_subconst],
fraction[first_subconst]...fraction[last_subconst],
sigma[first_subconst]...sigma[last_subconst]
1 3 2.1 3.1 3 4 0.4 0.6
0.00025 0.00015! central type, neighbour type(s), rmin[first_neightype]…
rmin[last_neightype], rmax[first_neightype]…
rmax[last_neightype], desired coordination number[first_subconst]...desired
coordination number[last_subconst],
fraction[first_subconst]...fraction[last_subconst],
sigma[first_subconst]...sigma[last_subconst]
1 2 3 2.2 2.1 3.05 3.1 3 1.0 0.00001 ! central type, neighbour type(s), rmin[first_neightype]…
rmin[last_neightype], rmax[first_neightype]…
rmax[last_neightype], desired coordination
number[first_subconst]...desired coordination number[last_subconst],
fraction[first_subconst]...fraction[last_subconst],
sigma[first_subconst]...sigma[last_subconst]
If only one neighbour type is required for each constraint, it is possible to specify the constraint as before, omitting the number of neighbour types after the number of constraint, and the program will automatically set the number of neighbour types to 1. If only one sub-constraint is used, than it is possible to omit their number after the number of neighbour types, and 1 is assumed. If however more than one sub-constraint is used for a constraint, than it is obligatory to specify the number of neighbour types as well, even if all of them is 1!
If coordination number constraint is applied, the number of types cannot exceed 7 due to the fact, that the internal variable CoordNumbConst::cctype is defined as long int, which usually represented by 4 bytes! If more types should be used, change the type of this variable to 64bit integer, which is unfortunately called by different names in different C++ environment, so it is not used presently.
EXAFS
fitting was included.
A short description about EXAFS, and what is fitted exactly in RMC is given here.
An example of the necessary lines in the *.dat file (for each EXAFS data set):
21 31 21 31 21 27 ! range of histogram bin points to use for EXAFS (r space) in pairs for each partial
3 ! E(k) weighting factor n
asse_15i_as_bft_k0.dat ßNAME OF THE FILE CONTAINING the k, E(k) data
1 444 ! no. of datapoints
60 260 ! range to be used in fitting
1 ! type of absorbing particle
as_edge_bs.dat ßNAME OF THE FILE CONTAINING the (r,k)-dependent coefficients
0.000008 ! standard deviation
1 ! whether to vary amplitudes
.false. ! whether to vary constant
A sample for the EXAFS data file and the coefficient file will be
included in the download package.
New constraint for the cosine distribution of
the bond angles was included. This has to come before the coordination
number constraint. The parameters in the *.dat file are the following:
1 0.1 ! number of cos. distr of bond angles constraints, dcos_theta (spacing in cos(theta) space
1 1 2 2 1.69 1.69 2.19 2.19 109.5 0.17 0.001 ! calc method (0:step, 1:Gauss, 2:no angle) ,central, neigh1, neigh2 type, dmin1, dmin2, dmax1, dmax2, angle, wcontrol, weight
0 for the
number of constraints is NEEDED, even if there is no constraint!
The number
of bins is regulated through the cos(theta) spacing.
A
constraint can be "positive", which means, that bond angles are
needed in the given region, and "negative" meaning that angles are
not wanted in the given region. Positive constraint can be set up by specifying
either 0 for the calculation method indicating a step function, or 1 meaning
Gaussian for the shape of the constraint. Step function means having uniform
distribution with integral 1 between angle - wcontrol ->angle + wcontrol,
(both given in degrees), 0 otherwise. Gaussian indicates a normal distribution
with integral 1 having the maximum at angle (given in degree), which is
converted to radians and the cosine of it is calculated. This cos(q) value will give the peak position, and the
width is controlled by sigma=wcontrol, where wcontrol is used as it is given in
the *.dat file, so it is a dcos(q) value.
In these
cases the theoretical distribution is calculated for all the bins spanning the
range -1=<(cos(q)=<1, so only one positive constraint (desired angle) can be
meaningfully given for a neighbour1-central-neighbour2 triplet, as more than 1
constraint would ruin each others effect.
Negative
constraints, that no angles desired in a given region can be set up using
calculation method=2. In this case the constraint is for the given region and
not for all the bins, as it could interfere otherwise with a
"positive" constraint for the same particle types. A normal
distribution curve similarly to method=1 is calculated from that part of
-CONF_INT*wcontrol ® + CONF_INT*wcontrol, which is in the -1 à +1 range, centred on the not desired angle.
The calculated curve is used during the chi2 calculation to provide a cos(q)-dependent weight by multiplying the
calculated distribution with it, and calculating the product's squared
difference from 0.
Make sure,
that the interval of a negative constraint does not overlap with a positive
constraint for the same particle triplet!
A new class (NeighbourList) for the gridding of the simulation box and for the neighbour list was created. Gridding means that the simulation box is divided into a given number of sub-cells in each direction. If we know about each atom, in which grid cell it is located, and we want to calculate certain properties for only those atoms that are not farther from the chosen atom than a given distance, then it is enough to calculate only for those atoms, which are located in the same, and a given number of neighbouring grid cells. As checking, whether the move can be acceptable based on the satisfaction of the cutoff distances falls in this category, calculation can be quicker, if the grid-based cutoff check is performed before entering the lengthy histogram change calculation. The new class has its own header and source files (NeighbourList.h and .cpp). A new parameter, giving the desired number of particles in a grid cell can be found in the *.dat file after the load_hist parameter. Based on this, the average number density of the sample and an additional safety parameter, SAFE_ADD located in the units.h, the program calculates the number of grid cells in each direction and dimension of the necessary arrays of the NeighbourList object. (The program has to establish the maximum number of particles in a grid cell, and the maximum number of neighbours for an atom represented on the neighbour list.) As due to inhomogeneity in the sample these numbers can vary, the program always checks, whether the maximum is not exceeded, before attempting to write into these arrays. If the maximum was exceeded, the program will stop with an error message before version 1.4. In this case increase the SAFE_ADD value! Beginning with version 1.4 the program will not stop, but after giving a message it will resize the necessary arrays automatically, and continues execution. The same resizing from version 1.4 applies for the arrays connected with the cosine distribution of bond angles constraints. Before version 1.4 the program stopped with error message, if the maximum neighbours of an atom exceeded the max_neigh calculated from the number density using the largest maximum distance involved in this constraint.
Ø
The gridding
is used for a preliminary check, whether the cutoff distances are satisfied
before entering the lengthy calculation of the change in the histogram. For
this only the information provided by the gridding is used, the neighbour list
is not.
Ø
Gridding can
only increase speed, if the number of atoms in a grid cell is chosen
adequately! It is preferable to set the desired maximum number of atoms in a
grid cell in the *.dat file to a relatively low value (5-10) depending
of course on the system size to ensure at least 5 or preferably more grid cell
in each direction. The actual number of grid cells is calculated by the program
and printed on screen during the initialisation period, or can be found in the *.grid
file. It has to be kept in mind,
that even if the largest cutoff distance is smaller than the length of one grid
cell, not just the cell containing the central atom, but all its closest
neighbour cells are checked, as the central atom can be close to the edge. This
way normally at least 27 cells are checked. If the total number of grid cells
is smaller than the number of cells to be checked (the cells to be checked to
the right are overlapping with the cells to be checked to the left due to the
periodic boundary conditions), care is taken that the overlapping cell are
checked only once. Speed increase due to gridding can only be expected, if the
number of grid cells to be checked is smaller, than the total number of grid
cells!
Ø
If there
is/are constraint(s) for the cosine distribution of bond angles, than the
neighbour lists are created for the constrained types only. Not only the
indices of the neighbours, but their type, squared distance from their central
atom and the components of the centralàneighbour vector are also stored to spare
the time of calculating them several time, when they should be used in the
calculation.
Ø
During the
simulation loop the gridding and the neighbour list (if it exists) are not
completely recalculated, but only updated for the atoms effected by the move to
make the simulation faster.
The possibility of the
swaps of different type particles was included into the
makemove function. The swap fraction (fraction of swaps related to the generated
moves) is the parameter controlling the frequency of the swaps (has to be
between 0 and 1) is included into the .dat file. Following this in the
same line for each mixed partial (their number is ntypes*(ntypes-1)/2
the boolean indicator 1 (if the given partial can be involved in swap) or 0
(not involved in swap). The order of the partials is the same, as usual, without the ‘clean’ partials(1-2, 1-3,…2-3,…).
For example in case of a 3-component system, if we want that 10% of the
generated moves should be swaps, and only allow type1-type2 and type2-type3
swaps, than the *.dat file should contain the following line:
0.1 1 0 1 ! fraction of swaps, ntypes*(ntypes-1)/2 0 (not allowed) or 1 (allowed) for the possible mixed partial (in order 1-2, 1-3,... 2-3...)
Swaps can be used only
in case of atomic (even multi-atomic) moves, but not with the FNC option. In
case of FNC the swap fraction is automatically put to zero. Obviously cannot be
used in case of a mono-atomic system.
A multi-threaded version of the code (RMC++_multi) was
written for multi-processor computers using the POSIX thread standard to
increase the speed of the calculation by distributing the work between the
threads. The functionality of this version is the same as of the standard, consecutive
version. The only difference in the usage is that the total number of threads
(recommended to match the number of available processors) has to be specified
in the last line of the *.dat file.
|
Code changes, that has no direct effect on the user |
The main reason of partly changing the structure of the program was to make it more truly object oriented, quicker, and more suitable for parallelisation.
·
Static variables
were introduced in each class storing the values that are the same for each
instance, and applied to read the initial parameters using static class-member
routines for the class without creating any instance of the class. This made it
possible to get rid of the original, non-class-member routines reading the
files containing the parameters and data of the simulation. Other non-class
member routines, mostly dealing with the initial data calculation were
incorporated into the class, which data they calculated, so the routines.cpp
file is no longer needed. The few, general purpose routines (random number
generator, output masks…) that were defined in utilities.h still exist.
·
The
calculations described directly in the main loop were replaced by class-based
routines to make the code parallelisable.
· The coordinates stored in the SimpleCfg object are updated right after the new move was made with the new coordinates of the moved atoms. This was necessary for the updating of the grid and the neighbour list. If the move is rejected, the original coordinates are restored using the stored oldpos values of the Move object.
To use the ATLAS option, the ATLAS libraries have to be
installed on your computer. ATLAS is natively a UNIX – LINUX based package, but
it is possible to use it on WINDOWS after installing the free cygnus tools.
As the best ATLAS code can differ from platform to platform, compile the ATLAS-using code on the computer, where it will run. The present RMC program only calls the dgmev_cblas(…) routine, so there is no difference, whether the threaded or the standard CBLAS library (libptcblas.a or libcblas.a) is used in the linking, as ATLAS does not use threading in matrix-vector multiplication.
ATLAS can speed up the matrix-vector multiplication based binàdr conversion for the g(r) data sets, and the Fourier-transformation for
S(Q) and F(Q).
Computationally heavy parts were parallelised in the multi-threaded
version, using the using the Portable Operating System Interface (POSIX)
applicable on shared memory multi-processor computers. The workload is equally (as possible) divided among the threads.
Parallelisation is used during the histogram and its change calculation, the ppcf,
the initial and modified partial and total g(r), S(Q), F(Q) and E(Q)
calculation and the copying of the changed histogram, ppcf and partial
parts after acceptance/rejection was decided. The calculation of the neighbour
list connected cosine distribution of bond angles is handled exclusively by the
main thread.
For the comparison of the standard and the
multi-threaded performance efficiency is used:
,
where p is the number of processors, and S(1) is the elapsed time
for the single-threaded standard and S(p) for the p-threaded
application.
POSIX is natively UNIX-LINUX based, but there
is a freely available interface, POSIX for Win32 , which
makes it easy to run the program under WINDOWS.
|
Which version to use? |
Even if the newly added
features are not used, it is recommended to use the new version, as even
without the speed increase provided by ATLAS and/or multi-threading RMC++_new proved to be 1.5 times quicker, than
RMC++ on our test system.
The speed increase caused by ATLAS and/or multi-threading depend on the actual parameters of the simulation (number of atoms, number of histogram bins, number and size of the experimental data, number of available processors), so no general answer can be given.
Several test were performed to test the speed increase on various platforms with varying system sizes, ATLAS usage increased the speed 1.9-2.6 for RMC++_new and 1.2-1.9 for 2-threaded RMC++_multi. The speed increase of RMC++_multi over RMC++_new is about E(2)=93-99% without ATLAS and E(2)=51-81% with ATLAS. See details for more information.
Based on the findings of these tests, it is worth using either standard RMC++_new with ATLAS, or multi-threading, and in most of the cases the combined ATLAS-multi-threading version, but it is a good idea to check, especially before a long simulation, which code is giving the best performance for your system on the given platform.
|
Speed tests on processors with more than 2 execution
core |
The previous speed tests were performed on computers having only two processors. As a more complicated architecture became available, (Intel(R) Xeon(R) CPU E5345 2.33GHz computer containing 2 quad-core E5345 processors) new speed tests were performed, where the speed increase for n>2 threads did not reach the expected values. The problem was fixed by introducing cache padding, details about this can be found here.
Last modified 07/09/2009 by Orsolya Gereben
(comments welcome!)