RMC++ new, RMC++_multi

  RMC++ latest versions, RMC++_new and RMC++_multi

 

 

These versions used the united atomic-molecular version as a starting point; all the changes are compared to that.

 

Changes concerning the user

 

There are a standard and a multi-threaded version of the program, having different code for some of the classes, so they have to be kept separately in different directories (RMC_new and RMC_multi presently). The functionality of the two versions is the same. See some remarks about the speed of the versions.

The new versions use gridding of the simulation box, can handle EXAFS, multiple neighbour type coordination constraint, atomic swaps and distribution of the bond angle constraint.

 

The usage of the programme changed slightly.

1.    Starting in interactive mode (recommended for WINDOWS): start the executable without any command line arguments, will ask for file name, at the end wait for a character input before collapsing the window.

2.    Starting in the background (recommended for Linux): start the executable with two command line arguments, first the file name, second any character and redirecting the screen output to a file. This way does not try to read from the keyboard, and at the end of the program it terminates without waiting. (If it is started just with the first argument, then at the and of the program it goes into a waiting state to read a character from the keyboard (which does not come), and had to be terminated by the user!) The results of the simulation and file contents are not influenced by this.

 

There is no need anymore to make any alteration in the altern.h or altern.cpp, the program can be run both using Linux and Windows. There are however some option switches located in the altern.h file.

Using the old type LINUX Makefile you had to edit the altern.h: if you wanted to choose an option, the #define Option had to be left in the code, for switching it OFF it had to be commented it out (//#define Option).

Using the new Makefile for Linux there is no need editing the altern.h, command line arguments passed to make can control the building of the code. Typing ‘make help’ will give information about the command line options.

Compiling on Windows platform edit the altern.h for switching the appropriate options!

 

The available options are:

1.     

#define _TEST_MODE: this has to be disabled, if the program is used in normal running mode

#define LAST_GEN 1000000

This is mainly for the testing of the program, in _TEST_MODE the random number generator will be initialised with the same value, so the random numbers will follow the same pattern at each run to make testing easier. The run will stop at ngenerated = LAST_GEN regardless the value of time limit.

2.     

#define _SUM_PPCF: the ppcf-s will be summed at each saving, and the average is calculated and saved too.

3.     

#define _ATLAS//use the ATLAS libraries for matrix-vector operations

The libraries have to be installed separately (see the ATLAS web site) , only use this option if they are installed! The installed ATLAS libraries are using the BLAS routines, optimised for the given platform, and can increase the speed of the vector-vector, matrix-vector and matrix-matrix operations. In RMC ATLAS is applied for the S(Q) and F(Q) Fourier-transformation, and in case of g(r) constraint for the binàdr conversion (matrix-vector multiplication).  Pass the AT=0 command line option when compiling on Linux platform to make to switch on the ATLAS option and include the necessary include files and libraries. There are some further remarks about ATLAS later.

 

4.     

#define _OLD_HEADER: if the old style header (name.h) are used, THE NEW STYLE headers are used by default. This option has to be commented out normally.

5.     

#define _CODE_WARRIOR_MAC //DEFAULT is the PC or UNIX version, this option has to be commented out normally

The recent RMC++ code was never tested on MAC, there is no guarantee, that it can be compiled without any change, this was only kept from historical reasons!!!

6.     

#define _OLDFORMAT_OUT: additional to the normal output (*.fit), RMCA format *.out file is created containing the PPCF-s, partial S(Q), F(Q) and E(Q) data, then the calculated and (renormalized) experimental total g(r), S(Q), F(Q) and E(Q).

7.     

#define _Q_SWITCH_SQ_FQ: if it is on, then for all the S(Q) and F(Q) data sets the normally calculated S(Q) and F(Q) will be multiplied by Q, so you have to fit to Q*S(Q) and/or Q*(F(Q).

8.     

#define _NEI : if it is on, saving the neighbour list into the *.nei file.

#define _NEIE : if it is on, saving the neighbour list and also the squared neighbour distances and vector components into the *.nei file.

_NEI,_NEI2 only has effect, if cosine distribution constraint is applied

9.

#define _AV_MOVE: if it is on, then at every run status screen display the average distance the atoms moved so far from their starting positions is displayed.

 

As the new features described below made it necessary to change the *.dat file, an example file will be included in the download  package.

 

 

 

 

 

The average distance so far the atoms moved from their starting positions is displayed on screen at every run status screen display, if the code was compiled with _AV_MOVE.

 

Introducing frequency for configuration collecting. A new parameter (coll_frequency) can be specified in the *.dat file in the same line after the number of configuration to collect, a configuration is collected at every coll_frequency save. If 0 is given for the frequeny or for the number of collected configuration, than no configuration is collected. The collecting starts after the timelimit was reached, as before. If the coll_frequency is not present after the number of configurations, than it is set to 1, and a configuration is collected at each save, as before, so older format *.dat files can be used as well.

 

Introducing the calculation of Rw for the g(r), S(Q), F(Q) and E(k) data series, according to the formula

 


where x=r,  b=c=d=0 for g(r) data; x=Q for X-ray and neutron scattering data; x=k, c=d=0 for EXAFS data, AEi,j is the experimental, ACi,j is the calculated data. Rw is printed on screen together with c2 by RunParams::PrintStatus, and given in the *.hst file for the end of the run.

 


Cache padding was introduced for RMC++_multi to achieve good performance on p>2 multi-processor platforms (see here for details). This normally does not concern the user, especially, if you do not have more than 2 processors in a machine. A few new parameters were introduced in the units.h file to regulate cache padding:

NUMBER_OF_CACHE_LINES_TO_FETCH 1: Number of cache lines to cache in the same time (prefetch), default is 1.

CACHE_LINE_SIZE 64: Byte. Size of the L1 cache, needed in some cases to optimise cache usage. False sharing between threads has to be prevented. This can happen, when although different threads are writing different memory addresses, but the addresses are so close to each other, that they would be cached together into the same cache line (are inside the same CACHE_ALIGNMENT block). Because of this, if one part of a cache line is modified, the whole cache line is written back to memory, so different threads may want to write the same part of the memory holding back each other causing if this happens too often to slow the performance down.

CACHE_PADDING =NUMBER_OF_CACHE_LINES_TO_FETCH*CACHE_LINE_SIZE: the thread segments of some arrays have to be separated at least with CACHE_PADDING-size(data_type) amount of bytes have to be kept between the threads segment data.

If your computer have different cache line size, or number of cache lines to fetch, modify these values, and recompile the program!

 

The *.dat file changed (2007, version 1.1). In case of S(Q) and F(Q) data, there are two extra lines after the renormalization constant containing the  switches whether linear and quadratic renormalization should be used. The renormalization itself was changed, in the original RMC++ the calculated data was renormalized, which was theoretically incorrect, now it was changed to the renormalization of the experimental data. For the sake of clarity the c2 calculation and the renormalization of the data sets is given by

,

where Npi is the number of data points for data set i, is the experimental,  is the calculated  data, for neutron and X-ray fitting xij=Qi,j is the jth data point of the ith set, ai , bi , ci  and di are the renormalization coefficients,  controlled by the vary amplitude, constant, linear and quadratic switch in the *.dat file, and displayed during the simulation and written to the *.fit file. The same formula is used for E(k) fitting, except that xij=kij and only the ai and bi coefficients used and featured in the *.dat file, and for g(r) fitting xij=kij and only ai exists.  The program determines the coefficients, where the difference between the calculated and the experimental sets is the minimum.

The *.add file containing the additional parameters is not needed any more, the content of it is appended to the end of the *.dat file. One extra line after the last entry came after the appended lines of the *.add file is the option switch, whether the histogram (and the coordination number constraints, and too close atoms) should be loaded, if they are available. After this there are a lines specifying the desired number of atoms in a grid cell. The next line contains the swap fraction and for each mixed partial 1 (if the given partial can be involved in swap), 0 (not involved in swap), see details later. The last line is the total number of threads, red only by the multi-threaded version. The same *.dat file can be used for both the standard and the multi-threaded version, if number of threads line is present. Boolean type parameters are accepted in both the .true. or .false. or the 1 or 0 format. The download package contains an example *.dat file including all the possible constraints to demonstrate the format.

 

 

 

The precision of the text type configuration file (*.cfg) was increased. Be aware, that even with the increased precision of the *.cfg file, after a long run there can be a difference between the reloaded histogram, and the histogram newly calculated from the reloaded *.cfg due to the conversion from binary to decimal system during the saving. That is the reasons of the introduction of the binary type configuration file, (*.bcf). The program can work with either of them. If both present, they are compared and if they differ more, than the TOLERANCE given in the units.h, the *.cfg file will be used, as the content of this can be checked easily by the user. It is recommended to use the load histogram option, only if the binary configuration file is available.

 

Order of the partials was changed back, as it was in RMCA. Indexing follows row-continuously the order of the partials in an upper diagonal matrix: (for example for 3 types: 1-1, 1-2, 1-3, 2-2, 2-3, 3-3). This effects only the output, as in the *.dat file the order of the partial properties (cutoffs, coefficients…) even in the original RMC++ was identical to RMCA.

 

Output was changed. The old format, RMCA-type *.out file is only created, if the _OLDFORMAT_OUT compiler option is chosen. The new type *.fit output file will contain only the calculated and the experimental totals. The partial g(r)-s at the middle of the histogram bins (referred to as ppcf-s) will be saved in the *.ppcf file. The partial calculated data {g(r) at the experimental data points in case of g(r) fitting; S(Q), F(Q), EXAFS (referred to as E(Q))} will be saved in the *.pgr, *.psq, *.pfq and *.peq respectively.

 

The moveout option was corrected, as the updating of the list containing the atoms below cutoff could be incorrect. The basic principle of the original moveout option allowing only one too close” atom among the moved atoms has been changed too. Now there can be more than one “too close” atoms among the moved atoms (both for atomic moves and for the example molecular move for CCl4) with the maximum equalling the number of moved atoms + 1, if there is swap. This will make the simulation even faster, as there is no need to make sure, that there is only one “too close” atoms among the moved atoms, and search for atoms satisfying this criterion. The indices of the “too close atoms” are saved into the *.tca file. It can be reloaded together with the histogram, if the load the histogram option is chosen, and the recalculation of the histogram is not necessary.

 

The coordination number constraint was generalized, so not only one, but also a specified number of neighbour types can exist for a constraint. For example in case of 3 atom types it is possible to define a constraint that the central atom of type1 should have 3 atoms belonging to type2 and type3 between rmin[type2] à rmax[type2] and rmin[type3] à rmax[type3] respectively, which means, that there are several possibilities satisfying this constraint (type2-type2-type2, type2-type2-type3, type2-type3-type3 and type3-type3-type3).

The concept of sub-constraints was also introduced. More than one sub-constraint means, that more than one desired coordination number (each with its own desired fraction and sigma) can be specified for a constraint to make the calculation quicker and use less memory and disk space, if constraints should only differ in their desired coordination number. In the original concept each constraint had only one sub-constraint. The introduction of sub-constraint will not result in additional functionality.

The *.dat file should contain the constraint the following way, for example for 3 constraint, the first having 1 neighbour type and 1 sub-constraint, the second 1 neighbour type and  2 sub-constraints and the third 2 neighbour type and  1 sub-constraint:

3 1 1 2 1 2 1  ! no of coordination constraints, number of neighbour types for each constraint, number of sub-constraint for each constraint

1 2    2.2  3.05 2  1.0 0.00025 ! central type, neighbour type(s), rmin[first_neightype]… rmin[last_neightype], rmax[first_neightype]… rmax[last_neightype], desired coordination number[first_subconst]...desired coordination number[last_subconst], fraction[first_subconst]...fraction[last_subconst], sigma[first_subconst]...sigma[last_subconst]

1 3  2.1  3.1 3 4  0.4 0.6 0.00025 0.00015! central type, neighbour type(s), rmin[first_neightype]… rmin[last_neightype], rmax[first_neightype]… rmax[last_neightype], desired coordination number[first_subconst]...desired coordination number[last_subconst], fraction[first_subconst]...fraction[last_subconst], sigma[first_subconst]...sigma[last_subconst]

1 2  3 2.2  2.1  3.05  3.1 3  1.0 0.00001 ! central type, neighbour type(s), rmin[first_neightype]… rmin[last_neightype], rmax[first_neightype]… rmax[last_neightype], desired coordination number[first_subconst]...desired coordination number[last_subconst], fraction[first_subconst]...fraction[last_subconst], sigma[first_subconst]...sigma[last_subconst]

 

If only one neighbour type is required for each constraint, it is possible to specify the constraint as before, omitting the number of neighbour types after the number of constraint, and the program will automatically set the number of neighbour types to 1. If only one sub-constraint is used, than it is possible to omit their number after the number of neighbour types, and 1 is assumed. If however more than one sub-constraint is used for a constraint, than it is obligatory to specify the number of neighbour types as well, even if all of them is 1!

If coordination number constraint is applied, the number of types cannot exceed 7 due to the fact, that the internal variable CoordNumbConst::cctype is defined as long int, which usually represented by 4 bytes! If more types should be used, change the type of this variable to 64bit integer, which is unfortunately called by different names in different C++ environment, so it is not used presently.

 

EXAFS fitting was included.

A short description about EXAFS, and what is fitted exactly in RMC is given here.

An example of the necessary lines in the *.dat file (for each EXAFS data set):

21 31 21 31 21 27       ! range of histogram bin points to use for EXAFS (r space) in pairs for each partial

3                       ! E(k) weighting factor n

asse_15i_as_bft_k0.dat  ßNAME OF THE FILE CONTAINING the k, E(k) data

1 444                   ! no. of datapoints

60 260                  ! range to be used in fitting

1                       ! type of absorbing particle

as_edge_bs.dat          ßNAME OF THE FILE CONTAINING the (r,k)-dependent coefficients

0.000008                ! standard deviation

1                       ! whether to vary amplitudes

.false.                 ! whether to vary constant

 

A sample for the EXAFS data file and the coefficient file will be included in the download  package.

 

New constraint for the cosine distribution of the bond angles was included. This has to come before the coordination number constraint. The parameters in the *.dat file are the following:

1 0.1          ! number of cos. distr of bond angles constraints, dcos_theta (spacing in cos(theta) space

1  1  2  2  1.69  1.69  2.19  2.19  109.5  0.17  0.001 ! calc method (0:step, 1:Gauss, 2:no angle) ,central, neigh1, neigh2 type, dmin1, dmin2, dmax1, dmax2, angle, wcontrol, weight

 

0 for the number of constraints is NEEDED, even if there is no constraint!

The number of bins is regulated through the cos(theta) spacing.

A constraint can be "positive", which means, that bond angles are needed in the given region, and "negative" meaning that angles are not wanted in the given region. Positive constraint can be set up by specifying either 0 for the calculation method indicating a step function, or 1 meaning Gaussian for the shape of the constraint. Step function means having uniform distribution with integral 1 between angle - wcontrol ->angle + wcontrol, (both given in degrees), 0 otherwise. Gaussian indicates a normal distribution with integral 1 having the maximum at angle (given in degree), which is converted to radians and the cosine of it is calculated. This cos(q) value will give the peak position, and the width is controlled by sigma=wcontrol, where wcontrol is used as it is given in the *.dat file,  so it is a dcos(q) value.

In these cases the theoretical distribution is calculated for all the bins spanning the range -1=<(cos(q)=<1, so only one positive constraint (desired angle) can be meaningfully given for a neighbour1-central-neighbour2 triplet, as more than 1 constraint would ruin each others effect.

Negative constraints, that no angles desired in a given region can be set up using calculation method=2. In this case the constraint is for the given region and not for all the bins, as it could interfere otherwise with a "positive" constraint for the same particle types. A normal distribution curve similarly to method=1 is calculated from that part of -CONF_INT*wcontrol ® + CONF_INT*wcontrol, which is in the -1 à +1 range, centred on the not desired angle. The calculated curve is used during the chi2 calculation to provide a cos(q)-dependent weight by multiplying the calculated distribution with it, and calculating the product's squared difference from 0.

Make sure, that the interval of a negative constraint does not overlap with a positive constraint for the same particle triplet!

 

A new class (NeighbourList) for the gridding of the simulation box and for the neighbour list was created. Gridding means that the simulation box is divided into a given number of sub-cells in each direction. If we know about each atom, in which grid cell it is located, and we want to calculate certain properties for only those atoms that are not farther from the chosen atom than a given distance, then it is enough to calculate only for those atoms, which are located in the same, and a given number of neighbouring grid cells. As checking, whether the move can be acceptable based on the satisfaction of the cutoff distances falls in this category, calculation can be quicker, if the grid-based cutoff check is performed before entering the lengthy histogram change calculation. The new class has its own header and source files (NeighbourList.h and .cpp). A new parameter, giving the desired number of particles in a grid cell can be found in the *.dat file after the load_hist parameter. Based on this, the average number density of the sample and an additional safety parameter, SAFE_ADD located in the units.h, the program calculates the number of grid cells in each direction and dimension of the necessary arrays of the NeighbourList object. (The program has to establish the maximum number of particles in a grid cell, and the maximum number of neighbours for an atom represented on the neighbour list.) As due to inhomogeneity in the sample these numbers can vary, the program always checks, whether the maximum is not exceeded, before attempting to write into these arrays. If the maximum was exceeded, the program will stop with an error message before version 1.4. In this case increase the SAFE_ADD value!  Beginning with version 1.4 the program will not stop, but after giving a message it will resize the necessary arrays automatically, and continues execution.  The same resizing from version 1.4 applies for the arrays connected with the cosine distribution of bond angles constraints. Before version 1.4 the program stopped with error message, if the maximum neighbours of an atom exceeded the max_neigh calculated from the number density using the largest  maximum distance involved in this constraint.

Ø     The gridding is used for a preliminary check, whether the cutoff distances are satisfied before entering the lengthy calculation of the change in the histogram. For this only the information provided by the gridding is used, the neighbour list is not.

Ø     Gridding can only increase speed, if the number of atoms in a grid cell is chosen adequately! It is preferable to set the desired maximum number of atoms in a grid cell in the *.dat file to a relatively low value (5-10) depending of course on the system size to ensure at least 5 or preferably more grid cell in each direction. The actual number of grid cells is calculated by the program and printed on screen during the initialisation period, or can be found in the *.grid file.  It has to be kept in mind, that even if the largest cutoff distance is smaller than the length of one grid cell, not just the cell containing the central atom, but all its closest neighbour cells are checked, as the central atom can be close to the edge. This way normally at least 27 cells are checked. If the total number of grid cells is smaller than the number of cells to be checked (the cells to be checked to the right are overlapping with the cells to be checked to the left due to the periodic boundary conditions), care is taken that the overlapping cell are checked only once. Speed increase due to gridding can only be expected, if the number of grid cells to be checked is smaller, than the total number of grid cells!

Ø     If there is/are constraint(s) for the cosine distribution of bond angles, than the neighbour lists are created for the constrained types only. Not only the indices of the neighbours, but their type, squared distance from their central atom and the components of the centralàneighbour vector are also stored to spare the time of calculating them several time, when they should be used in the calculation.

Ø     During the simulation loop the gridding and the neighbour list (if it exists) are not completely recalculated, but only updated for the atoms effected by the move to make the simulation faster.

 

The possibility of the swaps of different type particles was included into the makemove function. The swap fraction (fraction of swaps related to the generated moves) is the parameter controlling the frequency of the swaps (has to be between 0 and 1) is included into the .dat file. Following this in the same line for each mixed partial (their number is ntypes*(ntypes-1)/2 the boolean indicator 1 (if the given partial can be involved in swap) or 0 (not involved in swap). The order of the partials is the same, as usual, without the ‘clean’ partials(1-2, 1-3,…2-3,…). For example in case of a 3-component system, if we want that 10% of the generated moves should be swaps, and only allow type1-type2 and type2-type3 swaps, than the *.dat file should contain the following line:

 

0.1  1  0  1            ! fraction of swaps, ntypes*(ntypes-1)/2 0 (not allowed) or 1 (allowed) for the possible mixed partial (in order 1-2, 1-3,... 2-3...) 

 

Swaps can be used only in case of atomic (even multi-atomic) moves, but not with the FNC option. In case of FNC the swap fraction is automatically put to zero. Obviously cannot be used in case of a mono-atomic system.

 

A multi-threaded version of the code (RMC++_multi) was written for multi-processor computers using the POSIX thread standard to increase the speed of the calculation by distributing the work between the threads. The functionality of this version is the same as of the standard, consecutive version. The only difference in the usage is that the total number of threads (recommended to match the number of available processors) has to be specified in the last line of the *.dat file.  

 

Code changes, that has no direct effect on the user

 

The main reason of partly changing the structure of the program was to make it more truly object oriented, quicker, and more suitable for parallelisation.

·        Static variables were introduced in each class storing the values that are the same for each instance, and applied to read the initial parameters using static class-member routines for the class without creating any instance of the class. This made it possible to get rid of the original, non-class-member routines reading the files containing the parameters and data of the simulation. Other non-class member routines, mostly dealing with the initial data calculation were incorporated into the class, which data they calculated, so the routines.cpp file is no longer needed. The few, general purpose routines (random number generator, output masks…) that were defined in utilities.h still exist.

·        The calculations described directly in the main loop were replaced by class-based routines to make the code parallelisable.

·        The coordinates stored in the SimpleCfg object are updated right after the new move was made with the new coordinates of the moved atoms. This was necessary for the updating of the grid and the neighbour list. If the move is rejected, the original coordinates are restored using the stored oldpos values of the Move object.

 

Some remarks about ATLAS

 

To use the ATLAS option, the ATLAS libraries have to be installed on your computer. ATLAS is natively a UNIX – LINUX based package, but it is possible to use it on WINDOWS after installing the free cygnus tools.

As the best ATLAS code can differ from platform to platform, compile the ATLAS-using code on the computer, where it will run. The present RMC program only calls the dgmev_cblas(…) routine, so there is no difference, whether the threaded or the standard CBLAS library (libptcblas.a or libcblas.a) is used in the linking, as ATLAS does not use threading in matrix-vector multiplication.

ATLAS can speed up the matrix-vector multiplication based binàdr conversion for the g(r) data sets, and the Fourier-transformation for S(Q) and F(Q).

 

Some remarks about the multi-threaded version

 

Computationally heavy parts were parallelised in the multi-threaded version, using the using the Portable Operating System Interface (POSIX) applicable on shared memory multi-processor computers. The workload is equally (as possible) divided among the threads. Parallelisation is used during the histogram and its change calculation, the ppcf, the initial and modified partial and total g(r), S(Q), F(Q) and E(Q) calculation and the copying of the changed histogram, ppcf and partial parts after acceptance/rejection was decided. The calculation of the neighbour list connected cosine distribution of bond angles is handled exclusively by the main thread.

For the comparison of the standard and the multi-threaded performance efficiency is used: , where p is the number of processors, and S(1) is the elapsed time for the single-threaded standard and S(p) for the p-threaded application.

POSIX is natively UNIX-LINUX based, but there is a freely available interface, POSIX for Win32 , which makes it easy to run the program under WINDOWS.

 

Which version to use?

 

Even if the newly added features are not used, it is recommended to use the new version, as even without the speed increase provided by ATLAS and/or multi-threading RMC++_new proved to be 1.5 times quicker, than RMC++ on our test system.

The speed increase caused by ATLAS and/or multi-threading depend on the actual parameters of the simulation (number of atoms, number of histogram bins, number and size of the experimental data, number of available processors), so no general answer can be given.

Several test were performed to test the speed increase on various platforms with varying system sizes, ATLAS usage increased the speed 1.9-2.6 for RMC++_new and 1.2-1.9 for 2-threaded RMC++_multi. The speed increase of RMC++_multi over RMC++_new is about E(2)=93-99% without ATLAS and E(2)=51-81% with ATLAS.  See details for more information.

Based on the findings of these tests, it is worth using either standard RMC++_new with ATLAS, or multi-threading, and in most of the cases the combined ATLAS-multi-threading version, but it is a good idea to check, especially before a long simulation, which code is giving the best performance for your system on the given platform.

 

Speed tests on processors with more than 2 execution core

 

The previous speed tests were performed on computers having only two processors. As a more complicated architecture became available, (Intel(R) Xeon(R) CPU E5345 2.33GHz computer containing 2 quad-core E5345 processors) new speed tests were performed, where the speed increase for n>2 threads did not reach the expected values. The problem was fixed by introducing cache padding, details about this can be found here.

 

 Top of page


Last modified 07/09/2009 by Orsolya Gereben
(comments welcome!)