Parallel calls

Working with multiple threads to parallelize the simulation is mostly accomplished by the processes called from the orchestration script. Nevertheless, if your Lua orchestration code includes some heavy processing, such as loops over mesh nodes, cells or Gauss points to initialize data or to perform additional calculations, it can be worthwhile to parallelize those loops as well.

As seen in the Shared code blocks documentation, to parallelize Lua loops you should transform them into Lua functions surrounded by pairs of SharedCodeBegin{} and SharedCodeEnd{} statements. These Lua functions can then be executed in parallel through calls to parallelCall(), nodeParallelCall() and cellParallelCall(). The problem domain (mesh nodes, for example) will then be divided according to the rules given in the call, and an instance of the user function defined in the shared code block will be executed per worker thread, receiving as a parameter the sub-domain of the global problem that should be handled by that particular function call.

Each worker thread is executed in its own Lua environment. Global variables defined inside a shared code block are "local" to each thread (there is one copy of the variable per worker thread), but their values can be exchanged with the main thread through calls to threadGlobal() and setThreadGlobal(). Both the parallel call and the environment management functions can only be called from the orchestration script, in the main execution thread.

As documented in the simulation options page, there are several configuration parameters that can be used to define the number of threads that will be used in parallel calls, as well as the default behaviour for breaking a problem domain into sub-domains. The execution environment can be queried through some thread management global variables.
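As a small illustration of the two paragraphs above, the sketch below (the worker function name and the mesh name are placeholders) initializes a per thread global counter, lets each worker thread update its own copy inside a parallel cell loop and then collects and adds the per thread values back in the main thread, using the maxWorkerThreads global variable shown in the examples further below:

SharedCodeBegin{}
-- Hypothetical worker: counts the cells in its task using the per thread
-- global variable 'cellCount' (each worker thread updates its own copy)
function countCells(taskCells)
  for cell in taskCells() do
    cellCount = cellCount + 1
  end
end
SharedCodeEnd{}

function ProcessScript()
  -- Initialize the per thread counter in every thread (including the main one)
  setThreadGlobal('cellCount', 0)

  -- Run the hypothetical worker over the active cells of a mesh named 'mesh'
  cellParallelCall('mesh', true, 'countCells')

  -- Collect the per thread values back in the main thread and add them up
  local counts = threadGlobal('cellCount')
  local total  = 0
  for i = 0, maxWorkerThreads do
    total = total + counts[i]
  end
  print('Number of active cells:', total)
end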

The following example shows a function, called from the orchestration script, used to initialize porosity values at mesh Gauss points as a function of the point depth. In the first example, the function is executed by the main thread only. In the second, it is transformed into a parallel version that can be executed through cellParallelCall(). Notice that the changes are minimal: the function was inserted into a shared code block and, instead of traversing all the mesh cells, it receives as a parameter the set of cells that it should initialize. Also, instead of being called directly by the orchestration script, the initPorosity() function is now called through cellParallelCall().

Example 1 - Serial code:

--
-- Initializes element porosity on Gauss points using the Athy porosity model:
--    phi(z) = phi0 * exp(-c*z)
--
-- Where:
--    z    : the BURIAL depth (without bathymetry), in m
--    phi0 : initial porosity in v/v (0 to 1)
--    c    : compaction coefficient in 1/m
--
function initPorosity()
  -- Get the needed accessors. Notice that the accessors will retrieve the data in the
  -- needed units (which are different from the specified material units for phi0 and phic)
  local m       = modelData:mesh('mesh')
  local coordAc = assert(m:nodeValueAccessor('burialDepth', 'm'))
  local phiAc   = assert(m:gaussAttributeAccessor('phi', 'V/V'))
  local phi0Ac  = assert(m:cellPropertyAccessor('phi0', 'V/V'))
  local phicAc  = assert(m:cellPropertyAccessor('phic', '1/m'))

  local ir, shape, etype   -- Cache for integration rules and shape functions
  for i = 1, m:numCells() do
    local e = m:cell(i)

    -- Get the element's integration rule and shape functions
    if e:type() ~= etype then
      etype = e:type()
      ir    = assert(m:elementIntegrationRule(etype, 1))
      shape = assert(e:shape())
    end

    -- Get a matrix with cell node coordinates
    local X = e:nodeMatrix(coordAc, true)

    -- Get the values of phi0 and phic for this element
    local phi0 = phi0Ac:value(e)
    local phic = phicAc:value(e)

    -- Loop over the element integration points
    for j = 1, ir:numPoints() do
      local ip    = ir:integrationPoint(j)           -- Get the integration point natural coordinates...
      local coord = shape:naturalToCartesian(ip, X)  -- ... and convert them to Cartesian coordinates

      -- Calculate phi at the integration point depth using the Athy law
      local phi = phi0 * math.exp(-phic*coord(2))

      -- Save the calculated porosity
      phiAc:setValue(e, j, phi)
    end
  end
end

-- The orchestration script
function ProcessScript()
  -- ...
  initPorosity()
  -- ...
end

Example 2 - Parallel code:

SharedCodeBegin{}
--
-- Initializes element porosity on Gauss points using the Athy porosity model:
--    phi(z) = phi0 * exp(-c*z)
--
-- Where:
--    z    : the BURIAL depth (without bathymetry), in m
--    phi0 : initial porosity in v/v (0 to 1)
--    c    : compaction coefficient in 1/m
--
function initPorosity(taskCells)
  -- Get the needed accessors. Notice that the accessors will retrieve the data in the
  -- needed units (which are different from the specified material units for phi0 and phic)
  local m       = modelData:mesh('mesh')
  local coordAc = assert(m:nodeValueAccessor('burialDepth', 'm'))
  local phiAc   = assert(m:gaussAttributeAccessor('phi', 'V/V'))
  local phi0Ac  = assert(m:cellPropertyAccessor('phi0', 'V/V'))
  local phicAc  = assert(m:cellPropertyAccessor('phic', '1/m'))

  local ir, shape, etype   -- Cache for integration rules and shape functions
  for e in taskCells() do
    -- Get the element's integration rule and shape functions
    if e:type() ~= etype then
      etype = e:type()
      ir    = assert(m:elementIntegrationRule(etype, 1))
      shape = assert(e:shape())
    end

    -- Get a matrix with cell node coordinates
    local X = e:nodeMatrix(coordAc, true)

    -- Get the values of phi0 and phic for this element
    local phi0 = phi0Ac:value(e)
    local phic = phicAc:value(e)

    -- Loop over the element integration points
    for j = 1, ir:numPoints() do
      local ip    = ir:integrationPoint(j)           -- Get the integration point natural coordinates...
      local coord = shape:naturalToCartesian(ip, X)  -- ... and convert them to Cartesian coordinates

      -- Calculate phi at the integration point depth using the Athy law
      local phi = phi0 * math.exp(-phic*coord(2))

      -- Save the calculated porosity
      phiAc:setValue(e, j, phi)
    end
  end
end
SharedCodeEnd{}

-- The orchestration script
function ProcessScript()
  -- ...
  cellParallelCall('mesh', true, 'initPorosity')
  -- ...
end

Index:

Global variable management functions

threadGlobal(threadId, globalVarName)
threadGlobal(globalVarName)
Description: Queries the value of the named global variable in the given thread environment.
Parameters: threadId The thread id. Must be a value between 0 (referencing the main thread) and maxWorkerThreads. If the thread id parameter is missing, all threads will be queried and the results packed in a table indexed by thread id (including the main one, represented by thread id 0).
globalVarName The name of the per thread global variable to be queried.
Returns: Returns the global variable value if a threadId was given in the call, or a table with per thread values if no threadId was given. In that case, the table can be indexed from 0 to maxWorkerThreads, with the value at index 0 storing the value in the main thread and the remaining indices storing the values in the other worker threads.

Example:

-- Query and print the values stored in each thread in the global variable 'threadResults'
local tresults = threadGlobal('threadResults')
for i = 0, maxWorkerThreads do
  print(i, tresults[i])
end

-- Another way to query and print the values stored in each thread in the global variable 'threadResults'
for i = 0, maxWorkerThreads do
  print(i, threadGlobal(i, 'threadResults'))
end


setThreadGlobal(threadId, globalVarName, value)
setThreadGlobal(globalVarName, value)
Description: Sets the value of the named global variable in the given thread environment.
Parameters: threadId The thread id. Must be a value between 0 (referencing the main thread) and maxWorkerThreads. If the thread id parameter is missing, the same value will be used to initialize the global variable in every thread (including the main one, represented by thread id 0).
globalVarName The name of the per thread global variable to be changed. If there is no global variable with that name in the given thread, it will be created.
value The value to be set. Must be a value that can be copied to another Lua environment. This includes numbers, strings, booleans, nil, tables and most objects that can be obtained through other API calls (like accessors, meshes, integration rules, etc), but excludes functions and some "handle" objects returned by processes (io handles, fem handles, etc). Tables are deep copied into the other thread environments and must be keyed by strings and numbers, with values following these same rules and without creating cycles (a small sketch passing a table value is shown after the example below).
Returns: Nothing.

Example:

-- Initializes with 0 the value of the global variable 'threadResults' stored in each thread
setThreadGlobal('threadResults', 0)

-- Initializes the per thread global variable 'X' with the thread id
for i = 0, maxWorkerThreads do
  setThreadGlobal(i, 'X', i)
end
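
Since tables are deep copied into each thread environment, the same call can also be used to distribute a small per thread configuration table. A minimal sketch, with purely hypothetical field names and values:

-- Distributes a (deep copied) configuration table to every thread. Only string and
-- number keys are allowed, values must follow the copy rules described above and the
-- table may not contain cycles. The field names and values below are hypothetical.
setThreadGlobal('config', { tolerance = 1.0e-6, maxIterations = 50, model = 'athy' })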


Parallel execution functions

parallelCall(functionName, numThreads, taskParams, taskAffinity, ...)
Description: Executes the named function in a parallel environment. Must be called from the orchestration script, in the main thread only.
Parameters: functionName The name of the function that will be executed. This parameter is a string, not a function reference, and the function must have been defined inside a Shared code block. The given function will receive as parameters a single entry from the taskParams table, followed by any additional parameters given in the extra parameters to parallelCall(). It should return no values. To report an error, aborting the parallel call, it can call the standard Lua error() function. Pending cancelation requests are handled by API calls, but if those are not being made frequently enough, explicit calls to checkForCancelation() can be made. Process calls cannot be made inside a parallel function.
numThreads The number of parallel worker threads that will be launched, limited by the maximum allowed number of threads (maxWorkerThreads). A value of -1 means that the number of threads will be equal to the maximum. A value of 0 effectively disables thread usage, meaning that tasks will be executed by the main thread. A value of 1, on the other hand, means that tasks will be executed by a single thread, other than the main one (which can be useful for testing).
taskParams A table storing per task parameters, defining the work to be done by each parallel function invocation. This table should be a list with a number of entries equal to the number of parallel tasks to be executed. Each entry will be passed to one function invocation. Alternatively, the table can be replaced by the number of tasks. In that case, each task will receive a task id as its task parameter (a number between 1 and the number of tasks; see the sketch after the example below). Each table entry must be a value that can be copied to another environment, as described in the setThreadGlobal() documentation.
taskAffinity An optional table, with the same size as the effective number of tasks, storing the thread id that should be used to execute each task. When missing, each task can be executed by any idle thread. When given, the thread id defines which thread will be responsible for executing each task. A value of zero means that the particular task can be executed by any thread. See also the description for lastParallelCallAffinity(). This parameter must be present (it can be nil) when extra parameters are supplied to the worker function.
... The additional set of parameters that will be sent to the function referenced by functionName. Each parameter must be a value that can be copied to another environment, as described in the setThreadGlobal() documentation.
Returns: Returns true if all tasks have finished normally or false if any one of them raised an error. On cancelation requests, like any other API call, this function does not return.

Example:

SharedCodeBegin{}
-- The function receives as parameter its task description, consisting of a first
-- index and a last index to be traversed, plus one extra parameter, sent
-- from the parallel call, with the target mesh object
function taskf(taskParam, mesh)
  -- Traverses the subset of mesh cells received in the given task parameter
  for i = taskParam.first, taskParam.last do
    local elem = mesh:cell(i)
    -- Do something with the given element
  end
end
SharedCodeEnd{}

function ProcessScript()
  local m = modelData:mesh('myMeshName')

  -- Table defining cell ranges for 9 'randomly' defined tasks
  local tasks = { {first =   1, last =  10},
                  {first =  15, last =  25},
                  {first = 100, last = 120},
                  {first = 150, last = 170},
                  {first = 200, last = 230},
                  {first = 310, last = 320},
                  {first = 415, last = 440},
                  {first = 500, last = 520},
                  {first = 580, last = 595},
                }

  -- Calls taskf() from three different threads. The function will be called nine
  -- times, each one receiving as its task parameter one distinct entry from the
  -- tasks table. The function also receives the mesh object as an additional
  -- parameter (nil is given for the optional taskAffinity parameter)
  parallelCall('taskf', 3, tasks, nil, m)
end
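
As noted in the taskParams description, the task table can be replaced by a plain number of tasks, in which case each function invocation receives its task id as the task parameter. A minimal sketch of that variant, with a hypothetical worker function name:

SharedCodeBegin{}
-- Hypothetical worker: receives the task id (a number between 1 and the total
-- number of tasks) as its task parameter
function taskById(taskId)
  -- Do the work associated with the given task id
end
SharedCodeEnd{}

function ProcessScript()
  -- Runs 8 tasks on up to 4 worker threads; taskById() will be called eight times,
  -- each call receiving a distinct task id between 1 and 8
  parallelCall('taskById', 4, 8)
end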


nodeParallelCall(mesh, nodeTypes, functionName, callParams, ...)
Description: Executes a parallel node loop by splitting the mesh domain into several tasks, each one storing part of the mesh nodes. Several instances of the given function are then called in parallel, each receiving as parameter a task with a partition of the mesh nodes. Must be called from the orchestration script, in the main thread only.
Parameters: mesh The mesh whose nodes will be traversed. Accepts either a mesh name or a mesh object.
nodeTypes Defines which type of nodes will be included in the node partitions. Should be equal to "geometry", "ghost" or "both". See info:affectedNodes().
functionName The name of the function that will be called for executing each parallel task. This parameter is a string, not a function reference, and the function must have been defined inside a shared code block. The given function will receive as parameters an iterator object that can be used to traverse the set of nodes belonging to its allotted task, followed by any additional parameters given in the extra parameters to nodeParallelCall(). It should return no values. To report an error, aborting the parallel call, it can call the standard Lua error() function. Pending cancelation requests are handled by API calls and also by calls to the iterator object, but if those are not being made frequently enough, explicit calls to checkForCancelation() can be made.
If the iterator object parameter is named taskNodes, the set of nodes in the task can be traversed by the following loop statement: for node in taskNodes() do. If nodeTypes is equal to "ghost", node indices returned by the iterator will be the ghost nodes' "local" (zero based) indices. If equal to "both", node indices will be their "linear" indices. See the section on working with ghost nodes for further explanations.
callParams An optional table storing fields controlling the number of threads that will be used by the parallel call, along with the number of tasks in which the domain will be partitioned and the strategy that will be used for that partition (a variant using an explicit callParams table is sketched after the example below). Available fields:
- strategy: A string defining the strategy that will be used to create node groups. Currently accepted values are "default" and "sequential", both meaning that mesh nodes will be partitioned into equal sized sets with sequential mesh node indices. If this field is missing, its default will be equal to the value of the defNodePartitionStrategy field from the simulation options;
- nworkers: The number of parallel worker threads that will be launched, limited by the maximum allowed number of threads (maxWorkerThreads). A value of -1 means that the number of threads will be equal to the maximum. A value of 0 effectively disables thread usage, meaning that tasks will be executed by the main thread. A value of 1, on the other hand, means that tasks will be executed by a single thread, other than the main one (which can be useful for testing). If this field is missing, the default will be equal to -1;
- ntasks: The number of node groups in which the mesh will be partitioned. If less than the number of workers, the used number of workers will be decreased to match. If greater, tasks will be queued and sent to workers as they become available. If 0, the default number of tasks from the defNumTasks field from the simulation options will be used. If > 0, that number will be used. If < 0, the number of tasks will be equal to the effective number of workers multiplied by the absolute value of the given number of tasks (-1 equals the number of workers, -2 twice the number of workers and so on). If this field is missing, the default will be equal to 0;
- affinity: A table with the same size as the effective number of tasks, storing the thread id that should be used to execute each task. When missing, each task can be executed by any idle thread. When given, the thread id defines which thread will be responsible for executing each task. A value of zero means that the particular task can be executed by any thread. See also the description for lastParallelCallAffinity().
... The additional set of parameters that will be sent to the task function referenced by functionName. Each parameter must be a value that can be copied to another environment, as described in the setThreadGlobal() documentation. To pass these parameters, the callParams table must have been given (but it can be replaced by nil or by an empty table).
Returns: Returns true if all tasks have finished normally or false if any one of them raised an error. On cancelation requests, like any other API call, this function does not return.

Example:

SharedCodeBegin{}
-- Function receives as parameters the node iterator plus two
-- extra parameters a and b sent from the parallel call
function nodef(taskNodes, a, b)
  -- Traverses the subset of mesh nodes that were designated to this function call
  for node in taskNodes() do
    -- Do something with the mesh node
  end
end
SharedCodeEnd{}

function ProcessScript()
  -- Calls nodef in parallel for mesh geometry nodes, using the default call parameters
  -- and sending two additional values, 10 and 'extra par 2', that are received in nodef
  -- as a and b.
  nodeParallelCall('mesh_name', 'geometry', 'nodef', {}, 10, 'extra par 2')
end
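
The same call can also be made with an explicit callParams table, as referenced in the parameter description above. The sketch below uses purely illustrative values for the documented fields:

-- Same call as above, but limited to 2 worker threads, with twice as many tasks as
-- workers and an explicitly stated (sequential) partition strategy
nodeParallelCall('mesh_name', 'geometry', 'nodef',
                 {strategy = 'sequential', nworkers = 2, ntasks = -2},
                 10, 'extra par 2')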


cellParallelCall(mesh, activeOnly, functionName, callParams, ...)
Description: Executes a parallel cell loop by splitting the mesh domain into several tasks, each one storing part of the mesh cells. Several instances of the given function are then called in parallel, each receiving as parameter a task with a partition of the mesh cells. Must be called from the orchestration script, in the main thread only.
Parameters: mesh The mesh whose cells will be traversed. Accepts either a mesh name or a mesh object.
activeOnly A boolean value stating whether only active cells should be traversed (true) or not (false).
functionName The name of the function that will be called for executing each parallel task. This parameter is a string, not a function reference, and the function must have been defined inside a shared code block. The given function will receive as parameters an iterator object that can be used to traverse the set of cells belonging to its allotted task, followed by any additional parameters given in the extra parameters to cellParallelCall(). It should return no values. To report an error, aborting the parallel call, it can call the standard Lua error() function. Pending cancelation requests are handled by API calls and also by calls to the iterator object, but if those are not being made frequently enough, explicit calls to checkForCancelation() can be made.
If the iterator object parameter is named taskCells, the set of cells in the task can be traversed by the following loop statement: for cell in taskCells() do, where cell is a cell object.
callParams An optional table storing fields controlling the number of threads that will be used by the parallel call, along with the number of tasks in which the domain will be partitioned and the strategy that will be used for that partition. Available fields:
- strategy: A string defining the strategy that will be used to create cell groups. Currently accepted values are "default" and "sequential", both meaning that mesh cells will be partitioned into equal sized sets with sequential mesh cell indices. If this field is missing, its default will be equal to the value of the defCellPartitionStrategy field from the simulation options;
- nworkers: The number of parallel worker threads that will be launched, limited by the maximum allowed number of threads (maxWorkerThreads). A value of -1 means that the number of threads will be equal to the maximum. A value of 0 effectively disables thread usage, meaning that tasks will be executed by the main thread. A value of 1, on the other hand, means that tasks will be executed by a single thread, other than the main one (which can be useful for testing). If this field is missing, the default will be equal to -1;
- ntasks: The number of cell groups in which the mesh will be partitioned. If less than the number of workers, the used number of workers will be decreased to match. If greater, tasks will be queued and sent to workers as they become available. If 0, the default number of tasks from the defNumTasks field from the simulation options will be used. If > 0, that number will be used. If < 0, the number of tasks will be equal to the effective number of workers multiplied by the absolute value of the given number of tasks (-1 equals the number of workers, -2 twice the number of workers and so on). If this field is missing, the default will be equal to 0;
- affinity: A table with the same size as the effective number of tasks, storing the thread id that should be used to execute each task. When missing, each task can be executed by any idle thread. When given, the thread id defines which thread will be responsible for executing each task. A value of zero means that the particular task can be executed by any thread. See also the description for lastParallelCallAffinity().
... The additional set of parameters that will be sent to the task function referenced by functionName. Each parameter must be a value that can be copied to another environment, as described in the setThreadGlobal() documentation. To pass these parameters, the callParams table must have been given (but it can be replaced by nil or by an empty table).
Returns: Returns true if all tasks have finished normally or false if any one of them raised an error. On cancelation requests, like any other API call, this function does not return.

Example:

SharedCodeBegin{}
-- Function receives as parameter the cell iterator
function cellf(taskCells)
  -- Traverses the subset of mesh cells that were designated to this function call
  for cell in taskCells() do
    -- Do something with the mesh cell
  end
end
SharedCodeEnd{}

function ProcessScript()
  -- Calls cellf in parallel for every mesh cell (including inactive ones), using the
  -- default partition strategy and number of worker threads, but stating that the desired
  -- number of tasks should be equal to five times the number of worker threads.
  cellParallelCall('mesh_name', false, 'cellf', {ntasks = -5})
end


lastParallelCallAffinity()
Description: Returns a Lua table filled with the ids of the threads used to execute each task in the previous call to parallelCall(), nodeParallelCall() or cellParallelCall(). This can be used to force a task in a following parallel call to be executed by a given thread.
Parameters: None.
Returns: Returns a Lua table filled with the ids of the threads used to execute each task in the previous parallel call.

Example:

function ProcessScript()
  -- Calls cellf in parallel for every mesh cell (including inactive ones), using the
  -- default partition strategy and number of worker threads, but stating that the desired
  -- number of tasks should be equal to five times the number of worker threads.
  cellParallelCall('mesh_name', false, 'cellf', {ntasks = -5})

  -- Recovers a table with information about which thread executed each task
  local taskAffinity = lastParallelCallAffinity()

  -- Calls cellf2 in parallel with the same partition parameters used in the call above.
  -- Each task is executed by the SAME thread that executed that same ordered task in
  -- the previous parallel call, meaning that if cell 25 was processed by thread 3 in the
  -- previous call, it will also be processed by thread 3 in the current call.
  cellParallelCall('mesh_name', false, 'cellf2', {ntasks = -5, affinity = taskAffinity})
end