Articles in this series
A simple example of parallel execution using OpenCL, an API for running tasks on the CPU and GPU.
In this example the kernel computes the square of each input value, and the main function validates the results.
LU Matrix Decomposition
The program decomposes the original matrix into its L and U factors using OpenCL and validates the results in the main function.
The kernel dispatches values for matrix decomposition given the indices i, j and the position by...
OpenMP is an API for adding concurrency to programs written in C or C++. It consists of a set of compiler directives for shared-memory systems.
This model divides a heavy task into k threads by making sub...
Matrix LU decomposition
In lower-upper (LU) decomposition, the original matrix can be understood as the product of a lower triangular matrix and an upper triangular matrix. A permutation matrix is sometimes included in the product as well.
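As a rough illustration of the idea, here is a minimal serial sketch in C. It assumes the Doolittle variant (unit diagonal in L), no pivoting (so no permutation matrix), a fixed size N, and a row-major INDX_POS macro; the macro name is borrowed from the intrinsics snippet in this list, but its definition here is a guess, and the function names lu_decompose and max_abs_error are hypothetical. It is not the code from the article.

```c
#include <assert.h>

enum { N = 3 };                          /* assumed fixed matrix size */
#define INDX_POS(i, j) ((i) * N + (j))   /* assumed row-major layout */

/* Doolittle LU without pivoting: A = L * U, with ones on the diagonal of L.
   Returns 0 on success, -1 if a zero pivot is found. */
static int lu_decompose(const float *a, float *l, float *u)
{
    assert(a && l && u);
    for (int i = 0; i < N * N; ++i) {
        l[i] = 0.0f;
        u[i] = 0.0f;
    }
    for (int i = 0; i < N; ++i) {
        /* Row i of U. */
        for (int j = i; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < i; ++k)
                sum += l[INDX_POS(i, k)] * u[INDX_POS(k, j)];
            u[INDX_POS(i, j)] = a[INDX_POS(i, j)] - sum;
        }
        if (u[INDX_POS(i, i)] == 0.0f)
            return -1;
        l[INDX_POS(i, i)] = 1.0f;
        /* Column i of L, below the diagonal. */
        for (int j = i + 1; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < i; ++k)
                sum += l[INDX_POS(j, k)] * u[INDX_POS(k, i)];
            l[INDX_POS(j, i)] = (a[INDX_POS(j, i)] - sum) / u[INDX_POS(i, i)];
        }
    }
    return 0;
}

/* Validation step: rebuild L * U and return the largest absolute
   difference from the original matrix A. */
static float max_abs_error(const float *a, const float *l, const float *u)
{
    float worst = 0.0f;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float prod = 0.0f;
            for (int k = 0; k < N; ++k)
                prod += l[INDX_POS(i, k)] * u[INDX_POS(k, j)];
            float diff = a[INDX_POS(i, j)] - prod;
            if (diff < 0.0f)
                diff = -diff;
            if (diff > worst)
                worst = diff;
        }
    }
    return worst;
}
```

Calling lu_decompose on a 3x3 array and then checking that max_abs_error stays near zero mirrors the "validate in the main function" step mentioned for the other articles. A zero pivot is exactly the case where a permutation matrix would be needed.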
LU with Intrinsics
In this case we go from the original matrix to its LU form using intrinsics.
In the code, the scalar line equivalent to each intrinsics call is left as a comment. For example:
//l[INDX_POS(i,j)] = 1;
__m128 result = _mm_set1_ps(1.0);
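To show how a broadcast like the one above might be used, here is a small sketch with SSE intrinsics on x86. The helper names fill_ps and mul_sub_ps are hypothetical, the unaligned load/store calls are an assumption (the snippet above does not show how the result is written back), and lengths are assumed to be multiples of 4. Each loop keeps its scalar equivalent as a comment, in the same spirit as the article's code.

```c
#include <assert.h>
#include <xmmintrin.h>   /* SSE: __m128, _mm_set1_ps, _mm_loadu_ps, ... */

/* Set n floats to the same value, four at a time.
   Scalar equivalent: for (i = 0; i < n; ++i) dst[i] = value; */
static void fill_ps(float *dst, int n, float value)
{
    assert(n % 4 == 0);                  /* assumed: no scalar tail loop */
    __m128 v = _mm_set1_ps(value);       /* broadcast value to all 4 lanes */
    for (int i = 0; i < n; i += 4)
        _mm_storeu_ps(dst + i, v);       /* unaligned store of 4 floats */
}

/* Elimination-style update with vectorized arithmetic.
   Scalar equivalent: for (i = 0; i < n; ++i) dst[i] = a[i] - b[i] * c[i]; */
static void mul_sub_ps(float *dst, const float *a, const float *b,
                       const float *c, int n)
{
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(dst + i, _mm_sub_ps(va, _mm_mul_ps(vb, vc)));
    }
}
```

The unaligned variants are chosen here only so the sketch works with any float array; with 16-byte aligned buffers, _mm_load_ps and _mm_store_ps could be used instead.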
To finish with OpenCL, we can apply full intrinsics to our previous example; in each case it is possible to keep track of execution time. Full intrinsics in this case covers arithmetic operations as well, so more vectorized instructions ar...