November 2023 · 57 Reads · 14 Citations
... The pseudocode for the optimized algorithm for a lower triangular, non-transposed matrix is given in Algorithm 4. It traverses the matrix "bottom-up". For the last columns, the calculations are performed with the baseline algorithm (lines 4-14). For the remaining columns, they are performed by an optimized algorithm that traverses the matrix along its diagonals (lines 15-31).

    ...
        AXPY(length + 1, alpha * X[i], a + lda * i, Y + i);
    7:  end for
    8:  for (i = 0; i < iend; i += BLOCK_SIZE) do
    9:      y_copy = LOAD(Y + i);
    10:     for (int j = 0; j < k; j++) do
    11:         x_copy = LOAD(X + i + 1 + j);
    12:         diag_a = LOAD_WITH_STRIDE(a + 1 + j, STRIDE);
    13:         mul = MUL_VV(x_copy, diag_a);
    14:         y_copy = FMA_VF(y_copy, alpha, mul);
    15:     end for
    16:     SAVE(Y + i, y_copy);
    17:     a += BLOCK_SIZE * lda;
    18: end for
    19: for (; i < n - k; i++) do
    20:     length = MIN(n - i - 1, k);
    21:     Y[i] += alpha * DOT(length, a + 1, X + i + 1);
    22:     a += lda;
    23: end for
    24: for (; i < n; i++) do
    25:     length = MIN(n - i - 1, k);
    26:     AXPY(length + 1, alpha * X[i], a,
    27:          Y + i);
    28:     Y[i] += alpha * DOT(length, a + 1, X + i + 1);
    29:     a += lda;
    30: end for
    31: }

Depending on which triangle of the matrix is stored and whether the matrix is transposed, this operation is performed with DOT or with AXPY. ...
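The strided load is what turns the column-oriented DOT into a walk along diagonals. This derivation assumes that the band array `a` holds A_{c+d,c} at position c·lda + d, with the diagonal at d = 0 (lower storage). It also assumes that STRIDE equals lda. Under those assumptions, the vector loaded in line 12 for block start i and loop index j collects one entry of the (j+1)-th sub-diagonal from each of B = BLOCK_SIZE consecutive columns. By symmetry, those are exactly the upper-triangle entries of rows i+t that the baseline DOT would read:

```latex
\[
\begin{aligned}
a[(i+t)\,\mathrm{lda} + 1 + j] &= A_{i+t+1+j,\; i+t} = A_{i+t,\; i+t+1+j},
  && t = 0,\dots,B-1,\quad j = 0,\dots,k-1,\\
y_{i+t} &\leftarrow y_{i+t} + \alpha \sum_{j=0}^{k-1} A_{i+t,\; i+t+1+j}\, x_{i+t+1+j},
  && \text{valid if } i + B + k \le n .
\end{aligned}
\]
```

The largest index touched is i + B - 1 + k, so the condition i + B + k ≤ n keeps every access inside the matrix. This is presumably why the blocked loop stops at `iend` and hands the remaining rows to the scalar DOT loop and the baseline.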
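The following is a minimal Python sketch of the lower-storage case. It can be used to check the index arithmetic of the diagonal traversal against a per-column AXPY/DOT loop, which is assumed here to be the baseline. The sketch rests on several assumptions:

- The band is stored column by column, so that `a[i*lda + d]` holds A[i+d][i].
- `y` has already been scaled by beta, so only `y += alpha*A*x` is computed.
- The elided opening lines apply the full-length AXPY to columns 0 to n-k-1.
- The final loop uses the lower-storage AXPY, which starts at the diagonal entry and updates `Y + i`.

`BLOCK_SIZE`, all function names, and the test matrix are hypothetical. The vector primitives (LOAD, LOAD_WITH_STRIDE, MUL_VV, FMA_VF, SAVE) are emulated with scalar loops, so nothing here says anything about performance.

```python
import random

BLOCK_SIZE = 4  # hypothetical vector length (rows updated per emulated vector op)


def axpy(length, alpha, a, a_off, y, y_off):
    """y[y_off + t] += alpha * a[a_off + t] for t in [0, length)."""
    for t in range(length):
        y[y_off + t] += alpha * a[a_off + t]


def dot(length, a, a_off, x, x_off):
    """Sum of a[a_off + t] * x[x_off + t] for t in [0, length)."""
    return sum(a[a_off + t] * x[x_off + t] for t in range(length))


def sbmv_lower_baseline(n, k, alpha, a, lda, x, y):
    """y += alpha * A @ x for symmetric banded A; a[i*lda + d] holds A[i+d][i]."""
    for i in range(n):
        length = min(n - i - 1, k)
        axpy(length + 1, alpha * x[i], a, i * lda, y, i)      # lower part of column i
        y[i] += alpha * dot(length, a, i * lda + 1, x, i + 1)  # mirrored upper part of row i


def sbmv_lower_diagonal(n, k, alpha, a, lda, x, y):
    """Same result as the baseline; the DOT part of full-band rows is computed
    BLOCK_SIZE rows at a time by walking the k sub-diagonals."""
    # Assumed content of the elided opening lines: AXPY for columns with a full band.
    for c in range(max(n - k, 0)):
        axpy(k + 1, alpha * x[c], a, lda * c, y, c)
    # Blocked part: every row in a block satisfies row + k <= n - 1.
    iend = (n - k) // BLOCK_SIZE * BLOCK_SIZE if n > k else 0
    col = 0  # offset of column i in a (the pseudocode's moving pointer a)
    i = 0
    while i < iend:
        y_copy = y[i:i + BLOCK_SIZE]                               # LOAD
        for j in range(k):
            x_copy = x[i + 1 + j:i + 1 + j + BLOCK_SIZE]           # LOAD
            # LOAD_WITH_STRIDE, assuming STRIDE == lda: entry j+1 of BLOCK_SIZE columns
            diag_a = [a[col + t * lda + 1 + j] for t in range(BLOCK_SIZE)]
            for t in range(BLOCK_SIZE):
                y_copy[t] += alpha * (x_copy[t] * diag_a[t])       # MUL_VV, then FMA_VF
        y[i:i + BLOCK_SIZE] = y_copy                               # SAVE
        col += BLOCK_SIZE * lda
        i += BLOCK_SIZE
    # Remaining full-band rows: scalar DOT.
    while i < n - k:
        length = min(n - i - 1, k)
        y[i] += alpha * dot(length, a, col + 1, x, i + 1)
        col += lda
        i += 1
    # Last columns, where the band is cut off by the matrix edge: baseline AXPY + DOT.
    while i < n:
        length = min(n - i - 1, k)
        axpy(length + 1, alpha * x[i], a, col, y, i)
        y[i] += alpha * dot(length, a, col + 1, x, i + 1)
        col += lda
        i += 1


def random_symmetric_band(n, k, rng):
    """Dense symmetric test matrix with bandwidth k."""
    dense = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for d in range(min(k, n - 1 - c) + 1):
            dense[c + d][c] = dense[c][c + d] = rng.uniform(-1.0, 1.0)
    return dense


def pack_lower_band(dense, n, k, lda):
    """Column-by-column lower band storage: a[c*lda + d] = A[c+d][c]."""
    a = [0.0] * (n * lda)
    for c in range(n):
        for d in range(min(k, n - 1 - c) + 1):
            a[c * lda + d] = dense[c + d][c]
    return a


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, alpha = 12, 3, 0.5
    lda = k + 1
    dense = random_symmetric_band(n, k, rng)
    a = pack_lower_band(dense, n, k, lda)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y_ref = [1.0 + alpha * sum(dense[r][c] * x[c] for c in range(n)) for r in range(n)]
    y_base, y_diag = [1.0] * n, [1.0] * n
    sbmv_lower_baseline(n, k, alpha, a, lda, x, y_base)
    sbmv_lower_diagonal(n, k, alpha, a, lda, x, y_diag)
    print(max(abs(p - q) for p, q in zip(y_base, y_ref)))
    print(max(abs(p - q) for p, q in zip(y_diag, y_ref)))
```

With n = 12, k = 3 and `BLOCK_SIZE` = 4, the example exercises all four phases:

- two blocks (rows 0 to 7);
- one scalar DOT row (row 8);
- three baseline columns (9 to 11).

Every phase only adds to `y`, so the split does not change the result in exact arithmetic. The phases do sum in a different order than the baseline, which is why the two variants are compared against the dense product with a tolerance rather than for exact equality.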