Computing prefix sums

Given is an array of A of n integers, such that n is power of two. The task is to produce a new array, B, that the i-th element in B is the sum of the first i elements in A.

A simple sequential implementation is as follows:

input int A[_];
int n = A.size;
output int B[n];

B[0] = A[0];

for (int i = 1; i < n; i++)
    B[i] = B[i - 1] + A[i];
Both time and work of the previous algorithm are linear. We present two implementation of parallel version with linear work and logarithmic time. The first one is a recursive procedure:
input int A[_];
    output int B[A.size];

    int prefix_sums(int A[_], int B[_]) {
        if (A.size == 1) {
          B[0] = A[0];
          return 1;
        }

        int n = A.size / 2;
        int AA[n], BB[n];

        pardo(i : n) AA[i] = A[2 * i] + A[2 * i + 1];

        prefix_sums(AA, BB);

        pardo(i : n) {
            B[2 * i] = BB[i] - A[2 * i + 1];
            B[2 * i + 1] = BB[i];
        }

        return 1;
    }

    prefix_sums(A, B);
The idea is to first collapse the input array by adding consecutive pairs of elements. Recursively computing the prefix sums BB on the collapsed array AA, we get the prefix sums of all even prefixes of A. Then it is easy to fill the odd prefixes by subtracting the corresponding value from A.

Another version is a bottom-up non-recursive algorithm. For clarity, we use an auxiliary storage B with n log n elements (log n rows and n columns). Note however, that we use only linearly many elements from B, so it is not hard to rewrite the program such that it uses only linear storage. The program works in two passes: in the bottom-up pass it adds the consecutive pairs of elements in each row, and in the top-down pass it fills in the elements on odd positions.

input int A[_];
int n = A.size;

int ln = 0;
for (; 2 ^ ln < n; ln++);

int B[ln + 1, n];
output int S[n];

pardo(i : n) B[0, i] = A[i];

for (int t = 0; 2 ^ t < n; t++)
    pardo(i : n / 2 ^ (t + 1))
        B[t + 1, i] = B[t, 2 * i] + B[t, 2 * i + 1];

for (int t = 0; 2 ^ t < n; t++)
    pardo(i : n / 2 ^ (ln - t)) {
        B[ln - t - 1, 2 * i] = B[ln - t, i] - B[ln - t - 1, 2 * i + 1];
        B[ln - t - 1, 2 * i + 1] = B[ln - t, i];
    }

pardo(i : n) S[i] = B[0, i];