Given is a binary array of

`A`

, the task is to find
the first occurence of `1`

.
First non-zero index

We could use the same technique as for computing the sum to get logarithmic time and linear work.

```
input int A[_];
int n = A.size;
output int x;
int ln=log(n);
int B[2 ^ ln];
pardo(i : 2 ^ ln) B[i] = 2^ln+1;
pardo(i : n) if (A[i] == 1) B[i] = i;
n = 2 ^ ln;
for (int t = 0; 2 ^ t < n; t++)
pardo(i : n / 2 ^ (t + 1))
B[i] = min(B[2 * i], B[2 * i + 1]);
x = B[0];
```

However, we want to show the advantages of the cCRCW mode. Consider the following
program:
```
#mode cCRCW
input int A[_];
int n = A.size;
int B[n];
output int x = -1;
pardo(i : n) {
B[i] = A[i];
if (i > 0) pardo(j : i) if (A[j] == 1) B[i] = 0;
if (B[i] == 1) x = i;
}
```

The outer `pardo`

is used to separately consider each position in the input array. Each
thread than spawns a number of threads to check whether it is actually the first one
(effectively, the inner `pardo`

computed an unbounded-fan-in `AND`

function in constant time).
Note that several threads may try to write to the same variable `B[i]`

,
but all of them
are writing the same value so the cCRCW mode allows it. The subsequent write is
performed by at most one thread. This program runs in constant time and quadratic work.
We can improve on it:
```
#mode cCRCW
input int A[_];
int n = A.size;
output int x = -1;
int sn = sqrt(n);
int Box[sn];
pardo(i : sn) {
Box[i] = 0;
pardo(j : sn) if (A[i * sn + j] == 1) Box[i] = 1;
}
int fb = first_one_simple(Box);
int C[sn];
pardo(i : sn) C[i] = A[fb * sn + i];
x = first_one_simple(C) + fb * sn;
```

We first cut the input into boxes of size `sqrt(n)`

.
For each box, use the `parallel-AND`

technique to find out if it is empty or not. Using the previous algorithm we find the
first non-empty box. Then again using the previous algorithm we find the first
`1`

in
this box. The time is still constant, and the work is linear.