R: How to find the first non-zero element in a dataframe by group
I have the following dataframe
I want to subset this dataset so that, for each group, I keep only the rows from the first record up to the first 1 in the flag column; if a group has no 1, that group should not appear at all.
Something like this:
I saw some answers at Dplyr : how to find the first-non missing string by groups?
But it is for non-missing and I have both non-missing and 0 values.
Or in dplyr
(same result)
Data used:
Benchmark Output:
Benchmark code:
First using dplyr::slice, then the equivalent in base R using by, and finally one written just for performance, with a benchmark. All are robust for the case with no Flag==1 in a group.
dplyr
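The slice approach might look like the sketch below. It is a reconstruction rather than the answer's original code, and the sample data is hypothetical (only the ID and Flag column names come from the thread). It assumes dplyr is installed and relies on slice() accepting an empty index, so a group without a 1 yields no rows.

```r
library(dplyr)

# Hypothetical sample data: group 3 has no 1 in Flag, and NA marks a missing flag.
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3, 3),
  Flag = c(0, 1, 0, NA, 0, 1, 0, NA)
)

result <- df %>%
  group_by(ID) %>%
  # match(1, Flag) is the position of the first 1, or NA when there is none;
  # coalesce() turns that NA into 0 so seq_len() returns an empty index.
  slice(seq_len(coalesce(match(1, Flag), 0L))) %>%
  ungroup()

result
```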
base
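A base R version using by could be sketched as follows. Again, this is an illustration under assumptions, not the original code: the sample data is hypothetical, and the anonymous function returns NULL for groups without a 1 so that rbind skips them.

```r
# Hypothetical sample data: group 3 has no 1 in Flag, and NA marks a missing flag.
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3, 3),
  Flag = c(0, 1, 0, NA, 0, 1, 0, NA)
)

pieces <- by(df, df$ID, function(d) {
  n <- match(1, d$Flag)                        # position of the first 1, NA if none
  if (is.na(n)) NULL else d[seq_len(n), , drop = FALSE]
})

result <- do.call(rbind, pieces)               # NULL pieces are dropped by rbind
rownames(result) <- NULL
result
```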
base fast
benchmark
I ran the benchmark on @Lebatsnok's modified input, which I modified again because the NAs were not properly recognized as such. MKR's and WWW's solutions are not robust for this case, but I left them in the benchmark anyway.
data
benchmark code
A cumsum-based solution using dplyr could be:
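One way such a cumsum-based filter might be written is sketched here. It is not necessarily the answer's original code; the sample data and the helper column is_one are hypothetical, and NA flags are treated as "not 1" by assumption.

```r
library(dplyr)

# Hypothetical sample data: group 3 has no 1 in Flag, and NA marks a missing flag.
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3, 3),
  Flag = c(0, 1, 0, NA, 0, 1, 0, NA)
)

result <- df %>%
  group_by(ID) %>%
  mutate(is_one = coalesce(Flag == 1, FALSE)) %>%   # NA counts as "not 1"
  filter(
    any(is_one),                       # drop groups that never reach 1
    cumsum(is_one) - is_one == 0       # keep rows with no 1 strictly before them
  ) %>%
  ungroup() %>%
  select(-is_one)

result
```

Subtracting is_one from the running sum counts the 1s strictly before each row, so the row holding the first 1 is kept and everything after it is dropped.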
Data:
A solution using dplyr and cumsum.
DATA
With base R, one could, for instance, do this.
First of all, we need a complete test case with a group having no "1" in the "Flag" column:
Now let's define a function that takes a data frame and returns NULL if there is no 1 in $Flag, and the first N rows otherwise (where N is the number of the row where 1 first occurs). This can be done using which.max with a boolean (TRUE if $Flag is 1, FALSE otherwise):
Now we need to split the data frame by ID, apply the function, and rbind the parts again:
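The steps described in this answer can be put together roughly as below. This is a sketch, not the original code: the test data and the function name first_rows are hypothetical, and NA flags are assumed to count as "not 1".

```r
# Hypothetical test case: ID 3 has no 1 in Flag, and NA marks a missing flag.
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 3, 3),
  Flag = c(0, 1, 0, NA, 0, 1, 0, NA)
)

# Return NULL when the group has no 1; otherwise return rows 1..N,
# where N is the position of the first 1.
first_rows <- function(d) {
  is_one <- !is.na(d$Flag) & d$Flag == 1       # TRUE where Flag is 1, FALSE otherwise
  if (!any(is_one)) return(NULL)
  d[seq_len(which.max(is_one)), , drop = FALSE]  # which.max gives the first TRUE
}

parts  <- lapply(split(df, df$ID), first_rows)   # split by ID and apply
result <- do.call(rbind, parts)                  # rbind skips the NULL parts
rownames(result) <- NULL
result
```

The explicit any() check matters because which.max returns 1 for an all-FALSE vector, which would otherwise keep the first row of a group that has no 1.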