Process rows of an R data frame without a loop in a memory-efficient way

The structure of my data frame data1, which has over 1.5 million rows, is like this:
I need to insert a column Exit.time using the values in columns WEEK and END and a cutoff value, which is 1287. Exit.time should have a value of 0 or 1 based on the following logic:
If WEEK = 1287, then Exit.time = 0.
If WEEK is not equal to 1287 but WEEK = END, then Exit.time = 1; otherwise Exit.time = 0.
For this I tried the following for loop and it does what is required in the above dummy data set.
The problem is that when I use the above loop on my real data set, I still have no output even after an hour. I guess looping is not efficient given the size of the data set. Is there an alternative way to do what I want? I prefer to maintain the order of rows in data1 since I need to do some merge operations later on.
Since you need Exit.time to be 1 when (WEEK == END) & WEEK != 1287 and 0 otherwise, you can use as.numeric on the result of (WEEK == END) & WEEK != 1287, which changes TRUE to 1 and FALSE to 0.
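As a minimal sketch of that conversion, the snippet below applies the condition to two short vectors. The values are made up for illustration and are not taken from the real data set.

```r
# Hypothetical stand-in vectors; the values are illustrative only.
WEEK <- c(1287, 1200, 1250, 1287)
END  <- c(1287, 1200, 1260, 1300)

# Vectorised comparison: one TRUE/FALSE per element, no loop needed.
flag <- (WEEK == END) & WEEK != 1287

# as.numeric() maps TRUE to 1 and FALSE to 0.
exit_time <- as.numeric(flag)
print(exit_time)
```

Because the comparison is evaluated on whole vectors at once, the result has one element per input row, in the same order as the input.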
There are multiple ways to code this. They differ mostly in syntax; fundamentally they all do the same thing.
Base R:
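A minimal base R sketch, assuming a small made-up data1 in place of the real 1.5-million-row data frame:

```r
# Hypothetical stand-in for data1; only WEEK and END matter here.
data1 <- data.frame(
  WEEK = c(1287, 1200, 1250, 1287),
  END  = c(1287, 1200, 1260, 1300)
)

# Add the column in one vectorised assignment; row order is unchanged.
data1$Exit.time <- as.numeric((data1$WEEK == data1$END) & data1$WEEK != 1287)
print(data1)
```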
This involves typing data1 a lot, so there is a shortcut:
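One such shortcut in base R is with(), which evaluates an expression inside the data frame so the columns can be referred to by name. Whether this is the exact shortcut the answer originally showed is an assumption; the sample data1 is again made up.

```r
# Hypothetical stand-in for data1.
data1 <- data.frame(
  WEEK = c(1287, 1200, 1250, 1287),
  END  = c(1287, 1200, 1260, 1300)
)

# with() looks up WEEK and END inside data1, so data1$ is not repeated.
data1$Exit.time <- with(data1, as.numeric((WEEK == END) & WEEK != 1287))
print(data1)
```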
Tidyverse:
The tidyverse is a suite of packages that are great for manipulating data. We are using the dplyr package, which is part of the tidyverse, so you can load either the whole thing or just dplyr:
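A minimal sketch with dplyr's mutate(), assuming dplyr is installed and using a made-up data1:

```r
library(dplyr)  # or library(tidyverse) to load the whole suite

# Hypothetical stand-in for data1.
data1 <- data.frame(
  WEEK = c(1287, 1200, 1250, 1287),
  END  = c(1287, 1200, 1260, 1300)
)

# mutate() adds the column without reordering rows.
data1 <- data1 %>%
  mutate(Exit.time = ((WEEK == END) & WEEK != 1287) * 1)
print(data1)
```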
(I convert from TRUE/FALSE to 0/1 by multiplying by 1; it's less to type.)
We can use case_when from dplyr.
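A sketch of how that could look, assuming dplyr is installed and using a made-up data1. case_when checks its conditions in order and uses the first one that matches, so the clauses can follow the question's rules one by one.

```r
library(dplyr)

# Hypothetical stand-in for data1.
data1 <- data.frame(
  WEEK = c(1287, 1200, 1250, 1287),
  END  = c(1287, 1200, 1260, 1300)
)

data1 <- data1 %>%
  mutate(Exit.time = case_when(
    WEEK == 1287 ~ 0,  # cutoff week: always 0
    WEEK == END  ~ 1,  # not the cutoff, and WEEK equals END
    TRUE         ~ 0   # everything else
  ))
print(data1)
```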
Using data.table:
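A minimal sketch, assuming the data.table package is installed and using a made-up data1. setDT() converts the data frame in place and := adds the column by reference, which avoids copying the table and leaves the row order untouched.

```r
library(data.table)

# Hypothetical stand-in for data1.
data1 <- data.frame(
  WEEK = c(1287, 1200, 1250, 1287),
  END  = c(1287, 1200, 1260, 1300)
)

setDT(data1)  # convert to a data.table by reference (no copy)

# := creates Exit.time in place.
data1[, Exit.time := as.numeric((WEEK == END) & WEEK != 1287)]
print(data1)
```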