Thursday, April 23, 2015

Splitting and replacing a character variable in a dataframe in R

I have a data frame with several character variables (strings of varying length), and I would like to convert each variable to a list in which each element contains the words of one entry, split on spaces.

Say my data looks like this:

char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")

df <- data.frame(char, char2)

# Convert factors to character
df <- lapply(df, as.character)

> df
$char
[1] "This is a string of text" "So is this"              

$char2
[1] "Text is pretty sweet"                "Bet you wish you had text like this"
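A note on the conversion step: lapply() always returns a plain list, which is why df prints as `$char` and `$char2` above and is no longer a data frame. Below is a minimal sketch of two ways to get character columns while keeping the data.frame class, assuming the same char and char2 vectors:

```r
char  <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")

# Option 1: ask data.frame() for character columns up front
# (the default was stringsAsFactors = TRUE before R 4.0.0)
df <- data.frame(char, char2, stringsAsFactors = FALSE)

# Option 2: convert existing columns in place; assigning into df[]
# keeps the data.frame class, whereas df <- lapply(...) would not
df[] <- lapply(df, as.character)

str(df)
```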

Now I can use strsplit() to split each column individually by word:

df <- transform(df, "char" = strsplit(df[, "char"], " "))
> df$char
[[1]]
[1] "This"   "is"     "a"      "string" "of"     "text"  

[[2]]
[1] "So"   "is"   "this"
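strsplit() returns a list with one character vector per row, and those vectors have different lengths (six words and three words here), so the result can only live in a data frame as a list column. For a single column, transform() is not strictly needed. Here is a sketch, assuming df is a data frame with character columns:

```r
df <- data.frame(
  char  = c("This is a string of text", "So is this"),
  char2 = c("Text is pretty sweet", "Bet you wish you had text like this"),
  stringsAsFactors = FALSE
)

# fixed = TRUE splits on a literal space rather than a regular expression
words <- strsplit(df$char, " ", fixed = TRUE)
lengths(words)   # one entry per row: 6 and 3 words

# Store the result as a list column of df
df$char <- words
```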

What I would like is a loop or function that does this for both columns at once, something like:

for (i in colnames(df)) {
    df <- transform(df, i = strsplit(df[, i], " "))
}

This, however, produces the error:

Error in data.frame(list(char = c("This is a string of text", "So is this",  : 
  arguments imply differing number of rows: 6, 8 
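One likely cause of this error: in transform(df, i = ...), the argument name i is taken literally. transform() therefore tries to add a new column called i instead of replacing char or char2, and the ragged strsplit() result cannot be turned into ordinary columns of matching length. Indexing with [[ ]] uses the value stored in i instead. A minimal sketch, assuming df is a data frame with character columns:

```r
df <- data.frame(
  char  = c("This is a string of text", "So is this"),
  char2 = c("Text is pretty sweet", "Bet you wish you had text like this"),
  stringsAsFactors = FALSE
)

for (i in colnames(df)) {
  # df[[i]] looks up the column whose name is stored in i and
  # replaces it with a list column of word vectors
  df[[i]] <- strsplit(df[[i]], " ", fixed = TRUE)
}

df$char2[[2]]
```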

I have also tried:

splitter <- function(colname) {
    df <- transform(df, colname = strsplit(df[, colname], " "))
}

splitter(colnames(df))

Which tells me:

Error in strsplit(df[, colname], " ") : non-character argument
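Two things likely go wrong in splitter(). First, colnames(df) passes both names at once, so df[, colname] selects two columns (a data frame, not a character vector), which strsplit() rejects as a non-character argument. Second, the assignment to df inside the function only changes a local copy. The sketch below is a hypothetical rewrite that handles one column per call and returns the modified data frame, with Reduce() applying it to every column:

```r
df <- data.frame(
  char  = c("This is a string of text", "So is this"),
  char2 = c("Text is pretty sweet", "Bet you wish you had text like this"),
  stringsAsFactors = FALSE
)

# Hypothetical rewrite: takes the data frame and a single column name,
# and returns the updated data frame instead of discarding it
splitter <- function(data, colname) {
  data[[colname]] <- strsplit(data[[colname]], " ", fixed = TRUE)
  data
}

# Reduce() calls splitter(df, "char"), then splitter(<result>, "char2")
df <- Reduce(splitter, colnames(df), df)
```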

I am confused about why the call to transform() works for an individual column but not inside a loop or function. Any help would be much appreciated!
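If the end goal is just the structure described at the top (each variable as a list of word vectors), the data frame and transform() may not be needed at all. Here is a sketch using lapply(), which passes each column to strsplit() as its first argument and forwards split and fixed:

```r
df <- data.frame(
  char  = c("This is a string of text", "So is this"),
  char2 = c("Text is pretty sweet", "Bet you wish you had text like this"),
  stringsAsFactors = FALSE
)

# Result: a named list with one element per column, each of which is a
# list holding the words of every row
words <- lapply(df, strsplit, split = " ", fixed = TRUE)

words$char2[[1]]
```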
