I have a log file that looks like this.
I am trying to parse the JSON in the Message column with:
library(readr)
library(jsonlite)
df <- read_csv("log_file_from_above.csv")
fromJSON(as.character(df$Message))
However, I get the following error:
Error: parse error: trailing garbage
"isEmailConfirmed": false } { "id": -1,"firstName":
(right here) ------^
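How can I get rid of the "trailing garbage"?
Solution
fromJSON() does not "apply" itself over a character vector; it tries to convert the whole vector into a single data frame. You can try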
purrr::map(df$Message,jsonlite::fromJSON)
the approach @Abdou suggested, or
jsonlite::stream_in(textConnection(gsub("\\n","",df$Message)))
The latter two create data frames. The first creates a list, which you can add as a column.
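The difference between the list result and the data frame result can be seen in a minimal sketch. The vector `msgs` is a made-up stand-in for df$Message, not the log data from the question, and base `lapply()` is used in place of `purrr::map()` only to keep the example dependency-free:

```r
library(jsonlite)

# Hypothetical stand-in for df$Message: one JSON object per element
msgs <- c('{"id": -2, "isEmailConfirmed": false}',
          '{"id": -1, "firstName": "John"}')

# Parse each element on its own; the result is a list with one entry per message
parsed <- lapply(msgs, fromJSON)

# Treat the vector as newline-delimited JSON; the result is a data frame,
# with NA wherever a record lacks a field
records <- stream_in(textConnection(msgs), verbose = FALSE)
```

Here `parsed` is something you could attach as a list column, while `records` is ready to be bound next to the other columns.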
You can combine the last approach, stream_in(), with dplyr::bind_cols to build a new data frame that contains all of the data:
dplyr::bind_cols(df[,1:3],jsonlite::stream_in(textConnection(gsub("\\n","",df$Message))))
@Abdou also suggested an almost pure base R solution:
cbind(df,do.call(plyr::rbind.fill,lapply(paste0("[",df$Message,"]"),function(x) jsonlite::fromJSON(x))))
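As far as I can tell, the bracket wrapping works because "[{...}]" is a one-element JSON array, which fromJSON() simplifies to a one-row data frame; rbind.fill() then stacks those rows. A small sketch, again with a made-up `msgs` vector rather than the real log, and assuming plyr is installed:

```r
library(jsonlite)

# Hypothetical stand-in for df$Message
msgs <- c('{"id": -2, "isEmailConfirmed": false}',
          '{"id": -1, "firstName": "John"}')

# Wrap each object in [ ] so that fromJSON() returns a one-row data frame
rows <- lapply(paste0("[", msgs, "]"), fromJSON)

# Stack the rows; columns missing from a record are filled with NA
stacked <- do.call(plyr::rbind.fill, rows)
```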
A complete, working workflow:
library(dplyr)
library(jsonlite)
df <- read.table("http://pastebin.com/raw/MMPMwNZv",quote='"',sep=",",stringsAsFactors=FALSE,header=TRUE)
bind_cols(df[,1:3],stream_in(textConnection(gsub("\\n","",df$Message)))) %>%
  glimpse()
## Found 3 records...
## Imported 3 records. Simplifying into dataframe...
## Observations: 3
## Variables: 19
## $Id <int> 35054,35055,35059
## $Date <chr> "2016-06-17 19:29:43 +0000","2016-06-17 1...
## $Level <chr> "INFO","INFO","INFO"
## $id <int> -2,-1,-3
## $ipAddress <chr> "100.100.100.100",NA,"100.200.300.400"
## $howYouHearaboutUs <chr> NA,"Radio",NA
## $isInterestedInOffer <lgl> TRUE,FALSE,TRUE
## $incomeRange <int> 60000,1,100000
## $isEmailConfirmed <lgl> FALSE,TRUE
## $firstName <chr> NA,"John",NA
## $lastName <chr> NA,"Smith",NA
## $email <chr> NA,"john.smith@gmail.com",NA
## $city <chr> NA,"Smalltown",NA
## $birthDate <chr> NA,"1999-12-10T05:00:00Z",NA
## $password <chr> NA,"*********",NA
## $agreetoTermsOfUse <lgl> NA,TRUE,TRUE
## $visitUrl <chr> NA,"https://www.website.com/?purpose=X"
## $isIdentityConfirmed <lgl> NA,FALSE
## $validationResults <lgl> NA,NA