Moving misplaced values into the correct columns

Stack Overflow user

Asked 2022-07-13 08:59:02

1 answer · 54 views · 0 followers · 0 votes

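Beginner here. I have a large dataframe with multiple columns in which some values are misplaced, but at least each misplaced value is preceded by the correct column name. Imagine a data frame like this: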

Country <- c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA")
Platform <- c("Twitter", "Country:Germany", "Cap:200")
Start_Time <- c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram")
Cap <- c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018")

dat <- data.frame(Country, Platform, Start_Time, Cap)

Output:
Country                          Platform          Start_Time                       Cap

Spain                            Twitter           10 Jun 2018 - 2 Jul 2018         300
Time:16 Mar 2018 - 23 Apr 2018   Country:Germany   Platform:Facebook                500
USA                              Cap:200           Platform:Instagram               Time:10 Jun 2018 - 2 Jul 2018

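As you can see, whenever a value is misplaced, the correct column name (or at least an indicator of it, such as Time: for Start_Time) is placed in front of the value.

How can I move these values into their respective columns? My original dataframe has 729 rows and 45 columns, so the less manual work the better. The correct output should look like this: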

Output:
Country    Platform          Start_Time                       Cap

Spain      Twitter           10 Jun 2018 - 2 Jul 2018         300
Germany    Facebook          16 Mar 2018 - 23 Apr 2018        500
USA        Instagram         10 Jun 2018 - 2 Jul 2018         200

Thanks a lot.

Edit: here is the output of dput(df_short_head), which is the first 6 rows of my original data

structure(list(Presale_Time = c("16 Apr 2018  -  30 Apr 2018                            ", 
"Whitelist/KYC:Whitelist + KYC", "Country:Jersey", "ICO Time:26 Mar 2018  -  23 Apr 2018                            ", 
"", "ICO Time:01 Mar 2018  -  31 Mar 2018                            "
), ICO_Time = c("01 May 2018  -  20 July 2018                            ", 
"Country:Singapore", "", "Country:UK", "", "Country:Malaysia"
), Whitelist_KYC = c("\nWhitelist/KYC:\nWhitelist + KYC\n", "", 
"", "", "", ""), Country = c("Spain", "", "", "", "", ""), Platform = c("Ethereum                                                            ", 
"Ethereum                                                            ", 
"Ethereum                                                            ", 
"Scrypt                                                            ", 
"Total supply:1,087,156,610.00 FXT", "Ethereum                                                            "
), Token_Type = c("ERC20", "ERC20", "ERC20", "Scrypt", "", "ERC20"
), Available_for_sale = c("1,008,000,000 CST", "2,200,000,000 ZPR", 
"200,000,000 GNY", "20,000,000 SHARD", "", "Total supply:15,000,000,000.00 SRCOIN"
), Total_Supply_2 = c("1,124,463,121.00 CST", "1,850,000,000.00 ZPR", 
"400,000,000.00 GNY", "25,391,088.27 SHARD", "", ""), ICO_Price = c(" 0.05 USD", 
" 0.0375 USD", "Accepting:BTC, ETH, LSK, ASCH", " 0.57 USD", 
"Accepting:ETH, BTC", " 0.006 USD"), Accepting = c("ETH, BTC, Fiat", 
"ETH", "Soft cap:1,000,000 USD", "BTC, ETH, LTC, XRP", "Hard cap:40,000 ETH", 
"ETH"), Soft_Cap = c("983,733 EUR", "5 000 ETH", "Hard cap:400,000,000 GNY", 
"1,500 ETH", "", ""), Hard_Cap = c("71,400,000 EUR", "48 000 ETH", 
"Bonuses:20% sale for first 100,000,000 tokens", "12,500 ETH", 
"", "")), row.names = c(NA, 6L), class = "data.frame")


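1 Answer

Stack Overflow user

Accepted answer

Posted 2022-07-13 10:08:06

OK, I don't know whether you are familiar with tidyverse and dplyr functions, so I used classic base R functions in my solution (it takes into account your latest update with the full format). Note that some values start with "Bonuses:", but you did not define that column, so the code does not handle it.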
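Since labels like "Bonuses:" have no matching column, it may help to list such labels before running the correction. The following is only a rough sketch: `unknown_prefixes` and the `toy` data frame are hypothetical names made up for illustration, and the cleaning rules (drop newlines, turn "/" into a space, compare case-insensitively) are assumed to mirror the ones used in the solution.

```r
# Hypothetical helper (not part of the original answer): list "label:" prefixes
# found in the cells of a data frame that do not correspond to any column name.
unknown_prefixes <- function(df) {
  # Normalise column names the way the labels are written in the cells ("_" -> " ")
  known <- tolower(gsub("_", " ", colnames(df)))
  # Flatten all cells to one character vector; drop newlines, treat "/" as a space
  cells <- gsub("/", " ", gsub("\n", "", unlist(lapply(df, as.character))))
  # Keep only cells shaped like "label:value" and extract the label part
  labelled <- grep("^[^:]+:", cells, value = TRUE)
  labels <- tolower(trimws(sub(":.*$", "", labelled)))
  # Labels that match no column name
  setdiff(unique(labels), known)
}

# Tiny made-up example in the style of the question's data
toy <- data.frame(
  Soft_Cap = c("1,500 ETH", "Hard cap:400,000,000 GNY"),
  Hard_Cap = c("12,500 ETH", "Bonuses:20% sale"),
  stringsAsFactors = FALSE
)
unknown_prefixes(toy)
```

Any label reported this way would need either a new column or an explicit decision to leave the cell as it is.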

dat <- structure(list(Presale_Time = c("16 Apr 2018  -  30 Apr 2018                            ", 
                                       "Whitelist/KYC:Whitelist + KYC", "Country:Jersey", "ICO Time:26 Mar 2018  -  23 Apr 2018                            ", 
                                       "", "ICO Time:01 Mar 2018  -  31 Mar 2018                            "), 
                      ICO_Time = c("01 May 2018  -  20 July 2018                            ",
                                   "Country:Singapore", "", "Country:UK", "", "Country:Malaysia"), 
                      Whitelist_KYC = c("\nWhitelist/KYC:\nWhitelist + KYC\n", "",
                                        "", "", "", ""), 
                      Country = c("Spain", "", "", "", "", ""), 
                      Platform = c("Ethereum                                                            ",
                                   "Ethereum                                                            ",
                                   "Ethereum                                                            ",
                                   "Scrypt                                                            ",
                                   "Total supply:1,087,156,610.00 FXT", "Ethereum                                                            "), 
                      Token_Type = c("ERC20", "ERC20", "ERC20", "Scrypt", "", "ERC20"), 
                      Available_for_sale = c("1,008,000,000 CST", "2,200,000,000 ZPR",
                                             "200,000,000 GNY", "20,000,000 SHARD", "", 
                                             "Total supply:15,000,000,000.00 SRCOIN"), 
                      Total_Supply = c("1,124,463,121.00 CST", "1,850,000,000.00 ZPR",
                                         "400,000,000.00 GNY", "25,391,088.27 SHARD", "", ""), 
                      ICO_Price = c(" 0.05 USD",
                                    " 0.0375 USD", "Accepting:BTC, ETH, LSK, ASCH", " 0.57 USD",
                                    "Accepting:ETH, BTC", " 0.006 USD"), 
                      Accepting = c("ETH, BTC, Fiat",
                                    "ETH", "Soft cap:1,000,000 USD", "BTC, ETH, LTC, XRP", 
                                    "Hard cap:40,000 ETH",
                                    "ETH"), 
                      Soft_Cap = c("983,733 EUR", "5 000 ETH", "Hard cap:400,000,000 GNY",
                                   "1,500 ETH", "", ""), 
                      Hard_Cap = c("71,400,000 EUR", "48 000 ETH",
                                   "Bonuses:20% sale for first 100,000,000 tokens", "12,500 ETH", 
                                   "", "")), 
                 row.names = c(NA, 6L), class = "data.frame")

# Define a vector with (columns') categories
groupNames <- gsub(x = colnames(dat), pattern = "_", replacement = " ")

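# Define a function that moves misplaced values to their proper positions
correcting <- function(x, groupNames){
  
  # Coerce the input to character and normalise it: drop newlines, turn "/" into a space
  x <- gsub(x = as.character(x), pattern = "\n", replacement = "")
  x <- gsub(x = x, pattern = "/", replacement = " ")
  
  # Find the misplaced positions (simplify = FALSE always yields a named list,
  # so unlist() returns positions named after the matching group)
  index <- unlist(sapply(groupNames, grep, x = x, ignore.case = TRUE, simplify = FALSE))
  
  # If there is any misplaced value...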
  if(length(index) > 0){
    
    oldValues <- x[index]
    
    x[index] <- NA
    
    # Correct misplacing and remove text used as clue
    x[match(names(index), groupNames)] <- gsub(x = oldValues, 
                                               pattern = "^[[:print:]]{1,}:", 
                                               replacement = "")
    
  }
  
  return(x)
}

# Apply function by row, transposing and coerce output as data frame
out <- as.data.frame(t(apply(dat, 1, correcting, groupNames = groupNames)))

# Replace names of columns
colnames(out) <- colnames(dat)
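
The toy data from the question uses "Time:" as the label for Start_Time, which a match on column names alone would not pick up. One possible way around this is an explicit label-to-column lookup. The sketch below is an illustration under that assumption, not part of the original answer: `prefix_map`, `fix_row`, `dat_toy` and `fixed` are hypothetical names.

```r
# Toy data from the question
dat_toy <- data.frame(
  Country    = c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA"),
  Platform   = c("Twitter", "Country:Germany", "Cap:200"),
  Start_Time = c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram"),
  Cap        = c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018"),
  stringsAsFactors = FALSE
)

# Assumed lookup: label as written in the cells -> target column name
prefix_map <- c(Country = "Country", Platform = "Platform",
                Time = "Start_Time", Cap = "Cap")

# Hypothetical row fixer: moves every "label:value" cell to its mapped column
fix_row <- function(x, cols, prefix_map) {
  x <- trimws(as.character(x))
  out <- setNames(x, cols)
  # Split labelled cells into the label and the remaining value
  has_prefix <- grepl("^[^:]+:", x)
  prefix <- sub(":.*$", "", x[has_prefix])
  value <- sub("^[^:]+:", "", x[has_prefix])
  # Only act on labels present in the lookup; other cells stay untouched
  known <- prefix %in% names(prefix_map)
  # Blank the misplaced cells first, then write the values into their columns
  out[which(has_prefix)[known]] <- NA
  out[prefix_map[prefix[known]]] <- value[known]
  out
}

# Apply row by row, transpose back and restore the column names
fixed <- as.data.frame(
  t(apply(dat_toy, 1, fix_row, cols = colnames(dat_toy), prefix_map = prefix_map)),
  stringsAsFactors = FALSE
)
colnames(fixed) <- colnames(dat_toy)
fixed
```

Blanking the source cells before writing matters when values swap places within a row, as in the second row of the toy data. A cell whose label is missing from the lookup is left unchanged, so unexpected labels remain visible in the result.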

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/72963549
