
Moving misplaced values into the correct columns

Stack Overflow user
Asked 2022-07-13 08:59:02

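Beginner here. I have a large data frame with multiple columns in which some values are misplaced, but at least each misplaced value is preceded by the correct column name. Imagine a data frame like this: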

Country <- c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA")
Platform <- c("Twitter", "Country:Germany", "Cap:200")
Start_Time <- c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram")
Cap <- c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018")

dat <- data.frame(Country, Platform, Start_Time, Cap) 
Output:
Country                          Platform          Start_Time                       Cap

Spain                            Twitter           10 Jun 2018 - 2 Jul 2018         300
Time:16 Mar 2018 - 23 Apr 2018   Country:Germany   Platform:Facebook                500
USA                              Cap:200           Platform:Instagram               Time:10 Jun 2018 - 2 Jul 2018

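As you can see, when a value is misplaced, the correct column name (or at least an indication of it, as with Start_Time and Time:) is placed in front of the value.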

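How can I move these values into their respective columns? My original data frame has 729 rows and 45 columns, so the less manual work the better. The correct output should look like this: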

Output:
Country    Platform          Start_Time                       Cap

Spain      Twitter           10 Jun 2018 - 2 Jul 2018         300
Germany    Facebook          16 Mar 2018 - 23 Apr 2018        500
USA        Instagram         10 Jun 2018 - 2 Jul 2018         200

Thanks a lot.

Edit: here is the output of dput(df_short_head), which is the first 6 rows of my original data:

structure(list(Presale_Time = c("16 Apr 2018  -  30 Apr 2018                            ", 
"Whitelist/KYC:Whitelist + KYC", "Country:Jersey", "ICO Time:26 Mar 2018  -  23 Apr 2018                            ", 
"", "ICO Time:01 Mar 2018  -  31 Mar 2018                            "
), ICO_Time = c("01 May 2018  -  20 July 2018                            ", 
"Country:Singapore", "", "Country:UK", "", "Country:Malaysia"
), Whitelist_KYC = c("\nWhitelist/KYC:\nWhitelist + KYC\n", "", 
"", "", "", ""), Country = c("Spain", "", "", "", "", ""), Platform = c("Ethereum                                                            ", 
"Ethereum                                                            ", 
"Ethereum                                                            ", 
"Scrypt                                                            ", 
"Total supply:1,087,156,610.00 FXT", "Ethereum                                                            "
), Token_Type = c("ERC20", "ERC20", "ERC20", "Scrypt", "", "ERC20"
), Available_for_sale = c("1,008,000,000 CST", "2,200,000,000 ZPR", 
"200,000,000 GNY", "20,000,000 SHARD", "", "Total supply:15,000,000,000.00 SRCOIN"
), Total_Supply_2 = c("1,124,463,121.00 CST", "1,850,000,000.00 ZPR", 
"400,000,000.00 GNY", "25,391,088.27 SHARD", "", ""), ICO_Price = c(" 0.05 USD", 
" 0.0375 USD", "Accepting:BTC, ETH, LSK, ASCH", " 0.57 USD", 
"Accepting:ETH, BTC", " 0.006 USD"), Accepting = c("ETH, BTC, Fiat", 
"ETH", "Soft cap:1,000,000 USD", "BTC, ETH, LTC, XRP", "Hard cap:40,000 ETH", 
"ETH"), Soft_Cap = c("983,733 EUR", "5 000 ETH", "Hard cap:400,000,000 GNY", 
"1,500 ETH", "", ""), Hard_Cap = c("71,400,000 EUR", "48 000 ETH", 
"Bonuses:20% sale for first 100,000,000 tokens", "12,500 ETH", 
"", "")), row.names = c(NA, 6L), class = "data.frame")

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-07-13 10:08:06

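OK, I don't know whether you are familiar with tidyverse and dplyr functions, so my solution uses only base R functions (and takes into account your latest update with the full format). Note that some values start with "Bonuses:", but you did not define a column for them, so the code does not handle them.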
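Before running the correction, it may help to check which labels have no matching column (such as "Bonuses:" above). The following is only a minimal sketch under assumptions: `demo` is a hypothetical two-row data frame made up for illustration, not the asker's data, and `known`, `labels` and `unknown` are illustrative names.

```r
# Hypothetical mini data frame (NOT the asker's data), for illustration only
demo <- data.frame(
  Country  = c("Spain", "Hard cap:40,000 ETH"),
  Hard_Cap = c("71,400,000 EUR", "Bonuses:20% sale"),
  stringsAsFactors = FALSE
)

# Labels the existing columns can absorb, e.g. "Hard_Cap" -> "hard cap"
known <- tolower(gsub("_", " ", colnames(demo)))

# Collect every "label:" prefix that occurs anywhere in the data
values <- unlist(demo, use.names = FALSE)
hasLabel <- grepl("^[^:]+:", values)
labels <- unique(tolower(sub(":.*$", "", values[hasLabel])))

# Labels without a matching column would be left out by the correction
unknown <- setdiff(labels, known)
print(unknown)
```

On real data, any label reported in `unknown` would need either a new column or a manual decision before the values are moved.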

dat <- structure(list(Presale_Time = c("16 Apr 2018  -  30 Apr 2018                            ", 
                                       "Whitelist/KYC:Whitelist + KYC", "Country:Jersey", "ICO Time:26 Mar 2018  -  23 Apr 2018                            ", 
                                       "", "ICO Time:01 Mar 2018  -  31 Mar 2018                            "), 
                      ICO_Time = c("01 May 2018  -  20 July 2018                            ",
                                   "Country:Singapore", "", "Country:UK", "", "Country:Malaysia"), 
                      Whitelist_KYC = c("\nWhitelist/KYC:\nWhitelist + KYC\n", "",
                                        "", "", "", ""), 
                      Country = c("Spain", "", "", "", "", ""), 
                      Platform = c("Ethereum                                                            ",
                                   "Ethereum                                                            ",
                                   "Ethereum                                                            ",
                                   "Scrypt                                                            ",
                                   "Total supply:1,087,156,610.00 FXT", "Ethereum                                                            "), 
                      Token_Type = c("ERC20", "ERC20", "ERC20", "Scrypt", "", "ERC20"), 
                      Available_for_sale = c("1,008,000,000 CST", "2,200,000,000 ZPR",
                                             "200,000,000 GNY", "20,000,000 SHARD", "", 
                                             "Total supply:15,000,000,000.00 SRCOIN"), 
                      Total_Supply = c("1,124,463,121.00 CST", "1,850,000,000.00 ZPR",
                                         "400,000,000.00 GNY", "25,391,088.27 SHARD", "", ""), 
                      ICO_Price = c(" 0.05 USD",
                                    " 0.0375 USD", "Accepting:BTC, ETH, LSK, ASCH", " 0.57 USD",
                                    "Accepting:ETH, BTC", " 0.006 USD"), 
                      Accepting = c("ETH, BTC, Fiat",
                                    "ETH", "Soft cap:1,000,000 USD", "BTC, ETH, LTC, XRP", 
                                    "Hard cap:40,000 ETH",
                                    "ETH"), 
                      Soft_Cap = c("983,733 EUR", "5 000 ETH", "Hard cap:400,000,000 GNY",
                                   "1,500 ETH", "", ""), 
                      Hard_Cap = c("71,400,000 EUR", "48 000 ETH",
                                   "Bonuses:20% sale for first 100,000,000 tokens", "12,500 ETH", 
                                   "", "")), 
                 row.names = c(NA, 6L), class = "data.frame")

# Build the label for each column from its name (e.g. "Soft_Cap" -> "Soft Cap")
groupNames <- gsub(x = colnames(dat), pattern = "_", replacement = " ")

# Define a function that moves misplaced values to their proper column
correcting <- function(x, groupNames){
  
  # Coerce the input to character, drop line breaks and replace "/" with a
  # space so that "Whitelist/KYC:" matches the label "Whitelist KYC"
  x <- gsub(x = as.character(x), pattern = "\n", replacement = "")
  x <- gsub(x = x, pattern = "/", replacement = " ")
  
  # Find the misplaced positions; simplify = FALSE always yields a named list,
  # so unlist() returns a vector named after the matching labels
  index <- unlist(sapply(groupNames, grep, x = x, ignore.case = TRUE, 
                         simplify = FALSE))
  
  # If there is any misplaced value...
  if(length(index) > 0){
    
    # Save the misplaced values, then blank their old positions
    oldValues <- x[index]
    
    x[index] <- NA
    
    # Move each value to its column and strip the "label:" prefix
    x[match(names(index), groupNames)] <- gsub(x = oldValues, 
                                               pattern = "^[^:]+:", 
                                               replacement = "")
    
  }
  
  return(x)
}

# Apply the function by row, transpose, and coerce the output to a data frame
out <- as.data.frame(t(apply(dat, 1, correcting, groupNames = groupNames)))

# Restore the original column names
colnames(out) <- colnames(dat)
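
The same row-wise idea can be tried on the small example from the question, where the label does not always equal the column name ("Time:" belongs to Start_Time). This is a hedged sketch, not part of the original answer: the `lookup` vector, the `fixRow` helper and the `toy` and `fixed` names are assumptions introduced for illustration.

```r
# Toy data from the question
toy <- data.frame(
  Country    = c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA"),
  Platform   = c("Twitter", "Country:Germany", "Cap:200"),
  Start_Time = c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram"),
  Cap        = c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018"),
  stringsAsFactors = FALSE
)

# Assumed lookup: label found in the data -> target column
lookup <- c(Country = "Country", Platform = "Platform",
            Time = "Start_Time", Cap = "Cap")

# Hypothetical helper: fix one row given as a character vector
fixRow <- function(x, lookup, cols) {
  names(x) <- cols
  # Text before the first colon, or NA when the value has no colon
  label <- ifelse(grepl(":", x, fixed = TRUE), sub(":.*$", "", x), NA)
  moved <- which(label %in% names(lookup))
  if (length(moved) > 0) {
    target <- lookup[label[moved]]
    value <- trimws(sub("^[^:]+:", "", x[moved]))
    x[moved] <- NA      # clear the wrong cells first
    x[target] <- value  # then fill the right ones
  }
  x
}

# Apply by row, transpose back, and coerce to a data frame
fixed <- as.data.frame(
  t(apply(toy, 1, fixRow, lookup = lookup, cols = colnames(toy))),
  stringsAsFactors = FALSE
)
print(fixed)
```

An explicit lookup needs one entry per label, but it avoids relying on the label text matching the column name, and values whose label is not listed are left where they are.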
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/72963549
