
How to extract the numeric part of a string column in Spark and update the same column after a math operation

Stack Overflow user
Asked 2020-05-05 00:24:51

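I'm new to Scala Spark and am trying to perform the operation below on a DataFrame column. I have a column containing alphanumeric values and want to update those values based on a math operation: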

    +--------------------------------------+
    |Error                                 |
    +--------------------------------------+
    |value: 0.25 Does not meet Requirements|
    |value: 0.5  Does not meet Requirements|
    |value: 0.75 Does not meet Requirements|
    |value: 0.66 Does not meet Requirements|
    |value: 0.34 Does not meet Requirements|
    +--------------------------------------+

I want to compute (1 - {numeric part of the string}) and update the column with the new value.

For example, I'd like the output to look like this:

    +--------------------------------------+
    |Error                                 |
    +--------------------------------------+
    |value: 0.75 Does not meet Requirements|
    |value: 0.5  Does not meet Requirements|
    |value: 0.25 Does not meet Requirements|
    |value: 0.34 Does not meet Requirements|
    |value: 0.66 Does not meet Requirements|
    +--------------------------------------+

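Any help would be greatly appreciated. I have learned to use the regex column functions, but I have no idea how to perform the math operation.

Regards, Mahi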


1 Answer

Stack Overflow user

Accepted answer

Answered 2020-05-05 01:08:17

Suppose you have multiple columns:

+------+--------------------+
|  col1|               Error|
+------+--------------------+
| first|value: 0.25 Does ...|
|second|value: 0.5  Does ...|
| third|value: 0.75 Does ...|
|fourth|value: 0.66 Does ...|
| fifth|value: 0.34 Does ...|
+------+--------------------+

You can use split and mkString to update the Error column:

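import scala.util.Try
import org.apache.spark.sql.{Encoder, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

// Compute 1 - number with BigDecimal so the result keeps its decimal scale
val subtractFromOne: Double => String = number =>
  (BigDecimal(1.0) - BigDecimal(number)).toString()

// Split on single spaces, replace the second token, then rejoin with spaces
val transform: String => String = s => s.split(' ') match {
  case Array(first, number, rest@_*) =>
    // fall back to the original string if the second token is not a number
    Try(subtractFromOne(number.toDouble))
      .map(updated => (Seq(first, updated) ++ rest).mkString(" "))
      .getOrElse(s)
  case _ => s // fewer than two tokens: return the string unchanged
}

// Row encoder for the unchanged schema (RowEncoder API as of 2020; it may differ in later Spark releases)
implicit val enc: Encoder[Row] = RowEncoder(df.schema)

df
  // assumes col1 is column 0 and Error is column 1
  .map(row => Row(row(0), transform(row.getString(1))))
  .show(false) // false = do not truncate long strings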

This outputs:

+------+--------------------------------------+
|  col1|                                 Error|
+------+--------------------------------------+
| first|value: 0.75 Does not meet Requirements|
|second|value: 0.5  Does not meet Requirements|
| third|value: 0.25 Does not meet Requirements|
|fourth|value: 0.34 Does not meet Requirements|
| fifth|value: 0.66 Does not meet Requirements|
+------+--------------------------------------+
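The double space in the `0.5` row survives because splitting on a single space yields an empty token for each extra space, and `mkString(" ")` puts it back. A minimal plain-Scala sketch of that round trip, with no Spark required; `SplitMkStringSketch`, `tokens`, `replaceSecond`, and the sample string are illustrative names and data, not part of the answer:

```scala
object SplitMkStringSketch {
  // Splitting on a single space keeps an empty token for each extra space,
  // so rejoining with " " restores the original spacing.
  def tokens(s: String): List[String] = s.split(' ').toList

  // Hypothetical helper: replace the token at index 1 and rejoin.
  // Assumes the string has at least two tokens.
  def replaceSecond(s: String, replacement: String): String =
    tokens(s).updated(1, replacement).mkString(" ")

  def main(args: Array[String]): Unit = {
    // Made-up sample with two spaces after the number
    val s = "value: 0.2  Does not meet Requirements"
    println(tokens(s))                // the third token is the empty string
    println(replaceSecond(s, "0.8"))  // spacing after the number is preserved
  }
}
```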

The subtractFromOne helper uses BigDecimal to preserve the decimal scale of the result.
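To see why this matters, here is a small sketch comparing plain Double subtraction with the BigDecimal version on one of the sample values. `ScaleSketch`, `viaDouble`, and `viaBigDecimal` are illustrative names, not part of the answer:

```scala
object ScaleSketch {
  // Plain binary floating-point subtraction
  def viaDouble(x: Double): String = (1.0 - x).toString

  // Same arithmetic as the answer's subtractFromOne: BigDecimal(x) is built
  // from the decimal representation of x, so the subtraction is exact in decimal
  def viaBigDecimal(x: Double): String =
    (BigDecimal(1.0) - BigDecimal(x)).toString

  def main(args: Array[String]): Unit = {
    println(viaDouble(0.66))     // not "0.34": binary rounding noise shows up
    println(viaBigDecimal(0.66)) // 0.34
    println(viaBigDecimal(0.5))  // 0.5
  }
}
```

With Double, 1.0 - 0.66 typically prints as 0.33999999999999997, which would end up in the Error string.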

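Since the question mentions the regex column functions, the same update can probably also be expressed with Spark's built-in column functions instead of mapping over rows. The following is only a sketch under stated assumptions, not the answer's method: the values have at most two decimal places (so rounding to 2 hides Double noise), and the column is named Error. `RegexColumnSketch`, `update`, and `pattern` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit, regexp_extract, round, when}

object RegexColumnSketch {
  // group 1: leading label plus one space, group 2: the number, group 3: the rest
  val pattern = "^(\\S+ )(\\d+(?:\\.\\d+)?)(.*)$"

  def update(df: DataFrame): DataFrame = {
    val error = col("Error")
    val number = regexp_extract(error, pattern, 2).cast("double")
    // Assumption: at most two decimal places, so round(…, 2) removes Double noise
    val flipped = round(lit(1.0) - number, 2).cast("string")
    df.withColumn(
      "Error",
      when(error.rlike(pattern),
        concat(regexp_extract(error, pattern, 1), flipped, regexp_extract(error, pattern, 3)))
        .otherwise(error) // rows that do not match are left unchanged
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("regex-column-sketch").getOrCreate()
    import spark.implicits._
    val df = Seq(
      ("first", "value: 0.25 Does not meet Requirements"),
      ("second", "value: 0.5  Does not meet Requirements")
    ).toDF("col1", "Error")
    update(df).show(false)
    spark.stop()
  }
}
```

Unlike the row-based version, this keeps every other column untouched without needing an explicit Row encoder, at the cost of a fixed rounding precision.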
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61596938
