How to consistently convert strings like "3.71B" and "4M" to numbers in Python?

Time: 2022-05-12 12:37:06

I have some rather mangled code that almost produces the tangible price/book from Yahoo Finance for companies (a nice module called ystockquote gets the intangible price/book value already).

My problem is this:

For one of the variables in the calculation, shares outstanding, I'm getting strings like 10.89B and 4.9M, where B and M stand for billion and million respectively. I'm having trouble converting them to numbers. Here's where I'm at:

shares = [''.join(node.findAll(text=True)).strip().replace('M','000000').replace('B','000000000').replace('.','') for node in soup2.findAll('td')[110:112]]

Which is pretty messy, but I think it would work if instead of

.replace('M','000000').replace('B','000000000').replace('.','') 

I was using a regular expression with variables. I guess the question is simply which regular expression and variables. Other suggestions are also good.

EDIT:

To be specific, I'm hoping to have something that works for numbers with zero, one, or two decimal places, but these answers all look helpful.

5 solutions

#1


14  

>>> from decimal import Decimal
>>> d = {
...     'M': 6,
...     'B': 9
... }
>>> def text_to_num(text):
...     if text[-1] in d:
...         num, magnitude = text[:-1], text[-1]
...         return Decimal(num) * 10 ** d[magnitude]
...     else:
...         return Decimal(text)
...
>>> text_to_num('3.17B')
Decimal('3170000000.00')
>>> text_to_num('4M')
Decimal('4000000')
>>> text_to_num('4.1234567891234B')
Decimal('4123456789.1234000000000')

You can also call int() on the result if you want an integer.
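
To illustrate the int() remark against the zero, one, or two decimal cases from the question, here is a minimal sketch. It restates text_to_num so the block runs on its own; the helper name shares_as_int is hypothetical and not part of the answer.

```python
from decimal import Decimal

# Suffix -> power of ten, as in the answer above.
d = {'M': 6, 'B': 9}

def text_to_num(text):
    """Convert strings such as '10.89B' or '4M' to an exact Decimal."""
    if text[-1] in d:
        num, magnitude = text[:-1], text[-1]
        return Decimal(num) * 10 ** d[magnitude]
    return Decimal(text)

def shares_as_int(text):
    """Hypothetical helper: the share count as a plain int."""
    return int(text_to_num(text))

# Zero, one, and two decimal places.
for raw in ('4M', '4.9M', '10.89B'):
    print(raw, shares_as_int(raw))
```

Because the arithmetic is done on Decimal values, no binary floating-point rounding is involved before the conversion to int.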

#2


4  

Parse the numbers as floats, and use a multiplier mapping:

multipliers = dict(M=10**6, B=10**9)
def sharesNumber(nodeText):
    nodeText = nodeText.strip()
    mult = 1
    if nodeText[-1] in multipliers:
        mult = multipliers[nodeText[-1]]
        nodeText = nodeText[:-1]
    return float(nodeText) * mult

#3


2  

num_replace = {
    'B' : 1000000000,
    'M' : 1000000,
}

a = "4.9M" 
b = "10.89B" 

def pure_number(s):
    mult = 1.0
    while s[-1] in num_replace:
        mult *= num_replace[s[-1]]
        s = s[:-1]
    return float(s) * mult 

pure_number(a) # 4900000.0
pure_number(b) # 10890000000.0

This will work with idiocy like:

pure_number("5.2MB") # 5200000000000000.0

Because of the dictionary approach, you can add as many suffixes as you want in an easy-to-maintain way. You can also make it more lenient by writing your dict keys in a single case and then calling .lower() or .upper() on the input so that it matches.
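
A rough sketch of that case-insensitive idea follows. The function name pure_number_any_case and the extra 'K' suffix are assumptions added for illustration; they are not in the answer above.

```python
# Keys are kept in upper case only; 'K' is an assumed extra suffix.
num_replace = {
    'K': 1000,
    'M': 1000000,
    'B': 1000000000,
}

def pure_number_any_case(s):
    """Hypothetical variant of pure_number that ignores suffix case."""
    # Normalise the input once so that 'm' and 'M' hit the same key.
    s = s.strip().upper()
    mult = 1.0
    # The "s and" guard avoids indexing into an empty string.
    while s and s[-1] in num_replace:
        mult *= num_replace[s[-1]]
        s = s[:-1]
    return float(s) * mult

print(pure_number_any_case("4.9m"))
print(pure_number_any_case("10.89B"))
print(pure_number_any_case("3k"))
```

Normalising the input rather than doubling the dictionary keys keeps the suffix table in one place.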

#4


2  

num_replace = {
    'B' : 'e9',
    'M' : 'e6',
}

def str_to_num(s):
    if s[-1] in num_replace:
        s = s[:-1]+num_replace[s[-1]]
    return int(float(s))

>>> str_to_num('3.71B')
3710000000L
>>> str_to_num('4M')
4000000

So '3.71B' -> '3.71e9' -> 3710000000L etc.
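
Since the question asked about a regular expression, the same suffix-to-exponent substitution could be sketched with the re module. This is only one possible pattern, assuming inputs are plain non-negative decimals with an optional B or M suffix; the name str_to_num_re is hypothetical.

```python
import re

# Suffix -> exponent text, as in the answer above.
num_replace = {'B': 'e9', 'M': 'e6'}

# Digits, an optional fractional part, then an optional B or M suffix.
pattern = re.compile(r'^([0-9]+(?:\.[0-9]+)?)([BM]?)$')

def str_to_num_re(s):
    """Hypothetical regex variant of str_to_num."""
    match = pattern.match(s.strip())
    if match is None:
        # Unlike the version above, malformed input is rejected here.
        raise ValueError('unrecognised number: %r' % s)
    digits, suffix = match.groups()
    # No suffix means nothing is appended and the digits parse as they are.
    return int(float(digits + num_replace.get(suffix, '')))

print(str_to_num_re('3.71B'))
print(str_to_num_re('4M'))
```

The pattern also rejects doubled suffixes such as '5.2MB' instead of silently multiplying them together.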

#5


1  

This could be an opportunity to safely use eval!! :-)

Consider the following fragment:

>>> d = { "B" :' * 1e9', "M" : '* 1e6'}
>>> s = "1.493B"
>>> ll = [d.get(c, c) for c in s]
>>> eval(''.join(ll), {}, {})
1493000000.0

Now put it all together into a neat one-liner:

d = { "B" :' * 1e9', "M" : '* 1e6'}

def human_to_int(s):
    return eval(''.join([d.get(c, c) for c in s]), {}, {})

print human_to_int('1.439B')
print human_to_int('1.23456789M')

Gives back:

1439000000.0
1234567.89
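
One caveat worth sketching: passing empty dicts to eval does not remove the builtins, because Python inserts `__builtins__` into the globals dict when it is missing. If the strings come from a scraped page, the shape of the input could be checked first. The version below is a sketch under that assumption, written for Python 3 (re.fullmatch); the name human_to_float_checked is hypothetical.

```python
import re

d = {"B": " * 1e9", "M": " * 1e6"}

# Only plain decimals with an optional trailing B or M are accepted.
allowed = re.compile(r'[0-9]+(?:\.[0-9]+)?[BM]?')

def human_to_float_checked(s):
    """Hypothetical variant of human_to_int that validates before eval."""
    s = s.strip()
    if allowed.fullmatch(s) is None:
        raise ValueError('refusing to evaluate: %r' % s)
    # After the check, the expression can only be "<number>" or
    # "<number> * 1e6" / "<number> * 1e9".
    return eval(''.join(d.get(c, c) for c in s), {}, {})

print(human_to_float_checked('1.439B'))
```

With the check in place, a string such as "__import__('os')" raises ValueError instead of being evaluated.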
