
Python: minimum average distance

Stack Overflow user
Asked 2014-10-19 21:49:00
2 answers, 1.4K views, 0 followers, 3 votes

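I have a set of users' latitudes and longitudes, and a set of office locations' latitudes and longitudes.

I have to find the office location with the smallest average distance to all users.

What is an efficient way to do this in Python? I have 3k users and 40,000 office locations.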

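For example:

Input: User 1 (x1, y1)

User 2 (x2, y2)

Office 1 (x3, y3)

Office 2 (x4, y4)

I then have to find the office location with the smallest average distance to all users.

Office 1 is 200 m from User 1 and 400 m from User 2, an average of 300 m.

Office 2 is 100 m from User 1 and 200 m from User 2, an average of 150 m.

Office 2 is the location to choose.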


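2 Answers

Stack Overflow user

Answered 2014-10-20 02:33:46

Here is an example using the GeoDjango part of Django. You could do the same thing with shapely and pyproj. (These can be a bit of a pain to install, but once everything is set up, this kind of work is very simple.)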

from django.contrib.gis.geos import Point, MultiPoint

WGS84_SRID = 4326
# x1, y1, x2, y2 are placeholders for the office coordinates (lon, lat)
office1 = Point(x1, y1, srid=WGS84_SRID)
office2 = Point(x2, y2, srid=WGS84_SRID)

# load user locations
user_locations = []
with open("user_locations.csv", "r") as in_f:
    # assuming wgs84 decimal degrees 
    # one location per line in format, 'lon, lat'
    for line in in_f:
        x, y = [float(i.strip()) for i in line.split(",")]
        user_locations.append(Point(x, y, srid=WGS84_SRID ))

# get points in a meters projection (Web Mercator; distances are distorted away from the equator)
GOOGLE_MAPS_SRID = 3857
office1_meters = office1.transform(GOOGLE_MAPS_SRID, clone=True)
office2_meters = office2.transform(GOOGLE_MAPS_SRID, clone=True)
user_locations_meters = [user_loc.transform(GOOGLE_MAPS_SRID, clone=True) for user_loc in user_locations]

# centroid method
# build the MultiPoint from the projected points so distances come out in meters
mp = MultiPoint(user_locations_meters, srid=GOOGLE_MAPS_SRID)
centroid_distance_from_office1 = mp.centroid.distance(office1_meters)
centroid_distance_from_office2 = mp.centroid.distance(office2_meters)

print("Centroid Location: {}".format(mp.centroid.ewkt))
print("centroid_distance_from_office1: {}m".format(centroid_distance_from_office1))
print("centroid_distance_from_office2: {}m".format(centroid_distance_from_office2))

# average distance method
total_user_locations = float(len(user_locations))
office1_user_avg_distance = sum( user_loc.distance(office1_meters) for user_loc in user_locations_meters)/total_user_locations 
office2_user_avg_distance = sum( user_loc.distance(office2_meters) for user_loc in user_locations_meters)/total_user_locations 

print("avg user distance OFFICE-1: {}".format(office1_user_avg_distance))
print("avg user distance OFFICE-2: {}".format(office2_user_avg_distance))
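
The average distance method in this answer compares only two offices, while the question has thousands of candidates. Here is a minimal sketch of the same idea applied to a list of offices. It assumes (lon, lat) pairs in decimal degrees and a spherical Earth, and it uses the haversine formula instead of a projection. The names `haversine_m` and `best_office` and the sample coordinates are illustrative and not part of the answer's code.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical model)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def best_office(users, offices):
    """Return (index, mean distance) of the office closest to the users on average."""
    best_index, best_mean = None, float("inf")
    for index, (office_lon, office_lat) in enumerate(offices):
        total = sum(haversine_m(office_lon, office_lat, lon, lat)
                    for lon, lat in users)
        mean = total / len(users)
        if mean < best_mean:
            best_index, best_mean = index, mean
    return best_index, best_mean


# Hypothetical sample data as (lon, lat) pairs.
users = [(13.40, 52.52), (13.45, 52.50)]
offices = [(13.00, 52.40), (13.42, 52.51)]
index, mean_m = best_office(users, offices)
print("best office index: {}, mean distance: {:.0f} m".format(index, mean_m))
```

With 3k users and 40,000 offices this loop performs 120 million distance evaluations, so a pure-Python version is likely to be slow. Vectorizing the inner sum is one way to speed it up.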
Votes: 1

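Stack Overflow user

Answered 2014-10-22 12:47:34

This is mostly code. It implements the algorithm described at http://en.wikipedia.org/wiki/Geometric_median#Computation and gives a usage example based on a set of random points.

Note: this works on points in a plane, because I couldn't decide how two spherical coordinates should be summed. You therefore have to map the spherical coordinates onto a plane with a projection beforehand, but that point was already touched on in the previous answer.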
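
As one possible way to do that mapping, here is a minimal sketch of a local equirectangular projection centered on the mean of the input points. It assumes a spherical Earth and points spread over a small region (a city or so), where the approximation error is small. The function name `to_plane` and the sample coordinates are illustrative and not from the answer.

```python
from math import cos, radians

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical model)


def to_plane(lon, lat, lon0, lat0):
    """Equirectangular projection around (lon0, lat0); returns (x, y) in meters."""
    x = EARTH_RADIUS_M * radians(lon - lon0) * cos(radians(lat0))
    y = EARTH_RADIUS_M * radians(lat - lat0)
    return x, y


# Hypothetical (lon, lat) pairs in decimal degrees.
points = [(13.40, 52.52), (13.45, 52.50), (13.42, 52.51)]
lon0 = sum(lon for lon, _ in points) / len(points)
lat0 = sum(lat for _, lat in points) / len(points)
planar = [to_plane(lon, lat, lon0, lat0) for lon, lat in points]
print(planar)
```

The resulting (x, y) pairs can be fed to a planar point class such as the one in this answer. For data spanning large areas, a proper projection library would be a safer choice.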

from __future__ import print_function  # lets the print() calls below run on Python 2 as well
from functools import reduce  # reduce is not a builtin in Python 3
from math import sqrt
from random import seed, uniform
from operator import add
seed(100)

class Point():
    """Very basic point class, supporting "+", scalar "/" and distances."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return "("+repr(self.x)+","+repr(self.y)+")"
    def __add__(self, P):
        return Point(self.x+P.x, self.y+P.y)
    def __div__(self, scalar):
        return Point(self.x/float(scalar), self.y/float(scalar))
    __truediv__ = __div__  # Python 3 looks up "/" under this name
    def delta(self, P):
        """Euclidean distance to another point."""
        dx = self.x - P.x
        dy = self.y - P.y
        return sqrt(dx*dx+dy*dy)

def iterate(GM,points):
    "Simple implementation of http://en.wikipedia.org/wiki/Geometric_median#Computation"
    # distances from the tentative GM
    distances = [GM.delta(p) for p in points]
    # each point weighted by the inverse of its distance from the tentative GM
    normalized_positions = [p/d for p,d in zip(points,distances)]
    normalization_factor = sum(1.0/d for d in distances)
    new_median = reduce(add, normalized_positions)/normalization_factor
    return new_median

# The "clients"
nclients = 10
points = [Point(uniform(-3,3),uniform(-3,3)) for i in range(nclients)]

# Centroid of clients and total of distances
centroid = reduce(add,points)/nclients
print("Position of centroid:", centroid)
print("Sum of distances from centroid:",
      reduce(add,[centroid.delta(p) for p in points]))


print()
print("Compute the Geometric Median using random starting points:")
nstart = 10
for P0 in [Point(uniform(-5,5),uniform(-5,5)) for i in range(nstart)]:
    p0 = P0
    # ten iterations from each random starting point
    for i in range(10):
        P0 = iterate(P0, points)
    print(p0, "-->", P0)

print()
print("Sum of distances from last Geometric Median:",
      reduce(add,[P0.delta(p) for p in points]))

Output

Position of centroid: (-0.4647467432024398,0.08675910209912471)
Sum of distances from centroid: 22.846445119

Compute the Geometric Median using random starting points:
(1.2632163919279735,4.633157837008632) --> (-0.8739691868669638,-0.019827884361901298)
(-2.8916600791314986,4.561006461166512) --> (-0.8929310891388812,-0.025857080003665663)
(0.5539966580106901,4.011520429873922) --> (-0.8764828849474395,-0.020607834485528134)
(3.1801819335743033,-3.395781900250662) --> (-0.8550062003820846,-0.014134334529992666)
(1.48542908120573,-3.7590671941155627) --> (-0.8687797019011291,-0.018241177226221747)
(-4.943549141082007,-1.044838193982506) --> (-0.9066276248482427,-0.030440865315529194)
(2.73500702168781,0.6615770729288597) --> (-0.8231318436739281,-0.005320464433689587)
(-3.073593440129266,3.411747144619733) --> (-0.8952513352350909,-0.026600471220747438)
(4.137768422492282,-2.6277493707729596) --> (-0.8471586848200597,-0.011875801531868494)
(-0.5180751681772549,1.377998063140823) --> (-0.8849056106235963,-0.02326386487180884)

Sum of distances from last Geometric Median: 22.7019120091

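My own comments

In this case the positions (centroid vs. GM) are quite different, but the resulting sums of distances are similar. I would expect significant differences in both position and average distance when the points are clustered around a point (a city) or along a feature such as a line (a road).

Finally, numpy could be used to speed things up. I avoided numpy because my time was limited :)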
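
For reference, here is a rough sketch of what a numpy version of the same iteration might look like. It is not the answer author's code. The function name `geometric_median`, the centroid starting point, the iteration count, and the `eps` guard against zero distances are all assumptions made for illustration.

```python
import numpy as np


def geometric_median(points, iterations=100, eps=1e-12):
    """Weiszfeld-style iteration on an (n, 2) array, starting from the centroid."""
    points = np.asarray(points, dtype=float)
    median = points.mean(axis=0)
    for _ in range(iterations):
        distances = np.linalg.norm(points - median, axis=1)
        # Guard against division by zero when the estimate lands on a data point.
        weights = 1.0 / np.maximum(distances, eps)
        median = (points * weights[:, None]).sum(axis=0) / weights.sum()
    return median


# Arbitrary seed; this is not the same random stream as random.seed(100).
rng = np.random.RandomState(100)
clients = rng.uniform(-3, 3, size=(10, 2))

centroid = clients.mean(axis=0)
gm = geometric_median(clients)
sum_centroid = np.linalg.norm(clients - centroid, axis=1).sum()
sum_gm = np.linalg.norm(clients - gm, axis=1).sum()
print("Sum of distances from centroid:", sum_centroid)
print("Sum of distances from geometric median:", sum_gm)
```

Because the random points differ from those in the answer, the printed numbers will not match the output shown there. The sum for the geometric median should not exceed the sum for the centroid.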

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/26455778
