Note 3: Computing User Similarity (Programming Collective Intelligence)

Date: 2023-03-08 22:00:53

Euclidean Distance Score:


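Take the items that people have rated in common as the axes, plot each rater as a point in that space, and look at how close the points are to one another. To get the distance between two raters, compute the difference along each axis, square it, add the squares together, and take the square root of the total.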
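The procedure above can be written compactly. As a sketch, assume two raters p and q share n rated items, and let p_i and q_i (symbols introduced here for illustration) be their ratings of item i:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2},
\qquad
\mathrm{sim}(p, q) = \frac{1}{1 + d(p, q)}
```

The second expression is the mapping the code in this note uses to turn a distance into a score in (0, 1], where 1 means the two raters gave identical ratings.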

# -*- coding: UTF-8 -*-
from math import sqrt

# A dictionary of movie critics and their ratings of a small set of films
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
 'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
 'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
 'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
 'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
 'The Night Listener': 4.5, 'Superman Returns': 4.0,
 'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

# Return a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
    # Collect the items that both people have rated
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1

    # If they have no ratings in common, return 0
    if len(si) == 0:
        return 0

    # Add up the squares of all the differences over the shared items
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                          for item in si])

    # Map the distance into (0, 1]; identical ratings give 1
    return 1 / (1 + sqrt(sum_of_squares))

print(sim_distance(critics, 'Lisa Rose', 'Gene Seymour'))
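One natural use of such a score is ranking the raters closest to a given person. The following is a minimal sketch, not part of the original notes: `top_matches` is a helper name introduced here for illustration, and `ratings` is a small subset of the critics data restricted to the three films Toby rated.

```python
from math import sqrt

# Subset of the critics data: only the three films Toby has rated
ratings = {
    'Lisa Rose': {'Snakes on a Plane': 3.5, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5},
    'Gene Seymour': {'Snakes on a Plane': 3.5, 'Superman Returns': 5.0,
                     'You, Me and Dupree': 3.5},
    'Mick LaSalle': {'Snakes on a Plane': 4.0, 'Superman Returns': 3.0,
                     'You, Me and Dupree': 2.0},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0,
             'You, Me and Dupree': 1.0},
}


def sim_distance(prefs, person1, person2):
    """Distance-based similarity in (0, 1]; 0 if nothing is shared."""
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    if not shared:
        return 0
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared)
    return 1 / (1 + sqrt(sum_of_squares))


def top_matches(prefs, person, n=3, similarity=sim_distance):
    """Hypothetical helper: the n raters most similar to `person`."""
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]
    scores.sort(reverse=True)  # highest similarity first
    return scores[:n]


matches = top_matches(ratings, 'Toby')
for score, name in matches:
    print(name, round(score, 3))
```

Passing the similarity function as a parameter keeps the ranking logic independent of the metric, so a correlation-based score could be swapped in without changing `top_matches`.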

Pearson Correlation Score:

[Figure: scatter plot comparing the ratings of Mick LaSalle and Gene Seymour, with a best-fit line]

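Mick LaSalle rated Superman 3 and Gene Seymour rated it 5, so the film is placed at (3, 5) on the chart. The chart also shows a straight line.

The Pearson correlation coefficient measures how well two sets of data fit a straight line.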

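As a common rule of thumb (applied to the magnitude of the coefficient):

0.8-1.0: very strong correlation

0.6-0.8: strong correlation

0.4-0.6: moderate correlation

0.2-0.4: weak correlation

0.0-0.2: very weak or no correlation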
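The bands above can be turned into a small lookup. This is only a sketch: the function name `correlation_strength` is introduced here, the bands are assumed to apply to the absolute value of the coefficient, and because the listed ranges share their endpoints, a boundary value is assigned to the stronger band by assumption.

```python
def correlation_strength(r):
    """Map a correlation coefficient to the rule-of-thumb labels."""
    magnitude = abs(r)  # assumption: the sign is ignored when labelling
    if magnitude >= 0.8:
        return 'very strong'
    if magnitude >= 0.6:
        return 'strong'
    if magnitude >= 0.4:
        return 'moderate'
    if magnitude >= 0.2:
        return 'weak'
    return 'very weak or none'


for value in (0.95, -0.7, 0.45, 0.3, 0.05):
    print(value, correlation_strength(value))
```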

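Best-fit line: the straight line that comes as close as possible to all the points on the chart.

The Pearson score also corrects for "grade inflation": if one critic consistently gives higher scores than another, the two can still correlate perfectly as long as the difference between their scores is consistent.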
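The grade-inflation point can be seen with a tiny experiment. In the sketch below, the two rating lists are made up for illustration (they are not taken from the critics data), and `pearson` and `distance_similarity` are helper names introduced here. One rater scores every item exactly 1.5 points higher than the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0  # undefined when either list is constant
    return cov / sqrt(var_x * var_y)


def distance_similarity(xs, ys):
    """Euclidean-distance-based similarity in (0, 1]."""
    return 1 / (1 + sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys))))


strict = [1.0, 2.0, 3.0, 2.5]           # hypothetical ratings
generous = [x + 1.5 for x in strict]    # same tastes, inflated scores

r = pearson(strict, generous)
s = distance_similarity(strict, generous)
print(r, s)
```

The correlation treats the two raters as having the same taste, while the distance-based score penalizes the constant offset. Which behavior is preferable depends on the application.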

Pearson product-moment correlation coefficient:

  Mathematical properties:

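  $$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E((X-\mu_X)(Y-\mu_Y))}{\sigma_X \sigma_Y}$$

     where E denotes the expected value and cov denotes the covariance.

     Because $\mu_X = E(X)$ and $\sigma_X^2 = E(X^2) - E^2(X)$, and likewise for Y, this can be written as

  $$\rho_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}}$$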

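  The correlation coefficient is defined only when both variables have nonzero standard deviations. By the Cauchy-Schwarz inequality, its absolute value never exceeds 1. As the linear relationship between the two variables strengthens, the coefficient approaches 1 or -1. It is greater than 0 when one variable tends to increase as the other increases, and less than 0 when one tends to decrease as the other increases. When the two variables are independent, the coefficient is 0, but the converse does not hold, because the coefficient only reflects whether the variables are linearly related. For example, let X be a random variable uniformly distributed on [-1, 1] and let Y = X^2. Y is completely determined by X, so X and Y are not independent, yet their correlation coefficient is 0; that is, they are uncorrelated. When X and Y follow a joint normal distribution, independence and uncorrelatedness are equivalent.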
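The Y = X^2 example can be checked numerically. The sketch below uses an evenly spaced grid on [-1, 1] as a stand-in for the uniform distribution (an assumption made here to keep the result deterministic), and `pearson` is a helper name introduced for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation using population covariance and deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Symmetric grid on [-1, 1] standing in for a uniform distribution
xs = [i / 500.0 for i in range(-500, 501)]

r_square = pearson(xs, [x * x for x in xs])        # Y fully determined by X
r_linear = pearson(xs, [2 * x + 1 for x in xs])    # exact linear relationship

print(abs(r_square) < 1e-9)
print(abs(r_linear - 1) < 1e-9)
```

Even though Y depends entirely on X in the first case, the positive and negative halves of the grid cancel in the covariance, so the coefficient vanishes up to floating-point error.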

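  Suppose there are two variables X and Y. The Pearson correlation coefficient between them can be computed with any of the following formulas: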

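  Formula 1: $$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E((X-\mu_X)(Y-\mu_Y))}{\sigma_X \sigma_Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}}$$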

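  Formula 2: $$\rho_{X,Y} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}}$$

  Formula 3: $$\rho_{X,Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}$$

  Formula 4: $$\rho_{X,Y} = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{\left(\sum X\right)^2}{N}\right)\left(\sum Y^2 - \frac{\left(\sum Y\right)^2}{N}\right)}}$$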

  The four formulas listed above are equivalent. Here E is the expected value, cov is the covariance, and N is the number of observed values.
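The equivalence can be checked numerically. The sketch below evaluates four commonly used algebraic forms of the Pearson coefficient on the ratings that Lisa Rose and Gene Seymour gave to the same six films. The labels (expectation form, raw sums scaled by N, centered sums, raw sums divided by N) and the variable names are introduced here for illustration.

```python
from math import sqrt

# Ratings of the same six films, in the same order
xs = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]   # Lisa Rose
ys = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]   # Gene Seymour

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xx = sum(x * x for x in xs)
sum_yy = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
mean_x, mean_y = sum_x / n, sum_y / n

# Expectation form: E(XY) - E(X)E(Y) over the product of standard deviations
r_expectation = (sum_xy / n - mean_x * mean_y) / (
    sqrt(sum_xx / n - mean_x ** 2) * sqrt(sum_yy / n - mean_y ** 2))

# Raw sums scaled by N
r_scaled = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2))

# Centered sums: deviations from the means
r_centered = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (
    sqrt(sum((x - mean_x) ** 2 for x in xs))
    * sqrt(sum((y - mean_y) ** 2 for y in ys)))

# Raw sums divided by N
r_divided = (sum_xy - sum_x * sum_y / n) / sqrt(
    (sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))

results = [r_expectation, r_scaled, r_centered, r_divided]
print(results)
```

The four values agree up to floating-point rounding. The forms built from raw sums need only one pass over the data, while the centered form is generally less prone to cancellation error when the ratings are large relative to their spread.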

Code using Formula 1:

# -*- coding: UTF-8 -*-
from math import sqrt

# A dictionary of movie critics and their ratings of a small set of films
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
 'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
 'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
 'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
 'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
 'The Night Listener': 4.5, 'Superman Returns': 4.0,
 'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

# Return the Pearson correlation coefficient for p1 and p2
def sim_pearson(prefs, p1, p2):
    # Collect the items that both people have rated
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1

    # Number of shared items
    n = len(si)

    # If they have no ratings in common, return 0
    if n == 0:
        return 0

    # Add up all the ratings
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])

    # Add up the squares of the ratings
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])

    # Add up the products of the ratings
    pSum = sum([prefs[p1][it] * prefs[p2][it] for it in si])

    # Compute the Pearson score: (E(XY) - E(X)E(Y)) / (sigma_X * sigma_Y)
    num = pSum / n - (sum1 * sum2) / (n * n)
    den = sqrt((sum1Sq / n - pow(sum1, 2) / (n * n)) * (sum2Sq / n - pow(sum2, 2) / (n * n)))
    if den == 0:
        return 0

    r = num / den
    return r

print(sim_pearson(critics, 'Lisa Rose', 'Gene Seymour'))