How can I use memcached to improve the performance of a social networking site?

Time: 2022-11-15 04:02:26

I would like to implement memcached on my social network site. Since it is a social network, most of the data changes very frequently.

For example, if I stored a user's 10,000 friends in the cache, the cache would need to be updated every time he adds a friend. That is easy enough, but it would also need to be updated any time someone else adds him as a friend. That's a lot of updating for the friend list alone.

There are also user blogs and bulletins, with new ones posted non-stop, and you can only see those created by users in your friend list, so I think this would be very hard to cache.

I could see caching some profile info that only changes when a user updates their profile, but this would create a cache record for every user; with 100,000+ users, that's a lot of caching. Is this a good idea?

1 solution

#1


I would say that it is a good idea to cache where possible. Most of the time you will be able to pull items from memcached faster than from a traditional RDBMS (especially if you have complex joins and such). I currently employ such a strategy with great success, and here is what I have learned from the experience:

  1. If possible, cache indefinitely, and write a new value when a change is made. Try not to do an explicit delete, as that can cause a race condition in which multiple concurrent requests miss the cache and all try to update it. Also implement locking for the case where an item does not exist in the cache, to prevent the same issue (using memcached "add" plus a short sleep in a loop).

  2. Refresh the cache in the background if possible, using a queue. My implementation currently uses multi-threaded Perl processes running in the background plus beanstalkd, which prevents lag on the frontend. Most of the time it is acceptable for changes to incur a short lag.

  3. Use memcached getmulti if possible; many separate memcached calls really add up.

  4. Tier your cache: when checking for an item, check a local array first, then memcached, then the db. Cache the result in the local array after the first access, to avoid hitting memcached multiple times for the same item within one script execution. EDIT: to clarify, if using a scripting language such as PHP, the local array would live only as long as the current script execution :) An example:

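    class Itemcache {
        // Request-local tier: lives only for the current script execution.
        private $cached_items = array();
        // Memcached client object, passed in already connected.
        private $memcachedobj;

        public function __construct($memcachedobj){
            $this->memcachedobj = $memcachedobj;
        }

        public function getitem($memcache_key){
            if(isset($this->cached_items[$memcache_key])){
                // tier 1: local array
                return $this->cached_items[$memcache_key];
            }elseif(($result = $this->memcachedobj->get($memcache_key)) !== false){
                // tier 2: memcached (get() returns false on a miss)
                $this->cached_items[$memcache_key] = $result;
                return $result;
            }else{
                // tier 3: db query here as $dbresult
                $dbresult = null; // placeholder so the example runs; replace with the real query
                $this->memcachedobj->set($memcache_key,$dbresult,0); // 0 = no expiry
                $this->cached_items[$memcache_key] = $dbresult;
                return $dbresult;
            }
        }
    }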
    
  5. Write a wrapper function that implements caching strategy #4 above.

  6. Use a consistent key structure in memcached, e.g. 'userinfo_{user.pk}', where user.pk is the primary key of the user in the RDBMS.

  7. If your data requires post-processing, do that processing where possible BEFORE placing the data in the cache; this saves a few cycles on every hit of that data.
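
The "add" plus short sleep locking from tip 1 might look roughly like the sketch below. It is written in Python for illustration rather than PHP, and everything in it is an assumption made for the example: `FakeMemcached` is a hypothetical in-memory stand-in that only mimics memcached's "add" semantics (store only if the key is absent), and `get_or_rebuild`, the `lock_` key prefix, and the retry settings are illustrative names, not part of any real client.

```python
import threading
import time


class FakeMemcached:
    """Hypothetical in-memory stand-in for a memcached client (not a real library)."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

    def add(self, key, value):
        """Store only if the key is absent; return True when stored."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def get_or_rebuild(cache, key, load_from_db, retries=50, sleep_s=0.01):
    """Return the cached value, letting only one caller rebuild it on a miss."""
    lock_key = "lock_" + key  # assumed naming scheme for the lock entry
    for _ in range(retries):
        value = cache.get(key)
        if value is not None:
            return value
        if cache.add(lock_key, 1):  # add succeeded, so this caller holds the lock
            try:
                value = cache.get(key)  # re-check: may have been filled meanwhile
                if value is None:
                    value = load_from_db(key)
                    cache.set(key, value)  # write the new value, no delete
                return value
            finally:
                cache.delete(lock_key)
        time.sleep(sleep_s)  # another caller is rebuilding; wait briefly and retry
    return load_from_db(key)  # fallback if the lock never clears
```

With a real memcached server it would probably be wise to give the lock entry a short expiry, so that a worker that dies while holding the lock does not block everyone else until the retries run out.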

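Tips 3 and 6 combine naturally: with a consistent key structure, the keys for a whole batch of users can be built up front and fetched in one multi-get. The following is a minimal Python sketch under stated assumptions: `FakeMemcached` is a hypothetical in-memory stand-in whose `get_multi` returns a dict of only the keys it found, `round_trips` is a counter added purely to show the saving, and `user_key` and `get_users` are illustrative names.

```python
class FakeMemcached:
    """Hypothetical in-memory stand-in for a memcached client (not a real library)."""

    def __init__(self):
        self._data = {}
        self.round_trips = 0  # counts simulated network calls

    def get_multi(self, keys):
        """Return a dict holding only the keys that were found."""
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}

    def set(self, key, value):
        self.round_trips += 1
        self._data[key] = value


def user_key(user_pk):
    """Consistent key structure: 'userinfo_' plus the user's primary key."""
    return f"userinfo_{user_pk}"


def get_users(cache, user_pks, load_from_db):
    """Fetch many users with one multi-get, filling misses from the db."""
    keys = {user_key(pk): pk for pk in user_pks}
    found = cache.get_multi(list(keys))
    users = {}
    for key, pk in keys.items():
        if key in found:
            users[pk] = found[key]
        else:
            users[pk] = load_from_db(pk)  # miss: fall back to the db
            cache.set(key, users[pk])     # write the value for next time
    return users
```

Once the cache is warm, a call such as `get_users(cache, [1, 2, 3], load)` costs a single cache round trip instead of three, which is where the saving on something like a 10,000-entry friend list would come from.
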