• Members 1 post
    Jan. 4, 2022, 5:49 p.m.

    Loading pages is very slow, it's very bad for user behavior.

    Big offsets

    PostgreSQL Documentation

    Django Documentation

    Execution time: 805 ms

    SELECT ••• FROM "misago_threads_post" INNER JOIN "misago_categories_category" ON ("misago_threads_post"."category_id" = "misago_categories_category"."id") LEFT OUTER JOIN "misago_users_user" ON ("misago_threads_post"."poster_id" = "misago_users_user"."id") LEFT OUTER JOIN "misago_users_rank" ON ("misago_users_user"."rank_id" = "misago_users_rank"."id") LEFT OUTER JOIN "misago_users_online" ON ("misago_users_user"."id" = "misago_users_online"."user_id") LEFT OUTER JOIN "misago_users_bancache" ON ("misago_users_user"."id" = "misago_users_bancache"."user_id") WHERE ("misago_threads_post"."thread_id" = 3 AND "misago_threads_post"."is_event" = false) ORDER BY "misago_threads_post"."id" ASC LIMIT 10 OFFSET 52605
    

    /misago/threads/views/thread.py in get_posts(36)
    /misago/threads/viewmodels/posts.py in init(29)

    QUERY PLAN

    Limit  (cost=78664.83..78679.78 rows=10 width=7430) (actual time=804.250..804.895 rows=10 loops=1)
      ->  Nested Loop  (cost=3.62..79117.91 rows=52908 width=7430) (actual time=0.201..801.837 rows=52615 loops=1)
            Join Filter: (misago_threads_post.category_id = misago_categories_category.id)
            Rows Removed by Join Filter: 210460
            ->  Nested Loop Left Join  (cost=3.62..75677.85 rows=52908 width=7011) (actual time=0.180..730.441 rows=52615 loops=1)
                  Join Filter: (misago_threads_post.poster_id = misago_users_user.id)
                  Rows Removed by Join Filter: 163008
                  ->  Index Scan using misago_threads_post_pkey on misago_threads_post  (cost=0.42..74059.72 rows=52908 width=2624) (actual time=0.090..653.235 rows=52615 loops=1)
                        Filter: ((NOT is_event) AND (thread_id = 3))
                        Rows Removed by Filter: 258534
                  ->  Materialize  (cost=3.19..30.89 rows=2 width=4387) (actual time=0.000..0.000 rows=4 loops=52615)
                        ->  Nested Loop Left Join  (cost=3.19..30.88 rows=2 width=4387) (actual time=0.082..0.103 rows=4 loops=1)
                              ->  Nested Loop Left Join  (cost=3.04..14.52 rows=2 width=4273) (actual time=0.066..0.079 rows=4 loops=1)
                                    Join Filter: (misago_users_user.id = misago_users_online.user_id)
                                    Rows Removed by Join Filter: 3
                                    ->  Hash Right Join  (cost=3.04..13.48 rows=2 width=4261) (actual time=0.055..0.064 rows=4 loops=1)
                                          Hash Cond: (misago_users_rank.id = misago_users_user.rank_id)
                                          ->  Seq Scan on misago_users_rank  (cost=0.00..10.30 rows=30 width=2106) (actual time=0.007..0.008 rows=2 loops=1)
                                          ->  Hash  (cost=3.02..3.02 rows=2 width=2155) (actual time=0.028..0.029 rows=4 loops=1)
                                                Buckets: 1024  Batches: 1  Memory Usage: 13kB
                                                ->  Seq Scan on misago_users_user  (cost=0.00..3.02 rows=2 width=2155) (actual time=0.008..0.013 rows=4 loops=1)
                                    ->  Materialize  (cost=0.00..1.01 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=4)
                                          ->  Seq Scan on misago_users_online  (cost=0.00..1.01 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=1)
                              ->  Index Scan using misago_users_bancache_pkey on misago_users_bancache  (cost=0.15..8.17 rows=1 width=114) (actual time=0.004..0.004 rows=1 loops=4)
                                    Index Cond: (misago_users_user.id = user_id)
            ->  Materialize  (cost=0.00..1.07 rows=5 width=419) (actual time=0.000..0.000 rows=5 loops=52615)
                  ->  Seq Scan on misago_categories_category  (cost=0.00..1.05 rows=5 width=419) (actual time=0.006..0.009 rows=5 loops=1)
    Planning time: 3.899 ms
    Execution time: 805.258 ms
    

    Slow Counting
    wiki.postgresql.org/wiki/Slow_Counting

    Execution time: 220 ms

    SELECT COUNT(*) AS "__count" FROM "misago_threads_post" WHERE ("misago_threads_post"."thread_id" = 3 AND "misago_threads_post"."is_event" = false)
    

    /misago/threads/paginator.py in page(15)

    QUERY PLAN

    Finalize Aggregate  (cost=56884.02..56884.03 rows=1 width=8) (actual time=217.525..219.502 rows=1 loops=1)
      ->  Gather  (cost=56883.81..56884.02 rows=2 width=8) (actual time=213.542..219.496 rows=3 loops=1)
            Workers Planned: 2
            Workers Launched: 2
            ->  Partial Aggregate  (cost=55883.81..55883.82 rows=1 width=8) (actual time=207.625..207.626 rows=1 loops=3)
                  ->  Parallel Seq Scan on misago_threads_post  (cost=0.00..55828.69 rows=22045 width=0) (actual time=0.033..205.208 rows=17538 loops=3)
                        Filter: ((NOT is_event) AND (thread_id = 3))
                        Rows Removed by Filter: 86181
    Planning time: 1.024 ms
    Execution time: 219.535 ms
    

    for example, see how threads implemented in other Internet forums with relational databases
    github.com/phpbb/phpbb/blob/5f3fd2739e904b409bcfe100ac7a62aecbc5d28d/phpBB/viewtopic.php#L1187
    github.com/phpbb/phpbb/blob/5f3fd2739e904b409bcfe100ac7a62aecbc5d28d/phpBB/viewtopic.php#L289
    github.com/SimpleMachines/SMF2.1/blob/debbc17370d9c9bdeee97f05eb595f48785f2788/Sources/Display.php#L978

    /misago/threads/paginator.py

    from django.core.paginator import Paginator
    from django.utils.functional import cached_property
    
    class PostsPaginator(Paginator):
        """paginator that makes last item on page repeat as first item on next page."""
    
        def __init__(self, object_list, per_page, orphans=0, allow_empty_first_page=True):
            per_page = int(per_page) - 1
            if orphans:
                orphans += 1
            super().__init__(object_list, per_page, orphans, allow_empty_first_page)
    
        def page(self, number):
            """returns a Page object for the given 1-based page number."""
            number = self.validate_number(number)
            bottom = (number - 1) * self.per_page
            top = bottom + self.per_page
            if top + self.orphans >= self.count:
                top = self.count
            if top < self.count:
                top += 1
            if number > 100 and bottom > self.count/2:
                return self._get_page(self.object_list.reverse()[self.count-top:self.count-bottom][::-1], number, self)
            else:
                return self._get_page(self.object_list[bottom:top], number, self)
    
        @cached_property
        def count(self):
            return self.object_list[0].thread.replies+1
    

    Slow Bitmap Index Scan
    Execution time: 463 ms

    SELECT ••• FROM "misago_threads_post" WHERE ("misago_threads_post"."posted_on" > '2022-01-02T23:50:34.714208+00:00'::timestamptz AND "misago_threads_post"."thread_id" IN (3) AND NOT ("misago_threads_post"."id" IN (SELECT ••• FROM "misago_readtracker_postread" U0 WHERE U0."user_id" = 1)) AND "misago_threads_post"."category_id" IN (6))
    

    /misago/readtracker/threadstracker.py in make_read_aware(30)

    QUERY PLAN

    Unique  (cost=3896.15..39897.69 rows=13 width=4) (actual time=36.648..463.569 rows=1 loops=1)
      ->  Bitmap Heap Scan on misago_threads_post  (cost=3896.15..39897.69 rows=8434 width=4) (actual time=36.646..458.577 rows=52557 loops=1)
            Recheck Cond: ((thread_id = 3) AND (category_id = 6))
            Filter: ((posted_on > '2022-01-02 23:50:34.714208+00'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
            Rows Removed by Filter: 59
            Heap Blocks: exact=27814
            ->  BitmapAnd  (cost=3893.68..3893.68 rows=16867 width=0) (actual time=31.655..31.657 rows=0 loops=1)
                  ->  Bitmap Index Scan on misago_threads_post_thread_id_8083899e  (cost=0.00..1357.23 rows=52908 width=0) (actual time=11.836..11.836 rows=53122 loops=1)
                        Index Cond: (thread_id = 3)
                  ->  Bitmap Index Scan on misago_threads_post_category_id_79f7f0ef  (cost=0.00..2531.98 rows=99141 width=0) (actual time=18.336..18.336 rows=100864 loops=1)
                        Index Cond: (category_id = 6)
            SubPlan 1
              ->  Seq Scan on misago_readtracker_postread u0  (cost=0.00..2.24 rows=95 width=4) (actual time=0.021..0.036 rows=116 loops=1)
                    Filter: (user_id = 1)
                    Rows Removed by Filter: 4
    Planning time: 1.367 ms
    Execution time: 463.757 ms
    

    Acceleration 10 times

    create index forgotten_index on misago_threads_post(posted_on, thread_id, category_id, id);
    

    QUERY PLAN

    Unique  (cost=2.90..14155.69 rows=13 width=4) (actual time=0.085..33.851 rows=1 loops=1)
       ->  Index Only Scan using forgotten_index on misago_threads_post  (cost=2.90..14155.69 rows=8438 width=4) (actual time=0.084..31.309 rows=52557 loops=1)
             Index Cond: ((posted_on > '2022-01-02 23:50:34.714208+00'::timestamp with time zone) AND (thread_id = 3) AND (category_id = 6))
             Filter: (NOT (hashed SubPlan 1))
             Rows Removed by Filter: 59
             Heap Fetches: 3023
             SubPlan 1
               ->  Seq Scan on misago_readtracker_postread u0  (cost=0.00..2.24 rows=95 width=4) (actual time=0.014..0.030 rows=116 loops=1)
                     Filter: (user_id = 1)
                     Rows Removed by Filter: 4
     Planning time: 0.589 ms
     Execution time: 33.878 ms
    
  • Jan. 4, 2022, 6:12 p.m.

    Could you share how many users, threads and posts your forum has? Or what server are you using for your installation?

  • Feb. 1, 2022, 9:39 p.m.

    There are few more things I see here.

    First, thread from your query has at least 52605 posts. This number is MASSIVE! I've never seen a forum thread this long. It also has very low thread id = 3. Is this a synthetic test? Even Discourse limits thread length to 10000 posts, which is number I've also used in my benchmarks at which Misago works great.

    Solution used by phpBB3 is very clever, but they only optimize for thread's start and end. What happens at page 200 out of 400? DB will still have to skip 2000 posts here.

    I've also tested pagination performance on some very large forums, and some of them take over a two seconds to first byte for middle or last page on threads list.

    In theory we could drop offset pagination for Misago and use cursor pagination instead, that's very safe bet to scale (and its used for threads lists and profile activity feeds), but this introduces its own interesting problems for threads:

    • We allow links like /?after=2321 or /?before=41421
    • We have duplicate content hits

    There are some good insights on Discourse forums: meta.discourse.org/t/the-megatopic-public-good-or-public-menace/85592

    So I guess I'll buy into "we should have thread length limit" option in v4.

  • edit

    Thread title has been changed from Very slow performance.