Optimising The Front Page

June 26, 2024 • 3 minute read
Python • Django • PostgresSQL • Debugging

Funny how simple certain optimisations can be. When you write a piece of code for the first time, you can leave a lot of baggage behind. This technical debt is sometimes obvious and hidden in plain sight, yet you tend to ignore it in favour of faster delivery.

After moving this website from a 2GB VM to a 1GB VM, I noticed one of these issues far more. I already knew my home page was the slowest of all the views, but I never noticed how much it struggled on the higher-memory machine. With less memory it became a bottleneck.

Looking back at the view's get_queryset method, I noticed big issues. In most cases a Django queryset is evaluated lazily, but since I was combining the querysets with a chain object and wrapping the result in a list, they were all evaluated immediately and loaded into memory. The issue compounded because I was loading ALL of the matching rows for these models into the view, hogging the precious memory my new virtual machine had less of.

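import logging
from itertools import chain
from typing import Generator

import numpy as np
from django.db.models import Model

# Project, Post and FeedItem are the site's own models, imported from its apps.
log = logging.getLogger(__name__)  # assumed definition of the logger used below


def get_queryset(self) -> Generator[Model, None, None]:
    # list() walks the whole chain, so all three querysets are fully evaluated here.
    query_data = list(
        chain(
            Project.objects.filter(weight__gt=0),
            Post.objects.filter(weight__gt=0),
            FeedItem.objects.filter(feed__weight__gt=0),
        ),
    )
    choices = []
    if query_data:
        # Normalise the weights into probabilities that sum to 1.
        weights = np.array([item.weight for item in query_data])
        weights = weights / np.sum(weights)
        # Weighted sample of indices without replacement.
        choices = np.random.choice(
            len(weights),
            # Clamped: sampling without replacement cannot draw more items than exist.
            size=min(self.paginate_by, len(weights)),
            p=weights,
            replace=False,
        )
    log.debug("Returning %s home page choices", min(self.paginate_by, len(choices)))

    return (query_data[c] for c in choices)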
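
The eager loading can be illustrated without Django at all. The sketch below is not the site's code: lazy_rows is a hypothetical stand-in for a lazy query, and the row counts are made up. It only shows that chain stays lazy by itself, while wrapping it in list pulls every row from every source.

```python
from itertools import chain

fetched = []  # records every row a source actually produces


def lazy_rows(name, count):
    """Hypothetical stand-in for a lazy query: yields rows only on demand."""
    for i in range(count):
        fetched.append((name, i))
        yield (name, i)


# chain() alone is lazy: nothing has been fetched yet.
combined = chain(lazy_rows("project", 1000), lazy_rows("post", 1000))
fetched_before = len(fetched)

# list() walks the whole chain, so every row from every source is loaded.
everything = list(combined)
fetched_after = len(fetched)

print(fetched_before, fetched_after)
```

With real querysets the effect is similar: each one runs its full query and builds a model instance per row as soon as the list is created.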

After realising this part of the code was causing issues, I resorted to slicing. Querysets evaluate slices lazily, which leads to the SQL equivalent of LIMIT 10 for each of the queries. I also added ordering so that the limit keeps only the latest items for display and doesn't cut off the latest entries. This lets the home page view use far less memory and improved response times by an average of ~30%, according to Django Debug Toolbar profiling.

query_data = list(
    chain(
        Project.objects.filter(weight__gt=0).order_by("-last_modified")[:10],
        Post.objects.filter(weight__gt=0).order_by("-last_modified")[:10],
        FeedItem.objects.filter(feed__weight__gt=0).order_by("-published")[:10],
    ),
)
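
To make the LIMIT 10 point concrete, here is a rough sketch of the kind of SQL each sliced queryset asks for. It uses the standard library's sqlite3 module with a hypothetical post table and made-up data rather than the site's Postgres database, and the SQL is written by hand, so the exact statement Django generates will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, weight INTEGER, last_modified INTEGER)"
)
# 1,000 demo rows; last_modified is just an increasing integer here.
conn.executemany(
    "INSERT INTO post (weight, last_modified) VALUES (?, ?)",
    [(i % 5, i) for i in range(1000)],
)

# Roughly what the unsliced queryset fetches: every row with weight > 0.
all_rows = conn.execute("SELECT id FROM post WHERE weight > 0").fetchall()

# Roughly what .order_by("-last_modified")[:10] asks for: newest ten only.
latest_rows = conn.execute(
    "SELECT id, last_modified FROM post WHERE weight > 0 "
    "ORDER BY last_modified DESC LIMIT 10"
).fetchall()

print(len(all_rows), len(latest_rows))
```

The ordering matters as much as the limit: without ORDER BY the database is free to return any ten matching rows, not the most recent ones.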

As the quantity and size of the site's content grows, I will have to optimise this further, as other views will slowly bottleneck the backend too. Finding a solution for this view in particular is important, as it's the landing page and by extension will receive the most page requests. Enhancements like these can give good results, but over time I will have to drastically change the way I approach the home page or increase the resources available to it.