Reducing Memory Usage in my Render Web Service and Celery Task

catsarebetter · September 4, 2023, 11:01pm

I was able to reduce my web service’s memory consumption by 3-4x last week. See the full post for a step-by-step deep dive.

The Problem

My tech stack is Render, Django, and React. Every one of my views (controllers, if you’re coming from Ruby on Rails) had a very long lag time. My DB at the time only had 70+ job postings, and the job listings were taking 30 seconds to load. Even a single 500-word article took 15 seconds to load. I’ve used the same architecture for my blog sites many times, and this was a first.

Slow page loads can have many implications, but for a job board they hurt both the UX and the SEO.

The Solution

The solution turned out to be removing the spaCy library from the web service. spaCy is an NLP library, which I used to find nouns and verbs in job postings. The trained models it loads are large. For example, the default small English model, en_core_web_sm, is typically around 11-15 MB. Larger models like en_core_web_md are around 50-100 MB, and even larger ones like en_core_web_lg can be several hundred megabytes. I was loading the medium model in my views file as a global: nlp = spacy.load("en_core_web_md").
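Loading the model at module level means every web process pays the full memory cost as soon as the views file is imported, even for requests that never touch NLP. The fix in this post is to move spaCy out of the web service entirely; a different, commonly used mitigation is to defer the load until first use and cache it per process. A minimal sketch of that alternative, assuming a hypothetical `lazy_singleton` helper and `get_nlp` accessor (neither is from the original code):

```python
import functools
from typing import Any, Callable


def lazy_singleton(loader: Callable[[], Any]) -> Callable[[], Any]:
    """Return a zero-argument accessor that runs `loader` on first use and caches the result.

    Note: lru_cache does not lock while computing, so concurrent first calls from
    several threads could each run `loader` once before the result is cached.
    """

    @functools.lru_cache(maxsize=None)
    def get() -> Any:
        return loader()

    return get


def _load_spacy_model() -> Any:
    # spaCy is imported inside the loader, so merely importing this module
    # (e.g. the views file) does not pull spaCy or the model into memory.
    import spacy

    return spacy.load("en_core_web_md")


# Hypothetical replacement for the module-level `nlp = spacy.load(...)`:
# nothing is loaded until the first get_nlp() call in each process.
get_nlp = lazy_singleton(_load_spacy_model)
```

A view would then call `get_nlp()(text)` instead of `nlp(text)`. This only delays the cost: the first request that needs NLP still loads the whole model into that web process, which is why moving the work to a separate worker is the more thorough fix.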

My use case for it is as a high-powered regex and parser that generates JSON data from web pages. I could use the OpenAI API to do this, but spaCy is completely free and deterministic, which is what I need from this code.
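For the noun/verb extraction itself, a spaCy Doc can be iterated token by token, and each token exposes its text and coarse part-of-speech tag (`token.text`, `token.pos_`). A minimal sketch of turning that into JSON; the function name `extract_nouns_and_verbs` and the output shape are assumptions, and it accepts any iterable of token-like objects so it can be exercised without a model installed:

```python
import json
from typing import Dict, Iterable, List, Protocol


class TokenLike(Protocol):
    # spaCy's Token has these attributes; any object with them works too.
    text: str
    pos_: str


# Coarse Universal POS tags (as reported by spaCy's `pos_`) mapped to output keys.
POS_TO_KEY = {"NOUN": "nouns", "VERB": "verbs"}


def extract_nouns_and_verbs(tokens: Iterable[TokenLike]) -> str:
    """Return a JSON string of unique nouns and verbs, in order of first appearance."""
    result: Dict[str, List[str]] = {"nouns": [], "verbs": []}
    for tok in tokens:
        key = POS_TO_KEY.get(tok.pos_)
        if key is not None and tok.text not in result[key]:
            result[key].append(tok.text)
    return json.dumps(result)
```

With a loaded pipeline this would be called as `extract_nouns_and_verbs(nlp(posting_text))`, where `posting_text` is a hypothetical string holding one job posting. Proper nouns (tag `PROPN`, e.g. framework names) are left out here; adding `"PROPN": "nouns"` to the mapping would include them.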

Fixing Architecture

I have a single monolith for my project. It houses my frontend, backend, and crons.

Crons

I made a new branch, techstackjobs_celery, and ripped out all the frontend code, the static directories (a 30-40% memory improvement), and node_modules, while keeping spaCy in this codebase. Then I added a background worker on Render that deploys from this branch.

Web Service

I kept the frontend, views, and staticfiles in this codebase, but removed the spaCy library and all the Celery tasks. In the Celery branch, I made sure to load the spaCy model once as a global rather than inside each task. Deploy platforms like Render or Vercel allocate each service a fixed amount of memory. The cron library you use typically doesn’t set a memory limit by default, so the platform enforces one itself to avoid huge crashes on its servers. If I loaded spaCy inside a task, it would immediately crash with a SIGKILL, breaking all my future tasks until I restarted the server.
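One way to make sure the web service stays spaCy-free after the split is a small check that imports the views module and confirms spaCy never got pulled in. A minimal sketch; `heavy_modules_loaded_by` is a hypothetical helper, and because `sys.modules` is process-wide, it should run in a fresh interpreter (for example a dedicated test), otherwise anything imported earlier is reported too:

```python
import importlib
import sys
from typing import Iterable, List


def heavy_modules_loaded_by(module_name: str, forbidden: Iterable[str]) -> List[str]:
    """Import `module_name`, then list which `forbidden` modules are present in sys.modules."""
    importlib.import_module(module_name)
    return [name for name in forbidden if name in sys.modules]


# Hypothetical usage in the web-service branch ("jobs.views" is an illustrative
# module path; Django settings must be configured before importing real views):
# assert heavy_modules_loaded_by("jobs.views", ["spacy"]) == []
```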

Result

[Screenshot: my Render dashboard]
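Besides the dashboard, a rough local comparison (for example, how much a worker process grows when it loads the model) can be made with Python’s standard `resource` module, which reports a process’s peak resident memory. A minimal sketch under that assumption; `peak_rss_mb` is a hypothetical helper, `resource` is only available on Unix-like systems, and since it reports the peak, the value never goes down within a process:

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of the current process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


# Hypothetical usage in a one-off script or at worker startup:
# before = peak_rss_mb()
# import spacy; nlp = spacy.load("en_core_web_md")
# print(f"peak RSS grew by ~{peak_rss_mb() - before:.0f} MiB")
```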

Sincerely,
Hide from Techstackjobs
