Performance of Django app on Render

I looked at the performance of my Django app hosted on Render to see if it can be improved. My goal is to get response times for most pages below 100ms. The app itself is in good shape, in that there are no unnecessary calculations, and I'm happy with its performance on localhost. The performance on Render, however, is subpar. In particular, I want to know whether there's a bottleneck somewhere and, if so, whether it is the database or the app VM.

I used curl to issue ten consecutive requests and display different timing metrics. I averaged them in a spreadsheet; these are the results (all values in milliseconds):

Setup                     dnslookup  connect  appconnect  pretransfer  starttransfer   total    TTFB
Starter (template)             5.46    29.14       68.87        68.97         146.70  149.13   77.73
Starter (ORM simple)           7.12    30.55       70.55        70.67         309.54  345.07  238.87
Starter (ORM complex)          5.08    28.22       68.65        68.77         654.63  681.49  585.86
Standard (ORM simple)          3.74    26.80       63.48        63.59         196.03  235.14  132.44
Pro (template)                 5.49    28.15       69.59        69.69         144.82  146.98   75.13
Pro (ORM simple)               4.98    27.54       68.53        68.65         195.37  228.94  126.71
Pro (ORM complex)              2.83    27.98       68.65        68.77         281.73  313.07  212.97
localhost (template)           0.02     0.41        0.00         0.46          24.83   24.89   24.38
localhost (ORM simple)         0.02     0.38        0.00         0.40          66.62   66.71   66.21
localhost (ORM complex)        0.02     0.35        0.00         0.39         189.86  189.94  189.47
  • template refers to a Django view that only renders a basic template without issuing any database queries (e.g. a 404, about or imprint page). This serves as a baseline to see how fast the server responds without involving the Postgres VM.
  • ORM simple is a Django view that renders a template and also runs 6 queries taking ~13ms in total (according to Django Debug Toolbar).
  • ORM complex is a Django template view with 23 queries that run in 52ms in total.

My laptop has 8 CPU cores and 16 GB RAM. All localhost measurements were taken using the gunicorn WSGI server with DEBUG=False. The Postgres instance is on the Standard tier with 1 GB RAM and 1 CPU. The app's VM has 512 MB RAM and 0.5 CPU on the Starter tier, and 4 GB RAM and 2 CPUs on the Pro tier. As Render doesn't allow downgrading Postgres machines, I couldn't take measurements with different database VM tiers.

This is the command I used to do the measurements:

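# curl reports all times in seconds; the table above lists them in milliseconds.
# Replace the URL with the app's Render URL to measure the deployed app.
for i in {1..10}; do
    curl -w "dnslookup: %{time_namelookup} | connect: %{time_connect} | appconnect: %{time_appconnect} | pretransfer: %{time_pretransfer} | starttransfer: %{time_starttransfer} | total: %{time_total} | size: %{size_download}\n" \
        -so /dev/null \
        "http://localhost:8000/404/"
done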
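The spreadsheet step can also be folded into the shell. This is a minimal sketch, not the script behind the numbers above: it prints only the six raw timings per request, lets awk average them, computes TTFB as starttransfer - pretransfer, and converts curl's seconds to milliseconds. The script name curl_avg.sh and the example URL are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: average curl timings over N sequential requests, in milliseconds.

# measure: issue N sequential requests and print six raw timings (seconds) per line
measure() {
    local url="$1" runs="${2:-10}" i
    for ((i = 1; i <= runs; i++)); do
        curl -so /dev/null \
            -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_pretransfer} %{time_starttransfer} %{time_total}\n' \
            "$url"
    done
}

# average_timings: read the lines printed by measure and print the averages in ms
average_timings() {
    awk '
        NF >= 6 {
            for (f = 1; f <= 6; f++) sum[f] += $f
            ttfb += $5 - $4    # starttransfer - pretransfer for this request
            n++
        }
        END {
            if (n == 0) { print "no samples" > "/dev/stderr"; exit 1 }
            # curl reports seconds, so multiply the averages by 1000
            printf "dnslookup %.2f connect %.2f appconnect %.2f pretransfer %.2f starttransfer %.2f total %.2f TTFB %.2f\n",
                1000 * sum[1] / n, 1000 * sum[2] / n, 1000 * sum[3] / n,
                1000 * sum[4] / n, 1000 * sum[5] / n, 1000 * sum[6] / n,
                1000 * ttfb / n
        }'
}

# Only measure when a URL is passed on the command line
if [ "$#" -gt 0 ]; then
    measure "$1" "${2:-10}" | average_timings
fi
```

Usage with a placeholder URL: `./curl_avg.sh https://your-app.onrender.com/ 10`. Averaging the per-request differences gives the same TTFB as subtracting the averaged columns, so the result is comparable to the spreadsheet values.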

The values up to appconnect (DNS lookup, TCP connect and SSL handshake) look normal to me; other websites show similar values. As browsers reuse connections for subsequent requests anyway, I'm mostly interested in the waiting time / TTFB (starttransfer - pretransfer).
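Because curl reports each timing as cumulative time since the start of the request, the TTFB column is derived rather than measured directly. Worked out for the two template rows on Render, using the averages from the table:

```latex
\[
\text{TTFB} = t_{\text{starttransfer}} - t_{\text{pretransfer}}
\]
\[
\text{Starter (template)}: 146.70 - 68.97 = 77.73\ \text{ms}, \qquad
\text{Pro (template)}: 144.82 - 69.69 = 75.13\ \text{ms}
\]
```

Against 24.38 ms on localhost, both are roughly a factor of three.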

What astonishes me is that rendering a simple template takes three times as long on Render, on both Starter and Pro machines (~75ms), as on localhost (25ms). If I could shave off those additional 50ms, I'd be very happy.

The fact that a regular page has a waiting time of 127ms even on Render Pro is disappointing considering its high cost. Out of curiosity, I ran the test on the Standard tier as well and got 132ms, which seems like a better price-performance ratio to me.

Is it unusual not to get the same performance on Standard / Pro tier VMs as on localhost?

I’d be interested in your setups and response times. How powerful are the machines that you use?

Could it be network time? Which region are you hosting in, and where are you querying from? Compute/node size should be mostly irrelevant, since Python runs on a single core unless you use multiprocessing.
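A gunicorn worker is one Python process, so extra cores only get used by running more worker processes. The gunicorn documentation suggests (2 x cores) + 1 workers as a starting point, and gunicorn takes its default worker count from the WEB_CONCURRENCY environment variable. The sketch below only prints such a start command; `myproject` is a hypothetical module name, and `nproc` may report the host's CPUs rather than a container's CPU quota. More workers help with concurrent requests but do not make a single request faster, which is what a sequential curl loop measures.

```bash
#!/usr/bin/env bash
# Sketch: suggest a gunicorn worker count from the CPU count, using the
# (2 x cores) + 1 starting point from the gunicorn docs.

# workers_for_cores: print (2 * N) + 1 for a whole-number CPU count N
workers_for_cores() {
    echo $(( 2 * $1 + 1 ))
}

# nproc may report the host's CPUs instead of the container's CPU quota
cores="$(nproc 2>/dev/null || echo 1)"
# Like gunicorn itself, prefer WEB_CONCURRENCY when it is set
workers="${WEB_CONCURRENCY:-$(workers_for_cores "$cores")}"

# "myproject" is a hypothetical Django project name; PORT falls back to 8000 here
echo "gunicorn myproject.wsgi:application --workers ${workers} --bind 0.0.0.0:${PORT:-8000}"
```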

Both VM and database service are in the Frankfurt region and I’m querying from within Germany as well.

Isn't network time already taken out of the equation if pretransfer is subtracted from starttransfer? I assume the transfer from server to client can be measured as total - starttransfer, no? That's a consistent 3-30ms across all VM sizes, varying with page size.
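A rough sanity check on the network share, assuming the TCP handshake costs about one round trip (so RTT ≈ connect - dnslookup, since curl's timings are cumulative) and that the pretransfer-to-starttransfer window contains one more round trip (request out, first byte back) on top of the server's processing time. With the Pro (template) averages from the table:

```latex
\[
\begin{aligned}
\text{RTT} &\approx t_{\text{connect}} - t_{\text{dnslookup}} = 28.15 - 5.49 = 22.66\ \text{ms} \\
t_{\text{server}} &\approx \text{TTFB} - \text{RTT} = 75.13 - 22.66 = 52.47\ \text{ms}
\end{aligned}
\]
```

connect - dnslookup comes out between roughly 22.5 and 25 ms for every Render row and 0.33 to 0.39 ms on localhost, so under this assumption the Render TTFB still contains one network round trip. This is only an estimate from averaged values.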

Hi there,

You can’t compare services on your local machine to those on a shared cloud computing environment. Your local setup essentially has zero network latency and access to resources that are many times more powerful than what your Render service has access to.

I'd also expect the simple example to perform more similarly across our instance types, as it shouldn't be as resource-intensive as the more complex examples. You may have a point regarding Standard vs Pro, but did you see this result across multiple tests? A single set of 10 requests probably isn't a large enough dataset to draw meaningful conclusions. A Pro instance should have better overall performance than a Standard instance, but this may not always show in a single set of non-concurrent requests.
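To judge whether a gap like Standard (132ms) vs Pro (127ms) exceeds run-to-run noise, it helps to collect more samples and look at the spread as well as the mean. A minimal sketch using the same curl -w timing variables as the measurement loop; `ttfb_stats` is a hypothetical helper name and the URL in the commented usage is a placeholder.

```bash
#!/usr/bin/env bash
# ttfb_stats: read "pretransfer starttransfer" pairs (seconds, as printed by
# curl -w '%{time_pretransfer} %{time_starttransfer}\n') and print the sample
# count, mean and sample standard deviation of TTFB in milliseconds.
ttfb_stats() {
    awk '
        NF >= 2 {
            x = 1000 * ($2 - $1)    # TTFB of one request in ms
            sum += x; sumsq += x * x; n++
        }
        END {
            if (n < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0    # guard against rounding error
            printf "n=%d mean=%.2f sd=%.2f\n", n, mean, sqrt(var)
        }'
}

# Example with a placeholder URL: 50 sequential requests instead of 10
# for i in $(seq 1 50); do
#     curl -so /dev/null -w '%{time_pretransfer} %{time_starttransfer}\n' https://your-app.onrender.com/
# done | ttfb_stats
```

If the difference between two tiers' means is small compared with their standard deviations, a single batch of requests cannot tell them apart.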

Regards,

Keith
Render Support, UTC+10 🇦🇺
