Is Render restarting my app because of RAM?

I’m new to Render but loving the service so far. My production app randomly stalls for a few minutes at a time, and when I check the logs, the most suspicious error I see is:

[SolidQueue] Restarting fork[67] (status: )
[SolidQueue] Starting Worker(pid=91, hostname=srv-cn7qcf7sc6pc73cbk1ug-64d5657b7b-4kdb6, metadata={:polling_interval=>0.1, :queues=>"*", :thread_pool_size=>5})

I’ve dug into SolidQueue, and this “restart” message is misleading: it’s actually the supervisor detecting that a fork was terminated and needs to be replaced. That indicates something outside Solid Queue is killing the worker :thinking: The missing status code on that log line suggests this might be a SIGKILL being sent to the worker.
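For what it’s worth, here’s why the blank status points at SIGKILL (assuming that log line is interpolating the child’s exit status): in Ruby, a child killed with signal 9 has no exit status at all, only a termination signal. A quick standalone sketch with the plain Process API, nothing SolidQueue-specific:

# Fork a child, SIGKILL it, and inspect the status the parent collects.
pid = fork { sleep }             # child just sleeps until it is killed
Process.kill("KILL", pid)
_, status = Process.wait2(pid)

status.exited?     # => false  (did not exit normally)
status.signaled?   # => true   (terminated by a signal)
status.termsig     # => 9      (SIGKILL)
status.exitstatus  # => nil    (no exit status, so it prints as blank)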

My best guess is that I’m exceeding the app’s RAM limit. My app is called hostedgpt, and when I check the metrics I am indeed hovering very close to 512 MB, and memory does look like it increased a bit right around the time I got one of these errors.

But how can I see an actual record of when Render kills my app because of memory? I’d like to be able to confirm this definitively and monitor it more closely going forward.
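In the meantime, this is the kind of thing I’m thinking of adding so memory shows up in my own log stream (untested sketch; the initializer name is made up, and it assumes a Linux container where /proc is readable):

# config/initializers/memory_monitor.rb (hypothetical file)
# Log this process's resident set size every 30 seconds so memory growth
# appears in the Render log stream alongside the SolidQueue messages.
Thread.new do
  loop do
    rss_kb = File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i
    Rails.logger.info("[mem] pid=#{Process.pid} rss=#{rss_kb / 1024} MB")
    sleep 30
  end
end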

Thanks

Services restarted due to memory overuse will be listed as an Event on the Render Dashboard for that service.

Jason, I appreciate the quick response. Unfortunately, there is no Event on the Render Dashboard that corresponds to the moment when I think my app restarted, which means I still have no idea why my app is having trouble. Can you suggest anything else to look into?

For context, about once a day I am using my app, it works fine for a while, and then I notice it hangs. After a few minutes it starts working again. Yesterday this happened, and I found the precise moment in the logs.

It was yesterday at 1:17:01 PM (log time). The key log lines are:

[SolidQueue] Claimed 1 jobs
I, [2024-04-29T18:16:29.093278 #67]  INFO -- : [ActiveJob] [GetNextAIMessageJob] [33930e21-87b8-4173-9c6e-edc5e39ecf4b] Performing GetNextAIMessageJob (Job ID: 33930e21-87b8-4173-9c6e-edc5e39ecf4b) from SolidQueue(default) enqueued at 2024-04-29T18:16:29.024459609Z with arguments: 3357, 3
### GetNextAIMessageJob.perform(3357, 3, 1)
### Wait for reply
[SolidQueue] Restarting fork[67] (status: )
[SolidQueue] Starting Worker(pid=91, hostname=srv-cn7qcf7sc6pc73cbk1ug-64d5657b7b-4kdb6, metadata={:polling_interval=>0.1, :queues=>"*", :thread_pool_size=>5})
[1] - Worker 0 (PID: 103) booted in 0.0s, phase: 0
I, [2024-04-29T18:18:21.404711 #103]  INFO -- : [6ee21674-2654-445a-a387-2c64c24f786c] Started GET "/cable" for 23.112.39.144 at 2024-04-29 18:18:21 +0000
I, [2024-04-29T18:18:21.413428 #103]  INFO -- : [6ee21674-2654-445a-a387-2c64c24f786c] Started GET "/cable" [WebSocket] for 23.112.39.144 at 2024-04-29 18:18:21 +0000
I, [2024-04-29T18:18:21.413564 #103]  INFO -- : [6ee21674-2654-445a-a387-2c64c24f786c] Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: websocket)
I, [2024-04-29T18:18:21.514497 #103]  INFO -- : Turbo::StreamsChannel is streaming from Z2lkOi8vaG9zdGVkLWdwdC9Db252ZXJzYXRpb24vMTI1
[SolidQueue] Pruning dead process 360 - {"polling_interval"=>0.1, "queues"=>"*", "thread_pool_size"=>5}
[SolidQueue] Claimed 1 jobs
I, [2024-04-29T18:23:10.309082 #91]  INFO -- : [ActiveJob] [GetNextAIMessageJob] [33930e21-87b8-4173-9c6e-edc5e39ecf4b] Performing GetNextAIMessageJob (Job ID: 33930e21-87b8-4173-9c6e-edc5e39ecf4b) from SolidQueue(default) enqueued at 2024-04-29T18:16:29.024459609Z with arguments: 3357, 3
### GetNextAIMessageJob.perform(3357, 3, 1)
### Wait for reply
### Finished GetNextAIMessageJob.perform(3357, 3)
I, [2024-04-29T18:23:24.835017 #91]  INFO -- : [ActiveJob] [GetNextAIMessageJob] [33930e21-87b8-4173-9c6e-edc5e39ecf4b] Performed GetNextAIMessageJob (Job ID: 33930e21-87b8-4173-9c6e-edc5e39ecf4b) from SolidQueue(default) in 14525.39ms
I, [2024-04-29T18:23:25.344451 #91]  INFO -- : [ActiveJob] Enqueued Turbo::Streams::BroadcastStreamJob (Job ID: b0083a97-908d-4568-85ec-1ebceb374adf) to SolidQueue(default) with arguments: "Z2lkOi8vaG9zdGVkLWdwdC9Db252ZXJzYXRpb24vMTI1", {:content=>"<turbo-stream action=\"refresh\"></turbo-stream>"}
[SolidQueue] Claimed 1 jobs
I, [2024-04-29T18:23:25.365729 #91]  INFO -- : [ActiveJob] [Turbo::Streams::BroadcastStreamJob] [b0083a97-908d-4568-85ec-1ebceb374adf] Performing Turbo::Streams::BroadcastStreamJob (Job ID: b0083a97-908d-4568-85ec-1ebceb374adf) from SolidQueue(default) enqueued at 2024-04-29T18:23:25.335394484Z with arguments: "Z2lkOi8vaG9zdGVkLWdwdC9Db252ZXJzYXRpb24vMTI1", {:content=>"<turbo-stream action=\"refresh\"></turbo-stream>"}
I, [2024-04-29T18:23:25.366418 #91]  INFO -- : [ActiveJob] [Turbo::Streams::BroadcastStreamJob] [b0083a97-908d-4568-85ec-1ebceb374adf] Performed Turbo::Streams::BroadcastStreamJob (Job ID: b0083a97-908d-4568-85ec-1ebceb374adf) from SolidQueue(default) in 0.87ms
I, [2024-04-29T18:23:26.311872 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c] Started GET "/conversations/125/messages?version=1" for 108.162.245.166 at 2024-04-29 18:23:26 +0000
I, [2024-04-29T18:23:26.317496 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c] Processing by MessagesController#index as HTML
I, [2024-04-29T18:23:26.317594 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c]   Parameters: {"version"=>"1", "conversation_id"=>"125"}
[1] - Worker 1 (PID: 124) booted in 0.0s, phase: 0
I, [2024-04-29T18:23:28.218480 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c]   Rendered layout layouts/application.html.erb (Duration: 455.2ms | Allocations: 129565)
I, [2024-04-29T18:23:28.219448 #103]  INFO -- : [5082870c-319c-444c-b71e-c36f29e7804c] Completed 200 OK in 1902ms (Views: 448.0ms | ActiveRecord: 43.2ms | Allocations: 179138)

(The log lines above are copied from my service’s logs in the Render Dashboard.)

Note that the word “restart” is a distraction. What happened is that the job started running, but a second later, before it completed, the SolidQueue supervisor noticed that a fork had been terminated and replaced it. The status is missing for the fork being replaced, which again suggests a SIGKILL, though I don’t know that for sure. That was my working theory, but since there is no Event on the Render Dashboard that corresponds to this time, it now seems like something else may be going on. I don’t have any other theories or ideas of what to investigate.
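The one concrete step I can take in the meantime is per-job logging, so I can see which job a fork was running and how much memory it was using right before it disappears. Rough sketch (file name made up; assumes Linux /proc and a standard ActiveJob setup):

# config/initializers/job_memory_logging.rb (hypothetical file)
# Log PID and RSS around every job so a killed fork can be correlated
# with whatever job it was running at the time.
ActiveSupport.on_load(:active_job) do
  rss_mb = -> { File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i / 1024 }

  around_perform do |job, block|
    Rails.logger.info("[mem] pid=#{Process.pid} before #{job.class.name}: #{rss_mb.call} MB")
    begin
      block.call
    ensure
      Rails.logger.info("[mem] pid=#{Process.pid} after #{job.class.name}: #{rss_mb.call} MB")
    end
  end
end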