Celery Sporadic KeyError

Hi, I have a Celery worker that sometimes works, but mostly fails with a KeyError when executing celery_tasks.crawl_linkmap_task. The first time I run the task it usually completes correctly, but when I restart my service, or call the task a second time, it always fails with a KeyError on that second attempt (see below). I'm not sure whether this is a Celery issue or a Render issue, but either way, I'd really appreciate any help!

===
Details below:

Every time it fails, I do a warm shutdown of my Celery service, wait for my Render queue to sync with my local environment, and start it back up from the top level of my local virtual environment with:

celery -A celery_tasks worker --loglevel=info

On my local machine I can see the task is registered:

 -------------- celery@Julians-MacBook-Pro.local v5.3.4 (emerald-rush)
--- ***** ----- 
-- ******* ---- macOS-10.16-x86_64-i386-64bit 2023-11-02 15:57:16
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         celery_tasks:0x7fd1956dc8b0
- ** ---------- .> transport:   rediss://red-cl007kas1bgc73fo0pt0:**@ohio-redis.render.com:6379//
- ** ---------- .> results:     mongodb+srv://julianghadially:**@amati0.xwuxtdi.mongodb.net/
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
                

[tasks]
  . celery_tasks.crawl_linkmap_task

However, I get the following Celery KeyError in my Render queue logs:

Nov 2 04:01:42 PM [2023-11-02 21:01:42,503: ERROR/MainProcess] Received unregistered task of type 'celery_tasks.crawl_linkmap_task'.
Nov 2 04:01:42 PM The message has been ignored and discarded.
Nov 2 04:01:42 PM
Nov 2 04:01:42 PM Did you remember to import the module containing this task?
Nov 2 04:01:42 PM Or maybe you're using relative imports?
Nov 2 04:01:42 PM
Nov 2 04:01:42 PM Please see
Nov 2 04:01:42 PM https://docs.celeryq.dev/en/latest/internals/protocol.html
Nov 2 04:01:42 PM for more information.
Nov 2 04:01:42 PM
Nov 2 04:01:42 PM The full contents of the message body was:
Nov 2 04:01:42 PM b'[[[""]], {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (153b)
Nov 2 04:01:42 PM
Nov 2 04:01:42 PM The full contents of the message headers:
Nov 2 04:01:42 PM {'lang': 'py', 'task': 'celery_tasks.crawl_linkmap_task', 'id': 'e3ca819f-079c-4dd5-a4a9-7a702eedb15a', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'e3ca819f-079c-4dd5-a4a9-7a702eedb15a', 'parent_id': None, 'argsrepr': "([''],)", 'origin': 'gen13472@Julians-MacBook-Pro.local', 'ignore_result': False, 'stamped_headers': None, 'stamps': {}}
Nov 2 04:01:42 PM
Nov 2 04:01:42 PM The delivery info for this task is:
Nov 2 04:01:42 PM {'exchange': '', 'routing_key': 'celery'}
Nov 2 04:01:42 PM Traceback (most recent call last):
Nov 2 04:01:42 PM File "/opt/render/project/src/.venv/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 591, in on_task_received
Nov 2 04:01:42 PM strategy = strategies[type_]
Nov 2 04:01:42 PM KeyError: 'celery_tasks.crawl_linkmap_task'
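
As I understand it, the KeyError comes from the worker looking up the incoming task name in its registry of known tasks, so "unregistered task" and the KeyError are the same failure. One way to check what a running worker has actually registered is Celery's inspect API; a minimal sketch, assuming it runs from the same virtual environment and broker config as the worker:

from celery_tasks import app

#Ask all connected workers which task names they know about.
#Returns {worker_name: [task_names]}, or None if no workers reply.
insp = app.control.inspect()
print(insp.registered())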

If you want to see the code I'm running, here it is:

Filename: celery_tasks.py

from celery import Celery
import time
from time import sleep
import os
from celery.utils.log import get_task_logger
import json
import requests
from random import randint
from bs4 import BeautifulSoup
#tools
import tools
from tools import yyyymmdd_date
import datetime
from selenium.common.exceptions import WebDriverException
#import pymongo

mongo_key = os.environ.get('MONGODB_KEY')
redis_key = os.environ.get('REDIS_KEY')
print(redis_key)
bk = os.environ.get('BROWSERLESS_KEY')
print(bk)

#local = 'celery@Julians-MacBook-Pro.local'
redis = 'rediss://red-cl007kas1bgc73fo0pt0:'+redis_key+'@ohio-redis.render.com:6379'
mongo = 'mongodb+srv://julianghadially:'+mongo_key+'@amati0.xwuxtdi.mongodb.net/?retryWrites=true&w=majority'

logger = get_task_logger(__name__)
app = Celery('celery_tasks', broker=redis, backend=mongo)


@app.task()
def crawl_linkmap_task(linkmap, visited_urls=None, cap=None, bk=bk):
    #avoid a mutable default argument: a shared default list would persist across calls
    if visited_urls is None:
        visited_urls = []
    logger.info('Got request - starting work')
    headers = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/json',
    }
    
    date = yyyymmdd_date(datetime.date.today())
    
    page_texts_trackr = []
    links_trackr = []
    link_visited_dates_trackr = []


    if cap is not None:
        print("Capping urls in linkmap. Capping should cull recursively based on parsing rules. Update code.")
        cap = min(cap,len(linkmap))
        linkmap = linkmap[0:cap]

    for link in linkmap:
        try:
            if link not in visited_urls:
                t = time.time()
                print(link)
                
                #Request
                data = {"url": link}
                data_json = json.dumps(data)
                response = requests.post(
                    "https://chrome.browserless.io/content?token="+str(bk), headers=headers, data=data_json)
                logger.info(str(bk))
                if response.status_code == 200:
                    soup = BeautifulSoup(response.content, "html.parser")
                    result_textonly = soup.get_text()
                    page_texts_trackr.append(result_textonly)
                    links_trackr.append(link)
                    link_visited_dates_trackr.append(date)
                    logger.info("crawled a page")
                else:
                    logger.info("Response status code: " + str(response.status_code))

                #sleep
                elapsed = time.time() - t
                remaining_wait = max(int(5.0 - elapsed), 0.5)
                sleep(remaining_wait+(randint(0,100)/100))
        except WebDriverException:
            print("WebDriverException: Failed to load site " + str(link))
            sleep(2+(randint(0,100)/100))
    
    #driver.quit()
    logger.info('Work finished')
    return {'Text': page_texts_trackr, 'Links': links_trackr, 'Dates': link_visited_dates_trackr}
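
For reference, I queue the task from my service with a call along these lines (hypothetical example; the real URL list comes from my crawler config):

from celery_tasks import crawl_linkmap_task

#Hypothetical caller: queue a capped crawl of two example URLs.
result = crawl_linkmap_task.delay(['https://example.com', 'https://example.com/about'], cap=2)
print(result.id)  #task id the worker should pick up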

Hi,

Something I did spot is that your local output shows Celery 5.3.4, which requires Python >= 3.8,

whereas your Render logs show:

Nov 2 04:01:42 PM File "/opt/render/project/src/.venv/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 591, in on_task_received

The python3.7 in that path implies you're running the current Render Python runtime default of 3.7.10. As a first step, make sure your environments are as close as possible in Python and package versions.
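
If it helps, Render lets you pin the runtime with a PYTHON_VERSION environment variable, set either in the dashboard or in a render.yaml blueprint. A rough sketch (the service name and exact version are just examples):

# render.yaml (sketch): pin the worker's Python to match your local install
services:
  - type: worker
    name: celery-worker
    runtime: python
    envVars:
      - key: PYTHON_VERSION
        value: 3.8.10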

I'm not a Python expert, but a search for "KeyError celery task" brings up several Stack Overflow results, e.g. https://stackoverflow.com/questions/68888941/keyerror-received-unregistered-task-of-type-on-celery-while-task-is-registere
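
One common suggestion in those threads is to give the task an explicit name, so the registered name can't drift if the module ends up imported under a different path. A sketch based on your code above:

@app.task(name='celery_tasks.crawl_linkmap_task')
def crawl_linkmap_task(linkmap, visited_urls=None, cap=None, bk=bk):
    ...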

Alan
