Playwright Scraper Timeouts in Cron Job

Hi everyone,

I’m facing an issue with my Dockerized Playwright scraper: it runs fine locally but times out when executed as a cron job on render.com.

What the Scraper Does

My scraper is designed to scrape a Google Maps list, such as this one: Google Maps List. It scrolls through the list to load all items, then iterates over each item, clicking on it to open the info pane, and scrapes the content.
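
To make the "scroll until everything is loaded" step concrete, here is a minimal sketch of the idea, not my exact code. The names `scrollUntilStable`, `countItems`, and `scrollOnce` are hypothetical; the callbacks stand in for Playwright calls such as `locator.count()` and `page.mouse.wheel()`, so the loop can be exercised without a browser.

```typescript
// Hypothetical helper: keeps scrolling until the item count stops growing.
// `countItems` and `scrollOnce` are placeholders for Playwright calls, e.g.
// `() => page.locator(itemSelector).count()` and `() => page.mouse.wheel(0, 2000)`.
async function scrollUntilStable(
  countItems: () => Promise<number>,
  scrollOnce: () => Promise<void>,
  maxIdleRounds = 3, // stop after this many scrolls in a row that load nothing new
  maxRounds = 200, // hard cap so the loop cannot run forever
): Promise<number> {
  let lastCount = await countItems();
  let idleRounds = 0;
  for (let round = 0; round < maxRounds && idleRounds < maxIdleRounds; round++) {
    await scrollOnce();
    const count = await countItems();
    if (count > lastCount) {
      // New items appeared, so reset the idle counter.
      lastCount = count;
      idleRounds = 0;
    } else {
      idleRounds++;
    }
  }
  return lastCount;
}
```

On a slow instance the idle check probably needs a short pause inside `scrollOnce`, otherwise the count can look stable simply because the next batch has not arrived yet.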

The Issue

The timeout occurs when the scraper tries to click a place button. Which button fails varies from run to run, which makes me suspect a network problem, but I’m not certain.
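
One way to check the network theory would be to log failed requests during a run. The sketch below is an assumption about how that could look, not something already in my scraper: `collectFailedRequests`, `PageLike`, and `FailedRequestLike` are hypothetical names, and the interfaces only mirror the parts of Playwright's `page.on('requestfailed', ...)` event and `request.failure()` that the sketch uses.

```typescript
// Minimal structural types covering only what this sketch needs.
// A real Playwright `Page` is assumed to be compatible with `PageLike`.
interface FailedRequestLike {
  url(): string;
  failure(): { errorText: string } | null;
}

interface PageLike {
  on(event: 'requestfailed', listener: (request: FailedRequestLike) => void): unknown;
}

// Hypothetical helper: records every failed request so the list can be
// printed next to the click error when a retry gives up.
function collectFailedRequests(page: PageLike): string[] {
  const failures: string[] = [];
  page.on('requestfailed', (request) => {
    const reason = request.failure()?.errorText ?? 'unknown error';
    failures.push(`${request.url()} -> ${reason}`);
  });
  return failures;
}
```

If the returned array stays empty while clicks still time out, the cause is more likely slow rendering on the small instance than dropped requests.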

Instance Details

  • Instance Type: 1 CPU, 2 GB RAM

Code Snippet

Here’s the relevant code where the failure occurs:

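  // Clicks a place button, then waits for waitForMultipleMains to resolve.
  // Each retry allows a longer wait (4 s on attempt 1, 5 s on attempt 2, ...).
  async clickPlaceButtonWithRetries(page: Page, placeButton: Locator, numRetries: number) {
    await page.waitForTimeout(500); // brief pause before the first click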
    for (let attempt = 1; attempt <= numRetries; attempt++) {
      try {
        await placeButton.click();
        await this.waitForMultipleMains(page, { timeout: 3000 + attempt * 1000 });
        return; // Exit the function if successful
      } catch (error) {
        if (attempt === numRetries) {
          console.error(`Failed to click place button after ${numRetries} attempts`, error);
          throw error; // Rethrow the error if it's the last attempt
        } else {
          console.error(`Error clicking place button, attempt ${attempt} of ${numRetries}`, error);
        }
      }
    }
  }
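
The retry logic above can also be pulled out of the class so it can be tested without a browser. This is only a sketch under my own naming: `withRetries` and `timeoutForAttempt` are hypothetical helpers, and the 3000 ms base plus 1000 ms per attempt is copied from the snippet above.

```typescript
// Linear backoff from the snippet above: 3000 ms base plus 1000 ms per attempt.
function timeoutForAttempt(attempt: number, baseMs = 3000, stepMs = 1000): number {
  return baseMs + attempt * stepMs;
}

// Hypothetical generic wrapper: runs `action` up to `numRetries` times,
// passing the timeout for the current attempt, and rethrows the last error.
async function withRetries<T>(
  action: (timeoutMs: number, attempt: number) => Promise<T>,
  numRetries: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= numRetries; attempt++) {
    try {
      return await action(timeoutForAttempt(attempt), attempt);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  // Unlike the original method, this throws when numRetries is less than 1.
  throw lastError ?? new Error('numRetries must be at least 1');
}
```

In the scraper the action would be something like `async (timeout) => { await placeButton.click(); await this.waitForMultipleMains(page, { timeout }); }`, with the per-attempt logging added back inside the catch block.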

Dockerfile

Here’s my Dockerfile for reference:

# syntax=docker/dockerfile:1

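# Default pnpm version. An ARG declared before FROM is outside the build stage.
ARG PNPM_VERSION=9.1.1

################################################################################
FROM mcr.microsoft.com/playwright:v1.44.0-jammy

# Re-declare the ARG inside the stage so ${PNPM_VERSION} is not empty in RUN below.
ARG PNPM_VERSION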

# Set the working directory for the build.
WORKDIR /usr/src/app

# Copy application dependency manifests to the container image.
# The wildcard also picks up package-lock.json if present; pnpm install reads pnpm-lock.yaml.
# Copying these separately prevents re-running pnpm install on every code change.
COPY package*.json pnpm-lock.yaml ./

# Install pnpm and dependencies
RUN --mount=type=cache,target=/root/.npm \
    npm install -g pnpm@${PNPM_VERSION} \
    && pnpm install
    
# Generate the Prisma client
COPY prisma ./prisma
RUN npx prisma generate

# Copy the rest of the source files into the image.
COPY . .

# Expose the port that the application listens on.
EXPOSE 3000

# Run the application.
CMD ["pnpm", "run", "script", "scripts/scrape-google-maps.ts"]