~ 5 min read

Installing Playwright on Heroku for Programmatic Node.js Browser Automation

share this story on
Getting Playwright to work on Heroku wasn't smooth sailing. It looked for browser dependencies that weren't installed by default and not in the location it expected them. Here's how I did it and what I learned along the way.

Installing Playwright on Heroku is a bit more involved than just running npm install playwright and npm install @playwright/test and setting it as a dependency that gets installed when you deploy your app to Heroku. Installing the browser dependencies didn’t work out of the box for me and I had to do some additional steps to get it working.

I added Playwright web automation capabilities to my Heroku hosted Node.js backend but it wasn’t as seamless as I thought it would be. Here’s how I did it and what I learned along the way.

Adding Playwright as a Dependency

First off, when it comes to the Playwright dependencies, you need both playwright and @playwright/test installed. The former is the core Playwright library and the latter is the test runner that you can use to run your Playwright tests.

If your Playwright use-case isn’t end-to-end testing but rather browser automation and scraping (only from approved websites, of course), you probably need just one browser installed. In my case, I only needed Chromium. So, I switched playwright for playwright-chromium.

npm install playwright-chromium @playwright/test

Playwright Browser Automation via Code

Since I want to run Playwright browser automation tool programmatically via code and not via the classic test runner (although, I do want to run Playwright tests as well), I structured the code so that I separate between the web interaction and the tests being run:

  • A PageScraper.js file defines the code used to interact with the browser and scrape the data I need.
  • An e2e/ directory contains the Playwright tests that use the PageScraper.js file to interact with the browser.

This way, I can actually use the PageScraper.js code in my backend API to drive the needed logic, as well as use it as the basis of sanity end-to-end tests.

The PageScraper.js file looks like this:

import { chromium } from "playwright-chromium";

export default class PageScraper {
  constructor({ Logger }) {
    this.Logger = Logger;
  }

  async scrapePage({ url }) {
    const browser = await chromium.launch({ chromiumSandbox: false });
    const page = await browser.newPage();
    await page.goto(url);

    // the title of the blog post in this web page is found via the `txt-headline-large` class on the h1 element:
    const title = await page.$eval(
      ".txt-headline-large",
      (el) => el.textContent
    );

    // the contents of the blog post in this web page is found via the `txt-rich-long` class on the div element:
    const content = await page.$eval(".txt-rich-long", (el) => el.textContent);
    await browser.close();

    return {
      title,
      content,
    };
  }
}

Then the E2E (end-to-end) test file can be as the follows scrapers/page-scraper.spec.js:

import { container, initDI } from "../infra/di";
import { test, expect } from "@playwright/test";

await initDI({ config: {}, database: {} });

test("url provided to scraper gets title and content", async ({ page }) => {
  const scraper = container.resolve("PageScraper");
  const { title, content } = await scraper.scrapePage({
    url: "https://www.example.com",
  });

  // Expect a title "to contain" a substring
  await expect(title).toContain(
    "Title goes here..."
  );

  // Expect the page content to be at least 1000 characters long
  await expect(content.length).toBeGreaterThan(1000);
});

Now we can also further update the package.json section for scripts to allow easily running these tests:

{
  "scripts": {
    "test:scrapers": "npx playwright test scrapers/*.spec.js",
  }
}

Installing Playwright on Heroku

Getting Playwright to work on Heroku wasn’t smooth sailing. It looked for browser dependencies that weren’t installed by default and not in the location it expected them.

So the Heroku Node.js application build failed with errors like this:

remote:        > app-api@1.0.0 postinstall
remote:        > npx playwright install chromium --with-deps
remote:        
remote:        Installing dependencies...
remote:        Switching to root user to install dependencies...
remote:        Password: su: Authentication failure
remote:        Failed to install browsers
remote:        Error: Installation process exited with code: 1
remote:        npm error code 1
remote:        npm error path /tmp/build_e814256b
remote:        npm error command failed
remote:        npm error command sh -c npx playwright install chromium --with-deps
remote:        
remote:        npm error A complete log of this run can be found in: /tmp/npmcache.KPL0X/_logs/2024-05-08T06_12_14_978Z-debug-0.log

That command npx playwright install chromium --with-deps was failing because it was trying to install the browser dependencies as the root user and Heroku doesn’t allow that, and I added this as a postinstall script in my package.json file to make sure that browser dependencies are installed when the app is deployed to Heroku. However, that didn’t work out.

The solution was to install the browser dependencies via buildpacks, which is something like container images that can be layered. That meant that I also needed to use the heroku/nodejs buildpack as well, and couldn’t just leave it to only having a Procfile and default to Heroku knowing how to build the app.

And so, add to your app.json (or create one, I guess), the following buildpacks directives:

{ 
  "buildpacks": [
    {
      "url": "https://github.com/mxschmitt/heroku-playwright-buildpack.git"
    },
    {
      "url": "https://github.com/heroku/heroku-buildpack-nodejs"
    }
  ]
}

And then, you can deploy your app to Heroku and it should work. The Playwright browser dependencies should be installed and you should be able to run your Playwright tests on Heroku.

Follow-up Playwright Heroku Deployment Resources

See the official Heroku documentation for a Playwright community buildpack and an example Express Playwright repository for practical examples.