Installing Playwright on Heroku for Programmatic Node.js Browser Automation
Installing Playwright on Heroku takes more than running npm install playwright and npm install @playwright/test and listing them as dependencies that get installed when you deploy your app. Installing the browser dependencies didn’t work out of the box for me, and I had to take some additional steps to get it working.
I added Playwright web automation capabilities to my Heroku-hosted Node.js backend, but it wasn’t as seamless as I thought it would be. Here’s how I did it and what I learned along the way.
Adding Playwright as a Dependency
First off, when it comes to the Playwright dependencies, you need both playwright and @playwright/test installed. The former is the core Playwright library and the latter is the test runner that you can use to run your Playwright tests.
If your Playwright use-case isn’t end-to-end testing but rather browser automation and scraping (only from approved websites, of course), you probably need just one browser installed. In my case, I only needed Chromium. So, I switched playwright for playwright-chromium.
npm install playwright-chromium @playwright/test
Playwright Browser Automation via Code
Since I want to run Playwright programmatically from code and not only through the classic test runner (although I do want to run Playwright tests as well), I structured the code to separate the web interaction from the tests:
- A PageScraper.js file defines the code used to interact with the browser and scrape the data I need.
- An e2e/ directory contains the Playwright tests that use the PageScraper.js file to interact with the browser.
This way, I can use the PageScraper.js code in my backend API to drive the needed logic, as well as use it as the basis of sanity end-to-end tests.
The PageScraper.js file looks like this:
import { chromium } from "playwright-chromium";

export default class PageScraper {
  constructor({ Logger }) {
    this.Logger = Logger;
  }

  async scrapePage({ url }) {
    const browser = await chromium.launch({ chromiumSandbox: false });
    const page = await browser.newPage();
    await page.goto(url);

    // the title of the blog post in this web page is found via the `txt-headline-large` class on the h1 element:
    const title = await page.$eval(
      ".txt-headline-large",
      (el) => el.textContent
    );

    // the contents of the blog post in this web page is found via the `txt-rich-long` class on the div element:
    const content = await page.$eval(".txt-rich-long", (el) => el.textContent);

    await browser.close();

    return {
      title,
      content,
    };
  }
}
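One thing to watch in the class above: if page.goto or $eval throws, browser.close() is never reached and a Chromium process may be left running on the dyno. Below is a minimal sketch of one way to guard against that. The withPage helper and its launch parameter are hypothetical names of mine, not part of Playwright or of the original code; launch is assumed to be any function that returns a Playwright-like browser object.

```javascript
// Hypothetical helper (not part of Playwright): runs `work(page)` against a
// freshly launched browser and always closes that browser afterwards.
// `launch` is assumed to return a browser object exposing newPage() and
// close(), e.g. () => chromium.launch({ chromiumSandbox: false }).
async function withPage(launch, work) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // `return await` keeps the browser open until `work` has settled.
    return await work(page);
  } finally {
    // Runs on success and on failure, so no browser process is leaked.
    await browser.close();
  }
}
```

With a helper like this, the body of scrapePage between the launch and close calls could be passed as the work callback, and the two $eval results returned from it.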
Then the E2E (end-to-end) test file, scrapers/page-scraper.spec.js, can look as follows:
import { container, initDI } from "../infra/di";
import { test, expect } from "@playwright/test";

await initDI({ config: {}, database: {} });

// The scraper launches its own browser, so the test runner's `page` fixture is not requested here.
test("url provided to scraper gets title and content", async () => {
  const scraper = container.resolve("PageScraper");
  const { title, content } = await scraper.scrapePage({
    url: "https://www.example.com",
  });

  // Expect a title "to contain" a substring
  expect(title).toContain(
    "Title goes here..."
  );

  // Expect the page content to be more than 1000 characters long
  expect(content.length).toBeGreaterThan(1000);
});
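The ../infra/di module imported by the spec file isn’t shown in this post. As a rough illustration only, here is a minimal sketch of what a container with that shape could look like. Apart from container, initDI, and the "PageScraper" key, every name here is an assumption of mine, including the services and Logger parameters, which the real initDI call above does not pass.

```javascript
// Hypothetical stand-in for "../infra/di"; the real module is not shown in
// the post, so this only mirrors the two names the spec file imports.
const factories = new Map();
const instances = new Map();

const container = {
  // Store a factory; nothing is constructed until the first resolve().
  register(name, factory) {
    factories.set(name, factory);
    instances.delete(name);
  },
  // Build the service on first use, then return the cached instance.
  resolve(name) {
    if (!factories.has(name)) {
      throw new Error(`No service registered as "${name}"`);
    }
    if (!instances.has(name)) {
      instances.set(name, factories.get(name)());
    }
    return instances.get(name);
  },
};

// Assumed shape: `services` maps a name to a class whose constructor takes
// `{ Logger }`, as PageScraper does. `config` and `database` are accepted to
// mirror the call in the spec file but are unused in this sketch.
async function initDI({ config, database, services = {}, Logger = console }) {
  for (const [name, ServiceClass] of Object.entries(services)) {
    container.register(name, () => new ServiceClass({ Logger }));
  }
}
```

Under these assumptions, calling initDI with services set to { PageScraper } would make container.resolve("PageScraper") return a single shared scraper instance wired with the logger.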
Now we can also update the scripts section of package.json to make these tests easy to run:
{
  "scripts": {
    "test:scrapers": "npx playwright test scrapers/*.spec.js"
  }
}
Installing Playwright on Heroku
Getting Playwright to work on Heroku wasn’t smooth sailing. Playwright looked for browser dependencies that weren’t installed by default, or weren’t in the location it expected.
So the Heroku Node.js application build failed with errors like this:
remote: > app-api@1.0.0 postinstall
remote: > npx playwright install chromium --with-deps
remote:
remote: Installing dependencies...
remote: Switching to root user to install dependencies...
remote: Password: su: Authentication failure
remote: Failed to install browsers
remote: Error: Installation process exited with code: 1
remote: npm error code 1
remote: npm error path /tmp/build_e814256b
remote: npm error command failed
remote: npm error command sh -c npx playwright install chromium --with-deps
remote:
remote: npm error A complete log of this run can be found in: /tmp/npmcache.KPL0X/_logs/2024-05-08T06_12_14_978Z-debug-0.log
I had added the command npx playwright install chromium --with-deps as a postinstall script in my package.json file to make sure that browser dependencies are installed when the app is deployed to Heroku. However, that didn’t work out: the command tries to install the browser dependencies as the root user, and Heroku doesn’t allow that.
The solution was to install the browser dependencies via buildpacks, which are build scripts that Heroku runs in order, each adding to the app’s environment somewhat like layers in a container image. That meant I also needed to declare the heroku/nodejs buildpack explicitly, and couldn’t just rely on a Procfile and on Heroku detecting how to build the app by default.
So, add the following buildpacks entries to your app.json (or create the file if you don’t have one):
{
  "buildpacks": [
    {
      "url": "https://github.com/mxschmitt/heroku-playwright-buildpack.git"
    },
    {
      "url": "https://github.com/heroku/heroku-buildpack-nodejs"
    }
  ]
}
Then you can deploy your app to Heroku and it should work. The Playwright browser dependencies should be installed, and you should be able to run your Playwright tests on Heroku.
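To confirm that the deploy actually produced a working browser, a small smoke check can help. The sketch below is my own suggestion rather than something from the buildpack’s documentation: checkBrowserLaunch and its launch parameter are hypothetical names, and launch is assumed to be a function such as () => chromium.launch({ chromiumSandbox: false }) imported from playwright-chromium.

```javascript
// Hypothetical smoke check: launches a browser, loads a URL, and reports the
// page title. `launch` is injected so the function does not hard-code the
// browser package; the returned object is assumed to expose newPage() and
// close(), and the page goto() and title(), as Playwright's do.
async function checkBrowserLaunch(launch, url) {
  let browser;
  try {
    browser = await launch();
    const page = await browser.newPage();
    await page.goto(url);
    return { ok: true, title: await page.title() };
  } catch (error) {
    // A failed launch (for example, missing system libraries) ends up here.
    return { ok: false, error: error.message };
  } finally {
    // Only close what was actually opened.
    if (browser) await browser.close();
  }
}
```

Calling this once from a one-off script or a health endpoint after deploying would show whether Chromium starts, without running the whole test suite.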
Follow-up Playwright Heroku Deployment Resources
See the official Heroku documentation for a Playwright community buildpack and an example Express Playwright repository for practical examples.