Building a Simple Content Fetching API with Koa, Axios, and Readability

Content aggregators, web scrapers, and similar services all need a backend that can fetch a web page and pull out the part worth reading. In this blog post, we'll walk through a small Node.js application that uses Koa, Axios, and Mozilla's Readability to fetch, parse, and serve readable content from a given URL.

Table of Contents

  1. Introduction
  2. Prerequisites
  3. Setting Up the Project
  4. Understanding the Code
  5. Running the Application
  6. Testing the API
  7. Handling Errors
  8. Conclusion
  9. Further Reading

Introduction

Building an API that can fetch and process web content involves several steps: making HTTP requests, parsing HTML, and extracting meaningful information. In this tutorial, we'll create a Koa-based server that exposes an /api endpoint. This endpoint accepts a URL as a query parameter, fetches the HTML content of the provided URL using Axios, parses the HTML with JSDOM, and then extracts the main readable content using Mozilla's Readability library.

Additionally, we'll include a simple /test endpoint to verify that our server is running correctly.

Prerequisites

Before diving into the code, ensure you have the following installed on your machine:

  • Node.js (a current LTS release is recommended; recent jsdom releases no longer run on Node.js 12)
  • npm or yarn package manager

Familiarity with JavaScript and Node.js, plus a basic understanding of Koa, will be beneficial.

Setting Up the Project

  1. Initialize a New Node.js Project

    mkdir koa-content-fetcher
    cd koa-content-fetcher
    npm init -y
    
  2. Install Required Dependencies

    npm install koa koa-router axios jsdom @mozilla/readability
    
    • koa: A lightweight and expressive middleware framework for Node.js.
    • koa-router: Router middleware for Koa.
    • axios: Promise-based HTTP client for the browser and Node.js.
    • jsdom: A JavaScript implementation of the DOM and HTML standards.
    • @mozilla/readability: A library to extract and parse the main content from web pages.
  3. Create the Server File

    Create a file named server.js and paste the following code:

    const Koa = require('koa');
    const Router = require('koa-router');
    const axios = require('axios');
    const { JSDOM } = require('jsdom');
    const { Readability } = require('@mozilla/readability');
    
    const app = new Koa();
    const router = new Router();
    
    router.get('/api', async ctx => {
        const url = ctx.query.url;
        console.log(url);
        try {
            const response = await axios.get(url);
            const html = response.data;
            const dom = new JSDOM(html, { url });
            const reader = new Readability(dom.window.document);
            ctx.body = reader.parse();
        } catch (error) {
            ctx.status = 500;
            ctx.body = { error: 'Failed to fetch and parse content' };
        }
    });
    
    router.get('/test', async ctx => {
        ctx.body = "pong";
    });
    
    app.use(router.routes()).use(router.allowedMethods());
    
    app.listen(3009, () => {
        console.log('Server running on http://localhost:3009');
    });
    

Understanding the Code

Let's break down the server code to understand how each part contributes to the overall functionality.

Importing Dependencies

const Koa = require('koa');
const Router = require('koa-router');
const axios = require('axios');
const { JSDOM } = require('jsdom');
const { Readability } = require('@mozilla/readability');

  • Koa: The core framework for building the server.
  • koa-router: Facilitates routing in Koa applications.
  • axios: Handles HTTP requests to fetch external web content.
  • JSDOM: Parses the fetched HTML content into a DOM-like structure.
  • Readability: Extracts the main content (like articles) from the DOM.

Initializing Koa and Router

const app = new Koa();
const router = new Router();

  • app: The Koa application instance.
  • router: An instance of Koa Router to define API endpoints.

Defining Routes

API Route (/api)

router.get('/api', async ctx => {
    const url = ctx.query.url;
    console.log(url);
    try {
        const response = await axios.get(url);
        const html = response.data;
        const dom = new JSDOM(html, { url });
        const reader = new Readability(dom.window.document);
        ctx.body = reader.parse();
    } catch (error) {
        ctx.status = 500;
        ctx.body = { error: 'Failed to fetch and parse content' };
    }
});

  • Endpoint: /api
  • Method: GET
  • Functionality:
    1. Extract URL: Retrieves the url query parameter from the request.
    2. Fetch Content: Uses Axios to make a GET request to the provided URL.
    3. Parse HTML: Converts the fetched HTML string into a DOM using JSDOM.
    4. Extract Readable Content: Utilizes Readability to parse the DOM and extract the main content.
    5. Respond: Sends the parsed content back to the client.
  • Error Handling: If any step fails, the server responds with a 500 status code and an error message.
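
The handler above passes ctx.query.url straight to Axios, so a missing or malformed value only surfaces later as a generic 500. One way to catch it earlier is sketched below. The helper name parseTargetUrl, the 400 status, and the error message are assumptions made for illustration and are not part of the original server.js; the only API used is the WHATWG URL class built into Node.js.

```javascript
// Hypothetical helper (not in the original server.js): returns a parsed URL
// for absolute http(s) addresses, or null for anything else.
function parseTargetUrl(raw) {
    // ctx.query.url is undefined when the parameter is missing and an array
    // when it is repeated, so only non-empty strings are accepted.
    if (typeof raw !== 'string' || raw.trim() === '') {
        return null;
    }
    let parsed;
    try {
        parsed = new URL(raw.trim()); // throws a TypeError on malformed input
    } catch (err) {
        return null;
    }
    // Reject schemes such as file:, ftp:, or javascript:.
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
        return null;
    }
    return parsed;
}

// Possible use inside the /api handler, before calling axios.get:
//   const target = parseTargetUrl(ctx.query.url);
//   if (!target) {
//       ctx.status = 400;
//       ctx.body = { error: 'Query parameter "url" must be an absolute http(s) URL' };
//       return;
//   }
```

Rejecting bad input with a 400 keeps client mistakes apart from real fetch failures. It does not stop requests to internal addresses, which a public deployment would also need to consider.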

Test Route (/test)

router.get('/test', async ctx => {
    ctx.body = "pong";
});

  • Endpoint: /test
  • Method: GET
  • Functionality: Returns a simple string "pong" to verify that the server is operational.

Starting the Server

app.use(router.routes()).use(router.allowedMethods());

app.listen(3009, () => {
    console.log('Server running on http://localhost:3009');
});

  • Middleware: Registers the defined routes and allowed HTTP methods with the Koa application.
  • Listening Port: The server listens on port 3009.
  • Confirmation: Logs a message to the console indicating that the server is running.
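
The port is hard-coded to 3009 in the listing. If you want to change it without editing the file, a small sketch like the following could read it from the environment instead. The helper name resolvePort and the use of a PORT variable are assumptions, not something the original code does.

```javascript
// Hypothetical helper: read the port from an environment-like object and
// fall back to the tutorial's 3009 when PORT is unset or not a valid port.
function resolvePort(env, fallback = 3009) {
    const port = Number(env.PORT); // NaN for undefined or non-numeric strings
    return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

// Possible use in server.js:
//   const port = resolvePort(process.env);
//   app.listen(port, () => {
//       console.log(`Server running on http://localhost:${port}`);
//   });
```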

Running the Application

  1. Start the Server

    In your terminal, navigate to the project directory and run:

    node server.js
    

    You should see the following output:

    Server running on http://localhost:3009
    
  2. Verify the Test Endpoint

    Open your browser or use a tool like curl or Postman to access:

    http://localhost:3009/test
    

    You should receive a response:

    pong
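
If you would rather script this check than open a browser, here is a minimal sketch. It assumes Node.js 18 or newer, where fetch is available globally. The function name checkServer and the injectable fetchImpl parameter are illustrative choices, not part of the original project.

```javascript
// Hypothetical smoke check. fetchImpl defaults to the global fetch
// (available in Node.js 18+); passing it in keeps the function easy to test.
async function checkServer(baseUrl, fetchImpl = globalThis.fetch) {
    const response = await fetchImpl(new URL('/test', baseUrl));
    const text = await response.text();
    // The /test route is expected to answer 200 with the body "pong".
    return response.status === 200 && text === 'pong';
}

// With the server running:
//   checkServer('http://localhost:3009')
//       .then(ok => console.log(ok ? 'server is up' : 'unexpected response'))
//       .catch(err => console.error('server not reachable:', err.message));
```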
    

Testing the API

The core functionality lies in the /api endpoint, which fetches and parses content from a provided URL.

Example Request

To use the API, send a GET request to /api with the url query parameter set to the desired webpage.

Example:

curl "http://localhost:3009/api?url=https://example.com/article"
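
If the target address has its own query string, it needs to be percent-encoded. Otherwise its "&" and "=" characters are read as extra parameters of the /api request rather than as part of url. A small sketch of building the request URL safely with the built-in URL class follows. The helper name buildApiUrl and the sample address are made up for illustration.

```javascript
// Hypothetical helper: build the /api request URL for a given page address.
function buildApiUrl(baseUrl, targetUrl) {
    const apiUrl = new URL('/api', baseUrl);
    // searchParams.set percent-encodes the value, so characters such as
    // "?", "&", and "=" in targetUrl stay inside the single url parameter.
    apiUrl.searchParams.set('url', targetUrl);
    return apiUrl.toString();
}

console.log(buildApiUrl('http://localhost:3009', 'https://example.com/article?id=1&lang=en'));
// prints http://localhost:3009/api?url=https%3A%2F%2Fexample.com%2Farticle%3Fid%3D1%26lang%3Den
```

With curl, the --get and --data-urlencode options do the same job.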

Expected Response

The API will respond with a JSON object containing the parsed content. Here's a simplified example:

{
  "title": "Example Article",
  "byline": "Author Name",
  "content": "<p>This is the main content of the article...</p>",
  "textContent": "This is the main content of the article...",
  "length": 1234
}

  • title: The title of the article.
  • byline: The author's name, when the page provides one.
  • content: The HTML content extracted from the page.
  • textContent: Plain text version of the content.
  • length: The length of the article, in characters.
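
One case worth knowing about: Readability's parse() returns null when it cannot find article content, and it does so without throwing, so the catch block never runs. Assigning null to ctx.body makes Koa send an empty response (204 No Content, as far as I know) rather than JSON. The sketch below shows one way to handle that case and to return only the fields listed above. The helper name toApiResponse, the 422 status, and the error message are assumptions, not the original behavior.

```javascript
// Hypothetical helper: turn the value returned by reader.parse() into a
// status code and body for the handler to send.
function toApiResponse(article) {
    // Assumption: report "no article found" as 422; the original code has no such branch.
    if (article === null || article === undefined) {
        return { status: 422, body: { error: 'No readable content found' } };
    }
    // Keep only the fields described in this post; parse() may return others.
    const { title, byline, content, textContent, length } = article;
    return { status: 200, body: { title, byline, content, textContent, length } };
}

// Possible use inside the /api handler:
//   const result = toApiResponse(reader.parse());
//   ctx.status = result.status;
//   ctx.body = result.body;
```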

Handling Errors

Robust error handling is essential for any API. In our implementation, if the server encounters any issues while fetching or parsing the content, it responds with a 500 status code and an error message.

Error Response Example:

{
  "error": "Failed to fetch and parse content"
}

Common scenarios that might trigger an error include:

  • Missing or Invalid URL: The url parameter is absent or malformed, or it points to a host that does not exist.
  • Network Issues: Problems with network connectivity or the target server being down.
  • Parsing Failures: Issues with parsing the HTML content, possibly due to unexpected structures.
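
Because every failure currently maps to the same 500 response, clients cannot tell these scenarios apart. A rough sketch of distinguishing them is shown below. It relies on Axios attaching error.response when the target replied with a non-2xx status and error.request when no reply arrived. The helper name classifyFetchError and the 502/504 choices are assumptions, not part of the original code.

```javascript
// Hypothetical helper: map an error caught in the /api handler to a status
// code and message. The 502 and 504 choices are assumptions.
function classifyFetchError(error) {
    if (error && error.response) {
        // The target server answered, but with a status outside the 2xx range.
        return { status: 502, message: `Target responded with status ${error.response.status}` };
    }
    if (error && error.request) {
        // The request was sent but no response arrived (DNS failure, timeout, refused connection).
        return { status: 504, message: 'No response from target' };
    }
    // Anything else, including parsing problems, keeps the original behavior.
    return { status: 500, message: 'Failed to fetch and parse content' };
}

// Possible use in the catch block:
//   const { status, message } = classifyFetchError(error);
//   ctx.status = status;
//   ctx.body = { error: message };
```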

Conclusion

In this tutorial, we built a simple yet effective API using Koa that can fetch and parse web content from any given URL. By combining the power of Axios for HTTP requests, JSDOM for HTML parsing, and Mozilla's Readability for content extraction, we created a tool that can be the backbone of various applications like content aggregators, readability-enhanced browsers, or data extraction services.

Further Reading

Feel free to experiment with the code, extend its functionalities, and integrate it into your projects. Happy coding!