Convert a form image to an HTML form using Amazon Textract and NodeJS

In this article, we will learn how to convert an image containing a simple form into an HTML form using Amazon Textract and NodeJS. Amazon Textract is a service that automatically extracts text and data from scanned documents. It is an AI-powered optical character recognition (OCR) service that makes it simple to pull content out of an image accurately.

One of Textract’s strengths is its ability to identify a form in an image and extract the data together with its relationships. That is, for a basic form with four fields (Position, First Name, Last Name and Address), Textract provides an API that enables us to output the results as follows:

{
	"Position": "Software Developer",
	"First Name": "Muhi",
	"Last Name": "Masri",
	"Address": "Planet Earth"
}

Pretty cool, right? But getting the desired JSON object is not as simple as calling one function from Textract’s API, so in this article we will look at a simplified solution to achieve this.

This article assumes that you have an AWS account with S3 storage, an access key and a secret key. If not, you can still continue reading, but I highly recommend creating an AWS account and getting a bit familiar with how it works.

Create a simple NodeJS app

We are going to use the Express application generator. It automatically creates a project with HTML views (using Pug) and a routing system. This way, we can easily add an upload function and post the result in a different view.

mkdir aws-textract-app
cd aws-textract-app
npx express-generator --view=pug
npm install

We simply created a folder, ran express-generator to scaffold the project and installed all the npm dependencies. Here is what your project structure should look like:

Upload an image to your S3 storage

For simplicity’s sake, we will use the image from the beginning of the article, which contains four inputs (Position, First Name, Last Name and Address).

First, let’s add all the required elements to upload a file in the index.pug view:

extends layout

block content
  h1= title
  p Welcome to #{title}
  form(action="fileupload", method="post", enctype="multipart/form-data") 
    input(type="file", name="filetoupload")
    input(type="submit", value="Upload File")

Then let’s create a new fileupload.pug view in the views folder to display the results. We will also add a simple form that we will bind the extracted data to later on.

extends layout

block content
  h1= title
  div 
    span Position:
    input(type="text", name="position", value=`${formData['Position']}`)
  div 
    span First Name:
    input(type="text", name="firstName", value=`${formData['First Name']}`)
  div 
    span Last Name:
    input(type="text", name="lastName", value=`${formData['Last Name']}`)
  div 
    span Address:
    input(type="text", name="address", value=`${formData['Address']}`)

Now that we have the HTML part done, let’s go ahead and start writing the logic for uploading the file.

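Let’s start by installing the dependencies needed to receive the uploaded file and send it to S3. The fs module is built into Node, so it does not need to be installed. The route code below uses the formidable v1 property names (path, name and type), so we pin that major version.

npm i formidable@1
npm i aws-sdk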

In the routes folder, you should find an index.js file that already has the following code:

var express = require('express');
var router = express.Router();

/* GET home page. */
router.get('/', function(req, res, next) {
  res.render('index', { title: 'Express' });
});

module.exports = router;

In the same file, we will require the dependencies we just installed and add a new POST route that handles the fileupload action from the index.pug view.

var express = require('express');
var router = express.Router();
const formidable = require('formidable')
const AWS = require('aws-sdk')
const fs = require('fs')

/* GET home page. */
router.get('/', function(req, res, next) {
  res.render('index', { title: 'Express' });
});

router.post('/fileupload', (req, res, next) => {
  // Upload logic
})

module.exports = router;

And then below we will add the implementation for the upload logic:

router.post('/fileupload', (req, res, next) => {
  // Upload logic
  const form = new formidable.IncomingForm()
  form.parse(req, async (err, fields, files) => {
    if (err) {
      console.error(err)
      return next(err) // stop here if the form could not be parsed
    }
    try {
      const fileContent = fs.readFileSync(files.filetoupload.path)
      const s3Params = {
        Bucket: process.env.AWS_BUCKET,
        Key: `${Date.now().toString()}-${files.filetoupload.name}`,
        Body: fileContent,
        ContentType: files.filetoupload.type,
        ACL: 'public-read'
      }
      const s3Content = await s3Upload(s3Params)
      // Textract code will be added here
    } catch (uploadErr) {
      console.error(uploadErr)
      next(uploadErr)
    }
  })
})

async function s3Upload (params) {
  const s3 = new AWS.S3({
    accessKeyId: process.env.AWS_ACCESS_KEY,
    secretAccessKey: process.env.AWS_SECRET_KEY
  })
  return new Promise((resolve, reject) => {
    s3.upload(params, (err, data) => {
      if (err) {
        console.error(err)
        reject(err) // surface the failure instead of returning it as data
      } else {
        resolve(data)
      }
    })
  })
}

Quick summary of what we just did:

Parsed the form using formidable.
Read the content of the file and assigned it as a value to the Body property (along with other required properties) in the AWS upload parameters.
Created an async s3Upload function whose result we store in the s3Content variable. The Key in this result will be passed to the Textract reader in the next step.

At this point, you should be able to run the Node app and upload an image directly to your S3 bucket. Awesome job!
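The code above reads its configuration from process.env, but the article does not show how those variables are set. As a minimal sketch, assuming you export them in your shell before starting the app, a small check like the following can report what is missing at startup. The missingEnv helper is hypothetical and not part of the project; the variable names are the ones used in this article's code.

```javascript
// Names taken from the process.env lookups used in this article's code.
// AWS_REGION is only needed by the Textract step that comes next.
const REQUIRED_ENV = ['AWS_ACCESS_KEY', 'AWS_SECRET_KEY', 'AWS_BUCKET', 'AWS_REGION']

// Hypothetical helper: returns the names of required variables
// that are unset or empty in the given environment object.
function missingEnv (env = process.env) {
  return REQUIRED_ENV.filter(name => !env[name])
}

const missing = missingEnv()
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`)
}
```

Warning instead of throwing keeps the sketch harmless to paste in; you may prefer to fail fast in a real app.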

Analyse a document directly from an S3 bucket using the Textract API

Let’s create a function called documentExtract that takes the S3 object key as a parameter and then returns all the data extracted from the image.

async function documentExtract (key) {
  return new Promise((resolve, reject) => {
    const textract = new AWS.Textract({
      region: process.env.AWS_REGION,
      endpoint: `https://textract.${process.env.AWS_REGION}.amazonaws.com/`,
      accessKeyId: process.env.AWS_ACCESS_KEY,
      secretAccessKey: process.env.AWS_SECRET_KEY
    })
    const params = {
      Document: {
        S3Object: {
          Bucket: process.env.AWS_BUCKET,
          Name: key
        }
      },
      // FORMS asks Textract to return key-value sets
      FeatureTypes: ['FORMS']
    }

    textract.analyzeDocument(params, (err, data) => {
      if (err) {
        reject(err) // surface the failure instead of returning it as data
      } else {
        resolve(data)
      }
    })
  })
}

Similar to the S3 upload process we did earlier, the Textract API will require information about your region, access key, bucket name…

You will notice that we have a property called FeatureTypes with the value FORMS. This is very important, as it lets Textract do its magic and return key-value sets that help us associate input fields with the proper labels (i.e. “Software Developer” belongs to “Position”, “Planet Earth” belongs to “Address” and so on).

Now let’s insert this function right after the S3 upload process and log the results. Our index.js code should look like this so far:

const express = require('express')
const router = express.Router()
const formidable = require('formidable')
const AWS = require('aws-sdk')
const fs = require('fs')

/* GET home page. */
router.get('/', (req, res, next) => {
  res.render('index', { title: 'Textract Uploader' })
})

router.post('/fileupload', (req, res, next) => {
  // Upload logic
  const form = new formidable.IncomingForm()
  form.parse(req, async (err, fields, files) => {
    if (err) {
      console.error(err)
      return next(err) // stop here if the form could not be parsed
    }
    try {
      const fileContent = fs.readFileSync(files.filetoupload.path)
      const s3Params = {
        Bucket: process.env.AWS_BUCKET,
        Key: `${Date.now().toString()}-${files.filetoupload.name}`,
        Body: fileContent,
        ContentType: files.filetoupload.type,
        ACL: 'public-read'
      }
      const s3Content = await s3Upload(s3Params)
      const textractData = await documentExtract(s3Content.Key)
      console.log(textractData)
    } catch (uploadErr) {
      console.error(uploadErr)
      next(uploadErr)
    }
  })
})

async function s3Upload (params) {
  const s3 = new AWS.S3({
    accessKeyId: process.env.AWS_ACCESS_KEY,
    secretAccessKey: process.env.AWS_SECRET_KEY
  })
  return new Promise((resolve, reject) => {
    s3.upload(params, (err, data) => {
      if (err) {
        console.error(err)
        reject(err) // surface the failure instead of returning it as data
      } else {
        resolve(data)
      }
    })
  })
}

async function documentExtract (key) {
  return new Promise((resolve, reject) => {
    const textract = new AWS.Textract({
      region: process.env.AWS_REGION,
      endpoint: `https://textract.${process.env.AWS_REGION}.amazonaws.com/`,
      accessKeyId: process.env.AWS_ACCESS_KEY,
      secretAccessKey: process.env.AWS_SECRET_KEY
    })
    const params = {
      Document: {
        S3Object: {
          Bucket: process.env.AWS_BUCKET,
          Name: key
        }
      },
      // FORMS asks Textract to return key-value sets
      FeatureTypes: ['FORMS']
    }

    textract.analyzeDocument(params, (err, data) => {
      if (err) {
        reject(err) // surface the failure instead of returning it as data
      } else {
        resolve(data)
      }
    })
  })
}

module.exports = router

When running the code, the console will print a response whose Blocks array is a list of JSON objects, where each object represents a block with a unique Id, a list of relationships and other related properties. Let’s take this one as an example:

{
	"BlockType": "KEY_VALUE_SET",
	"Confidence": 80.23428344726562,
	"Geometry": {
		"BoundingBox": {
			"Width": 0.0715109333395958,
			"Height": 0.043582554906606674,
			"Left": 0.018339848145842552,
			"Top": 0.4098675847053528
		},
		"Polygon": [{
			"X": 0.018339848145842552,
			"Y": 0.4098675847053528
		}, {
			"X": 0.0898507833480835,
			"Y": 0.4098675847053528
		}, {
			"X": 0.0898507833480835,
			"Y": 0.45345014333724976
		}, {
			"X": 0.018339848145842552,
			"Y": 0.45345014333724976
		}]
	},
	"Id": "c3d7521b-0371-4ca1-9607-0864f2edcfdd",
	"Relationships": [{
		"Type": "VALUE",
		"Ids": ["e294a18d-8db3-4369-bec4-e15b882e6563"]
	}, {
		"Type": "CHILD",
		"Ids": ["9eaa014f-03ce-4722-9665-3bd94aea60ec", "8471eed1-9caf-45df-bd6a-e639f6caa9d4"]
	}],
	"EntityTypes": ["KEY"]
}

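Whenever the BlockType is KEY_VALUE_SET, the block has a Relationships property that connects associated blocks together. The block above is a KEY (see EntityTypes): its VALUE relationship points to the block that holds the value, and its CHILD relationship points to the words that make up the label. For a block like this one, we can work out the relationship between a label such as “Position” and its value “Software Developer” by looking up the Ids in the Relationships list. To understand how it works in more detail, you can check out Amazon’s Developer Guide.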
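To make the lookup concrete, here is a minimal sketch of how the Ids could be followed by hand. It assumes the Blocks array also contains WORD blocks with a Text property, and that VALUE blocks reference their words through a CHILD relationship. The helper names (childText, extractKeyValues) and the shortened sample Ids are made up for illustration, and selection elements such as checkboxes are ignored.

```javascript
// Join the Text of all WORD blocks referenced by a block's CHILD relationship.
function childText (block, byId) {
  const rel = (block.Relationships || []).find(r => r.Type === 'CHILD')
  if (!rel) return ''
  return rel.Ids
    .map(id => byId.get(id))
    .filter(child => child && child.BlockType === 'WORD')
    .map(child => child.Text)
    .join(' ')
}

// Hypothetical helper: builds { keyText: valueText } from a Blocks array.
function extractKeyValues (blocks) {
  const byId = new Map(blocks.map(b => [b.Id, b]))
  const result = {}
  for (const block of blocks) {
    const isKey = block.BlockType === 'KEY_VALUE_SET' &&
      (block.EntityTypes || []).includes('KEY')
    if (!isKey) continue
    // A KEY block points at its VALUE block(s) through the VALUE relationship.
    const valueRel = (block.Relationships || []).find(r => r.Type === 'VALUE')
    const valueText = (valueRel ? valueRel.Ids : [])
      .map(id => byId.get(id))
      .filter(Boolean)
      .map(valueBlock => childText(valueBlock, byId))
      .join(' ')
    result[childText(block, byId)] = valueText
  }
  return result
}

// Trimmed-down sample with made-up short Ids (real Ids are UUIDs).
const sampleBlocks = [
  {
    Id: 'k1',
    BlockType: 'KEY_VALUE_SET',
    EntityTypes: ['KEY'],
    Relationships: [{ Type: 'VALUE', Ids: ['v1'] }, { Type: 'CHILD', Ids: ['w1'] }]
  },
  {
    Id: 'v1',
    BlockType: 'KEY_VALUE_SET',
    EntityTypes: ['VALUE'],
    Relationships: [{ Type: 'CHILD', Ids: ['w2', 'w3'] }]
  },
  { Id: 'w1', BlockType: 'WORD', Text: 'Position:' },
  { Id: 'w2', BlockType: 'WORD', Text: 'Software' },
  { Id: 'w3', BlockType: 'WORD', Text: 'Developer' }
]

console.log(extractKeyValues(sampleBlocks)) // { 'Position:': 'Software Developer' }
```

Note that the key keeps its trailing colon here, because the label is returned exactly as the words appear in the image.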

Use AWS Textract Helper to extract the form data as a JSON object

To simplify the process of finding all relationships and save you from writing several functions, I created an AWS Textract Helper module to do the job in just one hit.

Let’s install the module and include it in our index.js file:

npm i aws-textract-helper

const textractHelper = require('aws-textract-helper')

There are multiple functions available in the API, but for now you only need createForm. It takes two parameters: the data that we got back from the documentExtract function and an optional config, which allows you to trim unwanted characters from the form keys, such as a colon or an extra space.

const textractData = await documentExtract(s3Content.Key)
const formData = textractHelper.createForm(textractData, { trimChars: [':', ' '] })
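To illustrate the effect trimming is meant to have on a key, here is a small sketch. It is not the helper module's actual implementation; trimKey is a hypothetical function written under the assumption that the listed characters are stripped from both ends of each key.

```javascript
// Hypothetical helper: strip any of `chars` from both ends of `key`.
function trimKey (key, chars) {
  let start = 0
  let end = key.length
  while (start < end && chars.includes(key[start])) start++
  while (end > start && chars.includes(key[end - 1])) end--
  return key.slice(start, end)
}

console.log(trimKey('First Name: ', [':', ' '])) // prints "First Name"
```

With keys cleaned up this way, a label read from the image as "First Name:" can be looked up as formData['First Name'], which is the form the fileupload view expects.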

Display the results in an HTML form

Last but not least, let’s render the fileupload view with the data generated from the textract helper function. Our final code for the upload function should look like this:

router.post('/fileupload', (req, res, next) => {
  // Upload logic
  const form = new formidable.IncomingForm()
  form.parse(req, async (err, fields, files) => {
    if (err) {
      console.error(err)
      return next(err) // stop here if the form could not be parsed
    }
    try {
      const fileContent = fs.readFileSync(files.filetoupload.path)
      const s3Params = {
        Bucket: process.env.AWS_BUCKET,
        Key: `${Date.now().toString()}-${files.filetoupload.name}`,
        Body: fileContent,
        ContentType: files.filetoupload.type,
        ACL: 'public-read'
      }
      const s3Content = await s3Upload(s3Params)
      const textractData = await documentExtract(s3Content.Key)

      const formData = textractHelper.createForm(textractData, { trimChars: [':', ' '] })
      res.render('fileupload', { title: 'Upload Results', formData })
    } catch (uploadErr) {
      console.error(uploadErr)
      next(uploadErr)
    }
  })
})
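One thing to watch for: the fileupload view reads each field with a template literal, so a label that Textract did not detect would show up in the input as the text "undefined". As a minimal sketch, assuming the four labels used in this article, a hypothetical withDefaults helper could fill in empty strings before rendering.

```javascript
// Field labels the fileupload view reads from formData.
const EXPECTED_FIELDS = ['Position', 'First Name', 'Last Name', 'Address']

// Hypothetical helper: returns a copy of formData in which every
// expected field is present, using '' for anything that is missing.
function withDefaults (formData, fields = EXPECTED_FIELDS) {
  const safe = { ...formData }
  for (const field of fields) {
    if (safe[field] === undefined || safe[field] === null) safe[field] = ''
  }
  return safe
}

console.log(withDefaults({ Position: 'Software Developer' }))
```

In the route, this would mean passing formData: withDefaults(formData) to res.render instead of the raw object.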

Now when we run the code and upload the image, the form on the results page should be filled in with the extracted values. The full example is also publicly available in this repository.

Conclusion

Amazon Textract is still a relatively new technology with a lot to discover and learn, but it is definitely worth looking at when you want to instantly and accurately analyse unstructured text from your customers’ captured data, such as invoices or receipts.

Bye for now 👋

If you enjoyed this post, I regularly share similar content on Twitter. Follow me @muhimasri to stay up to date, or feel free to message me on Twitter if you have any questions or comments. I'm always open to discussing the topics I write about!