A First Look at Azure AI Document Intelligence
Azure AI Document Intelligence (formerly known as Azure Cognitive Services Form Recognizer) is a cloud-based service that uses machine learning to extract text, key-value pairs, tables and other structured data from documents. It works with documents in a range of file formats, and even with images of documents.
Following the Microsoft Azure documentation and the sample code project online is an easy way to get a feel for this technology. For our research purposes, it was easy to try out using a free Azure subscription and the free pricing tier on the Document Intelligence resource.
This blog gives a brief first look at the technology.
Cloud-Based Service
Firstly, Azure AI Document Intelligence lives in the cloud as one of the services provided by Microsoft Azure. This makes it convenient to implement and manage for a developer already working with Microsoft .NET and the Azure platform. Accessing Document Intelligence from code is as simple as setting up a Document Intelligence (Form Recognizer) resource in Microsoft Azure and providing its generated endpoint and key in the code. For other languages and platforms calling the Azure resource, the process is much the same.
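To illustrate the "endpoint and key" idea outside .NET, here is a minimal Python sketch (standard library only) that loads both values and produces the authentication header for direct REST calls. The environment variable names DOCUMENT_INTELLIGENCE_ENDPOINT and DOCUMENT_INTELLIGENCE_KEY and the DocIntelConfig class are hypothetical choices for this example; Ocp-Apim-Subscription-Key is the header Azure AI services use for key-based authentication.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DocIntelConfig:
    """Connection details for a Document Intelligence resource (hypothetical helper)."""
    endpoint: str
    key: str

    def auth_headers(self) -> dict:
        # Key-based authentication: Azure AI services read the key from this header.
        return {"Ocp-Apim-Subscription-Key": self.key}


def load_config() -> DocIntelConfig:
    # Hypothetical variable names; use whatever your deployment defines.
    endpoint = os.environ.get("DOCUMENT_INTELLIGENCE_ENDPOINT", "").strip()
    key = os.environ.get("DOCUMENT_INTELLIGENCE_KEY", "").strip()
    if not endpoint or not key:
        raise RuntimeError(
            "Set DOCUMENT_INTELLIGENCE_ENDPOINT and DOCUMENT_INTELLIGENCE_KEY"
        )
    # Drop any trailing slash so request paths can be appended safely.
    return DocIntelConfig(endpoint=endpoint.rstrip("/"), key=key)
```

Keeping the key in the environment (or a secret store) rather than in source code is the main design point; the endpoint and key themselves come from the resource's "Keys and Endpoint" page in the Azure portal.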
Models for Data Extraction
Azure AI Document Intelligence offers two basic models when it comes to extracting data from documents:
Read Model
Extracts printed and handwritten text from documents. It can process PDFs, images (JPEG, PNG, BMP, TIFF and HEIF formats), Microsoft Word, Excel and PowerPoint files, and HTML.
Layout Model
Identifies and extracts both geometric structures (text, tables) and logical roles (titles, headings, labels), giving a better semantic understanding of documents.
Prebuilt and Custom Models
In addition to these fundamental models, there are prebuilt models for extracting data from standardised or common documents, for example invoices, bank statements, US tax and mortgage documents, and identity cards. While these may be too niche to suit a given project out of the box, there is also functionality to create Custom Models, which can be trained to extract specific data from particular documents and forms.
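Whichever model is used, read, layout, prebuilt or custom, the request has the same shape; only the model ID changes. The Python sketch below assumes the v4.0 REST route `{endpoint}/documentintelligence/documentModels/{modelId}:analyze` and API version 2024-11-30. The IDs prebuilt-read, prebuilt-layout, prebuilt-invoice and prebuilt-idDocument are prebuilt model IDs; a custom model is called by whatever ID it was given at training time (my-custom-form in the usage note is hypothetical).

```python
from urllib.parse import quote, urlencode

API_VERSION = "2024-11-30"  # assumed v4.0 GA REST API version

# Prebuilt model IDs; a custom model uses the ID assigned when it was trained.
PREBUILT_MODELS = {
    "read": "prebuilt-read",          # printed and handwritten text
    "layout": "prebuilt-layout",      # text, tables, structure and roles
    "invoice": "prebuilt-invoice",
    "id_document": "prebuilt-idDocument",
}


def analyze_url(endpoint: str, model_id: str, api_version: str = API_VERSION) -> str:
    """Build the analyze URL for a prebuilt or custom model ID."""
    # quote() guards against unexpected characters in custom model IDs.
    path = f"/documentintelligence/documentModels/{quote(model_id, safe='')}:analyze"
    return endpoint.rstrip("/") + path + "?" + urlencode({"api-version": api_version})
```

For example, `analyze_url(endpoint, PREBUILT_MODELS["layout"])` targets the layout model, while `analyze_url(endpoint, "my-custom-form")` would target a trained custom model with that hypothetical ID.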
Additional Capabilities
In addition to data extraction, Azure AI Document Intelligence also has the following add-on capabilities:
- High Resolution Extraction: Extracts small text from large documents (A1, A2, A3) with mixed fonts, orientations and graphical elements, enabling accurate content extraction using the ocr.highResolution capability.
- Formula Extraction: Extracts formulas such as mathematical equations, providing their LaTeX representation and polygon coordinates. Uses ocr.formula to identify and structure formulas.
- Font Property Extraction: Extracts font properties such as font family, style (italic/normal), weight (bold/normal), and text and background colour. The ocr.font capability extends the document's style analysis.
- Barcode Property Extraction: Extracts various barcode types (e.g., QR Code, Code 39, PDF417) and their content. Using ocr.barcode, barcodes are identified and stored in a structured format with polygon coordinates.
- Language Detection: Detects the primary language of each text line along with confidence scores, stored in a languages collection under the analysis result.
- Searchable PDF: Converts scanned image PDFs into searchable PDFs by overlaying the extracted text, making the full PDF content searchable.
- Key-Value Pairs: Extracts pairs of identifiable keys (e.g., labels or fields) and their corresponding values, enabling structured data extraction from forms and unstructured documents.
- Query Fields: Extends schema extraction from prebuilt and custom models by allowing specific field names to be queried. Supports extracting up to 20 fields per request, enhancing targeted data extraction.
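When calling the REST API directly, these add-ons are switched on per request through a query parameter. This Python sketch assumes the v4.0 feature names (ocrHighResolution, formulas, styleFont, barcodes, languages, keyValuePairs, queryFields), a comma-separated `features` parameter and a comma-separated `queryFields` parameter for the requested field names; the 20-field cap mirrors the limit described above. Searchable PDF output is not modelled here.

```python
from urllib.parse import urlencode

# Assumed v4.0 REST names for the add-on capabilities listed above.
FEATURES = {
    "high_resolution": "ocrHighResolution",
    "formulas": "formulas",
    "font": "styleFont",
    "barcodes": "barcodes",
    "languages": "languages",
    "key_value_pairs": "keyValuePairs",
    "query_fields": "queryFields",
}
MAX_QUERY_FIELDS = 20


def feature_params(features, query_fields=()):
    """Return the query-string fragment that enables the given add-on features."""
    names = [FEATURES[f] for f in features]
    fields = list(query_fields)
    if len(fields) > MAX_QUERY_FIELDS:
        raise ValueError(f"at most {MAX_QUERY_FIELDS} query fields per request")
    if fields and "queryFields" not in names:
        names.append("queryFields")  # asking for fields implies the queryFields feature
    params = {}
    if names:
        params["features"] = ",".join(names)
    if fields:
        params["queryFields"] = ",".join(fields)
    # safe="," keeps the comma-separated lists readable in the URL.
    return urlencode(params, safe=",")
```

The fragment is appended to an analyze URL with `&`, so one request can combine, say, key-value pairs with high-resolution OCR.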
Example Code for Key-Value Pair Extraction
Here is a look at some example code that connects to the endpoint and makes a request to process a document.
Microsoft’s example code is available here: https://github.com/Azure-Samples/document-intelligence-code-samples/tree/main/.NET(v4.0)
In this example, we extract key-value pairs, which is a very useful function when analysing the content of documents, for example when scanning documents to digitise their data.
Excluding some boilerplate code that points at a default document to process, this is the code that runs the operation. The dependency used is the Azure.AI.DocumentIntelligence package from NuGet.
Performing the data extraction is simply a matter of creating a DocumentIntelligenceClient and providing it with the endpoint and a key for the Document Intelligence resource set up on the Azure platform.
Next, the document content is supplied, either by reading a local file as binary data or by providing a URL to the document.
We want to use the KeyValuePairs feature, so it is specified in the request.
We then await a call to the client's AnalyzeDocumentAsync method, passing it the content and features above.
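For readers not using .NET, the steps above (endpoint and key, content from a file or a URL, the key-value pairs feature, the analyze call) map onto a single REST request. The Python sketch below builds, but does not send, that request. It is not Microsoft's sample: it assumes the v4.0 route and API version 2024-11-30, the urlSource/base64Source body fields, and prebuilt-layout as the model; build_analyze_request is a hypothetical helper name.

```python
import base64
import json
from pathlib import Path
from urllib.parse import quote, urlencode
from urllib.request import Request

API_VERSION = "2024-11-30"  # assumed v4.0 GA REST API version


def build_analyze_request(endpoint, key, *, model_id="prebuilt-layout",
                          url=None, file_path=None, features=("keyValuePairs",)):
    """Build (but do not send) an analyze request for a document URL or local file."""
    if (url is None) == (file_path is None):
        raise ValueError("provide exactly one of url or file_path")
    if url is not None:
        # The service fetches the document from this URL itself.
        body = {"urlSource": url}
    else:
        # Local files are sent inline as base64-encoded bytes.
        data = Path(file_path).read_bytes()
        body = {"base64Source": base64.b64encode(data).decode("ascii")}
    query = {"api-version": API_VERSION}
    if features:
        query["features"] = ",".join(features)
    full_url = (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
                f"{quote(model_id, safe='')}:analyze?{urlencode(query, safe=',')}")
    return Request(
        full_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit it; the analysis then runs asynchronously, with the response's Operation-Location header pointing at where the result will appear.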
Once the result is returned, this example simply prints out any key-value pairs it contains.
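At the REST level, getting the result means polling: the analyze POST is accepted asynchronously, and the caller repeatedly fetches the Operation-Location URL until its status is succeeded. This Python sketch shows that loop and then walks the keyValuePairs in analyzeResult, assuming each pair carries key.content, an optional value.content and a confidence, as in the v4.0 result schema. The get_json parameter is a hypothetical hook for whatever authenticated HTTP GET you use, which also makes the logic easy to test without a network.

```python
import time


def wait_for_result(get_json, operation_url, poll_seconds=1.0, timeout_seconds=120.0):
    """Poll the Operation-Location URL until the analysis finishes.

    get_json: any callable that performs an authenticated GET and returns the
    decoded JSON body (e.g. a thin wrapper around urllib.request.urlopen).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = get_json(operation_url)
        status = body.get("status")
        if status == "succeeded":
            return body["analyzeResult"]
        if status == "failed":
            raise RuntimeError(f"analysis failed: {body.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("analysis did not finish in time")
        time.sleep(poll_seconds)  # still queued or running; wait and retry


def key_value_pairs(analyze_result):
    """Yield (key, value, confidence); value is None when no value was detected."""
    for pair in analyze_result.get("keyValuePairs", []):
        key = pair["key"]["content"]
        value = (pair.get("value") or {}).get("content")
        yield key, value, pair.get("confidence")


def print_key_value_pairs(analyze_result):
    # Mirrors the sample's final step: print each detected pair.
    for key, value, confidence in key_value_pairs(analyze_result):
        shown = value if value is not None else "<no value>"
        print(f"{key} {shown} (confidence {confidence})")
```

Separating polling from result handling keeps the extraction logic independent of the HTTP client; with the .NET SDK, the client handles this waiting for you.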
The example code (from the linked project above) produced the following output for a sample document:
Conclusion
Azure AI Document Intelligence is a powerful tool for automating document processing and data extraction, offering flexibility with prebuilt and custom models. Its ability to handle diverse document types, along with features like high-resolution extraction, formula recognition, and barcode detection, makes it highly adaptable for various industries.
This technology presents an excellent opportunity to streamline document-heavy tasks, reduce manual errors and improve overall efficiency. By leveraging Azure AI Document Intelligence, Dataworks can enhance its solutions and deliver faster, more accurate document processing capabilities to clients. To find out how Dataworks can help your team leverage digital technology, contact us today.
Get In Touch
To stay up to date with Dataworks Limited news and events, connect with us via the links below or call us on 051 878 555.