PDF Automation by Using UiPath Studio

Portable Document Format (PDF) is a file format for capturing and sending electronic documents in exactly the intended format. PDF files play a large role in everyday tasks and make up a significant part of the processes in every industry.

Creating or reading such files is a crucial part of PDF workflows across different market verticals, and it can be easily automated by using UiPath Studio.

There are two types of PDFs:

  • Generated PDF: A generated PDF is one whose content you can copy and paste directly into other files.
  • Scanned PDF: A scanned PDF consists mostly of images, so you cannot copy and paste its content into other files.

Example.1 :

Let us see how to extract the content of a PDF into a message box through automation, step by step.

First, launch UiPath Studio on your system and create a new process called PDF Automation.

Once the process has opened in UiPath Studio, you need to install the PDF package. Go to Manage Packages, select Official, then select UiPath.PDF.Activities and install it.

Once you have installed the PDF activities package, click on the Save button.

Next, download a sample PDF to your system and save it in any folder you prefer. I am saving the sample PDF in the Downloads folder under This PC.

Go to UiPath Studio and drag and drop a Sequence into the Designer pane. Add the Read PDF Text activity to the sequence.

Click on the three horizontal dots (...) in the Read PDF Text activity and add the path of the downloaded PDF file in double quotes.

The Read PDF Text activity has the following properties in the Properties pane:

  • Filename : The file path of the sample PDF file, which has been filled in automatically here.
  • Password : Some PDFs are protected with a password. If your sample PDF has a password, enter it here; otherwise, leave this field empty.
  • Range : The range refers to the page numbers. If you are extracting content from a particular page of the PDF file, enter the page number here. If you are extracting all the content of the PDF file, enter "All" in double quotes.
  • Output text : Create a variable in the output text field by pressing Ctrl+K and name the variable. I have created a variable called pdf2text.
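
To make the Range property more concrete, here is a minimal Python sketch, written outside UiPath, of how a range string could be expanded into page numbers. The function name pages_in_range is hypothetical, and the handling of comma-separated parts and dash ranges such as "2-4, 7" is an assumption made for illustration; only a single page number and "All" are described above.

```python
def pages_in_range(range_spec: str, page_count: int) -> list[int]:
    """Expand a range string such as "All", "3" or "2-4, 7" into 1-based page numbers.

    Assumption: comma-separated parts and dash ranges are accepted.
    """
    spec = range_spec.strip()
    if spec.lower() == "all":
        # "All" selects every page of the document.
        return list(range(1, page_count + 1))

    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A part such as "2-4" selects an inclusive run of pages.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # A part such as "7" selects a single page.
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"range {part!r} is outside 1..{page_count}")
        pages.extend(range(start, end + 1))
    return pages


print(pages_in_range("All", 3))      # [1, 2, 3]
print(pages_in_range("2-4, 7", 10))  # [2, 3, 4, 7]
```

A page number beyond the end of the document raises a ValueError in this sketch; how UiPath itself reacts to an out-of-range page is not covered here.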


Next, add a Message Box activity inside the sequence to display the text from the PDF file, and enter the variable name (pdf2text) in the message box.

Now save the sequence and run it. When the sequence executes, the text from the sample PDF file is displayed in the message box.

You can compare the two by opening the PDF and the message box at the same time.

This is how to extract the text from a PDF into a message box.

Example.2 :

The following example demonstrates how to write text into a file.

Go back to the same sequence, delete the Message Box, and add the Write Text File activity inside the sequence.

Click on the three horizontal dots (...) and select the location where you want to save the file. I am going to save it in the folder where the process is saved, under the name sample.txt. Add the file path in double quotes.

Then enter the variable name pdf2text in the Text box. Do not put it in double quotes: pdf2text is a variable, and quoting it would write the literal word "pdf2text" to the file instead of the PDF content.

We have created a sequence containing a Read PDF Text activity, which reads the content of the selected PDF file. The text from the PDF is stored in a variable called pdf2text, and we are now creating a file called sample.txt with the content of pdf2text.
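
The same read-then-write flow can be sketched outside UiPath. The following Python snippet is an illustration only: the helper write_text_file is hypothetical, the string assigned to pdf2text is a stand-in for the text that Read PDF Text would return, and a temporary folder is used so that the example leaves no files behind.

```python
import tempfile
from pathlib import Path


def write_text_file(text: str, file_path: str) -> Path:
    """Write text to file_path, creating or overwriting the file, and return its path."""
    target = Path(file_path)
    target.write_text(text, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as folder:
    # Stand-in for the output variable of Read PDF Text (assumption).
    pdf2text = "Sample text standing in for the content read from the PDF."
    saved = write_text_file(pdf2text, str(Path(folder) / "sample.txt"))
    # Reading the file back shows that it holds the content of the variable.
    print(saved.read_text(encoding="utf-8"))
```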

Save the sequence and run it. Once the sequence has executed, go to the location of your sample.txt file and refresh the folder; you can see that sample.txt has been created. Open sample.txt and you can see that it contains the content of the sample PDF file.

This is how to extract a PDF to a txt file. In the same way, you can also extract the PDF into a document by creating a sample.doc file.

Now save and run the sequence. Once the sequence has executed, go to the location of the sample.doc file; you can see that the doc file has been created. Open sample.doc and you can see that it contains the content of the PDF file.

Scanned PDF Text Extraction by using UiPath Studio

In the case of scanned documents, data extraction can be achieved by using the OCR-based activities Read PDF With OCR and Read XPS With OCR.

You can select one of three OCR engines available in UiPath: Google OCR, Microsoft OCR, and Abbyy OCR. You can select Microsoft OCR, as it is free and available in the UiPath Community edition.

Follow the steps below to extract a scanned PDF to a file. Before that, download a scanned PDF to your system.

Whenever we want to copy content from a scanned PDF, we need to use OCR (Optical Character Recognition). UiPath Studio offers a number of OCR activities that we just have to drag and drop into the sequence.

Create a new sequence called scanned PDF example.

We have already installed the PDF activities package in UiPath Studio, so search for Read PDF With OCR and add this activity inside the sequence.

Click on the three horizontal dots and select the path of the scanned sample file, then drop an OCR engine activity inside Read PDF With OCR. In UiPath you will find the Microsoft OCR engine; add it to your sequence.

Keep the properties at their defaults and then create a variable called scannedpdf2text for the output text.

First, try to extract the scanned PDF to a message box. Add a Message Box activity to the sequence and then enter the variable name inside it.

Save the sequence and run it. When the sequence executes, you will see a pop-up message that contains the text from the scanned PDF.

Now let us extract the scanned PDF into a text file. Delete the Message Box activity and then add Write Text File inside the sequence.

Click on the three horizontal dots, enter the new file path in the Write to filename box, and then enter the variable name in the Text box.

The complete sequence consists of the Read PDF With OCR activity, with the Microsoft OCR engine inside it, followed by the Write Text File activity.

Save and run the sequence. Once the sequence has executed, the text file will be created.

If you want to extract this content into a doc file, you can do so by creating a doc file as shown in the previous example.

PDF Activities by using UiPath Studio

The PDF package contains activities designed to extract data from PDF and XPS files and store it in string variables. The data can be extracted from the entire document or from a range of pages specified under the Range property found in each of the activities.

Most of the activities are self-explanatory, such as Read PDF With OCR, Read PDF Text, and Manage PDF Password. Manage PDF Password is used to change the password of your PDF. Join PDF Files is used to join two or more PDFs into a single file.

Extract PDF Page Range is used to extract the required pages into another PDF. For example, if a PDF contains ten pages and you want to extract only two of them, Extract PDF Page Range makes this easy.

Example.1 :

The following example demonstrates joining PDF files.

Join PDF Files: The Join PDF Files activity is used to join two or more PDFs into a single file. Let us create a new sequence called JoinPDF files.

Add the Join PDF Files activity inside it. In the Properties pane, you can see the file list property, where you enter the list of files you are going to join. In the output file name property, enter the name of the file into which you want to join the PDF files.

Add an Assign activity inside the sequence before Join PDF Files and create a variable. I am creating a variable called filelist.

Then write a small expression in the Enter a VB expression box. Click on the Properties pane and enter the expression in the Expression Editor. Press Ctrl+Space to see the functions available in UiPath.

The function to get all the PDF files in a directory is as shown below.

Directory.GetFiles("PDF path","*.PDF")
Here, GetFiles returns the paths of all the files in a directory that match the search pattern.
PDF path refers to the location of the PDF files on your system.
*.PDF selects only the PDF files in your folder.
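
For comparison, the same idea can be sketched in Python, outside UiPath. The function name list_pdf_files is hypothetical. Two details are choices made for this sketch rather than behavior described above: the extension is compared case-insensitively so that both .pdf and .PDF match, and the result is sorted so that the order in which the files would be joined is predictable.

```python
from pathlib import Path


def list_pdf_files(folder: str) -> list[str]:
    """Return the full paths of the PDF files directly inside folder, sorted by path."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        # Keep regular files only, and match the extension regardless of case.
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

Calling list_pdf_files with the path of a folder returns a list of strings, which corresponds to the array of strings that the filelist variable holds in the workflow.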

I have PDF files in my UiPath folder under Documents in This PC, and hence I am going to give the complete path of that folder in double quotes.

The code is as shown below.

Directory.GetFiles("C:\Users\User\Documents\UiPath","*.pdf")

Next, set the variable type to an array of strings. Click on the Variables pane, select Array of [T] under Variable type, and select String in the Select Types wizard.

Click on the Join PDF Files activity and, in the Properties pane, enter the filelist variable as the file list.

Now choose a name for the combined .pdf file and add its path, in double quotes, to the Join PDF Files activity.

Now save and run the sequence. Once the sequence has executed, the combined file is created, and it contains the content of the two different PDFs.