John Dalesandro

Excel VBA: Automate E-mail Content Extraction from Outlook

In this scenario, Excel VBA is used to extract information from structured e-mails in Outlook using both text-based and HTML table-based methods. By automating the extraction process with VBA, data quality and accuracy are improved, and it is much faster than manually copying and pasting from each e-mail.

Text-based extraction uses the Replace and Split functions, while HTML table-based data is extracted using HTML object library functions.

Structured E-mail Template

Automation depends on consistent, structured data. Each e-mail has the subject Template Email Subject and the body follows this format: The account for <Full Name> has been created., where <Full Name> represents the name associated with the created account. The e-mail body also includes a structured table with user details like Last Name, First Name, Company, Department, and Job Title.

The macro extracts the Full Name from the e-mail body and adds it to an Excel worksheet. It then extracts additional user details from the structured table and adds this information to the Excel worksheet.

Structured E-mail Template

Outlook Configuration

In Outlook, create a sub-folder named Automation within the default Inbox to collect relevant e-mails. You can use an Outlook rule to automatically move these e-mails into the sub-folder. Keeping the e-mails in a separate folder helps reduce the VBA search effort. In this example, the sub-folder name Automation and its location in the default inbox are hard-coded. If the folder is created elsewhere, you will need to adjust the code accordingly.

Automation Sub-folder in Outlook

Enable Libraries

To use this function, you must enable the Microsoft Outlook 16.0 Object Library (used to access and manipulate Outlook) and Microsoft HTML Object Library (used to traverse structured HTML). The version number varies based on the installed version of Microsoft Excel.

  1. From the Developer ribbon in Microsoft Excel, click Visual Basic.
  2. Once the Visual Basic for Applications window opens, go to the Tools menu and select References….
  3. From the References dialog box, check/enable Microsoft Outlook 16.0 Object Library.
  4. From the References dialog box, check/enable Microsoft HTML Object Library.
  5. Click the OK button.
Enabling Libraries in Visual Basic for Applications

Add Source Code

The VBA code is added directly to ThisWorkbook within the Visual Basic for Applications window.

  1. From the Developer ribbon in Microsoft Excel, click Visual Basic.
  2. Once the Visual Basic for Applications window opens, select ThisWorkbook from the Project Explorer pane.
  3. Add the following code.
Option Explicit

Public Sub ExtractDetailsFromOutlookFolderEmails()
  On Error GoTo errorHandler
  
  Dim olApp As Outlook.Application
  Dim olNameSpace As Namespace
  Dim olFolder As Folder
  Dim olItem As Object
  Dim olItemBody As String
  Dim olHTML As MSHTML.HTMLDocument
  Dim olElementCollection As MSHTML.IHTMLElementCollection
  Dim htmlElementTable As IHTMLElement
  Dim iTableRow As Integer
  Dim iTableColumn As Integer
  Dim wksList As Worksheet
  Dim iRow As Integer
  Dim iColumn As Integer
  Dim strFullName As String
  
  Set wksList = ThisWorkbook.Sheets("Sheet1")
  
  Set olApp = CreateObject("Outlook.Application")
  Set olNameSpace = olApp.GetNamespace("MAPI")
  Set olFolder = olNameSpace.GetDefaultFolder(olFolderInbox).Folders("Automation")
  
  iRow = 1

  For Each olItem In olFolder.Items
    If InStr(olItem.Subject, "Template Email Subject") > 0 Then
      iColumn = 1
      
      olItemBody = Replace(olItem.body, "The account for", "####################")
      olItemBody = Replace(olItemBody, "has been created.", "####################")
      
      strFullName = Trim(Replace(Replace(Replace(Split(olItemBody, "####################")(1), Chr(10), ""), Chr(12), ""), Chr(13), ""))
  
      wksList.Cells(iRow, iColumn) = strFullName

      Set olHTML = New MSHTML.HTMLDocument

      olHTML.body.innerHTML = olItem.HTMLBody
      
      Set olElementCollection = olHTML.getElementsByTagName("table")
      
      For Each htmlElementTable In olElementCollection
        For iTableRow = 0 To htmlElementTable.Rows.Length - 1
          For iTableColumn = 0 To htmlElementTable.Rows(iTableRow).Cells.Length - 1
            On Error Resume Next
            If iTableColumn Mod 2 = 1 Then
              iColumn = iColumn + 1
              wksList.Cells(iRow, iColumn).Value = htmlElementTable.Rows(iTableRow).Cells(iTableColumn).innerText
            End If
            On Error GoTo errorHandler
          Next iTableColumn
        Next iTableRow
      Next htmlElementTable
      
      iRow = iRow + 1
    End If
  Next olItem
  
errorHandler:
  If Err.Number <> 0 Then
    MsgBox "Error extracting details. [ExtractDetailsFromOutlookFolderEmails]"
  End If
End Sub

Code Description

This code has been tested with Microsoft Office Professional Plus 2019 using the described e-mail template and Outlook configuration. It includes hard-coded values that must be updated for different e-mail templates and configurations, such as the Excel worksheet name (Sheet1), and the Outlook sub-folder name (Automation) and location.

The code connects to Outlook and accesses a sub-folder named Automation in the default inbox, which holds all the structured e-mails.

It then loops through each e-mail in the sub-folder, checking for those with the subject containing Template Email Subject.

For text-based information extraction, the code replaces specific phrases (The account for and has been created.) in the e-mail body with ####################. This common delimiter makes it easier to extract the text between them. For example, the modified body would look like this:

############## <Full Name> ##############

The code uses the Split function to extract the text between these delimiters and removes any extra whitespace, line feeds, form feeds, and carriage returns using the Trim and Replace functions. The cleaned text is then written to Column A of Sheet1.

For HTML table-based extraction, the code references the HTML table tag in the e-mail body using getElementsByTagName. It loops through the table rows and columns to collect data from the second column, ignoring the first column which contains headers.

Execute Procedure

  1. From the Developer ribbon in Microsoft Excel, click Macros.
  2. The Macro dialog box opens. Select ExtractDetailsFromOutlookFolderEmails
  3. Click the Run button to execute the procedure.
Execute Macro Procedure

Result

After running the procedure, the worksheet is filled with information from both the text-based and table-based methods. This example tested with only one e-mail, but if there were multiple e-mails in the Outlook sub-folder, the sheet would have an additional row for each e-mail.

Extracted Data in Excel Worksheet

Summary

This scenario demonstrates how VBA can automate data extraction from structured e-mails in Outlook to an Excel worksheet, enhancing data quality and processing speed compared to manual methods. The process involves two extraction techniques: a text-based method using string manipulation functions and a structured table-based extraction using HTML object library functions.