
Parsing HTML Response Bodies Using CSS Selectors

HTML is a markup language for describing the structure of web pages. HTML and XML have similarities: both use angle-bracket-enclosed tags in a hierarchical structure, and both allow similarly structured attributes within opening tags. However, the differences between them are significant enough that HTML is often not well-formed XML, so it cannot be reliably parsed using XPath.
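One concrete difference: HTML has void elements such as <br> and <input> that are never closed, and attributes such as readonly that carry no value. XML requires every element to be closed. The short Python sketch below (standard library only) shows an XML parser rejecting a valid HTML fragment that Python's HTML parser reads without complaint. It illustrates the general point only and is not LeadConduit's own code.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Valid HTML, but not well-formed XML: <br> is a void element with no closing tag.
SNIPPET = "<p>Line one<br>Line two</p>"


def parses_as_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records the name of every start tag the HTML parser sees."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(SNIPPET)
collector.close()

print(parses_as_xml(SNIPPET))  # False: the XML parser reports a mismatched tag
print(collector.tags)          # ['p', 'br']
```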

HTML response bodies can be parsed using CSS selectors instead. This article provides an example of using CSS selectors to parse a simple HTML response, followed by a more complex one.

Getting the most out of CSS selectors requires an understanding of HTML and CSS. To learn more about HTML, see this tutorial. To learn more about CSS, see this tutorial.

For more detailed information about CSS selectors, see this tutorial.

Recipients may respond to a lead submission with a thank-you page. Here is one such response, including its HTTP headers:

HTTP/1.1 200 OK
Server: Cowboy
Connection: close
Content-Type: text/html; charset=utf-8
Date: Thu, 26 Jan 2017 19:49:14 GMT
Via: 1.1 vegur

<HTML> 
  <HEAD>
    <TITLE> transmission complete page. </TITLE>
  </HEAD> 
  <BODY> 
    <title>Thank You.</title>
    <p><h3>Thank you!</h3></p>
  </BODY>
</HTML>
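As a rough illustration of what a CSS selector picks out of this body, the Python sketch below supports only the simplest kind of selector, a bare tag name such as h3, using the standard library's html.parser module. The helper name select_text and the selectors tried here are illustrative assumptions, not the mappings configured for this example. Note that this page contains two title elements, one in the head and one in the body, so a bare title selector matches both.

```python
from html.parser import HTMLParser

BODY = """<HTML>
  <HEAD>
    <TITLE> transmission complete page. </TITLE>
  </HEAD>
  <BODY>
    <title>Thank You.</title>
    <p><h3>Thank you!</h3></p>
  </BODY>
</HTML>"""


class TextByTag(HTMLParser):
    """Collects the text of every element with a given tag name.

    Handles only a bare type selector such as "h3"; real CSS
    selector engines support classes, ids, combinators, and more.
    """

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.target = tag.lower()  # html.parser reports tag names in lowercase
        self.depth = 0             # > 0 while inside a matching element
        self.current: list[str] = []
        self.matches: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Closed the outermost matching element: store its text.
                self.matches.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def select_text(html: str, tag: str) -> list[str]:
    """Return the stripped text of each element matching a bare tag selector."""
    parser = TextByTag(tag)
    parser.feed(html)
    parser.close()
    return parser.matches


print(select_text(BODY, "h3"))     # ['Thank you!']
print(select_text(BODY, "title"))  # ['transmission complete page.', 'Thank You.']
```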

Note the Content-Type header value "text/html". When a response carries this header, LeadConduit by default expects CSS selectors in the Outcome Search Path, Outcome Search Term, and Reason Path mappings.
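The exact dispatch logic inside LeadConduit isn't described here, but the idea can be sketched in Python: take the media type from the Content-Type header, ignoring parameters such as charset and ignoring case, then look up the path syntax that goes with it. The PATH_SYNTAX table and the function names are assumptions drawn from this article and its companion articles on JSON and XML parsing.

```python
from typing import Optional

# Assumed mapping from media type to the path syntax expected in the
# Outcome Search Path, Outcome Search Term, and Reason Path mappings.
PATH_SYNTAX = {
    "text/html": "CSS selector",
    "application/json": "dot notation",
    "application/xml": "XPath",
    "text/xml": "XPath",
}


def media_type(content_type: str) -> str:
    """Drop parameters such as '; charset=utf-8'; media types are case-insensitive."""
    return content_type.split(";", 1)[0].strip().lower()


def path_syntax_for(content_type: str) -> Optional[str]:
    """Return the expected path syntax, or None for an unrecognized type."""
    return PATH_SYNTAX.get(media_type(content_type))


print(path_syntax_for("text/html; charset=utf-8"))  # CSS selector
print(path_syntax_for("Application/JSON"))          # dot notation
print(path_syntax_for("text/plain"))                # None
```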

The mappings for this example look like this:

and would yield the following captured values:

Response Content-Type Override

If parsing still fails even though the mappings are configured correctly, the recipient system may not have set the response's Content-Type header correctly. That header tells LeadConduit what format the response is in. You can override it and force LeadConduit to parse the response as a different type by entering the desired Content-Type in the Response Content Type Override mapping.
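The precedence described above can be sketched in a few lines of Python, assuming a blank override means "use the response's own header". The function name effective_media_type is hypothetical; the point is only that a non-blank override replaces the recipient's Content-Type before a parser is chosen.

```python
from typing import Optional


def media_type(content_type: str) -> str:
    """Drop parameters such as '; charset=utf-8' and lowercase the rest."""
    return content_type.split(";", 1)[0].strip().lower()


def effective_media_type(header: Optional[str], override: Optional[str]) -> Optional[str]:
    """Pick the media type used to parse a response (hypothetical helper).

    A non-blank override wins over whatever the recipient sent;
    otherwise the response's own Content-Type header is used.
    """
    if override and override.strip():
        return media_type(override)
    if header and header.strip():
        return media_type(header)
    return None


# The recipient sends an HTML page but labels it as plain text.
print(effective_media_type("text/plain; charset=utf-8", None))         # text/plain
# Setting the override to text/html forces HTML (CSS selector) parsing.
print(effective_media_type("text/plain; charset=utf-8", "text/html"))  # text/html
```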

 

Something More Complex

Response

HTTP/1.1 200 OK
Server: Cowboy
Connection: keep-alive
Date: Thu, 00 Any 20∞ 07:47:00 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 2299
Via: 1.1 vegur

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  <HTML>
  <HEAD>
    <title>WebToCampaign Response</title>
    <META http-equiv="Content-Type"   content="text/html">
  </HEAD>
  <BODY>
    <FORM>
      <h1>Result:</h1>
      <TABLE>
        <TR>
          <TD>Error Code:</TD>
          <TD><INPUT readonly id="F9errCode" name="F9errCode" value="714" size="10"></TD>
        </TR>
        <TR>
          <TD>Error Decription:</TD>
          <TD><INPUT readonly id="F9errDesc" name="F9errDesc" value="Value of field "Email" has incorrect format" size="53"></TD>
        </TR>
      </TABLE>
    </FORM>
  </BODY>
</HTML>

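To capture the error description, set the Reason Path to: input#F9errDesc@value

The part before the @ is a CSS selector that matches the input element whose id is F9errDesc. The @value suffix is not standard CSS; here it captures that element's value attribute, which is where this response puts the error text.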
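A minimal Python sketch of evaluating a path of the form tag#id@attribute against an excerpt of this response is shown below. It uses the standard library's html.parser and handles only that narrow form; split_path, AttributeFinder, and extract are hypothetical names, not LeadConduit functions. One caution about the sample: inside a double-quoted attribute, a literal double quote must be written as &quot;, so the excerpt uses that form. A standards-compliant HTML parser reading the unescaped quotes exactly as printed in the response would end the value after "Value of field ".

```python
from html.parser import HTMLParser
from typing import Optional

# Excerpt of the response body, with the inner quotes escaped as &quot;.
BODY = """<FORM>
  <TABLE>
    <TR>
      <TD>Error Code:</TD>
      <TD><INPUT readonly id="F9errCode" name="F9errCode" value="714" size="10"></TD>
    </TR>
    <TR>
      <TD>Error Decription:</TD>
      <TD><INPUT readonly id="F9errDesc" name="F9errDesc" value="Value of field &quot;Email&quot; has incorrect format" size="53"></TD>
    </TR>
  </TABLE>
</FORM>"""


def split_path(path: str) -> tuple[str, str, str]:
    """Split 'tag#id@attr' into (tag, id, attr). Hypothetical helper."""
    selector, _, attr = path.partition("@")
    tag, _, elem_id = selector.partition("#")
    return tag.lower(), elem_id, attr.lower()


class AttributeFinder(HTMLParser):
    """Reads one attribute from the first element matching tag#id."""

    def __init__(self, tag: str, elem_id: str, attr: str) -> None:
        super().__init__()
        self.tag, self.elem_id, self.attr = tag, elem_id, attr
        self.result: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.result is not None or tag != self.tag:
            return
        # Attribute names arrive lowercased; values have &quot; etc. decoded.
        values = dict(attrs)
        if values.get("id") == self.elem_id:
            self.result = values.get(self.attr)


def extract(html: str, path: str) -> Optional[str]:
    """Evaluate a tag#id@attr path; None if nothing matches."""
    finder = AttributeFinder(*split_path(path))
    finder.feed(html)
    finder.close()
    return finder.result


print(extract(BODY, "input#F9errDesc@value"))  # Value of field "Email" has incorrect format
print(extract(BODY, "input#F9errCode@value"))  # 714
```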


Response Parsing Overview

Parsing Json Response Bodies Using Dot Notation

Parsing XML Response Bodies Using XPath
