HTML is a structured page description language. While HTML and XML have similarities (both use angle-bracket-enclosed tags in a hierarchical structure, and both allow similarly structured attributes within opening tags), there are enough differences between them that HTML cannot be parsed using XPath.
HTML response bodies can be parsed using CSS Selectors. This article will provide an example of the use of CSS Selectors to parse a simple HTML response.
For more detailed information about CSS selectors, see this tutorial.
Recipients may respond to a lead submission with a thank-you page. Here's the HTML code for one such response:
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Date: Thu, 26 Jan 2017 19:49:14 GMT
Via: 1.1 vegur
<HTML>
<HEAD>
<TITLE> transmission complete page. </TITLE>
</HEAD>
<BODY>
<title>Thank You.</title>
<p><h3>Thank you!</h3></p>
</BODY>
</HTML>
Note the Content-Type header "text/html". If this is the header received in the response, LeadConduit will by default expect to see CSS Selectors in the Outcome Search Path, Outcome Search Term, and Reason Path mappings.
The mappings for this example look like this:
and would yield the following captured values:
Response Content-Type Override
If you find that properly-configured parsing is not working, the response's "Content-Type" header, which tells LeadConduit what format the response is supposed to be in, may not have been set correctly by the recipient system. You can override the actual header and force LeadConduit to parse the response as a different type by setting the desired Content-Type in the Response Content Type Override mapping:
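To illustrate the precedence described above, here is a minimal Python sketch of how an override can take priority over the header the recipient actually sent. The function name `effective_content_type` and its `override` parameter are hypothetical and only stand in for the Response Content Type Override mapping; this is not LeadConduit's own code.

```python
def effective_content_type(header_value, override=None):
    """Return the media type that parsing would be based on.

    `override` is a hypothetical stand-in for the Response Content Type
    Override mapping. When it is set, it wins over the header value.
    """
    chosen = override if override else header_value
    # Drop parameters such as "; charset=utf-8" and normalize case.
    return chosen.split(";", 1)[0].strip().lower()


# Header sent correctly: the response is treated as HTML.
print(effective_content_type("text/html; charset=utf-8"))  # text/html

# Recipient mislabels an HTML page as plain text; the override corrects it.
print(effective_content_type("text/plain", override="text/html"))  # text/html
```

In the second call, the response would be parsed as HTML even though the recipient's header says otherwise, which is the situation the override is meant for.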
Something More Complex
HTTP/1.1 200 OK
Server: Cowboy
Connection: keep-alive
Date: Thu, 00 Any 20∞ 07:47:00 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 2299
Via: 1.1 vegur
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<title>WebToCampaign Response</title>
<META http-equiv="Content-Type" content="text/html">
</HEAD>
<BODY>
<FORM>
<h1>Result:</h1>
<TABLE>
<TR>
<TD>Error Code:</TD>
<TD><INPUT readonly id="F9errCode" name="F9errCode" value="714" size="10"></TD>
</TR>
<TR>
<TD>Error Decription:</TD>
<TD><INPUT readonly id="F9errDesc" name="F9errDesc" value="Value of field "Email" has incorrect format" size="53"></TD>
</TR>
</TABLE>
</FORM>
</BODY>
</HTML>
The Reason Path is: input#F9errDesc@value
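To show what a path such as input#F9errDesc@value selects, here is a minimal Python sketch that uses only the standard library. It is not LeadConduit's parser, and it handles only the simple forms `tag`, `tag#id`, and either of those followed by `@attribute`. The names `SelectorMatcher` and `extract` are hypothetical. The sample body is a trimmed copy of the response above, with the inner quotes around Email written as `&quot;` so that the markup is well-formed; that change is an assumption made for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they carry no inner text.
VOID_TAGS = {"input", "meta", "br", "img", "hr", "link"}


class SelectorMatcher(HTMLParser):
    """Record the first element matching a `tag` or `tag#id` selector."""

    def __init__(self, tag, element_id=None):
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.attrs = None        # attributes of the first match, if any
        self.text = ""           # text found inside the first match
        self._capturing = False  # True while inside the matched element

    def handle_starttag(self, tag, attrs):
        if self.attrs is not None:
            return  # only the first match is kept
        attrs = dict(attrs)
        if tag == self.tag and self.element_id in (None, attrs.get("id")):
            self.attrs = attrs
            self._capturing = tag not in VOID_TAGS

    def handle_endtag(self, tag):
        # Simplification: nested elements with the same tag name end capture.
        if self._capturing and tag == self.tag:
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.text += data


def extract(html, path):
    """Evaluate `tag`, `tag#id`, `tag@attr`, or `tag#id@attr` against html."""
    selector, _, attr = path.partition("@")
    tag, _, element_id = selector.partition("#")
    matcher = SelectorMatcher(tag.lower(), element_id or None)
    matcher.feed(html)
    matcher.close()
    if matcher.attrs is None:
        return None  # no element matched
    if attr:
        return matcher.attrs.get(attr.lower())
    return matcher.text.strip()


RESPONSE_BODY = """
<HTML><BODY><FORM>
<h1>Result:</h1>
<TABLE>
<TR><TD>Error Code:</TD>
<TD><INPUT readonly id="F9errCode" name="F9errCode" value="714" size="10"></TD></TR>
<TR><TD>Error Decription:</TD>
<TD><INPUT readonly id="F9errDesc" name="F9errDesc"
 value="Value of field &quot;Email&quot; has incorrect format" size="53"></TD></TR>
</TABLE>
</FORM></BODY></HTML>
"""

print(extract(RESPONSE_BODY, "input#F9errDesc@value"))
# Value of field "Email" has incorrect format
print(extract(RESPONSE_BODY, "input#F9errCode@value"))  # 714
print(extract(RESPONSE_BODY, "h1"))  # Result:
```

The part before `@` picks the element (an `input` whose `id` is `F9errDesc`), and the part after `@` names the attribute whose value is captured. Without an `@` suffix, the sketch returns the element's text instead.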