If you want to analyze the web pages you browse, you first need a web browser component. With script injection you can then query the HTML or XML document for details such as the title of the page, the full body of the page, or the value of a single element. This is very useful for data mining and for analyzing web pages, and it can be combined with a timer to re-read pages that change over time, such as weather or news pages.
Document Object Model (DOM)
The Document Object Model (DOM) is the in-memory representation of an HTML or XML document: the objects that make up the structure and content of a document on the web. Through the DOM APIs you can read, edit, or create web content and applications. The DOM is described well on mozilla.org.
For example, for HTML and XML documents:
The properties of the DOM Document object are listed at https://developer.mozilla.org/en-US/docs/Web/API/Document
The body property of the Document object is described at https://developer.mozilla.org/en-US/docs/Web/API/Document/body
How HTML/XML Injection Is Performed
The elements of a DOM document have properties, and innerHTML is one of them. Assigning a string to innerHTML replaces the content of that element with the HTML you supply, so it can be used to write dynamic HTML code. It is used mostly with data input fields such as comment fields, questionnaire forms, and registration forms.
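The innerHTML idea described above can be sketched in code. The following is a minimal, hypothetical helper in standard C++ that builds a JavaScript statement assigning HTML to the innerHTML of an element looked up by its id. The names EscapeForJsString and MakeSetInnerHtmlScript are illustrative assumptions, not part of C++ Builder or WebView2, and the escaping covers only backslashes, double quotes, and line breaks.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: escapes text so that it can be placed inside a
// double-quoted JavaScript string literal. Only backslashes, double quotes,
// and line breaks are handled (an assumption about the expected input).
std::wstring EscapeForJsString(const std::wstring& text)
{
    std::wstring out;
    out.reserve(text.size());
    for (wchar_t ch : text) {
        switch (ch) {
            case L'\\': out += L"\\\\"; break;
            case L'"':  out += L"\\\""; break;
            case L'\n': out += L"\\n";  break;
            case L'\r': out += L"\\r";  break;
            default:    out += ch;      break;
        }
    }
    return out;
}

// Hypothetical helper: builds a script that replaces the content of the
// element with the given id by the given HTML.
std::wstring MakeSetInnerHtmlScript(const std::wstring& elementId,
                                    const std::wstring& html)
{
    assert(!elementId.empty());  // an element id is required for the lookup
    return L"document.getElementById(\"" + EscapeForJsString(elementId) +
           L"\").innerHTML = \"" + EscapeForJsString(html) + L"\";";
}
```

In a C++ Builder form, the resulting string could presumably be passed to the browser component with a call such as EdgeBrowser1->ExecuteScript(script.c_str()), assuming a component named EdgeBrowser1 and a page that actually contains an element with that id.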
Script Injection Method in Modern C++
In RAD Studio, C++ Builder 10.4 and above includes the EdgeBrowser (TEdgeBrowser) component for navigating web pages, and it wraps the latest Microsoft WebView2 component. EdgeBrowser is easier to use than the older WebBrowser component, which has incompatibilities on some pages. The new web browser fully supports the WebView2 dynamic library of the Canary version. If you are new to EdgeBrowser, please read Learn To Develop Your Web Browser App in C++ Builder on Windows (EdgeBrowser).
With the information above, once you can run EdgeBrowser successfully as in those examples, it is easy to add script injection to retrieve the full text of a document. The result arrives as a UnicodeString, which means it can be used to analyze web pages in any language.
In C++ Builder, EdgeBrowser provides script injection through its ExecuteScript method. To use it, call the method as below:
```cpp
EdgeBrowser1->ExecuteScript(L"this.document");
```
or you can execute a script taken directly from the Text of an Edit control, as here:
```cpp
void __fastcall TForm1::Button2Click(TObject *Sender)
{
    EdgeBrowser1->ExecuteScript(Edit2->Text);
}
```
For example, you can obtain the full body text of the document by executing this script:
```cpp
EdgeBrowser1->ExecuteScript(L"this.document.body.textContent");
```
Here we inject a script that asks for a property of this.document, and the DOM object returns the answer to this request. To receive the answer, go to the events of EdgeBrowser, double-click the OnExecuteScript event, and add this code to copy the result from the web page into a Memo:
```cpp
void __fastcall TForm1::EdgeBrowser1ExecuteScript(TCustomEdgeBrowser *Sender, HRESULT AResult,
    const UnicodeString AResultObjectAsJson)
{
    // Replace the memo content with the JSON-encoded result of the script.
    Memo1->Lines->Clear();
    Memo1->Lines->Add(AResultObjectAsJson);
}
```
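As the parameter name AResultObjectAsJson suggests, the result arrives JSON-encoded, so a text value such as document.body.textContent comes back wrapped in double quotes with escape sequences like \n. The following is a minimal sketch in standard C++17 of how such a JSON string could be decoded into plain text before analysis. DecodeJsonString and HexDigitValue are hypothetical helper names, not part of C++ Builder, and the sketch assumes the result has first been copied into a std::u16string.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Returns the value of one hexadecimal digit, or -1 if ch is not a hex digit.
static int HexDigitValue(char16_t ch)
{
    if (ch >= u'0' && ch <= u'9') return ch - u'0';
    if (ch >= u'a' && ch <= u'f') return ch - u'a' + 10;
    if (ch >= u'A' && ch <= u'F') return ch - u'A' + 10;
    return -1;
}

// Hypothetical helper: decodes a JSON string literal (quotes included) into
// UTF-16 text. Returns std::nullopt if the input is not a well-formed JSON
// string, for example when the script returned an object, a number, or null.
std::optional<std::u16string> DecodeJsonString(const std::u16string& json)
{
    if (json.size() < 2 || json.front() != u'"' || json.back() != u'"')
        return std::nullopt;

    std::u16string out;
    const std::size_t end = json.size() - 1;  // index of the closing quote
    for (std::size_t i = 1; i < end; ++i) {
        const char16_t ch = json[i];
        if (ch == u'"') return std::nullopt;  // unescaped quote inside the text
        if (ch != u'\\') { out += ch; continue; }
        if (++i >= end) return std::nullopt;  // backslash with nothing after it
        switch (json[i]) {
            case u'"':  out += u'"';  break;
            case u'\\': out += u'\\'; break;
            case u'/':  out += u'/';  break;
            case u'b':  out += u'\b'; break;
            case u'f':  out += u'\f'; break;
            case u'n':  out += u'\n'; break;
            case u'r':  out += u'\r'; break;
            case u't':  out += u'\t'; break;
            case u'u': {
                // \uXXXX is one UTF-16 code unit; surrogate pairs arrive as
                // two consecutive escapes and are copied through unchanged.
                if (i + 4 >= end) return std::nullopt;
                unsigned code = 0;
                for (int k = 1; k <= 4; ++k) {
                    const int digit = HexDigitValue(json[i + k]);
                    if (digit < 0) return std::nullopt;
                    code = code * 16 + static_cast<unsigned>(digit);
                }
                assert(code <= 0xFFFF);  // four hex digits always fit
                out += static_cast<char16_t>(code);
                i += 4;
                break;
            }
            default: return std::nullopt;  // unknown escape sequence
        }
    }
    return out;
}
```

Returning std::optional keeps the "not a string" case explicit, so the caller can fall back to showing the raw JSON when the script returned something other than text.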