Sunny Books
What we have

DOMDocument to load and parse HTML

Sometimes we need to work with HTML files or strings: for example, to grab a section of an HTML string, find the value of a particular tag, or get an attribute of a tag. For this we can use the DOMDocument class (PHP 5). This article briefly writes down what I learned about this class.

Load document

DOMDocument::loadHTML — Load HTML from a string

The function parses the HTML contained in the string source. Unlike loading XML, HTML does not have to be well-formed to load. This function may also be called statically to load and create a DOMDocument object. The static invocation may be used when no DOMDocument properties need to be set prior to loading. (from

However, this function emits warnings whenever the HTML is not well-formed, for example: E_WARNING : DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 6. The solution is to use the libxml function libxml_use_internal_errors, which disables the standard libxml errors and lets the user fetch error information as needed. To do so, call libxml_use_internal_errors(true) before calling loadHTML.
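To make the "fetch error information as needed" part concrete, here is a minimal sketch. The sample HTML string is made up for illustration, and libxml_get_errors() and libxml_clear_errors() are standard PHP libxml functions that the text above does not mention. Which errors get reported, if any, depends on the installed libxml version.

```php
<?php
// Hypothetical sample input: the bare "&" is not a valid entity reference.
$html = '<html><body><p>Fish & chips</p></body></html>';

// Buffer libxml errors instead of emitting PHP warnings; remember the old setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Each entry is a LibXMLError object with level, code, message and line.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo 'line ' . $error->line . ': ' . trim($error->message) . "\n";
}

// Empty the error buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

Restoring the previous setting at the end is optional, but it keeps the change from leaking into unrelated code that relies on the default behaviour.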

Here is a function that loads a document:

function load_dom($str) {
 	libxml_use_internal_errors(true); // suppress libxml warnings; errors are buffered instead
 	$dom = new DOMDocument();
 	$dom->preserveWhiteSpace = false; // discard white space; properties must be set before loading
 	$dom->loadHTML($str); // load the HTML into the object
 	return $dom;
}

Parse elements


getElementsByTagName searches for all elements with the given tag name and returns a DOMNodeList object containing all the matched elements. To retrieve information from the returned object, for example the attribute of an a tag, we can use the DOMNodeList::item method.

An example:

$a = $dom->getElementsByTagName('a')->item(0); // there may be more than one "a"; get the first one
$href = $a->getAttribute('href');
$rel = $a->getAttribute('rel');
$label = $a->nodeValue;

// loop over every matched element
$items = $dom->getElementsByTagName('list');
for ($i = 0; $i < $items->length; $i++) {
	echo $items->item($i)->nodeValue."\n";
}
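As an alternative to indexing with item(), a DOMNodeList can also be traversed with foreach. The sketch below collects the href, rel and text of every a tag. The sample markup and the $links array are invented for illustration only.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<html><body>'
      . '<a href="/one" rel="nofollow">One</a>'
      . '<a href="/two">Two</a>'
      . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$links = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    // getAttribute() returns an empty string when the attribute is absent.
    $links[] = array(
        'href'  => $a->getAttribute('href'),
        'rel'   => $a->getAttribute('rel'),
        'label' => $a->nodeValue,
    );
}

print_r($links);
```

Because foreach simply runs zero times when nothing matches, this form avoids the null check that item(0) would otherwise need.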


The function getElementById searches for the element with the given id. It returns the DOMElement, or null if the element is not found.
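A short sketch of both outcomes, the element being found and not found, might look like this. The markup and the ids "main" and "sidebar" are hypothetical, and the document is assumed to have been loaded with loadHTML.

```php
<?php
// Hypothetical sample markup with a single id.
$html = '<html><body><div id="main"><p>Hello</p></div></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

// An existing id yields a DOMElement.
$main = $dom->getElementById('main');
if ($main !== null) {
    echo $main->nodeName . ': ' . $main->nodeValue . "\n";
}

// An unknown id yields null, so check before calling methods on the result.
$missing = $dom->getElementById('sidebar');
var_dump($missing === null);
```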