Last time, we dove into the higher-order functions that Underscorejs provides. This time, we'll use those functions to process text in the DOM of a page.

The Problem

Given three paragraphs of text in HTML, find and print all big words within the text. A big word is defined as any word with more than eleven characters. The paragraphs are contained in an <article> element whose id is lipsum, and the solution should be printed to the <ul> element whose id is long_words.

Algorithm Outline

Before we begin programming, we outline the algorithm so that our starting point, our objective, and the steps that connect the two are clear.

  1. Extract the text from the paragraphs under the <article>.
  2. Split the paragraphs into words and union them into a single list.
  3. Filter for all words longer than eleven characters.
  4. Print the long words into the DOM under the <ul>.
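To make the outline concrete before touching the DOM, here is a minimal sketch of steps 2 and 3 as a plain function. It assumes step 1 has already produced an array of paragraph strings, it uses native Array methods in place of Underscorejs so it runs without a browser, and the name findLongWords is hypothetical. Step 4 stays DOM-specific and is handled later.

```javascript
// Hypothetical helper: steps 2 and 3 of the outline, without the DOM.
// Assumption: step 1 already produced an array of paragraph strings.
function findLongWords(paragraphTexts, minLength) {
  return paragraphTexts
    // Step 2a: collapse non-word characters to spaces, then split into words.
    .map(function (text) {
      return text.replace(/\W+/g, ' ').trim().split(' ');
    })
    // Step 2b: union the per-paragraph word lists into one list.
    .reduce(function (all, words) {
      return all.concat(words);
    }, [])
    // Step 3: keep only words strictly longer than minLength characters.
    .filter(function (word) {
      return word.length > minLength;
    });
}

var sample = [
  'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
  'Pellentesque suscipit posuere sollicitudin.'
];
console.log(JSON.stringify(findLongWords(sample, 11))); // ["Pellentesque","sollicitudin"]
```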

Context of the Problem

I have written the problem into an HTML5 document to demonstrate the structure described in the problem definition. Note that I am using HTML5 rather than HTML4 or XHTML 2.0. Additionally, we use lorem ipsum (lipsum) placeholder text as our boilerplate.

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Prelude into Underscorejs: Text Processing on the DOM</title>
</head>
<body>
<article id="lipsum">
  <h2>Three Paragraphs of Lipsum</h2>

  <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
  Pellentesque vitae lectus at augue adipiscing facilisis in et
  dolor. Maecenas semper scelerisque blandit. Sed ut nibh eget
  purus aliquam suscipit. Nulla in facilisis leo. Etiam augue
  ligula, blandit et volutpat eget, ultrices hendrerit libero.
  Curabitur facilisis tincidunt neque, ornare viverra dui imperdiet
  ut. Suspendisse rhoncus, diam ut congue pharetra, metus tellus
  vehicula tellus, id imperdiet leo sem quis urna.</p>

  <p>Proin massa odio, malesuada quis aliquet suscipit, porta in
  arcu. Nunc ac egestas metus. Sed ac ligula vel neque molestie
  consectetur. Sed ac lacus nulla, sollicitudin interdum arcu.
  Quisque neque elit, hendrerit at mollis id, porta nec sem. In
  dapibus convallis ligula sed laoreet. Nulla non leo turpis. Sed
  dictum magna sit amet neque gravida malesuada. Donec pulvinar
  aliquam nisi, at malesuada libero posuere in. Donec ullamcorper
  accumsan eros nec interdum. Nam pharetra purus eget quam auctor
  nec placerat elit varius. Suspendisse imperdiet vehicula elit, at
  consequat dolor feugiat ut. Vestibulum ac suscipit augue.
  Curabitur scelerisque sollicitudin nisl nec sodales. Sed sed
  placerat ligula.</p>

  <p>Phasellus cursus sagittis augue, sit amet rutrum felis
  adipiscing vel. Pellentesque suscipit posuere sollicitudin. Proin
  pretium enim vel diam lobortis vel ullamcorper purus auctor.
  Vestibulum quis orci sem, nec vulputate arcu. Sed pretium
  facilisis ullamcorper. Curabitur placerat libero et quam rhoncus
  varius. Mauris bibendum felis non mi tincidunt id congue urna
  bibendum.</p>
</article>

<article>
  <h2>Long Words</h2>
  <p>In this case, we define <em>long words</em> as words with
  more than eleven characters. After pulling out these long
  words, we may provide definitions for they may be new to a
  reader's vocabulary.</p>
  <ul id="long_words"></ul>
</article>
</body>
</html>

Walking through the Underscorejs Algorithm

According to our algorithm, we must first extract the text from the paragraphs within the <article> identified by lipsum.

We start by getting hold of the <article> element through document.getElementById. Next, we need a list of all paragraph nodes directly under the article. Underscorejs makes this easy: we filter the article's childNodes by their nodeName, keeping only the nodes whose nodeName is that of the <p> tag. Beware: in an HTML document, nodeName reports element names in upper case, so we compare against 'P', not 'p'.

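function processLipsum() {
  // Get the <article> that holds the lipsum text.
  var lipsumArticle = document.getElementById('lipsum');
  // Keep only its direct <p> children; childNodes also contains the <h2>
  // and whitespace text nodes, which we skip.
  var lipsumParagraphs = _.filter(lipsumArticle.childNodes,
    function(node) {
      return node.nodeName === 'P';
    });
}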

document.addEventListener("DOMContentLoaded", processLipsum, false);

Because our algorithm depends on the contents of the DOM, our code must run only after the document has been fully parsed; consequently, we bind our function to the DOMContentLoaded event.
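If the script might run after parsing has already finished (for example, when it is marked async or inserted dynamically), a listener added at that point could miss DOMContentLoaded entirely. One common guard, sketched below under that assumption, checks document.readyState first. The helper onReady is hypothetical and takes the document as a parameter so its logic can be exercised with fake objects outside a browser.

```javascript
// Hypothetical helper: run fn once the document has been parsed.
// While readyState is 'loading', wait for DOMContentLoaded. Otherwise the
// document is already parsed (and the event may already have fired, in
// which case a new listener would never run), so call fn right away.
function onReady(doc, fn) {
  if (doc.readyState === 'loading') {
    doc.addEventListener('DOMContentLoaded', fn, false);
  } else {
    fn();
  }
}

// In the page this would be: onReady(document, processLipsum);

// Fake document objects stand in for a real browser here.
var calls = 0;
var pendingListener = null;
var loadingDoc = {
  readyState: 'loading',
  addEventListener: function (type, listener) { pendingListener = listener; }
};
onReady(loadingDoc, function () { calls += 1; });
console.log(calls); // 0, still waiting for the event
pendingListener();  // simulate DOMContentLoaded firing
console.log(calls); // 1

var parsedDoc = { readyState: 'complete', addEventListener: function () {} };
onReady(parsedDoc, function () { calls += 1; });
console.log(calls); // 2
```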

At this point, we have a list of all <p> tags that are children of the containing <article>.
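Why filter at all? The NodeList returned by childNodes holds more than the paragraphs: the <h2> is a child too, and the whitespace between tags becomes '#text' nodes. The sketch below replaces a real NodeList with a hand-built, hypothetical array-like object so it runs without a DOM. It shows that the native Array.prototype.filter, like _.filter, accepts array-like collections, and that comparing against a lower-case 'p' would match nothing.

```javascript
// Hypothetical stand-in for lipsumArticle.childNodes: a real NodeList also
// contains '#text' nodes for whitespace, and HTML element nodeName values
// are upper case.
var fakeChildNodes = {
  0: { nodeName: '#text' },
  1: { nodeName: 'H2' },
  2: { nodeName: '#text' },
  3: { nodeName: 'P' },
  4: { nodeName: '#text' },
  5: { nodeName: 'P' },
  length: 6
};

// Array.prototype.filter works on any array-like object, much as _.filter does.
var paragraphs = Array.prototype.filter.call(fakeChildNodes, function (node) {
  return node.nodeName === 'P';
});
console.log(paragraphs.length); // 2

// A lower-case comparison silently matches nothing.
var lowerCaseMatches = Array.prototype.filter.call(fakeChildNodes, function (node) {
  return node.nodeName === 'p';
});
console.log(lowerCaseMatches.length); // 0
```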

Now we can wrap the list of paragraphs with _.chain and pass it sequentially through a series of higher-order functions to obtain our new list of big words.
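To illustrate what chaining means, here is a deliberately simplified, hypothetical wrapper called miniChain. It is not how Underscorejs implements _.chain; it only shows the pattern: every method returns a new wrapper around the transformed list, and value() unwraps the result at the end.

```javascript
// Hypothetical, simplified illustration of the chaining pattern.
// NOT Underscore's implementation of _.chain.
function miniChain(list) {
  return {
    map: function (fn) { return miniChain(list.map(fn)); },
    flatten: function () {
      // One level only, which is all the paragraph-to-words step needs.
      return miniChain([].concat.apply([], list));
    },
    filter: function (fn) { return miniChain(list.filter(fn)); },
    // Unwrap the final list.
    value: function () { return list; }
  };
}

var result = miniChain(['a bb', 'ccc dddd'])
  .map(function (s) { return s.split(' '); })
  .flatten()
  .filter(function (w) { return w.length > 2; })
  .value();
console.log(JSON.stringify(result)); // ["ccc","dddd"]
```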

Union the List of Paragraphs

First, we need to coalesce the paragraphs' text into a single list. For each paragraph, we use replace to collapse every run of non-word characters (punctuation and whitespace alike) into a single space. Afterwards, we split the result on spaces so that only words remain in our new list.
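The replace-then-split step has one wrinkle worth seeing. A paragraph's text usually begins and ends with whitespace or punctuation, so the collapsed string has a space at each end and split(' ') yields empty strings there. In this tutorial they are harmless because the later length filter drops them, but trimming first avoids them. The sample string is an assumption that mimics the indented HTML source.

```javascript
// Sample text resembling a paragraph's text, including the
// newline-and-indent whitespace introduced by the HTML source.
var raw = '\n  Etiam augue\n  ligula, blandit et volutpat eget.';

// Collapse runs of non-word characters into one space, then split.
var words = raw.replace(/[\W\s]+/g, ' ').split(' ');
console.log(words.length); // 9, including an empty string at each end

// \s is already contained in \W, so /\W+/ is equivalent; trimming before
// splitting removes the empty strings at the ends.
var trimmed = raw.replace(/\W+/g, ' ').trim().split(' ');
console.log(JSON.stringify(trimmed));
// ["Etiam","augue","ligula","blandit","et","volutpat","eget"]
```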

This per-paragraph processing is done with _.map, which maps the list of paragraphs to a list of word lists, one per paragraph.

Second, once the words have been generated, we are left with a nested list, i.e. a list of lists holding one inner list of words per paragraph. _.flatten resolves this by turning the nested list into a flat one, merging the inner lists into the containing list.
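The nested-then-flat shape is easy to see with plain arrays. This sketch assumes two hypothetical paragraph strings and uses native map plus concat in place of _.map and _.flatten. Note that _.flatten flattens recursively by default, while passing true as its second argument limits it to a single level; for a list of word lists, both give the same result.

```javascript
// Two hypothetical paragraph strings standing in for the <p> text.
var paragraphTexts = ['Nulla in facilisis leo.', 'Proin massa odio.'];

// Mapping produces one array of words per paragraph: a nested list.
var nested = paragraphTexts.map(function (text) {
  return text.replace(/\W+/g, ' ').trim().split(' ');
});
console.log(JSON.stringify(nested));
// [["Nulla","in","facilisis","leo"],["Proin","massa","odio"]]

// Flattening one level merges the inner lists into the outer list.
var flat = [].concat.apply([], nested);
console.log(JSON.stringify(flat));
// ["Nulla","in","facilisis","leo","Proin","massa","odio"]
```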

Now our paragraphs have been combined into a single list of words, as shown below.

function processLipsum() {
  // Get the paragraph nodes of the article
  var lipsumArticle = document.getElementById('lipsum');
  var lipsumParagraphs = _.filter(lipsumArticle.childNodes,
    function(node) {
      return node.nodeName === 'P';
    });

  var lipsumWords = _.chain(lipsumParagraphs)
    // Collapse runs of non-word characters (\W includes whitespace) into
    // a single space, trim the ends, then split the words by space.
    // textContent yields plain text even if a paragraph contains markup.
    .map(function(node) {
      return node.textContent.replace(/\W+/g, ' ').trim().split(' ');
    })
    // Union all subarrays into a large array of words.
    .flatten()
    // Return the flattened list of words
    .value();
}

document.addEventListener("DOMContentLoaded", processLipsum, false);

Filter the List of Words

This step is simple but necessary to our objective: filter the list to return all big words, i.e. those with more than eleven characters. We chain _.filter onto the existing pipeline and, as before, end the chain with .value() to retrieve the final list of words.

function processLipsum() {
  // Get the paragraph nodes of the article
  var lipsumArticle = document.getElementById('lipsum');
  var lipsumParagraphs = _.filter(lipsumArticle.childNodes,
    function(node) {
      return node.nodeName === 'P';
    });

  var lipsumWords = _.chain(lipsumParagraphs)
    // Collapse runs of non-word characters (\W includes whitespace) into
    // a single space, trim the ends, then split the words by space.
    .map(function(node) {
      return node.textContent.replace(/\W+/g, ' ').trim().split(' ');
    })
    // Union all subarrays into a large array of words.
    .flatten()
    // Find all words with more than eleven characters.
    .filter(function(word) {
      return word.length > 11;
    })
    // Return the filtered list of words.
    .value();
}

document.addEventListener("DOMContentLoaded", processLipsum, false);
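The strict > comparison matters at the boundary. Several words in the sample text, such as consectetur and ullamcorper, are exactly eleven characters long and are therefore excluded. The check below uses native filter in place of _.filter on a hand-picked list of candidates taken from the lipsum paragraphs.

```javascript
// Candidates from the lipsum paragraphs that sit at or near the boundary.
var candidates = ['consectetur', 'scelerisque', 'Suspendisse', 'ullamcorper',
                  'Pellentesque', 'sollicitudin'];

// "More than eleven" is strict: eleven-letter words are excluded.
var bigWords = candidates.filter(function (word) {
  return word.length > 11;
});
console.log(JSON.stringify(bigWords)); // ["Pellentesque","sollicitudin"]

// Each candidate's length, for reference.
console.log(JSON.stringify(candidates.map(function (w) { return w.length; })));
// [11,11,11,11,12,12]
```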

We have successfully acquired the list of big words. Now, the final task is to print the list into the DOM.

Printing to the DOM

Rather than cluttering processLipsum with DOM manipulation, we delegate the printing to an auxiliary function. This function creates a document fragment to hold the new <li> elements, so the live page is modified only once.

For each word passed to the function, it creates a new list element, inserts the word into it as a text node, and appends the list element to the document fragment. _.each makes this iteration trivial.

Finally, we append the fragment, and with it every list item, to the unordered list identified by long_words.

function printLongWords(words) {
  // Build the <li> elements off-document so the page is updated once.
  var fragment = document.createDocumentFragment();
  _.each(words, function(word) {
    var listElm = document.createElement('li');
    listElm.appendChild(document.createTextNode(word));
    fragment.appendChild(listElm);
  });
  // Appending a fragment moves its children into the <ul>; no clone is needed.
  document.getElementById('long_words').appendChild(fragment);
}

Voila! We have successfully printed to the DOM.

Putting It All Together Now

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Prelude into Underscorejs: Text Processing on the DOM</title>
  <script type="text/javascript" src="underscore.js"></script>
  <script type="text/javascript">
function processLipsum() {
  // Get the paragraph nodes of the article
  var lipsumArticle = document.getElementById('lipsum');
  var lipsumParagraphs = _.filter(lipsumArticle.childNodes,
    function(node) {
      return node.nodeName === 'P';
    });

  var lipsumWords = _.chain(lipsumParagraphs)
    // Collapse runs of non-word characters (\W includes whitespace) into
    // a single space, trim the ends, then split the words by space.
    .map(function(node) {
      return node.textContent.replace(/\W+/g, ' ').trim().split(' ');
    })
    // Union all subarrays into a large array of words.
    .flatten()
    // Find all words with more than eleven characters.
    .filter(function(word) {
      return word.length > 11;
    })
    // Return the list of words.
    .value();
  printLongWords(lipsumWords);
}

function printLongWords(words) {
  // Build the <li> elements off-document so the page is updated once.
  var fragment = document.createDocumentFragment();
  _.each(words, function(word) {
    var listElm = document.createElement('li');
    listElm.appendChild(document.createTextNode(word));
    fragment.appendChild(listElm);
  });
  // Appending a fragment moves its children into the <ul>.
  document.getElementById('long_words').appendChild(fragment);
}

document.addEventListener("DOMContentLoaded", processLipsum, false);
  </script>
</head>
<body>
<article id="lipsum">
  <h2>Three Paragraphs of Lipsum</h2>

  <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
  Pellentesque vitae lectus at augue adipiscing facilisis in et
  dolor. Maecenas semper scelerisque blandit. Sed ut nibh eget
  purus aliquam suscipit. Nulla in facilisis leo. Etiam augue
  ligula, blandit et volutpat eget, ultrices hendrerit libero.
  Curabitur facilisis tincidunt neque, ornare viverra dui imperdiet
  ut. Suspendisse rhoncus, diam ut congue pharetra, metus tellus
  vehicula tellus, id imperdiet leo sem quis urna.</p>

  <p>Proin massa odio, malesuada quis aliquet suscipit, porta in
  arcu. Nunc ac egestas metus. Sed ac ligula vel neque molestie
  consectetur. Sed ac lacus nulla, sollicitudin interdum arcu.
  Quisque neque elit, hendrerit at mollis id, porta nec sem. In
  dapibus convallis ligula sed laoreet. Nulla non leo turpis. Sed
  dictum magna sit amet neque gravida malesuada. Donec pulvinar
  aliquam nisi, at malesuada libero posuere in. Donec ullamcorper
  accumsan eros nec interdum. Nam pharetra purus eget quam auctor
  nec placerat elit varius. Suspendisse imperdiet vehicula elit, at
  consequat dolor feugiat ut. Vestibulum ac suscipit augue.
  Curabitur scelerisque sollicitudin nisl nec sodales. Sed sed
  placerat ligula.</p>

  <p>Phasellus cursus sagittis augue, sit amet rutrum felis
  adipiscing vel. Pellentesque suscipit posuere sollicitudin. Proin
  pretium enim vel diam lobortis vel ullamcorper purus auctor.
  Vestibulum quis orci sem, nec vulputate arcu. Sed pretium
  facilisis ullamcorper. Curabitur placerat libero et quam rhoncus
  varius. Mauris bibendum felis non mi tincidunt id congue urna
  bibendum.</p>
</article>

<article>
  <h2>Long Words</h2>
  <p>In this case, we define <em>long words</em> as words with
  more than eleven characters. After pulling out these long
  words, we may provide definitions for they may be new to a
  reader's vocabulary.</p>
  <ul id="long_words"></ul>
</article>
</body>
</html>

Conclusion

Higher-order functions significantly simplify the processing of stream-based or list-based data such as text.

Using JavaScript's interface to the DOM, we showed how functional programming, though impure here because printing mutates the page, may have a place on the web, simplifying common computational tasks and even DOM tasks.