Patterns in the web applications for automated testing

Posted in Software Testing, Test Automation

One of the key reasons for automating tests is to avoid spending time on repetitive tasks that tools can complete without human intervention. Automation can be one of the most effective tools in your toolbox, but it is not a silver bullet that will solve every problem and improve quality on its own. Automation tools are obedient servants, and as testers we need to become their masters and use them properly to realize their full potential. It is very important to understand that automation tools are only as good as the way we use them. Converting manual test cases into automated ones is not the best use of automation tools; they can be used in much more effective ways.

Creating a robust and useful test automation framework is a very difficult task. In the web world, it becomes even more difficult because things might change overnight. So-called best practices of automation taken from stable desktop applications are often unsuitable for the web environment and will probably have a negative impact on the project’s quality.

Many problems in the web world are identical to one another. For example, irrespective of the web application, we always need to validate things such as the presence of a title on every page. Depending on your context, it may also be the presence of metadata on every page, the presence of tracking code, the presence of ad code, the size and number of advertising units, and so on.

The solution presented in this article can be used to validate all or any of the rules mentioned above across all the pages of any domain or website. A bit of context: we were given the mandate to ensure that a specific tracking code is present on all the pages of a big website. In true agile fashion, once this problem was solved, the solution was extended and refactored to incorporate many rules on all the pages.

This solution was developed using Selenium Remote Control with Python as the scripting language. One of the main reasons for using tools such as Selenium RC is that they let us write tests in a general-purpose programming language of our choice, which allows us to use the full power of that language. For this solution, a Python library called Beautiful Soup was used to parse HTML pages. The solution was later ported to another tool called Twill to make it faster. Since the initial code was also written in Python, converting it to Twill was a piece of cake.

Essentially, this script is a small web crawler that visits all the pages of a website and validates certain rules. As mentioned earlier, the problem statement is very simple: “Validate certain rules on every web page of any given website.” To achieve this, the following steps were followed:

  1. Get the page.
  2. Get all the links on the page.
  3. Take the first link; if it is not external and the crawler has not visited it yet, open it.
  4. Get the page source, get all the links, and add them to the list of links to process.
  5. Validate all the rules you want to validate on this page.
  6. Repeat steps 1 to 5 for all the pages.
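
To make these steps concrete, here is a minimal sketch of such a crawl loop. It is not the original script: it uses only the Python standard library instead of Selenium RC, Twill and Beautiful Soup, and the names `crawl`, `extract_links`, `fetch` and `validate`, as well as the tiny in-memory `SITE`, are assumptions made up for illustration. Passing the page fetcher in as a function keeps the sketch runnable without a network; in practice it could be replaced by whatever tool loads the pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(page_url, html):
    """Return absolute URLs (without #fragments) for all links on the page."""
    collector = LinkCollector()
    collector.feed(html)
    return [urldefrag(urljoin(page_url, href)).url for href in collector.links]


def crawl(start_url, fetch, validate):
    """Visit every internal page reachable from start_url.

    fetch(url) returns the page source; validate(url, html) returns a list
    of problems. Both are hypothetical callables supplied by the caller.
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    report = {}
    while queue:
        url = queue.popleft()
        if url in visited or urlparse(url).netloc != domain:
            continue  # step 3: skip visited pages and external links
        visited.add(url)
        html = fetch(url)                        # steps 1 and 4: page source
        queue.extend(extract_links(url, html))   # steps 2 and 4: queue links
        report[url] = validate(url, html)        # step 5: check the rules
    return report                                # the loop itself is step 6


# A made-up two-page site, held in memory so the sketch runs offline.
SITE = {
    "http://example.test/": '<title>Home</title><a href="/about">About</a>'
                            '<a href="http://other.test/">Elsewhere</a>',
    "http://example.test/about": '<a href="/">Home</a><a href="/about#team">Team</a>',
}


def has_title(url, html):
    return [] if "<title>" in html else ["missing title"]


report = crawl("http://example.test/", SITE.__getitem__, has_title)
for page, problems in report.items():
    print(page, problems)
```

The queue plus the `visited` set is what keeps the crawler from looping forever on pages that link to each other, and comparing the host name against the start URL is one simple way to decide that a link is external.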

It is worth mentioning that the rules this framework can validate are those that can be checked by looking at the page’s source code (static analysis). Some of the rules that can be validated using this script are:

  1. Make sure that a title is present on every page and is not generic.

  2. Check the presence of meta tags such as keywords and description on all the pages.

  3. Ensure that instrumentation code is present on all the pages.

  4. Ensure that every image has alternative text associated with it.

  5. Ensure that ad code is coming from the right server and has all the relevant information we need.

  6. Ensure that the banners and skyscrapers used for advertising have the correct size.

  7. Ensure that every page except the home page contains at least two and no more than four advertisements.

  8. Ensure that the master CSS is applied on all the pages of a given domain.

  9. Make sure that all the styles come from the CSS files and that no element on a web page carries inline styles.
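
As an illustration of how a few of these rules (1, 2, 4 and 9) might be checked against page source, here is a small sketch. It is an assumption-laden example, not the original script: it uses Python's standard `html.parser` rather than Beautiful Soup, and the names `PageFacts`, `validate` and `GENERIC_TITLES`, along with the list of titles treated as generic, are invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical list of titles we would consider "generic".
GENERIC_TITLES = {"", "untitled", "untitled document", "new page"}


class PageFacts(HTMLParser):
    """Gathers the facts the rules need in a single pass over the HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_names = set()
        self.images_without_alt = 0
        self.inline_styles = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta_names.add(attrs["name"].lower())
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_without_alt += 1
        if "style" in attrs:  # rule 9: any inline style attribute counts
            self.inline_styles += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def validate(url, html):
    """Return a list of human-readable problems found on one page."""
    facts = PageFacts()
    facts.feed(html)
    problems = []
    if facts.title.strip().lower() in GENERIC_TITLES:
        problems.append("title missing or generic")
    for name in ("keywords", "description"):
        if name not in facts.meta_names:
            problems.append(f"meta {name} missing")
    if facts.images_without_alt:
        problems.append(f"{facts.images_without_alt} image(s) without alt text")
    if facts.inline_styles:
        problems.append(f"{facts.inline_styles} element(s) with inline styles")
    return problems


GOOD = ('<html><head><title>Pricing</title>'
        '<meta name="keywords" content="plans,pricing">'
        '<meta name="description" content="Our plans"></head>'
        '<body><img src="logo.png" alt="Logo"></body></html>')
BAD = ('<html><head><title>Untitled</title></head>'
       '<body><img src="logo.png"><p style="color:red">Hi</p></body></html>')

print(validate("http://example.test/pricing", GOOD))
print(validate("http://example.test/draft", BAD))
```

Each rule is just one more condition appended to `problems`, so checks for tracking code or ad units could be added in the same way once you know what markup to look for.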

The list above might give you some idea of what can be achieved using this approach. It can be extended very easily and is limited only by your imagination 🙂

In the next article, we will look at the code snippets and explain how easily these rules can be customized and validated across all the pages on any given domain.
