Monday 9 December 2013

Analysis of the JP Morgan Data Breach: When Applications Fail Data Security

Almost half a million corporate customers' data was breached in a cyberattack against the JPMorgan Chase website. The bank typically keeps the personal information of its customers encrypted, or scrambled, as a security precaution. However, during the course of the breach, personal data belonging to those customers temporarily appeared in plain text in files the computers use to log activity. Cyber criminals covet such data because it can be used to open bank accounts, obtain credit cards and engage in identity theft. You cannot implement data security without application security, because your application handles your most sensitive data on a regular basis. The JPMorgan example is very common: from a pure policy perspective, all data security practices were followed – security controls verifying that the defined data repositories are encrypted were in place, as was a proper audit trail.
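The general lesson is that encrypting the data store is not enough if the application layer writes the same data, unmasked, into its log files. As a minimal illustration of the point (a Python sketch of my own, not anything from JPMorgan's stack), a logging filter can redact card-like numbers before records ever reach disk:

import logging
import re

CARD_PATTERN = re.compile(r"\b\d{13,16}\b")  # crude, illustrative match for card-like numbers

class RedactingFilter(logging.Filter):
    """Mask anything that looks like a card number before the record is written."""
    def filter(self, record):
        record.msg = CARD_PATTERN.sub("[REDACTED]", str(record.msg))
        return True

logger = logging.getLogger("payments")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())
logger.addFilter(RedactingFilter())
logger.info("charge accepted for 4111111111111111")  # written as "charge accepted for [REDACTED]"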

http://bit.ly/18OvH79

Wednesday 5 October 2011

Advisory: Insecure Redirect in Microsoft SharePoint


By Irene Abezgauz September 13th, 2011
This vulnerability was discovered by Seeker®


Overview
An Insecure Redirect vulnerability has been identified in the Microsoft SharePoint shared infrastructure. This vulnerability allows an attacker to craft links that contain redirects to malicious sites in the Source parameter used throughout the SharePoint portal.

The exploitation technique detailed in this document bypasses the cross-application redirection restriction that normally limits such redirects and blocks access to external sites.

Details
Multiple pages and components in Microsoft SharePoint use the Source parameter to redirect users to a new location after a certain page has been accessed, for example:

POST 
  /Docs/Lists/Announcements/NewForm.aspx?Source=http%3a%2f%2f127.0.0.1%2fDocs%2fdefault.aspx
To avoid cross-application redirects (which pose a threat to the system), Microsoft SharePoint enforces checks on these redirects and limits them to localhost, 127.0.0.1 or the SharePoint server IP (the IP redirect is only valid if it points to an actual SharePoint page on the server; redirects to localhost or 127.0.0.1 work regardless of whether the target page exists).

The implementation of this verification, however, is flawed and can be circumvented by using hostnames that begin with the string localhost or 127.0.0.1, even if they are not localhost.

Due to domain naming restrictions the 127.0.0.1 prefix cannot be used in exploitation, as http://127.0.0.1.quotium.com is not a valid domain name – subdomain labels cannot consist of digits only. However, redirects to http://localhost.quotium.com or http://localhostie.quotium.com are valid. The following prefixes, among others, can be supplied in the Source parameter to exploit this vulnerability:
localhostaaa, localhost.quotium.com, etc.

An attacker can mount an attack by creating a site whose hostname begins with localhost and crafting a URL that embeds into the Source parameter a link leading to a site outside the current application. When a victim follows the specially crafted link he does indeed arrive at the selected page of the vulnerable SharePoint application; once the page operation is completed, however, the user is redirected to the URL in the Source parameter.
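The root cause is a prefix comparison where a full host comparison is needed. The sketch below (Python pseudologic of my own, not SharePoint's actual .NET validation code) shows the difference:

from urllib.parse import urlparse

def is_local_flawed(source_url):
    # Flawed: any host that merely *starts with* "localhost" or "127.0.0.1" is accepted.
    host = urlparse(source_url).hostname or ""
    return host.startswith("localhost") or host.startswith("127.0.0.1")

def is_local_strict(source_url):
    # Strict: the host must *equal* one of the permitted values.
    host = urlparse(source_url).hostname or ""
    return host in ("localhost", "127.0.0.1")

print(is_local_flawed("http://localhost.quotium.com/evil"))  # True  - redirect allowed, attack succeeds
print(is_local_strict("http://localhost.quotium.com/evil"))  # False - redirect rejected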

Exploit
Sample exploitation of this vulnerability would be crafting the following link:

http://MySharePoint/Docs/Lists/Announcements/NewForm.aspx?Source=http%3a%2f%2flocalhost.quotium.com
It is important to note that in many situations, even if the application does not use the source parameter by default, this parameter can be added manually to the URL, leading to exploitation of this vulnerability.

Affected Systems
Microsoft SharePoint 2007
Microsoft SharePoint 2010

Solution
Microsoft has released a fix for this vulnerability, see http://technet.microsoft.com/security/bulletin/MS11-074 for further information.

Credit
The vulnerability was automatically discovered by Seeker® – a new-generation application security testing solution utilising ground-breaking BRITE™ technology (Behavioral Runtime Intelligent Testing Engine).

Further research and publication was performed by Irene Abezgauz, Product Manager, Seeker.
For more information about Seeker please visit www.quotium.com

Thursday 14 July 2011


Amazon EC2 Remote Access first time with .pem file

I keep forgetting this, so I've placed it here for my own use and for the benefit of others.

  1. Go to https://console.aws.amazon.com/ec2/home
  2. On the EC2 tab, select Instances in the Navigation section.
  3. Right-click on your newly created instance and choose Get Windows Password.
    1. Note that you typically have to wait about 15 mins before this option is available.
    2. Open up the .pem file that you downloaded as part of the Key Pair routine and copy the entire contents.
    3. In the Retrieve Default Windows Administrator Password dialog box, paste your .pem file into the Private Key box.
    4. Click on Decrypt Password, and make a note of the password that is displayed.
  4. Right-click on your newly created instance and choose Connect.
  5. Follow either Option 1 or Option 2.  Option 1 is probably the easiest option as it gives you a shortcut to the server that you can then give to any administrator who needs it.
  6. Double-click the downloaded shortcut (if you chose Option 1) and enter the Administrator password from the previous step, to login to your new Server 2008 instance.
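If you prefer to script the password retrieval instead of pasting the key into the console, the sketch below reproduces steps 3.2 to 3.4. It assumes the boto3 and cryptography packages, configured AWS credentials, and uses a made-up instance ID and key file name:

import base64
import boto3
from cryptography.hazmat.primitives.serialization import load_pem_private_key
from cryptography.hazmat.primitives.asymmetric import padding

INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical instance ID
KEY_FILE = "my-key-pair.pem"          # the .pem downloaded when the key pair was created

ec2 = boto3.client("ec2")
# PasswordData is empty until roughly 15 minutes after launch, as noted in step 3.1.
encrypted = ec2.get_password_data(InstanceId=INSTANCE_ID)["PasswordData"]

with open(KEY_FILE, "rb") as f:
    key = load_pem_private_key(f.read(), password=None)

# EC2 encrypts the Administrator password with the key pair's public key (PKCS#1 v1.5 padding).
password = key.decrypt(base64.b64decode(encrypted), padding.PKCS1v15()).decode()
print(password)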

Wednesday 2 March 2011

SharePoint Performance / Load Testing using Microsoft Visual Studio vs Quotium QTest

Below is a white paper written in 2009 - QTest is now even better for performance testing SharePoint - contact me for more information.

SharePoint Performance Testing White Paper: Scripting
Performance Test Scripts for Complex SharePoint Applications by Adam Brown

Introduction

This paper focuses on the scripting phase of performance testing, as this can be one of the most time- and effort-intensive parts of any load testing project, particularly when working with bespoke and complex applications, including SharePoint applications. It is also the phase where the most technical experience is required.

Abstract

This paper follows an attempt to use Microsoft Visual Studio Team System Web Testing to performance test a complex SharePoint implementation.
It was found that creating scripts (transactions, user journeys, whatever you like to call them) for simple SharePoint applications could be straightforward; however, for a more complex implementation the dynamic parameterisation that Visual Studio performs simply did not parameterise all the necessary variables, and a coding approach was required. The same scenario was then evaluated using QTest from Quotium, which was able to parameterise all variables quickly and without the need for coding.

Executive Summary

·  In this scenario, Visual Studio Team System for Testers would require at least 540 – 1080 lines of code to be manually edited or written plus other investigative activity to make a suite of scripts.
·  QTest was able to handle the required parameterisation without any code having to be written; QTest generated and integrated all the code required.

The application under test (AUT)

The objective of the application was to make more of the functionality of Microsoft Office Project available via a web interface. It was deployed via SharePoint so that staff unfamiliar with MS Project would be more comfortable with a web interface, and so that MS Project did not need to be installed on all machines. It enables users to create, edit and view projects. The application was written by Raona of Spain.

The scenario

We decided to record the process of creating a project using the application; that way we could see on the server whether the transactions simulated with the tool were being executed properly. The transaction steps were as follows:
1: Navigate to Application HomePage
2: Click New Project
3: Enter Project Name, Template & Zone, then click Create Project
4: Click Save and Publish
5: Click OK when asked to check in
6: Click Save

The captures

Captures were made using the built-in recording mechanism featured in each tool; this consists of clicking the record button and interacting with the AUT as a real user would.

Microsoft’s Visual Studio Team System for Testers

First we used the Microsoft tool. Capture was straightforward and the tool appeared to detect dynamic parameters while generating its visual script (no code was seen at this point). Dynamic parameters are incredibly important when generating scripts: if they are not dealt with correctly, the entire script (and any load tests executed with it) can be rendered useless.
Looking at the script that was generated and the dynamic parameters, at first glance the tool had done a good job.

It was clear that we would have to parameterise the project name (used in step 3 of our transaction steps list above), as duplicate project names are not allowed in the AUT. However, before parameterising this value we thought we had better check that the script would run after simply updating the project name manually using find and replace (we changed it from “TestProj2” to “TestProj3”). That way we could quickly find out what else, if anything, needed to be parameterised.
After attempting to run the script with the new project name parameter, it failed, receiving an HTTP 500 internal server error from the application under test.
Note the Server 500 Error (highlighted in blue) and the details in the window below it.


After a closer look at the script (at this point we dropped down into the generated Visual Basic code) we could see exactly what had been automatically parameterised and what had not, and it quickly became obvious why this script had caused a 500 internal server error and why, in its current state, it could never work and could never be used to generate accurate load.
The reasons for this are explained below.
The dynamic parameters that the Visual Studio Web Testing tool did not deal with are:
ProjectUID
CLSID
SessionUID
JobUID
ViewUID
ViewTimeStamp
The problem is that the AUT, like many applications, needs these types of parameters to maintain state, identify objects and maintain sessions. Replaying a value captured in a previous recording simply cannot work, and even where it does not cause visible errors it will result in inaccurate load being generated if the script is used as part of a test.
The parameters it did deal with were as follows:
__VIEWSTATE
__EVENTARGUMENT
__EVENTVALIDATION
__REQUESTDIGEST
__LASTFOCUS
Cookies
URL Parameters in redirects

These are standard Microsoft parameters and are dealt with correctly. The problem with the parameters mentioned previously is that they are more than likely created in the development process, so Visual Studio’s web testing tool can’t know anything about them.
An example of a parameter that has not been parameterised is shown below, in the XML request made by Visual Studio during a failed replay of the script.

If we look at the script below we can see the parameter is static in the script, not dynamic (look in the chunk of unparsed XML):

Compare this with a parameter that has been automatically parameterised and made dynamic (see __VIEWSTATE):
So it seems that the only way to make the script work is to manually insert code that extracts the dynamic parameters from the relevant web server responses, stores them as variables and inserts them in place of the static parameters that need to be replaced in the XML.
Of course this should be no problem for a seasoned developer or tester with VB / C# coding / scripting experience; however, it may be time consuming, as there are at least 6 parameters here that need to be replaced, each of which appears any number of times in the script, depending on the size of the script. Add to that the fact that we will need to produce more than one script to create a realistic scenario, and that when the application changes the script will more than likely need to be re-recorded to ensure that any changes are correctly handled. This makes for a lot of lines of code and hence a lot of time spent during the scripting process. Let's quantify this:
Once we've figured out where each parameter comes from and the best way to extract it with Visual Studio, we have the following:
6 parameters, each appearing on average 6 times in each script.
This means that approximately 36 lines of code need to be altered.
Furthermore, at least 18 lines of code need to be inserted for declaration and extraction (one for declaration and one for extraction – possibly more).
More lines of code may be required for complex parsing of parameters.
This means that for each script we have there are at least 54 changes required.
Typically in load testing 10-20 scripts are required for an accurate scenario, which means at least 540-1080 lines of code to edit or insert for each load test we prepare.
If the application is changed then all of this work has to be re-visited or repeated.
What’s required is a parsing engine that can be configured to deal with bespoke / non standard parameters.
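To give a feel for the correlation work involved, the fragment below shows the pattern in generic form: extract the dynamic value from the response that produced it, then substitute it for the recorded value in the next request body. It is a Python illustration of my own (the tool itself generates Visual Basic), with made-up URLs and a made-up parameter value, and it has to be repeated for every dynamic parameter in every request that uses it:

import re
import requests

session = requests.Session()

# Step 1: fetch the page whose response carries the dynamic value.
response = session.get("http://MySharePoint/Docs/default.aspx")  # hypothetical URL

# Step 2: extract the dynamic parameter with a regular expression (one rule per parameter).
match = re.search(r'ProjectUID="([0-9a-fA-F-]{36})"', response.text)
project_uid = match.group(1) if match else ""

# Step 3: replace the hard-coded recorded value with the freshly extracted one.
recorded_xml = '<Project UID="00000000-0000-0000-0000-000000000000">...</Project>'  # captured at record time
live_xml = recorded_xml.replace("00000000-0000-0000-0000-000000000000", project_uid)
session.post("http://MySharePoint/_vti_bin/PSI/Project.asmx", data=live_xml)  # hypothetical endpoint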

Quotium QTest



Next we used QTest to capture the same transaction. QTest does not parameterise the script automatically as soon as it is generated; automatic parameterisation is achieved by selecting a model from the drop-down box and clicking ‘Apply Model’. QTest is not limited to Microsoft technologies, so there are other models there for J2EE, Siebel, SAP etc.
Note the Model drop down list in the top left of the application

By default the SharePoint model was not there; it was downloaded separately as an XML file. We found, however, that the .Net model was able to perform most of the parameterisation that Visual Studio could.

After the parameterisation process QTest had covered everything that could be seen as a header parameter; it had not covered parameters that appeared in the body of a request, such as an XML request, and these remained static.
It was, however, quite straightforward to parameterise these using Find In HTTP Responses from the right-click menu (see above). Highlighting the parameter we needed to make dynamic and right-clicking presented the menu we needed (below):

QTest then presented us with the locations of every instance of this parameter in all of the HTTP responses (lower part of screen in screenshot above)

After double-clicking the response (above: I chose the one from the body of response ID 87 rather than the header of response 87), QTest highlighted the parameter for extraction in the HTTP window on the right of the tool (see highlighted text in the window below),

where we were then able to select Extract Value following a right-click on the selected text (see right). QTest then evaluated the text to the immediate right and to the immediate left of the parameter and used this to build an extraction rule (see Left Delimiter and Right Delimiter in the screenshot above).

Using the magnifying glass button we were able to verify that the extraction had worked correctly. Finally, the Apply button created the variable in the script, generated the extraction code and inserted it at the relevant place. All that was left was to use find and replace to replace all instances of the hard-coded value in the script with the variable that had just been parameterised.
By highlighting the static value in the script, right-clicking and using Find and replace in strings… it was possible to quickly parameterise all instances of this static value, as per the screenshot to the right.

This process was repeated for all variables that remained hard-coded in the script, including the ‘Project Name’ that the user would normally enter through the keyboard.
Rather than use a list of values we decided to append a timestamp to the project name, as that way we would always have a unique name for the project. Looking at the help we found the command for timestamps. This meant we had a small piece of code to write, which was placed at the top of the script:
UniqueName = "MyProj" + Date("%f");
To save time we then modified the SharePoint model to ensure that all references to the hard-coded project name were parameterised. To do this we selected ‘SharePoint’ from the model drop-down list on the toolbar and clicked the 'Apply Model' button to edit it. We inserted a rule for the project name to ensure that the variable we had just created was used instead of the hard-coded value. Please see the screenshots below.



The next step was to replay the script to see if it worked.
During the replay the tool shows the HTML pages that the server responds with, and finally pops up a window offering to show replay differences. This proved especially useful as it compares the traffic generated by the browser when the script was recorded with the traffic that QTest generated. Any unexpected behaviour is quickly highlighted by this tool with severities 1, 2 & 3.
Looking at the screen capture below we can see that the request that had failed with Visual Studio (see 2nd illustration on 2nd page) has now worked with QTest. This is because all of the parameters in the XML statements have been correctly dealt with.


We can also see in the replay window in QTest that the project has been successfully created with the unique name ‘MyProj’ plus a UTC timestamp:


This can also be verified in a browser:


Summary

Visual Studio is a capable tool in the hands of developers with the necessary experience to use it and the time to program it correctly. It can be suitable for use by non-programmers with some simple web applications where nothing has been customised (in our experience a rare case in large organisations).
However, if the application is not a standard out-of-the-box vanilla affair, time is limited and programmers are scarce, then QTest offers a better approach, as its features make a typically difficult and lengthy task (scripting) relatively straightforward and short.
QTest is a very capable and powerful tool in the hands of anyone with an IT background, developer or not. It is therefore highly suitable for testers.


Wednesday 2 February 2011

What to monitor on a windows operating system

I posted this answer to a forum following a question - it seemed like a good blog post too.
When monitoring an application under load that is hosted on a Windows operating system, start by monitoring the following metrics on the Windows servers (a scripted example of sampling these counters follows the list).

Physical Disk


  • % Disk Time: the amount of time the disk was busy reading or writing bytes; anything over 90% is bad.
  • Queue Length: the number of requests outstanding on the disk at the time the performance data is collected, including requests in service at the time of the collection. Multi-spindle disk devices can have multiple requests active at one time, while other concurrent requests await service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive it is likely to be consistently high. Requests experience delays proportional to the length of this queue minus the number of spindles on the disks. For good performance, this difference should average less than two.
  • Time per Transfer: the time in milliseconds of the average disk transfer; anything over 30 ms is not good.


Memory

  • % Used: calculated as 100 * committed bytes / (committed bytes + available bytes). This value should not exceed 95%.
  • Page Faults: the average number of pages faulted per second. It is measured in pages faulted per second because only one page is faulted in each fault operation, so this is also equal to the number of page fault operations. This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers (more than 350/s) of soft faults without significant consequence. However, hard faults, which require disk access, can cause significant delays.
  • Paging: the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of "Pages Input/sec" and "Pages Output/sec". It is counted in numbers of pages, so it can be compared to other counts of pages, such as "Page Faults/sec", without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.



Network Interface:

  • Bytes Total/s: the rate at which bytes are sent and received over each network adapter, including framing characters. Network Interface\Bytes Total/sec is the sum of Network Interface\Bytes Received/sec and Network Interface\Bytes Sent/sec.
  • Output Queue Length: the length of the output packet queue (in packets). If this is longer than two, there are delays and the bottleneck should be found and eliminated, if possible.

Processor

  • % Processor Time: the percentage of elapsed time that the processor spends executing a non-idle thread. It is calculated by measuring the percentage of time that the processor spends executing the idle thread and then subtracting that value from 100%. (Each processor has an idle thread that consumes cycles when no other threads are ready to run.) This counter is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval. Note that the accounting of whether the processor is idle is performed at an internal sampling interval of the system clock (10 ms). On today's fast processors, % Processor Time can therefore underestimate processor utilisation, as the processor may spend a lot of time servicing threads between system clock samples. Workload-based timer applications are one example of applications that are more likely to be measured inaccurately, as timers are signalled just after the sample is taken.

System

  • Congestion: calculate this as Processor Queue Length / number of CPUs. The value should be less than 10 for a performant system.

TCP

  • Segments Retransmitted/s: the rate at which segments are retransmitted, that is, segments transmitted containing one or more previously transmitted bytes. A fast network connection will show a higher absolute rate, but retransmissions should be rare; a sustained value greater than 1/s indicates a problem with the network.
  • Here's an interesting conversation on retransmits: http://fixunix.com/tcp-ip/66636-segments-retransmitted.html
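A simple way to collect these counters during a test run is to script typeperf. The sketch below (Python, with an arbitrary one-second interval and an hour's worth of samples; counter paths are the English-language perfmon names) writes the counters discussed above to a CSV file for later analysis:

import subprocess

# Perfmon counter paths corresponding to the metrics discussed above.
COUNTERS = [
    r"\PhysicalDisk(_Total)\% Disk Time",
    r"\PhysicalDisk(_Total)\Current Disk Queue Length",
    r"\Memory\Page Faults/sec",
    r"\Memory\Pages/sec",
    r"\Network Interface(*)\Bytes Total/sec",
    r"\Network Interface(*)\Output Queue Length",
    r"\Processor(_Total)\% Processor Time",
    r"\System\Processor Queue Length",
    r"\TCPv4\Segments Retransmitted/sec",
]

# Sample once a second for an hour and write the results to counters.csv.
subprocess.run(
    ["typeperf", *COUNTERS, "-si", "1", "-sc", "3600", "-f", "CSV", "-o", "counters.csv"],
    check=True,
)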

Thursday 21 October 2010

Catalogue of Disasters proves performance engineering is essential

In my efforts to create impetus for performance engineering I have, for the last few years, delivered a talk at the UKCMG Forum in London. For my most recent talk I scoured the Internet with the goal of building a catalogue of disasters. So far these are the most notable. Please do add any more that you can think of in the comments.

Catalogue of Disasters - by no means complete and definitely a work in progress, the information below is a reference list of IT performance-related crashes.

2002: Can't cope, won't cope – Nectar suspends Web registrations


By John Leyden • Posted in e-Business, 18th September 2002 16:20 GMT
Users flocking to sign up to the newly-introduced loyalty card scheme Nectar have flooded the site, forcing its backers to temporarily suspend Web-based signups.
As with the protracted delays in getting the 1901 Census Web site up and running, the backers of Nectar.com (Sainsbury's, Barclaycard, Debenhams and BP) have chronically underestimated demand.
The Nectar Web site has been receiving over 10,000 unique visitors an hour since Monday morning, according to Loyalty Management UK, which is running the programme.
That of course is not the main problem - The Reg gets more visitors at peak times, for instance - no, the difficulty is that signing up to Nectar.com is a transaction-heavy process that is maxing out the site's existing servers.
A notice on the site explains: "We are well on the way to becoming the biggest rewards programme in the UK and are currently experiencing very high volumes of traffic on the Nectar website. As a result you may experience difficulty in accessing our site. If you wish to register for the Nectar programme, please complete your registration form and post it to us in the envelope provided."
A spokeswoman for Loyalty Management UK said that Nectar is putting in additional servers to cope with extra demand, and expressed the hope that the site will be available later today.
She didn't know the platform on which Nectar.com runs, but Netcraft reports that the site uses a Netscape-Enterprise/4.1 on Solaris 8 front end.
By signing up for Nectar online, card holders get a bonus 100 points (worth 50p) and avoid the hassle of either ringing an 0870 number to register or using snailmail. This incentive, together with higher than expected early demand for the programme, sent Nectar titsup.com.
Perhaps the lack of Web access is a blessing in disguise. A Reg reader told us he was able to see another user's details when he managed, after a long struggle, to register on the site late last night.
Nectar is aware of the problem but describes it as an isolated incident, adding that it is putting measures in place to prevent a repetition of the security breach.
Over six million registration packs are already with consumers and by the end of the week, Loyalty Management UK estimates over 10 million cards will be in circulation - enough to enrol more than 40 per cent of UK households.
Nectar has proved popular because it allows consumers to accumulate points at the four participating outlets and redeem credits against a wide variety of goods not just at those outlets but for other affiliates (such as Virgin, British Midland and Eurostar) as well. ®

2002: Don't meet your ancestors
1901 Census site closed for urgent repairs

By Tim Richardson Posted in Music and Media, 3rd January 2002 15:20 GMT
The UK's 1901 Census Web site that's been jammed with users since its launch yesterday has been taken off-line for some urgent maintenance.
At one point yesterday more than 1.2 million people were trying to access the site simultaneously far exceeding the site's day-to-day capacity.
Although the Public Records Office (PRO) expected the site to be popular it's been overwhelmed by the public's response and is taking measures to try and resolve some of the problems.
The site was pulled down at around 2.00pm GMT this afternoon and work is expected to last for around two hours.
A PRO spokesman told The Register that this should improve the site's performance and enable people to trace their ancestors who were alive at the turn of the twentieth century.
However, it seems the best advice is just to wait until all the fuss has died down and maybe take a look in a week or so.
The 1901 Census for England and Wales was taken on 31 March 1901 and contains the details of more than 32 million people.
Not only does the site allow Net users to search records by name and place, it also allows them to search by vessel. That's because more than 70,000 people were counted on merchant sea-going and coasting boats, inland barges and boats, as well as naval vessels, when the census was taken.
The site was created by the PRO and QinetiQ (formerly part of the Government's Defence Evaluation and Research Agency [DERA]). ®

2004: Christmas Spectacular

By Tim Richardson • Posted in Financial News, 3rd December 2004 10:11 GMT
Marks & Spencer extends its "One Day Christmas Spectacular" offer into a second day after its website went belly up during its pre-Xmas sale.
The struggling high street retailer had tried to drum up trade with a 20 per cent-off bonanza in its stores and on its website yesterday. But so many people responded online yesterday, the M&S site fell over for several hours during the middle of the day.
As a result, the retailer has now extended its offer until this afternoon to appease frustrated shoppers.
A statement on the M&S web site reads: "For all online customers who may have had problems shopping online Thursday 2 December, we've extended our Christmas Spectacular Offers exclusively online until 2pm Friday 3 December." ®

2006: Gift-card shoppers overload iTunes Web site

The Associated Press Published: 12.29.2006
Swarms of online shoppers armed with new iPods and iTunes gift cards apparently overwhelmed Apple's iTunes music store over the holiday, prompting error messages and slowdowns of 20 minutes or more for downloads of a single song.
Frazzled users began posting urgent help messages Monday and Tuesday on Apple's technical forum for iTunes, complaining they were either not allowed into the store or were told the system couldn't process their request to download songs and videos.
It was not immediately clear how many people were affected by the slowdowns, and Apple Computer Inc. would not immediately comment Wednesday on what caused the slowdown and whether it had been fixed.
Analysts said the problems likely were the result of too many people with holiday iPods and iTunes gift cards trying to access the site at once.
Traffic indeed was heavy over the holiday, with more than four times as many people visiting the iTunes Web site on Christmas than at the same time last year, online market researcher Hitwise said Wednesday.
Some financial analysts said the interruption could be viewed as a sign that sales dramatically exceeded the Cupertino-based company's own forecasts.
"It's actually created more positive buzz among analysts. Traffic was so great it blew up the site," said Gene Munster, senior research analyst at Piper Jaffray. "If anything, it could be a positive. Demand was better than they were expecting."
Apple commands about 75 percent of the market for downloaded music, but could lose as much as 5 percent of that market share in 2007 because of increased competition from rival services, according to Piper Jaffray.
Dan Frakes, a senior editor at Macworld magazine and playlistmag.com, a Web site focused on digital music, said he and some colleagues were unable to access the iTunes store or received error messages when they tried to download songs.
However, others breezed through the process hassle-free, and Frakes successfully downloaded songs again on Wednesday. He said the problem likely was not as widespread as the discussion group chatter might indicate.
Analysts said they didn't anticipate a rash of iPod returns because of the delays.
Apple's stock price fell almost 5 percent Wednesday before rebounding to close up a penny at $81.52 on the Nasdaq Stock Market, then fell 65 cents to $80.87 at Thursday's close.

2007: Led Zeppelin reunion opens with Communication Breakdown
Website goes down like a...oh

By Chris Williams • Posted in Servers, 13th September 2007 13:12 GMT
The clamour for tickets for Led Zeppelin's reunion gig at the O2 in London in November has overloaded the registration website, frustrating thousands.
It's been down all morning and at time of writing we can't access the site. Organisers are appealing for patience and say fans have until midday on Monday to be in with a shot of a ticket.
The site registers would-be rockers in a lottery for the right to buy a £125 ticket. Led heads have been warned that eBay touts do not have tickets to sell.
The BBC reports that the organisers reckon 20 million people have tried to buy tickets. It seems more likely that 20 million attempts have been made to access it, but either way, the rush was fairly predictable.
The show at the former Millennium Dome will see John Bonham's son, Jason, sitting in on drums, and also features The Who's Pete Townshend. It has been organised as a tribute to Atlantic Records' co-founder Ahmet Ertegun, who died after a fall at a Rolling Stones concert last year. Profits will go to fund scholarships in the US and UK, and Ertegun's native Turkey. ®

2008: Debenhams

Debenhams web site, already known for availability issues during sales times is again unavailable over the Xmas sale period; other site such as Next used queuing systems to throttle the traffic to their sites and remained open for business.

2009: Obama's inauguration crashes the J.Crew web site
During the inauguration of Barack Obama, some viewers paid more attention to Michelle Obama's attire, it seems. So taken were they with the gloves of the new first lady, which were made by J.Crew, that they flocked en masse to the ladies-wear section of the site.
By Tuesday afternoon the page that featured the gloves in question was unavailable; by Wednesday morning the whole women's section was down, with a message saying: "Stay tuned…Sorry, we're experiencing some technical difficulties right now (even the best sites aren't perfect). Check back with us in a little while."

It's an interesting story - there is a whole page about it in the New York Times here.

2009: 'Best Job In The World' Web Site Overloaded

Morning Edition, January 14, 2009
Morning Edition reported Tuesday that tourism officials in Queensland, Australia, are looking for an island caretaker. They bill it as the "Best Job in the World." The job involves swimming, snorkeling, strolling around the islands of the Great Barrier Reef and blogging about it. The salary for the six-month job is about $100,000. As more news organizations began reporting the story, applicants overloaded the Web site.

2011: Police Crime Site overloaded
AOL News, 2nd February 2011 by Hugh Collins - contributor
A British website showing block-by-block crime statistics crashed within a few hours of going online today after public interest overwhelmed servers.
The site, police.uk, crashed after receiving as many as 300,000 hits a minute, or 18 million an hour, The Press Association reported. "Most popular gov website ever?" the British Home Office wrote on Twitter. "Demand for new #crime maps at around 300k a minute, equivalent to 18m hits an hr. Working hard to make sure everyone can access."
The site is intended to allow people to gauge levels of crime and police activity in neighborhoods in England and Wales. It breaks down crime into six categories: burglary, robbery, vehicle crime, violence, other crime and anti-social behavior, BBC News said.
When AOL News tried to access the site this afternoon, it displayed an error message, saying the site may be overloaded or down for maintenance.
There is no equivalent national map in the United States, according to Maggie McCullough, who heads up the Policy Map project at the nonprofit The Reinvestment Fund. The Policy Map project maps information such as crime statistics and mass transit access for different U.S. cities.
Individual U.S. cities including Chicago and Los Angeles currently offer crime statistics on a street-by-street level. "Safety is the No. 1 thing," McCullough told AOL News. "People want to know what crime is happening in their neighborhood."
U.K. Policing and Criminal Justice Minister Nick Herbert specifically cited the Los Angeles example as a factor in creating the map of England and Wales. "Police.uk will make England and Wales world leaders in this field, with every citizen able to access details about crimes on their streets," Herbert said.
Not everybody is happy about the site. When it was first published, it showed Surrey Street in Portsmouth, on the south coast of England, as one of the most violent streets in the country, with 136 crimes in December alone.
In reality, the street is less than 350 feet long, and locals are amazed to hear they live in a crime hotspot.
"These maps are an utter joke. This is a quiet road tucked away and anyone can tell it's hardly Beirut," local Scott Mussen said, according to The Daily Mail.
Still, McCullough said that such maps do a lot of good, helping individuals and authorities make better-informed decisions.
"We need information to make decisions," McCullough said. "The public having information shouldn't be a bad thing. "

Thursday 25 February 2010

10060 Socket Error

To load and performance testers this will be a familiar message; however, it often causes much confusion.
This message is seen under the following conditions:

  • The server has run out of socket connections
  • The injector has run out of socket connections
  • Socket connections are timing out
  • There is a network policy limiting the number of connections per machine
Useful tools:
The netstat command will list all active connections on the machine.
The regedit command (Registry Editor) and the following parameters (if you're not going to use Quotium Daemon to manage the changes):
  • \HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters\MaxUserPort
  • \HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters\TcpTimedWaitDelay
Remedies:
Load injector: If the injector machine has not been optimised to be a load injector, use the Quotium Daemon to adjust the registry settings for:
  • MaxUserPort - this is 5000 as standard in Windows 7 and can be increased to 65534; this is true for any Windows OS. The parameter controls the maximum port number used when an application requests any available port, so by increasing this value you increase the number of sockets available to applications on the machine.
  • TcpTimedWaitDelay - this is set to 240 by default on Windows machines and can be set to anything between 30 and 300. Reduce the value to speed up the turnaround of available sockets. The parameter controls the length of time a connection stays in the TIME_WAIT state when being closed; while a connection is in the TIME_WAIT state, the socket pair cannot be reused. If you are not using the Quotium Daemon, a scripted alternative is sketched below.
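The sketch below sets the same two values directly with Python's winreg module (my own illustration; run it as Administrator on the injector and reboot for the change to take effect):

import winreg

TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS, 0, winreg.KEY_SET_VALUE) as key:
    # Raise the highest ephemeral port number so more client sockets are available.
    winreg.SetValueEx(key, "MaxUserPort", 0, winreg.REG_DWORD, 65534)
    # Shorten TIME_WAIT (allowed range 30-300 seconds) so closed socket pairs are recycled sooner.
    winreg.SetValueEx(key, "TcpTimedWaitDelay", 0, winreg.REG_DWORD, 30)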
Other remedies:
It may be that the server is overloaded - in this case you need to speak to the application architects
It may be that a network policy is stopping you from opening a large number of connections from the single workstation you are using as an injector. Try installing QTest on multiple machines and distributing the load across all of them.

Thursday 21 January 2010

Cloud Computing: What could this mean for testers?

A problem for test managers, especially those involved in performance testing, is that the platforms are not available on which to run the number of test systems they would like, such as a UAT system for each build or a system for performance testing. Those involved in performance testing will often have to pitch a very good case in order to get a like-for-like staging environment for performance testing. Many test departments are denied this, and performance testing is done out of hours or after UAT is complete, in the stages before go-live.

Sure, most IT departments now have some kind of virtualisation; however, even to get a virtualised server, internal procurement is often involved, as there are storage, resource utilisation and other capacity questions to be answered. In the cloud a server can typically be provisioned in less than 5 minutes and costs just cents to run. The cloud seems like a very good option as a test platform.

Companies such as Microsoft and Oracle have given organisations the option to use their premises, kit and infrastructure as testing platforms when very large performance tests have been a necessity; with the cloud this is no longer essential, as a more cost-effective, straightforward option is available.

Of course the cloud is a shared resource and is a different proposition to a purely hosted environment; there are issues such as performance, due to the fact that cloud machines are essentially virtual machines running in data centres on shared physical hardware (only with a nice web interface or API to facilitate management of these virtual machines). However, these issues can be overcome. For example, to performance test a cloud-based system it may be necessary to run several tests at different times of the day to ensure that an even set of results is achieved. We saw this when running the cloud-based test shown in our video here: www.youtube.com/user/Quotium. The second time we ran the test we found that the results were different from the first.

The cloud provides great resilience due to its distributed nature, vastly reduced costs and simplicity. Assuming that the issues surrounding the cloud are not a problem for an organisation, testers within that organisation could in fact lead it into a cloud computing environment – could this coin a new term, Test Driven Infrastructure? If a cloud infrastructure is used as a test environment then it would follow that the business would start to look at the cloud as an alternative to its hosted or in-house solutions, and the testers would have shown the way.

As a company we have found that the cloud answers many of our needs, including the ability to have scalable load platforms and an international presence that would previously have been difficult and expensive for us to achieve.

If you'd like to try cloud computing for yourself just google 'amazon ec2', get your credit card ready (it'll only cost you a few cents or possibly dollars - just remember to terminate the instance when you're finished, otherwise the dollars might add up) and start an EC2 instance. You then simply use a terminal services client to connect to it and use it as you would any other VM.

Monday 26 October 2009

Common HTTP Response Codes seen in Load Testing

A complete list of HTTP response codes can be found on Wikipedia and at w3.org.
What follows is not a comprehensive list, but some HTTP response codes commonly seen when load testing and, where possible, some reasons for seeing them.

HTTP 400 Bad Request
Officially: The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.

When / Why: The request in the script may be badly formatted. The best thing to do is check whether the request has any parameters that you have edited; if so, check them by debugging.

HTTP 401 Unauthorised
Officially: The request requires user authentication. The response MUST include a WWW-Authenticate header field (section 14.47) containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field (section 14.8). If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity might include relevant diagnostic information. HTTP access authentication is explained in "HTTP Authentication: Basic and Digest Access Authentication"

When / Why: Typically when load testing, many users are generated from the same machine. In a Microsoft example, the browser can use the current Windows logon credentials as credentials for the web site using NTLM. Therefore, if the login used during execution is different and does not have the same authorisation as the original login (from the recording phase), the server may deny access with a 401.
In the script you should use:
SetAuthentication(UserNameVariable,PasswordVariable,DomainVariable);

HTTP 403 Forbidden
Officially: The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.

When / Why: If you do not see the error when manually browsing but do see it when running a script, check that the script is recent. If there has been a configuration change on the server you may see this message: for example, if at one time a server hosted the site and now no longer does so, and can't or won't provide a redirection to the new location, it may send an HTTP 403 back rather than a more meaningful message.
Check that authentication is correctly set up in the script - see HTTP 401.
Also check that the browser you are simulating is allowed, as a security policy can ban certain types of traffic from a server.

HTTP 404 Not Found
When / Why: You may find that when you interact with the web site manually it does not appear to throw any HTTP 404 messages.
When running a load test script you may then see 404s in the response codes. This can be because an object (probably a page component) that the tool requested does not exist on the server - it may be a .gif, a .js JavaScript file or similar that is referenced by the page yet does not exist on the server. In a browser this would simply appear as an empty image placeholder in the case of an image, for example, or in the case of a JavaScript file you might see nothing at all if the script is redundant. As the tool specifically requests the object (just as the browser does) it will of course log the 404 if the object cannot be found.

HTTP 407 Proxy Authentication Required
Officially: This code is similar to 401 (Unauthorized), but indicates that the client must first authenticate itself with the proxy. The proxy MUST return a Proxy-Authenticate header field (section 14.33) containing a challenge applicable to the proxy for the requested resource. The client MAY repeat the request with a suitable Proxy-Authorization header field (section 14.34). HTTP access authentication is explained in "HTTP Authentication: Basic and Digest Access Authentication"

When / Why: The script was probably captured with the browser already pointing to a proxy server (see the browser's network settings). See the information on HTTP 401 for reasons why this might happen.
Typically it's best to avoid running a load test through a proxy server, especially if production load will not be routed through that proxy. Ways to avoid the proxy server are to remove the part of the script that states use of the proxy (often internal applications are reachable even while bypassing the proxy server); if that doesn't work, move the injection point to a location in the network where the proxy server can be bypassed (perhaps the same VLAN as the web server).

HTTP 500 Internal Server Error
Officially: The server encountered an unexpected condition which prevented it from fulfilling the request.

When / Why: You'll often see this after an HTTP POST request, and it usually means that the POST has not been formed correctly.
There can be a number of reasons for this, including the request being badly formed by the tool - or at least not formed as expected by the server. More typically it's because the POSTed form values are incorrect due to incorrect correlation / parameterisation of form variables.
For example: in a .Net application a very large __VIEWSTATE value is passed between the browser and server with each POST; this is a way to maintain state and puts the onus of state ownership on the browser rather than the server. (This can have performance implications which I won't go into here.) If this value is not parameterised correctly in the script (there can be more than one __VIEWSTATE) then the server can become confused (it is sent erroneous requests) and respond with a 500 Internal Server Error.
A 500 error usually originates from the application server part of the infrastructure.
It's not just .Net parameters that can cause this. Items such as badly formed dates, incorrectly formatted fields and badly formatted strings (consider spaces replaced with + characters) and so on can all cause HTTP 500 errors.
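As an illustration of correct __VIEWSTATE handling, the sketch below (generic Python against a hypothetical .Net page, rather than tool-specific script code) reads the current value out of the page the server has just returned and echoes it back in the POST; replaying a stale, hard-coded value is exactly what provokes the 500:

import re
import requests

session = requests.Session()
page = session.get("http://example.local/OrderForm.aspx")  # hypothetical .Net page

# Pull the current __VIEWSTATE out of the response HTML (remember there can be more than one).
viewstate = re.search(r'id="__VIEWSTATE" value="([^"]*)"', page.text).group(1)

# Echo it back with the rest of the form fields.
session.post(
    "http://example.local/OrderForm.aspx",
    data={"__VIEWSTATE": viewstate, "Quantity": "2", "Submit": "Order"},
)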

HTTP 503 Service Unavailable
Officially: The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.

When / Why: Typically this will be due to the allowed number of concurrent connections on the server and is usually down to a configuration or licence setting. For example, IIS running on a non-server version of Windows is limited to 10 concurrent connections - after this point it will deliver a 503. There is a temptation in load testing to overload the application under test, so it's worth revisiting your non-functional requirements - will the production server ever see this number of concurrent connections?

References - w3.org

Friday 2 October 2009

How do you determine the number of concurrent users you will need in your test?

It's quite a common question and sometimes a very difficult one to answer:
How do you estimate how many virtual users you should simulate in a performance test?
However, if you have the following information you can make an estimate.

You need to know:

a: How many visits your site gets, or how many visits you're expecting, per day (choose a relevant day - e.g. if you're a business-to-business web site choose a weekday; if you're a retail site perhaps a weekend day would be more appropriate)

b: The average time a user stays on the site

c: The ratio between a busy period and a slack period

d: How many minutes in a day (or at least how many minutes you consider the site to be active in a day)

You can then apply the following rule:
( a x b x c) / d = concurrent users

For example: you have a site which sees 30,000 visits each day, the average time a user spends on the site is 10 minutes, and busy periods are 10 times busier than slack periods. This means that the concurrent user rate is (30,000 x 10 x 10) / 1440 = 2083 concurrent users.
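The same rule expressed as a small helper function (a Python sketch; the parameter names are mine):

def concurrent_users(visits_per_day, minutes_on_site, busy_to_slack_ratio, active_minutes_per_day=1440):
    """(a x b x c) / d, as described above."""
    return (visits_per_day * minutes_on_site * busy_to_slack_ratio) / active_minutes_per_day

print(concurrent_users(30000, 10, 10))  # ~2083, matching the worked example above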

I won't claim to have come up with this formula myself - it's from Performance Planning and Deployment with Content Management Server, a paper about MS's CMS system. The formula will of course apply to any web site, not just those hosted on that server.

Tuesday 29 September 2009

Should you automate your tests?

Most IT departments behave like external software engineering companies within their own organisations.

How the IT department gets involved in the company's business issues is fundamental. Indeed, it can be involved in many tasks, such as:
- Defining the business needs
- Developing the applications
- Choosing the appropriate solutions on the market
- Setting and managing appropriate infrastructures
- Managing application & infrastructure resources everyday

What's more, the SLAs between the IT department and the business departments reinforce this role of internal service provider.

Once an application is developed or acquired, its functionality and its performance must obviously be qualified and validated.

This document deals with the relevance of automating – or not – functional tests and performance tests in this context. It also covers application & infrastructure monitoring during the production cycle.

Functional Tests

Your goal is to check that the application meets the specified requirements. Then, throughout the application lifecycle, regression tests aim to validate that the application still works despite its evolutions. Beyond the unit tests performed by developers or project managers, an independent department dedicated to validation / qualification / testing should certify that the applications run properly and meet the specified requirements.

So the first question is: should you automate those functional tests or not? The second question: will you be productive enough that your initial investment in functional test automation is justified? Everybody knows that automated tests are 5 to 7 times more expensive than manual tests. They can be even more expensive when the first testing campaigns are run by teams with little experience of automation tools, whereas a team dedicated to validation & qualification with good programming experience could reduce the costs.

Thus this investment will be recouped more or less quickly depending on the frequency of the minor changes included in the maintenance of the applications. The word “minor” is extremely important: any major change to the application architecture or the GUI can impact the test scripts considerably. Sometimes you even have to rewrite them all.

Several contexts fully justify functional test automation. For instance, software vendors have to maintain several versions of their products at the same time, in several languages, for several system environments and for several browsers. Moreover, they release several patches every day. In this case test automation is vital – it is an integral part of the product lifecycle.

In the context of a specific application developed internally, you obviously have to compare the initial costs – acquiring the solution, training the testers and creating the scripts – with the benefits of test automation, given the frequency of the tests run on minor changes.

Performance Tests

In this case your goal is to check that the response times experienced by end users meet the specified requirements. Respecting those requirements is vital when response times have a strong impact on business goals or on the users' work. For instance, if your application lets customers place orders online, overall order handling can take longer because of poor initial response times, and sales figures can fall if the data entry component is unavailable.

Whereas it is easy to run functional tests manually, you can hardly run load tests manually. You would have to ask dozens or hundreds of end users to stop their work to run the tests; they would have to be synchronised on a common clock and perform the same tasks, such as clicking or typing in data, at the same time. Each task would have to be repeated whenever a correction or a change was made.

This approach requires so much energy and so many resources that it is impractical for most companies.

Several solutions on the market are reliable and let you obtain the same results without requiring real end users. They allow you to create scripts in half a day if you work in HTTP environments.

The cost of load test automation is 10 to 30 times lower than the cost of functional test automation. It is recouped all the more rapidly because it allows you not only to test applications but also to validate infrastructure changes – server, network, RDBMS, system versions, browsers, etc.

Application monitoring during the production cycle

For business teams the quality of the services delivered by the IT department can be measured through 3 main criteria:
• Availability: is the application available?
• Response times: do the response times fulfil the business requirements? As outlined above, response times can have a strong impact on productivity and even on sales figures.
• Accuracy: the end user needs quick responses and, above all, accurate responses – not “404 page not found” results, for instance.

An application – when it is delivered to the business teams – relies on a complex chain being set up, including most of the time an application server, database server, web server, router, firewall, provider, etc. When only one component of this chain is defective, even if the problem is local, at least one of the criteria above is not met. From the IT department's viewpoint everything seems to work, but the end user does not agree.

As a consequence, you must check that the components of your applications are healthy from both the infrastructure viewpoint and the end-user viewpoint. What's more, this correlated approach will sometimes allow you to react before the business departments suffer problems. Finally, a personalised map will show where service is deteriorating, allowing problems to be located geographically and the teams concerned to be alerted immediately.

by Daniel Melloul, Quotium