Load Balancing Horizon View – Failure Testing

In the last post, Load Balancing Horizon View – Design, we looked at the differences between DNS Round Robin, Windows Network Load Balancing and load balancers, and at the design concepts for internal and external use.

In this post we will focus on testing failure scenarios to understand the impact of various components failing within a design.

Lab Setup

The Horizon View environment is configured as follows:

  • 2 x NetScaler VPX-Express in High Availability
  • 2 x Horizon View Security Servers
  • 2 x Horizon View Connection Servers

For the NetScaler configuration I followed the excellent Load Balancing VMware View with NetScaler guide by Dale Scriven, who runs the blog vhorizon.co.uk. The only change I made was to add a TCP Service group for 8443 (HTML5).

Service Groups

In the interests of sharing the configuration, below are extracts from each area.

Internal Logical Design

VMFocus View Internal Design HA v0.1

External Logical Design

VMFocus View Remote Access Design HA v0.1

vSphere Web Client

vSphere Client View

Horizon View Administrator

View Client View

NetScaler VPX-Express Admin

NetScalerClient View

Internal Connection Server Failure Scenario – Secure Gateway/Connection Unticked

Connection Server Unticked

I will have two connections to my Desktop Pool, both via View Client.

Table to Show Expected Results – Internal Connection Server Failure – Secure Gateway/Connection Unticked

Criteria | Expected Result | Recovery Time
Connection Server Power Off | Desktop remains connected | n/a
Connection Server Shut Down | Desktop remains connected | n/a
NetScaler VPX-Express Power Off | Desktop remains connected | n/a
NetScaler VPX-Express Shut Down | Desktop remains connected | n/a

Table to Show Actual Results – Internal Connection Server Failure – Secure Gateway/Connection Unticked

Criteria | Actual Result | Recovery Time
Connection Server Power Off | Desktop remains connected | n/a
Connection Server Shut Down | Desktop remains connected | n/a
NetScaler VPX-Express Power Off | Desktop remains connected | n/a
NetScaler VPX-Express Shut Down | Desktop remains connected | n/a

Not much to say really, everything performed as expected.

Internal Connection Server Failure Scenario – Secure Gateway/Connection Ticked

Connection Server Ticked

Again, I will have two connections to my Desktop Pool, both via View Client.

Table to Show Expected Results – Internal Connection Server Failure – Secure Gateway/Connection Ticked

Criteria | Expected Result | Recovery Time
Connection Server Power Off | Desktop session disconnect, then manual reconnect | 20 seconds
Connection Server Shut Down | Desktop session disconnect, then manual reconnect | 25 seconds
NetScaler VPX-Express Power Off | Desktop session disconnect, then manual reconnect | 20 seconds
NetScaler VPX-Express Shut Down | Desktop session disconnect, then manual reconnect | 25 seconds

Table to Show Actual Results – Internal Connection Server Failure – Secure Gateway/Connection Ticked

Criteria | Actual Result | Recovery Time
Connection Server Power Off | Desktop session disconnected after 2 seconds, manual reconnect | 28 seconds to be logged back into desktop
Connection Server Shut Down | Desktop session disconnected after 4 seconds, manual reconnect | 35 seconds to be logged back into desktop
NetScaler VPX-Express Power Off | Desktop session disconnected after 5 seconds, manual reconnect | 33 seconds to be logged back into desktop
NetScaler VPX-Express Shut Down | Desktop session disconnected after 9 seconds, manual reconnect | 41 seconds to be logged back into desktop

The Citrix NetScaler VPX appliances offer high availability by sharing their configuration and virtual IP address. They do not preserve sessions when an appliance fails, so a session loss is to be expected.
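
To put a number on the gap between the expected and actual tables above, here is a small sketch that works out how far each test overshot my estimate. The figures are copied straight from the two tables; the names `results` and `overshoot` are just illustrative.

```python
# Expected vs actual recovery times in seconds, copied from the
# "Secure Gateway/Connection Ticked" tables above.
results = {
    "Connection Server Power Off": (20, 28),
    "Connection Server Shut Down": (25, 35),
    "NetScaler VPX-Express Power Off": (20, 33),
    "NetScaler VPX-Express Shut Down": (25, 41),
}


def overshoot(expected: int, actual: int) -> int:
    """Seconds by which the measured recovery exceeded the estimate."""
    return actual - expected


deltas = {name: overshoot(exp, act) for name, (exp, act) in results.items()}

for name, delta in deltas.items():
    print(f"{name}: +{delta} seconds over the estimate")
```

Every test took longer than estimated, and the graceful shut downs overshot by more than the matching power offs.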

External Failure Scenario Expected Results

I will have three connections to my Desktop Pool: two via View Client and one via Blast (HTML5). The Horizon View Administrator will be checked before each test to see which Security Server has the heaviest load, and that server will be the one tested.

View Test

After each test Horizon View Administrator will be checked to find which Security Server has the heaviest load to perform the next test.

Criteria | Expected Result | Recovery Time
Security Server Power Off | Desktop session disconnect, then manual reconnect | 40 seconds
Security Server Shut Down | Desktop session disconnect, then manual reconnect | 40 seconds
Connection Server Power Off | Desktop session disconnect, then manual reconnect | 40 seconds
Connection Server Shut Down | Desktop session disconnect, then manual reconnect | 40 seconds
NetScaler VPX-Express Power Off | Desktop session disconnect, then manual reconnect | 60 seconds
NetScaler VPX-Express Shut Down | Desktop session disconnect, then manual reconnect | 60 seconds

External Failure Scenario Actual Results

Criteria | Actual Result | Recovery Time
Security Server Power Off | Desktop session disconnected after 14 seconds, manual reconnect | 52 seconds to be logged back into desktop
Security Server Shut Down | Desktop session disconnected after 12 seconds, manual reconnect | 55 seconds to be logged back into desktop
Connection Server Power Off | Desktop session disconnected after 19 seconds, manual reconnect | Reconnected at 109 seconds with a black desktop background; timeout message at 134 seconds. Second manual reconnect: reconnected at 252 seconds with a black desktop background; timeout message at 283 seconds. This loops via View Client. Can connect to the desktop via Blast (HTML5).
Connection Server Shut Down | Desktop session disconnected after 24 seconds, manual reconnect | Reconnected at 118 seconds with a black desktop background; timeout message at 141 seconds. Second manual reconnect: reconnected at 276 seconds with a black desktop background; timeout message at 301 seconds. This loops via View Client. Can connect to the desktop via Blast (HTML5).
NetScaler VPX-Express Power Off | Desktop session disconnected after 4 seconds, manual reconnect | 39 seconds to be logged back into desktop
NetScaler VPX-Express Shut Down | Desktop session disconnected after 19 seconds, manual reconnect | 57 seconds to be logged back into desktop

When a View Client connects externally, the NetScaler VPX passes traffic to the least loaded Security Server.  Remember a Security Server is bound to a single Connection Server and that ALL traffic is proxied via the Security Server.

When the first Security Server fails you are disconnected (as expected). When the View Client is launched again, the NetScaler VPX routes traffic via the secondary Security Server and the secondary Connection Server.

  1. Everything OK: NetScaler > Security Server 01 > Connection Server 01 > Desktop
  2. Failed Security Server: NetScaler > Security Server 01 > No Access To Connection Server 01
  3. Reconnect: NetScaler > Security Server 02 > Connection Server 02 > Desktop
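
The three steps above can be sketched as a toy model in Python. This is only an illustration of "least loaded, skipping anything that is down", not how the NetScaler actually implements it; the `SecurityServer` class, the session counts and the `pick_least_loaded` helper are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SecurityServer:
    name: str
    connection_server: str  # each Security Server is paired with one Connection Server
    sessions: int = 0       # illustrative load figure, not a real NetScaler counter
    healthy: bool = True


def pick_least_loaded(servers):
    """Return the healthy server with the fewest sessions, or None if all are down."""
    candidates = [s for s in servers if s.healthy]
    return min(candidates, key=lambda s: s.sessions, default=None)


def path(server):
    """Describe the traffic path through the chosen Security Server."""
    if server is None:
        return "NetScaler > no Security Server available"
    return f"NetScaler > {server.name} > {server.connection_server} > Desktop"


pool = [
    SecurityServer("Security Server 01", "Connection Server 01", sessions=1),
    SecurityServer("Security Server 02", "Connection Server 02", sessions=2),
]

# Step 1: everything is up, so the least loaded server (01) is chosen.
before_failure = path(pick_least_loaded(pool))

# Steps 2 and 3: Security Server 01 fails, so the reconnect lands on 02.
pool[0].healthy = False
after_failure = path(pick_least_loaded(pool))

print(before_failure)
print(after_failure)
```

The point of the model is that the reconnect takes a completely new path, which is why the session has to be re-established rather than resumed.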

What I found most interesting was the Connection Server failures. In this scenario, the Security Servers are up and a Connection Server goes down.

Trying to reconnect via the View Client lets you authenticate successfully, but you then receive a ‘black desktop screen’ followed by a connection timeout.

Looking at the connection status of the NetScaler VPX-Express services, only the HTTPS SSL Bridge to 443 on Security Server 01 is down and the rest of the services are up.

Failure Connection Server Power Off 01
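
If you want to cross-check the service states from outside the NetScaler console, a rough TCP probe is enough for a lab. The sketch below is an assumption-laden helper rather than anything the NetScaler monitors do: a successful TCP connect only proves the port is listening, it cannot test PCoIP on 4172 UDP, and the hostname in the usage comment is a placeholder.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def service_status(host, ports, probe=tcp_port_open):
    """Map each port to 'up' or 'down' using the supplied probe function."""
    return {port: "up" if probe(host, port) else "down" for port in ports}


# TCP ports used in this lab: 443 (HTTPS), 4172 (PCoIP) and 8443 (Blast HTML5).
VIEW_TCP_PORTS = (443, 4172, 8443)

# Example usage against a placeholder hostname:
#   print(service_status("securityserver01.lab.local", VIEW_TCP_PORTS))
```

The `probe` parameter is there so the function can be exercised without touching the network, for example by passing in a stub that reports only 4172 as up.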

When the NetScaler VPX polls the Security Server on 443 HTTPS, 4172 TCP and 4172 UDP, it sees that the PCoIP services on 4172 are up. Because our Persistency Group is Source IP and we are connecting back over the same ports, it tries to reconnect to the original TCP session.
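
To make that reasoning concrete, here is a minimal sketch of my hypothesis in Python. It is not the NetScaler's real persistence logic: the `monitors` and `persistence` tables, the `route` function and the client address are all invented for illustration, and the model simply assumes that a persisted backend is honoured on any port whose monitor is still up.

```python
# Monitor state per backend and port during the Connection Server 01 failure:
# only the 443 service on Security Server 01 is marked down.
monitors = {
    "Security Server 01": {443: False, 4172: True},
    "Security Server 02": {443: True, 4172: True},
}

# Source IP persistence: this (made-up) client was last served by Security Server 01.
persistence = {"203.0.113.10": "Security Server 01"}


def route(client_ip: str, port: int) -> str:
    """Pick a backend for one port, preferring the persisted backend if that port is up."""
    pinned = persistence.get(client_ip)
    if pinned and monitors[pinned].get(port, False):
        return pinned
    # Otherwise fall back to the first backend whose monitor for this port is up.
    for name, ports in monitors.items():
        if ports.get(port, False):
            return name
    raise RuntimeError(f"no backend is up for port {port}")


auth_backend = route("203.0.113.10", 443)    # authentication over HTTPS
pcoip_backend = route("203.0.113.10", 4172)  # PCoIP desktop session

print(f"443  -> {auth_backend}")
print(f"4172 -> {pcoip_backend}")
```

Under these assumptions authentication lands on Security Server 02 while PCoIP is still sent to Security Server 01, which would be consistent with logging in successfully and then getting a black screen.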

Connecting via Blast on HTTPS 8443 works. I imagine this is because a new TCP connection is established to Security Server 02, which in turn connects via Connection Server 02, which is up.

Disconnecting from the Blast Desktop, I was able to reconnect to my desktop using View Client.

Final Word

Hopefully this post has gone some way to helping you understand the failure scenarios. Knowing what to expect is key, as it allows you to set expectations with both the business and users.

12 thoughts on “Load Balancing Horizon View – Failure Testing”

  1. I am also trying to setup a similar LAB configuration for the purpose of testing.

    I arrived at your blog from http://vhorizon.co.uk article on how to configure Netscaler vpx to load balance View Security/Connection servers.

    Thank you for sharing your experience and highlighting what is possible and what is not (when using Netscaler) in terms of having a reliable and highly available load balancing solution at a relatively low cost for SMB.

    What I was most curious about is how did you design the DMZ port configurations etc. In your lab setup do you have the Netscaler SNIP configured to reside in your Private LAN or it also sits inside the DMZ. In this case I am most interested in finding out how and in which direction did you open the 443/4172 ports back to your View Security and Connection servers located in your Private LAN.

    A diagram would be most helpful.

  2. I forgot to add, have you deployed Netscaler in customer environments for View load balancing? Is it supported by VMware?

    1. VMware does support load balancing between Connection Servers and Security Servers. As far as I’m aware it is only NetScaler and F5 which support load balancing View.

  3. Yes, I was aware of F5 being supported along with VMware’s own vCNS. Although Netscaler is capable of providing load balancing for View, I haven’t found any best practice technical white papers from VMware or from Citrix on how to configure Netscaler for fronting View client access.
    Thanks I will keep this in mind when I am advising clients.

  4. Craig, you know I have been wondering since the Security Server basically acts as a “reverse proxy”, is it not possible to eliminate it by letting the Netscaler to act as the logon point for authentication which is what the SS is doing in effect. Have you looked into this?

  5. When you deploy Netscaler to load balance View environments for external access, where do you usually place the Security Servers? I take it in the DMZ, or in the Private LAN along with the Connection servers?

  6. Craig,
    I see at the top a screenshot of the additional Service groups adding blast HTML5 TCP on 8443. At the bottom these blast HTML5 connections are configured as SSLBRIDGE using 8443. Did it change to one or the other and just not get updated on this page? which one should work here?
    Thanks!

    1. Hi Scott, thanks for reading and good spot!

      I have taken down the NetScalers in my lab, however I’m pretty sure it was SSLBRIDGE
