Shared posts

14 Dec 05:14

Don’t Get Slimed By Bad Parameter Sniffing

by Tracy McKibben

ghostbusters_slimer

Listen! Do you smell something? Are you suffering from random, unexplained slowdowns? Users complaining about queries that sometimes run fast, sometimes slow, for no obvious reason? Maybe it’s gremlins, or maybe your server is haunted, but more likely it’s just bad parameter sniffing. Before you call in the Ghostbusters, let’s discuss parameter sniffing, both good and bad, and how to combat the bad variety.

What is parameter sniffing?

Parameter sniffing is a fundamental part of the SQL Server query engine. In order to become a troubleshooting and performance tuning expert, you need to have a solid understanding of what it is and how it works. I’ll give you the simple explanation of what parameter sniffing is. There are better, more detailed explanations available out there, but for the purpose of this article, a simple explanation will suffice. Assume that you have a stored procedure:

CREATE PROCEDURE uspSearchForClientAddressByClientID
  @ClientID INT
AS
BEGIN
  SELECT ClientID, ClientAddress
  FROM ClientAddresses
  WHERE ClientID = @ClientID;
END;

Assume also that you have ten (10) clients with ID’s ranging from 1 to 10. Client number 1 has 900 locations, each represented by a row in the ClientAddresses table. The other clients, 2 thru 10, only have one location each.

The ClientAddresses table is a heap (a table with no clustered index), with one non-clustered index on the ClientID column.

Let’s execute the stored procedure:

EXECUTE uspSearchForClientAddressByClientID
  @ClientID = 9;

Gears whir, buzzers buzz, spinners spin, as SQL Server figures out the fastest way to run our stored procedure. Looking at the parameter value that we provided, the indexes on the table, and the statistics describing the data contained in the table and the indexes, SQL compiles a query plan and then executes the procedure according to that plan. It also remembers the plan for later use. This, in a nutshell, is parameter sniffing.

In this case, the statistics tell SQL Server that there is only one row with a client ID of 9, and an index is available to point us directly at that single row. The resulting query plan will do an index seek against the non-clustered index, finding our one row, then do an additional lookup against the heap to get the ClientAddress column.

client9plan

This plan will be cached, and all subsequent executions of the stored procedure will do the same index seek/heap lookup operation, regardless of the value specified for the @ClientID parameter.

What is bad parameter sniffing?

sniffing

Let’s execute the procedure again, this time looking for a client ID of 1, the client with 900 addresses. There’s already a plan in the cache for the stored procedure, so that plan is retrieved and used to run our query. This time however, there are 900 rows to locate. If you recall, the query plan, thinking that there’s only one row to retrieve, does an index seek and a heap lookup to find one row. In this case, it’s going to do 900 index seeks, and 900 heap lookups (notice the “thick lines” in comparison to the first query).

client1plan

See how widely different the estimated number of rows is from the actual number of rows? That’s bad parameter sniffing at work. The plan is not optimized for the parameter value that we provided this time, it’s optimized for another parameter value with a drastically different data profile.

Conversely, if the first procedure execution uses client ID 1 as its parameter, the query optimizer knows (based on statistics) that there are 900 rows to retrieve. A table scan will be more efficient that seeking each row one by one, so the compiled plan does just that. You can see in the plan that the estimated row count was 900, and 900 rows were actually returned.

client1badplan

Unfortunately, this means that every execution of the procedure, even for the single-row client ID values, would also do a table scan. Running the procedure for client ID 9, the estimated row count is still 900, but only one row is actually returned.

client9badplan

Definitely not preferred behavior, and yet another example of bad parameter sniffing.

Stored procedures vs standalone queries

One common technique for reducing the effects of parameter sniffing in a stored procedure is to declare local variables within the procedure, assign your parameter values to those variables, and use the variables in your query instead of the parameters. Something like this:

CREATE PROCEDURE uspSearchForClientAddressByClientID
  @ClientID INT
AS
BEGIN
  DECLARE @PassedClientID INT;

  SET @PassedClientID = @ClientID;

  SELECT ClientID, ClientAddress
  FROM ClientAddresses
  WHERE ClientID = @PassedClientID;
END;

When presented with a procedure constructed this way, SQL Server is unable to “sniff” the incoming parameter value. For lack of any better direction, it will rely solely on the table and index statistics to figure out an “average” value for the incoming parameter, the value that it thinks is most likely to be provided. It will then produce a nice, generic query plan. The result is a plan that works “OK” for most executions, but it’s not optimized to run as fast as possible for any specific parameter value. If I execute this stored procedure for client ID 9, for which there exists only one row, the query plan indicates a table scan, not an index seek.

localvars

Why? Because the statistics tell the query optimizer that of the 909 rows in the table, 90.9% of them are for client ID 1. Odds are that client ID 1 is going to be searched for most frequently, so that’s what the plan is optimized for.

Another option available for combating bad parameter sniffing in a stored procedure is the WITH RECOMPILE option:

CREATE PROCEDURE uspSearchForClientAddressByClientID
  @ClientID INT
  WITH RECOMPILE
AS
BEGIN
  SELECT ClientID, ClientAddress
  FROM ClientAddresses
  WHERE ClientID = @ClientID;
END;

The WITH RECOMPILE option, if you haven’t guessed already, forces the stored procedure to compile a new query plan each time it is executed. This completely eliminates the possibility of having a “bad” plan cached, but it can add significant overhead to the execution of large or complex stored procedures. Use this with caution.

Unfortunately, bad parameter sniffing isn’t limited to stored procedures. Standalone queries can suffer from it as well.

Parameterized queries

Many of today’s applications don’t use stored procedures at all – it’s part of the timeless conflict between DBA’s and developers. The bias today is toward parameterized queries. You might see something like this used to perform the same client address search that appears above:

SqlConnection conn = new SqlConnection(_connectionString);
conn.Open();
string s = "SELECT ClientID, ClientAddress " + 
  "FROM ClientAddresses WHERE ClientID = @ClientID";
SqlCommand cmd = new SqlCommand(s);
cmd.Parameters.Add("@ClientID", ClientID);
SqlDataReader reader = cmd.ExecuteReader();

A query submitted in this fashion is sniffed, compiled, and cached in exactly the same way that a stored procedure is. It is equally vulnerable to bad parameter sniffing, and it’s even harder for a DBA to fix under fire if it’s causing a problem problem. With the stored procedure, if necessary, I can directly modify the procedure to add WITH RECOMPILE or use the local variables trick to make it behave. With this parameterized code, I can’t do a thing. I need to get a developer involved, a code change is needed, and a new app must be deployed. I do have some options available, which we’ll discuss later in this article, but you can see that I lose a lot of maneuverability when I don’t have stored procedures to work with.

Some of you at this point might be thinking parameters are a bad idea – they seem to cause a lot of instability and potential problems with the query optimizer. Why use them at all? Why not just generate dynamic SQL statements with hard-coded lookup values? If I want to find addresses for client ID 9, why not just have my application issue this query (note that this is a VERY simple example and is provided only to help explain parameterization and plan reuse):

SELECT 
  ClientID, 
  ClientAddress
FROM ClientAddresses
WHERE ClientID = 9;

There are no parameters involved, so there should be no chance of bad parameter sniffing, right? I can issue the same query with a different client ID to find addresses for another client, again with no parameters. Problem solved! Ummm, no offense, but in the words of the great Peter Venkman, this reminds me of the time you tried to drill a hole through your head. Remember that?

What we’ve just described is known as an “ad hoc” query. There’s nothing inherently “bad” about them, it’s just that they sort of defeat the purpose of caching query plans for reuse. Each time somebody looks for client ID 9, that query will reuse the plan that was cached the last time somebody looked for client ID 9. Searching for client ID 8? That’s technically a different query (different hard-coded value, different query text), so it’s going to compile and cache its own plan. If you have 10 clients in your table, you’ll potentially have 10 different query plans cached for client lookups. What if you have a million clients? Yep, that’s a million query plans. Time for a story – gather ’round kiddies…

forcedparameterization

Once upon a time there was a company. This company employed a small army of developers, a handful of DBAs, and one Real SQL Guy. The Real SQL Guy had things humming along pretty nicely, in spite of the fact that the army of developers were constantly assaulting SQL Server with an OLTP workload that was “ad hoc heavy”. One day, the Real SQL Guy decided to upgrade the production server to SQL Server 2012, so he walked down to the computer store on the corner and purchased a monstrous machine – 64 cores, 2TB of RAM, a real beast. SQL Server would surely FLY on this hardware. He plugged in the new machine, restored backups of the OLTP databases onto the new SQL instance, pointed the apps at the new server, and watched as the machine immediately hit 100% CPU utilization and died. Frantically, knobs were turned, buttons were pushed, statistics were updated, all to no avail. Fearing that he’d lost his mojo, the Real SQL Guy called in the REAL SQL Guys, direct from Microsoft. After a week on-site, even they were stumped – the consensus was that the problem had to do with cache management, but they weren’t sure why. Near the end of that week, the Real SQL Guy timidly (as is his manner) suggested that we try a rarely used option – forced parameterization. The mojo was back…

Forced parameterization

So what is forced parameterization? Refer back to the example above where we’re issuing ad hoc queries with the client ID values hard-coded into the query text. In this ridiculously simple example, SQL Server is going to recognize that the client ID value is actually a parameter, even though you haven’t declared it as such, and will process the query as if it were properly declared with parameters. With more complex queries however, it may not do that – the optimizer will only spend a finite amount of time analyzing a query before it gives up and just runs it. Forced parameterization changes this behavior – it forces the optimizer to spend more time thinking about the query before running it, with the hope that it will be able to come up with a reusable query plan.

In the previous tale of woe, where I told you about an upgrade gone bad, the problem stemmed from a change in the way the plan cache is managed in SQL Server 2012. Without going into the technical details, let’s just say that SQL Server 2012 is more aggressive about cleaning out unneeded plans from the cache. With our ad hoc heavy workload, SQL Server was getting slammed with literally THOUSANDS of ad hoc queries per second. That’s thousands of requests to compile a plan, store it in the cache (where it will never be used again), while at the same time the cleanup process is trying to remove the plans that it knows are ad hoc and not needed anymore. SQL Server essentially became so busy trying to manage the cache that it couldn’t do any real work.

By enabling forced parameterization, those thousands of ad hoc queries suddenly became queries that SQL was able to create reusable plans for. The tug-of-war between the incoming new plans and the cleanup of the old plans stopped, and the server was able to stay alive and actually do work.

Am I suggesting that YOU should enable forced parameterization on your databases? No, I’m not. If you think you’re suffering because of ad hoc queries and excessive compilations and poor plan reuse, then yes, consider forced parameterization, but understand what you’re doing and why. Forced parameterization isn’t free – it increases your exposure to bad parameter sniffing, because it forces more queries to become parameterized. As the use of application frameworks like LINQ and nHibernate increases, the problem of ad hoc queries is going to affect more and more of our SQL Server systems. Developers will argue that nHibernate is great and would NEVER cause problems like this. If you believe that, you probably also believe in UFOs, astral projections, mental telepathy, ESP, clairvoyance, spirit photography, telekinetic movement, full trance mediums, the Loch Ness monster and the theory of Atlantis.

Query hints

queryhints

Alright – we’ve established that stored procedures are no good because they can suffer from bad parameter sniffing. We’ve established that parameterized queries are no good because they too can suffer from bad parameter sniffing. We’ve established that ad hoc queries are no good because they can cause problems with the plan cache. We’ve also established that forced parameterization can reduce the problem with ad hoc queries – by making them vulnerable to bad parameter sniffing. Seems like there’s no way to win, doesn’t it?

It’s not really that bad. There are several query hints available to us to help control how a query is compiled. I could dedicate an entire blog post to just query hints, so I’m not going to cover them all here. I do want to mention one specifically however – the OPTIMIZE FOR hint.

Remember the local variables trick that I talked about earlier, where you use local variables in a stored procedure to pass the incoming parameter values into your query? That trick causes the query to be compiled using a “generic” set of parameters that are based solely on the statistics on the underlying table(s). You can achieve the same effect by using the OPTIMIZE FOR UNKNOWN query hint.

You can also use OPTIMIZE FOR to force a query to compile a plan that is optimized for a specific value. Think about the ClientAddresses table that we were querying earlier. We know that client 1 has 900 address records and that queries looking for that client are going to prefer a table scan. We also know that clients 2 thru 9 each only have one address record and will prefer an index seek. Knowing our workload, we know that the vast majority of the ClientAddresses lookups are done for clients 2 thru 9 – lookups for client 1 are rare. We can use OPTIMIZE FOR (@ClientID = 2) to force the query to compile a plan that is optimized for one of the single-row clients. You can (and should) read a lot more online about the OPTIMIZE FOR query hint – better writers than me have better explanations to offer.

Query hints are powerful tuning tools that you can use, but they require you to have access to the queries. If the queries are being generated by an application, you may not have that access. You might try working with the developers to include query hints in some of the problem queries, but that could easily lead to human sacrifice, dogs and cats living together… mass hysteria! There’s a safer option that will keep the dogs and cats separate – plan guides.

Plan guides

If you have a query that usually runs well but seemingly at random will suddenly fall on its face, you’re likely dealing with a bad query plan resulting from bad parameter sniffing. Consider this graph from Ignite, showing this very behavior as witnessed on one of my production servers.

badplan

The green bars represent a query that would, two or three times a day, receive a bad plan from the optimizer. This query is run thousands of times an hour and when it would run with a bad plan, it was a significant load on the server. After trying various query hints, updating statistics more frequently on a couple of tables, some Gozer worshiping (surprisingly uneffective), I decided to force a good plan through the use of a plan guide. Can you tell from the graph which day I did this on?

A plan guide can take a couple of different forms. In one form, a plan guide is simply a way for you to tell the optimizer “Hey, every time you see this query text, I want you to treat it as if this specific query hint were specified in the query text”. For example, you can force the use of OPTIMIZE FOR UNKNOWN on a query that you don’t have access to in order to add the query hint directly to the query text.

Another form of plan guide is that where you have a known good query plan in the plan cache, and you want to make sure that plan is always used to execute a particular query. By using the plan handle for that query plan, you can tell the optimizer “Hey, every time you see this query text, don’t think about how to run it, just always use this query plan”. That’s what I did to fix this particular problem. Let’s have a look at how I did it.

First, I had to find the plan handle for my known good plan. During a period of time that I knew the query was running correctly, I ran this query:

-- Use this SELECT statement to determine the plan_handle for the desired query plan
SELECT * 
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS qp
WHERE qt.text LIKE N'%OrderItemEvents . Event_ID = OrderItemEventRequestor . OrderItemEventID%';

That returns various pieces of information (including the plan handle) about the query containing the text snippet that I specified. Using that plan handle, I then ran this:

-- Create a plan guide for the query using the plan_handle for the desired query plan
DECLARE @plan_handle	VARBINARY(64);
DECLARE @offset			INT;

SELECT 
    @plan_handle	= qs.plan_handle, 
    @offset			= qs.statement_start_offset
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS qt
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS qp
WHERE qs.plan_handle = 0x060006001E371516A00DAFAD2101000001000000000000000000000000000000000000000000000000000000

EXECUTE sp_create_plan_guide_from_handle 
    @name =  N'OrderItemEvents_PlanGuide',
    @plan_handle = @plan_handle,
    @statement_start_offset = @offset;

Done. From now on, until I remove that plan guide, this particular query will ALWAYS use the known good plan that is currently in the plan cache. As you can see in the Ignite graph, it didn’t misbehave again. Back off man, I’m a data scientist.

Parameter sniffing is a necessary part of the query engine and most of the time, it works and does the job it’s supposed to do. When it doesn’t work, it can cause some serious problems. As the SQL Server product has matured and the query engine has improved, it’s become smarter and can do some amazing things with some complicated queries. We have, fortunately, also gained new weapons to use against bad parameter sniffing when it occurs. It pays to keep up with the latest versions of the product.

But what if you’re stuck on an old version?

When all else fails, cross the streams

It’s not pretty, but sometimes you gotta do what you gotta do. You cross the streams, nuke the site from orbit, throw the Hail Mary, you punt…

CrossTheStreams

Because we’re lucky that way, my team is stuck supporting a poorly configured, grossly overloaded SQL Server 2005 STANDARD EDITION instance that we’re not allowed to upgrade. This instance supports a complex, busy application that uses Hibernate-generated queries to communicate with the database. It’s the classic “built and supported by developers with no understanding of SQL Server” system. It’s littered with views that reference other views that reference other views. The queries tend to be the type that fetch every possible column from every table across a ten-table JOIN. I think you get the point.

This system is, as you might expect, incredibly unstable. Multiple times a day, often overnight or over the weekend, one of these big ugly queries will receive a bad plan, and the system slows to a crawl. The developers are reluctant to make code changes, and “it must be a SQL problem” is commonly heard. We’re still expected to keep the database server running, so we opted for the nuclear option. We crossed the streams – a scheduled job issues the DBCC FREEPROCCACHE command every hour. Yep, we dump the entire plan cache, good plans and bad, every hour. This guarantees that all of the bad plans get removed, with the unfortunate side-effect of removing the good plans too. Hey, I said it wasn’t pretty, but we had to show this prehistoric server how we do things downtown.

Who Ya Gonna Call?

Well, actually, they’re gonna call you. When “the database is slow”, when that pesky query is running slow again, or when “something’s wrong”, they’re gonna call you. You’re gonna need to know what to do and maybe, just maybe, something you’ve read here will help you out of a sticky situation. Afterwards, when they ask you if you’re a SQL God, you say YES! I always do…

gb-marshmallow

The post Don’t Get Slimed By Bad Parameter Sniffing appeared first on RealSQLGuy.

14 Dec 05:13

SQL Server 2014 Hybrid Cloud Scenarios: Migrating On-Premises SQL Server to Windows Azure Virtual Machines

by SQL Server Team

Those of you who have tried the new version of SQL Server Management Studio might have noticed already that it has a couple of new wizards added. One of those wizards is sitting next to the already existing that has been allowing you to deploy to a Windows Azure SQL Database service (former SQL Azure) and it is called Deploy Database to a Windows Azure Virtual Machine (VM).

You might be wondering why this wizard is here and what it can do. The blog post below tries to provide the answer to this and some of the related questions.

The Path to Windows Azure

We have seen many time that when people are trying a Virtual Machine in the Windows Azure environment they quickly come up with a number of questions. Once the first set of questions is resolved the next wave of questions come up, then the next wave, and a few next waves.

I have tried to compile the list of things you could expect.

  1. Get access to a Windows Azure.
    Logging into a Windows Azure Portal might not be enough to unlock the full power of the Azure. In majority of cases you would need a Management Certificate. There is nothing special about it, except it should be available on the machine you are using, and it should be known to the Windows Azure. This means you should create a certificate somehow. Alternate option is to download Publishing Profile from the Portal. In this case Portal will generate certificate on your behalf, add it to the subscriptions you have access to, and share it with you in the form of XML file.
  2. Create and configure your VM.
    When you have an access to the Azure environment you can start creating other pieces you are needed. The first thing to create is a VM. Don’t forget that you need a VM that:
    1. Has a SQL Server in it and version of the SQL Server is the same or higher as the version of SQL Server you have currently, otherwise your deployment capabilities will be very limited.
    2. Has an Azure Drive attached, you will need it to store your data.
    3. In case VM was created in a Cloud Service it should have endpoints configured properly, otherwise please make sure you have Azure VNet and corresponding VPN.
    4. You should be able to connect to the SQL Server instance meaning connections should be accepted by the SQL Server and customer should be able to authenticate. In case your machine is not joined to domain this means only SQL authentication is available, so don’t forget to set up a SQL login in advance to the instance configuration.
    5. Windows firewall should allow connections to the SQL server to allow them to go though.
    6. Plus many other steps needs to be taken.
    7. Once your target system is set up you need to focus on copying data from the source system. The best option that gives you the most of the recovery capabilities it to use the Azure Storage as an intermediate location. In this case you either copy your files directly using Backup to URL, or backup them locally and then use Azure SDK to upload them to the storage.
    8. Once data is in the cloud you need to get it to the VM. Depends on the data size you can either pull it to the VM and restore or just use Restore from URL.

Those steps are describing the things need to happen to end up with your workload in Windows Azure VM. There are multiple small details in this process, same as a lot of things to consider. You can consult this article if you need more information on the subject.

The Wizard

You might have question how the process above is related to the wizard described in this post. This is exactly what Deploy Database to a Windows Azure VM wizard does for you, plus a few additional convenience items.

Let’s see how this looks in the wizard.

The first screen tries to provide you an idea of what information might be useful for you during the use of the wizard.

 

The Source Settings screen only want you to provide with two pieces of information:

  • Connection to the source server and database
  • Where to place backup files (file or UNC path). Please note: this path should look the same for the wizard and the Database Engine

 

Once you told the wizard where to get the data from you might want to connect to Windows Azure. There are 3 different options:

  • Manually provide with the management certificate and paste or type a subscription ID
  • Import a publishing profile if you have downloaded it already
  • Use you Microsoft Account (former Windows Live ID) to sign in to the portal and allow the Wizard to retrieve publishing profile behind the scenes for you

 

Once all authentication information is in place we connect to a Windows Azure environment, so you can start configuring it. There are a few things you could do:

  • Type a new Cloud Service name (or select it from the list if you already have one)
  • Type a new VM name (same as before, you can select it from the list if there are some to select from)
  • Select the storage account which is in the same region with the VM (if you don’t have any, don’t worry – wizard will create an account for you)
  • Press the Settings… button (One of the most important steps). This button behave differently depending on whether this is a new VM or an existing VM, but in all cases the button needs to be pressed to enter the information that is important for the process.

 

When you type a new VM name and press the Settings… button a new dialog comes up. If you have seen Windows Azure Portal before you should be familiar with most fields in this new dialog.

 

Please note: we are trying to warn you if we think that the target SQL Server instance might be lower version than the source instance. Sometimes those are hard errors that doesn’t allow you to continue, when we know for sure that the final configuration will not be compatible, while sometimes we can only guess. In either case the suggestion is to do not ignore those warnings:

 

When you select the VM that has existed before this makes the dialog look differently.

Please note two things here:

  • This is one of the first times SQL Server Cloud Adapter comes up on the screen. This is a new component that runs as a service inside a VM (you can run it on the physical machine too) and does all the magic of the VM configuration. We will spend a bit of time on its capabilities and restrictions a little later.
  • Since single Cloud Service can contain multiple VMs sharing the same public IP – this means Cloud Adapter Port might be different for those VMs. Also as the previous phrase suggests this is a public port.

Once you chose settings for a new VM or connected to an existing one the Target Database section became active. Please note that Database name field is editable meaning you can change your mind about database name on the target machine.

 

Now you are just two easy steps away from having your database in the cloud. The first step is to confirm the selection you have made when following the wizard:

 

And a second step is to wait until process completes. Depends on the database size and your connection speed the time might differ, but wizard will make sure it is done.

 Deploy Database to a Windows Azure VM Deployment Progress

And finally you would see the confirmation screen that also contains a link to the detailed log.

 

This log is important for most of the troubleshooting actions, same as for digging for details.

The Result

Deploy Database to a Windows Azure VM wizard has guided you through the deployment process. Now this is the right time to observe what the end result is look like.

  1. You have Virtual Machine running in Windows Azure
  2. This VM has SQL Server instance configured
  3. You database is deployed to the instance

However, a few manual steps might be needed.

  1. You need to add a logins and give those logins access to the database the wizard has deployed.
  2. If your application would be connecting to the VM from outside – make sure to configure the SQL Server to allow the connectivity.

Finally

The Deploy Database to a Windows Azure VM wizard is in place to simplify the steps needed to have a database in a Windows Azure VM environment. The whole wizard is made around the scenarios when you need your database in the cloud as soon as possible and willing to skip some studying or preparation steps as a tradeoff for the speed of the deployment. The other assumption was that you don’t want to do a deployment manually and prefer SQL Server to do some of this work for you.

The wizard was made as an extremely safe to your data. There is no chance it could delete, overwrite, or even modify your data. All operations it does are read only for the data.

We really hope this new feature will help you to save time during your deployments.

12 Dec 08:20

Windows Azure Pack - Reconfigure portal names, ports and use trusted certificates

by Anders Ravnholt [MSFT]

Following up from the Installing and configuring Windows Azure Pack (WAP) series we are now at the point where we want to reconfigure server names and ports as well as assigning trusted certificates to my WAP Portals.

Blog post in the series are:

  • Installing & Configuring Windows Azure Pack (Released)
  • Configuring Ports and Certificates for Windows Azure Pack (This blog post)
  • Configuring Federation with ADFS and WAP (Coming soon)
  • Federating ADFS with tenant domain (Coming soon)

 

In this blog post we will look at how you can change portal names and ports for the Tenant and Admin portals in WAP.

Once that is done we are going to issue certificates from an Enterprise CA to the Admin portal as well as issuing a certificate to the Tenant Portal. As I don't have a Public CA Certificate I'm going to use one from my Enterprise CA, but the concept for a Public CA is exactly the same as if I was using certificates from a trusted CA like VeriSign or similar.

Figure 1: Windows Azure Pack Tenant Portal

 

Architecture:

Windows Azure Pack has different components which serve various functions.

By looking at the roles being installed on a WAP Server for an express install, we can see a long list of Web Services running on the WAP Server.

These different Web Services provide various roles within the WAP Infrastructure

In this blog post scenario, we will be working with the following Web Services:

  • WAP Tenant Portal Service (MgmtSvc-TenantSite): Hosts the WAP Tenant Portal
  • WAP Tenant Authentication Service (MgmtSvc-AuthSite): Hosts the authentication for tenants
  • WAP Admin Portal Service (MgmtSvc-WindowsAdminSite): Hosts the Admin Portal
  • WAP Admin Authentication Service (MgmtSvc-WindowsAuthSite): Hosts the Admin Authentication

 

Figure 1: List of Web Sites (roles) running on a WAP Server (Express install)

Figur 2: WAP Infrastructure example

When a tenant accesses the WAP Tenant portal (exposed to the Internet) they will be redirected to the WAP Tenant Authentication Service to validate if the user is allowed to access the system, once the WAP Tenant Authentication service has validated the user, it will be redirected back to the WAP Tenant portal with access to WAP services. The tenant authentication service uses claim based authentication and can use different authentication methods like ADFS or .Net. In this scenario we are using default authentication (.Net), in the following three blog posts Shri from the WAP Product team will explain how you can change the WAP tenant authentication service to make use of ADFS.

In the PoC setup these services are running on the same server (WAP01.contoso.com) as shown on "figure 1".

A similar scenario happens when a WAP Administrator accesses the WAP Admin portal (only accessible on the internal network), the WAP admin portal will redirect the admin to the WAP Admin Authentication service which by default uses Windows Authentication. Once Windows Authentication service has authenticated the user, the user is redirected back to the WAP Admin portal with access to WAP.

 

Scenario:

After Installing and configuring Windows Azure Pack with the basic settings for the Contoso.com proof of concept (PoC), the next steps are to configure the following:

  • Change WAP portal name.
  • Configure tenant and admin portals to run on port 443 (Https).
  • Replace the self-signed certificates with certificates provided by the enterprise CA (and consequently remove the warnings displayed in Internet Explorer due to the self-signed certificates).
  • Change the WAP Tenant Portal to use an internet facing url.
  • Change the WAP Tenant Authentication site to use the public web address that is also used by the WAP Tenant Portal.

     

The Servers are configured as follows:

Role

Name

Function

Active Directory

DC01.contoso.com

Active Directory, ADFS, Certificate Server

Windows Azure Pack

WAP01.contoso.com

Windows Azure Pack Express Install

Service Provider Foundation

SPF01.contoso.com

Service Provider Foundation

SQL Server

DB02.contoso.com

SQL Instance hosting the WAP databases

Virtual Machine Manager

VMM01.contoso.com

Virtual Machine Manager 2012 R2 managing one Hyper-v host

 

The portals DNS names will be renamed to the following:

  • WAP Admin Portal: wapadmin.contoso.com port 443
  • WAP Tenant Portal Internal: WAPCloud.contoso.com port: 443
  • WAP Tenant Auth: wapcloud.contoso.com port: 444

Disclaimer: This environment is meant for testing only. This should not be considered guidance for production use, as several decisions made in this blog post are not targeting a production environment.

Reconfigure portal names for Windows Azure Pack

As the two WAP Portals by default (in our proof of concept) are installed with https://wap01.contoso.com:30081 for the Tenant Portal and https://WAP01.contoso.com: 30091 for the Admin Portal we want to change these to use more portal friendly names.

To do this we need to do the following:

  • Create a DNS record for the new portals.
  • Install and configure an enterprise CA.
  • Request certificates for WAP Web Services from the CA.
  • Change ports and assign certificates for WAP Services.
  • Update Windows Azure Pack with the new web service modifications.

 

Create a DNS record for the new portals.

To create new DNS records do the following:

  1. Logon to the DNS server.
  2. Start DNS Manager
  3. Expand dc01 > Forward Lookup Zone > (e.g. contoso.com)
  4. Right click on and select New Host (A-Record)
  5. Provide the DNS name and the IP address of the WAP Admin Server (e.g. Name: wapadmin, IP: 192.168.1.40)

    Figure 3: Creating a new A-record in DNS manager

  6. Create the other DNS name for the remaining portal (e.g. wapcloud,) and provide the WAP01 IP address as all roles are installed on the same server in the PoC.
  7. Verify that the DNS records shows in the list.

    Figure 2: List of DNS records in DNS Manager.

  8. Close the DNS Manager.

 

Use trusted certificates for the Windows Azure Pack

In order to use CA signed certificates in our PoC environment we need to do the following:

  • Install a CA Server
  • Configure the CA Server
  • Request Web Server certificates from the CA Server
  • Change Web Sites to use certificate.

 

Install a CA Server

To install a CA Server do the following steps:

  1. Logon to the server that will be running the CA Server (e.g. DC01)
  2. Start Server Manager.
  3. Select Dashboard on the left.
  4. Click Add roles and features.
  5. Click next to: before you begin, Installation type and server selection.
  6. In Server Roles select Active Directory Certificate Services under Roles.
  7. Click next to features.
  8. Under Role Services Select the following: Certification Services, Certificate Enrolment Policy.., Certificate Enrolment Web, Certification Authority..
  9. Accept the add-ons and click next to Web Role Services.
  10. Click Install.
  11. Verify that the install finishes with success.

 

Configure CA Server

Do the following to configure the newly installed CA Server:

  1. On the CA Server start Server Manager as a user that is member of Enterprise Admins.
  2. Select AD CS on the left.
  3. A message will show in the main window:

    Figure 3: Configuring CA Server in Server Manager

  4. Click on More.
  5. In the server task details click on Configure Active Directory Cert..
  6. Select All Roles to configure except for Web Service and click Next.
  7. Select Enterprise CA.
  8. Select Root CA.
  9. Select Create a new private key and click next.
  10. Click next to cryptography.
  11. Click next to CA Name and keep default.

    Figur 4: CN Names for the CA Server

  12. Keep 5 years and click next
  13. Click next to Certificate Database
  14. Select Windows Integrated auth.. and click next
  15. Under Server Certificate Select Choose and assign a certificate for SSL later and click next
  16. Click Configure
  17. Click Close

     

Change WEB Sites to use Certificate

 

Issue Certificate for the WAP Admin Portal

Greg from CAT has created a blog post which describes how the certificate can be automated. The blog post can be found here: Automating Active Directory Certificate Services with Windows PowerShell – Part 1.

The manual steps will be described below:

To issue certificates for the WAP Services the following steps needs to be done:

  1. Logon to the WAP Server as an administrator (e.g. wap01.contoso.com)
  2. Open IIS Manager on the WAP Portal Server
  3. Select the IIS server under connections
  4. In the main window select server certificates under IIS
  5. In the right windows select create a domain certificate
  6. Specify the following:
    1. WAPAdmin FQDN under common name (e.g. wapadmin.contoso.com)
    2. Orginazation: Contoso
    3. Organ unit: NA
    4. City NA
    5. State NA
  7. Click Next
  8. Select a CA and provide the friendly name for the certificate (e.g. wapadmin.contoso.com)

    Figure 5: Certificate request from IIS Manager

  9. Click Finish
  10. Verify that the certificate shows in the list of certificate

    Figure 6: Certificate list in IIS Manager

    We now have a web certificate, which we can use for the WAP Admin Portal.

     

  11. Request two more certificate following the same procedure:
    1. WAP Authentication: wap01.contoso.com
    2. WAP Tenant Portal Internal: WAPCloud.contoso.com
  12. There should now be three certificates in the Web Server Certificate list from Contoso CA.

    Figure 7: WAP Certificates in IIS Manager

     

Change ports and certificates for the WAP Admin Portal

The following steps needs to be done in order to change ports and certificates for the admin portal.

  1. Logon to the WAP server as Administrator (This assumes it's an express install).
  2. Start ISS Manager.
  3. Expand IIS Server > Sites.
  4. Right click on MgmtSvc-AdminSite and select edit bindings.
  5. Select https 30091 and select edit.
  6. Change port to 443.
  7. Set hostname to wapadmin.contoso.com.
  8. Select the certificate from the drop down list which was created earlier from the CA.

    Figure 8: IIS Certificate list for Web Site Bindings

  9. Click Ok.
  10. Restart the Web Site.
  11. Right click on MgmtSvc-WindowsAuthSite and select edit bindings.
  12. Select the certificate from the list wap01.contoso.com.
  13. Click Ok.

 

Change ports and certificates for the WAP Tenant Portals

The following steps needs to be done in order to change ports and certificates for the tenant portal.

  1. Logon to the WAP server as Administrator (This assumes it's an express install).
  2. Start ISS Manager.
  3. Expand IIS Server > Sites.
  4. Right click on MgmtSvc-TenantSite and select edit bindings.
  5. Select https 30081 and select edit.
  6. Change port to 443.
  7. Set hostname to wapcloud.contoso.com.
  8. Select wapcloud.contoso.com in the drop down list for certificates
  9. Click Close
  10. Right click on MgmtSvc-AuthSite and select edit bindings
  11. Select https 30071 and select edit.
  12. Change port to 444.
  13. Select wapcloud.contoso.com in the drop down list for certificates.
  14. Restart the MgmtSvc-TenantSite Web Site from the action menu.
  15. Restart the MgmtSvc-AuthSite Web Site from the action menu.

 

Update Windows Azure Pack with the new settings

Updating the Windows Azure Admin Portal

The TechNet documentation can be found here: Reconfigure FQDNs and Ports in Windows Azure Pack

To update WAP with our modifications the following commands needs to be executed, where we will use the values used in the scenario.

  • Set-MgmtSvcFqdn: This command will update the FQDN names for the modified services in the WAP Database.
  • Set-MgmtSvcRelyingPartySettings: This command will set the relay location for the WAP authentication service (Tenant or Admin)
  • Set-MgmtSvcIdentityProviderSettings: This command will update the authentication service where redirects will be redirected once verified.

We will be using the following arguments while executing the commands:

WAP Database Server:  db02.contoso.com

WAP Database user:    sa

Admin Portal FQDN:    wapadmin.contoso.com

Admin Portal Port:    443

Admin Auth Service:   wap01.contoso.com:30072

 

To update the modification made to WAP Services in the WAP database do the following.

  1. Logon to the WAP Server as a WAP Administrator.
  2. Start a PowerShell window.
  3. Import the WAP PowerShell module:

    Import-Module -Name MgmtSvcConfig

     

  4. Update WAP Admin Portal with the updated FQDN settings by running the following command:

    Set-MgmtSvcFqdn -Namespace "AdminSite" -FullyQualifiedDomainName "wapadmin.contoso.com" -Port 443 -Server "db02"

     

     

  5. To set the WAP authentication service FQDN for the admin portal run the following command.

    Set-MgmtSvcRelyingPartySettings –Target Admin –MetadataEndpoint 'https://wap01.contoso.com:30072/FederationMetadata/2007-06/FederationMetadata.xml' -ConnectionString "Data Source=db02.contoso.com;User ID=sa;Password=*******"

     

  6. To set the authentication service redirection location to the admin portal run the following command:

    Set-MgmtSvcIdentityProviderSettings –Target Windows –MetadataEndpoint 'https://wapadmin.contoso.com/FederationMetadata/2007-06/FederationMetadata.xml' -ConnectionString "Data Source=db02.contoso.com;User ID=sa;Password=********"

     

     

Updating the Windows Azure Tenant Portal

The following attributes are used for configuring the WAP Tenant Portal.

WAP Database Server:  db02.contoso.com

WAP Database user:    sa

Tenant Portal FQDN:   wapcloud.contoso.com

Admin Portal Port:    443

Admin Auth Service:   wapcloud.contoso.com:444

 

To update the tenant portal do the following:

  1. Logon to the WAP Server as an Administrator.
  2. Start PowerShell.
  3. Import the WAP PowerShell module:

    Import-Module -Name MgmtSvcConfig

     

  4. Update WAP Tenant Portal with the updated settings by running the following command:

    Set-MgmtSvcFqdn -Namespace "TenantSite" -FullyQualifiedDomainName "wapcloud.contoso.com" -Port 443 -Server "db02"

     

  5. Update WAP Tenant Auth Site with the updated settings by running the following command:

    Set-MgmtSvcFqdn -Namespace "AuthSite" -FullyQualifiedDomainName "wapcloud.contoso.com" -Port 444 -Server "db02"

     

  6. To set the WAP authentication service FQDN for the tenant portal run the following command.

    Set-MgmtSvcRelyingPartySettings –Target Tenant –MetadataEndpoint 'https://wapcloud.contoso.com:444/FederationMetadata/2007-06/FederationMetadata.xml' -ConnectionString "Data Source=db02.contoso.com;User ID=sa;Password=********"

     

  7. To set the authentication service redirection location to the admin portal run the following command.

    Set-MgmtSvcIdentityProviderSettings –Target Membership –MetadataEndpoint 'https://wapcloud.contoso.com/FederationMetadata/2007-06/FederationMetadata.xml' -ConnectionString "Data Source=db02.contoso.com;User ID=sa;Password=********"

     

Verify the WAP modification works.

To verify that the modification works do the following:

Pre-requisite: As we don't have a public certificate for our PoC setup we are going to install the CA certificate on the computers in the Trusted Certificates store from where we will access the WAP Portals.

  1. Login to a computer as a user that has WAP Admin Portal access.
  2. Start a browser.
  3. Type the URL that the WAP Admin Portal was changed to (E.g. https://wapadmin.contoso.com)

    Verify that the WAP Admin Portal loads using the new URL

    Figure 9: Updated URL in the WAP Admin Portal

     

  4. Verify that the tenant portal works by opening a browser and go to https://wapcloud.contoso.com. 
  5. During the authentication sign-in process note the redirection to the wapcloud.contoso.com:444 authentication site.

     

    Figure 10: Updated URL in the WAP Tenant Portal

  6. Verify that after login the login redirects you back to the WAP Portal.

    Figure 11: Updated URL in the WAP Tenant Portal

     

Summary

The goal with this blog post was to show how it's possible to reconfigure portal names, ports and use trusted certificates after deploying the Windows Azure Pack.

In the blog post we did the following

  • Created new DNS records
  • Installed and configured CA Enterprise server
  • Issued certificates for the WAP Web Services
  • Change host names, ports and certificates for the WAP Web Services
  • Updated WAP Database with the new configurations
  • Verified that the configuration was successful.

In the next three blog posts Shri from the WAP Product team will walk you through how to configure ADFS with Windows Azure Pack.

  • Federated Identities to Windows Azure Pack through AD FS – Part 1 of 3 (Coming soon)
  • Federated Identities to Windows Azure Pack through AD FS – Part 2 of 3 (Coming soon)
  • Federated Identities to Windows Azure Pack through AD FS – Part 3 of 3 (Coming soon)

Happy building your PoC environment for Windows Azure Pack.

Anders Ravnholt

12 Dec 08:20

First Week As The DBA Whitepaper Published

by Andy Warren

For those that attended the webcast I did last week for Idera the companion whitepaper is now available for download. You can view the recording and/or get the whitepaper by just entering basic contact information.

Also, one of the things I mentioned in the webcast but omitted from the slides was Chocolately.org – it’s a great way to blast a bunch of your standard apps onto that new client/employer laptop.

I wrote the whitepaper first, then built the slides from it. Interesting because I did more writing than I normally would for a deck and interesting because it was easy to do the transition – I had 10 main points to cover in the paper and that mapped well to 10 slides.

It was fun to pick a niche topic (most of us don’t have that many first weeks) and try to do something with it. For those that watched the webcast or read the whitepaper I hope you find it useful.

10 Dec 02:05

Waiting, waiting…

by Rob Farley

“It just runs slow these days”

I’m sure you’ve heard this, or even said it, about a computer that’s a few years old. We remember the days when the computer was new, and it seemed to just fly – but that was then, and this is now. Change happens, things erode, and become slower. Cars, people, computers. I can accept that cars get slower. They lose horsepower over time as the precision components wear and become less precise. I also know that my youth is a thing of the past. But electronics? What happens there?

Well, in my experience, computers don’t get slower. They just feel slower. I see two main reasons, and neither of them are because of ageing hardware.

Your computer might be slower than it was yesterday even. In the world of databases we might even be investigating why the computer is slower than it was five minutes ago. Again, it’s probably not because of ageing hardware.

One possible reason is that we’re simply asking systems to do more. If we’re comparing our laptops to when we bought them, we’re probably refreshing webpages more frequently (often in the background) and have installed too many utilities (hopefully not in the background, but you never know), and the system has more to get done in a given minutes compared to when it was new. With a database server, the amount of data has probably grown, there may be more VLFs in the log file to deal with, more users pushing more transactions. These are not things you want to uninstall like that annoying browser search bar on your aunt’s ageing computer, but they can be a very valid reason for things to be slower. Hopefully you are tuning your system to make sure that scalability is possible, and you’re very happy with the amount of extra work that’s being done, even if it does mean that some processes take a little longer than they once did.

This problem can be summarised by the fact that the system is having to wait for resources to be free so that it can get its stuff done.

Another reason for slowness is that the system is having to wait more for other reasons, things that you don’t want it having to wait for. An increase in busyness will cause slowness because of waiting, but you can easily make the argument that this is ‘acceptable’. It’s much more of a problem if the system is being slower without actually achieving any more than it was before.

TSQL2sDay150x150Waits are the topic of this month’s T-SQL Tuesday, hosted by Robert Davis (@sqlsoldier). Go and have a look at his post to see what other people have written about on this topic.

In the SQL Server world, this kind of problem is identified by looking at wait stats. The system records what processes are waiting for, and you can see these by querying sys.dm_os_wait_stats. It’s very useful, but querying it in isolation isn’t as useful as taking snapshots of it. If you want to store copies of it over time, you may prefer to do something along the lines of:

--A schema for monitoring data can be useful
create schema monitoring;

--Create a table that has the structure of sys.dm_os_wait_stats
select top (0) *
into monitoring.waits
from sys.dm_os_wait_stats;

--Add a column to know the the stats are collected
alter table monitoring.waits
add snapshot_time datetime default sysdatetime();

--Run this section regularly
insert monitoring.waits (wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms)
select * from sys.dm_os_wait_stats;

Regularly collecting snapshots of wait_stats like this can give you a picture of what has occurred over time. You can easily pull this data into a report, or into Excel, or even get a picture of a recent version quite easily, using a query such as:

with numbered as (
select *,
    wait_time_ms - lead(wait_time_ms) over (partition by wait_type order by snapshot_time desc) as diff_wait_time,
    waiting_tasks_count - lead(waiting_tasks_count) over (partition by wait_type order by snapshot_time desc) as diff_wait_count,
    1000 * datediff(second,lead(snapshot_time) over (partition by wait_type order by snapshot_time desc),snapshot_time) as diff_ms,
    row_number() over (partition by wait_type order by snapshot_time desc) as rownum
from monitoring.waits
)
select wait_type, snapshot_time, diff_wait_count, diff_wait_time, diff_ms
from numbered
where rownum = 1
order by diff_wait_time desc, wait_type;

This query compares the amount of wait time for each type (which is frustratingly stored as a string) since the previous one, using the LEAD function that was introduced in SQL Server 2012 (LEAD rather than LAG because we’re looking at snapshot_time desc, not ASC). Using ROW_NUMBER(), we can easily pick out the latest snapshot by filtering to rownum = 1, but if you’re just wanting to chart them, the contents of the CTE will be enough.

Make sure you keep an eye on the amount of data you’re storing, of course, and be careful of the impact of someone inadvertently clearing the stats (though as the query picks up deltas, you should be able to consider a filter that will ignore the deltas that might have spanned a period during which the stats were cleared).

This post is not going to go into all the different wait types to tell you which ones are worth worrying about and which ones are worth ignoring. But what I would suggest to you is that you track what’s going on with your environment and keep an eye out for things that seem unusual. When troubleshooting, you will find any history invaluable.

10 Dec 02:05

Spatial Index is NOT used when SUBQUERY used

by psssql

I have found the following link to be invaluable when working with and tuning SQL Server Spatial indexes:  http://technet.microsoft.com/en-us/library/bb895265.aspx

However, the link is not as clear as it could be about the Spatial index selections made by the SQL Server query processing.  Here are a few additional tidbits that may assist you.  (Note:  Similar tips may apply to non-Spatial queries as well.)

1. The Spatial method must be on the left side of the predicate (where clause)

       col.STIntersects(@val) = 1   --    Can use the index if costing is appropriate
       1 = col.STIntersects(@val)   --    Unlikely to use index, use previous form

2. The value passed to the spatial method must be ‘constant like’

       col.STDistance(@val) = 1 * 10000   --    Can use the index if costing is appropriate
       col.STDistance(@val / 10000) = 1   --    Unlikely to use index, use previous form

3. Extension of #2 for more complex operations

/* The subquery form does not consider the index */
Select * from Spat where col2.STIntersects((select col2 from Spat where Id = 23 and col2 is not null))=1

/* Using index hint - getting an error message for this query form */
-- Msg 8622, Level 16, State 1, Line 1
-- Query processor could not produce a query plan because of the hints defined in this query.

Select * from Spat with (index(SpatIDX)) where col2.STIntersects( (select col2 from Spat where Id =23) ) = 1

/* Variable or Join forms attempt to use the index */
Declare @i geography
Set @i = (select col2 from Spat where Id =23)
Select * from Spat  where col2.STIntersects((@i))=1  order by Id

Select s1.* from Spat as s1
join Spat as s2 ON
      s1.col2.STIntersects(s2.col2) = 1
   and s2.Id = 23
order by s1.Id

As you can see the variable or join syntax is a construct the SQL Server query processing can evaluate for Spatial index usage where as the subquery is generally not considered.

Be sure to check the form of your queries to make sure the indexes are properly considered.

Bob Dorr - Principal SQL Server Escalation Engineer

10 Dec 01:05

Intel SSD Roadmap Points To 2TB Drives Arriving In 2014

by Soulskill
MojoKid writes "A leaked Intel roadmap for solid state storage technology suggests the company is pushing ahead with its plans to introduce new high-end drives based on cutting-edge NAND flash. It's significant for Intel to be adopting 20nm NAND in its highest-end data center products, because of the challenges smaller NAND nodes present in terms of data retention and reliability. Intel introduced 20nm NAND lower in the product stack over a year ago, but apparently has waited till now to bring 20nm to the highest end. Reportedly, next year, Intel will debut three new drive families — the SSD Pro 2500 Series (codenamed Temple Star), the DC P3500 Series (Pleasantdale) and the DC P3700 Series (Fultondale). The Temple Star family uses the M.2 and M.25 form factors, which are meant to replace the older mSATA form factor for ultrabooks and tablets. The M.2 standard allows more space on PCBs for actual NAND storage and can interface with PCIe, SATA, and USB 3.0-attached storage in the same design. The new high-end enterprise drives, meanwhile, will hit 2TB (up from 800GB), ship in 2.5" and add-in card form factors, and offer vastly improved performance. The current DC S3700 series offers 500MBps writes and 460MBps reads. The DC P3700 will increase this to 2800MBps read and 1700MBps writes. The primary difference between the DC P3500 and DC P3700 families appears to be that the P3700 family will use Intel's High Endurance Technology (HET) MLC, while the DC P3500 family sticks with traditional MLC."

Share on Google+

Read more of this story at Slashdot.








10 Dec 01:00

In Three Years, Nearly 45% of All the Servers Will Ship To Cloud Providers

by samzenpus
dcblogs writes "IDC expects that anywhere from 25% to 30% of all the servers shipped next year will be delivered to cloud services providers. In three years, 2017, nearly 45% of all the servers leaving manufacturers will be bought by cloud providers. The shift is slowing the purchase of server sales to enterprise IT. The increased use of SaaS is a major reason for the market shift, but so is virtualization to increase server capacity. Data center consolidations are eliminating servers as well, along with the purchase of denser servers capable of handling larger loads. The increased use of cloud-based providers is roiling the server market, and is expected to help send server revenue down 3.5% this year, according to IDC."

Share on Google+

Read more of this story at Slashdot.








09 Dec 20:16

Quick Post – Trouble Installing vSphere Client on Windows 8.1 Enterprise

by mjb

I’m setting up a new VMware-based lab for work and came across an error worth sharing.

I just finished installed Windows 8.1 Enterprise as a VMware Fusion-based jump box. I then tried running the vSphere installed, VMware-viclient-all-5.1.0-1064113 to be specific, and had no luck getting it to finish. I kept receiving an “installer already running” error despite having no sign of vSphere installed. I killed tasks, reran after, then rebooted Windows a few times. No luck.

After a while, I was hit by this error:

Setup failed to launch Microsoft Visual J# 2.0 Second Edition installer.

vsphere-5.1-fail

The prompt above points me toward a missing executable, vjredist64.exe, that seemed at the root of the problem. I ended up googling my way to a download of that file and installed it without issue. After a restart, I reran the VMware-viclient-all-5.1.0-1064113 installer.

Bam! vSphere installed without a problem.

Now I don’t conclude much from this issue. Maybe a simple updates would have filled in the gap.  What I know for sure: Since a number of us tend to jump on a Windows machine until we get vCenter up and running, I find this fix worth writing down.

 

09 Dec 20:14

Solid Performance, with a Solid Name: Welcome to Solidfire.

by Shawn Cannon
EMC logo

By Roger lund

I had the change to meet with the CEO and Founder of

“Quality of service is not a feature. It is a architecture.” They go in to state it is ” the only way to deliver guaranteed storage performance in public and private cloud infrastructure” source
http://solidfire.com/technology/qos-benchmark-architecture/

The ability to set levels of service is critical in the service provider space. Min, Max and Burst IOPS per volume for example.


QOS Demo Video

As you can see below, the system can really push out the IO the numbers.

As you can see, fairly straight forward on the hardware side, however, I would recommend dedicated switching.

Source http://solidfire.com/technology/solidfire-storage-system/

What else? Add features like a array solutions, VMware , openstack, to name a couple and multitenancy reporting but I won’t touch on this today.

I’ll wrap up with a interview of me personally interviewing Dave, Owner and founder of Solidfire.

My Thoughts?

Need a top tier, with extreme performance? Need QOS? Need multitenancy? There are a Solid reasons to consider soldfire when looking at Storage. Now if I could get a unit in my lab…

Thanks

Roger Lund

ystem Examples*

Cluster Size 5 Nodes 20 Nodes 40 Nodes 100 Nodes
Effective Capacity** 60TB with SF3010
108TB with SF6010
173TB with SF9010
240TB with SF3010
432TB with SF6010
692TB with SF9010
480TB with SF3010
864TB with SF6010
1.4PB with SF9010
1.2PB with SF3010
2.1PB with SF6010
3.4PB with SF9010
4K Random IOPS 250,000 with SF3010/SF6010
375,000 with SF9010
1M with SF3010/SF6010
1.5M with SF9010
2M with SF3010/SF6010
3M with SF9010
5M with SF3010/SF6010
7.5M with SF9010
kW at max IO Load 1.5 kW with SF3010/SF6010
2.2 kW with SF9010
6 kW with SF3010/SF6010
9 kW with SF9010
12kW with SF3010/SF6010
18 kW with SF9010
30kW with SF3010/SF6010
45 kW with SF9010
Rack Units 5RU 20RU (half rack) 40RU (full rack) 100RU

- See more at: http://solidfire.com/technology/solidfire-storage-system/#sthash.ccyH5XwR.dpuf

Fine-grain, per-volume settings

SolidFire’s QoS functionality lets cloud providers set and control the fine-grain performance levels for every volume and guarantee application performance with firm SLAs.

  • Min IOPS – The minimum number of I/O operations per-second that are always available to the volume, ensuring a guaranteed level of performance even in failure conditions.
  • Max IOPS – The maximum number of sustained I/O operations per-second that a volume can process over an extended period of time.
  • Burst IOPS – The maximum number of I/O operations per-second that a volume will be allowed to process during a spike in demand, particularly effective for data migration, large file transfers, database checkpoints, and other uneven latency sensitive worklo

- See more at: http://solidfire.com/technology//solidfire-element-os/guaranteed-qos/#sthash.rIuGPn5K.dpuf

Quality of Service is not a feature.

It is an architecture

- See more at: http://solidfire.com/technology/qos-benchmark-architecture/#sthash.aTLD0AKs.dpuf

Quality of Service is not a feature.

It is an architecture.

- See more at: http://solidfire.com/technology/qos-benchmark-architecture/#sthash.aTLD0AKs.dpuf

Quality of Service is not a feature.

It is an architecture.

- See more at: http://solidfire.com/technology/qos-benchmark-architecture/#sthash.aTLD0AKs.dpuf

Quality of Service is not a feature.

It is an architecture.

- See more at: http://solidfire.com/technology/qos-benchmark-architecture/#sthash.aTLD0AKs.dpuf

Source: vBrainstorm

07 Dec 02:32

The World Needs More A/B Testing

by Andy Warren

Have you ever been told that it’s ok to fail? Did it ever feel like it was really ok to fail? Are you ok with other people trying and failing? Or do we look at them and think, they didn’t try hard enough, they didn’t ask the right questions, I would have done x, or …..?

It’s incredibly hard to get anything right on the first try. It doesn’t mean we shouldn’t try to get it right the first time, but “practice makes perfect” is something we understand. The path to success almost always requires iterations, with the caveat that you have to learn something from each iteration or you’re just doing mindless repetition.

I’m a huge believer in iterative improvements because I’ve seen it work over and over again. I’m a believer in failing fast because I’ve seen the value of that many times too. Yet I’m still not, deep down, ok with failure. How can that be? I’m a smart guy on good days, why can’t I internalize the goodness of failure better than I do?

More and more I’ve grown to believe that trying to teach the ok-ness of failure is not going to get us where we want to go. I think what we should teach is that all ideas can be made better if you watch and think and learn and that the more you do it, the greater the chances it will be successful, or more successful than the time before. It doesn’t mean the ideas need to be small or that the failures need to be small.

That brings us to A/B testing. Imagine if every time we added a feature or tried to solve a problem we forced ourselves to test two variations of it (or least envision two variations). That forces to us to think about multiple paths to success and it builds in the idea of failure without saying failure – one of them will do better than the other and it builds in the idea of iterations. It should double the velocity of learning in the best case, but it won’t (probably) make a bad idea good and it won’t prevent a good idea from being badly implemented, but whether the core idea succeeds or fails, we get there faster. The goal isn’t speed though, it’s framing ideas so that we’re focused on learning from iterations.

06 Dec 18:58

SQL Server Error Log, A Little Deeper

by Chris Shaw

December 2, 2013 I posted a blog titled, SQL Server Error Log. Often the things I post about are the items that I am using that day, or recently. Many of the topics, are simply based on how I may have made using SQL Server a bit easier for me to use one way or another. In the SQL Server Error Log post I had mentioned a few of the basic things I do when configuring a server to make the Error Log a little easier for me to navigate, or get the information I need quicker. It occurred to me after a question from a well-respected community member that an example on how I use xp_readerrorlog, could add to the usefulness of the post. Thanks for the suggestion.

In the post I recommend tabling your filtered error logs. I am fond of basing all my work on the negative (I will explain more of that in a later post). The term working with the negative for me means working on the things that I am not expecting. For example, I expect that every hour my transaction logs are going to be backed up, I don’t need an alert or an email each time that the transaction log completes without any issues. I do however want the alarms blasting every time they don’t execute as planned. I apply the same logic to my error logs, however even if I could remove entries from my error logs I always want them as they are. In order for me to have the best of both worlds, I like to take all the entries that I am not expecting and store them in a table so I can filter them with a query, join them to other tables to get other relevant events that were occurring at that time and perform other tasks that may be a bit difficult with the error logs as they stand.

Where to start?

When I start working with a new server, if I have the opportunity I like to create a database I call dbUtilities. The Tag on this post will show you some other posts where I have used this database. After that database has been created, or you have picked another location to store your table, start with a table that is similar to this one. I create this table so I can store my information long term. You may want to add some additional fields such as a PK.

On a quick side note, you should customize this code to meet your needs; the posted version here is in the simplest form. Make sure you look at the added notes at the bottom of this post before executing this script on your servers.

Create
Table
MyErrorlog
(LogDate
datetime,
ProcessorInfo VARCHAR (100),ErrorMSG
VARCHAR(2000))

Once I have my final destination created, I want to insert all my errors that exist in the error log into a temp table where I can start the filtering process.

DECLARE
@Errorlog
TABLE (LogDate
datetime,
ProcessorInfo VARCHAR (100),ErrorMSG
VARCHAR(2000))

INSERT
INTO
@Errorlog

EXEC
sp_executesql
N’xp_readerrorlog’

From this point I start to remove the entries that I expect to see, for each server this is on, the filtering could be different. I leave each of the delete statements as an individual statement, for ease of understanding and customizing. If performance is of any concern it would be better to combine the statements.

Delete

FROM
@Errorlog

WHERE
ErrorMSG
LIKE
‘%Log was backed up%’;

Delete

FROM
@Errorlog

WHERE
ErrorMSG
LIKE
‘%Setting database option COMPATIBILITY_LEVEL%’;

Delete

FROM
@Errorlog

WHERE
ErrorMSG
LIKE
‘%were backed up%’;


Delete

FROM
@Errorlog

WHERE
ErrorMSG
LIKE
‘%DBCC TRACEON%’;

Delete

FROM
@Errorlog

WHERE
ErrorMSG
LIKE
‘%without errors%’;

Once the filtering is done I insert these rows into a final table. (I create this all as one stored procedure, and execute it once a day. I am really only concerned with the last days’ worth of information.)

INSERT
INTO
MyErrorlog

        SELECT
Logdate,
‘Error Log’,
SUBSTRING(ErrorMSG, 1, 2000)

        FROM
@Errorlog

        WHERE
LogDate
>
DATEADD(dd,
-1,
GETDATE())

Some additional notes:

  • In the post SQL Server Error Log I mentioned cycling your error log to keep it small and manageable. This can be key when executing xp_readerrorlog. Large logs could impact the performance on your system, and the way I use the code I am expecting a smaller log.
  • In my case, I normally complete an additional step where I insert the rows into a table that I truncate and rebuild each day. The purpose of this table is so I can use it as a base for a reporting services report and pull information from that table and then subscribe to the report. This way I can other items such as failed jobs, replication status, and job executions that have failed to name a few. This is why I use the ‘ProcessorInfo’ column in the tables. It gives me the space to identify the source of the data in my final table.
  • Don’t forget to have an archive plan for your old data. I am a pack rat, and hate the idea of getting rid of info, I often get values from trends. So I will copy the data on occasion and store it somewhere offline if I have to.

06 Dec 00:30

How a patch for IE11 filled up my transaction log

by tjaybelt
The story

Our developers and QA group noticed a bug in our systems with regards to IE11. When the website was used with an IE11 browser, things didn't quite work as expected. After some research and testing, they determined that a patch could be applied to our web server that would fix IE11. It introduced some new code and some changes to the .Net framework on our web server.

So far, nothing affecting the database, right? Right! After all, they did some testing and saw that the effects of applying the patch were negligible to the test system. So, onward ho, we go!

The patch was applied one Saturday night a few weeks ago. Around 6:30 pm. I was at a basketball game with my family at the time, and noticed an alert from RedGate's SQLMonitor about a failed job. Odd I thought. (I had no idea that the patch had been applied at this point). Since it was a single job failure, I chose to continue with the family activity. An hour later, another one appeared. Another job failure. Odd. This is a simple job that is doing some simple validation of errors received from our web site services. Its never failed before.

Again, the game is almost over, and since it a once an hour job, I pushed it off a bit. When the game was over, instead of heading home, we headed to the office. My entire family. They love it, for about 5 minutes. But they bear it when it happens. In this case, we only spent an hour or so at the office. As I dig into the failed job, i realize that its a powershell script that is failing. It seems to be failing on the send email portion of the script. Duh. How lame. I could have swore that it had been tested and functioning at one point, but alas, it is failing. With some tweeking (not twerking) I get it to work, and execute it again manually. This time it sends mail. Yeah! Problem solved? No.

The email it sent me went on to describe that the errors received from the web site services had surpassed a threshold of acceptability. Oops. So a few hours ago, let's lie and say 2, I received an alert that should have told me that a threshold had been exceeded. Something bad was going on. But since the monitor couldn't email me, I didn't know the true depth of the issue.

A bit more digging finds me locating some 7k emails indicating an error I've never seen before. A short phone call later to a developer helps me see that this is a new error, and it might be associated with the patch that was applied to the web site (news i uncovered during my research of the situation). We do a quick cycle of the app pool and viola, the error stops happening in the frequency it was just bombarding us. Yeah!. Problem solved? No.

The Next Day

The next day several folks hit this issue and dig and dig and determine that the patch did something odd to the web site services, which in turn caused this new error to arise when our engines communicated from the outside world to our central services, attempting to synchronize their data with us. So, each one of the attempts failed, causing them to be told to go away and try again another day. In the meantime, the error didn't go away. It was just suppressed for a bit. It would raise again, causing the app pool to crash, and when the app pool was restarted, it would sometimes be good for a while, and other times start or continue spewing the error message email. Still, this seemed isolated from the database. Not affecting it, only in that engines were unable to connect. Some engines. Some of the time.

After more research and several days of looking, it was determined that we should roll the patch back from the system. Maybe that would revert us to the state we were in before. Maybe. How to test for this? Not a lot of confidence was given to several ideas, but the one that was picked was to increase the verbose logging on some of the services so that we could track down what was going wrong with said services as they attempted to connect and process. Yeah!. Problem solved? No.

This increased logging was applied to the web server, which is a VM, which has a small footprint. The increased logging caused some issues as it tried to write out the massive amount of logs for all the connections made that night. This caused the response to the engines to be something like "Hey, I know you tried to contact me, your central server, but I am borked right now. So, instead of accepting your data, why don't you reset your data next time we try to talk." Of course this message was editorialized, but that is the gist. Each of the engines that tried to connect and errored was told to 'reset' next time. A reset means, purge all data on central (which can be like a gig or 2 gig. maybe 5) and resend all said data. This reset process occurs all the time, from a lot of engines. It is a process that shouldn't be a bear on the topology. Usually. So as the web services did their job this night, with more verbose logging, we should have seen a fix, no? No!. Problem solved? No.

The next day was when we realized that the verbose logging caused bottlenecks on writing out all the logging, causing the engine's requests to time out, and send back the lovely error requesting a reset. This is the day before Thanksgiving. Which I had taken off as a vacation day. So I find out this information while off of work. I was informed that tons of resets may occur tonight, but everyone thinks it will be OK. No worries. Yeah!. Problem solved? No.

Thanksgiving Eve

That night, around 130am, after we had finished watching several movies as a family, I was headed to bed, and looked at my emails. Tons of emails. Tons of alerts. Tons of something hit the fan. I log into work to find that a database log file had filled up, grown to capacity on the drive, filled the drive and was basically sitting in a bad state. Other processes were still working. For example, replication was still sending data from this db server to the reporting system. Other requests were being serviced. But this db was in a bad state. Poor SQLMonitor was freaking out trying to keep up with the errors in the log about the tlog fulled up. It turned out that some 65 engines were performing resets. Right now. All at the same time. Some had finished, but others were still trying. Data was flying in and out of the system at a breakneck speed. Delete here, insert there. Tons of changes. Since it all needed to be replicated, it stuck around in the tlog longer and larger than it had ever before done. After some research, calming my beating heart, and maybe a little cussing under my breath, I add another tlog to the system on another drive, and see some pressure released, and processes continue to flow. I watch it and monitor it for an hour or so, until after 3am, and finally call it good. Things were working as expected, just a bit backed up. Several emails explaining the situation were sent out to the team, mainly to document what was going on, what I had found, and making sure others knew what was happening in realtime. I then decided to go to bed. Yeah!. Problem solved? No.

Around 5 am I awake and check my mail only to find out that just before that time, the replication database log file had filled up as well. It didn't fill the drive, but was 100% full, and things were halted. Fearing that I would have to rebuild replication in a few days, or worse, over the holiday, I got up, and drug my self downstairs to see what I could do with my magic DBA pixie dust. After reviewing things, looking at the state of several pieces of the puzzle, I created another log file on another drive for the distribution database. I double checked the status of the other log file, to find out that it was doing fine. Replication was backed up, and latent. But it started flowing again. It was like finding those pesky beavers had built yet again another dam across my river of data. Once the river was diverted, flow occurred. Yeah!. Problem solved? At this point, its 6am the morning of a major holiday, and I have no idea. It looked good at the moment. But would take a while to catch up. After watching a bit, I went to bed.

I slept in until after 10am on Thanksgiving morning. When I awoke, I grabbed my phone and looked, and viola, all seemed well. There was even an apology from the developer that had spearheaded the movement to rollback the patch, increase the logging, and see what happens. Well, we now know what happens. We can potentially affect systems and services to the point that they get blocked up way upstream, causing untold blockage down the river of our data, and inadvertently cause database problems.

What did I learn?

This was a perfect storm situation that I had not planned for. What can i take from this experience? What can I learn from it and do better?


  • I had sized the log files and drives appropriately, at least as far as normal processing occurred and the baseline of normality allowed me to measure. But in this situation, more log file size, and a bit of a larger disk, would have been the preferred configuration. Something I need to look into. 
  • I was happy that I had planned on having extra luns attached that were unused for Data and Log, in the case of an emergency. Since these already existed, and had space, I could easily add files to dbs and sit them on these drives, which are raring to go. This saved a lot of heartache and pressure. The solution was easily reached in both cases, and when things settled down to normal, the extra log files were removed, returning the drives to their emergency waiting state again. 
  • I also realize that for some types of alerts, it would be better if a more noisy alert occurred. When the db log file first filled up, it was around 11pm. I noticed it at 130am. I could have fixed it a lot earlier, had i been properly notified. I need to fix this. 
  • We all need to be more careful about touching our production system. Something as simple as an IE11 patch to a webserver was ultimately responsible for the log file filling up. If we could have tested this much better, in conjunction with the rest of the services, and not just assuming that this patch would affect a website, that it could reach into services that could cause errors to spew, and pile up too many errors and logging which could ultimately affect data coming in, etc. etc. If, If, If. We have to be better at seeing and planning for the If situations, and not be overly cautious that it causes us to freeze in our tracks and not let any change occur.
  • As smart as we all are individually and collectively, there is still much to be learned, even about our system that we have created and believe to know intimately. 



06 Dec 00:29

Now Playing: RunAsRadio

by KKline
Kevin Kline and Richard Campbell, the host of RunAsRadio, discuss many of the new features in SQL Server 2014, plus the NoSQL movement, BigData, and how the roles of developers and databases continue to evolve in organizations today....(read more)
05 Dec 22:22

10 Tips for the Minimalist DBA

by Karen Lopez

Title Slide Minimalist DBA

It seems that there’s so much to learn when you are first working as a production DBA.  What do you focus on first?  How should you prioritize your learning? What things should you automate and measure?  What skills are core to your job, no matter how long you’ve been a DBA.  These are the things that we think that all production DBAs need to know and continue to build upon.

In our DBA Fundamentals presentation on The Minimalist Guide to Database Administration, Thomas LaRock ( blog | @sqlrockstar )and I (@datachick)  discussed the core skills one should have when filling a DBA role. That presentation has been recorded (I will update here when it is posted).  I hope you were able to join us or will stop by and watch the recording.

10 Tips for the Minimalist DBA

  1. Protecting your data is your number one job.  I’m betting that no one else in the company has a to-do list to protect the company’s data.  Maybe someone at the strategic level, but not to actually ensure it’s available when it needs to be.  That means your first job is to ensure backups and recovery are working.  Test your backups, test your restores.  The first thing I do on new projects is to ask about the backup and recovery configurations.  I once found that the production system had not been backed up for more than five years, even though everyone else thought it was backed up daily.  Don’t just ask.  Go look.
  2. Don’t waste time alerting yourself of things that don’t require a reaction.  You may be tempted to set up alerts so that you get an email and a text message to notify you every time a backup successfully completes.  Or when your online monitor finds your database upright and smiling.  However, that soon leads to alert blindness.  You will miss the real alerts that you need to do something about.  Hard drive getting full? Response times approaching ice age times?  Those are things you’ll need to be ready to deal with.
  3. The best DBA is a lazy DBA.  Not a sleepy DBA, but a DBA that automates as much as you can. Think of this as a Driven, Lazy DBA (DLDBA). Do you have tasks that take 15 minutes to do, require no human decisions and that you have to do multiple times a week? Those are automation candidates.  Be lazy so that you can spend your already overscheduled time on tasks that need your awesome data professional skills.
  4. You can’t manage what you don’t measure.  If you don’t know it exists, or whether it is still up and running, you will be stuck in a perpetual firefighting mode.  Tom thinks that firefighters would make great DBAs, but that doesn’t mean that great DBAs are in 24/7 firefighting mode.  It’s also good to understand what’s the best way to measure these things. Almost all measurement consumes resources.  Do you understand what make sense for each case?
  5. You need to understand the basics of all kinds of things.  Always be learning and looking forward. Just because we picked a few things to focus on doesn’t mean you can ignore all the rest.  You need to build your basic literacy of things that aren’t your primary responsibility.  Storage basics, database design methods and practices, web services, development tools and methods…yes, there’s a lot.  Start with the things that are causing your databases the most pain and work out from there.  It’s sometimes a bit overwhelming when you attend a conference or pick up a book and realize how much you just don’t know enough about.  In fact, there’s a name for this: the Dunning-Kruger effect.  The more you know, the more realize how much you don’t know.  The only way to deal with this is to always be learning and looking forward.  Sure, there are some people making RAID-loads of money supporting COBOL and IMS systems, but overall staying afraid of new technologies like cloud, NoSQL, BI, and big data is going to keep you blissfully ignorant.
  6. You must practice everything while your database isn’t burning. It’s not enough to watch a one hour presentation on how backups and restores work.  It’s not enough to download a script.  You need to get in-depth, hands-on experience doing these things. Not just a one time in a class thing, but practice with real world situations and data.  You need to schedule that time to do this. And your boss needs to support this.  Then you need to practice with intentional errors.  What happens when the time on the server is messed up?  What happens when you don’t have the right log files?  What do you do if the SAN is down?  Where are your restore procedures and checklists documented?  You don’t want to be “learning on the job” when your PHB boss is standing beside you and there’s smoke coming out of the server.
  7. Writing stuff down is good. It’s Agile even. The Agile software method calls for the right amount of documentation.  Many read this as “no documentation”, but they are wrong. Yes, sometimes to you can just walk over and ask the person who set up the job why they did something, but on my projects that person has moved on to 6 more teams since I last saw him.  I recommend using wikis or SharePoint collaboration areas for these things, so that they are all in the same place and can be accessed with any device.  By the way, do you know if your documentation is backed up? Redundantly available?  Restores aren’t just for databases.
  8. The more you install, the more you have to manage and troubleshoot. Install only what you need.  Of course, that may mean looking a bit forward for planned uses, but there’s no need to install everything “just in case”. You might even want to look at Server Core as an option, since it has a tiny footprint, requires less management and you can still use your remote GUI tools to manage it.
  9. Don’t be the one that panics. Practice and documentation mitigate stress and panic.  This is where all your laziness, planning, testing and learning pay off.  You’ve seen the guy that sits in front of a server, rapidly pulling cables, pushing buttons, running scripts and wizards and has no idea that he’s making things worse.  Don’t be that guy.  The more calm you are, the better job you’ll do.  And the more calm everyone else will be.
  10. Empathy is a highly-valued trait. For users, for other data professionals, for everyone. Empathy isn’t sympathy or feeling bad for others.  It’s about understanding what their pain points are and why they feel the way they do.  If you can reflect that empathy, work will be easier and progress towards a common goal can be made.  If you come at all issues with a zero-sum game approach, you’re going to have issues getting in the way of doing your job.

We also listed these links as great places to find more information about these skills or to practice them:

What advice did you wish you’d had years ago?   What else should a minimalist DBA know about? 

05 Dec 22:21

How Simple Parameterization works

by JackLi

Recently we got a customer who upgraded from SQL Server 2005 to 2008. But their performance degraded greatly. What happened was that they had an update query that was run many times in a batch. The query was submitted as ad hoc by the application with different values.

Upon further investigation, we discovered that in SQL 2005, the query was parameterized but in SQL 2008 the query wasn't. So the cost came from the compiling every time for the update. Eventually, we resolved the issue. But it prompts a post on simple parameterization.

In SQL Server 2000, there isn't the concept of simple parameterization. There is just one option (auto parameterization). Starting SQL Server 2005, we offer two options for parameterization (simple and forced). Simple parameterization is the default and is the same as auto parameterization in SQL 2000.

How do I know if my query is even parameterized?

The easiest way to find out if a query is parameterized is to use graphical XML plan. Just point to the operator and take a look at some of the seek predicate.

Let's use this update example:

update t1 set c1 = 12, c2 = 23 where c1 = 12

The table t1 has c1 as clustered primary key (the reason why I also update c1 is related to customer's problem which I will talk about later). So the plan has a clustered index seek predicate on c1.

If the plan is parameterized, you will see the seek predicate on c1 as "Scalar Operator (CONVERT_IMPLICIT(int,..)" or "Scalar Operator (@2)" as shown in figure 1. But if the plan is not parameterized, the hard coded value will show up in the seek predicate like "Scalar Operator (12)" as shown in figure 2 below.

Figure 1

 

Figure 2

 

When is a query parameterized?

If you set your database's parameterization as forced, SQL Server will try to parameterize every query except the conditions documented http://technet.microsoft.com/en-us/library/ms175037(v=SQL.105).aspx.

But what about when your database's parameterization is set simple (default)? Our books online documentation (http://technet.microsoft.com/en-us/library/ms186219(v=sql.105).aspx) states that only very small set of queries will qualify. There is no easy answer to what queries qualify. But in general, if your query involves multiple tables, chances are it won't be parameterized. A more precise answer is that simple parameterization can only occur if the plan is a trivial plan. In case you are wondering why your query is not parameterized, you need to look no further than the XML plan itself. In xml plan (you will need to open as XML), you will see an attribute called "StatementOptmLevel" as shown below. If the StatementOptmLevel="FULL", then the query will not be parameterized with default simple parameterization option.

 

What happened to this customer mentioned earlier?

For this customer, the application was really doing something not that optimal. I simplified the scenario like below. t1 is referenced by t2

create table t1 (c1 int primary key, c2 int)

go

create table t2 (c1 int references t1(c1))

go

 

In the update statement, they also seek on the primary key column but also update on the primary key column with the exact same value. The statement looks like something below.

update t1 set c1 = 12, c2 = 23 where c1 = 12

In 2005, the update was a simple trivial plan. But in 2008 and 2008 R2, we made an optimizer change to do some Halloween protection due to incorrect results. As a result, this type of query has to go through full optimization. Therefore, in simple parameterization configuration, the query can no longer parameterize.

Fortuhnately, it's easy to fix. The easiest is to set force parameterization. But this customer doesn't want to do that citing it can impact other queries. Fortunately, template plan guide solves the issue.

All you need to do is to create a template plan guide to force parameterization for that particular query (like below)

DECLARE @stmt nvarchar(max);

DECLARE @params nvarchar(max);

EXEC sp_get_query_template

N'update t1 set c1 = 12, c2 = 23 where c1 = 12',

@stmt OUTPUT,

@params OUTPUT;

EXEC sp_create_plan_guide

N'TemplateGuide2',

@stmt,

N'TEMPLATE',

NULL,

@params,

N'OPTION(PARAMETERIZATION FORCED)';

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

 

05 Dec 22:21

SQL Server – Quick look at what is Halloween effect?

by Kanchan Bhattacharyya

Dear Friends,

 

No, definitely it does not appear like following image in SQL Server but impact on data retrieval can be severe so you can imagine something similar on a lighter note :).

 

Halloween

 

Ok, Halloween effect refers to a situation where data moves to a position within result set and consequently could be changed many times. This is driven by data modification and not by data reads. Before data is updated, it must be read first. This is achieved in SQL Server using two cursors; one for read and the other one for write. If data is updated by write cursor before it was read in and there is a possibility that a record will move its original position because of the update and is a potential candidate to be read second time and updated again. We can say, reading data using an index whose key is to be updated is good example of Halloween effect.

As we understand this is not an acceptable situation and best thing is Storage engine on SQL Server has appropriate mechanisms in place to avoid such disasters. To go back a bit; during update SQL Server uses two cursors, one for reading and other for writing or you can call it updating. SQL Server injects a blocking operator such as spool into the plan to make sure data available to the write has been read fully. Though it is not mandatory to use spool but SQL Server uses this operator in general because it is having the lowest cost. It means, data has to be inserted into tempdb before they are used by write cursor to ensure all data is read before any modification. SQL Server looks out for Halloween effect when creating the plan. It introduces blocking operator only when there is a chance of this effect occurring.

You should always remember performance overhead because of the Halloween effect when you decide indexing strategy and it is equally important to consider impact on tempdb when deciding on indexing or performance tuning.

To learn more on Halloween effect and how  SQL Server protects us from it , you can read some excellent posts at Halloween effect issue, Halloween Protection and Halloween Protection – The Complete Series.

If you liked the post, do like us on Facebook at http://www.FaceBook.com/SQLServerGeeks

Have a SQL Server question? Join the fastest growing SQL Server Facebook group at: http://www.facebook.com/groups/458103987564477/

 

Regards,

Kanchan

05 Dec 21:45

EV Owner Arrested Over 5 Cents Worth of Electricity From School's Outlet

by Soulskill
sl4shd0rk writes "It seems you can be arrested in Georgia for drawing 5 cents of electricity from a school's outdoor receptacle. Kaveh Kamooneh was charged with theft for plugging his Nissan Leaf into a Chamblee Middle School 110V outlet; the same outlet one could use to charge a laptop or cellphone. The Leaf draws 1KW/hour while charging which works out to under $0.10 of electricity per hour. Mr Kamooneh charged his Leaf for less than 30 minutes, which works out to about a nickel. Sgt. Ernesto Ford, the arresting officer, pointed out, 'theft is a theft,' which was his argument for arresting Mr. Kamooneh. Considering the cost of the infraction, it does not seem a reasonable decision when considering how much this will cost the state in legal funds. Does this mean anyone charging a laptop or cell phone will be charged with theft as well?"

Share on Google+

Read more of this story at Slashdot.








05 Dec 21:44

Ask Slashdot: Application Security Non-existent, Boss Doesn't Care. What To Do?

by Soulskill
An anonymous reader writes "I am a senior engineer and software architect at a fortune 500 company and manage a brand (website + mobile apps) that is a household name for anyone with kids. This year we migrated to a new technology platform including server hosting and application framework. I was brought in towards the end of the migration and overall it's been a smooth transition from the users' perspective. However it's a security nightmare for sysadmins (which is all outsourced) and a ripe target for any hacker with minimal skills. We do weekly and oftentimes daily releases that contain and build upon the same security vulnerabilities. Frequently I do not have control over the code that is deployed; it's simply given to my team by the marketing department. I inform my direct manager and colleagues about security issues before they are deployed and the response is always, 'we need to meet deadlines, we can fix security issues at a later point.' I'm at a loss at what I should do. Should I go over my manager's head and inform her boss? Approach legal and tell them about our many violations of COPPA? Should I refuse to deploy code until these issues are fixed? Should I look for a new job? What would you do in my situation?"

Share on Google+

Read more of this story at Slashdot.








05 Dec 21:29

Gone Phishing

by Lorne Kates

Juan's job wouldn't have been so bad if not for the rampart stupidity. Stupidity was responsible for deciding a 25k+ employee corporation only needed a skeleton-crewed IT department. And that same level of stupidity was spreading across the entire C-level of the org chart.

The IT office, such as it was-- a single converted room in the basement-- was its usual sparsely populated self, made up of just Juan, and his few remaining coworkers. Everyone else had either been caught by the last swing of the budget axe, or had seen it coming and had bailed. The team that remained was a tight mix of competent enough to be seen as valuable; hard-working enough to be taken advantage enough; and skilled enough to leave, but too lazy to do so.

Juan's co-workers were friendly, and the users they supported were tolerable. Indeed, his job wouldn't have been so bad if not for the rampart stupidity. It was barely five minutes into the day, and already it was landing in his inbox with a dreaded bing.

"It's Steve," Juan called out, getting back a collective groan from the Barbara, the mail admin, and Ingeborg, the network admin. The email was addressed to Juan directly-- even though Steve's own policy as Chief Security Officer demanded all employees send IT related email to the catch-all IT inbox.

IT:

Helpdesk found this mail in our inbox.

We can't figure out which user has a problem with his home banking. Can you help us???

Kind regards,

Steve, CSO

Attached to the email was a run-of-the-mill banking phishing scam. Your online accress account has is expireding, for service uninterrupted click her to manually updating etc, etc. Although it had ING's logos, all the links pointed to a blatant scam site. Obviously it had slipped through the filters, but-- Juan just couldn't wrap his mind around it. He read and re-read Steve's email. Steve-- the guy who was ostensibly responsible for all security-related things-- the CHIEF guy responsible for security-- literally could not identify a phishing email.

Juan glanced around, but no one would make eye contact with him. Yeah, he knew the drill-- Juan got the email, Steve was his problem.

He took a deep breath, and calmly composed an email, dictating his real thoughts aloud.

"Dear Chief Stupidity Office," he said. Dear Steve…

Barbara and Ingeborg rolled their chairs over to his desk, craning their necks to read over his shoulder.

"How the hell can you, a so-called IT-professional, not recognize an obvious phishing scam?" Thank you for bringing this to our attention. We believe this is just a piece of junk mail that got through the spamfilters.

"What's next? You want us to track down which user ordered Viagra?" In order to help Barbara update the blacklist, please forward us the original email with headers.

"If I had to pinpoint the origin of the stupidity eating away at the company, it would be the day you failed upwards into the position of Chief Security Office. Here's a link. Please attempt to educate yourself on the very basics of your chosen field, including terminology that's been around for decades, idiot." Please refer to Wikipedia:Phishing.

"I sincerely hope that your stupidity is either reversible, or terminal." Sincerely, Juan

Juan tried not to picture Steve's kidney and a rusty shank when he stabbed "SEND". The sound of his coworker's laughter surrounded him, quelling the anger that dealing with Steve always dredged up. As long as he had good coworkers, the job couldn't be that bad-- even with the rampart stupidity. Right?

"Good one," Barbara said, patting his shoulder. "At least that's taken care of."

"No it isn't," Juan said, pessimistic but realistic, "He'll skim the email, see your name, and just make a stupid mail-server request."

"What?" Barbara shook her head, "No, he'll see it was just a mail that got through the spamfilter, and--"

His inbox went bing. A response from Steve, already. Blah, blah,blah-- Thank you for your pedagogical approach. Barbara pointed at the email. "See, he's trainable."

Ingeborg interjected, "Like a monkey, or a goldfish."

"Yeah," Barbara continued, "I think that's the last we'll hear--"

Barbara's inbox went bing, and Juan gave her a look. "Told you."

They all went over to her workstation. Sure enough an email from Steve-- asking her, as 'the mail lady', to proactively prevent a new threat called phishing (pasted link from Juan's email). His great solution-- hackers are trying to hack ING bank all the time, so the mail servers should just block all emails with "ING" in them.

"That's brilliant!" Ingeborg cried out, "I'll never get a stupid email again!"

Barbara began to type her reply, talking aloud. "Dear Chief Stupidity Officer." Dear Steve…

"Since you're either brain dead or functionally illiterate, I'll just let you know straight out that your stupid solution won't work. All it will do is block all emails using pretty much any verb in the English language." This solution would generate too many false positives from common words or names that contain 'ING'.

"Since I assume you're too stupid to adequately care for yourself, and are probably fed a nutrient slurry from a bottle like a hamster, you don't even know about real world things, like that banks actually exist." In addition, the filter would block legitimate communications from ING

"I hope the money we saved from cuts to the IT department allows R&D to develop some sort of chlorine to dump into your gene pool, to save the rest of humanity from your stupidity." Sincerely, Barbara.

She hit send, and the three of them had another chuckle. But underneath Juan's humor ran that old current of discontent. If this had been an isolated incident of stupidity, perhaps he could have written it off as an honest mistake.

"I don't know how much more of this I can take," Juan said, his humor fading. "These sorts of emails seem to be coming every day."

"It's not that bad," Barbara said, still smiling. "Some people are just slow on the uptake. We give Steve a hard time, but isn't that bad. I'm sure that's the last we'll hear of it."

"Or--" Juan started, but Ingeborg interrupted.

"Let me guess," the network admin said with a smile, "Or Steve will just email me asking for all network traffic with the word 'bank' to be dropped."

Ingeborg and Barbara laughed.

Juan didn't.

"Probably," Juan said.

"Not going to happen," Barbara said, her smile wavering a bit.

Ingeborg shook his head. "No, you can't seriously be that pessimistic."

"I wish I wasn't. Look, guys, this job would be fine if not for the rampart stupidity. I've been putting up with it because you're great, and I'd feel bad about leaving the department even more short-staffed than it already is-- but I'm at my tipping point here. Just one more nudge and I swear I'm going to update my resume and--"

Ingeborg's inbox went bing

[Advertisement] Make your team a DevOps team with BuildMaster. Pairing an easy-to-use web UI with a free base platform, BuildMaster gets you started in minutes. See how Allrecipes.com and others use BuildMaster to automate their software delivery.
05 Dec 21:23

Managed Order Management

by Remy Porter

Hiro’s employer, an international consultancy firm, hosted a number of applications for “enterprise” customers. They provided everything from HR solutions to order management tools. Each local office was independent, but a central corporate office would issue decrees to be obeyed as divine law.

One such decree ordered them to upgrade customers to a new, internally developed order management package, ASAP. Corporate had put a lot of time and effort into the tool, and wanted a successful roll-out. Hiro’s office did what they were told, and to prove their loyalty to their corporate overlords, they started by bringing their largest customer, Initech, on line. Hiro’s management proudly reported a successful migration.

And it was, at least for the first few days. Then new orders started timing out. Then old orders timed out on lookup. Then the application crashed constantly. Angry managers from Initech called corporate. Angry C-level types called Hiro’s office. Hiro’s angry managers called Hiro. Finally, an angry Hiro called back to the corporate office, to try and get support.

Jenga

“I’m having an issue with the new order management package.”

“What new package?” Sam said. Sam was a senior developer at the corporate office.

“The one which we got a memo for- deploy ASAP. That one,” Hiro said.

A long pause waited with Hiro, as Sam dug through recent emails, trying to find the memo. “Oh… that. You… you rolled that out already?”

“The memo said, ‘ASAP’,” Hiro repeated.

“Oh, that’s not good.”

The cheap plastic handset groaned in Hiro’s death grip. “What do you mean, ‘not good’?”

“Well, the application doesn’t work. We’re pulling it, and corporate isn’t going to support it at all.”

Hiro reported this to his bosses. The chain of anger got swung around the office a few times, before wrapping tightly around Hiro’s neck. His manager explained, “We can’t admit to our customer that we can’t fix our own software. We’re a consultancy! How would that look? We can’t roll-back, we can’t switch to a third-party tool. We have to fix this, which means you have to fix this. Make this work, Hiro.”

The first obstacle was getting access to the code. Corporate didn’t use source control. Simply copying source files around didn’t work- no one was clear on what dependencies the product had. Eventually, corporate sent a VM image of Sam’s PC, which was the environment Hiro had to develop in.

The code was “well documented”, in the way that all terrible code is well documented:

//gets customer by ID
public Customer getCustomerById(int id) { … }

Comments never approached being useful. The nearest they came was a terse description of the design pattern in use. bridge-flyweight hybrid pattern, abstract builder factory for the decorator were scattered through the code. The original architect- not poor Sam, who only touched the project in its final days- didn’t use design patterns to overcome common problems. Instead, he invented problems to justify trying every permutation of every design pattern- real or imagined.

The code was lambda crazy. Hiro found one peculiar function which returned a lambda that itself returned a lambda which also returned a lambda- down seven layers deep. LINQ statements were nested the same way. The simple act of creating a new order involved 4,000 unique database operations, each using its own connection object. 10,000 records were added or modified in the process. Common cases, like accepting two orders from the same customer, for the same product, in the same quantity, caused a crash.

Other “patterns” lurked in the code. For example, most of the numeric variables were declared long. Most of the methods expected int parameters. To resolve this, they used this pattern again and again:

public void someMethod(int param) { … }
…
long myVar = someNumber;
…
someMethod(Integer.Parse(myVar.ToString()));

“There is no fixing this,” Hiro told his boss. “If we junk this, I can deliver the actual functionality our customer needs in two months.”

“Corporate spent years on this! We can’t throw away all of that work. And we certainly can’t take two months! We need this application fixed now.”

In the end, it took six months to stabilize the existing code base. It might have gone faster, but Hiro’s boss kept throwing more people onto the project. Each time some arbitrary deadline that none of the developers agreed to passed, he dedicated another resource. No one was happy with the result, least of all Sam back at the corporate office- once the application was mostly working, Hiro’s boss won the argument. Corporate would own and support this beast from now on.

Jenga image By Jorge Barrios (Own work) [Public domain], via Wikimedia Commons
[Advertisement] Make your team a DevOps team with BuildMaster. Pairing an easy-to-use web UI with a free base platform, BuildMaster gets you started in minutes. See how Allrecipes.com and others use BuildMaster to automate their software delivery.
05 Dec 19:30

'O'-Convertible

by Bruce Johnson

The consulting company that Alicia worked for, NewSoft Associates, was in the intelligence business. Their work involved digging through data produced using technology that was years and decades old, identifying the nuggets that were valuable, and persisting them into a central repository so that others could perform the needed analysis and take the appropriate action.

Yes, they were a Business Intelligence company. And Alicia was one of their leading consultants.

When it came to extraction, transformation and loading, there were few in the industry who were better. When faced with divergent definitions of 'customer' or 'month', Alicia had the rare ability to get the parties to find their common ground. But she was even better when digging into the technical morass that frequently surrounds a BI project. For those of you old enough to be involved in Y2K projects, a typical BI engagement is similar. The data structures are convoluted enough to drive the designers of the Byzantine labyrinths crazy. The data is stored in antiquated technology created when COM+ was considered cutting edge. And most of the software used to populate the data was written by monkeys who were annoyed at their lack of an opposable thumb (you trying using the space key without one and you'll understand). Flinging faeces was (and still is) considered bad form in the workplace, so the coders flung good sense to the wind instead.

As a result of Alicia's position in the company, she was often used where no one else could cut it. Such was the case with her current gig, helping the Goodall Construction company extract twenty years of data from their Hyperion system.

The project had run into the standard set of BI issues. Collections of old spreadsheets containing both data and macros. Lack of validation of the data that had been entered. Disparate data sources using different working definitions of the same word (Inigo Montoya would have a field day). But there was one set of data that was consistently failing on conversion. The vast majority of cases worked just fine. But a few cases just blew up the program. The particular value that was causing grief was 2196O.

Now the very astute among you might have already spotted the problem. But put yourself in Alicia's shoes. Hidden amongst a sea of numbers, the difference might not be quite as striking.

Finally, Alicia saw the issue. Quickly she took it to Donald, the Goodall developer who was assigned to the project. When Alicia pointed out that the 'O' at the end of the number was alphabetic instead of the expected numeric value, Donald paused for barely a moment.

"Oh that. Don't you know? This is how we record negative numbers. They are so rare that we usually don't worry about them."

Alicia's jaw dropped. Before she could pick it up off the desk, Donald typed a few keystrokes on this terminal and turned it towards her.

"This should straighten you out", he said.

On the screen, this is what Alicia saw.

EVALUATE #SIG
   WHEN = 0
   LET $SGN = '}'
   BREAK
   WHEN = 1
   LET $SGN = 'J'
   BREAK
   WHEN = 2t (sic)
   LET $SGN = 'K'
   BREAK
   WHEN = 3
   LET $SGN = 'L'
   BREAK
   WHEN = 4
   LET $SGN = 'M'
   BREAK
   WHEN = 5
   LET $SGN = 'N'
   BREAK
   WHEN = 6
   LET $SGN = 'O'
   BREAK
   WHEN = 7
   LET $SGN = 'P'
   BREAK
   WHEN = 8
   LET $SGN = 'Q'
   BREAK
   WHEN = 9
   LET $SGN = 'R'
   BREAK
END-EVALUATE

It took a couple of moments for Alicia to grasp the WTF-ness of the snippet. But, gathering her wits, she thanked Donald and asked him to send the snippet to her.

After all,, she thought, What if the last digit in a negative number were a '2t'?

[Advertisement] Make your team a DevOps team with BuildMaster. Pairing an easy-to-use web UI with a free base platform, BuildMaster gets you started in minutes. See how Allrecipes.com and others use BuildMaster to automate their software delivery.
05 Dec 18:06

Surprise – Your Project is a Failure

by Lee Dallas
EMC logo
I have been thinking about project failure for the last several days. Perhaps because of the healthcare.gov problems or maybe because I put up Christmas lights that failed to entice even a single passing aircraft to attempt landing on my lawn. In a word – disappointing. Expectations were not met. It seems every project that fails […]
05 Dec 18:04

No Spindles Bro

by Jon Owings
EMC logo

I was assisting one of my local team members the other day with sizing a VM for Microsoft SQL. I usually always fall back to this guide from VMware. So I started out with the basic seperation of Data and Logs and TempDB.

Make it look like this:

VM Disk Layout

LSI SCSI Adapter
C: – Windows

Paravirtual SCSI Adapter
D: – Logs
E: – Data
F: – TempDB

Which is pretty standard. Then someone said, “Why do we need to do that?” I thought for a second or five. Why DO we need to do that? I knew the answer in the old school. Certain raid types were awesomer at the types of data written by the different parts of the SQL Database. We are in a total post-spindle count world. No Spindles Bro! So what are some reasons to still do it this way for an All Flash Array?

1. Disk Queues
I think of these like torpedo tubes. The more tubes the less people are waiting in line to load torpedoes. You can fire more, so to speak. Just make sure the array on the other end is able to keep up. Having 30 queues all going to one 2 Gbps Fiber Channel port would be no good. See number 3 for paths.

2.  Logical Separation and OCD compliance (if using RDMs)
Don’t argue with the DBA. Just do it. If something horrifically bad happens the logs and data will be in different logical containers. So maybe that bad thing happens to one or the other, not both. I am not a proponent of RDM’s. SO much more to manage. If you can’t win or don’t want to fight that fight at least with RDM’s you will be able to label the LUN on the array “SQLSERVER10 Logs D” so you know the LUN matches to something in Windows. This also makes writing snapshot scripts much easier.

3. Paths
Each Datastore or RDM has its own paths, if you are using Round Robin (recommended for Pure Flash Array) more IO on more paths equals better usage of the iSCSI or FC interconnects. If you put it all on one LUN, you only get those queues (see #1) and those paths. Remember do what you can to limit waiting.
Am I going down the right path? How does this make it easier? Are there other reasons to separate the logs and data for a database other than making sure the Raid 10 flux capacitor is set correctly for 8k sequential writes? I don’t want to worry about that anymore. Pretty sure plenty other VM Admins and DBA’s don’t either.

For me a good exercise in questioning why I did things one way and if I should still do them this way now.

05 Dec 18:00

Feeling Lucky?

by Martin Glassborow
EMC logo

And here we go again, another IT systems failure at RBS; RBS appear to have been having a remarkable run of high-profile core-system failures but I suspect that they have been rather unlucky or at least everyone else has been lucky. Ross McEwan, the new Chief Executive of RBS has admitted that decades of under-investment in IT Systems is to blame.

Decades seems to be an awful long-time but may well be accurate; certainly when I started working in IT twenty-five years ago, the rot had already set in. For example, the retail-bank that I started at had it’s core standing order system written in pounds, shillings and pence with a translation routine sitting on top of it; yet many of these systems were supposed to have been re-written as part of the millenium-bug investigations. Most of this didn’t happen, whole-scale rewrites of systems decades old and with few people who understood how they worked, this was simply not a great investment; just patch it up and move on.

RBS are not going to be the only large company sitting on a huge liability in the form of legacy applications; pretty much all of the banks do and many others. Applications have been moved from one generation of mainframe to the next and they still generally work but the people who know how they really work are long gone.

Yet this is no longer constrained to mainframe operations; many of us can point at applications running on kit which is ten years or more old on long-deprecated operating-systems. Just talk to your friendly DBA about how many applications are still dependent on Oracle 8 and in cases even earlier. Every data-centre has an application sitting in the corner doing something but no-one knows what it is and no-one will turn-off just in case.

Faced with ever declining IT budgets; either a real decline or being expected to do more with the same amount; legacy applications are getting left behind. Yes, we come across attempts to encapsulate the application in a VM and run it on the latest hardware but it still does not fix the legacy issue.

If it ain’t broke, don’t fix it…but the thing is, most software is broken but you’ve just not yet come across the condition that breaks it. Now the condition that breaks it may well be the untrained operator who does not know the cunning work-around to keep an application running; work-arounds simply should not become standard operating procedure.

Question is as we chase the new world of dynamic operations with applications churning every day; who is brave enough to argue for budget to go back and fix those things which aren’t broken. Who is going to be brave enough to argue for budget to properly decommission legacy systems, you know those systems who only have one user who happens to have a C at the beginning of their job title?

Now it seems that Ross McEwan may be one who is actually being forced into taking action; is anyone else going take action without a major failure and serious reputational damage? Or do people just feel lucky?

 

 

 

05 Dec 18:00

What’s Your Cloud Exit Strategy?

by Lee Dallas
EMC logo
I’ve had a draft of this article for over two years but I didn’t want you to think I’ve gone completely mad.   Yes, the cloud offers many advantages. But, what if the cloud you choose ends up not fitting your needs.  What is your exit strategy? Cloud vendors make it easy to hop-on with data migration […]
05 Dec 17:52

vSphere Upgrade Saga: vCenter 5.5

by Edward Haletky

It has been a while since I added to my vSphere Upgrade Saga, but everything has been running smoothly. Now it is time to upgrade to vSphere 5.5, not because I have to, but because something within SSO broke, and the fix is to use SSO for vCenter 5.5. This leads me to upgrade to vCenter 5.5. What broke? I use the self-signed certificates from VMware for my environment. I probably should not, and that will be fixed shortly, but the long and short of it is that the certificates expired, and an upgrade of vCenter caused SSO, VUM, VIN, and other critical systems to stop talking to each other.

In order to upgrade to vCenter 5.5, you either need to have up-to-date certificates or to start over from scratch. Not having a certificate authority on hand, I chose to reinstall from scratch. Below are the steps I took. Unfortunately, reinstallation lead to other issues, such as needing old SSL certificates to access my existing database. The fact that we cannot use the certificates we want is rather limiting, but at the same time, it improves overall security.

The key to the upgrade is that you must have valid, non-expired, non-revoked certificates in place.

If you can start over, and for some shops this is possible, you need first to remove all VMware products from your vCenter Server and to reinstall, including creating a new database. In some cases, this is a good time to clean up things that are pretty messy to start with, so some planning is required, whichever direction you go. I went the route of a reinstall. If, however, you cannot reinstall due to historical and needed data, then first fix your certs! I will post more on this at a later time.

Following are the steps I took:

Step 0: Make a snapshot and backup of your vCenter server.

I did not do this step, which was a major oversight on my part, as it allows you to recover data, such as old SSL certificates.

Step 1: Remove all existing VMware products installed on your vCenter server (including removing all non-deleted product directories after product removal).

You will need your old SSL certificates, so first copy and store them somewhere. This will allow you to access your old database. Without these certificates, you will need to start your database over fresh. I wanted to try a fresh database, as mine was fairly small and I had tweaked it a few times to fix some issues. So starting fresh would allow me to start with a clean database.

Step 2: Create a new database in Simple Recovery mode for vCenter.

This step was necessary, as I had removed my old SSL certificates.

Step 3: Install using Simple Install mode.

For smaller shops or those that want everything on one server, this is by far the easiest installation method for vCenter.

Step 4: Log in to the VMware vSphere Web Client as administrator@vsphere.local (the default SSO user).

Until you tell SSO to use Active Directory or some other directory server, this is the only user that can log in via either the web client or .NET client.

Step 5: Add in your licenses.

During the vCenter install, you will be asked for your vCenter license (if you do not provide it, you will be in evaluation mode). For small organizations, evaluation mode will allow the use of all the features of vSphere Enterprise Plus, such as SVMotion, Host Profiles, etc., which can be extremely useful when you are upgrading an environment to vSphere 5.5.

Step 6: Enable Active Directory authentication.

Create users, either in Active Directory or in SSO, for each aspect of your vCenter management, such as Upgrade Manager. I use Active Directory for these users. So the first step was to add an identity source into my newly installed SSO environment. Simply login to the vSphere Web Client and navigate to Administration->Configuration (Under the Sign-On and Discovery menu item) then press the green plus sign to add an identity source. This option only shows up when you login as administrator@vsphere.local user.

vCenter Add AD

Step 7: Create roles for each vCenter tool (Upgrade Manager, etc).

It is very important to create a user and role for every vCenter tool. In this case, I am creating a user named vUM, which will be used by Upgrade Manager.

Adding vUM User

Then you need to create a role. Here is a role called vDump.

vCenter Create Role

Finally, attach the user to the role for later use in step 8.

vCenter Roles+Users

Step 8: Install Tool (Upgrade Manager), employing the user previously created to talk to vCenter. This is for security purposes as laid out in the hardening guide.

Repeat Steps 6 through 8 for each tool as you install:

  • vSphere ESXi Dump Collector
  • vSphere Syslog Collector
  • vSphere Auto Deploy
  • vSphere Authentication Proxy (which needs to be deployed on a completely different host from vCenter)
  • Virtual Infrastructure Navigator
  • vCenter Operations Manager
  • VMware vShield Manager
  • vCenter Log Insight
  • etc.

The need to install the vSphere Authentication Proxy, for some work I was testing, was the cause of the need to upgrade to vCenter 5.5.

Step 9: Log in to vCenter .NET Client as an administrator.

Make sure that you have added the appropriate Active Directory user for access so that you will be able to log in as an administrator via the vCenter .NET Client. You will need this client to run those elements not currently supported by the web client. Which to be honest is not much, but also, the web client feels faster and is what we are used to. I usually use both web and .NET clients as needed.

Step 10: Fix dvSwitch configurations if any are in use.

May require a login to the host to determine virtual device is connected to which virtual port. Once I fixed vMotion and FT, which use a dVS, all the other vmnics became available for assignment again through the clients. I found it is easier to do this in the .NET client, as I have been using this client for years. You will have to use the migration tool within the .NET client to migrate disconnected VMs to their new distributed virtual switches. If vCenter is living on a dVS, you may wish to set up the dVS in Ephemeral mode. I usually keep vCenter and most management tools on a non-dVS to keep problems with dVSs to a minimum.

Step 11: Install vCenter Update Manager plugin.

I received an error (see following image) that you would think is related to the database user but is actually related to the user that is running the vCenter Update Manager service. VMware KB1015233 provides the solution.

vUMPluginError

Step 12: Import your host-specific VIBs into your Patch Repository.

Since I have HP Hardware, I import four bundles from the HP Agentless Management Service Bundle, HP ESXi 5.0 Management Bundle, HP Utility Bundle, and hpnmi for ESXi 5.0. In addition, I create a baseline for just these VIBs so I can maintain them across all nodes.

HPVIBs

Step 13: Visit your cluster’s Update Manager tab, and attach and scan your baselines.

Since we just reinstalled everything, it is also time to ensure that we have the proper baselines for updates. You may want to patch everything, but now is the time to ensure that the tools work.

Step 14: Import your 5.5 ISO so you can upgrade from within VUM.

We are just importing at this time; there will be another post on upgrading to 5.5 using VUM. This is part of staging.

Step 15: Add back any customizations you may have had.

Using the customization editor, update or add back in any customizations you use for deploying virtual machines. I only had a few, so I recreated them. If you have more than that, export and import. You may need to go back to that backup made in Step 0.

Step 16: Using the Datastore Browser, add back any templates you may have.

When you add nodes into vCenter server, templates do not get added back in. Only VMs get added back in automatically. You need to access the Datastore Browser to discover any templates on the volumes used for VMs. Many times, templates are placed on other storage as well. You still need to re-import them into vCenter.

Step 17: Enable IP pool (network protocol profiles) on proper networks so that you can install vCops.

While not technically required, this is a good thing because it lets you control what IP pools are created instead of depending on vCops to create an unknown (at least to you) pool.

Step 18: Download Client Integration Plug-in.

You do this by visiting any VM and then selecting Launch Console: the Download Client Plug-in button will appear at the upper right. You can also do this from the login page if there are no errors at the top of your browser window (such as an unsupported browser, e.g. IE 11). You must close the browser AND the .NET client if you are using both to perform the install of the Client Plug-in.

Step 19: Redeploy vCops

Remember to redeploy using steps 6 through 8 when it comes time to register vCops to your vCenter as well as to set up a data collection user. Each VMware and third-party management integration to vCenter requires its own user and role for finer-grain logging and control. After you redeploy vCops, you will need to log out and back into the vSphere Web Client (and .NET Client) to see the vCops Health Tabs.

Step 20: Reconfigure or upgrade vShield Manager.

I chose to upgrade vShield Manager, but the major issue you may have is that the time zone on the vShield Manager machine must match the time zone (and time) of the machine hosting the SSO server. Since I did a simple install, this is the time of my vCenter server. If you do not keep the time and zone in sync, then the lookup service will not be connected, which will cause integration issues with the various tools. However, if you reinstall and do not upgrade, your existing deployed edges will not import, so you may wish to consider an upgrade step. The key here is to keep your SSO virtual machine and vShield Manager virtual machine in time sync. If they are not in time sync, then you cannot attach to the SSO Lookup Service, which is a crucial step.

There you have it: 20 steps to upgrade your vCenter server. A few more steps will come in different posts, as upgrading vCenter server also implies that you need to upgrade those things that use it in order to completely reconnect all of your management tools. This is the nature of vCenter. It is the center of your virtualization management, and it impacts everything around it.

28 Nov 21:31

Automating HDInsight cluster creation with PowerShell

by Murshed Zaman_AzureCAT

If I have to do some work more than twice I like automating the process (or try to as much as I can). Recently, I became interested in spinning up my own HDInsight cluster in Windows Azure with some customization. Since it’s work and I had to spin up clusters more than once, I tried automating the process with PowerShell. In the following code examples I will try to explain what I did and why.

Prerequisite:
1. Windows Azure subscription
2. Windows PowerShell V2
3. Windows Azure PowerShell and Microsoft .net SDK for Hadoop
4. Follow the link from #3 to download and import publishsettings

Follow this link to find the actual PowerShell code, my explanation and examples.

28 Nov 21:31

Embiggen your family this week

by BuckWoody

I know a lot of my readers are located in countries other than the U.S., but I hope you'll indulge me for a moment for a non-technical post.

 

When I was 18, I joined the U.S. Air Force, and my first base was in the United Kingdom at R.A.F. Lakenheath. It was my first time away from home as an independent person, my first trip overseas, and my first military assignment. Each year in the U.S. we celebrate "Thanksgiving" - a time traditionally spent with family, a big meal, and all the drama that goes with both. When you're stationed overseas, if your family isn't with you, you eat at the chow hall with the other 18-year olds who are away from home (which can be wonderful as well), but you really miss the family, especially the first time away from home, and in a place that's different than where you grew up. It's a difficult time for some. Me? I introduced some British friends to Cornish game hens and mashed potatoes - the closest I could get to turkey, stuffing and cranberry sauce. But I sure missed the Holiday traditions.

This year I'd like to challenge you to "embiggen" your family a bit. Contact a military base, station or the like near you (We live close to Joint Base Lewis-McChord) and ask for the Base Chapel or Family Services. Ask them if you can offer your home (and your insane family) to a young Airman, Marine, Soldier or Sailor for the day. Pick them up, and make sure to immerse them in the madness. It's something they won't forget soon.

28 Nov 21:30

Geek City: Did You Know … that not everything you know is actually true!

by Kalen Delaney
Software changes, new versions not only add new features, they also change internal behavior of old features, and not all the changes are documented! And how often, after a software upgrade, do you go through and test everything you knew was true, to...(read more)