Shared posts

14 Apr 19:58

Spared No Expense

by Remy Porter

Renee’s company was a Silicon Valley managed services company. They needed the best data-center they could find to house their infrastructure, and cost was not an object. They loaded up a dump truck full of money and went shopping for the most expensive hosting they could find.

For Renee, this meant lots of data-center tours. She played the tame geek for her management, ostensibly there to inspect the IT services. Her bosses were just there to see how much their prospective vendors would butter them up to close the deal.

The front-runner was a company called “Isla Nublar”. Their first experience on the tour was a massive security gate, guarded by two armed men and a dog. They gave the entire car a quick search, including using mirrors to inspect the undercarriage. After being sniffed over by the dog and having their ID checked and re-checked, Renee and her bosses were allowed through.

The building itself came from the “an alien spaceship has landed” school of architecture. It was steel, glass, organic curves and confusing arcs. A security guard and a sales engineer named John greeted them at the door. “We’re so excited to have you here,” John said. “As you’ll see, we’ve spared no expense.”

He showed them into a glass-walled atrium, full of natural light, tropical plants, and a catered spread laid out on a table, complete with coffee, macarons, and a sushi chef making rolls to order. “Can I interest you in any refreshments? The coffee is Kona- we’ve spared no expense.”

John showed them down the hall to the NOC. Behind the glass wall, staff bustled about the room, mostly ignoring the Mission Control-style wall display, and focusing instead on their own screens. Everyone looked busy, but didn’t actually seem to be doing anything. It reminded Renee of animatronic figures in Disneyland’s old “Mission to Mars” ride, and they seemed about as useful.

“As you can see, we’ve spared no expense. Now, let me get you ready to see the data-center floor.”

“Getting ready” meant filling out piles of paperwork and NDAs, collecting everyone’s fingerprints, mugshots, and patting down everyone to ensure they weren’t smuggling a cellphone camera into the data-center.

The entry to the data-center floor was through a man-trap, the security version of an air-lock. One at a time, they passed through the outer door, and an armed security guard waited for it to seal before pushing the button that released the inner door. After everyone cycled through, John reminded them, “We’ve spared no expense.”

Over the noise from the racks, John shouted out the key features. The floor was raised, and each tile had a dedicated sensor to detect if it were moved. There were motion sensors, water sensors, and a few times, John mentioned “laser beams”. Every millimeter of the space was watched by cameras, and the room itself was a Faraday cage. The supports were earthquake proof and the walls were lined with Kevlar. They had triply redundant backup generators, and each rack had dual, independent power legs, in case anything went wrong with one of them. Each rack was secured with a grille that could only be opened with an admin’s ID badge.

“As you can see,” John began.

“You’ve spared no expense,” Renee finished. “I get it.”

Duly impressed, management signed the contracts the next day. That weekend, Renee and her team had to move their hardware in through the “secure loading dock”, which was just a regular loading dock with a sign that said, “Secure Area.” None of the extremely paranoid security they had seen coming in the front way was in sight here. No one complained, because it certainly made moving in easier.

By Saturday night, everything was up and running. The switchover went without a hitch, and the customers hadn’t noticed a thing.

By Sunday morning, disaster happened. Machines were down, network was down, and the customers were enraged. Renee was on the phone with the data-center support in an instant, and before five minutes had passed, John joined the call, trying to smooth things over. “I just want you to know, we’ve got no other outages. We’ve got the best electricians in the business looking at what appears to be a power outage in one of your racks. We’ve spared no expense.”

The first challenge for the electricians was simply getting permission to lift floor tiles without setting off every alarm in the building. They traced the power lines, but found no fault between the mains and the rack, which meant the problem must be in the rack itself.

“They’re trying to trace the fault in the rack, but they can’t access the power cabling,” John explained.

“What?”

“Well, we spared no expense. Our racks are custom built. The power lines run inside the frame of the cabinet. I’m getting Ray, one of the admins to join the call from the floor.”

“This is tricky stuff,” Ray said, “and I don’t know how quickly we can fix this.”

“We need our servers up now,” Renee said.

“Right, so here’s what I can do- I can move your servers to one of our empty cabinets. Sound good?”

“Whatever it takes.”

“We have spare cabinets,” John said, “because we spared-”

“Just get our servers back up.”

Ray stayed on the line while he worked. Renee heard him slide a server out of the cabinet and then say, “Hunh, that’s interesting.”

“That’s not something I want to hear when you’re moving my servers.”

“There’s a strange black switch down here. I’ve not seen a black switch in any of the other cabinets.”

Ray checked one of the other cabinets, and sure enough, it had a bright glowing red switch. “Hold onto your butts…” Ray said. He flicked the black switch in Renee’s failing cabinet. The light turned red, and the servers spun right back up.

The switch was inconveniently placed, near the floor, where someone passing by the cabinet could accidentally jostle it, which is likely what happened. Unfortunately, moving the switch wasn’t possible. With their bizarre internal power cabling, and the way the switch was soldered to the internal power strip, it would require cutting, drilling, and probably a bit of welding to move it.

The only practical fix was to use tape to secure the switch in the “on” position. “We’ll use the best tape money can buy,” John said, “and it won’t come off. We’ll spare no expense!”

[Advertisement] Release! is a light card game about software and the people who make it. Play with 2-5 people, or up to 10 with two copies - only $9.95 shipped!
14 Apr 18:19

Interested in Winning a Trip to the Ignite Conference?

by Greg Low

This is the first year for Microsoft’s new Ignite conference. Many people are keen to go but figure it would break the bank.

image

Here’s another option:

The Microsoft Virtual Academy have their Heroes program running again. You get one point in the draw for just signing up. Even better though, take some courses and both learn something worthwhile and increase your chances of winning the competition:

image

Sign up and enter here now: SIGN UP           (NOTE: Australians Only for this one sadly)

14 Apr 18:18

Data. Insights. Action. Listen. A TechNet Radio Discussion about the Upcoming PASS BA Conference

by SQL Server Team

The PASS Business Analytics Conference 2015 is right around the corner!  Listen to some of our favorite Data and BI experts chat about this year’s event, April 20th – 22nd in Santa Clara, California.

Are you a key part of a team delivering business analytics and intelligence for your organization?   Are you finding BA/BI more a part of your day-today work life?  Find out why anyone trying to stay ahead of the analytics curve and position your data career for success should attend PASS Business Analytics Conference. 

Tune in to TechNet Radio to find out about the 60+ how-to sessions, practical case studies with hands-on workshops and expert panels. This is a can’t miss event!

14 Apr 18:16

SQL Server 2014 Cumulative Update #6 is available!

by AaronBertrand
Microsoft has released Cumulative Update #6 for SQL Server 2014. This update fixes a few important issues exposed in Cumulative Update #5; see this blog post for some more information about those issues. KB Article: KB #3031047 64 fixes total (55 fixes...(read more)
14 Apr 18:16

Open Letter to the Ohio North SQL Server Users Group

by AllenMWhite
Hi, It's been ten days since our SQL Saturday, when I announced that I was stepping down from my position as President of the Ohio North SQL Server Users Group, and feel I should share some of my thoughts with you, the people that've made our group the envy of many in the worldwide SQL Server community. Roughly ten years ago I started attending the user group meetings, and taking an active part. Over those ten years the group evolved into an organization independent of any one company, affiliated...(read more)
14 Apr 18:15

Memory leak occurs when you run DBCC CHECKDB against a database in SQL Server 2014 - fixed in SQL Server 2014 CU6

by Damian

Great news today!

At the end of January it was a Connect item  (https://connect.microsoft.com/SQLServer/Feedback/Details/1090203) that described a possible memory leak on the SQL Server 2014.

The details are as follow: 

sys.dm_os_memory_clerks for type = MEMORYCLERK_SQLQUERYEXEC and sys.dm_os_memory_objects for type = MEMOBJ_COLUMNBATCH show ever increasing values when columnstore tables are integrity checked. For example, if you run DBCC CHECKDB on a database that contains columnstore tables or you run DBCC CHECKTABLE on a columnstore table you will find that the memory shown for the above mentioned clerk and object types is ever increasing.  

The good news is that the problem is fixed and the solution will be available in the SQL Server 2014 CU6. Find more details: http://support.microsoft.com/kb/3034615 

 

Cheers

Damian 

 

14 Apr 18:14

The SQL Server Query Store

by Benjamin Nevarez

Have you ever found a plan regression after a SQL Server upgrade and wanted to know what the previous execution plan was? Have you ever had a query performance problem due to the fact that a query unexpectedly got a new execution plan? At the last PASS Summit, Conor Cunningham uncovered a new SQL Server feature, which can be helpful in solving performance problems related to these and other changes in execution plans.

This feature, called the Query Store, can help you with performance problems related to plan changes and will be available soon on SQL Azure and later on the next version of SQL Server. Although it is expected to be available on the Enterprise Edition of SQL Server, it is not yet known if it will be available on Standard or any other editions. To understand the benefits of the Query Store, let me talk briefly about the query troubleshooting process.

Why is a Query Slow?

Once you have detected that a performance problem is because a query is slow, the next step is to find out why. Obviously not every problem is related to plan changes. There could be multiple reasons why a query that has been performing well is suddenly slow. Sometimes this could be related to blocking or a problem with other system resources. Something else may have changed but the challenge may be to find out what. Many times we don’t have a baseline about system resource usage, query execution statistics or performance history. And usually we have no idea what the old plan was. It may be the case that some change, for example, data, schema or query parameters, made the query processor produce a new plan.

Plan Changes

At the session, Conor used the Picasso Database Query Optimizer Visualizer tool, although didn’t mention it by name, to show why the plans in the same query changed, and explained the fact that different plans could be selected for the same query based on the selectivity of their predicates. He even mentioned that the query optimizer team uses this tool, which was developed by the Indian Institute of Science. An example of the visualization (click to enlarge):

Picasso Database Query Optimizer VisualizerPicasso Database Query Optimizer Visualizer

Each color in the diagram is a different plan, and each plan is selected based on the selectivity of the predicates. An important fact is when a boundary is crossed in the graph and a different plan is selected, most of the times the cost and performance of both plans should be similar, as the selectivity or estimated number of rows only changed slightly. This could happen for example when a new row is added to a table which qualifies for the used predicate. However, in some cases, mostly due to limitations in the query optimizer cost model in which it is not able to model something correctly, the new plan can have a large performance difference compared to the previous one, creating a problem for your application. By the way, the plans shown on the diagram are the final plan selected by the query optimizer, don’t confuse this with the many alternatives the optimizer has to consider to select only one.

An important fact, in my opinion, which Conor didn’t cover directly, was the change of plans due to regressions after changes on cumulative updates (CUs), service packs, or version upgrades. A major concern that comes to mind with changes inside the query optimizer is plan regressions. The fear of plan regressions has been considered the biggest obstacle to query optimizer improvements. Regressions are problems introduced after a fix has been applied to the query optimizer, and sometimes referred as the classic “two or more wrongs make a right.” This can happen when, for example, two bad estimations, one overestimating a value and the second one underestimating it, cancel each other out, luckily giving a good estimate. Correcting only one of these values may now lead to a bad estimation which may negatively impact the choice of plan selection, causing a regression.

What Does the Query Store Do?

Conor mentioned the Query Store performs and can help with the following:

  1. Store the history of query plans in the system;
  2. Capture the performance of each query plan over time;
  3. Identify queries that have “gotten slower recently”;
  4. Allow you to force plans quickly; and,
  5. Make sure this works across server restarts, upgrades, and query recompiles.

So this feature not only stores the plans and related query performance information, but can also help you to easily force an old query plan, which in many cases can solve a performance problem.

How to Use the Query Store

You need to enable the Query Store by using the ALTER DATABASE CURRENT SET QUERY_STORE = ON; statement. I tried it in my current SQL Azure subscription, but the statement returned an error as it seems that the feature is not available yet. I contacted Conor and he told me that the feature will be available soon.

Once the Query Store is enabled, it will start collecting the plans and query performance data and you can analyze that data by looking at the Query Store tables. I can currently see those tables on SQL Azure but, since I was not able to enable the Query Store, the catalogs returned no data.

You can analyze the information collected either proactively to understand the query performance changes in your application, or retroactively in case you have a performance problem. Once you identify the problem you can use traditional query tuning techniques to try to fix the problem, or you can use the sp_query_store_force_plan stored procedure to force a previous plan. The plan has to be captured in the Query Store to be forced, which obviously means it is a valid plan (at least when it was collected; more on that later) and it was generated by the query optimizer before. To force a plan you need the plan_id, available in the sys.query_store_plan catalog. Once you look at the different metrics stored, which are very similar to what is stored for example in sys.dm_exec_query_stats, you can make the decision to optimize for a specific metric, like CPU, I/O, etc. Then you can simply use a statement like this:

EXEC sys.sp_query_store_force_plan @query_id = 1, @plan_id = 1;

This is telling SQL Server to force plan 1 on query 1. Technically you could do the same thing using a plan guide, but it would be more complicated and you would have to manually collect and find the required plan in the first place.

How Does the Query Store Work?

Actually forcing a plan uses plan guides in the background. Conor mentioned that “when you compile a query, we implicitly add a USE PLAN hint with the fragment of the XML plan associated with that statement.” So you no longer need to use a plan guide anymore. Also keep in mind that, same as using a plan guide, it is not guaranteed to have exactly the forced plan but at least something similar to it. For a reminder of how plan guides work take a look at this article. In addition, you should be aware that there are some cases where forcing a plan does not work, a typical example being when the schema has changed, i.e. if a stored plan uses an index but the index no longer exists. In this case SQL Server can not force the plan, will perform a normal optimization and it will record the fact that the forcing the plan operation failed in the sys.query_store_plan catalog.

Architecture

Every time SQL Server compiles or executes a query, a message is sent to the Query Store. This is shown next.

Query Store Workflow OverviewQuery Store Workflow Overview

The compile and execution information is kept in memory first and then saved to disk, depending on the Query Store configuration (the data is aggregated according to the INTERVAL_LENGTH_MINUTES parameter, which defaults to one hour, and flushed to disk according to the DATA_FLUSH_INTERVAL_SECONDS parameter). The data can also be flushed to disk if there is memory pressure on the system. In any case you will be able to access all of the data, both in memory and disk, when you run the sys.query_store_runtime_stats catalog.

Catalogs

The collected data is persisted on disk and stored in the user database where the Query Store is enabled (and settings are stored in sys.database_query_store_options. The Query Store catalogs are:

sys.query_store_query_text Query text information
sys.query_store_query Query text plus the used plan affecting SET options
sys.query_store_plan Execution plans, including history
sys.query_store_runtime_stats Query runtime statistics
sys.query_store_runtime_stats_interval Start and end time for intervals
sys.query_context_settings Query context settings information

Query Store views

Runtime statistics capture a whole slew of metrics, including the average, last, min, max, and standard deviation. Here is the full set of columns for sys.query_store_runtime_stats:

runtime_stats_id plan_id runtime_stats_interval_id
execution_type execution_type_desc first_execution_time last_execution_time count_executions
avg_duration last_duration min_duration max_duration stdev_duration
avg_cpu_time last_cpu_time min_cpu_time max_cpu_time stdev_cpu_time
avg_logical_io_reads last_logical_io_reads min_logical_io_reads max_logical_io_reads stdev_logical_io_reads
avg_logical_io_writes last_logical_io_writes min_logical_io_writes max_logical_io_writes stdev_logical_io_writes
avg_physical_io_reads last_physical_io_reads min_physical_io_reads max_physical_io_reads stdev_physical_io_reads
avg_clr_time last_clr_time min_clr_time max_clr_time stdev_clr_time
avg_dop last_dop min_dop max_dop stdev_dop
avg_query_max_used_memory last_query_max_used_memory min_query_max_used_memory max_query_max_used_memory stdev_query_max_used_memory
avg_rowcount last_rowcount min_rowcount max_rowcount stdev_rowcount

Columns in sys.query_store_runtime_stats

This data is only captured when query execution ends. The Query Store also considers the query's SET options, which can impact the choice of an execution plan, as they affect things like the results of evaluating constant expressions during the optimization process. I cover this topic in a previous post.

Conclusion

This will definitely be a great feature and something I’d like to try as soon as possible (by the way, Conor’s demo shows “SQL Server 15 CTP1” but those bits are not publicly available). The Query Store can be useful for upgrades which could be a CU, service pack, or SQL Server version, as you can analyze the information collected by the Query Store before and after to see if any query has regressed. (And if the feature is available in lower editions, you could even do this in a SKU upgrade scenario.) Knowing this can help you to take some specific action depending on the problem, and one of those solutions could be to force the previous plan as explained before.

The post The SQL Server Query Store appeared first on SQLPerformance.com.

14 Apr 18:13

Why isn’t my filtered index being used?

by Rob Farley

Quite often, people have filtered indexes but find they’re not being used as often as they’d like. I was reminded of this recently when I read a good post by Kendra Little from brentozar.com about filtered indexes. In it, Kendra talks about how the WHERE clause of a filtered index allows an IN clause, but not an OR clause (to quote someone from Microsoft “We currently do not support the "OR" clause due to the optimizer matching difficulties for such clauses”). Going back a few years, Tim Chapman wrote a post about the pains of filtered indexes.

Anyway, both of these posts hint that filtered indexes aren’t always used. Tim addresses it directly, and Kendra mentions needing a hint to make sure the index is used. I thought I’d explore the question a little more. I’m confident that both Tim and Kendra know this information – they are two of the top index experts in the world. This post isn’t for them, but for those people who are trying to find out why their filtered indexes aren’t being used.

To be used, the filtered index must be able to satisfy the query.

This should be fairly obvious, but it goes a little deeper than you might think on first glance. Let’s explore Tim’s examples to show what I mean.

Tim creates a filtered index:

CREATE INDEX FIDX_SalesOrderDetail_ProductID
ON Sales.SalesOrderDetail (ProductID)
WHERE ProductID = 870;

...and then shows that it’s used successfully for the query:

SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = 870;
 

(The image here is from Tim’s blog post, and belongs to Microsoft)

No surprise here – if the system knows that ProductID is 870, then it can use an index which only includes rows that satisfy that.

Tim then tries to use a variable instead of 870 – although he still passes in the value of 870.

DECLARE @ProductID INT;
SET @ProductID = 870;

SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @ProductID;

(Image from Tim’s blog again)

No luck – the system doesn’t use the filtered index. He can’t even use a hint to force it to use it – SQL replies saying that it can’t create a plan for it using that index hint.

So what’s going on?

The problem is not with the Query Optimizer seeing that the value is going to be 870 – the problem is with the plan cache. You see, when SQL runs a query, it figures it won’t be in isolation and it puts it into the cache. But the version that goes into the cache is a general one, that doesn’t consider the values that are passed in. Because it needs a plan that will work regardless of what the parameter is set to, using the filtered index here would be inappropriate.

Tim shows one way around this, and ‘hints’ at another in an edit, although sadly you tend to find that in blog post edits, you can miss the key a little.

The way that Tim gets around this is to use Dynamic SQL, but I’m not a fan.

DECLARE @SQL NVARCHAR(MAX), @ProductID INT
SET @ProductID = 870
SET @SQL = N'SELECT ProductID
FROM Sales.SalesOrderDetail 
WHERE ProductID = ' + CAST(@ProductID AS VARCHAR(10))
EXECUTE sp_executesql @SQL

Yeah – I’m not a fan of this. Go read my post on SQL Injection for why.

Tim does mention a better method in the ‘edit’ bit in his post, and I want to look at that. He says: “In many cases, a way to get around the local variable problem is to recompile the statement.” – but I want to make it very clear that the point is not actually to recompile the statement, but to use a statement that isn’t going to get put into the cache (which you do by using the OPTION (RECOMIPLE) hint, which makes it sound like you’re recompiling the statement).

When you use OPTION (RECOMPILE), the main impact is not that it recompiles, but that the query doesn’t get cached. Because it doesn’t get cached, it won’t find the query in the cache beforehand either. This means that it doesn’t need to consider the generalised version – it has the confidence to know that it doesn’t need to cater for future uses, so it can use the filtered index!

image

So the better option than using Dynamic SQL is to use OPTION (RECOMPILE).

So that’s one reason why your filtered index might not be used – but there’s another too:

Using the filtered index might be too much work.

Let’s consider Kendra’s example. She had a query that used IN. Her example was:

CREATE INDEX IX_Votes_filter ON dbo.Votes (PostId) WHERE (VoteTypeId IN (1,2));

But as I don’t have a Votes table, I’m going to use an equivalent with AdventureWorks:

CREATE INDEX FIDX_Product2Color ON Production.Product (ProductID) WHERE (Color IN ('Red', 'Black'));

Kendra uses SELECT COUNT(*) queries to test it. Similarly to Kendra’s example, this works well when IN is used, and when OR is used, but when just one of the options is used, it needs a hint to use the index, and then it needs a lookup to satisfy it.

image

What Kendra doesn’t point out is why the filtered index needed the hint to be used in the single-option examples (which is fair enough – because her post was about OR v IN, not about why the filtered index wasn’t being used).

The reason why is because of the Lookup that’s needed. This is so expensive, it’s cheaper for the Query Optimiser to do a clustered index scan instead. It’s standard ‘tipping point’ stuff, but we don’t normally see this when we have COUNT(*), because COUNT(*) is just counting rows, not returning extra columns.

...except that there is a column that isn’t included in the filtered index, which our query needs – Color.

Subtly, even though our filtered index only contains rows that have the Color either Red or Black (or in Kendra’s example, VoteTypeID either 1 or 2), it doesn’t store which rows are Red and which rows are Black. We know that every row in the index is either Red or Black, so we can use this index as a starting point, but we would need to do a Lookup to get the actual Color value.

To fix this, we should INCLUDE the Color column in the filtered index.

CREATE INDEX FIDX_Product2Color2 ON Production.Product (ProductID) INCLUDE (Color) WHERE (Color IN ('Red', 'Black'));

Now we don’t need to hint at the index to use, and we see a residual predicate being used to make sure that we only pull Red rows out of the index.

image

So we see two reasons here for filtered indexes not being used, and two ways to help encourage them to be used more often. The first is to consider using OPTION (RECOMPILE) to help persuade the Query Optimizer not to consider generalising the query across different parameter values, and the second is to INCLUDE the columns that are used in the filter, in case the database engine needs those values later in the query.

Do try to use filtered indexes, and be understanding about those times when they’re not used. There’s quite probably a simple explanation.

@rob_farley

14 Apr 18:13

Seattle Building Permits visualized in Power BI with Socrata OData and data.seattle.gov

by Lukasz Pawlowski -- MS

 

Living in Seattle, it’s quite fun to keep track of what’s happening through data.  The Open Data initiative from the City of Seattle is a treasure trove of information that you can explore here: http://data.seattle.gov.  This is all power by Socrata, as are many other open data initiatives for Cities and organizations.  I wanted to explore that data using Power BI, specifically Power BI designer.  In a future post, I’ll show how to monitor insights using the new Power BI Dashboards feature, but that’s for later.

I wanted to look at Building Permit data – since there’s a lot of construction happening in and around town.  I was able to find the “Building Permits : Current” data set which lists “Building Permits issued in the past five years or currently in progress”.  You can see it here:

https://data.seattle.gov/Permitting/Building-Permits-Current/mags-97de

I pulled that data into Power BI Designer.  You can download the designer at the link below if you’d like to try it for yourself: 

http://www.powerbi.com/dashboards/downloads/designer/

You can follow the steps below or play with the final file, download it from here:

Seattle Building Permits OData.pbix 

Let’s walk through step by step:

The data.seattle.gov site for building permits that exposed through Socrata.
1. Click the Export option
2. Select OData
3. Copy the OData URL

Seattle Building Permits Get Odata from Socrata

 

Open Power BI Designer.
1. Under "New Source" select "OData Feed".

Seattle Building Permits Use OData Source in Power BI Designer

 

Fill in the URL and press OK.
1. Put the OData Feed URL here.

 

Start using Power BI Designer to explore the data.
1. There are the fields from the OData Feed.  We call this the "Fields List".
2. You can drag fields from the "Fields List" into this area we call the "Canvas".

Seattle Building Permits Learn about Fields

 

If you drag "value" to the canvas, a default visual is created.
1. This is the field I dragged.  You can also check the check box.
2. Use this button to quickly change the visual. 
3. You can also change visuals from here. Just make sure to select the visual you want to change first.

Seattle Building Permits First Visual

 

Here's my first report after playing with the data for just a few minutes.
1. This is the total value of all permits returned by the OData feed.
2. This is the distinct count of permits by permit_type.
3. This tree map shows that Commercial is the largest category by total value.  Interestingly it shows all the applicant names and how those applicants contribute to the total value of each category. 
4. This trend shows the distinct count applications by date.
5. This data point is fascinating!  The data sets has 5 years of building permit data and any "current" permits.  So permits from before this point are really old.  Might be worth investigating more why they're still open.

Seattle Building Permits Insights

 

The report is interactive by default.
1. I clicked on "COMMERCIAL" here and it  cross filtered all the other visuals for me.  So I can see the trend and totals just for what I selected.

Seattle Building Permits Highlighting

 

When I switch to "Query" I can see how the data was loaded.
1. Clicking here switch to show the data that was loaded.   Clicking "Report" takes me back to the report.
2. I can have multiple queries in a report.  They all show up here.  To get more queries, select "New Source" in the ribbon.
3. I can work on the data to transform it, mash it up, clean it up, or what have you.  We call these actions "Steps".   The steps show up here.  You view and fix the steps.

Seattle Building Permits Query Pane intro

 

Odata can contain nested data.  You can work with it without a problem.
1. This column has more data.  It will shows as "Record", "Table", or "List" in the UI.
2. Press this button to expand it.

Seattle Building Permits Expand a column

 

This is my query after I expanded a few columns.  I selected the green column so they're easier to see.
1. Notice how I have multiple steps now.  They're sequential so I can click through them to see what the query is doing.
2. I can always edit a step by pressing here.

Seattle Building Permits Query showing steps

 

I created a page that shows where each application is planning to build.

Seattle Building Permits Where applicants are building things

 

Using filters and different visuals, you can quickly find insights.  Here quickly see which applicants have applied for permits.  I've captured a view that shows that Patterson has applied for many more permits than Belcher, for example.  But for both those applicants, the total of their permit values is larger than half a million dollars.

Seattle Building Permits Filter to find insight

 

There it is.  A simple interactive view of Seattle building permits data.  The OData feed from Socarata works great but unfortunately doesn’t include zip code information.  That would have allowed me to create a density map of neighborhoods showing where investment is going. Maybe someone from the City of Seattle will read this, add the zip code and I’ll do a follow up. 

Next step will be publishing this to Power BI to monitor it.  I’ll do that in a future blog post.

Let me know what you think in the comments.  Would love to answer any questions you might have.

HTH,

Lukasz

14 Apr 18:13

ICYMI: Sequences, CPUs, and Trimming T-Logs – SQL Server Performance Topics

by kevin

Have you read any of the great articles on SQLPerformance.com? The deep-dive technical info about SQL Server performance issues authored by the SQL Server industry’s top experts. Here are a few great articles from the early days of the website, January 2013:

 

Generate a set or sequence without loops

Aaron Bertrand (b|t) provides detailed performance information about a variety of methods used to generate sets and sequences in this first in a three part series.

Selecting a Processor for SQL Server 2012

Glenn Berry (b|t) of SQLskills.com sheds light on the best CPU to select for your new SQL Server 2012 installation, an especially important insight considering Microsoft’s move to core-based licensing.  The wrong choice could cost you a fortune. A lively discussion also made this article even more enjoyable.

Trimming More Transaction Log Fat

In a follow-up to his December 2012 article, Paul Randal (b|t) of SQLskills.com dives into subtle performance problems that can cause bloat in the transaction log and slow its performance.

 

Let me know what you think!

-Kev

-Follow me on Twitter!
-Google Author

The post ICYMI: Sequences, CPUs, and Trimming T-Logs – SQL Server Performance Topics appeared first on Kevin Kline.

14 Apr 18:12

Monkeys and bananas: stifling innovation in the workplace

The story I wrote for our SQLskills Insider newsletter on Monday last week resonated so well with people that I thought I’d publish it here on my blog too (which I only do rarely). I try to mix things up in the newsletter so it’s not all about SQL Server technical matters; people seem to really like career topics too. You can sign up for the bi-weekly newsletter here (and get all the 100 past issues too) – more than 12,000 people get it now!

Here’s the story, from the newsletter section I call Paul’s Ponderings. Enjoy!

Last week Jonathan pointed me at an interesting story about a psychology experiment involving monkeys and bananas and the reinforcement of constrained, negative thinking. The experiment actually never happened, but the story is quite illustrative. You can see a graphic of the story here: http://i.stack.imgur.com/MyQki.jpg (totally safe for work), and I’ll paraphrase quickly below:

  • A group of monkeys are trained to know that if any one of them attempts to climb a stepladder with a banana at the top, cold water is sprayed on all of them (i.e. temptation is prevented by the spraying of cold water, so the group prevents any individual from trying to get the banana).
  • When a new monkey is introduced, and tries to get the banana, all the other monkeys prevent it as they don’t want to be sprayed with water. The new monkey learns that it can’t get the banana.
  • Eventually all the monkeys are replaced, and they all know not to try to get the banana, but none of them know why, as they’d only been told not to get the banana by the others.

It’s a really interesting story of how conditioning can become groupthink (even if the experiment didn’t actually happen).

There are obvious parallels in human society, and specifically in the work environments of many of you reading this newsletter. Let me explain:

  • A new person (A) joins a team. The new person has a great idea for how to do something differently/better, and everyone else on the team prevents them from expressing their opinion because that won’t be allowed, it won’t work, and/or they could get into trouble (e.g. from an intransigent and influential boss/architect/senior developer).
  • Eventually all the original people leave the team and it’s only A and the people who joined the team after A that are left. They all have the ingrained knowledge that they can’t try anything new, or try a specific technology because they won’t be allowed to etc.

In other words, being constrained to incumbent technologies and methodologies becomes “the way it’s done around here, because it’s always been that way.” It doesn’t matter if the wider world knows that’s maybe not the best way to do it, the historical groupthink wins out.

We see this with new clients over and over, and it can be really difficult to educate the client team as to different (usually better) ways of approaching a problem. Sometimes it’s just one person who stymies all thoughts of innovation or change in the team, and sometimes it’s the collective groupthink that does it.

In fact, there was a client (years ago), that Kimberly actually “fired” because of this. One meeting was enough but she told her contact (“the management”) that she’d try again and meet again for a second session. In her post-mortem with management, her main quote was – your problems are more than technical (specifically person “X”) and until that’s resolved, I can’t help you.

Here are some simple examples of ingrained behavior we’ve seen and corrected:

  • Always use NOLOCK to work around blocking and deadlocking problems (and then it’s used everywhere, and no one even realizes the negatives)
  • Only ever rebuild an index to remove fragmentation, without considering FILLFACTORs or trying to reorganize rather than rebuild
  • Always use a heap and never use a clustered index because heaps are faster
  • When creating a primary key, always use a clustered index as that’s the default so Microsoft must know that’s always the right way to do it
  • For each column that’s used in a WHERE clause, create a single-column nonclustered index
  • Always use OPTION (RECOMPILE) to prevent plans being put into the plan cache and taking precious memory away from the buffer pool
  • Always use sp_executesql to put a plan into cache for ad hoc statements so that their plans get reused
  • Always create one file per processor core for tempdb, because that was the Microsoft guidance for SQL Server 2000

As I hope you can see, these are all very blinkered (and in some cases downright wrong) views on a variety of SQL Server topics. There are cases when some of these are the right thing to do, but not ALL the time just because ‘that’s the way we do it here’.

Does this remind you of anywhere? Or of anyone?

It’s a dangerous way for a team to work because it can lead to all kinds of problems (such as poor performance, failed disaster recovery, good people leaving the team, business failure) that can be extremely hard to fix unless someone takes a stand or someone from outside the team helps to break the impasse. And even then, it’s a delicate process to provide education and slowly change the thinking of the team, or of that one person who dominates the team’s thinking.

Call to action: Consider the environment you work in and whether the situation described above is how your team operates. Are you the person to take the stand to try to change the groupthink? Do you do it or do you move on? Or are you the person who’s unreasonably dominating the team’s thinking? Do you allow yourself to be changed or continue in your intransigence? It’s a difficult situation to recognize you’re in, whatever your role is, and a hard choice to decide what to do about it.

17 Mar 18:56

About Gartner Magic Quadrant 2015 for BI

by tlachev

The 2005 Gartner Magic Quadrant is out and according to Gartner, the distance between Tableau and the other leaders is widening. Here is the full report. It's obvious that Gartner focuses only on the self-service aspect of BI and throws away the entire gamut of tools required to deliver successful BI solutions, including RDBMS, ETL, data models, MDM, etc. But even if we focus on self-service BI, I don't quite agree with Gartner's infatuation with Tableau (see my blog "Top 10 Reasons for Choosing Microsoft Self-service BI"). It's a good visualization tool but based on what I hear, people tend to overestimate its capabilities and get in trouble. Nevertheless, for the most part I agree with the Gartner's assessment related to Microsoft BI cautions, except:

1. "Microsoft had the highest percentage of customer references citing absent or weak functionality (for example, no drill-through capabilities in Power View) as a platform problem."

Really? The other vendors have more functionality? Can Tableau import multiple datasets, transform data, have Q&A, scale, share and discover datasets, or have data governance? And, that percentage is probably high, because the percentage of customers using Microsoft BI is high.

2. "However, customers may have difficulty finding external resources with experience in the newer Power BI stack, which requires a different set of skills and expertise than Microsoft's sweet spot of systems-of-record, developer-focused BI deployments."

The Power BI stack shouldn't require that much knowledge.

A successful BI strategy should be much more than just "putting lipstick on a pig". No matter how nice the lipstick is, it's still a pig. However, the Gartner's summary of the Microsoft weaknesses is spot on:

"Microsoft is attempting to address many of these limitations in the forthcoming stand-alone version of Power BI, which does not require Office 2013 or an Office 365 subscription and can access Analysis Services structures and content without physically moving underlying enterprise data to the cloud."

I'm looking forward to the 2016 quadrant. For now, I like better the Forester Research report (http://www.forrester.com/pimages/rws/reprints/document/115485/oid/1-RN6A25).

17 Mar 18:56

Prepared statements: be careful when given “THE” solution for all of your development practices

by Kimberly Tripp

I’m not sure about how to title this OR how to avoid the term “it depends” but I also don’t feel that “it depends” is a bad phrase. Some folks think of it as a cop-out when they don’t have a better answer but the problem is that it’s often the RIGHT answer. There are rarely simple solutions that work in ALL situations. And, as a result – it depends – is really correct. What’s most important is knowing what it depends ON.

So, where am I going with this one?

I recently received a question that started by referencing a post titled: NHibernate, the database query cache and parameter sizes. My issue is that there is truth in that post (and also in the comments) but there are also a few statements that just aren’t quite right. In fact, there’s one that I thought was right but I had to really dig into it to see what was really happening and what’s happening is NOT what I expected. And, this is really the crux of the problem. Sometimes the behavior (or, solution) is completely counter-intuitive. But, above all, there’s one fatal mistake in this post (and many others) – where they state what THE solution is. Some statements are claiming that they are [ALWAYS] better than the other and unfortunately, that’s just not the case… (comments were closed so I couldn’t comment on it directly)

What I really want to dissect a bit is this part of the post:

There are arguments for both sides, but most DBAs would really like to see only a single query plan for all parameter sizes. I didn’t have the answer off the top of my head, but I was pretty sure that NHibernate did the right thing from the point of view of the database.

As it turn out, I was wrong. By default, NHibernate will generate two separate query plans for this query in the database. Sounds bad, right?

Not really, here is the fix, just put the following in your configuration:

<property name='prepare_sql'>true</property>

And, this particular comment:

The side-effect of calling Prepare() is that a so-called “prepared statement” is created in the DB. This is essentially a statement that is precompiled in the DB and gives back a handle so future calls on THE SAME CONNECTION can be made just referencing this handle and not sending the entire sql again. If all calls to the DB were made on the same connection its true that the execution plans would be reused, but as we know – connections are drawn from the connection pool as needed and handed back in a random manner. The prepared statements on the connection are un-prepared everytime a new session is created and the connection is soft-initialized.

First and foremost, the post and the comments aren’t bad. These people are working to make their environments better and to help others to sort out their bizarre behaviors / problems. They are truly trying to help. I don’t suspect even a tiny bit of their intentionally wanting to lead anyone astray. However, some of this is just wrong. So, let me begin.

There ARE arguments for both sides.

Yes, this is TERRIBLY important. The reason there are arguments for both sides is that EACH side is correct SOME of the time. Neither is correct ALL of the time and, of course, something has to be chosen as the default. Let me try to give a basic example of this… I’ll use a traffic analogy. It’s 2am, you’ve been coding for quite a few hours, and you want to get some great diner food – in Seattle. But, you live in Redmond (across the bridge). The shortest distance between you and the restaurant will get you there in 30 minutes but requires you to cross a toll-bridge. So, the trade-offs are money for time. But, you decide to go solely “for performance” and you choose that plan… take the 520 over to Seattle.

Now, let’s say that plan is “in cache” and MUST be re-used the next time you want to go to the diner. However, the next time you want to go – it’s 4:30pm, on a Friday, and there’s a Husky game… the plan to use 520 is a bad idea. Not because it’s not the shortest distance but because of other problems (congestion, “blocking,” etc…). And, don’t get me wrong – SQL Server does NOT change plans for “blocking” or “congestion” (so, the analogy breaks down a bit here). But, you can still see the point.

A chosen plan for one set of parameters may not be best with different parameters.

And, this is why there isn’t a single right answer for EVERY statement. So, if you’re looking for THE right choice (should the statement’s plan be placed in the cache or not) then this is the tough part – what’s really best depends on what I call – the stability of the plan generated against a wide variety of different parameters.

With the right kind of testing, you can execute a statement with a variety of different parameter combinations and very whether or not each of their optimal plans has the same pattern / shape. If they do – then, I’d say that statement’s optimal plan is stable. However, (and this is often MORE likely), if the optimal plan varies, then that statement’s optimal plan is unstable (across different parameter combinations).

The end result is really a set of two rules:

  1. When a statement’s optimal plan is stable and does NOT vary across executions – a plan should be saved.
  2. When a statement’s optimal plan is unstable and varies across executions – a plan should not be saved.

“Most DBAs would really like to see only a single query plan for all parameter sizes.”

Well, only when the plan is stable. If the plan is unstable then another problem occurs – this is often referred to as “parameter sniffing problems” or “parameter sensitivity problems” (a.k.a. PSP). PSP can actually end up being a worse problem. If a bad plan sits in cache and is re-used for all executions (executing an inefficient plan) then you can end up with statements doing a tremendous amount of I/O or spilling to disk with bad memory grants or doing the wrong type of join, etc. when they shouldn’t. I’ve seen PSP essentially take a server down (causing so much I/O that the server is essentially inaccessible and unable to process any additional requests. Some clients even “react” to this by restarting SQL Server… problem solved. Well, cache cleared and the bad plan will be thrown out. Hopefully, the next execution will be what we want and a reasonable plan will get into cache. At least temporarily.

And, this is how it goes. Poor performance occurs and often a variety of things are blamed (cache or I/O or statistics or indexes) and some quick fix (updating statistics is another common one) SEEMS to solve the problem. But, ultimately, it’s probably a side-effect of what you’ve done that’s solved the problem (NOTE: Updating statistics often “invalidates” plans so the side-effect of having updated statistics might be what solved your problem and NOT the updating of statistics itself). I talked about this in a related post: What caused that plan to go horribly wrong – should you update statistics?

So, to bring it back to the main point… HOW you execute and whether or not YOU choose to force the statement to be cached are VERY important decisions and unfortunately depend on a statement by statement basis. I’ll try to bring it all together in just a moment.

“I was pretty sure that NHibernate did the right thing from the point of view of the database. As it turn out, I was wrong. By default, NHibernate will generate two separate query plans for this query in the database. Sounds bad, right?“

First off, I’d have to agree WITH the default. The default is often referred to as “simple” parameterization. This is where SQL Server looks at the statement and decides – no, I shouldn’t cache a single plan because the plan might not be “safe” across all executions. And, as a result, it must be compiled for EVERY execution. I agree with the default because many executing a bad plan can be WAY more expensive than recompiling to get a good plan and executing that.

But, there is a bad side to this too. If the statement is STABLE then recompiling every time wastes time in compiling (CPU) and, it can also lead to plan cache bloat. (I’ve talked more about this here: Plan cache and optimizing for adhoc workloads)

So… I think that NHibernate is doing the right thing by default.

But, having said that, that default is NOT great for everything.

“Just put the following in your configuration: <property name=‘prepare_sql’>true</property>”

MAYBE – but, only when the plan is STABLE.  (yes, I know… I still haven’t fully described how you can tell if a plan is stable)

So, what does that do? With a bit of digging (and, from the email that I received), it appears as though this essentially uses / created a “prepared” statement. And, this also ties into one the comment that I pointed out above. And, while the comment sounds correct (and, I even agreed with it at first); that’s not actually how it works (yes, I was surprised too).

Prepared statements using sp_prepare

It turns out that telling NHibernate to use prepare_sql = true effectively does something like the following:

DECLARE @handle int
EXEC sp_prepare @handle OUTPUT,
     N'@P1 VARCHAR(15)',
     N'SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] LIKE @p1';
EXEC sp_execute @handle, N'Anderson';
EXEC sp_unprepare @handle;
GO

At first glance it seemed like SQL would have to generate a different plan for every execution (except those that used the same handle) but I found that as long as statement was the same – the plan was the same. ACROSS CONNECTIONS. This surprised me. Even with the sp_unprepare. Even with the explicit @handle. YES, they re-used the plan. So, the effect is what was desired by the prepare_sql property being set to true. But, this is NOT always ideal (again, remember, the execution of a bad plan can often be more expensive than the cost of recompilation). But, not, recompiling every time should be avoided where the plans are stable.

So, then I started wondering about what sp_unprepare does and I found another post discussing this a bit more: Watch out those prepared SQL statements which is interesting in and of itself but that just exposes yet-another issue with where / how the handles are managed. So, if you’re calling these yourself then you definitely want to sp_unprepare. But, that still doesn’t get rid of the PSP problem. And, again, these PSP problems can be across connections.

Prepare your statements or not – that is the question?!

It really all boils down to this – effective testing. Most developers are fantastic at code coverage testing. However, most don’t know about what I call “plan stability” testing. The good news is that if you are doing good code-coverage testing then it should make code-stability testing easy!

Code-coverage testing should thoroughly test all of your code branches and all of your possible parameter combinations (or, at least, a healthy set of the more common combinations). In addition to making sure that these code combinations “do the right thing,” what I want you to do is see if their optimal plan varies across these executions. There are a few ways to do this:

For statements: If you can isolate the statement and hard-code different combinations of parameters – this might be the easiest way to do a test. What I would do with the statement above is something like this:

SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Tripp';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Anderson';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Smith';
SELECT [m].* FROM [dbo].[member] AS [m] WHERE [m].[lastname] = 'Abbas';
GO

Then, execute all of these in SSMS with the option “Include Actual Execution Plan” (from the Query drop-down). Now, before you think you need to understand everything about plans and what they mean… here’s the good news. For “stability-testing” you do NOT need to care what the plan does. What you need to look at is the plan shape (OK, sometimes it’s not quite this easy and I would look a little deeper [like at the types of operators and sometimes deeper still] but I will venture that the large majority of huge performance problems are caused by vastly different plan shapes and so the majority of problems can be found JUST by looking at the shape of the plan and not having to dive in much deeper).

If the plan shape varies then the optimal plan for this statement varies; do not FORCE / PREPARE it into cache. 

If the plan shape does not vary then the optimal plan for this statement is consistent / stable; this statement should be forced / prepared in the cache. (IMPORTANT NOTE: if you don’t do a very good job testing a wide-enough range of combinations you might make a bad decision here. It’s very important to have realistic data, realistic parameter combinations, and a number of them.)

For stored procedures: For stored procedures there’s a similar / easy way to see if the plan varies from execution to execution by using EXEC procedure parameters WITH RECOMPILE. While this does have some limitations, this can also be helpful at knowing if a stored procedure might be prone to similar PSP. I did write a detailed post on that here: Building High Performance Stored Procedures.

The End Result

Blog posts like the one I referenced are based in fact. And, I believe that what they’ve done did help for THEIR problem. However, trying to over simplify (for ALL environments) a single coding practice is often NOT IDEAL. Unfortunately, it depends IS the right general answer here.

If you’re looking for a lot more content around this, check out the links I’ve included in this post – there are numerous free resources including an online video I did as part of PASStv (linked to in the Building High Performance Stored Procedures post). And, if you still want more content, demos, and details, check out my two Pluralsight courses specifically targeting these problems. Start with this one: SQL Server: Optimizing Ad Hoc Statement Performance and then watch this one: SQL Server: Optimizing Stored Procedure Performance.

Have fun!
k

The post Prepared statements: be careful when given “THE” solution for all of your development practices appeared first on Kimberly L. Tripp.

17 Mar 18:56

The SQL Sentry forum for SQL Statement Tuning

by KKline
Check out this community website dedicated to helping SQL programmers tune their most difficult and problematic queries....(read more)
17 Mar 09:55

Balancing Your Available SQL Server Core Licenses Evenly Across NUMA Nodes

One issue I recently ran into with a customer was a case where they were using a new, two-socket server with two Intel Xeon E5-2697 v2 Ivy Bridge-EP processors. This particular 22nm processor has twelve physical cores, plus hyper-threading, so you can have 24 logical cores for one entire physical processor, and 48 logical cores for a two-socket server with both sockets populated.

The reason why we had an issue is because both SQL Server 2012 and 2014 Standard Edition have an artificial license limit of either 16 physical cores or 32 logical cores (along with a limit of four sockets, whichever is lower). This meant that SQL Server 2014 Standard Edition could only use 32 out of the 48 logical cores in this new server. To make matters worse, the available cores were spread unevenly across the two NUMA nodes in the server. There were 24 logical cores on one NUMA node and 8 logical cores on the other NUMA node in the server, which was not an ideal situation.

Fortunately, it is pretty easy to fix this issue by using the ALTER SERVER CONFIGURATION command that was introduced in SQL Server 2008 R2. On the negative side, fixing it was not completely intuitive, because of these dumb Standard Edition license limits.

You can use the queries in the code below to detect and correct this issue.

-- Balancing your available SQL Server core licenses evenly across two NUMA nodes
-- Glenn Berry
-- SQLskills.com


-- Get socket, physical core and logical core count from SQL Server Error Log
EXEC sys.xp_readerrorlog 0, 1, N'detected', N'socket';


-- SQL Server NUMA node information 
SELECT node_id, node_state_desc, memory_node_id, processor_group, online_scheduler_count, 
       active_worker_count, avg_load_balance, resource_monitor_state
FROM sys.dm_os_nodes WITH (NOLOCK) 
WHERE node_state_desc <> N'ONLINE DAC' OPTION (RECOMPILE);


-- SQL Server schedulers by NUMA node
SELECT parent_node_id, 
  SUM(current_tasks_count) AS [current_tasks_count], 
  SUM(runnable_tasks_count) AS [runnable_tasks_count], 
  SUM(active_workers_count) AS [active_workers_count], 
  AVG(load_factor) AS avg_load_factor
FROM sys.dm_os_schedulers WITH (NOLOCK) 
WHERE [status] = N'VISIBLE ONLINE'
GROUP BY parent_node_id;



-- SQL Server NUMA node and cpu_id information
SELECT parent_node_id, scheduler_id, cpu_id
FROM sys.dm_os_schedulers WITH (NOLOCK) 
WHERE [status] = N'VISIBLE ONLINE';


-- Fixing the problem

-- Unfortunately, this does not work, due to the license limits in SQL 2012/2014 Standard Edition
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE = 0,1;

-- Msg 5833, Level 16, State 2, Line 7
-- The affinity mask specified is greater than the number of CPUs supported or licensed on this edition of SQL Server.


-- This command spreads your available 32 logical core licenses across two NUMA nodes 
-- This is valid for an Intel processor, with HT enabled
ALTER SERVER CONFIGURATION
SET PROCESS AFFINITY CPU = 0 TO 15, 25 TO 40;
 

 

A similar issue with “grandfathered” SQL Server 2012 Enterprise Server+CAL customers was discussed by my colleague, Jonathan Kehayias here.

17 Mar 09:55

ICYMI: Ten Query Techniques Every SQL Programmer Should Know (Video)

by kevin

Update 09-Mar-2015: The original PASS links I provided were behind a firewall. Thanks reader MMcDonald, we have a working link on YouTube.

One of the benefits of joining PASS is that you get access to their expansive library of conference videos. (Membership is free. So join now. What’dya want, a hand-written invitation?!?) I want to take a moment to point out a good one that I think anybody who writes SQL code on SQL Server should watch.

If you know anything about the staff here at SQL Sentry, then you’re probably expecting me and my colleague Microsoft MVP Aaron Bertrand (b | t) to be doing double-duty at any conference we attend in both the booth and as a conference presenter. Well, you certainly wouldn’t be wrong. :-)

Kev n Aaron 01

Between the two of us at the PASS Summit 2013, we participated or led in:

  • Three “First Timer” Orientation Sessions, with Microsoft MVP Kendal Van Dyke (b | t)
  • The opening night Reception and Quizbowl
  • The Women in Technology (WIT) Luncheon
  • Two Spotlight sessions with a technology focus. (Click the picture at above to watch the video for the session Ten Query Tuning Techniques Every SQL Programmer Should Know. Aaron has posted extras like PowerPoint and Samples  as well!)
  • Two regular sessions with a professional development focus

Now, it’s time for Aaron and I to figure out what we’re going to submit to the PASS Summit 2015. What kind of suggestions do you have for us?
-Kev

-Follow me on Twitter!
-Google Author

The post ICYMI: Ten Query Techniques Every SQL Programmer Should Know (Video) appeared first on Kevin Kline.

17 Mar 09:55

More online operations available now – or soon

by Aaron Bertrand

I was running some tests in SQL Database and discovered at least one new operation that supports ONLINE = ON. This is on a very recent version, by the way – SELECT @@VERSION; continues to yield an old build number, but the proof is in the build date:

Microsoft SQL Azure (RTM) – 12.0.2000.8
  Feb 12 2015 00:53:13
  Copyright (c) Microsoft Corporation

This version of Azure SQL Database supports the ONLINE = ON option for ALTER TABLE ... ALTER COLUMN.

Let's say you have a table with a nullable column:

CREATE TABLE dbo.a(id INT PRIMARY KEY, x VARCHAR(255));
 
INSERT dbo.a(id, x) SELECT TOP (1) [object_id], name FROM sys.all_objects;

And now you decide to make that column not nullable, you can do this (assuming there are no NULLs):

ALTER TABLE dbo.a
  ALTER COLUMN x VARCHAR(255) NOT NULL
  WITH (ONLINE = ON);

You can also do things like change the collation, the data type or the size of the column:

ALTER TABLE dbo.a 
ALTER COLUMN x NVARCHAR(510)    -- changed data type and length
  COLLATE Albanian_BIN NOT NULL -- changed collation and nullability
  WITH (ONLINE = ON);

In current versions of SQL Server (and previous versions of Azure SQL Database), the ONLINE = ON hint was not supported for ALTER TABLE, and without the option, this was a blocking and size-of-data operation. To be fair, the first time I ran the code, I could only prove that the version with ONLINE = ON ran successfully, not that it worked as advertised.

I ran this code with ONLINE = ON and without:

CREATE TABLE dbo.a(id INT PRIMARY KEY, x VARCHAR(255));
 
INSERT dbo.a(id, x) SELECT TOP (1) [object_id], name FROM sys.all_objects;
 
-- placeholder;
 
ALTER TABLE dbo.a 
  ALTER COLUMN x NVARCHAR(510)
  COLLATE Albanian_BIN NOT NULL 
  -- WITH (ONLINE = ON);
 
-- placeholder;
 
DROP TABLE dbo.a;

In the --placeholder spot, I tried a few things to determine any difference in behavior (this was our production SQL Database, so I didn't want to use enough data or create enough activity that the difference would be obvious). I wanted to check in both scenarios whether the page had changed (indicating a true online operation) or if the values were updated in place on the existing pages (a not-so-online operation). I could have also expanded the test to see how many new pages were created if the pages were full and/or all 255 characters were used, but I thought that just seeing whether the pages changed would be enough.

I tried DBCC IND():

DBCC IND(N'dbname', N'dbo.a', 1, 1);

The results here were not surprising:

Msg 40518, Level 16, State 1
DBCC command 'IND' is not supported in this version of SQL Server.

And sys.dm_db_database_page_allocations (the replacement for DBCC IND):

SELECT allocated_page_page_id
  FROM sys.dm_db_database_page_allocations(DB_ID(),OBJECT_ID(N'dbo.a'),1,1,N'LIMITED')
  WHERE is_iam_page = 0;

This yielded an empty result set – I believe it is by design that this dynamic management function does not expose any physical information in Azure SQL Database.

Next, I tried a trick with fn_PhysLocCracker, which folks like Michelle Ufford (@sqlfool) have blogged about before:

SELECT l.page_id FROM dbo.a
  OUTER APPLY sys.fn_PhysLocCracker(%%PhysLoc%%) AS l;

Success! This returned values for the pages used in the scan against dbo.a, and it is clear that in the ONLINE = ON version, the data is moved to new pages (presumably leaving the old ones available throughout the operation), and without the hint, the data and metadata is updated in place:

Comparing pages under standard ALTER COLUMN behavior (left) with ONLINE = ON (right)Comparing pages under standard ALTER COLUMN behavior (left) with ONLINE = ON (right)

Another thing I wanted to compare was the execution plans. I might not see much in Management Studio, but in SQL Sentry Plan Explorer Pro, I can see the full call stack, including what goes on behind the scenes of some DDL commands. Our tool didn't disappoint – while it didn't present an actual plan for the in-place update variation, it too demonstrates that there is a significant behavior difference when using ONLINE = ON:

Comparing plans under standard ALTER COLUMN behavior (left) with ONLINE = ON (right)Comparing plans under standard ALTER COLUMN behavior (left) with ONLINE = ON (right)

Of course, you will only see this difference if you meet all of the other conditions required for online operations (many are similar to the requirements for online index rebuild) in the recently-updated documentation.

Now, if you're not using SQL Database, how does that help you? After all, this syntax does not parse correctly even in SQL Server 2014 Cumulative Update #6 (12.0.2480). Well, Microsoft has not exactly been guarding the fact that the development pattern has become "cloud first, then box" – as Mark Souza recently implied when he tweeted about the new row-level security feature introduced first in Azure SQL Database:

Row level security. Asked for a lot by #sqlserver community. http://t.co/pp0sNr8Nt5 Cloud first but you know what that means. It's coming

— Mark Souza (@mark_AzureCAT) February 8, 2015

This means that these online operations are likely coming to your local copy of SQL Server at some point soon, too. Like many other online operations, though, keep in mind that these things tend to be reserved for Enterprise Edition.

The post More online operations available now – or soon appeared first on SQLPerformance.com.

17 Mar 09:54

Large SQL Server database backup on an Azure VM and archiving

by Murshed Zaman_AzureCAT

Applicable Versions

SQL Server 2014 and lower versions.

Background

Azure Virtual Machines (VM) allows the capability to install SQL Server and lift/shift an application or data warehouse to the cloud. The largest disk size Azure currently allows is 1tb (or more accurately 1023GB). Larger VMs such as A9 and D14 allows 16 and 32 of these disks to be attached respectively. For a large SQL Server database on an Azure VM, backing up the data can have some challenges. Some challenges for a backup can be scheduled VM reboot, unplanned reboot or VM migration because of faulty OS which are sometimes unavoidable but not very common occurrences and thus out of scope for this blog. We will focus on two particular challenges for backing up large SQL Server database on an Azure VM – size and speed.

Size related challenges

SQL Server allows the option to backup to disk and URL. Currently, URL backup to Azure Storage is limited to 1 file only and also that file cannot exceed more than 1TB. This is a good option for backing up small databases.

Backup to disk would certainly have to be smaller than 1TB per disk unless the disks are configured as a storage pool. In a case where backup will not fit on the disks (16 disks on A9 and 32 disks on D14) we would recommend looking at SQL Shep’s blog. That blog describes how to create DFS file storage in Azure which is not a small undertaking. Thus, planning for backup space has to be one of the factors when moving large SQL Server database in Azure. If the data files fit on 16 disks but the backup will not you may want to think about D14 VMs which allows attaching up to 32x1TB disks.

Another size related challenge comes from the requirement of archiving data to Azure Blob Storage. Currently, each blob cannot exceed 200GB in Azure Storage. Another option of archiving backup is to use Azure File Service. At the time of the writing of this blog, Azure File Service is still under preview. Azure File Service allow you to mount a 5TB drive through SMB protocol as a network drive on an Azure VM. Azure File Service also has the limit where 1 file cannot be greater than 1TB. But multiple Azure Files can be connected to one VM and used to copy/archive backups. These file share(s) can also be mounted on many VMs and can be accessed using a REST APIs.

Speed related challenges

When backing up SQL Server on-premises we usually have large storage pool or SAN connected servers. Backing up using one backup file on-premises is not uncommon and works fine. In Azure VMs we can also use a storage pool to create a backup with one backup file. But since space on VHDs are limited we need to plan for archiving the data onto Azure Blob Storages. Moving one large backup file to Azure Storage can become impossible because of the size restriction. Azure file Service allow storing of larger files but having one large backup file can prove challenging when copying.

Recommended Approach

Our recommendation would be to backup to multiple files for even a storage pool rather than one large file. In this case let’s assume there are ten drives mapped, H through Q. The backup command would look something like the following:

BACKUP DATABASE tpch_10tb
TO DISK = 'H:\backup\tpch_10tb_01.bak',
DISK = 'I:\backup\tpch_10tb_02.bak',
DISK = 'J:\backup\tpch_10tb_03.bak',
DISK = 'K:\backup\tpch_10tb_04.bak',
DISK = 'L:\backup\tpch_10tb_05.bak',
DISK = 'M:\backup\tpch_10tb_06.bak',
DISK = 'N:\backup\tpch_10tb_07.bak',
DISK = 'O:\backup\tpch_10tb_08.bak',
DISK = 'P:\backup\tpch_10tb_09.bak',
DISK = 'Q:\backup\tpch_10tb_10.bak'
WITH COMPRESSION,
STATS = 5;
GO

SQL Server allows compression when creating a backup. In certain customer cases we have seem a 10TB database compressed into 2TB or smaller in backup files. When Transparent Data Encryption (TDE) is used in the database the backup of that database does not compress. As such an important recommendation is to compress the backup to keep the size of the backup smaller unless TDE is used.

The reasons behind having many compressed backup files vs. one large file are the various size limits discussed earlier and the practicality of data movement in the cloud. Let's assume the storage pool or drives are rather full after the backup, we need to move the backup somewhere else to make space for the next backup. In this case Azure Blob Storage is a good candidate to copy the backup and free up the disks. When a storage account is created on azure, the user has option to choose if the disk will be locally redundant (LRS) or geo redundant storage (GRS). We will leave it up to your business decision as to what type of redundancy to choose.

As 200GB is the current limit of a single blob, you have to calculate to keep the files under that limit. If the database has multiple files or filegroups, a quick way to check would be to backup a file or filegroup to see the size and estimate the backup size from there. SQL Server does a good job at keeping the backup files the same size when multiple files are used.

Let’s assume our backup is divided into 10 files that doesn’t exceed 200GB limit for each file. We can now write 10 copy command to parallelize the copy operation to archive the backup to Azure Storage. We will also need to make sure all files are copied correctly or else the backup is useless when it comes time to restore the data for whatever reason.

We have run the following backup scenarios and captured timing:

Database

Size

Database Compression

Timing (hh:mm:ss)

TPCH_10GB

10GB

Uncompressed

00:02:18

TPCH_1TB

1TB

Compressed

02:29:00

TPCH_10TB

10TB

Compressed

21:16:00

In our case, we have observed that when all the tables in a database is compressed it takes a lot less time to backup that database. In all the above tests we used a MAXTRANSFERSIZE of 4MB. Changing BUFFERCOUNT option didn’t make much difference. The backup command looks like the following for 10TB database.

BACKUP DATABASE tpch_10tb
TO DISK = 'H:\backup\tpch_10tb_01.bak',
DISK = 'I:\backup\tpch_10tb_02.bak',
DISK = 'J:\backup\tpch_10tb_03.bak',
DISK = 'K:\backup\tpch_10tb_04.bak',
DISK = 'L:\backup\tpch_10tb_05.bak',
DISK = 'M:\backup\tpch_10tb_06.bak',
DISK = 'N:\backup\tpch_10tb_07.bak',
DISK = 'O:\backup\tpch_10tb_08.bak',
DISK = 'P:\backup\tpch_10tb_09.bak',
DISK = 'Q:\backup\tpch_10tb_10.bak'
WITH COMPRESSION,
MAXTRANSFERSIZE= 4194304,
STATS = 5;
GO

Archiving

Azcopy does a good job at copying data in Azure. The following Azcopy command will copy the backup files from the VM to a storage of your choice.

This first command will copy all the files one after the other:

AzCopy /Source:X:\tpchbackup\ /Dest:https://tpch1tbbackup.blob.core.windows.net/backup/ /Destkey:<destination storage key> /Pattern:*.bak

This command will copy tpch_1tb_01.bak file:

AzCopy /Source:X:\tpchbackup\ /Dest:https://tpch1tbbackup.blob.core.windows.net/backup/ /Destkey:<destination storage key> /Pattern:*_01.bak

In the latter case, we can execute multiple of these commands to finish copying faster. We would recommend adding a date component to the directory structure to make the files unique so the older backup are not overwritten and also many versions of the backup can reside side by side but also can be uniquely identified. In that case the copy command will look as following:

AzCopy /Source:X:\tpchbackup\ /Dest:https://tpch1tbbackup.blob.core.windows.net/backup/20150101 /Destkey:<destination storage key> /Pattern:*.bak

Azure File Service is another option to copy the backups for archival purposes. Use the following commands to mount a network drive to archive the backup:

1. Create a storage account in Windows Azure PowerShell

New-AzureStorageAccount -StorageAccountName "tpch1tbbackup" -Location "West US"

2. Create a storage context

$storageAccountName = "tpch1tbbackup"

$storageKey = Get-AzureStorageKey $storageAccountName | %{$_.Primary}

$context = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageKey

3. Create a share

New-AzureStorageShare -Name backup -Context $context

4. Attach the share to your Azure VM

net use x: \\tpch1tbbackup.file.core.windows.net\backup /u:$storageAccountName $storageKey

5. Xcopy backup to x:\ to offload the backup on the mounted drive

xcopy D:\backup\*.* X:\tpchbackup\*.*

This begs the question why not just use the UNC path of the mounted drive to do the backup. In which the answer is, it's not possible at this time. Notice that the file is mounted using a storage account name and storage key. If you try to backup to the UNC path of the drive directly SQL Server will throw the following error:

Operating system error 1326(The user name or password is incorrect.).

If you are using multiple Azure File shares for archiving backup you should also have some kind of accounting (metadata) that tells you where each backup files of a backup set reside.

Lastly, you should follow the same procedure for backup and restore that you follow for your on-premises database. As an example, if your business procedure is to check the newly created backup using ‘restore verifyonly’, follow that before you move your data for archiving.

17 Mar 09:54

Data Stories….

by Karen Lopez

shareasimage (4)

I prefer to think of it is trying to decipher the stories data is already saying.  So more listening, less torture.

17 Mar 09:54

SAN Configuration Performance Problems

by jchang
The charts provided here illustrates my complaints against SAN vendor doctrine, obstinately adhering to the concept of one large pool of disks from which all volumes are created for any purpose (data, log, and junk non-DB stuff). There is no consideration...(read more)
17 Mar 09:54

ICYMI: the Halloween Problem and DBCC CheckConstraints - SQL Server Performance

by KKline
Cool and informative articles about SQL Server performance relating to IO caused by the DBCC CheckConstraints command and exploring the infamous Halloween Problem....(read more)
17 Mar 09:54

Running SQL Server on Machines with More Than 8 CPUs per NUMA Node May Need Trace Flag 8048

by psssql

Applies To:  SQL 2008, 2008 R2, 2012 and 2014 releases

Note:  The number of CPUs is the logical count, not sockets.   If more than 8 logical CPUs are presented this post may apply.

The SQL Server developer can elect to partition memory allocations at different levels based on the what the memory is used for.   The developer may choose a global, CPU, Node, or even worker partitioning scheme.   Several of the allocation activities within SQL Server use the CMemPartitioned allocator.  This partitions the memory by CPU or NUMA node to increase concurrency and performance.  

You can picture CMemPartitioned like a standard heap (it is not a HeapCreate) but this concept is the same.  When you create a heap you can specify if you want synchronized assess, default size and other attributes.   When the SQL Server developer creates a memory object they indicate that they want things like thread safe access, the partitioning scheme and other options.

The developer creates the object so when a new allocation occurs the behavior is upheld.  On the left is a request from a worker against a NODE based memory object.  This will use a synchronization object (usually CMEMTHREAD or SOS_SUSPEND_QUEUE type) at the NODE level to allocate memory local to the workers assigned NUMA NODE.   On the right is an allocation against a CPU based memory object.  This will use a synchronization object at the CPU level to allocate memory local to the workers CPU.

In most cases the CPU based design reduces synchronization collisions the most because of the way SQL OS handles logical scheduling.  Preemptive and background tasks make collisions possible but CPU level reduces the frequency greatly.  However, going to CPU based partitioning means more overhead to maintain individual CPU access paths and associated memory lists.  

The NODE based scheme reduces the overhead to the # of nodes but can slightly increase the collision possibilities and may impact ultimate, performance results for very specific scenarios.  I want to caution you the scenarios encountered by Microsoft CSS have been limited to very specific scopes and query patterns.

image

 

Newer hardware with multi-core CPUs can present more than 8 CPUs within a single NUMA node.  Microsoft has observed that when you approach and exceed 8 CPUs per node the NODE based partitioning may not scale as well for specific query patterns.   However, using trace flag 8048 (startup parameter only requiring restart of the SQL Server process) all NODE based partitioning is upgraded to CPU based partitioning.   Remember this requires more memory overhead but can provide performance increases on these systems.

HOW DO I KNOW IF I NEED THE TRACE FLAG?

The issue is commonly identified by looking as the DMVs dm_os_wait_stats and dm_os_spinlock_stats for types (CMEMTHREAD and SOS_SUSPEND_QUEUE).   Microsoft CSS usually sees the spins jump into the trillions and the waits become a hot spot.   

Caution: Use trace flag 8048 as a startup parameter.   It is possible to use the trace flag dynamically but limited to only memory objects that are yet to be created when the trace flag is enabled.  Memory objects already built are not impacted by the trace flag.

References

http://blogs.msdn.com/b/psssql/archive/2012/12/20/how-it-works-cmemthread-and-debugging-them.aspx

 

 

Bob Dorr - Principal SQL Server Escalation Engineer

17 Mar 09:53

How It Works: MAX DOP Level and Parallel Index Builds

by psssql

I have been working on an issue where rebuilding an index leads to additional fragmentation.   Using XEvents I debugged the page allocations and writes and was able to narrow in on the behavior.

There are lots of factors to take into account when rebuilding the index.   I was able to break down the behavior to the worst possible case using a single file database, single heap table,  SORT IN TEMPDB and packing of the heap data to the beginning of the database file when create clustered index is issued.

When the index is build a portion of the data (range) is assigned to each of the parallel workers.  The diagram below shows a MAX DOP = 2 scenario.

clip_image002Each parallel worker is assigned its own CBulkAllocator when saving the final index pages.   This means Worker 1 gets an extent and starts to fill pages from TEMPDB for Worker 1’s given key range.   Worker 2 is executing in parallel and has its own CBulkAllocator.  Worker 2 acquires the next extent and starts to spool the assigned key range.

Looking at the database a leap frog behavior of values, across extents occurs as the workers copy the final keys into place.

The diagram below shows the leap frog behavior from a MAX DOP = 4 index creation.   The saw tooth line represents the offsets in the file as read during an index order scan.  The horizontal access is the event sequence and the vertical access is the offset in the database file.  As you can see the leap frog behavior places key values all over the file.

Key 1 is at a low offset but Key 2 is at an offset higher than Key 9 as shown in the example above.  Each of the workers spreads 1/4th of the data across the entire file instead of packing the key values together in a specific segment of the file.

clip_image002

In comparison the a serial index build shows the desired layout across the drive.   Smaller offsets have the 1st set of keys and larger offsets always have higher key values.

image

This mattered to my customer because after a parallel index build an index ordered scan takes longer than a serial index build.  The chart below shows the difference in read size and IOPS requirements.

select count_big(*) from tblTest (NOLOCK)

Serial Built

Parallel Built

Avg Read Size

508K

160K

Duration

00:01:20

00:01:50

# Reads

15,000

52,000

SQL Server reads up to 512K in a chuck for read ahead behavior.   When doing an index order scan we read the necessary extents to cover the key range.  Since the key range is leap frogged, during the parallel build, the fragmentation limits SQL Server’s I/O size to 160K instead of 508K and drives the number of I/O requests much higher.  The same data in a serial built index maximizes the read ahead capabilities of SQL Server.

The testing above was conducted using:  select count_big(*) from tblTest with (NOLOCK)

Hint: You don’t have to rebuild the index in serial to determine how much a performance gain it may provide.   Using WITH(NOLOCK, INDEX=0) forces an allocation order scan, ignoring the key placement and scanning the object from first IAM to last IAM order.  Leveraging the statistics I/O, XEvents and virtual file statistics output you are able to determine the behaviors.

Workarounds
The obvious question is that a serial index rebuild can take a long time so what should I do to leverage parallel index builds and reduce the fragmentation possibilities?

1. Partition the table on separate files matching the DOP you are using to build the index.  This allows better alignment of parallel workers to specific partitions, avoiding the leap frog behavior.

2. For a non-partitioned table aligning the number of files with the DOP may be helpful.   With reasonably even distribution of free space in each file the allocation behavior is such that alike keys will be placed near each other.

3. For single partition rebuild operations consider serial index building behaviors to minimize fragmentation behaviors.

Future
I am working with the development team to evaluate the CBulkAllocator behavior.   Testing is needed but it could be that the CBulkAllocator attempts to acquire 9 (64K) extents to align with the read ahead (512K) chunk size.   Something like this idea could reduce the fragmentation by a factor of 8.

Bob Dorr - Principal SQL Server Escalation Engineer

17 Mar 09:53

Explore In-Memory Technologies at Ignite

by SQL Server Team

You've probably already heard why you should use the in-memory tech built into Microsoft SQL Server: 30 times faster transaction speed, 100 times the query speed, and millions of rows of data analyzed every second. If you want to learn more about the how, check out the Microsoft Ignite session with the no-nonsense title, In-Memory Technologies Overview.

Microsoft SQL Program Manager Kevin Farlee will walk you through all three in-memory technologies contained within SQL Server:

  • In-memory online transaction processing (OLTP)
  • In-memory data warehousing
  • Buffer pool extension (BPE)

He'll cover the thinking behind their inclusion in SQL Server, take a look at the tech behind the scenes, and explore real-world customer benefits. You'll come away with a deeper understanding of these powerful technologies, knowing exactly how to put them to work for your team.

There’s plenty more happening at Ignite. Get email updates by subscribing here. And if you haven’t already, register for Ignite here.

17 Mar 09:53

Tip # 5 – SQL Server Maximum Memory (Tips for SQL Server Performance and Resiliency)

by Chris Shaw

This article is part 5 in a series on the top 10 most common mistakes that I have seen impacting SQL Server Performance and Resiliency. This post is not all inclusive.

Most common mistake #5: Not Setting a Maximum Memory

Default configurations are not always the best configuration for your server. Maximum memory allocated to SQL Server is a great example of this. If this setting is left unchanged, there is a chance your SQL Server could be starving the operating system of memory. When SQL Server starves the operating system, the OS cannot provide a stable platform. A struggling OS will have a difficult time retrieving data from disk, swapping processes, and completing standard OS functions to name a few issues.

There are a number of ways to see if the operating system is running into memory issues. One of the fastest ways is to simply look at the task manager and see how much is available. If you review the available memory when the server is not performing well and the amount of available memory is low, there is a good chance the OS is being starved. With tasks like file copies, backups and virus scans, the operating system will look for memory to help complete the process.   If SQL Server has that memory reserved, the OS will not have the memory it needs to work efficiently.

Proper SQL Server memory configuration can take some fine tuning as a number of variables need to be taken into consideration. For example if there are 2 SQL Server instances installed you will need to leave the memory configured for each SQL Server low enough so the Operating System has enough to work with. And don’t forget processes like copying large files to different media, such as a file copy for a backup. Finding the balance can often leave DBA’s going in circles. When you add virtual servers into the mix the variables increase even more and sometimes the memory configurations go against SQL Server best practices (VMWare recommends min and max memory to be set to be the same). There are many formulas for recommend initial max memory settings.

So where do you start? A general rule of thumb is to leave the operating system 20% of the memory. With that being said, I hesitate to leave the operating system anything less than 2 gigs. After making a memory setting adjustment, close attention should be given to monitoring the memory counters in the performance monitor and the DMV’s in SQL Server to ensure the operating system has enough memory to perform well without fighting with SQL Server.

Top 10 Tips for SQL Server Performance and Resiliency

  1. Improper Backups
  2. Improper Security
  3. Improper Maintenance
  4. Not having a Baseline

17 Mar 09:53

Rebuild Partitioned Clustered Columnstore Indexes

by Greg Low

A short time back, I provided an updated script that can be used to rebuild clustered columnstore indexes when necessary.

One thing that wasn’t covered by that script was the ability to only individual index partitions. Below, I’ve provided an updated version of the script that will rebuild only affected partitions if the clustered columnstore index is partitioned.

----------------------------------------------------------------------

-- Rebuild clustered columnstore indexes when necessary

--

-- Dr Greg Low v2.2

 

----------------------------------------------------------------------

-- rebuild when more than supplied % of rows have been deleted

-- rebuild when any segments contain more than supplied % deleted rows

-- rebuild if more than supplied number of segments are empty

-- rebuild when segments are on average less than the supplied % of

--     maximum capacity

 

DECLARE @DeletedTotalRowPercentage int = 10; 

DECLARE @DeletedSegmentsRowPercentage int = 20;

DECLARE @EmptySegmentsAllowed int = 0;

DECLARE @SegmentPotentialFullnessPercentage int = 90;

----------------------------------------------------------------------

 

DECLARE @IndexesToRebuild TABLE (SchemaName sysname,

                                 TableName sysname,

                                 IndexName sysname,

                                 PartitionNumber int,

                                 IsPartitioned bit);

 

SET NOCOUNT ON;

 

WITH ClusteredColumnstoreIndexes

AS

( SELECT t.object_id AS ObjectID,

         SCHEMA_NAME(t.schema_id) AS SchemaName,

         t.name AS TableName,

         i.name AS IndexName

  FROM sys.indexes AS i

  INNER JOIN sys.tables AS t

  ON i.object_id = t.object_id

  WHERE i.type = 5

),

RowGroups

AS

( SELECT csrg.object_id AS ObjectID,

         csrg.partition_number AS PartitionNumber,

         csrg.total_rows AS TotalRows,

         csrg.deleted_rows AS DeletedRows,

         csrg.deleted_rows * 100.0 / csrg.total_rows AS DeletedPercentage,

         CASE WHEN csrg.total_rows = csrg.deleted_rows

              THEN 1 ELSE 0

         END AS IsEmptySegment

  FROM sys.column_store_row_groups AS csrg

  WHERE csrg.state = 3 -- Compressed (Ignoring: 0 - Hidden, 1 - Open, 2 - Closed, 4 - Tombstone)

),

IndexPartitionStats

AS

( SELECT cci.ObjectID,

         cci.SchemaName,

         cci.TableName,

         cci.IndexName,

         rg.PartitionNumber,

         SUM(CAST(rg.TotalRows AS decimal(18,0))) AS TotalRows,

         SUM(CAST(rg.DeletedRows AS decimal(18,0))) AS DeletedRows,

         SUM(CAST(rg.DeletedRows AS decimal(18,0))) * 100.0

           / SUM(CAST(rg.TotalRows AS decimal(18,0)))

           AS DeletedPercentage,

         SUM(rg.IsEmptySegment) aS EmptySegments,

         COUNT(rg.TotalRows) AS TotalSegments

  FROM ClusteredColumnstoreIndexes AS cci

  INNER JOIN RowGroups AS rg

  ON cci.ObjectID = rg.ObjectID

  GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName, rg.PartitionNumber

)

INSERT @IndexesToRebuild (SchemaName, TableName, IndexName, PartitionNumber, IsPartitioned)

SELECT s.SchemaName,

       s.TableName,

       s.IndexName,

       s.PartitionNumber,

       CASE WHEN EXISTS(SELECT 1

                        FROM IndexPartitionStats AS ps

                        WHERE s.SchemaName = ps.SchemaName

                        AND s.TableName = ps.TableName

                        AND s.IndexName = ps.IndexName

                        AND ps.PartitionNumber > 1)

            THEN 1

            ELSE 0

       END

FROM IndexPartitionStats AS s

WHERE s.DeletedPercentage > @DeletedTotalRowPercentage

OR s.EmptySegments > @EmptySegmentsAllowed

OR EXISTS(SELECT 1 FROM RowGroups AS rg

                   WHERE rg.ObjectID = s.ObjectID

                   AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage)

OR (s.TotalRows * 100 / (s.TotalSegments * 1048576.0))

            < @SegmentPotentialFullnessPercentage;

 

DECLARE @SchemaName sysname;

DECLARE @TableName sysname;

DECLARE @IndexName sysname;

DECLARE @PartitionNumber int;

DECLARE @IsPartitioned bit;

DECLARE @SqlCommand nvarchar(max);

 

DECLARE IndexList CURSOR FAST_FORWARD READ_ONLY

FOR

  SELECT SchemaName, TableName, IndexName, PartitionNumber, IsPartitioned

  FROM @IndexesToRebuild

  ORDER BY SchemaName, TableName, IndexName, PartitionNumber;

 

OPEN IndexList;

 

FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName, @PartitionNumber, @IsPartitioned;

 

WHILE @@FETCH_STATUS = 0

BEGIN

  SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)

                  + N' ON ' + QUOTENAME(@SchemaName)

                  + N'.' + QUOTENAME(@TableName)

                  + N' REBUILD'

                  + CASE WHEN @IsPartitioned <> 0

                         THEN N' PARTITION = ' + CAST(@PartitionNumber AS nvarchar(10))

                         ELSE N''

                    END

                  + ';';

  PRINT @SqlCommand;

  EXEC(@SqlCommand);

  FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName, @PartitionNumber, @IsPartitioned;

END;

 

CLOSE IndexList;

DEALLOCATE IndexList;

 

 

 

 

17 Mar 09:52

Suggest Compression Strategies for Tables and Indexes

by Greg Low

I make no secret of the fact that I think that table compression was the best thing added to SQL Server 2008.

It seems like every week that I’m talking to clients about it. They have Enterprise Edition but they weren’t using it because they were familiar with O/S level disk compression and they think that compression will mean smaller and slower, yet this often couldn’t be further from the truth for customers whose databases are currently I/O bound or who don’t have enough memory in their buffer cache for it to work correctly.

It’s important though, to make sure the correct form of compression is used for each table and index. In fact, the decision should be taken for each partition of each table and index if the table is partitioned. We often recommend different compression strategies for different partitions of the same table.

To make it easier to decide which form of compression might suit, I’ve created another version of the script that I use as a first pass in determining compression strategies. The earlier version suggested ROW and/or PAGE compression. While this version can do the same, it also considers the clustered columnstore indexes that are part of the Enterprise Edition of SQL Server 2014. (I’ve included a switch to turn that function on or off in case you don’t have 2014 yet).

The cutoff points are configurable but my default values are as shown.

  • CCI (Clustered Columnstore Index) will be recommended where the partition is scanned more than 95% of the time, updated less than 10% of the time, seeks and lookups are less than 5% of the time, and where there are at least 800,000 rows. It will also only be recommended if it is supported.
  • PAGE will be recommended where the partition is scanned more than 75% of the time and updated less than 20% of the time.
  • ROW will be recommended in all other cases. We believe that ROW should be the default in SQL Server across the board, instead of NONE.

It is important that this script only be run after the system has been in use for long enough to have experienced typical usage patterns.

I hope this helps you to get into using compression where it makes sense for you.

-----------------------------------------------------------------------------

-- Suggest data compression changes for tables and indexes

--

-- Dr Greg Low

-- March 2015

--

-----------------------------------------------------------------------------

DECLARE @ClusteredColumnstoreScansCutoff int = 95;

DECLARE @ClusteredColumnstoreUpdatesCutoff int = 10;

DECLARE @ClusteredColumnstoreSeeksLookupsCutoff int = 5;

DECLARE @ClusteredColumnstoreTotalRowsCutoff bigint = 800000;

 

DECLARE @PageCompressionScansCutoff int = 75;

DECLARE @PageCompressionUpdatesCutoff int = 20;

 

DECLARE @IsClusteredColumnstoreSupported bit = 1;

-----------------------------------------------------------------------------

 

WITH IndexUsageStats

AS

( SELECT object_id AS ObjectID,

         index_id AS IndexID,

         COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0) AS Seeks,

         COALESCE(user_scans, 0) + COALESCE(system_scans, 0) AS Scans,

         COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0) AS Lookups,

         COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Updates,

         COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0)

           + COALESCE(user_scans, 0) + COALESCE(system_scans, 0)

           + COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0)

           + COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Operations

  FROM sys.dm_db_index_usage_stats

  WHERE database_id = DB_ID()

),

PartitionUsageDetails

AS

( SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,

         t.name AS TableName,

         i.name AS IndexName,

         i.index_id AS IndexID,

         i.type_desc AS IndexType,

         CASE WHEN COALESCE(Operations, 0) <> 0

              THEN CAST((COALESCE(Seeks, 0) + COALESCE(Lookups, 0)) * 100.0

                        / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS SeekLookupPercentage,

         CASE WHEN COALESCE(Operations, 0) <> 0

              THEN CAST(COALESCE(Scans, 0) * 100.0 / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS ScanPercentage,

         CASE WHEN COALeSCE(Operations, 0) <> 0

              THEN CAST(COALESCE(Updates, 0) * 100.0 / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS UpdatePercentage,

         p.partition_number AS PartitionNumber,

         p.data_compression_desc AS CurrentCompression,

         p.rows AS TotalRows

   FROM sys.tables AS t

   INNER JOIN sys.indexes AS i

   ON t.object_id = i.object_id

   INNER JOIN sys.partitions AS p

   ON i.object_id = p.object_id

   AND i.index_id = p.index_id

   LEFT OUTER JOIN IndexUsageStats AS ius

   ON i.object_id = ius.ObjectID

   AND i.index_id = ius.IndexID

   WHERE i.index_id > 0

   AND t.is_ms_shipped = 0

   AND t.type = N'U'

),

SuggestedPartitionCompressionTypes

AS

( SELECT pud.*,

         CASE WHEN pud.ScanPercentage >= @ClusteredColumnstoreScansCutoff

              AND pud.UpdatePercentage <= @ClusteredColumnstoreUpdatesCutoff

              AND pud.SeekLookupPercentage <= @ClusteredColumnstoreSeeksLookupsCutoff

              AND pud.TotalRows >= @ClusteredColumnstoreTotalRowsCutoff

              AND @IsClusteredColumnstoreSupported <> 0

              THEN N'CCI'

              WHEN pud.ScanPercentage >= @PageCompressionScansCutoff

              AND pud.UpdatePercentage <= @PageCompressionUpdatesCutoff

              THEN N'PAGE'

              ELSE N'ROW'

         END AS SuggestedCompression

  FROM PartitionUsageDetails AS pud

)

SELECT spct.SchemaName,

       spct.TableName,

       spct.IndexName,

       spct.PartitionNumber,

       spct.CurrentCompression,

       spct.SuggestedCompression

FROM SuggestedPartitionCompressionTypes AS spct

WHERE spct.SuggestedCompression <> spct.CurrentCompression

ORDER BY spct.SchemaName,

         spct.TableName,

         CASE WHEN spct.IndexID = 1 THEN 0 ELSE 1 END,

         spct.IndexName;

 

17 Mar 09:52

Monitoring Read/Write Latency

by Tim Radney

Being responsible for the performance of SQL Server can be a daunting task. There are many areas that we to have to monitor and understand. We are also expected to be able to stay on top of all those metrics and know what is happening on our servers at all times. I like to ask DBAs what the first thing that they think about when they hear the phrase “tuning SQL Server;” the overwhelming response I get is “query tuning.” I agree that tuning queries is very important and is a never-ending task that we face because workloads are continuously changing.

However there are many other aspects to consider when thinking about SQL Server performance. There are a lot of instance-, OS- and database-level settings that need to be tweaked from the defaults. Being a consultant allows me to work in many different lines of business and get exposure to all sorts of performance issues. When working with a new client I try to always perform a health audit of the server to know what I am dealing with. While performing these audits, one of the things that I’ve found repeatedly has been excessive read and write latencies on the disks where SQL Server data and log files reside.

Read/Write Latency

To view your disk latencies in SQL Server you can quickly and easily query the DMV sys.dm_io_virtual_file_stats. This DMV accepts two parameters: database_id and file_id. What is awesome is that you can pass NULL as both values and return the latencies for all files for all databases. The output columns include:

  • database_id
  • file_id
  • sample_ms
  • num_of_reads
  • num_of_bytes_read
  • io_stall_read_ms
  • num_of_writes
  • num_of_bytes_written
  • io_stall_write_ms
  • io_stall
  • size_on_disk_bytes
  • file_handle

As you can see from the column list, there is really useful information that this DMV retrieves, however just running SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL); doesn’t help much unless you have memorized your database_ids and can do some math in your head.

When I query the file stats, I use a query from Paul Randal’s blog post, “How to examine IO subsystem latencies from within SQL Server.” This script makes the column names easier to read, includes the drive the file is on, the database name, and the path to the file.

By querying this DMV you can easily tell where the I/O hot spots are for your files. You can see where the highest write and read latencies are and which databases are the culprits. Knowing this will allow you to start looking at the tuning opportunities for those specific databases. This could include index tuning, checking to see if the buffer pool is under memory pressure, possibly moving the database to a faster portion of the I/O subsystem, or possibly partitioning the database and spreading the filegroups across other LUNs.

So you run the query and it returns lots of values in ms for latency – which values are okay, and which are bad?

What values are good or bad?

If you ask SQLskills, we will tell you something along the lines of:

  • Excellent:
  • Very good:
  • Good: 5 – 10ms
  • Poor: 10 – 20ms
  • Bad: 20 – 100ms
  • Really bad: 100 – 500ms
  • OMG!: > 500ms

If you do a Bing search, you will find articles from Microsoft making the recommendations similar to:

  • Good:
  • Okay: 10 – 20ms
  • Bad: 20 – 50ms
  • Seriously bad: > 50ms

As you can see, there are some slight variations in the numbers, but the consensus is that anything over 20ms can be considered troublesome. With that being said, your average write latency may be 20ms and that is 100% acceptable for your organization and that is okay. You need to know general I/O latencies for your system so that, when things do get bad, you know what normal is.

My Read/Write Latencies Are Bad, What Do I Do?

If you are finding that read and write latencies are bad on your server, there are several places you can start looking for issues. This is not a comprehensive list but some guidance of where to start.

  • Analyze your workload. Is your indexing strategy correct? Not having the proper indexes will lead to much more data being read from disk. Scans instead of seeks.
  • Are your statistics up to date? Bad statistics can make for poor choices for execution plans.
  • Do you have parameter sniffing issues that are causing poor execution plans?
  • Is the buffer pool under memory pressure, for instance from a bloated plan cache?
  • Any network issues? Is your SAN fabric performing correctly? Have your storage engineer validate pathing and network.
  • Move the hot spots to different storage arrays. In some cases it may be a single database or just a few databases that are causing all the problems. Isolating them to a different set of disk, or faster high end disk such as SSD’s may be the best logical solution.
  • Can you partition the database to move troublesome tables to different disk to spread the load?

Wait Statistics

Just like monitoring your file stats, monitoring your wait stats can tell you a great deal about bottlenecks in your environment. We are lucky to have another awesome DMV (sys.dm_os_wait_stats) that we can query that will pull all available wait information collected since the last restart or since the last time the waits were reset; there are waits related to disk performance too. This DMV will return important information including:

  • wait_type
  • waiting_task_count
  • wait_time_ms
  • max_wait_time_ms
  • signal_wait_time_ms

Querying this DMV on my SQL Server 2014 machine returned 771 wait types. SQL Server is always waiting on something, but there are a lot of waits that we shouldn’t worry ourselves with. For this reason, I utilize another query from Paul Randal; his blog post, “Wait Statistics, or please tell me where it hurts,” has an excellent script that excludes a bunch of the waits we don’t really care about. Paul also lists out many of the common problematic waits as well as offering guidance for the common waits.

Why are wait stats important?

Monitoring for high wait times for certain events will tell you when there are issues going on. You need a baseline to know what is normal and when things exceed a threshold or pain level. If you have really high PAGEIOLATCH_XX then you know SQL Server is having to wait for a data page to be read from disk. This could be disk, memory, workload change or a number of other issues.

A recent client I was working with was seeing some very unusual behavior. When I connected to the database server and was able to observe the server under a work load, I immediately started checking file stats, wait stats, memory utilization, tempdb usage, etc. One thing that immediately stood out was WRITELOG being the most prevalent wait. I know this wait has to do with a log flush to disk and reminded me of Paul’s series on Trimming the Transaction Log Fat. High WRITELOG waits can usually be identified by high-write latencies for the transaction log file. So I then used my file stats script to review the read and write latencies on the disk. I was then able to see high write latency on the data file but not my log file. In looking at the WRITELOG it was a high wait but the time waiting in ms was extremely low. However something in the second post of Paul’s series was still in my head. I should look at the auto growth settings for the database just to rule out “Death by a thousand cuts”. In looking at the database properties of the database I saw that the data file was set to auto grow by 1MB and the transaction log set to auto grow by 10%. Both files had nearly 0 unused space. I shared with the client what I found and how this was killing their performance. We quickly made the appropriate change and testing went forward, much better by the way. Sadly this is not the only time I have encountered this exact issue. Another other time a database was 66GB in size, it got there by 1MB growths.

Capturing your data

Many data professionals have created processes to capture file and wait stats on a regular basis for analysis. Since the wait stats are cumulative, you would want to capture them and compare the deltas between different times of day or before and after certain processes run. This isn’t too complicated and there are numerous blog posts available where people share how they accomplished this. The important part is to be measuring this data so that you can monitor it. How do you know today that things are better or worse on your database server unless you know the data from yesterday?

How can SQL Sentry help?

I'm glad you asked! SQL Sentry Performance Advisor brings latency and waits front and center on the dashboard. Any anomalies are easy to spot; you can switch to historical mode and see the previous trend and compare that to previous periods as well. This can prove to be priceless when analyzing those “what happened?” moments. Everyone has gotten that call, “Yesterday around 3:00PM the system just seemed to freeze, can you tell us what happened?” Um, sure, let me pull up Profiler and go back in time. If you have a monitoring tool like Performance Advisor, you would have that historical information at your fingertips.

In addition to the charts and graphs on the dashboard, you have the ability to use built-in alerts for conditions such as High Disk Waits, High VLF Counts, High CPU, Low Page Life Expectancy, and many more. You also have the ability to create your own custom conditions, and you can learn from the examples on the SQL Sentry site or through the Condition Exchange (Aaron Bertrand has blogged about this). I touched on the alerting side of this in my last article on SQL Server Agent Alerts.

On the Disk Space tab of Performance Advisor, it is very easy to see things like autogrowth settings and high VLF counts. You should know, but in case you don’t, autogrowth by 1MB or 10% is not the best setting. If you see these values (Performance Advisor highlights them for you), you can quickly make note and schedule the time to make the proper adjustments. I love how it displays Total VLFs as well; too many VLFs can be very problematic. You should read Kimberly’s post “Transaction Log VLFs – too many or too few?” if you haven't already.

Partial grid on Performance Advisor's Disk Space tabPartial grid on Performance Advisor's Disk Space tab

Another way that Performance Advisor can help is through its patented Disk Activity module. Here you can see that tempdb on F: is experiencing substantial write latency; you can tell this by the thick red lines underneath the disk graphics. You might also notice that F: is the only drive letter whose disk is represented in red; this is a visual cue that the drive has a misaligned partition, which can contribute to I/O problems.

Performance Advisor Disk Activity modulePerformance Advisor Disk Activity module

And you can correlate this information in the grids below – problems are highlighted in the grids there, too, and take a look at the ms/Write column:

Partial grid of Performance Advisor Disk Activity dataPartial grid of Performance Advisor Disk Activity data

You can also look at this information retroactively; if someone complains about a perceived disk bottleneck yesterday afternoon or last Tuesday, you can simply go back using the date pickers in the toolbar and see the average throughput and latency for any range. For more information on the Disk Activity module, see the User Guide.

Performance Advisor also has a lot of built-in reports under the categories Performance, Blocking, Top SQL, Disk/File Space, and Deadlocks. The image below shows you how to get to the Disk/File Space reports. Having the reports just a few mouse clicks away is very valuable to be able to immediately dig in and view what is (or was) happening on your server.

Performance Advisor reportsPerformance Advisor reports

Summary

The important takeaway from this post is to know your performance metrics. A common statement among data professionals is that disk is our #1 bottleneck. Knowing the file stats of your server will go a long way in helping to understand the pain points on your server. In conjunction with file stats, your wait stats are a great place to look as well. Many people, including myself, start there. Having a tool like SQL Sentry Performance Advisor can drastically help you troubleshoot and find performance issues before they become too problematic; however, if you don’t have such a tool, getting familiar with sys.dm_os_wait_stats and sys.dm_io_virtual_file_stats will serve you well to start tuning your server.

The post Monitoring Read/Write Latency appeared first on SQLPerformance.com.

17 Mar 09:47

MVP Monday - SQL Server High Availability in Windows Azure Iaas

by MVP Award Program

Editor’s note: The following post was written by Cluster MVP David Bermingham

SQL Server High Availability in Windows Azure Iaas

When deploying SQL Server in Windows Azure you must consider how to minimize both planned and unplanned downtime. Because you have given up control of the physical infrastructure, you cannot always determine when maintenance periods will occur. Also, just because you have given control of your infrastructure to Microsoft it does not guarantee that you are not susceptible to some of the same types of outages that you might expect in your own data center. To minimize the impact of both planned and unplanned downtime Microsoft provides what are called Fault Domains and Upgrade Domains. By leveraging Upgrade Domains and Fault Domains and deploying either SQL Server AlwaysOn Availability Groups (AOAG) or AlwaysOn Failover Cluster Instances (AOFCI) you can help minimize both planned and unplanned downtime in your SQL Server Windows Azure deployment. Throughout this document when I refer to a SQL Server Cluster, I am referring to both AOAG and AOFCI. When needed, I will refer to AOAG and AOFCI specifically.

Fault Domains are essentially “a rack of servers”, with no common single point of failure between different Fault Domains, including different power supplies and network switches. An Update Domain ensures that when Microsoft is doing planned maintenance, only one Update Domain is worked on at a given time. This eliminates the possibility that Microsoft would accidentally reboot all of your servers at the same time, assuming that each server is in a different Update Domain.

When you provision your Azure VM instances in the same Availability Set, you are ensuring that each VM instance is in a different Update Domain and Fault Domain…to an extent. You probably want to read Manage The Availability of Virtual Machines to completely understand how VMs get provisioned in different Fault Domains and Update Domains. The important part of the availability equation is ensuring that each VM participating in your SQL Server cluster is isolated from each other, ensuring that the failure of a single Fault Domain or maintenance in an Update Domain does not impact all of your Azure instances at the same time.

So that is all you need to know….right? Well, not exactly. Azure IaaS does not behave exactly like your traditional infrastructure when it comes to clustering. In fact, before July of 2013, you could not even create a workable cluster in Azure IaaS. It wasn’t until then that they released hotfix KB2854082 that made it possible. Even with that hotfix there are still a few considerations and limitations when it comes to highly available SQL Server in Windows Azure.

Before we dive into the considerations and limitations, you need to understand a few basic Azure terms. These are not ALL the possible terms you need to know to be an Azure administrator, but these are the terms we will be discussing that are specific to configuring highly available SQL Server is Azure IaaS.

Virtual Network

Before you begin provisioning any virtual machines, you will need to configure your Virtual Network such that all your SQL Server Cluster VMs can reside in the same Virtual Network and Subnet. There is an easy Virtual Network Wizard that will walk you through the process of creating a Virtual Network. Additional information about Virtual Networking can be found here.
http://azure.microsoft.com/en-us/services/virtual-network/

If you are considering a Hybrid Cloud deployment where you stretch your on premise network to the Azure Cloud for disaster recovery purposes, you may want to review my blog post below. http://clusteringformeremortals.com/2014/01/07/extending-your-datacenter-to-the-azure-cloud-azure/

As you will see below, It is required that each SQL Server cluster must reside in a dedicated Cloud Service (see Cloud Service section below) and clients must connect to from outside of the Cloud Service. When creating subnets, I would create a small subnet for each cluster I plan to create. These subnets will only hold a handful of VMs and will be used exclusively for the Cloud Services that contain your SQL Server clusters.

Availability Set

As previously mentioned, an Availability Set is used to define Fault Domains and Update Domains. When provisioning your SQL Servers and File Share Witness (more on this later) make sure put all of your virtual machines in the same Availability Set. Availability Sets are described as follows…

“An availability set is a group of virtual machines that are deployed across fault domains and update domains. An availability set makes sure that your application is not affected by single points of failure, like the network switch or the power unit of a rack of servers.”

Cloud Service

Before you go Bing “Azure Cloud Service”, you need to understand that there is the overall marketing term “Cloud Service”, which is all fine and good, but not what we are talking about here. A Cloud Service in Azure IaaS is a very specific feature that is described as follows…

“A cloud service is a container for one or more virtual machines you create. You can create a cloud service for a single virtual machine, or you can load balance multiple virtual machines by placing them in the same cloud service.”

The other thing about a Cloud Service is that it is addressable by a single public IP address. All virtual machines residing in a Cloud Service can be reached by the Public IP associated with the Cloud Service and the endpoint ports defined when you create the virtual machine. Later in this article we will also learn that it is this public IP address that will be used instead of the traditional Cluster IP Resource for connectivity to the cluster.

The thing to remember about highly available SQL Servers is that when creating highly available SQL Server instances, you will want to place ALL of your SQL instances and the File Share Witness in the same Cloud Service. It is required that you have a different Cloud Service for each additional SQL Server Clusters that you create. I also recommend that you reserve that Cloud Service for only the SQL Server cluster nodes and the File Share Witness. You will see later in this article that all SQL Server cluster clients will need to reside outside of the cluster’s Cloud Service, which is just one of the reasons to keep only the SQL cluster nodes and File Share Witness in a dedicated Cloud Service.

You can create a Cloud Service, join an existing Cloud Service, create an Availability Set or join an Availability Set at the time you provision your Virtual Machines as shown in Figure 1 below.

 

Figure 1 - Cloud Service and Availability Set are defined when creating your virtual machine

Configuration of SQL Cluster

Now that we have a base understanding of some of the Azure terminology, we are ready to begin the cluster installation. Whether you are configuring an AlwaysOn Availability Group Cluster or an AlwaysOn Failover Cluster Instance, you will need to start with a basic cluster configuration. If you are using Windows Server 2012 R2, you are good to go. If you are using Windows Server 2012 or Windows Server 2008 R2, you will first need to install hotfix KB2854082on each cluster node.

Assuming you have minimally completed the pre-requisites below, you are ready to create your cluster.

Pre-requisites

1. Create your Azure Virtual Network

2. Provisioned three VMS. We’ll call these VMs SQL1, SQL2 and DC1 for simplicity through the rest of this document

3. These VMs should all reside in the same Cloud Service and Availability Set

4. Applied hotfix KB2854082 if necessary (pre-Windows 2012 R2)

5. Created a Windows Domain and joined all servers to the domain

Creating a cluster is pretty straight forward; I won’t go into great detail here as it is the same as creating an onsite cluster. The one major difference is at the end of the process. You will see that the Cluster Name resource will fail to come online. The reason the Cluster Name Resource fails to come online is that Azure VMs get their IP Address information from DHCP, which will issue the same IP to the cluster. When the non-RFC-compliant DHCP service in Azure issues a duplicate IP address the Cluster IP Address resource to fail to come online. In order to fix this, you will need to manually specify another address that is not in use in the subnet. Because we have no control over the DHCP scope, I would choose an IP address that is near the end of the scope. This is another reason why I like to limit the Cloud Service to just the cluster nodes, so I don’t accidentally provision another VM that uses an IP address I have already specified for my cluster.

Because there is no shared storage in Azure, you will notice that the quorum configuration defaulted to Node-Majority. Node-majority for a two node cluster is certainly not optimal. You will need to configure a File Share Witness (FSW). In my example configuration, I configured the FSW on DC1. Wherever you configure the FSW you should ensure that the FSW is in the same Availability Set as the cluster nodes. This ensures that don’t have a failure of a cluster node and the FSW at the same time.

Now that you have configured the basic cluster, you will need to decide whether you want to deploy an AlwaysOn Availability Group (AOAG), or whether you want to deploy an AlwaysOn Failover Cluster Instance (AOFCI). To deploy an AlwaysOn Failover Cluster Instance you will need to use a 3rd party, cluster integrated replicated volume resource, such as SIOS DataKeeper Cluster Edition as there is currently no shared storage option in Azure suitable for clustering.

AOAG or AOFCI?

This post assumes that you are familiar with SQL Server AlwaysOn options, if not you should review High Availability Solutions (SQL Server)

While AOAG can meet the needs of many, there are certainly situations where AOAG does not fulfill the requirements. The chart below highlights some of the limitations of AOAG in comparison to AOFCI with SIOS DataKeeper Cluster Edition.

 

Figure 2 - AOAG vs. AOFCI with DataKeeper

In my experience, the two biggest reasons why people are deploying AOFCI rather than AOAG is the support for SQL Server Standard Edition and because it protects the entire SQL Server Instance rather than just the user defined databases. The later reason becomes even more important after you discover that Windows Azure only supports one client access point, meaning with AOAG all of your database must reside in a single Availability Group. It is also much easier to create one AOFCI and have every database, including System and MSDB, be replicated and protected rather than having to manually manage Agent Jogs, SQL user accounts and each database manually as you do with AOAG.

Configuring AOFCI and AOAG

Once again, the basic configuration of AOFCI or AOAG in the Azure Cloud is pretty much identical to how you would configure these services with on premise servers. (For detailed instructions on deploying a #SANLess cluster with DataKeeper visit my article Creating a SQL Server 2014 AlwaysOn Failover Cluster (FCI) Instance in Windows Azure IaaS). The difference comes when you are configuring the client access point. As we saw with the initial cluster creation process, the Cluster Name resource will fail to come online because the DHCP service will hand out a duplicate IP address. However, instead of simply specifying another address in the same subnet, you will need to set the Client Access Point IP address to be the same as the Cloud Service’s public IP address, with a host specific subnet mask of 255.255.255.255. Clients will then access this SQL Cluster via load-balanced VM endpoints with direct server return. The directions outlined in the Configuring the Client Access Point section below will tell you exactly how to put this all together.

Configuring the Client Access Point

Configuring the client access point and the load balanced endpoint is probably the most confusing or misunderstood part of creating SQL Server clusters in Windows Azure, or at least it was for me. If you are configuring AOAG you are in luck, there is a great article that walks you through this process Step-by-Step.

http://msdn.microsoft.com/en-us/library/dn425027.aspx

However, if you want to configure AOFCI, you have to take some of the information supplied in that article and apply it to AOFCI rather than an AOAG. You can follow Steps 1 through 3 as described in http://msdn.microsoft.com/en-us/library/dn425027.aspx to create the load balanced endpoints. However, when you get to Step 4 you will have to make adjustments since you will already have configured a client access point as part of your SQL Server Cluster Role configuration. On Step 4, “Create the availability group listener”, you can skip 1-6 and continue with 7 through 10 to change the IP address of the SQL Server Cluster resource. Once the IP address has been changed, you can bring the SQL Server Failover cluster instance online.

Accessing the SQL Cluster in Azure

As previously described, the SQL Server cluster must be accessed from outside of the Cloud Service via the load balanced endpoint. Depending upon which server is active, the load balanced endpoint will redirect all client requests to the active server. At the end of the day, your SQL Server cluster should look something like Figure 3 shown below.

 

Figure 3 - Clients accessing the SQL Server Cluster

What about Hybrid Cloud?

While this blog post is focused on High Availability in the Azure Cloud, it is certainly possible to build Disaster Recovery configurations which have some SQL cluster nodes in Azure and some nodes on premise. For more information on Hybrid Cloud configurations, read my article Creating a multi-site cluster in Windows Azure for Disaster Recovery. That article describes Hybrid Cloud solutions such as those pictured in Figure 4 below.

 

Figure 4 - Hybrid Cloud for Disaster Recovery

Summary

Windows Azure IaaS is a powerful platform for deploying business critical applications. All of the tools required to build a highly available infrastructure are in place. Knowing how to leverage those tools, especially in regards to providing High Availability for SQL Server, can take a little research and trial and error. I hope that this article has helped point you in the right direction and has reduced the amount of research and trial and error that you will have to do on your own. As with most Cloud Service, new features become available very rapidly and the guidance in the article may become outdated or even wrong in some cases rather rapidly. For the latest guidance, please refer to my blog Clustering for Mere Mortals where I will attempt to update guidance as things in Azure evolve.

About the author


David Bermingham is recognized within the technology community as a high availability expert and has been honored by his peers by being elected to be a Microsoft MVP in Clustering since 2010. David’s work as director of Technical Evangelist at SIOS has him focused on evangelizing Microsoft high availability and disaster recovery solutions as well as providing hands on support, training and professional services for cluster implementations. David hold numerous technical certifications and draws from over twenty years of experience IT, including work in the finance, healthcare and education fields, to help organizations design solutions to meet their high availability and disaster recovery needs. David has recently begun speaking on deploying highly available SQL Servers in the Azure Cloud and deploying Azure Hybrid Cloud for disaster recovery.

About MVP Mondays

The MVP Monday Series is created by Melissa Travers. In this series we work to provide readers with a guest post from an MVP every Monday. Melissa is a Community Program Manager, formerly known as MVP Lead, for Messaging and Collaboration (Exchange, Lync, Office 365 and SharePoint) and Microsoft Dynamics in the US. She began her career at Microsoft as an Exchange Support Engineer and has been working with the technical community in some capacity for almost a decade. In her spare time she enjoys going to the gym, shopping for handbags, watching period and fantasy dramas, and spending time with her children and miniature Dachshund. Melissa lives in North Carolina and works out of the Microsoft Charlotte office.

 

 

16 Feb 00:39

VMware vSphere 6 : What’s New – Maximums

by Simon Seagrave
Mrdenny

now

VMware vSphere-6 Resource Maximums

 

VMware vSphere 6 has finally been announced!  Yes, that’s right folks, after a 3+ year wait since the last major vSphere release, VMware have announced vSphere 6 which will be publically available sometime in Q1 of this year (2015).

This is an evolutionary update to the widely adopted vSphere product, with over 650 new features and enhancement,  and as with all previous major upgrades to vSphere there has been increases to the resource and configuration levels (maximums) at the host, cluster and VM levels.

In fact, vSphere 6 sees its resource capabilities at least double in almost all of the major areas.  This is really quite something!

Below is a break-down of vSphere 6’s new resource and configuration maximums:

 

vSphere 6 Resource and Configuration Maximums

vSphere 6.0 vSphere 5.x
Hosts per Cluster 64 32
VMs per Cluster 8000 4000
CPUs per Host 480 320
RAM per Host 12TB 4TB
VMs per Host 2048 512
vCPUs per VM 128 64
vRAM per VM 4TB 1TB

 

VMware implemented a rigorous beta testing phase which saw the launch date slip by 6 months, though I personally prefer it when companies take their time to properly test a product before releasing it.  These new maximums certainly open up a whole new world of possibilities, especially with the increasing popularity of hyper-converged infrastructures.

Share The Love:  If you found this post of use, would you please share it with others via the social media buttons to the left hand side of the screen. Thanks :)

The post VMware vSphere 6 : What’s New – Maximums appeared first on TechHead and was written by Simon Seagrave.