Shared posts

15 Nov 04:00

The IDENTITY_CACHE Option in SQL Server

by Artemakis Artemiou

The IDENTITY_CACHE option is a new database-scoped configuration option in SQL Server 2017 and Azure SQL Database. This option is set to ON by default. The concept of identity caching in SQL Server is that the Database Engine stores in cache a series of values for identity columns and…

The post The IDENTITY_CACHE Option in SQL Server appeared first on SQLNetHub.
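
As a quick aside (not part of the original post), a minimal sketch of checking and changing the option with the documented ALTER DATABASE SCOPED CONFIGURATION syntax:

-- Check the current IDENTITY_CACHE setting for the database
SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = 'IDENTITY_CACHE';

-- Disable identity caching (ON is the default) to avoid identity gaps after unexpected restarts
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;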

15 Nov 02:21

Finally, SSMS will talk to Azure SQL DW

by Rob Farley

Don’t get me started on how I keep seeing people jump into Azure SQL DW without thinking about the parallel paradigm. SQL DW is to PDW, the way that Azure SQL DB is to SQL Server. If you were happy using SQL Server for your data warehouse, then SQL DB may be just fine. Certainly … Continue reading Finally, SSMS will talk to Azure SQL DW

The post Finally, SSMS will talk to Azure SQL DW appeared first on LobsterPot Blogs.

15 Nov 02:21

You’ve been doing cloud for years…

by Rob Farley

This month’s T-SQL Tuesday is hosted by Jeffrey Verheul (@devjef) and is on the topic of Cloud. I seem to spend quite a bit of my time these days helping people realise the benefit of the Azure platform, whether it be Machine Learning for doing some predictions around various things (best course of action, or … Continue reading You’ve been doing cloud for years…

The post You’ve been doing cloud for years… appeared first on LobsterPot Blogs.

15 Nov 02:15

The effort of relocating blog sites

by Rob Farley

Hopefully you've realised that I'm not posting at sqlblog.com any more. There's still some excellent content there, but it has come up time and time again that I should be posting at a company blog site – so the move has now been made. I've also brought across the material that I wrote at msmvps.com, … Continue reading The effort of relocating blog sites

The post The effort of relocating blog sites appeared first on LobsterPot Blogs.

15 Nov 02:13

UPDATEs to Statistics

by Erin Stellato

The last several releases of SQL Server have introduced a slew of new features, as well as improvements in existing functionality. One area of the engine that is easy to overlook is statistics. After all, statistics still get created the same way, they still tell you about the distribution of data, they’re still used by the Query Optimizer… what’s different? The basic function of statistics remains the same – but how they’re used by the Query Optimizer does change depending on the Cardinality Estimator you’re using. There are also several noteworthy changes related to updating statistics, and new functionality has been added around viewing statistics information. Altogether, these changes across the latest releases can cause a variation in SQL Server behavior that you weren’t expecting.

Note: This post is most applicable to SQL Server 2012 and higher, but some detail for prior releases is included for reference (and fun).

SQL Server 7.0

  • The number of steps in a histogram is limited to 300. In SQL Server 6.5 and earlier a histogram would have the number of steps that could fit on a 2K page, based on the size of the first column in the key.
  • The ‘automatically update statistics’ database option is introduced; previously statistics were only updated manually.

SQL Server 2000

  • The number of steps in the histogram is reduced from 300 to 200 (technically 201, if you include the step for NULL, assuming the first column in the key allows NULLs).

SQL Server 2005

SQL Server 2008

SQL Server 2008R2 SP1

  • Trace Flag 2371 is made available, which can be used to reduce the number of modifications required for automatic updates to statistics to occur. As a reminder, I’m a fan of updating statistics on a regular basis through a scheduled job and leaving the auto update enabled as a safety.

SQL Server 2008R2 SP2

  • The function sys.dm_db_stats_properties is included, which provides the same information found in the header of DBCC SHOW_STATISTICS, as well as a modification counter column that can be used to track changes and programmatically determine if an update is needed. Remember my preference for using a job to update stats? That job just got a lot smarter with this DMF… now I can look to see how much data has been modified and ONLY update statistics if a certain percentage of data has changed (a sample query follows below). Slick.
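
A minimal sketch of that approach, using a hypothetical 10% modification threshold (the threshold and any table filtering would come from your own maintenance routine):

-- Statistics where at least 10% of rows have changed since the last update
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
  AND sp.modification_counter >= 0.10 * sp.rows   -- hypothetical threshold
ORDER BY sp.modification_counter DESC;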

SQL Server 2012

SQL Server 2012 SP1

  • DBCC SHOW_STATISTICS only requires the SELECT permission – previously it required a user to be a member of sysadmin, or a member of the db_owner or db_ddladmin database role. This can be globally disabled with trace flag 9485.
  • Includes sys.dm_db_stats_properties (see 2008R2 SP2 note above)

SQL Server 2012 SP2

  • Cumulative Update 1 introduces a fix related to ascending keys not being properly identified even with trace flags 2389 and 2390 in use. This requires trace flag 4139 in addition to CU1, as noted in KB 2952101.

SQL Server 2014

  • The new Cardinality Estimator is introduced, implemented by setting the database compatibility mode to 120, or by using trace flag 2312. If you haven’t read anything about the new CE I recommend starting with the Cardinality Estimation documentation and then reading Joe Sack’s whitepaper, Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator, for in-depth details.
  • The behavior from Trace Flags 2389 and 2390 for ascending keys is now implemented via the database compatibility mode. If your database’s compatibility mode is set to 120 (or higher in later releases), you do not need to use Trace Flags 2389 and 2390 for SQL Server to identify statistics that have ascending keys.
  • Incremental statistics are introduced for partitions, and can be viewed through the new DMF sys.dm_db_incremental_stats_properties. Incremental statistics provide a way to update statistics for a partition without updating them for the entire table (see the sketch after this list). Note, however, that the additional statistics information from the incremental statistics is not used by the Query Optimizer; it is folded into the main histogram for the table.
  • CU2 includes the same fix mentioned above for SQL Server 2012 SP2 that also requires trace flag 4139.
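
A brief sketch of what incremental statistics look like in practice, using a hypothetical partitioned table dbo.Sales and a statistic name of my own choosing:

-- Create a statistic that tracks modifications per partition
CREATE STATISTICS Stats_Sales_OrderDate
ON dbo.Sales (OrderDate)
WITH INCREMENTAL = ON;

-- Later, refresh only the partitions that changed (partitions 3 and 4 here)
UPDATE STATISTICS dbo.Sales (Stats_Sales_OrderDate)
WITH RESAMPLE ON PARTITIONS (3, 4);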

SQL Server 2014 SP1

  • Trace flag 7471 is back-ported to CU6, originally available in SQL Server 2016 as noted below.

SQL Server 2016

SQL Server 2016 SP1

  • The hint ENABLE_HIST_AMENDMENT_FOR_ASC_KEYS is introduced, along with the USE HINT query argument, and is the equivalent of trace flag 4139.
  • The DMF sys.dm_db_stats_histogram is exposed in CU2, which is an alternative to the histogram output from DBCC SHOW_STATISTICS. The information in both is the same; use whichever is easier for you or better fits the problem you need to solve.
  • The option PERSIST_SAMPLE_PERCENT is introduced in CU4, which can be used to force the same sampling rate to be used every time a statistic is updated going forward; examples of this behavior can be found in the post, Persisting statistics sampling rate. A short example combining this option with the histogram DMF follows below.
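
A quick sketch of both features together; the table name and sampling rate are placeholders:

-- View histogram steps without running DBCC SHOW_STATISTICS
SELECT s.name AS stats_name, h.step_number, h.range_high_key,
       h.range_rows, h.equal_rows, h.average_range_rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS h
WHERE s.object_id = OBJECT_ID('dbo.Sales');   -- placeholder table

-- Persist a 10 percent sampling rate for all future updates of these statistics
UPDATE STATISTICS dbo.Sales
WITH SAMPLE 10 PERCENT, PERSIST_SAMPLE_PERCENT = ON;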

SQL Server 2017

  • The attributes StatsInfoType and OptimizerStatsUsageType are added to the Query Plan, which list the statistics used during query optimization. This is pretty cool! I haven’t had a chance to play with this yet, but to get this information previously you had to use undocumented trace flags. A sketch of pulling this element out of cached plans follows below.
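
A hedged sketch of extracting the new OptimizerStatsUsage element from cached plans; this assumes the standard showplan XML namespace and is only an illustration:

-- Pull the OptimizerStatsUsage element out of cached plans (SQL Server 2017 and later)
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       qs.query_hash,
       qp.query_plan.query('//OptimizerStatsUsage') AS stats_used
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//OptimizerStatsUsage') = 1;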

Summary

If you are looking to upgrade to a newer release, or if you’ve recently upgraded, take note as to how these changes impact your solution. We’ve had many clients contact us after upgrading from 2005/2008/2008R2 to 2014 or 2016, complaining of performance issues. In many cases, adequate testing was not completed prior to the upgrade.

This is something we really focus on when we’re helping a client upgrade. Beyond the steps to move a production instance from one version to another with little downtime, we want to make sure that the day after the upgrade is a boring one for DBAs and developers.

We don’t simply test the upgrade process, we test what the system looks like after the upgrade. Are the same trace flags from the old environment needed in the new one? What database settings need to be adjusted? Does query performance change – for better or worse? If you don’t know the answers to those questions before you upgrade production, then you’re setting yourself up for one to many days of fire-fighting, Sev 1 calls, meals at your desk, not enough sleep and who knows what else.

Take the time up front to understand the impact of the new features and changes in functionality listed above, plan the upgrade, and test as much as possible.

The post UPDATEs to Statistics appeared first on SQLPerformance.com.

15 Nov 02:12

New Content for Microsoft Azure’s HDInsight for March 2014

by BuckWoody
15 Nov 02:10

Top Questions from New Users of Azure SQL Database

by Dimitri Furman

Reviewed by: Kun Cheng, John Hoang, Denzil Ribeiro, Rajesh Setlem

Over the years, we have been engaged with many customers who were either already using, or were in the process of adopting Azure SQL Database. Most of those engagements involved deep technical work in the areas of performance, resiliency, connectivity, and general database architecture. During these engagements, and in our interactions with the wider database community, a set of common questions about Azure SQL Database kept coming up.

This blog attempts to answer these questions, while also providing additional background and references for more information. Our goal is to help new and existing users of the service to implement their data tier on Azure SQL Database quickly and efficiently, while avoiding common mistakes.

We should add a caveat that Azure SQL Database is a fast-changing service, with improvements rolled out continuously. This blog is based on the state of the service as of October 2017. It has been updated in July 2018 to reflect recent changes in the service. Some of the information in this blog could become obsolete or superseded in the future. In case of discrepancies between this blog and Azure SQL DB documentation, the latter should take precedence.

Use the links below to quickly get to a particular question.

Q1. How can I check if my SQL Server database is compatible with Azure SQL Database?
Q2. How should I choose the service tier and the performance level for my database(s)?
Q3. How do I monitor resource consumption and query performance?
Q4. Should I use Elastic Pools?
Q5. How should I configure firewall rules?
Q6. Can I create a database in a VNet?
Q7. Can I use Windows Authentication in Azure SQL Database?
Q8. How should I implement retry logic?
Q9. What is the best way to load large amounts of data in Azure SQL Database?
Q10. What if my application is chatty?
Q11. What kind of workload is appropriate for Azure SQL Database?
Q12. How do I scale out my database?

Q1. How can I check if my SQL Server database is compatible with Azure SQL Database?

While built from a common code base, SQL Server and Azure SQL Database have differences in supported features, behaviors, and best practices. The Azure SQL Database features documentation topic is a great starting point for research in this area.

Today, the Data Migration Assistant (DMA) tool is the simplest way to discover a wide variety of compatibility issues in databases that are candidates for migration to Azure SQL Database. The tool can access the source SQL Server database to discover potential migration blockers, unsupported features, and other compatibility issues, providing detailed findings and recommendations. DMA can also be used to migrate your database schema and data in an efficient way. Data is streamed from source to destination and is bulk loaded into the target database.

That said, for many existing applications built on SQL Server over the years and decades, migrating to Azure SQL Database requires changes to address the differences and gaps in features and behaviors. The new Managed Instance offering of Azure SQL Database, in private preview as of this writing, is intended to simplify this migration effort as much as possible. With Managed Instance, instead of provisioning one or more standalone and isolated databases, you provision a SQL Server instance in the cloud. That instance is very similar in features and behavior to SQL Server running on-premises. Conceptually, this is the same model that has been used to provision SQL Server over the last few decades, but with all the advantages of a PaaS service.

Initial information about Managed Instance is available in this blog, and more information will become available later in 2017 and 2018. For now, the rest of this blog will target the current Azure SQL Database service.

Q2. How should I choose the service tier and the performance level for my database(s)?

In today’s Azure SQL Database, there are four service tiers: Basic, Standard, Premium, and Premium RS. Each service tier has different capabilities and goals, described in What are Azure SQL Database service tiers. [July 2018 update]: The new vCore purchasing model for Azure SQL Database introduced the General Purpose and Business Critical service tiers, which are analogous to Standard and Premium service tiers. The Premium RS service tier has been deprecated.

Within each service tier, several performance levels are available. Each performance level comes with a certain allocation of resources, and can therefore sustain a certain level of performance for a given database workload. The performance level determines the cost of the database. [July 2018 update]: Resources available in each performance level are documented for the DTU purchasing model, and for vCore purchasing model.

Clearly, how to choose the most appropriate service tier and performance level is a top question for most customers, whether using Azure SQL Database for a new application, or migrating an existing application. At the same time, the answer is rarely straightforward. Even when you know the resource demands of the existing database in the on-premises environment exactly, multiple factors can change these demands during cloud migration, making the choice of the performance level an art, rather than science. For a more detailed discussion of this topic, refer to the Service Tier Considerations section of this blog.

Nevertheless, there is a silver lining in every cloud: switching from one performance level to another in Azure SQL Database is, from a user’s perspective, a very simple process. While the duration of the scaling operation from one performance level to another can be variable, ranging from minutes to hours, the database remains online at the original performance level for the duration of the scaling operation. There is a short (seconds) period of unavailability at the end of this operation, after which the database is at the new performance level.
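
For illustration, a scale operation can be initiated from T-SQL as well as from the portal or PowerShell; a minimal sketch, with the database name, edition, and service objective as placeholders:

-- Move the database to Standard S3 (commonly executed while connected to the master database)
ALTER DATABASE [MyAppDb]
MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');

-- Check progress of the scaling operation (run in the master database)
SELECT operation, state_desc, percent_complete, start_time
FROM sys.dm_operation_status
ORDER BY start_time DESC;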

The strategy of making an educated guess of the performance level, monitoring performance, and adjusting it up or down has worked quite well for many customers, reducing the importance of getting the performance level “just right” from the outset.

Here, we’ll mention a few suggestions that should help in selecting the service tier, keeping in mind that these are not firm rules, and that exceptions do exist:

1. The Basic tier should not be used for production workloads. Available resources are minimal, and even simple queries can run with greatly varying durations. Storage performance is highly variable. At the same time, this is a good choice for databases used for simple development or functional testing activities, or for sufficiently small databases that are not used for long periods of time, and can therefore be scaled down to the Basic tier to save costs.

2. The Standard tier is suitable for a broad variety of common database workloads that do not have high performance demands. While storage performance in the Standard tier is very predictable compared to Basic, storage latency stays in the 2-10 ms range. A recent improvement greatly increased the variety of workloads suitable for the Standard tier. Now, the compute resources available to a Standard database can range to the same upper limit as in the Premium tier. At the same time, additional storage up to 1 TB becomes available in Standard as well. This makes it possible to use the Standard tier for both the “fast and small” and “slow and large” databases, as long as the additional capabilities of the Premium tier are not required.

3. The Premium tier is used for workloads that are the most demanding in terms of performance and availability. Its most distinguishing features are local SSD storage with sub-millisecond latency, and the use of multiple online replicas for fast failover, typically within single seconds. Additionally, certain database engine features such as In-Memory OLTP and Columnstore Indexes are only available in Premium. [July 2018 update]: Columnstore indexes are now also supported in Standard databases (starting with S3), and in all General Purpose databases.

4. The Premium RS tier, currently in preview, provides the same performance and feature capabilities as Premium (within the P1-P6 subset of performance levels), but without the availability SLA that Premium and other tiers have. It is suitable for non-production workloads that still require high performance, e.g. for load testing, or for workloads where RPO measured in minutes (i.e. the possibility of losing the most recent data changes in case of a crash) is acceptable. Premium RS is significantly less expensive than Premium.

[July 2018 update]: The vCore purchasing model simplifies the choice of performance level by allowing the customers to select the actual number of logical CPUs (vCores) for their database, and use specific hardware generations with documented per-core ratios for memory and IOPS. The transaction log and tempdb limits are also documented for each performance level.

Q3. How do I monitor resource consumption and query performance?

Monitoring database performance in Azure SQL Database provides detailed instructions on how to monitor resource consumption of your database. Here we will mention just a few additional considerations.

The standard measure of resource consumption in Azure SQL Database is known as Database Transaction Unit (DTU). Each performance level has a certain maximum DTU value, corresponding to the maximum amount of resources (CPU, memory, data IO, log write) that the workload can consume. When workload demands exceed this maximum, queries continue to run, however they run slower due to limited resources. In extreme cases, queries can time out, causing exceptions in the application.

For monitoring purposes, each measure that contributes to the DTU value (CPU, data IO, log write) is expressed as a percentage of its maximum for the current performance level. DTU itself is defined simply as the highest of these three percentages at a given point in time. For example, in the following result set, obtained from the sys.dm_db_resource_stats DMV, the average DTU for the 15-second period ending on 2017-10-06 15:50:41.090 (top row) is 90.48%, because that is the highest value (avg_cpu_percent) in the preceding three columns:

[Screenshot: sys.dm_db_resource_stats output where avg_cpu_percent is the highest of the three resource percentages]

However, in the following example, DTU is at 100%, because the maximum value is avg_log_write_percent:

[Screenshot: sys.dm_db_resource_stats output where avg_log_write_percent is at 100%]

Here is a sample monitoring query to obtain these results:

SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       (
       SELECT MAX(v)
       FROM (
            VALUES (avg_cpu_percent),
                   (avg_data_io_percent),
                   (avg_log_write_percent)
            ) AS dv(v)
       ) AS avg_DTU_percent
FROM sys.dm_db_resource_stats;

If you see that DTU consumption is close to 100%, it is often helpful to look at its three components to better understand the nature of the resource constraint the workload is facing. Here is an example from Azure portal showing that DTU is at 98.84%, because CPU is at 98.84%:

[Screenshot: Azure portal metrics chart showing DTU percentage and CPU percentage both at 98.84%]

By default, the only metric shown on the database metrics graph in the portal is DTU consumption. In the above example, the CPU percentage, Data IO percentage, and Log IO percentage metrics have been added to the graph, making it clear that the workload is CPU bound, with log write a close secondary bottleneck.

If you find that the performance of your application is negatively affected because of a resource constraint, i.e. if you see that DTU is close to 100%, you can either scale up the database to a higher performance level, and possibly a higher service tier, to increase available resources, or tune the workload to perform efficiently within available resource bounds.

The latter is nothing more than the traditional query and index tuning that most DBAs are very familiar with. While many older tuning techniques are still applicable to Azure SQL Database, the availability of Query Store greatly simplifies the first part of this process, namely finding the top resource consuming queries. Query Store data can be accessed from SSMS (either from the UI, or querying Query Store views directly). Azure portal also provides a UI to examine and act on Query Store data under the Query Performance Insight blade.
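
As an illustration, one of many possible queries against the Query Store views to surface the top CPU consumers over the collected interval (the column choices here are my own):

-- Top 10 queries by total CPU time recorded in Query Store
SELECT TOP (10)
       q.query_id,
       qt.query_sql_text,
       SUM(rs.count_executions) AS executions,
       SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_time_us
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_cpu_time_us DESC;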

Once you know the top resource consumer queries, you can examine their plans and tune them, i.e. via query rewrite or via indexing changes. If multiple plans for the same query exist, Query Store provides the ability to force the most efficient plan. Azure SQL Database can automate the process of forcing optimal plans on an ongoing basis, effectively resulting in a self-tuning database. For more details, see Automatic tuning. This is also available in the portal under the Automatic tuning blade.

[Screenshot: the Automatic tuning blade in the Azure portal]

For some performance troubleshooting scenarios, it may be necessary to trace individual query executions and other relevant events occurring in the database engine. Similarly to SQL Server, this can be achieved using Extended Events, with some notable differences:

1. Event sessions are always scoped to a database

2. The file target writes data to Azure Blob Storage, rather than a file

3. Not all events available in SQL Server are available in Azure SQL Database

The xel files written to a storage container can be copied locally and opened in the SSMS XEvents UI, or queried from T-SQL.
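
A minimal sketch of a database-scoped session using the in-memory ring_buffer target; writing to the event_file target instead would additionally require a database scoped credential for your storage container. The session name, event, and predicate are illustrative choices:

-- Database-scoped session capturing statements that run longer than one second
CREATE EVENT SESSION [slow_statements] ON DATABASE
ADD EVENT sqlserver.sql_statement_completed
    (WHERE duration > 1000000)              -- duration is reported in microseconds
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION [slow_statements] ON DATABASE STATE = START;

-- Read the captured events back as XML
SELECT CAST(t.target_data AS xml) AS captured_events
FROM sys.dm_xe_database_sessions AS s
JOIN sys.dm_xe_database_session_targets AS t
    ON t.event_session_address = s.address
WHERE s.name = 'slow_statements';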

DBAs experienced in performance troubleshooting in SQL Server often use PerfMon to diagnose performance problems. While PerfMon is not available in Azure SQL Database, the sys.dm_os_performance_counters DMV provides all database engine performance counters (though the Windows counters are not available). This blog includes a script that uses this DMV to collect performance counter data in a table.
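
A tiny example of reading that DMV; note that cumulative counters such as Batch Requests/sec need two samples and a delta to be meaningful:

-- Snapshot of a couple of commonly used database engine counters
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy', 'Batch Requests/sec');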

Q4. Should I use Elastic Pools?

Elastic Pool is a special type of performance level that exists in all service tiers. In an elastic pool, the resources (CPU, memory, IOPS/throughput, disk space) are shared among all databases in the pool. This can result in significant cost savings from buying fewer resources than would be needed for the same number of standalone databases.

Elastic pools are particularly attractive when the application uses a very large number of databases, e.g. for multi-tenant applications that use the tenant-per-database model. This is not the only use case for elastic pools though. In general, they should be considered whenever the number of deployed databases is greater than two. The important caveat here is that if spikes in the workloads of multiple databases happen at the same time, they could compete for the same resources, possibly resulting in lower throughput for all databases. In extreme cases, if cumulative workload demands exceed certain pool-level limits (i.e. the worker limit), all databases in a pool could become unavailable for new queries.

For additional details about elastic pools, see Elastic pools help you manage and scale multiple Azure SQL databases.

Q5. How should I configure firewall rules?

When an Azure SQL Database logical server is created initially, a special firewall rule that allows access to the entire Azure IP address space is enabled. In the portal, it looks like this:

[Screenshot: the “Allow access to Azure services” firewall setting in the Azure portal]

This may be required for some applications and scenarios, e.g. if you are planning to use the Import/Export service to import bacpacs, or if the application tier is deployed in Azure and does not have a static IP address.

This firewall rule is very broad. For many applications, leaving it enabled permanently would violate the principle of least privilege. If the logical server will only be accessed from a small set of IP addresses, disable this rule, and instead create rules for each source IP address or range, using descriptive names. A simple example would look like this:

[Screenshot: named server-level firewall rules for specific source IP ranges in the Azure portal]

By default, the source IP address of connections originating from Azure VMs is dynamic, which makes it necessary to enable the special “Azure services” rule to allow access from your Azure VMs. However, by assigning a reserved IP address to a virtual machine or a cloud service, it is possible to create a much narrower set of rules for a small set of reserved IP addresses instead.

So far, we have been talking about server-level firewall rules. In addition, or instead of server-level rules, database-level rules can be used as well. For many scenarios, database-level rules have an advantage. Specifically, they make the database more portable. If the database is moved to a different server, or if a replica of the database is created, the firewall rules are preserved, and no additional configuration is required on the new server. On the other hand, if you have many databases on a logical server, then it may be easier to manage firewall rules at the server level, rather than in each database. For more details on firewall rules, see Azure SQL Database server-level and database-level firewall rules. One important point to note is that server-level and database-level rules are not “one on top of the other”. A connection attempt is allowed if either a server-level or a database-level rule allows it.
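
Both kinds of rules can also be managed from T-SQL; a minimal sketch, with rule names and address ranges as placeholders:

-- Server-level rule: run in the master database of the logical server
EXEC sp_set_firewall_rule
     @name = N'OfficeNetwork',
     @start_ip_address = '203.0.113.10',
     @end_ip_address = '203.0.113.20';

-- Database-level rule: run in the user database; it travels with the database
EXEC sp_set_database_firewall_rule
     @name = N'AppServer',
     @start_ip_address = '198.51.100.5',
     @end_ip_address = '198.51.100.5';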

Q6. Can I create a database in a VNet?

Until recently, the answer to this very frequent question was an unqualified no. Today, there are two relevant developments in this area.

VNet Service Endpoints is a new option for managing network security in Azure SQL Database, currently in preview in a subset of regions. [July 2018 update]: VNet Service Endpoints are now Generally Available. This feature lets a network administrator allow connections to the database server only from a particular subnet in an Azure VNet. This enables a commonly requested scenario where no IP address-based firewall rules are created at all, and only connections from a private VNet (or several such VNets) are allowed. Importantly, the public IP endpoint for the database still exists; however, access to this endpoint is controlled by specifying VNet/subnet identifiers, rather than by specifying public IP addresses.

Managed Instance, once available, will allow the instance to be created within a VNet, and accessible only via a private IP address, resolving the security and compliance concerns that many customers have today.

Q7. Can I use Windows Authentication in Azure SQL Database?

The short answer is no. Therefore, if you are migrating an application dependent on Windows Authentication from SQL Server to Azure SQL Database, you may have to either switch to SQL Authentication (i.e. use a separate login and password for database access), or use Azure Active Directory Authentication (AAD Authentication).

The latter is conceptually similar to Windows Authentication in the sense that connections from directory principals are authenticated without the need to provide additional secrets, such as a password. Since Azure Active Directory can be federated with the on-premises Active Directory Domain Services, it can effectively authenticate the same Active Directory principals that could access the database prior to migration. However, the authentication flow for AAD Authentication is significantly different, so the analogy with Windows Authentication only goes so far.

Q8. How should I implement retry logic?

It is widely recognized that applications running in public cloud should be resilient to transient faults. This requires implementing retry logic in the application. For applications using Azure SQL Database, there are two kinds of operations that may have to be retried when the database is temporarily unavailable. One is opening a connection to the database, and the other is executing a command against the database (e.g. running a query or making a procedure call).

Retrying connection attempts is relatively straightforward, though there are some things worth mentioning.

One is that some client libraries, specifically the ones wrapping ODBC and SqlClient drivers, may take advantage of the Idle Connection Resiliency feature in SQL Server and Azure SQL Database. For ADO.NET, this is available starting with version 4.5.1. This feature can transparently reestablish some broken database connections without any additional code in the application. Does this mean that retry logic for opening connections is no longer needed? Not really – there are still cases when even with Idle Connection Resiliency, a broken connection cannot be recovered and needs to be opened by the application again, i.e. because connection state cannot be fully reestablished (consider a temp table that no longer exists on the server once the connection is broken).

Another note is related to ADO.NET connection pooling, which implements a behavior called “blocking period”. For details, see SQL Server Connection Pooling (ADO.NET). In summary, if an error occurs when a connection is being opened, connection attempts made in the next five seconds will fail fast without actually attempting to connect to the database. Furthermore, if connection attempts made after five seconds continue to fail, the blocking period will be progressively increased. As a practical outcome, this means that with blocking period enabled, connection failures should be initially retried after at least five seconds have elapsed, to avoid being further delayed by the blocking period.

Starting with ADO.NET 4.6.2, the blocking period behavior is configurable via a connection string attribute. By default in that version, blocking period is disabled for Azure SQL Database, but enabled for other SQL Server connections. If blocking period is disabled, it becomes feasible to retry connections more frequently. For Premium databases, most failovers by far complete within ten seconds. If a connection attempt fails with a timeout during a failover, and the timeout is set to a sufficiently large value (30 seconds is recommended), then by the time the application gets the timeout error, the failover has likely completed, and the connection can be retried right away.

Improvements in the way client libraries handle transient faults in Azure SQL Database have been made in recent releases. We recommend using the latest libraries, i.e. ADO.NET 4.6.2 or later, and JDBC 6.2.2 or later.

Azure SQL Database implements a mechanism that guards against Denial Of Service (DOS) attacks. This mechanism may be triggered if the number of connection attempts per unit time exceeds a certain threshold. This is relevant in the context of implementing retry logic. If retries are too aggressive, i.e. if the application attempts to reestablish all connections at the same time or in a tight loop, then the DOS mechanism may be triggered. The symptom of triggering DOS protection is getting error 18456, “Login failed for user …”, with state code 113. If this happens, consider adding a randomized delay before attempting to reconnect, or implement an exponential back-off policy for retries.

Unlike connections, retrying database commands is much more complicated. Whether you should retry a particular command depends not only on the kind of error you get, but also on the business logic being implemented by the command, the state of the application (i.e. is there an open transaction), and the state of the data in the database (i.e. have other processes modified the data in an unexpected way). Due to these complexities, a general recommendation on command retry other than “it depends” would be fraught with peril. One approach to consider is to retry not individual database commands, but the higher level “application transactions”, such as creating a user account, or checking in a code changeset, or transferring points from one account to another.

Before implementing retry logic in your application, we strongly recommend reviewing Troubleshoot, diagnose, and prevent SQL connection errors and transient errors for SQL Database for detailed info.

Q9. What is the best way to load large amounts of data in Azure SQL Database?

In principle, the answer is the same as it is for the traditional SQL Server. Bulk load still provides the highest throughput. This includes methods such as BCP, BULK INSERT, OLEDB Destination in SSIS, the SqlBulkCopy class in ADO.NET, and SQLServerBulkCopy class in JDBC Driver.

One significant difference between SQL Server and Azure SQL Database is that the latter uses resource governance to limit the amount of resources available in each performance level. Transaction log write throughput is one of the governed resource types, and it is often 100% utilized during bulk load, limiting load throughput. If your large data load is an infrequent event, consider temporarily scaling up to a higher performance level that allows higher transaction log write throughput. To monitor resource consumption during data load and see if log write is the bottleneck, use the sys.dm_db_resource_stats DMV, as described in Monitoring database performance in Azure SQL Database.

Another difference is that in traditional SQL Server, an improvement in data load throughput can be achieved if minimal logging is used. However, that requires the database to use either Simple or Bulk-Logged recovery model. All Azure SQL DB databases use Full recovery model to support RPO and RTO goals; therefore, data load is always fully logged, potentially reducing load throughput.

Note that you may be able to reduce log usage by using clustered columnstore indexes and using batch sizes larger than 102400 rows, to load directly into compressed row groups. For details, see Columnstore indexes – data loading guidance.

One new method for loading data into Azure SQL Database is using BULK INSERT to load data from Azure Blob Storage. This supports both native BCP and CSV files as the source. If copying data from an on-premises database, an efficient way would be to export data in native BCP format, upload to a storage account in the same region as the target Azure SQL DB database using AzCopy, and then load into the target database using BULK INSERT.
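
A condensed sketch of that flow; the storage account, container, file name, target table, and SAS token are all placeholders, and a database master key must already exist before the credential can be created:

-- One-time setup in the target database
CREATE DATABASE SCOPED CREDENTIAL BlobLoadCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = 'sv=2017-04-17&ss=b&srt=co&sp=rl&sig=...';   -- SAS token without the leading '?'

CREATE EXTERNAL DATA SOURCE LoadFiles
WITH (TYPE = BLOB_STORAGE,
      LOCATION = 'https://mystorageaccount.blob.core.windows.net/loadfiles',
      CREDENTIAL = BlobLoadCredential);

-- Load a native-format BCP export straight into the target table
BULK INSERT dbo.SalesStaging
FROM 'sales.bcp'
WITH (DATA_SOURCE = 'LoadFiles', DATAFILETYPE = 'native', TABLOCK);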

Q10. What if my application is chatty?

A chatty application makes frequent small requests to the database, resulting in many network round trips. For example, the application could be inserting or updating data one row at a time, or using client-side cursors. In general, such patterns are much less efficient than using set-based statements that affect multiple rows, or sending multiple SQL statements to the server in a single batch, rather than in a separate batch per statement.

For Azure SQL Database, the usual advice for this type of application has been to refactor code to batch database requests as much as possible, as described in How to use batching to improve SQL Database application performance. That advice is still sound (and not just for Azure SQL Database), and should be followed when developing new applications.

At the same time, making an existing application less chatty may not be a small task. There are improvements in Azure SQL Database that allow chatty applications to perform better today by reducing network latency, compared to the early days of the service.

One improvement has been in place for some time, and is known as client redirection. This removes the regional gateway from the network path for established connections, thus significantly reducing network latency. Client redirection is described in Azure SQL Database Connectivity Architecture, with additional details available in Connect to Azure SQL Database V12 via Redirection. By default, client redirection is enabled only when the client IP address is a part of Azure IP address space. If the client is outside of Azure, the latency of the WAN link is likely large enough to make the improvement from using client redirection insignificant. Nevertheless, if warranted, client redirection can be enabled for all clients, as described in the previous two links.

Azure SQL Database also takes advantage of the Accelerated Networking (AN) improvement in Azure. While this is not yet available for all databases, the observed roundtrip network latency for databases on AN-enabled infrastructure that also use client redirection is below 1 ms. [July 2018 update]: If using the vCore purchasing model, customers can choose Gen5 as the hardware generation. All Gen5 capacity in Azure SQL Database is AN-enabled.

Q11. What kind of workload is appropriate for Azure SQL Database?

Most of the time, Azure SQL Database is positioned as most appropriate for transactional OLTP workloads, i.e. the workloads consisting primarily of many short transactions occurring at a high rate. While such workloads do indeed run well in Azure SQL Database, some types of analytical and data warehousing workloads can also use the service. In the Premium service tier, Azure SQL Database supports columnstore indexes, providing high compression ratios for large data volumes, and memory-optimized non-durable tables, which can be used for staging data without persisting it to disk. [July 2018 update]: Columnstore indexes are now also supported in Standard databases (starting with S3), and in all General Purpose databases. At higher performance levels, the level of query concurrency is similar to what can be achieved on a traditional SQL Server instance in a large VM, albeit without the ability to fine-tune resource consumption using Resource Governor. While SQL Server in an Azure VM always uses remote storage for database files, the Premium/Business Critical service tier in Azure SQL Database uses local SSD storage, which provides much higher IOPS/throughput, and makes Azure SQL Database attractive for small to medium scale analytical workloads.

There are several limiting factors that determine whether an analytical workload can run efficiently, or at all, in Azure SQL Database:

1. Database size limitations. Today, the largest available database size is 4 TB, currently ruling out the service as the platform for larger data warehouses. At the same time, if your data warehouse database exceeds this limit, but currently does not use columnstore indexes, then the compression benefit of columnstore may well put it comfortably within the limit even if the actual uncompressed data volume is much higher than 4 TB.

2. Analytical workloads often use tempdb heavily. However, the size of tempdb is limited in Azure SQL Database.

3. As mentioned in the earlier question about large data loads, the governance of transaction log writes, and mandatory full recovery model can make data loads slower.

If your analytical or data warehousing workload can work within these limits, then Azure SQL Database may be a good choice. Otherwise, you may consider SQL Server running in an Azure VM, or, for larger data volumes measured in multiple terabytes, Azure SQL Data Warehouse with its MPP architecture.

Q12. How do I scale out my database?

Scaling out in this context refers to using multiple similarly structured databases to serve the application workload. The goal is to use much higher (in theory, limitless) amounts of compute and storage resources than what is available to a single database even in the top performance tier, and thus achieve much higher degree of performance and concurrency. There are two Azure SQL Database features to consider here: Geo-replication and Elastic Scale.

Geo-replication allows you to create multiple read-only replicas of the primary read-write database, maintained asynchronously with a five-second data latency SLA. Even though the name of the feature implies that these replicas are geographically dispersed, they can be created in any region, including the same region where the primary is located. In the Azure portal, up to four replicas can be created; however, more replicas can be created using PowerShell or the ARM API, by specifying one of the four secondary replicas as the primary for a new set of replicas. Note that this configuration is intended to support scale-out for read-only workloads; not all failover scenarios are supported. This configuration has been tested for up to 16 replicas, with each of the four original secondaries being the primary for a set of up to three replicas.
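
For reference, a read-only geo-secondary can also be added with T-SQL; a minimal sketch, run in the master database of the primary server, with server and database names as placeholders:

-- Create a readable secondary of MyAppDb on another logical server
ALTER DATABASE [MyAppDb]
ADD SECONDARY ON SERVER [myserver-secondary]
WITH (ALLOW_CONNECTIONS = ALL);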

Elastic Scale, or more specifically Elastic Database Client Library, allows you to shard your data over many databases. Thousands of databases are supported. Unlike Geo-replication, these databases are not read-only replicas; each database contains a distinct subset of data, and is both readable and writeable. The client library facilitates querying of these databases by using a shard map, which is stored in a separate database, describing the allocation of data across database shards.

15 Nov 02:10

Cray and Microsoft Bring Supercomputing to Microsoft Azure

by A.R. Guess

by Angela Guess. According to a new press release, “Global supercomputer leader Cray Inc. today announced an exclusive strategic alliance with Microsoft Corp. that gives enterprises the tools to enable a new era of discovery and insight, while broadening the availability of supercomputing to new markets and new customers. Under the partnership agreement, Microsoft and […]

The post Cray and Microsoft Bring Supercomputing to Microsoft Azure appeared first on DATAVERSITY.

15 Nov 02:09

Is the Microsoft R Client a…client?

by BuckWoody
Microsoft has recently been on a tear introducing R into, well, everything. And now there are several R offerings – from Microsoft R Server and Microsoft R Open to R Services in SQL Server (2016) and now the Microsoft R Client. But is the Microsoft R Client a client? So it’s a command-line, or a GUI of some sort? No. I...
15 Nov 02:08

The Inherent Insecurity of Data Science

by BuckWoody
Data Science attempts to derive meaning from data. There are a lot of techniques, processes and tools you can use to do that – I cover those in this blog site. But Data Science is insecure – by default. And that’s a real problem. In a solution involving a Relational Database Management (RDBMS) system, you’ll...
15 Nov 02:08

Enabling the Linux Subsystem in Windows 10

by BuckWoody
If you’ve upgraded to the Windows 10 Anniversary Edition (writing this as of 2016), you have a new way of working with software on your Data Science workstation. NOTE: The steps I’ll show here should be for your development laptop – as always, back up your system before making changes, set a restore point, use...
15 Nov 02:08

Hello Buck – I’d like to work at Microsoft

by BuckWoody
About once a week or so, after teaching a Data Science class or at an event, I’ll get asked “Hey – I’ve got a lot of experience and training in technology X – and I’d like to work at Microsoft. What do I do?” I was in Germany the week I wrote this - teaching, and...
15 Nov 02:06

Migrating to Azure SQL DB using Transactional Replication

by Manohar Punna

Azure SQL DB is a PaaS offering from Microsoft on Azure for hosting a SQL database. This may not be ideal for all environments. But if you have a database which can work in isolation handling requests from one or more applications, Azure SQL DB may be a right fit for you. One common use-case … Continue reading Migrating to Azure SQL DB using Transactional Replication

The post Migrating to Azure SQL DB using Transactional Replication appeared first on LobsterPot Blogs.

15 Nov 02:05

The IDENTITY_CACHE Option in SQL Server

by Artemakis Artemiou

The IDENTITY_CACHE option is a new database-scoped configuration option in SQL Server 2017 and Azure SQL Database. This option is set to ON by default. The concept of identity caching in SQL Server is that the Database Engine stores in cache a series of values for identity columns and…

The post The IDENTITY_CACHE Option in SQL Server appeared first on SQLNetHub.

15 Nov 02:05

They Did it Again – The Release of SQL Server 2017

by kevin

THEY DID IT AGAIN – THE RELEASE OF SQL SERVER 2017

Not only has the Microsoft SQL Server engineering team added more and stronger leadership, outstanding individual engineers and staff members, but it is also radically moving its entire development process into a “cloud-first” ethos. Now, I’d like to show one of the most salient results of this new development process: accelerated product releases. It was only a few years ago (and is still the case for other data management platforms) that a major new release would take at least 2, but more likely 3, years. With the “production-ready” release of SQL Server 2017 Community Technology Preview 2.0, we are only 1 year from the last production release.

READ MORE…

Originally Posted May 05, 2017

The post They Did it Again – The Release of SQL Server 2017 appeared first on Kevin Kline.

15 Nov 02:03

Why Learn About SQL Server Internals?

by Kalen Delaney
It's not just about performance tuning! https://www.dbbest.com/blog/learn-sql-server-internals/ Please also stop by the DB Best booth if you're at the PASS Summit this week! https://www.eventbrite.com/e/live-qa-sql-pass-2017-with-kalen-delaney-tickets-3930295920...(read more)
15 Nov 02:01

Azure SQL Database Managed Instance

by James Serra

Azure SQL Database Managed Instance is a new flavor of Azure SQL Database that is a game changer.  It offers near-complete SQL Server compatibility and network isolation to easily lift and shift databases to Azure (you can literally backup an on-premises database and restore it into an Azure SQL Database Managed Instance).  Think of it as an enhancement to Azure SQL Database that is built on the same PaaS infrastructure and maintains all its features (i.e. active geo-replication, high availability, automatic backups, database advisor, threat detection, intelligent insights, vulnerability assessment, etc) but adds support for databases up to 35TB, VNET, SQL Agent, cross-database querying, replication, etc.  So, you can migrate your databases from on-prem to Azure with very little migration effort, which is a big improvement over the current Singleton or Elastic Pool flavors, which can require substantial changes.

I have created a presentation about Managed Instance here.  If you are not familiar with Azure SQL Database, first check out my introduction presentation.

Azure SQL Database Managed Instance is in private preview; it will be in public preview this calendar year and generally available next calendar year.

For more details see the presentation at Ignite by Drazen Sumic called “Modernize your on-premises applications with SQL Database Managed Instances (BRK2217)” and this blog post by Lindsey Allen.

There was also a presentation at Ignite called “What’s new with Azure SQL Database: Focus on your business, not on the database (BRK2230)” on the new features in SQL Database (Adaptive Query Processing, SQL Graph, Automatic Tuning, Intelligent Insights, Vulnerability Assessment, Service Endpoint) as well details on Azure Data Sync and an introduction to Managed Instances.

More info:

Native database backup in Azure SQL Managed Instance

Top Questions from New Users of Azure SQL Database

Managed Instances versus Azure SQL Database—What’s the Right Solution for You?

15 Nov 01:58

Adaptive Query Processing in SQL Server 2017

by kevin

ADAPTIVE QUERY PROCESSING IN SQL SERVER 2017

The next release of Microsoft SQL Server, announced as SQL Server 2017, is available for download in CTP2 at http://tinyurl.com/SQLvNextLinux and offers many cool new features. One set of features that I really like to focus on is adaptive query processing. These new features speed processing of workloads, especially those that have historically been the hardest to tune.

READ MORE…

Originally Posted July 05, 2017

The post Adaptive Query Processing in SQL Server 2017 appeared first on Kevin Kline.

15 Nov 01:57

Three years in a row – Microsoft is a leader in the ODBMS Magic Quadrant

by SQL Server Team

We’re happy to report that Gartner has positioned Microsoft in the Leaders Quadrant in the 2017 Magic Quadrant for Operational Database Management Systems again this year. This is the third year that Microsoft has been positioned farthest in completeness of vision and ability to execute in the operational database management systems market.

At Microsoft, we’re dedicated to helping both enterprises and individuals realize their full potential. Our industry position in Operational DBMS is due to the unequaled capabilities of SQL Server.

The release of SQL Server 2017 brings the power of SQL Server to Windows, Linux, and Docker containers for the first time ever. Developers are able to build intelligent applications using preferred languages and environments, while enjoying in-memory performance across workloads, mission-critical high availability, and in-database advanced analytics. You can develop once and deploy anywhere in a consistent experience across your datacenter and public cloud.

SQL Server proves itself, year over year, to be the least vulnerable DBMS in the industry. Built for security from the ground up, SQL Server offers customers a layered protection approach that incorporates encryption, authentication, and monitoring and auditing at the disk, database, and application levels. Innovative security technologies like Always Encrypted, for encryption at rest and in motion, help transform global operations for the better.

Perhaps most noteworthy for organizations living in the real world of cost-benefit analyses, SQL Server 2017 remains one of the most cost-competitive DBMS offerings in the enterprise space. In fact, you can get all the robust business capabilities in SQL Server 2017 built-in to a single product SKU, without expensive add-ons — for one great, low total cost of ownership.

But don’t just take our word for it. We encourage you to take the time to read the full Gartner report.

And then take a moment to see how you can get free licenses when you migrate to SQL Server 2017. We’re confident you’ll find the industry-leading database you know and love — now across operating systems and application platforms, on-premises and in the cloud.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner complete document is available now. 

15 Nov 01:57

Latest news from PASS Summit 2017

by Marco Russo

PASS Summit is the largest conference about the Microsoft Data Platform technology. Started as a Professional Association for SQL Server community, now the PASS brand includes the many technologies included in the Microsoft Data Platform. It is a large and active community, and the PASS Summit is the biggest annual international event, but you will find hundreds of local and virtual events in SQL Saturday, PASS Local Groups and PASS Virtual Groups.

After two days of preconference (I delivered one about Data Modeling with Power BI), Wednesday is the start of the “large” conference. The first keynote was dedicated mainly to SQL Server – even if these days it is hard to understand where the boundary is between on-premises and cloud, relational and non-relational, Windows and Linux.

If you have already seen the news about SQL Server 2017, I still suggest you review the Rohan Kumar keynote. It is mostly a recap of all the new features introduced by this version, but there is some news, also described in a fresh new blog post from Rohan himself. From my point of view, the two most important announcements are:

  • In a few weeks we will have Microsoft SQL Operations Studio, a new lightweight version of SQL Server Management Studio (I know, Microsoft will never use this comparison, to avoid an overlapping marketing message). Why yet another management tool? Because it will run on Windows, Linux, and Mac. Should I add anything else?
  • Improvements in SQL Data Warehouse performance, introducing the new Compute-Optimized Tier (maybe less popular, but very important to those interested in this platform)

For Power BI and Analysis Services I expect some announcements this week (and – why not – a new version of Power BI Desktop?). However, the main session was delivered by Kamal Hathi and many other PMs from the Power BI team.

A nice demo used Power BI with IoT data for a sports team. I worked on this many years ago; in those days, collecting and transferring data for analysis was the bottleneck. Now we literally have the opposite problem: selecting which data to analyze, which is a much more interesting problem to solve.

However, the biggest impact came from showing Power BI navigating over one trillion rows from Spark. Christian Wade put on a show by slicing and dicing one trillion rows from a Spark database tracking locations from social media applications, which included geocoding over time. This data is aggregated by the engine (Power BI? Analysis Services? Well, the engine is almost the same…) down to a certain level, and then, going to the detail level, Analysis Services runs the query against Spark (using DirectQuery? probably…)

I’ve been told that Power BI Premium will have this feature first, but I hope that more details will be revealed in other sessions. I will keep this post updated. There are bigger changes ahead!

15 Nov 01:55

Azure Virtual Machine Developments for SQL Server Usage

by Glenn Berry

Microsoft has a large number of different Azure virtual machine series and sizes available if you are interested in using their infrastructure as a service (IaaS) option for hosting SQL Server instances. One initial decision you will need to make is what series and size of Azure VM you want to use for your particular SQL Server workload.

For on-premises SQL Server usage, you want to do some careful analysis of your workload and the available server models and processor models in order to get a server with an appropriate processor choice to minimize your SQL Server license costs and get the best performance and capacity possible at a given SQL Server license cost. The cost and performance difference between a good processor choice and a bad processor choice can be quite dramatic.

With an Azure VM, you need to do a similar analysis, but for somewhat different reasons. The licensing cost of SQL Server on an Azure VM is directly related to the number of cores in the VM. You want to choose a VM size that matches your expected needs for CPU performance and capacity, memory capacity, and storage performance.

Unlike an on-premise scenario, you are going to have much less control over the actual processor choice in the underlying host machine. With an Azure VM, you pick a particular machine series (such as a GS series) and size in a particular Azure Data Center, and then you get whatever processor Microsoft is using for that series and size in that Azure Data Center. Microsoft currently has 42 different regions that have Azure Data Centers, so you can use the matrix shown on this page to discover what VM series are offered in each region.

Azure Compute Unit (ACU)

One very important concept to understand when comparing and selecting an Azure VM SKU is the Azure Compute Unit (ACU), which lets you compare the compute performance (per core vCPU performance) across different Azure VM SKUs. This ACU measurement is currently standardized on a Small (Standard_A1) VM with a score of 100 and all other Azure SKUs then have ACU scores that represent approximately how much faster that particular Azure SKU can run a standard CPU benchmark. So, for example, a Standard_A1 has an ACU/vCPU score of 100 while a Standard_GS5 has an ACU/vCPU score of 240.

Looking at the ACU score for an Azure VM SKU gives you a decent idea of the single-threaded CPU performance of the processor used in the underlying host machine. It is also important to understand whether that particular processor supports Intel Turbo Boost and/or Intel Hyper-Threading, since not all processors used in the various Azure VM SKUs support these technologies.

You can do this by identifying the actual host CPU in your VM in the CPU page of Performance tab of Windows Task Manager or by using a utility like CPU-Z. Once you have identified the exact processor model in the host, you can use the online Intel ARK Database to get the specific details about that processor.
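If SQL Server is already installed in the VM, a quick sanity check of what the guest actually exposes can also be done from T-SQL. This is a minimal sketch using the sys.dm_os_sys_info DMV; it reports core counts and memory, not the processor model, which still comes from Task Manager or CPU-Z as described above:

-- Minimal sketch: what does this VM expose to the SQL Server instance?
SELECT cpu_count,                               -- logical processors visible to SQL Server
       hyperthread_ratio,                       -- logical processors per physical socket
       physical_memory_kb / 1024 AS memory_mb   -- physical memory (column name valid in SQL Server 2012 and later)
FROM sys.dm_os_sys_info;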

New Azure VM Sizes for SQL Server

A common issue with Azure VM sizing for SQL Server has been the fact that you were often forced to select a VM size that had far more virtual CPU cores than you needed or wanted in order to have enough memory and storage performance to support your workload, which increased your monthly licensing cost.

Luckily, Microsoft has recently made the decision process a little easier for SQL Server with a new series of Azure VMs that are based on particular existing VM sizes (DS, ES, GS, and MS), but reduce the vCPU count to one quarter or one half of the original VM size, while maintaining the same memory, storage, and I/O bandwidth. These new VM sizes have a suffix that specifies the number of active vCPUs, to make them easier to identify.

For example, a Standard_DS14v2 Azure VM would have 16 vCPUs, 112GB of RAM, and support up to 51,200 IOPS or 768MB/sec of sequential throughput (according to Microsoft). A new Standard_DS14-8v2 Azure VM would only have 8 vCPUs, with the same memory capacity and disk performance as the Standard_DS14v2, which would reduce your SQL Server licensing cost per year by 50%. Both of these Azure VM SKUs would have the same ACU score of 160.

One somewhat confusing issue with these new Azure VM SKUs is that the monthly Compute cost (which includes OS licensing) for both of these SKUs would be the same ($989.52/month for this example). The savings come from the reduced monthly SQL Server licensing cost.

Essentially, what Microsoft is doing here is disabling cores in the VM to reduce your SQL Server licensing costs, which is something you are not allowed to do with on-premises SQL Server: even if you disable processor cores in the UEFI/BIOS, you are still required to license those cores for SQL Server.

Benchmarking Your Azure VM with CPU-Z

One very quick CPU benchmark you can run on any machine (physical or virtual, Azure VM or not) is the built-in CPU-Z benchmark, which only takes about 20 seconds to complete. I created a Standard DS14-8_v2 in the South Central US Data Center. This VM was using the Intel Xeon E5-2673 v3 Haswell-EP processor, a 12-core processor with a base clock speed of 2.4GHz and a Turbo clock speed of 3.2GHz, which appears to be a special bespoke processor SKU that is not in the Intel ARK Database. This processor family was introduced in Q3 of 2014, so it has been available for over three years. On this particular VM, I did not see Turbo Boost being used during my testing.

Looking at Figures 1 through 3, you can see the CPU-Z benchmark results for this Azure VM, with Figure 4 being the results for my admittedly fast Dell Precision 5520 laptop. My laptop has the same number of virtual cores as the Standard DS14-8_v2 Azure VM, but I am using a much newer Intel Xeon E3-1505M v6 Kaby Lake processor with a higher base clock speed and better single-threaded performance.

Figure 1: Task Manager from Standard DS14-8_v2 in South Central US

Figure 2: CPU-Z CPU Tab from Standard DS14-8_v2 in South Central US

Figure 3: CPU-Z Bench Tab from Standard DS14-8_v2 in South Central US

Figure 4: CPU-Z Bench Tab from Glenn’s Laptop

The post Azure Virtual Machine Developments for SQL Server Usage appeared first on SQLPerformance.com.

15 Nov 01:54

PASS Summit 2017 Followup

by drsql
Ah, back home in my comfy work chair, feet up, surrounded by the warm glow of computer monitors with the football game on (and at the right time, not 3 hours too early!) It was quite a week; I learned more this year than normal, and picked up a few nuggets I really should have known before, but know now. I will say that there was an interesting feeling this year, like maybe there were fewer people than in the past. It also may have been that I just went to Ignite, and that place was...(read more)
15 Nov 01:53

How would you do dynamic pivot SQL in Amazon Redshift? (This elegant script might be used in many other databases as well)

by Maria Zakourdaev

 

Several days ago I spent a few hours of my life figuring out how to do a dynamic pivot in Amazon Redshift. To tell you the truth, I expected much more from this DBMS's SQL language.

Redshift is based on PostgreSQL 8.0.2 (which was released in 2005!!!!)

Anything you would want for this not-too-difficult task does not exist. No stored procedures. No JSON datatype. No variables outside of UDFs, no queries inside UDFs ("UDF can be used to calculate values but cannot be used to call SQL functions"). Python UDFs also cannot query the data; they can only perform calculations.

Finally, I found one useful function, LISTAGG, that helped me get the distinct values of all the pivoted columns.

For each group in a query, the LISTAGG aggregate function orders the rows for that group according to the ORDER BY expression, then concatenates the values into a single string.

Syntax


LISTAGG( [DISTINCT] aggregate_expression [, 'delimiter' ] ) [ WITHIN GROUP (ORDER BY order_list) ]

As a next step, I used a Python UDF to dynamically build the pivoting query (I love Python more and more, it is such a nice language :-) ). In addition, I used a Common Table Expression, CTE (yes, they have those!), to pass the list of pivoted columns to the Python UDF.

 

The last step, executing the generated SQL query, I am still trying to figure out: there is no dynamic query execution in the Redshift SQL dialect. But my customer can execute this query from the reporting tool.

 

Here is a setup and all queries.

 

Creating a table: 

create table maria_deleteme ( id int, year int, customer varchar, amount int);

 

Loading some test values:

insert into maria_deleteme values
( 12, 2010, 'customer1', 4),
( 2, 2010, 'customer1', 80),
( 3, 2010, 'customer2', 41),
( 4, 2011, 'customer1', 45),
( 5, 2011, 'customer2', 15),
( 6, 2011, 'customer3', 18),
( 7, 2012, 'customer1', 23),
( 8, 2012, 'customer1', 1),
( 9, 2012, 'customer1', 8),
( 10, 2012, 'customer3', 98),
( 11, 2013, 'customer2', 1);


This query gives us a list of pivoted columns:

       select listagg(distinct year, ',') as years
       from maria_deleteme

The result will be: 2011,2013,2012,2010

 

Using CTE over the above query and passing its result to Python UDF:

with vars
       as (
       select listagg(distinct year, ',') as years
       from maria_deleteme
       )
select maria_pivoting ('maria_deleteme', years, 'year', 'customer', 'amount')
from vars;

 

The above query will print out this:

select   sum (case when year = 2010  then amount else 0 end) as "2010" ,
         sum (case when year = 2012  then amount else 0 end) as "2012" ,
         sum (case when year = 2011  then amount else 0 end) as "2011" ,
         sum (case when year = 2013  then amount else 0 end) as "2013" ,
         customer
from maria_deleteme group by customer ;

 

Which results in: 

 #   2010   2012   2011   2013   customer
 1      0     98     18      0   customer3
 2     41      0     15      1   customer2
 3     88     32     45      0   customer1

 

Python UDF: 

DROP FUNCTION maria_pivoting (varchar, varchar, varchar, varchar, varchar);

CREATE FUNCTION maria_pivoting (tablename varchar, list_vals varchar, pivot_col varchar, groupby_col varchar, counted_col varchar)
RETURNS varchar(4000) IMMUTABLE AS $$

       # split the comma-separated list of pivoted values produced by LISTAGG
       vals = list_vals.split(",")
       pivot_query = " select  "

       # build one SUM(CASE ...) column per pivoted value
       for value in vals:
           pivot_query = pivot_query + ' sum (case when {} = {}  then {} else 0 end) as "{}" ,'.format(pivot_col, str(value), counted_col, str(value))

       # add the grouping column, source table, and GROUP BY clause
       pivot_query = pivot_query + ' {} from {} group by {} ;'.format(groupby_col, tablename, groupby_col)

       return pivot_query

$$ LANGUAGE plpythonu;


 

I will be glad to hear what you think about the Redshift SQL language. (Of course, I know that this is a very powerful database for data processing. And VERY expensive.)

Yours, Maria

15 Nov 01:51

Analytics Platform System (APS) AU6 released

by James Serra

Better late than never: The Analytics Platform System (APS), which is a renaming of the Parallel Data Warehouse (PDW), released an appliance update (AU6) about a year ago, and I missed the announcement.  Below is what is new in this release, also called APS 2016.  APS is alive and well and there will be another AU next calendar year:

Microsoft is pleased to announce that the appliance update, Analytics Platform System (APS) 2016, has been released to manufacturing and is now generally available.  APS is Microsoft’s scale-out Massively Parallel Processing fully integrated system for data warehouse specific workloads.

This appliance update builds on the SQL Server 2016 release as a foundation to bring you many value-added features.  APS 2016 offers additional language coverage to support migrations from SQL Server and other platforms.  It also features improved security for hybrid scenarios and the latest security and bug fixes through new firmware and driver updates.

SQL Server 2016

APS 2016 runs on the latest SQL Server 2016 release and now uses the default database compatibility level 130, which can support improved query performance.  SQL Server 2016 allows APS to offer features such as secondary index support for CCI tables and PolyBase Kerberos support.

Transact-SQL

APS 2016 supports a broader set of T-SQL compatibility, including support for wider rows and a large number of rows, VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX).  For greater analysis flexibility, APS supports full window frame syntax for ROWS or RANGE and additional windowing functions like FIRST_VALUE, LAST_VALUE, CUME_DIST, and PERCENT_RANK.  Additional functions like NEWID() and RAND() work with new data type support for UNIQUEIDENTIFIER and NUMERIC.  For the full set of supported T-SQL, please visit the online documentation.
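As a rough illustration of the window frame syntax and windowing functions listed above, here is a minimal T-SQL sketch; the dbo.FactSales table and its columns are hypothetical names used only for this example:

-- Minimal sketch against a hypothetical dbo.FactSales table
SELECT
    CustomerKey,
    OrderDateKey,
    SalesAmount,
    -- explicit ROWS window frame: 3-row moving sum per customer
    SUM(SalesAmount) OVER (
        PARTITION BY CustomerKey
        ORDER BY OrderDateKey
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS MovingSalesAmount,
    -- windowing functions called out in the announcement
    FIRST_VALUE(SalesAmount) OVER (
        PARTITION BY CustomerKey
        ORDER BY OrderDateKey
    ) AS FirstSalesAmount,
    CUME_DIST() OVER (
        PARTITION BY CustomerKey
        ORDER BY SalesAmount
    ) AS SalesCumeDist
FROM dbo.FactSales;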

PolyBase/Hadoop enhancements

PolyBase now supports the latest Hortonworks HDP 2.4 and HDP 2.5.  This appliance update provides enhanced security through Kerberos support via database-scoped credentials and credential support with Azure Storage Blobs for added security across big data analysis.

Install and upgrade enhancements

Hardware architecture updates bring the latest generation processor support (Broadwell), DDR4 DIMMs, and improved DIMM throughput – these will ship with hardware purchased from HPE, Dell or Quanta.  This update offers customers an enhanced upgrade and deployment experience on account of pre-packaging of certain Windows Server updates, hotfixes, and an installer that previously required an on-site download.

APS 2016 also adds Fully Qualified Domain Name support, making it possible to set up a domain trust to the appliance.  It also ships with the latest firmware/driver updates containing security updates and fixes.

Flexibility of choice with Microsoft’s data warehouse portfolio

The latest APS update is an addition to already existing data warehouse portfolio from Microsoft, covering a range of technology and deployment options that help customers get to insights faster.  Customers exploring data warehouse products can also consider SQL Server with Fast Track for Data Warehouse or Azure SQL Data Warehouse, a cloud based fully managed service.

Next Steps

For more details about these features, please visit our online documentation or download the client tools.

15 Nov 01:50

Notes on the 2017 PASS Summit

by SQLAndy

Just a few notes:

  • Weather on Wed was great, ok on Thurs, cold on Fri. Have had worse years!
  • Felt like I saw fewer people this year (that I knew), in part due to arriving Wed after lunch and in part to many of them skipping the event due to Halloween
  • Had a lot of good conversations this year, including a few on forming a 501c(3). Think there is a lot of interest, but it will take a year or two (or three) to see how that evolves.
  • Seemed like more BI content this year. Didn’t count to see if true.
  • Heard a couple complaints about sponsor sessions not being clearly flagged. I see the point – how hard would it be to put “(Sponsored)” in the title? Those who want to go won’t care, but it will allow those who DO care to opt out. Benefits everyone to have this clearly marked.
  • I also spoke with someone attending their second time and they mentioned that they didn’t make the “mistake” of letting vendors scan their badge because of so much spam last time. I think this is a huge problem on both sides and no easy answer that works for both parties. Worth figuring out though.
  • For the first time in a long time I had dinner with co-workers instead of friends. Was time well spent, but it was tough to decide on that vs time with people I only see once a year. No easy answer on that either!
  • Lunch all three days was so-so. Dry chicken twice. Dessert was in the sponsor area – was this new, or did I not notice before? Maybe it was effective at driving people there, curious to hear if so. Not my favorite thing they’ve tried, but we’ll see.
  • I went to Game Night on Thursday. Good attendance, 75-80, and kudos to PASS on providing support for this. Great signage to the room. Kevin Hill took on leading the event and all went smooth, very cool!
  • I didn’t get to attend the Monday networking dinner, but I hear it went very well thanks to Lisa Bohm. She’ll be the lead organizer next year, but I’ll plan to be there as an attendee and help out as needed.
  • I attended the Board Q&A on Thursday. Didn’t ask any questions because – unusually – I had not prepped and didn’t want to just throw out questions without being sure I had all available information. Several questions about SQLSaturday funding/support and the answers were along the lines of we only have so much money, but nothing that went deeper into talking about why it’s an issue now or what other options are being considered (if any). Cynically and perhaps unfairly it feels like HQ spend is not on the table for reduction. There was also a question about the speaker bureau, the answer was ‘no resources, but if you build it..’. Hard to do it all, but still, what would a minimal v1 really cost?
  • Same meeting, they mentioned that building the conference with pre-cons really worked, best numbers ever for pre-cons. Now I can tease Brent that it wasn’t his great marketing!
  • Found on Wed that Cathrine was leaving PASS as an employee. That really hurts. She was a great successor to Karla, they worked hard at knowledge transfer, and now we’ll lose some of that, not to mention the pain/risk of the search for a replacement. I don’t have any issue with Cathrine deciding to make the change; it’s a job where if you’re not 100% happy and engaged the right thing to do is let someone else step in.
  • I heard the SQLSaturday meeting attendance was down. Perhaps due to Halloween? Don’t know. Worrisome to lose engagement there.
  • Overall I thought the event ran fine (as expected). Easy to take that for granted.
  • Summit next year is Nov 7-9. I usually recommend that people purchase at the early bird rate, but I don’t see any options for bundles. PASS should fix that quickly.
  • Those of you on the east coast will get this, preparing to fly out and back I was seriously thinking about what it would take to have an event closer to me just so I didn’t have to travel. I don’t think that’s next year based on other stuff I want to do, but 2019? Maybe.
15 Nov 01:49

The Future of the SQLSaturday Tools

by SQLAndy

Every year when many of the SQLSaturday event leaders meet at the Summit the topic of improvements to the tools comes up. Certainly the site has seen changes  and improvements over the years,  plus at least two new coats of paint (on the public site), but it’s never enough (with software, it never is). Invariably the question that gets asked is ‘why can’t we help?’. It’s a fair question, and perhaps one that deserves a new answer.

Back in 2009 or 2010 we took a first crack at getting volunteers access to the source code and it just failed. Setting up a VPN, credentials, no test suite, no governance, just didn’t work. Keep in mind PASS was not a software shop then and getting good at that stuff takes time and expertise. Maybe it’s been tried since, but my sense is that its been just the HQ Devs making changes.

So if that is all the way on one side of the scale (super closed system), the far other side is  to open source it. Open source is also not simple. If you’re on the PASS Board you have to care about the potential loss of intellectual property. Scoff do you? No, there is no magic in the code, but it’s sweat equity and it’s a substantial part of what drives new members (and new email addresses for existing members) into the mailing list. Do you really want people forking it and spawning variations under different names?

Is there a middle ground? Sure. Let’s put together a straw man of what it might look like:

  • PASS puts the source code into a private Github repo (because all devs love git!) along with a masked data set they can load/restore
  • Write an agreement to get access to the source code and agree to not republish the code, plus convey to PASS the IP rights to new stuff
  • Write the governance process. This is the hardest piece. Who will approve pull requests? Who decides which features get added? Who will do the testing? How often will releases be done (since PASS has to coordinate that)? Code standards. Rules about fonts and logos – all the stuff you deal with any dev shop.
  • Down the road a little build a true dev environment where the latest code can be loaded and tested.

I’m not kidding about governance being hardest. There are changes I want that you don’t care about, fine. But what if I want it to work differently than you do? How do we decide that? How do we decide when to mashup with an external app vs building a feature? I can see having a committee to manage the process, but I’d hope for a lot of opportunity for event leaders to weigh in, perhaps even voting on change before someone invests effort to build/fix something.

We could do all of that and it might or might not work. Could be a huge win, could be we just argue about everything! Even assuming a win, it’s a lot of effort to get it all set up. So…what could we do to test the waters in more agile fashion?

What if we just did the first two steps above, putting the source in Git (might already be there, don’t know) and creating the access agreement doc. That’s what, a couple three days? Then, announce it. You want to contribute, sign up, look at the code, then email someone at HQ about what and how you propose to change something. If they say yes, you build it. If no, you work through the conversation about why. Nothing about the process needs to be secret, so you could just as easily write a blog post that explains the proposed change and get feedback before submitting it for approval.

If it really works, PASS will get a lot of email and a lot of stuff done that will require testing and release, causing perhaps enough pain to merit building out the test environment and a full governance process. If only a few people make changes we’ll have a lightweight process that enables that. If no one does anything, PASS will have spent a few hours that surely won’t end up being a total waste.

Is the risk worth the reward? The Board will have to weigh in on their side. What about us, the ones who want tweaks and twists, can we work through non-trivial conversations about what to do or not do, or must it be anarchy where everyone has to have it their way and nothing gets done? Patience and transparency will be required. That seems possible, but not necessarily easy. There is rarely one right answer.

I also want to point out that this wouldn’t change the need to have devs on staff at HQ. We’ll need them for bigger features, other products, plus testing and release. They will get to essentially lead an open source project. Hard to imagine that they wouldn’t find that prospect exciting!

Fear kills ideas like this one. What if, what if, what if. We deal with that in two ways. One is to honestly consider concerns raised and find ways to mitigate them. The other is to clearly frame this as a trial, a proof of concept. Make sure we have a continued dialog about is working and what is not, and then use an open process to evolve changes to the process.

So, how do we get this done? Or, should we?

I’m looking for input on the approach. If you have a better one, blog it and send me the link. Have a suggestion for a tweak or see a concern not addressed? Post a comment. I’m putting a reminder on my calendar for December 8. I’ll come back to this post then, look at the comments again and any other ideas out there, then I’ll repackage it into a public letter to the Board to be published here on Dec 15, unless there clearly needs to be more debate. If someone else has a better approach, I’ll sell that one.

Grant Fritchey takes over the big chair Jan 1. If we hit the dates above he can have this in time to discuss at the January 2018 Board meeting and, if he thinks it’s worth doing and can address any concerns raised there, we could see this up and running by the end of Q1. Wouldn’t that be an interesting way to start the new year?

 

15 Nov 01:12

Study Reveals Gap Between CEO Expectations for Digital Transformation and IT Readiness

by A.R. Guess

by Angela Guess According to a recent press release, “At a time when CEOs and executive boards are calling for a rapid transformation to digital business models, a worldwide study released by Commvault (NASDAQ: CVLT), the global leader in enterprise backup, recovery, archive and the cloud, and Quadrant Strategies has uncovered an alarming gap between […]

The post Study Reveals Gap Between CEO Expectations for Digital Transformation and IT Readiness appeared first on DATAVERSITY.

15 Nov 01:11

Querying raw data to Tabular

by Marco Russo

This article describes how to extract raw data stored in the Tabular engine, used by Analysis Service Tabular, Power BI, and Power Pivot.

When you look at the content of a table in Power BI, SQL Server Data Tools (SSDT) for Visual Studio, or Power Pivot, the queries sent to the Tabular engine use a syntax that guarantees minimal materialization and fast performance. For that purpose, Power BI uses a DAX syntax whereas Analysis Services and Power Pivot use a special SQL syntax.

DISCLAIMER: The content of this article includes information derived from the analysis of queries sent by the existing tools. It is not documented by Microsoft. All these techniques are subject to change in future releases, and might not be supported by Microsoft.

Querying raw data in DAX

Power BI relies on a DAX query using SELECTCOLUMNS and including all the columns of a table. The undocumented feature is the use of TOPNSKIP to retrieve a number of rows from a table efficiently, skipping a number of rows. Compared to TOPN, the TOPNSKIP function is less flexible, but much faster. This is the syntax of TOPNSKIP:

TOPNSKIP (
    <rows>, -- Number of rows to return
    <skip>, -- Number of rows to skip (0 to start from first row)
    <table> -- Table expression
    [, <orderbyexpression> [,<order>] ] 
)

For example, this is how Power BI queries the Sales table when you go to the Data view. You can try the same query using DAX Studio, and you can monitor what Power BI does by enabling the All Queries trace feature in DAX Studio.

EVALUATE
SELECTCOLUMNS (
    TOPNSKIP ( 1000, 0, 'Sales' ),
    "CustomerCode", [CustomerCode],
    "DiscountAmount", [DiscountAmount],
    "DueDate", [DueDate],
    "OrderDate", [OrderDate],
    "OrderDateKey", [OrderDateKey],
    "OrderQuantity", [OrderQuantity],
    "ProductCode", [ProductCode],
    "ProductKey", [ProductKey],
    "ProductStandardCost", [ProductStandardCost],
    "SalesAmount", [SalesAmount],
    "ShipDate", [ShipDate],
    "TotalProductCost", [TotalProductCost],
    "UnitPrice", [UnitPrice]
)

The above query returns the first 1,000 rows of the table. If you scroll down the table, you will see this other query appearing in the trace just after showing row number 1,001:

EVALUATE
SELECTCOLUMNS (
    TOPNSKIP ( 1000, 1000, 'Sales' ),
    "CustomerCode", [CustomerCode],
    "DiscountAmount", [DiscountAmount],
    "DueDate", [DueDate],
    "OrderDate", [OrderDate],
    "OrderDateKey", [OrderDateKey],
    "OrderQuantity", [OrderQuantity],
    "ProductCode", [ProductCode],
    "ProductKey", [ProductKey],
    "ProductStandardCost", [ProductStandardCost],
    "SalesAmount", [SalesAmount],
    "ShipDate", [ShipDate],
    "TotalProductCost", [TotalProductCost],
    "UnitPrice", [UnitPrice]
)

If you enable the “Sort Descending” setting on the OrderDateKey column of the Sales table, this is the query generated to navigate to the middle of the table:

EVALUATE
SELECTCOLUMNS (
    TOPNSKIP ( 1000, 26000, 'Sales', 'Sales'[OrderDateKey], DESC ),
    "CustomerCode", [CustomerCode],
    "DiscountAmount", [DiscountAmount],
    "DueDate", [DueDate],
    "OrderDate", [OrderDate],
    "OrderDateKey", [OrderDateKey],
    "OrderQuantity", [OrderQuantity],
    "ProductCode", [ProductCode],
    "ProductKey", [ProductKey],
    "ProductStandardCost", [ProductStandardCost],
    "SalesAmount", [SalesAmount],
    "ShipDate", [ShipDate],
    "TotalProductCost", [TotalProductCost],
    "UnitPrice", [UnitPrice]
)

The result of the query includes 1,000 rows after skipping 26,000 rows following the descending sort order defined by the OrderDateKey column. It is not possible to specify more than one column for the sort order. You can apply filters using CALCULATETABLE. This is not supported in Power BI (as of October 2017), but it is available in the syntax. For example, you can get 1,000 rows skipping 2,000 rows from the Sales table ordered by OrderDateKey and filtering only one product using the following query:

EVALUATE
CALCULATETABLE (
    SELECTCOLUMNS (
        TOPNSKIP ( 1000, 2000, 'Sales', 'Sales'[OrderDateKey], ASC ),
        "CustomerCode", [CustomerCode],
        "DiscountAmount", [DiscountAmount],
        "DueDate", [DueDate],
        "OrderDate", [OrderDate],
        "OrderDateKey", [OrderDateKey],
        "OrderQuantity", [OrderQuantity],
        "ProductCode", [ProductCode],
        "ProductKey", [ProductKey],
        "ProductStandardCost", [ProductStandardCost],
        "SalesAmount", [SalesAmount],
        "ShipDate", [ShipDate],
        "TotalProductCost", [TotalProductCost],
        "UnitPrice", [UnitPrice]
    ),
    'Sales'[ProductKey] = 477
)

Querying raw data in SQL

SQL Server Data Tools (SSDT) for Visual Studio and Power Pivot use a different user interface that generates the query to Tabular through a SQL syntax. This syntax is a subset of the SQL syntax that was already allowed in Analysis Services Multidimensional to query the model, provided you set the SqlQueryMode to DataKeys in the connection string.

NOTE: You should include the string SqlQueryMode=DataKeys; in the connection string. You can do this in the Advanced Options of the Connect dialog box in DAX Studio, as you can see in the following screenshot.

For example, the simple syntax to retrieve all the columns of the first 1,000 rows from the Sales table is the following:

SELECT TOP 1000
    [Model].[$Sales].[CustomerCode],
    [Model].[$Sales].[DiscountAmount],
    [Model].[$Sales].[DueDate],
    [Model].[$Sales].[OrderDate],
    [Model].[$Sales].[OrderDateKey],
    [Model].[$Sales].[OrderQuantity],
    [Model].[$Sales].[ProductCode],
    [Model].[$Sales].[ProductKey],
    [Model].[$Sales].[ProductStandardCost],
    [Model].[$Sales].[SalesAmount],
    [Model].[$Sales].[ShipDate],
    [Model].[$Sales].[TotalProductCost],
    [Model].[$Sales].[UnitPrice]
FROM [Model].[$Sales]

The TOP condition specifies how many rows to include in the result. Any reference follows the [Model] identifier. The table name must be prefixed with $ and within square brackets if it includes spaces or special characters. The square brackets are also optional for the column name, and are required only for special names. However, it is a good idea to always use square brackets for table and column names.

The complete syntax is the following – curly braces indicate optional elements, whereas angle brackets are placeholders for arguments/names:

SELECT {SKIP <number_of_rows_to_skip>} {TOP <number_of_rows_to_return>}
    { * | [Model].[$<table_name>].[<column_name>] { , … } }
FROM [Model].[$<table_name>]
ORDER BY [Model].[$<table_name>].[<column_name>] { ASC | DESC }

If you use * instead of a list of columns, you get all the columns of the table, including the special RowNumber column, which identifies the physical position of a row in the table (and is not accessible in DAX).
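For example, a query of this form returns every column of the first ten rows of the Sales table, RowNumber included (the TOP value here is arbitrary):

SELECT TOP 10 *
FROM [Model].[$Sales]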

As you see, RowNumber is a zero-based index. Do not confuse this number with the row number of the result. In fact, if you sort the data you obtain the RowNumber of the physical position in the raw table – which could be different from the sort order of the physical table in the data source because of the arrangements made by the compression algorithm in VertiPaq.

Just to complete the same examples you have already seen in DAX translated to SQL, here is how to skip the first 1,000 rows:

SELECT SKIP 1000 TOP 1000
    [Model].[$Sales].[CustomerCode],
    [Model].[$Sales].[DiscountAmount],
    [Model].[$Sales].[DueDate],
    [Model].[$Sales].[OrderDate],
    [Model].[$Sales].[OrderDateKey],
    [Model].[$Sales].[OrderQuantity],
    [Model].[$Sales].[ProductCode],
    [Model].[$Sales].[ProductKey],
    [Model].[$Sales].[ProductStandardCost],
    [Model].[$Sales].[SalesAmount],
    [Model].[$Sales].[ShipDate],
    [Model].[$Sales].[TotalProductCost],
    [Model].[$Sales].[UnitPrice]
FROM [Model].[$Sales]

The next query skips 26,000 rows and returns 1,000 rows following the descending sort order defined by the OrderDateKey column.  Also in SQL, it is not possible to specify more than one column for the sort order.

SELECT SKIP 26000 TOP 1000
    [Model].[$Sales].[CustomerCode],
    [Model].[$Sales].[DiscountAmount],
    [Model].[$Sales].[DueDate],
    [Model].[$Sales].[OrderDate],
    [Model].[$Sales].[OrderDateKey],
    [Model].[$Sales].[OrderQuantity],
    [Model].[$Sales].[ProductCode],
    [Model].[$Sales].[ProductKey],
    [Model].[$Sales].[ProductStandardCost],
    [Model].[$Sales].[SalesAmount],
    [Model].[$Sales].[ShipDate],
    [Model].[$Sales].[TotalProductCost],
    [Model].[$Sales].[UnitPrice]
FROM [Model].[$Sales]
ORDER BY [Model].[$Sales].[OrderDateKey] DESC

The following query returns 1,000 rows skipping 2,000 rows from the Sales table ordered by OrderDateKey and filtering only one product.

SELECT SKIP 2000 TOP 1000
    [Model].[$Sales].[CustomerCode],
    [Model].[$Sales].[DiscountAmount],
    [Model].[$Sales].[DueDate],
    [Model].[$Sales].[OrderDate],
    [Model].[$Sales].[OrderDateKey],
    [Model].[$Sales].[OrderQuantity],
    [Model].[$Sales].[ProductCode],
    [Model].[$Sales].[ProductKey],
    [Model].[$Sales].[ProductStandardCost],
    [Model].[$Sales].[SalesAmount],
    [Model].[$Sales].[ShipDate],
    [Model].[$Sales].[TotalProductCost],
    [Model].[$Sales].[UnitPrice]
FROM [Model].[$Sales]
WHERE [Model].[$Sales].[ProductKey] = 477
ORDER BY [Model].[$Sales].[OrderDateKey] ASC

Conclusion

Querying raw data stored in a Tabular model can be useful if you create a tool that needs to quickly browse data, similar to what the Power BI and SSDT user interfaces do when you design a data model. Considering that you can apply a filter and a sort order on a column, this could be an alternative way to execute a drillthrough operation on a large table in a standard reporting environment where you control the query generation (such as SQL Server Reporting Services).

15 Nov 01:09

SQL Server on Linux – It’s No Joke

by kevin


Many thought it was an early April Fool’s Day prank, but it was no joke: On March 7, 2016, Microsoft announced the beta release of SQL Server on Linux with the intention of shipping a full release of the product in 2017.

READ MORE…

Originally Posted May 04, 2016

The post SQL Server on Linux – It’s No Joke appeared first on Kevin Kline.

15 Nov 01:09

TSQL Tuesday #96: Folks Who Have Made a Difference

by Paul Randal

It’s been almost three years since I wrote a T-SQL Tuesday post (shame on me!), but this is one I definitely want to contribute to. It’s hosted by Ewald Cress and is about “the opportunity to give a shout-out to people (well-known or otherwise) who have made a meaningful contribution to your life in the world of data.”

There are three people I want to call out, in the order that they came into my life and helped me out.

Firstly, Dave Campbell, who left Microsoft as a Technical Fellow last year after 22 years in Microsoft’s world of data. When I joined the SQL Server team from DEC in 1999, Dave had already been there 5 years and was the Development Lead of the Access Methods team in the Storage Engine. Dave has always been a brilliant engineer, a calm and insightful manager, and a willing mentor. He taught me a lot about engineering, managing crises, and being a manager. I was amazed in late 2003 to find myself becoming the Dev Lead of the Access Methods team and stepping into his shoes.

I’m sad to say that over the years I’ve lost touch with Dave, but I’m forever grateful for the influence he had on my professional career.

Secondly, my great, great friend Bob Ward. I first met Bob a few months into my tenure at Microsoft and continued to meet and swap emails around Product Support matters but I didn’t start working closely with him until a few years later. Bob was the inspiration for me to want to help customers: to help them find why SQL Server was broken for them, to fix bugs, and to make sure that people in Product Support were saying and doing the right thing for customers. He inspired me because that was his passion, and his entire job. We’d spend many hours on the phone each week and through emails discussing things and sorting stuff out. This led me to champion adding an entire pillar to the new engineering process that came 2/3 through SQL Server 2005 development: supportability, making sure all facets of the SQL Server box could be understood and debugged by Product Support. This involved driving and coordinating all development teams to build and deliver training materials on how SQL Server worked, how to debug it, and how Product Support should approach it AND build into each area the tools, messages, and hooks to allow such investigations to be done.

Bob and I (and Bob’s lovely wife Ginger, plus Kimberly) continue to be close friends and we get together whenever we can (which is a lot more frequently now that Bob’s in the product group and up in Redmond regularly). Of all the people I met at Microsoft, Bob made the greatest contribution to who I am today by inspiring me to help people.

Thirdly, my wonderful wife Kimberly, who helped me develop my speaking skills and made me ‘less Paul’, as she puts it (learning humility, presenting with empathy, and removing a lot of the arrogance with which I left Microsoft). I’d just started presenting when I met Kimberly at TechEd 2006 in Boston and I had a *lot* to learn. I quickly adopted her style of presenting, which works for me. This involves going against one of the central things people are taught about presenting – few bullets with few words. We both (and all of SQLskills) have dense slides with lots of bullets. This is so that people can read the deck and tell what we’re talking about, rather than having pictures of kittens, trees, race-cars, whatever, which tell you nothing several months later. Some of you will disagree – each to their own. The central theme though is making sure that people have learned and understand how and why things are, not just what the answer is.

The other thing (among so many in my life since meeting her) that I want to thank Kimberly for here is for SQLskills. Kimberly’s been a successful business owner since the early 1990s and since she started SQLskills.com in 1995. It was incredibly cool that I could leave Microsoft in 2007 and walk straight into a thriving business with a stellar reputation and start teaching and consulting.

You’ll notice that I didn’t say ‘lastly’ above – I said ‘thirdly’. There are two more groups of people I want to give a shout out to.

Firstly, the incredibly-talented group that work with us at SQLskills (Erin, Glenn, Jon, Tim, and previously Joe Sack – another great friend). I continually learn new things from them and I’m sincerely thankful that they chose to work at SQLskills for so long (Jon for 6+ years, Erin and Glenn for 5+ years, and Tim for almost 3 years). They’re all experts in their specialties and immensely capable people, who keep me on my toes and who are all wonderful people and friends.

Lastly, and most importantly, the people who’ve had the most influence in my data world are the SQL Server community; my fellow MVPs, all the MCM community, everyone who’s come to a class, attended a session, read a blog post or article, watched a Pluralsight course, posted a question, or tweeted on #sqlhelp. A huge part of my personality is helping people understand SQL Server. It’s what drives me to blog, to answer random email questions, put together the waits library, teach, and more.

You’ve all helped shape me into the person I am today in the data world, and I thank you sincerely for it.

 

The post TSQL Tuesday #96: Folks Who Have Made a Difference appeared first on Paul S. Randal.