Shared posts

04 Dec 19:02

The Definitive Guide to DAX is now available! #dax #powerpivot #powerbi #ssas #tabular

by Marco Russo (SQLBI)

I am so happy to announce that The Definitive Guide to DAX is finally available!

Cover The Definitive Guide to DAXI and Alberto Ferrari spent one year writing this book, and several years collecting the knowledge necessary to do that. The complete title is The Definitive Guide to DAX: Business intelligence with Microsoft Excel, SQL Server Analysis Services, and Power BI. You can imagine why we like to shorten it! However, the complete title gives you an important hint: this book cover the new DAX syntax of Excel 2016, Power BI Desktop, and Analysis Services 2016. For example, we covered all table functions useful for calculated tables, which is a feature released in Power BI Desktop after we completed the book writing. This has been an additional challenge, but our goal was to publish a book dedicated to the DAX language, independent from the product and completely up-to-date.

But everything has a cost. It took us a huge amount of time to reach the depth and completeness we wanted in this book. And it will took you weeks if not months to read it cover-to-cover. Yes, I know, you no longer read technical books in this way. You open it at the right chapter and you get the content you need, you copy the pattern, you get the good hint. I do that at least once a week. But you will be able to use this book in that way once you have a solid understanding of DAX. At the beginning, my suggestion is to start from chapter 1, even if you are an experienced DAX developer.

What if you are a DAX beginner? This book will be your guide, but you might consider a more introductive book to start (you can find other books from us and from Rob Collie, depending on the product you use and the writing style you prefer). This is particularly important because we don’t spend a line in the book discussing about user interface. We wrote a book about the DAX language, so you have to know in advance the UI of a product that use this language. Today, the list ranges from Excel (2010/2013/2016), Analysis Services (2012/2014/2016), and Power BI Desktop.

Why am I so excited about this book? After all, I wrote many books (this should be the 10th in English, and I wrote other three books in Italian). Well, first of all, after a few months after completing the writing, I and Alberto would not add or modify anything in this book. As you will read in the introduction, we made no compromises. We thought the size would have been 450-500 pages, but the result is 530 pages of content (plus indexes, table of contents, and so on). Is it the perfect book? No, I am pretty sure we will discover some error and something to clarify and to fix. It always happens. But we set the bar very high this time, and we are very satisfied about the final result. Only reviews will tell us if our perception I right, but we know this is the best result possible today. We had technical reviewers that helped us so much in getting the point of view of the reader, and I would like to mention the incredible job made by Gerhard Brueckl. Believe me, if you wrote a technical book, your worst nightmare is the technical reviewer that review too much, so that you spend more time explaining why you were right instead of fixing the content. Well, Gerhard had the skills and the ability to highlight the right thing. Thanks Gerhard, you deserve a public mention!

After this self-celebration, let me spend some paragraph about the content. We use this book as companion content for our courses Mastering DAX *and* Optimizing DAX. During the courses we have hands-on-labs and a lot of interactions, but we constantly refer to the book to get more detailed information about specific functions and behaviors. Thus, if you attend these courses, you will find it easier to read the book. But you will not be able to skip it! Here is the table of contents, with some comments:

  • Foreword: three of the authors of the DAX language and the VertiPaq engine wrote the foreword of our book: Marius Dumitru, Cristian Petculescu, and Jeffrey Wang.
  • Introduction: read the introduction before buying the book. You will understand if it is the book of you or not.
  • Chapter 1: What is DAX?
  • Chapter 2: Introducing DAX
  • Chapter 3: Using basic table functions
  • Chapter 4: Understanding evaluation contexts
  • Chapter 5: Understanding CALCULATE and CALCULATETABLE
  • Chapter 6: DAX examples
  • Chapter 7: Time intelligence calculations
  • Chapter 8: Statistical functions
  • Chapter 9: Advanced table functions
  • Chapter 10: Advanced evaluation context
  • Chapter 11: Handling hierarchies
  • Chapter 12: Advanced relationships
  • Chapter 13: The VertiPaq engine
  • Chapter 14: Optimizing data models
  • Chapter 15: Analyzing DAX query plans
  • Chapter 16: Optimizing DAX

Topics in chapters 1 to 12 are covered in our Mastering DAX workshop. We organized the content so that you can read them one after the other. The content is very dense, at the beginning we use simpler examples, but we never repeat the same concepts, so if you skip one chapter you might miss some knowledge to fully understand the following topics. Even in chapter 6, which tries to consolidate previous content with practical examples, you will find something new in terms of ways you can use DAX.

Topics in chapters 13 to 16 are covered in our Optimizing DAX workshop. Please, don’t jump to this part if you didn’t read the previous chapters before. Also for attendees of the course, we suggest to complete the self-assessment for prerequisites to attend the course, and you can try to do the same for the book. If you are not ready, you will simply see a huge amount of numbers, without understanding how to connect the dots. You need a solid and deep knowledge of how evaluation context works in DAX before doing any optimization.

My personal estimate is that if you dedicate one week to every chapter, you will be able to complete the learning in 4 months. Read the book, absorb the content, make practice. You might be faster at the beginning if you already know DAX. But be careful, you never read anywhere what we describe in chapter 10 (we rewrote that chapter 3 times… but this is another story), and this is of paramount importance to really “understand” DAX. You hardly have seen the complete description of all DAX table functions in chapter 9. You will not find an extensive use of variables, but the VAR / RETURN syntax is described early in the book and you will see this used more and more with the advent of Excel 2016 / Power BI Desktop / SSAS 2016.

Finally, the goal of the book is not to give you patterns and best practices, but to teach you how DAX works, how to write good code, and how to measure the performance, find the bottlenecks and possibly optimize it. As I always say, do not trust any best practice when it comes to DAX optimization. Don’t trust blogs, articles, books. Don’t trust my writings, too. Simply, measure and evaluate case by case. And the reason is the first answer to any question that the consultant receive: it depends!

If you want to order the book on Amazon, here is a quick reference to links in all the available versions of this site:

Have a nice reading!

04 Dec 19:02

Of clustered indexes and ordering

by Gail

There is a particularly irritating and persistent belief that indexes (usually it’s the clustered that gets picked on) are always physically ordered within the data file by the key columns. That is, that the data within the database file is always ordered by the key column.

It doesn’t help that official documentation states this ‘fact’.

I’m going to diverge from my usual methodology of first proving (or disproving) a statement and then explaining it in this case.

Do indexes (clustered or non-clustered) define the physical storage order of the rows?

No, absolutely not.

What indexes do is provide a logical ordering, a collection of pointers, that allow the storage engine to retrieve data from an index ordered by the index key, but that’s logical ordering, it specified nothing regarding the physical ordering.

The index structure is such that the page with key values 4, 5 and 6 will appear earlier in the index’s logical ordering than the page with key values 10,11 and 12. Where these pages are in the file is not defined at all. The page with key values 10,11 and 12 could be page 240 in the database file while the page with key values 4, 5 and 6 could be page 655.

On the data pages themselves there’s no guarantee that the row with the key value 4 will appear earlier on the page than the row with the key value of 6. 6 could be the first row on the page and 4 last and that would be just fine.

Let’s prove this. Time for DBCC page and some undocumented commands.

First up, the order of rows on the page. I’m going to create a table in a nice new database (so that there are no other tables around messing things up) and populate it with some data.

CREATE TABLE OddandEven (
SomeNumber INT,
Filler CHAR(500) DEFAULT ' '
) ;
GO

CREATE UNIQUE CLUSTERED INDEX idx_SomeNumber ON OddandEven (SomeNumber);
GO

INSERT INTO OddandEven (SomeNumber)
SELECT TOP (50) (ROW_NUMBER() OVER (ORDER BY object_id))*2 - 1 FROM sys.objects;

INSERT INTO OddandEven (SomeNumber)
SELECT TOP (50) (ROW_NUMBER() OVER (ORDER BY object_id))*2 FROM sys.objects;

So what I’m doing there is simply inserting 50 odd numbers first and 50 even numbers second

A quick check with DBCC IND shows me that page 89 of this database is a data page for this table. I’m going to use dump style 2 for DBCC Page, because I want a raw binary dump with no interpretation (I’m removing the portions that are just the filler, as that’s just intentionally wasted space)

000000000EB6AC50:   20020000 1000fb01 37332020 20202020 † …..û.73
000000000EB6AE40:   20202020 20202020 20202020 20202002 †               .
000000000EB6AE50:   00001000 fb013735 20202020 20202020 †….û.75
000000000EB6AE60:   20202020 20202020 20202020 20202020 †
000000000EB6B040:   20202020 20202020 20202020 20020000 †             …
000000000EB6B050:   1000fb01 36342020 20202020 20202020 †..û.64 
000000000EB6B060:   20202020 20202020 20202020 20202020 †
000000000EB6B240:   20202020 20202020 20202002 00001000 †           …..
000000000EB6B250:   fb013636 20202020 20202020 20202020 †û.66
000000000EB6B260:   20202020 20202020 20202020 20202020 †

Hmm… 73, 75, 64, 66. That’s not the correct physical ordering… What happened here is that I inserted the odd values first, they were written to the pages then when I wrote the even numbers the pages had to split (firstly) leaving them probably around 50% full, then the even numbers were added in the empty space. SQL doesn’t reorder the rows on the page (that would be expensive).

What keeps track of the logical ordering, what rows should be read first, second, etc. to get the results back in logical ordering, is the slot array at the end of the page

OFFSET TABLE:
Row - Offset
 14 (0xe) - 7236 (0x1c44)
 13 (0xd) - 3666 (0xe52)
 12 (0xc) - 6726 (0x1a46)
 11 (0xb) - 3156 (0xc54)
 10 (0xa) - 6216 (0x1848)
 9 (0x9) - 2646 (0xa56)
 8 (0x8) - 5706 (0x164a)
 7 (0x7) - 2136 (0x858)
 6 (0x6) - 1626 (0x65a)
 5 (0x5) - 5196 (0x144c)
 4 (0x4) - 1116 (0x45c)
 3 (0x3) - 4686 (0x124e)
 2 (0x2) - 606 (0x25e)
 1 (0x1) - 4176 (0x1050)
 0 (0x0) - 96 (0x60)

That tells me that the row with the lowest key value is found at offset 0x60, the next lowest at offset 0x1050, then 0x25e, etc. The rows are not stored on this page in physical order, the slot array defines the logical order so that anything needing the rows in logical order of the index, can read them off the page that way.

That answers the question about rows on a page. Let’s now look at whether pages are always stored in physical order within the data file.

I’m going to drop the OddandEven table and create a new table with the rows sized so that only a few rows fit onto a page.

CREATE TABLE PagePhysicalOrder (
  SomeNumber INT,
  Filler CHAR(800) DEFAULT ' '
);

CREATE UNIQUE CLUSTERED INDEX idx_TestingPhysicalOrder ON PagePhysicalOrder (SomeNumber)

DECLARE @i INT = 9;
WHILE @i >= 0
  BEGIN
    INSERT INTO dbo.PagePhysicalOrder (SomeNumber, Filler)
    SELECT TOP (10)
      ROW_NUMBER() OVER (ORDER BY (SELECT 1)) +@i*10,''
      FROM sys.objects;

    SET @i = @i - 1;
  END

That gets me 100 rows in the table, written in groups of 10, with the higher values for SomeNumber being inserted first. Now, to find where the rows are stored, I’m going to use the sys.fn_PhysLocFormatter function and the %%physloc%% virtual column. See http://www.sqlskills.com/blogs/paul/sql-server-2008-new-undocumented-physical-row-locator-function/ for more details on these.

SELECT SomeNumber,
sys.fn_PhysLocFormatter(%%physloc%%) AS RowLocation
FROM dbo.PagePhysicalOrder

RowPhysicalLocations

The output of the PhysLocFormatter is FileID : Page Number : Slot Index. The output shows the rows with SomeNumber 75, 76, 77 and a few others are on page 197 while rows with a lower SomeNumber (65-70) are on page 248, further into the data file than the page containing the larger values of SomeNumber.

Hence we can say that the clustered index doesn’t enforce the physical order of the pages in the data file either.

The only thing that the clustered index (or nonclustered indexes) enforce is what values belong on a page together. If we have a table with an index on an integer column, we cannot have a situation where rows with a key value of 1, 2, 4, 8, 9 are on one page and rows with a key value of 3, 5, 6, 7 and 10 are on another. If only 5 rows fit onto a page, one page will have 1, 2, 3, 4 and 5 and another page will have 6, 7, 8, 9 and 10.  The physical order of the rows on those pages is irrelevant, as is the physical order of those two pages in the data file.

I suspect this myth came about because, when SQL creates or rebuilds an index, it will try as best as possible to put the pages of the index down in physical order of the index key. Doing so reduces logical fragmentation and allows the read-ahead reads to work as efficiently as possible. This applies only when the index is created, rebuilt or reorganised, not during regular operations.

04 Dec 19:02

Different Types of NoSQL Databases and When to Use Them

by A.R. Guess

by Angela Guess Jim Scott recently wrote for Smart Data Collective, “There are four main types of NoSQL databases, plus one type of “database” that should also be considered in the mix. In this blog post, I’ll provide a brief description of these types of NoSQL databases and when they can be used. (1) Key-value.…

The post Different Types of NoSQL Databases and When to Use Them appeared first on DATAVERSITY.

04 Dec 18:59

The Role of Microsoft’s Data Platform: Recognized as a Leader in Gartner Magic Quadrant

by David Hobbs-Mallyon

Lately, a lot of industry discussion has focused on the growing importance of IT innovation and the role of data as organizations move to position themselves for today’s cloud-first, mobile-first world. For example, in a recent article, Margaret Rouse said, “Innovation has become a major factor in determining modern enterprises' longevity and success.” And for many organizations, data is the driver of innovation, as previously discussed in this series.

Taking this idea further, reporting on an MIT conference about the role of the CIO, Nicole Laskowski emphasized, “Business analysts aren't just hungry for data; they're starved…. And if organizations are to become data driven, business users need access to relevant, quality data.” To satisfy that hunger, according to Nicholas D. Evans, CIOs needs to focus on “core IT initiatives such as establishing a mission-critical infrastructure with increased agility, flexibility, manageability and security.”

Providing just such mission-critical infrastructure that enables innovation has been the longstanding commitment in Microsoft’s comprehensive vision for its data platform: From record-setting performance benchmarks; to flexibility in deployment on premises, in the cloud, or in hybrid configurations; to vast scalability (both on premises and in the cloud); to big data; to data warehousing and business intelligence (BI) and analytics; to unwavering dedication to security, Microsoft ensures that customers have the range of mission-critical capabilities necessary for business innovation. In fact, acknowledgement of Microsoft’s vision and ability to execute recently came in the form of Gartner’s Magic Quadrant for Operational Database Management Systems (ODBMS), which once again named Microsoft a Leader and indeed also placed positioning Microsoft furthest to the right on the axis for completeness of vision, and highest on the axis for ability to execute*.

The question is how to take advantage of this comprehensive data platform to further innovation and grow the business. In his article, "The 5 critical pillars of innovation management capability," Nicholas D. Evans answered, “A key consideration … is to continually innovate around the use of the technologies themselves and incorporate newer elements such as … advanced analytics to further fuel innovation activities and lend additional insights.”

This example illustrates both why innovation is such a hot topic in the industry and how technology can facilitate innovation: As data becomes increasingly vital to the success of business, organizations need to look for technologies that enable them to innovate as well as deliver new ways to use that technology. Gartner has positioned Microsoft as a leader in ODBMS, with acknowledged vision and the ability to execute on that vision. Not only that, but in addition, Gartner has also positioned Microsoft as a leader in the Magic Quadrant for Business Intelligence and Analytics Platforms and as a leader in the Magic Quadrant for Data Warehouse Database Management Systems. All of this recognition underscores Microsoft’s commitment to providing a comprehensive and leading-edge data platform that provides the basis for innovation.

Attend the webinar

Many of the topics covered in this blog are discussed in our upcoming webinar "Tackle the Top Five Data Challenges with SQL Server," which you can attend live on Thursday, October 29, 2015 at 10:00-10:45AM PST (GMT -7). You’ll learn how the Microsoft data platform’s unified approach can give you a comprehensive way to address the top five data management challenges.

Join speakers, Ramnik Gulati , Director of Product Marketing, Data Platform & IoT Marketing and David Hobbs-Mallyon, Senior Product Marketing Manager, Cloud + Enterprise Integrated Marketing, for the first in a series of free 30-minute webinars, Tackle the top five data challenges with Microsoft SQL Server, on Thursday, October 29, 2015 at 10:00-10:45AM PST (GMT -7). Register here today.

* Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

04 Dec 18:59

Managing the errorlog file

by TiborKaraszi
I frequently see recommendations to regularly run sp_cycle_errorlog, so that the errorlog doesn't become huge. My main concern with that is that the errorlog contains valuable information . When I do a health check on a SQL Server machine, I want a few months worth of errorlog information available. I typically use my own scripts for this, available here . Almost every time I go through the errorlog, I find valuable information. Some things you address, like find whatever it is that is attempting...(read more)
04 Dec 18:59

Did You Know: The Times They are A-Changin’

by Kalen Delaney
You know, when Microsoft announces ‘deprecation’ of a feature, there is no guarantee that the feature will be gone any time ‘soon’. All that an official notice of deprecation means is that “This feature will be removed in a future version of Microsoft SQL Server.” But it could be a far future version. SQL Trace, for example, was announced as deprecated in SQL Server 2008, when Extended Events first made their appearance, but SQL Trace, and it’s trusty sidekick SQL Profiler, are still around, even...(read more)
04 Dec 18:58

Miscellaneous Database Design Errors

by Laura Burke

by Michael Blaha This is the third and final blog in a series about database design errors. The two previous blogs addressed primary key and foreign key errors as well as confusion with many-to-many relationships. Now let’s discuss several other design errors. Our coverage is clearly not complete, but these are common mistakes that are…

The post Miscellaneous Database Design Errors appeared first on DATAVERSITY.

04 Dec 18:58

Service Broker Enhancements in SQL Server 2016

by Aaron Bertrand

In advance of this week's PASS Summit, and surely dozens and dozens of announcements around SQL Server 2016, I thought I would share a tidbit of a feature that's been hidden in the CTPs for some time now, but that Microsoft hasn't had a chance to publicize: Additional maintenance operations available for Service Broker queues.

Remus Rusanu (@rusanu) discussed the problems that fragmentation at high volume can cause for queues in this post:

There, he revealed that you could actually use DBCC REINDEX against the internal table, but you had to determine the internal table name, and connect via the DAC. Not exactly convenient.

Now, almost six years later, if you believe you are experiencing fragmentation problems due to high load, you can force index REORGANIZE or REBUILD operations against the queue's internal table by referencing the queue directly:

ALTER QUEUE dbo.myQueue REORGANIZE;
-- or
ALTER QUEUE dbo.myQueue REBUILD;

How do you know how much fragmentation you have in a queue? Well, queues have been added as a permissible object to pass to sys.dm_db_index_physical_stats, too:

SELECT * FROM sys.dm_db_index_physical_stats
  (
    DB_ID(), 
    OBJECT_ID(N'dbo.QueryNotificationErrorsQueue'), 
    -1, 0, 'SAMPLED'
  );

And you get similar output to the same interrogation of a regular table.

Additionally, you can move the queue to another filegroup; this operation will rebuild the internal queue table and all of its indexes on the new filegroup:

ALTER QUEUE dbo.myQueue MOVE TO [MY_FILEGROUP];

These new capabilities should allow for greater scalability of Service Broker solutions.

The post Service Broker Enhancements in SQL Server 2016 appeared first on SQLPerformance.com.

04 Dec 18:58

Azure Key Vault Integration for SQL Server in Azure VMs

by SQL Server Team

Earlier this year we announced a service called Azure Key Vault (AKV), a cloud-hosted Hardware Security Module (HSM) backed service for managing cryptographic keys and other secrets used in your cloud application. This summer, that service became generally available. Around the same time, the SQL Server Connector was also released (available on the Microsoft Download Center). This connector enables SQL Server encryption to use the Azure Key Vault as an Extensible Key Management (EKM) module for more secure key protection. Today we are announcing the Azure Key Vault Integration feature that simplifies the process of setting up your SQL Server VM to take advantage of AKV.

Azure Key Vault provides the convenience of managing your cryptographic keys within one service that is secure and highly available. This provides an alternative to managing the storage, protection, and sharing of those keys yourself. The SQL Server Connector allows your SQL Server machine to connect to Azure Key Vault in order to access the cryptographic keys that are protecting your encrypted databases or backups.

If you are unfamiliar with Azure Key Vault, HSMs, or SQL Server encryption, the links above are a good place to start.

Configuring Your Machine for AKV

Before using Azure Key Vault, there are several steps to go through to configure your SQL Server machine. You need to download and install the SQL Server Connector, configure the EKM provider, and create the credential that allows you to connect to your key vault from your machine. If you need to do this for multiple machines, the time quickly adds up.

To save time, we have created the Azure Key Vault Integration feature that automates this entire setup for your SQL Server Virtual Machines in Azure. This feature, when enabled, automatically configures your SQL VM so that you can access the keys in your key vault and use them to encrypt your databases and backups. This is a one-time setup, providing you the option to access keys in your key vault at any time from your VM. All you need to do is execute a few PowerShell cmdlets.

Note: These steps are provided in this blog post for convenience. However, when there are updates to the feature, the instructions may change. Please refer to the product documentation page for the most current outline of the steps to use this feature if you are reading this sometime after the post date of this blog.

Prerequisites for Using this Feature

Before taking advantage of this feature, you will need the following:

You will then need to create a key vault and register an application with AAD that has access to your key vault. The provided links include detailed steps for how to accomplish these tasks. While completing these steps, there are some pieces of information you will need to gather which will be required parameters for AKV Integration.

When registering an application with AAD, you will need to save the Service Principal name and secret as you go through the steps. The Service Principal name will be the client ID and the Service Principal secret will be the key that is created.

Service Principal name:


Service Principal secret: 

When creating the key vault, you will need to save the key vault URL.

e.g. if you named your key vault ContosoKeyVault, your key vault URL would be https://contosokeyvault.vault.azure.net/

You will then need to authorize the application to access the key in the vault by using the Set-AzureKeyVaultAccessPolicy cmdlet.

Automating AKV Integration with SQL Server Virtual Machines

Once you have completed these steps, you will be able to enable this feature via Microsoft Azure PowerShell. The PowerShell cmdlet to enable this feature requires some parameters. Several of the parameters you collected while completing the steps above.

A list of the parameters is provided below with examples of how to use the parameters in PowerShell. You can change each of these strings to match your resources.

1. The URL of your key vault

$akvURL = “https://contosokeyvault.vault.azure.net/”


2. Service Principal name

$spName = “fde2b411-33d5-4e11-af04eb07b669ccf2”


3. Service Principal secret

$spSecret = “9VTJSQwzlFepD8XODnzy8n2V01Jd8dAjwm/azF1XDKM=”

4. Credential name: This feature will create a credential within SQL Server, allowing the VM to have access to the key vault. You specify what you would like the name of this credential to be.

$credName = “mycred1”

With these parameters defined, you simply execute the cmdlets, as shown in the example below.

$vmName = “yourvmname”

$serviceName = “yourservicename”

$secureakv =  $spSecret | ConvertTo-SecureString -AsPlainText -Force

$akvs = New-AzureVMSqlServerKeyVaultCredentialConfig –Enable –CredentialName $credname –AzureKeyVaultUrl $akvURL –ServicePrincipalName $spName –ServicePrincipalSecret $secureakv

Get-AzureVM –ServiceName $serviceName –Name $vmName | Set-AzureVMSqlServerExtension –KeyVaultCredentialSettings $akvs | Update-AzureVM

The cmdlet New-AzureVMSqlServerKeyVaultCredentialConfig captures the parameters for this feature. Then, Set-AzureVMSqlServerExtension sends the settings to the SQL IaaS Agent extension within your VM in order to enable the feature. The VM is then updated with this change.

After these cmdlets have completed running, the AKV Integration feature will have installed the latest version of the SQL Server Connector, configured the EKM provider to use Azure Key Vault, and created the credential to access your key vault. Now, you have saved yourself an hour! You can then start enabling encryption for your databases or backups as you normally would using the keys in your key vault.

Enabling Encryption after Setup Completes

In order to begin encrypting your databases or backups, you will need to create an asymmetric key in your key vault, and a symmetric key in SQL Server. Then, you will be able to enable transparent data encryption (TDE), encrypted backups, or column level encryption (CLE) for any database in your SQL VM using T-SQL statements, as shown here.

See the documentation page for more detailed steps to enable this feature and enabling encryption for your databases and backups.

If you are interested in this feature and do not have an Azure Subscription, you can try it out today.

04 Dec 18:57

Harnessing the Potential of Big Data in Healthcare

by A.R. Guess

by Angela Guess Rachel Delacour recently wrote in Datanami, “As the world’s population is living longer, treatment delivery models need to quickly evolve, and many of these changes are being driven by data. The focus now is on understanding as much about a patient as possible, hopefully picking up warning signs of serious illness at…

The post Harnessing the Potential of Big Data in Healthcare appeared first on DATAVERSITY.

04 Dec 18:57

Why ‘Relational Databases Don’t Handle Big Data Well’

by A.R. Guess

by Angela Guess Chris Preimesberger recently opined for eWeek, “Enterprises are confronting the reality of big, fast, varied and changing data. It's no longer about managing a small number of systems, but rather hundreds of systems and petabytes of data. Smart companies know that managing large volumes of structured and unstructured data, known as big…

The post Why ‘Relational Databases Don’t Handle Big Data Well’ appeared first on DATAVERSITY.

04 Dec 18:57

Foundation Session: Developing Modern Applications on Azure

by SQL Server Team

Heads-up! On October 28, there will be a session at this year’s PASS Summit presenting the new ways Azure SQL Database can be used alongside DocumentDB, Azure Search, HDInsight, and Machine Learning to create a sophisticated, powerful application.

In collaboration with his team, presenting will be General Manager Shawn Bice. After nearly 10 years at Microsoft, Shawn currently oversees all Database Systems Group, with responsibility for SQL Server RDBMS, Azure SQL Database, Azure SQL Data Warehouse, Azure SQL Virtual Machines, Analytics Platform System (APS), including guiding the vision, strategy, engineering and market delivery.

In his presentation, Shawn will elaborate on Azure’s astute portfolio of data services for modern cloud application development. Azure offers functional capabilities for relational, non-relational, NoSQL, real-time streaming, and search workloads. This series will help you learn how to combine Azure SQL Database, DocumentDB, and Azure Search to ingest store documents, social data, and structured transaction data to build application with different data needs.

Additionally, there will be a fun giveaway to the first 500 who attend. Don’t miss out on this opportunity to learn the beneficial and updated realms of Azure SQL Database!

04 Dec 17:27

PASS Summit 2015 Keynote 1

by Rob Farley

I’m back at the PASS Summit. Another year – my sixth now. And I’m sitting at the bloggers’ table, next to Mark Broadbent (@retracement).

The PASS Summit is by far the best SQL Server event in the world each year – even better than the Adelaide SQL Server User Group sessions, and the first keynote is always full of announcements.

It always starts with a bunch of information about how the reach of the SQL community has grown over the years, and it’s an amazing thing to see the impact that the community has these days. Tom La Rock has given the announcements about this, as the current PASS President.

Joseph Sirosh is up now. He’s the newly appointed Corporate VP for the Data Group now. He’s speaking to the changing face of data. I feel like this is a regular spot at keynotes – data has been changing so fast for years now, and the things that people are using data for becomes more and more impressive. Last year there it was about analysing shopping patterns for people who move through a department store – this year it’s looking at huge quantities of medical information to predict current and future medical conditions. This lets people save lives with data now, because early intervention becomes even more possible.

Eric Flesichman is a Chief Architect and is a VP in Platform Engineering at DocuSign. He’s talking about how SQL Server was the right fit for them, and this fits in with what I hear at customers too. Microsoft is the leader in both Ability to Execute and Completeness of Vision in the latest the Magic Quadrant by Gartner (as has just been pointed out by Joseph), and people are finding that SQL Server continues to become the sensible choice for even the largest of organisations.

Shawn Bice up now – General Manager of the Database Systems Group. He’s showing the new features of SQL Server 2016, and pointing out it’s all built-in. It’s not about Add-Ons – everything is part of the product. He talks about how it’s the leader in Mission Critical, the least vulnerable, the highest performing, cheapest cost for BI, and the Advanced Analytics that has come with R’s integration within the platform. R is the biggest language amongst data scientists, whether they are solving problems in space, or marketing, or wherever. To make SQL Server the most significant data platform for data scientists, this R integration is critical.

HA and DR has been improved with better algorithms for data transfer. I see this as incredibly important. Every improvement in compression and parallelism is an improvement in moving data around, whether between servers within an on-prem system, a hybrid system, or pure Azure.

PolyBase comes into SQL Server 2016 to let people use T-SQL over Hadoop. I’m used to having PolyBase through APS, and the idea of being able to hook into Hadoop data stores from “regular SQL” provides numerous opportunities. This is going to lower the barrier for people who want to leverage Hadoop into their current environments. This is really exciting.

The columnstore improvements in SQL 2014 meant that columnstore data could be updated, but in SQL 2016 we get updateable non-clustered columnstore indexes. For people who redesigned tables to leverage columnstore, or rather, saw changes that they would want to make to leverage columnstore and decided against columnstore – these people can now put an updateable columnstore index on a subset of the columns in a table, and leverage the technology much more easily. Again, lowering the barrier.

Rohan Kumar, a Partner Director in Engineering, is showing the impact of this, with a live dashboard, using a non-clustered columnstore index to explore data. He’s looking at how the R integration and the columnstore improvements provide a platform to discover anomalies in data in much quicker time than ever. I know that fraud analysis happens in close-to-real time within banks, but these changes make this kind of work available to many more organisations.

Sadly, Rohan then opens Profiler to show what’s happening behind the scenes with AlwaysEncrypted. But it’s Profiler, and everyone has been trying to move off Profiler for some years now. AlwaysEncrypted is impressive, but Profiler???

Stretched Databases make up the last main demo, and the keynote wraps up. The mood in the place is that these are exciting times.

@rob_farley

04 Dec 16:27

#sqlcares

by BuckWoody
#sqlcares The SQL Server community is amazing. I’ve been in technology for 30 years, and it’s the most uniquely connected … Continue reading →...(read more)
14 Nov 00:03

10 Problems Better Addressed by NoSQL than SQL

by A.R. Guess

by Angela Guess Tim Stephan, Senior Director of Product Marketing for Couchbase, recently wrote in Network World, “Once only used by the likes of Google, Amazon and Facebook, many industries are now adopting NoSQL database technology for crucial business applications, replacing their relational database deployments to gain flexibility and scalability. Here are 10 enterprise use…

The post 10 Problems Better Addressed by NoSQL than SQL appeared first on DATAVERSITY.

14 Nov 00:03

Are you encouraging data professionals to move from doing repairs to being innovators?

by David Hobbs-Mallyon

If you’re a business or IT decision-maker, when you send IT staff to a technical conference you’re probably thinking about the value of having well-trained technologists to make sure your infrastructure keeps running smoothly. But an interesting point came to light during a recent conversation with some database experts who were preparing to speak at the SQL Server PASS Summit: If you’re not asking your conference attendees to focus on your organization’s strategic objectives—in addition to technical training—maybe you’re missing an opportunity to gain a business advantage.

SQL Server data professionals from all over the world converged on Seattle for the annual PASS Summit from October 27-30, 2015. This community gathering provides peer-to-peer technical training and an opportunity to geek out directly with Microsoft executives and developers to solve problems and learn about the latest data platform developments. As newly elected PASS board member Ryan Adams, a consultant for Linchpin People, said, “In this industry, you have to constantly learn, and the PASS conference is unlike any other. The speakers are there because they do this every day and want to help others. The speakers are implementing this stuff every day, so the type of sessions and information you get are incredible.”

But what if you sent your organization’s data professionals to PASS with the mission of identifying how the data platform could be an engine of innovation for your organization? Stuart Ainsworth a consultant with Linchpin People who spoke about Hadoop at PASS, noted that the CEO and CIO have an opportunity to sit down before the conference and ask the people they’re sending to scope out what’s new and how data platform capabilities could be key to business growth. Ainsworth said, there’s “a huge opportunity to begin to steer the business by taking advantage of emerging—and even existing—technologies” to gain competitive advantage.

Often the business perspective is in its own silo, and IT is geared toward problem-solving for the business. But data professionals are eager to apply their expertise to helping the organization innovate. As Ainsworth put it, “We want to get out of the mentality of making do. We really want to get into the mentality of pushing the company further faster and reducing time to market. It’s exposure to the new technologies at conferences like PASS that can help you gain that mentality. We have to transition from being repairmen to being innovators.” The way to do that is to combine the perspectives of the business with the knowledge of the technologists.

The idea of making IT strategic is far from new, but it means that both management and IT need to consciously consider each other’s expertise. In "6 traits of up-and-coming IT stars," technical recruiter Martha Heller underscores the importance of being able to connect the dots between technology and business strategy:

“We look for people with the ability to contextualize the work they do in the broader context of the business and its goals.This kind of end-to-end process thinking -- the ability to think broader than your particular role and responsibility -- means IT people are ready to make improvements far beyond their specific areas of expertise."

Nicholas R. Colisto (“The CIO Playbook: Strategies and Best Practices for IT Leaders to Deliver Value”) insists that “To capitalize on innovation in a competitive global economy, CIOs must solve today’s problems yesterday by seizing opportunities to leverage emerging technologies.” If your organization’s database professionals attended PASS, are you mining the knowledge they’re bringing back? Think about how they can offer new insights based on the exciting technologies they learn about.

Prepare yourself for the discussion by attending the webinar on the how to "Tackle the top 5 data challenges with SQL Server" or by checking out the mission critical advances in SQL Server 2016.

13 Nov 23:58

A Mess of Nested Transactions

by Gail

Transactions are an area that I often find are used badly, or not used at all. Transactions without any error handling, nested transactions, named transactions named for no good reason or just plain lack of transactions where they should be used.

This is going to be the first of three posts looking at some aspects of transactions and transaction management. And deserving of the first post is that most misunderstood of things, the nested transaction.

Before getting into the details, I need to make one thing clear. Nested transactions are a lie. They do not exist in SQL Server.

A piece of code may appear to have multiple nested transactions in it. It may have multiple BEGIN TRANSACTION statement in it. It only has one transaction though.

An easy way to show this would be to look at the transaction log. Since transactions are there to make multiple statements atomic (committed or rolled back as a unit), the transactions must be reflected in the transaction log to allow SQL to know where to roll things back to or what statements need to be undone/redone as a unit.

CREATE TABLE TestingTransactions (
SomeNumber INT,
SomeString VARCHAR(50)
);
GO

INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
VALUES  (0, 'Zero');

CHECKPOINT; -- mark the log as reusable

BEGIN TRANSACTION

	INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
	VALUES  (1, 'One');

	BEGIN TRANSACTION

		INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
		VALUES  (2, 'Two');

		BEGIN TRANSACTION

			INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
			VALUES  (3, 'Three');

		COMMIT TRANSACTION

	COMMIT TRANSACTION

COMMIT TRANSACTION

SELECT Operation, Context, [Transaction ID], AllocUnitName, [Transaction Name]
FROM fn_dblog(NULL, NULL);

GO

I’m inserting a row before the transaction because allocating the first page to a table generates a lot of log entries. Those log records are not relevant to the subject at hand and will just confuse things.

One table, three transactions, three inserts, three commits. Let’s see what went into the transaction log.

TransactionLogNestedTransactions

What went into the log was a single begin transaction, a single commit and three inserts. The other two begin transaction statements, and the other two commit transaction statements were completely ignored for the purposes of the transaction log.

The other two begin transaction statements did do something. Not very much, and nothing that reflected in the transaction log, but they did do one thing, and if I make the code a little more complicated it’ll become clear what that one thing is.

CHECKPOINT; -- mark the log as reusable

BEGIN TRANSACTION

	INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
	OUTPUT inserted.*
	VALUES  (1, 'One');
	SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran1;

	BEGIN TRANSACTION

		INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
		OUTPUT inserted.*
		VALUES  (2, 'Two');
		SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran2;

		BEGIN TRANSACTION

			INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
			OUTPUT inserted.*
			VALUES  (3, 'Three');
			SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran3;

		COMMIT TRANSACTION
		SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran1;

	COMMIT TRANSACTION
	SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran2;

COMMIT TRANSACTION
SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran3;

TranCount

Each BEGIN TRANSACTION incremented the value returned by @@TranCount, each COMMIT TRANSACTION decremented the value returned by @@TranCount. If I looked at the transaction log after each statement, the BEGIN_XACT log entry corresponds to the BEGIN TRANSACTION which incremented @@Trancount from 0 to 1, and the COMMIT_XACT log entry corresponds to the COMMIT TRANSACTION which decremented @@Trancount from 1 to 0.

Only the operations which changed @@Trancount to or from 0 are reflected in the transaction log, they’re the only ones which have any effect on what becomes part of the database’s durable state.

Does ROLLBACK then follow the same rules? Does it get ignored unless it’s the last statement of multiple COMMIT/ROLLBACK statements?

CHECKPOINT; -- mark the log as reusable

BEGIN TRANSACTION

	INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
	OUTPUT inserted.*
	VALUES  (1, 'One');
	SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran1;

	BEGIN TRANSACTION

		INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
		OUTPUT inserted.*
		VALUES  (2, 'Two')
		SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran2;

		BEGIN TRANSACTION

			INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
			OUTPUT inserted.*
			VALUES  (3, 'Three')
			SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran3;

		ROLLBACK TRANSACTION
		SELECT @@TRANCOUNT AS TransactionCountAfterRollbackTran1;

	COMMIT TRANSACTION
	SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran2;

COMMIT TRANSACTION
SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran3;

SELECT Operation, Context, [Transaction ID], AllocUnitName, [Transaction Name]
FROM fn_dblog(NULL, NULL);

 

RollbackTransaction

TransactionLogRollbackTransaction

No. The ROLLBACK TRANSACTION set the value of @@Trancount directly to 0 and undid all of the inserts back to the outermost BEGIN TRANSACTION. The COMMIT TRANSACTION statements which ran after it all threw errors

Msg 3902, Level 16, State 1, Line 39
The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.

This is the danger of nesting transactions, that one ROLLBACK, anywhere after the first BEGIN TRANSACTION will undo everything right the way back to that first BEGIN TRANSACTION. These are not autonomous transactions being run one inside another, there’s a single transaction and rollback anywhere will roll back everything.

This can be a debugging nightmare when the application creates a transaction and calls a procedure which begins a transaction which calls other procedures which begin and commit transactions. If something goes wrong in one procedure and that procedure rolls the transaction back, it can end up undoing a lot more work than the developer of that procedure may have expected.

In general I would recommend only starting and committing transactions at a single level, and making transaction handling part of the application’s design, not something that falls out of the implementation by accident. If there’s an intentional design around how transactions will be handled (and choosing not to have any is not a good idea), then it’s a lot less likely that there will be transaction-related bugs that need to be found and fixed.

And what happens when working with an existing app where this wasn’t done? I’ll look at that in the next post on transactions, on conditionally starting transactions and on the use of savepoints.

The full list is:

  1. A Mess of Nested Transactions (This post)
  2. Savepoints and conditional transactions
  3. Why would you want to name a transaction?
  4. When naming transactions causes an error
13 Nov 23:57

My Week at the PASS Summit 2015

by noreply@blogger.com (Jessica M. Moss)
Last week the SQL PASS organization held the annual PASS Summit in Seattle, Washington.  The PASS Summit is a week-long conference that brings thousands of SQL Server, Business Intelligence, and Business Analyst professionals together to learn all about best practices in use today and about new features coming in the next version.  I was able to attend by volunteering to help with PASS and by the good graces of my company and had an amazing week!  Many thanks go to all of the organizers, speakers, volunteers, and sponsors who put on another great event.


The conference has two full-day preconference sessions on Monday and Tuesday which can be purchased in addition to the conference.  These trainings are amazing and definitely worth your while if you want more training.  I ended up flying out on Tuesday to start with the main conference on Wednesday.  On Tuesday, I explored Seattle a bit, and even visited the space needle (my first time in all my years visiting Seattle!).  I checked into the conference, visited with some friends at the Denny Cherry Associates and SIOS #sqlkaraoke event, and went to bed early to prepare for the next day.


Wednesday


Wednesday, the first day of the conference, started with some great sessions.  James Phillips and team presented the Foundation Session on Microsoft Business Intelligence, which talked about the new business intelligence vision (consistency and modernization across all of the reporting tools).  Here are a few of my favorite features:
  1. SSRS reports pinning to Power BI
  2. Auto-insights coming in Power BI
  3. DAX will have intellisense, comments, formatting, and new functions
  4. SSAS tabular will optimize DirectQuery querying
  5. Power BI will provide a new visualization every week
  6. Power BI can use data from on premise multidimensional SSAS databases

I attended a few more sessions and closed out the day by attending an executive meet and greet sponsored by Microsoft and the PragmaticWorks #sqlkaraoke event.


Thursday


Thursday was another great session day.  I attended a Microsoft SSIS Focus session on the new features coming in SSIS 2016.  Jimmy Wong presented many of the new features and took feedback from the group on their thoughts.  Some of the cool new things that will be coming include:
  1. Incremental package deployment (no more "all-or-nothing")
  2. Ability to turn on an "optimize buffer size" option in packages
  3. Execution of R in SQL, which can be called through and SSIS package
  4. Ability to adjust how the ForEach Loop loops through files

There were a few other sessions, and then I closed out the evening at the Community Appreciation Party at the EMP museum (an amazing museum if you haven’t been there).


Friday


On Friday, I hosted a Birds of a Feather table, where people working on similar topics (in my case, it was business intelligence architecture and design) sit together during lunch to discuss the topic.  I had a great table with a split between more advanced professionals and newbies to BI and architecture.  We covered topics from choosing products to modeling to ETL performance tuning and more.  Thanks everyone who attended!


Friday was the last day, so I chatted with some friends, explored Seattle more, and relaxed a bit before week 2 at the MVP Summit.  I can't post anything about it, but be sure I'm sharing your feedback with Microsoft as much as I can :)


More information


For more information on the PASS, the Summit, and announcements, visit the following links:

  1. Information about the Summit: http://www.sqlpass.org/summit/2015/About.aspx
  2. Watch some session on PASSTV: http://www.sqlpass.org/summit/2015/Live.aspx
  3. Reporting Roadmap released during the Summit: http://blogs.technet.com/b/dataplatforminsider/archive/2015/10/29/microsoft-business-intelligence-our-reporting-roadmap.aspx
  4. Latest version of SQL Server 2016 CTP 3: http://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016




13 Nov 23:57

U-SQL for your Big Data processing

by SQLMaster

As a Microsoft technology user you might have come across about TSQL, iSQL, OSQL and ANSI-SQL. Here is the new kid on the ranch, U-SQL.

A new language with a power of SQL expressive(ness) and produce scalable distributed query capability to analyze data across the relational stores such as Azure SQL database.

Not sure why and who picked up the terminology to call it as U-SQL, the guess is as U comes after T in the alphabets.

Here is the quote from VS Engineering team about why U-SQL?

  • Process any type of data. From analyzing BotNet attack patterns from security logs to extracting features from images and videos for machine learning, the language needs to enable you to work on any data.
  • Use custom code easily to express your complex, often proprietary business algorithms. The example scenarios above may all require custom processing that is often not easily expressed in standard query languages, ranging from user defined functions, to custom input and output formats.
  • Scale efficiently to any size of data without you focusing on scale-out topologies, plumbing code, or limitations of a specific distributed infrastructure.

IN order to get to know more about this new language have a look at: Introducing U-SQL – A Language that makes Big Data Processing Easy

 

 

(4)

13 Nov 23:57

Fast Backup and Restore with File-snapshot Backup

by SQL Server Team

SQL Server Data Files in Azure allows you to create databases in SQL Server running in on-premises or in a virtual machine with a dedicated durable location in Azure blob storage. As part of SQL 2016, we introduced File-Snapshot Backup, the ability to perform very fast backup and restore leveraging the Azure Blob Snapshot Technology.

What is File-Snapshot Backup?

In order to fully understand File-Snapshot backup we first need to get an understanding of Azure blob snapshots. An Azure blob snapshot is a read-only version of a blob taken at a point in time. As data in the blob changes over time, the snapshots continue to point to the data at the time when the snapshots were created.

File-Snapshot backup consists of a set of Azure blob snapshots and a SQL backup file that contains pointers to those snapshots.

There are two types of File-Snapshot backups, full database backups and transaction log backups.

A full database backup creates a snapshot of each data and log file comprising the database and establishes the transaction log backup chain. Unlike tradition backups, a File-Snapshot backup does not copy all of the data in the database, it instead relies on the Azure storage service to snapshot the files. This makes the backup process extremely fast.

A transaction log backup creates a snapshot of each database file, records the location of the snapshots in the backup (.bak) file, and truncates the transaction log file. This is unlike traditional transaction log backups where only the transaction log file is captured in the backup.

How do I use File-Snapshot Backup?

The first thing you will need to do is set the recovery model to full and establish the transaction log backup chain with a full backup using File-Snapshot. Below is an example of setting the recovery model to full and creating the first full backup of the database.

-- To set the recovery model to full.
USE master;
GO
ALTER DATABASE [MY_DATABASE_NAME]
   SET RECOVERY FULL;
GO
-- Create a full backup of the database.
BACKUP DATABASE [MY_DATABASE_NAME]
TO URL = 'https://<mystorageaccountname>.blob.core.windows.net/<mycontainername>/<mydatabasename>.bak'
WITH FILE_SNAPSHOT;
GO

This full backup will create a snapshot of each data and log file, create a backup file with pointers to the snapshots, and establish the transaction log backup chain.

After the initial full backup, you only need to perform transaction log backups because each transaction log File-Snapshot backup set contains blob snapshots of all database files and can be used to perform a database restore or a log restore. Below is an example of creating a transaction log backup and appending the time to the file name.

-- Back up the database log using a time stamp in the backup file name.
DECLARE @Log_Filename AS VARCHAR (300);
SET @Log_Filename = 'https://<mystorageaccountname>.blob.core.windows.net/<mycontainername>/>/<mydatabasename>_Log_'+
REPLACE (REPLACE (REPLACE (CONVERT (VARCHAR (40), GETDATE (), 120), '-','_'),':', '_'),' ', '_') + '.trn';
BACKUP LOG [MY_DATABASE_NAME]
 TO URL = @Log_Filename WITH FILE_SNAPSHOT;
GO

The capability of transaction log File-Snapshot backups to be used for a database restore eliminates the need for additional full or differential backups. This greatly simplifies your backup plan.

How do I restore File-Snapshot Backups?

With File-Snapshot backup you can restore to the end of a backup, or you can restore to a specific point in time. For both of these restore options you can use restore and replace or restore and move to a new database.

For restoring to the end of a backup you only need the File-Snapshot backup you want to restore to the end of. This will work for both full backups and transaction log backups. Below demonstrates restoring to the end of a transaction log File-Snapshot backup to a new database.

RESTORE DATABASE [MY_DATABASE_NAME] FROM URL = 'https://<mystorageaccountname>.blob.core.windows.net/<mycontainername>/<mydatabasename>_2015_05_18_16_00_00.trn'
WITH RECOVERY,
MOVE mydatabasename_Data TO 'https://<mystorageaccountname>.blob.core.windows.net/<mycontainername>/<newdatabasename>_Data.mdf';
MOVE mydatabasename_Log TO 'https://<mystorageaccountname>.blob.core.windows.net/<mycontainername>/<newdatabasename>_Log.ldf';
GO

For restoring to a specific point in time, you will need the File-Snapshot backup before the point in time and the File-Snapshot backup after the point in time. Below demonstrates to restoring to May 18th, 2015 at 5:35 pm using two transaction log File-Snapshot backups overwriting the original database.

RESTORE DATABASE [MY_DATABASE_NAME] FROM URL = 'https://<mystorageaccountname>.blob.core.windows.net/<mycontainername>/<mydatabasename>_2015_05_18_16_00_00.trn'
WITH NORECOVERY,REPLACE;
GO

RESTORE LOG [MY_DATABASE_NAME] FROM URL = 'https://<mystorageaccountname>.blob.core.windows.net/<mycontainername>/<mydatabasename>_2015_05_18_18_00_00.trn'
WITH RECOVERY,STOPAT = 'May 18, 2015 5:35 PM';
GO

Unlike traditional point in time restore, where you need all the transaction log backups from the most recent full or differential backup before the point in time, File-Snapshot backup only requires the backup before the point in time, and the backup after the point in time. The image below illustrates what is required to restore to a place in the last log backup of a backup chain using traditional streaming backup versus using File-Snapshot backup. Note that difference in the number of backups required for File-Snapshot backup.

So really, how fast is it?

In the File-Snapshot Backup Channel 9 Video, the performance of point in time restore with File-Snapshot backup vs traditional streaming backup was compared. The demo environment consisted of the following:

  • SQL Server running on a D13 VM in Azure with 8 CPU cores and 56 GB of memory.
  • Database with 10 data files and one log file, all stored in Azure blob storage.
  • A 1 GB table with three columns; int, datetime2, and binary(4000)
    Next, 3.5 GB of data was generated over a two-hour period with log backups being take every 20 minutes. A point in time in the last (6th) backup file was restored to.

With traditional streaming backup the restore took over 30 minutes, while with File-Snapshot backup the restore took less than one minute. In this example, point in time restore with File-Snapshot was over 30x faster than traditional streaming backup.

How do I get started?

The best way to get started is to follow our tutorial on Using Azure Blob storage service with SQL Server 2016 databases. This will take you through the steps of creating a database with files stored directly on Azure blobs all the way to managing your File-Snapshot backups.

You also will want to check out the following:

13 Nov 23:56

Do You Really Need a Data Scientist?

by A.R. Guess

by Angela Guess Jelani Harper recently opined in Datanami, “Many organizations assume that big data initiatives require mythical data scientists. This notion is partly propagated by the media scrutiny surrounding the data scientist shortage and by the numerous responsibilities of this profession, which include data preparation, cognizance of business problems and analytics, among others. Harvard […]

The post Do You Really Need a Data Scientist? appeared first on DATAVERSITY.

13 Nov 23:56

5 Warning Signs on Data Scientist Resumes

by A.R. Guess

by Angela Guess Kaiser Fung recently opined in the RSquare Edge blog, “As a hiring manager for data analytics positions, I often complain that there are not enough qualified resumes. Most of the resumes that do get passed on to me from recruiters quickly get filed away. Those job candidates belong to one of five […]

The post 5 Warning Signs on Data Scientist Resumes appeared first on DATAVERSITY.

13 Nov 23:28

SSIS 2016 Supports Single-Package Deployment

by andyleonard
One of the most anticipated features of SQL Server 2016 Integration Services (SSIS 2016) is support for deploying a single SSIS package from an SSIS project. “Why Is This A Big Deal, Andy?” I’m glad you asked. Consider the following scenario: You deploy...(read more)
13 Nov 23:28

Redundancy Options in Azure Blob Storage

by James Serra

So you have data in Azure Blob Storage and are concerned about reliability.  Have no fear!  There are four replication options for redundancy:

1. Locally Redundant Storage (LRS): All data in the storage account is made highly durable and available within a facility/datacenter by replicating transactions synchronously to three different storage nodes within a single storage cluster (today those clusters are 20 racks withing the same building).  Each storage node is on a different fault and upgrade domain to ensure recovery from common failures (disk, node, rack) without impacting your storage account’s availability and durability.  All storage writes are performed synchronously across the three replicas in the three separate fault domains before success is returned back to the client. Note there is an option to have premium storage LRS which means the storage is SSD-based.  For internal details of Windows Azure Storage check out a whitepaper here, a video here, and the slides here.

2. Zone-Redundant Storage (ZRS): Zone-redundant storage (ZRS) stores three replicas of your data across two to three facilities, with two of the facilities at least 25 miles apart.  It is designed to keep all three replicas within in a single region (but may span across two regions), providing higher durability than LRS (which replicates data only within the same facility).  Data is synchronously replicated when the multiple facilities are within a region; in the event that multiple regions are needed, three copies are synchronously stored within the region and data is asynchronously replicated to facilities in other regions.  If your storage account has ZRS enabled, then your data is durable even in the case of failure at one of the facilities.  ZRS is currently available only for block blobs.

3. Geo-Redundant Storage (GRS): Like LRS, transactions are replicated synchronously to three different storage nodes within the same primary region chosen for creating the storage account (there are 24 regions throughout the world, which each region made up of multiple datacenters).  However, transactions are also queued for asynchronous replication to another secondary region that is hundreds of miles away (usually 300 miles or more) from the primary (geo-replication).  In this secondary region the data is again made durable by replicating it to three more storage nodes there (i.e. total of 6 copies).  So even in the case of a complete regional outage or a regional disaster in which the primary location is not recoverable, your data is still durable.

4. Read-Access Geo-Redundant Storage (RA-GRS): This is the default option for redundancy when a storage account is created.  For a GRS storage account, there is the ability to turn on read-only access to a storage account’s data in the secondary region.  Since replication to the secondary region is done asynchronously, this provides an eventual consistent version of the data to be read from (if the primary goes offline there can be 5-15 minutes of data last written that are not yet available on the secondary).  When you enable read-only access to your secondary region, you get a secondary endpoint in addition to the primary endpoint for accessing your storage account.  This secondary endpoint is similar to the primary endpoint except for the suffix “-secondary”. For example: if the primary endpoint is myaccount.<service>.core.windows.net, the secondary endpoint is myaccount-secondary.<service>.core.windows.net.

When creating a storage account, you will be presented with the following storage account types:

StorageOptions

These options mean the data in your Microsoft Azure storage account is always replicated to ensure durability and high availability, meeting the Azure Storage SLA even in the face of transient hardware failures.

Untitled picture

For locally redundant storage, Microsoft stores CRCs of the data to ensure correctness and periodically reads and validates the CRCs to detect bit rot (random errors occurring on the disk media over a period of time).  If CRC fails, the data is recovered via an automated process.  And since each VM disk is a blob in Azure storage, if CRC fails on a disk it is automatically commissioned/decommissioned.

Some customers may choose Locally Redundant Storage for storage that does not require the additional durability of Geo Redundant Storage and want to benefit from the discounted price compared to GRS.  This data typically falls into the categories of (a) non-critical or temporary data (such as logs), or (b) data that can be recreated if it is ever lost from sources stored elsewhere.  An example of the latter is encoded media files that could be recreated from the golden bits stored in another Windows Azure Storage account that uses Geo Redundant Storage.  In addition, some companies have geographical restrictions about what countries their data can be stored in, and choosing Locally Redundant Storage ensures that the data is only stored in the location chosen for the storage account.

For remote storage (GRS and RA-GRS), in the event of a major disaster that affects the primary storage location, Microsoft will manually first try to restore the primary location.  Restoring of primary is given precedence since failing over to secondary may result in recent delta changes being lost because of the nature of replication being asynchronous, and not all applications may prefer failing over if the availability to the primary can be restored.  Dependent upon the nature of the disaster and its impacts, in some very rare occasions, Microsoft may not be able to restore the primary location, and they would need to perform a geo-failover.

When this happens, affected customers will be notified via their subscription contact information or via the Azure portal.  As part of the failover, the customer’s “account.service.core.windows.net” DNS entry would be updated to point from the primary location to the secondary location.  Once this DNS change is propagated, the existing Blob URIs will work.  This means that you do not need to change your application’s URIs – all existing URIs will work the same before and after a geo-failover.

After the failover occurs, the location that is accepting traffic is considered the new primary location for the storage account.  This location will remain as the primary location unless another geo-failover was to occur.  Once the new primary is up and accepting traffic, Microsoft will bootstrap to a new secondary to get the data geo redundant again.

More info:

Windows Azure Storage Redundancy Options and Read Access Geo Redundant Storage

Azure Storage replication

Azure Business Continuity Technical Guidance

13 Nov 23:28

NTT Docomo uses SQL Server to handle billing system traffic data and improve customer service

by David Hobbs-Mallyon

To invest in the data platform technology that will serve the future of your business and allow it to grow, it’s crucial to have a deep understanding of the data driving your business strategy, as well as the data that will let you deliver on that strategy. This distinction is subtle, but important. A recent study by Hitachi Data Systems, Information Innovation Index, found that of the 200 UK IT decision makers whom Hitachi surveyed, 93 percent didn’t know where to strategically invest in technology. The reason, according to 75 percent of respondents was that they did not clearly understand their organization’s strategy and data.

In an article discussing this study, Richard Gadd, UK managing director for Hitachi Data Systems is quoted as explaining that the problem these statistics highlight is about “having the ability to garner valuable business insights to make informed technology investments that will drive future growth and enable … organizations to redefine business agility.”

Gadd concludes that “to stay relevant for the long term, organizations need to know their business inside out so they can quickly capitalize on new revenue opportunities before their competition does.”

For organizations that do understand their business strategy and therefore are able to see where they need to invest in technology, the results can be dramatic. Take the example of NTT DoCoMo, Japan’s largest mobile communications carrier.

NTT DoCoMo recognized that growth in the variety of mobile platforms and services required a data analysis and usage system to help the organization understand and deal with quickly changing customer needs. To address this requirement, according to Yuki Shimamura, General Manager, “Establishment of new traffic data warehouse that can understand and analyze subscribers’ trends in real time was an urgent task in order to provide greater value to customers. We decided to adopt SQL Server 2014 because it satisfies such requirements, and also enables a significant reduction in development and maintenance costs, which were an issue with the existing system.”

Clearly, the company’s understanding of its business and customer needs drove the decision of what technology to invest in. Masaki Inoue, Manager Billing System, Information Systems Division, explained that it was important to consider both present and future business needs when deciding which technology to invest in: “If the existing system had simply been updated as it was, it would certainly have been safer and easier. However, the system itself must be flexible enough to be changed in the face of future business opportunities, otherwise new value would never be gained. Those of us who were in charge of the system had to take on the challenge of constructing a new flexible system that could quickly cope with a variety of data analysis demands from top management and the work front in order to achieve the mid-term goals.”

The approach the team used to make the decision was a comparative evaluation of platforms. The deciding factors included the superior cost performance of SQL Server 2014 because the license for the main database engine included everything built-in, such as Integration Services, which is an ETL tool, Analysis Services and Reporting Services. In addition, SQL Server 2014’s columnstore index enables a high performance search through a large amount of data, which was not possible with the former Traffic DWH, so multi-layered processing that used to be needed is no longer required. As a result, quick analysis and usage of billing data was enabled for more than 66 million subscriber lines to further improve customer service.

This example demonstrates the importance of ensuring that an organization’s investments in data platform technology need to be driven by a deep understanding of the business strategy, customer trends, and future needs. In this case, the company found that the mission-critical capabilities included with Microsoft’s data platform met those needs and gave NTT DoCoMo a powerful platform for the present, upon which they can build for the future.

To learn more about SQL Server mission-critical capabilities, click here, or for a more detailed overview read this white paper.

13 Nov 23:27

Internet of Things Will Improve the Lives of City Dwellers First

by A.R. Guess

by Angela Guess Dan Patterson recently wrote in TechRepublic, “The future of cities will be smaller, wired, and very chatty, said Google’s Chief Internet Evangelist Vint Cerf on Wednesday at a panel discussion on the Internet of Things #ReducingCyberRisk at NYU’s Brooklyn-Based PolyTechnic School of Engineering. Other panelists included Deborah Estrin, Professor Computer Science at […]

The post Internet of Things Will Improve the Lives of City Dwellers First appeared first on DATAVERSITY.

13 Nov 23:25

Keep Your Data Lake from Turning Into a Swamp with Semantics

by A.R. Guess

by Angela Guess Jans Aasman recently wrote in Data Informed, “According to Gartner Research Director Nick Heudecker, ‘Data lakes typically begin as ungoverned data stores. Meeting the needs of wider audiences requires curated repositories with governance, semantic consistency, and access controls.’ It’s important not to overly constrain the data, but without sensible governance, users soon […]

The post Keep Your Data Lake from Turning Into a Swamp with Semantics appeared first on DATAVERSITY.

13 Nov 23:25

Machine Learning 101, as Explained by Google

by A.R. Guess

by Angela Guess Danny Sullivan recently wrote for Marketingland, “What exactly is ‘machine learning’ and how do machines teach themselves? Here’s some background drawn from those involved with machine learning at Google itself. Yesterday, Google held a Machine Learning 101 event for a variety of technology journalists. I was one of those in attendance. Despite […]

The post Machine Learning 101, as Explained by Google appeared first on DATAVERSITY.

13 Nov 23:25

What Data Science Means to Data Scientists

by A.R. Guess

by Angela Guess Bob Hayes recently wrote in Customer Think, “The value of data is measured by what you do with it, and organizations are relying on data scientists to extract that value. I recently conducted a survey of data professionals to better understand what it means to be a data scientist. I discovered a […]

The post What Data Science Means to Data Scientists appeared first on DATAVERSITY.

13 Nov 23:25

Post PASS Summit Resolutions

by drsql
Many years I have tried early in the year to list a set of “New Year’s Resolutions” but one thing I have noted every year is it isn’t New Year’s Day that is the defining part of my year, but rather it is the PASS Summit that defines my year. Usually, by the time the end of October rolls around, I am pretty spent. The summer has ended, and all of the events of the year are winding down, heading into the holiday season (I like start the holidays 2 weeks before Halloween and end them on Super Bowl weekend!)...(read more)