Shared posts

26 Jan 07:50

How to deal with Technology Burnout - Maybe it's life's cycles

by Scott Hanselman
Burnout photo by Michael Himbeault used under cc

Sarah Mei had a great series of tweets last week. She's a Founder of RailsBridge, Director of Ruby Central, and the Chief Consultant of DevMynd, so she's experienced with work both "on the job" and "on the side." Like me, she organizes OSS projects and conferences, but she also has a life, as do I.

If you're reading this blog, it's likely that you have gone to a User Group or Conference, or in some way did some "on the side" tech activity. It could be that you have a blog, or you tweet, or you do videos, or you volunteer at a school.

With Sarah's permission, I want to take a moment and call out some of these tweets and share my thoughts about them. I think this is an important conversation to have.

This is vital. Life is cyclical. You aren't required or expected to be ON 130 hours a week your entire working life. It's unreasonable to expect that of yourself. Many of you have emailed me about this in the past. "How do you do _____, Scott?" How do you deal with balance, hang with your kids, do your work, do videos, etc.

I don't.

Sometimes I just chill. Sometimes I play video games. Last week I was in bed before 10pm two nights. I totally didn't answer email those nights either. Balls were dropped and the world kept spinning.

Sometimes you need to be told it's OK to stop, Dear Reader. Slow down, breathe. Take a knee. Hell, take a day.

Here's where it gets really real. We hear a lot about "burnout." Are you REALLY burnt? Maybe you just need to chill. Maybe going to three User Groups a month (or a week!) is too much? Maybe you're just not that into the tech today/this week/this month. Sometimes I'm so amped on 3D printing and sometimes I'm just...not.

Am I burned out? Nah. Just taking a break.

Whatever you're working on, likely it will be there later. Will you?

Is your software saving babies? If so, kudos, and please, keep doing whatever you're doing! If not, remember that. Breathe and remember that while the tech is important, so are you and those around you. Take care of yourself and those around you. You all work hard, but are you paying yourself first?

You're no good to us dead.

I realize that not everyone with children in their lives can get/afford a sitter but I do also want to point out that if you can, REST. RESET. My wife and I have Date Night. Not once a month, not occasionally. Every week. As we tell our kids: We were here before you and we'll be here after you leave, so this is our time to talk to each other. See ya!

Thank you, Sarah, for sharing this important reminder with us. Cycles happen.




© 2016 Scott Hanselman. All rights reserved.
     
12 Nov 05:05

Cloud Platform Release Announcements for August 24, 2016

by Cloud Platform Team

This blog post is part of a new, ongoing series of consolidated updates from the Cloud Platform team.

In today’s mobile first, cloud first world, Microsoft provides the technologies and tools to enable enterprises to embrace a cloud culture. Our differentiated innovations, comprehensive mobile solutions and developer tools help all of our customers realize the true potential of the cloud first era.

You expect cloud-speed innovation from us, and we’re delivering across the breadth of our Cloud Platform product portfolio. Below is a consolidated list of our latest releases to help you stay current, with links to additional details if you’d like more information. In this update:

  • PowerApps | Public preview – Common data model for PowerApps
  • Operations Management Suite | OMS Security
  • PowerShell (open sourcing) | Public
  • Windows Server 2016 | VMware Migration offer – Available to purchase
  • Azure Premium Storage GA – new geo-availability | US North Central GA
  • MySQL in-app for Azure App Service | Public
  • OMS Log Analytics Linux Management | GA
  • Operations Management Suite | Docker container support
  • Azure SQL Database – P15 | GA
  • Azure SQL Database Security – Azure Active Directory authentication | GA
  • Cognitive Services | PP – Academic Knowledge and Computer Vision APIs
  • Power BI Desktop | GA
  • Power BI service | GA
  • Azure SQL Database – Surface Area Updates | GA – JSON Support
  • SQL Server Migration Assistant for Datazen | GA

PowerApps | Public preview – Common data model for PowerApps

On Monday, August 8, we announced the public preview of the Microsoft Common Data Model (CDM) in PowerApps. With this addition, app creators using PowerApps will automatically get new capabilities that enable them to store and manage data unique to the app they're building through CDM. This will enable them to create more powerful business apps more quickly, combining data they already have in existing systems with data unique to the application.

PowerApps and the Microsoft Common Data Model play an important role in our business application platform innovation strategy, enabling power users to create powerful apps without writing code.

To learn more, read the full blog post and visit the PowerApps page.

Operations Management Suite | OMS Security GA

Today’s IT operations and security teams are tasked with managing increasingly complex environments that are exposed to more sophisticated security threats than ever before. With Operations Management Suite (OMS) Security, you can now quickly and easily assess security posture and detect security threats across your hybrid cloud environments. By leveraging the powerful combination of security log event data, threat intelligence, and Microsoft security expertise, you can detect security threats sooner, and therefore mitigate damage to your business. Also, as OMS Security is born in the cloud, it enables you to cost effectively capture the critical security data at hyper-scale, providing a complete and comprehensive security management solution.

To try OMS Security, please visit the Operations Management Suite webpage for a free account.

PowerShell (open sourcing) | Public

On Thursday, August 18, PowerShell was open sourced, and the alpha builds of PowerShell on Linux (Ubuntu, CentOS, Red Hat) and MacOS are now available on GitHub.

As detailed in Jeffrey Snover’s Azure blog post announcing PowerShell availability on Linux, PowerShell provides you with a heterogeneous automation and configuration management framework that works well with your existing tools and is optimized for dealing with structured data (e.g. JSON, CSV, XML, etc.), REST APIs, and object models.

You can use Azure management capabilities through Operations Management Suite (OMS) and take advantage of PowerShell and Desired State Configuration (DSC) to manage not only Windows Server but Linux environments as well.

Through OMS Automation, you can graphically author and manage all PowerShell resources including runbooks, DSC configurations and DSC node configurations from one place. Using OMS hybrid runbook workers, you can extend your OMS Automation capability and apply, monitor and update configurations anywhere, including on-premises. You can also manage Linux and Windows systems from any client machine (Linux, MacOS, or Windows).

To learn more, read the full Azure blog post and visit the PowerShell page.

Windows Server 2016 | VMware Migration offer – Available to purchase

On Wednesday, August 24, we announced the availability of the new VMware migration offer. This offer will provide free Windows Server Datacenter licenses for VMware customers committing to move to Hyper-V when customers buy Windows Server Datacenter + Software Assurance.

With the upcoming launch of Windows Server and System Center 2016, Microsoft believes the transition to a proven cloud platform is key for customers who need to address security, performance and cost concerns. Windows Server 2016 and System Center 2016 were built with the learnings from running the hyper-scale Azure datacenters and will bring unique and exciting new features, such as Shielded Virtual Machines, which use BitLocker to encrypt the disk and state of virtual machines and also ensure that Hyper-V hosts running these Shielded Virtual Machines are “allowed and healthy” hosts. In the software-defined datacenter space, Storage Spaces Direct and new software-defined networking features will deliver the performance and flexibility customers are looking for, at no extra charge.

For more information on the offer, access the VMware Shift website. For more information on Windows Server 2016, access the Windows Server page.

Azure Premium Storage GA – new geo-availability | US North Central GA

Azure Premium Storage is a solid-state drive (SSD)–based storage solution designed to support I/O-intensive workloads. With Premium Storage, you can add up to 64 TB of persistent storage per virtual machine (VM), with more than 80,000 IOPS per VM and extremely low latencies for read operations.

Offering a service-level agreement (SLA) for 99.9 percent availability, Premium Storage is now available in the US North Central region, as well as these previously announced regions. Learn more about Premium Storage.
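To put the 99.9 percent figure in perspective, here is a small worked calculation of how much unavailability that level allows. It is only a sketch: the 30-day month and the `allowed_downtime_minutes` helper are assumptions made for illustration, not the terms of the actual SLA.

```python
def allowed_downtime_minutes(availability_percent, days=30):
    """Minutes of downtime permitted over `days` days at the given availability.

    The 30-day default is an assumption for illustration only.
    """
    total_minutes = days * 24 * 60
    # Round to two decimals to hide floating-point noise from the subtraction.
    return round(total_minutes * (100 - availability_percent) / 100, 2)


print(allowed_downtime_minutes(99.9))
```

Under those assumptions, 99.9 percent availability leaves room for roughly 43 minutes of downtime in a 30-day month.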

MySQL in-app for Azure App Service | Public

Today we’re announcing a new feature (in preview) for Web developers that are using Azure App Service to create Web applications that use MySQL. MySQL in-app enables developers to run the MySQL server side-by-side with their Web application within the same environment, which makes it easier to develop and test PHP applications that use MySQL. This feature has no additional costs and shares the resource of the existing plan. This feature is designed to facilitate development and testing and has certain limitations, and we are not recommending it for production.

Check the Azure Blog for more information on this feature and how to get started.

Operations Management Suite Log Analytics Linux Management | GA

Microsoft Operations Management Suite (OMS) now offers expanded Linux monitoring capabilities with the general availability of the OMS Log Analytics agent for Linux. Built on the Open Source project FluentD, the agent provides actionable insights for your applications and workloads across major Linux distributions. You can onboard your Linux systems into OMS and get rich analytics including near real-time performance metrics. You can collect, correlate and search on Linux system logs and process metrics. You can also bring alerts from other monitoring solutions such as Nagios and Zabbix.

Now OMS transforms your cloud experience across Windows Server and Linux environments. Try Operations Management Suite today.

Operations Management Suite | Docker container support

Microsoft Operations Management Suite (OMS) is extending its monitoring capabilities to Docker Containers. By nature, containers are light-weight and easily provisioned and as such they can be difficult to manage. Without a centralized approach to monitoring, customers may find it difficult to respond to critical issues at speed. Using the OMS Docker Container monitoring, you can get visibility into container inventory, performance, and logs from one place. The monitoring solution also provides a simplified view of Docker Containers usage, and lets you diagnose problems across cloud and on-premises environments. As is the case in other OMS solutions, the logs, inventory, and performance data collected are searchable to create custom reports and alerting.

If you have Docker containers, you can start using this solution now in public preview. Try Operations Management Suite today.

Azure SQL Database – P15 | GA

New Azure SQL Database Premium performance level, P15, generally available

Pricing | SQL Database

As our customers build even more powerful applications in Azure using Azure SQL Database, they need more options to scale performance for high-demand workloads. That’s why we’re announcing a new performance level within our Premium service tier: P15. P15 offers 4,000 Database Transaction Units (DTUs)—that’s twice as powerful as our P11 offering—with extremely fast transactional performance, real-time analytics, and up to 1 TB of storage. An advantage to running a SQL database on Azure is being able to scale performance up or down, on the fly, to adapt to changing workload demands. This new performance level lets you scale up database performance for explosive growth of application workloads.

For more information, please check out the SQL Database options and performance: Understand what’s available in each service tier documentation article.

Azure SQL Database Security – Azure Active Directory authentication | GA

Help secure data in Azure SQL Database with Azure Active Directory authentication, now generally available

SQL Database

Azure SQL Database offers a set of simple-to-implement, built-in features to help secure data from malicious and unauthorized users. Azure Active Directory (Azure AD) support in Azure SQL Database is now generally available.

With SQL Database support for Azure AD authentication, you now have a mechanism of connecting to SQL Database by using identities in Azure AD for managed and federated domains. With Azure AD authentication, customers can manage the identities of database users and other Microsoft services in one central location.

To learn more about the GA of Azure AD authentication with SQL Database, please review our blog. Many more security features are available in SQL Database. To start using these features, customers can read the overview on securing their SQL Database across connectivity, authentication, authorization, encryption, auditing, and compliance.

Cognitive Services | PP – Academic Knowledge and Computer Vision APIs

Microsoft Cognitive Services are a collection of APIs which enable developers to tap into high-quality vision, speech, language, knowledge and search technologies, developed with decades of Microsoft research to build intelligent apps.

We’re excited to announce the public preview availability of the following APIs:

  • Computer Vision API, which gives you the tools to understand the contents of any image. Create tags identifying objects, beings, or actions present in the image, and then craft coherent sentences to describe it. With Computer Vision API, you are able to extract rich information from images to categorize and process visual data—and protect users from unwanted content.
  • Academic Knowledge API, which helps to tap into the wealth of academic content by applying the Knowledge Exploration Service to the Microsoft Academic Graph. Users can start from natural language queries, or you can ping the graph directly through structured query expressions.

Computer Vision API and Academic Knowledge API are available as standalone services, and public preview pricing went into effect on August 4, 2016.

For more information, please visit the Computer Vision API and Academic Knowledge API pages.

Power BI Desktop | GA

New and most frequently requested Power BI Desktop features are now available to business analysts.

  • Analytics pane will be the new central location for all analytical features.
  • Dynamic reference lines provide the ability to create multiple data bound reference lines on clustered column and bar charts, line charts, and scatter charts—calculated based on the max, min, median, average, or different percentiles of a selected measure. You can add as many reference lines as you need, and for each line you can control name, what measure it is based on, the style, and if it has a label or not.
  • New data connectors include Impala DirectQuery support (preview), Snowflake connector (preview), improved Web connector, and GA of SAP BW connector.

Download the latest Power BI Desktop to experience the new features immediately. For more information on these new features and others, visit the Power BI blog.

Power BI service | GA

More new and most frequently requested Power BI features are now available to end users and business analysts in the month of July.

  • Real-time streaming (preview) provides functionality which makes it even easier to stream real-time data to Power BI, and to see that data light up in your dashboards.
  • Auditing (preview) enables logging and monitoring of user activities.
  • Azure Active Directory (AAD) Conditional Access provides the ability to set user permissions using AAD.

Sign in to powerbi.microsoft.com to experience the new features immediately. For more information on these new features and others, visit the Power BI blog.

Azure SQL Database – Surface Area Updates | GA – JSON Support

We are happy to announce that you can now query and store both relational and textual data formatted in JavaScript Object Notation (JSON) using Azure SQL Database. Azure SQL Database provides simple built-in functions that read data from JSON text, transform JSON text into tables, and format data from SQL tables as JSON. JSON in Azure SQL Database enables you to build and exchange data with modern web, mobile, and HTML5/JavaScript single-page applications.

Now you can easily integrate your Azure SQL Database with any service that uses JSON. Learn more.
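The three operations mentioned above (reading a value out of JSON text, turning JSON text into rows, and formatting rows as JSON) can be mimicked in a few lines of plain Python. This is only an analogy to show the shape of the data at each step. It does not call Azure SQL Database, and the sample `orders_json` document and the helper names are made up for illustration.

```python
import json

# Hypothetical JSON text, standing in for a value stored in a text column.
orders_json = (
    '[{"id": 1, "customer": "Ann", "total": 25.5},'
    ' {"id": 2, "customer": "Bob", "total": 12.0}]'
)


def read_value(json_text, index, key):
    """Read one scalar value out of JSON text."""
    return json.loads(json_text)[index][key]


def json_to_rows(json_text, columns):
    """Turn a JSON array of objects into table-like rows (tuples)."""
    return [tuple(obj.get(col) for col in columns) for obj in json.loads(json_text)]


def rows_to_json(rows, columns):
    """Format table-like rows back into JSON text."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


columns = ["id", "customer", "total"]
rows = json_to_rows(orders_json, columns)

print(read_value(orders_json, 0, "customer"))
print(rows)
print(rows_to_json(rows, columns))
```

In T-SQL, the corresponding built-in features are `JSON_VALUE`, `OPENJSON`, and the `FOR JSON` clause; the linked documentation is the place to check their exact syntax.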

SQL Server Migration Assistant for Datazen | GA

The SQL Server Migration Assistant for Datazen is now generally available. This tool is designed to help organizations migrate their existing Datazen Server content, including dashboards and KPI’s, to a SQL Server 2016 Reporting Services server. The SQL Server Migration Assistant for Datazen is available for download. For more information please read the blog post.

03 Nov 16:26

VMware ESXi 5.0 & 5.1 – End of General Support: August 2016

by Simon Seagrave

How time flies! It seems like only yesterday that VMware was announcing ESXi 5, though this was back in 2011 – 5 years ago!

There have been lots of great ESXi releases and updates since that time, which have seen the entire vSphere portfolio mature and maintain its place as the industry’s leading virtualization solution.

Anyway… I just wanted to give a heads-up to those folks out there who may still be running ESXi 5.0 and/or 5.1 hosts: general support ends on the 24th August this year (2016), so it is definitely a good time to consider upgrading to a later version. As an aside, ESXi 5.5 doesn’t see its general support end until 19th September 2018.

For the complete VMware ‘Product Lifecycle Matrix’ check out this pdf here.

The post VMware ESXi 5.0 & 5.1 – End of General Support: August 2016 appeared first on TechHead and was written by Simon Seagrave.

03 Nov 16:25

vCenter appliance database issue

by Gabrie van Zanten

Recently I had an issue in my homelab environment. Because of some power outages, my vCenter Appliance hadn’t been shut down correctly, and now vCenter didn’t start correctly anymore. After some searching I found that the database could not be loaded. In the VMware KBs I couldn’t find anything that fixes the startup of the database itself. Mostly they are about resetting the database, but even though my environment is quite small, I had VSAN running in it and was afraid of what would happen if I connected a clean vCenter to the existing hosts. So I decided to dive in and try to fix it at the database level.

To see what was going on, I first checked the vpxd.log (/var/log/vmware/vpxd/vpxd.log) and found that a login to the database was not possible:

info vpxd[7FF9A8AD97A0] [Originator@6876 sub=vpxdVdb] [VpxdVdb::SetDBType] Logging in to DSN: VMware VirtualCenter with username vc

error vpxd[7FF9A8AD97A0] [Originator@6876 sub=vpxdVdb] [VpxdVdb::SetDBType] Failed to connect to database: ODBC error: (08001) - [unixODBC]Could not connect to the server; --> Connection refused [127.0.0.1:5432].  Retry attempt: 1 ...

Then I wanted to check if the database was running at all. In the database logs (/storage/db/vpostgres/pg_log/postgresql.log) I saw the following lines:

2016-09-10 19:02:12.294 UTC 57d458b4.21d8 0   LOG:  database system was interrupted; last known up at 2016-05-16 22:58:35 UTC

2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  unexpected pageaddr E/C8000000 in log segment 000000010000000E000000CC, offset 0

2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  invalid primary checkpoint record

2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  unexpected pageaddr E/C8000000 in log segment 000000010000000E000000CC, offset 0

2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  invalid secondary checkpoint record

2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   PANIC:  could not locate a valid checkpoint record

2016-09-10 19:02:14.920 UTC 57d458b1.20bf 0   LOG:  startup process (PID 8664) was terminated by signal 6: Aborted

2016-09-10 19:02:14.920 UTC 57d458b1.20bf 0   LOG:  aborting startup due to startup process failure
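When a log like the one above gets long, filtering it down to the most severe entries makes the failing step easier to spot. The following is a minimal sketch of that idea. The line layout it expects is inferred from the excerpt above rather than from PostgreSQL documentation, and the `severe_entries` helper is a hypothetical name.

```python
import re

# Layout assumed from the excerpt: timestamp, session id, a numeric field,
# then "LEVEL:  message". Other log_line_prefix settings will not match.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \w+)\s+"
    r"(?P<session>\S+)\s+(?P<extra>\d+)\s+"
    r"(?P<level>[A-Z]+):\s+(?P<message>.*)$"
)


def severe_entries(log_text, levels=("PANIC", "FATAL")):
    """Return (timestamp, level, message) for lines at the given levels."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            entries.append(
                (match.group("timestamp"), match.group("level"), match.group("message"))
            )
    return entries


# Two lines copied from the log excerpt above.
sample = """\
2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  invalid secondary checkpoint record
2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   PANIC:  could not locate a valid checkpoint record
"""

for timestamp, level, message in severe_entries(sample):
    print(timestamp, level, message)
```

On the appliance, the same function could be fed the contents of the postgresql.log file instead of the `sample` string.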

Some Google assistance on “PANIC:  could not locate a valid checkpoint record” taught me that a checkpoint had probably not been cleared properly because of the unclean shutdown. Suggested solutions talked about using pg_resetxlog, which resets the write-ahead log and other control information of a PostgreSQL database cluster.

** Warning ** I couldn’t find anything on this command in the VMware KBs, so I want to emphasise that the next steps are unsupported, and I expect that resetting the write-ahead log will also cause some data loss. You’re on your own from here :-)

The command line for pg_resetxlog would be:

/opt/vmware/vpostgres/9.3/bin/pg_resetxlog -f  {Location of the database}

First I needed to find out where the database was located. This can be found in /etc/vmware-vpx/embedded_db.cfg at the following line:

EMB_DB_STORAGE='/storage/db/vpostgres'
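Looking the value up by hand is fine for a one-off, but the same lookup can be scripted. The sketch below assumes the file consists of simple shell-style `KEY='value'` lines like the one shown above; the `read_shell_style_cfg` helper is a hypothetical name, and the rest of the file's contents is not known here.

```python
import shlex


def read_shell_style_cfg(text):
    """Parse KEY='value' lines into a dict, ignoring blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips the surrounding quotes the way a shell would.
        parts = shlex.split(value)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


# The one line quoted above, used as sample input.
sample_cfg = "EMB_DB_STORAGE='/storage/db/vpostgres'\n"
print(read_shell_style_cfg(sample_cfg)["EMB_DB_STORAGE"])
```

On the appliance, the text would come from reading /etc/vmware-vpx/embedded_db.cfg rather than from `sample_cfg`.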

Then when running the pg_resetxlog command, I received an error:

/opt/vmware/vpostgres/9.3/bin/pg_resetxlog -f  /storage/db/vpostgres

You must run pg_resetxlog as the PostgreSQL superuser

Hmm, the superuser? When looking at the directory contents of the /storage/db/vpostgres directory, I saw the user vpostgres had rights on this directory. So I tried running the command as the vpostgres user:

su vpostgres -s /bin/sh

/opt/vmware/vpostgres/9.3/bin/pg_resetxlog -f  /storage/db/vpostgres

This returned: Transaction log reset

I then tried to start vpxd again ( service vmware-vpxd start ) but again it took a lot of time. I could then see in the logs that it was waiting for services on port 8089 and since I had stopped and started a number of services during my troubleshooting, I decided to just reboot the appliance. After the reboot, vCenter was up and running again and I could reconnect without any issues.

See full post at: vCenter appliance database issue

06 Oct 20:50

Azure SQL Database new performance level

by James Serra

A new performance level for Azure SQL Database was recently announced, called P15. This new offering is more than twice as powerful as the next best offering, P11. P15 offers 4,000 database transaction units (DTUs), where P11 offered 1,750 DTUs. Also, the max concurrent workers and concurrent logins increased from 2,400 to 6,400; the max concurrent sessions stayed the same at 32,000; and the max In-memory OLTP storage increased from 14GB to 32GB.
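For a quick sense of scale, the figures quoted above can be turned into growth factors. This is just arithmetic on the numbers in the paragraph; the `growth_factor` helper is a name chosen for illustration.

```python
# Figures quoted in the paragraph above, as (P11, P15).
limits = {
    "DTUs": (1750, 4000),
    "max concurrent workers/logins": (2400, 6400),
    "max In-Memory OLTP storage (GB)": (14, 32),
}


def growth_factor(old, new):
    """How many times larger the new limit is, rounded to two decimals."""
    return round(new / old, 2)


for name, (p11, p15) in limits.items():
    print(f"{name}: {p11} -> {p15} ({growth_factor(p11, p15)}x)")
```

Each of the three limits grows by a factor between two and three, which matches the "more than twice as powerful" description.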

More info:

Azure SQL Database new premium performance level P15 generally available

06 Oct 20:50

SQLSweet16!, Episode 6: DBCC CHECKDB with MAXDOP

by Sanjay Mishra

Reviewed By: Dimitri Furman

DBCC CHECKDB is a common database maintenance task. It can take up a significant amount of system resources and can impact the performance of the production workload. There are some very good articles on the web on optimizing performance of DBCC CHECKDB and minimizing performance impact. SQL Server 2016 (and now backported to SQL Server 2014 SP2) provides another lever to manage resources consumed by DBCC CHECKDB. Now you can apply a MAXDOP option to the DBCC CHECKDB command (and to DBCC CHECKTABLE and DBCC CHECKFILEGROUP commands as well).

When MAXDOP is not specified with DBCC CHECKDB, the command uses the instance level “max degree of parallelism” configuration option. If the instance level configuration is 0 (default), DBCC CHECKDB could employ all the processors on the server and consume lots of resources, leaving very little room for the application workload. When a lower MAXDOP is used, fewer resources are used, but CHECKDB takes longer to finish.

The syntax of specifying MAXDOP to DBCC CHECKDB is pretty simple:

DBCC CHECKDB WITH MAXDOP = 4

Note that this command respects the MAX_DOP value that may be specified for the Resource Governor workload group used for the session running the command. If the MAXDOP value specified in the DBCC CHECKDB command is greater than the one in the Resource Governor configuration, then the latter will be used.
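The precedence rules described so far can be summarized as a toy model. This is a sketch of the rules as stated in this post, not SQL Server's actual scheduling logic: the `effective_maxdop` function is hypothetical, and treating 0 as "all logical CPUs" and a Resource Governor value of 0 as "no cap" are simplifying assumptions.

```python
def effective_maxdop(requested, instance_setting, rg_max_dop, logical_cpus):
    """Toy model of which degree of parallelism DBCC CHECKDB ends up with.

    requested        -- MAXDOP given on the command, or None if omitted
    instance_setting -- instance-level "max degree of parallelism"
    rg_max_dop       -- MAX_DOP of the Resource Governor workload group (0 = no cap, assumed)
    logical_cpus     -- processors available on the server
    """
    # Without an explicit MAXDOP, the instance-level setting applies.
    dop = instance_setting if requested is None else requested
    # Assumption: 0 means "use all processors".
    if dop == 0:
        dop = logical_cpus
    # A lower Resource Governor value wins over the requested value.
    if rg_max_dop and dop > rg_max_dop:
        dop = rg_max_dop
    return dop


print(effective_maxdop(4, 0, 0, 24))     # explicit MAXDOP = 4, no cap
print(effective_maxdop(None, 0, 0, 24))  # no MAXDOP, instance default of 0
print(effective_maxdop(8, 0, 4, 24))     # workload group caps the request
```

The third call shows the case from the note above: a request of 8 under a workload group limited to 4 ends up at 4.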

Figure 1 shows the elapsed time and CPU percentage for a DBCC CHECKDB test with and without MAXDOP.

Figure 1: DBCC CHECKDB with and without MAXDOP

In the above test, the server has the default MAXDOP setting of 0. The server has 24 cores and the database size is about 190 GB. This shows that as the MAXDOP for the DBCC CHECKDB command is lowered from 0 (meaning all 24 cores) to 4, the time it takes to run increases from about 400 seconds to about 1100 seconds, while average CPU utilization drops from about 70% to about 10%, making the impact of DBCC CHECKDB on the application workload nearly negligible. Your mileage will vary, depending upon your hardware configuration.
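The trade-off in this test can be made concrete with some rough arithmetic. The sketch below uses only the approximate figures quoted above; the `core_seconds` estimate (elapsed time multiplied by the busy share of the machine) is a simplification introduced here, and it assumes the reported CPU utilization is averaged across all 24 cores.

```python
CORES = 24  # the test server described above

# Approximate figures from the paragraph above:
# (elapsed seconds, average CPU utilization in percent)
runs = {
    "MAXDOP 0": (400, 70),
    "MAXDOP 4": (1100, 10),
}


def core_seconds(elapsed_s, cpu_percent, cores=CORES):
    """Rough total CPU work: elapsed time x share of the machine kept busy."""
    return elapsed_s * cpu_percent / 100 * cores


slowdown = runs["MAXDOP 4"][0] / runs["MAXDOP 0"][0]
cpu_relief = runs["MAXDOP 0"][1] / runs["MAXDOP 4"][1]

print(f"elapsed time grows {slowdown}x, average CPU load shrinks {cpu_relief}x")
for name, (elapsed_s, cpu_percent) in runs.items():
    print(f"{name}: about {core_seconds(elapsed_s, cpu_percent):.0f} core-seconds")
```

In other words, the run takes a little under three times as long, while the load it places on the server at any given moment is about one seventh of what it was.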

 

06 Oct 20:50

SQL Saturdays and Why We Have Them

by Tim Radney

There has been some recent controversy over SQL Saturdays after PASS HQ announced some new changes. The changes introduced a new 600-mile radius for SQL Saturdays held on the same day, an expansion from the previous 400-mile rule, and reduced the PASS sponsorship from $500 per event to $250 per event, available only to events that are in financial need. Originally the new rules also extended the 600-mile rule to the Saturday before and after the event. The community was quick to point out how that would have impacted previous events, and PASS HQ has removed the week-before-and-after restriction.

With the popularity of the SQL Saturdays in the US, some event locations are finding it difficult to find sponsors for the event. I can understand this issue. I have helped organize numerous SQL Saturdays ranging from 100 attendees to upwards of 700. In the early days, there were fewer events and it seemed like every sponsor wanted to be at each one. That enabled organizers to be able to offer speakers and organizers event shirts, host a speaker dinner, and provide various other swag for the event. As popularity of the events grew, sponsors realized they couldn’t keep sending people to each one and that their budgets could only stretch so far. Organizers have started feeling the impact and are having to start looking elsewhere for sponsors as well as looking at their budgets.

Something that current and new organizers should consider is that all that extra stuff is just stuff. The main purpose of a SQL Saturday is to provide training to your local area, grow your local user group, and help grow new speakers. As a speaker at nearly 40 SQL Saturdays, I have always enjoyed the speaker dinner as a way of networking and hanging out with other speakers, but I would gladly pay for my own dinner at those events. The event organizer should not feel any pressure to feed the speakers the night before. If they would like to organize a place for us all to meet for dinner, that would be fantastic. Speaker shirts have been a big deal to many speakers, especially new speakers starting out. If the budget allows for these, then great; if not, do not feel obligated to provide a shirt. Many organizers feel they should get the speakers a gift. That is not necessary either: a handwritten thank-you note is more meaningful than a shirt, coffee mug, or Amazon gift card.

Smaller events can be held on a very small budget, especially if you can secure the venue for free.

I organize and run SQL Saturday Columbus GA and have helped organize SQL Saturday Atlanta since 2011. Atlanta is a great market and we have been very fortunate with sponsors year after year. In Columbus GA, things are very different. Sponsorship dollars are much more difficult to find in Columbus and, as a result, we keep things more “grass roots”. In Columbus GA, our event provides:

  • A venue – Free
  • Lanyards and name badge holders – $100
  • A nice variety of sessions thanks to our amazing speakers – Free
  • Lunch for our volunteers (attendees opt to pay for their own lunch) – $400
  • Coffee and donuts in the morning – $300
  • Speaker dinner – $500
  • Random snacks and drinks – $300

I hope more organizers will realize that they can put on a great event on a very small budget. SQL Saturday Columbus GA is fortunate to have a free venue, to attract around 100 attendees year after year, and to have the support of the Atlanta MDF. Our event costs just over $1500 and also generates a slight surplus to fund our user group for the year.

In 2017, an approach I plan to take with sponsors is to offer a $100 Community sponsor level. This will be for local businesses that want to help support the IT initiative without having to spend a lot of money: those who want to show support and get their name out there, but who don’t really need or care for the opt-in list or a table at the event. If I can sell 5 to 10 at that level, it will cover the majority of my event cost.
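For organizers who want to run the same numbers for their own event, here is a small budgeting sketch. The line items are the ones from the Columbus GA list above; the `sponsor_coverage` helper is a name introduced here for illustration.

```python
# Line items from the Columbus GA list above, in US dollars.
costs = {
    "Venue": 0,
    "Lanyards and name badge holders": 100,
    "Sessions": 0,
    "Lunch for volunteers": 400,
    "Coffee and donuts": 300,
    "Speaker dinner": 500,
    "Random snacks and drinks": 300,
}

total_cost = sum(costs.values())


def sponsor_coverage(count, level=100):
    """Share of the listed costs covered by `count` sponsors at `level` dollars each."""
    return count * level / total_cost


print(f"Listed costs: ${total_cost}")
for count in (5, 10):
    print(f"{count} Community sponsors cover {sponsor_coverage(count):.0%} of the listed costs")
```

Swapping in your own line items and sponsor level shows quickly how many small sponsors a modest event needs.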

About me:

  • Attended my first SQL Saturday in 2010
  • Started speaking in 2011
  • Spoken at 38 SQL Saturdays
  • Helped organize 12 SQL Saturdays
  • Chapter Leader – Columbus GA SQL Server Users Group
  • PASS Regional Mentor

 

06 Oct 20:49

Migrating data to Azure SQL Data Warehouse in practice

by Rangarajan Srirangam (AzureCAT)

Authors: Rangarajan Srirangam, Mandar Inamdar

Contributors and Reviewers: John Hoang, Sanjay Mishra, Alexei Khalyako, Sourabh Agarwal, Osamu Hirayama, Shiyang Qiu

Overview: Migrate data to Azure SQL Data Warehouse

Azure SQL Data Warehouse is an enterprise-class, distributed database, capable of processing massive volumes of relational and non-relational data. It can deploy, grow, shrink, and pause in seconds. As an Azure service, Azure SQL Data Warehouse automatically takes care of software patching, maintenance, and backups. Azure SQL Data Warehouse uses the Microsoft massive parallel processing (MPP) architecture. MPP was originally designed to run large on-premises enterprise data warehouses. For more information on Azure SQL Data Warehouse, see What is Azure SQL Data Warehouse?

This article focuses on migrating data to Azure SQL Data Warehouse with tips and techniques to help you achieve an efficient migration. Once you understand the steps involved in migration, you can practice them by following a running example of migrating a sample database to Azure SQL Data Warehouse.

Migrating your data to Azure SQL Data Warehouse involves a series of steps. These steps are executed in three logical stages: Preparation, Metadata migration and Data migration.

Figure 1: The three logical stages of data migration

In each stage, the tasks to be executed involve the on-premises database system, the on-premises local storage, the network connecting the local system to Azure (either internet or a dedicated circuit) and Azure SQL Data Warehouse. This results in a physical data movement from the source database to Azure as shown below. (These steps are similar when moving data to Azure SQL Data Warehouse from any other source system that is in the cloud rather than on-premises.)

Figure 2: Physical data movement from the source database to Azure

Steps in the preparation stage start at the source database, where you choose the entities and attributes to migrate. You allocate local storage for further steps to come, establish a network to Azure, create a storage account and create an instance of Azure SQL Data Warehouse on Azure.

Metadata migration involves compatibility assessment and corrections, exporting the metadata, copying the metadata from the source system to Azure, and importing the metadata onto Azure SQL Data Warehouse.

Data migration involves making data-level compatibility changes, if any, filtering and extracting the data to migrate, performing format conversions on the extracted data as necessary, compressing the data, copying the data to Azure, loading the transferred data, and doing post-load transformations and optimizations.

These steps are illustrated in the diagram below. The steps result in a logical flow from top to bottom and a physical flow from left to right.

(Arrows indicate a dependency:  the latter step depends on the successful completion of former steps)


Figure 3: Data migration process that results in a logical flow from top to bottom and a physical flow from left to right

If the volume of the data to migrate is large, some steps can be time consuming. These steps are rate-determining because they influence the overall migration time. Such steps are shaded in color.

Some migration steps may be optional depending on the size of the data, the nature of the network, and the tools and services used for migration. Optional steps are shown with dotted lines.

Example

To practice and understand the steps, you can follow a running example that migrates a sample database to Azure SQL Data Warehouse. To try out the sample, you’ll need:

  • On Azure:
    • Azure subscription
    • Azure storage account
    • Azure SQL Data Warehouse database
  • On a local computer:
    • The latest SQL Server 2016 build installed: Download Link.
      • Note: The sample database accompanying this document is a backup created from SQL Server 2016. You need a version of SQL Server 2016 to restore it. The steps described in this document can also be applied to a database created in an earlier version of SQL Server, such as 2012 or 2014.
    • SQL Server Management Studio, July 2016 release or later.
    • The AdventureWorksDW Sample database version used in this example restored on to the SQL Server 2016 instance:  Download Link
    • A gzip compression utility. (7-Zip / other tools / use the custom code snippet later in the article)
    • The AzCopy Tool installed: Download Link.
    • The Azure Storage Explorer Tool installed: Download

Choose a migration approach

The data migration steps usually affect the performance, maintainability and reliability of the migration. Approaches for migrating data to Azure SQL Data Warehouse can be classified based on where the data migration is orchestrated from, and based on whether the migration operations are individualized or combined.

  • Source controlled or Azure controlled:
    • Source controlled: Here the logic for the data export, transfer and import steps runs mostly from the source data system. Source-controlled migrations can reuse existing compute and storage resources at the source system for some of the migration steps. Source-controlled migrations don’t require connectivity to Azure for some of the steps. Source-controlled migrations may use custom scripts and programs or ETL tools like SSIS run from the source database server.
    • Azure controlled: Here the logic for the data export, transfer and import steps runs mostly from Azure. Azure-controlled migrations aim to reduce non-Azure assets for greater maintainability, and to do on-demand migrations by spinning up or scheduling Azure resources as needed. Azure-controlled migrations require connectivity from Azure to the source system to export data. Azure-controlled migrations may run migration logic in virtual machines running on Azure, with the virtual machines being allocated and deallocated on demand.
  • Differentiated or integrated:
    • Differentiated approach: Here the data export, transfer and import are distinctly executed with each reading from or writing to intermediate files. File compression is invariably used to reduce the cost and time in transferring files from the source system to Azure. Compressed files are transferred to Azure-based storage before import. When the connectivity from source to Azure is lower in bandwidth or reliability, this approach may turn out to be more feasible. Differentiated approaches are typically realized by custom programs and scripts working independently, using tools such as bcp.exe, AzCopy, and compression libraries.
    • Integrated approach: Here the data export, transfer and import are combined, and entities are transferred directly from the source data system to Azure SQL Data Warehouse with no intermediate files created. This approach has fewer moving pieces and tends to be more maintainable. However, it does not compress data in bulk, and can result in slower data transfer to Azure. It needs good connectivity from the source to Azure for repeatable and reliable migrations. Integrated approaches are typically realized by ETL tools like SSIS or with Azure Data Factory and the Data Management Gateway, an agent installed on-premises that enables data movement from on-premises systems to Azure. Refer to the documentation on moving data to Azure with Data Management Gateway for more information.

It’s possible to use a hybrid approach, where operations are partly controlled from the source and partly from Azure. With Data Factory and the Data Management Gateway, you can also build data pipelines that perform one or more operations of the differentiated approach, for example, moving data from SQL Server to the file system or blob storage, and moving blobs from blob storage to Azure SQL Data Warehouse.

Often the speed of migration is an overriding concern compared to ease of setup and maintainability, particularly when there’s a large amount of data to move. When optimizing purely for speed, a source-controlled, differentiated approach works best: use bcp to export data to files, move the files efficiently to Azure Blob storage, and use the Polybase engine to import from blob storage.

Example:

In our running example, we choose the Source controlled and Differentiated approach, as it favors speed and customizability.

Note: You can also migrate the AdventureWorksDW sample Database to Azure SQL Data Warehouse by the other strategies, using SSIS or Azure Data Factory.

Preparation steps

Source data system: preparation

On the source, establish connectivity to the source data system, and choose which data entities and which attributes to migrate to Azure SQL Data Warehouse. It’s best to leave out entities and objects that aren’t going to be processed on Azure SQL Data Warehouse. Examples of these are log or archival tables and temporarily created tables.

Tip: Don’t migrate more objects than you need. Moving unnecessary data to cloud and having to purge data and objects on Azure SQL Data Warehouse can be wasteful. Depending on the sizes of unused objects, the cost and time of the data export, local transformations, and transfer increase.

Example

In our example, we migrate all the tables in the AdventureWorks DW database, since it’s a relatively small database.

Local storage: preparation

If the exported data will be stored locally prior to transfer (the differentiated approach), on the local storage system, ensure, at a minimum, that there is sufficient space to hold all of the exported data and metadata, the locally transformed data, and the compressed files. For better performance, use a storage system with sufficient independent disk resources allowing read/write options with little contention.

If the data transfer will be directly from the source data system to Azure SQL Data Warehouse (the Integrated approach), skip this step.
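As a rough illustration of the space check described above for the differentiated approach, here is a minimal Python sketch. It assumes that the format-converted copy is about the same size as the export and that the compression ratio is a guess you replace with a measured value; the function names are hypothetical and not part of any tool mentioned in this article.

```python
import shutil


def required_local_space(exported_bytes, compression_ratio=7.0):
    """Estimate the bytes needed for exported, converted, and compressed files.

    Assumptions: the converted copy is about as large as the export, and
    compression_ratio is a placeholder to be replaced by a measured value.
    """
    converted_bytes = exported_bytes
    compressed_bytes = int(exported_bytes / compression_ratio)
    return exported_bytes + converted_bytes + compressed_bytes


def has_enough_space(path, exported_bytes, compression_ratio=7.0):
    """Return True if the volume holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_local_space(exported_bytes, compression_ratio)
```

Running the check before the export starts avoids discovering a full disk halfway through a long extraction.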

Example

In our example, you need about 500 MB of free space on the SQL Server Machine to hold the exported, format converted, and compressed data files for the AdventureWorksDW sample database tables.

Network: preparation

You can establish a connection to Azure via the public internet or using dedicated connectivity. A dedicated connection can provide better bandwidth, reliability, latency, and security compared to the public internet. On Azure, dedicated networking is offered through the ExpressRoute service. Depending on the migration approach, the connectivity established can be used to move data to Azure SQL Data Warehouse directly, or to move intermediate files to Azure storage.

Tip: If the size of the data to transfer is large, or you want to reduce the time it takes to transfer data or improve the reliability in data transfer, try ExpressRoute.  

Example

In our example, we transfer the data over the public internet to an Azure Storage location in the same region as the Azure SQL Data Warehouse because the data to transfer is relatively small. This requires no special network establishment step, but make sure that you’re connected to the internet during the following steps:

  • Azure preparation
  • Metadata copy to Azure and metadata import
  • The steps in the Data Migration section involving data transfer and import

Azure preparation

Prepare to receive the data on Azure:

  • Choose an Azure region where the Azure SQL Data Warehouse is available.
  • Create the Azure SQL Data Warehouse database.
  • Create a storage account.
  • Prepare the Azure SQL Data Warehouse for data Import.

Tip: For speedy data movement, choose the Azure region closest to your data source that also has Azure SQL Data Warehouse, and create a storage account in the same region.

Example

  • To find the regions where Azure SQL Data Warehouses are located, refer to Azure Services by Region. Choose the region closest to you.
  • Create an Azure SQL Data Warehouse and database by following the steps in Create an Azure SQL Data Warehouse. Connect to the Azure SQL Data Warehouse by following the steps in Query Azure SQL Data Warehouse (Visual Studio).
  • Create a storage account in the same Azure region where you created the Azure SQL Data Warehouse using the steps described in About Azure storage Accounts. A locally redundant storage (LRS) is sufficient for this example.
  • Create at least one container in the storage account. To continue with this example, you’ll need the following:
    • Name of the container you created above.
    • Name of the storage account you created.
    • Storage access key for the storage account. You can get this by following the steps under “View and copy storage access keys” in About Azure storage Accounts.
    • Server name, user name, and password for the Azure SQL Data Warehouse.
  • Prepare the Azure SQL Data Warehouse for data import: For fast, parallel data imports, we choose Polybase within Azure SQL Data Warehouse to load data. To prepare the target database for import, use the container name, storage account name, and storage access key gathered above.

Sample TSQL commands for this preparation are as follows:

  1. Create a master key
IF NOT EXISTS (SELECT * FROM sys.symmetric_keys)
CREATE MASTER KEY;
  2. Create a database scoped credential
IF NOT EXISTS (SELECT * FROM sys.database_credentials WHERE name = 'AzSqlDW_AzureStorageCredentialPolybase')
CREATE DATABASE SCOPED CREDENTIAL AzSqlDW_AzureStorageCredentialPolybase
WITH IDENTITY = 'AzSqlDW_Identity', SECRET = '<YourStorageAccountKey>';
  3. Create an external data source
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'AzSqlDW_AzureBlobStorage')
CREATE EXTERNAL DATA SOURCE AzSqlDW_AzureBlobStorage WITH (TYPE = HADOOP,
LOCATION =
'wasbs://<YourStorageContainerName>@<YourStorageAccountName>.blob.core.windows.net',
CREDENTIAL = AzSqlDW_AzureStorageCredentialPolybase);
  4. Create an external file format
IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'AzSqlDW_TextFileGz')
CREATE EXTERNAL FILE FORMAT AzSqlDW_TextFileGz WITH (FORMAT_TYPE = DelimitedText,
FORMAT_OPTIONS (FIELD_TERMINATOR = '|'),
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec');

In the above TSQL code replace YourStorageAccountName, YourStorageAccountKey and YourStorageContainerName with your corresponding values.

Tip: To prepare for a parallelized data import with Polybase, create one folder in the storage container for each source table; the folder name could be the same as the table name. This allows you to split the data from large tables into several files and do a parallel data load into the target table from the multiple blobs in the container. You can also create a subfolder hierarchy based on how the source table data is grouped. This gives you control over the granularity of your load. For example, your subfolder hierarchy could be Data/Year/Quarter/Month/Day/Hour. This is also handy for incremental loads, for example, when you want to load a month of new data.
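To make the folder hierarchy in the tip concrete, the following Python sketch builds a blob folder path of the form Table/Year/Quarter/Month/Day/Hour from a timestamp. The function name, the `depth` parameter, and the table name `FactSales` are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Number of path segments to keep for each granularity (table name included).
_LEVELS = {"year": 2, "quarter": 3, "month": 4, "day": 5, "hour": 6}


def blob_folder(table_name, ts, depth="hour"):
    """Build a folder path such as FactSales/2016/Q3/08/15/09 for one load unit."""
    quarter = (ts.month - 1) // 3 + 1
    parts = [
        table_name,
        f"{ts.year:04d}",
        f"Q{quarter}",
        f"{ts.month:02d}",
        f"{ts.day:02d}",
        f"{ts.hour:02d}",
    ]
    return "/".join(parts[:_LEVELS[depth]])


if __name__ == "__main__":
    print(blob_folder("FactSales", datetime(2016, 8, 15, 9), depth="month"))
```

Stopping the path at the month level, as in the usage line, gives one folder per monthly incremental load.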

Metadata migration

Compatibility checks and changes

The source objects to migrate need to be compatible with Azure SQL Data Warehouse. Resolve any compatibility issues at the source before starting migration.

Tip: Do compatibility assessment and corrections as the first step in migration.

Tip: Use the Data Warehouse Migration Utility (Preview) to check compatibility issues—even do a quick migration for small amounts of data.

Note: The Data Warehouse Migration Utility can also help automate the migration itself. Note that the tool does not compress files, move data to Azure storage or use Polybase for import. Certain other steps, such as the “Azure Preparation” steps and the UTF 8 conversion are not supported. The tool generates bcp scripts that will move your data first to flat files on your server, and then directly into your Azure SQL Data Warehouse. The tool may be simple to use for small amounts of data.

A list of SQL Server functionality that is not present in Azure SQL Data Warehouse can be found in the migration documentation. In each table, make sure:

  • There are no incompatible column types.
  • There are no user-defined column types.

In addition, when using Polybase for data loading, check the following limitations:

  • The total size of all columns is <= 32767 bytes
  • There are no varchar(max), nvarchar(max), varbinary(max) columns
  • The maximum length of individual columns is <= 8000 bytes

Note: Azure SQL Data Warehouse currently supports rows larger than 32K and data types over 8K. Large row support adds support for varchar(max), nvarchar(max) and varbinary(max). In this first iteration of large row support, there are a few limits in place which will be lifted in future updates. In this update, loads for large rows are supported only through Azure Data Factory (with BCP), Azure Stream Analytics, SSIS, BCP or the .NET SQLBulkCopy class. PolyBase support for large rows will be added in a future release. This article demonstrates data load using Polybase.
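The Polybase limits listed above can be checked mechanically once you have the declared byte length of each column. The Python sketch below is one way to do that; the function name and the input shape are assumptions, and a length of -1 stands for a (max) type, matching how the max_length column is tested in the compatibility query in this section.

```python
MAX_ROW_BYTES = 32767     # total size of all columns
MAX_COLUMN_BYTES = 8000   # size of any single column


def polybase_violations(columns):
    """Return a list of human-readable problems for one table.

    columns: list of (column_name, declared_bytes) pairs, where
    declared_bytes == -1 marks a varchar(max)/nvarchar(max)/varbinary(max) column.
    """
    problems = []
    total = 0
    for name, size in columns:
        if size == -1:
            problems.append(f"{name}: (max) types cannot be loaded with Polybase")
            continue
        if size > MAX_COLUMN_BYTES:
            problems.append(f"{name}: {size} bytes exceeds {MAX_COLUMN_BYTES}")
        total += size
    if total > MAX_ROW_BYTES:
        problems.append(f"row: {total} bytes exceeds {MAX_ROW_BYTES}")
    return problems
```

An empty list means the table passes these three checks; it does not cover incompatible or user-defined types, which still need the query-based check.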

Example

Check the tables in the sample database (except for the total column size) for compatibility using the following query:

SELECT t.[name],c.[name],c.[system_type_id],c.[user_type_id],y.[is_user_defined],y.[name]
FROM sys.tables  t
JOIN sys.columns c ON t.[object_id] = c.[object_id]
JOIN sys.types y ON c.[user_type_id] = y.[user_type_id]
WHERE y.[name] IN
('geography','geometry','hierarchyid','image','ntext','numeric','sql_variant'
,'sysname','text','timestamp','uniqueidentifier','xml')
OR (y.[name] IN (  'varchar','varbinary') AND ((c.[max_length] = -1) or (c.max_length > 8000)))
OR (y.[name] IN (  'nvarchar') AND ((c.[max_length] = -1) or (c.max_length > 4000)))
OR y.[is_user_defined] = 1;

When you run this query against the sample database, you’ll find that the DatabaseLog table is incompatible. There are no incompatible column types, but the TSQL column is declared as nvarchar(4000), which has a maximum length of 8000 bytes.

To resolve the incompatibility, find the actual sizes of this and other variable columns in the DatabaseLog table and their total length using the following TSQL queries:

SELECT MAX(DATALENGTH([DatabaseUser])),MAX(DATALENGTH([Event])),MAX(DATALENGTH([Schema])),MAX(DATALENGTH([Object])),MAX(DATALENGTH([TSQL]))
FROM DatabaseLog
SELECT MAX(DATALENGTH([DatabaseUser])) + MAX(DATALENGTH([Event]))+ MAX(DATALENGTH([Schema])) + MAX(DATALENGTH([Object])) + MAX(DATALENGTH([TSQL]))
FROM DatabaseLog

You’ll find that the actual maximum data length of the TSQL column is 3034. The total of the maximum data lengths of the columns is 3162. These are within the maximum allowed column lengths and row lengths in Azure SQL Data Warehouse. No data needs to be truncated to meet the compatibility requirement, and we can instead modify the TSQL column as nvarchar(3034) in the exported schema.

Similarly, the sum of the declared column lengths in the DimProduct table exceeds the maximum allowed row length. This can be resolved in a similar way.

Metadata export

After you’ve made the necessary changes for Azure SQL Data Warehouse compatibility, export your metadata (schema) so that the same schema can be imported onto Azure SQL Data Warehouse. Script or otherwise automate the metadata export so that it can be done repeatedly without errors. A number of ETL tools can export metadata for popular data sources. Note that some further tasks are needed after the export. First, when creating tables in Azure SQL Data Warehouse, you need to specify the distribution type of the table (ROUND_ROBIN or HASH). Second, if you are using Polybase to import data, you need to create external tables that refer to the locations of the exported files for each table.

Tip: Refer to the SQLCAT guidance for choosing the type of distributed table in Azure SQL Data Warehouse Service.

Note that Azure SQL Data Warehouse does not support a number of common table features, such as primary keys, foreign keys, and unique constraints. For a full list, please refer to Migrate your schema to Azure SQL Data Warehouse.

Example

The table creation statement for the AdventureWorksDWBuildVersion table compatible with Azure SQL Data Warehouse is as follows:

IF NOT EXISTS (SELECT * FROM sys.tables WHERE schema_name(schema_id) = 'dbo' AND name='AdventureWorksDWBuildVersion')
CREATE TABLE [dbo].[AdventureWorksDWBuildVersion]([DBVersion] nvarchar(100) NOT NULL,[VersionDate] datetime NOT NULL) WITH(CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN)

A full list of sample table creation commands for the AdventureWorks database can be found here.

The external table creation statement for the AdventureWorksDWBuildVersion table compatible with Azure SQL Data Warehouse is as follows:

IF NOT EXISTS (SELECT * FROM sys.tables WHERE schema_name(schema_id) = 'dbo' AND name='AdventureWorksDWBuildVersion_External') 
CREATE EXTERNAL TABLE [dbo].[AdventureWorksDWBuildVersion_External]([DBVersion] nvarchar(100) NOT NULL,[VersionDate] datetime NOT NULL)
WITH(LOCATION = '/dbo.AdventureWorksDWBuildVersion.UTF8.txt.gz', DATA_SOURCE = AzSqlDW_AzureBlobStorage, FILE_FORMAT = AzSqlDW_TextFileGz);
  • In the part of the statement starting from the WITH keyword, you need to provide values for the LOCATION, DATA_SOURCE and FILE_FORMAT parameters.
    • The value of the LOCATION parameter should be the path where the data file for the table will reside on Azure blob storage.
    • The value of the DATA_SOURCE parameter should be the name of the data source created in the “Prepare the Azure SQL Data Warehouse for data import” step of “Azure preparation”.
    • The value of the FILE_FORMAT parameter should be the name of the file format created in the “Prepare the Azure SQL Data Warehouse for data import” step of “Azure preparation”.

Note: /dbo.AdventureWorksDWBuildVersion.UTF8.txt.gz refers to a file location relative to the Azure storage container created under “Azure preparation”. This file does not exist yet; it will be created during data export. So you can’t execute the external table creation commands just yet.

A full list of sample external table creation commands for the AdventureWorks database can be found here.
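Because the external table statements follow the same pattern for every table, they lend themselves to scripting. The Python sketch below generates a statement in the shape of the example above; the function name is hypothetical, the default data source and file format names are the ones created under “Azure preparation”, and the file naming convention is taken from the example.

```python
def external_table_ddl(schema, table, column_defs,
                       data_source="AzSqlDW_AzureBlobStorage",
                       file_format="AzSqlDW_TextFileGz"):
    """Build a CREATE EXTERNAL TABLE statement for one exported table.

    column_defs: list of column definitions, for example
    ["[DBVersion] nvarchar(100) NOT NULL"].
    """
    columns = ",".join(column_defs)
    # Path relative to the storage container, following the example's convention.
    location = f"/{schema}.{table}.UTF8.txt.gz"
    return (
        f"CREATE EXTERNAL TABLE [{schema}].[{table}_External]({columns})\n"
        f"WITH(LOCATION = '{location}', DATA_SOURCE = {data_source}, "
        f"FILE_FORMAT = {file_format});"
    )


if __name__ == "__main__":
    print(external_table_ddl(
        "dbo", "AdventureWorksDWBuildVersion",
        ["[DBVersion] nvarchar(100) NOT NULL", "[VersionDate] datetime NOT NULL"],
    ))
```

Generating the statements from one list of column definitions keeps the internal and external table schemas from drifting apart.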

Metadata copy to Azure and metadata import

Since the metadata is usually small in size and the format well known, you don’t need further optimization or format conversions. Use SQL Server Data Tools (SSDT) or SSMS (July 2016 release) to execute the table creation statements against the target Azure SQL Data Warehouse database. To connect to Azure SQL Data Warehouse, specify the server name (of the form YourSQLDWServerName.database.windows.net), user name and database name (not the master database, which is the default) as chosen at the time of creation.

Example

Execute the statements using SSDT or SSMS (July 2016 release) to create the tables on Azure SQL Data Warehouse.

Note: You cannot yet execute the External Table Create statements, as the table data needs to be exported and moved to Azure Blob Storage before you can do this.

Data migration

Data: compatibility changes

In addition to changes to metadata for compatibility, you might need to convert data during extraction for error-free import with Azure SQL Data Warehouse. In importing with Polybase, dates must be in the following formats when the DATE_FORMAT is not specified.

  • DateTime: ‘yyyy-MM-dd HH:mm:ss’
  • SmallDateTime: ‘yyyy-MM-dd HH:mm’
  • Date: ‘yyyy-MM-dd’
  • DateTime2: ‘yyyy-MM-dd HH:mm:ss’
  • DateTimeOffset: ‘yyyy-MM-dd HH:mm:ss’
  • Time: ‘HH:mm:ss’.

Depending on your locale and current date format, you may need to convert date formats during export. Additionally, bcp exports data to field- and row-delimited files, but bcp by itself does not escape delimiters. Choose a delimiter that does not occur in any of the data in the table. Also, if you have used a data type for a column in an Azure SQL Data Warehouse table that is different from the corresponding column in the source table, ensure that during extraction, the data is converted to a format compatible with the target.

Tip: Invalid export files can result in data being rejected by Azure SQL Data Warehouse during import. Preventing these errors saves you from file correction or re-extraction and retransfer efforts.
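The two precautions above, picking a delimiter that never occurs in the data and writing dates in the default Polybase format, can be sketched in Python as follows. The candidate delimiter list and both function names are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical candidates, tried in order.
CANDIDATE_DELIMITERS = ["|", "\t", "~", "^"]


def choose_delimiter(rows, candidates=CANDIDATE_DELIMITERS):
    """Return the first candidate that occurs in no string value of any row."""
    for delim in candidates:
        used = any(
            delim in value
            for row in rows
            for value in row
            if isinstance(value, str)
        )
        if not used:
            return delim
    raise ValueError("every candidate delimiter occurs in the data")


def format_datetime(value):
    """Format a datetime as yyyy-MM-dd HH:mm:ss, the default DateTime format."""
    return value.strftime("%Y-%m-%d %H:%M:%S")
```

Scanning a sample of the rows is usually not enough; the check is only reliable when it sees every value that will be exported.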

The most common mistakes include:

  • Malformed data files.
  • Un-escaped or missing field/row delimiters.
  • Incompatible date formats and other representations in extracted files.
  • The order of extracted columns being different from the order in import.
  • The column names or the number of supplied values not matching the target table definition.

When there are individual rows with errors, you can get error messages like the following which will help determine what went wrong:

“Query aborted– the maximum reject threshold (… rows) was reached while reading from an external source: YYY rows rejected out of total ZZZ rows processed. (…) Column ordinal: .., Expected data type: …Offending value:”

Example

If you use an (un-escaped) comma as a field delimiter, you’ll have import errors with a number of tables in the sample database. A field delimiter not found in the tables is the pipe character. You can extract dates to a target format using the CONVERT function. An example follows for one of the tables in the sample database:

SELECT REPLACE([DBVersion],'|','||'),CONVERT(varchar(32), [VersionDate], 121) 
FROM [AzureSQLDWAdventureWorks].[dbo].[AdventureWorksDWBuildVersion]

For a full list of extraction commands, refer to the “Data: export and format conversion” section.

Data: export and format conversion

When you don’t use an ETL tool like SSIS to integrate the steps of export, transfer, and load, or you’re following the differentiated approach in migration as discussed earlier, choose an extraction tool and optionally specify the extraction query to choose columns and filter rows. Data export can be CPU, memory, and IO intensive. To speed up data export, use bulk/batched extraction, parallelize extraction, and scale compute/memory/IO resources as needed.

You can use the bcp utility, which bulk copies data from an instance of Microsoft SQL Server to a data file in a user-specified format. Note that bcp data files don’t include any schema or format information. An independent schema import is essential before you import data generated by bcp on Azure SQL Data Warehouse. bcp can export data in character format (-c option) or Unicode character format (-w option).

Note: Bcp version 13 (SQL Server 2016) supports code page 65001 (UTF-8 encoding). This article demonstrates UTF-8 conversion as earlier versions of bcp did not have this support.

In importing data into Azure SQL Data Warehouse, with Polybase, non-ASCII characters need to be encoded using UTF-8. Hence if your tables have data with extended ASCII characters you need to convert the exported data to UTF-8 before importing. Also, in creating the bcp commands, note the need to escape delimiters, as mentioned in the earlier section.

Tip: If characters in the exported files don’t conform to the expected encoding, data import into Azure SQL Data Warehouse can fail. For example, if you have extended characters in tables, convert the files generated by bcp to UTF-8 programmatically or by using PowerShell commands.

The System.Text.Encoding class in .NET provides support for programmatic conversion between Unicode and UTF-8.

Tip: The speed at which bcp exports data to files depends on a number of factors including command options such as batch_size, packet_size, rows_per_batch, query hints used such as TABLOCK, the extent of parallelism, the number of processing cores and the performance of the IO subsystem. For more information on bcp options, refer to the documentation on the bcp utility.

You can also experiment with parallelizing the process by running bcp in parallel for separate tables, or separate partitions in a single table.

Tip: Export data from large tables into multiple files so that they can be imported in parallel. Decide on a way to filter records based on attributes to implement multi-file export so that batches of records go into different files.

When the connection to Azure has network reliability issues, implementing multi-file export per table for large tables increases the chances of individual file transfers to Azure succeeding.
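One way to implement multi-file export is to generate one bcp command per slice of a large table. The Python sketch below slices by year and only builds the command strings; the table name, date column, paths, and server name in the usage line are hypothetical, and the bcp options are the ones used in this article's example. For brevity the query uses select *, whereas a real export would list the columns and apply the delimiter escaping and date conversion discussed earlier.

```python
def bcp_commands(table, date_column, years, out_dir, file_stem, server,
                 delimiter="|"):
    """Build one bcp queryout command per year so each year lands in its own file."""
    commands = []
    for year in years:
        query = f"select * from {table} where YEAR({date_column}) = {year}"
        out_file = f"{out_dir}/{file_stem}.{year}.txt"
        commands.append(
            f'bcp "{query}" queryout "{out_file}" '
            f'-q -c -t "{delimiter}" -r "\\n" -S {server} -T'
        )
    return commands


if __name__ == "__main__":
    for cmd in bcp_commands("[MyDW].[dbo].[FactSales]", "[OrderDate]",
                            [2015, 2016], "C:/export", "dbo.FactSales", "MyServer"):
        print(cmd)
```

The generated commands are independent of each other, so they can be run in parallel or retried individually.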

Example

A sample bcp command to export one of the tables in the sample database follows:

bcp "select REPLACE([DBVersion],'|','||'),CONVERT(varchar(32), [VersionDate], 121) from [AzureSQLDWAdventureWorks].[dbo].[AdventureWorksDWBuildVersion]" queryout "<YourLocalPathForBcpFiles>/dbo.AdventureWorksDWBuildVersion.txt" -q -c -t "|" -r "\n" -S <YourSQLServerInstance> -T

A full list of sample bcp commands for the sample database can be found here.

After bcp execution is complete, there should be 34 files created on disk, ending with .txt, corresponding to the 34 tables in the sample database.

The sample database has a number of tables with extended characters, so importing the bcp-generated files directly into Azure SQL Data Warehouse can fail. Sample code in C# to convert a file to UTF-8 is as follows:

// Requires: using System.IO;
public void ConvertTextFileToUTF8(string sourceFilePath, string destnFilePath)
{
    // true: detect the source encoding from the byte order mark, if one is present
    using (StreamReader reader = new StreamReader(sourceFilePath, true))
    // StreamWriter writes UTF-8 by default
    using (StreamWriter writer = new StreamWriter(destnFilePath))
    {
        string strLine;
        while ((strLine = reader.ReadLine()) != null)
        {
            writer.WriteLine(strLine);
        }
    }
}

You can also chain the PowerShell Get-Content and Set-Content cmdlets with the -Encoding parameter to change the encoding, as follows:

Get-Content <input_file_name> -Encoding Unicode | Set-Content <output_file_name> -Encoding utf8

We assume that after implementing one of the above approaches, the files in UTF-8 are named with the convention filename.UTF8.txt.  For example, AdventureWorksDWBuildVersion.UTF8.txt.

At the end of this step, there should be 34 UTF-8 encoded files created on disk ending with .UTF8.txt and corresponding to the 34 tables in the sample database.
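For comparison, the same conversion can be sketched in Python. The function name is hypothetical, and the default source encoding of UTF-16 is an assumption that mirrors the PowerShell example; pass the encoding your export actually used (for instance a code page such as cp1252 if the files were written in character format on such a system).

```python
def convert_to_utf8(source_path, dest_path, source_encoding="utf-16"):
    """Re-encode a delimited text file as UTF-8, line by line.

    newline="" keeps the original row terminators unchanged on both sides.
    """
    with open(source_path, "r", encoding=source_encoding, newline="") as reader, \
         open(dest_path, "w", encoding="utf-8", newline="") as writer:
        for line in reader:
            writer.write(line)
```

Processing the file line by line keeps memory use flat, which matters for large exports.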

Data: compression

When transferring large amounts of data to Azure, or when working with networks that are limited in bandwidth or reliability, compression can cut down migration times. Exported files from data sources with text content tend to yield good compression ratios, resulting in significant reductions in file size and transfer time. Delimited files compressed with the gzip compression format can be imported using Polybase (DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec') into Azure SQL Data Warehouse. This way, you don’t need to decompress the files on Azure.

Tip: Note that Polybase supports gzip which is different from the popular Zip format. Choosing an unsupported compression format can result in import failures.

Tip: Create one compressed file for each export file. For easy import logic, avoid putting exported files of multiple tables in the same compressed archive file.

Tip: Split large files—larger than 2 GB—before compression. Each compressed file then has to spend a smaller amount of time on the network. It has a greater chance of getting across without interruption.

A popular tool that supports gzip compression is the 7-Zip compression utility. You can also compress files to the gzip format programmatically. In .NET, support for gzip compression is provided through the GZipStream class in the System.IO.Compression namespace.

Example

Sample code in C# that illustrates how to compress all files in a folder to the gzip format follows:

public static void Compress(string sourceFolderPath, string destnFolderPath)
        {
            string compressedFileName = null;
            string compressedFilePath = null;
            DirectoryInfo dirInfo = new DirectoryInfo(sourceFolderPath);
            foreach (FileInfo fileToCompress in dirInfo.GetFiles())
            {
                using (FileStream originalFileStream = fileToCompress.OpenRead())
                {
                    // Skip hidden files and files that are already gzip-compressed
                    if ((File.GetAttributes(fileToCompress.FullName) &
                     FileAttributes.Hidden) != FileAttributes.Hidden &&
                     fileToCompress.Extension != ".gz")
                    {
                        compressedFileName = Path.GetFileNameWithoutExtension(fileToCompress.FullName) + ".gz";
                        compressedFilePath = Path.Combine(destnFolderPath, compressedFileName);

                        using (FileStream compressedFileStream = File.Create(compressedFilePath))
                        {
                           using (GZipStream compressionStream = new GZipStream(compressedFileStream, CompressionMode.Compress))
                            {
                                originalFileStream.CopyTo(compressionStream);
                            }
                        }
                    }
                }
            }
        }

You can also use 7-Zip or any other compatible compression utility for this purpose.

After completing this step, you can see that the exported files are about 116 MB in size. The compressed files are about 16.5 MB in size—about seven times smaller.

The sample code shown above stores the compressed files with an extension of .gz. For example, dbo.AdventureWorksDWBuildVersion.UTF8.gz.

At the end of this step, there should be 34 compressed files created on disk with the .gz extension, corresponding to the 34 tables in the sample database.
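The tip about splitting large files before compression can be combined with the compression step itself. The Python sketch below splits a delimited file on row boundaries and gzips each part using the standard gzip module; the function name and the part naming scheme are hypothetical, and it assumes rows are terminated by "\n" as in the bcp example.

```python
import gzip


def split_and_compress(source_path, dest_prefix, max_bytes=2 * 1024 ** 3):
    """Split a delimited text file on line boundaries and gzip each part.

    A new part starts when adding the next row would push the uncompressed
    size of the current part past max_bytes. Returns the part paths in order.
    """
    part_paths = []
    part = None
    written = 0
    try:
        with open(source_path, "rb") as source:
            for line in source:
                if part is None or (written > 0 and written + len(line) > max_bytes):
                    if part is not None:
                        part.close()
                    path = f"{dest_prefix}.part{len(part_paths) + 1:03d}.gz"
                    part = gzip.open(path, "wb")
                    part_paths.append(path)
                    written = 0
                part.write(line)
                written += len(line)
    finally:
        if part is not None:
            part.close()
    return part_paths
```

Splitting on row boundaries means every part is a valid delimited file on its own, so a part that fails to transfer can be resent without touching the others.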

Data: transfer to Azure

Improving data transfer rates is a common problem to solve. Using compression and establishing a dedicated network to Azure using ExpressRoute have already been mentioned.

Other good approaches are to do data copies concurrently, execute the copy asynchronously, maintain a log of completed operations and errors, and build in the ability to resume failed transfers. The AzCopy tool is optimized for large-scale copy scenarios. It includes these techniques and many other options. The key features of interest are below:

  • Concurrency: By default, AzCopy starts a number of concurrent operations equal to eight times the number of processor cores you have. The /NC option allows you to change the concurrency level.
  • Resuming: AzCopy uses a journal file to resume the incomplete operation. You can specify a custom journal location with the /Z option.
  • Logging: AzCopy generates a log file by default. You can provide a custom location with the /V option.
  • Restarting from the point of failure: AzCopy builds in a restartable mode that allows restart from the point of interruption.
  • Configurable Source and Destination Type: AzCopy can be used to copy from on-premises to an Azure storage account or from one Azure storage account to another.

Tip: Run one AzCopy instance on one machine. Control the concurrency using the /NC option instead of launching more instances.

Tip: A large number of concurrent operations in a low-bandwidth environment may overwhelm the network connection. Limit concurrent operations based on actual available network bandwidth.

Please read the AzCopy documentation to understand the utility and its parameters.

Tip: For some source locations, network connectivity may be poor, establishing ExpressRoute connectivity may not be possible, and the size of the data to transfer may be large. If data transfer is infeasible in such cases, even with compression and AzCopy, explore the Azure Import/Export service. With this service, you can transfer data to Azure Blob storage using physical hard drives.

Example

You can execute AzCopy after installation from a command prompt using the following syntax:

"<YourAzCopyPath>/AzCopy.exe" /Source:"<YourLocalPathToCompressedFiles>" /Dest:https://<YourStorageAccount>.blob.core.windows.net/<YourStorageContainer> /DestKey:<YourStorageAccountKey> /pattern:*.gz /NC:<YourConcurrencyLevel>

Note the following with respect to the placeholders in the above command:

  • <YourAzCopyPath>: Provide the AzCopy install folder, such as C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy, or modify your PATH variable if you want to avoid specifying the full path.
  • <YourLocalPathToCompressedFiles>: Provide the path to the folder containing the gzip files.
  • <YourStorageAccount>, <YourStorageContainer>, <YourStorageAccountKey>: Provide these based on the storage account created in the Azure Preparation step.
  • /NC:<YourConcurrencyLevel>: Set the value to the number of cores on the source machine on which AzCopy is executed.
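
If you launch AzCopy from a script, the placeholders above can be filled in programmatically. This is a minimal Python sketch under stated assumptions: the helper name build_azcopy_args and its parameters are hypothetical, and it only assembles the options already shown in the command above (/Source, /Dest, /DestKey, /pattern, /NC). It does not run AzCopy itself.

```python
import ntpath
import os


def build_azcopy_args(azcopy_path, source_dir, account, container, key,
                      concurrency=None):
    """Assemble the argument list for the AzCopy command shown above."""
    if concurrency is None:
        # Per the note above: default to the number of cores on this machine.
        concurrency = os.cpu_count() or 1
    dest = f"https://{account}.blob.core.windows.net/{container}"
    return [
        # ntpath always joins with backslashes, matching Windows install paths.
        ntpath.join(azcopy_path, "AzCopy.exe"),
        f"/Source:{source_dir}",
        f"/Dest:{dest}",
        f"/DestKey:{key}",
        "/pattern:*.gz",
        f"/NC:{concurrency}",
    ]
```

The returned list can be handed to subprocess.run(args, check=True) on the source machine. Passing a list rather than a single string leaves the quoting of paths with spaces to the subprocess module.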

If the parameters are supplied correctly, AzCopy will start copying the files and report running progress on the number of files copied and the transfer rate as follows (your transfer rate can be different):

Azcopy finished

After completion, AzCopy will summarize the results as follows (your elapsed time can be different):

azcopy log

Sometimes your copy may get interrupted.

AzCopy maintains a log file and journal file at %LocalAppData%\Microsoft\Azure\AzCopy.

If the journal file does exist, AzCopy will check whether the command line that you input matches the command line in the journal file. If the two command lines match, AzCopy resumes the incomplete operation. If the two command lines don’t match, you’ll be prompted to overwrite the journal file to start a new operation, or to cancel the current operation with a message like the one below:

Incomplete operation with same command line detected at the journal directory "<YourAzCopyLocation>", do you want to resume the operation? Choose Yes to resume, choose No to overwrite the journal to start a new operation. (Yes/No)

At the end of this step, the 34 compressed files should have been transferred to your Azure storage account. You can use the Azure Storage Explorer GUI tool to check whether the files are available in the storage account.

Scripted/Programmatic transfer:

Data: import

PolyBase is the fastest mechanism to import data into Azure SQL Data Warehouse. PolyBase parallelizes loads from Azure Blob storage, reads all the files inside a folder and treats them as one table, supports the gzip compression format and UTF-8 encoding, and uses Azure Blob storage as the storage mechanism. Loading data with PolyBase allows import speed to scale in proportion to the data warehouse units (DWUs) allocated on Azure SQL Data Warehouse. For a more detailed discussion on data loading strategies and best practices, refer to the Azure CAT Guidance on Azure SQL Data Warehouse loading patterns and strategies.

Tip: Choices in the overall migration process contribute to fast loading with PolyBase. These are:

  • Creating a folder for each table.
  • Creating multiple files for each large table.
  • Converting the exported data to UTF-8.
  • Creating multiple compressed files.
  • Compressing the data to the gzip format.
  • Copying the compressed data to Azure Blob storage.
  • Doing the PolyBase preparation steps (including creating external tables).
  • Doing final loading from external tables with PolyBase queries.

If you’ve been following the running example, you’ve practiced most of these techniques already!

Tip: The DWUs allocated for the target Azure SQL Data Warehouse make a difference to the load speed. For more information, refer to the “Data Reader, Writers consideration” section in the Azure CAT guidance on Azure SQL Data Warehouse loading patterns and strategies.

Tip: Depending on your specific scenario, there could be a performance advantage in one of two possible techniques, both using Polybase:

  • Transferring and importing from uncompressed files (slower export and transfer, faster load)
  • Transferring and importing compressed files (faster export and transfer, slower load)

What matters is the overall process performance. Depending on the network speed and the data sizes, a few tests with both techniques may help determine which works best in your context.

There are two ways to import data with PolyBase from blobs in Blob storage:

  • CREATE TABLE AS: This option creates the target table and loads it. Use this for first-time loading.
  • INSERT INTO… SELECT * FROM: This option loads data into an existing target table. Use this for subsequent loads.

Example

Before you can execute the load queries, you need to execute external table creation queries that were created in the “Metadata export” step. Since the files referred to by the External Table creation queries have been transferred to Azure Blob Storage, the external table locations are valid. Those queries can be executed at this time. Ensure that the table creation and external table creation steps are successful before attempting to import data.

In our example, we use the INSERT INTO … SELECT * FROM method to import data into Azure SQL Data Warehouse for easy illustration so you can run it multiple times. This requires you to generate an INSERT INTO … SELECT * FROM query for each table in the sample database.

A sample query is as follows:

INSERT INTO dbo.AdventureWorksDWBuildVersion
SELECT * FROM dbo.AdventureWorksDWBuildVersionExternalGz

A full list of sample INSERT…SELECT queries can be found here.
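
Since one query is needed per table, generating them from a list of table names saves typing. The sketch below is a hedged illustration, not the article's generator: the function name build_insert_queries is hypothetical, and it assumes every external table is named after its target table with the ExternalGz suffix, as in the sample query above.

```python
def build_insert_queries(tables, external_suffix="ExternalGz"):
    """Return one INSERT INTO ... SELECT statement per table name."""
    queries = []
    for table in tables:
        # Assumption: external table name = target table name + suffix.
        queries.append(
            f"INSERT INTO {table}\nSELECT * FROM {table}{external_suffix};"
        )
    return queries


if __name__ == "__main__":
    for query in build_insert_queries(["dbo.AdventureWorksDWBuildVersion"]):
        print(query)
```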

If you receive errors during import, use the error messages to correct the root cause. The “Data: Compatibility changes” section covers the most common causes of errors. Note that formats incompatible with PolyBase are rejected, for example, UTF-16 encoding, zip compression, and JSON format. PolyBase supports:

  • Encoding: UTF-8
  • Format: delimited text files, Hadoop file formats RC File, ORC, and Parquet
  • Compression: gzip, zlib, and Snappy compressed files

Refer to Azure SQL Data Warehouse loading patterns and strategies for more information.

Once the import is successful, check the row counts of the source database tables against the row counts of the corresponding Azure SQL Data Warehouse tables.
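
The comparison itself is easy to automate once the counts are collected. The following Python sketch assumes you have already gathered per-table row counts from both sides (for example, with a COUNT(*) query per table) into two dictionaries. The function name and the dictionary shape are hypothetical, and connecting to either database is outside the sketch.

```python
def find_row_count_mismatches(source_counts, target_counts):
    """Compare per-table row counts.

    Returns {table: (source_count, target_count)} for every table whose
    counts differ. A table missing on one side is reported with None there.
    """
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)
        tgt = target_counts.get(table)
        if src != tgt:
            mismatches[table] = (src, tgt)
    return mismatches
```

An empty result means every table has the same row count on both sides.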

This completes our example.

Data transformation and optimization

Once you have successfully migrated your data into Azure SQL Data Warehouse, the next immediate step is to create statistics on your newly loaded data using the CREATE STATISTICS statement on all columns of all tables.

If you plan to query data using external tables, you need to create statistics on external tables also. After this, you may want to do transformations on the data prior to executing query workloads.
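
Creating statistics on every column of every table means many near-identical statements, so it helps to generate them. This is a minimal sketch under stated assumptions: the function name, the stats_ naming scheme, and the table and column names in the usage are illustrative only. The emitted statements use the single-column form CREATE STATISTICS name ON table (column).

```python
def build_create_statistics(columns_by_table):
    """Return one single-column CREATE STATISTICS statement per column."""
    statements = []
    for table, columns in columns_by_table.items():
        for column in columns:
            # Hypothetical naming scheme: stats_<schema>_<table>_<column>.
            stat_name = f"stats_{table.replace('.', '_')}_{column}"
            statements.append(
                f"CREATE STATISTICS {stat_name} ON {table} ({column});"
            )
    return statements


if __name__ == "__main__":
    example = {"dbo.DimCustomer": ["CustomerKey", "LastName"]}
    print("\n".join(build_create_statistics(example)))
```

The same generator can be pointed at external tables if you plan to query them directly.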

Tip: Distinguish between conversions done before load for compatibility (such as date format conversion and UTF-8 encoding) and data transformations done after loading is complete. The latter are better done on Azure SQL Data Warehouse instead of on the source, exploiting the full processing power and scale of Azure SQL Data Warehouse. An Extract Load Transform (ELT) pattern rather than an Extract Transform Load (ETL) pattern may work better for you.

 

21 Sep 23:56

SQLSweet16!, Episode 7: Install Option for Instant File Initialization

by Sanjay Mishra

Reviewed by: Dimitri Furman, Kun Cheng, Denzil Ribeiro

Database Instant File Initialization helps improve performance of certain file operations. Prior to SQL Server 2016, enabling instant file initialization was cumbersome (editing the Local Security Policy to add the SQL Server service account to the Perform Volume Maintenance Tasks policy, followed by restarting the SQL Server instance), so some administrators missed out on this performance improvement technique.

If you want to enable instant file initialization, SQL Server 2016 makes life simpler for DBAs and System Administrators by providing a simple checkbox during the install of SQL Server, as shown in Figure 1.

Figure 1: Option to enable instant file initialization while installing SQL Server 2016

The checkbox “Grant Perform Volume Maintenance Task privilege to SQL Server Database Engine Service” is unchecked by default. To enable instant file initialization, all you need to do is check that box. No need to edit the security policies through the Local Security Policy application any more.

Notably, setup grants the privilege to the per-service SID for the SQL Server instance, e.g. to the NT SERVICE\MSSQL$SQL2016 security principal, for an instance named SQL2016. This is preferable to granting the privilege to the SQL Server engine service account, which is still sometimes done by administrators. The service account is subject to change, and if changed, SQL Server could unexpectedly lose the IFI privilege. But the per-service SID remains the same for the lifetime of the instance, which avoids this risk.

To emphasize the impact of instant file initialization, I installed a SQL Server 2014 instance on a server and restored a database (of size 190 GB). By default, this SQL Server instance doesn’t have instant file initialization enabled. I then installed a SQL Server 2016 instance on the same server (checking the above-mentioned checkbox during the install), and restored from the same database backup. The results are in Figure 2.

Figure 2: Improved restore time with instant file initialization in SQL Server 2016.

How do you know whether instant file initialization was used while restoring your database? Use the simple techniques described here: https://blogs.msdn.microsoft.com/sql_pfe_blog/2009/12/22/how-and-why-to-enable-instant-file-initialization/, or (another improvement in SQL Server 2016) check the server error log. If IFI is enabled, the following message is logged during server startup: “Database Instant File Initialization: enabled. For security and performance considerations see the topic ‘Database Instant File Initialization’ in SQL Server Books Online. This is an informational message only. No user action is required.”
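
If you check many servers, the error log test can be scripted. The sketch below is an assumption-laden illustration: it expects the error log text to have been read into a list of lines already (locating and decoding the log file is outside the sketch), and the function name is hypothetical. It looks only for the startup message quoted above, so it is meaningful only on versions that log that message.

```python
IFI_ENABLED_MARKER = "Database Instant File Initialization: enabled"


def ifi_enabled(log_lines):
    """Return True if any error log line reports IFI as enabled."""
    return any(IFI_ENABLED_MARKER in line for line in log_lines)
```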

Instant file initialization not only helps improve restore performance, but helps other operations as well, such as creating a database or adding new files to an existing database, extending a file or autogrow operations.

 

21 Sep 23:56

SQL Server on Xeon Phi?

by jchang
Can the Intel Xeon Phi x200, aka Knights Landing, run SQL Server? It does run Windows Server 2016, so is there anything in SQL Server 2016 that would stop it from installing? Xeon Phi is designed for HPC, so it would not have been tested with SQL Server,...(read more)
21 Sep 23:56

VMware Announces Alliances and Strategies for the Internet of Things

by A.R. Guess

by Angela Guess

A new press release out of the company reports, “VMware, Inc. today announced key strategic alliances in the Internet of Things (IoT) space to help bridge the gap between IT and operational technology (OT) worlds. VMware has formed alliances with Bayshore Networks, Dell, Intwine Connect, Deloitte Digital, PTC and V5 Systems. According […]

The post VMware Announces Alliances and Strategies for the Internet of Things appeared first on DATAVERSITY.

21 Sep 23:54

Pennsylvania Governor Launches New Open Data Program, OpenDataPA

by A.R. Guess

by Angela Guess

According to a recent release out of the governor’s office, “Governor Wolf launched OpenDataPA to enhance access to valuable information by creating a central repository to share the commonwealth’s data with the general public. Citizens, researchers, media, and developers can now browse through the first-released datasets at data.pa.gov, the home of OpenDataPA. […]

The post Pennsylvania Governor Launches New Open Data Program, OpenDataPA appeared first on DATAVERSITY.

21 Sep 23:54

Eight scenarios with Apache Spark on Azure that will transform any business

by SQL Server Team

This post was authored by Rimma Nehme, Technical Assistant, Data Group.

Spark-Azure

Since its birth in 2009, and the time it was open sourced in 2010, Apache Spark has grown to become one of the largest open source communities in big data with over 400 contributors from 100 companies. Spark stands out for its ability to process large volumes of data up to 100x faster than Hadoop MapReduce, because data is persisted in-memory. Azure cloud makes Apache Spark incredibly easy and cost effective to deploy with no hardware to buy, no software to configure, with a full notebook experience to author compelling narratives, and integration with partner business intelligence tools. In this blog post, I am going to review some of the truly game-changing usage scenarios with Apache Spark on Azure that companies can employ in their context.

Scenario #1: Streaming data, IoT and real-time analytics

Apache Spark’s key use case is its ability to process streaming data. With so much data being processed on a daily basis, it has become essential for companies to be able to stream and analyze it all in real time. Spark Streaming has the capability to handle this type of workload exceptionally well. As shown in the image below, a user can create an Azure Event Hub (or an Azure IoT Hub) to ingest rapidly arriving data into the cloud; both Event and IoT Hubs can intake millions of events and sensor updates per second that can then be processed in real-time by Spark.

Scenario 1_Spark Streaming

Businesses can use this scenario today for:

  • Streaming ETL: In traditional ETL (extract, transform, load) scenarios, the tools are used for batch processing, and data must be first read in its entirety, converted to a database compatible format, and then written to the target database. With Streaming ETL, data is continually cleaned and aggregated before it is pushed into data stores or for further analysis.
  • Data enrichment: Streaming capability can be used to enrich live data by combining it with static or ‘stationary’ data, thus allowing businesses to conduct more complete real-time data analysis. Online advertisers use data enrichment to combine historical customer data with live customer behavior data and deliver more personalized and targeted ads in real-time and in the context of what customers are doing. Since advertising is so time-sensitive, companies have to move fast if they want to capture mindshare. Spark on Azure is one way to help achieve that.
  • Trigger event detection: Spark Streaming can allow companies to detect and respond quickly to rare or unusual behaviors (“trigger events”) that could indicate a potentially serious problem within the system. For instance, financial institutions can use triggers to detect fraudulent transactions and stop fraud in its tracks. Hospitals can also use triggers to detect potentially dangerous health changes while monitoring patient vital signs and sending automatic alerts to the right caregivers who can then take immediate and appropriate action.
  • Complex session analysis: Using Spark Streaming, events relating to live sessions, such as user activity after logging into a website or application, can be grouped together and quickly analyzed. Session information can also be used to continuously update machine learning models. Companies can then use this functionality to gain immediate insights as to how users are engaging on their site and provide more real-time personalized experiences.
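
To make the data enrichment idea from the list above concrete, here is a conceptual sketch in plain Python. It is not Spark Streaming code and uses no Spark API. The function name, the customer_id field, and the profile fields are all hypothetical. It only shows the shape of the operation: each live event is merged with static data looked up by key.

```python
def enrich(events, customer_profiles):
    """Yield each live event merged with the matching static profile, if any."""
    for event in events:
        profile = customer_profiles.get(event["customer_id"], {})
        # Event fields win on key collisions so live data is never overwritten.
        yield {**profile, **event}


if __name__ == "__main__":
    profiles = {"c1": {"segment": "gold"}}
    live = [{"customer_id": "c1", "page": "/cart"}]
    for enriched in enrich(live, profiles):
        print(enriched)
```

Because enrich is a generator, it processes events one at a time as they arrive rather than waiting for the whole input, which is the essential difference from a batch job.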

Scenario #2: Visual data exploration and interactive analysis

Using Spark SQL running against data stored in Azure, companies can use BI tools such as Power BI, PowerApps, Flow, SAP Lumira, QlikView and Tableau to analyze and visualize their big data. Spark’s interactive analytics capability is fast enough to perform exploratory queries without sampling. By combining Spark with visualization tools, complex data sets can be processed and visualized interactively. These easy-to-use interfaces then allow even non-technical users to visually explore data, create models and share results. Because a wider audience can analyze big data without preconceived notions, companies can test new ideas and visualize important findings in their data earlier than ever before. Companies can identify new trends and new relationships that were not apparent before and quickly drill down into them, ask new questions and find ways to innovate in new and smarter ways.

Scenario 2_Spark visual data exploration and interactive analysis

This scenario is even more powerful when interactive data discovery is combined with predictive analytics (more on this later in this blog). Based on relationships and trends identified during discovery, companies can use logistic regression or decision tree techniques to predict the probability of certain events in the future (e.g., customer churn probability). Companies can then take specific, targeted actions to control or avert certain events.

Scenario #3: Spark with NoSQL (HBase and Azure DocumentDB)

This scenario provides scalable and reliable Spark access to NoSQL data stored either in HBase or our blazing fast, planet-scale Azure DocumentDB, through “native” data access APIs. Apache HBase is an open-source NoSQL database that is built on Hadoop and modeled after Google BigTable. DocumentDB is a true schema-free managed NoSQL database service running in Azure designed for modern mobile, web, gaming, and IoT scenarios. DocumentDB ensures 99% of your reads are served under 10 milliseconds and 99% of your writes are served under 15 milliseconds. It also provides schema flexibility, and the ability to easily scale a database up and down on demand.

The Spark with NoSQL scenario enables ad-hoc, interactive queries on big data. NoSQL can be used for capturing data that is collected incrementally from various sources across the globe. This includes social analytics, time series, game or application telemetry, retail catalogs, up-to-date trends and counters, and audit log systems. Spark can then be used for running advanced analytics algorithms at scale on top of the data coming from NoSQL.

Scenario 3_Spark NoSQL

Companies can employ this scenario in online shopping recommendations, spam classifiers for real time communication applications, predictive analytics for personalization, and fraud detection models for mobile applications that need to make instant decisions to accept or reject a payment. I would also include in this category a broad group of applications that are really “next-gen” data warehousing, where large amounts of data need to be processed inexpensively and then served in an interactive form to many users globally. Finally, internet of things scenarios fit in here as well, with the obvious difference that the data represents the actions of machines instead of people.

Scenario #4: Spark with Data Lake

Spark on Azure can be configured to use Azure Data Lake Store (ADLS) as additional storage. ADLS is an enterprise-class, hyper-scale repository for big data analytic workloads. Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts in an enterprise environment to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. Because ADLS is a file system compatible with Hadoop Distributed File System (HDFS), it is very easy to combine with Spark for running computations at scale using pre-existing Spark queries.

Scenario 4_Spark with Data Lake

The data lake scenario arose because new types of data needed to be captured and exploited by companies, while still preserving all of the enterprise-level requirements like security, availability, compliance, failover, etc. The Spark with data lake scenario enables truly scalable advanced analytics on healthcare data, financial data, business-sensitive data, geo-location coordinates, clickstream data, server logs, social media, and machine and sensor data. If companies want an easy way of building data pipelines, have unparalleled performance, ensure their data quality, manage access control, perform change data capture (CDC) processing, get enterprise-level security seamlessly and have world-class management and debugging tools, this is the scenario they need to implement.

Scenario #5: Spark with SQL Data Warehouse

While there is still a lot of confusion, Spark and big data analytics are not a replacement for traditional data warehousing. Instead, Spark on Azure can complement and enhance a company’s data warehousing efforts by modernizing the company’s approaches to analytics. A data warehouse can be viewed as an ‘information archive’ that supports business intelligence (BI) users and reporting tools for mission-critical functions of a company. My definition of mission-critical is any system that supports revenue generation or cost control. If such a system fails, companies would have to manually perform these tasks to prevent loss of revenue or increased cost. Big data analytics systems like Spark help augment such systems by running more sophisticated computations and smarter analytics, and by delivering deeper insights using larger and more diverse datasets.

Azure SQL Data Warehouse (SQLDW) is a cloud-based, scale-out database capable of processing massive volumes of data, both relational and non-relational. Built on our massively parallel processing (MPP) architecture, SQLDW combines the power of the SQL Server relational database with Azure cloud scale-out capabilities. You can increase, decrease, pause, or resume a data warehouse in seconds with SQLDW. Furthermore, you save costs by scaling out CPU when you need it and cutting back usage during non-peak times. SQLDW is the manifestation of the elastic future of data warehousing in the cloud.

Scenario 5_Spark with SQLDW

Some of the use cases of the Spark with SQLDW scenario may include: using the data warehouse to get a better understanding of customers across product groups, then using Spark for predictive analytics on top of that data; and running advanced analytics using Spark on top of the enterprise data warehouse containing sales, marketing, store management, point of sale, customer loyalty, and supply chain data, to drive more informed business decisions at the corporate, regional, and store levels. Using Spark with the data warehousing data, companies can do anything from risk modeling, to parallel processing of large graphs, to advanced analytics and text processing, all on top of their elastic data warehouse.

Scenario #6: Machine Learning using R Server, MLlib

Another, and probably one of the most prominent, Spark use cases in Azure is machine learning. By storing datasets in-memory during a job, Spark has great performance for iterative queries common in machine learning workloads. Common machine learning tasks that can be run with Spark in Azure include (but are not limited to) classification, regression, clustering, topic modeling, singular value decomposition (SVD), principal component analysis (PCA), hypothesis testing, and calculating sample statistics.

Typically, if you want to train a statistical model on very large amounts of data, you need three things:

  • Storage platform capable of holding all of the training data
  • Computational platform capable of efficiently performing the heavy-duty mathematical computations required
  • Statistical computing language with algorithms that can take advantage of the storage and computation power

Microsoft R Server, running on HDInsight with Apache Spark, provides all three things above. Microsoft R Server runs within HDInsight Hadoop nodes running on Microsoft Azure. Better yet, the big-data-capable algorithms of ScaleR take advantage of the in-memory architecture of Spark, dramatically reducing the time needed to train models on large data. With multi-threaded math libraries and transparent parallelization in R Server, customers can handle up to 1000x more data and up to 50x faster speeds than open source R. And if your data grows or you just need more power, you can dynamically add nodes to the Spark cluster using the Azure portal. Spark in Azure also includes MLlib for a variety of scalable machine learning algorithms, or you can use your own libraries. Some of the common applications of the machine learning scenario with Spark on Azure are listed in the table below.

Common applications by vertical (table columns: Sales and Marketing, Finance and Risk, Customer and Channel, Operations and Workforce):

  • Retail: demand forecasting, loyalty programs, cross-sell and upsell, customer acquisition, fraud detection, pricing strategy, personalization, lifetime customer value, product segmentation, store location demographics, supply chain management, inventory management
  • Financial Services: customer churn, loyalty programs, cross-sell and upsell, customer acquisition, fraud detection, risk and compliance, loan defaults, personalization, lifetime customer value, call center optimization, pay for performance
  • Healthcare: marketing mix optimization, patient acquisition, fraud detection, bill collection, population health, patient demographics, operational efficiency, pay for performance
  • Manufacturing: demand forecasting, marketing mix optimization, pricing strategy, perf risk management, supply chain optimization, personalization, remote monitoring, predictive maintenance, asset management

 

Scenario 6_Spark Machine Learning

Examples with just a few lines of code that you can try out right now:

Scenario #7: Putting it all together in a notebook experience

For data scientists, we provide out-of-the-box integration with Jupyter (iPython), the most popular open source notebook in the world. Unlike other managed Spark offerings that might require you to install your own notebooks, we worked with the Jupyter OSS community to enhance the kernel to allow Spark execution through a REST endpoint.

We co-led “Project Livy” with Cloudera and other organizations to create an open source Apache licensed REST web service that makes Spark a more robust back-end for running interactive notebooks.  As a result, Jupyter notebooks are now accessible within HDInsight out-of-the-box. In this scenario, we can use all of the services in Azure mentioned above with Spark with a full notebook experience to author compelling narratives and create data science collaborative spaces. Jupyter is a multi-lingual REPL on steroids. Jupyter notebook provides a collection of tools for scientific computing using powerful interactive shells that combine code execution with the creation of a live computational document. These notebook files can contain arbitrary text, mathematical formulas, input code, results, graphics, videos and any other kind of media that a modern web browser is capable of displaying. So, whether you’re absolutely new to R or Python or SQL or do some serious parallel/technical computing, the Jupyter Notebook in Azure is a great choice.

Scenario 7_Spark with Notebook

You can also use Zeppelin notebooks on Spark clusters in Azure to run Spark jobs. Zeppelin notebook for HDInsight Spark cluster is an offering just to showcase how to use Zeppelin in an Azure HDInsight Spark environment. If you want to use notebooks to work with HDInsight Spark, I recommend that you use Jupyter notebooks. To make development on Spark easier, we support IntelliJ Spark Tooling which introduces native authoring support for Scala and Java, local testing, remote debugging, and the ability to submit Spark applications to the Azure cloud.

Scenario #8: Using Excel with Spark

As a final example, I wanted to describe the ability to connect Excel to a Spark cluster running in Azure using the Microsoft Open Database Connectivity (ODBC) Spark Driver. Download it here.

Scenario 8_Spark with Excel

Excel is one of the most popular clients for data analytics on Microsoft platforms. In Excel, our primary BI tools such as PowerPivot, data-modeling tools, Power View, and other data-visualization tools are built right into the software, with no additional downloads required. This enables users of all levels to do self-service BI using the familiar interface of Excel. Through a Spark Add-in for Excel, users can easily analyze massive amounts of structured or unstructured data with a very familiar tool.

Conclusion

Above, I’ve described some of the amazing, game-changing scenarios for real-time big data processing with Spark on Azure. Any company across the globe, from a huge enterprise to a small startup can take their business to the next level with these scenarios and solutions. The question is, what are you waiting for?

21 Sep 23:53

Expanding the uses of DBCC CLONEDATABASE

by Erin Stellato

Service Pack 2 for SQL Server 2014 was released last month (read the release notes here) and includes a new DBCC statement: DBCC CLONEDATABASE.  I was pretty excited to see this command introduced, as it provides a very easy way to copy a database schema, including statistics, which can be used for testing query performance without requiring all the space needed for the data in the database.  I finally made some time to test out DBCC CLONEDATABASE and understand the limitations, and I have to say it was rather fun.

The Basics

I started out by creating a clone of the AdventureWorks2014 database and running a query against the source database and then the clone database:

DBCC CLONEDATABASE (N'AdventureWorks2014', N'AdventureWorks2014_CLONE');
GO
 
SET STATISTICS IO ON;
GO
SET STATISTICS TIME ON;
GO
SET STATISTICS XML ON;
GO
 
USE [AdventureWorks2014];
GO
 
SELECT *
FROM [Sales].[SalesOrderHeader] [h]
JOIN [Sales].[SalesOrderDetail] [d] ON [h].[SalesOrderID] = [d].[SalesOrderID]
ORDER BY [SalesOrderDetailID];
GO
 
USE [AdventureWorks2014_CLONE];
GO
 
SELECT *
FROM [Sales].[SalesOrderHeader] [h]
JOIN [Sales].[SalesOrderDetail] [d] ON [h].[SalesOrderID] = [d].[SalesOrderID]
ORDER BY [SalesOrderDetailID];
GO
 
SET STATISTICS IO OFF;
GO
SET STATISTICS TIME OFF;
GO
SET STATISTICS XML OFF;
GO

If I look at the I/O and TIME output, I can see that the query against the source database took longer and generated a lot more I/O, both of which are expected as the clone database has no data in it:

/* SOURCE database */

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 4 ms.

(121317 row(s) affected)

Table 'SalesOrderHeader'. Scan count 0, logical reads 371567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 5, logical reads 1361, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 686 ms,  elapsed time = 2548 ms.

/* CLONE database */

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:
CPU time = 12 ms, elapsed time = 12 ms.

(0 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 5, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 83 ms.

If I look at the execution plans, they are the same for both databases except for the actual values (the amount of data that actually moved through the plan):

Query Plan for AdventureWorks2014 database

Query Plan for AdventureWorks2014_CLONE database

This is where the value of DBCC CLONEDATABASE is apparent – I can get an empty copy of a database to anyone (Microsoft Product Support, my fellow DBA, etc.) and have them recreate and investigate an issue, and they don't need potentially hundreds of GB of disk space to do it. Melissa’s July T-SQL Tuesday post has detailed information about what happens during the clone process, so I recommend reading that for more information.
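Before handing a clone to someone else, a quick sanity check can confirm that it holds no rows while the statistics objects are still present. The following is only a sketch, assuming the clone name used above; it is not part of the original test.

```sql
USE [AdventureWorks2014_CLONE];
GO

-- The clone holds no data, so an actual count should come back as 0.
SELECT COUNT(*) AS [actual_rows]
FROM [Sales].[SalesOrderHeader];
GO

-- The statistics objects for the table should still be listed.
SELECT [s].[name] AS [stats_name],
       [s].[stats_id],
       [s].[auto_created],
       [s].[user_created]
FROM [sys].[stats] [s]
WHERE [s].[object_id] = OBJECT_ID(N'Sales.SalesOrderHeader')
ORDER BY [s].[stats_id];
GO
```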

Is that it?

But… can I do more with DBCC CLONEDATABASE?  I mean, this is great, but I think there are a lot of other things I can do with an empty copy of the database.  If you read the documentation for DBCC CLONEDATABASE, you’ll see this line:

Microsoft Customer Support Services may ask you to generate a clone of a database by using DBCC CLONEDATABASE in order to investigate a performance issue related to the query optimizer.

My first thought was, “query optimizer – hmm… can I use this as an option for testing upgrades?”

Well, the cloned database is read-only, but I thought I’d try to change some options anyway. For example, if I could change the compatibility mode, that would be really cool, as then I could test CE changes in both SQL Server 2014 and SQL Server 2016.

USE [master];
GO
 
ALTER DATABASE [AdventureWorks2014_CLONE] SET COMPATIBILITY_LEVEL = 110;

I get an error:

Msg 3906, Level 16, State 1
Failed to update database "AdventureWorks2014_CLONE" because the database is read-only.
Msg 5069, Level 16, State 1
ALTER DATABASE statement failed.

Hm.  Can I change the recovery model?

ALTER DATABASE [AdventureWorks2014_CLONE] SET RECOVERY SIMPLE WITH NO_WAIT;

I can.  That doesn’t seem fair.  Well, it’s read-only, can I change that?

ALTER DATABASE [AdventureWorks2014_CLONE] SET READ_WRITE WITH NO_WAIT;

YES!  Before you get too excited, let me leave this note from the documentation right here:

Note The newly generated database generated from DBCC CLONEDATABASE isn't supported to be used as a production database and is primarily intended for troubleshooting and diagnostic purposes. We recommend detaching the cloned database after the database is created.

I'm going to repeat this line from the documentation, and bold it and put it in red as a friendly but extremely important reminder:

The newly generated database generated from DBCC CLONEDATABASE isn't supported to be used as a production database and is primarily intended for troubleshooting and diagnostic purposes.

Well that’s fine with me, I definitely wasn’t going to use this for production, but now I can use it for testing!  NOW I can change the compatibility mode, and NOW I can back it up and restore it on another instance for testing!

USE [master];
GO
 
BACKUP DATABASE [AdventureWorks2014_CLONE]
  TO  DISK = N'C:\Backups\AdventureWorks2014_CLONE.bak'
  WITH INIT, NOFORMAT, STATS = 10, NAME = N'AW2014_CLONE_full';
GO
 
/* restore on SQL Server 2016 */
 
 
RESTORE DATABASE [AdventureWorks2014_CLONE]
FROM  DISK = N'C:\Backups\AdventureWorks2014_CLONE.bak' WITH
MOVE N'AdventureWorks2014_Data' TO N'C:\Databases\AdventureWorks2014_Data_2684624044.mdf',
MOVE N'AdventureWorks2014_Log' TO N'C:\Databases\AdventureWorks2014_Log_3195542593.ldf',
NOUNLOAD,  REPLACE,  STATS = 5;
GO
 
ALTER DATABASE [AdventureWorks2014_CLONE] SET COMPATIBILITY_LEVEL = 130;
GO
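
To confirm that the changes took effect on the restored copy, the database options can be read back from sys.databases. This is a minimal sketch, assuming the database name used above; after the steps shown, I would expect is_read_only to be 0, the recovery model to be SIMPLE, and the compatibility level to be 130.

```sql
-- Read back the options that were changed on the clone.
SELECT [name],
       [is_read_only],
       [recovery_model_desc],
       [compatibility_level],
       [is_auto_update_stats_on]
FROM [sys].[databases]
WHERE [name] = N'AdventureWorks2014_CLONE';
```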

THIS IS BIG.

In my last post I talked about trace flag 2389 and testing with the new Cardinality Estimator because, friends, you need to be testing with the new CE before you upgrade. If you do not test, and if you change the compatibility mode to 120 (SQL Server 2014) or 130 (SQL Server 2016) as part of your upgrade, then you run the risk of working in a fire-fighting mode if you run into regressions with the new CE. Now, you could be just fine, and performance may be even better after you upgrade. But… wouldn’t you like to be certain?

Very often when I mention testing before an upgrade, I’m told that there is no environment in which to do the testing.  I know some of you have a Test environment. Some of you have Test, Dev, QA, UAT and who knows what else. You’re lucky.

For those of you that state you have no test environment at all in which to test, I give you DBCC CLONEDATABASE. With this command, you have no excuse not to run the most frequently-executed queries and the heavy-hitters against a clone of your database. Even if you don’t have a test environment, you have your own machine.  Back up the clone database from production, drop the clone, restore the backup to your local instance, and then test.  The clone database takes up very little space on disk and you won’t incur memory or I/O contention as there’s no data.  You will be able to validate query plans from the clone against those from your production database. Further, if you restore on SQL Server 2016 you can incorporate Query Store into your testing! Enable Query Store, run through your testing in the original compatibility mode, then upgrade the compatibility mode and test again. You can use Query Store to compare queries side by side! (Can you tell I'm dancing in my chair right now?)
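
The Query Store workflow described above might look roughly like this on a SQL Server 2016 instance. Treat it as a sketch under assumptions: the original compatibility level is taken to be 120 (a SQL Server 2014 source), the workload itself is left as a placeholder, and the final query is just one way to line up plans by the compatibility level they were compiled under.

```sql
USE [master];
GO

-- Turn on Query Store for the restored clone (SQL Server 2016 or later).
ALTER DATABASE [AdventureWorks2014_CLONE] SET QUERY_STORE = ON;
GO
ALTER DATABASE [AdventureWorks2014_CLONE] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO

-- First pass: the original compatibility mode (assumed to be 120 here).
ALTER DATABASE [AdventureWorks2014_CLONE] SET COMPATIBILITY_LEVEL = 120;
GO
/* run the test workload here */

-- Second pass: the upgraded compatibility mode.
ALTER DATABASE [AdventureWorks2014_CLONE] SET COMPATIBILITY_LEVEL = 130;
GO
/* run the same workload again */

USE [AdventureWorks2014_CLONE];
GO

-- Flush in-memory Query Store data so both passes are visible.
EXEC sys.sp_query_store_flush_db;
GO

-- One row per captured plan, with the compatibility level it was compiled under.
SELECT [qt].[query_sql_text],
       [q].[query_id],
       [p].[plan_id],
       [p].[compatibility_level],
       [p].[last_execution_time]
FROM [sys].[query_store_query_text] [qt]
JOIN [sys].[query_store_query] [q] ON [qt].[query_text_id] = [q].[query_text_id]
JOIN [sys].[query_store_plan] [p] ON [q].[query_id] = [p].[query_id]
ORDER BY [q].[query_id], [p].[compatibility_level];
GO
```

Queries that show different plans at 120 and 130 are the ones worth a closer look.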

Considerations

Again, this shouldn't be anything you would use in production, and I know you wouldn't do that, but it bears repeating because in its current state, DBCC CLONEDATABASE is not fully complete.  This is noted in the KB article under supported objects; objects such as memory optimized tables and file tables are not copied, Full-text is not supported, etc.

Now, the clone database isn’t without drawbacks. If you inadvertently run an index rebuild or an update to statistics in that database, you’ve just wiped out your test data.  You will lose the original statistics, which is probably what you really wanted in the first place.  For example, if I check statistics for the clustered index on SalesOrderHeader right now, I get this:

USE [AdventureWorks2014_CLONE];
GO
DBCC SHOW_STATISTICS (N'Sales.SalesOrderHeader',PK_SalesOrderHeader_SalesOrderID);

Original statistics for SalesOrderHeader

Now, if I update statistics against that table, I get this:

UPDATE STATISTICS [Sales].[SalesOrderHeader] WITH FULLSCAN;
GO
 
DBCC SHOW_STATISTICS (N'Sales.SalesOrderHeader',PK_SalesOrderHeader_SalesOrderID);

Updated (empty) statistics for SalesOrderHeader

As an additional safety, it's probably a good idea to disable auto updates to statistics:

USE [master];
GO
ALTER DATABASE [AdventureWorks2014_CLONE] SET AUTO_UPDATE_STATISTICS OFF WITH NO_WAIT;

If you do happen to update statistics unintentionally, running DBCC CLONEDATABASE and going through the backup and restore process isn’t that hard, and you’ll have it automated in no time.
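
One way to spot an accidental update is to list the last-updated date of every statistic in the clone. This is a sketch, not a guarantee: it assumes STATS_DATE reflects the date carried over with the copied statistics, so a date at or after the time the clone was created would suggest the original statistics have been overwritten.

```sql
USE [AdventureWorks2014_CLONE];
GO

-- Most recently updated statistics first; recent dates deserve a closer look.
SELECT OBJECT_SCHEMA_NAME([s].[object_id]) AS [schema_name],
       OBJECT_NAME([s].[object_id]) AS [table_name],
       [s].[name] AS [stats_name],
       STATS_DATE([s].[object_id], [s].[stats_id]) AS [last_updated]
FROM [sys].[stats] [s]
WHERE OBJECTPROPERTY([s].[object_id], 'IsUserTable') = 1
ORDER BY [last_updated] DESC;
GO
```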

You can add data to the database. This could be useful if you want to experiment with statistics (e.g. different sample rates, filtered statistics) and you have enough storage to hold a copy of the table’s data.
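
As an illustration of that kind of experiment, the statements below resample an existing statistic and create a filtered one. This is a sketch that assumes rows have already been loaded into Sales.SalesOrderHeader in the clone; the statistic name st_SalesOrderHeader_OrderDate_2014, the 25 percent sample rate and the date filter are made up for the example. Note that it deliberately overwrites the copied statistics, so run it only on a clone you are prepared to recreate.

```sql
USE [AdventureWorks2014_CLONE];
GO

-- Resample an existing statistic at a chosen rate (replaces the copied values).
UPDATE STATISTICS [Sales].[SalesOrderHeader] [PK_SalesOrderHeader_SalesOrderID]
    WITH SAMPLE 25 PERCENT;
GO

-- Hypothetical filtered statistic covering only recent orders.
CREATE STATISTICS [st_SalesOrderHeader_OrderDate_2014]
    ON [Sales].[SalesOrderHeader] ([OrderDate])
    WHERE [OrderDate] >= '20140101'
    WITH FULLSCAN;
GO

-- Inspect the histogram of the new statistic.
DBCC SHOW_STATISTICS (N'Sales.SalesOrderHeader', st_SalesOrderHeader_OrderDate_2014);
GO
```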

With no data in the database, you’re obviously not going to get reliably representative duration and I/O data. That’s ok. If you need data about true resource usage, then you need a copy of your database with all the data in it. DBCC CLONEDATABASE is really about testing query performance; that’s it. It’s not a replacement for traditional upgrade testing in any way – but it is a new option for validating how SQL Server optimizes a query with different versions and compatibility modes. Happy testing!

The post Expanding the uses of DBCC CLONEDATABASE appeared first on SQLPerformance.com.

21 Sep 23:53

IsItSql Shows Databases

by Bill Graziano

You can download the latest build of Is It SQL from my consulting site.  You can now view the databases on a server.  And you can click on a column header to sort that column.

[Screenshot: database page]

And no, my fantasy baseball team in 2015 didn’t do very well.  Unfortunately this year is much the same.   

Here are a few other improvements:

  • The Active Task page for a server no longer displays tasks waiting on BROKER_RECEIVE_WAITFOR.
  • The menu bar stays on top when scrolling down.  And I’m really surprised how much this little change has made the whole application so much more usable.
  • There’s better error reporting if you launch the application and the port is already in use.  I mainly see this when I launch the application and it’s already running as a service.
  • It prioritizes ODBC 13 over ODBC 11.  And it will gracefully fall back to earlier drivers until it finds one it can use.

If you’re running as a service, just stop the service, copy over IsItSql.EXE, and restart the service.  There’s never any installation needed.  And my email is at the bottom of the README if you have any questions or concerns.

21 Sep 23:53

I’m a Pluralsight author!

by Gail

My first course, Identifying & Fixing Performance Issues Caused by Parameter Sniffing, was published two weeks ago. It won’t be the last.

Recording the course was a voyage of discovery. Until then, I’d only ever done live presentations, blog posts and articles. I initially thought that the recording would be similar to live presentations, but it’s nothing close.

I do presentations off-the-cuff. Oh, I rehearse them beforehand, but I have no script, no speaker notes, no cue cards. My slides are as much for me, to direct what I’m talking about at each point, as they are for the audience. Two presentations using the same slide deck are not going to be the same.

I initially tried that method for recording, and it was a mess, because the mistakes, half-sentences and corrections that are fine in a live presentation are not fine in a recording. The recording has to be near-perfect; all those little mistakes have to be edited out and re-recorded.

The first recording I did, I had to record three times, and there were still places that were wrong.

So I switched to scripting the entire thing, and then just ‘reading’ the script, taking care that it didn’t sound like reading a script. That works much better, but the time to write the script is huge. I speak at roughly 160 words/minute. A full page in Word, with default font and spacing, is around 600 words. 30 minutes of recording means around 8-9 pages of script.

The recording, I’ve found, is the least time-consuming part of the exercise (which is good, because it’s only quiet enough to record after 8 PM; I live just off a busy road with a school a block away).

The editing is the most tedious. 20 minutes of finished video requires around 40 minutes or more of recording and probably 1.5-2 hours of editing, more if it was a demo.

The demos are still a problem and one where I need to figure out a good process. What I did for this course was to record the video of the demo, do a quick edit to take out mistakes, then record the voice over the top, then edit them together. While it works, it’s a monumental pain. A 15 minute demo took 3 hours of editing to put together.

For the next one I’ll try recording the demos, video and audio, in smaller chunks. Hopefully that will make it easier to piece the audio and video together, and the finished clips can easily be edited together at the end. Hopefully that’ll make the demos less of a pain.

21 Sep 23:53

PASS Summit is back! A look inside the 2016 edition #Summit16

by Sergio Govoni

The PASS Summit is back! This year, the most important event in the world for the Microsoft Data Platform will again be held in Seattle (WA), from 25 to 28 October 2016, preceded by two pre-conference days on 24 and 25 October 2016.

I wrote "the most important event in the world" because the numbers from PASS Summit 2015 are self-explanatory! Last year, 5,000 data geeks attended the PASS Summit, which has become the landmark event for the entire Microsoft Data Platform. Take a look at the 2015 numbers shown in the following pictures; they are really impressive, aren't they?

Every year, more and more IT professionals, technicians, analysts and data scientists see this event as a unique opportunity to connect with other people who share the same passion, to share experiences and the problems they face on the job, and to learn more about the Microsoft Data Platform in order to choose the best technology to meet the challenges of the market.

Don't forget that you also have the opportunity to meet and talk to people like Conor Cunningham (one of the principal architects of the relational engine of SQL Server and SQL Azure), David DeWitt (one of the leading experts on parallel databases), Bob Ward (Chief Technology Officer, Microsoft CSS) and Mark Souza (General Manager in the Microsoft Data Platform Group). Do you think these people are too busy to talk with you? None of these big names will walk away if you try to meet and talk to them!

If you are undecided about the content offered at the PASS Summit, you can watch last year's sessions on PASS TV and see for yourself the quality of the sessions at PASS Summit!

  • PASS Summit 2015 – Day 1
  • PASS Summit 2015 – Keynote Day2 (David DeWitt and Rimma Nehme)
  • PASS Summit 2015 – Day 2
  • PASS Summit 2015 – Day 3 

Are you a beginner or a first-timer at PASS Summit? Don't worry: there are lots of level 100 and 200 sessions, and you can count on the "First Timers Guidebook", which will arrive soon!

What kind of sessions can you expect to see at PASS Summit 2016? Find out on September 7 and 8, 2016 at the 24 Hours of PASS: Summit 2016 Preview Edition. Register now at this link; thanks to the sponsors, the 24 Hours of PASS is presented at no cost! This edition of 24 Hours of PASS is meant to be a sneak preview of what you can expect from PASS Summit 2016.

You can find all the information you need on the PASS Summit 2016 website.


See you there!

21 Sep 23:53

How to Rename SQL Server servers

by Wayne Sheffield

Sometimes you make a mistake and forget to rename a sysprep’d server before installing SQL Server. Or perhaps your corporate naming standard has changed, and you need to rename a server. Maybe you like to waste the time involved in troubleshooting connection issues after a server rename. In any case, you now find yourself in a situation where the name of the SQL Server is different from the physical name of the server itself, and you need to rename SQL Server to match the server’s physical name.

You could always rerun the setup program to rename the server. Fortunately, SQL Server provides an easier way to do this. You just need to run two stored procedures: sp_dropserver and sp_addserver. The following script demonstrates this concept. First, it gets the current SQL Server name, the name of the computer, and the name of the SQL Server instance. Next, if the computer name plus the instance name is not the same as the SQL Server name, it runs the sp_dropserver and sp_addserver stored procedures to rename SQL Server. The “LOCAL” parameter of sp_addserver denotes that this is the name of the local server. After the script runs, you will need to restart the instance for the name change to take effect.

Before you just run this script, there are a few things to take into consideration:

  • If this is part of a SQL Server failover cluster, then a different process is needed. See this link to rename the cluster’s virtual SQL Server name. To rename the individual nodes, each node must be evicted from the cluster, the instance renamed (per this script), and then the node added back to the cluster.
  • SQL Server does not support renaming a server involved in replication.
  • Renaming a server that runs Reporting Services (SSRS) may result in SSRS not being available after the rename. If this happens, see this link.
  • When using database mirroring, you need to stop the mirroring before the rename, and reestablish it when finished.
  • If Remote Logins, Linked Servers, or Client Alias Names are used, see the “Other Considerations” section in this link.

/******************************************************************************
Rename a SQL Server instance to match the computer name.
Reference: https://msdn.microsoft.com/en-us/library/ms143799.aspx
*******************************************************************************
                               MODIFICATION LOG
*******************************************************************************
2016-08-27 WGS Initial creation.
******************************************************************************/
DECLARE @OldServerName sysname,
        @NewServerName sysname,
        @InstanceName  sysname;
SELECT  @OldServerName = @@SERVERNAME,
        @NewServerName = CONVERT(sysname, SERVERPROPERTY('ComputerNamePhysicalNetBIOS')),
        @InstanceName  = CONVERT(sysname, SERVERPROPERTY('InstanceName'));

IF @InstanceName IS NOT NULL SET @NewServerName = @NewServerName + N'\' + @InstanceName;

IF @NewServerName <> @OldServerName
BEGIN
    SELECT * FROM sys.servers;
    EXECUTE sp_dropserver @OldServerName;
    EXECUTE sp_addserver @NewServerName, local;
    RAISERROR('Restart SQL Instance service to have the name change take effect', 10, 1) WITH NOWAIT;
END;
GO

SELECT * FROM sys.servers;
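
Once the instance has been restarted, a check along these lines can confirm that the rename took effect. It is a sketch rather than part of the original script; it relies on @@SERVERNAME reflecting the name registered through sp_addserver, and on server_id 0 being the local server entry in sys.servers.

```sql
-- Run after the SQL Server service has been restarted.
-- The three values should agree (plus \InstanceName for a named instance).
SELECT @@SERVERNAME AS [sql_server_name],
       CONVERT(sysname, SERVERPROPERTY('ServerName')) AS [property_server_name],
       CONVERT(sysname, SERVERPROPERTY('MachineName')) AS [machine_name];

-- server_id 0 is the local server entry; it should carry the new name.
SELECT [name], [server_id]
FROM sys.servers
WHERE [server_id] = 0;
```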

 

The post How to Rename SQL Server servers appeared first on Wayne Sheffield.

21 Sep 23:52

No 32-bit for SQL Server 2016 Express

by Haidong Ji

I’ve learned that SQL Server 2016 Standard and Enterprise Editions no longer provide 32-bit. But I do wonder about SQL Server 2016 Express Edition. It’s different in that it’s free, and mostly geared toward lightweight usage, people who are learning, etc. So perhaps it still offers 32-bit?

After some upgrade work on a 32-bit SQL Server 2008 R2 Express instance, I can tell you from real experience that SQL Server 2016 Express does NOT have 32-bit either.

So the latest Express edition that has 32-bit is SQL Server 2014. Like Allan Hirt, I also say good riddance. It’s time to move on.

21 Sep 23:51

Benchmarking Hardware and Environments

by GrumpyOldDBA
I suppose it would be fair to say I didn’t pick up the Grumpy Old DBA for no reason, as well as the usual paranoia of a Production DBA, distrust of almost anyone who wants to go near a database and possibly over protective nature regarding databases...(read more)
21 Sep 23:46

The Basics of Biml – Populating the Biml Relational Hierarchy

by andyleonard
In this post, I’m going to demonstrate how to build the objects Business Intelligence Markup Language (Biml) requires before creating anything – the Biml Relational Hierarchy. The Biml Relational Hierarchy provides the foundation for all relational interaction between packages, cubes, dimensions, facts, and T-SQL. It’s important to note that Biml is useful for generating SSIS and SSAS, but Biml can generate any text – which includes .Net code (I’ve used Biml to generate C#) – that is based on...(read more)
21 Sep 23:46

Log Page Life Expectancy over time

by TiborKaraszi
You often see Page Life Expectancy referred to as an interesting performance monitor counter. And it can be! It indicates for how long a page is expected to stay in cache, from the time it was brought into cache. But just looking at a snapshot value doesn't say that much. It might be high, but that is because you haven't had a high turnover of your pages for the past couple of hours. Or the other way around, you happen to look just after a very rare monster query. Furthermore, having a log can show...(read more)
21 Sep 23:45

What’s a Data Professional Doing at #VMWorld?

by Karen Lopez

Last week I attended VMWorld, the conference for VMWare customers and partners.  I know what you are thinking: “why would a DataChick go to a conference about virtualization technologies?” 

Yes, VMWare is a bit off my normal path of events and writings, but that makes it even more interesting to me. I attended because:

1. Tech Field Day Extra

Tech Field Day invited me to attend Tech Field Day Extra (#TFDx), which is an abbreviated version of their full events (like the Cloud Field Day 1 (#CFD1) I’m attending next week).  Tech Field Days bring in vendor product teams to demo and talk about their products with independent professionals who share their thoughts about what they heard with their audiences and communities. I attended the presentations for:

Docker:  Docker is software based on open standards that helps you package up all the parts of a solution and then deploy that anywhere.  You may have heard people talking about containers and how they help with successful DevOps processes. By using containers, solutions are easier to deploy and scale. More about Docker.

Image source: https://www.docker.com/what-docker#/VM

I’ll be writing more about Docker and Datachick data pros in another post.

Primary Data: Primary Data presented about their solution Datasphere, a data virtualization product that uses some nifty market-optimization-like processing to automatically move data to where it needs to be, when it needs to be there.  It’s “storage agnostic”, meaning that through rules and groups, data professionals can guide where data should reside and let the system decide (if needed) the fastest place for that data to rest.

They also had me at the wonderful space graphics on their website.

Image source: http://primarydata.com

I’ll cover Primary Data in a future post, where I will talk about the use of rules, groups and objectives metadata to manage the data virtualization and data orchestration that are possible.

Sandisk: (owned now by Western Digital)  Sandisk Data Center product teams talked with us about some deep-dive internal virtualization features that frankly are well beyond my skill level in virtual machines.  As an overview, they talked about using Flashsoft for VMWare APIs for managing IO for storage / caches.

Image source: https://www.sandisk.com/business/datacenter/resources/data-sheets/flashsoft-4-for-vmware-vsphere-6

I will be hearing from them again next week at Cloud Field Day 1, so I will be writing about them in a future post.

2. VMWorld Press

I was invited to VMWorld on a press credential.  That meant I had access to all sessions and exhibits.  I attended various press conference/meetings.  I spent time talking to vendors who were most focused on data, DevOps and cloud technologies: Primary Data, SkyTap, SolarWinds, Datrium, Pure Storage, Dell Software, Turbonomic, X-IO, Github, Puppet, and SIOS.  Most of my coverage of these technologies happened via Twitter @datachick.  I expect from the conversations, though, that I will be covering these solutions and services in the longer term.  Once this series is completed, I’ll wrap it up with some thoughts on VMWorld.

3. Professional Development

Over the last couple of years I’ve been focusing a lot of my professional development on cloud technologies and processes.  This leads to learning more about hybrid technologies (cloud and on-prem, plus private clouds). All of this has shown me that I need to understand virtualization and data centre technologies more than I have had to know in the past.  Working in other communities has helped me make the contacts and friends that I need to be successful. I think every few years IT pros should attend an event that is related to, but not the focus of, their specialization to broaden their understanding of the tiny piece of the puzzle they work on.

I also found some time to attend sessions and I hope to get some posts up later about the ones I picked.

4. My Own Data Management Environments

While I was attending these sessions and talking to vendors, I was thinking about the data tools environments I manage: repositories, model marts, data management tools, configuration files, etc.  All of them can benefit from my implementing these technologies.  It’s sort of a “metadata centre” I need to think about, too. I’m hoping to write about those experiences as well.

Finally

The advent of Software-defined {Storage | Data Centre | Networks | Software :)} means that configurations, metadata, policies, and rules will need to be well-managed.  I see my job as a data professional as just as applicable to managing data centre data as to line-of-business data.  If we aren’t applying our rules to our own work, then why would the business trust us when we tell them they should be doing that with “their” data?

21 Sep 23:45

Search for SQL Saturday sessions by specified key phrase

by Wayne Sheffield

SQL Saturdays are a neat idea – they’re a day-long event of free training, encompassing several one-hour sessions. These events will normally have 5 or 6 time slots during the day. If an event has 6 time slots with 6 concurrent sessions in each, that is 36 hours of training material. There is usually more than one event going on most weekends. If we assume 100 events at 36 sessions each, then there are 3600 sessions in a year. Since the speakers normally post their presentation materials and demo scripts on the site, the site itself has become a resource for additional training material. While the program is fantastic, the problem is that there isn’t a way to search the site for sessions.

Search the SQL Saturday sessions

Therefore, I’ve developed a PowerShell script that will search the SQL Saturday site. It searches for sessions where the search phrase is in either the session title or abstract. Without further ado, I introduce Get-SQLSaturdaySessionTopicSearch (which is available in my Code Library).

Input:

Name          Required?  What it does
-SearchTopic  Required   The phrase that you are searching for.
-StartDate    Optional   The first event date that you want to search. This defaults to 30 days before the current date.
-EndDate      Optional   The last event date that you want to search. This defaults to 30 days after the current date.
-EventNumber  Optional   The first event number where you want to start searching. Defaults to 500.
-ExportFile   Optional   The path and filename of the file to export the results to.
-DebugLevel   Optional   Controls the display of debugging and progress messages. Defaults to 1.

-DebugLevel values:

1. Displays the SQL Saturday URL as it is being processed.
2. Displays the event name and date if the feed for that SQL Saturday could be opened.
3. Displays session title for matched sessions.
4. Displays all session titles.

Examples:

.\Get-SQLSaturdaySessionTopicSearch.ps1 -SearchTopic 'Query Store'

.\Get-SQLSaturdaySessionTopicSearch.ps1 -SearchTopic 'Query Store' -ExportFile 'C:\Temp\SQLSatSearchResults.csv'

Output:

The output to the screen is the Event #, Speaker, Session Title and URL for the presentation.

The generated export file will also include the event name and session abstract. Additionally, the URL will be encased with the Excel HYPERLINK() function. When the export file is opened up with Excel, clicking the URL will open your browser to the session information, where the presentation material can be downloaded.

I hope that you get a lot of use out of this script.

The post Search for SQL Saturday sessions by specified key phrase appeared first on Wayne Sheffield.

21 Sep 23:45

SQLSentry does it Again – Plan Explorer is Completely Free

by Andrew Kelly
Many of us have used the free version of Plan Explorer from SQLSentry for a long time to help tune and explore query plans in a way that SSMS can only dream of. Unlike most free tools this one still had plenty of useful features that served the community...(read more)
21 Sep 23:45

Connect to Azure SQL Database V12 via Redirection

by Kun Cheng (SQLCAT)

Reviewed by: Vince Curley, Saurabh Singh, Joe Ponce-Galindo, Murugan Ayyappan, Dimitri Furman, Denzil Ribeiro, Arvind Shyamsundar, Murshed Zaman, Sanjay Mishra, Mike Weiner

Introduction

In the old days of Azure SQL Database (prior to V12), SQL Database used what is called a gateway to proxy all connections and communications between clients and user databases. With V12, the gateway is still there, but it helps to establish the initial connection, and then gets out of the way in some cases. In the cases where direct connection can be established, subsequent communication happens directly between client and user database without going through the gateway anymore. This feature is also known as client “redirection”. The benefit of this “redirection” is faster response time for each database call, and better performance.

So how do you know if your application is taking advantage of the “redirection”?

The first restriction is that “redirection” by default is only supported for connections originating within Azure IP address space, so your application and Azure SQL database must both be deployed in Azure. However, an application outside Azure can also use “redirection” when a server connection policy is properly created (connectionType should be set as “Redirect” to enable “redirection”) against the target Azure SQL Database server. Keep in mind though the latency/perf benefit of redirection is very much diminished in the latter scenario since internet connection latency from outside the Azure data center would be much higher.

Second, your application must be using a SQL Server driver that supports TDS 7.4. Those drivers include (not a comprehensive list):

  • ADO.Net 4.5 or above
  • Microsoft SQL Server JDBC 4.2 or above (JDBC 4.0 actually supports TDS 7.4 but does not implement “redirection”)
  • Microsoft SQL Server ODBC 11 or above

— Note: Tedious for Node.js and JDBC 4.0 don’t implement redirection.

A simple way to find out what version of TDS the application is using is by querying:

SELECT session_id, protocol_type, protocol_version = SUBSTRING(CAST(protocol_version AS BINARY(4)),1,1)
FROM sys.dm_exec_connections;

Sample output:

session_id  protocol_type  protocol_version
89          TSQL           0x74
105         TSQL           0x74

If protocol_version is equal to or greater than 0x74 then the connection would support “redirection.”
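If you pull the raw integer protocol_version value into client code instead of converting it in T-SQL, the same check can be sketched in Python. This is a minimal illustration, not part of the original post: the helper names are hypothetical, and the sample raw values are assumptions chosen only so that their first byte is 0x74 or 0x73.

```python
# Sketch: apply the "first byte >= 0x74" rule to the integer protocol_version
# column of sys.dm_exec_connections. Helper names are hypothetical.

REDIRECT_MIN_TDS = 0x74  # TDS 7.4, the threshold stated in the text


def tds_version_byte(protocol_version: int) -> int:
    """Return the most significant byte of the 4-byte protocol version.

    This mirrors SUBSTRING(CAST(protocol_version AS BINARY(4)), 1, 1).
    """
    return (protocol_version >> 24) & 0xFF


def supports_redirection(protocol_version: int) -> bool:
    """True when the connection's TDS version byte is 0x74 or higher."""
    return tds_version_byte(protocol_version) >= REDIRECT_MIN_TDS


if __name__ == "__main__":
    sample = 0x74000004  # assumed example value; only the first byte matters
    print(hex(tds_version_byte(sample)), supports_redirection(sample))
```

A driver that reports a lower first byte (for example 0x73) would fail this check and keep sending all traffic through the gateway.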

Third, as documented here, even applications using the right SQL Server drivers aren’t guaranteed to make successful connections via “redirection”. You also need to make sure the following ranges of outbound TCP ports (in addition to 1433) are open on the application instance: 11000-11999, 14000-14999. This is the reason why “redirection” is not enabled by default for connections originating outside of Azure – in some on-premises environments, network administrators may be unwilling to open these additional outbound port ranges, causing connection attempts to fail.
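The port requirement above can be captured in a small helper, for instance when reviewing firewall rules. This is a sketch under stated assumptions: the function names are hypothetical, and can_connect is only meaningful against an endpoint that is actually listening (such as your server's gateway on port 1433), so it is defined here but not called.

```python
import socket

# Ports named in the text: 1433 for the gateway, plus the redirection ranges.
GATEWAY_PORT = 1433
REDIRECT_PORT_RANGES = [(11000, 11999), (14000, 14999)]


def is_required_outbound_port(port: int) -> bool:
    """True if the port must be open outbound for redirection to work."""
    if port == GATEWAY_PORT:
        return True
    return any(low <= port <= high for low, high in REDIRECT_PORT_RANGES)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a TCP handshake to host:port; False on any socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_required_outbound_port(11142) is true, while a port such as 12000 falls outside both ranges.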

 

Use Wireshark to look deeper at how redirection works

Now let’s use Wireshark (a network tracing tool) to examine the network traffic of a sample application running on an Azure VM that connects to an Azure SQL database, so we can see how it works. (If your application is deployed in a VM or cloud service, you can RDP into your app instance and install 3rd-party tools like Wireshark. Azure App Service doesn’t allow RDP.)

Sample application connection step through:

  1. Open a new connection to an Azure SQL database
  2. Execute command to run Ad-hoc query 1
  3. Execute command to run Ad-hoc query 2

In step #1, when a new connection is being established, we can see in Wireshark the pre-login TCP connection handshake as shown below (starting at time 2.702112). 10.5.0.4 is the IP address of the local VM where the application is running. 191.235.193.75 is the gateway IP address, used for inbound traffic on the default port 1433.

wireshark1

To finish establishing the connection, a dynamically identified port, in this case 11142, was sent to the application (time 2.790811). The application used that port and connected to the target user database (time 2.791394), with the IP address 191.235.193.77. The application then executed the first command (time 2.792376+).

wireshark2

Let’s proceed with executing the 2nd Ad-hoc query command. Remember that the connection is still open at this point, so when the application sends the command, it doesn’t need to go through the gateway (191.235.193.75) anymore. Instead it uses the “redirection” to communicate with the user database (191.235.193.77) directly (time 8.891064+).

wireshark3
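The pattern seen in the traces can be summarized with a short Python sketch that labels each packet by its destination. The rows below are typed by hand from the narrative above (times, IP addresses, and port 11142), not exported from Wireshark, and the classify helper is a hypothetical name.

```python
# Sketch: label trace rows as gateway traffic or redirected (direct) traffic.
GATEWAY = ("191.235.193.75", 1433)
REDIRECT_PORT_RANGES = [(11000, 11999), (14000, 14999)]


def classify(dst_ip: str, dst_port: int) -> str:
    """Return 'gateway', 'redirected', or 'other' for a destination."""
    if (dst_ip, dst_port) == GATEWAY:
        return "gateway"
    if any(low <= dst_port <= high for low, high in REDIRECT_PORT_RANGES):
        return "redirected"
    return "other"


# (time, destination IP, destination port), taken from the walkthrough above.
packets = [
    (2.702112, "191.235.193.75", 1433),   # initial handshake with the gateway
    (2.791394, "191.235.193.77", 11142),  # connection to the user database
    (8.891064, "191.235.193.77", 11142),  # second query, same open connection
]

for time, ip, port in packets:
    print(f"{time:.6f}  {ip}:{port}  {classify(ip, port)}")
```

Only the first row goes to the gateway. Everything after the redirect goes straight to the database node.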

Recap

To summarize, for an application running in the same data center as SQL database to leverage “redirection” capability, it needs to:

  1. Use a SQL Server driver version that supports TDS 7.4 or above (ADO.Net 4.5, JDBC 4.2, ODBC 11, or above).
  2. Open the following outbound TCP ports on the application instance: 1433, 11000-11999 and 14000-14999.
21 Sep 23:45

Dell and EMC Merger Complete; Forms World’s Largest Privately-Controlled Tech Company

by A.R. Guess

by Angela Guess A new press release reports, “Dell Technologies today announced completion of the acquisition of EMC Corporation, creating a unique family of businesses that provides the essential infrastructure for organizations to build their digital future, transform IT and protect their most important asset, information. This combination creates a $74 billion market leader with […]

The post Dell and EMC Merger Complete; Forms World’s Largest Privately-Controlled Tech Company appeared first on DATAVERSITY.

21 Sep 23:39

24 Hours of PASS (September 2016): Recordings now available!

by Sergio Govoni
 
The sessions of the 24 Hours of PASS "Summit Preview Edition" event (held on September 7 and 8, 2016) were recorded and are now available for online streaming!

If you missed a particular session or the entire event, you can now watch the sessions you are interested in.

Each video is available on the detail page of the related session.
 
Enjoy!
21 Sep 23:36

Real-World Azure SQL DB: Unexpected Database Maximum Size Limit

by Dimitri Furman

Reviewed by: Kun Cheng, Sanjay Mishra, Denzil Ribeiro, Arvind Shyamsundar, Mike Weiner, and Murshed Zaman

The Problem: A Production Outage

A customer using Azure SQL Database recently brought an interesting problem to our attention. Unexpectedly, their production workload started failing with the following error message: “The database ‘ProdDb’ has reached its size quota. Partition or delete data, drop indexes, or consult the documentation for possible resolutions.” The database was in a Premium elastic pool, where the documented maximum size limit for each database is 500 GB. But when they checked the size of the database shown in the Azure Portal, it was only 10 GB, and the portal was showing that all available database space had been used. Naturally, they wondered why the database was out of space when it was nowhere near the maximum database size limit for their Premium elastic pool.

Explanation

One of the established capacity limits of each Azure SQL DB database is its size. The maximum size limit is determined by the service objective (a.k.a. performance tier, or service tier) of the database, as documented in resource limit documentation. To determine the size limit, or size quota, that is set for a particular database, the following statement can be used, in the context of the target database:

SELECT DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes');

When a new database is created, by default its size quota is set to the maximum allowed for the service objective. However, it is possible to set the limit to a lower value, either when creating the database, or later. For example, the following statement limits the size of an existing database named DB1 to 1 GB:

ALTER DATABASE DB1 MODIFY (MAXSIZE = 1 GB);

Customers can use this ability to allow scaling down to a lower service objective, when otherwise scaling down wouldn’t be possible because the database is too large.

While this capability is useful for some customers, the fact that the actual size quota for the database may be different from the maximum size quota for the selected service objective can be unexpected, particularly for customers who are used to working with the traditional SQL Server, where there is no explicit size quota at the database level. Exceeding the unexpectedly low database size quota will prevent new space allocations within the database, which can be a serious problem for many types of applications.

In this context, there is one particular scenario that we would like to call out. Specifically, when a database with a size quota explicitly lowered from the default is scaled up to a higher service objective, its size quota remains unchanged. For an administrator expecting the maximum size quota for the new service objective to be in effect after the scaling operation completes, this may be an unpleasant surprise.

Let’s walk through an example. First, let’s create an S2 database without specifying an explicit database size quota:

CREATE DATABASE DB1 (SERVICE_OBJECTIVE = 'S2');

Once the database is created, we can query its current size quota, and see that it is set to the expected maximum for S2:

SELECT DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes');
-- Result: 268435456000 == 256,000 MB

(There is a minor inconsistency here that you might have noticed: the default quota is actually 256,000 MB, which is 250 GB, not 256 GB.)
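The arithmetic behind that result is easy to verify. The following Python sketch (not part of the original post; the helper names are illustrative) converts the MaxSizeInBytes value using binary units, where 1 MB is 1024 squared bytes and 1 GB is 1024 cubed bytes.

```python
# Sketch: convert a MaxSizeInBytes value into binary megabytes and gigabytes.
BYTES_PER_MB = 1024 ** 2
BYTES_PER_GB = 1024 ** 3


def bytes_to_mb(size_bytes: int) -> float:
    """Size in MB, where 1 MB = 1,048,576 bytes."""
    return size_bytes / BYTES_PER_MB


def bytes_to_gb(size_bytes: int) -> float:
    """Size in GB, where 1 GB = 1,073,741,824 bytes."""
    return size_bytes / BYTES_PER_GB


s2_default_quota = 268435456000  # value returned by the query above
print(bytes_to_mb(s2_default_quota))  # 256000.0
print(bytes_to_gb(s2_default_quota))  # 250.0
```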

Now let’s lower the quota and query it again:

ALTER DATABASE DB1 MODIFY (MAXSIZE = 10 GB);
SELECT DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes');
-- 10737418240 == 10 GB

We see that the quota has been lowered to 10 GB as expected. Now, let’s scale the database up to P1:

ALTER DATABASE DB1 MODIFY (SERVICE_OBJECTIVE = 'P1');

Note that scaling operations are asynchronous, so the ALTER DATABASE command will complete quickly, while the actual change can take much longer. To determine if the scaling operation on the DB1 database has completed, query the sys.dm_operation_status DMV in the context of the master database.

SELECT operation, state_desc, percent_complete, start_time, last_modify_time
FROM sys.dm_operation_status
WHERE resource_type_desc = 'Database'
      AND
      major_resource_id = 'DB1'
ORDER BY start_time;

/*
operation        state_desc  percent_complete  start_time               last_modify_time
---------------  ----------  ----------------  -----------------------  -----------------------
CREATE DATABASE  COMPLETED   100               2016-09-02 15:11:28.243  2016-09-02 15:12:09.933
ALTER DATABASE   COMPLETED   100               2016-09-02 15:16:49.807  2016-09-02 15:16:50.700
ALTER DATABASE   COMPLETED   100               2016-09-02 15:23:26.623  2016-09-02 15:25:24.837
*/

This shows all recent operations for the DB1 database. We see that the last ALTER DATABASE command has completed. Now we can query the size quota again (in the context of the DB1 database):

SELECT DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes');
-- 10737418240 == 10 GB

We see that even though the maximum size limit for a P1 database is 500 GB, the quota is still set to 10 GB.

Conclusion

It is important to know that in Azure SQL DB databases, an explicit database size quota always exists. This quota can be lower than the maximum (and default) quota for a given service objective. While for some customers this may be intentional, most would prefer the maximum quota to be in effect, particularly after scaling the database up.

We recommend that customers:

1. Proactively check the current size quota for your databases, to make sure it is set as expected. To do this, the following statement can be used in the context of the target database:

SELECT DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes');

2. When scaling up to a service objective with a larger maximum size quota, explicitly change the quota to match the maximum by using the ALTER DATABASE … MODIFY (MAXSIZE = …) command as shown above (unless a lower quota is desired to guarantee being able to scale down in the future). The change is executed in an online manner.

This is what the customer we mentioned at the beginning of this article did in order to resolve their application outage, and to proactively prevent a recurrence of the same problem.
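For readers who want to automate the two recommendations, here is a minimal Python sketch of the decision logic. It is an illustration, not the customer's actual tooling: the function names are hypothetical, the tier maximum is supplied by the caller, and running the generated statement against the database is left to whatever client library you use. Note that MAXSIZE accepts only specific values, so check the resource limit documentation for your service objective.

```python
# Sketch: detect a quota below the tier maximum and build the fix statement.
BYTES_PER_GB = 1024 ** 3


def quota_below_tier_max(current_quota_bytes: int, tier_max_gb: int) -> bool:
    """True when the database quota is lower than the tier's maximum."""
    return current_quota_bytes < tier_max_gb * BYTES_PER_GB


def build_maxsize_statement(database: str, max_size_gb: int) -> str:
    """Build the ALTER DATABASE ... MODIFY (MAXSIZE = ...) statement."""
    safe_name = database.replace("]", "]]")  # escape for a bracketed identifier
    return f"ALTER DATABASE [{safe_name}] MODIFY (MAXSIZE = {max_size_gb} GB);"


# Values from the walkthrough: a 10 GB quota on a P1 database (500 GB maximum).
current_quota = 10737418240
if quota_below_tier_max(current_quota, 500):
    print(build_maxsize_statement("DB1", 500))
```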


21 Sep 23:36

You’ve been doing cloud for years...

by Rob Farley

This month’s T-SQL Tuesday is hosted by Jeffrey Verheul (@devjef) and is on the topic of Cloud.

I seem to spend quite a bit of my time these days helping people realise the benefit of the Azure platform, whether it be Machine Learning for doing some predictions around various things (best course of action, or expected value, for example), or keeping a replicated copy of data somewhere outside the organisation’s network, or even a full-blown Internet of Things piece with Stream Analytics pulling messages off a Service Bus Event Hub. But primarily, the thing that I have to combat most of all is this:

Do I really want that stuff to be ‘out there’?

People aren’t used to having their data, their company information, their processing, going on somewhere outside the building where they physically are.

Now, there are plenty of times when organisations’ server rooms aren’t actually providing as much benefit as they expect. Conversations with people quickly help point out that their web site isn’t hosted locally (I remember in the late ‘90s a company I was at making the decision to start hosting their web site at an actual hosting provider rather than having every web request come in through the same modem as all their personal web browsing). Email servers are often the next to go. But for anyone working at home, the server room may as well be ‘the cloud’ anyway, because their data is going off to some ‘unknown’ place, with a decent amount of cabling between where they are and where their data is hosted.

Everyone’s photos are stored in the ‘cloud’ already, whether it be in Instagram’s repository or in something which is more obviously ‘the cloud’. Messages with people no longer just live on people’s phones, but on the servers of Facebook and Twitter. Their worries and concerns are no longer just between them and their psychiatrist, but stored on Google’s search engine web logs.

The ‘cloud’ is part of today’s world. You’re further into it than you may appreciate. So don’t be afraid, but try it out. Play with Azure ML, or with other areas of Cortana Intelligence. Put some things together to help yourself in your day-to-day activity. You could be pleasantly surprised about what you can do.

@rob_farley