Shared posts

15 Nov 02:26

Anticipation, Fear Uncovered in New Study on What Executives Really Think About AI

by A.R. Guess

by Angela Guess A new press release states, “For nearly two years, artificial intelligence has been one of the most hyped fields to hit enterprise software and modern business. New research data from CITE Research, sponsored by SugarCRM, aims to detect how the relentless news cycle is affecting business executives, and whether or not they […]

The post Anticipation, Fear Uncovered in New Study on What Executives Really Think About AI appeared first on DATAVERSITY.

15 Nov 02:26

Using SQL Service Broker for asynchronous external script (R / Python) execution in OLTP systems

by Arvind Shyamsundar

Authored by Arvind Shyamsundar (Microsoft)
Credits: Prasad Wagle, Srinivasa Babu Valluri, Arun Jayapal, Ranga Bondada, Anand Joseph (members of the Sayint by Zen3 team)
Reviewers: Nellie Gustafsson, Umachandar Jayachandran, Dimitri Furman (Microsoft)

This blog post was inspired by our recent work with the Sayint dev team, who are a part of Zen3 Infosolutions. SQLCAT has been working with them in their process of adopting SQL Server 2017. Microsoft introduced the ability to invoke external Python scripts in SQL Server 2017, and this capability to effectively move ‘intelligence’ closer to the data was a big motivating factor for the Sayint team to adopt SQL Server 2017.

Sayint Application Overview

The Sayint application has many layers through which data flows:

  • It all starts with the Automatic Speech Recognition (ASR) server. This server takes audio recordings of conversations between a customer and a call center executive, and then converts that to text.
  • This text is then recorded into a database and in parallel sent to a message queue.
  • The Sayint application uses several Python scripts which are operationalized in a separate application server to read the same call transcript from the message queue.
  • Once the message is read by the Python script, it processes that call transcript in various ways, one of which is to find ‘n-grams’ in the text.
  • These n-grams are then recorded into the same database. The diagram below shows the data flow at a high level:

Sayint Architecture

The multiple paths that the data takes, and the external application server, are moving parts which add to the latency and complexity of the solution. This is one of the places where the in-database execution of Python code in SQL Server 2017 appeared to be a valuable simplification for the Sayint development team.

Motivation

Given the capability for external script execution in SQL Server, one scenario is that some external (Python) scripts which are fairly lightweight might be invoked very frequently in an OLTP system. In fact, there are now special optimizations in SQL Server 2017 to directly invoke specific types of ML models using the new native / real-time scoring functions.
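
For illustration, here is a minimal sketch of what such a native-scoring call might look like, using the PREDICT function introduced in SQL Server 2017. The table, model and output column names below (dbo.Models, dbo.NewCalls, predicted_score) are hypothetical, and the model is assumed to have been trained with a RevoScaleR / revoscalepy algorithm and serialized into a table beforehand.

-- Hypothetical example of real-time (native) scoring with PREDICT.
-- dbo.Models is assumed to hold a serialized revoscalepy/RevoScaleR model;
-- dbo.NewCalls is assumed to hold the rows we want to score.
DECLARE @model varbinary(max) =
    (SELECT model_object FROM dbo.Models WHERE model_name = 'call_classifier');

SELECT d.call_id, p.predicted_score
FROM PREDICT(MODEL = @model, DATA = dbo.NewCalls AS d)
WITH (predicted_score float) AS p;  -- output column names/types must match what the trained model emits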

But what if you had a (more expensive / long running) external script which you ideally wanted to run as ‘quickly’ as possible when new data arrives, but did not want that expensive external script to block the OLTP insert operation?

That is the scenario we address in this blog post. Here, we will show you how you can use the asynchronous execution mechanism offered by SQL Server Service Broker to ‘queue’ up data inside SQL Server which can then be asynchronously passed to a Python script, and the results of that Python script then stored back into SQL Server.

This is effectively similar to the external message queue pattern but has some key advantages:

  • The solution is integrated within the data store, leading to fewer moving parts and lower complexity
  • Because the solution is in-database, we don’t need to make copies of the data. We just need to know what data has to be processed (effectively a ‘pointer to the data’ is what we need).

Service Broker also offers options to govern the number of readers of the queue, thereby ensuring predictable throughput without affecting core database operations.

Implementation Details

We assume that you have some background about how SQL Server Service Broker helps decouple execution of code inside of SQL Server. If not, please take some time to review the documentation on the feature. There are also many excellent articles in the public domain on the general application of Service Broker.

The chart below shows the sequence of operations involved in our solution:

(sequence diagram)

With that overview in mind, let us take a look at sample code which illustrates exactly how this works!

Tables and Indexes

First let’s create a table which contains the raw call data. The ASR application inserts into this table. Note the call_id which will serve as a ‘unique key’ for incoming calls.

CREATE TABLE dbo.BaseTable (
     call_id    INT PRIMARY KEY NOT NULL,
     callrecord NVARCHAR (MAX)
);

Next, let’s create a table which will store the ‘shredded’ n-grams corresponding to each call_id:

CREATE TABLE dbo.NGramTable (
     call_id   INT            NOT NULL,
     ngramid   INT            IDENTITY (1, 1) NOT NULL,
     ngramtext NVARCHAR (MAX),
     CONSTRAINT PK_NGramTable PRIMARY KEY (ngramid),
     INDEX CCI_NGramTable CLUSTERED COLUMNSTORE,
     INDEX NC_NGramTable NONCLUSTERED (call_id)
);

Note that we have declared a Clustered Columnstore Index on this table. Here’s why:

  • The primary purpose is to enable high-performance querying on aggregates (such as COUNT).
  • Secondly, given that n-grams are naturally repetitive, the dictionary compression offered by Columnstore Indexes is very effective at saving storage space.

In the Sayint application, there is a need to perform full-text search queries over the n-grams (“show me all n-grams which contain a specific keyword”). To support those queries, we proceed to create a SQL Full-Text Index on the n-gram text column:

CREATE FULLTEXT CATALOG [mydb_catalog] WITH ACCENT_SENSITIVITY = ON

CREATE FULLTEXT INDEX ON [dbo].NGramTable
     (ngramtext)
     KEY INDEX PK_NGramTable
     ON mydb_catalog;

Service Broker Objects

With the tables and indexes created, we now proceed to create the necessary Service Broker constructs. First, we enable Service Broker for this database:

ALTER DATABASE mydb SET ENABLE_BROKER;

Well, that was simple enough! Next, let’s create message types and associated contracts:

CREATE MESSAGE TYPE [RequestMessage]
     VALIDATION = WELL_FORMED_XML;

CREATE MESSAGE TYPE [ReplyMessage]
     VALIDATION = WELL_FORMED_XML;

CREATE CONTRACT [SampleContract]
     ([RequestMessage] SENT BY INITIATOR, [ReplyMessage] SENT BY TARGET);

Lastly we create the Queues and associated Services. We have to do this for both the target and initiator, though the initiator queue will not really store any messages in our specific implementation.

CREATE QUEUE TargetQueue;

CREATE SERVICE [TargetService]
     ON QUEUE TargetQueue
     ([SampleContract]);

CREATE QUEUE InitiatorQueue;

CREATE SERVICE [InitiatorService]
     ON QUEUE InitiatorQueue;

Stored Procedures

We first create a stored procedure which takes a list of call_ids and computes the n-grams for those calls. This stored procedure is the one that actually invokes the Python code using sp_execute_external_script.

CREATE PROCEDURE GenerateNGrams
(
     -- varchar is okay here because we only have numbers
     @listOfCallIds varchar(max)
)
AS
BEGIN
     EXECUTE sp_execute_external_script @language = N'Python', @script = N'
import nltk
from nltk import corpus
import pandas as pd
import string
from revoscalepy import RxSqlServerData, rx_data_step
mydict = []
translate_table = dict((ord(char), None) for char in string.punctuation)
def generate_ngrams(row):
     inputText = row["InputText"]
     nopunc = inputText.translate(translate_table)
     raw_words = nltk.word_tokenize(nopunc)
     
     words = [word for word in raw_words if word not in corpus.stopwords.words("english")]
     
     my_bigrams = list(nltk.bigrams(words))
     my_trigrams = nltk.trigrams(words)
     
     for bigram in my_bigrams:
         rowdict = {}
         rowdict["call_id"] = row["call_id"]
         rowdict["ngram_text"] = (bigram[0] + " " + bigram[1])
         mydict.append(rowdict)
     return
mydict = []
result = InputDataSet.apply(lambda row: generate_ngrams(row), axis=1)
OutputDataSet = pd.DataFrame(mydict)
'
, @input_data_1 = N' SELECT B.call_id, B.callrecord as InputText FROM BaseTable B
WHERE EXISTS(SELECT * FROM STRING_SPLIT(@listOfCallIds, '','') as t WHERE CAST(t.value as int) = B.call_id)'
, @params = N'@listOfCallIds varchar(max)'
, @listOfCallIds = @listOfCallIds
END

Next, we create an “activation stored procedure” which is invoked by the Service Broker queue whenever there are messages on the queue. This activation procedure reads from the queue, constructs a list of call_id values and then invokes the previous “n-gram creator” stored procedure:

CREATE PROCEDURE TargetActivProc
AS
BEGIN
DECLARE @RecvReqDlgHandle AS UNIQUEIDENTIFIER;
DECLARE @RecvReqMsg AS XML;
DECLARE @RecvReqMsgName AS sysname;
DECLARE @listOfCallIds varchar(max);
WHILE (1 = 1)
     BEGIN
         BEGIN TRANSACTION;
         WAITFOR (RECEIVE TOP (1) @RecvReqDlgHandle = conversation_handle, @RecvReqMsg = message_body, @RecvReqMsgName = message_type_name FROM TargetQueue),  TIMEOUT 5000;
         IF (@@ROWCOUNT = 0)
             BEGIN
                 ROLLBACK;
                 BREAK;
             END
         IF @RecvReqMsgName = N'RequestMessage'
             BEGIN
                 -- Populate a local variable with a comma separated list of {call_id} values which are new
                 -- so that the external script invoked by GenerateNGrams can operate on the associated data
                 -- We avoid making a copy of the InputText itself as it will then occupy space in a temp table
                 -- as well as in a Pandas DF later on
                 SELECT @listOfCallIds = STRING_AGG(T.c.value('./@call_id', 'varchar(100)'), ',')
                 FROM   @RecvReqMsg.nodes('/Inserted') AS T(c);
                -- Call the SPEES wrapper procedure to generate n-grams and then insert those into the n-gram table
                 INSERT NGramTable (call_id, ngramtext)
                 EXECUTE GenerateNGrams @listOfCallIds;
                END CONVERSATION @RecvReqDlgHandle;
             END
         ELSE
             IF @RecvReqMsgName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
                 BEGIN
                     END CONVERSATION @RecvReqDlgHandle;
                 END
             ELSE
                 IF @RecvReqMsgName = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
                     BEGIN
                         END CONVERSATION @RecvReqDlgHandle;
                     END
         COMMIT TRANSACTION;
     END
END

We then associate the activation procedure with the queue:

ALTER QUEUE TargetQueue WITH ACTIVATION (STATUS = ON, PROCEDURE_NAME = TargetActivProc, MAX_QUEUE_READERS = 5, EXECUTE AS SELF);

Trigger

We finally create a trigger on the BaseTable which will be invoked automatically for each INSERT operation. This trigger takes the list of call_id values which have been inserted and transforms that into an XML document (using FOR XML), which is then sent to the Service Broker queue:

CREATE OR ALTER TRIGGER trg_Insert_BaseTable
     ON dbo.BaseTable
     FOR INSERT
     AS BEGIN
            DECLARE @InitDlgHandle AS UNIQUEIDENTIFIER;
            DECLARE @RequestMsg AS XML;
            
            SELECT @RequestMsg = (SELECT call_id
                                  FROM   Inserted
                                  FOR    XML AUTO);
            
            BEGIN TRANSACTION;
           BEGIN DIALOG @InitDlgHandle
                FROM SERVICE [InitiatorService]
                TO SERVICE N'TargetService'
                ON CONTRACT [SampleContract]
                WITH ENCRYPTION = OFF;
            
            -- Send a message on the conversation           
            SEND ON CONVERSATION (@InitDlgHandle) MESSAGE TYPE [RequestMessage] (@RequestMsg);
           COMMIT TRANSACTION;
        END

Unit Testing

Let’s now unit test the operation of the trigger, activation procedure and the n-gram computation using Python! Let’s insert 4 rows in one go into the base table, simulating what the ASR application might do:

TRUNCATE TABLE BaseTable;
TRUNCATE TABLE NGramTable;
INSERT  INTO BaseTable
VALUES (1, 'the quick brown fox jumps over the lazy dog'),
(2, 'the lazy dog now jumps over the quick brown fox'),
(3, 'But I must explain to you how all this mistaken idea of denouncing of a pleasure and praising pain was born and I will give you a complete account of the system, and expound the actual teachings of the great explorer of the truth, the master-builder of human happiness. No one rejects, dislikes, or avoids pleasure itself, because it is pleasure, but because those who do not know how to pursue pleasure rationally encounter consequences that are extremely painful. Nor again is there anyone who loves or pursues or desires to obtain pain of itself, because it is pain, but occasionally circumstances occur in which toil and pain can procure him some great pleasure. To take a trivial example, which of us ever undertakes laborious physical exercise, except to obtain some advantage from it? But who has any right to find fault with a man who chooses to enjoy a pleasure that has no annoying consequences, or one who avoids a pain that produces no resultant pleasure?'),
(4, 'On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain. These cases are perfectly simple and easy to distinguish. In a free hour, when our power of choice is untrammeled and when nothing prevents our being able to do what we like best, every pleasure is to be welcomed and every pain avoided. But in certain circumstances and owing to the claims of duty or the obligations of business it will frequently occur that pleasures have to be repudiated and annoyances accepted. The wise man therefore always holds in these matters to this principle of selection: he rejects pleasures to secure other greater pleasures, or else he endures pains to avoid worse pains.');

(In case you are wondering, the last 2 text samples in the unit test are taken from this Wikipedia page!)

Then we can take a look at the n-gram table:

SELECT *
FROM   NGramTable;

As you can see below, n-grams have been computed for multiple call_id values! (The results below have been truncated for readability; there are values for call_id 4 as well in the actual table.)

(screenshot: rows from NGramTable)

Now, let’s try a representative full-text query:

SELECT T.*
FROM   CONTAINSTABLE(NGramTable, ngramtext, 'fox') FT 
JOIN NGramTable T
ON T.ngramid = FT.[Key];

This query returns the following:

(screenshot: full-text query results)

Stress Test

Let’s put this to the ultimate test – multiple threads concurrently inserting records. The ideal outcome is that the INSERT operations complete quickly, while the Python.exe processes run asynchronously and finish the n-gram generation in due course. To facilitate the stress test, we create a SEQUENCE object in SQL and an associated stored procedure:

CREATE SEQUENCE CallIdsForTest AS INT
    START WITH 1  
    INCREMENT BY 1;
GO

CREATE OR ALTER PROCEDURE InsertCall
AS
BEGIN
	SET NOCOUNT ON

	INSERT  INTO BaseTable
	VALUES (NEXT VALUE FOR CallIdsForTest, 'But I must explain to you how all this mistaken idea of denouncing of a pleasure and praising pain was born and I will give you a complete account of the system, and expound the actual teachings of the great explorer of the truth, the master-builder of human happiness. No one rejects, dislikes, or avoids pleasure itself, because it is pleasure, but because those who do not know how to pursue pleasure rationally encounter consequences that are extremely painful. Nor again is there anyone who loves or pursues or desires to obtain pain of itself, because it is pain, but occasionally circumstances occur in which toil and pain can procure him some great pleasure. To take a trivial example, which of us ever undertakes laborious physical exercise, except to obtain some advantage from it? But who has any right to find fault with a man who chooses to enjoy a pleasure that has no annoying consequences, or one who avoids a pain that produces no resultant pleasure?')
END;

To simulate the workload, we use the OStress utility from the RML Utilities package to generate a number of concurrent connections. The specific OStress command line used is shown below:

ostress -S.\SQL2017 -dmydb -Q"exec InsertCall" -n100 -q

Each execution of the stored procedure should eventually result in 84 n-gram values being populated in the NGramTable. So with the above stress test, we should see 8400 records in the table. When we run the OStress command line, the 100 concurrent calls to the InsertCall stored procedure finish very quickly – in 1.403 seconds. This is good, because we do not want the front-end stored procedure to block in any way. The time it takes the NGramTable to be completely populated with the 8400 records is around 88 seconds. Almost all that time is actually spent in Python.exe crunching away at the text (converting to lower case, tokenizing, generating n-grams etc.)
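
If you want to watch the asynchronous processing catch up while the test runs, a few quick checks like the following can help (a sketch against the objects defined earlier in this post):

-- Messages still waiting to be picked up by the activation procedure:
SELECT COUNT(*) AS pending_messages FROM dbo.TargetQueue WITH (NOLOCK);

-- Activation procedure instances currently running (capped by MAX_QUEUE_READERS):
SELECT spid, queue_id, procedure_name
FROM sys.dm_broker_activated_tasks;

-- n-grams generated so far; this should reach 8400 (100 calls x 84 n-grams) once processing completes:
SELECT COUNT(*) AS ngrams_so_far FROM dbo.NGramTable;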

Resource Governance

When running the above stress test (with default settings for SQL Server), you will notice that the Python.exe instances end up using almost 100% of the CPU on the machine. This is expected because text processing is a CPU-bound activity. To balance the resource usage and keep the external Python processes from using too much CPU, we can use the external Resource Governor setting for the maximum CPU used by external instances of R / Python. It is as easy as running the code below; no restart is required!

ALTER EXTERNAL RESOURCE POOL [default] WITH (max_cpu_percent=30, AFFINITY CPU = AUTO)

ALTER RESOURCE GOVERNOR RECONFIGURE;

After setting this external pool CPU limit to 30%, the overall CPU usage in the system is reduced, with the obvious tradeoff of higher overall latency in populating the n-grams into their table. In this case, the total time to populate the 8400 n-grams is around 234 seconds, which is roughly what we would expect: with only 30% of the CPU resources available, processing time increases to around 3x the original duration.

So by adjusting the Resource Governor limit you can achieve the right balance between resource usage on the system and external script processing time. More information about Resource Pools for external scripts is here and here.
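
To verify the configured limit and keep an eye on what the external pool actually consumes, you can query the Resource Governor catalog view and DMV for external resource pools (a quick sketch; exact columns may vary slightly by version):

-- Configured limits for external script resource pools:
SELECT name, max_cpu_percent, max_memory_percent, max_processes
FROM sys.resource_governor_external_resource_pools;

-- Runtime statistics for the external pools:
SELECT *
FROM sys.dm_resource_governor_external_resource_pools;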

Summary

In the actual Sayint application, the ability to query for n-grams containing a specific keyword is crucial. With this workflow within SQL Server 2017, that requirement is elegantly accomplished using a combination of several SQL Server features alongside the in-DB Python code – with the advantages of lower latency, data security and no data duplication.

15 Nov 02:25

Surprising but True: the Basics are Still Difficult

by kevin

SURPRISING BUT TRUE: THE BASICS ARE STILL DIFFICULT

I always look forward to new research from Unisphere Research, a division of Information Today, Inc., publishers of this magazine and other great products for data professionals. The latest report which you should read is “SQL Server Transformation: Toward Agility & Resiliency 2017; PASS Database Management Survey.”

READ MORE…

Originally Posted April 07, 2017

The post Surprising but True: the Basics are Still Difficult appeared first on Kevin Kline.

19 Oct 06:24

Stronger Security Via Always Encrypted in SQL Server 2016

by kevin

STRONGER SECURITY VIA ALWAYS ENCRYPTED IN SQL SERVER 2016

Prior to SQL Server 2016, currently in CTP, your main method for encrypting a SQL Server application was to use a feature called Transparent Data Encryption. TDE provides strong encryption, but with some shortcomings. First, you have to encrypt an entire database. No granularity is offered at a lower level, such as encrypting specific tables or certain data within a table. Second, TDE encrypts only data at rest, in files. Data in memory or in-flight between the application and server are unencrypted. Enter Always Encrypted.
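
To give a feel for that column-level granularity, here is a minimal sketch of a table that encrypts only the sensitive column. It assumes a column master key and a column encryption key (named CEK_Auto1 below, a hypothetical name) have already been provisioned, typically via the SSMS Always Encrypted wizard or PowerShell.

-- Hypothetical example: only the SSN column is encrypted; the rest of the table stays in plain text.
CREATE TABLE dbo.Customers (
    CustomerID   int IDENTITY(1,1) PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,      -- not encrypted
    SSN          char(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
            COLUMN_ENCRYPTION_KEY = CEK_Auto1,
            ENCRYPTION_TYPE = DETERMINISTIC,  -- deterministic encryption allows equality lookups and joins
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NOT NULL
);

Encryption and decryption happen in the client driver (for example, a .NET connection string with Column Encryption Setting=Enabled), so the values stay encrypted in the server’s memory and on the wire.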

READ MORE…

Originally Posted October 07, 2015

The post Stronger Security Via Always Encrypted in SQL Server 2016 appeared first on Kevin Kline.

19 Oct 06:24

For the Better Developer: When indexes are not enough

by Davide Mauri

Another month, another session for the PASS Application Development VG. This time I will be both the host and the guest :), talking about something that is very close to my interests and, I think, also a key point for any developer who wants to have success in the future of AI. Here’s the session title and abstract:

For the Better Developer: When indexes are not enough

If you want optimum performance in SQL Server, you have to use indexes. But what if you have already implemented indexes and your solution is still slow or doesn’t scale like you want? Or, if you're on Azure, it simply requires too many resources, which in turn means more money spent on it? You may have to rethink the way you write your queries. How you write your queries is directly related to how you approach and solve your business problems, and more often than not thinking outside the box is the way to unlock incredible performance. But what does this exactly mean in a database? And how can a developer do that? In this session, you’ll see how using some lateral thinking and a set-based approach will open up a whole world of possibilities. Thanks to this demo-intensive session, you'll never be the same after switching on this new mindset!

Registration is free via the usual link:

http://appdev.pass.org/

The live event will take place on October 18th and, again as usual, the session will be recorded and made available on the group's YouTube channel a couple of days after the live event.

19 Oct 06:24

Unattended install and configuration for SQL Server 2017 on Linux

by Denzil Ribeiro

SQL 2017 bits are generally available to customers today. One of the most notable milestones in the 2017 release is SQL Server on Linux. Setup has been relatively simple for SQL Server on Linux, but often there are questions around unattended install. For SQL Server on Linux, there are several capabilities that are useful in unattended install scenarios:

  • You can specify environment variables prior to the install that are picked up by the install process, to enable customization of SQL Server settings such as TCP port, Data/Log directories, etc.
  • You can pass command line options to Setup.
  • You can create a script that installs SQL Server and then customizes parameters post-install with the mssql-conf utility.

In SQL Server documentation, we have a sample of what an unattended install process would look like for specific distributions here.

Given the multiple installs SQLCAT had to do as part of working with early versions of SQL Server on Linux, the multiple supported platforms, and some common post-install tasks, we created a sample script that further eases the unattended install process and allows you to:

  • Have one script across multiple distributions
  • Choose the components installed (SQL Server, SQL Agent, SQL FTS, SQL HA)
  • Configure common install parameters via a config file
  • Set up some SQL Server on Linux best practices we have documented such as tempdb configuration and processor affinity, which are not part of the core install
  • Enable you to specify a custom post-install .sql file to run once SQL Server is installed

Note: If you choose to install HA components, for RHEL you have to enable the subscription manager and add the right HA repository, and for SLES you need to add the HA add-on. The configuration file has links to the documentation in both cases.

Here is how to use the unattended install script:

a. Download the script:

         git clone https://github.com/denzilribeiro/sqlunattended.git
         cd sqlunattended
         

b. To prevent sudo password prompts during unattended install:

sudo chown root:root sqlunattended.sh
sudo chmod 4755 sqlunattended.sh

c. Modify the conf file to specify the configuration options required, including what components to install, data/log directories, etc. Here is a snippet from the sqlunattended.conf:

#Components to install
INSTALL_SQL_AGENT=YES
INSTALL_FULLTEXT=NO
INSTALL_HA=NO

# This will set SQL processor affinity for all CPUs, we have seen perf improvements doing that on Linux
SQL_CPU_AFFINITY=YES

# This creates 8 tempdb files if NumCPUS >=8, or as many tempdb files as there are CPUs if NumCPUS < 8
SQL_CONFIGURE_TEMPDB_FILES=YES
SQL_TEMPDB_DATA_FOLDER=/mnt/data
SQL_TEMPDB_LOG_FOLDER=/mnt/log
SQL_TEMPDB_DATA_FILE_SIZE_MB=500
SQL_TEMPDB_LOG_FILE_SIZE_MB=100

d. Run the unattended install

/bin/bash sqlunattended.sh

We hope that this will help to further simplify the customization of the install process. Your feedback is appreciated!

19 Oct 06:22

What’s New in SQL Server 2017 Database Engine

by Artemakis Artemiou

SQL Server 2017 (also known as SQL Server vNext) and the new Azure SQL Database are finally here and they are generally available! In subsequent articles we will analytically examine every new feature using many examples. In this post, we will talk about SQL Server 2017 Database Engine. We will…

The post What’s New in SQL Server 2017 Database Engine appeared first on SQLNetHub.

19 Oct 06:22

Get a Phone Alert When a Query Finishes

by Bill Graziano

Today, through various snafus and situations, I found myself waiting a LONG time for a lock to clear up. Here's how I got an alert to pop up on my phone (an iPhone) when that stupid SPID finally rolled back.

You'll need to have database mail configured and ready to go.

First, configure your iPhone to display an alert when you receive VIP mail. Open mail and click on the little "i" in the circle by the VIP section in the mailboxes list. Click "VIP Alerts" and configure as desired. I have all of them selected and a sound configured.

Second, send yourself an email from the server using sp_send_dbmail.

Third, on your iPhone, open the email and click on the From field. That will open a contact form for that email address (aka the server). There should be an option to add to VIP on the screen.

Fourth, write a query to email yourself. That should look something like this (typed from memory and subject to typos):

USE msdb;

SELECT TOP 1 * FROM TheDB.dbo.TheLockedTable;

EXEC sp_send_dbmail  
    @Subject = 'It''s Done!',
    @Recipients = 'YourEmail';

Make locking work FOR you for a change. The SELECT will block until the lock clears. It sends you an email and you'll get an alert moments later.

Fifth, remove that server from your VIP list. The last thing I want is alerts on every email from my database servers.

19 Oct 06:22

Taking your data to the cloud is easier than you thought

by SQL Server Team

Data management is shifting away from on-premises towards the many benefits of the cloud. Data is the fuel that energizes businesses in today’s marketplace, and the cloud amplifies this energy. The question then arises — what’s the simplest way to take your data to the cloud?

In this free webinar, our Azure experts share how and why migration to the cloud is now easier, faster, and more cost-effective. Discover how to:

  • Realize cost savings by preserving your investment in already purchased on-premises licenses.
  • Save four times the investment you made on cores, leverage cloud flexibility for ROI, and uncover other cost savings in your environment.
  • Migrate easily with the new Azure SQL Database Managed Instance, shifting to the cloud without having to rearchitect apps.
  • Prepare for your migration with four important steps.

Tune in to our live webinar, Modernize to the cloud in four quick, easy, and cost-effective steps, on October 5, 2017, at 10:00 AM Pacific Time (UTC-7) to learn how modernizing your data with the cloud is within your reach. You’ll also get the chance to have your questions answered in real time by our team. Register today.

19 Oct 06:21

Making Videos for Fun and Profit

by arcanecode

As I author this post, I’m at a Microsoft conference. Today’s focus is on the creation of video content to help promote your brand. As I have some experience with this from my Pluralsight videos, I had some suggestions as to technology.

The first tool I use is Camtasia, from TechSmith. It is designed to do screen recordings with narration. You can record your audio during your screen recording or add narration later, or a mix of both (should you, for example, mess up and need to fix an audio flub). It has “call outs”, i.e. ways to draw on the screen, transitions, text/titles, noise reduction, and more. It can also record from your webcam, but it’s more meant for putting your head in a small box in a corner of the screen.

I’ve used the free Audacity tool to do some audio touch ups on occasion.

I recently bought CyberLink’s PowerDirector, although to be honest I haven’t yet had time to work with it, but I look forward to doing so. I also recently found a free video editing suite called DaVinci Resolve that looks very nice, but like PowerDirector I haven’t had time to look at it. But the price is right.

Of course if you want to go all out there’s Adobe Premiere. It’s the video editor the pros use. I haven’t used it, but my daughter uses it to edit her YouTube videos (http://annakatmeow.com is where you can see examples of it in use; her Let’s Explore Nashville and DisneyLand/Vidcon blogs made good use of its benefits). FYI, Adobe has a program for students where you can get the entire cloud suite, normally $49 a month, for only $9 a month.

I also do some video work on my iOS devices. Mostly I record on my iPhone then transfer and edit video on my 12.9 inch iPad Pro.

For recording, I use FiLMiC Pro. It’s the most complete recorder I’ve found, with tons of options. It also has a remote app, so I can set up my phone on a tripod then use one of my iPads to see what the phone sees, start and stop the recordings, etc.

If you want a mid-range video recording option, try MoviePro. It has a lot more features than the built-in Apple software, but isn’t quite as complex as FiLMiC Pro. It also has a remote app.

For editing, I use LumaFusion, a very full-featured editor. It has titles, transitions, prebuilt lower thirds, audio editing, easy addition of sound tracks, and more. I’ve edited several videos with it, and like it a lot.

On the hardware side, I use a Rode Podcaster USB mic to do my video work. A bit pricey for the hobbyist, but since I do this for a living it was a good investment. I have had it 7 years now and it still works great.

With my iOS devices I invested in the Rode smartLav+. With it I can get some really good sound quality even in less than ideal conditions.

I also own a Nikon D7000 DSLR with video capabilities, but as good as my iPhone 7 camera is now, I don’t use it all that much except for when I want to take some really great photos too. I have a variety of lenses that will easily outperform the iPhone camera. It’s just big and bulky, and I need a backpack to carry around all my accessories.

I have a small tripod for my phone, and a bigger one for holding my iPads. I also have a SanDisk USB key that can plug into both iOS devices as well as your computer’s standard USB port. Makes it very easy to transfer video files from one device to another.

The last thing I ordered was off Kickstarter; I saw the guy demonstrating it at VidCon and loved it. It’s a ten foot selfie stick. I know, you’re scratching your head and thinking I’ve gone nuts. A selfie stick? But the more I thought about it, the more uses I kept coming up with for it.

This will be great for taking crowd shots at big events, such as Ignite and the PASS Summit. It comes with a tripod base, so you could use it to record a stand-up hallway interview with someone, or put it on a table top. And that’s just for us geeks; I can also see it being useful for journalists or real estate agents, letting them get unique shots. It really fills a gap for when a drone would be useful but can’t be used, due to being indoors or other regulations. It comes with both a phone mount as well as a GoPro mount, so I may wind up having to buy a GoPro.

Often video can be greatly enhanced through the use of soundtracks or special audio effects. A few sources I found for these include:

ZapSplat: Lots of sound effects and some music. https://www.zapsplat.com/

FreeSound: Mostly free, good but not easy to find things. https://freesound.org/

Epidemic Sound: Outstanding music, I mean really, really good. It does require a fee though: either 99 cents per minute if you only want one song, or $15 a month per YouTube channel you want to use it in.  http://www.epidemicsound.com/

Finally, there is an excellent YouTube channel by Amy Schmittauer called Savvy Sexy Social. She provides excellent advice on vlogging, including content creation, technical aspects, and more. She’s also authored an outstanding book which I highly recommend called Vlog Like a Boss: How to Kill It Online with Video Blogging.

19 Oct 06:21

Secure your on-premises network outbound connection to Azure SQL Database by locking down target IP addresses

by Kun Cheng (SQLCAT)

Reviewed by: Dimitri Furman, John Hoang, Mike Weiner, Rajesh Setlem

Most people familiar with Azure SQL DB (aka SQL Database) are aware of the firewall setting requirements of SQL DB, which are very important to lock down connections to SQL DB on Azure. Details are documented here. The firewall setting helps restrict inbound connections to clients with specific IP addresses. Note that this refers to the firewall on the Azure SQL DB side. But when the SQL client is running on-premises behind an internet gateway/proxy server, the firewall of the gateway/proxy server needs to be configured as well, to allow outbound connections to Azure. Other than opening TCP port 1433, which is the port SQL DB listens on, customers may also limit the IP addresses of target SQL DBs that are allowed for outbound connections. This is important from a security perspective, since it blocks unwanted network traffic and limits the extent of damage that a potential hacker/malware could do to copy confidential data out of the on-premises network. So how do you know your intended target Azure SQL DB IP addresses for outbound connections?

I had a customer using a simple approach to get his SQL DB IP address. Let me explain with an example. When you spin up a SQL database on Azure, it’s always associated with a logical SQL Server, which has a DNS name of the form <servername>.database.windows.net (shown on the Azure portal SQL DB overview blade).

It’s easy to look up its IP address by ping.

The customer got his SQL DB IP address and configured his on-premises network firewall to allow his SQL client application outbound connections only to that IP address. And it worked! But only until one day when that IP address changed without notice. What happened?

SQL DB actually has a fixed set of public gateway IP addresses. These IP addresses have been published in this article on SQL DB connectivity (it’s also a very good read to understand the connectivity architecture of Azure SQL DB). Note that for each region, there could be up to 2 IP addresses, which provide enhanced high availability for the SQL DB gateway. The aforementioned customer had a connection problem in the end because when a gateway failover occurred, its IP address switched to the secondary. So it’s very important to configure your on-premises firewall to allow outbound connections to both gateway IP addresses of each region. And here is the important part: these IP addresses won’t change. But an equally important note: the regions with only a primary IP address today will eventually have a secondary IP address as well. For future updates on these secondary IPs please refer to this article.

If your on-premises application uses other Azure services besides SQL DB (e.g., storage, compute), and you desire to control outbound connections to all IP addresses of each Azure region, you may download the current IP addresses here. Note:

  1. It includes SQL DB IP addresses as mentioned above.
  2. It’s updated on a weekly basis so you need to update your on-premises firewall rules accordingly (unlike SQL DB, other Azure service IP addresses might change over time)

Summary: to help secure your on-premises network environment, it’s a best practice to configure your on-premises firewall and allow outbound connections on port 1433 only to your target SQL DB IP addresses listed here. To allow outbound connections to other Azure services besides SQL DB, the IP ranges can be downloaded here, which is updated every week.

19 Oct 06:19

SpeedPASS Download Naming Convention

by SQLAndy

This was the first time in a long time that I managed the check-in process of SQLSaturday Orlando, and I was surprised/dismayed to see that when I unzipped the download the filenames were all in a less than helpful format: b9d73479-5a04-42a3-9f11-a56af7b56beb.pdf. The goal was to be able to quickly find the SpeedPASS for anyone that needed it printed, and while search should look inside the files to find a name, I thought it would be faster to just have the attendee name IN the file name. I exported the registration list to Excel, added a calculated fullname column, then exported that to CSV.  My quick hack:

# Import the registration list (with InvoiceID and FullName columns) exported from Excel
$list = Import-Csv "c:\speedpass\names.csv"
$list.Count

foreach ($item in $list)
{
    # Each SpeedPASS PDF is named after the invoice GUID; rename it to include the attendee name
    $fileToFind = "C:\speedpass\" + $item.InvoiceID + ".pdf"
    Rename-Item -Path $fileToFind -NewName ($item.FullName + "_" + $item.InvoiceID + ".pdf")
}

I see that Wayne has posted a solution that you might like better at http://blog.waynesheffield.com/wayne/archive/2017/03/working-sqlsaturday-speedpasses/ for printing all of them, if that’s the route you take.

I can’t help but ask, why not include the name from the start? Or, to dream a little more, how about a simple PowerShell forms app that would let someone type in a few characters of a name and show the resulting file matches, then click to print? During check-in it’s all about moving people through fast.

19 Oct 06:19

Data Governance is Not About Governing Data

by Frank Cerwin

Click to learn more about author Frank Cerwin. Governance is all around us in our daily lives.  The U.S. Department of Agriculture (USDA) governs meat that we purchase in grocery stores.  The county’s Recorder of Deeds governs land ownership.  Government bodies at the municipal, county, and state levels govern roadways with stop signs, street lights, […]

The post Data Governance is Not About Governing Data appeared first on DATAVERSITY.

19 Oct 06:19

The BigData Legacy

by Rob Farley

Trends come along, and trends pass. Some hang around for quite a while, and then move on, and some seem to disappear quickly. Often we’re glad that they’ve gone, but we still bear scars. We live and work differently because they were there. In the world of IT, I feel like this is all too common.

When ORMs became trendy, people were saying that writing T-SQL would be a thing of the past. LINQ was another way that people were reassuring the developer community that writing database queries would never again be needed. The trend of avoiding T-SQL through ORMs has hung around a bit, and many developers have recognised that ORMs don’t necessarily create the best database experiences.

And yet when we consider what’s happening with Azure SQL Data Warehouse (SQL DW), you find yourself querying the data through an interface. Sure, that interface looks like another database, but it’s not where the data is (because the data is in the 60 databases that live in the back), and it has to translate our query into a series of other queries that actually run. And we’re fine with this. I don’t hear anyone complaining about the queries that appear in SQL DW’s explain plans.

When CLR came in, people said it was a T-SQL killer. I remember a colleague of mine telling me that he didn’t need to learn T-SQL, because CLR meant that he would be able to do it all in .Net. Over time, we’ve learned that CLR is excellent for all kinds of things, but it’s by no means a T-SQL killer. It’s excellent for a number of reasons – CLR stored procedures or functions have been great for things like string splitting and regular expressions – and we’ve learned its place now.

I don’t hear people talking about NoSQL like they once did, and it’s been folded somehow into BigData, but even that seems to have lost a little of its lustre from a year or two ago when it felt like it was ‘all the rage’. And yet we still have data which is “Big”. I don’t mean large, necessarily, just data that satisfies one of the three Vs – volume, velocity, variety.

Of these Vs, Volume seems to have felt like a misnomer. Everyone thinks what they have is big, but if you compared it to others, it probably wouldn’t actually be that big. Generally, if people are thinking “BigData” because they think their data is big, then they just need a reality check, and can then deal with it like all their regular data.

Velocity is interesting. If your system can’t respond to things quickly enough, then perhaps pushing your data through something like Stream Analytics could be reasonable, to pick up the alert conditions. But if your data is flowing through to a relational database, then is it really “BigData”?

And then we have Variety. This is about whether your data is structured or not. I’m going to suggest that your data probably is structured – and BigData solutions wouldn’t disagree with this. It’s just that you might not want to define the structure when the data is first arriving. To get data into a structured environment (such as a data table), types need to be tested, the data needs to be converted appropriately, and if you don’t have enough control over the data that’s coming in, the potential for something to break is high. Sorting out that mess when you need to query it back again means that you have a larger window to deal with it.

So this is where I think BigData is leaving its legacy – in the ability to accept data even if it doesn’t exactly fit the structure you have. I know plenty of systems that will break if the data arriving is in the wrong structure, which makes change and adaptability hard to achieve. A BigData solution can help mitigate that risk. Of course, there’s a price to pay, but for those times when the structure tends to change overly regularly, BigData’s ideology can definitely help.

We see this through the adoption of JSON within SQL Server, which is much less structured even than XML. We see PolyBase’s external tables define structure separately to the collection of data. Concepts that were learned in a void of relational data have now become part of our relational databases.
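
To make that concrete, here is a small sketch of the kind of thing SQL Server’s JSON support allows: store documents loosely, and impose structure only at query time (the document and column names are purely illustrative).

-- Pull individual values out of a loosely-structured JSON document at query time:
DECLARE @doc nvarchar(max) = N'{"device":"sensor-42","reading":21.7,"tags":["indoor","test"]}';

SELECT JSON_VALUE(@doc, '$.device')  AS device,
       JSON_VALUE(@doc, '$.reading') AS reading;

-- Or shred an array of documents into rows and typed columns with OPENJSON
-- (OPENJSON requires database compatibility level 130 or higher):
SELECT *
FROM OPENJSON(N'[{"id":1,"v":10},{"id":2,"v":20}]')
     WITH (id int '$.id', v int '$.v');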

Don’t dismiss fads that come through. Look into them, and try to spot those things which have more longevity. By adopting those principles, you might find yourself coming through as a stronger professional.

@rob_farley

This post was put together for T-SQL Tuesday 95, hosted by Derik Hammer (@sqlhammer). Thanks Derik!

19 Oct 06:19

The BigData Legacy

by Rob Farley

Hi! - Great that you've found this page, but it's no longer here! You can find the content over at: http://blogs.lobsterpot.com.au/2017/10/10/the-bigdata-legacy/

19 Oct 06:19

SSMS 17.3 has Profiler built-in

by TiborKaraszi
We all know that Profiler and its background functionality SQL Trace have been deprecated since 2012. Right? And we have all switched to using Extended Events (XE) a long time ago, right? No, the reality is that many of us still use the old tracing infrastructure. This can be for various reasons, like: familiarity; existing templates and stuff; ease of use; lack of functionality in the replacement … I have been gradually switching over to XE over the last few years. And I have been gradually becoming...(read more)
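
For anyone who wants to dip a toe into XE before leaning on the new built-in tooling, a minimal Extended Events session covering roughly the same ground as a basic Profiler trace could look like this (a sketch; the session name is hypothetical):

-- Capture completed batches and RPC calls into an in-memory ring buffer:
CREATE EVENT SESSION [QuickTrace] ON SERVER
ADD EVENT sqlserver.sql_batch_completed,
ADD EVENT sqlserver.rpc_completed
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);

ALTER EVENT SESSION [QuickTrace] ON SERVER STATE = START;
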
19 Oct 06:18

Webinar: Get enterprise-grade features at open-source prices with SQL Server 2017 on Linux

by SQL Server Team

IT pros now have the flexibility of running workloads across Windows, Linux, or a combination of both. By taking advantage of SQL Server 2017 on Linux, you’ll get enterprise-grade features at open-source pricing — all backed by Microsoft support and security. It’s never been easier or more cost effective to make the move.

SQL Server 2017 combines the best of databases and analytics into one product. New enhancements include real-time intelligence due to faster transactions with In-Memory OLTP, new T-SQL enhancements for in-memory tables and natively compiled stored procedures, ColumnStore performance improvements, and support for LOBs.

Join our speakers Debbi Lyons, Sr. Product Marketing Manager, and Travis Wright, Principal Program Manager, for a webinar on how SQL Server 2017 on Linux provides intelligence on a trusted, industry-leading platform while enabling innovation for all your data.

In this live webinar and Q&A, you’ll learn:

  • What’s new with SQL Server 2017 on Linux and the benefits of building your next Linux project with SQL Server.
  • How to leverage Microsoft’s offer to get a discounted subscription rate.
  • Quick and easy ways to get started with SQL Server 2017, and our suite of free tools available.

Join us Thursday, October 12, 2017, at 9:00 AM Pacific Time (UTC-7). Register now for this webinar on how your data ecosystem can benefit from SQL Server 2017 on Linux.

19 Oct 06:17

Azure Analysis Services scheduled autoscaling and pausing

by jorg

You can read this blog post at this link: https://jorgklein.com/2017/10/11/azure-analysis-services-scheduled-autoscaling/

This blog has moved to https://jorgklein.com There will be no further posts on sqlblog.com. Please update your feeds accordingly.

You can follow me on twitter: http://twitter.com/jorg__klein

19 Oct 06:17

Use cases of various products for a big data cloud solution

by James Serra

There is a tremendous number of Microsoft products that are cloud-based for building big data solutions.  It’s great that there are so many products to choose from, but it does lead to confusion about which products are best to use for particular use cases and how all the products fit together.  My job as a Microsoft Cloud Solution Architect is to help companies know about all the products and to help them choose the best products to use in building their solution.  Based on a recent architecture design session with a customer, I wanted to list the products and use cases that we discussed for their desire to build a big data solution in the cloud (focusing on compute and data storage products and not ingestion/ETL, real-time streaming, advanced analytics, or reporting; also, only PaaS solutions are included – no IaaS):

  • Azure Data Lake Store (ADLS): Is a high throughput distributed file system built for cloud scale storage.  It is capable of ingesting any data type from videos and images to PDFs and CSVs. This is the “landing zone” for all data.  It is HDFS compliant, meaning all products that work against HDFS will also work against ADLS.  Think of ADLS as the place all other products will use as the source of their data.  All data will be sent here including on-prem data, cloud-based data, and data from IoT devices.  This landing zone is typically called the Data Lake and there are many great reasons for using a Data Lake (see Data lake details and Why use a data lake? and the presentation Big data architectures and the data lake)
  • Azure HDInsight (HDI):  Under the covers, HDInsight is simply Hortonworks HDP 2.6 that contains 22 open source products such as Hadoop (Common, YARN, MapReduce), Spark, HBase, Storm, and Kafka.  You can use any of those or install any other open source products that can all use the data in ADLS (HDInsight just connects to ADLS and uses that as its storage source)
  • Azure Data Lake Analytics (ADLA): This is a distributed analytics service built on Apache YARN that lets you submit a job to the service where the service will automatically run it in parallel in the cloud and scale to process data of any size.  Included with ADLA is U-SQL, which has a scalable distributed query capability enabling you to efficiently analyze data whether it be structured (CSV) or not (images) in the Azure Data Lake Store and across Azure Blob Storage, SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse.  Note that U-SQL supports batch queries and does not support interactive queries, and does not handle persistence or indexing.  ADLA is great for things such as prepping large amounts of data for insertion into a data warehouse or replacing long-running monthly batch processing with shorter running distributed processes
  • Azure Analysis Services (AAS): This is a PaaS for SQL Server Analysis Services (SSAS).  It allows you to create an Azure Analysis Services Tabular Model (i.e. cube) which allows for much faster query and reporting processing compared to going directly against a database or data warehouse.  A key AAS feature is vertical scale-out for high availability and high concurrency.  It also creates a semantic model over the raw data to make it much easier for business users to explore the data.  It pulls data from the ADLS and aggregates it and stores it in AAS.  The additional work required to add a cube to your solution involves the time to process the cube and slower performance for ad-hoc queries (not pre-determined), but there are additional benefits of a cube – see Why use a SSAS cube?
  • Azure SQL Data Warehouse (SQL DW): This is a SQL-based, fully-managed, petabyte-scale cloud data warehouse. It’s highly elastic, and it enables you to set up in minutes and scale capacity in seconds. You can scale compute and storage independently, which allows you to burst compute for complex analytical workloads.  It is an MPP technology that shines when used for ad-hoc queries in relational format.  It requires data to be copied from ADLS into SQL DW but this can be done quickly using PolyBase.  Compute and storage are separated so you can pause SQL DW to save costs (see SQL Data Warehouse reference architectures)
  • Azure Cosmos DB: This is a globally distributed, multi-model (key-value, graph, and document) database service.  It fits into the NoSQL camp by having a non-relational model (supporting schema-on-read and JSON documents) and working really well for large-scale OLTP solutions (it also can be used for a data warehouse when used in combination with Apache Spark – a later blog).  See Distributed Writes and the presentation Relational databases vs Non-relational databases.  It requires data to be imported into it from ADLS using Azure Data Factory
  • Azure Search: This is a search-as-a-service cloud solution that gives developers APIs and tools for adding a rich full-text search experience over your data.  You can store indexes in Azure Search with pointers to objects sitting in ADLS.  Azure Search is rarely used in data warehouse solutions but if queries are needed such as getting the number of records that contain “win”, then it may be appropriate.  Azure Search supports a pull model that crawls a supported data source such as Azure Blob Storage or Cosmos DB and automatically uploads the data into your index.  It also supports the push model for other data sources such as ADLS to programmatically send the data to Azure Search to make it available for searching.  Note that Azure Search is built on top of ElasticSearch and uses the Lucene query syntax
  • Azure Data Catalog: This is an enterprise-wide metadata catalog that makes data asset discovery straightforward.  It’s a fully-managed service that lets you register, enrich, discover, understand, and consume data sources such as ADLS.  It is a single, central place for all of an organization’s users to contribute their knowledge and build a community and culture of data.  Without using this product you will be in danger of having a lot of data duplication and wasted effort
  • Azure Databricks: This is a tool for curating and processing massive amounts of data and developing, training and deploying models on that data, and managing the whole workflow process throughout the project.  It is for those who are comfortable with Apache Spark as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and Machine Learning Library (MLlib).  It has built-in integration with Azure Blob Storage, ADLS, SQL DW, Cosmos DB, Azure Event Hub, and Power BI.  Think of it as an alternative to HDI and ADLA.  It differs from HDI in that HDI is a PaaS-like experience that allows working with more OSS tools at a less expensive cost.  Databricks’ advantage is that it is a Spark-as-a-Service-like experience that is easier to use, has native Azure AD integration (HDI security is via Apache Ranger and is Kerberos based), has auto-scaling and auto-termination, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.  Also note that with built-in integration to SQL DW it can write directly to SQL DW, as opposed to HDInsight which cannot, and therefore more steps are required: when HDInsight processes data it writes it back to Blob Storage and then requires ADF to move the data from Blob Storage to SQL DW

In addition to ADLS, Azure Blob storage can be used instead of ADLS or in combination with it.  When comparing ADLS with Blob storage, Blob storage has the advantage of lower cost since there are now three Azure Blob storage tiers: Hot, Cool, and Archive, that are all less expensive than ADLS.  The advantage of ADLS is that there are no limits on account size and file size (Blob storage has a 5 PB account limit and a 4.75 TB file limit).  ADLS is also faster as files are auto-sharded/chunked where in Blob storage they remain intact.  ADLS supports Active Directory while Blob storage supports SAS keys.  ADLS also supports WebHDFS while Blob storage does not (it supports WASB which is a thin layer over Blob storage that exposes it as a HDFS file system).  Finally, while Blob storage is in all Azure regions, ADLS is only in two US regions (East, Central) and North Europe (other regions coming soon).  See Comparing Azure Data Lake Store and Azure Blob Storage.

Now that you have a high-level understanding of all the products, the next step is to determine the best combination to use to build a solution.  If you want to use Hadoop and don’t need a relational data warehouse the product choices may look like this:

Most companies will use a combination of HDI and ADLA.  The main advantage of ADLA over HDI is that there is nothing you have to manage (i.e. performance tuning), there is only one language to learn (U-SQL) with easier development and debugging tools, you only incur costs when running the jobs whereas HDI clusters are always running and incurring costs regardless of whether you are processing data or not, and you can scale individual queries independently of each other instead of having queries fight for resources in the same HDInsight cluster (so predictable vs unpredictable performance).  In addition, ADLA is always available so there is no startup time to create the cluster like with HDI.  HDI has an advantage in that it has more products available with it (i.e. Kafka) and you can customize it (i.e. install additional software), whereas with ADLA you cannot.  When submitting a U-SQL job under ADLA you specify the resources to use via an Analytics Unit (AU).  Currently, an AU is the equivalent of 2 CPU cores and 6 GB of RAM and you can go as high as 450 AUs.  For HDI you can give more resources to your query by increasing the number of worker nodes in a cluster (limited by the region max core count per subscription, but you can contact billing support to increase your limit).

Most of the time a relational data warehouse will be part of your solution, with the biggest reasons being familiarity with relational databases by the existing staff and the need to present an easier-to-understand presentation layer to the end-user so they can create their own reports (self-service BI).  A solution that adds a relational database may look like this:

The Data Lake technology can be ADLS or blob storage, or even Cosmos DB.  The main reason against using Cosmos DB as your Data Lake is cost and having to convert all files to JSON.  A good reason for using Cosmos DB as a Data Lake is that it enables you to have a single underlying datastore that serves both operational queries (low latency, high concurrency, low compute queries – direct from Cosmos DB) as well as analytical queries (high latency, low concurrency, high compute queries – via Spark on Cosmos DB).  By consolidating to a single data store you do not need to worry about data consistency issues between maintaining multiple copies across multiple data stores.  Additionally, Cosmos DB has disaster recovery built-in by easily allowing you to replicate data across Azure regions with automatic failover (see How to distribute data globally with Azure Cosmos DB), while ADLS requires replication and failover to be done manually (see Disaster recovery guidance for data in Data Lake Store).  Blob storage has disaster recovery built-in via Geo-redundant storage (GRS) but requires manual failover by Microsoft (see Redundancy Options in Azure Blob Storage).

An option to save costs is to put “hot” data in Cosmos DB, and warm/cold data in ADLS or Blob storage while using the same reporting tool, Power BI, to access the data from either of those sources as well as many others (see Data sources in Power BI Desktop and Power BI and Excel options for Hadoop).

If Cosmos DB is your data lake or used as your data warehouse (instead of SQL DW/DB in the picture above), you can perform ad-hoc queries using familiar SQL-like grammar over JSON documents (including aggregate functions like SUM) without requiring explicit schemas or creation of secondary indexes.  This is done via the REST API, JavaScript, .NET, Node.js, or Python.  Querying can also be done via Apache Spark on Azure HDInsight, which provides additional benefits such as faster performance and SQL statements such as GROUP BY (see Accelerate real-time big-data analytics with the Spark to Azure Cosmos DB connector).  Check out the Query Playground to run sample queries on Cosmos DB using sample data.  Note the query results are in JSON instead of rows and columns.
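As a hedged illustration (the container alias c is the conventional default, but the property names and values below are hypothetical), ad-hoc queries against the SQL API might look like the following; there is no schema or secondary index to create first, and results come back as JSON.  The first query returns matching documents, the second a single aggregated value:

    SELECT c.id, c.customer, c.total
    FROM c
    WHERE c.city = "Orlando"

    SELECT VALUE SUM(c.total)
    FROM c
    WHERE c.city = "Orlando"

The VALUE keyword in the second query makes the SUM come back as a bare scalar rather than wrapped in a JSON object.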

You will need to determine if your solution will have dashboard and/or ad-hoc queries.  Your choice of products will depend on the need to support one or both of those query types.  For ad-hoc queries, you have to determine what the acceptable performance is for those queries, as that will determine if you need a SMP or MPP solution (see Introducing Azure SQL Data Warehouse).  For dashboard queries (i.e. from Power BI) it's usually best to have those queries go against AAS to get top-notch performance and because SQL DW has a 32-concurrent query limit (and one dashboard can have a dozen or so queries).  Another option to get around the 32-concurrent query limit is to copy data from SQL DW to data marts in Azure SQL Database.  Complex queries, sometimes referred to as "last mile" queries, may be too slow for a SMP solution (i.e. SQL Server, Azure SQL Database) and require a MPP solution (i.e. SQL DW).

The diagram above shows SQL DW or Azure SQL Database (SQL DB) as the data warehouse.  To decide which is the best option, see Azure SQL Database vs SQL Data Warehouse.  With a clustered columnstore index SQL DB competes very well in the big data space, and with the addition of R/Python stored procedures it becomes one of the fastest performing machine learning solutions available.  Be aware that the max database size for SQL DB is 4 TB, but there will soon be an option called SQL DB Managed Instance that supports a much larger max database size.  See the presentations Should I move my database to the cloud? and Introducing Azure SQL Database.
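For example (the table and column names here are hypothetical), turning a large fact table in SQL DB into a clustered columnstore is a single statement, which is a big part of why it competes well on analytic workloads:

    -- Compress a hypothetical fact table into a clustered columnstore index
    CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
        ON dbo.FactSales;

    -- A typical analytic query that benefits from columnstore compression and batch mode
    SELECT ProductKey, SUM(SalesAmount) AS TotalSales
    FROM dbo.FactSales
    GROUP BY ProductKey;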

You will also need to determine if your solution will have batch and/or interactive queries.  All the products support batch queries, but ADLA does not support interactive queries (so you could not use the combination of Power BI and ADLA).  If you want to stay within the Hadoop world you can use the HDInsight cluster types of Spark on HDInsight or HDInsight Interactive Query (Hive LLAP) for interactive queries against ADLS or Blob Storage (see General availability of HDInsight Interactive Query – blazing fast queries on hyper-scale data) and can use AtScale instead of AAS to build cubes/OLAP within Hadoop.  AtScale will work against data in ADLS and Blob Storage via HDInsight.

Whether to have users report off of ADLS or via a relational database and/or a cube is a balance between giving users data quickly and having them do the work to join, clean, and master the data (getting IT out of the way) versus having IT make multiple copies of the data and clean, join, and master it to make it easier for users to report off of.  The risk in the first case is users repeating the clean/join/master process, doing it wrong, and getting different answers to the same question (falling into the old mistake of thinking the data lake does not need data governance and will magically make all the data come out properly, not understanding that HDFS is just a glorified file folder).  Another risk in the first case is performance, because the data is not laid out efficiently.  Most solutions incorporate both, allowing "power users" to access the data quickly via ADLS while all the other users access the data in a relational database or cube.

Digging deeper, if you want to run reports straight off of data in ADLS, be aware that security there is file-based, so you may want to create a cube for row-level security and also for faster performance, since ADLS is a file system and does not have indexes (although you can use a product such as Jethro Data to create indexes within ADLS/HDFS).  Also, running reports off of ADLS compared to a database has disadvantages such as limited support for concurrent users; lack of indexing, a metadata layer, a query optimizer, and memory management; no ACID support or data integrity; and security limitations.

If using Power BI against ADLS, note that Power BI imports the data from ADLS.  It is not a live connection to ADLS so you could easily run into the 1GB limit of a data set in Power BI (although potentially not until after ingesting 10GB of data from ADLS as Power BI can achieve compression of up to 10 times).  And be aware the initial import can be slow and the data would be “stale” until refreshed again.

The decision on which products to use is a balance between having multiple copies of the data (and the additional cost that incurs) and maintaining and learning multiple products, versus less flexibility in reporting and slower performance.  Also, while incorporating more products into a solution means it takes longer to build, the huge benefit is that you "future proof" your solution to handle any data in the future no matter what the size, type, or frequency.

The bottom line is there are so many products with so many combinations of putting them together that a blog like this can only help so much – you may wind up needing a solution architect like me to help you make sense of it all 🙂

More info:

The Microsoft Business Analytics and AI Platform – What to Use When

Distributed Computing Principles and SQL-on-Hadoop Systems

Big data architecture style

IS AZURE SQL DATA WAREHOUSE A GOOD FIT?

AZURE DATA LAKE ANALYTICS HOLDS A UNIQUE SPOT IN THE MODERN DATA ARCHITECTURE

My presentation Choosing technologies for a big data solution in the cloud

Using Azure Analysis Services on Top of Azure Data Lake Store

Understanding WASB and Hadoop Storage in Azure

Presentation Data on Azure: The big picture

Presentation Understanding big data on Azure – structured, unstructured and streaming

Presentation Architect your big data solutions with SQL Data Warehouse and Azure Analysis Services

Presentation Building Petabyte scale Interactive Data warehouse in Azure HDInsight

Presentation Building modern data pipelines with Spark on Azure HDInsight

Presentation Azure Blob Storage: Scalable, efficient storage for PBs of unstructured data

Presentation Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platform, and intelligent

Presentation Azure Cosmos DB: The globally distributed, multi-model database

Startup Mistakes: Choice of Datastore

19 Oct 06:16

New Features in SSMS 17.3

by Wayne Sheffield

SSMS 17.3

SSMS 17.3 was just released (download link), and it has a couple of new features worthy of mention:

  • Import Flat File Wizard
  • XEvent Profiler

Import Flat File Wizard

If you’ve worked with SQL Server for very long, you have most likely needed to import flat files from time to time. SSMS 17.3 utilizes a new import technology (Microsoft calls it the “Microsoft Program Synthesis using Examples” (PROSE) SDK). This new technology can take raw, semi-structured data in .csv or .txt files, and it identifies patterns to do “predictive file splitting” to allow the wizard to learn about the file, including column names, types, delimiters, etc.

This wizard is available from the Tasks context menu of a database node:

In the wizard that comes up, you simply specify the input file and the destination table. The new table name will default to the name of the flat file. The wizard then steps you through previewing the data and specifying the column schema.

Testing the Import Flat File Wizard

One of the biggest issues I've dealt with is lines that have quoted text. So, let's try this new wizard out. We'll use this import file:

Stepping through the wizard, we first specify the file being imported:

The next step of the wizard lets you preview the data:

This preview shows that most of the lines handled the quoted text (and other columns) correctly. I expected the last line to have ended with a double-quote, and it doesn’t. However, lines with double-quotes embedded in the line did keep those double-quotes.

In the next step, you specify the schema for the new table:

After this step is a summary screen that simply shows the server and database names, the table name, and the file being imported. Clicking Finish here creates the table and imports the file into the table.

For this simple test, it worked pretty well, and it should work well for most of the requirements that you have. Time will tell how reliably this new feature does work.

The Import Flat File Wizard is available when connecting to SQL Server version 2005 or higher. I haven't tried it against a lower version, but I don't see any reason why it wouldn't work there either. You can read more about this feature in Microsoft's documentation.

XEvent Profiler

Extended Events (XE), introduced in SQL Server 2008, are intended to replace SQL Profiler. They are not as invasive (resource-wise) on your system as SQL Profiler is. There are a lot of people who still don't use them because they feel that it's faster to get a Profiler trace running than an XE session. The new XEvent Profiler overcomes this. According to the Microsoft blog post introducing this feature, they even want to know if there is a gap that requires you to use SQL Profiler instead of this new XEvent Profiler.

The XEvent Profiler gives you the ability to start and watch (via the live target view) either of two predefined XEvent sessions. These give you a quick-view capability into your system, but the scope is limited:

  • Only the two sessions.
  • No filtering.
  • Only the predefined events in those sessions, and only the events available in SQL Profiler.

Since the XEvent Profiler replaces SQL Profiler, it only has the events that are available in SQL Profiler. This means that none of the newer features that can only be monitored with XEvents are available in these sessions. I predict that you'll either love or hate this feature, and I know people who are already on both sides of it!

The XEvent Profiler is available in SSMS just under the SQL Server Agent node:

From here, you can start and stop the session. When you start (launch) the session, it will open up the XEvent Live Data Viewer.
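If you want to confirm what the profiler launched, a quick query against the XE DMVs lists any running sessions (the QuickSession* names below are what I have observed the profiler create, so treat them as an assumption and verify on your build):

    -- List running Extended Events sessions; the XEvent Profiler quick sessions
    -- (observed as QuickSessionStandard / QuickSessionTSQL) appear here once started
    SELECT s.name, s.create_time
    FROM sys.dm_xe_sessions AS s
    ORDER BY s.create_time DESC;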

The XEvent Profiler is available when connecting to SQL Server version 2012 or higher. You can read more about this feature in Microsoft's documentation.

 

The post New Features in SSMS 17.3 appeared first on Wayne Sheffield.

19 Oct 06:14

SQL Server Named DBMS of the Year

by kevin

SQL SERVER NAMED 2017 DBMS OF THE YEAR

Way back in the summer of 2014, I told you about how I used DB-Engines to determine the popularity of various data management tools and vendor platforms. At that time, I explained how I used DB-Engines.com and Google Trends to see where the overall trends that I’m interested in are headed. Staying up with the trends isn’t just for fashionistas!

READ MORE…

Originally Posted March 02, 2017

The post SQL Server Named DBMS of the Year appeared first on Kevin Kline.

19 Oct 06:14

Non Profit Umbrella Org – Good Idea?

by SQLAndy

Note: Published too soon, so lots of edits from the first version. Sorry for any confusion.

I had a question from my earlier post about SQLOrlando with regards to setting up an umbrella organization, basically asking “instead of each Chapter doing all this work, why not have one org?”. Great idea, at least in theory (and maybe in practice).

Here in Orlando we have two PASS Chapters, neither of which is incorporated. We put both Chapter leaders on the Board of SQLOrlando. We treat SQLSaturday as a joint fundraiser. Any funds left after the event will be allocated to some combination of joint events and Chapter support – we'll meet soon to hash out a plan.  After that we'll pay for expenses directly (order the pizza with the corporate card) or just reimburse as long as we capture receipts.  We might also sponsor the groups (or other groups), though I see that requiring extra care (we want to be able to show (prove) that we're not taking compensation). Tracking the money assigned to each group seems doable. Not painless maybe, but doable. Fairly easy if we only add funds once a year, maybe a little harder if a chapter gets a sponsor for a meeting and the money goes into the non-profit. Still doable I think. Given that both groups are local and that we've all known each other for years, I have hopes of it working out well in Orlando.

If you aimed higher, say a state-level org (hard to go wider because the rules vary by state, but not impossible), then the first consideration would be how much money is involved. This non-profit thing is significantly easier if you stay below $50,000 in annual revenue. If you go over, then you have to do a real filing and not just the e-postcard. That adds expense (to have someone prepare the return), but that doesn't mean you can't do it, it's just a decision that requires some thought. After that, I think it's all about governance. Giving each participating Chapter a seat on the Board would be a good start, along with clear bylaws that cover distribution of funds (ranging from a lost receipt to a Chapter withdrawing). In the end it would either work, or a Chapter would withdraw, or the non-profit would close down. I think it would be simple enough to do, as long as the non-profit was just bringing those people together and consolidating the money management. If the non-profit starts to do fundraising on its own, then I think it gets complicated. Can they sell sponsors a state-wide option? How would the funds get split across member Chapters? Complicated.

Based on my limited experience I can say that forming a non-profit is doable, but it's a lot more work than NOT doing it. Joining a "co-op" type org as described above seems like an easy win for new Chapters, or ones with small amounts of income, or just no interest/energy for taking that big step to being standalone. But really, I think it just depends. Some groups will want to do their own thing, others won't, depending on their size, maturity, goals, cross-Chapter relationships, and probably a few more things. I think it probably also matters whether that org already exists and you just decide to join or not, versus the effort to get everyone to agree on the approach and then form a new non-profit. Looking back, had the option to join something like this existed already, I think we would have done so. Why not?

Is it legal? I have no idea. I think if the bylaws are well written and good records are kept, it would meet the intent of being a non-profit. I think as long as the Chapters are "unofficial" entities (not incorporated, LLC, etc) it seems ok, though I wonder about the Chapter agreement. Typically a non-profit would only have officers sign agreements, and if that happens, is it one Chapter or many? How would PASS handle that, or would they care?

For us, just forming the non-profit was an experiment, and we know we've got a lot more to learn. We have no language in our bylaws yet that handles any of the complexities I've mentioned above. The early steps are promising (it's sooo nice to have the money NOT in my PayPal account!) and we'll see how things go for the next year.  It's far too early to say "go do it this way".  If you're thinking of trying the umbrella approach, just be a little cautious: do the reading, spend $100-$200 to talk to an accountant or an attorney, then go forth and iterate, and take some time to share what you learn.

 

19 Oct 06:13

Tech Data Introduces 30 Days to Cloud Analytics Program

by A.R. Guess

by Angela Guess A recent press release reports, “Tech Data today introduced its new “30 Days to Cloud Analytics” program, designed to provide a consolidated approach for solution providers to migrate, integrate and manage their customers’ workloads to cloud business intelligence and data warehousing solutions. Using a proven methodology and market-leading enterprise cloud data management […]

The post Tech Data Introduces 30 Days to Cloud Analytics Program appeared first on DATAVERSITY.

19 Oct 06:13

Informatica Partners with Tableau to Unleash the Power of Data

by A.R. Guess

by Angela Guess A recent press release states, “Informatica, the Enterprise Cloud Data Management leader accelerating data-driven digital transformation, today announced the seamless integration of Informatica Enterprise Information Catalog with Tableau, the global leader in visual analytics. This integration enables governed self-service analytics—critical for any organization that provides analytics via agile, self-service tools. Data governance […]

The post Informatica Partners with Tableau to Unleash the Power of Data appeared first on DATAVERSITY.

19 Oct 06:12

My Last Blog: Data Models, SQL, UML, and Database Design

by Michael Blaha

Click to learn more about author Michael Blaha. This will be the last blog I write for DATAVERSITY® as I am dealing with health issues. I have cancer but am fortunate to be receiving excellent medical care. With treatment my prognosis is good, but this affects my free time. Hence this will be my last […]

The post My Last Blog: Data Models, SQL, UML, and Database Design appeared first on DATAVERSITY.

19 Oct 06:12

DirectQuery in Analysis Services: Best practices, performance, use cases

by Marco Russo

DirectQuery is a feature of Analysis Services that transforms a Tabular model into a semantic layer on top of a relational database, turning any MDX or DAX query into a real-time request to the underlying relational engine using the SQL language. This feature has been improved and optimized in the latest versions, including Azure Analysis Services, extending support to relational databases other than SQL Server and dramatically improving its performance. Learn what the features of DirectQuery are, how to implement best practices in order to obtain the best results, and the typical use cases where DirectQuery should be considered as an alternative to the in-memory engine embedded in Analysis Services.

19 Oct 06:10

Microsoft Ignite Big Data Presentations

by James Serra

There were so many good presentations at Microsoft Ignite, all of which can be viewed on-demand.  I wanted to list the big data related presentations that I found the most useful.  It's a lot of stuff to watch, and with our busy schedules it can be quite challenging to view them all.  What I do is set aside 40 minutes every day to watch half a session (they are 75 minutes).  It may take a few weeks, but if you consistently watch you will be rewarded with a much better understanding of all the product options and their use cases, and my last blog post (Use cases of various products for a big data cloud solution) can be used as a summary of all these options:

Modernize your on-premises applications with SQL Database Managed Instances: More and more customers who are looking to modernize their data centers have the need to lift and shift their fleet of databases to public cloud with the low effort and cost. We’ve developed Azure SQL Database to be the ideal destination, with enterprise security, full application compatibility and unique intelligent PaaS capabilities that reduce overall TCO. In this session, through preview stories and demos learn what SQL Database Managed Instances are, and how you can use them to speed up and simplify your journey to cloud.

Architect your big data solutions with SQL Data Warehouse and Azure Analysis Services: Have you ever wondered what’s the secret sauce that allows a company to use their data effectively? How do they ingest all their data, analyze it, and then make it available to thousands of end users? What happens if you need to scale the solution? Come find out how some of the top companies in the world are building big data solutions with Azure Data Lake, Azure HDInsight, Azure SQL Data Warehouse, and Azure Analysis Services. We cover some of the reference architectures of these companies, best practices, and sample some of the new features that enable insight at the speed of thought.

Database migration roadmap with Microsoft: Today’s organizations must adapt quickly to change, using new technologies to fuel competitive advantage, or risk getting left behind. Organizations understand that data is a key strategic asset which, when combined with the scale and intelligence of cloud, can provide the opportunity to automate, innovate, and increase the speed of business. But every migration journey is unique, so knowing the tricks of the trade will make your journey far easier. In this session, we use real-world case studies to provide details about how to perform large-scale migrations. We also share information about how Microsoft is investing in making this journey simpler with Azure Database Migration Service and related tools.

What’s new with Azure SQL Database: Focus on your business, not on the database: Azure SQL Database is Microsoft’s fully managed, database-as-a-service offering based on the world’s top relational database management system, SQL Server. In this session, learn about the latest innovations in Azure SQL Database and how customers are using our managed service to modernize their applications. Our most recent version combines advanced intelligence, enterprise-grade performance, high-availability, and industry-leading security in one easy-to-use database. Thanks to innovations such as In-Memory OLTP, Columnstore indexes, and our most recent Adaptive Query Processing feature family, customers can rely on Azure SQL DB for their relational data management needs, from managing just a few megabytes of transactional data.

Deep dive into SQL Server Integration Services (SSIS) 2017 and beyond: See how to use the latest SSIS 2017 to modernize traditional on-premises ETL workflows, transforming them into scalable hybrid ETL/ELT workflows. We showcase the latest additions to SSIS Azure Feature Pack, introducing/improving Azure connectivity components, and take a deep dive into SSIS Scale-Out feature, guiding you end-to-end from cluster installation to parallel execution, to help reduce the overall runtime of your workflows. Finally, we show you how to orchestrate/schedule SSIS executions using Azure Data Factory (ADF) and share our cloud-first product roadmap towards SSIS Platform as a Service (PaaS).

New capabilities for data integration in the cloud: This session focuses on the needs of the data integrator whether that be for data warehousing/BI, advanced analytics or preparation of data for SaaS applications. We walk through, by example, a comprehensive set of new additions to Azure Data Factory to make moving and integrating data across on-premises and cloud simple, scalable and reliable. Topics covered include: how to lift SSIS packages to the cloud via first-class SSIS support in data factory, a new serverless data factory application model and runtime capabilities, parallel data movement to/from the cloud, a new code-free experience for building and monitoring data pipelines and more.

Understanding big data on Azure – structured, unstructured and streaming: Data is the new Electricity, and Big Data technologies are helping organizations leverage this new phenomena to foster their businesses in innovative ways. In this session, we show how you can leverage the big data services such as Data Warehousing, Hadoop, Spark, Machine Learning, and Real Time Analytics on Azure and how you can make the most of these for your business scenarios. This is a foundational session to ground your understanding on the technology, its use cases, patterns, and customer scenarios. You will see a lot of these technologies in action and get a good view of the breadth. Join this session if you want to get a real understanding of Big Data on Azure, and how the services are structured to achieve your desired outcome.

Building Petabyte scale Interactive Data warehouse in Azure HDInsight: Come learn to understand real world challenges associated with building a complex, large-scale data warehouse in the cloud. Learn how technologies such as Low Latency Analytical Processing [LLAP] and Hive 2.x are making it better by dramatically improved performance and simplified architecture that suites the public clouds. In this session, we go deep into LLAP’s performance and architecture benefits and how it compares with Spark and Presto. We also look at how business analysts can use familiar tools such as Microsoft Excel and Power BI, and do interactive query over their data lake without moving data outside the data lake.

Building modern data pipelines with Spark on Azure HDInsight: You are already familiar with the key value propositions of Apache Spark. In this session, we cover new capabilities coming in the latest versions of Spark. More importantly we cover how customers are using Apache Spark for building end-to-end data analytics pipeline. It starts from ingestion, Spark streaming, and then goes into the details on data manipulation and finally getting your data ready for serving to your BI analysts.

Azure Blob Storage: Scalable, efficient storage for PBs of unstructured data: Azure Blob Storage is the exa-scale object storage service for Microsoft Azure. In this session, we cover new services and features including the brand new Archival Storage tier, dramatically larger storage accounts, throughput and latency improvements and more. We give you an overview of the new features, present use cases and customer success stories with Blob Storage, and help you get started with these exciting new improvements.

Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platform, and intelligent: Increasingly, customers looking to modernize their analytics needs are exploring the data lake approach. They are challenged by poorly-integrated technologies, a variety of data formats, and inconvenient data types. We explore a modern ETL pipeline through the lens of Azure Data Lake. This approach allows pipelines to scale to thousands of nodes instantly and lets pipelines integrate code written in .NET, Python, and R. This degree of extensibility allows pipelines to handle formats such as CSV, XML, JSON, Images, etc. Finally, we explore how the next generation of ETL scenarios are enabled by integrating intelligence in the data layer in the form of built-in cognitive capabilities.

Azure Cosmos DB: The globally distributed, multi-model database: Earlier this year, we announced Azure Cosmos DB – the first and only globally distributed, multi-model database system. The service is designed to allow customers to elastically and horizontally scale both throughput and storage across any number of geographical regions, it offers guaranteed <10 ms latencies at the 99th percentile, 99.99% high availability and five well defined consistency models to developers. It’s been powering Microsoft’s internet-scale services for years. In this session, we present an overview of Azure Cosmos DB—from global distribution to scaling out throughput and storage—enabling you to build highly scalable mission critical applications.

First look at What’s New in Azure Machine Learning: Take in the huge set of capabilities announced at Ignite for the next generation of the Azure Machine Learning platform. Build and deploy ML applications in the cloud, on-premises, and at the edge. Get started by wrangling your data into shape easily and efficiently, then take advantage of popular tools like Cognitive Toolkit, Jupyter, and Tensorflow to build advanced ML models and train them locally or at large scale in the cloud. Learn how to deploy models with a powerful, new, Docker-based hosting service complete with the ability to monitor and manage everything in production.

Delivering enterprise BI with Azure Analysis Services: Learn how to deliver analytics at the speed of thought with Azure Analysis Services on top of a petabyte-scale SQL Data Warehouse and Azure HDInsight Spark implementation. This session covers best practices for managing processing and query accelerating at scale, implementing change management for data governance, and designing for performance and security. These advanced techniques are demonstrated through an actual implementation including architecture, code, data flows, and tips and tricks.

Creating enterprise grade BI models with Azure Analysis Services: Microsoft Analysis Services enables you to build comprehensive, enterprise-scale analytic solutions that deliver actionable insights through familiar data visualization tools such as Microsoft Power BI and Microsoft Excel. Analysis Services enables consistent data across reports and users of Power BI. This session covers new features such as improved Power BI Desktop feature integration, Power Query connectivity, and techniques for modeling and data loading which enable the best reporting experiences. Various modeling enhancements are included, such as Detail Rows allowing users to easily see transactional records; and deployment and application-lifecycle management (ALM) features to bridge the gap between self-service and corporate BI.

Streaming big data on Azure with HDInsight, Kafka, Storm, and Spark: Implementing big data streaming pipelines for robust, enterprise use cases is hard. Doing so with open source technologies is even harder. To help with this, HDInsight recently added Kafka as a managed service to complete a scalable, big data streaming scenario on Azure. This service processes millions+ of events/sec, petabytes of data/day to power scenarios like Toyota’s connected car, Office 365’s clickstream analytics, fraud detection for large banks, etc. We will discuss the streaming landscape, challenges in building production-ready streaming services, and build an enterprise-grade real-time pipeline. We will then discuss the learnings and future investments on managed open source streaming through Azure HDInsight.

19 Oct 06:10

Dynamic Data Masking and Row-Level Security Features on the Way in SQL Server 2016

by kevin

DYNAMIC DATA MASKING AND ROW-LEVEL SECURITY FEATURES ON THE WAY IN SQL SERVER 2016

In addition to StretchDB and AlwaysEncrypted, there are two more exciting features in the works for SQL Server 2016: dynamic data masking and row-level security. In the case of these two features, they’ll be released first to the cloud platform (Azure SQL Database) and, later, to the on-premises version of SQL Server.

READ MORE…

Originally Posted December 02, 2015

The post Dynamic Data Masking and Row-Level Security Features on the Way in SQL Server 2016 appeared first on Kevin Kline.

19 Oct 06:10

Cylance Releases ‘Introduction to Artificial Intelligence for Security Professionals’ Book

by A.R. Guess

by Angela Guess According to a new press release, “Cylance Inc., the company that revolutionized the antivirus industry with AI-powered prevention that blocks everyday malware along with today’s most advanced cyberthreats, has announced the publication of a book entitled, Introduction to Artificial Intelligence for Security Professionals. Written by the Cylance Data Science team, the book […]

The post Cylance Releases ‘Introduction to Artificial Intelligence for Security Professionals’ Book appeared first on DATAVERSITY.

19 Oct 06:09

SQLSaturday Orlando 2017 Notes – Part 1

by SQLAndy

Part 1 will be about things we did (or didn’t) do and some lessons learned. In Part 2 I will share some thoughts on the SQLSaturday tools/web site. Part 3 isn’t really about SQLSaturday, rather our student seminar that we run concurrently. I’ll get to Part 2 and 3 as soon as I can.

We tried to run leaner this year, in a bunch of ways:

  • Few meetings. I think this helped and I think we’ll have even fewer next year, in favor of more joint calls/work sessions for major tasks.
  • No pre-cons. We’ve always done them. We had a few people ask, but no one seemed upset. For us, it was nice(!) to not have the extra work to manage. We do plan to do other events before the next SQLSaturday.
  • No event bag stuffing. We did end up giving out a branded bag at the end of the day, but we didn’t miss at all having to stuff, transport, and otherwise deal with that process. I see that as a permanent change.
  • No hot lunch. This was hard to give up, but we needed to change things up and more than that, we needed the time in the schedule. Boxed lunches make things go a lot faster. I’m torn on this – boxed lunch is better than pizza, but a hot lunch is the best.
  • Started marketing later. We ended up not making our goal for registration. I’ll share some data on that below so you can see how we did.
  • No real effort outside of emailing to the list. We never saw much impact from those efforts in earlier years, but it’s also hard to track. I’d like to see us resume having scheduled tweets next year, but aside from that, I think we focus on the list and improving it.
  • No large printed schedules. We’ve done super poster sized schedules for years. It costs a few bucks, but it’s effort to lay it out, send to the printer, then pick them up and get them taped up. We missed them, especially when we had a problem with Guidebook not picking up room number changes. Don’t know if we revert next year or not yet.
  • We stopped offering email blasts on behalf of our sponsors. (We might do as a special request, but it won’t be on the sponsorship plan)

We moved the speaker dinner to a different location (Buffalo Wild Wings). Parking was better and it's covered, so no worry about rain. The downside was that we didn't have a buffet line set up for the appetizers; instead we ordered appetizers and wings and shared them family style. It worked, but not great. The area was perfect for conversation. Cost was ok.

We ordered steel water bottles as our speaker gift, plus some extras for the end of day give away. We used "#SQLSatOrlando" for the imprint and it was ok, but not as big as we would have liked. Speakers seemed to like them anyway.  About $8 each.

We forgot to remind speakers that the projectors are VGA only.

Guest wifi is 100% unreliable for presentations. Not a big deal, but we need to make sure speakers know ahead of time.

We did our usual polos for speakers who requested them. Cost was $20 each. Because we were more cost conscious this year they were embroidered with #SQLSatOrlando (and maybe a little bit because I don’t like the new logo). I think Hawaiian shirts next year.

Thinking about those last two items, we didn't do a great job of managing the process of handing them out. Next year I'm thinking brown paper bags labeled with their name so we can figure out who hasn't picked up their stuff yet. We also didn't print their SpeedPASS to go with it, though we did print them on request the next day. Less than ideal.

We ordered more donuts this year, 25 dozen, and they were all eaten (20 dozen last year). We ordered 7.5 gallons of coffee (about $110) and supplemented with coffee made on site, and still hit a point where people wanted coffee and we were out. We're thinking of buying another 5 gallon container for next year; the ones we have worked out great. I think I'd rather do away with the overhead of making coffee (coffee maker, water, power, coffee, staffing, clean up, etc). If we have an extra container we start the day well stocked and we can send it out for a refill if we need to.

In previous years we had the SpeedPASS printing station separate from check-in; this year we put them side by side. It felt like better flow and less frustration if someone was in the wrong line. We didn't have the lines labeled. We also had students lining up, even though they were supposed to be at a different building. We used a fast laser printer, which worked well; we downloaded all (or almost all) of the PDFs ahead of time and that worked ok. It's still a pain. If we could get SpeedPASS on one page I think we'd just print them all Friday night. Out of all the stuff we do, I feel like this part needs the most attention before next year.

Lunch was boxed lunches from Firehouse. Roast beef turned out to be far more popular than expected. Only one complaint about the lunch. I think for next year we will look to see if we can do a little more premium lunch if we do boxes again – maybe a second side, better bread, I don’t know. It was fine, just wasn’t (to me) in the same league as the hot lunch. With boxed lunches we didn’t have to budget/buy paper plates and flatware, or to-go boxes for those heading to lunch sessions.

Speaking of lunch, at one point we had 150 people registered who said they were going to buy lunch but had not paid. We sent a few reminders and that went down some, but never came close to zero. This year we set a Thursday night cutoff for ordering/cancelling, but did that late – next year we need to announce all of that sooner and send more reminders. The worry for us is not having lunch for someone who expected it, vs ordering extra food and then not having someone pay for it. We ended up with about 50 extra meals, but it's hard to tell if that was no-shows or what we ordered extra, probably a combination. We need to work this harder next year.

We did ok on managing our budget and buying enough but not too much in the way of supplies (other than lunch itself). We didn’t hit our target amount for sponsorship but we adjusted our spend accordingly and that went well.

We did ok on sponsorship. GAP sponsors certainly help. We were thrilled to have a new staffing company join us this year, and one from a few years back return. We simplified our sponsorship plan and that seemed to work. We emailed logistics separately, but didn’t include details on the scan/get the report process. Fix that next year.

We did not do well on organizing volunteers outside the core team. I turned on the volunteer section late and we just ran out of time to promote it. The volunteer section seems almost great, needs more attention next year to see if we can make it work for us.

We had our first ever networking event from 3:30 to 4pm. It was an experiment. We had a great sponsor (SNI Technology) provide cupcakes, we put on some music, and we just let everyone relax for 30 minutes outside on the covered walkway. We didn’t push any “meet x people” or anything else, it was just let people relax and see what happened. It seemed to go well, and now we can think about how to do a little more next year. I think as much as the cupcakes were good, ice cream might be better – less messy, easier to eat slowly? Success was seeing a crowd there and not all leaving early! It also served as a useful way to queue the attendees for the end of day session.

From 4 to 5 pm we did our first ever all-in-one-place end of day session. We used the auditorium, a very nice space. We found out two days earlier that there would be no cable on the stage to the projector, so we had Plan A (USB stick in the machine next to the projector) and Plan B (Miracast via HDMI). That mostly worked. As far as the actual session, we spent at least a couple hours deciding what to do. Should it be a "big name" speaker? Or what topic would be of interest to the widest range of attendees? We thought about a live podcast, or a panel, or both. Finally we decided to do a game show – SQLFamily Feud! We made that a tentative decision until we could confirm the logistics; once Brad & Josh had the software identified we committed to doing it. We had two teams of four speakers, plus two team captains from the audience. I wasn't sure how it would be received, but as soon as the attendees heard the music from the show they were engaged.  I think we did 8 rounds; it took around 40 minutes. My favorite question was "Name something DBAs hate" and when someone answered "Developers" the audience just erupted! Out of all the questions, one generated a guess of proctologist. Wrong, but still funny. As soon as the feud ended we moved into the end of day raffle. That went smoothly and we were done at maybe 5:15 instead of 6:00 pm; it felt like we were done hours early.

We then had to finish cleanup – we should have done more during or before the networking break. We also should have had someone scanning the attendee check-in tickets; I ended up doing that one night the following week, and it was not the most fun task.

I heard a lot of positive comments this year from both speakers and attendees. I haven’t read the event evals yet, need to get to that soon.

We finished up with an after party at Fourth Street Grill. Good food, good service, but parking was hard to find. Food was good enough I hate to change next year. We’ll see.

We ended up with lower than expected attendance, about 300. Part of that looks like it was due to us slowing down on messaging the week of and the week after the hurricane, but it also looks like maybe we didn’t push as hard in the last 2 weeks as we should have. You can look at the year over year registration by week below and see that total registration was quite a bit lower, leading to lower attendance.  One big data point we don’t have integrated here is when the marketing messages went out, maybe something we can add for next year.

All in all a pretty good event for us. Kudos to Dave Mason for taking over marketing this year. Brad & Josh just nailed the Feud, couldn’t have asked for better implementation or the audience reception. Shawn was the MVP this year, he handled lunch, printing, and a bunch of other stuff so that I could spend time on the student seminar. Kendal moderated the seminar for the students and we had lots of positive feedback on that entire effort. Gareth and Gilda rounded out the team, helping out on game day, and Pam Shaw helped out during the networking event to stage the cupcakes!

One thing I struggled with was how to capture all the things worth sharing or thinking about or fixing. How to do that better? Sometimes it's hard enough to agree on the work and do the work. Just thinking about what I missed, here's one – Firehouse can only do a max charge of $1000 per transaction. Shawn had to sign two receipts for $900 each! Not a big deal, just one of those things that is nice to know.

As I write this I'm still not done though. I still need to finish two more posts worth of notes, write thank-yous to sponsors and speakers, send follow-up notes to students, catch up on LinkedIn requests, send out a final budget summary, and probably a few more things.

With rough but maybe useful notes complete, I want to finish with a couple thoughts on my philosophy. It’s up to you how you run your event, I think that’s the heart of SQLSaturday. Don’t be afraid to try new things and don’t be afraid to do less. Be thoughtful about choices, but recognize that it all adds up. I remember back in year one we had hot water and tea bags, boxes of bananas, and a lot of other stuff. Good stuff, but it takes effort, and we do fine without it. The key is to set expectations if you’re making a change.