Shared posts

26 Jan 17:54

Best of…: Best of 2016: Overpowered

by Remy Porter
Merry Holidays, and as we coast into the new year, it's time for us to reflect on some of the best/worst moments of the past one. Today, we start things off with a BANG, and a reminder that electricity is a complicated beast. --Remy

Mike had a different perspective on wiring and cable management- because he worked at a factory which made wires and cables. It was the early ’90s, and he was in charge of babysitting a couple of VAXes and their massive, 85lb hard drives. It was an easy job: the users knew the system and needed very little support, the VAXes were practically unstoppable, and the backup battery system could keep the entire thing running for over an hour.

The computers supported HR and accounting, which meant as the year ticked towards its end, Mike had to prep the system for its heaviest period of use- the year end closing processes. Through the last weeks of December, his users would be rushing to get their reports done and filed so they could take off early and enjoy the holidays.

A 3-phase power-breaker

Mike had been through this rodeo before, but Reginald, the plant manager, called him up to his office. There was going to be a wrench in the works this year. Mike sat down in Reginald’s cramped and stuffy office next to Barry, the chief electrician for the plant.

“Our factory draws enough power from the main grid that the power company has a substation that’s essentially dedicated to us,” Reginald explained. “But they’ve got a problem with some transformers loaded with PCBs that they want to replace, so they need to shut down that substation for a week while they do the work.”

The factory portion was easy to deal with- mid-December was a period when the assembly line was usually quiet anyway, so the company could just shift production to another facility that had some capacity. But there was no way their front-office could delay their year-end closing processes.

“So, to keep the office running, we’ll be bringing in a generator truck,” Reginald said. “And that means we’re going to need to set up a cut-over from the main grid to the generator.”

From the computer-room side, the process was easy, but that didn’t stop Mike from writing up a checklist, taping it to the wall beside his desk and sharing a copy with Barry. Before the generator truck arrived, he’d already tested the process several times, ensuring that he could go from mains power to battery and back to mains power without any problem.

The generator truck arrived a week before the changeover. The electricians ignored it for a few days. Just as Mike was starting to get worried about deadlines, he looked out a window and saw a trio of electricians, led by Barry, starting to connect cables to it. Later that day, when Mike left to go to lunch, he walked past the generator truck, and noticed something odd about the cables- they were clearly single phase power cables.

Typical residential power systems are single-phase alternating current- one peak, one trough. This creates “dead” moments in the cycle, where no work is being done. That’s fine for residential use- but industrial systems need three-phase power- three offset AC cycles that, when added together, guarantee current is always flowing.

“Hey,” Mike said to one of the electricians loitering near the truck, “you’re planning to run some three-phase cabling, right?”

“Nope. The factory’s shut down- this thing is just gonna run lights and heating.”

“And computers,” Mike said. “The hard drives need three-phase power.”

“We’ll have to order some parts,” the electrician said.

A few more days ticked by with no three-phase connections, and Mike checked in with the electricians again.

“The parts are coming.”

At this point, Reginald came down from his office to the computer room. “Mike, Barry’s telling me you’re being a disruption.”

“What?”

“Look, there’s a chain of command,” Reginald said. “And you can’t tell the electricians how to do their job.”

“I’m no-”

“From now on, if you have any concerns, bring them to me.”

The day of the cut-over, the three-phase cabling finally arrived. Barry and his electricians quickly strung those cables from the generator. Mike wasn’t exactly watching them like a hawk, but he was worried- there was no way they could test the configuration while they were working so hastily. Unlike single-phase power, three-phase power could be out-of-sync, which would wreak havoc on the hard drives. He thought about bringing this up to the electricians, but recalled Reginald’s comments on the chain of command. He went to Reginald instead.

“Mike,” Reginald said, shaking his head. “I know you computer guys think you know everything, but you’re not an electrician. This is a factory, Mike, and you’ve got to learn a little about what it’s like to work for a living.”

Now, the electricians and Mike needed to coordinate their efforts, but Reginald strictly enforced the idea of the chain of command. One electrician would power on the generator and announce it, “Generator’s running.” Another electrician would relay this to Barry: “Generator’s running.” Barry would relay that to Reginald. “Generator’s on.” Reginald, finally, would tell Mike.

At 1PM, the electric company cut the power to the factory. The lights went out, but the computers kept humming along, running off battery power. A few minutes later, after a few games of telephone between the electricians and Reginald, the lights came back on.

Mike released a breath that he didn’t know he’d been holding. Maybe he’d been too worried. Maybe he was jumping at shadows. Everything was going well so far.

“Tell the computer guy to switch back to mains power,” called one of the electricians.

“Tell the computer guy to switch back to the mains,” Barry repeated.

“Mike, switch back to the mains,” Reginald ordered.

Mike threw the switch.

BOOOOOOOOOOOOOOMMMMMM

The computer room shook so violently that Mike briefly thought the generator had exploded. But it wasn’t the generator- it was the hard drives. They didn’t literally explode, but 85lbs of platter spinning at about 3,000 RPM had a lot of momentum. The motor that drove the spindle depended on properly sequenced three-phase power. The electricians had wired the power backwards, and when Mike switched to mains power, the electric motor suddenly reversed direction. Angular momentum won. The lucky drives just broke the belt which drove the spindle, but a few of them actually shrugged their platters off the spindle, sending the heavy metal disks slamming into the sides of the enclosure with a lot of energy.

For the first and only time in his career, Mike slammed a fist into the emergency stop button, cutting all power from the computer room.

Year end closing got delayed. It took the better part of a month for Mike to recover from the disaster. While he focused on recovering data, the rest of the organization kept playing their “chain-of-command telephone”.

The electricians couldn’t be at fault, because they took their orders from Barry. Barry couldn’t be at fault, because he took his orders from Reginald. Reginald couldn’t be at fault, because he followed the chain of command. Mike hadn’t always followed the chain of command. Therefore, this must be Mike’s fault.

Once the data recovery was finished, he was fired.

26 Jan 08:19

Healthcare Can Make You Sick

by snoofle

Every industry has information that needs to be moved back and forth between disparate systems. If you've lived a wholesome life, those systems are just different applications on the same platform. If you've strayed from the Holy Path, those systems are written using different languages on different platforms running different operating systems on different hardware with different endian-ness. Imagine some Java app on Safari under some version of Mac OS needing to talk to some version of .NET under some version of Windows needing to talk to some EBCDIC-speaking version of COBOL running on some mainframe.

Long before anyone envisioned the above nightmare, we used to work with SGML, which devolved into XML, which was supposed to be a trivial, or at least tolerable, way to define the format and fields contained in a document, with parsers on every platform, so that information could be exchanged without either end needing to know anything more than the DTD and/or schema for purposes of validation and parsing.

In a hopeful attempt at making this somewhat easier, wrapper libraries were written on top of XML.

Sadly, they failed.

A hand holding a large pile of pills, in front of a background of pills

In the health care industry, some open-source folks created the (H)ealthcare (API), or HAPI project, which is basically an object oriented parser for text-based healthcare industry messages. Unfortunately, it appears they suffered from Don't-Know-When-To-Stop-Syndrome™.

Rather than implementing a generic parser that simply splits a delimited or fixed-format string into a list of text-field-values, the latest version implements 1205 different parsers, each for its own top-level data structure. Most top level structures have dozens of sub-structures. Each parser has one or more accessor methods for each field. Sometimes, a field can be a single instance, or a list of instances, in which case you must programmatically figure out which accessor to use.

That's an API with approximately 15,000 method calls! WTF were these developers thinking?

For example, the class: EHC_E15_PAYMENT_REMITTANCE_DETAIL_INFO can have zero or more product service sections. So right away, I'm thinking some sort of array or list. Thus, instead of something like:

    EHC_E15_PAYMENT_REMITTANCE_DETAIL_INFO info = ...;
    List<EHC_E15_PRODUCT_SERVICE_SECTION> prodServices = info.getProductServices();
    // iterate

... you need to do one of these:

    // Get sub-structure
    EHC_E15_PAYMENT_REMITTANCE_DETAIL_INFO info = ...;
	
    // Get embedded product-services from sub-structure

    // ...if you know for certain that there will be exactly one in the message:
    EHC_E15_PRODUCT_SERVICE_SECTION prodSvc = info.getPRODUCT_SERVICE_SECTION();
	
    // ...if you don't know how many there will be:
    int n = info.getPRODUCT_SERVICE_SECTIONReps();
    for (int i=0; i<n; i++) {
        EHC_E15_PRODUCT_SERVICE_SECTION prodSvc = info.getPRODUCT_SERVICE_SECTION(i);
        // use it
    }

    // ...or you can just grab them all and iterate
    List<EHC_E15_PRODUCT_SERVICE_SECTION> allSvcs = info.getPRODUCT_SERVICE_SECTIONAll();

...and you need to call the correct one, or risk an exception. But having multiple ways of accomplishing the same thing via the API leads to multiple ways of doing the same thing in the code that is using the API, which invariably leads to problems.

So you might say, OK, that's not SO bad; you just use what you need. Until you realize that some of these data structures are embedded ten+ levels deep, each with dozens of sub-structures and/or fields, each with multiple accessors. With those really long names. Then you realize that the developers of the HAPI got tired of typing and just started using acronyms for everything, with such descriptive data structure names as: LA1, ILT and PCR.

The API does attempt to be helpful in that if it doesn't find what it's expecting in the field you ask it to parse, it throws an exception and it's up to you to figure out what went wrong. Of course, this implies that you already know what is being sent to you in the data stream.

Anonymous worked in the healthcare industry and was charged with maintaining a library that had been wrapped around HAPI. He was routinely assigned tasks (with durations of several weeks) to simply parse one additional field. After spending far too much time choking down the volumes of documentation on the API, he wrote a generic single-class 300 line parser with some split's, substring's, parseDate's and parseInt's to replace the whole thing.

Now adding an additional field takes all of ten minutes.

18 Jan 21:36

MVP No More

by admin

Back in September I received a call from my MVP lead that I wasn’t being renewed. Not a big surprise, though still a disappointment. The program has changed a lot since I joined in 2008 and today has much more focus on “data platform” than just SQL Server (where I continue to focus). I’ve also done less in the community in the  past year or two than in many years, opting to use the time I had to support local groups and serve on the NomCom this year. The combination of the two meant I fell under whatever magic line was drawn between stay and go.

Since I was first awarded in 2008 I’ve tried to stick to the plan of doing the things I wanted to do and that I could do (time, effort, money) and if they aligned with MVP goals that was good, if not, that was good too. I have no complaints about my time in the program, or on my non-renewal. Our paths have diverged, at least for now, and that means someone with more time and energy than I have gets to ‘take my seat’ as it were.


18 Jan 21:36

Thoughts on the 2016 PASS Summit

by admin

I’ll have several more posts this week on specific activities at the Summit, but wanted to start with how the trip went personally:

  • I flew out Monday instead of Sunday. Whether it was just that or my pacing for the week, I wasn’t wiped out on Friday afternoon
  • I didn’t take the direct flight to Seattle because it leaves at a kinda-too-early 5 am. I had a 2 hour stop in Houston, had coffee and a bagel and all was good. A side story to that: when I got to the airport there was an earlier flight to Houston, but I couldn’t get a seat because it was leaving in less than an hour, not enough time for them to move my checked bag. I took my scheduled flight and waited on my bag, no show – it was put on the earlier flight!
  • I stayed at the Motif on 5th this year. Ok to good, annoyed about an automatically included $12.99 per day convenience fee for wifi and bottled water.
  • Monday night networking dinner went fine and was a good start to the week
  • I attended the SQLSaturday meeting and the pre-con on Tuesday; both went well. Had lunch with friends off site, was nice to have that break
  • Wish I had gone to the Chapter meeting, but wanted to attend a pre-con, first time I made time for that!
  • I watched parts of both keynotes remotely and felt better for it. Better watching when you can have a table, coffee, and scone, and less annoying when they’re not to my particular interest.
  • I didn’t blog during the week, deliberately, and didn’t pay much attention to Twitter either. Nothing wrong with either and I might change next year, but it was nice to just… be there and do the things I wanted to do
  • Logistics were fine, lunch was ok to good, maybe not quite as good as some years, a very subjective judgement
  • I’m not thrilled about the continued expansion of content to the building across the street. I don’t know if that is just to add more tracks or due to capacity issues. It works, but it adds time.
  • Our Thursday night SQL dinner meetup and game night went well – more on that separately
  • We designed our Thursday events for introverts and as I watched through the week I feel like there is an enormous opportunity there to serve attendees that is being missed. We have the First-Timers thing, but that doesn’t go far enough or address introverts (first timers or not).
  • It was a week of interesting conversations. Had someone recommend that I do some learning in an area I hadn’t considered, got to see an old friend who is very sick, do some diplomacy and strategy, and a lot more.
  • Committed to submitting a presentation as co-presenter with my friend Mitch for next year (and thinking I might do that with one or two others – a different way to participate)
  • Took the train back to the airport on Friday still wearing my attendee badge and that led to a great conversation that lasted right up to the airport.
  • I attended with two colleagues this year that were both first timers, plus a former colleague that was also a first timer. I’m interested to follow up with all three to hear what they thought worked and what didn’t.
  • Took the redeye home, never a fun experience, but I had to be home in time to get some rest before taking my kids to see Lindsey Stirling perform Saturday night!

A good week overall. Learned a lot, technically and otherwise.


18 Jan 21:36

The ROI of SQLSaturday

by admin

Following up on my notes about the 2016 SQLSaturday Roundtable I wanted to write more about the ROI of SQLSaturday. I suppose there are many ways to measure it, but I think it boils down to two views:

  • How many people did we train / how many hours of training did we deliver to our members? That’s what we’re in business to do, right?
  • How much revenue did we drive back to PASS, primarily in the form of Summit registrations?

PASS puts a decent amount of money into SQLSaturday. A full time evangelist plus their travel, hosting the site, making improvements to the website, creating the event logo images, responding to web site problems, organizing the Roundtable and putting some direct money (though less now) into each event in the form of sponsorship. I’m not sure of the exact number, but it’s probably $200k-$400k per year, out of a $5-8mil/year budget. In return PASS gets brand exposure, a continuing list of new attendees and CRM type data on those returning, and a chance to further serve those. They are a super-sponsor, getting the entire list, an advantage no other sponsor gets.  Good deal? Maybe.

In a perfect, or even better, world we’d see a nice upsell path from attending SQLSaturday to attending the Summit (the money maker). In practice that number is very low, at least as far as we can tell looking back five years and not having a perfect “member id” for each member (they often use different emails for SQLSat vs paid events). As it is currently assessed, SQLSaturday is “losing money” – which is acceptable or not depending on the state of the budget, whether you think “losing” money on anything is bad, and how you see the overall role of PASS. That lack of growth has contributed to a view from at least some Board members that SQLSaturday is decreasing Summit attendance/growth (though I hear it grew 10% in 2016), a position so far not supported with data and one that seems non-intuitive to me. Those two things plus some sort of budget shortfall due to an overly ambitious revenue projection have led to cuts in how much money goes to events from PASS for sponsorship.

Why don’t our attendees go to the Summit? Easily the number one reason is cost! It’s a very large spend, many are reluctant to even ask, and fewer still get approved. It’s often a multi year effort, or one that is done when negotiating initial salary and benefits. We can’t (probably) make it cheaper long term, but we could surely work on giving those not-attended-yet people more of an incentive (a first-time cost of $1000, for example). We could do more to reach the decision maker directly. There’s no doubt I take this “failure” to drive Summit sales personally. Those of us running these events are pro-PASS, pro-Summit, and we do what we can, but we’re running the event. Our job is to show people the value of learning, get them thinking about learning more, and to get them into the system. The job of PASS and its marketing team is to find ways to reach those contacts and guide/incent them to attend other events, paid or unpaid. I’m glad to do what I can to help, but if this isn’t the target audience to market to, what is? How do we not see this as a marketing failure? Setting up a table and staffing it with a regional mentor is a start, but it’s superficial.

I was stunned to hear a Board member tell me people are opting out of Summit because they can just go to more SQLSaturdays (leading him to think we should do fewer of them). Is this true? Where is the data? It defies intuition at best. No one skips the Super Bowl to go to 4 local high school games, regardless of the quality of those teams. If it is true, then what? That’s an interesting discussion. They go to multiple events because they enjoy them and can afford them. How many go to multiple events per year? I’d be surprised if that is a large number, not counting speakers.

I heard about “speaker exhaustion”, a problem that I don’t see but might exist, and is easily fixable. I heard about reduced sponsor participation, or at least the inability to scale it up as we grow. Both in the context of reasons to do fewer events and to make them smaller.

I get PASS has to pay the bills. I’m not opposed to being smarter about how we send out money. For example, let’s offer more money to first time events, or to events that cannot find a free venue. I’m not opposed to doing less for a year if we had a shortfall. Maybe it’s time to wean from that entirely, but that cost – about $60k – is noise in the PASS budget.

Ranting and frustration aside, it’s the nature of a quasi-not-for-profit to struggle with balancing the need to raise funds vs using those funds to do good. What I don’t want to see happen is a return to the bad old days where PASS spent basically nothing on anything besides Summit, a game of make money so we can grow staff and make more money, over and over. That’s a tough balance without some guiding principle or goal that I’m not sure we have. I’ve wondered more than once if we didn’t need PASS.COM and PASS.ORG as two different companies, one for profit with a mandate to put x dollars/percent into the other.

Why does this all matter? I feel like the Board is making decisions about the growth, size, etc, of SQLSaturday based on an incomplete view of the world. We can cap spending, cap the number of events, do other defensive things, or we can ask “how do we change the game so we can train more people?”. Steve isn’t wrong to think about 500 events a year. Not events for the sake of events but to reach people that want to learn, grow, join the craft, be part of our community. That’s why we started this, why so many of you lead and speak and volunteer at these events. If we can help with the ROI, that’s good. If we have to get less support, let’s be smart about what support matters. We need a longer and better conversation about the two conflicting but valid views of ROI. I think we can achieve both, but we don’t do it by pulling back.


18 Jan 21:36

The PASS Global Alliance Program (GAP)

by admin

One of the topics I forgot to mention from the Roundtable was a swell of dissatisfaction with the GAP. Not all understood it, and it’s clearly not “global” from the perspective of the non North American events. I don’t know if GAP is a good idea or not, but it started to help sponsors (and PASS) look at marketing overall. Here’s an example. Imagine you’re a vendor in our space. You’re going to get at least 2, and probably more, emails from each event asking you to sponsor. They all have similar but different sponsorship plans. With 120+ events a year, how do you manage that effectively?

One way is to give PASS a big check and have them distribute it to the events you pick. You still have to send SWAG, coordinate a speaker, send tweets and emails, but it’s a one time check to write and probably easier to manage. Not dumb to wish for easier, and kudos to PASS for trying to solve the problem at multiple levels.

I don’t think GAP should go away, but it needs to evolve. I don’t know if we can “make” sponsors allocate some of those funds to non North American events. Certainly we could encourage them – maybe even offer some dollar matching for those cases. Maybe we should just rename it!


18 Jan 21:36

Why we need a PASS Sponsor Portal

by admin

Major sponsors in our space get hundreds of emails from us each year, all asking for money. Those are all interesting and potentially valuable invitations, but the sheer volume can be overwhelming. Beyond that, sponsors struggle to find the ROI, from picking the swag that draws people to the table to identifying venues most likely to generate leads if not immediate sales. They have to schedule a speaker and/or sales rep, ship stuff, buy stuff, it’s not just “show up”.

Imagine a sponsor portal where they can see every event scheduled/reserved, whether they’ve sponsored this year (and in previous years), and easily see the sponsor plan. Imagine they can post a note to events that says “I’m only doing 10 events this year and have already picked them”, or “we only go to events with registration counts over 500 in the previous year” or “we don’t sponsor the same event two years in a row”. They might say “we require a speaking slot”. They could see registration counts for the history of the event, how many leads we sent them. Maybe even show them how many sponsors signed up at each event so they could opt for events where they might get more time/space as a percentage of sponsors. Maybe they could see floor plans and what spaces were left. Make it as easy as booking an airline seat.

Should we do this? I think it’s based on how we see the future. Are we moving away from reliance on national sponsors, by shifting to a paid model (yuck, but maybe), local sponsors, or just spending less? Or will it always be a hybrid that merits an investment in tools, combined with revisiting how we serve sponsors and adjusting expectations?

We still don’t have the speaker bureau, but this is the companion to that. We should treat both groups well. Both are critical to our success.


18 Jan 21:35

Reflecting on Ten Years of SQLSaturday

by admin

It’s hard to believe it’s been more than 10 years since we started the planning for the first event in Orlando. The success of the idea seems obvious now – surely the time was right for something to happen in our space. Back then though it was far from obvious. We didn’t have doubts about the ability to do one event, or even repeat it in Orlando, but how to get it to grow in other cities? The best answer we had was to leverage our network, slowly. Pam Shaw in Tampa, Brian Knight in Jacksonville, Greg Larsen in Tacoma. We were bare bones on everything in those days – the site had no login, no tools, and we just updated the HTML from the first event to support the second one. Thinking about it, I don’t know if I could do it that agile/lean again, but back then we had limited time/resources and we knew that we didn’t know enough to build anything, and building tools is a big leap when you don’t know if the concept will grow or not. Running lean was the right decision, not without painful times. We learned a lot in those early days, the chance to iterate quickly made a big difference.

Even back then there was a lot of worry. We worried about finding free venues (or having someone go the paid space route and lose personal money). We worried about running out of sponsor dollars! We worried about finding speakers. We added the optional lunch fee to mitigate the sponsor money issue, a decision we struggled with then but in hindsight was a good choice. We saw the speakers start to appear – when I saw Tim Mitchell and Geoff Hiten traveling to see what we were doing I realized that the speakers would drive it all – if they came, we’d grow. They did. Not overnight, but that pool grew (and continues to grow). Events grew slowly as speakers took the idea back home and encouraged their own groups to try it, and that pattern repeated; you can see from the early days the focus on the south east and how it spread outward. Looking back at those early days with minimal tools and not a ton of experience I’m amazed that the event leaders took up the challenge and while they had suggestions for making it better it was always positive dealing with them.

We went from planting a seed to slow growth to decent success, enough for PASS to see the value and take over the event format in 2010. It was not a popular decision then, many feared that PASS would mismanage it. I can’t say we had no concerns, but we were betting on the people on the Board – good people – who wanted to serve the community. It took a while for PASS to decide and it was Rushabh Mehta who made it happen, a decision I’d call the capstone of his tenure on the Board. The handoff illuminated the weaknesses we still had in tools and processes, a source of pain and concern for HQ but once we worked through it things settled down. I believe it changed PASS. It didn’t fix everything, but it provided a steady stream of new members and it made the org feel vibrant in a way that it didn’t before. I credit the Board too for spending to support it, hiring a full time evangelist, providing direct sponsorship to events, and putting a lot of IT time/money into upgrades to the tools. The growth caused pain at times, but the right kind of pain. I can’t close out this paragraph without mentioning Karla. If there was ever the right person at the right time! The growth since then has been incredible.

An obvious but unpredicted side effect of having more events is that we built more relationships faster. Relationships need repetition to grow. Back then most of us met once a year at the Summit. That coincided with an uptick in social media so it’s hard to credit just one or the other, but together that sense of family started to grow further and faster.

I look back in wonder at the wisdom/luck of defining a format with so few rules. We had a formula, or a recipe if you prefer, but it was loosely defined. I believed then and now that the whole thing depends on the event leader, that one person who says “I will get it done”. The best way to encourage that is to give them a goal, some suggested guidelines, and then get out of the way. It’s their event. If they want to do a BI edition, or customize the logo, or buy Hawaiian shirts or whatever else, let them do it. It’s that sense of ownership that will drive them to do more and do it better. Related to that, we’re also training leaders – they have to find a venue, make deals with sponsors, provide customer service, and not least learn the challenges of using volunteers effectively, all skills that will serve them well in business, life, and when they run for the PASS Board.

The impact is hard to measure. We can measure hours delivered, number of events, number of speakers and sessions, all good stuff. But that misses the intangibles. The careers launched or improved because an attendee heard the right message at the right time. The friendships built, obvious but so easy to overlook. I can’t count the number of friends I’ve made at SQLSaturday, but I remember the first – Jack Corbett, who introduced himself after the first event and soon became part of the SQL fabric. The increase in the number and quality of speakers vying for Summit sessions. Even sponsors that have grown and changed from the chance to be at so many events. And what about our craft, surely it’s better for the exchange of ideas?

My thoughts? Unimaginable success. Satisfaction. I wish I could shake the hand of every event leader and every speaker – we’ve done this together as peers, making our part of the world a better place and perhaps inspiring more to do the same. Thinking back to when dreaming of doing this in multiple cities seemed like an awfully big dream. Thinking that one day we’ll peak and perhaps even decline as needs and interests change, but also thinking that if we dare that we might do more good in more places. A bigger dream? Improbable? Why not?

Finally, the biggest reason to look back is to prepare to look forward. What if anything needs to change to continue to serve, to grow? Let’s look at that openly, calmly. I’ll write more on that soon.


18 Jan 21:34

Should We Allow Paid Sponsor Presentations at PASS Summit?

by SQLAndy

For many years the PASS Summit program committee has enforced a bright line rule – no sponsor content in the educational tracks. I always thought it was a good rule because it was easy to enforce and it made sure that sessions were educational (that is, not a veiled or not-so-veiled sales pitch). If you wanted information about a product or the all out pitch you could make your way to the sponsor expo and talk to a rep or watch a group demo. It wasn’t perfect, but it was usable and easy to follow (if not appreciate).

I think we could continue to apply the rule, though clearly there could and should be some exceptions for speakers who are demoing a free solution (take sp_whoisactive as a great example) to show how it works and how to solve problems with it.

I’ve slowly realized that in this case, as in life, it’s the nuance that matters. There are plenty of presentations that are “just SQL”, but our jobs require more than that. Take the task of putting a database into source control and learning to do continuous (or close!) release. We can talk about that from an academic perspective which has its place, but if I want to take the next step it’s one that requires tools and now we’ve hit the conflict. What happens now?

I’ll borrow my friend Steve Jones as an example. He works for Redgate and they have a stack of tools (there are other vendors that have similar) that support source control and continuous release and he’s working to understand the real world problems that the tools solve and don’t solve. I’m biased of course, but I can see the value of attending a presentation that shows how to put that stack together and do something with it. Under the current (maybe old now) rule it’s not going to make it onto the schedule regardless of the quality of his speaking skills or abstract or domain knowledge. Right? Wrong? Complicated.

Maybe think of sessions as having these possible flavors:

  • Pure education using SQL Server (maybe with a sub-flavor of including tools that are open source or otherwise free and require no registration to use)
  • Vendor tools allowed in context of solving a problem (try parsing an abstract to figure that out, or even assessing the final presentation)
  • Sales pitch allowed and expected

Think about that for a moment. If you kill the rule, what’s the new rule? Maybe you’re ok with sales pitches being on the schedule, but how many others won’t be?

For 2016 PASS did something I don’t think much of – they sold session slots to vendors. As far as I know that’s the first time. We’ve put up with it for keynotes because, well, it’s just always been that way. Personally I think we’d get better keynotes if we paid the speaker rather than the other way around. Rant over. Anyway, I know more than a few attendees were taken aback this year to see content being delivered that didn’t fit their mental model of what was allowed or accepted. PASS didn’t hide it, but it wasn’t really published or explained well, before or during the Summit.

Selling slots to vendors, does it solve the problem, or just make money for PASS? I don’t know. I know that the vendors with the most to spend get the spots and the little guys are left trying to play by the old rules. Free market? Eh.

I spoke to PASS about some of this last July or so. My suggestion was to leverage the current system of abstract evaluation. Imagine if we did this:

  • Require speakers to disclose any tool being used in the presentation that isn’t available as a free, no-registration download
  • Require speakers to disclose their relationship with the tool vendor (employee, contractor, “friend of”, or “none”)
  • Mandate that pricing/licensing cannot be shown/discussed (come see us in the expo)
  • Tag sessions and speakers  (“vendor” maybe?) so that anyone deciding what to attend can clearly see that it might have a subtle sales pitch and/or product bias. Many would go anyway (or because), others would run the other way. Transparency would support both.
  • Limit those specially tagged sessions to some percentage of the total, or perhaps allocate one track for them.
  • Let the program committee evaluate as always using the amended rules – this ensures quality, transparency, and I bet intense competition for those vendor slots.

We could do all of that pretty easily and trial it for a year without much change in process or software. Maybe there’s a better way to do it, and maybe we shouldn’t do it at all, but I’d like to see a public discussion before selling slots is ingrained into the fabric.

It’s easy to complain about the old rule…until you think about the consequences of not having it. Changing the formula is always tricky (New Coke!).

18 Jan 21:23

Announcing Columnstore Indexes and Query Store support in Database Engine Tuning Advisor

by SQL Server Team

The latest version of Microsoft SQL Server Database Engine Tuning Advisor (DTA) supports two new features: (a) Ability to recommend columnstore indexes in addition to rowstore indexes, and (b) Ability to automatically select a workload to tune from the Query Store. These new features are available when tuning for SQL Server 2016 Database Engine (or later) versions.

Recommendations for columnstore indexes

Data warehousing and analytic workloads often need to scan large amounts of data, and can greatly benefit from columnstore indexes. In contrast, rowstore (B+-tree) indexes are most effective for queries that access relatively small amounts of data searching for a particular range of values. Since rowstore indexes can deliver rows in sorted order, they can also reduce the cost of sorting in query execution plans. Therefore, the choice of which rowstore and columnstore indexes to build for your database is dependent on your application’s workload.

The latest version of DTA can analyze the workload and recommend a suitable combination of rowstore and columnstore indexes to build on the tables referenced by the workload. This article highlights the performance improvements achieved on real customer workloads by using DTA to recommend a combination of rowstore and columnstore indexes.
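
To make that concrete, here is a hedged, illustrative sketch of the kind of index pair such an analysis might suggest for a typical fact table; the dbo.FactSales table and its columns are hypothetical placeholders, not actual DTA output. DTA presents its recommendations as T-SQL statements of this general shape, which you can review before applying:

-- Columnstore for large analytic scans and aggregations
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales_Analytics
    ON dbo.FactSales (OrderDateKey, ProductKey, CustomerKey, SalesAmount);

-- Rowstore (B+-tree) index for selective range queries that also benefit from sorted output
CREATE NONCLUSTERED INDEX IX_FactSales_OrderDateKey
    ON dbo.FactSales (OrderDateKey)
    INCLUDE (SalesAmount);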

Tune Database using Workload from SQL Server Query Store

The Query Store feature in SQL Server automatically captures a history of queries, plans, and runtime statistics, and persists this information along with a database. It stores query execution statistics summarized over time intervals so you can see database usage patterns and understand when query plan changes happened on the server. DTA now supports a new option to analyze the Query Store to automatically select an appropriate workload for tuning. For many DTA users, this can take away the burden of having to collect a suitable workload file using SQL Server Profiler. This feature is only available if the database has Query Store turned on.
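
If Query Store is not yet enabled on the database you want to tune, turning it on is a single statement; a minimal sketch (the database name is a placeholder):

ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;

-- Optionally, from within that database, verify that Query Store is collecting data
SELECT actual_state_desc, desired_state_desc
FROM sys.database_query_store_options;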

Next Steps

Download the latest version of Database Engine Tuning Advisor.

For additional documentation on these features, see also:

  • Columnstore Index Recommendations in Database Engine Tuning Advisor (DTA)
  • Tuning Database Using Workload from Query Store
  • Performance Improvements using Database Engine Tuning Advisor (DTA) recommendations

18 Jan 21:23

T-SQL Tuesday #86 My all-time favorite personal SQL Server Enhancement Request: Automatically maintained columns

by drsql
I have heard the name T-SQL Tuesday for (like,) ever; but I have always been so into my own bundle of insanity that I never took the time to figure out what it is. This changes today because I read a post on Adam Machanic's ( @adammachanic ) blog that outlined the rules ( here ), and I have been keen to get back to blogging regularly. When I saw what the topic was this month, hosted by Brent Ozar, I was even more excited. SQL Server Bugs & Enhancement Requests. Over the years, I have posted a lot...(read more)
06 Jan 21:53

Books are DONE…

by drsql
Finally, I can say I am done with a couple of books I had been working on, pretty much simultaneously at times over the past few months: Exam Ref 70-762 Developing SQL Databases ( https://www.microsoftpressstore.com/store/exam-ref-70-762-developing-sql-databases-9781509304943 ) I wrote the first half of the book, with Stacia Varga finishing things out. It was a challenging project for several reasons, but I am very proud of the final product. If you are studying for the exam, you should find what you...(read more)
06 Jan 21:53

Prologika Newsletter Winter 2016

by Prologika - Teo Lachev

Designing an Operational Data Store (ODS)


I hope you’re enjoying the holiday season. I wish you all the best in 2017! The subject of this newsletter came from a Planning and Strategy assessment for a large organization. Before I get to it, and speaking of planning, don’t forget to use your Microsoft planning days as they will expire at the end of your fiscal year. This is free money that Microsoft gives you to engage Microsoft Gold partners, such as Prologika, to help you plan your SQL Server and BI initiatives. Learn how the process works here.


Just like a data warehouse, Operational Data Store (ODS) can mean different things for different people. Do you remember the time when ODS and DW were conflicting methodologies and each one claimed to be superior to the other? Since then the scholars buried the hatchet and reached a consensus that you need both. I agree.

To me, ODS is nothing more than a staging database on steroids that sits between the source systems and DW in the BI architectural stack.

What’s Operational Data Store?

According to Wikipedia “an operational data store (or “ODS”) is a database designed to integrate data from multiple sources for additional operations on the data…The general purpose of an ODS is to integrate data from disparate source systems in a single structure, using data integration technologies like data virtualization, data federation, or extract, transform, and load. This will allow operational access to the data for operational reporting, master data or reference data management. An ODS is not a replacement or substitute for a data warehouse but in turn could become a source.”

OK, this is a good starting point. See also the “Operational Data Source (ODS) Defined” blog by James Serra. But how do you design an ODS? In general, I’ve seen two implementation patterns, but the design approach you take really depends on how you plan to use the data in the ODS and what downstream systems need that data.

One to One Pull

ODS is typically implemented as a 1:1 data pull from the source systems, where ETL stages all source tables required for operational reporting and downstream systems, such as loading the data warehouse. ETL typically runs daily but it could run more often to meet low-latency reporting needs. The ETL process is typically just Extract and Load (it doesn’t do any transformations), except for keeping a history of changes (more on this in a moment). This results in a highly normalized schema that’s the same as the original source schema. Then when data is loaded in DW, it’s denormalized to conform to the star schema. Let’s summarize the pros and cons of the One-to-One Pull design pattern.

  • Table schema – Pros: highly normalized and identical to the source system. Cons: the number of tables increases.
  • Operational reporting – Pros: users can query the source data as it’s stored in the original source, which offloads reporting from the source systems. Cons: no consolidated reporting if multiple source systems process the same information (e.g. multiple systems to process claims).
  • Changes to source schema – Pros: the source schema is preserved. Cons: additional ETL is required to transform to a star schema.
  • ETL – Pros: extraction and load from source systems (no transformations). Cons: as source systems change, the ETL needs to change.

Common Format

This design is preferred when the same business data is sourced from multiple source systems, or when the source systems might change or be replaced over time. For example, an insurance company might have several systems to process claims. Instead of ending up with three sets of tables (one for each source system), the ODS schema is standardized and the feeds from the source systems are loaded into a shared table. For example, a common Claim table stores claim “feeds” from the three systems. As long as the source endpoint (table, view, or stored procedure) returns the data according to an agreed “contract” for the feed, ODS is abstracted from source system changes. This design is much less normalized. In fact, for the most part it should mimic the DW schema so that DW tables can piggyback on the ODS tables with no or minimum ETL.

  • Table schema – Pros: denormalized and suitable for reporting. Cons: the original schema is lost.
  • Operational reporting – Pros: relevant information is consolidated and stored in one table. Cons: the schema is denormalized and reports might not reflect how the data is stored in the source systems.
  • Schema changes to source systems – Pros: as long as the source endpoints adhere to the contract, ODS is abstracted from schema changes. Cons: a design contract needs to be prepared and source systems need to provide the data in the agreed format.
  • ETL – Pros: less, or even no, ETL to transform data from ODS to DW. Cons: ETL needs to follow the contract specification, so upfront design effort is required.

Further Recommendations

Regardless of which design pattern you choose, here are some additional recommendations to get the most out of your ODS:

  • Store data at its most atomic level – No aggregations and summaries. Your DW would need the data at its lowest level anyway.
  • Keep all the historical data, or as much as required by your retention policy – This is great for auditing and allows you to reload the DW from ODS since it’s unlikely that source systems will keep historical data.
  • Apply minimum to no ETL transformations in ODS – You would want the staged data to keep the same parity with the source data so that you can apply data quality and auditing checks.
  • Avoid business calculations in ODS – Business calculations, such as YTD, QTD, variances, etc., have no place in ODS. They should be defined in the semantic layer, e.g. Analysis Services model. If you attempt to do so in ODS, it will surely impact performance, forcing you to pre-aggregate data. The only permissible type of reporting in ODS is operational reporting, such as to produce the same reports as the original systems (without giving users access to the source) or to validate that the DW results match the source systems.
  • Maintain change history on most columns – I kept the best for last. Treat most columns as Type 2 so that you know when a given row was changed in the source. This is great for auditing.

Here is a hypothetical Policy table that keeps Type 2 changes. In this example, the policy rate has changed on 5/8/2010. If you follow this design, you don’t have to maintain Type 2 in your DW (if you follow the Common Format pattern) and you don’t have to pick which columns are Type 2 (all of them are). It might be extreme but it’s good for auditing. Tip: use SQL Server 2016 temporal tables to simplify Type 2 date tracking.

RowStartDate RowEndDate SourceID RowIsCurrent RowIsDeleted ETLExecutionID PolicyKey PolicyID PremiumRate
5/2/2010 5/8/2010 1 0 0 0BB76521-AA63-… 1 496961 0.45
5/9/2010 12/31/9999 1 1 0 CD348258-42ED-.. 2 496961 0.50
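
As a minimal sketch of the temporal-table tip above (using simplified, illustrative column names rather than the full set shown in the sample table), SQL Server 2016 can maintain the row start/end dates for you:

CREATE TABLE dbo.Policy
(
    PolicyID     int          NOT NULL PRIMARY KEY,
    PremiumRate  decimal(5,2) NOT NULL,
    RowStartDate datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    RowEndDate   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (RowStartDate, RowEndDate)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.PolicyHistory));

-- Every change to PremiumRate is versioned automatically; the full history
-- for a policy can be read back with FOR SYSTEM_TIME:
SELECT PolicyID, PremiumRate, RowStartDate, RowEndDate
FROM dbo.Policy FOR SYSTEM_TIME ALL
WHERE PolicyID = 496961;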

MS BI Events in Atlanta

As you’d probably agree, the BI landscape is fast-moving and it might be overwhelming. If you need any help with planning and implementing your next-generation BI solution, don’t hesitate to contact me. As a Microsoft Gold Partner and premier BI firm, you can trust us to help you plan and implement your data analytics projects, and rest assured that you’ll get the best service.

Regards,


Teo Lachev
Prologika, LLC | Making Sense of Data
Microsoft Partner | Gold Data Analytics

06 Jan 21:53

Windows Virtual Accounts and SQL Server

by Wayne Sheffield

How SQL Server uses Windows Virtual Accounts and local groups

Managed Service Accounts (MSAs) and Virtual Accounts were introduced in Windows 7 / Windows 2008R2 (link). Installations of SQL Server 2012 and newer will utilize the Virtual Accounts. SQL Server 2008 / 2008R2 installations use a combination of the local groups and Virtual Accounts, and SQL Server 2005 installations use just local groups.

On my Windows 7 laptop, I have SQL Server versions 2005-2014 installed. Looking at Computer Management for the groups, we can see the groups that were created by the various SQL Server installations:

[Screenshot: local groups created by the SQL Server installations, in Computer Management]

If we drill in to one of these groups, we can see that the SQL Server service account for the database engine is assigned to this group:

[Screenshot: group details showing the SQL Server service account as a member]

Just to prove that this is the SQL Server service account:

[Screenshot: SQL Server service accounts in SQL Server Configuration Manager]

SQL Server will use these groups in many places so that permissions are granted to the group, instead of the actual service account. This simplifies things greatly if you change the service account – SQL Server Configuration Manager will just change the membership of this group instead of having to hunt down and change every place where it knows permissions are needed for the service account. Granting permissions to these groups instead of to the service account also simplifies your life if you ever change the service account: any permissions granted directly to the account on local resources (paths, registry, etc.) would have to be changed, while permissions granted to the group continue to apply.

When installing on Windows 7 / 2008R2, Virtual Accounts are used instead. The virtual account used is “NT SERVICE\<service name>”. For a default SQL Server instance, the name will be “NT SERVICE\MSSQLSERVER”, and for a named SQL Server instance it will be “NT SERVICE\MSSQL$<instance name>”. Essentially, this Virtual Account takes the place of a group. One of the places where this is used is in Local Security Policies. Here is a screen shot from one of those:

[Screenshot: a Local Security Policy assignment listing both Virtual Accounts and local groups]

Notice that this policy has both Virtual Accounts and local groups assigned to it, depending on the version of SQL.

Does this work?

I have had people insist that this only works when using a local windows account, and that it doesn’t work with a domain account. So, let’s test it out.

Previously, I had created a lab environment with a network for testing out Availability Groups. I installed an instance of SQL Server 2016 using a domain account for the service account:

[Screenshot: SQL Server services running under a domain account]

During the SQL Server 2016 installation, I specified that it should set the “Perform Volume Maintenance Tasks” policy. The following screen shot shows that it is set to use “NT SERVICE\MSSQLSERVER”:

[Screenshot: the Perform Volume Maintenance Tasks policy assigned to NT SERVICE\MSSQLSERVER]

When the SQL Server error log is examined, we can see that Instant File Initialization (IFI) is enabled. This can only be enabled if the service is running with the “Perform Volume Maintenance Task” security privilege assigned to the service:

[Screenshot: SQL Server error log entry showing Instant File Initialization is enabled]

Just to prove that IFI is actually working, let’s build a database using the example shown at https://blogs.msdn.microsoft.com/sql_pfe_blog/2009/12/22/how-and-why-to-enable-instant-file-initialization/:

DROP DATABASE IF EXISTS TestFileZero;
GO
DBCC TRACEON(3004,3605,-1);
GO
CREATE DATABASE TestFileZero;
GO
EXEC sp_readerrorlog;
GO
DROP DATABASE TestFileZero;
GO
DBCC TRACEOFF(3004,3605,-1); 
GO

Running these statements produces the following results from the error log:

[Screenshot: error log output from the test database creation]

Here we can see that only the log file was zeroed out – the data file wasn’t. This proves that IFI is actually working, and that the virtual account properly passes the “Perform Volume Maintenance Tasks” privilege through to the service account for use by SQL Server.
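
As a side note, newer servicing releases of SQL Server also expose this directly: sys.dm_server_services includes an instant_file_initialization_enabled column, so on builds that have that column, a quick check looks like this:

-- Shows each SQL Server service, its account, and whether IFI is in effect
SELECT servicename, service_account, instant_file_initialization_enabled
FROM sys.dm_server_services;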

Summary

In summary, the Virtual Account is essentially an automatic group for the service. It can be used on the local machine, even if the service account is a domain account. It can simplify things if you ever need to change the service account. You should use it.

Note that if you need to grant the service account permissions outside of the local server (network file share, another server, etc.) then you will still need to use the domain service account.


06 Jan 21:52

SQL Server v.Next : STRING_AGG() performance

by Aaron Bertrand

While SQL Server on Linux has stolen almost all of the headlines about v.Next, there are some other interesting advancements coming in the next version of our favorite database platform. On the T-SQL front, we finally have a built-in way to perform grouped string concatenation: STRING_AGG().

Let's say we have the following simple table structure:

CREATE TABLE dbo.Objects
(
  [object_id]   int, 
  [object_name] nvarchar(261),
  CONSTRAINT PK_Objects PRIMARY KEY([object_id])
);
 
CREATE TABLE dbo.Columns
(
  [object_id] int NOT NULL
    FOREIGN KEY REFERENCES dbo.Objects([object_id]), 
  column_name sysname,
  CONSTRAINT PK_Columns PRIMARY KEY ([object_id],column_name)
);

For performance tests, we're going to populate this using sys.all_objects and sys.all_columns. But for a simple demonstration first, let's add the following rows:

INSERT dbo.Objects([object_id],[object_name])
  VALUES(1,N'Employees'),(2,N'Orders');
 
INSERT dbo.Columns([object_id],column_name)
  VALUES(1,N'EmployeeID'),(1,N'CurrentStatus'),
        (2,N'OrderID'),(2,N'OrderDate'),(2,N'CustomerID');

If the forums are any indication, it is a very common requirement to return a row for each object, along with a comma-separated list of column names. (Extrapolate that to whatever entity types you model this way – product names associated with an order, part names involved in the assembly of a product, subordinates reporting to a manager, etc.) So, for example, with the above data we'd want output like this:

object       columns
---------    ----------------------------
Employees    EmployeeID,CurrentStatus
Orders       OrderID,OrderDate,CustomerID

The way we would accomplish this in current versions of SQL Server is probably to use FOR XML PATH, as I demonstrated to be the most efficient outside of CLR in this earlier post. In this example, it would look like this:

SELECT [object]  = o.[object_name],
       [columns] = STUFF(
                    (SELECT N',' + c.column_name
                       FROM dbo.Columns AS c
                       WHERE c.[object_id] = o.[object_id]
                       FOR XML PATH, TYPE
                    ).value(N'.[1]',N'nvarchar(max)'),1,1,N'')
FROM dbo.Objects AS o;

Predictably, we get the same output demonstrated above. In SQL Server v.Next, we will be able to express this more simply:

SELECT [object]  = o.[object_name],
       [columns] = STRING_AGG(c.column_name, N',')
FROM dbo.Objects AS o
INNER JOIN dbo.Columns AS c
ON o.[object_id] = c.[object_id]
GROUP BY o.[object_name];

Again, this produces the exact same output. And we were able to do this with a native function, avoiding both the expensive FOR XML PATH scaffolding, and the STUFF() function used to remove the first comma (this happens automatically).

What About Order?

One of the problems with many of the kludge solutions to grouped concatenation is that the ordering of the comma-separated list should be considered arbitrary and non-deterministic.

For the XML PATH solution, I demonstrated in another earlier post that adding an ORDER BY is trivial and guaranteed. So in this example, we could order the column list by column name alphabetically instead of leaving it to SQL Server to sort (or not):

SELECT [object]  = [object_name],
       [columns] = STUFF(
                    (SELECT N',' +c.column_name
                       FROM dbo.Columns AS c
                       WHERE c.[object_id] = o.[object_id]
                       ORDER BY c.column_name -- only change
                       FOR XML PATH, TYPE
                    ).value(N'.[1]',N'nvarchar(max)'),1,1,N'')
FROM dbo.Objects AS o;

Output:

object       columns
---------    ----------------------------
Employees    CurrentStatus,EmployeeID
Orders       CustomerID,OrderDate,OrderID

CTP 1.1 adds WITHIN GROUP to STRING_AGG(), so using the new approach, we can say:

SELECT [object]  = o.[object_name],
       [columns] = STRING_AGG(c.column_name, N',')
                   WITHIN GROUP (ORDER BY c.column_name) -- only change
FROM dbo.Objects AS o
INNER JOIN dbo.Columns AS c
ON o.[object_id] = c.[object_id]
GROUP BY o.[object_name];

Now we get the same results. Note that, just like a normal ORDER BY clause, you can add multiple ordering columns or expressions inside WITHIN GROUP ().
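
For example (a trivial variation on the query above, just to illustrate the syntax), you could order each list by name length first and then alphabetically within each length:

SELECT [object]  = o.[object_name],
       [columns] = STRING_AGG(c.column_name, N',')
                   WITHIN GROUP (ORDER BY LEN(c.column_name), c.column_name)
FROM dbo.Objects AS o
INNER JOIN dbo.Columns AS c
ON o.[object_id] = c.[object_id]
GROUP BY o.[object_name];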

All Right, Performance Already!

Using quad-core 2.6 GHz processors, 8 GB of memory, and SQL Server CTP1.1 (14.0.100.187), I created a new database, re-created these tables, and added rows from sys.all_objects and sys.all_columns. I made sure to only include objects that had at least one column:

INSERT dbo.Objects([object_id], [object_name]) -- 656 rows
  SELECT [object_id], QUOTENAME(s.name) + N'.' + QUOTENAME(o.name) 
    FROM sys.all_objects AS o
    INNER JOIN sys.schemas AS s 
    ON o.[schema_id] = s.[schema_id]
    WHERE EXISTS
    (
      SELECT 1 FROM sys.all_columns 
      WHERE [object_id] = o.[object_id]
    );
 
INSERT dbo.Columns([object_id], column_name) -- 8,085 rows 
  SELECT [object_id], name 
    FROM sys.all_columns AS c  
    WHERE EXISTS
    (
      SELECT 1 FROM dbo.Objects 
      WHERE [object_id] = c.[object_id]
    );

On my system, this yielded 656 objects and 8,085 columns (your system may yield slightly different numbers).
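
A trivial sanity check (not part of the original script) to compare your own counts:

SELECT [objects] = (SELECT COUNT(*) FROM dbo.Objects),
       [columns] = (SELECT COUNT(*) FROM dbo.Columns);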

The Plans

First, let's compare the plans and Table I/O tabs for our two unordered queries, using Plan Explorer. Here are the overall runtime metrics:

Runtime metrics for XML PATH (top) and STRING_AGG() (bottom)

The graphical plan and Table I/O from the FOR XML PATH query:

Plan and Table I/O for XML PATH, no order

And from the STRING_AGG version:

Plan and Table I/O for STRING_AGG, no ordering

For the latter, the clustered index seek seems a little troubling to me. This seemed like a good case for testing out the seldom-used FORCESCAN hint (and no, this would certainly not help out the FOR XML PATH query):

SELECT [object]  = o.[object_name],
       [columns] = STRING_AGG(c.column_name, N',')
FROM dbo.Objects AS o
INNER JOIN dbo.Columns AS c WITH (FORCESCAN) -- added hint
ON o.[object_id] = c.[object_id]
GROUP BY o.[object_name];

Now the plan and Table I/O tab look a lot better, at least on first glance:

Plan and Table I/O for STRING_AGG(), no ordering, with FORCESCAN

The ordered versions of the queries generate roughly the same plans. For the FOR XML PATH version, a sort is added:

Added sort in FOR XML PATH version

For STRING_AGG(), a scan is chosen in this case, even without the FORCESCAN hint, and no additional sort operation is required – so the plan looks identical to the FORCESCAN version.

At Scale

Looking at a plan and one-off runtime metrics might give us some idea about whether STRING_AGG() performs better than the existing FOR XML PATH solution, but a larger test might make more sense. What happens when we perform the grouped concatenation 5,000 times?

SELECT SYSDATETIME();
GO
 
DECLARE @x nvarchar(max);
SELECT @x = STRING_AGG(c.column_name, N',')
  FROM dbo.Objects AS o
  INNER JOIN dbo.Columns AS c
  ON o.[object_id] = c.[object_id]
  GROUP BY o.[object_name];
GO 5000
SELECT [string_agg, unordered] = SYSDATETIME();
GO
 
DECLARE @x nvarchar(max);
SELECT @x = STRING_AGG(c.column_name, N',')
  FROM dbo.Objects AS o
  INNER JOIN dbo.Columns AS c WITH (FORCESCAN)
  ON o.[object_id] = c.[object_id]
  GROUP BY o.[object_name];
GO 5000
SELECT [string_agg, unordered, forcescan] = SYSDATETIME();
 
GO
DECLARE @x nvarchar(max);
SELECT @x = STUFF((SELECT N',' +c.column_name
  FROM dbo.Columns AS c
  WHERE c.[object_id] = o.[object_id]
  FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N'')
FROM dbo.Objects AS o;
GO 5000
SELECT [for xml path, unordered] = SYSDATETIME();
 
GO
DECLARE @x nvarchar(max);
SELECT @x = STRING_AGG(c.column_name, N',')
  WITHIN GROUP (ORDER BY c.column_name)
  FROM dbo.Objects AS o
  INNER JOIN dbo.Columns AS c
  ON o.[object_id] = c.[object_id]
  GROUP BY o.[object_name];
GO 5000
SELECT [string_agg, ordered] = SYSDATETIME();
 
GO
DECLARE @x nvarchar(max);
SELECT @x = STUFF((SELECT N',' +c.column_name
  FROM dbo.Columns AS c
  WHERE c.[object_id] = o.[object_id]
  ORDER BY c.column_name
  FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N'')
FROM dbo.Objects AS o
ORDER BY o.[object_name];
GO 5000
SELECT [for xml path, ordered] = SYSDATETIME();

After running this script five times, I averaged the duration numbers and here are the results:

Duration (milliseconds) for various grouped concatenation approaches

We can see that our FORCESCAN hint really did make things worse – while we shifted the cost away from the clustered index seek, the sort was actually much worse, even though the estimated costs deemed them relatively equivalent. More importantly, we can see that STRING_AGG() does offer a performance benefit, whether or not the concatenated strings need to be ordered in a specific way. As with STRING_SPLIT(), which I looked at back in March, I am quite impressed that this function scales well prior to "v1."

I have further tests planned, perhaps for a future post:

  • When all the data comes from a single table, with and without an index that supports ordering
  • Similar performance tests on Linux

In the meantime, if you have specific use cases for grouped concatenation, please share them below (or e-mail me at abertrand@sentryone.com). I'm always open to making sure my tests are as real-world as possible.

The post SQL Server v.Next : STRING_AGG() performance appeared first on SQLPerformance.com.

06 Jan 21:52

The Data Dictionary Demystified

by Michelle Knight

Understanding Big Data and Data Governance goes hand in hand with the concept of a Data Dictionary. Data Dictionaries have been integral to business functions. This article will demystify and help to clarify the Data Dictionary model. What is a Data Dictionary?  A Data Dictionary provides the ingredients and steps needed to create relevant business […]

The post The Data Dictionary Demystified appeared first on DATAVERSITY.

06 Jan 21:50

My Top Blog Posts for 2016

by andyleonard
(with a nod to @KrispyKreme ) I’d like to wish everyone reading this post a blessed, peace-filled, and happy 2017! I would also like to thank you for reading this post and any other posts you may have read. Some Blog Post Stats I published 87 blog posts this year at SqlBlog.com . I published most in December (12) and least in October (2). These posts attracted more reads than the others (ordered by most reads in descending order): Installing SQL Server 2016 Developer Edition, One Example SQL Server...(read more)
06 Jan 21:48

Configuring MSDTC Across Domains

by Bill Graziano

One of my clients has applications that use MSDTC.  We’re spinning up a SQL Server 2016 Availability Group to take advantage of the MSDTC support it introduced.  Most of these distributed transactions take place across linked servers.  And of course the new servers are in a new domain with a very limited trust with the old domain. 

I found lots of articles and checklists for configuring MSDTC but none had everything I needed.  This is the checklist I came up with to configure cross-domain MSDTC with a limited trust in place.  (I don’t know what kind of trust it is.  I just know I have to put accounts into Domain Local groups in order to use them across the trust.  And it’s one way.)

Resources

I found LOTS of articles on configuring MSDTC.  These are the ones I found to be the most comprehensive. 

  1. Troubleshooting MSDTC Communication Checklist – This is one of the better checklists I found.  Has a good list of required ports.
  2. Troubleshooting Problems with MSDTC – Another good resource
  3. DTCPing – A VERY useful utility to confirm basic connectivity.
  4. DTCTester – Utility to perform a distributed transaction.

Configuration

These are the steps that future Bill will need to get this configured next time.

Name Resolution

We were crossing a domain boundary and weren’t getting good NetBIOS name resolution.  I had to put a hosts file entry on both servers pointing to each other with just the machine name.  I eventually had to add a second entry for the Availability Group Listener.  You need ping -a to resolve the names on both sides using only the machine name, not a FQDN.  And if you’re editing the hosts file, don’t forget to run NOTEPAD as an Administrator.

Fix the Matching CIDs

I’m not sure exactly how this came to happen but both these machines had the same CID values for MSDTC.  (Note: I still don’t really know what a CID is but I know they can’t have the same one – see the second document above.)  The only way to change the CID value is reinstall MSDTC.  If you’re building machines from images you’ll probably have this problem.  You reinstall by running:

msdtc -uninstall

msdtc -install

I suggest a reboot after each step.  Until I rebooted I didn’t see the service installed.

Open the Ports

I had the following ports open in the firewall:

  1. Port 135 both ways (for RPC)
  2. The dynamic ports 49152-65535

MSDTC starts talking on 135 and then jumps to a dynamic port.  I think the firewall people may have done something fancier but that’s what I told them.

I also had to:

  1. Enable the three inbound Windows Firewall rules for Distributed Transaction Coordinator
  2. Enable the one outbound Windows Firewall rule for Distributed Transaction Coordinator

Those rules exist but they were disabled by default in my environment.

Configure MSDTC Permissions

Using DCOMCNFG.EXE I had to enable the following permissions in MSDTC.

  1. Network DTC Access
  2. Allow Inbound
  3. Allow Outbound
  4. No Authentication Required – This one was a little frustrating.  I’d prefer to have them authenticate but I haven’t worked on that enough yet.

There are screenshots of this all over the web that you’ve probably already found.

DTCPing

At this point you should be able to get DTCPing to work.  You run it on both sides and they talk to each other, simulating a transaction.  It gives good error messages in the application.  It also writes a text file with more detailed diagnostic logging.  It’s a very handy utility.  If you get any errors you’ll need to work through those.

DTCTester

This little command-line utility actually runs a distributed transaction.  It creates a temporary table and inserts a row into it.  You’ll need to configure a 32-bit ODBC entry.  Look for C:\Windows\SysWOW64\odbcad32.exe to make that entry.  The 64-bit utility is in C:\Windows\System32 and has the same name.  That’s some confusing naming right there.  You’ll also need a SQL Server login and password.  If you’re going to test against multiple machines, your ODBC source should point to a hosts file entry.  It makes testing much easier.
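
If you prefer to confirm the final SQL-to-SQL path from T-SQL as well, a distributed transaction across the linked server exercises the same MSDTC plumbing.  This is only a sketch – the linked server, database, and table names below are placeholders, not anything from the checklist above:

-- Placeholder names: replace REMOTESRV, TestDB and dbo.DtcSmokeTest with your own
BEGIN DISTRIBUTED TRANSACTION;

    INSERT INTO REMOTESRV.TestDB.dbo.DtcSmokeTest (RunDate)
    VALUES (SYSDATETIME());

COMMIT TRANSACTION;
-- If MSDTC isn't configured correctly, this typically fails with error 7391
-- ("... was unable to begin a distributed transaction").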

06 Jan 21:47

T-SQL Tuesday Rules of Engagement

by Adam Machanic
This blog has moved! You can find this content at the following new location: http://dataeducation.com/t-sql-tuesday-rules-of-engagement/...(read more)
06 Jan 21:47

T-Mobile Delivers Simplicity to the Internet of Things

by A.R. Guess

by Angela Guess According to a recent press release, “Since the launch of Un-carrier™, T-Mobile’s tackled and solved some of the worst pain points for consumers and businesses, and now, the Un-carrier’s bringing the same approach to the “Internet of Things” (IoT). Today, T-Mobile unveiled two new simple, straightforward, low-cost IoT Access packs with wireless […]

The post T-Mobile Delivers Simplicity to the Internet of Things appeared first on DATAVERSITY.

06 Jan 21:47

The Data Warehouse: From the Past to the Present

by Amber Lee Dennis

Bill Inmon, the “Father of Data Warehousing,” defines a Data Warehouse (DW) as, “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.” In his white paper, Modern Data Architecture, Inmon adds that the Data Warehouse represents “conventional wisdom” and is now a standard part of the corporate infrastructure. […]

The post The Data Warehouse: From the Past to the Present appeared first on DATAVERSITY.

06 Jan 21:46

Welcome 2017! Recap of top SQL Server posts of 2016

by SQL Server Team

We start off the new year with a quick recap of our most popular posts on this blog channel from the year just concluded.

Here are the posts that got the most number of views from our readers during 2016, in increasing order of readership:

10. SQL Server 2016 Express Edition in Windows containers

In October, we announced the public availability of SQL Server 2016 Express Edition in Windows Containers, a capability that comes in especially handy when you need to quickly create SQL Server instances for development or testing; or to maximize density in a test or production environment (especially in microservice architectures); or when you need to isolate and control apps in a multi-tenant infrastructure.

9. SQL Server 2016 Service Pack 1 generally available

A couple of months back, we made several new capabilities available as part of the SP1 release of SQL Server 2016. Developers can now build to a common programming surface across all editions – and do tons more, see the graphic below or read the full article for more.

SQL Server 2016 Service Pack 1

8. WideWorldImporters: The new SQL Server sample database

We figured a major release of the magnitude of SQL Server 2016 deserved its own new sample database. Besides, it was time for an upgrade of the erstwhile AdventureWorks database, which had a good run for over a decade. The new Wide World Importers sample illustrates how the rich feature set of SQL Server can be used in a real world setting. While it doesn’t attempt to use every single feature (as that would be unrealistic!) it does showcase a variety of key SQL Server 2016 capabilities and performance enhancements.

7. Preview the newest ODBC SQL Server Driver for Windows and Linux

Early last year, we announced our community technology preview of Microsoft ODBC Driver 13 for SQL Server on Windows and Linux, supporting Ubuntu, RedHat and SUSE distributions, providing robust data access to Microsoft SQL Server and Microsoft Azure SQL Database via ODBC on Windows and Linux platforms.

6. Microsoft announces major commitment to Apache Spark

Back in June last year, we announced an extensive commitment for Spark to power Microsoft’s big data and analytics offerings, including Cortana Intelligence, Power BI, and Microsoft R Server.

Spark

5. JSON in SQL Server 2016: Part 1 of 4

JSON is one of the most widely used data exchange formats, with most modern services returning information as JSON text, and web browsers using information formatted as JSON. JSON is the common format when semi-structured information needs to be stored, such as in files. Because so much data is formatted as JSON, it’s critical for SQL Server to process JSON text retrieved from other systems, and to format information retrieved from SQL Server tables as JSON text. This was the first post in a 4-part series; use the links that follow to read Part 2, Part 3 and Part 4 of the series.

4. SQL Server on Linux: How? Introduction

Many of you in the community expressed your curiosity about how we made SQL Server run on Linux, so – just before the holidays – we kicked off a blog series last month to share technical details about this very topic, starting with an introduction to the journey of offering SQL Server on Linux.

3. Announcing SQL Server on Linux public preview, first preview of next release of SQL Server

The community shared our excitement when we announced the first public preview of SQL Server on Linux in November, allowing developers to create their SQL Server solutions on Linux, Windows, Docker, or macOS (via Docker), and then deploy to Linux, Windows, or Docker, on-premises or in the cloud. There’s been tremendous interest from large enterprises as well, with a majority of Fortune 500 companies signing up for this preview release.

Microsoft Heart Penguin

2. SQL Server 2016 is generally available today

Perhaps our biggest announcement of the year was the general availability of SQL Server 2016, back in June. There was also this post a month prior to the release, recapping key capabilities and benchmarks we achieved. As the world’s fastest, most price-performant database for HTAP (Hybrid Transactional and Analytical Processing) with updateable, in-memory columnstores and advanced analytics via deep integration with R Services, SQL Server offers industry-leading TCO and is a truly watershed product.

SQL Server 2016 leading TCO

1. Microsoft SQL Server Developer Edition is now free

The best things in life are free, as they say – and so it was certainly so for our announcement that SQL Server Developer Edition was being made a free download for all Visual Studio Dev Essentials members. Note that the SQL Server Developer Edition is only for development and testing, and not for production environments or for use with production data. This announcement was an important step in Microsoft making SQL Server ever more broadly accessible.

Stay tuned to this channel for much more exciting news during the year ahead about databases in general, and the SQL Server family in particular.

We wish all of our readers a very happy new year!
SQL Server Blog Team

06 Jan 21:46

Create quick queries quickly with SSAS in Azure

by James Serra

In a perfect world a data warehouse would always return all queries very quickly, no matter what the size or complexity of the data warehouse.  But the reality is that sometimes we won’t get the performance we need from the data warehouse, and the solution is to use a SQL Server Analysis Services (SSAS) cube, which also has other benefits (see Why use a SSAS cube?).  Thanks to Azure, creating a cube can be done in a matter of hours, giving you quick results for just about any query no matter how big the data warehouse gets.

If you want to create a cube quickly you can do so by creating a VM in Azure and installing SQL Server and SSAS (or choosing a VM from the marketplace with SQL Server and SSAS already installed), provided you can accept the following restrictions:

  • No high-availability needed (can be accomplished in the future by creating another VM in an availability set, syncing cubes, and using an Azure load balancer)
  • No disaster recovery needed (can be accomplished in the future by creating another VM in another region and syncing cubes).  Solution for now would be to backup the cube and restore if needed, or just rebuild and reprocess the cube
  • Number of concurrent connections under a hundred.  Handling more connections in the future can be accomplished by scaling up the VM and/or creating multiple VMs with synced cubes behind a load balancer (keeping in mind VMs can be shut down off-hours to save cost).  The number of concurrent connections is highly variable based on the type of querying and caching.  See SQL Server Best Practices Article
  • Using SSAS tabular model, and the size of the cube is less than 180GB.  Cube compression is between 10-100x, and the 180GB number assumes a conservative average of 10x and using 60% of the server’s memory.  A larger cube can be handled via an SSAS multidimensional model, as it does not require the cube to fit in memory like Tabular does.  The multidimensional SSAS cube can use HOLAP storage mode to support more source data, as the cube will not have a copy of the detail records from the source system, greatly reducing cube size and processing time.  Drilling down to the detail records can still be done by the cube, which accomplishes this by querying the source system
  • Available maintenance window at night to process the tabular cube (in the future can add a “cube processing VM” to process cube and then sync the cubes to avoid queries hitting a cube when it is being processed and slowing the queries down)
  • Only need to update the cube once a day (in the future can use VM to process cube and sync as well as use partitioning)
  • Future expansion can be via Azure Analysis Services (in preview) as well as creating multiple cubes by subject areas and using a true scale-out architecture
  • It’s a balance between using extra developer hours to save cost and choosing more hardware up-front

To choose the appropriate size VM, see Hardware Sizing a Tabular Solution (SQL Server Analysis Services), Sizes for virtual machines in Azure, Largest VM in the Cloud, and How Much RAM Do I Need For My SSAS Tabular Server?

More info:

SQL Server 2008 R2 Analysis Services Operations Guide

Why a Semantic Layer Like Azure Analysis Services is Relevant (Part 1)

06 Jan 21:46

Frequent Password Expiration: Time to Revise it?

by Artemakis Artemiou [MVP]
Until recently, enabling the password expiration option in SQL Server was included in many security best practices. However, recent studies have revised this recommendation and argue that it should no longer be included in SQL Server’s security best practices. If this is the case, how can this recommendation be replaced with new, modern best practices that would take into consideration
06 Jan 21:46

How six lines of code + SQL Server can bring Deep Learning to ANY App

by Rimma Nehme

Deep SQL Server

The link to the source code is here.

The What Part

Deep Learning is a hot buzzword of today. The recent results and applications are incredibly promising, spanning areas such as speech recognition, language understanding and computer vision. Indeed, Deep Learning is now changing the very customer experience around many of Microsoft’s products, including HoloLens, Skype, Cortana, Office 365, Bing and more. Deep Learning is also a core part of Microsoft’s development platform offerings with an extensive toolset that includes: the Microsoft Cognitive Toolkit, the Cortana Intelligence Suite, Microsoft Cognitive Services APIs, Azure Machine Learning, the Bot Framework, and the Azure Bot Service. Our Deep Learning based language translation in Skype was recently named one of the 7 greatest software innovations of the year by Popular Science, and this technology has now helped machines achieve human-level parity in conversational speech recognition. To learn more about our Deep Learning journey, I encourage you to read a recent blog From “A PC on every desktop” to “Deep Learning in every software”.

The applications of Deep Learning technology are truly so far reaching that the new mantra, of Deep Learning in Every Software, may well become a reality within this decade. The venerable SQL Server DBMS is no exception. Can SQL Server do Deep Learning? The response to this is an enthusiastic “yes!” With the public preview of the next release of SQL Server, we’ve added significant improvements into R Services inside SQL Server, including a very powerful set of machine learning functions that are used by our own product teams across Microsoft. This brings new machine learning and deep neural network functionality with increased speed, performance and scale to database applications built on SQL Server. We have just recently showcased SQL Server running more than one million R predictions per second, using SQL Server as a Machine Learning Model Management System, and I encourage you all to try out R examples and machine learning templates for SQL Server on GitHub.

In this blog, I wanted to address the finer points of the matter – the what, the why and the how part of Deep Learning in SQL Server. With this new clarity, it will be easier to see a picture of the road forward for data-driven machine intelligence using such a powerful data platform like SQL Server.

The Why Part

Today, every company is a data company, and every app is a data app.

When you put intelligence (AI, ML, AA, etc.) close to where the data lives, every app becomes an intelligent app. SQL Server can help developers and customers everywhere realize the holy grail of deep learning in their applications with just a few lines of code.  It enables data developers to deploy mission-critical operational systems that embed deep learning models. So here are the 10 whys for deep learning in SQL Server.

The 10 Whys of Deep Learning inside SQL Server

  1. By pushing intelligence close to where your data lives (i.e., SQL Server), you get security, compliance, privacy, encryption, master data services, availability groups, advanced BI, in-memory, virtualization, geo-spatial, temporal, graph capabilities and other world-class features.
  2. You can do both near “real-time intelligence” or “batch intelligence” (similar in spirit to OLTP and OLAP, but applied to Deep Learning and intelligence).
  3. Your apps built on top of SQL Server don’t need to change to take advantage of Deep Learning, and a multitude of apps (web, mobile, IoT) can share the same deep learning models without duplicating code.
  4. You can exploit a number of functionalities that come in machine learning libraries (e.g., MicrosoftML) that will drive the productivity of your data scientists, developers, DBAs, and business overall. This might be faster and far more efficient than doing it in-house.
  5. You can develop predictable solutions that can evolve/scale up as you need. With the latest service pack of SQL Server, many features that were only available in the Enterprise Edition are now available in the Standard/Express/Web Edition of SQL Server. That means you can do Deep Learning using a standard SQL Server without high costs.
  6. You can use heterogeneous external data sources (via Polybase) for training and inference of deep models.
  7. You can create versatile data simulations and what-if scenarios inside SQL Server and then train a variety of rich Deep Learning models in those simulated worlds to enable intelligence even with a limited training data.
  8. You can operationalize Deep Learning models in a very easy and fast way using stored procedures and triggers.
  9. You get all the tools, monitoring, debugging and ecosystem around SQL Server applicable to intelligence. SQL Server can literally become your Machine Learning Management System and handle the entire life cycle of DNN models along with data.
  10. You can generate new knowledge and insights on the data you are storing already and anyways without having any impact on your transactional workload (via HTAP pattern).

Let’s be honest, nobody buys a DBMS for the sake of DBMS. People buy it for what it enables you to do. By putting deep learning capabilities inside SQL Server, we can scale artificial intelligence and machine learning both in traditional sense (scale of data, throughput, latency), but we also scale it in terms of productivity (low barrier to adoption and lower learning curve). The value that it brings results in so many shapes and forms – time, better experience, productivity, lower $ cost, higher $ revenue, opportunity, higher business aspirations, thought-leadership in an industry, etc.

Real-life applications of Deep Learning running inside SQL Server span banking, healthcare, finance, manufacturing, retail, e-commerce and IoT. With applications like fraud detection, disease prediction, power consumption prediction, personal analytics, you have the ability to transform existing industries and apps. That also means whatever workloads you are running using SQL Server, be it CRM, ERP, DW, OLTP, BD… you can add Deep Learning to them almost seamlessly.  Furthermore, it’s not just about doing deep learning standalone, but it’s rather about combining it with all kinds of data and analytics that SQL Server is so great at (e.g., processing structured data, JSON data, geo-spatial data, graph data, external data, temporal data). All that is really left to be added to this mix is… your creativity.

SQL Server and Deep Learning

The How Part

Here is a great scenario to show all of this in reality. I am going to use an example of predicting galaxy classes from image data – using the power of Microsoft R and its new MicrosoftML package for machine learning (which has been built by our Algorithms and Data Science team). And I am going to do all this in SQL Server with R Services on a readily available Azure NC VM. I am going to classify the images of galaxies and other celestial objects into 13 different classes based on the taxonomy created by astronomers – mainly elliptical and spirals and then various sub-categories within them. The shape and other visual features of galaxies change as they evolve. Studying the shapes of galaxies and classifying them appropriately helps scientists learn how the universe is evolving. It is very easy for us humans to look at these images and put them in the right buckets based on the visual features. But in order to scale it to the 2 trillion known galaxies I need help from machine learning and techniques like deep neural networks – so that is exactly what I am going to use. It’s not a big leap to imagine that instead of astronomy data, we have healthcare data or financial data or IoT data and we are trying to make predictions on that data.

An app

Imagine a simple web app that loads images from a folder and then classifies them into different categories – spiral or elliptical, and then sub-types within those categories (e.g., is it a regular spiral or does it have a handlebar structure in the center).

Use Deep Neural Nets

The classification can be done incredibly fast on vast amounts of images. Here is an example output:

Classifier
The first two columns are the elliptical types and the others are of different spiral types.

So how does this simple app do this amazingly complex computation?

The code for such an app actually isn’t doing much – it just writes the paths to the new files to classify into a database table (the rest of the app code is plumbing and page layout, etc).

SqlCommand Cmd = new SqlCommand("INSERT INTO [dbo].[GalaxiesToScore] ([path] ,[PredictedLabel]) "

 

CodeDemo

What is happening in the database?

Prediction and operationalization part:

Let’s look at the table where the app writes the image paths. It contains a column with paths to the galaxy images, and a column to store the predicted classes of galaxies. As soon as a new row of data gets entered into this table, a trigger gets executed.

Image Path Table 1

 

The trigger in turn invokes a stored procedure – PredictGalaxiesNN as shown below (with R script portion embedded inside the stored proc):

Image Path Table 2

This is where the magic happens – in these few lines of R code. This R script takes two inputs – the new rows of data (that have not been scored yet) and the model that is stored in a table as varbinary(max). I will talk about how the model got there in a minute. Inside the script, the model gets de-serialized and is used by the familiar scoring function (rxPredict) in this line:

scores <- rxPredict(modelObject = model_un, data = InputDataSet,  extraVarsToWrite="path")

to score the new rows and then write the scored output out. This is a new variant of rxPredict which understands the ML algorithms included in the new Microsoft ML package. This line

library("MicrosoftML")

loads the new package that contains the new ML algorithms. In addition to DNN (the focus of this blog), there are five other powerful ML algorithms in this package – fast linear learner, fast tree, fast forest, one class SVM for anomaly detection, regularized logistic regression (with L1 and L2) and neural nets. So, with just 6-7 lines of R code, you can enable any app to get the intelligence from a DNN based model. All that the apps needs to do is connect to SQL Server. By the way, you can now very easily generate a stored procedure for R Code using the sqlrutils package.
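
Pieced together, the trigger-plus-stored-procedure pattern described above looks roughly like the sketch below. The model table (dbo.GalaxyModels) and its columns are placeholders invented for illustration, and the real PredictGalaxiesNN in the demo does more (including writing the predictions back to the table):

CREATE OR ALTER PROCEDURE dbo.PredictGalaxiesNN
AS
BEGIN
    SET NOCOUNT ON;

    -- Placeholder table holding the serialized MicrosoftML model as varbinary(max)
    DECLARE @model varbinary(max) =
        (SELECT TOP (1) model_object FROM dbo.GalaxyModels ORDER BY created_date DESC);

    -- Hand the unscored rows and the model to R; rxPredict does the scoring
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            library("MicrosoftML")
            model_un <- unserialize(as.raw(model))
            OutputDataSet <- rxPredict(modelObject = model_un, data = InputDataSet,
                                       extraVarsToWrite = "path")',
        @input_data_1 = N'SELECT [path] FROM dbo.GalaxiesToScore WHERE PredictedLabel IS NULL',
        @params = N'@model varbinary(max)',
        @model = @model;
END
GO

-- The trigger just hands newly inserted rows off to the scoring procedure
CREATE OR ALTER TRIGGER dbo.tr_GalaxiesToScore
ON dbo.GalaxiesToScore
AFTER INSERT
AS
    EXEC dbo.PredictGalaxiesNN;
GO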

What about training the model?

Where was the model trained? Well, the model was trained in SQL Server as well. However, it does not have to be trained on SQL Server – it could have been trained on a separate machine with a standalone R Server running on it, on-prem or on the cloud. Today we have these new ML algorithms on Windows version of R Server, and the support for other platforms is coming soon. I just chose to do the training in the SQL server box here, but I could have done it outside as well. Let’s look at the stored proc with the training code.

Training code:

The model training is done in these lines of code.

Training Code

This uses the new function rxNeuralNet from the MicrosoftML package for training a DNN. The code looks similar to other R and rx functions – there is a formula, an input dataset, and some other parameters. One of the parameters here is this line “netDefinition = netDefinition”. This is where the neural network is being defined.

Network definition:

Here is the DNN definition in this portion of the code:

DNN Definition

Here, a deep neural net is defined using Net# specification language that was created for this purpose. It has 1 input, 1 output and 8 hidden layers. It starts with an input layer of 50×50 pixels and 3 colors (RGB) image data. First hidden layer is a convolution layer where we specify the kernel (small sub-part of the image) size and how many times we want the kernel to map to other kernels (convolute). There are some other layers for more convolutions, and for normalization and pooling that help stabilize the network. And finally, the output layer that maps it to one of the 13 classes. In about 50 lines of Net# specification, I have defined a complex neural network. Net# is documented on MSDN.

Training data size/GPU:

Here is the R code to do the training.

R code

Some other lines to note here are – ‘training_rows = 238000’. This model was trained on 238K images that we got from the Sloan Digital Sky Survey dataset. We then created two variants of each image with 45-degree and 90-degree rotations. In all there were about 700K images to train on. That’s a lot of image data to train on – so, how long did it take to train it? Well, we were able to train this model in under 4 hours. This is a decent sized machine – 6 cores and 56GB of RAM, but then it also has a powerful Nvidia Tesla K80 GPU. In fact, it is an Azure VM – the new NC series GPU VM, readily available to anyone with an Azure subscription. We were able to leverage the GPU computation by specifying one simple parameter: acceleration = “gpu”. Without GPU, the training takes roughly 10X more time.

The What Are You Waiting For Part

So with just a few lines of R code using algorithms from the MicrosoftML package, I was able to train a DNN on tons of image data and operationalize the trained model in SQL using R services such that any app connected to SQL can get this type of Intelligence easily. That’s the power of Microsoft R and the Microsoft ML package in it combined with SQL Server. This is just the beginning, and we are working on adding more algorithms on our quest to democratize the power of AI and machine learning.  You can download the MicrosoftML: Algorithm Cheat Sheet here to help you choose the right machine learning algorithm for a predictive analytics model.

MicrosoftML Algorithm Cheat Sheet

Don’t wait, go ahead and give it a try.

@rimmanehme

06 Jan 21:45

Apache Zeppelin 0.6.2 for SQL Server Docker Image updated

by Davide Mauri

In order to keep created notebooks and interpreter configurations when upgrading the docker image to a newer version, I changed the dockerfile to use docker volumes, as described here (if you’re interested in learning more):

Manage data in containers

To run the container, the command is now:

docker run --name zeppelin -p 8080:8080 -v /zeppelin-sqlserver/notebook -v /zeppelin-sqlserver/conf -d yorek/zeppelin-sqlserver:v0.6.2

The -v options do the trick and will be very useful the next time a new image is released: you’ll be able to keep all your notebooks without having to export them first, and interpreter configurations will be preserved as well.

The solution used until now (sharing a volume with the host) works nicely, but unfortunately it only works for notebooks. If you have a lot of different interpreters configured (like me), re-configuring them every time the image is updated is really time consuming and boring.

To be sure that your container is using volumes, you can check it using the inspect command:

docker inspect zeppelin

The output is a JSON document with detailed information about the container. Look for the Mounts node:

If you are using a previous version of the image, my recommendation is to download this updated one so that you’ll be ready for future updates.

If you’re not using Apache Zeppelin yet, you should really start. No matter if you are a Data Scientists or a DBA, Zeppelin is *really* useful to create nice visualization and dashboard just using T-SQL:


06 Jan 21:45

SQL Server v.Next : STRING_AGG Performance, Part 2

by Aaron Bertrand

Last week, I made a couple of quick performance comparisons, pitting the new STRING_AGG() function against the traditional FOR XML PATH approach I've used for ages. I tested both undefined/arbitrary order as well as explicit order, and STRING_AGG() came out on top in both cases:

For those tests, I left out several things (not all intentionally):

  1. Mikael Eriksson and Grzegorz Łyp both pointed out that I was not using the absolute most efficient FOR XML PATH construct (and to be clear, I never have).
  2. I did not perform any tests on Linux; only on Windows. I don't expect those to be vastly different, but since Grzegorz saw very different durations, this is worth further investigation.
  3. I also only tested when output would be a finite, non-LOB string – which I believe is the most common use case (I don't think people will commonly be concatenating every row in a table into a single comma-separated string, but this is why I asked in my previous post for your use case(s)).
  4. For the ordering tests, I did not create an index that might be helpful (or try anything where all the data came from a single table).

In this post, I'm going to deal with a couple of these items, but not all of them.

FOR XML PATH

I had been using the following:

... FOR XML PATH, TYPE).value(N'.[1]', ...

After this comment from Mikael, I have updated my code to use this slightly different construct instead:

... FOR XML PATH(''), TYPE).value(N'text()[1]', ...

Linux vs. Windows

Initially, I had only bothered to run tests on Windows:

Microsoft SQL Server vNext (CTP1.1) - 14.0.100.187 (X64) 
	Dec 10 2016 02:51:11 
	Copyright (C) 2016 Microsoft Corporation. All rights reserved.
	Developer Edition (64-bit) on Windows Server 2016 Datacenter 6.3  (Build 14393: ) (Hypervisor)

But Grzegorz made a fair point that he (and presumably many others) only had access to the Linux flavor of CTP 1.1. So I added Linux to my test matrix:

Microsoft SQL Server vNext (CTP1.1) - 14.0.100.187 (X64) 
	Dec 10 2016 02:51:11 
	Copyright (C) 2016 Microsoft Corporation. All rights reserved.
	on Linux (Ubuntu 16.04.1 LTS)

Some interesting but completely tangential observations:

  • @@VERSION doesn't show edition in this build, but SERVERPROPERTY('Edition') returns the expected Developer Edition (64-bit).
  • Based on the build times encoded into the binaries, the Windows and Linux versions seem to now be compiled at the same time and from the same source. Or this was one crazy coincidence.

Unordered tests

I started by testing the arbitrarily ordered output (where there is no explicitly defined ordering for the concatenated values). Following Grzegorz, I used WideWorldImporters (Standard), but performed a join between Sales.Orders and Sales.OrderLines. The fictional requirement here is to output a list of all orders, and along with each order, a comma-separated list of each StockItemID.

Since StockItemID is an integer, we can use a defined varchar, which means the string can be 8000 characters before we have to worry about needing MAX. Since an int can be a max length of 11 (really 10, if unsigned), plus a comma, this means an order would have to support about 8,000/12 (666) stock items in the worst case scenario (e.g. all StockItemID values have 11 digits). In our case, the longest ID is 3 digits, so until data gets added, we would actually need 8,000/4 (2,000) unique stock items in any single order to justify MAX. In our case, there are only 227 stock items in total, so MAX isn't necessary, but you should keep an eye on that. If such a large string is possible in your scenario, you'll need to use varchar(max) instead of the default (STRING_AGG() returns nvarchar(max), but truncates to 8,000 bytes unless the input is a MAX type).
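
If you do need to go past 8,000 bytes, the fix is simply to feed STRING_AGG() a MAX-typed input – a generic illustration, not one of the queries tested below:

SELECT o.OrderID, 
  StockItemIDs = STRING_AGG(CONVERT(varchar(max), ol.StockItemID), ',')
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;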

The initial queries (to show sample output, and to observe durations for single executions):

SET STATISTICS TIME ON;
GO
 
SELECT o.OrderID, StockItemIDs = STRING_AGG(ol.StockItemID, ',')
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO
 
SELECT o.OrderID, 
  StockItemIDs = STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,'')
  FROM Sales.Orders AS o
  GROUP BY o.OrderID;
GO
 
SET STATISTICS TIME OFF;
 
/*
   Sample output:
 
       OrderID    StockItemIDs
       =======    ============
       1          67
       2          50,10
       3          114
       4          206,130,50
       5          128,121,155
 
   Important SET STATISTICS TIME metrics (SQL Server Execution Times):
 
      Windows:
        STRING_AGG:    CPU time =  217 ms,  elapsed time =  405 ms.
        FOR XML PATH:  CPU time = 1954 ms,  elapsed time = 2097 ms.
 
      Linux:
        STRING_AGG:    CPU time =  627 ms,  elapsed time =  472 ms.
        FOR XML PATH:  CPU time = 2188 ms,  elapsed time = 2223 ms.
*/

I ignored the parse and compile time data completely, as they were always exactly zero or close enough to be irrelevant. There were minor variances in the execution times for each run, but not much – the comments above reflect the typical delta in runtime (STRING_AGG seemed to take a little advantage of parallelism there, but only on Linux, while FOR XML PATH did not on either platform). Both machines had a single socket, quad-core CPU allocated, 8 GB of memory, out-of-the-box configuration, and no other activity.

Then I wanted to test at scale (simply a single session executing the same query 500 times). I didn't want to return all of the output, as in the above query, 500 times, since that would have overwhelmed SSMS – and hopefully doesn't represent real-world query scenarios anyway. So I assigned the output to variables and just measured the overall time for each batch:

SELECT sysdatetime();
GO
 
DECLARE @i int, @x varchar(8000);
SELECT @i = o.OrderID, @x = STRING_AGG(ol.StockItemID, ',')
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO 500
 
SELECT sysdatetime();
GO
 
DECLARE @i int, @x varchar(8000);
SELECT @i = o.OrderID, 
    @x = STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,'')
  FROM Sales.Orders AS o
  GROUP BY o.OrderID;
GO 500
 
SELECT sysdatetime();

I ran those tests three times, and the difference was profound – nearly an order of magnitude. Here is the average duration across the three tests:

Average duration, in milliseconds, for 500 executions of variable assignment

I tested a variety of other things this way as well, mostly to make sure I was covering the types of tests Grzegorz was running (without the LOB part).

  1. Selecting just the length of the output
  2. Getting the max length of the output (of an arbitrary row)
  3. Selecting all of the output into a new table

Selecting just the length of the output

This code merely runs through each order, concatenates all of the StockItemID values, and then returns just the length.

SET STATISTICS TIME ON;
GO
 
SELECT LEN(STRING_AGG(ol.StockItemID, ','))
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO
 
SELECT LEN(STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,''))
  FROM Sales.Orders AS o
  GROUP BY o.OrderID;
GO
 
SET STATISTICS TIME OFF;
 
/*
  Windows:
    STRING_AGG:   CPU time =  142 ms,  elapsed time =  351 ms.
    FOR XML PATH: CPU time = 1984 ms,  elapsed time = 2120 ms.
 
  Linux:
    STRING_AGG:   CPU time =  310 ms,  elapsed time =  191 ms.
    FOR XML PATH: CPU time = 2149 ms,  elapsed time = 2167 ms.    
*/

For the batched version, again, I used variable assignment, rather than try to return many resultsets to SSMS. The variable assignment would end up on an arbitrary row, but this still requires full scans, because the arbitrary row isn't selected first.

SELECT sysdatetime();
GO
 
DECLARE @i int;
SELECT @i = LEN(STRING_AGG(ol.StockItemID, ','))
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO 500
 
SELECT sysdatetime();
GO
 
DECLARE @i int;
SELECT @i = LEN(STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,''))
  FROM Sales.Orders AS o
  GROUP BY o.OrderID;
GO 500
 
SELECT sysdatetime();

Performance metrics of 500 executions:

500 executions of assigning LEN() to a variable

Again, we see FOR XML PATH is far slower, on both Windows and Linux.

Selecting the maximum length of the output

A slight variation on the previous test, this one just retrieves the maximum length of the concatenated output:

SET STATISTICS TIME ON;
GO
 
SELECT MAX(s) FROM (SELECT s = LEN(STRING_AGG(ol.StockItemID, ','))
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID) AS x;
GO
 
SELECT MAX(s) FROM (SELECT s = LEN(STUFF(
    (SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),
	1,1,''))
  FROM Sales.Orders AS o
  GROUP BY o.OrderID) AS x;
GO
 
SET STATISTICS TIME OFF;
 
/*
  Windows:
    STRING_AGG:   CPU time =  188 ms,  elapsed time =  48 ms.
    FOR XML PATH: CPU time = 1891 ms,  elapsed time = 907 ms.
 
  Linux:
    STRING_AGG:   CPU time =  270 ms,  elapsed time =   83 ms.
    FOR XML PATH: CPU time = 2725 ms,  elapsed time = 1205 ms.
*/

And at scale, we just assign that output to a variable again:

SELECT sysdatetime();
GO
 
DECLARE @i int;
SELECT @i = MAX(s) FROM (SELECT s = LEN(STRING_AGG(ol.StockItemID, ','))
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID) AS x;
GO 500
 
SELECT sysdatetime();
GO
 
DECLARE @i int;
SELECT @i = MAX(s) FROM (SELECT s = LEN(STUFF
  (
    (SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),
	1,1,''))
  FROM Sales.Orders AS o
  GROUP BY o.OrderID) AS x;
GO 500
 
SELECT sysdatetime();

Performance results, for 500 executions, averaged across three runs:

500 executions of assigning MAX(LEN()) to a variable

You might start to notice a pattern across these tests – FOR XML PATH is always a dog, even with the performance improvements suggested in my previous post.

SELECT INTO

I wanted to see if the method of concatenation had any impact on writing the data back to disk, as is the case in some other scenarios:

SET NOCOUNT ON;
GO
SET STATISTICS TIME ON;
GO
 
DROP TABLE IF EXISTS dbo.HoldingTank_AGG;
 
SELECT o.OrderID, x = STRING_AGG(ol.StockItemID, ',')
  INTO dbo.HoldingTank_AGG
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO
 
DROP TABLE IF EXISTS dbo.HoldingTank_XML;
 
SELECT o.OrderID, x = STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,'')
  INTO dbo.HoldingTank_XML
  FROM Sales.Orders AS o
  GROUP BY o.OrderID;
GO
 
SET STATISTICS TIME OFF;
 
/*
  Windows:
    STRING_AGG:   CPU time =  218 ms,  elapsed time =   90 ms.
    FOR XML PATH: CPU time = 4202 ms,  elapsed time = 1520 ms.
 
  Linux:
    STRING_AGG:   CPU time =  277 ms,  elapsed time =  108 ms.
    FOR XML PATH: CPU time = 4308 ms,  elapsed time = 1583 ms.
*/

In this case we see that perhaps SELECT INTO was able to take advantage of a bit of parallelism, but still we see FOR XML PATH struggle, with runtimes an order of magnitude longer than STRING_AGG.

The batched version just swapped out the SET STATISTICS commands for SELECT sysdatetime(); and added the same GO 500 after the two main batches as with the previous tests. Here is how that panned out (again, tell me if you've heard this one before):

500 executions of SELECT INTO
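
For reference, the batched rig for the STRING_AGG() half looked roughly like this (the FOR XML PATH half follows the same pattern):

SELECT sysdatetime();
GO
 
DROP TABLE IF EXISTS dbo.HoldingTank_AGG;
 
SELECT o.OrderID, x = STRING_AGG(ol.StockItemID, ',')
  INTO dbo.HoldingTank_AGG
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
GO 500
 
SELECT sysdatetime();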

Ordered Tests

I ran the same tests using the ordered syntax, e.g.:

... STRING_AGG(ol.StockItemID, ',') 
    WITHIN GROUP (ORDER BY ol.StockItemID) ...
 
... WHERE ol.OrderID = o.OrderID
    ORDER BY ol.StockItemID
    FOR XML PATH('') ...

This had very little impact on anything – the same set of four test rigs showed nearly identical metrics and patterns across the board.

I will be curious to see if this is different when the concatenated output is a LOB, or where the concatenation needs to order strings (with or without a supporting index).
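
For completeness, the fully assembled ordered pair (combining the fragments above with the unordered queries from earlier):

SELECT o.OrderID, 
  StockItemIDs = STRING_AGG(ol.StockItemID, ',') 
                 WITHIN GROUP (ORDER BY ol.StockItemID)
  FROM Sales.Orders AS o
  INNER JOIN Sales.OrderLines AS ol
  ON o.OrderID = ol.OrderID
  GROUP BY o.OrderID;
 
SELECT o.OrderID, 
  StockItemIDs = STUFF((SELECT ',' + CONVERT(varchar(11),ol.StockItemID)
       FROM Sales.OrderLines AS ol
       WHERE ol.OrderID = o.OrderID
       ORDER BY ol.StockItemID
       FOR XML PATH(''), TYPE).value(N'text()[1]',N'varchar(8000)'),1,1,'')
  FROM Sales.Orders AS o
  GROUP BY o.OrderID;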

Conclusion

For non-LOB strings, it is clear to me that STRING_AGG has a definitive performance advantage over FOR XML PATH, on both Windows and Linux. Note that, to avoid the requirement of varchar(max) or nvarchar(max), I didn't use anything similar to the tests Grzegorz ran, which would have meant simply concatenating all of the values from a column, across an entire table, into a single string. In my next post, I'll take a look at the use case where the output of the concatenated string could feasibly be greater than 8,000 bytes, and so LOB types and conversions would have to be used.

The post SQL Server v.Next : STRING_AGG Performance, Part 2 appeared first on SQLPerformance.com.

06 Jan 21:43

Microsoft Plans Big Reorganization Of Partner and Services Groups Starting Feb 1

by msmash
Microsoft is planning to consolidate many of its partner and services teams in a reorganization, as well as add a new digital team focused on the cloud. From a report: Microsoft confirmed that the shift will take effect Feb. 1, and the goal is to unify teams across its Worldwide Commercial Business group. Microsoft has been working to break down barriers between teams to better serve partners and customers for years, going back to CEO Steve Ballmer's 2013 One Microsoft plan. Microsoft said in a statement that no layoffs will occur as a direct result of the reorganization, which is being announced internally today. Microsoft added that "like all companies, Microsoft reviews its resources and investments on an ongoing basis." As part of the move, Microsoft says it will bring together its enterprise and partner group and public sector, small and mid-market solutions, and partners teams. A new group called Microsoft Digital will push Microsoft's current customers and partners to use the company's cloud programs.


Read more of this story at Slashdot.

23 Dec 18:07

SQL Server 2016 Enterprise Edition Performance Advantages

by Glenn Berry

On November 16, 2016, Microsoft announced some very significant changes for SQL Server 2016 Standard Edition, which were implemented in SQL Server 2016 Service Pack 1 (Build 13.0.4001.0). Many very useful programmability-related features that were previously only available in Enterprise Edition will now be available in Standard Edition (and also in Web Edition and even Express Edition).

Once you have a database application using SQL Server 2016 Standard Edition Service Pack 1 (or even a lower edition), you can just perform an edition upgrade to Enterprise Edition to get even more scalability and performance, taking advantage of the higher license limits for sockets, cores, and memory in Enterprise Edition, as detailed here.

You will also get the many other intrinsic performance benefits that are present in Enterprise Edition, along with multiple manageability improvements that make your life so much easier as a DBA.

Columnstore Indexes

If you are using Columnstore indexes, you get the following performance benefits automatically, when you use Enterprise Edition:

  • Aggregate Pushdown: This performance feature often gives a 2X-4X query performance gain by pushing qualifying aggregates to the SCAN node, which reduces the number of rows coming out of that iterator.
  • Index Build/Rebuild: Enterprise Edition can build/rebuild columnstore indexes with multiple processor cores, while Standard Edition only uses one processor core. This has a pretty significant effect on elapsed times for these operations, depending on your hardware.
  • Local Aggregates: Enterprise Edition can use local aggregations to filter the number of rows passing out of a SCAN node, reducing the amount of work that needs to be done by subsequent query nodes. You can confirm this by looking for the “ActualLocallyAggregatedRows” attribute in the XML of the execution plan for the query.
  • Single Instruction Multiple Data (SIMD) Optimizations: This feature uses a set of hardware instructions which are capable of processing an array of data in a single instruction, dramatically speeding up aggregate operations. These hardware instructions are present on all modern processors (that have AVX support), but they are only used by Enterprise Edition.
  • String Predicate Pushdown: This performance feature can improve performance of queries using predicate(s) on string columns by pushing these predicates to the SCAN node. This can greatly reduce the amount of work that needs to be done by subsequent nodes.
  • Degree of Parallelism: Batch mode queries are limited to MAXDOP = 2 on Standard Edition. Enterprise Edition can use all of the cores that are present for the instance. This can be very significant on larger queries on typical, modern server hardware.
  • Memory limits: The Columnstore object pool is limited to 32GB per instance on Standard Edition. Enterprise Edition does not have any memory limitation for the Columnstore object pool.

In order to test these performance assertions, I ran some fairly simple tests on the Microsoft ContosoRetailDW database on my Intel Core i7-6700K workstation. I have two named instances of SQL Server 2016 SP1 installed, one using Standard Edition, and the other using Developer Edition (which is equivalent to Enterprise Edition).

All instance-level and database-level configurations and properties are identical between the two instances, and the user and tempdb database file locations are in separate directories on the same, separate flash storage device for each instance. The database compatibility level was changed to 130 in both cases, and the underlying Windows configuration and hardware is the same for both instances. The only difference here is the Edition of each instance.

The first test is a simple query (adapted from Niko Neugebauer) that allows SQL Server 2016 to use aggregate pushdown on the FactOnlineSales table. The results are shown in Table 1.
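
The actual query isn't shown here, but something of this general shape against a clustered columnstore index qualifies for aggregate pushdown (a sketch using ContosoRetailDW column names, not necessarily the exact query that was timed):

SELECT StoreKey,
       TotalSales = SUM(SalesAmount),
       OrderLines = COUNT_BIG(*)
  FROM dbo.FactOnlineSales
  GROUP BY StoreKey;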

Edition             Elapsed Time (ms)
-----------------   -----------------
Standard Edition    30
Developer Edition   1
Time Difference     29
% Improvement       96.7%

Table 1: Aggregate Pushdown Comparison

The next test is timing how long it takes to build a clustered columnstore index on the 12.6 million row FactOnlineSales table. The results are shown in Table 2.

Edition             Elapsed Time (ms)
-----------------   -----------------
Standard Edition    42,197
Developer Edition   14,384
Time Difference     27,813
% Improvement       65.9%

Table 2: Building Clustered Columnstore Index Comparison

The next test is timing how long it takes to rebuild a clustered columnstore index on the same FactOnlineSales table. The results are shown in Table 3.

Edition             Elapsed Time (ms)
-----------------   -----------------
Standard Edition    33,105
Developer Edition   11,460
Time Difference     21,645
% Improvement       65.4%

Table 3: Rebuilding Clustered Columnstore Index Comparison
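
The index build and rebuild timed in Tables 2 and 3 come down to statements like these (a sketch – on the stock ContosoRetailDW table, the existing rowstore clustered index and foreign keys have to be removed first):

-- Build (Table 2)
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactOnlineSales
  ON dbo.FactOnlineSales;
 
-- Rebuild (Table 3)
ALTER INDEX CCI_FactOnlineSales
  ON dbo.FactOnlineSales
  REBUILD;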

The next test is another simple query that allows SQL Server 2016 to use local aggregation on the FactOnlineSales table. The results are shown in Table 4.

Edition             Elapsed Time (ms)
-----------------   -----------------
Standard Edition    122
Developer Edition   83
Time Difference     39
% Improvement       32.0%

Table 4: Local Aggregation Comparison

The next test is another simple query that allows SQL Server 2016 to use string predicate pushdown on the FactOnlineSales and DimPromotion tables. The results are shown in Table 5.

Edition             Elapsed Time (ms)
-----------------   -----------------
Standard Edition    2,683
Developer Edition   1,221
Time Difference     1,466
% Improvement       54.6%

Table 5: String Predicate Pushdown Comparison

These are just some simple examples of the built-in performance advantages for Columnstore indexes in SQL Server 2016 Enterprise Edition compared to SQL Server 2016 Standard Edition on the same hardware. If you want to really dive into Columnstore indexes (which can be a very effective feature for some workloads), you should bookmark and read Niko Neugebauer’s long series of posts at columnstore.net.

DBCC CHECKDB Performance

Another manageability performance improvement that is present on SQL Server 2016 Enterprise Edition is DBCC CHECKDB performance. On Standard Edition, DBCC CHECKDB only uses one processor core, while it can use all available cores on Enterprise Edition. This behavior is unchanged from previous versions of SQL Server. SQL Server 2016 does allow you to restrict the number of cores that DBCC CHECKDB can use with a new WITH (MAXDOP = x) option.
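
For example (the database name is a placeholder, and the MAXDOP value is arbitrary):

DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, MAXDOP = 4;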

Running DBCC CHECKDB with the WITH PHYSICAL_ONLY option on a somewhat larger database (about 38GB) that I have, yielded the results shown in Table 6.

Edition             Elapsed Time (ms)
-----------------   -----------------
Standard Edition    58,492
Developer Edition   24,897
Time Difference     33,595
% Improvement       57.4%

Table 6: DBCC CHECKDB WITH PHYSICAL_ONLY Comparison

Running a standard DBCC CHECKDB on the same database yielded the results shown in Table 7.

Edition             Elapsed Time (ms)
-----------------   -----------------
Standard Edition    435,039
Developer Edition   119,767
Time Difference     315,272
% Improvement       72.5%

Table 7: DBCC CHECKDB Comparison

A very important factor in DBCC CHECKDB performance is the sequential read performance from all of the LUNs where your database data file(s) are located. You can easily check this by running a BACKUP DATABASE command to a NUL device (making sure to use the COPY_ONLY and NO_COMPRESSION options). This will show you your effective sequential read performance, as shown in this example from my workstation:
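
The command itself looks like this (the database name is a placeholder):

BACKUP DATABASE YourDatabase
  TO DISK = N'NUL'
  WITH COPY_ONLY, NO_COMPRESSION;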

BACKUP DATABASE successfully processed 5048514 pages in 16.115 seconds (2447.502 MB/sec).

Keep in mind that all of this testing was done on a single, quad-core desktop processor. A multi-socket server with many more total processor cores will show even more of a performance improvement in many of these tests.

The point of all of this is to show a few tangible examples of the performance improvements you can see, simply by upgrading from SQL Server 2016 Standard Edition SP1 to SQL Server 2016 Enterprise Edition SP1, on the same hardware, making no database or application changes. This list is by no means exhaustive, as there are many other benefits as well.

The post SQL Server 2016 Enterprise Edition Performance Advantages appeared first on SQLPerformance.com.

23 Dec 18:07

2016 DATAVERSITY Top 20

by Shannon Kempe

It’s that time of year again to announce the Top 20 publications by DATAVERSITY in 2016. We are excited to wrap up another great year of education. We published over 100 articles, over 150 blogs, 89 webinars, two online conferences, and five face-to-face conferences. We also launched our new DATAVERSITY Training Center this year where […]

The post 2016 DATAVERSITY Top 20 appeared first on DATAVERSITY.