Shared posts

10 Feb 22:29

Dacpac braindump - What is a dacpac?

by jamiet

In this week’s earlier blog post First release of my own personal T-SQL code library on Github I talked of how one could use a dacpac to distribute a bunch of code to different servers. Upon reading the blog post Jonathan Allen (of SQL Saturday Exeter fame), with whom I’ve been discussing dacpacs with on-and-off recently, sent me this email:

Hi Jamie,

The DacPac thing I emailed about in December hasnt taken off yet but I have just downloaded your code library to take a look and I like the way the dacpac works. Should I be able to open that in VS or is the dacpac compiled/built in VS? The video you linked to didnt cover dapac at all so I am in the dark on how to create one/them.

If I can build a database and create a dacpac simply then this could be really useful.

Jonathan’s email made me realise that there is perhaps a lot of confusion about what dacpacs are, what they can be used for and how they can be used so I figured a braindump of what I know about them might be useful, that’s what you’re getting in this blog post.

 

What is a dacpac?

A dacpac is a file with a .dacpac extension.

image

In that single file are a collection of definitions of objects that one could find in a SQL Server database such as tables, stored procedures, views plus some instance level objects such as logins too (the complete list of supported objects for SQL Server 2012 can be found at DAC Support For SQL Server Objects and Versions). The fact that a dacpac is a file means you can do anything you could do with any other file, store it, email it, share it on a file server etc… and this means that they are a great way of distributing the definition of many objects (perhaps even an entire database) in a single file. Or, as Microsoft puts it, a self-contained unit of SQL Server database deployment that enables data-tier developers and database administrators to package SQL Server objects into a portable artifact called a DAC package, also known as a DACPAC. That in itself is, I think, very powerful.

Ostensibly a dacpac is a binary file so you can’t just open it up in your favourite text editor and look at the contents of it. However, what many people do not know is that the format of a dacpac is simply the common ZIP compression format and hence we can add .zip to the end of a dacpac filename:

image

and open it up like you would any other .zip file to have a look inside. If you do so you will see this:

image

The contents of that zip file conform to something called the Open Packaging Convention (OPC). OPC is a standard defined by Microsoft for, well, for zipping up files basically. You have likely used files conforming to OPC before without knowing it, docx, .xlsx, .pptx are the most common ones that you might recognise if you use Microsoft Office and there are some more obscure ones such as .ispac (SSIS2012 developers should recognise that). (For a more complete list of OPC-compliant file types see the wikipedia page).

Notice in the screenshot above showing the innards of TSQLCodeLibrary.dacpac the biggest file is model.xml. This is the file that contains the definition of all our SQL Server objects. I won’t screenshot that here but I encourage you to get hold of a .dacpac file (here’s one) and have a poke around to see what’s in that model.xml file.

What are dacpacs for?

Dacpacs are used for deploying SQL Server objects to an instance of SQL Server. That’s it. If your job does not ever involve doing that then you probably don’t need to read any further.

Dacpac pre-requisites

A .docx file (i.e. A Microsoft Word document) isn’t much use to someone if they don’t have the software (i.e. Microsoft Word) to make use of it and so the analogy holds for dacpacs; in order to use them you need to have some software installed and that software is called the Data-tier Application Framework (or DAC Framework for short, or DacFx for even shorter).

Incidentally, you may be wondering what DAC stands for at this point. I think its “Data-Tier Application” in which case you may be thinking that the acronym DAC is a stupid one especially as DAC also stands for something else in SQL Server, I would agree!

DacFx is available to download for free however you’ll probably never need to do that as installation of DacFX occurs whenever you install SQL Server, SQL Server client tools or SQL Server Data Tools (SSDT). If DacFX is installed you should be able to see it in Programs and Features:

image

How does one deploy a dacpac?

In dacpac nomenclature the correct term for deploying a dacpac is publishing however the two generally get used interchangeably. There are two methods of publishing a dacpac which I’ll cover below.

Publish via SSMS

In SSMS’s Object Explorer right-click on the databases node and select “Deploy Data-tier Application…” (told you they used those terms interchangeably):

image

This launches a wizard that prompts you to choose a dacpac, fill in some particulars (e.g. database name) and then deploy it for you by calling out to DacFx. Unfortunately this wizard is not very good because it doesn’t (currently) support all features of dacpacs, namely if your dacpacs contain any sqlcmd variables (I won’t cover those here but they are commonly used within dacpacs) a value needs to be supplied for them; the wizard doesn’t prompt you for a value and hence the deployment fails.

This. Is. Stupid. Microsoft should be suitably lambasted for not providing this basic functionality. Anyway, due to this limitation you’re most likely to be using the other method which is…

Publish via command-line

One component distributed in DacFx is a command-line tool called sqlpackage.exe which will quickly become your best friend if you use dacpacs a lot. sqlpackage.exe can do a lot of things and those “things” are referred to as actions, one of those actions is publishing a dacpac. Here’s the syntax for publishing a dacpac using sqlpackage.exe:

"%ProgramFiles(x86)%\Microsoft SQL Server\110 \DAC\bin\SqlPackage.exe"
      /action:Publish
      /SourceFile:<path to your dacpac>
      /TargetServerName:<SQL instance you are deploying to>
      /TargetDatabaseName:<Name of either (a)the database to create or (b) the existing database to deploy into>

Publishing is idempotent

Notice from my comment above for TargetDatabaseName that you can deploy to an existing database. You might ask why you might want to publish into an existing database, after all, the objects you are publishing might already exist. This segues nicely into what I see as the biggest benefit of dacpacs and DacFx, the software interrogates the target database to determine whether or not the objects already exist or not and if they do not it will create them. If they do already exist it will determine whether the definition has changed or not and if it has, it will make those changes. DacFx will always protect your data so if it determines that an operation would cause data destruction (e.g. removing a column from a table) then it will (optionally) throw an error and fail. You never again need to write an ALTER statement or first check that an object exists in order to change an object definition, DacFx will do it for you. To put it another way, publishing using dacpacs and DacFx is idempotent.

How does one create a dacpac?

Of course in order to publish a dacpac you’re first going to have to create one and one of Jonathan’s questions above pertained to exactly this. There are two distinct ways to do create a dacpac.

Use an SSDT Project

SQL Server Data Tools (SSDT) projects are basically a project type within Visual Studio that provide a way of building DDL for SQL Server databases. I’m not going to cover SSDT projects in any detail here except to say that when such a project is built the output is a dacpac. Note that SSDT can also publish the dacpac for you however I didn’t mention that above as the publish operation is essentially another wrapper around the same DacFx functionality used by sqlpackage.exe.

Create from an existing database

One can right-click on a database in SSMS and click on “Extract Data-tier Application…” to create a dacpac containing the definition of all objects in that database:

image

Wrap-up

Should you be using dacpacs? I can’t answer that question for you but hopefully what I’ve done is given you enough information so that you can answer it for yourself. Some people might like the way dacpacs encapsulate many objects into a single file and their idempotent deployment, others may prefer good old simple, handcrafted T-SQL scripts which don’t have any pre-requisites other than SQL Server itself. The choice is yours.

Further reading

UPDATE

David Atkinson from Redgate has been in touch to tell me about another dacpac feature that I didn’t know about. It is possible to right-click on a dacpac in Windows Explorer and choose to unpack it:

image

That essentially unzips it but what you also get is a file called Model.sql that will create all of the objects in the dacpac:

image

Very useful indeed! David tells me that Redgate use this functionality to enable comparison of a dacpac using their SQL Compare tool as you can read about at Using a DACPAC as a data source.

10 Feb 21:55

Merge Operation in Memory-Optimized Tables

by SQL Server Team

In the Storage Allocation and Management for Memory-Optimized Tables, we had briefly touched upon the Merge operation. This blog describes Merge operation in detail. We also recommend you to refer to implementing durability for memory-optimized tables for a good understanding of data/delta files and CFPs

 The metadata of all Checkpoint File Pairs (i.e. CFP) that exist in storage is stored in an internal array structure referred to as Storage Array. It is a finitely sized (4096 entries in CTP2 and 8192 entries in RTM) array of CFPs to support a cumulative size of 256GB for durable memory-optimized tables in the database. The entries in the storage array ordered by transaction range. The CFPs in the storage array (along with the tail of the log) represent all the on-disk state required to recover a database with memory-optimized tables.

Running an OLTP workload overtime can lead to storage array to fill up leading to many inefficiencies as described here

  1. Deleted rows – The deleted rows are not actually removed from the data file but a reference to the deleted row is added to the corresponding delta file. This leads to wastage of storage which in turn impacts the recovery time negatively.
  2. Storage Array Manipulation Overhead - Internal processes search the storage array for operations such as finding the delta file to delete a row. The cost of these operations increases with the number of entries

To alleviate these inefficiencies, the older closed CFPs are merged based on a merge policy (described below) so the storage array is compacted to represent the same set of data, with reduced number of CFPs and the storage.

Merge Operation

The Merge operation takes one or more closed CFPs, called MERGE SOURCE(s), based on an internally defined merge policy, described later, and produces one resultant CFP, called MERGE TARGET. The entries in each delta file of source CFP(s) are used to filter rows from the corresponding data file to remove the data rows that are not needed anymore. The remaining rows in the source CFPs are consolidated into one target CFP.  After the merge is complete, the resultant CFP (i.e. the merge-target) replaces the source CFPs (i.e. the merge sources). The merge-source CFPs go through a transition phase before they are eventually removed from the storage.

In the example below, the memory-optimized table file group has four data and delta file pairs at timestamp 500 containing data from previous transactions. For example, the rows in the first data file correspond to transactions with timestamp > 100 and <=200 or alternatively represented as (100, 200].  The second and third data files are shown to be less than 50% full after accounting for the rows marked deleted. The merge operation combines these two CFPs and creates a new CFP containing transactions with timestamp > 200 and <=400, which is the combined range of these two CFPs.  You see another CFP with range (500, 600] and non-empty delta file for transaction range (200, 400] shows that Merge operation can be done concurrently with transactional activity including deleting more rows from the source CFPs.

 In-memory OLTP engine, a background thread evaluates all closed CFPs using a merge policy and then initiates one or more merge requests for the qualifying CFPs. These merge requests are processed by the offline checkpoint thread. The evaluation of merge policy is done periodically and also when a checkpoint is closed.

Merge Policy

SQL Server 2014 implements the following merge policy

  • A merge is scheduled if 2 or more consecutive CFPs can be consolidated, after accounting for deleted rows, such that the resultant rows can fit into 1 data of ideal size. The ideal size of data  is determined as follows
    • For machines < 16GB. The data file is 16MB and delta file is 1MB (not supported in CTP2)
    • For machines > 16GB, the data file is 128MB and delta file is 8MB

A maximum of 10 consecutive CFPs can be part of one merge operation.

  • A single CFP can be self-merged if the data file is larger than 256 MB and over half of the rows are marked deleted. A data file can grow larger than 128MB if, for example, a single transaction or multiple concurrent transactions insert/update large amount of data. This will result in the data file to grow beyond its ideal size because a transaction(s) cannot span multiple CFPs.   Note, Self-merge is not supported in CTP2.

Here are some examples that show the CFPs that will be merged under the merge policy.

Adjacent CFPs Source Files (% full)

Merge Selection

CFP0 (30%), CFP1 (50%), CFP2 (50%), CFP3 (90%)

(CFP0, CFP1). CFP2 is not chosen as it will make resultant data file > 100% of the ideal size

CFP0 (30%), CFP1 (20%), CFP2 (50%), CFP3 (10%)

(CFP0, CFP1, CFP2). Files are chosen starting from left. CTP3 is not combined as it will make resultant data file > 100% of the ideal size

CFP0 (80%), CFP1 (30%), CFP2 (10%), CFP3 (40%)

(CFP1, CFP2, CFP3). Files are chosen starting from left. CFP0 is skipped because if combined with CFP1, the resultant data file will be > 100% of the ideal size

 Not all CFPs with available space qualify for merge. For example, if two adjacent CFPs are 60% full, they will not qualify for merge which will result in 40% of wasted storage for these CFPs. In the worst case, all CFPs can be little over 50% full leading to storage utilization of only 50% approximately. It is important to note that deleted row(s) may exist in the storage, but they may have already been removed (i.e. garbage collected) from the memory. The management of storage and the memory is somewhat independent from garbage collection perspective. A thumb rule is that storage taken by ‘Active’ CFPs (note, it does not account for all CFPs) can be up to 2x larger than the size of durable tables in memory.

You can force merge using stored procedure sys.sp_xtp_merge_checkpoint_files (Transact-SQL) to override the merge policy. For details, please refer to http://msdn.microsoft.com/en-us/library/dn198330(v=sql.120).aspx

Life Cycle of a CFP

As mentioned earlier, once the CFPs are merged, they go through a state transition before they can be deallocated. At any given time, the CFPs data/delta file pairs can be in the following states

  • PRECREATED – A small set of CFPs are kept pre-allocated to minimize or eliminate any waits to allocate new files as transactions are being executed. These are full sized with data file size of 128MB and delta file size of 8 MB but contain no data. The number of CFPs is computed as the number of logical processors or schedulers with a minimum  of 8. This is a fixed storage overhead in databases with memory-optimized tables
  • UNDER CONSTRUCTION – Set of CFPs that store newly inserted and possibly deleted data rows since the last checkpoint.
  • ACTIVE - These contain the inserted/deleted rows from previous closed checkpoints. These CFPs contain all required inserted/deleted rows required before applying the active part of the transaction log at the database restart. We expect that size of these CFPs to be approximately 2x of the in-memory size of memory-optimized tables assuming merge operation is keeping up with the transactional workload.
  • MERGE TARGET – CFP stores the consolidated data rows from the CFP(s) that were identified by the merge policy. Once the merge is installed, the MERGE TARGET transitions into ACTIVE state
  • MERGED SOURCE – Once the merge operation is installed, the source CFPs are marked as MERGED SOURCE. Note, the merge policy evaluator may identify multiple merges a CFP can only participate in one merge operation.
  • REQUIRED FOR BACKUP/HA – Once the merge has been installed and the MERGE TARGET CFP is part of durable checkpoint, the merge source CFPs transition into this state. CFPs in this state are needed for operational correctness of the database with memory-optimized table.  For example, to recover from a durable checkpoint to go back in time.  A CFP can be marked for garbage collection once the log truncation point moves beyond its transaction range.
  • IN TRANSITION TO TOMBSTONE – These CFPs are not needed by in-memory OLTP engine can they can be garbage collected. This state indicates that these CFPs are waiting for the background thread to transition them to the next state TOMBSTONE
  • TOMBSTONE – These CFPs are waiting to be garbage collected by the filestream garbage collector.  Please refer to FS Garbage Collection for details

After accounting for the storage taken by CFPs in various states, the overall storage taken by durable memory-optimized tables can be much larger than 2x. The DMV sys.dm_db_xtp_checkpoint_files can be queried to list all the CFPs in memory-optimized filegroup including the phase they are in. Transitioning CFPs from MERGE_SOURCE state to TOMBSTONE and ultimately getting garbage collected can take up to 5 checkpoints and transaction log backup, if database is configured for full or bulk-logged recovery model, steps. You can, of course, manually force the checkpoint followed by log backup to expedite the garbage collection but then this will add 5 empty CFPs (i.e. 5 data/delta file pairs with data file of size 128MB each). In production scenarios, the automatic checkpoints and log backups taken as part of backup strategy will seamlessly transition CFPs through various states without requiring any manual intervention. The impact of the garbage collection process is that in-memory databases may have a disproportionate storage footprint compared to its size in memory. It is not uncommon to have the size storage occupied by memory-optimized tables to be 3x to their size in memory.

In the next blog, we will walk through an example showing transition of CFPs across all phases.

10 Feb 21:55

Converting Points to a Path

by Rob Farley

Suppose your SQL table has a bunch of spatial points (geographies if you like) with an order in which they need to appear (such as time) and you want to convert them into a LineString, or path.

One option is to convert the points into text, and do a bunch of string manipulation. I’m not so keen on that, even though it’s relatively straightforward if you use FOR XML PATH to do the heavy lifting.

The way I’m going to show you today uses three features that were all introduced in SQL Server 2012, to make life quite easy, and I think quite elegant as well.

Let’s start by getting some points. I’ve plotted some points around Adelaide. To help, I’m going to use Report Builder to show you the results of the queries – that way, I can put them on a map and you can get a feel for what’s going on, instead of just seeing a list of co-ordinates.

First let’s populate our data, creating an index that will be helpful later on:

select identity(int,1,1) as id, *
into dbo.JourneyPoints
from
(values
    (geography::Point(-34.924269, 138.599252, 4326), 'Cnr Currie & KW Sts', cast('20140121 9:00' as datetime)),
    (geography::Point(-34.924344, 138.597544, 4326), 'Cnr Currie & Leigh Sts', '20140121 9:30'),
    (geography::Point(-34.923025, 138.597458, 4326), 'Cnr Leigh & Hindley Sts', '20140121 10:00'),
    (geography::Point(-34.923016, 138.597608, 4326), 'Cnr Bank and Hindley Sts', '20140121 10:30'),
    (geography::Point(-34.921775, 138.597533, 4326), 'Cnr Bank St and North Tce', '20140121 11:00'),
    (geography::Point(-34.921520, 138.601814, 4326), 'Cnr North Tce and Gawler Pl', '20140121 11:30'),
    (geography::Point(-34.924071, 138.601975, 4326), 'Cnr Gawler Pl and Grenfell St', '20140121 12:00'),
    (geography::Point(-34.923966, 138.605590, 4326), 'Cnr Grenfell and Pulteney Sts', '20140121 12:30'),
    (geography::Point(-34.921338, 138.605405, 4326), 'Cnr Pulteney St and North Tce', '20140121 13:00')
  ) p (geo, address, timeatlocation);

create index ixTime on dbo.JourneyPoints(timeatlocation) include (geo);

select * from dbo.JourneyPoints;

Great. Starting at the corner of Currie and King William Streets, we wander through the streets, including Leigh St, where the LobsterPot Solutions office is (roughly where the ‘e’ is).

image

I’ve labelled the points with the times, but it’s still not great viewing. Frankly, it’s a bit hard to see what route was taken.

What we really want is to draw lines between each of them. For this, I’m going to find the next point in the set, using LEAD(), and use the spatial function ShortestLineTo to get the path from our current point to the next one.

select geo,
       lead(geo) over (order by timeatlocation) as nextGeo,
       geo.ShortestLineTo(lead(geo) over (order by timeatlocation)) as lineToNext,
       timeatlocation
from dbo.JourneyPoints;

I didn’t need to use pull back the fields geo and nextGeo, but I figure that the lineToNext column might be confusing at first glance, since it uses the subsequent row’s position as an argument in a function on the current row’s position. Anyway, hopefully you get the gist, here’s what it looks like.

image

This is way better – you can see the path that was taken, and can easily tell that the route didn’t just go straight up North Tce, it ducked down Gawler Place instead.

What’s more – with each part of the journey still being a separate row, I can colour each part differently. You know, in case I don’t like the “Tomato” colour in my last example (yes, that colour is called “Tomato”, no matter whether you say it “tomato”, or “tomato”, or even “tomato”).

To colour it differently, I’m going to throw in an extra field, which is just the number of minutes since we started. I’ll use the old fashioned OVER clause for that, to count the number of minutes since the earlier time.

select geo.ShortestLineTo(lead(geo) over (order by timeatlocation)) as lineToNext,
       timeatlocation,
       datediff(minute, min(timeatlocation) over (), timeatlocation) as minutesSinceStart
from dbo.JourneyPoints;

image

Cool – now I can easily tell which end it started at (the more tomatoey end), and where it ended (the paler end). Each segment is the same colour, but that’s okay.

Now, I said I’d use three SQL 2012 features, and so far the only new ones have been LEAD and ShortestLineTo. But remember I still have several rows, and each section of the route is a separate line. Well, to join them up, I’m going to use 2012’s UnionAggregate function. To use this, I need to use a sub-query (I’ll go with a CTE), because I can’t put an OVER clause inside an aggregate function.

with lines as (
select geo.ShortestLineTo(lead(geo) over (order by timeatlocation)) as LineToNext
from dbo.JourneyPoints
)
select geography::UnionAggregate(LineToNext) as WholeRoute
from lines;

Now I have my solution! I’ve converted points into lines, in the right order.

image

You may be wondering how this performs – what kind of execution plan is going to appear.

Well it’s this:

image

image

Look at this – there are Stream Aggregates (which just watch the data as it comes through, popping rows out when needed, but never holding onto anything except the aggregate as it grows), a Spool (which is used to do a bit of the windowing trickery, but also holding onto very little), and the Sequence Project & Segment operators which generate a row_number as a marker for the lead function. You might be interested to know that the right-most Stream Aggregate has the following “Defined Value” property:

[Expr1005] = Scalar Operator(LAST_VALUE([spatial_test].[dbo].[JourneyPoints].[geo])),
[[spatial_test].[dbo].[JourneyPoints].geo] = Scalar Operator(ANY([spatial_test].[dbo].[JourneyPoints].[geo]))

For each group (which is defined as the row), it uses the LAST_VALUE of geo, and ANY of geo. ANY is the current one, and LAST_VALUE is the row after it. It’s the last row, because the Spool gives up two rows for each ‘window’ – the current row and the lead row. In this scenario, with 9 rows of data in the index, the Spool pulls in (from the right) 9 rows, and serves up (to the left) 17. That’s two per original row, except the last which doesn’t have a lead row.

So the overhead on making this work is remarkably small. With an index in the right order, the amount of work to do is not much more than scanning over the ordered data.

Finally, if I had wanted to do this for several routes, I could have put a RouteID field in the table, used PARTITION BY RouteID in each OVER clause, and GROUP BY RouteID in the final query. If you do this, then you should put routeid as the first key column in your index. That way, the execution plan can be almost identical (just with slightly more explicit grouping, but with identical performance characteristics) to before.

with lines as (
select routeid, geo.ShortestLineTo(lead(geo) over (partition by routeid order by timeatlocation)) as LineToNext
from dbo.JourneyPoints
)
select routeid, geography::UnionAggregate(LineToNext) as WholeRoute
from lines
group by routeid
;

But I don’t have a picture of that, because that wasn’t the query I was wanting.

10 Feb 21:39

Assumptions and clusters – the double whammy

by Wayne Sheffield

I was recently investigating an issue that stemmed from a database being in the recovery_pending state. A quick investigation through the SQL logs showed that the server had restarted earlier that morning, and that this instance is running on a cluster. Assuming that this is probably just a timing issue, I decided to try to bring the database online.

Source code    
ALTER DATABASE <database_name> SET ONLINE;

Now, you didn’t really think that this blog post would end so soon, did you? Well, running that little bit of code gave me this wonderful result:

Msg 5591, Level 16, State 5, Line 1
FILESTREAM feature is disabled.
Msg 5105, Level 16, State 14, Line 1
A file activation error occurred. The physical file name 'F:\MSSQL11.MSSQLSERVER\MSSQL\Filestream\<database_name>' may be incorrect. Diagnose and correct additional errors, and retry the operation.
Msg 5181, Level 16, State 5, Line 1
Could not restart database "<database_name>". Reverting to the previous status.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

Sigh. Okay, that didn’t work out too well. Looking over the system a little bit more, I could see that the database had been online prior to the shutdown – backups had just happened 2 hours earlier.

I proceeded to check out all of the normal filestream things – things like making sure that the permissions on the specified folder had been granted to the SQL Server service account. But still, no luck – everything that I try continues to return the same error.

A friend reminded me that filestream needs to be enabled on the service. I’m thinking that surely, this isn’t it… the database had been up. Regardless, I launched SQL Server Configuration Manager, went to the properties of the SQL Server service, and clicked on the FILESTREAM tab. And what did I find? My assumption was wrong, for on this node, the “Enable FILESTREAM for Transact-SQL Access” setting was disabled. I quickly checked the other node (it was enabled there), and assumed that this must be the problem. I enabled this setting on the current node, and ran the alter statement to bring the database online again.

And the same error occurs. (DEEP Sigh!)

A little bit of internet searching brings me to a BOL article on how to Enable and Configure FILESTREAM (why do they always capitalize that anyway? It’s not like it’s an acronym, is it?) In this article, it mentions that we need to set the filestream access level. This is performed with the sp_configure stored procedure, so it means that this setting is stored in the master database. Since it was working on the other node, I assumed that the setting is set properly. But a quick check against sys.configurations shows that it isn’t set, so I run this bit of code:

Source code    
EXECUTE sp_configure filestream_access_level, 2;
RECONFIGURE;

and then I run the ALTER DATABASE statement to put the database online. And… success. (Whew)

So did I really write this blog post just to impress you with my troubleshooting skills? Nope, not at all. The purpose of this post is to remind you not to make assumptions. Don’t assume that just because something was working, that it must be configured properly. And don’t assume that because everything with SQL is working properly on one node of a cluster, that it will work properly on the other node(s). I have seen too many times where nodes are not configured identically (in one case, this extended to the drive mappings, so when the cluster did fail over, it started causing all kinds of errors).

So, don’t assume. Step back, and investigate to ensure that the prerequisites for utilizing a feature are met, and that the settings are properly configured. Just look at all of the faulty assumptions that I made trying to get this situation resolved.

And when you’re working on a cluster, ensure that the nodes are configured identically.

10 Feb 21:38

Compiled T-SQL? Don’t throw away all your SQLCLR code just yet

by Bob Beauchemin

The speed increase from memory-optimized tables in SQL Server 2014 is nice, but I thought, to get the most bang for your buck for really need to use compiled stored procedures in addition. After looking at some of the C code these produce, it almost looked like I was doing direct vector branching into the table’s C-based specific access routines. But, in the meantime, I’d worked with a friend of mine who was trying to convert some custom SQLCLR code to compiled T-SQL. He was using table variables in place of CLR objects and, although the code run way faster than the non-compiled T-SQL code, it was still 90% slower than his SQLCLR equivalent.

In general, you’ll get the biggest speed increase from a compiled sproc if you use do lots of logic in code, or use nested loops in plans that return lots of rows. I thought I had just the use case.

Once upon a time, someone had given me code that did a lot of calculations in SQLCLR and in T-SQL. The SQLCLR code uses 2-and-3-dimensional arrays as well as some data access. The T-SQL code was an unoptimized, line-by-line port of the SQLCLR code used multiple temporary tables and a cursor over outer iterator. You could remove the outer cursor and still SQLCLR performed 100 iterations (100 products in this case) 7-8 times faster than T-SQL performed 1 iteration (1 product). In other words, SQLCLR was about 700-800X faster than T-SQL. Folks used to tell me this was “cheating” because of the large number of SQL queries the T-SQL code did (100s of queries, many using INSERT INTO a temp table). Nevertheless, it was worth a try.

Converting the code was pretty straightforward.
Replace temp tables with strongly-typed in-memory table variables
Make sure each table variable has a key and index (they’re required in in-memory table variables)
Change a subquery into another select into table variable + main query (compiled sprocs don’t support subqueries yet)
Make sure everything follows the rules for schemabinding
Didn’t use the outer cursor, so the compiled T-SQL code either performed a single iteration or called the compiled proc from within non-compiled T-SQL cursor code

I also had to wait until CTP2, because the procedure used DATEADD. That’s wasn’t supported until CTP2 (good choice of what to support sooner, SQL team, thanks for adding date functions).

The results for switching to in-memory tables were encouraging:
SQLCLR – standard tables – 5 sec
TSQL – standard tables – 37 sec (x100 = 370 sec) – 740X slower
TSQL – in-memory tables – 10 sec (x100 = 100 sec) – 200x slower
SQLCLR doesn’t support in-memory tables through context connection. I didn’t try it with an external connection back into the same instance.

Switching to compiled sproc (and pre-compiling the sprocs to factor out the first-time compile overhead for each)
TSQL – in-memory tables – 10 sec (x100 = 100 sec)
Compiled TSQL (which only supports in-memory tables) – 10 sec (x100 – 100 sec)

So for this particular use case (just a straight port, didn’t try to optimize the code), the difference was minimal. BTW, I’m not pretending this is a benchmark or anywhere near, just a quick test and some round numbers for comparison.

Then I took a harder look at my code. The SQL code was doing a large *number* of SQL statements, each of which used and returned a small number of rows. The computations were done, not in program logic, but with iterations over SQL statements (not a good way to do custom computations in ANY case). Horrible use case, I guess. Lessons learned:
-In-memory tables are not multidimensional arrays or custom “objects” as far as speed is concerned.
-The SQL statement still has overhead over in-memory operations even with simple SQL in a compiled sproc
-Iterating with SQL statements doesn’t substitute for aggregations and computations done in a single SQL statement (I knew that one already).
-Remember where the sweet spot is for compiled stored procedures.

So folks, don’t throw away your custom SQLCLR sprocs that do custom computations just yet. But do give it a test. And push as much of the computation as possible into the SQL statement and optimize those SQL statements.

Cheers, Bob
@bobbeauch

The post Compiled T-SQL? Don’t throw away all your SQLCLR code just yet appeared first on Bob Beauchemin.

10 Feb 21:38

State Transition of Checkpoint Files in Databases with Memory-Optimized Tables

by SQL Server Team

In the previous blog Merge Operations, we described that how a CFP transitions across various stages in its life cycle. In this blog, we will walk through an actual example to show the state transition. Note, some of the states shown for CFPs are only visible in RTM bits (not in CTP2) but internally the state transitions is the same. Basing this example using RTM bits adds lot more clarity in how storage is managed. Another note is that RTM bits allocate CFPs (16MB of data file and 1 MB of delta file) on machines with physical memory <= 16GB. This example was run on a machine with 8GB of RAM using pre-RTM bits.

CREATE DATABASE hkdb1 ON 

 PRIMARY (NAME = [hkdb1_hk_fs_data], FILENAME = 'C:\hekaton_test\data\hkdb1_data.mdf'),

 FILEGROUP [hkdb1_hk_fs_fg] CONTAINS MEMORY_OPTIMIZED_DATA

 (NAME = [hkdb1_hk_fs_dir],  FILENAME = 'C:\hekaton_test\data\hkdb1_hk_fs_dir')

 LOG ON (name = [hktest_log], Filename='C:\hekaton_test\data\hkdb1_log.ldf', size=100MB)

go

-- disable auto-merge so that we can show the merge

-- in a predictable way

dbcc traceon (9851, -1)

-- set the database to full recovery. This is a common practice for production scenario

alter database hkdb1 set recovery full

go

use hkdb1

go

-- create a memory-optimized table with each row of size > 8KB

CREATE TABLE dbo.t_memopt (

       c1 int NOT NULL,

       c2 char(40) NOT NULL,

       c3 char(8000) NOT NULL,

 

       CONSTRAINT [pk_t_memopt_c1] PRIMARY KEY NONCLUSTERED HASH (c1) WITH (BUCKET_COUNT = 100000)

) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)

go

-- do the database backup. We do this so that we can do transaction log backups

BACKUP DATABASE [hkdb1] TO  DISK = N'C:\hekaton_test\data\hk-empty-data.bak'

WITH NOFORMAT, INIT,  NAME = N'hkdb1-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10

GO

-- load 8000 rows. This should load 5 16MB data files on a machine with <= 16GB of storage

declare @i int = 0

while (@i < 8000)

begin

        insert t_memopt values (@i, 'a', replicate ('b', 8000))

        set @i += 1;

end

go

In this example, we will show the output of DMV sys.dm_db_xtp_checkpoint_files using the following query.

select file_type_desc, state_desc, internal_storage_slot,file_size_in_bytes, file_size_used_in_bytes,

inserted_row_count, deleted_row_count, lower_bound_tsn, upper_bound_tsn, last_backup_page_count, drop_table_deleted_row_count 

from sys.dm_db_xtp_checkpoint_files

order by container_id, file_type_desc, upper_bound_tsn

Here is the output after removing some columns and rows (just showing rows with DATA file).

  • There are 8 data files under ‘PRECREATED’ state. This demo was run on machine with 4 logical processors with 8GB memory. The size of the files is 16MB (supported in SQL Server 2014 RTM on a machine with <= 16GB memory)
  • There are 5 data files marked ‘UNDER CONSTRUCTION’ as checkpoint has not been taken but these files together store 8000 rows that we had inserted. If database is restarted, the data rows will loaded using transaction log and NOT from these files as they are not part of a durable checkpoint

Now, execute a manual checkpoint by executing the following command and see the state transition in the CFPs. You will note that UNDER CONSTRUCTION CFPs are now marked ACTIVE as they are part of a durable checkpoint that we have just completed. The manual checkpoint closed the data file with internal_storage_slot though it was only 25% full. This data file can potentially be merged in future as it qualifies under the merge policy.

checkpoint

go

Now, we will delete 50% rows

-- now delete 50% rows

declare @i int = 0

while (@i <= 8000)

begin

       delete t_memopt where c1 = @i

       set @i += 2;

end

go

checkpoint

go

Here is the output of the DMV for CFPs in ACTIVE state using the following query

select file_type_desc, state_desc, internal_storage_slot,file_size_in_bytes, file_size_used_in_bytes,

inserted_row_count, deleted_row_count, lower_bound_tsn, upper_bound_tsn, last_backup_page_count, drop_table_deleted_row_count 

from sys.dm_db_xtp_checkpoint_files

where state = 2

order by container_id, file_type_desc, upper_bound_tsn

Note that the delta files have reference to the deleted rows. There is a CFP with 0 data rows due to a manual checkpoint.

 For this example, we had disabled auto-merge so that we can see the state transitions with no surprises.

We will now force a manual merge (merging all the ACTIVE CFPs) by executing the following command

-- now do the manual merge

-- it generates merge-target and other files stay as regular files

exec sys.sp_xtp_merge_checkpoint_files 'hkdb1',  1877, 12004

go

The DMV shows that the merge was executed and there is new CFP marked as MERGE TARGET containing 4000 rows. Also, note the data file is around 33MB, much larger than 16MB. This happened because we forced the merge. The auto-merge would not have merged all ACTIVE CFPs because it won’t meet the merge policy.

At this time, the CFPs that are sources of MERGE are still marked as ACTIVE. This is because the merge has not been installed. You can query the state of merge request as follows

select request_state_desc, lower_bound_tsn, upper_bound_tsn

from sys.dm_db_xtp_merge_requests

go

request_state_desc   lower_bound_tsn      upper_bound_tsn

-------------------- -------------------- --------------------

PENDING              0                    12007

 

Now, we will force a checkpoint to install the merge.

checkpoint

go

Here is the output of the DMVs. It shows that the MERGE TARGET is now changed to ACTIVE state and all the CFPs that were used as source of the MERGE are now marked as MERGED SOURCE. Referring to the blog <reference>, these CFPs are now in category-3. 

The source CFPs will transition into category-4 automatically as automatic checkpoints and log backups are taken. For the example, here we will force the checkpoint and log backups as follows

checkpoint

go

-- do the log backup.

BACKUP LOG [hkdb1] TO  DISK = N'C:\hekaton_test\data\hk-log-1.bak'

WITH NOFORMAT, INIT,  NAME = N'hk-log Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10

GO

Here is the output of the DMV showing that earlier CFPs have now transitioned to state REQUIRED FOR BACKUP/HA meaning that these CFPs can’t be garbage collected as they are needed for operational correctness of the database with memory-optimized table

 

After couple of manual checkpoint followed by log backup, the DMV output shows that CFPs are finally being handed off to the garbage collector for File Stream. At this stage, these CFPs are not needed for operational correctness for our database. To state differently, the full database backup of the database will NOT include these CFPs but they continue take storage space in the memory-optimized filegroup until they are garbage collected.

One more cycle of manual checkpoint followed by log backup, the CFPs are actually transitioned to TOMBSTONE state which indicates that these will get removed as part of File Stream Garbage Collection. Please refer to FS Garbage Collection for details

 

This concludes the state transition of CFPs. The main point to remember is that it takes few checkpoint/log-backups after a set of CFPs have been merged before the storage can be garbage collected. We expect these state transitions to occur automatically in production environment as checkpoints will occur automatically for each additional 512MB transaction log generation and the regular log backups that are taken as part of backup strategy. However, you will need to account for this extra storage as CFPs go through transition from MERGE SOURCE to finally getting garbage collected.

10 Feb 21:37

Parameterization and filtered indexes (part 2)

by Hugo Kornelis
In my previous post , I demonstrated how the presence of a filtered index can get in the way of successful parameterization, possibly resulting in too much query compilations and procedure cache bloat. I suggest reading that first, because I will go straight where I left off. Use the Force, Luke If you read the previous post very carefully, you will have noticed that I wrote that if a plan is unstable, it is “unsafe for simple parameterization” – and that suggests that SQL Server will not have such...(read more)
10 Feb 21:37

SQL Server 2012 Windowing Functions: LAG

by John Paul Cook
SQL Server 2012 introduces new windowing functions beyond the basic ranking functions of RANK, DENSE_RANK, NTILE, and ROW_NUMBER. The LAG function is particularly useful when comparing data in the current time period to the previous period. Period to period changes in sales, nosocomial infection rates, and manufacturing defects are just some examples of where a LAG function can simplify your code. The purpose of this post is to provide a basic introduction. Subsequent posts will discuss more advanced...(read more)
10 Feb 21:36

Combining multiple CONTAINS predicates in SQL Server 2012

The SQL Server 2005  “Performance Tuning and Optimization (Full-Text Search)” books online topic has the following recommendation:

Combine multiple CONTAINS predicates into one CONTAINS predicate. In SQL Server you can specify a list of columns in the CONTAINS query.”

While this shows up for the SQL Server 2005 topic, there is no such statement made for the 2008 and 2012 versions of the the books online topic.

I was curious if this 2005-era recommendation still applied for more recent versions.  To test this scenario I made the following changes to the AdventureWorks2012 sample database on version Microsoft SQL Server 2012 (SP1), 11.0.3000.0 Developer Edition:

USE [AdventureWorks2012];
GO
ALTER TABLE [Production].[ProductDescription]
ADD [Description2] NVARCHAR(400) NOT NULL DEFAULT '';
GO

-- Populating the new description column
UPDATE [Production].[ProductDescription]
SET [Description2] = [Description];
GO

CREATE FULLTEXT INDEX ON [Production].[ProductDescription] KEY INDEX [PK_ProductDescription_ProductDescriptionID] ON ([AW2008FullTextCatalog])
WITH (CHANGE_TRACKING AUTO);
GO

ALTER FULLTEXT INDEX ON [Production].[ProductDescription] ADD ([Description]);
ALTER FULLTEXT INDEX ON [Production].[ProductDescription] ADD ([Description2]);
GO

Now for an initial test, I executed the following single CONTAINS predicate query:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
[Description] ,
[rowguid] ,
[ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS([Description], 'bike');

SET STATISTICS IO OFF;

The query execution plan tree (from SQL Sentry Plan Explorer) was as follows:

image thumb Combining multiple CONTAINS predicates in SQL Server 2012

And the STATISTICS IO output was as follows:

Table ‘ProductDescription’. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now for a multi-CONTAINS predicate, I executed the following example:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
[Description] ,
[rowguid] ,
[ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS([Description], 'bike') OR
CONTAINS([Description2], 'bike');

SET STATISTICS IO OFF;

The query execution plan tree (from SQL Sentry Plan Explorer) was as follows:

image thumb1 Combining multiple CONTAINS predicates in SQL Server 2012

The operators remained the same, but we have a change in estimated subtree cost and also a doubling up of cardinality estimates.

As for the STATISTICS IO output, it was as follows:

Table ‘ProductDescription’. Scan count 0, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Logical reads were 34 versus the original 28.

For the last example, I combined the two CONTAINS predicates as follows:

SET STATISTICS IO ON;

SELECT  [ProductDescriptionID] ,
[Description] ,
[rowguid] ,
[ModifiedDate]
FROM [Production].[ProductDescription]
WHERE CONTAINS(([Description], [Description2]), 'bike');

SET STATISTICS IO OFF;

I found that with the combined CONTAINS, the plan shape for this example remained the same as the multiple-CONTAINS version. Operator costs and estimates also remained the same.

For SQL Server 2005, if you had multiple CONTAINS predicates referencing columns from the same table, your execution plan would show separate remote scan operators for each reference instead of them being combined natively into a single operator.  This behavior seems to have been addressed in recent versions, but if you have seen a variation of this issue given other conditions please share in the comments.  Thanks!

The post Combining multiple CONTAINS predicates in SQL Server 2012 appeared first on Joe Sack.

10 Feb 21:36

what does a DBA do?

by tjaybelt


What I feel like I do      






  





What I really do






Just keeping the lights on

10 Feb 21:36

Basic RegEx in SSMS

Have you ever had that moment where you have gone "Geez.. I wish I'd learnt this stuff ages ago" after realising just how much more productive you’d have been since!! That happened to me recently after taking literally a few minutes to learn more about the find/replace regular expression option in SSMS.. It has long been on my list of things to make more use off (alongside powershell!) and although I have used regular expressions on the odd occasion in .NET over the past couple of years, I never really picked up on them in SSMS. I spotted Jen McCown’s recent blog post on the same subject and I thought, rights that it, I'm going to spend the time to learn this feature instead of string hacking the hard way in SSMS or Excel.... and it wasn’t long before I had an opportunity to do so the very same day..

We all must have at least once been given a list of maybe 10, 50, 100 or whatever number of key values via email, text file etc and needed to get a list of records from a table based on those key values. Because it is a one off job, how many times have we just pasted that list into SSMS and proceeded to quickly knock up construct a SELECT statement with an IN clause of the key values and then went through and put a comma at the end of each value in the list to make it well formed tsql… very monotonous and tedious for the minute or two that it takes… or worse still, have to make them string values by putting apostrophes at either end of each value AND comma separate them ready for the IN clause..

This was the very scenario I had [again] but this time I forced myself to use RegEx to comma separate the values and soon found out that this was ridiculously easy to do in SSMS… All I did was paste the list of values into an SSMS window, highlight them all, press Ctrl-H to bring up find/replace box. Then in the Find what box I typed “$”, which is the expression to find the end of a line and in the Replace with box, I typed “,” then changed it to use “Regular Expressions” and voila! a comma at the end of each value on each line. A 10 second process if that.. I trimmed off the last comma, wrapped them all with brackets and there was the IN part of my SELECT query. I’ve already lost count of how many times I’ve used this method since learning this..

image

Similar to appending data at the end of a line of text, if I wanted to prepend data at the start of the line then that is just as easy. To find the start of the line via RegEx, you can use “^” and you can instead prepend data to the start of each line.

Taking the previous example a step further, if I wanted to convert the list of key values to string values ready for me to use in a SELECT query, that is also pretty easy to do with the find expression of: 
^{.*}$
and the replace expression of:
'\1',

What the find expression is doing is finding the start of the line (^), then any number of characters (.*) that follow and then the end of the line ($). The curly braces is tagging the any number of characters it founds between the start and the end of the line so that we can reference it in the replace expression. The replace expression references the tagged data by the (\1) expression.

image

If you have a list of 100’s of values then this allows you get them in a way ready for your query in a matter of seconds.

Phil Factor has a blog post with a whole of list of scenarios and expressions to remodel the scripts/statements that you have and well worth flicking through just to get a feel for the variety of use cases. It also goes into great detail the RegEx expressions which I’m not going to regurgitate here.

Once you get the hang of RegEx, you can get really creative with the expressions and find/replace in ways you’d never have thought of before. For example, imagine you have a bunch of 100’s or 1000’s of INSERT statements e.g:

INSERT INTO dbo.Customers (CustomerId, CustomerName, CustomerAddress, CustomerComments)
VALUES (1, 'SomeCustomer1', 'SomeAddress1', 'Comments')
INSERT INTO dbo.Customers (CustomerId, CustomerName, CustomerAddress, CustomerComments)
VALUES (2, 'SomeCustomer2', 'SomeAddress22', 'Comments')
INSERT INTO dbo.Customers (CustomerId, CustomerName, CustomerAddress, CustomerComments)
VALUES (3, 'SomeCustomer3', 'SomeAddress333', 'Comments')

And for whatever reason, you want to remove column 3 (CustomerAddress) completely from all of the INSERT statements.. To achieve this by hand, you’d need to go through and first remove the column reference (easy enough using find/replace), but you also need to remove the value as well.. Not so easy using traditional replace. But it is with a regular expression using a find expression similar to:
{\(([:a\']+\, )^2}[:a\']*\,
And the replace expression of:
\1
Try it on the example and see.. (see Phil Factor’s post or the Technet article for a more detailed explanation of each element of the expression) but basically it is looking for the first three columns based on the comma after the opening bracket but keeping the first two columns tagged (the curly braces). It then replaces what it finds with the tagged data.

What about taking the example above and converting the INSERT statements to an UPDATE statement updating the CustomerName based on the CustomerId in one swoop. You can by building up and using a find expression along the lines of:
INSERT INTO .*$\n.*\({[:a]+}, {[:a\']+}.*\)
which finds each INSERT statement tagging the relevant parts that we want to use during the replace operation and then replace that data with the expression:
UPDATE dbo.Customers SET CustomerName = \2 WHERE CustomerId = \1

image

Pretty simple stuff, but awesome nonetheless and if you haven’t learnt regular expressions yet (and you frequently do what I used to do in SSMS!), then seriously, push yourself to try to use them the next time you get an opportunity to do so.. Only shame is that Microsoft use their own version of RegEx which isn’t 100% in line with that found in .NET and other languages.

Enjoy!

10 Feb 20:31

SQL Monitoring

by tjaybelt
I use a variety of methods for monitoring my database systems. Some are home grown. Others are 3rd party tools. Some look at more than SQL Server, reaching out to services and network and beyond. Some are very specific; monitoring a collection of file to ensure that imports are occurring on a regular basis.

One of the tools that I love is RedGate's SQL Monitor. I consider it my junior DBA. It is always on and always watching my systems. It lets me know when things are going awry.

With the custom metrics, I have been able to create and monitor some things that are near and dear to my heart, but are not generic enough that an alert already exists. For example, I have one custom metric that collects data about replication. Its not perfect, but the goal is to let me know when we are experiencing a particularly heavy replication period, as i may need to stay on my feet and be vigilant. Most times these periods simply pass without incident, but on occasion, in retrospect, something has gone terribly awry and one of the early indicators was heavier than normal replication business. So, thus the custom alert.

All this is to share with you some valuable information before i share the funny that I found. I was tweeking one of the alerts, and went back in time a day to see the data it had collected in the Analysis portion of the SQL Monitor tool. And the graphic representation of the data seemed to be mimicking the icon of the application itself.

 


This made me giggle and i had to share it with you.
10 Feb 20:30

Worst Bar Chart* of 2014–We May Already Have a Winner

by Karen Lopez

 

Update: It appears that this chart and other data visualizations have been removed from the website and report.  I’m hoping that means that the authors will be refactoring them with improved graphics.  Meanwhile, I’m going to leave my post below as is.  There are good lessons and tips to be shared. 

I know.  I hear you. It’s still January and we might just have a winner, one that will be impossible to beat during the next 12 months. Incredible. As you may recall, in late 2011 I awarded Stupidest Bar Chart to a doozy from Klout.  That bar chart was confusing, but not in the way this one is.  First, put down your beverage of choice.  Then take a look at this:

 

image

 

Yeah.  That…chart.  It’s kind of like a horizontal stacked bar chart.  I don’t understand anything about it, though.   This chart comes from an infographic at Deloitte.com  on Analysis Trends for 2014.

Maybe zooming in might help?

 

image

 

Nope, doesn’t make it any clearer. In fact, it’s just as crazy, but bigger.  Call it Big Crazy DataTM.

Here are the issues and questions I have about it:

  1. What do the colours mean? If this were a stacked bar chart, the grey and blue areas would indicate different data.  It appears that only some sections have data. But I’m not sure.
  2. What is the scale?  Normally a bar chart would have an axis that indicates some measure and all the bars would be graphed against that axis.  This has no axis.
  3. Why do some bars have signed numbers and one have a range?  Why are some numbers unsigned? Even some delta numbers are unsigned.
  4. What do the relative sizes of the sections mean?  In one bar we see a blue section labeled 285, but it’s larger than a section labeled 425-475
  5. Where numbers appear, do they describe the section they are on or the section next to the number? I’m not sure
  6. What does the relative position of the blue section mean? I’m not sure.
  7. Why are some of the labels in light grey and some in dark grey? I’m not sure
  8. What are the units of measurement for these numbers? Are some percentages? Units of 1000s? 100,000s? Are they of people? Positions? Something else? I’m not sure.
  9. Do the endnotes there explain the numbers? No, they are just citations for reference materials used to create the report.

Maybe the chart has an explanation inside the full document, Analytics Trends 2014: (And why some may not materialize)… No, same chart, no text that directly explains any of the numbers. To add some irony to this, the report itself is about Analytics and even covers trends in visualizations.

A Picture is Worth A Thousand Words, Unfortunately.

image

The report has something to say about data visualizations used in data analytics:

There’s no question that visualization has become a critical capability for organizations of virtually every shape and size. Easy-to-use software makes complex data accessible and understandable for almost any business user. From discovery and visual exploration to pattern and relationship identification, today’s visualization tools easily affirm the adage that a picture is worth a thousand words. Or, in this case, numbers.

This is especially true with big data, where visualization may even be a necessary capability for driving insights. That’s why visually oriented tools are rising in prominence for many big data applications. Users get to understand, explore, share, and apply data efficiently and collaboratively—often without the need for analytics professionals. And that’s where the risk comes in. In their eagerness to dive into data, users may choose polished graphics over thorough data preparation and normalization and rigorous analysis—glossing over important insights and analysis opportunities and potentially producing erroneous results. [emphasis mine]

Keep reading the report from that section.  The irony burns.

What’s Going on with this Bar Chart?

I’d bet that the Analytics professionals at Deloitte know much better than this.  The webpage and report for Analytics trends is beautiful to look at.  I’m guessing that a graphics designer has taken these numbers and created a beautiful, yet meaningless graphic with numbers. And just as the report predicts, people who don’t understand how to best use visualizations can gloss over important insights and analysis opportunities and potentially produce erroneous results.  This report has some great points. And it’s pretty.  Very, very pretty.  But the distraction of bad visualizations makes difficult for me to actually see the points the authors are trying to make.

My guess is also that this data, as a set, had no business being put together in one chart.  I’m not sure, but they don’t seem to have the same measures or even be the same type of data.  So putting them in one chart won’t help.  This was a page in a report needing a graphic, so someone made one.

*Updates:

Jamie Calder ( @jamiecalder) helped me “see” the story this chart is trying to tell: think of it as a math equation.  That might get you there.  But it’s still not an appropriate use of a bar chart.  And Josh Fennessy ( @joshuafennessy)  has pointed out that this isn’t supposed to be a bar chart at all. It’s supposed to be a waterfall chart.  But it’s dressed up as a bar chart, so I’m going to still leave as a contender for Worst Bar Chart of 2014.  Let’s just call it a self-nominated chart.  Martin Ribunal has found what is most likely the original chart from which this chart was most likely copied inspired by and has listed that in comments below.

What Have We Learned About Data Visualizations?

  1. The best data analysis can be invalidated with bad data visualizations.
  2. If you develop content, insist that you say in the final published work.  I know this is difficult in large corporate entities, but it’s important to ensuring that your goals are met.
  3. The more accessible we make self-serve BI and data visualization tools available, the more responsibility we have to educate, train, and mentor those using these tools.
  4. Show your visualizations to other people.  Ask them what they see. Ask them if they are confused, what conclusions they might have and what questions they still have.
  5. Choose the right chart type to fit your data.  Then use that chart correctly.
  6. If you needs a graphic image, don’t mimic a recognized chart type. 
  7. If you add a chart to a document, you should actual reference it in the text in the way that helps the reader understand it.
  8. If your chart has numbers, you have to say what those are number of, including some sort of unit of measure.  And your graphics should correctly portray their relative size.
  9. If a chart leaves viewers saying “I’m not sure” more than once, it’s not working.
  10. Loving your data means loving how it is presented, too.

What Would You Ask?

What other questions do you have about this…graphic.? How would you improve it?

I can’t bring myself to call it a bar chart any more.  But it’s still dressed as a bar chart, so it fits the nomination category.  If you find a bar chart or any other data visualization to nominate, let me know.  I wouldn’t want something worse than this one to go unrecognized.

27 Jan 02:50

Court Victory Gives Blogger Same Speech Protections As Traditional Press

by Soulskill
cold fjord writes "Reuters reports, 'A blogger is entitled to the same free speech protections as a traditional journalist and cannot be liable for defamation unless she acted negligently, a federal appeals court ruled on Friday. Crystal Cox lost a defamation trial in 2011 over a blog post she wrote accusing a bankruptcy trustee and Obsidian Finance Group of tax fraud. A lower court judge had found that Obsidian did not have to prove that Cox acted negligently because Cox failed to submit evidence of her status as a journalist. But in the ruling, the 9th U.S. Circuit Court of Appeals in San Francisco said Cox deserved a new trial, regardless of the fact that she is not a traditional reporter. "As the Supreme Court has accurately warned, a First Amendment distinction between the institutional press and other speakers is unworkable."... Eugene Volokh, [a] Law professor who represented Cox, said Obsidian would now have to show that Cox had actual knowledge that her post was false when she published it. ... "In this day and age, with so much important stuff produced by people who are not professionals, it's harder than ever to decide who is a member of the institutional press."' Further details are available at Courthouse News Service."

Share on Google+

Read more of this story at Slashdot.








27 Jan 01:32

Nobel Prize Winning Economist: Legalize Sale of Human Organs

by timothy
retroworks writes "Dr. Gary Becker (University of Chicago) and Julio Elias (Universidad CEMA, Argentina) wrote a thought-provoking editorial in last week's WSJ, arguing that the prohibition on voluntary sale and trade of human organs is probably killing people. In 2012, 95,000 American men, women and children were on the waiting list for new kidneys. Yet only about 16,500 kidney transplant operations were performed that year. 'The altruistic giving of organs might decline with an open market, since the incentive to give organs to a relative, friend or anyone else would be weaker when organs are readily available to buy. On the other hand, the altruistic giving of money to those in need of organs could increase to help them pay for the cost of organ transplants.' Paying for organs would lead to more transplants, the article maintains. 'Initially, a market in the purchase and sale of organs would seem strange, and many might continue to consider that market "repugnant." Over time, however, the sale of organs would grow to be accepted, just as the voluntary military now has widespread support.'"

Share on Google+

Read more of this story at Slashdot.








27 Jan 01:18

Amazon: We Can Ship Items Before Customers Order

by timothy
An anonymous reader writes "The WSJ is reporting that Amazon has obtained a patent for 'anticipatory shipping,'' and claims it knows its customers so well it can start shipping even before orders are placed. The technique could cut delivery time and discourage consumers from visiting physical stores. In the patent document, Amazon says delays between ordering and receiving purchases 'may dissuade customers from buying items from online merchants.' Of course, Amazon's algorithms might sometimes err, prompting costly returns. To minimize those costs, Amazon said it might consider giving customers discounts, or convert the unwanted delivery into a gift. 'Delivering the package to the given customer as a promotional gift may be used to build goodwill,' the patent said. Considering the problems that can arise when shipping something a customer did not order anticipatory shipping has the potential to backfire faster than an Amazon drone can deliver."

Share on Google+

Read more of this story at Slashdot.








27 Jan 01:13

The Whole Story Behind Low AP CS Exam Stats

by samzenpus
theodp writes "At first glance, the headline in The Salt Lake Tribune — Very Few Utah Girls, Minorities Take Computer Science AP Tests — appears to be pretty alarming. As does the headline No Girls, Blacks, or Hispanics Take AP Computer Science Exam in Some States over at Education Week. Not One Girl Took The AP Computer Science Test In Some States warns a Business Insider headline. And so on and so on and so on. So how could one quibble with tech-giant backed Code.org's decision to pay teachers a $250 "Female Student Bonus", or Google's declaration that 'the ultimate goal of CS First is to provide proven teaching materials, screencasts, and curricula for after-school programs that will ignite the interest and confidence of underrepresented minorities and girls in CS,' right? But the thing is, CollegeBoard AP CS exam records indicate that no Wyoming students at all took an AP CS exam (xls) in 2013, and only a total of 103 Utah students (xls) had reported scores. Let's not forget about the girls and underrepresented minorities, but since AP CS Exam Stats are being spun as a measure of CS education participation (pdf) and equity, let's not forget that pretty much everyone has been underrepresented if we look at the big AP CS picture. If only 29,555 AP CS scores were reported (xls) in 2013 for a HS population of about 16 million students, shouldn't the goal at this stage of the game really be CS education for all?"

Share on Google+

Read more of this story at Slashdot.








27 Jan 00:52

Light Years 101

27 Jan 00:48

Hacker Says He Could Access 70,000 Healthcare.Gov Records In 4 Minutes

by timothy
cold fjord writes with this excerpt from Computerworld: "[W]hite hat hacker David Kennedy, CEO of TrustedSec, may feel like he's beating his head against a stone wall. Kennedy said, 'I don't understand how we're still discussing whether the website is insecure or not. ... It is insecure — 100 percent.' Kennedy has continually warned that healthcare.gov is insecure. In November, after the website was allegedly 'fixed,' he told Congress it was even more vulnerable to hacking and privacy breaches. ... 'Out of the issues identified last go around, there has been a half of a vulnerability closed out of the 17 previously disclosed ... other security researchers have also identified an additional 20+ exposures on the site.' ... Kennedy said he was able to access 70,000 records within four minutes ... At the House Science and Technology Committee hearing held last week ... elite white hat hackers — Kevin Mitnick, Ed Skoudis, Chris Nickerson, Eric Smith, Chris Gates, John Strand, Kevin Johnson, and Scott White – blasted the website's insecurity. ... Mitnick, the 'world's most famous hacker' testified: '... It would be a hacker's wet dream to break into Healthcare.gov ... A breach may result in massive identity theft never seen before — these databases house information on every U.S. citizen! It's shameful the team that built the Healthcare.gov site implemented minimal, if any, security best practices.'"

Share on Google+

Read more of this story at Slashdot.








27 Jan 00:44

EU Commissioner Renews Call for Serious Fines in Data Privacy Laws

by Unknown Lamer
DW100 writes "Despite Google being fined €900,000 by Spanish authorities and €150,000 in France for its controversial privacy policies in recent months, an EU commissioner has admitted this is mere 'pocket money' to the company. Instead, a new legal regime that would have seen Google fined $1bn for breaching data protection laws is needed to make U.S. companies fear and respect the law in Europe. 'Is it surprising to anyone,' asked Commissioner Viviane Reding, 'that two whole years after the case emerged, it is still unclear whether Google will amend its privacy policy or not? Europeans need to get serious. And that is why our reform introduces stiff sanctions that can reach as much as 2% of the global annual turnover of a company. In the Google case, that would have meant a fine of EUR 731 million (USD 1 billion). A sum much harder to brush off.'"

Share on Google+

Read more of this story at Slashdot.


27 Jan 00:41

Not That Simple

Not That Simple

Submitted by: Unknown

27 Jan 00:40

You Might Rent Features & Options On Cars In the Future

by Soulskill
cartechboy writes "These days, you go to a car dealership and you buy a car. If you want seat heaters, you might need to option for the cold weather package from the factory. Want the high-end stereo? You'll be likely be opting for some technology package which bundles in navigation. While some options are a la carte, most are bundled, and even when they are a la carte, they aren't cheap. What if in the future you could buy a car and unlock options later? Say the car came from the factory with heated seats, but you didn't pay for them. But later on, say in the middle of the freezing winter, you suddenly want them. What if you could simply pay a monthly fee during the winter months to have those heated seats work? Whether this model would benefit the consumer, the automakers, or both is yet to be seen. But automakers such as MINI are already talking about this type of a future. Is this the right road to be headed down, or are consumers going to just get screwed in the long run?"

Share on Google+

Read more of this story at Slashdot.








27 Jan 00:31

Atherton is the Home of the World's Dumbest First World Problems, if You Can Believe These Ridiculous Newspaper Clips

Atherton is the Home of the World's Dumbest First World Problems, if You Can Believe These Ridiculous Newspaper Clips

Redditor HesterLee gathered these clips from the police blotter of Atherton, California, where "the average house costs $4 million." Look below, and see what constitutes "suspicious" activity and grave danger according to some of the most entitled people in the United States

Submitted by: ichc.chrispeeler

27 Jan 00:12

Yep, People Are Still Using '123456' and 'Password' As Passwords In 2014

by Soulskill
Nerval's Lobster writes "Earlier this week, SplashData released its annual list of the 25 most common passwords used on the Internet — and no surprise, most are so blindingly obvious it's a shock that people still rely on them to protect their data: '12345,' 'password,' 'qwerty' '11111,' and worse. There were some interesting quirks in the dataset, however. Following a massive security breach in late 2013, a large amount of Adobe users' passwords leaked onto the broader Web; many of those users based their password on either 'Adobe' or 'Photoshop,' which are terms (along with the ever-popular 'password') easily discoverable using today's hacker tools. 'Seeing passwords like "adobe123" and "photoshop" on this list offers a good reminder not to base your password on the name of the website or application you are accessing,' Morgan Slain, CEO of SplashData, wrote in a statement. Slashdotters have known for years that while it's always tempting to create a password that's easy to remember — especially if you maintain profiles on multiple online services — the consequences of an attacker breaking into your accounts are potentially devastating."

Share on Google+

Read more of this story at Slashdot.








27 Jan 00:03

Open Data Tells NYC Residents Where the Rats Are

by samzenpus
itwbennett writes "The New York City Health Department's Rat Information Portal provides raw data on where the rats are, based on inspections done by the health department, as well as by their rat indexing initiative. The portal isn't a new open data initiative, but if you're a NYC resident and not a big fan of rodents, the site is worth a look. 'The most interesting part of the portal is the interactive heat map of rat inspection data,' says ITworld's Phil Johnson. 'Using this interactive map, you can look up the inspection history, going back to 2009, for any address in the five boroughs. It will tell you the dates and results of any inspections, as well of any follow up compliance checks. As for raw data, the site provides city-wide rat reports, aggregated to the zip code level, going back to 2006.'"

Share on Google+

Read more of this story at Slashdot.








27 Jan 00:02

Security Vendors Self-Censor Target Breach Details

by samzenpus
angry tapir writes "At least three security companies have scrubbed information related to Target from the Web, highlighting the ongoing sensitivity around one of the largest-ever data breaches. How hackers broke into Target and installed malware on point-of-sale terminals that harvested up to 40 million payment card details is extremely sensitive. Now, details that give insight into the attack are being hastily removed or redacted by security companies."

Share on Google+

Read more of this story at Slashdot.








24 Jan 21:01

VMware Hardened Virtual Appliance Operations Guide

by Erin Banks
EMC logo

VMware recently released the Hardened Virtual Appliance Operations Guide. This whitepaper was written to assist with the additional components that an administrator may choose to implement on the hardened virtual appliances. You see VMware created a hardened virtual appliance “by embedding the technical requirements of the STIG in the design”. What the heck am I talking about you might say? Well, the Defense Information Systems Agency (DISA) is an entity for the Department of Defense (DoD) and the security technical implementation guide (STIGs) are standards for DoD IA and IA enabled systems. In my experience, DoD is the only one that uses the STIGs but I am sure that there are other companies or civilian agencies out there using the document as guidance. Either way, it is a way to make the system stronger or like we say, harden it. There are many ways of making the system stronger, for instance the Hardening Guidelines that VMware releases as well. img_hardening

Now that you understand that all of these items are in place, you may ask yourself why you would even bother. Well, in some parts, you won’t have to because as the whitepaper mentions, there are currently (as of Jan 2014) 9 products that utilize the hardened virtual appliance. So in this case, you don’t have to worry so much about building out your own, VMware did it for it. But lets go back to the original question… why do you care? Because you want to minimize any avenue that an attacker might have into your system. You want to reduce the attack surface. In terms that my mom can understand… you want to check all the windows… check all the doors… make sure they are locked, and turn on the house alarm before you go to bed. You want to reduce any avenue that someone may have of getting in to your house. Possibly you don’t have an alarm system so your house can’t be as hardened but there additional steps that you can take to get closer to 100%. This whitepaper helps with those additional steps which include:

  • root password
  • password expiry
  • dodscript.sh script
  • secure shell, administrative accounts, and console access
  • time sourcing and synchronization
  • log forwarding – syslong.ng and auditd
  • boot loader (grub) password
  • NFS and NIS

security-hardening-services-250x250Your next step, go back to the vendors that release virtual appliances and ask them if they have hardened them. If they have no idea what you are talking about, mention that VMware has done it, they should too. The fact is all of the experience shows that people don’t manage security for themselves well, so if it can be BUILT IN like we always say… why not!!


22 Jan 02:52

FCIP - Connectivity over Distance

by Allen Ward
EMC logo
As always it has been a busy few months (and then some) and I always find it challenging to get the time to sit down and write. I've had a larger post bouncing around in my head for a while and haven't quite managed to get it out yet so I thought I might
21 Jan 18:26

It Doesn’t Matter What You Think: Setting up the Linksys EA6900

by Jeramiah Dooley
EMC logo

Trust me, by the time we get done you’ll understand the title…

Based on the work that Chris Wahl did documenting his upgrade to 5GHz in his home lab, I ran right out and purchased a spiffy new Linksys AC1900 Dual Band SMART Wi-Fi Router (EA6900) off of Amazon.  Like most of you, I’m no stranger to wireless access points and routers, and I have a closet full of old ones that I’ve upgraded over the years.  I’ve not been a huge DD-WRT user, but I definitely have a couple devices I’ve bricked along the way.  All of this to say, I wasn’t sweating the installation of this device.  At all.

Yeah.  About that.

I’m hard-pressed to think of any piece of electronics that has frustrated me more during the setup process than this one.  It all seems to start with a firmware update done to Linksys devices back when they were owned by Cisco.  Remember the uproar in mid-2012 when, unannounced to users, their devices were “upgraded” to default to a cloud-based management system?  Cisco quickly backed down, or at least gave customers a way to downgrade the firmware, so problem solved, right?

Well, Belkin bought Linksys in March of 2013, and it seems like they decided that the cloud management thing was pretty cool.  The EA6900 is the first of the Belkin/Linksys devices put on the market, and that internet-based management system is front and center.  It’s not terrible, or criminal, or even that hard to deal with, but it’s a significant departure from the standard setup process we’ve all come to know and love with these devices.  Since I spent a couple hours figuring out every possible way an installation could fail, I figured I write up the easiest possible way for it to succeed.  Here we go.

After you’ve unboxed it, and gotten it plugged in, connect a LAN cable to Port #1 on the back of the device.  They are labeled in blue.  Next, you would think you’d have to connect to the default IP address in a web browser.

IT DOESN’T MATTER WHAT YOU THINK!

Linksys still uses http://192.168.1.1 as the default, but now, when you browse to it, it immediately redirects you to http://linksyssmartwifi.com.  Which is, of course, the cloud management portal we are going to completely avoid.

So, first things first, you need to edit your local hosts file to map linksyssmartwifi.com to 192.168.1.1.  That way, when you browse to the IP address, you are actually connecting directly to the box.  linksyssetup10After you do that and re-connect to the web browser you should see this screen:

linksyssetup13

At this point, I want you to resist, with every fiber of your being, the desire to check the box and hit “Next”.  Just don’t do it, or I promise you’ll be resetting the device to factory defaults and starting over.  Trust me.  Check the box and select “I want to skip setup and configure my router manually”. It’ll churn for a bit and you’ll see this screen:

linksyssetup21

Click “Login” and you’ll see this:

linksyssetup22

The default password for this device is “admin”, so log in and you’ll see the main dashboard, looking something like this:

linksyssetup18

Again, none of these features are bad, but just not anything we will normally need in a lab setting.  Or, ever, really.  So you’d think that the next step would be to setup bridge mode, right?

IT DOESN’T MATTER WHAT YOU THINK!

Funny enough, if you set up bridge mode now, you lose not only the silly features you didn’t want to begin with, but basic functionality, like being able to change the hostname of the device.  So, before you change the mode, click on the dropdown at the top right, edit the device name and save it.  I know, who cares about hostnames, right?  Well, they all go in my spreadsheet, so I care.

Then go into “Connectivity”, choose the “Internet Settings” tab and change the drop-down to “Bridge Mode”.  After you do, and you manually set the IP address of the device and hit “Apply” you’ll get the following message:

linksyssetup08

Ahh, finally.  Just a normal old wireless access point.  Click “Yes” and you get another big red exclamation point:

linksyssetup09

Browse to the IP address you set, and you should be able to login to the device locally using the admin password.  Now you can do the setup you need to, including the SSID naming and WPA passwords which Chris detailed in his post.

But, for the sake of argument, let’s say you want to go through the setup wizard.  This makes you a strange, crazy person, but okay, here you go…

linksyssetup13

Remember this screen?  The one where I implored you not to click next?  Well, go ahead crazy person.  Click next.  After a couple seconds you’ll get the following error message, which makes sense since you don’t have anything plugged into the Internet port.

linksyssetup14

If you are thinking “Well, I’ll just hit the back button and cancel out of the wizard and do it like Jeramiah first explained…

IT DOESN’T MATTER WHAT YOU THINK!

If you do that, you’ll get to this screen:

linksyssetup17

And, boys and girls, you’ll never get away from it.  Even if you use a custom URL to get in and manage the device, and get it set up the way you want, you’ll get this same splash screen every time you log in, until you reset the router and start over.

Instead, take a second LAN cable and plug it into the “Internet” port on the router.  Give it a couple seconds to connect up, then click “Next”.  It’ll churn for a while, then you’ll get this message:

linksyssetup15

You might think “I bet if I uncheck the box there, it’ll prevent the device from automatically updating firmware…”

IT DOESN’T MATTER WHAT YOU THINK!

No matter what, the wizard is going to look for, and load, any new firmware that is available.  Which is another good reason not to use the wizard, in my opinion.

Finally, you get to enter your wireless information…

linksyssetup01

…and then your device password.

linksyssetup02

After confirming those choices, you’ll get a screen that asks if you want to create a Linksys Smart Wi-Fi Account.  Since you were the one who insisted on using the damn wizard anyway, I’m sure you’d like one, but know that as soon as we put the device into bridge mode it won’t be accessible from the internet anyway.  Click “No thanks” and move on.

linksyssetup04

Now we are back at the main admin page, but this time you can see that we are connected to the internet because of the second cable we added.

linksyssetup06

Change your host name, and then change your internet settings to bridge mode.  Be aware: once you do you’ll have to immediate unplug the LAN cable from the “Internet” port in order to keep the router from becoming an closed loop on the network.  You’d think that moving to bridge mode would either disable the Internet port, or make it “Port 5” on the embedded switch, but…

IT DOESN’T MATTER WHAT YOU THINK!

So there you go.  It’s set up and working.  In my testing, the 5GHz works as advertised, although the range on it isn’t as good as I’d hoped.  I’m going to do some testing with the antennas and see if I can’t get it a little better.

Thanks to Chris for the original post, I hope this helps some of you out there trying to get this thing set up and working with a minimum of lost hair.

21 Jan 18:22

The future of the enterprise engineer will be more than technical

by Keith Townsend
EMC logo

helpwanted

Popular thought is eventually every product gets commoditized. Some industries may take longer than others, but competition normally catches the market leader and costs a driven down. Or more efficient ways of providing a product or service is developed. Vendors then have to find some way to add value around the commoditized product. I’m wondering if enterprise infrastructure engineer is quickly becoming a commodity and what value-add needs to occur to stay happily employed.

We have heard it preached that computer science is one of the sure career bets. You constantly see a need for computer science majors in the workforce. However, IT engineers are under pressure from several fronts. There are Cloud services such as AWS, VMware Cloud Hybrid Service and the host of OpenStack providers competing for your compute responsibilities. Then there are the traditional outsourcing companies such as Accenture, IBM and HP Enterprise Services competing for general IT services. Both of these trends have been going on for a few years now. Adding to these pressures are the hyper-converged infrastructure and Software Defined Networking (SDN).

Our industry is a double-edged sword. One of the most compelling parts of the industry is how quickly it moves, and the need to constantly re-invent self. But it’s this same innovation that can bite the career. We find ourselves in situations where we become as smart as possible to make our own jobs obsolete. Every year it’s a grind to do more with less. And every year the bar is set even higher.

“Hyper-converged Infrastructures” is an interesting development during the recent past. Enterprise technology is unique in that as it matures, the physical size of the equipment shrinks while the performance and capability increases. This has enabled a new class of datacenter devices that combine shared storage, compute and the connection between the nodes (networking). Examples of companies putting out products in this category include Nutanix and SimpliVity. The advantages from a technology perspective are pretty obvious. You eliminate the cost and complexity while achieving great or good enough performance.

Hyper-converged Infrastructures allow administrators to do more with less. Not just less physical resources but also with less engineering staff to support the environment. For the right installation, converged infrastructures can reduce, eliminate or consolidate entire support pillars. Converged Infrastructures are aimed at reducing complexity and costs. SDN also adds a similar type of pressure.

The network will get virtualized. There maybe a question to what extent and which technology. The enterprise will eventually embrace the network overlay. There will always be a need for engineers to manage the physical layer of the network. Just like when carriers implemented overlays such as MPLS, there was a need to continue maintaining the physical underlay. However, it did enable the carriers to reduce the costs of providing services to more customers using less infrastructure and engineers.

SDN will have the same impact on enterprise. The promise of the technology will allow enterprises to offer better service levels with less infrastructure and fewer engineers. This isn’t a new trend in technology. IT engineers ways to do more with less every day. You should be able to look over your resume and see how you’ve enabled an enterprise to leverage technology to do great things while either reducing cost or at least slowing down the growth of IT spending. In other words, the measure of your success is how well you are able to do more with less each day.

This all means the nature of the jobs change with more frequency than most of us would like at times. If you are an engineer in a pillared organization, you should look beyond your pillar organization to a broader skillset. The individual pillars are being broken down providing an efficient infrastructure. The need to have a deeper understanding across IT infrastructure is critical. But that’s the simple part. As engineers, we understand upgrading our technical skill to meet the market. Just upgrading your technical skill over the long period will help you stay employed in technology. With the direction of the market, the job may be with a cloud provider or service company.

What do you do if you enjoy working in end user organizations? To do so, enterprise engineers need to become more familiar with the business of business. They need to add value around the commodity of enterprise service. Enterprise engineers will be required to take business requirements and translate them into technical requirements. Think of the role as a management consultant with a focus on enterprise IT. Today it’s a unicorn of sorts. The team over on Geek Whispers talked about the eluded to this mystical beast. A seasoned IT worker with deep business skill.

I may be biased. My day job is an IT Management Consultant. I get to speak geek and business every day and see the future of the enterprise IT worker consolidating. What are your thoughts? Can you stay a pure geek and thrive in corporate IT long term?