Shared posts

21 Sep 10:45

Felipe wants to read 'My Father Before Me: How Fathers and Sons Influence Each Other Throughout Their Lives'

20 Sep 13:42

David Rader: Improve PostgreSQL on Windows performance by 100%

Crap OS!

It sounds like click-bait, or one of those late night TV promotions – “Improve your database performance by 100% – by changing just this one setting!” But in this case, it’s true – you can drastically improve PostgreSQL on Windows performance by changing one configuration setting – and we made this the default in our Postgres by BigSQL distribution for 9.2 thru 9.6.

tl;dr – if you have high query load, change “update_process_title” to ‘off’ on Windows, and get 100% more throughput.

[Figure: Performance improvement by turning off update_process_title]

Most Postgres DBAs already know that they need to tune settings for shared buffers, WAL segments, checkpoints, etc., to get the best performance from their database. If you are running PostgreSQL on Windows, there’s another setting that you need to look at, specifically “update_process_title”. Changing this setting from “on” to “off” can improve throughput on a heavy query load by close to 100%.

We ran a series of benchmark tests in our performance lab and you can see the dramatic improvement in the graphs displayed. We tested PostgreSQL 9.5 on a 16-core Windows server with fast SSD drives using a standard pgbench run in both read-only and read-write modes. Scaling from 4 to 40 clients shows a plateau in throughput (measured by TPS) after 8 clients when the setting is set to “on”. Changing the update_process_title setting to “off” allows PostgreSQL to continue to scale throughput, showing increasing TPS up to 40 clients. The throughput at 32 read-only clients increases from 20K TPS to 58K TPS (180% higher) and at 40 clients continues to climb to 76K TPS (270% higher).

[Figure: Improvement in read-write transactions turning off update_process_title]

This performance gain is seen for both read-only and read-write workloads. With 32 clients, the write throughput increases from 2,700 TPS to 7,700 TPS (180% higher) and at 40 clients continues to climb to 8,200 (200% higher).

The update_process_title setting controls whether Postgres updates the process description, which you see when querying the system list of running commands, to reflect the current SQL statement being processed. On Linux you view this with ps; on Windows it requires the Process Explorer tool. Updating the process description becomes a bottleneck on Windows and limits throughput even on a high-end server. Not many Windows admins actually use this information on a regular basis, so unless you are actively debugging a slow or long-running process using this process information, you should leave this turned off.

Takayuki Tsunakawa originally tracked down this bottleneck and created a patch for PostgreSQL 9.6 that has changed the default to be ‘off’ on Windows. We have made the same setting change in BigSQL distributions of version 9.2 thru 9.5 as well as 9.6. So even if you’re not ready to move to the new 9.6 version, when you install Postgres by BigSQL on Windows you are getting the best performance out-of-the-box.

 

20 Sep 13:40

Natalie added 'What Some Of You Were'

Natalie gave 4 stars to What Some Of You Were (Paperback) by Christopher Keane
bookshelves: good-theology
A compilation of stories that have originated with Liberty Christian Ministries. They specialise in dealing with Christians (and others) struggling with same-sex attraction.

The homosexuality issue seems to have become seriously controversial in Christian circles and many avoid discussing it altogether. The Church of England has sadly compromised its biblical stance, as have many other denominations. The result is that some Christians are confused about where they should stand and others think that if they pretend the issue doesn't exist it will go away. There is also a climate of fear about expressing biblical views on the issue due to the general direction of the law-makers. But this sin with all its associated damage will not just suddenly vanish any more than any other sin. We must not give in to groups like Stonewall who want to abolish Christian views on the subject altogether.

There is a difference between saying homosexuality is unacceptable to God and being homophobic. When one says the Bible stands against the performance of homosexual acts, that is not homophobic. It is not vilification to say that God rejects some human sexual behaviour. And it is no more homophobic than rejecting Christ and his messengers is Christophobic. Paul condemns a range of human behaviours (1 Corinthians 6:9-11). Homosexuality is on the list, alongside drunkenness and greed. Paul does not suggest one is better or worse than the other; these kinds of actions are incompatible with being the people of God.

This book will help Christians to better understand same-sex attraction and how to interact with those who are struggling and desire to change. There is a big difference between someone who says that they are gay, Christian and proud and someone who acknowledges their difficulty and desires to deal with it biblically. The latter are the people that really need our support and help, not our condemnation and judgement. All of us are prone to weakness and temptation; why is homosexuality suddenly the big taboo?

Let's make it clear. Heterosexual adultery is no more and no less of a sin than homosexuality. Christians do not argue for heterosexuality but for obedience to God. That has implications for all of us, no matter what our gender orientation.

The book has testimonies from those who are attempting to live biblically despite their same-sex attraction either through praying and hoping for change in their orientation or through life-long celibacy. There are success stories for both approaches. It also documents a parental perspective--when a child comes out as gay. Also when a husband or wife tells their spouse they are struggling.

I particularly liked the appendix chapters which deal with

1. How we went gay (as a nation)
2. Is homosexuality biologically determined?
3. On homosexuality and change
4. Homosexuality in the New Testament

This is a good book that will not only encourage Christians to give proper support to those struggling but will clarify where Christians should stand on the issue of homosexuality.
19 Sep 16:11

Fashion Police and Grammar Police

* Mad about jorts
19 Sep 14:09

Felipe Sabino liked a review

The Fundamentals by R.A. Torrey
"In the early part of the twentieth century a controversy in the church erupted between the modernists and the fundamentalists. The fundamentalists got their name from the fact that they published an enormous number of booklets defending "the fundamentals," by which they mean the infallibility of the Bible, the deity of Christ, etc. Those booklets were published in four volumes, and I have to say that I am quite proud of my fundamentalist fathers. They acquitted themselves well, and with great learning. They were (to my view) a tad too accommodating with some things (e.g. age of the earth), but for the most part they held the line wonderfully.

The last essay in Volume 1 has this marvelous line, addressing the modernists, who haven't changed a bit in the last century.

"A striking characteristic of these people is a persistent ignoring of what is written on the other side."

The date marks my completion of the first volume. I will note when I finish the other three volumes in the body of the text here. Finished the second volume in September 2016."
16 Sep 13:18

Eduardo Maçan shared Raio Privatizador's photo.

by Eduardo Maçan

And get rid of these lousy computers, the internet, and smartphones too!

¯\_(ツ)_/¯

EDIT: And the looms! Destroy the looms!
EDIT2: Bring back the elevator operators!
EDIT3: And the typists too!
EDIT4: No... not the typists... they will bring computers with them.

15 Sep 18:35

Back to paper and fountain pens

by Rodrigo Gurgel

I have a terrible habit of going against the grain of my time. Actually, it is not terrible but healthy. There are ways of thinking, values, books, behaviors, and habits that are beginning to be forgotten today but have not lost their importance. Some people may have forgotten them completely, yet they keep sending unmistakable signals that, if recovered, they have the power to improve our lives.

I am a technology enthusiast. New software that is connected in some way to writing or reading always attracts me. And thanks to technology, my work has become less tiring in recent years, because I was able to eliminate steps that wasted my time, such as copying into a Word file the notes with which I always fill the books I read. E-book readers have become not just useful but indispensable.

Still, the omnipresence of the keyboard bothered me. Moving away from a form of writing that made me draw the words came to seem, from a certain point on, an aesthetic loss, all the more so for me, since I have always loved fountain pens, the texture of different papers, and the smell and colors of inks. There was a sensory loss that troubled me.

What had been a vague impression, an imprecise discomfort, took shape when I read the study by Pam A. Mueller and Daniel M. Oppenheimer on how taking notes on laptops results in shallower processing of ideas. At first I distrusted the study: wasn't this just one more apocalyptic conclusion? Later, reflecting and comparing the researchers' conclusions with what so many writers say, I began to question my judgment: no, I concluded, going back to writing with fountain pens was not just nostalgia, even if that feeling was present.


The only way to discover the effects of writing by hand was to return to the old instruments, and when I decided to do so I found that, yes, I developed ideas more easily and more quickly. The text flowed with a speed I had forgotten.

It may sound pedantic, but using a fountain pen again, seeing the shape of the letters on paper, feeds a kind of pleasure. Everything seems more real to me, more alive. I am connected to my own self in a clearer, more intense way. The very cadence of the hand over the paper, drawing the signs that have been with me since childhood, when my mother taught me to write on pieces of paper dusted with flour: all of it makes me more productive, closer to my own nature. Writing has ceased to be only a craft, a craft I am proud of, and has become a form of comfort as well.

I am convinced that pens and paper have restored a partially lost connection between my thoughts and language, the crafting of the text. Writing by hand may produce other kinds of synapses. Or perhaps I am only dreaming. But my writing now certainly reflects my personality better.


09 Sep 15:11

Wrong

Hang on, I just remembered another thing I'm right about. See...
08 Sep 13:21

[$] An asynchronous Internet in GNOME

by n8willis

At GUADEC 2016 in Karlsruhe, Germany, Jonathan Blandford challenged the GNOME project to rethink how its desktop software uses network access. The GNOME desktop assumes Internet connectivity is always available, which has the side effect of making the software stack considerably less useful and, indeed, usable to people who live in those places regarded as the developing world.

08 Sep 13:15

Julia Reda, MEP: "Proprietary Software threatens Democracy"


Julia Reda ended the QtCon, a conference for the Free Software community, with a closing keynote on, among other things, Free Software in the European Public Sector.

Ms Reda, a member of the EU Parliament for the Pirate Party, explained how proprietary software, software that forbids users from studying and modifying it, has often left regulators in the dark, becoming a liability for and often a threat to the well-being and health of citizens.

An example of this, she said, is the recent Dieselgate scandal, in which automobile manufacturers installed software that cheated instruments that measured fumes in test environments, only to spew illegal amounts of toxic exhaust into the atmosphere the moment they went on the road.

Ms Reda also explained how medical devices running proprietary software posed a health hazard for patients. She gave the example of a woman with a pacemaker who collapsed while climbing some stairs due to a bug in her device. Doctors and technicians had no way of diagnosing and correcting the problem as they did not have access to the code.

Also worrying is the threat that software with restrictive licenses poses to democracy itself. The trend of substituting traditional voting ballots with voting machines is especially worrying, because, as these machines are not considered a threat to national security, their software also goes unaudited and is, in fact, unauditable in most cases.

And, although voting machines are built and programmed by private companies, they are commissioned by public entities and paid for with public money, money taken from citizens' taxes. However, there are no universal EU regulations that force companies, or, indeed, public organisations, to make the source code available to the citizens that have paid for it, said Ms Reda.

Furthermore, she noted that, despite the fact Free Software technologies (web servers, CMSs, email servers, and so on) are used extensively throughout the public administration, the public sector assumes very little responsibility in the way of giving back to the community via patches or even bug reports.

Ms Reda said that the solution to this very dismal state of affairs is a multi-pronged one. She commended the Free Software Foundation Europe for its work in advocating that all software commissioned by public entities and paid for with public money be made available under free/libre licenses for everyone. She also noted that to get governments on the side of Free Software it is essential to make them see its merits.

Only like this, she said, would it be possible to make legislators regulate coherently in favour of free/libre technologies.


02 Sep 17:13

Douglas added '"But God...": The Two Words at the Heart of the Gospel'

Douglas gave 4 stars to "But God...": The Two Words at the Heart of the Gospel (Paperback) by Casey Lute
This is a very encouraging little book, pointing to two words that function as the hinge of all gospel living. There is a problem, a dilemma, a crisis, a trouble, and God’s people are up against it. This happens time and again in the Scriptures. And the next two words are but God . . .

Casey Lute walks through the Scriptures, pointing to nine key instances of this. He starts with Noah. “But God remembered Noah and all the beasts and all the livestock that were with him in the ark. And God made a wind blow over the earth, and the waters subsided” (Gen. 8:1, ESV).

The God of the Bible is the God who saves, and He saves us in the middle of the story. The problem is sketched out and made clear, and then, when all hope is lost, we hear the words but God. God loves cliffhangers.

But God remembered Noah. But God delivered Israel at the Red Sea. But God raised Jesus from the dead. This book is a quick read, but quite meaty for all that.
02 Sep 17:13

Frederico wants to read 'A Fênix Islamista: O Estado Islâmico e a Reconfiguração do Oriente Médio'

02 Sep 17:13

Frederico wants to read 'A Batalha do Avaí - A beleza da barbárie: a Guerra do Paraguai pintada por Pedro Américo'

02 Sep 17:13

Felipe wants to read 'On the Government of God'

On the Government of God by Salvian
02 Sep 15:55

Gripen Image Of The Month

by Saab AB
Gripen's high operational availability, rapid turnaround and minimal support requirements deliver more time in the air.

Photo: Jörgen Nilsson (jn_photo.se)

Published: 9/1/2016 6:23 AM
02 Sep 15:55

Felipe Sabino liked a review

02 Sep 15:55

Felipe Sabino liked a review

01 Sep 18:02

Dan Robinson: When To Avoid JSONB In A PostgreSQL Schema

PostgreSQL introduced the JSONB type in 9.4 with considerable celebration. (Well, about as much as you can expect for a new data type in an RDBMS.) It’s a wonderful feature: a format that lets you store blobs in the lingua franca of modern web services, without requiring re-parsing whenever you want to access a field, and in a way that enables indexing for complicated predicates like containment of other JSON blobs. It meaningfully extends PostgreSQL and makes it a viable choice for a lot of document store workflows. And it fits nicely in a startup engineering context: just add a properties column to the end of your table for all the other attributes you might want to store down the road, and your schema is now officially Future Proof™.

We lean on JSONB heavily at Heap, and it’s a natural fit, as we have APIs that allow customers to attach arbitrary properties to events we collect. Recently, I’ve gotten a few questions about the benefits and drawbacks of using JSONB to store the entirety of a table – why have anything but an id and a data blob?

The idea of not having to explicitly manage a schema appeals to a lot of people, so it shouldn’t be surprising to see JSONB used this way. But there are considerable performance costs to doing so, some of which aren’t immediately obvious. There is great material for deciding which of JSON, JSONB, or hstore is right for your project, but the correct choice is often “none of the above.” [1] Here are a few reasons why.

Hidden Cost #1: Slow Queries Due To Lack Of Statistics

For traditional data types, PostgreSQL stores statistics about the distribution of values in each column of each table, such as:

  • the number of distinct values seen
  • the most common values
  • the fraction of entries that are NULL
  • for ordered types, a histogram sketch of the distribution of values in the column

For a given query, the query planner uses these statistics to estimate which execution plan will be the fastest. For example, let’s make a table with 1 million “measurements” of three values, each chosen at uniform random from {0, 1}. Each measurement was taken by one of 10,000 scientists, and each scientist comes from one of three labs:

Let’s say we want to get the tick marks in which all three values were 0 — which should be about 1/8th of them — and see how many times each lab was represented amongst the corresponding scientists. Our query will look something like this:

And our query plan will look something like this: https://explain.depesz.com/s/H4oY

This is what we’d hope to see: the planner knows from our table statistics that about 1/8th of the rows in measurements will have value_1, value_2, and value_3 equal to 0, so about 125,000 of them will need to be joined with a scientist’s lab, and the database does so via a hash join. That is, load the contents of scientist_labs into a hash table keyed on scientist_id, scan through the matching rows from measurements, and look each one up in the hash table by its scientist_id value. The execution is fast — about 300 ms on my machine.
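The hash join described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PostgreSQL's implementation and not the post's original listing: the tuple layouts, the lab names, and the helper names `make_demo_data` and `hash_join_lab_counts` are hypothetical, and the demo uses 100,000 rows rather than 1 million to keep it quick.

```python
import random
from collections import Counter


def make_demo_data(n_rows=100_000, n_scientists=10_000, seed=0):
    """Build small in-memory stand-ins for the two tables (hypothetical layout)."""
    rng = random.Random(seed)
    labs = ["lab_a", "lab_b", "lab_c"]  # hypothetical lab names
    # scientist_labs rows: (scientist_id, lab_name)
    scientist_labs = [(sid, labs[sid % 3]) for sid in range(n_scientists)]
    # measurements rows: (value_1, value_2, value_3, scientist_id)
    measurements = [
        (rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1),
         rng.randrange(n_scientists))
        for _ in range(n_rows)
    ]
    return measurements, scientist_labs


def hash_join_lab_counts(measurements, scientist_labs):
    """Count labs for all-zero measurements using a build/probe hash join."""
    # Build phase: load the small table into a hash table keyed on scientist_id.
    lab_by_scientist = {sid: lab for sid, lab in scientist_labs}
    counts = Counter()
    # Probe phase: scan the large table once, filter, and look each row up.
    for value_1, value_2, value_3, sid in measurements:
        if value_1 == 0 and value_2 == 0 and value_3 == 0:
            counts[lab_by_scientist[sid]] += 1
    return counts


measurements, scientist_labs = make_demo_data()
counts = hash_join_lab_counts(measurements, scientist_labs)
matched = sum(counts.values())
print(matched / len(measurements))  # roughly 1/8 of the rows pass the filter
```

The point of the sketch is the cost shape: one pass to build the hash table, one pass over the large table, and a constant-time lookup per matching row.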

Let’s say we instead store our measurements as JSONB blobs, like this:

The analogous read query would look like this:

The performance is dramatically worse — a whopping 584 seconds on my laptop, about 2000x slower: https://explain.depesz.com/s/zJiT

The underlying reason is that PostgreSQL doesn’t know how to keep statistics on the values of fields within JSONB columns. It has no way of knowing, for example, that record ->> 'value_2' = 0 will be true about 50% of the time, so it relies on a hardcoded estimate of 0.1%. So, it estimates that 0.1% of 0.1% of 0.1% of the measurements table will be relevant (which it rounds up to ~1 row). As a result, it chooses a nested loop join: for each row in measurements that passes our filter, look up the corresponding lab_name in scientist_labs via the primary key of the latter table. But since there are ~125,000 such measurements, instead of ~1, this turns out to take an eternity. [2]
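The arithmetic behind that misestimate is easy to reproduce. The sketch below is a simplified model, not the planner's actual code: the function name `estimated_rows` is hypothetical, and clamping the result to at least one row is an assumption chosen to match the "rounds up to ~1 row" behaviour described above.

```python
DEFAULT_SELECTIVITY = 0.001  # the 0.1% hardcoded estimate quoted in the text
TRUE_SELECTIVITY = 0.5       # each value is 0 about half of the time


def estimated_rows(table_rows, selectivity, n_predicates):
    """Rows expected to pass n independent predicates, clamped to at least 1."""
    raw = table_rows * selectivity ** n_predicates
    return max(1, round(raw))


table_rows = 1_000_000
planner_guess = estimated_rows(table_rows, DEFAULT_SELECTIVITY, 3)
actual = estimated_rows(table_rows, TRUE_SELECTIVITY, 3)
print(planner_guess, actual, actual // planner_guess)
```

With three predicates the default guess collapses to a single row while the true figure is about 125,000, which is why a per-row index lookup looks cheap to the planner and turns out to be expensive.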

As always, accurate statistics are a critical ingredient to good database performance. In their absence, the planner can’t determine which join algorithms, join orders, or scan types will make your query fast. The result is that innocent queries will blow up on you. This is one of the hidden costs of JSONB: your data doesn’t have statistics, so the query planner is flying blind.

This is not an academic consideration. This caused production issues for us, and the only way to get around them was to disable nested loops entirely as a join option, with a global setting of enable_nestloop = off. Ordinarily, you should never do something like that.

This probably won’t bite you in a key-value / document-store workload, but it’s easy to run into this if you’re using JSONB along with analytical queries.

Hidden Cost #2: Larger Table Footprint

Under the hood, PostgreSQL’s JSON datatype stores your blobs as strings that it happens to know are valid JSON. The JSONB encoding has a bit more overhead, with the upside that you don’t need to parse the JSON to retrieve a particular field. In both cases, at the very least, the database stores each key and value in every row. PostgreSQL doesn’t do anything clever to deduplicate commonly occurring keys.

Using the above measurements table again, the initial non-JSONB version of our table takes up 79 MB of disk space, whereas the JSONB variant takes 164 MB, more than twice as much. That is, the majority of our table contents are the strings value_1, value_2, value_3, and scientist_id, repeated over and over again. So, in this case, you would need to pay for twice as much disk, not to mention follow-on effects that make all sorts of operations slower or more expensive. The original schema will cache much better, or might fit entirely in memory. The smaller size means it will also require half as much I/O for large reads or maintenance operations.
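To get a feel for how much of each record is just repeated key names, here is a rough sketch. It measures compact JSON text, which is only a proxy: the JSONB on-disk encoding is different and is not modelled here. The helper name `key_share` and the sample record are hypothetical.

```python
import json


def key_share(record):
    """Fraction of the compact JSON text taken up by key names (quotes excluded)."""
    encoded = json.dumps(record, separators=(",", ":"))
    key_chars = sum(len(key) for key in record)
    return key_chars / len(encoded)


# Hypothetical record shaped like one row of the measurements example.
sample = {"value_1": 0, "value_2": 1, "value_3": 0, "scientist_id": 1234}
share = key_share(sample)
print(f"{share:.0%} of the encoded record is key names")
```

Even in this small example the key names take up more than half of the encoded text, which is consistent with the size difference reported above.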

For a less contrived anecdote, we found a disk space savings of about 30% by pulling 45 commonly used fields out of JSONB and into first-class columns. On a petabyte-scale dataset, that turns out to be a pretty big win.

As a rule of thumb, each column costs about 1 bit of overhead for each row in your table, regardless of whether the column’s value is null.[3] So, for example, if an optional field is going to have a ten-character key in your JSONB blobs, and thus cost at least 80 bits to store the key in each row in which it’s present, it will save space to give it a first-class column if it’s present in at least 1/80th of your rows.
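The break-even point in that rule of thumb can be written out as a small calculation. This is a sketch of the rule as stated, not a storage model: it counts only the key bytes on the JSONB side and the roughly one bit per row on the column side, and the function names are hypothetical.

```python
def break_even_fraction(key_chars, column_overhead_bits=1, bits_per_char=8):
    """Fraction of rows above which a first-class column is the smaller option."""
    key_bits = key_chars * bits_per_char
    return column_overhead_bits / key_bits


def cheaper_as_column(key_chars, fraction_present):
    """True when storing the field as a column costs no more than repeating its key."""
    return fraction_present >= break_even_fraction(key_chars)


print(break_even_fraction(10))      # 1/80 for a ten-character key
print(cheaper_as_column(10, 0.02))  # present in 2% of rows
print(cheaper_as_column(10, 0.01))  # present in 1% of rows
```

Longer keys push the break-even fraction lower, so fields with verbose names pay off as columns even when they are quite sparse.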

For datasets with many optional values, it is often impractical or impossible to include each one as a table column. In cases like these, JSONB can be a great fit, both for simplicity and performance. But, for values that occur in most of your rows, it’s still a good idea to keep them separate.

In practice, there is often additional context to inform how you organize your data, such as the engineering effort required to manage explicit schemas or the type safety and SQL readability benefits from doing so. But there is often an important performance penalty as well for unnecessarily JSONB-ing your data.

Know another innocuous change with big performance implications? Ping me @danlovesproofs.

We’re constantly evaluating alternative schemas and indexing strategies for serving ad hoc queries across hundreds of billions of events. Interested in working with us? Shoot us a note at jobs@heapanalytics.com.

[1] I recommend this post, for starters: https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/
[2] As an aside, explain.depesz is a wonderful tool for finding problems like these in your queries. You can see in this example that the planner underestimated how many rows would be returned by this subquery by a factor of 124,616.
[3] This isn’t quite correct. PostgreSQL allocates one byte per row for the first 8 columns, and then 8 bytes / 64 columns at a time after that. So, for example, your first 8 columns are free, and the 9th costs 8 bytes per row in your table, and then the 10th through 72nd are free, and so forth. (H/t Michael Malis for the investigation into this.)
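The stepwise pattern in footnote [3] can be reproduced with a short calculation. This is a sketch under assumptions that are not stated in the post: a 23-byte fixed row header, a null bitmap of one bit per column that is always present, and padding to 8-byte alignment as on a typical 64-bit build. The function name `tuple_header_bytes` is hypothetical.

```python
def tuple_header_bytes(n_columns, fixed_header=23, align=8):
    """Per-row header size: fixed header plus null bitmap, padded to the alignment."""
    bitmap_bytes = (n_columns + 7) // 8      # one bit per column, rounded up
    unpadded = fixed_header + bitmap_bytes
    return -(-unpadded // align) * align     # round up to a multiple of `align`


for n in (1, 8, 9, 72, 73):
    print(n, tuple_header_bytes(n))
```

Under these assumptions the header stays at one size through 8 columns, grows by 8 bytes at the 9th, and stays flat again through the 72nd, which matches the pattern the footnote describes.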
30 Aug 16:59

Felipe commented on Leandro's review of Panaceia

New comment on Leandro's review of Panaceia
by Rodrigo Constantino

Please read Mainardi's novels, especially:

A queda
Malthus
Arquipélago
Contra o Brasil
Polígono das Secas
29 Aug 21:18

Gene Wilder's Best Joke

29 Aug 15:22

Unicode

I'm excited about the proposal to add a "brontosaurus" emoji codepoint because it has the potential to bring together a half-dozen different groups of pedantic people into a single glorious internet argument.
26 Aug 18:53

Linear Regression

The 95% confidence interval suggests Rexthor's dog could also be a cat, or possibly a teapot.
25 Aug 10:19

Douglas added 'Till We Have Faces'

Douglas gave 5 stars to Till We Have Faces (Paperback) by C.S. Lewis
bookshelves: fiction
Stupendous. World class. Top drawer.

Finished an audio version of it in August of 2016. I have read this a total of three times. Once when I was young, and I didn't like it. The second time was in 2003, and I thought it was great. This time, and greater still.
24 Aug 18:59

Leonardo commented on Leonardo's review of Cornelius Van Til: o homem e o mito

New comment on Leonardo's review of Cornelius Van Til: o homem e o mito
by John W. Robbins

Some authors, when writing a critical analysis of another author they disagree with, inspire respect and reflection in their readers; Robbins is not one of them! A few factors led me to give it 2 stars (and not 1): 1) The cover is attractive. 2) I agree with his criticism of Van Til's style and way of communicating. 3) I like the publisher.
For me, these are the only positive points of the work.
My criticisms are: 1) His ironic, mocking, and at times disrespectful style. 2) His quoting of loose, out-of-context passages from Van Til in order to criticize a doctrine. 3) His misunderstanding of Van Til on some points, especially regarding his view of the knowledge of God. 4) In criticizing some for being "fans" of the Van Til "myth", he shows himself to be the same thing with respect to Clark. 5) His superficial analysis and criticism of Van Til. I believe Clark himself would disapprove of this work.

The tone of the writing resembles a petulant child defending his favorite superhero.
20 Aug 23:22

Can We Lose Salvation in Christ?

by Solano Portela
Hebrews 6:4-8

Someone asked me whether Hebrews 6:4-6 rules out the possibility of a repentant former Christian returning to walk in the ways of the Lord. The question was: “To whom does this passage refer?” Another question raised by this text is whether salvation can be lost. We have certainly heard many people say that we can lose it. In fact, some call assurance of salvation “the pride of believers”.

This notion is present in some denominations of Arminian theological orientation. Many value maintaining a sense of insecurity out of a “need to preserve the Christian life”. In my understanding, and as Reformation theology teaches, this reflects a failure to understand what the Bible teaches about the doctrine of salvation. Yet some Reformed believers are surprised to hear that the Westminster Confession of Faith presents “assurance of salvation” as something additional, not belonging to the essence of true faith (WCF, ch. 18, 3, and Larger Catechism, Questions 81 and 172). The question, then, is not whether “subjective”, or personal, “assurance” exists, but whether someone who is truly saved can lose that salvation. After all, one of the main points of biblical and Reformation theology is the Perseverance of the Saints.

Texts such as Heb 6:4-6 can leave us somewhat confused if they are studied superficially, outside the overall context of the Word of God. We therefore need to try to understand some biblical passages that seem to run counter to the doctrine of the Perseverance of the Saints.


1. Two difficult passages: Hebrews 6:4-6 and 2 Peter 2:20-22 are two passages that are hard to understand, but let us examine each of them.


Some “taste the heavenly gift” and fall away, Heb 6:4-6 tells us: (4) For it is impossible, in the case of those who have once been enlightened, who have tasted the heavenly gift, and have become partakers of the Holy Spirit, (5) and have tasted the good word of God and the powers of the age to come, (6) and have fallen away, yes, it is impossible to renew them again to repentance, since they are crucifying the Son of God again for themselves and exposing him to public disgrace.


Some “escape” the world, “know”, and yet return to error, 2 Pet 2:20-22 tells us: (20) Therefore if, after having escaped the defilements of the world through the knowledge of the Lord and Savior Jesus Christ, they let themselves be entangled again and are overcome, their last state has become worse than the first. (21) For it would have been better for them never to have known the way of righteousness than, after knowing it, to turn back, departing from the holy commandment that had been given to them. (22) What a certain true proverb says has happened to them: The dog returned to its own vomit; and: The washed sow returned to wallowing in the mire.


2.     As Exposição de João sobre o assunto – Precisamos estar cientes da exposição sobre a salvação que o apóstolo João faz no capítulo 10.26-28, do seu Evangelho e em suas três cartas universais, antes de abordar estes versos difíceis.


O Espírito Santo moveu João a registrar nesse trecho (João 10.26-28) o fundamento da doutrina da Perseverança dos Santos - “Ninguém as arrebatará”: (26) Mas vós não credes, porque não sois das minhas ovelhas. (27) As minhas ovelhas ouvem a minha voz; eu as conheço, e elas me seguem. (28) Eu lhes dou a vida eterna; jamais perecerão, e ninguém as arrebatará da minha mão. Somos seguros nas mãos de Cristo não por nosso próprio poder, mas pelo poder de Deus. Salvação não é apenas um ato de vontade nossa, mas a vontade é um reflexo e resposta à obra de Cristo na Cruz; de sua vitória sobre a morte, na ressurreição; e ao chamado eficaz do Espírito Santo em nossos corações.


Na sua primeira carta, João deixa claro que existem pessoas agregadas ao povo de Deus, mas que nunca fizeram parte dos verdadeiramente salvos pelo poder de Cristo. “Saíram de nós, mas não eram dos nossos”, diz ele. Em 1 João 2.18-20, temos: (18) Filhinhos, já é a última hora; e, como ouvistes que vem o anticristo, também, agora, muitos anticristos têm surgido; pelo que conhecemos que é a última hora. (19) Eles saíram de nosso meio; entretanto, não eram dos nossos; porque, se tivessem sido dos nossos, teriam permanecido conosco; todavia, eles se foram para que ficasse manifesto que nenhum deles é dos nossos. (20) E vós possuís unção que vem do Santo e todos tendes conhecimento.


Em sua segunda carta João fala daqueles que não se prendem à doutrina de Cristo. Ele se referia à pessoa que “ultrapassa a doutrina” e diz que esse não tem Deus. Em 2 Jo 9, ele escreve: Todo aquele que ultrapassa a doutrina de Cristo e nela não permanece não tem Deus; o que permanece na doutrina, esse tem tanto o Pai como o Filho. Entendemos que esse “ultrapassar” é o mesmo curso abordado por Paulo em Gálatas 1.8. Significa “ir além”, pregar um “outro evangelho”. A esses Paulo reserva palavras duras. Mesmo estando no meio do Povo de Deus, Paulo alerta seus leitores contra eles, e conclui que qualquer um que “...vos pregue evangelho que vá além do que vos temos pregado, seja anátema”. Ou seja, seja amaldiçoado aquele que ultrapassa a doutrina.


In the third letter, John draws attention to the fact that a truly transformed, truly saved life does not remain in sin. Those who claim to be saved but show no transformation of life (and, unfortunately, there are always such people in the midst of the People of God) have never seen God. In 3 John 11 he speaks of this persistence in doing evil: Beloved, do not imitate what is evil, but what is good. Whoever does good is from God; whoever does evil has never seen God.


3.     The context of the difficult passages. But let us return to our difficult passages, now examining the immediate context in which they are found.


Hebrews 6.4-6 cannot be read in isolation from verses 7 and 8. These verses say: (7) For the land that absorbs the rain that frequently falls on it and produces useful vegetation for those for whom it is also cultivated receives a blessing from God; (8) but if it produces thorns and thistles, it is rejected and close to being cursed, and its end is to be burned.


In these two verses we see the harmony of the text in Hebrews with the parable of the sower, found in Matthew 13.18-23. Verse 22 is especially important. The passage in Hebrews is not speaking of the truly saved, but of those sown among thorns. They receive the rain, the same rain that falls on the fertile land; that is, they “taste the gift”, in the sense that they share in the blessings, but they are choked by the cares of the world. Many appear to follow the gospel and take part in the gatherings and activities of the churches, but their heart is not regenerated. The text in Hebrews does not speak of “loss of salvation”, nor of a believer who falls into sin being unable to repent, but of those who show, over time, that they were never converted.


Similarly, 2 Peter 2.20-22 must be read starting from verse 9. Those who “escape” the world and come to know the way, yet return to error, are contrasted with the “godly”. Peter’s text says: (9) For the Lord knows how to deliver the godly from trial and to keep the unjust under punishment for the day of judgment; (10) especially those who, following the flesh, walk in filthy passions and despise all authority. Bold and arrogant, they are not afraid to slander higher authorities, (11) whereas angels, although greater in strength and power, do not bring a slanderous judgment against them in the presence of the Lord. (12) These, however, like irrational brutes, naturally made for capture and destruction, speaking evil of what they are ignorant of, will also be destroyed in their destruction, (13) receiving injustice as the wages of the injustice they practice. Regarding their carnal luxury in broad daylight as pleasure, like blots and blemishes, they revel in their own deceptions while they feast with you; (14) having eyes full of adultery and insatiable for sin, enticing unstable souls, having a heart trained in greed, accursed children; (15) forsaking the right way, they have gone astray, following the way of Balaam, son of Beor, who loved the reward of injustice (16) (he received, however, punishment for his transgression, namely, a mute beast of burden, speaking with a human voice, restrained the prophet’s madness). (17) These are like springs without water, like mists driven by a storm. For them the blackness of darkness is reserved; (18) for, uttering boastful words of vanity, they entice with carnal passions, through their debauchery, those who were on the verge of escaping from those who walk in error, (19) promising them freedom, while they themselves are slaves of corruption, for a person is a slave of whoever has overcome him.


The passage is clearly speaking of the ungodly. They came to know the way of righteousness because they physically joined the church, associated with the People of God, and heard countless expositions of the Word. Yet in their spirit and in their works they never experienced conversion. That is why, in verses 20 to 22, they are compared to the dog that returns to its vomit and the washed sow that returns to the mire. They lived in a clean environment, filled with teachings beneficial to life. They bear the condemnation of having rejected all of this, and the Lord of Glory.


Conclusion:

The Word of God makes clear, in many passages, that we are saved forever. That is why Paul teaches in Philippians that He who began the work in us is able to complete it (Philippians 1.6). Obviously, this is not an encouragement to sin, but a reason for thanksgiving: we are saved by the power of God and preserved by that same power, not by our own frail strength.


The two passages we examined, though difficult, can be understood through John’s explanations and through the immediate context of the passages. We must trust in the preserver of our salvation and must never be shaken or allow doubts to be placed in our minds.



Solano Portela

11 Aug 01:21

FSF Blogs: Support the Libre Tea Computer Card, a candidate for Respects Your Freedom certification

They write:

"Now imagine if you owned a computing device that you could easily fix yourself and inexpensively upgrade as needed. So, instead of having to shell out for a completely new computer, you could simply spend around US$50 to upgrade — which, by the way, you could easily do in SECONDS, by pushing a button on the side of your device and just popping in a new computer card. Doesn’t that sound like the way it should be?"

This project certainly sounds appealing, but only if the computer hardware is designed and configured to run software that does as much as possible to respect your freedom and ensure your control over your device. Fortunately, one option you have when backing this project is to purchase a Libre Tea Computer Card. After working closely with the developers and reviewing a sample test board, we are confident that their plans are to create a device that can achieve our Respects Your Freedom (RYF) certification. And because the project is running their crowdfunding on Crowd Supply, users can financially support them anonymously and without the use of proprietary Javascript.

The project is being developed by Luke Kenneth Casson Leighton of Rhombus-Tech and is sponsored by Christopher Waid of ThinkPenguin, a company that sells multiple RYF-certified hardware products. It is exciting to see passionate free software advocates in our community working with OEMs to produce a computer hardware product capable of achieving RYF certification. We hope that this is the first of many computing systems they are able to design and build that respect your freedom.

The Libre Tea Computer Card is built with an Allwinner A20 dual core processor configured to use the main CPU for graphics; it has 2 GB of RAM and 8 GB of NAND Flash; and it will come pre-installed with Parabola GNU/Linux-libre, an FSF-endorsed fully-free operating system.

We encourage you to back the Libre Tea Computer Card. We'll have to do another evaluation once it is actually produced to be sure it meets our certification standards, but we have high hopes. Their funding deadline is August 26th, so don't delay!

09 Aug 19:16

The People’s Code (White House blog)

by ris
US Chief Information Officer Tony Scott introduces the Federal Source Code Policy, on the White House blog. "By making source code available for sharing and re-use across Federal agencies, we can avoid duplicative custom software purchases and promote innovation and collaboration across Federal agencies. By opening more of our code to the brightest minds inside and outside of government, we can enable them to work together to ensure that the code is reliable and effective in furthering our national objectives. And we can do all of this while remaining consistent with the Federal Government’s long-standing policy of technology neutrality, through which we seek to ensure that Federal investments in IT are merit-based, improve the performance of our government, and create value for the American people." (Thanks to David A. Wheeler)
02 Aug 17:36

Simon Riggs: Thoughts on Uber’s List of Postgres Limitations

An Uber technical blog post of July 2016 described the perception of “many Postgres limitations”. Regrettably, a number of important technical points are either not correct or not wholly correct because they overlook many optimizations in PostgreSQL that were added specifically to address the cases discussed. In most cases, those limitations were actually true in the distant past of 5-10 years ago, which leaves the impression of comparing MySQL as it is now with PostgreSQL as it was a decade ago. This is no doubt because the post was actually written some time, perhaps years, ago and only recently published.

This document looks in detail at those points to ensure we have detailed information available for a wider audience, so nobody is confused by PostgreSQL’s capabilities.

Having said that, I very much welcome the raising of those points and also wish to show that the PostgreSQL project and 2ndQuadrant are responsive to feedback. To do this, detailed follow-ups are noted for immediate action.

These points were noted in the blog:
* Poor replica MVCC support
* Inefficient architecture for writes
* Inefficient data replication
* Difficulty upgrading to newer releases

Poor replica MVCC support

“If a streaming replica has an open transaction, updates to the database are blocked if they affect rows held open by the transaction. In this situation, Postgres pauses the WAL application thread until the transaction has ended.”

This is true, though it misses the point that a parameter exists to control that behaviour, so that when
hot_standby_feedback = on
the described behaviour does not occur in normal circumstances. This has been supported since PostgreSQL 9.1 (2011). If you’re not using it, please consider doing so.

Later, this comment leads to the conclusion “Postgres replicas … can’t implement MVCC” which is wholly incorrect and a major misunderstanding. PostgreSQL replicas certainly allow access to data with full MVCC semantics.

Inefficient architecture for writes

“If old transactions need to reference a row for the purposes of MVCC MySQL copies the old row into a special area called the rollback segment.”

“This design also makes vacuuming and compaction more efficient. All of the rows that are eligible to be vacuumed are available directly in the rollback segment. By comparison, the Postgres autovacuum process has to do full table scans to identify deleted rows.”

Moving old rows to a rollback segment adds time to the write path for UPDATEs, but that point isn’t mentioned. PostgreSQL has a more efficient architecture for writes in relation to MVCC because it doesn’t need to do as many push-ups.

Later, if the workload requires that we access old rows from the rollback segment, that is also more expensive. That is not always needed, yet it is very common for longer running queries to need to access older data. However, if all transactions are of roughly the same short duration, access to the rollback segment is seldom needed, which just happens to make benchmark results appear good while real-world applications suffer.

By contrast, PostgreSQL has multiple optimizations that improve vacuuming and compaction. First, an optimization called HOT improves vacuuming in heavily updated parts of a table (since 2007), while the visibility map ensures that VACUUM can avoid full table scans (since 2008).

Whether rollback segments help or hinder an application depends on the specific use case, and it’s much more complex than it first appears.

Next, we discuss indexes…

“With Postgres, the primary index and secondary indexes all point directly to the on-disk tuple offsets.”

This point is correct; PostgreSQL indexes currently use a direct pointer between the index entry and the heap tuple version. InnoDB secondary indexes are “indirect indexes” in that they do not refer to the heap tuple version directly, they contain the value of the Primary Key (PK) of the tuple.

Comparing direct and indirect indexes, we see:
* direct indexes have links that go index → heap
* indirect indexes have links that go index → PK index → heap
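
The two link chains above can be sketched as a toy model. This is only an illustration: plain dictionaries stand in for B-tree indexes and the heap, and every name in it (`heap`, `pk_index`, `lookup_direct`, and so on) is hypothetical, not PostgreSQL or InnoDB internals.

```python
# Toy model: dictionaries stand in for B-tree indexes and the heap.
# All names are illustrative assumptions, not real database internals.

# Heap: physical location -> row
heap = {("page1", 1): {"id": 7, "email": "a@example.com"}}

# Direct secondary index: indexed value -> heap location
direct_email_index = {"a@example.com": ("page1", 1)}

# Indirect secondary index: indexed value -> primary key value,
# plus a primary key index: primary key value -> heap location
indirect_email_index = {"a@example.com": 7}
pk_index = {7: ("page1", 1)}


def lookup_direct(email):
    """index -> heap: returns (row, number of index searches)."""
    location = direct_email_index[email]
    return heap[location], 1


def lookup_indirect(email):
    """index -> PK index -> heap: returns (row, number of index searches)."""
    pk = indirect_email_index[email]
    location = pk_index[pk]
    return heap[location], 2
```

Both functions return the same row; the indirect path simply performs one more index search, which is the extra hop described in the list above.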

Indirect indexes store the PK values of the rows they index, so if the PK columns are wide or contain multiple columns the index will use significantly more disk space than a direct index, making them even less efficient for both read and write (as stated in MySQL docs). Also, indirect indexes have index search times at least 2 times worse than direct indexes, which slows down both reads (SELECTs) and searched writes (UPDATEs and DELETEs).
Performance that is at least 100% slower is understated as just a “slight disadvantage” [of MySQL].

“When a tuple location changes, all indexes must be updated.”

This is misleading, since it ignores the important Heap Only Tuple (HOT) optimization that was introduced in PostgreSQL 8.3 in 2007. The HOT optimization means that in the common case, a new row version does not require any new index entries, a point which effectively nullifies the various conclusions that are drawn from it regarding both inefficiency of writes and inefficiency of the replication protocol.

“However, these indexes still must be updated with the creation of a new row tuple in the database for the row record. For tables with a large number of secondary indexes, these superfluous steps can cause enormous inefficiencies.”

As a result of ignoring the HOT optimization, this description appears to discuss the common case rather than the uncommon case. It is currently true that for direct indexes, if any one of the indexed columns changes, then new index pointers are required for all indexes. It seems possible for PostgreSQL to optimize this further; I’ve come up with various designs and will be looking to implement the best of them fairly soon.

Although they have a higher read overhead, indirect indexes have the useful property that if a table has multiple secondary indexes then an update of one secondary index does not affect the other secondary indexes if their column values remain unchanged. This makes indirect indexes useful only for the case where an application needs indexes that would be infrequently used for read, yet with a high update rate that does not touch those columns.
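
As a rough sketch of the trade-off just described, the function below counts how many new index entries a single UPDATE needs under each scheme. It is a simplified model, not either engine's actual logic: the function name, its parameters, and the 12-index example table are assumptions made for illustration, and it ignores primary key changes.

```python
def index_entries_written(indexes, changed_columns, scheme, same_page=True):
    """Count new index entries one UPDATE needs in this simplified model.

    indexes:         dict mapping index name -> set of indexed column names
    changed_columns: set of column names modified by the UPDATE
    scheme:          "direct" (all indexes point at the heap tuple) or
                     "indirect" (secondary indexes point at the primary key)
    same_page:       whether the new row version fits on the same page
                     as the old one (needed for a HOT update)
    """
    touched = [name for name, cols in indexes.items() if cols & changed_columns]
    if scheme == "direct":
        # HOT case: no indexed column changed and the new version stays
        # on the same page, so no index needs a new entry.
        if not touched and same_page:
            return 0
        # Otherwise every index needs a pointer to the new row version.
        return len(indexes)
    if scheme == "indirect":
        # Only indexes whose own columns changed are updated.
        return len(touched)
    raise ValueError("scheme must be 'direct' or 'indirect'")


# Hypothetical table with a dozen single-column indexes on c0..c11.
twelve = {f"idx_{i}": {f"c{i}"} for i in range(12)}
```

With this table, an update to a non-indexed column writes no index entries under either scheme, while an update to `c3` alone writes 12 entries under the direct scheme and 1 under the indirect scheme.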

Thus, it is possible to construct cases in which PostgreSQL consistently beats InnoDB, or vice versa. In the “common case” PostgreSQL beats InnoDB on reads and is roughly equal on writes for btree access. What we should note is that PostgreSQL has the widest selection of index types of any database system and this is an area of strength, not weakness.

The current architecture of PostgreSQL is that all index types are “direct”, whereas in InnoDB primary indexes are “direct” and secondary indexes “indirect”. There is no inherent architectural limitation that prevents PostgreSQL from also using indirect indexes, though it is true that this has not been added yet.

We’ve done a short feasibility study and it appears straightforward to implement indirect indexes for PostgreSQL, as an option at create index time. We will pursue this if the HOT optimizations discussed above aren’t as useful or possible, giving us a second approach for further optimization. Additional index optimizations have also been suggested.

Inefficient data replication

“However, the verbosity of the Postgres replication protocol can still cause an overwhelming amount of data for a database that uses a lot of indexes.”

Again, these comments discuss MySQL replication, which can be characterized as Logical Replication. PostgreSQL provides both physical and logical replication. All of the benefits discussed for MySQL replication are shared by PostgreSQL’s logical replication. There are also benefits for physical replication in many cases, which is why PostgreSQL provides both logical and physical replication as options.

PostgreSQL’s physical replication protocol itself is not verbose. This comment is roughly the same as the “inefficient writes” discussion: if PostgreSQL optimizes away index updates then they do not generate any entries in the transaction log (WAL), so there is no inefficiency. Also, the comment doesn’t actually say what is meant by “overwhelming”. What this discussion doesn’t consider is the performance of replication apply. Physical replication is faster than logical replication because including the index pointers in the replication stream allows us to insert them directly into the index, rather than needing to search the index for the right point for insertion. Including the index pointers actually increases, not decreases, performance, even though the replication bandwidth requirement is higher.

PostgreSQL Logical Replication is available via 2ndQuadrant’s pglogical and will be available in PostgreSQL 10.0 in core.

MySQL “Statement-based replication is usually the most compact but can require replicas to apply expensive statements to update small amounts of data. On the other hand, row-based replication, akin to the Postgres WAL replication, is more verbose but results in more predictable and efficient updates on the replicas.”

Yes, statement-based replication is more efficient in terms of bandwidth, but even less efficient in terms of the performance of applying changes to receiving servers. Most importantly, it leads to various problems and in various cases replication may not work as expected, involving developers in diagnosing operational problems. PostgreSQL probably won’t adopt statement-based replication.

Difficulty upgrading to newer releases

“the basic design of the on-disk representation in 9.2 hasn’t changed significantly since at least the Postgres 8.3 release (now nearly 10 years old).”

This is described as if it were a bad thing, but actually it’s a good thing and is what allows major version upgrades to occur quickly without unloading and reloading data.

“We started out with Postgres 9.1 and successfully completed the upgrade process to move to Postgres 9.2. However, the process took so many hours that we couldn’t afford to do the process again. By the time Postgres 9.3 came out, Uber’s growth increased our dataset substantially, so the upgrade would have been even lengthier.”

The pg_upgrade -k option provides an easy and effective upgrade mechanism. pg_upgrade does require some downtime, which is why 2ndQuadrant has been actively writing logical replication for some years, focusing on zero-downtime upgrade.

Although the logical replication upgrade is only currently available from 9.4 to 9.5, 9.4 to 9.6 and 9.5 to 9.6, there is more good news coming. 2ndQuadrant is working on highly efficient upgrades from earlier major releases, starting with 9.1 → 9.5/9.6. When PostgreSQL 9.1 is desupported later in 2016 this will allow people using 9.1 to upgrade to the latest versions. This is available as a private service, so if you need zero-downtime upgrade from 9.1 upwards please get in touch.

In 2017, upgrades from 9.2 and 9.3 will also be supported, allowing everybody to upgrade efficiently with zero-downtime prior to the de-supporting of those versions.

29 Jul 21:06

Markus Winand: On Uber’s Choice of Databases

A few days ago Uber published the article “Why Uber Engineering Switched from Postgres to MySQL”. I didn’t read the article right away because my inner nerd told me to do some home improvements instead. While doing so my mailbox was filling up with questions like “Is PostgreSQL really that lousy?”. Knowing that PostgreSQL is not generally lousy, these messages made me wonder what the heck is written in this article. This post is an attempt to make sense out of Uber’s article.

In my opinion Uber’s article basically says that they found MySQL to be a better fit for their environment than PostgreSQL. However, the article does a lousy job of conveying this message. Instead of writing “PostgreSQL has some limitations for update-heavy use-cases” the article just says “Inefficient architecture for writes,” for example. In case you don’t have an update-heavy use-case, don’t worry about the problems described in Uber’s article.

In this post I’ll explain why I think Uber’s article must not be taken as general advice about the choice of databases, why MySQL might still be a good fit for Uber, and why success might cause more problems than just scaling the data store.

On UPDATE

The first problem Uber’s article describes in great, yet incomplete detail is that PostgreSQL always needs to update all indexes on a table when updating rows in the table. MySQL with InnoDB, on the other hand, needs to update only those indexes that contain updated columns. The PostgreSQL approach causes more disk IOs for updates that change non-indexed columns (“Write Amplification” in the article). If this is such a big problem to Uber, these updates might be a big part of their overall workload.

However, there is a little bit more speculation possible based upon something that is not written in Uber’s article: The article doesn’t mention PostgreSQL Heap-Only-Tuples (HOT). From the PostgreSQL source, HOT is useful for the special case “where a tuple is repeatedly updated in ways that do not change its indexed columns.” In that case, PostgreSQL is able to do the update without touching any index if the new row-version can be stored in the same page as the previous version. The latter condition can be tuned using the fillfactor setting. Assuming Uber’s engineering team is aware of this, HOT must be no solution to their problem because the updates they run at high frequency affect at least one indexed column.

This assumption is also backed by the following sentence in the article: “if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes to reflect the ctid for the new row”. It explicitly says “only covered by a single index” which is the edge case—just one index—otherwise PostgreSQL’s HOT would solve the problem.

[Side note: I’m genuinely curious whether the number of indexes they have could be reduced; index redesign is my challenge. However, it is perfectly possible that those indexes are used sparingly, yet important when they are used.]

It seems that they are running many updates that change at least one indexed column, but still relatively few indexed columns compared to the “dozen” indexes the table has. If this is a predominant use-case, the article’s argument to use MySQL over PostgreSQL makes sense.

On SELECT

There is one more statement about their use-case that caught my attention: the article explains that MySQL/InnoDB uses clustered indexes and also admits that “This design means that InnoDB is at a slight disadvantage to Postgres when doing a secondary key lookup, since two indexes must be searched with InnoDB compared to just one for Postgres.” I’ve previously written about this problem (“the clustered index penalty”) in context of SQL Server.

What caught my attention is that they describe the clustered index penalty as a “slight disadvantage”. In my opinion, it is a pretty big disadvantage if you run many queries that use secondary indexes. If it is only a slight disadvantage to them, it might suggest that those indexes are used rather seldom. That would mean, they are mostly searching by primary key (then there is no clustered index penalty to pay). Note that I wrote “searching” rather than “selecting”. The reason is that the clustered index penalty affects any statement that has a where clause—not just select. That also implies that the high frequency updates are mostly based on the primary key.

Finally, there is another omission that tells me something about their queries: they don’t mention PostgreSQL’s limited ability to do index-only scans. Especially in an update-heavy database, the PostgreSQL implementation of index-only scans is pretty much useless. I’d even say this is the single issue that affects most of my clients. I’ve already blogged about this in 2011. In 2012, PostgreSQL 9.2 got limited support of index-only scans (works only for mostly static data). In 2014 I even raised one aspect of my concern at PgCon. However, Uber doesn’t complain about that. Select speed is not their problem. I guess query speed is generally solved by running the selects on the replicas (see below) and possibly helped by mostly doing primary key lookups.

By now, their use-case seems to be a better fit for a key/value store. And guess what: InnoDB is a pretty solid and popular key/value store. There are even packages that bundle InnoDB with some (very limited) SQL front-ends: MySQL and MariaDB are the most popular ones, I think. Excuse the sarcasm. But seriously: if you basically need a key/value store and occasionally want to run a simple SQL query, MySQL (or MariaDB) is a reasonable choice. I guess it is at least a better choice than any random NoSQL key/value store that just started offering an even more limited SQL-ish query language. Uber, on the other hand just builds their own thing (“Schemaless”) on top of InnoDB and MySQL.

On Index Rebalancing

One last note about how the article describes indexing: it uses the word “rebalancing” in context of B-tree indexes. It even links to a Wikipedia article on “Rebalancing after deletion.” Unfortunately, the Wikipedia article doesn’t generally apply to database indexes because the algorithm described on Wikipedia maintains the requirement that each node has to be at least half-full. To improve concurrency, PostgreSQL uses the Lehman, Yao variation of B-trees, which lifts this requirement and thus allows sparse indexes. As a side note, PostgreSQL still removes empty pages from the index (see slide 15 of “Indexing Internals”). However, this is really just a side issue.

What really worries me is this sentence: “An essential aspect of B-trees are that they must be periodically rebalanced, …” Here I’d like to clarify that this is not a periodic process, such as one that runs every day. The index balance is maintained with every single index change (even worse, hmm?). But the article continues “…and these rebalancing operations can completely change the structure of the tree as sub-trees are moved to new on-disk locations.” If you now think that the “rebalancing” involves a lot of data moving, you misunderstood it.

The important operation in a B-tree is the node split. As you might guess, a node split takes place when a node cannot host a new entry that belongs in this node. To give you a ballpark figure, this might happen about once per 100 inserts. The node split allocates a new node, moves half of the entries to the new node and connects the new node to the previous, next and parent nodes. This is where Lehman, Yao save a lot of locking. In some cases, the new node cannot be added to the parent node straight away because the parent node doesn’t have enough space for the new child entry. In this case, the parent node is split and everything repeats.

In the worst case, the splitting bubbles up to the root node, which will then be split as well and a new root node will be put above it. Only in this case does a B-tree ever become deeper. Note that a root node split effectively shifts the whole tree down and therefore keeps the balance. However, this doesn’t involve a lot of data moving. In the worst case, it might touch three nodes on each level and the new root node. To be explicit: most real world indexes have no more than 5 levels. To be even more explicit: the worst case (a root node split) might happen about five times for a billion inserts. In the other cases the split does not need to propagate all the way up the tree. After all, index maintenance is not “periodic”, not even very frequent, and is never completely changing the structure of the tree. At least not physically on disk.
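
The ballpark figures above can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that every node holds about 100 entries (borrowing the "about 100 inserts" ballpark as the fanout); real fanout depends on key size and page size, and the function name is hypothetical.

```python
def btree_levels(entries, fanout=100):
    """Levels a B-tree needs to address `entries` keys, assuming every
    node holds `fanout` entries (a simplifying assumption)."""
    levels = 1
    capacity = fanout  # keys reachable with a single (root-only) level
    while capacity < entries:
        capacity *= fanout  # each extra level multiplies capacity by fanout
        levels += 1
    return levels


levels = btree_levels(1_000_000_000)
# The tree only gets deeper on a root split, so growing from one level
# to `levels` levels takes levels - 1 root splits in this model.
root_splits = levels - 1
print(levels, root_splits)
```

Under these assumptions a billion entries fit in 5 levels, reached after only a handful of root splits, which is in line with the estimate in the text.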

On Physical Replication

That brings me to the next major concern the article raises about PostgreSQL: physical replication. The reason the article even touches the index “rebalancing” topic is that Uber once hit a PostgreSQL replication bug that caused data corruption on the downstream servers (the bug “only affected certain releases of Postgres 9.2 and has been fixed for a long time now”).

Because PostgreSQL 9.2 only offers physical replication in core, a replication bug “can cause large parts of the tree to become completely invalid.” To elaborate: if a node split is replicated incorrectly so that it doesn’t point to the right child nodes anymore, this sub-tree is invalid. This is absolutely true—like any other “if there is a bug, bad things happen” statement. You don’t need to change a lot of data to break a tree structure: a single bad pointer is enough.

The Uber article mentions other issues with physical replication: huge replication traffic—partly due to the write amplification caused by updates—and the downtime required to update to new PostgreSQL versions. While the first one makes sense to me, I really cannot comment on the second one (but there were some statements on the PostgreSQL-hackers mailing list).

Finally, the article also claims that “Postgres does not have true replica MVCC support.” Luckily the article links to the PostgreSQL documentation where this problem (and remediations) are explained. The problem is basically that the master doesn’t know what the replicas are doing and might thus delete data that is still required on a replica to complete a query.

According to the PostgreSQL documentation, there are two ways to cope with this issue: (1) delaying the application of the replication stream for a configurable timeout so the read transaction gets a chance to complete. If a query doesn’t finish in time, kill the query and continue applying the replication stream. (2) configure the replicas to send feedback to the master about the queries they are running so that the master does not vacuum row versions still needed by any slave. Uber’s article rules the first option out and doesn’t mention the second one at all. Instead the article blames the Uber developers.

On Developers

To quote it in all its glory: “For instance, say a developer has some code that has to email a receipt to a user. Depending on how it’s written, the code may implicitly have a database transaction that’s held open until after the email finishes sending. While it’s always bad form to let your code hold open database transactions while performing unrelated blocking I/O, the reality is that most engineers are not database experts and may not always understand this problem, especially when using an ORM that obscures low-level details like open transactions.”

Unfortunately, I understand and even agree with this argument. Instead of “most engineers are not database experts” I’d even say that most developers have very little understanding of databases, because every developer who touches SQL needs to know about transactions, not just database experts.

Giving SQL training to developers is my main business. I do it at companies of all sizes. If there is one thing I can say for sure, it is that the knowledge about SQL is ridiculously low. In the context of the “open transaction” problem just mentioned, I can confirm that hardly any developer even knows that read-only transactions are a real thing. Most developers just know that transactions can be used to back out writes. I’ve encountered this misunderstanding often enough that I’ve prepared slides to explain it and I just uploaded these slides for the curious reader.

On Success

This leads me to the last problem I’d like to write about: the more people a company hires, the closer their qualification will be to the average. To exaggerate, if you hire the whole planet, you’ll have the exact average. Hiring more people really just increases the sample size.

The two ways to beat the odds are: (1) Only hire the best. The difficult part with this approach is to wait if no above-average candidates are available; (2) Hire the average and train them on the job. This needs a pretty long warm-up period for the new staff and might also bind existing staff for the training. The problem with both approaches is that they take time. If you don’t have time—because your business is rapidly growing—you have to take the average, which doesn’t know a lot about databases (empirical data from 2014). In other words: for a rapidly growing company, technology is easier to change than people.

The success factor also affects the technology stack as requirements change over time. At an early stage, start-ups need out-of-the-box technology that is immediately available and flexible enough to be used for their business. SQL is a good choice here because it is actually flexible (you can query your data in any way) and it is easy to find people who know SQL at least a little bit. Great, let’s get started! And for many companies, probably most, the story ends here. Even if they become moderately successful and their business grows, they might still stay well within the limits of SQL databases forever. Not so for Uber.

A few lucky start-ups eventually outgrow SQL. By the time that happens, they have access to way more (virtually unlimited?) resources and then…something wonderful happens: They realize that they can solve many problems if they replace their general-purpose database with a system they develop just for their very own use case. This is the moment a new NoSQL database is born. At Uber, they call it Schemaless.

On Uber’s Choice of Databases

By now, I believe Uber did not replace PostgreSQL with MySQL as their article suggests. It seems that they actually replaced PostgreSQL with their tailor-made solution, which happens to be backed by MySQL/InnoDB (at the moment).

It seems that the article just explains why MySQL/InnoDB is a better backend for Schemaless than PostgreSQL. For those of you using Schemaless, take their advice! Unfortunately, the article doesn’t make this very clear because it doesn’t mention how their requirements changed with the introduction of Schemaless compared to 2013, when they migrated from MySQL to PostgreSQL.

Sadly, the only thing that sticks in the reader’s mind is that PostgreSQL is lousy.

If you like my way of explaining things, you’ll love my book.

Original title and author: “On Uber’s Choice of Databases” by Markus Winand.

01 Jul 16:13

Speed and Danger

NASCAR removed the passenger seats because drivers hated how astronauts kept riding along with them and loudly announcing "Ahh, what a nice and relaxing drive."