Rafael Bernard Rodrigues Araújo
Shared posts
Speeding up Application Development with Bootstrap
Regulation of the designer profession
Programmer: Congratulations, Divasca, on the regulation of the designer profession…
Divasca: Thanks…
Programmer: You managed something we in IT never will, simply because "every man for himself" counts for more. A guy earns about 3 thousand and already thinks he's Bill Gates… So there will never be a decent job description, and roles will never be well defined. So we'll keep doing every role as a single person, from backend to design…
Divasca: Damn, you guys are terrible at this!
—
T-shirt: Release the Kraken
The post Regulamentação do designer appeared first on Vida de Programador.
Amateur photographer captures the beauty of New Zealand's South Island

Tips: Securing Apache installed on Ubuntu (Part I)
PG Phriday: The Bones of High Availability
Well, the bell has tolled, the day is over, and at the end of it all, Postgres Open has ended its fifth year in service of the community. I will say it was certainly an honor to speak again this year, though now that it’s not conveniently in Chicago, I’ll have to work harder to justify hauling myself across the country next year. Of course at this point, I’d feel guilty if I didn’t at least try, assuming any of my submissions are accepted.
Given that the conference has ended, I would be remiss if I didn’t post my slides. The official location on the PostgreSQL Wiki is a start, but I have a website, so I might as well use it. So if you want to view my presentation directly, there are two ways:
What was the presentation about? If you believe Gabby’s tweet, it was about puns. That’s not too far from the truth, but despite my propensity for wordplay, the actual topic focused on (what else) Postgres high availability. By starting the journey with a single server, I discuss how each additional server on the stack can reinforce the (spooky) skeleton of a successful database architecture. In the process I also share a couple of the sobering disasters that probably shaved a few years off of my life due to some level of insufficient paranoia.
High availability really is the result of a cost to benefit analysis between how much downtime costs the company, and the expense of hardware and space in a data center. With Postgres, there are a lot of ways to leverage built-in features and take advantage of its underlying capabilities in avoiding such scenarios. This isn’t like the year I built a DRBD + Pacemaker + Postgres stack live on stage, so don’t be afraid to read through it. If nothing else, the slides are good for a few laughs.
Hope to see you at Postgres Open 2016!
Canonical Design Team: Prepare for when Ubuntu freezes
I routinely have at least 20 tabs open in Chrome, 10 files open in Atom (my editor of choice) and I’m often running virtual machines as well. This means my poor little X1 Carbon often runs out of memory, at which point Ubuntu completely freezes up, preventing me from doing anything at all.
Just a few days ago I had written a long post which I lost completely when my system froze, because Atom doesn’t yet recover documents after crashes.
If this sounds at all familiar to you, I now have a solution! (Although it didn’t save me in this case because it needs to be enabled first – see below.)
oom_kill
The magic SysRq key can run a bunch of kernel-level commands. One of these commands is called oom_kill. OOM stands for “Out of memory”, so oom_kill will kill the process taking up the most memory, to free some up. In most cases this should unfreeze Ubuntu.
You can run oom_kill from the keyboard with the following shortcut: alt + SysRq + f
Except that this is disabled by default on Ubuntu:
Enabling SysRq functions
For security reasons, SysRq keyboard functions are disabled by default. To enable them, set kernel.sysrq to 1 in the file /etc/sysctl.d/10-magic-sysrq.conf.
And to load the new config, reload the sysctl settings, for example with sudo sysctl --system.
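The value in that file is a bitmask, which is why some SysRq functions can work while oom_kill stays blocked. As a minimal sketch, assuming the bit values given in the kernel's SysRq documentation (1 enables everything, 64 covers process signalling, which includes oom_kill) and the usual /proc/sys/kernel/sysrq path, the following Python checks whether the shortcut would be honoured. The function names are made up for illustration.

```python
from pathlib import Path

SYSRQ_PATH = Path("/proc/sys/kernel/sysrq")  # current setting on a running Linux system
ENABLE_ALL = 1         # a value of exactly 1 allows every SysRq function
SIGNAL_PROCESSES = 64  # bit that covers term, kill and oom-kill


def oom_kill_allowed(value: int) -> bool:
    """Return True if this kernel.sysrq value permits the oom_kill shortcut."""
    return value == ENABLE_ALL or bool(value & SIGNAL_PROCESSES)


def read_sysrq(path: Path = SYSRQ_PATH) -> int:
    """Read the active kernel.sysrq value (assumes the path exists)."""
    return int(path.read_text().strip())


# 176 = 128 + 32 + 16, used here only as an example of a restrictive mask.
for example in (0, 1, 64, 176):
    print(example, oom_kill_allowed(example))
```

On a machine where the path exists, oom_kill_allowed(read_sysrq()) reports whether the change has taken effect.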
SysRq shortcut for the Thinkpad X1
Most laptops don’t have a physical SysRq key. Instead they offer a keyboard combination to emulate the key. On my Thinkpad, this is fn + s. However, there’s a quirk that the SysRq key is only “pressed” when you release.
So to run oom_kill on a Thinkpad, after enabling it, do the following:
- Press and hold alt
- To emulate SysRq, press fn and s keys together, then release them (keep holding alt)
- Press f
This will kill the most expensive process (usually the browser tab running inbox.google.com in my case), and free up some memory.
Now, if your computer ever freezes up, you can just do this, and hopefully fix it.
(Also posted on robinwinslow.uk)
Project installs ping-pong tables around São Paulo to encourage use of urban spaces

Bridging the gap between database administrators and the rest of IT
Database professionals play a unique role in IT, but they rarely get the recognition they deserve. Perhaps it is because of their analytical nature. Perhaps it is because they sit between development and operations. Or, to be honest, perhaps it is because most other people do not really understand what they do.
Whatever the reason, database administrators (DBAs) often sit alone in a corner, isolated from the rest of IT, with their potential impact ignored. Organizations where this happens (that is, most companies) usually assume that the DBA's only job is to keep things running. Almost always, the order from above is: "just don't let the database break anything, OK?" This wastes an opportunity and a very valuable resource, and it creates obstacles for IT organizations everywhere.
How?
Consider that the database is the core of every application. After all, the main function of most business applications is to store and retrieve data from a database. When you think about it, everything from CRM to e-mail to HR applications is just a front end to a database.
Now consider that the database is usually the most complex part of any application. In fact, for most IT professionals who do not work with databases, they are black boxes. Few understand database wait times, locking and unlocking, execution plans, and everything else that happens inside a database management system.
Although it is widely recognized that databases are hard to scale and maintain, there is little understanding of the complexities arising from dynamic workloads, growth, data security, resilience, consistency, and performance, or of the fact that these concerns matter as much as backups and upgrades. So it is no surprise that, according to a Gleanster survey, up to 80% of application performance problems are related to the database.
With all this in mind, the pressing need to break down the barrier between DBAs and the rest of IT, especially systems teams, becomes quite apparent.
How can this be done?
The first step is to make sure that all of IT, but especially the database and systems teams, start working together under an application-centric model in which app performance is everyone's top priority. After all, application performance is one of the most important factors in the overall success of any company. In fact, a recent SolarWinds survey found that 93% of business end users say application performance and availability affect their ability to do their jobs, and 62% say they are absolutely critical.
Trends such as cloud and DevOps are already starting to force the whole industry to pay more attention to application performance as seen by the end user, and are helping to eliminate the silos between DBAs, developers, and systems teams. The key to success here is a common language or view of application performance. This gives everyone a shared understanding not only of overall performance but also of the role each element of the application stack plays, especially the database.
This should surprise no one familiar with the evolution of computing. Over the past few decades, various technologies, from object-oriented development to web services and cloud computing, have represented a steady march toward distributed systems in which the interaction between systems has become more important than the performance of each individual component.
It is true that, in systems, the chain always breaks at its weakest link, but overall system performance is increasingly affected by the interaction between elements of the stack, many of which are becoming more and more ephemeral, easily replaced, scaled, and restarted in a virtual world. The exception is databases, which, as noted, remain the core of applications.
Instead of spending time tracking down problems in isolation, today's application performance management requires this full view of the stack. It reduces finger-pointing and lets different teams work together toward solving problems, rather than merely clearing their own area of responsibility.
When an IT organization achieves effective collaboration, every member comes to recognize the importance of the database to overall system performance, and the DBA's job as one of the most complex responsibilities on the team.
The time has come to unite and work to build better applications and better IT organizations. It is time to remove the barriers and let everyone contribute to application performance.
poetry
All things have their mystery, and poetry is the mystery of all things.
(Federico Lorca)
Saving a diff as HTML
Start by installing the tools:
sudo apt-get install colordiff kbtin
Now you can:
diff arquivo1.txt arquivo2.txt | colordiff | ansi2html > diff.html
Or, with git:
git diff | colordiff | ansi2html > gitdiff.html
You can also save the output of any command that returns colored ANSI:
ls -lha --color | ansi2html > ls.html
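If installing colordiff and kbtin is not an option, Python's standard library can produce a comparable page. This is a different technique from the pipeline above, not a wrapper around it: a minimal sketch using difflib.HtmlDiff, in which the function names and default file names are only illustrative.

```python
import difflib
from pathlib import Path


def diff_to_html(old_lines, new_lines, old_name="arquivo1.txt", new_name="arquivo2.txt"):
    """Return a complete HTML page with a side-by-side diff of two lists of lines."""
    return difflib.HtmlDiff().make_file(old_lines, new_lines, old_name, new_name)


def diff_files_to_html(old_path, new_path, out_path="diff.html"):
    """Read two text files and write their diff as an HTML file."""
    old_lines = Path(old_path).read_text().splitlines()
    new_lines = Path(new_path).read_text().splitlines()
    html = diff_to_html(old_lines, new_lines, str(old_path), str(new_path))
    Path(out_path).write_text(html)
```

Calling diff_files_to_html("arquivo1.txt", "arquivo2.txt") writes diff.html next to the script. The output is a table with changed lines highlighted rather than the terminal-style coloring that ansi2html preserves.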
The post Salvando diff em HTML first appeared on Elcio Ferreira - fechaTag.
Friday recommendations (12)

Every Friday, a short list of articles we recommend reading. We will also point to a message and a hymn to listen to. Our hope is that they help you deepen your knowledge of the Lord, equip you to serve Him better, and awaken in you more love for Him.
It is always important to recall what we say in Sobre este lugar (About this place): a reference to an author or a source does not imply total or unconditional approval of everything taught there, or of anything pointed to in other links or related videos, etc.; it indicates, rather, that the specific article has biblical content worth appreciating.
Articles worth reading
- Have you ever had difficulty explaining the relationship between David and Jonathan when it is used as "biblical support" for the practice of homosexuality? Then read this article by Wilson Porte and equip yourself to defend the scriptural faith and the condemnation of sin.
- Still on the subject of homosexuality, this other article by Wilson Porte dismantles the fallacy of some who say that Romans 1 does not condemn this practice. Homosexuals are no less sinners than the rest of humanity, but their sin is especially condemned in the Scriptures.
- You have probably come across messages about the so-called blood moons, which, according to their "followers", are always linked to extraordinary events in the history of Israel. For that reason, this year's tetrad of blood moons would supposedly be a sign of the end of the world or of the return of the Lord Jesus. Read two articles by Hugh Ross, a Christian astrophysicist, who shows the folly of this interpretation from both the scientific and the biblical point of view. The articles are in English.
- The beauty of holiness, by A. W. Pink. Why is it worth being holy? Why is holiness beautiful?
Message worth hearing
The call to repentance and faith, by Paul Washer.
Hymn worth hearing
O Love that wilt not let me go
Lyrics by George Matheson (1882) and music by Albert Lister Peace (1884).
Original lyrics
O Love that Will Not Let Me Go
O Love that wilt not let me go,
I rest my weary soul in thee;
I give thee back the life I owe,
That in thine ocean depths its flow
May richer, fuller be.
O light that followest all my way,
I yield my flickering torch to thee;
My heart restores its borrowed ray,
That in thy sunshine’s blaze its day
May brighter, fairer be.
O Joy that seekest me through pain,
I cannot close my heart to thee;
I trace the rainbow through the rain,
And feel the promise is not vain,
That morn shall tearless be.
O Cross that liftest up my head,
I dare not ask to fly from thee;
I lay in dust life’s glory dead,
And from the ground there blossoms red
Life that shall endless be.
Translation into Portuguese
Ó amor que não me deixas ir
Ó amor que não me deixas ir,
descanso minha cansada alma em ti;
eu te dou de volta a vida que devo a ti,
para que, nas profundidades de teu oceano, seu fluir
a faça mais rica, mais plena.
Ó luz que me seguiste por todo meu caminho,
eu rendo minha tocha bruxuleante a ti;
meu coração restaura o raio que de ti tomou emprestado
para que na chama de tua luz do sol seu dia
seja mais brilhante, mais justa.
Ó alegria que me procuras em meio à dor:
eu não posso fechar meu coração para ti.
Eu sigo o arco-íris no meio da chuva
E sinto que a promessa não é vã,
que a manhã será sem lágrimas.
Ó cruz que ergueste minha cabeça,
não me atrevo a pedir para voar para longe de ti.
Eu joguei ao pó a glória morta da vida
e, desse chão, irá florescer
a vida que será sem fim.
Portuguese version
Amor que não me largas nunca
Amor! que não me largas nunca!
Minh’alma achou descanso em Ti;
Desejo dar-Te minha vida,
A Ti, de quem a recebi,
E só por Ti viver.
Ó Luz! que sempre me iluminas!
Por Ti, Senhor, eu posso ver;
E já que a luz celeste brilha,
Nenhum farol preciso ter,
Mas, sim, a luz do céu.
Ó Gozo! que minh’alma inundas!
Que penas Teu poder desfaz!
Na chuva ao ver um arco-íris,
Sei que a promessa cumprirás,
Que o pranto cessará.
Ó Cruz! Levantas minha fronte;
Alentas tu meu coração;
O sangue por Jesus vertido
Garante minha salvação
E dá-me paz com Deus.
History
George Matheson (27 March 1842 to 28 August 1906) was born with impaired vision and, at 15, learned that he was going blind. Rather than be discouraged, he enrolled at the University of Glasgow and graduated at 19. At 20 he became completely blind and decided to enter the ministry, devoting himself to theological studies.
His three sisters helped him in his studies, to the point of learning Hebrew, Greek, and Latin so they could assist him. After graduating, he pastored churches in Scotland. He was one of the most prominent ministers of his day, serving until 1899, when he had to retire because of poor health.
On the day one of his sisters was getting married, Matheson wrote this hymn. He recorded the experience in his diary:
My hymn was composed in the manse at Innellan on the evening of 6 June 1882. I was alone at the time. It was the day of my sister's wedding, and the rest of my family had gone to spend the night in Glasgow. Something had happened to me, known only to myself, which caused me the most severe mental suffering. The hymn was the fruit of that suffering. It was the quickest piece of work I ever did in my life. I had the impression of having it dictated to me by some inward voice rather than of working it out myself. I am quite sure the whole work was completed in five minutes, and equally sure that it never received at my hands any retouching or correction. I have no natural gift of rhythm. All the other verses I have written are manufactured articles; this one came like the dawn. I have never been able to express the same fervor in verse again.
Although he did not say what "the most severe mental suffering" was, Matheson's story lets us infer it. Years earlier, he was engaged when he learned he would go completely blind. He asked his fiancée whether she was willing to stand beside a man who, besides being blind, with doctors unable to do anything for him, had chosen to be a minister of the gospel. She answered that it would be too much for her, that she could not live that way, and broke off the engagement. This grieved him deeply. In the years that followed, the sister who was now marrying had been a fundamental part of his ministry. Besides caring for him, she helped him write his sermons and memorize the Scriptures. According to people who attended the services, no one would have guessed that Matheson was blind, given the ease with which he recited portions of the Bible and preached.
Now his sister could no longer be with him. This probably brought him a fresh awareness of his limitation, his pain, and his insufficiency, and a painful reminder that he himself had not had the joy of marrying. Then God reminded him that He had never left him, that a light more real than the sun's had guided him every day. And that was recorded in this wonderful hymn.
The post Indicações de sexta (12) first appeared on Campos de Boaz.
Dewdrops (16)

Whatever keeps me from my Bible is my enemy, however harmless it may appear.
(A. W. Tozer)
On the cross, Jesus did not merely make salvation possible; He actually redeemed those previously given to Him by the Father.
(Steve Lawson)
The gospel is not salvation for everyone, but salvation for those who believe. For the rest, it is a death sentence.
(Paul Washer)
You will never have a test of your faith that does not prove to be a blessing to you, if you are obedient to the Lord. I have never gone through a trial in which, on crossing the deep river, I did not find some poor pilgrim whom I was able to help through that experience.
(A. B. Simpson)
Should we apologize for warning people against false gospels, false baptisms, false spirits, false christs, false sacraments, false mediators, false views of the church, and false views of Scripture? Should we apologize for warning about sin, worldliness, and compromise? I have spoken against many Christian groups and denominations, because God commands me to preach the truth and to denounce error (2 Tim 4:1-6). I refuse to apologize for obeying God. By the grace of God, I will keep exposing error until the Lord takes me to glory. And, by the grace of God, I will keep naming men and being specific against error and sin. O God, help us to have courage in these perilous times, so that we honor You and obey You, and not man.
(David Cloud)
Surrender to the supremacy, the glory, the will, and the pleasure of Christ should be the first and highest thought of life.
(Andrew Murray)
The post Gotas de orvalho (16) first appeared on Campos de Boaz.
Log Analysis with OpenDNS
Logs…They try to tell you what’s going on in a system, but it takes a special kind of patience to read through hundreds of thousands of lines of machine generated text full of arcane errors and differing timestamps.
As a security analyst, part of my job involves looking at DNS logs for potential customers and showing what they might have on their network as well as what OpenDNS would have blocked. In these reviews, we don’t have access to the systems or logs from other events to provide extra context. We typically only have information from BIND, Windows Active Directory, InfoBlox or another vendor or service. We basically perform DNS incident response using several techniques to speed up the process and help make sense of the information.
Eyeballing log files line by line isn’t going to get us anything more than a headache. It’s much smarter to tackle the problem programmatically.
When working with DNS logs, we tend to follow these steps.
- Sanitize the data
- Sort and unique the data
- Analyze the data
- Report
When we first acquire a log file, it has its own special format. We have to convert the data to something we can work with. If we’re just trying to find the bad stuff calling out to the bad places, we don’t need much more than domain names, so we will isolate that part of the file.
From a usability perspective, the most useful log we get is one in CSV format. If you’re running some version of MS Office, Excel will try to open this kind of file. However, it’s not recommended, as Excel can only handle so much data before eating all the memory in a system and seizing up. We almost exclusively work in the command prompt and with Python scripts.
CSV is actually text, but the values are comma separated, which means maybe just a couple fewer lines of typing. If we have a log file in CSV format, it might look like this when viewed through the terminal:
To get just the domains, we can run the ‘awk’ command to print a specific column. For these lines, the domain contacted is in column seven, with each column separated by commas. The command to type in this case is:
cat example.txt | awk -F, '{print $7}'
This can be sent to a file for use during the analysis portion, like so:
cat example.txt | awk -F, '{print $7}' > justthedomains.txt
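The same column can be pulled out in Python, which helps once the extraction has to feed later scripts. This is a minimal sketch, assuming the domain is the seventh comma-separated field as in the awk command; the function name and the sample rows are made up for illustration. Unlike awk -F, the csv module also copes with quoted fields that contain commas.

```python
import csv
import io


def domains_from_csv(lines, column=7):
    """Yield the value in the given 1-based column for every row that has it."""
    for row in csv.reader(lines):
        if len(row) >= column:
            yield row[column - 1].strip()


# Made-up sample rows, only to show the call; real input would be an open file.
sample = io.StringIO(
    'a,b,c,d,e,f,example.com,h\n'
    'a,"b,with comma",c,d,e,f,example.org,h\n'
    'too,short\n'
)
print(list(domains_from_csv(sample)))
```

For a real log, pass the file object from open('example.txt', newline='') in place of the sample.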
That was an easy example. Often, the logs are much more complex. As an example, here are some of the top lines from a two gigabyte file:
We need to get the domains out of this file too, but first we have to remove these unnecessary fields from the top using Vim:
#Software: SGOS 6.5.5.1
#Version: 1.0
#Start-Date: 2015-09-07 09:00:00
#Date: 2015-09-03 16:22:16
#Fields: date time time-taken i <snip>
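Opening a two gigabyte file in an editor can be slow, so an alternative is to drop the header while streaming the file. This is a minimal sketch, assuming the header lines are exactly those that start with # as in the lines above; the function names are hypothetical.

```python
def strip_header(lines):
    """Yield only the lines that are neither '#'-prefixed headers nor blank."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line


def strip_header_file(src_path, dst_path):
    """Copy src_path to dst_path without header lines, one line at a time."""
    # Streaming keeps memory use flat regardless of file size.
    with open(src_path, errors="replace") as src, open(dst_path, "w") as dst:
        dst.writelines(strip_header(src))
```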
We are now left with a large file of lines that look like this:
2015-09-07 09:00:02 169 192.168.1.223 - - - OBSERVED "Technology/Internet" - 200 TCP_NC_MISS POST image/jpeg http search.namequery.com 80 / - - "Mozilla/5.0 (compatible; MSIE 8.0;)" 10.251.106.45 239 233 - "none" "none"
Continuing forward to get just those domains, columns of information for each line are separated by spaces, so we can grab the domains using ‘awk’ again. The ‘awk’ command automatically uses whitespace as the field delimiter, so we don’t have to specify a different delimiter like in the CSV example (we used the ‘-F,’ switch to use a comma as the delimiter in that example).

That didn’t go so well. The lines are not in perfect columns. It looks like we will have to find a different way to grab those domains.
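One likely reason the columns drift is that quoted fields, such as the user agent, contain spaces of their own. As an illustration only (it is not the approach this post takes next), Python's shlex.split treats a quoted string as a single field, so the host stays at the same position. The second sample is the line shown above with a made-up category that contains a space, and the index 15 is read off that sample line rather than taken from any format specification.

```python
import shlex

HOST_INDEX = 15  # position of the host in the sample line, counted from 0

line = ('2015-09-07 09:00:02 169 192.168.1.223 - - - OBSERVED "Technology/Internet" - 200 '
        'TCP_NC_MISS POST image/jpeg http search.namequery.com 80 / - - '
        '"Mozilla/5.0 (compatible; MSIE 8.0;)" 10.251.106.45 239 233 - "none" "none"')
# Hypothetical variant: the same line, but the category has a space in it.
spaced = line.replace('"Technology/Internet"', '"Search Engines/Portals"')


def host_naive(text):
    """Split on whitespace only, as awk does by default."""
    return text.split()[HOST_INDEX]


def host_quoted(text):
    """Split on whitespace but keep double-quoted strings together."""
    return shlex.split(text)[HOST_INDEX]


print(host_naive(line), host_quoted(line))
print(host_naive(spaced), host_quoted(spaced))
```

On the variant line the naive split lands one field early, while the quote-aware split still returns the host. shlex.split raises ValueError on a line with an unbalanced quote, so real logs would need that case handled.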
The following Python script achieves what we want:
import re
from urlparse import urlparse  # Python 2 module; it is urllib.parse in Python 3

with open('DNS_logs.log') as f:
    for eachline in f:
        # Find every http:// or https:// URL on the line
        for url in re.findall(r'https?://\S+', eachline):
            # netloc is the host portion of the URL
            justthedomain = urlparse(url).netloc
            if justthedomain:
                print justthedomain
Here are the results (of just a small part) after running the script:

We’re left with a list of domains, some of which are duplicates (because they’re contacted multiple times by the client machines). To make processing faster during the analysis stage, we want to remove duplicates. Assuming we ran the python script and sent its output to a text file called domains.txt, we can then get just the unique domains:
sort -u domains.txt
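If the domain list is already in Python, roughly the same step can be done there, and counting the duplicates instead of discarding them keeps a hint of which domains the clients contacted most. This is a minimal sketch with illustrative function names; unlike sort -u it also drops blank lines, and its ordering does not depend on the locale.

```python
from collections import Counter


def unique_domains(domains):
    """Return the domains sorted with duplicates and blank entries removed."""
    return sorted({d.strip() for d in domains if d.strip()})


def domain_counts(domains):
    """Map each domain to the number of times it appears."""
    return Counter(d.strip() for d in domains if d.strip())


def load_domains(path="domains.txt"):
    """Read one domain per line from the file written by the extraction script."""
    with open(path) as f:
        return f.read().splitlines()
```

domain_counts(load_domains()).most_common(10) then lists the ten most frequently contacted domains.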

If that list is sent to a new file called unique_domains.txt, we can then run the domains through OpenDNS Investigate using its API to get all kinds of information, including domain score (which determines if a domain is considered malicious or benign), Whois details, ASN information, related domains, and more.
Using the Investigate API is straightforward and well documented. Investigate makes it possible to send a list of domains (we have sent millions at a time) using urllib2, receive a JSON document and parse through it, writing the results to a file for quick analysis.
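The batch workflow described here can be sketched as follows. This is an outline under stated assumptions, not OpenDNS code: it reuses only the Whois URL pattern, the Bearer header, and the two field names that appear in the example later in this post. It is written for Python 3 (urllib.request rather than urllib2), and the function names, the swappable fetch parameter, and the CSV output layout are hypothetical.

```python
import csv
import json
from urllib.request import Request, urlopen

# URL pattern taken from the Whois example in this post.
WHOIS_URL = "https://investigate.api.opendns.com/whois/{domain}.json"


def build_request(domain, api_key):
    """Build the authenticated Whois request for one domain."""
    return Request(WHOIS_URL.format(domain=domain),
                   headers={"Authorization": "Bearer " + api_key})


def fetch_json(request):
    """Send the request and decode the JSON body."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def whois_rows(domains, api_key, fetch=fetch_json):
    """Yield (domain, registrar, expiry) per domain; fetch can be swapped for testing."""
    for domain in domains:
        values = fetch(build_request(domain, api_key))
        yield domain, values.get("registrarName"), values.get("expires")


def write_report(rows, out_file):
    """Write the rows as CSV to an already opened text file."""
    csv.writer(out_file).writerows(rows)
```

Passing a stub as fetch makes it possible to try the loop without an API key. Millions of domains, as mentioned above, would also call for error handling and rate limiting, which this sketch leaves out.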
Going over everything that’s possible with the Investigate API is beyond this post, but the following example demonstrates how to gather Whois information for a domain. We’ll be using a domain from the CSV file we first looked at: monarchestatemanagement[.]com
from urllib2 import Request, urlopen
import json
api_key = 'Your Investigate API Key'
headers = {'Authorization': 'Bearer ' + api_key}
request = Request('https://investigate.api.opendns.com/whois/monarchestatemanagement.com.json', headers=headers)
response_body = urlopen(request).read()
values = json.loads(response_body)
print values['registrarName'] + ',' + values['expires']
Running this prints the fields we requested, the registrar and the expiration date:

We are able to acquire more information than just Whois on our list of domains with the Investigate API. Using the same domain as in the previous example, the original logs show allowed communication to monarchestatemanagement[.]com. However, looking at it with Investigate API, we learn that this domain would have been blocked if they were using OpenDNS (the screenshot is from the web interface for Investigate):

Log analysis doesn’t have to be boring. This is really just the tip of the iceberg.
We are always exploring new ideas in this area. One of the more interesting ways we look at logs is by sending them with Logstash to an ElasticSearch cluster for visual analysis with Kibana.
The technologies are out there to enable you to get out of your text editor and into a better place.
The post Log Analysis with OpenDNS appeared first on OpenDNS Security Labs.
Ministry of Planning drops Serpro's Expresso and adopts Microsoft e-mail
Via convergenciadigital.uol.com.br:
The federal government's policy of using open software platforms has suffered another setback and, with it, so has the recently created Information Security policy. The Ministry of Planning, Budget and Management, whose organizational chart includes the Secretariat of Logistics and Information Technology (SLTI), responsible for standardizing IT goods and services and for implementing a security policy across the federal administration, has just given in to the commercial interests of the IT market, even though decrees and ordinances dictate otherwise.
Without even waiting for a procurement auction (pregão) to be held and concluded by the National Department of Transport Infrastructure (DNIT), an agency linked to the Ministry of Transport, the Ministry of Planning's Information Technology Directorate adopted Microsoft's Outlook e-mail service for the ministry's 5,078 employees, under a "select contract" ("contrato select").
This is the first time the Convergência Digital portal has seen software services contracted in government with no legal backing under the Public Procurement Law (Lei de Licitações). There is not even any record of an emergency contract signed by the Ministry of Planning. This prior commercial agreement rests only on the possible success of a supposed auction no. 401/2015, by DNIT, which will only be held this coming 30 September. It will produce a Price Registration Record (Ata de Registro de Preços), which the Ministry of Planning hopes to join as a "free rider" ("carona") in order to ratify the supposed backroom deal with Microsoft. The ministry claims it will take part in this tender as a participant and that an "erratum" will be published before the bidding.
The article "Ministério do Planejamento abandona o Expresso do Serpro e adota e-mail da Microsoft" was originally published on BR-Linux.org, by Augusto Campos.
Ronnie Tucker: Whoa. Microsoft Is Using Linux to Run Its Cloud
Microsoft has admitted to something that used to be unthinkable: using Linux to run some of its own operations.
In a blog post on Thursday, Microsoft Azure networking principal architect Kamala Subramaniam explained how the company developed a new software system, dubbed Azure Cloud Switch, for running the networking gear that Microsoft’s cloud service depends on.
Network switches typically come with their own software baked right into the product. The problem Microsoft faced, according to Subramaniam, was integrating the software that ships with those switches with the wide variety of software it uses to run its Azure cloud service. So Microsoft had to build its own switch software—and it turned to Linux to do just that.
Source: http://www.wired.com/2015/09/microsoft-using-linux-run-cloud/
Submitted by: Arnfried Walbrecht
benefit
A benefit received is the most sacred of debts.
(Henri Lacordaire)
My project in the Casa Conectada 2015 contest
I recently took part in the Casa Conectada 2015 contest with an IoT (Internet of Things) project. In this post I will share some of my experience during the contest and describe my project, which ended up reaching the final.
The contest was a partnership between the company Freescale and the site Embarcados, and as soon as I heard about it I was interested for a few reasons. First, I had little experience with embedded systems and saw this as a good chance to sharpen my skills in that area. I also wanted to meet new people, better understand the difficulties of this kind of project, and work with the hardware offered by the event's sponsor (the FRDM-K64F board and the Bluetooth module).
According to the organizers, 85 projects were submitted and only 25 of them received the hardware and moved on to the next stage. Of those 25, ten projects were selected, and at the final presentation only 8 participants showed their projects. I finished in sixth place, which made me very happy given that this is not my field.
It was, at the very least, a different experience and completely outside my comfort zone, since the final event was full of engineers, professors, and other electronics professionals. Coming from computing, I felt like a "fish out of water", but that did not hurt me in any way. At times like this it is best to take a deep breath, focus, and carry on doing your best. At least that is how I see it.
My project is called AMedCA (Anel Medidor de Consumo de Água, a water consumption metering ring), and just below there is a short 2-minute video in which I describe the scenario, the problem, existing approaches, and my proposal, along with a small demonstration. Anyone who wants to look at the source code can go to the repository I put on my GitHub.
During development I used the MBED platform to write the code that runs on the board. The backend was initially developed as a Windows desktop application in Visual Studio 2015 Community edition, but I converted it into an ASP.NET application that I hosted on the Windows Azure cloud platform.
As I said, I had not mastered all the technologies I used, and it was a battle to learn some things and solve problems. But overall it was a great experience. In the photo below you can see my improvised project with pipes, styrofoam sheets, the sensor, the board, and a very important item in engineering projects: silver tape!
I talked a lot with the eight participants who presented their projects and made a point of congratulating all the winners, because they presented cool ideas that, even if they never become products, at least make clear the creativity, imagination, effort, and dedication of those involved.
kindness
Kindness in us is the honey that soothes the sting of unkindness in others.
(Walter Savage Landor)
The sad state of web app deployment
I spent a good chunk of the last four days installing an Internet web forum, which claims it can be up and running in 30 minutes.
I like to think I’m pretty alright at computers. So what went wrong here? Well let me tell you.
My arduous tale
I don’t want to name and shame here, because this is not my first such experience and the problem is larger than one individual product. (Let’s just say it rhymes with “piss horse”.)
The 30-minute claim came because the software only officially supports being installed via Docker, the shiny new container gizmo that everyone loves because it’s shiny and new, which already set off some red flags. I managed to install an entire interoperating desktop environment without needing Docker for any of it, but a web forum is so complex that it needs its own quasivirtualized OS? Hmm.
I tried installing the vendor Docker (I’m using Ubuntu 14.04, the current LTS release), but that’s 1.0, and Docker has gotten up to 1.7 in the intervening year and a half, and this software needs at least Docker 1.2. I stress that this web forum is so cutting-edge that it refuses to install without technology that did not exist two years ago.
So I tried installing current Docker via the officially condoned mechanism, which of course involves piping curl into your shell. That’s a fucking appalling idea, but security is kind of a joke with Docker anyway. It also didn’t work, giving me the rather useless E: Unable to locate package docker-engine instead. I’m sure glad Docker exists, to save me from all those package management nightmares!
Some digging revealed that Docker just doesn’t exist for 32-bit, even though they say it should work (as evidenced by the existence of a canonical 32-bit Ubuntu package), and they just don’t bother mentioning this in their README or installation docs or shell script that runs as root.
At this point I was pretty sick of Docker, so I decided to try installing the damn thing manually. It was just a Rails app, after all, and I’ve managed to install those before. How hard could it possibly be?
Ha, ha! After a git clone (because the app isn’t in rubygems??), I then spent maybe six hours fighting with RVM. (I’m sure you have a suggestion for a different Ruby environment thing I should be using instead, and I don’t care, shut up, I already had RVM installed and running something else.)
The problem was some extremely obtuse errors when running bundle install, which is supposed to install all of the app’s dependencies. Some library was complaining that a .a file in its own build directory didn’t exist, which didn’t make a lot of sense. Also, I spotted x86_64-linux in the path, which made even less sense.
See, I actually have a 64-bit kernel, but a 32-bit userspace. (There’s a perfectly good reason for this.) And the Ruby binary that RVM built was, of course, 32-bit — it wouldn’t have worked otherwise, since libc and everything else are all 32-bit. But those binaries thought they were on a 64-bit system (which they were), and rubygems uses the system architecture for building native extensions for some stupid fucking reason, so everything was built as 64-bit. In a way I’m lucky that this one particular package happened to fail, because all the others built just fine, and I only would’ve found the problem later when I actually tried to run the damn thing.
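The mismatch described here, a 64-bit kernel under a 32-bit userspace, is visible from any language runtime. As a minimal sketch in Python (not Ruby, and not a reproduction of what rubygems actually does), the architecture the kernel reports and the pointer width of the running binary are two separate facts, and a build tool that trusts only the first will pick the wrong target. The function names are made up for this example.

```python
import platform
import struct


def kernel_machine() -> str:
    """Architecture string reported by the operating system (e.g. 'x86_64')."""
    return platform.machine()


def interpreter_bits() -> int:
    """Pointer width, in bits, of the binary that is actually running."""
    return struct.calcsize("P") * 8


def looks_mismatched() -> bool:
    """True when the OS reports x86_64 but this process is a 32-bit binary."""
    return kernel_machine() == "x86_64" and interpreter_bits() == 32


if __name__ == "__main__":
    print(kernel_machine(), interpreter_bits(), looks_mismatched())
```

On a setup like the one above, a 32-bit interpreter would report a pointer width of 32 while the machine string still says x86_64.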
I tried all kinds of environment variables and hand-editing of files and whatnot to convince Ruby that it was actually 32-bit, to no avail. Eventually I resorted to reading a bunch of RVM’s source code, and then I discovered a --32 flag that magically fixes everything. It’s not documented, but don’t worry! I found a GitHub issue comment from three and a half years ago, saying the docs will be fixed with RVM 2.0.
So now I had a working Ruby, and after some tedious rebuilding, I had a set of gems as well. Super.
Now I just had to figure out how to configure the damn app, which is tricky when the README just says “use Docker”. It had a config/app.conf.sample file, but this turned out to be sample configuration for Upstart, the Ubuntu service manager. I ended up discovering that there are still docs for installing on Ubuntu, just not linked from anywhere.
The next step was to migrate the database from “doesn’t exist” to “exists”, which is usually a breeze in Rails, by which I mean I have never once had it actually work without descending into a hellish nightmare and this time was no exception. The documentation claims the app needs to be superuser. Let’s see what PostgreSQL says about superusers.
Superuser status is dangerous and should be used only when really needed.
Yes, this definitely seems like something a web forum needs. I opted not to give it root on my entire database, which of course broke the migrations because they use CREATE EXTENSION to load binary extensions into my server, a perfectly reasonable thing for database migrations to be doing. I didn’t even have the required extension installed, and of course the documentation never once mentions needing it, so off I went to install it.
I installed postgresql-contrib, and then some funny things happened. Long story short, I was running Postgres 9.1, and the current Ubuntu version is 9.3. I’d originally installed the postgresql package, and using Arch Linux on my desktop has spoiled me into thinking that that will keep me on the latest version, but Ubuntu cares about trivialities like “not breaking your entire server” and had just kept me on 9.1 the whole time. But postgresql-contrib, unqualified, meant the current version now which was 9.3, and had also installed the full server. Whoops! So I just took a quick detour to upgrade to 9.3, which I’ve done before and which is relatively painless.
Okay! Now I have a database.
At this point the docs take a wild detour into installing some Ruby process management library called Bluepill and copying some massive pile of “configuration” (actually just Ruby code, of course) and using that to run the app and also adding Bluepill to the user’s crontab as a @reboot and what the ever-loving fuck.
(I assumed this was some oblique Matrix reference, but someone later pointed out to me that it’s called bluepill and it keeps things up. Charming, but par for the course for Ruby.)
Anyway, I opted to not do all that, and just ran the thing directly with rails server.
Almost done. Now I just need to proxy nginx to it. The app helpfully provides some configuration for me, which is two hundred lines long and consists mostly of convoluted rules for which URLs are static assets and which should be proxied. I decided to hell with it and just proxied the whole thing and I’ll fix it later if I feel like it.
Now we’re up and running! Except I never get any signup email, and it turns out this is because I also have to run “sidekiq”, a job processor. And with that, now we’re done.
What horror have we created
I tell you this story to make the point that this is all completely fucking ridiculous.
Set aside the oddball tool breakage and consider that if you follow the instructions to the letter, this web forum requires:
- Cloning (not installing!) the software’s source code and modifying it in-place.
- Copy-pasting hundreds of lines of configuration into nginx, as root, and hoping it doesn’t change when you upgrade.
- Copy-pasting hundreds of lines of Ruby for the sake of bluepill, and hoping it doesn’t change when you upgrade.
- Installing non-default Postgres extensions, as root.
- Running someone else’s arbitrary database commands as a superuser.
- Installing logrotate configuration, as root.
There’s nothing revolutionary here. It’s an app that wants to accept HTTP connections, use a database, and send email. Why is this so fucking complicated?
I’ll tell you why—
Rails sucks
My experience is admittedly limited here, but as far as I can tell, installing a Rails app is impossible. It reads configuration from the source directory. It logs to the source directory. You have to manually precompile all the assets, which are of course also written to the source directory.
Rails is one of the most popular web frameworks in the world, championed by developers everywhere. And you can’t actually install anything written with it. This is a joke, right?
Unix lied to you
Back in the day, when Windows effectively didn’t have users and everyone just ran everything as an administrator, Unix nerds (myself included) would crow about how great Unix was for making heavy use of separate users for everything.
Boy, do I have egg on my face. Let’s recap here:
- If you’re missing a library or program, and that library or program happens to be written in C, you either need root to install it from your package manager, or you will descend into a lovecraftian nightmare of attempted local builds from which there is no escape. You say you need lxml on shared hosting and they don’t have libxml2 installed? Well, fuck you.
- Only one thing can bind to port 80 and it has to run as root, so your options are to use nginx and need root to add a new app, or use Apache and do .htaccess or something equally atrocious.
- You want your app to start automatically, of course. You can add it to your crontab with @reboot, which is kind of a hack and also won’t restart it if it dies. So you can also install your own local process manager, like this app did. Or you can do what most people do and add it to the system’s daemon manager, as root. Allegedly many modern daemon manager things allow non-root users to set their own things up, but I’ve never seen this actually done or even explained very clearly.
- If you want to rotate your logs, well, that needs root.
- You think Docker solves any of this? Let me know how piping curl to a shell script that uses sudo works out for you. Oh, and if you’re in the docker group, you are root.
Modern Linux desktops are pretty alright at the multi-user case, which basically no one uses. On the server side, well, if you have a server everyone just assumes you have root anyway, so everything is a giant mess. Even RVM, which is designed for having multiple per-user Ruby installations, prompted me for my password so it could sudo apt-get install something.
It worked on my machine
We are really, really bad at enumerating and handling dependencies.
I mean, we can’t even express them in our own software. System package managers deal with it, and that’s great — but I’m a developer, not a packager. If I write a Python library that wraps a C library, there is no way to express that dependency. How would I? There’s no canonical repository of C/C++ packages, anywhere. Even if I could, what good would it do? Installing a shared C library locally is a gigantic pain in the ass, involving LD_LIBRARY_PATH, or maybe it was LDFLAGS=-rpath? See, I don’t even know. Virtually no one does it, because it’s a huge pain, because virtually no one does it.
So it should come as no surprise that there is no way whatsoever to list dependencies on services. You’d think that a web app could just have some metadata saying “I need Postgres and, optionally, Redis”, but this doesn’t exist. And the other side, where the system can enumerate the services it has available for a user, similarly doesn’t exist. So there’s no chance of discovery. If you’re missing a service the app needs but failed to document, or you set it up wrong, you’ll just find out on the first request.
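To make the missing piece concrete, here is a minimal sketch in Python of what such metadata and discovery could look like. Everything in it is hypothetical: the manifest layout, the `services` keys, and the `missing_services` helper are invented for illustration, since no such standard exists.

```python
# Hypothetical manifest an app might ship; this format is invented for illustration.
MANIFEST = {
    "name": "example-forum",
    "services": {
        "required": ["postgresql"],
        "optional": ["redis"],
    },
}


def missing_services(manifest: dict, available: set) -> dict:
    """Compare what the app declares against what the host says it offers."""
    services = manifest.get("services", {})
    return {
        kind: [name for name in services.get(kind, []) if name not in available]
        for kind in ("required", "optional")
    }


if __name__ == "__main__":
    # A host that only offers Redis: the installer can refuse up front
    # instead of letting the first request blow up.
    print(missing_services(MANIFEST, available={"redis"}))
    # prints {'required': ['postgresql'], 'optional': []}
```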
Speaking of:
Web apps suck at reporting problems
For all the moving parts and all the things that can go wrong, there sure is a huge lack of reporting when it breaks. I basically rely on people tweeting at me or asking on IRC if something is broken. This particular app relied critically on a job queue, but didn’t notice it wasn’t running.
There are a few widgets that will email all crash logs to you, but what idiot came up with that? That’s completely fucking useless. I have over two thousand unread crash emails for my perfectly functional modest-traffic website. Almost all of them are some misconfigured crawler blowing up on bogus URLs in a way I don’t strongly care about fixing.
But if the app goes down and completely fails to start, I get zero email. If the app runs but every request takes 20 seconds, I get zero email. If every page 404s, I get zero email. And if real actual pages start to break, I get a flood of email that I’ll never notice because I don’t even look in that folder any more.
These are not unique problems. Yet the only solutions I’ve seen take the form of dozens of graphs you’re expected to keep an eye on manually.
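The cases above (app down, every request slow, every page a 404) are each easy to tell apart from a single probe of a page that is known to exist. A rough sketch in Python, where the function name, the threshold, and the state labels are all assumptions made for illustration rather than any existing tool's behaviour:

```python
from typing import Optional

SLOW_SECONDS = 5.0  # assumed threshold; tune to whatever "normal" is for the app


def classify_probe(status: Optional[int], elapsed: float) -> str:
    """Classify one probe of a known-good URL.

    `status` is the HTTP status code, or None if the connection failed.
    `elapsed` is the request duration in seconds.
    """
    if status is None:
        return "down"
    if status >= 500:
        return "erroring"
    if status == 404:
        return "missing"
    if elapsed > SLOW_SECONDS:
        return "slow"
    return "ok"


if __name__ == "__main__":
    for probe in [(None, 0.0), (200, 20.0), (404, 0.1), (200, 0.2)]:
        print(probe, classify_probe(*probe))
```

Anything other than "ok" would be worth exactly one notification, not one email per failed request.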
What we should have by now
We should have apps that install with one (1) command, take five minutes to configure, and scale up to multiple servers and down to shared hosting. If I cannot install your web forum on Dreamhost, you have failed spectacularly.
But we haven’t even tried to solve this, and all the people who are most capable of solving it are too busy scaling Twitter or Amazon up to ten million servers or whatever. Installing basic web software gets harder all the time, and shared hosting becomes less useful all the time, and web developers flock to garbage like Docker that basically runs a VM because we can’t figure out how to make two apps use the same damned database.
The thing I want, but never figured out how to build, is an intermediate web app for the express purpose of installing and managing web apps. Yes, sure, like cPanel or whatever, but not with ad-hoc support for some smattering of popular apps; I also want a protocol for apps to explain their own minimal requirements.
I want to be able to say “install the Ruby app ‘pisshorse’”. And it goes and finds that gem. And it sees what Ruby version it claims to work on, and installs an RVM environment with that version. And it makes a new gemset and installs the gem. And it looks at a metadata file in a Well-Known Place, and it sees that the file demands a Postgres database and a Redis instance. And it inspects the common ways you might expect to be able to connect to Postgres or Redis. And then it asks me where Postgres and Redis are, and it offers whatever it found as defaults, and it accepts something concise like postgresql:///pisshorse rather than ten separate fields that make no sense if you’re not connecting over TCP. And it double-checks that those are okay, and it writes them to a very small configuration file in ~/.config/webapps/pisshorse or wherever. At no point am I asked to configure some ridiculous value like the TTL of database connections, which no one cares about and which the computer should be smart enough to gauge on its own.
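Part of the appeal of a single URL is that it already carries everything the ten separate fields would. A small sketch using Python's standard urllib.parse (the helper name `describe_database_url` is made up for this example): with postgresql:///pisshorse the host is simply absent, which leaves the connection method, such as a local socket, up to the client.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a database URL into the fields a config form would ask for."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None when the URL names no host
        "port": parts.port,      # None when the URL names no port
        "user": parts.username,  # None when the URL names no user
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(describe_database_url("postgresql:///pisshorse"))
    # The host, port, and user below are placeholder values for illustration.
    print(describe_database_url("postgresql://forum@db.example.com:5433/pisshorse"))
```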
If this is a shared host and you only have one Postgres database, that’s totally fine, because this is a magical world where people actually know about and use Postgres schemata, and apps actually support them.
The metadata file also lists any system-level libraries or binaries that are required (or desired), and if any of them aren’t installed, you’ll be asked to install them, with a single apt-get or yum or packer command you can inspect and then run yourself. Again, if this is a shared host and you can’t install software yourself, then the installer can either attempt to do it locally or just give up, and everything’s fine because it turns out web forums don’t actually need optipng and can just carry on without it.
Then it adds the app to your user-scoped daemon manager, and if you don’t have one then it quietly pretends to be one, using the @reboot hack. And it sees that the app also needs a job queue running, so it adds that too. It uses gunicorn or unicorn or uwsgi or whatever, but you don’t actually care which, and if you do then you can ask for a different one. It defaults to only two workers, but it also keeps an eye on the load and spawns a few more if necessary, learning how much traffic is normal as it goes. If it thinks it’s eating too much of the machine, it sends you an email or pings you on IRC or whatever.
The app is bound to ~/.config/webapps/pisshorse/pisshorse.sock, which isn’t too useful to you. And this is the hard part that I haven’t figured out yet, because there’s not really a good way to determine what your HTTP vhost setup looks like, and if you’re using nginx then you still need root. But I have ideas for a couple (convoluted) workarounds, so let’s pretend that the world is a nice place and it can set up the reverse proxying for you, without needing root. It even adds rules for caching the static assets (also defined in the metadata file), and perhaps can ask you for a CDN if you have one.
Now the app runs, but it has no users, and you can’t log in because you don’t have a confirmed email yet. But that’s okay, because the metadata file also specifies a few administrative commands you can run from the command line, and of course the magical web GUI can also do this for you.
From here you can basically forget about the management GUI. But it quietly collects logs and stats, and there are graphs to look at if you please. If at any point the app fails to start, or there’s a sharp uptick in failures on pages that used to work, or it can’t keep up with requests, or the job queue is broken, you get a ping.
Eventually you’ll need to upgrade, and that’s also fine, because it’s just a single button click. Your current instance goes into read-only mode, which is a thing that all apps support, because it would be embarrassing if they didn’t. The job queue is shut down, the database is copied and upgraded, and a separate new instance of the app is launched. New requests are directed to the new code, the old instance is shut down, and the old database is archived. Or, if the new instance immediately starts to spew errors, the old code is kept up and an irate email is automatically sent to the app’s maintainer. Either way, the disruption is minimal.
And the app benefits as well, because it uses a small library that knows whether it’s running under gunicorn or uwsgi or something else, and can perform some simple tasks like inspect its own load or restart itself or run some simple code outside a request.
I can dream.
We’ve been doing this for 20 years. We should have this by now. It should work, it should be pluggable and agnostic, and it should do everything right — so if you threw away the web gui, it would look like something a very tidy sysadmin set up by hand, not autogenerated sludge.
Instead, we stack layer after layer of additional convoluted crap on top of what we’ve already got because we don’t know how to fix it. Instead, we flit constantly from Thin to Mongrel to Passenger to Heroku to Bitnami to Docker to whatever new way to deploy trivial apps came out yesterday. Instead, we obsess over adding better Sass integration to our frameworks.
And I’m really not picking on Ruby, or Rails, or this particular app. I hate deploying my own web software, because there are so many parts all over the system that only barely know about each other, but if any of them fail then the whole shebang stops working. I have at least five things just running inside tmux right now, because at least I can read the logs and restart them easily.
This is terrible and we should all be ashamed. No wonder PHP is so popular. How am I supposed to tell a new web developer that this is what they have to look forward to?
Marco Slot: Using Docker to run a pg_shard cluster
Docker is quickly becoming one of the most popular ways of deploying distributed applications. By bundling all dependencies of an application in an easily shippable container, software deployment becomes a process that can be performed quickly and often. The open source sharding extension for PostgreSQL, pg_shard, and scalable real-time analytics solution for PostgreSQL, CitusDB, are both meant to run on a cluster of PostgreSQL servers. Deploying such a cluster can become a lot easier with Docker.
While there is no officially supported Docker container for Citus Data extensions at the moment, we were very excited to learn that Heap has published citus-docker. Heap uses CitusDB to analyze their click stream data in real-time with some advanced funnel queries. Their Docker image comes with both pg_shard and CitusDB pre-installed. If you haven't used docker before, you can follow the installation guide to set it up.
One of the benefits of Docker is that it lets you set up a whole cluster on your machine for testing very easily. pg_shard users often set up a cluster on their desktop, running multiple postgres servers on different ports. This approach is not very practical since it requires you to go through all the configuration steps multiple times. With docker-compose, setting up a local cluster becomes a breeze.
If you haven't done so already, you can install docker-compose with the following command:
sudo su
curl -L https://github.com/docker/compose/releases/download/1.3.3/docker-compose-`uname -s`-`uname -m` > /usr/bin/docker-compose
chmod +x /usr/bin/docker-compose
Now run the following:
git clone https://github.com/heap/citus-docker
cd citus-docker
docker-compose up -d
That's it! No additional configuration required. You now have a local pg_shard cluster with 2 worker nodes and a master node, which you can connect to using: psql -h localhost -U postgres. When you are done, you can remove it using:
docker-compose kill
docker-compose rm
You could also start individual nodes by running the docker command on every node in the cluster:
docker run -d -p 5432:5432 --name citusdocker heap/citus-docker
On the master node, you will want to configure pg_worker_list.conf and add pg_shard to shared_preload_libraries per the instructions on the pg_shard github page:
docker exec -it citusdocker bash
cat > /data/pg_worker_list.conf <<WORKERS
# Enter IP addresses and ports of worker nodes
10.61.164.42 5432
...
WORKERS
psql -c "ALTER SYSTEM SET shared_preload_libraries TO 'pg_shard'"
exit
After changing this configuration, you should restart your container:
docker restart citusdocker
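The worker list written in the step above is a plain file with one host and port per line, so for larger clusters it can be generated rather than typed. A minimal sketch in Python, assuming only the "host port" line format shown above; the function name `render_worker_list` and the second address in the usage example are hypothetical.

```python
def render_worker_list(workers):
    """Render (host, port) pairs in the 'host port' format shown above."""
    lines = ["# Enter IP addresses and ports of worker nodes"]
    lines += [f"{host} {port}" for host, port in workers]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The second address is a placeholder, not a real node.
    print(render_worker_list([("10.61.164.42", 5432), ("10.61.164.43", 5432)]), end="")
```

The output could then be written to /data/pg_worker_list.conf inside the master container.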
You can now connect to the docker container running on the master node using psql -h localhost -U postgres and follow the table sharding examples. If you plan to use docker to operate a cluster, we recommend looking into persistent storage options for docker and creating a new image.
When you would like to get rid of your docker container, run the following commands:
docker kill citusdocker # Stop the container
docker rm citusdocker # Delete the container
We hope this gives a good starting point for using pg_shard and CitusDB with docker. Special thanks to Heap for providing the image! We'll also look to provide an official docker image for CitusDB in the upcoming months.
Issue 18 of LibreOffice Magazine is online
John Hyde, the man who prayed

“The prayer of a righteous man accomplishes much in its effects” (James 5:16)
Wilbur Chapman wrote to a friend: “I have learned some great lessons about prayer. At one of our missions in England, the audiences were extremely small. But I received a note saying that an American missionary […] was going to pray that God would bless our work. He was known as Hyde, the man who prayed.
“Almost instantly the tide turned. The hall was packed and, at my first appeal, fifty men gave themselves to Christ as Savior. As we were leaving, I said: ‘Mr. Hyde, I want you to pray for me.’ He came to my room, turned the key in the door, fell to his knees, and waited five minutes without a single syllable on his lips. I could hear my own heart beating, and his. I felt hot tears running down my face. I knew I was with God. Then he lifted his face, also wet with tears, and said: ‘O God!’
“Then, for at least five minutes, he was silent again; afterwards, when he knew he was speaking with God […] there came from the depths of his heart petitions for men such as I had never heard before. When I rose, I had learned what it really was to pray. We believe that prayer is power, and we believe it as never before.”
The Spirit who dwelt in John Hyde, the man who prayed, is the same Spirit of intercession who remains in us.
“Give me souls, O God, or I die!”
(John Hyde)
(Translated by M. Luca from Wilbur Chapman On “Praying” John Hyde. Revised by Francisco Nunes. This article may be freely distributed and used, provided that the text is not altered, that the authorship, translation, revision, and source information is kept, and that it is used exclusively free of charge.)
The post John Hyde, the man who prayed first appeared on Campos de Boaz.
3 cPanel Plugins for Added Server Security
256
And to celebrate Programmer's Day, the Vida de Suporte store is running a promotion:
Use the coupon gerson and get a 10% discount plus a free sticker.
But don't wait, because the coupons are limited and valid only until 15/09!
256 is a post from the Vida de Suporte blog.
Tomas Vondra: Common issues with planner statistics
Some time ago I explained that there really are two kinds of statistics in PostgreSQL, and I explained what are the common issues with statistics tracking database activity, and how (not to) fix them.
That however does not mean there are no issues with data distribution (planner) statistics, and as that's one of my areas of interest, in this post I'll discuss the usual issues with data distribution statistics, usually observed by the user as slow (or even failing) queries. And I'll also mention what are the possible remedies available (if any).
As mentioned in the initial post, data distribution stats describe data stored in the database, so that the planner can use this to choose the best plan. For each column there's a bunch of info about the data distribution:
- (estimated) number of distinct values
- most common values
- histogram of the data
- correlation (with respect to position in the table)
and various additional info. All this is subsequently used when planning and optimizing the queries - estimating selectivity of WHERE conditions, cardinality of joins, amount of memory needed to store some auxiliary data structures (e.g. hash tables for a hash join and hash aggregate), etc.
The official documentation explains the basics quite well - overview of statistics used by the planner and range of row estimation examples are definitely worth reading. And of course, understanding EXPLAIN is an absolute necessity for those who investigate slow queries.
But let's do at least a very (very very) quick crash course here, because we can't really talk about the failures otherwise.
The basics of cost-based optimization
So what happens when the database is planning a query? How does it decide whether to use an index or not, whether to aggregate using sorted or hashed approach, etc.? In the past the databases were often rule based, i.e. those planning decisions were in a sense hard-coded, either into the database itself, or into the query. But nowadays most databases use cost-based optimization, i.e. they try to associate each plan with a "cost" of the execution (factoring in expected I/O, CPU, memory requirements and such), and then choose the cheapest one because it's also assumed to be the fastest one.
To do this, the database has to do reasonably accurate estimates of various things - size of source relation and joins, selectivity of the conditions, size of aggregated relations (GROUP BY results) etc. This is where the data distribution / planner stats are absolutely vital.
For example when you run a query like this:
SELECT * FROM events WHERE event_date BETWEEN '2014-01-10' AND '2014-01-11';
the optimizer peeks into pg_class catalog (a special kind of table) and reads reltuples and relpages for the events table, so that it knows how many rows are there (aka cardinality) and how large the table is on disk (how many 8kB pages it has).
Then it peeks into pg_statistic catalog, reads the histogram and list of most common values for the event_date column, and estimates what portion of rows matches the WHERE condition. This is known as a selectivity of the condition, and by multiplying it with reltuples you get the expected size of the query result.
And then it may use this information to decide whether to use index to lookup these rows, for example.
pg_statistic and pg_stats
So, what statistics are available, actually? For each column, there may be these statistics:
- null_frac - fraction of values that are NULL
- avg_width - average width of the values
- n_distinct - number of distinct values in the column
- most_common_vals - list of most common values (MCV)
- most_common_freqs - frequencies of most common values
- histogram_bounds - equi-depth histogram, i.e. bounds splitting the values into buckets of roughly equal row counts (excluding the values listed in most common values)
- correlation - correlation between values and physical row order
Not all statistics may be available - for mostly uniform distributions there may be no most common values, for columns with just a few values there may not be histograms, etc. I've also omitted a few columns related to array element statistics, introduced a few versions back.
BTW if you want to look at these stats, don't use pg_statistic directly, because the data there are stored in a format suitable for the planner (not for humans). Use pg_stats instead, which is a view on top of pg_statistic making the stats comprehensible.
To compute the selectivity of a condition, the planner fetches this info for the event_date column referenced in the query, looks at the MCV list and histogram, and uses it to compute the selectivity of the condition - let's say it's ~5% of the rows. It then takes reltuples (an estimate of the current number of rows in the table) from pg_class, and multiplies it with the selectivity to get the expected number of rows returned by the query. If the table has 1.000.000 rows, the query will probably return about 50.000 rows.
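The arithmetic in this paragraph can be sketched in a few lines of Python. This is a simplified illustration, not PostgreSQL's actual code: it ignores the MCV list and NULLs, assumes every histogram bucket holds roughly the same number of rows, and interpolates linearly inside a bucket. The function names and the sample bounds are invented for the example.

```python
from bisect import bisect_right


def fraction_below(bounds, value):
    """Estimated fraction of rows whose column value is <= value."""
    if value <= bounds[0]:
        return 0.0
    if value >= bounds[-1]:
        return 1.0
    buckets = len(bounds) - 1
    i = bisect_right(bounds, value) - 1          # index of the bucket holding value
    lo, hi = bounds[i], bounds[i + 1]
    within = (value - lo) / (hi - lo)            # linear interpolation inside the bucket
    return (i + within) / buckets


def range_selectivity(bounds, low, high):
    """Estimated fraction of rows with low <= column <= high."""
    return max(0.0, fraction_below(bounds, high) - fraction_below(bounds, low))


def estimated_rows(reltuples, selectivity):
    """Expected result size: table row count times selectivity."""
    return round(reltuples * selectivity)


if __name__ == "__main__":
    bounds = [0, 10, 20, 40, 100]                # made-up histogram with 4 buckets
    print(range_selectivity(bounds, 10, 15))     # prints 0.125
    print(estimated_rows(1_000_000, 0.05))       # prints 50000
```

The second call reproduces the example from the text: a 5% selectivity on a table of one million rows gives an estimate of about fifty thousand rows.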
It then does similar evaluation for other parts of the query (other conditions, joins, aggregations, ...) and uses this info to estimate the "execution cost" of the entire plan. Of course, doing this for a single plan would be pointless - the point is that the planner generates multiple possible plans (often very many), computes the expected cost for each of them and then chooses the "cheapest one" because it's expected to be the fastest.
For example there might be an index on the event_date column, so the planner needs to decide whether to perform a simple Sequential Scan, Index Scan or a Bitmap Index Scan. For small fractions of the table an Index Scan is the best option, for large portions it's the Sequential Scan, and the Bitmap Scan is somewhere between. So the planner generates three possible plans, assigns them a cost (based on how many I/O and CPU resources they'll need), and then chooses the cheapest one.
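The choice between the three scans can be mimicked with a toy cost model in Python. The three constants are PostgreSQL's default values for seq_page_cost, random_page_cost and cpu_tuple_cost, but the formulas themselves, the `BITMAP_SETUP_COST` constant and the table sizes are deliberately crude assumptions made for this sketch and are not the planner's real cost functions.

```python
SEQ_PAGE_COST = 1.0       # PostgreSQL default for seq_page_cost
RANDOM_PAGE_COST = 4.0    # PostgreSQL default for random_page_cost
CPU_TUPLE_COST = 0.01     # PostgreSQL default for cpu_tuple_cost
BITMAP_SETUP_COST = 50.0  # hypothetical overhead for building the bitmap


def plan_costs(relpages, reltuples, selectivity):
    """Toy cost estimates for the three scan types (not the real formulas)."""
    rows = reltuples * selectivity
    # Sequential scan: read every page in order and look at every row.
    seq_scan = relpages * SEQ_PAGE_COST + reltuples * CPU_TUPLE_COST
    # Index scan: assume one random page read per matching row.
    index_scan = rows * (RANDOM_PAGE_COST + CPU_TUPLE_COST)
    # Bitmap scan: each touched page is read once, at a price between
    # sequential and random, after paying a fixed setup cost.
    pages_touched = min(relpages, rows)
    page_cost = (SEQ_PAGE_COST + RANDOM_PAGE_COST) / 2
    bitmap_scan = BITMAP_SETUP_COST + pages_touched * page_cost + rows * CPU_TUPLE_COST
    return {"Seq Scan": seq_scan, "Index Scan": index_scan, "Bitmap Scan": bitmap_scan}


def cheapest_plan(relpages, reltuples, selectivity):
    """Pick the plan with the lowest estimated cost."""
    costs = plan_costs(relpages, reltuples, selectivity)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for selectivity in (0.00001, 0.001, 0.5):
        print(selectivity, cheapest_plan(10_000, 1_000_000, selectivity))
```

With these made-up numbers the winner moves from Index Scan to Bitmap Scan to Seq Scan as the matched fraction grows, which is the pattern described above.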
This process is usually called cost-based optimization and it's one of the crucial pieces that turns modern database systems into declarative programming environments. The database receives a declarative specification of the result (aka SQL query) and chooses the execution path it believes to be the most efficient one. What could possibly go wrong with that, right?
Well, sometimes things go wrong. The main issue is that data distribution statistics are just simplified summaries of data, and thus inaccurate (not describing all the tiny details) by nature. You can see this as a lossy compression, where by removing some details you get a more compact representation of the data, but this inaccuracy may propagate to the estimates and then into the estimated cost of a plan. And if the estimates for multiple plans get sufficiently wrong, an inefficient one may be chosen in the end, resulting in much longer query execution.
But how could that happen? I'll try to explain that in a minute, let's look at the cost first.
Cost stability principles
At this point, you're probably screaming "But the costs are just estimates, so they're going to be wrong all the time! How the hell can this even work?" And to some degree you're right - the costs are inaccurate all the time, but it works fine most of the time thanks to a set of "cost stability principles" that makes this much more reliable than you might expect (although sometimes I'm pretty sure it works mostly thanks to gnomes and pixies trapped in the CPU).
The cost estimates are generally expected to have these properties (I'm not aware of established terms for those properties. If there are, please let me know in the discussion or by e-mail.):
- correlation to query duration: The estimated cost is correlated with duration of the query, i.e. higher cost means longer execution.
- estimation stability: A small error in estimation causes only a small difference in cost.
- cost stability: Small cost difference means small difference in duration.
- cost comparability: For a given query, two plans with (almost) the same costs should result in (almost) the same duration.
This does not say the relation between the cost and actual runtime is linear, or that you can reliably estimate the duration from the cost. Cost 1 usually means a few milliseconds, and cost 1000000000 may easily result in queries running for hours or days, but the relation is very complex and non-linear.
It also does not say it makes sense to compare cost between different queries, it only says you can compare costs for plans for a given query. Sure, to some extent it is possible, but there's not much use for that anyway (because the costs are a tool to choose plans for a particular query, not plans across queries).
It does, however, say that you don't need to worry too much about the accuracy of cost estimates. The goal is to choose an efficient execution plan - the most efficient one in the ideal case. If there are two plans with similar costs, a small estimation error may result in choosing the second plan. But as the cost difference is also small (estimation stability), the difference in actual duration of the two plans should be small too (cost stability).
In practice, the most efficient plan is usually way cheaper than the other plans, so this works fine unless there are significant estimation errors. And "significant" is usually interpreted as "at least an order of magnitude wrong" - e.g. estimating that a condition matches 1000 rows while in reality it matches 100,000 (i.e. 100x more). A difference this large may easily result in a poor plan choice further down the road - choosing a sequential scan when an index scan would perform much better, etc.
Causes of misestimations
But how could an estimate get this wrong? A number of reasons, actually ...
Inaccurate statistics
Sometimes the distribution is so complex that the default level of detail (number of elements in the MCV list, number of intervals in the histogram) is not enough, which results in inaccurate selectivity estimates. How detailed the statistics are is determined by default_statistics_target, which specifies how many items may be tracked in an MCV list or how many buckets a histogram may have. Since 8.4 the default value is 100, so MCV lists may have up to 100 items and histograms may have 100 buckets.
So, let's see an example where the MCV list size is insufficient. First, let's construct a table with 1000 frequent values, and many (999000) values that are unique.
CREATE TABLE t (v INT);
INSERT INTO t SELECT mod(i,1000)+1 FROM generate_series(1,1000000) s(i);
INSERT INTO t SELECT i FROM generate_series(1001, 1000000) s(i);
ANALYZE t;
To accurately represent this in an MCV list, we'd need up to 1000 entries, but we only have 100. So let's see some queries - first for the unique values.
EXPLAIN ANALYZE SELECT * FROM t WHERE v = 10000;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t (... rows=58 ...) (actual ... rows=1 ...)
Filter: (v = 10000)
Rows Removed by Filter: 1998999
Well, it's not perfectly accurate, but not bad - the unique values can't get into the MCV list, and have to be estimated using the histogram. So some fuzziness is expected. Now, let's see one of the common values:
EXPLAIN ANALYZE SELECT * FROM t WHERE v = 100;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t (... rows=58 ...) (actual ... rows=1000 ...)
Filter: (v = 100)
Rows Removed by Filter: 1998000
Not that great. Apparently this value did not make it into the MCV list, and falls back to the histogram just like the unique values. (The values that make it to the MCV list may differ for each random sample of rows, so you may have to try a few values to get a misestimate.)
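To check whether a particular value made it into the MCV list, you can peek into the pg_stats view. A minimal sketch, assuming the table t and the integer column v created above (the cast through text is just a convenient way to unpack the anyarray column, not the only way to do it):

```sql
-- How many MCV entries did ANALYZE keep for column v?
SELECT array_length(most_common_freqs, 1) AS mcv_entries
  FROM pg_stats
 WHERE tablename = 't' AND attname = 'v';

-- Is the value 100 among them?
-- (A NULL result means no MCV list was stored for the column at all.)
SELECT 100 = ANY (most_common_vals::text::int[]) AS in_mcv_list
  FROM pg_stats
 WHERE tablename = 't' AND attname = 'v';
```

As the sample is random, the answer for a given value may change from one ANALYZE run to the next.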
Luckily, this is quite easy to fix because that's exactly what default_statistics_target is for - just crank it up a bit (either globally or for a single column), run ANALYZE and you're done.
SET default_statistics_target = 1000;
ANALYZE t;
EXPLAIN ANALYZE SELECT * FROM t WHERE v = 100;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t (... rows=1079 ...) (actual ... rows=1000 ...)
Filter: (v = 100)
Rows Removed by Filter: 1998000
It's good to do this only for columns that actually need this (those with strange distributions and often used in queries), because it means higher overhead both for ANALYZE and planning. For example
ALTER TABLE events ALTER COLUMN event_date SET STATISTICS 1000;
increases the statistics target on event_date column from 100 to 1000, making the MCV lists and histograms 10x more detailed.
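The new target only affects statistics collected afterwards, so the column has to be analyzed again. A short sketch of the whole round trip, assuming an events table with an event_date column (the CREATE TABLE line is only there to make the snippet runnable on its own, your real table will look different):

```sql
-- Stand-in for the real table, only so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS events (event_date date);

-- Raise the per-column target and rebuild the statistics for that column.
ALTER TABLE events ALTER COLUMN event_date SET STATISTICS 1000;
ANALYZE events (event_date);

-- Verify the per-column setting in the catalog.
SELECT attname, attstattarget
  FROM pg_attribute
 WHERE attrelid = 'events'::regclass
   AND attname = 'event_date';

-- Go back to following default_statistics_target.
ALTER TABLE events ALTER COLUMN event_date SET STATISTICS -1;
```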
Inaccurate statistics / ndistinct
A somewhat special case of inaccurate statistics is the ndistinct estimate (number of distinct values in the column). It sounds quite simple but is actually surprisingly difficult to estimate reliably. For some data distributions (correlated columns) it's made worse by our current row sampling implementation, producing imperfect row samples (not quite random), which causes serious issues in the ndistinct estimator (which of course assumes random row samples).
For example let's create a table with 100,000,000 rows, containing 10,000,000 distinct values in the first column (the padding column is there to make the table larger, which influences the sampling):
CREATE TABLE t AS SELECT i/10 AS a, md5(i::text) AS padd
FROM generate_series(1,100000000) s(i);
Now, let's analyze the table and see the ndistinct estimate for the first column
ANALYZE t;
SELECT n_distinct FROM pg_stats WHERE tablename = 't' AND attname = 'a';
n_distinct
------------
421450
Well, we know there are 10M distinct values, but the estimate is just 421,450, so 23x under-estimated. Let's see what would happen if the table was even larger, by lowering the statistics target (so making the sample smaller with respect to the table).
SET default_statistics_target = 10;
ANALYZE t;
SELECT n_distinct FROM pg_stats WHERE tablename = 't' AND attname = 'a';
n_distinct
------------
49005
So this time the estimate is about 200x under-estimated, and it's not difficult to come up with even worse examples.
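If you don't know the real number of distinct values up front, you can compare the stored estimate with an exact count. A sketch, assuming the table t and column a from above and that t resolves to a single table on your search_path; note the exact count has to read the whole table, so it is slow on 100M rows:

```sql
-- n_distinct is stored either as an absolute number (positive) or as a
-- negative fraction of the row count, so normalize it before comparing.
SELECT CASE WHEN s.n_distinct >= 0 THEN s.n_distinct
            ELSE -s.n_distinct * c.reltuples
       END AS estimated_ndistinct,
       (SELECT count(DISTINCT a) FROM t) AS actual_ndistinct
  FROM pg_stats s, pg_class c
 WHERE c.oid = 't'::regclass
   AND s.tablename = 't'
   AND s.attname = 'a';
```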
Cases like this may easily cause OOM errors in HashAggregate if you need to do GROUP BY on the under-estimated column (sadly, hash aggregate is about the only node that still does not respect work_mem limit).
Increasing statistics_target often improves the ndistinct estimates (we've seen that lowering makes the estimate worse), but the maximum value is 10000 which may not be sufficient for very large tables. And moreover there's a better solution - overriding the estimate with a fixed value (which is not quite possible with MCV lists or histograms, because those are complex statistics).
For example
ALTER TABLE events ALTER COLUMN event_date SET (n_distinct = 12345);
sets the number of distinct values in the event_date column to 12345. The override is applied by the next ANALYZE, and you may remove it by setting it to 0.
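Here is a small end-to-end sketch of the override on a throwaway table (nd_demo is a made-up name, and the data is sized so that the true number of distinct values is known to be 10001):

```sql
-- 100,000 rows, values 0..10000, i.e. 10001 distinct values.
CREATE TEMP TABLE nd_demo AS
SELECT i / 10 AS a FROM generate_series(1, 100000) s(i);

-- The estimate ANALYZE comes up with on its own.
ANALYZE nd_demo;
SELECT n_distinct FROM pg_stats
 WHERE tablename = 'nd_demo' AND attname = 'a';

-- Override with an absolute number; it is picked up by the next ANALYZE.
ALTER TABLE nd_demo ALTER COLUMN a SET (n_distinct = 10001);
ANALYZE nd_demo;
SELECT n_distinct FROM pg_stats
 WHERE tablename = 'nd_demo' AND attname = 'a';

-- A negative value is a fraction of the row count (here: 10% of the rows),
-- which keeps the override meaningful as the table grows.
ALTER TABLE nd_demo ALTER COLUMN a SET (n_distinct = -0.1);
ANALYZE nd_demo;

-- Remove the override again.
ALTER TABLE nd_demo ALTER COLUMN a RESET (n_distinct);
```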
Complex conditions
Even if you have accurate statistics for all the columns, it's quite simple to make them useless by using conditions that are somehow incompatible with the statistics. The statistics are applicable only to simple column conditions - once you use the column in an expression (for example column LIKE '%aaa%') or when you apply a function (like UPPER(column) = 'ABC' or date_part('year', column) = 2014) it's pretty much game over.
Such complex conditions make it mostly impossible to use the statistics at all, because the planner does not know how to apply the statistics to the expression (which may change ordering, for example, so the histograms make no sense), or how to "undo" the expressions, which is actually quite a tricky thing (likely impossible in general, especially when it's a function call, so entirely opaque to the planner). In those cases the planner just uses some reasonable default selectivities, which may work most of the time, but obviously not always.
Sometimes it's possible to fix this manually by inverting the conditions - sometimes it's as trivial as rewriting column + 1 > 100 to column > 99. Sometimes it's necessary to apply some additional knowledge of what the function does, e.g. date_part('year', column) = 2014 may be rewritten as column >= '2014-01-01' AND column < '2015-01-01'.
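The effect of such a rewrite is easy to see by comparing the row estimates of both forms. A minimal sketch on a hypothetical dates_demo table (name and data invented for the example, ten years of dates spread evenly):

```sql
CREATE TEMP TABLE dates_demo AS
SELECT i AS id, date '2010-01-01' + (i % 3650) AS event_date
  FROM generate_series(1, 1000000) s(i);
ANALYZE dates_demo;

-- Function applied to the column: the planner can't use the histogram on
-- event_date and falls back to a default selectivity.
EXPLAIN SELECT * FROM dates_demo
 WHERE date_part('year', event_date) = 2014;

-- Equivalent range condition directly on the column: the histogram applies,
-- so the rows= estimate should land close to the real fraction (about 10%).
EXPLAIN SELECT * FROM dates_demo
 WHERE event_date >= '2014-01-01' AND event_date < '2015-01-01';
```

Compare the rows= values of the two plans with the result of an actual count(*) to see which estimate is closer.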
But sometimes it's not really possible - some conditions simply are complex by nature and can't be rewritten to make them compatible with statistics :-(
Of course, there are various other ways to increase the complexity of queries - joins are a prime example, because not only do join conditions compare multiple columns, but those columns are in different tables.
Dependent columns (aka "cross-correlation")
So far we've been talking about estimating a single condition, but what about multiple conditions? Let's say we have two conditions (column_a = 1) AND (column_b = 2), and we need to estimate them.
By default, most databases assume that all the conditions are independent, which means that you can simply multiply the selectivities of individual conditions, to get the selectivity of the whole WHERE clause (and thus cardinality of the result). This is based on the observation that selectivities are actually probabilities of events "row matches condition" and that probability of independent events is equal to product of probabilities of each event.
So when you have WHERE condition_a AND condition_b, and you know that each condition matches 10% of the rows, you can do (0.1 * 0.1) which is 0.01 and you know that the whole WHERE clause matches ~1% of the whole table. But this was based on the assumption of independence, so what if the columns are correlated in some way?
For example let's assume that the columns contain exactly the same values
CREATE TABLE t AS SELECT i AS a, i AS b, i AS c
FROM generate_series(1,1000000) s(i);
ANALYZE t;
and use two conditions that each match 10% of the table
EXPLAIN ANALYZE SELECT * FROM t WHERE (a < 100000) AND (b < 100000);
The optimizer expects this to match 1% of the table, but in reality this matches 10% because the conditions are perfectly redundant:
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on t (cost=... rows=9973 ...) (actual ... rows=99999 ...)
Filter: ((a < 100000) AND (b < 100000))
Rows Removed by Filter: 900001
so the estimate is 10x lower than it should be. It's not very difficult to make the difference much larger. For example you may add another condition (with the same selectivity), lowering the estimate by a factor of 10:
EXPLAIN ANALYZE SELECT * FROM t WHERE (a < 100000) AND (b < 100000) AND (c < 100000);
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on t (... rows=1038 ...) (actual ... rows=99999 ...)
Filter: ((a < 100000) AND (b < 100000) AND (c < 100000))
Rows Removed by Filter: 900001
Alternatively it's possible to use more selective conditions, which also increases the difference (thanks to the multiplication)
EXPLAIN ANALYZE SELECT * FROM t WHERE (a < 10000) AND (b < 10000);
QUERY PLAN
------------------------------------------------------
Seq Scan on t (... rows=96 ...) (... rows=9999 ...)
Filter: ((a < 10000) AND (b < 10000))
Rows Removed by Filter: 990001
So far we've only seen under-estimates, i.e. the estimated number of rows was much lower than the actual value. It's quite simple to construct examples of the opposite:
EXPLAIN ANALYZE SELECT * FROM t WHERE (a < 500000) AND (b > 500000);
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on t (... rows=249993 ...) (actual ... rows=0 ...)
Filter: ((a < 500000) AND (b > 500000))
Rows Removed by Filter: 1000000
This exploits the fact that both columns really contain the same values, so this particular combination of conditions is "incompatible."
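To quantify how far the independence assumption is off for a pair of conditions, you can compute the individual selectivities, their product, and the actual combined selectivity directly from the data. A sketch on a hypothetical corr_demo table built the same way as t above (this is just an illustration of the arithmetic, not what the planner itself runs):

```sql
CREATE TEMP TABLE corr_demo AS
SELECT i AS a, i AS b FROM generate_series(1, 1000000) s(i);

SELECT avg((a < 100000)::int)                          AS sel_a,
       avg((b < 100000)::int)                          AS sel_b,
       -- what the independence assumption predicts (roughly 0.01)
       avg((a < 100000)::int) * avg((b < 100000)::int) AS sel_if_independent,
       -- what the data actually says (roughly 0.1)
       avg((a < 100000 AND b < 100000)::int)           AS sel_actual
  FROM corr_demo;
```

The further sel_actual is from sel_if_independent, the more the estimates for that combination of conditions will be off.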
But those are simple artificial examples, constructed as an illustration - what about actual data?
In real-world data sets, the independence assumption is only rarely met perfectly. Sometimes columns may be truly independent, often the correlation is very weak (so that it does not really impact the estimates significantly), but sometimes it's very strong and makes the estimates significantly off - just like in the previous examples.
But even if the columns are strongly correlated, it may not be an issue - it really depends on what types of queries you're executing, i.e. what kind of workload you're dealing with. So let's talk about OLTP and OLAP workloads for a while.
OLTP and OLAP
OLTP is the kind of workload that naturally works with small subsets of the data - accesses individual records using a PK, a few dozen rows using an index, and so on. So this kind of workload already works with rather low estimates, and the under-estimates won't change the plan significantly. If the query was using Nested Loop before, it's still going to choose Nested Loop.
OLAP workloads, however, work with much larger sets of rows, as they perform analytical queries - large aggregations, selection of large subsets of the data, etc. In this case, the under-estimate may easily change the plan - a scan may switch from Bitmap Index Scan to a plain Index Scan, a join may switch from Hash Join to Nested Loop, and so on. If you've ever dealt with such issues, you know how serious an issue this is.
Of course, applications are often a mix of OLTP and OLAP queries. An OLTP application may use a few analytical queries for reporting purposes, or perform batch updates - both of which are rather OLAP-style queries. Similarly, an OLAP application may allow ad-hoc updates of individual records and other OLTP-style queries.
Under and over-estimates
When I was explaining the impact on OLTP and OLAP workloads in the previous section, I was only talking about under-estimates. There's a good reason for that - the consequences of under-estimates are usually much more severe, in my experience.
Of course, if an over-estimate changes the plan (e.g. by choosing Bitmap Index Scan instead of Index Scan, or Hash Join instead of Nested Loop), it's likely to make the query slower. But the actual cost should be lower than the estimated cost of the chosen plan, because it's not processing as many rows as expected - a Hash Join needs to build the hash table anyway, but then it will perform maybe 100 lookups instead of the estimated 100,000. A Bitmap Index Scan will have to build the bitmaps even though there are just 100 matching rows. And so on. In a sense, the estimated cost of the plan is an upper bound on the actual cost.
With the under-estimates, it does not really work this way - there's no such upper cost limit, and the actual cost may grow arbitrarily large.
Anyway, estimates on correlated columns are not a new problem (see for example this paper from VLDB 1997), but it's surprisingly difficult to solve well (without overhead that would make it impractical to use) and only very few databases implement multi-column statistics (or something like that). We currently don't have anything to address this in PostgreSQL, although I'm working on a patch that should make this possible.
Getting the top grade on the Qualys SSL Labs test
Submitted by Tiago "Myhro" Ilieve (contatoΘmyhro·info):
“Here is a guide on how to get an "A+" rating on the test with nginx.” [reference: ]
The article "Getting the top grade on the Qualys SSL Labs test" was originally published on the BR-Linux.org site, by Augusto Campos.
Diffy, the service Twitter uses to "test" code, is now open source
Twitter has released Diffy, a tool the social network uses to find bugs in the code of its services every time they are updated.

The idea is that, since every update to a service expands the existing code, the prudent (if complicated) thing to do is to test its different components. Designing efficient tests is not simple, much less making sure that all the components are considered together, so Diffy aims to deliver useful results with little effort.
Diffy works by running the old code and the new code simultaneously, then analyzing the outcome and generating reports with the details found in the comparison.
The outline of how Diffy works, comparing different instances running the new and the old code, along with an example and first-use instructions, can be found both on the Twitter blog and on the Diffy page on GitHub, where it is already available for adaptation and/or improvement.
Link: Diffy on GitHub | More information: Twitter's official blog
Article written at br.wwwhatsnew.com