Planeta PythonBrasil

7 subjects (and GitHub repositories) to become a better Go Developer

2021-07-11T00:00:00+00:00

The wide adoption of Go by developers and large companies has led employers to look for engineers with Go experience. That can create a lot of pressure about what to study to become a better engineer. The answer is very personal and requires planning what to study and when, including subjects outside engineering. In this blog post I list some topics (with repositories and links) that I consider important for becoming an engineer with stronger Go knowledge: good practices for writing code, code structure concepts (usually through design patterns), scalable code, and clean code.

Interview Series on Research at UFPA's PPGCC: Computational Intelligence

2021-06-21T21:51:57+00:00

The Faculty of Computing and the Graduate Program in Computer Science (PPGCC) at UFPA are running a project with two goals: first, to better publicize what our research produces to audiences outside the university; second, to better publicize the same thing INTERNALLY, that is, what we develop in our research.

Yes, INTERNALLY. As if our already poor and scattered communication with our own students about what we do were not enough, the pandemic has only made things worse. After more than a year without close contact with classes, with fully remote teaching and little interaction outside class, students are generally lost about the possible topics for final-year projects (TCCs) and research they could carry out with us.

For that reason, the two units are interviewing every professor so they can talk a little about the topics they work on, their research history, what is being done and, most interestingly, what could be done next. The interviews take place on the Computação UFPA YouTube channel and are later reworked to appear on FacompCast.

With the introductions done: this Tuesday, 22/06, at 11 a.m., my friend Prof. Jefferson Morais and I will talk a little about the research in Computational (or should that be Artificial?) Intelligence that we carry out. It will be a good overview of the work in the 4 areas where we are active (machine learning, metaheuristics, fuzzy systems and multi-agent systems), presenting current and new projects for anyone interested.

See you soon in the interview room.

UPDATE:

The recording is now available below:

Decisions in a software engineering (development) career

2021-06-18T00:00:00+00:00

When deciding on the next step in your professional path, I believe it is worth considering the following factors (1). Depending on where you are personally and/or professionally, different factors will be more (or less) important (2). Important: this assumes you have decided to join a company rather than create one. I am not discussing whether or not to found a company; the goal is to demystify the process and help anyone who wants to join an existing company, and the text is biased toward people early in their careers.

Accessing Google Firestore on Vercel

2021-06-14T00:00:00+00:00

Or on any other cloud service, or language.

TL;DR: Use GOOGLE_APPLICATION_CREDENTIALS with a valid JSON credential to use any Google APIs anywhere.

Firebase Hosting is great, but the new Vercel is awesome for NextJS apps. On Vercel your code runs on Lambda@Edge and it is cached on CloudFront; in the same way, Firebase uses Fastly, another great CDN.

You cannot take full advantage of a NextJS app on Firebase Hosting; for that you need Vercel or a manual deployment.

I like to use Firestore on some projects. Unfortunately it is “restricted” to the internal network of Google Cloud, but there is a trick: download a service account key and export an environment variable named GOOGLE_APPLICATION_CREDENTIALS containing the path to the downloaded credential file.

First, download the JSON file following these steps.

Then, convert the credentials JSON file to base64:

cat ~/Downloads/project-name-adminsdk-owd8n-43fca28a2a.json | base64

Now copy the result and create an environment variable on Vercel named GOOGLE_CREDENTIALS and paste the contents.

On your NextJS project, create a pages/api/function.js and add the following code:

import os from "os"
import { promises as fsp } from "fs"
import path from "path"

import { Firestore, FieldValue } from "@google-cloud/firestore"

let _firestore = null

// Must be async: the body uses await to write the credentials file.
const lazyFirestore = async () => {
  if (!_firestore) {
    const baseDir = await fsp.mkdtemp((await fsp.realpath(os.tmpdir())) + path.sep)
    const fileName = path.join(baseDir, "credentials.json")
    const buffer = Buffer.from(process.env.GOOGLE_CREDENTIALS, "base64")
    await fsp.writeFile(fileName, buffer)

    process.env["GOOGLE_APPLICATION_CREDENTIALS"] = fileName

    _firestore = new Firestore()
  }

  return _firestore
}

export default async (req, res) => {
  const firestore = await lazyFirestore()

  const increment = FieldValue.increment(1)
  const documentRef = firestore.collection("v1").doc("default")

  await documentRef.update({ counter: increment })

  res.status(200).json({})
}

Done! Now it is possible to use Firestore on Vercel or anywhere.
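Since the same trick is meant to work from other languages, here is a minimal Python sketch of the credential-decoding step, using only the standard library. The function name materialize_google_credentials is hypothetical; GOOGLE_CREDENTIALS is the base64 variable created above, and the assumption is that a Google client library constructed afterwards reads GOOGLE_APPLICATION_CREDENTIALS, as the TL;DR describes.

```python
import base64
import os
import tempfile


def materialize_google_credentials(env_var="GOOGLE_CREDENTIALS"):
    """Decode the base64 credentials stored in env_var into a temporary
    file and point GOOGLE_APPLICATION_CREDENTIALS at it.

    Hypothetical helper; returns the path of the credentials file.
    """
    existing = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if existing and os.path.exists(existing):
        # Already materialized (for example, on a warm function instance).
        return existing

    decoded = base64.b64decode(os.environ[env_var])
    base_dir = tempfile.mkdtemp()
    file_name = os.path.join(base_dir, "credentials.json")
    with open(file_name, "wb") as handle:
        handle.write(decoded)

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = file_name
    return file_name
```

Call the helper once before creating any Google API client; repeated calls reuse the file that was already written.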

Example project.


Recognition for Open Source work: GitHub Star

2021-06-10T00:00:00+00:00

I am happy to share that I have been awarded GitHub Star status by GitHub, following nominations from several people. Thank you to everyone who made this possible. I have been contributing to and creating Open Source software since 2008. I joined GitHub on 31 October 2008 and saw a universe of opportunities to learn by reading the code of software I used every day. At first I found it hard to hear "no" on my contributions (issues and pull requests), but once I understood the “dynamics” of an open source project I started to treat each "no" as an opportunity to learn.

My goal as a manager: to be let go at the end of the day

2021-05-19T00:00:00+00:00

I tell my team that my goal is to be “let go” at the end of the day. What do I mean by that? My job as a manager is to make my team able to work without depending on me. If they can move forward (deliver what was agreed, support less experienced professionals, communicate with non-technical people, and so on) without depending on me, it means I did a great job of giving everyone the autonomy to make decisions without asking for permission.

Object orientation another way: Property

2021-05-17T21:00:00+00:00

Continuing the series, it is time to discuss encapsulation, that is, hiding a class's implementation details from the rest of the code. In some programming languages this is done with protected or private, and sometimes attributes are accessed through getter and setter functions. In this text we will see how Python handles these matters.

Protected and private methods

Unlike languages such as Java and PHP, which have keywords like protected and private to prevent other classes from accessing certain methods or attributes, Python leaves everything public. That does not mean, however, that every function of a class should be called by others, or that every attribute can be read and changed carelessly.

So that whoever is writing code knows which functions or attributes should not be accessed directly, the convention is to start their names with _, much like hidden files on UNIX systems, whose names start with a dot (.). This convention was already followed in the AutenticavelComRegistro class from the post about mixins, where the function that gets the system date was named _get_data. It is only a suggestion, though; nothing prevents the function from being called, as in the example below:

from datetime import datetime


class Exemplo:
    def _get_data(self):
        return datetime.now().strftime('%d/%m/%Y %T')


obj = Exemplo()
print(obj._get_data())

However, some libraries also use _ to mark other information, such as object metadata, that can be accessed without much trouble. It is therefore possible to use the symbol twice (__) to indicate that a variable or function really should not be accessed from outside the class. Python then mangles the name, so calling the function by its original name raises an error saying the attribute was not found, although it can still be reached through the mangled name:

from datetime import datetime


class Exemplo:
    def __get_data(self):
        return datetime.now().strftime('%d/%m/%Y %T')


obj = Exemplo()
print(obj.__get_data())  # AttributeError
print(obj._Exemplo__get_data())  # Runs the function
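The renaming from __get_data to _Exemplo__get_data is known as name mangling, and one practical effect is that a double-underscore name in a parent class does not collide with the same name in a child class. The sketch below illustrates this with hypothetical classes Base and Child and a hypothetical attribute __token:

```python
class Base:
    def __init__(self):
        self.__token = 'base'      # stored as _Base__token

    def base_token(self):
        return self.__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = 'child'     # stored as _Child__token, no clash

    def child_token(self):
        return self.__token


obj = Child()
print(obj.base_token())    # base
print(obj.child_token())   # child
print(sorted(vars(obj)))   # ['_Base__token', '_Child__token']
```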

Property

Getters and setters are often used to prevent certain variables from being changed, to validate a value before assigning it, or to compute a value from other variables. Since Python encourages direct access to variables, it offers property: when code reads a variable or assigns a value to it, a function is called instead. Example:

class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self._nome = nome
        self.sobrenome = sobrenome
        self._idade = idade

    @property
    def nome(self):
        return self._nome

    @property
    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'

    @nome_completo.setter
    def nome_completo(self, valor):
        valor = valor.split(' ', 1)
        self._nome = valor[0]
        self.sobrenome = valor[1]

    @property
    def idade(self):
        return self._idade

    @idade.setter
    def idade(self, valor):
        if valor < 0:
            raise ValueError
        self._idade = valor

    def fazer_aniversario(self):
        self.idade += 1

In this code some variables are accessible through properties. In general, the variables were defined with a leading _ and exposed by a property of the same name (without the _). The first case is nome, which has only a getter: it can be read as obj.nome, but trying to assign a value raises an error (AttributeError: can't set attribute; the exact wording depends on the Python version). As for sobrenome, no special handling is needed, so no property was used; it can easily be replaced by one later without changing the rest of the code. The nome_completo function, in turn, was replaced by a property, which allows both reading the person's full name as if it were a variable and changing nome and sobrenome by assigning a new value to that property. Finally, idade uses the property setter to validate the value received, raising an error for invalid (negative) ages.

Note also that getter functions take no arguments (besides self), while setter functions receive the value being assigned to the variable.
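To see these behaviours in action, the sketch below uses a condensed copy of the Pessoa class (fazer_aniversario left out, the setter written with tuple unpacking, and an error message added to the ValueError for illustration) and exercises each property in turn:

```python
class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self._nome = nome
        self.sobrenome = sobrenome
        self._idade = idade

    @property
    def nome(self):
        return self._nome

    @property
    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'

    @nome_completo.setter
    def nome_completo(self, valor):
        self._nome, self.sobrenome = valor.split(' ', 1)

    @property
    def idade(self):
        return self._idade

    @idade.setter
    def idade(self, valor):
        if valor < 0:
            raise ValueError('idade must not be negative')
        self._idade = valor


p = Pessoa('João', 'da Silva', 20)
print(p.nome_completo)            # João da Silva

p.nome_completo = 'Maria de Souza'
print(p.nome, '|', p.sobrenome)   # Maria | de Souza

try:
    p.nome = 'Ana'                # nome has no setter
except AttributeError as erro:
    print('AttributeError:', erro)

try:
    p.idade = -1                  # rejected by the idade setter
except ValueError as erro:
    print('ValueError:', erro)

p.idade += 1                      # calls the getter, then the setter
print(p.idade)                    # 21
```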

Using ABC, it is also possible to declare that a child class must implement a given property. Example:

from abc import ABC, abstractmethod


class Pessoa(ABC):
    def __init__(self, nome, sobrenome, idade):
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

    @property
    @abstractmethod
    def nome_completo(self):
        ...


class Brasileiro(Pessoa):
    @property
    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'


class Japones(Pessoa):
    @property
    def nome_completo(self):
        return f'{self.sobrenome} {self.nome}'
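A short sketch of what the abstract property enforces, using a trimmed Pessoa (without idade) and a hypothetical subclass SemNomeCompleto that forgets to implement nome_completo:

```python
from abc import ABC, abstractmethod


class Pessoa(ABC):
    def __init__(self, nome, sobrenome):
        self.nome = nome
        self.sobrenome = sobrenome

    @property
    @abstractmethod
    def nome_completo(self):
        ...


class Japones(Pessoa):
    @property
    def nome_completo(self):
        return f'{self.sobrenome} {self.nome}'


class SemNomeCompleto(Pessoa):  # hypothetical: does not implement the property
    pass


print(Japones('Akira', 'Kurosawa').nome_completo)  # Kurosawa Akira

try:
    SemNomeCompleto('Ana', 'Lima')
except TypeError as erro:
    # Instantiation is refused because nome_completo is still abstract.
    print('TypeError:', erro)
```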

Considerations

Unlike some languages that hide object variables and allow access only through functions, Python goes in the opposite direction: getter and setter functions are accessed as if they were variables. This lets you start with a simple class and add functionality as it becomes necessary, without changing the code in the rest of the application. It is also transparent to whoever is developing, who does not need to remember whether to use getters and setters or not.
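As an illustration of that migration path, the sketch below shows a hypothetical class in two versions: ProdutoV1 with a plain attribute and ProdutoV2 where the same name became a validated property. The calling code (aplicar_desconto, also hypothetical) is identical for both.

```python
class ProdutoV1:
    """First version: preco is a plain public attribute."""

    def __init__(self, preco):
        self.preco = preco


class ProdutoV2:
    """Later version: same public name, now validated by a property."""

    def __init__(self, preco):
        self.preco = preco  # goes through the setter below

    @property
    def preco(self):
        return self._preco

    @preco.setter
    def preco(self, valor):
        if valor < 0:
            raise ValueError('preco must not be negative')
        self._preco = valor


def aplicar_desconto(produto, percentual):
    """Caller code: unchanged between the two versions."""
    produto.preco = produto.preco * (1 - percentual / 100)
    return produto.preco


print(aplicar_desconto(ProdutoV1(200), 50))  # 100.0
print(aplicar_desconto(ProdutoV2(200), 50))  # 100.0
```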

In general, object-oriented programming consists of following certain code patterns, and the languages that implement this paradigm offer facilities for writing code that follows those patterns, sometimes even hiding complex details of their implementation. In that context, I recommend the talk given at FISL 16 by the author of htop, in which he explains how he used object orientation in C. And for those who want to go deeper into object orientation in Python, I recommend the videos by Eduardo Mendes (also known as dunossauro).

This article was originally published on my blog; stop by, or follow me on DEV to see more articles I have written.

Getting started with the Pony ORM in Python III - Errors and Exceptions

2021-05-16T00:00:00+00:00

Continuing along the Pony ORM trail, in this third text we start provoking error situations to see what the objects return and which exceptions are raised.

Deciphering the Zen of Python

2021-05-12T00:00:00+00:00

While looking for a data set inside Python, I stumbled upon the truth about the Zen of Python

Getting started with the Pony ORM in Python II - Databases with Docker

2021-05-11T00:00:00+00:00

Continuing the journey with the Pony ORM, now with other databases, running in Docker

Object orientation another way: ABC

2021-05-10T15:00:00+00:00

In the discussion about inheritance and mixins several classes were created, such as Autenticavel and AutenticavelComRegistro, which add functionality to other classes and implemented everything they needed in order to work. There may be cases, however, in which it is not possible to implement every function in the class itself, leaving the classes that extend it to implement those functions. One way to do this is through ABCs (abstract base classes).

Without abstract base classes

An example of a class that cannot implement all of its functionality was given in the text Encapsulamento da lógica do algoritmo (encapsulating the algorithm's logic), which discussed reading values from the keyboard until a valid value is read (in other words, repeating the read whenever an invalid value is entered). In that case the ValidaInput class implemented the basic logic, while its child classes (ValidaNomeInput and ValidaNotaInput) implemented the functions that process what was read from the keyboard and check whether it is a valid value.

class ValidaInput:
    mensagem_valor_invalido = 'Valor inválido!'

    def ler_entrada(self, prompt):
        return input(prompt)

    def transformar_entrada(self, entrada):
        raise NotImplementedError

    def validar_valor(self, valor):
        raise NotImplementedError

    def __call__(self, prompt):
        while True:
            try:
                valor = self.transformar_entrada(self.ler_entrada(prompt))
                if self.validar_valor(valor):
                    break
            except ValueError:
                ...
            print(self.mensagem_valor_invalido)
        return valor


class ValidaNomeInput(ValidaInput):
    mensagem_valor_invalido = 'Nome inválido!'

    def transformar_entrada(self, entrada):
        return entrada.strip().title()

    def validar_valor(self, valor):
        return valor != ''


class ValidaNotaInput(ValidaInput):
    mensagem_valor_invalido = 'Nota inválida!'

    def transformar_entrada(self, entrada):
        return float(entrada)

    def validar_valor(self, valor):
        return 0 <= valor <= 10

However, this code allows objects of the ValidaInput class to be created even though it has no implementation of the transformar_entrada and validar_valor functions. The only error message would appear when those functions are executed, which may be far from the real problem: creating an object from a class that does not provide implementations for all of its functions, something that would be an abstract class in other languages.

obj = ValidaInput()

# Many lines of code

obj('Entrada: ')  # NotImplementedError raised

With abstract base classes

According to the ABC documentation, to use them you must either specify the ABCMeta metaclass when creating the class or simply extend the ABC class, and decorate with abstractmethod the functions that extending classes must implement. Example:

from abc import ABC, abstractmethod


class ValidaInput(ABC):
    mensagem_valor_invalido = 'Valor inválido!'

    def ler_entrada(self, prompt):
        return input(prompt)

    @abstractmethod
    def transformar_entrada(self, entrada):
        ...

    @abstractmethod
    def validar_valor(self, valor):
        ...

    def __call__(self, prompt):
        while True:
            try:
                valor = self.transformar_entrada(self.ler_entrada(prompt))
                if self.validar_valor(valor):
                    break
            except ValueError:
                ...
            print(self.mensagem_valor_invalido)
        return valor

This way, an error occurs as soon as you try to create an object of type ValidaInput, stating which functions need to be implemented. Creating objects from the ValidaNomeInput and ValidaNotaInput classes works normally, since they implement those functions.

obj = ValidaInput()  # TypeError raised

nome_input = ValidaNomeInput()  # Object created
nota_input = ValidaNotaInput()  # Object created
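The set of functions still missing can also be inspected before any object is created. The sketch below uses a trimmed ValidaInput (only the two abstract functions) and reads the __abstractmethods__ attribute that classes built on ABC expose; the printed TypeError text depends on the Python version, so it is not reproduced here.

```python
from abc import ABC, abstractmethod


class ValidaInput(ABC):
    @abstractmethod
    def transformar_entrada(self, entrada):
        ...

    @abstractmethod
    def validar_valor(self, valor):
        ...


class ValidaNotaInput(ValidaInput):
    def transformar_entrada(self, entrada):
        return float(entrada)

    def validar_valor(self, valor):
        return 0 <= valor <= 10


# Names that a subclass still has to implement.
print(sorted(ValidaInput.__abstractmethods__))
# ['transformar_entrada', 'validar_valor']
print(sorted(ValidaNotaInput.__abstractmethods__))
# []

try:
    ValidaInput()
except TypeError as erro:
    print('TypeError:', erro)
```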

Since these functions do not use the reference to the object (self), they can also be decorated with staticmethod, as in:

from abc import ABC, abstractmethod


class ValidaInput(ABC):
    mensagem_valor_invalido = 'Valor inválido!'

    @staticmethod
    def ler_entrada(prompt):
        return input(prompt)

    @staticmethod
    @abstractmethod
    def transformar_entrada(entrada):
        ...

    @staticmethod
    @abstractmethod
    def validar_valor(valor):
        ...

    def __call__(self, prompt):
        while True:
            try:
                valor = self.transformar_entrada(self.ler_entrada(prompt))
                if self.validar_valor(valor):
                    break
            except ValueError:
                ...
            print(self.mensagem_valor_invalido)
        return valor


class ValidaNomeInput(ValidaInput):
    mensagem_valor_invalido = 'Nome inválido!'

    @staticmethod
    def transformar_entrada(entrada):
        return entrada.strip().title()

    @staticmethod
    def validar_valor(valor):
        return valor != ''


class ValidaNotaInput(ValidaInput):
    mensagem_valor_invalido = 'Nota inválida!'

    @staticmethod
    def transformar_entrada(entrada):
        return float(entrada)

    @staticmethod
    def validar_valor(valor):
        return 0 <= valor <= 10

The same would apply to functions decorated with classmethod, which would receive a reference to the class (cls).

Considerations

ABC is not required to build the example discussed, but using the library made it more explicit which functions had to be implemented in the child classes, all the more so because without ABC the base class might not even declare those functions, as in:
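A minimal sketch of the classmethod variant, assuming a hypothetical class attribute limite and a hypothetical helper descrever_erro that are not part of the original classes. The classmethod decorator goes on the outside and abstractmethod on the inside:

```python
from abc import ABC, abstractmethod


class ValidaInput(ABC):
    mensagem_valor_invalido = 'Valor inválido!'

    @classmethod
    @abstractmethod
    def validar_valor(cls, valor):
        ...

    @classmethod
    def descrever_erro(cls, valor):
        # cls is the class the call was made on, so the message
        # comes from the subclass when one is used.
        return f'{cls.__name__}: {cls.mensagem_valor_invalido} ({valor!r})'


class ValidaNotaInput(ValidaInput):
    mensagem_valor_invalido = 'Nota inválida!'
    limite = 10  # hypothetical class-level setting read through cls

    @classmethod
    def validar_valor(cls, valor):
        return 0 <= valor <= cls.limite


print(ValidaNotaInput.validar_valor(7))    # True
print(ValidaNotaInput.validar_valor(11))   # False
print(ValidaNotaInput.descrever_erro(11))  # ValidaNotaInput: Nota inválida! (11)
```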

class ValidaInput:
    mensagem_valor_invalido = 'Valor inválido!'

    def ler_entrada(self, prompt):
        return input(prompt)

    def __call__(self, prompt):
        while True:
            try:
                valor = self.transformar_entrada(self.ler_entrada(prompt))
                if self.validar_valor(valor):
                    break
            except ValueError:
                ...
            print(self.mensagem_valor_invalido)
        return valor

Because Python has duck typing, there is no need for much concern about types, such as defining and using the interfaces found in other object-oriented implementations. Thanks to multiple inheritance, however, an ABC can play the role of the interface construct that Python lacks, forcing classes to implement certain functions. For more on this subject, I recommend dunossauro's two live streams about ABC (1 and 2) and Luciano Ramalho's presentation on type hints.

A child class is also not required to implement every function decorated with abstractmethod but, just like the parent class, it will then not be possible to create objects from it, only from a child class of it that implements the remaining functions. It is as if applying abstractmethod made the class abstract, and any child class only stopped being abstract once the last function decorated with abstractmethod is overridden. Example:

from abc import ABC, abstractmethod


class A(ABC):
    @abstractmethod
    def func1(self):
        ...

    @abstractmethod
    def func2(self):
        ...


class B(A):
    def func1(self):
        print('1')


class C(B):
    def func2(self):
        print('2')


a = A()  # Error: func1 and func2 are not implemented
b = B()  # Error: func2 is not implemented
c = C()  # Object created

This article was originally published on my blog; stop by, or follow me on DEV to see more articles I have written.

Interest and logic

2021-05-06T18:18:00+00:00

Among the programming logic problems that most puzzle beginners are those involving percentages or interest.

It starts like this: calculate a 10% raise on a salary of R$2,500.00

Depending on the student's math background (percentages are taught around the fourth or fifth grade), some concepts need to be refreshed. As the name says, 10% means that for every 100 of the value you take 10. The calculation is quite simple and can be done with a multiplication, for example 10/100 * 2500, which results in 250. So far so good, but some students ask why we sometimes write 0.1 * 2500. Only the representation changes, since 10/100 is equivalent to 0.1, as is 100/1000, and so on. But this is the easy part, and once the notation is sorted out things start moving again.
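A minimal sketch of this calculation, with hypothetical variable names, computing the raise and the resulting salary:

```python
salario = 2500
percentual = 10

aumento = salario * percentual / 100   # 10 out of every 100
novo_salario = salario + aumento

print(f"Aumento: {aumento:.2f}")            # Aumento: 250.00
print(f"Novo salário: {novo_salario:.2f}")  # Novo salário: 2750.00

# The decimal form gives the same amount: 10/100 is 0.1.
print(f"{0.1 * salario:.2f}")               # 250.00
```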

When we get to loops, problems such as calculating savings start to appear. They have the following form: imagine that someone deposits R$1,000.00 in a savings account that yields 1% per month (the good old days). After n months, what will the balance be? The problem combines percentages with repetition. Let's see how it looks for 6 months:

saldo = 1000  # Initial value
n = 6         # Number of months
juros = 0.01  # Monthly interest 1% = 1/100 = 0.01
for mês in range(1, n + 1):
   saldo *= 1 + juros
   print(f"Mês ({mês}): {saldo:7.2f}")

which results in:

Mês (1): 1010.00
Mês (2): 1020.10
Mês (3): 1030.30
Mês (4): 1040.60
Mês (5): 1051.01
Mês (6): 1061.52

Several questions arise.

Why the `*=`?

Because the interest is compound, that is, applied to the preceding balance. You do not calculate 1% of the initial balance and multiply it by the number of months. This will become clearer later with the formula.
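To make the difference concrete, the sketch below compares the two approaches for the same deposit: simple interest (1% of the initial balance times the number of months) against compound interest (1% of the preceding balance each month). Variable names are illustrative.

```python
saldo_inicial = 1000
juros = 0.01
n = 6

# Simple interest: always 1% of the initial balance.
simples = saldo_inicial + saldo_inicial * juros * n

# Compound interest: 1% of whatever the balance was last month.
composto = saldo_inicial
for _ in range(n):
    composto *= 1 + juros

print(f"Juros simples:  {simples:7.2f}")   # 1060.00
print(f"Juros composto: {composto:7.2f}")  # 1061.52
```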

Why add 1 to the interest rate?

Since we are multiplying the balance, we need to adjust the factor so that the new balance equals the previous one plus the month's interest. In long form it would be saldo = saldo + juros * saldo. This can be simplified by factoring out the saldo variable, giving saldo = saldo * (1 + juros). Since in Python saldo = saldo * can be written as saldo *=, the expression reduces to saldo *= 1 + juros; note that I dropped the parentheses because they are no longer needed.

Do you need a loop to perform this calculation?

No, you can use an analytical solution by applying the formula saldo = saldo * (1 + juros) ** mês, where saldo on the right-hand side is the initial balance and mês is the number of months elapsed. Example:

>>> print(f"{1000 * (1 + 0.01) ** 6:7.2f}")
1061.52

And where does this exponentiation come from?

If we go back to the example with the loop, we see that:

In month 1, the balance is saldo * 1.01.

In month 2, the balance is saldo * 1.01 * 1.01

In month 3, saldo * 1.01 * 1.01 * 1.01

And so on. A pattern emerges in which 1.01 is multiplied by itself as many times as the number of the month being calculated. We can then write generically that saldo *= 1.01 ** mês, which is precisely the definition of exponentiation: a ** n = a * a * a ... (n times)
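The equivalence between the loop and the exponentiation can be checked numerically. In the sketch below the function names saldo_iterativo and saldo_analitico are hypothetical, and math.isclose is used because floating-point results of the two methods may differ in the last digits.

```python
import math


def saldo_iterativo(saldo, juros, n):
    """Apply the monthly interest n times, one month per iteration."""
    for _ in range(n):
        saldo *= 1 + juros
    return saldo


def saldo_analitico(saldo, juros, n):
    """Closed form: the initial balance times (1 + juros) to the power n."""
    return saldo * (1 + juros) ** n


for n in (0, 1, 6, 12, 120):
    assert math.isclose(saldo_iterativo(1000, 0.01, n),
                        saldo_analitico(1000, 0.01, n))

print(f"{saldo_analitico(1000, 0.01, 6):7.2f}")  # 1061.52
```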

Why do we use a loop?

Because the course is about programming logic and the teacher wants you to have a reason to use for or while.

A question that appeared today on StackOverflow em Português

The question was downvoted and closed soon afterwards :-(

Two shoe manufacturers compete for the market in Brazil. Company A produces 10,000 pairs/month and grows 15% per month. Company B produces 8,000 pairs/month and grows 20% per month. Determine the number of months needed for company B to exceed the number of pairs produced by company A.

Let's see how this looks in mathematics:

Production of company A: 10000 x 1.15^m

Production of company B: 8000 x 1.20^m

Where m is the number of months.

What you are looking for is the value of m when:

8000 x 1.20^m > 10000 x 1.15^m

You can solve this using logarithms, but in a programming logic course the teacher probably expects you to vary m in steps of 1.

So try calculating the production when m = 1. Compare the production of the two factories (use the formulas above). If B's value does not exceed A's, keep incrementing m by 1. Stop when B's production is greater than A's (the answer is the value of m when that happens).

m = 0
while True:
   prodA = 10000 * 1.15 ** m
   prodB = 8000 * 1.2 ** m
   print(f"Prod A: {prodA:8.2f} Prod B: {prodB:8.2f} - mês: {m}")
   if prodB > prodA:
      break
   m += 1
print(f"Meses para que a produção de B ultrapasse a produção de A: {m}")

which results in:

Prod A: 10000.00 Prod B:  8000.00 - mês: 0
Prod A: 11500.00 Prod B:  9600.00 - mês: 1
Prod A: 13225.00 Prod B: 11520.00 - mês: 2
Prod A: 15208.75 Prod B: 13824.00 - mês: 3
Prod A: 17490.06 Prod B: 16588.80 - mês: 4
Prod A: 20113.57 Prod B: 19906.56 - mês: 5
Prod A: 23130.61 Prod B: 23887.87 - mês: 6
Meses para que a produção de B ultrapasse a produção de A: 6

And the analytical solution? As I said, the course is about programming logic :-D

But you can solve the inequality using logarithms:

8000 x 1.20^m > 10000 x 1.15^m

ln(8000) + m * ln(1.20) > ln(10000) + m * ln(1.15)

m * ln(1.20) - m * ln(1.15) > ln(10000) - ln(8000)

m * (ln(1.20) - ln(1.15)) > ln(10000) - ln(8000)

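m > (ln(10000) - ln(8000))/(ln(1.20) - ln(1.15))

The direction of the inequality is preserved because ln(1.20) - ln(1.15) is positive. In Python, the value on the right-hand side is computed as: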

from math import log
m = (log(10000) - log(8000))/(log(1.20) - log(1.15))
print(m)

which results in:

5.243082071149164

Which we round up to 6 if we do not consider fractions of a month. Now we know both the analytical and the iterative method. It is easy to see why the second is used more often in programming logic courses.
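The rounding step can also be written in code. The sketch below takes the smallest whole number of months strictly greater than the analytical bound (floor plus one, which also covers the case where the bound happens to be an integer) and checks it against the production formulas; the variable name limite is illustrative.

```python
from math import floor, log

limite = (log(10000) - log(8000)) / (log(1.20) - log(1.15))

# Smallest integer m with m > limite.
m = floor(limite) + 1
print(m)  # 6

# One month earlier B has not passed A yet; at month m it has.
assert 8000 * 1.20 ** (m - 1) <= 10000 * 1.15 ** (m - 1)
assert 8000 * 1.20 ** m > 10000 * 1.15 ** m
```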

Object orientation another way: Multiple inheritance and mixins

2021-05-03T18:00:00+00:00

The previous text introduced the concept of inheritance, in which a class inherits the whole structure and behavior of another class and can extend it with other attributes and behaviors. This text presents the idea of multiple inheritance and one way to take advantage of it, through mixins.

Multiple inheritance

Going back to the system that handles people's data, where some of those people can access the system with a username and password, we also want to let other systems authenticate and access the data through an API. This can be done by creating a class to represent the systems that will be allowed to access the data. Example:

class Sistema:
    def __init__(self, usuario, senha):
        self.usuario = usuario
        self.senha = senha

    def autenticar(self, usuario, senha):
        return self.usuario == usuario and self.senha == senha

However, this code repeats the implementation written for PessoaAutenticavel:

class PessoaAutenticavel(Pessoa):
    def __init__(self, nome, sobrenome, idade, usuario, senha):
        super().__init__(nome, sobrenome, idade)
        self.usuario = usuario
        self.senha = senha

    def autenticar(self, usuario, senha):
        return self.usuario == usuario and self.senha == senha

Taking advantage of the fact that Python, unlike some other languages, supports multiple inheritance, this logic can be extracted from the classes, centralizing the implementation in another class and simply inheriting from it. Example:

class Autenticavel:
    def __init__(self, *args, usuario, senha, **kwargs):
        super().__init__(*args, **kwargs)
        self.usuario = usuario
        self.senha = senha

    def autenticar(self, usuario, senha):
        return self.usuario == usuario and self.senha == senha


class PessoaAutenticavel(Autenticavel, Pessoa):
    ...


class Sistema(Autenticavel):
    ...


p = PessoaAutenticavel(nome='João', sobrenome='da Silva', idade=20,
                       usuario='joao', senha='secreta')

The first thing to note is the *args and **kwargs arguments in the __init__ of the Autenticavel class. They are used because not all the arguments expected by the __init__ of the class that will extend Autenticavel are known in advance, so they are handled dynamically (more about this feature can be found in the Python documentation).

The second thing to note is that the PessoaAutenticavel class now builds, in its objects, the structure of both the Pessoa class and the Autenticavel class. This is similar to the version without object orientation below:

# File: pessoa_autenticavel.py

import autenticavel
import pessoa


def init(p, nome, sobrenome, idade, usuario, senha):
    pessoa.init(p, nome, sobrenome, idade)
    autenticavel.init(p, usuario, senha)

It is also worth noting that the PessoaAutenticavel and Sistema classes do not need to define any functions, since they fulfil their roles just by inheriting from other classes. It would still be possible to implement functions specific to these classes, as well as to override functions defined by other classes.

Method resolution order

Although multiple inheritance is interesting, there is a problem: if both parent classes have a function with the same name, which one should the child class call? The first parent's? The last one's? To deal with this, Python uses the MRO (method resolution order), a tuple with the order of the classes Python will search to find the method to be called. Example:

print(PessoaAutenticavel.__mro__)
# (<class '__main__.PessoaAutenticavel'>, <class '__main__.Autenticavel'>, <class '__main__.Pessoa'>, <class 'object'>)
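A small sketch of the name-conflict case, using stripped-down stand-ins for Autenticavel and Pessoa that both define a hypothetical function descrever. Reversing the order of the parents in the class statement reverses which implementation wins:

```python
class Autenticavel:
    def descrever(self):
        return 'Autenticavel'


class Pessoa:
    def descrever(self):
        return 'Pessoa'


class PessoaAutenticavel(Autenticavel, Pessoa):
    ...


class PessoaAutenticavelInvertida(Pessoa, Autenticavel):  # hypothetical variant
    ...


print(PessoaAutenticavel().descrever())           # Autenticavel
print(PessoaAutenticavelInvertida().descrever())  # Pessoa
print([c.__name__ for c in PessoaAutenticavelInvertida.__mro__])
# ['PessoaAutenticavelInvertida', 'Pessoa', 'Autenticavel', 'object']
```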

This is also why it was possible to call super().__init__ inside Autenticavel: thanks to the MRO, Python calls the __init__ of the other parent class of the class that extended Autenticavel, instead of requiring an __init__ method in PessoaAutenticavel that calls the __init__ of each of its parent classes, as was done in the version without object orientation. That is also the reason for the order Autenticavel, Pessoa in the inheritance of PessoaAutenticavel: it makes the MRO look for methods first in Autenticavel and then in Pessoa.
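The chain of __init__ calls can be traced with a small sketch. It assumes a Pessoa class with the nome, sobrenome and idade arguments used elsewhere in this series, and a hypothetical list chamadas that records which __init__ ran and in what order.

```python
chamadas = []  # hypothetical trace of the __init__ calls


class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        chamadas.append('Pessoa.__init__')
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade


class Autenticavel:
    def __init__(self, *args, usuario, senha, **kwargs):
        chamadas.append('Autenticavel.__init__')
        # Forwards whatever is left to the next class in the MRO.
        super().__init__(*args, **kwargs)
        self.usuario = usuario
        self.senha = senha


class PessoaAutenticavel(Autenticavel, Pessoa):
    ...


class Sistema(Autenticavel):
    ...


p = PessoaAutenticavel(nome='João', sobrenome='da Silva', idade=20,
                       usuario='joao', senha='secreta')
print(chamadas)  # ['Autenticavel.__init__', 'Pessoa.__init__']

# For Sistema the next class in the MRO is object, whose __init__
# takes no extra arguments, so nothing is left to forward.
s = Sistema(usuario='api', senha='secreta')
```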

To try to escape the complexity that multiple inheritance can bring, it is possible to write classes whose sole purpose is to add some functionality to another class, as is the case of Autenticavel, which can be inherited by any other class in the system to allow access to the system. Such classes are called mixins, and each one adds a single well-defined piece of functionality.

Extending mixins

Imagine that, besides allowing access to the system, we would also like to log access attempts, recording when the attempt happened and whether access was granted. Since Autenticavel is a class, it can be extended to implement this functionality in the autenticar function. Example:

from datetime import datetime


class AutenticavelComRegistro(Autenticavel):
    @staticmethod
    def _get_data():
        return datetime.now().strftime('%d/%m/%Y %T')

    def autenticar(self, usuario, senha):
        print(f'{self._get_data()} Tentativa de acesso de {usuario}')
        acesso = super().autenticar(usuario, senha)
        if acesso:
            acesso_str = 'permitido'
        else:
            acesso_str = 'negado'
        print(f'{self._get_data()} Acesso de {usuario} {acesso_str}')
        return acesso


class PessoaAutenticavelComRegistro(AutenticavelComRegistro, Pessoa):
    ...


class SistemaAutenticavelComRegistro(AutenticavelComRegistro, Sistema):
    ...


p = PessoaAutenticavelComRegistro(
    nome='João', sobrenome='da Silva', idade=20,
    usuario='joao', senha='secreta',
)
p.autenticar('joao', 'secreta')
# Output on screen:
# 23/04/2021 16:56:58 Tentativa de acesso de joao
# 23/04/2021 16:56:58 Acesso de joao permitido

This implementation uses super() to reach the autenticar function of the Autenticavel class, so the authentication itself does not have to be reimplemented. Before calling it, however, it uses the arguments to log who tried to access the system, and afterwards it uses the return value to log whether access was granted or not.

This class also makes it easier to analyze the order in which classes are consulted when a function is called:

print(PessoaAutenticavelComRegistro.__mro__)
# (<class '__main__.PessoaAutenticavelComRegistro'>, <class '__main__.AutenticavelComRegistro'>, <class '__main__.Autenticavel'>, <class '__main__.Pessoa'>, <class 'object'>)

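This can also be pictured as a class diagram:

[Class diagram]

In this hierarchy the result matches a depth-first search: the function is looked up in the first parent (and that parent's own ancestors), and only if it is not found there is the second parent searched, and so on. In general, Python computes the MRO with the C3 linearization algorithm, which differs from a plain depth-first search when two parents share a common ancestor: the shared ancestor is visited only after all of its subclasses. The object class is also visible in the tuple; it always comes last and is the parent of every Python class that does not explicitly declare one.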
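When two parents share a common ancestor, the lookup order is no longer a plain depth-first search: the shared ancestor is visited only after all of its subclasses. A small sketch of that case follows; the class names A, B, C and D are hypothetical and exist only for this illustration:

```python
class A:
    def quem(self):
        return 'A'


class B(A):
    pass


class C(A):
    def quem(self):
        return 'C'


class D(B, C):
    pass


# A plain depth-first search would visit D, B, A and only then C;
# Python places the shared ancestor A after both B and C
print([cls.__name__ for cls in D.__mro__])
# ['D', 'B', 'C', 'A', 'object']
print(D().quem())  # C
```

Because C comes before A in the MRO, the override in C wins even though B, the first parent, inherits quem from A.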

Final remarks

Multiple inheritance can make code considerably harder to understand, especially when trying to find where a given function is defined, but it can also simplify the code a great deal. Django's class-based views are an example that relies heavily on inheritance and mixins. To make them easier to navigate, the site Classy Class-Based Views lists every class and the mixins each one uses, shown under "Ancestors". UpdateView, for instance, builds a page with a form to edit a record that already exists in the database, so it uses mixins to fetch an object from the database (SingleObjectMixin), to process a form based on a database table (ModelFormMixin), and to provide the other features the page needs.

This article was originally published on my blog. Drop by, or follow me on DEV to see more articles I have written.

Configuring resource limits for Java (JVM) applications on Kubernetes

2021-05-01T00:00:00+00:00

Deploying to production software built with technologies designed for vertical scaling, and making it scale horizontally (microservices, nanoservices, etc.), can bring challenges we are not prepared for, especially when the software runs on the JVM and no resource limits have been declared.

-Xms, -Xmx and their problems

While studying the JVM you will probably come across the parameters for initial (Xms) and maximum (Xmx) memory allocation; these parameters work rigorously well.

Talking about 'Difficult Subjects'

2021-04-29T00:00:00+00:00

In life we have to face subjects considered difficult or taboo head-on, with anyone, and that takes courage and maturity to handle any topic naturally, even when it pulls us out of our comfort zone. To talk about this, let's start with the why. Why is it so common to put off a conversation when it involves a difficult subject? When we do not talk about a kind of subject regularly, it becomes "difficult" through lack of familiarity and a feeling of discomfort.

Getting started with the Pony ORM in Python

2021-04-28T00:00:00+00:00

After years of using only Django, I am being introduced to the simplicity and elegance of Pony ORM.

Object orientation another way: Inheritance

2021-04-26T20:00:00+00:00

Code reuse is one of the things that helps development. In object orientation, this reuse can happen through inheritance, where an object can behave both as an object of its own class and as an object of the class it inherited from.

Adding functionality

One use of inheritance is to extend a class to add functionality. In the context of the previous posts, we might want to create a username and password so that some people can access the system. This could be done by adding usuario and senha attributes to people, plus a function that checks whether the credentials are correct and thus grants access to the system. However, this should not be done for every person, only for those who have permission to access it.

Without object orientation

Going back to the dictionary-based solution (without object orientation), this would consist of creating a dictionary with the structure of a person and then extending that same dictionary with the new username and password fields, something like:

# File: pessoa.py

def init(pessoa, nome, sobrenome, idade):
    pessoa['nome'] = nome
    pessoa['sobrenome'] = sobrenome
    pessoa['idade'] = idade


def nome_completo(pessoa):
    return f"{pessoa['nome']} {pessoa['sobrenome']}"

# File: pessoa_autenticavel.py

def init(pessoa, usuario, senha):
    pessoa['usuario'] = usuario
    pessoa['senha'] = senha


def autenticar(pessoa, usuario, senha):
    return pessoa['usuario'] == usuario and pessoa['senha'] == senha

import pessoa
import pessoa_autenticavel

p = {}
pessoa.init(p, 'João', 'da Silva', 20)
pessoa_autenticavel.init(p, 'joao', 'secreta')

print(pessoa.nome_completo(p))
print(pessoa_autenticavel.autenticar(p, 'joao', 'secreta'))

With this solution, however, the programmer may forget to call both init functions. Since we want every dictionary with the pessoa_autenticavel structure to also contain the pessoa structure, we can call pessoa's init from inside pessoa_autenticavel's init:

# File: pessoa_autenticavel.py

import pessoa


def init(p, nome, sobrenome, idade, usuario, senha):
    pessoa.init(p, nome, sobrenome, idade)
    p['usuario'] = usuario
    p['senha'] = senha


...  # Other functions

import pessoa
import pessoa_autenticavel

p = {}
pessoa_autenticavel.init(p, 'João', 'da Silva', 20, 'joao', 'secreta')

print(pessoa.nome_completo(p))
print(pessoa_autenticavel.autenticar(p, 'joao', 'secreta'))

Here the pessoa argument of pessoa_autenticavel.init had to be renamed so that it would not clash with the imported module of the same name. By calling one init inside the other, we guarantee that the dictionary is compatible both with the structure the programmer asked for and with its parent structures.

With object orientation

class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'


class PessoaAutenticavel(Pessoa):
    def __init__(self, nome, sobrenome, idade, usuario, senha):
        Pessoa.__init__(self, nome, sobrenome, idade)
        self.usuario = usuario
        self.senha = senha

    def autenticar(self, usuario, senha):
        return self.usuario == usuario and self.senha == senha


p = PessoaAutenticavel('João', 'da Silva', 20, 'joao', 'secreta')

print(Pessoa.nome_completo(p))
print(PessoaAutenticavel.autenticar(p, 'joao', 'secreta'))

The main novelty in this example is that, when declaring the PessoaAutenticavel class (child), the Pessoa class (parent) was listed in parentheses. This makes PessoaAutenticavel inherit everything defined in Pessoa and extend it with the new functions we are creating. Python does not copy Pessoa; a lookup that finds nothing in PessoaAutenticavel falls back to Pessoa. Still, calling Pessoa.__init__ inside __init__ is somewhat redundant, since it was already declared that the class extends Pessoa; it can be replaced by super(), which points to the class that was extended. Example:

class PessoaAutenticavel(Pessoa):
    def __init__(self, nome, sobrenome, idade, usuario, senha):
        super().__init__(nome, sobrenome, idade)
        self.usuario = usuario
        self.senha = senha

    ...  # Other functions

This avoids repeating the class name and passes the self reference automatically, just like the syntactic sugar presented in the first post of this series. That same syntactic sugar can also be used to call functions declared either in Pessoa or in PessoaAutenticavel. Example:

p = PessoaAutenticavel('João', 'da Silva', 20, 'joao', 'secreta')

print(p.nome_completo())
print(p.autenticar('joao', 'secreta'))

This form also makes the functions easier to use, since there is no need to remember which class each function was declared in. In fact, because PessoaAutenticavel extends Pessoa, it would also be possible to call PessoaAutenticavel.nome_completo, but both names point to the same function.
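A quick check of that last point, using minimal versions of the two classes (trimmed down for this illustration; the body of autenticar is a placeholder):

```python
class Pessoa:
    def __init__(self, nome, sobrenome):
        self.nome = nome
        self.sobrenome = sobrenome

    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'


class PessoaAutenticavel(Pessoa):
    def autenticar(self, usuario, senha):
        return False  # placeholder body; authentication is not relevant here


# The child class does not hold its own nome_completo: the lookup
# falls through to Pessoa and finds the very same function object
print(PessoaAutenticavel.nome_completo is Pessoa.nome_completo)  # True
print('nome_completo' in vars(PessoaAutenticavel))  # False
```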

Overriding a function

The Pessoa class has the nome_completo function, which returns a str containing the first name and the surname. In Japan, however, as in other Asian countries, the surname comes first, and there have even been requests to follow that tradition when saying Japanese names, as in the case of the prime minister, changing Shinzo Abe to Abe Shinzo.

With object orientation

This can also be done in the system using inheritance. Instead of creating a new function with a different name, it is possible to create a function with the same name, overriding the previous one, but only for objects of the child class. This is similar to what was already done with the __init__ function. Example:

class Japones(Pessoa):
    def nome_completo(self):
        return f'{self.sobrenome} {self.nome}'


p1 = Pessoa('João', 'da Silva', 20)
p2 = Japones('Shinzo', 'Abe', 66)

print(p1.nome_completo())  # João da Silva
print(p2.nome_completo())  # Abe Shinzo

This inheritance relationship brings something interesting: every object of the Japones class behaves like an object of the Pessoa class, but the reverse is not true. Just as we can say that every Japanese person is a person, but not every person is Japanese. Being Japanese is a more specific case of being a person, as is any other nationality.
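The one-way nature of this relationship can be checked with the built-in isinstance and issubclass functions. The sketch below repeats the two classes so that it runs on its own; the apresentar helper is hypothetical and only shows that code written for Pessoa also accepts a Japones:

```python
class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'


class Japones(Pessoa):
    def nome_completo(self):
        return f'{self.sobrenome} {self.nome}'


def apresentar(pessoa):
    # Hypothetical helper: works with any object that behaves like a Pessoa
    return f'Olá, {pessoa.nome_completo()}!'


p1 = Pessoa('João', 'da Silva', 20)
p2 = Japones('Shinzo', 'Abe', 66)

print(isinstance(p2, Pessoa))       # True: every Japones is a Pessoa
print(isinstance(p1, Japones))      # False: not every Pessoa is a Japones
print(issubclass(Japones, Pessoa))  # True
print(apresentar(p1))  # Olá, João da Silva!
print(apresentar(p2))  # Olá, Abe Shinzo!
```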

Without object orientation

Overriding the nome_completo function is not as simple to replicate with a dictionary structure, but it can be done. Since a person may or may not be Japanese, there is no way to know in advance whether to write pessoa.nome_completo or japones.nome_completo in the code. Unlike the authentication example, these are now two different functions, so the right one has to be discovered dynamically at call time.

One way to do this is to store, inside the structure itself, a reference to the function that should be called. Example:

# File: pessoa.py

def init(pessoa, nome, sobrenome, idade):
    pessoa['nome'] = nome
    pessoa['sobrenome'] = sobrenome
    pessoa['idade'] = idade
    pessoa['nome_completo'] = nome_completo


def nome_completo(pessoa):
    return f"{pessoa['nome']} {pessoa['sobrenome']}"

# File: japones.py

import pessoa


def init(japones, nome, sobrenome, idade):
    pessoa.init(japones, nome, sobrenome, idade)
    japones['nome_completo'] = nome_completo


def nome_completo(japones):
    return f"{japones['sobrenome']} {japones['nome']}"

import pessoa
import japones

p1 = {}
pessoa.init(p1, 'João', 'da Silva', 20)
p2 = {}
japones.init(p2, 'Shinzo', 'Abe', 66)

print(p1['nome_completo'](p1))  # João da Silva
print(p2['nome_completo'](p2))  # Abe Shinzo

Notice that the way the function is called has changed. In practice, every function that can be overridden is not called directly but through a reference, and this adds some computational cost. Since that cost is not high (often close to irrelevant), this is the behavior adopted by several languages. In C++, on the other hand, the virtual keyword exists to state whether a function can be overridden or not.

Final remarks

Inheritance is an interesting mechanism to explore for reusing code and avoiding repetition. It can come at a cost, though, either computational at run time or in readability, since several classes may need to be checked to find out what is actually being executed. On the other hand, it can also be used to hide and abstract more complicated logic, as I have already discussed in another post.

Inheritance also makes it possible to work with generalization and specialization, describing more general or more specific behavior, or simply to add more functionality to an existing class.

Just as super() was used to call the parent class's __init__ function, it can be used to call any other function. This makes it possible, for example, to process the function's arguments, modifying them before calling the original function, or to process its return value, running some extra computation on it, without having to rewrite the whole function.
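A minimal sketch of both uses of super() outside __init__. The PessoaFormal class is hypothetical, invented only for this illustration:

```python
class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'

    def trocar_sobrenome(self, sobrenome):
        self.sobrenome = sobrenome


class PessoaFormal(Pessoa):  # hypothetical class for this example
    def trocar_sobrenome(self, sobrenome):
        # Clean up the argument before delegating to the original function
        super().trocar_sobrenome(sobrenome.strip())

    def nome_completo(self):
        # Post-process the value returned by the original function
        return super().nome_completo().upper()


p = PessoaFormal('João', 'da Silva', 20)
p.trocar_sobrenome('  dos Santos ')
print(p.nome_completo())  # JOÃO DOS SANTOS
```

Neither override repeats the logic of Pessoa; each one only adds a step before or after the original call.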

This article was originally published on my blog. Drop by, or follow me on DEV to see more articles I have written.

Object orientation another way: Static and class methods

2021-04-19T20:00:00+00:00

The previous post introduced self; this post discusses that argument further, looking at alternatives to it and their applications.

Static methods

Not every function in a class needs to receive a reference to an object in order to read or change it. Often a function can do its job using only the data passed as arguments, for example receiving a name and checking that it has at least three characters and no spaces. Such a function could be placed outside the class, but to make it easier to call, and to allow for later changes (which will be discussed in another post), it can be placed inside the class and marked as not receiving the self argument with the @staticmethod decorator:

class Pessoa:
    ...  # Other functions

    @staticmethod
    def valida_nome(nome):
        return len(nome) >= 3 and ' ' not in nome

This way, the function can be called from a Pessoa object, or even directly from the class, without creating an object first:

# Calling directly from the class
print(Pessoa.valida_nome('João'))

# Calling through an object of type Pessoa
p1 = Pessoa('João', 'da Silva', 20)
print(p1.valida_nome(p1.nome))

The function can also be used inside other functions, for instance to validate the name when a person is created: if the given name is valid, an object of type Pessoa is created, and if it is invalid, an exception is raised:

class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        if not self.valida_nome(nome):
            raise ValueError('Nome inválido')

        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

    ...  # Other functions

    @staticmethod
    def valida_nome(nome):
        return len(nome) >= 3 and ' ' not in nome


p1 = Pessoa('João', 'da Silva', 20)  # Creates the object
p2 = Pessoa('a', 'da Silva', 20)  # Raises ValueError: Nome inválido

Class methods

Some functions, however, need a middle ground: they need access to the context of the class, but not to an object. This is done with the @classmethod decorator; a function decorated with it receives the class itself as its first argument instead of an object.

To demonstrate this feature, an auto-incrementing id will be implemented for objects of the Pessoa class:

class Pessoa:
    total_de_pessoas = 0

    @classmethod
    def novo_id(cls):
        cls.total_de_pessoas += 1
        return cls.total_de_pessoas

    def __init__(self, nome, sobrenome, idade):
        self.id = self.novo_id()
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

p1 = Pessoa('João', 'da Silva', 20)
print(p1.id)  # Prints 1
p2 = Pessoa('Maria', 'dos Santos', 18)
print(p2.id)  # Prints 2
print(Pessoa.total_de_pessoas)  # Prints 2
print(p1.total_de_pessoas)  # Prints 2
print(p2.total_de_pessoas)  # Prints 2

This code creates a total_de_pessoas variable in the scope of the Pessoa class, which is shared by the class and by the objects of that class. This is different from declaring it with self. inside __init__, where the value would belong only to that object and would not be shared with other objects. Declaring variables in the context of the class is similar to declaring variables with static in other languages, just as @classmethod is similar to declaring functions with static.

Functions declared with @classmethod can also be called without creating an object, as in Pessoa.novo_id() (although for this particular function that does not make much sense), and they can receive other arguments; it all depends on what the function does.

Final remarks

Although the difference between a function of an object (no decorator), a function of a class (with the @classmethod decorator), and a function with no access to any other context (with the @staticmethod decorator) may seem confusing, it becomes clearer when looking at the first argument each kind of function receives. It can be a reference to an object (self), which requires an object to have been created beforehand; it can be a class (cls), which does not require an object; or there can be no special argument at all, only the arguments the function itself needs. The decorators are what distinguish the three kinds.
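The three kinds can be compared side by side by returning whatever each function receives as its first argument. The Exemplo class and its function names are hypothetical, created only for this comparison:

```python
class Exemplo:  # hypothetical class for this comparison
    def de_objeto(self):
        # No decorator: receives the object the function was called on
        return self

    @classmethod
    def de_classe(cls):
        # @classmethod: receives the class, even when called on an object
        return cls

    @staticmethod
    def estatico(valor):
        # @staticmethod: receives only the arguments passed explicitly
        return valor


obj = Exemplo()
print(obj.de_objeto() is obj)          # True
print(Exemplo.de_classe() is Exemplo)  # True
print(obj.de_classe() is Exemplo)      # True
print(Exemplo.estatico(42))            # 42
print(obj.estatico(42))                # 42
```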

In Python's implementation of object orientation, some things can get confusing when mixed with terminology from other languages that implement it differently. Java, for example, uses the static keyword to define class attributes and methods, whereas in Python a static method is one that accesses neither an object nor a class; class attributes are created in the scope of the class, and class methods with the @classmethod decorator.

This article was originally published on my blog. Drop by, or follow me on DEV to see more articles I have written.

Object orientation another way: Classes and objects

2021-04-12T18:00:00+00:00

In the few and very rare live streams I have done on Twitch, the idea came up of writing about object-oriented programming in Python, mainly because of some differences in how it was implemented in this language. Taking the opportunity, I will write a series of posts giving a different view of object orientation. In this first post I will talk about classes and objects.

Using a dictionary

Before starting with object orientation, however, I would like to present and discuss a few examples that do not use this programming paradigm.

Thinking of a system that needs to handle data about people, Python dictionaries can be used to group a person's data in a single variable, as in the example below:

pessoa = {
    'nome': 'João',
    'sobrenome': 'da Silva',
    'idade': 20,
}

The data can then be accessed through the variable and the name of the desired field, as in:

print(pessoa['nome'])  # Prints João

This way, all of a person's data is grouped in one variable, which makes programming much easier: there is no need to create one variable per field, and when handling data from different people it is much easier to tell which person a given value refers to, simply by using different variables.

Function to create the dictionary

Although practical, this dictionary structure has to be replicated every time data for a new person is needed. To avoid repeating code, the dictionary can be created inside a function, which can be placed in a pessoa module (a file, in this case named pessoa.py):

# File: pessoa.py

def nova(nome, sobrenome, idade):
    return {
        'nome': nome,
        'sobrenome': sobrenome,
        'idade': idade,
    }

To create the dictionary that represents a person, just import this module (file) and call the nova function:

import pessoa

p1 = pessoa.nova('João', 'da Silva', 20)
p2 = pessoa.nova('Maria', 'dos Santos', 18)

This guarantees that every dictionary representing a person has the desired fields, properly filled in.

Functions using the dictionary

It is also possible to create functions that operate on the data in these dictionaries, such as getting the person's full name, changing their surname, or having a birthday (which increases the person's age by one year):

# File: pessoa.py

def nova(nome, sobrenome, idade):
    ...  # Code omitted


def nome_completo(pessoa):
    return f"{pessoa['nome']} {pessoa['sobrenome']}"


def trocar_sobrenome(pessoa, sobrenome):
    pessoa['sobrenome'] = sobrenome


def fazer_aniversario(pessoa):
    pessoa['idade'] += 1

And it is used like this:

import pessoa

p1 = pessoa.nova('João', 'da Silva', 20)
pessoa.trocar_sobrenome(p1, 'dos Santos')
print(pessoa.nome_completo(p1))
pessoa.fazer_aniversario(p1)
print(p1['idade'])

Notice that all the functions implemented here follow the pattern of receiving the dictionary that represents the person as the first argument, with or without other arguments as needed, and reading or changing the values of that dictionary.

Object-oriented version

Before getting to the actual object-oriented version of the previous examples, I will make a small change to make what follows easier to understand. The nova function will be split in two: the first part creates a dictionary and calls a second function (init), which receives that dictionary as its first argument (following the pattern of the other functions) and builds its structure with the given values.

# File: pessoa.py

def init(pessoa, nome, sobrenome, idade):
    pessoa['nome'] = nome
    pessoa['sobrenome'] = sobrenome
    pessoa['idade'] = idade


def nova(nome, sobrenome, idade):
    pessoa = {}
    init(pessoa, nome, sobrenome, idade)
    return pessoa


...  # Other functions in the file

This does not change how it is used, though:

import pessoa

p1 = pessoa.nova('João', 'da Silva', 20)

Function to create a person

Most programming languages that support the object-oriented paradigm use classes to define the structure of objects. Python also uses classes, which are defined with the class keyword followed by a name. Inside this structure, functions can be defined to manipulate the objects of that class; in some languages these are also called methods (functions declared in the scope of a class).

To convert the dictionary into a class, the first step is to implement a function that creates the desired structure. This function must be named __init__, and it is very similar to the init function in the previous code:

class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

The differences are that the first parameter is now called self, a convention used in Python, and that instead of using brackets and quotes to access the data, a dot followed by the name of the desired field is enough (here the field can also be called an attribute, since it is a variable of the object). The nova function implemented earlier is no longer needed: the language itself creates an object and passes it as the first argument to __init__. So, to create an object of the Pessoa class, just call the class as if it were a function, leaving out the self argument and supplying the others, as if calling __init__ directly:

p1 = Pessoa('João', 'da Silva', 20)
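What the call above does can be roughly spelled out by hand, mirroring the nova and init pair from the dictionary version. This is a simplified sketch: the real mechanism goes through the class's __new__ and __init__, but for a plain class like this one the effect is the same:

```python
class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade


# Roughly what Pessoa('João', 'da Silva', 20) does behind the scenes
p1 = object.__new__(Pessoa)                  # creates an empty Pessoa object
Pessoa.__init__(p1, 'João', 'da Silva', 20)  # fills in its attributes

# The usual way, for comparison
p2 = Pessoa('João', 'da Silva', 20)

print(type(p1) is type(p2))  # True
print(vars(p1) == vars(p2))  # True
```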

In this case, since the class itself creates a separate context for the functions (a scope or namespace), separate files are no longer being used. It is still possible to use them, as long as the appropriate import is made. For simplicity, though, both the class declaration and the creation of the Pessoa object can be done in the same file, as in the remaining examples of this post.

Other functions

The other functions written earlier for the dictionary can also be written in the Pessoa class, with the same differences already pointed out:

class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'

    def trocar_sobrenome(self, sobrenome):
        self.sobrenome = sobrenome

    def fazer_aniversario(self):
        self.idade += 1

To call these functions, access them through the context of the class, passing the previously created object as the first argument:

p1 = Pessoa('João', 'da Silva', 20)
Pessoa.trocar_sobrenome(p1, 'dos Santos')
print(Pessoa.nome_completo(p1))
Pessoa.fazer_aniversario(p1)
print(p1.idade)

This syntax is very similar to the version without object orientation implemented earlier. When using objects, however, these functions can be called with another syntax: the object first, followed by a dot and the name of the desired function, with the difference that the object no longer has to be passed as the first argument. Since the function was called through an object, Python itself takes care of passing it as the self argument, and only the remaining arguments need to be supplied:

p1.trocar_sobrenome('dos Santos')
print(p1.nome_completo())
p1.fazer_aniversario()
print(p1.idade)

There are some differences between the two syntaxes, but those will be covered later. For now, the second syntax can be seen as syntactic sugar for the first, that is, a quicker and easier way of doing the same thing, which is why it is the recommended one.
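The equivalence between the two syntaxes can be observed directly: accessing a function through an object yields a bound method that remembers the object and supplies it as self. A short sketch, using a reduced version of the class:

```python
class Pessoa:
    def __init__(self, nome, sobrenome, idade):
        self.nome = nome
        self.sobrenome = sobrenome
        self.idade = idade

    def nome_completo(self):
        return f'{self.nome} {self.sobrenome}'


p1 = Pessoa('João', 'da Silva', 20)

# The function is not called yet; only a reference to it is stored
metodo = p1.nome_completo

print(metodo.__self__ is p1)                    # True: the object it will pass as self
print(metodo.__func__ is Pessoa.nome_completo)  # True: the function it will call
print(metodo() == Pessoa.nome_completo(p1))     # True: both syntaxes give the same result
```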

Final remarks

As the examples show, object-oriented programming is a technique for grouping variables in a single structure and making it easier to write functions that follow a certain pattern, namely receiving the structure as an argument. The syntax most commonly used in Python to call the functions of an object (methods), however, places the variable holding the structure before the function name instead of in the first argument.

In Python, the argument for the structure or object (self) appears explicitly as the first argument of the function. In other languages this variable may have another name (such as this) and does not appear explicitly among the function's arguments, although it still has to be created in the context of the function so that the object can be manipulated.

This article was originally published on my blog. Drop by, or follow me on DEV to see more articles I have written.

In-place functions or copy of value

2021-03-29T15:00:00+00:00

From time to time I see people struggling to use some function correctly, either because the function should be called on its own, with the variable passed as argument being used afterwards, or because the function's return value should be assigned to a variable, and that new variable used. In Python, this difference can be seen between the list methods sort and reverse and the functions sorted and reversed, which follow different patterns: in place and copy of value, respectively. I intend to discuss these two patterns, explaining the difference and the best use case for each.

Example function

To demonstrate how these patterns work, a function will be implemented that receives a list and calculates the double of each value in it. Example:

entrada = [5, 2, 8, 6, 4]

# Function execution

resultado = [10, 4, 16, 12, 8]

In-place function

The idea of the in-place pattern is to change the very variable received by the function (or the object itself, when dealing with object orientation). In this case, it is enough to compute the double of the value at each position of the list and overwrite that position with the result. Example:

from typing import List


def dobro_inplace(lista: List[int]) -> None:
    for i in range(len(lista)):
        lista[i] = 2 * lista[i]


valores = [5, 2, 8, 6, 4]
retorno = dobro_inplace(valores)

print(f'Variável: valores | Tipo: {type(valores)} | Valor: {valores}')
print(f'Variável: retorno | Tipo: {type(retorno)} | Valor: {retorno}')

Execution result:

Variável: valores | Tipo: <class 'list'> | Valor: [10, 4, 16, 12, 8]
Variável: retorno | Tipo: <class 'NoneType'> | Valor: None

This run shows that the values of the list were changed and that the function's return value is None; in other words, the function changed the very list passed as argument. Another important point is the function's signature (the types of the arguments and of the return value): it receives a list of integers and returns nothing (None). So, although it is possible to call this function directly while supplying the arguments of another function, as in print(dobro_inplace(valores)), print would receive None rather than the list.

Copy-of-value function

The idea of the copy-of-value pattern is to create a copy of the value passed as argument and return that copy, without changing the variable received (or creating a new object, in the case of object orientation). Here a new list has to be created and the computed values added to it. Example:

from typing import List


def dobro_copia(lista: List[int]) -> List[int]:
    nova_lista = []

    for i in range(len(lista)):
        nova_lista.append(2 * lista[i])

    return nova_lista


valores = [5, 2, 8, 6, 4]
retorno = dobro_copia(valores)

print(f'Variável: valores | Tipo: {type(valores)} | Valor: {valores}')
print(f'Variável: retorno | Tipo: {type(retorno)} | Valor: {retorno}')

Execution result:

Variável: valores | Tipo: <class 'list'> | Valor: [5, 2, 8, 6, 4]
Variável: retorno | Tipo: <class 'list'> | Valor: [10, 4, 16, 12, 8]

This run shows that the valores variable keeps the values it had before the function ran, and that the retorno variable holds a list with the doubles; in other words, the function does not change the list passed as argument and returns a new list with the computed values. Looking at the signature, the function receives a list of integers and returns a list of integers. This allows the function to be called directly in the arguments of another function, as in print(dobro_copia(valores)), in which case print receives the list of doubles. However, if the return value is not stored, it will look as if the function did nothing or did not work. So, in some cases, when the previous value is no longer needed, the return value can be assigned back to the variable that was passed as argument:

valores = dobro_copia(valores)

Hybrid function

It is also possible to combine the two patterns, changing the value passed in and returning it. Example:

from typing import List


def dobro_hibrido(lista: List[int]) -> List[int]:
    for i in range(len(lista)):
        lista[i] = 2 * lista[i]

    return lista


valores = [5, 2, 8, 6, 4]
retorno = dobro_hibrido(valores)

print(f'Variável: valores | Tipo: {type(valores)} | Valor: {valores}')
print(f'Variável: retorno | Tipo: {type(retorno)} | Valor: {retorno}')

Execution result:

Variável: valores | Tipo: <class 'list'> | Valor: [10, 4, 16, 12, 8]
Variável: retorno | Tipo: <class 'list'> | Valor: [10, 4, 16, 12, 8]

In this case the function can simply be called, or it can be used in the arguments of other functions. To keep the original values, however, a copy has to be made manually before calling the function.
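A short sketch of that manual copy, using the list's own copy method. It also shows that the value returned by the hybrid function is the same list object that was passed in, not a new one (the variable name originais is chosen for this illustration):

```python
from typing import List


def dobro_hibrido(lista: List[int]) -> List[int]:
    for i in range(len(lista)):
        lista[i] = 2 * lista[i]

    return lista


valores = [5, 2, 8, 6, 4]
originais = valores.copy()  # manual copy made before calling the function
retorno = dobro_hibrido(valores)

print(retorno is valores)  # True: both names refer to the same list
print(originais)           # [5, 2, 8, 6, 4]
print(valores)             # [10, 4, 16, 12, 8]
```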

Example in the standard library

Python's standard library has the list methods sort and reverse, which follow the in-place pattern, and the built-in functions sorted and reversed, which follow the copy-of-value pattern; they can be used, for example, to sort and reverse the values of a list. When a copy of the list in its original order is no longer needed, in-place functions are preferable: they modify the list itself and, since they do not create a copy, use less memory. Example:

valores = [5, 2, 8, 6, 4]
valores.sort()
valores.reverse()
print(valores)

If an unmodified copy of the list must be kept, choose the copy-of-value functions. Example:

valores = [5, 2, 8, 6, 4]
novos_valores = list(reversed(sorted(valores)))  # reversed() returns an iterator, so list() is needed to get a list
print(novos_valores)

However, this example creates two copies of the list, one for each function call. To create only one copy, in-place functions can be mixed with copy-of-value functions. Example:

valores = [5, 2, 8, 6, 4]
novos_valores = sorted(valores)
novos_valores.reverse()
print(novos_valores)

It is also worth noting that some uses of these functions can give the impression that they did not work, such as:

valores = [5, 2, 8, 6, 4]

sorted(valores)
print(valores)  # Prints the original list, not the sorted one

print(valores.sort())  # Prints None, not the list

Considerations

It is not always possible to use the desired pattern. Strings in Python (str) are immutable, so every function that manipulates them follows the copy-of-value pattern. For other types, there may be only in-place functions, in which case a copy must be made manually before calling the function, if one is needed. To find out which pattern a function implements, consult its documentation or check its signature, although some doubt may remain between copy-of-value and hybrid, since both patterns have the same signature.

The examples given here are didactic. To sort in reverse order, both the sort method and the sorted function accept the argument reverse=True, which performs the reverse sort directly. Likewise, a new list can be created already populated with values, without adding item by item manually, as in these examples:

valores = [5, 2, 8, 6, 4]
partes_dos_valores = valores[2:]
novos_valores = [2 * valor for valor in valores]
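
The reverse=True argument mentioned above can be sketched as follows; the variable name decrescente is only illustrative:

```python
valores = [5, 2, 8, 6, 4]

# Copy-of-value: sorted() builds a new list and leaves valores untouched
decrescente = sorted(valores, reverse=True)

# In place: sort() reorders the list itself and returns None
valores.sort(reverse=True)

print(decrescente)
print(valores)
```

Both calls sort in descending order in a single pass, so no separate reverse step is needed.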

This article was originally published on my blog; stop by, or follow me on DEV to see more articles I have written.

Encapsulating the logic of an algorithm

2021-03-02T18:00:00+00:00

Many programming-logic exercise lists at some point ask for a value to be read from the keyboard and, if that value is invalid, for a warning to be shown and the read to be repeated until a valid value is entered. Following the idea of optimizing an algorithm step by step, starting from a simple solution, I intend to study how to reduce code duplication by changing the algorithm, encapsulating the logic in functions, and encapsulating it in classes.

Exercise

One example of an exercise that asks for this kind of validation is reading grades, which must be between 0 and 10. The simplest solution is to read a value and, while that value is invalid, show the warning and read another value. Example:

nota = float(input('Digite a nota: '))
while nota < 0 or nota > 10:
    print('Nota inválida')
    nota = float(input('Digite a nota: '))

This algorithm works, but the code that reads the grade is duplicated (once before the loop and once inside it). If a change is needed, such as switching the grade to an integer between 0 and 100, both places must be changed, and if only one is changed, the algorithm could process invalid values.

Changing the algorithm

To remove the repeated code, the read can be unified inside the loop, since that instruction must be repeated until a valid value is obtained. Example:

while True:
    nota = float(input('Digite a nota: '))
    if 0 <= nota <= 10:
        break
    print('Nota inválida!')

This way, the code is no longer repeated. The stop condition, which previously checked whether the value was invalid (which can be less intuitive to read), now checks whether the value is valid (which is generally easier to read and write). The commands inside the loop are also now in an order that is easier to follow, since the previous algorithm required keeping in mind what had been executed before the loop.

However, these algorithms validate only the value that was read, and they fail with an error if a value in an invalid format is entered, such as letters instead of numbers. This can be solved by handling the exceptions raised. Example:

while True:
    try:
        nota = float(input('Digite a nota: '))
        if 0 <= nota <= 10:
            break
    except ValueError:
        ...
    print('Nota inválida!')

Encapsulating the logic in a function

If several grades had to be read, the algorithms presented so far would require repeating this whole piece of code, or using it inside a loop. To make it easier to reuse and avoid code duplication, the algorithm can be encapsulated in a function. Example:

def nota_input(prompt):
    while True:
        try:
            nota = float(input(prompt))
            if 0 <= nota <= 10:
                break
        except ValueError:
            ...
        print('Nota inválida!')
    return nota


nota1 = nota_input('Digite a primeira nota: ')
nota2 = nota_input('Digite a segunda nota: ')

Encapsulating the logic in classes

Instead of encapsulating this logic in a function, it can be encapsulated in a class, which allows each step of the algorithm to be separated into its own method, with another method responsible for controlling which step is called and when. Example:

class ValidaNotaInput:
    mensagem_valor_invalido = 'Nota inválida!'

    def ler_entrada(self, prompt):
        return input(prompt)

    def transformar_entrada(self, entrada):
        return float(entrada)

    def validar_nota(self, nota):
        return 0 <= nota <= 10

    def __call__(self, prompt):
        while True:
            try:
                nota = self.transformar_entrada(self.ler_entrada(prompt))
                if self.validar_nota(nota):
                    break
            except ValueError:
                ...
            print(self.mensagem_valor_invalido)
        return nota


nota_input = ValidaNotaInput()


nota = nota_input('Digite a nota: ')

Note that the __call__ method allows an object created from this class to be called as if it were a function. Here it is responsible for calling each step of the algorithm: ler_entrada, which reads what was typed on the keyboard; transformar_entrada, which converts the text read into the desired type (from str to float); and validar_nota, which says whether the value is valid. Note also that, by splitting the algorithm into separate methods, the main code became a kind of commented code, describing what is being done and where.

Another advantage of encapsulating the logic in a class rather than a function is the possibility of generalizing it. To validate another type of input with the function approach, another function would have to be created repeating the whole algorithm and changing only the parts that transform and validate the value read, which is a form of code repetition. With classes, inheritance can be used to avoid this repetition. Example:

class ValidaInput:
    mensagem_valor_invalido = 'Valor inválido!'

    def ler_entrada(self, prompt):
        return input(prompt)

    def transformar_entrada(self, entrada):
        raise NotImplementedError

    def validar_valor(self, valor):
        raise NotImplementedError

    def __call__(self, prompt):
        while True:
            try:
                valor = self.transformar_entrada(self.ler_entrada(prompt))
                if self.validar_valor(valor):
                    break
            except ValueError:
                ...
            print(self.mensagem_valor_invalido)
        return valor


class ValidaNomeInput(ValidaInput):
    mensagem_valor_invalido = 'Nome inválido!'

    def transformar_entrada(self, entrada):
        return entrada.strip().title()

    def validar_valor(self, valor):
        return valor != ''


class ValidaNotaInput(ValidaInput):
    mensagem_valor_invalido = 'Nota inválida!'

    def transformar_entrada(self, entrada):
        return float(entrada)

    def validar_valor(self, valor):
        return 0 <= valor <= 10


nome_input = ValidaNomeInput()
nota_input = ValidaNotaInput()


nome = nome_input('Digite o nome: ')
nota = nota_input('Digite a nota: ')

This way, the existing code can be reused to create other validations; only the conversion of the str read from the keyboard into the desired type and the validation of that value need to be implemented. There is no need to understand and repeat the logic of reading the value, validating it, printing the error message, and repeating until a valid value is entered.

Considerations

The logic of an algorithm can be encapsulated in functions or in classes. Although doing so in a class requires knowledge of object-oriented programming, it makes reuse easier by abstracting away all the complexity of the algorithm, which can be distributed as a library and requires only the implementation of simple methods by whoever uses it.

Other ways of implementing this could still be discussed, such as passing functions as parameters and using coroutines when encapsulating the algorithm in a function, as well as using classmethod, staticmethod, and ABC when encapsulating the algorithm in classes.
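
As a rough sketch of the first of those alternatives (passing functions as parameters), the loop can stay in a single function while the variable steps arrive as arguments. The name valida_input and its parameters are hypothetical, not part of the article's code, and the ler parameter exists only so the keyboard can be simulated here:

```python
from typing import Callable, TypeVar

T = TypeVar('T')


def valida_input(
    prompt: str,
    transformar: Callable[[str], T],
    validar: Callable[[T], bool],
    mensagem: str = 'Valor inválido!',
    ler: Callable[[str], str] = input,
) -> T:
    # Same loop as in the class version; only the variable steps are arguments
    while True:
        try:
            valor = transformar(ler(prompt))
            if validar(valor):
                return valor
        except ValueError:
            pass
        print(mensagem)


respostas = iter(['abc', '11', '7.5'])  # simulated keyboard input
nota = valida_input(
    'Digite a nota: ',
    transformar=float,
    validar=lambda n: 0 <= n <= 10,
    mensagem='Nota inválida!',
    ler=lambda _: next(respostas),
)
print(nota)
```

Leaving ler at its default makes the function read from the real keyboard. Compared with the inheritance version, a new validation needs no new class, at the cost of passing the steps on every call.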

This article was originally published on my blog; stop by, or follow me on DEV to see more articles I have written.

Blog has moved!

2021-02-07T23:41:22+00:00

It's 2021 and I decided to start blogging again. I am not going to migrate the posts to the new blog nor disable this blog for now, but if you want to check out my latest content, please go to https://blog.fsouza.dev! :)

Mocking a service with Bottle

2021-01-11T00:00:00+00:00

How my life got easier with Bottle or, a lovely Christmas story

Hosting Telegram bots on Cloud Run for free

2021-01-08T00:00:00+00:00

I write a lot of Telegram bots using the library python-telegram-bot. Writing Telegram bots is fun, but you will also need someplace to host them.

I personally like the new Google Cloud Run (or Run, for short). It is a great fit because it has a “gorgeous” free quota that should be mostly sufficient to host your bots, and it is super simple to deploy and get running.

To create Telegram bots, first, you need to talk to BotFather and get a TOKEN.

Secondly, you need to write some code. As I mentioned before, you can use python-telegram-bot to build your bots. Here is the documentation.

Code

Here is the base code that you will need to run on Cloud Run.

main.py

import os
import http

from flask import Flask, request
from werkzeug.wrappers import Response

from telegram import Bot, Update
from telegram.ext import Dispatcher, Filters, MessageHandler, CallbackContext

app = Flask(__name__)


def echo(update: Update, context: CallbackContext) -> None:
    update.message.reply_text(update.message.text)

bot = Bot(token=os.environ["TOKEN"])

dispatcher = Dispatcher(bot=bot, update_queue=None, workers=0)
dispatcher.add_handler(MessageHandler(Filters.text & ~Filters.command, echo))

@app.route("/", methods=["POST"])
def index() -> Response:
    dispatcher.process_update(
        Update.de_json(request.get_json(force=True), bot))

    return "", http.HTTPStatus.NO_CONTENT

requirements.txt

flask==1.1.2
gunicorn==20.0.4
python-telegram-bot==13.1

Dockerfile

FROM python:3.8-slim
ENV PYTHONUNBUFFERED True
WORKDIR /app
COPY *.txt .
RUN pip install --no-cache-dir --upgrade pip -r requirements.txt
COPY . ./

CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

Deployment

Finally, you need to deploy. You can do it in a single step, but first, let’s run the command below to set the default region (optionally).

gcloud config set run/region us-central1

Then deploy to Cloud Run:

gcloud beta run deploy your-bot-name \
    --source . \
    --set-env-vars TOKEN=your-telegram-bot-token \
    --platform managed \
    --allow-unauthenticated \
    --project your-project-name

After this, you will receive the public URL of your service, and you will need to set the Telegram bot webhook using cURL:

curl "https://api.telegram.org/botYOUR-BOT:TOKEN/setWebhook?url=https://your-bot-name-uuid-uc.a.run.app"

Replace YOUR-BOT:TOKEN with your bot’s token, and the URL after url= with the public URL of your Cloud Run service.
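
If you prefer to assemble that address in code, the sketch below builds the same setWebhook URL as the cURL call. The helper name build_set_webhook_url is hypothetical, and percent-encoding the service URL is my own precaution rather than something the command above does:

```python
from urllib.parse import quote


def build_set_webhook_url(token: str, service_url: str) -> str:
    # Same endpoint as the cURL call; the service URL is percent-encoded
    encoded = quote(service_url, safe="")
    return f"https://api.telegram.org/bot{token}/setWebhook?url={encoded}"


print(build_set_webhook_url("YOUR-BOT:TOKEN", "https://your-bot-name-uuid-uc.a.run.app"))
```

The resulting string can be passed to cURL or to any HTTP client.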

This should be enough.


Periodically backup your Google Photos to Google Cloud Storage

2020-12-31T00:00:00+00:00

Why?

Google Cloud Storage is cheaper than Google One, and you pay only for what you use. Also, you can erase any photo and still have a copy of it.

Installation

Create a Compute Engine (a VM).

If you choose Ubuntu, first of all, remove snap:

sudo apt autoremove --purge snapd
sudo rm -rf /var/cache/snapd/
rm -rf ~/snap

Install gcsfuse or follow the official instructions.

export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

sudo apt-get update
sudo apt-get install gcsfuse

In the Google Cloud console, create a bucket of type Nearline (in my case the bucket is named tank1), then go back to your VM and create a directory with the same name as the bucket.

mkdir name-of-your-bucket

Now install gphotos-sync.

sudo apt install -y python3-pip
pip3 install gphotos-sync

I created a small Python script to deal with multiple Google accounts. I’ll explain later how it works.

cat <<EOF > /home/ubuntu/synchronize.py
#!/usr/bin/env python3

import os
import sys
import subprocess
from pathlib import Path

import requests


home = Path(os.path.expanduser("~")) / "tank1/photos"

args = [
  "--ntfs",
  "--retry-download",
  "--skip-albums",
  "--photos-path", ".",
  "--log-level", "DEBUG",
]

env = os.environ.copy()
env["LC_ALL"] = "en_US.UTF-8"

for p in home.glob("*/*"):
  subprocess.run(["/home/ubuntu/.local/bin/gphotos-sync", *args, str(p.relative_to(home))], check=True, cwd=home, env=env, stdout=sys.stdout, stderr=subprocess.STDOUT)

# I use healthchecks.io to alert me if the script has stopped working
url = "https://hc-ping.com/uuid4"
response = requests.get(url, timeout=60)
response.raise_for_status()
EOF

Give execute permission.

chmod u+x synchronize.py

Now let’s create some systemd scripts.

sudo su

Let’s create a service for gcsfuse, responsible for mounting the bucket locally using FUSE.

cat <<EOF >/etc/systemd/system/gcsfuse.service 
# Script stolen from https://gist.github.com/craigafinch/292f98618f8eadc33e9633e6e3b54c05
[Unit]
Description=Google Cloud Storage FUSE mounter
After=local-fs.target network-online.target google.service sys-fs-fuse-connections.mount
Before=shutdown.target

[Service]
Type=forking
User=ubuntu
ExecStart=/bin/gcsfuse tank1 /home/ubuntu/tank1
ExecStop=/bin/fusermount -u /home/ubuntu/tank1
Restart=always

[Install]
WantedBy=multi-user.target
EOF

Enable and start the service:

systemctl enable gcsfuse.service
systemctl start gcsfuse.service

cat <<EOF >/etc/systemd/system/gphotos-sync.service 
[Unit]
Description=Run gphotos-sync for each account

[Service]
User=ubuntu
ExecStart=/home/ubuntu/synchronize.py
EOF

And enable the service.

systemctl enable gphotos-sync.service

Now let’s create a timer that runs gphotos-sync.service 1 minute after boot, with gcsfuse.service as a dependency.

cat <<EOF >/etc/systemd/system/gphotos-sync.timer 
[Unit]
Description=Run gphotos sync service weekly
Requires=gcsfuse.service

[Timer]
OnBootSec=1min
Unit=gphotos-sync.service

[Install]
WantedBy=timers.target
EOF

systemctl enable gphotos-sync.timer
systemctl start gphotos-sync.timer

exit  # back to the ubuntu user

Now follow https://docs.google.com/document/d/1ck1679H8ifmZ_4eVbDeD_-jezIcZ-j6MlaNaeQiz7y0/edit to get a client_secret.json to use with gphotos-sync.

mkdir -p /home/ubuntu/.config/gphotos-sync/
# Copy the contents of the JSON to the file below
vim /home/ubuntu/.config/gphotos-sync/client_secret.json 

Testing

Due to an issue with gcsfuse, I was unable to create the backup dir directly on the bucket. The workaround is to create a temp directory and start the gphotos-sync manually first.

mkdir -p ~/temp/username/0
cd ~/temp
gphotos-sync --ntfs --skip-albums --photos-path . username/0
# gphotos-sync will ask for a token, paste it and CTRL-C to stop the download of photos.
cp -r ~/temp/username/ ~/tank1/photos/username  # -r is required to copy a directory

Verify if it is working.

./synchronize.py

After executing the command above, the script should start the backup. You can wait until it finishes or continue to the steps below.
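
To make the multi-account part of synchronize.py a little more concrete: the script runs gphotos-sync once for every directory two levels below ~/tank1/photos, which matches the username/0 layout created above. The sketch below reproduces only that discovery step in a temporary directory; the account names are hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "photos"  # stands in for ~/tank1/photos
    for account in ("alice/0", "alice/1", "bob/0"):  # hypothetical accounts
        (home / account).mkdir(parents=True)

    # Same glob the script uses to pick the directories to sync
    targets = sorted(p.relative_to(home).as_posix() for p in home.glob("*/*"))

print(targets)
```

Each entry in targets corresponds to one gphotos-sync invocation, so adding an account should only require creating and authorizing one more directory.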

Schedule startup and shutdown of the VM

The content below is based on, and is a simplified version of, Scheduling compute instances with Cloud Scheduler by Google.

Go back to your VM and add the label runtime with the value weekly. The functions below need it to know which instances should be started or shut down.

Create a new directory (in my case, I will call it functions) and add two files:

index.js

const Compute = require('@google-cloud/compute');
const compute = new Compute();

exports.startInstancePubSub = async (event, context, callback) => {
  try {
    const payload = JSON.parse(Buffer.from(event.data, 'base64').toString());
    const options = {filter: `labels.${payload.label}`};
    const [vms] = await compute.getVMs(options);
    await Promise.all(
      vms.map(async instance => {
        if (payload.zone === instance.zone.id) {
          const [operation] = await compute
            .zone(payload.zone)
            .vm(instance.name)
            .start();

          return operation.promise();
        }
      })
    );

    const message = 'Successfully started instance(s)';
    console.log(message);
    callback(null, message);
  } catch (err) {
    console.log(err);
    callback(err);
  }
};

exports.stopInstancePubSub = async (event, context, callback) => {
  try {
    const payload = JSON.parse(Buffer.from(event.data, 'base64').toString());
    const options = {filter: `labels.${payload.label}`};
    const [vms] = await compute.getVMs(options);
    await Promise.all(
      vms.map(async instance => {
        if (payload.zone === instance.zone.id) {
          const [operation] = await compute
            .zone(payload.zone)
            .vm(instance.name)
            .stop();

          return operation.promise();
        } else {
          return Promise.resolve();
        }
      })
    );

    const message = 'Successfully stopped instance(s)';
    console.log(message);
    callback(null, message);
  } catch (err) {
    console.log(err);
    callback(err);
  }
};

And

package.json

{
  "main": "index.js",
  "private": true,
  "dependencies": {
    "@google-cloud/compute": "^2.4.1"
  }
}

Create a PubSub topic to start the instance.

gcloud pubsub topics create start-instance-event

Now deploy the startInstancePubSub function

gcloud functions deploy startInstancePubSub \
    --trigger-topic start-instance-event \
    --runtime nodejs12 \
    --allow-unauthenticated

And another PubSub topic to stop the instance.

gcloud pubsub topics create stop-instance-event

And the stopInstancePubSub function

gcloud functions deploy stopInstancePubSub \
    --trigger-topic stop-instance-event \
    --runtime nodejs12 \
    --allow-unauthenticated

And finally, let’s create two Cloud Scheduler jobs to publish to the topics on Sunday and Monday at midnight.

gcloud beta scheduler jobs create pubsub startup-weekly-instances \
    --schedule '0 0 * * SUN' \
    --topic start-instance-event \
    --message-body '{"zone":"us-central1-a", "label":"runtime=weekly"}' \
    --time-zone 'America/Sao_Paulo'

gcloud beta scheduler jobs create pubsub shutdown-weekly-instances \
    --schedule '0 0 * * MON' \
    --topic stop-instance-event \
    --message-body '{"zone":"us-central1-a", "label":"runtime=weekly"}' \
    --time-zone 'America/Sao_Paulo'

After this setup, your VM will start every Sunday, back up all the photos from all your accounts, and shut down on Monday.
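
To see how the scheduler message reaches the functions, here is a small Python sketch of the decoding step that index.js performs: Pub/Sub delivers the message body base64-encoded, and the label from the JSON payload becomes the instance filter. The helper name label_filter_from_event is hypothetical and the sketch only mirrors the parsing, not the Compute API calls:

```python
import base64
import json


def label_filter_from_event(event: dict) -> tuple:
    # Pub/Sub delivers the message body base64-encoded in event["data"]
    payload = json.loads(base64.b64decode(event["data"]).decode())
    # Mirrors `labels.${payload.label}` from index.js
    return payload["zone"], f"labels.{payload['label']}"


message = '{"zone":"us-central1-a", "label":"runtime=weekly"}'
event = {"data": base64.b64encode(message.encode()).decode()}
print(label_filter_from_event(event))
```

This is why the --message-body in both scheduler jobs must carry the same label that was added to the VM.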


Auto generating SEO-friendly URLs with Scrapy pipelines

2020-12-14T00:00:00+00:00

I was using Scrapy to crawl some websites and mirror their content into a new one while generating beautiful, unique URLs based on the title. But titles can repeat! So I appended part of a hash of the original URL, encoded in base36, to keep the URLs unique.

In the URL I wanted the title without special symbols (ASCII only), followed by a short, unique identifier: part of the SHA-256 of the original URL, encoded in base36.

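import functools
import hashlib
import re
import sys
from unicodedata import normalize

import base36
from scrapy.exceptions import DropItem


class PreparePipeline():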
  def process_item(self, item, spider):
    title = item.get("title")
    if title is None:
      raise DropItem(f"No title was found on item: {item}.")

    url = item["url"]

    N = 4
    sha256 = hashlib.sha256(url.encode()).digest()
    sliced = int.from_bytes(
        memoryview(sha256)[:N].tobytes(), byteorder=sys.byteorder)
    uid = base36.dumps(sliced)

    strip = str.strip
    lower = str.lower
    split = str.split
    deunicode = lambda n: normalize("NFD", n).encode("ascii", "ignore").decode("utf-8")
    # The hyphen is escaped so ",-@" is not read as a character range (which would also strip digits)
    trashout = lambda n: re.sub(r"[.,\-@/\\|*]", " ", n)
    functions = [strip, deunicode, trashout, lower, split]
    fragments = [
        *functools.reduce(
        lambda x, f: f(x), functions, title),
        uid,
    ]

    item["uid"] = "-".join(fragments)

    return item

For example, the URL https://en.wikipedia.org/wiki/Déjà_vu with the title Déjà vu - Wikipedia results in deja-vu-wikipedia-1q9i86k, which is perfect for my use case.
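
For experimenting outside Scrapy, the same idea can be sketched with the standard library only. The names to_base36 and slugify are hypothetical, to_base36 is a plain stand-in for base36.dumps, and this version assumes a fixed big-endian byte order, so the identifier it prints may differ from the one produced by the pipeline above (which uses sys.byteorder):

```python
import hashlib
import re
from unicodedata import normalize

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number: int) -> str:
    # Plain-Python stand-in for base36.dumps, for non-negative integers
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def slugify(title: str, url: str, n_bytes: int = 4) -> str:
    digest = hashlib.sha256(url.encode()).digest()
    # Assumption: fixed "big" byte order, unlike sys.byteorder in the pipeline
    uid = to_base36(int.from_bytes(digest[:n_bytes], byteorder="big"))
    ascii_title = normalize("NFD", title.strip()).encode("ascii", "ignore").decode("ascii")
    words = re.sub(r"[.,\-@/\\|*]", " ", ascii_title).lower().split()
    return "-".join([*words, uid])


print(slugify("Déjà vu - Wikipedia", "https://en.wikipedia.org/wiki/Déjà_vu"))
```

Fixing the byte order keeps the identifier stable across machines, which matters if the crawler ever runs on hardware with a different endianness.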

Taking advantage of Python’s concurrent futures to fully saturate your bandwidth

2020-12-13T00:00:00+00:00

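I am starting a new series of small snippets of code which I think may be useful or inspiring for others.

Let’s suppose you have a pandas DataFrame with a column named url that holds the files you want to download.

The code below uses a ThreadPoolExecutor with requests to run many downloads concurrently. Because downloads are I/O-bound, the threads spend most of their time waiting on the network, so the pool size (here twice the CPU count) mainly controls how many requests are in flight at once.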

import multiprocessing
import concurrent.futures

import pandas as pd
from requests import Session
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

session = Session()

retry = Retry(connect=8, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)


def download(url):
    filename = "/".join(["subdir", url.split("/")[-1]])

    with session.get(url, stream=True) as r:
        if not r.ok:
            return

        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)


def run(df, processes=multiprocessing.cpu_count() * 2):
    with concurrent.futures.ThreadPoolExecutor(processes) as pool:
        list(pool.map(download, df["url"]))


if __name__ == '__main__':
    df = pd.read_csv("download.csv")
    run(df)
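
To see the effect of the thread pool without touching the network, here is a small sketch in which a sleep stands in for the download. The function fake_download and the URLs are hypothetical:

```python
import concurrent.futures
import time


def fake_download(url: str) -> str:
    time.sleep(0.1)  # stands in for waiting on the network
    return url.split("/")[-1]


urls = [f"https://example.com/file{i}.bin" for i in range(8)]  # hypothetical URLs

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    names = list(pool.map(fake_download, urls))  # results keep the input order
elapsed = time.perf_counter() - start

print(names)
print(f"{elapsed:.2f}s")  # typically far less than the 0.8s a sequential loop needs
```

Since the waits overlap, the total time stays close to that of a single download as long as the pool has enough workers.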


Backing up the database in Django

2020-10-30T13:40:00+00:00

Introduction

At some point in your development process with Django, you may need to back up and restore the application's database. With that in mind, I decided to write a short, basic tutorial on how to do it.

In this tutorial we will use django-dbbackup, a package developed specifically for this.

Setting up our environment

First, starting from the beginning, let's create a folder for our project and, inside it, isolate our development environment using a virtualenv:

mkdir projeto_db && cd projeto_db # create our project folder

virtualenv -p python3.8 env && source env/bin/activate # create and activate our virtualenv

After that, with the environment active, let's run the following:

pip install -U pip # upgrade the installed version of pip

Installing the dependencies

Now let's install Django and the package we will use to make our backups.

pip install Django==3.1.2 # install Django

pip install django-dbbackup # install django-dbbackup

Creating and configuring the project

With the dependencies installed, let's create our project and add the package to the Django settings.

django-admin startproject django_db . # inside our projeto_db folder, create a Django project named django_db

With the project created, let's create and populate our database.

python manage.py migrate # synchronize the database state with the current set of models and migrations

With the database created, let's create a superuser so we can access the project's admin panel.

python manage.py createsuperuser

Perfect. We now have everything we need to run the project. To run it, just do:

python manage.py runserver

You should then see the default Django start page for your project.

Configuring django-dbbackup

Inside your project, open the settings.py file, as shown below:

django_db/
├── settings.py

In this file we will first add django-dbbackup to the project's apps:

INSTALLED_APPS = (
    ...
    'dbbackup',  # add django-dbbackup
)

Once it is added to the apps, let's tell Django which storage backend to use for the backups and then which folder the backup file will go to. These settings can be added at the end of settings.py:

DBBACKUP_STORAGE = 'django.core.files.storage.FileSystemStorage' # storage backend used to save the backup
DBBACKUP_STORAGE_OPTIONS = {'location': 'backups/'} # where to save

Notice that we told Django to save the backup in the backups folder, but that folder does not exist in our project yet. So we need to create it [outside the project folder]:

mkdir backups

Creating and restoring our backup

Everything is ready. Now let's create our first backup:

python manage.py dbbackup

Once it runs, a file will be created (in our example, with a .dump extension) and saved in the backups folder. This file contains the full backup of our database.

To restore the database, suppose we migrated our system from an old server to a new one and, for some reason, the database was corrupted and became unusable. In other words, we have the system/project without a database (so delete or move your .sqlite3 database file for this example to be useful), but we do have the backups. With that, let's restore the database:

python manage.py dbrestore

That's it, the database is restored. One nice thing about django-dbbackup, among others, is that it generates backups stamped with their date and time, which makes it easier to recover the most recent data.

That's all for today, folks. See you next time. ;)
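
If you want to check which backup in the folder is the most recent, a small helper like the one below can do it. This is only a sketch: latest_backup is a hypothetical function, not part of django-dbbackup, it relies on file modification times rather than on the names the package generates, and the file names used here are placeholders:

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_backup(folder: Path) -> Optional[Path]:
    # Picks the most recently modified file in the backups folder, if any
    files = [p for p in folder.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


with tempfile.TemporaryDirectory() as tmp:
    backups = Path(tmp)  # stands in for the backups/ folder
    for age, name in enumerate(["older.dump", "newer.dump"]):  # placeholder names
        path = backups / name
        path.write_bytes(b"")
        os.utime(path, (1_600_000_000 + age, 1_600_000_000 + age))
    newest = latest_backup(backups).name

print(newest)
```

In a real project you would call latest_backup(Path('backups')) from the folder that contains manage.py.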

A Seqtembro of free virtual events about Qt and KDE

2020-08-29T18:48:00+00:00

(OK, the pun on "seqtembro" works better in the English version, "seqtember", but let's go)

By a great coincidence, a twist of fate, or none of the above, September 2020 will be packed with high-quality, free virtual events about Qt and KDE.

Starting things off, from the 4th to the 11th of the month we have Akademy 2020, the great worldwide meeting of the KDE community, which this year, for reasons we all know, will take place virtually. The Akademy program offers talks, training sessions, hacking sessions, discussions focused on specific KDE applications, and more, bringing together hackers, designers, project managers, translators, and contributors from the most diverse areas to discuss and plan KDE and its next steps.

And since we are talking about KDE, by extension we are also talking about Qt; after all, most of the applications are written with this framework. So even if you work with Qt but use nothing from KDE, it is worth attending the event, and also asking yourself “why on earth am I not using and developing KDE applications?”.

An extra incentive is that during Akademy, from the 7th to the 11th, Qt Desktop Days will take place, a KDAB event focused on Qt on the desktop (surprise?). The preliminary program is already available, and it will be very interesting to see the advances of the technology in a field that may seem less sexy these days, given all the attention paid to mobile and embedded projects, but that on the contrary remains vibrant and keeps receiving a lot of investment.

After a quick pause to catch our breath, we have the Maratona Qt. Our friend Sandro Andrade, a professor at IFBA and longtime KDE contributor, decided to dedicate an entire week, from September 14 to 18, to presenting 5 topics about Qt, covering the fundamentals and first steps of each one. The program covers QML, C++ and Qt, Qt on Android, on iOS, on the web (yes!), computer graphics, and even games! Highly recommended for everyone who knows or wants to get to know the framework.

The Maratona Qt will serve as a warm-up for QtCon Brasil 2020, also virtual this year. On September 26 and 27 the folks at qmob.solutions will bring together Qt developers from several countries to present, among other things, work with Wayland, computer vision and AI, data analysis, Python, containers, prototyping, embedded systems, and more, all involving Qt! There will also be a presentation about the next major version of the toolkit, Qt 6.

So, folks, set this month aside for a deep dive into the many aspects and possibilities that Qt offers.

Kubicast - Episode 45: Software Architecture, is there anything beyond microservices?

2020-08-27T00:00:00+00:00

Listen to my appearance on Kubicast, together with Felipe Oliveira, talking about microservices architecture compared with the monolith:

MinHashing all the things: a quick analysis of MAG search results

2020-07-24T15:00:00+00:00

Last time I described a way to search MAGs in metagenomes, and teased about interesting results. Let's dig into some of them!

I prepared a repo with the data and a notebook with the analysis I did in this post. You can also follow along in Binder, as well as do your own analysis!

Preparing some metadata

The supplemental materials for Tully et al include more details about each MAG, so let's download them. I prepared a small snakemake workflow to do that, and also to download information about the SRA datasets from Tara Oceans (the dataset used to generate the MAGs) and from Parks et al, which also generated MAGs from Tara Oceans. Feel free to include them in your analysis, but I was curious to find matches in other metagenomes.

Loading the data

The results from the MAG search are in a CSV file, with a column for the MAG name, another for the SRA dataset ID for the metagenome and a third column for the containment of the MAG in the metagenome. I also fixed the names to make it easier to query, and finally removed the Tara and Parks metagenomes (because we already knew they contained these MAGs).

This left us with 23,644 SRA metagenomes with matches, covering 2,291 of the 2,631 MAGs. These are results for a fairly low containment (10%), so if we limit to MAGs with more than 50% containment we still have 1,407 MAGs and 2,938 metagenomes left.
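
The kind of filtering described above can be sketched with the standard library. The column names (mag, metagenome, containment) and the rows below are invented for illustration and are not taken from the actual results file; the helper summarize is hypothetical as well:

```python
import csv
import io

# Toy rows invented for illustration; the real CSV and its column names may differ
data = io.StringIO(
    "mag,metagenome,containment\n"
    "MAG_A,SRR0000001,0.12\n"
    "MAG_A,SRR0000002,0.74\n"
    "MAG_B,SRR0000002,0.55\n"
    "MAG_C,SRR0000003,0.31\n"
)


def summarize(rows, threshold):
    # Keep matches with containment above the threshold, then count unique names
    kept = [r for r in rows if float(r["containment"]) > threshold]
    return len({r["mag"] for r in kept}), len({r["metagenome"] for r in kept})


rows = list(csv.DictReader(data))
print(summarize(rows, 0.10))  # (unique MAGs, unique metagenomes) above 10%
print(summarize(rows, 0.50))  # the same counts above 50%
```

Raising the threshold shrinks both counts, which is the same effect reported above when going from 10% to 50% containment.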

TOBG_NP-110, I choose you!

That's still a lot, so I decided to pick a candidate to check before doing any large scale analysis. I chose TOBG_NP-110 because there were many matches above 50% containment, and even some at 99%. Turns out it is also an Archaeal MAG that failed to be classified further than Phylum level (Euryarchaeota), with a 70.3% complete score in the original analysis. Oh, let me dissect the name a bit: TOBG is "Tara Ocean Binned Genome" and "NP" is North Pacific.

And so I went checking where the other metagenome matches came from. 5 of the 12 matches above 50% containment come from one study, SRP044185, with samples collected from a water column at a station in Manzanillo, Mexico. Another 3 matches come from SRP003331, in the South Pacific ocean (in northern Chile). Another match, ERR3256923, also comes from the South Pacific.

What else can I do?

I'm curious to follow the refining MAGs tutorial from the Meren Lab and see where this goes, and especially in using spacegraphcats to extract neighborhoods from the MAG and better evaluate what is missing or if there are other interesting bits that the MAG generation methods ended up discarding.

So, for now that's it. But more important, I didn't want to sit on these results until there is a publication in press, especially when there are people that can do so much more with these, so I decided to make it all public. It is way more exciting to see this being used to know more about these organisms than me being the only one with access to this info.

And yesterday I saw this tweet by @DrJonathanRosa, saying:

I don’t know who told students that the goal of research is to find some previously undiscovered research topic, claim individual ownership over it, & fiercely protect it from theft, but that almost sounds like, well, colonialism, capitalism, & policing

Amen.

I want to run this with my data!

Next time. But we will have a discussion about scientific infrastructure and sustainability first =]

Comments?

Thread on Twitter

Communication is the foundation of Open Source projects

2020-07-23T00:00:00+00:00

I am the creator and maintainer (together with an incredible community of people from around the world) of a project called awesome-go, a community-curated list of frameworks, libraries and software written in Go. When I started contributing to and creating Open Source projects I thought the main focus was code. Over the years I came to realize that the project was a means of getting somewhere: code matters, but a project with impeccable code is not enough if “nobody”, or only a few people, can use it.

MinHashing all the things: searching for MAGs in the SRA

2020-07-22T15:00:00+00:00

(or: Top-down and bottom-up approaches for working around sourmash limitations)

In the last month I updated wort, the system I developed for computing sourmash signatures for public genomic databases, and started calculating signatures for the metagenomes in the Sequence Read Archive. This is a more challenging subset than the microbial datasets I was doing previously, since there are around 534k datasets from metagenomic sources in the SRA, totalling 447 TB of data. Another problem is the size of the datasets, ranging from a couple of MB to 170 GB. Turns out that the workers I have in wort are very good for small-ish datasets, but I still need to figure out how to pull large datasets faster from the SRA, because the large ones take forever to process...

The good news is that I managed to calculate signatures for almost 402k of them ¹, which already let us work on some pretty exciting problems =]

Looking for MAGs in the SRA

Metagenome-assembled genomes are essential for studying organisms that are hard to isolate and culture in the lab, especially for environmental metagenomes. Tully et al published 2,631 draft MAGs from 234 samples collected during the Tara Oceans expedition, and I wanted to check if they can also be found in other metagenomes besides the Tara Oceans ones. The idea is to extract the reads from these other matches and evaluate how the MAG can be improved, or at least evaluate what is missing in them. I chose to use environmental samples under the assumption they are easier to deposit on the SRA and have public access, but there are many human gut microbiomes in the SRA and this MAG search would work just fine with those too.

Moreover, I want to search for containment, and not similarity. The distinction is subtle, but similarity takes into account the sizes of both datasets (well, the size of the union of all elements in both datasets), while containment only considers the size of the query. This is relevant because the similarity of a MAG and a metagenome is going to be very small (and is symmetrical), but the containment of the MAG in the metagenome might be large (and is asymmetrical, since the containment of the metagenome in the MAG is likely very small because the metagenome is so much larger than the MAG).
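
To make the distinction concrete, here is a minimal sketch using plain Python sets as stand-ins for hash sketches. It is not the sourmash API, and the set sizes are arbitrary toy values chosen so the small "MAG" is fully inside the large "metagenome".

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|: symmetric, shrinks when either set is large."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query, target):
    """|Q ∩ T| / |Q|: asymmetric, normalized by the query size only."""
    return len(query & target) / len(query) if query else 0.0


# Toy stand-ins: a small "MAG" fully inside a much larger "metagenome".
mag = set(range(100))
metagenome = set(range(10_000))

print(jaccard_similarity(mag, metagenome))  # 0.01
print(containment(mag, metagenome))         # 1.0
print(containment(metagenome, mag))         # 0.01
```

The same pair of sets gives a tiny similarity but a containment of 1.0 in one direction, which is why the search ranks by containment of the MAG in the metagenome.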

The computational challenge: indexing and searching

sourmash signatures are a small fraction of the original size of the datasets, but when you have hundreds of thousands of them the collection ends up being pretty large too. More precisely, 825 GB large. That is way bigger than any index I ever built for sourmash, and it would also have pretty different characteristics from what we usually do: we tend to index genomes and run search (to find similar genomes) or gather (to decompose metagenomes into their constituent genomes), but for this MAG search I want to find which metagenomes have my MAG query above a certain containment threshold. Sort of a sourmash search --containment, but over hundreds of thousands of metagenome signatures. The main benefit of an SBT index in this context is to avoid checking all signatures because we can prune the search early, but currently SBT indices need to be totally loaded in memory during sourmash index. I will have to address this in the medium term, but I want a solution NOW! =]

sourmash 3.4.0 introduced --from-file in many commands, and since I can't build an index I decided to use it to load signatures for the metagenomes. But... sourmash search tries to load all signatures in memory, and while I might be able to find a cluster machine with hundreds of GBs of RAM available, that's not very practical.

So, what to do?

The top-down solution: a snakemake workflow

I don't want to modify sourmash now, so why not make a workflow and use snakemake to run one sourmash search --containment for each metagenome? That means 402k tasks, but at least I can use batches and SLURM job arrays to submit reasonably-sized jobs to our HPC queue. After running all batches I summarized results for each task, and it worked well for a proof of concept.
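
The batching idea can be sketched as below. The batch size of 1,000 is an arbitrary value picked for illustration, not the one used in the actual workflow, and the integer task IDs stand in for metagenome signatures.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical numbers: 402k task IDs, 1,000 tasks per submitted job.
task_ids = list(range(402_000))
jobs = list(batches(task_ids, 1_000))
print(len(jobs))  # 402
```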

But... it was still pretty resource intensive: each task was running one query MAG against one metagenome, and so each task needed to do all the overhead of starting the Python interpreter and parsing the query signature, which is exactly the same for all tasks. Extending it to support multiple queries to the same metagenome would involve duplicating tasks, and 402k metagenomes times 2,631 MAGs is... a very large number of jobs.

I also wanted to avoid clogging the job queues, which is not very nice to the other researchers using the cluster. This limited how many batches I could run in parallel...

The bottom-up solution: Rust to the rescue!

Thinking a bit more about the problem, here is another solution: what if we load all the MAGs in memory (as they will be queried frequently and are not that large), and then for each metagenome signature load it, perform all MAG queries, and then unload the metagenome signature from memory? This way we can control memory consumption (it's going to be proportional to all the MAG sizes plus the size of the largest metagenome) and can also efficiently parallelize the code because each task/metagenome is independent and the MAG signatures can be shared freely (since they are read-only).
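
The structure described above can be sketched in Python with sets standing in for signatures. This is only an illustration of the loop shape: the implementation discussed in this post is in Rust, the `load` callback and the toy data are hypothetical, and Python threads will not give the CPU parallelism that the Rust version gets.

```python
from concurrent.futures import ThreadPoolExecutor


def containment(query, target):
    return len(query & target) / len(query) if query else 0.0


def search_one(name, load, queries, threshold):
    """Load one target, run every query against it, then let it go."""
    target = load(name)  # only this target is held by this task
    return [(qname, name, c)
            for qname, q in queries.items()
            if (c := containment(q, target)) >= threshold]


def search_all(target_names, load, queries, threshold=0.5, workers=4):
    """Queries stay in memory and are shared read-only; targets are streamed."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = pool.map(lambda n: search_one(n, load, queries, threshold),
                         target_names)
        for hits in tasks:
            results.extend(hits)
    return results


# Toy data: two small queries, three larger targets.
queries = {"q1": frozenset({1, 2, 3, 4}), "q2": frozenset({10, 11})}
targets = {"t1": set(range(6)), "t2": {10, 11, 12}, "t3": {3, 4}}
results = search_all(["t1", "t2", "t3"], targets.__getitem__, queries)
print(results)
```

Memory stays bounded by the queries plus whatever targets are in flight, which is the property the paragraph above relies on.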

This could be done with the sourmash Python API plus multiprocessing or some other parallelization approach (maybe dask?), but turns out that everything we need comes from the Rust API. Why not enjoy a bit of the fearless concurrency that is one of the major Rust goals?

The whole code ended up being 176 lines long, including command line parsing using structopt, parallelizing the search using rayon, and a multiple-producer, single-consumer channel to write results to an output (either the terminal or a file). This version took 11 hours to run, using less than 5GB of RAM and 32 processors, to search 2k MAGs against 402k metagenomes. And, bonus! It can also be parallelized again if you have multiple machines, so it potentially takes a bit more than an hour to run if you can allocate 10 batch jobs, with each batch handling 1/10 of the metagenome signatures.

So, is bottom-up always the better choice?

I would like to answer "Yes!", but bioinformatics software tends to be organized as command line interfaces, not as libraries. Libraries also tend to have even less documentation than CLIs, and this particular case is not a fair comparison because... Well, I wrote most of the library, and the Rust API is not that well documented for general use.

But I'm pretty happy with how the sourmash CLI is viable both for the top-down approach (and whatever workflow software you want to use) as well as how the Rust core worked for the bottom-up approach. I think the most important is having the option to choose which way to go, especially because now I can use the bottom-up approach to make the sourmash CLI and Python API better. The top-down approach is also way more accessible in general, because you can pick your favorite workflow software and use all the tricks you're comfortable with.

But, what about the results?!?!?!

Next time. But I did find MAGs with over 90% containment in very different locations, which is pretty exciting!

I also need to find a better way of distributing all these signatures, because storing 4 TB of data in S3 is somewhat cheap, but transferring data is very expensive. All signatures are also available on IPFS, but I need more people to host them and share. Get in contact if you're interested in helping =]

And while I'm asking for help, any tips on pulling data faster from the SRA are greatly appreciated!

Comments?

Thread on Twitter

Footnotes

¹ Pulling about 100 TB in 3 days, which was pretty fun to see because I ended up DDoSing myself: I couldn't download the generated sigs fast enough from the S3 bucket where they are temporarily stored =P ↩

Behind Open Source projects there are people

2020-07-08T00:00:00+00:00

Technology: be human when receiving contributions. Many engineers forget, when contributing to Open Source projects, that behind every project there are people. We don't know the people on the other side (the project's maintainers) or how they will receive our contribution. This makes it necessary for communication to be extremely clear, and for us not to assume that the maintainers (contributors) have the same knowledge we do (we have no way of knowing what other people know). Even concepts we consider obvious should be spelled out in our communication (issues, pull requests, etc.).

Graphical effects with Python, Tkinter, Cython and Numba

2020-05-24T07:25:00+00:00

Yesterday, Saturday, I felt like creating a flame (fire) effect in Python. This effect was very popular in the early 90s. I remembered that the algorithm was quite simple, but that there were some tricks to do with the color palette.

I found this article with an implementation in C: https://lodev.org/cgtutor/fire.html

From the same article, we can get an idea of what the effect looks like:

After reading the article and watching a few videos on YouTube, I found myself with two problems:

1. I needed a graphical application able to display images as an animation, one image after another, as fast as possible (at least 15 frames per second, ideally above 30).
2. I suspected I would have speed problems generating the images, since a mere 1024 x 1024 image has many points and uses about 3 bytes per point. Picturing an array of that size being processed in Python, I could see it would not be easy to write this part in pure Python. I installed numpy just in case.

I expected problem one to be relatively simple, but I'll explain shortly what complicated it a bit. Since I only want to display an image, Python's tkinter would be enough. I started by creating a simple application that shows a Canvas and adds an image to it. However, because of problem two, the window is completely blocked while the image is being generated: you can't move or close it.

The code mixes Portuguese and English, but it is basically a tkinter application whose main window has a Label to display a message (here, the current frame number) and an image.

import tkinter as tk
from queue import Queue

# Image dimensions; the timings in this post are for 1024 x 1024 images
LARGURA = 1024
ALTURA = 1024


class App(tk.Tk):
    def __init__(self, desenhador, func, preFunc, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.setup_windows()
        self.queue = Queue()
        self.queueStop = Queue()
        self.setup_thread(desenhador, func, preFunc)
        self.buffer = None
        self.running = True
        self.dead = False
        self.after(1, self.check_queue)

    def setup_windows(self):
        self.title('Gerador de Imagens')
        self.status = tk.StringVar(self, value='Aguardando')
        tk.Label(self, textvariable=self.status).pack()
        self.canvas = tk.Canvas(self, width=LARGURA, height=ALTURA)
        self.image = self.canvas.create_image(0, 0, anchor=tk.NW)
        self.canvas.pack()
        self.protocol("WM_DELETE_WINDOW", self.terminate)

The setup_windows method configures the window, adding the Label and creating the Canvas and the image. Since we are going to swap the image frequently, it also keeps a reference to the canvas image item in self.image. The method also configures the window to call self.terminate if the user closes it.

setup_thread, in turn, initializes the thread; the class that manages the thread is passed to __init__ as desenhador. To make communication with the thread easier, two queues are created: one to receive messages coming from the thread (self.queue) and another, self.queueStop, to wait for the thread to finish (more details later). func and preFunc are two functions used to make testing easier, so that the functions that draw the image can be passed as parameters. preFunc generates the first image and func is called inside a loop to generate the following images (frames). The desenhador thread is started immediately after it is created.

    def setup_thread(self, desenhador, func, preFunc):
        self.desenhador = desenhador(self.queue, self.queueStop, func, preFunc)
        self.desenhador.start()

Once the window and the thread that updates the images have been created, we need a method that periodically checks whether there are new images in the queue. This method is check_queue, scheduled in __init__ with self.after(1, self.check_queue). Using self.after is crucial, because check_queue then starts running outside __init__, after the window has been created and the event loop is running.

check_queue checks whether the queue of images generated by the thread is empty. If it is, it does nothing; otherwise, it takes the new image and swaps the Canvas image. At the end, it schedules itself to run again 10 ms later, repeating the process so that images are swapped as soon as possible.

    def check_queue(self):
        if not self.queue.empty():
            contador, self.buffer = self.queue.get()
            self.status.set(f"Frame: {contador}")
            self.canvas.itemconfig(self.image, image=self.buffer)
            self.queue.task_done()
        if self.running:
            self.after(10, self.check_queue)
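
The polling pattern used by check_queue can be tried without a GUI. The sketch below is a simplified stand-in, not the application code: `producer` and `poll` are hypothetical names, strings replace the images, and `get_nowait` is used in place of the `empty()` check followed by `get()`.

```python
import queue
import threading


def producer(out, n_frames):
    """Stand-in for the drawing thread: puts (counter, frame) pairs in the queue."""
    for counter in range(n_frames):
        out.put((counter, f"frame-{counter}"))


def poll(q):
    """One non-blocking check, like a single check_queue call; None if nothing is ready."""
    try:
        item = q.get_nowait()
    except queue.Empty:
        return None
    q.task_done()
    return item


frames = queue.Queue()
worker = threading.Thread(target=producer, args=(frames, 3))
worker.start()
worker.join()  # in the GUI the thread keeps running; joined here to keep the demo deterministic

shown = []
while (item := poll(frames)) is not None:
    shown.append(item)
print(shown)
```

In the application, each poll is scheduled with self.after instead of running in a while loop, so the event loop stays responsive between checks.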

When working with multiple threads in tkinter, as in most GUI frameworks, we can normally only change objects managed by the framework from the same thread that runs the mainloop. That is why the image is swapped in check_queue. This also creates other coordination problems between the threads and the GUI. For example, the image conversion performed in the drawing thread (details later) needs tkinter to be running and processing events, even though the image is an off-screen object not attached to any widget. This is a characteristic of tkinter. That is why terminate calls check_thread_dead to stop the thread while the tkinter main loop is still running. Note that the drawing thread is stopped with self.desenhador.stop(). Then check_thread_dead is called to verify that it has really stopped, and this is where the other queue, queueStop, comes in. This queue stays empty while the program runs and only receives something when the drawing loop finishes its work. Only then is the tkinter loop shut down with the call to self.destroy().

    def check_thread_dead(self):
        if self.queueStop.empty() and not self.dead:
            self.after(1, self.check_thread_dead)
            return
        self.queueStop.get()
        self.dead = True
        self.desenhador.join()
        self.destroy()

    def terminate(self, e=None):
        self.running = False
        if not self.dead:
            self.desenhador.stop()
            self.check_thread_dead()
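
The shutdown handshake can also be exercised on its own. This is a minimal sketch with a hypothetical `Worker` class: a short sleep stands in for drawing a frame, and a blocking `get` replaces the periodic check_thread_dead polling, since there is no GUI loop to keep alive here.

```python
import queue
import threading
import time


class Worker(threading.Thread):
    def __init__(self, done):
        super().__init__()
        self.done = done
        self.running = True
        self.iterations = 0

    def run(self):
        try:
            while self.running:
                self.iterations += 1
                time.sleep(0.001)  # stand-in for drawing one frame
        finally:
            self.done.put("FEITO")  # posted even if the loop raises

    def stop(self):
        self.running = False


done = queue.Queue()
worker = Worker(done)
worker.start()
time.sleep(0.02)
worker.stop()                    # ask the loop to finish its current iteration
signal = done.get(timeout=5)     # wait for the "finished" message
worker.join()
print(signal, worker.is_alive())
```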

All of this just to have the window updated by another thread. We haven't drawn anything yet. Let's look at an implementation of the desenhador:

from threading import Thread

import numpy
from PIL import Image, ImageTk


class Desenha(Thread):
    def __init__(self, queue, queueStop, func, preFunc, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.queue = queue
        self.queueStop = queueStop
        self.running = True
        self.func = func
        self.preFunc = preFunc or func

    def run(self):
        try:
            data = numpy.zeros((ALTURA, LARGURA, 3), dtype=numpy.uint8)
            c = 0
            self.preFunc(data, c, LARGURA, ALTURA)
            while self.running:
                with TimeIt("Loop") as t:
                    # with TimeIt("ForLoop") as t:
                    self.func(data, c, LARGURA, ALTURA)
                    # with TimeIt("FROM ARRAY") as t1:
                    image = Image.fromarray(data)
                    # with TimeIt("Convert") as t2:
                    converted_image = ImageTk.PhotoImage(image)
                    # with TimeIt("Queue") as t3:
                    self.queue.put((c, converted_image))
                    c += 1
        finally:
            # Always signal the GUI thread that the loop has ended
            self.running = False
            self.queueStop.put((0, 'FEITO'))

    def stop(self):
        self.running = False

The Desenha class receives the queue to which it sends the images created inside run, and the queue to which it posts the message signalling that it has finished. The work itself is done in run, which is executed when the thread starts.

Because the arrays are large, easily more than 1 million elements for images of 1024 x 1024 points, Desenha uses NumPy's optimized arrays. Working with Python lists for these operations would simply be much slower, since we have to fill every point of every image.

If you don't know NumPy, it is a library widely used in data science and in many other fields that need matrix operations and numerical computation in Python. You can read the documentation on the NumPy site. NumPy's big advantage is being optimized in C, besides being part of the Scipy.org ecosystem.

Back to run: it basically creates an array large enough to represent the points of the new image. Its shape is ALTURA x LARGURA x 3, with the last axis holding one byte for each RGB color component (red, green and blue). So, for an image of 1024 x 1024 points, we need 1024 x 1024 x 3 = 3,145,728 bytes just to store the array of points.
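
The arithmetic above is easy to check. The helper below is just an illustration (`frame_bytes` is a made-up name) and also shows how much data per second a 30 frames per second target implies.

```python
def frame_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size in bytes of one uncompressed frame."""
    return width * height * channels * bytes_per_channel


size = frame_bytes(1024, 1024)
print(size)             # 3145728
print(size / 2**20)     # 3.0 (MiB per frame)
print(size * 30)        # 94371840 bytes per second at 30 frames per second
```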

Once the array is created, run calls the drawing function self.preFunc, which draws the first image, passing the array, a frame counter and the image dimensions. This signature evolved as I needed to run tests. Then, inside the main loop, it calls self.func with the same parameters to create the following images. This split between preFunc and func was needed to visualize the data better, since the drawing algorithm I started testing with did not give quick visual feedback. So I used preFunc to draw an image and func to modify it, for example by moving its rows up.

Since we need to swap images as fast as possible (30 frames per second = ~33 ms per image), run must execute its loop as fast as possible.

The next step, which does not depend on how the image was created, is to turn the array of points into an image. This is done by image = Image.fromarray(data). From this point on we have an image, but in the format of Pillow (PIL), the imaging library used here. To convert the image for tkinter, also with Pillow, we call converted_image = ImageTk.PhotoImage(image). converted_image is then ready to go into the queue and be drawn on screen, and we can move on to drawing the next image.

Curiously, it was precisely ImageTk.PhotoImage that complicated thread shutdown: this class needs the tkinter event loop running in order to work, which is why the shutdown had to be coordinated through queueStop.

The loop in run keeps going until self.running is False, and setting it to False is exactly what the stop method does.

Since the drawing thread is independent of the program's main thread, where tkinter runs, stop can arrive at any point in the loop. That is why we cannot shut tkinter down until the current iteration finishes and execution reaches the while that checks self.running again.

On leaving the loop, a message is posted to queueStop. This message signals the main loop that it can continue shutting down and then close the window.

You may have noticed several commented-out uses of TimeIt. This class was created only to measure the execution time of some functions, because I noticed things were very slow.

from datetime import datetime


class TimeIt:
    """Measures the execution time of a block of code.
       Meant to be used as a context manager, in with blocks"""
    def __init__(self, name, silent=False):
        self.name = name
        self.start = 0
        self.end = 0
        self.silent = silent

    def __enter__(self):
        self.start = datetime.now()
        return self  # so that "with TimeIt(...) as t" binds the timer

    def __exit__(self, *args, **kwargs):
        self.end = datetime.now()
        if not self.silent:
            segundos = self.elapsed().total_seconds()
            if segundos == 0:
                return
            fps = 1.0 / segundos
            print(f"Elapsed {self.name}: {self.elapsed()} Frames: {fps}")

    def elapsed(self):
        return self.end - self.start

Before optimizing everything, I had to find the source of the slowness. In the loop, it was always the call to self.func that dominated the execution time. You can uncomment the TimeIt lines and indent the line following each one to see the results on screen. Image.fromarray and ImageTk.PhotoImage run very fast, around 1 ms on my computer. The drawing function, on the other hand, took up to 3 s (3000 ms) at first. Remember that we need to draw in at most 33 ms to get 30 frames per second.

Let's look at a simple drawing function:

def draw(data, c, largura, altura):
    for y in numpy.arange(0, altura):
        for x in numpy.arange(0, largura):
            data[y, x] = [0, 0, y // (c + 1)]

This function simply draws a series of stripes on the screen, setting the blue component of each point to the current row divided by the frame counter plus one (c + 1). The idea was just to go through the points of the image and see the result on screen as fast as possible.

This function has terrible performance:

Elapsed Loop: 0:00:01.427400 Frames: 0.7005744710662744
Elapsed Loop: 0:00:01.316119 Frames: 0.7598097132554122
Elapsed Loop: 0:00:01.308270 Frames: 0.764368211454822
Elapsed Loop: 0:00:01.341486 Frames: 0.7454419949220491
Elapsed Loop: 0:00:01.359058 Frames: 0.7358037699641957

Less than 1 frame per second, since we spend more than 1 s generating a single image.

Over time, the image gets darker and darker, because the values of c increase with each frame. But at this speed it is so slow that you barely notice any change on the screen.
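
As an aside, this particular stripe pattern can also be produced without per-pixel Python loops by using NumPy broadcasting. This is a different technique from the ones explored in this post, shown only as a sketch; it assumes values wrap modulo 256 when stored as bytes, which the loop version below makes explicit.

```python
import numpy


def draw_loops(data, c, largura, altura):
    """Per-pixel loops, as in draw (uint8 wraparound written out explicitly)."""
    for y in range(altura):
        for x in range(largura):
            data[y, x] = [0, 0, (y // (c + 1)) % 256]


def draw_vectorized(data, c, largura, altura):
    """Compute one blue value per row, then broadcast it across the columns."""
    blue = (numpy.arange(altura) // (c + 1)) % 256
    data[:altura, :largura, 0] = 0
    data[:altura, :largura, 1] = 0
    data[:altura, :largura, 2] = blue.astype(numpy.uint8)[:, None]


a = numpy.zeros((300, 4, 3), dtype=numpy.uint8)
b = numpy.zeros((300, 4, 3), dtype=numpy.uint8)
draw_loops(a, 2, 4, 300)
draw_vectorized(b, 2, 4, 300)
print(numpy.array_equal(a, b))
```

This only works because each pixel here depends on its row alone; effects where a pixel depends on previously updated neighbours still need loops or a compiled approach.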

Even with NumPy, the execution time of the drawing loop was very high. So I decided to use another library, called Numba. Numba is a JIT (just-in-time) compiler for Python. With it, you annotate your functions and they are compiled on the first call. On later calls, the compiled version runs in place of the original function, optimized and with performance close to that of native languages (as long as interaction with the interpreter is limited). Let's see what we need to change to use Numba:

from numba import jit


@jit(nopython=True, parallel=True, fastmath=True, nogil=True)
def drawNumba(data, c, largura, altura):
    for y in numpy.arange(0, altura):
        for x in numpy.arange(0, largura):
            data[y, x] = [0, 0, y // (c + 1)]

The code is the same; we simply added Numba's @jit decorator to mark that we want this function optimized. Nothing else in the code changed, apart from the Numba import itself. For the rest of the program, the function behaves the same as before.

Let's look at the result with Numba:

Elapsed PreLoop: 0:00:01.276058 Frames: 0.7836634384957424
Elapsed Loop: 0:00:00.210905 Frames: 4.74147127853773
Elapsed Loop: 0:00:00.205027 Frames: 4.877406390377853
Elapsed Loop: 0:00:00.219663 Frames: 4.552428037493797
Elapsed Loop: 0:00:00.220664 Frames: 4.5317768190552155
Elapsed Loop: 0:00:00.209917 Frames: 4.763787592238837
Elapsed Loop: 0:00:00.222617 Frames: 4.492019926600395

I added a TimeIt context to measure the execution time of the first call to the function, in this case preFunc. Note that this call ran practically as slowly as the non-accelerated version. From the second call on, however, the execution time dropped from 1.27 s to 0.21 s, raising our frame rate to 4.7 frames per second (the number of frames actually shown may be lower or slightly different because of the communication with tkinter). The version accelerated with Numba runs in only about 16% of the time, that is, roughly 6 times faster. All of that with a pip install and two lines of code. But 4 frames per second is still very slow and far from the 30 we want. And remember that so far I haven't even started on the flame effect.

Another alternative is to use a compiled module created with Cython. Cython (not to be confused with CPython) is a compiler that translates a Python-like program into a C module that Python can call.

To use Cython, we need to make somewhat bigger changes. First, install Cython and make sure a C/C++ compiler is installed on the machine. On Windows with Python 3.8, I used Visual Studio 2019 without problems.

A Cython program is written in a file with the .pyx extension. Converting the drawing function to Cython, we get:

import numpy as np
cimport numpy as np
cimport cython
from libc.math cimport abs
from libc.stdlib cimport rand


ctypedef np.uint8_t DTYPE_t
ctypedef np.uint32_t DTYPE32_t


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True)
def draw2(np.ndarray[DTYPE_t, ndim=3] data, int c, int max_x, int max_y):
    cdef int x, y
    cdef int ic = c
    cdef np.ndarray[DTYPE_t, ndim=3] h = data
    cdef int cmax_y = max_y, cmax_x = max_x
    for y in range(cmax_y):
        for x in range(cmax_x):
            h[y, x, 0] = 0
            h[y, x, 1] = 0
            h[y, x, 2] = y / (ic + 1)

Very similar to both Python and C.

Cython also requires a setup.py to compile the module.

from setuptools import setup
from Cython.Build import cythonize
import numpy

setup(
    name='Gerador de Telas',
    ext_modules=cythonize("compute.pyx", annotate=True, language_level=3),
    include_dirs=[numpy.get_include()],
    zip_safe=False,
)

And it needs to be compiled with:

python setup.py build_ext --inplace

But the results are very good:

Elapsed Loop: 0:00:00.023445 Frames: 42.65301770100235
Elapsed Loop: 0:00:00.022442 Frames: 44.55930843953302
Elapsed Loop: 0:00:00.023414 Frames: 42.70949004868882
Elapsed Loop: 0:00:00.024410 Frames: 40.96681687832855
Elapsed Loop: 0:00:00.023431 Frames: 42.67850283812044
Elapsed Loop: 0:00:00.022455 Frames: 44.533511467379206
Elapsed Loop: 0:00:00.023436 Frames: 42.66939750810719

Now we have gone from 4 to 40 frames per second, generating a new image in just 23 ms!
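
The three sets of timings above can be compared against the 30 frames per second budget with a little arithmetic. The sketch uses the first "Elapsed Loop" value from each log; the labels in the dictionary are just names chosen for this example.

```python
TARGET_FPS = 30
budget = 1.0 / TARGET_FPS  # about 0.033 s per frame

# First "Elapsed Loop" value from each log above, in seconds.
measured = {
    "numpy loops": 1.427400,
    "numba": 0.210905,
    "cython": 0.023445,
}

for name, seconds in measured.items():
    fps = 1.0 / seconds
    print(f"{name}: {fps:.1f} fps, within budget: {seconds <= budget}")

within_budget = [name for name, seconds in measured.items() if seconds <= budget]
```

Only the Cython timing fits inside the per-frame budget.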

In fact, it became so fast that the image turns black very quickly. To make things easier to see, another function, called drawUp, was created. Instead of darkening the image, I decided to copy rows so that they scroll on screen; this way the program can run for longer without the screen going black.

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True)
def drawUp(np.ndarray[DTYPE_t, ndim=3] data, int c, int max_x, int max_y):
    cdef int x, y
    cdef int ic = c
    cdef np.ndarray[DTYPE_t, ndim=3] h = data
    cdef int cmax_y = max_y, cmax_x = max_x
    # Copy top to bottom
    for x in range(0, cmax_x):
        h[cmax_y - 2, x, 0] = h[0, x, 0]
        h[cmax_y - 2, x, 1] = h[0, x, 1]
        h[cmax_y - 2, x, 2] = h[0, x, 2]
    for y in range(1, cmax_y - 1):
        for x in range(0, cmax_x):
            h[y - 1, x, 0] = h[y, x, 0]
            h[y - 1, x, 1] = h[y, x, 1]            
            h[y - 1, x, 2] = h[y, x, 2]

It was this change that led to the separation between preFunc and func. preFunc, implemented by draw2, creates an image like the one generated in pure Python. drawUp then simply scrolls the rows of the image, copying the top row to the bottom and moving the other rows up.

At this point, both the performance problem and the window shutdown problem are solved. All that is left is to convert the algorithm that generates the flames.

The first step is to generate a suitable color palette, since the algorithm uses 256 colors to represent the intensity of the fire.

Converted to Python, we get something like:

from PIL import ImageColor


def build_fire_palette():
    palette = numpy.zeros((256, 3), dtype=numpy.uint8)
    for x in range(256):
        h = x // 3
        saturation = 100
        b = min(256, x * 2) / 256.0 * 100.0
        css = f"hsl({h},{saturation}%,{b}%)"
        palette[x] = ImageColor.getrgb(css)
    return palette

The palette is simply a table of colors used to turn a value between 0 and 255 (one byte) into an RGB color (red, green and blue, 3 bytes).
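
To see the lookup in isolation, here is a sketch that builds a similar table with the standard library's colorsys module instead of Pillow and then maps a few intensity values to colors. It follows the same hue and lightness ramp as build_fire_palette, but rounding may differ slightly from Pillow's, so treat the exact values as approximate.

```python
import colorsys


def build_palette():
    """256 RGB tuples following the same hue/lightness ramp as build_fire_palette."""
    palette = []
    for x in range(256):
        hue = (x // 3) / 360.0               # degrees -> 0..1
        lightness = min(256, x * 2) / 256.0  # reaches 1.0 from x = 128 on
        r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
        palette.append((round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = build_palette()
indices = [0, 64, 255]                    # intensity values, one byte each
pixels = [palette[i] for i in indices]    # the palette lookup itself
print(pixels)
```

Low intensities map to black, mid values to reds and oranges, and high values to white.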

A problem shows up with the drawing thread, since the Desenha class does not support color palettes. We need another one:

class DesenhaComPalette(Desenha):
    def run(self):
        try:
            palette = build_fire_palette()
            data = numpy.zeros((ALTURA, LARGURA), dtype=numpy.uint8)
            fogo = numpy.zeros((ALTURA, LARGURA), dtype=numpy.uint32)
            c = 0
            while self.running:
                with TimeIt("Loop") as t:
                    # with TimeIt("ForLoop") as t:
                    self.func(data, c, LARGURA, ALTURA, fogo)
                    # with TimeIt("FROM ARRAY") as t1:
                    image = Image.fromarray(data, mode="P")
                    image.putpalette(palette)
                    # with TimeIt("Convert") as t2:
                    converted_image = ImageTk.PhotoImage(image)
                    # with TimeIt("Queue") as t3:
                    self.queue.put((c, converted_image))
                    c += 1
        finally:
            self.running = False
            self.queueStop.put((0, 'FEITO'))

The difference is that we create the image differently, since we have to pass the points (with colors 0 to 255) and the palette (which translates each color). We also create fogo, but as a matrix of 32-bit integers rather than bytes. This changes the size of the matrix in memory, but it is required by the flames algorithm, which keeps the fire state between one frame and the next. In data, we store the points in 256 colors.

The algorithm converted to Python looks like this:

def desenhaPythonFlamas(data, c, largura, altura, fogo):
    # Seed the bottom row with random heat values.
    for x in range(largura):
        fogo[altura - 1, x] = int(min(random.random() * 2048, 2048))

    # Each cell becomes a damped average of the cells below it.
    for y in range(1, altura - 2):
        for x in range(0, largura):
            v = int((fogo[(y + 1) % altura, x] +
                     fogo[(y + 1) % altura, (x - 1) % largura] +
                     fogo[(y + 1) % altura, (x + 1) % largura] +
                     fogo[(y + 2) % altura, x]) * 32) / 129
            fogo[y, x] = v
    # Map the heat values to palette indices (0 to 255).
    for y in range(altura):
        for x in range(largura):
            data[y, x] = fogo[y, x] % 256

Which is ultra slow, as expected:

Elapsed Loop: 0:00:06.345203 Frames: 0.15759937073723254
Elapsed Loop: 0:00:06.327644 Frames: 0.15803670370836284
Elapsed Loop: 0:00:06.362772 Frames: 0.15716420453223848
Elapsed Loop: 0:00:06.387171 Frames: 0.15656383710409505
Elapsed Loop: 0:00:06.590262 Frames: 0.15173903556489862

A whopping 6 s to generate a single frame of flames! Let's move on to the version optimized with Numba, simply adding the decorator as we did before.

Elapsed Loop: 0:00:00.022445 Frames: 44.55335263978615
Elapsed Loop: 0:00:00.024425 Frames: 40.941658137154555
Elapsed Loop: 0:00:00.024421 Frames: 40.94836411285369
Elapsed Loop: 0:00:00.024396 Frames: 40.99032628299721

Much better! We get more than 30 frames per second, as expected, back in the range of 23 ms per frame. Remember that all these timings are for images of 1024 x 1024 points. If you have a slower computer, you can reduce the size of the window.
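The Frames column in these logs appears to be the inverse of the elapsed time. That is an inference from the logged values, not something the code shown here confirms. A quick check with the first Numba line and the first pure Python line:

```python
from datetime import timedelta

# 0:00:00.022445, first line of the Numba log
elapsed = timedelta(microseconds=22445)
fps = 1.0 / elapsed.total_seconds()
print(f"{elapsed.total_seconds() * 1000:.1f} ms per frame, {fps:.2f} frames per second")

# 0:00:06.345203, first line of the pure Python log
lento = timedelta(seconds=6, microseconds=345203)
speedup = lento / elapsed  # dividing two timedeltas gives a float
print(f"speedup: {speedup:.0f}x")
```

The speedup is only as reliable as these two samples, but it gives the order of magnitude gained by compiling the loops.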

And what would it look like in Cython? Let's find out:

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True)
def desenhaflamas(np.ndarray[DTYPE_t, ndim=2] data, 
                  int c, int max_x, int max_y, 
                  np.ndarray[DTYPE32_t, ndim=2] fogo):
    cdef int x, y
    cdef int ic = c
    cdef np.ndarray[DTYPE_t, ndim=2] d = data
    cdef np.ndarray[DTYPE32_t, ndim=2] f = fogo
    cdef int cmax_y = max_y, cmax_x = max_x

    for x in range(cmax_x):
        f[cmax_y - 1, x] = abs(32768 + rand()) % 2048

    for y in range(1, cmax_y - 2):
        for x in range(0, cmax_x):
            f[y, x] = ((f[(y + 1) % cmax_y, x] +
                        f[(y + 1) % cmax_y, (x - 1) % cmax_x] +
                        f[(y + 1)% cmax_y, (x + 1) % cmax_x] +
                        f[(y + 2)% cmax_y, x]) * 32) / 129
    for y in range(max_y):
        for x in range(max_x):
            d[y, x] = f[y, x] % 256

Which performs as follows:

Elapsed Loop: 0:00:00.026373 Frames: 37.917567208887874
Elapsed Loop: 0:00:00.026379 Frames: 37.90894271958755
Elapsed Loop: 0:00:00.028309 Frames: 35.324455120279765
Elapsed Loop: 0:00:00.030253 Frames: 33.05457310018841
Elapsed Loop: 0:00:00.028315 Frames: 35.316969803990816
Elapsed Loop: 0:00:00.026374 Frames: 37.91612952149844
Elapsed Loop: 0:00:00.026347 Frames: 37.95498538733063
Elapsed Loop: 0:00:00.026374 Frames: 37.91612952149844

It came out slightly worse than with Numba. I believe this is due to some detail in the Cython code. But it is already enough to see the effect running.

How do we run all of this in a single program? We need a configuration section:

if len(sys.argv) < 5:
    print("Uso: python desenha.py <algoritmo> <acelerador> <largura> <altura>")
    print("Algoritmo: desenho, flamas")
    print("Acelerador: cython, python, numba")
    sys.exit(1)  # stop here: reading sys.argv[1..4] below would raise IndexError

ALGORITMO = sys.argv[1].lower()
ACELERADOR = sys.argv[2].lower()
LARGURA = int(sys.argv[3])
ALTURA = int(sys.argv[4])

print(f"ALGORITMO: {ALGORITMO}")
print(f"ACELERADOR: {ACELERADOR}")
print(f"LARGURA: {LARGURA} ALTURA: {ALTURA}")

CONFIGURACAO = {
    "flamas": {"desenhador": DesenhaComPalette,
               "otimizacao": {"python": (desenhaPythonFlamas.py_func, None),
                              "cython": (desenhaflamas, None),
                              "numba": (desenhaPythonFlamas, None)
                              }},
    "desenho": {"desenhador": Desenha,
                "otimizacao": {"python": (drawNumba.py_func, None),
                               "cython": (drawUp, draw2),
                               "numba": (drawNumba, None)
                               }}
}

if ALGORITMO not in CONFIGURACAO:
    print(f"Algoritmo {ALGORITMO} inválido", file=sys.stderr)
    sys.exit(1)

if ACELERADOR not in CONFIGURACAO[ALGORITMO]["otimizacao"]:
    print(f"Acelerador {ACELERADOR} inválido", file=sys.stderr)
    sys.exit(2)

if ALTURA < MIN_V or LARGURA < MIN_V or ALTURA > MAX_V or LARGURA > MAX_V:
    print(f"Altura e largura devem ser valores entre {MIN_V} e {MAX_V}.")
    sys.exit(3)

desenhador = CONFIGURACAO[ALGORITMO]["desenhador"]
func = CONFIGURACAO[ALGORITMO]["otimizacao"][ACELERADOR][0]
prefunc = CONFIGURACAO[ALGORITMO]["otimizacao"][ACELERADOR][1]

app = App(desenhador=desenhador, func=func, preFunc=prefunc)
app.mainloop()

Phew, finally running! I hope you enjoyed the article and that it made you curious about performance in Python, Cython and Numba. The complete code is published on GitHub: https://github.com/lskbr/flamas_em_python

What if we used this code to simulate the Game of Life? That is for another article.

Fetching macroeconomic data with Quandl in Python

2020-05-20T00:00:00+00:00

How to get historical data on macroeconomic indicators with the Quandl library using Python

Putting it all together

2020-05-11T15:00:00+00:00

sourmash 3.3 was released last week, and it is the first version supporting zipped databases. Here is my personal account of how that came to be =]

What is a sourmash database?

A sourmash database contains signatures (typically Scaled MinHash sketches built from genomic datasets) and an index for allowing efficient similarity and containment queries over these signatures. The two types of index are SBT, a hierarchical index that uses less memory by keeping data on disk, and LCA, an inverted index that uses more memory but is potentially faster. Indices are described as JSON files, with LCA storing all the data in one JSON file and SBT opting for saving a description of the index structure in JSON, and all the data into a hidden directory with many files.

We distribute some prepared databases (with SBT indices) for Genbank and RefSeq as compressed TAR files. The compressed file is ~8GB, but after decompressing it turns into almost 200k files in a hidden directory, using about 40 GB of disk space.

Can we avoid generating so many hidden files?

The initial issue in this saga is dib-lab/sourmash#490, and the idea was to take the existing support for multiple data storages (hidden dir, TAR files, IPFS and Redis) and save the index description in the storage, allowing loading everything from the storage. Since we already had the databases as TAR files, the first test tried to use them, but it didn't take long to see it was a doomed approach: TAR files are terrible for random access (or at least the tarfile module in Python is).

Zip files showed up as a better alternative, and it helps that Python has the zipfile module already available in the standard library. Initial tests were promising, and led to dib-lab/sourmash#648. The main issue was performance: compressing and decompressing was slow, but there was also another limitation...
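A Zip archive keeps a central directory, so a reader can jump straight to one member without scanning the others. The sketch below shows that access pattern with the standard zipfile module. The member names (internal.0, internal.1, ...) are made up for illustration and are not the names sourmash uses.

```python
import io
import zipfile

# Build a small in-memory archive with many members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(1000):
        zf.writestr(f"internal.{i}", f"node {i}".encode())

# Read a single member by name; zipfile looks it up in the central
# directory instead of walking through the whole archive.
with zipfile.ZipFile(buf) as zf:
    content = zf.read("internal.742")

print(content)
```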

Loading Nodegraphs from a memory buffer

Another challenge was efficiently loading the data from a storage. The two core methods in a storage are save(location, content), where content is a bytes buffer, and load(location), which returns a bytes buffer that was previously saved. This didn't interact well with the khmer Nodegraphs (the Bloom Filter we use for SBTs), since khmer only loads data from files, not from memory buffers. We ended up doing a temporary file dance, which made things slower for the default storage (hidden dir), where it could have been optimized to work directly with files, and involved interacting with the filesystem for the other storages (IPFS and Redis could be pulling data directly from the network, for example).
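The storage contract and the temporary file dance can be sketched in a few lines. This is an illustration only: MemoryStorage and load_with_file_only_api are hypothetical names, and the loader argument stands in for any API that, like the khmer loader described above, accepts only a file path.

```python
import os
import tempfile


class MemoryStorage:
    """Hypothetical storage following the save/load contract described above."""

    def __init__(self):
        self._data = {}

    def save(self, location, content):
        self._data[location] = bytes(content)

    def load(self, location):
        return self._data[location]


def load_with_file_only_api(storage, location, loader):
    """Write the buffer to a temporary file so a path-only loader can read it."""
    content = storage.load(location)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(content)
        path = tmp.name
    try:
        return loader(path)
    finally:
        os.remove(path)  # the extra filesystem round trip is the cost


storage = MemoryStorage()
storage.save("internal.0", b"\x00" * 128)
size = load_with_file_only_api(storage, "internal.0", os.path.getsize)
print(size)
```

Every load pays for a write and a read on disk, even when the data was already in memory, which is the overhead the paragraph above complains about.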

This one could be fixed in khmer by exposing C++ stream methods, and I did a small PoC to test the idea. While doable, this was happening while the sourmash conversion to Rust was underway, and depending on khmer was a problem for my Webassembly aspirations... so having the Nodegraph implemented in Rust seemed like a better direction. That implementation has actually been quietly living in the sourmash codebase for quite some time, but it was never exposed to the Python side (and it was also lacking more extensive tests).

After the release of sourmash 3 and the replacement of the C++ implementation with the Rust one, all the pieces for exposing the Nodegraph were in place, so dib-lab/sourmash#799 was the next step. It wasn't a priority at first because other optimizations (that were released in 3.1 and 3.2) were more important, but then it was time to check how this would perform. And...

Your Rust code is not so fast, huh?

Turns out that my Nodegraph loading code was way slower than khmer. The Nodegraph binary format is well documented, and doing an initial implementation wasn't so hard by using the byteorder crate to read binary data with the right endianness, and then setting the appropriate bits in the internal fixedbitset in memory. But the khmer code doesn't parse bit by bit: it reads a long char buffer directly, and that is many orders of magnitude faster than setting bit by bit.
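The difference between the two loading strategies can be illustrated in Python, with an int standing in for the bitset. This is not the khmer format nor the fixedbitset API; the least-significant-bit-first layout is an assumption chosen to keep the sketch short.

```python
def load_bit_by_bit(buf):
    """Set each bit individually: one operation per bit in the buffer."""
    bits = 0
    for i in range(len(buf) * 8):
        if (buf[i // 8] >> (i % 8)) & 1:
            bits |= 1 << i
    return bits


def load_bulk(buf):
    """Reinterpret the whole buffer at once: one operation per buffer."""
    return int.from_bytes(buf, "little")


buf = bytes([0b00000101, 0b10000000])
slow = load_bit_by_bit(buf)
fast = load_bulk(buf)
print(slow == fast)
```

Both functions produce the same bitset, but the first one does work proportional to the number of bits while the second hands the whole buffer over in one step.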

And there was no way to replicate this behavior directly with fixedbitset. At this point I could either do the bit-indexing into a large buffer myself and lose all the useful methods that fixedbitset provides, or try to find a way to support loading the data directly into fixedbitset and open a PR.

I chose the PR (and even got #42! =]).

It was more straightforward than I expected, but it did expose the internal representation of fixedbitset, so I was a bit nervous it wasn't going to be merged. But bluss was super nice, and his suggestions made the PR way better! This simplified the final Nodegraph code, and it was actually more correct (because I was messing up a few corner cases when doing the bit-by-bit parsing before). Win-win!

Nodegraphs are kind of large, can we compress them?

Being able to save and load Nodegraphs in Rust allowed using memory buffers, but also opened the way to support other operations not supported in khmer Nodegraphs. One example is loading/saving compressed files, which is supported for Countgraph (another khmer data structure, based on Count-Min Sketch) but not in Nodegraph.

If only there was an easy way to support working with compressed files...

Oh wait, there is! niffler is a crate that I made with Pierre Marijon based on some functionality I saw in one of his projects, and we iterated a bit on the API and documented everything to make it more useful for a larger audience. niffler tries to be as transparent as possible, with very little boilerplate when using it but with useful features nonetheless (like auto detection of the compression format). If you want to know more about the motivation and how it happened, check this Twitter thread.

The cool thing is that adding compressed files support in sourmash was mostly one-line changes for loading (and a bit more for saving, but mostly because converting compression levels could use some refactoring).
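The idea behind auto detection can be sketched in Python for the gzip case. This is not niffler's API (niffler is a Rust crate and handles more formats); open_maybe_compressed is a hypothetical helper that checks the two gzip magic bytes and returns a readable stream either way.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_compressed(raw):
    """Return a readable binary stream, decompressing gzip input on the fly."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=io.BytesIO(raw))
    return io.BytesIO(raw)


plain = b"nodegraph bytes"
packed = gzip.compress(plain)

from_plain = open_maybe_compressed(plain).read()
from_packed = open_maybe_compressed(packed).read()
print(from_plain == from_packed == plain)
```

The caller reads from the returned stream without caring whether the input was compressed, which is why the loading side needs so few changes.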

Putting it all together: zipped SBT indices

With all these other pieces in place, it's time to go back to dib-lab/sourmash#648. Compressing and decompressing with the Python zipfile module is slow, but Zip files can also be used just for storage, handing back the data without extracting it. And since we have compression/decompression implemented in Rust with niffler, that's what the zipped sourmash databases are: data is loaded from and saved into the Zip file without using the Python module's compression/decompression, and all the work is done before (or after) on the Rust side.

This allows keeping the Zip file with similar sizes to the original TAR files we started with, but with very low overhead for decompression. For compression we opted for using Gzip level 1, which doesn't compress perfectly but also doesn't take much longer to run:

Level	Size	Time
0	407 MB	16s
1	252 MB	21s
5	250 MB	39s
9	246 MB	1m48s

In this table, 0 is without compression, while 9 is the best compression. The size difference between levels 1 and 9 is only 6 MB (~2%), but level 1 runs 5x faster than level 9, and it's only about 30% slower than saving the uncompressed data.
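The "Zip as storage only" arrangement can be reproduced with the standard library: compress the payload first, then store it in the archive with ZIP_STORED so that zipfile does no compression of its own. In sourmash the compression happens on the Rust side; here gzip.compress at level 1 stands in for it, and the member name node.0 is made up.

```python
import gzip
import io
import zipfile

payload = b"ACGT" * 10_000
compressed = gzip.compress(payload, compresslevel=1)  # done outside zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("node.0", compressed)  # stored as-is, no second compression

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("node.0")
    stored = zf.read("node.0")  # hands back the gzip bytes untouched

restored = gzip.decompress(stored)
print(len(payload), info.file_size, info.compress_size)
```

The archive entry has identical stored and original sizes, which confirms that zipfile only acted as a container.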

The last challenge was updating an existing Zip file. It's easy to support appending new data, but if any of the already existing data in the file changes (which happens when internal nodes change in the SBT, after a new dataset is inserted) then there is no easy way to replace the data in the Zip file. Worse, the Python zipfile module will add the new data while keeping the old one around, leading to ginormous files over time.¹ So, what to do?
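The duplication is easy to reproduce. In this small sketch (the member name is made up), writing the same name twice leaves two entries in the archive and raises the UserWarning mentioned in the footnote:

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("internal.0", b"old contents")
        zf.writestr("internal.0", b"new contents")  # does not replace the first

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    latest = zf.read("internal.0")

print(names)
print(latest)
print([str(w.message) for w in caught])
```

Reading by name returns the most recent entry, but the bytes of the older one stay in the file.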

I ended up opting for dealing with the complexity and complicating the ZipStorage implementation a bit, by keeping a buffer for new data. If it's a new file or it already exists but there are no insertions the buffer is ignored and all works as before.

If the file exists and new data is inserted, then it is first stored in the buffer (where it might also replace a previous entry with the same name). In this case we also need to check the buffer when trying to load some data (because it might exist only in the buffer, and not in the original file).

Finally, when the ZipStorage is closed it needs to verify if there are new items in the buffer. If not, it is safe just to close the original file. If there are new items but they were not present in the original file, then we can append the new data to the original file. The final case is if there are new items that were also in the original file, and in this case a new Zip file is created and all the content from buffer and original file are copied to it, prioritizing items from the buffer. The original file is replaced by the new Zip file.
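The three cases on close can be sketched as follows. This is not the sourmash ZipStorage code: BufferedZipStorage is a hypothetical class, and it simplifies the description above by always going through the buffer, even for brand new files.

```python
import os
import tempfile
import zipfile


class BufferedZipStorage:
    """Hypothetical sketch of a Zip storage with a buffer for new data."""

    def __init__(self, path):
        self.path = path
        self.buffer = {}  # pending entries: name -> bytes
        if os.path.exists(path):
            with zipfile.ZipFile(path) as zf:
                self.existing = set(zf.namelist())
        else:
            self.existing = set()

    def save(self, name, content):
        # A later save with the same name replaces the pending entry.
        self.buffer[name] = content

    def load(self, name):
        # Check the buffer first: it may hold data newer than the file.
        if name in self.buffer:
            return self.buffer[name]
        with zipfile.ZipFile(self.path) as zf:
            return zf.read(name)

    def close(self):
        if not self.buffer:
            return  # case 1: nothing new, leave the file alone
        if not (self.buffer.keys() & self.existing):
            # case 2: only new names, appending is safe
            with zipfile.ZipFile(self.path, "a") as zf:
                for name, content in self.buffer.items():
                    zf.writestr(name, content)
        else:
            # case 3: some names changed, rebuild into a new file
            tmp = self.path + ".tmp"
            with zipfile.ZipFile(self.path) as old, zipfile.ZipFile(tmp, "w") as new:
                for name in old.namelist():
                    if name not in self.buffer:
                        new.writestr(name, old.read(name))
                for name, content in self.buffer.items():
                    new.writestr(name, content)
            os.replace(tmp, self.path)
        self.existing.update(self.buffer)
        self.buffer.clear()


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "db.zip")
    s = BufferedZipStorage(path)
    s.save("a", b"1")
    s.save("b", b"2")
    s.close()                      # new file: entries are appended
    s = BufferedZipStorage(path)
    s.save("a", b"9")              # replaces an existing entry
    from_buffer = s.load("a")
    s.close()                      # triggers the rebuild
    with zipfile.ZipFile(path) as zf:
        final_names = sorted(zf.namelist())
        final_a = zf.read("a")

print(final_names, final_a)
```

After the rebuild the archive holds a single copy of each name, so the file does not grow with every update.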

Turns out this worked quite well! And so the PR was merged =]

The future

Zipped databases open the possibility of distributing extra data that might be useful for some kinds of analysis. One thing we are already considering is adding taxonomy information, let's see what else shows up.

Having Nodegraph in Rust is also pretty exciting, because now we can change the internal representation for something that uses less memory (maybe using RRR encoding?), but more importantly: now they can also be used with Webassembly, which opens many possibilities for running not only signature computation but also search and gather in the browser, since now we have all the pieces to build it.

Comments?

Thread on Twitter

Footnotes

¹ The zipfile module does throw a UserWarning pointing out that duplicated files were inserted, which is useful during development but generally doesn't show during regular usage...

LaKademy 2019

2020-05-05T18:29:16+00:00

Last November, Latin American KDE contributors landed in Salvador, Brazil to take part in another edition of LaKademy, the Latin American Akademy. It was the seventh edition of the event (or the eighth, if you count Akademy-BR as the first LaKademy) and the second with Salvador as the host city. No problem for me: in fact, I would love to move there and live at least a few years in Salvador, a city I like a lot.

Group photo of LaKademy 2019

My main tasks during the event were on 2 projects: Cantor and Sprat, a "draft editor for academic papers". Besides those, I also helped with promotion tasks such as the LaKademy website.

For Cantor I focused on organizational work. For example, I asked the sysadmins to migrate the repository to KDE's Gitlab, and I created a dedicated website for Cantor at cantor.kde.org using the new Jekyll template for KDE projects.

The new website is a good addition to Cantor because we want to communicate better and more directly with our user community. The site has its own blog and a changelog section to make it easier for the community to follow the news and the main changes in the software.

The migration to Gitlab lets us use Gitlab CI as an alternative for continuous integration in Cantor. I mentored Rafael Gomes's work (which has not been merged yet) to make this available to the project.

Besides the work on Cantor, I did some activities related to Sprat, a draft editor for scientific papers in English. This software uses katepart to implement the methodology for writing scientific papers in English known as PROMETHEUS, as described in this book, in an attempt to help students and researchers in general with the task of writing scientific papers. During LaKademy I finished the port to Qt5 and, hopefully, I will release the project this year.

On the more social side, I took part in the famous promo meeting, which discusses KDE's future actions for Latin America. Our main decision was to organize and take part in more small events spread across several cities, marking KDE's presence at established events such as FLISoL and Software Freedom Day, and more. Now, in times of COVID-19, this is no longer feasible. Another decision was to move the organization of KDE Brasil from Phabricator to Gitlab.

KDE contributors working hard

Beyond the technical part, this LaKademy was an opportunity to meet old and new friends, drink some beers, enjoy the wonderful Bahian cuisine, and have fun between one commit and the next.

I would like to thank KDE e.V. for supporting LaKademy, and Caio and Icaro for organizing this edition of the event. I can't wait to take part in the next LaKademy, and I hope it happens as soon as possible!

Akademy 2019

2020-05-04T21:20:54+00:00

Group photo of Akademy 2019

In September 2019 the Italian city of Milan hosted the main worldwide meeting of KDE contributors, Akademy, where members from different areas such as translators, developers, artists, promo people and more get together for a few days to think about and build the future of KDE's projects and community(ies).

Before arriving at Akademy I took a flight from Brazil to Portugal to attend EPIA, a conference on artificial intelligence held in the small city of Vila Real, in the Porto region. After this academic activity, I flew from Porto to Milan and began my participation in Akademy 2019.

Unfortunately I landed at the end of the first morning of the event, which made me miss interesting presentations on Qt 6 and the new KDE Goals. In the afternoon I was able to attend some talks on topics that also interest me, such as Plasma for mobile devices, MyCroft for the automotive industry, the KDE e.V. report, and a showcase of Google Summer of Code and Season of KDE students. It was really nice to see incredible projects developed by newcomers.

On the second day, the talks that caught my attention were those on KPublicTransportation, LibreOffice for Plasma, Get Hot New Stuffs (which I imagine I will use in a future project) and Caio's presentation on kpmcore.

After the event party (comments about it in person only), the following days were filled with BoFs, for me the most interesting part of Akademy.

The Gitlab workshop was interesting because we could discuss specific topics about KDE's migration to this tool. I am loving this move and I hope all KDE projects migrate as soon as possible. Cantor has been there for some time already.

At the KDE websites BoF, I got a better understanding of the new Jekyll theme used by our sites. In addition, I hope we can soon apply internationalization to these pages, making them translatable to any language. After taking part and gathering information at this event, I created a new website for Cantor during LaKademy 2019.

The KDE Craft BoF was interesting to see how to build and distribute our software in the Windows store (yes, just look at what I am writing…). I hope to work on this during the year in order to make a Cantor package available in that store ((yes)²).

I also took part in the QML and Kirigami workshop run by the people from the Maui project. Kirigami is something I have been keeping an eye on for future projects.

Finally, I took part in the "All About the Apps Kick Off" BoF. Personally, I think this is the future of KDE: an international community that produces high-quality, secure free software for different platforms, from desktop to mobile. In fact, this is how KDE is currently organized and working, but we have not managed to communicate it very well to the public. Perhaps, with the changes in the way we do releases, together with websites for specific projects and distribution in different app stores, we can change the way the public sees our community.

The Akademy 2019 day trip was to Lake Como, in the town of Varenna. It was a beautiful trip, and I spent the whole time imagining it could be a good place for a honeymoon :D. I hope to go back there in the near future and spend a few days traveling between towns like it.

Me in Varenna

I would like to thank the whole local team, Riccardo and his friends, for organizing this incredible edition of Akademy. Milan is a very beautiful city, with delicious food (carbonara!) and historic places to visit and discover more about the Italians and the sophisticated capital of high fashion.

Finally, my thanks to KDE e.V. for sponsoring my participation in Akademy.

Videos of the Akademy 2019 presentations and BoFs are available at this link.

Telegram questions - Interesting questions

2020-05-03T13:45:00+00:00

Last week I came across 3 interesting questions in the Telegram groups whose explanations would be too long to present in a chat.

Why does `True, True, True == (True, True, True)` return `True, True, False`?

This question was presented as a bizarre bit of Python syntax, but it is actually a visual trick. Note the == operator (equals equals).

Python 3.8.2 (default, Apr 27 2020, 15:53:34) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> True, True, True == (True, True, True)
(True, True, False)

Some people were puzzled and asked why the result is (True, True, False) instead of (True, True, True). First, remember that Python lets us create tuples without parentheses, using only commas. It is exactly this syntax that caused the confusion, because at a glance you may think we are comparing two tuples, which is not the case. For example:

>>> (True, True, True) == (True, True, True)
True

The difference is the parentheses. Without them, we are comparing only True == (True, True, True), not the first tuple with the second. It is a matter of operator precedence. To build the tuple, the interpreter first evaluates the comparison of True with (True, True, True). Since the first is a bool and the second a tuple, the result is False. The resulting tuple therefore has the first two True values followed by the False produced by the comparison. When we write the left side in parentheses, we are comparing two tuples and the result is True.
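One way to confirm this reading is to ask Python how it parses the expression. The sketch below uses the standard ast module; the variable names are only for illustration.

```python
import ast

# How the interpreter groups the expression: a tuple whose last
# element is the comparison.
tree = ast.parse("True, True, True == (True, True, True)", mode="eval")
tipos = [type(e).__name__ for e in tree.body.elts]
print(type(tree.body).__name__, tipos)

sem_parenteses = True, True, True == (True, True, True)
explicito = (True, True, (True == (True, True, True)))
duas_tuplas = (True, True, True) == (True, True, True)

print(sem_parenteses == explicito)
print(duas_tuplas)
```

The parser reports a tuple with three elements whose last element is a comparison, which matches the explanation above.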

Why does `-2 * 5 // 3 + 1` return `-3`?

This one is trickier. Several results were suggested and the precedence of the // operator was questioned. Let's see what Python says:

>>> -2 * 5 // 3 + 1
-3

Which result did you expect? Some people expected -4, others -2. How does it come out as -3?

First, we should review the precedence of the // operator, which is exactly the same as that of division. Division and multiplication have the same precedence and are evaluated from left to right. This is especially important because // rounds the result.

Now let's look at the definition of // in the Python documentation:

Division of integers yields a float, while floor division of integers results in an integer; the result is that of mathematical division with the ‘floor’ function applied to the result…

In other words: dividing integers with / yields a floating point number (float), while floor division with // yields an integer (int), equal to the mathematical quotient with the floor function applied to it.

The point that was not clear is the behavior of floor with negative numbers.

>>> 10 // 3
3
>>> -10 // 3
-4
>>> -10 / 3
-3.3333333333333335

Naturally, one would expect the result of -10 // 3 to be the same as 10 // 3 with the sign flipped. You can check the definition of the floor function on Wikipedia and in the Python documentation:

math.floor(x) Return the floor of x, the largest integer less than or equal to x. If x is not a float, delegates to x.__floor__(), which should return an Integral value

That is, floor returns the largest integer less than or equal to x. If x is not a float, the call is delegated to x.__floor__(), which should return an integer value.

With negative numbers, the "largest integer less than or equal to" part is easy to get wrong: for -3.33 that integer is -4, not -3, because -3 > -3.33 > -4!

So the expression is evaluated correctly by the interpreter:

-2 * 5
-10
-10 // 3
-4
-4 + 1
-3
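The contrast between flooring and truncating is easy to check with the standard math module. This is only a small demonstration; the variable names are arbitrary.

```python
import math

q = -10 / 3

piso = math.floor(q)      # largest integer <= q
truncado = math.trunc(q)  # drops the fractional part, rounding toward zero
print(piso, truncado, -10 // 3, int(q))

# Floor division keeps this identity true for any non-zero divisor.
quociente, resto = divmod(-10, 3)
print(quociente, resto, quociente * 3 + resto)
```

People who expected -2 as the final answer were truncating toward zero, as int() does, instead of flooring.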

Looking at the Python (3.8.2) sources on GitHub, the adjustment is easy to see (line 3768):

3751 /* Fast floor division for single-digit longs. */
3752 static PyObject *
3753 fast_floor_div(PyLongObject *a, PyLongObject *b)
3754 {
3755     sdigit left = a->ob_digit[0];
3756     sdigit right = b->ob_digit[0];
3757     sdigit div;
3758
3759     assert(Py_ABS(Py_SIZE(a)) == 1);
3760     assert(Py_ABS(Py_SIZE(b)) == 1);
3761
3762     if (Py_SIZE(a) == Py_SIZE(b)) {
3763         /* 'a' and 'b' have the same sign. */
3764         div = left / right;
3765     }
3766     else {
3767         /* Either 'a' or 'b' is negative. */
3768         div = -1 - (left - 1) / right;
3769     }
3770
3771     return PyLong_FromLong(div);
3772 }
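The same adjustment can be replayed in Python to see why it works. This is a sketch, not CPython code: floor_div_por_magnitudes is a hypothetical name, and abs() plays the role of the digit magnitudes that the C function reads. As in the C version, both operands must be non-zero.

```python
def floor_div_por_magnitudes(a, b):
    """Floor division computed from magnitudes, mirroring the C logic."""
    left, right = abs(a), abs(b)
    if (a < 0) == (b < 0):
        # Same sign: the quotient is non-negative, plain division is enough.
        return left // right
    # Different signs: shift by one so the result rounds toward minus infinity.
    return -1 - (left - 1) // right


print(floor_div_por_magnitudes(-10, 3))
print(floor_div_por_magnitudes(-9, 3))

valores = [v for v in range(-20, 21) if v != 0]
confere = all(floor_div_por_magnitudes(a, b) == a // b
              for a in valores for b in valores)
print(confere)
```

For -10 and 3, the second branch gives -1 - 9 // 3, which is -4, the same value the interpreter returns.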

Note: I remember the floor function being called "solo" or "mínima" in Portuguese.

How to add times in Python?

A colleague in the group asked how to add time durations in Python, presenting the following code:

from datetime import time
t0 = time.fromisoformat('06:52:00')
t1 = time.fromisoformat('00:08:15') 
t2 = time.fromisoformat('00:07:12') 
t3 = t0 + t1 + t2

which results in the following error:

Traceback (most recent call last):
  File "tdeltao.py", line 5, in <module>
    t3 = t0 + t1 + t2
TypeError: unsupported operand type(s) for +: 'datetime.time' and 'datetime.time'

This happens because the time class does not define addition. The correct class for this kind of calculation is timedelta. To compute this sum correctly, we first need to convert each string into a timedelta object. This can be done with a simple function:

from datetime import time, timedelta, datetime

def string_para_timedelta(str_time: str) -> timedelta:
    valor = time.fromisoformat(str_time)
    return timedelta(hours=valor.hour, 
                     minutes=valor.minute, 
                     seconds=valor.second)

t0 = string_para_timedelta('06:52:00')
t1 = string_para_timedelta('00:08:15') 
t2 = string_para_timedelta('00:07:12') 
t3 = t0 + t1 + t2
print(t3)
print(datetime.now() + t3)

which results in:

7:07:27
2020-05-03 22:48:55.473647

A tip for remembering the difference between time and timedelta: time represents a time of day, so it cannot hold 24 h or more, while timedelta can represent far longer spans (up to 999999999 days, roughly 2.7 million years). Another advantage of timedelta is that it can be used in operations with date and datetime.
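A consequence of this limit is that durations of 24 hours or more cannot go through time.fromisoformat at all. The sketch below parses the HH:MM:SS string by hand instead; duracao_para_timedelta is a hypothetical helper, not part of the original answer.

```python
from datetime import time, timedelta


def duracao_para_timedelta(texto):
    """Parse 'HH:MM:SS' where HH may be 24 or more."""
    horas, minutos, segundos = (int(parte) for parte in texto.split(":"))
    return timedelta(hours=horas, minutes=minutos, seconds=segundos)


total = duracao_para_timedelta("30:15:00") + duracao_para_timedelta("00:50:30")
print(total)  # 1 day, 7:05:30

# The time class rejects the same string, since 30 is not a valid hour.
try:
    time.fromisoformat("30:15:00")
    rejeitado = False
except ValueError:
    rejeitado = True
print(rejeitado)

print(timedelta.max.days)  # 999999999
```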

Focus on the environment, speeding up learning!

2020-02-28T16:49:00+00:00

The goal of this blog post is to share how I usually speed up my learning in an area where I don't have much experience and want (and/or need) to gain more. When I entered the technology field (software development) I knew practically nothing, and I started studying how I could speed up my learning, until I came across a text on reddit about focusing on the environment. It was extremely hard for me to understand what that text meant, but after days of reading and rereading I absorbed that I should go to places where people were doing what I wanted to learn, and that this would speed up my learning.

Reaching the limit of the technology, where do I go now?

2020-02-19T14:00:00+00:00

We in technology are, in general, early adopters (we like to embrace new technologies, even without knowing exactly why they exist), and in development it is not much different. Why don't we use database X? We could use programming language Y! Service Z solves 100% of our problems! Let's assume the statements above are 100% correct (and there we make our first mistake): will the solution last "a lifetime", or in a few months will we have to look at it again because we hit some limit of the implementation, the architecture or the technology itself?

Creating a CI for a Django application using Github Actions

2020-01-24T15:10:00+00:00

Hi everyone, how are you?

In the video below I show how we can set up CI for a Django application using Github Actions.

https://www.youtube.com/watch?v=KpSlY8leYFY

Oxidizing sourmash: PR walkthrough

2020-01-10T15:00:00+00:00

sourmash 3 was released last week, finally landing the Rust backend. But, what changes when developing new features in sourmash? I was thinking about how to best document this process, and since PR #826 is a short example touching all the layers I decided to do a small walkthrough.

Shall we?

The problem

The first step is describing the problem, and trying to convince reviewers (and yourself) that the changes bring enough benefits to justify a merge. This is the description I put in the PR:

Calling .add_hash() on a MinHash sketch is fine, but if you're calling it all the time it's better to pass a list of hashes and call .add_many() instead. Before this PR add_many just called add_hash for each hash it was passed, but now it will pass the full list to Rust (and that's way faster).

No changes for public APIs, and I changed the _signatures method in LCA to accumulate hashes for each sig first, and then set them all at once. This is way faster, but might use more intermediate memory (I'll evaluate this now).
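The gain described in the PR comes from crossing the Python/Rust boundary once instead of once per hash. The toy model below only counts those crossings. FakeBackend is a made-up stand-in, not the sourmash MinHash class, and it measures nothing about the real implementation.

```python
class FakeBackend:
    """Stand-in for the Rust side; counts how often Python calls into it."""

    def __init__(self):
        self.calls = 0
        self.hashes = set()

    def add_hash(self, h):
        self.calls += 1
        self.hashes.add(h)

    def add_many(self, hs):
        self.calls += 1  # one crossing for the whole list
        self.hashes.update(hs)


hashes = list(range(10_000))

one_by_one = FakeBackend()
for h in hashes:
    one_by_one.add_hash(h)

batched = FakeBackend()
batched.add_many(hashes)

print(one_by_one.calls, batched.calls)
print(one_by_one.hashes == batched.hashes)
```

Both approaches end with the same set of hashes; only the number of calls differs, and each call carries fixed overhead.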

There are many details that sound like jargon for someone not familiar with the codebase, but if I write something too long I'll probably be wasting the reviewers' time too. The benefit of a very detailed description is extending the knowledge for other people (not necessarily the maintainers), but that also takes effort that might be better allocated to solve other problems. Or, more realistically, putting out other fires =P

Nonetheless, some points I like to add in PR descriptions:

- Why is there a problem with the current approach?
- Is this the minimal viable change, or is it trying to change too many things at once? The former is way better, in general.
- What are the trade-offs? This PR is using more memory to lower the runtime, but I hadn't measured it yet when I opened it.
- Not changing public APIs is always good to convince reviewers. If the project follows a semantic versioning scheme, changes to the public APIs are major version bumps, and that can bring other consequences for users.

Setting up for changing code

If this was a bug fix PR, the first thing I would do is write a new test triggering the bug, and then proceed to fix it in the code (Hmm, maybe that would be another good walkthrough?). But this PR is making performance claims ("it's going to be faster"), and that's a bit hard to codify in tests. ¹ Since it's also proposing to change a method (_signatures in LCA indices) that is better to benchmark with a real index (and not a toy example), I used the same data and command I run in sourmash_resources to check how memory consumption and runtime changed. For reference, this is the command:

sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz

I'm using the benchmark feature from snakemake in sourmash_resources to track how much memory, runtime and I/O are used for each command (and version) of sourmash, and to generate the plots in the README in that repo. That is fine for a high-level view ("what's the maximum memory used?"), but not so useful for digging into details ("what method is consuming most memory?").

An additional problem is the dual² language nature of sourmash, where we have Python calling into Rust code (via CFFI). There are great tools for measuring and profiling Python code, but they tend to not work with extension code...

So, let's bring two of my favorite tools to help!

Memory profiling: heaptrack

heaptrack is a heap profiler, and I first heard about it from Vincent Prouillet. Its main advantage over other solutions (like valgrind's massif) is the low overhead and... how easy it is to use: just stick heaptrack in front of your command, and you're good to go!

Example output:

$ heaptrack sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz

heaptrack stats:
        allocations:            1379353
        leaked allocations:     1660
        temporary allocations:  168984
Heaptrack finished! Now run the following to investigate the data:

  heaptrack --analyze heaptrack.sourmash.66565.gz

heaptrack --analyze opens a very nice graphical interface for analyzing the results, but for this PR I'm mostly focusing on the Summary page (and overall memory consumption). Tracking allocations in Python doesn't give many details, because it shows the CPython functions being called, but the ability to track allocations inside the extension code (Rust) is amazing for finding bottlenecks (and memory leaks =P). ³

CPU profiling: py-spy

Just as other solutions exist for profiling memory, there are many for profiling CPU usage in Python, including profile and cProfile in the standard library. Again, the issue is being able to analyze extension code, and bringing out the cannon (the perf command in Linux, for example) loses the benefit of tracking Python code properly (because we get back the CPython functions, not what you defined in your Python code).
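For comparison, here is a minimal sketch of what the standard library profiler gives you. The slow_sum and profile_call functions are made up for this illustration; the point is only that cProfile reports Python-level functions, so a call into an extension shows up as a single opaque entry.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum the first n squares in pure Python."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Collect the report in memory instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

The report lists slow_sum with its call count and cumulative time, which is great for pure Python but says nothing about what happens inside native code.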

Enter py-spy by Ben Frederickson, based on the rbspy project by Julia Evans. Both use a great idea: read the process maps for the interpreters and resolve the full stack trace information, with low overhead (because they use sampling). py-spy also goes further and resolves stack traces in native Python extensions, meaning we can get the complete picture all the way from the Python CLI to the Rust core library!⁴

py-spy is also easy to use: stick py-spy record --output search.svg -n -- in front of the command, and it will generate a flamegraph in search.svg. The full command for this PR is

py-spy record --output search.svg -n -- sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz

Show me the code!

OK, OK, sheesh. But it's worth repeating: the code is important, but there are many other aspects that are just as important =]

Replacing `add_hash` calls with one `add_many`

Let's start at the _signatures() method on LCA indices. This is the original method:

@cached_property
def _signatures(self):
    "Create a _signatures member dictionary that contains {idx: minhash}."
    from .. import MinHash

    minhash = MinHash(n=0, ksize=self.ksize, scaled=self.scaled)

    debug('creating signatures for LCA DB...')
    sigd = defaultdict(minhash.copy_and_clear)

    for (k, v) in self.hashval_to_idx.items():
        for vv in v:
            sigd[vv].add_hash(k)

    debug('=> {} signatures!', len(sigd))
    return sigd

sigd[vv].add_hash(k) is the culprit. Each call to .add_hash has to go through CFFI to reach the extension code, and the overhead is significant. It is a similar situation to accessing array elements one by one in NumPy: it works, but it is way slower than using operations that avoid crossing from Python to the extension code. What we want to do instead is call .add_many(hashes), which takes a list of hashes and processes it entirely in Rust (ideally. We will get there).

But, to have a list of hashes, there is another issue with this code.

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        sigd[vv].add_hash(k)

There are two nested for loops, and add_hash is being called with values from the inner loop. So... we don't have the list of hashes beforehand.

But we can change the code a bit to save the hashes for each signature in a temporary list, and then call add_many on the temporary list. Like this:

temp_vals = defaultdict(list)

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].append(k)

for sig, vals in temp_vals.items():
    sigd[sig].add_many(vals)
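To make the regrouping step concrete, here is a self-contained sketch on toy data. The function name group_hashes_by_signature and the toy index are made up for illustration; only the loop structure comes from the code above.

```python
from collections import defaultdict


def group_hashes_by_signature(hashval_to_idx):
    """Invert {hash: [idx, ...]} into {idx: [hash, ...]}."""
    temp_vals = defaultdict(list)
    for hashval, idxs in hashval_to_idx.items():
        for idx in idxs:
            temp_vals[idx].append(hashval)
    return dict(temp_vals)


# Toy data (made up): three hashes spread over two signatures.
toy_index = {10: [0, 1], 20: [0], 30: [1]}
grouped = group_hashes_by_signature(toy_index)
print(grouped)
```

Each signature index ends up with the full list of its hashes, which is exactly the shape add_many wants.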

There is a trade-off here: if we save the hashes in temporary lists, will the memory consumption be so high that it cancels out the runtime gains of calling add_many on these temporary lists?

Time to measure it =]

version	memory	time
original	1.5 GB	160 s
`list`	1.7 GB	173 s

Wait, it got worse?!?! Building temporary lists only takes time and memory, and brings no benefits!

This mystery goes away when you look at the add_many method:

def add_many(self, hashes):
    "Add many hashes in at once."
    if isinstance(hashes, MinHash):
        self._methodcall(lib.kmerminhash_add_from, hashes._objptr)
    else:
        for hash in hashes:
            self._methodcall(lib.kmerminhash_add_hash, hash)

The first check in the if statement is a shortcut for adding hashes from another MinHash, so let's focus on the else part... And it turns out that add_many is lying! It doesn't process the hashes in the Rust extension, but just loops and calls add_hash for each hash in the list. That's not going to be any faster than what we were doing in _signatures.
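A rough way to picture the difference is to count how many times we cross from Python into the extension. The CountingSketch class below is a stand-in I made up for this illustration (it is not the sourmash MinHash, and nothing here touches CFFI); it only simulates one "crossing" per native call.

```python
class CountingSketch:
    """Hypothetical stand-in for a MinHash: counts simulated FFI crossings."""

    def __init__(self):
        self.hashes = set()
        self.crossings = 0

    def _native_add_hash(self, h):
        # One simulated trip from Python into the extension.
        self.crossings += 1
        self.hashes.add(h)

    def _native_add_many(self, hashes):
        # One simulated trip that hands over the whole batch.
        self.crossings += 1
        self.hashes.update(hashes)

    def add_many_looping(self, hashes):
        # Mirrors the add_many shown above: one crossing per hash.
        for h in hashes:
            self._native_add_hash(h)

    def add_many_batched(self, hashes):
        # Mirrors what we want: a single crossing for the full list.
        self._native_add_many(list(hashes))


values = list(range(1000))
looping, batched = CountingSketch(), CountingSketch()
looping.add_many_looping(values)
batched.add_many_batched(values)
print(looping.crossings, batched.crossings)
```

Both sketches end up with the same hashes, but the looping version pays the crossing cost once per hash.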

Time to fix add_many!

Oxidizing `add_many`

The idea is to replace this loop in add_many:

for hash in hashes:
    self._methodcall(lib.kmerminhash_add_hash, hash)

with a call to a Rust extension function:

self._methodcall(lib.kmerminhash_add_many, list(hashes), len(hashes))

self._methodcall is a convenience method defined in RustObject which translates a method-like call into a function call, since our C layer only has functions. This is the C prototype for this function:

void kmerminhash_add_many(
    KmerMinHash *ptr,
    const uint64_t *hashes_ptr,
    uintptr_t insize
  );

You can almost read it as a Python method declaration, where KmerMinHash *ptr means the same as the self in Python methods. The other two arguments are a common idiom when passing pointers to data in C, with insize being how many elements we have in the list. ⁵ CFFI is very good at converting Python lists into pointers of a specific type, as long as it is a primitive type (uint64_t in our case, since each hash is a 64-bit unsigned integer).
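The "pointer plus length" idiom can be sketched without any extension at all. The example below uses ctypes from the standard library as an analogy (sourmash uses CFFI, not ctypes), and to_u64_array is a made-up helper name.

```python
import ctypes


def to_u64_array(hashes):
    """Copy a Python sequence of ints into a contiguous C array of uint64_t."""
    values = list(hashes)
    array_type = ctypes.c_uint64 * len(values)
    # Return the buffer and its element count, the two things C needs.
    return array_type(*values), len(values)


hashes_ptr, insize = to_u64_array([3, 1, 2**64 - 1])
print(insize, list(hashes_ptr))
```

The array is a contiguous block of 8-byte unsigned integers, and the C side has no way to know its length unless we pass insize along with it.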

And the Rust code with the implementation of the function:

ffi_fn! {
unsafe fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize,
  ) -> Result<()> {
    let mh = {
        assert!(!ptr.is_null());
        &mut *ptr
    };

    let hashes = {
        assert!(!hashes_ptr.is_null());
        slice::from_raw_parts(hashes_ptr as *mut u64, insize)
    };

    for hash in hashes {
      mh.add_hash(*hash);
    }

    Ok(())
}
}

Let's break what's happening here into smaller pieces. Starting with the function signature:

ffi_fn! {
unsafe fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize,
  ) -> Result<()>

The weird ffi_fn! {} syntax around the function is a macro in Rust: it changes the final generated code to convert the return value (Result<()>) into something that is valid C code (in this case, void). What happens if there is an error, then? The Rust extension has code for passing back an error code and message to Python, as well as capturing panics (when things go horribly bad and the program can't recover) in a way that Python can then deal with (raising exceptions and cleaning up). It also sets the #[no_mangle] attribute on the function, meaning that the final name of the function will follow C semantics (instead of Rust semantics), and can be called more easily from C and other languages. This ffi_fn! macro comes from symbolic, a big influence on the design of the Python/Rust bridge in sourmash.

unsafe is the keyword in Rust to disable some checks in the code to allow potentially dangerous things (like dereferencing a pointer), and it is required to interact with C code. unsafe doesn't mean that the code is always unsafe to use: it's up to whoever is calling this to verify that valid data is being passed and invariants are being preserved.

If we remove the ffi_fn! macro and the unsafe keyword, we have

fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize
  );

At this point we can pretty much map between Rust and the C function prototype:

void kmerminhash_add_many(
    KmerMinHash *ptr,
    const uint64_t *hashes_ptr,
    uintptr_t insize
  );

Some interesting points:

We use fn to declare a function in Rust.
The type of an argument comes after the name of the argument in Rust, while it's the other way around in C. Same for the return type (it is omitted in the Rust function, which means it is -> (), equivalent to a void return type in C).
In Rust everything is immutable by default, so we need to say that we want a mutable pointer to a KmerMinHash item (*mut KmerMinHash). In C everything is mutable by default.
u64 in Rust -> uint64_t in C
usize in Rust -> uintptr_t in C

Let's check the implementation of the function now. We start by converting the ptr argument (a raw pointer to a KmerMinHash struct) into a regular Rust struct:

let mh = {
    assert!(!ptr.is_null());
    &mut *ptr
};

This block asserts that ptr is not a null pointer, then dereferences it and stores the result in a mutable reference. If it was a null pointer the assert! would panic (which might sound extreme, but is way better than continuing to run, because dereferencing a null pointer is BAD). Note that functions always need all the types in arguments and return values, but for variables in the body of the function Rust can figure out types most of the time, so no need to specify them.

The next block prepares our list of hashes for use:

let hashes = {
    assert!(!hashes_ptr.is_null());
    slice::from_raw_parts(hashes_ptr as *mut u64, insize)
};

We are again asserting that the hashes_ptr is not a null pointer, but instead of dereferencing the pointer like before we use it to create a slice, a dynamically-sized view into a contiguous sequence. The list we got from Python is a contiguous sequence of size insize, and the slice::from_raw_parts function creates a slice from a pointer to data and a size.

Oh, and can you spot the bug? I created the slice using *mut u64, but the data is declared as *const u64. Because we are in an unsafe block Rust let me change the mutability, but I shouldn't be doing that, since we don't need to mutate the slice. Oops.

Finally, let's add hashes to our MinHash! We need a for loop, and call add_hash for each hash:

for hash in hashes {
  mh.add_hash(*hash);
}

Ok(())

We finish the function with Ok(()) to indicate no errors occurred.

Why is calling add_hash here faster than what we were doing before in Python? Rust can optimize these calls and generate very efficient native code, while Python is an interpreted language and most of the time doesn't have the same guarantees that Rust can leverage to generate the code. And, again, calling add_hash here doesn't need to cross FFI boundaries or, in fact, do any dynamic evaluation during runtime, because it is all statically analyzed during compilation.

Putting it all together

And... that's the PR code. There are some other unrelated changes that should have been in new PRs, but since they were so small it would be more work than necessary. OK, that's a lame excuse: it's confusing for reviewers to see these changes here, so avoid doing that if possible!

But, did it work?

version	memory	time
original	1.5 GB	160 s
`list`	1.7 GB	73 s

We are using 200 MB of extra memory, but taking less than half the time it was taking before. I think this is a good trade-off, and so did the reviewer, so the PR was approved.
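Spelling out the arithmetic behind that sentence, using only the numbers from the table above (the variable names are mine):

```python
# Numbers taken from the results table: memory in GB, time in seconds.
original_mem_gb, original_time_s = 1.5, 160
list_mem_gb, list_time_s = 1.7, 73

speedup = original_time_s / list_time_s
time_ratio = list_time_s / original_time_s
extra_mem_mb = (list_mem_gb - original_mem_gb) * 1000

print(f"speedup: {speedup:.2f}x")
print(f"runtime relative to original: {time_ratio:.0%}")
print(f"extra memory: {extra_mem_mb:.0f} MB")
```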

Hopefully this was useful, 'til next time!

Bonus: `list` or `set`?

The first version of the PR used a set instead of a list to accumulate hashes. Since a set doesn't have repeated elements, this could potentially use less memory. The code:

temp_vals = defaultdict(set)

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].add(k)

for sig, vals in temp_vals.items():
    sigd[sig].add_many(vals)

The runtime was again half of the original, but...

version	memory	time
original	1.5 GB	160 s
`set`	3.8 GB	80 s
`list`	1.7 GB	73 s

... memory consumption was about 2.5 times the original! WAT

The culprit this time? The new kmerminhash_add_many call in the add_many method. This one:

self._methodcall(lib.kmerminhash_add_many, list(hashes), len(hashes))

CFFI doesn't know how to convert a set into something that C understands, so we need to call list(hashes) to convert it into a list. Since Python (and CFFI) can't know if the data is going to be used later ⁶ it needs to keep it around (and be eventually deallocated by the garbage collector). And that's how we get at least double the memory being allocated...
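A tiny sketch of why the conversion costs memory, on toy data and assuming CPython: list(hashes) builds a second container that lives alongside the set until it is garbage collected. Note that sys.getsizeof only measures the containers themselves, not the integers they point to, and any buffer needed on the C side would come on top of this.

```python
import sys

hashes = set(range(100_000))
as_list = list(hashes)  # the conversion add_many has to do for a set

# The list is a new container, alive at the same time as the original set.
print(as_list is hashes)
# Container sizes in bytes; on CPython the set is the larger of the two.
print(sys.getsizeof(hashes), sys.getsizeof(as_list))
```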

There is another lesson here. If we look at the for loop again:

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].add(k)

each k is already unique because they are keys in the hashval_to_idx dictionary, so the initial assumption (that a set might save memory because it doesn't have repeated elements) is... irrelevant for the problem =]

Footnotes

¹ We do have https://asv.readthedocs.io/ set up for micro-benchmarks, and now that I think about it... I could have started by writing a benchmark for add_many, and then showing that it is faster. I will add this approach to the sourmash PR checklist =]
² or triple, if you count C
³ It would be super cool to have the unwinding code from py-spy in heaptrack, and be able to see exactly what Python methods/lines of code were calling the Rust parts...
⁴ Even if py-spy doesn't talk explicitly about Rust, it works very very well, woohoo!
⁵ Let's not talk about lack of array bounds checks in C...
⁶ something that the memory ownership model in Rust does, BTW

Try out Tsuru: announcing limited preview

2019-12-10T03:42:44+00:00

A few days ago, Tsuru got some attention in the news. After reading about Tsuru, and seeing some of its capabilities, people started asking for a way to try it. Well, your requests have been heard! We're preparing a public cloud that will be freely available for beta testers.

TL;DR: go to tsuru.io/try, sign up for beta testing and get ready to start deploying Python, Ruby, Go and Java applications in the cloud.

What is Tsuru?

Tsuru is an open source platform as a service that allows developers to automatically deploy and manage web applications written in many different platforms (like Python, Ruby and Go). It aims to provide a solution for cloud computing platforms that is extensible, flexible and component based.

You can run your own public or private cloud using Tsuru. Or you can try it in the public cloud that Globo.com is building.

What is Tsuru public cloud? What does "beta availability" mean?

Tsuru public cloud will be a public, freely available, installation of Tsuru, provided by Globo.com. "Beta availability" means that it will not be available for the general Internet public.

People will need to subscribe for the beta testing and wait for the confirmation, so they can start deploying web applications on Tsuru public cloud.

Which development platforms are going to be available?

Tsuru already supports Ruby, Python, Java and Go, so it is very likely that these platforms will be available for all beta users.

It's important to notice that adding new platforms to Tsuru is a straightforward task: each development platform is based on Juju Charms, so one can adapt charms available at Charm Store and send a patch.

How limited is it going to be?

We don't know the proper answer to this question yet, but don't worry about numbers now. There will be some kind of per-user quota, but it has not been defined yet.

People interested in running applications in the Tsuru public cloud who get to use the beta version will have access to a functional environment where they will be able to deploy at least one web application.

When will it be available?

We're working hard to make it available as soon as possible, and you can help us get it done! If you want to contribute, please take a look at the Tsuru repository, choose an issue, discuss your solution and send your patches. We are going to be very happy to help you out.

What if I don't want to wait?

If you want an unlimited, fully manageable and customized installation of Tsuru, you can have it today. Check out Tsuru's documentation and, in case of doubt, don't hesitate to contact the newborn Tsuru community.

Setting up a Django production environment: compiling and configuring nginx

2019-12-10T03:42:44+00:00

Here is another series of posts: now I’m going to write about setting up a Django production environment using nginx and Green Unicorn in a virtual environment. The subject in this first post is nginx, which is my favorite web server.

This post explains how to install nginx from sources, compiling it (on Linux). You might want to use apt, zif, yum or ports, but I prefer building from sources. So, to build from sources, make sure you have all development dependencies: the C headers, including the PCRE library headers (the nginx rewrite module uses PCRE). If you want to build nginx with SSL support, keep in mind that you will need the libssl headers too.

Building nginx from source is a straightforward process: all you need to do is download it from the official site and build with some simple options. In our setup, we’re going to install nginx under /opt/nginx, and use it with the nginx system user. So, let’s download and extract the latest stable version (1.0.9) from the nginx website:


% curl -O http://nginx.org/download/nginx-1.0.9.tar.gz
% tar -xzf nginx-1.0.9.tar.gz

Once you have extracted it, just configure, compile and install:


% ./configure --prefix=/opt/nginx --user=nginx --group=nginx
% make
% [sudo] make install

As you can see, we passed /opt/nginx as the prefix to configure, so make sure the /opt directory exists. Also, make sure that there is a user and a group called nginx; if they don’t exist, add them:

% [sudo] adduser --system --no-create-home --disabled-login --disabled-password --group nginx

After that, you can start nginx using the command line below:

% [sudo] /opt/nginx/sbin/nginx

Linode provides an init script that uses start-stop-daemon; you might want to use it.

nginx configuration

nginx comes with a default nginx.conf file; let’s change it to reflect the following configuration requirements:

nginx should start workers with the nginx user
nginx should have two worker processes
the PID should be stored in the /opt/nginx/logs/nginx.pid file
nginx must have an access log in /opt/nginx/logs/access.log
the configuration for the Django project we’re going to develop should be versioned with the entire code, so it must be included in the nginx.conf file (assume that the library project is in the directory /opt/projects).

So here is the nginx.conf for the requirements above:


user  nginx;
worker_processes  2;

pid logs/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                     '$status $body_bytes_sent "$http_referer" '
                     '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  logs/access.log  main;

    sendfile           on;
    keepalive_timeout  65;

    include /opt/projects/showcase/nginx.conf;
}

Now we just need to write the configuration for our Django project. I’m using an old sample project written while I was working at Giran: the name is lojas giranianas, a nonsense Portuguese pun on the name of a famous Brazilian store. It’s an unfinished showcase of products: it’s like an e-commerce project, but it can’t sell, so it’s just a product catalog. The code is available on GitHub. The nginx.conf file for the repository is here:


server {
    listen 80;
    server_name localhost;

    charset utf-8;

    location / {
        proxy_set_header    X-Real-IP   $remote_addr;
        proxy_set_header    Host        $http_host;
        proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;

        proxy_pass http://localhost:8000;
    }

    location /static {
        root /opt/projects/showcase/;
        expires 1d;
    }
}

The server listens on port 80 and responds for the localhost hostname (read more about the Host header). The location /static directive says that nginx will serve the static files of the project. It also includes an expires directive for cache control. The location / directive does a proxy_pass, forwarding all requests to an upstream server listening on port 8000. This server is the subject of the next post of the series: the Green Unicorn (gunicorn) server.

Not only is the HTTP request itself forwarded to the gunicorn server, but also some headers that help it deal with the request properly:

X-Real-IP: forwards the remote address to the upstream server, so it can know the real IP of the user. Without this header, all gunicorn knows when nginx forwards the request is that it comes from localhost (or wherever the nginx server is): the remote address is always the IP address of the machine where nginx is running (which is what actually makes the request to gunicorn).
Host: the Host header is forwarded so gunicorn can treat requests for different hosts differently. Without this header, gunicorn would be unable to tell them apart.
X-Forwarded-For: also known as XFF, this header provides more precise information about the real IP that made the request. Imagine there are 10 proxies between the user machine and your web server: the XFF header will list all these proxies, comma separated. In order to not turn a proxy into an anonymizer, it’s a good practice to always forward this header.
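On the application side, these headers arrive like any other request header. Here is a minimal sketch, assuming a WSGI application (which is what gunicorn serves); the client_info function and the addresses are made up for illustration, and the values should only be trusted when the proxy in front is the one setting them.

```python
def client_info(environ):
    """Read the headers set by the proxy from a WSGI environ dict."""
    forwarded_for = environ.get("HTTP_X_FORWARDED_FOR", "")
    # XFF is a comma-separated list of addresses.
    chain = [ip.strip() for ip in forwarded_for.split(",") if ip.strip()]
    return {
        "host": environ.get("HTTP_HOST"),
        # Fall back to the socket address when the proxy header is absent.
        "real_ip": environ.get("HTTP_X_REAL_IP", environ.get("REMOTE_ADDR")),
        "forwarded_chain": chain,
    }


# Hypothetical request as the upstream server would see it behind nginx.
example_environ = {
    "REMOTE_ADDR": "127.0.0.1",
    "HTTP_HOST": "localhost",
    "HTTP_X_REAL_IP": "203.0.113.7",
    "HTTP_X_FORWARDED_FOR": "203.0.113.7, 198.51.100.2",
}
info = client_info(example_environ)
print(info)
```

Without X-Real-IP, the only address available here would be REMOTE_ADDR, which is the nginx machine itself.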

So that is it. In the next post we are going to install and run gunicorn. In other posts, we’ll see how to make automated deploys using Fabric, and some tricks on caching (using the proxy_cache directive and integrating Django, nginx and memcached).

See you in the next posts.

Speaking at OSCON 2014

2019-12-10T03:42:28+00:00

Wow, one year without any posts! But I'm trying to get back...

This is a very short post, just to tell everybody that this year I will have the opportunity to speak at OSCON 2014. I'm speaking about tsuru, and you can check more details of the talk on the tsuru blog.

Creating HTML 5 slide presentations using landslide

2019-12-10T03:42:20+00:00

Recently I found landslide, which is a Python tool for creating HTML 5 slide presentations.

It’s based on a famous slide presentation. It’s a simple script that generates HTML from a source file, which can be formatted using reStructuredText, Textile or Markdown.

Let’s make a very simple presentation as a proof of concept: we’re going to create a “Python flow control” presentation, showing some basic structures of the language: if, for and while. We need a cover, a slide for each structure (with some topics and code examples) and the last slide for questions and answers. Here is the RST code for it:


Python
======

--------------

If
==

* Please don't use ()
* Never forget the ``:`` at the end of the line

Check this code:

.. sourcecode:: python

    x, y = 1, 2
    if x > y:
        print('x is greater')

--------------

For
===

* ``for`` iterates over a sequence
* Never forget the ``:`` at the end of the line

Check this code:

.. sourcecode:: python

    numbers = [1, 2, 3, 4, 5,]
    for number in numbers:
        print(number)

--------------

While
=====

* ``while`` is like ``if``, but executes while the condition is ``True``
* please don't use ()
* never forget the ``:`` at the end of the line

Check this code:

.. sourcecode:: python

    from random import randint

    args = (1, 10,)
    x = randint(*args)
    while x != 6:
        x = randint(*args)

--------------

Thank you!
==========

As you can see it’s very simple. If you’re familiar with RST syntax, you can guess what landslide does: it converts the entire content to HTML and then splits it at each <hr /> tag. Each slide will contain two sections: a header and a body. The header contains only an <h1></h1> element and the body contains everything else.
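The splitting idea can be sketched in a few lines of Python. This is not landslide's actual implementation, just an illustration of the description above; split_slides and the sample HTML are made up.

```python
import re


def split_slides(html):
    """Split rendered HTML on <hr /> tags and separate header from body."""
    slides = []
    for chunk in re.split(r"<hr\s*/?>", html):
        chunk = chunk.strip()
        if not chunk:
            continue
        # The first <h1> becomes the header; whatever follows is the body.
        match = re.search(r"<h1>(.*?)</h1>", chunk, flags=re.DOTALL)
        header = match.group(1).strip() if match else ""
        body = chunk[match.end():].strip() if match else chunk
        slides.append({"header": header, "body": body})
    return slides


html = "<h1>Python</h1><hr /><h1>If</h1><ul><li>Never forget the :</li></ul>"
slides = split_slides(html)
for slide in slides:
    print(slide)
```

The cover slide ends up with a header and an empty body, and the "If" slide keeps its list as the body.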

We can generate the HTML output by calling the landslide command in the terminal:

% landslide python.rst

To use the landslide command, you need to install it. I suggest you do this via pip:

% [sudo] pip install landslide

landslide supports theming, so you can customize it by creating your own theme. Your theme should contain two CSS files: screen.css (for the HTML version of slides) and print.css (for the PDF version of the slides). You might also customize the HTML (base.html) and JS files (slides.js), but you have to customize the CSS files in your theme. You specify the theme using the --theme option. You might want to check all options available in the command line utility using --help:

% landslide --help

It’s quite easy to extend landslide by changing its theme or adding new macros. Check the official repository on GitHub. This example, and a Markdown version of the same example, are available in a repository on my GitHub profile.

You can also see the slides live!

Splinter sprint on FISL

2019-12-10T03:42:20+00:00

We are going to start another splinter sprint tomorrow, at FISL. “From June 29 through July 2, 2011, fisl12 will be hosted at the PUC Events Center, in Porto Alegre, Rio Grande do Sul, Brazil” (copied from the FISL website). But don’t worry about the location: anyone, anywhere, can join us in this sprint. There is an entry in the splinter wiki about this sprint, and I’m just replicating the information here...

What is a splinter sprint?

Basically, a splinter sprint is an excuse for people to focus their undivided attention, for a set time frame, on improving splinter. It’s a focused, scheduled effort to fix bugs, add new features and improve documentation.

Anybody, anywhere around the world, can participate and contribute. If you’ve never contributed to splinter before, this is the perfect chance for you to chip in.

How to contribute

Choose an issue
Create a fork
Send a pull request

Remember: all new features should be well tested and documented. An issue can’t be closed if there aren’t docs for the solution code.

Preparing for the sprint

Get an IRC client, so that you can join us in the channel #cobrateam on Freenode.

See you all there!

Testing jQuery plugins with Jasmine

2019-12-10T03:42:20+00:00

Since I started working at Globo.com, I have developed some jQuery plugins (for internal use) with my team, and we are starting to test these plugins using Jasmine, “a behavior-driven development framework for testing your JavaScript code”. In this post, I will show how to develop a very simple jQuery plugin (based on an example that I learned from Richard D. Worth): zebrafy. This plugin “zebrafies” a table, applying different classes to odd and even lines. Let’s start by setting up a Jasmine environment...

The first step is to download the standalone version of Jasmine, then extract it and edit the runner. The runner is a simple HTML file that loads Jasmine and all the JavaScript files you want to test. But, wait... why not test using node.js or something like this? Do I really need the browser for this test? You don’t, but I think it is important to test a plugin that works with the DOM using a real browser. Let’s delete some files and lines from the SpecRunner.html file, so we adapt it for our plugin. This is what the structure is going to look like:


.
├── SpecRunner.html
├── lib
│   ├── jasmine-1.0.2
│   │   ├── MIT.LICENSE
│   │   ├── jasmine-html.js
│   │   ├── jasmine.css
│   │   └── jasmine.js
│   └── jquery-1.6.1.min.js
├── spec
│   └── ZebrafySpec.js
└── src
    └── jquery.zebrafy.js

You can create the files jquery.zebrafy.js and ZebrafySpec.js, but remember: it is BDD, so we need to describe the behavior first, then write the code. So let’s start writing the specs in the ZebrafySpec.js file using Jasmine. If you are familiar with RSpec syntax, it’s easy to understand how to write specs with Jasmine; if you aren’t, here is the clue: Jasmine is a lib with some functions used for writing tests in an easier way. I’m going to explain each function “on demand”: when we need something, we learn how to use it! ;)

First of all, we need to start a new test suite. Jasmine provides the describe function for that; this function receives a string and another function (a callback). The string describes the test suite and the callback delimits the scope of the test suite. Here is the Zebrafy suite:


describe('Zebrafy', function () {

});

Let’s start describing the behavior we want to get from the plugin. The most basic is: we want different CSS classes for odd and even lines in a table. Jasmine provides the it function for writing the tests. It also receives a string and a callback: the string is a description for the test and the callback is the function executed as the test. Here is the very first test:


it('should apply classes zebrafy-odd and zebrafy-even to each other table lines', function () {
    var table = $("#zebra-table");
    table.zebrafy();
    expect(table).toBeZebrafyied();
});

Okay, here we go: in the first line of the callback, we are using jQuery to select a table using the #zebra-table selector, which looks up a table with the ID attribute equal to “zebra-table”, but we don’t have this table in the DOM. What about adding a new table to the DOM in a hook executed before the test runs, and removing the table in another hook that runs after the test? Jasmine provides two functions for that: beforeEach and afterEach. Both functions receive a callback function to be executed and, as the names suggest, the beforeEach callback is called before each test runs, and the afterEach callback is called after each test runs. Here are the hooks:


beforeEach(function () {
    $('<table id="zebra-table"></table>').appendTo('body');
    for (var i=0; i < 10; i++) {
        $('<tr></tr>').append('<td></td>').append('<td></td>').append('<td></td>').appendTo('#zebra-table');
    };
});

afterEach(function () {
    $("#zebra-table").remove();
});

The beforeEach callback uses jQuery to create a table with 10 rows and 3 columns and add it to the DOM. In afterEach callback, we just remove that table using jQuery again. Okay, now the table exists, let’s go back to the test:


it('should apply classes zebrafy-odd and zebrafy-even to each other table lines', function () {
    var table = $("#zebra-table");
    table.zebrafy();
    expect(table).toBeZebrafyied();
});

In the second line, we call our plugin, which is not ready yet, so let’s move on to the next line, where we use the expect function. Jasmine provides this function, which receives an object and lets us run a matcher against it. Jasmine has a lot of built-in matchers, but toBeZebrafyied is not one of them. This is where we meet another Jasmine feature: the ability to write custom matchers. How do we do that? We can call beforeEach again and use the addMatchers method, available on this inside the callback:


beforeEach(function () {
    this.addMatchers({
        toBeZebrafyied: function() {
            var isZebrafyied = true;

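            // Returning false from the callback stops jQuery's each loop,
            // so the first row that breaks the pattern decides the result.
            this.actual.find("tr:even").each(function (index, tr) {
                isZebrafyied = !$(tr).hasClass('zebrafy-odd') && $(tr).hasClass('zebrafy-even');
                return isZebrafyied;
            });

            // Don't let the second loop overwrite a failure found in the first one.
            if (!isZebrafyied) {
                return false;
            }

            this.actual.find("tr:odd").each(function (index, tr) {
                isZebrafyied = $(tr).hasClass('zebrafy-odd') && !$(tr).hasClass('zebrafy-even');
                return isZebrafyied;
            });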

            return isZebrafyied;
        }
    });
});

The addMatchers method receives an object in which each property is a matcher. A matcher can receive arguments if you want. The object being matched is available as this.actual, so here is what the matcher above does: it takes all even <tr> elements of the table (this.actual) and checks that they have the CSS class zebrafy-even and don’t have the CSS class zebrafy-odd, then does the opposite check on the odd <tr> elements.
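The same check can be expressed without jQuery. The following is a minimal Python sketch, not part of the plugin or of Jasmine: it assumes each row is represented as a set of CSS class names, and the function name is_zebrafied is made up for illustration.

```python
def is_zebrafied(rows):
    """Return True when rows alternate zebrafy-even / zebrafy-odd classes.

    `rows` is a list of sets of CSS class names, one set per table row.
    Like jQuery's :even and :odd selectors, positions are zero-indexed,
    so the first row (index 0) counts as "even".
    """
    for index, classes in enumerate(rows):
        if index % 2 == 0:
            expected, forbidden = "zebrafy-even", "zebrafy-odd"
        else:
            expected, forbidden = "zebrafy-odd", "zebrafy-even"
        if expected not in classes or forbidden in classes:
            return False  # stop at the first row that breaks the pattern
    return True


striped = [{"zebrafy-even"}, {"zebrafy-odd"}, {"zebrafy-even"}]
broken = [{"zebrafy-even"}, {"zebrafy-even"}, {"zebrafy-even"}]
```

Stopping at the first offending row keeps a later, valid row from hiding an earlier failure.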

Now that we have written the test, it’s time to write the plugin. Here is some jQuery code:


(function ($) {
    $.fn.zebrafy = function () {
        this.find("tr:even").addClass("zebrafy-even");
        this.find("tr:odd").addClass("zebrafy-odd");
    };
})(jQuery);

I’m not going to explain how to implement a jQuery plugin, nor what those parentheses around the function are for: this post aims to show how to use Jasmine to test jQuery plugins.

By convention, jQuery plugins are “chainable”, so let’s make sure the zebrafy plugin is chainable using a spec:


it('zebrafy should be chainable', function() {
    var table = $("#zebra-table");
    table.zebrafy().addClass('black-bg');
    expect(table.hasClass('black-bg')).toBeTruthy();
});

As you can see, we used the built-in matcher toBeTruthy, which asserts that a value is truthy. All we need to do is return the jQuery object from the plugin and the test will pass:


(function ($) {
    $.fn.zebrafy = function () {
        return this.each(function (index, table) {
            $(table).find("tr:even").addClass("zebrafy-even");
            $(table).find("tr:odd").addClass("zebrafy-odd");
        });
    };
})(jQuery);
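Chaining is not specific to jQuery: any method that returns the object it was called on can be chained. Here is a minimal Python sketch of the idea, using a made-up Table class that only stands in for a jQuery-wrapped table:

```python
class Table:
    """Toy stand-in for a jQuery-wrapped table (hypothetical, for illustration)."""

    def __init__(self, row_count):
        self.row_classes = [set() for _ in range(row_count)]
        self.classes = set()

    def zebrafy(self):
        for index, classes in enumerate(self.row_classes):
            classes.add("zebrafy-even" if index % 2 == 0 else "zebrafy-odd")
        return self  # returning the object is what makes the call chainable

    def add_class(self, name):
        self.classes.add(name)
        return self


table = Table(4).zebrafy().add_class("black-bg")
```

If zebrafy returned nothing, the second call in the chain would fail, which is exactly what the chainability spec guards against.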

So, the plugin is tested and ready to release! :) You can check the entire code and tests, with more specs, in a GitHub repository.

Splinter: Python tool for acceptance tests on web applications

2019-12-10T03:42:20+00:00

Capybara and Webrat are great Ruby tools for acceptance tests. A few months ago, we started a great tool for acceptance tests in Python web applications, called Splinter. There are many acceptance test tools in the Python world: Selenium, Alfajor, Windmill, Mechanize, zope.testbrowser, etc. Splinter was not created to be yet another acceptance tool, but an abstraction layer over other tools. Its goal is to provide a single API that makes acceptance testing easier and more fun.

In this post, I will show some basic usage of Splinter for simple web application tests. Splinter is useful for testing any web application: you can even test a Java web application with it. The example in this post is a "test" of a Facebook feature, just because I want to focus on how to use Splinter, not on how to write a web application. The feature to be tested is the creation of an event (the Splinter sprint), following the whole flow: first the user logs in to Facebook, then clicks on the "Events" menu item, then clicks on the "Create an Event" button, enters all the event information and clicks on the "Create event" button. So, let’s do it…

The first step is to create a Browser instance, which provides methods for interacting with the browser (Firefox, Chrome, etc.). The code we need for that is very simple:

from splinter import Browser
browser = Browser("firefox")

Browser is a class and its constructor receives the driver to be used with that instance. Nowadays, there are three drivers for Splinter: firefox, chrome and zope.testbrowser. We are using Firefox, and you can easily use Chrome by simply changing the driver from firefox to chrome. It’s also very simple to add another driver to Splinter, and I plan to cover how to do that in another blog post here.

A new browser session starts when we get the browser object, and this is the object we use to interact with Firefox. Let's create a new event on Facebook, the Splinter Sprint. First of all, we need to visit the Facebook homepage. The Browser class has a visit method, so we can use it:

browser.visit("https://www.facebook.com")

visit is a blocking operation: it waits for the page to load, and then we can navigate, click on links, fill in forms, etc. Now we have the Facebook homepage open in the browser, and you probably know that we need to log in, but what if we are already logged in? So, let's create a method that logs in to Facebook with the provided credentials only if the user is not logged in yet (imagine we are in a TestCase class):

def do_login_if_need(self, username, password):
    if self.browser.is_element_present_by_css('div.menu_login_container'):
        self.browser.fill('email', username)
        self.browser.fill('pass', password)
        self.browser.find_by_css('div.menu_login_container input[type="submit"]').first.click()
        assert self.browser.is_element_present_by_css('li#navAccount')

What happened here? First of all, the method checks whether an element is present on the page, using a CSS selector: it looks for the div that contains the username and password fields. If that div is present, we tell the browser object to fill in those fields, then find the submit button and click on it. The last line is an assert that guarantees the login was successful and the current page is the Facebook homepage (by checking for the presence of the “Account” li).

We could also find elements by their text, labels or whatever appears on screen, but remember: Facebook is an internationalized web application, so we can’t rely on the text of one specific language.

Okay, now we know how to visit a web page, check whether an element is present, fill in a form and click on a button. We're also logged in to Facebook and can finally go ahead and create the Splinter sprint event. So, here is the event creation flow for a user:

1. On the Facebook homepage, click on the “Events” link in the left menu
2. The “Events” page will load, so click on the “Create an Event” button
3. The user sees a page with a form to create an event
4. Fill in the date and choose the time
5. Define the name of the event and where it will happen, and write a short description for it
6. Invite some guests
7. Upload a picture for the event
8. Click on the “Create Event” button

We are going to do all these steps except the 6th, because the Splinter Sprint will be a public event and we don’t need to invite anybody. Facebook makes some boring AJAX requests that we need to deal with, so the steps above take a bit more than plain Splinter code. The first step is to click on the “Events” link. All we need to do is find the link and click on it:

browser.find_by_css('li#navItem_events a').first.click()

The find_by_css method takes a CSS selector and returns an ElementList. So, we get the first element of the list (even when the selector matches a single element, the return type is still a list) and click on it. Like the visit method, click is a blocking operation: the driver only accepts new actions when the request has finished (the page is loaded).
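To make the "always a list, then .first" idea concrete, here is a simplified Python sketch. It is only an illustration of the pattern and not Splinter's actual ElementList; the MatchList name and the LookupError are assumptions made for this example.

```python
class MatchList(list):
    """A list of matches with a convenience accessor for the first one.

    Simplified illustration only; Splinter's real ElementList does more.
    """

    @property
    def first(self):
        if not self:
            # Hypothetical error type, chosen just for this sketch.
            raise LookupError("no elements matched the selector")
        return self[0]


single_match = MatchList(["<a href='/events'>"])
link = single_match.first
```

Returning a list in every case means the calling code looks the same whether the selector matched one element or many.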

We’re finally on the "new event" page, and there is a form on screen waiting for the Splinter Sprint data. Let’s fill it in. Here is the code for it:

browser.fill('event_startIntlDisplay', '5/21/2011')
browser.select('start_time_min', '480')
browser.fill('name', 'Splinter sprint')
browser.fill('location', 'Rio de Janeiro, Brazil')
browser.fill('desc', 'For more info, check out the #cobratem channel on freenode!')

That is it: the event is going to happen on May 21st, 2011, at 8:00 in the morning (480 minutes after midnight). As we know, the event name is Splinter sprint, and we are going to join some guys down here in Brazil. We filled out the form using the fill and select methods.
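The value 480 is simply the start time expressed in minutes. A small Python sketch of the conversion, assuming (as the 8:00 example suggests) that the form counts minutes from midnight; the helper name is made up:

```python
from datetime import time


def minutes_to_time(minutes_since_midnight):
    """Convert a count of minutes since midnight into a time of day."""
    hours, minutes = divmod(minutes_since_midnight, 60)
    return time(hour=hours, minute=minutes)


start = minutes_to_time(480)  # 480 // 60 == 8 hours, 0 minutes left over
```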

The fill method is used to fill a "fillable" field (a textarea, an input, etc.). It receives two strings: the first is the name of the field to fill and the second is the value that will fill the field. select is used to select an option in a select element (a “combo box”). It also receives two string parameters: the first is the name of the select element, and the second is the value of the option being selected.

Imagine you have the following select element:

<select name="gender">
    <option value="m">Male</option>
    <option value="f">Female</option>
</select>

To select “Male”, you would call the select method this way:

browser.select("gender", "m")

The last action before clicking on the “Create Event” button is to upload a picture for the event. On the new event page, Facebook loads the file field for the picture upload inside an iframe, so we need to switch to this frame and interact with the form inside it. To show the frame, we need to click on the “Add Event Photo” button and then switch to it. We already know how to click on a link:

browser.find_by_css('div.eventEditUpload a.uiButton').first.click()

When we click this link, Facebook makes an asynchronous request, which means the driver does not block waiting for the request to finish, so if we try to interact with the frame BEFORE it appears, we will get an ElementDoesNotExist exception. Splinter provides the is_element_present_by_css method, which accepts an argument called wait_time: the time Splinter will wait for the element to appear on screen. If the element does not appear, we can’t go on, so we can assume the test failed (remember we are testing a Facebook feature):

if not browser.is_element_present_by_css('iframe#upload_pic_frame', wait_time=10):
    fail("The upload pic iframe didn't appear :(")

The is_element_present_by_css method takes a CSS selector and tries to find an element using it. It also receives a wait_time parameter that sets a timeout for the search. So, if the iframe element with the ID “upload_pic_frame” is not present and doesn’t appear on screen within 10 seconds, the method returns False; otherwise it returns True.

Important: fail is pseudocode and doesn’t exist (if you’re using the unittest library, you can call self.fail in a TestCase, which is exactly what I did in the complete snippet for this example, available on GitHub).

Now the iframe element is on screen and we can finally upload the picture. Imagine we have a variable called picture_path that contains the path of the picture (and not a file object, StringIO, or something like that). This is the code we need:

with browser.get_iframe('upload_pic_frame') as frame:
    frame.attach_file('pic', picture_path)
    time.sleep(10)

Splinter provides the get_iframe method, which changes the context and returns another object for interacting with the content of the frame. On that object we call the attach_file method, which also receives two strings: the first is the name of the input element and the second is the absolute path of the file being sent. Facebook also uploads the picture asynchronously, but there is no element we can wait for on screen, so I just put Python to sleep for 10 seconds on the last line.
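When some observable condition does exist, a polling loop is usually a better fit than a fixed sleep, because it returns as soon as the condition holds. The sketch below shows the general idea behind a wait_time style parameter; it is not Splinter's implementation, and the wait_until helper and its parameters are hypothetical.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns True if the condition became truthy in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example condition that only becomes true on the third check.
checks = []


def ready_on_third_check():
    checks.append(1)
    return len(checks) >= 3


became_ready = wait_until(ready_on_third_check, timeout=1.0, interval=0.01)
```

The condition is always checked at least once, even with a timeout of zero, so a condition that already holds never costs a sleep.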

After finishing all these steps, we can finally click on the “Create Event” button and assert that Facebook created the event:

browser.find_by_css('label.uiButton input[type="submit"]').first.click()
title = browser.find_by_css('h1 span').first.text
assert title == 'Splinter sprint'

After creating an event, Facebook redirects the browser to the event page, so we can check that it really happened by asserting on the header of the page. That’s what the code above does: on the new event page, it clicks on the submit button and, after the redirect, gets the text of a span element and asserts that this text equals “Splinter sprint”.

That is it! This post was an overview of the Splinter API. Check out the complete snippet, written as a test case, and also check out the Splinter repository on GitHub.

Killer Java applications server with nginx and memcached

2019-12-10T03:42:20+00:00

Over the last few days I worked on setting up a new web serving structure for Wine, the largest wine e-commerce site in Latin America. After testing, studying and learning a lot, we built a nice solution based on nginx and memcached. I will use a picture to describe the architecture:

As you can see, when a client makes a request to the nginx server, nginx first checks memcached to see whether the response is already cached. If the response is not found on the cache server, nginx forwards the request to Tomcat, which processes the request, caches the response in memcached and returns it to nginx. Tomcat works only for the first client; all other clients requesting the same resource get the cached response straight from RAM. My goal with this post is to show how we built this architecture.

nginx

nginx was compiled following the Linode instructions for installing nginx from source. The only difference is that we added the nginx memcached module. So, first I downloaded the memc_module source from GitHub and then built nginx with it. Here are the commands for compiling nginx with the memcached module:

% ./configure --prefix=/opt/nginx --user=nginx --group=nginx --with-http_ssl_module --add-module={your memc_module source path}
% make
% sudo make install

After installing nginx and creating an init script for it, we can work on its settings for the integration with Tomcat. To keep the settings separate, we changed the nginx.conf file (located in the /opt/nginx/conf directory), and it now looks like this:

user  nginx;
worker_processes  1;

error_log  logs/error.log;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                                '$status $body_bytes_sent "$http_referer" '
                                '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    include /opt/nginx/sites-enabled/*;
}

See the last line inside the http section: it tells nginx to include all settings present in the /opt/nginx/sites-enabled directory. So now let’s create a file called default in this directory, with this content:

server {
    listen       80;
    server_name  localhost;

    default_type  text/html;

    location / {
        proxy_set_header    X-Real-IP   $remote_addr;
        proxy_set_header    Host        $http_host;
        proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;

        if ($request_method = POST) {
            proxy_pass      http://localhost:8080;
            break;
        }

        set $memcached_key   "$uri";
        memcached_pass      127.0.0.1:11211;

        error_page  501 404 502 = /fallback$uri;
    }

    location /fallback/ {
        internal;    

        proxy_set_header    X-Real-IP   $remote_addr;
        proxy_set_header    Host        $http_host;
        proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect      off;

        proxy_pass          http://localhost:8080;
    }

}

A few things must be explained here. The default_type directive is necessary for serving cached responses properly (if you cache other content types, like application/json or application/xml, you should take a look at the nginx documentation and handle content types conditionally). The location / scope defines some proxy settings, like IP and host. We did that because we need to pass the right information on to our backend (Tomcat or memcached). See more about proxy_set_header in the nginx documentation. After that, there is a simple check of the request method: we don’t want to cache POST requests, so they go straight to Tomcat.

Now comes the magic: first we set $memcached_key to the request URI, and then we use the memcached_pass directive. memcached_pass is very similar to proxy_pass: nginx “proxies” the request to memcached, so we can get an HTTP status code back, like 200, 404 or 502. The error_page directive sends 501, 404 and 502 to the fallback location; the two that matter most here are:

404: the memcached module returns a 404 error when the key is not on the memcached server;
502: the memcached module returns a 502 error when it can’t reach the memcached server.

So, when nginx gets any of those errors, it should forward the request to Tomcat through another proxy. We configured that in fallback, an internal location that proxies between nginx and Tomcat (listening on port 8080). That is everything on the nginx side. As you can see in the picture or in the nginx configuration file, nginx doesn’t write anything to memcached, it only reads from it. The application is the one that should write to memcached. Let’s do it.
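The request flow described so far can be sketched in a few lines of Python. This is only a simulation of the idea, not nginx or memcached code: the cache is a plain dict, and handle_request and backend are hypothetical names standing in for the nginx location logic and for Tomcat.

```python
def handle_request(method, uri, cache, backend):
    """Mimic the proxy flow: read the cache for non-POST requests, else go to the backend.

    Returns a (body, source) pair so the caller can see who answered.
    """
    if method == "POST":
        return backend(method, uri), "backend"  # POST requests are never served from cache
    cached = cache.get(uri)  # the URI is the cache key
    if cached is not None:
        return cached, "cache"
    # Cache miss (the 404 case): fall back to the application server.
    return backend(method, uri), "backend"


cache = {}


def backend(method, uri):
    """Stand-in for the application server behind the proxy."""
    body = "<html>response for %s</html>" % uri
    if method != "POST":
        cache[uri] = body  # the application, not the proxy, writes to the cache
    return body


first = handle_request("GET", "/users", cache, backend)
second = handle_request("GET", "/users", cache, backend)
posted = handle_request("POST", "/users", cache, backend)
```

Note that handle_request never writes to the cache, which mirrors the read-only role of the proxy in this setup.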

Java application

Now it is time to write some code. I chose an application written by a friend of mine. It’s a very simple CRUD of users, built by Washington Botelho with the goal of introducing VRaptor, a powerful web framework focused on fast development. Washington also wrote a blog post explaining the application; if you don’t know VRaptor or want to know how the application was built, check the blog post "Getting started with VRaptor 3". I forked the application, made some minor changes and added a magic filter for caching. The only Java code I want to show here is the filter code:


package com.franciscosouza.memcached.filter;

import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.InetSocketAddress;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

import net.spy.memcached.MemcachedClient;

/**
 * Servlet Filter implementation class MemcachedFilter
 */
public class MemcachedFilter implements Filter {

    private MemcachedClient mmc;

    static class MemcachedHttpServletResponseWrapper extends HttpServletResponseWrapper {

        private StringWriter sw = new StringWriter();

        public MemcachedHttpServletResponseWrapper(HttpServletResponse response) {
            super(response);
        }

        public PrintWriter getWriter() throws IOException {
            return new PrintWriter(sw);
        }

        public ServletOutputStream getOutputStream() throws IOException {
            throw new UnsupportedOperationException();
        }

        public String toString() {
            return sw.toString();
        }
    }

    /**
     * Default constructor.
     */
    public MemcachedFilter() {
    }

    /**
     * @see Filter#destroy()
     */
    public void destroy() {
    }

    /**
     * @see Filter#doFilter(ServletRequest, ServletResponse, FilterChain)
     */
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        MemcachedHttpServletResponseWrapper wrapper = new MemcachedHttpServletResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);

        HttpServletRequest inRequest = (HttpServletRequest) request;
        HttpServletResponse inResponse = (HttpServletResponse) response;

        String content = wrapper.toString();

        PrintWriter out = inResponse.getWriter();
        out.print(content);

        if (!inRequest.getMethod().equals("POST")) {
            String key = inRequest.getRequestURI();
            mmc.set(key, 5, content);
        }
    }

    /**
     * @see Filter#init(FilterConfig)
     */
    public void init(FilterConfig fConfig) throws ServletException {
        try {
            mmc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        } catch (IOException e) {
            e.printStackTrace();
            throw new ServletException(e);
        }
    }
}

First, the dependency: for memcached communication, we used the spymemcached client, a simple and easy to use memcached library. I won’t explain the code line by line, but I can tell you the idea behind it: first, we call the doFilter method on the FilterChain, because we want to get the response and work with it. Take a look at the MemcachedHttpServletResponseWrapper instance: it wraps the response and makes it easier to work with the response content.

We get the content, write it to the response writer and put it in the cache using the MemcachedClient provided by spymemcached. The request URI is the key and the expiration time is 5 seconds.
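To illustrate what an expiration time means for the cached pages, here is a minimal in-memory Python sketch. It is not spymemcached or memcached itself: the ExpiringCache class is hypothetical, and the injectable clock exists only so the behaviour can be checked without really waiting 5 seconds.

```python
import time


class ExpiringCache:
    """Tiny in-memory cache whose entries disappear after a time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def set(self, key, ttl_seconds, value):
        # Store the moment the entry stops being valid next to the value.
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as a cache miss
            return None
        return value


# A fake clock we can move forward by hand.
now = [0.0]
pages = ExpiringCache(clock=lambda: now[0])
pages.set("/users", 5, "<html>users list</html>")
```

Once the entry expires, the next request for that URI is a miss again, so the application server rebuilds the page and stores it once more.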

web.xml

The last step is to add the filter to the project’s web.xml file. Mapping it before the VRaptor filter is very important for it to work properly:


<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" id="WebApp_ID" version="2.5">
    <display-name>memcached sample</display-name>

    <filter>
        <filter-name>vraptor</filter-name>
        <filter-class>br.com.caelum.vraptor.VRaptor</filter-class>
    </filter>
    
    <filter>
        <filter-name>memcached</filter-name>
        <filter-class>com.franciscosouza.memcached.filter.MemcachedFilter</filter-class>
    </filter>
    
    <filter-mapping>
        <filter-name>memcached</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

    <filter-mapping>
        <filter-name>vraptor</filter-name>
        <url-pattern>/*</url-pattern>
        <dispatcher>FORWARD</dispatcher>
        <dispatcher>REQUEST</dispatcher>
    </filter-mapping>

</web-app>

That is it! Now you can just run Tomcat on port 8080 and nginx on port 80, and access http://localhost in your browser. Try it out: raise the cache timeout, navigate through the application and turn off Tomcat. You will still be able to navigate to some pages that use the GET request method (users list, home and users form).

Check the entire code out on Github: https://github.com/fsouza/starting-with-vraptor-3. If you have any questions, troubles or comments, please let me know! ;)

Flying with tipfy on Google App Engine

2019-12-10T03:42:20+00:00

Hooray, there is a bonus part in the series (after a looooooooooooong wait)! In the first blog post, about Django, I received a comment about the use of tipfy, a small Python web framework made specifically for Google App Engine. Like Flask, tipfy is not a full stack framework and we will not use a database abstraction layer, we will use just the Google App Engine Datastore API, but tipfy was designed for Google App Engine, so it is less laborious to work with tipfy on App Engine.

First, we have to download tipfy. There are two options on the official tipfy page: an all-in-one package and a do-it-yourself package. I am lazy, so I downloaded and used the all-in-one package. It is that easy:

% wget http://www.tipfy.org/tipfy.build.tar.gz
% tar -xvzf tipfy.build.tar.gz
% mv project gaeseries

After it, we go to the project folder and see the project structure provided by tipfy. There is a directory called "app", where the App Engine app is located. The app.yaml file is in the app directory, so we open that file and change the application id and the application version. Here is the app.yaml file:

application: gaeseries
version: 4
runtime: python
api_version: 1

derived_file_type:
- python_precompiled

handlers:
- url: /(robots\.txt|favicon\.ico)
  static_files: static/\1
  upload: static/(.*)

- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

- url: /_ah/queue/deferred
  script: main.py
  login: admin

- url: /.*
  script: main.py

After this, we can start to code our application. tipfy deals with requests using handlers. A handler is a class that has methods to deal with different kinds of requests. That reminds me a little of Struts Actions (blergh), but tipfy is a Python framework, which means that it is easier to build a web application with it!

Understanding tipfy: a URL is mapped to a handler that does something with the request and returns a response. So, we have to create two handlers: one to list the posts and another to create a post. But first let’s create an application called blog and a model called Post. Like Django, Flask and web2py, tipfy also works with applications inside a project.
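The idea of mapping a URL to a handler class, and the HTTP method to a method of that class, can be sketched in plain Python. This is not tipfy's routing code: the ROUTES table, the dispatch function, the /posts/new path and the returned strings are all assumptions made for illustration.

```python
class PostListingHandler:
    def get(self):
        return "list of posts"


class NewPostHandler:
    def get(self):
        return "new post form"

    def post(self):
        return "post created"


# Hypothetical routing table: URL path -> handler class.
ROUTES = {
    "/posts": PostListingHandler,
    "/posts/new": NewPostHandler,
}


def dispatch(method, path):
    """Find the handler for `path` and call the method named after the HTTP verb."""
    handler_class = ROUTES.get(path)
    if handler_class is None:
        return 404, "not found"
    handler_method = getattr(handler_class(), method.lower(), None)
    if handler_method is None:
        return 405, "method not allowed"
    return 200, handler_method()
```

A handler that defines only get, like the listing handler here, simply does not answer POST requests.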

To create an application, we just need to create a new Python package with the application name:

% mkdir blog
% touch blog/__init__.py

After creating the application structure, we install it by adding the application to the "apps_installed" list in the config.py file:

# -*- coding: utf-8 -*-
"""
    config
    ~~~~~~

    Configuration settings.

    :copyright: 2009 by tipfy.org.
    :license: BSD, see LICENSE for more details.
"""
config = {}

# Configurations for the 'tipfy' module.
config['tipfy'] = {
    # Enable debugger. It will be loaded only in development.
    'middleware': [
        'tipfy.ext.debugger.DebuggerMiddleware',
    ],
    # Enable the Hello, World! app example.
    'apps_installed': [
        'apps.hello_world',
        'apps.blog',
    ],
}

See the 'apps.blog' entry (line 22 of the file). Inside the application folder, let’s create a Python module called models.py. This module is exactly the same as the one in the Flask post:

from google.appengine.ext import db

class Post(db.Model):
    title = db.StringProperty(required = True)
    content = db.TextProperty(required = True)
    when = db.DateTimeProperty(auto_now_add = True)
    author = db.UserProperty(required = True)

After creating the model, let’s start building the project by creating the post listing handler. The handlers will be in a module called handlers.py, inside the application folder. Here is the handlers.py code:

# -*- coding: utf-8 -*-
from tipfy import RequestHandler
from tipfy.ext.jinja2 import render_response
from models import Post

class PostListingHandler(RequestHandler):
    def get(self):
        posts = Post.all()
        return render_response('list_posts.html', posts=posts)

See that we get all posts from the database and send them to the list_posts.html template. Like Flask, tipfy uses Jinja2 as its template engine by default. Following the same approach, let’s create a base.html file that represents the layout of the project. This file should be inside the templates folder and contain the following code:

<html>
    <head>
      <meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
      <title>{% block title %}{% endblock %}</title>
    </head>
    <body id="">
        {% block content %}{% endblock %}
    </body>
</html>

And now we can create the list_posts.html template extending the base.html template:

{% extends "base.html" %}

{% block title %}
    Posts list
{% endblock %}

{% block content %}
    Listing all posts:

    <ul>
        {% for post in posts %}
            <li>
                {{ post.title }} (written by {{ post.author.nickname() }})
                {{ post.content }}
            </li>
        {% endfor %}
    </ul>
{% endblock %}

Can we access the list of posts now by the URL? No, we can’t yet. Now we have to map the handler to a URL, and we will be able to access the list of posts through the browser. On tipfy, all URL mappings of an application are located in a Python module called urls.py. Create it with the following code:

from tipfy import Rule

def get_rules(app):
    rules = [
        Rule('/posts', endpoint='post-listing', handler='apps.blog.handlers.PostListingHandler'),
    ]

    return rules

It is very simple: a Python module containing a function called get_rules, which receives the app object as a parameter and returns a list containing the rules of the application (each rule is an instance of the tipfy.Rule class). Now we can finally see the empty post list in the browser, by running the App Engine development server and opening the http://localhost:8080/posts URL. Run the following command in the project root:

% /usr/local/google_appengine/dev_appserver.py app

Then check the browser at http://localhost:8080/posts, where we see the empty list. Now, let’s create the protected handler that will create a new post. tipfy has an auth extension, which makes it very easy to deal with authentication using the native Google App Engine users API. To use it, we need to configure the session extension by adding the following lines to the config.py module:

config['tipfy.ext.session'] = {
    'secret_key' : 'just_dev_testH978DAGV9B9sha_W92S',
}

Now we are ready to create the NewPostHandler. We will need to deal with forms, and tipfy has an extension for integration with WTForms, so we have to download and install WTForms and that extension in the project:

% wget http://bitbucket.org/simplecodes/wtforms/get/tip.tar.bz2
% tar -xvf tip.tar.bz2
% cp -r wtforms/wtforms/ ~/Projetos/gaeseries/app/lib/
% wget http://pypi.python.org/packages/source/t/tipfy.ext.wtforms/tipfy.ext.wtforms-0.6.tar.gz
% tar -xvzf tipfy.ext.wtforms-0.6.tar.gz
% cp -r tipfy.ext.wtforms-0.6/tipfy ~/Projetos/gaeseries/app/distlib

Now we have the WTForms extension installed and ready to be used. Let’s create the PostForm class, and then create the handler. I put both classes in the handlers.py file (yeah, including the form). Here is the PostForm class code:

class PostForm(Form):
    csrf_protection = True
    title = fields.TextField('Title', validators=[validators.Required()])
    content = fields.TextAreaField('Content', validators=[validators.Required()])

Add this class to the handlers.py module:

class NewPostHandler(RequestHandler, AppEngineAuthMixin, AllSessionMixins):
    middleware = [SessionMiddleware]

    @login_required
    def get(self, **kwargs):
        return render_response('new_post.html', form=self.form)

    @login_required
    def post(self, **kwargs):
        if self.form.validate():
            post = Post(
                title = self.form.title.data,
                content = self.form.content.data,
                author = self.auth_session
            )
            post.put()
            return redirect('/posts')
        return self.get(**kwargs)

    @cached_property
    def form(self):
        return PostForm(self.request)

A lot is new here: first, tipfy makes use of Python’s multiple inheritance. If you want to use the auth extension with the native App Engine users API, you have to create your handler class extending the AppEngineAuthMixin and AllSessionMixins classes, and add the SessionMiddleware class to the middleware list. See more in the tipfy docs.
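For readers less familiar with mixins and cached properties, here is a small standalone Python sketch of both ideas. It does not use tipfy: the class names are made up, and functools.cached_property (Python 3.8+) is the standard library decorator used here as a stand-in for the cached_property seen in the handler above.

```python
from functools import cached_property


class AuthMixin:
    """Adds user lookup to any class that exposes a `session` attribute."""

    def current_user(self):
        return self.session.get("user")


class SessionMixin:
    @cached_property
    def session(self):
        # Computed on first access, then stored on the instance and reused.
        self.session_loads = getattr(self, "session_loads", 0) + 1
        return {"user": "alice"}


class BaseHandler:
    def describe(self):
        return type(self).__name__


class ProfileHandler(BaseHandler, AuthMixin, SessionMixin):
    """Combines the base class with both mixins through multiple inheritance."""


handler = ProfileHandler()
user = handler.current_user()
same_user = handler.current_user()  # does not rebuild the session
```

Each mixin contributes one small capability, and the handler class only has to list them in its bases to get all of them.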

The last step is to create the new_post.html template and deploy the application. Here is the new_post.html template code:

{% extends "base.html" %}

{% block title %}
    New post
{% endblock %}

{% block content %}
    <form action="" method="post" accept-charset="utf-8">
        <p>
            <label for="title">{{ form.title.label }}</label>

            {{ form.title|safe }}

            {% if form.title.errors %}
            <ul class="errors">
                {% for error in form.title.errors %}
                <li>{{ error }}</li>
                {% endfor %}
            </ul>
            {% endif %}
        </p>
        <p>
            <label for="content">{{ form.content.label }}</label>

            {{ form.content|safe }}

            {% if form.content.errors %}
            <ul class="errors">
                {% for error in form.content.errors %}
                <li>{{ error }}</li>
                {% endfor %}
            </ul>
            {% endif %}
        </p>
        <p><input type="submit" value="Save post"/></p>
    </form>
{% endblock %}

Now, we can deploy the application on Google App Engine by simply running this command:

% /usr/local/google_appengine/appcfg.py update app

And you can check the deployed application live here: http://4.latest.gaeseries.appspot.com.

The code is available at Github: https://github.com/fsouza/gaeseries/tree/tipfy.

Flying with Flask on Google App Engine

2019-12-10T03:42:20+00:00

A little late, I finally present the third part of the series on using Python frameworks on Google App Engine. I wrote before about web2py and Django, and now it is time for Flask, a Python microframework based on Werkzeug, Jinja2 and good intentions. Unlike Django and web2py, Flask is not a full-stack framework: it has no database abstraction layer or object-relational mapper, so it is totally decoupled from the model layer. That is really good, because we can use the power of SQLAlchemy when working with relational databases, and the native API when working with non-relational databases.

Flask is a microframework, which means that we have more power to customize our applications, but it is also a little more painful to build one, because the framework is not a father that does about 10 billion things for us: it is simple, but still fun! As Flask has no data abstraction layer, we will use the BigTable API directly.

So, as in the other parts of the series, the sample application will be a very simple blog, with a public view listing all posts and another, login-protected view used for writing posts. The first step is to set up the environment. It is very simple, but a little laborious: first we create an empty directory and put the app.yaml file inside it (yes, we will build everything from scratch). Here is the app.yaml code:

application: gaeseries
version: 3
runtime: python
api_version: 1

handlers:
- url: .*
  script: main.py

We just set the application ID, the version and the URL handlers. We will handle all requests in the main.py file. Later in this post, I will show the main.py module, the script that connects Flask to Google App Engine. Now, let’s create the Flask application, and deal with App Engine later :)

Now we need to install Flask inside the application, so we get Flask from Github (I used version 0.6), extract it and take the flask subdirectory from inside the extracted directory. Because Flask depends on Werkzeug and Jinja2, and Jinja2 depends on simplejson, you need to get these libraries and install them in your application too. Here is how you can get everything:

% wget http://github.com/mitsuhiko/flask/zipball/0.6
% unzip mitsuhiko-flask-0.6-0-g5cadd9d.zip
% cp -r mitsuhiko-flask-5cadd9d/flask ~/Projetos/blog/gaeseries
% wget http://pypi.python.org/packages/source/W/Werkzeug/Werkzeug-0.6.2.tar.gz
% tar -xvzf Werkzeug-0.6.2.tar.gz
% cp -r Werkzeug-0.6.2/werkzeug ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/J/Jinja2/Jinja2-2.5.tar.gz
% tar -xvzf Jinja2-2.5.tar.gz
% cp -r Jinja2-2.5/jinja2 ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/s/simplejson/simplejson-2.1.1.tar.gz
% tar -xvzf simplejson-2.1.1.tar.gz
% cp -r simplejson-2.1.1/simplejson ~/Projetos/blog/gaeseries/

On my computer, the project is under ~/Projetos/blog/gaeseries; put all the downloaded tools in the root of your application. Now we have everything we need to start creating our Flask application, so let’s create a Python package called blog, which will be the application directory:

% mkdir blog
% touch blog/__init__.py

Inside the __init__.py module, we will create our Flask application and start to code. Here is the __init__.py code:

from flask import Flask
import settings

app = Flask('blog')
app.config.from_object('blog.settings')

import views

We imported two modules: settings and views. So we should create these two modules, where we will put the application settings and the application views (note that Flask follows Django here, calling the functions that receive a request and return a response “views”, instead of calling them “actions” like web2py does). Just create the files:

% touch blog/views.py
% touch blog/settings.py

Here is the settings.py sample code:

DEBUG=True
SECRET_KEY='dev_key_h8hfne89vm'
CSRF_ENABLED=True
CSRF_SESSION_LKEY='dev_key_h8asSNJ9s9=+'

Now is the time to define the model Post. We will define our models inside the application directory, in a module called models.py:

from google.appengine.ext import db

class Post(db.Model):
    title = db.StringProperty(required = True)
    content = db.TextProperty(required = True)
    when = db.DateTimeProperty(auto_now_add = True)
    author = db.UserProperty(required = True)

The last property is a UserProperty, a “foreign key” to a user. We will use the Google App Engine users API, so the datastore API provides this property to establish a relationship between custom models and the Google account model.

We have defined the model, and we can finally start to create the application’s views. Inside the views module, let’s create the public view with all posts, that will be accessed by the URL /posts:

from blog import app
from models import Post
from flask import render_template

@app.route('/posts')
def list_posts():
    posts = Post.all()
    return render_template('list_posts.html', posts=posts)

On the last line of the view, we called the function render_template, which renders a template. The first parameter of this function is the template to be rendered. We passed list_posts.html, so let’s create it using the Jinja2 syntax, which is inspired by Django templates. Inside the application directory, create a subdirectory called templates and put an HTML file called base.html inside it. That file will be the application layout, and here is its code:

<html>
    <head>
        <meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
        <title>{% block title %}Blog{% endblock %}</title>
    </head>
    <body>
        {% block content %}{% endblock %}
    </body>
</html>

And now create the list_posts.html template, with the following code:

{% extends "base.html" %}

{% block content %}
<ul>
    {% for post in posts %}
    <li>
        {{ post.title }} (written by {{ post.author.nickname() }})

        {{ post.content }}
    </li>
    {% endfor %}
</ul>
{% endblock %}

Now, to test it, we need to run the Google App Engine development server on localhost. The app.yaml file defined the main.py script as the handler for all requests, so to use the local development server we need to create the main.py file that runs our application. Every Flask application is a WSGI application, so we can use an App Engine tool for running WSGI applications. That makes the main.py script really simple:

from google.appengine.ext.webapp.util import run_wsgi_app
from blog import app

run_wsgi_app(app)

The script uses the run_wsgi_app function provided by webapp, the built-in Google Python web framework for App Engine. Now we can run the application in the same way that we ran it in the web2py post:

% /usr/local/google_appengine/dev_appserver.py .
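
As background for the WSGI remark above: a WSGI application is just a callable that takes the request environ and a start_response function and returns an iterable of byte strings. The following standard-library sketch (written for Python 3, unlike the Python 2 runtime used in this post) calls such a callable by hand, which is roughly the job a WSGI server does for us. hello_app and call_app are hypothetical names used only for this illustration.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    # A WSGI application: receives the request environ, reports the status
    # and headers through start_response, and returns the body as bytes.
    body = ("Hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, path):
    # Plays the role of the server: builds an environ and collects the reply.
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(hello_app, "/posts")
print(status, body)
```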

And if you access the URL http://localhost:8080/posts in your browser, you will see a blank page, simply because there are no posts in the database. Now we will create a login-protected view to write a post and save it to the database. Google App Engine does not provide a decorator to check whether a user is logged in, and neither does Flask. So, let’s create a function decorator called login_required and decorate the new_post view with it. I created the decorator inside a decorators.py module and imported it in the views.py module. Here is the decorators.py code:

from functools import wraps
from google.appengine.api import users
from flask import redirect, request

def login_required(func):
    @wraps(func)
    def decorated_view(*args, **kwargs):
        if not users.get_current_user():
            return redirect(users.create_login_url(request.url))
        return func(*args, **kwargs)
    return decorated_view
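
One detail worth seeing in isolation is the @wraps(func) line. Flask derives the default endpoint name of a view from the function's __name__, which is what url_for('new_post') looks up, so a decorator that hides that name would cause trouble. The sketch below uses only the standard library; without_wraps and with_wraps are hypothetical decorators written for this comparison.

```python
from functools import wraps


def without_wraps(func):
    # The wrapper keeps its own name, hiding the decorated function's name.
    def decorated_view(*args, **kwargs):
        return func(*args, **kwargs)
    return decorated_view


def with_wraps(func):
    # wraps() copies __name__, __doc__ and similar metadata onto the wrapper.
    @wraps(func)
    def decorated_view(*args, **kwargs):
        return func(*args, **kwargs)
    return decorated_view


@without_wraps
def new_post_plain():
    return "form"


@with_wraps
def new_post():
    return "form"


print(new_post_plain.__name__)  # the wrapper's name
print(new_post.__name__)        # the original view's name
```

Without wraps, every decorated view would report the same name, and two protected views could no longer be told apart by name.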

In the new_post view we will deal with forms. IMO, WTForms is the best way to deal with forms in Flask. There is a Flask extension called Flask-WTF, and we can install it in our application to make dealing with forms easier. Here is how we can install WTForms and Flask-WTF:

% wget http://pypi.python.org/packages/source/W/WTForms/WTForms-0.6.zip
% unzip WTForms-0.6.zip
% cp -r WTForms-0.6/wtforms ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/F/Flask-WTF/Flask-WTF-0.2.3.tar.gz
% tar -xvzf Flask-WTF-0.2.3.tar.gz
% cp -r Flask-WTF-0.2.3/flaskext ~/Projetos/blog/gaeseries/

Now we have installed WTForms and Flask-WTF, and we can create a new WTForm with two fields: title and content. Remember that the date and author will be filled in automatically with the current datetime and the current user. Here is the PostForm code (I put it inside the views.py file, but it is possible to put it in a separate forms.py file):

from flaskext import wtf
from flaskext.wtf import validators

class PostForm(wtf.Form):
    title = wtf.TextField('Title', validators=[validators.Required()])
    content = wtf.TextAreaField('Content', validators=[validators.Required()])

Now we can create the new_post view:

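# Additional imports needed at the top of views.py for this view
from flask import flash, redirect, url_for
from google.appengine.api import users
from decorators import login_required

@app.route('/posts/new', methods = ['GET', 'POST'])
@login_required
def new_post():
    form = PostForm()
    if form.validate_on_submit():
        post = Post(title = form.title.data,
                    content = form.content.data,
                    author = users.get_current_user())
        post.put()
        flash('Post saved on database.')
        return redirect(url_for('list_posts'))
    return render_template('new_post.html', form=form)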

All we need now is the new_post.html template. Here is its code:

{% extends "base.html" %}

{% block content %}
    <h1>Write a post</h1>
    <form action="{{ url_for('new_post') }}" method="post" accept-charset="utf-8">
        {{ form.csrf_token }}
        <p>
            <label for="title">{{ form.title.label }}</label>

            {{ form.title|safe }}

            {% if form.title.errors %}
            <ul class="errors">
                {% for error in form.title.errors %}
                <li>{{ error }}</li>
                {% endfor %}
            </ul>
            {% endif %}
        </p>
        <p>
            <label for="content">{{ form.content.label }}</label>

            {{ form.content|safe }}

            {% if form.content.errors %}
            <ul class="errors">
                {% for error in form.content.errors %}
                <li>{{ error }}</li>
                {% endfor %}
            </ul>
            {% endif %}
        </p>
        <p><input type="submit" value="Save post"/></p>
    </form>
{% endblock %}

Now everything is working. We can run the Google App Engine local development server and access the URL http://localhost:8080/posts/new in the browser, then write a post and save it! Everything is ready to deploy, and the deploy process is the same as for web2py. Just run in the terminal:

% /usr/local/google_appengine/appcfg.py update .

And now the application is online :) Check this out: http://3.latest.gaeseries.appspot.com (use your Google Account to write posts).

You can also check the code out in Github: https://github.com/fsouza/gaeseries/tree/flask.

Flying with web2py on Google App Engine

2019-12-10T03:42:20+00:00

Here is the second part of the series about Python frameworks under Google App Engine. Now we will talk about web2py, a simple and fast Python web framework. Like Django, web2py has a great data abstraction layer. Unlike Django, the web2py data abstraction layer (DAL) was designed to manage non-relational databases, including BigTable.

The first step is to set up the environment, which is really easy ;) First, go to the official web2py website and, in the download section, get the source code in a zip file called web2py_src.zip. After downloading this file, extract it. A directory called web2py will be created; I renamed it to web2py_blog, but that is not relevant. The extracted web2py directory is ready for Google App Engine: it contains an app.yaml file with the application settings. For the application developed here, the following file was used:

application: gaeseries
version: 2
api_version: 1
runtime: python

handlers:

- url: /(?P<a>.+?)/static/(?P<b>.+)
  static_files: applications/\1/static/\2
  upload: applications/(.+?)/static/(.+)
  secure: optional
  expiration: "90d"

- url: /admin-gae/.*
  script: $PYTHON_LIB/google/appengine/ext/admin
  login: admin
 
- url: /_ah/queue/default
  script: gaehandler.py
  login: admin

- url: .*
  script: gaehandler.py  
  secure: optional

skip_files: |
^(.*/)?(
 (app\.yaml)|
 (app\.yml)|
 (index\.yaml)|
 (index\.yml)|
 (#.*#)|
 (.*~)|
 (.*\.py[co])|
 (.*/RCS/.*)|
 (\..*)|
 ((admin|examples|welcome)\.tar)|
 (applications/(admin|examples)/.*)|
 (applications/.*?/databases/.*) |
 (applications/.*?/errors/.*)|
 (applications/.*?/cache/.*)|
 (applications/.*?/sessions/.*)|
 )$

I changed only the first two lines; everything else was provided by web2py. The web2py project contains a subdirectory called applications where the web2py applications are located. There is an application called welcome that is used as scaffolding to build new applications. So, let’s copy this directory and rename it to blog. Now we can follow the same path as in the Django post: we will use two actions in a controller, one protected by login, where we will save posts, and a public one, where we will list all posts.

We need to define our table model using the web2py database abstraction layer. There is a directory called models with a file called db.py inside the application directory (blog). There is a lot of code in this file, and it is already configured to use Google App Engine (web2py is amazing here) and the web2py built-in authentication tool. We will just add our Post model at the end of the file. Here is the code that defines the model:

current_user_id = (auth.user and auth.user.id) or 0

db.define_table('posts', db.Field('title'),
                    db.Field('content', 'text'),
                    db.Field('author', db.auth_user, default=current_user_id, writable=False),
                    db.Field('date', 'datetime', default=request.now, writable=False)
                )

db.posts.title.requires = IS_NOT_EMPTY()
db.posts.content.requires = IS_NOT_EMPTY()

This code looks a little strange, but it is very simple: we define a database table called posts with four fields: title (a varchar – the default type), content (a text), author (a ~~foreign key~~ – forget this in BigTable – to the auth_user table) and date (an automatically filled datetime field). On the last two lines, we define two validations for this model: title and content should not be empty.
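
For readers new to web2py validators: a validator is a callable that returns a (value, error) pair, where error is None when the value is accepted. The sketch below imitates that contract in plain Python. is_not_empty is a hypothetical stand-in, not the real IS_NOT_EMPTY class, and the error message is a placeholder.

```python
def is_not_empty(error_message="cannot be empty"):
    # Hypothetical stand-in that imitates the (value, error) contract.
    def validate(value):
        if value is None or str(value).strip() == "":
            return value, error_message
        return value, None
    return validate


# Mirrors the two "requires" lines of the model above.
requires = {"title": is_not_empty(), "content": is_not_empty()}
submitted = {"title": "Hello", "content": "   "}

errors = {}
for field, validator in requires.items():
    _, error = validator(submitted.get(field))
    if error:
        errors[field] = error
print(errors)
```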

Now it is time to define a controller with an action to list all posts registered in the database. Another subdirectory of the blog application is the controllers directory, where we put the controllers. A web2py controller is a Python module, and each function of this module is an action, which responds to HTTP requests. web2py has an automatic URL convention for actions: /<application>/<controller>/<action>. In our example, we will have a controller called posts, so it will be a file called posts.py inside the controllers directory.
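
The URL convention can be illustrated with a few lines of Python. This is only a sketch of the mapping described above, not web2py's actual router. split_route is a hypothetical helper, and the fallback to an action called index is an assumption that matches how the posts controller is used in this post.

```python
def split_route(path, default_action="index"):
    # Break "/<application>/<controller>/<action>" into its three parts.
    # Hypothetical helper for illustration; web2py's own routing handles
    # many more cases (extra arguments, extensions and so on).
    parts = [part for part in path.split("/") if part]
    if len(parts) < 2:
        raise ValueError("expected at least /<application>/<controller>")
    application, controller = parts[0], parts[1]
    action = parts[2] if len(parts) > 2 else default_action
    return application, controller, action


print(split_route("/blog/posts"))
print(split_route("/blog/posts/new"))
```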

In the posts.py controller, we will have the index action, so that when we access the URL /blog/posts we will see the list of posts. Here is the code of the index action:

def index():
    posts = db().select(db.posts.ALL)
    return response.render('posts/index.html', locals())

As you can see, it is just a few lines of code :) Now we need to make the posts/index.html view. The web2py view system allows the developer to use native Python code in templates, which means that the developer/designer has more power and possibilities. Here is the code of the posts/index.html view (it should be inside the views directory):

{{extend 'layout.html'}}
<h1>Listing all posts</h1>
<dl>
    {{for post in posts:}}
    <dt>{{=post.title}} (written by {{=post.author.first_name}})</dt>
    <dd>{{=post.content}}</dd>
    {{pass}}
</dl>

And now we can run the Google App Engine server locally by typing the following command inside the project root (I have the Google App Engine SDK extracted on my /usr/local/google_appengine):

% /usr/local/google_appengine/dev_appserver.py .

If you check the URL http://localhost:8080/blog/posts, then you will see that we have no posts in the database yet, so let’s create the login protected action that saves a post on the database. Here is the action code:

@auth.requires_login()
def new():
    form = SQLFORM(db.posts, fields=['title','content'])
    if form.accepts(request.vars, session):
        response.flash = 'Post saved.'
        redirect(URL('blog', 'posts', 'index'))
    return response.render('posts/new.html', dict(form=form))

Note that there is a decorator. web2py includes a complete authentication and authorization system, which includes an option for registering new users. So you can access the URL /blog/default/user/register and register yourself to write posts :) Here is the posts/new.html view code, which displays the form:

{{extend 'layout.html'}}

<h1>Save a new post</h1>
{{=form}}

After that, the application is ready to be deployed. To do so, run the following command in the project root:

% /usr/local/google_appengine/appcfg.py update .

And see the magic! :) You can check this application live here: http://2.latest.gaeseries.appspot.com/ (you can log in with the e-mail demo@demo.com and the password demo, or register yourself).

And the code here: https://github.com/fsouza/gaeseries/tree/web2py.

Using Juju to orchestrate CentOS-based cloud services

2019-12-10T03:42:20+00:00

Earlier this week I had the opportunity to meet Kyle MacDonald, head of Ubuntu Cloud, during FISL, and he was surprised when we told him we are using Juju with CentOS at Globo.com. Then I decided to write this post explaining how we came up with a patched version of Juju that allows us to have CentOS clouds managed by Juju.

For those who don't know Juju, it's a service orchestration tool focused on the devops "development method". It allows you to deploy services on clouds, on your local machine and even on bare metal machines (using Canonical's MAAS).

It's based on charms and very straightforward to use. Here is a very basic set of commands with which you can deploy a Wordpress related to a MySQL service:


% juju bootstrap
% juju deploy mysql
% juju deploy wordpress
% juju add-relation wordpress mysql
% juju expose wordpress

These commands will bootstrap the environment, setting up a bootstrap machine which will manage your services; deploy mysql and wordpress instances; add a relation between them; and expose the wordpress port. Then voilà, we have a wordpress deployed and ready to serve our posts. Amazing, huh?

But there is an issue: although you can install the juju command line tool on almost any OS (including Mac OS), right now you are able to deploy only Ubuntu-based services (you must use an Ubuntu instance or container).

To change this behavior, and enable Juju to spawn CentOS instances (and containers, if you have a CentOS lxc template), we need to develop and apply some changes to Juju and cloud-init. Juju uses cloud-init to spawn machines with the proper dependencies set up, and cloud-init is based on modules. All we need to do is add a module able to install rpm packages using yum.

cloud-init modules are Python modules whose file names start with cc_ and that implement a `handle` function (for example, a module called "yum_packages" would be written to a file called cc_yum_packages.py). So, here is the code for the module yum_packages:


import subprocess
import traceback

from cloudinit import CloudConfig, util

frequency = CloudConfig.per_instance


def yum_install(packages):
    cmd = ["yum", "--quiet", "--assumeyes", "install"]
    cmd.extend(packages)
    subprocess.check_call(cmd)


def handle(_name, cfg, _cloud, log, args):
    pkglist = util.get_cfg_option_list_or_str(cfg, "packages", [])

    if pkglist:
        try:
            yum_install(pkglist)
        except subprocess.CalledProcessError:
            log.warn("Failed to install yum packages: %s" % pkglist)
            log.debug(traceback.format_exc())
            raise

    return True

The module installs all packages listed in the cloud-init yaml file. If we wanted to install the `emacs-nox` package, we would write this yaml file and use it as user data for the instance:


#cloud-config
modules:
 - yum_packages
packages: [emacs-nox]
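
To see what the handle function works with, the sketch below mimics the configuration lookup in plain Python. The cfg dictionary stands for the parsed cloud-config file above. option_list_or_str is a hypothetical stand-in, written under the assumption that the cloud-init helper accepts either a single string or a list, and build_yum_command only assembles the command instead of running it.

```python
def option_list_or_str(cfg, key, default=None):
    # Hypothetical stand-in for the cloud-init helper used above: always
    # hand back a list, whether the YAML value was a string or a list.
    if key not in cfg or cfg[key] is None:
        return default
    value = cfg[key]
    if isinstance(value, str):
        return [value]
    return list(value)


def build_yum_command(packages):
    # Same arguments as yum_install above, but returned instead of executed.
    return ["yum", "--quiet", "--assumeyes", "install"] + list(packages)


# What the parsed cloud-config from the example would look like.
cfg = {"modules": ["yum_packages"], "packages": ["emacs-nox"]}

pkglist = option_list_or_str(cfg, "packages", [])
if pkglist:
    print(build_yum_command(pkglist))
```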

cloud-init already works on Fedora, with Python 2.7, but to work on CentOS 6, with Python 2.6, it needs a patch, because subprocess.CalledProcessError only accepts the extra output argument from Python 2.7 on:


--- cloudinit/util.py 2012-05-22 12:18:21.000000000 -0300
+++ cloudinit/util.py 2012-05-31 12:44:24.000000000 -0300
@@ -227,7 +227,7 @@
         stderr=subprocess.PIPE, stdin=subprocess.PIPE)
     out, err = sp.communicate(input_)
     if sp.returncode is not 0:
-        raise subprocess.CalledProcessError(sp.returncode, args, (out, err))
+        raise subprocess.CalledProcessError(sp.returncode, args)
     return(out, err)

I've packed up this module and this patch in an RPM package that must be pre-installed in the lxc template and AMI images. Now we need to change Juju to make it use the yum_packages module and include all the RPM packages that need to be installed when the machine is created.

In Juju, there is a class that is responsible for building and rendering the YAML file used by cloud-init. We can extend it and change only two methods: _collect_packages, which returns the list of packages that will be installed on the machine after it is spawned; and render, which returns the file itself. Here is our CentOSCloudInit class (within the patch):


diff -u juju-0.5-bzr531.orig/juju/providers/common/cloudinit.py juju-0.5-bzr531/juju/providers/common/cloudinit.py
--- juju-0.5-bzr531.orig/juju/providers/common/cloudinit.py 2012-05-31 15:42:17.480769486 -0300
+++ juju-0.5-bzr531/juju/providers/common/cloudinit.py 2012-05-31 15:55:13.342884919 -0300
@@ -324,3 +324,32 @@
             "machine-id": self._machine_id,
             "juju-provider-type": self._provider_type,
             "juju-zookeeper-hosts": self._join_zookeeper_hosts()}
+
+
+class CentOSCloudInit(CloudInit):
+
+    def _collect_packages(self):
+        packages = [
+            "bzr", "byobu", "tmux", "python-setuptools", "python-twisted",
+            "python-txaws", "python-zookeeper", "python-devel", "juju"]
+        if self._zookeeper:
+            packages.extend([
+                "zookeeper", "libzookeeper", "libzookeeper-devel"])
+        return packages
+
+    def render(self):
+        """Get content for a cloud-init file with appropriate specifications.
+
+        :rtype: str
+
+        :raises: :exc:`juju.errors.CloudInitError` if there isn't enough
+            information to create a useful cloud-init.
+        """
+        self._validate()
+        return format_cloud_init(
+            self._ssh_keys,
+            packages=self._collect_packages(),
+            repositories=self._collect_repositories(),
+            scripts=self._collect_scripts(),
+            data=self._collect_machine_data(),
+            modules=["ssh", "yum_packages", "runcmd"])

The other change we need is in the format_cloud_init function, to make it recognize the modules parameter that we used above and to tell cloud-init not to run apt-get (neither update nor upgrade). Here is the patch:


diff -ur juju-0.5-bzr531.orig/juju/providers/common/utils.py juju-0.5-bzr531/juju/providers/common/utils.py
--- juju-0.5-bzr531.orig/juju/providers/common/utils.py 2012-05-31 15:42:17.480769486 -0300
+++ juju-0.5-bzr531/juju/providers/common/utils.py 2012-05-31 15:44:06.605014021 -0300
@@ -85,7 +85,7 @@
 
 
 def format_cloud_init(
-    authorized_keys, packages=(), repositories=None, scripts=None, data=None):
+    authorized_keys, packages=(), repositories=None, scripts=None, data=None, modules=None):
     """Format a user-data cloud-init file.
 
     This will enable package installation, and ssh access, and script
@@ -117,8 +117,8 @@
         structure.
     """
     cloud_config = {
-        "apt-update": True,
-        "apt-upgrade": True,
+        "apt-update": False,
+        "apt-upgrade": False,
         "ssh_authorized_keys": authorized_keys,
         "packages": [],
         "output": {"all": "| tee -a /var/log/cloud-init-output.log"}}
@@ -136,6 +136,11 @@
     if scripts:
         cloud_config["runcmd"] = scripts
 
+    if modules:
+        cloud_config["modules"] = modules
+
     output = safe_dump(cloud_config)
     output = "#cloud-config\n%s" % (output)
     return output
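
The effect of the patch can be sketched without Juju installed. build_cloud_config below is a hypothetical, simplified stand-in for the patched format_cloud_init: it only builds the dictionary (the real function also handles repositories and machine data, and dumps the result as YAML), assuming the keys shown in the diff. The SSH key is a placeholder.

```python
def build_cloud_config(authorized_keys, packages=(), scripts=None,
                       modules=None):
    # Simplified stand-in for the patched function; key names follow the diff.
    cloud_config = {
        "apt-update": False,   # do not run apt-get update on CentOS
        "apt-upgrade": False,  # do not run apt-get upgrade on CentOS
        "ssh_authorized_keys": list(authorized_keys),
        "packages": list(packages),
    }
    if scripts:
        cloud_config["runcmd"] = list(scripts)
    if modules:
        # Only present when the caller asks for specific cloud-init modules.
        cloud_config["modules"] = list(modules)
    return cloud_config


config = build_cloud_config(
    ["ssh-rsa PLACEHOLDER-KEY user@host"],
    packages=["python-zookeeper", "juju"],
    modules=["ssh", "yum_packages", "runcmd"])
print(config["modules"])
```

Leaving modules out keeps the key absent, so callers that do not pass the new parameter get the same set of keys as before the patch.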

This patch is also packaged in the juju-centos-6 repository, which provides sources for building RPM packages for juju, and also some pre-built RPM packages.

Now just build an AMI image with cloudinit pre-installed, configure your juju environments.yaml file to use this image in the environment and you are ready to deploy cloud services on CentOS machines using Juju!

Some caveats:

Juju needs a user called ubuntu to interact with its machines, so you will need to create this user in your CentOS AMI/template.
You need to host all RPM packages for juju, cloud-init and their dependencies in some yum repository (I haven't submitted them to any public repository).
With this patched Juju, you will have a pure-centos cloud. It does not enable you to have multiple OSes in the same environment.

It's important to note that we are going to put some effort into making the Go version of juju support multiple OSes from the start, ideally through an interface that makes it extensible to any other OS, not only Ubuntu and CentOS.

Go solution for the Dining philosophers problem

2019-12-10T03:42:20+00:00

I spent part of Sunday solving the Dining Philosophers problem using Go. The solution is based on the description of the problem in The Little Book of Semaphores:

The Dining Philosophers Problem was proposed by Dijkstra in 1965, when dinosaurs ruled the earth. It appears in a number of variations, but the standard features are a table with five plates, five forks (or chopsticks) and a big bowl of spaghetti.

There are some constraints:

Only one philosopher can hold a fork at a time
It must be impossible for a deadlock to occur
It must be impossible for a philosopher to starve waiting for a fork
It must be possible for more than one philosopher to eat at the same time

No more talk, here is my solution for the problem:

package main

import (
    "fmt"
    "sync"
    "time"
)

type Fork struct {
    sync.Mutex
}

type Table struct {
    philosophers chan Philosopher
    forks []*Fork
}

func NewTable(forks int) *Table {
    t := new(Table)
    t.philosophers = make(chan Philosopher, forks - 1)
    t.forks = make([]*Fork, forks)
    for i := 0; i < forks; i++ {
        t.forks[i] = new(Fork)
    }
    return t
}

func (t *Table) PushPhilosopher(p Philosopher) {
    p.table = t
    t.philosophers <- p
}

[The remainder of the listing (the Philosopher type, its thinking, eating and fork-handling methods, and main) was lost when the post was converted.]

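
For readers who want something runnable to experiment with, here is a Python sketch of the idea that the Table above appears to rely on: the philosophers channel is created with room for forks - 1 entries, so at most four of the five philosophers compete for forks at any time, which rules out a circular wait. This is a re-implementation under that assumption using the standard threading module, not a translation of the Go listing, and the names (seats, dine, meals) are hypothetical.

```python
import threading

NUM_PHILOSOPHERS = 5
MEALS_EACH = 3

forks = [threading.Lock() for _ in range(NUM_PHILOSOPHERS)]
# At most N - 1 philosophers may sit at the table at once, so at least one
# of them can always pick up both forks.
seats = threading.Semaphore(NUM_PHILOSOPHERS - 1)
meals = [0] * NUM_PHILOSOPHERS


def dine(index):
    left = forks[index]
    right = forks[(index + 1) % NUM_PHILOSOPHERS]
    for _ in range(MEALS_EACH):
        with seats:          # sit down (blocks if the table is full)
            with left:       # pick up the left fork
                with right:  # pick up the right fork
                    meals[index] += 1  # eat
        # thinking happens here, away from the table


threads = [threading.Thread(target=dine, args=(i,))
           for i in range(NUM_PHILOSOPHERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(meals)
```

The sketch covers the deadlock constraint only; it makes no fairness guarantee, so it does not by itself prove that no philosopher can starve.
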
Any feedback is very welcome.

Speaking at PythonBrasil[7]

2019-12-10T03:42:20+00:00

Next weekend I’ll be talking about scaling Django applications at Python Brasil, the Brazilian Python conference. It will be my first time at the conference, which is one of the greatest Python conferences in Latin America.

Some international dudes are also attending the conference: Wesley Chun is going to talk about Python 3 and Google App Engine; Alan Runyan will talk about free and open source software; and Steve Holden will be talking about the issues involved in trying to build a global Python user group.

There is also Maciej Fijalkowski, PyPy core developer, talking about little things PyPy makes possible.

As I mentioned before, I’m going to talk about scalability, based on experience acquired scaling Django applications at Globo.com, like G1, the greatest news portal in Latin America.

Flying with Django on Google App Engine

2019-12-10T03:42:19+00:00

Google App Engine is a powerful tool for web developers. I am sure that it is useful and every developer should try it =) Python was the first programming language supported by App Engine, and it is a language with a lot of web frameworks. So, you can use some of these frameworks on Google App Engine. In a series of three blog posts, I will show how to use three Python web frameworks on App Engine: Django, Flask and web2py (not necessarily in this order).

The first framework is Django, the most famous of all Python frameworks and maybe the most used.

Django models are the strongest Django feature: a high-level database abstraction layer with a powerful object-relational mapper that supports a lot of relational database management systems. But App Engine doesn’t use a relational database. The database behind App Engine is called BigTable, a distributed storage system for managing structured data, designed to scale to a very large size (Reference: Bigtable: A Distributed Storage System for Structured Data). It is not based on schemas, tables, keys or columns; it is like a big map indexed by a row key, a column key and a timestamp. We cannot use the stock version of Django models with Bigtable, because the Django model framework was not designed for non-relational databases.
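
The "big map" description can be pictured with an ordinary Python dictionary. This is only a toy model of the data layout described above, not of how Bigtable is implemented or of what the App Engine datastore API looks like. ToyTable and its methods are hypothetical.

```python
class ToyTable(object):
    # Toy model: values are indexed by (row key, column key, timestamp).
    def __init__(self):
        self.cells = {}

    def put(self, row, column, timestamp, value):
        self.cells[(row, column, timestamp)] = value

    def get_latest(self, row, column):
        # Collect every stored version of the cell and return the newest.
        versions = [(ts, value) for (r, c, ts), value in self.cells.items()
                    if r == row and c == column]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]


table = ToyTable()
table.put("post:1", "title", 1, "Hello")
table.put("post:1", "title", 2, "Hello, App Engine")
table.put("post:1", "content", 1, "First post")
print(table.get_latest("post:1", "title"))
```

Nothing here resembles a relational schema with joins and foreign keys, which is why an object-relational mapper built for SQL databases does not fit directly.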

So, what can we do? There is a Django fork, the django-nonrel project, which aims to bring the power of the Django model layer to non-relational databases. I will use the djangoappengine sub-project to build the sample application for this post, which will be deployed on Google App Engine :)

The sample application is the usual one: a blog. It is a very simple blog, with a single form protected by login (using Django’s built-in authentication system instead of the Google Accounts API) and a public page listing all blog posts. It is very easy to do, so let’s do it.

First, we have to set up our environment. According to the djangoappengine project documentation, we need to download 4 zip files and put them together. First, I downloaded the django-testapp file, extracted its contents and renamed the project directory from django-testapp to blog_gae. After this step, I downloaded the other files and put them inside the blog_gae directory, which gives the final project structure.

The “django” directory comes from the django-nonrel zip file, the “djangoappengine” directory from the djangoappengine zip file and the “djangotoolbox” directory from the djangotoolbox zip file. Note that an app.yaml file is provided, ready to be customized. I just changed the application id inside this file. The final content of the file is the following:

application: gaeseries
version: 1
runtime: python
api_version: 1

default_expiration: '365d'

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

- url: /_ah/queue/deferred
  script: djangoappengine/deferred/handler.py
  login: admin

- url: /media/admin
  static_dir: django/contrib/admin/media/

- url: /.*
  script: djangoappengine/main/main.py

I will use one version for each part of the series, so this is version 1 because it is the first part =D In settings.py, we just uncomment the django.contrib.auth line inside the INSTALLED_APPS tuple, because we want to use the built-in auth application instead of the Google Accounts API provided by App Engine.
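
As a rough sketch, the relevant part of settings.py might look like this after the change. The exact list of apps shipped with django-testapp is an assumption here and may differ between versions; the only change described in this post is that the django.contrib.auth line is no longer commented out.

```python
# settings.py (fragment). The surrounding entries are assumed for
# illustration; only the 'django.contrib.auth' change comes from this post.
INSTALLED_APPS = (
    'django.contrib.contenttypes',
    'django.contrib.auth',  # previously commented out
    'django.contrib.sessions',
    'djangotoolbox',
    'djangoappengine',
)
```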

All settings are OK now, so it is time to create the core application. In the Django project, we will use the core application to manage models and serve some views. We create it using the following command:

% python manage.py startapp core

This is a well-known Django command that creates the application structure: a Python package containing three Python modules (models, tests and views). Now we have to create the Post model. Here is the code of the models.py file:

from django.db import models
from django.contrib.auth.models import User


class Post(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
    # Filled in automatically when the post is first saved.
    date = models.DateTimeField(auto_now_add=True)
    # Author of the post, from Django's built-in auth app.
    user = models.ForeignKey(User)

Now we just need to “install” the core application by adding it to the INSTALLED_APPS tuple in the settings.py file, and Django will be ready to play with Bigtable. :) We will use the django.contrib.auth app, so let’s run a management command to create a superuser:

% python manage.py createsuperuser

After creating the superuser, we need to set up the login and logout URLs and make two templates. So, in the urls.py file, add two mappings for the login and logout views. The file will look like this:

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    ('^$', 'django.views.generic.simple.direct_to_template',
     {'template': 'home.html'}),

    ('^login/$', 'django.contrib.auth.views.login'),
    ('^logout/$', 'django.contrib.auth.views.logout'),
)

Here is the registration/login.html template:

{% extends "base.html" %}

{% block content %}

<p>Fill the form below to login in the system ;)</p>

{% if form.errors %}
<p>Your username and password didn't match. Please try again.</p>
{% endif %}

<form method="post" action="{% url django.contrib.auth.views.login %}">{% csrf_token %}
<table>
<tr>
    <td>{{ form.username.label_tag }}</td>
    <td>{{ form.username }}</td>
</tr>
<tr>
    <td>{{ form.password.label_tag }}</td>
    <td>{{ form.password }}</td>
</tr>
</table>

<input type="submit" value="login" />
<input type="hidden" name="next" value="{{ next }}" />
</form>

{% endblock %}

And the registration/logged_out.html template:

{% extends "base.html" %}

{% block content %}
    Bye :)
{% endblock %}

The two lines added to urls.py are the login and logout mappings. In the settings.py file, add three lines:

LOGIN_URL = '/login/'
LOGOUT_URL = '/logout/'
LOGIN_REDIRECT_URL = '/'
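
A rough illustration of what LOGIN_URL is used for: when an anonymous user requests a view protected by login_required, Django redirects to the login URL and passes the originally requested path in a next query parameter, which is why the login template above has a hidden next field. The helper below is a hypothetical stand-in written with the Python 3 standard library, not Django’s own code.

```python
from urllib.parse import urlencode

LOGIN_URL = '/login/'


def build_login_redirect(requested_path, login_url=LOGIN_URL):
    """Hypothetical helper: the URL an anonymous user would be sent to."""
    # Keep slashes readable in the query string.
    query = urlencode({'next': requested_path}, safe='/')
    return login_url + '?' + query


redirect_url = build_login_redirect('/posts/new/')
```

After a successful login the user is sent back to the next path, and LOGIN_REDIRECT_URL is the fallback when no next value is present.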

And we are ready to code =) Let’s create the login-protected view, where we will write and save a new post. To do that, we first need to create a Django form to deal with the data. There are two fields in this form: title and content. When the form is submitted, the user property is filled with the currently logged-in user and the date property is filled with the current time. So, here is the code of the ModelForm:

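from django import forms
# Same implicit relative import style as views.py (forms.py lives in core).
from models import Post


class PostForm(forms.ModelForm):
    class Meta:
        model = Post
        exclude = ('user',)

    def save(self, user, commit=True):
        # Build the Post without saving it, so the user can be set first.
        post = super(PostForm, self).save(commit=False)
        post.user = user

        if commit:
            post.save()

        return post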
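
The overridden save method follows a common pattern: ask the parent class to build the object without persisting it, fill in what the form does not provide, and only then save. Here is a framework-free sketch of that flow; FakePost, FakeModelForm and FakePostForm are hypothetical stand-ins, not Django classes.

```python
# Framework-free sketch of the save(commit=False) pattern.


class FakePost(object):
    def __init__(self, title, content):
        self.title = title
        self.content = content
        self.user = None
        self.saved = False

    def save(self):
        # A real model would write to the datastore here.
        self.saved = True


class FakeModelForm(object):
    def __init__(self, data):
        self.data = data

    def save(self, commit=True):
        post = FakePost(self.data['title'], self.data['content'])
        if commit:
            post.save()
        return post


class FakePostForm(FakeModelForm):
    def save(self, user, commit=True):
        # Build the object without persisting it, so the user can be set first.
        post = super(FakePostForm, self).save(commit=False)
        post.user = user
        if commit:
            post.save()
        return post


form = FakePostForm({'title': 'Hi', 'content': 'First post'})
draft = form.save('demo', commit=False)
published = form.save('demo')
```

The point of the design is that the object is never persisted without its user, which matters because the user field is required on the model.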

Here is the views.py file, with the two views (one “mocked up”, with a simple redirect):

from django.contrib.auth.decorators import login_required
from django.shortcuts import render_to_response
from django.template import RequestContext
from django.http import HttpResponseRedirect
from django.core.urlresolvers import reverse
from forms import PostForm

@login_required
def new_post(request):
    form = PostForm()
    if request.method == 'POST':
        form = PostForm(request.POST)
        if form.is_valid():
            form.save(request.user)
            return HttpResponseRedirect(reverse('core.views.list_posts'))
    return render_to_response('new_post.html',
            locals(), context_instance=RequestContext(request)
    )

def list_posts(request):
    return HttpResponseRedirect('/')

There are only two steps left before we can finally save posts to Bigtable: map URLs to the views above and create the new_post.html template. Here is the mapping code:

('^posts/new/$', 'core.views.new_post'),
('^posts/$', 'core.views.list_posts'),
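
To show how these regular expressions select a view, here is a small sketch using only the standard library. The resolve function is a hypothetical simplification of what Django’s URL resolver does; it returns the dotted view name instead of calling the view.

```python
import re

# (regex, view name) pairs, mirroring the two mappings above.
URL_PATTERNS = [
    (r'^posts/new/$', 'core.views.new_post'),
    (r'^posts/$', 'core.views.list_posts'),
]


def resolve(path):
    """Hypothetical resolver: first view name whose regex matches the path."""
    # The leading slash is dropped before matching, as Django does.
    path = path.lstrip('/')
    for regex, view_name in URL_PATTERNS:
        if re.search(regex, path):
            return view_name
    return None


new_post_view = resolve('/posts/new/')
list_posts_view = resolve('/posts/')
no_slash_view = resolve('/posts/new')
```

In this sketch /posts/new without the trailing slash matches nothing on its own; in a default Django setup, the APPEND_SLASH behavior of CommonMiddleware would normally redirect it to /posts/new/.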

And here is the template code:

{% extends "base.html" %}

{% block content %}
    <form action="{% url core.views.new_post %}" method="post" accept-charset="utf-8">
        {% csrf_token %}
        {{ form.as_p }}
        <p><input type="submit" value="Post!"/></p>
    </form>
{% endblock %}

Now we can run ./manage.py runserver in the terminal, access the URL http://localhost:8000/posts/new in the browser, see the form, fill it in and save the post :D The last step is to list all posts at http://localhost:8000/posts/. The list_posts view is already mapped to the URL /posts/, so we just need to write the code of the view and a template to show the list of posts. Here is the view code:

# Requires the Post model; add this to the imports at the top of views.py.
from models import Post

def list_posts(request):
    posts = Post.objects.all()
    return render_to_response('list_posts.html',
            locals(), context_instance=RequestContext(request)
    )

And the list_posts.html template code:

{% extends "base.html" %}

{% block content %}
<dl>
    {% for post in posts %}
    <dt>{{ post.title }} (written by {{ post.user.username }})</dt>
    <dd>{{ post.content }}</dd>
    {% endfor %}
</dl>
{% endblock %}

Finished? Not yet :) The application is now ready to deploy. How do we deploy it? With just one command:

% python manage.py deploy

Done! Now, to use everything that we have just created on the remote App Engine server, just create a superuser on that server and enjoy:

% python manage.py remote createsuperuser

You can check this application flying on Google App Engine: http://1.latest.gaeseries.appspot.com (use demo as both username and password on the login page).

You can check out the application code on GitHub: http://github.com/fsouza/gaeseries/tree/django.

Interoperability #rust2020

2019-12-01T15:00:00+00:00

In January I wrote a post for the Rust 2019 call for blogs. The 2020 call is aiming for an RFC and roadmap earlier this time, so here is my 2020 post =]

Last call review: what happened?

An attribute proc-macro like `#[wasm_bindgen]` but for FFI

This sort of happened... because WebAssembly is growing =]

I was very excited when Interface Types showed up in August, and while it is still very experimental, it is moving fast and bringing saner paths for interoperability than raw C FFIs. David Beazley even pointed this out at the end of his PyCon India keynote, talking about how easy it is to get information out of a WebAssembly module compared to what had to be done for SWIG.

This doesn't solve the problem where strict C compatibility is required, or for platforms where a WebAssembly runtime is not available, but I think it is a great solution for scientific software (or, at least, for my use cases =]).

"More -sys and Rust-like crates for interoperability with the larger ecosystems" and "More (bioinformatics) tools using Rust!"

I did some of those this year (bbhash-sys and mqf), and also found some great crates to use in my projects. Rust is picking up steam in bioinformatics, being used as the primary choice for high quality software (like varlociraptor, or the many coming from 10X Genomics), but it is still somewhat hard to find more details (I mostly find it on Twitter, and sometimes through Google Scholar alerts). It would be great to start bringing this info together, which leads to...

"A place to find other scientists?"

Hey, this one happened! Luca Palmieri started a conversation on reddit and the #science-and-ai Discord channel on the Rust community server was born! I think it works pretty well, and Luca has also been doing a great job running workshops and guiding the conversation around rust-ml.

Rust 2021: Interoperability

Rust is amazing because it is very good at bringing together many concepts and ideas that seem contradictory at first but can really shine when synthesized. But can we share this combined wisdom and also improve the situation in other places too? Despite the "Rewrite it in Rust" meme, increased interoperability is something that is already driving a lot of the best aspects of Rust:

Interoperability with other languages: as I said before, with WebAssembly (and Rust having the best toolchain for it) there is a clear route to achieve this, but it will not replace all the software that already exists and can benefit from FFI and C compatibility. Bringing together developers from the many language-specific binding generators (helix, neon, rustler, PyO3...) and figuring out what's missing from them (or what the common parts are that can be shared) also seems productive.
Interoperability with new and unexplored domains: I think Rust benefits enormously from not focusing on only one domain, and choosing to prioritize CLI, WebAssembly, Networking and Embedded is a good subset to start tackling problems. But how can we guide other domains to also use Rust, bring in new contributors and expose missing pieces of the larger picture?

Another point extremely close to interoperability is training. A great way to interoperate with other languages and domains is having good documentation and material for transitioning into Rust without having to figure everything out at once. Rust documentation is already amazing, especially considering the many books published by each working group. But... there is a gap in the transitions, both from understanding the basics of the language to actually using it, and in the progression from beginner to intermediate and expert.

I see good resources for JavaScript and Python developers, but we are still covering a pretty small niche: programmers curious enough to go learn another language, or looking for solutions for problems in their current language.

Can we bring more people into Rust? RustBridge is obviously the reference here, but there is space for much, much more. Using Rust in The Carpentries lessons? Creating RustOpenSci, mirroring the communities of practice of rOpenSci and pyOpenSci?
