Planeta PythonBrasil

May 20, 2020

Blog do PauloHRPinheiro

Fetching macroeconomic data with Quandl in Python

How to get historical data on macroeconomic indicators with the Quandl library using Python

May 20, 2020 at 00:00

May 11, 2020

Gabbleblotchits

Putting it all together

sourmash 3.3 was released last week, and it is the first version supporting zipped databases. Here is my personal account of how that came to be =]

What is a sourmash database?

A sourmash database contains signatures (typically Scaled MinHash sketches built from genomic datasets) and an index that allows efficient similarity and containment queries over these signatures. The two types of index are SBT, a hierarchical index that uses less memory by keeping data on disk, and LCA, an inverted index that uses more memory but is potentially faster. Indices are described as JSON files, with LCA storing all the data in one JSON file and SBT storing a description of the index structure in JSON and the data itself in a hidden directory with many files.

We distribute some prepared databases (with SBT indices) for Genbank and RefSeq as compressed TAR files. The compressed file is ~8GB, but after decompressing it turns into almost 200k files in a hidden directory, using about 40 GB of disk space.

Can we avoid generating so many hidden files?

The initial issue in this saga is dib-lab/sourmash#490, and the idea was to take the existing support for multiple data storages (hidden dir, TAR files, IPFS and Redis) and save the index description in the storage, allowing everything to be loaded from the storage. Since we already had the databases as TAR files, the first test tried to use them, but it didn't take long to see it was a doomed approach: TAR files are terrible for random access (or at least the tarfile module in Python is).

Zip files showed up as a better alternative, and it helps that Python has the zipfile module already available in the standard library. Initial tests were promising, and led to dib-lab/sourmash#648. The main issue was performance: compressing and decompressing was slow, but there was also another limitation...
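
To make the difference concrete, here is a small, hypothetical comparison (not sourmash code) of pulling a single member out of each archive type with the standard library. tarfile has to walk the archive to find a member, while zipfile can jump straight to it using the central directory stored at the end of the file:

import tarfile
import zipfile

def read_from_tar(path, name):
    # tarfile scans member headers sequentially until it finds `name`
    with tarfile.open(path) as tf:
        return tf.extractfile(name).read()

def read_from_zip(path, name):
    # zipfile looks `name` up in the central directory and seeks directly to it
    with zipfile.ZipFile(path) as zf:
        return zf.read(name)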

Loading Nodegraphs from a memory buffer

Another challenge was efficiently loading the data from a storage. The two core methods in a storage are save(location, content), where content is a bytes buffer, and load(location), which returns a bytes buffer that was previously saved. This didn't interact well with the khmer Nodegraphs (the Bloom Filter we use for SBTs), since khmer only loads data from files, not from memory buffers. We ended up doing a temporary file dance, which made things slower for the default storage (hidden dir), where it could have been optimized to work directly with files, and involved interacting with the filesystem for the other storages (IPFS and Redis could be pulling data directly from the network, for example).
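
For reference, a minimal sketch of that storage interface, with a directory-backed implementation standing in for the default hidden-dir storage (names simplified; the real sourmash classes have more methods and error handling):

from abc import ABC, abstractmethod
from pathlib import Path

class Storage(ABC):
    @abstractmethod
    def save(self, location: str, content: bytes):
        "Store a bytes buffer under the given location."

    @abstractmethod
    def load(self, location: str) -> bytes:
        "Return the bytes buffer previously saved at location."

class DirectoryStorage(Storage):
    "One file per location, like the hidden-dir storage."

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, location, content):
        (self.root / location).write_bytes(content)

    def load(self, location):
        return (self.root / location).read_bytes()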

This one could be fixed in khmer by exposing C++ stream methods, and I did a small PoC to test the idea. While doable, this was happening while the sourmash conversion to Rust was underway, and depending on khmer was a problem for my WebAssembly aspirations... so having the Nodegraph implemented in Rust seemed like a better direction. That implementation had actually been quietly living in the sourmash codebase for quite some time, but it was never exposed to the Python side (and it was also lacking more extensive tests).

After the release of sourmash 3 and the replacement of the C++ implementation with the Rust one, all the pieces for exposing the Nodegraph were in place, so dib-lab/sourmash#799 was the next step. It wasn't a priority at first because other optimizations (released in 3.1 and 3.2) were more important, but then it was time to check how this would perform. And...

Your Rust code is not so fast, huh?

Turns out that my Nodegraph loading code was way slower than khmer's. The Nodegraph binary format is well documented, and an initial implementation wasn't so hard: use the byteorder crate to read binary data with the right endianness, and then set the appropriate bits in the internal fixedbitset in memory. But the khmer code doesn't parse bit by bit: it reads a long char buffer directly, and that is many orders of magnitude faster than setting bits one by one.
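
A rough Python sketch of the difference (this is just the general idea, not the khmer or sourmash code): setting bits one at a time does work proportional to the number of bits, while taking the raw buffer wholesale is a single bulk copy.

def load_bit_by_bit(raw: bytes):
    # one operation per set bit: this is what the naive parser was doing
    bits = set()
    for byte_idx, byte in enumerate(raw):
        for offset in range(8):
            if byte & (1 << offset):
                bits.add(byte_idx * 8 + offset)
    return bits

def load_wholesale(raw: bytes):
    # a single bulk copy, analogous to reading khmer's char buffer directly
    return bytearray(raw)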

And there was no way to replicate this behavior directly with fixedbitset. At this point I could either do bit-indexing into a large buffer and lose all the useful methods that fixedbitset provides, or try to find a way to support loading the data directly into fixedbitset and open a PR.

I chose the PR (and even got #42! =]).

It was more straightforward than I expected, but it did expose the internal representation of fixedbitset, so I was a bit nervous it wasn't going to be merged. But bluss was super nice, and his suggestions made the PR way better! This simplified the final Nodegraph code, and it also ended up more correct (because I was messing up a few corner cases when doing the bit-by-bit parsing before). Win-win!

Nodegraphs are kind of large, can we compress them?

Being able to save and load Nodegraphs in Rust allowed using memory buffers, but it also opened the way to supporting operations not available in khmer Nodegraphs. One example is loading/saving compressed files, which is supported for Countgraph (another khmer data structure, based on the Count-Min Sketch) but not for Nodegraph.

If only there was an easy way to support working with compressed files...

Oh wait, there is! niffler is a crate that I made with Pierre Marijon, based on some functionality I saw in one of his projects; we iterated a bit on the API and documented everything to make it more useful for a larger audience. niffler tries to be as transparent as possible, with very little boilerplate when using it but with useful features nonetheless (like auto-detection of the compression format). If you want to know more about the motivation and how it happened, check this Twitter thread.

The cool thing is that adding compressed files support in sourmash was mostly one-line changes for loading (and a bit more for saving, but mostly because converting compression levels could use some refactoring).

Putting it all together: zipped SBT indices

With all these other pieces in place, it's time to go back to dib-lab/sourmash#648. Compressing and decompressing with the Python zipfile module is slow, but Zip files can also be used just for storage, handing back the data without extracting it. And since we have compression/decompression implemented in Rust with niffler, that's what the zipped sourmash databases are: data is loaded and saved into the Zip file without using the Python module's compression/decompression, and all that work is done before (or after) on the Rust side.
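
The idea can be sketched in a few lines of Python (an illustration, not the actual ZipStorage code: sourmash does the compression step in Rust via niffler, and gzip stands in for it here). The archive itself uses ZIP_STORED, so the zipfile module never compresses or decompresses anything:

import gzip
import zipfile

def save_compressed(archive_path, name, payload: bytes, level=1):
    # compress the payload ourselves, then store the resulting bytes as-is
    data = gzip.compress(payload, compresslevel=level)
    with zipfile.ZipFile(archive_path, "a", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name + ".gz", data)

def load_compressed(archive_path, name) -> bytes:
    # read the stored bytes back and decompress them outside of zipfile
    with zipfile.ZipFile(archive_path, "r") as zf:
        return gzip.decompress(zf.read(name + ".gz"))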

This allows keeping the Zip file at a size similar to the original TAR files we started with, but with very low overhead for decompression. For compression we opted for Gzip level 1, which doesn't compress perfectly but also doesn't take much longer to run:

Level Size Time
0 407 MB 16s
1 252 MB 21s
5 250 MB 39s
9 246 MB 1m48s

In this table, 0 is no compression and 9 is the best compression. The size difference from level 1 to level 9 is only 6 MB (~2%), but level 1 runs 5x faster, and it's only 30% slower than saving the uncompressed data.

The last challenge was updating an existing Zip file. It's easy to support appending new data, but if any of the existing data in the file changes (which happens when internal nodes change in the SBT, after a new dataset is inserted) then there is no easy way to replace the data in the Zip file. Worse, the Python zipfile module will add the new data while keeping the old copy around, leading to ginormous files over time.1 So, what to do?

I ended up opting to deal with the complexity and complicate the ZipStorage implementation a bit, by keeping a buffer for new data. If it's a new file, or it already exists but there are no insertions, the buffer is ignored and everything works as before.

If the file exists and new data is inserted, then it is first stored in the buffer (where it might also replace a previous entry with the same name). In this case we also need to check the buffer when trying to load some data (because it might exist only in the buffer, and not in the original file).

Finally, when the ZipStorage is closed it needs to verify whether there are new items in the buffer. If not, it is safe to just close the original file. If there are new items but they were not present in the original file, then we can append the new data to the original file. The final case is when there are new items that were also in the original file; in that case a new Zip file is created and all the content from the buffer and the original file is copied into it, prioritizing items from the buffer. The original file is then replaced by the new Zip file.
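
A simplified sketch of that buffering strategy, to make the three cases concrete (a hypothetical class, much less featureful than the real ZipStorage):

import os
import zipfile

class BufferedZipStorage:
    def __init__(self, path):
        self.path = path
        self.buffer = {}  # new or replaced entries live here first
        mode = "a" if os.path.exists(path) else "w"
        self.zf = zipfile.ZipFile(path, mode)

    def save(self, location, content: bytes):
        self.buffer[location] = content

    def load(self, location) -> bytes:
        if location in self.buffer:  # the buffer wins over the original file
            return self.buffer[location]
        return self.zf.read(location)

    def close(self):
        existing = set(self.zf.namelist())
        new_names = set(self.buffer)
        if not new_names:  # nothing new: just close the original file
            self.zf.close()
        elif not (new_names & existing):  # only brand-new entries: append them
            for name, content in self.buffer.items():
                self.zf.writestr(name, content)
            self.zf.close()
        else:  # some entries were replaced: rewrite the whole archive
            tmp_path = self.path + ".tmp"
            with zipfile.ZipFile(tmp_path, "w") as new_zf:
                for name in existing - new_names:
                    new_zf.writestr(name, self.zf.read(name))
                for name, content in self.buffer.items():
                    new_zf.writestr(name, content)
            self.zf.close()
            os.replace(tmp_path, self.path)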

Turns out this worked quite well! And so the PR was merged =]

The future

Zipped databases open the possibility of distributing extra data that might be useful for some kinds of analysis. One thing we are already considering is adding taxonomy information; let's see what else shows up.

Having the Nodegraph in Rust is also pretty exciting, because now we can change the internal representation to something that uses less memory (maybe RRR encoding?). More importantly, it can also be used with WebAssembly, which opens many possibilities for running not only signature computation but also search and gather in the browser, since we now have all the pieces to build it.

Comments?


Footnotes

  1. The zipfile module does throw a UserWarning pointing out that duplicated files were inserted, which is useful during development but generally doesn't show up during regular usage...

by luizirber on May 11, 2020 at 15:00

May 05, 2020

Filipe Saraiva

LaKademy 2019

Last November, Latin American KDE contributors landed in Salvador, Brazil to take part in another edition of LaKademy – the Latin American Akademy. It was the seventh edition of the event (or the eighth, if you count Akademy-BR as the first LaKademy) and the second with Salvador as the host city. No problem for me: in fact, I would love to move and live at least a few years in Salvador, a city I really like. 🙂

LaKademy 2019 group photo

My main tasks during the event were on two projects: Cantor and Sprat, an "academic paper draft editor". Besides those, I also helped with promotion tasks such as the LaKademy website.

In the Cantor work I focused on organization-related tasks. For example, I asked the sysadmins to migrate the repository to KDE's Gitlab and created a dedicated website for Cantor at cantor.kde.org using the new Jekyll template for KDE projects.

The new website is a nice addition to Cantor because we want to communicate better and more directly with our user community. The site has its own blog and a changelog section to make it easier for the community to follow the news and the main changes in the software.

The migration to Gitlab allows us to use Gitlab CI as an alternative for continuous integration in Cantor. I mentored Rafael Gomes' work (not merged yet) to make this available for the project.

Besides the Cantor work, I carried out some activities related to Sprat, a draft editor for scientific papers in English. This software uses katepart to implement the methodology for writing scientific papers in English known as PROMETHEUS, described in this book, as an attempt to help students and researchers in general with the task of writing scientific papers. During LaKademy I finished the Qt5 port and, hopefully, I expect to release the project this year.

On the more social side, I took part in the famous promo meeting, which discusses KDE's future actions for Latin America. Our main decision was to organize and take part in more small events spread across several cities, marking KDE's presence at established events such as FLISoL and Software Freedom Day, and more – but now, in COVID-19 times, that is no longer feasible. Another decision was to move the KDE Brasil organization from Phabricator to Gitlab.

KDE contributors working hard

Beyond the technical part, this LaKademy was an opportunity to meet old and new friends, drink some beers, enjoy the wonderful Bahian cuisine, and have fun between one commit and another. 🙂

I would like to thank KDE e.V. for supporting LaKademy, and Caio and Icaro for organizing this edition of the event. I can't wait to attend the next LaKademy, and may it happen as soon as possible! 😉

by Filipe Saraiva on May 05, 2020 at 18:29

May 04, 2020

Filipe Saraiva

Akademy 2019

Akademy 2019 group photo

In September 2019 the Italian city of Milan hosted the main worldwide meeting of KDE contributors – Akademy, where members from different areas such as translators, developers, artists, promo people and more gather for a few days to think about and build the future of KDE's projects and community(ies).

Before getting to Akademy I took a flight from Brazil to Portugal to attend EPIA, a conference on artificial intelligence held in the small city of Vila Real, in the Porto region. After that academic activity, I flew from Porto to Milan and started my participation in Akademy 2019.

Unfortunately I landed at the end of the first morning of the event, which made me miss interesting presentations about Qt 6 and the new KDE Goals. In the afternoon I could watch some talks on topics that also catch my attention, such as Plasma for mobile devices, MyCroft for the automotive industry, the KDE e.V. report and a showcase by Google Summer of Code and Season of KDE students – really nice to see amazing projects developed by newcomers.

On the second day, the talks that caught my attention were the ones about KPublicTransportation, LibreOffice for Plasma, Get Hot New Stuff – which I imagine I will use in a future project – and Caio's presentation about kpmcore.

After the event party (comments about it only in person), the following days were filled with BoFs, for me the most interesting part of Akademy.

The Gitlab workshop was interesting because we could discuss specific topics about KDE's migration to this tool. I am loving this move and hope all KDE projects make the migration as soon as possible. Cantor has already been there for a while.

In the KDE websites BoF, I got a somewhat better understanding of the new Jekyll theme used by our sites. In addition, I hope that soon we can apply internationalization to those pages, making them translatable into any language. After attending and gathering information in this session, I created a new website for Cantor during LaKademy 2019.

The KDE Craft BoF was interesting to see how to compile and distribute our software in the Windows store (yes, look at what I am writing…). I hope to work on this topic during the year so as to make a Cantor package available in that store ((yes)²).

I also took part in the workshop on QML and Kirigami given by the folks from the Maui project. Kirigami is something I have been keeping an eye on for future projects.

Finally, I attended the "All About the Apps Kick Off" BoF. Personally, I think this is the future of KDE: an international community that produces high-quality, secure free software for different platforms, from desktop to mobile. In fact, this is how KDE is currently organized and working, but we haven't managed to communicate it very well to the public. Maybe, with the changes in our release process, together with websites for specific projects and distribution in different app stores, we can change the way the public sees our community.

The Akademy 2019 day trip was to Lake Como, in the town of Varenna. A beautiful trip; I spent the whole time imagining that it could be a nice place for me to spend a honeymoon :D. I hope to go back there in the near future, and spend a few days traveling between towns like it.

Me in Varenna

I would like to thank the whole local team, Riccardo and his friends, for organizing this amazing edition of Akademy. Milan is a very beautiful city, with delicious food (carbonara!), historical places to visit where you can learn more about the Italians, and the sophisticated capital of high fashion.

Finally, my thanks to KDE e.V. for sponsoring my participation in Akademy.

Videos of the Akademy 2019 presentations and BoFs are available at this link.

by Filipe Saraiva on May 04, 2020 at 21:20

February 28, 2020

Thiago Avelino

Focus on the environment, accelerating your learning!

The goal of this blog post is to share how I usually go about accelerating my learning in an area where I don't have much experience and want (and/or need) to gain more. When I entered the technology field (software development) I knew practically nothing, and I started studying how I could speed up my learning, until I came across a text on reddit about focusing on the environment. It was extremely hard for me to understand what that text was trying to say, but after days of reading and rereading it I took in that I should spend time in places where there were people doing what I wanted to learn, and that would accelerate my learning.

February 28, 2020 at 16:49

February 19, 2020

Thiago Avelino

Hitting the limits of a technology, where do I go now?

We technology people are, in general, early adopters (we like to embrace new technologies, even without knowing exactly why they exist), and when it comes to development it's not much different. Why don't we use database X? We could use programming language Y! Service Z solves 100% of our problems! Let's assume the statements above are 100% correct (there goes the first mistake): will the solution serve us "for life", or in a few months will we have to look at it again because we hit some limit of the implementation, the architecture or the technology itself?

February 19, 2020 at 14:00

January 24, 2020

PythonClub

Creating a CI for a Django application using Github Actions

Hey folks, how is everything?

In the video below I show how we can set up a CI for a Django application using Github Actions.

https://www.youtube.com/watch?v=KpSlY8leYFY.

by Lucas Magnum on January 24, 2020 at 15:10

January 10, 2020

Gabbleblotchits

Oxidizing sourmash: PR walkthrough

sourmash 3 was released last week, finally landing the Rust backend. But, what changes when developing new features in sourmash? I was thinking about how to best document this process, and since PR #826 is a short example touching all the layers I decided to do a small walkthrough.

Shall we?

The problem

The first step is describing the problem, and trying to convince reviewers (and yourself) that the changes bring enough benefits to justify a merge. This is the description I put in the PR:

Calling .add_hash() on a MinHash sketch is fine, but if you're calling it all the time it's better to pass a list of hashes and call .add_many() instead. Before this PR add_many just called add_hash for each hash it was passed, but now it will pass the full list to Rust (and that's way faster).

No changes for public APIs, and I changed the _signatures method in LCA to accumulate hashes for each sig first, and then set them all at once. This is way faster, but might use more intermediate memory (I'll evaluate this now).

There are many details that sound like jargon for someone not familiar with the codebase, but if I write something too long I'll probably be wasting the reviewers' time too. The benefit of a very detailed description is extending the knowledge to other people (not necessarily the maintainers), but that also takes effort that might be better allocated to solving other problems. Or, more realistically, putting out other fires =P

Nonetheless, some points I like to add in PR descriptions:
  • Why is there a problem with the current approach?
  • Is this the minimal viable change, or is it trying to change too many things at once? The former is way better, in general.
  • What are the trade-offs? This PR is using more memory to lower the runtime, but I hadn't measured it yet when I opened it.
  • Not changing public APIs is always good for convincing reviewers. If the project follows a semantic versioning scheme, changes to the public APIs are major version bumps, and that can bring other consequences for users.

Setting up for changing code

If this were a bug fix PR, the first thing I would do is write a new test triggering the bug, and then proceed to fix it in the code (Hmm, maybe that would be another good walkthrough?). But this PR is making performance claims ("it's going to be faster"), and that's a bit hard to codify in tests.1 Since it's also proposing to change a method (_signatures in LCA indices) that is better to benchmark with a real index (and not a toy example), I used the same data and command I run in sourmash_resources to check how memory consumption and runtime changed. For reference, this is the command:

sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz

I'm using the benchmark feature from snakemake in sourmash_resources to track how much memory, runtime and I/O is used for each command (and version) of sourmash, and generate the plots in the README in that repo. That is fine for a high-level view ("what's the maximum memory used?"), but not so useful for digging into details ("what method is consuming most memory?").

An additional problem is the dual2 language nature of sourmash, where we have Python calling into Rust code (via CFFI). There are great tools for measuring and profiling Python code, but they tend not to work with extension code...

So, let's bring two of my favorite tools to help!

Memory profiling: heaptrack

heaptrack is a heap profiler, and I first heard about it from Vincent Prouillet. Its main advantage over other solutions (like valgrind's massif) is the low overhead and... how easy it is to use: just stick heaptrack in front of your command, and you're good to go!

Example output:

$ heaptrack sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz

heaptrack stats:
        allocations:            1379353
        leaked allocations:     1660
        temporary allocations:  168984
Heaptrack finished! Now run the following to investigate the data:

  heaptrack --analyze heaptrack.sourmash.66565.gz

heaptrack --analyze is a very nice graphical interface for analyzing the results, but for this PR I'm mostly focusing on the Summary page (and overall memory consumption). Tracking allocations in Python doesn't give many details, because it shows the CPython functions being called, but the ability to track allocations inside the extension code (Rust) is amazing for finding bottlenecks (and memory leaks =P).3

CPU profiling: py-spy

Just as other solutions exist for profiling memory, there are many for profiling CPU usage in Python, including profile and cProfile in the standard library. Again, the issue is being able to analyze extension code, and bringing out the cannon (the perf command in Linux, for example) loses the benefit of tracking Python code properly (because we get back the CPython functions, not what you defined in your Python code).
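
For comparison, this is roughly what plain-stdlib profiling looks like: it works fine for pure Python code, but time spent inside the extension shows up only as opaque built-in calls (toy workload, just for illustration):

import cProfile
import pstats

def work():
    # stand-in workload; in sourmash this would be a search or gather call
    return sum(i * i for i in range(100_000))

cProfile.run("work()", "out.prof")
pstats.Stats("out.prof").sort_stats("cumulative").print_stats(5)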

Enter py-spy by Ben Frederickson, based on the rbspy project by Julia Evans. Both use a great idea: read the process maps for the interpreters and resolve the full stack trace information, with low overhead (because it uses sampling). py-spy also goes further and resolves native Python extension stack traces, meaning we can get the complete picture all the way from the Python CLI to the Rust core library!4

py-spy is also easy to use: stick py-spy record --output search.svg -n -- in front of the command, and it will generate a flamegraph in search.svg. The full command for this PR is

py-spy record --output search.svg -n -- sourmash search -o out.csv --scaled 2000 -k 51 HSMA.fastq.sig genbank-k51.lca.json.gz

Show me the code!

OK, OK, sheesh. But it's worth repeating: the code is important, but there are many other aspects that are just as important =]

Replacing add_hash calls with one add_many

Let's start at the _signatures() method on LCA indices. This is the original method:

@cached_property
def _signatures(self):
    "Create a _signatures member dictionary that contains {idx: minhash}."
    from .. import MinHash

    minhash = MinHash(n=0, ksize=self.ksize, scaled=self.scaled)

    debug('creating signatures for LCA DB...')
    sigd = defaultdict(minhash.copy_and_clear)

    for (k, v) in self.hashval_to_idx.items():
        for vv in v:
            sigd[vv].add_hash(k)

    debug('=> {} signatures!', len(sigd))
    return sigd

sigd[vv].add_hash(k) is the culprit. Each call to .add_hash has to go through CFFI to reach the extension code, and the overhead is significant. It's a similar situation to accessing array elements in NumPy: it works, but it is way slower than using operations that avoid crossing from Python into the extension code. What we want to do instead is call .add_many(hashes), which takes a list of hashes and processes it entirely in Rust (ideally. We will get there).

But, to have a list of hashes, there is another issue with this code.

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        sigd[vv].add_hash(k)

There are two nested for loops, and add_hash is being called with values from the inner loop. So... we don't have the list of hashes beforehand.

But we can change the code a bit to save the hashes for each signature in a temporary list, and then call add_many on the temporary list. Like this:

temp_vals = defaultdict(list)

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].append(k)

for sig, vals in temp_vals.items():
    sigd[sig].add_many(vals)

There is a trade-off here: if we save the hashes in temporary lists, will the memory consumption be so high that the runtime gains of calling add_many on these temporary lists are cancelled out?

Time to measure it =]

version mem time
original 1.5 GB 160s
list 1.7GB 173s

Wait, it got worse?!?! Building temporary lists only takes time and memory, and brings no benefits!

This mystery goes away when you look at the add_many method:

def add_many(self, hashes):
    "Add many hashes in at once."
    if isinstance(hashes, MinHash):
        self._methodcall(lib.kmerminhash_add_from, hashes._objptr)
    else:
        for hash in hashes:
            self._methodcall(lib.kmerminhash_add_hash, hash)

The first check in the if statement is a shortcut for adding hashes from another MinHash, so let's focus on the else branch... And it turns out that add_many is lying! It doesn't process the hashes in the Rust extension; it just loops and calls add_hash for each hash in the list. That's not going to be any faster than what we were doing in _signatures.

Time to fix add_many!

Oxidizing add_many

The idea is to change this loop in add_many:

for hash in hashes:
    self._methodcall(lib.kmerminhash_add_hash, hash)

with a call to a Rust extension function:

self._methodcall(lib.kmerminhash_add_many, list(hashes), len(hashes))

self._methodcall is a convenience method defined in RustObject which translates a method-like call into a function call, since our C layer only has functions. This is the C prototype for this function:

void kmerminhash_add_many(
    KmerMinHash *ptr,
    const uint64_t *hashes_ptr,
    uintptr_t insize
  );

You can almost read it as a Python method declaration, where KmerMinHash *ptr means the same as self in Python methods. The other two arguments are a common idiom when passing pointers to data in C, with insize being how many elements we have in the list.5 CFFI is very good at converting Python lists into pointers of a specific type, as long as the type is a primitive one (uint64_t in our case, since each hash is a 64-bit unsigned integer).
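
A tiny standalone CFFI example of that conversion (illustrative only, not the actual sourmash bindings): a Python list of ints becomes a contiguous uint64_t array that a C function could receive, together with its length.

from cffi import FFI

ffi = FFI()

hashes = [2, 3, 5, 7, 11]
c_hashes = ffi.new("uint64_t[]", hashes)  # contiguous C array, copied from the list
insize = len(hashes)

# the C side would see (const uint64_t *hashes_ptr, uintptr_t insize);
# here we just read the values back to show the round trip
assert ffi.unpack(c_hashes, insize) == hashes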

And the Rust code with the implementation of the function:

ffi_fn! {
unsafe fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize,
  ) -> Result<()> {
    let mh = {
        assert!(!ptr.is_null());
        &mut *ptr
    };

    let hashes = {
        assert!(!hashes_ptr.is_null());
        slice::from_raw_parts(hashes_ptr as *mut u64, insize)
    };

    for hash in hashes {
      mh.add_hash(*hash);
    }

    Ok(())
}
}

Let's break what's happening here into smaller pieces. Starting with the function signature:

ffi_fn! {
unsafe fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize,
  ) -> Result<()>

The weird ffi_fn! {} syntax around the function is a macro in Rust: it changes the final generated code to convert the return value (Result<()>) into something that is valid C code (in this case, void). What happens if there is an error, then? The Rust extension has code for passing an error code and message back to Python, as well as capturing panics (when things go horribly bad and the program can't recover) in a way that Python can then deal with (raising exceptions and cleaning up). It also sets the #[no_mangle] attribute on the function, meaning that the final name of the function will follow C semantics (instead of Rust semantics), and can be called more easily from C and other languages. This ffi_fn! macro comes from symbolic, a big influence on the design of the Python/Rust bridge in sourmash.

unsafe is the keyword in Rust that disables some checks in the code to allow potentially dangerous things (like dereferencing a pointer), and it is required to interact with C code. unsafe doesn't mean that the code is always unsafe to use: it's up to whoever calls it to verify that valid data is being passed and that invariants are being preserved.

If we remove the ffi_fn! macro and the unsafe keyword, we have

fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize
  );

At this point we can pretty much map between Rust and the C function prototype:

void kmerminhash_add_many(
    KmerMinHash *ptr,
    const uint64_t *hashes_ptr,
    uintptr_t insize
  );

Some interesting points:

  • We use fn to declare a function in Rust.
  • The type of an argument comes after the name of the argument in Rust, while it's the other way around in C. Same for the return type (it is omitted in the Rust function, which means it is -> (), equivalent to a void return type in C).
  • In Rust everything is immutable by default, so we need to say that we want a mutable pointer to a KmerMinHash item: *mut KmerMinHash. In C everything is mutable by default.
  • u64 in Rust -> uint64_t in C
  • usize in Rust -> uintptr_t in C

Let's check the implementation of the function now. We start by converting the ptr argument (a raw pointer to a KmerMinHash struct) into a regular Rust struct:

let mh = {
    assert!(!ptr.is_null());
    &mut *ptr
};

This block asserts that ptr is not a null pointer, and if so dereferences it and stores it in a mutable reference. If it were a null pointer the assert! would panic (which might sound extreme, but is way better than continuing to run, because dereferencing a null pointer is BAD). Note that functions always need all the types for arguments and return values, but for variables in the body of the function Rust can figure out the types most of the time, so there's no need to specify them.

The next block prepares our list of hashes for use:

let hashes = {
    assert!(!hashes_ptr.is_null());
    slice::from_raw_parts(hashes_ptr as *mut u64, insize)
};

We are again asserting that the hashes_ptr is not a null pointer, but instead of dereferencing the pointer like before we use it to create a slice, a dynamically-sized view into a contiguous sequence. The list we got from Python is a contiguous sequence of size insize, and the slice::from_raw_parts function creates a slice from a pointer to data and a size.

Oh, and can you spot the bug? I created the slice using *mut u64, but the data is declared as *const u64. Because we are in an unsafe block Rust lets me change the mutability, but I shouldn't be doing that, since we don't need to mutate the slice. Oops.

Finally, let's add hashes to our MinHash! We need a for loop, and call add_hash for each hash:

for hash in hashes {
  mh.add_hash(*hash);
}

Ok(())

We finish the function with Ok(()) to indicate no errors occurred.

Why is calling add_hash here faster than what we were doing before in Python? Rust can optimize these calls and generate very efficient native code, while Python is an interpreted language and most of the time doesn't have the same guarantees that Rust can leverage to generate that code. And, again, calling add_hash here doesn't need to cross the FFI boundary or, in fact, do any dynamic evaluation at runtime, because it is all statically analyzed during compilation.

Putting it all together

And... that's the PR code. There are some other unrelated changes that should have been in new PRs, but since they were so small it would have been more work than necessary. OK, that's a lame excuse: it's confusing for reviewers to see these changes here, so avoid doing that if possible!

But, did it work?

version mem time
original 1.5 GB 160s
list 1.7GB 73s

We are using 200 MB of extra memory, but taking less than half the time it took before. I think this is a good trade-off, and so did the reviewer, so the PR was approved.

Hopefully this was useful, 'til next time!

Comments?

Bonus: list or set?

The first version of the PR used a set instead of a list to accumulate hashes. Since a set doesn't have repeated elements, this could potentially use less memory. The code:

temp_vals = defaultdict(set)

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].add(k)

for sig, vals in temp_vals.items():
    sigd[sig].add_many(vals)

The runtime was again half of the original, but...

version mem time
original 1.5 GB 160s
set 3.8GB 80s
list 1.7GB 73s

... memory consumption was almost 2.5 times the original! WAT

The culprit this time? The new kmerminhash_add_many call in the add_many method. This one:

self._methodcall(lib.kmerminhash_add_many, list(hashes), len(hashes))

CFFI doesn't know how to convert a set into something that C understands, so we need to call list(hashes) to convert it into a list. Since Python (and CFFI) can't know if the data is going to be used later,6 it needs to keep it around (until it is eventually deallocated by the garbage collector). And that's how we get at least double the memory being allocated...

There is another lesson here. If we look at the for loop again:

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].add(k)

each k is already unique because they are keys in the hashval_to_idx dictionary, so the initial assumption (that a set might save memory because it doesn't have repeated elements) is... irrelevant for the problem =]


Footnotes

  1. We do have https://asv.readthedocs.io/ set up for micro-benchmarks, and now that I think about it... I could have started by writing a benchmark for add_many, and then showing that it is faster. I will add this approach to the sourmash PR checklist =]

  2. or triple, if you count C

  3. It would be super cool to have the unwinding code from py-spy in heaptrack, and be able to see exactly what Python methods/lines of code were calling the Rust parts...

  4. Even if py-spy doesn't talk explicitly about Rust, it works very very well, woohoo!

  5. Let's not talk about lack of array bounds checks in C...

  6. something that the memory ownership model in Rust does, BTW

by luizirber on January 10, 2020 at 15:00

December 10, 2019

Francisco Souza

Try out Tsuru: announcing limited preview

A few days ago, Tsuru got some attention in the news. After reading about Tsuru, and seeing some of its capabilities, people started asking for a way to try it. Well, your requests have been answered! We're preparing a public cloud that will be freely available for beta testers.

TL;DR: go to tsuru.io/try, sign up for beta testing and get ready to start deploying Python, Ruby, Go and Java applications in the cloud.

What is Tsuru?

Tsuru is an open source platform as a service that allows developers to automatically deploy and manage web applications written in many different platforms (like Python, Ruby and Go). It aims to provide a solution for cloud computing platforms that is extensible, flexible and component based.

You can run your own public or private cloud using Tsuru. Or you can try it in the public cloud that Globo.com is building.

What is Tsuru public cloud? What does "beta availability" mean?

Tsuru public cloud will be a public, freely available, installation of Tsuru, provided by Globo.com. "Beta availability" means that it will not be available for the general Internet public.

People will need to subscribe for the beta testing and wait for the confirmation, so they can start deploying web applications on Tsuru public cloud.

Which development platforms are going to be available?

Tsuru already supports Ruby, Python, Java and Go, so it is very likely that these platforms will be available for all beta users.

It's important to notice that adding new platforms to Tsuru is a straightforward task: each development platform is based on Juju Charms, so one can adapt charms available at Charm Store and send a patch.

How limited is it going to be?

We don't know the proper answer to this question yet, but don't worry about numbers now. There will be some kind of per-user quota, but it has not been defined yet.

People interested in running applications in the Tsuru public cloud that get to use the beta version will have access to a functional environment where they will be able to deploy at least one web application.

When will it be available?

We're working hard to make it available as soon as possible, and you can help us get it done! If you want to contribute, please take a look at the Tsuru repository, choose an issue, discuss your solution and send your patches. We are going to be very happy to help you out.

What if I don't want to wait?

If you want an unlimited, fully manageable and customized installation of Tsuru, you can have it today. Check out Tsuru's documentation and, in case of doubts, don't hesitate to contact the newborn Tsuru community.

by fsouza (noreply@blogger.com) on December 10, 2019 at 03:42

Setting up a Django production environment: compiling and configuring nginx

Here is another series of posts: now I’m going to write about setting up a Django production environment using nginx and Green Unicorn in a virtual environment. The subject of this first post is nginx, which is my favorite web server.

This post explains how to install nginx from source, compiling it (on Linux). You might want to use apt, zif, yum or ports, but I prefer building from source. So, to build from source, make sure you have all the development dependencies (C headers, including the PCRE library headers, which the nginx rewrite module uses). If you want to build nginx with SSL support, keep in mind that you will need the libssl headers too.

Building nginx from source is a straightforward process: all you need to do is download it from the official site and build it with some simple options. In our setup, we’re going to install nginx under /opt/nginx, and run it as the nginx system user. So, let’s download and extract the latest stable version (1.0.9) from the nginx website:

% curl -O http://nginx.org/download/nginx-1.0.9.tar.gz
% tar -xzf nginx-1.0.9.tar.gz
Once you have extracted it, just configure, compile and install:

% ./configure --prefix=/opt/nginx --user=nginx --group=nginx
% make
% [sudo] make install
As you can see, we passed /opt/nginx as the prefix to configure; make sure the /opt directory exists. Also, make sure there is a user and a group called nginx; if they don’t exist, add them:
% [sudo] adduser --system --no-create-home --disabled-login --disabled-password --group nginx
After that, you can start nginx using the command line below:
% [sudo] /opt/nginx/sbin/nginx

Linode provides an init script that uses start-stop-daemon, you might want to use it.

nginx configuration

nginx comes with a default nginx.conf file; let’s change it to reflect the following configuration requirements:
  • nginx should start workers with the nginx user
  • nginx should have two worker processes
  • the PID should be stored in the /opt/nginx/logs/nginx.pid file
  • nginx must have an access log in /opt/nginx/logs/access.log
  • the configuration for the Django project we’re going to develop should be versioned with the entire code, so it must be included in the nginx.conf file (assume that the library project is in the directory /opt/projects).
So here is the nginx.conf for the requirements above:

user nginx;
worker_processes 2;

pid logs/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log logs/access.log main;

    sendfile on;
    keepalive_timeout 65;

    include /opt/projects/showcase/nginx.conf;
}
Now we just need to write the configuration for our Django project. I’m using an old sample project written while I was working at Giran: the name is lojas giranianas, a nonsense Portuguese joke involving a famous Brazilian store. It’s an unfinished showcase of products; it’s like an e-commerce project, but it can’t sell, so it’s just a product catalog. The code is available at Github. The nginx.conf file for the repository is here:

server {
    listen 80;
    server_name localhost;

    charset utf-8;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        proxy_pass http://localhost:8000;
    }

    location /static {
        root /opt/projects/showcase/;
        expires 1d;
    }
}
The server listens on port 80 and responds for the localhost hostname (read more about the Host header). The location /static directive says that nginx will serve the static files of the project. It also includes an expires directive for caching control. The location / directive makes a proxy_pass, forwarding all requests to an upstream server listening on port 8000; this server is the subject of the next post of the series: the Green Unicorn (gunicorn) server.

Not only the HTTP request itself is forwarded to the gunicorn server, but also some headers that help it deal with the request properly:
  • X-Real-IP: forwards the remote address to the upstream server, so it can know the real IP of the user. Without this header, all gunicorn will know is that there is a request coming from localhost (or wherever the nginx server is); the remote address would always be the IP address of the machine where nginx is running (which actually makes the request to gunicorn)
  • Host: the Host header is forwarded so gunicorn can treat different requests for different hosts. Without this header, gunicorn would not be able to apply these constraints
  • X-Forwarded-For: also known as XFF, this header provides more precise information about the real IP that made the request. Imagine there are 10 proxies between the user machine and your web server: the XFF header will list all these proxies, comma separated. In order not to turn a proxy into an anonymizer, it’s a good practice to always forward this header.
So that is it; in the next post we are going to install and run gunicorn. In other posts, we’ll see how to make automated deploys using Fabric, and some tricks on caching (using the proxy_cache directive and integrating Django, nginx and memcached).

See you in next posts.

by fsouza (noreply@blogger.com) on December 10, 2019 at 03:42

Speaking at OSCON 2014

Wow, one year without any posts! But I'm trying to get back...

This is a very short post, just to tell everybody that this year I will have the opportunity to speak at OSCON 2014. I'm speaking about tsuru; check out more details of the talk on the tsuru blog.

by fsouza (noreply@blogger.com) on December 10, 2019 at 03:42

Creating HTML 5 slide presentations using landslide

Recently I found landslide, which is a Python tool for creating HTML 5 slide presentations.

It’s based on a famous slide presentation. It’s a simple script that generates HTML from a source file, which can be formatted using reStructuredText, Textile or Markdown.

Let’s make a very simple presentation as a proof of concept: we’re going to create a “Python flow control” presentation, showing some basic structures of the language: if, for and while. We need a cover, a slide for each structure (with some topics and code examples) and the last slide for questions and answers. Here is the RST code for it:

Python
======

--------------

If
==

* Please don't use ()
* Never forget the ``:`` at the end of the line

Check this code:

.. sourcecode:: python

    x, y = 1, 2
    if x > y:
        print 'x is greater'

--------------

For
===

* ``for`` iterates over a sequence
* Never forget the ``:`` at the end of the line

Check this code:

.. sourcecode:: python

    numbers = [1, 2, 3, 4, 5,]
    for number in numbers:
        print number

--------------

While
=====

* ``while`` is like ``if``, but executes while the condition is ``True``
* please don't use ()
* never forget the ``:`` at the end of the line

Check this code:

.. sourcecode:: python

    from random import randint

    args = (1, 10,)
    x = randint(*args)
    while x != 6:
        x = randint(*args)

--------------

Thank you!
==========
As you can see, it’s very simple. If you’re familiar with RST syntax, you can guess what landslide does: it converts the entire content to HTML and then splits it by the <hr /> tag. Each slide will contain two sections: a header and a body. The header contains only an <h1></h1> element and the body contains everything else.

We can generate the HTML output by calling the landslide command in the terminal:
% landslide python.rst
To use the landslide command, you need to install it. I suggest you do this via pip:
% [sudo] pip install landslide
landslide supports theming, so you can customize it by creating your own theme. Your theme should contain two CSS files: screen.css (for the HTML version of the slides) and print.css (for the PDF version of the slides). You might also customize the HTML (base.html) and JS files (slides.js), but the CSS files are required in your theme. You specify the theme using the --theme option. You might want to check all the options available in the command line utility using --help:
% landslide --help
It’s quite easy to extend landslide by changing its theme or adding new macros. Check the official repository at Github. This example, and a Markdown version of the same example, are available in a repository on my Github profile.

You can also see the slides live!

by fsouza (noreply@blogger.com) on December 10, 2019 at 03:42

Splinter sprint on FISL

We are going to start tomorrow, at FISL, another splinter sprint. “From June 29 through July 2, 2011, fisl12 will be hosted at the PUC Events Center, in Porto Alegre, Rio Grande do Sul, Brazil” (copied from the FISL website). But don’t worry about the location: anyone, anywhere, can join us in this sprint. There is an entry in the splinter wiki about this sprint, and I’m just replicating the information here...


What is a splinter sprint?

Basically, a splinter sprint is an excuse for people to focus their undivided attention, for a set time frame, on improving splinter. It’s a focused, scheduled effort to fix bugs, add new features and improve documentation.

Anybody, anywhere around the world, can participate and contribute. If you’ve never contributed to splinter before, this is the perfect chance for you to chip in.

How to contribute

  1. Choose an issue
  2. Create a fork
  3. Send a pull request
Remember: all new features should be well tested and documented. An issue can’t be closed if there aren’t docs for the solution code.

Preparing for the sprint

Get an IRC client, so that you can join us in the channel #cobrateam on Freenode.

See all you there!

by fsouza (noreply@blogger.com) on December 10, 2019 at 03:42

Testing jQuery plugins with Jasmine

Since I started working at Globo.com, I developed some jQuery plugins (for internal use) with my team, and we are starting to test these plugins using Jasmine, “a behavior-driven development framework for testing your JavaScript code”. In this post, I will show how to develop a very simple jQuery plugin (based on an example that I learned with Ricard D. Worth): zebrafy. This plugin “zebrafies” a table, applying different classes to odd and even lines. Let’s start setting up a Jasmine environment...

The first step is to download the standalone version of Jasmine, then extract it and edit the runner. The runner is a simple HTML file that loads Jasmine and all the JavaScript files you want to test. But, wait... why not test using node.js or something like that? Do I really need the browser in this test? You don’t, but I think it is important to test a plugin that works with the DOM using a real browser. Let’s delete some files and lines from the SpecRunner.html file, to adapt it for our plugin. This is what the structure is going to look like:

.
├── SpecRunner.html
├── lib
│ ├── jasmine-1.0.2
│ │ ├── MIT.LICENSE
│ │ ├── jasmine-html.js
│ │ ├── jasmine.css
│ │ └── jasmine.js
│ └── jquery-1.6.1.min.js
├── spec
│ └── ZebrafySpec.js
└── src
└── jquery.zebrafy.js
You can create the files jquery.zebrafy.js and ZebrafySpec.js, but remember: it is BDD, so we need to describe the behavior first, then write the code. Let’s start writing the specs in the ZebrafySpec.js file using Jasmine. If you are familiar with RSpec syntax, it’s easy to understand how to write specs with Jasmine; if you aren’t, here is the clue: Jasmine is a lib with some functions used for writing tests in an easier way. I’m going to explain each function “on demand”: when we need something, we learn how to use it! ;)

First of all, we need to start a new test suite. Jasmine provides the describe function for that; this function receives a string and another function (a callback). The string describes the test suite and the function is a callback that delimits the scope of the test suite. Here is the Zebrafy suite:

describe('Zebrafy', function () {

});
Let’s start describing the behavior we want to get from the plugin. The most basic is: we want different CSS classes for odd and even lines in a table. Jasmine provides the it function for writing the tests. It also receives a string and a callback: the string is a description for the test and the callback is the function executed as the test. Here is the very first test:

it('should apply classes zebrafy-odd and zebrafy-even to each other table lines', function () {
    var table = $("#zebra-table");
    table.zebrafy();
    expect(table).toBeZebrafyied();
});
Okay, here we go: in the first line of the callback, we are using jQuery to select a table using the #zebra-table selector, which will look for a table with the ID attribute equal to “zebra-table”, but we don’t have this table in the DOM. What about adding a new table to the DOM in a hook executed before the test runs and removing the table in another hook that runs after the test? Jasmine provides two functions: beforeEach and afterEach. Both functions receive a callback function to be executed and, as the names suggest, the beforeEach callback is called before each test runs, and the afterEach callback is called after the test runs. Here are the hooks:

beforeEach(function () {
    $('<table id="zebra-table"></table>').appendTo('body');
    for (var i=0; i < 10; i++) {
        $('<tr></tr>').append('<td></td>').append('<td></td>').append('<td></td>').appendTo('#zebra-table');
    };
});

afterEach(function () {
    $("#zebra-table").remove();
});
The beforeEach callback uses jQuery to create a table with 10 rows and 3 columns and add it to the DOM. In the afterEach callback, we just remove that table using jQuery again. Okay, now the table exists; let’s go back to the test:

it('should apply classes zebrafy-odd and zebrafy-even to each other table lines', function () {
    var table = $("#zebra-table");
    table.zebrafy();
    expect(table).toBeZebrafyied();
});
In the second line, we call our plugin, which is not ready yet, so let’s move on to the next line, where we use the expect function. Jasmine provides this function, which receives an object and executes a matcher against it. There are a lot of built-in matchers in Jasmine, but toBeZebrafyied is not a built-in matcher. Here is where we meet another Jasmine feature: the capability to write custom matchers. But how do we do this? You can call beforeEach again, and use the addMatchers method of the Jasmine object:

beforeEach(function () {
    this.addMatchers({
        toBeZebrafyied: function() {
            var isZebrafyied = true;

            this.actual.find("tr:even").each(function (index, tr) {
                isZebrafyied = $(tr).hasClass('zebrafy-odd') === false && $(tr).hasClass('zebrafy-even');
                if (!isZebrafyied) {
                    return;
                };
            });

            this.actual.find("tr:odd").each(function (index, tr) {
                isZebrafyied = $(tr).hasClass('zebrafy-odd') && $(tr).hasClass('zebrafy-even') === false;
                if (!isZebrafyied) {
                    return;
                };
            });

            return isZebrafyied;
        }
    });
});
The addMatchers method receives an object where each property is a matcher. Your matcher can receive arguments if you want. The object being matched can be accessed using this.actual, so here is what the matcher above does: it takes all odd <tr> elements of the table (this.actual) and checks whether they have the CSS class zebrafy-odd and don’t have the CSS class zebrafy-even, then does the same check with the even <tr> lines.

Now that we have written the test, it’s time to write the plugin. Here is some jQuery code:

(function ($) {
    $.fn.zebrafy = function () {
        this.find("tr:even").addClass("zebrafy-even");
        this.find("tr:odd").addClass("zebrafy-odd");
    };
})(jQuery);
I’m not going to explain how to implement a jQuery plugin nor what those brackets around the function are; this post aims to show how to use Jasmine to test jQuery plugins.

By convention, jQuery plugins are “chainable”, so let’s make sure the zebrafy plugin is chainable using a spec:

it('zebrafy should be chainable', function() {
    var table = $("#zebra-table");
    table.zebrafy().addClass('black-bg');
    expect(table.hasClass('black-bg')).toBeTruthy();
});
As you can see, we used the built-in matcher toBeTruthy, which asserts that an object or expression is true. All we need to do is return the jQuery object in the plugin and the test will pass:

(function ($) {
    $.fn.zebrafy = function () {
        return this.each(function (index, table) {
            $(table).find("tr:even").addClass("zebrafy-even");
            $(table).find("tr:odd").addClass("zebrafy-odd");
        });
    };
})(jQuery);
So, the plugin is tested and ready to release! :) You can check the entire code, with more specs, in a Github repository.

by fsouza (noreply@blogger.com) on December 10, 2019 at 03:42

Splinter: Python tool for acceptance tests on web applications

Capybara and Webrat are great Ruby tools for acceptance tests. A few months ago, we started a great tool for acceptance tests in Python web applications, called Splinter. There are many acceptance test tools in the Python world: Selenium, Alfajor, Windmill, Mechanize, zope.testbrowser, etc. Splinter was not created to be another acceptance tool, but an abstraction layer over other tools; its goal is to provide a unique API that makes acceptance testing easier and more fun.

In this post, I will show some basic usage of Splinter for simple web application tests. Splinter is a tool useful for testing any web application. You can even test a Java web application using Splinter. This post’s example is a "test" of a Facebook feature, just because I want to focus on how to use Splinter, not on how to write a web application. The feature to be tested is the creation of an event (the Splinter sprint), following the whole flow: first the user logs in on Facebook, then clicks on the "Events" menu item, then clicks on the "Create an Event" button, enters all the event information and clicks on the "Create event" button. So, let’s do it…

The first step is to create a Browser instance, which provides methods for interacting with the browser (where the browser is Firefox, Chrome, etc.). The code we need for it is very simple:
browser = Browser("firefox")
Browser is a class and its constructor receives the driver to be used with that instance. Nowadays, there are three drivers for Splinter: firefox, chrome and zope.testbrowser. We are using Firefox, and you can easily use Chrome by simply changing the driver from firefox to chrome. It’s also very simple to add another driver to Splinter, and I plan to cover how to do that in another blog post here.
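A minimal sketch of what switching drivers looks like (the driver names come from the paragraph above; the import path is an assumption, since the post never shows it):
from splinter import Browser

browser = Browser("firefox")             # drives a real Firefox window
# browser = Browser("chrome")            # same API, backed by Chrome
# browser = Browser("zope.testbrowser")  # headless, no real browser involved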

A new browser session is started when we get the browser object, and this is the object used for interacting with Firefox. Let's create a new event on Facebook, the Splinter Sprint. First of all, we need to visit the Facebook homepage. There is a visit method on the Browser class, so we can use it:
browser.visit("https://www.facebook.com")
visit is a blocking operation: it waits for the page to load, and then we can navigate, click on links, fill forms, etc. Now we have the Facebook homepage opened in the browser, and you probably know that we need to log in on the Facebook page, but what if we are already logged in? So, let's create a method that logs in on Facebook with the provided authentication data only if the user is not logged in (imagine we are in a TestCase class):
def do_login_if_need(self, username, password):
    if self.browser.is_element_present_by_css('div.menu_login_container'):
        self.browser.fill('email', username)
        self.browser.fill('pass', password)
        self.browser.find_by_css('div.menu_login_container input[type="submit"]').first.click()
    assert self.browser.is_element_present_by_css('li#navAccount')
What did we do here? First of all, the method checks whether an element is present on the page, using a CSS selector. It checks for a div that contains the username and password fields. If that div is present, we tell the browser object to fill those fields, then find the submit button and click on it. The last line is an assert to guarantee that the login was successful and that the current page is the Facebook homepage (by checking the presence of the “Account” li).
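Here is a minimal sketch of how this helper could be wired into a test case; the class name and credentials are made up, and it assumes do_login_if_need is defined on this same TestCase, exactly as above:
import unittest

from splinter import Browser


class SplinterSprintTest(unittest.TestCase):

    def setUp(self):
        self.browser = Browser("firefox")
        self.browser.visit("https://www.facebook.com")
        # helper defined above on this class
        self.do_login_if_need("someone@example.com", "a secret password")

    def tearDown(self):
        self.browser.quit()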

We could also find elements by their text, label or whatever appears on the screen, but remember: Facebook is an internationalized web application, and we can’t test it relying on a specific language.
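Just to illustrate the point, a text-based lookup would look roughly like the first line below (the method name is an assumption based on Splinter's finder naming); the CSS-based version used in the rest of the post is the one that survives a language switch:
# fragile: breaks as soon as the UI language changes
browser.find_link_by_text('Events').first.click()

# robust: keyed on markup structure, not on translated text
browser.find_by_css('li#navItem_events a').first.click()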

Okay, now we know how to visit a webpage, check if an element is present, fill a form and click on a button. We're also logged in on Facebook and can finally go ahead and create the Splinter sprint event. So, here is the event creation flow, from the user's point of view:
  1. On the Facebook homepage, click on the “Events” link in the left menu
  2. The “Events” page will load, so click on the “Create an Event” button
  3. The user sees a page with a form to create an event
  4. Fill in the date and choose the time
  5. Define what is the name of the event, where it will happen and write a short description for it
  6. Invite some guests
  7. Upload a picture for the event
  8. Click on “Create Event” button
We are going to follow all these steps, except the 6th, because the Splinter Sprint will just be a public event and we don’t need to invite anybody. There are some boring AJAX requests on Facebook that we need to deal with, so not all the code for the steps above is Splinter code. The first step is clicking on the “Events” link. All we need to do is find the link and click on it:
browser.find_by_css('li#navItem_events a').first.click()
The find_by_css method takes a CSS selector and returns an ElementList. So, we get the first element of the list (even when the selector matches only one element, the return type is still a list) and click on it. Like the visit method, click is a blocking operation: the driver will only listen for new actions when the request is finished (the page is loaded).

We’re finally on the "new event" page, and there is a form on the screen waiting for the Splinter Sprint data. Let’s fill in the form. Here is the code for it:
browser.fill('event_startIntlDisplay', '5/21/2011')
browser.select('start_time_min', '480')
browser.fill('name', 'Splinter sprint')
browser.fill('location', 'Rio de Janeiro, Brazil')
browser.fill('desc', 'For more info, check out the #cobratem channel on freenode!')
That is it: the event is going to happen on May 21st, 2011, at 8:00 in the morning (480 minutes). As we know, the event name is Splinter sprint, and we are going to join some guys down here in Brazil. We filled out the form using the fill and select methods.

The fill method is used to fill a "fillable" field (a textarea, an input, etc.). It receives two strings: the first is the name of the field to fill and the second is the value that will fill the field. select is used to select an option in a select element (a “combo box”). It also receives two string parameters: the first is the name of the select element, and the second is the value of the option being selected.

Imagine you have the following select element:
<select name="gender">
<option value="m">Male</option>
<option value="f">Female</option>
</select>
To select “Male”, you would call the select method this way:
browser.select("gender", "m")
The last action before clicking on the “Create Event” button is uploading a picture for the event. On the new event page, Facebook loads the file field for picture uploading inside an iframe, so we need to switch to this frame and interact with the form inside it. To show the frame, we need to click on the “Add Event Photo” button and then switch to it; we already know how to click on a link:
browser.find_by_css('div.eventEditUpload a.uiButton').first.click()
When we click this link, Facebook makes an asynchronous request, which means the driver does not stay blocked waiting for the end of the request, so if we try to interact with the frame BEFORE it appears, we will get an ElementDoesNotExist exception. Splinter provides the is_element_present method, which receives an argument called wait_time: the time Splinter will wait for the element to appear on the screen. If the element does not appear, we can’t go on, so we can assume the test failed (remember we are testing a Facebook feature):
if not browser.is_element_present_by_css('iframe#upload_pic_frame', wait_time=10):
    fail("The upload pic iframe didn't appear :(")
The is_element_present_by_css method takes a CSS selector and tries to find an element using it. It also receives a wait_time parameter that indicates a timeout for the search. So, if the iframe element with ID “upload_pic_frame” is not present or doesn’t appear on the screen within 10 seconds, the method returns False; otherwise it returns True.
Important: fail is pseudocode and doesn’t exist (if you’re using the unittest library, you can invoke self.fail in a TestCase, which is exactly what I did in the complete snippet for this example, available on GitHub).
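For instance, inside a unittest.TestCase the same check could be written like this (a minimal sketch: the test class and method names are made up, self.browser comes from a setUp like the one shown earlier, and self.fail is the standard unittest API):
import unittest


class CreateEventTest(unittest.TestCase):

    def test_shows_upload_pic_iframe(self):
        present = self.browser.is_element_present_by_css(
            'iframe#upload_pic_frame', wait_time=10)
        if not present:
            self.fail("The upload pic iframe didn't appear :(")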
Now we see the iframe element on the screen and we can finally upload the picture. Imagine we have a variable called picture_path that contains the path to the picture (and not a file object, StringIO, or anything like that); this is the code we need:
with browser.get_iframe('upload_pic_frame') as frame:
    frame.attach_file('pic', picture_path)
    time.sleep(10)
Splinter provides the get_iframe method, which changes the context and returns another object to interact with the content of the frame. So we call the attach_file method, which also receives two strings: the first is the name of the input element and the second is the absolute path to the file being sent. Facebook also uploads the picture asynchronously, but there’s no way to wait for some element to appear on the screen, so I just put Python to sleep for 10 seconds on the last line.

After finishing all these steps, we can finally click on the “Create Event” button and assert that Facebook created the event:
browser.find_by_css('label.uiButton input[type="submit"]').first.click()
title = browser.find_by_css('h1 span').first.text
assert title == 'Splinter sprint'
After creating an event, Facebook redirects the browser to the event page, so we can check that it really happened by asserting on the page header. That’s what the code above does: on the new event page, it clicks the submit button and, after the redirect, gets the text of a span element and asserts that this text equals “Splinter sprint”.

That is it! This post was an overview of the Splinter API. Check out the complete snippet, written as a test case, and also check out the Splinter repository on GitHub.

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

Killer Java applications server with nginx and memcached

In the last few days I worked on setting up a new web serving structure for Wine, the largest wine e-commerce site in Latin America. After testing, studying and learning a lot, we built a nice solution based on nginx and memcached. I will use a picture to describe the architecture:

As you can see, when a client makes a request to the nginx server, nginx first checks on memcached whether the response is already cached. If the response is not found on the cache server, then nginx forwards the request to Tomcat, which processes the request, caches the response on memcached and returns it to nginx. Tomcat works only for the first client, and all other clients requesting the same resource will get the cached response from RAM. My objective with this post is to show how we built this architecture.

nginx

nginx was compiled following the Linode instructions for installing nginx from source. The only difference is that we added the nginx memcached module. So, first I downloaded the memc_module source from GitHub and then built nginx with it. Here are the commands for compiling nginx with the memcached module:
% ./configure --prefix=/opt/nginx --user=nginx --group=nginx --with-http_ssl_module --add-module={your memc_module source path}
% make
% sudo make install
After installing nginx and creating an init script for it, we can work on its settings for the integration with Tomcat. Just to keep the settings separate, we changed the nginx.conf file (located in the /opt/nginx/conf directory), and it now looks like this:
user  nginx;
worker_processes 1;

error_log logs/error.log;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log logs/access.log main;

    sendfile on;
    #tcp_nopush on;

    #keepalive_timeout 0;
    keepalive_timeout 65;

    #gzip on;

    include /opt/nginx/sites-enabled/*;
}
See the last line inside http section: this line tells nginx to include all settings present in the /opt/nginx/sites-enabled directory. So, now, let’s create a default file in this directory, with this content:
server {
    listen 80;
    server_name localhost;

    default_type text/html;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        if ($request_method = POST) {
            proxy_pass http://localhost:8080;
            break;
        }

        set $memcached_key "$uri";
        memcached_pass 127.0.0.1:11211;

        error_page 501 404 502 = /fallback$uri;
    }

    location /fallback/ {
        internal;

        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect off;

        proxy_pass http://localhost:8080;
    }
}
Some things need explaining here: the default_type directive is necessary for proper serving of cached responses (if you are caching other content types, like application/json or application/xml, you should take a look at the nginx documentation and handle content types conditionally). The location / scope defines some settings for the proxy, like IP and host. We do it because we need to pass the right information to our backend (Tomcat or memcached). See more about proxy_set_header in the nginx documentation. After that, there is a simple check of the request method: we don't want to cache POST requests.

Now we get to the magic: first we set $memcached_key, and then we use the memcached_pass directive; the $memcached_key is the URI. memcached_pass is very similar to proxy_pass: nginx “proxies” the request to memcached, so we get an HTTP status code back, like 200, 404 or 502. We define error handlers for two status codes:
  • 404: the memcached module returns a 404 error when the key is not on the memcached server;
  • 502: the memcached module returns a 502 error when it can't find the memcached server.
So, when nginx gets any of those errors, it should forward the request to Tomcat, creating another proxy. We configure that in the fallback location, an internal location that builds a proxy between nginx and Tomcat (listening on port 8080). Everything is set up on the nginx side. As you can see in the picture and in the nginx configuration file, nginx doesn't write anything to memcached, it only reads from it. The application is the one that should write to memcached. Let's do it.

Java application

Now it is time to write some code. I chose an application written by a friend of mine. It's a very simple CRUD of users, built by Washington Botelho with the goal of introducing VRaptor, a powerful and fast development-focused web framework. Washington also wrote a blog post explaining the application; if you don't know VRaptor or want to know how the application was built, check out the blog post "Getting started with VRaptor 3". I forked the application, made some minor changes and added a magic filter for caching. The only Java code I want to show here is the filter code:

package com.franciscosouza.memcached.filter;

import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.InetSocketAddress;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

import net.spy.memcached.MemcachedClient;

/**
 * Servlet Filter implementation class MemcachedFilter
 */
public class MemcachedFilter implements Filter {

    private MemcachedClient mmc;

    static class MemcachedHttpServletResponseWrapper extends HttpServletResponseWrapper {

        private StringWriter sw = new StringWriter();

        public MemcachedHttpServletResponseWrapper(HttpServletResponse response) {
            super(response);
        }

        public PrintWriter getWriter() throws IOException {
            return new PrintWriter(sw);
        }

        public ServletOutputStream getOutputStream() throws IOException {
            throw new UnsupportedOperationException();
        }

        public String toString() {
            return sw.toString();
        }
    }

    /**
     * Default constructor.
     */
    public MemcachedFilter() {
    }

    /**
     * @see Filter#destroy()
     */
    public void destroy() {
    }

    /**
     * @see Filter#doFilter(ServletRequest, ServletResponse, FilterChain)
     */
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        MemcachedHttpServletResponseWrapper wrapper = new MemcachedHttpServletResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);

        HttpServletRequest inRequest = (HttpServletRequest) request;
        HttpServletResponse inResponse = (HttpServletResponse) response;

        String content = wrapper.toString();

        PrintWriter out = inResponse.getWriter();
        out.print(content);

        if (!inRequest.getMethod().equals("POST")) {
            String key = inRequest.getRequestURI();
            mmc.set(key, 5, content);
        }
    }

    /**
     * @see Filter#init(FilterConfig)
     */
    public void init(FilterConfig fConfig) throws ServletException {
        try {
            mmc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        } catch (IOException e) {
            e.printStackTrace();
            throw new ServletException(e);
        }
    }
}
First, the dependency: for memcached communication, we used the spymemcached client. It is a simple and easy-to-use memcached library. I won't explain all the code line by line, but I can tell you the idea behind it: first, call the doFilter method on the FilterChain, because we want to get the response and work with it. Take a look at the MemcachedHttpServletResponseWrapper instance: it encapsulates the response and makes it easier to play with the response content.

We get the content, write it to the response writer and put it in the cache using the MemcachedClient provided by spymemcached. The request URI is the key and the timeout is 5 seconds.
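Since this is a Python planet, here is a rough sketch of the same idea as a WSGI middleware using the python-memcached client; the server address and the 5-second timeout mirror the Java filter above, while the class name and everything else are assumptions for illustration, not part of the original project:
import memcache


class MemcachedMiddleware(object):
    """Cache non-POST responses on memcached, keyed by request path."""

    def __init__(self, app, servers=("127.0.0.1:11211",), timeout=5):
        self.app = app
        self.mmc = memcache.Client(list(servers))
        self.timeout = timeout

    def __call__(self, environ, start_response):
        # let the wrapped application build the response first
        body = b"".join(self.app(environ, start_response))
        if environ["REQUEST_METHOD"] != "POST":
            # the request path is the key, expiring after a few seconds
            self.mmc.set(environ["PATH_INFO"], body, time=self.timeout)
        return [body]
The nginx side stays exactly the same: it keeps reading from memcached and only reaches the application on a cache miss.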

web.xml

The last step is to add the filter to the project's web.xml file; mapping it before the VRaptor filter is very important for it to work properly:

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" id="WebApp_ID" version="2.5">
    <display-name>memcached sample</display-name>

    <filter>
        <filter-name>vraptor</filter-name>
        <filter-class>br.com.caelum.vraptor.VRaptor</filter-class>
    </filter>

    <filter>
        <filter-name>memcached</filter-name>
        <filter-class>com.franciscosouza.memcached.filter.MemcachedFilter</filter-class>
    </filter>

    <filter-mapping>
        <filter-name>memcached</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

    <filter-mapping>
        <filter-name>vraptor</filter-name>
        <url-pattern>/*</url-pattern>
        <dispatcher>FORWARD</dispatcher>
        <dispatcher>REQUEST</dispatcher>
    </filter-mapping>

</web-app>
That is it! Now you can just run Tomcat on port 8080 and nginx on port 80, and access http://localhost in your browser. Try something: raise the cache timeout, navigate around the application and turn Tomcat off. You will still be able to navigate through some pages that use the GET request method (the users list, the home page and the users form).

Check the entire code out on Github: https://github.com/fsouza/starting-with-vraptor-3. If you have any questions, troubles or comments, please let me know! ;)

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

Flying with tipfy on Google App Engine

Hooray, there is a bonus part in the series (after a looooooooooooong wait)! In the first blog post, about Django, I received a comment about the use of tipfy, a small Python web framework made specifically for Google App Engine. Like Flask, tipfy is not a full stack framework and we will not use a database abstraction layer: we will use just the Google App Engine Datastore API. But tipfy was designed for Google App Engine, so it is less laborious to work with it on App Engine.

First, we have to download tipfy. There are two options on the official tipfy page: an all-in-one package and a do-it-yourself package. I am lazy, so I downloaded and used the all-in-one package. It is really easy:
% wget http://www.tipfy.org/tipfy.build.tar.gz
% tar -xvzf tipfy.0.6.2.build.tar.gz
% mv project gaeseries
After that, we go to the project folder and look at the project structure provided by tipfy. There is a directory called "app", where the App Engine app is located. The app.yaml file is in the app directory, so we open that file and change the application id and the application version. Here is the app.yaml file:
application: gaeseries
version: 4
runtime: python
api_version: 1

derived_file_type:
- python_precompiled

handlers:
- url: /(robots\.txt|favicon\.ico)
  static_files: static/\1
  upload: static/(.*)

- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

- url: /_ah/queue/deferred
  script: main.py
  login: admin

- url: /.*
  script: main.py
After this, we can start to code our application. tipfy deals with requests using handlers. A handler is a class that has methods to deal with different kinds of requests. That reminds me a little of Struts Actions (blergh), but tipfy is a Python framework, which means it is easier to build web applications with it!

Understanding tipfy: a URL is mapped to a handler, which does something with the request and returns a response. So, we have to create two handlers: one for the list of posts and another to create a post. But first let's create an application called blog, and a model called Post. Like Django, Flask and web2py, tipfy also works with applications inside a project.

To create an application, we just need to create a new Python package with the application name:
% mkdir blog
% touch blog/__init__.py
After creating the application structure, we install it by putting the application inside the "apps_installed" list in the config.py file:
# -*- coding: utf-8 -*-
"""
config
~~~~~~

Configuration settings.

:copyright: 2009 by tipfy.org.
:license: BSD, see LICENSE for more details.
"""
config = {}

# Configurations for the 'tipfy' module.
config['tipfy'] = {
    # Enable debugger. It will be loaded only in development.
    'middleware': [
        'tipfy.ext.debugger.DebuggerMiddleware',
    ],
    # Enable the Hello, World! app example.
    'apps_installed': [
        'apps.hello_world',
        'apps.blog',
    ],
}
Note the 'apps.blog' entry in the apps_installed list. Inside the application folder, let's create a Python module called models.py. This module is exactly the same as the one from the Flask post:
from google.appengine.ext import db

class Post(db.Model):
    title = db.StringProperty(required = True)
    content = db.TextProperty(required = True)
    when = db.DateTimeProperty(auto_now_add = True)
    author = db.UserProperty(required = True)
After creating the model, let's start building the project by creating the post listing handler. The handlers will be in a module called handlers.py, inside the application folder. Here is the handlers.py code:
# -*- coding: utf-8 -*-
from tipfy import RequestHandler
from tipfy.ext.jinja2 import render_response
from models import Post

class PostListingHandler(RequestHandler):
    def get(self):
        posts = Post.all()
        return render_response('list_posts.html', posts=posts)
See that we get a list containing all posts from the database and send it to the list_posts.html template. Like Flask, tipfy uses Jinja2 as its template engine by default. Following the same approach, let's create a base.html file that represents the layout of the project. This file should be inside the templates folder and contain the following code:
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
<title>{% block title %}{% endblock %}</title>
</head>
<body id="">
{% block content %}{% endblock %}
</body>
</html>
And now we can create the list_posts.html template extending the base.html template:
{% extends "base.html" %}

{% block title %}
Posts list
{% endblock %}

{% block content %}
Listing all posts:

<ul>
{% for post in posts %}
<li>
{{ post.title }} (written by {{ post.author.nickname() }})
{{ post.content }}
</li>
{% endfor %}
</ul>
{% endblock %}
Can we access the list of posts by URL now? No, we can't yet. First we have to map the handler to a URL, and then we will be able to access the list of posts through the browser. In tipfy, all URL mappings of an application are located in a Python module called urls.py. Create it with the following code:
from tipfy import Rule

def get_rules(app):
    rules = [
        Rule('/posts', endpoint='post-listing', handler='apps.blog.handlers.PostListingHandler'),
    ]

    return rules
It is very simple: a Python module containing a function called get_rules, which receives the app object as a parameter and returns a list containing the rules of the application (each rule is an instance of the tipfy.Rule class). Now we can finally see the empty post list in the browser, by running the App Engine development server and hitting the http://localhost:8080/posts URL. Run the following command in the project root:
% /usr/local/google_appengine/dev_appserver.py app
Then check http://localhost:8080/posts in the browser, and we see the empty list. Now, let's create the protected handler which will create a new post. tipfy has an auth extension, which makes it very easy to deal with authentication using the native Google App Engine users API. To use it, we need to configure the session extension, changing the config.py module by adding the following lines:
config['tipfy.ext.session'] = {
    'secret_key' : 'just_dev_testH978DAGV9B9sha_W92S',
}
Now we are ready to create the NewPostHandler. We will need to deal with forms, and tipfy has an extension for integration with WTForms, so we have to download and install WTForms and that extension in the project:
% wget http://bitbucket.org/simplecodes/wtforms/get/tip.tar.bz2
% tar -xvf tip.tar.bz2
% cp -r wtforms/wtforms/ ~/Projetos/gaeseries/app/lib/
% wget http://pypi.python.org/packages/source/t/tipfy.ext.wtforms/tipfy.ext.wtforms-0.6.tar.gz
% tar -xvzf tipfy.ext.wtforms-0.6.tar.gz
% cp -r tipfy.ext.wtforms-0.6/tipfy ~/Projetos/gaeseries/app/distlib
Now we have WTForms extension installed and ready to be used. Let’s create the PostForm class, and then create the handler. I put both classes in the handlers.py file (yeah, including the form). Here is the PostForm class code:
class PostForm(Form):
    csrf_protection = True
    title = fields.TextField('Title', validators=[validators.Required()])
    content = fields.TextAreaField('Content', validators=[validators.Required()])
Add this class to the handlers.py module:
class NewPostHandler(RequestHandler, AppEngineAuthMixin, AllSessionMixins):
    middleware = [SessionMiddleware]

    @login_required
    def get(self, **kwargs):
        return render_response('new_post.html', form=self.form)

    @login_required
    def post(self, **kwargs):
        if self.form.validate():
            post = Post(
                title = self.form.title.data,
                content = self.form.content.data,
                author = self.auth_session
            )
            post.put()
            return redirect('/posts')
        return self.get(**kwargs)

    @cached_property
    def form(self):
        return PostForm(self.request)
A lot of new things here: tipfy exploits Python's multiple inheritance, and if you want to use the auth extension backed by the native App Engine users API, you have to create your handler class extending the AppEngineAuthMixin and AllSessionMixins classes, and add the SessionMiddleware class to the middleware list. See more in the tipfy docs.

The last step is to create the new_post.html template and deploy the application. Here is the new_post.html template code:
{% extends "base.html" %}

{% block title %}
New post
{% endblock %}

{% block content %}
<form action="" method="post" accept-charset="utf-8">
<p>
<label for="title">{{ form.title.label }}</label>

{{ form.title|safe }}

{% if form.title.errors %}
<ul class="errors">
{% for error in form.title.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</p>
<p>
<label for="content">{{ form.content.label }}</label>

{{ form.content|safe }}

{% if form.content.errors %}
<ul class="errors">
{% for error in form.content.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</p>
<p><input type="submit" value="Save post"/></p>
</form>
{% endblock %}
Now, we can deploy the application on Google App Engine by simply running this command:
% /usr/local/google_appengine/appcfg.py update app
And you can check the deployed application live here: http://4.latest.gaeseries.appspot.com.

The code is available at Github: https://github.com/fsouza/gaeseries/tree/tipfy.

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

Flying with Flask on Google App Engine

A little late, I finally introduce the third part of using Python frameworks on Google App Engine. I wrote before about web2py and Django, and now it is Flask's turn: a Python microframework based on Werkzeug, Jinja2 and good intentions. Unlike Django and web2py, Flask is not a full stack framework: it has no database abstraction layer or object-relational mapper, and it is totally decoupled from the model layer. That is really good, because we can use the power of SQLAlchemy when we are working with relational databases, and when we work with non-relational databases, we can use the native API.

Flask is a microframework, which means we have more power to customize our applications, but it is also a little more painful to build an application, because the framework is not a father that does about 10 billion things for us: it is simple, but still fun! As Flask has no data abstraction layer, we will use the BigTable API directly.

So, as we have done in the other parts of the series, the sample application will be a very simple blog, with a public view listing all posts and another login-protected view used for writing posts. The first step is to set up the environment. It is very simple, but a little laborious: first we create an empty directory and put the app.yaml file inside it (yes, we will build everything from scratch). Here is the app.yaml code:
application: gaeseries
version: 3
runtime: python
api_version: 1

handlers:
- url: .*
  script: main.py
We just set the application ID, the version and the URL handlers. We will handle all requests in the main.py file. Later in this post, I will show the main.py module, the script that hooks Flask up to Google App Engine. Now, let's create the Flask application, and deal with App Engine later :)

Now we need to install Flask inside the application, so we get Flask from GitHub (I used the 0.6 version), extract it, and inside the extracted directory grab the flask subdirectory. Because Flask depends on Werkzeug and Jinja2, and Jinja2 depends on simplejson, you need to get these libraries and install them in your application too. Here is how you can get everything:
% wget http://github.com/mitsuhiko/flask/zipball/0.6
% unzip mitsuhiko-flask-0.6-0-g5cadd9d.zip
% cp -r mitsuhiko-flask-5cadd9d/flask ~/Projetos/blog/gaeseries
% wget http://pypi.python.org/packages/source/W/Werkzeug/Werkzeug-0.6.2.tar.gz
% tar -xvzf Werkzeug-0.6.2.tar.gz
% cp -r Werkzeug-0.6.2/werkzeug ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/J/Jinja2/Jinja2-2.5.tar.gz
% tar -xvzf Jinja2-2.5.tar.gz
% cp -r Jinja2-2.5/jinja2 ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/s/simplejson/simplejson-2.1.1.tar.gz
% tar -xvzf simplejson-2.1.1.tar.gz
% cp -r simplejson-2.1.1/simplejson ~/Projetos/blog/gaeseries/
On my computer, the project is under ~/Projetos/blog/gaeseries; put all the downloaded tools in the root of your application. Now we have everything we need to start creating our Flask application, so let's create a Python package called blog, which will be the application directory:
% mkdir blog
% touch blog/__init__.py
Inside the __init__.py module, we will create our Flask application and start to code. Here is the __init__.py code:
from flask import Flask
import settings

app = Flask('blog')
app.config.from_object('blog.settings')

import views
We imported two modules: settings and views. So we should create those two modules, where we will put the application settings and the application views (note that Flask works the same way as Django, calling “views” the functions that receive a request and return a response, instead of calling them “actions” like web2py does). Just create the files:
% touch blog/views.py
% touch blog/settings.py
Here is the settings.py sample code:
DEBUG=True
SECRET_KEY='dev_key_h8hfne89vm'
CSRF_ENABLED=True
CSRF_SESSION_LKEY='dev_key_h8asSNJ9s9=+'
Now is the time to define the model Post. We will define our models inside the application directory, in a module called models.py:
from google.appengine.ext import db

class Post(db.Model):
    title = db.StringProperty(required = True)
    content = db.TextProperty(required = True)
    when = db.DateTimeProperty(auto_now_add = True)
    author = db.UserProperty(required = True)
The last property is a UserProperty, a “foreign key” to a user. We will use the Google App Engine users API, and the datastore API provides this property to establish a relationship between custom models and the Google account model.
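Just to make the relationship concrete, here is a minimal sketch of how a Post would be created and saved with the current Google account as the author; the title and content are made up, and the import path for the model is an assumption, while users.get_current_user and put are the real App Engine calls used later in this post:
from google.appengine.api import users

from blog.models import Post

# assumes there is a logged-in user; get_current_user() returns None otherwise
post = Post(
    title='Hello, datastore',
    content='My first post stored on BigTable.',
    author=users.get_current_user(),
)
post.put()  # `when` is filled automatically thanks to auto_now_add=True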

We have defined the model, and we can finally start to create the application’s views. Inside the views module, let’s create the public view with all posts, that will be accessed by the URL /posts:
from blog import app
from models import Post
from flask import render_template

@app.route('/posts')
def list_posts():
    posts = Post.all()
    return render_template('list_posts.html', posts=posts)
On the last line of the view, we call the render_template function, which renders a template. The first parameter of this function is the template to be rendered; we passed list_posts.html, so let's create it using the Jinja2 syntax, inspired by Django templates. Inside the application directory, create a subdirectory called templates and put inside it an HTML file called base.html. That file will be the application layout, and here is its code:
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
<title>{% block title %}Blog{% endblock %}</title>
</head>
<body>
{% block content %}{% endblock %}
</body>
</html>
And now create the list_posts.html template, with the following code:
{% extends "base.html" %}

{% block content %}
<ul>
{% for post in posts %}
<li>
{{ post.title }} (written by {{ post.author.nickname() }})

{{ post.content }}
</li>
{% endfor %}
</ul>
{% endblock %}
Now, to test it, we need to run the Google App Engine development server on localhost. The app.yaml file defines the main.py script as the handler for all requests, so to use the local development server we need to create the main.py file that runs our application. Every Flask application is a WSGI application, so we can use an App Engine tool for running WSGI applications. That makes the main.py script really simple:
from google.appengine.ext.webapp.util import run_wsgi_app
from blog import app

run_wsgi_app(app)
The script uses the run_wsgi_app function provided by webapp, the built-in Google Python web framework for App Engine. Now, we can run the application in the same way that we ran in the web2py post:
% /usr/local/google_appengine/dev_appserver.py .
And if you access the URL http://localhost:8080/posts in your browser, you will see a blank page, just because there are no posts in the database. Now we will create a login-protected view to write and save a post to the database. Google App Engine does not provide a decorator to check whether a user is logged in, and Flask doesn't provide one either. So, let's create a function decorator called login_required and decorate the new_post view with it. I created the decorator inside a decorators.py module and imported it inside the views.py module. Here is the decorators.py code:
from functools import wraps
from google.appengine.api import users
from flask import redirect, request

def login_required(func):
    @wraps(func)
    def decorated_view(*args, **kwargs):
        if not users.get_current_user():
            return redirect(users.create_login_url(request.url))
        return func(*args, **kwargs)
    return decorated_view
In the new_post view we will deal with forms. IMO, WTForms is the best way to deal with forms in Flask. There is a Flask extension called Flask-WTF, and we can install it in our application to deal with forms more easily. Here is how we can install WTForms and Flask-WTF:
% wget http://pypi.python.org/packages/source/W/WTForms/WTForms-0.6.zip
% unzip WTForms-0.6.zip
% cp -r WTForms-0.6/wtforms ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/F/Flask-WTF/Flask-WTF-0.2.3.tar.gz
% tar -xvzf Flask-WTF-0.2.3.tar.gz
% cp -r Flask-WTF-0.2.3/flaskext ~/Projetos/blog/gaeseries/
Now that we have WTForms and Flask-WTF installed, we can create a new WTForm with two fields: title and content. Remember that the date and author will be filled automatically with the current datetime and the current user. Here is the PostForm code (I put it inside the views.py file, but it is possible to put it in a separate forms.py file):
from flaskext import wtf
from flaskext.wtf import validators

class PostForm(wtf.Form):
    title = wtf.TextField('Title', validators=[validators.Required()])
    content = wtf.TextAreaField('Content', validators=[validators.Required()])
Now we can create the new_post view:
@app.route('/posts/new', methods = ['GET', 'POST'])
@login_required
def new_post():
    form = PostForm()
    if form.validate_on_submit():
        post = Post(title = form.title.data,
                    content = form.content.data,
                    author = users.get_current_user())
        post.put()
        flash('Post saved on database.')
        return redirect(url_for('list_posts'))
    return render_template('new_post.html', form=form)
Now all we need to do is build the new_post.html template; here is the code for it:
{% extends "base.html" %}

{% block content %}
<h1 id="">Write a post</h1>
<form action="{{ url_for('new_post') }}" method="post" accept-charset="utf-8">
{{ form.csrf_token }}
<p>
<label for="title">{{ form.title.label }}</label>

{{ form.title|safe }}

{% if form.title.errors %}
<ul class="errors">
{% for error in form.title.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</p>
<p>
<label for="content">{{ form.content.label }}</label>

{{ form.content|safe }}

{% if form.content.errors %}
<ul class="errors">
{% for error in form.content.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</p>
<p><input type="submit" value="Save post"/></p>
</form>
{% endblock %}
Now everything is working. We can run the Google App Engine local development server, access the URL http://localhost:8080/posts/new in the browser, then write a post and save it! Everything is ready to deploy, and the deploy process is the same as web2py's; just run this in the terminal:
% /usr/local/google_appengine/appcfg.py update .
And now the application is online :) Check this out: http://3.latest.gaeseries.appspot.com (use your Google Account to write posts).

You can also check the code out in Github: https://github.com/fsouza/gaeseries/tree/flask.

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

Flying with web2py on Google App Engine

Here is the second part of the series about Python frameworks under Google App Engine. Now we will talk about web2py, a simple and fast Python web framework. Like Django, web2py has a great data abstraction layer. Unlike Django, the web2py data abstraction layer (DAL) was designed to manage non-relational databases, including BigTable.

The first step is setting up the environment, which is really easy ;) First, go to the official web2py website and, in the download section, get the source code in a zip file called web2py_src.zip. After downloading this file, extract it. A directory called web2py will be created; I renamed it to web2py_blog, but that is not relevant. The extracted web2py directory is ready for Google App Engine: it contains an app.yaml file with the settings of the application. For the application developed here, the following file was used:
application: gaeseries
version: 2
api_version: 1
runtime: python

handlers:

- url: /(?P<a>.+?)/static/(?P<b>.+)
  static_files: applications/\1/static/\2
  upload: applications/(.+?)/static/(.+)
  secure: optional
  expiration: "90d"

- url: /admin-gae/.*
  script: $PYTHON_LIB/google/appengine/ext/admin
  login: admin

- url: /_ah/queue/default
  script: gaehandler.py
  login: admin

- url: .*
  script: gaehandler.py
  secure: optional

skip_files: |
  ^(.*/)?(
  (app\.yaml)|
  (app\.yml)|
  (index\.yaml)|
  (index\.yml)|
  (#.*#)|
  (.*~)|
  (.*\.py[co])|
  (.*/RCS/.*)|
  (\..*)|
  ((admin|examples|welcome)\.tar)|
  (applications/(admin|examples)/.*)|
  (applications/.*?/databases/.*) |
  (applications/.*?/errors/.*)|
  (applications/.*?/cache/.*)|
  (applications/.*?/sessions/.*)|
  )$
I changed only the first two lines; everything else was provided by web2py. The web2py project contains a subdirectory called applications, where the web2py applications are located. There is an application called welcome used as a scaffold to build new applications. So, let's copy this directory and rename it to blog. Now we can walk the same path we walked in the Django post: we will use two actions on a controller: one protected by login, where we will save posts, and another public action, where we will list all posts.

We need to define our table model using the web2py database abstraction layer. There is a directory called models with a file called db.py inside the application directory (blog). There is a lot of code in this file, and it is already configured to use Google App Engine (web2py is amazing here) and the web2py built-in authentication tool. We will just add our Post model at the end of the file. Here is the code that defines the model:
current_user_id = (auth.user and auth.user.id) or 0

db.define_table('posts', db.Field('title'),
    db.Field('content', 'text'),
    db.Field('author', db.auth_user, default=current_user_id, writable=False),
    db.Field('date', 'datetime', default=request.now, writable=False)
)

db.posts.title.requires = IS_NOT_EMPTY()
db.posts.content.requires = IS_NOT_EMPTY()
This code looks a little strange, but it is very simple: we define a database table called posts with four fields: title (a varchar, the default type), content (a text), author (a reference to the auth_user table; forget the "foreign key" notion on BigTable) and date (an automatically filled datetime field). On the last two lines, we define two validations for this model: title and content must not be empty.
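Since the model is defined through the DAL, querying it works the same way on BigTable; for example, a hypothetical query for the ten most recent posts written by the logged-in user could look like this inside a controller (orderby and limitby are regular DAL arguments; the variable name is made up):
recent_posts = db(db.posts.author == auth.user.id).select(
    db.posts.ALL,
    orderby=~db.posts.date,
    limitby=(0, 10),
)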

Now it is time to define a controller with an action that lists all posts registered in the database. Another subdirectory of the blog application is the controllers directory, where we put the controllers. web2py controllers are Python modules, and each function of such a module is an action, which responds to HTTP requests. web2py has an automatic URL convention for actions: /<application>/<controller>/<action>. In our example, we will have a controller called posts, so it will be a file called posts.py inside the controllers directory.
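To make the convention concrete, here is how the URLs used in this post map to controller functions, and how the URL helper (used later in the new action) builds the same paths:
# /blog/posts/index  ->  controllers/posts.py, def index()
# /blog/posts/new    ->  controllers/posts.py, def new()

URL('blog', 'posts', 'index')   # -> '/blog/posts/index'
URL('blog', 'posts', 'new')     # -> '/blog/posts/new'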

In the posts.py controller, we will have the index action; that way, when we access the URL /blog/posts, we will see the list of posts. Here is the code of the index action:
def index():
    posts = db().select(db.posts.ALL)
    return response.render('posts/index.html', locals())
As you can see, it is just a little bit of code :) Now we need to make the posts/index.html view. The web2py view system allows the developer to use native Python code in templates, which means the developer/designer has more power and possibilities. Here is the code of the posts/index.html view (it should be inside the views directory):
{{extend 'layout.html'}}
<h1 id="">Listing all posts</h1>
<dl>
{{for post in posts:}}
<dt>{{=post.title}} (written by {{=post.author.first_name}})</dt>
<dd>{{=post.content}}</dd>
{{pass}}
</dl>
And now we can run the Google App Engine server locally by typing the following command inside the project root (I have the Google App Engine SDK extracted on my /usr/local/google_appengine):
% /usr/local/google_appengine/dev_appserver.py .
If you check the URL http://localhost:8080/blog/posts, then you will see that we have no posts in the database yet, so let’s create the login protected action that saves a post on the database. Here is the action code:
@auth.requires_login()
def new():
    form = SQLFORM(db.posts, fields=['title','content'])
    if form.accepts(request.vars, session):
        response.flash = 'Post saved.'
        redirect(URL('blog', 'posts', 'index'))
    return response.render('posts/new.html', dict(form=form))
Note that there is a decorator. web2py includes a complete authentication and authorization system, which also handles new user registration. So you can access the URL /blog/default/user/register and register yourself to write posts :) Here is the posts/new.html view code, which displays the form:
{{extend 'layout.html'}}

<h1 id="">
Save a new post</h1>
{{=form}}
After that, the application is ready to be deployed. The way to do it is running the following command in the project root:
% /usr/local/google_appengine/appcfg.py update .
And see the magic! :) You can check this application live here: http://2.latest.gaeseries.appspot.com/ (you can login with the e-mail demo@demo.com and the password demo, you can also register yourself).

And the code here: https://github.com/fsouza/gaeseries/tree/web2py.

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

Using Juju to orchestrate CentOS-based cloud services

Earlier this week I had the opportunity to meet Kyle MacDonald, head of Ubuntu Cloud, during FISL, and he was surprised when we told him we are using Juju with CentOS at Globo.com. Then I decided to write this post explaining how we came up with a patched version of Juju that allows us to have CentOS clouds managed by Juju.

For those who don't know Juju, it's a service orchestration tool, focused on the devops "development method". It allows you to deploy services on clouds, on your local machine and even on bare metal machines (using Canonical's MAAS).

It's based on charms and very straightforward to use. Here is a very basic set of commands with which you can deploy a WordPress instance related to a MySQL service:

% juju bootstrap
% juju deploy mysql
% juju deploy wordpress
% juju add-relation wordpress mysql
% juju expose wordpress
These commands will bootstrap the environment, setting up a bootstrap machine which will manage your services; deploy mysql and wordpress instances; add a relation between them; and expose the wordpress port. Then voilà, we have a WordPress deployed and ready to serve our posts. Amazing, huh?

But there is an issue: although you can install the juju command line tool on almost any OS (including Mac OS), right now you are able to deploy only Ubuntu-based services (you must use an Ubuntu instance or container).

To change this behavior, and enable Juju to spawn CentOS instances (and containers, if you have a CentOS lxc template), we need to develop and apply some changes to Juju and cloud-init. Juju uses cloud-init to spawn machines with the proper dependencies set up, and cloud-init is based on modules. All we need to do is add a module able to install rpm packages using yum.

cloud-init modules are Python modules that start with cc_ and implement a `handle` function (for example, a module called "yum_packages" would be written to a file called cc_yum_packages.py). So, here is the code for the yum_packages module:

import subprocess
import traceback

from cloudinit import CloudConfig, util

frequency = CloudConfig.per_instance


def yum_install(packages):
    cmd = ["yum", "--quiet", "--assumeyes", "install"]
    cmd.extend(packages)
    subprocess.check_call(cmd)


def handle(_name, cfg, _cloud, log, args):
    pkglist = util.get_cfg_option_list_or_str(cfg, "packages", [])

    if pkglist:
        try:
            yum_install(pkglist)
        except subprocess.CalledProcessError:
            log.warn("Failed to install yum packages: %s" % pkglist)
            log.debug(traceback.format_exc())
            raise

    return True
The module installs all packages listed in the cloud-init YAML file. If we wanted to install the `emacs-nox` package, we would write this YAML file and use it as user data for the instance:

#cloud-config
modules:
- yum_packages
packages: [emacs-nox]
cloud-init already works on Fedora, with Python 2.7, but to work on CentOS 6, with Python 2.6, it needs a patch:

--- cloudinit/util.py 2012-05-22 12:18:21.000000000 -0300
+++ cloudinit/util.py 2012-05-31 12:44:24.000000000 -0300
@@ -227,7 +227,7 @@
stderr=subprocess.PIPE, stdin=subprocess.PIPE)
out, err = sp.communicate(input_)
if sp.returncode is not 0:
- raise subprocess.CalledProcessError(sp.returncode, args, (out, err))
+ raise subprocess.CalledProcessError(sp.returncode, args)
return(out, err)
I've packed up this module and this patch in an RPM package that must be pre-installed in the lxc template and AMI images. Now, we need to change Juju so that it uses the yum_packages module and includes all the RPM packages that we need to install when the machine comes up.

In Juju, there is a class that is responsible for building and rendering the YAML file used by cloud-init. We can extend it and change only two methods: _collect_packages, which returns the list of packages that will be installed in the machine after it is spawned; and render, which returns the file itself. Here is our CentOSCloudInit class (within the patch):

diff -u juju-0.5-bzr531.orig/juju/providers/common/cloudinit.py juju-0.5-bzr531/juju/providers/common/cloudinit.py
--- juju-0.5-bzr531.orig/juju/providers/common/cloudinit.py 2012-05-31 15:42:17.480769486 -0300
+++ juju-0.5-bzr531/juju/providers/common/cloudinit.py 2012-05-31 15:55:13.342884919 -0300
@@ -324,3 +324,32 @@
"machine-id": self._machine_id,
"juju-provider-type": self._provider_type,
"juju-zookeeper-hosts": self._join_zookeeper_hosts()}
+
+
+class CentOSCloudInit(CloudInit):
+
+ def _collect_packages(self):
+ packages = [
+ "bzr", "byobu", "tmux", "python-setuptools", "python-twisted",
+ "python-txaws", "python-zookeeper", "python-devel", "juju"]
+ if self._zookeeper:
+ packages.extend([
+ "zookeeper", "libzookeeper", "libzookeeper-devel"])
+ return packages
+
+ def render(self):
+ """Get content for a cloud-init file with appropriate specifications.
+
+ :rtype: str
+
+ :raises: :exc:`juju.errors.CloudInitError` if there isn't enough
+ information to create a useful cloud-init.
+ """
+ self._validate()
+ return format_cloud_init(
+ self._ssh_keys,
+ packages=self._collect_packages(),
+ repositories=self._collect_repositories(),
+ scripts=self._collect_scripts(),
+ data=self._collect_machine_data(),
+ modules=["ssh", "yum_packages", "runcmd"])
The other change we need is in the format_cloud_init function, in order to make it recognize the modules parameter that we used above, and tell cloud-init to not run apt-get (update nor upgrade). Here is the patch:

diff -ur juju-0.5-bzr531.orig/juju/providers/common/utils.py juju-0.5-bzr531/juju/providers/common/utils.py
--- juju-0.5-bzr531.orig/juju/providers/common/utils.py 2012-05-31 15:42:17.480769486 -0300
+++ juju-0.5-bzr531/juju/providers/common/utils.py 2012-05-31 15:44:06.605014021 -0300
@@ -85,7 +85,7 @@


def format_cloud_init(
- authorized_keys, packages=(), repositories=None, scripts=None, data=None):
+ authorized_keys, packages=(), repositories=None, scripts=None, data=None, modules=None):
"""Format a user-data cloud-init file.

This will enable package installation, and ssh access, and script
@@ -117,8 +117,8 @@
structure.
"""
cloud_config = {
- "apt-update": True,
- "apt-upgrade": True,
+ "apt-update": False,
+ "apt-upgrade": False,
"ssh_authorized_keys": authorized_keys,
"packages": [],
"output": {"all": "| tee -a /var/log/cloud-init-output.log"}}
@@ -136,6 +136,11 @@
if scripts:
cloud_config["runcmd"] = scripts

+ if modules:
+ cloud_config["modules"] = modules
+
output = safe_dump(cloud_config)
output = "#cloud-config\n%s" % (output)
return output
This patch is also packed up within juju-centos-6 repository, which provides sources for building RPM packages for juju, and also some pre-built RPM packages.

Now just build an AMI image with cloudinit pre-installed, configure your juju environments.yaml file to use this image in the environment and you are ready to deploy cloud services on CentOS machines using Juju!

Some caveats:
  • Juju needs a user called ubuntu to interact with its machines, so you will need to create this user in your CentOS AMI/template.
  • You need to host all RPM packages for juju, cloud-init and their dependencies in some yum repository (I haven't submitted them to any public repository);
  • With this patched Juju, you will have a pure-centos cloud. It does not enable you to have multiple OSes in the same environment.
It's important to notice that we are going to put some effort into making the Go version of Juju support multiple OSes from the start, ideally through an interface that makes it extensible to any other OS, not only Ubuntu and CentOS.

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

Go solution for the Dining philosophers problem

I spent part of Sunday solving the Dining Philosophers problem using Go. The solution given here is based on the description of the problem in The Little Book of Semaphores:

The Dining Philosophers Problem was proposed by Dijkstra in 1965, when dinosaurs ruled the earth. It appears in a number of variations, but the standard features are a table with five plates, five forks (or chopsticks) and a big bowl of spaghetti.

There are some constraints:
  • Only one philosopher can hold a fork at a time
  • It must be impossible for a deadlock to occur
  • It must be impossible for a philosopher to starve waiting for a fork
  • It must be possible for more than one philosopher to eat at the same time
No more talk, here is my solution for the problem:
package main

import (
    "fmt"
    "sync"
    "time"
)

type Fork struct {
    sync.Mutex
}

type Table struct {
    philosophers chan Philosopher
    forks        []*Fork
}

func NewTable(forks int) *Table {
    t := new(Table)
    t.philosophers = make(chan Philosopher, forks - 1)
    t.forks = make([]*Fork, forks)
    for i := 0; i < forks; i++ {
        t.forks[i] = new(Fork)
    }
    return t
}

func (t *Table) PushPhilosopher(p Philosopher) {
    p.table = t
    t.philosophers <- p
}

func (t *Table) PopPhilosopher() Philosopher {
    p := <-t.philosophers
    p.table = nil
    return p
}

type Philosopher struct {
    name  string
    index int
    fed   chan bool
    table *Table
}

func (p Philosopher) GetForks() (leftFork, rightFork *Fork) {
    forks := p.table.forks
    leftFork = forks[p.index]
    rightFork = forks[(p.index+1)%len(forks)]
    leftFork.Lock()
    rightFork.Lock()
    return leftFork, rightFork
}

func (p Philosopher) PutForks(leftFork, rightFork *Fork) {
    leftFork.Unlock()
    rightFork.Unlock()
}

func (p Philosopher) Think() {
    fmt.Printf("%s is thinking...\n", p.name)
    time.Sleep(time.Second)
}

func (p Philosopher) Eat() {
    // the table channel works as a footman: at most forks-1 philosophers
    // can be seated at the same time, which prevents the deadlock
    p.table.PushPhilosopher(p)
    p.Think()
    leftFork, rightFork := p.GetForks()
    fmt.Printf("%s is eating...\n", p.name)
    time.Sleep(time.Second)
    p.PutForks(leftFork, rightFork)
    p.table.PopPhilosopher()
    p.fed <- true
}

func main() {
    names := []string{
        "Elizabeth Anscombe", "Martin Heidegger", "Peter Lombard",
        "Thomas Nagel", "Gottfried Leibniz",
    }
    table := NewTable(len(names))
    philosophers := make([]Philosopher, len(names))
    for index, name := range names {
        philosophers[index] = Philosopher{
            name:  name,
            index: index,
            fed:   make(chan bool, 1),
            table: table,
        }
        go philosophers[index].Eat()
    }
    for _, p := range philosophers {
        <-p.fed
        fmt.Printf("%s was fed.\n", p.name)
    }
}
Any feedback is very welcome.

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

Speaking at PythonBrasil[7]

Next weekend I’ll be talking about scaling Django applications at Python Brasil, the brazilian Python conference. It will be my first time at the conference, which is one of the greatest Python conferences in Latin America.

Some international dudes are also attending to the conference: Wesley Chun is going to talk about Python 3 and Google App Engine; Alan Runyan will talk about free and open source software, and Steve Holden will be talking about the issues involved in trying to build a global Python user group.

There is also Maciej Fijalkowski, PyPy core developer, talking about little things PyPy makes possible.

As I pointed out before, I'm going to talk about scalability, based on some experience acquired scaling Django applications at Globo.com, like G1, the largest news portal in Latin America.

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

Flying with Django on Google App Engine

Google App Engine is a powerful tool for web developers. I am sure that it is useful and every developer should taste it =) Python was the first programming language supported by App Engine, and is a programming language with a lot of web frameworks. So, you can use some of these frameworks on Google App Engine. In a series of three blog posts, I will show how to use three Python web frameworks on App Engine: Django, Flask and web2py (not necessarily in this order).

The first framework is Django, the most famous of all Python frameworks and maybe the most used one.

Django models are Django's strongest feature. The models framework is a high level database abstraction layer with a powerful object-relational mapper, and it supports a lot of relational database management systems; but App Engine doesn't use a relational database. The database behind App Engine is called BigTable, a distributed storage system for managing structured data, designed to scale to a very large size (reference: Bigtable: A Distributed Storage System for Structured Data). It is not based on schemas, tables, keys or columns; it is like a big map indexed by a row key, column key and a timestamp. We cannot use the native version of Django models with BigTable, because the Django models framework was not designed for non-relational databases.

So, what can we do? There is a Django fork, the django-nonrel project, which aims to bring the power of the Django model layer to non-relational databases. I will use the djangoappengine sub-project to build the sample application of this post, that will be deployed on Google App Engine :)

The sample application is the classic one: a blog. A very simple blog, with only a form protected by login (using Django's built-in authentication system instead of the Google Accounts API) and a public page listing all blog posts. It is simple to do, so let's do it.

First, we have to set up our environment. According to the djangoappengine project documentation, we need to download 4 zip files and put them together. First, I downloaded the django-testapp file, extracted its contents and renamed the project directory from django-testapp to blog_gae. After this step, I downloaded the other files and put them inside the blog_gae directory. Here is the final project structure:
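(Roughly, assuming only the four zip files mentioned in the documentation; the archives may ship additional files.)

blog_gae/
    app.yaml
    manage.py
    settings.py
    urls.py
    django/
    djangoappengine/
    djangotoolbox/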


The “django” directory comes from the django-nonrel zip file, the “djangoappengine” directory from the djangoappengine zip file and the “djangotoolbox” directory from the djangotoolbox zip file. Note that an app.yaml file is provided, ready to be customized; I just changed the application id inside it. The final content of the file is the following:
application: gaeseries
version: 1
runtime: python
api_version: 1

default_expiration: '365d'

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

- url: /_ah/queue/deferred
  script: djangoappengine/deferred/handler.py
  login: admin

- url: /media/admin
  static_dir: django/contrib/admin/media/

- url: /.*
  script: djangoappengine/main/main.py
I will use one version for each part of the series, so this is version 1 because it is the first part =D In settings.py, we just uncomment the django.contrib.auth line inside the INSTALLED_APPS tuple, because we want to use the built-in auth application instead of the Google Accounts API provided by App Engine.
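After the change, the relevant part of settings.py looks roughly like this (a sketch; the exact app list shipped with django-testapp may differ):

INSTALLED_APPS = (
    # ... the apps already listed by the template project ...
    'django.contrib.auth',  # uncommented to enable Django's built-in authentication
)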

All settings are OK now; it is time to create the core application. In the Django project, we will use the core application to manage models and serve some views. We start it with the following command:
% python manage.py startapp core
It is a well-known Django command that creates the application structure: a Python package containing 3 modules (models, tests and views). Now we have to create the Post model. Here is the code of the models.py file:
from django.db import models
from django.contrib.auth.models import User

class Post(models.Model):
    title = models.CharField(max_length = 200)
    content = models.TextField()
    date = models.DateTimeField(auto_now_add = True)
    user = models.ForeignKey(User)
Now we just need to “install” the core application by adding it to the INSTALLED_APPS tuple in the settings.py file, and Django will be ready to play with Bigtable. :) We will use the django.contrib.auth app, so let's run a manage command to create a superuser:
% python manage.py createsuperuser
After creating the superuser, we need to set up the login and logout URLs and create two templates. So, in the urls.py file, add two mappings for the login and logout views. The file will look like this:
from django.conf.urls.defaults import *

urlpatterns = patterns('',
    ('^$', 'django.views.generic.simple.direct_to_template',
        {'template': 'home.html'}),

    ('^login/$', 'django.contrib.auth.views.login'),
    ('^logout/$', 'django.contrib.auth.views.logout'),
)
Here is the registration/login.html template:
{% extends "base.html" %}

{% block content %}

<p>Fill in the form below to log in to the system ;)</p>

{% if form.errors %}
<p>Your username and password didn't match. Please try again.</p>
{% endif %}

<form method="post" action="{% url django.contrib.auth.views.login %}">{% csrf_token %}
<table>
<tr>
<td>{{ form.username.label_tag }}</td>
<td>{{ form.username }}</td>
</tr>
<tr>
<td>{{ form.password.label_tag }}</td>
<td>{{ form.password }}</td>
</tr>
</table>

<input type="submit" value="login" />
<input type="hidden" name="next" value="{{ next }}" />
</form>

{% endblock %}
And registration/logged_out.html template:
{% extends "base.html" %}

{% block content %}
Bye :)
{% endblock %}
Note the two lines added for the login and logout URLs. In the settings.py file, add three lines:
LOGIN_URL = '/login/'
LOGOUT_URL = '/logout/'
LOGIN_REDIRECT_URL = '/'
And we are ready to code =) Let's create the login-protected view, where we will write and save a new post. To do that, first we need to create a Django form to deal with the data. There are two fields in this form: title and content. When the form is submitted, the user property is filled with the currently logged-in user and the date property is filled with the current time. So, here is the code of the ModelForm (it lives in a forms.py file, which the views below import):
from django import forms

from models import Post

class PostForm(forms.ModelForm):
    class Meta:
        model = Post
        exclude = ('user',)

    def save(self, user, commit = True):
        post = super(PostForm, self).save(commit = False)
        post.user = user

        if commit:
            post.save()

        return post
Here is the views.py file, with the two views (one “mocked up”, with a simple redirect):
from django.contrib.auth.decorators import login_required
from django.shortcuts import render_to_response
from django.template import RequestContext
from django.http import HttpResponseRedirect
from django.core.urlresolvers import reverse
from forms import PostForm

@login_required
def new_post(request):
    form = PostForm()
    if request.method == 'POST':
        form = PostForm(request.POST)
        if form.is_valid():
            form.save(request.user)
            return HttpResponseRedirect(reverse('core.views.list_posts'))
    return render_to_response('new_post.html',
        locals(), context_instance=RequestContext(request)
    )


def list_posts(request):
    return HttpResponseRedirect('/')
There are only two steps left to finally save posts on Bigtable: mapping URLs for the views above and creating the new_post.html template. Here is the mapping code:
('^posts/new/$', 'core.views.new_post'),
('^posts/$', 'core.views.list_posts'),
And here is the template code:
{% extends "base.html" %}

{% block content %}
<form action="{% url core.views.new_post %}" method="post" accept-charset="utf-8">
{% csrf_token %}
{{ form.as_p }}
<p><input type="submit" value="Post!"/></p>
</form>
{% endblock %}
Now we can run ./manage.py runserver in the terminal, access the URL http://localhost:8000/posts/new in the browser, see the form, fill it in and save the post :D The last step is listing all posts at http://localhost:8000/posts/. The list_posts view is already mapped to the URL /posts/, so we just need to write the code of the view and a template to show the list of posts. Here is the view code:
from models import Post  # this import goes at the top of views.py

def list_posts(request):
    posts = Post.objects.all()
    return render_to_response('list_posts.html',
        locals(), context_instance=RequestContext(request)
    )
And the list_posts.html template code:
{% extends "base.html" %}

{% block content %}
<dl>
{% for post in posts %}
<dt>{{ post.title }} (written by {{ post.user.username }})</dt>
<dd>{{ post.content }}</dd>
{% endfor %}
</dl>
{% endblock %}
Finished? Not yet :) The application is now ready to deploy. How do we deploy it? With just one command:
% python manage.py deploy
Done! Now, to use everything we have just created on the remote App Engine server, just create a superuser on that server and enjoy:
% python manage.py remote createsuperuser
You can check this application flying on Google App Engine: http://1.latest.gaeseries.appspot.com (use demo as both the username and the password on the login page).

You can check this application code out in Github: http://github.com/fsouza/gaeseries/tree/django.

por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42

December 01, 2019

Gabbleblotchits

Interoperability #rust2020

In January I wrote a post for the Rust 2019 call for blogs. The 2020 call is aiming for an RFC and roadmap earlier this time, so here is my 2020 post =]

Last call review: what happened?

An attribute proc-macro like #[wasm_bindgen] but for FFI

This sort of happened... because WebAssembly is growing =]

I was very excited when Interface Types showed up in August, and while it is still very experimental it is moving fast and bringing saner paths for interoperability than raw C FFIs. David Beazley even points this out at the end of his PyCon India keynote, talking about how easy it is to get information out of a WebAssembly module compared to what had to be done for SWIG.

This doesn't solve the problem where strict C compatibility is required, or for platforms where a WebAssembly runtime is not available, but I think it is a great solution for scientific software (or, at least, for my use cases =]).

"More -sys and Rust-like crates for interoperability with the larger ecosystems" and "More (bioinformatics) tools using Rust!"

I did some of those this year (bbhash-sys and mqf), and also found some great crates to use in my projects. Rust is picking up steam in bioinformatics, being used as the primary choice for high quality software (like varlociraptor, or the many tools coming from 10X Genomics), but it is still somewhat hard to find more details (I mostly find them on Twitter, and sometimes via Google Scholar alerts). It would be great to start bringing this info together, which leads to...

"A place to find other scientists?"

Hey, this one happened! Luca Palmieri started a conversation on reddit and the #science-and-ai Discord channel on the Rust community server was born! I think it works pretty well, and Luca has also been doing a great job running workshops and guiding the conversation around rust-ml.

Rust 2021: Interoperability

Rust is amazing because it is very good at bringing together many concepts and ideas that seem contradictory at first but really shine when synthesized. Can we share this combined wisdom and improve the situation in other places too? Despite the "Rewrite it in Rust" meme, increased interoperability is something that is already driving a lot of the best aspects of Rust:

  • Interoperability with other languages: as I said before, with WebAssembly (and Rust arguably having the best toolchain for it) there is a clear route to achieve this, but it will not replace all the software that already exists and can benefit from FFI and C compatibility. Bringing together developers from the many language-specific binding generators (helix, neon, rustler, PyO3...) and figuring out what's missing from them (or which common parts can be shared) also seems productive.

  • Interoperability with new and unexplored domains. I think Rust benefits enormously from not focusing on only one domain, and choosing to prioritize CLI, WebAssembly, Networking and Embedded is a good subset to start tackling problems with, but how do we guide other domains to also use Rust, bring in new contributors and expose missing pieces of the larger picture?

Another point extremely close to interoperability is training. A great way to interoperate with other languages and domains is having good documentation and material for transitioning into Rust without having to figure everything out at once. Rust documentation is already amazing, especially considering the many books published by each working group. But... there is a gap in the transitions, both from understanding the basics of the language to using it, and in the progression from beginner to intermediate and expert.

I see good resources for JavaScript and Python developers, but we are still covering a pretty small niche: programmers curious enough to go learn another language, or looking for solutions for problems in their current language.

Can we bring more people into Rust? RustBridge is obviously the reference here, but there is space for much, much more. Using Rust in The Carpentries lessons? Creating RustOpenSci, mirroring the communities of practice of rOpenSci and pyOpenSci?

Comments?

por luizirber em 01 de December de 2019 às 15:00

November 11, 2019

Thiago Avelino

The difference between amateurs and professionals

Why do some people seem to be extremely successful and get a lot done, while the vast majority of us struggle to tread water? The answer is complicated and probably made up of several answers. The main aspect is the way of thinking and planning. But what is the difference? In fact, there are several differences: amateurs stop when they reach their goal, professionals understand that the initial achievement is just the beginning; amateurs have a goal, professionals have a process; amateurs think they are good at everything, professionals understand their circles of competence; amateurs see feedback and advice as criticism, professionals know they have weak points and seek constructive criticism; amateurs value isolated performance, think of the receiver who catches the ball once in a difficult play.

11 de November de 2019 às 16:00

November 08, 2019

Thiago Avelino

Reaching the flow state to achieve goals

Understanding the psychology of flow: have you ever felt completely immersed in an activity? If so, you may have experienced a mental state that psychologists call flow, but what is it? Let's try an analogy to explain it: imagine you are out for a run. Your attention is focused on the movements of your body, the strength of your muscles, your breathing and the feeling of the street under your feet.

08 de November de 2019 às 21:00

October 01, 2019

PythonClub

Creating dicts from other dicts

In this tutorial, we will cover the process of creating a dict (dictionary) from one or more other dicts in Python.

As usual in the language, this can be done in several different ways.

Initial approach

To start, let's assume we have the following dictionaries:

dict_1 = {
    'a': 1,
    'b': 2,
}

dict_2 = {
    'b': 3,
    'c': 4,
}

As an example, let's create a new dictionary called new_dict with the values of dict_1 and dict_2 above. A well-known approach is to use the update method.

new_dict = {}

new_dict.update(dict_1)
new_dict.update(dict_2)

Thus, new_dict will be:

>> print(new_dict)
{
    'a': 1,
    'b': 3,
    'c': 4,
}

This approach works well, but we have to call update for each dict we want to merge into new_dict. Wouldn't it be nice if we could pass all the required dicts right at the initialization of new_dict?

What's new in Python 3

Python 3 introduced a very interesting way of doing this, using the ** operator.

new_dict = {
    **dict_1,
    **dict_2,
}

Thus, similarly to the previous example, new_dict will be:

>> print(new_dict['a'])
1
>> print(new_dict['b'])
3
>> print(new_dict['c'])
4

Real copies of dicts

When using the initialization procedure above, we must take some factors into account: only the first-level values will actually be duplicated in the new dictionary. As an example, let's change a key present in both dicts and check whether they keep the same value:

>> dict_1['a'] = 10
>> new_dict['a'] = 11
>> print(dict_1['a'])
10
>> print(new_dict['a'])
11

However, this changes when one of the values of dict_1 is a list, another dict or some other complex object. For example:

dict_3 = {
    'a': 1,
    'b': 2,
    'c': {
        'd': 5,
    }
}

and now let's create a new dict from it:

new_dict = {
    **dict_3,
}

As in the previous example, we might imagine that a copy of every element of dict_3 was made, but that is not entirely true. What actually happened is a shallow copy of the values of dict_3, that is, only the first-level values were duplicated. Notice what happens when we change a value of the dict stored under the key c.

>> new_dict['c']['d'] = 11
>> print(new_dict['c']['d'])
11
>> print(dict_3['c']['d'])
11 
# the previous value was 5

In the case of the key c, it holds a reference to another data structure (a dict, in this case). When we change some value inside dict_3['c'], the change is reflected in every dict that was initialized from dict_3. In other words, be careful when initializing a dict from other dicts that hold complex values such as lists, dicts or other objects (the attributes of those objects will not be duplicated).
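One quick way to see that only references were copied is to compare identities: the nested dict stored under the key c is literally the same object in both dictionaries (assuming the dict_3 and new_dict definitions above):

>> new_dict['c'] is dict_3['c']
True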

To work around this inconvenience, we can use the deepcopy function from the standard library module copy. Now, when initializing new_dict:

import copy

dict_3 = {
    'a': 1,
    'b': 2,
    'c': {
        'd': 5,
    }
}

new_dict = copy.deepcopy(dict_3)

deepcopy performs a recursive copy of each element of dict_3, solving our problem. Here is one more example:

>> new_dict['c']['d'] = 11
>> print(new_dict['c']['d'])
11
>> print(dict_3['c']['d'])
5 
# the value was not changed

Conclusion

This article tries to demonstrate, in a simple way, how to create dicts using the various resources the language offers, as well as the pros and cons of each approach.

References

For more details and other examples, take a look at this post on the Python Brasil forum here.

That's it, folks. Thanks for reading!

por Michell Stuttgart em 01 de October de 2019 às 23:20

September 23, 2019

Vinta Software

DjangoCon US 2019: Python & Django in San Diego!

We are back to San Diego!! Our team will be joining DjangoCon US's conference, one of the biggest Django events in the world. For this year, we'll be giving two talks: Pull Requests: Merging good practices into your project and Building effective Django queries with expressions. Here are the slides from the talks we gave during the conference: Pu

23 de September de 2019 às 20:00

September 16, 2019

Filipe Saraiva

Study group of the Laboratório Amazônico de Estudos Sociotécnicos – UFPA

Prof. Leonardo Cruz, from the Faculty of Social Sciences, and I are working together on the development of the Laboratório Amazônico de Estudos Sociotécnicos at UFPA.

Our proposal is to hold readings and critical debates on the sociology of technology, to produce theoretical and empirical research in the Amazon region on the relations between technology and society, and to work with free technologies in communities near Belém.

At the moment we have a study group set up, with a schedule of texts and films to work through and debate critically. This group will be the seed for advising undergraduate and graduate students on topics such as the impact of artificial intelligence, computing and war, cybernetics, surveillance, platform capitalism, fake news, piracy, free software, and others.

For those interested, our study schedule is available at this link.

And if you use Telegram, you can join the discussion group here.

Any questions, just get in touch!

por Filipe Saraiva em 16 de September de 2019 às 13:24

September 10, 2019

Humberto Rocha

Exploring pygame 5 - Movement and Collision

Movement is a feature present in most games. When jumping between platforms, shooting at a horde of enemies, piloting a spaceship or racing down roads, we are exercising movement, interacting with the game environment, applying actions and causing reactions. In this chapter we will get to know the basic concepts of moving objects on the screen and of their interaction with other elements through collision detection. Movement: if you have been following this series of posts, you saw a brief example of movement in the post about the game loop, where a ball bouncing around the screen was implemented.
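As a rough illustration of the two ideas the chapter covers (a minimal sketch, not the code from the post): move a pygame.Rect a few pixels per frame and use colliderect to detect contact with another rect.

import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()

player = pygame.Rect(50, 220, 40, 40)
wall = pygame.Rect(400, 200, 40, 80)

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    keys = pygame.key.get_pressed()
    if keys[pygame.K_LEFT]:
        player.move_ip(-4, 0)  # movement: shift the rect a few pixels per frame
    if keys[pygame.K_RIGHT]:
        player.move_ip(4, 0)

    hit = player.colliderect(wall)  # collision: True while the two rects overlap

    screen.fill((0, 0, 0))
    pygame.draw.rect(screen, (255, 0, 0) if hit else (0, 255, 0), player)
    pygame.draw.rect(screen, (200, 200, 200), wall)
    pygame.display.flip()
    clock.tick(60)

pygame.quit()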

10 de September de 2019 às 03:00

August 28, 2019

Humberto Rocha

Publishing my first game

Games have connected me with technology since the beginning. My dad and I built our first computer (a Pentium 286) and the first thing I remember doing was playing DOS games like Prince of Persia and Lunar Lander. I learned several CLI commands just so I could play my favorite games. The passion for playing and making games has always stayed with me as a hobby. I have a series of posts about pygame on this blog where I go through the basic concepts of game development, trying to explain them to people who are starting to learn in this area.

28 de August de 2019 às 03:00

August 19, 2019

Vinta Software

[pt-BR] PythonBrasil[14] talks

Slides from talks given during the PythonBrasil[14] event will be posted here. This post and the slides are written in Brazilian Portuguese. Como Programar seu Processo de Software Palestrante: Robertson Novelino Link dos Slides: Como Programar seu Processo de Software Todos usamos um método para programar, uma forma que nós gostamos de fazer as

19 de August de 2019 às 21:59

Understanding Time Series Forecasting with Python

Vinta is a software studio whose focus is to produce high quality software and give clients great consulting advice to make their businesses grow. However, even though our main focus is web development, we also do our share of machine learning over here. This article is the first of a few designed to show everything (or almost everything) you need

19 de August de 2019 às 21:59

Dealing with resource-consuming tasks on Celery

In this post, we will talk about how you can optimize your Celery tasks and avoid certain kind of problems related to resource-consuming tasks. If you are new to Celery or want to know more about it before reading this, you might wanna check first this post with an overview about Celery and this one with some more advanced tips and tricks. When w

19 de August de 2019 às 21:59

Vinta's Talks Around the Globe: DjangoConUS, PyBay2017 and DjangoConAU

Slides from talks given during the DjangoConUS, PyBay2017 and DjangoConAU events will be posted here

19 de August de 2019 às 21:59

Advanced Django querying: sorting events by date

Imagine the situation where our application has events (scheduled tasks, appointments, python conferences across the world) happening in different moments of time. Almost anything with a date attached to it. We want to display them in a simple list to the user. Given we are in February 2017 (the date this post was written), what would be the best w

19 de August de 2019 às 21:59

Don't forget the stamps: testing email content in Django

When developing a web app how often do you check the emails you send are all working properly? Not as often as your web pages, right? That's ok, don't feel guilty, emails are hard to test and they are often someone else's responsibility to write and take care of. This doesn't mean we should give up on them. There are some things we can do to prevent e

19 de August de 2019 às 21:59

PyBay 2019: Talking about Python in SF

We are back to San Francisco! Our team will be joining PyBay's conference, one of the biggest Python events in the Bay Area. For this year, we'll be giving the talk: Building effective Django queries with expressions. PyBay has been a fantastic place to meet new people, connect with new ideas, and integrate this thriving community. Here is the sl

19 de August de 2019 às 21:59

Celery in the wild: tips and tricks to run async tasks in the real world

This post is aimed at people with some experience writing async tasks; if you are starting on Celery you might want to read this other post I wrote before starting on this one. The thing about async tasks is that the hard part is not how to run them [although it can be fairly complicated to understand the architecture and set up things when you

19 de August de 2019 às 21:59

How I test my DRF serializers

In this blog post, I will show the whats and whys on testing Django REST Framework serializers. First, some context. Here is the model setup we are going to use for this example: from django.db import models class Bike(models.Model): COLOR_OPTIONS = (('yellow', 'Yellow'), ('red', 'Red'), ('black', 'B

19 de August de 2019 às 21:59

Python API clients with Tapioca

In this post I'll present to you Tapioca, a Python library to create powerful API clients with very few lines of code. If you don't want to read through the reasons why I've built it, you may just jump straight to the Tapioca Wrapper section. Why do we need a better way to build API clients Integrating with external services is painful. Here at Vin

19 de August de 2019 às 21:59

[pt-BR] PythonBrasil[13] Talks

We came to Python Brasil once again. This time we had 6 talks approved, ranging from Python for scientific use to techniques for rescuing large projects.

19 de August de 2019 às 21:59

Contributing to Django Framework is easier than you think

For those who are starting to code and wish to make open source, sometimes it is hard to start. The idea of contributing with that fancy and wonderful lib that you love can sound a little bit scary. Lucky for us many of those libs have room for whoever is willing to start. They also give us the support that we need. Pretty sweet, right? Do you know

19 de August de 2019 às 21:59

Happython 2019!

Happy PyHolidays and Happy New Year! We are approaching the end of the year 2018! And as it turns out this was an incredible year for Python! Even though Guido stepped out of his role as BDFL (this alone is worth a couple of blogposts so this one will not extend the discussion), there are a lot of happy recollections from this year. In 2017 StackO

19 de August de 2019 às 21:59

PyGotham 2018 Talks

Critical Incidents: a guide for developers Presenter: Lais Varejão Slides: http://bit.ly/critical-incidents-guide Pluggable Libs Through Design Patterns Presenter: Filipe Ximenes Video: https://www.youtube.com/watch?v=PfgEU3W0kyU Slides: http://bit.ly/pluggable-libs Examples: https://github.com/filipeximenes/talk-design-patterns 1 + 1 = 1 or Re

19 de August de 2019 às 21:59

Taming Irreversibility with Feature Flags (in Python)

Feature Flags are a very simple technique to make features of your application quickly toggleable. The way it works is, everytime we change some behavior in our software, a logical branch is created and this new behavior is only accessible if some specific configuration variable is set or, in certain cases, if the application context respects some

19 de August de 2019 às 21:59

Django REST Framework Read & Write Serializers

Django REST Framework (DRF) is a terrific tool for creating very flexible REST APIs. It has a lot of built-in features like pagination, search, filters, throttling, and many other things developers usually don't like to worry about. And it also lets you easily customize everything so you can make your API work the way you want. There are many gen

19 de August de 2019 às 21:59

Celery: an overview of the architecture and how it works

Asynchronous task queues are tools to allow pieces of a software program to run in a separate machine/process. It is often used in web architectures as way to delegate long lasting tasks while quickly answering requests. The delegated task can trigger an action such as sending an email to the user or simply update data internally in the system whe

19 de August de 2019 às 21:59

Multitenancy: juggling customer data in Django

Suppose you want to build a new SaaS (Software as a Service) application. Suppose your application will store sensitive data from your customers. What is the best way to guarantee the isolation of the data and make sure information from one client does not leak to the other? The answer to that is: it depends. It depends on the number of customers y

19 de August de 2019 às 21:59

PyCon US 2017: the biggest Python Event in the World

Pycon 2017 happened in Oregon, Portland! If you wanted to discuss anything about Python, that was the place to be. It was the biggest Python event of the world, it lasted from May 17th to May 25th. I got to see talks from some important names on it, like Lisa Guo and Katy Huff, both of them are using Python to make great things! Lisa is using on In

19 de August de 2019 às 21:59

[pt-BR] Python Nordeste 2017 Talks

Slides from talks given during the Python Nordeste 2017 event will be posted here. This post and the slides are written in Brazilian Portuguese. 5 meses de Python: o que aprendi Palestrante: @rsarai Link dos slides: 5 meses de Python: o que aprendi Trabalhar como desenvolvedor de software pode ser um pouco frustrante, as vezes por estar preso a

19 de August de 2019 às 21:59

[Talk] All Things Python meetup in Sunnyvale

I'll be talking at the All Things Python meetup! It will happen on June 6 in Sunnyvale, California. I'll be talking about good practices designing async tasks and some advanced Celery features. This will be a first version of the talk I'm preparing for DjangoCon US in August. For signing up or more information, this is the link to the event. Looki

19 de August de 2019 às 21:59

Metaprogramming and Django - Using Decorators

While programming is about, in some way, doing code to transform data, metaprogramming can be seen as the task of doing code to change code. This category is often used to help programmers to enhance the readability and maintainability of the code, help with separation of concerns and respect one of the most important principles of software develop

19 de August de 2019 às 21:59

Vinta's Review of PythonBrasil[12]

PythonBrasil[12] happened in Florianópolis - SC and lasted for 6 days. We saw some amazing Keynotes from some awesome speakers, such as @SagnewShreds, @hannelita, @NaomiCeder, @freakboy3742, @seocam we got to have a lot of community time getting to know new people from all around Brazil and still got to present 4 talks(Hooray!!). On the following

19 de August de 2019 às 21:59

[pt-BR] PythonBrasil[12] Talks

Slides from talks given during the PythonBrasil[12] event will be posted here. This post and the slides are in written in Brazilian Portuguese. O que é esse tal de REST? Palestrante: @xima Link dos slides: O que é esse tal de REST? REST é a bola vez quando falamos sobre API. As maioria dos serviços que encontramos na web fornece interfaces dest

19 de August de 2019 às 21:59

Database concurrency in Django the right way

When developing applications which have real-time requirements or other specific needs for running asynchronous tasks outside the web application, it is common to adopt a task queue such as Celery. This allows, for example, the server to handle a request, start an asynchronous task responsible for doing some heavyweight processing, and return an answer while the task is still running. Here, we are considering a similar scenario: a request is made, and the server has to do some processing on the request. Ideally, we want to separate the time-demanding parts from the view processing flow, so we run those parts in a separate task. Now, let's suppose we have to do some database operations both in the view and in the task when the request happens. If not done carefully, those operations can be a source of issues that can be hard to track.
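One common way to avoid that class of problem is to only hand work to the task queue once the view's transaction has actually committed (a minimal sketch; Order and process_order are hypothetical names, and process_order is assumed to be a Celery task):

from django.db import transaction
from django.http import JsonResponse

def confirm_order(request, order_id):
    with transaction.atomic():
        # lock the row so the view and other writers do not race each other
        order = Order.objects.select_for_update().get(pk=order_id)
        order.status = "processing"
        order.save()
        # the task is only enqueued after COMMIT, so it always sees the new status
        transaction.on_commit(lambda: process_order.delay(order.pk))
    return JsonResponse({"status": "processing"})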

19 de August de 2019 às 21:59

August 08, 2019

Humberto Rocha

Fantastic libs: pipx

I am starting this series to share tips about libraries that can be very useful in your day-to-day work, and also to present interesting libraries you should keep an eye on. One of the skills of a good programmer is having the right tool for the job, and nothing is more appropriate than starting this series with a tool that installs other tools! How many times have you had to install some Python program into a virtualenv you had just created?

08 de August de 2019 às 03:00

July 12, 2019

Humberto Rocha

TLDR: Generating a Secret Key for Django

Raise your hand if you have never committed Django's SECRET_KEY at the start of a project and then had to generate a new one when going to production. This TLDR is a quick reminder of how you can regenerate a secret key locally, without resorting to websites on the internet to generate it for you. Since Django generates the secret key when a project is started, this function already exists in its code and you can access it like this:
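A minimal sketch of one way to call that helper, assuming Django 1.10 or newer:

from django.core.management.utils import get_random_secret_key

# prints a fresh random key suitable for use as SECRET_KEY
print(get_random_secret_key())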

12 de July de 2019 às 00:00