I write a lot of Telegram bots using the python-telegram-bot library. Writing Telegram bots is fun, but you also need somewhere to host them.
I personally like the new Google Cloud Run. It is perfect for this because it has a generous free quota that should be more than enough to host your bots, and it is super simple to deploy and get running.
To create Telegram bots, first, you need to talk to BotFather and get a TOKEN.
Second, you need to write some code. As I mentioned before, you can use python-telegram-bot to build your bots. Here is the documentation.
Here is the base code that you will need to run on Cloud Run.
main.py
import os
import http

from flask import Flask, request
from werkzeug.wrappers import Response

from telegram import Bot, Update
from telegram.ext import Dispatcher, Filters, MessageHandler, CallbackContext

app = Flask(__name__)


def echo(update: Update, context: CallbackContext) -> None:
    update.message.reply_text(update.message.text)


bot = Bot(token=os.environ["TOKEN"])

dispatcher = Dispatcher(bot=bot, update_queue=None, workers=0)
dispatcher.add_handler(MessageHandler(Filters.text & ~Filters.command, echo))


@app.route("/", methods=["POST"])
def index() -> Response:
    dispatcher.process_update(
        Update.de_json(request.get_json(force=True), bot))
    return "", http.HTTPStatus.NO_CONTENT
requirements.txt
flask==1.1.2
gunicorn==20.0.4
python-telegram-bot==13.1
Dockerfile
FROM python:3.8-slim
ENV PYTHONUNBUFFERED True
WORKDIR /app
COPY *.txt .
RUN pip install --no-cache-dir --upgrade pip -r requirements.txt
COPY . ./
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
Finally, you need to deploy. You can do it in a single step, but first let's (optionally) run the command below to set the default region:
gcloud config set run/region us-central1
Then deploy to Cloud Run:
gcloud beta run deploy your-bot-name \
--source . \
--set-env-vars TOKEN=your-telegram-bot-token \
--platform managed \
--allow-unauthenticated \
--project your-project-name
After this, you will receive the public URL of your service, and you will need to set the Telegram bot webhook using cURL:
curl "https://api.telegram.org/botYOUR-BOT:TOKEN/setWebhook?url=https://your-bot-name-uuid-uc.a.run.app"
Replace YOUR-BOT:TOKEN with your bot's token and the URL with the public URL of your Cloud Run service.
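If you prefer Python over cURL, a minimal sketch using requests does the same thing (the service URL below is a placeholder for the one Cloud Run gives you):
import os

import requests

token = os.environ["TOKEN"]  # the token from BotFather
service_url = "https://your-bot-name-uuid-uc.a.run.app"  # the URL Cloud Run gave you
resp = requests.get(
    f"https://api.telegram.org/bot{token}/setWebhook",
    params={"url": service_url},
)
resp.raise_for_status()
print(resp.json())  # should contain "ok": true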
This should be enough.
by Rodrigo Delduca (rodrigodelduca@gmail.com) on January 8, 2021 at 00:00
Google Cloud Storage is cheaper than Google One because you pay only for what you use. Also, if you delete a photo from Google Photos, you still keep a copy of it in the bucket.
Create a Compute Engine (a VM).
If you choose Ubuntu, first of all, remove snap
sudo apt autoremove --purge snapd
sudo rm -rf /var/cache/snapd/
rm -rf ~/snap
Install gcsfuse (or follow the official instructions):
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install gcsfuse
On the Google Cloud console, create a bucket of the Nearline storage class; in my case the bucket is named tank1. Then, back on your VM, create a directory with the same name as the bucket.
mkdir name-of-your-bucket
Now install gphotos-sync.
sudo apt install -y python3-pip
pip3 install gphotos-sync
I created a small Python script to deal with multiple Google accounts. I’ll explain later how it works.
cat <<EOF > /home/ubuntu/synchronize.py
#!/usr/bin/env python3
import os
import sys
import subprocess
from pathlib import Path

import requests

home = Path(os.path.expanduser("~")) / "tank1/photos"

args = [
    "--ntfs",
    "--retry-download",
    "--skip-albums",
    "--photos-path", ".",
    "--log-level", "DEBUG",
]

env = os.environ.copy()
env["LC_ALL"] = "en_US.UTF-8"

for p in home.glob("*/*"):
    subprocess.run(["/home/ubuntu/.local/bin/gphotos-sync", *args, str(p.relative_to(home))], check=True, cwd=home, env=env, stdout=sys.stdout, stderr=subprocess.STDOUT)

# I use healthchecks.io to alert me if the script stops working
url = "https://hc-ping.com/uuid4"
response = requests.get(url, timeout=60)
response.raise_for_status()
EOF
Give execute permission.
chmod u+x synchronize.py
Now let's create some systemd units.
sudo su
Let's create a service for gcsfuse, responsible for mounting the bucket locally using FUSE.
cat <<EOF >/etc/systemd/system/gcsfuse.service
# Script stolen from https://gist.github.com/craigafinch/292f98618f8eadc33e9633e6e3b54c05
[Unit]
Description=Google Cloud Storage FUSE mounter
After=local-fs.target network-online.target google.service sys-fs-fuse-connections.mount
Before=shutdown.target
[Service]
Type=forking
User=ubuntu
ExecStart=/bin/gcsfuse tank1 /home/ubuntu/tank1
ExecStop=/bin/fusermount -u /home/ubuntu/tank1
Restart=always
[Install]
WantedBy=multi-user.target
EOF
Enable and start the service:
systemctl enable gcsfuse.service
systemctl start gcsfuse.service
cat <<EOF >/etc/systemd/system/gphotos-sync.service
[Unit]
Description=Run gphotos-sync for each account
[Service]
User=ubuntu
ExecStart=/home/ubuntu/synchronize.py
EOF
And enable the service.
systemctl enable gphotos-sync.service
Now let's create a timer that runs gphotos-sync.service one minute after boot, with gcsfuse.service as a dependency.
cat <<EOF >/etc/systemd/system/gphotos-sync.timer
[Unit]
Description=Run gphotos sync service weekly
Requires=gcsfuse.service
[Timer]
OnBootSec=1min
Unit=gphotos-sync.service
[Install]
WantedBy=timers.target
EOF
systemctl enable gphotos-sync.timer
systemctl start gphotos-sync.timer
exit
(back to ubuntu user)
Now follow https://docs.google.com/document/d/1ck1679H8ifmZ_4eVbDeD_-jezIcZ-j6MlaNaeQiz7y0/edit to get a client_secret.json to use with gphotos-sync.
mkdir -p /home/ubuntu/.config/gphotos-sync/
# Copy the contents of the JSON to the file below
vim /home/ubuntu/.config/gphotos-sync/client_secret.json
Due to an issue with gcsfuse, I was unable to create the backup dir directly on the bucket. The workaround is to create a temp directory and start gphotos-sync manually first.
mkdir -p ~/temp/username/0
cd ~/temp
gphotos-sync --ntfs --skip-albums --photos-path . username/0
# gphotos-sync will ask for a token, paste it and CTRL-C to stop the download of photos.
cp -r ~/temp/username/ ~/tank1/photos/username
Verify if it is working.
./synchronize.py
After executing the command above, the script should start the backup. You can wait until it finishes or continue to the steps below.
The content below is based on (and is a simplified version of) Scheduling compute instances with Cloud Scheduler, by Google.
Go back to your VM and add the label runtime with the value weekly; this is needed by the functions below to know which instances should be started or shut down.
Create a new directory (in my case I will call it functions) and add two files:
index.js
const Compute = require('@google-cloud/compute');
const compute = new Compute();

exports.startInstancePubSub = async (event, context, callback) => {
  try {
    const payload = JSON.parse(Buffer.from(event.data, 'base64').toString());
    const options = {filter: `labels.${payload.label}`};
    const [vms] = await compute.getVMs(options);
    await Promise.all(
      vms.map(async instance => {
        if (payload.zone === instance.zone.id) {
          const [operation] = await compute
            .zone(payload.zone)
            .vm(instance.name)
            .start();
          return operation.promise();
        }
      })
    );
    const message = 'Successfully started instance(s)';
    console.log(message);
    callback(null, message);
  } catch (err) {
    console.log(err);
    callback(err);
  }
};

exports.stopInstancePubSub = async (event, context, callback) => {
  try {
    const payload = JSON.parse(Buffer.from(event.data, 'base64').toString());
    const options = {filter: `labels.${payload.label}`};
    const [vms] = await compute.getVMs(options);
    await Promise.all(
      vms.map(async instance => {
        if (payload.zone === instance.zone.id) {
          const [operation] = await compute
            .zone(payload.zone)
            .vm(instance.name)
            .stop();
          return operation.promise();
        } else {
          return Promise.resolve();
        }
      })
    );
    const message = 'Successfully stopped instance(s)';
    console.log(message);
    callback(null, message);
  } catch (err) {
    console.log(err);
    callback(err);
  }
};
And
package.json
{
  "main": "index.js",
  "private": true,
  "dependencies": {
    "@google-cloud/compute": "^2.4.1"
  }
}
Create a PubSub topic to start the instance.
gcloud pubsub topics create start-instance-event
Now deploy the startInstancePubSub function:
gcloud functions deploy startInstancePubSub \
--trigger-topic start-instance-event \
--runtime nodejs12 \
--allow-unauthenticated
And another PubSub topic to stop the instance.
gcloud pubsub topics create stop-instance-event
And the stopInstancePubSub function:
gcloud functions deploy stopInstancePubSub \
--trigger-topic stop-instance-event \
--runtime nodejs12 \
--allow-unauthenticated
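Before wiring up Cloud Scheduler, you can publish a test message to the start topic yourself and watch the function logs. A minimal sketch with the google-cloud-pubsub client (the project name is a placeholder):
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your-project-name", "start-instance-event")
payload = {"zone": "us-central1-a", "label": "runtime=weekly"}
# Pub/Sub delivers the message to the function base64-encoded in event.data.
future = publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
print(future.result())  # message ID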
And finally, let's create two Cloud Scheduler jobs to publish to the topics on Sunday and Monday at midnight.
gcloud beta scheduler jobs create pubsub startup-weekly-instances \
--schedule '0 0 * * SUN' \
--topic start-instance-event \
--message-body '{"zone":"us-central1-a", "label":"runtime=weekly"}' \
--time-zone 'America/Sao_Paulo'
gcloud beta scheduler jobs create pubsub shutdown-weekly-instances \
--schedule '0 0 * * MON' \
--topic stop-instance-event \
--message-body '{"zone":"us-central1-a", "label":"runtime=weekly"}' \
--time-zone 'America/Sao_Paulo'
After this setup, your VM will start every Sunday, back up all photos from all your accounts, and shut down on Monday.
by Rodrigo Delduca (rodrigodelduca@gmail.com) on December 31, 2020 at 00:00
I was using Scrapy to crawl some websites and mirror their content into a new one, generating beautiful and unique URLs based on the title at the same time. But titles can repeat! So I added part of the original URL, encoded in base36, as a uniqueness guarantee.
In the URL I wanted the title without special symbols (ASCII only) and, at the end, a short unique identifier: part of the SHA-256 of the URL, encoded in base36.
import functools
import hashlib
import re
import sys
from unicodedata import normalize

import base36
from scrapy.exceptions import DropItem


class PreparePipeline():
    def process_item(self, item, spider):
        title = item.get("title")
        if title is None:
            raise DropItem(f"No title was found on item: {item}.")

        url = item["url"]
        N = 4
        sha256 = hashlib.sha256(url.encode()).digest()
        sliced = int.from_bytes(
            memoryview(sha256)[:N].tobytes(), byteorder=sys.byteorder)
        uid = base36.dumps(sliced)

        strip = str.strip
        lower = str.lower
        split = str.split
        deunicode = lambda n: normalize("NFD", n).encode("ascii", "ignore").decode("utf-8")
        trashout = lambda n: re.sub(r"[.,-@/\\|*]", " ", n)

        functions = [strip, deunicode, trashout, lower, split]

        fragments = [
            *functools.reduce(
                lambda x, f: f(x), functions, title),
            uid,
        ]

        item["uid"] = "-".join(fragments)
        return item
For example, the URL https://en.wikipedia.org/wiki/Déjà_vu with the title Déjà vu - Wikipedia results in deja-vu-wikipedia-1q9i86k, which is perfect for my use case.
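If you want to check just the uid part in isolation, here is a small standalone sketch (assuming the base36 package used above):
import hashlib
import sys

import base36

url = "https://en.wikipedia.org/wiki/Déjà_vu"
N = 4
sha256 = hashlib.sha256(url.encode()).digest()
sliced = int.from_bytes(memoryview(sha256)[:N].tobytes(), byteorder=sys.byteorder)
print(base36.dumps(sliced))  # the post reports 1q9i86k for this URL (byte order is platform dependent)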
by Rodrigo Delduca (rodrigodelduca@gmail.com) on December 14, 2020 at 00:00
I am starting a new series of small snippets of code that I think may be useful or inspiring to others.
Let's suppose you have a pandas DataFrame with a column named url pointing to files you want to download.
The code below takes advantage of concurrent I/O by using a ThreadPoolExecutor together with requests.
import multiprocessing
import concurrent.futures

import pandas as pd
from requests import Session
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

session = Session()
retry = Retry(connect=8, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)


def download(url):
    filename = "/".join(["subdir", url.split("/")[-1]])
    with session.get(url, stream=True) as r:
        if not r.ok:
            return
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)


def run(df, processes=multiprocessing.cpu_count() * 2):
    with concurrent.futures.ThreadPoolExecutor(processes) as pool:
        list(pool.map(download, df["url"]))


if __name__ == '__main__':
    df = pd.read_csv("download.csv")
    run(df)
by Rodrigo Delduca (rodrigodelduca@gmail.com) on December 13, 2020 at 00:00
At some point during your development process with Django, you may need to create and restore the application's database. With that in mind, I decided to write a small, basic tutorial on how to perform this operation.
In this tutorial we will use django-dbbackup, a package developed specifically for this.
First, starting from the beginning, let's create a folder for our project and isolate our development environment in it using a virtualenv:
mkdir projeto_db && cd projeto_db # create our project folder
virtualenv -p python3.8 env && source env/bin/activate # create and activate our virtualenv
After that, with our environment active, let's run the following:
pip install -U pip # this upgrades the installed version of pip
Now let's install Django and the package we will use to make our backups.
pip install Django==3.1.2 # install Django
pip install django-dbbackup # install django-dbbackup
With our dependencies installed, let's create our project and configure the package in the Django settings.
django-admin startproject django_db . # inside our projeto_db folder, this creates a Django project named django_db.
With the project created, let's create and populate our database.
python manage.py migrate # this synchronizes the database state with the current set of models and migrations.
With the database created, let's create a superuser so we can access the admin panel of our project.
python manage.py createsuperuser
Perfect. We have everything we need to run our project. To run it, just do:
python manage.py runserver
You will see something like this for your project:
Inside your project, open the settings.py file, as shown below:
django_db/
├── settings.py
Inside this file we will first add django-dbbackup to the project's apps:
INSTALLED_APPS = (
    ...
    'dbbackup',  # adding django-dbbackup
)
With the app added, we tell Django what to save in the backup and then point to the folder where that file will go. This can be added at the end of settings.py:
DBBACKUP_STORAGE = 'django.core.files.storage.FileSystemStorage' # what to save
DBBACKUP_STORAGE_OPTIONS = {'location': 'backups/'} # where to save
Notice that we told Django to save the backup in the backups folder, but this folder does not exist in our project yet. So we need to create it [outside the project folder]:
mkdir backups
Everything is ready. Now let's create our first backup:
python manage.py dbbackup
After it runs, a file is created -- in our example, with a .dump extension -- and saved in the backups folder. This file contains the entire backup of our database.
To restore the database, let's suppose we migrated our system from an old server to a new one and, for some reason, our database was corrupted and became unusable. In other words, the system/project has no database -- so delete or move your .sqlite3 database for this example to be useful -- but we have the backups. Let's restore the database:
python manage.py dbrestore
Done, we have restored our database. Among other nice things, django-dbbackup generates backups with specific dates and times, which makes it easier to recover the most recent data.
That's it for today, folks. See you next time. ;)
(OK, the joke with "seqtembro" works better in the English version, seqtember, but let's go with it.)
By a big coincidence, the work of destiny, or none of that, September 2020 will be full of free, high-quality virtual events about Qt and KDE.
Starting from the 4th to the 11th of the month we have Akademy 2020, the big worldwide gathering of the KDE community, which this year, for reasons we all know, will happen virtually. The Akademy program brings talks, trainings, hacking sessions, discussions focused on specific KDE applications, and more, bringing together hackers, designers, project managers, translators, and contributors from many areas to discuss and plan KDE and its next steps.
And since we are talking about KDE, by extension we are also talking about Qt -- after all, a large part of the applications is written with that framework. So even if you work with Qt but use nothing from KDE, it is worth joining the event -- and also asking yourself "why the hell am I not using and developing KDE applications?".
An extra incentive is that during Akademy, between the 7th and the 11th, there will be Qt Desktop Days, a KDAB event focused on Qt on the desktop (surprise?). The preliminary program is already available, and it will be very interesting to see the advances of the technology in a field that may seem less sexy nowadays, given all the attention paid to mobile and embedded projects, but which, on the contrary, remains vibrant and keeps receiving a lot of investment.
After a quick pause to breathe, we have Maratona Qt. Our friend Sandro Andrade, professor at IFBA and long-time KDE contributor, decided to dedicate a whole week, from September 14 to 18, to presenting 5 topics about Qt, covering the fundamentals and first steps of each one. The program covers QML, C++ and Qt, Qt on Android, on iOS, on the web (yes!), computer graphics and even games! Highly recommended for everyone who knows, or wants to get to know, the framework.
Maratona Qt will serve as a warm-up for QtCon Brasil 2020, also virtual this year. On September 26 and 27 the qmob.solutions folks will bring together Qt developers from several countries to present, among other things, work with Wayland, computer vision and AI, data analysis, Python, containers, prototyping, embedded systems, and more, all involving Qt! There will also be a presentation about the next major version of the framework, Qt 6.
So, folks, set this month aside for a deep dive into the many aspects and possibilities offered by Qt.
Last time I described a way to search MAGs in metagenomes, and teased about interesting results. Let's dig in some of them!
I prepared a repo with the data and a notebook with the analysis I did in this
post.
You can also follow along in Binder,
as well as do your own analysis!
The supplemental materials for Tully et al include more details about each MAG, so let's download them. I prepared a small snakemake workflow to do that, and also to download information about the SRA datasets from Tara Oceans (the dataset used to generate the MAGs) and from Parks et al, which also generated MAGs from Tara Oceans. Feel free to include them in your analysis, but I was curious to find matches in other metagenomes.
The results from the MAG search are in a CSV file, with a column for the MAG name, another for the SRA dataset ID for the metagenome and a third column for the containment of the MAG in the metagenome. I also fixed the names to make it easier to query, and finally removed the Tara and Parks metagenomes (because we already knew they contained these MAGs).
This left us with 23,644 SRA metagenomes with matches, covering 2,291 of the 2,631 MAGs. These are results for a fairly low containment (10%), so if we limit to MAGs with more than 50% containment we still have 1,407 MAGs and 2,938 metagenomes left.
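The actual analysis lives in the notebook in the repo, but the filtering step boils down to something like this sketch (the column names here are assumptions about the cleaned CSV):
import pandas as pd

matches = pd.read_csv("mag_matches.csv")  # assumed columns: mag, metagenome, containment
high = matches[matches["containment"] >= 0.5]
print(high["mag"].nunique(), "MAGs found in", high["metagenome"].nunique(), "metagenomes")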
That's still a lot, so I decided to pick a candidate to check before doing any large scale analysis. I chose TOBG_NP-110 because there were many matches above 50% containment, and even some at 99%. Turns out it is also an Archaeal MAG that failed to be classified further than Phylum level (Euryarchaeota), with a 70.3% complete score in the original analysis. Oh, let me dissect the name a bit: TOBG is "Tara Ocean Binned Genome" and "NP" is North Pacific.
And so I went checking where the other metagenome matches came from. Five of the 12 matches above 50% containment come from one study, SRP044185, with samples collected from a water column at a station in Manzanillo, Mexico. Another 3 matches come from SRP003331, in the South Pacific ocean (off northern Chile). One more match, ERR3256923, also comes from the South Pacific.
I'm curious to follow the refining MAGs tutorial from the Meren Lab and see where this goes,
and especially in using spacegraphcats
to extract neighborhoods from the MAG and better evaluate what is missing or if there are other interesting bits that
the MAG generation methods ended up discarding.
So, for now that's it. But more important, I didn't want to sit on these results until there is a publication in press, especially when there are people that can do so much more with these, so I decided to make it all public. It is way more exciting to see this being used to know more about these organisms than me being the only one with access to this info.
And yesterday I saw this tweet by @DrJonathanRosa, saying:
I don’t know who told students that the goal of research is to find some previously undiscovered research topic, claim individual ownership over it, & fiercely protect it from theft, but that almost sounds like, well, colonialism, capitalism, & policing
Amen.
Next time. But we will have a discussion about scientific infrastructure and sustainability first =]
(or: Top-down and bottom-up approaches for working around sourmash limitations)
In the last month I updated wort,
the system I developed for computing sourmash signature for public genomic databases,
and started calculating signatures
for the metagenomes in the Sequence Read Archive.
This is a more challenging subset than the microbial datasets I was doing previously,
since there are around 534k datasets from metagenomic sources in the SRA,
totalling 447 TB of data.
Another problem is the size of the datasets,
ranging from a couple of MB to 170 GB.
Turns out that the workers I have in wort
are very good for small-ish datasets,
but I still need to figure out how to pull large datasets faster from the SRA,
because the large ones take forever to process...
The good news is that I managed to calculate signatures for almost 402k of them 1, which already let us work on some pretty exciting problems =]
Metagenome-assembled genomes are essential for studying organisms that are hard to isolate and culture in lab, especially for environmental metagenomes. Tully et al published 2,631 draft MAGs from 234 samples collected during the Tara Oceans expedition, and I wanted to check if they can also be found in other metagenomes besides the Tara Oceans ones. The idea is to extract the reads from these other matches and evaluate how the MAG can be improved, or at least evaluate what is missing in them. I choose to use environmental samples under the assumption they are easier to deposit on the SRA and have public access, but there are many human gut microbiomes in the SRA and this MAG search would work just fine with those too.
Moreover, I want to search for containment, and not similarity. The distinction is subtle, but similarity takes into account both datasets sizes (well, the size of the union of all elements in both datasets), while containment only considers the size of the query. This is relevant because the similarity of a MAG and a metagenome is going to be very small (and is symmetrical), but the containment of the MAG in the metagenome might be large (and is asymmetrical, since the containment of the metagenome in the MAG is likely very small because the metagenome is so much larger than the MAG).
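A toy example of the difference, using plain Python sets to stand in for the hashes of a MAG and a metagenome:
mag = {1, 2, 3, 4}               # a tiny "MAG"
metagenome = set(range(1, 101))  # a much larger "metagenome" that contains it

similarity = len(mag & metagenome) / len(mag | metagenome)  # Jaccard: 4/100 = 0.04
containment = len(mag & metagenome) / len(mag)              # 4/4 = 1.0
print(similarity, containment)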
sourmash signatures are a small fraction of the original size of the datasets,
but when you have hundreds of thousands of them the collection ends up being pretty large too.
More precisely, 825 GB large.
That is way bigger than any index I ever built for sourmash,
and it would also have pretty distinct characteristics than what we usually do:
we tend to index genomes and run search
(to find similar genomes) or gather
(to decompose metagenomes into their constituent genomes),
but for this MAG search I want to find which metagenomes have my MAG query above a certain containment threshold.
Sort of a sourmash search --containment, but over thousands of metagenome signatures.
The main benefit of an SBT index in this context is to avoid checking all signatures because we can prune the search early,
but currently SBT indices need to be totally loaded in memory during sourmash index.
I will have to do this in the medium term,
but I want a solution NOW! =]
sourmash 3.4.0 introduced --from-file in many commands,
and since I can't build an index I decided to use it to load signatures for the metagenomes.
But... sourmash search
tries to load all signatures in memory,
and while I might be able to find a cluster machine with hundreds of GBs of RAM available,
that's not very practical.
So, what to do?
I don't want to modify sourmash now,
so why not make a workflow and use snakemake to run one sourmash search --containment
for each metagenome?
That means 402k tasks,
but at least I can use batches and SLURM job arrays to submit reasonably-sized jobs to our HPC queue.
After running all batches I summarized results for each task,
and it worked well for a proof of concept.
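Ignoring the snakemake/SLURM batching details, each task in that workflow boils down to something like this sketch (paths are placeholders, and the MAG signature would vary per task):
import subprocess
from pathlib import Path

query = "mags/one_mag.sig"  # placeholder: one MAG signature per task
for metagenome in Path("sigs").glob("*.sig"):
    out = Path("outputs") / (metagenome.stem + ".csv")
    subprocess.run(
        ["sourmash", "search", "--containment", "-o", str(out), query, str(metagenome)],
        check=True,
    )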
But... it was still pretty resource intensive: each task was running one query MAG against one metagenome, and so each task needed to do all the overhead of starting the Python interpreter and parsing the query signature, which is exactly the same for all tasks. Extending it to support multiple queries to the same metagenome would involve duplicating tasks, and 402k metagenomes times 2,631 MAGs is... a very large number of jobs.
I also wanted to avoid clogging the job queues, which is not very nice to the other researchers using the cluster. This limited how many batches I could run in parallel...
Thinking a bit more about the problem, here is another solution: what if we load all the MAGs in memory (as they will be queried frequently and are not that large), and then for each metagenome signature load it, perform all MAG queries, and then unload the metagenome signature from memory? This way we can control memory consumption (it's going to be proportional to all the MAG sizes plus the size of the largest metagenome) and can also efficiently parallelize the code because each task/metagenome is independent and the MAG signatures can be shared freely (since they are read-only).
This could be done with the sourmash Python API plus multiprocessing or some other parallelization approach (maybe dask?),
but turns out that everything we need comes from the Rust API.
Why not enjoy a bit of the fearless concurrency that is one of the major Rust goals?
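For comparison, the Python API + multiprocessing route could look roughly like the sketch below (the sourmash function names are from memory, the paths are placeholders, and I'm assuming one signature per file, so treat this as an illustration rather than the code I ran); the Rust version I actually wrote is described next.
import multiprocessing
from pathlib import Path

import sourmash

# Load all MAG signatures once; they are shared (read-only) with the workers.
MAGS = [sourmash.load_one_signature(str(p)) for p in Path("mags").glob("*.sig")]

def search_metagenome(sig_path):
    metagenome = sourmash.load_one_signature(str(sig_path))
    rows = []
    for mag in MAGS:
        containment = mag.minhash.contained_by(metagenome.minhash)
        if containment >= 0.1:
            rows.append((mag.name(), sig_path.name, containment))
    return rows

if __name__ == "__main__":
    paths = list(Path("metagenomes").glob("*.sig"))
    with multiprocessing.Pool() as pool:
        for rows in pool.imap_unordered(search_metagenome, paths):
            for name, metagenome, containment in rows:
                print(f"{name},{metagenome},{containment}")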
The whole code ended up being 176 lines long, including command line parsing using structopt and parallelizing the search using rayon and a multiple-producer, single-consumer channel to write results to an output (either the terminal or a file). This version took 11 hours to run, using less than 5 GB of RAM and 32 processors, to search 2k MAGs against 402k metagenomes. And, bonus! It can also be parallelized again if you have multiple machines, so it potentially takes a bit more than an hour to run if you can allocate 10 batch jobs, with each batch processing 1/10 of the metagenome signatures.
I would like to answer "Yes!", but bioinformatics software tends to be organized as command line interfaces, not as libraries. Libraries also tend to have even less documentation than CLIs, and this particular case is not a fair comparison because... Well, I wrote most of the library, and the Rust API is not that well documented for general use.
But I'm pretty happy with how the sourmash CLI is viable both for the top-down approach (and whatever workflow software you want to use) as well as how the Rust core worked for the bottom-up approach. I think the most important is having the option to choose which way to go, especially because now I can use the bottom-up approach to make the sourmash CLI and Python API better. The top-down approach is also way more accessible in general, because you can pick your favorite workflow software and use all the tricks you're comfortable with.
Next time. But I did find MAGs with over 90% containment in very different locations, which is pretty exciting!
I also need to find a better way of distributing all these signatures, because storing 4 TB of data in S3 is somewhat cheap, but transferring data is very expensive. All signatures are also available on IPFS, but I need more people to host and share them. Get in contact if you're interested in helping =]
And while I'm asking for help, any tips on pulling data faster from the SRA are greatly appreciated!
pulling about 100 TB in 3 days, which was pretty fun to see because I ended up DDoSing myself: I couldn't download the generated sigs fast enough from the S3 bucket where they are temporarily stored =P ↩
sourmash 3.3 was released last week, and it is the first version supporting zipped databases. Here is my personal account of how that came to be =]
A sourmash database contains signatures (typically Scaled MinHash sketches built from genomic datasets) and an index for allowing efficient similarity and containment queries over these signatures. The two types of index are SBT, a hierarchical index that uses less memory by keeping data on disk, and LCA, an inverted index that uses more memory but is potentially faster. Indices are described as JSON files, with LCA storing all the data in one JSON file and SBT opting for saving a description of the index structure in JSON, and all the data into a hidden directory with many files.
We distribute some prepared databases (with SBT indices) for Genbank and RefSeq as compressed TAR files. The compressed file is ~8GB, but after decompressing it turns into almost 200k files in a hidden directory, using about 40 GB of disk space.
The initial issue in this saga is dib-lab/sourmash#490,
and the idea was to take the existing support for multiple data storages
(hidden dir,
TAR files,
IPFS and Redis) and save the index description in the storage,
allowing loading everything from the storage.
Since we already had the databases as TAR files,
the first test tried to use them but it didn't take long to see it was a doomed approach:
TAR files are terrible for random access
(or at least the tarfile
module in Python is).
Zip files showed up as a better alternative,
and it helps that Python has the zipfile
module already available in the
standard library.
Initial tests were promising,
and led to dib-lab/sourmash#648.
The main issue was performance:
compressing and decompressing was slow,
but there was also another limitation...
Another challenge was efficiently loading the data from a storage.
The two core methods in a storage are save(location, content), where content is a bytes buffer, and load(location), which returns a bytes buffer that was previously saved.
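In Python terms, the storage abstraction is essentially this (a simplified sketch, not the exact sourmash class):
from abc import ABC, abstractmethod

class Storage(ABC):
    @abstractmethod
    def save(self, location, content):
        "Store `content` (a bytes buffer) under `location`."

    @abstractmethod
    def load(self, location):
        "Return the bytes buffer previously saved under `location`."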
This didn't interact well with the khmer Nodegraphs (the Bloom Filter we use for SBTs), since khmer only loads data from files, not from memory buffers.
We ended up doing a temporary file dance,
which made things slower for the default storage (hidden dir),
where it could have been optimized to work directly with files,
and involved interacting with the filesystem for the other storages
(IPFS and Redis could be pulling data directly from the network,
for example).
This one could be fixed in khmer
by exposing C++ stream methods,
and I did a small PoC to test the idea.
While doable,
this is something that was happening while the sourmash conversion to Rust was underway,
and depending on khmer was a problem for my WebAssembly aspirations... so, having the Nodegraph implemented in Rust seemed like a better direction.
A Rust Nodegraph has actually been quietly living in the sourmash codebase for quite some time, but it was never exposed to Python (and it was also lacking more extensive tests).
After the release of sourmash 3 and the replacement of the C++ with the Rust implementation, all the pieces for exposing the Nodegraph were in place, so dib-lab/sourmash#799 was the next step. It wasn't a priority at first because other optimizations (that were released in 3.1 and 3.2) were more important, but then it was time to check how this would perform. And...
Turns out that my Nodegraph loading code was way slower than khmer's.
The Nodegraph binary format is well documented, and doing an initial implementation wasn't so hard: using the byteorder crate to read binary data with the right endianness, and then setting the appropriate bits in the internal fixedbitset in memory.
But the khmer code doesn't parse bit by bit: it reads a long char buffer directly, and that is many orders of magnitude faster than setting bit by bit. And there was no way to replicate this behavior directly with fixedbitset.
At this point I could either do bit-indexing into a large buffer and lose all the useful methods that fixedbitset provides, or try to find a way to support loading the data directly into fixedbitset and open a PR.
I chose the PR (and even got #42! =]).
It was more straightforward than I expected, but it did expose the internal representation of fixedbitset, so I was a bit nervous it wasn't going to be merged. But bluss was super nice, and his suggestions made the PR way better!
This simplified the final Nodegraph code, and it was actually more correct (because I was messing up a few corner cases when doing the bit-by-bit parsing before). Win-win!
Being able to save and load Nodegraphs in Rust allowed using memory buffers, but also opened the way to support other operations not supported by khmer Nodegraphs. One example is loading/saving compressed files, which is supported for Countgraph (another khmer data structure, based on the Count-Min Sketch) but not for Nodegraph.
If only there was an easy way to support working with compressed files...
Oh wait, there is! niffler is a crate that I made with Pierre Marijon based
on some functionality I saw in one of his projects,
and we iterated a bit on the API and documented everything to make it more
useful for a larger audience.
niffler
tries to be as transparent as possible,
with very little boilerplate when using it but with useful features nonetheless
(like auto detection of the compression format).
If you want more about the motivation and how it happened,
check this Twitter thread.
The cool thing is that adding compressed files support in sourmash
was mostly
one-line changes for loading
(and a bit more for saving,
but mostly because converting compression levels could use some refactoring).
With all these other pieces in place,
it's time to go back to dib-lab/sourmash#648.
Compressing and decompressing with the Python zipfile
module is slow,
but Zip files can also be used just for storage,
handing back the data without extracting it.
And since we have compression/decompression implemented in Rust with niffler
,
that's what the zipped sourmash databases are:
data is loaded and saved into the Zip file without using the Python module
compression/decompression,
and all the work is done before (or after) in the Rust side.
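In Python terms, the trick looks something like this sketch (file names are placeholders, and the real code does the gzip part in Rust via niffler): compress the bytes yourself, and ask zipfile to just store them.
import gzip
import zipfile

data = b"some signature data" * 1000

# Compress on our side (sourmash does this in Rust via niffler, gzip level 1)...
compressed = gzip.compress(data, compresslevel=1)

# ...and tell zipfile to store the bytes as-is, without compressing them again.
with zipfile.ZipFile("db.sbt.zip", "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("internal.node", compressed)

# Reading hands the stored bytes back; decompression is again on our side.
with zipfile.ZipFile("db.sbt.zip") as zf:
    assert gzip.decompress(zf.read("internal.node")) == data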
This allows keeping the Zip file with similar sizes to the original TAR files we started with, but with very low overhead for decompression. For compression we opted for using Gzip level 1, which doesn't compress perfectly but also doesn't take much longer to run:
Level | Size | Time |
---|---|---|
0 | 407 MB | 16s |
1 | 252 MB | 21s |
5 | 250 MB | 39s |
9 | 246 MB | 1m48s |
In this table, 0 is without compression, while 9 is the best compression. The size difference from 1 to 9 is only 6 MB (~2%), but level 1 runs 5x faster than level 9, and it's only 30% slower than saving the uncompressed data.
The last challenge was updating an existing Zip file.
It's easy to support appending new data,
but if any of the already existing data in the file changes
(which happens when internal nodes change in the SBT,
after a new dataset is inserted) then there is no easy way to replace the data in the Zip file.
Worse,
the Python zipfile
will add the new data while keeping the old one around,
leading to ginormous files over time. 1
So, what to do?
I ended up opting for dealing with the complexity and complicating the ZipStorage implementation a bit, by keeping a buffer for new data. If it's a new file or it already exists but there are no insertions the buffer is ignored and all works as before.
If the file exists and new data is inserted, then it is first stored in the buffer (where it might also replace a previous entry with the same name). In this case we also need to check the buffer when trying to load some data (because it might exist only in the buffer, and not in the original file).
Finally,
when the ZipStorage
is closed it needs to verify if there are new items in the buffer.
If not,
it is safe just to close the original file.
If there are new items but they were not present in the original file,
then we can append the new data to the original file.
The final case is if there are new items that were also in the original file,
and in this case a new Zip file is created and all the content from buffer and
original file are copied to it,
prioritizing items from the buffer.
The original file is replaced by the new Zip file.
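A condensed sketch of that buffering logic, assuming the Zip file already exists (the real ZipStorage in sourmash handles more cases than this):
import os
import zipfile

class BufferedZipStorage:
    def __init__(self, path):
        self.path = path
        self.buffer = {}  # new or replaced entries live here until close()

    def save(self, location, content):
        self.buffer[location] = content

    def load(self, location):
        if location in self.buffer:  # newest data wins
            return self.buffer[location]
        with zipfile.ZipFile(self.path) as zf:
            return zf.read(location)

    def close(self):
        if not self.buffer:
            return  # nothing new: just close the original file
        with zipfile.ZipFile(self.path) as zf:
            existing = set(zf.namelist())
        if existing.isdisjoint(self.buffer):
            # only brand new entries: append them to the original file
            with zipfile.ZipFile(self.path, "a") as zf:
                for name, content in self.buffer.items():
                    zf.writestr(name, content)
        else:
            # some entries replace existing ones: rewrite everything into a new
            # Zip file, preferring the buffer, then replace the original file
            tmp = self.path + ".new"
            with zipfile.ZipFile(self.path) as old, zipfile.ZipFile(tmp, "w") as new:
                for name in old.namelist():
                    if name not in self.buffer:
                        new.writestr(name, old.read(name))
                for name, content in self.buffer.items():
                    new.writestr(name, content)
            os.replace(tmp, self.path)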
Turns out this worked quite well! And so the PR was merged =]
Zipped databases open the possibility of distributing extra data that might be useful for some kinds of analysis. One thing we are already considering is adding taxonomy information, let's see what else shows up.
Having Nodegraph
in Rust is also pretty exciting,
because now we can change the internal representation for something that uses
less memory (maybe using RRR encoding?),
but more importantly:
now they can also be used with Webassembly,
which opens many possibilities for running not only signature computation but
also search
and gather
in the browser,
since now we have all the pieces to build it.
The zipfile module does throw a UserWarning pointing out that duplicated files were inserted, which is useful during development but generally doesn't show up during regular usage... ↩
Last November, Latin American KDE contributors landed in Salvador, Brazil to take part in another edition of LaKademy -- the Latin American Akademy. It was the seventh edition of the event (or the eighth, if you count Akademy-BR as the first LaKademy) and the second with Salvador as the host city. No problem for me: in fact, I would love to move and live at least a few years in Salvador, a city I like a lot.
LaKademy 2019 group photo
My main tasks during the event were on two projects: Cantor and Sprat, a "drafting editor for academic papers". Besides those, I also helped with promo tasks such as the LaKademy website.
On Cantor I focused on organizational work. For example, I asked the sysadmins to migrate the repository to KDE's Gitlab and created a specific website for Cantor at cantor.kde.org using the new Jekyll template for KDE projects.
The new website is a nice addition to Cantor because we want to communicate better and more directly with our user community. The site has its own blog and a changelog section, to make it easier for the community to follow news and the main changes in the software.
The migration to Gitlab allows us to use Gitlab CI as an alternative for continuous integration in Cantor. I mentored the work of Rafael Gomes (not merged yet) to make this available for the project.
Besides the Cantor work, I did some activities related to Sprat, a drafting editor for scientific papers in English. This software uses katepart to implement the methodology for writing scientific papers in English known as PROMETHEUS, as described in this book, in an attempt to help students and researchers in general with the task of writing scientific papers. During LaKademy I finished the Qt5 port and, hopefully, I will release the project this year.
On the more social side, I took part in the famous promo meeting, which discusses KDE's future actions in Latin America. Our main decision was to organize and attend more small events spread across several cities, marking KDE's presence at consolidated events such as FLISoL and Software Freedom Day, and more -- but now, in COVID-19 times, that is no longer viable. Another decision was to move the KDE Brasil organization from Phabricator to Gitlab.
KDE contributors working hard
Beyond the technical part, this LaKademy was an opportunity to meet old and new friends, drink some beers, taste the wonderful Bahian cuisine, and have fun between one commit and another.
I would like to thank KDE e.V. for supporting LaKademy, and Caio and Icaro for organizing this edition of the event. I can't wait for the next LaKademy, and I hope it happens as soon as possible!
In September 2019 the Italian city of Milan hosted the main worldwide meeting of KDE contributors -- Akademy, where members from different areas such as translators, developers, artists, promo people and more gather for a few days to think about and build the future of KDE's projects and community(ies).
Before getting to Akademy I took a flight from Brazil to Portugal to attend EPIA, a conference on artificial intelligence that took place in the small city of Vila Real, in the Porto region. After that academic activity, I flew from Porto to Milan and started my participation in Akademy 2019.
Unfortunately I landed at the end of the first morning of the event, which made me miss interesting presentations about Qt 6 and the new KDE Goals. In the afternoon I could attend some talks on topics that also interest me, such as Plasma for mobile devices, MyCroft for the automotive industry, the KDE e.V. report, and a showcase by Google Summer of Code and Season of KDE students -- really nice to see amazing projects developed by newcomers.
On the second day, the talks about KPublicTransportation, LibreOffice for Plasma, Get Hot New Stuff -- which I imagine I will use in a future project -- and Caio's presentation about kpmcore caught my attention.
After the event party (comments about it only in person), the following days were filled with BoFs, for me the most interesting part of Akademy.
The Gitlab workshop was interesting because we could discuss specific topics about KDE's migration to this tool. I am loving this move and I hope all KDE projects make the migration as soon as possible. Cantor has been there for a while now.
In the KDE websites BoF, I could understand a bit better the new Jekyll theme used by our sites. In addition, I hope that soon we can apply internationalization to these pages, making them translatable into any language. After attending and gathering information at this BoF, I created a new website for Cantor during LaKademy 2019.
The KDE Craft BoF was interesting to see how to build and distribute our software in the Windows store (yeah, look at what I am writing...). I hope to work on this during the year in order to make a Cantor package available in that store ((yeah)²).
I also took part in the QML and Kirigami workshop run by the Maui project folks. Kirigami is something I have been keeping an eye on for future projects.
Finally, I attended the "All About the Apps Kick Off" BoF. Personally, I think this is KDE's future: an international community that produces free, high-quality, secure software for different platforms, from desktop to mobile. In fact, this is how KDE is currently organized and working, but we don't communicate it very well to the public. Maybe, with the changes in our release approach, together with websites for specific projects and distribution in different app stores, we can change the way the public sees our community.
The Akademy 2019 day trip was to Lake Como, in the town of Varenna. A beautiful trip; I spent the whole time imagining that it could be a nice place for a honeymoon :D. I hope to return there in the near future and spend a few days traveling between towns like it.
Me in Varenna
I would like to thank the whole local team, Riccardo and his friends, for organizing this amazing edition of Akademy. Milan is a very beautiful city, with delicious food (carbonara!), historic places to visit and learn more about the Italians, and the sophisticated capital of high fashion.
Finally, my thanks to KDE e.V. for sponsoring my participation in Akademy.
Videos of the Akademy 2019 talks and BoFs are available at this link.
Hey folks, how is everybody doing?
In the video below I show how we can set up CI for a Django application using Github Actions.
sourmash 3 was released last week, finally landing the Rust backend. But, what changes when developing new features in sourmash? I was thinking about how to best document this process, and since PR #826 is a short example touching all the layers I decided to do a small walkthrough.
Shall we?
The first step is describing the problem, and trying to convince reviewers (and yourself) that the changes bring enough benefits to justify a merge. This is the description I put in the PR:
Calling .add_hash() on a MinHash sketch is fine, but if you're calling it all the time it's better to pass a list of hashes and call .add_many() instead. Before this PR add_many just called add_hash for each hash it was passed, but now it will pass the full list to Rust (and that's way faster).
No changes for public APIs, and I changed the _signatures method in LCA to accumulate hashes for each sig first, and then set them all at once. This is way faster, but might use more intermediate memory (I'll evaluate this now).
There are many details that sound like jargon for someone not familiar with the codebase, but if I write something too long I'll probably be wasting the reviewers time too. The benefit of a very detailed description is extending the knowledge for other people (not necessarily the maintainers), but that also takes effort that might be better allocated to solve other problems. Or, more realistically, putting out other fires =P
Nonetheless, some points I like to add in PR descriptions:
- why is there a problem with the current approach?
- is this the minimal viable change, or is it trying to change too many things at once? The former is way better, in general.
- what are the trade-offs? This PR is using more memory to lower the runtime, but I hadn't measured it yet when I opened it.
- Not changing public APIs is always good to convince reviewers. If the project follows a semantic versioning scheme, changes to the public APIs are major version bumps, and that can bring other consequences for users.
If this was a bug fix PR,
the first thing I would do is write a new test triggering the bug,
and then proceed to fix it in the code
(Hmm, maybe that would be another good walkthrough?).
But this PR is making performance claims ("it's going to be faster"),
and that's a bit hard to codify in tests. 1
Since it's also proposing to change a method (_signatures
in LCA indices) that is better to benchmark with a real index (and not a toy example),
I used the same data and command I run in sourmash_resources to check how memory consumption and runtime changed.
For reference, this is the command:
sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz
I'm using the benchmark
feature from snakemake in sourmash_resources to
track how much memory, runtime and I/O is used for each command (and version) of sourmash,
and generate the plots in the README in that repo.
That is fine for a high-level view ("what's the maximum memory used?"),
but not so useful for digging into details ("what method is consuming most memory?").
Another additional problem is the dual-language 2 nature of sourmash, where we have Python calling into Rust code (via CFFI). There are great tools for measuring and profiling Python code, but they tend to not work with extension code...
So, let's bring two of my favorite tools to help!
heaptrack is a heap profiler, and I first heard about it from Vincent Prouillet.
Its main advantage over other solutions (like valgrind's massif) is the low
overhead and... how easy it is to use:
just stick heaptrack
in front of your command,
and you're good to go!
Example output:
$ heaptrack sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz
heaptrack stats:
allocations: 1379353
leaked allocations: 1660
temporary allocations: 168984
Heaptrack finished! Now run the following to investigate the data:
heaptrack --analyze heaptrack.sourmash.66565.gz
heaptrack --analyze is a very nice graphical interface for analyzing the results,
but for this PR I'm mostly focusing on the Summary page (and overall memory consumption).
Tracking allocations in Python doesn't give many details,
because it shows the CPython functions being called,
but the ability to track into the extension code (Rust) allocations is amazing
for finding bottlenecks (and memory leaks =P). 3
Just as other solutions exist for profiling memory,
there are many for profiling CPU usage in Python,
including profile and cProfile in the standard library.
Again, the issue is being able to analyze extension code,
and bringing the cannon (the perf
command in Linux, for example) loses the
benefit of tracking Python code properly (because we get back the CPython
functions, not what you defined in your Python code).
Enter py-spy by Ben Frederickson, based on the rbspy project by Julia Evans. Both use a great idea: read the process maps for the interpreters and resolve the full stack trace information, with low overhead (because it uses sampling). py-spy also goes further and resolves native Python extension stack traces, meaning we can get the complete picture all the way from the Python CLI to the Rust core library! 4
py-spy is also easy to use: stick py-spy record --output search.svg -n -- in front of the command, and it will generate a flamegraph in search.svg. The full command for this PR is
py-spy record --output search.svg -n -- sourmash search -o out.csv --scaled 2000 -k 51 HSMA.fastq.sig genbank-k51.lca.json.gz
OK, OK, sheesh. But it's worth repeating: the code is important, but there are many other aspects that are just as important =]
Replacing add_hash calls with one add_many
Let's start at the _signatures()
method on LCA indices.
This is the original method:
@cached_property
def _signatures(self):
    "Create a _signatures member dictionary that contains {idx: minhash}."
    from .. import MinHash

    minhash = MinHash(n=0, ksize=self.ksize, scaled=self.scaled)

    debug('creating signatures for LCA DB...')
    sigd = defaultdict(minhash.copy_and_clear)

    for (k, v) in self.hashval_to_idx.items():
        for vv in v:
            sigd[vv].add_hash(k)

    debug('=> {} signatures!', len(sigd))
    return sigd
sigd[vv].add_hash(k) is the culprit. Each call to .add_hash has to go thru CFFI to reach the extension code, and the overhead is significant. It is a similar situation to accessing array elements in NumPy: it works, but it is way slower than using operations that avoid crossing from Python to the extension code.
What we want to do instead is call .add_many(hashes), which takes a list of hashes and processes it entirely in Rust (ideally. We will get there).
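To make the NumPy analogy concrete, a tiny illustration (not code from the PR):
import numpy as np

arr = np.arange(1_000_000)

total = 0
for x in arr:       # crosses the Python/extension boundary a million times
    total += x

total = arr.sum()   # one call; the loop happens inside the extension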
But, to have a list of hashes, there is another issue with this code.
for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        sigd[vv].add_hash(k)
There are two nested for loops, and add_hash is being called with values from the inner loop. So... we don't have the list of hashes beforehand.
But we can change the code a bit to save the hashes for each signature
in a temporary list,
and then call add_many
on the temporary list.
Like this:
temp_vals = defaultdict(list)

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].append(k)

for sig, vals in temp_vals.items():
    sigd[sig].add_many(vals)
There is a trade-off here: if we save the hashes in temporary lists, will the memory consumption be so high that it cancels out the runtime gains of calling add_many on those lists?
Time to measure it =]
version | mem | time |
---|---|---|
original | 1.5 GB | 160s |
list | 1.7 GB | 173s |
Wait, it got worse?!?! Building temporary lists only takes time and memory, and bring no benefits!
This mystery goes away when you look at the add_many method:
def add_many(self, hashes):
    "Add many hashes in at once."
    if isinstance(hashes, MinHash):
        self._methodcall(lib.kmerminhash_add_from, hashes._objptr)
    else:
        for hash in hashes:
            self._methodcall(lib.kmerminhash_add_hash, hash)
The first check in the if statement is a shortcut for adding hashes from another MinHash, so let's focus on the else part...
And it turns out that add_many is lying! It doesn't process the hashes in the Rust extension, but just loops and calls add_hash for each hash in the list. That's not going to be any faster than what we were doing in _signatures.
Time to fix add_many!
add_many
The idea is to change this loop in add_many:
for hash in hashes:
    self._methodcall(lib.kmerminhash_add_hash, hash)
with a call to a Rust extension function:
self._methodcall(lib.kmerminhash_add_many, list(hashes), len(hashes))
self._methodcall is a convenience method defined in RustObject which translates a method-like call into a function call, since our C layer only has functions.
This is the C prototype for this function:
void kmerminhash_add_many(
    KmerMinHash *ptr,
    const uint64_t *hashes_ptr,
    uintptr_t insize
);
You can almost read it as a Python method declaration, where KmerMinHash *ptr means the same as the self in Python methods. The other two arguments are a common idiom when passing pointers to data in C, with insize being how many elements we have in the list. 5
CFFI is very good at converting Python lists into pointers of a specific type, as long as the type is a primitive type (uint64_t in our case, since each hash is a 64-bit unsigned integer number).
And the Rust code with the implementation of the function:
ffi_fn! {
unsafe fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize,
) -> Result<()> {
    let mh = {
        assert!(!ptr.is_null());
        &mut *ptr
    };

    let hashes = {
        assert!(!hashes_ptr.is_null());
        slice::from_raw_parts(hashes_ptr as *mut u64, insize)
    };

    for hash in hashes {
        mh.add_hash(*hash);
    }

    Ok(())
}
}
Let's break what's happening here into smaller pieces. Starting with the function signature:
ffi_fn! {
unsafe fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize,
) -> Result<()>
The weird ffi_fn! {} syntax around the function is a macro in Rust: it changes the final generated code to convert the return value (Result<()>) into something that is valid C code (in this case, void).
What happens if there is an error, then? The Rust extension has code for passing back an error code and message to Python, as well as capturing panics (when things go horribly bad and the program can't recover) in a way that Python can then deal with (raising exceptions and cleaning up).
It also sets the #[no_mangle]
attribute in the function,
meaning that the final name of the function will follow C semantics (instead of Rust semantics),
and can be called more easily from C and other languages.
This ffi_fn!
macro comes from symbolic,
a big influence on the design of the Python/Rust bridge in sourmash.
unsafe is the keyword in Rust to disable some checks in the code to allow potentially dangerous things (like dereferencing a pointer), and it is required to interact with C code. unsafe doesn't mean that the code is always unsafe to use: it's up to whoever is calling this to verify that valid data is being passed and invariants are being preserved.
If we remove the ffi_fn! macro and the unsafe keyword, we have
fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize
);
At this point we can pretty much map between Rust and the C function prototype:
void kmerminhash_add_many(
    KmerMinHash *ptr,
    const uint64_t *hashes_ptr,
    uintptr_t insize
);
Some interesting points:
- fn is used to declare a function in Rust. The return type goes after the arrow (here -> (), equivalent to a void return type in C).
- Mutability needs to be declared for the pointer to the KmerMinHash item (*mut KmerMinHash). In C everything is mutable by default.
- u64 in Rust -> uint64_t in C
- usize in Rust -> uintptr_t in C
Let's check the implementation of the function now.
We start by converting the ptr argument (a raw pointer to a KmerMinHash struct) into a regular Rust struct:
let mh = {
assert!(!ptr.is_null());
&mut *ptr
};
This block is asserting that ptr
is not a null pointer,
and if so it dereferences it and store in a mutable reference.
If it was a null pointer the assert!
would panic (which might sound extreme,
but is way better than continue running because dereferencing a null pointer is
BAD).
Note that functions always need all the types in arguments and return values,
but for variables in the body of the function
Rust can figure out types most of the time,
so no need to specify them.
The next block prepares our list of hashes for use:
let hashes = {
assert!(!hashes_ptr.is_null());
slice::from_raw_parts(hashes_ptr as *mut u64, insize)
};
We are again asserting that the hashes_ptr
is not a null pointer,
but instead of dereferencing the pointer like before we use it to create a slice
,
a dynamically-sized view into a contiguous sequence.
The list we got from Python is a contiguous sequence of size insize
,
and the slice::from_raw_parts
function creates a slice from a pointer to data and a size.
Oh, and can you spot the bug?
I created the slice using *mut u64
,
but the data is declared as *const u64
.
Because we are in an unsafe
block Rust let me change the mutability,
but I shouldn't be doing that,
since we don't need to mutate the slice.
Oops.
Finally, let's add hashes to our MinHash!
We need a for
loop, and call add_hash
for each hash
:
for hash in hashes {
mh.add_hash(*hash);
}
Ok(())
We finish the function with Ok(())
to indicate no errors occurred.
Why is calling add_hash
here faster than what we were doing before in Python?
Rust can optimize these calls and generate very efficient native code,
while Python is an interpreted language and most of the time don't have the same
guarantees that Rust can leverage to generate the code.
And, again,
calling add_hash
here doesn't need to cross FFI boundaries or,
in fact,
do any dynamic evaluation during runtime,
because it is all statically analyzed during compilation.
And... that's the PR code. There are some other unrelated changes that should have been in new PRs, but since they were so small it would be more work than necessary. OK, that's a lame excuse: it's confusing for reviewers to see these changes here, so avoid doing that if possible!
But, did it work?
version | mem | time |
---|---|---|
original | 1.5 GB | 160s |
list |
1.7GB | 73s |
We are using 200 MB of extra memory, but taking less than half the time it was taking before. I think this is a good trade-off, and so did the reviewer and the PR was approved.
Hopefully this was useful, 'til next time!
list
or set
?The first version of the PR used a set
instead of a list
to accumulate hashes.
Since a set
doesn't have repeated elements,
this could potentially use less memory.
The code:
temp_vals = defaultdict(set)
for (k, v) in self.hashval_to_idx.items():
for vv in v:
temp_vals[vv].add(k)
for sig, vals in temp_vals.items():
sigd[sig].add_many(vals)
The runtime was again half of the original, but...
version | mem | time |
---|---|---|
original | 1.5 GB | 160s |
set |
3.8GB | 80s |
list |
1.7GB | 73s |
... memory consumption was almost 2.5 times the original! WAT
The culprit this time? The new kmerminhash_add_many
call in the add_many
method.
This one:
self._methodcall(lib.kmerminhash_add_many, list(hashes), len(hashes))
CFFI
doesn't know how to convert a set
into something that C understands,
so we need to call list(hashes)
to convert it into a list.
Since Python (and CFFI
) can't know if the data is going to be used later
6
it needs to keep it around
(and be eventually deallocated by the garbage collector).
And that's how we get at least double the memory being allocated...
There is another lesson here.
If we look at the for
loop again:
for (k, v) in self.hashval_to_idx.items():
for vv in v:
temp_vals[vv].add(k)
each k
is already unique because they are keys in the hashval_to_idx
dictionary,
so the initial assumption
(that a set
might save memory because it doesn't have repeated elements)
is... irrelevant for the problem =]
We do have https://asv.readthedocs.io/ set up for micro-benchmarks,
and now that I think about it...
I could have started by writing a benchmark for add_many
,
and then showing that it is faster.
I will add this approach to the sourmash PR checklist =] ↩
or triple, if you count C ↩
It would be super cool to have the unwinding code from py-spy in heaptrack, and be able to see exactly what Python methods/lines of code were calling the Rust parts... ↩
Even if py-spy doesn't talk explicitly about Rust, it works very very well, woohoo! ↩
Let's not talk about lack of array bounds checks in C... ↩
something that the memory ownership model in Rust does, BTW ↩
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
/opt/nginx
, and use it with the nginx system user. So, let’s download and extract the latest stable version (1.0.9) from nginx website: Once you have extracted it, just configure, compile and install:
% curl -O http://nginx.org/download/nginx-1.0.9.tar.gz
% tar -xzf nginx-1.0.9.tar.gz
As you can see, we provided the
% ./configure --prefix=/opt/nginx --user=nginx --group=nginx
% make
% [sudo] make install
/opt/nginx
to configure, make sure the /opt
directory exists. Also, make sure that there is a user and a group called nginx, if they don’t exist, add them: % [sudo] adduser --system --no-create-home --disabled-login --disabled-password --group nginxAfter that, you can start nginx using the command line below:
% [sudo] /opt/nginx/sbin/nginx
Linode provides an init script that uses start-stop-daemon, you might want to use it.
nginx.conf
file, let’s change it to reflect the following configuration requirements: nginx
user/opt/nginx/log/nginx.pid
file/opt/nginx/logs/access.log
nginx.conf
file (assume that the library project is in the directory /opt/projects
).nginx.conf
for the requirements above: Now we just need to write the configuration for our Django project. I’m using an old sample project written while I was working at Giran: the name is lojas giranianas, a nonsense portuguese joke with a famous brazilian store. It’s an unfinished showcase of products, it’s like an e-commerce project, but it can’t sell, so it’s just a product catalog. The code is available at Github. The
user nginx;
worker_processes 2;
pid logs/nginx.pid;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log logs/access.log main;
sendfile on;
keepalive_timeout 65;
include /opt/projects/showcase/nginx.conf;
}
nginx.conf
file for the repository is here: The server listens on port
server {
listen 80;
server_name localhost;
charset utf-8;
location / {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_pass http://localhost:8000;
}
location /static {
root /opt/projects/showcase/;
expires 1d;
}
}
80
, responds for the localhost
hostname (read more about the Host header). The location /static
directive says that nginx will serve the static files of the project. It also includes an expires
directive for caching control. The location /
directive makes a proxy_pass
, forwarding all requisitions to an upstream server listening on port 8000, this server is the subject of the next post of the series: the Green Unicorn (gunicorn) server. Host
header is forwarded so gunicorn can treat different requests for different hosts. Without this header, it will be impossible to Gunicorn to have these constraintsproxy_cache
directive and integrating Django, nginx and memcached). por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
As you can see it’s very simple. If you’re familiar with RST syntax, you can guess what landslide does: it converts the entire content to HTML and then split it by
Python
======
--------------
If
==
* Please don't use ()
* Never forget the ``:`` at the end of the line
Check this code:
.. sourcecode:: python
x, y = 1, 2
if x > y:
print 'x is greater'
--------------
For
===
* ``for`` iterates over a sequence
* Never forget the ``:`` at the end of the line
Check this code:
.. sourcecode:: python
numbers = [1, 2, 3, 4, 5,]
for number in numbers:
print number
--------------
While
=====
* ``while`` is like ``if``, but executes while the codition is ``True``
* please don't use ()
* never forget the ``:`` at the end of the line
Check this code:
.. sourcecode:: python
from random import randint
args = (1, 10,)
x = randint(*args)
while x != 6:
x = randint(*args)
--------------
Thank you!
==========
<hr />
tag. Each slide will contain two sections: a header and a body. The header contains only an <h1></h1>
element and the body contains everything. % landslide python.rstTo use
landslide
command, you need to install it. I suggest you do this via pip: % [sudo] pip install landslidelandslide supports theming, so you can customize it by creating your own theme. Your theme should contain two CSS files: screen.css (for the HTML version of slides) and print.css (for the PDF version of the slides). You might also customize the HTML (base.html) and JS files (slides.js), but you have to customize the CSS files in your theme. You specify the theme using the
--theme
directive. You might want to check all options available in the command line utility using --help
: % landslide --helpIt’s quite easy to extend landslide changing its theme or adding new macros. Check the official repository at Github. This example, and a markdown version for the same example are available in a repository in my github profile.
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
You can create the files jquery.zebrafy.js and ZebrafySpec.js, but remember: it is BDD, we need to describe the behavior first, then write the code. So let’s start writing the specs in ZebrafySpec.js file using Jasmine. If you are familiar with RSpec syntax, it’s easy to understand how to write spec withs Jasmine, if you aren’t, here is the clue: Jasmine is a lib with some functions used for writing tests in an easier way. I’m going to explain each function “on demmand”, when we need something, we learn how to use it! ;)
.
├── SpecRunner.html
├── lib
│ ├── jasmine-1.0.2
│ │ ├── MIT.LICENSE
│ │ ├── jasmine-html.js
│ │ ├── jasmine.css
│ │ └── jasmine.js
│ └── jquery-1.6.1.min.js
├── spec
│ └── ZebrafySpec.js
└── src
└── jquery.zebrafy.js
describe
function for that, this function receives a string and another function (a callback). The string describes the test suite and the function is a callback that delimites the scope of the test suite. Here is the Zebrafy
suite: Let’s start describing the behavior we want to get from the plugin. The most basic is: we want different CSS classes for odd an even lines in a table. Jasmine provides the
describe('Zebrafy', function () {
});
it
function for writing the tests. It also receives a string and a callback: the string is a description for the test and the callback is the function executed as test. Here is the very first test: Okay, here we go: in the first line of the callback, we are using jQuery to select a table using the
it('should apply classes zebrafy-odd and zebrafy-even to each other table lines', function () {
var table = $("#zebra-table");
table.zebrafy();
expect(table).toBeZebrafyied();
});
#zebra-table
selector, which will look up for a table with the ID attribute equals to “zebra-table”, but we don’t have this table in the DOM. What about add a new table to the DOM in a hook executed before the test run and remove the table in another hook that runs after the test? Jasmine provide two functions: beforeEach
and afterEach
. Both functions receive a callback function to be executed and, as the names suggest, the beforeEach
callback is called before each test run, and the afterEach
callback is called after the test run. Here are the hooks: The
beforeEach(function () {
$('<table id="zebra-table"></table>').appendTo('body');
for (var i=0; i < 10; i++) {
$('<tr></tr>').append('<td></td>').append('<td></td>').append('<td></td>').appendTo('#zebra-table');
};
});
afterEach(function () {
$("#zebra-table").remove();
});
beforeEach
callback uses jQuery to create a table with 10 rows and 3 columns and add it to the DOM. In afterEach
callback, we just remove that table using jQuery again. Okay, now the table exists, let’s go back to the test: In the second line, we call our plugin, that is not ready yet, so let’s forward to the next line, where we used the
it('should apply classes zebrafy-odd and zebrafy-even to each other table lines', function () {
var table = $("#zebra-table");
table.zebrafy();
expect(table).toBeZebrafyied();
});
expect
function. Jasmine provides this function, that receives an object and executes a matcher against it, there is a lot of built-in matchers on Jasmine, but toBeZebrafyied
is not a built-in matcher. Here is where we know another Jasmine feature: the capability to write custom matchers, but how to do this? You can call the beforeEach
again, and use the addMatcher
method of Jasmine object: The method
beforeEach(function () {
this.addMatchers({
toBeZebrafyied: function() {
var isZebrafyied = true;
this.actual.find("tr:even").each(function (index, tr) {
isZebrafyied = $(tr).hasClass('zebrafy-odd') === false && $(tr).hasClass('zebrafy-even');
if (!isZebrafyied) {
return;
};
});
this.actual.find("tr:odd").each(function (index, tr) {
isZebrafyied = $(tr).hasClass('zebrafy-odd') && $(tr).hasClass('zebrafy-even') === false;
if (!isZebrafyied) {
return;
};
});
return isZebrafyied;
}
});
});
addMatchers
receives an object where each property is a matcher. Your matcher can receive arguments if you want. The object being matched can be accessed using this.actual
, so here is what the method above does: it takes all odd <tr>
elements of the table (this.actual
) and check if them have the CSS class zebrafy-odd
and don’t have the CSS class zebrafy-even
, then do the same checking with even <tr>
lines. I’m not going to explain how to implement a jQuery plugin neither what are those brackets on function, this post aims to show how to use Jasmine to test jQuery plugins.
(function ($) {
$.fn.zebrafy = function () {
this.find("tr:even").addClass("zebrafy-even");
this.find("tr:odd").addClass("zebrafy-odd");
};
})(jQuery);
As you can see, we used the built-in matcher
it('zebrafy should be chainable', function() {
var table = $("#zebra-table");
table.zebrafy().addClass('black-bg');
expect(table.hasClass('black-bg')).toBeTruthy();
});
toBeTruthy
, which asserts that an object or expression is true
. All we need to do is return the jQuery object in the plugin and the test will pass: So, the plugin is tested and ready to release! :) You can check the entire code and test with more spec in a Github repository.
(function ($) {
$.fn.zebrafy = function () {
return this.each(function (index, table) {
$(table).find("tr:even").addClass("zebrafy-even");
$(table).find("tr:odd").addClass("zebrafy-odd");
});
};
})(jQuery);
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
browser = Browser("firefox")
Browser
is a class and its constructor receives the driver to be used with that instance. Nowadays, there are three drivers for Splinter: firefox, chrome and zope.testbrowser. We are using Firefox, and you can easily use Chrome by simply changing the driver from firefox
to chrome
. It’s also very simple to add another driver to Splinter, and I plan to cover how to do that in another blog post here. browser
object, and this is the object used for Firefox interactions. Let's start a new event on Facebook, the Splinter Sprint. First of all, we need to visit the Facebook homepage. There is a visit
method on Browser class, so we can use it: browser.visit("https://www.facebook.com")
visit
is a blocking operation: it waits for page to load, then we can navigate, click on links, fill forms, etc. Now we have Facebook homepage opened on browser, and you probably know that we need to login on Facebook page, but what if we are already logged in? So, let's create a method that login on Facebook with provided authentication data only the user is not logged in (imagine we are on a TestCase class): def do_login_if_need(self, username, password):What was made here? First of all, the method checks if there is an element present on the page, using a CSS selector. It checks for a
if self.browser.is_element_present_by_css('div.menu_login_container'):
self.browser.fill('email', username)
self.browser.fill('pass', password)
self.browser.find_by_css('div.menu_login_container input[type="submit"]').first.click()
assert self.browser.is_element_present_by_css('li#navAccount')
div
that contains the username and password fields. If that div is present, we tell the browser object to fill those fields, then find the submit
button and click on it. The last line is an assert to guarantee that the login was successful and the current page is the Facebook homepage (by checking the presence of “Account” li
). find
the link and click
on it: browser.find_by_css('li#navItem_events a').first.click()The
find_by_css
method takes a CSS selector and returns an ElementList. So, we get the first element of the list (even when the selector returns only an element, the return type is still a list) and click on it. Like visit
method, click
is a blocking operation: the driver will only listen for new actions when the request is finished (the page is loaded). browser.fill('event_startIntlDisplay', '5/21/2011')That is it: the event is going to happen on May 21th 2011, at 8:00 in the morning (480 minutes). As we know, the event name is Splinter sprint, and we are going to join some guys down here in Brazil. We filled out the form using
browser.select('start_time_min', '480')
browser.fill('name', 'Splinter sprint')
browser.fill('location', 'Rio de Janeiro, Brazil')
browser.fill('desc', 'For more info, check out the #cobratem channel on freenode!')
fill
and select
methods. fill
method is used to fill a "fillable" field (a textarea, an input, etc.). It receives two strings: the first is the name of the field to fill and the second is the value that will fill the field. select
is used to select an option in a select element (a “combo box”). It also receives two string parameters: the first is the name of the select element, and the second is the value of the option being selected. <select name="gender">To select “Male”, you would call the select method this way:
<option value="m">Male</option>
<option value="f">Female</option>
</select>
browser.select("gender", "m")The last action before click on “Create Event” button is upload a picture for the event. On new event page, Facebook loads the file field for picture uploading inside an
iframe
, so we need to switch to this frame and interact with the form present inside the frame. To show the frame, we need to click on “Add Event Photo” button and then switch to it, we already know how click on a link: browser.find_by_css('div.eventEditUpload a.uiButton').first.click()When we click this link, Facebook makes an asynchronous request, which means the driver does not stay blocked waiting the end of the request, so if we try to interact with the frame BEFORE it appears, we will get an
ElementDoesNotExist
exception. Splinter provides the is_element_present
method that receives an argument called wait_time
, which is the time Splinter will wait for the element to appear on the screen. If the element does not appear on screen, we can’t go on, so we can assume the test failed (remember we are testing a Facebook feature): if not browser.is_element_present_by_css('iframe#upload_pic_frame', wait_time=10):The
fail("The upload pic iframe did'n't appear :(")
is_element_present_by_css
method takes a CSS selector and tries to find an element using it. It also receives a wait_time
parameter that indicates a time out for the search of the element. So, if the iframe
element with ID=”upload_pic_frame” is not present or doesn’t appear in the screen after 10 seconds, the method returns False
, otherwise it returns True
. Important:Now we see thefail
is a pseudocode sample and doesn’t exist (if you’re usingunittest
library, you can invokeself.fail
in a TestCase, exactly what I did in complete snippet for this example, available at Github).
iframe
element on screen and we can finally upload the picture. Imagine we have a variable that contains the path of the picture (and not a file object, StringIO
, or something like this), and this variable name is picture_path
, this is the code we need: with browser.get_iframe('upload_pic_frame') as frame:Splinter provides the
frame.attach_file('pic', picture_path)
time.sleep(10)
get_iframe
method that changes the context and returns another objet to interact with the content of the frame. So we call the attach_file
method, who also receives two strings: the first is the name of the input element and the second is the absolute path to the file being sent. Facebook also uploads the picture asynchronously, but there’s no way to wait some element to appear on screen, so I just put Python to sleep 10 seconds on last line. browser.find_by_css('label.uiButton input[type="submit"]').first.click()After create an event, Facebook redirects the browser to the event page, so we can check if it really happened by asserting the header of the page. That’s what the code above does: in the new event page, it click on submit button, and after the redirect, get the text of a span element and asserts that this text equals to “Splinter sprint”.
title = browser.find_by_css('h1 span').first.text
assert title == 'Splinter sprint'
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
memc_module
source from Github and then built nginx with it. Here is the commands for compiling nginx with memcached module: % ./configure --prefix=/opt/nginx --user=nginx --group=nginx --with-http_ssl_module --add-module={your memc_module source path}After install nginx and create an init script for it, we can work on its settings for integration with Tomcat. Just for working with separate settings, we changed the nginx.conf file (located in /opt/nginx/conf directory), and it now looks like this:
% make
% sudo make install
user nginx;See the last line inside
worker_processes 1;
error_log logs/error.log;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log logs/access.log main;
sendfile on;
#tcp_nopush on;
#keepalive_timeout 0;
keepalive_timeout 65;
#gzip on;
include /opt/nginx/sites-enabled/*;
}
http
section: this line tells nginx to include all settings present in the /opt/nginx/sites-enabled
directory. So, now, let’s create a default file in this directory, with this content: server {Some stuffs must be explained here: the
listen 80;
server_name localhost;
default_type text/html;
location / {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
if ($request_method = POST) {
proxy_pass http://localhost:8080;
break;
}
set $memcached_key "$uri";
memcached_pass 127.0.0.1:11211;
error_page 501 404 502 = /fallback$uri;
}
location /fallback/ {
internal;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_redirect off;
proxy_pass http://localhost:8080;
}
}
default_type
directive is necessary for proper serving of cached responses (if you are cache other content types like application/json
or application/xml
, you should take a look at nginx documentation and deal conditionally with content types). The location /
scope defines some settings for proxy, like IP and host. We just did it because we need to pass the right information to our backend (Tomcat or memcached). See more about proxy_set_header
at nginx documentation. After that, there is a simple verification oF the request method. We don’t want to cache POST requests. $memcached_key
and then we use the memcached_pass
directive, the $memcached_key
is the URI. memcached_pass
is very similar to proxy_pass
, nginx “proxies” the request to memcached
, so we can get some HTTP status code, like 200, 404 or 502. We define error handlers for two status codes: fallback
, an internal location that builds a proxy between nginx and Tomcat (listening on port 8080). Everything is set up with nginx. As you can see in the picture or in the nginx configuration file, nginx doesn’t write anything to memcached, it only reads from memcached. The application should write to memcached. Let’s do it. First, the dependency: for memcached communication, we used spymemcached client. It is a simple and easy to use memcached library. I won’t explain all the code, line by line, but I can tell the idea behind the code: first, call
package com.franciscosouza.memcached.filter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.InetSocketAddress;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;
import net.spy.memcached.MemcachedClient;
/**
* Servlet Filter implementation class MemcachedFilter
*/
public class MemcachedFilter implements Filter {
private MemcachedClient mmc;
static class MemcachedHttpServletResponseWrapper extends HttpServletResponseWrapper {
private StringWriter sw = new StringWriter();
public MemcachedHttpServletResponseWrapper(HttpServletResponse response) {
super(response);
}
public PrintWriter getWriter() throws IOException {
return new PrintWriter(sw);
}
public ServletOutputStream getOutputStream() throws IOException {
throw new UnsupportedOperationException();
}
public String toString() {
return sw.toString();
}
}
/**
* Default constructor.
*/
public MemcachedFilter() {
}
/**
* @see Filter#destroy()
*/
public void destroy() {
}
/**
* @see Filter#doFilter(ServletRequest, ServletResponse, FilterChain)
*/
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
MemcachedHttpServletResponseWrapper wrapper = new MemcachedHttpServletResponseWrapper((HttpServletResponse) response);
chain.doFilter(request, wrapper);
HttpServletRequest inRequest = (HttpServletRequest) request;
HttpServletResponse inResponse = (HttpServletResponse) response;
String content = wrapper.toString();
PrintWriter out = inResponse.getWriter();
out.print(content);
if (!inRequest.getMethod().equals("POST")) {
String key = inRequest.getRequestURI();
mmc.set(key, 5, content);
}
}
/**
* @see Filter#init(FilterConfig)
*/
public void init(FilterConfig fConfig) throws ServletException {
try {
mmc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
} catch (IOException e) {
e.printStackTrace();
throw new ServletException(e);
}
}
}
doFilter
method on FilterChain
, because we want to get the response and work with that. Take a look at the MemcachedHttpServletResponseWrapper
instance, it encapsulates the response and makes easier to play with response content. MemcachedClient
provided by spymemcached. The request URI is the key and timeout is 5 seconds. That is it! Now you can just run Tomcat on port
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://java.sun.com/xml/ns/javaee" xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" id="WebApp_ID" version="2.5">
<display-name>memcached sample</display-name>
<filter>
<filter-name>vraptor</filter-name>
<filter-class>br.com.caelum.vraptor.VRaptor</filter-class>
</filter>
<filter>
<filter-name>memcached</filter-name>
<filter-class>com.franciscosouza.memcached.filter.MemcachedFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>memcached</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
<filter-mapping>
<filter-name>vraptor</filter-name>
<url-pattern>/*</url-pattern>
<dispatcher>FORWARD</dispatcher>
<dispatcher>REQUEST</dispatcher>
</filter-mapping>
</web-app>
8080
and nginx on port 80
, and access http://localhost
on your browser. Try some it: raise up the cache timeout, navigate on application and turn off Tomcat. You will still be able to navigate on some pages that use GET request method (users list, home and users form). por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
% wget http://www.tipfy.org/tipfy.build.tar.gzAfter it, we go to the project folder and see the project structure provided by tipfy. There is a directory called "app", where the App Engine app is located. The app.yaml file is in the app directory, so we open that file and change the application id and the application version. Here is the app.yaml file:
% tar -xvzf tipfy.0.6.2.build.tar.gz
% mv project gaeseries
application: gaeseriesAfter this, we can start to code our application. tipfy deals with requests using handlers. A handler is a class that has methods to deal with different kinds of requests. That remember me a little the Strut Actions (blergh), but tipfy is a Python framework, what means that it is easier to build web application using it!
version: 4
runtime: python
api_version: 1
derived_file_type:
- python_precompiled
handlers:
- url: /(robots\.txt|favicon\.ico)
static_files: static/\1
upload: static/(.*)
- url: /remote_api
script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
login: admin
- url: /_ah/queue/deferred
script: main.py
login: admin
- url: /.*
script: main.py
% mkdir blogAfter create the application structure, we install it by putting the application inside the "apps_installed" list on config.py file:
% touch blog/__init__.py
# -*- coding: utf-8 -*-See the line 22. Inside the application folder, let’s create a Python module called models.py. This module is exactly the same of Flask post:
"""
config
~~~~~~
Configuration settings.
:copyright: 2009 by tipfy.org.
:license: BSD, see LICENSE for more details.
"""
config = {}
# Configurations for the 'tipfy' module.
config['tipfy'] = {
# Enable debugger. It will be loaded only in development.
'middleware': [
'tipfy.ext.debugger.DebuggerMiddleware',
],
# Enable the Hello, World! app example.
'apps_installed': [
'apps.hello_world',
'apps.blog',
],
}
from google.appengine.ext import dbAfter create the model, let’s start building the project by creating the post listing handler. The handlers will be in a module called handlers.py, inside the application folder. Here is the handlers.py code:
class Post(db.Model):
title = db.StringProperty(required = True)
content = db.TextProperty(required = True)
when = db.DateTimeProperty(auto_now_add = True)
author = db.UserProperty(required = True)
# -*- coding: utf-8 -*-See that we get a list containing all posts from the database and send it to the list_posts.html template. Like Flask, tipfy uses Jinja2 as template engine by default. Following the same way, let’s create a base.html file who represents the layout of the project. This file should be inside the templates folder and contains the following code:
from tipfy import RequestHandler
from tipfy.ext.jinja2 import render_response
from models import Post
class PostListingHandler(RequestHandler):
def get(self):
posts = Post.all()
return render_response('list_posts.html', posts=posts)
<html>And now we can create the list_posts.html template extending the base.html template:
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
<title>{% block title %}{% endblock %}</title>
</head>
<body id="">
{% block content %}{% endblock %}
</body>
</html>
{% extends "base.html" %}Can we access the list of posts now by the URL? No, we can’t yet. Now we have to map the handler to a URL, and we will be able to access the list of posts through the browser. On tipfy, all URL mappings of an application are located in a Python module called urls.py. Create it with the following code:
{% block title %}
Posts list
{% endblock %}
{% block content %}
Listing all posts:
<ul>
{% for post in posts %}
<li>
{{ post.title }} (written by {{ post.author.nickname() }})
{{ post.content }}
</li>
{% endfor %}
</ul>
{% endblock %}
from tipfy import RuleIt is very simple: a Python module containing a function called get_rules, that receives the app object as parameter and return a list containing the rules of the application (each rule is an instance of tipfy.Rule class). Now we can finally see the empty post list on the browser, by running the App Engine development server and touching the http://localhost:8080/posts URL on the browser. Run the following command on the project root:
def get_rules(app):
rules = [
Rule('/posts', endpoint='post-listing', handler='apps.blog.handlers.PostListingHandler'),
]
return rules
% /usr/local/google_appengine/dev_appserver.py appAnd check the browser at http://localhost:8080/posts. And we see the empty list. Now, let’s create the protected handler which will create a new post. tipfy has an auth extension, who makes very easy to deal with authentication using the native Google App Engine users API. To use that, we need to configure the session extension, changing the conf.py module, by adding the following code lines:
config['tipfy.ext.session'] = {Now we are ready to create the NewPostHandler. We will need to deal with forms, and tipfy has an extension for integration with WTForms, so we have to download and install WTForms and that extension in the project:
'secret_key' : 'just_dev_testH978DAGV9B9sha_W92S',
}
% wget http://bitbucket.org/simplecodes/wtforms/get/tip.tar.bz2Now we have WTForms extension installed and ready to be used. Let’s create the PostForm class, and then create the handler. I put both classes in the handlers.py file (yeah, including the form). Here is the PostForm class code:
% tar -xvf tip.tar.bz2
% cp -r wtforms/wtforms/ ~/Projetos/gaeseries/app/lib/
% wget http://pypi.python.org/packages/source/t/tipfy.ext.wtforms/tipfy.ext.wtforms-0.6.tar.gz
% tar -xvzf tipfy.ext.wtforms-0.6.tar.gz
% cp -r tipfy.ext.wtforms-0.6/tipfy ~/Projetos/gaeseries/app/distlib
class PostForm(Form):Add this class to the handlers.py module:
csrf_protection = True
title = fields.TextField('Title', validators=[validators.Required()])
content = fields.TextAreaField('Content', validators=[validators.Required()])
class NewPostHandler(RequestHandler, AppEngineAuthMixin, AllSessionMixins):A lot of news here: first, tipfy explores the multi-inheritance Python feature and if you will use the auth extension by the native App Engine users API, you have to create you handler class extending AppEngineAuthMixin and AllSessionMixins classes, and add to the middleware list the SessionMiddleware class. See more at the tipfy docs.
middleware = [SessionMiddleware]
@login_required
def get(self, **kwargs):
return render_response('new_post.html', form=self.form)
@login_required
def post(self, **kwargs):
if self.form.validate():
post = Post(
title = self.form.title.data,
content = self.form.content.data,
author = self.auth_session
)
post.put()
return redirect('/posts')
return self.get(**kwargs)
@cached_property
def form(self):
return PostForm(self.request)
{% extends "base.html" %}Now, we can deploy the application on Google App Engine by simply running this command:
{% block title %}
New post
{% endblock %}
{% block content %}
<form action="" method="post" accept-charset="utf-8">
<p>
<label for="title">{{ form.title.label }}</label>
{{ form.title|safe }}
{% if form.title.errors %}
<ul class="errors">
{% for error in form.title.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</p>
<p>
<label for="content">{{ form.content.label }}</label>
{{ form.content|safe }}
{% if form.content.errors %}
<ul class="errors">
{% for error in form.content.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</p>
<p><input type="submit" value="Save post"/></p>
</form>
{% endblock %}
% /usr/local/google_appengine/appcfg.py update appAnd you can check the deployed application live here: http://4.latest.gaeseries.appspot.com.
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
application: gaeseriesWe just set the application ID, the version and the URL handlers. We will handle all request in main.py file. Late on this post, I will show the main.py module, the script that handles Flask with Google App Engine. Now, let’s create the Flask application, and deal with App Engine later :)
version: 3
runtime: python
api_version: 1
handlers:
- url: .*
script: main.py
% wget http://github.com/mitsuhiko/flask/zipball/0.6On my computer, the project is under ~/Projetos/blog/gaeseries, put all downloaded tools on the root of your application. Now we have everything that we need to start to create our Flask application, so let’s create a Python package called blog, it will be the application directory:
% unzip mitsuhiko-flask-0.6-0-g5cadd9d.zip
% cp -r mitsuhiko-flask-5cadd9d/flask ~/Projetos/blog/gaeseries
% wget http://pypi.python.org/packages/source/W/Werkzeug/Werkzeug-0.6.2.tar.gz
% tar -xvzf Werkzeug-0.6.2.tar.gz
% cp -r Werkzeug-0.6.2/werkzeug ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/J/Jinja2/Jinja2-2.5.tar.gz
% tar -xvzf Jinja2-2.5.tar.gz
% cp -r Jinja2-2.5/jinja2 ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/s/simplejson/simplejson-2.1.1.tar.gz
% tar -xvzf simplejson-2.1.1.tar.gz
% cp -r simplejson-2.1.1/simplejson ~/Projetos/blog/gaeseries/
% mkdir blogInside the __init__.py module, we will create our Flask application and start to code. Here is the __init__.py code:
% touch blog/__init__.py
from flask import FlaskWe imported two modules: settings and views. So we should create the two modules, where we will put the application settings and the views of applications (look that Flask deals in the same way that Django, calling “views” functions that receives a request and returns a response, instead of call it “actions” (like web2py). Just create the files:
import settings
app = Flask('blog')
app.config.from_object('blog.settings')
import views
% touch blog/views.pyHere is the settings.py sample code:
% touch blog/settings.py
DEBUG=TrueNow is the time to define the model Post. We will define our models inside the application directory, in a module called models.py:
SECRET_KEY='dev_key_h8hfne89vm'
CSRF_ENABLED=True
CSRF_SESSION_LKEY='dev_key_h8asSNJ9s9=+'
from google.appengine.ext import dbThe last property is a UserProperty, a “foreign key” to a user. We will use the Google App Engine users API, so the datastore API provides this property to establish a relationship between custom models and the Google account model.
class Post(db.Model):
title = db.StringProperty(required = True)
content = db.TextProperty(required = True)
when = db.DateTimeProperty(auto_now_add = True)
author = db.UserProperty(required = True)
from blog import appOn the last line of the view, we called the function render_template, which renders a template. The first parameter of this function is the template to be rendered, we passed the list_posts.html, so let’s create it using the Jinja2 syntax, inspired by Django templates. Inside the application directory, create a subdirectory called templates and put inside it a HTML file called base.html. That file will be the application layout and here is its code:
from models import Post
from flask import render_template
@app.route('/posts')
def list_posts():
posts = Post.all()
return render_template('list_posts.html', posts=posts)
<html>And now create the list_posts.html template, with the following code:
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
<title>{% block title %}Blog{% endblock %}</title>
</head>
<body>
{% block content %}{% endblock %}
</body>
</html>
{% extends "base.html" %}Now, to test it, we need to run Google App Engine development server on localhost. The app.yaml file defined a main.py script as handler for all requests, so to use Google App Engine local development server, we need to create the main.py file that run our application. Every Flask application is a WSGI application, so we can use an App Engine tool for running WSGI application. In that way, the main.py script is really simple:
{% block content %}
<ul>
{% for post in posts %}
<li>
{{ post.title }} (written by {{ post.author.nickname() }})
{{ post.content }}
</li>
{% endfor %}
</ul>
{% endblock %}
from google.appengine.ext.webapp.util import run_wsgi_appThe script uses the run_wsgi_app function provided by webapp, the built-in Google Python web framework for App Engine. Now, we can run the application in the same way that we ran in the web2py post:
from blog import app
run_wsgi_app(app)
% /usr/local/google_appengine/dev_appserver.py .And if you access the URL http://localhost:8080/posts in your browser, you will see a blank page, just because there is no posts on the database. Now we will create a login protected view to write and save a post on the database. Google App Engine does not provide a decorator for validate when a user is logged, and Flask doesn’t provide it too. So, let’s create a function decorator called login_required and decorate the view new_post with that decorator. I created the decorator inside a decorators.py module and import it inside the views.py module. Here is the decorators.py code:
from functools import wrapsIn the new_post view we will deal with forms. IMO, WTForms is the best way to deal with forms in Flask. There is a Flask extension called Flask-WTF, and we can install it in our application for easy dealing with forms. Here is how can we install WTForms and Flask-WTF:
from google.appengine.api import users
from flask import redirect, request
def login_required(func):
@wraps(func)
def decorated_view(*args, **kwargs):
if not users.get_current_user():
return redirect(users.create_login_url(request.url))
return func(*args, **kwargs)
return decorated_view
% wget http://pypi.python.org/packages/source/W/WTForms/WTForms-0.6.zipNow we have installed WTForms and Flask-WTF, and we can create a new WTForm with two fields: title and content. Remember that the date and author will be filled automatically with the current datetime and current user. Here is the PostForm code (I put it inside the views.py file, but it is possible to put it in a separated forms.py file):
% unzip WTForms-0.6.zip
% cp -r WTForms-0.6/wtforms ~/Projetos/blog/gaeseries/
% wget http://pypi.python.org/packages/source/F/Flask-WTF/Flask-WTF-0.2.3.tar.gz
% tar -xvzf Flask-WTF-0.2.3.tar.gz
% cp -r Flask-WTF-0.2.3/flaskext ~/Projetos/blog/gaeseries/
from flaskext import wtfNow we can create the new_post view:
from flaskext.wtf import validators
class PostForm(wtf.Form):
title = wtf.TextField('Title', validators=[validators.Required()])
content = wtf.TextAreaField('Content', validators=[validators.Required()])
@app.route('/posts/new', methods = ['GET', 'POST'])Now, everything we need is to build the new_post.html template, here is the code for this template:
@login_required
def new_post():
form = PostForm()
if form.validate_on_submit():
post = Post(title = form.title.data,
content = form.content.data,
author = users.get_current_user())
post.put()
flash('Post saved on database.')
return redirect(url_for('list_posts'))
return render_template('new_post.html', form=form)
{% extends "base.html" %}Now everything is working. We can run Google App Engine local development server and access the URL http://localhost:8080/posts/new on the browser, then write a post and save it! Everything is ready to deploy, and the deploy process is the same of web2py, just run on terminal:
{% block content %}
<h1 id="">Write a post</h1>
<form action="{{ url_for('new_post') }}" method="post" accept-charset="utf-8">
{{ form.csrf_token }}
<p>
<label for="title">{{ form.title.label }}</label>
{{ form.title|safe }}
{% if form.title.errors %}
<ul class="errors">
{% for error in form.title.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</p>
<p>
<label for="content">{{ form.content.label }}</label>
{{ form.content|safe }}
{% if form.content.errors %}
<ul class="errors">
{% for error in form.content.errors %}
<li>{{ error }}</li>
{% endfor %}
</ul>
{% endif %}
</p>
<p><input type="submit" value="Save post"/></p>
</form>
{% endblock %}
% /usr/local/google_appengine/appcfg.py update .And now the application is online :) Check this out: http://3.latest.gaeseries.appspot.com (use your Google Account to write posts).
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
application: gaeseriesI changed only the two first lines, everything else was provided by web2py. The web2py project contains a subdirectory called applications where the web2py applications are located. There is an application called welcome used as scaffold to build new applications. So, let’s copy this directory and rename it to blog. Now we can walk in the same way that we walked in the django post: we will use two actions on a controller: one protected by login, where we will save posts, and other public action, where we will list all posts.
version: 2
api_version: 1
runtime: python
handlers:
- url: /(?P<a>.+?)/static/(?P<b>.+)
static_files: applications/\1/static/\2
upload: applications/(.+?)/static/(.+)
secure: optional
expiration: "90d"
- url: /admin-gae/.*
script: $PYTHON_LIB/google/appengine/ext/admin
login: admin
- url: /_ah/queue/default
script: gaehandler.py
login: admin
- url: .*
script: gaehandler.py
secure: optional
skip_files: |
^(.*/)?(
(app\.yaml)|
(app\.yml)|
(index\.yaml)|
(index\.yml)|
(#.*#)|
(.*~)|
(.*\.py[co])|
(.*/RCS/.*)|
(\..*)|
((admin|examples|welcome)\.tar)|
(applications/(admin|examples)/.*)|
(applications/.*?/databases/.*) |
(applications/.*?/errors/.*)|
(applications/.*?/cache/.*)|
(applications/.*?/sessions/.*)|
)$
current_user_id = (auth.user and auth.user.id) or 0This code looks a little strange, but it is very simple: we define a database table called posts with four fields: title (a varchar – default type), content (a text), author (a
db.define_table('posts', db.Field('title'),
db.Field('content', 'text'),
db.Field('author', db.auth_user, default=current_user_id, writable=False),
db.Field('date', 'datetime', default=request.now, writable=False)
)
db.posts.title.requires = IS_NOT_EMPTY()
db.posts.content.requires = IS_NOT_EMPTY()
def index():As you can see, is just a few of code :) Now we need to make the posts/index.html view. The web2py views system allow the developer to use native Python code on templates, what means that the developer/designer has more power and possibilities. Here is the code of the view posts/index.html (it should be inside the views directory):
posts = db().select(db.posts.ALL)
return response.render('posts/index.html', locals())
{{extend 'layout.html'}}And now we can run the Google App Engine server locally by typing the following command inside the project root (I have the Google App Engine SDK extracted on my /usr/local/google_appengine):
<h1 id="">Listing all posts</h1>
<dl>
{{for post in posts:}}
<dt>{{=post.title}} (written by {{=post.author.first_name}})</dt>
<dd>{{=post.content}}</dd>
{{pass}}
</dl>
% /usr/local/google_appengine/dev_appserver.py .If you check the URL http://localhost:8080/blog/posts, then you will see that we have no posts in the database yet, so let’s create the login protected action that saves a post on the database. Here is the action code:
@auth.requires_login()Note that there is a decorator. web2py includes a complete authentication and authorization system, which includes an option for new users registries. So you can access the URL /blog/default/user/register and register yourself to write posts :) Here is the posts/new.html view code, that displays the form:
def new():
form = SQLFORM(db.posts, fields=['title','content'])
if form.accepts(request.vars, session):
response.flash = 'Post saved.'
redirect(URL('blog', 'posts', 'index'))
return response.render('posts/new.html', dict(form=form))
{{extend 'layout.html'}}After it the application is ready to the deploy. The way to do it is running the following command on the project root:
<h1 id="">
Save a new post</h1>
{{=form}}
% /usr/local/google_appengine/appcfg.py update .And see the magic! :) You can check this application live here: http://2.latest.gaeseries.appspot.com/ (you can login with the e-mail demo@demo.com and the password demo, you can also register yourself).
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
These commands will boostrap the environment, setting up a bootstrap machine which will manage your services; deploy mysql and wordpress instances; add a relation between them; and expose the wordpress port. The voilà, we have a wordpress deployed, and ready to serve our posts. Amazing, huh?
% juju bootstrap
% juju deploy mysql
% juju deploy wordpress
% juju add-relation wordpress mysql
% juju expose wordpress
juju
command line tool in almost any OS (including Mac OS), right now you are able do deploy only Ubuntu-based services (you must use an Ubuntu instance or container). yum
. cc_
and implement a `handle` function (for example, a module called "yum_packages" would be written to a file called cc_yum_packages.py
). So, here is the code for the module yum_packages
: The module installs all packages listed in cloud-init yaml file. If we want to install `emacs-nox` package, we would write this yaml file and use it as user data in the instance:
import subprocess
import traceback
from cloudinit import CloudConfig, util
frequency = CloudConfig.per_instance
def yum_install(packages):
cmd = ["yum", "--quiet", "--assumeyes", "install"]
cmd.extend(packages)
subprocess.check_call(cmd)
def handle(_name, cfg, _cloud, log, args):
pkglist = util.get_cfg_option_list_or_str(cfg, "packages", [])
if pkglist:
try:
yum_install(pkglist)
except subprocess.CalledProcessError:
log.warn("Failed to install yum packages: %s" % pkglist)
log.debug(traceback.format_exc())
raise
return True
cloud-init already works on Fedora, with Python 2.7, but to work on CentOS 6, with Python 2.6, it needs a patch:
#cloud-config
modules:
- yum_packages
packages: [emacs-nox]
I've packet up this module and this patch in a RPM package that must be pre-installed in the lxc template and AMI images. Now, we need to change Juju in order to make it use the
--- cloudinit/util.py 2012-05-22 12:18:21.000000000 -0300
+++ cloudinit/util.py 2012-05-31 12:44:24.000000000 -0300
@@ -227,7 +227,7 @@
stderr=subprocess.PIPE, stdin=subprocess.PIPE)
out, err = sp.communicate(input_)
if sp.returncode is not 0:
- raise subprocess.CalledProcessError(sp.returncode, args, (out, err))
+ raise subprocess.CalledProcessError(sp.returncode, args)
return(out, err)
yum_packages
module, and include all RPM packages that we need to install when the machine borns. _collect_packages
, that returns the list of packages that will be installed in the machine after it is spawned; and render
that returns the file itself. Here is our CentOSCloudInit
class (within the patch): The other change we need is in the
diff -u juju-0.5-bzr531.orig/juju/providers/common/cloudinit.py juju-0.5-bzr531/juju/providers/common/cloudinit.py
--- juju-0.5-bzr531.orig/juju/providers/common/cloudinit.py 2012-05-31 15:42:17.480769486 -0300
+++ juju-0.5-bzr531/juju/providers/common/cloudinit.py 2012-05-31 15:55:13.342884919 -0300
@@ -324,3 +324,32 @@
"machine-id": self._machine_id,
"juju-provider-type": self._provider_type,
"juju-zookeeper-hosts": self._join_zookeeper_hosts()}
+
+
+class CentOSCloudInit(CloudInit):
+
+ def _collect_packages(self):
+ packages = [
+ "bzr", "byobu", "tmux", "python-setuptools", "python-twisted",
+ "python-txaws", "python-zookeeper", "python-devel", "juju"]
+ if self._zookeeper:
+ packages.extend([
+ "zookeeper", "libzookeeper", "libzookeeper-devel"])
+ return packages
+
+ def render(self):
+ """Get content for a cloud-init file with appropriate specifications.
+
+ :rtype: str
+
+ :raises: :exc:`juju.errors.CloudInitError` if there isn't enough
+ information to create a useful cloud-init.
+ """
+ self._validate()
+ return format_cloud_init(
+ self._ssh_keys,
+ packages=self._collect_packages(),
+ repositories=self._collect_repositories(),
+ scripts=self._collect_scripts(),
+ data=self._collect_machine_data(),
+ modules=["ssh", "yum_packages", "runcmd"])
format_cloud_init
function, in order to make it recognize the modules
parameter that we used above, and tell cloud-init to not run apt-get
(update nor upgrade). Here is the patch: This patch is also packed up within juju-centos-6 repository, which provides sources for building RPM packages for juju, and also some pre-built RPM packages.
diff -ur juju-0.5-bzr531.orig/juju/providers/common/utils.py juju-0.5-bzr531/juju/providers/common/utils.py
--- juju-0.5-bzr531.orig/juju/providers/common/utils.py 2012-05-31 15:42:17.480769486 -0300
+++ juju-0.5-bzr531/juju/providers/common/utils.py 2012-05-31 15:44:06.605014021 -0300
@@ -85,7 +85,7 @@
def format_cloud_init(
- authorized_keys, packages=(), repositories=None, scripts=None, data=None):
+ authorized_keys, packages=(), repositories=None, scripts=None, data=None, modules=None):
"""Format a user-data cloud-init file.
This will enable package installation, and ssh access, and script
@@ -117,8 +117,8 @@
structure.
"""
cloud_config = {
- "apt-update": True,
- "apt-upgrade": True,
+ "apt-update": False,
+ "apt-upgrade": False,
"ssh_authorized_keys": authorized_keys,
"packages": [],
"output": {"all": "| tee -a /var/log/cloud-init-output.log"}}
@@ -136,6 +136,11 @@
if scripts:
cloud_config["runcmd"] = scripts
+ if modules:
+ cloud_config["modules"] = modules
+
output = safe_dump(cloud_config)
output = "#cloud-config\n%s" % (output)
return output
cloudinit
pre-installed, configure your juju environments.yaml
file to use this image in the environment and you are ready to deploy cloud services on CentOS machines using Juju! ubuntu
to interact with its machines, so you will need to create this user in your CentOS AMI/template.yum
repository (I haven't submitted them to any public repository): por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
package main
import (
"fmt"
"sync"
"time"
)
type Fork struct {
sync.Mutex
}
type Table struct {
philosophers chan Philosopher
forks []*Fork
}
func NewTable(forks int) *Table {
t := new(Table)
t.philosophers = make(chan Philosopher, forks - 1)
t.forks = make([]*Fork, forks)
for i := 0; i < forks; i++ {
t.forks[i] = new(Fork)
}
return t
}
func (t *Table) PushPhilosopher(p Philosopher) {
p.table = t
t.philosophers <- data-blogger-escaped-0="" data-blogger-escaped-1="" data-blogger-escaped-2="" data-blogger-escaped-3="" data-blogger-escaped-4="" data-blogger-escaped-:="range" data-blogger-escaped-_="" data-blogger-escaped-able="" data-blogger-escaped-anscombe="" data-blogger-escaped-artin="" data-blogger-escaped-chan="" data-blogger-escaped-e9="" data-blogger-escaped-eat="" data-blogger-escaped-eating...="" data-blogger-escaped-eter="" data-blogger-escaped-f="" data-blogger-escaped-fed.="" data-blogger-escaped-fed="" data-blogger-escaped-fmt.printf="" data-blogger-escaped-for="" data-blogger-escaped-func="" data-blogger-escaped-getforks="" data-blogger-escaped-go="" data-blogger-escaped-heidegger="" data-blogger-escaped-homas="" data-blogger-escaped-index="" data-blogger-escaped-int="" data-blogger-escaped-is="" data-blogger-escaped-leftfork.lock="" data-blogger-escaped-leftfork.unlock="" data-blogger-escaped-leftfork="" data-blogger-escaped-leibniz="" data-blogger-escaped-len="" data-blogger-escaped-lizabeth="" data-blogger-escaped-lombard="" data-blogger-escaped-main="" data-blogger-escaped-make="" data-blogger-escaped-n="" data-blogger-escaped-nagel="" data-blogger-escaped-name="" data-blogger-escaped-ork="" data-blogger-escaped-ottfried="" data-blogger-escaped-p.eat="" data-blogger-escaped-p.fed="" data-blogger-escaped-p.getforks="" data-blogger-escaped-p.name="" data-blogger-escaped-p.putforks="" data-blogger-escaped-p.table.popphilosopher="" data-blogger-escaped-p.table.pushphilosopher="" data-blogger-escaped-p.table="nil" data-blogger-escaped-p.think="" data-blogger-escaped-p="" data-blogger-escaped-philosopher="" data-blogger-escaped-philosopherindex="" data-blogger-escaped-philosophers="" data-blogger-escaped-popphilosopher="" data-blogger-escaped-pre="" data-blogger-escaped-putforks="" data-blogger-escaped-return="" data-blogger-escaped-rightfork.lock="" data-blogger-escaped-rightfork.unlock="" data-blogger-escaped-rightfork="" data-blogger-escaped-s="" data-blogger-escaped-string="" data-blogger-escaped-struct="" data-blogger-escaped-t.forks="" data-blogger-escaped-t="" data-blogger-escaped-table="" data-blogger-escaped-think="" data-blogger-escaped-thinking...="" data-blogger-escaped-time.sleep="" data-blogger-escaped-type="" data-blogger-escaped-was="">
Any feedback is very welcome.
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
application: gaeseries
version: 1
runtime: python
api_version: 1
default_expiration: '365d'
handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin
- url: /_ah/queue/deferred
  script: djangoappengine/deferred/handler.py
  login: admin
- url: /media/admin
  static_dir: django/contrib/admin/media/
- url: /.*
  script: djangoappengine/main/main.py
I will use one version for each part of the series, so this is the first version because it is the first part =D In settings.py, we just uncomment the django.contrib.auth line inside the INSTALLED_APPS tuple, because we want to use the built-in auth application instead of the Google Accounts API provided by App Engine.
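For reference, the relevant part of settings.py would look roughly like this (a sketch only; the surrounding entries depend on the djangoappengine project template you started from):
INSTALLED_APPS = (
    'djangotoolbox',
    'django.contrib.auth',          # uncommented: use Django's built-in auth
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    # ... other apps from the project template ...
)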
% python manage.py startapp core
It is a famous Django command that creates the application structure: a Python package containing three modules (models, tests and views). Now we have to create the Post model. Here is the code of the models.py file:
from django.db import models
from django.contrib.auth.models import User
class Post(models.Model):
    title = models.CharField(max_length = 200)
    content = models.TextField()
    date = models.DateTimeField(auto_now_add = True)
    user = models.ForeignKey(User)
Now we just need to “install” the core application by putting it in the INSTALLED_APPS tuple in the settings.py file, and Django will be ready to play with BigTable. :)
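Again just as a sketch, installing the app is one more entry in the same tuple:
INSTALLED_APPS = (
    # ... the apps already listed above, including 'django.contrib.auth' ...
    'core',
)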
We will use the django.contrib.auth app, so let's run a manage command to create a superuser:
% python manage.py createsuperuser
After creating the superuser, we need to set up the login and logout URLs and make two templates. So, in the urls.py file, put two mappings to the login and logout views. The file will look like this:
from django.conf.urls.defaults import *
urlpatterns = patterns('',
    ('^$', 'django.views.generic.simple.direct_to_template',
        {'template': 'home.html'}),
    ('^login/$', 'django.contrib.auth.views.login'),
    ('^logout/$', 'django.contrib.auth.views.logout'),
)
Here is the registration/login.html template:
{% extends "base.html" %}
{% block content %}
<p>Fill the form below to login in the system ;)</p>
{% if form.errors %}
<p>Your username and password didn't match. Please try again.</p>
{% endif %}
<form method="post" action="{% url django.contrib.auth.views.login %}">{% csrf_token %}
<table>
<tr>
<td>{{ form.username.label_tag }}</td>
<td>{{ form.username }}</td>
</tr>
<tr>
<td>{{ form.password.label_tag }}</td>
<td>{{ form.password }}</td>
</tr>
</table>
<input type="submit" value="login" />
<input type="hidden" name="next" value="{{ next }}" />
</form>
{% endblock %}
And the registration/logged_out.html template:
{% extends "base.html" %}
{% block content %}
Bye :)
{% endblock %}
Note the two lines added to urls.py (the login and logout mappings). In the settings.py file, add three lines:
LOGIN_URL = '/login/'
LOGOUT_URL = '/logout/'
LOGIN_REDIRECT_URL = '/'
And we are ready to code =) Let's create the login-protected view, where we will write and save a new post. To do that, first we need to create a Django form to deal with the data. There are two fields in this form: title and content; when the form is submitted, the user property is filled with the currently logged-in user and the date property is filled with the current time. So, here is the code of the ModelForm:
from django import forms
from models import Post

class PostForm(forms.ModelForm):
    class Meta:
        model = Post
        exclude = ('user',)

    def save(self, user, commit = True):
        post = super(PostForm, self).save(commit = False)
        post.user = user
        if commit:
            post.save()
        return post
Here is the views.py file, with the two views (one "mocked up", with a simple redirect):
from django.contrib.auth.decorators import login_required
from django.shortcuts import render_to_response
from django.template import RequestContext
from django.http import HttpResponseRedirect
from django.core.urlresolvers import reverse
from forms import PostForm
@login_required
def new_post(request):
    form = PostForm()
    if request.method == 'POST':
        form = PostForm(request.POST)
        if form.is_valid():
            form.save(request.user)
            return HttpResponseRedirect(reverse('core.views.list_posts'))
    return render_to_response('new_post.html',
        locals(), context_instance=RequestContext(request)
    )

def list_posts(request):
    return HttpResponseRedirect('/')
There are only two steps left to finally save posts on BigTable: map URLs for the views above and create the new_post.html template. Here is the mapping code:
    ('^posts/new/$', 'core.views.new_post'),
    ('^posts/$', 'core.views.list_posts'),
And here is the template code:
{% extends "base.html" %}
{% block content %}
<form action="{% url core.views.new_post %}" method="post" accept-charset="utf-8">
{% csrf_token %}
{{ form.as_p }}
<p><input type="submit" value="Post!"/></p>
</form>
{% endblock %}
Now we can run ./manage.py runserver in the terminal, access the URL http://localhost:8000/posts/new in the browser, see the form, fill it in and save the post :D The last step is to list all posts at http://localhost:8000/posts/. The list_posts view is already mapped to the URL /posts/, so we just need to create the code of the view (remember to import Post from models in views.py) and a template to show the list of posts. Here is the view code:
def list_posts(request):
    posts = Post.objects.all()
    return render_to_response('list_posts.html',
        locals(), context_instance=RequestContext(request)
    )
And the list_posts.html template code:
{% extends "base.html" %}
{% block content %}
<dl>
{% for post in posts %}
<dt>{{ post.title }} (written by {{ post.user.username }})</dt>
<dd>{{ post.content }}</dd>
{% endfor %}
</dl>
{% endblock %}
Finished? Not yet :) The application is now ready to deploy. How do we deploy it? With just one command:
% python manage.py deploy
Done! Now, to use everything we have just created on the remote App Engine server, just create a superuser there and enjoy:
% python manage.py remote createsuperuser
You can check this application flying on Google App Engine: http://1.latest.gaeseries.appspot.com (use demo as both username and password on the login page).
por fsouza (noreply@blogger.com) em 10 de December de 2019 às 03:42
In January I wrote a post for the Rust 2019 call for blogs. The 2020 call is aiming for an RFC and roadmap earlier this time, so here is my 2020 post =]
#[wasm_bindgen], but for FFI
This sort of happened... because WebAssembly is growing =]
I was very excited when Interface Types showed up in August, and while it is still very experimental it is moving fast and bringing saner paths for interoperability than raw C FFIs. David Beazley even pointed this out at the end of his PyCon India keynote, talking about how easy it is to get information out of a WebAssembly module compared to what had to be done for SWIG.
This doesn't solve the problem where strict C compatibility is required, or for platforms where a WebAssembly runtime is not available, but I think it is a great solution for scientific software (or, at least, for my use cases =]).
I did some of those this year (bbhash-sys and mqf), and also found some great crates to use in my projects. Rust is picking up steam in bioinformatics, being used as the primary choice for high-quality software (like varlociraptor, or the many tools coming from 10X Genomics), but it is still somewhat hard to find more details (I mostly find them on Twitter, and sometimes through Google Scholar alerts). It would be great to start bringing this info together, which leads to...
Hey, this one happened! Luca Palmieri started a conversation on reddit and the #science-and-ai Discord channel on the Rust community server was born! I think it works pretty well, and Luca has also been doing a great job running workshops and guiding the conversation around rust-ml.
Rust is amazing because it is very good at bringing together concepts and ideas that seem contradictory at first but really shine when synthesized. But can we share this combined wisdom and also improve the situation in other places too? Despite the "Rewrite it in Rust" meme, increased interoperability is something that is already driving many of the best aspects of Rust:
Interoperability with other languages: as I said before, with WebAssembly (and Rust having the best toolchain for it) there is a clear route to achieve this, but it will not replace all the software that already exists and can benefit from FFI and C compatibility. Bringing together developers from the many language-specific binding generators (helix, neon, rustler, PyO3...) and figuring out what's missing from them (or which common parts can be shared) also seems productive.
Interoperability with new and unexplored domains: I think Rust benefits enormously from not focusing on only one domain, and choosing to prioritize CLI, WebAssembly, Networking and Embedded is a good subset to start tackling problems with. But how do we guide other domains to also use Rust, bring in new contributors, and expose missing pieces of the larger picture?
Another point extremely close to interoperability is training. A great way to interoperate with other languages and domains is having good documentation and material for transitioning into Rust without having to figure everything out at once. Rust documentation is already amazing, especially considering the many books published by each working group. But... there is a gap in the transitions, both from understanding the basics of the language to actually using it, and in the progression from beginner to intermediate and expert.
I see good resources for JavaScript and Python developers, but we are still covering a pretty small niche: programmers curious enough to go learn another language, or looking for solutions for problems in their current language.
Can we bring more people into Rust? RustBridge is obviously the reference here, but there is space for much, much more. Using Rust in The Carpentries lessons? Creating a RustOpenSci, mirroring the communities of practice of rOpenSci and pyOpenSci?
In this tutorial, we will go through the process of creating a dict (dictionary) from one or more other dicts in Python.
As is usual with the language, this can be done in several different ways.
To start, let's suppose we have the following dictionaries:
dict_1 = {
'a': 1,
'b': 2,
}
dict_2 = {
'b': 3,
'c': 4,
}
As an example, let's create a new dictionary called new_dict with the values of dict_1 and dict_2 above. A well-known approach is to use the update method.
new_dict = {}
new_dict.update(dict_1)
new_dict.update(dict_2)
With that, new_dict will be:
>> print(new_dict)
{
'a': 1,
'b': 3,
'c': 4,
}
This method works fine, but we have to call update once for every dict we want to merge into new_dict (a small helper can package that repetition, as sketched below). Still, wouldn't it be nice if we could pass all the required dicts right when new_dict is initialized?
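Just as an illustration (merge_dicts is a made-up name, not something from the standard library), such a helper could look like this:
def merge_dicts(*dicts):
    # later dicts win when keys collide, exactly like repeated update() calls
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged

new_dict = merge_dicts(dict_1, dict_2)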
Python 3 introduced a very nice way of doing exactly that, using the ** operator.
new_dict = {
**dict_1,
**dict_2,
}
So, similarly to the previous example, new_dict will be:
>> print(new_dict['a'])
1
>> print(new_dict['b'])
3
>> print(new_dict['c'])
4
When using the initialization approach above, we must take a few things into account: only the first-level values are actually duplicated in the new dictionary. As an example, let's change a key present in both dicts and check whether they still hold the same value:
>> dict_1['a'] = 10
>> new_dict['a'] = 11
>> print(dict_1['a'])
10
>> print(new_dict['a'])
11
However, this changes when one of the values is a list, another dict or some other complex object. For example:
dict_3 = {
'a': 1,
'b': 2,
'c': {
'd': 5,
}
}
and now let's create a new dict from it:
new_dict = {
**dict_3,
}
As in the previous example, we might imagine that a copy of every element of dict_3 was made, but that is not entirely true. What really happened is a shallow copy of the values of dict_3, that is, only the first-level values were duplicated. Look at what happens when we change a value of the dict stored under the key c.
>> new_dict['c']['d'] = 11
>> print(new_dict['c']['d'])
11
>> print(dict_3['c']['d'])
11
# the previous value was 5
In the case of the key c, it holds a reference to another data structure (a dict, in this case). When we change some value in dict_3['c'], the change is reflected in every dict that was initialized from dict_3. In other words, be careful when initializing a dict from other dicts that hold complex values, such as lists, dicts or other objects (the attributes of those objects will not be duplicated).
To work around this inconvenience, we can use the deepcopy method from the standard library module copy. Now, when initializing new_dict:
import copy
dict_3 = {
'a': 1,
'b': 2,
'c': {
'd': 5,
}
}
new_dict = copy.deepcopy(dict_3)
The deepcopy method makes a recursive copy of each element of dict_3, which solves our problem. Here is one more example:
>> new_dict['c']['d'] = 11
>> print(new_dict['c']['d'])
11
>> print(dict_3['c']['d'])
5
# the value was not changed
This article tries to show, in a simple way, how to create dicts using the various features the language offers, as well as the pros and cons of each approach.
For more details and other examples, take a look at this post on the Python Brasil forum here.
That's it, folks. Thanks for reading!
Professor Leonardo Cruz from the Faculdade de Ciências Sociais and I are working together on building the Laboratório Amazônico de Estudos Sociotécnicos at UFPA.
Our proposal is to hold readings and critical debates on the sociology of technology, produce theoretical and empirical research in the Amazon region on the relations between technology and society, and work with free technologies in communities near Belém.
At the moment we have a study group set up, with a schedule of texts and films to work through and debate critically. This group will be the seed for advising undergraduate and graduate students on topics such as the impact of artificial intelligence, computing and warfare, cybernetics, surveillance, platform capitalism, fake news, piracy, free software, and others.
For those interested, our study schedule is available at this link.
And if you use Telegram, you can join the discussion group here.
Any questions, just get in touch!