andrewchambers
Repos: 224 · Followers: 278 · Following: 96

Ordered process (re)start, shutdown, and supervision. (150 stars, 4 forks)

A Terraform provider that manages Nix builds and NixOS machines. (105 stars, 5 forks)

A no-frills delta debugger written in Myrddin. (29 stars, 0 forks)

Easy and efficient encrypted backups. (792 stars, 24 forks)

A high-availability distributed filesystem built on FoundationDB and FUSE. (11 stars, 0 forks)

A horizontally scaling object store based on the CRUSH placement algorithm. (10 stars, 0 forks)

Events

issue comment
Possible corruption detected

Interesting that this was OpenBSD as well; maybe I can do my fuzzing in a VM there.

Created at 4 days ago
issue comment
Possible corruption detected

The permission error patch is a potential cause; I will make a new release a priority so people don't need to rely on custom patches.

There is also a chance something else specific to OpenBSD is going on. I will add more fuzzing to try to reproduce it on my end.

Created at 4 days ago

Add pipelined request parallelism to the Python API.

This change makes use of grpc futures to make many concurrent requests to the server in parallel.

The streamset api automatically makes use of this feature when it can, as well as new streamset.insert, streamset.delete and streamset.obliterate functions.

This change also introduces conn.batch_query and conn.batch_create to allow users to create and query the existence of many streams in parallel.

Overall these changes allow 100x-300x faster single-threaded stream creation, while also improving parallel queries.
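The pipelining pattern described above can be sketched as follows. This is a minimal illustration, not the actual API: the `insert` function is a stand-in for a unary gRPC call (a real stub would expose something like `stub.Insert.future(request)` returning a `grpc.Future`), and the speedup comes from issuing every request before waiting on any result, so round-trip latencies overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def insert(request):
    # Stand-in for a unary gRPC call; simulates a network round trip.
    time.sleep(0.01)
    return ("ok", request)

# Pipelined: submit every request up front, then collect the results.
requests = list(range(100))
with ThreadPoolExecutor(max_workers=32) as pool:
    futures = [pool.submit(insert, r) for r in requests]
    results = [f.result() for f in futures]
```

With one outstanding request at a time the 100 calls would take roughly 100 round trips; pipelined, the total is closer to the latency of the slowest batch.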

Created at 5 days ago
issue comment
Possible corruption detected

Thank you so much for the report - your setup sounds fine.

A few questions:

  • Are you able to recover the file using a full 'get' of the snapshot?
  • Are you able to reproduce the error reliably?
  • Did you install bupstash from an OpenBSD package repository? I am curious whether they applied any patches.
Created at 5 days ago

Add more batching functions.

Created at 5 days ago
issue comment
Redirect stderr to stdout?

I would also try swapping the stderr and stdout - I think it's slightly counterintuitive; it's essentially equivalent to a call to dup2, IIRC.
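As a rough illustration of that equivalence (this is the general mechanism, not the code from the issue): redirecting stderr to stdout is `dup2(stdout_fd, stderr_fd)`, which feels backwards because the *target* of the redirection is the first argument. Python's subprocess module performs exactly that dup2 in the child:

```python
import subprocess
import sys

# Run a child that writes to stderr, with stderr merged into stdout.
# stderr=subprocess.STDOUT makes the child call dup2(stdout, stderr),
# so the stderr text arrives on the stdout pipe.
child = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stderr.write('sent to stderr\\n')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
merged = child.stdout.decode()
```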

Created at 5 days ago
issue comment
Redirect stderr to stdout?

I thought this should work; it seems like a potential bug if it doesn't.

Created at 5 days ago

Change from batch to pipelined operation.

Created at 6 days ago
issue comment
CPU usage of workers jumps to 100 percent after input source finishes

I'm not an expert in timely dataflow, but I noticed that once EOF is hit, 'epoch_started' is never updated again, which means 'advance' will always be true:

https://github.com/bytewax/bytewax/blob/9e13300aff81857a1131c8fa6e504b4949e1aeda/src/execution/epoch/periodic_epoch.rs#L144
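The suspected spin can be sketched in a few lines (an illustrative model under my reading of that code, not bytewax's actual implementation): once the source is at EOF, `epoch_started` is never refreshed, so the elapsed-time check succeeds on every pass and the loop degenerates into a busy wait.

```python
import time

epoch_interval = 0.01
epoch_started = time.monotonic()
at_eof = True  # the input source has finished
iterations = 0

# Run the loop for a short, fixed window and count how often it spins.
deadline = time.monotonic() + 0.05
while time.monotonic() < deadline:
    advance = time.monotonic() - epoch_started > epoch_interval
    if advance and not at_eof:
        # Only reached before EOF - the reset that would throttle the loop.
        epoch_started = time.monotonic()
    iterations += 1
```

With `at_eof` set, `iterations` climbs as fast as the interpreter allows, which matches the 100% CPU readings below.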

Created at 6 days ago
issue comment
CPU usage of workers jumps to 100 percent after input source finishes

Reproduction:

import time
from bytewax.dataflow import Dataflow
from bytewax.inputs import ManualInputConfig
from bytewax.outputs import StdOutputConfig
from bytewax.execution import spawn_cluster


def input_builder(worker_index, worker_count, resume_state):
    # Only worker 0 produces input; the other workers return
    # immediately, and those finished workers are the ones that
    # spin at 100 percent CPU.
    if worker_index != 0:
        return
    i = 0
    while True:
        time.sleep(0.001)
        i += 1
        yield None, i


flow = Dataflow()
flow.input("input", ManualInputConfig(input_builder))
flow.capture(StdOutputConfig())

if __name__ == '__main__':
    spawn_cluster(
        flow,
        proc_count=4,
    )
Observed 'top' output, showing the finished workers pinned at 100% CPU:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 695475 ac        20   0  654192  24928  14208 S 100.0   0.2   0:57.42 python
 695476 ac        20   0  654192  24936  14212 S 100.0   0.2   0:57.42 python
 695477 ac        20   0  654192  25064  14152 S 100.0   0.2   0:57.45 python
Created at 6 days ago
opened issue
CPU usage of workers jumps to 100 percent after input source finishes

Bug Description

When using spawn_cluster, if you create more workers than 'distribute' is able to allocate work to, the workers that finish early immediately jump to 100 percent CPU usage until the job finishes.

Python version (python -V)

3.10

Bytewax version (pip list | grep bytewax)

0.15.1

Relevant log output

No response

Steps to Reproduce

  • Spawn a cluster with many more workers than work available to them.
  • Have some of your manual input workers exit early.
  • The CPU usage of the workers that finished will spike to 100 percent.
Created at 1 week ago
issue comment
Try to make idle workers not spin

Curious if there has been any progress on this - it makes it a bit hard to judge the true CPU usage of some test dataflows I am benchmarking.

Created at 1 week ago

Add flush to prototype.

Created at 1 week ago

Temporarily add benchmark script.

Created at 1 week ago
andrewchambers create branch batching_api
Created at 1 week ago
issue comment
Add a Cross.toml file and a note on using it

So to clarify - this does the build in Docker?

Created at 1 week ago
issue comment
Prebuilt binaries

Really quite embarrassed at how long this has been taking me - just lots of stuff to do, really.

Created at 1 week ago
create tag
andrewchambers create tag v1.0.5
Created at 2 weeks ago