gena01 · Repos: 15 · Followers: 15 · Following: 7

Pinned repositories:

  • The plugin-driven server agent for collecting & reporting metrics. (12027 stars, 4781 forks)

  • Mesos to Consul bridge for service discovery (340 stars, 92 forks)

  • Marathon-lb is a service discovery & load balancing tool for DC/OS (451 stars, 282 forks)

  • [MIRROR] Alpine packages build scripts (556 stars, 662 forks)

  • Docker images for OpenZipkin (671 stars, 313 forks)

  • Stomp Client Extension (C, 19 stars, 20 forks)

Events

issue comment
bug: http_client sends duplicate content-type header

I was testing with `main`.

Created at 2 hours ago
opened issue
bug: http_client sends duplicate content-type header

Just noticed while testing that when I have the following in my configuration:

output:
  http_client:
    headers:
      Content-Type: application/json

I am seeing this on the receiving end:

2022-09-29T21:35:31-04:00 DBG Got request:
2022-09-29T21:35:31-04:00 DBG POST / HTTP/1.1
2022-09-29T21:35:31-04:00 DBG user-agent: Go-http-client/1.1
2022-09-29T21:35:31-04:00 DBG content-length: 613
2022-09-29T21:35:31-04:00 DBG content-type: application/json
2022-09-29T21:35:31-04:00 DBG content-type: application/json
2022-09-29T21:35:31-04:00 DBG accept-encoding: gzip
2022-09-29T21:35:31-04:00 DBG
Created at 2 hours ago
opened issue
feature: add something like `max_byte_size` option to `batching`

Would it be possible to add something like a `max_byte_size` option to the batching policy/logic, so we could set a hard upper limit on batch size?
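A sketch of where such an option could sit in a batching policy, alongside the existing `byte_size` field (the `max_byte_size` name and semantics are the proposal here, not an existing field, and the endpoint is illustrative):

```yaml
output:
  http_client:
    url: http://localhost:8080/ingest  # illustrative endpoint
    batching:
      byte_size: 400000      # existing: flush once this many bytes have accumulated
      period: 100ms
      max_byte_size: 500000  # proposed: hard cap; never emit a batch larger than this
```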

Created at 2 hours ago
opened issue
bug: batching doesn't split/flush a big batch into smaller pieces

Ran into this while testing some of our scenarios. We push a lot of data of various sizes (up to 100k) through Kafka. The data comes in "chunks" which we split into smaller pieces that are then batched up and sent via http_client. The HTTP endpoint we are sending data to has a maximum payload limit of ~500k. We need to be efficient (fewer round trips), so we try to batch up as much as we can without going over that limit.

Using Benthos makes this a lot more challenging. Looking at this code: https://github.com/benthosdev/benthos/blob/main/internal/component/output/batcher/batcher.go#L109

It looks like it should be flushing multiple times for a single batch, but all it does is set a flag to flush at the end of the transaction. This is problematic for us since we keep hitting our payload limit(s).
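A minimal config sketch of the scenario described, with an assumed endpoint and sizes:

```yaml
output:
  http_client:
    url: http://example.com/ingest  # assumed endpoint with a ~500k payload limit
    batching:
      byte_size: 500000  # per this issue, a single large batch can still exceed this
      period: 100ms
```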

Created at 3 hours ago
delete branch
gena01 delete branch feat-sarama-upgrade
Created at 1 week ago

Fix read_until connect check on terminated input

Simplify trace functions and propagate in http requests

Whoops, remove atomic.Pointer usage (came in 1.19)

Expand linting types to include explicit enums

Auto tests only every two hours

feat: upgrade sarama dependency to 1.36.0

  • Hardcode leaderEpoch to 0 for now since we only use v1 version for OffsetCommitRequest.
Created at 1 week ago

fix: avoid closing underlying file in stdin input

This change resolves an issue where the stdin file was being closed by the stdin input when Benthos reloaded its config in watch mode.

feat: add lint to require labels on components

Add Jaeger tracer default tags

Inspired from this blog article: http://www.inanzzz.com/index.php/post/4qes/implementing-opentelemetry-and-jaeger-tracing-in-golang-http-api

Use http config in stream builder (disabled by default)

Update snowflake_put docs

Looks like we managed to get their attention :) https://twitter.com/felipehoffa/status/1560811785606684672

feat: add file_json_contains

closes #1409

Lints

Merge pull request #1403 from disintegrator/fix-1241

fix: avoid closing underlying file in stdin input

Merge pull request #1404 from disintegrator/lint-require-labels

feat: add lint to require labels on components

Bump release build Go versions

Merge pull request #1410 from mihaitodor/add-jaeger-default-tags

Add Jaeger tracer default tags

Update github.com/segmentio/parquet-go

Add individual public packages for components (#1411)

  • Add public nats package

  • Add public memcached package

  • Add public amqp packages

  • Add public avro package

  • Add public awk package

  • Add public azure package

  • Add public cassandra package

  • Add public confluent package

  • Add public dgraph package

  • Add public elasticsearch package

  • Add public gcp package

  • Add public hdfs package

  • Add public influxdb package

  • Add public jaeger package

  • Add public maxmind package

  • Add public mongodb package

  • Add public mqtt package

  • Add public nanomsg package

  • Add public nsq package

  • Add public otlp package

  • Add public prometheus package

  • Add public pusher package

  • Add public redis package

  • Add public sftp package

  • Add public snowflake package

  • Add public sql package

  • Add public statsd package

  • Add public jsonpath package

Reduce components granularity slightly

Merge pull request #1412 from mihaitodor/update-snowflake-put-docs

Update snowflake_put docs

add endpoint to serve pprof goroutine profile

Merge pull request #1413 from gena01/feat-file_json_contains

feat: add file_json_contains

Merge pull request #1414 from mihaitodor/update-segmentio-parquet-libs

Update github.com/segmentio/parquet-go

Update docs

Merge pull request #1421 from disintegrator/pprof-goroutine-endpoint

add endpoint to serve pprof goroutine profile

Created at 1 week ago
pull request opened
feat: upgrade sarama dependency to 1.36.0
Created at 2 weeks ago
create branch
gena01 create branch feat-sarama-upgrade
Created at 2 weeks ago
delete branch
gena01 delete branch feat-file_json_contains
Created at 2 weeks ago
opened issue
feature: add http status code to the TRACE line of http_client output

It would be nice to know the HTTP status code that was returned when hitting a RESTful (or other) HTTP API that returns codes other than 200, when using the TRACE log_level (which already logs a line per request).

Created at 2 weeks ago

feat: add file_json_contains

closes #1409

Created at 1 month ago
pull request opened
feat: add file_json_contains
Created at 1 month ago
create branch
gena01 create branch feat-file_json_contains
Created at 1 month ago
Created at 1 month ago
opened issue
feature: implement file_json_contains similar to json_contains

It would be nice to be able to externalize the JSON content out of the test file and still get json_contains behavior.

This makes test file(s) cleaner/simpler/shorter. It's also a nice feature when file_json_equals is not quite what I am looking for.
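A sketch of how the proposed predicate might look in a test definition, mirroring the existing `file_json_equals` form (the field name and file path are illustrative, not an existing API):

```yaml
tests:
  - name: check partial payload
    target_processors: /pipeline/processors
    input_batch:
      - json_content: '{"id": 1, "name": "New York", "state": "NY"}'
    output_batches:
      - - file_json_contains: ./expected_partial.json  # proposed: like file_json_equals, but subset matching
```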

Created at 1 month ago
opened issue
bug: key_values() order is nondeterministic

While putting together unit tests, I ran into an issue where key_values() doesn't always return pairs in the same order, which makes unit tests fail randomly.

input:
  label: file
  file:
    paths:
      - ./input1.json
      - ./input2.json

pipeline:
  processors:
    - label: rewrite_message
      bloblang: |
        root = this.locations.map_each(line -> {
          "state": line.state,
          "location": {
            "id": line.id,
            "name": line.name
          }
        })
    - label: split
      unarchive:
        format: json_array

output:
  broker:
    pattern: fan_out
    outputs:
    - stdout:
        codec: lines

    batching:
      byte_size: 1024
      period: 100ms

      processors:
      - label: join
        archive:
          format: json_array

      - label: merge_state_groups
        bloblang: |
          root = {
            "time": now().format_timestamp(tz: "UTC"),
            "data": this.map_each(msg -> {msg.state: [msg.location]}).squash()
              .key_values().map_each(kv -> {"state": kv.key, "locations": kv.value})
          }

And here is the unit test:

tests:
  - name: output processor test
    target_processors: /output/broker/batching/processors
    input_batch:
      - json_content: |
          {"location":{"id":1,"name":"New York"},"state":"NY"}
      - json_content: |
          {"location":{"id":2,"name":"Bellevue"},"state":"WA"}
      - json_content: |
          {"location":{"id":3,"name":"Olympia"},"state":"WA"}
      - json_content: |
          {"location":{"id":4,"name":"Seattle"},"state":"WA"}

    output_batches:
      - - json_contains: |
            {
              "data": [
                {
                  "locations": [
                    {
                      "id": 1,
                      "name": "New York"
                    }
                  ],
                  "state": "NY"
                },
                {
                  "locations": [
                    {
                      "id": 2,
                      "name": "Bellevue"
                    },
                    {
                      "id": 3,
                      "name": "Olympia"
                    },
                    {
                      "id": 4,
                      "name": "Seattle"
                    }
                  ],
                  "state": "WA"
                }
              ]
            }
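One possible workaround, assuming bloblang's `sort_by` method on arrays, is to impose an explicit order after `key_values()` so the test output is deterministic (a sketch, not a confirmed fix):

```
# Sorting the key/value pairs by key makes the output order deterministic.
root.data = this.map_each(msg -> {msg.state: [msg.location]}).squash()
  .key_values().sort_by(kv -> kv.key)
  .map_each(kv -> {"state": kv.key, "locations": kv.value})
```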
Created at 1 month ago
opened issue
feature request: allow `benthos test` to selectively run one or more tests from a yaml file

Right now, when I have a number of tests in a YAML file, I can't run an individual test (or a subset of tests) during a `benthos test` run.

Go and a lot of other tools provide a selection mechanism to pick a subset of tests by name, prefix, or wildcard.

Created at 1 month ago
opened issue
Feature Request: Add `Retry-After` header support to http_client

Looks like the Retry-After header is not yet honored by http_client. It's a common mechanism for letting a client know when to retry after it has hit a service's rate limit.

P.S. Looks like input_http_server generates this header as well: https://github.com/benthosdev/benthos/blob/80327853267a033fffed629217af5fa166bc84af/internal/impl/io/input_http_server.go#L369

Created at 2 months ago