yeya24
Repos
153
Followers
222
Following
297

The Prometheus monitoring system and time series database.

45634
7377

Production-Grade Container Scheduling and Management

94048
32966

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.

11207
1680

A Chaos Engineering Platform for Kubernetes

1
0

Events

issue comment
AM: crash on handleOverSizedMessages

This should already be fixed upstream in AM by https://github.com/prometheus/alertmanager/pull/2543. Cortex also includes this fix since we are on AM v0.24. If you are still on the old Cortex version v1.8.0, please upgrade to a newer version and try again.

Created at 1 hour ago
issue comment
Ingester: Added configuration to set the size of the tsdb in-memory queue used before flushing chunks to the disk

Nice! @t00350320 Could you please sign the DCO and fix the lint?

Created at 2 hours ago
issue comment
Expose postings cardinality stats from prometheus tsdb.Head

I love this feature and I think it could be very useful. Are you proposing to expose the TSDB stats API https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats, or another API to expose cardinality stats?

Created at 2 hours ago
issue comment
Cortex store gateway keeps going into crash/failure during startup

If there are not enough logs from the pods, could you please increase the log level to debug and try again?

Created at 2 hours ago
issue comment
query failed after migrating data from thanos to cortex

@yk-zheng-zz Could you please send the goroutine dumps for the runtime panic error? The error log you showed seems to be a different error, so having the dump is the easiest way for us to pinpoint the issue.

Created at 2 hours ago
issue comment
Implement lazy retrieval of series from object store.

Let's fix the conflict, and then I think this PR is good to go.

Created at 1 day ago

update section

Signed-off-by: Ben Ye benye@amazon.com

Created at 1 day ago
issue comment
Support out of order samples ingestion

Can out-of-order samples create bad results if a query is cached? After a few tests, we need a page to clarify expectations for users.

This is a follow-up in the query frontend to allow users to specify a non-cacheable time window.

Created at 1 day ago
issue comment
Make max outstanding queries per tenant config in limits

And since the query vertical sharding size is also per tenant, the queue size needs to be adjusted per tenant if we enable sharding for some tenants only.

Created at 4 days ago
pull request opened
Make max outstanding queries per tenant config in limits

Signed-off-by: Ben Ye benye@amazon.com

What this PR does:

Which issue(s) this PR fixes: Fixes #

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
Created at 4 days ago

make max outstanding req per tenant config

Signed-off-by: Ben Ye benye@amazon.com

Created at 4 days ago
create branch
yeya24 create branch queue-size-per-tenant
Created at 4 days ago
issue comment
OTLP native support

Based on the Prometheus dev summit notes https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit, Prometheus will support this natively soon and we can just reuse the handler. I recommend that we wait for upstream rather than reinvent the wheel ourselves.

Goutham: Reconsider OTLP Ingest
The OpenTelemetry format is gaining adoption and I think we can make a lot of things simpler by adding an ingest layer similar to remote_write ingest to Prometheus.
Safeguards discussion
G: Behind flag not feature flag long term
Chris: gRPC is needed?
G: Yes, but not at first

CONSENSUS: We will implement this behind a feature flag according to the upstream OpenTelemetry Spec on OTLP → Prometheus datamodel mapping. We will have all appropriate safeguards in place before making it a first-class flag.
Created at 4 days ago

Update Thanos to get latest querysharding fix (#4963)

  • update Thanos to get latest querysharding fix

Signed-off-by: Ben Ye benye@amazon.com

  • update modules

Signed-off-by: Ben Ye benye@amazon.com

Signed-off-by: Ben Ye benye@amazon.com

update thanos to bring sharding support for label manipulation functions (#4966)

Signed-off-by: Ben Ye benye@amazon.com

Signed-off-by: Ben Ye benye@amazon.com

Start 1.14 Release (#4968)

  • prepare 1.14

Signed-off-by: Alan Protasio approtas@amazon.com

  • release date

Signed-off-by: Alan Protasio approtas@amazon.com

Signed-off-by: Alan Protasio approtas@amazon.com

Update README with KubeCon talk video and slides (#4973)

Signed-off-by: Alvin Lin alvinlin@amazon.com

Signed-off-by: Alvin Lin alvinlin@amazon.com

fix response error to be ungzipped when status code is not 2xx (#4975)

  • fix response to be gzipped when status code is not 2xx

Signed-off-by: Ben Ye benye@amazon.com

  • adding tests

Signed-off-by: Alan Protasio approtas@amazon.com

  • lint

Signed-off-by: Alan Protasio approtas@amazon.com

  • changelog

Signed-off-by: Alan Protasio approtas@amazon.com

Signed-off-by: Ben Ye benye@amazon.com Signed-off-by: Alan Protasio approtas@amazon.com Co-authored-by: Alan Protasio approtas@amazon.com

chunks can be migrated on v1.13.1 (#4983)

Signed-off-by: Friedrich Gonzalez friedrichg@gmail.com

Signed-off-by: Friedrich Gonzalez friedrichg@gmail.com

Update architecture diagram and doc (#4977)

  • Update architecture diagram and doc

Signed-off-by: Alvin Lin alvinlin@amazon.com

Add clomonitor badge (#4985)

Badges are cool :)

Signed-off-by: Alvin Lin alvinlin@amazon.com

Signed-off-by: Alvin Lin alvinlin@amazon.com

Support Prometheus /api/v1/status/buildinfo API on Querier/QFE (#4978)

  • support Prometheus /api/v1/status/buildinfo API

Signed-off-by: Ben Ye benye@amazon.com

  • add unit test

Signed-off-by: Ben Ye benye@amazon.com

  • update api doc

Signed-off-by: Ben Ye benye@amazon.com

  • revert docs update

Signed-off-by: Ben Ye benye@amazon.com

Signed-off-by: Ben Ye benye@amazon.com

Make querier_sharding_test less brittle (#4970)

  • Make querier_sharding_test less brittle

Signed-off-by: Alvin Lin alvinlin@amazon.com

  • Make querier_sharding_test less brittle

Signed-off-by: Alvin Lin alvinlin@amazon.com

  • rate limit query per second to avoid 429

Signed-off-by: Alvin Lin alvinlin@amazon.com

  • rate limit query per second to avoid 429

Signed-off-by: Alvin Lin alvinlin@amazon.com

  • Better var name la

Signed-off-by: Alvin Lin alvinlin@amazon.com

Signed-off-by: Alvin Lin alvinlin@amazon.com

Upgrade build image to use go 1.19.3 (#4987)

Signed-off-by: Friedrich Gonzalez friedrichg@gmail.com

Signed-off-by: Friedrich Gonzalez friedrichg@gmail.com

Build Cortex with go 1.19.3 (#4988)

Signed-off-by: Friedrich Gonzalez friedrichg@gmail.com

Signed-off-by: Friedrich Gonzalez friedrichg@gmail.com

Add active series to all_user_stats page (#4972)

Signed-off-by: songjiayang songjiayang1@gmail.com

Signed-off-by: songjiayang songjiayang1@gmail.com

support out of order samples ingestion feature

Signed-off-by: Ben Ye benye@amazon.com

update changelog

Signed-off-by: Ben Ye benye@amazon.com

address comments

Signed-off-by: Ben Ye benye@amazon.com

add back removed test

Signed-off-by: Ben Ye benye@amazon.com

update changelog

Signed-off-by: Ben Ye benye@amazon.com

Created at 4 days ago

update API doc

Signed-off-by: Ben Ye benye@amazon.com

Created at 4 days ago

update API doc

Signed-off-by: Ben Ye benye@amazon.com

Created at 4 days ago

add back removed test

Signed-off-by: Ben Ye benye@amazon.com

Created at 4 days ago
issue comment
Support out of order samples ingestion

PTAL. @alvinlin123 @songjiayang @alanprot

Created at 5 days ago