joe-elliott · 51 repos · 184 followers · 1 following

Grafana Tempo is a high volume, minimal dependency distributed tracing backend. (2561 stars, 289 forks)

CNCF Jaeger, a Distributed Tracing Platform (16781 stars, 1973 forks)

Debugging techniques for .NET Core on Linux/Kubernetes (121 stars, 7 forks)

A Prometheus exporter that publishes cert expirations on disk and in Kubernetes secrets (245 stars, 63 forks)

Logs updates to Kubernetes Objects for storing and querying with Loki (102 stars, 16 forks)

Events

Drop thanos-io/thanos and upgrade a few modules (#1857)

  • Drop thanos-io/thanos and upgrade a few modules

Signed-off-by: Zach Leslie zach.leslie@grafana.com

  • Update changelog

Signed-off-by: Zach Leslie zach.leslie@grafana.com

  • Update serverless vendor

Signed-off-by: Zach Leslie zach.leslie@grafana.com

  • Update changelog

Signed-off-by: Zach Leslie zach.leslie@grafana.com

Signed-off-by: Zach Leslie zach.leslie@grafana.com

Use snake case on Azure Storage config (#1883)

  • use snake case on azure storage

Signed-off-by: Fausto David Suarez Rosario faustodavid@hotmail.com

  • run jsonnetfmt

Signed-off-by: Fausto David Suarez Rosario faustodavid@hotmail.com

  • update CHANGELOG.md

Signed-off-by: Fausto David Suarez Rosario faustodavid@hotmail.com

    • add example on update CHANGELOG.md
  • update azure docker-compose example

Signed-off-by: Fausto David Suarez Rosario faustodavid@hotmail.com

  • compile jsonnet

Signed-off-by: Fausto David Suarez Rosario faustodavid@hotmail.com

Signed-off-by: Fausto David Suarez Rosario faustodavid@hotmail.com

Setup stale github action workflow (#1865)

  • Setup stale github action workflow

  • reword message

[docs] document the use of TLS (#1881)

  • [docs] document the use of TLS

Signed-off-by: Zach Leslie zach.leslie@grafana.com

  • Update docs/tempo/website/configuration/tls.md

Co-authored-by: Kim Nylander 104772500+knylander-grafana@users.noreply.github.com

  • Update docs/tempo/website/configuration/tls.md

Co-authored-by: Kim Nylander 104772500+knylander-grafana@users.noreply.github.com

  • Update docs/tempo/website/configuration/tls.md

Co-authored-by: Kim Nylander 104772500+knylander-grafana@users.noreply.github.com

  • Update docs/tempo/website/configuration/tls.md

Co-authored-by: Kim Nylander 104772500+knylander-grafana@users.noreply.github.com

  • Update docs/tempo/website/configuration/tls.md

Co-authored-by: Kim Nylander 104772500+knylander-grafana@users.noreply.github.com

  • Update docs/tempo/website/configuration/tls.md

Co-authored-by: Kim Nylander 104772500+knylander-grafana@users.noreply.github.com

  • Update docs/tempo/website/configuration/tls.md

Co-authored-by: Kim Nylander 104772500+knylander-grafana@users.noreply.github.com

  • Update tls page weight

Signed-off-by: Zach Leslie zach.leslie@grafana.com

Signed-off-by: Zach Leslie zach.leslie@grafana.com Co-authored-by: Kim Nylander 104772500+knylander-grafana@users.noreply.github.com

[DOC] Add ingestors block to tanka doc (#1847)

  • Add ingestors block to tanka doc

  • Update docs/tempo/website/setup/tanka.md

  • Changed to be new config block

  • Fix section heading

  • Apply suggestions from code review from Dan Stadler

  • Update the formatting so the optional block is correct.

Signed-off-by: Heds Simons hedley.simons@grafana.com

  • Genericise optional altered resource configuration notes.

Signed-off-by: Heds Simons hedley.simons@grafana.com

Signed-off-by: Heds Simons hedley.simons@grafana.com Co-authored-by: Heds Simons hedley.simons@grafana.com

Add Compiling jsonnet section to CONTRIBUTING.md (#1890)

Signed-off-by: Fausto David Suarez Rosario faustodavid@hotmail.com

Signed-off-by: Fausto David Suarez Rosario faustodavid@hotmail.com

Increase operations-per-run for stale workflow (#1891)

Add Generic Forwarding documentation (#1872)

  • Add generic forwarder documentation

  • Update generic_forwarding.md based on comments

metrics-generator: truncate label names and values exceeding a configurable length (#1897)

Stop Distributor on fatal error (#1887)

  • Stop Distributor on fatal error

  • occasionally otel receivers will report a fatal error, and expected behavior is that the Host is stopped

  • match this behavior by stopping the receiver shim service and letting the distributor stop itself

  • Basic service supports failing out mid run

  • cant use idle service to do this i think

  • also add changelog entry

  • fix my bad copypaste

traceql: fix how timestamps are converted from search request (#1902)

  • traceql: fix how timestamps are converted from search request

  • Use go time constant

  • Check in import

Remove usage of min/max id in trace by id search (#1904)

  • skip max/minid check

Signed-off-by: Joe Elliott number101010@gmail.com

  • remove from trace by id lookup

Signed-off-by: Joe Elliott number101010@gmail.com

  • Added link to issue

Signed-off-by: Joe Elliott number101010@gmail.com

  • lint

Signed-off-by: Joe Elliott number101010@gmail.com

Signed-off-by: Joe Elliott number101010@gmail.com

Add tenant's dashboard (#1901)

  • Add tenant's dashboard

  • Regenerate dashboard

  • Fixes

  • Add entrylog changelog

[DOCS] Add intro to traces (#1859)

  • Add intro to traces

  • Fixed spelling

  • Apply suggestions from code review from Mario

Co-authored-by: Mario mariorvinas@gmail.com

  • Added info about RED

  • Added info about RED link

  • Apply suggestions from code review

Co-authored-by: Matt Dodson 47385188+MattDodsonEnglish@users.noreply.github.com

  • Update docs/tempo/website/getting-started/traces.md

  • Updated definitions for spans

  • Update tempo-in-grafana.md

  • Apply suggestions from code review from Ursula

Co-authored-by: Ursula Kallio ursula.kallio@grafana.com Co-authored-by: Heds Simons hedss@users.noreply.github.com

  • Change labels to trace ID

  • Update docs/tempo/website/getting-started/traces.md

Co-authored-by: Mario mariorvinas@gmail.com Co-authored-by: Matt Dodson 47385188+MattDodsonEnglish@users.noreply.github.com Co-authored-by: Ursula Kallio ursula.kallio@grafana.com Co-authored-by: Heds Simons hedss@users.noreply.github.com

tempo-mixin: don't pull in entire jsonnet-libs repository as dependency (#1909)

Bump actions/add-to-project from 0.3.0 to 0.4.0 (#1912)

Bumps actions/add-to-project from 0.3.0 to 0.4.0.


updated-dependencies:

  • dependency-name: actions/add-to-project dependency-type: direct:production update-type: version-update:semver-minor ...

Signed-off-by: dependabot[bot] support@github.com

Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

tempo-mixin: tweak dashboards to support metrics without cluster label present (#1913)

  • tempo-mixin: tweak dashboards to support metrics without cluster label present

  • Update CHANGELOG.md

Parquet WAL (#1878)

  • vParquet WALBlock

Signed-off-by: Joe Elliott number101010@gmail.com

  • tests

Signed-off-by: Joe Elliott number101010@gmail.com

  • first pass fetch

Signed-off-by: Joe Elliott number101010@gmail.com

  • todo

Signed-off-by: Joe Elliott number101010@gmail.com

  • Fix flaky wal tests, add benchmark just for write path

  • Reuse buffer/writer for efficiency, fix timestamp replay

  • Make wal block version configurable, update many tests

  • ingester flush wal after cutting traces

  • Fix parquet slack time handling by writing meta.json along with data files on flush

  • Don't use parquet.GenericBuffer

  • Pool trace buffers

  • Add more wal benchmarks

  • Store map of trace ids to flushed page

  • Read fewer columns on wal replay

  • Add completeblock benchmark for all encoding combinations

  • Use internal iterator when completing parquet wal -> parquet complete block

  • Fix tests and lint

  • Rename/cleanup/fix benchmark config

  • Wire up searching wal blocks, and don't write flatbuffer data for wal blocks unless needed

  • Fix flaky test

  • Test all combinations of wal->complete versions, fix bug in v2 create path that was unused until now

  • Use pooling in parquet wal->complete process

  • Global trace pooling instead of per-iterator

  • Disable pooling again

  • Honor ingestionslack, lint

  • changelog

  • review feedback

  • Fix test

  • Ingester flush traces in order by ID, so we can skip buffering in the wal

  • comment

  • comment

  • Parquet wal maintain index of traces to rownumber to allow out-of-order writes (although in-order performs better)

  • update mod

  • cleanup

  • Add partial replay test for parquet wal block

  • lint

Signed-off-by: Joe Elliott number101010@gmail.com Co-authored-by: Joe Elliott number101010@gmail.com

Fix broken images (#1915)

Added leveled pool

Signed-off-by: Joe Elliott number101010@gmail.com

Created at 14 hours ago

only create the done channel if necessary

Signed-off-by: Joe Elliott number101010@gmail.com

Created at 16 hours ago
issue comment
TraceView: Represent OTLP Span Intrinsics correctly

The issue is primarily that those attributes don't really exist but the Grafana traceview is displaying them as if they do.

If you search Tempo for { .error = true } it won't find that span b/c there isn't actually an attribute error with the value true. To find that span you have to search for { status = error }. We need to remove those attributes in order to faithfully represent the data and not confuse our users.

Created at 16 hours ago
issue comment
Adding GOMEMLIMIT to compactor libsonnet

I haven't had a chance to reevaluate the impact of ballast in a modern go application. I do think that GOMEMLIMIT and ballast are designed to serve different purposes.

Ballast is designed to signal to the GC that you are expecting to create an application that consumes a large amount of memory. It prevents the GC from overworking in the startup phase of your application.

GOMEMLIMIT is meant to be a hard limit at which you are ok with expending a large amount of CPU (to the detriment of the application's main focus) on GC. This works quite well for compactors. We'd much rather slow down a compactor than have it OOM and throw away the last few minutes of work.

For other components it's not so obvious what we'd prefer. If a querier takes on too much work, is it better to OOM so the queries get retried, or to answer them far more slowly? We've experimented with GOMEMLIMIT on ingesters and, as the allocated memory approaches GOMEMLIMIT, the ingester starts failing a larger and larger percentage of its requests.

If you do some experimentation with these parameters please report back! The goal is to use these tools to create a Tempo that is easy to operate and fulfills the operational expectations of its users.

Created at 17 hours ago
issue comment
Getting 500 Internal server error

Please check your querier logs. They should indicate the block GUIDs that they are having trouble with. You can then remove those blocks specifically from object storage.

Created at 17 hours ago
issue comment
TraceView: Showing and understanding large traces

Jaeger has added a flamegraph view so perhaps we could look there for inspiration. This particular view would likely serve primarily as a way to quickly get summary information from an enormous trace. I'm also not sure how it could help with navigation or search.

Agree with all of the thoughts regarding Critical Path and Dynamic Span filtering.

Created at 19 hours ago

Use DefaultMaxRowsPerRowGroup when numRows is invalid (#404)

Co-authored-by: gdanichev GDanichev@artenecy.ru

Created at 20 hours ago

Use xk6-client-tracing in examples to generate traces (#1920)

  • Use xk6-client-tracing in examples to generate traces

Replace the synthetic-load-generator with xk6-client-tracing in docker-compose examples

The grafana7.4 example still uses the load generator because it requires the generated trace IDs to be logged for querying

  • Move load-generator.json to grafana7.4 example

The file is no longer used by multiple examples

  • Pin xk6-client-tracing version to v0.0.2

  • Mention fix for #902 in change log

  • Upgrade to Grafana 9.3.0 in docker-compose examples

Created at 22 hours ago
closed issue
Docker Compose examples do not work on Apple M1 Hardware

Describe the bug

When running the Docker Compose examples on Apple's M1 ARM hardware the synthetic load generator container starts but hangs (I'll update this with the error message seen in the logs later).

Looking at the GitHub repository for the image, it appears that it has been archived and is not actively maintained any more, so there'll be no plans to support ARM-based processors.

To Reproduce Steps to reproduce the behavior:

  1. Start one of the Docker Compose examples that uses the synthetic-load-generator image on an Apple M1 machine

Expected behavior Docker Compose starts as expected and load is generated

Environment: Laptop

Additional Context

Created at 22 hours ago
pull request closed
Use xk6-client-tracing in examples to generate traces

What this PR does:

Replace the synthetic-load-generator with xk6-client-tracing in docker-compose examples.

The grafana7.4 example still uses the load generator because it requires the generated trace IDs to be logged for querying.

Upgrade Grafana used in the examples to the latest version.

Which issue(s) this PR fixes:

Service graphs are now working in all docker-compose examples that use xk6-client-tracing.

Fixes #902

Note It would be great if a reviewer could test the updated examples with an M1 MacBook to make sure #902 is solved

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
Created at 22 hours ago
issue comment
Use xk6-client-tracing in examples to generate traces

Nice! Now I can get a macbook pro ;)

Created at 22 hours ago
opened issue
Adding value to large traces

The problem: The value of viewing individual traces in Grafana goes down the larger they are. Even a trace in the hundreds of spans can start to exceed the size that can easily be parsed looking at the current traceview. Loki/Tempo/Mimir can easily make traces in the 10s or 100s of thousands of spans.

A number of ideas have been discussed, but there hasn't been progress in this area. This issue has been created to catalog some of the ideas in the hopes that we can add some value here in the future.

Critical Path Highlighting Often a user of tracing is most interested in the paths that "push" the duration of the trace. By either highlighting, filtering down to, or otherwise indicating the longest paths in the trace we can help a user immediately view the most important parts of a trace.

Dynamic Span Filtering There is modest search functionality available in the current trace view. Perhaps this feature could be used to express conditions by which to filter the traceview. Common ideas might be to only show spans from particular services, namespaces, spans in error or perhaps only the critical path.

Flamegraph visualization If we consider a trace similar to a stack trace, we could render a large trace as a flamegraph. This would provide valuable summary information about the overall shape of the request and where time was spent without individually rendering thousands upon thousands of spans.

cc @slim-bean @cyriltovena

Created at 1 day ago
opened issue
TraceView: Represent OTLP Span Intrinsics correctly

What happened: Currently when rendering traces from Tempo, Grafana pushes certain "intrinsic" properties on a span into attributes (or ignores them entirely). There are quite a few intrinsic properties to a span that should be considered when doing any work on the traceview, but this issue will only specifically call out two of them:

Status

Currently when rendering a span with a status of error, Grafana creates an "error" attribute with the value true. This should instead be displayed somehow as part of the span.

Kind

Currently when rendering a span with the kind set Grafana creates a "span.kind" attribute with a string value. This should also be displayed as an intrinsic element of the span instead of as an attribute.

What you expected to happen: These properties are intrinsic to the span and should not be rendered as attributes.

How to reproduce it (as minimally and precisely as possible): Render a trace with any of the above properties.

Additional Context: Tempo uses the OTLP object model natively and currently the Grafana UI doesn't represent that faithfully. This will be increasingly important as TraceQL rolls out. The language distinguishes between dynamic attributes and intrinsic properties of the span and we would like the Grafana view to be consistent with the object model and query language.

Created at 1 day ago

Add links to fully populated trace (#1923)

  • add links to fully populated slice

Signed-off-by: Joe Elliott number101010@gmail.com

  • lint

Signed-off-by: Joe Elliott number101010@gmail.com

Signed-off-by: Joe Elliott number101010@gmail.com

Created at 1 day ago
pull request closed
Add links to fully populated trace

Improves tests by adding links to the "fullypopulatedtrace" that is used in a lot of tests. This was missed when link support was added.

Created at 1 day ago
Zipkin translator not respecting "error" tag correctly

This means that if I throw an exception/error with an error message "true", it would not be an error anymore.

The span would still be in error b/c the OTLP span would have its status code set to error. Presumably on translating from OTLP back to Zipkin you would have to add error="true" if the otlp error status was set and there was no error attribute.

I proposed this only to keep it backwards compatible with existing behavior, but I don't have strong opinions.

Created at 1 day ago
pull request opened
Add support for synchronous page reading

This PR adds a File level config option that controls whether or not pages are read synchronously. It defaults to async to maintain backwards compatibility.

I'm not married to any of the names in this PR so feel free to suggest alternatives.

Created at 1 day ago

switch

Signed-off-by: Joe Elliott number101010@gmail.com

Created at 1 day ago
create branch
joe-elliott create branch optional-async-reader
Created at 1 day ago

lint

Signed-off-by: Joe Elliott number101010@gmail.com

Created at 1 day ago

Collapse Buffer Pools to 1 (#371)

fix issue 368 (#376)

  • fix issue 368

  • print cpu id in CI

  • cat /proc/cpuinfo

  • only run test for issue 368

  • repro

  • disable dictionaryLookupByteArrayString optimization

  • deactivate all pointer writes in assembly

  • remove sparse.gatherString optimization

fix issue 377 (#378)

set a limit to the number of row groups that a parquet writer will agree to create (#379)

fix issue 370 (#380)

  • fix issue 370

  • Update buffer.go

Co-authored-by: Kevin Burke kevin.burke@segment.com

  • more documentation

Co-authored-by: Kevin Burke kevin.burke@segment.com

do not zero bufferedPages when they are released (#383)

search: add comments to search.go (#388)

  • search: add comments to search.go

I was going through this function to check my understanding of page indexes while thinking about granules in TraceDB.

I believe that granules in other databases like clickhouse and frostdb are the equivalent of column pages in parquet. The page index is the sparse index over the pages (granules) wherein the data being searched for is contained.

This check on the binary search mostly confirms that suspicion, if I'm not mistaken. Please correct me if I'm misunderstanding something.

Also, as part of this, I found what is possibly an optimization to avoid skipping ahead a page, by comparing the current value to the max in the current page, instead of letting the binary search check the next page, then falling through to the final if-block.

It may or may not be a bug also, if, for example, the following happens:

you have pages:

[1,2,3] [3,4,5] [6,7,8]

Searching for 3 in this case would not return the first page and instead return the second, missing values of 3 in the first page.

I'll have to look at writing a test for this, but the current search test will pass creating this scenario without altering the test somewhat (since it's only looking for presence, not the expected index)

  • clarify, and handle case properly, I think

  • nit

  • search: remove nextIdx-1 > curIdx bounds check, and MaxValue check

The bounds check actually misses some cases where we do want to assign topIdx = nextIdx

e.g. searchValue:40 pages: [1,2,3] [10,20,30] [30,40,40] [40,50,60]

And the MaxValue check is unnecessary since we would want to search pages[curIdx] in the case where minValue == value (previously handled by the default case statement, which is no longer necessary after removing the MaxValue check.)

  • failing test for #388

add configuration option to limit the number of rows per row group (#394)

  • add configuration option to limit the number of rows per row group

  • limit length and capacity of rows slice

  • rename function arguments to avoid overloading variable names

fix error message of parquet.ErrTooManyRowGroups (#395)

Refactor assigning Parquet values to Go values (#387)

As part of exploring #266, it was identified that the code around deserializing Parquet values into Go values should be refactored to make supporting other types like time.Time possible. This is the first change which refactors the monolithic assignValue() function into an AssignValue() function implemented on each Parquet type.

Do not alloc page offset buffer when dictionary encoded (#398)

set byte slice capacity in encoding (#386)

Convert values between Parquet logical types (#393)

To support deserializing Parquet timestamp values for #266, there needs to be a way to convert between different Parquet types. For example, there might be a TIMESTAMP(isAdjustedToUTC=true, unit=MILLIS) value stored in a file and it needs to be deserialized into a time.Time which expects a number of nanoseconds. This PR adds a ConvertValue(val Value, typ Type) function to the parquet.Type interface which accepts the source Value and Type and returns a Value converted to the Type of the receiver.

Allow key value metadata to be set after writing rows (#399)

Run tests on main, display status of main (#403)

Serialize time.Time as a timestamp (#321)

Serialize time.Time values as Parquet timestamps. The default unit is NANOS and can be changed using the timestamp() struct tag.

    type timeColumn struct {
        t1 time.Time
        t2 time.Time `parquet:",timestamp(millisecond)"`
    }

Read decimal column (#406)

When reading a parquet file with a decimal column, the logical type information was not loaded; this behavior was not implemented. decimalType is more complex than the other types because a parquet decimal can be backed by multiple different physical types.

This PR loads logical type information for DECIMAL fields.

Closes #365

Update decimal string format (#407)

parquet-cli prints the string format of decimals as DECIMAL(precision,scale). Update parquet-go's string format to match.

Fix panic when reading file with no row groups (#408)

Fix bug that occurs when ReadAt returns EOF (#416)

Created at 1 day ago
pull request opened
Add links to fully populated slice

Improves tests by adding links to the "fullypopulatedtrace" that is used in a lot of tests. This was missed when link support was added.

Created at 1 day ago
create branch
joe-elliott create branch fix-links
Created at 1 day ago

Bump github.com/google/flatbuffers from 2.0.0+incompatible to 22.10.26+incompatible (#1886)

  • Bump github.com/google/flatbuffers

Bumps github.com/google/flatbuffers from 2.0.0+incompatible to 22.10.26+incompatible.


updated-dependencies:

  • dependency-name: github.com/google/flatbuffers dependency-type: direct:production update-type: version-update:semver-major ...

Signed-off-by: dependabot[bot] support@github.com

  • Update serverless gomod

Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: grafanabot bot@grafana.com

Bump github.com/pierrec/lz4/v4 from 4.1.15 to 4.1.17 (#1894)

  • Bump github.com/pierrec/lz4/v4 from 4.1.15 to 4.1.17

Bumps github.com/pierrec/lz4/v4 from 4.1.15 to 4.1.17.


updated-dependencies:

  • dependency-name: github.com/pierrec/lz4/v4 dependency-type: direct:production update-type: version-update:semver-patch ...

Signed-off-by: dependabot[bot] support@github.com

  • Update serverless gomod

Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: grafanabot bot@grafana.com

Bump go.opencensus.io from 0.23.0 to 0.24.0 (#1895)

  • Bump go.opencensus.io from 0.23.0 to 0.24.0

Bumps go.opencensus.io from 0.23.0 to 0.24.0.


updated-dependencies:

  • dependency-name: go.opencensus.io dependency-type: direct:production update-type: version-update:semver-minor ...

Signed-off-by: dependabot[bot] support@github.com

  • Update serverless gomod

Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: grafanabot bot@grafana.com

chore: correct link to image in readme (#1921)

traceql: pass common.SearchOptions from querier to storage layer (#1906)

  • traceql: pass common.SearchOptions from querier to storage layer

  • Split up SpansetFetcher used by engine and implemented by backend blocks

  • Apply SearchConfig to SearchOptions in Fetch

  • Also fix serverless handler

  • Replace common.SearchOptions{} with common.DefaultSearchOptions()

  • Fix signature walBlock.Fetch

Add TLS support to vulture (#1874)

  • Add TLS support to vulture

Signed-off-by: Zach Leslie zach.leslie@grafana.com

  • Modify config name to separate query vs push TLS

Signed-off-by: Zach Leslie zach.leslie@grafana.com

  • Update changelog

Signed-off-by: Zach Leslie zach.leslie@grafana.com

Signed-off-by: Zach Leslie zach.leslie@grafana.com

Parquet Ingester: Perf Improvements (#1918)

  • row iteration

Signed-off-by: Joe Elliott number101010@gmail.com

  • improve extend reuse slice

Signed-off-by: Joe Elliott number101010@gmail.com

  • increase pool sizes

Signed-off-by: Joe Elliott number101010@gmail.com

  • cleanup

Signed-off-by: Joe Elliott number101010@gmail.com

  • made global pool

Signed-off-by: Joe Elliott number101010@gmail.com

  • lint

Signed-off-by: Joe Elliott number101010@gmail.com

  • review

Signed-off-by: Joe Elliott number101010@gmail.com

Signed-off-by: Joe Elliott number101010@gmail.com

Created at 1 day ago