aalexand
Repos
33
Followers
21

Events

started
Created at 4 days ago
issue comment
proposal: Support Timestamps and Labels for Individual Events

That means that on the read path there is an if statement to take the old path or the new path. With the num_keep_locations approach there is no if statement.

Created at 1 week ago
issue comment
proposal: Support Timestamps and Labels for Individual Events

Nice idea 👍. If we're going to explore this space of forwards incompatible solutions, we should definitely take a look at this approach, along with the tree stuff proposed by @mhansen, and perhaps also stack trace hashing as pioneered by prodfiler/elastic.

One note is that approaches like using a tree are incompatible in both directions while the "incremental stack encoding" approach I mentioned at least allows all current profiles to be transparently opened by the new code since num_keep_locations is unset and so is effectively zero which means reuse the whole stack.

Created at 1 week ago
pull request opened
Apply -noinlines flag to the proto output.

This is similar in spirit to #393. By default this flag is not set, but if it's specified on the command line it's better to respect the value.

This addresses b/259563268.

Created at 1 week ago
Created at 1 week ago
create branch
aalexand create branch fix-noinlines
Created at 1 week ago

Speed-up profile management. (#729)

Time taken for "top" listing for a large (34MB) profile drops by 15%:

name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   11.2s ± 2%  -14.72%  (p=0.008 n=5+5)

Furthermore, the time taken to merge/diff 34MB profiles drops by 53%:

Merge/2-12   7.74s ± 2%   3.63s ± 2%  -53.09%  (p=0.008 n=5+5)

Details follow:

The cost of a trivial merge was very high (4s for 34MB profile). We now just skip such a merge and save the 4s.

  • Only create a Sample the first time a sample key is seen.
  • Faster ID to *Location mapping by creating a dense array that handles small IDs (this is almost always true).
  • Faster sampleKey generation during merging by emitting binary encoding of numbers and using a strings.Builder instead of repeated fmt.Sprintf.

The preceding changes drop the cost of merging two copies of the same 34MB profile by 53%:

name        old time/op  new time/op  delta
Merge/2-12   7.74s ± 2%   3.63s ± 2%  -53.09%  (p=0.008 n=5+5)
  • Use temporary storage when decoding to reduce allocations.
  • Pre-allocate space for all locations in one shot when creating a Profile.

The preceding speed up decoding by 13% and encoding by 7%:

name      old time/op  new time/op  delta
Parse-12   2.00s ± 4%   1.74s ± 3%  -12.99%  (p=0.008 n=5+5)
Write-12   679ms ± 2%   629ms ± 1%   -7.44%  (p=0.008 n=5+5)

When used in interactive mode, each command needs to make a fresh copy of the profile since a command may mutate the profile. This used to be done by serializing/compressing/decompressing/deserializing the profile per command. We now store the original data in serialized uncompressed form so that we just need to deserialize the profile per command. This change can be seen in the improvement in the time needed to generate the "top" output:

name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   12.4s ± 0%  -5.84%  (p=0.008 n=5+5)
  • Avoid filtering cost when there are no filters to apply.
  • Avoid location munging when there are no tag roots or leaves to add.
  • Faster stack entry pruning by caching the result of demangling and regexp matching for a given function name.
name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   12.3s ± 2%  -6.33%  (p=0.008 n=5+5)
  • Added benchmarks for profile parsing, serializing, merging.
  • Added benchmarks for a few web interface endpoints.
  • Added a large profile (1.2MB) to proftest/testdata. This profile is from a synthetic program that contains ~23K functions that are exercised by a combination of stack traces so that we end up with a larger profile than typical. Note that the benchmarks above are from an even larger profile (34MB) from a real system, but that profile is too big to be added to the repository.

Start using golangci-lint instead of the deprecated x/lint. (#731)

See https://github.com/golang/go/issues/38968 for the x/lint deprecation message.

I'm also putting all tool invocations into test.sh so that it's easier to run all checks locally during the development time. GitHub Actions are nice but they make running the checks locally more difficult.

Disable golangci-lint on tip (#735)

  • Disable golangci-lint on tip

  • Use devel version instead of tip

  • Use environment variable to disable golangci linter

  • Fix typo

Compatibilize profiles before merging (#734)

Apply -noinlines flag to the proto output.

This is similar in spirit to #393. By default this flag is not set, but if it's specified on the command line it's better to respect the value.

This addresses b/259563268.

Created at 1 week ago
issue comment
proposal: Support Timestamps and Labels for Individual Events

In that case it might be enough to standardize on label names for timestamps. Is that a direction you'd be interested in exploring?

I would prefer not to special case specific tag names, but rather have a tag unit type that denotes a timestamp and have support for that. Similar to how memory and duration units are handled. The magnitude could be a part of the unit to support the different resolution of the value.

On migrating to this proto to get call stack that can be reused between different samples to minimize duplication - yeah, breaking backward compatibility is expensive and would force a lot of changes upon us that are difficult to justify.

One idea I considered in the past that is roughly backward compatible is having a new num_keep_locations field in the Sample message like

message Sample {
  // The ids recorded here correspond to a Profile.location.id.
  // The leaf is at location_id[0].
  repeated uint64 location_id = 1;

  // Number of location IDs to keep from the previous sample. This is an
  // optimization that allows incremental encoding of the sample stack.
  // For example, if the stack is exactly the same as the previous one because
  // only the labels are different, then all of the location IDs can be reused
  // and so num_keep_locations could be set to the length of the location_id
  // field in the previous sample and the location_id field in the current sample
  // would be empty.
  uint64 num_keep_locations = 4;

  // The type and unit of each value is defined by the corresponding
  // entry in Profile.sample_type. All samples must have the same
  // number of values, the same as the length of Profile.sample_type.
  // When aggregating multiple samples into a single sample, the
  // result has a list of values that is the element-wise sum of the
  // lists of the originals.
  repeated int64 value = 2;

  // label includes additional context for this sample. It can include
  // things like a thread id, allocation size, etc
  repeated Label label = 3;
}

But this is not forward compatible (old pprof won't be able to read new profiles and even recognize them as new and refuse) and it's unclear (aka needs to be measured) how much actual profile size savings this would produce for compressed profiles.

Created at 1 week ago

Compatibilize profiles before merging (#734)

Created at 2 weeks ago
pull request closed
Compatibilize profiles before merging

pprof merging depends on the number of sample type and their order. That requires additional preparation steps before merging profiles that are not entirely identical in terms of sample type sets. The PR makes that process easier.

CompatibilizeSampleTypes is called to allow merging profiles with different sets of sample types. The resulting profile has sample types that appear in all profiles only.

Created at 2 weeks ago
pull request opened
Update codecov action version.
Created at 2 weeks ago
create branch
aalexand create branch update-codecov
Created at 2 weeks ago

Disable golangci-lint on tip (#735)

  • Disable golangci-lint on tip

  • Use devel version instead of tip

  • Use environment variable to disable golangci linter

  • Fix typo

Created at 2 weeks ago
pull request closed
Disable golangci-lint on tip

golangci-lint fails on tip.

Created at 2 weeks ago
issue comment
proposal: Support Timestamps and Labels for Individual Events

If the changes are to be backward incompatible then it's not really pprof anymore per se and should probably be done as part of the OpenTelemetry profiling format work format instead.

Created at 2 weeks ago
issue comment
Start using golangci-lint instead of the deprecated x/lint.

Not expected. Looks like it fails at Go tip - perhaps Go mainline added something that the linter doesn't like.

Created at 2 weeks ago
issue comment
why printWebSource is much complex than printSource

I am not quite sure what the actual question about the tool is. Please feel free to file an issue that is more focused on the user-level side of things.

Created at 2 weeks ago
closed issue
why printWebSource is much complex than printSource

juse wonder why printWebSource is much complex than printSource

printSource: print annotated source code, and order by func name, file name, that is all.

but the function printWebSource is much complex.

Created at 2 weeks ago
closed issue
How can I located the function in pprof with method has "*" ?

This is a duplicate question question to https://github.com/google/pprof/issues/512, since that's not solved while closed. Does any have a solution? thanks

It's about list <regexp> cannot take in an asterisk. And solutions from https://github.com/google/pprof/issues/512 not working.

(pprof) list (*conn).writeString
parsing argument regexp (*conn).writeString: error parsing regexp: missing argument to repetition operator: `*`
Created at 2 weeks ago
issue comment
proposal: Support Timestamps and Labels for Individual Events

Some initial thoughts:

  • The change is fairly large. Changes to profile.proto always immediately make me think about what it would take to make sure the existing tooling (pprof et cetera) supports it and this change is touching some of the fundamentals of the format.
  • The example uses a single sample type. It would help if the example would use 2 sample types to illustrate how the breakdown is represented in that case.
  • The change introduces the notion of "ticks" which is an entirely new concept. I wonder if we could avoid that somehow. I am not opposed to discussing how to make timestamps better represented in profile.proto but going as far as adding a new concept into the schema fields gives me a bit of a pause.
  • Instead of reusing label sets I wonder if normalizing out the stack would make things simpler, similar to this proto.

I need to look into this more though.

Created at 3 weeks ago
Created at 3 weeks ago

Added alternative flamegraph implementation that can show callers. (#716)

Add an experimental flame-graph implementation. It can be selected in pprof's web interface using the new "Flame (experimental)" menu entry. At some point this new implementation may become the default.

The new view is similar to flame-graph view. But it can show caller information as well. This should allow it to satisfy many users of Graph and Peek views as well.

Let's illustrate with an example. Suppose we have profile data that consists of the following stacks:

1000	main -> foo -> malloc
2000	main -> bar -> malloc

When main is selected, both the old and new views show:

[-------------------3000 main---------------------]
[---1000 foo-----] [----------2000 bar------------]
[---1000 malloc--] [----------2000 malloc---------]

But suppose now the user selects the right-most malloc slot. The old view will show just the path leading to that malloc:

[----------2000 main-----------]
[----------2000 bar------------]
[----------2000 malloc---------]

The new view will however show a flame-graph view that grows upwards that displays the call stacks leading to malloc:

[---1000 main----] [----------2000 main-----------]
[---1000 foo-----] [----------2000 bar------------]
[-------------------3000 malloc-------------------]

This caller display is useful when trying to determine expensive callers of function.

A list of important differences between the new view and flame graphs:

New view pros:

  1. Callers are shown, e.g., all paths leading to malloc.
  2. Shows self-cost clearly with a different saturation.
  3. Font size is adjusted to fit more text into boxes.
  4. Highlighting on hover shows other occurrences of a function.
  5. Search works more like other views.
  6. Pivot changes are reflected in browser history (so back and forward buttons can be used to navigate between different selections).
  7. Allows eventual removal of the D3 dependency, which may make integrations into various environments easier.
  8. Colors provide higher contrast between foreground and background.

New view cons:

  1. There are small differences in how things look and feel.
  2. Color-scheme is very different.
  3. Change triggered by selecting a new entry is not animated.

Update pprof documentation on tags. (#722)

Add missing options, restructure the text a bit.

Go 1.19 released, so update supported Go versions. (#721)

  • Go 1.19 released, so update supported Go versions.

  • Remove the usage of the deprecated ioutil package.

Add visual indication of inlined frames. (#723)

  • Add visual indication of inlined frames.

Mark frames sent to javascript with a boolean "Inlined" field. Use this field to change the display by (1) adding an "(inlined)" marker to the tooltip for the frame, and (2) by dropping any border between the caller and the caller.

Document flame graph display, including coloring and visual indication of inlining.

  • Tweak doc changes based on review feedback.

Add new symbol directory layout by build-id /xx/xxxxxxxx.debug. (#724)

This protocol is already supported by various tools and lldb. e.g.: https://github.com/llvm-mirror/lldb/blob/d01083a850f577b85501a0902b52fd0930de72c7/source/Symbol/LocateSymbolFile.cpp#L311

Among others it is used by Fuchisa OS builds.

Increase chunk size for concurrent fetch to 128 (#727)

Speed-up profile management. (#729)

Time taken for "top" listing for a large (34MB) profile drops by 15%:

name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   11.2s ± 2%  -14.72%  (p=0.008 n=5+5)

Furthermore, the time taken to merge/diff 34MB profiles drops by 53%:

Merge/2-12   7.74s ± 2%   3.63s ± 2%  -53.09%  (p=0.008 n=5+5)

Details follow:

The cost of a trivial merge was very high (4s for 34MB profile). We now just skip such a merge and save the 4s.

  • Only create a Sample the first time a sample key is seen.
  • Faster ID to *Location mapping by creating a dense array that handles small IDs (this is almost always true).
  • Faster sampleKey generation during merging by emitting binary encoding of numbers and using a strings.Builder instead of repeated fmt.Sprintf.

The preceding changes drop the cost of merging two copies of the same 34MB profile by 53%:

name        old time/op  new time/op  delta
Merge/2-12   7.74s ± 2%   3.63s ± 2%  -53.09%  (p=0.008 n=5+5)
  • Use temporary storage when decoding to reduce allocations.
  • Pre-allocate space for all locations in one shot when creating a Profile.

The preceding speed up decoding by 13% and encoding by 7%:

name      old time/op  new time/op  delta
Parse-12   2.00s ± 4%   1.74s ± 3%  -12.99%  (p=0.008 n=5+5)
Write-12   679ms ± 2%   629ms ± 1%   -7.44%  (p=0.008 n=5+5)

When used in interactive mode, each command needs to make a fresh copy of the profile since a command may mutate the profile. This used to be done by serializing/compressing/decompressing/deserializing the profile per command. We now store the original data in serialized uncompressed form so that we just need to deserialize the profile per command. This change can be seen in the improvement in the time needed to generate the "top" output:

name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   12.4s ± 0%  -5.84%  (p=0.008 n=5+5)
  • Avoid filtering cost when there are no filters to apply.
  • Avoid location munging when there are no tag roots or leaves to add.
  • Faster stack entry pruning by caching the result of demangling and regexp matching for a given function name.
name    old time/op  new time/op  delta
Top-12   13.2s ± 3%   12.3s ± 2%  -6.33%  (p=0.008 n=5+5)
  • Added benchmarks for profile parsing, serializing, merging.
  • Added benchmarks for a few web interface endpoints.
  • Added a large profile (1.2MB) to proftest/testdata. This profile is from a synthetic program that contains ~23K functions that are exercised by a combination of stack traces so that we end up with a larger profile than typical. Note that the benchmarks above are from an even larger profile (34MB) from a real system, but that profile is too big to be added to the repository.

Start using golangci-lint instead of the deprecated x/lint. (#731)

See https://github.com/golang/go/issues/38968 for the x/lint deprecation message.

I'm also putting all tool invocations into test.sh so that it's easier to run all checks locally during the development time. GitHub Actions are nice but they make running the checks locally more difficult.

Merge branch 'main' into felixge/timestamps

Created at 3 weeks ago

Start using golangci-lint instead of the deprecated x/lint. (#731)

See https://github.com/golang/go/issues/38968 for the x/lint deprecation message.

I'm also putting all tool invocations into test.sh so that it's easier to run all checks locally during the development time. GitHub Actions are nice but they make running the checks locally more difficult.

Created at 3 weeks ago
pull request closed
Start using golangci-lint instead of the deprecated x/lint.

See https://github.com/golang/go/issues/38968 for the x/lint deprecation message.

I'm also putting all tool invocations into test.sh so that it's easier to run all checks locally during the development time. GitHub Actions are nice but they make running the checks locally more difficult.

Created at 3 weeks ago