nikic
Repos: 98, Followers: 5842, Following: 25

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.

18920
6076

A PHP parser written in PHP

16060
868

Fast request router for PHP

4787
404

Extension that adds support for method calls on primitive types in PHP

1100
42

Iteration primitives using generators

1076
67

Extension exposing PHP 7 abstract syntax tree

884
75

Events

[ValueTracking] Fix incorrect computeConstantRange() arguments

The second argument is ForSigned, not UseInstrInfo.

Created at 13 hours ago

[InstCombine] Add extra test for non-overflowing usub.sat (NFC)

Same as the existing one, but with both nuw and nsw on the add.

Created at 13 hours ago

[InstCombine] Fold more intrinsics over selects

Move this handling to a centralized place and extend it to handle saturating add/sub intrinsics.

I originally wanted to make this fully generic rather than whitelist based, because this is legal and likely profitable for all speculatable intrinsics. The caveat is that for vector selects, the intrinsic can't perform cross-lane operations like a shuffle or reduction, which we don't really expose as a generic property right now. So for now I'm just extending the list.
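The equivalence behind this fold can be illustrated with a small scalar model. This is only a sketch, not InstCombine code: `uadd_sat`, `call_of_select`, `select_of_calls` and `fold_is_equivalent` are hypothetical names, and an 8-bit saturating add stands in for the saturating add/sub intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of an 8-bit unsigned saturating add (illustrative stand-in
// for a saturating add intrinsic; not LLVM code).
inline std::uint8_t uadd_sat(std::uint8_t a, std::uint8_t b) {
    unsigned sum = unsigned(a) + unsigned(b);
    return sum > 255u ? std::uint8_t(255) : std::uint8_t(sum);
}

// Before the fold: the call consumes the result of a select.
inline std::uint8_t call_of_select(bool c, std::uint8_t t, std::uint8_t f,
                                   std::uint8_t x) {
    return uadd_sat(c ? t : f, x);
}

// After the fold: the call is applied to each arm, then the select picks one.
inline std::uint8_t select_of_calls(bool c, std::uint8_t t, std::uint8_t f,
                                    std::uint8_t x) {
    return c ? uadd_sat(t, x) : uadd_sat(f, x);
}

// Compare both forms over a grid of inputs.
inline bool fold_is_equivalent() {
    for (int c = 0; c < 2; ++c)
        for (unsigned t = 0; t < 256; t += 17)
            for (unsigned f = 0; f < 256; f += 17)
                for (unsigned x = 0; x < 256; ++x)
                    if (call_of_select(c != 0, std::uint8_t(t), std::uint8_t(f),
                                       std::uint8_t(x)) !=
                        select_of_calls(c != 0, std::uint8_t(t), std::uint8_t(f),
                                        std::uint8_t(x)))
                        return false;
    return true;
}
```

The scalar case holds for any side-effect-free function. The vector caveat described above arises because a vector select picks per lane, so an operation that mixes lanes would not commute with it in the same way.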

Created at 13 hours ago

[InstCombine] Regenerate test checks (NFC)

[InstCombine] Add additional test cases for folding intrinsic into select (NFC)

Test cross-lane intrinsics with vector selects.

Created at 13 hours ago

[mlir][Transform] NFC - Fix spurious reflows

Revert "[AMDGPU] Select v_sat_pk_u8_i16"

This reverts commit 64b45db34a0cd979dae9ca3016e9da517e57b987.

Reason: the patterns are wrong which can result in a miscompilation. However, fixing the pattern is not trivial due to how i8 values are handled, and due to the additional type-checking performed by D147127: trunc/smax/smin are all defined as int ops in the DAG despite them working on vectors too.

As this is not a much-needed pattern, I prefer reverting for now until I can find time to properly rewrite the pattern.

[mlir] Use GenericAdaptor to simplify 1:N type conversion API.

For 1:N type conversion, there is a 1:N relationship between the original operands and the converted operands. The same is true for the results. The previous design passed an instance of a "mapping" class into each pattern that helped with handling this 1:N correspondence. However, this was still rather manual and, in particular, it required the use of magic constants for the indices of the different operands.

This commit uses the GenericAdaptor class that is generated for each op class to simplify this relationship further. The GenericAdaptor can wrap a list of arbitrary types for each operand (via templating); for 1:N type conversion, this allows the operand accessors of the adaptor class to return a ValueRange that corresponds to the N values in the converted types. Patterns can thus use the named accessors instead of magic constants, which eliminates a common class of errors.

This commit further simplifies the API that patterns need to implement by making the operand and result type mappings part of the adaptor. Since many patterns only need one of the two (or even neither), this reduces the number of unnecessary arguments in many cases.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D147225

[MLIR][OpenMP][Flang] Set OpenMP target attributes in MLIR module

Scope of changes:

  1. Add attribute to OpenMP MLIR dialect which stores target cpu and target features
  2. Store target information in MLIR module

Differential Revision: https://reviews.llvm.org/D146612

Reviewed By: kiranchandramohan

Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>

[Orc] Drop arch check in the DebugObjectManagerPlugin for ELF

Tested this with the new AArch32 backend on armv7l and it works without issues in GDB. The size of the load-address field is only 32-bit here, but we implicitly account for it by writing an ELFT::uint, which is defined at: https://github.com/llvm/llvm-project/blob/release/16.x/llvm/include/llvm/Object/ELFTypes.h#L57

So, instead of adding a newly supported machine type, let's just drop this restriction altogether.

[clang][Interp] Fix record initialization via CallExpr subclasses

We can't just use VisitCallExpr() here, as that doesn't handle CallExpr subclasses such as CXXMemberCallExpr.

Differential Revision: https://reviews.llvm.org/D141772

[Matrix] Update most dot tests using vXi64 to vXi32.

Update dot-product-int.ll tests to use mostly i32 instead of i64; there's no mul.2d instruction, so vector versions of v2i64 cannot be lowered efficiently.

[InstCombine] Regenerate test checks (NFC)

[Assignment Tracking][SROA] Handle DIArgList in migrateDebugInfo

If the to-be-split dbg.assign has a DIArgList and a new Value has been requested then use a kill-location for the new dbg.assign. We can't simply replace the value component (a DIArgList) with the new Value as that would leave the DIExpression in an invalid state (DW_OP_LLVM_arg operands with no arglist).

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D147312

[Assignment Tracking] Enable by default

See https://discourse.llvm.org/t/rfc-enable-assignment-tracking/69399

This sets the -Xclang -fexperimental-assignment-tracking flag to the value "enabled", which means it will be enabled as long as none of the following are true: it's an LTO build, LLDB debugger tuning has been specified, or it's an O0 build. No work is done in any case if -g is not specified or -gmlt is used.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D146987

[Matrix] Add special case dot product lowering

Add special case to matrix lowering for dot products. Normal matrix lowering is optimized for either row-major or column-major, which results in many shufflevector instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.
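As a rough scalar illustration of the difference between sequential adds and a reduction, the sketch below computes a dot product both ways. The names `dot_sequential` and `dot_reduce` are hypothetical, and the pairwise loop only models the shape of a vector reduction; it is not the lowering the patch emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sequential form: one long dependency chain of adds.
template <std::size_t N>
std::int32_t dot_sequential(const std::array<std::int32_t, N>& a,
                            const std::array<std::int32_t, N>& b) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < N; ++i) acc += a[i] * b[i];
    return acc;
}

// Reduction form: element-wise multiply first, then combine the products
// pairwise, roughly halving the active width in each step.
template <std::size_t N>
std::int32_t dot_reduce(const std::array<std::int32_t, N>& a,
                        const std::array<std::int32_t, N>& b) {
    static_assert(N > 0, "sketch assumes a non-empty vector");
    std::array<std::int32_t, N> prod{};
    for (std::size_t i = 0; i < N; ++i) prod[i] = a[i] * b[i];
    for (std::size_t width = N; width > 1; width = (width + 1) / 2) {
        std::size_t half = width / 2;
        for (std::size_t i = 0; i < half; ++i) prod[i] += prod[width - 1 - i];
    }
    return prod[0];
}
```

For integer elements that do not overflow, both forms give the same result; the reduction form simply exposes independent adds that SIMD hardware can perform together.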

Reviewed By: fhahn, thegameg

Differential Revision: https://reviews.llvm.org/D131125

[InstCombine] Remove min/max special case when folding into select

Now that we canonicalize to min/max intrinsics, we no longer need to guard against this here.

In fact, it seems like the issue from PR46271 was the final push for introducing the intrinsics in the first place...

[mlir][llvm] Import pointer data layout specification.

The revision moves the data layout parsing into a separate file and extends it to support pointer data layout specifications. Additionally, it also produces more precise warnings and error messages.

Reviewed By: Dinistro, definelicht

Differential Revision: https://reviews.llvm.org/D147170

[mlir] Fix casting of leading unit dims for vector.insert

When dropping leading unit dims of vector.insert's operands and creating a new vector.insert, its new position rank should be computed explicitly in two steps: first based on the number of leading unit dims dropped from the vector.insert's destination, then based on the number of leading unit dims dropped from its source.

Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D147280

[flang] Don't fold operation when shapes differ

When folding a binary operation between two array constructors, it is necessary to check if each value contained in the left operand has the same rank and shape as the one on the right. Otherwise, lowering would end up with an operation between values of different ranks/shapes, which could result in a crash.

For instance, the following code was crashing the compiler:

  integer :: x(4), y(2, 2), z(4)
  z = (/x/) + (/y/)
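The kind of guard described here can be sketched in a few lines. This is a simplified model, not flang's folding code: `ArrayConstant` and `fold_add` are hypothetical names, and a constant is represented as just a shape plus flattened elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constant-array value: extents per dimension plus flat data.
struct ArrayConstant {
    std::vector<std::size_t> shape;
    std::vector<int> elements;
};

// Fold lhs + rhs only when rank and extents match. Returns false (and leaves
// `out` untouched) when the shapes differ, so the caller can keep the
// unfolded operation instead of producing a mismatched result.
inline bool fold_add(const ArrayConstant& lhs, const ArrayConstant& rhs,
                     ArrayConstant& out) {
    if (lhs.shape != rhs.shape) return false;
    if (lhs.elements.size() != rhs.elements.size()) return false;
    ArrayConstant result{lhs.shape, {}};
    result.elements.reserve(lhs.elements.size());
    for (std::size_t i = 0; i < lhs.elements.size(); ++i)
        result.elements.push_back(lhs.elements[i] + rhs.elements[i]);
    out = result;
    return true;
}
```

With operands shaped like x(4) and y(2, 2) from the example, the shape comparison fails and the fold is declined even though both hold four elements.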

Fixes #60229

Reviewed By: klausler, jeanPerier

Differential Revision: https://reviews.llvm.org/D147181

[bazel] Port 9d2b84ef6232

Fix a simple think-o; NFC

This was using a bitwise OR of two boolean member variables, now it's using a logical OR instead.
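A small sketch of why such a change is NFC: for two plain bool members, the bitwise and logical forms produce the same value. The struct and member names below are hypothetical and not taken from the patched code.

```cpp
#include <cassert>

// Hypothetical class with two boolean member variables.
struct Flags {
    bool first;
    bool second;

    // Bitwise OR: both operands are promoted to int, then converted back.
    bool any_bitwise() const { return first | second; }

    // Logical OR: same value for bools, and states the intent directly.
    bool any_logical() const { return first || second; }
};

// Check all four combinations of the two flags.
inline bool forms_agree() {
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b) {
            Flags f{a != 0, b != 0};
            if (f.any_bitwise() != f.any_logical()) return false;
        }
    return true;
}
```

The two forms would only differ observably if the right-hand operand had side effects, since the logical form short-circuits; reading a member variable has none.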

[libc][NFC] Adjust some CMake messages for the GPU build

Summary: This disables the MPFR warning on the GPU since we can't support it anyway. Also fixes a misspelled message.

[clang][Interp] Fix binary comma operators

We left the result of RHS on the stack in case DiscardResult was true.

Differential Revision: https://reviews.llvm.org/D141784

[Assignment Tracking] Remove assertion from DbgAssignIntrinsic::setAddress

Follow up to https://reviews.llvm.org/D146987.

Remove assertion that the Value must be a pointer type. This fires in real-world examples e.g. by codegenprepare introducing ptrtoint conversions.

The buildbots have not caught up yet but without this change the test compiler-rt/test/ubsan/TestCases/TypeCheck/vptr.cpp fails with an ICE.

Created at 14 hours ago

[InstCombine] Add additional test for folding intrinsic into select (NFC)

Created at 14 hours ago
issue comment
[clang] x86 Missed optimization for simple enum to string function

This is just a difference between PIC and non-PIC code.

Created at 14 hours ago
closed issue
[clang] x86 Missed optimization for simple enum to string function

Given the following code (godbolt):

#define ENUM() \
    ENUM_I(A)  \
    ENUM_I(B)  \
    ENUM_I(C)  \
    ENUM_I(D)  \
    ENUM_I(E)  \
    ENUM_I(F)  \
    ENUM_I(G)  \
    ENUM_I(H)  \
    ENUM_I(I)  \
    ENUM_I(J)  \
    ENUM_I(K)  \
    ENUM_I(L)  \
    ENUM_I(M)  \
    ENUM_I(N)

enum class E {
#define ENUM_I(name) name,
    ENUM()
#undef ENUM_I

    COUNT,
};

const char* enum_name(E e) noexcept {
    switch (e) {
#define ENUM_I(name) \
    case E::name:    \
        return #name;

        ENUM()

#undef ENUM_I

        default:
            __builtin_unreachable();
    }
}

Running clang with -O3 -march=skylake generates:

enum_name(E):                         # @enum_name(E)
        movsxd  rax, edi
        lea     rcx, [rip + .Lreltable.enum_name(E)]
        movsxd  rax, dword ptr [rcx + 4*rax]
        add     rax, rcx
        ret
(strings...)

while gcc with -O3 -march=skylake generates:

enum_name(E):
        mov     edi, edi
        mov     rax, QWORD PTR CSWTCH.1[0+rdi*8]
        ret
(strings...)
Created at 14 hours ago

[InstCombine] Remove min/max special case when folding into select

Now that we canonicalize to min/max intrinsics, we no longer need to guard against this here.

In fact, it seems like the issue from PR46271 was the final push for introducing the intrinsics in the first place...

Created at 16 hours ago

[InstCombine] Regenerate test checks (NFC)

Created at 16 hours ago
create branch
nikic create branch perf/instcombine-call-select
Created at 16 hours ago
issue comment
Cannot represent a difference across sections

Probably related to the LLVM upgrade, but I was unable to reduce this to a pure LLVM reproducer.

Created at 17 hours ago
issue comment
Nightly segfaults when compiling Zenoh with non-zero `opt-level`

Upstream fix: https://github.com/llvm/llvm-project/commit/fc6e91fe8184129d2395b79ce42f4495b95b0d0d

Created at 17 hours ago
opened issue
Backport copyRangeMetadata() assertion fix

/cherry-pick fc6e91fe8184129d2395b79ce42f4495b95b0d0d

Fixes LLVM 16 regression reported in https://github.com/rust-lang/rust/issues/109775.

Created at 17 hours ago

[Local] Handle size mismatch between pointer/int in copyRangeMetadata()

SROA may convert a wide integer load into a narrow pointer load, make sure we don't crash. It would not be legal to transfer the metadata in this case.

Created at 17 hours ago
issue comment
Nightly segfaults when compiling Zenoh with non-zero `opt-level`

Minimized:

define i128 @test() {
  %a = alloca i128
  store ptr null, ptr %a
  %v = load i128, ptr %a, !range !0
  ret i128 %v
}

!0 = !{i128 1, i128 0}
Created at 18 hours ago
issue comment
Nightly segfaults when compiling Zenoh with non-zero `opt-level`

Preliminary reduction for opt -passes=sroa: https://gist.github.com/nikic/1c606a180808a42806526ec00cee947c

Created at 18 hours ago
closed issue
[clang] Incorrect Optimization On Pointer Arithmetic

Code to reproduce:

class Ptr {
  int* _r;
  inline void m_set(int* p){
    _r = p + 1;
  }
  inline int* m_get(){
    return _r - 1;
  }
public:
  Ptr(int* p) { this->m_set(p); };
  inline operator bool() {
    return !!m_get();
  }
};

int main()
{
  Ptr p((int*)0);
  while(p);
  return 0;
}

When compiled with -O0 or -O1, the program returns 0 as expected. When compiled with -O2 or higher, it runs into an infinite loop.

This issue first appeared in clang 3.3 and affects all architectures (those I can test with godbolt). Godbolt link: https://godbolt.org/z/59j8GT1KG

gcc has shared the same issue since gcc 9, FYI.

Created at 1 day ago
issue comment
[clang] Incorrect Optimization On Pointer Arithmetic

Adding a non-zero value to a null pointer (in C) or adding any value to a null pointer (in C++) is undefined behavior.

Created at 1 day ago
issue comment
Inefficient scalable vector codegen after "[SCEV] Add SCEVType to represent `vscale`."

Oh right, I confused the outputs.

It looks like the only real difference here is that previously the "vscale * 4" increment was repeated once for each GEP (in the form of a constexpr) and now it appears only once, outside the loop, which prevents it from being folded into the addressing mode.

I believe CGP is responsible for sinking such instructions to exploit addressing modes. I'm not particularly familiar with that code though.

Created at 1 day ago
issue comment
Cherry-pick DAGCombiner fix for issue #61315 to 16.0.1 branch

/cherry-pick e7c35d71007fab6e6729a0cfa821023128de2f74

Created at 1 day ago
create branch
nikic create branch perf/scev-trip-count-2
Created at 1 day ago
create branch
nikic create branch perf/scev-trip-count
Created at 1 day ago

[AArch64] Extend icmp bitcast to vecreduce fold to comparison with -1

D130163 added support for folding setcc (iN (bitcast (vNi1 X))), 0, (eq|ne) to setcc (iN (zext (i1 (vecreduce_or (vNi1 X))))), 0, (eq|ne).

There is a conjugate fold for comparison with -1 which uses vecreduce_and and sext instead.

Proof: https://alive2.llvm.org/ce/z/Zz--xy
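Both folds can be checked exhaustively for a small vector width with a scalar model. This is only a sketch: an 8-bit mask stands in for the bitcast of a v8i1 value (bit i is lane i), and the helper names are hypothetical rather than DAG node names.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kLanes = 8;

// Lane i of the vector, read from the bitcast integer.
inline bool lane(std::uint8_t mask, int i) { return ((mask >> i) & 1u) != 0; }

inline bool reduce_or(std::uint8_t mask) {
    bool r = false;
    for (int i = 0; i < kLanes; ++i) r = r || lane(mask, i);
    return r;
}

inline bool reduce_and(std::uint8_t mask) {
    bool r = true;
    for (int i = 0; i < kLanes; ++i) r = r && lane(mask, i);
    return r;
}

// Zero- and sign-extension of a single bit to 8 bits.
inline std::uint8_t zext_i1(bool b) { return b ? 1 : 0; }
inline std::int8_t sext_i1(bool b) { return b ? -1 : 0; }

// Check both folds for every possible 8-lane mask.
inline bool folds_hold() {
    for (unsigned m = 0; m < 256; ++m) {
        std::uint8_t mask = std::uint8_t(m);
        // Comparison with 0 matches the or-reduction being zero.
        if ((mask == 0) != (zext_i1(reduce_or(mask)) == 0)) return false;
        // Comparison with -1 (all bits set) matches the and-reduction.
        if ((mask == 0xFF) != (sext_i1(reduce_and(mask)) == -1)) return false;
    }
    return true;
}
```

The ne variants follow by negating both sides of each comparison.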

Differential Revision: https://reviews.llvm.org/D146518

Created at 1 day ago
issue comment
Upgrade to LLVM 16, again

Right, the check improvements should be due to optimization improvements. What's pretty surprising is that relatively minor instruction count improvements map to such a large wall time improvement (based on the detailed statistics, apparently mostly in typeck?)

Created at 2 days ago
issue comment
llvm-wrapper: adapt for LLVM API change

@bors r+ rollup

Created at 2 days ago
issue comment
Inefficient scalable vector codegen after "[SCEV] Add SCEVType to represent `vscale`."

The post-LSR output contains vscale GEP constant expressions, which are supposed to be forbidden since recently, so something is going very wrong here.

Created at 3 days ago
create branch
nikic create branch fix-mingw
Created at 5 days ago
pull request opened
Limit to one link job on mingw builders

This is another attempt to work around https://github.com/rust-lang/rust/issues/108227.

By limiting to one link job, we should be able to avoid file name clashes in mkstemp().

Created at 5 days ago