uschindler · 24 repositories · 78 followers · 5 following

Repositories:

  • Policeman's Forbidden API Checker (271 stars, 29 forks)
  • Apache Lucene open-source search software (1224 stars, 555 forks)
  • Apache Solr open-source search software (529 stars, 339 forks)
  • Data files of German Decompounder for Apache Lucene / Apache Solr / Elasticsearch (87 stars, 12 forks)
  • PANGAEA Framework for Metadata Portals (panFMP) (6 stars, 1 fork)
  • Elasticsearch Legacy Completion Plugin (7 stars, 4 forks)

Events

pull request closed
Initial rewrite of MMapDirectory for JDK-17 preview (incubating) Panama APIs (>= JDK-17-ea-b25)

INFO: This is a follow-up of #173: it's the same code base, but with the API changes from JDK 17 applied

This is just a draft PR for a first insight on memory mapping improvements in JDK 17+.

Some background information: starting with JDK 14, there is a new incubating module "jdk.incubator.foreign" that has a new, not yet stable API for accessing off-heap memory (later it will also support calling functions located in libraries like .so or .dll files using classical MethodHandles). This incubator module has several versions:

  • first version: https://openjdk.java.net/jeps/370 (slow, very buggy, and thread-confined, making it unusable with Lucene)
  • second version: https://openjdk.java.net/jeps/383 (still thread-confined, but now allows transfer of "ownership" to other threads; this is still impossible to use with Lucene)
  • third version in JDK 16: https://openjdk.java.net/jeps/393 (this version added "support for shared segments"). This now allows us to safely use the same external mmapped memory from different threads and also unmap it! This was implemented in the previous pull request #173.
  • fourth version in JDK 17, included in build 25: https://openjdk.java.net/jeps/412 (current version). This mainly changes the API around scopes. Instead of having segments explicitly made "shared", we can assign them to a resource scope which controls their behaviour. The ResourceScope is produced once per IndexInput instance (not per clone) and owns all segments. When the ResourceScope is closed, all segments become invalid - and we throw AlreadyClosedException.
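As a plain-Java sketch (not the actual Panama API; all names here are invented for illustration), the ownership model described in the fourth bullet behaves like this: one shared scope per master IndexInput, clones share it, and any access after close throws:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the resource-scope idea: the master input owns one scope,
// all clones share it, and closing the scope invalidates every reader.
class ScopedInput {
    private final AtomicBoolean closed; // stands in for the ResourceScope
    private final long[] data;          // stands in for the mapped segments

    ScopedInput(long[] data) {
        this(data, new AtomicBoolean(false));
    }

    private ScopedInput(long[] data, AtomicBoolean scope) {
        this.data = data;
        this.closed = scope;
    }

    // Clones share the scope of the master input, as described above.
    ScopedInput cloneInput() {
        return new ScopedInput(data, closed);
    }

    long read(int pos) {
        if (closed.get()) {
            // Lucene would throw AlreadyClosedException here.
            throw new IllegalStateException("already closed");
        }
        return data[pos];
    }

    // Closing the scope invalidates the master and all clones at once.
    void close() {
        closed.set(true);
    }

    public static void main(String[] args) {
        ScopedInput master = new ScopedInput(new long[] {1L, 2L, 3L});
        ScopedInput clone = master.cloneInput();
        System.out.println(clone.read(1)); // reads fine while open
        master.close();                    // now every clone is invalid too
    }
}
```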

This module more or less overcomes several problems:

  • The ByteBuffer API is limited to 32 bit addressing (in fact MMapDirectory has to chunk files into 1 GiB portions)
  • There is no official way to unmap ByteBuffers when the file is no longer used. There is a way to use sun.misc.Unsafe and forcefully unmap segments, but any IndexInput accessing the file from another thread will crash the JVM with SIGSEGV or SIGBUS. We learned to live with that and we happily apply the unsafe unmapping, but that's the main issue.
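The 1 GiB chunking works out to simple power-of-two arithmetic; this sketch (illustrative names, not Lucene's exact code) shows how a long file position splits into a chunk index and an in-chunk offset:

```java
// A single ByteBuffer can only address 2^31-1 bytes, so a chunked mmap
// implementation must split every long position into (chunk, offset).
class ChunkMath {
    static final int CHUNK_SIZE_POWER = 30; // 1 GiB chunks

    static int chunkIndex(long pos) {
        return (int) (pos >>> CHUNK_SIZE_POWER);
    }

    static long chunkOffset(long pos) {
        return pos & ((1L << CHUNK_SIZE_POWER) - 1);
    }

    public static void main(String[] args) {
        long pos = 3L * (1L << 30) + 42; // byte 42 of the fourth chunk
        System.out.println(chunkIndex(pos) + " / " + chunkOffset(pos)); // 3 / 42
    }
}
```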

@uschindler had many discussions with the team at OpenJDK, and finally, with the third incubator, we have an API that works for Lucene. Those were very fruitful discussions (thanks to @mcimadamore!)

With the third incubator we are now finally able to do some tests (especially performance). As this is an incubating module, this PR first changes the build system a bit:

  • disable -Werror for :lucene:core
  • add the incubating module to the compiler of :lucene:core and enable it for all test builds. This is important, as you also have to pass --add-modules jdk.incubator.foreign at runtime!
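Concretely, the module has to be enabled both at compile time and at run time, e.g. (file and class names here are just placeholders):

```shell
# Compile and run with the incubator module enabled; without the runtime
# flag the JVM will fail to resolve jdk.incubator.foreign classes.
javac --add-modules jdk.incubator.foreign MyApp.java
java --add-modules jdk.incubator.foreign MyApp
```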

The code basically just modifies MMapDirectory to use long instead of int for the chunk size parameter. In addition it adds MemorySegmentIndexInput, which is a copy of our ByteBufferIndexInput (still there, but unused), but uses MemorySegment instead of ByteBuffer behind the scenes. It works in exactly the same way; just the try/catch blocks supporting EOFException or moving to another segment were rewritten.

It passes all tests and it looks like you can use it to read indexes. The default chunk size is now 16 GiB (but you can raise or lower it as you like; tests do this). Of course you can set it to Long.MAX_VALUE; in that case every index file is always mapped into one big memory mapping. My testing on Windows 10 has shown that this is not a good idea! Huge mappings fragment the address space over time, and as we can only use around 43 or 46 bits (depending on the OS), the fragmentation will at some point kill you. So 16 GiB looks like a good compromise: most files will be smaller than 6 GiB anyway (unless you optimize your index down to one huge segment). So for most Lucene installations, the number of mappings will equal the number of open files, and heavy Elasticsearch users will be very happy. The sysctl max_map_count may no longer need to be touched.

In addition, this implements readLongs in a better way than @jpountz's version (no caching of arbitrary objects). Nevertheless, as the new MemorySegment API relies on final, unmodifiable classes, and copying memory from a MemorySegment to an on-heap Java array requires us to wrap each such array in a MemorySegment every time (e.g., in readBytes() or readLELongs), there may be some overhead due to short-lived object allocations (those are NOT reusable!). In short: in the future we should stop copying/loading our stuff to heap and maybe throw away IndexInput completely, basing our code fully on random access. The new foreign-vector APIs will in future also be written with MemorySegment in focus. So you can allocate a vector view on a MemorySegment and let the vectorizer work fully outside the Java heap, inside our mmapped files! :-)
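For contrast, the ByteBuffer-based path can bulk-decode little-endian longs from a heap array through a reusable view, without a per-call wrapper object (a plain-Java sketch of the general technique, not Lucene's actual readLELongs):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Bulk-read little-endian longs out of a heap byte[] via a LongBuffer
// view; with this incubator's MemorySegment API the byte[] would have
// to be wrapped in a fresh (non-reusable) segment on every call.
class LeLongs {
    static long[] readLELongs(byte[] src, int count) {
        long[] dst = new long[count];
        ByteBuffer.wrap(src)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asLongBuffer()
                  .get(dst, 0, count);
        return dst;
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[16];
        bytes[0] = 1; // little endian: least significant byte first
        bytes[8] = 2;
        long[] longs = readLELongs(bytes, 2);
        System.out.println(longs[0] + " " + longs[1]); // prints "1 2"
    }
}
```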

It would be good if you could check out this branch and try it in production.

But be aware:

  • You need JDK 17 to compile and run with Gradle (set JAVA_HOME to it)
  • The lucene-core.jar will contain JDK 17 class files and requires JDK 17 to execute.
  • Also you need to add --add-modules jdk.incubator.foreign to the command line of your Java program/Solr server/Elasticsearch server

It would be good to get some benchmarks, especially by @rmuir or @mikemccand.

My plan is the following:

  • report any bugs or slowness, especially with Hotspot optimizations. The last time I talked to Maurizio, he mentioned that Hotspot is not yet able to fully optimize for-loops with long instead of int, so it may take some time until the full performance is there.
  • wait until the final version of project PANAMA-foreign goes into Java's core library (no module needed anymore)
  • add an MR-JAR for lucene-core.jar and compile MemorySegmentIndexInput and maybe some helper classes with JDK 18/19 (hopefully?).
Created at 12 hours ago
issue comment
Initial rewrite of MMapDirectory for JDK-17 preview (incubating) Panama APIs (>= JDK-17-ea-b25)

Closing this as the JDK 19 impl was merged (#912).

Created at 12 hours ago
pull request closed
Initial rewrite of MMapDirectory for JDK-18 preview (incubating) Panama APIs (>= JDK-18-ea-b26)

INFO: This is a follow-up of #177: it's the same code base, but with the API changes from JDK 18 applied

This is just a draft PR for a first insight on memory mapping improvements in JDK 18+.

Some background information: starting with JDK 14, there is a new incubating module "jdk.incubator.foreign" that has a new, not yet stable API for accessing off-heap memory (later it will also support calling functions located in libraries like .so or .dll files using classical MethodHandles). This incubator module has several versions:

  • first version: https://openjdk.java.net/jeps/370 (slow, very buggy, and thread-confined, making it unusable with Lucene)
  • second version: https://openjdk.java.net/jeps/383 (still thread-confined, but now allows transfer of "ownership" to other threads; this is still impossible to use with Lucene)
  • third version in JDK 16: https://openjdk.java.net/jeps/393 (this version added "support for shared segments"). This now allows us to safely use the same external mmapped memory from different threads and also unmap it! This was implemented in the previous pull request #173.
  • fourth version in JDK 17: https://openjdk.java.net/jeps/412 . This mainly changes the API around scopes. Instead of having segments explicitly made "shared", we can assign them to a resource scope which controls their behaviour. The ResourceScope is produced once per IndexInput instance (not per clone) and owns all segments. When the ResourceScope is closed, all segments become invalid - and we throw AlreadyClosedException. The big problem is slowness due to heavy use of new instances just to copy memory between segments and the Java heap. This drives the garbage collector crazy. This was implemented in the previous PR #177.
  • fifth version in JDK 18, included in build 26: https://openjdk.java.net/jeps/419 (current version). This mainly cleans up the API. From Lucene's perspective, the MemorySegment API now has System.arraycopy()-like methods to copy memory between the heap and memory segments. This improves speed. It also handles byte-swapping automatically. This version of the PR also uses ValueLayout instead of VarHandles, as that makes the code more readable and type-safe.

This module more or less overcomes several problems:

  • The ByteBuffer API is limited to 32 bit addressing (in fact MMapDirectory has to chunk files into 1 GiB portions)
  • There is no official way to unmap ByteBuffers when the file is no longer used. There is a way to use sun.misc.Unsafe and forcefully unmap segments, but any IndexInput accessing the file from another thread will crash the JVM with SIGSEGV or SIGBUS. We learned to live with that and we happily apply the unsafe unmapping, but that's the main issue.

@uschindler had many discussions with the team at OpenJDK, and finally, with the third incubator, we have an API that works for Lucene. Those were very fruitful discussions (thanks to @mcimadamore!)

With the third incubator we are now finally able to do some tests (especially performance). As this is an incubating module, this PR first changes the build system a bit:

  • disable -Werror for :lucene:core
  • add the incubating module to the compiler of :lucene:core and enable it for all test builds. This is important, as you also have to pass --add-modules jdk.incubator.foreign at runtime!

The code basically just modifies MMapDirectory to use long instead of int for the chunk size parameter. In addition it adds MemorySegmentIndexInput, which is a copy of our ByteBufferIndexInput (still there, but unused), but uses MemorySegment instead of ByteBuffer behind the scenes. It works in exactly the same way; just the try/catch blocks supporting EOFException or moving to another segment were rewritten.

It passes all tests and it looks like you can use it to read indexes. The default chunk size is now 16 GiB (but you can raise or lower it as you like; tests do this). Of course you can set it to Long.MAX_VALUE; in that case every index file is always mapped into one big memory mapping. My testing on Windows 10 has shown that this is not a good idea! Huge mappings fragment the address space over time, and as we can only use around 43 or 46 bits (depending on the OS), the fragmentation will at some point kill you. So 16 GiB looks like a good compromise: most files will be smaller than 6 GiB anyway (unless you optimize your index down to one huge segment). So for most Lucene installations, the number of mappings will equal the number of open files, and heavy Elasticsearch users will be very happy. The sysctl max_map_count may no longer need to be touched.
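The effect on mapping counts is plain arithmetic (illustrative only): a file needs ceil(fileSize / chunkSize) mappings, so a 5 GiB file drops from five mappings to one when the chunk size goes from 1 GiB to 16 GiB:

```java
// Number of mmap chunks a file of a given size needs: ceil(size / chunk).
class MappingCount {
    static long mappings(long fileSize, long chunkSize) {
        return (fileSize + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        long fiveGiB = 5L << 30;
        System.out.println(mappings(fiveGiB, 1L << 30));  // 5 with 1 GiB chunks
        System.out.println(mappings(fiveGiB, 16L << 30)); // 1 with 16 GiB chunks
    }
}
```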

In addition, this implements readLongs in a better way than @jpountz's version (no caching of arbitrary objects). The new foreign-vector APIs will in future also be written with MemorySegment in focus. So you can allocate a vector view on a MemorySegment and let the vectorizer work fully outside the Java heap, inside our mmapped files! :-)

It would be good if you could check out this branch and try it in production.

According to speed tests it should be as fast as MMapDirectory, partially even faster because less switching between byte buffers is needed. With recent optimizations, long-based absolute access in loops should also be faster.

But be aware:

  • You need JDK 11 or JDK 17 to run Gradle (set JAVA_HOME to it)
  • You need JDK 18-ea-b26 (set RUNTIME_JAVA_HOME to it)
  • The lucene-core.jar will contain JDK 18 class files and requires JDK 18 to execute.
  • Also you need to add --add-modules jdk.incubator.foreign to the command line of your Java program/Solr server/Elasticsearch server

It would be good to get some benchmarks, especially by @rmuir or @mikemccand. Take your time and enjoy the complexity of setting this up! ;-)

My plan is the following:

  • report any bugs or slowness, especially with Hotspot optimizations. The last time I talked to Maurizio, he mentioned that Hotspot is not yet able to fully optimize for-loops with long instead of int, so it may take some time until the full performance is there.
  • wait until the final version of project PANAMA-foreign goes into Java's core library (java.base, no module needed anymore)
  • add an MR-JAR for lucene-core.jar and compile MemorySegmentIndexInput and maybe some helper classes with JDK 18/19 (hopefully?).
  • Add a standalone JDK 18-compiled module as an external JAR. This can be added to the classpath or module path and be used by Elasticsearch or Solr. I will work on a Lucene-external project to do this.
Created at 12 hours ago
issue comment
Initial rewrite of MMapDirectory for JDK-18 preview (incubating) Panama APIs (>= JDK-18-ea-b26)

Closing this as the JDK 19 impl was merged (#912).

Created at 12 hours ago
After update to Lucene 9.4 use `--enable-preview` on Java==19 (exact) to allow mmap use new JDK APIs

Hi, my plan is to have another Gradle SourceSet in Lucene for 20, and then one for 21. The loading logic will be the same (no changes needed).

If I figure out that there are no API changes in JDK 20 (most likely for the part of Panama we use), we may just add a Gradle task that copies the 19 class files to the 20 MR-JAR folder while patching the classfile version. That spares us from auto-provisioning JDK 20. I will see what works. When JDK 21 LTS comes out, we may drop support for 19 and 20 in newer Lucene versions and use a plain MR-JAR variant without any preview-supporting logic.
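Patching the classfile version is a small byte-level operation: per the JVM spec, the major version sits at bytes 6-7 of the header (big-endian), with Java 19 using major 63 and Java 20 major 64; preview class files additionally carry minor version 0xFFFF. A sketch of such a copy task's core (not the actual Gradle task):

```java
// Return a copy of a classfile with the major version (header bytes 6-7,
// big-endian per the JVM spec) replaced, e.g. 63 (Java 19) -> 64 (Java 20).
class ClassfilePatch {
    static byte[] withMajorVersion(byte[] classfile, int major) {
        byte[] out = classfile.clone();
        out[6] = (byte) (major >>> 8);
        out[7] = (byte) major;
        return out;
    }

    public static void main(String[] args) {
        // Minimal header: CAFEBABE magic, preview minor 0xFFFF, major 63.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                         (byte) 0xFF, (byte) 0xFF, 0, 63};
        byte[] patched = withMajorVersion(header, 64);
        System.out.println(patched[6] + " " + patched[7]); // prints "0 64"
    }
}
```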

In short: I will support 20 in addition later.

Created at 16 hours ago
issue comment
After update to Lucene 9.4 use `--enable-preview` on Java==19 (exact) to allow mmap use new JDK APIs

Hi @mark-vieira

Is this something the core/infra team could look into, given this would probably be implemented in JvmErgonomics?

Possibly, but I don't know how this works. The JVM options must be set before the main server is started, as those are not system properties; they are real JVM options. If JvmErgonomics is called from a tool that runs before the main server with the same JVM, it could just do if (Runtime.version().feature() == 19) and then add --enable-preview to the default JVM options. The check must be exact, not >= 19.
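The ergonomics check described here could look like this (a hypothetical helper, not Elasticsearch's actual JvmErgonomics code):

```java
import java.util.List;

// Add --enable-preview only on exactly feature release 19, as described
// above; the check must be exact, not >= 19.
class PreviewFlag {
    static List<String> extraJvmOptions(int featureVersion) {
        if (featureVersion == 19) {
            return List.of("--enable-preview");
        }
        return List.of();
    }

    public static void main(String[] args) {
        // In a real check the argument would be Runtime.version().feature().
        System.out.println(extraJvmOptions(19)); // prints "[--enable-preview]"
        System.out.println(extraJvmOptions(20)); // prints "[]"
    }
}
```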

Created at 1 day ago
After update to Lucene 9.4 use `--enable-preview` on Java==19 (exact) to allow mmap use new JDK APIs

@dbwiddis You can use a similar approach for/with JNA vs. panama-foreign like we do by using an MR-JAR and adding some try/catch logic when loading the classes.

Created at 1 day ago
After update to Lucene 9.4 use `--enable-preview` on Java==19 (exact) to allow mmap use new JDK APIs

Of course, when Lucene comes out tomorrow (hopefully; the schedule suggests it, but something bad can always happen), please update Lucene to 9.4.0 first. I have no idea at which stage the currently used snapshot is: 9.4.0-snapshot-ddf0d0a

Created at 2 days ago
After update to Lucene 9.4 use `--enable-preview` on Java==19 (exact) to allow mmap use new JDK APIs

Ah OK, I did not know which versions you are running. Because the sister project ES is already on JDK 19 for testing and bundles 18 at the moment.

Created at 2 days ago
After update to Lucene 9.4 use `--enable-preview` on Java==19 (exact) to allow mmap use new JDK APIs

Description

Apache Lucene 9.4 will have support for Java 19 Panama APIs to mmap index files (using a MR-JAR). See https://github.com/apache/lucene/pull/912 for more information.

As those APIs are not yet enabled by default in the JDK, we have to still use some opt-in approach, controlled by Java's command line:

  • Lucene by default uses the old implementation based on MappedByteBuffer and several hacks, which may also risk crashing the JVM if an open index is closed from another thread while a search is running (this is well known). If Java 19 is detected, Lucene will log a warning through JUL when MMapDirectory is initialized (see below).
  • If you pass --enable-preview on the Java command line (next to heap settings), it will enable preview APIs in the JDK (https://openjdk.org/jeps/12). Lucene detects this, switches MMapDirectory to the new implementation MemorySegmentIndexInput for its inputs, and uses those new APIs (at the moment it will also log this as an "info" message to JUL). The new APIs are safe and can no longer crash the JVM. But most importantly, all index files are now mapped into memory in portions of 16 GiB instead of 1 GiB. In fact, unless an index is force-merged to one segment, every index file will then consist of only one memory mapping spanning the whole file! This will help Hotspot to further optimize reading, as only one implementation of MemorySegmentIndexInput is used. In addition, the number of mappings is dramatically reduced (approximately five times fewer mappings, because the maximum segment size is 5 GiB by default, and each such segment now uses one mapping instead of five). This may allow users to no longer change sysctl (see max_map_count @ https://opensearch.org/docs/latest/opensearch/install/important-settings/) and go with OS defaults. On the other hand, users may host more indexes with many more segments on one node.

Some TODOs:

  • Make sure that Opensearch also redirects stuff logged via java.util.logging (JUL) to its own log file, so messages do not land on the console. This can be done with log4j by adding the log4j-jul adapter and installing it using a system property in the Bootstrap classes. I have not checked if this is already done. The reason for this is that Apache Lucene logs some events using java.util.logging since Lucene 9.0. Some of those events are MMapDirectory messages (e.g., when unmapping was not working), and a few others appear when some module system settings are incorrect. Logging is very rare, but for this feature it will definitely log using JUL, so it would be good to make sure Opensearch redirects JUL logging correctly to its own loggers. This could be a separate issue!
  • The Opensearch startup script should pass --enable-preview as a command line flag if exactly Java 19 is used to start up Opensearch. If this is not done, a warning gets logged (see above).
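One way to route JUL through log4j is the log4j-jul adapter's documented mechanism: set the JUL LogManager system property before any JUL logger is created. The jar and class names on the classpath below are placeholders; only the system property value is the adapter's real entry point:

```shell
# Install the log4j-jul adapter as the JUL LogManager at startup, so
# Lucene's java.util.logging messages land in the server's own log file.
java -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \
     --enable-preview \
     -cp "server.jar:log4j-jul.jar:log4j-api.jar:log4j-core.jar" MainClass
```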

Important: Lucene 9.4 only supports this on Java 19 (exactly), because the APIs are still in flux. If you start with Java 20, it falls back to the classical MMapDirectory. We will add support for Java 20 in a later release. The reason for this is that the class files of the new implementation are marked with special version numbers that make them ONLY compatible with Java 19, not earlier or later, allowing the JDK to apply changes to the API before its final release in Java 21. But passing --enable-preview to later versions won't hurt, so maybe enable it on all versions >= 19.

A last note: the downside of this new code is that closing and unmapping an index file gets heavier (it triggers a safepoint in the JVM). We have not yet found out how much this impacts servers that open and close index files a lot. Because of this we would really like Amazon/Opensearch to do benchmarking on this, ideally so that their users and customers could optionally enable it. But benchmarking should be done now, because with (hopefully) Java 21, Lucene will use the new implementation by default. Java 20 will be the second and last preview round.

Created at 2 days ago
opened issue
After update to Lucene 9.4 use `--enable-preview` on Java==19 (exact) to allow mmap use new JDK APIs

Description

Apache Lucene 9.4 will have support for Java 19 Panama APIs to mmap index files (using a MR-JAR). See https://github.com/apache/lucene/pull/912 for more information.

As those APIs are not yet enabled by default in the JDK, we still use an opt-in approach:

  • Lucene by default uses the old implementation based on MappedByteBuffer and several hacks, which may also risk crashing the JVM if an open index is closed from another thread while a search is running (this is well known).
  • If you pass --enable-preview on the Java command line (next to heap settings), it will enable preview APIs in the JDK (https://openjdk.org/jeps/12). Lucene detects this and switches the implementation of MMapDirectory and MemorySegmentIndexInput to use those new APIs (at the moment it will also log this as an "info" message to JUL). They are safe and can no longer crash the JVM. But most importantly, all index files are now mapped into memory in portions of 16 GiB instead of 1 GiB. In fact, unless an index is force-merged to one segment, every index file will then consist of only one memory mapping spanning the whole file! This will help Hotspot to further optimize reading, as only one implementation of MemorySegmentIndexInput is used. In addition, the number of mappings is dramatically reduced (approximately five times fewer mappings, because the maximum segment size is 5 GiB by default, and each such segment now uses one mapping instead of five). This may allow users to no longer change sysctl (see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html) and go with OS defaults. On the other hand, users may host more indexes with many more segments on one node.

Things TODO:

  • Make sure that Elasticsearch also redirects stuff logged via java.util.logging (JUL) to its log file. This can be done with log4j by adding the log4j-jul adapter and installing it using a system property in the Bootstrap. I have not checked if this is already done. The reason for this is that Apache Lucene logs some events using java.util.logging since Lucene 9.0. Some of those events are MMapDirectory messages, and there are a few others. Logging is very rare, but for this feature it will definitely log using JUL.
  • The Elasticsearch startup script should pass --enable-preview as a command line flag if exactly Java 19 is used to start up Elasticsearch. If this is not done, a warning gets logged (see above).

Important: Lucene 9.4 only supports this on Java 19 (exactly), because the APIs are still in flux. If you start with Java 20, it falls back to the classical MMapDirectory. We will add support for Java 20 in a later release. The reason for this is that the class files of the new MMapDirectory implementation are marked with special version numbers that make them ONLY compatible with Java 19, not earlier or later, allowing the JDK to apply changes to the API before its final release in Java 21. But passing --enable-preview to later versions won't hurt, so maybe enable it on all versions >= 19.

A last note: the downside of this new code is that closing and unmapping an index file gets heavier (it triggers a safepoint in the JVM). We have not yet found out how much this impacts servers that open and close index files a lot. Because of this we would really like Elastic / Elasticsearch to do benchmarking on this, ideally so that their users and customers could optionally enable it.

Created at 2 days ago
issue comment
GITHUB#11795: Add FilterDirectory to track write amplification factor

Robert is right, why do we need to see the values live? getFilePointer() always works.

Created at 2 days ago
issue comment
Expand TieredMergePolicy deletePctAllowed limits

I was also doing consulting for a huge Elasticsearch user, and they also had this problem of keeping deletes as low as possible; the 20% limit was way too high. 20% looks like an arbitrary limitation.

Created at 3 days ago
issue comment
Release manager should review lucene benchmarks before building release candidates

I fully agree, some checks should be done. But here are a few bits that came to mind:

  • The @mikemccand benchmarks run against the main branch only. So the first check should happen before creating the release branch, so we know that the start of the branch is "clean".
  • Anything else that gets included in the release branch should be monitored by the RM for whether it negatively affects performance on main.
  • Is there a benchmark available for the 9.x branch? If it were configurable / modular, we could have a baseline per branch (against main and in parallel against 9.x). When a release branch is created, we could compare the changes on the release branch with the baselines!?

These are just ideas, but basically the important thing would be: before creating the release branch, do a quick check through the benchmark suite on Mike's web page and look for potential issues. During the release process, when stuff is merged into the release branch, manually monitor all merged changes on main's benchmark.

Created at 3 days ago
delete branch
uschindler delete branch issues/11819-eclipse
Created at 4 days ago
delete branch
uschindler delete branch issue/11820
Created at 4 days ago