hakre
Repos
65
Followers
104
Following
21

Events

issue comment
Nested child element parsing

@jwundrak: v0.1.12 with the xpath fix (thanks!) and other fixes is out, I updated jwundrak-nested-child-element-parsing accordingly.

Created at 1 month ago

XMLReaderIterator v0.1.12.

  • fix XMLElementXpathFilter over multiple xpath elements (thanks Julian Wundrak and Harm van Tilborg)
  • fix XMLChildElementIterator with named child elements
  • fix missing NULL return on XMLReaderIterator::current()
  • build: test/developbranches in github action workflow
  • build: gist doc-comment url
  • build: error on uncomitted changes
  • add XMLElementXpathFilter regression test
  • add XMLChildElementIterator regression test
  • add XMLChildElementIterator test for getNodePath()

Co-authored-by: Julian Wundrak j.wundrak@dotsource.de

Support read chlidren without using "skipNextRead"

Provide example

Created at 1 month ago
issue comment
Cannot iterate over the contents of multiple matched XPath elements

Is this a bug or standart behaviour?

This was a bug fixed in v0.1.12 @impcyber. Fix credits @jwundrak.

Created at 1 month ago
Cannot iterate over the contents of multiple matched XPath elements

Nice library hakre.

Maybe I'm using it wrong and is it by design that this happens, or maybe I must use another iterator. This is however what I'm trying to achieve.

Suppose this XML document:

<response>
  <foo />
  <result>
    <log>
      <logs count="3">
        <entry id="1">
          <baz>1</baz>
        </entry>
        <entry id="2">
          <baz>2</baz>
        </entry>
        <entry id="3">
          <baz>3</baz>
        </entry>
      </logs>
    </log>
  </result>
  <bar />
</response>

I'm trying to create an iterator that traverses through the three <entry> elements like this:

$xml_element_iterator = new \XMLElementIterator($xml_reader);
$iterator = new \XMLElementXpathFilter(
  $xml_element_iterator,
  '/response/result/log/logs/entry'
);
foreach ($iterator as $entry) {
  $simplexml = $entry->getSimpleXMLElement();
  // The following line fails
  echo "entry: ".(string) $simplexml->baz."\n";
}

The line fails because $simplexml is NULL. Probably because the xpath filter only gets fed an empty <entry> element. Is this a bug or is it deliberate and should I use another iterator?

Created at 1 month ago

XMLReaderIterator v0.1.12.

  • fix XMLElementXpathFilter over multiple xpath elements (thanks Julian Wundrak and Harm van Tilborg)
  • fix XMLChildElementIterator with named child elements
  • fix missing NULL return on XMLReaderIterator::current()
  • build: test/developbranches in github action workflow
  • build: gist doc-comment url
  • build: error on uncomitted changes
  • add XMLElementXpathFilter regression test
  • add XMLChildElementIterator regression test
  • add XMLChildElementIterator test for getNodePath()

Co-authored-by: Julian Wundrak j.wundrak@dotsource.de

Created at 1 month ago

XMLReaderIterator v0.1.12.

  • fix XMLElementXpathFilter over multiple xpath elements (thanks Julian Wundrak and Harm van Tilborg)
  • fix XMLChildElementIterator with named child elements
  • fix missing NULL return on XMLReaderIterator::current()
  • build: test/developbranches in github action workflow
  • build: gist doc-comment url
  • build: error on uncomitted changes
  • add XMLElementXpathFilter regression test
  • add XMLChildElementIterator regression test
  • add XMLChildElementIterator test for getNodePath()

Co-authored-by: Julian Wundrak j.wundrak@dotsource.de

Created at 1 month ago
hakre create tag v0.1.12
Created at 1 month ago
pull request closed
Allow nested xpath filter

Hi @hakre . Thanks for the library.

This PR resolves #12 . The problem is, that the current XPathFilter only check for exactly one match. If there nested matches, they are not found.

Behavior:

Using example in https://github.com/hakre/XMLReaderIterator/issues/12#issuecomment-276355968 with this script:

$xml = <<<XML
<response>
  <entry>
    <log>
      <logs count="3">
        <entry id="1">
          <baz>1</baz>
        </entry>
        <entry id="2">
          <baz>2</baz>
        </entry>
        <entry id="3">
          <baz>3</baz>
        </entry>
      </logs>
    </log>
  </entry>
</response>
XML;


$reader = XMLReader::open('data://text/plain,' . urlencode($xml));
$iterator = new XMLElementIterator($reader);
$list = new \XMLElementXpathFilter($iterator, '//entry');


/** @var \XMLReaderNode $result */
foreach($list as $item) {
    var_dump($item->getName());
    var_dump($item->getAttribute('id'));
}

Before patch

string(5) "entry"
NULL

After patch

string(5) "entry"
NULL
string(5) "entry"
string(1) "1"
string(5) "entry"
string(1) "2"
string(5) "entry"
string(1) "3"
Created at 1 month ago
issue comment
Allow nested xpath filter

@jwundrak Thanks for the fix, will be released in v0.1.12.

Created at 1 month ago
issue comment
Nested child element parsing

@jwundrak: I've pushed my changes to the develop branch and also resolved conflicts with your PR within jwundrak-nested-child-element-parsing 03b6e8923d0c4d656acec396822abe0c72074f23 .

I could not fully review your changes but I like the idea on first glance, I never considered looking into the element type for that (like closing etc). Also having readNext() as protected to be overridden by child classes. Still would have to chew on it, but I think I start to get the idea.

The self vs. $this issue I also asked myself (I'm not aware of a performance benefit just FYI) and had some changes to that in the earlier fix. In the resolved merge conflict readNext() still exists but is effectively set blank as it is not called any longer as I also changed next() logic in the earlier fix. Please take a look.

My code also has at least one FIXME still which needs to be addressed (but its just return false instead of null, if you stumble over it, just ignore).

You should run the test-suite and provide the fixtures/expectations when developing , e.g. for your example (all examples are automatically tested). Let me know if you need support to get it run on your box. You don't need to run it with all php version we do in the project, but you should run it at least with one php version out of the range 5.3 ... 8.2 master while developing and then CI will tell you some of the php version caveats if any in that range.

The .travis.yml normally gives a good summary on how the build is run (which does the testing). you only need php (with the obvious XML extensions), git and composer (and bash shell) IIRC. Let me know if you run into issues with getting it setup and let me know then as well your operating system.

I changed the PR target against the develop branch please see if the branch mentioned in the beginning is useful to you and feel free to do force pushes for this PR. Unfortunately I didn't look into the XPath PR earlier, I know it has a bug, and this earlier fix of mine only remotely addresses it. The XPath issue is missing tests and I started with this ChildIterator issue first.

Maybe rebasing on develop and a commit with a regression unit-test first, then the fix commit.

Created at 1 month ago
create branch
hakre create branch jwundrak-nested-child-element-parsing
Created at 1 month ago

XMLReaderIterator v0.1.11.

  • fixed endless recursion in XMLAttributeIterator::getReader()
  • fixed dumping trailing white-space
  • fixed length in dumping strings
  • fixed xml-file-scanner example deprecated each() use (php 7.2+/php 8.0)
  • fixed XMLSequenceStream node expansion (php 7.4+)
  • fixed php internal interfaces return type deprecation messages on parse time for Iterator, Countable, ArrayAccess etc. (php 8.1+)
  • fixed php internal function parameter type errors for fopen(), strtok(), strlen() etc. (php 8.1+)
  • fixed a couple of minor code issues and typos
  • build: run .travis.yml (recycle) as github action
  • build: patch vendor for php compatibility (php 5.3 ... 8.1)
  • build: fixed composer version detection
  • build: improved composer package size
  • build: change autoload to classmap (from files)
  • build: capture errors during require/auto loading
  • build: pass standard error stream from phpunit
  • build: fix $TRAVIS reference
  • build: no explicit source preference with composer
  • improved utf-8 support for dump operations
  • improved use of SPDX license list (version 3.0 deprecation)
  • added CDATA and Whitespace node support for XMLWritingIteration (thanks CPCoder and Kacper Woźniak)
  • added warning on writing unsupported node-type
  • added SimpleXMLElement class-name support
  • added XMLElementIterator::skipNextRead()
  • added reading and writing with DOM example
  • added reading and writing with SimpleXML example
  • added XMLChildElementIterator example
  • example tests: skip on missing stream wrapper
  • example tests: clean output buffer on failure
  • example tests: show backtrace on failure
  • maintenance

XMLReaderIterator v0.1.12. (prepare)

  • fixed XMLChildElementIterator with named child elements
  • fixed missing NULL return on XMLReaderIterator::current()
  • added XMLChildElementIterator regression test
  • added XMLChildElementIterator test for getNodePath()

test/develop: add branch names for build

to better integrate w/ pull requests.

Created at 1 month ago
issue comment
Nested child element parsing

Thanks for the initiative, I have to take a look. I also have a children related bug-fix I'd like to pull in first, then would review this PR and give feedback.

Created at 1 month ago
issue comment
Classes extending `IteratorAggregate` can not implement `RecursiveIterator`

My understanding was that, IteratorAggregate is an internal decorator/proxy implementation of a Traversable that provides a/n Iterator&Traversable to iterate over. By implementing IteratorAggregate you are implementing a proxy to Iterator.

Is this correct?

No, it is not correct. The Traversable interface is internal, the IteratatorAggregate interface is exposed (as is Iterator). But both are interfaces so IteratorAggregate can't be a decorator or proxy or any other implementation.

Therefore, in terms of compatibility for Traversable, it can only be resolved to an Iterator at run-time. And a decorator/proxy implementation exists for Traversable, which is probably the default implementation you're actually looking for: IteratorIterator.

So as with PHP 8 SplFixedArray has changed from Iterator to IteratorAggregate but remains a Traversable you can still have your implementation backwards and forwards compatible by adhering to the Traversable interface instead of Iterator or IteratorAggregate.

From raising the issue here, it looks like that this is already part of your understanding to a certain degree, at least in your last comment to which I answer.

However, this does not yet fully resolve your issue as you have found out, that it can't have the concrete Traversable implementation any longer:

  • If it runs on PHP < 8.0 then it needs to implement the Iterator - or -
  • If it runs on PHP >= 8.0 then it needs to have the IteratorAggregate.

But as we did learn, it can't have both. And for sure we don't want to load different classes / definitions under the same name, right?

The resolution is still within understanding the change better. What did happen? Traversable didn't change. This is good as we need it for foreach and similar iteration use. But the way on how the actual iterator is provided did change: From inheritance (extends an Iterator) to aggregation (getIterator() an Iterator).

Now as the subject of the matter, SplFixedArray, changed, anything that extends from it just needs to change, too, it needs to reflect those changes.

And the change is clear: From inheritance to aggregation.

And you already voiced what you're looking for here: A decorator/proxy.

Therefore, may humble we suggest as (one) resolution: Your subject RecursiveFixedArray representing an SplFixedArray being an Iterator becomes an IteratorIterator?:

class RecursiveFixedArray extends IteratorIterator implements RecursiveIterator
{
    public function __construct(int $size = 0) {
        parent::__construct(new SplFixedArray($size));
    }

    public function getChildren(): ?RecursiveIterator
    {
        return $this->current();
    }

    public function hasChildren(): bool
    {
        return $this->current() instanceof RecursiveFixedArray;
    }
}

As IteratorIterator proxies all undefined methods to its inner iterator, it is most likely that this already works. A classic refactoring from inheritance to aggregation for a Traversable in PHP.


However, it won't cover your case as a whole I'd assume, because you also raised another "internal interface", not just iteration, so I'd like to add more discussion to it, so it is possible to also decide on the details we may face with design issues:

How can you recursively traverse a SplFixedArray (no modification), without having to convert SplFixedArray to an array and relying on RecursiveArrayIterator?

While the new class already provides recursive iteration (in its RecursiveIterator implementation) and the direct question could be considered answered, I take the term array as a hint that you'd also like to have ArrayAccess (and likely Countable as well as JsonSerializeable) from the original extended SplFixedArray.

These are special interfaces in PHP that need to be implemented in the proxy when needed, aggregation does not maintain these interfaces on the new class automatically (as those are interfaces) and the method forwarding capability of IteratorIterator to the getInnerIterator() is not enough yet (and wouldn't be either for some other reason, but we're dealing with that anyway).

Therefore, to become feature complete the new replacement implementation needs those to be added as well (here only the exemplary count() method):

class RecursiveFixedArray
    extends IteratorIterator
    implements RecursiveIterator,
               ArrayAccess,
               Countable,
               JsonSerializable

{
    private SplFixedArray $subject;

    public function __construct(int $size = 0) {
        $this->subject = new SplFixedArray($size);
        parent::__construct($this->subject);
    }

    public function count(): int
    {
        return $this->subject->count();
    }

# ...

This is the same for all proxy methods of these interfaces, I'm pretty sure as you're writing about proxies and decorators, this should be straight forward to grasp.

And details again: When we're talking about backward and forward compatible code, types may change and PHP 8.1 requires attributes to hint, which (gladly!) are backwards compatible to hop over the 7.4 |\ 8.0 |\ 8.1 barriers fatal and warning free.

If everything worked right with the proxying there shouldn't be any fatal errors including those involving the PHP 8 new InternalIterator nor any warnings or notices.

Coming that far, you may ask yourself if that is not repeating what uncovered the flaw in the first place: Not wrapping third-party (internal) code. I'd say yes and no. While the original code sort-of was programmed against interfaces, it was reasonable sane so why not do it again? Additionally, it might be that a whole proxy is not even necessary, therefore the extra overhead of implementing Countable etc. in the proxy may not even be necessary. Even better if only the recursive iterator is needed, and now favoring composition over inheritance as the solution for both new and old code, the more straight forward implementation may be:

class SplFixedArrayRecursiveIterator extends IteratorIterator implements RecursiveIterator
{
    public function __construct(SplFixedArray $traversable) {
        parent::__construct($traversable);
    }

    public function getChildren(): ?RecursiveIterator
    {
        return $this->current();
    }

    public function hasChildren(): bool
    {
        return $this->current() instanceof RecursiveFixedArray;
    }
}

While it won't act as a full proxy, it would suffice as a recursive iterator for the subject, would have less implementation and would do the really necessary things already.

It may require more refactoring of the existing code as it ain't a full replacement but then on the other hand, when implementing the proxy it may be already useful to implement a full proxy that is as well preferring composition to inheritance and therefore not have it extend IteratorIterator and instead make that one an internal detail as well. As not being an SplFixedArray it would not inherit the original problem after all (and hint: it could use the SplFixedArrayRecursiveIterator as the iterator aggregate if your looking for some refactoring strategies). But if the problem was only on level of recursion, no extended proxy necessary, just an iterator for which the IteratorIterator boilerplate works very well as the replacement from the previous boilerplate inherited from SplFixedArray.

Hope this helps also other stumbling over it and looking into making code more backwards and forward compatible even if such breaking changes may appear as if they could not be solved at all.

TLDR: If you need to deal with the Traversal internal interface from the PHP code side, look at IteratorIterator for an implementation.

Created at 1 month ago