fb55
Repos: 42
Followers: 450
Following: 45

The fast & forgiving HTML and XML parser

3905
345

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

26436
1440

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.

3332
219

a CSS selector compiler & engine

498
68

Handler for htmlparser2, to get a DOM

285
64

Encode & decode HTML & XML entities with ease & speed

273
53

Events

chore(deps-dev): Bump eslint-plugin-jsdoc from 44.2.7 to 46.1.0 (#1390)

Created at 12 hours ago
delete branch
fb55 delete branch dependabot/npm_and_yarn/eslint-plugin-jsdoc-46.1.0
Created at 12 hours ago
pull request closed
chore(deps-dev): Bump eslint-plugin-jsdoc from 44.2.7 to 46.1.0

Bumps eslint-plugin-jsdoc from 44.2.7 to 46.1.0.

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Created at 12 hours ago
delete branch
fb55 delete branch dependabot/npm_and_yarn/eslint-plugin-jsdoc-46.0.0
Created at 1 day ago

build(deps-dev): bump eslint-plugin-jsdoc from 44.2.5 to 46.0.0 (#3219)

Created at 1 day ago
pull request closed
build(deps-dev): bump eslint-plugin-jsdoc from 44.2.5 to 46.0.0

Bumps eslint-plugin-jsdoc from 44.2.5 to 46.0.0.

Created at 1 day ago
delete branch
fb55 delete branch docs/sponsors
Created at 1 week ago

docs(readme): Update Sponsors (#3180)

Created at 1 week ago
pull request closed
Update Sponsors

Automated changes by create-pull-request GitHub action

Created at 1 week ago
issue comment
Parsing messes up when given a utf-8 input with a BOM

My point is more that some users will not use that.

Agreed. Let's document this properly, so users can make the right choice for themselves.

With <meta charset=utf-8>, that they must always add.

Does the spec mandate a charset? A charset meta tag is also only one of many ways of defining a document as UTF-8, and it is redundant if a UTF-8 BOM is present.

How does windows-1252 vs. UTF-8 affect this project if a string is given that may or may not include a UTF-8 BOM (or one of the other two BOMs)?

This project not supporting the default encoding for web content seems like a big deal.

it looks like DOMParser treats BOMs as regular characters, and puts them in body

"\ufeff" only has a special meaning at the start of a file, and will be treated as regular characters otherwise.
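
The start-of-stream rule can be seen with the standard `TextDecoder`, which is not part of parse5 and is used here only as an illustration. By default it drops a leading UTF-8 BOM while decoding, but keeps the same three bytes as U+FEFF anywhere else. A minimal sketch (the variable names are made up for this example):

```typescript
// Bytes: BOM, "a", BOM, "b" (a UTF-8 BOM is EF BB BF).
const bomDemoBytes = new Uint8Array([
  0xef, 0xbb, 0xbf, 0x61, 0xef, 0xbb, 0xbf, 0x62,
]);

// Default behaviour: the leading BOM is consumed, the interior one stays as U+FEFF.
const decodedDefault: string = new TextDecoder("utf-8").decode(bomDemoBytes);

// With ignoreBOM: true the leading BOM is kept in the output as well.
const decodedKeepBom: string = new TextDecoder("utf-8", {
  ignoreBOM: true,
}).decode(bomDemoBytes);

console.log(decodedDefault.length); // 3: "a", U+FEFF, "b"
console.log(decodedKeepBom.length); // 4: U+FEFF, "a", U+FEFF, "b"
```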

Created at 1 week ago
delete branch
fb55 delete branch dependabot/npm_and_yarn/eslint-plugin-unicorn-47.0.0
Created at 1 week ago

chore(deps-dev): bump eslint-plugin-unicorn from 46.0.0 to 47.0.0 (#927)

Co-authored-by: Felix <188768+fb55@users.noreply.github.com>

Created at 1 week ago
pull request closed
chore(deps-dev): bump eslint-plugin-unicorn from 46.0.0 to 47.0.0

Bumps eslint-plugin-unicorn from 46.0.0 to 47.0.0.

Created at 1 week ago

Disable rules, fix where relevant

Created at 1 week ago
issue comment
How can I remove/replace an attribute?

This functionality is available in Cheerio. The unavailability of new loading methods shouldn't make Cheerio less useful.

Created at 1 week ago
issue comment
Parsing messes up when given a utf-8 input with a BOM

But those BOMs should still be in the string no?

They won't be; e.g. iconv-lite strips BOMs.

The HTML spec tells users to always use UTF-8.

The HTML spec uses Unicode code points internally, but defaults to windows-1252 as the encoding for the vast majority of locales (see the table at the bottom of https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding).

feels insufficient

Why?

For starters, because UTF-8 isn't the default encoding for HTML.

As far as I am aware, this project does not expose byte information in positional info; it instead uses character offsets the way they work in JS strings?

This project does expose line/col positions as well as offsets.

Created at 1 week ago
issue comment
Parsing messes up when given a utf-8 input with a BOM

I was thinking 13.2, but forgot that encoding handling is part of that section. Still, this is quite a complicated subject and should IMHO be handled by a separate module as a pre-processing step (this is what JSDOM and Cheerio do). I maintain https://github.com/fb55/encoding-sniffer as this pre-processing step.

users there often read HTML files from the file system or from over the network

Agreed. But supporting a small subset of possible encodings (only UTF-8 with a BOM) feels insufficient; having a proper split of responsibilities should make it easier for users to know what to expect.

slice/replace messes up positional info, which is also a very useful part that this project provides which is outside of the scope of the HTML spec

This will always be an issue with character encodings: there is currently no easy way to map the original bytes to the code point positions in the input stream.

The expected byte positions also depend on the use-case. Code editors will ignore BOMs when displaying files, so document positions surfaced to users will be wrong if a BOM is taken into account. (And this applies to most character encodings.)

I don't feel like I have a good answer for how to deal with this; I'm open to suggestions.

it’s an understandable request that, [...] at least, it is explained clearly how to use this project from Node.js

Strong yes. Let's figure out how to deal with this issue, then update the docs.
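
For the narrow case where the original file is known to be UTF-8 and was decoded losslessly, the shift between string offsets and byte offsets can be sketched as below. This is only an illustration of the problem discussed above, not something parse5 provides. `utf8ByteOffset` and its parameters are hypothetical names, and the sketch assumes `index` does not fall in the middle of a surrogate pair.

```typescript
const utf8Encoder = new TextEncoder();

// Map a JS string index (UTF-16 code units) to a byte offset in the
// original UTF-8 file. `hadBom` says whether a 3-byte BOM (EF BB BF)
// was removed before the string was handed to the parser.
function utf8ByteOffset(text: string, index: number, hadBom: boolean): number {
  const bomBytes = hadBom ? 3 : 0;
  return bomBytes + utf8Encoder.encode(text.slice(0, index)).length;
}

// "h" is 1 byte and "\u00e9" (e with acute accent) is 2 bytes in UTF-8.
const sample = "h\u00e9llo";
console.log(utf8ByteOffset(sample, 2, false)); // 3
console.log(utf8ByteOffset(sample, 2, true)); // 6
```

Whether the BOM bytes should be counted depends on the consumer: as noted above, an editor that hides the BOM would want the first result, while a tool working on raw bytes would want the second.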

Created at 1 week ago

docs(readme): Mention decoding

Created at 1 week ago
issue comment
Parsing messes up when given a utf-8 input with a BOM

This library implements the HTML parsing spec, which does not expect a BOM.

If you will only ever encounter a UTF-8 BOM, then a simple String.prototype.replace call suffices (or a .slice if you know the BOM will always be present). A BOM generally points towards a need to support different encodings, which is out of scope for this module.
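
Both options mentioned above can be sketched in a few lines. The function names are made up for this example; the resulting string is what would then be handed to the parser.

```typescript
const BOM = "\uFEFF";

// Variant 1: String.prototype.replace. Safe whether or not a BOM is present,
// because the pattern is anchored to the start of the string.
function stripBomWithReplace(input: string): string {
  return input.replace(/^\uFEFF/, "");
}

// Variant 2: slice. Only correct when the BOM is known to be present,
// since it drops the first character unconditionally.
function stripBomWithSlice(input: string): string {
  return input.slice(1);
}

const withBom = BOM + "<!DOCTYPE html><html></html>";
const withoutBom = "<!DOCTYPE html><html></html>";

console.log(stripBomWithReplace(withBom) === withoutBom); // true
console.log(stripBomWithSlice(withBom) === withoutBom); // true
console.log(stripBomWithReplace(withoutBom) === withoutBom); // true
```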

Created at 2 weeks ago
closed issue
Parsing messes up when given a utf-8 input with a BOM

I have attached two files. One of them starts with a utf-8 BOM (0xEF, 0xBB, 0xBF), and the other does not. They are otherwise identical, and look like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    Hello world!
  </body>
</html>

They look the same in my editor, and they act the same when read and printed back out as files, but when I pass them to serialize(parse(fileContents)) the file without the BOM prints

<!DOCTYPE html><html lang="en"><head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
  </head>
  <body>
    Hello world!
  

</body></html>

This all makes sense so far. However, when I do the exact same process to the file with the BOM I get the output

<html lang="en"><head></head><body>

  
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
  
  
    Hello world!
  

</body></html>

As you can see, the contents of the <head> have magically moved down into the <body>.

bom.html.txt bomless.html.txt

Created at 2 weeks ago
issue comment
Parsing messes up when given a utf-8 input with a BOM

Duplicate of https://github.com/inikulin/parse5/issues/111. TL;DR: input stream decoding is out of scope for parse5, but you could use e.g. https://github.com/fb55/encoding-sniffer to decode the input before passing it to parse5.
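
To illustrate what a decoding step in front of the parser does, here is a minimal sketch that only looks at BOMs and uses the standard `TextDecoder`. It is not the encoding-sniffer API and covers far less than the HTML spec's encoding sniffing (no `<meta>` prescan, no transport-layer hints). `sniffBom`, `decodeForParser` and the `fallback` parameter are hypothetical names, and the windows-1252 default is an assumption taken from the discussion in this thread.

```typescript
type BomEncoding = "utf-8" | "utf-16le" | "utf-16be";

// Return the encoding indicated by a leading BOM, or null if there is none.
function sniffBom(bytes: Uint8Array): BomEncoding | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be";
  }
  return null;
}

// Decode raw bytes to a string before parsing. TextDecoder removes the
// BOM that matches its encoding, so the result starts with the markup.
function decodeForParser(bytes: Uint8Array, fallback: string = "windows-1252"): string {
  const encoding = sniffBom(bytes) ?? fallback;
  return new TextDecoder(encoding).decode(bytes);
}

// "<p>" with a UTF-8 BOM, and the same text as UTF-16LE with its BOM.
const utf8WithBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x3c, 0x70, 0x3e]);
const utf16leWithBom = new Uint8Array([0xff, 0xfe, 0x3c, 0x00, 0x70, 0x00, 0x3e, 0x00]);

console.log(decodeForParser(utf8WithBom)); // <p>
console.log(decodeForParser(utf16leWithBom)); // <p>
```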

Created at 2 weeks ago
delete branch
fb55 delete branch dependabot/npm_and_yarn/eslint-plugin-n-16.0.0
Created at 2 weeks ago

build(deps-dev): Bump eslint-plugin-n from 15.7.0 to 16.0.0 (#611)

Created at 2 weeks ago
pull request closed
build(deps-dev): Bump eslint-plugin-n from 15.7.0 to 16.0.0

Bumps eslint-plugin-n from 15.7.0 to 16.0.0.

Created at 2 weeks ago