p( 7 V d Z ddlZddlZddlmZ dgZ ej d Z ej d Z ej d Z ej d Z ej d Z ej d Z ej d Z ej d Z ej d Z ej d Z ej dej Z ej dej Z ej dej Z ej d Z ej d Z G d dej ZdS )zA parser for HTML and XHTML. N)unescape HTMLParserz[&<]z &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]z [a-zA-Z]>z--!?>z-?>z0([a-zA-Z][^\t\n\r\f />]*)(?:[\t\n\r\f ]|/(?!>))*a{ ( (?<=['"\t\n\r\f /])[^\t\n\r\f />][^\t\n\r\f /=>]* # attribute name ) ([\t\n\r\f ]*=[\t\n\r\f ]* # value indicator ('[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\t\n\r\f ]* # bare value ) )? (?:[\t\n\r\f ]|/(?!>))* # possibly followed by a space a [a-zA-Z][^\t\n\r\f />]* # tag name [\t\n\r\f /]* # optional whitespace before attribute name (?:(?<=['"\t\n\r\f /])[^\t\n\r\f />][^\t\n\r\f /=>]* # attribute name (?:[\t\n\r\f ]*=[\t\n\r\f ]* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\t\n\r\f ]* # bare value ) )? [\t\n\r\f /]* # possibly followed by a space )* >? aF <[a-zA-Z][^\t\n\r\f />\x00]* # tag name (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\s]* # bare value ) \s* # possibly followed by a space )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z#\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>c e Zd ZdZdZdZddddZd Zd Zd Z dZ d Zdd dZd Z d$dZd Zd Zd$dZd%dZd Zd Zd Zd Zd Zd Zd Zd Zd Zd Zd Zd! Zd" Zd# Z dS )&r aE Find tags and other markup and call handler functions. Usage: p = HTMLParser() p.feed(data) ... p.close() Start tags are handled by calling self.handle_starttag() or self.handle_startendtag(); end tags by self.handle_endtag(). The data between tags is passed from the parser to the derived class by calling self.handle_data() with the data as argument (the data may be split up in arbitrary chunks). If convert_charrefs is True the character references are converted automatically to the corresponding Unicode character (and self.handle_data() is no longer split in chunks), otherwise they are passed by calling self.handle_entityref() or self.handle_charref() with the string containing respectively the named or numeric reference as the argument. )scriptstylexmpiframenoembednoframes)textareatitleTF)convert_charrefs scriptingc J || _ || _ | dS )az Initialize and reset this instance. If convert_charrefs is true (the default), all character references are automatically converted to the corresponding Unicode characters. If *scripting* is false (the default), the content of the ``noscript`` element is parsed normally; if it's true, it's returned as is without being parsed. N)r r reset)selfr r s 2/opt/alt/python311/lib64/python3.11/html/parser.py__init__zHTMLParser.__init__v s$ !1" c d| _ d| _ t | _ d| _ d| _ d| _ t j | dS )z1Reset this instance. Loses all unprocessed data. z???NT) rawdatalasttaginteresting_normalinteresting cdata_elem_support_cdata _escapable_markupbase ParserBaser r s r r zHTMLParser.reset sK -"$$T*****r c N | j |z | _ | d dS )zFeed data to the parser. Call this as often as you want, with as little or as much text as you want (may include '\n'). r N)r goaheadr datas r feedzHTMLParser.feed s% |d*Qr c 0 | d dS )zHandle any buffered data. N)r$ r" s r closezHTMLParser.close s Qr Nc | j S )z)Return full source of start tag: '<...>'.)_HTMLParser__starttag_textr" s r get_starttag_textzHTMLParser.get_starttag_text s ##r escapablec | | _ || _ | j dk rt j d | _ d S |rB| j s;t j d| j z t j t j z | _ d S t j d| j z t j t j z | _ d S )N plaintextz\Zz&|%s(?=[\t\n\r\f />])z%s(?=[\t\n\r\f />])) lowerr r recompiler r IGNORECASEASCII)r elemr/ s r set_cdata_modezHTMLParser.set_cdata_mode s **,,#?k))!z%00D Bt4 B!z*Dt*V*,-*@ B BD "z*BT_*T*,-*@ B BDr c : t | _ d | _ d| _ d S )NT)r r r r r" s r clear_cdata_modezHTMLParser.clear_cdata_mode s -r c || _ dS )a Enable or disable support of the CDATA sections. If enabled, "<[CDATA[" starts a CDATA section which ends with "]]>". If disabled, "<[CDATA[" starts a bogus comments which ends with ">". This method is not called by default. Its purpose is to be called in custom handle_starttag() and handle_endtag() methods, with value that depends on the adjusted current node. See https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state for details. N)r )r flags r _set_support_cdatazHTMLParser._set_support_cdata s #r c | j }d}t | }||k rI| j r}| j sv| d| }|dk rY| dt ||dz }|dk r*t j d || sn|}n=| j || }|r| }n| j rn|}||k rV| j r2| j r+| t ||| n| ||| | || }||k rn|j } |d| rt" || r| | } n |d| r| | } n |d| r| | } nl |d| r| | } nJ |d | r| | } n(|d z |k s|r| d |d z } nn| dk r|snt" || rn |d| r_|dz |k r| d nt0 || rnd| ||dz d nB |d| rU|}dD ]/} | | |d z r|t | z } n0| ||d z | n |d| r(| j r!| ||dz d n|||dz dk r!| ||dz d ni |d | r!| ||dz d n< |d| r!| ||dz d ntA d |} | || }n# |d| rtB || }|rq|" dd }| # | |$ } |d| d z s| d z } | || }d||d v r9| |||dz | ||dz }nI |d| r5tJ || }|rj|" d }| &