a
ˆljö ã @ s0 d dl Z d dlZddlmZ G dd„ deƒZdS )é Né )ÚProbingStatec @ sn e Zd ZdZddd„Zdd„ Zedd„ ƒZd d
„ Zedd„ ƒZ d
d„ Z
edd„ ƒZedd„ ƒZ
edd„ ƒZdS )Ú
CharSetProbergffffffî?Nc C s d | _ || _t t¡| _d S ©N)Ú_stateÚlang_filterÚloggingZ getLoggerÚ__name__Úlogger)Úselfr © r úE/usr/lib/python3.9/site-packages/pip/_vendor/chardet/charsetprober.pyÚ__init__' s zCharSetProber.__init__c C s t j| _d S r )r Z DETECTINGr ©r r r r
Úreset, s zCharSetProber.resetc C s d S r r r r r r
Úcharset_name/ s zCharSetProber.charset_namec C s d S r r )r Úbufr r r
Úfeed3 s zCharSetProber.feedc C s | j S r )r r r r r
Ústate6 s zCharSetProber.statec C s dS )Ng r r r r r
Úget_confidence: s zCharSetProber.get_confidencec C s t dd| ¡} | S )Ns ([ -])+ó )ÚreÚsub)r r r r
Úfilter_high_byte_only= s z#CharSetProber.filter_high_byte_onlyc C s\ t ƒ }t d| ¡}|D ]@}| |dd… ¡ |dd… }| ¡ sL|dk rLd}| |¡ q|S )u9
We define three types of bytes:
alphabet: english alphabets [a-zA-Z]
international: international characters [\x80-\xFF]
marker: everything else [^a-zA-Z\x80-\xFF]
The input buffer can be thought to contain a series of words delimited
by markers. This function works to filter all words that contain at
least one international character. All contiguous sequences of markers
are replaced by a single space ascii character.
This filter applies to all scripts which do not use English characters.
s% [a-zA-Z]*[€-ÿ]+[a-zA-Z]*[^a-zA-Z€-ÿ]?Néÿÿÿÿó €r )Ú bytearrayr ÚfindallÚextendÚisalpha)r ÚfilteredÚwordsZwordZ last_charr r r
Úfilter_international_wordsB s ÿz(CharSetProber.filter_international_wordsc C s¤ t ƒ }d}d}tt| ƒƒD ]n}| ||d … }|dkr