a ˆljöã@s0ddlZddlZddlmZGdd„deƒZdS)éNé)ÚProbingStatec@sneZdZdZddd„Zdd„Zedd„ƒZd d „Zedd„ƒZ d d„Z edd„ƒZedd„ƒZ edd„ƒZdS)Ú CharSetProbergffffffî?NcCsd|_||_t t¡|_dS©N)Ú_stateÚlang_filterÚloggingZ getLoggerÚ__name__Úlogger)Úselfr©rúE/usr/lib/python3.9/site-packages/pip/_vendor/chardet/charsetprober.pyÚ__init__'szCharSetProber.__init__cCstj|_dSr)rZ DETECTINGr©rrrr Úreset,szCharSetProber.resetcCsdSrrrrrr Úcharset_name/szCharSetProber.charset_namecCsdSrr)rÚbufrrr Úfeed3szCharSetProber.feedcCs|jSr)rrrrr Ústate6szCharSetProber.statecCsdS)Ngrrrrr Úget_confidence:szCharSetProber.get_confidencecCst dd|¡}|S)Ns([-])+ó )ÚreÚsub)rrrr Úfilter_high_byte_only=sz#CharSetProber.filter_high_byte_onlycCs\tƒ}t d|¡}|D]@}| |dd…¡|dd…}| ¡sL|dkrLd}| |¡q|S)u9 We define three types of bytes: alphabet: english alphabets [a-zA-Z] international: international characters [Â€-Ã¿] marker: everything else [^a-zA-ZÂ€-Ã¿] The input buffer can be thought to contain a series of words delimited by markers. This function works to filter all words that contain at least one international character. All contiguous sequences of markers are replaced by a single space ascii character. This filter applies to all scripts which do not use English characters. s%[a-zA-Z]*[€-ÿ]+[a-zA-Z]*[^a-zA-Z€-ÿ]?Néÿÿÿÿó€r)Ú bytearrayrÚfindallÚextendÚisalpha)rÚfilteredÚwordsZwordZ last_charrrr Úfilter_international_wordsBsÿz(CharSetProber.filter_international_wordscCs¤tƒ}d}d}tt|ƒƒD]n}|||d…}|dkr characters. Also retains English alphabet and high byte characters immediately before occurrences of >. This filter can be applied to all scripts which contain both English characters and extended ASCII characters, but is currently only used by ``Latin1Prober``. Frró>ós