import logging
import re
from typing import Optional, Union

from .enums import LanguageFilter, ProbingState

# Matches a "word" that contains at least one high (0x80-0xFF) byte, optionally
# followed by one marker byte (anything that is neither an ASCII letter nor a
# high byte).
INTERNATIONAL_WORDS_PATTERN = re.compile(
    b"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


class CharSetProber:

    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter: LanguageFilter = LanguageFilter.NONE) -> None:
        self._state = ProbingState.DETECTING
        self.active = True
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)

    def reset(self) -> None:
        self._state = ProbingState.DETECTING

    @property
    def charset_name(self) -> Optional[str]:
        return None

    @property
    def language(self) -> Optional[str]:
        raise NotImplementedError

    def feed(self, byte_str: Union[bytes, bytearray]) -> ProbingState:
        raise NotImplementedError

    @property
    def state(self) -> ProbingState:
        return self._state

    def get_confidence(self) -> float:
        return 0.0

    @staticmethod
    def filter_high_byte_only(buf: Union[bytes, bytearray]) -> bytes:
        # Replace every run of ASCII bytes (0x00-0x7F) with a single space,
        # so only the high bytes survive.
        buf = re.sub(b"([\x00-\x7F])+", b" ", buf)
        return buf

    @staticmethod
    def filter_international_words(buf: Union[bytes, bytearray]) -> bytearray:
        """
        We define three types of bytes:
            alphabet: English letters [a-zA-Z]
            international: international characters [\x80-\xFF]
            marker: everything else [^a-zA-Z\x80-\xFF]

        The input buffer can be thought of as a series of words delimited by
        markers. This function keeps only the words that contain at least one
        international character. Contiguous markers, together with the
        pure-ASCII words between them, are reduced to at most one ASCII space.

        This filter applies to all scripts that do not use English letters.
        """
        filtered = bytearray()

        # Each match is a word with at least one international byte, plus at
        # most one trailing marker byte.
        words = INTERNATIONAL_WORDS_PATTERN.findall(buf)

        for word in words:
            filtered.extend(word[:-1])

            # If the last byte is a marker, replace it with a space: markers are
            # used similarly across languages and should not affect the analysis.
            last_char = word[-1:]
            if not last_char.isalpha() and last_char < b"\x80":
                last_char = b" "
            filtered.extend(last_char)

        return filtered
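# Illustrative sketch, not part of chardet: standalone copies of the two byte
# filters above, so their behaviour can be checked without importing the
# package (the relative `.enums` import means this module cannot run on its
# own). The names `_DEMO_WORDS_PATTERN`, `_demo_filter_high_byte_only` and
# `_demo_filter_international_words` are hypothetical helpers introduced only
# for this example; they mirror the logic above rather than being chardet API.
import re

# Same byte pattern as INTERNATIONAL_WORDS_PATTERN above.
_DEMO_WORDS_PATTERN = re.compile(b"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?")


def _demo_filter_high_byte_only(buf: bytes) -> bytes:
    # Every run of ASCII bytes (0x00-0x7F) becomes one space.
    return re.sub(b"([\x00-\x7F])+", b" ", buf)


def _demo_filter_international_words(buf: bytes) -> bytearray:
    filtered = bytearray()
    for word in _DEMO_WORDS_PATTERN.findall(buf):
        # Keep the body of the word unchanged.
        filtered.extend(word[:-1])
        # bytes.isalpha() is true only for ASCII letters, so a trailing ASCII
        # marker (space, punctuation, digit) is normalised to one space, while
        # a trailing letter or high byte is kept as-is.
        last_char = word[-1:]
        if not last_char.isalpha() and last_char < b"\x80":
            last_char = b" "
        filtered.extend(last_char)
    return filtered


# Expected results:
#   _demo_filter_high_byte_only(b"abc\xe9\xe8def")
#       -> b" \xe9\xe8 "
#   _demo_filter_international_words(b"caf\xe9 au lait, na\xefve!")
#       -> bytearray(b"caf\xe9 na\xefve ")   (the pure-ASCII words are dropped)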