# /builddir/build/BUILDROOT/alt-python311-pip-21.3.1-4.el9.x86_64/opt/alt/python311/lib/python3.11/site-packages/pip/_vendor/chardet/charsetprober.py

import logging
import re

from .enums import ProbingState


class CharSetProber(object):

    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter=None):
        self._state = None
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)

    def reset(self):
        self._state = ProbingState.DETECTING

    @property
    def charset_name(self):
        return None

    def feed(self, buf):
        pass

    @property
    def state(self):
        return self._state

    def get_confidence(self):
        return 0.0

    @staticmethod
    def filter_high_byte_only(buf):
        # Collapse every run of ASCII bytes into a single space.
        buf = re.sub(b'([\x00-\x7F])+', b' ', buf)
        return buf

    @staticmethod
    def filter_international_words(buf):
        """
        We define three types of bytes:
        alphabet: english alphabets [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]

        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.

        This filter applies to all scripts which do not use English characters.
        """
        filtered = bytearray()

        # Match only words that contain at least one international byte.
        # A match may end with a single trailing marker byte.
        words = re.findall(b'[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?',
                           buf)

        for word in words:
            filtered.extend(word[:-1])

            # If the last byte of the word is a marker, replace it with a
            # space so that markers do not affect the analysis.
            last_char = word[-1:]
            if not last_char.isalpha() and last_char < b'\x80':
                last_char = b' '
            filtered.extend(last_char)

        return filtered