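I could easily strip any non-Persian (Farsi) characters with mb_ereg_replace() by replacing each one with "-". Arabic and Persian share the same Unicode block (U+0600 to U+06FF), so this code can be used for Arabic too. It assumes the regex encoding is UTF-8.
<?php $string = mb_ereg_replace('[^\x{0600}-\x{06FF}]', '-', $string); ?>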
The Unicode code charts are the reference for finding the character range of each script:
http://unicode.org/charts/
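Once a range has been looked up in the charts, the same idea can be wrapped in a small helper. This is only a sketch: replaceOutsideRange() is a hypothetical name, and it assumes UTF-8 input and that the mbstring regex engine accepts the \x{...} code point escape (its default Ruby syntax does, as far as I know).

```php
<?php
// Hypothetical helper (not part of mbstring): replaces every character whose
// code point lies outside [$first, $last] with $replacement.
function replaceOutsideRange($string, $first, $last, $replacement)
{
    // Assumption: $string is UTF-8, so the regex encoding is set to match.
    mb_regex_encoding('UTF-8');

    // Build a negated character class such as [^\x{0600}-\x{06FF}].
    // Single quotes keep PHP from interpreting the backslashes.
    $pattern = sprintf('[^\x{%04X}-\x{%04X}]', $first, $last);

    return mb_ereg_replace($pattern, $replacement, $string);
}

// Arabic block, shared by Arabic and Persian, taken from the Unicode charts.
$clean = replaceOutsideRange('abc', 0x0600, 0x06FF, '-');
```

Passing 0x05D0 and 0x05EA instead would keep only the Hebrew letters.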
mb_ereg
(PHP 4 >= 4.2.0, PHP 5)
mb_ereg — Regular expression match with multibyte support
Description
int mb_ereg ( string $pattern , string $string [, array $regs ] )
Executes the regular expression match with multibyte support.
Parameters
Return Values
Returns 1 if a match is found. If the optional regs parameter is passed, the function instead returns the byte length of the matched part, and the array regs will contain the substrings of the matched string. A match against the empty string also returns 1. If no match is found or an error occurs, FALSE is returned.
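A short sketch of how the return value and regs might be used together. The pattern and input string are made up for illustration. The exact success value depends on the PHP version (newer releases return a boolean rather than a length), so the example only compares against FALSE.

```php
<?php
// Assumption: the script and its strings are UTF-8.
mb_regex_encoding('UTF-8');

// Illustrative pattern: a four-digit year and a two-digit month.
$result = mb_ereg('(\d{4})-(\d{2})', 'Released 2010-05', $regs);

if ($result !== false) {
    // $regs[0] holds the whole match, $regs[1] onward the parenthesized groups.
    echo "Whole match: ", $regs[0], "\n";
    echo "Year: ", $regs[1], ", month: ", $regs[2], "\n";
} else {
    echo "No match.\n";
}
```

Testing with `!== false` rather than `== 1` keeps the check valid whether or not regs is passed.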
Notes
Note:
The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
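As a minimal sketch of the note above, the regex encoding can be set explicitly before matching and restored afterwards. The sample word is arbitrary, and the file is assumed to be saved as UTF-8.

```php
<?php
// Remember the current regex encoding so it can be restored later.
$previous = mb_regex_encoding();

// Tell the multibyte regex functions to treat patterns and subjects as UTF-8.
mb_regex_encoding('UTF-8');

// 'é' is a two-byte sequence in UTF-8; the quantifier applies to the whole
// character rather than to its last byte.
$found = mb_ereg('é+', 'café') !== false;

// Put the previous encoding back to avoid surprising other code.
mb_regex_encoding($previous);
```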
See Also
- mb_regex_encoding() - Returns current encoding for multibyte regex as string
- mb_eregi() - Regular expression match ignoring case with multibyte support
arash at hemmat dot biz
17-May-2010 08:32
Jon
10-Apr-2009 10:22
Hebrew regex tested on PHP 5, Ubuntu 8.04.
Seems to work fine without the mb_regex_encoding lines (commented out).
Didn't seem to work with \uxxxx (also commented out).
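<?php
// Snippet taken from inside a class method; $this->current_line holds the line to test.
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
// The \uXXXX form below did not match, so literal characters are used instead.
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $this->current_line))
if(mb_ereg(".*([א-ת]).*", $this->current_line))
{
echo "has";
}
else
{
echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>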
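On the \uxxxx form that did not work above: a possible alternative, assuming a UTF-8 regex encoding and the engine's default (Ruby) syntax, is the \x{...} code point escape. The sketch below uses a hypothetical hasHebrew() helper and has not been checked on the PHP 5 / Ubuntu 8.04 setup described in this note.

```php
<?php
// Hypothetical helper: true if $line contains at least one Hebrew letter
// in the range U+05D0 (alef) to U+05EA (tav).
function hasHebrew($line)
{
    // Assumption: $line is UTF-8, so the regex encoding is set to match.
    mb_regex_encoding('UTF-8');

    // Single quotes keep PHP from touching the backslashes; the regex engine
    // itself interprets \x{05D0} and \x{05EA} as code points.
    return mb_ereg('[\x{05D0}-\x{05EA}]', $line) !== false;
}

echo "Line ", hasHebrew('שלום') ? "has" : "doesn't have", " Hebrew characters.\n";
```

Unlike the literal-character version, this pattern does not depend on the encoding in which the source file itself was saved.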
