Help with Chinese Language

NiagaraPete

Major Contributor
Forum Donor
Joined
Jun 23, 2021
Messages
2,190
Likes
1,960
Location
Canada
IP blocking is way too blunt an instrument. How about these phrases:
认证本科毕业证 (certified undergraduate diploma)
仿真毕业证 (replica diploma)
证使馆认证 (embassy certification)
I block all of Russia and China plus a couple others. Not worth the hassle.
 

NiagaraPete

Can I assume you're using the StopForumSpam DB and DNSBL DB?
 

Presently42

Active Member
Joined
Jul 30, 2019
Messages
174
Likes
240
Location
Montreal, Quebec, Canada
I reported one of these a while back, so I'm sad to see this spam is more than just a one-off. Anyway, the characters are simplified, which strongly implies that the spam originates in mainland China: traditional characters are used in Taiwan and Hong Kong (Japan uses a third set). All three posts contain the number sequence 993398773, as well as 大学 (dàxué), meaning university.

Source: I studied Standard Chinese for a few years. It's been a while, admittedly, and sadly I no longer claim to be conversant in the language.

Edit: Given that Standard Chinese generally doesn't put spaces between words, can your spam filter pick up 大学 when the two characters occur together inside a longer string, rather than as a separate word? That being said, if you are flagging any use of a language other than English in posts and titles, I suppose the difference isn't terribly important.
 
OP
amirm

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,597
Likes
239,663
Location
Seattle Area
Edit: Noting, that Standard Chinese generally doesn't have spacing between words, can your spam filter pick up, that 大学 must occur together, rather than separately?
That's the unknown. I am using wildcards on either side of the spam word, so hopefully it does. I did trap him once this week, so I am hoping it works.
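A minimal Python sketch of why the wildcards matter, assuming the forum's `*` wildcard behaves like a plain "match anywhere" (XenForo's exact matching rules are not confirmed in this thread). Python's `re` treats CJK ideographs as word characters, so when there are no spaces around it, a whole-word pattern `\b大学\b` finds nothing, while a wildcard pattern `*大学*` still hits. The helper names and sample strings are hypothetical.

```python
import fnmatch
import re


def whole_word_hit(text: str, phrase: str) -> bool:
    """Match the phrase only when it stands alone as a word (\\b on both sides)."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def wildcard_hit(text: str, phrase: str) -> bool:
    """Match '*phrase*', i.e. the phrase anywhere in the text (assumed wildcard semantics)."""
    return fnmatch.fnmatchcase(text, f"*{phrase}*")


# Hypothetical spam-like text: no spaces, so 大学 sits between other ideographs.
unspaced = "办理大学毕业证"
# The same text with spaces added, as English-style word matching expects.
spaced = "办理 大学 毕业证"

for sample in (unspaced, spaced):
    print(sample, whole_word_hit(sample, "大学"), wildcard_hit(sample, "大学"))
```

With the unspaced sample the whole-word check returns False and the wildcard check returns True; once spaces are added, both return True.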
 
OP
amirm

Following are the products they are selling:
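文凭 (diploma)
毕业证书 (graduation certificate)
成绩单 (academic transcript)
认证 (certification/authentication)
学位 (degree)

Following is their contact ID:
Q/微993398773 (QQ/WeChat 993398773)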
Thanks. I added all of this. Let's see what happens.
 
OP
amirm

I reported one of these a while back, so I'm sad to see this spam is more than just a one-off.
He was there when we first started. Kept it up for a while but then disappeared. Then he came back a year or two later, and left again. Now he is back and has been around for a month or so. I have upped the various countermeasures making his life more difficult so he may disappear again.
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,247
Likes
17,162
Location
Riverview FL
[Image]
 

RayDunzl

What does it say on the back of the monitor so I can block that too!

Google Lens says:

[Image]


He was visiting "People's Daily" in 2016.

And hacking your site, of course.
 

JeffS7444

Major Contributor
Forum Donor
Joined
Jul 21, 2019
Messages
2,363
Likes
3,546
Would it be practical to have posts from new users initially go into a moderation queue? That would at least force spammers / spies to produce legitimate audio content before they could post their curiously coded messages <-(product of an overactive imagination)
 
OP
amirm

Couldn't you just blacklist the entire Chinese character set?
I hadn't figured out how, but the link above seems to show how to do it, so I will look at that.
 
OP
amirm

Would it be practical to have posts from new users initially go into a moderation queue?
Yes. The problem is that we will be punishing the good guys in the process. I am doing everything in my power to not do that. It is not very welcoming to moderate new users this way.
 

oyama

New Member
Joined
Mar 8, 2022
Messages
2
Likes
8
Location
Tokyo
Hello,

If you add the following line to XenForo's "Spam phrases", it will act as a "wildcard" match for Chinese ideographs.

/\p{Han}/u

Han is the Unicode script that covers the ideographic characters used in Chinese, Japanese, Korean, and Vietnamese.
This means that \p{Han} would match characters from multiple languages.
I don't think that will cause many problems with the text in this forum :)
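Python's built-in `re` module does not understand `\p{Han}` (PHP's PCRE and the third-party `regex` package do), so this sketch approximates it with explicit code-point ranges for the main CJK ideograph blocks. It is an assumption-laden stand-in, not the full Unicode Han script, which also includes radicals, later extension blocks, and marks such as 々. It reproduces the point above that the match is not limited to Chinese. The helper name and sample strings are hypothetical.

```python
import re

# Approximation of \p{Han}: CJK Unified Ideographs Extension A (U+3400..U+4DBF),
# the main CJK Unified Ideographs block (U+4E00..U+9FFF), CJK Compatibility
# Ideographs (U+F900..U+FAFF), and U+20000..U+2FA1F (Extensions B to F plus
# the Compatibility Ideographs Supplement).
HAN_APPROX = re.compile(
    r"[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\U00020000-\U0002fa1f]"
)


def contains_han(text: str) -> bool:
    """True if any character falls in the approximated Han ranges."""
    return HAN_APPROX.search(text) is not None


samples = {
    "Chinese": "认证本科毕业证",
    "Japanese kanji": "東京",
    "Japanese kana": "ひらがな",
    "Korean Hangul": "한국어",
    "English": "Help with Chinese Language",
}
for label, text in samples.items():
    print(f"{label}: {contains_han(text)}")
```

The Japanese kanji sample is flagged alongside the Chinese one, which is the multi-language behavior described above, while kana, Hangul, and Latin text pass.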

 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,372
Likes
18,290
Location
Netherlands
It's not really strange to block non-English character sets. It's an English-language forum, after all.
 

jae

Major Contributor
Joined
Dec 2, 2019
Messages
1,208
Likes
1,508
Very elegant solution. Welcome to ASR.

Perhaps it is better to filter the problematic phrases related to the spam first and see how well that works on its own. I've noticed that in regular conversation here, members have more than once posted excerpts from other sites, usually in Chinese, along with a translation, when a Chinese brand launches a new product and the news has not yet reached Western audiences. I remember another case: someone was having trouble reading a Chinese or Japanese manual or guide for a product, and I provided a transcription and translation. While not extremely common on an English-speaking forum, there may be valid, non-spam reasons to use those characters.
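To make the trade-off concrete, here is a small Python sketch comparing the two policies on two hypothetical posts: a blanket rule that rejects any ideograph (approximated with only the main CJK Unified Ideographs block, U+4E00 to U+9FFF, as a stand-in for `/\p{Han}/u`) and a phrase list built from the terms reported earlier in the thread. The function names and sample posts are made up for illustration.

```python
import re

# Blanket policy: reject any character from the main CJK Unified Ideographs block
# (a rough stand-in for the /\p{Han}/u spam phrase).
HAN_MAIN_BLOCK = re.compile(r"[\u4e00-\u9fff]")

# Phrase policy: terms reported in this thread, matched anywhere in the text.
SPAM_PHRASES = ["文凭", "毕业证书", "成绩单", "认证", "学位", "993398773"]


def blanket_block(post: str) -> bool:
    """Flag the post if it contains any ideograph at all."""
    return HAN_MAIN_BLOCK.search(post) is not None


def phrase_block(post: str) -> bool:
    """Flag the post only if it contains one of the listed spam phrases."""
    return any(phrase in post for phrase in SPAM_PHRASES)


# Hypothetical posts for illustration only.
posts = {
    "spam": "办理国外文凭 毕业证书 Q/微993398773",
    "product news": "New DAC announced by the maker: 新品发布 (new product launch).",
}
for label, post in posts.items():
    print(f"{label}: blanket={blanket_block(post)}, phrases={phrase_block(post)}")
```

Both rules catch the spam, but only the phrase list lets the product-news excerpt through. Short, generic terms such as 认证 (certification) could still catch legitimate audio posts, so the phrase list may need pruning over time.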
 

001

Addicted to Fun and Learning
Forum Donor
Joined
Oct 21, 2020
Messages
548
Likes
985
At the risk of going off-thread, I can heartily recommend the book "The Chinese Typewriter: A History". It's an incredible look at the difficulty of the language and the development of a machine for communicating in it. You'll discover, among other things, that 'predictive typing' has been around since about 1952. Just amazing. And for those of you in the Los Angeles area, one of the very few remaining Chinese typewriters is on display at the Huntington Gardens.

I mention it as an insight into the difficulty of 'defining' language and words when you're dealing with ideographs and homophones, which gets trickier when, for instance, there's a need to stop spam. Again, a wonderful, albeit somewhat heavy-duty, read.
(This illustrates the complexity also: https://blog.tutorabcchinese.com/expats/why-there-is-no-chinese-alphabet)

[Image: cover of "The Chinese Typewriter: A History" by Thomas S. Mullaney]
 