Legacy Strange error - regex in blogs

Status
Not open for further replies.

janslu

Customer
More detailed description of the problem:

Yesterday I migrated from MySQL 5.6 to MariaDB 10.0 (current stable). This required a few more changes to the MySQL packages, namely a switch to the mysqlnd driver underlying the mysql and mysqli PHP 5.5 extensions. But all in all, everything SHOULD be working as before. In the meantime I've had to tweak a few settings for PHP error logging and reporting, but nothing really important.

I get multiple errors like the one below every hour. There's always the /blogs/xxxxxxx part, and there's some kind of connection between the regexp and the actual blog "name" in the URL: some letters are clearly visible, but between them is a strange form of transliteration turned into a regex...

UPDATE:
I've finally got a few errors with referrers included. The error shows up when Polish accented characters are transliterated. For example, there's a user named "Ćwierć nuty", the last poster on one of the threads here: Dzieci urodzone w lutym 2013 (3/4 of the way towards the bottom). His/her blog is under
Code:
http://www.babyboom.pl/forum/blogs/cwierc-nuta/
but this link causes an error:


Code:
Database error in DragonByte SEO 2.0.0 Beta 5:

Invalid SQL:

					SELECT userid 
					FROM vb_user 
					WHERE username REGEXP '^(&[\#\da-z]*;|[^a-z\d])*(ć|[cÇc])w[iÌÍÎÏìíîï](Ä™|[eÈÉÊËèéêë])r(ć|[cÇc])(&[\#\da-z]*;|[^a-z\d])*(Ć„|[nŃń])([uÙÚÛÜùúûü”]|u|Æ|æ)t(Ä
|[aÀÁÂĂÄĆàáâăäć])(&[\#\da-z]*;|[^a-z\d])*$' 
					LIMIT 1;

MySQL Error   : Got error 'invalid UTF-8 string at offset 28' from regexp
Error Number  : 1139
Request Date  : Friday, April 24th 2015 @ 09:15:38 AM
Error Date    : Friday, April 24th 2015 @ 09:15:38 AM
Script        : http://www.babyboom.pl/forum/blogs/cwierc-nuta/
Referrer      : http://www.babyboom.pl/forum/dzieci-urodzone-w-lutym-2013-a-f465/
IP Address    : 94.228.34.203
Username      : N/A
Classname     : DBSEO_Database_Slave_MySQLi
MySQL Version :
Other examples:
Code:
Database error in DragonByte SEO 2.0.0 Beta 5:

Invalid SQL:

					SELECT userid 
					FROM vb_user 
					WHERE username REGEXP '^(&[\#\da-z]*;|[^a-z\d])*(膮|[a懒旅呐噌忏溴])(艣|[s姎])[i掏蜗祉铒](膮|[a懒旅呐噌忏溴])k(&[\#\da-z]*;|[^a-z\d])*(&[\#\da-z]*;|[^a-z\d])*$' 
					LIMIT 1;

MySQL Error   : Got error 'invalid UTF-8 string at offset 28' from regexp
Error Number  : 1139
Request Date  : Friday, April 24th 2015 @ 01:03:37 AM
Error Date    : Friday, April 24th 2015 @ 01:03:37 AM
Script        : http://www.babyboom.pl/forum/blogs/asiak-/
Referrer      : 
IP Address    : 91.121.10.126
Username      : N/A
Classname     : DBSEO_Database_Slave_MySQLi
MySQL Version :

==============

Invalid SQL:

                   SELECT userid 
                   FROM vb_user 
                   WHERE username REGEXP '^(&[\#\da-z]*;|[^a-z\d])*b(Ä™|[eÈÉÊËèéêë])(Ä…|[aÀÁÂÃÄÅàáâãäå])([uÙÚÛÜùúûüµ]|u|Æ|æ)t[yŸÝýÿ]b(Ä…|[aÀÁÂÃÄÅàáâãäå])b[yŸÝýÿ]b(ó|[oÒÓÔÕÖØòóôõöø])[yŸÝýÿ](&[\#\da-z]*;|[^a-z\d])*w(ó|[oÒÓÔÕÖØòóôõöø])rdpr(Ä™|[eÈÉÊËèéêë])((Å›|[sŠš])(Å›|[sŠš])|ß)(&[\#\da-z]*;|[^a-z\d])*(&[\#\da-z]*;|[^a-z\d])*$' 
                   LIMIT 1;

MySQL Error   : Got error 'invalid UTF-8 string at offset 29' from regexp
Error Number  : 1139
Request Date  : Friday, April 24th 2015 @ 08:27:11 AM
Error Date    : Friday, April 24th 2015 @ 08:27:11 AM
Script        : http://www.babyboom.pl/forum/blogs/beautybabyboy-wordpress-/
Referrer      : 
IP Address    : 104.154.16.207
Username      : N/A
Classname     : DBSEO_Database_Slave_MySQLi

I have observed that these are generated by visiting
Code:
http://www.babyboom.pl/forum/blogs/NONEXISTINGBLOG

Any idea what this may be? Any help will be appreciated...
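
A note on the garbled sequences in the queries above: pairs such as `Ä™`, `Ä…` and `Å›` are what the UTF-8 bytes of ę, ą and ś look like when they are read as Windows-1252. A minimal Python sketch of that effect (Python is used only for illustration here; the function name is made up, and this says nothing about where in the stack the misreading happens):

```python
# Polish letters as they might appear in a UTF-8 encoded custom filter.
letters = ["ę", "ą", "ś"]


def as_seen_in_cp1252(text: str) -> str:
    """Encode as UTF-8, then misread the resulting bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


mojibake = {ch: as_seen_in_cp1252(ch) for ch in letters}

for ch, garbled in mojibake.items():
    print(ch, "->", garbled)
```

The two-character sequences it prints are the same ones visible in front of the pipe in the logged regex alternatives.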
 
Can you please confirm that /dbtech/dbseo/includes/url.php is saved on the server with the Western (Windows-1252) encoding? If it is, please try removing any custom character filters and try again.
 
The file is encoded properly; it hasn't been modified in any way. I'm using the Custom Character Filter in the URL rewrite settings (Replace Non-Latin Characters), in the same format I used in vBSEO:
Code:
'ą' => 'a'
'ć' => 'c'
'ę' => 'e'
'ł' => 'l'
'ń' => 'n'
'ó' => 'o'
'ś' => 's'
'ź' => 'z'
'ż' => 'z'
'Ą' => 'A'
'Ć' => 'C'
'Ę' => 'E'
'Ł' => 'L'
'Ń' => 'N'
'Ó' => 'O'
'Ś' => 'S'
'Ź' => 'Z'
'Ż' => 'Z'
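
For illustration, the mapping above can be applied to a username like this. This is a rough Python sketch, not DBSEO's code: the lowercasing and the hyphen handling in `to_slug` are assumptions, and both function names are hypothetical.

```python
import re

# The custom character filter quoted above, as a Python dict.
CUSTOM_FILTER = {
    "ą": "a", "ć": "c", "ę": "e", "ł": "l", "ń": "n",
    "ó": "o", "ś": "s", "ź": "z", "ż": "z",
    "Ą": "A", "Ć": "C", "Ę": "E", "Ł": "L", "Ń": "N",
    "Ó": "O", "Ś": "S", "Ź": "Z", "Ż": "Z",
}


def transliterate(title: str) -> str:
    """Replace each accented letter with its plain Latin counterpart."""
    return title.translate(str.maketrans(CUSTOM_FILTER))


def to_slug(title: str) -> str:
    """Assumed slug rules: lowercase, runs of other characters become '-'."""
    plain = transliterate(title).lower()
    return re.sub(r"[^a-z0-9]+", "-", plain).strip("-")


print(to_slug("Ćwierć nuty"))
```

The important point is that this direction loses information: once "ć" has become "c", the URL alone no longer says which of the two the original name contained.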

I don't want to play with these settings without first asking about reverting and possible consequences. Is it something I can play with on a live production site? I don't want to mess up the urls in the whole forum...

I'm trying to get to the root of the problem. The blog URL is probably the only place where I'm not using a numeric ID: all my forums, threads etc. use forumid, threadid etc. in their URLs. Does the missing ID influence the query generation logic in DBSEO? If so, can you direct me to the place where this query is generated? I could try to debug this...
 
You could always close your site while you attempt to debug this by removing the contents of that setting. That way, there's no worry that any search engines may pick up any URL changes that occur as a result of your tests.

The issue is indeed because of URLs that don't have IDs in them - it's the query that attempts to reverse a rewritten title to its root component. The easiest solution would be to add IDs to the URLs, but there may be issues with redirecting existing URLs to the new URLs as a result, which is why we should probably figure out what part of the custom character filter is causing the encoding issue.
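
To make the "reverse a rewritten title" step concrete, here is a rough Python sketch of the idea. It is not DBSEO's actual code: the `REPLACE` table is a shortened, hypothetical version of the fragments visible in the logged queries, `SEPARATOR` is a simplification of the logged `(&[\#\da-z]*;|[^a-z\d])*` group, and `slug_to_pattern` is a made-up name.

```python
import re

# Hypothetical, shortened table: slug letter -> regex fragment that matches
# every original character which could have been rewritten to that letter.
REPLACE = {
    "a": "(ą|[aÀÁÂÃÄÅàáâãäå])",
    "s": "(ś|[sŠš])",
    "i": "[iÌÍÎÏìíîï]",
}

# Simplified stand-in for the separator group seen in the logged queries.
SEPARATOR = r"[^a-z\d]*"


def slug_to_pattern(slug: str) -> str:
    """Expand a URL slug into a regex matching the possible original names."""
    parts = []
    for ch in slug:
        if ch == "-":
            parts.append(SEPARATOR)
        else:
            parts.append(REPLACE.get(ch, re.escape(ch)))
    return "^" + SEPARATOR + "".join(parts) + SEPARATOR + "$"


pattern = re.compile(slug_to_pattern("asiak"))
usernames = ["ąsiąk", "asiak!", "basia"]
matches = [name for name in usernames if pattern.search(name)]
print(matches)
```

With a numeric ID in the URL none of this is needed, because the user can be looked up by ID directly instead of by a pattern over the username.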
 
Hi,
I've been playing with the rewrite rules for the past 30 minutes, writing a lot of debug output into the error_log. And I have it fixed on my server: all I needed to do was re-save dbseo/includes/filter.php with UTF-8 encoding. That makes a lot of sense: there's a definition of the character replacement array starting on line 312, with characters outside the ISO encoding map. Those were giving me problems, not my custom ones...
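
One possible reading of "invalid UTF-8 string at offset 28" in the first query, sketched in Python. The assumptions are mine and are not confirmed anywhere in this thread: the custom "ć" reaches the database as UTF-8 (two bytes), the built-in class `[cÇc]` comes from filter.php as single-byte Windows-1252, the server drops the backslashes in `\#` and `\d` when it parses the string literal, and the offset is a zero-based byte offset into the pattern.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Start of the logged pattern with the backslashes removed (assumption).
prefix = "^(&[#da-z]*;|[^a-zd])*(".encode("ascii")

# Custom filter character as UTF-8, built-in class as Windows-1252.
mixed = prefix + "ć".encode("utf-8") + "|[cÇc])".encode("cp1252")

# The same fragment once everything is saved as UTF-8.
all_utf8 = prefix + "ć|[cÇc])".encode("utf-8")

mixed_offset = first_invalid_utf8_offset(mixed)
clean_offset = first_invalid_utf8_offset(all_utf8)
print(mixed_offset, clean_offset)
```

Under those assumptions the first bad byte is the lone Windows-1252 "Ç", which would fit the observation that the built-in characters were the problem and not the custom ones.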

As a side note: while debugging I found a slight difference in $replace before and after the custom chars are merged in from $replace2:

Code:
first $replace:
[a] => [aÀÁÂÃÄÅàáâãäå]

after merge:
[a] => (ą|[aÀÁÂÃÄÅàáâãäå])
My custom char replacement is the first ą character, before the pipe. I'm not sure if the pipe is needed here, since the other characters are not separated. But regexes are like magic to me.
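
On the pipe question, a possible explanation, offered as a guess and not as a statement about DBSEO's design: in a regex engine that works byte by byte, a bracket class matches exactly one byte, so a two-byte UTF-8 character such as ą cannot be matched from inside `[...]` and needs its own alternative. The Python sketch below uses byte patterns to imitate such an engine.

```python
import re

a_ogonek = "ą".encode("utf-8")  # two bytes: b'\xc4\x85'

# Multi-byte character inside a bracket class: the class consumes a single
# byte, so it can never cover the whole two-byte character.
in_class = re.compile(b"^[" + a_ogonek + b"a]$")

# Multi-byte character as its own alternative, single bytes in the class.
alternation = re.compile(b"^(" + a_ogonek + b"|[a])$")

class_matches = in_class.match(a_ogonek) is not None
alternation_matches = alternation.match(a_ogonek) is not None
plain_a_matches = alternation.match(b"a") is not None

print(class_matches, alternation_matches, plain_a_matches)
```

In a fully Unicode-aware engine the two forms would behave the same, so the pipe only matters when the pattern is processed as bytes.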

One last question: why did it work before and stop working after (most probably) the MySQL library replacement? I don't really know.
 
Potentially the character encoding requirements changed when you upgraded the mysqli library; I couldn't really tell.

I'm not sure whether I can save that file as UTF-8 and have it work for everyone, though; I believe I tried it myself. I might investigate alternative solutions.
 