[SOLVED] Don't fix corrupted text; we're working on it

Monk Ed
Sunshine Administrator
Age: 38
Posts: 8601
Joined: Jul 12, 2008
Location: Chicagoland area
Gender: Male

Postby Monk Ed » Thu Dec 03, 2015 4:06 am

We are working on the corrupted text issue. Please leave all corrupted text (e.g. in signatures) as-is, so that when we run a re-encode on the database everything goes back to normal.

Constructing the necessary script is a delicate process with a lot of "measure twice, cut once", so it might take a while before we implement the solution.
System Administrator
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara

TehDonutKing
Camel Dilettante
Age: 28
Posts: 3934
Joined: Apr 23, 2010
Location: Outer Space Jupiter
Contact:

Postby TehDonutKing » Thu Dec 03, 2015 12:26 pm

Out of curiosity, could you or Paul explain the mechanics behind the corrupted text?
/hj

I said and did some dumb and hurtful things in my time here when i was younger. If i ever hurt you, i'm sorry. If you see any of this while reading old threads, i'm learning and trying to improve. Donut redemption arc in progress.

pwhodges
A Lilin in Wonderland
Age: 77
Posts: 11035
Joined: Nov 18, 2012
Location: Oxford, UK
Contact:

Postby pwhodges » Thu Dec 03, 2015 1:40 pm

A character set is a way of translating between conceptual "glyphs" and bit patterns. Initially (Telex and the like) only 32 characters were handled (hence 5-hole paper tape), but quite quickly this was expanded to 128 (7 bits, and an eighth used for parity) - which fitted the concept of a byte and its power-of-two-ness just nicely. But the parts of the western world that used accents and other characters wanted more than the US (who defined the first such mappings) provided; so they dropped the parity and used the extra 128 values freed for their own requirements. They did this differently in every country and language, and often several ways in the same place. Also IBM used a different mapping of the US original characters from everyone else (something to do with cards).

So there grew up dozens, hundreds of ways of mapping between glyphs and numbers (bit patterns), and any computer-stored text was meaningless unless you knew which mapping to use. Email got to include fields that specified the mapping; but lots of programs didn't bother - they just assumed everyone that used their program would agree, and got on with it. So inevitably, with increasing international spread of data, mistakes happened more and more frequently.
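A minimal Python sketch of that ambiguity. The sample word and the three code pages are assumptions picked for illustration, not anything taken from the forum's data. The same four stored bytes come back as three different strings depending on which mapping the reader assumes:

```python
# Hypothetical sample: the word "café" stored by a program that assumes Latin-1.
raw = "café".encode("latin-1")  # four bytes; the last one is 0xE9

# Three readers, each assuming a different 8-bit mapping for the same bytes.
readings = {codec: raw.decode(codec) for codec in ("latin-1", "cp437", "cp1251")}

for codec, text in readings.items():
    # Printed with escapes so the result survives any terminal encoding.
    print(f"{codec:8} -> {text!a}")
```

Only the reader that guesses the writer's mapping gets the original word back; the other two see a Greek or a Cyrillic letter where the accented "e" was.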

But what of other alphabets and systems? Arabic, Cyrillic, Greek, Hebrew, Chinese, Japanese, Korean, Urdu, etc, etc? In the 1990s an effort was made to standardise a single set of mappings that included all glyphs in all languages. This was (is) called Unicode. Instead of the eight bits (256 characters) of the old multiple character sets (also called "code pages"), Unicode used sixteen bits (65,536 characters) - however, this has proven to be insufficient, and twenty-one bits are currently required to handle the whole defined range of code points (up to U+10FFFF). There are schemes included in Unicode for using subsets of characters and switching between them, so that a string of eight- (or even seven-) bit characters can be used, for convenience. But of course, these schemes need to be understood by all programs involved with the data.
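For anyone who wants to check the sizes involved, here is a small Python sketch. The sample characters and the helper name `bits_needed` are arbitrary choices made for illustration. It prints how many bits each character's code point needs, and how many the highest Unicode code point (U+10FFFF) needs:

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def bits_needed(ch: str) -> int:
    """Number of bits required to hold this character's code point."""
    return ord(ch).bit_length()


for ch in ["A", "é", "あ", "😀"]:
    print(f"U+{ord(ch):04X} needs {bits_needed(ch)} bits")

print("8 bits give", 2 ** 8, "values")
print("16 bits give", 2 ** 16, "values")
print("U+10FFFF needs", MAX_CODE_POINT.bit_length(), "bits")
```

The last sample (an emoji) sits above U+FFFF, so it is one of the characters that a flat sixteen-bit scheme cannot hold in a single unit.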

phpBB v2 was written without built-in awareness of Unicode, and ways of handling alternative character sets including later some forms of Unicode got patched in bit by bit. But sometimes mistakes were made. Sometimes different codings were used in different parts of the program. Sometimes some but not all data was updated when a change (aka improvement) was made. And so gradually the data became inconsistent, and relied on matching inconsistencies in the program to (usually) come out OK in the end.

In the conversion of the EGF database to the new fully Unicode-aware version of the forum software, Monk Ed has been trying to find ways to convert the characters, where necessary, in the right way - which is not the same for every piece of text. He got most of them right (for part of the time, the converter was simply dumping posts that it didn't understand, of which there were a lot); but a few things got missed (sigs, location in profiles), and he's working on getting those converted as well. When this is done, there will be no need for such problems to appear in the forum again.

(Incidentally, the rather acerbic account above is not meant to minimise the achievement of Ornette and Mr.Tines in making the old software handle this as well as it did - they did a good job.)
"Being human, having your health; that's what's important." (from: Magical Shopping Arcade Abenobashi )
"As long as we're all living, and as long as we're all having fun, that should do it, right?" (from: The Eccentric Family )
Avatar: The end of the journey (details); Past avatars.
Before 3.0+1.0 there was Afterwards... my post-Q Evangelion fanfic (discussion)

TehDonutKing
Postby TehDonutKing » Thu Dec 03, 2015 1:50 pm

I was aware of ASCII and Unicode, but i haven't dealt with other character sets before. Thank you.
/hj


Mr. Tines
Administrator
Age: 66
Posts: 21373
Joined: Nov 23, 2004
Location: This sceptered isle.
Gender: Male
Contact:

Postby Mr. Tines » Thu Dec 03, 2015 2:27 pm

To add to the fun, rather than putting every character into a 16-bit/2-byte form, the most commonly adopted representation of Unicode is a thing called UTF-8, which lets the original 128 ASCII characters take 1 byte each only, then uses a telescoping representation to hold the other code-points in 2, 3 or 4 bytes, all of which have values in the range 128-255 -- it means that normal English text is unaffected compared to pre-Unicode, Western languages take slightly more bytes, and the ideographic (CJKV) scripts about 50% more than in their previous standard representations.
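The byte counts can be seen directly with a short Python sketch. The sample characters are arbitrary, and Shift JIS is used here only as one example of an older two-byte Japanese encoding, which is an assumption about what "previous standard representations" covers:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 uses for one character."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "あ", "😀"]:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s): {hex_bytes}")

# The same kana in an older two-byte encoding, for comparison.
kana = "あ"
legacy = kana.encode("shift_jis")
print("Shift JIS:", len(legacy), "bytes; UTF-8:", utf8_len(kana), "bytes")
```

Every byte of the multi-byte forms is 0x80 or higher, which is why plain ASCII text passes through untouched, and the kana going from two bytes to three is the "about 50% more" case.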

What has happened in the translation is that the text has effectively been run through the UTF-8 process twice, so that what we are seeing for non-ASCII characters (apart from the invisible non-printing characters that can be involved) are the Unicode characters in the range 128-255 that correspond to the bytes of the real UTF-8 representation.
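A minimal Python sketch of that effect and its reversal. It assumes the misreading step behaves like Latin-1 (one character per byte), and the function names `double_encode` and `repair` are made up for illustration; the actual conversion script is not shown in this thread and may well have differed:

```python
def double_encode(text: str) -> str:
    """Simulate the fault: real UTF-8 bytes misread as one Latin-1 character each."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo the misreading; leave the text alone if it does not round-trip."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not double-encoded (or not recoverable this way)


original = "さらば"                  # hypothetical sample text
garbled = double_encode(original)

# Escapes are used so the invisible control characters show up.
print(f"{garbled!a}")
print(len(original), "characters became", len(garbled))
print(repair(garbled) == original)
```

The `try`/`except` is there because text that was never double-encoded must not be changed: ordinary Japanese or accented text fails one of the two steps and is returned as it was, while plain ASCII passes through both unchanged.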

There are still lingering vestiges of the same double-dipping from the previous forum move (the one that included a seven month outage, rather than the almost continuity of the present arrangement).
Reminder: Play nicely <<>> My vanity publishing:- NGE|blog|Photos|retro-blog|Fanfics &c.|MAL|𝕏|🐸|🦣
Avatar: art deco Asuka

Bagheera
Asuka's Bulldog
Posts: 18679
Joined: Oct 15, 2010

Postby Bagheera » Thu Dec 03, 2015 4:06 pm

Seven months! That makes Monk's and Paul's efforts all the more impressive. The worst we have to worry about now is getting used to a new appearance and restoring some core functionality (the e-mail server and wiki, mainly, though that's just what I've noticed on the user end). Apart from that it's business as usual, and that's mind-blowing to me given the scale of the transfer. I mean, I barely understand the logistics involved, but even so it's obvious they did a lot of testing before taking this live. Nice work fellas!
Last edited by Bagheera on Thu Dec 03, 2015 8:21 pm, edited 2 times in total.
For my post-3I fic, go here.
The law doesn't protect people. People protect the law. -- Akane Tsunemori, Psycho-Pass
People's deaths are to be mourned. The ability to save people should be celebrated. Life itself should be exalted. -- Volken Macmani, Tatakau Shisho: The Book of Bantorra
I hate myself. But maybe I can learn to love myself. Maybe it's okay for me to be here! That's right! I'm me, nothing more, nothing less! I'm me. I want to be me! I want to be here! And it's okay for me to be here! -- Shinji Ikari, Neon Genesis Evangelion
Yes, I know. You thought it would be something about Asuka. You're such idiots.

Monk Ed
Postby Monk Ed » Thu Dec 03, 2015 8:10 pm

It is awesome to have my hard work recognized and appreciated.

I've been working close to nonstop except for basic life functions and social obligations since transfer day.

NemZ
Token Misanthrope
Posts: 15804
Joined: Jun 28, 2008
Location: St. Louis
Gender: Male

Postby NemZ » Thu Dec 03, 2015 9:08 pm

Also worth mentioning is that this process started more than a year ago, so this was very much NOT something we did on a whim. Monk, Paul, Tines and Ornette put in quite a bit of time collectively to make this transition as smooth as possible, and the fact that we can even discuss what isn't working right now is evidence of just how much IS working right.
Rest In Peace ~ 1978 - 2017
"I'd consider myself a realist, alright? but in philosophical terms I'm what's called a pessimist. It means I'm bad at parties." - Rust Cohle
"Think of how stupid the average person is, and realize that half of 'em are stupider than that." - George Carlin
"The internet: It's like a training camp for never amounting to anything." - Oglaf
"I think internet message boards and the like are dangerous." - Anno

Monk Ed
Postby Monk Ed » Fri Dec 04, 2015 11:09 am

Huge breakthrough in the corrupted text issue, everyone: I got (raw) text to transmit uncorrupted from the old database to a test clone of the current one. :headbang:

All that's left is to convert BBCode. I think. And that's actually been the hardest part so far -- but it will be worth it IMO to get back whatever might have been lost by trying to revert the lossy conversion.

Rei IV
Pilot
Age: 33
Posts: 2079
Joined: Dec 04, 2012
Location: USA
Gender: Male

Postby Rei IV » Fri Dec 04, 2015 1:57 pm

Looks like things are going according to plan, then. The transformation is almost complete.

:mwahaha: :gendoscheme:
Last edited by Rei IV on Sat Dec 05, 2015 9:23 am, edited 1 time in total.

Monk Ed
Postby Monk Ed » Sat Dec 05, 2015 12:39 am

Anybody notice a certain not-me not-Tines system administrator's signature looking a little uncorrupted lately? :shifty:

That was the proof of concept, but it still has to be scaled up.

Bagheera
Postby Bagheera » Sat Dec 05, 2015 3:36 am

Good work, Monk!

Monk Ed
Postby Monk Ed » Sat Dec 05, 2015 3:43 am

First wave of fixes done. User locations should be fixed now.

Rei IV
Postby Rei IV » Sat Dec 05, 2015 3:25 pm

Little by little, the Japanese characters are reappearing.

:w00t:

Monk Ed
Postby Monk Ed » Sun Dec 06, 2015 5:52 am

Holy shit it worked...

Uh, I mean...

Second wave of fixes done. Sigs should be fixed now. If there's any kind of BBCode oddity left over from the process, that can be fixed just by resubmitting your sig.

Monk Ed
Postby Monk Ed » Mon Dec 07, 2015 1:12 am

:faint:

Third and final wave of fixes complete.

I'm gonna... go take a nap...

Reichu
Admin Emeritus
Posts: 24046
Joined: Aug 21, 2004
Location: Sailing for the white shores
Gender: Female
Contact:

Postby Reichu » Wed Apr 13, 2016 8:08 pm

Here, I've found what appears to have been em-dashes reduced to code. I'll leave them alone, for now.
さらば、全てのEvaGeeks。
「滅びの運命は新生の喜びでもある」
Departure Message | The Arqa Apocrypha: An Evangelion Analysis Blog

Mr. Tines
Postby Mr. Tines » Thu Apr 14, 2016 1:07 am

Post date: Sat Sep 02, 2006 16:36 UTC
That would be another relic of the previous transfer, needing manual fix-up anyway.

Reichu
Postby Reichu » Wed Apr 27, 2016 8:27 pm

This bit of wackiness is more recent. The thread title, specifically.

Mr. Tines
Postby Mr. Tines » Thu Apr 28, 2016 12:23 am

Fixed.

