[SOLVED] Don't fix corrupted text; we're working on it
Moderator: Board Staff
- Monk Ed
- Sunshine Administrator
- Age: 38
- Posts: 8601
- Joined: Jul 12, 2008
- Location: Chicagoland area
- Gender: Male
[SOLVED] Don't fix corrupted text; we're working on it
The corrupted text issue is being worked on, so please leave all corrupted text (e.g. in signatures) as-is; when we run a re-encode on the database, everything should go back to normal.
But constructing the necessary script is a delicate process with a lot of "measure twice, cut once", so it might take a while before we implement the solution.
System Administrator
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
- TehDonutKing
- Camel Dilettante
- Age: 28
- Posts: 3934
- Joined: Apr 23, 2010
- Location: Outer Space Jupiter
- Contact:
Re: Don't fix corrupted text; we're working on it
Out of curiosity, could you or Paul explain the mechanics behind the corrupted text?
/hj
I said and did some dumb and hurtful things in my time here when i was younger. If i ever hurt you, i'm sorry. If you see any of this while reading old threads, i'm learning and trying to improve. Donut redemption arc in progress.
- pwhodges
- A Lilin in Wonderland
- Age: 77
- Posts: 11035
- Joined: Nov 18, 2012
- Location: Oxford, UK
- Contact:
Re: Don't fix corrupted text; we're working on it
A character set is a way of translating between conceptual "glyphs" and bit patterns. Initially (Telex and the like) only 32 characters were handled (hence 5-hole paper tape), but quite quickly this was expanded to 128 (7 bits, and an eighth used for parity) - which fitted the concept of a byte and its power-of-two-ness just nicely. But the parts of the western world that used accents and other characters wanted more than the US (who defined the first such mappings) provided; so they dropped the parity and used the extra 128 values freed for their own requirements. They did this differently in every country and language, and often several ways in the same place. Also IBM used a different mapping of the US original characters from everyone else (something to do with cards).
So there grew up dozens, hundreds of ways of mapping between glyphs and numbers (bit patterns), and any computer-stored text was meaningless unless you knew which mapping to use. Email got to include fields that specified the mapping; but lots of programs didn't bother - they just assumed everyone that used their program would agree, and got on with it. So inevitably, with increasing international spread of data, mistakes happened more and more frequently.
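To make the "meaningless unless you knew which mapping to use" point concrete, here is a minimal sketch in Python. The byte value and the four codec names are my own picks for illustration (they are standard Python codec aliases), not anything taken from the forum's data.

```python
# One stored byte, read back under several legacy 8-bit mappings.
raw = bytes([0xE9])

# Illustrative choice of code pages; any real program would have
# assumed just one of these without recording which.
codecs = ("latin-1", "cp437", "cp1251", "koi8_r")

# Same bit pattern, a different glyph under each mapping.
readings = {codec: raw.decode(codec) for codec in codecs}

for codec, glyph in readings.items():
    print(f"{codec:8} -> {glyph!r} (U+{ord(glyph):04X})")
```

The byte never changes; only the assumed mapping does, which is why text with no recorded character set cannot be read back reliably.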
But what of other alphabets and systems? Arabic, Cyrillic, Greek, Hebrew, Chinese, Japanese, Korean, Urdu, etc, etc? In the 1990s an effort was made to standardise a single set of mappings that included all glyphs in all languages. This was (is) called Unicode. Instead of the eight bits (256 characters) of the old multiple character sets (also called "code pages"), Unicode used sixteen bits (65536 characters) - however, this has proven to be insufficient, and twenty-one bits are currently required to cover the whole defined range of code points (up to U+10FFFF). There are schemes included in Unicode for using subsets of characters and switching between them, so that a string of eight- (or even seven-) bit characters can be used, for convenience. But of course, these schemes need to be understood by all programs involved with the data.
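For anyone who wants to check the bit arithmetic, a short Python sketch follows. The variable names are just for illustration; `sys.maxunicode` is the largest code point a Python 3 string can hold.

```python
import sys

# Number of distinct values that fit in n bits: 2 ** n.
values_for = {bits: 2 ** bits for bits in (5, 7, 8, 16)}
for bits, count in values_for.items():
    print(f"{bits:2} bits -> {count} values")

# The highest code point Unicode defines is U+10FFFF.
MAX_CODE_POINT = 0x10FFFF
bits_needed = MAX_CODE_POINT.bit_length()
print("bits for the largest code point:", bits_needed)
print("matches sys.maxunicode:", sys.maxunicode == MAX_CODE_POINT)

# A character that does not fit in 16 bits (a frog emoji).
frog = "\U0001F438"
print(hex(ord(frog)), "exceeds 0xFFFF:", ord(frog) > 0xFFFF)
```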
phpBB v2 was written without built-in awareness of Unicode, and ways of handling alternative character sets (including, later, some forms of Unicode) got patched in bit by bit. But sometimes mistakes were made. Sometimes different codings were used in different parts of the program. Sometimes some but not all data was updated when a change (aka improvement) was made. And so gradually the data became inconsistent, and relied on matching inconsistencies in the program to (usually) come out OK in the end.
In the conversion of the EGF database to the new fully Unicode-aware version of the forum software, Monk Ed has been trying to find ways to convert the characters, where necessary, in the right way - which is not always the same. He got most of them right (for part of the time, the converter was simply dumping the posts that it didn't understand properly, and there were a lot of those); but a few things got missed (sigs, location in profiles), and he's working on getting those converted as well. When this is done, there should be no reason for such problems to appear in the forum again.
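As a rough illustration of why the right conversion "is not always the same", here is a hedged Python sketch of decoding data whose encoding varies from row to row. This is not Monk Ed's converter; `decode_legacy` is a hypothetical helper, and the Windows-1252 fallback is purely an assumption for the example.

```python
def decode_legacy(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes that may be UTF-8 or a legacy 8-bit code page.

    Hypothetical helper: tries strict UTF-8 first, then falls back to
    an assumed legacy code page (cp1252 here is only a guess).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes undefined in the fallback code page become U+FFFD
        # instead of aborting the whole conversion.
        return raw.decode(fallback, errors="replace")


# Two rows holding the same word, stored under different encodings.
rows = ["café".encode("utf-8"), "café".encode("cp1252")]
decoded = [decode_legacy(row) for row in rows]
print(decoded)
```

A heuristic like this can still guess wrong (some legacy byte sequences happen to be valid UTF-8), which is one reason a real conversion needs so much checking.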
(Incidentally, the rather acerbic account above is not meant to minimise the achievement of Ornette and Mr.Tines in making the old software handle this as well as it did - they did a good job.)
"Being human, having your health; that's what's important." (from: Magical Shopping Arcade Abenobashi )
"As long as we're all living, and as long as we're all having fun, that should do it, right?" (from: The Eccentric Family )
Avatar: The end of the journey (details); Past avatars.
Before 3.0+1.0 there was Afterwards... my post-Q Evangelion fanfic (discussion)
"As long as we're all living, and as long as we're all having fun, that should do it, right?" (from: The Eccentric Family )
Avatar: The end of the journey (details); Past avatars.
Before 3.0+1.0 there was Afterwards... my post-Q Evangelion fanfic (discussion)
- TehDonutKing
- Camel Dilettante
- Age: 28
- Posts: 3934
- Joined: Apr 23, 2010
- Location: Outer Space Jupiter
- Contact:
Re: Don't fix corrupted text; we're working on it
I was aware of ASCII and Unicode, but i haven't dealt with other character sets before. Thank you.
/hj
I said and did some dumb and hurtful things in my time here when i was younger. If i ever hurt you, i'm sorry. If you see any of this while reading old threads, i'm learning and trying to improve. Donut redemption arc in progress.
- Mr. Tines
- Administrator
- Age: 66
- Posts: 21376
- Joined: Nov 23, 2004
- Location: This sceptered isle.
- Gender: Male
- Contact:
Re: Don't fix corrupted text; we're working on it
To add to the fun, rather than putting every character into a 16-bit/2-byte form, the most commonly adopted representation of Unicode is a thing called UTF-8. It lets the original 128 ASCII characters take only 1 byte each, then uses a telescoping representation to hold the other code points in 2, 3 or 4 bytes, all of which have values in the range 128-255. This means that normal English text is unaffected compared to pre-Unicode, Western languages take slightly more bytes, and the ideographic (CJKV) scripts take about 50% more than in their previous standard representations.
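The telescoping can be seen directly in Python. This is only a sketch: the sample characters are my own picks, and using Shift_JIS as the "previous standard representation" for Japanese is an assumption made for the comparison.

```python
# One sample character for each UTF-8 length: 1, 2, 3 and 4 bytes.
samples = ["A", "é", "日", "🐸"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(repr(ch), len(encoded), [hex(b) for b in encoded])

sizes = [len(ch.encode("utf-8")) for ch in samples]

# Every byte of a multi-byte sequence lies in the range 128-255,
# so it can never be mistaken for an ASCII character.
high_only = all(b >= 128 for ch in samples[1:] for b in ch.encode("utf-8"))
print("multi-byte sequences use only bytes >= 128:", high_only)

# Size comparison for Japanese text (Shift_JIS assumed as the legacy form).
text = "日本語"
legacy_len = len(text.encode("shift_jis"))
utf8_len = len(text.encode("utf-8"))
print(legacy_len, "bytes in Shift_JIS,", utf8_len, "bytes in UTF-8")
```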
What has happened in the translation is that the text has effectively been run through the UTF-8 process twice, so that what we are seeing for non-ASCII characters (apart from the invisible non-printing characters that can be involved) are the Unicode characters in the range 128-255 that correspond to the bytes of the real UTF-8 representation.
There are still lingering vestiges of the same double-dipping from the previous forum move (the one that included a seven-month outage, rather than the near-continuity of the present arrangement).
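The double-dipping can be reproduced in a few lines of Python. This is a minimal sketch, assuming the wrong intermediate step treated each byte as the Unicode character with the same number (which is what Latin-1 decoding does); the sample string is arbitrary and the actual database conversion may have differed in detail.

```python
original = "さらば"

# First (correct) pass: text to UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# The mistake: each byte is taken to be a character in the range 0-255
# (equivalent to decoding as Latin-1)...
mojibake = utf8_bytes.decode("latin-1")

# ...and that string is then encoded as UTF-8 a second time.
stored = mojibake.encode("utf-8")
print(len(utf8_bytes), "bytes became", len(stored))

# Repair: undo the second pass, map the characters back to single
# bytes, and decode those bytes as the UTF-8 they always were.
repaired = stored.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == original)
```

The repair only works on text that really was encoded twice; running it on text that is already correct would raise an error or garble it, which fits the request to leave corrupted text alone until the scripted fix is run.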
Reminder: Play nicely <<>> My vanity publishing:- NGE|blog|Photos|retro-blog|Fanfics &c.|MAL|𝕏|🐸|🦣
Avatar: art deco Asuka
Re: Don't fix corrupted text; we're working on it
Seven months! That makes Monk's and Paul's efforts all the more impressive. The worst we have to worry about now is getting used to a new appearance and restoring some core functionality (the e-mail server and wiki, mainly, though that's just what I've noticed on the user end). Apart from that it's business as usual, and that's mind-blowing to me given the scale of the transfer. I mean, I barely understand the logistics involved, but even so it's obvious they did a lot of testing before taking this live. Nice work fellas!
Last edited by Bagheera on Thu Dec 03, 2015 8:21 pm, edited 2 times in total.
For my post-3I fic, go here.
The law doesn't protect people. People protect the law. -- Akane Tsunemori, Psycho-Pass
People's deaths are to be mourned. The ability to save people should be celebrated. Life itself should be exalted. -- Volken Macmani, Tatakau Shisho: The Book of Bantorra
I hate myself. But maybe I can learn to love myself. Maybe it's okay for me to be here! That's right! I'm me, nothing more, nothing less! I'm me. I want to be me! I want to be here! And it's okay for me to be here! -- Shinji Ikari, Neon Genesis Evangelion
Yes, I know. You thought it would be something about Asuka. You're such idiots.
- Monk Ed
- Sunshine Administrator
- Age: 38
- Posts: 8601
- Joined: Jul 12, 2008
- Location: Chicagoland area
- Gender: Male
Re: Don't fix corrupted text; we're working on it
It is awesome to have my hard work recognized and appreciated.
I've been working close to nonstop except for basic life functions and social obligations since transfer day.
System Administrator
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
Re: Don't fix corrupted text; we're working on it
Also worth mentioning is that this process started more than a year ago, so this was very much NOT something we did on a whim. Monk, Paul, Tines and Ornette put in quite a bit of time collectively to make this transition as smooth as possible, and the fact that we can even discuss what isn't working right now is evidence of just how much IS working right.
Rest In Peace ~ 1978 - 2017
"I'd consider myself a realist, alright? but in philosophical terms I'm what's called a pessimist. It means I'm bad at parties." - Rust Cohle
"Think of how stupid the average person is, and realize that half of 'em are stupider than that." - George Carlin
"The internet: It's like a training camp for never amounting to anything." - Oglaf
"I think internet message boards and the like are dangerous." - Anno
"I'd consider myself a realist, alright? but in philosophical terms I'm what's called a pessimist. It means I'm bad at parties." - Rust Cohle
"Think of how stupid the average person is, and realize that half of 'em are stupider than that." - George Carlin
"The internet: It's like a training camp for never amounting to anything." - Oglaf
"I think internet message boards and the like are dangerous." - Anno
- Monk Ed
- Sunshine Administrator
- Age: 38
- Posts: 8601
- Joined: Jul 12, 2008
- Location: Chicagoland area
- Gender: Male
Re: Don't fix corrupted text; we're working on it
Huge breakthrough in the corrupted text issue everyone: I got (raw) text to transmit uncorrupted from the old database to a test clone of the current one.
All that's left is to convert BBCode. I think. And that's actually been the hardest part so far -- but it will be worth it IMO to get back whatever might have been lost by trying to revert the lossy conversion.
System Administrator
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
Re: Don't fix corrupted text; we're working on it
Looks like things are going according to plan, then. The transformation is almost complete.
Last edited by Rei IV on Sat Dec 05, 2015 9:23 am, edited 1 time in total.
- Monk Ed
- Sunshine Administrator
- Age: 38
- Posts: 8601
- Joined: Jul 12, 2008
- Location: Chicagoland area
- Gender: Male
Re: Don't fix corrupted text; we're working on it
Anybody notice a certain not-me not-Tines system administrator's signature looking a little uncorrupted lately?
That was the proof of concept, but it still has to be scaled up.
System Administrator
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
Re: Don't fix corrupted text; we're working on it
Good work, Monk!
For my post-3I fic, go here.
The law doesn't protect people. People protect the law. -- Akane Tsunemori, Psycho-Pass
People's deaths are to be mourned. The ability to save people should be celebrated. Life itself should be exalted. -- Volken Macmani, Tatakau Shisho: The Book of Bantorra
I hate myself. But maybe I can learn to love myself. Maybe it's okay for me to be here! That's right! I'm me, nothing more, nothing less! I'm me. I want to be me! I want to be here! And it's okay for me to be here! -- Shinji Ikari, Neon Genesis Evangelion
Yes, I know. You thought it would be something about Asuka. You're such idiots.
- Monk Ed
- Sunshine Administrator
- Age: 38
- Posts: 8601
- Joined: Jul 12, 2008
- Location: Chicagoland area
- Gender: Male
Re: Don't fix corrupted text; we're working on it
First wave of fixes done. User locations should be fixed now.
System Administrator
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
Re: Don't fix corrupted text; we're working on it
Little by little, the Japanese characters are reappearing.
- Monk Ed
- Sunshine Administrator
- Age: 38
- Posts: 8601
- Joined: Jul 12, 2008
- Location: Chicagoland area
- Gender: Male
Re: Don't fix corrupted text; we're working on it
Holy shit it worked...
Uh, I mean...
Second wave of fixes done. Sigs should be fixed now. If there's any kind of BBCode oddity left over from the process, that can be fixed just by resubmitting your sig.
System Administrator
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
- Monk Ed
- Sunshine Administrator
- Age: 38
- Posts: 8601
- Joined: Jul 12, 2008
- Location: Chicagoland area
- Gender: Male
Re: Don't fix corrupted text; we're working on it
Third and final wave of fixes complete.
I'm gonna... go take a nap...
System Administrator
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
"NGE is like a perfectly improvised jazz piece. It builds on a standard and then plays off it from there, and its developments may occasionally recall what it's done before as a way of keeping the whole concatenated." -- Eva Yojimbo
"To me watching anime is not just for killing time or entertainment, it is a life style, and a healthy one too." -- symbv
"That sounds like the kind of science that makes absolutely 0 sense when you stop and think about it... I LOVE IT." -- Rosenakahara
- Reichu
- Admin Emeritus
- Posts: 24046
- Joined: Aug 21, 2004
- Location: Sailing for the white shores
- Gender: Female
- Contact:
Re: [SOLVED] Don't fix corrupted text; we're working on it
Here, I've found what appear to have been em-dashes reduced to code. I'll leave them alone, for now.
さらば、全てのEvaGeeks。
「滅びの運命は新生の喜びでもある」
Departure Message | The Arqa Apocrypha: An Evangelion Analysis Blog
- Mr. Tines
- Administrator
- Age: 66
- Posts: 21376
- Joined: Nov 23, 2004
- Location: This sceptered isle.
- Gender: Male
- Contact:
Re: [SOLVED] Don't fix corrupted text; we're working on it
Post date: Sat Sep 02, 2006 16:36 UTC
That would be another relic of the previous transfer, needing manual fix-up anyway.
Reminder: Play nicely <<>> My vanity publishing:- NGE|blog|Photos|retro-blog|Fanfics &c.|MAL|𝕏|🐸|🦣
Avatar: art deco Asuka
- Reichu
- Admin Emeritus
- Posts: 24046
- Joined: Aug 21, 2004
- Location: Sailing for the white shores
- Gender: Female
- Contact:
Re: [SOLVED] Don't fix corrupted text; we're working on it
This bit of wackiness is more recent. The thread title, specifically.
さらば、全てのEvaGeeks。
「滅びの運命は新生の喜びでもある」
Departure Message | The Arqa Apocrypha: An Evangelion Analysis Blog
- Mr. Tines
- Administrator
- Age: 66
- Posts: 21376
- Joined: Nov 23, 2004
- Location: This sceptered isle.
- Gender: Male
- Contact:
Re: [SOLVED] Don't fix corrupted text; we're working on it
Fixed.
Reminder: Play nicely <<>> My vanity publishing:- NGE|blog|Photos|retro-blog|Fanfics &c.|MAL|𝕏|🐸|🦣
Avatar: art deco Asuka