Archives

For use by moderators only to discuss diabolical things. :-)


Postby GeoffSchultz » Wed Dec 17, 2008 3:59 pm

Well, I think that I've finished with the code to import the Freedom Yahoo group over here. You can find the results in the Archives category. What do you think of the format? Some of the posts seem to pick up extra blank lines and I can't figure out why, so I'm just going to let it be.
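One possible cause of posts growing extra blank lines (a guess, not confirmed in this thread) is Windows-style CRLF line endings being turned into two line breaks somewhere between Access, MySQL and phpBB. If that is what's happening, a cleanup pass on each message body before import could hide it. A minimal Python sketch; `normalize_blank_lines` and `max_blank` are made-up names:

```python
def normalize_blank_lines(body: str, max_blank: int = 1) -> str:
    """Normalize line endings and cap runs of consecutive blank lines."""
    # Convert Windows (CRLF) and old Mac (CR) line endings to plain LF.
    text = body.replace("\r\n", "\n").replace("\r", "\n")
    # Strip trailing whitespace so whitespace-only lines count as blank.
    lines = [line.rstrip() for line in text.split("\n")]
    out = []
    blank_run = 0
    for line in lines:
        if line == "":
            blank_run += 1
            if blank_run > max_blank:
                continue  # drop blank lines beyond the allowed run
        else:
            blank_run = 0
        out.append(line)
    # Remove leading/trailing blank lines from the whole body.
    return "\n".join(out).strip("\n")
```

This only caps blank runs; it doesn't find the real cause, so it's a band-aid either way.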

I only processed the first 30 records, but there are 4800 to do.

Comments?

-- Geoff
BlueJacket
1997 Freedom 40/40
http://www.GeoffSchultz.org
GeoffSchultz
Posts: 1025
Joined: Thu Dec 04, 2008 8:39 am
Location: BlueJacket: Guatemala

Re: Archives

Postby katorpus » Wed Dec 17, 2008 5:44 pm

Well... a couple of things (OK...six things).

1) Are we seeking a "complete archive" here or a moderated import of that which is germane and of continuing interest?

Example...a "boat for sale" listing imported into the archives

2) What's going to happen when subsequent (not yet imported) posts are made to the existing threads? When they are ultimately imported, will they "show up" by topic as an appendage to the existing thread, or as another topic with the same name? (ye gads!!)

3) How are you (or ARE you) planning to monitor things over there on an ongoing basis in order to import future posts on existing threads? Is that part of your coding?

4) Do you have any objection to your moderators "cleaning up" by editing the posts to eliminate quoted prior posts and leaving only what's been added by each individual? (To be done ONLY in times of abject boredom, such as when waiting "on hold" for an IRS agent.) For that matter, some comments are "worthy" of being eliminated entirely. ("I agree!...I don't agree!...(with no further explanation)...for example.)

5) I have my configuration set to show "last posts first" in order to quickly see what's being added (here) without having to scroll/click through multiple posts/pages in order to get to the latest comment. This is a mess when viewing a thread that you are trying to read from start to finish, having never "seen" any of it before.

6) How "automated" is your coding? It sure looks like a hell of a lot of work on either the front end (to not import the "drivel" in the first place) or on the back end (to get rid of said "drivel" so the archive isn't in as (almost) useless a form as it is on the FOG group). There are hundreds of posts...things like rendezvous long past...the pissing match over our very existence here...that are not worthy of archiving.
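For item 4, part of the "cleaning up" could be done in code if the Yahoo replies mark quoted text with a leading `>` the way email replies usually do (an assumption; some messages may quote differently). A rough Python sketch with a hypothetical `strip_quoted_text` helper:

```python
def strip_quoted_text(body: str) -> str:
    """Drop '>'-quoted lines and a 'wrote:' attribution line just before them."""
    kept = []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            # If the line right before the quote is an attribution such as
            # "On Mon, Someone wrote:", drop it as well.
            if kept and kept[-1].rstrip().endswith("wrote:"):
                kept.pop()
            continue  # skip the quoted line itself
        kept.append(line)
    return "\n".join(kept).strip()
```

It only removes `>` lines and a "... wrote:" line directly in front of them, so the "I agree!" posts would still need a human.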
katorpus
Posts: 143
Joined: Thu Dec 04, 2008 10:51 pm
Location: Freedom 40 Cat Ketch, Hull #61, Corpus Christi, TX

Re: Archives

Postby admin » Wed Dec 17, 2008 6:09 pm

Believe me, this is NOT automated at all. If you're really interested, these are the steps that I have to go through. If nothing else, this helps me remember all of the steps that I need to take.

  1. Install MySQL and ODBC drivers on PC (one-time job)
  2. Run a program called PG Offline which sucks the Yahoo Group into an Access database on my PC
  3. Run BullZip's "Access to MySQL" program which copies the Access database to a MySQL database on my PC
  4. Use HeidiSQL to export MySQL tables to an SQL file
  5. Edit the SQL file to access the appropriate database on the main web server
  6. Use HeidiSQL to access the database on the main web server
  7. Import the SQL files created/edited in steps 4 & 5 into the web server database
  8. Spend 2 days writing code to merge the above database with the phpBB database
  9. Modify "import_yahoo.php" to point to the correct forum. Note that this is hard-coded.
  10. Execute "import_yahoo.php" via web browser
  11. Delete the temporary database tables created in step 7
  12. Delete "import_yahoo.php" from the web server
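Step 8 is the hard part. One way it could pair Yahoo messages with phpBB topics, assuming messages are threaded purely by subject line (the real import_yahoo.php may do it differently), is sketched here in Python. `YahooMessage`, `topic_key` and `group_into_topics` are hypothetical, and the actual phpBB inserts are left out:

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass
class YahooMessage:
    """Hypothetical shape of one row from the imported Yahoo message table."""
    msg_id: int
    subject: str
    posted: datetime
    author: str
    body: str

# Leading "Re:", "Fwd:" or "Fw:" prefixes, possibly repeated ("Re: RE: ...").
_PREFIX = re.compile(r"^\s*((re|fwd|fw)\s*:\s*)+", re.IGNORECASE)

def topic_key(subject: str) -> str:
    """Normalize a subject so "Re: RE: Mast step" and "mast  step" match."""
    return " ".join(_PREFIX.sub("", subject).split()).lower()

def group_into_topics(messages: list[YahooMessage]) -> dict[str, list[YahooMessage]]:
    """Group messages by normalized subject, oldest first.

    The first message in each list would become the phpBB topic starter
    and the rest would be inserted as replies.
    """
    topics = defaultdict(list)
    for msg in sorted(messages, key=lambda m: (m.posted, m.msg_id)):
        topics[topic_key(msg.subject)].append(msg)
    return dict(topics)
```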

Now, as far as the rest of your questions.

Keeping this updated will be a sporadic event as it's not automated. Basically you have to repeat the above, just selecting posts past a certain date. The code would have to be updated to look for posts with the same subject as an existing topic and merge them into that topic. That's a whole other programming issue as I didn't plan for that, but it can be done. The other option is to just rebuild the entire archive every time I do this. That might be a lot easier.
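The "same subject, merge into it" update described above could be planned before touching the database. A minimal Python sketch, assuming the previous run saved a mapping from each topic's normalized subject to its phpBB topic id (that mapping and every name here are hypothetical):

```python
import re
from datetime import datetime

# Leading "Re:", "Fwd:" or "Fw:" prefixes, possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fwd|fw)\s*:\s*)+", re.IGNORECASE)

def topic_key(subject: str) -> str:
    """Lowercase the subject and strip any leading Re:/Fwd: prefixes."""
    return " ".join(_PREFIX.sub("", subject).split()).lower()

def plan_incremental_import(messages, existing_topics, cutoff: datetime):
    """Split messages newer than `cutoff` into replies and new topics.

    messages: iterable of (msg_id, subject, posted_datetime)
    existing_topics: {topic_key: phpbb_topic_id} for topics already imported
    Returns (appends, new_topics): appends is a list of (topic_id, msg_id);
    new_topics maps a topic_key to the msg_ids that will form a new topic.
    """
    appends, new_topics = [], {}
    # Only posts past the cutoff date, oldest first.
    fresh = sorted((m for m in messages if m[2] > cutoff), key=lambda m: (m[2], m[0]))
    for msg_id, subject, _posted in fresh:
        key = topic_key(subject)
        if key in existing_topics:
            appends.append((existing_topics[key], msg_id))
        else:
            new_topics.setdefault(key, []).append(msg_id)
    return appends, new_topics
```

Matching on subject alone means two unrelated threads that share a subject like "Question" would end up merged, which is one point in favor of the full-rebuild option.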

I have absolutely no problem with people cleaning up the archives, but if I rebuild it from scratch, that won't work. I guess that should just be put on hold pending how much time I can devote to automating this.

I plan to have the archive be "read-only" so that no one can add anything. It is an archive after all...

-- Geoff
admin
Site Admin
Posts: 12
Joined: Thu Dec 04, 2008 7:49 am

Re: Archives

Postby THATBOATGUY » Wed Dec 17, 2008 6:25 pm

The value as I see it is to have a searchable archive here of the Yahoo group. That you have accomplished. I'd forget about cleaning it up and just redo the whole mess every time the post count at Yahoo reaches a certain number... say 300. That keeps the work down to a minimum.

My hat's off once again, Geoff. :)

George
George and Kerri Huffman S/V Marquesa Freedom 40 CC CK Sail Marquesa
THATBOATGUY
Posts: 574
Joined: Thu Dec 04, 2008 9:50 am
Location: F40 CC CK Maryland

Re: Archives

Postby katorpus » Wed Dec 17, 2008 6:35 pm

maybe in the hard coding...

you could "identify" threads of no current interest (don't ya kinda think those "free fuller ports" are gone by now?)...and delete them from the update routine (so they wouldn't be re-imported in a later rebuild)...then delete the "junk" that's here (that you won't be re-importing) and at least be a little more cleaned up?

It would be a shame to lose ALL of what's over at the other group (if it "went away"), but I don't see a lot of utility in mirroring all of it.
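The "delete them from the update routine" idea could be a moderator-maintained exclusion list checked on every rebuild, so junk threads never come back. A minimal Python sketch; the list entries are made-up examples and the names are hypothetical:

```python
import re

# Leading "Re:", "Fwd:" or "Fw:" prefixes, possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fwd|fw)\s*:\s*)+", re.IGNORECASE)

def topic_key(subject: str) -> str:
    """Lowercase the subject and strip any leading Re:/Fwd: prefixes."""
    return " ".join(_PREFIX.sub("", subject).split()).lower()

# Hypothetical moderator-maintained list of normalized subjects to skip.
EXCLUDED_TOPICS = {
    "free fuller ports",
    "rendezvous",
}

def filter_for_rebuild(messages, excluded=EXCLUDED_TOPICS):
    """Drop (msg_id, subject) pairs whose normalized subject is excluded."""
    return [(msg_id, subject) for msg_id, subject in messages
            if topic_key(subject) not in excluded]
```

Keeping the list outside the import code would let moderators add topics without editing the hard-coded script.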

Re: Archives

Postby THATBOATGUY » Wed Dec 17, 2008 7:05 pm

hmmmm... I just tried to search the archive and it didn't work.

George

Re: Archives

Postby katorpus » Fri Dec 19, 2008 8:30 am

Kudos on your monumental effort, Geoff.

I only hope that the responses to your post (over at FOG) are only positive...or at least well-defended by some of the "cooler heads" over there (but I'm more of a realist than that)...

btw: "Pessimist is defined as ...what an optimist calls a realist."

Did your "weeding out of the spam" preclude future "total rebuild" downloads? In other words, would it be a waste of time for a moderator to "trim things down" a little more?

Re: Archives

Postby GeoffSchultz » Fri Dec 19, 2008 10:34 am

There wasn't any spam in the current group. All of the spam that I cleaned out was in the 2003 group, which should be static, so it shouldn't require any more downloads.

-- Geoff

I do have to laugh about the post from Lance stating that the group now has 100 GB of storage. The total storage required by the current group is about 87 MB. Why on earth is anyone concerned about storage?

Re: Archives

Postby THATBOATGUY » Fri Dec 19, 2008 11:22 am

It looks great, but is it searchable? I just tried searching it for the word "plumbing" and came up empty-handed. I'm more of a realist too, John. I don't have high hopes that the antiques will roll over, but you never know...

And once again a tip of the hat to Geoff. Good job!

George

Re: Archives

Postby GeoffSchultz » Fri Dec 19, 2008 11:29 am

I need to rebuild the search word list, but I have to do that on my machine as it takes too much CPU time to do it on a shared server. Then I have to upload the search word list to the main server. I'm actually working on that today.
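Conceptually, a search word list is an inverted index: each word mapped to the posts that contain it. Building it locally and uploading only the result is what keeps the heavy lifting off the shared server. phpBB's own search tables have their own layout, which this doesn't attempt to reproduce; it's just a Python sketch of the idea with made-up function names:

```python
import re
from collections import defaultdict

# Lowercase letters, digits and apostrophes count as word characters.
WORD = re.compile(r"[a-z0-9']+")

def build_word_index(posts, min_len=3):
    """Build {word: sorted post ids} from {post_id: text}.

    Words shorter than min_len are skipped, much as forum search
    engines usually ignore very short words.
    """
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in WORD.findall(text.lower()):
            if len(word) >= min_len:
                index[word].add(post_id)
    return {word: sorted(ids) for word, ids in index.items()}

def to_rows(index):
    """Flatten the index into (word, post_id) rows, e.g. for an SQL upload."""
    return [(word, pid) for word in sorted(index) for pid in index[word]]
```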

-- Geoff
