If this comes to be I will contribute again. My primary desire is to stay out of the melee of lily-dev. I remain firm in my belief that my presence reached the point of being an obstacle to growth rather than an agent of it.
-- ChristianRatliff - 06 Aug 2003
I refuse to consider Christian's many contributions over many years as an obstacle to growth. The problem is not his work or his presence; the problem is that it is so difficult to merge work which is coming from multiple sources at the same time. So, what everyone wanted to do was beat up on Christian until he implemented whatever good ideas they had. I think it's amazing that Christian lasted in that endless crossfire for as long as he did. I was in that crossfire for about three months before I wanted to scream at everyone to just LEAVE ME ALONE!
-- GaranceDrosehn - 06 Aug 2003
So, we need to make it much, much easier to merge work from multiple sources. The obvious solution in normal open-source coding projects is to have a CVS repository, with branches, and diffs, and changelogs which say why any given change was made. Everyone who has done much work with lily understands this. The trick is how to implement it, given that what we think of as "lily" is really a dynamic database of actively-running code. It isn't code which is compiled and then run.
The tactic I think we want to use is to come up with a way to make snapshots from an actively-running core, and turn that into a collection of flat files. We can then put those flat files into a CVS repository. We also need a way to take that set of flat files, and create a new lilyDB out of them.
Then each person can start with some snapshot of a lilyDB, create a running core out of that, make changes to the running core until it works exactly as they want it to work, and then make a new snapshot. You can then "commit" that snapshot to the original repository, and generate a single set of diffs which contain all of the coding (and informational) changes which are necessary for that change.
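To make the snapshot round trip concrete, here is a minimal Python sketch of the two directions: writing an in-memory set of objects out as flat files, and reading those files back. It is not how splitcore or dump-server-to-files work. The directory layout (`obj00000/properties.txt`, `verb_<name>.moo`), the function names, and the in-memory structure are all hypothetical, and it assumes single-line string property values and verb names that are safe to use as filenames.

```python
import os

def write_snapshot(objects, root):
    """Write {obj_num: {"props": {...}, "verbs": {...}}} as flat files.

    Hypothetical layout: one directory per object, one file per verb.
    """
    for num, obj in objects.items():
        obj_dir = os.path.join(root, f"obj{num:05d}")
        os.makedirs(obj_dir, exist_ok=True)
        # Sort properties so the same core always yields the same file,
        # which keeps version-control diffs free of ordering noise.
        with open(os.path.join(obj_dir, "properties.txt"), "w") as f:
            for name in sorted(obj["props"]):
                f.write(f"{name}={obj['props'][name]}\n")
        for verb, code in obj["verbs"].items():
            with open(os.path.join(obj_dir, f"verb_{verb}.moo"), "w") as f:
                f.write(code)

def read_snapshot(root):
    """Rebuild the in-memory structure from the flat files."""
    objects = {}
    for entry in sorted(os.listdir(root)):
        obj_dir = os.path.join(root, entry)
        props, verbs = {}, {}
        for fname in os.listdir(obj_dir):
            with open(os.path.join(obj_dir, fname)) as f:
                text = f.read()
            if fname == "properties.txt":
                for line in text.splitlines():
                    name, _, value = line.partition("=")
                    props[name] = value
            else:
                # Strip the "verb_" prefix and ".moo" suffix.
                verbs[fname[len("verb_"):-len(".moo")]] = text
        objects[int(entry[len("obj"):])] = {"props": props, "verbs": verbs}
    return objects
```

The point of the sketch is only that the snapshot has to be lossless in both directions before a repository of flat files can stand in for a core.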
So, what do we have so far? There is a Perl script that Josh wrote, which can connect to a running lilyDB and extract all kinds of information from it. I think that is the dump-server-to-files script that is a part of the verb-utils. There is also a "splitcore.rb" Ruby script that I have been working on. My script was actually an offshoot of the corediff.pl script that Christian had started to work on, and which is at http://www.lily.org/users/cratliff/lily/ .
The big difference between the splitcore and dump-server-to-files scripts is that splitcore (and corediff) work on a lilyDB file (i.e., a checkpoint of a running core), while dump-server-to-files works by connecting to the running core and asking it a bunch of questions.
The advantage of dump-server-to-files is that it needs no changes due to different versions of the underlying lambdaMOO code. splitcore needs to know exactly what lambdaMOO stores in its files, and exactly what format the information is stored in. lambdaMOO is open-source, so that is not a serious barrier, but it definitely means that splitcore will have to be updated any time that lambdaMOO changes how it stores information -- even if lily is not using any new features.
The advantage of splitcore is that I was forced to figure out absolutely every piece of information that lambdaMOO stores away. Due to this, splitcore already pulls out a more complete picture of the lilyDB than dump-server-to-files does. There are still some lines that the script does not understand, but the script does know how to format somewhere over 95% of all the information from the lilyDB file.
Right now I'm in the middle of a rather significant restructuring of the splitcore script, and I want to finish that before making it more available. Also, the script only knows how to turn a lilyDB into a directory of plain files; it does not know how to take those plain files and turn them back into a lilyDB. What I would do is use splitcore on a checkpoint of RPI's core, and then on a checkpoint of some other core (either 'cr7' or the old devCore which many people have tried working on over the years). I would then use the FileMerge utility (on Mac OS X) to compare the two directories of information. Based on what I found there, I would copy bits and pieces of information from one core to the other one. "Information" here includes code changes to existing verbs, new verbs, verb_info, properties, and property_info.
Some of the information that splitcore generates is organized poorly, or formatted poorly. I just recently realized that the dump-server-to-files script was already written, and in looking that over I see it has some better ideas for formatting some of that information. So, I'm going to make some formatting changes to splitcore based on those ideas.
Another thing about splitcore is that it actually understands all the information that it is organizing, so you can have it do things like "only create the files which reference the variable 'discussions'". It should also be possible to have it read in two separate cores, and only create files which have changed between the two cores -- although I'm not sure that feature is as useful as I originally thought it would be.
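For the "only what changed between two cores" idea, a rough equivalent can be had after the fact by comparing the two output directories. The Python sketch below uses the standard `filecmp` module. The function name `changed_files` is made up for illustration, and it assumes both cores were already split into directory trees with matching file names.

```python
import filecmp
import os

def changed_files(old_dir, new_dir):
    """Return sorted relative paths that differ or exist on only one side."""
    changed = []

    def walk(cmp, prefix):
        # Anything present on only one side counts as a change.
        for name in cmp.left_only + cmp.right_only:
            changed.append(os.path.join(prefix, name))
        # dircmp's own diff_files relies on a shallow stat comparison,
        # so re-check the common files byte for byte.
        for name in cmp.common_files:
            left = os.path.join(cmp.left, name)
            right = os.path.join(cmp.right, name)
            if not filecmp.cmp(left, right, shallow=False):
                changed.append(os.path.join(prefix, name))
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(old_dir, new_dir), "")
    return sorted(changed)
```

This only says which files differ; a person still has to look at each difference and decide whether to take it.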
I also need to make some performance improvements to it. The smarter I made the script, the longer it takes to run (imagine that...). It works pretty quickly on the test cores I give it, but it takes the script more than five minutes to process the production RPI core -- and a large part of that time is skipping over things like user-objects, /memos, and /info's. I.e., stuff that we really don't care about when it comes to comparing code-changes between different cores.
I would also say that it would be helpful if we did a better job of separating "core-specific" values from "code-specific" values. There are global variables which are changing in any lilyDB, simply because discussions are being created or destroyed, or accounts are created, or simply "time passes". Right next to those are other variables which you would have to change for the new /review code to work (for example). We need to have a good, reliable way to know which variables are important for code-changes, and which ones should be ignored.
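One possible shape for that separation is an explicit ignore list of core-specific properties that gets applied before any comparison. The Python sketch below is purely illustrative: the `#object.property` key format, the pattern list, and every property name in it are invented, not taken from any real lilyDB.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore patterns, written as "#object.property".
# These names are invented for illustration only.
CORE_SPECIFIC = ["#0.last_*", "#0.server_name", "#*.created_at"]

def is_code_relevant(obj, prop, ignore=CORE_SPECIFIC):
    """True if a change to this property should show up in a code diff."""
    key = f"#{obj}.{prop}"
    return not any(fnmatchcase(key, pattern) for pattern in ignore)

def filter_changes(changes, ignore=CORE_SPECIFIC):
    """Keep only the (object, property) pairs that matter for code changes."""
    return [(o, p) for o, p in changes if is_code_relevant(o, p, ignore)]
```

Whether such a list lives in a configuration file or inside the lilyDB itself is a separate question; the filter works the same either way.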
The splitcore script is not as far along as I would like it to be, but without it I simply could not have gotten as far as I did with merging the "cr7", devCore, and RPI cores. The only tools I used for all that work were this script, FileMerge, and %source commands in clily.
There is more I could write up on this topic, but I need to get back to real-work now.
-- GaranceDrosehn - 06 Aug 2003
As far as CVS is concerned, the way we might want to handle this is to have a single CVS repository for all "development cores" (including RPI's, which sees a lot of development on it), and have each core as a separate CVS branch in that repository. I haven't thought too much about that, though. I'm just guessing-out-loud here.
-- GaranceDrosehn - 06 Aug 2003
I would not want to see separate branches for separate cores. If people are working on something in their own core that they're not ready to check in, they can simply not check it in. As for "what's important, what's not", that should be in a configuration file in cvs. (cvs should include both the mooDB bits and the tools and glue necessary to hack on them. I envision being able to "make core" and "make files" and "make test".)
-- CoKe - 06 Aug 2003
They might want to "check it in", in the sense that the code is in production on their core. I might want to say "I'm not going to pick up that change until it's been running a month", or something, and they would want to check in their work (somewhere) before starting on some other programming task. I'm still not sure this is the right idea, but it "kinda seems right" in some ways.
When it comes to "what's important, what's not", I suspect that that information should be in the lilyDB itself, somewhere. Another area where this comes up is what properties we make available via the web interface. We recently had a case where we were making public (through the web interface) some password-related info. I made a quick dumb patch for that specific case, but we need to have a better way to handle that. And in that case, the information most certainly has to be part of the lilyDB itself.
-- GaranceDrosehn - 06 Aug 2003
The one suggestion I can offer to speed the script is to cut off object searches at #99. The rule for the server is that no programmatic object number may exist over #99. The lilyCore cut process destroys all such objects, so any work on objects with higher numbers would be lost by the cut anyhow.
This is one of those tricks which ought to be documented before someone rips out their hair.
-- ChristianRatliff - 07 Aug 2003
Initially my script only did the first 99 objects, because it was literally skipping over any lines it "didn't understand" and only picked up lines that it "thought it understood". The problem with this is you start tripping over lines from /review, /info, or /memos which look like the start of some object (when you are reading the database file directly), and that can cause all kinds of confusion in your processing.
So, I purposely read every single line in the database, do some sanity-checking on it, and save it away if it is something that I care about. The problem is that I save pretty much all information until I get to the end of the object, and then I throw away that object if it is one that I don't care about. I think I can speed things up if I still sanity-check lines, but don't bother saving any values for objects that I know I'm going to toss away.
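The "check every line, but only store what will be kept" idea can be sketched in Python as below. This is not the real lambdaMOO database format. It assumes a toy format in which an object starts with a line like `#12` on its own, and it uses a made-up sanity rule (a header is only believed if its number is higher than the current object's) to stand in for the real checks. `read_wanted_objects` and `HEADER` are hypothetical names.

```python
import re

# Toy header format (an assumption): "#12" alone on a line starts object 12.
HEADER = re.compile(r"^#(\d+)$")

def read_wanted_objects(lines, max_wanted=99):
    """Return {obj_num: [body lines]} for objects numbered <= max_wanted."""
    kept = {}
    current = None   # number of the object currently being read
    body = None      # list for a kept object, None for a skipped one
    for line in lines:
        line = line.rstrip("\n")
        match = HEADER.match(line)
        # Toy sanity check: only believe a header whose number is higher
        # than the current object's, so a stray "#3" inside some text is
        # treated as ordinary content rather than a new object.
        if match and (current is None or int(match.group(1)) > current):
            current = int(match.group(1))
            body = kept.setdefault(current, []) if current <= max_wanted else None
        elif body is not None:
            body.append(line)
        # Otherwise the line belongs to a skipped object: it was still
        # checked above, but nothing is stored for it.
    return kept
```

Every line is still examined, so object boundaries stay in sync, but memory and formatting work are only spent on the objects that will actually be written out.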
Also, there's a question of how we would update one core from another one. One way to do that is to split the entire target core into files, apply updates to only the first 99 objects, and then take those files and rebuild a new lilyDB from them. Thus, the split-up version would need all information from all objects. There are, of course, other ways to handle upgrades, but I didn't want this script to be the limiting factor. I'm still not sure what will be the best way to handle the actual upgrading process.
-- GaranceDrosehn - 07 Aug 2003
Over the Labor Day weekend I did some more work on my splitcore script. The more I think about LilyInCvs while I work on the script, the more I think this would be difficult to automate. "Automate" in the sense that you could just blindly 'cvs commit' a new core that you have working, and then others could 'cvs update' their core from what you committed in the repository. At least the way that information is organized in lilyDB right now, we have core-specific data mixed in pretty closely with lily-specific data. Object #0, for instance, has some properties which would need to exist and be the same in all lily-cores, and other properties whose value will almost certainly be different in different cores.
Also, right now the script is mainly concerned with formatting information such that it's easy for me to figure out what has changed. This is a good thing, of course, but it means the output files are not in a format such that it would be easy to re-create a lily database from the output files. The script helps to organize all the information so you can see what has changed, but it still needs a person to look at the result and say "I'll take this change, but ignore that change".
Well, I had more I was going to write up, but I'm too tired at the moment.
-- GaranceDrosehn - 02 Sep 2003
For what it's worth, the present version of the splitcore.rb script is at ftp://lily.acm.rpi.edu/pub/lily/utils/lily-rtools.tar.gz, but I should probably have it under a different name. This allows you to run the script on a core (or preferably two) to see what I'm trying to do. Save a copy of a core, run the core, make some changes, checkpoint the core, and then run splitcore.rb on the original core and the newer version. Then, use some handy tool which allows you to compare two directories of files. Under Mac OS X I use FileMerge.
-- GaranceDrosehn - 03 Sep 2003
This document is beginning to underscore the fact that we need a good editor. ^_^
After some recent experiments with svn by TigerLily, I've found that subversion has an actual API that may make it reasonable to integrate source control directly into lily. An upgrade path would then be "apply the version control patch", then "run the admin command that upgrades you to a particular version" (and, after you've applied the vcp once, you never have to do that step again). This would effectively kill transfer, which is a good thing. (It flips $transfer inside out, upgrading the DB in place rather than saving off the user base and importing them to a new server.)
I need to give this some thought, because I also still want to support those people who like to do their editing on a bunch of flat files.
-- CoKe - 17 Dec 2003