I understood.
But these are just tautologies: "they're big because they're big"; "they're big because they were smaller and got bigger". An explanation, on the other hand, would look like "each zone entry refers to a sample's full pathname and there's no dedupe" (though that would require ~3 million entries to get to 1.5 GB).
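For what it's worth, here is the back-of-envelope arithmetic behind that ~3 million figure, as a rough sketch. The 500 bytes per entry is my assumption (it's simply what the two numbers above imply), and I'm treating 1.5 GB as decimal gigabytes; `entries_needed` is just an illustrative helper.

```python
def entries_needed(total_bytes: int, bytes_per_entry: int) -> int:
    """Number of entries of a given size needed to fill total_bytes (rounded up)."""
    return -(-total_bytes // bytes_per_entry)  # ceiling division on integers


TOTAL_BYTES = 1_500_000_000   # 1.5 GB, decimal (assumption)
BYTES_PER_ENTRY = 500         # assumed average size of one full-pathname entry

print(entries_needed(TOTAL_BYTES, BYTES_PER_ENTRY))  # 3000000
```

If the real per-entry size were smaller, say a 100-byte path, you would need proportionally more entries, which makes the "it's all filenames" theory even less plausible.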
I've worked on plenty of systems more complex than this, and even those involving some flabby interpreted DSL had a significantly smaller footprint.
I mean, wouldn't it be a bit annoying (yet useful) to discover it was 1.5GB of sample filenames?
I got curious and found this breakdown for Symphobia 1 (without any instrument loaded, just the blank all-in-one patch) under the "expert/engine" tab in Kontakt 6:
My first interpretation was that groups whose sampler is set to TM Pro always load their samples, even if they aren't used at all. But loading adaptive synced articulations further increases the "sample memory used" value, and it can total almost twice what is listed here under "TM Pro Voice Memory" if you touch the mic balance slider to load the second mic position (if applicable).
It does seem a bit excessive to me too. I was trying to find out how many samples there are in Symphobia 1 and couldn't find a clear number anywhere. The zone IDs go up to about half a million. If that's a good approximation, it would be roughly 2 KB of data per sample. That still seems like more than it should be, but it's not as outrageously high as I expected when I first saw the 1 GB for an empty (!) instance.
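The per-sample estimate is just a division; a quick sketch in case anyone wants to plug in their own numbers. Both inputs are rough: 500,000 zones is my guess from the highest zone ID, and I don't know whether Kontakt reports decimal or binary gigabytes, so both are shown. `bytes_per_zone` is only an illustrative helper.

```python
def bytes_per_zone(total_bytes: int, zone_count: int) -> float:
    """Average memory attributed to each zone, in bytes."""
    return total_bytes / zone_count


ZONE_COUNT = 500_000  # assumption: highest zone ID is roughly the zone count

# 1 GB read as decimal (10**9 bytes) and as binary (2**30 bytes)
print(f"{bytes_per_zone(10**9, ZONE_COUNT):.0f}")  # 2000
print(f"{bytes_per_zone(2**30, ZONE_COUNT):.0f}")  # 2147
```

Either way it lands at about 2 KB per zone, so the unit ambiguity doesn't change the picture.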
My conclusion is that the all-in-one patch approach is a dead end for huge Kontakt libraries, and it will factor into my future purchasing decisions. I'm still using the patches from an older version of the library because the new ones also bloat the project file size and save/load times too much. Ideally I'd prefer single-articulation patches like in the Metropolis Ark series.