FileMaker index type & size test across 100k records (& empty fields)
"Compact File" & "Optimize File" in FileMaker Pro 9 Advanced.
A two-pronged experiment.

Test 1: does key ID methodology significantly impact index & file size?

For this test I created 100k records in a single table with just one indexed field. The following tests were performed:
I. 100k number field, serial 1-100,000, indexed: 3,190,784 bytes
II. 100k text field (using the same numbers), indexed minimal: 3,821,568 bytes
III. 100k text field (using serial A0000001...), indexed minimal: 4,661,248 bytes
IV. 100k text field (using serial ABC00000001...), indexed minimal: 5,713,920 bytes
V. 100k as IV., but with the ID stored as a minimally indexed multi-key: 13,778,944 bytes, like so (see also the sketch after this list):
    ABC00000001
    ABC0000000
    ABC000000
    ABC00000
    ABC0000
    ABC000
    ABC00
    ABC0
    ABC
    AB
    A
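The multi-key above is just every leading prefix of the ID, longest first. As a minimal sketch of how such a return-delimited multi-key could be generated (Python purely for illustration; in FileMaker this would be a calculation building a ¶-separated list):

    def prefix_multikey(serial_id: str) -> str:
        """Build a return-delimited multi-key containing every leading prefix
        of the ID, longest first (ABC00000001, ABC0000000, ... down to A)."""
        prefixes = [serial_id[:length] for length in range(len(serial_id), 0, -1)]
        return "\n".join(prefixes)  # FileMaker would use the pilcrow/return separator

    print(prefix_multikey("ABC00000001"))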
Conclusion:
You'd have to have a lot of records for this to be a big issue. Stick to text fields for the multi-key potential [and to avoid the danger of accidental relational failures when a text key is matched against a number key, or vice versa], or just use plain numbers if you want to economise; the space penalty either way is small. Even at 10 million records the ABC00000001 format would only be around 600 MB of index, roughly 300 MB more than the pure-number equivalent, but with no risk of mismatches, a few additional tricks up your relational sleeve, and no need to think about which index style to use (KISS). [Obviously the multi-key has to die!]
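As a rough sanity check on that extrapolation, assuming the file/index size simply scales linearly with record count (an assumption), the arithmetic from the figures above works out like this:

    # Measured file sizes for 100k records (bytes), taken from tests I. and IV. above.
    number_key_100k = 3_190_784     # I.  plain number field, serial 1-100,000
    abc_text_key_100k = 5_713_920   # IV. text field, serial ABC00000001...

    SCALE = 100  # 100k records -> 10 million records, assuming linear growth

    print(f"ABC-format text keys: {abc_text_key_100k * SCALE / 1e6:.0f} MB")  # ~571 MB
    print(f"Plain number keys:    {number_key_100k * SCALE / 1e6:.0f} MB")    # ~319 MB
    print(f"Difference:           {(abc_text_key_100k - number_key_100k) * SCALE / 1e6:.0f} MB")  # ~252 MB

That lines up with the ~600 MB and ~300 MB figures quoted above, allowing for generous rounding.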
Test 2: are redundant and empty indexed fields a size problem?
This is of interest to me because I want a single Account table that holds either Person or Organisation data. If a record holds Person details, then the Organisation field and the _mfk_Organisation field [minimally indexed] will be empty. If it holds Organisation details, then the Title, First Name and Last Name fields, and the {Fn, Ln & _mfk_FnLn} fields [minimally indexed], will be empty. Is this going to cause redundant-space issues? Do I really need to think about separating these two types of Account into two different tables? [An entity-relationship nightmare to my mind!]
The data and indexing requirements look like this...
Person fields:
Organisation fields:
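Purely as a stand-in for those field lists, here is a rough sketch of the single-table layout based only on the fields named above (the _pk_Account key name, the types and the underscored spellings are my assumptions, not the original FileMaker definitions):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Account:
        _pk_Account: str                         # hypothetical primary key, e.g. "ABC00000001"
        # Person fields - left empty when the record holds an Organisation
        Title: Optional[str] = None
        First_Name: Optional[str] = None
        Last_Name: Optional[str] = None
        Fn: Optional[str] = None                 # minimally indexed
        Ln: Optional[str] = None                 # minimally indexed
        _mfk_FnLn: Optional[str] = None          # minimally indexed multi-key
        # Organisation fields - left empty when the record holds a Person
        Organisation: Optional[str] = None
        _mfk_Organisation: Optional[str] = None  # minimally indexed multi-key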
Dummy data
I filled the table in an 80:20 [Person:Organisation] ratio with lots of dummy data: random numbers in the number fields and lorem-ipsum junk text in the text fields to approximate the title, first name, surname and organisation values. I measured the total file size [16,818,176 bytes], then split the data in two, i.e. 80k Person records and 20k Organisation records [as below].
Person dummy data:
Organisation dummy data:
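Purely to illustrate the 80:20 split and the kind of junk values described above, here is a rough Python sketch of equivalent dummy data (the field names, the numeric field and the generation method are my assumptions, not how the original file was actually filled):

    import random

    LOREM = "lorem ipsum dolor sit amet consectetur adipiscing elit sed do".split()

    def junk_text(words: int) -> str:
        """Lorem-ipsum style junk text standing in for name/organisation values."""
        return " ".join(random.choices(LOREM, k=words))

    records = []
    for i in range(100_000):
        if i % 5 != 0:
            # 80%: Person record - Organisation fields stay empty
            records.append({"Title": junk_text(1), "First Name": junk_text(1),
                            "Last Name": junk_text(1), "Organisation": "",
                            "Some Number": random.randint(1, 99999)})
        else:
            # 20%: Organisation record - Person fields stay empty
            records.append({"Title": "", "First Name": "", "Last Name": "",
                            "Organisation": junk_text(3),
                            "Some Number": random.randint(1, 99999)})

    print(sum(1 for r in records if not r["Organisation"]), "Person records")    # 80,000
    print(sum(1 for r in records if r["Organisation"]), "Organisation records")  # 20,000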
Then I measured each [after compacting], and then chopped out the irrelevant fields and re-measured [again after compacting]:
Number of bytes                       after split-up    after removal of irrelevant fields
Person data [79,999 records]:         13,504,512        13,471,744
Organisation data [20k records]:       3,452,928         3,399,680

Conclusion:
Empty fields and their indexes use next to no space. Exactly what I wanted to hear, so I will tear ahead with my single Account table.
I appreciate this (perhaps) borders on lunacy, but I am contemplating building a pretty large system and wish to start from a solid base, so having all the facts and figures before me is helpful. Do certain ways of doing things carry unacceptably high space costs, or near-zero costs? I found the results helpful, particularly in relation to the use of empty and generally redundant fields (as explained above), and since I recorded the results I thought I may as well make them available.