Darren's Weekly Nugget 10/19/2009

One more thing -- if your VI file is already in the disk cache, then you will notice it loads very, very quickly.  In this case, the file size has no impact on file load time.  That portion of load time has been effectively taken out.

 

So another way to improve your load time is to have loaded it once before. ;)

Message 11 of 28
(2,732 Views)

sth wrote:

If there is that much speed improvement it means to me that something is wrong! 


It's called "windoze"..

Message 12 of 28
(2,715 Views)

dhedges wrote:

...

 

Why do operating systems behave this way?  Why load an entire file into the cache, instead of waiting until specific bytes are requested?  Latency.  While hard drives have very good bandwidth, they still have to wrestle latency.

 

Bottom line: if you want your VIs to load as fast as possible, make them as small as possible.  Get rid of your diagram if you can.


As an old-time disk specialist, I think I can add some value.

 

In the old days of DOS, we had to keep our code small, and if we could not, we had to include code to load new code segments (overlays) over old ones.

 

With OSes that support virtual memory, that work is handled by the OS, which takes advantage of additional hardware on the CPU to do the virtual-to-physical memory mapping and to handle page faults (when an attempt is made to access a page of memory that has been mapped but is not loaded into physical memory).

 

So the first term I'd like to clarify is that the files are "mapped", not "cached".

 

Mapping virtual memory requires locating all of the sectors on disk that hold the file being opened and configuring the "special hardware" on the CPU I mentioned earlier.  Once mapping is complete, the code can address any offset required.  If that offset is currently loaded into physical memory, the address translation logic converts it to a physical address and the data is accessed.  If it is not in physical memory, the hardware forces a page fault trap and the OS wakes up to get the data from disk and update the translation logic.
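
A rough C sketch of that behavior, assuming a POSIX system ("somefile.dat" is just a placeholder): mmap only builds the translation, mincore reports which pages are actually resident, and touching a single byte faults in just one page.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("somefile.dat", O_RDONLY);      /* placeholder file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Map the whole file: this only configures the address translation,
       no data is read from disk yet. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long pagesz = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + pagesz - 1) / pagesz;
    unsigned char *vec = malloc(npages);

    mincore(p, st.st_size, (void *)vec);          /* which pages are resident? */
    printf("first page resident before touch: %d\n", vec[0] & 1);

    volatile char c = p[0];                       /* page fault: ONE page comes off disk */
    (void)c;

    mincore(p, st.st_size, (void *)vec);
    printf("first page resident after touch:  %d\n", vec[0] & 1);

    munmap(p, st.st_size);
    free(vec);
    close(fd);
    return 0;
}

On a cold cache the first line typically prints 0 and the second prints 1.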

 

So in an OS with virtual memory, a page is only read from disk if it is accessed.

 

Now, I think I read years ago about LV forcing all pages into memory through some trickery or whatnot, to prevent issues with disk I/O slowing the code down.  This only works if you have more physical memory than required by the app plus the OS.  After that, the extra data is stuffed into the page file on the system disk, so it is back on the disk again anyway (but stored in a manner that allows quick access).
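
If an application really did want to force a mapped file entirely into physical memory up front, it could do something like the rough sketch below (POSIX; the function and path are placeholders, and this is only a guess at the kind of trickery involved, not what LV actually does):

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Touch every page of a file so it is all resident (until memory pressure
   pushes it back out to the page file, as noted above). */
int prefault_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    fstat(fd, &st);

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    madvise(p, st.st_size, MADV_WILLNEED);        /* hint: read the range ahead */

    long pagesz = sysconf(_SC_PAGESIZE);
    volatile char sink = 0;
    for (off_t off = 0; off < st.st_size; off += pagesz)
        sink = p[off];                            /* each touch faults one page in */
    (void)sink;

    munmap(p, st.st_size);
    close(fd);
    return 0;
}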

 

Disk cache can mean a couple of different things.  The OS can cache the bitmap file of the disk to speed up file allocations, or the disk drive could keep a copy of the most recently read/written data to speed up access.  But caches are almost always just a hidden copy of a subset of the data.

 

Done with my soap-box thing.  I probably got something wrong, so feel free to correct whatever seems really off.

 

Have fun!

 

Ben

 

 

Retired Senior Automation Systems Architect with Data Science Automation | LabVIEW Champion | Knight of NI and Prepper | LinkedIn Profile | YouTube Channel
Message 13 of 28
(2,705 Views)

I agree with most of what Ben is saying, but this isn't a virtual memory problem; rather, there seems to be some confusion as to the exact use of the word 'cache'.

 

As Ben pointed out, there is cache on the drive hardware, buffered caches in the OS disk driver, and then whatever LV itself decides to cache.  All the caches in the OS and the drive hardware work on the order of disk blocks or memory pages.  Neither the OS nor the disk cache will read ahead more than the memory page size, which is 4 KB on most systems.

  

This is small compared to the file sizes we are talking about.  The disk hardware may try to read the next few sectors into its hardware cache in an attempt to read ahead; some drives will read an entire track at a time.  But that just means the code section should be contiguous on the disk.  I know that Mac OS X does defragmenting of the disk on the fly, but that means any effect is a function of fragmentation status rather than actual file size.

 

If LV demands that the entire file be read into memory (which it can do) and then picks out the code section, then I agree it will slow down.  But in that case I think it qualifies as "something is wrong" as well!

 

The bottom line is that even back when I used DOS 3.2, the only way an entire file gets read into any sort of memory when you access one location is if you force it to load explicitly.

 

Since the 100s of VIs are scattered on the disk, you will have the same number of random disk seeks either way, with and without diagrams.  Of course, disk read blocks are cached until they need to be recycled, so a second disk read to load the set of VIs may be quicker depending on the free RAM in your system and the type of OS.  I am fairly familiar with how many OSes work (DOS, Mac OS 9, OS X, Linux, various UNIX flavors, VMS, OS/360), but not Windows in all its various flavors.  Maybe Windows is really that stupid, but the others do not load entire files when a single block is accessed (well, maybe IBM OS/360 was that dumb).  The disk hardware cache doesn't know about files; it knows about track and block info and works on that level.

 

In fact, the point that Ben made about cached vs. mapped memory brings this to a whole other level.  When a VI is "loaded" by LabVIEW, it technically should not even need to copy the code into RAM.  Basically it just makes a mapping between the code segment in the file and virtual memory, so the size matters even less.  The code is only read into memory as it is needed to execute; after each VI is loaded I assume some initialization code is executed, and that is the only disk access needed for the code that does the initialization.  This should not depend on VI size at all.
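
A rough sketch of that idea in C (the offset and length of the code segment are hypothetical inputs; this is not LabVIEW's actual loader, just an illustration that mapping a segment costs almost nothing up front):

#include <sys/types.h>
#include <unistd.h>
#include <sys/mman.h>

/* Map only one segment of an open file.  mmap offsets must be page
   aligned, so round down and return a pointer adjusted by the slack. */
void *map_segment(int fd, off_t seg_offset, size_t seg_len,
                  void **map_base, size_t *map_len)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    off_t aligned = seg_offset & ~((off_t)pagesz - 1);
    size_t slack = (size_t)(seg_offset - aligned);

    *map_len = seg_len + slack;
    *map_base = mmap(NULL, *map_len, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (*map_base == MAP_FAILED) return NULL;

    /* Nothing has been read yet; pages of this range come off the disk
       only when the code in them is actually executed or inspected. */
    return (char *)*map_base + slack;
}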

 

If I can set up a test quickly today I may try to settle this empirically....  Comes from being an experimentalist.
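
Not the LabVIEW test itself (that is attached to a later post), but here is a quick C illustration of the cold-vs-warm effect being measured: read the same file twice and time each pass; the second pass normally comes out of the OS cache.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static double time_full_read(const char *path)
{
    char buf[1 << 16];
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); exit(1); }

    double t0 = now_sec();
    while (read(fd, buf, sizeof buf) > 0)
        ;                               /* discard the data; we only want timing */
    double t1 = now_sec();

    close(fd);
    return t1 - t0;
}

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    printf("first read:  %.3f s\n", time_full_read(argv[1]));
    printf("second read: %.3f s\n", time_full_read(argv[1]));
    return 0;
}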

LabVIEW Champion | LabVIEW Channel Wires

Message 14 of 28
(2,692 Views)

Thanks, Ben, for the clarification.  According to my research, on Windows, roughly half of your physical RAM is used by the operating system, kernel and applications.  The other half is used to remember the contents of the most recently accessed files, just in case you need to access them quickly.  That's why I call it the "disk cache".  Also, other people at work call it that.  Also, I think it differs from mapped virtual memory, in that when you close the file and then open it again, the disk drive is not accessed.  I believe this happens in physical RAM, not in any hardware on the disk drive, but I could be wrong about that.

 

I have not researched other operating systems.  They may be different.  Also, Windows Vista makes all sorts of "performance enhancements" that make measurement difficult.  For instance, a colleague just told me that when you load an application on Vista, it loads all the files into the disk cache that you accessed the last time you loaded the application.  Just in case you'll need them again.  But I haven't tested on Vista.

 

I'm looking forward to your experiments, sth. Please be sure to make your load measurements after a reboot so you can be reasonably sure there is no part of your file anywhere in memory.

 

Thanks!

--Dan

Message 15 of 28
(2,683 Views)

Ok, it does load a lot faster with the diagrams removed.  Now the question is why?  I don't buy that the OS is the culprit here, since I know that opening a file, or even loading a single block from a file, doesn't trigger the OS into loading the whole file into memory.  There are obvious examples in my normal workflow of opening gigabyte-size files where I know that much RAM is not used.  Here is the data, but I have some more comments below.

 

 

In the data below, the top row is with the BD present in the hierarchy; the main VI is referenced and dereferenced 10 times.  The first column is just after running it several times.  The second is after logging out and logging back in, which increases the first load a lot, and the last column is after a total system reboot, which should wipe virtual memory.  With the BD present, the first load time jumps from 4.5 to 7 to 15 seconds across these cases.

 

Without the BD present the time drops to about 60% for both the first and subsequent loads. 

[Chart attachment: LV Load Speed Test]

I know it isn't the prettiest picture, but it gets the idea across.  It took me a while to get an efficient routine to remove all the BDs from a hierarchy.

 

With the BD removed, this routine loads in about 60% of the time that it takes with the BD present.  After logging out, and especially after a reboot, the load takes much longer (factors of 3 or 4).  The initial load takes longer; the second or third load is the fastest, and then the time slowly increases over the next several loads.

 

A few things argue against the disk latency argument: the routines take much longer to load on my desktop system, which has more RAM and a RAID setup but an older G5 CPU.  On the laptop system with the slower disk, loading is much faster for some reason.

 

I believe that the BD is loaded by LabVIEW if it is present.  That is the only thing that really explains the consistent difference between the datasets above.  Otherwise, if it were just disk caching, the speed difference between the first and second loads would be larger, and about the same even after the BD is removed.

 

So if the BD is loaded, the question is why bother?  This morning, while pondering it before work, my gut feeling was that the VI loader always loads the BD because of "Conditional Disable" structures.  These must be loaded to determine whether a new compile is needed.  If the BD is missing, then of course this step is skipped.  Unlike the flag recording which platform the VI is compiled for, this can't be determined without the BD loaded.  LabVIEW could get a big improvement in loading speed if a flag in the header indicated whether such a structure is present in the BD.  However, that would be another flag that could get out of sync with the actual BD and cause bugs.
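
Purely as a sketch of the header-flag idea (none of this is the real VI file format; the struct and flag names are invented for illustration):

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical header layout; invented for this discussion only. */
struct vi_header_sketch {
    uint32_t code_offset;      /* where the compiled code block starts */
    uint32_t code_length;
    uint32_t bd_offset;        /* block diagram location, 0 if removed */
    uint32_t bd_length;
    uint16_t target_platform;  /* the existing "compiled for" concept */
    uint16_t flags;
};

#define VIH_HAS_COND_DISABLE 0x0001   /* hypothetical flag set at save time */

static bool must_load_diagram(const struct vi_header_sketch *h)
{
    /* Only pull the BD off disk when a recompile decision depends on it.
       The risk, as noted above: this flag could drift out of sync with
       the actual diagram contents. */
    return h->bd_length != 0 && (h->flags & VIH_HAS_COND_DISABLE) != 0;
}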

 

I have included the code I used for timing and for recursively removing BDs in the attachment.  Use it carefully, on a system where you don't care about losing the BDs; I have an easy system restore.  You will have to change the path to the VI you want to test.

Message Edited by Support on 10-29-2009 11:07 AM

LabVIEW Champion | LabVIEW Channel Wires

Message 16 of 28
(2,652 Views)

Thanks, sth.

 

The numbers I find particularly compelling are the ones immediately following a reboot.  Load time goes from about 15 seconds down to 10.5.  This is the number that I consider most important, because it's tied closely to user misery.  And the only way to improve this speed significantly is to make the file(s) smaller.

 

That's not to say we don't care about the other numbers.  There are use cases for improving those times, too.  But when I talk about load speed, I am usually implying the maximum load time, which is after a reboot.

 

Two things I can't explain.  First, I would expect that after a logout, you would get the exact same performance numbers as either 1) after a reboot, or 2) with the file in the cache.  Instead you got somewhere in between.  I would expect that the file is either in the cache or it's not.  I would not expect any in-between state.  But that appears to be what happened.  Perhaps there is a hybrid explanation using both disk cache and virtual memory mapping.

 

The other thing I can't explain: when the file is in the disk cache, why is load performance affected by whether or not the diagram is in the file?  No, LabVIEW does not load the diagram (unless there is something wrong with the test).  It's possible that LabVIEW does less error checking if the diagram is not available -- in other words, LabVIEW could be more efficient if it can rest assured you won't load a diagram.  Still, I would not expect such a dramatic difference.  Perhaps there is a gain from spatial locality somehow?  I don't know.

Message 17 of 28
(2,638 Views)

Scott,

 

Did you do anything else (non-LV) with the disk between successive loads?  That would change the disk cache without affecting any memory allocated to LV.  No real ideas here, just listening.

 

Lynn 

Message 18 of 28
(2,635 Views)

I'll do some more speculating based on my experience.

 

The current version of Mac OS is running on something that is (or was) Unix -- if not, disregard the following.

 

Back when I was servicing disk drives on mainframes, I learned the hard way to "sync twice".  Sync was a command that forced Unix to actually write data to the physical disk drive; Unix (back then) would avoid writing to disk unless it had to.  The sync command was similar to the "flush" we have in LV.  So before I shut down the machine, I had to sync to avoid corrupting the file system.  The "sync twice" was due to the fact that the first sync would write what was in memory, but the resulting changes to the disk structures were not yet written.  A sync-twice habit prevented file system corruption.

 

So if we fast forward to modern days...

 

I suspect Scott's OS is caching the files (keeping a copy in memory of what is on disk).

 

This could be investigated by using identical copies of the folder for each test, but never the same one twice (without rebooting).

 

Done with the speculation.

 

dhedges,

 

Process Monitor, available from Microsoft (Sysinternals), will log all the file activity that happens under the hood.

 

Found it in my pocket! Try this out!

 

Ben

Message Edited by Ben on 10-29-2009 10:26 AM
Retired Senior Automation Systems Architect with Data Science Automation | LabVIEW Champion | Knight of NI and Prepper | LinkedIn Profile | YouTube Channel
Message 19 of 28
(2,634 Views)

If you want to take the first load after a reboot, that is fine; that is the most standard proxy for user load times.  I don't believe it should be tied to pure file size, though.  For example, suppose I create a "small" VI and then add 1 GB of text on the BD.  This really shouldn't affect the load time.  (Yet another test to do...)  The LV VI loading routine should open the file, examine the header, find the location and size of the code block in the file, position the file pointer at the start of the code block, and read just the code block into memory.  The BD with its GB of text should sit somewhere else on disk, contiguous or not, and not enter into the equation.  It should be the same number of disk seeks and disk reads with or without the BD.
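
In rough C, that loading pattern would look something like this (the header layout is made up for illustration; it is not LabVIEW's real file format):

#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

/* Invented header: just enough to locate the code block. */
struct fake_vi_header { uint32_t code_offset; uint32_t code_length; };

static void *load_code_block(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct fake_vi_header hdr;
    if (read(fd, &hdr, sizeof hdr) != sizeof hdr) { close(fd); return NULL; }

    void *code = malloc(hdr.code_length);
    /* pread touches only the blocks holding the code section; a 1 GB
       block diagram elsewhere in the file never comes off the disk. */
    ssize_t n = pread(fd, code, hdr.code_length, hdr.code_offset);
    close(fd);

    if (n != (ssize_t)hdr.code_length) { free(code); return NULL; }
    *len_out = hdr.code_length;
    return code;
}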

 

The bottom line is that the BD really shouldn't change the load time for a VI unless there is extra checking.  That is why I hypothesized that the existence of a "Conditional Disable" structure might force it to be loaded.

 

For the first problem, logging in and out may just exercise the RAM and evict parts of the cached file.  The OS reuses RAM based on an LRU (Least Recently Used) algorithm and thus may discard part of the cached file and need to reload it.  Thus you get a value somewhere between a fresh reboot and an immediate re-run.

 

The graphs were generated by immediately opening a VI reference and then closing it.  LV itself may delay flushing the VI and its sub-VIs from memory as well.  I did not do anything between runs.  However, the system normally has about 80 other processes running in the background; this is just the design of Mac OS as a multi-tasking OS.  Most of these are sleeping, but any disk activity by any of them can disturb cached VI data.  I did no other GUI activity between runs other than log out and log in, or reboot and log in.  After login I waited some time for all the login background processes to stabilize.

 

Is Mac OS based on unix?  Wow, that is a complicated question and the answer is, of course, "it depends".....  🙂

 

(Skip this if you don't like OS internals, but it is for you, Ben, since you have obviously dealt with a lot of different OS architectures.)

The Mac has many layers.  The lowest layer is Mach, the microkernel that came out of Carnegie Mellon.  This is the kernel with all the hardware device drivers, memory management and communication.  It is not "unix" per se.  On top of this is layered a FreeBSD unix environment, with all the utilities and features that brings.  The disk driver is in the kernel layer, and any disk caching in RAM is done by this layer.

 

The file system is the old Mac HFS type and has more "features" than unix as well; it has different and more modern file handling than most versions of unix.  File caching is in this layer.  Applications can call the BSD layer for functionality, but they can also call directly into the Mach kernel for efficiency.  It is much easier to see this in the picture from Wikipedia.  Note that the BSD part shows up both as part of the kernel and as a top layer, but you can access the kernel without going through the BSD layer.

 

This is a little complicated but gives a great deal of functionality and ability to optimize access to system resources.

 

Sync forces a flush of "dirty" blocks to the disk.  If you write to a disk block, the in-memory version is modified, the block is marked dirty, and the disk driver will eventually get around to writing it to disk.  In this case we are talking about reading blocks, which sync doesn't change.  There is a way to clear the cache: sum up all "inactive" and "free" memory, create a file of that size, and then read and examine each byte of that file.  It is slow, but it makes sure that most of that memory is recycled and reused by the disk driver.
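
A rough C sketch of that trick (the scratch file path and byte count are placeholders; on a real Mac you would get the free/inactive figures from vm_stat or similar and create the scratch file first):

#include <fcntl.h>
#include <unistd.h>

/* Read (and examine) roughly 'bytes' worth of a scratch file so the
   buffer cache has to recycle the memory it was using for other files. */
static void churn_cache(const char *scratch_path, long long bytes)
{
    char buf[1 << 16];
    long long done = 0;
    int fd = open(scratch_path, O_RDONLY);
    if (fd < 0) return;

    volatile char sink = 0;
    while (done < bytes) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0) break;
        for (ssize_t i = 0; i < n; i += 4096)
            sink = buf[i];             /* touch each page so nothing is skipped */
        done += n;
    }
    (void)sink;
    close(fd);
}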

 

But the disk driver is not "unix", though it shares some design features.  The file system is more "unix" but is heavily modified so that things like file system events can be monitored and trigger actions.  Extended attributes and "access control lists" have been added so it has quite a bit more  functionality than the bare "unix" type file system.

 

LabVIEW Champion | LabVIEW Channel Wires

Message 20 of 28
(2,573 Views)