by Nikolai Sklobovsky,
Senior System Analyst,
Retail Technologies International, Inc.
Before we start digging into this problem, let's conduct a simple experiment. Open any folder on any of your NTFS hard drives (I assume you run Windows 2000), click the right mouse button and select the "New | Text Document" command. Accept the default file name by hitting the Enter key. Then open the file's properties, note that the file size is zero, and go to the Summary tab. You will see a few edit boxes; type something into them.

Hit Apply and switch back to the General tab. You will see that the file size is still shown as zero. You can switch to the Summary tab again, or close the property dialog and reopen it any time later: the text you entered will be there. Interestingly enough, according to Explorer or the DIR command the file size is still zero. To make the whole thing even more puzzling, you can make a copy of the file (on the same drive or, at least, within the NTFS realm), and again the size of the copy will be zero but all your additional information will be there...
Some light can be shed on this magic if you try to copy the file to a non-NTFS drive, the nearest example of which is the ubiquitous A: drive, the good old 1.44 MB floppy carrying the old FAT file system. The easiest way to do it is to highlight the file we've just created, call up its popup menu and select the "Send To | 3 1/2 Floppy (A)" command. If you have never tried this before, behold: you'll be greeted by a never-seen-before confirmation dialog, conveniently called "Confirm Stream Loss".

This dialog reveals the actual culprit behind this fascinating new behavior: the multiple data stream capability of NTFS.
You can actually get a little bit further with this feature, even though Windows 2000 does not provide much in the way of GUI or even command-line tools to exploit it. Besides COPY, there are only two more "stream-aware" commands, namely MORE (which lets you see the contents of a stream) and ECHO (which lets you write text into a stream without any programming effort). For instance, to see the data we have entered, issue the following command at the shell prompt:
C:\TEMP>more<"new text document.txt:^ESummaryInformation"
Here ^E must be entered as the ALT sequence 05 (press ALT, type 0 and 5 on your numeric keypad, release ALT), which is the old, legal DOS way to enter non-visible characters, chr(5) in this case.
If you do everything right (don't miss the colon!), your efforts will be rewarded by a few screens of garbage and spaces, amidst which you'll see the data we entered.
You can add information to any file without damaging its current contents via the ECHO command. Consider the following example:
C:\TEMP>echo I know kung-fu!>matrix.txt:neo
As a result, an empty MATRIX.TXT file will be created; the only way to see the hidden text is to issue the MORE command:
C:\TEMP>more<matrix.txt:neo
which prints the famous
I know kung-fu!
line.
Unfortunately, these are the only tools available to an unsophisticated user. To get beyond this point, we need to dig a little bit into the jungle of the Windows API.
Further in this article we are going to discuss how streams work and how to use them from Delphi, how to detect whether streams are supported at run time, and how to enumerate the streams of a file.
Finally, you'll be provided with the source code for a simple console tool that helps you find out the level of stream infestation on your hard drive.
The whole idea behind this phenomenon is neatly explained in Dino Esposito's March 2000 article A Programmer's Perspective on NTFS 2000 Part 1: Stream and Hard Link. Briefly, it tells you that under NTFS each file can theoretically contain an unlimited number of additional data streams. The only thing you need to do is follow a very simple syntax: specify the name of the stream after a colon after the file name. Everything else is taken care of by two standard functions: CreateFile and DeleteFile. That's it. No wands, no smoke, no spells, no mirrors. The only tricky thing, according to the paper, is to enumerate those streams or detect their sheer existence without reading their data.
When I first read this, I could not believe what I had just read. I knew (and I double-checked) that almost all Delphi file-handling routines ultimately call the two functions mentioned above. The simplest way to test it, of course, was to try the SaveToFile / LoadFromFile methods of any TStrings. I launched my D5, started a new application, dropped a memo, an edit box and three buttons on the form and filled in the buttons' event handlers:
procedure TForm1.btnSaveClick(Sender: TObject);
begin
  Memo1.Lines.SaveToFile(Edit1.Text);
end;

procedure TForm1.btnLoadClick(Sender: TObject);
begin
  Memo1.Lines.LoadFromFile(Edit1.Text);
end;

procedure TForm1.btnDeleteClick(Sender: TObject);
begin
  DeleteFile(Edit1.Text);
end;
Listing 1.
The result can be seen here:

Of course, you can play with another stream, "morpheus" and its "Welcome to the real world!" contents, or Cypher's "Buckle your seat belt, Dorothy!"; you can create as many streams as you want and delete them at will. And all the time the file size will remain zero, unless, of course, you save some text under the file name itself, without any stream name.
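The same round trip can also be made without the VCL, by calling CreateFile and DeleteFile directly. The following is only a minimal sketch: the file and stream names ("matrix.txt:trinity") and the text are made up for the illustration, the current directory is assumed to be on an NTFS volume, and the unit names are those of Delphi 5-era compilers.

```pascal
program StreamRoundTrip;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  // Hypothetical file and stream names, used only for this illustration
  STREAM_PATH = 'matrix.txt:trinity';

var
  hStream: THandle;
  sText, sBack: AnsiString;
  dwDone: DWORD;
begin
  sText := 'Dodge this.';

  // Create (or overwrite) the named stream; matrix.txt itself is created
  // as an empty file if it does not exist yet
  hStream := CreateFile(PChar(STREAM_PATH), GENERIC_WRITE, 0, nil,
    CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
  if hStream = INVALID_HANDLE_VALUE then
  begin
    Writeln('Cannot create the stream, error ', GetLastError);
    Halt(1);
  end;
  WriteFile(hStream, sText[1], Length(sText), dwDone, nil);
  CloseHandle(hStream);

  // Read the stream back through an ordinary file handle
  hStream := CreateFile(PChar(STREAM_PATH), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hStream = INVALID_HANDLE_VALUE then
  begin
    Writeln('Cannot open the stream, error ', GetLastError);
    Halt(1);
  end;
  SetLength(sBack, GetFileSize(hStream, nil));
  if Length(sBack) > 0 then
    ReadFile(hStream, sBack[1], Length(sBack), dwDone, nil);
  CloseHandle(hStream);
  Writeln('Read back: ', sBack);

  // Delete only the named stream; the file matrix.txt stays in place.
  // Windows.DeleteFile is qualified because SysUtils declares its own DeleteFile.
  if not Windows.DeleteFile(PChar(STREAM_PATH)) then
    Writeln('Cannot delete the stream, error ', GetLastError);
end.
```

After the program finishes, matrix.txt should still be listed with a size of zero, since only the named stream was written and then removed.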
The next thing you probably want to do is to detect whether this streaming capability is supported by the OS and volume your application finds at run time. This can be achieved with the GetVolumeInformation function. To avoid dealing with its numerous parameters, I created a simple wrapper for this purpose:
function SupportsStreams(const csAnyPathFileName: string): boolean;
const
  FILE_NAMED_STREAMS = $00040000; // from winnt.h
var
  s: string;
  dwMaxFileNameLength, dwFileSystemFlags: cardinal;
  volname, sysname: array[0..MAX_PATH] of char; // debug purposes only
  volsernum: DWORD; // debug purposes only
begin
  Result := FALSE;
  s := ExtractFileDrive(csAnyPathFileName) + '\';
  if GetVolumeInformation(PChar(s),
                          volname, sizeof(volname),
                          @volsernum,
                          dwMaxFileNameLength,
                          dwFileSystemFlags,
                          @sysname[0],
                          sizeof(sysname)) then
    Result := (FILE_NAMED_STREAMS and dwFileSystemFlags) <> 0;
end; // SupportsStreams
Listing 2.
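The buffers marked "debug purposes only" in the wrapper above are optional outputs of GetVolumeInformation, so a leaner variant can pass nil for them. The sketch below is one possible way to write it; the function name VolumeSupportsStreams and the small console driver are made up for the illustration.

```pascal
program CheckStreams;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  FILE_NAMED_STREAMS = $00040000; // from winnt.h

// Hypothetical lean variant: the volume name, serial number and file system
// name are not needed here, so nil/0 is passed for those parameters.
function VolumeSupportsStreams(const sPath: string): Boolean;
var
  sRoot: string;
  dwMaxLen, dwFlags: DWORD;
begin
  // ExpandFileName makes relative paths absolute so that a drive can be found
  sRoot := ExtractFileDrive(ExpandFileName(sPath)) + '\';
  Result := FALSE;
  if GetVolumeInformation(PChar(sRoot), nil, 0, nil,
                          dwMaxLen, dwFlags, nil, 0) then
    Result := (dwFlags and FILE_NAMED_STREAMS) <> 0;
end;

var
  sTarget: string;
begin
  // Check the path given on the command line, or the current directory
  if ParamCount > 0 then
    sTarget := ParamStr(1)
  else
    sTarget := GetCurrentDir;

  if VolumeSupportsStreams(sTarget) then
    Writeln(sTarget, ': named streams are supported')
  else
    Writeln(sTarget, ': named streams are not supported (or the volume is not accessible)');
end.
```

A check like this is worth running before writing to a "file:stream" name, because on a volume without stream support such a name is simply invalid.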
Enumerating the streams of a file is the most tedious task; it requires some additional information and slightly more coding effort.
Essentially, what we need to know is that, as seen through the backup API, a file's multiple streams are laid out in a simple sequence, each stream being preceded by a stream header. The header is described by the WIN32_STREAM_ID structure, which is defined in Delphi's Windows.pas as:
type
  PWIN32StreamID = ^TWIN32StreamID;
  _WIN32_STREAM_ID = record
    dwStreamId: DWORD;
    dwStreamAttributes: DWORD;
    Size: TLargeInteger;
    dwStreamNameSize: DWORD;
    cStreamName: array[0..0] of WCHAR;
  end;
  TWIN32StreamID = _WIN32_STREAM_ID;
  WIN32_STREAM_ID = _WIN32_STREAM_ID;
Listing 3.
The only way to handle this sequence of file streams is to use three less-than-popular functions from the backup API, namely BackupRead, BackupWrite and BackupSeek. The last one allows a programmer to skip part of a stream or even an entire stream. These functions are declared in Windows.pas as follows:
function BackupRead(hFile: THandle;
                    lpBuffer: PByte;
                    nNumberOfBytesToRead: DWORD;
                    var lpNumberOfBytesRead: DWORD;
                    bAbort: BOOL;
                    bProcessSecurity: BOOL;
                    var lpContext: Pointer): BOOL; stdcall;
function BackupSeek(hFile: THandle;
                    dwLowBytesToSeek, dwHighBytesToSeek: DWORD;
                    var lpdwLowByteSeeked, lpdwHighByteSeeked: DWORD;
                    lpContext: Pointer): BOOL; stdcall;
function BackupWrite(hFile: THandle;
                     lpBuffer: PByte;
                     nNumberOfBytesToWrite: DWORD;
                     var lpNumberOfBytesWritten: DWORD;
                     bAbort, bProcessSecurity: BOOL;
                     var lpContext: Pointer): BOOL; stdcall;
Listing 4.
The first call to BackupRead allocates some internal resources, a pointer to which is returned in the lpContext parameter. Then you can continue to call it (or BackupSeek, if you don't need a certain portion of a stream) until you receive FALSE, which, according to MSDN, indicates that there is no more data. At this point you are supposed to call BackupRead one last time, now with the bAbort parameter set to TRUE, to deallocate the resources stored in lpContext.
I actually found that in real life the situation is not that simple. After a few hours of fighting with weird exceptions and AVs, I realized that the outcome of BackupRead can be as follows:
- Result=TRUE, lpContext is a valid pointer - OK, can continue
- Result=FALSE, lpContext is a valid pointer - done, need to call one more time with bAbort=TRUE
- Result=TRUE, lpContext = 0xFFFFFFFF (i.e. -1) - done, DO NOT call it anymore
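Knowing all this, the idea of our stream information retrieval algorithm is pretty simple: read the fixed-size part of the stream header, read the stream name that follows it, skip the stream's data with BackupSeek to land on the next header, and repeat until BackupRead reports that nothing is left.
Putting all this in plain Delphish, we'll get something like this: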
// NOTE: pContext and s are not declared in this listing; they are assumed
// here to be unit-level variables that persist between calls.
var
  pContext: Pointer = nil; // backup context maintained by BackupRead
  s: string;               // name of the most recently read stream

function GetStreamName(hFile: THandle; bStop: boolean = FALSE): boolean;
var
  recStreamInfo: TWIN32StreamID;
  iBytesToRead, iBytesRead: cardinal;
  wszStreamName: array[0..MAX_PATH] of widechar;
begin // GetStreamName
  Result := FALSE;
  // nothing to release, or the context has already been destroyed
  if ((pContext = nil) and bStop) or (integer(pContext) = -1) then EXIT;
  // first let's obtain the actual size of the stream
  // via reading only its header info
  ZeroMemory(@recStreamInfo, sizeof(recStreamInfo));
  ZeroMemory(@wszStreamName[0], sizeof(wszStreamName));
  s := '';
  iBytesRead := 0;
  // size of the header part that precedes cStreamName
  iBytesToRead := PChar(@recStreamInfo.cStreamName[0]) - PChar(@recStreamInfo);
  Result := BackupRead(hFile,          // handle to file or directory
                       @recStreamInfo, // pointer to buffer to read to
                       iBytesToRead,   // number of bytes to read
                       iBytesRead,     // variable to receive number of bytes read
                       bStop,          // termination type
                       FALSE,          // process security flag
                       pContext);      // pointer to internal context information
  if bStop then
    Result := FALSE; // we don't want it to be true in this case
  // it turns out that even if you read beyond eof it's still TRUE,
  // but context gets destroyed and changed to $FFFFFFFF
  if integer(pContext) = -1 then
    Result := FALSE;
  if (iBytesToRead <> iBytesRead) then
    Result := FALSE;
  // for whatever reason, if result is not good we're out of here
  if not Result then EXIT; // bad case
  // at this point we have read the header part.
  // you can get the whole stream size via
  // recStreamInfo.Size + iBytesToRead (header size) + recStreamInfo.dwStreamNameSize
  // next goes the stream's name followed by its content
  // Let's read the stream name:
  iBytesToRead := recStreamInfo.dwStreamNameSize;
  if iBytesToRead > 0 then // main stream has 0 bytes in name
  begin
    // guard: the name (in bytes) must fit the buffer with a terminating zero
    if iBytesToRead > sizeof(wszStreamName) - sizeof(widechar) then
    begin
      Result := FALSE;
      EXIT;
    end;
    Result := BackupRead(hFile,               // handle to file or directory
                         @(wszStreamName[0]), // pointer to buffer to read to
                         iBytesToRead,        // number of bytes to read
                         iBytesRead,          // variable to receive number of bytes read
                         FALSE,               // termination type
                         FALSE,               // process security flag
                         pContext);           // pointer to internal context information
    if not Result or (iBytesToRead <> iBytesRead) then EXIT; // bad case
    s := WideCharToString(wszStreamName);
    // any non-trivial stream name is stored as ":stream_name:$DATA"
    if s <> '' then // so let's clean it
    begin
      s := Copy(s, 2, Length(s) - 7);
      // at this point you may also want to replace
      // invisible characters with dots or codes
    end; // not trivial
  end; // read the name
  // at this point you can process available stream name and stream information
  // now we have to skip the rest of the stream to position ourselves on the next one
  BackupSeek(hFile,
             // it's OK to request more - it wouldn't go across stream boundaries
             high(cardinal), high(cardinal),
             // let's "recycle" available vars,
             // these actually are LowBytesSeeked and HighBytesSeeked
             iBytesToRead, iBytesRead,
             // "bug" in Delphi description - extra dereferencing is requested
             @pContext);
end; // GetStreamName
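The listing above does not show how the file is opened or how the calls are driven. The following self-contained sketch shows one way the whole loop might look for a single file. Several things in it are assumptions made for the illustration: the TStreamHeader record (a packed copy of the fixed-size part of WIN32_STREAM_ID), the local import declarations of BackupRead and BackupSeek (declared here so that the sketch does not depend on how a particular Windows.pas types the context parameter), the output format, and a 32-bit compiler (because of the Integer(pContext) cast borrowed from the listing above).

```pascal
program ListStreams;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // Assumption: fixed-size part of WIN32_STREAM_ID, 20 bytes when packed
  TStreamHeader = packed record
    dwStreamId: DWORD;         // 1 = main data, 4 = alternate data stream
    dwStreamAttributes: DWORD;
    Size: Int64;               // size of the stream data in bytes
    dwStreamNameSize: DWORD;   // size of the name in bytes
  end;

// Local imports; they take precedence over the declarations in Windows.pas
function BackupRead(hFile: THandle; lpBuffer: Pointer; nBytesToRead: DWORD;
  var nBytesRead: DWORD; bAbort, bProcessSecurity: BOOL;
  var lpContext: Pointer): BOOL; stdcall; external 'kernel32.dll';
function BackupSeek(hFile: THandle; dwLowBytesToSeek, dwHighBytesToSeek: DWORD;
  var dwLowSeeked, dwHighSeeked: DWORD;
  var lpContext: Pointer): BOOL; stdcall; external 'kernel32.dll';

var
  hFile: THandle;
  pContext: Pointer;
  hdr: TStreamHeader;
  wszName: array[0..MAX_PATH] of WideChar;
  dwRead, dwLow, dwHigh: DWORD;
  sName: string;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: ListStreams <file>');
    Halt(1);
  end;
  hFile := CreateFile(PChar(ParamStr(1)), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0);
  if hFile = INVALID_HANDLE_VALUE then
  begin
    Writeln('Cannot open the file, error ', GetLastError);
    Halt(1);
  end;
  pContext := nil;
  try
    while True do
    begin
      // 1. read the fixed-size header of the next stream
      if not BackupRead(hFile, @hdr, SizeOf(hdr), dwRead, FALSE, FALSE, pContext) then
        Break;
      if (dwRead <> SizeOf(hdr)) or (Integer(pContext) = -1) then
        Break;
      // 2. read the stream name, if any (the main stream has none)
      sName := '<unnamed>';
      if hdr.dwStreamNameSize > 0 then
      begin
        if hdr.dwStreamNameSize > SizeOf(wszName) - SizeOf(WideChar) then
          Break; // name would not fit the buffer
        ZeroMemory(@wszName, SizeOf(wszName));
        if not BackupRead(hFile, @wszName, hdr.dwStreamNameSize, dwRead,
                          FALSE, FALSE, pContext) then
          Break;
        if dwRead <> hdr.dwStreamNameSize then
          Break;
        sName := WideCharToString(wszName); // looks like ":name:$DATA"
      end;
      Writeln(sName, '  id=', hdr.dwStreamId, '  size=', hdr.Size);
      // 3. skip the stream data to land on the next header
      if hdr.Size > 0 then
        BackupSeek(hFile, Int64Rec(hdr.Size).Lo, Int64Rec(hdr.Size).Hi,
                   dwLow, dwHigh, pContext);
    end;
  finally
    // release the backup context unless it has already been invalidated
    if (pContext <> nil) and (Integer(pContext) <> -1) then
      BackupRead(hFile, nil, 0, dwRead, TRUE, FALSE, pContext);
    CloseHandle(hFile);
  end;
end.
```

Unlike the listing above, this sketch seeks by exactly hdr.Size bytes rather than requesting the maximum; both approaches are meant to stop at the next stream header.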
Now we have enough information at hand to create a simple scanner, which will let us obtain this hidden information about the alternate data streams stored on our hard drives. We'll make it a simple console application which accepts up to four command-line parameters and, in case no filespec parameter is provided, displays a simple syntax help text.
You can view an HTML version of the program here, or download the source file from here.
Here is a sample result screen:

Since this is a very simple program, I didn't spend a lot of time on its UI. One of the consequences is that it can only report the total number of a file's extra streams AFTER it displays each stream's "personal info". Of course, in a commercial application this can easily be fixed by providing some temporary storage, etc.
You can also download a zipped compiled version of this program (streamscan.exe, 300K, 158K in zip) from here.
In this article we learned about a powerful new feature of Windows 2000: multiple file streams. Basically, it gives you one extra level of nesting, since each file can now be treated as a "directory" one level deep. With the simple "file_name:stream_name" syntax supported at the very heart of the file system, this feature is extremely easy to use, since all Delphi I/O routines automatically support it as well. The only non-trivial part is obtaining stream information or enumerating existing streams, a task with which this article hopefully gives you some aid.
Happy Streaming!
Nikolai Sklobovsky,
hiddentreasures@sklobovsky.com