JustDATA File Embedder

Posts: 7
Joined: Fri Nov 19, 2010 4:32 pm

Post by CirothUngol »

JustDATA is a little utility for automatically embedding any and all types of files into JustBASIC\Liberty BASIC programs through DATA statements, and then ReGenerating them, either individually or together, exactly when and where you need them. I've been using these Embed and ReGenerate routines (originally based on Rutger's All2Bas utility) to embed files in my projects for quite some time now, but I've finally decided to polish up the routines and flesh out the application so that others may have a chance to use it.
Here are some of its features:

Allows you to embed one file, several files, or a whole directory of files.
Choose to include the ReGeneration routine as a GOSUB, FUNCTION, or SUB... or not at all.
All settings needed to ReGenerate the files are contained inside of the generated subroutine.
Automatically sorts the folder contents alphabetically.
Displays percentage complete, Filename, number of Files, current\total File Size, Execution Time and Bytes per Second.
Build the target file either to the Source Folder or the Default Folder.
Select a FileName or use the suggested default.
Append the routine to the end of existing files.
Allows aborting an operation, where you may choose to Save, Delete, or Continue.
Choose to keep the Absolute Path for the ReGenerated files.
Choose different ByteOffset or Control Characters for Substitution, Replication or CR+LF.
Change the maximum allowed line length of the generated DATA statements.
Change several other settings like which characters to substitute, byte offset, etc...

When calling any of the ReGeneration routines, you may provide an optional FileName as an argument (JDfile$). If provided, the embedded files are searched and only that one file is rebuilt; if it is not found, no files are regenerated. If a FileName is not provided, all of the embedded files are regenerated. You may also set the GLOBAL variable JDpath$ if you'd like to specify a target path other than DefaultDir$, and the routines either return or set the number of files regenerated using the variable JustDATA. I've tested this utility on literally thousands of different files with a 100% success rate for byte-accurate ReGeneration. Only really big files (over 8-10 MegaBytes) seem to have any problems... they take forever! My 2.1 GHz QuadCore pulls about 15 KiBytes/sec on most encodes and about 43 KiBytes/sec on most decodes.

JustDATA uses a simple byte-wise form of RLE encoding for chains of repeating characters and exchanges unprintable characters on a 1-for-2 basis, both by using a character flag. The repeat flag is only used if it will actually shorten the sequence, which means runs of more than 4 printable characters or more than 2 unprintable characters. Since TextFiles are most of what I use JustDATA for, I've also included a third character flag that exchanges out the very common (and after encoding, rather wordy 4-byte long) CarriageReturn + LineFeed combo... all in an effort to reduce the size of the generated routines. If you'd like, you can de-activate the CR+LF flag by setting it to "0" in the options; the other two flags are mandatory.
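The 1-for-2 exchange described above can be sketched in Just BASIC roughly as follows. This is only an illustration of the idea, not the code JustDATA generates: the function names, the flag value 3, the offset of 40, and the list of unprintable values are all assumptions chosen for the demo, and the RLE and CR+LF flags are left out entirely.

```basic
' Sketch of flag + offset substitution (all values below are assumptions)
substFlag = 3                          ' hypothetical ASCII value of the flag
offset    = 40                         ' hypothetical offset added to a byte
unprintable$ = " 0 3 9 10 13 34 "      ' space-delimited, includes the flag

' Sample input: A, Tab, B, Double-Quote, C
raw$ = "A" + chr$(9) + "B" + chr$(34) + "C"
enc$ = encodeSubst$(raw$, substFlag, offset, unprintable$)
dec$ = decodeSubst$(enc$, substFlag, offset)

print len(raw$); " bytes in, "; len(enc$); " characters out"
if dec$ = raw$ then print "round trip OK" else print "round trip FAILED"
end

' Replace every listed byte with the flag followed by (byte + offset)
function encodeSubst$(raw$, flag, offset, list$)
    result$ = ""
    i = 1
    while i <= len(raw$)
        c = asc(mid$(raw$, i, 1))
        if instr(list$, " " + str$(c) + " ") > 0 then
            result$ = result$ + chr$(flag) + chr$(c + offset)
        else
            result$ = result$ + chr$(c)
        end if
        i = i + 1
    wend
    encodeSubst$ = result$
end function

' Undo the exchange: a flag means the next character is (byte + offset)
function decodeSubst$(enc$, flag, offset)
    result$ = ""
    i = 1
    while i <= len(enc$)
        c = asc(mid$(enc$, i, 1))
        if c = flag then
            i = i + 1
            c = asc(mid$(enc$, i, 1)) - offset
        end if
        result$ = result$ + chr$(c)
        i = i + 1
    wend
    decodeSubst$ = result$
end function
```

In this sketch the five-byte sample grows to seven characters, because the Tab and the Double-Quote each cost two. The flag value itself has to be in the list so that a literal flag byte in the source file is also exchanged rather than mistaken for a flag.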

JustDATA has the potential to actually compress filesizes (and it will if working with uncompressed BitMaps, heavily formatted TextFiles, or the like), but for executables and pre-compressed data (zip, rar, 7z) it usually increases the filesize to under 110% of its original size... still pretty good for BASIC DATA statements. This is typical of the default settings, as the JustBASIC editor (which was written in Liberty BASIC, right?) seems to have only 5 characters that cannot be contained in a DATA statement: Null, Tab, LineFeed, CarriageReturn, and Double-Quote (ASCII 0, 9, 10, 13, 34). Add in the 3 default control characters (ASCII 3, 4, 5) and there are only 8 values out of 256 that serve to inflate the filesize. Since the DATA output is completely linear, this allows the close regulation of the DATA statement length, which results in fewer DATA statements and fewer times the required 7 characters per line have to be repeated (i.e. DATA "")... in fact, the output is so linear that you can even edit the DATA statements after they've been generated and they will still ReGenerate without any issues (excepting your edits, of course).
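As a rough check on the "under 110%" figure, here is a back-of-the-envelope estimate in Just BASIC. It assumes the bytes of a pre-compressed file are spread evenly over all 256 values, a hypothetical maximum line length of 255 characters, and a two-byte CR+LF at the end of every line of the produced file. None of these numbers are taken from JustDATA itself; only the 8 problem values and the 7 required characters come from the post.

```basic
' Rough size estimate for incompressible input (all settings are assumptions)
problemValues = 8      ' byte values that cost two characters instead of one
maxLen        = 255    ' hypothetical DATA line length, including DATA ""
eolBytes      = 2      ' assumed CR+LF after every line of the .bas file

avgChars     = 1 + problemValues / 256         ' average characters per byte
bytesPerLine = (maxLen - 7) / avgChars         ' source bytes per DATA line
ratio        = (maxLen + eolBytes) / bytesPerLine

print "Estimated output size: "; int(ratio * 1000 + 0.5) / 10; "% of the original"
end
```

Under these assumptions the estimate comes out at roughly 107%, which is in line with the observed figure. Shorter DATA lines push the number up, since the 7 fixed characters and the line ending are repeated more often.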

Please download, give it a try, and leave a post if you find any bugs, glaring omissions, or obvious errors. Hopefully others may find this utility to be as useful as I have.

Edit: Mere hours after uploading, I found an error in the retention of variables selected by the "Options" button, causing the generated subroutines to always have default values... which works great, as long as you never change any of the settings. ^_^
I've corrected it and re-uploaded the archive.

Re: JustDATA File Embedder

Post by CirothUngol »

I've updated JustDATA to version 0.2, now featuring Huffman Compression! Big thanx to AltBAs for converting Rich Geldreich's original QBASIC source to JustBASIC and posting it for others to use. I've been testing the app for a couple of days and so far I've had 100% success regenerating many hundreds of different files. The Huffman Decompressor is slow, but very usable for smallish files, and achieves over 30% compression on most types of uncompressed data. I've seen it get over 60% on some executables and I've used it to regenerate files as large as 12MiB without any issues... well, unless you consider speed an issue. ^_^

I've added a few more options and streamlined the Huffman Decompressor so that it only requires one pre-DIMed Array() and no GLOBAL variables to be declared in your program. I've also stripped down the Huffman Compressor to use only the one truly necessary recursive SUBroutine and its one GLOBAL variable... it made the code much more linear, and I think easier to understand.

I think I've identified all of the "problem" characters inside of DATA statements. For the JB IDE it's Null, Tab, LineFeed and Double-Quote (ASCII 0,9,10,34). For TKN files it's that list plus CAN and ESC (ASCII 24 and 27). I use the JB IDE setting while working on a project, and then use the TKN setting when finalizing it so it can be easily tokenized.
Speaking of, I've noticed a distinct slow-down in the application when using the tokenized version. I've chosen the JustBASIC main executable as my test file, and I invite anyone to check the execution speed of the JB IDE against the TKN file. My results for jbasic.exe using default settings + Huffman Encoding:
IDE - 3min, 41sec at 5,076 bytes/sec
TKN - 5min, 35sec at 3,346 bytes/sec

There are now two versions of the regeneration routine. One with the Huffman Decompressor (which you get if the Compressor is selected) and one without. You can also now specify the targetPath for the regenerated file by including it on the front of the passed fileName$ parameter. For instance:
CALL JustDATA targetPath$ + "myFile.ext"
This works with or without the fileName attached; just be sure to end the pathName$ with a BackSlash ("\").
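To illustrate the trailing-BackSlash rule, here is a small Just BASIC sketch of how a combined parameter might be split into its folder and file parts. The function lastSlash and the variable names are hypothetical and are not taken from the generated JustDATA routine.

```basic
' Sketch only: split a parameter into folder and file parts
full$ = "D:\Music\Midi\000497.mid"
cut   = lastSlash(full$)
path$ = left$(full$, cut)       ' empty when no BackSlash is present
file$ = mid$(full$, cut + 1)    ' empty when the parameter ends in "\"

print "path: "; path$
print "file: "; file$
end

' Return the position of the last BackSlash in s$, or 0 if there is none
function lastSlash(s$)
    i = len(s$)
    while i > 0
        if mid$(s$, i, 1) = "\" then exit while
        i = i - 1
    wend
    lastSlash = i
end function
```

With a parameter such as "E:\Projects\Temp\" the file part comes back empty, which is the case where all embedded files would be regenerated to that folder.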

I should also note that even though the JB IDE will allow CarriageReturns inside of the DATA statements, the Windows Clipboard will still turn them into LineBreaks if you try to copy-and-paste the contents of the produced files. You can either add ASCII value "13" to the Substituted Characters List, or just produce the JustDATA routine and copy/paste your program to the top of it (which is what I do).

I've tested this little app about as much as one guy can, so if you run across a bug, find a file it won't reproduce correctly, or can think of another nice addition for the app, please leave a response. I sincerely hope that JustDATA proves to be a useful application to anyone who may need it.

Re: JustDATA File Embedder

Post by CirothUngol »

I've updated JustDATA to version 0.3, now featuring LZW compression! Check out the ReadMe:

JustDATA File Embedder v0.3

A utility for Liberty BASIC and Just BASIC programs
written using Just BASIC v1.01

JustDATA is a small utility that will convert any type of file into DATA
statements so that they may be embedded into Just BASIC and Liberty BASIC
programs. This allows the programmer to create text files, graphics, pictures,
music, or anything else "on-the-fly" exactly when and where it's needed. 
It features optional Run-Length-Encoding or Lempel-Ziv-Welch compression and
will create output as either GOSUBs, FUNCTIONs, SUBroutines, or only DATA
statements. Highly configurable, it allows the programmer full control over
which characters are printable, size of Offset for non-printable characters,
maximum line-length of created DATA statements, size of the internal read 
buffer, maximum size of the LZW dictionary, size of produced files, and more.

There are two listboxes (panes) in the main window. The left pane is for
navigation and selecting files. The right pane is a list of files to encode.

Double-click a filename in the left pane, and it'll appear in the right pane.
Double-click a filename in the right pane to remove it from the encoding list.
Double-click a foldername in the left pane to display its contents.
Double-click "Go Up One Level" to enter the parent of the current folder.

GOSUB, FUNCTION, SUB - These select the type of subroutine to create. They are
only enabled and selectable if the "Include ReGenerator" checkbox is checked.

 Include Regenerator - If set, this will include a regeneration subroutine
                       along with the created DATA statements, otherwise only
                       DATA statements are produced.
Create in Source Dir - By default, JustDATA will create an output folder named
                       "DefaultDir$\DATA\" and place all produced files there.
                       If set, this option will instead place all created files
                       into the folder currently selected on the left pane.
     Overwrite Files - By default, JustDATA tries to prompt the user anytime
                       the current operation would result in deleting or over-
                       writing a file. By setting this option any of these
                       operations will assume "Yes" to delete or over-write.
      Compress Files - Select this option to compress all source files using
                       the Lempel-Ziv-Welch algorithm to minimize the size of
                       the produced subroutine. When this option is set, a
                       dialog will appear allowing the following options:
                        Max BitSize - Max size of the LZW dictionary in bits
                                      from 12 to 21, but 20 is optimal maximum
                                      for LZW compression, and 21 can sometimes
                                      choke Liberty BASIC, causing a fatal error.
                         Use Static - By default, the LZW dictionary is reset
                                      when full. Set this option to instead
                                      stop creating new entries when the
                                      dictionary becomes full. This can lead
                                      to better compression (esp. with text).
                       Keep Encoded - Set this option if you wish to retain
                                      compressed files created by the LZW
                                      encoder. They will be named with a
                                      ".lzw" extension and they may be
                                      extracted by selecting them in the
                                      left pane of the JustDATA main window.

All -=> - Move all filenames in the left pane to the right pane\encoder list.
<=- All - Move all filenames in the right pane back to the left pane.
Create  - Produce DATA statements for all files in the encoder list.
Options - Brings up the options dialog containing the following settings:
            MaxLen DATA - This is the maximum number of characters allowed
                          for a single DATA statement. This total includes
                          the seven required characters (i.e. DATA "").
           Subst Offset - This is the ASCII offset used for unprintable
                          characters in DATA statements. For example: CHR$(9)
                          with an offset of 40 becomes CHR$(49), or "1".
             Subst Char - This is the ASCII value for the character which
                          flags an unprintable character in the produced DATA
                          statements. Both this value and this value + offset
                          must be printable characters.
               RLE Char - This is the ASCII value for the character which
                          flags a block of repeated characters in the DATA
                          statements. Both this value and this value + offset
                          must be printable characters. This option is only
                          used or selectable when LZW compression is OFF. If
                          set to zero, then Run-Length-Encoding is not used.
             CR+LF Char - This is the ASCII value for the character which
                          flags a Carriage Return + Line Feed pair in the DATA
                          statements. Both this value and this value + offset
                          must be printable characters. This option is only
                          used or selectable when LZW compression is OFF. If
                          set to zero, then this option is not used.
                          Note: This option is only recommended when
                          embedding uncompressed text files.
           MaxChunk KiB - This is the size (in KibiBytes) of the internal
                          buffer that is used when reading files.
            New File at - Before writing another file to output as DATA
                          statements, JustDATA will check if the output file
                          is smaller than this value x 64KiB. If not, a new
                          output file is created.
              Font Size - The size of the font displayed throughout the
                          program. Included so that users with non-standard
                          font sizes can adjust the application accordingly.
          Load Defaults - This will reset everything on the option dialog
                          back to their original default values.
          View ReadMe   - Displays this text file in a separate window. This
                          window is closed if the application is terminated.
 Unprintable Characters - This is the list of unprintable ASCII values that
                          must be offset before including them in the produced
                          DATA statements. This will vary depending on what
                          application you're using to edit or run your Liberty
                          BASIC program. The following presets are provided:
                           JB\LB - for the Just BASIC and Liberty BASIC IDEs
                             TKN - for tokenized programs
                             LBB - for Liberty BASIC Booster
                             LBW - for Liberty BASIC Workshop
                          Safest - the default setting. It contains all of the
                                   known problem characters and is recommended
                                   for maximum compatibility.

If you wish to have JustDATA output a regeneration routine along with the
produced DATA statements, you may choose either GOSUB, FUNCTION, or SUB.
They all possess equivalent functionality, so choosing one over the other
is really a matter of how you wish to incorporate it into your program.
The routines will accept a path and\or filename as an optional parameter.
If the optional parameter (JDfile$) is provided, then all DATA statements are
searched for that file and only it is regenerated. If the file is not found,
then no files are regenerated. If the parameter contains a full path
specification, then the file or files will be regenerated to the provided
path, otherwise DefaultDir$ is used. If no parameter is provided, or if it
contains only path info (i.e. it ends in a "\"), then all embedded files
are regenerated. The variable "JustDATA" is then set to the number of files
created and this value is the product returned if the routine is a FUNCTION.
The routines are called in the following manner:

    GOSUB - JDfile$ = "Embedded.File" : GOSUB [JustDATA]
            'This will search all DATA starting at the [JD_DATA] restore point
            'for the file "Embedded.File". If found it will be regenerated
            'to DefaultDir$ and the variable "JustDATA" will be set to 1,
            'otherwise the routine will exit and do nothing.

      SUB - CALL JustDATA "D:\Music\Midi\000497.mid"
            'This will search for the file "000497.mid" in the available
            'DATA statements and regenerate it to the folder "D:\Music\Midi\"
            'if found, then the variable "JustDATA" is set to 1. If the file
            'is not available then the routine will exit and do nothing.

 FUNCTION - numberOfFilesCreated = JustDATA("E:\Projects\Temp\")
            'This will regenerate all available files to the folder provided
            'and return the number of files created as the function's product.
            'The parameter is considered a folder if it ends in a "\". If the
            'trailing back-slash were to be omitted from this example, the
            'routine would attempt to regenerate a file named "Temp" to the
            'folder "E:\Projects\", and do nothing when it failed.

If LZW compression is used then the following code must be added to the
target program and executed before JustDATA is called:
    DIM JD(1)    ' Array for dictionary index-prefix values
    DIM JDch$(1) ' Array for dictionary one-byte strings

* If you save to a filename that already exists, you will have the option to
  either Erase or Append. This allows you to add additional files to a previous
  routine or even tack it automatically on to its target .bas source file.

* Each file is represented by a block of DATA statements. Each block begins with
  a filename and ends with either "JD_STOP" (indicating end of file) or
  "JD_STOP_FINAL" (indicating end of data).

* The Run-Length-Encoding and Carriage-Return+Line-Feed substitution are
  disabled if LZW compression is enabled (LZW renders both ineffective).
  You may also disable these by setting them to zero. If both of these are
  disabled then a faster and more efficient read routine is used.

* 20 bits is the maximum recommended dictionary size. Liberty BASIC can choke
  when REDIMing the 21+ bit arrays, so if you decide to use them you should
  set the dictionary to "Static".

* When using LZW compression, you may choose to keep the compressed .lzw files.
  These may be extracted by selecting them in the left pane of the main window.
  They will be extracted to the current folder.

* Any flags used for substitution or RLE will automatically be added to the
  list of Unprintable Characters before encoding begins.

* When you click "OK" in the options dialog any invalid values are either set to
  default or to zero. ASCII character validity is read from left to right;
  Offset, Subst, RLE, then CR+LF.

* Closing the options dialog without clicking "OK" will discard all changes.
by CirothUngol January 18, 2014
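The block layout described in the ReadMe notes (a filename, then the file's DATA, then "JD_STOP" or "JD_STOP_FINAL") can be walked with a plain READ loop. The Just BASIC sketch below only lists the embedded filenames. The [demoData] label and its contents are made up for the demo, and it assumes that nothing but the filename and encoded data sits between the markers, which the real JustDATA output may not match.

```basic
' Sketch only: list the filenames in a set of JD_STOP-delimited DATA blocks
restore [demoData]
count = 0
done  = 0
while done = 0
    read fileName$                      ' first item of each block
    print "embedded file: "; fileName$
    count = count + 1
    item$ = ""
    ' Skip the encoded data until one of the end markers is read
    while (item$ <> "JD_STOP") and (item$ <> "JD_STOP_FINAL")
        read item$
    wend
    if item$ = "JD_STOP_FINAL" then done = 1
wend
print count; " embedded files found"
end

[demoData]
data "first.txt"
data "Hello"
data "JD_STOP"
data "second.txt"
data "World"
data "JD_STOP_FINAL"
```

A real regeneration routine would decode and write each item instead of skipping it, and would compare fileName$ against the requested JDfile$ before doing so.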