4350 - Advanced Software Engineering
Week-2 activities
1. Define a file specification for a compressed file.
-----------------------------------------------------
File components should include:
a. Compression type
b. Filename of each compressed file
Files in the current directory are stored without a path
Files in a subdirectory are stored with their path
c. Size in bytes of each file before compression
d. The compressed data stream
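The components above could be laid out as a binary header, sketched below as C++ serialization functions. The field widths (1-byte compression type, 2-byte filename length, 4-byte little-endian uncompressed size) and the names `CompressedHeader`, `writeHeader`, `readHeader`, and `testHeaderRoundTrip` are assumptions chosen for illustration, not a required format; the compressed data stream would follow the header.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical header layout (an assumption, not a required format):
//   1 byte   compression type
//   2 bytes  filename length N (little-endian)
//   N bytes  filename; includes the path only for files in a subdirectory
//   4 bytes  uncompressed size in bytes (little-endian)
// The compressed data stream follows immediately after.
struct CompressedHeader {
    std::uint8_t compressionType;
    std::string fileName;
    std::uint32_t uncompressedSize;
};

// Write `value` as `bytes` little-endian bytes.
static void writeLE(std::ostream& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Read a `bytes`-wide little-endian unsigned integer.
static std::uint32_t readLE(std::istream& in, int bytes) {
    std::uint32_t value = 0;
    for (int i = 0; i < bytes; ++i)
        value |= static_cast<std::uint32_t>(
                     static_cast<unsigned char>(in.get())) << (8 * i);
    return value;
}

void writeHeader(std::ostream& out, const CompressedHeader& h) {
    if (h.fileName.size() > 0xFFFF)
        throw std::length_error("filename too long for a 2-byte length");
    out.put(static_cast<char>(h.compressionType));
    writeLE(out, static_cast<std::uint32_t>(h.fileName.size()), 2);
    out.write(h.fileName.data(), static_cast<std::streamsize>(h.fileName.size()));
    writeLE(out, h.uncompressedSize, 4);
}

CompressedHeader readHeader(std::istream& in) {
    CompressedHeader h;
    h.compressionType = static_cast<std::uint8_t>(in.get());
    const std::uint32_t len = readLE(in, 2);
    h.fileName.resize(len);
    in.read(&h.fileName[0], static_cast<std::streamsize>(len));
    h.uncompressedSize = readLE(in, 4);
    return h;
}

// Unit test: a header written and read back must be unchanged.
void testHeaderRoundTrip() {
    std::stringstream ss;
    writeHeader(ss, CompressedHeader{1, "docs/banana.txt", 6});
    const CompressedHeader h = readHeader(ss);
    assert(h.compressionType == 1);
    assert(h.fileName == "docs/banana.txt");
    assert(h.uncompressedSize == 6);
}
```

Storing the filename's length before its bytes tells the reader exactly where the name ends, so no terminator character has to be reserved.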
2. Write a C++ compression (encode) function.
---------------------------------------------
Start with your lab-1 spell.cpp program.
Write a function that will write an output file containing LZ-77
compression codes as we learned.
Build in a unit test that will compress a known string such as "banana".
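A minimal sketch of an LZ-77 encoder with a built-in "banana" unit test follows. It assumes the 3-byte code described under Compression details (one byte each for offset, length, and next character), so the search window and match length are capped at 255. `Code`, `encodeLZ77`, and `testEncodeBanana` are hypothetical names, and the brute-force search is chosen for clarity rather than speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One LZ-77 code: go back `offset` bytes, copy `length` bytes, then emit `next`.
struct Code {
    std::uint8_t offset;
    std::uint8_t length;
    char next;
};

// Greedy LZ-77 encoder using a brute-force search of the previous bytes.
// Assumptions: offset and length each fit in one byte (both capped at 255),
// and every code ends with a literal `next`, so a match never consumes the
// final input byte.
std::vector<Code> encodeLZ77(const std::string& in) {
    const std::size_t kMax = 255;
    std::vector<Code> codes;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t bestLen = 0, bestOff = 0;
        const std::size_t start = pos > kMax ? pos - kMax : 0;
        const std::size_t maxLen = std::min(kMax, in.size() - pos - 1);
        for (std::size_t cand = start; cand < pos; ++cand) {
            std::size_t len = 0;
            // A match may run past `pos` (overlap); a decoder handles this
            // by copying one byte at a time.
            while (len < maxLen && in[cand + len] == in[pos + len]) ++len;
            if (len > bestLen) {  // keeps the first (farthest) longest match
                bestLen = len;
                bestOff = pos - cand;
            }
        }
        codes.push_back({static_cast<std::uint8_t>(bestOff),
                         static_cast<std::uint8_t>(bestLen), in[pos + bestLen]});
        pos += bestLen + 1;
    }
    return codes;
}

// Built-in unit test on the known string "banana".
void testEncodeBanana() {
    const std::vector<Code> c = encodeLZ77("banana");
    assert(c.size() == 4);  // 'b', 'a', 'n', then "an" from 2 back + 'a'
    assert(c[0].offset == 0 && c[0].length == 0 && c[0].next == 'b');
    assert(c[3].offset == 2 && c[3].length == 2 && c[3].next == 'a');
}
```

For "banana" this produces four codes: three literals followed by (2, 2, 'a'), which copies "an" from two bytes back. At 3 bytes per code that is 12 bytes for 6 bytes of input, the same small-sample effect the handout describes for "book".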
Give your program the ability to get input from a user to determine
what data should be compressed.
a. command-line input
b. file input
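Both input sources can be handled by one function, sketched below. The command-line convention (`-s <text>` for literal text, otherwise a filename) and the names `readInput` and `testReadInput` are hypothetical; the file path reads every byte, including whitespace and newlines.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical convention: `prog -s <text>` compresses the text given on the
// command line; `prog <file>` compresses the contents of that file.
std::string readInput(int argc, const char* const argv[]) {
    if (argc >= 3 && std::string(argv[1]) == "-s") return argv[2];
    if (argc >= 2) {
        std::ifstream in(argv[1], std::ios::binary);
        if (!in) throw std::runtime_error(std::string("cannot open ") + argv[1]);
        // Read the whole file, byte for byte.
        return std::string(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
    }
    throw std::runtime_error("usage: prog -s <text> | prog <file>");
}

// Unit test covering both input paths.
void testReadInput() {
    const char* textArgs[] = {"prog", "-s", "banana"};
    assert(readInput(3, textArgs) == "banana");

    const char* name = "lz77_input_test.txt";  // temporary file for the test
    {
        std::ofstream out(name);
        out << "book";
    }
    const char* fileArgs[] = {"prog", name};
    assert(readInput(2, fileArgs) == "book");
    std::remove(name);
}
```

Because `char**` converts implicitly to `const char* const*`, `main(int argc, char* argv[])` can pass its arguments straight to `readInput(argc, argv)`.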
3. Write a C++ decode function.
------------------------------
Your function will read LZ-77 compressed codes and produce an output
file of uncompressed data.
a. The compression type stored in your file spec tells your program which
decode function to use; it must match the encode function that wrote the file.
b. Write the output to the path and filename stored in the compressed file.
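A decoder that dispatches on the stored compression type might look like the sketch below. The type values (1 for the 3-byte code, 2 for the nibble-packed code described under "Ideas on how to improve compression") and all function names are hypothetical. Copying one byte at a time lets a code refer to bytes it is still producing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct Code {
    std::uint8_t offset;
    std::uint8_t length;
    char next;
};

// Hypothetical compression-type values stored in the file spec.
constexpr std::uint8_t kTypeLZ77Byte = 1;    // offset byte, length byte, next
constexpr std::uint8_t kTypeLZ77Nibble = 2;  // (offset << 4 | length), next

// Copy `length` bytes from `offset` back, one at a time so overlapping
// matches work, then append `next`.
std::string decodeLZ77(const std::vector<Code>& codes) {
    std::string out;
    for (const Code& c : codes) {
        if (static_cast<std::size_t>(c.offset) > out.size() ||
            (c.length > 0 && c.offset == 0))
            throw std::runtime_error("corrupt LZ-77 code");
        const std::size_t from = out.size() - c.offset;
        for (std::size_t i = 0; i < c.length; ++i) out.push_back(out[from + i]);
        out.push_back(c.next);
    }
    return out;
}

std::vector<Code> parseByteCodes(const std::string& raw) {
    if (raw.size() % 3 != 0) throw std::runtime_error("truncated code stream");
    std::vector<Code> codes;
    for (std::size_t i = 0; i < raw.size(); i += 3)
        codes.push_back({static_cast<std::uint8_t>(raw[i]),
                         static_cast<std::uint8_t>(raw[i + 1]), raw[i + 2]});
    return codes;
}

std::vector<Code> parseNibbleCodes(const std::string& raw) {
    if (raw.size() % 2 != 0) throw std::runtime_error("truncated code stream");
    std::vector<Code> codes;
    for (std::size_t i = 0; i < raw.size(); i += 2) {
        const std::uint8_t packed = static_cast<std::uint8_t>(raw[i]);
        codes.push_back({static_cast<std::uint8_t>(packed >> 4),
                         static_cast<std::uint8_t>(packed & 0x0F), raw[i + 1]});
    }
    return codes;
}

// Pick the decoder that matches the encoder named by the compression type.
std::string decodeByType(std::uint8_t type, const std::string& raw) {
    switch (type) {
        case kTypeLZ77Byte: return decodeLZ77(parseByteCodes(raw));
        case kTypeLZ77Nibble: return decodeLZ77(parseNibbleCodes(raw));
        default: throw std::runtime_error("unknown compression type");
    }
}

// Unit test: both "book" listings from the handout decode to "book".
void testDecodeBook() {
    const std::string byteCodes{'\0', '\0', 'b', '\0', '\0', 'o', '\1', '\1', 'k'};
    const std::string nibbleCodes{'\0', 'b', '\0', 'o', '\x11', 'k'};
    assert(decodeByType(kTypeLZ77Byte, byteCodes) == "book");
    assert(decodeByType(kTypeLZ77Nibble, nibbleCodes) == "book");
}
```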
Compression details
-------------------
Start with an encode function that simply writes LZ-77 codes without trying
to compress the data.
Each LZ-77 code has three fields:
byte: offset
byte: size of repeated data
byte: next character
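The suggested first step, writing codes without searching for matches, reduces to emitting (0, 0, c) for every input byte. This sketch assumes the `'0'` entries in the listings mean the byte value 0 rather than the ASCII digit; `writeLiteralCodes` and `testLiteralBook` are hypothetical names.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// First-step encoder: every input byte becomes the code (0, 0, byte), i.e.
// "no match; emit this literal". Each code takes 3 bytes.
void writeLiteralCodes(std::ostream& out, const std::string& data) {
    for (char ch : data) {
        out.put('\0');  // offset: nothing to copy
        out.put('\0');  // length: nothing to copy
        out.put(ch);    // next character
    }
}

// Unit test matching the first "book" listing: 4 codes, 12 bytes.
void testLiteralBook() {
    std::ostringstream out;
    writeLiteralCodes(out, "book");
    const std::string bytes = out.str();
    assert(bytes.size() == 12);
    assert(bytes[0] == '\0' && bytes[1] == '\0' && bytes[2] == 'b');
    assert(bytes[11] == 'k');
}
```

Because these codes are valid LZ-77 output, a decoder written for the full format can be tested against them before any match search exists.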
The word "book" can be stored like this (last column: bytes per code)
'0' '0' 'b' 3
'0' '0' 'o' 3
'0' '0' 'o' 3
'0' '0' 'k' 3
---
12 bytes total
The word "book" can also be stored by copying the repeated 'o' (offset 1, length 1)
'0' '0' 'b' 3
'0' '0' 'o' 3
'1' '1' 'k' 3
---
9 bytes total (improved compression)
Assuming one byte per field (three bytes per code), the compressed data
occupies 12 or 9 bytes for 4 bytes of input. This small sample is not actually
compressed: the output is larger than the input.
Ideas on how to improve compression
-----------------------------------
Specify a compression type with the following LZ-77 codes:
nibble: offset
nibble: size of repeated data
byte: next character
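Packing offset and length into a single byte could be done as sketched below, with the offset in the high nibble and the length in the low nibble. That nibble order and the names `packNibbleCode`, `unpackNibbleCode`, and `testNibbleBook` are assumptions. Four bits hold only the values 0-15, so out-of-range fields are rejected rather than silently truncated.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Pack one code into 2 bytes (assumed layout): high nibble = offset,
// low nibble = length, second byte = next character.
std::string packNibbleCode(unsigned offset, unsigned length, char next) {
    if (offset > 15 || length > 15)
        throw std::out_of_range("nibble codes need offset and length in 0-15");
    std::string code(2, '\0');
    code[0] = static_cast<char>((offset << 4) | length);
    code[1] = next;
    return code;
}

// Reverse of packNibbleCode; `code` must hold at least 2 bytes.
void unpackNibbleCode(const std::string& code, unsigned& offset,
                      unsigned& length, char& next) {
    const std::uint8_t packed = static_cast<std::uint8_t>(code.at(0));
    offset = packed >> 4;
    length = packed & 0x0F;
    next = code.at(1);
}

// Unit test matching the improved "book" listing: 3 codes, 6 bytes.
void testNibbleBook() {
    const std::string stream = packNibbleCode(0, 0, 'b') +
                               packNibbleCode(0, 0, 'o') +
                               packNibbleCode(1, 1, 'k');
    assert(stream.size() == 6);
    unsigned offset = 0, length = 0;
    char next = '\0';
    unpackNibbleCode(stream.substr(4), offset, length, next);
    assert(offset == 1 && length == 1 && next == 'k');
}
```

An encoder targeting this compression type must cap its search window and match length at 15, trading reach into earlier data for a smaller code.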
The word "book" can be stored like this (last column: bytes per code)
'0' '0' 'b' 2
'0' '0' 'o' 2
'0' '0' 'o' 2
'0' '0' 'k' 2
---
8 bytes total
The word "book" can also be stored by copying the repeated 'o' (offset 1, length 1)
'0' '0' 'b' 2
'0' '0' 'o' 2
'1' '1' 'k' 2
---
6 bytes total (improved compression)
With this change to the file specification, the same input shrinks from 12
bytes to 6 bytes. The output is still larger than the 4-byte input, and 4-bit
fields limit both offset and length to 0-15.