Thursday, March 31, 2022

A Unix Shell Script for Generating CSV Files from Oracle

Over my last several gigs, I was once asked to generate CSV's from a key database schema that could be loaded into Excel spreadsheets and archived for an Oracle database to be decommissioned. The context here is for a 12.2 Oracle database running on a dated version of Unix. Note that the solution I provide here doesn't include complexities of BLOB, CLOB, or other special field types.

There are some complexities in the context here. One is that although 12.2 and above have a CSV markup mode that can facilitate generate of simple structured tables in sqlplus, I ran into practical issues of a maximum size of concatenated strings (say, >4K character records) and embedded line feeds and ca::rriage returns (e.g., emails in varchar2 fields) which resulted in a violation of the one line per record objective. So I used the Unix shell tr command to strip these from the target file--but that left me with the problem of how to terminate the records. So basically I used the tilde (~) at the end of my concatenated string to  mark the end of records and at the end replace the tilde's with line feeds using the tr command. This left me with another issue to consider: what if tilde's are embedded in your column data? I use the Oracle replace function to replace any tilde's in the data; if carriage returns, line feeds, and/or tilde's cannot be stripped or substituted, you may need to use a different approach.

I generate a PL/SQL script on the fly basically to work around the concatenated string limit in sqlplus; I use a caret (^) as my field delimiter and enclose fields in double-quotes. The first line of the CSV file consists of delimited column names. I assume here the use of a DBA-privileged OPS$ account (i.e., sqlplus / ) with the SID and Oracle binaries set in the environment and you are working in a write-privileged working directory, that you have sufficient storage available to accommodate multiple working copies of the target CSV file.

 Finally, I assume an external file tablist in the working directory, consisting of capitalized target schema table names (one per line) with the schema name overlaying SOME_SCHEMA in the script.


#!/bin/bash
myschema=SOME_SCHEMA
cat tablist | while read newtab
do
echo "set heading off" > myscript0.sql
echo "set echo on" >> myscript0.sql
echo "set feedback off">> myscript0.sql
echo "set pagesize 0" >> myscript0.sql
echo "select column_name from dba_tab_columns" >> myscript0.sql
echo "where owner='${myschema}' and table_name='${newtab}'">> myscript0.sql
echo "order by column_id;">> myscript0.sql
echo "exit;" >> myscript0.sql
sqlplus -s / @myscript0 > mytab

i=0
cat mytab | while read myrec
do
if [[ $i == 0 ]]; then mystring2="${myrec}";
else mystring2="${mystring2}^${myrec}"
fi
((i=$i+1))
echo $mystring2>mystr2
done
`
pstring=`cat mystr2`
echo "set linesize 32766" > newscript.sql
echo "set serveroutput on size unlimited" >> newscript.sql
echo "DECLARE" >> newscript.sql
echo newtab $newtab
echo "TYPE arraytable is table of ${myschema}.${newtab}%rowtype;" >> newscript.sql
echo "myarray arraytable;" >> newscript.sql
echo "CURSOR c IS" >> newscript.sql
echo "select $mystring from ${myschema}.${newtab};" >> newscript.sql
echo "BEGIN" >> newscript.sql
echo "OPEN c;" >> newscript.sql
echo "LOOP" >> newscript.sql
echo "FETCH C BULK COLLECT into myarray limit 10000;" >> newscript.sql
echo "FOR i in 1 .. myarray.COUNT" >> newscript.sql
echo LOOP >> newscript.sql
echo "DBMS_OUTPUT.PUT_LINE(" >> newscript.sql
j=1
cat mytab | while read mycolumn
do
if [[ $j == 1 ]]; then echo "'\"'||replace(myarray(i).${mycolumn},'~','')||'\"'" >> newscript.sql ;
else echo "||'^\"'||replace(myarray(i).${mycolumn},'~','')||'\"'" >> newscript.sql ;
fi
((j=1+$j))
done
echo "||'~');" >> newscript.sql
echo "END LOOP;" >> newscript.sql
echo "EXIT WHEN c%NOTFOUND;" >> newscript.sql
echo "END LOOP;" >> newscript.sql
echo "END;" >> newscript.sql
echo "/" >> newscript.sql
echo "exit" >> newscript.sql
echo "${pstring}~" > temp.csv
sqlplus -s / @newscript >> temp.csv
nbrw=`wc -l < temp.csv`
nbrs=$(( ${nbrw} - 3 ))
sed -n 1,${nbrs}p temp.csv > ${newtab}.csv
tr -d '\r'< ${newtab}.csv > temp2.csv
tr -d '\n'< temp2.csv > temp.csv
tr -s '~' '\n' < temp.csv > ${newtab}.csv

done