RotoGuru Computer Forum

View the Forum Registry


Self-edit this thread


0 Subject: Saving as tab delimited in Excel

Posted by: Sludge
- [54692111] Thu, Aug 26, 2004, 14:33

I have a worksheet that has 4 columns. Columns 2, 3, and 4 are numeric and are not the problem. Column 1 is text, and is giving me fits when I try to save the data as a tab delimited file. Excel doesn't, by default, surround text data using quotes it seems unless there is some other character that may cause a problem (such as a comma). Well, the majority of the cells don't have commas, but some do. So I get stuff that looks like the following in the tab delimited file:


CASRP54|| Candida albicans mRNA for SRP54 protein homologue. 1542 794 1
CATF3BSU|| C.albicans gene for TFIIIB (BRF1) subunit. 810 500 1
CATPS1GEN|| C.albicans TPS1 gene. 1353 804 1
"CAU13233|| Candida albicans saccharopine dehydrogenase (LYS1) gene, complete" 2415 944 1
"CAU25180|| Candida albicans phosphoglycerate kinase (PGK1) gene, complete cds." 2942 802 1
CAU35070|| Candida albicans integrin-like protein alpha Int1p (alpha INT1) 1598 745 1
"CAU37010|| Candida albicans ribosomal protein L39 (RPL39) gene, complete cds." 1413 752 1
"CAU37011|| Candida albicans ribosomal protein L29 (RPL29) mRNA, partial cds." 3633 761 1
"CAU38534|| Candida albicans alpha-tubulin (TUB1) gene, complete cds." 1605 830 1
CAU72203|| Candida albicans glyceraldehyde-3-phosphate dehydrogenase (CaGAP) 5202 981 1


I want to either save the file where the data from every cell in the first column is surrounded by quotes OR where none are (as I can handle either case). But I can't handle the mixed bag that Excel is giving me.

I know that a simple solution is to remove all the commas in the first column, and will use that if I have to, but I was hoping there was an option somewhere in Excel to force it to always surround text data with quotes.

Any help?
1J
      Leader
      ID: 049346417
      Thu, Aug 26, 2004, 16:15
how about take em all out and add a column before and after with quotes (or any symbol)
2Trip
      Donor
      ID: 13961611
      Thu, Aug 26, 2004, 18:07
I would sort the rows by column 1 so that you get all of the rows with and without the quotes together. Next, you will have to insert a column b, and fill it with the " symbol. You can use the concatenate formula to add quotes around the rows without them. The formula would be "=CONCATENATE(B1,A1,B1)" You can then copy this formula to all of the rows which aren't included in quotes. Lastly, you would have to copy the concatenated cells and select "paste special > values" over the original cells. Then of course, delete column B.
3Sludge
      ID: 24914721
      Thu, Aug 26, 2004, 21:04
Thanks guys, but neither of those will accomplish what I'm looking to do. For starters, adding leading and trailing quotes to the contents of the first column results in Excel adding two more leading and trailing quotes when you save it as a tab delimited file, at least it did for me (try it yourself). Secondly, this isn't a one-shot deal, and I will not be the one who always does the data analysis. At present, my job is to work through the steps necessary and to describe to some others how to do it so that they can do it themselves in the future. So the KISS principle definitely applies here.
4RecycledSpinalFluid
      Dude
      ID: 204401122
      Thu, Aug 26, 2004, 23:05
I'll see if I can work something up later.
5RecycledSpinalFluid
      Dude
      ID: 204401122
      Thu, Aug 26, 2004, 23:11
Just want to clear some thing up before digging in:

Only columns A, B, C and D have data?
6Sludge
      ID: 24914721
      Thu, Aug 26, 2004, 23:13
Yes. The remaining columns are all blank.

Also, the rows must remain in their original order, so no sorting can be done (not that I can see why any would need to be done, but I just thought I'd mention it).
7RecycledSpinalFluid
      Dude
      ID: 204401122
      Fri, Aug 27, 2004, 02:02
What file extention do you want on the output file?
8RecycledSpinalFluid
      Dude
      ID: 204401122
      Fri, Aug 27, 2004, 02:08

' Tab-Delimit.vbs
'
' VBScript file to convert an Excel XLS file into a Tab-Delimited output file
'
' Drag and Drop the XLS File on top of this VBScript
'
' This will create and output file in same directory as XLS file called
' TAB_DEL_ appended to front of XLS file name
'
' Create for Sludge by RecycledSpinalFluid (You know the email addy to send payment to... :)
'
'
Set objExcel = CreateObject("Excel.Application")
Set objArgs = WScript.Arguments

'Get file to TAB-DELIMIT
If objArgs.Count = 1 Then
strFileName = trim(objArgs(0))
Else
strFileName = trim(InputBox("Enter XLS File Name" & VbCrLf & VbCrLf & "(Located in same directory as this script)", "XLS File To TAB-DELIMIT", "RSF.XLS"))
End If

'Check if not the current directory
slashLoc = InStrRev(strFileName, "\")
If slashLoc > 0 Then
outputFileName = Left(strFileName, slashLoc) & "TAB_DEL_" & Mid(strFileName,slashLoc + 1, Len(strFileName) - 4 - slashLoc) & ".csv"
Else
outputFileName = "TAB_DEL_" & strFileName
End If

'Check if and XLS or not
If ucase(Right(strFileName,4)) <> ".XLS" Then
WScript.Echo "File to TAB-DELIMIT must be an XLS file"
WScript.Quit
End If

'Create Output file name "TAB_DEL_" + XLS File Name
Set objDictionary = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile(outputFileName, True)

'Open the XLS (Invisible)
Set objWorkbook = objExcel.Workbooks.Open(objArgs(0))

intStartRow = 1
intStartCol = 1
intRowsDown = 65536
intColsWide = 4

rowDate = True
compareLine = VbTAB & VbTAB & VbTAB
Do while rowDate = True
writeLine = trim(objExcel.Cells(intStartRow, intStartCol).Value) & VbTAB & _
trim(objExcel.Cells(intStartRow, intStartCol + 1).Value) & VbTAB & _
trim(objExcel.Cells(intStartRow, intStartCol + 2).Value) & VbTAB & _
trim(objExcel.Cells(intStartRow, intStartCol + 3).Value)
If (writeLine <> compareLine) Then
objTextFile.writeline writeLine
intStartRow = intStartRow + 1
Else
rowDate = False
End if
Loop
objExcel.Quit

Wscript.Echo "The Tab Delimited file:" & _
VbCrLf & VbCrLf & _
"'" & outputFileName & "' has been created."
9RecycledSpinalFluid
      Dude
      ID: 204401122
      Fri, Aug 27, 2004, 02:09
Not the best coding, could use some more error correction, but should get the job done if you drag and drop into it.
10biliruben
      ID: 406171015
      Fri, Aug 27, 2004, 02:12
I'd recommend DBMSCopy for all your conversion needs, except SAS bought them, and I am unsure if it will continue to have support.
11Sludge
      ID: 54692111
      Fri, Aug 27, 2004, 12:49
Awesome, RSF. I bow to your Excel Kung Fu. I'm pretty sure I can handle any changes that may be required in the future (in particular if more numeric columns are added) myself. But this is a great start. I owe you one.
12Sludge
      ID: 54692111
      Fri, Aug 27, 2004, 13:27
One more question (with no guarantees that there won't be more):

What would I replace the following line with to automatically determine the number of columns in the excel file?

intColsWide = 4
13RecycledSpinalFluid
      Dude
      ID: 204401122
      Fri, Aug 27, 2004, 14:15
Lets make it a little smarter (not much, but a little):
Replace from rowDate (why did I name it that???)

rowDate = True
compareLine = ""
For i = 1 To intColsWide - 1
compareLine = compareLine & VbTAB
Next

Do while rowDate = True
writeLine = ""
For i = 1 To intColsWide - 1
writeLine = writeLine & trim(objExcel.Cells(intStartRow, intStartCol + i - 1).Value) & VbTAB
Next
writeLine = writeLine & trim(objExcel.Cells(intStartRow, intStartCol + i - 1).Value)

If (writeLine <> compareLine) Then
objTextFile.writeline writeLine
intStartRow = intStartRow + 1
Else
rowDate = False
End if
Loop
objExcel.Quit

Wscript.Echo "The Tab Delimited file:" & _
VbCrLf & VbCrLf & _
"'" & outputFileName & "' has been created."
14RecycledSpinalFluid
      Dude
      ID: 204401122
      Fri, Aug 27, 2004, 14:18
To answer post 12:

Yes, that will determine how many columns wide you want to report on.

If you didn't want to start with Column A, change the "intStartCol". Say you want to start with Column C, then set it to "intStartCol = 3".
RotoGuru Computer Forum

View the Forum Registry


Self-edit this thread




Post a reply to this message: (But first, how about checking out this sponsor?)

Name:
Email:
Message:
Click here to create and insert a link
Ignore line feeds? no (typical)   yes (for HTML table input)


Viewing statistics for this thread
Period# Views# Users
Last hour11
Last 24 hours11
Last 7 days11
Last 30 days44
Since Mar 1, 20071102522