RotoGuru Computer Forum

View the Forum Registry


Self-edit this thread


0 Subject: More Excelk advice needed (again)

Posted by: rockafellerskank
- Dude [27652109] Tue, Jul 06, 2004, 22:28

Here is a description of what I'm trying to accomplish, then I'll post a link to a mock spreadsheet to illustrate the problem.

One of my clients sends me weekly trasaction files in XL. They contain 1000's of lines of data with each line consisting of:

acct #
name
transaction type (payment or adjustment)
date
amount

Now, on any given week, there will be many lines of data that may have the same or nearly the same elements. These are NOT duplicates. For example:

John Doe account # 123456 makes a payment of $50.00 on 6/15/04 ..... AND the same transaction occurs again that same day on same account, same name, date amount and we DO want to count it again. It's possible for Doe to have more than one account # also.... any combination and repeats are are valid. So far so good.

Now here is the problem. When I add up all transactions for weeks 1,2,3 & 4 together, I get 10,000 lines of data. BUT.... when my client produces a MONTHLY SUMMARY file to compare to, there are 10,500 lines of data.

So far I know that the MONTHLY file is correct. I also know that if the data line appears on the weekly file, it IS on the monthly file. So, I need to identify the 500 data lines that appear ONLY on the MONTHLY SUMMARY file so I can process them. [I've already processed each of the weekly transaction and am only interested in processing the difference]

I thought I could create a couple of arrays and use vlookup to identify the missing transactions, but the problem is the acct #, name, etc.... are not unique and reoccur sometimes several times over and over. A also realize I can sort subsort various ways.

Now, perhaps a sample speadsheet will illustrate what I need. This is asimple spreadsheet that can be solved manually, buty no way I'd want to srth through the real file of 10,000 lines of data.

So, how can I compare these 2 data files to produce the totally unique records? I'm sure this is a simple solution.. I just can't find it.


demo spreadsheet
1Dr. Doom
      Donor
      ID: 52328812
      Wed, Jul 07, 2004, 00:15
rfs,

I couldn't think of any easy way to do it, but about 40 lines of VBA code and I have a solution, lemme know how to reach you if interested.
2Guru
      ID: 330592710
      Wed, Jul 07, 2004, 10:32
How about something like this:

Create a single cell of each record by concatenating the 5 data elements. For example, if the data elements are in columns A thru E, then put the following formula in column F2:
=A2&","&B2&","&C2&","&D2&","&E2
and copy it down the entire column.

Now, sort the sheet by column F. This will put all identical records net to each other.

Next, in column G, put a counter. If the value in the F column is different than the one above it, then the counter is 1. Oherwise, the counter goes up by 1.

Finally, concatenate the counter in column G with the data in column F, again creating a new column H which holds the 5 data elements plus a counter.

Do this same thing on both the monthly and the summary sheets. Then, compare the values in column H for the two sheets.

If you have 4 identical rows in one sheet but 5 in the other, you'll have four matches and one mismatch.

I inserted the formulas in columns F-H of the weekly and monthly data sheets, and then added a column I in the monthly sheet to match the items. The 7 items with #NA in that column are the missing items. demo-copy.xls
3Dr. Doom
      Donor
      ID: 52328812
      Wed, Jul 07, 2004, 12:04
Guru, that's pretty much the same solution I came up with the formula approach, using VBA, I created a concatenated key for the first 5 columns then looped through the monthly records and the weekly records finding matching records and identifying the missing transactions. A sort at the end would move them to the top.
4rockafellerskank
      Dude
      ID: 27652109
      Wed, Jul 07, 2004, 20:58
Thanks Guru & Doom. I'll try that and see if it works.

5Slackjawed Yokel
      Leader
      ID: 52347519
      Wed, Jul 07, 2004, 23:12
you could save your data to tab-delimited ascii files, and run the Unix command 'diff' on the two files. It will spit out the differing records and the line numbers.

you can get access to most unix commands by installing cygwin on your pc: http://www.cygwin.com/

I think textpad may have the same functionality built in, but you have to pay for it (like $20) where cygwin is free.

RotoGuru Computer Forum

View the Forum Registry


Self-edit this thread




Post a reply to this message: (But first, how about checking out this sponsor?)

Name:
Email:
Message:
Click here to create and insert a link
Ignore line feeds? no (typical)   yes (for HTML table input)


Viewing statistics for this thread
Period# Views# Users
Last hour11
Last 24 hours11
Last 7 days33
Last 30 days33
Since Mar 1, 2007647372