Hatrack River Forum
Topic: Data Parser
Boris
Member
Member # 6935

Does anyone know of a good data parser that lets you specify text delimiters and outputs the data to an Excel file or Word document?
Posts: 3003 | Registered: Oct 2004
Mike
Member
Member # 55

perl?
Posts: 1810 | Registered: Jan 1999
HollowEarth
Member
Member # 2586

We need more info to suggest anything remotely helpful.
Posts: 1621 | Registered: Oct 2001
scifibum
Member
Member # 7625

Excel?

It can read delimited text files all by itself.

Posts: 4287 | Registered: Mar 2005
Boris
Member
Member # 6935

One of the things we do at work is go through a bunch of reports that are generated from the event logs of servers, switches, routers, and other devices. We spend about 3 hours per month per client (we have about 50 mid-sized clients right now) digging through those reports to pull out the relevant data and make them more readable.

There are about 12 different types of events that we track so a full parse of the whole report would be very difficult and would require building a fully custom parser, which we don't have financial authorization for. What I'm looking for is something that will allow me to dig through each section and pull out the relevant data by finding the words that come before and after the relevant data and outputting what's in between. My guess at the easiest way to do this would be something that could scan each section of the report using a different set of words for delimiting that I could input prior to scanning. It wouldn't save us nearly as much time as a full parser, but it would probably cut us down from spending 3 hours on each report to maybe 1.5-2 hours.
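A minimal sketch of the "find the words before and after, output what's in between" idea, written in Python. The section names, marker strings, and sample report are all made up for illustration; nothing here reflects the real report layout.

```python
# Hypothetical delimiter sets: section name -> (text before, text after).
# In practice these would be typed in before each scan.
DELIMITERS = {
    "failed_logins": ("Failed logins:", "End of failed logins"),
    "port_scans": ("Port scans:", "End of port scans"),
}


def extract_between(text, before, after):
    """Return the text between the first `before` marker and the next
    `after` marker, or None if either marker is missing."""
    start = text.find(before)
    if start == -1:
        return None
    start += len(before)
    end = text.find(after, start)
    if end == -1:
        return None
    return text[start:end].strip()


def scan_report(text, delimiters):
    """Apply every delimiter pair to the report and collect the results."""
    return {
        name: extract_between(text, before, after)
        for name, (before, after) in delimiters.items()
    }


# Made-up sample report, only to show the mechanics.
sample_report = """Failed logins:
admin from 10.0.0.5 (3 attempts)
End of failed logins
Port scans:
none detected
End of port scans
"""

results = scan_report(sample_report, DELIMITERS)
for name, value in results.items():
    print(name, "->", value)
```

Keeping the delimiter pairs in a plain dictionary means a new event type is one more entry rather than new code, which seems to fit the "different set of words for each section" requirement.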

Posts: 3003 | Registered: Oct 2004
TomDavidson
Member
Member # 124

quote:
There are about 12 different types of events that we track so a full parse of the whole report would be very difficult and would require building a fully custom parser, which we don't have financial authorization for.
I don't think this would actually be as difficult as you're supposing, here. But that said: what programming languages do you know? There are many, many ways you could code a basic parser for this.

If you don't want to program one -- though programming it will probably be the cheapest and fastest option, depending on your skills -- there are a few event log parsers around for purchase (many with Event Log or Log Parser in their names). You could even build a very rudimentary parser in Excel or Access, depending on how things are delimited.

If you're more familiar with Access, you might have some success writing twelve import macros that pull data to a single table (or twelve tables, if you must), and then a query that only looks for records with the specified keywords. This wouldn't even require any "real" programming -- just familiarity with the import/export tool.

Posts: 37419 | Registered: May 1999
Boris
Member
Member # 6935

The major problem is time. Almost everyone at work is working 50-60 hour weeks already (including me), and we're completely slammed with work that has to get done. I have programming experience, and I have a good idea of how it all needs to get done, but I just don't have even remotely enough time to re-learn syntax and put together a program.

The other problem is that all the data is collected by a Cisco MARS box and is then compiled into a single report and sent over to our office through email. The formatting of the report is a bit of an issue. There are probably a couple of things I could do to make the reports more readable, thus requiring less time for us to dig through, but that also takes time (since I'd have to research and test changes, as well as make those changes to about 50 different MARS devices).

Eh. I guess I'll figure something out. Probably my best bet would be to put together a design document that would lay out the specific needs for the reports and then propose we hire someone to write the software for us.

Posts: 3003 | Registered: Oct 2004
scifibum
Member
Member # 7625

Yeah, scraping text out of reports or web pages can be a real pain.

I would guess that the underlying log files might be easier to deal with, as far as parsing goes. It is usually not terribly difficult to transform a log entry into a database record, and then you can just query the database to aggregate the data how you want it.
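As a rough illustration of turning log entries into database records and then querying them, here is a Python sketch using the standard sqlite3 module. The log line layout (timestamp, device, event type, message) is purely an assumption; real MARS or syslog output will look different.

```python
import sqlite3

# Hypothetical log lines: "timestamp device event_type message".
LOG_LINES = [
    "2008-12-01T02:14:07 fw01 LOGIN_FAILURE user=admin src=10.0.0.5",
    "2008-12-01T02:15:44 sw03 LINK_DOWN port=12",
    "2008-12-02T11:02:19 fw01 LOGIN_FAILURE user=root src=10.0.0.9",
]


def parse_line(line):
    """Split one log line into (timestamp, device, event_type, message)."""
    ts, device, event_type, message = line.split(" ", 3)
    return ts, device, event_type, message


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (ts TEXT, device TEXT, event_type TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [parse_line(line) for line in LOG_LINES],
)

# Aggregate: how many events of each type per device.
counts = conn.execute(
    "SELECT device, event_type, COUNT(*) FROM events "
    "GROUP BY device, event_type ORDER BY device, event_type"
).fetchall()

for device, event_type, n in counts:
    print(device, event_type, n)
```

Once the entries are rows, each of the report's event types becomes a different query rather than a different parsing routine.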

I don't know anything about the MARS devices or their built-in reports, but even if you are stuck dealing with the reports as they are now, I'm fairly certain that what you described - finding sections or pieces of the report that are delimited by keywords - could be largely scripted without a great deal of effort.

Just to give an arbitrary (and potentially embarrassing, I'm sure) example, here's a short VBA script that will put the section of a text file that lies between two headings on the clipboard. This was written in the Excel VBA editor and requires references to be set for the "Microsoft Scripting Runtime" and "Microsoft Forms 2.0 Object Library" libraries.

The major assumptions are the following:

1. Report is a text file
2. Headings are contained on a single line
3. Nothing on the heading lines needs to be included in the result.

code:
Sub Test()
    Dim strText As String
    Dim oData As DataObject
    Set oData = New DataObject

    ' Grab everything between the two heading lines
    strText = ExtractText("C:\test.txt", "heading #3", "heading #4")

    ' Put the result on the clipboard
    oData.SetText strText
    oData.PutInClipboard
End Sub

' Returns the lines of strFileName that fall between the first line containing
' strBeginText and the next line containing strEndText (heading lines excluded).
Function ExtractText(strFileName As String, strBeginText As String, strEndText As String) As String
    Dim oFS As FileSystemObject
    Dim oTS As TextStream
    Dim blBeginTextFound As Boolean
    Dim blEndTextFound As Boolean
    Dim strLine As String
    Dim strReturnText As String

    Set oFS = New FileSystemObject
    Set oTS = oFS.OpenTextFile(strFileName, ForReading, False)

    ' Skip ahead to the begin heading. Checking AtEndOfStream before each
    ' ReadLine avoids a run-time error on an empty file.
    Do Until blBeginTextFound Or oTS.AtEndOfStream
        strLine = oTS.ReadLine
        If InStr(1, strLine, strBeginText, vbTextCompare) > 0 Then
            blBeginTextFound = True
        End If
    Loop

    If Not blBeginTextFound Then
        ExtractText = "Error: begin header not found"
        oTS.Close
        Exit Function
    End If

    ' Collect lines until the end heading (or the end of the file) is reached
    Do Until blEndTextFound Or oTS.AtEndOfStream
        strLine = oTS.ReadLine
        If InStr(1, strLine, strEndText, vbTextCompare) = 0 Then
            strReturnText = strReturnText & strLine & vbNewLine
        Else
            blEndTextFound = True
        End If
    Loop

    oTS.Close
    ExtractText = strReturnText
End Function

Of course, if my assumptions don't translate to your problem then the above code might not be a very useful example. And there's a lot of room for more powerful functions. The matching above is already case-insensitive (that's what vbTextCompare does), but among other things you might want something that parses the result data and stores it in a structured format, etc.
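For the "structured format" part, one possible direction is sketched below in Python rather than VBA: split the extracted section into name/value pairs and write them out as CSV, which Excel opens directly. The "name: value" layout of the extracted text is an assumption made only for this example.

```python
import csv
import io

# Hypothetical extracted section: one "name: value" pair per line.
extracted = "Device: fw01\nEvent: login failure\nCount: 3\n"


def to_rows(section_text):
    """Turn 'name: value' lines into (name, value) tuples, skipping other lines."""
    rows = []
    for line in section_text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        rows.append((name.strip(), value.strip()))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(to_rows(extracted))
print(csv_text)
```

Writing `csv_text` to a file with a .csv extension would give something Excel can open without any import wizard.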

Just one way you can grab data out of the middle of a file, FWIW.

Posts: 4287 | Registered: Mar 2005
TomDavidson
Member
Member # 124

What's your budget? This might be something I'd be willing to do for you over my Christmas break for a relative pittance.
Posts: 37419 | Registered: May 1999
Dagonee
Member
Member # 5818

quote:
The major problem is time. Almost everyone at work is working 50-60 hour weeks already (including me), and we're completely slammed with work that has to get done. I have programming experience, and I have a good idea of how it all needs to get done, but I just don't have even remotely enough time to re-learn syntax and put together a program.
In one month you'd save 150 hours (3 hours for each of about 50 clients). Given that the program could almost certainly be designed to run overnight and save or email results, you could easily recoup 120 hours of development time in the very first month.

That's enough time for you to learn regular expressions and a language that supports them (which is most languages at this point).
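To give a feel for what that looks like, here is a small Python sketch that uses a regular expression with named groups to pick event lines out of a report and ignore everything else. The line layout (date, device, event name, detail) is invented for the example and is not the actual MARS format.

```python
import re

# Hypothetical event line layout: "YYYY-MM-DD device EVENT_NAME: detail".
EVENT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<device>\S+) (?P<event>[A-Z_]+): (?P<detail>.*)$",
    re.MULTILINE,
)

# Made-up report text; lines that don't match the pattern are simply skipped.
report = """Summary for client
2008-12-01 fw01 LOGIN_FAILURE: user=admin src=10.0.0.5
some unrelated line
2008-12-02 sw03 LINK_DOWN: port=12
"""

# Each match becomes a dictionary keyed by the group names.
events = [m.groupdict() for m in EVENT_RE.finditer(report)]

for event in events:
    print(event["date"], event["device"], event["event"], event["detail"])
```

One pattern per event type (about twelve, going by the earlier description) would cover the whole report, and each pattern can be tested on its own.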

Posts: 26071 | Registered: Oct 2003
Boris
Member
Member # 6935

TD, I'd have to get budgetary approval for the project, and we have a surprising amount of bureaucracy for a company with 50 employees (IT consulting for banks has some stiff requirements), so it'd probably be a while before they hit the go button on it. But I'll throw your name in the hat when the time comes. I'll spend some time this week writing up a design doc and send it to you. Having a rough time-to-completion estimate would make things move a lot quicker.

Dag, In one month we'd save a whole ton of time. Unfortunately, we already have too few people and too many clients (we've doubled our client base in the last 9 months and are still skyrocketing), so what time we have gets spent doing support and writing reports (which our clients require as a part of FDIC reporting standards, so we can't just shut it down for a month to automate it).

Posts: 3003 | Registered: Oct 2004
TomDavidson
Member
Member # 124

quote:
Dag, In one month we'd save a whole ton of time. Unfortunately, we already have too few people and too many clients (we've doubled our client base in the last 9 months and are still skyrocketing), so what time we have gets spent doing support and writing reports (which our clients require as a part of FDIC reporting standards, so we can't just shut it down for a month to automate it).
Just an observation: this is why contract programmers exist.

Another observation: if you can get me your requirements before Christmas, I will charge considerably less than I would charge if I were to get them after New Year's Day.

Posts: 37419 | Registered: May 1999
fugu13
Member
Member # 2859

If Tom passes on it (due to time or whatnot), I'll toss my hat in the ring. I've spent a lot of time parsing data (often from really awful sources -- I work for a scientometrics research group, and we get all sorts of bizarrely formatted data), and advising on how to parse data.
Posts: 15770 | Registered: Dec 2001
King of Men
Member
Member # 6684

Honestly, I don't think you even need a parser for it. Sounds like grep | sed to me.
Posts: 10645 | Registered: Jul 2004
fugu13
Member
Member # 2859

If he just wanted to grab out that particular section automatically, then yes, sed would be sufficient (grep is incapable, though). What we're discussing is the more complete parsing task that's been alluded to, where the details aren't completely clear, but which seems likely to be beyond what would be appropriate to do in sed.
Posts: 15770 | Registered: Dec 2001
Blayne Bradley
unregistered


sed? awk? grep?
TomDavidson
Member
Member # 124

Wanna cracker? [Smile]
Posts: 37419 | Registered: May 1999
Blayne Bradley
unregistered


Not funny! I am not an animal!
King of Men
Member
Member # 6684

So that would be a "DO NOT WANT", then.
Posts: 10645 | Registered: Jul 2004
Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.