posted
Does anyone know of a good data parser that lets you specify text delimiters and outputs the extracted data to an Excel file or Word document?
Posts: 3003 | Registered: Oct 2004
posted
One of the things we do at work is take a bunch of reports that are generated from event logs from servers, switches, routers, and other devices. We spend about 3 hours per month per client (we have about 50 mid-sized clients right now) digging through those reports to pull out the relevant data and make them more readable.
There are about 12 different types of events that we track so a full parse of the whole report would be very difficult and would require building a fully custom parser, which we don't have financial authorization for. What I'm looking for is something that will allow me to dig through each section and pull out the relevant data by finding the words that come before and after the relevant data and outputting what's in between. My guess at the easiest way to do this would be something that could scan each section of the report using a different set of words for delimiting that I could input prior to scanning. It wouldn't save us nearly as much time as a full parser, but it would probably cut us down from spending 3 hours on each report to maybe 1.5-2 hours.
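For what it's worth, the "find the words before and after, output what's in between" idea can be sketched in a few lines of Python. This is only a rough illustration: the section names and delimiter phrases in `SECTION_DELIMITERS` are made up, and it assumes the report has already been split into one chunk of text per section.

```python
import re

# Hypothetical delimiter sets: one (before, after) pair per report section.
SECTION_DELIMITERS = {
    "failed_logins": ("Failed login from", "on port"),
    "config_changes": ("Configuration changed by", "from host"),
}


def extract_between(text, before, after):
    """Return every piece of text found between `before` and `after`."""
    pattern = re.escape(before) + r"\s*(.*?)\s*" + re.escape(after)
    return re.findall(pattern, text, flags=re.IGNORECASE | re.DOTALL)


def scan_report(sections, delimiters):
    """Apply each section's delimiter pair to that section's text."""
    return {
        name: extract_between(sections.get(name, ""), *delimiters[name])
        for name in delimiters
    }
```

Calling `scan_report` with a dict of section texts returns a dict of lists, one list of extracted values per section. The delimiter table is the only part that would need editing per report type.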
Posts: 3003 | Registered: Oct 2004
quote:There are about 12 different types of events that we track so a full parse of the whole report would be very difficult and would require building a fully custom parser, which we don't have financial authorization for.
I don't think this would actually be as difficult as you're supposing, here. But that said: what programming languages do you know? There are many, many ways you could code a basic parser for this.
If you don't want to program one yourself -- though writing your own will probably be the cheapest and fastest option, depending on your skills -- there are a few event log parsers around for purchase (many with Event Log or Log Parser in their names). You could even build a very rudimentary parser in Excel or Access, depending on how things are delimited.
If you're more familiar with Access, you might have some success writing twelve import macros that pull data to a single table (or twelve tables, if you must), and then a query that only looks for records with the specified keywords. This wouldn't even require any "real" programming -- just familiarity with the import/export tool.
Posts: 37449 | Registered: May 1999
posted
The major problem is time. Almost everyone at work is working 50-60 hour weeks already (including me), and we're completely slammed with work that has to get done. I have programming experience, and I have a good idea of how it all needs to get done, but I just don't have even remotely enough time to re-learn syntax and put together a program.
The other problem is that all the data is collected by a Cisco MARS box and is then compiled into a single report and sent over to our office through email. The formatting of the report is a bit of an issue. There are probably a couple of things I could do to make the reports more readable, thus requiring less time for us to dig through, but that also takes time (since I'd have to research and test changes, as well as make those changes to about 50 different MARS devices).
Eh. I guess I'll figure something out. Probably my best bet would be to put together a design document that would lay out the specific needs for the reports and then propose we hire someone to write the software for us.
Posts: 3003 | Registered: Oct 2004
posted
Yeah, scraping text out of reports or web pages can be a real pain.
I would guess that the underlying log files might be easier to deal with, as far as parsing goes. It is usually not terribly difficult to transform a log entry into a database record, and then you can just query the database to aggregate the data how you want it.
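As a rough sketch of the log-entry-to-database-record idea, here is a small Python example using the standard `sqlite3` module. The log line layout (`<date> <time> <device> <event_type>: <message>`) is purely an assumption for illustration; I don't know what the real MARS logs look like, so `LINE_PATTERN` would need to be adapted.

```python
import re
import sqlite3

# Hypothetical log line layout: "<date> <time> <device> <event_type>: <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<device>\S+) (?P<event_type>\w+): (?P<message>.*)$"
)


def load_log(lines, conn):
    """Insert every line that matches the assumed layout; skip the rest."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(timestamp TEXT, device TEXT, event_type TEXT, message TEXT)"
    )
    rows = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            rows.append(match.group("timestamp", "device", "event_type", "message"))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def count_by_type(conn):
    """Aggregate: number of events per event type, sorted by type name."""
    cursor = conn.execute(
        "SELECT event_type, COUNT(*) FROM events "
        "GROUP BY event_type ORDER BY event_type"
    )
    return cursor.fetchall()
```

Once the records are in a table, the twelve event types become twelve `WHERE event_type = ...` queries rather than twelve separate parsing jobs.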
I don't know anything about the MARS devices or their built-in reports, but even if you are stuck dealing with the reports as they are now, I'm fairly certain that what you described - finding sections or pieces of the report that are delimited by keywords - could be largely scripted without a great deal of effort.
Just to give an arbitrary (and potentially embarrassing, I'm sure) example, here's a short VBA script that will put a section of a text file between two headings on the clipboard. This was written in the Excel VBA editor and requires references to be set for the "Microsoft Scripting Runtime" and "Microsoft Forms 2.0 Object Library" libraries.
The major assumptions are the following:
1. Report is a text file
2. Headings are contained on a single line
3. Nothing on the heading lines needs to be included in the result.
code:
Sub Test()
    Dim strText As String
    Dim oData As DataObject
    Set oData = New DataObject

    ' The file path and heading text below are placeholders, not real values.
    strText = ExtractText("C:\reports\report.txt", "Begin Heading", "End Heading")

    ' Put the extracted section on the clipboard.
    oData.SetText strText
    oData.PutInClipboard
End Sub

Function ExtractText(strFileName, strBeginText, strEndText) As String
    Dim oFS As FileSystemObject
    Dim oTS As TextStream
    Dim blBeginTextFound As Boolean
    Dim blEndTextFound As Boolean
    Dim strLine As String
    Dim strReturnText As String

    Set oFS = New FileSystemObject
    Set oTS = oFS.OpenTextFile(strFileName, ForReading, False)

    ' Skip lines until the begin heading is found (vbTextCompare ignores case).
    ' Checking AtEndOfStream before ReadLine avoids an error on an empty file.
    Do Until blBeginTextFound Or oTS.AtEndOfStream
        strLine = oTS.ReadLine
        If InStr(1, strLine, strBeginText, vbTextCompare) > 0 Then
            blBeginTextFound = True
        End If
    Loop

    If blBeginTextFound = False Then
        ExtractText = "Error: begin header not found"
        oTS.Close
        Exit Function
    End If

    ' Collect lines until the end heading or the end of the file is reached.
    Do Until blEndTextFound Or oTS.AtEndOfStream
        strLine = oTS.ReadLine
        If InStr(1, strLine, strEndText, vbTextCompare) = 0 Then
            strReturnText = strReturnText & strLine & vbNewLine
        Else
            blEndTextFound = True
        End If
    Loop

    oTS.Close
    ExtractText = strReturnText
End Function
Of course, if my assumptions don't translate to your problem then the above code might not be a very useful example. And there's a lot of room for more powerful functions. Among other things, you might want something that lets you choose whether matching is case sensitive (as written, vbTextCompare ignores case), parses the result data and stores it in a structured format, etc.
Just one way you can grab data out of the middle of a file, FWIW.
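To illustrate the "parse the result and store it in a structured format" step, here is a minimal Python sketch that turns an extracted section into CSV text that Excel can open. It assumes the section consists of `Name: value` lines with a blank line between records; that layout and the field names in the usage note are hypothetical, not the real report format.

```python
import csv
import io


def section_to_csv(section_text, field_names, separator=":"):
    """Turn 'Name: value' lines into CSV text with one column per field name.

    A blank line ends the current record. The 'Name: value' layout is an
    assumption made for this sketch, not the real report format.
    """
    records, current = [], {}
    for line in section_text.splitlines():
        if not line.strip():
            # Blank line: close the record being built, if any.
            if current:
                records.append(current)
                current = {}
            continue
        name, found, value = line.partition(separator)
        if found and name.strip() in field_names:
            current[name.strip()] = value.strip()
    if current:
        records.append(current)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For example, `section_to_csv(text, ["Device", "Event", "Count"])` returns a header row followed by one row per record; writing that string to a `.csv` file gives something Excel opens directly.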
Posts: 4287 | Registered: Mar 2005
posted
What's your budget? This might be something I'd be willing to do for you over my Christmas break for a relative pittance.
Posts: 37449 | Registered: May 1999
quote:The major problem is time. Almost everyone at work is working 50-60 hour weeks already (including me), and we're completely slammed with work that has to get done. I have programming experience, and I have a good idea of how it all needs to get done, but I just don't have even remotely enough time to re-learn syntax and put together a program.
In one month you'd save 150 hours (about 50 clients at 3 hours each) if the whole process were automated. Given that the program could almost certainly be designed to run overnight and save or email results, you could easily recoup 120 hours of development time in the very first month.
That's enough time for you to learn regular expressions and a language that supports them (which is most languages at this point).
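The break-even arithmetic can be worked through with the figures already quoted in the thread (about 50 clients, about 3 hours per report, 120 hours of development, and the 1.5-2 hours per report estimated for a partial tool). The function names are just for illustration.

```python
def monthly_hours_saved(clients, hours_before, hours_after):
    """Hours saved per month across all clients."""
    return clients * (hours_before - hours_after)


def months_to_break_even(dev_hours, hours_saved_per_month):
    """Months of savings needed to repay the development time."""
    return dev_hours / hours_saved_per_month


# Fully automated: reports take no manual time at all.
full = monthly_hours_saved(50, 3.0, 0.0)
# Partial tool: reports still take 2 hours (worst case) or 1.5 hours (best case).
partial_low = monthly_hours_saved(50, 3.0, 2.0)
partial_high = monthly_hours_saved(50, 3.0, 1.5)

for label, saved in [("full", full), ("partial, 2 h left", partial_low),
                     ("partial, 1.5 h left", partial_high)]:
    print(label, saved, "hours/month,",
          round(months_to_break_even(120, saved), 1), "months to break even")
```

So even the partial tool would pay back 120 hours of development in roughly two months on these numbers.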
Posts: 26071 | Registered: Oct 2003
posted
TD, I'd have to get budgetary approval for the project, and we have a surprising amount of bureaucracy for a company with 50 employees (IT consulting for banks has some stiff requirements), so it'd probably be a while before they hit the go button on it. But I'll throw your name in the hat when it comes time for it. I'll spend some time this week writing up a design doc and send it to you. Having a rough time-to-completion estimate would make things move a lot quicker.
Dag, In one month we'd save a whole ton of time. Unfortunately, we already have too few people and too many clients (we've doubled our client base in the last 9 months and are still skyrocketing), so what time we have gets spent doing support and writing reports (which our clients require as a part of FDIC reporting standards, so we can't just shut it down for a month to automate it).
Posts: 3003 | Registered: Oct 2004
quote:Dag, In one month we'd save a whole ton of time. Unfortunately, we already have too few people and too many clients (we've doubled our client base in the last 9 months and are still skyrocketing), so what time we have gets spent doing support and writing reports (which our clients require as a part of FDIC reporting standards, so we can't just shut it down for a month to automate it).
Just an observation: this is why contract programmers exist.
Another observation: if you can get me your requirements before Christmas, I will charge considerably less than I would charge if I were to get them after New Year's Day.
Posts: 37449 | Registered: May 1999
posted
If Tom passes on it (due to time or whatnot), I'll toss my hat in the ring. I've spent a lot of time parsing data (often from really awful sources -- I work for a scientometrics research group, and we get all sorts of bizarrely formatted data), and advising on how to parse data.
Posts: 15770 | Registered: Dec 2001
posted
If he just wanted to grab that particular section automatically, then yes, sed would be sufficient (grep is incapable, though). What we're discussing is the more complete parsing task that's been alluded to, where the details aren't completely clear, but which seems likely to be beyond what would be appropriate to do in sed.
Posts: 15770 | Registered: Dec 2001