Am I missing something? That doesn't seem all that odd, but perhaps that's because I'm familiar with IP, TCP, and UDP, and I've written something to read PNG headers before. Interchange file formats often include variable-length blocks, with a part of the header defining the block length. That first part of the record is just a very small per-item header, which defines the type of the item and its size. Look into the deflate algorithm, or the tar file format, or any number of on-disk storage formats and you'll find they all (well, definitely most) do this, because it's the most efficient way.
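As a sketch of the idea: reading a stream of length-prefixed chunks looks something like the following. The header layout here (2-byte tag, 2-byte big-endian length) is invented for illustration; real formats like PNG define their own field sizes and ordering.

```python
import struct
from io import BytesIO

def read_chunks(stream):
    """Yield (tag, payload) pairs from a stream of length-prefixed chunks.

    The layout is made up for illustration: a 2-byte type tag followed
    by a 2-byte big-endian payload length. Real formats (PNG, deflate,
    tar) each define their own header fields and sizes.
    """
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of stream
        tag, length = struct.unpack(">2sH", header)
        yield tag, stream.read(length)

# Two chunks: tag b"TX" carrying 5 bytes, tag b"NM" carrying 2 bytes
stream = BytesIO(b"TX\x00\x05hello" + b"NM\x00\x02hi")
print(list(read_chunks(stream)))  # [(b'TX', b'hello'), (b'NM', b'hi')]
```

The reader never has to guess where a chunk ends: the length field tells it exactly how far to skip, which is what makes this layout so universal.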
Fixed-length records are less common for interchange formats, but you likely use them every day anyway. That's what databases often use, and the reason is that it makes it very efficient to index into the structure and get whole records (know your data is sorted? Then binary search is possible, and easy). Sometimes it's just the indices that are stored this way (they essentially have to be), but you can get fairly efficient table access out of some engines without indexing everything, if all the records are fixed-length and the engine can determine that.
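A minimal sketch of why fixed-length records make this cheap, using a made-up 16-byte layout (8-byte space-padded name key, then 8 digits):

```python
RECLEN = 16  # every record is exactly 16 bytes (hypothetical layout)

def get_record(buf, n):
    """Fetch record n in O(1): the offset is just n * RECLEN."""
    return buf[n * RECLEN:(n + 1) * RECLEN]

def find(buf, name):
    """Binary search on the 8-byte key field, assuming records are sorted."""
    count = len(buf) // RECLEN
    key = name.ljust(8).encode()
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if get_record(buf, mid)[:8] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < count and get_record(buf, lo)[:8] == key:
        return get_record(buf, lo)
    return None

# Three fixed-length records packed back to back, sorted by name
table = b"alice   00000042bob     00000007carol   00000099"
print(get_record(table, 1))  # b'bob     00000007'
print(find(table, "carol"))  # b'carol   00000099'
```

No per-record headers, no scanning: record number times record length is the offset, which is exactly the property database engines exploit.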
If you don't program in a low-level language, this is generally abstracted away by some library that is written in one. People don't usually write PNG and JPEG libraries in pure Python or Ruby (or at least, they don't expect them to be used much in production); they write a shim that wraps libpng or libjpeg.
The interesting part is that the fixed and variable record formats are first class things on the mainframe.
So, something like DB2 on a mainframe can use system-supplied functionality (VSAM) as its storage engine. As opposed to Unix, where higher-level databases like MySQL, CockroachDB, etc. either roll their own (InnoDB) or use some third-party offering like RocksDB, LevelDB, etc.
VSAM isn't just one thing, either... it supports key/value indexing, indexing via relative byte address, indexing via record number, etc.
So, basically, when you talk with mainframe people about interchanging data, they don't tend to consider that you might actually have to write some code to parse what they are sending you. They tend to assume you already have utilities that understand these things. It's not an interchange format... it's the native format for them.
I suppose the answer seemed novel because it's speaking with very specific "official sounding" terminology about something that's usually ad-hoc negotiated by project in the unix world.
As to the interchange of data, unless you run into a bunch of lazy Mainframers (they do exist), there's a lot they can do at almost zero "cost" to make it easy for you. It is not often there is a genuine case that anyone outside the Mainframe needs to know about internal Mainframe formats. I rail against "how do I translate packed-decimal fields in language X" questions. There's no need for those fields to be seen outside. Same with LRECL and RECFM. Text only, explicit signs, explicit decimal points (or scaling factors), and then you can have delimited records and no worries. You can even have XML or JSON if you're happier with that.
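For what it's worth, when someone does ship you a packed-decimal (COMP-3) field anyway, it's not hard to decode. A sketch, ignoring some edge cases; note the scale (implied decimal places) lives in the copybook's PIC clause, not in the data itself:

```python
def unpack_comp3(data, scale=0):
    """Decode an IBM packed-decimal (COMP-3) field.

    Two BCD digits per byte; the final nibble is the sign
    (0xC or 0xF positive, 0xD negative). `scale` is the implied
    number of decimal places, taken from the copybook.
    """
    digits = []
    for byte in data:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    sign_nibble = digits.pop()
    value = 0
    for d in digits:
        value = value * 10 + d
    if sign_nibble == 0x0D:
        value = -value
    return value / (10 ** scale) if scale else value

print(unpack_comp3(b"\x12\x34\x5c", scale=2))  # 0x12345C -> +12345 -> 123.45
print(unpack_comp3(b"\x01\x2d"))               # 0x012D   -> -12
```

Which is exactly the point: the bytes are trivial, but without the copybook you don't know the scale or even the field boundaries, so it's far kinder to just send text.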
So I guess it's fair to say the difference is that the mainframes have a standardized format for record creation and consumption in the OS, sort of like DB2 being included in the kernel? That is nice, and it does explain why it might be confusing, even if I think it doesn't necessarily get you much over using a third party library (unless there are other benefits I'm not considering).
It's less of a difference now, and somewhat unrelated to the specific topic, but...
A big historical difference with mainframes and data was the architecture around I/O. They always had separate processors to offload I/O, and I/O was always asynchronous. And things like VSAM were highly tuned to take advantage of that.
That's why mainframes continued to outpace Linux/X86 for some types of workloads...even after X86 performance far outpaced the main processors in a mainframe.
I believe that advantage is completely gone now, but mostly via brute force vs elegance. Commodity hardware is just so fast now.
Correct about the I/O. You can let the space bar auto-repeat 1919 times, for instance (the nearest equivalent to circling the mouse), and the CPU cost is... zero. When, exactly, do you think that the X86 surpassed the Mainframe processors, and in what particular way? The current generation (expect a new one this year) is 5GHz (actually slower than the previous) and has lots of stuff. A fully-loaded box has a theoretical throughput of 30bn (yes, billion) RESTful transactions per day. And if that isn't enough power, you can hang another 31 boxes onto it and treat them as one.
"When, exactly, do you think that the X86 surpassed the Mainframe processors, and in what particular way?"
Fairly recently. Through things like affordable SSDs, enough of Moore's law around Intel, and better distributed data stores. And better app-side knowledge of how to break up a monolith.
I was around for a few failed "rewrite this TPF system" attempts and I saw what broke.
Commodity stuff can replace it now...but only very recently.
Or if you just meant x86-64 vs any other CPU, for the CPU alone? That debate is just done. They poured enough money into that mess that they won, assuming you don't care about power consumption.
Mainframe DASD is the same as "X86" disks, at least for those using "storage arrays".
Pretty much all the smaller Mainframes are gone, many years ago. I've not heard of any successful replacement of a loaded system which used fewer than three times the initial projection of "X86-power".
Anyway, time will tell. In 10 years' time you'll still think X86 is faster and there'll still be Mainframes.
As to your last line, who is "they"? I'm just interested. Thanks.
Sure, I wasn't trying to suggest there is no reason for or benefit to mainframes, just to summarize the situation to make sure I understood it correctly. It does make sense to have an integrated library for advanced file access if you have dedicated I/O hardware. That prevents a lot of misconfiguration of libraries that might try, unsuccessfully, to use that system, if they even support it at all (e.g. OpenSSL and crypto hardware such as the dedicated AES hardware in the VIA mini-ITX platforms of yesteryear).
If you want DB2 in the operating system, you need an "IBM midrange", or iSeries. Much shorter code path, much faster. If you want the fastest record access, the operating system is z/TPF for the IBM Mainframe. Not just fixed-length records, but fixed-length records of one size. Effectively, there are no "third-party libraries" (until you get to IBM's Java, or any language someone has ported; Lua is a popular example).
Although I was never a mainframe programmer per se, I did quite a bit of interfacing between mini/microcomputers and IBM mainframes, so I got to see under the hood a little. (If I write something stupid, it's either memory issues or ignorance).
I recall seeing how files were allocated on disk (remember that mainframes have many different OSes, like OS/390, and even OSes on top of OSes like VM/CMS, and I don't remember what this was running on).
In this particular case, a file was preallocated in JCL to use N extents starting on a specific cylinder. Fixed size. None of this fancy ext3 or NTFS ;)
JCL (Job Control Language) was a language to control batch jobs, and many have called it the worst language ever designed, although not as bad as brainfk.
On the other hand, I had a chance to interface C++ with CICS (a transaction processing subsystem) using WebSphere MQ, and I must say, I was really impressed with its sophistication. It was a kind of SOA long before the term was invented.
A lot of what I saw in the mainframe world predated things - by decades - that some may think are new(er) concepts, such as clusters (sysplex), front-end processors, hypervisors, HA, and so on.
Those of us who had to fiddle with implied file formats with fixed-length fields and records won't find this stuff quite as alien, though it's just as painful to deal with. I recall using some sort of ETL program to get around this. On the plus side, these primitive formats certainly were efficient in terms of processing speed, and a great match for COBOL.
Speaking of COBOL, as part of this project, I had to write a parser in C++ to parse COBOL copybooks (kind of a COBOL data structure definition) and generate C code to read the data.
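The generated code's job is essentially slicing at fixed offsets. A toy sketch, with a hand-written field table standing in for the parsed copybook (the field names and layout here are invented): PIC X(n) is character data, PIC 9(n) is zoned decimal, i.e. digits stored as text.

```python
# Hypothetical field table a copybook parser might emit:
# (name, offset, length, picture-class)
FIELDS = [
    ("cust-name", 0, 10, "X"),
    ("balance",   10, 8, "9"),
]

def read_record(raw):
    """Slice one fixed-length record into named fields per the table."""
    out = {}
    for name, off, length, pic in FIELDS:
        field = raw[off:off + length]
        out[name] = int(field) if pic == "9" else field.decode().rstrip()
    return out

print(read_record(b"SMITH     00001234"))
# {'cust-name': 'SMITH', 'balance': 1234}
```

The real job is harder, of course: COMP-3 fields, REDEFINES, OCCURS clauses, and EBCDIC all get in the way, which is why generating C from the copybook was worth the effort.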
It is a very different world, but I don't think it's all bad. After all, the technology has been working very well for a long time. Kudos to the COBOL Cowboys. I hope they charge a lot more than $100/hr!
> In this particular case, a file was preallocated in JCL to use N extents starting on a specific cylinder. Fixed size.
Sounds like a z/VSE system (formerly known as VSE/ESA, VSE/SP, DOS/VSE, DOS/VS, DOS/360). In DOS JCL (which is a different syntax to z/OS / OS/390 / MVS / OS/VS2 / OS/360 JCL), you manually allocate files to disk locations using the EXTENT statement. By contrast, in z/OS the operating system decides where on disk to locate your file (or dataset, to use mainframe terminology). (You don't have to manually allocate files any more in z/VSE – you can use VSAM, or store your files in libraries, and in both cases the OS decides on disk locations for you – but, originally, neither VSAM nor libraries existed, so you had to manually assign locations to all the files on disk.) It is very primitive, but remember it was designed in the 1960s to run on machines with only 16KB of memory. Plus, humans could design a disk layout to maximise performance, by placing frequently used files on faster areas of the disk. Nowadays, the OS can do a better job of locating files on disks than humans can, but this capability is kept for backward compatibility.
Thanks for this! I had a number of interactions over the years with the S/3x0 world, and I wasn't always sure what was under the hood. I was aware that there was a bewildering slew of xxAM access methods, but had no chance to look into them.
Unix has a few tricks up its sleeve from when it wasn't the top dog. I hesitate to ever recommend perl, but pack [1] and unpack are pretty sweet for this kind of stuff.
I only learned about them when a state sent me files in EBCDIC [2]. As with all things Perl, you can convert from that to ASCII as a one-liner. Or, rather, I helped someone much smarter than me do that, 20 years ago.
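The same trick is a near-one-liner in Python too, since the stdlib ships EBCDIC codecs. cp037 is US EBCDIC; the right code page depends on who sent you the file (cp500 and cp1140 are other common ones).

```python
# Round-trip text through an EBCDIC code page using stdlib codecs.
ebcdic = "hello, world".encode("cp037")
print(ebcdic.hex())            # nothing like the ASCII byte values
print(ebcdic.decode("cp037"))  # round-trips back to "hello, world"
```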
Please don't remind me. I had to convert the expat XML parser to compile on z/OS and work in EBCDIC, and found that round tripping between ASCII and EBCDIC was sometimes impossible because of the existence of not two, but THREE line terminator characters: CR, LF, and NL (0x85).
Not to mention that you cannot test for uppercase or lowercase with the ASCII idiom `ch >= 'A' && ch <= 'Z'`, because the letters are not contiguous in EBCDIC. A good reason to use the C RTL.
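You can see the gaps directly by round-tripping through Python's cp037 codec:

```python
# EBCDIC letters come in three runs with gaps between them:
# A-I at 0xC1-0xC9, J-R at 0xD1-0xD9, S-Z at 0xE2-0xE9 (cp037 shown).
codes = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".encode("cp037")
print([hex(b) for b in codes])
# The naive range test spans 41 byte values, but only 26 are letters,
# so the A-to-Z range check admits 15 non-letter characters.
print(codes[-1] - codes[0] + 1)  # 41
```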
> Not to mention that you cannot test for uppercase or lowercase like the ASCII `ch >= 'A' && ch <= 'Z'` because they are not contiguous in EBCDIC. A good reason to use the C RTL.
Watch your sorting methods, too. I had a guy over here once totally confused why running his SAS job on the mainframe yielded a different result than the same code running on PC SAS against the same data.
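The collation difference is easy to demonstrate: in ASCII, digits sort before uppercase, which sorts before lowercase; in EBCDIC it's the reverse. Comparing the raw bytes under each encoding:

```python
data = ["a1", "A1", "1a"]
ascii_order = sorted(data)  # str comparison follows ASCII byte order here
ebcdic_order = sorted(data, key=lambda s: s.encode("cp037"))
print(ascii_order)   # ['1a', 'A1', 'a1'] -- digits first in ASCII
print(ebcdic_order)  # ['a1', 'A1', '1a'] -- lowercase first, digits last
```

Same data, same "sort ascending", two different answers, which is exactly the confusion the SAS job hit.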
I did the same to enable XML messages to flow over MQ between an RS/6000-based front office FX options system and a back office S/390 system. IIRC there were six (!!) different EBCDIC code pages that could be in play. I had a code generator that would crank out C or Java bindings that could marshal between the expat results and the COBOL data structure. 20 years ago now!
Not sure if you remember but sendmail used to require a certain amount of m4 knowledge and hackery. Emboldened by that and reading the dragon book I was very impressed with myself when I wrote a COBOL parser in a mix of C, lex and YACC that automatically generated the needed 'C' structs and Sybase database layout to load data fed from a System/36. I made the data supplier put his code in the first part of the magtape, read and parsed that and then read the rest of the tape.
These days I consider it more of a "what was I thinking" facepalm-worthy sort of thing but at the time I was very proud of it. The "what was I thinking" part is more about the fact that some poor bastard had to come along after me and support that mess.
Likewise: I regard code generation as a red flag these days. The version skew issues when code generated off slightly different versions of messages are in play can be really nasty. CORBA suffered from that issue big time in the late 90s. And if your generated code uses mutexes in a misguided attempt to be "thread safe", all bets are off...
> I hesitate to ever recommend perl ... Or, rather, I helped someone much smarter than me do that, 20 years ago.
No need to hesitate. It's not half bad for a dynamic language if you keep a little discipline. 20 years ago the average Perl programmer was probably akin to the average PHP programmer from 10 years ago. That is, not very experienced, and with code that made that fairly obvious, even if it got the job done. With some of the more modern modules, you get something pretty swizzy[1]. :)
No kidding. Over 20 years ago, I used unpack to read minicomputer log files. It was vastly faster to FTP them over to a Sun Sparcstation and process them with Perl than to use the native log reader. It was also far more flexible.
I read expecting to see something really alien. But it doesn't strike me as alien so much ... just low-level. You have to know the format of the bytes that were written to disk, which is pretty rare these days outside of systems programming (and mainframes I guess).
You don't have to know, and many working on Mainframes don't. No-one outside the Mainframe should need to know, unless you get the stupid "oh, but they refuse to change the program" for a data transfer (read, "we signed off on it before we knew what we were doing"). There are 256 bit-patterns to a byte. It is not rocket surgery.
Wow. That answer took some time and a whole lot of knowledge to write up, so definitely deserved the "accepted" - but if that's just the surface of things I understand why it's like a different world.
Byte by byte? Where does that come into it? I don't think there's much Production work on a Commodore 64 these days. You don't have to "worry" about anything. I/O is asynchronous, but your High-level Language doesn't know about that, it will appear synchronous. Meanwhile tons of other stuff is going on, and I can't really think of any of it that would be "byte by byte".
I frequently say that IBM sells WebSphere to ensure that they continue to sell mainframes. Because everyone knows that if you're going to replace your mainframe, you need to rewrite it in Java, and what better framework than one provided by IBM! Then, when the whole project ends up taking 100x the hardware, IBM can pretend there is some secret sauce in those mainframes.
But, to the linked answer. While the details differ a bit, it is simply a low-level description of how the machine worked (past tense, because modern zSeries mainframes have a lot of hidden "virtualization" in order to leverage industry standards). That is why "mainframes" outperform racks of x86 PCs. There really isn't anything magical about the hardware. The real magic is the fact that the software is written by guys who grew up understanding how to process transactions in a couple of KB of memory, and the machines grew as the transaction load did. The result is code which understands the hardware and is crazy-optimized in the critical paths. The fact that frequently the critical paths all fit in a fraction of a modern L1 cache doesn't hurt either.
More specifically to your link. While the details vary a bit, modern PC hardware doesn't conceptually differ that much from mainframes. You could just as well ask the same question of a modern PC... Sure, you can open a file and treat it as a stream of records, but unless you make sure to size your records on a multiple of 4k (it was 512 until recently, although RAID controllers complicate things, as does flash), you will have read/modify/write cycles rather than simple write cycles. Plus, depending on your access method, the kernel may get involved and bounce-buffer everything rather than DMA'ing directly to/from the page storing the data. Yah sure, the track/sector metadata on a modern hard drive varies a bit from the answer given, but you might find a modern SSD that compresses its read/write operations doing things far closer to what was described.

So, given a machine with a couple of KB of memory, you need something that today we would consider the front end (HTML/JavaScript), back-end business logic, database, OS, drivers, and disk firmware, all in a single piece of code. What would it look like? Yah, all those layers would collapse, and your database records would look a lot like the disk sectors...
For something even closer, you could consider the options to tar, and modern tape drives which continue to actually support the concept of fixed vs variable block reads/writes and blocking factors, and with recent encryption standards even allow what is effectively per block metadata.
My point is that while a lot of the terminology and things exposed on a daily basis with a mainframe seem strange to someone at first glance, a modern server has just as many (if not far more) strange behaviors buried in it. The difference frequently is the layers of standardized interfaces, protocols, and software stacks layered below what most people consider their "software stack".
And this is the comment that hit the nail on the head
Of course Java is going to be slow when the "Enterprise Architect" and his minions push for hundreds of classes that barely do one thing right and run several inheritance levels deep, while the "mainframe" people are shuffling data using something that's simpler than CSV.
Also, the virtualization magic is good enough that the COBOL people can keep playing with their 70's technology while ignoring modern-world problems.
You mean taking a program which hasn't been recompiled since 1970 and running it (the 1970 executable) on hardware released in 2014 under an OS from 2015? Of course, it will work. There's no virtualization there, just reality. What 70's technology in particular are you thinking of? The latest Enterprise COBOL compiler is just over a year old.
Well, I said it originally, and I was referring to the fact that a lot of the zSeries hardware is actually just software emulation running somewhere that makes a piece of regular hardware look like a zSeries peripheral. Take the disk subsystem, for example: the disks are just boring old SCSI disks fronted by a Linux/PPC controller which makes them look like FICON CKD/DASD disks. Sure, sometimes IBM has a little "secret sauce" in place to ease that transition, but it's not at the level of actually writing native 3390 tracks on modern multi-TB disk drives.
I politely accepted it as the right answer but boy oh boy does it feel like it's from an alien civilization.
http://stackoverflow.com/questions/28640159/what-is-the-diff...