Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Felienne Hermans has done some very interesting work examining how to refactor spreadsheets. (http://www.felienne.com/publications)

Her PhD dissertation on the subject "Analyzing and Visualizing Spreadsheets" (http://www.felienne.com/archives/2534) is particularly detailed.

Much though some people might think that spreadsheets are not for serious work, they're being used for that right now, and that is unlikely to change. Better to figure out how to improve the tooling, rather than undertake the quixotic quest to get business people to abandon their perfectly viable programming tool.

There is an excellent video with her on InfoQ, explaining her spreadsheet formula refactoring tool Bumblebee (and the F# code behind it): http://www.infoq.com/presentations/spreadsheet-refactoring



What's interesting to me is that software spreadsheets are more or less just evolved version of what apparently VisiCalc got right on the first try. Sure software spreadsheets are basically just parroting the physical ledgers that came before them, but turning them into software almost immediately opened up a kind of dynamic that didn't exist before. The latest Excel is basically just an organically evolved VisiCalc. There really hasn't been a huge paradigm shift since then.

It's not clear that there should be either, spreadsheets work really well. Efforts to move them into databases or whatever be damned. A spreadsheet is != to a database table except maybe in the most abstract possible sense in some cases. They really are different things and I agree that issues with spreadsheets should be solved by improving the tools not trying to paradigm shift spreadsheet users into an inappropriate environment that also happens to have a very long and highly skilled environment setup requirement.


I'm not a computer scientist, but I think that spreadsheets were possibly ahead of their time in a couple ways:

1. They let novices build fairly effective user interfaces, just by entering headers for rows and columns.

2. If I understand what "data flow" programming is (based on using LabVIEW), then the formula cells in spreadsheets are such a thing. Each formula is recomputed whenever its inputs change. That creates an incredibly quick testing cycles. Displaying intermediate results by necessity creates a built in debugger.

In addition, the lack of "complete" programming features, such as loops, may make spreadsheets less forbidding for novices to create and debug.

If these things are, as I suspect, compelling enough on their own, then like you say, the rest of the stuff in the latest Excel would just be icing on the cake -- not that I'd give up any of it.


They have loads of theoretical advantages :

1) spreadsheets are a purely functional programming language. When you point this out to management users of it (after explaining what it is), they wouldn't want to live without it.

So in reality the most widely used programming language in the world, is a purely functional one.

2) they are NOT turing complete (assuming you stay away from VBScript)

Of course, they're worse than BASIC when it comes to naming things, which is what everyone here is complaining about. But these are major advantages, to be fair. The only thing that comes even vaguely close to how spreadsheets work are the IPython notebooks.


Not to be too pedantic, but...

Felienne Hermans actually implemented a Turing machine in Excel, without using scripting: http://www.felienne.com/archives/2974

There was an HN post about this in September 2013: https://news.ycombinator.com/item?id=6416631


It looks like it uses one row per step. Excel supports finite rows, and hence this does not prove that Excel is Turing complete. A program like while(true); will eventually run out of rows and terminate.


Re. naming things, In the original BASIC, variable names were also quite limited. A-Z, A[0-9]-Z[0-9], which is actually an even more limited space than most spreadsheets which allow for something like [A-Z]{1,4},[0-9]{1,4} at least.

You could also argue that spreadsheets are kind of homoiconic, since code and data occupy the same structure. But that's a bit of a stretch.


Technically VBA != VBScript, although they are similar.


<>


> Efforts to move them into databases or whatever be damned. A spreadsheet is != to a database table except maybe in the most abstract possible sense in some cases.

The problem is that very often, spreadsheets are used as databases because they are the tool the user is familiar with, not because the application is inherently more suited to a spreadsheet.

> They really are different things and I agree that issues with spreadsheets should be solved by improving the tools not trying to paradigm shift spreadsheet users into an inappropriate environment that also happens to have a very long and highly skilled environment setup requirement.

There is no reason a database using a relational data model (but not the multiuser/concurrency features that are also part of the relational model) needs to have a "long and highly skilled environment setup requirement", or, in fact, be any harder to setup than installing an app just like a spreadsheet app would require.


>There is no reason a database using a relational data model (but not the multiuser/concurrency features that are also part of the relational model) needs to have a "long and highly skilled environment setup requirement", or, in fact, be any harder to setup than installing an app just like a spreadsheet app would require.

This is a very salient point to me. A lot of people take the route of talking about why spreadsheets are so bad. In most programming language discussions, I see the more sane folks arguing that most languages aren't bad, but some have terrible idioms, or reams of ancient legacy code from before best practice was a best practice, or maybe they make it too easy to do the wrong thing, and inconvenient to do the right thing (I'm thinking of a lot of stuff Rich Hickey has said).

Instead, I like to think about how the spreadsheet environment or tooling could be altered to make it more natural to do the "right" thing, or to prevent certain classes of errors (maybe in similar ways to how type systems can). I'm not proposing a specific solution, but more a mindset for what I think is interesting (and probably not a terribly original mindset at that).


The excel database functions (DAVERAGE, DCOUNT, DGET, DMAX, DMIN, DPRODUCT, DSTDEV, DSUM, DVAR) give you about 90% of what you want in a database with a query "language" that's simpler than SQL.


> The excel database functions (DAVERAGE, DCOUNT, DGET, DMAX, DMIN, DPRODUCT, DSTDEV, DSUM, DVAR) give you about 90% of what you want in a database with a query "language" that's simpler than SQL.

As someone who has worked rather extensively with both, no, they give a lot less than 90%, and for even fairly simple uses they are often more complex to use than SQL.


For you and the way you think. Don't forget the non-programmer mind.


I've seen non-programmers with years of Excel experience and including several formal classes struggle to use the database functions to do things that non-programmers with similar levels of general technical proficiency breeze through in SQL after a single couple of days intro to SQL class. It's not a scientific study, but in my experience what I said is particularly true of the "non-programmer" mind.


Good to see you're alive. You had me worried there. Message me sometime.


Pivot Tables are even easier to construct and use than these functions. They calculate things like averages, counts and sums for lots of dimensions really quickly.

If you don't like the Pivot Table layout, you can use easily construct formulae to pull the relevant values from a Pivot Table into whatever format you want. You still get the speed of calculation/refresh and, if you label your fields well, have formulae which you can copy-paste across a large area, making the sheet easy to inspect and reason about.


That's a very good point and a good reason why anyone who wants to "fix" this problem should build on spreadsheets.


> Much though some people might think that spreadsheets are not for serious work, they're being used for that right now, and that is unlikely to change. Better to figure out how to improve the tooling, rather than undertake the quixotic quest to get business people to abandon their perfectly viable programming tool.

Its largely used by business people because they (1) have inadequate IT support to have things developed using proper tooling, and (2) have been forbidden from using better tooling themselves by IT. Since both of these are direct products of IT policy, I don't think its a quixotic quest to fix it -- however, trying to fix it by trying to convince them to just give it up is the wrong approach. The people who need to be convinced to change the situation are CIOs.


A lot of people who have access to Excel also have access to Access - and a lot of what they do would be better served in database than a spreadsheet. Yet they don't use one - there must be a reason for that, and I don't think it's IT restrictions.


This little tidbit from Chris Granger seems relevant:

"Excel is inherently observable since it doesn't have any hidden state and all values are there for you to see and manipulate. It's also direct. You change values in the grid, drag drop things, do calculations on selections, and so on. And it manages to sidestep a lot of incidental complexity; spreadsheets are timeless, without setup, and don't even have a notion of being run."

http://www.chris-granger.com/2014/03/27/toward-a-better-prog...


Saying Excel doesn't have any hidden state is stretching things a bit--your formulas don't show up until you are directly on them, so mistakes (formula results that got turned into static data, for example) are hidden until you put your cursor directly onto them.


"formula results that got turned into static data"

This hints at one of the sources of problems with Excel. One good approach to using Excel is to always be in one of two modes: changing the structure (equivalent to coding) and entering values (equivalent to using software). Of course, at the early stages of designing something you will be entering lots of dummy values (e.g. ARPU) to check that your formulae work.

However, many people don't distinguish between collections (rows, columns or other contiguous areas) of cells which _should_ contain values, and those which should contain formulae. I, like many, prefer to indicate input cells with a yellow or orange background, so that I know everything else is a formula.

You can do even better:

- If you're done designing a spreadsheet, and expect to use it for a production process, then mark the input cells as unlocked, and protect all the sheets. Then the end user won't be able to mess up the formulae.

- Always write formulae in a way that they can be copied across or down whilst pointing to the right places. This can be achieved through use of one or more $ signs to fix a reference. I've seen a fair number of spreadsheets where there are a large number of similar formulae, but they have been entered/adjusted manually. This is fine only if you never make mistakes, and if no one else needs to change the formulae later or verify they are correct.

My point is that "formula results that got turned into static data" can be avoided with just a few easy rules. However, most people aren't taught these rules :(


> Always write formulae in a way that they can be copied across or down whilst pointing to the right places. This can be achieved through use of one or more $ signs to fix a reference.

Except if you are using modern version of Excel, don't general use $ references for this, go one step further and use names (for fixed individual datapoints where you would use two $s) or named tables with named columns (for the most common use of single $ references.) This is more self-explanatory and less error-prone.


I never use it, but you can toggle between showing formulas and showing values with CTRL + ` (grave accent) (http://office.microsoft.com/en-001/excel-help/display-or-hid...)


> Excel is inherently observable since it doesn't have any hidden state

Except that it does -- even in terms of simple numerical values what you see is not necessarily the actual value, but instead the result of passing the value through a format string that can vary by cell.


Sadly, most people don't use this feature, and are content to show values using the default format, even if it results in values displayed as 12904819.23 (which might be better displayed as 12.9m or 12.90m).


> don't even have a notion of being run

They actually do have such a notion - in Excel you can set the calculations to Manual and then you hit F5 to refresh everything in one go. Useful when you deal with current date and time.


It's actually F9. F5 opens the Go To dialog box.


This Rory Blyth cartoon has been ungoogleable for at least ten years now, but I still check every so often. Now here it is!

http://wyorock.com/excelasadatabase.htm

What I was going to say before unexpectedly finding that, is that the gap between Excel and Access is huge. I've made a living off of that gap at various points, and at some point I stopped thinking they were "doing it wrong" and started thinking, "whatever makes you happy."


> ...at some point I stopped thinking they were "doing it wrong" and started thinking, "whatever makes you happy."

The secret to success in contract software development.


SQL is scary.

I've talked to hundreds of customer prospects for Treasure Data, and by far the biggest surprise I encountered is how few people know their way around SQL. I am not talking about just "business" people but also programmers.

Contrast this dearth of SQL-proficient population with the massive, massive Excel user base: I jokingly tell my friends that Excel is by far the most popular programming language =p


Much of the ORM market and NoSQL were precisely because programmers don't understand SQL.


True that.

I use SQL extensively, and never run into all these problems other people run into.

I also don't use ORM. there's so much you can do well and fast with SQL.

(I've also seen dreadful abominations in sql, like stored procedures that dynamically generate code for pivot tables... wait, that was me who did that)


That, and typing "select x from table A, table B where tableA.foreign_key = tableB.index and tableA.index=5" is a little long-winded and more prone to error, when you can just do x = ModelA.objects.get(pk=5).ModelB


SELECT B.x FROM tableA A JOIN tableB B on A.foreign_key = B.index WHERE A.index = 5

That's very simple, if you know SQL.

I think


Access actually has (or had, it's been a long time) a very well put together visual query designer. As long as you weren't doing anything crazy you'd never need to touch SQL.


Not true. As soon as you go just a little step further, your query cannot be represented by the GUI stuff anymore.


Someone has to explain how a database works before you can use Access properly. Explain the relational model. Otherwise it just offers tables to the user that look a bit like an excel sheet but with less functionality. In Excel they can form some sort of data model a lot more easily.


I find points 1 and 2 difficult to believe as the primary reason people rely on Excel. I think the primary reason is as seanstickle suggested: Excel is the only programming environment many "business people" know and are comfortable in. I personally know people who use Excel for data analysis, and it's because they don't know a general purpose programming language.

edit: Based on your other comments downthread, I realized that you meant the IT department themselves should develop the application using "proper tooling", not the business people. That means that the business people will have to hand over requirements to the IT people, who will write the software. I think turn-around time may be too much for business people who want to do their own data exploration.


> Based on your other comments downthread, I realized that you meant the IT department themselves should develop the application using "proper tooling", not the business people.

Not necessarily, though that really depends on the technical competencies in the organization and a number of other factors.

> That means that the business people will have to hand over requirements to the IT people, who will write the software.

Well, I kind of think that if you are going to have an organization that effectively uses technology, you need technology-aware process/system analysis embedded in operational organizations at a fairly low level facilitating process optimization including, as necessary, technology acquisition and software development. That's a pretty big culture change for most organizations that aren't technology organizations, I will admit.

> I think turn-around time may be too much for business people who want to do their own data exploration.

I don't think the problem area in Excel use is "data exploration", but, yes, I think that overly bureaucratic, insufficiently agile processes which separate software development from operations do create friction which leads to use of suboptimal tools -- including, but not limited to, use of Excel where its not appropriate.


I disagree strongly with (1).

If you just need to sum a list of numbers, or quickly calculate payroll or something, Excel does a fine job. For certain types of engineering calculations (the horror), Excel does a fine job.

The overhead of having a full developer implement the tool is absurd, because many of these problems can and should be solved in less than 10 minutes of spreadsheet jockeying.


> If you just need to sum a list of numbers, or quickly calculate payroll or something, Excel does a fine job.

Sure, its great for lots of one-time quick calculations. I was really referring to its use in on-going operations with changing requirements where maintenance is necessary.

The problem comes in when something that is suitable as a one-off tool -- and perhaps, as such, might make a decent prototype for a proper ongoing, maintainable tool -- instead gets pressed into service as a production tool and becomes a sinkhole of technical debt.


Don't forget scientists. I am an engineer at a biotech company and they use Excel for everything. You may be amazed at what they can do with it (and horrorstruck by the systems/workflows they implement with it).


(3) It's ubiquitous, because virtually every serious company with financials uses Office, so (4) employees are expected to know how to use them because their senior members have used them for decades. I know plenty of financial analysts with different companies in different positions and every single one of them spends time every day nose-deep in Excel.


My mother was a senior accountant at one of the Big Four accounting firms, and spent so much time Excel that she once built a Gantt chart in it.


I was watching a show the other day where they discussed their mapping of Benjamin Franklin's social network (who he communicated with, where they were in the world, etc). They showed some of the process and it appeared to involve some Excel workbooks with hundreds of sheets and terribly normalized data.

I can only imagine the amount of work involved in trying to extract any sort of useful information from the raw data in that format. I run into stuff like this all the time.

All I could think was that given a day and a Postgres+PostGIS server someone could have probably saved them entire man-months of their time...


If I had millions I would pay people like you to teach people like them how to "do it properly". Or even just implement it properly with lots of comments and docs and maybe a talky video.


I developed a set of utilities for Excel (http://www.breezetree.com/spreadspeed/), and I started adding some auditing features to it. After spending a few weeks on the auditing tools, I decided that I should do a little market research before investing more time on auditing features. Well, I'm glad I did because I found out that the market is saturated with spreadsheet auditing tools. Most are simple and inexpensive (like mine), but there are some fairly sophisticated tools out there. So my takeaway from this is not that there aren't sufficient auditing tools, but that the market needs a non-programmer, user-friendly way to build robust yet malleable models.


> Much though some people might think that spreadsheets are not for serious work, they're being used for that right now, and that is unlikely to change.

Indeed. Here's a great quote from 2005: "databases are rocks, spreadsheets are water": http://www.propylon.com/news/ctoarticles/051115_master_foo.h...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: