Hacker News

Just from reading the script in TFA: it attempts to find secrets.pyc and decompile it, but doesn't even check whether secrets.py is also in the repo. A glance at search results (I just used GitHub's web interface; I didn't bother to run the code) tells me that when secrets.pyc is committed, secrets.py comes with it at least the vast majority of the time.

I guess the author did find cases where secrets.pyc is committed but secrets.py is not? It's hard to fathom how that could have happened (especially in "organization" settings). It sounds like the result of absolute rookies in both Python and git following a tutorial that has a step "add secrets.py to .gitignore" but takes ignoring __pycache__ and *.pyc for granted, which is unfortunately too much to ask of some people.

> it is very easy for an experienced programmer to accidentally commit their secrets

No, it doesn't take an experienced programmer to put __pycache__ and *.pyc in a global ignore file, use a gitignore boilerplate at project creation, or notice random unwanted files during code review.



Seems to me quite easy to forget to ignore __pycache__ and *.pyc, have the secrets pushed to the repo, then never get to remove them from history.


Fix your process, then. Use a global ignore file. Add a language-specific gitignore boilerplate as soon as you create a new project. Scan for files that don't belong during code review (do I even need to suggest this?).
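The review-time scan can also be automated. A minimal sketch, assuming you feed it the list of staged paths (for example collected from `git diff --cached --name-only`); the function name and the pattern list are hypothetical, not from any existing tool:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical rules: compiled bytecode files and anything under a __pycache__ directory.
UNWANTED_NAME_PATTERNS = ("*.pyc", "*.pyo")
UNWANTED_DIRS = {"__pycache__"}


def flag_unwanted(paths):
    """Return the paths a reviewer would most likely want to drop from a commit."""
    flagged = []
    for path in paths:
        p = PurePosixPath(path)
        name_matches = any(fnmatch(p.name, pattern) for pattern in UNWANTED_NAME_PATTERNS)
        in_unwanted_dir = bool(UNWANTED_DIRS & set(p.parts))
        if name_matches or in_unwanted_dir:
            flagged.append(path)
    return flagged


# Hypothetical staged paths, e.g. from `git diff --cached --name-only`.
staged = ["app.py", "secrets.py", "__pycache__/secrets.cpython-38.pyc", "README.md"]
print(flag_unwanted(staged))  # ['__pycache__/secrets.cpython-38.pyc']
```

Wired into a pre-commit hook, a non-empty result would be the cue to stop and look before the bytecode lands in history.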

> never get to remove them from history.

Scrubbing specific files from git history isn't hard.

s/__pycache__ and *.pyc/secrets.py/g and people will commit that too. PEBCAK.


Of course there are trivial solutions to this issue.

Nonetheless, this is a common mistake, whether you believe it or not. And if it is common, then it will be exploited.


The premise of my original post is that ignoring secrets.py but not secrets.pyc is probably not very common. TFA claims "thousands of GitHub repositories contain secrets hidden inside their bytecode", which is probably true, but at least the vast majority of those have secrets.py in plain sight as well, no decompiling necessary; and TFA doesn't actually demonstrate any effort to filter those out.
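Filtering out repositories where the plaintext secrets.py sits right next to the .pyc is cheap. A minimal sketch, assuming you already have a repository's tracked paths (for example from `git ls-files`); the helper names are hypothetical and this is not the script from TFA:

```python
from pathlib import PurePosixPath


def expected_source(pyc_path):
    """Map a tracked .pyc path to the .py file it was most likely compiled from."""
    p = PurePosixPath(pyc_path)
    if p.parent.name == "__pycache__":
        # Python 3 layout (PEP 3147): pkg/__pycache__/mod.cpython-38.pyc -> pkg/mod.py
        module = p.name.split(".", 1)[0]
        return str(p.parent.parent / f"{module}.py")
    # Python 2 layout: pkg/mod.pyc sits next to pkg/mod.py
    return str(p.with_suffix(".py"))


def orphaned_pyc(tracked_paths):
    """Return .pyc paths whose source is NOT tracked, i.e. the only cases worth decompiling."""
    tracked = set(tracked_paths)
    return [
        path
        for path in tracked_paths
        if path.endswith(".pyc") and expected_source(path) not in tracked
    ]


# Hypothetical repository listing, e.g. from `git ls-files`.
tracked = [
    "secrets.py",
    "__pycache__/secrets.cpython-38.pyc",
    "config/__pycache__/keys.cpython-38.pyc",
]
print(orphaned_pyc(tracked))  # ['config/__pycache__/keys.cpython-38.pyc']
```

Only the paths this returns would need a decompiler at all; everything else already exposes its source in plain sight.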


I think I am an experienced developer (not Python) and this would never cross my mind.


It would never cross your mind not to commit .pyc files to source control? They're not even source. Committing .pyc files is to Python what committing .o files is to C.


> They're not even source.

To be clear, they’re not even text. You don’t need to know Python at all to realize something’s not right when you’re committing unknown binaries to source control.
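Both points are easy to check from Python itself: a .pyc is a build artifact the interpreter writes into its cache directory, and it opens with a version-specific binary header. A minimal sketch using the standard-library py_compile and importlib.util modules; the module name example_mod is made up for illustration:

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path

# A throwaway module in a temporary directory; the name example_mod is illustrative.
tmp = Path(tempfile.mkdtemp())
source = tmp / "example_mod.py"
source.write_text("import os\nSECRET = os.getenv('SECRET')\n")

# py_compile writes the .pyc where the import system caches it,
# normally <dir>/__pycache__/example_mod.<cache tag>.pyc (e.g. cpython-38).
pyc_path = Path(py_compile.compile(str(source), doraise=True))
print(pyc_path)
print(pyc_path == Path(importlib.util.cache_from_source(str(source))))  # True

# The file is binary: on Python 3.7+ it starts with a 16-byte header whose first
# 4 bytes are the interpreter's magic number, followed by the marshalled code object.
header = pyc_path.read_bytes()[:16]
print(header[:4] == importlib.util.MAGIC_NUMBER)  # True
```

Because the magic number changes between Python versions, a committed .pyc is tied to whichever interpreter happened to produce it, much like a .o file is tied to its compiler and target.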


When you review the list of changes (including added files), you notice the files that shouldn't be there, so you see them before committing. And even if you forget, git also lists the new files when you run "git commit".


I actually insist at work that most repositories don’t have a .gitignore; just set up a reasonable global one and you’re done. OSS repos and repos with a large number of contributors are generally the exceptions.


That sounds like poor advice with unclear rationale behind it. There are definitely project specific ignores that you’d want to set up, and if you’re working with projects with multiple different languages, “one gitignore to rule them all” fast becomes a mess.


Sorry, to clarify: if there's good reason to have project-specific ignores, that's fine. But most projects have similar or overlapping ignores.


Sure, but reasonable for me is different from reasonable for you. In addition, I might miss files that are repo-specific. Insisting that repos don't have a .gitignore is terrible advice; it costs little, if anything, to maintain one.


I think you are misunderstanding. The secret does not need to be hardcoded in the Python file. If it's read in from an environment variable or some other external source, it will also be in the pyc.


Of course not; that would mean env vars are hard-coded into bytecode at compile time, which would be completely crazy. A pyc file is just the compiled series of opcodes that the interpreter can dispatch directly, so that it doesn't have to parse the source file every single time.
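The same thing can be checked without a decompiler: compile the module and look at what the resulting code object holds. A minimal sketch using only the built-in compile() and the standard dis module, with the same source as the secrets.py example in this comment; the value hunter2 is a placeholder:

```python
import dis
import os

# Simulate a real secret being present in the environment at compile time.
os.environ["SECRET"] = "hunter2"

# Same source as the secrets.py example in this comment.
source = "import os\nSECRET = os.getenv('SECRET')\n"
code = compile(source, "secrets.py", "exec")

dis.dis(code)          # the opcodes the interpreter will dispatch
print(code.co_names)   # names looked up at run time: os, getenv, SECRET
print(code.co_consts)  # literal constants stored alongside the bytecode

# The environment variable's value is not stored anywhere in the code object.
print("hunter2" in code.co_consts)  # False
```

The only string constant is the variable *name* 'SECRET'; its value is fetched when the module runs, not when it is compiled.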

It's very easy to verify:

secrets.py:

  import os
  SECRET = os.getenv('SECRET')
Then

  $ python -m compileall secrets.py
  $ uncompyle6 __pycache__/secrets.cpython-38.pyc
  # uncompyle6 version 3.7.0
  # Python bytecode 3.8 (3413)
  # Decompiled from: Python 3.8.2 (default, Mar 10 2020, 12:58:02)
  # [Clang 11.0.0 (clang-1100.0.33.17)]
  # Embedded file name: secrets.py
  # Compiled at: ...
  # Size of source mod 2**32: 40 bytes
  import os
  SECRET = os.getenv('SECRET')
  # okay decompiling __pycache__/secrets.cpython-38.pyc


That’s totally incorrect. .pyc files just contain a representation of the _code_ and not any values that don’t exist in the code.

So a snippet like “os.environ['my_super_secret']” won’t contain anything other than the bytecode to fetch that environment variable.
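The same distinction explains why a .pyc can leak a secret at all: a string literal in the source is stored as a constant, while an environment lookup is stored only as the instructions to perform it. A minimal sketch comparing the two, relying on the fact that (on Python 3.7+) a .pyc is a 16-byte header followed by the marshal-serialized code object; the variable names and secret values are made up:

```python
import marshal
import os

os.environ["MY_SUPER_SECRET"] = "s3cr3t-from-env"  # hypothetical value

# Two versions of the same module: one hardcodes the secret, one reads it at run time.
hardcoded_src = "API_KEY = 's3cr3t-hardcoded'\n"
env_src = "import os\nAPI_KEY = os.environ['MY_SUPER_SECRET']\n"

# marshal.dumps gives the bytes that follow the header in the corresponding .pyc.
hardcoded_blob = marshal.dumps(compile(hardcoded_src, "secrets.py", "exec"))
env_blob = marshal.dumps(compile(env_src, "secrets.py", "exec"))

print(b"s3cr3t-hardcoded" in hardcoded_blob)  # True: the literal is stored as a constant
print(b"s3cr3t-from-env" in env_blob)         # False: only the lookup is stored
print(b"MY_SUPER_SECRET" in env_blob)         # True: the variable name is a constant
```

In other words, committing the .pyc of a module that hardcodes its secrets is as bad as committing the source, while a module that reads them from the environment only reveals the variable names.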



