Forum Archive

cloud.File (with AES encryption)

[deleted]

See Samples (pickle, script, JSON, text, images) for interesting things you can do with cloud.File

and see other CloudProviders for alternative cloud storage options

I have the idea to write a cloud.File for the cloud module. It will be a generic file-type object that implements cloud storage. As such it could be used by anyone, anywhere a regular file could be.

The 'open' will take an additional optional parameter of 'encryptionKey' that, if used, will cause the cloud storage for the file to be encrypted.

It will have one additional method 'commit' that will return the 'cloudURL' of the associated storage, which can be used in a subsequent 'read' to retrieve the data.

(The storage works without passwords, so there should be no opportunity for password discovery vulnerabilities. It is created on a per file basis and therefore not liable to mass deletes.)

Currently I have a prototype working with both text and binary files. My next priority is to implement the encryption.

[deleted]

In the prototype I'm using the Vigenere Cipher ... it can be broken but:
- first it would be necessary to know the non-public URL, itself cryptic
- then, it would be necessary to know that the encrypted data related to something (there is nothing to identify it)
- then it would be necessary to know the key, or be prepared to make the effort to discover the cypher type and break it
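
For readers who have not met the cipher, here is a minimal sketch of how a Vigenere cipher works, written for Python 3. It is not the prototype's code (which is not shown in this thread); the function name `vigenere` and the letters-only key are assumptions made for the illustration.

```python
import string

ALPHABET = string.ascii_uppercase


def vigenere(text, key, decrypt=False):
    """Shift each letter of text by the matching letter of the repeating key.

    Assumes the key contains ASCII letters only.
    """
    out = []
    k = 0  # index into the key, advanced only when a letter is processed
    for ch in text:
        if ch in string.ascii_letters:
            shift = ALPHABET.index(key[k % len(key)].upper())
            if decrypt:
                shift = -shift
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
            k += 1
        else:
            out.append(ch)  # spaces, digits and punctuation pass through unchanged
    return ''.join(out)


secret = vigenere('ATTACKATDAWN', 'LEMON')
print(secret)                                   # LXFOPVEFRNHR
print(vigenere(secret, 'LEMON', decrypt=True))  # ATTACKATDAWN
```

Because the key repeats, every len(key)-th letter is shifted by the same amount, which is what the usual attacks exploit.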

Webmaster4o

@guerito I would take a look at PyCrypto. It's included in Pythonista.

```python
>>> import Crypto
>>> help(Crypto)
Help on package Crypto:

NAME
Crypto - Python Cryptography Toolkit

FILE
/var/containers/Bundle/Application/CEC4AF18-FDAC-49F8-8561-8A2CC3A36A5E/Pythonista.app/Frameworks/PythonistaKit.framework/pylib/site-packages/Crypto/__init__.py

DESCRIPTION
A collection of cryptographic modules implementing various algorithms
and protocols.

Subpackages:

Crypto.Cipher
 Secret-key (AES, DES, ARC4) and public-key encryption (RSA PKCS#1) algorithms
Crypto.Hash
 Hashing algorithms (MD5, SHA, HMAC)
Crypto.Protocol
 Cryptographic protocols (Chaffing, all-or-nothing transform, key derivation
 functions). This package does not contain any network protocols.
Crypto.PublicKey
 Public-key encryption and signature algorithms (RSA, DSA)
Crypto.Signature
 Public-key signature algorithms (RSA PKCS#1) 
Crypto.Util
 Various useful modules and functions (long-to-string conversion, random number
 generation, number theoretic functions)

PACKAGE CONTENTS
Cipher (package)
Hash (package)
Protocol (package)
PublicKey (package)
Random (package)
SelfTest (package)
Signature (package)
Util (package)
pct_warnings

DATA
__all__ = ['Cipher', 'Hash', 'Protocol', 'PublicKey', 'Util', 'Signatu...
__revision__ = '$Id$'
__version__ = '2.6'

VERSION
2.6
```

[deleted]

@Webmaster4o Thanks :) I will

[deleted]

@Webmaster4o From "design goals" ... "Some modules are implemented in C for performance"

Webmaster4o

@guerito It's included in Pythonista though. @omz put it in, C and all. Just import Crypto.

[deleted]

@Webmaster4o Ah, great. Thanks. :)

JonB

Are you using GitHub for storage? Gist? Google Drive?

[deleted]

@JonB Any or all of those (and some others)?

Provided they meet the design criteria (I think some may not), but it depends on people's ingenuity in implementing them and what if any security risks they are prepared to run.

To the list you could add what @Webmaster4o is developing stored on his server... that will be neat for sure.

In the group with Google Drive you could put Dropbox, Box, OneDrive, WebDAV etc.

Internally a 'CloudProvider' class, 2 methods 'putFileToURL', 'getFileFromURL'. A default implementation for people who want something simple and safe that "just works" out of the box. Sub-classes for the others, either community developed or "roll your own".
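
As a rough sketch of that shape (Python 3): the two method names come from the post, while `InMemoryProvider`, the `memory://` URL scheme and the dict store are hypothetical stand-ins so the example runs without any network access.

```python
import io
import uuid


class CloudProvider(object):
    """Interface described in the post: two methods, one subclass per backend."""

    def putFileToURL(self, f):
        raise NotImplementedError

    def getFileFromURL(self, sURL):
        raise NotImplementedError


class InMemoryProvider(CloudProvider):
    """Hypothetical backend that keeps data in a dict, for local testing only."""

    def __init__(self):
        self._store = {}

    def putFileToURL(self, f):
        # Use an unguessable name, mirroring the "cryptic URL" idea in the thread.
        url = 'memory://' + uuid.uuid4().hex
        self._store[url] = f.read()
        return url

    def getFileFromURL(self, sURL):
        # Return a file-like object so callers can simply call .read() on it.
        return io.BytesIO(self._store[sURL])


provider = InMemoryProvider()
url = provider.putFileToURL(io.BytesIO(b'hello cloud'))
print(provider.getFileFromURL(url).read())
```

A real subclass would replace the dict with calls to the chosen storage service and keep the same two signatures.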

[deleted]

I have encryption working now, same approach:

The encryption can be Vigenere Cypher, Crypto, or any other.

Internally an 'EncryptionProvider' class, 2 methods 'encodeFile', 'decodeFile'. A default implementation and sub-classes for the other types, either community developed or "roll your own".
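
A minimal sketch of that plug-in shape (Python 3). The method names `encodeFile` and `decodeFile` are from the post; the pass-through default and the `XorProvider` subclass are assumptions for illustration, and repeating-key XOR is trivially breakable, so it only shows where a real cipher would slot in.

```python
import io


class EncryptionProvider(object):
    """Sketch of a default provider: pass the data through unchanged."""

    def encodeFile(self, f, key):
        return f.read()

    def decodeFile(self, f, key):
        return f.read()


class XorProvider(EncryptionProvider):
    """Hypothetical subclass using repeating-key XOR (illustration only, not secure)."""

    def _xor(self, data, key):
        if not key:
            return data  # an empty key means "no encryption", as in the thread's code
        k = key.encode('utf-8')
        return bytes(b ^ k[i % len(k)] for i, b in enumerate(data))

    def encodeFile(self, f, key):
        return self._xor(f.read(), key)

    def decodeFile(self, f, key):
        return self._xor(f.read(), key)


provider = XorProvider()
encoded = provider.encodeFile(io.BytesIO(b'some text'), 'k3y')
print(provider.decodeFile(io.BytesIO(encoded), 'k3y'))
```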

[deleted]

By way of example... here is my test pack code:

See cloud.File - Samples topic

[deleted]

Here are the extra lines (updated to implement all the file methods and AES encryption) for cloud.py to implement the prototype cloud.File class (shared under the cloud module MIT licence).


# coding: utf-8

'''
cloud.py 

Vision: 

- cloud.Import: to make the entry curve to using code hosted on GitHub much easier
- cloud.File: generic file-type object that implements cloud storage

Credits: 

- cloud.Import: idea and first version by @guerito, future versions on @webmaster4o's GitHub
- cloud.File: idea and first version by @guerito, future versions on @webmaster4o's GitHub

'''

import io, os, urllib, json, base64
from Crypto.Cipher import AES

class File(io.BytesIO):
    """ cloud.File: generic file-type object that implements cloud storage """
    def __init__(self, name, mode = '', buffering = 0, encryptionKey = ''):
        self.name = name
        self.mode = mode
        self.encoding = None
        self.errors = None
        self.newlines = None
        self.softspace = 0
        self.__encryptionKey = encryptionKey
        self.mf = self.__mFile()
        self.__iPos = 0
        self.__sBuffer = None

    def commit(self):
        mf = self.__mFile()
        if self.mode == 'w':
            mf.write(self.__EncryptionProvider().encodeFile(self.mf, self.__encryptionKey))
        elif self.mode == 'wb':
            b64 = self.__mFile()
            base64.encode(self.mf, b64)
            mf.write(self.__EncryptionProvider().encodeFile(b64, self.__encryptionKey))
        url = self.__CloudProvider().putFileToURL(mf)
        return url

    def close(self):
        self.mf.close()
        super(File, self).close()

    def flush(self):
        pass

    #def fileno(self): should not be implemented for file-like objects

    def getvalue(self):
        if self.__sBuffer == None:
            self.__sBuffer = ''
            self.__sBuffer = self.getvalue()
            return self.__sBuffer
        elif len(self.__sBuffer) > 0:
            return self.__sBuffer
        mf = self.__mFile()
        mf.write(self.__EncryptionProvider().decodeFile(self.__CloudProvider().getFileFromURL(self.name), self.__encryptionKey))
        if self.mode == 'r':
            return mf.read()
        if self.mode == 'rb':
            b64 = self.__mFile()
            base64.decode(mf, b64)
            return b64.read()

    #def isatty(self): should not be implemented for file-like objects

    def next(self):
        self.getvalue()
        if self.__iPos < len(self.__sBuffer):
            return self.readline()
        else:
            raise StopIteration

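    def read(self, size = -1):
        if size < 0:
            # advance the position before returning, so a second read() returns ''
            s = self.getvalue()[self.__iPos:]
            self.__iPos = len(self.__sBuffer)
            return s
        else:
            s = self.getvalue()[self.__iPos:self.__iPos + size]
            # advance by what was actually read, so tell() never overshoots the end
            self.__iPos += len(s)
            return s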

    def read1(self):
        return self.read()

    def readline(self, size = 0):
        if size < 0:
            return self.read(77)
        else:
            return self.read((self.getvalue()[self.__iPos:] + '\n').find('\n') + 1 if size < 1 else size)

    def readlines(self, sizehint = 0):
        l = list()
        while self.__sBuffer == None or self.__iPos < len(self.__sBuffer):
            l.append(self.readline())
        return l

    def xreadlines(self):
        return self

    def seek(self, offset, whence = os.SEEK_SET):
        self.getvalue()
        if whence == os.SEEK_SET:
            self.__iPos = offset
        elif whence == os.SEEK_CUR:
            self.__iPos += offset
        elif whence == os.SEEK_END:
            self.__iPos = len(self.__sBuffer) + offset

    def tell(self):
        return self.__iPos

    def truncate(self, size = None):
        self.getvalue()
        if size == None: size = self.__iPos
        self.__sBuffer = self.__sBuffer[:size]

    def write(self, str):
        self.mf.write(str)
        self.__iPos += len(str)

    def writelines(self, sequence = None):
        for s in sequence:
            self.write(s)


    class __mFile(io.BytesIO):
        """ memory based file: BytesIO with read() and readline() methods """
        def __init__(self):
            io.BytesIO.__init__(self)
            self.__iPos = 0

        def read(self, size = -1):
            if size < 0:
                return self.getvalue()[self.__iPos:]
            else:
                s = self.getvalue()[self.__iPos:self.__iPos + size]
                self.__iPos += size
                return s 

        def readline(self, size = -1):
            if size < 0:
                return self.read(77)
            else:
                return self.read((self.getvalue()[self.__iPos:] + '\n').find('\n') + 1 if size < 1 else size)


    class __CloudProvider(object):
        """default implementation using Gist can be subclassed for: GitHub, @webmaster4o server, Googledrive, Dropbox, Box, OneDrive, WebDav, etc """
        def putFileToURL(self, f):
            return json.loads(urllib.urlopen('https://api.github.com/gists', json.dumps({ "description": "-", "public": False, "files": { '-': { "content": f.read()} } })).read())['files']['-']['raw_url']

        def getFileFromURL(self, sURL):
            return urllib.urlopen(sURL)

    class __EncryptionProvider(object):
        """default implementation using Crypto.Cipher AES (stateful), can be subclassed for any other"""
        def __init__(self):
            self.s16 = 'vB03eMZQ0lnsNVX6'

        def encodeFile(self, f, key):
            s = f.read()
            if key == '' : return s
            s = str(len(s)) + '-' + s
            s += self.s16[:(16 - (len(s) % 16))]
            aes = AES.new(key[:16] + self.s16[:16 - len(key[:16])], AES.MODE_CBC, self.s16)
            return base64.urlsafe_b64encode(aes.encrypt(s))

        def decodeFile(self, f, key):
            s = f.read()
            if key == '' : return s
            aes = AES.new(key[:16] + self.s16[:16 - len(key[:16])], AES.MODE_CBC, self.s16)
            s = aes.decrypt(base64.urlsafe_b64decode(s))
            i = s.find('-')
            return s[i+1:long(s[:i]) + i + 1]

[deleted]

@Webmaster4o If you're able to add this to the GitHub for cloud, then I will delete the code from above and point to GitHub.

JonB

I hope your intention re Vigenere was just a fun exercise, and not intended to actually be secure... it should probably be made clear that this is not secure and should not be relied upon!

[deleted]

@JonB

@guerito said:

In the prototype I'm using the Vigenere Cipher ... it can be broken but:
- first it would be necessary to know the non-public URL, itself cryptic
- then, it would be necessary to know that the encrypted data related to something (there is nothing to identify it)
- then it would be necessary to know the key, or be prepared to make the effort to discover the cypher type and break it

Webmaster4o

@guerito What do you have against making yourself a GitHub? Like, I get that you think these things should be community projects, but I think you need to put them somewhere where it's clear that you get credit for the idea as well as the original code.

[deleted]

@Webmaster4o Like we said... I think the value is in an idea itself, rather than who has it and kicks it off. I only have a limited amount of time for my hobby and that doesn't run to managing a GitHub account. I'm quite happy with posting my code on the forum; the pressure to put it on GitHub comes from GitHub users. So I'm trying to play nice for them.

Webmaster4o

@guerito Still, I think GitHub is a good thing to learn. Even if you only have limited time, the projects could be largely community-based, with your only role being accepting and merging others' pull requests. This wouldn't be time sensitive, you could do it when you have time. I don't object to hosting the projects, I just think that it's better if you do. If you really don't want to, that's fine and I understand.

[deleted]

@JonB , or anyone. Are you up for a challenge?

I've saved something with cloud.File using the default Vigenere Cypher.

I've helped a bit by revealing what the cypher type is.

Can you post here what it says??

ccc

I've saved something with cloud.File using the default Vigenere Cypher.

You have saved it where?

[deleted]

@ccc That's my point...

The class is an EncryptionProvider, not a SecurityProvider.

Data can be encrypted with the strongest cypher known, but if the design makes it vulnerable to password discovery attacks or malicious deletion, for example, that's not secure.

The security comes from a number of design factors, of which strength of encryption is only one:

  • first, the URL is not public, it's not listed anywhere that you could go to a list of files to start attacking the
  • second, the URL is cryptic in itself, it's not one that you could guess, or generate realistically by brute force
  • third, if someone did find the data somehow, there's nothing to identify it, no user account, no file name
  • fourth, if someone with inside access, like an employee of the CloudProvider, got access to the data, the basic level of encryption means that they couldn't access the contents without breaking the cypher
  • fifth, in order to break the cypher, first it would be necessary to discover the cypher type (out of potentially 100s)

So the security here comes much more from the overall design than from the limited default encryption (which, as I said from the first, it is possible to break).

Of course, stronger encryption can be better; that's why the EncryptionProvider can be subclassed by the community or by a user 'rolling their own', and the Crypto package is suggested.

JonB

By using a secret gist, the security is essentially provided by GitHub, not the EncryptionProvider. If someone rolled their own storage provider, say public GitHub or gist, but reused your encryption provider, I think you would agree this becomes a lot less secure.

There is the old adage about rolling your own encryption (don't do it). It is easy to make a scheme you can't break yourself, hard to make one that cannot be broken by anyone else. Vigenere on textual data is vulnerable to frequency attacks. If the goal is to obfuscate the data to thwart the casual observer, no problem. If the goal is to have a secure way of saving your list of bank passwords, I would be concerned.

It seems to me one thing that would make a cloud.File module very useful would be human-readable paths. Are you able to read back your own example (did you save the URL)?
I.e. one design goal might be that the storage provider service has a long or cryptic path to the "repository", but then it does not change, and relative paths can be used (i.e. provide chdir, curdir, listdir, and path handling).

[deleted]

@guerito said:

@JonB Any or all of those (and some others)?

Provided they meet the design criteria (I think some may not)

@JonB that's what I was referring to when I replied to you that some of the cloud providers you mentioned might not meet the design criteria and implementation of those would depend partly on what level of risk the implementer was prepared to run.

Yes, I read the example back.

[deleted]

Updated the source above to include implementation of readline() for the cloud.File class

[deleted]

Updated the source above to include implementation of 3 more file methods (readlines(), seek(), tell()) and to note 2 methods that shouldn't be implemented for file-type objects

[deleted]

Updated the source above to complete the implementation of all the file methods.

Next will be to update the default EncryptionProvider to use one of the Crypto.Cipher package encryption methods.

[deleted]

Here is the EncryptionProvider class using the Crypto.Cipher - AES encryption. I will be updating the source to make this the default provider.

(Now included in the source above)

Phuket2

@guerito , I have no idea about good encryption. But I read this article yesterday. Was about bcrypt. I tried to install it with stash, but it failed.
I am not sure it's a credible article or not. But it did make sense.

[deleted]

@Phuket2 Thanks, I'll look at that. There's a write up in the Crypto.Cipher docs too under Security Notes... I chose AES because it said this about it...

AES, the Advanced Encryption Standard, was chosen by the US National Institute of Standards and Technology from among 6 competitors, and is probably your best choice. It runs at 7060 K/sec, so it's among the faster algorithms around.

Phuket2

@guerito, OK, I get that. The article I was reading was about passwords, not whole files, so I guess I spoke wrongly. But it was a very interesting article nevertheless. It purposely slows down the decryption to help hamper brute-force attacks.
These articles are starting to appear a lot because of the case where the FBI is asking the courts for a back door into the iPhone.
But it's a tricky subject. There is no one solution that fits all, and it's not easy to understand unless you have a background in it. It's like trying to understand compression for JPEG vs. 16-bit bitmap or video compression. There are so many schemes, and few people get even the basics. I think I have a concept of the basics, not much more, but it's enough. Zip file compression is similar, as are the old run-length encoded .rle files; I'm not sure they are still used. It has been years since I have seen a .rle file.

[deleted]

@Phuket2 Yes it was an interesting article thanks, wise though counterintuitive to use the slowest in that scenario.

[deleted]

@Phuket2 On an aside, if you are interested in attack methods the Wikipedia article I referenced above regarding the Vigenere Cipher points in the references to a Python script that attempts to crack the cipher.

If you look at the code it looks for repeating groups and uses them to try and guess the key length. Once the key length is guessed... the cipher text can be rewritten in columns all encoded with the same key character and conventional methods used to crack each column.

For example, in the challenge I issued, there's a repeating group 32 characters apart. Ruling out ridiculously short key lengths, it means I probably used a key of 8, 16 or 32 characters.
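
The repeated-group step described above (often called Kasiski examination) can be sketched as follows in Python 3. This is not the script referenced from the Wikipedia article; the function names and the candidate range of 2 to 40 are assumptions for the illustration.

```python
from collections import Counter


def repeat_distances(ciphertext, length=3):
    """Return the distances between successive occurrences of each repeated group."""
    last_seen = {}
    distances = []
    for i in range(len(ciphertext) - length + 1):
        group = ciphertext[i:i + length]
        if group in last_seen:
            distances.append(i - last_seen[group])
        last_seen[group] = i
    return distances


def likely_key_lengths(distances, min_len=2, max_len=40):
    """Count, for each candidate key length, how many distances it divides evenly."""
    votes = Counter()
    for d in distances:
        for n in range(min_len, max_len + 1):
            if d % n == 0:
                votes[n] += 1
    return votes.most_common()


# A single repeat 32 characters apart votes for every divisor of 32.
print(likely_key_lengths([32]))
```

With only one distance every divisor gets one vote, so 2 and 4 appear alongside 8, 16 and 32; more repeats in a longer ciphertext would narrow the field.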

In cloud.File though... the security doesn't depend mainly on encryption... the other design factors are the main security features, encryption is a final strengthening factor. It's why, even knowing the encryption type, and attack method... no one has been able yet to discover the contents of the challenge file.

Phuket2

@guerito, slowing down the decryption for a single authentication of a password makes sense. When you take the web into account, it's not a huge difference. But when someone is trying to run algorithms over your encrypted keys, it slows them down so much. It's like, why would you bother? Move on to the next data set that is not so well protected, or that will not take 1000 years to brute-force, whether you have Big Blue or not.
It's been fun reading about how Apple protects itself. Regarding dismantling the hardware, making images etc to attack it. Seems like they have done a good job.
Seems like it to me

dgelessus

I'm no crypto expert, but bcrypt is a password hashing algorithm and not an encryption method. The difference is that encryption has to be reversible in some way (the encrypted data must be decryptable by the intended recipient), whereas with password hashing you only need to know whether a given password is correct. In fact you want password hashes to not be decryptable, so for example an attacker with access to user password hashes cannot just get the plaintext passwords back.
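
To make the distinction concrete, here is a small Python 3 sketch of one-way password hashing. bcrypt itself is not in the standard library, so this swaps in PBKDF2 via `hashlib.pbkdf2_hmac`, a different deliberately slow algorithm; the function names and the iteration count are assumptions for the example.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100000):
    """Derive a one-way digest; there is no function that turns it back into the password."""
    if salt is None:
        salt = os.urandom(16)  # random per-password salt, stored alongside the digest
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return salt, iterations, digest


def check_password(password, salt, iterations, digest):
    """Re-derive the digest from the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return hmac.compare_digest(candidate, digest)


salt, iterations, digest = hash_password('correct horse')
print(check_password('correct horse', salt, iterations, digest))
print(check_password('wrong guess', salt, iterations, digest))
```

Verification only ever answers yes or no, which is why this kind of function suits stored passwords but cannot replace an EncryptionProvider that has to return the file contents.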

Also I'm not sure if I'm reading your code right... what exactly is the purpose of self.s16? Why do you need to add the file length at the beginning of the data and self.s16 at the end, and why do you mix self.s16 into the key?

[deleted]

@dgelessus AES requires:

  • an initial block value of 16 characters (because each block has the encrypted value of the previous block added to it in CBC mode)
  • text to encode as a multiple of 16 characters
  • a key length of 16, 24 or 32

dgelessus

  • The IV should be random (random.SystemRandom) and not a fixed value for the entire world. https://en.wikipedia.org/wiki/Initialization_vector
  • OK, that makes sense. (Would padding at the end with null bytes be insecure? If not, then why not do that for clarity?)
  • Let it throw an error (Crypto should do that automatically) if the key has an incorrect length as that likely indicates a programming error.
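
On the padding question, one widely used convention is PKCS#7, where each padding byte holds the number of bytes added, so the padding can be removed unambiguously even when the data itself ends in null bytes. This is neither the length-prefix scheme in the posted code nor null padding; it is shown only as a point of comparison, in Python 3, with assumed helper names `pad` and `unpad`.

```python
BLOCK = 16  # AES block size in bytes


def pad(data, block=BLOCK):
    """Append n bytes of value n, where n is 1..block (a full block if already aligned)."""
    n = block - len(data) % block
    return data + bytes([n]) * n


def unpad(data, block=BLOCK):
    """Strip the padding added by pad(), rejecting malformed input."""
    if not data or len(data) % block:
        raise ValueError('data is not a whole number of blocks')
    n = data[-1]
    if n < 1 or n > block or data[-n:] != bytes([n]) * n:
        raise ValueError('invalid padding')
    return data[:-n]


padded = pad(b'hello world')
print(len(padded), unpad(padded))
```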

[deleted]

@dgelessus Are you considering that you need the same IV to decode?? Possibly in another instance of the class, and quite likely on a different device.

dgelessus

Yes, which means that you need to transmit the IV unencrypted. That's why it's called an IV and not a secret key. The point of using a random IV is that for example when you encrypt the message "hello world....." twice using the same key, an attacker cannot tell that the encrypted messages are the same.

JonB

I think you'd typically prepend the IV to the ciphertext. This is more of an issue when the attacker has the ability to choose the plaintext, in which case it would be possible to uncover the key.

dgelessus

OK, that's how I understood it too. Crypto has built-in support for specifying an IV though, so I'd use that instead of doing it manually.
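
The effect being discussed can be shown with a toy CBC sketch in Python 3. To keep it runnable with the standard library alone, the block function here is a plain XOR with the key, which is NOT secure and only stands in for AES; with PyCrypto the IV would instead be passed to `AES.new`, as the posted code already does. All function names are assumptions for the example.

```python
import os

BLOCK = 16


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block(block, key):
    # Toy stand-in for a real block cipher such as AES: NOT secure.
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return _xor(block, key)


def cbc_encrypt(plaintext, key, iv=None):
    """CBC chaining; the IV is sent in the clear ahead of the ciphertext."""
    if len(key) != BLOCK or len(plaintext) % BLOCK:
        raise ValueError('key and plaintext must be multiples of 16 bytes')
    if iv is None:
        iv = os.urandom(BLOCK)  # fresh random IV for every message
    out, prev = [iv], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block(_xor(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b''.join(out)


def cbc_decrypt(message, key):
    """Read the IV from the first block, then undo the chaining."""
    iv, body = message[:BLOCK], message[BLOCK:]
    out, prev = [], iv
    for i in range(0, len(body), BLOCK):
        block = body[i:i + BLOCK]
        out.append(_xor(toy_block(block, key), prev))
        prev = block
    return b''.join(out)


key = b'0123456789abcdef'
msg = b'hello world.....'
fixed_iv = b'vB03eMZQ0lnsNVX6'  # the fixed value used in the posted code

# Fixed IV: the same message always produces the same output.
print(cbc_encrypt(msg, key, fixed_iv) == cbc_encrypt(msg, key, fixed_iv))
# Random IV: outputs differ, yet both decrypt to the original message.
a, b = cbc_encrypt(msg, key), cbc_encrypt(msg, key)
print(a != b, cbc_decrypt(a, key) == msg, cbc_decrypt(b, key) == msg)
```

Because the receiver reads the IV from the first 16 bytes, nothing has to be shared in advance except the key, which also covers decoding on a different device.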

[deleted]

@dgelessus

  • the IV is not necessarily "the same value for the whole world"... it's easily changed in the source to a personal value if required
  • arguably the mistake in sending "hello world" twice would be in using the same key each time
  • padding with nulls or 'X' s (as in the docs example) could aid someone trying to decrypt in that they would know the plain text ended in 0-16 of the same characters
  • the class conforms to its interface in accepting keys of any length and deals internally with the special requirements of any particular encryption provider
  • padding the key with random characters may make it less susceptible to dictionary attacks

... but most of all... you're missing the point, since the security in cloud.File doesn't depend primarily on the strength of the encryption (if used)

dgelessus

Are you perhaps confusing the purpose of the key and the IV? The key is the user's "personal value" that they should choose themselves (ideally by getting it from a CSPRNG, that way you don't need to worry about dictionary attacks and other issues with key insecurity). The IV is what should be used to add extra "randomness" to the encryption, which is why it should be randomly generated and exchanged for every encryption process.

What you're suggesting is that the user should set a custom IV (like a secret key) and that the key should have added random bytes (like half-key-half-IV).

If your interface doesn't allow errors when an invalid key is passed, then fix it. If I pass None or a function as a key, you would throw an error instead of somehow deducing a key from the objects. String keys with an invalid length are no different - they don't match the required format, so they should be rejected. At the moment you're "fixing" invalid keys in an insecure way without telling the user that they did something wrong.
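
A sketch of the kind of check being asked for, in Python 3. The helper name `validate_key` is hypothetical; the accepted lengths are the AES key sizes listed earlier in the thread, and `os.urandom` is used as the operating system's cryptographically secure random source.

```python
import os

VALID_AES_KEY_LENGTHS = (16, 24, 32)


def validate_key(key):
    """Reject keys of the wrong type or length instead of silently repairing them."""
    if not isinstance(key, bytes):
        raise TypeError('key must be bytes, got %s' % type(key).__name__)
    if len(key) not in VALID_AES_KEY_LENGTHS:
        raise ValueError('key must be 16, 24 or 32 bytes long, got %d' % len(key))
    return key


key = validate_key(os.urandom(32))  # a randomly generated key passes
print(len(key))

try:
    validate_key(b'too short')
except ValueError as err:
    print('rejected:', err)
```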

[deleted]

@dgelessus Rather than go back and forward about matters that are beside the point... let me give you 2 challenges, if you are up for them:

  • the first is to repeat to you the challenge to @JonB, or anyone, to post here the decrypted contents of the challenge file. That was encrypted using the Vigenere Cipher, which can certainly be broken (I've pointed in this topic to the method and the Python code to attempt it), or, if you prefer, I'll save a new challenge file using AES.
  • the second is that EncryptionProvider is deliberately a distinct class so that anyone can 'plug 'n play' their own choice or favourite. The 'cloud' module is a community project on GitHub, so write an EncryptionProvider that satisfies your own ideas and preferences, and contribute it to the project.

dgelessus

Right now I am talking about the security of this specific encryption provider that you posted, which I understand you are planning to make the default.

The "challenges" you posted demonstrate nothing about the security of cloud.File. Here's a challenge for you: I saved a file years ago on an online hosting service and it is publicly accessible and not encrypted in any way. By now it has probably been mirrored a few times on Google Cache or the Wayback Machine. Since I did not use any of cloud.File's security features, it should be easy for you to tell me its location.

[deleted]

@guerito said:

@ccc That's my point...

The class is an EncryptionProvider, not a SecurityProvider.

Data can be encrypted with the strongest cypher known, but if the design makes it vulnerable to password discovery attacks or malicious deletion, for example, that's not secure.

The security comes from a number of design factors, of which strength of encryption is only one:

  • first, the URL is not public, it's not listed anywhere that you could go to a list of files to start attacking the
  • second, the URL is cryptic in itself, it's not one that you could guess, or generate realistically by brute force
  • third, if someone did find the data somehow, there's nothing to identify it, no user account, no file name
  • fourth, if someone with inside access, like an employee of the CloudProvider, got access to the data, the basic level of encryption means that they couldn't access the contents without breaking the cypher
  • fifth, in order to break the cypher, first it would be necessary to discover the cypher type (out of potentially 100s)

So the security here comes much more from the overall design than from the limited default encryption (which, as I said from the first, it is possible to break).

Of course, stronger encryption can be better; that's why the EncryptionProvider can be subclassed by the community or by a user 'rolling their own', and the Crypto package is suggested.

@dgelessus That's my point exactly - you got it.

dgelessus

I got what? That the challenges that you posted show nothing of use?

JonB

The point is, unless someone followed this thread they might use your tool thinking it provides security. The comments in the file do not make it clear that security is provided by the private gist, so someone might subclass CloudProvider to use a public gist, and suddenly they are not nearly as secure. I think if you prefaced the discussion with an acknowledgement that it is only as secure as your cloud provider, most of us will shut up.

You seemed to be looking for feedback on the idea, feedback was provided. It is clear now that you don't want feedback, which is okay too.

[deleted]

@JonB Feedback is great, that's why I took onboard straight away the recommendation of the PyCrypto package, and implemented it at the first opportunity.

I'm a fan of that kind of positive, 'to the point' feedback which is really useful.

JonB

@dgelessus He is providing security through obscurity. The encryption protects data, I guess, from casual reading by GitHub employees or people trying random gist hashes. I think that is his point.

dgelessus

@guerito In my first post about your soon-to-be-default AES encryption provider I wrote what about the code I thought was unclear or needed improvement. Was that not to the point enough?

Speaking of to the point, the OP still does not say anything about what exactly the default "cloud" and "encryption" methods are. Sure, you can replace them with different implementations, but the defaults are what most users will start with (and perhaps the only thing they will ever use), so it is important to say what exactly they do and don't provide. For example, that the encryption uses Vigenere and should not be considered secure.

[deleted]

@dgelessus I considered your thoughts and responded with the reasons I didn't feel they warranted implementing. There's been ample discussion and more here is off topic in my view. I'll think about it and maybe post something over in 'General Discussion', in a day or 2, which is a place for talking about whatever we want.

[deleted]

@JonB said:

The point is, unless someone followed this thread...

Thanks. I think you have a good general point about the disjoint between forum discussions and corresponding code hosted on GitHub. I'll maybe take that up in General Discussion if nobody else has.

[deleted]

Updated the source above to make AES the default EncryptionProvider. This is a stateful scheme implementation, see:

https://en.wikipedia.org/wiki/Initialization_vector

Depending on whether the IV for a cryptographic scheme must be random or only unique the scheme is either called randomized or stateful. While randomized schemes always require the IV chosen by a sender to be forwarded to receivers, stateful schemes allow sender and receiver to share a common IV state, which is updated in a predefined way at both sides.

[deleted]

See Other Cloud Providers for other storage provider options.