logsnarf.schema module

class logsnarf.schema.Schema(schema_file, default_tz=<UTC>)

Bases: object

The Schema class represents a BigQuery JSON schema.

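Objects of this class are able to load JSON documents, validate them against the schema, and apply registered post-processors to the results.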

Parameters:
  • schema_file (file) – File-like object containing the BigQuery JSON schema.

  • default_tz (datetime.tzinfo) – Timezone to assume for date strings that don’t include timezone information.

Raises:

ValueError – if the schema file doesn’t contain valid JSON
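For reference, a BigQuery JSON schema is a list of field definitions, each with a name, a type and an optional mode; RECORD fields nest their own fields list. The sketch below parses such a document with the standard json module. The schema contents and the read_schema helper are illustrative assumptions, not part of logsnarf; the point is only to show why a schema file with invalid JSON surfaces as a ValueError.

```python
import io
import json

# A small BigQuery JSON schema: a list of field definitions, each with a
# name, a type and an optional mode (NULLABLE, REQUIRED or REPEATED).
# RECORD fields carry their own nested "fields" list.
EXAMPLE_SCHEMA = """
[
  {"name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED"},
  {"name": "message",   "type": "STRING",    "mode": "NULLABLE"},
  {"name": "http", "type": "RECORD", "mode": "NULLABLE",
   "fields": [
     {"name": "status", "type": "INTEGER", "mode": "NULLABLE"}
   ]}
]
"""


def read_schema(schema_file):
    """Hypothetical helper: parse a file-like object holding a BigQuery schema.

    json.load raises json.JSONDecodeError, a subclass of ValueError, when the
    file does not contain valid JSON.
    """
    return json.load(schema_file)


fields = read_schema(io.StringIO(EXAMPLE_SCHEMA))
print([f["name"] for f in fields])  # ['timestamp', 'message', 'http']
```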

clearPostprocessors()

Remove all registered post-processors.

ignore_fields = ['table', '_sha1']

Fields in this list are permitted even if they aren’t part of the schema. In Logsnarf we use this for the table field, which tells us which table a log line belongs in; we remove it from the entry before upload.
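A minimal sketch of how ignore_fields might be used, assuming a flat schema whose field names are known in advance: ignored fields are tolerated by the unknown-field check, and the table field is detached before upload. SCHEMA_FIELDS, unknown_fields and split_for_upload are hypothetical names, not logsnarf APIs.

```python
IGNORE_FIELDS = ["table", "_sha1"]        # same values as Schema.ignore_fields
SCHEMA_FIELDS = {"timestamp", "message"}  # hypothetical schema field names


def unknown_fields(entry):
    """Fields that are neither in the schema nor in IGNORE_FIELDS."""
    return set(entry) - SCHEMA_FIELDS - set(IGNORE_FIELDS)


def split_for_upload(entry):
    """Hypothetical helper: detach the routing field before upload."""
    row = dict(entry)         # copy, so the caller's dict is left untouched
    table = row.pop("table")  # raises KeyError if no table was given
    return table, row


entry = {"table": "access_log", "timestamp": 1700000000.0, "message": "GET /"}
assert unknown_fields(entry) == set()  # "table" is tolerated, not rejected
table, row = split_for_upload(entry)
print(table, sorted(row))  # access_log ['message', 'timestamp']
```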

loads(json_string)

Deserialize json_string into a Python object.

This applies all schema checks and post-processors.

Parameters:

json_string (string|bytes) – UTF-8 encoded string containing a JSON document.

Returns:

The JSON document as a Python object

Return type:

dict or list or integer or float or unicode

Raises:

registerPostprocessor(fn)

Register a post-processor.

Registers a function to be called on the result of every JSON object decoded by the Schema object.

Parameters:

fn (callable) – A callable that takes one argument, the decoded JSON object, and returns the new version of that object.
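The post-processor chain can be pictured with the hypothetical PostprocessingLoader class below. It assumes that post-processors run on the top-level decoded document, in registration order, with each function's return value passed to the next; logsnarf's actual ordering and call sites may differ.

```python
import json


class PostprocessingLoader:
    """Hypothetical stand-in illustrating a post-processor chain.

    Each registered function receives the decoded object and returns the
    (possibly new) object that is handed to the next function.
    """

    def __init__(self):
        self._postprocessors = []

    def registerPostprocessor(self, fn):
        self._postprocessors.append(fn)

    def clearPostprocessors(self):
        self._postprocessors = []

    def loads(self, json_string):
        obj = json.loads(json_string)
        # Apply post-processors in registration order (an assumption).
        for fn in self._postprocessors:
            obj = fn(obj)
        return obj


def add_source(obj):
    # Returns a new dict instead of mutating the input.
    return {**obj, "source": "nginx"}


loader = PostprocessingLoader()
loader.registerPostprocessor(add_source)
print(loader.loads('{"message": "GET /"}'))
# {'message': 'GET /', 'source': 'nginx'}
```

Returning a new object rather than mutating the input keeps each post-processor independent of the others, which matches the documented contract that fn returns the new version of the object.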

setFieldValidator(field_name, fn)

Override the validator for a particular field in the schema.

Parameters:
  • field_name (str) – The field name to replace the validator for. To refer to a field of a sub-record, use dotted notation, e.g. recordfield.subrecord.item.

  • fn (callable) – A callable that receives the root object and the current value of the field, and returns the new value. If the value is invalid, it should raise logsnarf.errors.ValidationError.
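A sketch of how a dotted field name such as http.response.status could be resolved and its validator applied. apply_field_validator and status_code are hypothetical, and ValidationError is a local stand-in for logsnarf.errors.ValidationError. The sketch assumes nested records are plain dicts and skips fields that are absent.

```python
class ValidationError(Exception):
    """Local stand-in for logsnarf.errors.ValidationError."""


def apply_field_validator(root, field_name, fn):
    """Hypothetical helper: run fn on the field addressed by a dotted name.

    fn receives the root object and the current value, and its return value
    replaces the stored value. Missing fields are left alone.
    """
    *parents, leaf = field_name.split(".")
    container = root
    for part in parents:
        container = container.get(part)
        if not isinstance(container, dict):
            return root  # sub-record absent: nothing to validate
    if leaf in container:
        container[leaf] = fn(root, container[leaf])
    return root


def status_code(_root, value):
    """Example validator: coerce to int and check the HTTP status range."""
    try:
        code = int(value)
    except (TypeError, ValueError):
        raise ValidationError("invalid HTTP status: %r" % (value,)) from None
    if not 100 <= code <= 599:
        raise ValidationError("HTTP status out of range: %r" % (value,))
    return code


record = {"http": {"response": {"status": "404"}}}
apply_field_validator(record, "http.response.status", status_code)
print(record)  # {'http': {'response': {'status': 404}}}
```

REPEATED records (lists of dicts) are not handled here; a fuller version would apply the validator to each element of the list.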

setObjectLoadHook(fn)

Set the object load hook used by json.loads.

Parameters:

fn (callable) – A callable that takes a decoded JSON object (a dict, as opposed to a literal such as a string or number) and returns an updated version of that object.
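Python's json.loads accepts an object_hook argument that is called with every decoded JSON object (as a dict), innermost first, and whose return value replaces that dict. Presumably setObjectLoadHook plugs fn into this mechanism; that is an assumption based on the description above. strip_nulls is a hypothetical hook used only for illustration.

```python
import json


def strip_nulls(obj):
    """Example hook: drop keys whose value is JSON null.

    json.loads calls this for every decoded JSON object, innermost first,
    and uses the return value in place of the original dict.
    """
    return {k: v for k, v in obj.items() if v is not None}


doc = json.loads(
    '{"message": "GET /", "user": null, "http": {"status": 200, "referer": null}}',
    object_hook=strip_nulls,
)
print(doc)  # {'message': 'GET /', 'http': {'status': 200}}
```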

toUnixTimestamp(_parent, value)

Validator for TIMESTAMP fields.

Parameters:
  • _parent (dict) – Parent of the value.

  • value (str or integer or float) – The value to validate.

Returns:

validated value

Return type:

float

Raises:

logsnarf.errors.ValidationError – if value is not, or cannot be converted to, a Unix timestamp.
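A possible shape for a TIMESTAMP validator, written as a standalone function rather than logsnarf's method. It assumes numbers are already seconds since the epoch and that strings are ISO 8601, parsed with datetime.fromisoformat; naive datetimes get default_tz attached, mirroring the constructor parameter. ValidationError is a local stand-in for logsnarf.errors.ValidationError.

```python
from datetime import datetime, timezone


class ValidationError(Exception):
    """Local stand-in for logsnarf.errors.ValidationError."""


def to_unix_timestamp(_parent, value, default_tz=timezone.utc):
    """Sketch of a TIMESTAMP validator; not logsnarf's implementation.

    Numbers are taken as seconds since the epoch. Strings are parsed as
    ISO 8601 (a trailing 'Z' is only accepted by fromisoformat from
    Python 3.11 onwards); naive results get default_tz attached.
    """
    if isinstance(value, bool):  # bool is a subclass of int; reject it
        raise ValidationError("booleans are not timestamps")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            dt = datetime.fromisoformat(value)
        except ValueError:
            raise ValidationError("unparseable timestamp: %r" % (value,)) from None
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=default_tz)
        return dt.timestamp()
    raise ValidationError("unsupported timestamp type: %r" % (type(value),))


print(to_unix_timestamp(None, "2024-01-01T00:00:00"))        # 1704067200.0
print(to_unix_timestamp(None, "2024-01-01T01:00:00+01:00"))  # 1704067200.0
```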

validateJSON(root_obj)

Validate that an object matches the BigQuery schema.

This involves:
  • ensuring all fields in the object are known;

  • ensuring all required fields are present;

  • running the field validators on each field.

Parameters:

root_obj (dict) – The object to validate against the schema.

Returns:

validated object

Return type:

dict

Raises:

logsnarf.errors.ValidationError – if the object is not valid against the schema
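The three checks listed above can be sketched for a flat schema (no nested RECORD fields) as follows. validate_json, the validators mapping and the sample FIELDS are assumptions for illustration; the real method presumably also descends into sub-records and uses its own validator registry.

```python
class ValidationError(Exception):
    """Local stand-in for logsnarf.errors.ValidationError."""


def validate_json(root_obj, fields, validators, ignore_fields=("table", "_sha1")):
    """Sketch of the three checks, for a flat (non-nested) schema.

    fields     -- BigQuery schema fields, e.g. [{"name": ..., "mode": ...}]
    validators -- hypothetical mapping of field name to fn(root, value)
    """
    known = {f["name"] for f in fields}
    # 1. every field in the object must be known (or explicitly ignored)
    unknown = set(root_obj) - known - set(ignore_fields)
    if unknown:
        raise ValidationError("unknown fields: %s" % ", ".join(sorted(unknown)))
    # 2. every REQUIRED field must be present
    for f in fields:
        if f.get("mode") == "REQUIRED" and f["name"] not in root_obj:
            raise ValidationError("missing required field: %s" % f["name"])
    # 3. run the per-field validators, replacing values with their results
    for name, fn in validators.items():
        if name in root_obj:
            root_obj[name] = fn(root_obj, root_obj[name])
    return root_obj


FIELDS = [
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "message", "type": "STRING", "mode": "NULLABLE"},
]
VALIDATORS = {"message": lambda _root, value: str(value)}

result = validate_json({"timestamp": 1.0, "message": 42, "table": "t"}, FIELDS, VALIDATORS)
print(result)  # {'timestamp': 1.0, 'message': '42', 'table': 't'}
```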

validateSchema()

Validate that the JSON document loaded as the schema is itself a valid BigQuery schema.

static validateSchemaField(field)

Validate a field of a schema.

For clarity, this is implemented with assert statements. During normal schema validation, validateSchema wraps the resulting AssertionError in a ValidationError.

Parameters:

field (dict) – The field to validate.

Raises:

AssertionError – if the field is invalid.
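An assert-based field check in the style described above, plus a wrapper that converts AssertionError into ValidationError as validateSchema is said to do. The type list is a subset of BigQuery column types, and the specific rules are assumptions rather than logsnarf's own; validate_schema_field and validate_schema are hypothetical names.

```python
class ValidationError(Exception):
    """Local stand-in for logsnarf.errors.ValidationError."""


# A subset of BigQuery column types (legacy names plus standard SQL aliases).
VALID_TYPES = {
    "STRING", "BYTES", "INTEGER", "INT64", "FLOAT", "FLOAT64", "NUMERIC",
    "BOOLEAN", "BOOL", "TIMESTAMP", "DATE", "TIME", "DATETIME",
    "RECORD", "STRUCT",
}
VALID_MODES = {"NULLABLE", "REQUIRED", "REPEATED"}


def validate_schema_field(field):
    """Assert-based field check; the exact rules here are assumptions."""
    assert isinstance(field, dict), "field must be an object"
    assert isinstance(field.get("name"), str) and field["name"], "field needs a name"
    assert field.get("type") in VALID_TYPES, "unknown type %r" % (field.get("type"),)
    assert field.get("mode", "NULLABLE") in VALID_MODES, "unknown mode"
    if field["type"] in ("RECORD", "STRUCT"):
        assert field.get("fields"), "RECORD fields need sub-fields"
        for sub in field["fields"]:
            validate_schema_field(sub)  # recurse into nested records


def validate_schema(fields):
    """Wrap each AssertionError in a ValidationError."""
    for field in fields:
        try:
            validate_schema_field(field)
        except AssertionError as exc:
            raise ValidationError(str(exc)) from exc


validate_schema([{"name": "ts", "type": "TIMESTAMP", "mode": "REQUIRED"}])  # passes
```

One caveat of the assert style: Python strips assert statements when run with -O, so checks written this way silently pass in optimized mode.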