-[Open SIGACT spreadsheet]
-First off, I want to thank everyone for being here for the discussion of SIGACTor.
-When I realize how many qualified people are here with so many knowledge bases, I'm excited about the progress we've made and I think everyone's insight and collaboration will be exceptionally useful.
-To start off I'll give a brief explanation of the product itself and where it's headed, and then I'll answer any questions and take feedback.
-The good part about dealing with highly qualified people is that I can skip the 'data is important' part.
-We all understand the importance of information to analysis, and you certainly don't need a silicon valley speech from me.
-Suffice it to say, ISW has a variety of needs for information processing that involve taking data from the real world and rendering it into forms that incorporate into our analytical methods.
+This summer I interned at the Institute for the Study of War, or ISW, a non-profit think tank that provides analysis and research on military conflicts around the world for policymakers and the public.
-SIGACTor is an application that attempts to speed up and ultimately eliminate the somewhat tedious process of repetitively transposing data from its organic formats into the structure we need.
-SIGACTor refines real world data, like a smelting ore.
-It gathers a big piece of information, removes components that are irrelevant to our needs, and then breaks down the useful pieces into the elements of data that fit into our systems.
+Like every type of analysis, this requires a variety of information to do well.
+A key piece of information for ISW is Significant Activities, or SIGACTs: a database of kinetic activity, meaning military engagements, people shooting at each other, in major conflict zones.
+Each activity is recorded in the database according to an ontology: the set of necessary fields and the possible values for each field.
-So, rather than babble on about, let's see it work.
-We've been testing the Syria team's main source, the Syrian Observatory for Human Rights, or SOHR, which basically publishes information on every major incident in the Syrian Civil War.
-Everyday, the SOHR's posts need to be read and catalogued into SIGACTs, a set of pre-defined values which can be imported into Palantir.
-SOHR reports on lots of aspects of the Syrian conflict, including information we don't need or don't trust.
+ISW performs open source intelligence, which means they draw from publicly available information: news reports, published research, and social media.
+These sources, while full of valuable information, are not generally designed to be easily integrated into a database.
+It's organic data: news articles, scholarly posts, Facebook posts.
+A lot of ISW's information came from a set of known sources that were checked and integrated every day. Even so, processing this information into ISW's dataset involved several people manually clicking around the websites of these sources, opening a spreadsheet in Excel, reading each post, poking around for the data we actually needed, parsing it out, and entering each value manually.
-Before SIGACTor, processing the information from SOHR involved several people going to their Facebook page, opening a spreadsheet in excel, reading each post, poking around for data we actually needed, parsing it out, and entering each value manually.
This takes a lot of time and a lot of eyestrain.
+About a dozen people needed to do this every day for four or five hours.
Many values were redundant.
A lot of work was spent reading things that weren't needed, in a multi-person needle-searching game.
Manually entering each value led to typos and simple errors that are unavoidable in a data entry operation at this scale.
SIGACTs took up a lot of man-hours and weren't a great use of time for qualified and knowledgeable personnel.
+Initially I dealt with data at a later stage.
+I wasn't aware this process was ongoing until I was called in to help when there was a data surge.
+When I saw all this manual data processing, I couldn't believe this was still being done in 2013.
+Interns and analysts weren't even using Excel functions to compile values that were based on other fields.
+Apparently this is fairly standard for a lot of security analysis operations.
+Everyone seemed blasé about it, but as a programmer, it made me want to tear my hair out.
+Out of this process was born my first professional software product, SIGACTor.
-So how is SIGACTor different?
-I can close the Facebook page and the excel spreadsheet.
-Currently, SIGACTor runs in a terminal interface, so I'll pull that up, and then import and run SIGACTor.
-SIGACTor prompts the user for a date, so I'll enter 0900 yesterday.
-Without having to go anywhere else, SIGACTor finds the first post after that time and presents its content.
-SIGACTor then divides that post into activities, removing the data thats irrelevent and breaking the remaining parts down into individual SIGACTs entries.
-SIGACTor then parses each activitiy and determines the value for each property automatically from the text.
-After each activity from this post has been processed, SIGACTor automatically loads the next one, divides it up, and begins processing them.
+SIGACTor is an application that attempts to speed up and ultimately eliminate the somewhat tedious process of repetitively transposing data from its organic formats into structured datasets.
+SIGACTor refines real world data, like a furnace smelting ore.
+It gathers a big piece of information, removes components that are irrelevant to our needs, and then breaks down the useful pieces into the elements of data that fit the dataset.
+I'd love to show you the whole application at work, but unfortunately, the specific ontologies ISW uses are protected information for a variety of reasons.
+So, as an example, when recording a military engagement described in a news post, you often want to record its location.
+The post may not simply say 'this event occurred in this location.'
+Different pieces of the location's name may be in different places.
+However, the dataset needs the location as a single, consistently formatted value.
+One source ISW often uses is the Syrian Observatory for Human Rights, or SOHR, a non-profit organization that puts out daily reports on military activity for each province in Syria on its Facebook page.
+SOHR reports on lots of aspects of the Syrian conflict, including information ISW doesn't need or doesn't trust.
+Rather than have a person spend a bunch of time surfing around a Facebook page while alt-tabbing to an Excel spreadsheet, they can start up SIGACTor, enter sohr, which is the code for a pre-processed dataset of SOHR's posts, and enter a date to start at. [2013-07-31-1358]
+SIGACTor analyzes the text in the post and extracts the location, down to the X, formatted correctly for the dataset.
+If, for example, you wanted the latitude and longitude of that location, SIGACTor could look it up in a library and, if available, supply that as another field.
+For a complicated ontology with a large number of fields, these processes can save a lot of time.
SIGACTor isn't always right, yet.
Every now and then a typo in the post or some weird formatting will confuse things.
-At each stage, the information
+At each stage, the information is checked and reviewed by the analyst.
If the information is off, the analyst can override and input the correct value.
+Many values, like the date or the title, are never wrong, which means the analyst doesn't have to worry about them at all.
-When the analyst is finished processing posts, SIGACTor takes this data and saves it into a clean, nicely formatted as a spreadsheet file, which can be loaded directly into Palantir.
+SIGACTor then takes this data and exports it in whatever format is needed for integration into the database.
-SIGACTor brings two key benefits to the process.
+SIGACTor brings two key benefits to the process.
Instead of combing through 50 Facebook posts and clicking around into several windows and spreadsheets, SIGACTor provides a unified interface that allows the analyst to keep their hands on the keyboard.
The vast majority of values don't need to be entered by the analyst at all.
-SIGACTor is built on the back of powerful, well-supported technologies.
-SIGACTor is written in Python, an open-source imperative interpreted object-oriented programming language.
-SIGACTor uses Atom to create and mainpulate feeds of information, and YAML to store and manage data fluidly and accessibly.
-SIGACTor is really just getting started.
-Its hard for us to remember sometimes that we've only been using it for less than two weeks.
-We've made a lot of progress in that period.
-Our achievements so far is based on a few simple principles that our fundamental to SIGACTor's success.
-First of all is respect for the current ontology.
-SIGACTor is here to make the current data processing systems better, not to fight them.
-SIGACTor is designed to integrate with ISW's current information demands as smoothly as possible.
-It is SIGACTor's job to fit in with the data flow, not the other way arround.
-Second is a focus on efficiency.
-SIGACTor is a little different than other ways of data processing.
-It doesn't always follow traditional standards for interface and access.
-At every stage, SIGACTor is defined to get the job done as quickly and accurately as possible, even if that means straying a little from the norm.
-Third is ensuring data reliability.
-We want SIGACTor to be fast and easy, but not at the cost of compromising the data.
-SIGACTor gives lots of tools to make the sure the data is collected and analyzed correctly, even if that means taking a second to get things right.
-Last is confidence in computing power.
-SIGACTor, or any other program, cannot replace a qualified analyst.
-A common and understandable reaction when people hear about SIGACTor is "that's cool, but it will never be able to figure out 'x.'"
-SIGACTor is still young.
-There are a lot of things it can't yet do.
-But fundamentally, as long as you're dealing with known data, if a human can do something in the world of data processing, a computer can do it too.
-It may take some figuring out, but ultimately, it can be done.
-In two weeks, we've already crossed boundaries that many surmised could not be crossed.
-There are lot of things that people assume can only be figured out through the magic of human intuition.
-In the data analysis game, with skilled design, time, and experimentation, there are fewer of those things than you might think.
-Finally, I want to briefly outline where SIGACTor is heading and where it can go.
-As we all know, every operation needs a clear objective and exit strategy.
-SIGACTor is currently in the alpha stage, 0.1 (0.1dev.15 to be precise).
-During this stage, we're adding new features and tightening up remaining ones.
-Our main technical challenges are more powerful and consistent location determination, incorporating a broad library of Lat-Long's for known locations, so analysts don't have to manually find each location on a map somewhere, and solving the ever painful 'Multiple Event Problem', where a single sentence describes a wave of activities that occur at different times and locations.
-When these features have been incorporated, we'll move on to 0.2, the beta stage.
-This stage involves heavy testing and dissemination to find bugs and make small improvements.
-After the beta, SIGACTor will reach its first stable version, 1.0.
-SIGACTor will be a full featured terminal application that should consistently and powerfully accomplish its task.
-After 1.0 is complete, work on 2.01dev will be opened.
-SIGACTor 2.0 will take the same backend and attach a full GUI frontend.
-SIGACTor will use Kivy for its frontend development, a open library that will allow for broad deployabilitiy.
-SIGACTor will have an attractive, efficient graphical interface that runs on Windows, Mac, Linux, Android, and iOS.
-SIGACTor 3.0 will take this interface and move it to the web, transforming SIGACTor into a Django based web application that will allow even more fluid access, along with centralized storage, management, and control of work and content flows.
-As I mentioned, right now, SIGACTor only serves one country team from one source.
-It's an important source for an important country team, but obviously our aims are somewhat higher.
-In the near future, probably immediately after 1.0, SIGACTor will begin attempting to support additional sources.
-The current prime candidate is DVIDS targetting the Afghanistan team.
-Some preliminary analysis of this source has already taken place.
-In the long run, SIGACTor needs to be flexibly deployed to different outlets and country teams.
-To accomplish this, the core application will be fully source agnostic.
-Information for how to refine data from different sources will be contained in separate software modules or 'profiles.'
-Profiles will allow for fluid, 'snap-in, snap-out' exploitation of new information resources.
-To target a new source, the base program can remain the same, and a profile for the new source can be deveoped independently.
-This will allow for more rapid and consitent deployment.
-While I'm waxing eloquent about what should and will be, I'll briefly mention that the products of this process are not limited to only the SIGACTs data.
-These same sources are analyzed and manipulated, over and over again, to create other analytical products like the CCIR.
-While the analysis is at a higher level, this same process and interface could be highly beneficial.
-Standardizing and digitizing our data collection procedures can also facilitate the utilization of a variety of visualization and analysis technologies.
-Python tools like matplotlib allow for rapid analysis and dissemination of high level statistical analysis of formalized datasets.
-Before I turn things over to questions, I want to briefly make two acknowledgements.
-First off, I want to thank Maggie; SIGACTor should really be seen as a Communications Deparment product rather than a David product.
-Maggie's leadership, guidance, assistance and management have been hugely integral to the creation process.
-Anything you like is her, all the bad stuff is my fault.
-Second, mad props goes to the Syria team, who have been exceptionally understanding and patient as our alpha testers.
-Their cooperation has been vital to the progress we've made.
-Thanks for being guinea pigs and great team members.
-So now, I'd realy like to make things as open-ended as possible.
-Questions, comments, issues, suggestions, concerns, thoughts?
+Security analysis is changing.