Our lab develops approaches that generate testable molecular hypotheses from genome-scale datasets with an end goal of improving human health. Projects in the lab are generally focused on:
- Developing better gold standards of gene-function, gene-gene, or gene-phenotype relationships.
- Developing new machine learning approaches that better summarize or integrate genome-scale data.
- Applying new and/or existing machine learning approaches to important biological problems.
As you go through this document and as you discover new resources that relate to our software stack, submit a pull request to update this document with your discoveries!
We expect to develop methodological advances and analytical systems that will make analysis of big data, particularly gene expression data, as routine in wet-bench biology labs as PCR. To accomplish this, we will write good code, perform solid and reproducible analyses, and disseminate our results widely through approachable publications and webservers. We recognize that trust, both in the process and in our results, is of primary importance to the biologists that use our methods and webservers. Therefore, we strive to make our source code as open and approachable as possible. When we submit papers, we expect that the analytical code behind those papers will be something that we can be proud of. To these ends, we will provide reviewers and the scientific community with all source code required to generate figures in the paper that result from computational analyses.
Your role in the lab is to take primary responsibility for the success of your research project and career development. As a member of the lab, you are expected to participate fully in the team. When disagreements about methodological approaches arise, you recognize that these should be resolved through a solid and reproducible analysis of available data. In general, lab members are expected to be present from 9:30AM to 4:00PM on weekdays to facilitate discussion within the group. If you aren’t sure, ask.
Casey’s goal is to facilitate your success as well as that of your project. Within your project, Casey will serve as a sounding board for ideas, will help you plan your project, and will help to devise experiments to test your hypotheses. For your success, Casey will help you to plan your training, to devise a career plan that takes you where you want to go, to manage your portfolio of project risk, and other elements of professional development.
Going Above and Beyond (Bonus.ly)
We recognize that people in the lab regularly go above and beyond these expectations, and we wanted a way to recognize each other when this happens. We now use bonus.ly, which is a service that allows us to send a quick virtual thank you note and/or pat on the back. If someone’s paper gets accepted or someone helps you out with a programming question, congratulate or thank them. Slack includes /give syntax that you should explore (or /give someone a point for explaining it to you). When one individual accumulates enough bonus.ly points, the lab goes out to lunch.
Code of Conduct
All members of the lab, along with visitors, are expected to agree with the following code of conduct. We will enforce this code as needed. We expect cooperation from all members to help ensure a safe environment for everybody.
The Quick Version
The lab is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, or religion (or lack thereof). We do not tolerate harassment of lab members in any form. Sexual language and imagery are generally not appropriate for any lab venue, including lab meetings, presentations, or discussions. (However, note that we work on biological matters, so work-related discussions of, e.g., animal reproduction are appropriate.)
The Less Quick Version
Harassment includes offensive verbal comments related to gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention.
Members asked to stop any harassing behavior are expected to comply immediately.
If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact Casey Greene immediately. If Casey is the cause of your concern, Dr. Deborah Hogan (Deborah.A.Hogan@dartmouth.edu) is a good informal point of contact; she does not work for Casey and has agreed to mediate. For official concerns, please see the University of Pennsylvania ombuds office.
We expect members to follow these guidelines at any lab-related event.
Original source and credit:
http://2012.jsconf.us/#/about & The Ada Initiative. Please help by translating or improving: http://github.com/leftlogic/confcodeofconduct.com. The code of conduct section is licensed under a Creative Commons Attribution 3.0 Unported License.
Our most frequent meeting is our scrum. This is a 10 minute meeting that we have each weekday morning, currently scheduled for 9:30AM. The goal of the scrum is to touch base on projects and share accomplishments. Rene currently runs our scrums. In the scrum, we each update with:
- What you accomplished yesterday.
- What you will accomplish today.
- Who, if anyone, is blocking you (now or in the future)?
- Who, if anyone, are you blocking?
Scrums are generally held both in person and via appear.in/greenelab, so feel free to join in at either place. If you cannot make it to a scrum physically or in appear.in, you are expected to provide updates via the “general” channel in Slack. Those who work partial schedules (part time and/or undergraduate researchers) or who are on vacation are not expected to scrum on days that they are not working.
Our weekly lab meetings have three major goals:
- Develop our abilities to write scientific software.
- Expand the breadth of our scientific perspective.
- Get ideas from the other smart people that we’re lucky to have in the group.
Lab meetings are divided into three categories to accomplish these goals: Code Review, Journal Club, and Braintrust. The lab meeting is a supportive environment, where we can discuss challenges, ugly code, etc. in a safe context. The scheduling document is available at: https://docs.google.com/spreadsheets/d/1nlWyg51jfiNknmGMtArrCqsoo5gBv2cDuN2GqNyY40E/edit
For lab meeting, we default to code review. If there is an unscheduled braintrust spot, we’ll do a code review during that time. Code review order is maintained in the scheduling document as well (see “CR Order” column). For the code review, be prepared to discuss an item of code that you’re happy with as well as an item of code that you struggled with for any reason.
We have a journal club for 15 minutes at the start of each meeting. For journal club, prepare a presentation with 4-5 papers. All except for one should have been published since your last journal club presentation. The presentation itself should be simple and shouldn’t take much time to prepare. For each paper, the presentation should consist of:
- A title slide
- An overview slide (usually a flow-chart of some sort from the paper, could also be an initial result that sets context).
- The results figure that convinced you to pick this paper.
During the discussion, please share why you picked the paper, its implication for your research, and any potential implications that touch on other research that is ongoing in the lab.
For the braintrust section, present something that you wish to talk about to the group. This could be a confounding result, an interesting result, an analysis that isn’t working out, etc. This is your chance to have the group focus on and help you solve a challenge that you’re facing. Signups for this are on an as needed basis. As a rule of thumb, if you’re not asking for feedback at least once every three months via the braintrust, you’re probably not asking for enough feedback.
We will schedule weekly individual meetings. Once you join the lab, contact Casey to set up a time. These are set up for a term to accommodate class schedules. We don’t reschedule these meetings by default if one of the parties (Casey or you) is out of town, so if you want to meet during a week when travel conflicts with the usual time, please contact Casey to reschedule. The goal of the weekly meeting is to:
- Discuss challenges.
- Plan strategy (project, personal).
Information on how to get accounts on computers and in the services that the lab uses is below.
- The new member should learn what bitbucket is as well as how to use it (see section “Bitbucket”).
- The new member should create a bitbucket account, which will allow access to this document (someone needs to help with steps 1 and 2).
- The new member should then be added to the greenelab group (Casey, Rene)
- The new member should then be added to slack (Casey, )
- The new member should then be added to bonus.ly (Casey, )
- The new member should review this document to make sure that all necessary steps have been completed.
- Create an account on an open rotation station. (Casey, Rene)
- Copy existing user folders on the machine to an open rotation station. (Casey, Rene)
- Re-install the OS to reset user accounts and have the machine in “known good” state (Casey, Rene, or the new member).
Cluster Computing (if required)
TBD after the move to Penn.
As a computational lab, our major output is source code. This means it’s critical that we keep track of it and that we write code that we can be proud of. All source code, no matter how small, should be stored in a version control system. In our lab, we’re currently using bitbucket (primary) and github (secondary). For both services, we have organizations called “GreeneLab” that members are part of.
All code written for the lab should be committed to a repository with the greenelab organization as the owner. This helps us to maintain continuity and provides a single location for our code. You can make your code private or public.
If you make your code public, we have certain expectations:
- It will be under an open source license (specified in a LICENSE.txt file in the repository root).
- In certain circumstances, we will require a specific license. Usually this will be the BSD 3-clause license. If you’re not sure which license to use, check with Casey.
- You will maintain a README file describing the purpose and use of the repository.
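Before making a repository public, it can help to check these expectations automatically. The following is a minimal sketch, not a lab-provided tool: the function name `missing_public_repo_files` is hypothetical, `LICENSE.txt` is the file name given above, and the accepted README file names are an assumption, since this document does not specify one.

```python
from pathlib import Path

# LICENSE.txt is the name used in this document. The README variants
# below are assumptions made for illustration only.
LICENSE_NAME = "LICENSE.txt"
README_NAMES = ("README", "README.md", "README.txt", "README.rst")


def missing_public_repo_files(repo_root):
    """Return the expected files that are absent from a repository root.

    Parameters
    ----------
    repo_root : str or pathlib.Path
        Path to the top-level directory of the repository.

    Returns
    -------
    list of str
        "LICENSE.txt" and/or "README" for each expectation that is not
        met. An empty list means both files were found.
    """
    root = Path(repo_root)
    missing = []
    # The license must live in the repository root under the exact name.
    if not (root / LICENSE_NAME).is_file():
        missing.append(LICENSE_NAME)
    # Any one of the assumed README names satisfies the README check.
    if not any((root / name).is_file() for name in README_NAMES):
        missing.append("README")
    return missing
```

Calling `missing_public_repo_files(".")` from the repository root returns an empty list when both files are present. The check only confirms that the files exist; whether the license is appropriate is still a question for Casey.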
If you keep your repository private, we still have expectations:
- Your repository can be private during development, but expect that one day your code will be world readable.
- In the event that you publish a manuscript, we expect that lab members will simultaneously release a code repository capable of generating all of the figures in the paper for which it is appropriate (again, if not sure, ask, but if you wrote code to generate a figure at one point, it should be included).
- You will maintain a README file describing the purpose and use of the repository.
We aim to write good code, and we recognize that not all projects are created equal. Using the vocabulary of The Pragmatic Programmer, some code is for production and other code represents tracer bullets. In either case, it’s helpful to use a linter (see tools) and to minimize the number of problematic sections.
There are a few editors that are used within the lab listed below. In general, pick something that works for you and learn it well. We recommend an editor that supports linters and other programming-specific capabilities.
- If you’re writing Python, try to stick to PEP 8.
- When you write a function, write a docstring. Give yourself the clues that you’ll need to figure out what it does when you come back to it one year later.
- During code review, if something comes up repeatedly, add it here!
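As an illustration of the first two points, here is a small function written with PEP 8 naming and spacing and a docstring that states what goes in, what comes out, and what can fail. The function `mean_expression` is a hypothetical example, and the sectioned docstring layout is one common choice rather than a lab requirement.

```python
def mean_expression(values):
    """Return the arithmetic mean of a gene's expression values.

    Parameters
    ----------
    values : list of float
        Expression measurements for one gene across samples.

    Returns
    -------
    float
        The mean of the measurements.

    Raises
    ------
    ValueError
        If values is empty, because the mean is undefined.
    """
    # Fail loudly on empty input instead of raising ZeroDivisionError.
    if not values:
        raise ValueError("values must contain at least one measurement")
    return sum(values) / len(values)
```

The docstring is also what `help(mean_expression)` prints, so it is the first thing you will see when you return to the code a year later.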
Put some useful resources here.
Also list useful resources here.
Science in General
Functional Relationship Networks
Bayesian integration into networks literature.
- Discovery of biological networks from diverse functional genomic data.
- Functional Knowledge Transfer for High-accuracy Prediction of Under-studied Biological Processes
- Understanding multicellular function and disease with human tissue-specific networks